Monographs of the Portuguese Mathematical Society - Vol. I
Introduction to Random Time and Quantum Randomness
New Edition
Kai Lai Chung
Jean-Claude Zambrini
World Scientific
Introduction to Random Time and Quantum Randomness
Monographs of the Portuguese Mathematical Society, Vol. 1

Introduction to Random Time and Quantum Randomness
New Edition
Kai Lai Chung Stanford University, USA
Jean-Claude Zambrini Universidade de Lisboa, Portugal
World Scientific
New Jersey • London • Singapore • Hong Kong
Published by World Scientific Publishing Co. Pte. Ltd. 5 Toh Tuck Link, Singapore 596224 USA office: Suite 202, 1060 Main Street, River Edge, NJ 07661 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
INTRODUCTION TO RANDOM TIME AND QUANTUM RANDOMNESS (New Edition) Copyright © 2003 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 981-238-388-3 ISBN 981-238-415-4 (pbk)
Printed in Singapore by World Scientific Printers (S) Pte Ltd
Monographs of the Portuguese Mathematical Society

With Introduction to Random Time and Quantum Randomness by Professors K. L. Chung and J. C. Zambrini, the Portuguese Mathematical Society initiates the publication of a series of Monographs in Mathematics. Its aim is to publish lecture notes in domains of current mathematical research, in an accessible and relatively self-contained style, so that dedicated graduate students may initiate themselves into the topics presented and into the advanced literature.

Mathematical research in Portugal during the 20th century suffered the consequences of various difficulties, of a cultural, political and economical nature, among them the lack of scientific tradition and of good postgraduate schools. The Portuguese Mathematical Society has been, since its creation in 1940, one of the rare circles standing up for development and scientific quality, even when this was not a priority of the State. In spite of the recent spectacular progress in this direction, a certain lack of ambition, as well as the heavy weight of academic structures, archaic in many respects, are still felt.

It is our wish that this series of Monographs will contribute to making current trends of research in Mathematics and their various applications accessible to the representatives of the new generation, and will help them to believe in their true potential to build a better scientific future.

We thank the Portuguese Foundation for Science and Technology, the Calouste Gulbenkian Foundation and the Mathematics Center of the University of Coimbra for the financial support without which it would not have been possible to launch this collection.

ANA BELA CRUZEIRO
Director of the Monografias da S.P.M.
Monografias da Sociedade Portuguesa de Matemática

Com a obra Introduction to Random Time and Quantum Randomness, da autoria dos Professores K. L. Chung e J. C. Zambrini, inicia a Sociedade Portuguesa de Matemática a edição de uma série de Monografias em Matemática. Pretende-se através dela publicar textos em domínios de investigação matemática actual, abordados de forma acessível e auto-contida, de tal forma que um licenciado possa iniciar-se aos temas aí tratados.

A investigação em Matemática em Portugal no século XX sofreu as consequências de dificuldades várias, de natureza política, cultural e económica, de falta de tradição, de escassez de boas escolas, entre outras. A Sociedade Portuguesa de Matemática foi, desde a sua criação em 1940, um dos raros círculos a apostar no desenvolvimento, na qualidade científica. E apesar de nos últimos anos termos assistido a um indiscutível progresso neste sentido, uma certa falta de ambição, bem como o peso de estruturas universitárias em muitos aspectos arcaicas, ainda se fazem sentir.

É nosso desejo que esta série de Monografias venha a contribuir para que as novas gerações tenham maior acesso ao que hoje se faz em Matemática e suas aplicações e sobretudo para que acreditem nas suas potencialidades e na possibilidade de construírem um futuro melhor.

Agradecemos os apoios financeiros da Fundação para a Ciência e Tecnologia, da Fundação Calouste Gulbenkian e do Centro de Matemática da Universidade de Coimbra, sem os quais não nos teria sido possível iniciar as Monografias de Matemática.

ANA BELA CRUZEIRO
Directora das Monografias da S.P.M.
Guide
This book consists of two parts. Part 1 can be read independently from Part 2, and is strictly elementary, at least from the beginning. Part 2 requires basic knowledge of classical as well as quantum physics. The reader whose main interest is in quantum physics, and who is "willing and able" to learn something from probability theory should read the Foreword to Part 2 to see what it is all about, then read the Foreword to Part 1 to see where Random Time may enter into the picture.
Contents
Monographs of the Portuguese Mathematical Society  v
Monografias da Sociedade Portuguesa de Matematica  vii
Guide  ix
Foreword to Part 1  1

Part 1. Introduction to Random Time
1  Prologue  3
2  Stopping time  11
3  Martingale stopped  18
4  Random past and future  28
5  Other times  36
6  From first to last  47
7  Gapless time  58
8  Markov chain in continuum time  64
9  The trouble with the infinite  69
References  76

Foreword to Part 2  79

Part 2. Introduction to Quantum Randomness
1  Classical prologue  85
2  Standard quantum mechanics  93
3  Probabilities in standard quantum mechanics  108
4  Feynman's approach to quantum probabilities  118
   4.1  Lagrangian mechanics  118
   4.2  Feynman's space-time reinterpretation of quantum mechanics  123
5  Schrodinger's Euclidean quantum mechanics  143
   5.1  A probabilistic interpretation of Feynman's approach  143
   5.2  Feynman's results revisited  157
6  Beyond Feynman's approach  171
   6.1  More quantum symmetries  171
   6.2  Introduction to functional calculus  190
7  Time for a dialogue  196
References  205

Index  209
Foreword to Part 1
Part 1 originated in a series of lectures given in May 1997 at the Grupo de Fisica Matematica in Lisboa, by invitation of Professor Zambrini. My notes were later considerably expanded and portions of them were presented in seminars at the Scuola Normale Superiore at Pisa, the University of California at San Diego, and the Korteweg-de Vries Institute in Amsterdam. The announced title for the last occasion was: "Il Tempo Va Meglio Per Caso" [Time Better Be Random], which gives an idea of my intention. Diverse kinds of RANDOM TIME in various probability schemes, some ancient and some modernistic, are revealed and studied, and their often tremendous impact on the impending process illustrated. These random times sometimes behave like nonrandom (constant) ones and sometimes radically differently. Two of them, the first exit time and the last exit time of the Brownian paths from a conductor, have successfully served in the mathematical theory of the Green-Gauss-Schrodinger-Feynman "Ideenkreis" (see reference [7] at the end of Part 1). Other kinds of random time and their significance remain to be discovered. With the introduction of probability into quantum physics, there would seem to be a fertile field of exploitation and cultivation. As my former colleague Dick Feynman (c. 1949) might put it: here is a bag of tricks for the smart cookies to pick up. At least such is the hope of both authors of this small volume.

KAI LAI CHUNG
Stanford, 2001
Part 1
Introduction to Random Time
1. Prologue
Random Time appeared at the dawn of the probability discourse, between Fermat and Pascal, in the Gambler's Ruin problem. In the simpler language of random walk: one starts at an integer between a and b (both integers), takes one unit step to the next integer, either right or left with equal probability 1/2, and continues to do so until either a or b is reached, then stops. Two questions arise:

Question 1. What is the probability that a is reached before b, or vice versa?

Question 2. How many steps have been taken before the stop?

The second question is clearly that of a Random Time: the duration of the random walk. Mathematics begins with notation: for each nonnegative integer t let X(t) denote the position of the walk at time t, namely after t steps. Then define a random variable T as follows:

T = min{t ≥ 0 : X(t) = a or b}.
(1.1)
In words, T is the first time a or b is reached. The silly case of t = 0 is allowed in the definition for notational convenience, as will be seen. Question 1 may be formulated as that of the Random Place X(T) at the random time T. Its distribution is given by

P{X(T) = z} = p_z,  for z = a or b.  (1.2)

A complete answer to Question 2 requires the distribution of T, namely:

P{T = t},  t = 0, 1, 2, ...  (1.3)
Although this can be done we shall content ourselves with the simpler mathematical expectation or "mean" of T:

E(T) = Σ_{t=0}^{∞} t P(T = t).  (1.4)
These problems were solved by the ancients by combinatorial and analytical methods, even in the harder case where the probabilities of walking to the right or left are unequal: see Todhunter's History of Probability [12]. Here we shall treat them from a "higher standpoint" (Felix Klein) to highlight "The Importance of Being Random". The random walk, known as Bernoullian, may be described in terms of a sequence of "independent and identically distributed" (IID) random variables {y_t, t ∈ N} (N = the set of natural numbers) with

P{y_1 = +1} = P{y_1 = −1} = 1/2,

with mean E(y_1) = 0 and variance E(y_1²) = 1. When the walk starts at y_0, a constant, we have for all integers t ≥ 0:

X(t) = Σ_{s=0}^{t} y_s.

We obtain then, using the additivity of mean and variance, the two equations for the mean and variance of X(t):

E(X(t)) = y_0 = E(X(0));  (1.5)
E(X²(t)) = y_0² + t = E(X²(0)) + t.  (1.6)
Be sure to pay attention to the last members of these equations, which seem unnecessary, a waste of time. But wait! It is folklore that two unknowns can be solved from two equations. The calculation of the mean and the variance is also folklore in statistics as well as probability, tantamount to the momentum mv and the kinetic energy (1/2)mv² in classical mechanics of a point particle of mass m and velocity v. Here the two unknowns are T and X(T), or rather their numerical (nonrandom) "characteristics" given in (1.2) and (1.4), and they are to be gotten from the equations (1.5) and (1.6). A bold idea is to think of the Time t in them as a random T. Why, it is "arbitrary" there, which in our everyday speech signifies "picked at random"! I have indeed prepared for
this randomization by adjoining the third members in (1.5) and (1.6), as already alerted, to make it feasible. Thus we want

E(X(T)) = E(X(0));  (1.7)
E(X²(T)) = E(X²(0)) + E(T).  (1.8)
Recall that X(0) = y_0. As normal procedure of scientific research, one makes a good guess, deduces some useful consequences, verifies them in simple cases by observation or experiment, before one ponders the meaning of the unproven premises. Now equation (1.7) produces only one equation between the two unknowns in (1.2):

a p_a + b p_b = y_0.  (1.9)

But there is another "obvious" equation lurking in the background:

p_a + p_b = 1.  (1.10)
The solution of this system of two linear equations is immediate:

p_a = (b − y_0)/(b − a),  p_b = (y_0 − a)/(b − a).  (1.11)

With these the second moment of X(T) can be computed and then equation (1.8) yields

E(T) = (y_0 − a)(b − y_0).  (1.12)

Both problems are solved. A small computer can check the numerical accuracy of the answers for a few values of a and b (presumably, since I have never done this). It is time to go back to the brilliant insight that produced the probability relation (1.10). I have intentionally postponed the discussion for the reader to reflect on his own credulity or credibility. Here is a caveat: we have proceeded as if that random time T really exists! Its definition in (1.1) presupposes that there is a t for which X(t) equals a or b. How do we know that? Take a = 0, b = 3 and y_0 = 1. The random walk can go back and forth between 1 and 2, indefinitely, namely ad infinitum, without ever reaching 0 or 3. Mathematically, if y_{2s−1} = +1 and y_{2s} = −1 for s = 1, 2, ..., then X(2s − 1) = 2, X(2s) = 1 for all those s, and consequently X(t) for all t ≥ 0 is never 0 or 3, and so T is not defined. (This case is physically realizable: the perpetual swinging of a pendulum in vacuum
being an approximate parable.) It is true that the particular event has probability zero because

(1/2) · (1/2) · (1/2) ··· = 0.
But that does not mean it cannot happen. Moreover if a = 0 but b = 10, say, then there are many many ways in which the random walk can persist among the nine integers 1, 2, ..., 8, 9, without ever hitting 0 or 10. How do you show that all those diverse events have a total probability zero?¹ I have made this "digression" into the distinction between logical impossibility and "zero probability" or "almost impossibility" for two reasons. One is the reminder that the Second Law of Thermodynamics is only an "almost sure" theorem, a rare physical law.² The other is a personal experience of conversation with an expert in electrical engineering on the "transience" of a ball (or any compact set in R³) for the Brownian motion process in space R³, or the discrete Bernoullian random walk in the 3-dimensional lattice, demonstrated by Polya in 1921. His incredulous tone still rings in my ear: "You mean there is not even a tiny tiny probability that the trajectory will always return to that ball?" So let us prove that the random time T is defined, "almost surely", namely:

P{T exists} = P{X(t) = a or b for some t < ∞} = 1.  (1.13)
This result can be proved in a much larger context by a general method, to be discussed later, but it may be worthwhile to indicate here a special inductive argument which may be more convincing to an applied scientist. If b − a = 2, so that a + 1 = b − 1 is the only position strictly between a and b, then starting from there, in one step whether to the right or left, b or a is reached. This is of course a trivial case but permits us to begin the mathematical induction. Suppose then the result is true when b − a = n; we
¹ The total number of possible paths that never leave the interval [0, 10] is not only infinite but uncountably so, having the cardinal number 2^α, where α is aleph-zero, hence that of the set [0, 10] under the Continuum Hypothesis. Erwin Schrodinger, the preeminent physicist who brought probability into quantum physics, would be interested in this "null set possessing full cardinal power". He discussed at length the Cantor ternary set in a lecture reprinted in [11].

² In 1937 or 1938, in Changsha or Kunming, I heard in a class on Heat the lecture by Professor Yeh Chieh-Sun giving a deduction of the second law of thermodynamics from the impossibility of perpetual motion, which made a strong impression on me. In drafting the present notes I consulted a number of experts on this topic: to my disillusionment none could cite a "state-of-the-art" version of the Law as an almost sure theorem.
shall prove that it is also true when b − a = n + 1. Namely: if one starts at an integer strictly between a and b, the probability is zero that one remains in [a + 1, b − 1] forever. By the induction hypothesis, one will reach either a or b − 1, almost surely, because (b − 1) − a = n. Hence in order to remain in [a + 1, b − 1], only b − 1 may be reached. Once at b − 1, the next step must be to the left, namely −1, to avoid hitting b. Thus one must go to b − 2, and then the random walk continues. By the induction hypothesis, sooner or later one will be at b − 1 again, and then the next step must be −1 again, and so on and so forth, indefinitely, ad infinitum (the Latin sounds more emphatic!): thus the event of remaining forever in [a + 1, b − 1] necessitates, under our induction hypothesis, repeatedly being at b − 1 followed by a step to the left, and this repetition must go on without end. Each obligatory step-left has probability 1/2, and these successive steps are independent (why?), hence the compound probability of its unending recurrence is given by the same evaluation as above: zero. I must warn the reader that in the (usually convincing) argument given above I have used an infinite number of random times, viz. the successive times the random walk finds itself at b − 1. Moreover I have used the independence of the corresponding infinite number of left-steps at random times. As I have alerted by "why?" there, this is not at all an obvious consequence of the independence of the original steps denoted by y_n that occur at non-random times n = 1, 2, 3, ... We shall return to the question later. As Nature abhors a vacuum, mathematical symbolism abhors an undefined case. Therefore it is customary to define the min (or inf) of an empty set to be +∞. Under this convention the T in (1.1) will be defined to be +∞ when the min does not exist, namely when X(t) is never equal to a or b for any integer t ≥ 0.
This makes good common sense since "Never" signifies "Wait until eternity!" But note that when T = +∞, X(T) is undefined. It is also customary to extend the summation in (1.4) to include a final term with t = +∞, and define 0 · (+∞) = 0, while c · (+∞) = +∞ for any number c > 0. Then of course

E(T) < +∞  only if  P{T < +∞} = 1.  (1.14)
Don't forget that our previous derivation of (1.12) made fundamental use of the second assertion in (1.14) in the form of (1.10): one must not put the cart before the horse.
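The Prologue remarks after (1.12) that a small computer can check the numerical accuracy of the answers. A minimal Monte Carlo sketch in Python (ours, not the book's; the parameter choices a = 0, b = 10, y_0 = 3 are arbitrary) estimates p_a and E(T) for comparison with (1.11) and (1.12):

```python
import random

def ruin_trial(y0, a, b, rng):
    """Run one Bernoullian walk from y0 until it first hits a or b.
    Returns (endpoint reached, number of steps taken)."""
    x, t = y0, 0
    while a < x < b:
        x += rng.choice((-1, 1))
        t += 1
    return x, t

rng = random.Random(12345)
a, b, y0, n = 0, 10, 3, 20000
hits_a = 0
total_t = 0
for _ in range(n):
    x, t = ruin_trial(y0, a, b, rng)
    hits_a += (x == a)
    total_t += t

p_a_hat = hits_a / n        # theory (1.11): (b - y0)/(b - a) = 0.7
mean_t_hat = total_t / n    # theory (1.12): (y0 - a)*(b - y0) = 21
```

With these parameters the estimates should land within a few percent of the theoretical values 0.7 and 21.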
We now examine the legitimacy of substituting the random T for the constant t in (1.5) and (1.6) to get (1.7) and (1.8). Indeed this is my primary purpose in expounding the devious approach to the gambler's problem. To dispel any unwarranted optimism, we will give at once a trivial example to show that such a substitution is absolutely forbidden when T is "arbitrarily random"! A good physicist is supposed to rely on intuition to discern the boundaries between the heuristically feasible and the palpably fallacious. In the present case this means the guessing of some condition for the validity of the substitution. It will be left open here to give the reader time to think.

Example 1. Define T to be 1 if y_1 and y_2 are both +1, and to be 2 in all other cases. Fix y_0 = 0. Then X(T) = 1 if y_1 = 1, y_2 = 1; X(T) = 0 if y_1 = 1, y_2 = −1; X(T) = 0 if y_1 = −1, y_2 = 1; X(T) = −2 if y_1 = −1, y_2 = −1. Each of the four enumerated events has probability 1/4, hence E(X(T)) = −1/4 whereas E(X(0)) = 0. (1.7) is false. Now let us change the definition of T a little: define T to be 1 when y_1 = y_2 and to be 2 otherwise. Then E(X(T)) = 0 = E(X(0)) so that (1.7) is true. This goes to show that we can play all sorts of games and get what we want, no? But one can raise a serious objection to these examples as being concocted without rhyme or reason: we do not encounter such wild random times in the real world of physics or, say, genetics! True, so let us consider a "natural" scheme.

Example 2. Suppose, as a natural extension of the T in (1.1), we omit the stop b in the random walk so that it ranges in [a, ∞) with the only stop at a. In the lingo of gambling, you are playing against an infinitely rich opponent who cannot be ruined. Thus it is only your own ruin that is in question, and the relevant random time is now

T_a = min{t ≥ 0 : X(t) = a}.  (1.15)
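Before continuing with Example 2, note that Example 1 involves only the four equally likely sign patterns of (y_1, y_2), so its two claims can be checked by brute enumeration. A small sketch (ours; exact rational arithmetic so the means come out exactly):

```python
from fractions import Fraction
from itertools import product

def mean_stopped(stop_rule):
    """E(X(T)) for the walk started at y0 = 0, averaged over the four
    patterns of (y1, y2); stop_rule(y1, y2) returns T (either 1 or 2)."""
    total = Fraction(0)
    for y1, y2 in product((1, -1), repeat=2):
        T = stop_rule(y1, y2)
        x_at = {1: y1, 2: y1 + y2}   # X(1) and X(2) along this path
        total += Fraction(1, 4) * x_at[T]
    return total

# First definition: T = 1 iff y1 = y2 = +1; E(X(T)) = -1/4, so (1.7) fails.
first = mean_stopped(lambda y1, y2: 1 if (y1 == 1 and y2 == 1) else 2)
# Modified definition: T = 1 iff y1 = y2; now E(X(T)) = 0 = E(X(0)).
second = mean_stopped(lambda y1, y2: 1 if y1 == y2 else 2)
```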
Let us denote the T in (1.1) by T_{a,b}. As b → ∞, T_{a,b} increases to T_a, hence

lim_{b→∞} E_x(T_{a,b}) = E_x(T_a).  (1.16)
Hence by (1.12) we have

E_x(T_a) = ∞,  x ∈ (a, ∞).  (1.17)
Easy? But it is an astounding result! Take x = a + 1 to let this sink in: the expected time to make that one little step from a + 1 to a is infinite! I wonder if any good practicing scientist can surmise such a miracle without the trivial calculation. Post facto, one can argue that were it not so, then in a fair coin-tossing game one could always win as many dollars as one desires simply by "sticking it out" long enough. Unfortunately (1.17) does not imply

P_x{T_a < ∞} = 1.  (1.18)

To show this we need to go through another limiting process, using (1.11) as follows:

P_x{T_a < ∞} ≥ P_x{T_a < T_b} = (b − x)/(b − a),  (1.19)
and letting b → ∞. Thus (1.18) is true. I have often wondered if this result, more profound than the previous (1.13), could also be "physically" obvious. Now let us consider X(T_a). Of course it is equal to a, but let's not forget that we need (1.18) desperately to clinch this "obvious" evaluation. Well then we have, if x ≠ a:

E_x{X(T_a)} = a ≠ x = E_x{X(0)}.  (1.20)
Thus (1.7) is manifestly false when T = T_a. Is there any reason to presume its truth when T = T_{a,b}? In the next section we shall prove this by two different methods, but first a historical compendium. The ancient greats were usually sparing in introducing symbolism and preferred verbal descriptions. So let us dispense with T and X(T) and put

u_x = P_x{there exists integer t ≥ 0 such that X(t) = a but X(s) ≠ b for all integer s in [0, t)}.  (1.21)
With the convention for T allowing it to be +∞ it turns out that we have unambiguously:

u_x = P_x{T_a < T_b}.  (1.22)
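The pair of statements (1.17) and (1.18), that T_a is almost surely finite yet has infinite mean, can be made tangible numerically. The sketch below (our illustration, with a = 0 and the walk started at x = a + 1 = 1) evolves the exact one-step distribution and records P{T_a ≤ t} together with the truncated mean E{T_a ∧ t}:

```python
def absorbed_and_truncated_mean(t_max):
    """Walk from 1, absorbed at 0: returns {t: (P(T <= t), E(T ∧ t))}
    for selected horizons t, by exact one-step evolution."""
    dist = {1: 1.0}      # probability mass on unabsorbed positions
    alive = 1.0          # P(T > current time)
    trunc_mean = 0.0     # running sum of P(T > s), s = 0, 1, 2, ...
    out = {}
    for s in range(1, t_max + 1):
        trunc_mean += alive          # adds P(T > s - 1)
        new = {}
        for pos, p in dist.items():
            for nxt in (pos - 1, pos + 1):
                if nxt == 0:
                    alive -= p / 2   # absorbed at this step
                else:
                    new[nxt] = new.get(nxt, 0.0) + p / 2
        dist = new
        if s in (100, 1000):
            out[s] = (1.0 - alive, trunc_mean)
    return out

stats = absorbed_and_truncated_mean(1000)
```

P{T_a ≤ t} creeps toward 1 (about 0.97 by t = 1000), while E{T_a ∧ t} keeps growing like √t instead of settling down, exactly as (1.17) and (1.18) predict.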
Note that when T_b = +∞, "T_a < T_b" means "T_a < +∞": a lucky fluke. It is easy to show that u_x satisfies the equation

u_x = (1/2)u_{x−1} + (1/2)u_{x+1},  a < x < b,

with the boundary condition u_a = 1, u_b = 0. The unique solution is given by

u_x = (b − x)/(b − a).  (1.23)

Interchanging a and b in (1.22) and denoting the resulting probability by v_x, we have of course

v_x = (x − a)/(b − a),

and therefore by addition

u_x + v_x = 1,  a < x < b.

Recalling our remark after (1.22), the last equation is nothing but our previous (1.13), now proved by a two-pronged calculation. Old analysis is powerful, as the late Solomon Bochner used to say in his class. As a matter of fact, Henri Poincaré ("the last universalist") took note of the above derivation after observing that a priori, the eventual ruin of one of the two gamblers is by no means certain; see his Calcul des probabilités, 1912 (p. 73).³ Thus Question 1 for the gambler's ruin problem is solved, and the numerical result shows that the duration time T is finite with probability one. Turning to Question 2 and putting

w_x = E_x{T_{a,b}},

we can show that it satisfies the equations:

w_x = (1/2)w_{x−1} + (1/2)w_{x+1} + 1,  a < x < b.
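The two difference equations above can be checked directly: in exact rational arithmetic, the closed forms u_x = (b − x)/(b − a) of (1.23) and w_x = (x − a)(b − x) of (1.12) satisfy them together with the boundary conditions (for w we assume the natural boundary values w_a = w_b = 0). A sketch with integer endpoints a = 0, b = 10 (ours, not from the text):

```python
from fractions import Fraction

a, b = 0, 10

def u(x):
    # Probability of reaching a before b from x: the solution (1.23).
    return Fraction(b - x, b - a)

def w(x):
    # Expected duration E_x{T_{a,b}}: the solution (1.12).
    return Fraction((x - a) * (b - x))

# Boundary conditions.
assert u(a) == 1 and u(b) == 0
assert w(a) == 0 and w(b) == 0

# Interior difference equations, checked at every interior point.
for x in range(a + 1, b):
    assert u(x) == Fraction(1, 2) * u(x - 1) + Fraction(1, 2) * u(x + 1)
    assert w(x) == Fraction(1, 2) * w(x - 1) + Fraction(1, 2) * w(x + 1) + 1
ok = True
```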
Thus we obtain (2.10) from (2.11). So the finiteness of T_{a,b} is still needed in this new proof, an unexpected twist that should please Poincaré. Now let us apply a similar argument to T_a, a > 0, and let us start at 0. It does not matter whether we choose a to be 1 or 10^{6·10^7}.⁷ We have seen above that T_a is almost surely finite but has infinite mean. As in (2.11) we obtain from Theorem 2.1:

E°{X(T_a ∧ t)} = 0.  (2.12)
This disingenuous equation has a deep significance in the gambling context. Recall that the coin-tossing, namely Bernoullian, game is known as "fair" because E{y_s} = 0 for all s. On the other hand, the finiteness of T_a means that the gambler is sure (almost!) to win the sum a sooner or later provided he plays long enough and has unlimited credit. For a large value a this does not sound real despite the caveats. Equation (2.12) offers a "reality check": take a = 1 and carry out the computation of the exact distribution of X(T_1 ∧ t) for t = 1, 2, 3, 4, 5, 6, 7, say, and you will begin to see a tangible re-vindication of the "fairness" of the game. Do it.⁸ Before leaving this fascinating theme, let me exhibit here the exact

⁷ Except for the factor 6 the reciprocal of this number is found on p. 343 of [9], where the Second Law is the topic. The undergraduate mathematics, physics and engineering students should be reminded that 10^{10} is to infinity as zero to one. So far as an event of zero probability is concerned, the sound and fury signifies nothing. According to Todhunter, D'Alembert advised to equate a probability 10^{−4} (one in ten thousand) with 0. Borel elaborated on this by three descending scales: 10^{−6}, 10^{−15}, 10^{−50} [2, pp. 6, 7]. This footnote belongs perhaps more appropriately to my earlier mention of the Second Law of Thermodynamics in footnote 2.

⁸ Dr. James A. Given was kind enough to do my assignment on computer up to t = 10; the exact distributions for t = 8, 9, 10 are recorded below, where each possible value is followed by its probability:

X(T_1 ∧ 8): −8, 1/256; −6, 7/256; −4, 20/256; −2, 28/256; 0, 14/256; 1, 186/256.
X(T_1 ∧ 9): −9, 1/512; −7, 8/512; −5, 27/512; −3, 48/512; −1, 42/512; 1, 386/512.
X(T_1 ∧ 10): −10, 1/1024; −8, 9/1024; −6, 35/1024; −4, 75/1024; −2, 90/1024; 0, 42/1024; 1, 772/1024.
distribution of T_1 (and of T_0) [5, p. 273]:

P°{T_1 = 2n − 1} = P°{T_0 = 2n} = (1/(2n − 1)) C(2n, n) 2^{−2n},  n ∈ N.  (2.13)
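The table in footnote 8 can be reproduced exactly by evolving the walk one step at a time with absorption at 1, in exact rational arithmetic; this is presumably the kind of computation Dr. Given carried out, though the code below is only our sketch. It also confirms the "fairness" identity (2.12), E{X(T_1 ∧ t)} = 0:

```python
from fractions import Fraction

def stopped_walk_dist(t):
    """Exact distribution of X(T_1 ∧ t): walk from 0, absorbed at 1."""
    dist = {0: Fraction(1)}   # mass on unabsorbed positions (all <= 0)
    absorbed = Fraction(0)    # mass already stopped at 1
    for _ in range(t):
        new = {}
        for pos, p in dist.items():
            for nxt in (pos - 1, pos + 1):
                if nxt == 1:
                    absorbed += p / 2
                else:
                    new[nxt] = new.get(nxt, Fraction(0)) + p / 2
        dist = new
    dist[1] = absorbed
    return dist

d8 = stopped_walk_dist(8)
mean8 = sum(pos * p for pos, p in d8.items())
```

d8 reproduces the t = 8 line of the footnote, for instance probability 186/256 at 1 and 1/256 at −8, and mean8 comes out exactly 0.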
From this we can prove again that E°{T_1} = ∞, a truly incredible result. Is there any intuitive way to see it? The result can be generalized to the case of Theorem 2.1, with M = 0. For any real number a > 0 put

T = min{n ≥ 1 : X(n) > a}.  (2.14)

If E{T} were finite then Theorem 2.1 would imply that E{X(T)} = 0, which is absurd because X(T) > a > 0 by definition. Note carefully that we need T finite in the argument, which is implied by the premise of the reductio ad absurdum. For an "incredibly" neat extension of (2.13) and a centuries-old unsolved problem, see my article "Sul problema del ritorno all'equilibrio", Rend. Mat. Acc. Lincei s. 9, v. 10, 213–218 (1999).
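Formula (2.13) can likewise be confronted with direct enumeration. The sketch below (ours) computes P°{T_1 = 2n − 1} both from (2.13) and from the step-by-step absorption probabilities, and shows how slowly the mass of T_1 accumulates toward 1, the same heavy tail that makes E°{T_1} = ∞:

```python
from fractions import Fraction
from math import comb

def p_first_passage(n):
    """(2.13): P°{T_1 = 2n - 1} = C(2n, n) / ((2n - 1) * 4**n)."""
    return Fraction(comb(2 * n, n), (2 * n - 1) * 4 ** n)

def absorption_probs(t_max):
    """P°{T_1 = s} for s = 1..t_max, by exact one-step evolution."""
    dist = {0: Fraction(1)}
    probs = {}
    for s in range(1, t_max + 1):
        new, hit = {}, Fraction(0)
        for pos, p in dist.items():
            for nxt in (pos - 1, pos + 1):
                if nxt == 1:
                    hit += p / 2
                else:
                    new[nxt] = new.get(nxt, Fraction(0)) + p / 2
        dist = new
        probs[s] = hit
    return probs

probs = absorption_probs(11)
agree = all(probs[2 * n - 1] == p_first_passage(n) for n in range(1, 7))
# P°{T_1 <= 999}: still noticeably short of 1 after a thousand steps.
mass = float(sum(p_first_passage(n) for n in range(1, 501)))
```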
3. Martingale Stopped
For an arbitrary sequence of random variables {y_s} with finite means, we have in the notation of (2.1):

E{X_{t+1} | F_t} = E{X_t + y_{t+1} | F_t} = X_t + E{y_{t+1} | F_t}.  (3.1)

Hence if for each t

E{y_{t+1} | F_t} = 0,  (3.2)

then we have

E{X_{t+1} | F_t} = X_t,  t ∈ N.  (3.3)
When {y_s} is IID with common mean zero, this is of course the case. On second thought the independence may be dropped from the assumption. Now let us forget all about the y_s and simply define a process under the sole assumption (3.3), with the proviso that E{|X_t|} < ∞ for all t. Such a process is called a martingale; more completely denoted by {X_t, F_t}, where the tribes {F_t} may be an arbitrary given increasing sequence of subtribes of F_∞, subject to the condition that X_t be F_t-measurable for each t.
The equation (3.3) has a meaning in gambling, when X(t) represents the "current balance" of the gambler. It says that his expected new balance after one more game will be the same as his current balance. It is unlikely that an avid investor would be interested in such an investment, but the mathematical theory of martingale turns out to be quite rich; indeed it has been fashionably applied in stock markets. Let us begin by simplifying the notation for conditional expectation. For any random variable Z with E{|Z|} < ∞ and any subtribe G of F, we define a random variable denoted by [Z/G] with the following property:

[Z/G] ∈ G;  and for any Λ ∈ G:  E{Λ; [Z/G]} = E{Λ; Z}.  (3.4)

Such a random variable exists and is unique almost surely, namely up to a P-null set. See e.g. [3, Sec. 9.1] for discussion. An important property of conditioning is the following:

[[Z/G]/H] = [Z/G ∧ H]  (3.4*)

for two subtribes of F one of which is contained (included) in the other, with the smaller one denoted by G ∧ H. (The result is false if G and H are any two subtribes and G ∧ H is their common part!) When this successive conditioning is applied to F_t and F_s, we obtain the result that the defining relation for a martingale given in (3.3) implies the propagated relation in which the X_{t+1} there is enhanced to X_{t+n} for any n ≥ 1. In the gambling interpretation this seems "obvious", but it is a dead-pan mathematical deduction that is worth an exercise. Referring back to Theorem 2.1 above, where {X(t)} is a martingale when M = 0, as may be supposed without loss of generality, the result then says that for any optional random time T with finite mean, we have E{X(T)} = E{X(t)} = 0 for all t. We shall extend this result to a martingale. Consider first a special kind of optional time, already introduced in Section 2, in fact by the practical consideration that one must stop some time and not go on forever. Let T be optional (with respect to {F_t}), taking values in N but possibly ∞ as per convention; for any t in N let min(T, t) be denoted by T ∧ t; let X(t) be real-valued (not necessarily integer as in some previous examples). X(∞) may not be defined but X(T ∧ t) is. We prove that for any Λ in F_t, we have

E{Λ; X(T ∧ (t + 1))} = E{Λ; X(T ∧ t)}.  (3.5)
In fact, the left member is equal to

E{Λ; T ≥ t + 1; X(t + 1)} + E{Λ; T ≤ t; X(T)}.  (3.6)

Since Λ and {T ≥ t + 1} both belong to F_t, by the definition of [X(t + 1)/F_t] the first term above is equal to

E{Λ; T ≥ t + 1; [X(t + 1)/F_t]},

which is equal to

E{Λ; T ≥ t + 1; X(t)}  (3.7)

by (3.3) in the new notation. Substituting (3.7) for the first term in (3.6) we see that the sum is then equal to the right member of (3.5), senz'altro. Q.E.D.

As a particular case of (3.5), we see that E{X(T ∧ t)} does not change value with t, hence for all t it is equal to E{X(T ∧ 1)} = E{X(1)}. We record this below:

E{X(T ∧ t)} = E{X(1)},  t ∈ N.  (3.8)
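Theorem 3.1 and the identity (3.8) can be tested on a martingale that is not a Bernoullian walk, for instance independent steps of varying spread with common mean zero (the kind of example discussed later in this section). The sketch below (ours; the step sizes are an arbitrary choice) enumerates all sign patterns exactly and checks E{X(T ∧ t)} = E{X(1)} = 0 for an optional T:

```python
from fractions import Fraction
from itertools import product

# Step s is +SIZES[s-1] or -SIZES[s-1] with probability 1/2 each:
# independent, mean zero, but not identically distributed.
SIZES = (1, 2, 1, 3, 2, 1)

def stopped_mean(horizon):
    """E{X(T ∧ horizon)} for the optional time T = min{t >= 1 : X(t) >= 2},
    computed exactly by enumerating all equally likely sign patterns."""
    n = len(SIZES)
    total = Fraction(0)
    for signs in product((1, -1), repeat=n):
        x = 0
        for t in range(1, horizon + 1):
            x += signs[t - 1] * SIZES[t - 1]
            if x >= 2:      # T has arrived: the walk is stopped here
                break
        total += Fraction(x, 2 ** n)
    return total

means = [stopped_mean(h) for h in range(1, len(SIZES) + 1)]
```

Every truncated horizon gives mean exactly 0, the "fairness" that (3.8) asserts; a clairvoyant rule in the spirit of Example 1 of the Prologue would destroy it.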
Of course it follows from (3.3) that E{X(t)} = E{X(1)} for all t, but this result is included in (3.8) when T is identically ∞. Actually we have proved the following theorem due to Doob [4]:

Theorem 3.1. If {X_t, F_t} is a martingale, so is {X(T ∧ t), F_t} for any optional T with respect to {F_t}.

Corollary. If E{X(T)} is finite and

lim_{t→∞} E{T > t; |X(t)|} = 0,  (3.9)

then

E{X(T)} = E{X(1)};  (3.10)

in fact we have [X(T)/F_1] = X(1).

To see this let Λ ∈ F_1; then

|E{Λ; X(T) − X(T ∧ t)}| ≤ E{Λ; T > t; |X(T) − X(t)|}
                        ≤ E{T > t; |X(T)|} + E{T > t; |X(t)|}.
As t → ∞ both last terms go to 0 by the hypotheses. Hence by Theorem 3.1,

E{Λ; X(T)} = E{Λ; X(1)}.
It follows that (3.10) is true for a bounded martingale and any finite optional T; also for any martingale and any bounded optional T. "Incredibly" there is a converse to the second assertion that I learned belatedly from Maria Rosaria Simonelli, who cited Giorgio Letta. In its simplest form, where "bounded" is cut down to size, it reads as follows: Let X(1) = 0 and let E{|X(t)|} < ∞ for all t ∈ N. Then {X(t), t ∈ N} is a martingale if E{X(T)} = 0 for any optional time T which takes only one or two values. To see this, take first T = t to get E{X(t)} = 0 for all t; then for an arbitrary Λ in F_t take T = 1_Λ t + 1_{Λ^c}(t + 1) to deduce the defining equation (3.3) for a martingale. Thus, resorting to preferred old verbiage, a game is a martingale if it is fair not only at fixed times but also at non-clairvoyant dichotomic options! As an example of the applicability of Theorem 3.1 where Theorem 2.1 does not apply, consider a sequence of independent random variables {y_s} with varying distributions but common mean zero. Then {X_t, F_t} is a martingale with the usual F_t and so (3.8) is true for any optional T. Take T = T_{a,b} where a < 0 < b. Then X(T) is a random position outside (a, b) but not otherwise known. Now suppose the "steps" are bounded, i.e. |y_s| ≤ B < ∞. Then |X(T)| must lie between a − B and b + B. In this situation the bounded convergence theorem yields (2.10) as before. Unlike in the old gambler's case we have no idea of the exact credit or debit of the endgame; and yet the game remains fair when it ends. Up to now we have treated our original Question 1, namely the "linear" equation embodied in (1.7). It is time to treat Question 2, that of the expectation of the random time, shown in (1.8) as related to the square (quadratic) of the random place. Thus, time does have to do with energy or mass (e = mc²) as in physics! Where does that come from?
Return to the generalized random walk {y_s}, IID with mean zero, but now add the "scale" condition that E{y_1^2} = 1, where the unity is of course just a convenient number; unlike physicists we do not encumber our formulas with all sorts of weird universal "constants". The Bernoullian random walk, where each step taken is exactly one unit, fits the scale perfectly. In general
this scaling leads to the equation (1.6) (with y_0 = 0):

E{X^2(t)} = t;   (3.11)

a momentous identification of space and time! 9 To proceed, let us calculate

X^2(t + 1) = (X(t) + y_{t+1})^2 = X^2(t) + 2X(t) y_{t+1} + y_{t+1}^2,

and so [X^2(t + 1)/F_t] = X^2(t) + 1 by a simple use of independence. This is further evidence that X^2(t) progresses in concordance with t, and leads to the discovery that

{X^2(t) − t, F_t} is a martingale.   (3.12)
Hence by Doob's theorem, for any optional T we have

E{X^2(T ∧ t) − (T ∧ t)} = E{X^2(T ∧ 1) − (T ∧ 1)} = E{X^2(1) − 1} = E{y_1^2 − 1} = 0,   (3.13)

an extension of (3.11) when the non-random t is replaced by the random T ∧ t. Unfortunately, in general we cannot replace T ∧ t by T, because condition (3.9) may not be satisfied. It is not, even in the case of the Bernoullian random walk. Nonetheless (3.13) suffices for the proof of (1.8) when T = T_{a,b}, by the same argument using bounded convergence. Indeed, monotone convergence is also needed to obtain

lim_{t→∞} E{T ∧ t} = E{T},
a happy cooperation of the two pillars of integration theory.

9 One could credit Dr. Einstein with this pre-relativistic confounding of space and time, which he wrote as

λ_x = √(2Dt),

where we may take D = 1/2. See his paper (1905) "On the movement of small particles suspended in a stationary liquid demanded by the molecular-kinetic theory of heat"; English translation in Investigations on the theory of the Brownian movement (Dover Publications 1956; p. 17). In more modern times this became dB_t = √dt, where B_t is the Brownian movement and dB_t is popularly and ambiguously known as "white noise".
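The identity E{X^2(T)} = E{T} behind (3.13) can be checked numerically for the Bernoullian walk stopped at T = T_{a,b} (a sketch with my own choice of a = −3, b = 4, for which E{T} = |a|·b = 12; not an example from the text):

```python
# Illustration (assumptions mine): for the Bernoullian walk from 0 stopped
# on exit from (a, b), both E{T} and E{X^2(T)} should be close to |a|*b.
import random

random.seed(2)

def exit_time_and_place(a=-3, b=4):
    """Walk until leaving (a, b); return (T, X(T))."""
    x, t = 0, 0
    while a < x < b:
        x += random.choice([-1, 1])
        t += 1
    return t, x

n = 100_000
times, places = zip(*(exit_time_and_place() for _ in range(n)))
print(round(sum(times) / n, 1))                  # ≈ |a|*b = 12
print(round(sum(x * x for x in places) / n, 1))  # ≈ the same value
```

The agreement of the two printed numbers is precisely the point of replacing the non-random t by the random time T ∧ t and passing to the limit.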
Finally we have answered both questions in the Prologue. To recapitulate: apply (1.8) to two martingales, {X(t)} and {X^2(t) − t}, and use Lebesgue's theorems of integration. Let us note that we have not used the full strength of Theorem 3.1, only its simplest consequence. The curious reader may wonder if there are other martingales, say cubic and quartic ones, that can be concocted from X(t). Yes indeed, there is even an exponential one containing all powers, but we would rather turn the clock back two or three centuries and treat some old schemes.

Example 3. A fair coin is tossed repeatedly. The gambler wins on heads and loses on tails, but the stakes vary from one game to another. His net gain-or-loss after t games is then

X(t) = Σ_{s=1}^{t} b_s y_s,   t ∈ N,   (3.14)
where {b_s} is a sequence of nonnegative real numbers, {y_s} is IID and each y_s takes the values +1 and −1 with probability 1/2. Now suppose the game is stopped as soon as head appears, namely at the random time

T = min{t ∈ N : y_t = 1}.   (3.15)

Then {X(t)} is a martingale and T is optional. Question: is

E{X(T)} = 0 = E{X(t)}?   (3.16)
In this example we can calculate explicitly. If the game stops after the nth toss, then the actual balance of the gambler is b_n − Σ_{s=1}^{n−1} b_s; since P{T = n} = 2^{−n} we have

E{X(T)} = Σ_{n=1}^{∞} 2^{−n} (b_n − Σ_{s=1}^{n−1} b_s).   (3.17)

The double series with terms b_s may be inverted to be

Σ_{s=1}^{∞} b_s Σ_{n=s+1}^{∞} 2^{−n} = Σ_{s=1}^{∞} b_s 2^{−s}.   (3.18)
Lo and behold, the sum in (3.17) cancels out to be 0. Let us try a particularly famous case where b_s = 2^{s−1}, known as the gambit of "doubling the stake". A funny thing happens on the way to nullity when we perceive that with this gambit the gambler wins, at the first head, one more than his accumulated losses:

2^n = 1 + 2 + 4 + 8 + ⋯ + 2^{n−1} + 1,
including the first game if he wins: 1 = 0 + 1. In other words, no matter when the game stops he always wins one. Not zero! So what went wrong with our computation above? There is no problem with the re-summation of the double series, because all terms are positive: the trouble is that the result in (3.18) may be ∞, and then we get ∞ − ∞ in (3.17). Our labor is not totally in vain, though. Now we see that if the simple series on the right side of (3.18) converges, then (3.16) is true. This condition requires that b_s = o(2^s) as s → ∞. With a bit of strengthening we can deduce the result from the Corollary to Theorem 3.1, as follows. On the event {T > t} the first t tosses are all tails, so

E{T > t; |X(t)|} = P{T > t} Σ_{s=1}^{t} b_s = 2^{−t} Σ_{s=1}^{t} b_s.
Hence it is sufficient that the last member above goes to 0 as t → ∞.

Example 4. The direct computation in Example 3 is so easy that no "fancy theory" is needed. So let us consider a slightly more advanced stopping time in the same coin-tossing game. Put T^{(1)} = T and define recursively:

T^{(k)} = min{t > T^{(k−1)} : y_t = 1},   k = 2, 3, …,   (3.19)

namely the kth head. Elementary probability theory gives the exact distribution:

P{T^{(k)} = t} = (t−1 choose k−1) 2^{−t},   t = k, k + 1, ….   (3.20)

Then E{X(T^{(k)})} = 0, provided that

lim_{t→∞} P{T^{(k)} > t} Σ_{s=1}^{t} b_s = 0.
This follows from the Corollary in the same way as the special case where k = 1 just shown. Note that if the game is stopped at T^{(k)} the gambler has won k times, at the random times T^{(1)}, …, T^{(k)}, and lost at the other times, so that an explicit expression for E{X(T^{(k)})} would be very cumbersome if not "impossible". Thus we see that a general abstract argument does have its advantage.

The name "martingale" was introduced in probability theory by Jean Ville [5, p. 99] as a synonym of "jeux équitable" or "fair game". However,
the Universal Oxford Dictionary defines it as follows: "A system of gambling which consists in doubling the stake when losing in order to recoup oneself" (1815). Let us hear Thackeray, author of Vanity Fair:

You have not played yet? Do not do so; above all avoid a martingale if you do.

The mathematization of this game is given in Example 3, viz. (3.14) with b_s = 2^{s−1}. The gambler has a net gain of 1 no matter when the game stops. Hence in order to make the game "fair" to his opponent he should pay the ante 1 before the game begins. Although he can recover this at the random time T, he may suffer losses totaling 2^{T−1} − 1, which may exceed his ability to pay. Thus he risks ruin for a puny gain; no doubt this is what Thackeray had in mind in his exhortation.

Historically there is a non-martingale version of doubling the stake, known as the St. Petersburg game, dating at least as early as 1730 when Cramer proposed it to Buffon at Geneva. See Todhunter [12], where there are ten references to the "problem" in the index. In this game, Pierre simply bets on the value of the random time T in the coin-tossing in Example 3. He gets 2^{T−1} from Paul whatever T be. (The "−1" in the exponent is a nuisance but preserved here for old time's sake; change it to 2^T if you wish, as Borel did.) It is trivial to compute his expectation:

E{2^{T−1}} = Σ_{t=1}^{∞} P{T = t} 2^{t−1} = Σ_{t=1}^{∞} 2^{−t} · 2^{t−1} = Σ_{t=1}^{∞} 2^{−1} = ∞.
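Both features of the gambit can be checked directly (a quick illustration of my own, not from the text): the doubling gambler nets exactly 1 whatever T turns out to be, while the St. Petersburg expectation, truncated at T ≤ N, equals N/2 and so grows without bound.

```python
# Illustrative checks (assumptions mine). First, the doubling gambit
# b_s = 2^(s-1): the net gain is exactly 1 no matter when the game stops.
# Second, the truncated St. Petersburg expectation grows like N/2.
import random

random.seed(3)

def play_doubling():
    """Toss until the first head, doubling the stake after each loss."""
    gain, stake = 0, 1
    while random.random() < 0.5:  # tail: lose the stake and double it
        gain -= stake
        stake *= 2
    return gain + stake           # head: win the current stake

outcomes = {play_doubling() for _ in range(10_000)}
print(outcomes)  # {1}: the gambler always nets one, never zero

def truncated_expectation(N):
    """sum_{t=1}^{N} P{T = t} * 2^(t-1), with P{T = t} = 2^(-t)."""
    return sum(2.0 ** -t * 2.0 ** (t - 1) for t in range(1, N + 1))

print([truncated_expectation(N) for N in (10, 100, 1000)])  # [5.0, 50.0, 500.0]
```

The set of outcomes collapsing to {1} is the "funny thing on the way to nullity"; the linear growth of the truncated sum is the divergence just computed, one term of 1/2 at a time.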
It would seem that this old process is so simple that there is absolutely no need to consider its HM property. This may be one reason why it took a couple of centuries from James (Jakob, Jacques) Bernoulli, DeMoivre, …, Laplace to Markov. Nevertheless, Theorem 4.1 applied to this case yields a museum piece of a gambling system that should impact a mathematically minded gambler. It is said that Poincaré frequented roulette tables to observe the vicissitudes of fortune, but there is no record that he discovered the following result, due to Doob, 1936.

Corollary. Let {T(n), n ∈ N} be a sequence of strictly increasing optional times. Then the sequence of variables {y(T(n) + 1), n ∈ N} is an IID sequence with the same common distribution as y_1.

This follows from

[f(y(T + 1))/F_T] = f*(y(T)) = E{f(y(T + 1))} = E{f(y_1)}

for each T = T(n), and mathematical induction on n. In fact, Theorem 4.1 says more. It says that the post-T(n) process {X(T(n) + t), t ∈ N} is independent of the pre-T(n) history of the original process, and behaves probabilistically as a replica ("clone"!) of the latter:
in other words, the process starts "from scratch" 13 at T(n) + 1. As a sad consequence of the Corollary, the gambler who thinks that he can improve his chances by opting for lucky moments to place his bets is deluding himself. If you consider this rather trite, the encyclopedist D'Alembert (who wrote a treatise on dynamics and has a Principle named after him) thought it obvious that if head has a run of three then tail is more likely in the next throw. What makes this curiouser is that he also advocated experiments in coin-tossing, as Buffon did for the St. Petersburg game described in section 3. Now from the standpoint of experimental science, if head has a long run (3 may be insufficient!), the statistical inference known as "hypothesis testing" would tend to reject the a priori probability 1/2 for head, and in the next throw would rather predict head than tail. This, incidentally, is the view of the economist John Maynard Keynes, who began his career with a treatise on probability. 14 But such discourse is really outside the realm of pure mathematics.

The next example of a HM process is the successive partial sums of an IID process, namely the general random walk defined in (2.1). This is an infinitely richer theory and the preeminent domain of classical probability. We have already studied several aspects of it in the preceding sections. To verify its HM property it is expedient to deviate from the existential definition of conditioning reviewed in §3, and make use of an operational device when the conditioning is on a single random variable X: to write "X = x" when, literally speaking, this event may have zero probability so that conditioning on it is without meaning. Nevertheless it will serve as a clear and efficient guide to obtain the result, as will be illustrated now. Since X(t + 1) = X(t) + y_{t+1}, conditioned on X(t) = x we must have X(t + 1) = x + y_{t+1}. Since y_{t+1} is independent of the just said condition, we must have

[f(X(t + 1))/X(t) = x] = E{f(x + y_{t+1})} = E{f(x + y_1)} = ∫ f(x + y) D(dy),   (4.7)

where D is the distribution of y_1 and the integration is over the whole space. Thus the second equation in (4.4) holds with f*(x) equal to the last member in (4.7).

13 After a span of half a century, I can still hear William Feller articulating this newly acquired American colloquialism in his lectures: "From Scratch!"

14 I translated this book during the war years and gave the manuscript to Prof. Pao Lu Hsu in Kunming. It disappeared as a war casualty.
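The kernel formula (4.7) can be checked by Monte Carlo (a sketch with my own choices: the Bernoullian walk and f(x) = x^2, for which E{f(x + y_1)} = ((x−1)^2 + (x+1)^2)/2 = x^2 + 1 for every x, in agreement with the martingale computation of §3):

```python
# Sketch (assumptions mine, not the text's): estimating the conditional
# expectation [f(X(t+1))/X(t) = x] of (4.7) for f(x) = x^2.
import random

random.seed(4)

def conditional_mean(x, trials=200_000):
    """Estimate E{f(x + y_1)} with f(u) = u^2 and y_1 = +-1 fair."""
    return sum((x + random.choice([-1, 1])) ** 2 for _ in range(trials)) / trials

for x in (0, 3, -5):
    print(x, round(conditional_mean(x), 1))  # ≈ x^2 + 1, whatever x is
```

The answer depends on the condition only through the value x, which is the operational content of the HM property being verified.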
We have yet to tackle the first member in (4.4). Although F_t contains X(t), the iterated conditioning property (3.4*) is inutile, as it goes the wrong way. The independence of y_{t+1} and F_t must help, since we know conditioning on independent stuff means no condition. Unfortunately these bright thoughts do not make a logical deduction, and the only rigorous way to verify that conditioning X(t + 1) on F_t turns out to be the same as conditioning on X(t) alone is to face down the uninspiring formal definition of conditional expectation reviewed in §3. See [3, p. 309] if you can't do it as an exercise. Once the HM character of the random walk is established so that its strong Markov property is available, "incredibly" rich and varied developments follow; see, e.g., [3, Chapter 8]. Here we must be content with a simple example built on an old theme.

Consider a random walk on the set of all integers, denoted by Z. There is no "stop", or "barrier" in the physicist's lingo, and the walk is continued indefinitely, namely time goes to infinity. Suppose that

P^0{T_0 < ∞} = 1,   (4.8)
namely if we start from 0 we are (almost) sure to return to 0. This is true in the Bernoullian case discussed in §1. It is shown under (1.19), though there we start at x ≠ 0. But if we start at 0, one step will get us to ±1, from where we shall reach 0 for sure. Actually we can prove in the general case that if E{y_1} exists, then (4.8) is true if and only if E{y_1} = 0; see [3, §8.3]. Now define inductively, as in (3.19):

T_0^{(k)} = min{t > T_0^{(k−1)} : X(t) = 0}   (4.8*)

with the usual convention for ∞. Applying the global Theorem 4.2, we have

P{T_0^{(k)} < ∞ | X(T_0^{(k−1)})} = P{T_0 < ∞ | X(0)} = 1.

It follows that all T_0^{(k)}, k ∈ N, are finite and therefore "return to 0 infinitely often (i.o.)" is certain. This epoch-making notion is due to Borel (1909) and may be expressed as follows: denoting the event {X_n = 0} by E_n, we have

P{ ∩_{m=1}^{∞} ∪_{n=m}^{∞} E_n } = 1.   (4.9)
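The certainty of return expressed by (4.9) can also be seen numerically (a sketch of my own, using the classical expected-number-of-visits criterion rather than the random-time argument in the text): for the Bernoullian walk, Σ_n P{X_n = 0} diverges, which is the standard signature of recurrence of the state 0.

```python
# Illustrative sketch (assumptions mine): partial sums of P{X_{2k} = 0}
# = (2k choose k) 4^(-k) for the Bernoullian walk, computed by the exact
# term ratio p_{k} = p_{k-1} * (2k-1)/(2k); they grow without bound.
def expected_visits(N):
    """sum_{k=1}^{N} P{X_{2k} = 0}."""
    p, total = 1.0, 0.0
    for k in range(1, N + 1):
        p *= (2 * k - 1) / (2 * k)  # now p = (2k choose k) 4^(-k)
        total += p
    return total

for N in (100, 10_000, 1_000_000):
    print(N, round(expected_visits(N), 1))  # grows roughly like 2*sqrt(N/pi)
```

The unbounded growth of these partial sums corresponds to the walk accumulating infinitely many visits to 0, the "i.o." of (4.9).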
This phenomenon is nowadays described by saying that "the state 0 is recurrent" in the random walk process. It is an elementary analogue of a cosmological possibility known as Poincaré recurrence, which is sometimes cited as a counterposition to the dissipative Second Law of Thermodynamics, e.g. [9, p. 315]. For the Bernoullian walk the result was proved by Pólya (1922). A similar result holds for the planar integer lattice but fails in three or higher dimensions. The last mentioned "fact" is what the eminent professor of engineering (since deceased) could not believe, as I reminisced in §1 above.

Having gone this far, let me push the matter one more step to show the power of random-time thinking. Let A denote the subset of Z such that

p_a = P{y_1 = a} > 0,   a ∈ A;   Σ_{a∈A} p_a = 1.   (4.10)
By the Corollary to Theorem 4.1, the sequence {y{T^n) + 1), n £ N} is IID and distributed as y\ in (4.10). Following the great Borel, let us write En for the event {y(T^n) + 1) = a} for any a in A; Ec = W\E. Then we have P{(En
i.o.)c} = P oo
^E m=l
Q
f ] El
\m=Xn=m / oo \
p
< ) oo
oo
1
fl^n =E( -p) \n=ro
/
ro=l
oo
= E0-
(411)
m=l
Hence for this new E_n we have again "E_n i.o.", namely return to a infinitely often is also certain. Now we can repeat the argument using a instead of 0, and so on and so forth. What is the final conclusion?

Theorem 4.3. For the general random walk on Z, if 0 is recurrent then every accessible state is also recurrent.

An integer z is accessible iff it can be reached from 0 in a finite number of steps. In the (symmetric) Bernoullian walk, of course, it is banal that all z in Z are accessible and therefore recurrent. From this it is easy (not as Laplace used to say, but really) to see that the walk begun at any x in (a, b) must stop at either a or b, whichever comes first, namely one of the gamblers is certain to be ruined. Compare this elegant solution of the problem with the previous clues discussed in §1. It is done by random times.

A few words about that existential f* predicated in (4.4) for each f: what is it? Here it pays to be old-fashioned and begin with a set B, viz.
f = 1_B, the indicator of B. Then (4.4) for t = 1 reduces to

P{X(2) ∈ B | X(1) = x} = P(x, B).   (4.12)

This last entity is known as a "kernel" (of Germanic origin, I think, for lack of anglo vocabulary). The "mapping" of f to f* is then representable as

f*(x) = P(x, f) = ∫ P(x, dy) f(y),

or in operatorial notation simply f* = Pf.   (4.13)

Iterating the operation n times yields P^n, which corresponds to enhancing t + 1 in (4.4) to t + n, so that

P{X(n + 1) ∈ B | X(1) = x} = P^n(x, B) = ∫ P^n(x, dy) 1_B(y).

Clearly we have

P^{n+m} = P^n P^m,   (4.14)

a banal equation apparently first jotted down by Sidney Chapman in his study of bird migration. A mathematical object built upon such operations is called a "semigroup". It is often confused with the Markovian process owing to the connection just resumed. But as it lives on the "state space", i.e. R^d or Z or a topological space, though it proceeds by the same time n or t as in the process, there is simply no room in its framework for a random time w → T(w) that lives in a probability space W, in which the Markov process makes its residence and casts its long shadow (image) with the random passage of time, mapping W ceaselessly into whatever phase (state) space such as our R^d above, or the lately fashionable and predictably transitory 26 or 10 dimensional twisted (string) models of the so-called Universe. To what a fortuitous concurrence do we not owe every pleasure and convenience...
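On a finite state space the semigroup relation P^{n+m} = P^n P^m is nothing but matrix multiplication, and can be verified at once (a finite-state sketch of my own, for a walk on Z mod 5; not an example from the text):

```python
# Illustration (assumptions mine): the kernel of a +-1 walk on Z mod 5 is
# a stochastic matrix, and iterating it obeys P^(n+m) = P^n P^m exactly.
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    R = [[float(i == j) for j in range(len(P))] for i in range(len(P))]
    for _ in range(n):
        R = mat_mul(R, P)
    return R

# One-step kernel P(x, y): step +1 or -1 with probability 1/2, mod 5.
P = [[0.5 if (j - i) % 5 in (1, 4) else 0.0 for j in range(5)]
     for i in range(5)]

lhs = mat_pow(P, 7)
rhs = mat_mul(mat_pow(P, 3), mat_pow(P, 4))
print(all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
          for i in range(5) for j in range(5)))  # True
```

The check is of course trivial; the point of the paragraph above is that this matrix calculus, for all its convenience, has no room in it for a random time.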
5. Other Times
Since optional time was introduced in §2, we have discussed nothing but it. "Are there other random times?" Of course there are, plenty. Indeed,
even arbitrary random times can serve good purposes. Let us begin with a completely arbitrary sequence of random variables and distinguish them by R_n, n ∈ N. Suppose

lim_{n→∞} R_n(w) = R_∞(w)   (5.1)

for almost all w (in W, for P). This ought to be the crudest notion of convergence when probability is involved, but historically it had to wait for the measure theory of the twentieth century before it was formally recognized. Why, it is just convergence "almost everywhere" in Lebesgue's integration! Now let T_n be an arbitrary sequence of random times increasing to infinity; then (5.1) implies that

lim_{n→∞} R_{T_n(w)}(w) = R_∞(w)   (5.2)
for "another" almost all w. The ungrammatical use of "another" here is to emphasize the idea that the exceptional null set may be different from the one for (5.1); it is the same only when Tn(w) goes to oo for all w without exception. If on the other hand it does so only for almost all w, then we need to pool together the two null sets for (5.2), but the union of two null sets is still a null set, so "no problem". The result (5.2) is so trivial but apparently when used in somewhat contrived manner 15 it has confounded "intuitive" thinkers. To test such intuition, suppose the sequence converges in L 1 and L2, modes of convergence beloved by analysts, would the analogous property hold? Namely would (5.2) follow in L1 or L2? A beautiful example is afforded by the recurrence phenomenon discussed at the end of §4. Example 5. Let {Xn} be the Bernoullian random walk so that 0 is a recurrent state. Define Rn(w) to be 1 on the set {w : Xn(w) = 0} and to be 0 everywhere else in W. E{Rn}
= P{Rn = 1} = P{Xn
= 0}
= ( , I— if n is even, zero if n is odd. \n/2J 2n 15 Here is an esoteric application of (5.2) by David Blackwell in his proof (1946) of Wald's Theorem 2.1. In the notation there, let R„ = X(n)/n; and construct iterates {T^k\ k £ N} of the stopping time T, in the manner of (4.8*). Then of course R{T^)/T^ converges a.s. to M, senz' altro. By virtue of an obscure converse to Kolmogorov's strong law of large numbers that Dave had to confirm by mail, this implies (2.3). A tour de force, but as the Chinese saying goes: "To kill a chicken who needs a bully ax?"
Hence lim_{n→∞} E{R_n} = 0; a fortiori, for any ε > 0, P{R_n > ε} → 0. The first conclusion above means that R_n converges in L^1 to 0, and the second means "R_n converges to 0 in probability". Strictly speaking we should make R_n its absolute value |R_n|, but that is not necessary here since R_n ≥ 0. It is also trivial that we can raise R_n to any power |R_n|^p, p > 0, without changing anything. The convergence in probability is, of course, always implied by convergence in any L^p, p > 0, but I took care to spell it out there because that notion of convergence was first discovered in probability theory a couple of centuries before the Borel-Lebesgue theory of measure and integration, by none other than Jakob (Jacques, James) Bernoulli, albeit implicitly, in his Law of Great Numbers (now called the Weak Law). The recognition of it as a mode of convergence must have taken some time. In sum, our sequence {R_n} converges in L^p, any p > 0; now define a sequence of random times T_n as follows: T_n(w) = the nth time when X(w, ·) takes the value 0, i.e. the nth value of t in N such that X(w, t) = 0. This time exists (finite!) because X(w, t) = 0 for infinitely many values of t in N, as proved in (4.9) (sorry for the change of notation). Of course, for each w these values depend on w, here noted as T_n(w) (do not confuse this with the previous T_a in other discussions). Thus by the very definition, and existence, we have

X(w, T_n(w)) = 0,   so   R_{T_n(w)}(w) = 1.
Not only do we have convergence to 1, but we hit the value 1 exactly along the random times T_n(w). Conclusion: what is true for an arbitrary random time may be completely false when almost sure convergence is altered into any L^p convergence, not to say the feebler convergence in probability. The next example further strengthens the negation.

Example 6. The weak Law of Great (large) Numbers due to J. Bernoulli was strengthened to the strong law by Kolmogorov (1929). To recapitulate: let {y_s} be IID with finite mean M, and X(t) = Σ_{s=1}^{t} y_s; then we have

lim_{t→∞} X(t)/t = M   (5.3)
almost surely. It is not easy to construct an example where the weak but
not the strong law of large numbers holds. But here is one:

P{y_1 = −n} = P{y_1 = n} = c / (n^2 log n),   n = 3, 4, 5, …,

where c is some constant to make the sum over ±n equal to 1, and I start n with 3 only because I do not want log 1 and log 2. The distribution above is symmetric, so it even has mean zero in the Cauchy sense. Using the necessary and sufficient conditions for the weak law of large numbers due to Feller and Lévy, we can verify that the latter holds:

lim_{t→∞} P{ |X(t)/t| > ε } = 0   (5.4)

for any ε > 0; but the strong law does not hold; in fact we have

limsup_{t→∞} X(t)/t = +∞   and   liminf_{t→∞} X(t)/t = −∞   (5.5)
almost surely. See [3, pp. 112, 113] for proofs. In consequence of (5.5), there exist random times T_n(w) and U_n(w), both increasing to infinity almost surely, such that when they are substituted for the non-random t in (5.3) the limits there exist and are equal to +∞ and −∞ respectively. That is, in spite of (5.4). Now it is well known that (5.4), namely the convergence in probability of X(t)/t to zero, implies also the existence of a nonrandom subsequence t_k, k ∈ N, such that X(t_k)/t_k converges almost surely to zero. Thus all sorts of things can happen at random times ("nonrandom" or "constant" time is a particular case of random time!) when we begin with a weak sort of convergence, in dire contrast to a strong one as shown in (5.2). The algebraist turned statistician van der Waerden questioned the utility of the strong (as against the weak) law of large numbers; perhaps random times are not much employed in his statistical studies?

The strong law of large numbers leads to a new kind of random time. Suppose the y_s are positive-valued; more precisely, suppose their common distribution is supported in the open (0, ∞) and has a finite mean M, necessarily > 0. Then X(t) strictly increases with t, and to infinity. Hence given any n in N, there is a value of t (in N) depending on n and also on the hidden w, such that X(t) > n. Notation can be incredibly important! Here I will deliberately invert the roles of n and t and define for each w a function of t as follows:

Q(w, t) = min{n ∈ N : X(w, n) > t}.   (5.6)
Thus if we put X(w, 0) = 0 for all w, we have

Q(w, t) = n   is equivalent to   X(w, n − 1) ≤ t < X(w, n).   (5.7)

The changed notation seduces us (as the French say) to regard the variable t no longer as a member of N but as a positive real parameter; why not call it time? For such a real positive t, the definition of Q(w, t) makes perfect sense. By (5.7), Q(w, t) = 1 when 0 < t < X(w, 1) = y_1(w). Of course there is nothing wrong in beginning "something" with 1, but an entrenched syntax prefers to begin with 0, which makes us do a redundant

R(w, t) = Q(w, t) − 1 = sup{n ∈ N^0 : X(w, n) ≤ t},   (5.8)

so that the mappings t → X(t) and t → R(t) are just inverse to each other. As things are, it still follows from (5.3) the inverse limit:

lim_{t→∞} R(t)/t = 1/M.   (5.9)
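The inverse limit (5.9) lends itself to a quick simulation (a renewal-counting sketch of my own; the lifetime distribution, uniform on (0, 4) with M = 2, is an assumption, not the text's):

```python
# Illustration (assumptions mine) of (5.9): count renewals R(t) up to a
# large clocked time t; R(t)/t should be close to 1/M.
import random

random.seed(8)

def renewals_up_to(t):
    """R(t): number of n with X(n) = y_1 + ... + y_n <= t."""
    total, count = 0.0, 0
    while True:
        total += random.uniform(0.0, 4.0)  # one lifetime, mean M = 2
        if total > t:
            return count
        count += 1

t = 1_000_000.0
print(round(renewals_up_to(t) / t, 3))  # ≈ 1/M = 0.5
```

This is exactly the light-bulb bookkeeping described below: t days divided by the average life of a bulb.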
Now we shall change our perspective: instead of regarding X(n) as a random distance walked in n steps, we think of it as the random time passed or spent in n successions or "renewals" determined by y_1, y_2, …, y_n, each of which may be regarded as a "single lifetime" of a king, a photon, or a light bulb. We think of the latter as variable (random) lengths marked out on the time-axis, end to end, gaplessly (Schrödinger's excellent diction!). Then (5.3) simply says that the accumulated time spent in n contiguous renewals (read n for the t there) tends to be n times the average lifetime of those individual objects, in the long run. From this point of view, R(w, t) is the number of renewals registered in the clocked time span (0, t]; the inclusion of the endpoint t, owing to mathematical fussbudget, will be seen to be of excruciating consequence later. Thus (5.9) says, e.g., that the number of light bulbs burned out and instantly replaced in t days is, grosso modo, equal to the number t divided by the average "life" of a bulb in days. This is the way practical scientists figure out such simple averaging problems without recourse to mathematics, and if it works, who needs it? We want however to pursue the counting process a bit more by exhibiting it under another notation: for any probabilisable set A let us denote its indicator function by 1(A); then the number of renewals up to time t
can be counted one after another as

R(t) = Σ_{n=1}^{∞} 1({X(n) ≤ t}),

where the sum is over all n, possibly including ∞. Thus, this is another occasion when an arbitrary random time behaves like a nonrandom one. Domination is a tremendous assumption or property, well recognized in old analysis: Lebesgue's Dominated Convergence Theorem and all that. Mathematical physicists assume it whenever they run into "hard times"! Even boundedness may be a reasonable assumption when we deal with things of the real world such as the life of a king. It surely saves labor in Mathematics. Using the language of renewal of life, the random variable y(Q_t) may be called the "life-span across t", and may be split into two parts:

y(Q_t) = (t − X(Q_t − 1)) + (X(Q_t) − t),
which are respectively "the age at t" and "the residual life at t". It is paradoxical that the residual life may have an expectation that exceeds that of each y_s. Actually, the limiting distribution of the residual life, as well as that of the age at t, as t becomes infinite, can be calculated by means of a Fine Renewal Theorem that refines the gross one stated above. It then follows that the limiting expectation will be finite if E{y_1^2} is finite; it is quite different from E{y_1} and can be larger. In fact the limiting value of E{y(Q_t)} is always larger than E{y_1}. For details see [6, Chapter XI]; for the most peculiar case of Poisson, see [9]. The lesson to be inculcated here is: treat a random time with respect, or suspicion. 16

Turning to another direction, we observe that the random variable R_t is defined for all real numbers t ≥ 0, yielding the stochastic process {R_t, t ∈ [0, ∞)} in continuous time. When the common distribution of the y_s is the exponential with density λe^{−λs}, for any λ > 0, it is the famous Poisson process. Thus

R_t(w) = n   for t ∈ [X_n(w), X_{n+1}(w)),   n ∈ N^0.   (5.16)

A typical sample function (path) of this process is an extended step function beginning at 0, with successive jumps all of height 1, and all the way to infinity, while remaining constant or flat between jumps, something like a stairway to heaven. The sites of the jumps are the random times by means of which the process is constructed. This process, suitably "compounded", serves as a model for all sorts of applications from telephone calls through atomic fission to life insurance, if not the stock exchanges (why?). It is intimately related to the Poisson distribution (once known as "the law of small numbers", applicable to accidental deaths caused by horse kicks in the Prussian army, and children's suicides in other regions). The latter is really a probabilistic reincarnation of the exponential function developed in its Taylor series:

e^λ = Σ_{n=0}^{∞} λ^n / n!.

16 During the drafting of the present manuscript, Lajos Takács told me his experience with the "residual waiting time" X(Q_t) − t when he arrives at the bus stop at time t to wait for the next bus to go to the Mathematics Institute in Budapest (where I visited Erdős and Rényi in 1968). He found the average waiting time to exceed the mean gap-time between arrivals of unpunctual buses. When he talked about this "bus-waiting paradox" the audience of mathematicians were incredulous. Only one remarked, a posteriori, that he is more likely to arrive within a longer gap than a shorter one.
Thus, the Poisson distribution with mean λ is given by {π_n(λ), n ∈ N^0}, where

π_n(λ) = e^{−λ} λ^n / n!,   n ∈ N^0.

For the Poisson process with parameter λ we have the identifying relation:

P{R_t = n} = π_n(λt),   n ∈ N^0.   (5.17)
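The "exercise for the novice" mentioned next can be previewed numerically (a sketch with my own choices λ = 2, t = 3; not from the text): build R_t from IID exponential lifetimes as in (5.16) and compare its empirical law with the Poisson probabilities π_n(λt).

```python
# Illustration (assumptions mine): the renewal count for exponential
# lifetimes, as in (5.16), is Poisson distributed with mean lambda*t.
import math
import random

random.seed(9)

lam, t, runs = 2.0, 3.0, 100_000

def sample_R_t():
    """Count renewals X_n <= t with exponential(lambda) lifetimes."""
    total, n = random.expovariate(lam), 0
    while total <= t:
        n += 1
        total += random.expovariate(lam)
    return n

counts = {}
for _ in range(runs):
    k = sample_R_t()
    counts[k] = counts.get(k, 0) + 1

for n in range(4):
    empirical = counts.get(n, 0) / runs
    pi_n = math.exp(-lam * t) * (lam * t) ** n / math.factorial(n)
    print(n, round(empirical, 3), round(pi_n, 3))  # the columns agree closely
```

The agreement of the two columns is the content of (5.17) seen through the random construction rather than through the differential equations below.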
It is a good exercise for the novice to establish (5.17) by means of (5.16). Suppose telephone calls come on a line gaplessly. Suppose, as we have done, that the durations of these calls form an IID sequence with the exponential distribution. The resulting Poisson process is then registered as a random positive integer that increases with the clocked t as the number of calls received, the R(w, t) defined above. Thus (5.17) gives the probability that exactly n calls have come in the time interval from 0 to t. The only quibbling is: if a call ends precisely at the moment t, so that the next one begins instantly at the same t, should we count R(w, t) as n or n − 1? This ambiguous event occurs with zero probability, but it cannot be ignored because mathematics, and its notation, must be excruciatingly unambiguous. According to what we have laid down in (5.16), if X_n(w) = t then the nth call is counted at time t although it barely begins then. In other words, we have decided that the sample step function be assigned a value at each jump site equal to its value immediately to the right of the site: more briefly, we have decided to make the function right-continuous. (We could have made it left-continuous if we wished!) 17 Practical folks don't worry about such finesse, and the numerical formula given by (5.17) conceals the issue. In the modern theory of stochastic processes in continuous time, however, it plays a major, though preliminary, role. Indeed, it may be said that a general theory of Markov processes was held back for years until G. A. Hunt proclaimed (c. 1957): "Let the paths be right-continuous and have left limits everywhere". Etcetera.

Let me also broach another approach to the Poisson process, the usual one found in "elementary" texts. That is not a random construct as we have made it, but the Newtonian tradition of solving differential equations. It was probably not done by Poisson, but somebody put down the equations:

P_{−1}(t) = 0;   P_n′(t) = λP_{n−1}(t) − λP_n(t),   n ∈ N^0.   (5.18)

17 When the paths are reversed in time, one by one, the reversed paths of course become left-continuous if the original ones are right-continuous. Left-continuous Markov processes have been studied.
And pronto, the solution is

P_n(t) = π_n(λt),   n ∈ N^0.   (5.19)
This should be our previous (5.17). Where is the process? "Who needs it?" is one answer. For us, however, "The Process is the Thing!" The Poisson Process is a Homogeneous Markov Process as discussed in §4, though it runs in continuous time. It has the strong Markov property embodied in Theorem 4.1, to be translated appropriately, where the right-continuity of paths is essential. We cannot enter into the details here but will avail ourselves of the resulting enhancement of (5.17) into its more useful form:

P^m{R_t = m + n} = P_n(t) = π_n(λt).   (5.20)
This says: whenever you look at the process and whichever state m it happens to be in at that time, you are (almost!) sure that after another lapse of time t, the process will be found in state m + n with the probability given by the Poisson distribution. Thus the displacement (difference) n depends only on the time t transpired, not on the initial time when you look, nor on the initial state m. This is homogeneity in both Time and Space (the latter owing to the homogeneous structure of the integers serving as states). Where is the Markov property? This passes muster because, as I warned you before, mathematics is awfully taciturn and you must learn to read what is not said as well as what is said. Here it means that the assertion above does not depend in any way on what happened to the process before you looked, namely how it had got to that state m, provided that your looking time is optional. Recall Theorem 4.1, although there the time is discrete. How to extend the definition of optionality to continuous time? Clearly (or is it?) the old (2.2) won't do. But I have mentioned earlier that it can be modified by changing "T = t" into "T ≤ t" to an equivalent condition when time is discrete. The latter will serve in continuous time. For example, the first, second, ... jump times of the Poisson process are all optional in this sense. But we must stop here before getting into complications in continuous time. If you are eager to find out, you should try to write down a precise definition of those "jump times" just casually spoken of, and verify that they are indeed optional according to the definition:

{w : T(w) ≤ t} ∈ F_t,  for each t ∈ [0, ∞).  (5.21)
En fin de compte, they are none other than the X_n(w) used in our random
construction of the process. This section began with the possibility of substituting arbitrary random times when there is almost sure convergence, as in the strong law of large numbers. In classical probability, next to the law of large numbers stands the central limit theorem. The relationship between these two pillars may be epitomized in the role of the first and second moments of the underlying random variables, which are denoted by {y_s, s ∈ N}, IID as assumed before but generalizable to some extent. For the central limit theorem we assume the finiteness of the second moment; and we may as well make the mean zero and the variance one. Here is a remarkable theorem that originated with Vincent Doblin (Wolfgang Doeblin):

Theorem 5.1. Let {y_s, s ∈ N} be IID with mean 0 and variance 1. Then, for an arbitrary sequence of random variables v_n(w) such that v_n(w)/n converges to 1 almost surely, the limiting distribution of X(w, v_n(w))/(v_n(w))^{1/2} is the normal distribution with mean 0 and unit variance.

For the proof and further reference see [3, pp. 216, 249]. The result is made possible owing to the control over the difference |X(v_n) − X(n)| for all values of v_n satisfying |v_n − n| < nε, by means of Kolmogorov's maximum inequality sharpening Chebyshev's in an essential way. It was Kolmogorov's problems about denumerable Markov chains, solved by Doblin, that led to his result on the central limit theorem with a random number of summands. There should be similar randomized versions of classical limit theorems for sums of independent random variables.
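For the reader who likes to compute, Theorem 5.1 is easy to watch in a simulation. The sketch below is ours, not Doeblin's: the steps y_s = ±1 (mean 0, variance 1) and the particular random index v_n = n + O(√n), which certainly satisfies v_n/n → 1, are illustrative choices only.

```python
import math
import random

# Minimal simulation sketch of Theorem 5.1; the +/-1 walk and the random
# index v_n = n + O(sqrt(n)) are our illustrative assumptions.
def doeblin_sample(n, rng):
    v = n + rng.randint(0, int(math.isqrt(n)))      # random number of summands
    x = sum(rng.choice((-1, 1)) for _ in range(v))  # X(w, v_n(w))
    return x / math.sqrt(v)                         # normalized as in the theorem

rng = random.Random(2024)
samples = [doeblin_sample(900, rng) for _ in range(1000)]
mean = sum(samples) / len(samples)
var = sum(s * s for s in samples) / len(samples)
print(abs(mean) < 0.15, abs(var - 1.0) < 0.25)  # consistent with N(0, 1)
```

The empirical mean and variance of the normalized sums stay close to 0 and 1, as the normal limit demands.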
6.
From First to Last
An observant reader should have observed that all the optional times discussed above are of the form

T_A = min{n ∈ N : X(n) ∈ A},  (6.1)

where A is a set. This is the first time the event "to be in A" occurs. True, we have also its "iterates" as in (3.19): the second, third, ..., times. More generally we can define the first time an event occurs after another event has occurred, which may be denoted by

T_B ∘ T_A = min{n > T_A : X(n) ∈ B}.  (6.2)
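On a finite observed path, (6.1) and (6.2) are one-line computations. Here is a sketch in Python; the function names and the convention of returning infinity when the min does not exist are ours.

```python
import math

def first_entrance(path, A, start=0):
    """T_A = min{n >= start : X(n) in A}; infinity when the min does not exist."""
    for n in range(start, len(path)):
        if path[n] in A:
            return n
    return math.inf

def composed(path, A, B):
    """T_B o T_A = min{n > T_A : X(n) in B}, as in (6.2)."""
    ta = first_entrance(path, A)
    if math.isinf(ta):
        return math.inf
    return first_entrance(path, B, start=ta + 1)

path = [0, 1, 2, 1, 0, -1, 0, 1, 2, 3]   # a short observed walk on Z
print(first_entrance(path, {2}))          # 2
print(composed(path, {2}, {0}))           # 4: first visit to 0 after first visiting 2
```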
Here is a more complex example: the first time all six faces of a die have appeared when it is thrown repeatedly. To symbolize this event we need multiple sample values of the process X(·). In fact, from the definition of optionality in (2.2), "it is easy to see" that any optional T can be represented as follows:

T = min{n ∈ N : (X_1, X_2, ..., X_n) ∈ A},  (6.3)

with an n-dimensional space set A. This result is of little practical utility; it only serves to highlight "The Importance of Being First". On the other hand, from the practical side, the notion of "first passage" was known to physicists like Fürth in the study of diffusion, where he calculated the density of such a time-distribution in the form of a Fourier sine series, cited by Feller [6, p. 359]. Physicists like to compute and are good at it. Whether or how the random time giving rise to that fine formula was explored by Fürth is not known. Now, in physics the lack of an arrow for the flow of time in the great equations of Newton and Maxwell is a common topic, expounded in the semi-popular books by Eddington, Jeans, ..., Feynman. If time is reversed, then of course "first" becomes "last", "initial" (position or "state") becomes "final"; and the neutral noun "passage" should be distinguished into "entrance" vis-à-vis "exit". Calling the T_A in (6.1) the "first entrance time into A", we define its reverse-dual, the "last exit time from A", as follows:

L_A = max{n ∈ N : X(n) ∈ A}.  (6.4)
The max above is defined to be 0 when it does not really exist, just as the min in (6.1) is defined to be ∞ when it does not exist. There is symmetry in this convention. But reversing the TIME has a big problem: from when do we reverse it? Mathematicians, unlike physicists, need not worry about the unanswerable question whether time is infinite or has an end, but there is the algebraic difficulty of confronting "∞ − t". And yet we can certainly talk of a last time as pinned down in (6.4). A simple example will help.

Example 7. Consider the Bernoullian random walk introduced in §1, but now unsymmetric:

P{y_i = +1} = p,  P{y_i = −1} = 1 − p = q,  1/2 < p < 1.  (6.5)

We can still infer that L_0 < ∞; but if e.g. z = 903 above, it is clear that L_0 ≤ L_A. Moreover the random place X(L_A) may be anywhere in the set A. Its exact distribution may not be so easy to compute even in a simple case where e.g. X takes three different values. By way of illustration, in the case of (6.5) we have

P^x{T_0 < ∞} = (q/p)^x,  x ≥ 1.  (6.6)
This is the chance of the gambler's ruin: if he starts with capital x and dares to play against an infinitely rich adversary, even though his chance of winning each wager is better than even. The solution is obtained as on p. 10 by the method of difference equations, [5, §8.2]. From this we deduce

P^0{L_0 = 0} = p(1 − q/p) = p − q.  (6.7)
For, in order never to return to 0 we must take first the step +1, then not to return to 0 from 1. Next we have

P^0{L_0 = 2n} = P^0{X(2n) = 0} · P^0{L_0 = 0}  (6.8)
for n ∈ N. The crucial thing to observe is that for the first factor in the right member above, the probability is that of a "free" random walk without any hint that 2n is the last time we are at 0. That condition is expressed in the second factor there: indeed we restart from 0 as if nothing has happened, but from then on our random walk must be restrained so as not to return to 0 ever again. Compare this situation with that for an optional time at which the strong Markov property says to start from scratch. If this is not clear it will become more clear in the general context of a Markov chain to be treated shortly. For the moment, let us record the explicit formula:

P^0{L_0 = 2n} = \binom{2n}{n} (pq)^n · (p − q).  (6.8*)
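The summation check suggested just below can also be mechanized; in this sketch the value p = 3/4 is an arbitrary choice subject to p > q, and truncating at n = 400 is safe because the terms decay like (4pq)^n.

```python
from math import comb

# numerical check that (6.8*) is a probability distribution: the sum over
# n >= 0 of C(2n, n) (pq)^n (p - q) should equal 1 when p > q
p = 0.75
q = 1 - p
total = sum(comb(2 * n, n) * (p * q) ** n * (p - q) for n in range(400))
print(abs(total - 1.0) < 1e-9)  # the tail beyond n = 400 is negligible here
```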
A serious student of mathematics will wish to "check out" this numerical result by summing it over n ≥ 0. Here is the Taylor series:

(1 − 4x)^{−1/2} = Σ_{n≥0} \binom{2n}{n} x^n,  |x| < 1/4;

putting x = pq and noting that 1 − 4pq = (p − q)^2, the sum of (6.8*) over n ≥ 0 is (p − q) · (p − q)^{−1} = 1, as it should be.

Let us now pass to a general homogeneous Markov chain {X(t), t ∈ N_0} on a denumerable state space I, with one-step transition probabilities

p_ij = P{X(t + 1) = j | X(t) = i};  (6.9)

these satisfy

p_ij ≥ 0,  Σ_j p_ij = 1,  (6.10)
where unspecified indices range over I. The matrix P = (p_ik) is the "transition matrix" of the chain; it is finite square or infinite according to the finitude of I. Owing to (6.10), the powers of P are well-defined even in the infinite case and we have

P^m · P^n = P^{m+n},  m, n ∈ N_0,  (6.11)

where P^0 is the identity matrix. We denote the elements of P^n by p_ij^{(n)}; thus we have as an extension of (6.9):

P{X(t + n) = j | X(t) = i} = p_ij^{(n)}.
Needless to say, the conditional probability is defined only if P{X(t) = i} > 0; otherwise the corresponding equation is null and void. Recall that (6.11) is a particular case of (4.14), a creature of the semigroupy shadow of the infinitely richer chain {X(t), t ∈ N_0}. As opined there, the matrices by themselves do not suffice for probing the deeper structures of the process, and other essential probabilities arising from random times will soon be introduced. But let it be mentioned here that A. A. Markov, good analyst he, was able to squeeze out of those P^n his fundamental limit theorem as n → ∞ (1907). His method however does not work in the general case where I is infinite. Kolmogorov did that (1936), beginning with a formula that is (6.15) below, was in doubt, despite the temporal symmetry of the definition of the Markov property.
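The semigroup law (6.11) is easy to watch on a small finite example; the 3 × 3 matrix below is an illustrative choice of ours, not anything from the text.

```python
# illustrative 3-state transition matrix; we verify P^2 P^3 = P^5 numerically
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.2, 0.2, 0.6]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matpow(A, n):
    # P^0 is the identity matrix, as in the text
    R = [[1.0 if i == j else 0.0 for j in range(len(A))] for i in range(len(A))]
    for _ in range(n):
        R = matmul(R, A)
    return R

lhs = matmul(matpow(P, 2), matpow(P, 3))
rhs = matpow(P, 5)
print(all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(3) for j in range(3)))
```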
Consider p_ij^{(n)}: this is the measure of all the paths X(w, t) that are at i when t = 0 and at j when t = n. The two time moments are 0 and n, the two events are "being at i" and "being at j". Think dynamically, viz. track the time: if the event j occurs at time n, there must be a first time when this occurs, between 1 and n. (Banish the thought that i may be j!) Decompose the set of these specified paths according to the first occurrence, viz. T_j (notation is very important). Doing so in writing, we get

p_ij^{(n)} = P^i{X(n) = j} = Σ_{v=1}^{n} P^i{T_j = v; X(n) = j}

          = Σ_{v=1}^{n} P^i{T_j = v} · P^j{X(n − v) = j},  (6.12)
where the HMP is used in the last step. Now we introduce a new, incredible notation: for any three states i, j, k, and n ≥ 1:

_k p_ij^{(n)} = P^i{X(t) ≠ k, 1 ≤ t ≤ n − 1; X(n) = j}.

In particular _j p_ij^{(n)} = P^i{T_j = n}, so that

P^i{T_j < ∞} = Σ_{n=1}^{∞} _j p_ij^{(n)} = _j p*_ij,  (6.20)
where * denotes a sum over all n. It follows that _j p*_ij ≤ 1 for all i and j. Dually, what can we say about _i p*_ij? We shall say "i leads to j"^19 iff

P^i{T_j < ∞} = _j p*_ij > 0;

it should be obvious that this condition is equivalent to the shorter p*_ij > 0. If so, does it follow that

P^i{T_j < T_i} > 0, i.e. _i p*_ij > 0?  (6.21)
This is a wonderful question I always asked the class, testing their ability to think dynamically. Actually a quick, static (algebraic) answer is hidden in the last exit decomposition. For there is an n for which the probability in (6.19) is strictly positive, since i leads to j by hypothesis; then at least one of the terms in the sum must be so, hence (6.21) is true. A bright student can also answer the question as follows: suppose the contrary; then it is impossible to go from i to j before making a stop at i, in other words returning to i (say to pick up something forgotten). Then after the (first) return there must be another, and yet another ..., ad infinitum. Thus one will never be able to get to j. Next we prove, under the reverse hypothesis that j leads to i, that _i p*_ij < ∞. This time I do not know any bright answer, and must rely on

19 It is a linguistic oddity that there does not exist a common adjective in English to enable us to reverse the direction of passage in the sentence "(the state) j is accessible from i". My coinage "to lead to" also sounds very academic. But there is a Chinese verb that does it perfectly: tong.
clumsy algebra. However, it is still necessary to figure out the meaning of the inequality

_i p_ij^{(n)} · _i p_ji^{(m)} ≤ _i p_ii^{(n+m)}.

Since j leads to i there is an m for which _i p_ji^{(m)} > 0, by what we have just proved in (6.21), interchanging j and i. Summing over n we obtain

_i p*_ij · _i p_ji^{(m)} ≤ Σ_n _i p_ii^{(n+m)} ≤ 1,

hence _i p*_ij < ∞. Now let N_j denote the total number of visits to j, and N_j(T) the number of visits before the random time T. Then

P^i{N_j > 0} = P^i{T_j < ∞},  E^i{N_j} = p*_ij,  (6.26)
so that the new notation seems wasteful. But now we can write also

_j p*_ij = E^i{N_j(T_j)},  _i p*_ij = E^i{N_j(T_i)}.  (6.27)
The state i is called "recurrent" iff

P^i{T_i < ∞} = _i p*_ii = 1,  (6.28)
which says that, starting from i, it is almost certain we shall return to i. But then the HMP implies that we shall return again and again, an infinite number of times, namely:

P^i{N_i = ∞} = 1.  (6.29)

This of course implies

E^i{N_i} = p*_ii = +∞;  (6.30)
what is remarkable is that the converse is also true, namely the apparently weaker property (6.30) in fact implies the stronger one (6.29). Next, consider a class C of mutually communicating states. It is another remarkable result that either all the states in C are recurrent as just defined, or none of them is. A state that is not recurrent is of course called "nonrecurrent", though for some syntactic reason the name "transient" is also commonly used. A class is called recurrent iff any, hence every, state in it is recurrent. Such a class is distinguished by the condition (6.28). Moreover for any two states i and j in a recurrent class, we have

P^i{N_j = ∞} = 1;  hence  P^i{T_j < ∞} = 1;  and  E^i{N_j} = p*_ij = +∞.  (6.31)
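The divergence in (6.30) can be watched numerically for a finite chain, where every state of an irreducible chain is recurrent: the partial sums of p_ii^{(n)} grow without bound. The 3-state matrix below is an illustrative choice of ours.

```python
# illustrative irreducible 3-state chain: partial sums of p_00^(n) keep
# growing, exhibiting E^0{N_0} = p*_00 = +infinity as in (6.30)
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.2, 0.2, 0.6]]

dist = [1.0, 0.0, 0.0]            # start in state 0
partial, s = [], 0.0
for n in range(1, 301):
    dist = [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]
    s += dist[0]                  # dist[0] equals p_00^(n)
    if n % 100 == 0:
        partial.append(round(s, 2))
print(partial)  # increasing roughly linearly in n, with no finite limit
```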
Let C be a recurrent class, and choose any state i in C. Re-denote

e_j = _i p*_ij,  j ∈ C.  (6.32)
By (6.28), e_i = 1. We know also that 0 < e_j < ∞ by (6.22). Let us evaluate, for any k ∈ C:

Σ_{j∈C} e_j p_jk = e_i p_ik + Σ_{j≠i} Σ_{n=1}^{∞} _i p_ij^{(n)} p_jk

               = _i p_ik^{(1)} + Σ_{n=1}^{∞} _i p_ik^{(n+1)} = e_k.
This exhibits {e_j, j ∈ C} as an invariant (stationary) measure for the Markov chain with matrix (p_jk) on C. Our wise investment in duality has produced an unexpected dividend. Once we have the invariant measure (in fact unique up to a multiplicative constant), it is trivial to construct a "reverse chain" with the matrix Q = (q_jk), where q_jk = e_k p_kj / e_j. When the invariant measure is finite and so may be made into a probability measure, this is old stuff. It happens when E^i{T_i} < ∞ for any i in C (then this is true for all i in C), and has been called the "positive-recurrent" or "strongly ergodic" case. But we must stop here; for details of the material above, see [8]. Let us return to a general random walk as discussed in earlier sections. Without loss of generality we may suppose that the state space is Z, the set of all integers. When is it recurrent? The necessary and sufficient condition, provided E{|y_1|} < ∞, is precisely E{y_1} = 0, as already referred to on p. 34 above. Hence in this case (6.28) is true for all z in Z. What now are those quantities in (6.32)? They can be identified from the ratio limits in (6.23) and (6.24): all equal to one, appunto. Now look at the second equation in (6.27): what does it say? For the symmetric Bernoullian random walk on Z, starting from any z in Z, return to z "i.o." is (almost) certain; between two successive returns the expected number of visits at any y in Z, e.g. y = 1, y = 17, y = 903, y = −1010, are all the same and equal to one. A trivial corollary of this state of affairs is: the expected return time from z to z is +∞ = Σ 1, where the sum is over Z. This was proved by the
ancient greats in (1.18), checked out in a devious manner at the end of §2. The same conclusion holds true for any generalized random walk with mean 0 (provided it leads from 0 to all of Z). What would De Moivre or Laplace say about this modern ramification of their occupation with duration of play, and gambler's ruin?^20
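Before leaving this section, the invariant measure and the reverse chain can be checked on a small example; the 3-state matrix and the use of power iteration to find the invariant measure are our illustrative choices.

```python
# sketch: find an invariant measure e by iterating e -> eP on an illustrative
# ergodic 3-state chain, then form the reverse chain q_jk = e_k p_kj / e_j
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.2, 0.2, 0.6]]
e = [1.0, 1.0, 1.0]
for _ in range(500):              # power iteration; converges since P is ergodic
    e = [sum(e[i] * P[i][j] for i in range(3)) for j in range(3)]

Q = [[e[k] * P[k][j] / e[j] for k in range(3)] for j in range(3)]

print(all(abs(sum(row) - 1.0) < 1e-9 for row in Q))     # Q is a transition matrix
eQ = [sum(e[i] * Q[i][j] for i in range(3)) for j in range(3)]
print(all(abs(eQ[j] - e[j]) < 1e-9 for j in range(3)))  # e is invariant for Q too
```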
7.
Gapless Time
In the preceding sections time is treated discreetly, in discrete units, as t = 0, 1, 2, 3, ..., n, .... The concept of "continuous time" troubled early Greek philosophers (Zeno of Elea, c. 450 BC). In Erwin Schrödinger's Science and Humanism (Physics in our Time), 1951, he devoted a whole section, "The intricacy of the continuum", to it. He says:

We must not admit the possibility of continuous observation [in italics, p. 131]. Is the impossibility of a continuous, gapless, uninterrupted description in space and time really founded in incontrovertible facts? The current opinion among physicists is that this is the case [p. 153].

In making these statements it is probable that Schrödinger was not thinking of the mathematical difficulty for probabilities concerning a continuum of events observed in a continuum of time. Consider the event E of an electron being at a certain energy level i observed through an interval of continuous time [0, t]. Let us denote by "X(s) = i" the event that the electron is at energy level i at the time-instant s, where s ∈ [0, t], so that the compound (joint) event symbolized as

∩_{s∈[0,t]} {X(s) = i}  (7.1)
represents the event that the electron has remained at the energy level i throughout the time interval [0, t], in other words it has not made any quantum jump. To calculate its probability, let

p(h) = P{X(t + h) = i | X(t) = i}

20 But I can report that when the "unexpected expectation one" above was reported at an early Berkeley Symposium, the eminent statistician Jerzy Neyman voiced his surprised delight.
and assume that the physical process X(t) is a homogeneous Markov chain as discussed in §4 and more specifically in §6. Thus, in the notation of (6.9), our present p(h) is just the p_ii there, with h taking the place of the hidden time unit 1 in the latter. It follows that

P{X(kh) = i, for 1 ≤ k ≤ n} = p(h)^n,  n ∈ N.  (7.2)
If we let h shrink to zero while simultaneously letting n increase so that nh = t, or just nh → t, the preceding probability should in the limit become

P{X(s) = i, for all s in [0, t]},  (7.3)
which is the probability of that event in (7.1). So let us compute the limit. It turns out that we may assume that the function p(·) has a derivative at 0, viz.

p'(0+) = lim_{h↓0} (p(h) − 1)/h = −q  (7.4)

exists, where 0 ≤ q ≤ ∞, because p(0) = 1 and p(h) ≤ 1. A little Newtonian calculus (you may need Mercator's famous power series for log(1 − x)) yields

lim_{h↓0, nh→t} p(h)^n = e^{−qt}  (7.5)
for 0 ≤ t < ∞. But is this really the probability in (7.3) of a continuous, gapless, uninterrupted description of the kind Schrödinger wondered aloud about? Inexorable mathematical logic outdoes him in declaring that the entity exhibited in (7.1), ostensibly a set of sample points, or if one prefers, a bunch of Feynman's paths, may not be subject to the laws of probability because it may not be probabilisable, namely measurable. Of course each factor {X(s) = i} is (assumed to be) measurable so that P{X(s) = i} is a number between 0 and 1, but the intersection in (7.1) is non-denumerable and the laws of probability provide no clue to such a continuum-intersection. A juvenile example will make this clear: put E_x = [0,1] − (x), where x ∈ [0,1], i.e. the unit interval with one point x deleted; then m(E_x) = 1, where m is the Borel–Lebesgue measure (if one must name it). On the other hand ∩_{x∈[0,1]} E_x is empty and so m(∩_x E_x) = 0. In this example the continuum-intersection is measurable alright, but its measure cannot be derived from the measures of its factors. I have given the example in the intersection mode, to make it look like (7.1). Its complementary in the union mode is
nothing but Zeno-of-Elea's paradox:

[0, 1] = ∪_{x∈[0,1]} (x),
where (x) is the singleton. Apropos, suppose we change the interval [0,1] into the Cantor tertiary set C, which is non-denumerable (and "perfect") and has measure zero; then we obtain the true relation:

m(C) = 0 = Σ_{x∈C} m((x)) = Σ_x 0,

provided the summation over a non-denumerable set is permitted. Contrast this with

m([0,1]) = 1 > Σ_{x∈[0,1]} m((x)) = Σ_x 0.
Our physicist friends might say, "But who is interested in such a set (Cantor's)?" In the lecture by Schrödinger quoted above, he spent 4 to 5 pages describing that set in detail, with the following concluding remark (p. 143):

I have brought this case before you, in order to make you feel that there is something mysterious about the continuum and that we must not be all too astonished at the apparent failure of our attempts to use it for a precise description of nature.

Schrödinger listened to his colleague (teacher?) Radon, known for the "Radon measure". Would he find it astonishing that a description of nature such as indicated in (7.1) could be "incalculable" owing to its nonmeasurability? Let us abandon the preceding rough and ready computation and make an organized effort. Following the great tradition of ancient mathematics let us bisect the time interval [0, t], then bisect again, ..., the way monitored by Zeno, for Achilles to catch up with the tortoise. Namely, put h = 2^{−m} t, m = 1, 2, ..., partition [0, t] into 2^m equal parts of length h, then let m go to infinity. Let (notation is important!)

A_m = ∩_{n=1}^{2^m} {X(n 2^{−m} t) = i};
then we see that A_m ⊃ A_{m+1} for all m ≥ 1, and so A_m has the limit
∩_{m=1}^{∞} A_m = ∩_{d∈D} {X(dt) = i},
where D is the set of dyadic (binary) numbers in [0,1]. You must know that those are used to represent numbers in computers, e.g. the decimal 2000 is the dyadic 11111010000. Check this out! We have therefore proved, using a particular case of (7.3), that

P{X(dt) = i for all d ∈ D} = e^{−qt}.  (7.6)
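Both the dyadic claim and the limit behind (7.5)–(7.6) can be checked in a few lines; the linear form p(h) = 1 − qh is our simplifying assumption, consistent with (7.4).

```python
import math

# the decimal 2000 in binary, as claimed in the text
print(format(2000, 'b'))  # 11111010000

# the limit (7.5): with p(h) = 1 - q h (our assumption consistent with (7.4))
# and n h = t, p(h)^n approaches e^{-q t} as h shrinks
q, t = 2.0, 1.5
for n in (10, 100, 10000):
    h = t / n
    print((1 - q * h) ** n)   # tends to math.exp(-q * t), about 0.0498
```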
The set {dt}, d ∈ D, is denumerable and everywhere dense in the continuum [0, t]. There is absolutely no gap in the set, no interruption, yet it is not "continuous". Would Erwin accept this as a physical substitute for the unfathomable continuum in (7.1)? We must remind ourselves that even in this weakened gapless description we have made a fantastic extrapolation in letting m "go to infinity"—not just those tiny numbers like 10^{10} the real-world astronomers play with; see footnote 7. Pure mathematics has found a way to reconcile (7.2) with (7.4). I have sketched a historical account of it in "Probability And Doob", American Mathematical Monthly 105 (1), January 1998; here I must condense it to a summary. The heart of the matter, or root of the trouble, goes back to the old definition (construction due to Kolmogorov) of a "Stochastic Process" as a family of random variables {X_t, t ∈ R}, where R = (−∞, ∞). But each X_t is really X(w, t), w ∈ W, where W is the sample space with the probability measure P defined on a tribe (Borel field) of sets of w. This point of view is a historical remnant, a heritage from a finite or denumerable sequence of random variables X_1, X_2, .... It does not "work" for a continuum of t in R. We must reinvent the stochastic process as {X(w, ·), w ∈ W}, namely a space of functions on R, parametrized by the sample point w. It is better to call them "sample functions" (rather than trajectories or paths) in order to remind ourselves that each one is just an ordinary function on the line such as is studied in a first course of calculus. We shall consider only the case of real-valued functions but must allow the limiting values ±∞. Now here is an unfamiliar theorem about such a function:

Theorem 7.1. Let f be a function from R to R̄ = [−∞, ∞]. Then there exists a denumerable set S in R with the following property: for any t ∈ R,
there exists an infinite sequence {s_n(t), n ∈ N} such that

lim_n s_n = t  and  lim_n f(s_n) = f(t).  (7.7)
Indeed, we may choose all s_n > t, or all s_n < t, in the above. I do not think this very general result can be found in any textbook in real analysis. The proof is not difficult, at least without the unilateral restriction. Coining a new term, I shall describe the conclusion of the theorem by: "f is S-approachable". Clearly this is clearer than the old term "S-separable". Note that S is necessarily dense in R, and may be enlarged at will. Thus if W is denumerable, then the denumerable family of functions {f_w, w ∈ W} are all S-approachable with one single S. Unfortunately we cannot assume our probability space W to be denumerable (although Feller does so in his popular Volume 1, precisely to evade measurability problems). Hence in order to make all sample functions S-approachable with a single set S, it devolves on Doob [4] (c. 1952) to re-invent the stochastic process erstwhile constructed by Kolmogorov (c. 1929). The idea stems from the original ambivalence in the definition of a random variable such as X_t. As a function of w it is defined only almost everywhere, viz. unspecified on a P-null set. This gives us leeway (wriggle room) to select a version of each X_t, a una a una, without disturbing the joint distribution of any finite (or denumerable!) number of them.

Theorem 7.2 (Doob's re-invention). By selecting a particular version of each X_t, t ∈ R, it is possible to make all the sample functions X(w, ·), w ∈ W, S-approachable with a single denumerable set S.

For a proof of this theorem in the unilateral formulation, see Dellacherie–Meyer: Probabilités et potentiel (chapitres I à IV), p. 154ff.21 Let us apply this theorem to (7.1). If X(w, s) = i for all s in (0, t) ∩ S, then (7.7) with f = X(w, ·) of course implies X(w, s) = i for all s in (0, t). If we use the left approach we get also X(w, t) = i. But for each t we can also adjoin the point t to S. Incidentally we are assuming X(0) = i to begin with, to be indicated by P^i below.
Thus we have proved

P^i{X(s) = i for all s ∈ [0, t]} = P^i{X(s) = i for all s ∈ S ∩ [0, t]}.  (7.8)

21 While drafting these notes I asked Meyer if he knew of any application of the unilateral S-approachability. He did not. But there is time.
To compute the second probability above, we re-arrange S ∩ [0, t], with t included, as an increasing sequence of finer and finer partitions of [0, t] into a finite number of subintervals with maximum length decreasing (non-increasing!) to zero, so that the probability is

lim p(h_1) p(h_2) ··· p(h_n),

where h_1 + h_2 + ··· + h_n = t but the h's vary with each n. Then the limit is computed as in (7.5) with the same result. In short, this is an extension of the case treated before with equal partitions. Actually there is an indirect way which avoids the slightly messy approach, but which requires a theoretical assumption and its verification. Applied mathematicians and physicists tend to shun theory in favor of rough-and-ready computation; it is for this reason we will state the following special case of Doob's theorem, which has an easy proof.

Theorem 7.3 (Special Case). Suppose that the stochastic process {X_t, t ∈ R} has the property that for each t, we have

lim_{h→0} P{|X_{t+h} − X_t| > ε} = 0  for every ε > 0;  (7.9)

then in Doob's theorem any denumerable set S dense in R will serve.

In the present case the condition (7.9) is satisfied since lim_{h→0} p(h) = 1, which is implicit in (7.4). Hence we can use the set {dt, d ∈ D} for S and so return to (7.6). Neat? The condition (7.9) has a technical name but I am afraid of using it for fear of facile confusion (which has actually occurred as alluded to in my article cited above).22 To test the meaning of S-approachability, suppose we have a process in which all sample functions X(w, ·), for all w, are continuous at all points of the famous set S in the theorem; does it follow then that they are in fact all continuous for all t in R? The answer is no. Norbert Wiener and later

22 Condition (7.9) means: X_s converges to X_t in probability (which Lévy called "in Bernoulli's sense"). It is weaker than: X_s converges to X_t with probability one, viz. almost surely. The former condition is true for every t for a homogeneous Markov chain with a standard transition matrix, discussed in section 9, where the state space is denumerable, even if instantaneous states exist. The latter condition is true when the state space is finite as in section 8. Now suppose for every t, X_s converges to X_t with probability one. This may be stated as: "the process X is almost surely continuous at every (all) t". Some mistook this to mean "almost all sample functions of the X process are continuous (for all t)". The last statement is stronger and is the property of the Brownian Motion Process mentioned at the end of the section.
Paul Lévy had to work a lot harder to prove the continuity of the Brownian paths.
8.
Markov Chain in Continuum Time
We continue with an anytron observed in various states by quantum jumps. The probability model is a Homogeneous Markov Chain (HMC) {X_t, t ∈ R} where R = [0, ∞) and each X_t is a function on the sample space W to the state space I = {1, ..., L}. Its transition matrix is P(t) = (p_ij(t)), a square L × L matrix satisfying the following conditions, for t ≥ 0, s ≥ 0:

P(t) ≥ 0,  P(t)1 = 1,  P(s)P(t) = P(s + t);  lim_{t↓0} P(t) = identity matrix.  (8.1)
These are the continuous analogues of (6.9), (6.10), (6.11), except the last initial condition, known as "standard". A HMC can be constructed (à la Kolmogorov) so that

p_ij(t) = P{X(s + t) = j | X(s) = i} = P{X(s + t) = j | X(s) = i; F_s},

where F_s is the tribe generated by X(r), r ∈ [0, s]. Each p_ij being a function on [0, ∞) to [0, 1], its analytic properties are essential and must first be derived from (8.1) by Newtonian methods. It is easy to prove that it is continuous in R. For i = j it is also easy to prove that p'_ii(0) = q_ii exists and −∞ < q_ii ≤ 0. This has been recorded in (7.4) and plays a major role in the not so easy proof that p'_ij(0) also exists for i ≠ j, and 0 ≤ q_ij < ∞. In fact the latter depends on probabilistic thinking even if concealed in algebraic formulas; see [4, pp. 246–248]. Let us put

Q = (p'_ij(0)).

Facile derivations from (8.1) then yield

Q1 = 0,  P'(t) = QP(t) = P(t)Q.  (8.2)
The two differential equations are called "backward" and "forward" respectively, and the latter is said to be a particular case of the Fokker–Planck equation. All these results can be mathematically (viz. rigorously) proved but the details (as e.g. spelled out in [4]) might deter if not dissuade the novice.
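The common solution of both equations in (8.2) with P(0) = identity is P(t) = e^{tQ}; for the computing reader, here is a sketch with an illustrative generator Q (off-diagonal entries ≥ 0, rows summing to 0), using a truncated power series rather than any serious numerical method.

```python
# illustrative 3-state generator Q; P(t) = exp(tQ) solves (8.2)
Q = [[-1.0, 0.6, 0.4],
     [0.3, -0.5, 0.2],
     [0.5, 0.5, -1.0]]

def mat_exp(A, t, terms=60):
    """Truncated power series for exp(tA): sum over k of (tA)^k / k!."""
    n = len(A)
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in R]
    for k in range(1, terms):     # term <- term * (tA) / k
        term = [[sum(term[i][m] * A[m][j] for m in range(n)) * t / k
                 for j in range(n)] for i in range(n)]
        R = [[R[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return R

Pt = mat_exp(Q, 0.7)
print(all(abs(sum(row) - 1.0) < 1e-9 for row in Pt))    # rows of P(t) sum to 1
H = mat_exp(Q, 0.35)                                    # semigroup: P(s)P(t) = P(s+t)
HH = [[sum(H[i][m] * H[m][j] for m in range(3)) for j in range(3)] for i in range(3)]
print(all(abs(HH[i][j] - Pt[i][j]) < 1e-9 for i in range(3) for j in range(3)))
```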
What has all this to do with the stochastic process? We begin with scrutinizing the set-theoretic identities:

sup {t > 0 : X(w, s) = i for all s in (0, t)}
= sup {t > 0 : X(w, s) = i for all s in (0, t]}
= inf {t > 0 : X(w, t) ≠ i}.  (8.3)
Define this function of w by T(w) and name it the sojourn time in i, under the hypothesis X(w, 0) = i. This is a random time. In the third incarnation above, it is reminiscent of several first entrance times in discrete time defined in the previous sections, although it is manifestly a first exit time. No matter, we can now record the result painstakingly proved in §7 by (7.6) and (7.8) as follows:

P^i{T > t} = e^{q_ii t},  (8.4)

which is true for t in [0, ∞). The probability law in (8.3) is the exponential, and −q_ii is sometimes called its "rate". The chemical element radium is supposed to decay at a certain rate exponentially; among other applications it renders an estimate of the age of archaeological findings such as the ice man recently discovered in the Alps. The exponential law, being so simple, is cherished by applied scientists and can be retrieved analytically by its "memoryless" property:

P{T > s + t | T > s} = P{T > t},

for all s ≥ 0 and t ≥ 0, but "that is another story". Our next question is the random place X(T) = X(w, T(w)). The alert reader should have observed from the first equation in (8.3) that this may well be equal to i and therefore useless. Conventional wisdom suggests that our sample function X(w, ·) should take some value other than i after T(w), namely the anytron should jump to another state j ≠ i. If this is obvious to the physicist, it is questionable from the unrelenting, querulous mathematical point of view. The fault lies with the continuum of time: there are so many instants of time right after the random T(w), even if we admit Doob's S-set: how does one know that the function s → X(w, s) cannot take various different values as s decreases to T(w)? Namely, it is possible that there are s_1 > s_2 > ··· ↓ T(w) such that X(w, s_n) = j, X(w, s_{n+1}) = k where j ≠ k (one of them may even be i!). For such a sample, how can we speak of the state of the anytron when it exits from i? We cannot, but fortunately we can prove, preemptively, that such bad-behaving samples may be ignored,
discarded or purged (in the Russian vocabulary). Namely they belong to a P-null set and contribute zero to any calculable numerical probability.

Theorem 8.1 (Re-invention of Markov chain). We may suppose all sample functions to be right continuous and to have left limits everywhere in (0, ∞) (left limit at ∞ being excluded).

The proof of course depends on the special character of the process accruing from its transition matrix and the finiteness of the state space, as well as the preliminary S-approachability where S may be taken to be the dyadic set because condition (7.9) is satisfied. The result may be regarded as a junior companion to the continuous sample functions for the Brownian Motion Process mentioned at the end of §7. But this particular re-invention tends to be down-played as if it were "easy to see"—it is definitely not, as a glance at pp. 248ff in Doob's book cited above will show. Now we can find out about the existing right limit at T(w): X(w, T(w)+), which will be abbreviated to X(T+) or even X(T). The theorem above says that it is equal to some j not i, and we want to compute its distribution. The conditional probability that in a short time h, the anytron quits i and reaches j is P{X(t + h) = j | X(t) = i and X(t + h) ≠ i}. Owing to time-homogeneity we may take t = 0 and so get p_ij(h)/(1 − p_ii(h)). As h ↓ 0, the limit must be the probability of a jump from i to j:

lim_{h↓0} p_ij(h)/(1 − p_ii(h)) = lim_{h↓0} [p_ij(h)/h] / [(p_ii(0) − p_ii(h))/h] = q_ij/(−q_ii).  (8.5)

As a beautiful check we have

Σ_{j≠i} q_ij + q_ii = Σ_j p'_ij(0) = 0,  hence  Σ_{j≠i} q_ij/(−q_ii) = 1.  (8.6)
From here on let q_i denote −q_ii (> 0).
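To see (8.5) numerically, one can expand p_ij(h) to second order in h, taking P(h) = I + hQ + h²Q²/2 (our truncation) with the same illustrative generator as above, and watch the ratio approach q_ij/q_i.

```python
# numerical look at (8.5); the generator Q is illustrative only
Q = [[-1.0, 0.6, 0.4],
     [0.3, -0.5, 0.2],
     [0.5, 0.5, -1.0]]
Q2 = [[sum(Q[i][k] * Q[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

i, j = 0, 1
for h in (0.1, 0.01, 0.001):
    p_ij = h * Q[i][j] + 0.5 * h * h * Q2[i][j]       # p_ij(h) to second order
    p_ii = 1 + h * Q[i][i] + 0.5 * h * h * Q2[i][i]   # p_ii(h) to second order
    print(p_ij / (1 - p_ii))   # tends to q_ij / q_i = 0.6 / 1.0
```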
The novice will be disgruntled to be told that whereas the conclusion in (8.5) is correct, the argument does not constitute a mathematical proof.23 Even if we assume all sample functions to be right continuous, etc. As Schrödinger shrewdly perceived, continuous time is ungovernable and must be tamed by discrete imitations. Something like Archimedes measuring the

23 Such a "proof" is given in some textbooks without ado. It is difficult to explain to a student what is wrong with it. It is a little like telling a foreigner a grammatical or syntactic mistake in a language that he has not yet learned.
slippery circle circumference by persistently inscribing and circumscribing polygonal chords and tangents. In the present case we must approximate the random continuous T(w) by discrete "skeletons": T by T^(m) and X(T) by X(T^(m)), where

    T^(m)(w) = ([2^m T(w)] + 1) / 2^m
for m ∈ N. Each new random time T^(m) takes dyadic (binary) numbers as its values. This corresponds to the squares, octagons, . . . (what are the Greek names for 16, 64?). As m increases indefinitely, T^(m) decreases to T and X(T^(m)) approaches X(T) by right-continuity. This was how the following result was proved by Doob (loc. cit.) in a tedious, excruciating way.

Theorem 8.2 (Joint distribution of T and X(T)). For each t ∈ R and j ≠ i, we have

    P^i{T ≤ t and X(T) = j} = (1 − e^{−q_i t}) q_ij/q_i.    (8.7)
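Note that the right side of (8.7) factorizes as P^i{T ≤ t} · q_ij/q_i: the exit time and the landing state are independent, T being exponential with rate q_i. This can be checked against the infinitesimal description itself by simulating a discrete skeleton that, per step of size h, jumps to j with probability q_ij h and otherwise stays. A Monte Carlo sketch (the rates below are arbitrary illustrative values, not from the text):

```python
import math, random

random.seed(1)
q = {1: 0.5, 2: 1.5}        # jump rates q_i1, q_i2 out of state i (illustrative)
q_i = sum(q.values())        # total rate q_i = 2.0
h, t, j, n = 2e-3, 0.4, 2, 20_000

hits = 0
for _ in range(n):
    T, J = 0.0, None
    while J is None:         # discrete skeleton: per step of size h,
        T += h               # jump to 1 w.p. q_1*h, to 2 w.p. q_2*h, else stay
        u = random.random()
        if u < q[1] * h:
            J = 1
        elif u < (q[1] + q[2]) * h:
            J = 2
    if T <= t and J == j:
        hits += 1

predicted = (1 - math.exp(-q_i * t)) * q[j] / q_i   # right side of (8.7)
assert abs(hits / n - predicted) < 0.02
```

The empirical frequency matches (8.7) up to Monte Carlo noise and an O(h) discretization bias, in the spirit of the dyadic skeletons above.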
It is done by tracking the trajectory, namely cherchez the sample function. There is no other way. Let us normalize (Q) into a "stochastic matrix" (known to algebraists):

    v_ij = q_ij/q_i  if j ≠ i;    v_ij = 0  if j = i.
A HMC in discrete time can be constructed with V as its P, as in (6.11). We call it the "jump chain" and denote it by {Y_n, n ∈ N}. The intuitive picture is clear: the chain {X_t, t ∈ R} in continuous time follows the "road map" indicated by its jump chain, at a succession of jumping times T_1 < T_2 < . . . ; when X(T_n+) = j, T_{n+1} − T_n is the sojourn time in j, distributed exponentially with rate q_j. The whole path t → X(w, t) is an (extended) step function in R, very easy to visualize. There is just one little question: is it clear that this picture (Schrödinger's "description") tells all? In symbols, is it certain that lim_n T_n = ∞? I have often wondered if the practising scientist can see this. A quick check is to compute the mean, viz. E(T_n): this is easy because the mean of an exponential distribution with rate q_j is q_j^{−1}. Although we do not know what the successive q_j's are, we know there is a maximum among them, call it c, so that E(T_n − T_{n−1}) ≥ c^{−1} for all n and therefore E(T_n) ≥ n c^{−1} → ∞, quick and easy. Unfortunately this does not imply that T_n → ∞ by a long
shot. The failure was forewarned in (1.14): its converse is false. To stress the point: for a random variable Z ≥ 0, it is possible that E(Z) = ∞ while Z < ∞. We will give an elementary proof of the desired result along the lines of (4.11). We have24

    P{T_n − T_{n−1} ≤ 1 | T_1, . . . , T_{n−1}} ≤ 1 − e^{−c},

hence for any m ≥ 1:

    P{T_n − T_{n−1} ≤ 1 for all n ≥ m} ≤ (1 − e^{−c})^∞ = 0.    (8.8)
This means the event {T_n − T_{n−1} > 1, i.o.} has probability one, namely an infinite number of the successive sojourn times exceed 1 (micro-second, or light-year) and so, of course, T_n increases to infinity. By the way, those sojourn times exceeding one in duration are random-random times, namely the n for which T_n − T_{n−1} > 1 is a function of the hidden w: n = n(w), a random integer.

It is possible that q_i = 0, in which case (8.4) shows that T = ∞ a.s. Such a state is called "absorbing" in a physico-chemical context. Since time is our concern it will be called "permanent": one must rectify the name.25 In discrete time the state is permanent iff p_ii = 1. We have met with such states at the outset of our discourse: in (1.1) both a and b are permanent. The gambler's problem consists in the probability of ending up at the one or the other. There are many practical problems of this sort.

To return to the description of a HMC in continuum time: Time from zero to infinity is dissected into a denumerable number of contiguous random intervals [T_{n−1}, T_n), n = 1, 2, 3, . . . with T_0 = 0. In each one of these the "chain" stays in a fixed state. The lengths of these random intervals are distributed independently of one another given the states they represent, and they are all exponentially distributed with rate q_i for the state i. The total number of the intervals is infinite unless the chain gets into a permanent state.

24 The Strong Markov Property is used here: given the pre-T_{n−1} tribe, the random variable T_n − T_{n−1} is exponentially distributed with rate q_{X(T_{n−1})}, a random number, one of the q_j, 1 ≤ j ≤ L.

25 Tzu Lu asked Confucius what would be his first order of business to govern the state Wei. "That must be to rectify the names," answered the master. "What a pedant you are!" cried the bravado pupil. (The Analects, Chapter 7; my free rendering.)
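The description above (jump chain plus independent exponential sojourn times) is exactly how such a chain is simulated in practice. A minimal sketch, assuming a finite conservative Q-matrix given as a list of rows; the example matrix is an arbitrary illustration, not from the text:

```python
import random

def simulate_chain(Q, i0, t_max, rng=random):
    """Follow the road map of the jump chain: in state i, hold for an
    Exponential(q_i) sojourn, then jump to j != i with probability q_ij/q_i.
    Returns the step function as a list of (jump time, new state) pairs."""
    path, t, i = [(0.0, i0)], 0.0, i0
    while t < t_max:
        q_i = -Q[i][i]
        if q_i == 0.0:               # permanent (absorbing) state: T = infinity
            break
        t += rng.expovariate(q_i)    # sojourn time in state i
        if t >= t_max:
            break
        u, acc = rng.random() * q_i, 0.0
        for j, q_ij in enumerate(Q[i]):
            if j != i:
                acc += q_ij
                if u < acc:          # jump to j with probability q_ij/q_i
                    i = j
                    break
        path.append((t, i))
    return path

random.seed(2)
Q = [[-2.0, 1.0, 1.0], [1.0, -3.0, 2.0], [0.5, 0.5, -1.0]]  # illustrative
path = simulate_chain(Q, 0, 10.0)
assert path[0] == (0.0, 0)
assert all(t2 >= t1 for (t1, _), (t2, _) in zip(path, path[1:]))
```

Since all q_i here are bounded, T_n → ∞ and the simulated path over [0, t_max] is a complete description, as argued above.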
9. The Trouble with the Infinite
In the preceding discussion the state space I (physical configuration space) is finite. As discussed in §6, for a HMC in discrete time a theory has been developed for a denumerably infinite, discrete space. The additional qualifier "discrete" is here used in the topological diction, meaning "no accumulation", so that e.g. the set of rationals or the smaller set of dyadics is excluded but the set of integers or the smaller set of prime numbers is allowed. Thus we may as well use N for I, confounding discrete time with space when there is no fear of confusion. For such an I a HMC can be constructed as before, and the relations in (8.1) hold despite the fact that we are now multiplying infinite matrices (and columns like 1), which is not always permissible for arbitrary infinite matrices. In particular the famous Frobenius–Perron algebraico-analytic treatment of positive matrices no longer "works", at least for our problems. But all p_ij are still continuous and the initial derivatives p'_ij(0) still exist; however −p'_ii(0) = q_i may be +∞, which serves as a timely warning. Following Paul Lévy (1951), the state is "instantaneous" if q_i = ∞, "stable" if q_i < ∞ ("permanent" if q_i = 0). Incredibly, all « oo
So the fictitious state makes a new appearance, this time however as a left limit, in contrast to its former appearance as a right limit at an exit time discussed earlier. There is symmetry here. By the way, now is the time to look again at our initial re-invention of the process: when we make it right continuous and with left limits, we mean (without saying so then) in the compactified I*, not in I. This is also necessary when the state space is (−∞, +∞), which must be compactified to [−∞, +∞]. The inescapable role of the infinite is revealed in the following theorem [8, p. 249, Theorem 4].

Theorem 9.2. A necessary and sufficient condition for the validity of both systems (9.5) is that almost all sample functions X(w, ·) have the property: wherever X(w, s) → ∞ as s → t from one side, it does so also from the other side.
Namely the sample function cannot leap to infinity (as at an exit time discussed above), or leap back to any finite state (as at the first infinity). The conservatism of the Q-matrix precludes the first kind of leap, and is a necessary and sufficient condition for the validity of the first (backward) system. No such analytic condition has ever been found for the validity of the second (forward) system. So Fokker is trickier. A sufficient condition for both systems is the boundedness of the set of numbers q_i, i ∈ I. Indeed under this assumption the chain behaves in the same way as described in §8 for a finite I; for a special treatment of this case see [17]. Here is a famous example. Let all q_n = λ, any constant greater than 0, and let q_{n,n+1} = λ as in (9.6). The Markov chain with this Q-matrix is the Poisson process, introduced in (5.16) by means of random times. In general, when the q_i are bounded, the first infinity time is infinite with probability one, and so the minimal solution is the P(t) of the chain, and the chain behaves in the same way as described in §8 for a finite I.

On the set {w : T_∞(w) < ∞} we define the post-first-infinity process similarly to the post-exit process (9.3):

    Z(w, t) = X(w, T_∞(w) + t),    t ∈ R.
Observe that Z(w, 0) need not be ∞ because it is the limit of X(w, t) as t approaches T_∞(w) from the right. We have for every t > 0: P^i{Z_t ∈ I} = 1 for every i ∈ I, as in the case of Y_t before (there the initial state i is given). But now the after-life of the process opens up a new vista. This was initiated by Feller (1957) as a "boundary" theory, a ramification of the fictitious state into diverse boundary points as X(w, t) goes to the boundary when t increases to T_∞(w). It turns out that some boundary points are "sticky", others not, and the path's return from the boundary may take multifarious ways. Sooner or later it may go up to another boundary point, and return to terra firma, and so on and so forth. A glimpse of a putative picture is given in [16]; details are in [15]. It is all done with random times.

Finally, we will say a few words about instantaneous states, so far excluded. To describe them we must define the random time-set:

    A_i(w) = {t ∈ R : X(w, t) = i}.
In analogy with (6.25), this may be called the "occupation time" of the state i. If i is stable, it can be shown to be a sequence of separated intervals. If i is instantaneous, it is a perfect set nowhere dense in R, and has Borel–Lebesgue measure greater than zero. Namely, except for its measure it
is like the famous Cantor set that Schrödinger described in his lectures cited above to illustrate the wonder of the continuum. A basic property of an instantaneous state i is the following:

    P^i{liminf_{t↓0} X(w, t) = i and limsup_{t↓0} X(w, t) = ∞} = 1.

Thus an instantaneous state i and the fictitious state ∞ co-habit in inseparable symbiosis. Are there quantum physical phenomena which might make use of this mathematical model?

Comparing the simplicity of a finite I with the complications of an infinite I, one may well question the practical wisdom of the ideal extension.30 A conventional reply to this non-rhetorical query is "approximation". How to do it in any particular situation in a meaningful way is an important problem for the applied scientist. A case in point is the so-called "explosion" in (9.9); its non-monotonic feature suggests occasional relief due to human control or nature's accidents such as occur in population growth. And what about the reverse explosion in (9.3)? Does it not sound like a big bang?
References

1. Antoninus, Marcus Aurelius, Meditations.
2. Émile Borel, Valeur pratique et philosophie des probabilités (Gauthier-Villars, 1939, 1952).
3. Kai Lai Chung, A course in probability theory, 3rd ed. (Academic Press, 2001) (1st ed.: 1968).
4. J. L. Doob, Stochastic processes (John Wiley, 1953).
5. Kai Lai Chung, Elementary probability theory with stochastic processes, 4th ed. (Springer-Verlag, 2000) (1st ed.: 1974). Cited page numbers are for the 3rd ed. (1979).
6. William Feller, An introduction to probability theory and its applications, Vol. 1, 3rd ed. (John Wiley, 1968) (1st ed.: 1950).
7. Kai Lai Chung, Green, Brown, and probability & Brownian Motion on the line, 2nd ed. (World Scientific, 2001) (1st ed.: 1995).
8. Kai Lai Chung, Markov chains with stationary transition probabilities, 2nd ed. (Springer-Verlag, 1967) (1st ed.: 1960).
30 On p. 38 of Nature and the Greeks, Schrödinger quoted "an eminent scientist" who told us that the total number of elementary particles in the world was 16 × 17 × 2^256, where 256 is the square of the square of the square of 2. The number 17 was left unexplained. It is known as a magic number in Italy and happens also to be the number of my old house on Wushan Road, Hangchow, China.
9. Roger Penrose, The Emperor's New Mind (Oxford University Press, 1989).
10. Kai Lai Chung, "Reminiscences of one of Doeblin's papers", Contemporary Mathematics 149 (American Mathematical Society, 1993).
11. Erwin Schrödinger, Science and Humanism (Cambridge University Press, 1951).
12. Isaac Todhunter, History of the Mathematical Theory of Probability from the time of Pascal to that of Laplace (Macmillan, 1865).
13. Kai Lai Chung, "Poisson process as renewal process", Period. Math. 2 (1972), 43-52.
14. Kai Lai Chung and John B. Walsh, "To reverse a Markov process", Acta Math. 123 (1969), 225-251.
15. Kai Lai Chung, "Boundary behavior of Markov chains and its contributions to general processes", in Proc. International Congress of Mathematicians (Nice, 1970), 499-505.
16. Kai Lai Chung, Lectures on boundary theory for Markov chains (Princeton University Press, 1970).
17. Kai Lai Chung, "Stochastic Analysis of Q-matrix", in Selected Topics on Stochastic Modelling (World Scientific, 1994), 3-16.

Nota bene: A number of references given in the text are not repeated in this list.
Foreword to Part 2
After the series of lectures given in 1997 by Professor Kai Lai Chung at the Group of Mathematical Physics in Lisbon, where he described many kinds of random time, we started to talk more generally about the strange status of probability theory in quantum mechanics. The subject has been, to say the least, much debated from the very beginning of quantum theory. A number of peremptory "solutions of the problem" have been announced in mathematical physics, then discreetly withdrawn. But what problem? Every day, hundreds of quantum experiments are carried out around the planet which confirm the theory. The mathematical apparatus, axiomatized by von Neumann in 1932, has reached a remarkable level of sophistication and elegance, allowing us to answer most of the questions relevant to a given experimental set-up. Most but not all. Some apparently elementary problems have resisted so far. For example, consider a beam of free quantum particles in R³, starting from the origin O. We would like to measure their time of arrival at the boundary of a sphere of radius x centred at O. This problem has no solution in quantum theory because the solution would involve a time observable, and it is known (at least since W. Pauli) that such a time would have properties conflicting with the axiomatic (von Neumann's) concept of observable. Many physicists would consider this explanation as sufficient and would not even regard this state of affairs as unfortunate. According to von Neumann (1932), however, it is no more and no less than the "essential weakness" of quantum theory. At around the same time as von Neumann, in a series of conferences at the Institut Henri Poincaré, Schrödinger described the conciliation between special relativity and quantum mechanics as the most severe difficulty of theoretical physics. It was obvious to him that we should renounce our Newtonian concept of time,
"much too classical, and not only because of relativity" [34, section VI, part 2]. In his words: "The time is an observable and must be treated as an observable". So the time should be random. Now, consider instead an analogue of the abovementioned elementary problem, where the free quantum particles are replaced by an R³-valued Wiener process. Then the first hitting time on the boundary is a well defined random variable, an optional time about which everything is known explicitly. According to the first author of these notes: "This is the single tool that separates probabilistic methods from others, without which the theory of Markov processes would lose much of its strength and depth" ([19, p. 6] of Part 2). On the other hand, in what specific sense should we regard this "analogous" problem as the proper probabilistic counterpart of the quantum one? The central (von Neumann's) axiom of quantum theory is that of "quantum statics" (axiom 2.3 here). It is fundamentally probabilistic. But the very basis of what is needed to start thinking, in probability theory, i.e. a probability space, is never specified. So, although the result of most quantum experiments is intrinsically random, these are not random experiments in the sense of probability theory. To A. Einstein's despair, it seems that God does, indeed, play dice and that He does it without any respect for the rules of classical probability theory. This strange situation is at the origin of particularly complicated relations between the two associated scientific communities. A number of probabilists regard quantum theory as a mystery not far from irrational. A few of them announce periodically that they have solved all the problems at once, in general after the discovery of a probabilistic interpretation of a particular Schrödinger equation, sometimes known for more than thirty years.
Feynman's path integral approach to quantum mechanics is by far the most puzzling for probabilists, as it is very close to the familiar ideas of stochastic analysis developed since N. Wiener. However, they realize immediately that the symbolic stochastic processes manipulated informally by Feynman cannot make mathematical sense, and so cannot induce any measure in a space of paths. Moreover, they are told by physicists that the probabilistic ("Euclidean") method due to M. Kac, and regarded as the rigorous counterpart of Feynman's, is technically useful but conceptually very far from the real thing. It is all the more frustrating for probabilists to see the extraordinary influence of Feynman's ideas in modern mathematics and physics, since they know how shaky their foundations are. And if one remembers that the mathematical formulation of Feynman's approach
is certainly one of the burning open problems of modern mathematical physics, there should be no surprise that any new serious attempt in this direction is much debated. Also notice that the common interpretation of Feynman's ideas does not make any room either for physically meaningful random times, though in this perspective their absence is considerably more shocking than in Hilbert space. Symmetrically, physicists are used to the principle that probability theory and stochastic analysis are of no use at all in quantum physics. Although some mathematical methods from Euclidean quantum field theory have been successfully imported into equilibrium statistical mechanics since the seventies, it does not seem that quantum dynamics itself has ever benefited directly from any such classical analogy. As regards the path integral approach, most physicists know about the mathematical difficulties associated with the presence of the factor √−1 in front of the action functional in Feynman's representation of the solution of Schrödinger's equation. Consequently, many convinced themselves that only time-discretized results are physically meaningful (since the continuum limit does not exist), although the father of path integrals often complained about the lack of proper technical tools to go beyond such inelegant expressions. Even among theoretical physicists who use Feynman's ideas on a daily basis, there is a kind of soft consensus that they can be completely understood without the slightest knowledge of the theory of stochastic processes. Part 2 is elementary. Its aim is to show that the gap between quantum mechanics and stochastic analysis is indeed very small, if one adopts an appropriate perspective. We will do this in the simplest possible context, i.e., that considered by Feynman, of quantum systems with a classical analogue.
The two communities involved are perfectly entitled to their mutual suspicion as long as they refuse to abandon the conventional mindsets mentioned before. However, there is a consistent line of thought, as old as quantum mechanics itself, which is a natural playground for both. Feynman's approach, in particular, can be more deeply understood in this perspective. So much so that his ideas can be elaborated in order to find qualitatively new results about standard quantum theory in Hilbert space. Given this preamble, it will be clear why we have adopted a presentation stressing much more than usual the historical evolution of the ideas. We believe that if one wishes to go beyond the vague and sterile analogies which are at the origin of the abovementioned suspicions, there are very few independent and consistent strategies available and that, somehow,
their basic ideas have been around for a long time, but have been ignored or misunderstood. This second part of these notes is an introduction to quantum randomness, where the role of the (deterministic) concept of time is considerably more emphasized than in traditional presentations. It summarizes, first, the origin of the trouble with probability in elementary quantum mechanics and shows why, in a way, it is illusory to claim that the problem can be solved just by the introduction of some underlying stochastic processes (unless, of course, one tries to construct a completely different theory). However, the conceptual foundations of the Feynman path integral approach are revisited, and it is shown that they suggest a general construction of probability measures on path space quite distinct from the usual one. Then it is described how this construction can be done rigorously, in a probabilistic analogy different from that of M. Kac and suggested, in fact, by Schrödinger in [34], just after his alarming diagnosis on the lack of a time observable. This analogy is so strong that it allows us to understand, for example, fresh aspects of quantum symmetries in Hilbert space. In this perspective, we can claim simultaneously that quantum mechanics is not a stochastic theory but that probability theory and stochastic analysis are very natural tools for further investigation of the structure of quantum theory, including some of its fundamental problems. In particular, the abovementioned analogy should allow us to randomize the time parameter in a way suggesting the existence of a kind of time-dependent potential theory, which is interesting both for probabilists and quantum physicists. Such a theory is not given here, but we hope to show in which sense it is reasonable to expect that one may be constructed. So, although the notes are elementary, their two parts taken together constitute a research program which may be of interest for graduate students in mathematics and physics.
We have had to adapt the style of these notes to their unconventional purpose, namely to think afresh about the role of time in probability and quantum physics. Throughout the years, many publications, in physics and mathematics, have contributed to the mutual suspicion mentioned before. The problem seems to lie at the level of the very basic principles, and therefore a clarification may be needed. So, although we refer to rigorous and detailed works, complete proofs are generally not given here. Instead, the conceptual background is carefully described, in an historical perspective, in order to emphasize the continuity of the viewpoint advocated in these notes.
Our reader is assumed to have only a basic knowledge of probability theory and quantum physics, but a serious curiosity about their conceptual foundations. We wish to advocate a more constructive dialogue between the associated communities, which should be exciting and rewarding for each of them, if the priorities of the other are taken into consideration. Any interdisciplinary exchange of ideas is hard: it requires, after all, at least some interest for two fields of knowledge. We hope to convey to our readers the feeling that it can also be very stimulating.

I am very grateful to F. Bardou and A. B. Cruzeiro for their critical reading of a first version of Part 2.

Jean-Claude Zambrini
Lisbon, 2001
Part 2. Introduction to Quantum Randomness

1. Classical Prologue
To think about a random time, we have seen that we need a random walk (or process) to start with. There are a number of physical phenomena where such processes are omnipresent. They belong to the province of statistical mechanics, this venerable domain of classical physics created in the nineteenth century by Maxwell, Boltzmann and Gibbs. All accept that in order to study the properties of systems of 10^23 particles it is hopeless to investigate their Newtonian ordinary differential equations (ODEs); these are much too complex and their initial conditions are inaccessible. Average values over ensembles of systems are what we actually observe, and this is where stochastic processes appear naturally. In short, one may say that, although the foundations of statistical mechanics have been the scene of violent polemic, the natural role of probability theory has never been in doubt in this context.

The situation is remarkably different in (non-relativistic) quantum mechanics. We are going to do a tour of elementary quantum theory, emphasizing the obscure meaning of probability in this framework as well as the very discreet role played there by time. In consequence, we shall not be able to produce a general quantum version of random time (nobody has ever had a serious clue about this) but we hope to show why the problem of its construction is, indeed, serious, and advocate a fresh and more systematic way to approach it.

The shortest way to introduce quantum mechanics is to list the axioms put forward by J. von Neumann. It is easiest to start with the corresponding axioms for classical mechanics since we live in a classical world.

Axiom 1.1. To every (classical) physical system corresponds a differentiable manifold, the phase space S, and to every (pure) state ξ of the system corresponds a point in S.

All over theoretical physics, the concept of state of a physical system is defined in such a way that its knowledge determines uniquely the solution of the equation of motion of this system. The archetype of a mechanical system is the N-body problem, whose formulation appeared in Newton's Philosophiae Naturalis Principia Mathematica (1687). If q_i = q_i(t) denotes the Cartesian coordinates of the i-th body (a vector in R³) at time t, the equation of motion of this system is

    m_i (d²q_i/dt²) = − Σ_{j≠i} G m_i m_j (q_i − q_j) / |q_i − q_j|³,    i = 1, . . . , N,    (1.1)
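The right-hand side of (1.1) is straightforward to evaluate numerically. A minimal sketch of the gravitational accelerations it prescribes, with G = 1 in natural units (the unit choice and the sample configuration are assumptions for illustration, not from the text):

```python
G = 1.0  # gravitational constant in natural units (illustrative choice)

def accelerations(q, m):
    """d^2 q_i/dt^2 from (1.1): a_i = -sum_{j != i} G m_j (q_i - q_j)/|q_i - q_j|^3.
    q is a list of N position vectors in R^3, m the list of N masses."""
    N = len(q)
    a = [[0.0, 0.0, 0.0] for _ in range(N)]
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            d = [q[i][k] - q[j][k] for k in range(3)]      # q_i - q_j
            r3 = sum(x * x for x in d) ** 1.5              # |q_i - q_j|^3
            for k in range(3):
                a[i][k] -= G * m[j] * d[k] / r3
    return a

# two unit masses one unit apart attract each other with unit acceleration
a = accelerations([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [1.0, 1.0])
assert abs(a[0][0] - 1.0) < 1e-12 and abs(a[1][0] + 1.0) < 1e-12
```

Knowing the state (all positions and velocities) at one instant, these accelerations determine the entire future motion, which is the deterministic picture the quantum axioms will depart from.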