With λ = np we have

C_n^k p^k q^(n−k) ≦ (λ^k/k!) e^(−λ) e^X,  where  X = λ²/(2(n − λ)).

Consequently

P_m < e^(−λ) [1 + λ/1 + λ²/(1·2) + · · · + λ^m/(1·2·3 · · · m)] e^X,

that is,

P_m < e^X Q_m;  Q_m = e^(−λ) [1 + λ/1 + λ²/(1·2) + · · · + λ^m/(1·2·3 · · · m)].

On the other hand,

1 − P_m = Σ_(k=m+1)^n [n(n − 1) · · · (n − k + 1)/(1·2·3 · · · k)] p^k q^(n−k),

whence

1 − P_m < e^X (1 − Q_m)  and  P_m > e^X Q_m + 1 − e^X.
The final statement follows immediately from both inequalities obtained for P_m.
APPROXIMATE EVALUATION OF PROBABILITIES

10. With the usual notation, show that
where

Ω = (n − m)λ²/(2n²) − m(m − 1)/(2n).

Indication of the Proof.
DE MOIVRE: "Miscellanea analytica," Lib. V, 1730. DE MOIVRE: "Doctrine of Chances," 3d ed., 1756. LAPLACE: "Théorie analytique des probabilités," Oeuvres VII, pp. 280–285. CH. DE LA VALLÉE-POUSSIN: Démonstration nouvelle du théorème de Bernoulli, Ann. Soc. Sci. Bruxelles, 31. D. MIRIMANOFF: Le jeu de pile ou face et les formules de Laplace et de J. Eggenberger, Commentarii Mathematici Helvetici 2, pp. 133–168.
CHAPTER VIII

FURTHER CONSIDERATIONS ON GAMES OF CHANCE

1. When a person undertakes to play a very large number of games under theoretically identical conditions, the inference to be drawn from Bernoulli's theorem is that that person will almost certainly be ruined if the mathematical expectation of his gain in a single game is negative. In case of a positive expectation, on the other hand, he is very likely to win as large a sum as he likes in a sufficiently long series of games. Finally, in an equitable game, when the mathematical expectation of a gain is zero, the only inference to be drawn from Bernoulli's theorem is that his gain or loss will likely be small in comparison with the number of games played. These conclusions are appropriate, however, only if it is possible to continue the series of games indefinitely, with an agreement to postpone the final settling of accounts until the end of the series.
But if the
settlement, as in ordinary gambling, is made at the end of each game, it may happen that even playing a profitable game one will lose all his money and will have to discontinue playing long before the number of games becomes large enough to enable him to realize the advantages which continuation of the games would bring to him. A whole series of new problems arises in this connection, known as problems on the duration of play or ruin of gamblers.
Since the science
of probability had its humble origin in computing chances of players in different games, the important question of the ruin of gamblers was discussed at a very early stage in the historical development of the theory of probability.
The simplest problem of this kind was solved by
Huygens, who in this field had such great successors as de Moivre, Lagrange, and Laplace.

2. It is natural to attack the problem first in its simplest aspect, and then to proceed to more involved and difficult questions.

Problem 1. Two players A and B play a series of games, the probability of winning a single game being p for A and q for B, and each game ends with a loss for one of them. If the loser after each game gives his adversary an amount representing a unit of money and the fortunes of A and B are measured by the whole numbers a and b, what is the probability that A (or B) will be ruined if no limit is set for the number of games?
INTRODUCTION TO MATHEMATICAL PROBABILITY

Solution.
It is necessary first to show how we can attach a definite numerical value to the probability of the ruin of A if no limit is set for the number of games. As in many similar cases (see, for instance, Prob. 15, page 41) we start by supposing that a limit is set. Let n be this limit. There is only a finite number of mutually exclusive ways in which A can be ruined in n games: either he can be ruined just after the first game, or just after the second, and so on. Denoting by p_1, p_2, . . . p_n the probabilities for A to be ruined just after the first, second, . . . nth game, the probability of his ruin before or at the nth game is

p_1 + p_2 + · · · + p_n.
Now, this sum, being a probability, must remain ≦ 1 however large n may be.

If none of the numbers

y_0, y_1, . . . y_{a−1};  y_{a+b}, y_{a+b−1}, . . . y_{a+b−β+1}

is negative, then y_x ≧ 0 for x = 0, 1, 2, . . . a + b.

Proof. Let u_x^(k) (k = 0, 1, 2, . . . a − 1) represent the probability that the player A whose actual fortune is x (and that of his adversary a + b − x) will be forced to quit when his fortune becomes exactly k. Evidently u_x^(k) is a solution of equation (4) satisfying the conditions

u_x^(k) = 0 for x = 0, 1, . . . k − 1, k + 1, . . . a − 1; a + b, a + b − 1, . . . a + b − β + 1;  u_k^(k) = 1.

Similarly, if v_x^(l) (l = 0, 1, 2, . . . β − 1) represents the probability that the player B will be forced to quit when the fortune of A becomes exactly a + b − l, v_x^(l) will be a solution of (4) satisfying the conditions

v_x^(l) = 0 for x = 0, 1, 2, . . . a − 1; a + b, . . . a + b − l + 1, a + b − l − 1, . . . a + b − β + 1;  v_{a+b−l}^(l) = 1.

Thus we get a + β particular solutions of (4), and it is almost evident that these solutions are independent. Moreover, since they represent probabilities, u_x^(k) ≧ 0, v_x^(l) ≧ 0 for x = 0, 1, 2, . . . a + b. Now, any solution y_x of (4) with given values of

y_0, y_1, . . . y_{a−1};  y_{a+b}, y_{a+b−1}, . . . y_{a+b−β+1}

can be represented thus:

y_x = Σ_(k=0)^(a−1) y_k u_x^(k) + Σ_(l=0)^(β−1) y_{a+b−l} v_x^(l).

Hence, y_x ≧ 0 for x = 0, 1, 2, . . . a + b if none of the numbers

y_0, y_1, . . . y_{a−1};  y_{a+b}, y_{a+b−1}, . . . y_{a+b−β+1}

is negative.
This interesting property of the solutions of equation (4)
derived almost intuitively from the consideration of probabilities can be established directly. (See Prob. 9, page 160.)

The lemma just proved yields almost immediately the following proposition: If for any two solutions y′_x and y″_x of equation (4) the inequality y′_x ≦ y″_x holds for

x = 0, 1, 2, . . . a − 1;  a + b, a + b − 1, . . . a + b − β + 1,

the same inequality will be true for all x = 0, 1, 2, . . . a + b. It suffices to notice that y_x = y″_x − y′_x is a solution of the linear equation (4) and, by hypothesis, y_x ≧ 0 for x = 0, 1, 2, . . . a − 1; a + b, a + b − 1, . . . a + b − β + 1.

Now we can come back to our problem. First, if the mathematical expectation of A, pβ − qα,
is different from 0, equation (7) has two positive roots: 1 and θ.
With arbitrary constants C and D,

y′_x = C + Dθ^x

is a solution of (4). Whatever C and D may be, y′_x as a function of x varies monotonically. Therefore, if C and D are determined by the conditions

y′_0 = 1,  y′_{a+b−β+1} = 0,

we shall have

y′_x ≦ 1  if  x = 0, 1, 2, . . . a − 1
y′_x ≦ 0  if  x = a + b − β + 1, . . . a + b,

and by the above established lemma, taking into account conditions (5) and (6), we shall have for the required probability the following inequality y_x ≧ y′_x, or, substituting the explicit expression for y′_x,

y_x ≧ (θ^(a+b−β+1) − θ^x)/(θ^(a+b−β+1) − 1).
If, on the contrary, C and D are determined by

y′_{a−1} = 1,  y′_{a+b} = 0,

we shall have

y′_x ≧ 1  if  x = 0, 1, 2, . . . a − 1
y′_x ≧ 0  if  x = a + b − β + 1, . . . a + b,

and

y_x ≦ (θ^(a+b−a+1) − θ^(x−a+1))/(θ^(a+b−a+1) − 1).

Finally, taking x = a, we
+ φ(β)α′^x] β^t dβ.

Here the integration is made along a circle C of sufficiently large radius, and f(β) and φ(β) are two unknown functions which can be developed into series of descending powers of β. Obviously z_{x,t} satisfies (13) identically in x and t. For x = 0 and t ≧ 0 we have the condition

(1/2πi) ∮_C [f(β) + φ(β)] β^t dβ = 1;  t = 0, 1, 2, . . .

which is satisfied if

(17)  f(β) + φ(β) = 1/(β − 1).

Condition (15) will be satisfied if

(18)  α^(a+b) f(β) + α′^(a+b) φ(β) = 0,

and it remains to show that at the same time (16) is satisfied. Solving (17) and (18), we have

f(β) = α′^(a+b)/[(β − 1)(α′^(a+b) − α^(a+b))],
φ(β) = −α^(a+b)/[(β − 1)(α′^(a+b) − α^(a+b))],

and

(19)  f(β)α^x + φ(β)α′^x = (α′^(a+b) α^x − α^(a+b) α′^x)/[(β − 1)(α′^(a+b) − α^(a+b))].
Now let α be the root vanishing for β = ∞ and α′ the other root of the same equation. Evidently αα′ = q/p, so that

α^(a+b) α′^x = (q/p)^x α^(a+b−x),  α′^(a+b) α^x = (q/p)^x α′^(a+b−x),

and (19) may be written

(q/p)^x (α′^(a+b−x) − α^(a+b−x))/[(β − 1)(α′^(a+b) − α^(a+b))].

The development of (19) for x = 1, 2, 3, . . . a + b − 1 does not contain terms involving the first power of 1/β, and hence z_{x,0} = 0 if x = 1, 2, 3, . . . a + b − 1, as it should be. The solution of (13) satisfying (14), (15), (16) being unique, its analytical expression is therefore

z_{x,t} = (1/2πi) ∮_C (q/p)^x · (α′^(a+b−x) − α^(a+b−x))/(α′^(a+b) − α^(a+b)) · β^t dβ/(β − 1),

whence for x = a and t = n

z_{a,n} = (1/2πi) ∮_C (q/p)^a · (α′^b − α^b)/(α′^(a+b) − α^(a+b)) · β^n dβ/(β − 1).
To find an explicit expression for z_{a,n} it remains to find the coefficient of 1/β in the development of

P = (q/p)^a · (α′^b − α^b)/(α′^(a+b) − α^(a+b)) · β^n/(β − 1)

in series of descending powers of β. This can be done in two different ways. First we can substitute for α′ its expression in α and present P in the form

P = [α^a − (p/q)^b α^(a+2b) + (p/q)^(a+b) α^(3a+2b) − (p/q)^(a+2b) α^(3a+4b) + · · ·] β^n/(β − 1).

But the coefficient of 1/β in

α^m β^n/(β − 1),

by the second solution of Prob. 3, is the probability y_{m,n} for a player with a fortune m to be ruined by an infinitely rich player in the course of n games. Hence, the final expression for z_{a,n} is

z_{a,n} = y_{a,n} − (p/q)^b y_{a+2b,n} + (p/q)^(a+b) y_{3a+2b,n} − (p/q)^(a+2b) y_{3a+4b,n} + · · ·,

the terms of this series being alternately of the form

+(p/q)^(k(a+b)) y_{(2k+1)a+2kb, n}  and  −(p/q)^(k(a+b)+b) y_{(2k+1)a+(2k+2)b, n}.
(1) a_{ii} > 0; a_{ki} ≦ 0 if k ≠ i; a_{1i} + a_{2i} + · · · + a_{ni} ≧ 0. (2) One of these sums is positive.

If these forms assume nonnegative values, then every x_i ≧ 0 (i = 1, 2, . . . n). Proof by induction: Express x_n through x_1, x_2, . . . x_{n−1}, thus:

x_n = (f_n − a_{1n}x_1 − a_{2n}x_2 − · · · − a_{n−1,n}x_{n−1})/a_{nn},

and substitute into the remaining forms. Show that the resulting forms in x_1, x_2, . . . x_{n−1} satisfy the same conditions (1) and (2). Hence, it remains to prove the proposition for two forms, which can easily be done.
References

CHR. HUYGENS: "De ratiociniis in ludo aleae," 1654.
DE MOIVRE: "Doctrine of Chances," 3d ed., 1756.
LAGRANGE: Mémoire sur les suites récurrentes dont les termes varient de plusieurs manières différentes, etc., Oeuvres IV, pp. 151ff.
LAPLACE: "Théorie analytique des probabilités," Oeuvres VII, pp. 228–242.
BERTRAND: "Calcul des probabilités," pp. 104–141, Paris, 1889.
MARKOFF: "Wahrscheinlichkeitsrechnung," pp. 142–146, Leipzig, 1912.

¹ The author is indebted to Professor Besicovitch of Cambridge, England, for the communication of this direct proof.
CHAPTER IX

MATHEMATICAL EXPECTATION

1. Bernoulli's theorem, important though it is, is but the first link in a chain of theorems of the same character, all contained in an extremely general proposition with which we shall deal in the next chapter. But before proceeding to this task, it is necessary to extend the definition of "mathematical expectation," an important concept originating in connection with games of chance.

If, according to the conditions of the game, the player can win a sum
a with probability p, and lose a sum b with probability q = 1 − p, the mathematical expectation of his gain is by definition

pa − qb.
Considering the loss as a negative gain, we may say that the gain of the player may have only two values,
a
and -b, with the corresponding
probabilities p and q, so that the expectation of his gain is the sum of the products of two possible values of the gain by their probabilities.
In this
case, the gain appears as a variable quantity possessing two values. Variable quantities with a definite range of values each one of which, depending on chance, can be attained with a definite probability, are called "chance variables," or, using a Greek term, "stochastic:' variables. They play an important part in the theory of probability. variable is defined
(a)
A stochastic
if the set of its possible values is given, and (b) if
the probability to attain each particular value is also given. It is easy to give examples of stochastic variables. The gain in a game of chance is a stochastic variable with two values. The number of points on a die that is tossed is a stochastic variable with six values, 1, 2, . . . 6, each of which has the same probability 1/6. A number on a ticket drawn from an urn containing 20 tickets numbered from 1 to 20 is a stochastic variable with 20 values, and the probability to attain any one of them is 1/20. Each of two urns contains 2 white and 2 black balls. Simultaneously, one ball is transferred from the first urn into the second, while one ball from the latter is transferred into the first. After this exchange, the number of white balls in one of the urns may be regarded as a stochastic variable with three values, 1, 2, 3, whose corresponding probabilities are, respectively, 1/4, 1/2, 1/4. It is natural to extend the
concept of mathematical expectation to stochastic variables in general.
Suppose that a stochastic variable x possesses n values x_1, x_2, . . . x_n, and let p_1, p_2, . . . p_n denote the respective probabilities for x to assume the values x_1, x_2, . . . x_n. By definition the mathematical expectation of x is

E(x) = p_1x_1 + p_2x_2 + · · · + p_nx_n.
It is understood in this definition that the possible values of the variable x are numerically different. For instance, if the variable is a number of points on a die, its numerically different values are 1, 2, 3, 4, 5, 6, each having the same probability 1/6. By definition, the mathematical expectation of the number of points on a die is

(1/6)(1 + 2 + 3 + 4 + 5 + 6) = 3.5.

If the variable is the number on a ticket drawn from an urn containing 20 tickets numbered from 1 to 20, its numerically different values are represented by numbers from 1 to 20, and the probability of each of these values is 1/20, so that the mathematical expectation of the number on a ticket is

(1/20)(1 + 2 + · · · + 20) = 10.5.
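Both values can be checked mechanically; here is a minimal Python sketch (the helper name `expectation` is mine, not the book's):

```python
from fractions import Fraction

def expectation(values, probs):
    """E(x) = p1*x1 + p2*x2 + ... + pn*xn."""
    return sum(Fraction(p) * v for v, p in zip(values, probs))

# A fair die: values 1..6, each with probability 1/6.
die = expectation(range(1, 7), [Fraction(1, 6)] * 6)

# A ticket drawn from 20 tickets numbered 1..20, each with probability 1/20.
ticket = expectation(range(1, 21), [Fraction(1, 20)] * 20)
```

Exact rational arithmetic gives 7/2 and 21/2, that is, 3.5 and 10.5.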
2. It is obvious that the computation of mathematical expectation requires only the knowledge of the numerically different values of the variables with their respective probabilities. But in some cases this computation is greatly simplified by extending the definition of mathematical expectation. Suppose that, corresponding to mutually exclusive and exhaustive cases A_1, A_2, . . . A_m, the variable x assumes the values x_1, x_2, . . . x_m with the corresponding probabilities p_1, p_2, . . . p_m; we can define the mathematical expectation of x by

E(x) = p_1x_1 + p_2x_2 + · · · + p_mx_m.
What distinguishes this extended definition from the original one is that in the second definition the values x_1, x_2, . . . x_m need not be numerically different; the only condition is that they are determined by mutually exclusive and exhaustive cases. To make this distinction clear, suppose that the variable x is the number of points on two dice. Numerically different values of this variable are

2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

and their respective probabilities

1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, 1/36.
Therefore, by the original definition, the expectation of x is

2·(1/36) + 3·(2/36) + 4·(3/36) + 5·(4/36) + 6·(5/36) + 7·(6/36) + 8·(5/36) + 9·(4/36) + 10·(3/36) + 11·(2/36) + 12·(1/36) = 252/36 = 7.

But we can distinguish 36 exhaustive and mutually exclusive cases according to the number of points on each die and, correspondingly, 36 values of the variable x, as shown in the following table:

First die 1; second die 1, 2, 3, 4, 5, 6; x = 2, 3, 4, 5, 6, 7
First die 2; second die 1, 2, 3, 4, 5, 6; x = 3, 4, 5, 6, 7, 8
First die 3; second die 1, 2, 3, 4, 5, 6; x = 4, 5, 6, 7, 8, 9
First die 4; second die 1, 2, 3, 4, 5, 6; x = 5, 6, 7, 8, 9, 10
First die 5; second die 1, 2, 3, 4, 5, 6; x = 6, 7, 8, 9, 10, 11
First die 6; second die 1, 2, 3, 4, 5, 6; x = 7, 8, 9, 10, 11, 12

The probability of each of these 36 cases being 1/36, by the extended definition the mathematical expectation of x is

(2 + 2·3 + 3·4 + 4·5 + 5·6 + 6·7 + 5·8 + 4·9 + 3·10 + 2·11 + 12)/36 = 7,
as it should be. It is important to show that both definitions always give the same value for the mathematical expectation. Let x_1, x_2, . . . x_m be the values of the variable x corresponding to mutually exclusive and exhaustive cases A_1, A_2, . . . A_m, and p_1, p_2, . . . p_m their respective probabilities. By the extended definition of mathematical expectation, we have

(1)  E(x) = p_1x_1 + p_2x_2 + · · · + p_mx_m.
The values x_1, x_2, . . . x_m are not necessarily numerically different, the numerically different values being ξ, η, ζ, . . . λ. We can suppose that the notation is chosen in such a way that x_1, x_2, . . . x_a are equal to ξ; x_{a+1}, x_{a+2}, . . . x_b are equal to η; x_{b+1}, x_{b+2}, . . . x_c are equal to ζ; . . . x_{l+1}, x_{l+2}, . . . x_m are equal to λ. Hence, the right-hand member of (1) can be represented thus:

(p_1 + p_2 + · · · + p_a)ξ + (p_{a+1} + p_{a+2} + · · · + p_b)η + · · · + (p_{l+1} + p_{l+2} + · · · + p_m)λ.
But by the theorem of total probabilities, the sum

p_1 + p_2 + · · · + p_a

represents the probability P for the variable x to assume a determined value ξ, because this can happen in a mutually exclusive ways; namely, when x = x_1, or x = x_2, . . . or x = x_a. By a similar argument we see that the sums

p_{a+1} + p_{a+2} + · · · + p_b
p_{b+1} + p_{b+2} + · · · + p_c
. . . . . . . . . .
p_{l+1} + p_{l+2} + · · · + p_m

represent the probabilities Q, R, . . . T for the variable x to assume the values η, ζ, . . . λ. Therefore, the right-hand member of (1) reduces to the sum

Pξ + Qη + Rζ + · · · + Tλ,
which, by the original definition, is the mathematical expectation of x.

If, corresponding to mutually exclusive and exhaustive cases, a variable x assumes the same value a (in other words, remains constant), it is almost evident that its mathematical expectation is a, because the sum of the probabilities of mutually exclusive and exhaustive cases is 1. It is also evident that the expectation of ax, where a is a constant, is equal to a times the expectation of x.

NOTE: Very often the mathematical expectation of a stochastic variable is called its "mean value."
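The agreement of the two definitions, proved above, can also be checked mechanically on the two-dice example; a short sketch (the variable names are mine):

```python
from fractions import Fraction
from collections import defaultdict

# Extended definition: 36 mutually exclusive cases, each with probability 1/36.
cases = [(i + j, Fraction(1, 36)) for i in range(1, 7) for j in range(1, 7)]
extended = sum(p * x for x, p in cases)

# Original definition: group equal values, adding their probabilities.
grouped = defaultdict(Fraction)
for x, p in cases:
    grouped[x] += p
original = sum(p * x for x, p in grouped.items())

assert extended == original == 7
```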
MATHEMATICAL EXPECTATION OF A SUM

3. In many cases the computation of mathematical expectation is greatly facilitated by means of the following very general theorem:
Theorem. The mathematical expectation of the sum of several variables is equal to the sum of their expectations; or, in symbols,

E(x + y + z + · · · + w) = E(x) + E(y) + E(z) + · · · + E(w).
Proof. We shall prove this theorem first in the case of a sum of two variables. Let x assume numerically different values x_1, x_2, . . . x_m, while numerically different values of y are y_1, y_2, . . . y_n. In regard to the sum x + y we can distinguish mn mutually exclusive cases; namely, when x assumes a definite value x_i and y another definite value y_j, while i and j range respectively over numbers 1, 2, 3, . . . m and 1, 2, 3, . . . n. If p_ij denotes the probability of coexistence of the equalities

x = x_i,  y = y_j,

we have by the extended definition of mathematical expectation

E(x + y) = Σ_(i=1)^m Σ_(j=1)^n p_ij (x_i + y_j),
or

(2)  E(x + y) = Σ_(i=1)^m Σ_(j=1)^n p_ij x_i + Σ_(i=1)^m Σ_(j=1)^n p_ij y_j.

As the variable x assumes a definite value x_i in n mutually exclusive ways (namely, when the value x_i of x is accompanied by the values y_1, y_2, . . . y_n of y), it is obvious that the sum

Σ_(j=1)^n p_ij

represents the probability p_i of the equality x = x_i. In a similar manner we see that the sum

Σ_(i=1)^m p_ij

represents the probability q_j of the equality y = y_j. Therefore

Σ_(i=1)^m Σ_(j=1)^n p_ij x_i = Σ_(i=1)^m x_i Σ_(j=1)^n p_ij = Σ_(i=1)^m p_i x_i = E(x),

and similarly

Σ_(i=1)^m Σ_(j=1)^n p_ij y_j = Σ_(j=1)^n y_j Σ_(i=1)^m p_ij = Σ_(j=1)^n q_j y_j = E(y);
that is, by (2),

E(x + y) = E(x) + E(y),

which proves the theorem for the sum of two variables. If we deal with the sum of three variables x + y + z, we may consider it at first as the sum of x + y and z and, applying the foregoing result, we get

E(x + y + z) = E(x + y) + E(z);

and again, by substituting E(x) + E(y) for E(x + y),

E(x + y + z) = E(x) + E(y) + E(z).

In a similar way we may proceed farther and prove the theorem for the sum of any number of variables.
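Since the proof nowhere uses independence of x and y, the theorem holds for dependent variables as well; the following sketch checks this on a small, deliberately dependent joint distribution of my own choosing:

```python
from fractions import Fraction

# Joint distribution p_ij of a dependent pair (x, y): y tends to follow x.
joint = {
    (0, 0): Fraction(2, 5), (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 5),
}
assert sum(joint.values()) == 1

E_sum = sum(p * (x + y) for (x, y), p in joint.items())
E_x = sum(p * x for (x, y), p in joint.items())
E_y = sum(p * y for (x, y), p in joint.items())

# E(x + y) = E(x) + E(y), even though x and y are not independent here:
# P(x=1, y=1) = 2/5, while P(x=1) * P(y=1) = 1/4.
assert E_sum == E_x + E_y == 1
```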
4. The theorem concerning mathematical expectation of sums, simple though it is, is of fundamental importance on account of its very general nature and will be used frequently. At present, we shall use it in the solution of a few selected problems.

Problem 1. What is the mathematical expectation of the sum of points on n dice?

Solution. Denoting by x_i the number of points on the ith die, the sum of the points on n dice will be

s = x_1 + x_2 + · · · + x_n,

and by the preceding theorem

E(s) = E(x_1) + E(x_2) + · · · + E(x_n).
But for every single die

E(x_i) = 7/2;  i = 1, 2, . . . n;

therefore

E(s) = 7n/2.

Problem 2. What is the mathematical expectation of the number of successes in n trials with constant probability p?
Solution. Suppose that we attach to every trial a variable which has the value 1 in case of a success and the value 0 in case of failure. If the variables attached to trials 1, 2, 3, . . . n are denoted by x_1, x_2, . . . x_n, their sum

m = x_1 + x_2 + · · · + x_n

obviously gives the number of successes in n trials. Therefore, the required expectation is

E(m) = E(x_1) + E(x_2) + · · · + E(x_n).
But for every i = 1, 2, 3, . . . n

E(x_i) = p · 1 + (1 − p) · 0 = p,

because x_i may have values 1 and 0 with the probabilities p and 1 − p, which are the same as the probabilities of a success or a failure in the ith trial. Hence,

E(m) = np

or

E(m − np) = 0,

which may also be written in the form

Σ_(m=0)^n T_m(m − np) = 0.

This result was obtained on page 116 in a totally different and more complicated way. The new deduction is preferable in that it is more elementary and can easily be extended to more complicated cases, as we shall see in the next problem.
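The identity Σ T_m(m − np) = 0, with T_m = C_n^m p^m q^(n−m), can be confirmed in exact arithmetic for any n and p; a sketch:

```python
from fractions import Fraction
from math import comb

def deviation_sum(n, p):
    """Compute sum of T_m * (m - n*p) over m = 0..n, exactly."""
    q = 1 - p
    return sum(comb(n, m) * p**m * q**(n - m) * (m - n * p)
               for m in range(n + 1))

# With rational p the sum vanishes identically, since E(m) = np.
assert deviation_sum(10, Fraction(3, 10)) == 0
assert deviation_sum(7, Fraction(1, 2)) == 0
```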
Problem 3. Suppose that we have a series of n trials, independent or not, the probability of an event being p_i in the ith trial when nothing is known about the results of other trials. What is the mathematical expectation of the number of successes m in n trials?

Solution. Again let us introduce the variable x_i connected with the ith trial in such a way that x_i = 1 when the trial results in a success and x_i = 0 when it results in failure. Obviously,

m = x_1 + x_2 + · · · + x_n
and

E(m) = E(x_1) + E(x_2) + · · · + E(x_n).

But

E(x_i) = 1 · p_i + 0 · (1 − p_i) = p_i,

and therefore

E(m) = p_1 + p_2 + · · · + p_n.
For instance, if we have 5 urns containing 1 white, 9 black; 2 white, 8 black; 3 white, 7 black; 4 white, 6 black; 5 white, 5 black balls, and we draw one ball out of every urn, the mathematical expectation of the number of white balls taken will be:
E(m) = 1/10 + 2/10 + 3/10 + 4/10 + 5/10 = 1.5.

Problem 4.
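A sketch confirming the five-urn example two ways, by the additivity formula and by enumerating all 2⁵ success patterns:

```python
from fractions import Fraction
from itertools import product

# Probabilities of drawing a white ball from each of the five urns.
ps = [Fraction(k, 10) for k in range(1, 6)]

# E(m) as the sum of the single-trial expectations.
assert sum(ps) == Fraction(3, 2)

# The same value by direct enumeration over all success/failure patterns
# (assuming, for the enumeration only, that the draws are independent).
E = Fraction(0)
for outcome in product([0, 1], repeat=5):
    prob = Fraction(1)
    for w, p in zip(outcome, ps):
        prob *= p if w else 1 - p
    E += prob * sum(outcome)
assert E == Fraction(3, 2)
```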
An urn contains a white and b black balls, and c balls are drawn. What is the mathematical expectation of the number of white balls drawn?
Solution. To every ball taken we attach a variable which has the value 1 if the extracted ball is white, and the value 0 otherwise. The number of white balls drawn will then be

s = x_1 + x_2 + · · · + x_c.
But the probability that the ith ball removed will be white, when nothing is known of the other balls, is a/(a + b); therefore

E(x_i) = 1 · a/(a + b) + 0 · b/(a + b) = a/(a + b)

for every i, and the required expectation is

E(s) = ca/(a + b).

Problem 5.
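The answer E(s) = ca/(a + b) is the same as for drawings with replacement; a sketch checking it against direct summation over the hypergeometric probabilities (the function name is mine):

```python
from fractions import Fraction
from math import comb

def expected_white(a, b, c):
    """E(number of white balls) when c balls are drawn without replacement
    from an urn with a white and b black balls, by direct summation."""
    total = comb(a + b, c)
    return sum(Fraction(comb(a, k) * comb(b, c - k), total) * k
               for k in range(0, min(a, c) + 1))  # comb() is 0 when c-k > b

for a, b, c in [(3, 4, 2), (5, 5, 4), (2, 7, 6)]:
    assert expected_white(a, b, c) == Fraction(c * a, a + b)
```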
An urn contains n tickets numbered from 1 to n, and m tickets are drawn at a time. What is the mathematical expectation of the sum of numbers on the tickets drawn?
Solution. Suppose that m tickets drawn from the urn are disposed in a certain order, and a variable is attached to every ticket expressing its number. Denoting the variable attached to the ith ticket by x_i, the sum of the numbers on all m tickets apparently is

s = x_1 + x_2 + · · · + x_m.

But when taken singly, the variable x_i may represent any of the numbers 1, 2, 3, . . . n, the probability of its being equal to any one of these numbers being 1/n. By the definition of mathematical expectation, we have

E(x_i) = (1 + 2 + 3 + · · · + n)/n = (n + 1)/2,

and therefore

E(s) = m(n + 1)/2.
For example, taking the French lottery where n = 90 and m = 5, we find for the mathematical expectation of the sum of numbers on all 5 tickets

E(s) = (5 · 91)/2 = 227.5.

Problem 6.
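A sketch verifying E(s) = m(n + 1)/2 exhaustively for small cases, together with the French-lottery value:

```python
from fractions import Fraction
from itertools import combinations

def expected_ticket_sum(n, m):
    """Average sum of the numbers over all C(n, m) equally likely draws."""
    draws = list(combinations(range(1, n + 1), m))
    return Fraction(sum(sum(d) for d in draws), len(draws))

# Exhaustive check of E(s) = m(n + 1)/2 for small cases.
for n, m in [(5, 2), (7, 3), (10, 4)]:
    assert expected_ticket_sum(n, m) == Fraction(m * (n + 1), 2)

# French lottery: n = 90, m = 5 gives 5 * 91 / 2 = 227.5.
assert Fraction(5 * 91, 2) == Fraction(455, 2)
```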
An urn contains n tickets numbered from 1 to n.
These
tickets are drawn one by one, so that a certain number appears in the first place, another number in the second place, and so on.
We shall say
that there is a "coincidence" when the number on a ticket corresponds to the place it occupies. For instance, there is a coincidence when the first ticket has number 1 or the second ticket has number 2, etc. Find the mathematical expectation of the number of coincidences. Also, find the probability that there will be none, or one, or two, etc., coincidences.
Solution. Let x_i denote a variable which has the value 1 if there is a coincidence in the ith place, otherwise x_i = 0. The sum

s = x_1 + x_2 + · · · + x_n

gives the total number of coincidences, and

E(s) = E(x_1) + E(x_2) + · · · + E(x_n).

But

E(x_i) = 1 · (1/n) = 1/n,

because the probability of drawing a ticket with the number i in the ith place without any regard to other tickets obviously is 1/n; therefore,

E(s) = n · (1/n) = 1.
On the other hand, denoting the probability of exactly i coincidences by p_i, we have by definition

E(s) = p_1 + 2p_2 + · · · + np_n,

and, comparing with the preceding result, we obtain

(3)  p_1 + 2p_2 + · · · + np_n = 1.
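Both E(s) = 1 and relation (3) can be verified by brute force over all permutations of a small n; a sketch (the enumeration is mine):

```python
from fractions import Fraction
from itertools import permutations

n = 6
perms = list(permutations(range(n)))

# Expected number of coincidences: each place i is a coincidence in
# exactly (n-1)! of the n! permutations, so E(s) = n * (1/n) = 1.
total = sum(sum(1 for i, v in enumerate(p) if i == v) for p in perms)
assert Fraction(total, len(perms)) == 1

# Relation (3): sum of i * p_i over i = 1..n equals 1.
p = [Fraction(sum(1 for q in perms
                  if sum(1 for j, v in enumerate(q) if j == v) == i),
              len(perms)) for i in range(n + 1)]
assert sum(i * p[i] for i in range(1, n + 1)) == 1
```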
Let us denote by φ(n) the probability that in drawing n tickets we shall have no coincidences. It is easy to express p_i by means of φ(n − i). In fact, we have exactly i coincidences in

C_n^i = n(n − 1) · · · (n − i + 1)/(1·2·3 · · · i)

mutually exclusive cases; namely, when the tickets of one of the C_n^i specified groups of i tickets have numbers corresponding to their places while the remaining n − i tickets do not present coincidences at all. By the theorem of compound probability, the probability of i coincidences in i specified places is

(1/n) · (1/(n − 1)) · · · (1/(n − i + 1)),

and the probability of the absence of coincidences in the remaining n − i places is φ(n − i). The probability of exactly i coincidences in i specified places is therefore

φ(n − i)/[n(n − 1) · · · (n − i + 1)],

and the total probability p_i of exactly i coincidences without specification of places is

p_i = [n(n − 1) · · · (n − i + 1)/(1·2·3 · · · i)] · φ(n − i)/[n(n − 1) · · · (n − i + 1)],

or

(4)  p_i = φ(n − i)/(1·2·3 · · · i).

The symbol φ(0) has no meaning, but the preceding formula holds good even for i = n if we assume φ(0) = 1. Substituting expression (4) for p_i into (3), we reach the relation

φ(n − 1) + φ(n − 2)/1! + φ(n − 3)/2! + · · · + φ(0)/(n − 1)! = 1.

Denoting by F(x, y) the sum

F(x, y) = Σ_(m>np) C_n^m x^m y^(n−m),

the sum being extended over all integers m which are > np, we have

Σ_(m>np) T_m(m − np) = p ∂F(p, q)/∂p − np F(p, q).

On the other hand, by Euler's theorem on homogeneous functions,

nF(p, q) = p ∂F/∂p + q ∂F/∂q,

whence

Σ_(m>np) T_m(m − np) = npq C_(n−1)^(μ−1) p^(μ−1) q^(n−μ).

Here μ represents an integer determined by

μ ≦ np + 1 < μ + 1.

The answer is therefore given by the simple formula

M = 2npq C_(n−1)^(μ−1) p^(μ−1) q^(n−μ).
2. By applying Stirling's formula (Appendix 1, page 347) prove the following result:

M = √(2npq/π) (1 + θc),  |θ| < 1,  c = max(1/(np − 1), 1/(nq − 1)),

n being so large that c is small. HINT: Use Stirling's formula with its error terms 1/(12(np − ϑ)) and 1/(12(nq − ϑ′)), 0 ≦ ϑ, ϑ′ ≦ 1, together with the corresponding bounds ϑ/(4(np − ϑ)²) and ϑ′/(4(nq − ϑ′)²).
3. What is the expectation of the number of failures preceding the first success in an indefinite series of independent trials with the probability p?

Ans.  qp + 2q²p + 3q³p + · · · = pq/(1 − q)² = q/p.
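The series and the closed form q/p agree; a sketch with p = 1/4 (so q/p = 3), using a long partial sum of the series:

```python
from fractions import Fraction

p = Fraction(1, 4)
q = 1 - p

# Partial sum of q*p + 2*q^2*p + 3*q^3*p + ...; the tail beyond 200 terms
# is astronomically small, so the partial sum is within 1e-9 of q/p.
partial = sum(k * q**k * p for k in range(1, 200))
assert abs(partial - q / p) < Fraction(1, 10**9)
```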
4. Balls are taken one by one out of an urn containing a white and b black balls until the first white ball is drawn. What is the expectation of the number of black balls preceding the first white ball?

Ans. 1. By direct application of the definition the following first expression for the required expectation M is obtained:

M = a/(a + b) · [b/(a + b − 1) + 2·b(b − 1)/((a + b − 1)(a + b − 2)) + 3·b(b − 1)(b − 2)/((a + b − 1)(a + b − 2)(a + b − 3)) + · · ·].

Ans. 2. However, it is possible to find a simpler expression for M. Denote by x_1 the number of black balls preceding the first white ball, by x_2 the number of black balls between the first and second white ball, and so on; finally, by x_{a+1} the number of black balls following the last white ball. We have

x_1 + x_2 + · · · + x_{a+1} = b

and

E(x_1) + E(x_2) + · · · + E(x_{a+1}) = b.

But as the probability of every sequence of balls (that is, of every system of numbers x_1, x_2, . . . x_{a+1}) is the same, namely,

a!b!/(a + b)!,

it is easy to see that

E(x_1) = E(x_2) = · · · = E(x_{a+1}) = M.

That is,

(a + 1)M = b,  or  M = b/(a + 1).
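The simpler answer M = b/(a + 1) is easy to confirm by enumerating all distinct arrangements of the a + b balls (a sketch; the function name is mine):

```python
from fractions import Fraction
from itertools import permutations

def mean_blacks_before_first_white(a, b):
    """Average number of black balls preceding the first white one,
    over all distinct, equally likely arrangements of the balls."""
    balls = "W" * a + "B" * b
    arrangements = set(permutations(balls))
    total = sum(arr.index("W") for arr in arrangements)
    return Fraction(total, len(arrangements))

for a, b in [(1, 3), (2, 4), (3, 3)]:
    assert mean_blacks_before_first_white(a, b) == Fraction(b, a + 1)
```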
Equating this to the preceding expression for M, an interesting identity can be obtained, whose direct proof is left to the student.

6. In Prob. 6, page 168, to determine the probability

n_0. Consequently, we shall have P > 1 − ε for all n > n_0. This conclusion leads to the following important theorem due, in the main, to Tshebysheff:

THE LAW OF LARGE NUMBERS
The Law of Large Numbers. With the probability approaching certainty as near as we please, we may expect that the arithmetic mean of values actually assumed by n stochastic variables will differ from the arithmetic mean of their expectations by less than any given number, however small, provided the number of variables can be taken sufficiently large and provided the condition

B_n/n² → 0  as  n → ∞

is fulfilled.

If, instead of variables x_i, we consider new variables z_i = x_i − a_i with their means = 0, the same theorem can be stated as follows: For a fixed ε > 0, however small, the probability of the inequality

|z_1 + z_2 + · · · + z_n|/n ≦ ε

tends to 1 as a limit when n increases indefinitely, provided

B_n/n² → 0.

This theorem is very general. It holds for independent or dependent variables indifferently if the sufficient condition for its validity, namely, that B_n/n² → 0 as n → ∞,
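A Monte Carlo illustration of the theorem (a sketch; the particular distributions are my own, chosen only so that each x_i has a known expectation a_i):

```python
import random

random.seed(42)

# n stochastic variables with different means; the theorem says the
# arithmetic mean of observed values is close to the mean of expectations.
n = 100_000
means = [(i % 5) / 2 for i in range(n)]                # expectations a_i
draws = [random.uniform(m - 1, m + 1) for m in means]  # E(x_i) = a_i

observed = sum(draws) / n
expected = sum(means) / n
# Standard deviation of the mean is about 0.002 here, so 0.02 is a safe bound.
assert abs(observed - expected) < 0.02
```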
3. This condition, which is recognized as sufficient, is at the same time necessary if the variables z_1, z_2, . . . z_n are uniformly bounded; that is, if a constant number C (one independent of n) can be found so that all particular values of z_i (i = 1, 2, . . . n) are numerically less than C. Let P, as before, denote the probability of the inequality

|z_1 + z_2 + · · · + z_n| ≦ nε.

Then the probability of the opposite inequality

|z_1 + z_2 + · · · + z_n| > nε

will be 1 − P. Now, by definition,

B_n = E(z_1 + z_2 + · · · + z_n)²,

whence one can easily derive the inequality
from which it follows that

B_n/n² ≦ ε² + C²(1 − P),

and it is easy to show that B_n/n² → 0 as n → ∞ provided α > 1/2. But the law of large numbers no longer holds if α ≦ 1/2. The proof of this is more difficult.
8. The following extension of Tshebysheff's lemma was indicated by Kolmogoroff. Let x₁, x₂, . . . x_n be independent variables; E(x_i) = 0, E(x_i²) = b_i,

B_n = b₁ + b₂ + ··· + b_n,

and

s_k = x₁ + x₂ + ··· + x_k;  k = 1, 2, . . . n.

Denoting by P the probability of the inequality

(A)  max (s₁², s₂², . . . s_n²) > B_n t²,

we shall have P < 1/t².

Indication of the Proof. The inequality (A) can materialize if and only if one of the following mutually exclusive events occurs:

event e₁: s₁² > B_n t²;
event e₂: s₁² ≦ B_n t², s₂² > B_n t²;
event e₃: s₁² ≦ B_n t², s₂² ≦ B_n t², s₃² > B_n t²;
. . . . . . . . . . . . . . . . . . . .

If (e_i) represents the probability of e_i (i = 1, 2, . . . n), then

P = (e₁) + (e₂) + ··· + (e_n).

Now consider the conditional mathematical expectation E(s_n²|e_k) of s_n² given that e_k has occurred. Since the indication of e_k does not affect the variables x_{k+1}, x_{k+2}, . . . x_n, these variables and s_k are independent. Hence

E(s_n²|e_k) = E(s_k²|e_k) + b_{k+1} + ··· + b_n > B_n t².

On the other hand,

B_n = E(s_n²) ≧ Σ_{k=1}^{n} (e_k)E(s_n²|e_k) > B_n t²[(e₁) + (e₂) + ··· + (e_n)],

whence P < 1/t².
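Kolmogoroff's bound P < 1/t² can be illustrated numerically. The sketch below (a modern illustration, not part of the original text; all parameter choices are arbitrary) simulates independent centered variables and estimates the probability that max(s₁², . . . , s_n²) exceeds B_n t²; the estimate should stay below 1/t².

```python
import random

random.seed(1)

n, t, trials = 50, 2.0, 20000
B_n = n * 1.0                     # b_i = 1 for each standard normal summand

exceed = 0
for _ in range(trials):
    s, max_sq = 0.0, 0.0
    for _ in range(n):
        s += random.gauss(0.0, 1.0)   # E(x_i) = 0, E(x_i^2) = 1
        max_sq = max(max_sq, s * s)
    if max_sq > B_n * t * t:
        exceed += 1

p_hat = exceed / trials
bound = 1 / t ** 2
print(p_hat, bound)   # the estimate should not exceed the bound
```

For a Gaussian walk the true probability is roughly 0.09 here, comfortably below the bound 0.25; the lemma is not sharp, but it holds uniformly over the whole partial-sum sequence.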
9. The Strong Law of Large Numbers (Kolmogoroff).
Using the same notations as in the preceding problem, show that the probability of the simultaneous inequalities

|s_n/n| ≦ ε,  |s_{n+1}/(n + 1)| ≦ ε,  |s_{n+2}/(n + 2)| ≦ ε, . . .

will be greater than 1 − η provided n exceeds a certain limit depending on the choice of ε and η, and granted the convergence of the series

Σ_{n=1}^{∞} b_n/n².

Indication of the Proof. Consider the variables

T_i = max |s_m − s_{2^{i−1}n}|,  2^{i−1}n ≦ m ≦ 2^i n,

and denote by q_i the probability of the inequality T_i > 2^{i−2}nε, for i = 1, 2, 3, . . . . By Kolmogoroff's lemma

q₁ + q₂ + q₃ + ··· < 4ε⁻² Σ_{i=1}^{∞} 2^{2−2i} n⁻² Σ_{l=2^{i−1}n}^{2^i n − 1} b_l,

or

q₁ + q₂ + q₃ + ··· < 16ε⁻² Σ_{k=n}^{∞} b_k/k².

Hence, the probability of fulfillment of all the inequalities T_i ≦ 2^{i−2}nε; i = 1, 2, 3, . . . is greater than

1 − 16ε⁻² Σ_{k=n}^{∞} b_k/k².

The inequalities |s_k/k| ≦ ε; k = n, n + 1, n + 2, . . . are satisfied when simultaneously T_i ≦ 2^{i−2}nε; i = 1, 2, 3, . . . and |s_n| ≦ nε/2. The probability of the last inequality being greater than 1 − 4B_n/(n²ε²), the probability of the simultaneous inequalities

|s_k/k| ≦ ε;  k = n, n + 1, n + 2, . . .

a fortiori will be greater than

1 − 16ε⁻² Σ_{k=n}^{∞} b_k/k² − 4B_n/(n²ε²).

This inequality suffices to complete the proof if we notice that B_n/n² tends to 0 when n → ∞ and that the series Σ b_k/k² is convergent.
10. Let x₁, x₂, . . . x_n be identical stochastic variables with E(x₁) = a, and denote by P_n(ε) and Q_n(ε), respectively, the probabilities of the inequalities

(x₁ + x₂ + ··· + x_n)/n > a + ε  and  (x₁ + x₂ + ··· + x_n)/n < a − ε.

. . . . . . . . . . . . . . . . . . . .

(Tshebysheff's inequality), provided E(x_i) = 0, i = 1, 2, . . . n, which can be assumed without loss of generality. In case the variables are totally independent and are subject to certain limitations of a comparatively mild character, S. Bernstein has shown that Tshebysheff's inequality can be considerably improved.

12. Let x₁, x₂, . . . x_n be totally independent variables. We suppose E(x_i) = 0, E(x_i²) = b_i, and

|E(x_i^h)| ≦ ½ h! b_i c^{h−2}
for i = 1, 2, . . . n and h > 2, c being a certain constant. Show that

E(e^{εs}) < e^{B_n ε²/(2(1 − α))},  s = x₁ + x₂ + ··· + x_n,

where α is an arbitrary positive number < 1 and ε is a positive number so small that εc ≦ α.

Indication of the Proof. We have

e^{εx_i} = 1 + εx_i + Σ_{h=2}^{∞} (εx_i)^h/h!,

whence

E(e^{εx_i}) ≦ 1 + ½ b_i ε² Σ_{h=0}^{∞} (εc)^h ≦ 1 + b_i ε²/(2(1 − α)) < e^{b_i ε²/(2(1 − α))},

and it remains to multiply these inequalities for i = 1, 2, . . . n.
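The exponential-moment bound of Prob. 12 can be tested on a concrete case. In the sketch below (an illustration added here, not from the original), x is uniform on (−1, 1), so b = 1/3, the odd moments vanish, and the moment condition holds with c = 1; E(e^{εx}) = sinh ε/ε is available in closed form.

```python
import math

# check E(e^{eps*x}) < exp(b*eps^2 / (2*(1 - alpha))) for x uniform on (-1, 1):
# b = E(x^2) = 1/3, and with c = 1 the condition eps*c <= alpha holds for alpha = eps
eps = 0.5
b = 1.0 / 3.0
alpha = eps * 1.0

exact = math.sinh(eps) / eps           # E(e^{eps*x}) in closed form
bound = math.exp(b * eps ** 2 / (2 * (1 - alpha)))
print(exact, bound)
```

Here the exact value is about 1.0422 against the bound 1.0869, as the inequality requires.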
13. If Q denotes the probability of the inequality

x₁ + x₂ + ··· + x_n > a,

show that

Q < e^{−εa + B_n ε²/(2(1 − α))}.

Indication of the Proof. Since e^{εs} > e^{εa} whenever s = x₁ + x₂ + ··· + x_n > a, we have Qe^{εa} < E(e^{εs}), and the preceding problem gives the stated bound.

14. S. Bernstein's Inequality. Denoting by Q the probability of the inequality

|x₁ + x₂ + ··· + x_n| > a,

show that, under the conditions of Prob. 12,

Q < 2e^{−a²/(2B_n + 2ca)};

it suffices to apply Prob. 13 to the sums ±(x₁ + x₂ + ··· + x_n), taking ε = a/(B_n + ca) and α = ca/(B_n + ca).

In the Bernoullian case p₁ = p₂ = ··· = p_n = p one may take b_i = pq and c = 1, and consequently the probability P of the inequality |m/n − p| ≦ ε satisfies

P > 1 − 2e^{−nε²/(2pq + 2ε)}.
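The Bernoullian form of Bernstein's inequality can be checked by simulation; the sketch below (illustrative, with arbitrary parameters) estimates the deviation probability and compares it with the exponential bound.

```python
import math, random

random.seed(3)

# empirical check of P(|m/n - p| > eps) < 2*exp(-n*eps^2 / (2*p*q + 2*eps))
n, p, eps, trials = 400, 0.3, 0.08, 5000
q = 1 - p

exceed = 0
for _ in range(trials):
    m = sum(1 for _ in range(n) if random.random() < p)
    if abs(m / n - p) > eps:
        exceed += 1

p_hat = exceed / trials
bound = 2 * math.exp(-n * eps ** 2 / (2 * p * q + 2 * eps))
print(p_hat, bound)
```

The simulated frequency comes out far below the bound (the bound is about 0.024 here), which is typical: Bernstein's inequality trades sharpness for an exponent linear in n.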
17. An indefinite series of totally independent variables x₁, x₂, x₃, . . . has the property that the mathematical expectation of any odd power of these variables is rigorously 0, while

E(x_i^{2k}) ≦ (b_i/2)^k (2k)!/k!,  b_i = E(x_i²),

for i = 1, 2, 3, . . . . Prove that the probability of either one of the inequalities

x₁ + x₂ + ··· + x_n > t√(2B_n)  or  x₁ + x₂ + ··· + x_n < −t√(2B_n),

where B_n = b₁ + b₂ + ··· + b_n, is less than e^{−t²} (S. Bernstein). Prove first that

E(e^{εx_i}) ≦ e^{ε²b_i/2}.
18. Positive and negative proper decimal fractions limited to, say, five decimals are obtained in the following manner: From an urn containing tickets with numbers 0, 1, 2, . . . 9 in equal proportion, five tickets are drawn in succession (the ticket drawn in a previous trial being returned before the next), and their respective numbers are written in succession as the five decimals of a proper fraction. This fraction, if not equal to 0, is preceded by the sign + or −, according as a coin tossed at the same time shows heads or tails. Thus, repeating this process several times, we may obtain as many positive or negative proper fractions with five decimals as we desire. What can be said about the probability that the sum of n such fractions will be contained between prescribed limits −ω and ω?

Ans. These n fractions may be considered as so many identical stochastic variables for each of which E(x) = 0 and

β = E(x²) = (1 − 10⁻⁵)(2 − 10⁻⁵)/6.
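The value of β can be verified by direct summation, since the magnitude of such a fraction is k/10⁵ with k = 0, 1, . . . , 99999, all equally likely, and the random sign does not affect the square. (This check is an addition, not part of the original text.)

```python
# direct check of E(x^2) = (1 - 1e-5)(2 - 1e-5)/6 for the five-decimal fractions
N = 10 ** 5
second_moment = sum((k / N) ** 2 for k in range(N)) / N
closed_form = (1 - 1 / N) * (2 - 1 / N) / 6
print(second_moment, closed_form)
```

Both quantities equal (N − 1)N(2N − 1)/(6N³), i.e. about 1/3, as expected for a nearly uniform variable on (−1, 1) folded to (0, 1) in magnitude.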
hold simultaneously, we have

(3)  (√(spq)/M_s)(1 − σ)/(1 + σ) < A/B < (√(spq)/M_s)(1 + σ)/(1 − σ).

Therefore the probability of these inequalities is again > 1 − 2η. Now let us take

σ = τ/(2 + τ),

where τ is another positive number arbitrarily chosen. Then

(1 + σ)/(1 − σ) = 1 + τ,  (1 − σ)/(1 + σ) = 1/(1 + τ) > 1 − τ.

Hence, the inequalities

(√(spq)/M_s)(1 − τ) < A/B < (√(spq)/M_s)(1 + τ)

hold with probability > 1 − 2η. It suffices to take

τ = M_s ε/√(spq)

to arrive at the following proposition: The probability of the inequality

|A/B − √(spq)/M_s| < ε

for a fixed ε and a sufficiently large number of series can be made as near to 1 as we please. If spq is somewhat large, the quotient √(spq)/M_s differs but little from √(π/2) (see Chap. IX, Prob. 2, page 177). Hence, when the number of series is large and the series themselves sufficiently long, we may expect with great probability that the quotient A/B will not differ much from √(π/2).
DIVERGENCE COEFFICIENT

3. The considerations of the preceding section can be generalized. Let us consider again n series containing s trials each, and let

m₁, m₂, . . . m_n

represent the numbers of successes in each of these series. Without specifying the nature of the trials (which can be independent or dependent) we shall denote by p the mean probability in all N = ns trials and by q = 1 − p its complement. Again considering the quotient

Q = [Σ_{i=1}^{n} (m_i − sp)²]/(Npq),

we seek its mathematical expectation E(Q) = D. When all the N trials are of the Bernoullian type, D = 1. But it is also possible to imagine cases when D > 1 or D < 1. Lexis calls √D the "coefficient of dispersion." We shall call D itself the "theoretical divergence coefficient." If m₁, m₂, . . . m_n are actually observed frequencies in n series, the quotient

D′ = [Σ_{i=1}^{n} (m_i − sp)²]/(Npq)

may be called the "empirical divergence coefficient." Then, if the law of large numbers can be applied to the variables

x_i = (m_i − sp)²/(Npq);  i = 1, 2, 3, . . . ,

we can expect, with probability approaching certainty as near as we please, that the inequality

|D′ − D| < ε

will hold for an arbitrarily given ε > 0 when the number of series is sufficiently large. Denoting by p_{ij} the probability of success in the jth trial of the ith series and by p_i = (p_{i1} + p_{i2} + ··· + p_{is})/s the mean probability in that series, we have

E(m_i − sp)² = E(m_i − sp_i)² + s²(p_i − p)²,

since E(m_i − sp_i) = 0.
On the other hand, the trials being independent,

E(m_i − sp_i)² = Σ_{j=1}^{s} p_{ij} − Σ_{j=1}^{s} p_{ij}²,  sp_i = Σ_{j=1}^{s} p_{ij},

and

Σ_{j=1}^{s} (p_i − p_{ij})² = −sp_i² + Σ_{j=1}^{s} p_{ij}²,

whence

E(m_i − sp_i)² = sp_i − sp_i² − Σ_{j=1}^{s} (p_i − p_{ij})².

Now, letting i take the values 1, 2, . . . n and taking the sum of the results, we get

Σ_{i=1}^{n} E(m_i − sp_i)² = nsp − s Σ_{i=1}^{n} p_i² − Σ_{i=1}^{n} Σ_{j=1}^{s} (p_i − p_{ij})².

But

s Σ_{i=1}^{n} (p − p_i)² = −nsp² + s Σ_{i=1}^{n} p_i²,

whence finally

D = 1 + (s − 1)/(npq) · Σ_{i=1}^{n} (p − p_i)² − 1/(Npq) · Σ_{i=1}^{n} Σ_{j=1}^{s} (p_i − p_{ij})².

Two particular cases deserve special attention.
Lexis' Case. Probabilities remain the same within each series, but vary from series to series. In this case p_{ij} = p_i and the expression of D becomes

D = 1 + (s − 1)/(npq) · Σ_{i=1}^{n} (p − p_i)².

The theoretical divergence coefficient in this case is always greater than 1 and may be arbitrarily large.

Poisson's Case. The probabilities of the corresponding trials in all series are the same, so that

p_{ij} = π_j  and  p_i = p = (π₁ + π₂ + ··· + π_s)/s.

In this case the divergence coefficient

D = 1 − Σ_{j=1}^{s} (p − π_j)²/(spq)

is always less than 1. Since the law of large numbers evidently is applicable to the variables

x_i = (m_i − sp)²/(Npq),
we may expect that the empirical divergence coefficient D′ will not differ much from D if the number of series is sufficiently large.

For numerical illustration let us consider 100 series each containing 100 trials, such that in 50 series the probability is 3/5 and in the remaining 50 series it is 2/5. Here we evidently have Lexis' case. The mean probability in all trials is p = ½ and

Σ_{i=1}^{100} (½ − p_i)² = 50 · 1/100 + 50 · 1/100 = 1.

Finally,

D = 1 + 3.96 = 4.96.

Now, suppose that we combine in pairs series of 100 trials with probability 3/5 and series of 100 trials with probability 2/5, to form 50
series each of 200 trials.
Evidently we have here Poisson's case. The mean probability in each series again is p = ½ and

Σ_{j=1}^{200} (½ − π_j)² = 100 · 1/100 + 100 · 1/100 = 2.

Finally,

D = 1 − 1/25 = 0.96.
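The two arrangements can be simulated directly; the sketch below (an added illustration) draws the Lexis arrangement (probability constant within each series) and the Poisson arrangement (each series mixing both probabilities) and averages the empirical divergence coefficients over several repetitions, which should land near 4.96 and 0.96 respectively.

```python
import random

random.seed(5)

def successes(p_list):
    return sum(1 for p in p_list if random.random() < p)

p = q = 0.5   # mean probability in all trials

# Lexis: 100 series of 100 trials, probability 3/5 in half of them, 2/5 in the rest
lexis = [[0.6] * 100 for _ in range(50)] + [[0.4] * 100 for _ in range(50)]
# Poisson: 50 series of 200 trials, each mixing the same probabilities
poisson = [[0.6] * 100 + [0.4] * 100 for _ in range(50)]

def emp_D(series):
    N = sum(len(s) for s in series)
    return sum((successes(s) - len(s) * p) ** 2 for s in series) / (N * p * q)

reps = 50
D_lexis = sum(emp_D(lexis) for _ in range(reps)) / reps
D_poisson = sum(emp_D(poisson) for _ in range(reps)) / reps
print(D_lexis, D_poisson)
```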
The consideration of the divergence coefficient may be useful in testing the assumed independence of trials and values of probabilities attached to these trials.
In the simplest case of Bernoullian trials with
a constant and known probability, the theoretical divergence coefficient is 1.
Now, if the number of series is sufficiently large and the empirical
divergence coefficient turns out to be considerably different from 1, we must admit with great probability that the trials we deal with are not of the supposed type.
If, however, the empirical divergence coefficient
turns out to be near 1, that does not conclusively prove the hypothesis concerning the independence of trials and the assumed value of the probability.
It only makes this hypothesis plausible.
There are cases of dependent trials (complex chains considered by Markoff) in which the theoretical divergence coefficient is exactly 1 and the probability of an event has the same constant value in each trial, insofar as the results of other trials remain unknown.
Cases like that
may easily be mistaken for Bernoullian trials without further detailed study of the entire course of trials.
4. When there is good reason to believe that the trials are independent with a constant but unknown probability, we cannot in all rigor find the value of the empirical divergence coefficient

D′ = [Σ_{i=1}^{n} (m_i − sp)²]/(Npq)

to compare it with the theoretical divergence coefficient D = 1, since p remains unknown. But, relying on Bernoulli's theorem, we can take the quotient
M/N,  where  M = m₁ + m₂ + ··· + m_n,

as an approximate value of p. By taking p = M/N in the preceding expression for D′ we get another number D″ which in general is close to D′. However, considering m₁, m₂, . . . m_n not as observed but as eventual numbers of successes in n series, the mathematical expectation of D″ is different from 1. To avoid this difficulty, it is better to consider a slightly different quotient

Q = n(N − 1) Σ_{i=1}^{n} (m_i − sM/N)² / ((n − 1)M(N − M)).
For this quotient there exists a theorem discovered and proved for the first time by the eminent Russian statistician Tschuprow.

Theorem. The mathematical expectation of Q is rigorously equal to 1.¹

Proof. Here we shall develop the proof given by Markoff. The above given expression of Q presents itself in the form 0/0 and therefore has no meaning in two cases: M = 0 or M = N. For these exceptional cases we set Q = 1 by definition. If neither M = 0 nor M = N, we can present Q in the form

(4)  Q = (Σ_{i=1}^{n} m_i² − M²/n) · n(N − 1) / ((n − 1)M(N − M)).
Considering m₁, m₂, . . . m_n as stochastic variables assuming integral values from 0 to s, the probability of a definite system of values m₁, m₂, . . . m_n is

P = ∏_{i=1}^{n} s!/(m_i!(s − m_i)!) · p^{m_i} q^{s−m_i}.

To get the expectation of Q we must multiply it by P and take the sum

E(Q) = ΣPQ

extended over all non-negative integers m₁, m₂, . . . m_n, each of them not exceeding s. To perform this multiple summation we first collect all terms with a given sum

m₁ + m₂ + ··· + m_n = M.

¹ The theorem itself and its proof given by Markoff can be extended to the case of series of unequal length.
Let the result of this summation be S_M. Then it remains to take the sum

E(Q) = Σ_{M=0}^{N} S_M

to have the desired expression of E(Q). To this end we first separate the two terms corresponding to M = 0 and M = N. In the former case

m₁ = m₂ = ··· = m_n = 0,

and the probability of such an event is q^N, while Q = 1. In the latter case, the probability of which is p^N, again Q = 1. Thus

E(Q) = p^N + q^N + Σ_{M=1}^{N−1} S_M.
To find S_M we observe that the denominator of Q has a constant value when summation is performed over variable integers m₁, m₂, . . . m_n connected by the relation

m₁ + m₂ + ··· + m_n = M.

Hence, it suffices to find the two sums ΣPm_i² and ΣP extended over integers m₁, m₂, . . . m_n varying within the limits 0 and s and having the sum M. To this end consider the function

V = (pte^{ξ₁} + q)^s (pte^{ξ₂} + q)^s ··· (pte^{ξ_n} + q)^s

involving n + 1 arbitrary variables t, ξ₁, ξ₂, . . . ξ_n. When developed, V consists of terms of the form

P t^{m₁+m₂+···+m_n} e^{m₁ξ₁+m₂ξ₂+···+m_nξ_n}.

Evidently we obtain the sum ΣP by setting ξ₁ = ξ₂ = ··· = ξ_n = 0 and taking the coefficient of t^M in the expansion

(V)_{ξ₁=ξ₂=···=ξ_n=0} = (pt + q)^N.

Thus

(5)  ΣP = N!/(M!(N − M)!) · p^M q^{N−M}.
To find ΣPm_i², take the second derivative ∂²V/∂ξ_i², set ξ₁ = ξ₂ = ··· = ξ_n = 0, expand, and take the coefficient of t^M. Thus we find

(6)  ΣPm_i² = [s (N − 1)!/((M − 1)!(N − M)!) + s(s − 1)(N − 2)!/((M − 2)!(N − M)!)] p^M q^{N−M}.

Referring to (4), (5), and (6), we easily get

S_M = N(N − 1)(N − 2)! / ((n − 1)M(N − M)(M − 1)!(N − M)!) · [n(N − 1) + (N − n)(M − 1) − M(N − 1)] p^M q^{N−M},

or, after obvious simplifications,

S_M = N!/(M!(N − M)!) · p^M q^{N−M}.
Hence

Σ_{M=1}^{N−1} S_M = (p + q)^N − p^N − q^N = 1 − p^N − q^N,

and finally

E(Q) = 1.

Markoff, using the same method, succeeded in finding the explicit expression of the expectation E(Q − 1)². Since there is no difficulty in finding this expression, except for somewhat tedious calculations, we give it here without entering into the details of the proof:

E(Q − 1)² = 2N(N − n)/((n − 1)(N − 2)(N − 3)) · Σ_{M=1}^{N−1} (M − 1)(N − M − 1)/(M(N − M)) · C_N^M p^M q^{N−M},

whence the inequality

E(Q − 1)² < 2N(N − n)/((n − 1)(N − 2)(N − 3))

immediately follows.
Consider a series of n + 1 Bernoullian trials with probability p, and let x_i = 1 or x_i = 0 according as the trials i and i + 1 both produce the event E or not. These variables are dependent, as we can easily see. Since

E(x_i) = E(x_i²) = p²·1 + (1 − p²)·0 = p²,

the expectation of the sum x₁ + x₂ + ··· + x_n will be

E(x₁ + x₂ + ··· + x_n) = np².

As to the dispersion of this sum, it can be expressed as follows:

B_n = Σ_{i=1}^{n} E(x_i − p²)² + 2Σ_{j>i} E(x_i − p²)(x_j − p²).

Now

(8)  E(x_i − p²)² = E(x_i²) − 2p²E(x_i) + p⁴ = p²(1 − p²)

and

(9)  E(x_i − p²)(x_j − p²) = E(x_i − p²)·E(x_j − p²) = 0

for j > i + 1, because then x_i and x_j are independent. But

(10)  E(x_i − p²)(x_{i+1} − p²) = E(x_i x_{i+1}) − p⁴ = p³ − p⁴,

since the probability of the simultaneous events x_i = 1, x_{i+1} = 1 is p³. Taking into account (8), (9), and (10), we find

B_n = np²q(3p + 1) − 2p³q,

and the condition B_n/n² → 0 as n → ∞ is satisfied. Hence, the law of large numbers holds for the variables x₁, x₂, . . . x_n. To express it in the simplest form, it suffices to notice that the sum

x₁ + x₂ + ··· + x_n

represents the number of pairs EE occurring in consecutive trials of the Bernoullian series of n + 1 trials. Let us denote the frequency of such pairs by m. Then, referring to the law of large numbers, we get the following proposition: If in n consecutive pairs of Bernoullian trials the frequency of double successes EE is m, then the probability of the inequality

|m/n − p²| < ε

will approach 1 as near as we please, when n becomes sufficiently large.
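The proposition is easy to confirm by simulation; the following sketch (added for illustration) counts double successes EE in a long Bernoullian series and compares m/n with p².

```python
import random

random.seed(6)

# frequency of double successes EE in consecutive pairs of Bernoullian trials:
# m/n should be close to p^2 for large n
p, n = 0.7, 200000
outcomes = [random.random() < p for _ in range(n + 1)]
m = sum(1 for i in range(n) if outcomes[i] and outcomes[i + 1])
print(m / n, p ** 2)
```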
7. Simple chains of trials, described in Chap. V, Sec. 1, offer a good example of dependent trials to which the law of large numbers can be applied. Let p₁ be the given probability of an event E in the first trial. According to the definition of a simple chain, the probability of E in any subsequent trial is α or β according as E occurred or failed to occur in the preceding trial. By p_n we denote the probability for E to occur in the nth trial when the results of the other trials are unknown. Let

δ = α − β,  p = β/(1 − δ).

Then, according to the developments in Chap. V, Sec. 2,

p_n = p + (p₁ − p)δ^{n−1},

whence

(p₁ + p₂ + ··· + p_n)/n = p + (p₁ − p)/n · (1 − δⁿ)/(1 − δ),

barring the trivial cases δ = 1 or δ = −1. It follows that p represents the limit of the mean probability in n trials when n increases indefinitely, and for that reason p may be called the mean probability in an infinite chain of trials. When it is known that E has occurred in the ith trial, its probability of occurring in some subsequent jth trial is given by

p_j^{(i)} = p + qδ^{j−i},  q = 1 − p.

In the usual way we associate with the trials 1, 2, 3, . . . variables x₁, x₂, x₃, . . . so that in general

x_i = 1 when E occurs in the ith trial,
x_i = 0 when E fails to occur in the ith trial.

Evidently

E(x_i) = E(x_i²) = p_i.
In order to prove that the law of large numbers can be applied to the variables x₁, x₂, x₃, . . . , we must have an idea of the behavior of B_n for large n. By definition,

B_n = E(x₁ + x₂ + ··· + x_n − p₁ − p₂ − ··· − p_n)² = Σ_{i=1}^{n} E(x_i − p_i)² + 2Σ_{j>i} E[(x_i − p_i)(x_j − p_j)].

The first sum can easily be found. We have

E(x_i − p_i)² = p_i − p_i² = pq + (q − p)(p₁ − p)δ^{i−1} − (p₁ − p)²δ^{2i−2},

whence

A = Σ_{i=1}^{n} E(x_i − p_i)² ∼ npq,

neglecting terms which remain bounded.
As to the second sum, we observe first that

E(x_i − p_i)(x_j − p_j) = E(x_i x_j) − p_i p_j.

Again, since the probability of x_i x_j = 1 is p_i p_j^{(i)}, we have E(x_i x_j) = p_i p_j^{(i)}, and

E(x_i − p_i)(x_j − p_j) = p_i(p_j^{(i)} − p_j) = pqδ^{j−i} + (p₁ − p)(q − p)δ^{j−1} − (p₁ − p)²δ^{i+j−2}.

Now, for a fixed i = 1, 2, . . . n − 1, we must take the sum of these expressions, letting j run over i + 1, i + 2, . . . n. The result of this summation is

pq(δ − δ^{n−i+1})/(1 − δ) + (p₁ − p)(q − p)(δ^i − δⁿ)/(1 − δ) − (p₁ − p)²δ^{i−1}(δ^i − δⁿ)/(1 − δ).

Taking i = 1, 2, 3, . . . n − 1 and neglecting in the sum the terms which remain bounded, we get

B = Σ_{j>i} E(x_i − p_i)(x_j − p_j) ∼ npq · δ/(1 − δ),

whence

B_n = A + 2B ∼ npq(1 + δ)/(1 − δ).
This asymptotic equality suffices to show that

B_n/n² → 0  as  n → ∞.

Therefore the law of large numbers can be applied to the variables x₁, x₂, x₃, . . . . Since the sum

x₁ + x₂ + ··· + x_n = m

represents the frequency of E in n trials, the law of large numbers in this particular case can be stated as follows: For a fixed ε > 0, no matter how small, the probability of the inequality

|m/n − (p₁ + p₂ + ··· + p_n)/n| < ε

tends to 1 as n → ∞. The arithmetic mean

(p₁ + p₂ + ··· + p_n)/n

itself approaches the limit p. It is easy then to express the preceding theorem thus: The probability of the inequality

|m/n − p| < ε

tends to 1 as n → ∞. This proposition is of exactly the same type as Bernoulli's theorem, but applies to series of dependent trials.
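A simple chain is easy to simulate, and the frequency m/n should settle near p = β/(1 − δ). The sketch below (an added illustration, using the parameters of the card experiment described later) runs such a chain.

```python
import random

random.seed(7)

# simple chain: P(E) is alpha after a success, beta after a failure;
# m/n should approach p = beta / (1 - delta), where delta = alpha - beta
alpha, beta, p1, n = 0.5, 0.25, 0.5, 300000
delta = alpha - beta
p = beta / (1 - delta)            # = 1/3 here

occurred = random.random() < p1   # first trial
m = 0
for _ in range(n):
    prob = alpha if occurred else beta
    occurred = random.random() < prob
    if occurred:
        m += 1
print(m / n, p)
```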
8. Let a simple chain of N = ns trials be divided into n consecutive series, each consisting of s trials; also, let m₁, m₂, . . . m_n be the frequencies of E in each of these series. When N is a large number, the mean probability in N trials differs little from the quantity denoted by p. It is natural to modify the definition of the divergence coefficient given in Sec. 3 by taking p instead of the variable mean probability in N trials. Thus we define

D = E[Σ_{i=1}^{n} (m_i − sp)²]/(Npq).

In our case, the variables

X₁ = (m₁ − sp)²,  X₂ = (m₂ − sp)²,  . . .  X_n = (m_n − sp)²

are neither identical nor independent, although the degree of dependence is evidently very slight. These variables can also be presented in the form

(11)  (x_a − p + x_{a+1} − p + ··· + x_{a+s−1} − p)²,

taking successively a = 1, s + 1, 2s + 1, . . . (n − 1)s + 1. To find the mathematical expectation of (11) it suffices to notice that

E(x_i − p)² = E(x_i − p_i)² + (p_i − p)² = pq + (q − p)(p₁ − p)δ^{i−1},
E(x_i − p)(x_j − p) = E(x_i − p_i)(x_j − p_j) + (p_i − p)(p_j − p) = pqδ^{j−i} + (p₁ − p)(q − p)δ^{j−1},

and then proceed exactly as in the approximate evaluation of B_n in Sec. 7. The final result is

E(x_a − p + x_{a+1} − p + ··· + x_{a+s−1} − p)² = spq(1 + δ)/(1 − δ) − 2pqδ(1 − δˢ)/(1 − δ)² + (q − p)(p₁ − p)(1 + δ)/(1 − δ)² · δ^{a−1} + ··· ,

where the terms not written out contain the factor δ^{a+s−1}. For somewhat large s these last terms are completely negligible; so is the third term if a ≧ s + 1. Hence, with a good approximation,

E(X₁) = spq(1 + δ)/(1 − δ) − 2pqδ/(1 − δ)² + (q − p)(p₁ − p)(1 + δ)/(1 − δ)²,
E(X_i) = spq(1 + δ)/(1 − δ) − 2pqδ/(1 − δ)²  if i > 1,

and

D = (1 + δ)/(1 − δ) − 2δ/(s(1 − δ)²) + (q − p)(p₁ − p)(1 + δ)/(Npq(1 − δ)²).

Again, when N is large, the last term can be dropped, and as a good approximation to D we can take

(12)  D = (1 + δ)/(1 − δ) − 2δ/(s(1 − δ)²).

It can be shown that the law of large numbers holds for the variables X₁, X₂, . . . X_n, and therefore when n (or the number of series) is large, the
empirical divergence coefficient is not likely to differ considerably from D as given by the above approximate formula. 9. In order to see how far the theory of simple chains agrees with
actual experiments, the author of this book himself has done extensive experimental work.
To form a chain of trials, one can take two sets of
cards containing red and black cards in different proportions, and proceed to draw one card at a time (returning it to the pack to which it belongs after each drawing) according to the following rules: At the outset one card is taken from a pack which we shall call the first set; then, whenever a red card is drawn, the next card is taken from the first set; but after a black card, the next one is taken from the second set.
Evidently, these rules completely determine a series of trials possessing properties of a simple chain.
In the first experiment the first pack
contained 10 red and 10 black cards, while the second pack contained 5 red and 15 black cards.
Altogether, 10,000 drawings were made, and
following their natural order, they were divided into 400 series of 25 drawings each.
The results are given in Table III.
TABLE III.—DISTRIBUTION OF RED CARDS IN 400 SERIES OF 25 CARDS

Frequency of red cards, m | Difference, m − 8 | Number of series
 1 | −7 |   2
 2 | −6 |   4
 3 | −5 |   8
 4 | −4 |  27
 5 | −3 |  29
 6 | −2 |  54
 7 | −1 |  37
 8 |  0 |  52
 9 |  1 |  47
10 |  2 |  44
11 |  3 |  41
12 |  4 |  20
13 |  5 |  20
14 |  6 |   7
15 |  7 |   4
16 |  8 |   3
17 |  9 |   1
The sum of the numbers in column 3 is 400, as it should be.
Taking
the sum of the products of numbers in columns 1 and 3, we get 3,323, which is the total number of red cards.
The relative frequency of red cards in
10,000 trials is, therefore, 0.3323.
In our case

α = ½,  β = ¼,  δ = ¼,

and the mean probability p in an infinite series of trials is

p = β/(1 − δ) = ⅓ = 0.3333 . . . .

Thus, the relative frequency observed differs from p only by about 10⁻³, and in this respect the agreement between theory and experiment is very satisfactory. Now let us consider the theoretical divergence coefficient, for which we have the approximate expression

D = (1 + δ)/(1 − δ) − 2δ/(s(1 − δ)²).

Here we must substitute δ = ¼ and s = 25. The result is

D = 1.631, approximately.
To find the empirical divergence coefficient we must first evaluate the sum

S = Σ(m − 25/3)²

extended over all 400 series. For the sake of easier calculation, we present S thus:

S = Σ(m − 8)² − ⅔ Σ(m − 8) + 400/9.

Now from Table III we get

Σ(m − 8)² = 3,521;  Σ(m − 8) = 123,

whence

S = 3,483.4.

Dividing this number by Npq = 20,000/9 = 2,222.2, we find the empirical divergence coefficient

D′ = 1.568,

which differs from D = 1.631 by only about 0.06, well within reasonable limits.
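The arithmetic of Table III can be recomputed directly; the sketch below (added as a check) reproduces the totals and both divergence coefficients quoted in the text.

```python
# recompute the statistics of Table III: 400 series of s = 25 drawings,
# chain with delta = 1/4 and p = 1/3
counts = {1: 2, 2: 4, 3: 8, 4: 27, 5: 29, 6: 54, 7: 37, 8: 52, 9: 47,
          10: 44, 11: 41, 12: 20, 13: 20, 14: 7, 15: 4, 16: 3, 17: 1}

total_series = sum(counts.values())                       # should be 400
total_red = sum(m * c for m, c in counts.items())         # should be 3,323
S = sum(c * (m - 25 / 3) ** 2 for m, c in counts.items())
D_emp = S / (20000 / 9)                                   # N*p*q = 10000*(1/3)*(2/3)

delta, s = 0.25, 25
D_theor = (1 + delta) / (1 - delta) - 2 * delta / (s * (1 - delta) ** 2)
print(total_series, total_red, round(D_emp, 3), round(D_theor, 3))
```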
10. In two other experiments two packs were used: one containing 13 red and 7 black cards, and another 7 red and 13 black cards. In one experiment the pack with 13 red cards was considered as the first deck, and in the other experiment it became the second deck. The new experiments were conducted in the same way as that described in Sec. 9, but they were both carried to 20,000 trials divided into 1,000 series of 20 trials each. In the first experiment, we have

α = 13/20,  β = 7/20,  δ = 3/10,  p = ½,

and

D = 1.796, approximately,

while the same quantities for the second experiment are

α = 7/20,  β = 13/20,  δ = −3/10,  p = ½,

and

D = 0.556, approximately.

The results of these experiments are recorded in the following two tables:

TABLE IV.—CONCERNING THE FIRST EXPERIMENT

Frequency of red cards, m | Difference, m − 10 | Number of series
 2 | −8 |   3
 3 | −7 |   5
 4 | −6 |  18
 5 | −5 |  36
 6 | −4 |  59
 7 | −3 |  93
 8 | −2 | 103
 9 | −1 | 117
10 |  0 | 128
11 |  1 | 121
12 |  2 | 101
13 |  3 |  93
14 |  4 |  48
15 |  5 |  39
16 |  6 |  26
17 |  7 |   7
18 |  8 |   1
19 |  9 |   1
20 | 10 |   1
TABLE V.—CONCERNING THE SECOND EXPERIMENT

Frequency of red cards, m | Difference, m − 10 | Number of series
 5 | −5 |   2
 6 | −4 |  10
 7 | −3 |  48
 8 | −2 | 112
 9 | −1 | 193
10 |  0 | 251
11 |  1 | 201
12 |  2 | 113
13 |  3 |  56
14 |  4 |   9
15 |  5 |   5
Taking the sum of the products of numbers in columns 1 and 3, we find

10,036 and 10,045

as the total numbers of red cards in the first and second experiments. Dividing these numbers by 20,000, we have the following relative frequencies of red cards:

0.5018 and 0.50225,

both extremely near to p = 0.5. From the first table we find that

Σ(m − 10)² = 8,924,

the summation being extended over all 1,000 series. Dividing this number by 20,000 · ¼ = 5,000, we find the empirical divergence coefficient in the first experiment

D′ = 1.785,

which comes close to D = 1.796. Likewise, from the second table we find

Σ(m − 10)² = 2,709,

whence, dividing by 5,000,

D″ = 0.5418,

again close to D = 0.5562.
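Both tables can be checked arithmetically; the sketch below (an added check) recomputes the totals of red cards and the empirical divergence coefficients from the tabulated frequencies.

```python
# recompute the statistics of Tables IV and V (1,000 series of 20 trials, p = 1/2)
table4 = {2: 3, 3: 5, 4: 18, 5: 36, 6: 59, 7: 93, 8: 103, 9: 117, 10: 128,
          11: 121, 12: 101, 13: 93, 14: 48, 15: 39, 16: 26, 17: 7, 18: 1, 19: 1, 20: 1}
table5 = {5: 2, 6: 10, 7: 48, 8: 112, 9: 193, 10: 251, 11: 201, 12: 113, 13: 56, 14: 9, 15: 5}

def stats(table):
    reds = sum(m * c for m, c in table.items())
    sq = sum(c * (m - 10) ** 2 for m, c in table.items())
    return reds, sq / 5000            # N*p*q = 20000 * 1/4 = 5000

reds4, D4 = stats(table4)
reds5, D5 = stats(table5)
print(reds4, D4, reds5, D5)
```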
Thus, all the essential circumstances foreseen theoretically for simple chains of trials are in excellent agreement with our experiments.

Problems for Solution

1. From an urn originally containing a white and b black balls, n balls are drawn in succession, each ball drawn being replaced, together with c (c > 0) balls of the same color, before the next drawing. If m is the frequency of white balls, show that the probability of the inequality

|m/n − a/(a + b)| < ε

does not tend to 1 as n increases indefinitely (Markoff, G. Pólya).

Indication of the Proof. If x_i = 1 or x_i = 0 according as a white or a black ball appears in the ith drawing, we have

E(x_i) = E(x_i²) = a/(a + b),  E(x_i x_j) = a(a + c)/((a + b)(a + b + c)),

whence

B_n = E(x₁ + x₂ + ··· + x_n − na/(a + b))² = n(n − 1)abc/((a + b)²(a + b + c)) + nab/(a + b)²,

so that B_n/n² does not tend to 0; since the variables x_i are bounded, the law of large numbers cannot hold (cf. Chap. X, Sec. 3).
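The failure of the law of large numbers here is easy to see experimentally: in Pólya's urn the fraction of white draws converges, but to a random limit, so it does not concentrate at a/(a + b). The sketch below (illustrative parameters; with a = b = c = 1 the limiting fraction is uniform on (0, 1)) estimates how often the fraction falls near a/(a + b).

```python
import random

random.seed(8)

def white_fraction(a, b, c, n):
    white, black = a, b
    m = 0
    for _ in range(n):
        if random.random() < white / (white + black):
            m += 1
            white += c
        else:
            black += c
    return m / n

a, b, c, n, eps, reps = 1, 1, 1, 1000, 0.1, 1000
inside = sum(1 for _ in range(reps)
             if abs(white_fraction(a, b, c, n) - a / (a + b)) < eps)
p_inside = inside / reps
print(p_inside)   # stays well below 1, near 0.2 for a uniform limit
```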
2. Marbe's Problem. A group of exactly m uninterrupted successes E, or failures F, in a Bernoullian series of trials with the probability p for a success is called an "m sequence." If N is the frequency of m sequences in n trials, show that the probability of the inequality

|N/n − (p^m q² + p²q^m)| < ε

for a fixed ε > 0 tends to 1 as n → ∞.

These examples may suffice to give an idea of problems in geometric probabilities. Sylvester, Crofton, and others have enriched this field by extremely ingenious methods of evaluating, or rather of avoiding the evaluation of, very complicated multiple integrals. However, from the standpoint of principles, these investigations, ingenious as they are, do not contribute much to the general theory of probability.

Problems for Solution
1. A point X is taken at random on a rectilinear segment AB = l whose middle point is O. What is the probability that AX, BX, and AO can form a triangle? The distribution of AX = x is assumed to be uniform. Ans. ½.

2. Two points X₁, X₂ are taken at random on AB = l. Assuming uniform distribution of probability, what is the mathematical expectation of the nth power of the distance between X₁ and X₂? Ans.

(1/l²) ∫₀^l ∫₀^l |x₁ − x₂|ⁿ dx₁dx₂ = 2lⁿ/((n + 1)(n + 2)).

3. Three points X₁, X₂, X₃ are taken at random on AB. What is the probability that X₃ lies between X₁ and X₂? Ans. ⅓, assuming uniform distribution of probability.

4. A rectilinear segment AB is divided into four equal parts AC = CO = OD = DB. Supposing that the distribution of probability is symmetric with respect to O, let p be the probability that a point selected at random on AB will be between C and D. Also, let Q be the probability that the middle point between two points selected at random will be between C and D. Prove that

Q ≧ (1 + p²)/2.

HINT: The middle point of a segment X₁X₂ is surely between C and D if: (i) X₁ and X₂ are in CO; or (ii) X₁ and X₂ are in OD; or (iii) X₁ and X₂ are on opposite sides of O.
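Problems 2 and 3 admit quick Monte Carlo checks; the sketch below (added here as an illustration, with l = 1) verifies the probability ⅓ of Prob. 3 and the n = 1 case of Prob. 2, E|X₁ − X₂| = l/3.

```python
import random

random.seed(9)

trials, between, dist = 200000, 0, 0.0
for _ in range(trials):
    x1, x2, x3 = (random.random() for _ in range(3))
    if min(x1, x2) < x3 < max(x1, x2):
        between += 1
    dist += abs(x1 - x2)

p_between = between / trials
mean_dist = dist / trials
print(p_between, mean_dist)   # both near 1/3
```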
5. Two points X₁, X₂ are chosen at random in a circle of radius r. Assuming uniform distribution of probability, what is the mathematical expectation of their distance?

Ans. Denoting the required mathematical expectation by M, we have

π²r⁴M = ∫₀^{2π} ∫₀^{2π} F(r, θ, θ′) dθ dθ′,

where

F(r, θ, θ′) = ∫₀^r ∫₀^r √(ρ² + ρ′² − 2ρρ′ cos(θ − θ′)) ρρ′ dρ dρ′.

Hence, varying r by dr,

d(π²r⁴M) = 4πr dr ∫₀^{2π} ∫₀^r √(r² + ρ² − 2rρ cos ω) ρ dρ dω.

By the introduction of new polar coordinates the integral in the right member can be exhibited as

∫_{−π/2}^{π/2} dω ∫₀^{2r cos ω} u² du = (16/3) r³ ∫₀^{π/2} cos³ω dω = (32/9) r³,

whence

d(π²r⁴M) = (128π/9) r⁴ dr,

and, integrating,

M = 128r/(45π).
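The closed form M = 128r/(45π) ≈ 0.9054r is easy to confirm by a Monte Carlo sketch (added here; r = 1, points sampled by rejection in the unit disc).

```python
import math, random

random.seed(10)

def point():
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

trials = 100000
total = 0.0
for _ in range(trials):
    (x1, y1), (x2, y2) = point(), point()
    total += math.hypot(x1 - x2, y1 - y2)

mc = total / trials
exact = 128 / (45 * math.pi)
print(mc, exact)
```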
6. n vectors of lengths r₁, r₂, . . . r_n issue in succession, one from the end point of another, the direction of each being chosen at random. Show that the probability that the distance OM of the final end point M from the initial point O will be > R is less than 2G/R², where G = r₁² + r₂² + ··· + r_n², no matter how large n is.

Indication of the Solution. Let x₁, x₂, . . . x_n; y₁, y₂, . . . y_n be the components of the successive vectors along rectangular axes OX, OY. Then

E(x_i) = E(y_i) = 0,  E(x_i²) = E(y_i²) = r_i²/2.

By Tshebysheff's lemma (Chap. X, Sec. 1) the probabilities Q and Q′ of the inequalities

|x₁ + x₂ + ··· + x_n| > t√((r₁² + r₂² + ··· + r_n²)/2),
|y₁ + y₂ + ··· + y_n| > t√((r₁² + r₂² + ··· + r_n²)/2)

are both less than 1/t². Now, if the length OM > R, then either

|x₁ + x₂ + ··· + x_n| > R/√2  or  |y₁ + y₂ + ··· + y_n| > R/√2.

Hence, taking t√(G/2) = R/√2, that is, t = R/√G, the probability P for the length of OM to be > R is less than Q + Q′; that is,

P < 2G/R².
References

E. Czuber: "Geometrische Wahrscheinlichkeiten und Mittelwerte," Leipzig, 1884.
E. Czuber: "Wahrscheinlichkeitsrechnung," 1, pp. 75–109, Leipzig, 1908.
H. Poincaré: "Calcul des probabilités," pp. 118–152, Paris, 1912.
W. Crofton: On the Theory of Local Probability Applied to Straight Lines Drawn at Random in a Plane, the Method Used Being Also Extended to the Proof of Certain New Theorems in Integral Calculus, Philos. Trans., vol. 158, 1868.
W. Crofton: Probability, "Encyclopaedia Britannica," 9th ed.
CHAPTER XIII

THE GENERAL CONCEPT OF DISTRIBUTION

1. In dealing with continuous stochastic variables we have introduced the important concept of the function of distribution. Denoting the density of probability by f(x), this function was defined by

F(t) = ∫_{−∞}^{t} f(x) dx,

and it represents the probability of the inequality x < t. For a variable with a finite number of values the function of distribution can be defined as the sum

F(t) = Σ_{x_i < t} p_i,

where p₁, p₂, . . . p_n are the respective probabilities of all the possible values x₁, x₂, . . . x_n of the variable x. The notation x_i < t is intended to show that the summation is extended over all values of x less than t. Again, F(t) for any real t represents the probability of the inequality x < t. In this case F(t) is a discontinuous function, never decreasing and varying between F(−∞) = 0 and F(+∞) = 1. Its discontinuities are located at the points x₁, x₂, . . . x_n and are such that

F(x_i + 0) − F(x_i − 0) = p_i,

denoting, in the customary way,

F(x_i + 0) = lim F(x_i + ε),  F(x_i − 0) = lim F(x_i − ε)

when ε, through positive values, converges to 0. To represent F(t) graphically we note that

F(t) = 0 for t ≦ x₁;  F(t) = p₁ for x₁ < t ≦ x₂;  F(t) = p₁ + p₂ for x₂ < t ≦ x₃;  . . . ;  F(t) = p₁ + p₂ + ··· + p_n = 1 for t > x_n.
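The step function just described can be written out directly; the following sketch (an added illustration with arbitrary values and probabilities) builds F(t) for a small discrete distribution and evaluates it around the jump points.

```python
# F(t) = sum of p_i over all values x_i < t, a never-decreasing step function
def make_F(values, probs):
    pairs = sorted(zip(values, probs))
    def F(t):
        return sum(p for x, p in pairs if x < t)
    return F

F = make_F([1, 2, 5], [0.25, 0.5, 0.25])
print(F(0), F(1), F(1.5), F(2), F(6))   # -> 0 0 0.25 0.25 1.0
```

Note that F is continuous from the left: F(1) is still 0, and the jump p₁ = 0.25 appears only for t > 1, exactly as in the relations F(x_i + 0) − F(x_i − 0) = p_i.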
s > 0. HINT: Use Liapounoff's inequality.

3. A variable is distributed over the interval (0, +∞) with a decreasing density of probability. Show that in this case the moments M₂ and M₄ satisfy the inequality

M₂² ≦ (5/9)M₄  (Gauss),

and that in general

[(μ + 1)M_μ]^{1/μ} ≦ [(ν + 1)M_ν]^{1/ν}  if  ν > μ > 0.

Indication of the Proof. Show first that the existence of the integral

∫₀^{∞} x^ν f(x) dx,

in case f(x) is a positive and decreasing function, implies

lim a^{ν+1} f(a) = 0  as  a → +∞.

Hence, setting φ(x) = f(0) − f(x), deduce that

∫₀^{∞} x dφ(x) = 1,  ∫₀^{∞} x^{ν+1} dφ(x) = (ν + 1)M_ν,

and, finally, apply the inequality

[∫₀^{∞} x^{μ+1} dφ(x)]^ν ≦ [∫₀^{∞} x^{ν+1} dφ(x)]^μ [∫₀^{∞} x dφ(x)]^{ν−μ}.
4. Using the composition formula (1), page 269, prove Laplace's formula on page 278 by mathematical induction.

5. Prove that the distribution function of probability for a variable whose characteristic function \varphi(t) is given can be determined by the formula

F(t) = C + \lim_{h \to 0} \frac{1}{2\pi}\int_{-\infty}^{\infty} \varphi(v)\,\frac{1 - e^{-ivt}}{iv}\,\frac{dv}{1 + h^2v^2}.

HINT: In carrying out Liapounoff's idea, take an auxiliary variable with the distribution G(y) of density \frac{1}{2h}e^{-|y|/h}. Also make use of the integral

\frac{1}{2h}\int_{-\infty}^{\infty} e^{-|z|/h}e^{izv}\,dz = \frac{1}{1 + h^2v^2}.
Many definite integrals can be evaluated using the relation between characteristic and distribution functions, as the following example shows.

6. Let x be distributed over (-\infty, +\infty) with the density \tfrac{1}{2}e^{-|x|}, the characteristic function being in this case

\varphi(t) = \frac{1}{2}\int_{-\infty}^{\infty} e^{-|x|}e^{itx}\,dx = \frac{1}{1+t^2}.

Applying the formula of the preceding problem, we find

F(t) = C + \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{1 - e^{-ivt}}{iv(1+v^2)}\,dv,

whence

\frac{1}{\pi}\int_{-\infty}^{\infty} \frac{e^{-ivt}}{1+v^2}\,dv = e^{-|t|},

an integral due to Laplace.
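The characteristic function stated above is easy to confirm numerically. The sketch below is our own check (function name is ours): by symmetry the real integral \int_0^\infty e^{-x}\cos(tx)\,dx should equal 1/(1+t^2).

```python
from math import exp, cos

def charfun(t, upper=60.0, steps=120000):
    # phi(t) = integral of (1/2) e^{-|x|} e^{itx} over the line
    #        = integral of e^{-x} cos(tx) over (0, infinity), by symmetry
    h = upper / steps
    return sum(exp(-(i + 0.5) * h) * cos(t * (i + 0.5) * h) for i in range(steps)) * h

for t in (0.0, 0.5, 1.3, 3.0):
    assert abs(charfun(t) - 1 / (1 + t * t)) < 1e-6
```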
7. A variable is said to have Poisson's distribution if it can have only integral values 0, 1, 2, \ldots and the probability of x = k is

P(x = k) = \frac{a^k}{1 \cdot 2 \cdots k}\,e^{-a};

the quantity a is called the "parameter" of the distribution. If n variables have Poisson's distribution with parameters a_1, a_2, \ldots a_n, show that their sum also has Poisson's distribution, the parameter of which is a_1 + a_2 + \cdots + a_n.

INTRODUCTION TO MATHEMATICAL PROBABILITY
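The additivity of Poisson's parameter can be verified by direct convolution of two probability tables. A minimal sketch (our own, with hypothetical parameter values):

```python
from math import exp, factorial

def poisson_pmf(a, k):
    # P(x = k) for Poisson's distribution with parameter a
    return a**k * exp(-a) / factorial(k)

def convolve(p, q, n):
    # distribution of the sum of two independent integer-valued variables, up to n
    return [sum(p(i) * q(k - i) for i in range(k + 1)) for k in range(n + 1)]

a1, a2 = 1.5, 2.25
summed = convolve(lambda k: poisson_pmf(a1, k), lambda k: poisson_pmf(a2, k), 20)
for k, pk in enumerate(summed):
    assert abs(pk - poisson_pmf(a1 + a2, k)) < 1e-12
```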
8. Prove the following result:

\frac{1}{2\pi}\int_{-\infty}^{\infty}\left(\frac{\sin v}{v}\right)^n\frac{\sin tv}{v}\,dv = -\frac{1}{2} + \frac{(t+n)^n - \dfrac{n}{1}(t+n-2)^n + \dfrac{n(n-1)}{1 \cdot 2}(t+n-4)^n - \cdots}{2 \cdot 4 \cdot 6 \cdots 2n},

the series being continued as long as the arguments remain positive.
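The alternating series in the right member is, up to the constant -1/2, the distribution function of a sum of n uniform variables on (-1, +1), and it can be checked against the classical Irwin-Hall form of that same distribution function. The sketch below is our own (both function names are ours), written under that reading of the formula:

```python
from math import comb, factorial

def cdf_series(t, n):
    # F_n(t) via the alternating series: the k-th term uses argument t + n - 2k,
    # and the series stops once the argument is no longer positive
    total, k = 0.0, 0
    while k <= n and t + n - 2 * k > 0:
        total += (-1)**k * comb(n, k) * (t + n - 2 * k)**n
        k += 1
    denom = 1.0
    for j in range(1, n + 1):
        denom *= 2 * j                          # 2 * 4 * 6 ... 2n
    return total / denom

def cdf_irwin_hall(t, n):
    # same distribution after rescaling the sum of n uniform(-1,1) variables
    s = (t + n) / 2.0                           # P(S_n < t) = P(sum of U_i < (t+n)/2)
    return sum((-1)**k * comb(n, k) * max(s - k, 0.0)**n
               for k in range(n + 1)) / factorial(n)

for n in (1, 2, 3, 5):
    for t in (-0.9, 0.0, 0.4, 1.3):
        assert abs(cdf_series(t, n) - cdf_irwin_hall(t, n)) < 1e-9
```

The agreement is exact because 2·4·6···2n = 2^n n!, so both expressions are the same piecewise polynomial.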
HINT: Consider the sum of n uniformly distributed variables in the interval (-1, +1) and express its distribution function in two different ways.

9. Establish the expression for the mathematical expectation of the absolute value of the sum of n uniformly distributed variables in the interval (-\tfrac{1}{2}, +\tfrac{1}{2}).

Ans.

E|x_1 + x_2 + \cdots + x_n| = \frac{2\left[n^{n+1} - \dfrac{n}{1}(n-2)^{n+1} + \dfrac{n(n-1)}{1 \cdot 2}(n-4)^{n+1} - \cdots\right]}{2 \cdot 4 \cdot 6 \cdots (2n+2)},

the series being continued as long as the arguments remain positive.
HINT: Apply Laplace's formula on page 278, conveniently modified, to express the expectation of x_1 + x_2 + \cdots + x_n and that of |x_1 + x_2 + \cdots + x_n|.

10. Show that under the same conditions as in Prob. 9

E|x_1 + x_2 + \cdots + x_n| = \frac{n}{2\pi}\int_{-\infty}^{\infty}\left(\frac{\sin t}{t}\right)^{n-1}\frac{\sin t - t\cos t}{t^3}\,dt.

HINT: Prove and use the following formula:

\lim_{T \to \infty}\int_{-T}^{T}\frac{e^{i\omega t} - 1 - i\omega t}{t^2}\,dt = -\pi|\omega|.
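The two expressions for the expectation, the finite series of Prob. 9 and the integral of Prob. 10, can be compared numerically. This is our own sketch (function names ours), written under the reconstructed form of both formulas:

```python
from math import comb, sin, cos, pi

def series_expectation(n):
    # Prob. 9: 2 [n^{n+1} - C(n,1)(n-2)^{n+1} + ...] / (2*4*6*...*(2n+2))
    total, k = 0.0, 0
    while k <= n and n - 2 * k > 0:
        total += (-1)**k * comb(n, k) * (n - 2 * k)**(n + 1)
        k += 1
    denom = 1.0
    for j in range(1, n + 2):
        denom *= 2 * j                          # 2 * 4 * ... * (2n+2)
    return 2 * total / denom

def integral_expectation(n, upper=60.0, steps=300000):
    # Prob. 10: (n/pi) * integral over (0, inf) of (sin t/t)^{n-1}(sin t - t cos t)/t^3
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += (sin(t) / t)**(n - 1) * (sin(t) - t * cos(t)) / t**3 * h
    return n * total / pi

for n in (1, 2, 3, 4):
    assert abs(series_expectation(n) - integral_expectation(n)) < 1e-3
```

For n = 1 both give 1/4, the expectation of |x| for a single uniform variable on (-1/2, 1/2).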
11. Let x_1 and x_2 be two identical and normally distributed variables with the mean 0 and the standard deviation \sigma. If x is defined as the greater of the values |x_1|, |x_2|, that is,

x = \max(|x_1|, |x_2|),

find the mean value of x as well as that of x^2.

Ans.

E(x) = \frac{2\sigma}{\sqrt{\pi}}.

12. Let

x = \min(|x_1|, |x_2|, \ldots |x_n|)

where x_1, x_2, \ldots x_n are identical normally distributed variables with the mean 0 and the standard deviation \sigma. Find the mean value of x.

Ans. Setting for brevity

\Theta(t) = \frac{2}{\sigma\sqrt{2\pi}}\int_0^t e^{-\frac{u^2}{2\sigma^2}}\,du,

we have

E(x) = \int_0^\infty (1 - \Theta(t))^n\,dt.

In particular, for n = 2,

E(x) = \frac{2\sigma}{\sqrt{\pi}}(\sqrt{2} - 1).

For large n, asymptotically,

E(x) \sim \frac{\sigma}{n}\sqrt{\frac{\pi}{2}}.
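Both answers admit a quick numerical check through the tail-probability integrals E\max = \int_0^\infty [1 - \Theta(t)^2]\,dt and E\min = \int_0^\infty [1 - \Theta(t)]^2\,dt, with \Theta(t) = \mathrm{erf}(t/(\sigma\sqrt{2})). A sketch of our own (names ours):

```python
from math import erf, sqrt, pi

def tail_integral(g, upper=40.0, steps=200000):
    # midpoint quadrature of the integral of g over (0, upper)
    h = upper / steps
    return sum(g((i + 0.5) * h) for i in range(steps)) * h

sigma = 1.0
theta = lambda t: erf(t / (sigma * sqrt(2.0)))          # P(|x| <= t)

e_max = tail_integral(lambda t: 1.0 - theta(t)**2)      # E max(|x1|, |x2|)
e_min = tail_integral(lambda t: (1.0 - theta(t))**2)    # E min(|x1|, |x2|)

assert abs(e_max - 2 * sigma / sqrt(pi)) < 1e-6
assert abs(e_min - (2 * sigma / sqrt(pi)) * (sqrt(2) - 1)) < 1e-6
```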
\frac{\varphi(v) - 1}{v^2} = -\int_{-\infty}^{\infty} t^2\left[\int_0^1 (1-x)e^{ivtx}\,dx\right]dF(t).

The quotient is represented by a uniformly convergent integral; hence

\lim_{v \to 0}\frac{\varphi(v) - 1}{v^2} = -\frac{1}{2}\int_{-\infty}^{\infty} t^2\,dF(t) = -\frac{1}{2},

or
\ldots > 0, we can take n so large that

|\theta(v_i)| < \epsilon; \qquad i = 0, 1, \ldots n,

whence |\Theta| < \epsilon t^2. Thus
\left|\log \varphi(t) + \tfrac{1}{2}t^2\right| < \epsilon t^2,

and, since \epsilon can be taken arbitrarily small, \log \varphi(t) \to -\tfrac{1}{2}t^2.

\ldots splitting it into five integrals I_1, I_2, I_3, I_4, I_5 taken respectively between the limits -\infty, -l; -l, -\lambda; -\lambda, \lambda; \lambda, l; l, +\infty. To estimate I_3, we notice that

\left|\varphi_n(v) - e^{-\frac{v^2}{2}}\right| \leq \ldots v^2,

whence

(8) \qquad |I_3| \leq \frac{1}{2\pi}\int_{-\lambda}^{\lambda} \ldots |v|\,dv = \frac{\lambda^2}{2\pi}\ldots

To estimate I_2 + I_4, we use the inequality

(9) \qquad \left|\varphi_n(v) - e^{-\frac{v^2}{2}}\right| \leq E_n(l)\ldots

Finally, dealing with I_1 and I_5, we use the obvious inequality

\left|\varphi_n(v) - e^{-\frac{v^2}{2}}\right| \leq 2,

and we obtain (10). Taking into account (7), (8), (9), and (10), the following inequality results:

(11) \qquad \left|H_n(t) - \frac{1}{2} - \frac{1}{\pi}\int_0^\infty e^{-\frac{v^2}{2}}\frac{\sin tv}{v}\,dv\right| < \ldots

Since \lambda is still at our disposal, we can take \lambda = E_n(l)h^{-1}\ldots The inequality thus obtained, when combined with (11), gives

\left|F_n(t) - \frac{1}{2} - \frac{1}{\pi}\int_0^\infty e^{-\frac{v^2}{2}}\frac{\sin tv}{v}\,dv\right| \leq \ldots
0, it will be satisfied for all smaller 8. Let
/i(t)
be the distribution function of
x;(i
=
1, 2, ... n).
The
sum
f(t)
=
ft(t) + f2(t) + . . . + f,.(t)
being a nondecreasing function of (Chap. XIII, Sec.
t,
the following inequality holds
5):
(f_ ,.Wdf(t) y-c � (f_ ,.ltj•df(t) y-b' (f_")l"df(t) y-•, ..
..
INTRODUCTION TO MATHEMATICAL PROBABILITY
290
provided a> b > c > 0.
We take here
a= 2 + 6, '
supposing 0 < 6 < 6.
f_ and
IWdf(t)
=
b
=
2 + 6',
c=2
Then
�Jl��a'; n
....
[CHAP, XIV
f_00,.1tjadj(t)
1
�Jl��6; n
=
f_ ,.ltl•df(t)
..
1
=
B ,.
(*Jl��6'Y B,.a-a'(*JlwaY'· �
But this inequality is equivalent to
and it shows that
if n
�r2+6 __a_�o, �,(k)
1 _ 1+ B" 2 provided 0 < 61 < 8.
Hence, in the proof we can assume that the funda-
mental condition is satisfied for some positive 6 � 1. b. Liapounoff's inequality (Chap. XIII, Sec. 5) with c= 0, b = 2, a
=
2 + 8 when applied to
\sigma_k^2 \leq \left(\mu_k^{(2+\delta)}\right)^{\frac{2}{2+\delta}}.

Hence, we can write

(12) \qquad b_k = \frac{\sigma_k^2}{B_n} \leq \left(\frac{\mu_k^{(2+\delta)}}{B_n^{1+\frac{\delta}{2}}}\right)^{\frac{2}{2+\delta}}\ldots

\ldots If

(14) \qquad \Phi(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du,

then \varphi_n(v) in any fixed interval -l \leq v \leq l tends uniformly to e^{-\frac{v^2}{2}}. It suffices to apply the fundamental lemma to conclude that the probability of the inequality

\frac{m - np_n}{\sqrt{B_n}} < t_n

tends uniformly to the limit

\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du

if t_n tends to t. Since B_n is asymptotic to npq and p_n differs from p by a quantity of the order 1/n, the inequality

\frac{m - np}{\sqrt{npq}} < t
Show (a) that Liapounoff's condition is satisfied when \rho < 1 and hence the limit theorem holds; (b) that this condition is not satisfied if \rho \geq 1, and the limit theorem fails at least for \rho > 1.

Solution. a. By using Euler's formula we find that the ratio in Liapounoff's condition tends to 0 when \rho < 1. Hence the first part is answered.

b. The probability of the inequality

z_1 + z_2 + \cdots + z_n \neq 0

is less than

\sum_{k=1}^{n}\frac{1}{2(k+a)\{\log(k+a)\}^{\rho}}

and this, in case \rho > 1, is less than

\frac{2}{\rho-1}(\log a)^{1-\rho}.

Hence, the probability of the equality

z_1 + z_2 + \cdots + z_n = 0

remains always

> 1 - \frac{2}{\rho-1}(\log a)^{1-\rho}

and the limit theorem cannot hold. Note that B_n \to \infty because 2\mu - \rho + 1 > 0.
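The limit theorem itself, in the Bernoulli case discussed above, is easy to demonstrate numerically: the standardized binomial distribution function approaches the normal integral. A sketch of our own (names and the sample values n = 500, p = 0.3 are ours):

```python
from math import comb, erf, sqrt

def binom_cdf_std(n, p, t):
    # P((m - np)/sqrt(npq) < t) for m successes in n Bernoulli trials
    mu, sd = n * p, sqrt(n * p * (1 - p))
    return sum(comb(n, m) * p**m * (1 - p)**(n - m)
               for m in range(n + 1) if (m - mu) / sd < t)

def normal_cdf(t):
    return 0.5 * (1 + erf(t / sqrt(2)))

n, p = 500, 0.3
for t in (-1.5, 0.0, 0.7, 2.0):
    assert abs(binom_cdf_std(n, p, t) - normal_cdf(t)) < 0.04
```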
4. Prove the asymptotic formula

1 + \frac{n}{1} + \frac{n^2}{1 \cdot 2} + \cdots + \frac{n^n}{1 \cdot 2 \cdots n} \sim \frac{1}{2}e^n,

n being a large integer.

HINT: Apply Liapounoff's theorem to n variables distributed according to Poisson's law with parameter 1.

5. By resorting to the fundamental lemma, prove the following theorem due to Markoff: If for a variable s_n with the mean 0 and the standard deviation 1

\lim_{n \to \infty} E(s_n^k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} t^k e^{-\frac{t^2}{2}}\,dt

for any given k = 3, 4, 5, \ldots, then the probability of the inequality s_n < t tends to the limit

\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{u^2}{2}}\,du.
6. In many special cases the limit of the error term can be considerably lower than that given in Sec. 6. For instance, if the variables x_1, x_2, \ldots x_n are identical and uniformly distributed in the interval (-\tfrac{1}{2}, \tfrac{1}{2}), the probability F_n(t) of the inequality

\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{n/12}} < t

\ldots

\ldots u - 4pq(1 - 3pq)u^2 + \cdots for small u, while the other root tends to 0 as u \to 0. The final conclusion can now be reached in the same way as in Examples 1 and 2, pages 297 and 301.
References

P. LAPLACE: "Théorie analytique des probabilités," Oeuvres VII, pp. 309ff.
S. POISSON: "Recherches sur la probabilité des jugements en matière criminelle et en matière civile," pp. 246ff., Paris, 1837.
P. TSHEBYSHEFF: "Sur deux théorèmes relatifs aux probabilités," Oeuvres 2, pp. 481-491.
A. MARKOFF: The Law of Large Numbers and the Method of the Least Squares (Russian), Proc. Phys.-Math. Soc. Kazan, 1898.
---: Sur les racines de l'équation e^{x^2}\frac{d^n e^{-x^2}}{dx^n} = 0, Bull. Acad. Sci. St. Pétersbourg, 9, 1898.
---: "Démonstration du second théorème limite du calcul des probabilités par la méthode des moments," St. Pétersbourg, 1913.
---: "Wahrscheinlichkeitsrechnung," Leipzig, 1912.
A. LIAPOUNOFF: Sur une proposition de la théorie des probabilités, Bull. Acad. Sci. St. Pétersbourg, 13, 1900.
---: Nouvelle forme du théorème sur la limite de probabilité, Mém. Acad. Sci. St. Pétersbourg, 12, 1901.
G. PÓLYA: Ueber den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung, Math. Z., 8, 1920.
J. W. LINDEBERG: Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung, Math. Z., 15, 1922.
P. LÉVY: "Calcul des probabilités," Paris, 1925.
S. BERNSTEIN: Sur l'extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes, Math. Ann., 97, 1927.
A. KOLMOGOROFF: Eine Verallgemeinerung des Laplace-Liapounoffschen Satzes, Bull. Acad. Sci. U.S.S.R., p. 959, 1931.
---: Ueber die Grenzwertsätze der Wahrscheinlichkeitsrechnung, Bull. Acad. Sci. U.S.S.R., p. 363, 1933.
A. KHINTCHINE: "Asymptotische Gesetze der Wahrscheinlichkeitsrechnung," Berlin, 1933.
CHAPTER XV

NORMAL DISTRIBUTION IN TWO DIMENSIONS. LIMIT THEOREM FOR SUMS OF INDEPENDENT VECTORS. ORIGIN OF NORMAL CORRELATION

1. The concept of normal distribution can easily be extended to two and more variables. Since the extension to more than two variables does not involve new ideas, we shall confine ourselves to the case of two-dimensional normal distribution.

Two variables x, y are said to be normally distributed if for them the density of probability has the form e^{-\varphi}, where

\varphi = ax^2 + 2bxy + cy^2 + 2dx + 2ey + f

is a quadratic function of x, y becoming positive and infinitely large together with |x| + |y|. This requirement is fulfilled if, and only if,

ax^2 + 2bxy + cy^2

is a positive quadratic form. The necessary and sufficient conditions for this are:

a > 0; \qquad ac - b^2 = \Delta > 0.

Since \Delta > 0 (even a milder requirement \Delta \neq 0 suffices), constants x_0, y_0 can be found so that

\varphi = a(x-x_0)^2 + 2b(x-x_0)(y-y_0) + c(y-y_0)^2 + g

identically in x, y. It follows that the density of probability may be presented thus:

K e^{-a(x-x_0)^2 - 2b(x-x_0)(y-y_0) - c(y-y_0)^2}.

The expression in the right member depends on six parameters K; a, b, c; x_0, y_0. But the requirement that the total probability be 1 reduces the number of independent parameters to five. We can take a, b, c; x_0, y_0 for independent parameters and determine K by the condition

K\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-a(x-x_0)^2 - 2b(x-x_0)(y-y_0) - c(y-y_0)^2}\,dx\,dy = 1,

which, by introducing the new variables

\xi = x - x_0, \qquad \eta = y - y_0,

can be exhibited thus:

K\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-a\xi^2 - 2b\xi\eta - c\eta^2}\,d\xi\,d\eta = 1.
To evaluate this and similar double integrals we observe that the positive quadratic form

a\xi^2 + 2b\xi\eta + c\eta^2

can be presented in infinitely many ways as a sum of two squares

a\xi^2 + 2b\xi\eta + c\eta^2 = (\alpha\xi + \beta\eta)^2 + (\gamma\xi + \delta\eta)^2,

whence

b = \alpha\beta + \gamma\delta

and

(\alpha\delta - \beta\gamma)^2 = ac - b^2 = \Delta.

By changing the signs of \alpha and \beta if necessary, we can always suppose

\alpha\delta - \beta\gamma = +\sqrt{\Delta}.

Now we take

u = \alpha\xi + \beta\eta; \qquad v = \gamma\xi + \delta\eta

for new variables of integration. Since the Jacobian of u, v with respect to \xi, \eta is \sqrt{\Delta}, the Jacobian of \xi, \eta with respect to u, v will be, by the known rules, 1/\sqrt{\Delta}. Thus

\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-a\xi^2 - 2b\xi\eta - c\eta^2}\,d\xi\,d\eta = \frac{1}{\sqrt{\Delta}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-u^2 - v^2}\,du\,dv = \frac{\pi}{\sqrt{\Delta}},

whence

\frac{K\pi}{\sqrt{\Delta}} = 1, \qquad K = \frac{\sqrt{\Delta}}{\pi}.

That is, the general expression for the density of probability in two-dimensional normal distribution is

\frac{\sqrt{ac - b^2}}{\pi}\,e^{-a(x-x_0)^2 - 2b(x-x_0)(y-y_0) - c(y-y_0)^2}.
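The normalizing constant K = \sqrt{ac-b^2}/\pi can be confirmed by direct quadrature over a box that captures essentially all of the mass. A sketch of our own (the values of a, b, c are arbitrary choices satisfying a > 0, ac - b^2 > 0):

```python
from math import exp, sqrt, pi

a, b, c = 2.0, 0.7, 1.5
K = sqrt(a * c - b * b) / pi

# midpoint quadrature of K * exp(-a xi^2 - 2b xi eta - c eta^2) over [-R, R]^2
h, R = 0.04, 8.0
n = int(2 * R / h)
total = 0.0
for i in range(n):
    xi = -R + (i + 0.5) * h
    for j in range(n):
        eta = -R + (j + 0.5) * h
        total += K * exp(-a * xi * xi - 2 * b * xi * eta - c * eta * eta) * h * h
assert abs(total - 1.0) < 1e-6
```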
-
b2e- (..-zo)L- (..-zol l 1, we find as the expression of the probability for x,yto belong to the ring between
two ellipses l1 and l2.
If l1
=
0 and l2
1
l,
=
- e-1
gives the probability for x, yto belong to the ellipse l. If n numbers l, l1, l2, . . . ln-1 are determined by the conditions
1 - -1 e
=
-1 - -1 • e e
-Z. - -1• e e
=
=
·
the whole plane is divided into n +
·
1
·
=
e-1·-•
-
e-1·-•
=
1 --, n+l
regions of equal probability:
namely, the interior of the ellipse l, the rings between l, l_1; l_1, l_2; \ldots l_{n-2}, l_{n-1}, and, finally, the part of the plane outside of the ellipse l_{n-1}.

5. To find the distribution function of the variable x (without any regard to y), we must take for D the domain

-\infty < x < t; \qquad -\infty < y < +\infty,

\ldots we see that the probability of the inequality x < t is expressed by

\frac{1}{\sigma_1\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{(x-x_0)^2}{2\sigma_1^2}}\,dx.

\ldots dx_1 \cdots dx_n\,dy_1 \cdots dy_n extended over the 2n-dimensional domain

\Sigma(x_i - s)(y_i - s_1) \ldots \qquad (3)
APPENDIX I

1. Euler's Summation Formula. The notation

\sum_{n>a}^{n \leq b} f(n)

will be used to designate the sum extended over all integers n which are > a and \leq b. It is an important problem to devise means for the approximate evaluation of the above sum when it contains a considerable number of terms.

Let [x], as usual, denote the largest integer contained in a real number x, so that x = [x] + \theta where \theta, the so-called "fractional part" of x, satisfies the inequalities 0 \leq \theta < 1. Further, set

\rho(x) = [x] - x + \tfrac{1}{2}.

Then the following identity holds:

(1) \qquad \sum_{n>a}^{n \leq b} f(n) = \int_a^b f(x)\,dx + \rho(b)f(b) - \rho(a)f(a) - \int_a^b \rho(x)f'(x)\,dx,

which is known as "Euler's summation formula."

Proof. Let k be the least integer > a and l the greatest integer \leq b. The sum in the left member of (1) is, by definition,

f(k) + f(k+1) + \cdots + f(l),

and we must show that this is equal to the right member. To this end we write first

\int_a^b \rho(x)f'(x)\,dx = \int_a^k \rho(x)f'(x)\,dx + \int_l^b \rho(x)f'(x)\,dx + \sum_{j=k}^{l-1}\int_j^{j+1}\rho(x)f'(x)\,dx.

Next, since [x] = j in the interval (j, j+1),

\int_j^{j+1}\rho(x)f'(x)\,dx = \int_j^{j+1}\left(j - x + \tfrac{1}{2}\right)f'(x)\,dx = -\frac{f(j)}{2} - \frac{f(j+1)}{2} + \int_j^{j+1} f(x)\,dx,

and

\sum_{j=k}^{l-1}\int_j^{j+1}\rho(x)f'(x)\,dx = -\frac{f(k)}{2} - \frac{f(l)}{2} - \sum_{n=k+1}^{l-1} f(n) + \int_k^l f(x)\,dx.

On the other hand,

\int_a^k \rho(x)f'(x)\,dx = \int_a^k\left(k - 1 - x + \tfrac{1}{2}\right)f'(x)\,dx = -\frac{f(k)}{2} - \rho(a)f(a) + \int_a^k f(x)\,dx,

\int_l^b \rho(x)f'(x)\,dx = \int_l^b\left(l - x + \tfrac{1}{2}\right)f'(x)\,dx = -\frac{f(l)}{2} + \rho(b)f(b) + \int_l^b f(x)\,dx,

so that finally

\int_a^b \rho(x)f'(x)\,dx = -f(k) - f(k+1) - \cdots - f(l) + \rho(b)f(b) - \rho(a)f(a) + \int_a^b f(x)\,dx;

whence

\sum_{n>a}^{n \leq b} f(n) = \int_a^b f(x)\,dx + \rho(b)f(b) - \rho(a)f(a) - \int_a^b \rho(x)f'(x)\,dx,

which completes the proof of Euler's formula.
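Formula (1) can be verified numerically for a concrete f. The sketch below (our own; names ours) takes f(x) = x^2 on (0, 10), so the left member is 1^2 + 2^2 + \cdots + 10^2 = 385:

```python
from math import floor

def rho(x):
    # rho(x) = [x] - x + 1/2, the periodic sawtooth of Euler's summation formula
    return floor(x) - x + 0.5

def euler_sum(f, fprime, F, a, b, steps=200000):
    # right member of (1); F is an antiderivative of f, the last integral
    # is evaluated by the midpoint rule (cell boundaries fall on the jumps)
    h = (b - a) / steps
    corr = sum(rho(a + (i + 0.5) * h) * fprime(a + (i + 0.5) * h)
               for i in range(steps)) * h
    return F(b) - F(a) + rho(b) * f(b) - rho(a) * f(a) - corr

approx = euler_sum(lambda x: x * x, lambda x: 2 * x, lambda x: x**3 / 3, 0.0, 10.0)
assert abs(approx - sum(n * n for n in range(1, 11))) < 1e-3
```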
Corollary 1. The integral

\sigma(x) = \int_0^x \rho(z)\,dz

represents a continuous and periodic function of x with the period 1. For

\sigma(x+1) - \sigma(x) = \int_x^{x+1}\rho(z)\,dz = \int_0^1 \rho(z)\,dz = \int_0^1\left(\tfrac{1}{2} - z\right)dz = 0.

If 0 \leq x \leq 1,

\sigma(x) = \int_0^x\left(\tfrac{1}{2} - z\right)dz = \frac{x(1-x)}{2},

and in general

\sigma(x) = \frac{\theta(1-\theta)}{2},

where \theta is the fractional part of x. Hence, for every real x,

0 \leq \sigma(x) \leq \tfrac{1}{8}.

Supposing that f''(x) exists and is continuous in (a, b) and integrating by parts, we get

\int_a^b \rho(x)f'(x)\,dx = \sigma(b)f'(b) - \sigma(a)f'(a) - \int_a^b \sigma(x)f''(x)\,dx,

which leads to another form of Euler's formula:

\sum_{n>a}^{n \leq b} f(n) = \int_a^b f(x)\,dx + \rho(b)f(b) - \rho(a)f(a) - \sigma(b)f'(b) + \sigma(a)f'(a) + \int_a^b \sigma(x)f''(x)\,dx.

Corollary 2. If f(x) is defined for all x \geq a and possesses a continuous derivative throughout the interval (a, +\infty); if, besides, the integral

\int_a^\infty \rho(x)f'(x)\,dx

exists, then for a variable limit b we have

(2) \qquad \sum_{n>a}^{n \leq b} f(n) = C + \int f(b)\,db + \rho(b)f(b) + \int_b^\infty \rho(x)f'(x)\,dx,

where C is a constant with respect to b. It suffices to substitute for

\int_a^b \rho(x)f'(x)\,dx

the difference

\int_a^\infty \rho(x)f'(x)\,dx - \int_b^\infty \rho(x)f'(x)\,dx

and separate the terms depending upon b from those involving a.

2. Stirling's Formula.
Factorials increase with extreme rapidity and their exact computation soon becomes practically impossible. The question then naturally arises of finding a convenient approximate expression for large factorials, which question is answered by a celebrated formula usually known as "Stirling's formula," although, in the main, it was established by de Moivre in connection with problems on probability. De Moivre did not establish the relation of the constant involved in his formula to the number \pi = 3.14159\ldots; it was done by Stirling.

In formula (2) it suffices to take a = \tfrac{1}{2}, f(x) = \log x, and replace b by an arbitrary integer n to arrive at the remarkable expression

\log(1 \cdot 2 \cdot 3 \cdots n) = C + \left(n + \tfrac{1}{2}\right)\log n - n + \int_n^\infty \frac{\rho(x)}{x}\,dx,

where C is a constant. For the sake of brevity we shall set

\omega(n) = \int_n^\infty \frac{\rho(x)}{x}\,dx.

Now

\int_n^\infty \frac{\rho(x)}{x}\,dx = \int_n^{n+1}\frac{\rho(x)}{x}\,dx + \int_{n+1}^{n+2}\frac{\rho(x)}{x}\,dx + \cdots

and

\int_k^{k+1}\frac{\rho(x)}{x}\,dx = \int_0^1\frac{\rho(u)\,du}{u+k} = \int_0^{\frac{1}{2}}\frac{\left(\tfrac{1}{2}-u\right)du}{u+k} + \int_{\frac{1}{2}}^1\frac{\left(\tfrac{1}{2}-u\right)du}{u+k} = \frac{1}{2}\int_0^{\frac{1}{2}}\frac{(1-2u)^2\,du}{(k+u)(k+1-u)}.

Hence

\omega(n) = \frac{1}{2}\int_0^{\frac{1}{2}}(1-2u)^2 F(u)\,du,

where

F(u) = \sum_{k=n}^{\infty}\frac{1}{(k+u)(k+1-u)}.
Since

(k+u)(k+1-u) = \left(k+\tfrac{1}{2}\right)^2 - \left(\tfrac{1}{2}-u\right)^2,

it follows that for 0 < u < \tfrac{1}{2}

k(k+1) < (k+u)(k+1-u) < \left(k+\tfrac{1}{2}\right)^2.

Thus for 0 < u < \tfrac{1}{2}

\sum_{k=n}^{\infty}\frac{1}{\left(k+\tfrac{1}{2}\right)^2} < F(u) < \sum_{k=n}^{\infty}\frac{1}{k(k+1)} = \frac{1}{n},

and, since

\sum_{k=n}^{\infty}\frac{1}{\left(k+\tfrac{1}{2}\right)^2} > \int_n^\infty\frac{dx}{\left(x+\tfrac{1}{2}\right)^2} = \frac{1}{n+\tfrac{1}{2}},

we obtain, observing that \int_0^{\frac{1}{2}}(1-2u)^2\,du = \tfrac{1}{6},

\frac{1}{12n+6} < \omega(n) < \frac{1}{12n}.

Consequently

1 \cdot 2 \cdot 3 \cdots n = e^C\sqrt{n}\left(\frac{n}{e}\right)^n e^{\omega(n)}.

To determine the constant e^C, consider the integrals

J_m = \int_0^{\frac{\pi}{2}}\sin^m x\,dx.

Since

\int_0^{\frac{\pi}{2}}\left(X\sin^{\frac{m}{2}}x + \sin^{\frac{m}{2}+1}x\right)^2 dx = J_m X^2 + 2J_{m+1}X + J_{m+2} > 0

for all real X, the roots of the polynomial in the left member are imaginary, and this implies

J_{m+1}^2 \leq J_m J_{m+2}.

Taking m = 2m and m = 2m + 1 in the classical expressions

J_{2m} = \frac{1 \cdot 3 \cdot 5 \cdots (2m-1)}{2 \cdot 4 \cdot 6 \cdots 2m} \cdot \frac{\pi}{2}, \qquad J_{2m+1} = \frac{2 \cdot 4 \cdot 6 \cdots 2m}{3 \cdot 5 \cdots (2m+1)},

one arrives at Wallis's formula; comparison with the preceding expression for n! then gives e^C = \sqrt{2\pi}. Hence Stirling's formula:

1 \cdot 2 \cdot 3 \cdots n = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n e^{\omega(n)}, \qquad \frac{1}{12n+6} < \omega(n) < \frac{1}{12n}.
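The two-sided bound on \omega(n) is sharp enough to check directly against exact factorials. A sketch of our own (names ours):

```python
from math import factorial, sqrt, pi, exp, log

# Stirling's formula with the bound on the correction term:
# n! = sqrt(2 pi n) (n/e)^n e^{omega(n)},  1/(12n+6) < omega(n) < 1/(12n)
for n in (1, 2, 5, 10, 20, 50):
    base = sqrt(2 * pi * n) * (n / exp(1))**n
    omega = log(factorial(n)) - log(base)
    assert 1 / (12 * n + 6) < omega < 1 / (12 * n)
```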
APPENDIX II

METHOD OF MOMENTS AND ITS APPLICATIONS

1. Introductory Remarks. To prove the fundamental limit theorem Tshebysheff devised an ingenious method, known as the "method of moments," which later was completed and simplified by one of the most prominent among Tshebysheff's disciples, the late Markoff. The simplicity and elegance inherent in this method of moments make it advisable to present in this Appendix a brief exposition of it.

The distribution of a mass spread over a given interval (a, b) may be characterized by a never decreasing function \varphi(x), defined in (a, b) and varying from \varphi(a) = 0 to \varphi(b) = m_0, where m_0 is the total mass contained in (a, b). Since \varphi(x) is never decreasing, for any particular point x_0 both the limits

\lim \varphi(x_0 - \epsilon) = \varphi(x_0 - 0), \qquad \lim \varphi(x_0 + \epsilon) = \varphi(x_0 + 0)

exist when the positive number \epsilon tends to 0. Evidently

\varphi(x_0 - 0) \leq \varphi(x_0) \leq \varphi(x_0 + 0).

If

\varphi(x_0 - 0) = \varphi(x_0 + 0) = \varphi(x_0),

then x_0 is a "point of continuity" of \varphi(x). In case

\varphi(x_0 + 0) > \varphi(x_0 - 0),

x_0 is a point of discontinuity of \varphi(x), and the positive difference

\varphi(x_0 + 0) - \varphi(x_0 - 0)

may be considered as a mass concentrated at the point x_0. In all cases \varphi(x_0 - 0) is the total mass on the segment (a, x_0) excluding the end point x_0, whereas \varphi(x_0 + 0) is the mass spread over the same segment including the point x_0.

The points of discontinuity, if there are any, form an enumerable set, whence it follows that in any part of the interval (a, b) there are points of continuity. If for any sufficiently small positive \epsilon

\varphi(x_0 + \epsilon) > \varphi(x_0 - \epsilon),

x_0 is called a "point of increase" of \varphi(x). There is at least one point of increase and there might be infinitely many. For instance, if
\varphi(x) = 0 \quad \text{for} \quad a \leq x \leq c,
\varphi(x) = m_0 \quad \text{for} \quad c < x \leq b,

then c is the only point of increase. On the other hand, for

\varphi(x) = m_0\,\frac{x-a}{b-a}

every point of the interval (a, b) is a point of increase. In case of a finite number of points of increase the whole mass is concentrated in these points and the distribution function \varphi(x) is a step function with a finite number of steps.

Stieltjes' integrals

\int_a^b d\varphi(x) = m_0, \qquad \int_a^b x^i\,d\varphi(x) = m_i \quad (i = 1, 2, \ldots)

represent respectively the whole mass m_0 and its moments about the origin of the order 1, 2, \ldots i. When the distribution function \varphi(x) is given, the moments m_0, m_1, m_2, \ldots m_i (provided they exist) are determined. If, however, these moments are given and are known to originate in a certain distribution of a mass over (a, b), the question may be raised with what error the mass spread over an interval (a, x) can be determined by these data. In other words, given m_0, m_1, m_2, \ldots m_i, what are the precise upper and lower bounds of a mass spread over an interval (a, x)?

Such is the question raised by Tshebysheff in a short but important article "Sur les valeurs limites des intégrales" (1874).¹ The results contained in this article, including very remarkable inequalities which indeed are of fundamental importance, are given without proof. The first proof of these results and the complete solution of the question raised by Tshebysheff was given by Markoff in his eminent thesis "On some applications of algebraic continued fractions" (St. Petersburg, 1884), written in Russian and therefore comparatively little known.

Suppose that \rho_i is the limit of the error with which we can evaluate the mass belonging to the interval (a, x) or, which is almost the same, the value of \varphi(x), when the moments m_0, m_1, m_2, \ldots m_i are given. If, with i tending to infinity, \rho_i tends to 0 for any given x, then the distribution function \varphi(x) will be completely determined by giving all the moments. One case of this kind is that in which

m_0 = 1, \qquad m_{2k} = \frac{1 \cdot 3 \cdot 5 \cdots (2k-1)}{2^k}.

¹ Jour. Liouville, Ser. 2, T. XIX, 1874.

\ldots Since

\frac{P_i(q_{i+1} - \phi_{i+1}(z)) - P_{i-1}}{Q_i(q_{i+1} - \phi_{i+1}(z)) - Q_{i-1}} = \frac{P_{i+1} - P_i\phi_{i+1}(z)}{Q_{i+1} - Q_i\phi_{i+1}(z)},

we can write

\phi(z) = \frac{P_{i+1} - P_i\phi_{i+1}(z)}{Q_{i+1} - Q_i\phi_{i+1}(z)}

in the sense that the formal development of the right-hand member is identical with \phi(z).
By virtue of relation (3),

\phi(z) - \frac{P_i}{Q_i} = \frac{1}{Q_i(Q_{i+1} - Q_i\phi_{i+1})}.

The degree of Q_i being \lambda_i and that of Q_{i+1} being \lambda_{i+1}, the expansion of Q_i(Q_{i+1} - Q_i\phi_{i+1}) in a series of descending powers of z begins with the power z^{\lambda_i + \lambda_{i+1}}. Hence,

\phi(z) - \frac{P_i}{Q_i} = \frac{M}{z^{\lambda_i + \lambda_{i+1}}} + \cdots,

and, since \lambda_{i+1} \geq \lambda_i + 1, the expansion of

\phi(z) - \frac{P_i}{Q_i}

begins with a term of the order 2\lambda_i + 1 in 1/z at least. This property characterizes the convergents P_i/Q_i completely. For let P/Q be a rational fraction whose denominator is of the nth degree and such that in the expansion of

\phi(z) - \frac{P}{Q}

the lowest term is of the order 2n + 1 in 1/z at least. Then P/Q coincides with one of the convergents to the continued fraction (1). Let i be determined by the condition

\lambda_i \leq n < \lambda_{i+1}.

Then

\phi(z) - \frac{P_i}{Q_i} = \frac{M}{z^{\lambda_i + \lambda_{i+1}}} + \cdots, \qquad \phi(z) - \frac{P}{Q} = \frac{N}{z^{2n+1}} + \cdots.

Hence, the degree of PQ_i - P_iQ in z is not greater than both the numbers

\lambda_i - n - 1 \qquad \text{and} \qquad n + \lambda_i - \lambda_{i+1},

which are both negative, while PQ_i - P_iQ is a polynomial. Hence, identically,

PQ_i - P_iQ = 0

or

\frac{P}{Q} = \frac{P_i}{Q_i},

which proves the statement.

3. Continued Fraction Associated with
\int_a^b \frac{d\varphi(x)}{z-x}. Let \varphi(x) be a never decreasing function characterizing the distribution of a mass over an interval (a, b). The moments of this distribution up to the moment of the order 2n are represented by the integrals

m_0 = \int_a^b d\varphi(x), \quad m_1 = \int_a^b x\,d\varphi(x), \quad m_2 = \int_a^b x^2\,d\varphi(x), \quad \ldots \quad m_{2n} = \int_a^b x^{2n}\,d\varphi(x).
Let

\Delta_s = \begin{vmatrix} m_0 & m_1 & \cdots & m_s \\ m_1 & m_2 & \cdots & m_{s+1} \\ \cdots & \cdots & \cdots & \cdots \\ m_s & m_{s+1} & \cdots & m_{2s} \end{vmatrix} \qquad (s = 0, 1, \ldots n).

If \varphi(x) has not less than n + 1 points of increase, we must have

\Delta_0 > 0, \quad \Delta_1 > 0, \quad \ldots \quad \Delta_n > 0,

and conversely, if these inequalities are satisfied, \varphi(x) has at least n + 1 points of increase. To prove this, consider the quadratic form in n + 1 variables t_0, t_1, \ldots t_n:

\Phi = \sum m_{i+j}t_it_j \qquad (i, j = 0, 1, 2, \ldots n),
so that \Delta_n is the determinant of \Phi and \Delta_0, \Delta_1, \ldots \Delta_{n-1} are its principal minors. Evidently

\Phi = \int_a^b (t_0 + t_1x + \cdots + t_nx^n)^2\,d\varphi(x).

The form \Phi cannot vanish unless t_0 = t_1 = \cdots = t_n = 0. For if x = \xi is a point of increase and \Phi = 0, we must have also

\int_{\xi-\epsilon}^{\xi+\epsilon}(t_0 + t_1x + \cdots + t_nx^n)^2\,d\varphi(x) = 0

for an arbitrary positive \epsilon, whence, by the mean value theorem,

(t_0 + t_1\eta + \cdots + t_n\eta^n)^2\int_{\xi-\epsilon}^{\xi+\epsilon}d\varphi(x) = 0 \qquad (\xi - \epsilon < \eta < \xi + \epsilon)

or

t_0 + t_1\eta + \cdots + t_n\eta^n = 0,

because

\int_{\xi-\epsilon}^{\xi+\epsilon}d\varphi(x) > 0.

Letting \epsilon converge to 0, we conclude

t_0 + t_1\xi + \cdots + t_n\xi^n = 0

at any point of increase. Since there are at least n + 1 points of increase, the equation

t_0 + t_1x + \cdots + t_nx^n = 0

would have at least n + 1 roots, and that necessitates

t_0 = t_1 = \cdots = t_n = 0.
Hence, the quadratic form \Phi, which is never negative, can vanish only if all its variables vanish; that is, \Phi is a definite positive form. Its determinant \Delta_n and all its principal minors \Delta_{n-1}, \Delta_{n-2}, \ldots \Delta_0 must be positive, which proves the first statement.

Suppose the conditions

\Delta_0 > 0, \quad \Delta_1 > 0, \quad \ldots \quad \Delta_n > 0

satisfied, and let \varphi(x) have s < n + 1 points of increase. Then the integral representing \Phi reduces to a finite sum

\Phi = p_1(t_0 + t_1\xi_1 + \cdots + t_n\xi_1^n)^2 + p_2(t_0 + t_1\xi_2 + \cdots + t_n\xi_2^n)^2 + \cdots + p_s(t_0 + t_1\xi_s + \cdots + t_n\xi_s^n)^2,

denoting by p_1, p_2, \ldots p_s the masses concentrated in the s points of increase \xi_1, \xi_2, \ldots \xi_s. Now, since s \leq n, constants t_0, t_1, \ldots t_n, not all zero, can be determined by the system of equations

t_0 + t_1\xi_1 + \cdots + t_n\xi_1^n = 0
t_0 + t_1\xi_2 + \cdots + t_n\xi_2^n = 0
\cdots
t_0 + t_1\xi_s + \cdots + t_n\xi_s^n = 0.

Thus \Phi vanishes when not all variables vanish; hence, its determinant \Delta_n = 0, contrary to hypothesis.
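The positivity of the determinants \Delta_s can be observed on a concrete example. The sketch below (our own; it uses exact rational arithmetic) takes the moments m_k = 1/(k+1) of the uniform distribution on (0, 1), which has infinitely many points of increase, so every \Delta_s must be positive:

```python
from fractions import Fraction

def det(M):
    # exact determinant by Gaussian elimination over the rationals
    M = [row[:] for row in M]
    n, sign, d = len(M), 1, Fraction(1)
    for i in range(n):
        p = next((r for r in range(i, n) if M[r][i] != 0), None)
        if p is None:
            return Fraction(0)
        if p != i:
            M[i], M[p] = M[p], M[i]
            sign = -sign
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n):
                M[r][c] -= f * M[i][c]
    for i in range(n):
        d *= M[i][i]
    return sign * d

m = [Fraction(1, k + 1) for k in range(11)]     # moments of uniform(0,1)
for s in range(6):
    H = [[m[i + j] for j in range(s + 1)] for i in range(s + 1)]
    assert det(H) > 0
```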
From now on we shall assume that \varphi(x) has at least n + 1 points of increase. The integral

\int_a^b \frac{d\varphi(x)}{z-x}

can be expanded into a formal power series of 1/z, thus

\int_a^b \frac{d\varphi(x)}{z-x} = \frac{m_0}{z} + \frac{m_1}{z^2} + \frac{m_2}{z^3} + \cdots + \frac{m_{2n}}{z^{2n+1}} + \cdots,

and this power series can be converted into a continued fraction as explained in Sec. 2. Let

\frac{P_1}{Q_1}, \frac{P_2}{Q_2}, \ldots \frac{P_n}{Q_n}, \frac{P_{n+1}}{Q_{n+1}}

be the first n + 1 convergents to that continued fraction. I say that the degrees of their denominators are, respectively, 1, 2, 3, \ldots n + 1. Since these degrees form an increasing sequence, it suffices to show that there exists a convergent with the denominator of a given degree s \leq n + 1. This convergent P/Q is completely determined by the condition that in the formal expansion of the difference
\int_a^b \frac{d\varphi(x)}{z-x} - \frac{P}{Q}

into a power series of 1/z, the terms involving 1/z, 1/z^2, \ldots 1/z^{2s} are absent. This is the same as to say that in the expansion of

\int_a^b \frac{d\varphi(x)}{z-x} - \frac{P(z)}{Q(z)}

there are no terms involving 1/z, 1/z^2, \ldots 1/z^{2s}. The preceding expression can be written thus:

\frac{1}{Q(z)}\int_a^b \frac{Q(x)\,d\varphi(x)}{z-x} + \frac{1}{Q(z)}\left[\int_a^b \frac{Q(z)-Q(x)}{z-x}\,d\varphi(x) - P(z)\right].

Since

\int_a^b \frac{Q(z)-Q(x)}{z-x}\,d\varphi(x) - P(z)

is a polynomial in z, it must vanish identically. That gives

(4) \qquad P(z) = \int_a^b \frac{Q(z)-Q(x)}{z-x}\,d\varphi(x).

To determine Q(z) we must express the conditions that in the expansion of

\int_a^b \frac{Q(x)\,d\varphi(x)}{z-x}

the terms in 1/z, 1/z^2, \ldots 1/z^s vanish. These conditions are equivalent to the s relations

(5) \qquad \int_a^b x^kQ(x)\,d\varphi(x) = 0 \qquad (k = 0, 1, \ldots s-1),

which in turn amount to the single requirement that

(6) \qquad \int_a^b \theta(x)Q(x)\,d\varphi(x) = 0

for an arbitrary polynomial \theta(x) of degree \leq s - 1. Conversely, if there exists a polynomial Q(z) of degree s satisfying conditions (5), and P(z) is determined by equation (4), then P(z)/Q(z) is a convergent whose denominator is of degree s. For then the expansion of

\int_a^b \frac{d\varphi(x)}{z-x} - \frac{P(z)}{Q(z)}

lacks the terms in 1/z, 1/z^2, \ldots 1/z^{2s}.

Let

Q(z) = l_0 + l_1z + l_2z^2 + \cdots + l_{s-1}z^{s-1} + z^s.

Then equations (5) become

m_0l_0 + m_1l_1 + m_2l_2 + \cdots + m_{s-1}l_{s-1} + m_s = 0
m_1l_0 + m_2l_1 + m_3l_2 + \cdots + m_sl_{s-1} + m_{s+1} = 0
\cdots
m_{s-1}l_0 + m_sl_1 + m_{s+1}l_2 + \cdots + m_{2s-2}l_{s-1} + m_{2s-1} = 0.
This system of linear equations determines completely the coefficients l_0, l_1, \ldots l_{s-1}, since its determinant \Delta_{s-1} > 0.

The existence of a convergent with the denominator of degree s \leq n + 1 being established, it follows that the denominator of the sth convergent P_s/Q_s is exactly of degree s. The denominator Q_s is determined, except for a constant factor, and can be presented in the form:

Q_s = \frac{C}{\Delta_{s-1}}\begin{vmatrix} 1 & z & z^2 & \cdots & z^s \\ m_0 & m_1 & m_2 & \cdots & m_s \\ m_1 & m_2 & m_3 & \cdots & m_{s+1} \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ m_{s-1} & m_s & m_{s+1} & \cdots & m_{2s-1} \end{vmatrix}.

A remarkable result follows from equation (6); namely, if s' < s \leq n,

(7) \qquad \int_a^b Q_sQ_{s'}\,d\varphi(x) = 0,

while

\int_a^b Q_s^2\,d\varphi(x) > 0 \qquad (s \leq n).

In the general relation

Q_s = q_sQ_{s-1} - Q_{s-2}

the polynomial q_s must be of the first degree,

q_s = \alpha_s z + \beta_s,

which shows that the continued fraction associated with

\int_a^b \frac{d\varphi(x)}{z-x}
has the form

\cfrac{1}{\alpha_1z+\beta_1 - \cfrac{1}{\alpha_2z+\beta_2 - \cfrac{1}{\alpha_3z+\beta_3 - \cdots}}}

The next question is how to determine the constants \alpha_s and \beta_s. Multiplying both members of the equation

Q_s = (\alpha_sz+\beta_s)Q_{s-1} - Q_{s-2} \qquad (s \geq 2)

by Q_{s-2}\,d\varphi(z), integrating between limits a and b, and taking into account (7), we get

0 = \alpha_s\int_a^b zQ_{s-1}Q_{s-2}\,d\varphi(z) - \int_a^b Q_{s-2}^2\,d\varphi(z).

On the other hand, comparing the highest terms in Q_{s-1} and zQ_{s-2}, we have

zQ_{s-2} = \frac{1}{\alpha_{s-1}}Q_{s-1} + \psi,

where \psi is a polynomial of degree \leq s - 2. Referring to equation (6), we have

\int_a^b zQ_{s-1}Q_{s-2}\,d\varphi(z) = \frac{1}{\alpha_{s-1}}\int_a^b Q_{s-1}^2\,d\varphi(z),

and consequently

(8) \qquad \frac{\alpha_s}{\alpha_{s-1}}\int_a^b Q_{s-1}^2\,d\varphi(z) = \int_a^b Q_{s-2}^2\,d\varphi(z).

Suppose that the following moments are given: m_0, m_1, \ldots m_{2n}; how many of the coefficients \alpha_s can be found? Evidently \alpha_1 = 1/m_0. Furthermore, Q_0 = 1 and Q_1 is completely determined given m_0 and m_1. Relation (8) determines \alpha_2, and Q_2 will be completely determined given m_0, m_1, m_2, m_3. The same relation again determines \alpha_3, and Q_3 will be determined given m_0, m_1, \ldots m_5. Proceeding in the same way, we conclude that, given m_0, m_1, m_2, \ldots m_{2n}, all the polynomials

Q_1, Q_2, \ldots Q_n

as well as the constants \alpha_1, \alpha_2, \ldots \alpha_{n+1} can be determined. It is important to note that all these constants are positive.

Proceeding in a similar manner, the following expression can be found:

\beta_s = -\alpha_s\,\frac{\int_a^b zQ_{s-1}^2\,d\varphi(z)}{\int_a^b Q_{s-1}^2\,d\varphi(z)}.

It follows that the constants \beta_1, \beta_2, \ldots \beta_n are determined by our data, but not \beta_{n+1}. For if s = n + 1, the integral

\int_a^b zQ_n^2\,d\varphi(z)

can be expressed as a linear function of m_0, m_1, \ldots m_{2n+1} with known coefficients. But m_{2n+1} is not included among our data; hence, \beta_{n+1} cannot be determined.
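The construction of the denominators Q_s from the linear system (5), and their orthogonality (7), can be carried out exactly for the moments m_k = 1/(k+1) of the uniform distribution on (0, 1). A sketch of our own (helper names ours; exact rational arithmetic throughout):

```python
from fractions import Fraction

def solve(A, b):
    # exact Gauss-Jordan elimination for the small linear system (5)
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for i in range(n):
        p = next(r for r in range(i, n) if M[r][i] != 0)
        M[i], M[p] = M[p], M[i]
        for r in range(n):
            if r != i and M[r][i] != 0:
                f = M[r][i] / M[i][i]
                for c in range(i, n + 1):
                    M[r][c] -= f * M[i][c]
    return [M[i][n] / M[i][i] for i in range(n)]

m = [Fraction(1, k + 1) for k in range(12)]     # moments of uniform(0,1)

def Q(s):
    # monic denominator of degree s: coefficients l_0 .. l_{s-1} from system (5)
    if s == 0:
        return [Fraction(1)]
    A = [[m[i + j] for j in range(s)] for i in range(s)]
    b = [-m[i + s] for i in range(s)]
    return solve(A, b) + [Fraction(1)]

def integrate(p, q):
    # integral of p(x) q(x) d phi(x), expressed through the moments
    return sum(p[i] * q[j] * m[i + j] for i in range(len(p)) for j in range(len(q)))

# orthogonality (7): integral of Q_s Q_r vanishes for r < s, while that of Q_s^2 is positive
for s in range(1, 5):
    for r in range(s):
        assert integrate(Q(s), Q(r)) == 0
    assert integrate(Q(s), Q(s)) > 0
```

The polynomials produced this way are, up to normalization, the Legendre polynomials shifted to (0, 1); for s = 1 one gets Q_1 = x - 1/2.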
4. Properties of the Polynomials Q_s. Theorem. The roots of the equation

Q_s(z) = 0 \qquad (s \leq n)

are real, simple, and contained within the interval (a, b).

Proof. Let Q_s(z) change its sign r < s times when z passes through the points z_1, z_2, \ldots z_r contained strictly within (a, b). Setting

\theta(z) = (z-z_1)(z-z_2)\cdots(z-z_r),

the product \theta(z)Q_s(z) does not change its sign when z increases from a to b. However,

\int_a^b \theta(z)Q_s(z)\,d\varphi(z) = 0,

and this necessitates that \theta(z)Q_s(z), or Q_s(z), vanishes in all points of increase of \varphi(z). But this is impossible, since by hypothesis there are at least n + 1 points of increase, whereas the degree s of Q_s does not exceed n. Consequently, Q_s(z) changes its sign in the interval (a, b) exactly s times and has all its roots real, simple, and located within (a, b).

It follows from this theorem that the convergent

\frac{P_n}{Q_n}
can be resolved into a sum of simple fractions as follows:

(9) \qquad \frac{P_n(z)}{Q_n(z)} = \frac{A_1}{z-z_1} + \frac{A_2}{z-z_2} + \cdots + \frac{A_n}{z-z_n},

where z_1, z_2, \ldots z_n are the roots of the equation

Q_n(z) = 0

and in general

A_k = \frac{P_n(z_k)}{Q_n'(z_k)}.

The right member of (9) can be expanded into a power series of 1/z, the coefficient of 1/z^{m+1} being

\sum_{k=1}^n A_kz_k^m.

By the property of convergents we must have the following equations:

\sum_{k=1}^n A_k = m_0, \qquad \sum_{k=1}^n A_kz_k = m_1, \qquad \ldots \qquad \sum_{k=1}^n A_kz_k^{2n-1} = m_{2n-1}.

These equations can be condensed into one,

(10) \qquad \sum_{k=1}^n A_kT(z_k) = \int_a^b T(z)\,d\varphi(z),

which should hold for any polynomial T(z) of degree \leq 2n - 1. Let us take for T(z) a polynomial of degree 2n - 2:

T(z) = \left[\frac{Q_n(z)}{(z-z_k)Q_n'(z_k)}\right]^2.

Then

T(z_k) = 1, \qquad T(z_j) = 0 \quad (j \neq k),

and consequently, by virtue of equation (10),

A_k = \int_a^b\left[\frac{Q_n(z)}{(z-z_k)Q_n'(z_k)}\right]^2 d\varphi(z) > 0.

Thus the constants A_1, A_2, \ldots A_n are all positive, which shows that P_n(z_k) has the same sign as Q_n'(z_k). Now in the sequence

Q_n'(z_1), Q_n'(z_2), \ldots Q_n'(z_n)

any two consecutive terms are of opposite signs. The same being true of the sequence

P_n(z_1), P_n(z_2), \ldots P_n(z_n),

it follows that the roots of P_n(z) are all simple, real, and located in the intervals

(z_1, z_2); (z_2, z_3); \ldots (z_{n-1}, z_n).
Theorem. For any real $x$,

$$Q_n'(x)Q_{n-1}(x) - Q_{n-1}'(x)Q_n(x)$$

is a positive number.

Proof. From the relations

$$Q_s(z) = (\alpha_s z + \beta_s)Q_{s-1}(z) - Q_{s-2}(z),$$
$$Q_s(x) = (\alpha_s x + \beta_s)Q_{s-1}(x) - Q_{s-2}(x),$$

it follows that

$$\frac{Q_s(z)Q_{s-1}(x) - Q_s(x)Q_{s-1}(z)}{z - x} = \alpha_s Q_{s-1}(z)Q_{s-1}(x) + \frac{Q_{s-1}(z)Q_{s-2}(x) - Q_{s-1}(x)Q_{s-2}(z)}{z - x},$$

whence, taking $s = 1, 2, 3, \ldots, n$ and adding the results,

$$\frac{Q_n(z)Q_{n-1}(x) - Q_n(x)Q_{n-1}(z)}{z - x} = \sum_{s=1}^{n} \alpha_s Q_{s-1}(z)Q_{s-1}(x).$$

It suffices now to take $z = x$ to arrive at the identity

$$Q_n'(x)Q_{n-1}(x) - Q_{n-1}'(x)Q_n(x) = \sum_{s=1}^{n} \alpha_s Q_{s-1}(x)^2.$$

Since $Q_0 = 1$ and $\alpha_s > 0$, it is evident that

$$Q_n'(x)Q_{n-1}(x) - Q_{n-1}'(x)Q_n(x) > 0$$

for every real $x$.
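The identity can be tested on a concrete recurrence of the required form: the Chebyshev polynomials satisfy $T_1 = x\,T_0$ and $T_s = 2x\,T_{s-1} - T_{s-2}$, i.e. $\alpha_1 = 1$, $\alpha_s = 2$ for $s \ge 2$, and $\beta_s = 0$. A sketch assuming NumPy; the choice of Chebyshev polynomials is ours:

```python
import numpy as np
from numpy.polynomial import Polynomial as P

# Chebyshev polynomials: Q_1 = (1*x + 0)Q_0 and Q_s = (2x + 0)Q_{s-1} - Q_{s-2},
# a recurrence Q_s = (alpha_s x + beta_s)Q_{s-1} - Q_{s-2} with
# alpha_1 = 1, alpha_s = 2 (s >= 2), beta_s = 0.
x = P([0, 1])
Q = [P([1]), x]
alpha = [1.0]
for s in range(2, 7):
    Q.append(2 * x * Q[-1] - Q[-2])
    alpha.append(2.0)

n = 6
lhs = Q[n].deriv() * Q[n - 1] - Q[n - 1].deriv() * Q[n]
rhs = sum(a * Q[s] ** 2 for s, a in enumerate(alpha))   # sum of alpha_s Q_{s-1}^2
for v in (-2.0, -0.3, 0.0, 0.8, 1.7):
    assert abs(lhs(v) - rhs(v)) < 1e-9 * (1 + abs(rhs(v)))
    assert lhs(v) > 0      # positive for every real x, as the theorem states
```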
5. Equivalent Point Distributions. If the whole mass can be concentrated in a finite number of points so as to produce the same $l$ first moments as a given distribution, we have an "equivalent point distribution" in respect to the $l$ first moments. In what follows we shall suppose that the whole mass is spread over the infinite interval $(-\infty, \infty)$ and that the given moments, originating in a distribution with at least $n + 1$ points of increase, are

$$m_0, m_1, \ldots, m_{2n}.$$

The question is: Is it possible to find an equivalent point distribution where the whole mass is concentrated in $n + 1$ points? Let the unknown points be $\xi_1, \xi_2, \ldots, \xi_{n+1}$ and the masses concentrated in them $A_1, A_2, \ldots, A_{n+1}$. Evidently the question will be answered in the affirmative if the system of $2n + 1$ equations

$$(A)\qquad \sum_{\alpha=1}^{n+1} A_\alpha = m_0,\quad \sum_{\alpha=1}^{n+1} A_\alpha \xi_\alpha = m_1,\quad \sum_{\alpha=1}^{n+1} A_\alpha \xi_\alpha^2 = m_2,\quad \ldots,\quad \sum_{\alpha=1}^{n+1} A_\alpha \xi_\alpha^{2n} = m_{2n}$$

can be satisfied by real numbers $\xi_1, \xi_2, \ldots, \xi_{n+1};\ A_1, A_2, \ldots, A_{n+1}$, the last $n + 1$ numbers being positive. The number of unknowns being greater by one unit than the number of equations, we can introduce the additional requirement that one of the numbers $\xi_1, \xi_2, \ldots, \xi_{n+1}$ should be equal to a given real number $v$. The system (A) may be replaced by the single requirement that the equation

$$(11)\qquad \sum_{\alpha=1}^{n+1} A_\alpha T(\xi_\alpha) = \int_{-\infty}^{\infty} T(x)\,d\varphi(x)$$

shall hold for any polynomial $T(x)$ of degree $\le 2n$. Let $Q(x)$ be the polynomial of degree $n + 1$ having the roots $\xi_1, \xi_2, \ldots, \xi_{n+1}$, and let $\theta(x)$ be an arbitrary polynomial of degree $\le n - 1$. Then we can apply equation (11) to

$$T(x) = \theta(x)Q(x).$$
Since $Q(\xi_\alpha) = 0$, we shall have

$$(12)\qquad \int_{-\infty}^{\infty} \theta(x)Q(x)\,d\varphi(x) = 0$$

for an arbitrary polynomial $\theta(x)$ of degree $\le n - 1$. Presently we shall see that requirement (12), together with $Q(v) = 0$, determines $Q(x)$ save for a constant factor if $Q_n(v) \ne 0$. Dividing $Q(x)$ by $Q_n(x)$, we have identically

$$Q(x) = (\lambda x + \mu)Q_n(x) + R_{n-1}(x),$$

where $R_{n-1}(x)$ is a polynomial of degree $\le n - 1$. If $\theta(x)$ is an arbitrary polynomial of degree $\le n - 2$, then $(\lambda x + \mu)\theta(x)$ will be of degree $\le n - 1$. Hence

$$\int_{-\infty}^{\infty} (\lambda x + \mu)\theta(x)Q_n(x)\,d\varphi(x) = 0$$

by (6), and (12) shows that

$$\int_{-\infty}^{\infty} \theta(x)R_{n-1}(x)\,d\varphi(x) = 0$$

for an arbitrary polynomial $\theta(x)$ of degree $\le n - 2$. The last requirement shows that $R_{n-1}(x)$ differs from $Q_{n-1}(x)$ by a constant factor. Since the highest coefficient in $Q(x)$ is arbitrary, we can set

$$R_{n-1}(x) = -Q_{n-1}(x).$$
In the equation

$$Q(x) = (\lambda x + \mu)Q_n(x) - Q_{n-1}(x)$$

it remains to determine the constants $\lambda$ and $\mu$. Multiplying both members by $Q_{n-1}(x)\,d\varphi(x)$ and integrating between $-\infty$ and $\infty$, we get

$$\lambda \int_{-\infty}^{\infty} x\,Q_{n-1}Q_n\,d\varphi(x) = \int_{-\infty}^{\infty} Q_{n-1}^2\,d\varphi(x).$$

But

$$\int_{-\infty}^{\infty} x\,Q_{n-1}Q_n\,d\varphi(x) = \frac{1}{\alpha_n}\int_{-\infty}^{\infty} Q_n^2\,d\varphi(x),\qquad \frac{\displaystyle\int_{-\infty}^{\infty} Q_{n-1}^2\,d\varphi(x)}{\displaystyle\int_{-\infty}^{\infty} Q_n^2\,d\varphi(x)} = \frac{\alpha_{n+1}}{\alpha_n},$$

whence $\lambda = \alpha_{n+1}$. The equation

$$Q(v) = (\alpha_{n+1}v + \mu)Q_n(v) - Q_{n-1}(v) = 0$$

serves to determine $\mu$ if $Q_n(v) \ne 0$. The final expression of $Q(x)$ will be

$$Q(x) = \left(\alpha_{n+1}(x - v) + \frac{Q_{n-1}(v)}{Q_n(v)}\right)Q_n(x) - Q_{n-1}(x).$$
Owing to the recurrence relations

$$Q_2 = (\alpha_2 x + \beta_2)Q_1 - Q_0;\quad Q_3 = (\alpha_3 x + \beta_3)Q_2 - Q_1;\quad \ldots;\quad Q_n = (\alpha_n x + \beta_n)Q_{n-1} - Q_{n-2},$$

it is evident that

$$Q,\ Q_n,\ Q_{n-1},\ \ldots,\ Q_1,\ Q_0 = 1$$

form a Sturm series. For $x = -\infty$ it contains $n + 1$ variations and for $x = +\infty$ only permanences. It follows that the equation

$$Q(x) = 0$$

has exactly $n + 1$ distinct real roots and among them $v$. Thus, if the problem is solvable, the numbers $\xi_1, \xi_2, \ldots, \xi_{n+1}$ are determined as roots of $Q(x) = 0$. Furthermore, all the unknowns $A_\alpha$ will be positive. In fact, from equation (11) it follows that

$$A_\alpha = \int_{-\infty}^{\infty} \left[\frac{Q(x)}{(x - \xi_\alpha)\,Q'(\xi_\alpha)}\right]^2 d\varphi(x) > 0.$$

Now we must show that the constants $A_\alpha$ can actually be determined so as to satisfy equations (A).
To this end let

$$P(x) = \int_{-\infty}^{\infty} \frac{Q(x) - Q(z)}{x - z}\,d\varphi(z) = \left(\alpha_{n+1}(x - v) + \frac{Q_{n-1}(v)}{Q_n(v)}\right)P_n(x) - P_{n-1}(x).$$

Then

$$Q(x)\int_{-\infty}^{\infty} \frac{d\varphi(z)}{x - z} - P(x) = \int_{-\infty}^{\infty} \frac{Q(z)\,d\varphi(z)}{x - z},$$

and, on account of (12), the expansion of the right member into a power series of $1/x$ lacks the terms in $1/x,\ 1/x^2,\ \ldots,\ 1/x^n$. Hence, the expansion of

$$\int_{-\infty}^{\infty} \frac{d\varphi(z)}{x - z} - \frac{P(x)}{Q(x)}$$

lacks the terms in $1/x,\ 1/x^2,\ \ldots,\ 1/x^{2n+1}$; that is,

$$\frac{P(x)}{Q(x)} = \frac{m_0}{x} + \frac{m_1}{x^2} + \cdots + \frac{m_{2n}}{x^{2n+1}} + \cdots.$$
On the other hand, resolving in simple fractions,

$$\frac{P(x)}{Q(x)} = \frac{A_1}{x - \xi_1} + \frac{A_2}{x - \xi_2} + \cdots + \frac{A_{n+1}}{x - \xi_{n+1}}.$$

Expanding the right member into a power series of $1/x$ and comparing with the preceding expansion, we obtain the system (A). By the previous remark all the constants $A_\alpha$ are positive. Thus, there exists a point distribution in which the masses concentrated in $n + 1$ points produce the moments $m_0, m_1, \ldots, m_{2n}$. One of these points, $v$, may be taken arbitrarily, with the condition $Q_n(v) \ne 0$ being observed, however.
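The whole construction can be imitated numerically for the normal weight $e^{-x^2}/\sqrt{\pi}$: conditions (12) and $Q(v) = 0$ determine a monic $Q$, its roots give the points $\xi_\alpha$, and the first $n + 1$ equations of (A) give the masses; the remaining equations and the positivity of the masses then come out of themselves. A sketch assuming NumPy, with $n = 3$ and $v = 0.4$ as arbitrary choices of ours:

```python
import numpy as np

# Moments m_k = (1/sqrt(pi)) * integral of x^k e^{-x^2} dx  (normal weight).
def m(k):
    if k % 2:
        return 0.0
    r = 1.0
    for j in range(1, k, 2):   # (k - 1)!! = 1*3*5*...*(k-1)
        r *= j
    return r / 2 ** (k // 2)

n, v = 3, 0.4    # n + 1 = 4 mass points; the prescribed point v is arbitrary

# Monic Q(x) = x^{n+1} + c_n x^n + ... + c_0 from (12) plus Q(v) = 0.
A = np.zeros((n + 1, n + 1))
b = np.zeros(n + 1)
for i in range(n):                            # orthogonality to x^i, i = 0..n-1
    A[i] = [m(i + j) for j in range(n + 1)]
    b[i] = -m(i + n + 1)
A[n] = [v ** j for j in range(n + 1)]         # the condition Q(v) = 0
b[n] = -v ** (n + 1)
c = np.linalg.solve(A, b)

xi = np.roots(np.r_[1.0, c[::-1]])            # the points xi_1, ..., xi_{n+1}
assert np.all(np.abs(np.imag(xi)) < 1e-9)     # all real, as the Sturm argument shows
xi = np.sort(np.real(xi))

# Masses A_alpha from the first n + 1 equations of system (A) ...
V = np.vander(xi, n + 1, increasing=True).T   # row i holds the powers xi^i
w = np.linalg.solve(V, [m(k) for k in range(n + 1)])
assert np.all(w > 0)                          # all masses positive

# ... and the remaining equations of (A) are then satisfied of themselves.
for k in range(n + 1, 2 * n + 1):
    assert abs(w @ xi ** k - m(k)) < 1e-7
assert np.min(np.abs(xi - v)) < 1e-8          # v is indeed one of the points
```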
6. Tshebysheff's Inequalities. In a note referred to in the introduction Tshebysheff made known certain inequalities of the utmost importance for the theory we are concerned with. The first very ingenious proof of them was given by Markoff in 1884 and, by a remarkable coincidence, the same proof was rediscovered almost at the same time by Stieltjes. A few years later, Stieltjes found another totally different proof; and it is this second proof that we shall follow.

Let $\varphi(x)$ be a distribution function of a mass spread over the interval $(-\infty, \infty)$. Supposing that a moment of the order $i$,

$$\int_{-\infty}^{\infty} x^i\,d\varphi(x) = m_i,$$

exists, we shall show first that

$$\lim\ l^i(m_0 - \varphi(l)) = 0,\qquad \lim\ l^i\varphi(-l) = 0$$

when $l$ tends to $+\infty$. For

$$\int_l^{\infty} x^i\,d\varphi(x) \ge l^i\int_l^{\infty} d\varphi(x) = l^i[\varphi(+\infty) - \varphi(l)],$$

and similarly

$$\int_{-\infty}^{-l} |x|^i\,d\varphi(x) \ge l^i\varphi(-l).$$

Now both integrals converge to 0 as $l$ tends to $+\infty$; whence both statements follow immediately.

Integrating by parts, we have

$$\int_0^l x^i\,d\varphi(x) = l^i[\varphi(l) - m_0] - i\int_0^l [\varphi(x) - m_0]x^{i-1}\,dx,$$
$$\int_{-l}^0 x^i\,d\varphi(x) = (-1)^{i-1}l^i\varphi(-l) - i\int_{-l}^0 x^{i-1}\varphi(x)\,dx,$$

whence, letting $l$ converge to $+\infty$,

$$m_i = -i\int_0^{\infty} [\varphi(x) - m_0]x^{i-1}\,dx - i\int_{-\infty}^0 x^{i-1}\varphi(x)\,dx.$$

If the same mass $m_0$, with the same moment $m_i$, is spread according to the law characterized by the function $\psi(x)$, we shall have

$$m_i = -i\int_0^{\infty} [\psi(x) - m_0]x^{i-1}\,dx - i\int_{-\infty}^0 x^{i-1}\psi(x)\,dx,$$

whence

$$(13)\qquad \int_{-\infty}^{\infty} x^{i-1}[\varphi(x) - \psi(x)]\,dx = 0.$$
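Identity (13) is easy to verify on a concrete pair of distributions sharing $m_0$ and $m_i$: a uniform distribution on $(-\sqrt{3}, \sqrt{3})$ and masses $1/2$ at $\pm 1$ both have $m_0 = 1$ and $m_2 = 1$, so with $i = 2$ the integral of $x[\varphi(x) - \psi(x)]$ must vanish. A sketch assuming NumPy; the pair of distributions is our choice:

```python
import numpy as np

s = np.sqrt(3.0)
x = np.linspace(-s, s, 2_000_001)
phi = (x + s) / (2 * s)             # uniform mass on (-sqrt(3), sqrt(3)): m0 = 1, m2 = 1
psi = np.where(x < -1, 0.0,
               np.where(x < 1, 0.5, 1.0))   # masses 1/2 at -1 and +1: m0 = 1, m2 = 1
f = x * (phi - psi)                 # integrand of (13) with i = 2
h = x[1] - x[0]
val = (f.sum() - 0.5 * (f[0] + f[-1])) * h  # trapezoidal rule; phi = psi outside (-s, s)
assert abs(val) < 1e-5
```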
Suppose the moments $m_0, m_1, \ldots, m_{2n}$ of the distribution characterized by $\varphi(x)$ are known. Provided $\varphi(x)$ has at least $n + 1$ points of increase, there exists an equivalent point distribution. For the normal distribution $\chi_n(v)$ attains its maximum at $v = 0$. Referring to the above expressions of $\chi_{2m}(0)$ and $\chi_{2m+1}(0)$, we find that

$$\chi_{2m}(0) = \frac{2\cdot4\cdot6\cdots2m}{3\cdot5\cdot7\cdots(2m+1)},\qquad \chi_{2m+1}(0) = \frac{2\cdot4\cdot6\cdots2m}{3\cdot5\cdot7\cdots(2m+1)}.$$

In Appendix I, page 354, we find the inequality

$$\frac{2\cdot4\cdot6\cdots2m}{3\cdot5\cdots(2m-1)}\cdot\frac{1}{\sqrt{4m+2}} < \frac{\sqrt{\pi}}{2},$$

whence

$$\frac{2\cdot4\cdot6\cdots2m}{3\cdot5\cdot7\cdots(2m+1)} < \frac{\sqrt{\pi}}{\sqrt{4m+2}}.$$
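Both the last inequality and the resulting fact that $\chi_{2m}(0)$ tends to 0 as $m$ grows are easy to check numerically (standard library only):

```python
import math

def chi0(m):
    # 2*4*6*...*2m divided by 3*5*7*...*(2m+1)
    r = 1.0
    for j in range(1, m + 1):
        r *= 2 * j / (2 * j + 1)
    return r

for m in (1, 2, 5, 25, 100, 1000):
    assert chi0(m) < math.sqrt(math.pi) / math.sqrt(4 * m + 2)
assert chi0(1000) < 0.03   # the bound forces chi_{2m}(0) -> 0
```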
Suppose that each one of them, when $\lambda \to \infty$, tends to a certain limit (for instance, to the corresponding moment of the normal distribution). One can foresee that under such circumstances $F(x, \lambda)$ will tend to

$$\frac{1}{\sqrt{\pi}}\int_{-\infty}^{x} e^{-u^2}\,du.$$

In fact, the following fundamental theorem holds:
Fundamental Theorem. If, for a variable distribution characterized by the function $F(x, \lambda)$,

$$\lim_{\lambda \to \infty} \int_{-\infty}^{\infty} x^k\,dF(x, \lambda) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} e^{-x^2}x^k\,dx;\qquad k = 0, 1, 2, 3, \ldots,$$

then

$$\lim_{\lambda \to \infty} F(v, \lambda) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{v} e^{-x^2}\,dx$$

uniformly in $v$.

Proof. Let $m_0, m_1, \ldots, m_{2n}$ be $2n + 1$ moments corresponding to a normal distribution. They allow formation of the polynomials $Q_0(x), Q_1(x), \ldots, Q_n(x)$ and of the functions designated in Sec. 6 by $Q(x)$ and $\psi(x)$. Similar entities corresponding to the variable distribution will be specified by an asterisk. Since

$$m_k^* \to m_k \qquad (k = 0, 1, \ldots, 2n)$$

as $\lambda \to \infty$, and since $\Delta_n > 0$, we shall have

$$\Delta_n^* > 0$$

for sufficiently large $\lambda$. Then $F(x, \lambda)$ will have not less than $n + 1$ points of increase and the whole theory can be applied to the variable distribution. In particular, we shall have

$$(20)\qquad 0 \le \varphi(v) - \psi(v - 0) \le \chi_n(v);\qquad 0 \le F(v, \lambda) - \psi^*(v - 0) \le \chi_n^*(v).$$
(20) 0 ;;! F(v, >..) - 1/l*(v - 0) ;;! x!(v). Now
Q �(x) (s
=
0,
1,
2, ... n)
and
Q*(x)
depend rationally upon
385
APPENDIX II
mt(k
=
0,
1,
. 2n);
2,
hence, without any difficulty one can see that
Q!(x) --+ Q,(x); s 0, 1, 2, ... n Q*(x) --+ Q(x) =
as >.. --+ oo ; whence,
x:(v)
--+ x.. (v).
Again
1/t*(v
-
O)
--+
1/t(v
0)
-
as >.. --+ oo.
A few explanations are necessary to prove this.
Suppose $Q_n(v) \ne 0$; then the roots $\xi_1^*, \xi_2^*, \ldots$ of the polynomial $Q^*(x)$ tend to the corresponding roots $\xi_1, \xi_2, \ldots$ of $Q(x)$.
B!+2 Liapounoff's result in regard to generality of conditions surpassed by far what had been established before by Tshebysheff and Markoff, whose proofs were based on the fundamental result derived in the preceding sec Since Liapounoff's conditions do not require the existence of
tion.
moments in an infinite number, it seemed that the method of moments was not powerful enough to establish the limit theorem in such a general form. ·
Nevertheless, by resorting to an ingenious artifice, of which we
made use in Chap. X, Sec. 8, Markoff finally succeeded in proving the limit theorem by the method of moments to the same degree of generality as did Liapounoff.
Markoff's artifice consists in associating with the variable $z_k$ two new variables $x_k$ and $y_k$ defined as follows: Let $N$ be a positive number which in the course of the proof will be selected so as to tend to infinity together with $n$. Then

$$x_k = z_k,\quad y_k = 0 \quad\text{if}\quad |z_k| \le N;\qquad x_k = 0,\quad y_k = z_k \quad\text{if}\quad |z_k| > N.$$

Evidently $z_k, x_k, y_k$ are connected by the relation

$$z_k = x_k + y_k,$$

whence, $E(z_k)$ being 0,

$$(21)\qquad E(x_k) + E(y_k) = 0.$$

Moreover,

$$(22)\qquad E|x_k|^{2+\delta} + E|y_k|^{2+\delta} = E|z_k|^{2+\delta} = \mu_k^{(2+\delta)},$$

as one can see immediately from the definition of $x_k$ and $y_k$. Since $x_k$ is bounded, the mathematical expectations $E(x_k^l)$ exist for all integer exponents $l = 1, 2, 3, \ldots$ and for $k = 1, 2, 3, \ldots, n$. In the following we shall use the notations

$$|E(x_k^l)| = c_k^{(l)};\qquad l = 1, 2, 3, \ldots;$$
$$c_1^{(2)} + c_2^{(2)} + \cdots + c_n^{(2)} = B_n';$$
$$\mu_1^{(2+\delta)} + \mu_2^{(2+\delta)} + \cdots + \mu_n^{(2+\delta)} = C_n.$$
Not to obscure the essential steps of the reasoning we shall first establish a few preliminary results.
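The splitting can be written out directly; it depends only on the two branches of the definition, not on the law of $z_k$. A sketch assuming NumPy, with Student-t samples as arbitrary stand-ins for the $z_k$:

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.standard_t(df=5, size=100_000)   # stand-ins for the z_k (E z = 0 in the theory)
N = 3.0
x = np.where(np.abs(z) <= N, z, 0.0)     # x_k = z_k if |z_k| <= N, else 0
y = z - x                                # y_k = z_k if |z_k| >  N, else 0

assert np.array_equal(x + y, z)          # z_k = x_k + y_k, the relation behind (21)
delta = 1.0
# Identity (22) holds pointwise, since one of x_k, y_k is always 0:
assert np.allclose(np.abs(x) ** (2 + delta) + np.abs(y) ** (2 + delta),
                   np.abs(z) ** (2 + delta))
assert np.all(np.abs(x) <= N)            # x_k is bounded, so all E(x_k^l) exist
```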
Lemma 1. Let $q_k$ represent the probability that $y_k \ne 0$. Then

$$q_1 + q_2 + \cdots + q_n \le \frac{C_n}{N^{2+\delta}}.$$

Proof. $y_k \ne 0$ only if $|z_k| > N$. Hence, $\varphi_k(x)$ denoting the distribution function of $z_k$,

$$q_k = \int_{|x| > N} d\varphi_k(x) \le \frac{1}{N^{2+\delta}}\int_{|x| > N} |x|^{2+\delta}\,d\varphi_k(x) \le \frac{\mu_k^{(2+\delta)}}{N^{2+\delta}},$$

and it suffices to take the sum of these inequalities for $k = 1, 2, \ldots, n$. One shows further that

$$\frac{c_1^{(e)} + c_2^{(e)} + \cdots + c_n^{(e)}}{B_n^{e/2}}$$

tends to 0 for all positive integer exponents $e$ except $e = 2$.

10. Let $F_n(t)$ and $\phi_n(t)$ represent, respectively, the probabilities of the inequalities

$$\frac{z_1 + z_2 + \cdots + z_n}{\sqrt{2B_n}} \le t;\qquad \frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}} \le t.$$

Apparently these two sums differ only when at least one $y_k \ne 0$; hence, by Lemma 1,

$$|F_n(t) - \phi_n(t)| \le q_1 + q_2 + \cdots + q_n \le \frac{C_n}{N^{2+\delta}}.$$
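Lemma 1 rests on the Markoff-type bound $P(|z_k| > N) \le \mu_k^{(2+\delta)}/N^{2+\delta}$; an empirical check of the summed inequality (assuming NumPy; the stand-in distributions are our choice):

```python
import numpy as np

rng = np.random.default_rng(2)
delta, N = 1.0, 2.5
samples = [rng.standard_t(df=6, size=100_000) * c for c in (0.5, 1.0, 1.5)]
q_sum = sum((np.abs(z) > N).mean() for z in samples)            # ~ q_1 + ... + q_n
C_n = sum((np.abs(z) ** (2 + delta)).mean() for z in samples)   # ~ C_n
assert q_sum <= C_n / N ** (2 + delta)
```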
Besides,

$$\frac{B_n'}{B_n} \to 1,$$

and, the variables $x_1, x_2, \ldots, x_n$ being independent,

$$E(x_\lambda x_\mu \cdots x_\omega) = E(x_\lambda)E(x_\mu)\cdots E(x_\omega)$$

if not all the subscripts $\lambda, \mu, \ldots, \omega$ are equal. Estimating by means of the preceding lemmas the terms in the multinomial expansion of $E(x_1 + x_2 + \cdots + x_n)^m$, and retaining only those which do not vanish in the limit, one finds, in connection with (23), that for an even $m$

$$\lim\ E\left(\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}}\right)^m = \frac{1\cdot3\cdot5\cdots(m-1)}{2^{m/2}}.$$

Finally, no matter whether the exponent $m$ is odd or even, we have

$$\lim\ E\left(\frac{x_1 + x_2 + \cdots + x_n}{\sqrt{2B_n}}\right)^m = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty} x^m e^{-x^2}\,dx.$$
Tshebysheff-Markoff's fundamental theorem can be applied directly and leads to the result:

$$\lim\ \phi_n(t) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{t} e^{-x^2}\,dx$$

uniformly in $t$. On the other hand, as has been established before,

$$\lim\ [F_n(t) - \phi_n(t)] = 0$$

uniformly in $t$. Hence, finally,

$$\lim\ F_n(t) = \frac{1}{\sqrt{\pi}}\int_{-\infty}^{t} e^{-x^2}\,dx$$

uniformly in $t$.

And this is the fundamental limit theorem with Liapounoff's conditions, now proved by the method of moments. This proof, due to Markoff, is simple enough and of high elegance. However, the preliminary considerations which underlie the proof of the fundamental theorem, though simple and elegant also, are rather long. Nevertheless, we must bear in mind that they are not only useful in connection with the theory of probability, but they have great importance in other fields of analysis.
APPENDIX III

ON A GAUSSIAN PROBLEM

1. In a letter to Laplace dated January 30, 1812,¹ Gauss mentions a difficult problem in probability for which he could not find a perfectly satisfactory solution. We quote from his letter:

I recall, however, a curious problem with which I occupied myself 12 years ago, but which I did not then succeed in solving to my satisfaction. Perhaps you will deign to devote a few moments to it: in that case I am sure that you will find a more complete solution. Here it is: Let M be an unknown quantity between the limits 0 and 1, for which all values are either equally probable or more or less probable according to a given law: suppose it converted into a continued fraction

$$M = \cfrac{1}{a' + \cfrac{1}{a'' + \cdots}}.$$

What is the probability that, in stopping the development at a finite term $a$
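The development Gauss describes is produced by the ordinary continued-fraction algorithm: invert, split off the integer part, and repeat on the remainder, which is precisely the "tail" left when one stops at a finite term. A sketch in exact rational arithmetic (the helper name cf_digits is ours):

```python
from fractions import Fraction

def cf_digits(M, count):
    """Partial quotients a', a'', ... of M = 1/(a' + 1/(a'' + ...)), 0 < M < 1."""
    digits = []
    for _ in range(count):
        if M == 0:
            break
        inv = 1 / M
        a = int(inv)          # the next term of the development
        digits.append(a)
        M = inv - a           # what remains if we stop here: the tail of the fraction
    return digits

assert cf_digits(Fraction(1, 3), 10) == [3]
assert cf_digits(Fraction(7, 16), 10) == [2, 3, 2]   # 7/16 = 1/(2 + 1/(3 + 1/2))
```

With floating-point input the same loop runs into rounding error after a few terms, which is why exact `Fraction` arithmetic is used here.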