MATHEMATICAL METHODS IN SAMPLE SURVEYS

Series on Multivariate Analysis • Vol. 3

Howard G. Tucker
University of California, Irvine

World Scientific
Singapore • New Jersey • London • Hong Kong
SERIES ON MULTIVARIATE ANALYSIS
Editor: M. M. Rao

Published
Vol. 1: Martingales and Stochastic Analysis, by J. Yeh
Vol. 2: Multidimensional Second Order Stochastic Processes, by Y. Kakihara

Forthcoming
Convolution Structures and Stochastic Processes, by R. Lasser
Topics in Circular Statistics, by S. R. Jammalamadaka and A. SenGupta
Abstract Methods in Information Theory, by Y. Kakihara
Published by World Scientific Publishing Co. Pte. Ltd. P O Box 128, Farrer Road, Singapore 912805 USA office: Suite IB, 1060 Main Street, River Edge, NJ 07661 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
Library of Congress Cataloging-in-Publication Data
Tucker, Howard G.
Mathematical methods in sample surveys / Howard G. Tucker.
p. cm. — (Series on multivariate analysis : vol. 3)
Includes bibliographical references (p. - ) and index.
ISBN 9810226179
1. Sampling (Statistics) I. Title. II. Series.
QA276.6.T83 1998
519.5'2--dc21 98-29452 CIP
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
First published 1998 Reprinted 2002
Copyright © 1998 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
Printed in Singapore by Uto-Print
Preface

As the title of this book suggests, it is a textbook about some mathematical methods in sample surveys. It is not about the nuts and bolts of setting up a sample survey, but it does introduce students (or readers) to some basic methodology of doing sample surveys. The mathematics is both elementary and rigorous, and it requires as a prerequisite the satisfactory experience of one or two years of university mathematics courses. It is suitable for a one year junior-senior level course for mathematics and statistics majors; it is also suitable for students in the social sciences who are not handicapped by a fear of proofs in mathematics. It requires no previous knowledge of statistics, and it could actually serve as both an intuitive and mathematically rigorous introduction to statistics. A sizable part of the book covers only those topics in discrete probability that are needed for the sampling methods treated here. Topics in sampling that are covered in depth include simple random sampling with and without replacement, sampling with unequal probabilities, various linear relationships, stratified sampling, cluster sampling and two stage sampling.

There is just enough material included here for a one year undergraduate course, and it has been used as such at the University of California at Irvine for the last twenty years. The first five chapters cover the discrete probability needed for the next six chapters; these can be covered in an academic quarter. It should be pointed out that a usual one quarter course in discrete probability cannot replace what is developed in these five chapters. For one thing, considerable emphasis on working with multivariate discrete densities was needed because of the dependence that arises when the sampling is done without replacement. Also the material on conditional expectation and conditional variance and conditional covariance as random variables is rarely, if at all, treated at the elementary level as it is here. It is this body of results that is so important in developing the material in the sample survey
part of the book and without any handwaving. This is particularly true for Chapters 7 through 11. It should also be stated that there is no fat in Chapters 1 through 5. Indeed, the topics covered in these chapters were not settled upon until the material in Chapters 6 through 11 was finally in place. Indeed, great care was taken to insure that Chapters 1 through 5 contained the minimal amount of material needed for the remaining chapters.

There is no doubt as to the importance of the topics covered in this text for students specializing in statistics and biostatistics. Awareness of them is also important for students in the social sciences and in the various areas of business administration. But I would like to include some comments on the importance of a course based on this text for students majoring in pure mathematics. Except for the unproved central limit theorem in Chapter 5 (which is not invoked in the proofs of any of the results following that chapter), this text can be claimed to be an example of an undergraduate course that teaches utmost mathematical rigor. What is more, the development is a vertical one, and very few of the chapters can be taken out of order. I call everyone's attention to Chapter 4 where results on conditional expectation and conditional variance as random variables are developed. In this chapter conditional expectation is defined as a number and as a random variable. As a random variable, all properties that are usually obtained by a certain amount of measure-theoretic prowess elsewhere are here obtained by rather elementary methods. In addition, in this setting basic results are obtained on conditional variance and conditional covariance which culminate with the Rao-Blackwell theorem.

I have two hopes connected with this text and the course it serves. One hope is that the student who is primarily applications oriented will appreciate and enjoy the mathematical ideas behind the problems of estimation in sample surveys. At the same time I hope that those who are primarily oriented in the direction of pure and abstract mathematics will see that one can keep this orientation and at the same time enjoy how well it touches on real life.

I wish to express my appreciation to Mrs. Mary Moore who did the original LaTeX typesetting for almost all of this document. Professors Mark Finkelstein and Jerry A. Veeh contributed greatly to my entrance
into the age of computer typesetting; indeed, the completion of this document might never have taken place without their help.

This book is dedicated to my wife, Marcia.

Howard G. Tucker
Irvine, California
November 20, 1997
Contents

1  Events and Probability
   1.1  Introduction to Probability
   1.2  Combinatorial Probability
   1.3  The Algebra of Events
   1.4  Probability
   1.5  Conditional Probability

2  Random Variables
   2.1  Random Variables as Functions
   2.2  Densities of Random Variables
   2.3  Some Particular Distributions

3  Expectation
   3.1  Properties of Expectation
   3.2  Moments of Random Variables
   3.3  Covariance and Correlation

4  Conditional Expectation
   4.1  Definition and Properties
   4.2  Conditional Variance

5  Limit Theorems
   5.1  The Law of Large Numbers
   5.2  The Central Limit Theorem

6  Simple Random Sampling
   6.1  The Model
   6.2  Unbiased Estimates for Y and Ȳ
   6.3  Estimation of Sampling Errors
   6.4  Estimation of Proportions
   6.5  Sensitive Questions

7  Unequal Probability Sampling
   7.1  How to Sample
   7.2  WR Probability Proportional to Size Sampling
   7.3  WOR Probability Proportional to Size Sampling

8  Linear Relationships
   8.1  Linear Regression Model
   8.2  Ratio Estimation
   8.3  Unbiased Ratio Estimation
   8.4  Difference Estimation
   8.5  Which Estimate? An Advanced Topic

9  Stratified Sampling
   9.1  The Model and Basic Estimates
   9.2  Allocation of Sample Sizes to Strata

10  Cluster Sampling
   10.1  Unbiased Estimate of the Mean
   10.2  The Variance
   10.3  An Unbiased Estimate of Var(Y)

11  Two-Stage Sampling
   11.1  Two-Stage Sampling
   11.2  Sampling for Non-Response
   11.3  Sampling for Stratification

A  The Normal Distribution

Index
Chapter 1

Events and Probability

1.1  Introduction to Probability
The notion of the probability of an event may be approached by at least three methods. One method, perhaps the first historically, is to repeat an experiment or game (in which a certain event might or might not occur) many times under identical conditions and compute the relative frequency with which the event occurs. This means: divide the total number of times that the specific event occurs by the total number of times the experiment is performed or the game is played. This ratio is called the relative frequency and is really only an approximation of what would be considered as the probability of the event. For example, if one tosses a penny 25 times, and if it comes up heads exactly 13 times, then we would estimate the probability that this particular coin will come up heads when tossed is 13/25 or 0.52. Although this method of arriving at the notion of probability is the most primitive and unsophisticated, it is the most meaningful to the practical individual, in particular, to the working scientist and engineer who have to apply the results of probability theory to real-life situations. Accordingly, whatever results one obtains in the theory of probability and statistics, one should be able to interpret them in terms of relative frequency. A second approach to the notion of probability is from an axiomatic point of view. That is, a minimal list of axioms is set down which assumes certain properties of probabilities. From this minimal set of assumptions
the further properties of probability are deduced and applied.

A third approach to the notion of probability is limited in application but is sufficient for our study of sample surveys. This approach is that of probability in the "equally likely" case. Let us consider some game or experiment which, when played or performed, has among its possible outcomes a certain event E. For example, in tossing a die once, the event E might be: the outcome is an even number. In general, we suppose that the experiment or game has a certain number of mutually exclusive "equally likely" outcomes. Let us further suppose that a certain event E can occur in any one of a specified number of these "equally likely" outcomes. Then the probability of the event is defined to be the number of "equally likely" ways in which the event can occur divided by the total number of possible "equally likely" outcomes. It must be emphasized here that the number of equally likely ways in which the event can occur must be from among the total number of equally likely outcomes. For example, if, as above, the experiment or game is the single toss of a fair die in which the "equally likely" outcomes are the numbers {1,2,3,4,5,6}, and if the event E considered is that the outcome is an even number, i.e., is 2, 4 or 6, then the probability of E here is defined to be 3/6 or 1/2. This approach is limited, as was mentioned above, because in many games and experiments the possible outcomes are not equally likely. The probability model used in this course is the "equally likely" model.

EXERCISES

1. A (possibly loaded) die was tossed 150 times. The number 1 came up 27 times, 2 came up 26 times, 3 came up 24 times, 4 came up 20 times, 5 came up 29 times and 6 came up 24 times.

a) Compute the relative frequency of the event that on the toss of this die the outcome is 1.

b) Find the relative frequency of the event that the outcome is even.

c) Find the relative frequency of the event that the outcome is not less than 5.
2. Twenty numbered tags are in a hat. The number 1 is on 7 of the tags, the number 2 is on 5 of the tags, and the number 3 is on 8 of the tags. The experiment is to stir the tags without looking and to select one tag "at random". a) What are the total number of equally likely outcomes of the experiment? b) From among these 20 equally likely outcomes what is the total number of ways in which the outcome is the number 1? c) Compute the probability of selecting a tag numbered 1. Do the same for 2 and 3. d) What is the sum of the probabilities obtained in (c)?
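Relative frequencies are simple ratios, and readers with a computer can carry the computations out mechanically. The following Python sketch (ours, not part of the text) reproduces the penny computation from this section and the tally-based computations of Exercise 1:

```python
# Relative frequency: occurrences of the event divided by number of trials.

def relative_frequency(occurrences, trials):
    """Approximation of the probability of an event by repeated trials."""
    return occurrences / trials

# The penny of Section 1.1: 13 heads in 25 tosses.
print(relative_frequency(13, 25))        # 0.52

# The die tallies of Exercise 1: counts of each face over 150 tosses.
tallies = {1: 27, 2: 26, 3: 24, 4: 20, 5: 29, 6: 24}
trials = sum(tallies.values())           # 150 tosses in all
even = sum(tallies[f] for f in (2, 4, 6))
print(relative_frequency(even, trials))  # 70/150, roughly 0.467
```

Note that these ratios are only estimates of the underlying probabilities; they would change on a new run of 150 tosses.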
1.2  Combinatorial Probability
We now consider the computation of probabilities in the "equally likely" case. Let us suppose that we have n different objects, and we want to arrange k of these in a row (where, of course, k ≤ n). We wish to know in how many ways this can be accomplished. As an example, suppose there are five members of a committee, call them A, B, C, D, E, and we want to know in how many ways we can select a chairman and a secretary. When we select the arrangement (C, A), we mean that C is the chairman and A is the secretary. In this case n = 5 and k = 2. The different arrangements are listed as follows:

(A,B)  (A,C)  (A,D)  (A,E)
(B,A)  (B,C)  (B,D)  (B,E)
(C,A)  (C,B)  (C,D)  (C,E)
(D,A)  (D,B)  (D,C)  (D,E)
(E,A)  (E,B)  (E,C)  (E,D)
One sees that there are 20 such arrangements. The number 20 can also be obtained by the following reasoning: there are five ways in which the chairman can be selected (which accounts for the five horizontal rows of pairs), and for each chairman selected there are four ways of selecting the secretary (which accounts for the four vertical columns).
Consequently there are 20 such pairs. In general, if we want to determine in how many ways we can arrange k out of n objects, we reason as follows. There are n ways of selecting the first object. For each way we select the first object there are n − 1 ways of selecting the second object. Hence the total number of ways in which the first two objects can be selected is n(n − 1). For every way in which the first two objects are selected there are n − 2 ways of selecting the third object. Thus the number of ways in which the first three objects can be selected is n(n − 1)(n − 2). From this one can easily conclude that the number of ways in which k out of n objects can be laid in a row is n(n − 1)(n − 2) ⋯ (n − (k − 1)), which can be written as the ratio of factorials: n!/(n − k)! (Recall: 5! = 1 × 2 × 3 × 4 × 5.) This is also referred to as the number of permutations of n things taken k at a time.

In the above arrangements (or permutations) of n things taken k at a time, we counted each way in which we could arrange the same k objects in a row. Suppose, however, that one is interested only in the number of ways k objects can be selected out of n objects and is not interested in order or arrangement. In the case of the committee discussed above, the ways in which two members can be selected out of the five to form a subcommittee are as follows:

(A,B)  (A,C)  (A,D)  (A,E)
(B,C)  (B,D)  (B,E)
(C,D)  (C,E)
(D,E)

We do not list (D, B) as before, because the subcommittee denoted by (D, B) is the same as that denoted by (B, D) which is already listed. Thus, now we have only half the number of selections. In general, if we want to find the number of ways in which one can select k objects out of n objects, we reason as follows. As before, there are n!/(n − k)! ways of arranging (or permuting) n objects taken k at a time. However, all k! ways of arranging each k objects are included here. Hence we must divide the n!/(n − k)! ways of arranging k out of n objects by k! to obtain the number of ways in which we can make the k selections. This number of ways in which we can select k objects out of n objects without regard to order is usually referred to as the number of combinations of n objects or things taken k at a time. It is usually denoted by the
binomial coefficient:

    (n choose k) = n!/(k!(n − k)!).

This binomial coefficient is encountered in the binomial theorem, which states:

    (a + b)^n = Σ_{k=0}^{n} (n choose k) a^k b^(n−k),
where 0! is defined to be 1.

Now we apply these two notions to some combinatorial probability problems, i.e., the computation of probabilities in the "equally likely" case. In each problem, the cautious approach is first to determine the number of equally likely outcomes in the game or experiment. Then one computes the number of equally likely ways from among these in which the particular event can occur. Then the ratio of this second number to the first number is computed in order to obtain the probability of the event.

Example 1. The numbers 1, 2, ⋯, n are arranged in random order, i.e., the n! ways in which these numbers can be arranged are assumed to be equally likely. We are to find the probability that the numbers 1 and 2 appear as neighbors with 1 followed by 2. As was mentioned in the problem, there are n! equally likely outcomes. In order to compute the number of these ways in which the indicated event can occur, we reason as follows: there are n − 1 positions permitted for 1; for each position available for 1 there is only one position available for 2, and for every selection of positions for 1 and 2, there are (n − 2)! ways for arranging the remaining n − 2 integers in the remaining n − 2 positions. Consequently, there are (n − 1) · 1 · (n − 2)! ways in which this event can occur, and its probability is

    p = (n − 1) · 1 · (n − 2)!/n! = (n − 1)!/n! = 1/n.
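The counts above can be checked by brute-force enumeration. The following Python sketch (ours, not part of the text) verifies the committee counts of 20 arrangements and 10 subcommittees, and then checks the answer 1/n of Example 1 for n = 5:

```python
from itertools import permutations, combinations
from math import factorial

members = ["A", "B", "C", "D", "E"]

# Arrangements (permutations) of 5 people taken 2 at a time: n!/(n-k)!
arrangements = list(permutations(members, 2))
print(len(arrangements), factorial(5) // factorial(3))   # 20 20

# Selections (combinations), order disregarded: n!/(k!(n-k)!)
subcommittees = list(combinations(members, 2))
print(len(subcommittees))                                # 10

# Example 1 with n = 5: probability that 1 is immediately followed by 2.
n = 5
orders = list(permutations(range(1, n + 1)))
favorable = sum(
    1 for order in orders
    if any(order[i] == 1 and order[i + 1] == 2 for i in range(n - 1))
)
print(favorable / len(orders), 1 / n)                    # 0.2 0.2
```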
Before beginning Example 2, we should explain what is meant by selecting a random digit (or random number). In effect, one takes 10 tags and marks 0 on the first tag, 1 on the second tag, 2 on the third tag, • • •, and 9 on the tenth tag. Then these tags are put into a hat
(or urn). If we say "select n random digits" or "sample n times with replacement", we mean that one selects a tag "at random", notes the number on it and records it, returns it to the container, and repeats this action n − 1 times more.

Example 2. We are to find the probability p that among k random digits neither 0 nor 1 appears. The total number of possible outcomes is obtained as follows. There are 10 possibilities for selecting the first digit. For each way in which the first digit is selected there are 10 ways of selecting the second digit. So there are 10^2 ways of selecting the first two digits. In general, then, the number of ways in which the first k digits can be selected is 10^k. Now we consider the event: neither 0 nor 1 appears. In how many "equally likely" ways from among the 10^k possible outcomes can this event occur? In selecting the k random digits, it is clear that with the first random digit there are eight ways in which it can occur. The same goes for the second, third, and on up to the k-th random digit. Hence, out of the 10^k total possible "equally likely" outcomes there are 8^k outcomes in which this event can occur. Thus p = 8^k/10^k.

Example 3. Now let us determine the probability P that among k random digits the digit zero appears exactly 3 times (where 3 ≤ k). Again, the total number of equally likely outcomes is 10^k. Among the k trials (i.e., k different objects) there are (k choose 3) ways of selecting the 3 trials in which the zeros appear. For each way of selecting the 3 trials in which only zeros occur there are 9^(k−3) ways in which the outcomes of the remaining k − 3 trials can occur. Thus P = (k choose 3) 9^(k−3)/10^k.

Example 4. A box contains 90 white balls and 10 red balls. If 9 balls are selected at random without replacement, what is the probability P that 6 of them are white? In this problem there are (100 choose 9) ways of selecting the 9 balls out of 100. Since there are (90 choose 6) ways of selecting 6 white balls out of 90 white balls, and since for each way one selects 6 white balls there are (10 choose 3) ways of selecting 3 red balls out of the 10 red balls, we see that there
are (90 choose 6)(10 choose 3) ways of getting 6 white balls when we select 9 without replacement. Consequently,

    P = (90 choose 6)(10 choose 3)/(100 choose 9).
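Example 4's ratio of binomial coefficients is easy to evaluate numerically. A short Python sketch (ours, not the text's) computes it and also confirms that the probabilities of getting 0, 1, ..., 9 white balls account for all selections:

```python
from math import comb

# Example 4: 90 white and 10 red balls, draw 9 without replacement;
# probability that exactly 6 of the 9 are white.
p = comb(90, 6) * comb(10, 3) / comb(100, 9)
print(p)   # roughly 0.039

# Sanity check: summing over all possible white-ball counts recovers
# the total number of ways to choose 9 balls from 100.
total = sum(comb(90, w) * comb(10, 9 - w) for w in range(0, 10))
assert total == comb(100, 9)
```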
Example 5. There are n men standing in a row, among whom are two men named A and B. We would like to find the probability P that there are r people between A and B. There are two ways of solving this problem. In the first place there are (n choose 2) ways in which one can select two places for A and B to stand, and among these there are n − r − 1 ways in which one can pick two positions with r positions between them. So P = (n − r − 1)/(n choose 2). Another way of solving this problem is to observe that there are n! ways of arranging the n men, and that among these n! ways there are two ways of selecting one of the men A or B. For each way of selecting one of A or B there are n − r − 1 ways of placing him, and for each way of selecting one of A or B and for each way of placing him there is one way in which the other man can be placed in order that there be r men between them, and there are (n − 2)! ways of arranging the remaining n − 2 men. So

    P = 2(n − r − 1)(n − 2)!/n! = (n − r − 1)/(n choose 2).
EXERCISES

1. An urn contains 4 black balls and 6 white balls. Two balls are selected without replacement. What is the probability that

a) one ball is black and one ball is white?

b) both balls are black?

c) both balls are white?

d) both balls are the same color?

2. In tossing a pair of fair dice what is the probability of throwing a 7 or an 11?
3. Two fair coins are tossed simultaneously. What is the probability that

a) they are both heads?

b) they match?

c) one is heads and one is tails?

4. The numbers 1, 2, ⋯, n are placed in random order in a straight line. Find the probability that

a) the numbers 1, 2, 3 appear as neighbors in the order given, and

b) the numbers 1, 2, 3 appear as neighbors in any order.

5. Among k random digits find the probability that

a) no even digit appears,

b) no digit divisible by 3 appears.

6. Among k random digits (k ≥ 5) find the probability that

a) the digit 1 appears exactly five times,

b) the digit 0 appears exactly two times and the digit 1 appears exactly three times.

7. A box contains 10 white tags and 5 black tags. Three tags are selected at random without replacement. What is the probability that two are black and one is white?

8. There are n people standing in a circle, among whom are two people named A and B. What is the probability that there are r people between them?

9. Six random digits are selected. In the pattern that emerges, find the probability that the pattern will contain the sequence 4, 5, 6.
1.3  The Algebra of Events
Before we may adequately discuss probabilities of events we must discuss the algebra of events. Then we are able to establish the properties of probability. Connected with any game or experiment is a set or space of all possible individual outcomes. We shall consider only those games or experiments where these individual outcomes are equally likely. Such a collection of all possible individual outcomes is called a fundamental probability set or sure event. It will be denoted by the Greek letter omega, Ω. We shall also use the expression fundamental probability set (or sure event) for any representation we might construct of all individual outcomes. For example, in a game consisting of one toss of an unbiased coin, a fundamental probability set consists of two individual outcomes which can be conveniently referred to as H (for heads) and T (for tails). If the game consists in tossing a fair coin twice, then the fundamental probability set consists of four individual outcomes. One of these outcomes could be denoted by (T, H), which means that tails occurs on the first toss of the coin and heads occurs on the second toss. The remaining three individual outcomes may be denoted by (H, H), (H, T) and (T, T). In general, an arbitrary individual outcome will be denoted by ω and will be referred to as an elementary event. Thus, Ω denotes the set of all elementary events.

An event is simply a collection of certain elementary events. Different events are different collections of elementary events. Consider the game again where a fair coin is tossed twice. Then, as indicated above, the sure event consists of the following four elementary events:

(H, H)   (H, T)   (T, H)   (T, T).

If A denotes the event: [heads occurs in the first toss], then A consists of two elementary events, (H, H) and (H, T), and we write this as

A = {(H, H), (H, T)}.

If B denotes the event: [at least one head appears], then B consists of the three elementary events (H, H), (H, T) and (T, H), i.e.,

B = {(H, H), (H, T), (T, H)}.
If C denotes the event: [no heads appear], then C consists of one elementary event, i.e., C = {(T, T)}. If D denotes the event: [at least three heads occur], this is clearly impossible and is an empty collection of elementary events; we denote this by D = ∅, where ∅ always means the empty set. In general, we shall denote the fact that an elementary event ω belongs to the collection of elementary events which determine an event A by ω ∈ A. If an elementary event ω occurs, and if ω ∈ A, then we say that the event A occurs. It might be noted at this point that just because an event A occurs, it does not mean that no other events occur. In the example above, if (H, H) occurs, then A occurs and so does B. The fundamental probability set Ω is also called the sure event for the basic reason that whatever elementary event ω does occur, then always ω ∈ Ω.

We now introduce some algebraic operations of events. If A is an event, then Aᶜ will denote the event that A does not occur. Thus Aᶜ consists of all those elementary events in the fundamental probability set which are not in A. For every elementary event ω in the fundamental probability set and for every event A, one and only one of the following is true: ω ∈ A or ω ∈ Aᶜ. An equivalent way of writing ω ∈ Aᶜ is ω ∉ A, and we say that ω is not in A. Also, Aᶜ is called the negation of A or the complement of A.

If A and B are events, then A ∪ B will denote the event that at least one of the two events A, B occurs. By this we mean that A can occur and B not occur, or B can occur and A not occur, or both A and B can occur. In the previous example, if E denotes the event that heads occurs in the second trial, then

E = {(H, H), (T, H)}

and

A ∪ E = {(H, H), (H, T), (T, H)}.

In other words, A ∪ E is the event that heads occurs at least once, and we may write A ∪ E = B. In general, if A₁, ⋯, Aₙ are any n events, then

A₁ ∪ A₂ ∪ ⋯ ∪ Aₙ
denotes the event that at least one of these n events occurs. This event will also be written as ⋃ⱼ₌₁ⁿ Aⱼ.

Suppose A and B are events which cannot both occur, i.e., if ω ∈ A, then ω ∉ B, and if ω ∈ B, then ω ∉ A. In this case, A and B are said to be incompatible or disjoint or mutually exclusive. Events A₁, ⋯, Aₙ are said to be disjoint if and only if every pair of these events has this property. The notation A ⊂ B means: if event A occurs, then event B occurs. Other ways of stating this are: A implies B and B is implied by A. Thus A ⊂ B is true if and only if for every ω ∈ A, then ω ∈ B. In any situation where it is desired to prove A ⊂ B, one should select an arbitrary ω ∈ A and prove that this implies ω ∈ B. We define the equality of two events A and B, namely A = B, to occur if A ⊂ B and B ⊂ A, i.e., A and B share the same elementary events. Finally we define the event that A and B both occur, which we denote by A ∩ B, to be the event consisting of all elementary events ω in both A and B. This is frequently referred to as the intersection of A and B. If A₁, ⋯, Aₙ are any n events, then the event that they all occur is denoted in two ways by

A₁ ∩ A₂ ∩ ⋯ ∩ Aₙ = ⋂ⱼ₌₁ⁿ Aⱼ.

We sometimes write AB instead of A ∩ B and A₁A₂ ⋯ Aₙ instead of A₁ ∩ A₂ ∩ ⋯ ∩ Aₙ. We now prove some propositions on the algebra of events.

PROPOSITION 1. For every event A, then A ⊂ A.

Proof: Let ω ∈ A. Then this same ω ∈ A. Hence every elementary event in the left event is an elementary event in the right event.

PROPOSITION 2. If A, B, C are events, if A ⊂ B and if B ⊂ C, then A ⊂ C.
Proof: Let ω ∈ A; we must show that ω ∈ C. Since A ⊂ B and ω ∈ A, then ω ∈ B. Now, since B ⊂ C and since ω ∈ B, then ω ∈ C.

PROPOSITION 3. For every event A, A ∩ A = A, A ∪ A = A, and (Aᶜ)ᶜ = A.

Proof: These are obvious.

PROPOSITION 4. If A is any event, then ∅ ⊂ A ⊂ Ω.

Proof: The trick here involves the fact that any ω one might find in ∅ is certainly in A, since ∅ contains no ω's. The implication A ⊂ Ω is obvious.

We noted above that if A and B are two events, and if we wished to prove A ⊂ B, then we should take an arbitrary elementary event ω in A and prove that it is in B. Now suppose we have two events A and B, and suppose we wish to prove A = B. Because of the definition of equality of two events given above, one is required to do the following: (i) take an arbitrary ω ∈ A and prove that ω ∈ B, and (ii) take an arbitrary ω ∈ B and prove that ω ∈ A.
A* are events then
(u*V-rK- is an element of at least one of A, 1?, namely A. Hence w G i U B . This in turn implies u; is in at least one of A U B and C, namely, A U 5 . Thus u; 6 (A U 5 ) U C. If w G 5 U C, then CJ is in at least one of B or C. If it is in C, then it is in (A U 5 ) U C If it is in 5 , then it is in A U i?, namely 5 . Thus we have established the inclusion AU(5UC)C(AUB)UC. In order to establish the reverse inclusion, and hence the first equation, we use Proposition 10 and the above inclusion to obtain (AUB)UC
= C ∪ (A ∪ B) = C ∪ (B ∪ A) ⊂ (C ∪ B) ∪ A = (B ∪ C) ∪ A = A ∪ (B ∪ C).
In order to establish the second equation, replace A, B and C in both sides of the first equation by Aᶜ, Bᶜ and Cᶜ respectively, take the complements of both sides, and apply Propositions 9 and 5 to obtain the conclusions.

PROPOSITION 12. If A, B and C are events, then

A ∩ (B ∪ C) = AB ∪ AC
15
and A U ( £ n C) = (A U £ ) n (A U C). Proof: If u G A 0 (B U C), then u> G A and a; is in at least one of J3, C. Hue B, then u G A # ; if u G C, then a; G AC. Hence u> G AB U AC. If a; G AB U AC, then u> is in at least one of AB and AC. If u G A 5 , then co G A and a; G 5 ; now w £ B implies w G B U C , and hence u; G A n (B U C). If u> G AC, then replace C by B in the previous sentence to obtain the same conclusion. In order to prove the second equation, replace A, B and C in the first equation by A c , Bc and C c respectively, take the complements of both sides and apply Propositions 9 and 5. P R O P O S I T I O N 13. If A and B are events then A U B = A U ACB, and A and ACB are disjoint. Proof: If u> G A U 5 , then a; G A or a; G B. If w G A, thenu; G AUA C 5. If a? G -B, then two cases occur: u G A also, or u; ^ A. In the first case u> G A U A c 5 . In the second case, w G A c while yet u G J5, i.e., u> G A c 5 and thus u> G A U A C B. Thus, A U 5 C A U A c 5 . Now let u G A U A C B. Then w G A or u> G A c 5 . If u G A, then a; G A U 5 . If u> G A C B, then w 6 B , and hence ueAUB. Thus A U ACB C A U B , and the equation is established. Also, A and ACB are disjoint since if u) G A c 5 , then u G A c , i.e., w £ A. Q.E.D. EXERCISES 1. Prove: if 1? is any event, then D B = and 5 fl 0 = B. (See Propositions 4 and 7.) 2. If A is an event, use Problem 1 and Propositions 3 and 6 to prove U A = A and ft U A = fl. 3. Use Propositions 10 and 8 to prove: if A and B are events, then ACiB C B. 4. Use Problem 3 and two propositions of this section to prove: if C and D are events, then C C C U D.
5. Prove: if A, B, C and D are events, if A ⊂ C and if B ⊂ D, then AB ⊂ CD.
6. Let A₁, A₂, A₃, A₄, A₅, A₆, A₇ be events. Match these three events: A₁ᶜA₂ᶜA₃, A₆A₇ᶜ and A₂A₅ with the following statements: (i) A₂ and A₅ both occur, (ii) A₃ is the first among the seven events to occur, and (iii) A₆ is the last event to occur.
7. Let A₁, A₂ and A₃ be events, and define Bᵢ to be the event that Aᵢ is the first of these events to occur, i = 1, 2, 3. Write each of B₁, B₂, B₃ in terms of A₁, A₂, A₃ and prove that A₁ ∪ A₂ ∪ A₃ = B₁ ∪ B₂ ∪ B₃.
8. Prove: if A is any event, then A ∪ Aᶜ = Ω and A ∩ Aᶜ = ∅.
9. Prove: if A and B are events, then B = AB ∪ AᶜB. (Hint: Use Problem 8, Proposition 12 and Problem 1.)
10. In Problem 6, construct the event: A₅ is the last of these events to occur.
11. Five tags, numbered 1 through 5, are in a bowl. The game is to select a tag at random and, without replacing it, select a second tag. (I.e., take a sample of size two without replacement.) After you list all 20 elementary events in Ω, list the elementary events in each of the following events:

A: the sum of the two numbers is < 6
B: the sum of the numbers is 5
C: the larger of the two numbers is < 3
D: the smaller of the two numbers is 2
E: the first number selected is 5
F: the second number selected is 4 or 5.
12. In Problem 11, list the elementary events in each of the following events:
A ∪ B, A ∩ C, Dᶜ, (A ∪ E)ᶜ, E ∩ F.
13. Prove Proposition 3.
14. Prove the converse to the second statement in Proposition 9: If A and B are events, and if Aᶜ = Bᶜ, then A = B.
15. Prove: If A, B and C are events, and if A and B are disjoint, then AC and BC are disjoint.
16. Prove: If A, H₁, ⋯, Hₙ are events, if A ⊂ ∪ᵢ₌₁ⁿ Hᵢ, and if H₁, ⋯, Hₙ are disjoint, then AH₁, ⋯, AHₙ are disjoint, and A = ∪ᵢ₌₁ⁿ AHᵢ.
1.4 Probability
The only notion of probability that we shall use in this course is that where the elementary events are all equally likely. In most cases these equally likely outcomes will be apparent. In others, they will be difficult to find, but in most of these cases we shall not have to find them. In any game or experiment, if N denotes the total number of equally likely outcomes (in Ω), and if N_A denotes the number of equally likely outcomes in the event A, then we define the probability of A by

P(A) = N_A / N.

Concrete examples of this were given in Section 1.2. The following propositions will be used repeatedly in this course.

PROPOSITION 1. If A is an event, then 0 ≤ P(A) ≤ 1.

Proof: Since 0 ≤ N_A ≤ N, divide through by N, and obtain 0 ≤ P(A) ≤ 1. Q.E.D.

PROPOSITION 2. If A is an event, then P(Aᶜ) = 1 − P(A).
Proof: Since N = N_A + N_Aᶜ, we have, upon dividing through by N, 1 = P(A) + P(Aᶜ), from which the conclusion follows. Q.E.D.

PROPOSITION 3. P(Ω) = 1 and P(∅) = 0.

Proof: This follows from the fact that N_∅ = 0 and N_Ω = N.

PROPOSITION 4. If A₁, ⋯, Aᵣ are disjoint events, then

P(∪ᵢ₌₁ʳ Aᵢ) = Σᵢ₌₁ʳ P(Aᵢ).

Proof: Disjointness of A₁, ⋯, Aᵣ implies

N_{A₁ ∪ ⋯ ∪ Aᵣ} = Σᵢ₌₁ʳ N_{Aᵢ}.

Dividing through by N yields the result. Q.E.D.
PROPOSITION 5. If A and B are events, then P(A) = P(BA) + P(BᶜA).

Proof: Because Ω = B ∪ Bᶜ, then A = A ∩ Ω = A(B ∪ Bᶜ) = AB ∪ ABᶜ. Since AB and ABᶜ are disjoint, it follows that P(A) = P(AB ∪ ABᶜ) = P(AB) + P(ABᶜ). Q.E.D.

PROPOSITION 6. If A and B are events, then P(A ∪ B) = P(A) + P(AᶜB).

Proof: By Proposition 13 in Section 1.3, A ∪ B = A ∪ AᶜB, and A and AᶜB are disjoint. Applying Proposition 4 above we obtain P(A ∪ B) = P(A) + P(AᶜB). Q.E.D.

PROPOSITION 7. If A and B are events, then P(A ∪ B) = P(A) + P(B) − P(AB).
Proof: By Proposition 6, P(A ∪ B) = P(A) + P(AᶜB). By Proposition 5, P(B) = P(AB) + P(AᶜB), or P(AᶜB) = P(B) − P(AB). Substituting this into the first formula, we get the result. Q.E.D.

PROPOSITION 8. (Boole's inequality). If A and B are events, then P(A ∪ B) ≤ P(A) + P(B).

Proof: By Propositions 7 and 1, P(A ∪ B) = P(A) + P(B) − P(AB) ≤ P(A) + P(B). Q.E.D.
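Propositions 7 and 8 are easy to sanity-check by direct counting on a small space of equally likely outcomes, as in the definition P(A) = N_A/N. The following sketch uses assumed toy values for Ω, A and B (not from the text) and exact fractions:

```python
from fractions import Fraction

N_outcomes = 12
omega = set(range(N_outcomes))          # equally likely outcomes (toy example)
A = {0, 1, 2, 3, 4}
B = {3, 4, 5, 6}

def P(event):
    return Fraction(len(event), N_outcomes)

# Proposition 7: P(A ∪ B) = P(A) + P(B) − P(AB)
print(P(A | B) == P(A) + P(B) - P(A & B))      # → True
# Proposition 8 (Boole): P(A ∪ B) ≤ P(A) + P(B)
print(P(A | B) <= P(A) + P(B))                 # → True
```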
EXERCISES

1. Prove: if A, B and C are events, then P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(AB) − P(AC) − P(BC) + P(ABC).
2. Prove: if A, B and C are events, then P(A ∪ B ∪ C) ≤ P(A) + P(B) + P(C).
3. Use the principle of mathematical induction to prove: if A₁, ⋯, Aₙ are events, then P(∪ᵢ₌₁ⁿ Aᵢ) ≤ Σᵢ₌₁ⁿ P(Aᵢ).
4. Prove: if A and B are events, and if A ⊂ B, then P(A) ≤ P(B).
5. Prove: if A and B are events, then P(AB) ≤ P(A) ≤ P(A ∪ B).
6. Prove: if A and B are events, if A ⊂ B, and if P(B) ≤ P(A), then P(A) = P(B).
7. Prove: if A₁, A₂ and A₃ are events, then P(A₁ ∪ A₂ ∪ A₃) = P(A₁) + P(A₁ᶜA₂) + P(A₁ᶜA₂ᶜA₃).
20
1.5
CHAPTER 1. EVENTS AND
PROBABILITY
Conditional Probability
We now define the conditional probability that an event A occurs, given that a certain event B occurs. Since we are given that B occurs, we can only define this conditional probability when P(B) > 0 or NB > 0. Since we are given that B occurs, then the total possible number of equally likely outcomes is NB which we place in the denominator. Among these we wish to find the total number of equally likely ways in which A occurs. This is seen to be NAHB or NAB- Thus, the conditional probability that A occurs, given that B occurs, which we denote by P(A\B), is obtained from the formula P W B )
R E M A R K l.IfA
.
^
and B are events, and if P(B) > 0, then
p(A\mP{m
P AB
"> ~ 1^ W
Proof: By the definition above, we have P(A\B)
=
NAB
NB
NABIN
NB/N
P(AB)
P{B) " Q.E.D.
R E M A R K 2. If A and B are events with positive probabilities, then P{AB) = P(A\B)P(B)
=
P(B\A)P(A).
Proof: The first equality follows from Remark 1, and the second is the same as the first with A replaced by B and B by A. Q.E.D. The three most useful theorems in connection with conditional prob abilities will now be presented along with some applications.
1.5. CONDITIONAL
PROBABILITY
21
T H E O R E M 1. (The Multiplication Rule). If A0, Air n + 1 events such that P(AoAi • • • A„_i) > 0, then P{AQAX ••■An) = P(A0)P(A1\A0)P(A2\A0A1)•
•• ,An are any
• • P(An\A0■ • • A n _ a ).
Proof: We prove this by induction on n. For n = l,P(A0) > 0, so P{Ax\Ao) = P(A 0 A!)/P(Ao), or P{AoA\) = P(A0)P(A1\A0). Now let us assume the theorem is true for n — 1 (where n > 2); we shall show it is also true for n. By induction hypothesis, P(A0A1 •.. An-i) = P(A0)P(A1\Ao)
- • • P(An^\A0
• • • A n _ 2 ).
By Remark 1, letting B = A0 • • • A n _ 1? A = An, we obtain PfAoAx.-.An)
=
P(i4o---i4 n _ 1 )P(i4 w |i4 0 ---i4 n - 1 )
=
P(A0)P(A1\Ao)
• • • P(A n |A 0 - • • A n _x),
which proves that the theorem then holds for all n.
Q.E.D.
E x a m p l e 1. (Polya Urn Scheme). An urn contains r red balls and b black balls originally. At each trial, one selects a ball at random from the urn, notes its color and replaces it along with c balls of the same color. Let i?t- denote the event that a red ball is selected at the ith trial, and let B\ denote the event that a black ball is obtained at the ith trial. We wish to compute P(RiB2B3) . Using the multiplication rule, we obtain P(RlB2B3)
=
P(R1)P(B2\R1)P(B3\R1B2)
r "
b
b+c
(r + b) (r + b + c) (r + b + 2c) *
E x a m p l e 2. (Sampling Without Replacement). An urn contains N tags, numbered 1 through N. One selects at random three tags with out replacement. This means: one first selects a tag at random, then without replacing it one selects a tag at random from those remaining, and again, without replacing it, one selects yet another tag from the
22
CHAPTER 1. EVENTS AND
PROBABILITY
N — 2 remaining tags. If ii, Z2, ^3 are three distinct positive integers not greater than JV, then, by the multiplication rule, the probabiUty that i\ is selected on the first trial, z*2 on the second trial and i% on the third trial is
L
!
1
N* N-l'
N-2
T H E O R E M 2. (Theorem of Total Probabilities). IfHir-,Hn are are disjoint events with positive probabilities, and if A is an event sat isfying A C U^Hi, then P(A) =
±P(A\Hi)P(Hi). 1=1
Proof: First note, using Propositions 7 and 12 in Section 1.3, that A = A n (U?=1fff-) =
U^AHi.
Further, AH\,- • • ,AHn are disjoint; this comes from the hypothesis that J/i, • • •, Hn are disjoint. Hence, by Proposition 4 in Section 1.4,
P(A) = p((jAHt)=J2P(AHi). \t=l
/
t=l
Now by Remark 1 or 2 above, P(AHt) =
P(A\Hi)P(Hi)
P(A) =
'£P(A\Hi)P(Hi).
for 1 < i < n . Thus
t=l
Q.E.D. Example 3. In the Polya urn scheme in Example 1, P(R1) = ^—
r+b
and P ( £ i ) =
&
r + b'
1.5. CONDITIONAL
23
PROBABILITY
In order to compute P{R2), we first note that R2 C R\ U B\ and R\ and l?i are disjoint. Hence by the theorem of total probabilities, P(R2) = P(R2\R1)P(Rl)
+
P(R2\B1)P(B1).
Since P(R2\Ri) =
r r | C and P{R2\B1) = r + 6+ c r + b-\- c
we have PfFM-
r + C
r r + 6+ c r + 6
r b r + 6+ c r + 6
r r + 6"
Example 4. In Example 2 above on sampling without replacement, let us compute the probability that 1 is selected on the second trial. Using the theorem of total probabilities we obtain N
P[\ in trial#2]
=
£P([1
in
trial#2]|[i in trial#l]) P([i in trial#l])
t=2
= f-JL. 1 = 1 T H E O R E M 3. (Bayes' Theorem) IfHi,-,Hn are disjoint events with positive probabilities, if A is an event satisfying A C U^Hi, and if P(A) > 0, then for j = l , 2 , - - - , n ,
_
WTO) £?„p(/iiff()nffi)
Proof: By the definition of conditional probability, we have, by our hypotheses and by Theorem 2, that P(H \A\] =
r{ni{
P{AHj)
P(A)
=
P A H P H
( \ i) ( >) Y2=iP{A\Hi)P(Hiy Q.E.D.
24
CHAPTER 1. EVENTS AND
PROBABILITY
In rather loose terminology, Bayes' theorem is applied in this general situation. An event A is known to have occurred. There are n disjoint events, called the possible causes of A, and since A has occurred it is known that one of Hu-,Hn "caused it" (i.e., A C U? = 1 # n ). If one wishes to determine which of the possible causes really caused it, one might wish to evaluate P(Hj\A), for 1 < j < n, and select as a possible cause an Hj for which P(Hj\A) is maximum. Example 5. Consider the Polya urn scheme again. Suppose one ob serves that the event R2 has occurred and wishes to determine the probability that Bi was the "cause" of it, i.e., to evaluate P(i?i|i? 2 )By Bayes' theorem we find that
P(B,W
=
W + W P(R) \R )P(R )
P(R2\B1)P(B1) b r +b+c
One should note that P(Bi|i? 2 ) =
2
1
1
P(R2\B1).
Example 6. Consider the sampling without replacement that occurred in Examples 2 and 4. Suppose one observes that 1 is selected in the second trial and wishes to find the probabiHty that selecting 3 in the first trial is its "cause", i.e., to evaluate P([3 in 1st trial]|[l in 2nd trial]). Using Bayes' theorem this turns out to be P([l in 2nd trial]|[3 in 1st trial])P([3 in 1st trial]) EiL 2 ^([1 in 2nd trial]|[j in 1st trial])P(£j in 1st trial])
1 N - 1'
EXERCISES 1. In the hypothesis of Theorem 1, it was assumed that P(AoAi - • • A n _i) > 0 so that the last conditional probability was well-defined. Prove that this assumption implies that P(C\j=0Aj) > 0 for 0 < k < n — 2, so that all the other conditional probabilities are also well-defined.
1.5. CONDITIONAL
PROBABILITY
25
2. In the proof of the theorem of total probabilities, the statement is made that since H\, • • •, Hn are disjoint, then AH\, • • •, AHn are disjoint. Prove this statement. 3. In sampling without replacement considered in Examples 2 and 4, suppose a simple random sample of size 3 is selected. Prove that the probability of getting a 1 in the third trial is 1/N. 4. In the Polya urn scheme, find P(Rs). 5. An urn contains four objects: A, 2?, C, D. Each trial consists of selecting at random an object from the urn, and, without replac ing it, proceeding to the next trial. If X is one of those four objects, and if i = 1 or 2 or 3 or 4, let X{ denote that event that X is selected at the zth trial. Compute the following: i) p ( ^ ) . ii)
P(A2\A1),
iii) P(A 2 |Bx), iv) P(A2) and v)
P(B3).
6. In the Polya urn scheme, compute
P(Ri\R2).
7. In the Polya urn scheme, compute i) P(lfe|fll), ii) P(R3\R2) iii)
and
P(R1R3)>
8. An urn contains 2 black balls and 4 white balls. At each trial a ball is selected at random from the urn and is not replaced for the next trial. Let B{ denote the event that the first black ball selected is on the ith trial. Compute i)
P(B2),
ii) P(Pn),
26
CHAPTER 1. EVENTS AND
PROBABILITY
iii) P(B5) and iv)
P(B6).
9. In Problem 8, let C,- denote the event that the second black ball selected is selected at the ith trial. Compute i) P(C 2 ), ii) iii)
P(C3), P{BXCZ),
iv) P(B2\C3) v)
and
P{CX).
10. An absent-minded professor has five keys on a key ring. What is the probability he will have to try all five of them in order to open his office door? 11. Urn # 1 contains 2 white balls and 4 black balls, and urn # 2 contains 5 white balls and 4 black balls. An urn is selected at random, and then a ball is selected at random from it. What is the probability that the ball selected is white? 12. In Example 2, find the probability that 2 is selected in the third trial, where N = 5.
Chapter 2 Random Variables 2.1
R a n d o m Variables as Functions
In a sample survey, when we select individuals at random, we are really not interested in the particular individuals selected. Rather, we are interested in some numerical characteristic (or characteristics) of the individual selected. This numerical characteristic is a function, in that to each elementary event selected there is a number assigned to it. Definition. A random variable X is a function which assigns to every element u £ fi a real number X{LO). The following examples illustrate the idea of random variable. (i) Take a sample of size one from the set 0 of all registered students students at your university. In this case, X{u) might be the grade point average of u. Thus, corresponding to student u G H is the number X(u>). (ii) Sample three times without replacement from the set of all regis tered students at your university. In this case, fi will consist of the set of all ordered triples ( t ^ i , ^ , ^ ) , where no repetitions are allowed. If Y denotes the age of the third student selected, i.e., if Y assigns to ( w i , ^ ^ ) the number: the age of CJ3, then Y is a random variable.
27
28
CHAPTER 2. RANDOM
VARIABLES
(iii) In example (ii), if Z assigns to {wi,W2,wz) the total indebtedness of u>i and u>2 and u>3, then Z is a random variable. We are usually interested in the values that random variables take and the probabilities with which these values are taken. Thus we have the following definition. Definition. If X is a random variable defined over some fundamen tal probabiUty space fi, then the range of X , which we denote by range(X), is defined as the set of numbers X(u>) for all u> G fi, i.e., range(X) = {X{u) : u € f)}. This is also denoted by X(£i) or {x : x = ^(ci>) for some u> € fi}. Since fi is finite, then the range of a random variable X is finite and has at most as many members as does Q. Random variables are func tions, and so, like functions, they admit algebraic operations. These are given in the following definition, Definition. If fi is a fundamental probability space, if X and Y are random variables defined over f), and if c is a constant, then we de fine the random variables X + F , XY, X/Y, cX, max{X,Y}, and min{X, Y} as follows: (i) X + Y assigns to every u 6 fi the number X(u) +
Y(u),
(ii) XY assigns to every UJ £ Q the product X(u)Y(u) X(UJ) and Y(u),
of the numbers
(iii) X/Y assigns to every u € 0 the quotient X{u)/Y(u)\ for at least one u> & fi, then X/Y is not defined,
if Y(u) = 0
(iv) cX assigns to u the number CX{UJ) , (v) max{X,Y} assigns to each u> e 0, the larger of the numbers X(u>),Y(u>), and (vi) min{X, y } assigns to each u € ft the smaller of the numbers X(u>) and y(u;) .
2.1.
RANDOM
VARIABLES
AS
29
FUNCTIONS
In general, if X, Y, Z are random variables, and if /(u,v,u;) is any function of three variables, then / ( X , Y, Z) is a random variable which assigns to every w G f i the number f(X(Lo), Y(u), Z{u)). Among the random variables defined over 0 , some very important ones are the indicator random variables, defined as follows. Definition. If A C fi is an event, the indicator of A , denoted by 7^, is defined as the random variable that assigns t o w G f l the number 1 if u> G A and 0 if u £ A , i.e., T
/
N
/ 1
if u> G A
P R O P O S I T I O N 1. If A is an event, then l\ = IA. Proof: If u> G A, then 7i(w) = l 2 = 1 = iyt(u>), and if u; ^ A, then I\(w) = 02 = 0 = 7^(a>). Q.E.D. P R O P O S I T I O N 2. If A and B are events, then IAIB = IABProof: liuj E A and u E B, then u; e AB, and thus /A/BM
= IA(U)IB{U)
= 1-1 = 1 =
IAB{u).
If u> is not in A S , then IAB{W) = 0 and, since UJ is not in at least one of A, 5 , then at least one of the numbers I A (), IB{&) is zero, in which case IAIB(W) = IA{U)IB(U) = 0 = /^(w). Q.E.D. P R O P O S I T I O N 3. 7/A and J5 are events, then IAB = m i n { / ^ , / B } . Proof: Note that the minimum of IA{W) and IB{W) is 1 if and only if IA{U>) = 1 and IB(W) = 1, which is true if and only if u G A and UJ € B, i.e., a; G A S o r / ^ u ; ) = 1. Q.E.D. P R O P O S I T I O N 4. / / A aradB are events, then IAuB = m a x { J A , / B } . Proof: IAUB(V) = 1 if and only if UJ G A U 2?, which is true if and only if CJ is in at least one of A, B. This means at least one of IA{U), IB{W) is 1. Q.E.D.
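The indicator propositions (I_A I_B = I_AB, I_AB = min{I_A, I_B}, I_{A∪B} = max{I_A, I_B}) can be verified pointwise on a small finite sample space; the sketch below uses assumed toy values for Ω, A and B:

```python
# Ω and the events below are assumed toy values, not from the text.
omega = range(1, 9)
A = {1, 2, 3}
B = {2, 3, 5}

def I(S):                       # indicator random variable of the event S
    return lambda w: 1 if w in S else 0

IA, IB = I(A), I(B)
ok = all(IA(w) * IB(w) == I(A & B)(w)          # I_A I_B = I_AB
         and min(IA(w), IB(w)) == I(A & B)(w)  # I_AB = min{I_A, I_B}
         and max(IA(w), IB(w)) == I(A | B)(w)  # I_{A∪B} = max{I_A, I_B}
         for w in omega)
print(ok)   # → True
```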
30
CHAPTER 2. RANDOM
VARIABLES
In our notation we generally suppress the symbol ω. Thus, if X is a random variable, we shall write [X = x] instead of {ω ∈ Ω : X(ω) = x}, we shall write [X ≤ x] instead of {ω ∈ Ω : X(ω) ≤ x} and, in general, for any set of numbers A, we shall write [X ∈ A] instead of {ω ∈ Ω : X(ω) ∈ A}. An example of this is the game where Ω denotes the set of outcomes of three tosses of an unbiased coin. In this case Ω consists of

(HHH) (HHT) (HTH) (HTT) (THH) (THT) (TTH) (TTT).

Let X denote the number of heads in the three tosses. For example, if ω = (HTH), then X(ω) = 2. Also X(THH) = 2, while X(TTT) = 0. Thus [X = 2] = {(HHT), (HTH), (THH)}, and [X ⋯

EXERCISES

1. ⋯ What are [X 30] and [X ∈ (1,4]]?
2. A box contains five tags, numbered 1 through 5. A tag is selected at random, and, without replacing it, a second tag is selected.
(i) List all 20 ordered pairs of outcomes in this game of sampling twice without replacement.
(ii) Let X denote the first number selected, and let Y denote the second number selected. List the elementary events in each of the following events: [X = 2], [Y = 2], [X = 3], [Y = 3] and [X + Y < 4].
3. Prove: if A is an event, then I_{Aᶜ} = 1 − I_A.
4. Prove: if A and B are events, then they are disjoint if and only if I_{A∪B} = I_A + I_B.
5. Prove: if X and Y are random variables, and if x ∈ range(X), then [X = x] = ∪{[X = x][Y = y] : y ∈ range(Y)}.
6. Prove: if X is a random variable, and if A is any set of numbers, then [X ∈ A] = ∪{[X = x] : x ∈ A ∩ range(X)}.
7. Prove: if X is a random variable, then X = Σₓ x · I_[X = x], where the symbol Σₓ means the sum for all x ∈ range(X).
2.2 Densities of Random Variables
What we are primarily interested in are the probabilities with which a random variable takes certain values. This set of probabilities is usually referred to as the density of the random variable, which we now define.

Definition. If X is a random variable, its density f_X(x) is defined by

f_X(x) = P[X = x] if x ∈ range(X), and f_X(x) = 0 if x ∉ range(X).
In the example presented at the end of Section 2.1 (where X denotes the number of heads occurring in three tosses of an unbiased coin) the density of X is the following:

f_X(0) = P[X = 0] = P{(TTT)} = 1/8,
f_X(1) = P[X = 1] = P{(HTT), (THT), (TTH)} = 3/8,
f_X(2) = P[X = 2] = P{(HHT), (HTH), (THH)} = 3/8,
f_X(3) = P[X = 3] = P{(HHH)} = 1/8, and
f_X(x) = 0 if x ∉ {0, 1, 2, 3}.
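The same density can be computed mechanically by enumerating the eight equally likely outcomes; a minimal sketch:

```python
from fractions import Fraction
from itertools import product

# Density of X = number of heads in three tosses of a fair coin,
# by enumerating the 8 equally likely outcomes.
omega = list(product("HT", repeat=3))

def f_X(x):
    return Fraction(sum(1 for w in omega if w.count("H") == x), len(omega))

print([str(f_X(x)) for x in range(4)])   # → ['1/8', '3/8', '3/8', '1/8']
```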
Note that for every x € range(X), fx(x) > 0. We shall also need a definition of range of two or more random variables considered jointly (or as a random vector). Definition. If X and Y are random variables, their (joint) range, denoted by range(X, F ) , is defined by range(X,Y)
= {(X(LJ),Y(CJ))
: UJ G ft}.
In general, if X\, • • •, Xn are n random variables, then we define range(Xu--,Xn)
= {(Xi(o;), • • • ,Xn(u>)) : w G ft}.
One should note that range(X, Y) is a set of number pairs, i.e., a set of points in 2-dimensional Euclidean space R2. Likewise, range(Xi,..., Xn) is a set of points in n-dimensional Euclidean space Rn. Definition. If X and Y are random variables, their joint density fxy{x,y) is defined by fxA*, V) = f P{[X 10
= XU Y = d )
if
(*' y\ G range{X, otherwise.
Y),
This is referred to as a bivariate density. As an example, consider an urn with the following nine number pairs: (1,1) (1,2) (1,3) (2,1) (2,2) (2,3) (3,1) (3,2) (3,3) A number pair is selected at random. Let X denote the smaller of the two numbers (e.g., X assigns to (3,1) the number 1), and let Y denote the sum of the two numbers (for example, Y assigns to (3,2) the number 5). We shall find the joint density of X and Y .
2.2. DENSITIES
OF RANDOM
VARIABLES
33
First note that range(X) = {1,2,3}, range(Y) = {2,3,4,5,6}, and range{X,Y) = {(1,2),(1,3),(1,4),(2,4),(2,5),(3,6)}. Next observe that
[X- -1}[Y = 2] [X- = i][y = 3] [X-- = i][y = 4] [X- = 2][y = 4]
=
{(1,1)} = {(1,2), (2,1)} = {(1,3), (3,1)} = {(2,2)} [X-- --2][Y = 5] = {(2,3), (3,2)} [X- = 3][F = 6] = {(3,3)}. Thus, / x , y ( l , 2 ) = l / 9 , / * , y ( l , 3 ) = 2/9,/x,y(l,4) = 2/9,/*,y(2,4) = l / 9 , / x , y ( 2 , 5 ) = 2/9,/*,y(3,6) = 1/9, and fxA^v) = 0 for aU other pairs (x,y). Notice in this bivariate case that /x,y(2,3) = 0 although 2 £ range(X) and 3 £ range(Y). What matters now is range(X, Y) . Joint densities of more than two random variables are similarly de fined. For example, if X, Y, Z are random variables, their joint density is defined, for all (#, y, z) € range(X, Y, Z), by fx,Y,z(*, y,z) = P([X = x][Y = y][Z = *]); otherwise we define /jr,y,z(x, y,z) = 0. It is important to keep in mind the notation used from here on. If Uij - • •, Ur are random variables, the joint density of f/i, • • •, Ur is denoted by The subscripts are t / i , - - - , [ / r ; they indicate the random variables of which it is the joint density. Within the parentheses will be the points (i/i, • • •, ur) in the range of C/i, • • •, Ur at which the density is evaluated. Many times we have (or start out with) a joint density of several random variables, but we wish to have the densities of each single ran dom variable. This can be accomplished by using theorems like the following. T H E O R E M 1. If X and Y are random variables with joint /x\y(#>2/); then the densities fx(x) and fy(y) are fx{x)
= YHfx>Y(x,y)
: y £ range(Y)} if x £ range(X)
density
34
CHAPTER 2. RANDOM
VARIABLES
and fv(y) = Ys{fxy(xiV)
:x e
range(X)} if y
erange(Y).
Proof: We first observe that for each x G range(X), [X = x] = U{[X = x][Y = y]:y€
range(Y)}.
Indeed, if UJ is in the right hand side, then X(UJ) = x, and if UJ is in the left hand side, i.e., X{UJ) = x, then Y(u>) G range(Y) , say, Y{UJ) = j/i, in which case UJ G [X = x)[Y = Vl] C U{[X = x][r = y] : y € r a n 5 e ( F ) } . Since the right hand side is a disjoint union, we have fx(x)
= P[X = x] = 52{P([X =
YHfx>Y(x>y)
:
y
e
= x][Y =
y]):yerange(Y)}
range(Y)}.
The proof of the second equation of the theorem is similar.
Q.E.D.
In Theorem 1, f_X(x) is called a marginal or marginal density of f_{X,Y}(x, y). As an example, consider the random variables X, Y whose joint density is given by:

f_{X,Y}(1,1) = 1/8, f_{X,Y}(2,1) = 1/8, f_{X,Y}(2,2) = 1/4,
f_{X,Y}(3,2) = 1/8, f_{X,Y}(3,3) = 1/4, f_{X,Y}(4,1) = 1/8.

Graphically, this is represented by plotting each point (x, y) of range(X, Y) in the plane with its probability attached: 1/8 at (1,1), (2,1), (3,2) and (4,1), and 1/4 at (2,2) and (3,3).
range{X)}.
Proof: It is clear that [X E A] = U{[X = x] : x € A n
range(X)}
and that the right hand side is a disjoint union. Taking probabilities of both sides, we get P[XeA]
= ^{P[X =
Jlifxix)
= x]:xe : x e A fl
Anrange(X)} range(X)}.
CHAPTER 2. RANDOM
36
VARIABLES
which proves the theorem. One can prove in a similar fashion the following theorem whose proof we omit. T H E O R E M 3. / / X and Y are random variables, and if S is any subset of 2-dimensional Euclidean space, R 2 , then P[(X, Y) G S] = £ { / A : , Y ( Z , 2 / ) : (x,y) G S and (x,y) G range(X,Y)}. Another result of this type that we shall use is the following. T H E O R E M 4. If X andY are random variables, and ifg(x^y) function defined over range(X,Y), then P[g{X,Y)
= z] = ^2{fxAx>y)
:
9(z>y) = z
and
(*>V) €
is a
range(X,Y)}.
Proof: First observe that the right hand side of the above equation is summed over all number pairs (x,y) such that (x,y) G range(X,Y) and g(x, y) = z. One can easily verify that \g{X,Y)
= z] = U{[X = x][Y = y] : g(x, y) = z and (x, y) G range(X,
Y)}.
One is also able to verify that the union on the right side is a disjoint union. Taking probabilities of both sides yields the conclusion of the theorem. Q.E.D. The above theorem shows how to obtain the density of a function of one or more random variables. We next need to develop the idea of independence of random variables. Definition. If Xi, • • • , X m are random variables defined over fi, we shall call them independent if and only if, for every yt- G range(Xi), 1 < i < m, the events [Xx = y x ], • • •, [Xm = ym] are independent. T H E O R E M 5. If X\,- • • , X m are random variables, they are inde-
2.2. DENSITIES
OF RANDOM
37
VARIABLES
pendent if and only if m
fXu^XmiVU ' " • , Vm) = I I fXjiVj) for all yi 6 range(Xi), I
<m.
Proof: The equation is a consequence of the definition of independence. The converse is obtained, starting from the equation given, by summing both sides over various indices j/ t - £ range(Xi) to obtain r
fxilt...txir(yi1,...,yir)
=
Tifxtjiytj)
for 1 < 4 < • • • < £r < m, 1 < r < m.
Q.E.D.
T H E O R E M 6, If X\, • • • , X n are independent random variables, and if c i>' * • ? Cn are constants, then C\X\, • • •, CnXn are independent. Proof: First assume that none of the c,'s are zero. Then for 1 < ji < " ' < jr < n and by the hypothesis of independence of X\, • • •, X n , we have P f][cjtXjt
= ue] = Pf)[Xjt
£=1
= ut/cjt)
£=1
=
t[P[Xjk=ut/cjt] £=1
=
f[P[cJtXjt=ue}. 1=1
If ct = 0, then P[ctXi = 0] = 1 and P([ctXt = 0] n A) = P{A) for any event A. If u£ ^ 0, P [ Q X * = u£] = 0, and P ( [ Q X * = ut]nA) = 0 for any event A. Putting all these statements together gives us the theorem. Q.E.D. The most concrete examples of both independent random variables and non-independent (i.e., dependent) random variables occur in sample survey theory. Let us look at the basic model. We have a population U of N units denoted by f/i, • • •, i/;v, all of which are equally likely to be selected under random sampling. If we sample n times with
38
CHAPTER 2. RANDOM
VARIABLES
replacement, then the fundamental probability space ft is the set of all possible ordered n-tuples of units from U, with repetition of units in each n-tuple allowed. Let X be a function denned over U which assigns to the unit U{ tht number it,-, 1 < i < N. The u,'u need non be distinct and usually are not. For 1 < j < N let Xj be a function (or random variable) denned over ft by assigning to the n-tuple (Utl,• • • Utn) the number u^, i.e., ^ 0 ( ( ^ 1 J " ' 5 Uln)) = X(Ut-)
= U£-.
The random variables Xu • • •• Xn are referred to as a sample of fsze n on X, where the sampling is done with replacement. The fundamental probability space fi contains Nn equally likely elementary events. T H E O R E M 7. In sampling n times with replacement as defined above, the random variables Xu • ■ • •Xn are independent, and all have the same density. Proof: Let Nx denote the number of units Ut such that X(Ui) = x, i.e., Nx = #{U G U : X{U) = x}. Then the joint density f[NXi /*.-*.(*!, - ^ ) = ^
r iV
n
= f [ ^ = f[P[xi t=l
i V
t=l
= Xi] = f[
fXi(Xi).
t=i
By Theorem 5, Xu • • •• Xn are independentt Clearly, because of fhe replacement after each selection, all densities fXi(x) are the same. Q.E.D. If the sampling is done without replacement, then it is clear that the random variables Xu • ■ • •Xn are eot tndependent. However rhere is the following interesting and useful result. T H E O R E M 8. In sampling n times without replacement, all univariate densities {fXi(x)} are the same, and all bivariate densities {fXi Xi{u,v),i ± j} are the same.
2.2. DENSITIES
OF RANDOM
VARIABLES
39
Proof: Both conclusions follow from this remark: in sampling n times without replacement from £/, the set of f^J equally likely outcomes is the same as that obtained by recording tne selection of the zth unit first, then the jih unit (i ^ j ) , and then the remaining n — 2 units from left to right. Q.E.D. EXERCISES 1. Consider a game in which an unbiased coin is tossed four times. (i) List the set fl of all equally likely outcomes. (ii) Let X denote the number of heads in the four tosses. Find the density fx{') oi X. (iii) Let U denote the smaller of the number of heads and the number of tails in the four tosses. Find the density fu{') of U. (iv) Find the joint density of X and U. 2. Let X and Y be random variables whose joint density is given by /x,y(0,0) /x,y(l,2)
= =
l/3,/x,y(0,l) = l / 4 , / x , y ( l , l ) = l/6 1/6 a n d / x , y ( 2 , 0 ) = 1/12.
(i) Find the density, /x(*)> of X. (ii) Find the density, /y(-)> of Y. (iii) If U = min(X, F ) , find the joint density of U and X, /z7,x(#, •)• 3. Compute P([Y = l]\[X = 1]) and P([X = 2]\[Y = 0]), where X and Y are as in Problem 2. 4. In Problem 2, find the density of X + Y. 5. An urn contains 3 red balls and 4 black balls. One ball after another is drawn without replacement from the urn until the first black ball is drawn. Let X denote the number of balls drawn from the urn at the time the first black ball is drawn. Find the density of X .
40
CHAPTER 2. RANDOM
VARIABLES
6. Prove: if X is a random variable, and if g is a function defined over range(X), then the density of g(X) is fg(X)(z) = 52{fx(x)
: g(x) = z}.
7. Prove: if X, Y and Z are independent random variables, if g is a function defined over range(Z), and if h is a function defined over range(X, Y), then h(X, Y) and g(Z) are independent random variables.
8. Prove: if X, Y and Z are random variables, if φ(x, y) is a function defined over range(X, Y) and if ψ(x, y, z) is a function defined over range(X, Y, Z), then the joint density, f(u, v), of the random variables φ(X, Y), ψ(X, Y, Z) is f(u, v) = Σ{f_{X,Y,Z}(x, y, z) : φ(x, y) = u, ψ(x, y, z) = v}.
9. Let U : U₁, ⋯, U_N be a population with N units, and let X be a real-valued function defined over U. Sampling is done n times without replacement. Let Xᵢ denote the value of X for the ith unit selected. If 3 ≤ n ≤ N, prove that the joint densities of all triples (Xᵢ, Xⱼ, Xₖ) of observations are the same when i, j, k are distinct.
10. Let U, X and N of Problem 9 be as follows:

U:  U₁   U₂   U₃   U₄   U₅   U₆   U₇
X:  2.3  2.1  3.0  2.3  3.0  3.5  2.1
One samples three times without replacement. Letting X₁, X₂, X₃ be as in Problem 9, determine the joint densities of (i) (X₁, X₂, X₃), (ii) (X₁, X₂) (by taking a suitable marginal), (iii) (X₂, X₃), (iv) (X₃, X₁), (v) X₁, (vi) X₂ and (vii) X₃.
11. Compute P[X₂ = 2.1] and P([X₂ = 2.1] | [X₁ = 2.1]), where X₁ and X₂ are as in Problem 10. Ponder over the intuitive reason for the difference.
12. Let X, Y, Z be random variables with joint density as follows:

f_{X,Y,Z}(0,0,0) = 1/21, f_{X,Y,Z}(0,0,1) = 2/21,
f_{X,Y,Z}(0,1,1) = 1/7, f_{X,Y,Z}(2,1,0) = 4/21,
f_{X,Y,Z}(2,0,1) = 5/21, f_{X,Y,Z}(1,0,2) = 2/7.
(i) Find the joint density of X and Z. (ii) Find the marginals / * ( • ) , / y (•)>/&(■)• 13. An urn contains 3 red, 2 white and 2 blue balls. Sampling is done without replacement until no balls are left in the urn. Let X denote the trial number at which the first blue ball is selected, let Y be the trial number at which the first red ball is selected, and let Z be the trial number at which the first white ball is selected. (i) Determine the joint density of X, Y, Z. (ii) Determine the joint density of X, Y. (iii) Compute the probability that blue is the last color selected. (iv) Compute the probability that white appears before red, i.e., P[Z < Y\. 14. Prove: If X\, • • •, Xn are independent random variables, if ai, • • •, a n are constants, and if Yj = Xj + a^, 1 < j < n, then Yi, • • •, Yn are independent random variables.
2.3 Some Particular Distributions
Some particular formulae for densities arise in practice again and again. Those that appear in sample survey mathematics are singled out here. Definition. A sequence of Bernoulli trials refers to repeated plays of a game such that i) with each play there are two disjoint outcomes, S and F , of which exactly one occurs,
ii) P(S) does not vary from play to play, and iii) the outcomes of the plays are independent events. An example of a sequence of Bernoulli trials is the repeated tossing of a die where, say, S denotes the outcome that the die comes up 1 or 2 and F denotes the outcome that the die comes up 3, 4, 5 or 6. Independence of outcomes is clear, and P(S) = 1/3 for each toss of the die. Definition. Let X denote the number of times S occurs in n Bernoulli trials, and let p = P(S). Then the random variable X is said to have the binomial distribution, denoted by B(n,p). We shall sometimes write: X is B(n,p) or X ~ B(n,p). We now find the density of X when X is B(n,p).

THEOREM 1. If X is B(n,p), then

f_X(x) = C(n,x) p^x (1−p)^{n−x}  for 0 ≤ x ≤ n, and f_X(x) = 0 otherwise.

If (X_1, ···, X_r) is MN(n, p_1, ···, p_r), then

f_{X_1,···,X_r}(x_1, ···, x_r) = [n! / (x_1! ··· x_r! (n − Σ_{i=1}^r x_i)!)] p_1^{x_1} ··· p_r^{x_r} (1 − Σ_{i=1}^r p_i)^{n − Σ_{i=1}^r x_i}

for x_1 ≥ 0, ···, x_r ≥ 0, x_1 + ··· + x_r ≤ n.
Proof: The event ∩_{i=1}^r [X_i = x_i] can be written as a disjoint union of all n-tuples involving x_1 A_1's, ···, x_r A_r's. Each of these n-tuples has probability p_1^{x_1} ··· p_r^{x_r} (1 − p_1 − ··· − p_r)^{n − x_1 − ··· − x_r}.
Now how many such n-tuples do we have? There are C(n, x_1) selections of trial numbers possible for A_1 to occur. For each selection of x_1 trial numbers for A_1 to occur there are C(n − x_1, x_2) selections of trial numbers for A_2 to occur. Continuing, we see that the total number of n-tuple outcomes in which there are x_1 A_1's, ···, x_r A_r's is

∏_{i=1}^r C(n − Σ_{t<i} x_t, x_i) = n! / (x_1! ··· x_r! (n − Σ_{i=1}^r x_i)!).

Q.E.D.

Definition. If X is a random variable, its expectation, EX (or E(X)), is defined by EX = Σ_x x f_X(x).
As an example, suppose X is a random variable with density f_X(−2.3) = 1/10, f_X(0) = 1/5, f_X(1.34) = 3/10, f_X(2.79) = 2/5, f_X(x) = 0 for x ∉ {−2.3, 0, 1.34, 2.79}. Then, in accordance with the above definition, EX = (−2.3)(1/10) + 0 · (1/5) + (1.34)(3/10) + (2.79)(2/5) = 1.288. We might have random variables which are functions of one or more random variables. The following theorem gives the formulae for their expectations.
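The arithmetic of this example can be checked directly from the definition EX = Σ_x x f_X(x); a minimal sketch:

```python
# The density from the example above, as a dict mapping x to f_X(x).
density = {-2.3: 0.1, 0.0: 0.2, 1.34: 0.3, 2.79: 0.4}

# EX = sum of x * f_X(x) over the support of X.
EX = sum(x * p for x, p in density.items())
print(round(EX, 3))  # 1.288
```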
CHAPTER 3. EXPECTATION
THEOREM 1. If X and Y are random variables, if h is a function defined over the range of X, and if g is a function defined over the range of X, Y, then

Eh(X) = Σ_x h(x) f_X(x)

and

Eg(X, Y) = Σ_{x,y} g(x, y) f_{X,Y}(x, y).

Proof: Using the definition and Exercise 6 in Section 2.2, we obtain

Eh(X) = Σ_z z P[h(X) = z] = Σ_x h(x) f_X(x).
The proof of the second formula is accomplished in a similar manner. Q.E.D. As an example, let X be a random variable with density given just above the statement of Theorem 1, and let h be a function defined by h(x) = 2e^x + x². Then

Eh(X) = h(−2.3)(1/10) + h(0)(1/5) + h(1.34)(3/10) + h(2.79)(2/5)
     = (2e^{−2.3} + (−2.3)²)(1/10) + (2e^0 + 0²)(1/5) + (2e^{1.34} + (1.34)²)(3/10) + (2e^{2.79} + (2.79)²)(2/5)
     ≈ 19.918.
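Theorem 1 lets us compute Eh(X) without first finding the density of h(X); a short numerical check of the example:

```python
import math

# Same density as in the running example.
density = {-2.3: 0.1, 0.0: 0.2, 1.34: 0.3, 2.79: 0.4}

def h(x):
    return 2 * math.exp(x) + x ** 2

# Eh(X) = sum of h(x) * f_X(x), by Theorem 1.
EhX = sum(h(x) * p for x, p in density.items())
print(round(EhX, 3))
```

The printed value agrees with the 19.918 computed in the text to three decimals.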
THEOREM 2. If X and Y are random variables, and if C is a constant, then i) E(X + Y) = E(X) + E(Y), and ii) E(CX) = CE(X).
3.1. PROPERTIES OF EXPECTATION
Proof: By Theorem 1 above and by Theorem 1 of Section 2.2, if we take the function g defined by g(x, y) = x + y, we have

E(X + Y) = Σ_{x,y} (x + y) f_{X,Y}(x, y)
         = Σ_x Σ_y x f_{X,Y}(x, y) + Σ_y Σ_x y f_{X,Y}(x, y)
         = Σ_x x (Σ_y f_{X,Y}(x, y)) + Σ_y y (Σ_x f_{X,Y}(x, y))
         = Σ_x x f_X(x) + Σ_y y f_Y(y) = E(X) + E(Y).

Again, by Theorem 1, if h(x) = Cx, then

E(CX) = Σ_x Cx f_X(x) = C Σ_x x f_X(x) = CE(X).
Q.E.D. THEOREM 3. If X is B(n,p), then EX = np. Proof: By the definition of B(n,p) and expectation we have

E(X) = Σ_{x=0}^n x C(n,x) p^x (1−p)^{n−x},

and, letting k = x − 1, we have (using the binomial theorem)

E(X) = np Σ_{k=0}^{n−1} C(n−1, k) p^k (1−p)^{n−1−k} = np (p + (1−p))^{n−1} = np.

Q.E.D.
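Theorem 3 can be verified exactly, without rounding error, by summing the binomial density with rational arithmetic; a sketch:

```python
from fractions import Fraction
from math import comb

def binomial_mean(n, p):
    # E(X) = sum over x of x * C(n, x) p^x (1-p)^(n-x).
    return sum(x * comb(n, x) * p ** x * (1 - p) ** (n - x)
               for x in range(n + 1))

p = Fraction(1, 3)
print(binomial_mean(10, p))  # 10/3, i.e. np
```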
Two little examples should be mentioned now. Example 1. If X = I_A for some event A, then, because [I_A = 1] = A and [I_A = 0] = A^c, we have

E(I_A) = 1 · P(A) + 0 · P(A^c) = P(A).
Example 2. If X is uniformly distributed over {1, 2, ···, N}, then P[X = x] = 1/N for x = 1, 2, ···, N, and hence

E(X) = Σ_{x=1}^N x · (1/N) = (1/N) · N(N+1)/2 = (N+1)/2.
EXERCISES 1. If X and Y are random variables with joint density given by f_{X,Y}(0,0) = 1/36, f_{X,Y}(0,1) = 1/18, f_{X,Y}(0,2) = 1/12, f_{X,Y}(1,2) = 1/9, f_{X,Y}(2,2) = 5/36, f_{X,Y}(2,1) = 1/6, f_{X,Y}(2,0) = 7/36, and f_{X,Y}(1,0) = 2/9, compute (i) E(X), (ii) E(Y), (iii) E(X²), (iv) E(XY), (v) E(2X + Y + 1), (vi) E(^y) and (vii) E(max{X, Y}). 2. Prove: if a and b are constants, and if X and Y are random variables, then E(aX + bY) = aE(X) + bE(Y). 3. Prove: If X and Y are random variables which satisfy X(ω) ≤ Y(ω) for all ω ∈ Ω, then E(X) ≤ E(Y).
Suppose, on the contrary, that P[X = 0] < 1, so that P[X ≠ 0] > 0. Hence there exists a number x_0 ∈ range(X) such that x_0 ≠ 0 and P[X = x_0] > 0. Then

E(X²) = Σ_x x² f_X(x) ≥ x_0² f_X(x_0) > 0,

which contradicts the hypothesis that E(X²) = 0. Q.E.D.
THEOREM 1. (Schwarz's Inequality). If X and Y are random variables, then (E(XY))² ≤ E(X²)E(Y²). Proof: Note that P[(tX + Y)² ≥ 0] = 1, and hence E((tX + Y)²) ≥ 0 for all t. Thus the second degree polynomial in t,

E((tX + Y)²) = E(X²)t² + 2E(XY)t + E(Y²),

is always non-negative, i.e., it either has a double real root or no real root. In either case, b² − ac = (E(XY))² − E(X²)E(Y²) ≤ 0, in which case (E(XY))² ≤ E(X²)E(Y²). Equality holds if and only if the polynomial has one real double root, i.e., there exists a value of t, call it t_0, such that E((t_0X + Y)²) = 0. By Lemma 1, this is true if and only if P[t_0X + Y = 0] = 1. Q.E.D. An easy application of the theorem is an alternate proof of the fact that for every random variable X, (E(X))² ≤ E(X²). Indeed, let Y = 1 with probability 1. Then X = XY and E(Y²) = 1, so (E(X))² = (E(XY))² ≤ E(X²)E(Y²) = E(X²).
Definition. If X and Y are random variables, we define the covariance of X and Y by

Cov(X, Y) = E((X − E(X))(Y − E(Y))).
THEOREM 2. If X and Y are random variables, then Cov(X, Y) = E(XY) − E(X)E(Y) and Cov(X, X) = Var(X). Proof: Using properties of expectation, we have

Cov(X, Y) = E((X − E(X))(Y − E(Y)))
          = E(XY − E(X)Y − E(Y)X + E(X)E(Y))
          = E(XY) − E(X)E(Y).

The second conclusion follows from the two definitions. Q.E.D.
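The two expressions for the covariance in Theorem 2 can be compared numerically on any small joint density; a sketch (the density here is my own illustrative choice):

```python
# A small joint density f_{X,Y}; check that
# E[(X-EX)(Y-EY)] equals E[XY] - EX*EY.
f = {(0, 0): 0.25, (1, 0): 0.25, (0, 1): 0.25, (1, 2): 0.25}

EX = sum(x * p for (x, y), p in f.items())
EY = sum(y * p for (x, y), p in f.items())
cov1 = sum((x - EX) * (y - EY) * p for (x, y), p in f.items())
cov2 = sum(x * y * p for (x, y), p in f.items()) - EX * EY
print(cov1, cov2)
```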
Definition. If X and Y are non-constant random variables, then their correlation or correlation coefficient, ρ_{X,Y}, is defined by

ρ_{X,Y} = Cov(X, Y) / (s.d.(X) s.d.(Y)).
THEOREM 3. If X and Y are independent (and non-constant) random variables, then Cov(X, Y) = 0 and ρ_{X,Y} = 0. Proof: Using the definition of covariance, Theorem 4 of Section 3.2 and Theorem 2 above, we have

Cov(X, Y) = E(XY) − E(X)E(Y) = E(X)E(Y) − E(X)E(Y) = 0.

This implies ρ_{X,Y} = 0. Q.E.D. It is important to note that the converse is not necessarily true, namely, ρ_{X,Y} = 0 does not necessarily imply that X and Y are independent. There are examples (see Exercise #1) where X and Y are not independent and yet ρ_{X,Y} = 0.
3.3. COVARIANCE AND CORRELATION

THEOREM 4. If X and Y are non-constant random variables, then −1 ≤ ρ_{X,Y} ≤ 1. Further, ρ_{X,Y} = 1 if and only if Y = aX + b for some constants a > 0 and b, and ρ_{X,Y} = −1 if and only if Y = cX + d for some constants c < 0 and d. Proof: In Theorem 1 (Schwarz's inequality) replace X and Y by X − E(X) and Y − E(Y) respectively, and one immediately obtains ρ²_{X,Y} ≤ 1, or −1 ≤ ρ_{X,Y} ≤ 1. By Theorem 1, ρ²_{X,Y} = 1 if and only if there are real numbers u and v, not both zero, such that u(X − E(X)) + v(Y − E(Y)) = 0. Since by hypothesis X and Y are non-constant, this last condition implies that both u and v are non-zero. Hence we may write Y = aX + b when ρ²_{X,Y} = 1. Now

Cov(X, Y) = E((X − E(X))(Y − E(Y))) = aE((X − E(X))²) + bE(X − E(X)) = aVar(X).

Since Var(X) > 0, we may conclude (i) ρ_{X,Y} = 1 if and only if Cov(X, Y) > 0, which is true if and only if a > 0, and (ii) ρ_{X,Y} = −1 if and only if Cov(X, Y) < 0, which is true if and only if a < 0. Q.E.D.

In sample survey theory we shall need to know the formula for the correlation coefficient of two multinomially connected random variables. LEMMA 2. If X_1, ···, X_r are random variables whose joint distribution is MN(n, p_1, ···, p_r), then (X_1, X_2) are MN(n, p_1, p_2) and X_1 is B(n, p_1). Proof: In the definition of multinomial distribution in Section 2.3, we could consider the events B_0 = A_0 ∪ ∪_{j=3}^r A_j, B_1 = A_1 and B_2 = A_2. Then X_1 and X_2 are the number of times B_1 occurs in n trials and the number of times that B_2 occurs in n trials, respectively. Further B_0, B_1 and B_2 are disjoint, and one of them must occur. Thus, by definition, (X_1, X_2) are MN(n, p_1, p_2). Also, X_1 is the number of times event A_1 occurs in n trials. The definition of Bernoulli trials is satisfied, and thus X_1 is B(n, p_1). Q.E.D.

THEOREM 5. If X_1, ···, X_r are MN(n, p_1, ···, p_r) (r ≥ 2), then Cov(X_1, X_2) = −np_1p_2, and

ρ_{X_1,X_2} = −√( p_1p_2 / ((1 − p_1)(1 − p_2)) ).

Proof: By Lemma 2, the random variables X_1, X_2 are MN(n, p_1, p_2). Now X_i denotes the number of times A_i occurs in n trials, i = 1, 2. Let C_i denote the event that A_1 occurs in the ith trial, and let D_i denote the event that A_2 occurs in the ith trial, 1 ≤ i ≤ n. Hence X_1 = Σ_{i=1}^n I_{C_i} and X_2 = Σ_{i=1}^n I_{D_i}, and

E(X_1X_2) = E((Σ_{i=1}^n I_{C_i})(Σ_{j=1}^n I_{D_j})).

For each i, C_i and D_i are disjoint, so I_{C_i} I_{D_i} = 0. For the n² − n pairs {(i, j) : i ≠ j}, since C_i and D_j are independent,

E(I_{C_i} I_{D_j}) = P(C_iD_j) = P(C_i)P(D_j) = p_1p_2.

Hence E(X_1X_2) = (n² − n)p_1p_2. Also, by Lemma 2, X_i is B(n, p_i), and hence E(X_i) = np_i. Thus, by the above and Theorem 3 of Section 3.1 we have

Cov(X_1, X_2) = E(X_1X_2) − E(X_1)E(X_2) = (n² − n)p_1p_2 − np_1np_2 = −np_1p_2,

which yields the first formula. By Theorem 6 of Section 3.2, Var(X_i) = np_i(1 − p_i), i = 1, 2, and thus

ρ_{X_1,X_2} = Cov(X_1, X_2) / √(Var(X_1)Var(X_2)) = −np_1p_2 / √(np_1(1 − p_1) · np_2(1 − p_2)) = −√( p_1p_2 / ((1 − p_1)(1 − p_2)) ).

Q.E.D.
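The formula Cov(X_1, X_2) = −np_1p_2 of Theorem 5 can be verified exactly for a small n by enumerating every n-trial outcome; a sketch:

```python
from fractions import Fraction
from itertools import product

def multinomial_cov(n, p1, p2):
    # Each trial lands in cell A1 (code 0), A2 (code 1) or "other" (code 2);
    # enumerate all 3^n outcomes and accumulate the needed expectations.
    p = (p1, p2, 1 - p1 - p2)
    e1 = e2 = e12 = Fraction(0)
    for outcome in product(range(3), repeat=n):
        prob = Fraction(1)
        for c in outcome:
            prob *= p[c]
        x1, x2 = outcome.count(0), outcome.count(1)
        e1 += x1 * prob
        e2 += x2 * prob
        e12 += x1 * x2 * prob
    return e12 - e1 * e2

n, p1, p2 = 4, Fraction(1, 2), Fraction(1, 3)
print(multinomial_cov(n, p1, p2))  # equals -n*p1*p2
```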
EXERCISES

1. If X is uniformly distributed over {−2, −1, 0, 1, 2}, and if Y = X², find the joint density of X, Y (draw a graph of it), and show that X, Y are not independent.

2. In Problem 1, compute ρ_{X,Y}.

3. Let X, Y be random variables with joint density given by

f_{X,Y}(0,2) = 1/6, f_{X,Y}(5,1) = 1/3, f_{X,Y}(10,0) = 1/2.

Compute ρ_{X,Y}.

4. If X and Y are random variables whose joint density is graphed below, compute E(X), E(Y), Cov(X, Y), Var(X), Var(Y), ρ_{X,Y} and Var(X + Y).
[Figure: graph of the joint density of (X, Y) for Exercise 4; point masses 1/4, 1/6, 1/12, 1/5, 1/10 and 1/5 at points with x-coordinates between −1 and 3 and y-coordinates between 1 and 4.]
5. Let X and Y be random variables whose joint density is given by

f_{X,Y}(−5,−1) = 2/13, f_{X,Y}(−2,1) = 3/13, f_{X,Y}(1,3) = 5/13, f_{X,Y}(5,5) = 3/13.

Compute ρ_{X,Y}.

6. Determine whether the following polynomials have two distinct real zeros, one double real zero or no real zeros: (i) 2x² − 3x − 1, (ii) x² + x + 1, (iii) x² + 25x + 625/4.

7. Prove: if Z is a random variable, and if Z ≥ 0, then E(Z) ≥ 0.

8. Prove: if X and Y are random variables, then Var(X + Y) = Var(X) + Var(Y) if and only if ρ_{X,Y} = 0.
9. If X and Y are random variables, and if Var(X+Y) = Var(X) + Var(Y), does this imply that X and Y are independent?
Chapter 4

Conditional Expectation

4.1 Definition and Properties

A frequent goal in statistical inference is to determine as accurately as possible the expectation of an observable random variable. Many times all one can do is observe the random variable and declare that this is the best one can do. However, on occasion one might be able to observe the conditional expectation of the random variable, given prior information. This has a tendency to be closer to the expectation that one wishes to ascertain. The Rao-Blackwell theorem at the end of this chapter renders these remarks more precise. We shall define two kinds of conditional expectation. One conditional expectation of a random variable X, given a value y of another random variable Y, is a number which we denote by E(X|Y = y). Another conditional expectation, that of a random variable X, given a random variable Y, is a random variable E(X|Y) which assigns to each ω ∈ Ω the number E(X|Y)(ω). These two are interrelated and determine each other. Definition. If X and Y are random variables, and if y ∈ range(Y), we define E(X|Y = y), the conditional expectation of X given Y = y, by

E(X|Y = y) = Σ_x x P([X = x]|[Y = y]).
The definition above has a corollary which is essentially the theorem of total probabilities extended to conditional expectation. THEOREM 1. If X and Y are random variables, then E(X) = Σ_y E(X|Y = y) P[Y = y]. Proof: Using the identities X = Σ_x x I_[X=x] and 1 = Σ_y I_[Y=y], we have

E(X) = E((Σ_x x I_[X=x])(Σ_y I_[Y=y]))
     = Σ_y E(Σ_x x I_[X=x] I_[Y=y])
     = Σ_y (Σ_x x P([X = x][Y = y]))
     = Σ_y (Σ_x x P([X = x]|[Y = y])) P[Y = y]
     = Σ_y E(X|Y = y) P[Y = y].

Q.E.D. Loosely speaking, then, if one knows the conditional expectation of X given any particular value of Y, and if one knows the distribution (i.e., density) of Y, then one is able to compute E(X). Remark 1. If X and Y are random variables, and if g is a function defined over range(X), then

E(g(X)|Y = y) = Σ_x g(x) P([X = x]|[Y = y]).

Proof: By the definition of conditional expectation and by Theorem 4 in Section 2.2,

E(g(X)|Y = y) = Σ_z z P([g(X) = z]|[Y = y])
             = Σ_z z P([g(X) = z][Y = y]) / P[Y = y]
             = Σ_z z Σ{ P([X = x][Y = y]) / P[Y = y] : g(x) = z }
             = Σ_x g(x) P([X = x]|[Y = y]).

Q.E.D. LEMMA 1. If X and Y are random variables, and if y ∈ range(Y), then

E(X|Y = y) = (1/P[Y = y]) E(X I_[Y=y]).

Proof: By the definition of conditional probability and the above definition,

E(X|Y = y) = Σ_x x P([X = x][Y = y]) / P[Y = y]
           = (1/P[Y = y]) E(Σ_x x I_[X=x] I_[Y=y])
           = (1/P[Y = y]) E(X I_[Y=y]).

Q.E.D. Example. In the joint density pictured below, one easily obtains E(X|Y = 0) = 3/4, E(X|Y = 1) = 4/3, and E(X|Y = 2) = 19/12. (See if you can work these out intuitively.)

[Figure: graph of a joint density of (X, Y) with point masses 1/28, 2/28, 3/28, 4/28, 5/28, 6/28 and 7/28.]
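Computations like those of the example reduce to sums over a joint density table; a sketch using a hypothetical density (NOT the one in the figure, whose point positions are not reproduced here) that also checks Theorem 1:

```python
from fractions import Fraction

# A hypothetical joint density f_{X,Y} with total mass 1.
f = {(0, 0): Fraction(3, 28), (1, 0): Fraction(5, 28),
     (1, 1): Fraction(7, 28), (2, 1): Fraction(6, 28),
     (2, 2): Fraction(4, 28), (3, 2): Fraction(3, 28)}

def cond_exp_X(y):
    # E(X|Y=y) = sum of x P([X=x]|[Y=y]) over x.
    py = sum(p for (x, yy), p in f.items() if yy == y)
    return sum(x * p for (x, yy), p in f.items() if yy == y) / py

# Theorem 1: E(X) = sum over y of E(X|Y=y) P[Y=y].
EX_direct = sum(x * p for (x, y), p in f.items())
EX_total = sum(cond_exp_X(y) * sum(p for (x, yy), p in f.items() if yy == y)
               for y in {0, 1, 2})
print(EX_direct, EX_total)
```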
THEOREM 2. If X, Y and Z are random variables, and if a and b are constants, then for every z ∈ range(Z),

E(aX + bY|Z = z) = aE(X|Z = z) + bE(Y|Z = z).

Proof: By Lemma 1 and properties of expectation we have

E(aX + bY|Z = z) = (1/P[Z = z]) E((aX + bY) I_[Z=z]) = aE(X|Z = z) + bE(Y|Z = z). Q.E.D.
THEOREM 3. If X and Y are random variables, and if g is a function defined over range(X, Y), then for y ∈ range(Y),

E(g(X, Y)|Y = y) = E(g(X, y)|Y = y).
Proof: Again by Lemma 1,

E(g(X, Y)|Y = y) = (1/P[Y = y]) E(g(X, Y) I_[Y=y])
                = (1/P[Y = y]) E(g(X, y) I_[Y=y])
                = E(g(X, y)|Y = y). Q.E.D.

THEOREM 4. If X and Y are independent random variables, if g is a function defined over range(X), and if y ∈ range(Y), then

E(g(X)|Y = y) = E(g(X)).
Proof: Remark 1 and the hypothesis of independence of X and Y imply

E(g(X)|Y = y) = Σ_x g(x) P([X = x]|[Y = y]) = Σ_x g(x) P[X = x] = E(g(X)).

Q.E.D. We now define conditional expectation as a random variable. Definition. If X and Y are random variables, the conditional expectation of X given Y is defined to be the random variable

E(X|Y) = Σ_y E(X|Y = y) I_[Y=y],

where the summation is taken over all y ∈ range(Y). (Note that E(X|Y) is a function of Y.)

THEOREM 5. If X, Y and Z are random variables, and if a and b are constants, then

E(aX + bY|Z) = aE(X|Z) + bE(Y|Z).
CHAPTER 4. CONDITIONAL
EXPECTATION
Proof: By the definition and Theorem 2,
E{aX + bY\Z) = Y,E(aX +
hY Z=zZ
\
)hz=A
Z
=
£ ( a £ ( X | Z = z) + bE(Y\Z = z))I[Zzzz] z
= a J2 E(X\Z = z)I[z=z] + 6 £ E{Y\Z = z)I[z=z] Z
=
Z
aE{X\Z) + bE{Y\Z). Q.ED.
Since E(X\Y) expectation.
is a random variable, we shall be concerned about its
T H E O R E M 6. IfX andY are random variables, then E(E(X\Y)) E{X).
=
Proof: By the definitions of expectation and conditional expectation, and by Theorem 1, we have E(E(X\Y))
= E(£E(X\Y
=
y)Ipr=a)
y
= Y.E(x\Y = y)p[Y = y] = E{x). y
Q.E.D. T H E O R E M 7.I/X andY tion over range(X), then
are random variables, and if g is a func-
E(g(X)Y\X)
=
g(X)E(Y\X).
Proof: Using the definitions and the properties already proved of ex pectation and conditional expectation, we have E{g{X)Y\X)
= Y,E{9{X)Y\X
= x)I[x=x]
X
= Y,E{g{x)Y\X = x)I[x=x]
4.1. DEFINITION AND PROPERTIES -£g(x)E(Y\X
71 = x)I[x=x]
X
Y,g(X)E(Y\X
= x)I[x=x]
X
g(X)Y,E(Y\X
= x)I[x=x]
g(X)E(Y\X). Q.E.D. COROLLARY TO THEOREM 7. If X is a random variable, and if g is a function defined overrange(X), then E(g(X)\X) = g(X). Proof: This follows by taking Y = 1 and noting that E(Y\X) = 1. Q.E.D. EXERCISES 1. Prove: if X and Y are random variables, if a and b are constants, and if Y = aX + b, then E(Y\X) = Y. 2. If fx,Y(x,y) is as displayed graphically below, compute E(Y\X = 0), E(Y\X = 1), E(Y\X = 2), E(Y\X = 3) and E(Y). Y 3 11/2
ll/8
.V8
ll/24
.1/24
.1/24
11/32
,1/32
,1/32
_^IZ32
X
3. In Problem 2, compute E(^|X = 2). 4. In Problem 2, find the density of the random variable E(Y|X). 5. In Problem 2, compute E(X²|X + Y = 2). 6. Prove: if Y = c, where c is some constant, and if X is a random variable, then E(Y|X) = c. 7. Prove: if X is a random variable, then X I_[X=c] = c I_[X=c]. 8. Prove: if X is a random variable, then X = Σ_x x I_[X=x].
4.2 Conditional Variance

Loosely speaking, the conditional expectation of a random variable given another replaces more evenly spread probability masses by more concentrated point masses but leaves the expectation the same. A happy consequence is that the variance is decreased, which is something much to be desired in sample survey theory. We shall see just how that happens in this section. The notion of conditional variance is a principal tool used in multi-stage sampling, and thus what is about to unfold is of utmost importance in sample survey theory. Definition. If U, V and W are random variables, then the conditional covariance of U, V given W = w, Cov(U, V|W = w), is defined by

Cov(U, V|W = w) = E(UV|W = w) − E(U|W = w)E(V|W = w).

An equivalent definition of Cov(U, V|W = w) is given by the following theorem. THEOREM 1. If U, V, W are random variables, then

Cov(U, V|W = w) = E((U − E(U|W = w))(V − E(V|W = w))|W = w).
73
VARIANCE
Proof: By Theorem 2 in Section 4.1, the right hand side of the above equation becomes E(UV-UE(V\W
= w)
- E(U\W = w)V + E(U\W = w)E(V\W = w)\W = w) = E(UV\W = w) - E(U\W = w)E(V\W = Cov{U,V\W = w).
= w)
Q.E.D. Definition. If X and W are random variables, the conditional variance of X given W = w,Var(X\W = w), is defined by yar(X|VT = w) = Cov(X,X\W = w). T H E O R E M 2. If X and W are random variables, then Var{X\W
= w) = E{(X - E{X\W
= w)f\W
= w).
Proof: This is a direct consequence of the definition of conditional variance and of Theorem 1. Q.E.D. Corollary to Theorem 2. If X and W are random variables, then Var(X\W
=
w)>0.
Proof: By Theorem 2, using the definition of conditional expectation and the facts that I[w=w] = I\w=w] a n d E(Y2) > 0 for any random variable F , we have Var(X\W
= w)
= E{(X - E(X\W
= w))2\W = w)
Q.E.D. Conditional variance has much the same properties as does variance, plus: any function of the conditioning random variable behaves very much like a constant.
T H E O R E M 3. If X and W are random variables, and if c is a con stant, then Var(c + X\W = w) = Var(X\W = w) and Var(cX\W = w) = c2Var(X\W = w) . Proof: Since E(c + X\W = w) = c + E{X\W = w), we apply The orem 2 to obtain the first conclusion. Also, since E(cX\W = w) = cE(X\W = «i), we again apply Theorem 2 to obtain the second equa tion. Q.E.D. T H E O R E M 4. If X,Y and W are random variables, and if X is any function of W, say, X = f(W), then Var(X + Y\W = w) = Var(Y\W = w), and Var(XY\W = w) = {f{w))2Var(Y\W = w). Proof: By Theorem 3 in Section 4.1, E(f(W)
+ =
Y\W = w) = E(f(w) + Y\W - w) f(w) + E(Y\W = w),
and E{{f(W)
+ Y-E(f(W)
+ Y\W =
=
E((Y - E(Y\W
=
Var(Y\W
tv))2\W=:w)
= w))\W = w)
= w).
Also E(f(W)Y\W and Var(f(W)Y\W
= w) =
f(w)E(Y\W
= w),
= w) = E{(f(W)Y - f(w)E(Y\W = (f(w))2Var(Y\W = w).
= w))2\W = w)
Q.E.D. A result to be used frequently in multi-stage sampling is the follow ing. T H E O R E M 4A. If X and Y are random variables, and if f(x,y) any function of two variables then Var(f(X,Y)\Y
= y) = Var(f(X,y)\Y
= y)
is
4.2. CONDITIONAL
VARIANCE
75
for all y G rangeY. Proof: Using the definition of conditional variance and Theorem 3 of Section 4.1, we have Var(f(X,Y)\Y
= E{{f{X,Y))2\Y
= y)
= =
= y)-{E(f(X,Y)\Y
E({f{X,y))2\Y Var{f(X,y)\Y
= y) - (E(f(X,y)\Y = y)
for all y G rangeY.
= y))2 = y)f
Q.E.D.
Definition. If U, V and H are random variables, then the conditional covariance of U and V, given H, is a random variable defined by

Cov(U, V|H) = E(UV|H) − E(U|H)E(V|H).

An immediate corollary of this definition is the following result. THEOREM 5. If U, V and H are random variables, then

Cov(U, V|H) = Σ_h Cov(U, V|H = h) I_[H=h].

Proof: If h′ ≠ h″, then easily I_[H=h′] I_[H=h″] = 0. Thus,

E(U|H)E(V|H) = (Σ_{h′} E(U|H = h′) I_[H=h′]) (Σ_{h″} E(V|H = h″) I_[H=h″])
            = Σ_h E(U|H = h)E(V|H = h) I_[H=h].

Also by the definition,

E(UV|H) = Σ_h E(UV|H = h) I_[H=h].

Thus, by the definition above,

Cov(U, V|H) = Σ_h (E(UV|H = h) − E(U|H = h)E(V|H = h)) I_[H=h] = Σ_h Cov(U, V|H = h) I_[H=h].

Q.E.D.
Analogous to Theorem 1 is the following result for conditional covariance given a random variable. THEOREM 6. If U, V and H are random variables, then

Cov(U, V|H) = E((U − E(U|H))(V − E(V|H))|H).

Proof: Remembering that E(X|Y) is a function of Y, then by Theorems 5 and 7 of Section 4.1 we have

E((U − E(U|H))(V − E(V|H))|H)
= E(UV − E(U|H)V − E(V|H)U + E(U|H)E(V|H)|H)
= E(UV|H) − E(E(U|H)V|H) − E(E(V|H)U|H) + E(E(U|H)E(V|H)|H)
= E(UV|H) − E(U|H)E(V|H) − E(V|H)E(U|H) + E(U|H)E(V|H)
= E(UV|H) − E(U|H)E(V|H) = Cov(U, V|H). Q.E.D.

The fundamental theorem of this section is the following. THEOREM 7. If U, V and H are random variables, then

Cov(U, V) = E(Cov(U, V|H)) + Cov(E(U|H), E(V|H)).

Proof: By Theorem 6 of Section 4.1, E(UV) = E(E(UV|H)), E(U) = E(E(U|H)) and E(V) = E(E(V|H)). Thus

Cov(U, V) = E(UV) − E(U)E(V)
         = E(E(UV|H)) − E(E(U|H))E(E(V|H))
         = E(E(UV|H)) − E(E(U|H)E(V|H)) + E(E(U|H)E(V|H)) − E(E(U|H))E(E(V|H))
         = E(Cov(U, V|H)) + Cov(E(U|H), E(V|H)). Q.E.D.
Definition. If X and H are random variables, then the conditional variance of X given H is the random variable defined by Var(X|H) = Cov(X, X|H). THEOREM 8. If X and H are random variables, then

(i) Var(X|H) = Σ_h Var(X|H = h) I_[H=h],
(ii) Var(X|H) = E((X − E(X|H))²|H), and
(iii) Var(X) = E(Var(X|H)) + Var(E(X|H)).

Proof: These three results are special cases of Theorems 5, 6 and 7. Q.E.D. Conclusion (iii) in Theorem 8 is applied again and again in multi-stage methods in sample survey theory. The following theorem should be given here since it is an immediate corollary to Theorem 8 and is widely used in mathematical statistics. THEOREM 9. (Rao-Blackwell Theorem). If X and Y are random variables, then Var(X) ≥ Var(E(X|Y)). Proof: Since Var(X|Y = y) ≥ 0, it follows that Var(X|Y) ≥ 0 and thus E(Var(X|Y)) ≥ 0. The conclusion follows now from Theorem 8. Q.E.D. The following extension of Theorem 3 is a standard tool used in sample survey theory. THEOREM 10. If X, Y
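Conclusion (iii) of Theorem 8 and the Rao-Blackwell inequality can be verified by direct computation on a small joint density; a sketch (the density is my own illustrative choice):

```python
from fractions import Fraction

# A small joint density of (X, H).
f = {(1, 0): Fraction(1, 6), (2, 0): Fraction(1, 3),
     (4, 1): Fraction(1, 4), (8, 1): Fraction(1, 4)}

def cond(h0):
    # Returns P[H=h0], E(X|H=h0) and Var(X|H=h0).
    ph = sum(p for (x, h), p in f.items() if h == h0)
    m = sum(x * p for (x, h), p in f.items() if h == h0) / ph
    v = sum((x - m) ** 2 * p for (x, h), p in f.items() if h == h0) / ph
    return ph, m, v

EX = sum(x * p for (x, h), p in f.items())
VarX = sum((x - EX) ** 2 * p for (x, h), p in f.items())
parts = [cond(h) for h in {0, 1}]
E_Var = sum(ph * v for ph, m, v in parts)          # E(Var(X|H))
mean_of_means = sum(ph * m for ph, m, v in parts)  # E(E(X|H)) = E(X)
Var_E = sum(ph * (m - mean_of_means) ** 2 for ph, m, v in parts)

print(VarX, E_Var + Var_E)  # equal, by Theorem 8 (iii)
```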
and Z are random variables, and if X is a function of Z, then

Var(X + Y|Z) = Var(Y|Z)  and  Var(XY|Z) = X²Var(Y|Z).

Proof: Suppose X = f(Z). Then by Theorems 4 and 8,

Var(X + Y|Z) = Σ_z Var(f(Z) + Y|Z = z) I_[Z=z] = Σ_z Var(Y|Z = z) I_[Z=z] = Var(Y|Z),

and

Var(XY|Z) = Σ_z Var(f(Z)Y|Z = z) I_[Z=z] = Σ_z (f(z))² Var(Y|Z = z) I_[Z=z] = X²Var(Y|Z).

Q.E.D. THEOREM 11. If U_1, ···, U_n and Z are random variables, and if U_1, ···, U_n are conditionally independent given Z, then

Var(U_1 + ··· + U_n|Z = z) = Σ_{i=1}^n Var(U_i|Z = z)
for all z ∈ range(Z), and

Var(Σ_{i=1}^n U_i | Z) = Σ_{i=1}^n Var(U_i|Z).
Proof: We first note that for n ≥ 2, if i ≠ j, then also U_i and U_j are conditionally independent given Z. Thus

E(U_iU_j|Z = z) = Σ_{u,v} uv P([U_i = u][U_j = v]|[Z = z])
              = Σ_{u,v} uv P([U_i = u]|[Z = z]) P([U_j = v]|[Z = z])
              = (Σ_u u P([U_i = u]|[Z = z])) (Σ_v v P([U_j = v]|[Z = z]))
              = E(U_i|Z = z) E(U_j|Z = z).

Using this we obtain, for i ≠ j, Cov(U_i, U_j|Z = z) = 0. Thus

Var(U_1 + ··· + U_n|Z = z) = Σ_{i=1}^n Var(U_i|Z = z) + Σ_{i≠j} Cov(U_i, U_j|Z = z) = Σ_{i=1}^n Var(U_i|Z = z),

thus establishing the first conclusion. The second conclusion follows from the first by multiplying both sides by I_[Z=z] and summing over all z ∈ range(Z). Q.E.D. EXERCISES 1. Prove: If U, V and W are random variables, and if W is a function of V, then E(U|W) = E(E(U|V)|W).
2. Let X, Y be random variables whose joint density is given by the graph below. Compute (i) Var(E(X|Y)), (ii) the density of Var(X|Y) and (iii) E(Var(X|Y)).
[Figure: graph of a joint density of (X, Y) placing mass 1/7 at each of seven points, with x-coordinates between −1 and 3 and y-coordinates between 1 and 4.]
3. Prove: If X and Y are random variables, then

(E(X|Y))² = Σ_y (E(X|Y = y))² I_[Y=y].

4. Prove: If X, Y and H are random variables, and if X and Y are conditionally independent given H, then E(XY|H) = E(X|H)E(Y|H). 5. Prove: If X, Y and Z are random variables, and if {X, Y} and Z are independent, then E(X|Y, Z) = E(X|Y). 6. Let X and Y be random variables with joint density given by

f_{X,Y}(3,6) = f_{X,Y}(4,6) = f_{X,Y}(5,6) = f_{X,Y}(−2,0) = f_{X,Y}(−4,0) = f_{X,Y}(−6,0) = 1/6.

(i) Compute Var(X|Y = 6) and Var(X|Y = 0). (ii) Find the density of Var(X|Y). (iii) Compute E(X|Y = 6) and E(X|Y = 0). (iv) Find the density of E(X|Y). (v) Compute E(Var(X|Y)) and Var(E(X|Y)). (vi) Verify for this example that

Var(X) = Var(E(X|Y)) + E(Var(X|Y)).
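Exercise 6 can be checked exactly with rational arithmetic; a sketch of part (vi):

```python
from fractions import Fraction

# Exercise 6's density: mass 1/6 at each listed point (x, y).
f = {(3, 6): Fraction(1, 6), (4, 6): Fraction(1, 6), (5, 6): Fraction(1, 6),
     (-2, 0): Fraction(1, 6), (-4, 0): Fraction(1, 6), (-6, 0): Fraction(1, 6)}

def cond_mean_var(y0):
    py = sum(p for (x, y), p in f.items() if y == y0)
    m = sum(x * p for (x, y), p in f.items() if y == y0) / py
    v = sum((x - m) ** 2 * p for (x, y), p in f.items() if y == y0) / py
    return py, m, v

EX = sum(x * p for (x, y), p in f.items())
VarX = sum((x - EX) ** 2 * p for (x, y), p in f.items())
parts = [cond_mean_var(y) for y in (6, 0)]
E_Var = sum(py * v for py, m, v in parts)
EE = sum(py * m for py, m, v in parts)
Var_E = sum(py * (m - EE) ** 2 for py, m, v in parts)

print(VarX == Var_E + E_Var)  # True: part (vi) of Exercise 6
```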
Chapter 5

Limit Theorems

5.1 The Law of Large Numbers
Two limit theorems, known as the law of large numbers and the central limit theorem, occupy key positions in statistical inference. The law of large numbers provides a method of estimating certain unknown constants. The central limit theorem, among its many uses, gives us a means of determining how accurate these estimates are. This section is devoted to a most accessible law of large numbers. LEMMA 1. (Chebishev's Inequality). If X is a random variable, then for every ε > 0,

P([|X − E(X)| ≥ ε]) ≤ Var(X)/ε².

Equivalently, for a random variable Z and every ε > 0,

P([|Z − E(Z)| < ε]) ≥ 1 − Var(Z)/ε².

If we replace ε by √(Var(Z)/t), we obtain

P([|Z − E(Z)| < √(Var(Z)/t)]) ≥ 1 − t.

If we let t = 0.01, then P([|Z − E(Z)| < 10√Var(Z)]) ≥ .99, which gives us a larger bound to the error of estimating E(Z) by Z, namely 10√Var(Z). In any case, we see that if we obtain an unbiased estimate Z for some unknown constant, the smaller the variance, the smaller the error. If we wish to be right in stating the maximum error of estimates in at least 99 out of every 100 cases we would state that the error is less than 10√Var(Z), provided we know the value Var(Z). In practice, we should prefer to use 2.57√Var(Z) for rather good theoretical reasons.
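Chebishev's bound is usually very conservative, as a quick simulation shows; a sketch (Z is the mean of ten fair-die rolls, a choice of my own):

```python
import random

# Empirical frequency of |Z - E(Z)| >= eps never exceeds Var(Z)/eps^2.
random.seed(0)

def z():
    return sum(random.randint(1, 6) for _ in range(10)) / 10

EZ = 3.5
VarZ = (35 / 12) / 10  # variance of the mean of 10 die rolls
eps = 1.0
trials = 20000
hits = sum(abs(z() - EZ) >= eps for _ in range(trials))
print(hits / trials, VarZ / eps ** 2)
```

The observed frequency is far below the bound Var(Z)/ε², which is what makes the 2.57√Var(Z) rule (justified later by the central limit theorem) preferable in practice.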
6.3. ESTIMATION OF SAMPLING ERRORS
Thus it becomes necessary to be able to estimate the variance of an unbiased estimate if we wish to estimate the maximum error in using it. THEOREM 1. In WR sampling, if y_1, ···, y_n is a sample of size n, and if s²_y is defined by

s²_y = (1/(n−1)) Σ_{j=1}^n (y_j − ȳ)²,

then s²_y is an unbiased estimate of Var(y_1). Proof: We may write

s²_y = (1/(n−1)) { Σ_{j=1}^n y_j² − n ȳ² }.

Now E(y_j) = E(y_1) and E(y_j²) = E(y_1²) for 1 ≤ j ≤ n. Since y_1, ···, y_n are independent, then

E(Σ_{j=1}^n y_j²) = nE(y_1²),

and

E(ȳ²) = (1/n²) E( Σ_j y_j² + Σ_{j≠k} y_j y_k ) = (1/n²) { nE(y_1²) + n(n−1)(E(y_1))² }.

Thus, with a little algebra we obtain E(s²_y) = Var(y_1). Q.E.D.

COROLLARY 1. In WR sampling, an unbiased estimate of the variance of the unbiased estimate Ŷ = Nȳ is (N²/n) s²_y. Proof: In WR sampling, Var(Nȳ) = N²Var(y_1)/n. By Theorem 1, E(s²_y) = Var(y_1). Thus N²s²_y/n is an unbiased estimate of Var(Nȳ). Q.E.D.
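Theorem 1 can be confirmed exactly for a tiny population by enumerating every equally likely ordered WR sample; a sketch:

```python
from fractions import Fraction
from itertools import product

# Exact check that E(s_y^2) = Var(y_1) under WR sampling.
Y = [Fraction(v) for v in (1, 3, 8)]
N, n = len(Y), 2

mu = sum(Y) / N
var1 = sum((v - mu) ** 2 for v in Y) / N  # Var(y_1) in WR sampling

E_s2 = Fraction(0)
for sample in product(Y, repeat=n):
    ybar = sum(sample) / n
    s2 = sum((v - ybar) ** 2 for v in sample) / (n - 1)
    E_s2 += s2 * Fraction(1, N ** n)

print(E_s2 == var1)  # True
```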
CHAPTER 6. SIMPLE RANDOM SAMPLING
The situation for WOR is a bit more complicated. THEOREM 2. In WOR sampling, if s²_y = (1/(n−1)) Σ_{j=1}^n (y_j − ȳ)², then E(s²_y) = S²_y, where S²_y = (1/(N−1)) Σ_{i=1}^N (Y_i − Ȳ)². Proof: We first observe that (n−1)s²_y = Σ_{j=1}^n y_j² − nȳ². Since Var(ȳ) = E(ȳ²) − (E(ȳ))², we use Theorem 4 in Section 2 to obtain E(ȳ²) = Var(ȳ) + (E(ȳ))² = (1/n)(1 − n/N)S²_y + Ȳ². We also have

E(Σ_{j=1}^n y_j²) = (n/N) Σ_{i=1}^N Y_i² = (n/N)((N−1)S²_y + NȲ²).

Using these two equations we have

(n−1)E(s²_y) = E(Σ_{j=1}^n y_j²) − nE(ȳ²)
            = (n/N)((N−1)S²_y + NȲ²) − (1 − n/N)S²_y − nȲ²
            = (n(N−1)/N − (1 − n/N)) S²_y
            = (n−1)S²_y.

This yields E(s²_y) = S²_y. Q.E.D.
is Var(y) Var{y) = (1 —
n/N)s2/n.
Proof: By Theorem 2 above and Theorem 4 of Section 2,
J 7 " ' ■ - T -J'K) *{£('-*)• Proof: Again by Theorem 1 of Section 2, x 7risisan anunbiased unbiased estimate estimateofof Y = it. By Theorem 4 of Section 2,
v^.).I(,_i)5;. v**)-1 (i-£)sj. But
3 = ^{p?-***} 1 =
< > I • — IS I 7
N ~ Ifei ' J ( i V , r = ±1(N*-N**) = JFTT "^ = = j \j£-ir{l-ir), r r r ( 1 - *),
and thus
r-W-wS)*1-')-
By Theorem 2 of Section 3, E(s\) £(*») = 5J, SJ, and thus an unbiased estimate of Var(it) Var{v) is
**>-io-sx. «*>-i > for 11
— —
„o/K. V.
fc^). Note that for Proof: Let us compute £ ( ,!==* „ • • •fc,-.^ , » , ■ _ , = *,-_,). 1 < iz < <j —1 E(yt\ui = h,---,
u,-_i = *,-_!) = Ffci.
Also, for »rf i £ {{Jfci,...,*,-.!} *!,•••,*,•_!}
P([Uj P([Uj = =i ] »]!«! |Ul = = *!,-..,U,-_ *x, • • -,«,•_! = *,-_!) = Xi/ = X,7 ( X -( X ^X - f£c r jX*, . J . X = *,-_!) Thus we obtain
E{tj\ = *!,...,«,-_! *!,...,«,-_!==Vi) v o==n,n,++• •• ••+• v+ ,v +. + E{tj\Ul Ul =
EE ^ ^ = Y=-
F
-
130
CHAPTER
7. UNEQUAL PROBABILITY
SAMPLING
Hence E(tj) = Y. This proves that each tj is unbiased. Easily,
E(ij) = E((t1 + --- + tj)/j) = Y Q.E.D. The statistic tn is what we intend to use as an estimate of Y. The first question that arises is whether this estimate is "better" than the es timate Y obtained in Section 7.2 where the samphng was done WR. By the definition of Y in Section 7.2 it follows that Var(Y) = Var(ti)/n. Thus we shall have proved in to be a "better" estimate than Y when we have proved that- Var(in) < Var(ti)/n. T H E O R E M 2. If {U, 1 < i < n} are as defined above and then U and tj have correlation zero for i ^ j , and Var(tn)
2,
-Varfa).
Proof: Let 2 < j < n, let &i, • • •, kj be j distinct integers in { 1 , • • • , N}. By what was proved in the proof of Theorem 1, E(tj\ui = A?i, • • •, Uj-i = %_i) = Y for 2 < j < n. Thus the random variable ^(fi-i = I H J / ^ . I ^ J
£
YI
ct:>^] = Y'
a constant. We use this fact now to show that if 1 < i < j < n, then Cov(ti,tj) = 0. Indeed, by properties of conditional expectation established in Chapter 4, since U is a function of Ui, • • •, u,-, and since i < j , we have E{Utj)
=
£(£(W>ir.., t^))
= EiUEitjluu-.tUj-t)) = E(tiY) = YE(ti)^Y29
7.3. WO R PROBABILITY
PROPORTIONAL
TO ...
131
this last equality by Theorem 1. Since E(t{) = E(tj) = y , then Cov(th ts) = E(Utj) - E(ti)E(tj) = y 2 - y 2 = 0, i.e., the covariance of U and tj is zero. We next show that for 2 < i < n, Var(i t ) < Var(t\). We first recall the theorem proved in Chapter 4: if U is a random variable, and if H is any vector random variable, then Var(U) = E(Var(U\H))
+
Var(E(U\H)).
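The decomposition just recalled is easy to verify numerically. A minimal Python sketch, using a made-up four-point joint distribution for (H, U), computes Var(U), E(Var(U|H)), and Var(E(U|H)) exactly:

```python
# Joint distribution of (H, U) on four points (made-up illustrative numbers).
p = {(0, 1): 0.2, (0, 3): 0.2, (1, 2): 0.3, (1, 6): 0.3}

def E(f):
    """Expectation of f(H, U) under the joint distribution p."""
    return sum(prob * f(h, u) for (h, u), prob in p.items())

var_U = E(lambda h, u: u ** 2) - E(lambda h, u: u) ** 2

def cond(hval):
    """Return P(H = hval), E(U | H = hval), Var(U | H = hval)."""
    ph = sum(prob for (h, _), prob in p.items() if h == hval)
    mean = sum(prob * u for (h, u), prob in p.items() if h == hval) / ph
    second = sum(prob * u ** 2 for (h, u), prob in p.items() if h == hval) / ph
    return ph, mean, second - mean ** 2

# E(Var(U|H)) and Var(E(U|H)), each computed from the conditional pieces.
e_var = sum(ph * v for ph, m, v in (cond(h) for h in (0, 1)))
var_e = E(lambda h, u: cond(h)[1] ** 2) - E(lambda h, u: cond(h)[1]) ** 2

print(var_U, e_var + var_e)  # the two sides of the decomposition agree
```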
To apply this result here, let U = t_i and let H be the random vector whose coordinates are u_1, ..., u_{i-1}. As shown earlier in this proof, E(t_i | u_1, ..., u_{i-1}) = Y, which is a constant random variable, and thus Var(E(t_i | u_1, ..., u_{i-1})) = 0. This implies by the above-recalled result that

Var(t_i) = E(Var(t_i | u_1, ..., u_{i-1}))
         = E( Σ_{k_1,...,k_{i-1}} { E(t_i² | u_1 = k_1, ..., u_{i-1} = k_{i-1})
             − (E(t_i | u_1 = k_1, ..., u_{i-1} = k_{i-1}))² } I(∩_{r=1}^{i-1} [u_r = k_r]) ).

For fixed k_1, ..., k_{i-1}, the expression inside the curly brackets, {·}, is the variance of y_i/p_i* when sampling WOR and with probability proportional to size after units U_{k_1}, ..., U_{k_{i-1}} have been removed. By Theorem 3 in Section 7.2,
â and b̂ are defined, and thus are themselves random variables. They are called least squares solutions for a and b.

EXERCISES

1. The equation y = a + bx considered in this section is called the regression line of y on x. Find the least squares estimates of the constants in the regression line of x on y.

2. In our least squares treatment of x, y, it might be more reasonable to assume that y is a second degree polynomial function of x, i.e., y = a + bx + cx². If so, find the least squares estimates â, b̂, ĉ of a, b, c.

3. Prove: if c_1, ..., c_n are numbers, and if c̄ = (c_1 + ... + c_n)/n, then Σ_{i=1}^n (c_i − c̄) = 0.

4. Prove: if X is a random variable, then the value of m that minimizes E((X − m)²) is m = E(X).
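A short Python sketch may help make the least squares solutions concrete. It uses made-up data and the standard closed forms b̂ = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and â = ȳ − b̂x̄, the usual solutions of the normal equations, which the derivation in this section is assumed to match:

```python
# Least squares solutions for a and b in y = a + bx, computed from a sample.
# The data points are made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# b_hat = sum (x_i - x_bar)(y_i - y_bar) / sum (x_i - x_bar)^2,
# a_hat = y_bar - b_hat * x_bar.
b_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
a_hat = y_bar - b_hat * x_bar

print(a_hat, b_hat)  # for this data: intercept near 0.05, slope near 1.99
```

Exercise 1 amounts to swapping the roles of xs and ys in the same two formulas.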
8.2  Ratio Estimation
In this and the subsequent sections we shall concern ourselves with special cases of the linear regression model y = a + bx plus a small error, as considered in Section 1. In this section we shall deal with the case a = 0. When this situation holds, y = bx, or Y_i = bX_i plus a small error. This is just the situation where probability proportional to size sampling is so effective. Another effective method here, if one chooses to do simple random sampling either WR or WOR, is the ratio estimate, which is the subject of this section. As before we consider the population (U; x, y), with x as predictor. A sample of size n is taken yielding observations (x_1, y_1), ..., (x_n, y_n), and the problem is to estimate Y.
8.2. RATIO ESTIMATION    139
Definition: The ratio estimate Ŷ of Y is defined to be

Ŷ = (ȳ/x̄) X.
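A numerical sketch of this definition may be helpful. The Python code below uses a made-up six-unit population with y roughly proportional to x, draws simple random samples WOR, and averages the ratio estimate over many samples; as the discussion that follows cautions, the average lands near Y but need not equal it exactly, since the estimate is a quotient of random variables.

```python
import random

# Hypothetical population (U; x, y) with known x-total X; made-up data in
# which y is roughly proportional to x, the case the ratio estimate targets.
X_pop = [8.0, 15.0, 22.0, 30.0, 45.0, 60.0]
Y_pop = [9.0, 14.0, 24.0, 29.0, 47.0, 58.0]
X_total = sum(X_pop)
Y_total = sum(Y_pop)

def ratio_estimate(n, rng):
    """Draw a simple random sample WOR of size n and return
    Y_hat = (y_bar / x_bar) * X."""
    idx = rng.sample(range(len(X_pop)), n)
    x_bar = sum(X_pop[i] for i in idx) / n
    y_bar = sum(Y_pop[i] for i in idx) / n
    return (y_bar / x_bar) * X_total

rng = random.Random(7)
est = [ratio_estimate(3, rng) for _ in range(50_000)]

print(sum(est) / len(est), Y_total)
# The average sits near Y_total but need not match it exactly: the
# expectation of a quotient is not the quotient of expectations.
```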
Roughly speaking, ȳ estimates Ȳ, x̄ estimates X̄, so ȳ/x̄ estimates Ȳ/X̄ = Y/X, and hence the ratio estimate should estimate Y. A word of caution is in order here. It is not necessarily true that the expectation of a quotient of two random variables is equal to the quotient of the expectations. Hence the ratio estimate Ŷ of Y is not necessarily unbiased. In Theorem 2 we shall give conditions under which the ratio estimate is unbiased.

The literature on ratio estimates appears to some as unsatisfying. The results are arrived at without sufficient rigor. Also, the approximations are found to be wanting. In Theorem 1 below we shall show that if the assumption of the model that y/x is uniformly close to b is true, then the ratio estimate Ŷ of Y attains remarkable precision and has very small variance. This theorem is not of much practical use, but an understanding of it should give the student the courage to use ratio estimates when the model above applies. But first we need a lemma.

LEMMA 1. If V is a random variable satisfying P[a ≤ V ≤ b] = 1, then Var(V)