Probability and Statistics
Probability and Statistics
Ronald Deep
AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Academic Press is an imprint of Elsevier
Acquisitions Editor: Tom Singer
Project Manager: Brandy Lilly
Marketing Manager: Francine Ribeau
Cover Design: Eric Decicco
Composition: SNP Best-set Typesetter Ltd., Hong Kong
Cover Printer: Phoenix Color Corp
Interior Printer: The Maple-Vail Book Manufacturing Group
Academic Press is an imprint of Elsevier
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA
525 B Street, Suite 1900, San Diego, California 92101-4495, USA
84 Theobald's Road, London WC1X 8RR, UK

This book is printed on acid-free paper.

Copyright © 2006, Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: [email protected]. You may also complete your request on-line via the Elsevier homepage (http://elsevier.com), by selecting "Customer Support" and then "Obtaining Permissions."

Library of Congress Cataloging-in-Publication Data

Deep, Ronald.
  Probability and statistics / Ronald Deep.
    p. cm.
  Includes bibliographical references and index.
  ISBN 0-12-369463-9 (alk. paper)
  1. Probabilities—Computer simulation. 2. Mathematical statistics—Computer simulation. I. Title.
  QA273.19.E4D44 2006
  519.2—dc22
  2005053028

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.

ISBN-13: 978-0-12-369463-8
ISBN-10: 0-12-369463-9

For all information on all Elsevier Academic Press publications visit our Web site at www.books.elsevier.com

Printed in the United States of America
05 06 07 08 09 10   9 8 7 6 5 4 3 2 1
Working together to grow libraries in developing countries www.elsevier.com | www.bookaid.org | www.sabre.org
The book is dedicated to
Carolyn
Gregory
Geoff, Abby, Thomas, John, and Samuel
Brian, Michelle, and Ethan
Brent, Katharine, and Joseph
Contents
Preface
Acknowledgments

1 INTRODUCTION TO PROBABILITY
  1.0 Introduction
  1.1 Interpretations of Probability
      Objectivists
      Classical (a priori)
      Empirical or Relative Frequency (a posteriori)
      Mathematical or Axiomatic
  1.2 Sets
      Set Algebra
  1.3 Probability Parlance
  1.4 Probability Theorems
  1.5 Conditional Probability and Independence
  1.6 Bayes's Rule
  1.7 Counting the Ways
      Two Fundamental Principles of Counting (FPC) and the Pigeonhole Principle
      Tree Diagrams
      Permutations
      Combinations
      Match Problem Revisited
  1.8 Summary
      Problems
      Miscellaneous
      Software Exercises
      Self Quiz 1A: Conditional Probability
      Self Quiz 1B: Poker Probability

2 RANDOM VARIABLES, MOMENTS, AND DISTRIBUTIONS
  2.0 Introduction
  2.1 Random Variables
  2.2 Distributions
  2.3 Moments
      Information Content (Entropy)
      Higher Moments
  2.4 Standardized Random Variables
  2.5 Jointly Distributed Random Variables
      Discrete Joint Density Functions
  2.6 Independence of Jointly Distributed Random Variables
  2.7 Covariance and Correlation
  2.8 Conditional Density Functions
  2.9 Moment Generating Functions
  2.10 Transformation of Variables
      Transformation of 2 or More Variables
  2.11 Summary
      Problems
      Review
      Paradoxes
      Software Exercises
      Self Quiz 2: Moments

3 SPECIAL DISCRETE DISTRIBUTIONS
  3.0 Introduction
  3.1 Discrete Uniform
  3.2 Bernoulli Distribution
  3.3 Binomial Distribution
  3.4 Multinomial Distribution
  3.5 Hypergeometric Distribution
  3.6 Geometric Distribution
  3.7 Negative Binomial Distribution
  3.8 Poisson Distribution
  3.9 Summary
      Problems
      Review
      Software Exercises
      Self Quiz 3: Discrete Distributions

4 SPECIAL CONTINUOUS DISTRIBUTIONS
  4.0 Introduction
  4.1 Continuous Uniform Distribution
  4.2 Gamma Function
  4.3 Gamma Family (Exponential, Chi-Square, Gamma)
  4.4 Exponential Distribution
  4.5 Chi-Square Distribution
  4.6 Normal Distribution
  4.7 Student t Distribution
  4.8 Beta Distribution
  4.9 Weibull Distribution
  4.10 F Distribution
  4.11 Summary
      Problems
      Miscellaneous
      Review
      Software Exercises
      Self Quiz 4: Continuous Distributions

5 SAMPLING, DATA DISPLAYS, MEASURES OF CENTRAL TENDENCIES, MEASURES OF DISPERSION, AND SIMULATION
  5.0 Introduction
  5.1 Data Displays
      Boxplots
      Frequency Distributions and Histograms
  5.2 Measures of Location
      Mean
      Median
      Mode
      Trimmed Mean
      Robustness
  5.3 Measures of Dispersion
      Sample Variance and Standard Deviation
      Interquartile Range (IQR)
      Median Absolute Deviation from the Median (MAD)
      Outliers
      Coefficient of Variation
      Skewness
      Kurtosis
  5.4 Joint Distribution of X̄ and S²
  5.5 Simulation of Random Variables
      Rejection Method
  5.6 Using Monte Carlo for Integration
  5.7 Order Statistics
  5.8 Summary
      Problems
      Software Exercises
      Self Quiz 5: Sampling and Data Displays

6 POINT AND INTERVAL ESTIMATION
  6.0 Introduction
  6.1 Unbiased Estimators and Point Estimates
      Cramér-Rao Inequality
  6.2 Methods of Finding Point Estimates
      Method of Moments Estimators (MME)
      Maximum Likelihood Estimators (MLE)
  6.3 Interval Estimates (Confidence Intervals)
      Trade-Off: Sample Size
      Confidence Interval When σ Is Not Known
      Confidence Interval for the Difference between Two Means (μ1 − μ2)
      Confidence Interval for σ² of a Normal Distribution
      Confidence Interval for a Proportion
      Confidence Interval for the Difference between Two Proportions
      Confidence Interval for the Paired T-Test
      Confidence Intervals for Ratio of Variances σ2²/σ1²
  6.4 Prediction Intervals
  6.5 Central Limit Theorem (Revisited)
  6.6 Parametric Bootstrap Estimation
  6.7 Summary
      Problems
      Confidence Intervals
      Miscellaneous
      Software Exercises
      Self Quiz 6: Estimation and Confidence Intervals

7 HYPOTHESIS TESTING
  7.0 Introduction
  7.1 Terminology in Statistical Tests of Hypotheses
  7.2 Hypothesis Tests: Means
      P-value
      Directional Tests
  7.3 Hypothesis Tests: Proportions
      Fisher-Irwin Test
  7.4 Hypothesis Tests for Difference between Two Means: Small Samples (n ≤ 30)
      σ² Known
      n < 30; σ² Unknown
  7.5 Hypothesis Test with Paired Samples
      Paired vs. Unpaired
      Statistically Significant vs. Practically Significant
  7.6 Hypothesis Tests: Variances
      Hypothesis Tests for the Equality of Two Variances
  7.7 Hypothesis Tests for Independence, Homogeneity, and Goodness of Fit
      R × C Contingency Tables Test for Homogeneity and Independence
      Goodness of Fit
      Probability Plots
  7.8 Summary
      Problems
      Miscellaneous
      Software Exercises
      Self Test 7: Hypothesis Testing

8 REGRESSION
  8.0 Introduction
  8.1 Review of Joint and Conditional Densities
  8.2 Simple Linear Regression
      Least Squares Estimation
      Other Models of Simple Linear Regression
  8.3 Distribution of Estimators with Inference on Parameters
      Distribution of RV E
      Distribution of RV Yi
      Distribution of RV B
      Inference on the Slope β
      Distribution of RV A
      Inference on the Intercept α
      Distribution of RV Ŷ
  8.4 Variation
      Coefficient of Determination
  8.5 Residual Analysis
      Lack of Fit F-Test
  8.6 Convertible Nonlinear Forms for Linear Regression
  8.7 Polynomial Regression
  8.8 Multiple Linear Regression
      Multiple Linear Regression with Matrices
  8.9 Multiple Regression Techniques
      Forward Selection
      Backward Elimination
      Model Variables Selection Criteria
      Stepwise Regression
  8.10 Correlation Analysis
  8.11 Summary
      Problems
      Miscellaneous
      Software Exercises
      Self Test 8: Regression

9 ANALYSIS OF VARIANCE
  9.0 Introduction
  9.1 Single-Factor Analysis
      The Bartlett Test for Homogeneity of Variances
  9.2 Two-Way ANOVA without Replication
      To Block or Not to Block
  9.3 Two-Way ANOVA with Replication
  9.4 Multiple Comparisons of Treatment Means
      Contrasts
      Contrast Confidence Intervals
      Least Significant Difference (LSD), Fisher LSD, and Scheffe Procedures
      Tukey Method
      Bonferroni Method
      Tukey Method vs. Bonferroni Method
  9.5 ANOVA and Regression
  9.6 Analysis of Means (ANOM)
      Graphical Analysis of Treatment Means
  9.7 Summary
      Problems
      Review
      Software Exercises
      Self Quiz 9: Analysis of Variance

10 NONPARAMETRIC STATISTICS
  10.0 Introduction
  10.1 The Sign Test
  10.2 Nonparametric Bootstrap Estimation
  10.3 The Sign Test for Paired Data
      Type II Beta Error for the Sign Test
  10.4 The Wilcoxon Signed-Rank Test
  10.5 Wilcoxon-Mann-Whitney (WMW) Rank Test for Two Samples
  10.6 Spearman Rank Order Correlation Coefficient
  10.7 Kendall's Rank Correlation Coefficient (τ)
  10.8 Nonparametric Tests for Regression
  10.9 Nonparametric Tests for ANOVA
      Kruskal-Wallis
      Friedman Test
  10.10 Runs Test
  10.11 Randomization Tests
  10.12 Summary
      Problems
      Software Exercises

APPENDIX A
APPENDIX B
REFERENCES
INDEX
Preface
This book is a calculus-based introductory treatment of probability and statistics written for junior-senior undergraduates or beginning graduate students with a wide range of backgrounds in engineering, science, natural science, mathematics, engineering management, management science, computer science, the arts, and business, and with a desire to experience the phenomena of probability and statistics hands on. Practitioners may also benefit from the many software applications and statistical tests. The contents of the book may be covered in two one-semester courses: Chapters 1 through 5 in the first semester and Chapters 6 through 10 in the second.

The development of the concepts is illustrated and iterated with examples reinforced by software simulations and applications so that the student can experience chance phenomena through simulation. The student is encouraged to try the software exercises to get a feel for the meaning of probability and statistical averages. The software programs essentially amplify the brains of readers so that they need not fret over the calculations while they focus on and retain the concepts behind them. Imagine electronically flipping a fair coin 550,000 times and getting the following results in less than three seconds with the command (time (sim-coins-1-1 1000 1/2 10)).
    1     2     3     4     5     6     7     8     9    10   #Heads  #Flips      P(H)
  483   496   525   496   502   499   499   493   505   487    4985   10000  0.498500
 1000   988  1007   997  1019  1025  1013   993  1031   986   10059   20000  0.502950
 1533  1526  1525  1464  1514  1495  1491  1471  1510  1500   15029   30000  0.500967
 2037  2054  2031  1995  1989  2026  1987  2032  1952  1986   20089   40000  0.502225
 2495  2531  2437  2519  2452  2487  2489  2471  2500  2478   24859   50000  0.497180
 2967  3056  3010  2991  2963  2992  2995  3018  3000  2988   29980   60000  0.499667
 3448  3520  3533  3481  3475  3557  3555  3498  3543  3492   35102   70000  0.501457
 4038  3988  4073  3991  3943  4067  4078  4011  3976  3994   40159   80000  0.501988
 4396  4508  4495  4505  4470  4433  4543  4532  4460  4490   44832   90000  0.498133
 5036  4993  5042  5063  5017  4931  5010  4988  4933  4975   49988  100000  0.499880
Change the probability of 1/2 to 1/20 and the command can be repeated instantly. Readers are encouraged to integrate the software exercises with their readings while experimenting with over 500 commands to satisfy their curiosity and expectations about data displays, simulations, distribution probabilities, estimation, hypothesis testing, linear and multiple linear regression, analysis of variance, and nonparametric statistics. Computer programming is an excellent skill for problem solvers, involving design, prototyping, data gathering, testing, redesign, validating, and so on, all wrapped in the scientific method. The software exercises provide supporting evidence of probability phenomena.

Answers are provided to almost all of the odd-numbered problems at the end of each chapter. The problems help readers exercise their logical thinking power. The process of problem solving is emphasized by various approaches to solving a variety of problems. Although a course in probability and statistics can at times be reduced to formulas, I've stressed concept formulations, assumptions, and frequent revisits of topics, with learning through iteration. Emphasis is on understanding the concepts through very simple but challenging problems that are then confirmed through software exercises. Problem descriptions are short and often terse so that the student can concentrate on the basic concepts. For the student to solve a problem, a concept must be applied. Several concepts are incubated before formal discussion occurs.

Distinguishing features of the text include:

• Probability concurrent with and integrated with statistics through tailored software simulations
• Comprehensive summary problems at the end of each chapter
• Engaging, pedagogical, conceptual problems with answers
• Solutions to all problems in a separate electronic manual for instructors
• Sample quizzes for students
• Simulations verifying theoretical concepts
• Tailored software algorithms covering common games (dice, coins, cards, etc.), the discrete and continuous distributions, estimation, hypothesis testing, multiple linear regression, ANOVA, parametric and nonparametric statistics, and Monte Carlo simulation
• Use of varied problem-solving methods

The Genie software application, with a software tutorial and user's guide, is available for students. PowerPoint slide presentations, Word files, sample quizzes, and a complete solution manual are available for instructors.

The integrated software routines are written in the programming language Corman Lisp, where Lisp stands for List Processing (see http://www.cormanlisp.com/). The software routines contain commands for generating random samples from most of the specific discrete and continuous distributions. The student experiments with generating random samples from density functions, using the inverse cumulative distribution method and others, to verify that the estimators are close to the expected values of the parameters of the density distributions. The student also must recall the distribution and the parameters before executing the command.

The linear regression software includes multiple linear regression, with automation of all models resulting from combinations of regressor variables, along with multiple criteria for judging the appropriateness of the models. ANOVA software generates the solution tables from the data and partitions the variation. Analysis of means (ANOM) techniques are also included. Nonparametric routines, including nonparametric analysis of variance and regression, are used and contrasted with parametric routines under appropriate assumptions. The course is also supported by a web site with PowerPoint and Word presentations.
Notation

Members of sets are separated by white space, not commas: {1, 2, 3} = {1 2 3}. Given the sets A = {1 2 3}, B = {2 3 5}, and sample space S = {1 2 3 4 5 6 7 8 9 10}:

A intersect B, or A ∩ B, is written AB = {2 3}.
A union B, or A ∪ B, is written A + B = {1 2 3 5}.
The complement of A is written Ac = {4 5 6 7 8 9 10} in sample space S.
Set difference A - B is written A - B = {1}, the elements in A but not in B.
(A + B)c = AcBc = {4 5 6 7 8 9 10}{1 4 6 7 8 9 10} = {4 6 7 8 9 10}; DeMorgan's Law.
P(A + B) is the probability of event (set) A or B or both occurring.

Random variables (RVs) appear in capital letters. The default symbol is X. At times a mnemonic symbol is used for the continuous uniform RV. The value of the RV is expressed in small letters. For example, P(X ≤ x) is the probability that the random variable X is less than or equal to the specified value x. Similarly, f(x) is the default notation for the probability density function and F(x) is the notation for the cumulative distribution function of RVs. The symbol Φ is used exclusively for the cumulative normal distribution. Normal distributions are indicated by N(μ, σ²), and the symbol ~ means "is distributed as."

The domain of a density function f(x) is assumed to exist where it is explicitly defined and is assumed to be zero where it is not explicitly defined.
Example: f(x) = 2x on [0, 1] instead of f(x) = 2x, 0 ≤ x ≤ 1; f(x) = 0 elsewhere. It is assumed that we do not divide by zero.

Software commentary is presented with a shaded background, and the commands are presented in bold. When a function template is created, it is given a name and a list of entering arguments in italics. (cube x) is created in the language by writing the template (defun cube (x) (* x x x)). The name of the function template is cube and the entering argument is x. If the variable x has been assigned a value, then (cube x) is a command. One can write (cube 5) and get 125 returned, but (cube x) only returns a value if x has been assigned a number.

The commands are given mnemonic names. To find the arguments, one types the command (args function) to get a list of the entering arguments. For example, (args 'normal) returns "Args: mu variance x; Returns lower probability of N(mu, variance)," implying that (normal 50 4 52) returns 0.8413 as the value P(X ≤ 52) given X ~ N(50, 4). In many of the examples, the template is shown for the reader to see the entering arguments. Thus (sim-normal m s n) is a template for (sim-normal 50 4 30), requesting n = 30 simulated samples from the normal distribution with a mean of m = 50 and a standard deviation of s = 4. The simulated functions are preceded by sim. The command (apropos 'sim-) returns the names of all functions matching sim-, that is, nearly all the simulation functions.

Distributions are given by name followed by parameters in alphabetical order and then the x-value, for example, (binomial n p x), (normal mu sigma-square x), or (poisson k x). That is, (binomial 3 1/2 1) returns P(X = 1 | n = 3, p = 1/2) = 0.375; (normal 5 9 8) returns P(X < 8 | N(5, 9)) = 0.841345; (poisson 3 0) returns P(X = 0 | k = 3) = e^-3 = 0.049787, and so on. Distribution commands with a c in front are cumulative. For example, (binomial n p x) returns P(X = x) given n, p, and x, but (cbinomial n p x) returns P(X ≤ x). Distributions with a-b attached, for example, (cbinomial-a-b n p a b), return P(a ≤ X ≤ b). Distribution commands return the lower tail probability by default, but when preceded by a U they return the upper tail probability. Thus (phi 1) returns 0.8413447, as does (L-phi 1), but (U-phi 1) returns 0.1586552 for the unit normal distribution.

The software commands always return something. The arrow symbol → is used to show what a command returns, for example, (cube 5) → 125. Software commands return the object or value on the next line. Usually nil is returned after a final side-effect calculation.
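To make the template/command distinction concrete, the following is a minimal Common Lisp sketch in the spirit of the text; cube matches the book's example, while the use of the standard documentation function in place of the Genie (args ...) command is an illustrative assumption, not part of the book's software.

;; Template: name CUBE, one entering argument X.
(defun cube (x)
  "Returns x * x * x."
  (* x x x))

(print (cube 5))      ; a command: returns 125
(setf x 4)            ; assign x a value ...
(print (cube x))      ; ... so (cube x) is now a command: returns 64

;; Standard Lisp can describe a function much as (args 'cube) would:
(print (documentation 'cube 'function))   ; "Returns x * x * x."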
The last object returned by a command is assigned to the symbol *. The second and third from the last return are assigned to the symbols ** and ***, respectively. Thus, (sim-normal 50 4 100) returns a simulated sample of size 100 from N(50, 16):

(46.9 46.3 50.7 49.1 51.9 48.1 50.1 48.7 57.8 54.1 55.1 48.2 56.9 48.6 47.5 44.8 50.6 49.1 54.3 48.1 54.1 41.4 50.7 49.0 49.7 50.0 47.3 52.2 47.0 52.0 51.5 49.3 50.7 46.1 47.8 53.0 40.8 43.5 48.0 60.0 53.6 51.0 48.1 51.3 49.8 57.8 51.1 43.2 51.3 48.7 46.5 48.6 48.0 54.7 47.9 53.9 53.0 52.2 54.6 51.1 59.6 47.6 52.4 40.4 58.3 44.4 48.3 49.6 52.6 47.8 48.3 48.1 49.6 52.2 51.2 50.6 49.8 49.4 58.7 57.9 41.8 50.9 47.0 48.8 50.6 50.9 48.2 46.3 57.6 55.1 55.6 53.0 50.1 51.1 48.8 58.4 47.4 48.3 53.7 46.2)
(mu-svar *) returns the mean and variance of the sample: (50.3 16.5).

(hdp **) horizontally dot plots the sample:

[horizontal dot plot of the 100 sample values: one asterisk per observation, stacked by value and clustered near 50]
(sample-moment 2 ***) returns the second moment of the sample: 2555.1.

To enhance comprehension, the reader should have an expected value in mind before issuing the commands. A short description of some basic software functions is available in Appendix A of the text. A short tutorial is contained on the disc for the interested reader who would like to write original programs. However, the tutorial is not necessary for issuing software commands. A User's Guide is available on the disc and contains the functions used in the text with a short explanation of each.

Computer programming is one of the best engineering, science, or mathematical experiences a student can have in regard to design, test, logical validity, prototyping, redesign, and so on. Thus, students are encouraged to use their programming language to simulate some of the random phenomena in the software exercises or to learn the basics of the language. Programming languages enable students to think at higher levels without regard to the minutiae of computations.
Acknowledgments
I would like to thank those who helped with this effort. First and foremost, Tom Singer, excellent editor and coordinator, for his encouragement and diligence in getting the manuscript reviewed. I would also like to thank Anne Williams, for her excellent management of the copyediting and paging, and for maintaining the schedule; Brandy Lilly, for her production work and patience with the software; Tyson Sturgeon, for advice on the software; and the reviewers, for their comments, suggestions, and encouragement: Alex Suciu, Northeastern University; Leming Qu, Boise State University; Krzysztof Ostaszewski, Illinois State University; Athanasios Micheas, University of Missouri–Columbia; B. M. Golam Kibria, Florida International University; and Thomas Zachariah, Loyola Marymount University.
Chapter 1
Introduction to Probability
I returned, and saw under the sun, that the race is not to the swift, nor the battle to the strong, . . . but time and chance happeneth to us all. Ecclesiastes
The concept of probability is open to interpretation. Is probability a function of the describing experiment or a function of the mind perceiving the experiment? Understanding probability helps us predict and control variability. Basic concepts and interpretations of probability are introduced. Set notation is reviewed and probability terms are defined. Fundamental principles of counting are used with combinations and permutations. Conditional probability is developed, leading to Bayes's rule. Several examples that test the basic concepts of probability are statistically verified through simulation using software routines.

1.0 Introduction
1.1 Interpretations of Probability
1.2 Sets
1.3 Probability Parlance
1.4 Probability Theorems
1.5 Conditional Probability and Independence
1.6 Bayes's Rule
1.7 Counting the Ways
1.8 Summary
1.0 Introduction

Ever since the beginning of time, chance phenomena have shaped the universe through explosions, erosions, and evolutions. There is even evidence from quantum physics suggesting that God does indeed play dice with the universe, at least at the microinfrastructure. The race may not always be to the swift nor the battle to the strong, but as a wit once put it, "That's still the best way to bet."

Probability and statistics pervade our daily lives, as characterized by our betting on humans, animals, and machines to flourish among the chance processes of nature. Perhaps the word betting is not as elegant as decision-making, but people bet their premiums that they will die soon with regard to insurance policies (a bet they are most happy to lose), bet their investments in crops and livestock, and bet their factories with regard to their machines of production. Insured risk is unpredictable in particular yet virtually predictable in general; that is, unpredictable for individuals but predictable for groups.

Phrases like that's the way the ball bounces or that's how the cookie crumbles show an acceptance of the unpredictability of events. We use the word lucky to describe someone who has benefited from chance phenomena and the word unlucky for one who has not. Achieving a basic understanding of probability and statistics can go a long way toward improving one's luck. It can also show why both bad and good things happen to both bad and nice people, why one can make an excellent decision but suffer an unfortunate outcome, and why one can make a horrible decision but benefit from a fortunate outcome. Most people understand the horrible consequences of horrible decision-making and the excellent consequences of excellent decision-making.

Probability phenomena are omnipresent in weather forecasts, recreational betting on lotteries and horses, business mergers, stock markets, and industry. The major question is, "Where does the concept of probability lie: in our belief system or in the experimental phenomena itself?" In some sense people can agree on certain aspects of probability phenomena. If a coin were flipped 10,000 times in America and also 10,000 times in Asia, one would expect the coin to show a similar distribution of heads and tails. The randomness of the outcomes is a function of the coin and not of the beliefs of people. But in another sense, people's beliefs differ, as witnessed by differences in opinion supported by money (bets). Suppose that you call a coin flip TAILS while the coin is in the air and then are told that the coin is two-headed. Your belief system may then cause you to change your prediction to HEADS, even though nothing physically changed with the coin.

Even the concept of randomness is difficult if not impossible to define with precision. Randomness is a negative property indicating the lack of a pattern in occurrences and is an a priori (before observing) notion. One must never conclude a posteriori (after observing) that the occurrence was random.
Suppose we consider the sequence of heads and tails from 100 fair coin flips. If we can describe the sequence by a pattern, such as alternating sets of 3 heads and 3 tails, then the sequence is definitely not random. If we cannot compress the sequence into a description shorter than the entire sequence itself, we conclude that the sequence is random. Learning occurs through the observation of repeated patterns, enabling one to predict the next occurrence. The first few million digits of pi (π) have been statistically analyzed and appear random; yet the digits are completely deterministic. We say that the digits are relatively random (at a local level) due to our inability to discern a pattern.

Probability practice consists of a well-defined experiment specifying all the possible uncertain outcomes. The set of all elementary outcomes for each trial is referred to as a sample space and is denoted by the symbol S. An outcome is an element of an event, which is a set of outcomes. An event is a subset of the sample space. The experiment can be replicated, that is, repeated indefinitely under essentially unchanging conditions, and the experimenter may be interested in different results from the experiment. A single or repeated performance of an experiment is referred to as a trial.

If the experiment is to flip a coin 3 times, then a trial is the single flip of a coin, the outcomes are H for heads and T for tails, and the sample space S is the set {H, T} for each trial. The sample space for a 2-coin experiment is {HH HT TH TT} and for a 3-coin experiment is {HHH HHT HTH HTT TTH THT THH TTT}. Note that the term trial can apply to each flip of the coin as well as to each experiment (flipping a fair coin 3 times) and to repeating the 3-coin experiment several times.

Perhaps someone is interested only in the number of heads in the 3-coin experiment. Then the outcome of the experiment is a random variable (RV) that has values from the set of integers {0 1 2 3}. Random variables have probability functions that assign probabilities to the outcomes. When the event HHH occurs, the random variable has a value of 3; when HTT occurs, the value assigned is 1. Notice that in modeling a fair coin flip, we do not permit the coin to land on its edge nor permit other such rare events, because we are only modeling reality.

The experimenter may also be interested in the probability of getting exactly 2 heads in the 3-coin experiment. If we assume the coin is fair, then each event in the sample space of 8 events is equally likely, and three events (HHT, HTH, THH) result in exactly 2 heads, yielding a probability of 3/8. The probability function consists of the ordered pairs {(0, 1/8) (1, 3/8) (2, 3/8) (3, 1/8)}. Thus one way of defining probability is as the ratio of favorable outcomes to the total number of equally likely outcomes, often referred to as statistical probability.

In this chapter we look at different interpretations of probability. Casual set theory is reviewed and common probability terms are introduced. A variety of everyday experiments, such as flipping coins, rolling dice, selecting cards from a deck, or selecting marbles from an urn, is discussed in order to secure the heuristic interpretations of probability with familiar sample spaces. Note that probability is prediction before an experiment; statistics is inference after an experiment.
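As a hands-on check of the 3-coin computation, the experiment simulates easily. The sketch below is plain Common Lisp written for this discussion (not one of the book's built-in commands); it should return a value near P(exactly 2 heads) = 3/8 = 0.375.

(defun count-heads (flips)
  "Flip a fair coin FLIPS times and return the number of heads."
  (loop repeat flips count (< (random 1.0) 0.5)))

(defun estimate-two-heads (trials)
  "Estimate P(exactly 2 heads in 3 flips) from TRIALS repetitions."
  (/ (loop repeat trials count (= 2 (count-heads 3)))
     (float trials)))

(print (estimate-two-heads 100000))   ; typically 0.37 - 0.38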
1.1 Interpretations of Probability

There are two basic viewpoints or interpretations of probability: objective and subjective. Those who favor the objective perspective require repeatable random processes under similar conditions (regularity), like the flip of a coin, the roll of a die, or the number of deaths in the next 10 years among people 50 years old now. Objectivists would not speculate on the probability of putting a human on the planet Mars in the next 20 years. Such an event is outside their purview.

But enter the subjectivists, and all events are under their purview. Subjectivists believe probability is linked to one's state of mind with regard to the knowledge about the event in question. The subjectivist individually determines, through his or her own source of knowledge and evaluation of that knowledge, whether the coin is fair. Of course, other subjectivists as well as objectivists may disagree, but subjective probabilities (differences in beliefs) constitute the basis for business ventures and just about anything imaginable.

The subjective person has little trouble with randomness as indicated by the lack of a pattern. If the subjectivist sees no pattern and concludes "random," the objectivist may point out a pattern. The subjectivist nods, updates the stored knowledge, and acknowledges recognition of the pattern or yields to the lack of randomness pointed out in the pattern. As an example, the reader is invited to discern the pattern and predict the digit denoted by "?" in the following sequence:

3 9 7 9 8 5 3 5 6 2 9 5 1 4 1 ?

After a while, the reader may conclude the digits are random, but the digits are completely deterministic in that they are the reverse of the initial decimal digits of π.

Of course, even if subjectivists are exposed to the same facts and knowledge base pertaining to the probability of an event, they may still disagree because they may weigh the facts differently. Subjectivists are constantly updating their belief systems with the arrival of incoming knowledge. Subjectivists can be characterized by the colloquial "wanna bet?" because subjectivists possess opinions on experiments, most of which are not repeatable.
Objectivists

Objectivists subscribe to the classical (a priori), the relative frequency (empirical or a posteriori), or the axiomatic (mathematical basis) approach to probability. Each is briefly discussed.
Classical (a priori)

Asked to determine the fairness of a coin, the classicist may examine the coin carefully to note nonhomogeneous material, asymmetry, or rough edges and, upon finding none, would assume that the coin is fair. That is, both sides of the coin are assumed to be equally likely on the basis of the principle of symmetry. The coin, however, may not be fair. Note that the coin does not need to be flipped. If it is assumed to be fair, then both sides are equally likely. The probability is assigned before (a priori) an experiment is conducted.

The interpretation of the probability of an event in the classical sense is the ratio of the equally likely ways the event can occur to the total number of ways. If the experiment were the rolling of a single die and if the event of interest were the die face with 4 spots, then the probability of the event would equal 1/6, since the 4-spot is 1 of the 6 equally likely faces that can occur. The experimenter must be careful in specifying the outcomes to ensure that each outcome is equally likely. In an experiment consisting of flipping a coin twice, one may falsely reason that the equally likely outcomes are 2 heads, 2 tails, or 1 of each and wrongly assign the probability of 1/3 to each of these three events. But there are four equally likely events in the sample space, {HH HT TH TT}, which is a more accurate model of reality.
Empirical or Relative Frequency (a posteriori)

On the other hand, the empiricist would just flip the coin, without any assumption of fairness, to determine the probability of heads. The probability of heads is assigned after (a posteriori) the experiment. The ratio of the number of heads to the total number of flips is the approximate probability assigned to the coin of turning up a head. If the number of each did not vary "too much," the empiricist might conclude the coin is fair. The empiricist reasons that the flipping of a coin is a repeatable random process with a theoretical result of half the time coming up heads and half the time coming up tails. Thus short-term discrepancies in the sameness of heads and tails are tolerated, because the empiricist believes that these discrepancies disappear over the long run, as the number of flips increases.

In the flipping of a coin, many factors are ignored, such as temperature, humidity, dew point, barometric pressure, elevation, gravity, air currents, Brownian movements, magnetic phenomena, radiation, and location (the Tropics, the Poles, Asia, Africa, America, etc.). All of these factors and more are identified as the vagaries of chance and are essentially ignored. If these discrepancies do not disappear in the long run, then the empiricist concludes that the coin is not fair. Part of the problem with this interpretation is in knowing when the long run has occurred.

The coin flip experiment is a sequence of trials where each trial consists of flipping a coin exactly once. An experiment is a repeatable, stochastic (chance) process consisting of a sequence of repeatable trials. The result of each trial is an outcome. An event is any combination of outcomes of special interest. The set of outcomes constitutes a sample space for the trial and is denoted by the symbol S. In this experiment the sample space S consists of the set {head, tail}, implying that the two outcomes of the trial are the event head and the event tail.

The empiricist is an experimenter with a stronger contact with statistics than the classicist. The empiricist observes over time to determine the ratio of favorable events to the total number of events that can occur. The more trials observed, the more stable the ratio. Are there more boys than girls born yearly? The classicist may assume that the birth rate is the same for each. The empiricist, by test and data collection, "knows" that the rate for boys is slightly higher than the rate for girls and that the rate may be changing.

Table 1.1 displays the results of 55,000 simulations of a coin flip. Each experiment (row) consists of 10 trials of a fair coin flip, with repetition ranging from 100 to 1000 flips in increments of 100. The ratio of the average number of heads to the total number of flips is recorded. The first row signifies the simulated tossing of a coin 100 times, with 48 heads resulting on the first trial, 45 heads on the second trial, 51 heads on the third trial, and so on. The results from each of the 10 trials are added to yield 505 heads from 1000 total flips, for a probability of heads equaling 0.505.
Table 1.1 Simulated Fair Coin Flips

                       Experimental Trials
    1    2    3    4    5    6    7    8    9   10   Total Heads  Total Flips  P(Heads)
   48   45   51   59   56   43   58   50   52   43           505        1,000     0.505
  102  112   95   98   94   94   90  106  111   94         1,000        2,000     0.500
  139  139  152  157  149  148  149  150  155  153         1,491        3,000     0.497
  198  188  208  197  211  204  195  191  200  189         1,981        4,000     0.495
  237  243  254  255  243  249  254  236  241  230         2,442        5,000     0.488
  290  281  317  288  299  298  294  298  289  297         2,951        6,000     0.492
  368  325  354  341  341  322  339  357  338  338         3,423        7,000     0.489
  369  381  421  400  391  417  403  379  391  380         3,932        8,000     0.492
  455  464  441  460  460  437  444  434  443  436         4,474        9,000     0.497
  497  497  508  495  516  505  518  516  487  498         5,037       10,000     0.504
Total                                                      27,236       55,000     0.495
The second row signifies the similar experiment, except that the coin was flipped 200 times on each trial, the third row 300 times, etc. Notice under the "Total Heads" column that the number of heads increases by about 500 with each additional 1000 flips. The ratio of the number of heads to the number of flips should be close to 1/2. The probability of heads is

P(heads) = (number of heads)/(number of flips) ≈ 1/2

as the number of flips increases. We expect the theoretical probability of the number of heads to approach 1/2 if the coin flips were repeated indefinitely. Again, the problem with the empirical view is the indefiniteness of the long run. Notice that the probability of getting exactly the same number of heads as the number of tails is relatively small, but the probability that the difference is "small" is relatively high. For example, the probability of getting exactly 500 heads and 500 tails from 1000 fair coin flips is small (0.025225), as is the probability of getting 550 heads and 450 tails (0.000169).
The command (coin-flips n p) returns the simulated results from n coin flips with probability p of success for heads. For example, (coin-flips 100 1/2) may return "48 heads 52 tails."

(sim-coins n p m) returns a list of the number of heads from repeating n coin flips m times, with p the probability of heads occurring. For example, (sim-coins 1000 1/2 10) returned (487 489 493 482 511 533 491 494 479 513).

(mu '(487 489 493 482 511 533 491 494 479 513)) returns 497.2, the average of the numbers. (mu (sim-coins 100 1/2 10)) returns the average number of heads in 10 replications of 100 flips. Try (mu (sim-coins 100 1/2 10)) and press the F3 key to execute the command repeatedly. Estimate the return of (mu (sim-coins 100 1/4 10)).

(sim-coins-1-1 100 1/2 10) generated Table 1.1. Repeat the command several times while varying the probability p and observe the regularity in the random fluctuations.
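Readers who want to reproduce the flavor of (coin-flips n p) in their own Lisp can use a sketch like the one below; coin-flips-demo is a hypothetical stand-in written for this discussion, not the Genie implementation.

(defun coin-flips-demo (n p)
  "Simulate N flips with P(heads) = P; return the list (heads tails)."
  (let ((heads (loop repeat n count (< (random 1.0) p))))
    (list heads (- n heads))))

(print (coin-flips-demo 1000 1/2))    ; e.g. (493 507)
(print (coin-flips-demo 1000 1/20))   ; heads near 50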
Mathematical or Axiomatic

The mathematician begins with 3 axioms of probability and derives theorems from these axioms in the hope that the resulting theorems will have applications in modeling real world phenomena, for example, the outcomes from coin flips. For a sample space S of events or outcomes {Ai} defined by an experiment, the 3 axioms of probability are

1. P(A) ≥ 0, where A is any arbitrary set of events in the sample space S.
2. P(S) = 1, where S is the entire sample space of events and always occurs.
3. For mutually exclusive events denoted by a set of disjoint events {Ai} in S, either finite or infinite in number,

   P(A1 + A2 + ... + An) = P(∪_{i=1}^{n} Ai) = Σ_{i=1}^{n} P(Ai).

The probability function maps the sample space domain into the range of the unit interval [0, 1]. From the 3 axioms emerges a theory of probability that is applicable to real world phenomena and for the most part encompasses the heuristic approaches (classical and relative frequency) to probability. We still use the heuristic approaches, as they heighten our intuition and lend credence to computing probabilities.
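The axioms can be exercised directly on a finite sample space. In this illustrative Common Lisp sketch (the names are made up for the example), a discrete distribution is an association list of (outcome . probability) pairs, and the probability of an event is the sum over its outcomes, so additivity for disjoint events holds by construction.

;; A fair die as an association list of (outcome . probability).
(defparameter *die*
  '((1 . 1/6) (2 . 1/6) (3 . 1/6) (4 . 1/6) (5 . 1/6) (6 . 1/6)))

(defun prob (event dist)
  "P(EVENT): sum the probabilities of DIST outcomes lying in EVENT."
  (loop for (outcome . p) in dist
        when (member outcome event) sum p))

(print (prob '(1 2 3 4 5 6) *die*))   ; axiom 2: P(S) = 1
(print (prob '(2 4 6) *die*))         ; P(even) = 1/2
;; Axiom 3 for the disjoint events {1 2} and {5}:
(print (= (prob '(1 2 5) *die*)
          (+ (prob '(1 2) *die*) (prob '(5) *die*))))   ; T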
1.2 Sets

Capital letters are used to designate sets, with small letters designating the elements of the set. The statement "a is an element of set A" is written a ∈ A, and a ∉ A denotes that "a is not an element of A." The complement of a set A is written as Ac, and the symbol "+" is used to denote both set union and number addition. Set union is denoted as A ∪ B = A + B. The reader should enhance comprehension by figuring the appropriateness of the symbol in context. Juxtaposition is used for set intersection; i.e., AB = A ∩ B is the intersection of set A with B. The "-" symbol is used for set difference, denoted by S - A. The set A is a subset of set S, denoted by A ⊆ S, if and only if ∀ s ∈ A, s ∈ S.

Complement: Ac = {s: s ∉ A}
Union: A + B = {s: s ∈ A or s ∈ B}
Intersection: AB = {s: s ∈ A and s ∈ B}
Difference: S - A = {s: s ∈ S and s ∉ A}
Set Algebra

The algebra of sets includes the following laws for the universal set S and arbitrary sets A, B, C.

                     Union                                     Intersection
1) Commutative       A + B = B + A                             AB = BA
2) Distributive      A + BC = (A + B)(A + C)                   A(B + C) = AB + AC
3) Associative       A + (B + C) = (A + B) + C = A + B + C     A(BC) = (AB)C
4) Set Identity      S + A = S                                 SA = A
5) DeMorgan's Laws   (A + B)c = AcBc                           (AB)c = Ac + Bc
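The laws are easy to check with Common Lisp's built-in set functions (union, intersection, set-difference), here applied to the sets from the Notation section of the preface:

(let* ((s '(1 2 3 4 5 6 7 8 9 10))   ; universal set
       (a '(1 2 3))
       (b '(2 3 5))
       (ac (set-difference s a))      ; Ac
       (bc (set-difference s b)))     ; Bc
  ;; DeMorgan: (A + B)c = AcBc
  (print (sort (copy-list (set-difference s (union a b))) #'<))   ; (4 6 7 8 9 10)
  (print (sort (copy-list (intersection ac bc)) #'<)))            ; (4 6 7 8 9 10)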
[Figure 1.1 Venn Diagram: three intersecting sets A, B, and C partitioned into the 7 distinct regions ABcCc, ABCc, AcBCc, ABC, ABcC, AcBC, and AcBcC.]
The empty set is denoted by ∅. If two sets have an empty intersection, the sets are said to be disjoint. When the sets are events in a sample space, the events are said to be mutually exclusive, meaning that when one event occurs, the other event could not possibly have occurred. For example, the event "heads" and the event "tails" are mutually exclusive. Since the domain of the probability function is a set of sets, it helps to understand the laws of set algebra. A Venn diagram of three intersecting sets A, B, and C with its 7 distinct regions is shown in Figure 1.1.

EXAMPLE 1.1
Let S = {0, 1, 4, 9, 16, 25}, E = {0, 2, 4, 6, 8, 10}, O = {1, 3, 5, 7, 9, 11}, and the universal set U = {x: 0 ≤ x ≤ 25, x integer}. Find a) SE, b) S + E, c) EO, d) Sc, and e) O + U.

Solution a) SE = {0, 4}, b) S + E = {0, 1, 2, 4, 6, 8, 9, 10, 16, 25}, c) EO = ∅, the empty set, d) Sc = {2, 3, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 23, 24}, e) O + U = U.

The command (setf S '(0 1 4 9 16 25) E '(0 2 4 6 8 10) O '(1 3 5 7 9 11) U (upto 25)) assigns the sets.
(union S E) returns (1 9 16 25 0 2 4 6 8 10).
(intersection S E) returns (0 4); (intersection E O) returns nil.
(set-difference U S) returns Sc (2 3 5 6 7 8 10 11 12 13 14 15 17 18 19 20 21 22 23 24).
(subsetp S U) returns t for true.
(subsetp (intersection S E) E) returns t for true.   ; SE is a subset of E.
(member 7 S) returns nil, stating 7 is not a member of S.
1.3 Probability Parlance

Other terms for probability are odds, chance, likelihood, and percentage. A prizefighter may be given 3 to 2 odds of winning the fight. The odds are interpreted to mean that the fighter is expected to win with probability 3/5 and expected to lose with probability 2/5, with no likelihood of a draw. Given the odds in favor of an event, the probability is the first number divided by the sum (odds 3:2 in favor, or for, is probability 3/5); the probability against is the second number divided by the sum (odds 3:2 for yields probability 2/5 against).

The odds can also be expressed as p/q : 1 FOR or q/p : 1 AGAINST, where p is the probability of success and q is the probability of failure. For example, given that p = 0.1 and q = 0.9, the odds may be expressed as 9:1 AGAINST.

Suppose P(A) = 1/4. Then odds(A) = P(A)/(1 - P(A)) = (1/4)/(3/4) = 1/3 : 1 = 1:3 FOR. Also, P(A) = odds(A)/(1 + odds(A)) = (1/3)/(1 + 1/3) = (1/3)/(4/3) = 1/4.

If a person is interviewed for a job, the person may want to know the chances of being hired. Upon being told that 3 other people have been interviewed for the job, the person may then conclude that the chances are 1 out of 4, using an equally likely assumption of being hired. When a weather forecast is 30% chance of rain, what is implied is that the probability of rain is 0.30. Asking the likelihood of rain is another way of asking for the probability of rain. Do we see the difficulty in a classical approach to predicting the weather? The relative frequency approach would necessitate the recording of pertinent data to account for the number of times the weather phenomena occurred. However, the subjectivist has no problem rendering an expert opinion on the occurrence of weather phenomena.
EXAMPLE 1.2
If the odds are 2:3 for event A, 3:7 for event B, and 1:4 for the compound event AB (A AND B), compute the odds for the event A + B, i.e., A OR B, and express the odds for event B in the form x : 1.

Solution

P(A) = 2/5, P(B) = 3/10, P(AB) = 1/5, and

P(A + B) = P(A) + P(B) - P(AB),   (1–1)

which is soon proved in Section 1.4. Thus

P(A + B) = 2/5 + 3/10 - 1/5 = 1/2, with even odds 1:1.

The odds for event B may be expressed as 3/7 : 1 FOR or 7/3 : 1 AGAINST.
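Because Common Lisp computes with exact rationals, Example 1.2 can be checked directly at the listener; the expressions below are ordinary Lisp arithmetic rather than Genie commands.

;; Odds x:y FOR an event correspond to probability x/(x + y).
(print (/ 2 (+ 2 3)))            ; P(A)  = 2/5
(print (/ 3 (+ 3 7)))            ; P(B)  = 3/10
(print (/ 1 (+ 1 4)))            ; P(AB) = 1/5

;; P(A + B) = P(A) + P(B) - P(AB), exactly:
(print (- (+ 2/5 3/10) 1/5))     ; 1/2, i.e., even odds 1:1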
The software command (odds probability) returns the odds given the probability. For example, (odds 3/10) returns 2.333:1 AGAINST and (odds 7/10) returns 2.333:1 FOR.
The elementary outcomes (singleton sets) are often referred to as sample points in the sample space. A compound event consists of two or more elementary events. If the event or element that occurs is contained in the set A, then the event A is said to have occurred. The sample space specified by an experiment is the set of all elementary events. The sample space may be defined in a coarser granularity. For example, with the experiment of tossing a single die, the sample space may be defined as Odd or Even. However, it is beneficial to have the finest granularity (elementary outcomes) of the events in mind when calculating the probabilities.

The complement of event A, denoted by Ac, is the set of outcomes in the sample space that are not in A. In the coin flip experiment the event "heads" is the complement of the event "tails." It is sometimes easier to determine the probability of an event by computing the probability of the event not happening and subtracting this probability from 1. That is, P(A) = 1 - P(Ac). This principle via the back door is an excellent problem-solving technique. For example,

A man buys a $50 pair of boots from a storeowner and gives the owner $100. The owner has no change and goes to a neighbor to exchange the $100 bill for two $50 bills. The owner gives the buyer the pair of boots and $50 change. The buyer then leaves. The neighbor returns to the owner, claiming that the $100 bill is counterfeit. The owner agrees and gives the neighbor a good $100 bill. How much is the owner out? The answer is in the chapter summary.
As events consist of sets and as sets can be mutually exclusive, events can be mutually exclusive. Two events are said to be mutually exclusive if the occurrence of one event prevents the simultaneous occurrence of the other event. In other words, two events A and B are mutually exclusive if their intersection is empty, that is, AB = ∅. A set of events (subsets) whose union includes the entire sample space is said to be collectively exhaustive. For example, in the 6-sided die toss, the event "even" and the event "odd" are collectively exhaustive (as well as mutually exclusive). The three events "prime," "even," and "die face 1" are collectively exhaustive but not mutually exclusive, as the sketch below verifies.

Consider the experiment of rolling a fair 6-sided die. The elementary sample space of the first trial (roll) is the set {1 2 3 4 5 6}, but other nonelementary sample spaces could be designated, such as the set {even odd} or the set of events divisible by 3 and events not divisible by 3. Still many other such spaces could be specified.
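The two definitions can be made mechanical. The helper names in this Common Lisp sketch are invented for illustration; the checks reproduce the die-toss examples above.

(defun mutually-exclusive-p (events)
  "T if no two of the EVENTS (lists of outcomes) share an outcome."
  (loop for (e . rest) on events
        always (loop for f in rest
                     always (null (intersection e f)))))

(defun collectively-exhaustive-p (events space)
  "T if the union of EVENTS covers SPACE."
  (null (set-difference space (reduce #'union events))))

(print (mutually-exclusive-p '((2 4 6) (1 3 5))))                         ; T
(print (collectively-exhaustive-p '((2 4 6) (1 3 5)) '(1 2 3 4 5 6)))     ; T
;; prime, even, and die face 1: exhaustive but not exclusive
(print (collectively-exhaustive-p '((2 3 5) (2 4 6) (1)) '(1 2 3 4 5 6))) ; T
(print (mutually-exclusive-p '((2 3 5) (2 4 6) (1))))                     ; NIL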
Notice that if we were told that the outcome was even, we could not determine whether the outcome was divisible by 3, but outcomes specified from the elementary sample space {1, 2, 3, 4, 5, 6} can determine both even/odd and divisibility by 3. Its granularity is the finest, consisting of all elementary (equally likely) events.

The sample space of interest as determined by the experiment of rolling two fair dice can be mapped into the range space {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12} denoted by the sum of the dice, but the event sums are not equally likely and thus are not elementary events. The set of elementary events, ordered pairs (1st die, 2nd die), indicates the 36 equally likely outcomes, as shown in Table 1.2. The southwest-northeast diagonal passes through the six points summing to 7. The probability function for the sum of the dice can now be assigned by counting the elementary events for each of the sums. For example, the compound event S5, designated as the sum being 5, occurs when any one of the points in the set {(1, 4) (2, 3) (4, 1) (3, 2)} occurs. The discrete probability density function is shown in Table 1.3. Notice that the probabilities sum to 1.

The sample space may be discrete (countable number of possible points or outcomes or events) or continuous (nondenumerable, or the cardinality of the continuum). The experiment of flipping a coin 3 times has a finite discrete sample space; the experiment of flipping a coin until a heads appears has a discrete, countably infinite sample space; and the experiment of choosing a number in the interval [0, 1] has both a continuous and infinite sample space. It is also possible to specify discrete outcomes over a continuous interval. For example, over the interval [0, 1], assign probability 1/2 if a random number is less than 1/2 and assign probability 1/2 if the random number is greater than or equal to 1/2. Thus we have 2 events defined on the interval.
Table 1.2 Sample Space of Elementary Dice Outcomes

        1       2       3       4       5       6
1    (1, 1)  (1, 2)  (1, 3)  (1, 4)  (1, 5)  (1, 6)
2    (2, 1)  (2, 2)  (2, 3)  (2, 4)  (2, 5)  (2, 6)
3    (3, 1)  (3, 2)  (3, 3)  (3, 4)  (3, 5)  (3, 6)
4    (4, 1)  (4, 2)  (4, 3)  (4, 4)  (4, 5)  (4, 6)
5    (5, 1)  (5, 2)  (5, 3)  (5, 4)  (5, 5)  (5, 6)
6    (6, 1)  (6, 2)  (6, 3)  (6, 4)  (6, 5)  (6, 6)
6) 6) 6) 6) 6) 6)
Discrete Density Function (Dice Sums)
Sum of Dice Number Ways Probability
2 1 1/36
3 2 2/36
4 3 3/36
5 4 4/36
6 5 5/36
7 6 6/36
8 5 5/36
9 4 4/36
10 3 3/36
11 2 2/36
12 1 1/36
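Table 1.3 invites an empirical check. The sketch below (plain Common Lisp, not a Genie command) rolls two fair dice repeatedly and prints each sum's relative frequency beside the theoretical probability (6 - |sum - 7|)/36.

(defun dice-sum-frequencies (trials)
  "Roll two fair dice TRIALS times; compare frequencies with theory."
  (let ((counts (make-array 13 :initial-element 0)))
    (loop repeat trials
          do (incf (aref counts (+ (random 6) (random 6) 2))))
    (loop for sum from 2 to 12
          do (format t "~2d: ~6,4f  (theory ~6,4f)~%"
                     sum
                     (/ (aref counts sum) trials)
                     (/ (- 6 (abs (- sum 7))) 36.0)))))

(dice-sum-frequencies 100000)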
1.4 Probability Theorems

From three probability axioms flow the theorems. A short development of a few theorems that follow from the axioms is given. Let A be an event in sample space S, with P(A) denoting the probability of event A occurring. Recall that the three axioms are

1. P(A) ≥ 0;
2. P(S) = 1;
3. For a set of mutually exclusive (disjoint) events Ai,

   P(∪_{i=1}^{n} Ai) = Σ_{i=1}^{n} P(Ai).

With disjoint events A and ∅, the null event, the intersection A∅ = ∅, and thus they are mutually exclusive events.

1. P(A) = P(A + ∅) = P(A) + P(∅) = P(A) ⇒ P(∅) = P(A) - P(A) = 0.   [axiom 3]
2. P(S) = 1 = P(A + Ac) = P(A) + P(Ac) ⇒ P(A) = 1 - P(Ac).   [axiom 2]
3. If A ⊆ B, then B = A + BAc, a union of disjoint sets. Consequently, P(B) = P(A + BAc) = P(A) + P(BAc) ⇒ P(A) = P(B) - P(BAc) ≤ P(B).   [axioms 1 and 3]

Also, by axiom 3,

P(A + B) = P(ABc + AB + AcB) = P(ABc) + P(AB) + P(AcB),
P(A) = P(ABc) + P(AB) and P(B) = P(AB) + P(AcB), where A = AB + ABc and B = AB + AcB.

Thus P(A) + P(B) = P(ABc) + P(AcB) + 2P(AB) = P(A + B) + P(AB), resulting in

P(A + B) = P(A) + P(B) - P(AB).   (1–1)

This rule is often called the sum rule, or the principle of inclusion-exclusion. The union of three or more sets can be continued by induction. For three sets,

P(A + B + C) = P(A) + P(B) + P(C) - P(AB) - P(AC) - P(BC) + P(ABC).

Figure 1.2 is a Venn diagram illustrating P(A + B) = P(A) + P(B) - P(AB).
B
a b
c
P(A + B) = P(A) + P(B ) – P(AB ) a + b + c = (a + b) + (b + c) – b
Figure 1.2
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 14
Chapter 1 Introduction to Probability
14
1.5
Conditional Probability and Independence Suppose in rolling a pair of fair dice we seek the probability of S2, the event sum of 2. The probability is 1/36. If we are told that one of the dice is not a 1, then we can immediately conclude that the event S2 did not occur. Conditional probability concerns the update of the probability of event A upon knowing that some other event B has occurred. We use the notation “P(A | B)” to denote the probability of event A given that event B has occurred. In a die roll the probability of getting a “4” is 1/6, but on being told that the die is even, the probability increases to 1/3, i.e., P(“4” | even) = 1/3. Our conditional sample space becomes B = {2 4 6} and we focus on the points in B that are also in A in computing P(A | B). The general formula is given by P( A B ) =
P ( AB )
,
(1–2)
P( B ) and since the event B occurred, P(B) is assumed to be greater than zero. Notice also that P ( AB ) = P ( A B ) P ( B ) = P ( BA) = P ( B A) P ( A) and that P ( A B ) = P ( B A) P ( A)/ P ( B ). See Figure 1.3.
A
a
P (A|B ) =
B
b
P(AB) ; P(B)
b b = ; b+c b+c
Figure 1.3
c
P(B|A) =
P(BA) P(A)
b b = a+b a+b
Conditional Probability
(1–3)
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 15
1.5 Conditional Probability and Independence
15
Two events are said to be mutually independent if the occurrence of one event does not affect the probability of the occurrence of the other event. Events A and B are independent if P ( AB ) = P ( A) * P ( B ).
(1–4)
The rule is often referred to as the multiplication rule for probability. Independence is also defined in terms of conditional probability. If P ( A B ) = P ( A),
(1–5)
then A and B are said to be independent. Independence is a symmetrical relationship; that is, if event A is independent of event B, then event B is also independent of event A. Recall that P ( AB ) = P ( A) P ( B A) = P ( B ) P ( A B )
(1–6)
for arbitrary sets A and B, but if A and B are independent, then P ( AB ) = P ( A) * P ( B ), and P( A B ) =
P ( AB ) P( B )
=
P ( A) P ( B )
= P ( A).
P( B )
If a coin was flipped and a die rolled, it seems intuitively obvious that knowing the coin outcome would in no way influence the probability of the die outcome. The elementary sample space is
{( H , 1)( H , 2)( H , 3)( H , 4)( H , 5)( H , 6)(T , 1)(T , 2)(T , 3)(T , 4)(T , 5)(T , 6)}. Since there are 12 elementary events, the probability of each elementary outcome is 1/12, which is equal to (1/2) * (1/6), indicating that the coin and die outcomes are independent. However, in an experiment of selecting two marbles without replacement from an urn containing 3 red marbles and 2 blue marbles in seeking the probability of 2 blue marbles, the probability of the second selection would be affected by the outcome of the first selection. That is, the outcome from the second selection is dependent on the outcome of the first selection. With replacement, P (blue, blue) = (2/5) * (2/5) = 4/25. Independent Without replacement, P (blue, blue) = (2/5) * (1/4) = 2/20. Dependent But the event “second marble is blue” is independent of replacement. That is, P(2nd marble is blue) = P(blue, blue) + P(red, blue) 2 1 3 2 8 2 2 3 2 10 * + * = = * + * = 5 4 5 4 20 5 5 5 5 25 Without replacement = With replacement. =
Recall that two events are said to be mutually exclusive if the occurrence of one event precludes the simultaneous occurrence of the other. Such events
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 16
Chapter 1 Introduction to Probability
16
are clearly dependent. For example, a coin flip with the outcome “heads” implies the outcome is not “tails.” For three events to be independent, it is not enough to show that the events are independent pair-wise. All three events must be independent in concert. Consider the following example. EXAMPLE 1.3
The experiment is the rolling of a pair of fair dice. Let A be the event 1 on one die and B the event 1 on the other die. Let C be the event that the sum of the dice is odd. Show that A and B are independent, B and C are independent, and A and C are independent, but that A, B, and C are not independent events. P ( A) = P ( B ) =
1
, P (C ) =
1
; P( A B ) =
P ( AB )
1 P( A C ) =
P ( AC )
= P (C ) 1 = = P ( A), 6
P (1, S3 ) + (1, S5 ) + (1, S7 )
= 36
1/2
+
=
1/36
1 36 1/2
+
=
1
= 6 2 P( B ) 1/6 6 P(A) implying that A and B are independent events. Let Si indicate that the sum of the dice is i. Then Solution
1 36
implying that A and C are independent events, and similarly for events B and C. Thus the events A and B, A and C, and B and C are independent pair-wise, 1 1 1 1 but P ( ABC ) = 0 π P ( A) * P ( B ) * P (C ) = * * = , implying that A, B, 6 6 2 72 and C are not independent events three-wise. Independence is one of the most important properties in probability and statistics. The assumption of independence often simplifies complicated scenarios. For independent events A and B, P(A | B) = P(A) or P(B | A) = P(B), implying that P(AB) = P(A) * P(B). For mutually exclusive events A and B, P(A + B) = P(A) + P(B) - P(AB), but P(AB) = 0. EXAMPLE 1.4
Urn X has 3 red and 2 blue marbles; Urn Y has 5 red and 10 blue marbles. The experiment is to randomly choose an urn and to randomly draw a marble from the chosen urn. Compute the probability of drawing a) a blue marble; b) a red marble. 3 red
5 red
2 blue
10 blue
Urn X
Urn Y
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 17
1.5 Conditional Probability and Independence
17
Solution a) Since the urns are picked at random (equally likely), P ( X ) = P (Y ) =
1
,
P ( blue X ) =
2
2
,
P ( blue Y ) =
5
10
.
15
P ( blue) = P ( blue X ) * P ( X ) + P ( blue Y ) * P ( X ) 2 1 10 1 = * + * 5 2 15 2 8 = . 15 b) P ( red) = P ( red X ) * P ( X ) + P ( red Y ) * P (Y ) 3 1 5 1 = * + * 5 2 15 2 7 = 15 8 8 = 1 - , confirming P (blue) = . 15 15
EXAMPLE 1.5
Consider the same experiment as in Example 1.4. Compute a) P(X | blue),
b) P(X | red),
c) P(Y | blue),
d) P(Y | red).
Solution
a) P ( X
b) P ( X
c) P (Y
d) P (Y
1 2 * 2 5 =3 blue) = = = P ( blue) P ( blue) 8/15 8 1 3 * P ( X , red ) P ( X ) * P ( red X ) 2 5 9 red ) = = = = P ( red ) P ( red ) 7/15 14 1 10 * P (Y , blue) P ( T ) * P ( blue T ) 2 15 5 blue) = = = = P ( blue) P ( blue) 8/15 8 1 5 * P (Y , red ) P (Y ) * P ( red Y ) 2 15 5 red ) = = = = P ( red ) P ( red ) 7/15 14 P ( X , blue)
P ( X ) * P ( blue X )
Observe that P(X | blue) + P(Y | blue) = 1 and that P(X | red) + P(Y | red) = 1.
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 18
Chapter 1 Introduction to Probability
18
Table 1.4
Table Format for Conditional Probability
Marble
Urn X 1 3 3 * = 2 5 10 1 2 1 * = 2 5 5 1 P( X ) = 2
Red Blue Marginal Probability
Urn Y
Marginal Probability
1
5 1 = * 2 15 6 1 10 1 = * 2 15 3 1 P (Y ) = 2
P(red) = P(blue) =
7 15 8 15
1
Conditional probability can also be displayed in table format. Examples 1.4 and 1.5 are shown in Table 1.4. The compound probabilities are shown in the table proper, e.g., 1 3 3 * = . 2 5 10 The marginal (unconditional) probabilities of red, blue, Urn X, and Urn Y are in the table margins. Notice that the sum of the marginal probabilities (the probabilities in the margins) is 1. To compute P(X | red), look at the end of the first (red) row to see 7/15 total probability of being red. Take the ratio of Urn X’s contribution 3/10 (red AND urn X) to the total probability of being red 7/15. That is, P ( X , Red ) =
P ( X red ) =
P ( X , red )
P ( red ) 3/10 9 = = . 7/15 14 Similarly, to compute P(red | X), look at the Urn X column and take the ratio of the compound event red AND urn X (3/10) to the total probability 1/2 of urn X to get (3/10)/(1/2) = 3/5. EXAMPLE 1.6
Complete the joint probability table below by filling in the blanks and compute the following probability statements: a) P(BY ), b) P(B | Y ), c) P(Y | B), d) P(Z), e) P(ZC ), f) P(C), g) P(X + C ), h) P(D | X ) + P(D | Y ) + P(D | Z ). i) Are A and D mutually exclusive? j) Are A and X independent? A
B
C
D
Total
25
25 40 20
100
X Y Z
20 25
0
Total
70
60
50
50
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 19
1.6 Bayes’s Rule
19
Solution The missing row-wise entries are: Row Row Row Row
X: 30 Y: 25, 35, and 150 Z: 5 Total: 85, 85, and 300.
A
B
C
D
Total
X Y Z
20 25 25
30 50 5
25 35 0
25 40 20
100 150 50
Total
70
85
60
85
300
P(BY ) = 50/300 = 1/6. b) P(B | Y) = 50/150 = 1/3. P(Y | B) = 50/85 = 10/17. d) P(Z) = 50/300 = 1/6. P(ZC ) = 0. f) P(C) = 60/300 = 1/5. P(X + C ) = P(X) + P(C) - P(XC) = 1/3 + 1/5 - 1/12 = 9/20. 25 40 20 85 h) P ( D X ) + P ( D Y ) + P ( D Z ) = + + = . 100 150 50 300 i) Yes, AD = F. j) No, since P(A) = 70/300 π P(A | X) = 20/100.
a) c) e) g)
1.6
Bayes’s Rule Reverend Thomas Bayes was a pioneer in probability theory (subjective interpretation). One begins with a prior distribution in mind and, upon seeing the results of empirical tests, revises the prior distribution to a posterior distribution. Examples 1.4 and 1.5 are applications of Bayes’s rule, which is just a slightly extended application of conditional probability. Recall P(AB) = P ( B A) P ( A) P(A | B)P(B) = P(BA) = P(B | A)P(A). Observe that P ( A B ) = . P( B ) Thus, with a sample space of n collectively exhaustive, disjoint events {Ai}, and arbitrary event B with P(B) > 0, P ( Ai B ) = [ P ( B Ai ) * P ( Ai )] / P ( B ) where n
P ( B ) = Â P ( B Ai ) P ( Ai ). See Figure 1.4. i =1
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 20
Chapter 1 Introduction to Probability
20
A2
A1
B A 2B
A 1B A 4B
A3
A 3B A 6B
A 5B
A4
A6 A5
6
6
P (B ) =Â P(BAi) = ÂP(B|Ai)P(Ai). i=1
Figure 1.4
i=1
Partition of Ellipse B by Sets Ai
The denominator P(B) is often referred to as the total probability for B. In Figure 1.4 the sample space is partitioned into disjoint subsets and the event B (the ellipse) is the union of the intersection of B with the set of partitions Ai. A partition of a set S is a subdivision of disjoint subsets of S such that each s Œ S is in one and only one of the subsets. An application of Bayes’s rule illustrating why one should not panic upon hearing rare bad news is given in the following example. EXAMPLE 1.7
Suppose that 3/4% of a population have a terminal disease and that the test to detect this disease is 99% accurate in identifying those with the disease and 95% accurate in identifying those without the disease. Compute the probability that one has the disease given that the test so indicates. Let D ~ Disease, ND ~ No Disease, TP ~ Tested Positive. Apply Bayes’s rule to get P ( D TP ) = =
P ( TP D) P ( D) P ( TP D) P ( D) + P ( TP ND) P ( ND) 0.99 * 0.0075
(0.99 * 0.0075) + (0.05 * 0.9925) = 0.13 or 13%. If the test were perfect, 3 out of 400 would test positive, whereas with the current test and a perfectly healthy population 20 out of 400 would indicate positive falsely.
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 21
1.6 Bayes’s Rule
21
In table form,
Have Disease Not Have Disease Total Probability
Tested Positive
Tested Negative
Total Probability
0.0075 * 0.99 = 0.007425 0.9925 * 0.05 = 0.049625 0.05705
0.0075 * 0.01 = 0.000075 0.9925 * 0.95 = 0.942875 0.94295
0.0075 0.9925 1
Because the disease is relatively rare, the odds favor a mistaken test (tested positive incorrectly, or false positive) rather than the person actually having the disease. Let TPC indicate Tested Positive Correctly, TPI indicate Tested Positive Incorrectly, TNC indicate Tested Negative Correctly, and TNI indicate Tested Negative Incorrectly. Disease
No Disease
Total TPC + TPI TNI + TNC
Test Positive Test Negative
TPC TNI
TPI TNC
Total
TPC + TNI
TPI + TNC
The sensitivity of the test is the conditional probability TPC/(TPC + TNI). In words, sensitivity is the ratio of those who correctly tested positive to all those who actually have the disease. The specificity of the test is the conditional probability TNC/(TPI + TNC ). In words, specificity is the ratio of the number who correctly tested negative to the total number who actually do not have the disease. The positive predictive value of the test is the conditional probability TPC/(TPC + TPI), the ratio of the number who correctly tested positive to the total number who tested positive. The negative predictive value of the test is the conditional probability TNC/(TNC + TNI ), the ratio of the number who correctly tested negative to the total number who tested negative. The prevailing rate is the proportion of the total number of people who actually have the disease (TPC + TNI)/(TPC + TNI + TPI + TNC ). EXAMPLE 1.8
Compute the sensitivity, specificity, and predictive values from the following 1000 test results. ACTUAL DIAGNOSIS Disease No Disease
Total
Test Positive Test Negative
180 20
10 790
190 810
Total
200
800
1000
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 22
Chapter 1 Introduction to Probability
22
Solution TPC = 180, TPI = 10, TNI = 20, TNC = 790. Sensitivity = 180/(180 + 20) = 0.90, Specificity = 790/(10 + 790) = 0.9875, Positive Predictive Value = 180/(180 + 10) = 0.947, Negative Predictive Value = 790/(20 + 790) = 0.975, Prevailing rate = (180 + 20)/1000 = 0.20. Sensitivity is the probability of X testing positive given that X has the disease; specificity is the probability of X not having the disease given that the test was negative; the positive predictive value is the probability of X having the disease given that X tested positive; and the negative predictive value is the probability of X not having the disease given that X tested negative. EXAMPLE 1.9
In an urn are five fair coins, three 2-headed coins, and four 2-tailed coins. A coin is to be randomly selected and flipped. Compute the probability that the coin is fair if the result was a) the flip was heads, or b) 2 flips were both heads. Use prior and posterior probabilities. Solution Let Fair denote the set of fair coins, 2H the set of 2-headed coins, and 2T the set of 2-tailed coins. P(Fair) = 5/12 prior. After the result of the coin flip is heads (Head), Bayes’s rule is applied to get the posterior probability. a) P (Fair Head) = =
P (Head Fair) P (Fair) P (Head Fair) P (Fair) + P (Head 2 H ) P (2 H ) + P (Head) 2T ) P (2T ) (1/2)(5/12)
(1/2)(5/12) + (1)(3/12) + (0)(4/12)
= 5/11 posterior probabliity which becomes the prior probability for the 2nd flip. b) Method I: Use prior probability 5/11 and thus 6/11 for selecting a 2headed coin. P(Fair Head) =
(1/2)(5/11)
(1/2)(5/11) + (1)(6/11) + (0)(0) = 5/17.
Method II: Start experiment anew. P(Fair Head,Head) =
(1/4)(5/12)
(1/4)(5/12) + (1)(3/12) + (0)(4/12) = 5/17.
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 23
1.7 Counting the Ways
23
The probability of heads given that the coin is fair is 1/2; given that the coin is 2-headed, is 1; and given that the coin is 2-tailed, is 0. The initial prevailing rate is 5/8.
1.7
Counting the Ways We introduce the various sophisticated methods of counting the ways that events can occur from tree diagrams, recursion, permutations, and combinations. The examples show the principles behind these methods. The reader should be aware of the many various but equivalent models for solving probability problems by counting the number of favorable ways for events to occur. Factors to consider in counting are whether the objects are distinguishable, replaceable, or ordered.
Two Fundamental Principles of Counting (FPC) and the Pigeonhole Principle If there are n1 different items in set 1, n2 different items in set 2, etc., for r disjoint sets, then the number of ways to select an item from one of the r sets is n1 + n2 + . . . + nr. This principle is referred to as the addition principle. If there are n1 outcomes for the first decision in stage 1, followed by n2 outcomes for the second decision in stage 2, followed by n3 for the 3rd decision in stage 3, etc., for r stages, where each decision is independent of all prior decisions, then all together there are n1 * n2 * . . . * nr. This Cartesian product is the total number of ways that the r sequence of decisions can be made. This principle is also referred to as the multiplication principle. Pigeonhole Principle: If the number of pigeons exceeds the number of pigeonholes, then some pigeonhole has at least 2 pigeons. EXAMPLE 1.10
a) At a picnic of 50 people, 30 had hamburgers, 25 had hotdogs, and 15 had both. How many had neither? b) How many ways can a 20-question true/false test be answered? c) In a drawer are 12 black socks and 12 white socks. What is the minimum number of socks to randomly pull out of the drawer to ensure getting a matching pair? d) In an urn are 3 red, 4 white, and 5 blue marbles. How many ways can a sample of 4 marbles be selected so that the sample contains a marble of each color? Solution a) Disjoint sets are A = just hamburgers, B = just hotdogs, C = both hamburgers and hotdogs, and D = neither hamburgers nor hotdogs.
P369463-Ch001.qxd
9/2/05
24
10:56 AM
Page 24
Chapter 1 Introduction to Probability
Applying the addition FPC for disjoint sets, A + B + C + D = 50 or (30 15) + (25 - 15) + 15 + D = 50 fi D = 10 had neither. b) There are 2 choices for each question. Applying the multiplication FPC yields 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 = 220 = 1,048,576. The probability (0.00000095367) of guessing all 20 questions correctly is less than 1 in a million. c) There are two pigeonholes (black and white). Pull out 3 socks. d) Using both the addition and multiplication FPCs, we have 2 R 1W 1B or 1R 2W 1B or 1R 1W 2 B yielding Ê 3ˆ Ê 4ˆ Ê 5ˆ + Ê 3ˆ Ê 4ˆ Ê 5ˆ + Ê 3ˆ Ê 4ˆ Ê 5ˆ = 60 + 90 + 120 = 270. Ë 2¯ Ë 1¯ Ë 1¯ Ë 1¯ Ë 2¯ Ë 1¯ Ë 1¯ Ë 1¯ Ë 2¯ Consider the automorphic mappings from the discrete domain space 1, 2, and 3 into and onto the discrete range space 1, 2, and 3. If the point 1 is mapped into 1, then the 1 is said to be a fixed point of the mapping. A mapping with no fixed points is called a derangement. EXAMPLE 1.11
(Matching Problem) a) How many ways can the integers 1, 2, and 3 be uniquely mapped onto themselves? b) Compute the probability of at least one of the integers being a fixed point or self-assigned. c) Compute the probability of a derangement, no integer being self-assigned. Solution a) We can map the first integer to any of the 3, the second integer to any of the two remaining, and the third integer to the one remaining, giving by the multiplication FPC a sequence of choices of 3 * 2 * 1 = 6 maps. The maps are enumerated as follows: 1 2 3* 123
1 2 3* 132
1 2 3* 213
123 231
123 312
1 2 3* 321
b) Let A1 be the set of maps where 1 Æ 1, A2 the set of maps where 2 Æ 2, and A3 the set of maps where 3 Æ 3 (fixed points). We seek P(A1 + A2 + A3). Since the maps (elementary events) are enumerated, we can simply count the number of mappings with at least one match (starred) to compute the probability as 4/6. The inclusion/exclusion yields P ( A1 + A2 + A3 ) = P ( A1 ) + P ( A2 ) + P ( A3 ) - P ( A1 A2 ) - P ( A1 A3 ) - P ( A2 A3 ) + P ( A1 A2 A3 ), 1 1 1 1 1 1 1 1 1 1 1 1 = + + - * - * - * + * * , 3 3 3 3 2 3 2 3 2 3 2 1 4 = (the probability of at least one match). 6
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 25
1.7 Counting the Ways
25
1 1 1 * = , confirming that if 2 3 2 6 1 1 1 1 match, then all 3 match with P ( A1 A2 A3 ) = P ( A1 A2 A3 ) = * * = . 3 2 1 6 Note that P ( A1 A2 ) = P ( A1 ) P ( A2 A1 ) =
That is, no map of n integers can assign exactly (n - 1) matches. For n = 3, either all positions match, or one of the positions matches, or none of the positions match. c) P(derangement) = 1 - P(A1 + A2 + A3) = 1 - 4/6 = 1/3. (back door) P(derangement 1 Æ 2 Æ 3) = (1/3)(1/2)(1) = 1/6; P(derangement 1 Æ 3 Æ 2) = (1/3)(1/2)(1) = 1/6 fi 1/6 + 1/6 = 1/3. (front door)
The command (pm n r) returns the probability of exactly r matches in the n! permutation maps. (pm 3 0) returns 1/3, the probability that none match; (pm 3 1) returns 1/2, the probability that exactly 1 matches; (pm 3 2) returns 0, the probability that exactly 2 match; (pm 3 3) returns 1/6, the probability that exactly 3 match. (N n r) returns the number of permutation maps with exactly r matches. (N 50 25) Æ returns 721331190766322471793800016473143520 448. (print-permutations ordered-list-of-integers) prints the permutations of the list. (print-permutations '(1 2 3)) prints (1 2 3) (1 3 2) (2 1 3) (2 3 1) (3 1 2) (3 2 1). # # maps (print-maps 3) returns 0 2 Compute the probability of at 1 3 least one match in the map2 0 pings of (1 2 3). (3 + 0 + 1)/6 3 1 = 4/6. The number of maps sum to 3! = 6. Try (print-maps 50) to get a look at the 50! number of maps.
EXAMPLE 1.12
Consider all unique 4-digit integers formed from the set {0 to 9} with neither leading zeros nor repetition of digits (sampling without replacement). Notice that order is a factor (permutation).
P369463-Ch001.qxd
26
9/2/05
10:56 AM
Page 26
Chapter 1 Introduction to Probability
a) b) c) d) e)
How How How How How
many many many many many
4-digit integers can be formed? are odd? are even? end with the digit 3? are even if the sampling is done with replacement?
Solution a) As an aid in applying the multiplication FPC we draw 4 dashes to be filled by our choices for each position: __ __ __ __. For the first digit, there are 9 choices (not 10), because we have specified no leading zeros. For the second digit, there are also 9 choices, because we used one but now zero can be used. For the third digit there are 8 choices, and for the final digit there are 7, resulting in 9 * 9 * 8 * 7 = 4536 integers. b) Since odd integers are specified, there are 5 choices for the final digit, leaving 8 (no leading 0) for the first digit, 8 (0 permitted) for the second digit, and 7 for the third digit, resulting in 8 * 8 * 7 * 5 = 2240 odd integers. Notice that the order of fill is actually arbitrary, but it is helpful to take care of the more constrained choices first. c) Since even integers are specified with no leading zeros, we first compute the number of ways to form the even integers without ending in zero and then the number of ways to form even integers ending in zero. We have 8 * 8 * 7 * 4 = 1792 even integers not ending in zero and 9 * 8 * 7 * 1 = 504 even integers ending in zero giving a total of 2296 even integers. Notice that the total number of integers is 4536 and that the number of odd integers is 2240. Thus the number of even integers is 4536 2240 = 2296. d) The number of integers ending in the digit 3 has exactly one choice for the final digit, 8 for the first and second with no leading zero, and 7 for the third, yielding 8 * 8 * 7 * 1 = 448 integers ending in 3 or (1 5 7 9). Note that this result implies that the total number of odd integers is 5 * 448 = 2240.
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 27
1.7 Counting the Ways
27
1 2
3
3 123 123 123
3 2 1
2 3 132 213 123 132
Figure 1.5
123 213
3
2
1
1 1 2 231 321 312 123 231
123 321
123 312
Permutation Maps of (1 2 3)
e) With repetition, our fourth digit can be chosen from the 5 even {0, 2, 4, 6, 8}, the first digit can be chosen from 9 (no 0), and the second digit can be chosen from 10, as can the third digit, for 9 * 10 * 10 * 5 = 4500 even integers with replacement. Observe we have repeatedly used the multiplication FPC. In many probability problems the FPC is used with other combinatory ways to account for all the possible choices.
Tree Diagrams Tree diagrams display the outcomes of an experiment and can pictorially depict the fundamental principle of counting. The 6 permutation maps of the integers {1 2 3} are shown as a tree diagram and directly in Figure 1.5. When probabilities are assigned to the outcomes, the tree diagram is referred to as a probability diagram. Figure 1.6 represents the outcomes from flipping three coins with 1/4 probability of heads and 3/4 probability of tails. The notation H 1/4 indicates the event heads occurred with probability 1/4. EXAMPLE 1.13
The experiment is flipping an unfair coin three times. The probability of heads is 1/4. Compute the probability of exactly 0, 1, 2, and 3 heads in 3 coin flips. Sum the probabilities. See Figure 1.6. Solution
P(0 head) = P(TTT) = (3/4)3 = 27/64 P(1 head) = P(HTT) + P(THT) + P(TTH) = 3*(1/4)(3/4)2 = 27/64 P(2 heads) = P(HHT) + P(THH) + P(HTH) = 3*(1/4)2(3/4) = 9/64 P(3 heads) = P(HHH) = (1/4)3 = 1/64 27/64 + 27/64 + 9/64 + 1/64 = 64/64 = 1.
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 28
Chapter 1 Introduction to Probability
28
H 1/4 HHH 1/64 T 3/4
H 1/4
HHT 3/64 T 3/4
H 1/4 HTH 3/64
H 1/4 T 3/4
HTT 9/64 H 1/4 H 1/4
T 3/4
THH 3/64 T 3/4 THT 9/64
T 3/4 H 1/4
TTH 9/64
T 3/4
Figure 1.6
TTT 27/64
Probability Diagram: Flipping a Coin 3 Times with P(H) = 1/4
Permutations Permutations are ordered arrangements of objects. With n distinguishable objects, an arrangement of r (r £ n) of them can be achieved with replacement in nr ways by the fundamental principle of counting. With n distinguishable objects, an arrangement of r of them can be achieved without replacement in n
Pr = n( n - 1)( n - 2) . . . ( n - r + 1) ways, again by the FPC, n( n - 1)( n - 2) . . . ( n - r + 1)( n - r )( n - r - 1) . . . * 2 * 1 n! = = . ( n - r )( n - r - 1) . . . * 2 * 1 ( n - r )!
Similar notations are ( permutation n r ) = P ( n, r ) = Prn =
n! ( n - r )!
(1–7)
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 29
1.7 Counting the Ways
29
For positive integer n, n! is defined as n(n - 1)(n - 2) . . . 3 * 2 * 1, and 0! = 1. Permutations are indicated when the word “arrangement” or “order” is used. EXAMPLE 1.14
List the 6 permutations of the three letters a, b and c. For 3 objects taken 3 at a time, the number of permutations using 3! equation (1–7) is = 6. (3 - 3)! Solution
The 6 permutations are: abc, acb, bca, bac, cab, cba. The command (permute '(A B C) case) returns a list of the 3! permutations of A B C. Case is either D for distinguishable or I for Indistinguishable. If not given, I is assumed. (permute '(A B C) 'D) Æ ((A B C) (A C B) (B A C) (B C A) (C A B) (C B A)); Distinguishable (permute '(A B A ) 'I) Æ ((B A A) (A B A) (A A B)); A’s Indistinguishable (permute '(A B A) 'D) Æ ((A B A) (A A B) (B A A) (B A A) (A B A) (A A B)); A’s Distinguishable EXAMPLE 1.15
How many ways can 3 of 7 different books be arranged on a shelf? Use the multiplication FPC to get 7 * 6 * 5 = 210 permutations or 7! 5040 = = 210. 7 P3 to get (7 - 3)! 24 Solution
The command (permutation n r) returns nPr, if n is a number. (permutation 5 3) returns 60. The command (permutation list n) returns the arrangements of n items from the list set. For example, (permutation '(1 2 3 4 5) 3) returns the 60 permutations taken 3 at a time: ((1 (1 (2 (3 (3 (4 (5 (5
2 4 3 1 4 2 1 3
3) 5) 4) 2) 5) 3) 2) 4)
(1 (1 (2 (3 (3 (4 (5 (5
2 5 3 1 5 2 1 4
4) 2) 5) 4) 1) 5) 3) 1)
(1 (1 (2 (3 (3 (4 (5 (5
2 5 4 1 5 3 1 4
5) 3) 1) 5) 2) 1) 4) 2)
(1 (1 (2 (3 (3 (4 (5 (5
3 5 4 2 5 3 2 4
2) (1 4) (2 3) (2 1) (3 4) (4 2) (4 1) (5 3))
3 1 4 2 1 3 2
4) 3) 5) 4) 2) 5) 3)
(1 (2 (2 (3 (4 (4 (5
3 1 5 2 1 5 2
5) 4) 1) 5) 3) 1) 4)
(1 (2 (2 (3 (4 (4 (5
4 1 5 4 1 5 3
2) 5) 3) 1) 5) 2) 1)
(1 (2 (2 (3 (4 (4 (5
4 3 5 4 2 5 3
3) 1) 4) 2) 1) 3) 2)
P369463-Ch001.qxd
9/2/05
30
EXAMPLE 1.16
10:56 AM
Page 30
Chapter 1 Introduction to Probability
Compute the probability of getting an actual 4-letter word from the permutations of the word STOP. Solution The only method to solve this problem is to enumerate the 4! = 24 word candidates. By enumeration the permutations are listed below with 6 recognizable words. The probability is thus 6/24 = 1/4. TPOS TPSO TSPO STPO PTOS PTSO PSTO SPTO POTS POST PSOT SPOT TOPS TOSP TSOP STOP OTPS OTSP OSTP SOTP OPTS OPST OSPT SOPT
The command (permute '(s t o p)) returns the 24 permutations.
EXAMPLE 1.17
How many distinct 4-letter “words” can be made from the word “book”? Solution We notice the 2 indistinguishable letter o’s and reason that there are 4! = 24 ways of arranging the letters but 2! ways are indistinguishable. Thus the number of distinct words is 4!/2! = 12. We do not distinguish between the words bo1o2k, and bo2o1k, for example, by equating o1 to o2.
The command (permute '(b o o k) 'I) returns 12 indistinguishable permutations ((B O O K) (B O K O) (B K O O) (O B O K) (O B K O) (O O B K) (O O K B) (O K B O) (O K O B) (K B O O) (K O B O) (K O O B)) (perm-list '(b o o k)) returns 12; (perm-list '(m i s s i s s i p p i)) returns 34650.
In general, if there are n items to permute with n1 of one kind and n2 of another, and so on, up to nr of the rth kind, where n = n1 + n2 + . . . + nr, then the number of distinct permutations of the n objects is given by n! n Ê ˆ Ë n ,n , . . . , n ¯ = n !n !. . . n ! 1 2 r 1 2 r
(1–8)
n ˆ are called multinomial coefficients from the expansion of n1, n2 , . . . , nr ¯ (x1 + x2 + . . . + xr)n. The coefficient of x2y3z in the expansion of (2x + 3y - 4z)6 is 6! 2233(-4)1 = -25, 920. 2! 3!1! Ê The Ë
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 31
1.7 Counting the Ways
EXAMPLE 1.18
31
a) How many distinct 15-letter permutations are contained in the word “morphophonemics”? b) Find the term in the expansion of (2x2 - 3xy2 + 4z3)5 containing x7y2. Solution a) There are 2m’s, 3 o’s, 2p’s, and 2h’s; thus P(15; 2, 3, 2, 2,1,1,1,1,1,1) =
15! 3!*(2!)3 *(1!)6
= 27, 243, 216, 000.
The command (perm-list '(m o r p h o p h o n e m i c s)) Æ 27243216000. 5 ˆ 5 ˆ a 2 ( -3)b 4c x 2 a + b y 2 b z 3 c (2x 2 )a ( -3 xy 2 )b (4 z 3 )c = ÊË b) ÊË a b c¯ a b c¯ 2a + b = 7 and 2b = 2 fi b = 1 and a = 3 with c = 1. Thus the term is
5!
* 23 * ( -3)(4) x 7 y 2 z 3 = 20 * ( -96) x 7 y 2 z 3 =
3!1!1! -1920x7y2z3. EXAMPLE 1.19
(birthday problem). Compute the probability that in a random gathering of n people, at least 2 of them share a birthday, under appropriate assumptions of each is equally likely to be born on any of the 365 days and ignore twins and February 29 of leap years. Solution To have distinct birthdays the first person can be born on any of the 365 days, the next person on any of the remaining 364 days, and so on, and the nth person can be born on any of the 365 - n + 1 remaining days, yielding 365Pn distinct birthdays. The total number of possible birthday occurrences is 365n. Thus the probability of no one sharing a birthday is bability of at least two people sharing a birthday is 1 -
EXAMPLE 1.20
365
Pn
365n 365 Pn 365n
, and the pro.
Find the number of people to ask in order to find a person with the same birth month and day as yours with probability of 1/2. P(1 person not sharing your birthday)=
364
.
365 n
P ( n people not sharing your birthday)=
Ê 364 ˆ . Ë 365 ¯
P369463-Ch001.qxd
32
9/2/05
10:56 AM
Page 32
Chapter 1 Introduction to Probability
Table 1.5
Probabilities of at Least 2 People Sharing a Birthday
n
P(n)
n
P(n)
n
P(n)
n
P(n)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
0.0000 0.0027 0.0082 0.0163 0.0271 0.0404 0.0562 0.0743 0.0946 0.1169 0.1411 0.1670 0.1944 0.2231 0.2529 0.2836 0.3150 0.3469 0.3791 0.4114
21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
0.4436 0.4756 0.5072 0.5383 0.5686 0.5982 0.6268 0.6544 0.6809 0.7063 0.7304 0.7533 0.7749 0.7953 0.8143 0.8321 0.8487 0.8640 0.8782 0.8912
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
0.9031 0.9140 0.9239 0.9328 0.9409 0.9482 0.9547 0.9605 0.9657 0.9703 0.9744 0.9780 0.9811 0.9838 0.9862 0.9883 0.9901 0.9916 0.9929 0.9941
61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
0.9950 0.9959 0.9966 0.9971 0.9976 0.9980 0.9984 0.9987 0.9989 0.9991 0.9993 0.9994 0.9995 0.9996 0.9997 0.9997 0.9998 0.9998 0.9998 0.9999
n
Ê 364 ˆ Thus, 1 = 1/2 fi n = 252.65 or 253 people. Ë 365 ¯ Table 1.5 shows the birthday computations for n people from 1 to 80. The probability of at least 2 people sharing a birthday becomes greater than 1/2 with 23 people (n = 23). We also note that this same phenomenon pertains to deaths, for example, the probability that at least two former presidents of the United States share a birth month and day and the probability that at least two former presidents share a death day and month is the same. From the table for n = 43 presidents, the probability is 0.9239. James Polk and Warren Harding were born on November 2; Milliard Fillmore and William Howard Taft died on March 8. John Adams, Thomas Jefferson, and James Monroe all died on July 4. Of course, these presidents’ birth and death events do not prove the phenomenon. The command (birthday n) returns the probability of at least 2 of n people sharing a birthday. For example, (birthday 23) returns 0.5073.
Combinations Combinations are sampling sets of permutations without replacement and without regard for order. For example, our hand in a card game is the same
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 33
1.7 Counting the Ways
33
no matter what the order of the cards received or arranged. The number of combinations of r objects selected from a set of n objects is given by various notations such as n! n (combination n r ) = C ( n, r ) = Crn = n Cr = ÊË ˆ¯ = r r!( n - r )! for r = 0, 1, 2, . . . , n.
(1–9)
Combinations are a special case of the multinomial coefficients with two distinct kinds. When one kind is selected, the other kind is also separated from the pack, that is, one cannot fence in without fencing out. n Each of the ÊË ˆ¯ combinations may be permuted by r!, creating r! * r n Ê ˆ = P permutations. Ë r¯ n r Note that n - (n - r) = r so that n! Ê nˆ = Ê n ˆ = Ë r ¯ Ë n - r ¯ r!( n - r )! . EXAMPLE 1.21
(1–10)
Consider combinations of the 4 letters in the set {A B C D} taken 3 at a time to show that 4C3 = 4C1. Solution ABC (D), ACD (B), BCD (A), BDA (C). Included are the bold triplets comprising the left side of the equation 1-10; excluded are the singletons comprising the right side. There are 4 of each. The command (combination n r) returns nCr or if n is a list, the ways to select r items from the list. (combination 4 3) returns 4, where as (combination '(A B C D) 2) prints (A B C) (A B D) (A C D) (B C D). (combination-list '(A B C D) 3) Æ ((A B C) (A B D) (A C D) (B C D)) what is selected 4C3 ( (D) (C) (B) (A) ) what is not selected 4C1 Combinations are sometimes referred to as binomial coefficients because of the relationship to the binomial expansion of (x + 1)n. For example, (x + 1)3 = x3 + 3x2 + 3x + 1, and the coefficients 1, 3, 3, 1 correspond to 3C0, 3C1, 3C2 and 3C3. For positive integer n, we have the binomial expansion n
( x + y)n =
n
 ÊË r ˆ¯ x
r
yn-r .
r =0
Notice the symmetry due to nCr = nCn-r in equation (1-10). Notice that for x = y = 1,
P369463-Ch001.qxd
9/2/05
34
10:56 AM
Page 34
Chapter 1 Introduction to Probability n
(1 + 1)n = 2n =
n
 ÊË r ˆ¯ r =0
and that for x + y = 1, n
1=
n
 ÊË r ˆ¯ x
r
yn-r .
r =0
EXAMPLE 1.22
The start of Pascal’s triangle is shown below. The first row is the 0th row. a) Write the 9th row. b) Observe that each entry other than 1 is the sum of the two entries immediately above it. Write this observation and give an interpretation of one particular item being selected in the r objects from the n objects. c) Explain why the sum of each row in the triangle is 2n where n is the row number. 2
2
2
2
n n n n 2n d) Verify for any row that ÊË ˆ¯ + ÊË ˆ¯ + ÊË ˆ¯ + . . . + ÊË ˆ¯ = ÊË ˆ¯ . 0 1 2 n n 1 1 1 1 2 1 1 3 3 1 1 4 6 4 1 1 5 10 10 5 1 ................... Solution Ê 9ˆ a) Ë ¯ 0 1 b)
Ê 9ˆ Ê 9ˆ Ê 9ˆ Ê 9ˆ Ê 9ˆ Ê 9ˆ Ë 1¯ Ë 2¯ Ë 3¯ Ë 4¯ Ë 5¯ Ë 6¯ 9 36 84 126 126 84 Ê nˆ = Ë r¯
Total ways to select r from n.
Remove item from set. Select r - 1 from n - 1 and reinsert item into each set n
n n c) (1 + 1) = 2 =
Ê 9ˆ Ê 9ˆ Ê 9ˆ Ë 7¯ Ë 8¯ Ë 9¯ 36 9 1 Ê n - 1ˆ Ë r - 1¯
+
Ê n - 1ˆ Ë r ¯ Remove item from set and select r from n - 1 items without item.
n
 ÊË r ˆ¯ r =0
d) For the 4th row 1 4 6 4 1: 1 + 16 + 36 + 16 + 1 = 70 = 8C4.
The command (pascal n) returns the nth row of Pascal’s triangle. (pascal 10) returns (1 10 45 120 210 252 210 120 45 10 1).
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 35
1.7 Counting the Ways
EXAMPLE 1.23
35
Use combinations to determine how many distinct 15-letter permutations are contained in the word morphophonemics. Solution There are 2m’s, 3 o’s, 2p’s, and 2h’s; thus 15!
15ˆ Ê12ˆ Ê10ˆ Ê 8ˆ Ê 6ˆ Ê 5ˆ Ê 4ˆ Ê 3ˆ Ê 2ˆ Ê1ˆ 3 ¯ Ë 2 ¯ Ë 2 ¯ Ë 2¯ Ë 1¯ Ë 1¯ Ë 1¯ Ë 1¯ Ë 1¯ Ë1¯ 3! 2! 2! 2!(1!) = 455 * 66 * 45 * 28 * 720 = 27, 243, 216, 000. 6
= ÊË
After choosing 3 from the 15, we are left with 12 to choose the 2 of a kind, and then 10 to choose 2, 8 to choose 2, and 6! ways to choose the remaining 6, followed by an application of the FTC. Of course, the order of the choice of letters is arbitrary. EXAMPLE 1.24
Mr. and Mrs. Zero want to name their child so that the three initials are in alphabetical order. How many choices are there? Solution It is good practice to solve a simpler problem where enumeration can be used to gain insight into the original problem. Suppose the last name ended in the letter E. Enumerating the solution yields AB-E AC-E AD-E BC-E BD-E CD-E and there are 6 alphabetical pairs of initials from selecting a combination of 4 taken 2 at a time. Thus any two letters from the 25 letters A to Y can be selected to form a set of alphabetical initials with Z, giving 25C2 = 300. The command (print-combinations '(A B) 25) generates the 300 candidate combinations for the first and middle initials in lexicographic order. (A B) (A C) (A D) . . . (A W) (A X) (A Y) (B C) (B D) (B E) . . . (B W) (B X) (B Y) ... (W X) (W Y) (X Y).
EXAMPLE 1.25
An n-gon is a regular polygon with n equal sides. Find the number of diagonals of a) a hexagon (6-gon); b) a 10-gon; c) the n-gon with the same number of diagonals as sides. Solution a) The number of vertices in a hexagon is 6. Two points determine a line. Thus 6C2 = 15 lines, 6 sides and 9 diagonals. b) 10C2 = 45, thus 10 sides and 35 diagonals. c) Number of lines minus the number of sides = number of diagonals. nC2 - n = n fi n(n - 1)/2 - 2n or n = 5; thus a pentagon.
P369463-Ch001.qxd
9/2/05
36
EXAMPLE 1.26
10:56 AM
Page 36
Chapter 1 Introduction to Probability
(5-card poker). In a deck of 52 playing cards there are 13 different ranks from the ace, deuce, trey, etc., to the jack, queen, king or from 1 to 13. We compute the probabilities for designated 5-card poker hands. Each hand means nothing more, nothing less, e.g., 4 of a rank is not considered two pairs; neither is one pair considered in 3 of a rank or a straight considered in a flush or a flush in a straight. The ace is considered to be both low and high in a straight. A bust is a hand of 5 cards with at least 2 suits present, without duplicates and with not all 5 cards consecutive. Top-down combination hierarchy can be used for symmetrical sample spaces. There are deck.
52
C5 = 2,598,960 ways to select the 5 cards from the 52 in the
The command (combination n r) returns nCr, for example, (combination 52 5) returns 2598960.
Let NOWTS denote number of ways to select. A canonical form of each holding is shown with letters. For example, the canonical pattern [ww xyz 4 ranks] is used to designate a poker hand of exactly one pair with 4 (w x y z) of the 13 ranks represented. Probabilities are computed for the following poker hands along with the odds (q/p) to 1 against. NOWTS 4 ranks NOWTS the rank in hand from to be the pair 13 in deck
NOWTS 2 of pair NOWTS 1 from 4 from 4 in deck for each (the pair) single from the remaining 3 ranks
3
P(1 pair) =
Ê13ˆ Ê 4ˆ Ê 4ˆ Ê 4ˆ Ë 4 ¯ Ë 1¯ Ë 2¯ Ë 1¯ Ê 52ˆ Ë 5¯
=
1, 098, 240
ª 0.42257 or 1.37 : 1.
2, 598, 960 [ww x y z 4 ranks]
13 Numerator: ÊË ˆ¯ fi e.g. 4 ranks; [ Jacks, Fives, Sevens, Kings]; 4 Ê 4ˆ fi e.g.; [ J, J, F, S, K] Ë 1¯ Ê 4ˆ fi e.g.; [ J , J , 5, 7, K] H S Ë 2¯ Ê 4ˆ fi e.g.; [ J , J , 5 , 7, K] H S C Ë 1¯
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 37
1.7 Counting the Ways
37
Ê 4ˆ fi e.g.; [ J , J , 5 , 7 , K] H S C D Ë 1¯ Ê 4ˆ fi e.g.; [ J , J , 5 , 7 , K ]. H S C D D Ë 1¯ NOWTS 3 ranks NOWTS which 2 NOWTS the pair NOWTS the single in hand from 13 of 3 ranks to cards from 4 card from 4 for the ranks in deck be pairs for each pair remaining rank 2
P(2 pairs) =
Ê13ˆ Ê 3ˆ Ê 4ˆ Ê 4ˆ Ë 2 ¯ Ë 2¯ Ë 2¯ Ë 1¯
123, 552
=
Ê 52ˆ Ë 5¯
ª 0.04754 or 20.04 : 1.
2, 598, 960 [xx yy z 3 ranks]
NOWTS 3 ranks NOWTS triplet NOWTS triplet NOWTS the 2 single in hand from 13 rank from 3 from 4 in deck ranks from 4 each in deck ranks in hand in deck 2
P(3 of a rank) =
Ê13ˆ Ê 3ˆ Ê 4ˆ Ê 4ˆ Ë 3 ¯ Ë 1¯ Ë 3¯ Ë 1¯
=
Ê 52ˆ Ë 5¯
54, 912
ª 0.02113 or 46.33 : 1.
2, 598, 960 [xxx y z 3 ranks]
NOWTS 2 ranks NOWTS rank of in hand from 13 4 from the 2 in deck ranks in hand
P(4 of a rank) =
Ê13ˆ Ê 2ˆ Ê 4ˆ Ê 4ˆ Ë 2 ¯ Ë 1¯ Ë 4¯ Ë 1¯
=
Ê 52ˆ Ë 5¯
NOWTS 4 of a NOWTS the single rank rank from 4 from 4 in deck in deck
624
ª 0.00024001 or 4164 : 1.
2, 598, 960 [xxxx y 2 ranks]
NOWTS 2 ranks NOWTS triplet NOWTS the triplet NOWTS the pair in hand from 13 rank from the 2 from 4 in deck from 4 in deck ranks in deck ranks in hand
P( Full-house) =
Ê13ˆ Ê 2ˆ Ê 4ˆ Ê 4ˆ Ë 2 ¯ Ë 1¯ Ë 3¯ Ë 2¯ Ê 52ˆ Ë 5¯
=
3,744
ª 0.00144058 or 693.12 : 1.
2, 598, 960 [xxx yy 2 ranks]
P369463-Ch001.qxd
38
9/2/05
10:56 AM
Page 38
Chapter 1 Introduction to Probability
NOWTS 1st starting card NOWTS 1 card in (ace-1 through 10) or 1st end hand from 4 cards in card (5 through Ace-13). deck for each of 5 ranks
Take away 40 straights that are also flushes
5
P(Straight) =
Ê10ˆ Ê 4ˆ - 40 Ë 1 ¯ Ë 1¯ Ê 52ˆ Ë 5¯
=
10, 240 - 40
10, 200
=
2, 598, 960
2, 598, 960 [abcde 5 ranks]
ª 0.0039246 or 253.8 : 1,
where the 40 straight flushes (4 suits * 10 starting cards) are subtracted from the 10,240 straights to get 10,200 pure straights. NOWTS 5 ranks in hand from 13 ranks in deck
P ( Flush) = P ( Flush) =
NOWTS 1 suit in hand Take away 40 flushes from 4 suits in deck that are also straights
Ê13ˆ Ê 4ˆ - 40 Ë 5 ¯ Ë 1¯ Ê 52ˆ Ë 5¯
=
5,148 - 40 2, 598, 960
=
5,108 2, 598, 960 [adfmv 5 ranks]
ª 0.001965 or 507.8 : 1.
There are 40 straight flushes (4 suits * 10 starting cards) among the 5148 flushes leaving 5108 flushes and nothing more (odds are 64973 : 1). There are 4 royal flushes (649740 : 1). NOWTS 5 ranks in hand from 13 ranks in deck
NOWTS single card in Take away all straights hand from 4 in deck or flushes for each type
5
P (Bust) =
Ê 13ˆ Ê 4ˆ Ë 5 ¯ Ë 1¯
- P (Straight + Flush)
[adfmv 5 ranks]
Ê 52ˆ Ë 5¯ =
1, 317, 888 - (10, 240 + 5,148 - 40)
2, 598, 960 ª 0.501177 or 1.004 : 1 in favor.
=
1, 302, 540 2, 598, 960
P(Straight + Flush) = P(Straight) + P(Flush) - P(Straight Flush). We have subtracted from the bust template the 10,240 straights and the 5148 flushes and added the 40 straight flushes to get 1,302,540 bust hands. A summary of the poker probabilities is shown in Table 1.6.
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 39
1.7 Counting the Ways
Table 1.6
39
5-Card Poker Probabilities
Event
Number of Ways
Probability
13 4 4 4 1, 098, 240 = Ê ˆ Ê ˆ Ê ˆ Ê ˆ Ë 4 ¯ Ë 1¯ Ë 2¯ Ë 1¯
1 pair (ww xyz)
3
0.42257
2
13 3 4 4 125, 552 = Ê ˆ Ê ˆ Ê ˆ Ê ˆ Ë 4 ¯ Ë 2¯ Ë 2¯ Ë 1¯
2 pair (xx yy z)
0.04754
2
3 of a rank (xxx y z)
13 3 4 4 54, 912 = Ê ˆ Ê ˆ Ê ˆ Ê ˆ Ë 3 ¯ Ë 1¯ Ë 3¯ Ë 1¯
4 of a rank (xxxx y)
13 2 4 4 624 = Ê ˆ Ê ˆ Ê ˆ Ê ˆ Ë 2 ¯ Ë 1¯ Ë 4¯ Ë 1¯
0.000240
13 2 4 4 3, 744 = Ê ˆ Ê ˆ Ê ˆ Ê ˆ Ë 2 ¯ Ë 1¯ Ë 3¯ Ë 2¯
0.001441
full house (xxx yy)
0.02113
5
straight
10 4 10, 200 = Ê ˆ Ê ˆ - 40 Ë 1 ¯ Ë 1¯
0.0039246
4 13 5,108 = Ê ˆ Ê ˆ - 40 Ë 1¯ Ë 5 ¯
0.001965
flush (a d f m v)
40 = 10 * 4
straight flush (a b c d e)
0.0000154
5
13 4 1,302,540 = Ê ˆ Ê ˆ - straight - flush + Ë 5 ¯ Ë 1¯ straight flush
bust (a d f m v)
0.501177
The sum of the number of no-bust hands and bust hands is 2,598,960 52 hands, ÊË ˆ¯ . 5 EXAMPLE 1.27
(poker dice). Poker dice is played with 5 dice, with the 6 sides on each die usually marked with the 9, 10, jack, queen, king, and ace symbols. However, regular dice suffice. The total number of ways that 5 dice can fall is just the Cartesian product 6 * 6 * 6 * 6 * 6 = 65 = 7776. We show a variation of the process used in 5-card poker to emphasize the several different but equivalent ways of computing probabilities and repeat some calculations by using the previous counting method. The specific probabilities and number of ways to get the various poker combinations are as follows: NOWTS 1 of NOWTS which NOWTS the 5 NOWTS the 6 ranks 2 of the 5 remaining 4 remaining dice are the numbers for numbers for 4th die pair the 3rd die
P(1 pair) =
Ê 6ˆ Ê 5ˆ Ê 5ˆ Ê 4ˆ Ê 3ˆ Ë 1¯ Ë 2¯ Ë 1¯ Ë 1¯ Ë 1¯ 6
or
5
=
3600 7776
ª 0.46296
NOWTS the 3 remaining numbers for the 5th die
P369463-Ch001.qxd
40
9/2/05
10:56 AM
Page 40
Chapter 1 Introduction to Probability
NOWTS NOWTS 4 ranks 1 from 4 from 6 ranks to be the pair
P(1 pair) =
NOWTS which of the 5 dice are the pair
NOWTS which of 3 remaining dice is single type
Ê 6ˆ Ê 4ˆ Ê 5ˆ Ê 3ˆ Ê 2ˆ Ê1ˆ Ë 4¯ Ë 1¯ Ë 2¯ Ë 1¯ Ë 1¯ Ë1¯
3600
=
5
(6 ) NOWTS 2 from 6 numbers
P(2 pairs) =
NOWTS NOWTS which of 2 remaining die remaining for single type dice is single type
[ww xyz 4 ranks]
.
7776
NOWTS 2 of NOWTS 2 of 5 dice for remaining 3 dice 1st pair for 2nd pair
Ê 6ˆ Ê 5ˆ Ê 3ˆ Ê 4ˆ Ë 2¯ Ë 2¯ Ë 2¯ Ë 1¯ 6
5
=
1800
NOWTS 1 of the remaining 4 numbers for 5th die
ª 0.23148
7776
or NOWTS NOWTS 2 of NOWTS 3 ranks 3 ranks to be which of the from 6 the pairs 5 dice occupy 1st pair
P(2 pairs) =
Ê 6ˆ Ê 3ˆ Ê 5ˆ Ê 3ˆ Ê1ˆ Ë 3¯ Ë 2¯ Ë 2¯ Ë 2¯ Ë1¯ 6
=
5
1800
NOWTS which of 3 remaining dice {≤} occupy 2nd pair
NOWTS which of 1 die occupy single type
[xx yy z 3 ranks]
.
7776
NOWTS 1 NOWTS 3 of NOWTS 1 of NOWTS 1 of from 6 numbers 5 dice for 3 remaining 5 numbers remaining 4 of a kind numbers
P(3 of a rank) =
Ê 6ˆ Ê 5ˆ Ê 5ˆ Ê 4ˆ Ë 1¯ Ë 3¯ Ë 1¯ Ë 1¯ 6
5
=
1200
ª 0.15432
7776
or
P(3 of a rank) =
Ê 6ˆ Ê 3ˆ Ê 5ˆ Ê 2ˆ Ê1ˆ Ë 3¯ Ë 1¯ Ë 3¯ Ë 1¯ Ë1¯ 6
5
=
1200
.
[xxx yz 3 ranks]
7776
NOWTS 1 of NOWTS 4 dice of NOWTS 1 of the 5 remaining 6 numbers 5 for the 4 of a kind numbers for last dice
P(4 of a rank) =
Ê 6ˆ Ê 5ˆ Ê 5ˆ Ë 1¯ Ë 4¯ Ë 1¯ 6
5
=
150 7776
ª 0.01929
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 41
1.7 Counting the Ways
41
or
P(4 of a rank) =
Ê 6ˆ Ê 2ˆ Ê 5ˆ Ê1ˆ Ë 2¯ Ë 1¯ Ë 4¯ Ë1¯ 6
P(5 of a rank) =
5
Ê 6ˆ Ê 5ˆ Ë 1¯ Ë 5¯ 6
5
=
150
.
[xxxx y 2 ranks]
7776
=
6
ª 0.0007716.
[xxxxx 1 type]
7776
P(straight) = 2 straight sequences * 5! (Permutations of the dice) = 240 of 7776 ways = 0.03086. The two straight sequences are (1 2 3 4 5) and (2 3 4 5 6).
P( full house) =
Ê 6ˆ Ê 2ˆ Ê 5ˆ Ê 2ˆ Ë 2¯ Ë 1¯ Ë 3¯ Ë 2¯ 6
5
=
300
ª 0.03858.
[xxx yy 2 ranks]
7776
P(Bust) = 4 * 5! = 480 of 7776 ways = 0.061728, from the four canonical dice patterns constituting a bust: (1 2 3 4 6), (1 2 3 5 6), (1 2 4 5 6), and (1 3 4 5 6). Notice that the number of possible ways for the outcomes of poker dice to occur sums to a total of 7776. Also notice that the sum of the probabilities is equal to 1. The command (sim-die-roll n) simulates the outcomes from rolling a die n times or n dice once. For n = 5, a roll of poker dice is simulated. Approximately every other roll should produce a pair since the probability is 0.46. For example, (sim-die-roll 5) may return (6 1 4 2 6). (print-count-a-b 1 6 (sim-die-roll 1296)) returns a count each face in 1296 tosses of a fair die. Integer Count
1 2 3 4 5 6 208 206 214 219 212 237
(sim-k-dice k n) returns n sums from throwing k dice. For example, (sim-k-dice 5 10)) returned (15 14 19 21 24 17 16 26 9 13) with a 17.4 sample mean versus the theoretical 17.5 population mean (5 * 3.5). (sim-poker-dice n) returns n simulated rolls in poker dice. For example, (setf poker-rolls (sim-poker-dice 15)) may return ((3 6 6 6 2) (4 3 4 3 4) (5 4 4 1 5) (4 5 4 6 5) (4 1 3 6 3) (1 3 4 1 2) (1 4 6 5 6) (5 2 3 5 6) (1 5 1 5 4) (5 3 2 3 2) (2 3 4 6 3) (2 6 6 5 3) (2 5 5 5 3) (4 3 6 5 1) (5 4 1 1 2))
P369463-Ch001.qxd
9/2/05
42
10:56 AM
Page 42
Chapter 1 Introduction to Probability
Table 1.7
Poker Dice Probabilities
Event
Number of Ways
Probability
1 pair
6 4 5 3 2 1 3600 = Ê ˆ Ê ˆ Ê ˆ Ê ˆ Ê ˆ Ê ˆ Ë 4¯ Ë 1¯ Ë 2¯ Ë 1¯ Ë 1¯ Ë1¯
0.462963
2 pairs
6 3 5 3 1 1800 = Ê ˆ Ê ˆ Ê ˆ Ê ˆ Ê ˆ Ë 3¯ Ë 2¯ Ë 2¯ Ë 2¯ Ë1¯
0.231481
3 of a kind
6 3 5 2 1 1200 = Ê ˆ Ê ˆ Ê ˆ Ê ˆ Ê ˆ Ë 3¯ Ë 1¯ Ë 3¯ Ë 1¯ Ë1¯
0.154321
straight
240 = 2 * 5!
0.030864
bust
480 = 4 * 5!
0.061728
full house
6 2 5 2 300 = Ê ˆ Ê ˆ Ê ˆ Ê ˆ Ë 2¯ Ë 1¯ Ë 3¯ Ë 2¯
0.038580
4 of a kind
6 2 5 1 150 = Ê ˆ Ê ˆ Ê ˆ Ê ˆ Ë 2¯ Ë 1¯ Ë 4¯ Ë1¯
0.019290
5 of a kind
6 5 6 = Ê ˆÊ ˆ Ë 1¯ Ë 5¯
0.000772
(count-1-pair poker-rolls) returns the number of “just one pair” rolls with the computed probability (7 15 0.466667). Similarly, (count-2-pair (sim-poker-dice 1296)) returned (332 0.256173). (count-3-of-rank (sim-poker-dice 1296)) returned (217 0.167438) (count-4-of-rank (sim-poker-dice 1296)) returned (28 0.021605) (count-5-of-rank (sim-poker-dice 1296)) returned (1 7.716049e-4) (count-full-house (sim-poker-dice 1296)) returned (47 0.036265)
in 15 1296 1296 1296 1296 1296
A summary of the poker dice events is shown in Table 1.7. EXAMPLE 1.28
The experiment is rolling 4 fair dice. Let S18 indicate the outcome sum is 18. a) Compute P(S18) using canonical patterns. b) Suppose that one of the dice is shown to be a 5. Compute P(S18). c) Suppose a second die is also shown to be a 5. Compute P(S18). d) Suppose a third die is shown to be a 3. Compute P(S18). e) Verify P(S9) in rolling 3 fair dice by computing the probability of rolling an 8 with 2 dice and a 1 with one die, etc. Denote S(a|b)), as the probability of rolling a sum of a with b dice. Then S(9 3) = S(8 2) * S(1 1) + S(7 2) * S(2 1) + S(6 2) * S(3 1) + S(5 2) * S(4 1) + S(4 2) * S(5 1) + S(3 2) * S(6 1).
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 43
1.7 Counting the Ways
43
Solution a) Canonical patterns and number of permutations are (1 5 6 6) (2 4 6 6) (2 5 5 6) (3 3 6 6) (3 4 5 6) (3 5 5 5) (4 4 4 6) (4 4 5 5) 12 12 12 6 24 4 4 6 P ( S18 ) = 80/1296 = 5/81 = 0.06173. b) Seeing a 5, the canonical patterns are now (1 5 6 6) (2 4 6 6) (2 5 5 6) (3 3 6 6) (3 4 5 6) (3 5 5 5) (4 4 4 6) (4 4 5 5) or (1 6 6) (2 5 6) (3 4 6) (3 5 5) (4 4 5), resulting in 21 ways, or 21/216 = 7/72. 3 6 6 3 3 Equivalency, P(S13) in tossing 3 dice. c) Seeing a second 5, the canonical patterns are now (1 6 6) (2 5 6) (3 4 6) (3 5 5) (4 4 5) or (2 6) (3 5) (4 4), resulting in 5 ways, or 5/36. Equivalency, P(S8) in tossing 2 dice. d) Seeing a third die bearing a 3, the canonical pattern is (2 6) (3 5) (4 4) (5) with probability 1/6. e) S(9 3) = S(8 2) * S(1 1) + S(7 2) * S(2 1) + S(6 2) * S(3 1) + S(5 2) * S(4 1) + S(4 2) * S(5 1) + S(3 2) * S(6 1) 5 1 6 1 5 1 4 1 3 1 2 1 = * + * * * + * + * + * 36 6 36 6 36 6 36 6 36 6 36 6 25 = . 216
EXAMPLE 1.29
There are 5 different pairs of shoes in a closet and 5 shoes are randomly selected. Compute the probability of a) 0 matching pairs, b) 1 matching pair, and c) 2 matching pairs. Solution 5
a) P(0 pairs) =
Ê 5ˆ Ê 2ˆ Ë 5¯ Ë 1¯ Ê10ˆ Ë 5¯
=
32 252
.
[v w x y z 5 types or “ranks”]
P369463-Ch001.qxd
9/2/05
44
10:56 AM
Page 44
Chapter 1 Introduction to Probability 3
b) P(1 pairs) =
Ê 5ˆ Ê 4ˆ Ê 2ˆ Ê 2ˆ Ë 4¯ Ë 1¯ Ë 2¯ Ë 1¯
=
Ê10ˆ Ë 5¯
160
.
[ww x y z 4 types]
.
[xx yy z 3 types]
252 2
c) P(2 pairs) =
Ê 5ˆ Ê 3ˆ Ê 2ˆ Ê 2ˆ Ë 3¯ Ë 2¯ Ë 2¯ Ë 1¯
60
=
Ê10ˆ Ë 5¯
252
For 2 pairs, we see that there are 3 types (ranks) of shoes to be chosen from 5 types in the sample space; two of these three types are to be chosen as the pairs, followed by the number of ways to choose each of each pair and the singleton shoe. We also note that the sum of the probabilities (32 + 160 + 60)/252 is 1. EXAMPLE 1.30
(craps). The game of craps is played with 2 dice. The roller rolls the dice. If the sum of the dice on the first roll is 2, 3, or 12, the roller loses; if 7 or 11, the roller wins. Other participants may bet with the roller or against the roller (with or against the house). If the first sum is any sum other than 2, 3, 12, 7, or 11, then that sum (called the point) becomes the roller’s point and the roller continues to roll the dice until the point returns (roller wins) or the sum of 7 occurs (roller loses). Compute the probability of the roller winning in the game of craps. Solution Let Si denote that the sum of the dice is i. Let P(Si, (Si | Si or S7) indicate that the point is i and that subsequent tosses result in Si before S7. For example, P(S4) = 3/36, P(S4 or S7) = (3 + 6)/36, and P(S4 | S4 or S7) = 3/9. Observe the symmetry resulting in equal probabilities as P(S4) = P(S10), P(S5) = P(S9), and P(S6) = P(S8). P ( win ) = P ( S7 ) + P ( S11 ) + 2{ P [ S4 , S4 ( S4 or S7 )] + P [ S5 , S5 ( S5 or S7 )] + P [ S6 , S6 ( S6 or S7 )]} = 6/36 + 2/36 + 2[3/36 * 3/9 + 4/36 * 4/10 + 5/36 * 5/11] 244 = = 0.492. 495 2
P ( win ) = P ( S7 ) + P ( S11 ) + 2
Â
i = 4 ,5 ,6
[ P( Si )]
P ( Si ) + P ( S7 )
P ( lose) = P ( S2 ) + P ( S3 ) + P ( S12 ) + 2
Â
i = 4 ,5 ,6
.
P ( Si ) P ( S7 ) P ( Si ) + P ( S7 )
.
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 45
1.7 Counting the Ways
45
(sim-craps n) returns the outcome sums and results (L for Lose W for Win) from n plays of craps and returns the simulated probability of winning. (sim-craps 10) returned (((9 W)) ((10 L)) ((3 L)) ((6 W)) ((9 W)) ((7 W)) ((6 L)) ((7 W)) ((2 L)) ((6 L))) (sim-wins-in-craps 10000) returns the number of each sum and the number of wins from 10,000 plays. Expect the number of wins to be near 4920. (sim-wins-in-craps 10000) printed Digit Count
2 302
3 537
4 825
5 1089
6 1371
7 1676
8 1330
9 1121
10 917
11 545
12 287
Number of wins = 4901
EXAMPLE 1.31
(5-card poker revisited). There are always several ways to solve probability problems and it is beneficial to become familiar with as many ways as possible. We compute a few of the 5-card poker hands by the method called “on the fly” as each selection is made or by writing a canonical pattern and multiplying by the implied number of distinct patterns. We begin with the probability of a full house. One such way is to write the individual probabilities for a canonical pattern and then to contemplate how many distinct permutations can be generated from the pattern. For example, we could draw a triplet of the same rank, followed by a doubleton of a different rank to get a probability of 52
*
3
*
2
*
48
*
3
ª 0.0001441.
52 51 50 49 48 Notice that the first rank can be any card, followed by 2 matches of that rank. After a first match, the rank is implicitly specified. The second rank can be any card of the remaining 49 except for the 4th card of the first rank, thus leaving 48, followed by a match of the 2nd rank implicitly specifying the second type. The number of favorable permutations is 10, given by P(5; 3, 2) or, equivalently, 5C3 * 2C2. Each of these 10 selections will contain the equivalent composite numerator. There are 3 indistinguishable selections for the triple and 2 indistinguishable selections for the doubleton. Thus, P(full house) =
52
*
3
*
2
*
48
*
3
*
5!
= 0.001441.
52 51 50 49 48 3! 2! In similar fashion, we find the probability of 3 of one suit and 2 of another. P(3 of one suit and 2 of another) =
52 12 11 39 12 5! * * * * * = 0.103. 52 51 50 49 48 3! 2!
P369463-Ch001.qxd
46
9/2/05
10:56 AM
Page 46
Chapter 1 Introduction to Probability
Notice after selecting 3 of one suit that there remain 39 cards for the second suit, but that once the second suit is selected and thus specified, only 12 more are favorable for selecting another of that suit. After writing the canonical pattern, we multiply by the permutations generated by the pattern. There are P(5; 3, 2) orderings. Now consider the probability of 2 pairs in selecting 5 cards. A canonical 52 3 48 3 44 probability is shown as . We must be careful proceeding. * * * * 52 51 50 49 48 P(2 pairs) =
52
*
3
*
48
*
3
*
44
*
5!
= 0.04754.
52 51 50 49 48 2! 2!1! 2! Notice that the number of ways of getting 2 pairs is 15, given by 5! . 2! 2!1! 2! The number of occurrences of the first pair is 5C2, followed by the second pair, 3C2, giving 30; but the occurrence of the pairs is indistinguishable, accounting for the third 2! in the denominator, yielding 15 ways. P(3 of a rank) =
52
48 44 5! * * ª 0.021128. 52 51 50 49 48 3!1!1! 2! *
3
*
2
*
[xxx y z]
Similarly, we have 5C3 ways for the triple, followed by 2C1 ways for the first indistinguishable of 2 singletons, yielding 20/2 = 10 ways. How many 5 card poker hands would one expect from the first 1000 digits of p ? Here we are sampling with replacement [xxx y z]. P(3 of a rank) = 1 * 0.1 * 0.1 * 0.9 * 0.8 * 5C3 = 0.072 where the first entry 1 represents any of the 10 digits 0 to 9; the second entry 0.1 is the probability of matching as, is the third entry; the fourth entry is the probability of a nonmatching digit; and the fifth entry is the probability of not matching the triple and not matching the fourth entry. We have 1000/5 = 200 poker hands and 0.072 * 200 = 14.4 hands.
Match Problem Revisited In randomly mapping n items onto themselves, we may be interested in obtaining the probability that at least one item is assigned to itself or the complement probability that no item is assigned to itself. In Example 1.11, we enumerated the solutions for n = 3. There are n! permutation maps. Let each map be indicated by Ai, where Ai indicates that a match occurs at the ith position. To determine the probability of at least one matching integer, we seek the probability of the union of the sets Ai. The inclusion-exclusion principle gives
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 47
1.7 Counting the Ways n
n
i =1
i =1
47
P Ê U Ai ˆ = Â P ( Ai ) - Â P ( Ai A j ) + Ë ¯ i< j
Â
P ( Ai A j Ak ) - . . .
i< j< k
+ ( -1)n -1 P ( Ai A j Ak . . . An ) 1 1 1 n 1 n 1 n 1 - ... = ÊË ˆ¯ - ÊË ˆ¯ * + ÊË ˆ¯ * * 1 n 2 n n -1 3 n n -1 n - 2 n 1 + ( -1)n +1 ÊË ˆ¯ n n! = 1 - 1/2! + 1/3! - 1/4! + . . . + ( -1)n +11/ n! x2 x3 xn = 1 - e -1 as n Æ •, with e x = 1 + x + + + ... + ... 2! 3! n! The probability of at least one integer being assigned to itself is 1 - e-1, which equals 0.63212. This probability is practically independent of n for n ≥ 7. The probability of no map assigning any integer to itself is e-1 = 0.367979. Similarly, if one had 1000 letters randomly inserted into 1000 addressed envelopes, the probability of no letter matching its envelope is e-1; or if 100,000 hats at a football stadium were randomly blown around with each individual randomly claiming a hat, the probability of at least one person getting his or her own hat back is 1 - e-1, almost the same probability if there were only 10 people in the stadium. We now ask about the distribution of the number of matches in the mappings. With n distinct integers, there are n! permutation maps or assignments. How many of these maps have exactly 0 matches, how many have exactly 1 match, and how many have exactly r matches for r £ n? Note that none can have exactly n - 1 matches, because assuming n - 1 matches forces the remaining integer to match. Let N(n, r) indicate the number of n-permutation maps with exactly r matches. The probability of exactly r matches is then given by P (exactly r matches) =
N ( n, r )
.
n! In each of these n-permutation maps with exactly r matches, the other n - r selections all are mismatches called derangements. We can select the (n - r) positions from the n positions in n
n Cn - r = n Cr = ÊË ˆ¯ ways. r
Then the number of permutation maps with exactly r matches is given by the product of the number of derangements (0 matches) and the number of ways these (n - r) maps can be selected from the n maps. Thus the number of permutation maps with exactly r matches is given by n N ( n, r ) = N ( n - r, 0) * ÊË ˆ¯ . r
P369463-Ch001.qxd
9/2/05
48
10:56 AM
Page 48
Chapter 1 Introduction to Probability
Table 1.8
24 Permutations of 1-2-3-4 Maps
PERMUTATIONS
PERMUTATIONS
1
2
3
4
Number of Matches
1
2
3
4
Number of Matches
1 1 1 1 1 1 2 2 2 2 2 2
2 2 3 3 4 4 3 3 4 4 1 1
3 4 2 4 2 3 4 1 1 3 3 4
4 3 4 2 3 2 1 4 3 1 4 3
4 2* 2* 1 1 2* 0Æ 1 0Æ 1 2* 0Æ
3 3 3 3 3 3 4 4 4 4 4 4
4 4 1 1 2 2 1 1 2 2 3 3
1 2 2 4 4 1 2 3 3 1 2 1
2 1 4 2 1 4 3 2 1 3 1 2
0Æ 0Æ 1 0Æ 1 2* 0Æ 1 2* 1 0Æ 0Æ
The probability of exactly r matches in n! permutation maps is given by
P [( N ( n, r )] = =
n N ( n - r, 0) * ÊË ˆ¯ r n! N ( n - r, 0) r!( n - r )!
N ( n - r, 0) * =
n! r!( n - r )!
n! .
Table 1.8 shows that the number of 4-permutation maps with exactly 2 matches is equal to 4 N (4, 2) = N (2, 0) * ÊË ˆ¯ = 6. 2 Hence the probability of exactly 2 matches in the 4-permutation maps is 6/24 = 0.25. EXAMPLE 1.32
Four people check their hats and receive them back randomly. Compute the expected number of correct returns. Solution Let RV X be the number of correct returns and indicator RV Xi be 1 if the ith hat was correctly returned and 0 if not. Then X = X1 + X2 + X3 + X4. E( X i ) = p = 1/4 and E( X ) = E( X1 + X 2 + X 3 + X 4 ) = 1/4 + 1/4 + 1/4 + 1/4 = 1. To verify, regard X ’s density. X P(X )
0 9/24
1 8/24
2 6/24
3 0
4 1/24
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 49
1.7 Counting the Ways
49
For example, P ( X = 2) =
N ( n - r , 0) r!( n - r )!
=
N (4 - 2, 0) 2!(4 - 2)!
=
1 4
E( X ) = (0 * 9 + 1 * 8 + 2 * 6 + 3 * 0 + 4 * 1)/24 = 24/24 = 1. In computing the number of 2 correct matches, observe that the 2 correct matches can be chosen from 4 as 4C2 = 6 and that there are 4 - 2 = 2 mismatches which can only occur in one way [(1 2) Æ (2 1)]. To compute the number 1 correct match, there are 4C1 = 4 ways to choose the correct match and then 3 mismatches can only occur in 2 ways {(1 2 3) Æ (3 1 2); (1 2 3) Æ (2 3 1). The command (print-map n) returns the respective number of 0, 1, 2, . . . , n matching. (print-map 4) returns (9 8 6 0 1) corresponding to 9 of the 24 permutations with 0 matching, 8 permutations with 1 matching, 6 permutations with 2 matching (starred), 0 permutations with 3 matching, and 1 permutation with all 4 matching. (derange n) returns the number of derangements. (derange 4) returns 9 marked by Æ. The Poisson distribution with parameter k (in Chapter 3) provides an extremely close estimate for the probability of exactly r matches in n! maps for n ≥ 7. A short table of the number of no matches with their probabilities for n = 1 to 10 is shown in Table 1.9. Note that the probability of 0 matches for n = 10 is 1334961/10! e-1. Also observe that N(n, 0) = [n * N(n - 1, 0)] ± 1 where 1 is subtracted if n is odd and added if n is even. For example, N(5, 0) = [5 * N(4, 0)] - 1 = (5 * 9) - 1 = 44. To show for example that N(4, 0) = 9, we let each Ai be the event that the ith digit matches for i = 1 to 4. Then P(no match) = 1 - P(at least one match) = 1 - P(A1 + A2 + A3 + A4) with P ( no match ) = P ( A1c A2c A3c A4c ).
Table 1.9
Derangements and Probabilities of No Match
n
N(n, 0)
P(0 matches)
n
N(n, 0)
P(0 matches)
1 2 3 4 5
0 1 2 9 44
0 1/2 2/6 9/24 44/120
6 7 8 9 10
265 1,854 14,833 133,496 1,334,961
265/6! 1,854/7! 14,833/8! 133,496/9! 1,334,961/10!
P369463-Ch001.qxd
9/2/05
50
10:56 AM
Page 50
Chapter 1 Introduction to Probability
From the inclusion-exclusion principle (see Example 1.27) we have 4 1 4 1 1 4 1 1 1 4 1 1 1 1 9 1 - ÊË ˆ¯ * + ÊË ˆ¯ * * - ÊË ˆ¯ * * * + ÊË ˆ¯ * * * * = = 0.375, 1 4 2 4 3 3 4 3 2 4 4 3 2 1 24 and 0.375 * 24 = 9 maps with none having a match. The number of derangeÈ 1 1 1 ments N(n, 0) shown in Table 1.9 is given by N ( n, 0) = n! Í1 - + - + . . . Î 1! 2! 3! n 1 ˘ + (-1) . n! ˙˚ EXAMPLE 1.33
(recursive approach to the match problem). Recall that N(n, r) denotes the number of n-permutation maps that have exactly r matches. In particular, consider the permutations for n = 3 with 3, 1, and 0 matches, shown in Table 1.10. Suppose we introduce integer 4 to determine the number of 4-maps with no matching integers, N(4, 0). For each of the 2 maps of 3 integers with 0-match, the digit 4 can be exchanged with each of the 3 integers to create 3 * N(3, 0) 4-maps (3 * 2 = 6) with no matches. For example, 1 2 1 3
23Æ 31 23Æ 12
1 4 1 4
2 3 2 1
3 1 3 2
4 2 4 3
1 2 1 3
2 4 2 4
3 1 3 2
4 3 4 1
1 2 1 3
2 3 2 1
3 4 3 4
4 1 4 2.
For the 3-maps with a single match, the digit 4 can be exchanged with the single identity map assignment to destroy the assignment and create 3 * N(2, 0) 4-maps with no matches. For example, 1 2 3 Æ 1 2 3 4 132 4 3 2 1. Thus N(4, 0) = 3 * N(3, 0) + N(3, 1) = 3 * N(3, 0) + 3 * N(2, 0) = 3 * 2 + 3 * 1 = 9. Look at Table 1.9. Notice that the 44 maps of no match for n = 5 is given by (4 * 9) + (4 * 2). We illustrate this procedure in Table 1.11. Recursively, N ( n, 0) = ( n - 1) * N ( n - 1, 0) + ( n - 1) * N ( n - 2, 0) = ( n - 1) * [ N ( n - 1, 0) + N ( n - 2, 0)]. EXAMPLE 1.34
(3-door problem). There are 3 doors. Behind two of the doors are goats and behind one door is a car. The experiment is to select a door. Then the host, who knows exactly what is behind each door, opens another door to always Table 1.10
3-Permutation Maps
3-match
1-match
123 123
123 123 123 132 213 321
0-match 123 231
123 312
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 51
1.7 Counting the Ways
Table 1.11
51
Exchanging 4 with 1, 2, and 3 to Create All Mismatches 0-MATCH
123 231
4¨Æ1 1234 4312
4¨Æ2 1234 2413
4¨Æ3 1234 2341
4¨Æ1 1234 4123
123 312
4¨Æ2 1234 3421
4¨Æ3 1234 3142
1-MATCH 123 132
4¨Æ1 1234 4321
4¨Æ2 1234 3412
123 321
123 213
4¨Æ3 1234 2143
reveal a goat. You are then given the option to switch to the last remaining door or to maintain your original choice. The question is, should you switch or not to increase your probability of winning the car? Solution Clearly, if you never switch, the probability of winning the car is 1/3. And just as clearly, if you always switch, you will lose only if you initially choose the car. Thus the probability of winning by always switching is 2/3. Try the command (sim-doors) to simulate the game. EXAMPLE 1.35
(occupancy problem). We seek the distribution of n items into r containers. Consider the ways of placing 4 marbles into 2 urns. We first distinguish our marbles by ordering them in a row: _ _ _ _. From the multiplication FPC and the permutations of the marbles in that each marble can be placed in the any of the two urns, we have 2 * 2 * 2 * 2* = 24 = 16 total permutations of the placement. Each distinguishable way has probability 1/16, as shown in Table 1.12. Each indistinguishable way is shown in Table 1.13, which depicts the number of marbles in each urn. The probabilities in Table 1.13 can be computed from the multinomial coefficients. For example, the probability of a 1–3 distribution is given by 3
1
4! Ê 1 ˆ Ê 1 ˆ 1 P(1, 3) = = . Ë ¯ Ë ¯ 3!1! 2 2 4
In general, the probability of n indistinguishable items being distributed in k containers with outcome r1, r2, . . . , rk where the sum of the ri equals k is given by the formula n! r1! r2 ! . . . rk ! k n
.
(1–11)
In observing the possible arrangements of 4 marbles into 2 bins, the 5 pictorial representations or ordered partitions of 4 marbles are (0, 4) (1, 3) (2, 2) (3, 1) (4, 0) xxxx or x xxx or xx xx or xxx x or xxxx ,
P369463-Ch001.qxd
9/2/05
52
10:56 AM
Page 52
Chapter 1 Introduction to Probability
Table 1.12 Placing 4 Indistinguishable Marbles in 2 Bins
Table 1.13 Placing 4 Distinguishable Marbles in 2 Urns
INDISTINGUISHABLE Bin 1
Bin 2
Probability
xxxx xx xxx x
1/16 1/16 6/16 4/16 4/16
xxxx xx x xxx
DISTINGUISHABLE Bin 1 12 — 12 13 14 23 24 34 12 12 13 23 1 2 3 4
34
3 4 4 4
Bin 2 — 12 34 24 23 14 13 12 4 3 2 1 23 13 12 12
34
4 4 4 3
suggesting that we are selecting where to place the (2 - 1) dividing line to create the 2 bins among the 5 objects (4 marbles + 1 dividing line). We note that the total number of ways to put r indistinguishable objects into n bins is given by Ê r + n - 1ˆ = Ê r + n - 1ˆ . (1–12) Ë ¯ Ë n -1 ¯ r To check the number of ways to put r = 4 marbles into n = 2 bins, Table 1.12 displays the 5C1 = 5 = 5C4 different ways. EXAMPLE 1.36
a) How many ways can 20 indistinguishable marbles be placed into 5 bins with at least 1 marble in each bin? b) How many ways can 4 marbles be put in 2 bins with at least 1 marble in each bin? Display the marbles in the 2 bins. Solution a) Placing 1 marble in each of the 5 bins removes 5 marbles, leaving (20 + 5 -1 -5) objects to then be put in 5 bins, resulting in (combination 18 4) or (combination 18 14), returning 3060 ways. b) Similarly, placing 1 marble in each bin leaves 2 marbles to be put in 2 bins in the following 3 ways: (1, 3), (2, 2), and (3, 1).
EXAMPLE 1.37
How many ways can 5 coins be selected from pennies, nickels, and dimes?
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 53
1.7 Counting the Ways
53
Pennies
Nickels
Dimes
x xx x xx ...
xxxxx xxxx xxx xxx xx ...
x x ...
Solution The pennies, nickels, and dimes serve as bins or types of objects of 3 - 1 = 2, bringing the number of objects to manipulate (5 coins plus 2 bins) to 7, implying 7C2 = 7C5 = 21 ways. The ordered triples of pennies, nickels, and dimes can be enumerated canonically as (p n d): (0 0 5) (0 1 4) (0 2 3) (1 1 3) (1 2 2), yielding 3 + 6 + 6 + 3 + 3 = 21 distinguishable permutations among the three types of coins, but indistinguishable within the coin type. EXAMPLE 1.38
If all the relays in the circuit below have independent probability of 3/4 of being closed, find the probability that current flows from X to Y.
X
A
B
C
D
Y
Solution Method I: P(Relay is open) = 1/4. P(A is open OR B is open) = P(A + B) = P(A) + P(B) - P(AB). P(no current flow in top path) = P(A + B) = P(A) + P(B) - P(AB) = 1/4 + 1/4 - 1/16 = 7/16. Similarly, P(no current flow in bottom path) = 7/16. P(no current flow in top AND bottom paths) = (7/16)*(7/16) = 49/256. fi P(current flow) = 1 - 49/256 = 207/256. Method II: P(Relay is closed) = 3/4. P (current flow from X to Y ) = P ( AB + CD); i.e., both A And B are closed Or both C And D
are closed, = P ( AB ) + P (CD) - P ( ABCD) = 9/16 + 9/16 - 81/256 = 207/256.
P369463-Ch001.qxd
9/2/05
54
10:56 AM
Page 54
Chapter 1 Introduction to Probability
Method III: We enumerate the 16 positions of all the relays as shown below where 0 indicates open and 1 indicates closed. An asterisk indicates current flow. Summing the probabilities, the P(current flow) = 207/256.
A
B
C
D
Probability
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
0 1 0 1* 9/256 0 1 0 1* 27/256 0 1 0 1* 27/256 0* 9/256 1* 27/256 0* 27/256 1* 81/256 Total = 207/256
Method IV: P(Relay is open) = 1/4. P(no current flow from X to Y) = P[(A + B)(C + D)]; i.e., either A or B is open and either C or D is open. This method is an application of DeMorgan’s law. P [( A + B )(C + D)] = P ( AC + AD + BC + BD) = P ( AC ) + P ( AD) + P ( BC ) + P ( BD) - P ( ACAD) - P ( ACBC ) - P ( ACBD) - P ( ADBC ) - P ( ADBD) - P ( BCBD) + P ( ACADBC ) + P ( ACADBD) + P ( ADBCBD) + P ( ACBCBD) - P ( ACADBCBD) = 116 / + 116 / + 116 / + 116 / - 1/64 - 1/64 - 1/256 - 1/256 - 1/64 - 1/64 + 1/256 + 1/256 + 1/256 + 1/256 - 1/256 = 49/206. Thus P(current flow) = 1 - 49/206 = 207/256. Note that P(ACBC) = P(ABC) where events A, B, and C indicate relays are open. EXAMPLE 1.39
A slot machine has the following number of symbol patterns on three randomly rotating dials.
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 55
1.7 Counting the Ways
55
Symbol
Dial 1
Dial 2
Dial 3
Bar Bell Plum Orange Cherry Lemon
2 1 7 8 2 0
1 8 2 2 7 0
1 7 3 4 0 5
20
20
20
Total
Compute the probability in a play that Bar bar bar occurs, P(BAR = 3) Plum plum plum occurs, P(Plum = 3) At least one orange occurs, P(Orange ≥ 1) No plum occurs, P(Plum = 0) At least one plum occurs, P(Plum ≥ 1) Exactly two plums occur, P(Plum = 2) (14 * 17 + 21 * 18 + 6 * 13)/8000 At least one bar occurs, P(Bar ≥ 1) 4/20 - 5/400 + 2/8000 = 751/4000 = 0.1877 h) Exactly one bar occurs, P(Bar = 1) [(2 * 19 * 19) + (18 * 1 * 19) + (18 * 19 * 1)]/8000 = 0.17575. i) Exactly one lemon occurs P(Lemon = 1) = P(Lemon ≥ 1) = (1)(1)(5/20) = 5/20 = 0.025. j) Develop pseudo-code for simulating a slot machine.
a) b) c) d) e) f) g)
Solution
Let O1 denote Orange on dial 1 and P2 denote Plum on dial 2, etc.
a) P(Bar = 3) = (2/20)(1/20)(1/20) = 2/8000 b) P(Plum = 3) = (7/20)(2/20)(3/20) = 42/8000 = 0.00525 c) P(Orange ≥ 1) = P(O1) + P(O2) + P(O3) - P(O1,O2) - P(O1,O3) - P(O2, O3) + P(O1,O2,O3) = 71/125 = 0.568 d) P(Plum = 0) = 13 * 18 * 17/8000 = 3978/8000 = 0.49725 e) P(P1 + P2 + P3) = P(P1) + P(P2) + P(P3) - P(P1, P2) - P(P1, P3) - P(P2, P3) + P(P1, P2, P3) = 12/20 - 41/400 + 42/8000 = 2111/4000 = 0.50275 Alternatively, P(Plum ≥ 1) = 1 - P(Plum = 0) = 1 - (13/20)(18/20)(17/20) = 1 - 0.49725 = 0.50275 f) P(exactly 2 Plums) = P(P1, P2, P3c) + P(P1c, P2, P3) + P(P1, P2c, P3) = (7/20)(2/20)(17/20) + (13/20)(2/10)(3/20) + (7/20)(18/20)(3/20) = 694/8000 = 0.08675
P369463-Ch001.qxd
56
9/2/05
10:56 AM
Page 56
Chapter 1 Introduction to Probability
g) P(Bar ≥ 1) = P(Bar1 + Bar2 + Bar3) = 4/20 - 5/400 + 2/8000 = 751/4000 = 0.1877 h) P(Bar = 1) = (2/10)(19/20)(19/20) + (18/20)(1/20)(19/20) + (18/20)(19/20)(1/20) = 1406/8000 = 0.17575 i) P(Lemon = 1) = 1 * 1 * 5/20 = 5/20 j) Slot machine simulation: Generate a random integer from 0 to 19 with the command (random 20) for each Dial. The frequency of occurrence determines how many integers to assign to the symbol. For example, the frequency of Orange occurring on Dial 1 is 8 and thus the 8 integers from 10 to 17 are assigned to the symbol. Freq
Dial 1
Return
Freq
Dial 2
Return
Freq
Dial 3
Return
2 1 7 8 2
0–1 2 3–9 10–17 18–19
Bar Bell Plum Orange Cherry
1 8 2 2 7
0 1–8 9–10 11–12 13–19
Bar Bell Plum Orange Cherry
1 7 3 4 5
0 1–7 8–10 11–14 15–19
Bar Bell Plum Orange Lemon
The command (setf plays (sim-slot 10)) may return ((PLUM ORANGE BELL) (ORANGE ORANGE LEMON) (ORANGE CHERRY BELL) (BAR BELL BELL) (BELL CHERRY BELL) (PLUM BELL PLUM) (ORANGE BELL PLUM) (PLUM BAR BELL) (ORANGE BELL ORANGE) (CHERRY BELL BELL)). Suppose we seek the probability of at least one plum, P(Plum ≥ 1). The command (member 'cat '(dog cat bird)) returns (cat bird) and NIL if there is no cat in the list. Thus the command (repeat #'member (list-of 10 'plum) plays) returns ((PLUM ORANGE BELL) NIL NIL NIL NIL (PLUM BELL PLUM) (PLUM) (PLUM BAR BELL) NIL NIL). Notice that 4 of the 10 plays returned at least one plum for a simulated probability of 0.4. The template (P-1-fruit fruit number-of-plays) returns the simulated probability for at least one of any symbol. For example, (P-1-fruit plum 1000) returned 0.498 versus 0.50275 theoretical. (P-1-fruit orange 1000) returned 0.553 versus 0.568 theoretical. (P-1-fruit lemon 1000) returned 0.248 versus 0.250 theoretical.
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 57
1.8 Summary
57
The template (sim-slot-machine n) returns the theoretical and empirical number of each symbol in n plays for 3n total symbols. For example, (sim-slot-machine 1000) may return Symbol BAR BELL ORANGE PLUM CHERRY LEMON
1.8
Theoretical 200 800 600 700 450 250
Empirical 202 832 606 659 452 249
Summary This material on probability is usually the most difficult for the reader to comprehend. There are many different ways to think about the problems and many different ways to count. Readers can easily confuse permutations with combinations, distinguishable with indistinguishable, mutually exclusive with independent, sampling with replacement with sampling without replacement, and marginal probability with conditional probability. Many of these different types of problems can be solved after some clever way of viewing the problem comes to mind. Thus readers are encouraged to work as many of the problems as possible using the fundamental counting principles of addition and multiplication and the inclusion-exclusion principle. Recursive and generating functions are also useful. Moment generating functions will be introduced in the next chapter.
EXAMPLE 1.40
In a town are 3 hotels. Three persons come to town. Compute the probability that they stay in separate hotels. Solution Method I: Enumerate the sample space. Designate the persons as A, B, and C and the hotels as 1, 2, and 3. Enumerating, A
B
C
A
B
C
A
B
C
1 1 1 1 1 1 1 1 1
1 1 1 2 2 2 3 3 3
1 2 3 1 2 3* 1 2* 3
2 2 2 2 2 2 2 2 2
1 1 1 2 2 2 3 3 3
1 2 3* 1 2 3 1* 2 3
3 3 3 3 3 3 3 3 3
1 1 1 2 2 2 3 3 3
1 2* 3 1* 2 3 1 2 3
P369463-Ch001.qxd
58
9/2/05
10:56 AM
Page 58
Chapter 1 Introduction to Probability
From the 3 * 3 * 3 = 27 ways the 3 people can check in, there are only 6 ways where each checks into a different hotel. Thus P(not sharing hotels) = 6/27. Method II: Sophisticated counting. Enumerate the favorable ways (analogous to birthday problem). 3
P3 = 3 * 2 * 1 = 6 ways of not sharing a hotel, {(12 3)(1 3 2)(2 1 3)(2 3 1)(3 1 2)(3 2 1)} P ( not sharing hotels) = 3 P3 /33 = 6/27.
Method III: On the fly. i) The first person can check into any of the 3 hotels with probability 3/3 of not sharing; ii) The second person can then check into any of 2 of 3 hotels with probability 2/3 of not sharing; iii) The third person has only 1 of 3 with probability 1/3 P(not sharing hotels) = (3/3)(2/3)(1/3) = 2/9 = 6/27. Method IV: Canonical pattern. Find one canonical pattern and multiply by the number of permutations generated from the pattern with each pattern corresponding to (A B C). Let P(A Æ 1) denote the probability that person A checked into Hotel 1. P ( A Æ 1) = P ( B Æ 2) = P (C Æ 3) = 1/3 fi P (123) = (1/3)3 = 1/27. But there are 3! permutations (print-permutation '(1 2 3)) Æ (1 2 3)(1 3 2)(2 1 3)(2 3 1)(3 1 2)(3 2 1), yielding P( not sharing hotels) = 3!*(1/27) = 6/27. Method V: Backdoor. Find the probability of at least 2 people sharing a hotel. From the enumeration, there are 21 arrangements where at least 2 people share a hotel. Thus, 1 - 21/27 = 6/27 is the complement probability. Method VI: Inclusion-exclusion principle. Find the probability that at least one hotel was not used where the probability of a hotel not being used by A, B, and C is (2/3)3 = 8/27. If any hotel was not used, then sharing occurred (pigeonhole principle). Designate the not used hotel events as X, Y, and Z. Then P ( X + Y + Z ) = P ( X ) + P (Y ) + P ( Z ) - P ( XY ) - P ( XZ ) - P (YZ ) + P ( XYZ ) = 8/27 + 8/27 + 8/27 - 1/27 - 1/27 - 1/27 + 0 = 21/27 (probability of sharing). P( not sharing hotels) = 1 - 21/27 = 6/27, where P ( X ) = P (Y ) = P ( Z ) = (2/3)3 = 8/27.
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 59
1.8 Summary
59
The essential probability concept is the probability of either of two events occurring: P ( A + B ) = P ( A) + P ( B ) - P ( AB ). The conditional probability formulation of event A occurring given that event B has occurred is given by P( A B ) =
P ( AB )
or P ( AB ) = P ( B ) P ( A B ) = P ( A) P ( B A).
P( B ) if A and B are independent, the conditional probability is P ( A B ) = P ( A) and P ( AB ) = P ( A) * P ( B ). The result can be generalized to three or more sets. The number of ways (combinations) to select r objects from a source of n objects is given by n
n! n n ˆ Cr = ÊË ˆ¯ = ÊË = . ¯ r n-r r!( n - r )!
The number of ordered arrangements (permutations) of n objects taken r at a time is given by n
Pr =
n! ( n - r )!
= n Cr * r Pr
Given n = n1 + n2 + n3 + . . . + nr, the number of ways the n objects can be arranged is given by n!
.
n1! n2 ! n3 ! . . . nr ! The number of ways to select (distribute) r items with repetition from (into) n different types (bins) is given by Ê r + n - 1ˆ = Ê r + n - 1ˆ . Ë ¯ Ë n -1 ¯ r
EXAMPLE 1.41
There are 7 different-colored marbles in an urn. How many possible groups of size 4 can be selected if sampling is a) with replacement; b) without replacement but color is relevant; or c) without replacement and color is not relevant. 7! 7! a) 74 = 2401; b) c) = 840; = 35. (7 - 4)! 4!(7 - 4)!
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 60
Chapter 1 Introduction to Probability
60
The answer to the boot problem is as follows. Since the neighbor received $100 from the owner, the neighbor neither gained nor lost. The owner’s loss must then equal the thief’s gain, $50 and a pair of boots.
PROBLEMS All dice, coins, and cards are fair under random selection unless otherwise specified. DICE
1. In tossing a die, compute the probability of a face showing a) an even number; b) a 2 or 4 or not a 6 by using the inclusion/exclusion template P(A + B + C) = P(A) + P(B) + P(C) - P(AB) - P(BC) - P(AC) + P(ABC); c) a 3 before a 4; d) a 3 before a 5 or 6; e) less than 4. ans. 1/2 5/6 1/2 1/3 1/2. 2. How many ordered ways can 2 dice fall? 3. Compute the number of ways to get a sum of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12 in rolling a pair of dice and compute the total number of ways. ans. 1 2 3 4 5 6 5 4 3 2 1 36. 4. Compute the probability of no repeating face in 2, 3, 4, 5, 6, and 7 tosses of a die. 5. Compute the probability that a) two dice match; b) 3 dice match; c) 5 dice match; d) n dice match. ans. 1/6 1/62 1/64 1/6n-1. 6. Find the probability that the absolute value of the difference between outcomes in tossing a pair of dice is a) 0; b) 1; c) 2; d) 3; e) 4; f) 5. Compute the sum of the probabilities. 7. Compute the probability of rolling exactly 3 fours with one roll of 5 dice. Exactly 3 “fours” includes the event that the other two dice may bear the same numeral. ans. 0.03215. 8. Compute the probability of rolling exactly 4 threes with one roll of 5 dice. 9. Compute the probability that the sum of 2 dice is either a 5 or a 9. ans. 8/36. 10. Let S7 denote the event that the sum of two dice is 7. Compute the probability of S7 given that one specified die is less than a) 6; b) 5; c) 4; d) 3; e) 2.
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 61
61
Problems
11. In tossing a pair of dice compute the probability “on the fly” of the following sums Si. a) S7; b) S6; c) S5; d) S4; e) S3; f) S2. ans. 6/36 5/36 4/36 3/36 1/36. 12. Complete the probability table for the sum of 3 dice. Consider a canonical way to compute each sum. For example, for S8 the canonical events denoted by monotonic integers are (1 1 6), (1 2 5), (1 3 4), (2 2 4), and (2 3 3). (Permutations with repetitions). For the 3 events (1 1 6), (2 2 4), and 3! (2 3 3) there are = 3 ways for each to occur. For the 2 events 1! 2! (1 2 5) and (1 3 4) there are 3! ways for each to occur. Hence there are 3 * 3 + 2 * 6 = 21 ways for S8. Sum 3 4 5 6 7 8 9 10 11 ...
Canonical Patterns (1 1 (1 1 (1 1 (1 1 (1 1 (1 1 (1 2 (1 3 (1 4 ...
1) 2) 3) 4) 5) 6) 6) 6) 6)
(1 (1 (1 (1 (1 (1 (1
2 2 2 2 3 4 5
2) 3) 4) 5) 5) 5) 5)
(2 (1 (1 (1 (2 (2
2 3 3 4 2 3
2) 3) 4) 4) 6) 6)
(2 (2 (2 (2 (2
2 2 2 3 4
3) 4) 5) 5) 5)
(2 (2 (2 (3
Total
3 3 4 3
3) 4) (3 3 3) 4) (3 3 4) 5) (3 4 4)
1 3 6 10 15 21 25 27 27 ...
Note: If the basic elements (equally likely) of the sample space are the number of ways under the total column, the distribution is Maxwell-Boltzmann. If the basic elements are the canonical patterns, the distribution is Bose-Einstein. If the basic elements are only the canonical patterns without repetition, the distribution is Fermi-Dirac (invoking the Pauli exclusion principle). Under Fermi-Dirac assumptions, P(S3) = P(S4) = P(S5) = 0, and there is only one way to get S6, (1 2 3), 2 for S8, (1 2 5)(1 3 4), 3 for S9, (1 2 6)(1 3 5)(2 3 4), 3 for S10, (1 3 6)(1 4 5) (2 3 5), and 3 for S11, (1 4 6)(2 3 6)(2 4 5). P(S10) = 27/216 or 6/216 or 3/216 depending on the appropriate model of reality.
13. A pair of dice is loaded so that “1” on each die is twice as likely to occur as any of the other 5 faces. What is the most likely outcome of a) a die toss? b) a dice sum? ans. 1 7. 14. Compute directly the probability of losing at the game of craps with two dice. 15. Let P(S | n) denote the probability of getting sum S with n dice. Compute the probability of getting a sum of 15 with 3 dice canonically and computing P(15 | 3) = P(12 | 2) * P(3 | 1) + P(11 | 2) * P(4 | 1) + P(10 | 2) * P(5 | 1) + P(9 | 2) * P(6 | 1). ans. 10/63.
P369463-Ch001.qxd
62
9/2/05
10:56 AM
Page 62
Chapter 1 Introduction to Probability
16. a) Compute the probability of rolling 12 dice and receiving exactly 2 of each numeral. b) Repeat for receiving 1, 2, 3, 4, 1 and 1 of 1 to 6 respectively. 17. Use canonical patterns to compute the probability of getting a sum of 19 in rolling 4 dice. (See dice problem 12.) Use command (dice-4 19) to verify. ans. 56/1296. 18. Compute the probability of rolling 4 fair dice and getting 2 pair. 19. The numerals 4, 5, and 6 are twice as likely to occur as are the numerals 1, 2, and 3. If this unfair die is tossed 216 times, how many of each numeral is expected to occur? ans. 24 24 24 48 48 48. 20. Compute the probability of S7, a sum of 7, if the numerals 4, 5, and 6 are twice as likely to occur as are the numerals 1, 2, and 3 on each of a pair of unfair dice. 21. Compute the probability of a sum of 14 with 3 dice (S14,3) by calculating and adding the probabilities of S12,2 * S2,1 + S11,2 * S3,1 + S10,2 * S4,1 + S9,2 * S5,1 + S8,2 * S6,1. ans. 15/216. 22. a. Create the canonical patterns for a sum of 8 with 3 fair dice and find P(S8). b. Suppose you are shown a die bearing a 1. Now compute P(S8). c. Suppose a second die is revealed to bear a 2. Now compute P(S8). d. Verify the probability of the sum by computing the combination sums with two dice and then the third die with 7 and 1; 6 and 2; 6 and 3; 4 and 4; 3 and 5; and 2 and 6. For example, 7 and 1 means P(S7) with 2 dice and then getting a 1 on the third die. COINS
1. Compute the probability in respective flips that a) 2 coins match, b) 3 coins match, c) 5 coins match, and d) n coins match. ans. 1/2 1/22 1/24 1/2n-1. 2. Compute probabilities of 0, 1, 2, and 3 heads in flipping a coin 3 times. 3. Three people flip a coin. The “odd” person is out. Compute the probability of the event odd person out occurring. ans. 6/8. 4. Which is more likely to occur: a) exactly 100 heads from 200 flips or b) exactly 1000 heads from 2000 flips? 5. Three coins are placed in a box. One is 2-headed, one is 2-tailed, and the third is fair. The experiment consists of randomly choosing a coin from the box. a) Compute the probability that the 2 sides match. ans. 2/3. b) Upon selecting a coin, we observe that the top side is a head. Compute the probability that the bottom side is also a head. ans. 2/3.
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 63
63
Problems
6. A coin is weighted so that tails is three times more likely to occur than heads. a) Compute the probabilities of 0, 1, 2, and 3 heads in three coin flips. b) What are the odds of a tail occurring on a single flip? 7. Seven fair coins and three 2-headed coins are placed in a box. The experiment is to randomly pick a coin and flip it. Compute the probability that the coin is fair if the result of a) a flip was heads, b) 3 flips were all heads. Use prior and posterior probabilities to compute for 3 flips at once and also in 3 stages. ans. 7/13 7/31. 8. A coin is to be tossed repeatedly. Two players pick a sequence of 3 outcomes. The player whose sequence occurs first wins. One player picks the sequence “THT.” What sequence should the other player choose to maximize the chances of winning? 9. Two fair coins are each tossed 5 times. Compute the probability that the number of heads resulting from 5 flips of a specified coin is 4 heads, given that there were 7 heads total. ans. 0.4167. 10. In an urn are 5 gold coins and 5 silver coins. Seven coins are randomly selected. Compute the probability of getting 4 gold coins.
CARDS
1. In a deck of 52 cards all but one are randomly removed unseen. Find the probability that the remaining card is a) an ace; b) the ace of spades; c) a spade. ans. 4/52 1/52 13/52. 2. Seven cards are randomly drawn from a deck. Compute the probability that a) they are all black; b) at least 1 is black; c) at least 1 is red; d) there are at most 2 of 12 face cards (kings, queens, and jacks). 3. Compute the probability of a) void (complete absence of a suit) in a specified hand of bridge consisting of 13 cards, b) a hand of all 13 diamonds ans. 4 * (39C13 / 52C13) 1/52C13. 4. Show that the event drawing a king and the event drawing a diamond are independent. 5. Thirteen cards are dealt from a deck after one card has been removed. Compute the probability that the cards are all of the same color. ans. (26C13 + 25C13) / 51C13. 6. Compute the probability of 2 jacks and 3 kings in 5-card poker. Compute the probability of a full house in 5-card poker. 7. Compute the probability that all 4 aces are together in a deck. ans. 49 /
52
8. Compute the probability of 4 aces by drawing 5 cards from a deck.
C4.
P369463-Ch001.qxd
64
9/2/05
10:56 AM
Page 64
Chapter 1 Introduction to Probability
9. Compute the probability of 4 aces in a specified bridge hand (13 cards). ans. 0.00264. 10. Compute the probability of 4 aces and 4 kings in a specified bridge hand. 11. Compute the probability of 4 aces or 4 kings in a specified bridge hand. ans. 0.00528. 12. Compute the probability of drawing an ace or a diamond. 13. Compute the odds of picking an ace from a deck of 52. ans. 12 : 1 against. 14. Compute the odds of a full house in 5-card poker. 15. Compute the odds of just 3 of one rank and 2 of another rank in 7-card poker. ans. 39.6 : 1 against. 16. Using a reduced deck of cards containing 3 aces (AS AD AC), 3 kings (KS KD KC), and 3 jacks (JS JD JC), compute the probability of 2 pairs in randomly selecting 4 cards by the methods of i) on the fly, ii) top down hierarchical approach, and iii) enumeration. b) Repeat, computing the probability of a triple. 17. Given a deck of cards without kings, jacks, and spades, compute the probability of two pairs in 5-card poker. ans. 13365/237336 = 0.056312. 18. Complete the table below for 7-card poker depictions. 7-Card Poker Hands One pair Two pairs Three pairs Triple Two triples Full house 4 : 3 Triple & 2 pair
Number of ways [uu v w x y z] [uu vv w x y] [uu vv ww x] [uuu v w x y] [uuu vvv w] [xxxx yyy] [uuu vv ww]
19. Compute the probability that a 5-card hand has a) at least one of each of the following ranks: ace, king, queen, jack, b) a void in face cards (kings, queens, and jacks). ans. 0.00414 0.2532. 20. Compute the probability that each of 4 bridge hands has an a) (5,2,2,1) division of suits, b) (4,3,3,3) division of suits.
MISCELLANEOUS 1. Two urns have 10 colored balls each. Urn X has 2W, 3B, and 5R; Urn Y has 3W, 2B, and 5R. The probability of selecting Urn X is 0.3 and of selecting Urn Y is 0.7. Find a) P(X | B), b) P(Y | W), and c) P(X | R). ans. 9/23 7/9 0.3
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 65
Miscellaneous
65
2. Five people enter an elevator on the bottom floor in a building with 6 floors. Compute the probability that at least 2 get off on the same floor, assuming that each is equally likely as not to get off on each floor. 3. In a bin are 5 different pairs of shoes. Five shoes are randomly selected. Compute the probability of a) none matching, b) one matching pair, c) two matching pairs. ans. 8/63 40/63 15/63. 4. In a drawer are 4 black socks, 6 gray socks, and 10 white socks. One reaches in and randomly grabs 3 socks. Compute the probability of a matching pair. Also solve using a backdoor approach. Repeat the experiment if there are no white socks in the drawer. 5. Of all the families that have two children with at least one boy, a) what is the probability that both children are boys? b) Given that the older child is a boy, what is the probability that both are boys? ans. 1/3 1/2. 6. What is the most likely 4-children family distribution (4 : 0, 3 : 1, or 2 : 2)? 7. How many ways can 12 indistinguishable marbles be placed into 5 boxes? Hint: Visualize 12 marbles with the 4 dividers creating the five boxes in a row, with the dividers mixed among the marbles. How many ways can you place the 4 dividers among the marbles? Develop insight by considering 1, 2, 3, 4 marbles placed into 4 boxes. ans. 16C4. 8. Invoke a plausibility argument to show that there are at least 2 people in the world who have the exact same number of hairs on their bodies. Hint: Pigeon-hole principle. 9. How many people would you have to ask in order to find one who shares a birth month and day with you with probability 1/2? ans. 1 - (364/365)n = 1/2 or n = 253 people. 10. Several people are in a room. They each clink glasses with each other and an odd number of clinks is heard. A person enters the room and the ritual is repeated but an even number of clinks is heard. Then another person enters the room and the ritual is performed once more. What is the probability of an odd number of clinks this time? Hint: Think of the number of diagonals in an n-gon, an (n + 1)-gon and an (n + 2)-gon. 11. The NCAA college basketball tournament features 64 teams with single elimination. Assuming all 64 teams are of equal ability and have an equal chance of winning, what is the probability that one Team A will play another Team B? ans. 63/64C2 = 1/64 = 0.03125. 12. Two players randomly choose a number from 1 to 10. Compute the probability that a) one number is greater than the other, b) a particular player’s number is greater than the other player’s number.
P369463-Ch001.qxd
66
9/2/05
10:56 AM
Page 66
Chapter 1 Introduction to Probability
13. There are 10,000 cars in a parking lot with a license plate consisting of 3 letters followed by 4 numerals. No two cars have the same numerical designation. When the cars begin to leave, what is the probability that the first 5 cars leave in increasing numerical order? ans. 1/120. 14. Of 3 cards, one is red on both sides, another green on both sides, and the third is red on one side and green on the other. One card is randomly picked and the green side is showing. Find the probability that the other side is also green. 15. a) An octagon has how many diagonals? b) An n-gon has how many diagonals? c) What regular polygon has twice the number of diagonals as sides? ans. 20 n(n-3)/2 7-gon. 16. Eighteen dice are rolled. Compute the probability that each numeral appears thrice. 17. An integer is randomly chosen from 1 to 100. Determine the probability that the integer is divisible by 2 or 3 or 5. ans. 74/100. 18. To win the Powerball jackpot one must match 5 numbers chosen from 1 to 49 and also match the Powerball number chosen from 1 to 42. Compute the odds of a) matching only the five numbers, b) matching the Powerball number, c) winning the Powerball jackpot. 19. (Simpson’s paradox) At one clinic Drug A cures 100 people out of 400 treated, for a cure rate of 25%. Drug B cures 10 people out of 20 treated, for a cure rate of 50%. Drug B is then reported to be twice as effective as Drug A. At another clinic Drug A cures 2 people out of 20 treated, for a cure rate of 10%, while Drug B cures 80 people out of 400 treated, for a cure rate of 20%. Drug B is again reported to be twice as effective as Drug A. Which drug has the better overall cure rate? 20. Which is more probable, S10 or S11, in rolling 3 fair dice? S9 or S12? 21. Do you see a pattern in the following digits to predict the next digit? ans.? no 397985356295141? If you do not, may you conclude that the digits are random? p = 3.14159265358979323846264338327950288419 716939937510582097494459230 . . . 22. In an urn is either a black marble or a white marble. A white marble is put into the urn and a marble is randomly selected from the urn. What is the probability that the selected marble is white? If it is white, what is the probability that the remaining marble is white?
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 67
67
Miscellaneous
23. You have 2 urns, 10 white marbles, and 10 red marbles. The experiment is to randomly pick an urn and randomly pick a marble from the urn. How should you arrange all the marbles in the urns to maximize the probability of selecting a red marble? Compute the probability. ans. 0.7368. 24. a) In mapping the integers from 1 to n into 1 to n, how many such maps contain exactly (n - 1) matches? b) How many maps have exactly 4 matches for n = 7? 25. Which is more likely, to win at craps or for 23 randomly selected people to have distinct birthdays? ans. craps. 26. a) Compute the probability of any one sequence of 23 people having distinct birthdays. b) How many unordered sequences are there? c) Find the probability of 23 people having distinct birthdays. n
27. Show that a)
n
 ÊË xˆ¯ = 2
x =0 ( n +1) / 2
c)
 x =0
n n
,
b)
Ê nˆ = 2n -1 if n is odd, d) Ë x¯
 ( -1) x =0 n -1
n
Ê nˆ = 0, Ë x¯
n
 i = ÊË 2ˆ¯ . i =1
n n 20 20 ˆ . 28. Find n for a) ÊË ˆ¯ = ÊË ˆ¯ and b) x for ÊË ˆ¯ = ÊË 3 7 x 20 - x¯ 29. Twins want to wear the same socks for an outing. In the drawer are black, white, and argyle socks. Find the minimum number of socks randomly selected from the drawer to ensure that the 4 socks match. ans. 10. 30. Create an example for two baseball players A and B so that player A has a higher batting average than Player B during both halves of the season, but Player B has the higher average over the entire season. 31. In the expansion of (x + y + z + w)10, what is the coefficient of the x2y2z3w3 term? ans. 25200. 32. a) Find the probability that a randomly chosen integer from 1 to 1000 is divisible by 3 or 5 given that it is divisible by 2. b) Find the probability that a positive integer less than 100 is relatively prime to 100. n n - 1ˆ n 33. Show that Ê ˆ = Ê Ë r ¯ r Ë r - 1¯ . 34. Describe the poker hand displayed by the probability of its occurrence.
P369463-Ch001.qxd
68
9/2/05
10:56 AM
Page 68
Chapter 1 Introduction to Probability 2
Ê13ˆ Ê 3ˆ Ê 4ˆ Ê 4ˆ Ë 3 ¯ Ë 2¯ Ë 2¯ Ë 4¯ Ê 52ˆ Ë 8¯ 35. In tossing 3 fair dice, compute the probability that the sum is less than or equal to 9. Repeat for 4 fair dice. ans. 81/216 126/1296. 36. You pick a card from a deck of 52 cards but you do not look at it. Compute the probability that the next card you pick from the deck is an ace. Use total probability. 37. Four appliances have to be repaired. A company employs 6 repairpersons. Compute the probability that exactly 2 repairpersons fix all four appliances. ans. 0.16207. 38. Of the integers 1 to 1000, how many have distinct digits (no leading zeros)? 39. In the expansion of (3x2 - 4y3)10, find the term containing y12. ans. 39,191,040x12y12 40. Find the term in the expansion of (2x2 - 3xy2 + 4z2)5 containing x3y2. 41. In a bag are numerals from 1 to 12. If 1–6 is selected, A wins. If 7–10 is selected, B wins. If 10–12 is selected, C wins. The selections are made with replacement in the order A, B, and C until one wins. Compute the probability of winning for A, B, and C. ans. 9/13 3/13 1/13. 42. Given P(A) = 1/3, P(B) = 1/2, P(AB) = 1/4, find a) P(A|B), b) P(B|A), c) P(A + B), d) P(ABc). 43. Roulette has 38 stopping bins of which 18 are of one color, 18 of another, and 2 of a third color. Compute the odds of winning on a single bet. Devise a way to simulate the game. 44. a) i) b) i)
In flipping a fair coin, which sequence below is more likely to occur? H T T H H T T T H H H T, ii) H H H H H H H H H H H. Which 5-card poker hand is more likely to occur? AS KS QS JS 10S, ii) 3D 7C KH 6S 9D.
45. Use the on the fly method to write the number of ways represented by a single canonical pattern in combinatorial form for getting 3 pairs, 2 triples, and 3 singletons in 15-card poker. [u v w xx xx xx yyy yyy] 15! ans. = 63,063,000, where the last 3 factorials 2! 2! 2! 3! 3!1!1!1! 3! 2! 3! in the denominator are for the 3 pairs, 2 triples, and 3 singletons. 46. Compute the probability of a hand in 5-card poker containing at least one card of each suit. Hint: Inclusion-exclusion principle.
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 69
69
Miscellaneous
47. Compute the probability of getting exactly 3 unique faces in n tosses of a fair die. Hint: Inclusion-exclusion principle. ans. 6C3 [3n - 3C2 * 2n + 3C1 * 1n]/6n 48. Compute the probability that a positive integer x £ 130 is relatively prime to 130. Hint: 130 = 2 * 5 * 13. 49. In a. b. c. d.
sampling with replacement and with no leading zeros, How many 5-digit numbers are there? ans. 9*104. How many are even? ans. 9*103*5. How many are palindromes? ans. 9*102*1*1. How many have exactly one 7? ans. 29,889.
50. How many internal chord intersections from vertices are there in a regular n-gon? 51. What is the probability that a 5-digit number has one or more repeated digits? Use front door approach and confirm by back door approach. ans. 0.69760. 52. There are 12 balls that look identical except one of the balls is either lighter or heavier than the others. With a balance scale, find the odd ball in a minimum number of weighings. Enumerate by using a tree diagram. The arrow indicates the side going up. 1234 5678
367 48G
3
G
3L 8H
1
2
6
9 10 11 G
9
7
1L 5H 2L 7H 4L 6H
10
12 G
367 48G
9
10
6
7
1 2
3
9L 11H 10L 12L12H 9L 11L 10L 6L 4H 7L 2H 5L1H 3H
G
8L
53. A slot machine has the following number of symbol patterns on three randomly rotating dials. Symbol
Dial 1
Dial 2
Dial 3
Bar Bell Plum Orange Cherry Lemon Total
2 1 7 8 2 0 20
1 8 2 2 7 0 20
1 7 3 4 0 5 20
P369463-Ch001.qxd
70
9/2/05
10:56 AM
Page 70
Chapter 1 Introduction to Probability
a) Construct a tree diagram and find the probability of exactly 1 cherry. ans. 0.38. b) Compute the probability of a) 3 plums, b) 3 bars, c) exactly 2 plums, d) 3 cherries, e) 3 oranges, and f ) 3 bells. ans. 0.00525 0.00025 0.08675 0 0.008 0.007. c) Compute the probability of exactly 1 plum, 1 orange, and 1 bell. ans. 0.08. Hint: (permutation-list '(plum orange bell) 3), returns ((PLUM ORANGE BELL) (PLUM BELL ORANGE) (ORANGE PLUM BELL) (ORANGE BELL PLUM) (BELL PLUM ORANGE) (BELL ORANGE PLUM)). 54. In an urn containing b blue marbles and r red marbles, two marbles are randomly selected sequentially. Compute the probability that the second marble drawn is blue. Show that this probability is independent of whether sampling is with or without replacement.
SOFTWARE EXERCISES To load the software, click on the SOFTWARE icon on your desktop leading to the executable file “Genie.exe”. The software appears interactive—issue a command, receive a result. Commands may also be put into a command file and executed. To exit, enter: Q to quit. To load a file of commands, enter (load “filename.lisp”) or (compilefile “filename.lisp”) Software expressions are shown bold and enclosed in parentheses. Some basic calculations follow in communicating with the Genie. At the Genie > prompt, try the following commands. The symbol Æ stands for returns. (+ 1 2 3 4 5) Æ 15 (- 1 2 3 4 5) Æ -13 (* 1 2 3 4 5) Æ 120 (/16 8 4) Æ 0.5 (expt 2 3) Æ 8 (+ (expt 2 3) (expt 3 2)) Æ 17 (+ (* 2 3 4) (/16 8)) Æ 26 (list 'a 'b 'c) Æ (A B C) (cons 'a '(b c)) Æ (A B C) (append '(A B) '(C D)) Æ (A B C D) (random 1000) returns a random integer from 0 to 999. (random 1000.0) returns a random number between 0 and 1000. The commands always return something. The software uses * for the last value returned and ** for the next to last value returned and *** for the third from last value returned. Thus after entering 2 3 4 in turn, the command (* * ** ***) returns 24. The first * denotes multiplication, the second * denotes 4, ** denotes 3, and *** denotes 2. The substitution of the *’s can save time. (args 'function-name) returns the arguments for the function. For example, (args 'square) returns number, implying the square function takes a numerical argument. An error causes the following message:
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 71
71
Software Exercises
;;; Use :C followed by an option to exit. Type :HELP for help. ;;; Restart options: ;;; 1 Abort to top level. Enter :c 1 to return to the top level, or continue at the present level. The semicolon is used to indicate comments for code documentation. Templates are in bold with arguments in italics. Commands are in bold. Note that (mu list-of-numbers) is a template whereas (mu '(1 2 3 4 5)) is a command returning the mean 3. If list-of-numbers is assigned to an actual list, then (mu list-of-numbers) is a command. Pressing the keys F3, F7, and F9 can help in editing and executing previous commands. Key Function F3 .................................................... Brings up last command for execution F7 .................................................... Displays prior commands by number As shown. Highlight a command and press Enter. 25: (anova cables) 26: (combination '(A B C D E) 3) 27: (pick-until 4 10) 28: (pi1000) 30: (firstn 100 (pi1000)) 31: (sim-coins-1-1 100 1/2 10) 32: (sim-coins 1000 1/2 10) F9 .................................................... Enter command number: 28 Brings up command 28 (pi1000) for execution. Can be used in conjunction with F7. ≠ key also brings up the prior commands with each pressing. The down arrow key proceeds in the opposite direction. The main commands to help you understand the functions used in the text are: Setf
(setf x '(1 2 3 4 5))
Mu
(mu x)
Repeat
(repeat #' square x)
List-of
(list-of 4 x)
Random
(random 10) (random -10.0)
Sum
(sum x)
; assigns x to the list of integers 1 to 5. When x is ; entered, (1 2 3 4 5) is returned. ; 3 is returned as mu returns the mean of x. ; returns (1 4 9 16 25) squaring each in x; a.k.a mapcar ; returns ((1 2 3 4 5) (1 2 3 4 5) (1 2 3 4 5) (1 2 3 4 5)) ; returns a random integer between 0 and 9. ;returns a random number between 0 and -9. ; returns 15
P369463-Ch001.qxd
72
9/2/05
10:56 AM
Page 72
Chapter 1 Introduction to Probability
Count Nth First Fifth Firstn Flatten Upto Pi1000
(count 1 '(1 1 3 1 4)) (nth 2 x) (first x) (fifth x) (firstn 3 x) (flatten '((1 2)(3 4))) (upto 10) (pi1000)
From-a-to-b
(from-a-to-b 3 9 2)
Args
(args 'square)
Print-Length
(print-length n) (print-length nil)
; returns 3. ; returns 3 as nth is 0-based ;1 ;5 ; (1 2 3) ; returns (1 2 3 4) ; returns (1 2 3 4 5 6 7 8 9 10) ; returns the list of the first 1000 ; decimal digits of p. ; returns (3 5 7 9) in increments of ; 2. The default is 1. ; returns entering arguments of ; function square, ; enables the first n items of output ; list. ; enables the entire output list. ; For example, x Æ (1 2 3 4 5) ; (print-length 3) x Æ (1 2 3 . . .)
Ensure that the parentheses balance before entering a command. Nearly all simulations are preceded by the prefix sim- as in (sim-diceroll 1000). 1. (coin-flips n p) returns the simulated results from n coin flips with probability of success p for the event heads. (coin-flips 100 1/20) may return 5 heads 95 tails. 2. (sim-coins n p m) returns a list of the number of heads (successes) with p the probability of success from n coin flips repeated m times. (simcoins 100 19/20 10) may return (100 97 96 98 94 97 92 99 97 91). Try (sim-coins 100 1/2 10). 3. (mu list-of-numbers) returns the average or mean of a list of numbers. (mu '(1 4 4)) Æ 3. When used with #2 we get the result of n * m simulations of a coin flip. Try each of the following software commands three times each to see the ratio tend toward 5, 50, and 500 respectively: (mu (sim-coins 10 1/2 100)); (mu (sim-coins 100 1/2 100)); (mu (sim-coins 1000 1/2 100)). 4. (sim-die-roll n) returns the simulated outcomes from the rolling of a die n times or the rolling of n dice simultaneously. (sim-die-roll 10) Æ (3 6 4 1 2 5 3 6 4 5). The command (sim-die-roll 5) simulates the playing of poker with dice. Try the software commands (sim-die-roll 12); (sim-die-roll 120); (sim-die-roll 1200). Then try (mu (sim-die-roll 12)); (mu (sim-die-
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 73
73
Software Exercises
roll 120)); (mu (sim-die-roll 1200)) to see each approach the theoretical average of 3.5. (print-count-a-b a b list) returns a count of the numbers from a to b in the list. For example, (print-count-a-b 1 6 (sim-die-roll 3000)) may return Integer Count
1 500
2 513
3 527
4 492
5 461
6 507.
The empirical average is 10422/3000 = 3.474. 5. (sim-dice-roll n) returns the n simulated sum of rolling a fair pair of dice. (sim-dice-roll 12) may return (6 4 6 8 4 4 6 7 7 8 6 6). Try the following software commands: (sim-dice-roll 12); (sim-diceroll 120); (sim-dice-roll 1200). Later in Chapter 7 we will test if such outcomes give evidence of fair dice. How many times should S7, the sum of 7, occur in rolling a pair of dice 36 times? We can get an empirical distribution from rolling a pair of fair dice 1000 times with the command (printcount-a-b 2 12 (sim-dice-roll 1000)) returning Integer Count
2 26
3 49
4 76
5 106
6 142
7 171
8 147
9 133
10 65
11 55
12 30
The sample estimate for the expected value is 7070/1000 = 7.07 7. Press the F3 key or n the ≠ up arrow key to repeat the simulation. 6. (mu (sim-dice-roll n)) returns the average of n rolls of a fair pair of dice. (mu (sim-dice-roll 1296)) should return a number close to 7. Use the F3 key to repeat the command and observe values above and below 7 but all values close to 7. 7. (f n) returns n!. (f 5) Æ 120. Try (f 10); (f 100). (! n) also returns n!. 8. (permutation n r) returns nPr. (permutation 10 2) Æ 90. Also (perm 10 2) Æ 90. (permutation list r) returns the permutations of list taken r at a time. (permutation '(1 2 3 4) 2) returns the twelve 2-permutations ((1 2) (1 3) (1 4) (2 1) (2 3) (2 4) (3 1) (3 2) (3 4) (4 1) (4 2) (4 3)). (permute list case) permutes the objects in the list for case D if the objects are distinguishable and I for indistinguishable. The default is indistinguishable. How many are returned by (permute '(r a d a r))?
P369463-Ch001.qxd
74
9/2/05
10:56 AM
Page 74
Chapter 1 Introduction to Probability
((D A A R R) (D (D R R A A) (A (A A R D R) (A (A R A R D) (A (R D R A A) (R (R A R D A) (R
A R A R) (D D A R R) (A A R R D) (A R R D A) (A A D A R) (R A R A D) (R
A R R A) (D D R A R) (A R D A R) (A R R A D) (R A D R A) (R R D A A) (R
R A A R) (D D R R A) (A R D R A) (A D A A R) (R A A D R) (R R A D A) (R
R A R A) A D R R) R A D R) D A R A) A A R D) R A A D))
9. (combination n r) returns nCr. (combination 10 2) Æ 45. (combination 10 8) Æ 45. (combination list r) returns the combinations of list taken r at a time. For example, (combination '(1 2 3 4) 2) returns the 6 combinations ((1 2) (1 3) (1 4) (2 3) (2 4) (3 4)). (combination '(A B C D E) 3) returns what is selected and what is not selected as (A B C) (A B D) (A C D) (B C D) (A B E) (A C E) (B C E) (A D E) (D E) (C E) (B E) (A E) (C D) (B D) (A D) (B C) (B D E) (C D E) (A C) (A B) 10. (pick-until target n) returns a list of integers in the range from 1 to n until the target occurs. (pick-until 4 6) may return ((5 3 2 1 6 4) 6) where the last 6 indicates the total number of selections to get the target integer 4 to occur. With n = 6, a die roll is simulated. (pick-until 1845 10000) is the simulation of a small lottery where there are 10,000 numbers and 1845 is played. If you have the patience, you can opt to try for 1 in 100,000 and watch the horde of losers scroll by after the command (PRINT-LENGTH NIL). 11. Estimating p. a) Buffon’s needle. To simulate p, a needle of length L is repeatedly dropped on a hardwood floor with width D between cracks (L £ D). Record the number of times the needle crosses a crack. If L sin q is greater than x, the needle crosses a crack. Record the ratio c as the number of crossings to the total number n of drops. Show that pˆ = 2L/Dc. Note that 0 £ q £ p/2 and 0 £ x £ D where x is the distance of the lower needle edge to the crack above.
L D
x
q
L sin q
(buffon needle-length grid-width n) (buffon 1 2 100) returned (63 37 3.1746), indicating 63 crossings, estimating p to be 3.17.
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 75
75
Software Exercises
;; drop a needle of length L onto line paper of width D ;;; estimate pi to be 2L/D*in/n) (defun buffon (L D n) (let ((in 0) (out 0)) (dotimes (i n (list in out (/ (* 2 L) (* D (/ in n))))) (if (> (* L (sin (random (/ (pi) 2)))) (random (float D))) (incf in) (incf out))))) b) Given the unit circle x2 + y2 = 1 with the area of the first quadrant being p/4, we can simulate the value for p by counting the number of ordered pairs (x, y) that fall in the circle. (pi-hat n) returns an approximation to p as the ratio of the number of random pairs in the first quadrant of the circle to the total number of pairs. (pi-hat 1000) may return pi-hat = 3.18 in = 795 out = 205. ;;; monte carlo simulation of p using x2 + y2 = 1 = > p/4 = in/n (defun pi-hat (n) (let ((in 0) (out 0)) ; in is # in circle; (dotimes (i n) (if (< = (+ (square (random 1.0)) (square (random 1.0))) 1) (incf in) ; if < 1 add 1 to in (incf out))) ; if > 1 add 1 to out (format t “pi-hat = ~6,4F ~3d in ~3d out” (/(* 4 in) n) in out))) 12. (pi1000) returns the first 1000 decimal digits of p. The command (mu (pi1000)) returns the average of these 1000 digits. Predict the theoretical average, assuming that the digits are random, and try the command (mu (pi1000)) to see the actual average of the digits 0-9. 13. (pm n r) returns the probability of exactly r matches in the n! permutation maps where the integers from 1 to n are mapped onto 1 to n. a) Try (pm 3 0) to find the probability of no match. Let (1 2 3) indicate the map where 1 goes to 2, 2 goes to 3, and 3 goes to 1. The other is (3 2 1) so (pm 3 0) Æ 1/3 = 2/6. b) Try (pm 3 1) and (pm 3 3). What is (pm 3 2) without computing? c) Compute N(5, 3), the number of 5-permutation maps with exactly 3 matches. 14. (print-map n) returns the number of matches having exactly 0 1 2 3 . . . n matches. For example, (print-map 7) returns Digit Count
0 1854
1 1855
2 924
3 315
4 70
5 21
6 0
How many total maps are there for the integers 1 to 7? Try (print-map 50).
7 1
P369463-Ch001.qxd
76
9/2/05
10:56 AM
Page 76
Chapter 1 Introduction to Probability
15. The command (random-perm n) returns a random permutation of the integers from 0 to n - 1. (random-perm 10) returned (7 1 5 4 3 2 6 8 0 9). The command (solitaire) simulates the number of matches from comparing two random permutations of the integers 0 to 51. The expected number of matches is 1. The command (sim-solitaire n) returns the result of n plays of solitaire. (sim-solitaire 10) returned (0 2 2 0 0 1 1 2 2 0). (mu (sim-solitaire 1000)) should return a number close to 1. Repeat the command. Also try the command (pairlis (random-perm 20) (random-perm 20)) and scan for matches. 16. (from-a-to-b a b step) returns an arithmetic sequence of numbers from a towards b with the common difference being the value of step. For example, (from-a-to-b -3 3 1/2) returns (-3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 3). (sum-a-to-b a b step) returns the sum of the arithmetic sequence from a towards b with step defaulting to 1. (sum list-ofnumbers) returns the sum of the list. (sum '(8 37 56 -12 2/3)) returns 89. 6666. ( n +1) / 2 n n Ê nˆ ( 1 ) = 0 , and c)   ÊË xˆ¯ = 2n -1 if n Ë x¯ x =0 x =0 x =0 is odd for n = 7 by using (pascal 7) and summing the coefficients with (sum (pascal 7)). n
17. Verify a)
n  ÊË xˆ¯ = 2n , b)
n
18. (birthday n) returns the probability of at least 2 or more people sharing a birth month and day from a group of n people. For example, (birthday 23) returns 0.5072972. What would (birthday 90) return in all probability? 19. (poker-5) returns the probability and odds against selecting the poker hands from 5-card poker. A menu asks the reader to select the hand by choosing a number. For example, (poker-5): Select the desired 5-card poker hand by choosing a number. 1. Full House 5. 4 of a rank 8. Bust
2. 1 Pair 6. Flush
3. 2 Pairs 4. 3 of a rank 7. Straight (aces high and low)
4 Æ selected 3 of a kind (p = 0.0211284 Odds = (46.3295454 : 1 AGAINST)) 20. (swr m source) returns m random numbers with replacement from the source list of numbers. (swr 10 (upto 100)) returned the 10 random numbers (69 3 79 85 64 32 17 74 15 73)
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 77
77
Software Exercises
from the integers 1 to 100. What would you expect to return from (mu (swr 1000 (upto 100)))? (setf rand-nums (swr 100 (upto 10000))) Æ 100 random numbers from 1 to 10000. (count-if #'(lambda (x) (< ¥ 1000)) rand-nums) Use the F3 key to change the command to 2000, 3000, . . . , 9000, 10000 to check the numbers. Expect an increase in 10 for each 1000 increase. One assignment returned 11 20 30 40 50 62 68 81 90 100, respectively, from 1000 to 10000. (sample-with-replacement n list) returns n random selections from a list of numbers. One may also use (SWR n list) or (SWOR n list) for sampling with and without replacement. 21. What is the most frequent integer in the first 1000 integers of p? The template (print-count-a-b a b list) returns the count of each integer from a to b contained in the list. (print-count-a-b 0 9 (pi1000)) returns the count of each digit of p. Count Integer
93 0
116 1
102 2
104 3
93 4
96 5
94 6
95 7
101 8
106 9
Note that the integer 1 occurs most frequently (116 times) in the first 1000 integers of p. The command (random-vs-pi-digits) compares the count of the first 1000 decimal digits of p with 1000 randomly chosen digits. (randomvs-pi-digits) may display Random vs. Pi 1000 Digits Digit Random-Count Pi-Count
0 109 93
1 108 116
2 106 102
3 95 104
4 107 93
5 83 96
6 99 94
7 88 95
8 101 101
9 104 106
22. Assume a fair $1 betting game in which a bet is doubled each time for a loss but returns to betting $1 each time it wins (St. Petersburg’s paradox). Try (sim-double-bet n) to see the effects. Returned is a list of W ~ win, L ~ lose along with the bets made. Notice the winning and losing streaks in (sim-double-bet 1000). LWLLLLWLWLLWWLLLWWLLLWLLLWLWWWWWWWWLLLLL LWWWLWWWWLWWWLLWLLWLLLWWWLWLLLLLLLLLWWW LLWWWWLLLWLWLWLLWLLWWLLLLLWLWWLLLWLLWWWW WLLLWWLWLLLLWLLWWLWWWLLLWLWWLLLLWLWLLLWW
P369463-Ch001.qxd
78
9/2/05
10:56 AM
Page 78
Chapter 1 Introduction to Probability LWLLLWWWLLLLWLLLWWWLWLWLWWLLLWWLWWWWLWL WWLLWWWWLWLWLLWLLWWLLLWLWWLWLLWWLWWWLLW WWWWWWLLWWWLLLWWLLLLWLWWLLLLLLWLWLLLWWL LWLLLWWLLLWLLLLLLWWWLLLWLWWLWWLWWWLLLWWL WWLWLLWWWLWLLLLLLLLLLWWLWLWLLWLLWWWLLLWL LWLWWWLWLLWLLLWLWLLWWWLWLWLLWWWLLLWWWLL LLWWLLWWWWLWLWWWLLWLWWLLWLLLWLWWWWLWLLL LLWWWLLWLLWWWLLWLLWLLLLLWWLLLWLLLWWWWWW WLWLLWLWWWLWLLLWLWWLWWWLLLLLWWLLLWWWWLL LLLLLLWWWWLWLLLLLWWWWWWLLWLWLLWWWLWLWWW WWWWLLLWWLLWWLLLWLWLLWWWLWLWWWLWLWWWLL WLLLLLWLLLWLWLWLWLWWLWLLWLLWLWWWWLLLWLLL WLWWLLLWLWLWLLWLWLLLWWWLWWWWWLWLLWWLWW LWWWWWLWLLLWWLLLLLLLLLLLWLWWWWWWWWWWWW LWWWLWWWLWLLWWWLWWWWLWLWLLLWWLLLLLWWWL LLWWWLWLLLWLWLLWWLLWLLWLLLLWWLLWLLLWLWWL LWLWWWWWLLWWWWLWLWWLLWWWWLWLLWWWLWLWW WLWLWWWWLLWLLLLLLWLWWLLLLWWWWWLLWLWLLWW LLWWLLLLLLWWLWLLWLWWWLWWLWWWWLWWLLWWLLL WLWLLLLLWWWLLLLWWLLWLWLWLWLWWWLWLWLWWWL WLLLWWLLWLWWWWWWLWLLWWLLLWLLLLWLLWWLLLW WWLLWLLLWLWWLWLWLWLWLLLL 1 2 1 2 4 8 16 1 2 1 2 4 1 1 2 4 8 1 1 2 4 8 1 2 4 8 1 2 1 1 1 1 1 1 1 1 2 4 8 16 32 64 1 1 1 2 1 1 1 1 2 1 1 1 2 4 1 2 4 1 2 4 8 1 1 1 2 1 2 4 8 16 32 64 128 256 512 1 1 1 2 4 1 1 1 1 2 4 8 1 2 1 2 1 2 4 1 2 4 1 1 2 4 8 16 32 1 2 1 1 2 4 8 1 2 4 1 1 1 1 1 2 4 8 1 1 2 1 2 4 8 16 1 2 4 1 1 2 1 1 1 2 4 8 1 2 1 1 2 4 816 1 2 1 2 4 8 1 1 2 1 2 4 8 1 1 1 2 4 8 16 1 2 4 8 11121212112481121111212112411112121241241124812 1 1 2 1 2 4 1 1 2 1 1 1 2 4 1 1 1 1 1 1 1 2 4 1 1 1 2 4 8 1 1 2 4 8 16 1 2 1 1 2 4 8 16 32 64 1 2 1 2 4 8 1 1 2 4 1 2 4 8 1 1 2 4 8 1 2 4 8 16 32 64 1 1 1 2 4 8 1 2 1 1 2 1 1 2 1 1 1 2 4 8 1 1 2 1 1 2 1 2 4 1 1 1 2 1 2 4 8 16 32 64 128 256 512 1024 1 1 2 1 2 1 2 4 1 2 4 1 1 1 2 4 8 1 2 4 1 2 1 1 1 2 1 2 4 1 2 4 8 1 2 1 2 4 1 1 1 2 1 2 1 2 4 1 1 1 2 4 8 1 1 1 2 4 8 16 1 1 2 4 1 1 1 1 2 1 2 1 1 1 2 4 1 2 1 1 2 4 1 2 4 8 1 2 1 1 1 1 2 1 2 4 8 16 32 1 1 1 2 4 1 2 4 1 1 1 2 4 1 2 4 1 2 4 8 16 32 1 1 2 4 8 1 2 4 8 1 1 1 1 1 1 1 2 1 2 4 1 2 1 1 1 2 1 2 4 8 1 2 1 1 2 1 1 1 2 4 8 16 32 1 1 2 4 8 1 1 1 1 2 4 8 16 32 64 128 256 1 1 1 1 2 1 2 4 8 16 32 11111124121241112121111111248112411248121241112 1 2 1 1 1 2 1 2 1 1 1 2 4 1 2 4 8 16 32 1 2 4 8 1 2 1 2 1 2 1 2 1 1 2 1 2 4 1 2 4 1 2 1 1 1 1 24812481211248121212412124811121111121241121121 1 1 1 1 2 1 2 4 8 1 1 2 4 8 16 32 64 128 256 512 1024 2048 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 2 1 1 1 2 1 2 4 1 1 1 2 1 1 1 1 2 1 2 1 2 4 8 1 1 2 4 8 16 32 1 1 1 2 4 8 1 1 1 2 1 2 4 8 1 2 1 2 4 1 1 2 4 1 2 4 1 2 4 8 16 1 1 2 4 1 2 4 8 1 2 1 1 2 4 1 2 1 1 1 1 1 2 4 1 1 1 1 2 1 2 1 1 2 4 1 1 1 1 2 1 2 4 1 1 1 2 1 2 1 1 1 2 1 2 1 1 1 1 2 4 1 2 4 8 16 32 64 1 2 1 1 2 4 8 16 1 1 1 1 1 2 4 1 2 1 2 4 1 1 2 4 1 1 2 4 8 16 32 64 1 1 2 1 2 4 1 2 1 1 1 2 1 1 2 1 1 1 1 2 1 1 2 4 1 1 2 4 8 1 2 1 2 4 8 16 32 1 1 1 2 4 8 16 1 1 2 4 1 2 1 2 1 2 1 2 1 1 1 2 1 2 1 2 1 1 1 2 1 2 4 8 1 1 2 4 1 2 1 1 1 1 1 1 2 1 2 4 1 1 2 4 8 1 2 4 8 16 1 2 4 1 1 2 4 8 1 1 1 2 4 1 2 4 8 1 2 1 1 2 1 2 1 2 1 2 1 2 4 8) winning $483.
Notice the relatively long losing streak in bold from 1 to 2048, losing 11 consecutive times, or risking $2,048 to win $1. In a similar vein, suppose you have $7 to bet on the outcome of a fair coin flip. If outcome heads occurs you win $1. You will bet until you win $1 or lose all $7 by doubling your bet if you lose. See the following density table.
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 79
79
Software Exercises X
H
P(X) Outcome ($)
1
/2 1
TH 1
/4 1
TTH 1
/8 1
TTT 1 /8 -7
The expected gain is 1/2 *1 + 1/4 * 1 + 1/8 * 1 + 1/8 * -7 = 0, implying a fair game. In general, the density table for the Martingale strategy is X P(X)
1
-2n + 1
1 - qn
qn
Win $1 but can lose $2n - 1.
E(X) = 1 - qn - qn(2n - 1) = 1 - (2q)n. For a fair game, p = q = 1/2, E(X) = 0. 23. Suppose you start with $64 and always bet half of what you have each time for 6 bets, winning half of the bets in any order. Show that your final sum is $27 and thus you always lose. Try (bet sum order fraction), for example (bet 64 '(1 0 0 0 1 1) 1/2). Notice your bet sequence is win-lose-lose-lose-win-win. Then try a simulation where the probability of winning each time is 1/2 using (sim-bet1 sum fraction n) where n is the number of trials and fraction is 1/2 of remaining money for the next bet. Your win-lose sequence is random, with probability 1/2 on each bet. Observe that the bet function returns 27 each time, while sim-bet1 returns a random (powers of 3 to 729) sum each time. For example, (sim-bet1 64 1/2 6) may return 3, 9, 27, 81, 243, or 729. Then try (sim-bet2 start$ fraction n m) for m of (simbet1) for n trials, starting with start$ and betting a fraction of your current dollars. Use (mu (sim-bet2 64 6 1/2 10000)) repeatedly to see the expected value estimate about 64. 24. (dice-4 sum) returns the number of ways to get the sum along with the canonical patterns. (dice-4 19) returns 56 ways of getting a sum of 19 from the 1296 total ways the 4 dice can fall. (56 (1 6 6 6) (2 5 6 6) (3 4 6 6) (3 5 5 6) (4 4 5 6) (4 5 5 5)). Using (repeat #¢eval-perms ¢((1 6 6 6) (2 5 6 6) (3 4 6 6) (3 5 5 6) (4 4 5 6) (4 5 5 5))) returns (4 12 12 12 12 4) summing (sum '(4 12 12 12 12 4)) Æ 56. (dice sum n-dice) returns a list containing the favorable number of ways of getting sum with n dice, the total number of ways and the probability. For example, (dice 19 4) returns (56 1296 0.0432098). What would (dice 20 3) return? (canonical-dice 4) prints the entire canonical structure as does (canonical-dice 3), (canonical-dice 2), and (canonical-dice 1). Try (canonical-dice 3).
P369463-Ch001.qxd
80
9/2/05
10:56 AM
Page 80
Chapter 1 Introduction to Probability
25. Try faking the random outcomes of 100 flips of a fair coin by writing 1 for Heads and 0 for Tails. Then compare your list with the return from (random-coin-flip 100). An expert could easily spot the difference. Notice the lengths of the runs of 0’s and runs of 1’s. 26. How many positive integers £ n are relatively prime to n? For n = 100, 100 = 2252. 100 - (50 - 20 + 10) = 40. The probability of a randomly selected number from 1 to 100 is relatively prime to 100 is 40/100 = 0.4. We simulate. (defun sim-prime-to-n (n m) (/ (count 'T (mapcar #' (lambda (x) (= 1 (gcd x n))) (swor n (upto m)))) m)) (sim-prime-to-n 100 100) Æ 39. 27. A card is either white (W) or black (B). An unknown card is in a hat. A W card is put into the hat. Then a card is randomly selected from the hat. Find P(W). ans. 3/4. To simulate, try the command (sim-BW-cards 1000), which returned a list of W and B selected from the experiment. (sim-BW-cards 1000) should return a number close to 750 for the number of white cards selected. 28. Simulation of 3 coins in a box: one 2-headed coin, one fair and one 2-tailed. (sim-HT2 n) performs the probability of matching coin faces 2/3. (sim-HT2 5) returned Coin Selected (H H) (H T) (T T) (H T) (T T) 0.6
Face Shown
W/L
H T T H T
W L W L W
(Try (sim-HT2 1000) Æ 2/3 probability. 29. A simulation of the game of craps where a fair pair of dice is rolled. If the sum of the dice is 2, 3, or 12 on the first roll, the player loses. If the sum is 7 or 11, the player wins. Any other sum becomes the point. The dice are then rolled repeatedly until a 7 occurs, player loses, or the point sum occurs, player wins, and game is restarted. As previously calculated, the probability of winning at craps is 0.492. Try (sim-craps n) to get an estimate. (sim-craps 10000) may return 0.4938.
P369463-Ch001.qxd
9/2/05
10:56 AM
Page 81
81
Self Quiz 1A: Conditional Probability
30. (PRINT-COMBINATIONS lst n r) returns the nCr combinations beginning with lst in lexicographic order. (PRINT-COMBINATIONS '(A B C) 5 3) prints the 10 combinations (A B C)(A B D)(A B E)(A C D)(A C E)(A D E) (B C D)(B C E)(B D E)(C D E)(C D E). Generate the 300 candidate monograms for the child of Mr. and Mrs. Zero. (see Example 1.24) with the command (PRINT-COMBINATIONS '(A B) 25 2).
SELF QUIZ 1A: CONDITIONAL PROBABILITY 1. A fair die is rolled, showing an even number. The probability that the die is a prime number is a) 3/6 b) 1/3 c) 1/6 d) 1/2. 2. A pair of fair dice is rolled. The probability of a sum of 5 given that the sum is odd is a) 1/2 b) 4/36 c) 4/18 d) 4/13. 3. In an urn are 3 red marbles and 4 white marbles. Marbles are randomly selected without replacement. The probability that the second marble picked is white given that the first marble was white is a) 4/7 b) 3/7 c) 1/2 d) 1/7. 4. The probability of randomly selecting a jack from a deck of 52 cards after being informed that the card is a spade is a) 1/13 b) 1/4 c) 4/13 d) 4/52. 5. Given that P(A + B) = 21/32, P(AB) = 5/32, P(C|B) = 1/5, P(B + C) = 15/16, P(A|B) = 1/3, P(AC) = 5/32, a) P(BC) = ___ b) P(B) = ___ c) P(A) = ___ d) P(B|C) = ___ e) P(C) = ___ f) P(B|A) = ___. 6. Ten coins are in a box. Five are fair, 3 are 2-headed, and 2 are 2-tailed. In randomly selecting a coin, P(fair) = ___ and P(fair | head) = ___. a) 5/20 and 1/2 b) 5/20 and 3/4 c) 1/2 and 5/11 d) 1/2 and 6/11. 7. Compute the sensitivity, specificity, positive and negative predictive values, and prevailing rate from the following 1000 test cases. ACTUAL DIAGNOSIS Disease No Disease
Total
Test Positive Test Negative
520 50
70 4360
590 4410
Total
570
4430
5000
P369463-Ch001.qxd
82
9/2/05
10:56 AM
Page 82
Chapter 1 Introduction to Probability
Answers: 1b 2c 3c 4a 5[a 3/32 b 15/32 c 11/32 d 1/6 e 18/32 f 5/11] 6c 7. Sensitivity = 520/570, Specificity = 4360/4430, Positive Predictive Value = 520/590, Negative Predictive Value = 4360/4410, Prevailing Rate = 570/5000.
SELF QUIZ 1B: POKER PROBABILITY x-card poker means randomly selecting x cards from a deck of 52 cards. 1. The number of favorable ways of getting 3 pairs in 7-card poker is a) 617,760 b) 15,440 c) 2,471,040 d) 133,784,560. 2. The number of ways of getting 3 of one kind and 2 of another in 7-card poker is a) 3,744 b) 1,647,360 c) 411,840 d) 3,294,720. 3. The number of favorable ways of getting 1 pair in 7-card poker is a) 63,258,624 b) 2,471,040 c) 24,710,400 d) 133,784,560. 4. The number of ways to get one pair in 3-card poker is a) 78 b) 156 c) 6,864 d) 3,744. 5. The number of favorable ways to get 3 of one kind and 4 of the other in 7-card poker is a) 278 b) 1,560 c) 320 d) 624. 6. The odds of getting 2 pairs and one triplet in 7-card poker are a) 1167 : 1 b) 1121 : 1 c) 1082 : 1 d) 1508 : 1. 7. The odds of getting 3 pairs in 7-card poker are a) 116 : 1 b) 121 : 1 c) 108 : 1 d) 53 : 1. 8. The number of ways of getting 3 pairs in 6-card poker is a) 71,166 b) 1,716 c) 61,776 d) 3,291. 9. The number of ways of getting 4 pairs in 9-card poker is a) 1,287 b) 2,574 c) 33,359,040 d) 13,343,616. 10. The probability of getting 1 pair in 2-card poker is a) 3/51 b) 6/51 c) (4/52) * (3/51) d) 78/1326. The command (Poker-5) presents a menu for selecting the probability and returns the odds against various holdings in 5-card poker. Answers: 1c 2d 3a 4d 5d 6c 7d 8c 9c 10ad
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 83
Chapter 2
Random Variables, Moments, and Distributions
. . . Born under a wandering star.
Random variables wander along the x-axis but are more likely to be in some locations than in others according to their density functions. This chapter introduces the most important concepts in probability and statistics: random variables and their cumulative, density, joint, and conditional probability distributions. Moments, variance, and covariance are defined along with the concept of moment generating functions. Examples demonstrating the concepts are analyzed and enhanced with software commands. 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.10 2.11
Introduction Random Variables Distributions Moments Standardized Random Variables Jointly Distributed Random Variables Independence of Jointly Distributed Random Variables Covariance and Correlation Conditional Densities Functions Moment Generating Functions Transformation of Variables Summary 83
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 84
Chapter 2 Random Variables, Moments, and Distributions
84
2.0
Introduction An important concept in probability theory is that of a random variable. A random variable is a function, one of seven basic calculus concepts (function, limit, continuity, derivative, anti-derivative, definite integral, and infinite series). In the first chapter different methods were used to solve various probability problems: enumeration, on the fly, solving the complement, recursion, and sophisticated counting methods. The concept of a random variable is another powerful method for solving probability problems. Since the random variable is a function, in one sense the reader should already be familiar with the concept. We will look at the domain, range, distribution, and moments of random variables to enhance understanding of probability and to learn the whereabouts of the wandering random variables. Included are transformation of random variables and concepts of marginal, joint, and conditional distribution.
2.1
Random Variables In an experiment, the uncertain events of the sample space can be mapped into the set of real numbers. Consider the experiment of flipping a coin once. We can assign the outcome “head” to p and the outcome “tail” to q, where p and q are real numbers between 0 and 1 inclusive. Now consider the experiment of tossing a coin 3 times. Measures of interest could be the total number of heads, the difference between the number of heads and the number of tails, whether the first and the third trials match (1 if they do and 0 if they don’t), the outcome of the second trial denoted by 0 for tails and 1 for heads, and various other measures. A random variable X is a function of the possible outcomes of the sample space in an experiment and maps these outcomes into the set of real numbers called the range space. The probability distribution of a random variable which maps the domain sample space of events into their probabilities is called the probability or density or mass function. Outcome space can be mapped into the real numbers from which the density function maps these real numbers into probability space [0, 1]. Random variables (RV) are denoted by capital letters, with X used as the default letter. The small letter x denotes the values in the population that X can assume. In the 3-coin experiment, the domain is the elementary sample space S: {HHH HHT HTH HTT THH THT TTH TTT}. Let RV X designate the number of heads. To see the action of X as a function, the individual maps are shown below: X: HHH Æ 3 HHT Æ 2 HTH Æ 2 HTT Æ 1 THH Æ 2 THT Æ 1 TTH Æ 1 TTT Æ 0.
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 85
2.1 Random Variables
85
The domain of the random variable is the set of outcomes in the sample space. The range is a set of real numbers; in this case x is a member of the set {0 1 2 3}. Now let RV X be the difference between the number of heads and tails in the 3-flip coin experiment. X: HHH Æ 3 HHT Æ 1 HTH Æ 1 HTT Æ -1 THH Æ 1 THT Æ - 1 TTH Æ 1 TTT Æ -3 The range is the set {-1 -3 1 3}. Let RV X in the 3-flip coin experiment be 1 if the first and third coins match, and 0 if they do not. X: HHH Æ 1 HHT Æ 0 HTH Æ 1 HTT Æ 0 THH Æ 0 THT Æ 1 TTH Æ 0 TTT Æ 1. The range is the set {0 1}. This type of random variable is often referred to as an indicator RV, indicating event occurrence with a 1 and nonoccurrence with a 0. See Figure 2.1. Finally, let RV X assign the number of heads occurring in the 3-flip coin experiment to the probability of its occurrence (probability space). The domain is {0 1 2 3} and the range or probability space is {1/8 3/8} for a fair coin. The probability density function for RV X assigns the domain to probability space and is denoted by default as f(x). The density function consists of the ordered pairs {(0, 1/8) (1, 3/8) (2, 3/8) (3, 1/8)}. RV X maps events in the sample space into the set of real numbers which f(x) maps into the values between 0 and 1, called the probability space [0, 1]. Specifically, each elementary event in the sample space is assigned to the probability of that event occurring. This probability density function f(x) is often designated as P(x) and is referred to as the probability mass or density function satisfying i) 0 £ P(X) £ 1 and ii) SP(X) = 1 for discrete random variables.
I(x) p
q x 0
Figure 2.1
1
Indicator RV with P(X = 1) = p
P369463-Ch002.qxd
9/2/05
86
11:01 AM
Page 86
Chapter 2 Random Variables, Moments, and Distributions
The cumulative distribution function of a random variable is denoted by the default symbol F, with an appropriate subscript when necessary to indicate its associated random variable, e.g., FX. The cumulative distribution function is defined as x
Ú f ( x )dx = Â f ( x)
Fx( X ) ∫ P ( X £ x ) =
-•
(continuous RVs) (discrete RVs),
"x
the probability that RV X is less than or equal to some specified value x. The cumulative distribution function is monotonic or nondecreasing, implying that for x1 £ x2; F(x1) £ F(x2). The granularity of the sample space may change with the interest in the events. For example, if RV X is the number of heads in the 3-fair-coin experiment, the domain space can be transformed into the domain space {0 1 2 3} with TTT TTH, THT, and HTT THH, HTH, and HHT HHH X f(X )
0 1/8
1 3/8
Æ 0 Head; Æ 1 Head; Æ 2 Heads; and Æ 3 Heads. 2 3/8
3 1/8
Observe that P ( X = 3) = P ( HHH) = 1/8 where x = 3 heads, and P ( X = 2) = P (THH + HTH + HHT) = 1/8 + 1/8 + 1/8 = 3/8. Figure 2.2 shows the discrete density and cumulative distribution for RV X. EXAMPLE 2.1a
Create the probability density function for RV X being the sum of a pair of fair dice. Solution The sample space consists of 6 * 6 = 36 different singular events of the red and blue die permutations. The random variable of interest is the sum ranging from 2 to 12. The granularity of interest is each of the 36 equally likely events with probability 1/36 of occurrence. We can count the number of equally likely elementary events in Table 2.1 to create the probability density function in Table 2.2. For example, the sum 4 denoted by S4, occurs for ordered pairs (1, 3), (2, 2), and (3, 1). Let RV X be the outcome sum of the dice experiment. The number of elementary events summing from 2 to 12 determines the probability of occurrence of the sum. Thus the probability of the outcome sum equaling 4 is the number of favorable elementary events for the sum of 4 divided by the total number of elementary events; that is 3/36. The density function can also be given in formula as
9/2/05
11:01 AM
Page 87
F (x) 1 1/8 3/4 3/8 1/2 3/8 1/8
1/8 0
x
1
2
3
Cumulative Distribution Function f(x) 3/8
3/8
1/8
1/8
x 0
1
3
2
Discrete Density Function
Figure 2.2 Table 2.1
RED DIE
P369463-Ch002.qxd
Table 2.2 X P(X )
3 Fair Coin Flips Sample Space for Sum of Two Dice
+
1
2
1 2 3 4 5 6
2 3 4 5 6 7
3 4 5 6 7 8
BLUE DIE 3 4 4 5 6 7 8 9
5 6 7 8 9 10
5
6
6 7 8 9 10 11
7 8 9 10 11 12
Probability Density Function for Sum of Two Fair Dice 2
3
4
5
6
7
8
9
10
11
12
1/36
2/36
3/36
4/36
5/36
6/36
5/36
4/36
3/36
2/36
1/36
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 88
Chapter 2 Random Variables, Moments, and Distributions
88
Ï x -1 ; x = 2, 3, 4, 5, 6,7 ÔÔ f ( x ) = Ì 36 Ô13 - x ; x = 8, 9,10,11,12. ÔÓ 36 EXAMPLE 2.1b
Simulate the density function for RV X as the sum of a pair of fair dice rolled 1296 times. Solution The command (print-count-a-b 2 12 (sim-dice-roll 1296)) printed: Empirical Integer Count
2 37
3 70
4 116
5 141
6 187
7 208
8 190
9 143
10 94
11 81
12 29
8 190
9 144
10 108
11 72
12 36
Theoretical Integer Count
2 36
3 72
4 108
5 144
6 190
7 216
For example, the theoretically expected number of sums S2 or S12 should be the same, computed as 1296/36 = 36. Also, 1296/6 = 216 is the theoretically expected number of S7. Later, in Chapter 7, we will test the empirical data with the theoretical data to determine if the pair of dice is fair. EXAMPLE 2.2
Create the probability density function for RV X, where X is the number of divisors of a randomly chosen integer from 1 to 10. Solution First determine the divisors of the first 10 integers as shown in Table 2.3a. Table 2.3b shows the density distribution of the number of the divisors. Figure 2.3c displays the discrete density function. Figure 2.4 displays the cumulative distribution function. Table 2.3a
Number of Divisors
Integer
1
2
3
4
5
6
7
8
9
10
# Divisors
1
2
2
3
2
4
2
4
3
4
Table 2.3b X P(X )
Number of Divisors, Density Distribution 1
2
3
4
1/10
4/10
2/10
3/10
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 89
2.1 Random Variables
89
f (x ) 4/10 3/10 1/10
2/10 x
1
Figure 2.3
2
4
3
Discrete Density for Number of Divisors
F (x )
1
8/10
5/10
1/10 x
Figure 2.4
Cumulative Distribution Function for Number of Divisors
Table 2.4a
EXAMPLE 2.3
Table 2.4b
+
1
2
3
X
1 2 3
2 3 4
3 4 5
4 5 6
P(X )
Density for 3-Sided Dice 2
3
4
5
6
1/9
2/9
3/9
2/9
1/9
The experiment is rolling a pair of “3-sided” dice (1-2-3). The random variable X is the sum of the dice. Compute the probabilities of the outcome space for X as shown in Table 2.4. Draw the cumulative distribution function and find P(X £ 4). Solution There are 3 ¥ 3 = 9 elementary outcomes. We note that P(X £ 4) = 6/9 and P(X > 4) = 3/9. We also note that the sum of the probabilities is 1. See Figure 2.5 for the cumulative distribution function.
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 90
Chapter 2 Random Variables, Moments, and Distributions
90
Fx
1 8/9 6/9
3/9 1/9 x 2
3
4
5
6
Figure 2.5 Cumulative Probability Distribution Function for Sum of Two 3-sided Dice
F (4) = P ( X £ 4) = (1/9 + 2/9 + 3/9) = 6/9, and F (1) = P ( X £ 1) = 0. Random variables are often classified by their domains. Those with a finite or countably infinite domain are called discrete; those with the cardinality of the continuum are called continuous. Density functions for discrete RVs are often referred to as probability mass functions. Both terms density and mass (functions) are used. An indicator random variable X for event A is defined as X = 1 if A occurs with probability p, and X = 0 if A c occurs with probability q =1 - p. Indicator RVs are prevalent in any experiment, for one can always ask if an event occurred—for example, a full house in poker or any numeral in a lottery drawing. The expected value of indicator RV is always p. See Figure 2.5. For example, in the 3-coin-flip experiment we can designate Xi as 1 if the ith toss was Head and as 0 if the ith toss was Tail. Then X as the total number of heads is given by X = X1 + X2 + X3 for indicator RVs Xi. EXAMPLE 2.4
In the diagram of electrical circuitry (Figure 2.6), the probability that each component is closed for current flow is p. Find the probability of current flow from D to E, find the probability of no current flow, and show that the two probabilities sum to 1.
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 91
2.2 Distributions
91
A
B
D
E C
Figure 2.6
Current Flow from D to E
Solution Let A, B, and C be indicator RVs. Then P ( flow ) = P ( AB + C ) = P ( AB ) + P (C ) - P ( ABC ) = p 2 + p - p 3. P ( no flow ) = P [( A c + B c )C c ] = P ( A c C c + B c C c ) = P( AcC c ) + P( B cC c ) - P( Ac B cC c ) = (1 - p)2 + (1 - p)2 - (1 - p)3 . The sum of the two probabilities is given by p 2 + p - p 3 + 2(1 - p)2 - (1 - p)3 = p 2 + p - p 3 + 2 - 4 p + 2 p 2 - 1 + 3 p - 3 p2 + p3 = 2 - 1 = 1.
2.2
Distributions The discrete density function of a random variable X is designated f(x) and assigns to each outcome or to each value of X its probability of occurrence. That is, f ( x ) = P ( X = x ) for x real. Table 2.2 displays the density function for RV X, where X is the sum of the outcomes from tossing a pair of dice. Table 2.5 displays the cumulative distribution function for the outcomes, and Figure 2.7 shows a sketch of the function. P ( S4 ) = P ( X = 4) = P (1, 3) + P (2, 2) + P (3, 1) = 1/36 + 1/36 + 1/36 = 3/36. f(xi) = 1/36 for each xi, an elementary event. Thus P(X = 4) = 3/36. The density function f satisfies i) f(x) ≥ 0 and ii) Â f ( x i ) = 1, summing all xi
over the xi. The discrete cumulative F(x) = P(X £ x) = Â f ( x i ). xi £ x
distribution
function
is
defined
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 92
Chapter 2 Random Variables, Moments, and Distributions
92
Table 2.5 X F(x)
Probability Density Function for Sum of Two Fair Dice
2
3
4
5
6
7
8
9
10
11
12
1/36
3/36
6/36
10/36
15/36
21/36
26/36
30/36
33/36
35/36
36/36
For example, F(S7) = P(X £ 7) = P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6) + P(X = 7) = f(2) + f(3) + f(4) + f(5) + f(6) + f(7). = (1 + 2 + 3 + 4 + 5 + 6)/36 = 21/36. The cumulative distribution function is monotonic, nondecreasing with the following properties: i) F(-•) = 0; ii) F(•) = 1; iii) if x £ y, then F(x) £ F(y); and iv) P(a £ X £ b) = F(b) - F(a). x
Further, F ( x ) = P { X Œ ( -•, x]} =
Ú
f ( x )dx, implying F¢(x) = f(x).
-•
x
F
1 35/36 33/36 30/36 26/36
21/36
15/36
10/36 6/36 3/36 1/36
2
Figure 2.7
3
4
5
6
x 7 8 9 10
11
12
Cumulative Probability Distribution Function for the Sum of Two Fair Dice
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 93
2.2 Distributions
93
F (x ) 1
F (b ) F (a )
x 0
a
P (a < x < b) =
Figure 2.8
Ú
b
1
b a
f (x )dx = F (b ) – F (a )
Probability as Area Under a Curve
f (x )
x 0
a
b
P (a < X < b) =
1 b
Ú a f (x )dx.
Figure 2.9 Cumulative and Density Functions Showing F(b) - F(a) = Úab f(x)dx The cumulative distribution for the sum of a pair of dice is shown in Table 2.5. For example, P(X < 2) = F(2-) = 0 and P(X ≥ 13) = 1 - P(X < 13) = 1 - P(X £ 12) = 1 - F(12) = 1 - 1 = 0. We run into difficulties with continuous random variables and have to restrict events. Consider the unit interval as a domain for RV X. The interval consists of an infinite set of points and all the points seem to be elementary events. If we permit points to be events, what probability should we assign to each? Any value other than zero would violate the probability axiom of P(S) = 1. To circumvent this difficulty we allow only intervals to be events and assign the probability zero to each point or to each set of discrete points. The probability of each interval (a, b) is then defined to be the integral from a to b of the density function, that is, the probability is the area under the curve f(x) between the limits of integration and is F(b) - F(a) where F is the cumulative distribution function. See Figures 2.8 and 2.9. Continuous density functions are thus not unique since an infinite number of discrete points could be added to the density function, but with no prob-
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 94
Chapter 2 Random Variables, Moments, and Distributions
94
F (x )
1 3/4 1/2
0
Figure 2.10
1/2 3/4
1
Cumulative Uniform Distribution
f (x ) 1
x 0
Figure 2.11
1/2 3/4
1
Continuous Uniform Density on [0, 1]
ability measure assigned to these discrete points. Also, observe that open or closed intervals are not significant for continuous density functions. However, open and closed intervals are significant for discrete density functions. EXAMPLE 2.5
Consider the continuous uniform distribution (often referred to as the rectangular distribution) for RV X on the interval [0, 1] shown in Figure 2.11. The cumulative uniform distribution is shown in Figure 2.10. Since the length of the interval is 1, the height of f(x) must also equal 1 to ensure that the total area under the curve is 1. Find a) P(X = 1/2) and b) P(1/2 < X < 3/4). Solution 1/2 a) P(X = 1/2) = 0 = Ú1/2 1dx. 3/4 1dx = 3/4 - 1/2 = 1/4. b) P(1/2 < X < 3/4) = Ú1/2
Continuous cumulative distribution functions have properties similar to discrete cumulative distributions functions, with the exception of integrating rather than summing to compute the probabilities. That is, F(-•) = 0; F(•) = 1; for a < b, F(a) £ F(b); P( X £ x ) = F ( x ) = EXAMPLE 2.6
Ú
x
-•
f ( t)dt.
RV X has continuous density function f(x) = 2x on 0 £ x £ 1. First verify that f(x) is a valid density. See Figure 2.12. Then compute a) P(X £ 1/2);
b) P(X > 3/4);
c) P(1/2 < X < 3/4).
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 95
2.2 Distributions
95
f (x ) (1, 2)
x 0
1
P(0 < X < 1) =
Ú
1 0
2xdx = 1
f(x) = 2x on [0, 1]
Figure 2.12
Solution We verify that f is a valid density by showing that 1 = 1 and observing that f(x) is nonnegative over the interval 0 [0, 1]. See Figure 2.12. 1
Ú 2xdx = x
2
0
a) Since F ( x ) = P ( X £ x ) =
Ú
x
0
2xdx = x 2 ,
2
1ˆ 1 Ê Ê 1ˆ Ê 1ˆ P X£ =F = = = Ë ¯ Ë ¯ Ë ¯ 2 2 2 4
1/2
Ú
0
2xdx = x 2
1/2 1 = . 0 4
2 1 3ˆ 7 7 1 Ê Ê 3ˆ Ê 3ˆ = 1- F = 1= = Ú 2xdx = x 2 = . b) P X > 3 /4 Ë ¯ Ë ¯ Ë ¯ 3/4 16 4 4 4 16 3 /4 3ˆ 4 5 3/4 5 Ê1 Ê 3ˆ Ê 1ˆ 9 -F = = = Ú 2xdx = x 2 = . c) P < X < = F 1/2 Ë2 ¯ Ë ¯ Ë ¯ 1/2 16 4 4 2 16 16 16
EXAMPLE 2.7
Given the cumulative probability distribution function (Figure 2.13) of RV X, find the density function and graph each. Find P(3 < x £ 6). Ï1/3 1 £ x < 4 ÔÔ1/2 4 £ x < 6 F ( x) = Ì Ô5/6 6 £ x < 10 ÔÓ1 x ≥ 10. Solution
The density function Figure 2.14 is shown as X P(X )
1
4
6
10
2/6
1/6
2/6
1/6
P (3 < x £ 6) = 5/6 - 2/6 = 3/6.
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 96
Chapter 2 Random Variables, Moments, and Distributions
96
F (x )
1 5/6 3/6 2/6
1
Figure 2.13
4
6
10
Cumulative Distribution Function Derived from Density Function
f (x ) 2/6
2/6 1/6
1/6 x
1
Figure 2.14
EXAMPLE 2.8
4
6
10
Density Function Derived from Cumulative Distribution Function a) Verify that f is a valid density, b) compute P(-1/2 £ X £ 1/2) for RV X with density and graph the density and cumulative distribution functions. Ï1/2 Ô f ( x) = Ì 1 ÔÓ (2 - x ) 4
-1 < x £ 0 0< x E(Y ) = E( X1 ) + E( X 2 ) = 1.1 + 1.1 = 2.2. Again, the expected value is “expected” in the mathematical sense, not in the sense that we would be disappointed if the value did not occur. In fact, it may well be impossible for the expected value to ever occur from a trial. We do not really expect to get 1/2 of a head on a coin flip. To get exactly 5000 heads from 10,000 coins flips is highly unlikely. We expect to get a value close to the expected value in any long run of repeated trials.
EXAMPLE 2.13
In front of us are four bins full of money. One bin is filled with $1 bills, another with $5 bills, a third with $10 bills, and a fourth with $20 bills. We randomly choose a bin and withdraw one bill. It cost $1 to do this experiment once, $2 twice, $3 three times, and so on. a) To maximize expected return, how many times should we play? What is the expected optimum profit? b) How many times can we expect to play before losing? Solution a) Since each bin is equally likely to be chosen with probability 1/4, we expect to win (1 + 5 + 10 + 20)/4 = $9 from each pick, but it cost $x for the xth pick. After n picks, our profit gained, g(n), is given by
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 101
2.3 Moments
101 n
g( n ) = 9 n - Â x = 9 n x =1
n( n + 1)
=
17 n - n 2
2
,
2
from which g¢(n) = 17/2 - n = 0 when n = 81/2, implying 8 or 9 times for optimum profit. The expected optimum profit is g(8) = 36 = g(9). Note that g≤(n) = -1, indicating that the maximum occurs at n = 81/2 since g(0) = 0 and g(•) = -•. 17 n - n 2 b) g(n) = 0 when = 0 or when n = 17 or n = 0 (don’t play). 2
We can play 17 times before expecting a loss.
The command (sim-bin$ n) prints the outcome bills and returns a net profit or loss by playing n times. (sim-bin$ 9) returned $1 = 4 $5 = 1 $10 = 1 $20 = 3, totaling $79, costing $45 for a net payment of $34. (sim-bin$ 17) returned $1 = 5 $5 = 3 $10 = 5 $20 = 4, totaling $150, costing $153 for a net payment of $ -3. (sim-bin-n$ (n m) performs (sim-bin$ n) m times and returns the average of the outcomes. (sim-bin-n$ 9 100) returned $36.72; (sim-bin-n$ 17 1000) returned -0.5990.
EXAMPLE 2.14
Find the first 2 population moments about the origin and about the mean for the discrete population of RV X {1 2 3 4 5 6 7 8 9 10}. Solution
The first moment about the origin is E( X ) = m =
10
1
Âx 10
i
= 5.5.
i =1
The second moment about the origin is E( X 2 ) =
1
10
Âx 10
2 i
= 38.5.
i =1
10
The first moment about the mean is E[( x - m )] = Â i =1
( xi - m )
= 0.
10 10
The second moment about the mean is E[( X - m )2 ] = Â i =1
( x i - m )2
= 8.25.
10
The second moment about the mean is called the variance denoted as V(X). Observe that V(X) = E(X 2) - E2(X), that is, 8.25 = 38.5 - 5.52.
P369463-Ch002.qxd
9/2/05
102
11:01 AM
Page 102
Chapter 2 Random Variables, Moments, and Distributions
The command (moments-o n population) returns the nth moment about the origin from the population of numbers. (moments-mu n population) returns the nth moment about the mean. For example, (moments-o 1 (upto 10)) Æ 5.5 (moments-o 2 (upto 10)) Æ 38.5 (moments-mu 2 (upto 10)) Æ 8.25, the variance of the population. EXAMPLE 2.15
A slot machine has the number of symbol patterns on three randomly rotating dials as shown in Table 2.7. Table 2.7
Slot Machine Display
Symbol
Dial 1
Dial 2
Dial 3
Bar Bell Plum Orange Cherry Lemon
2 1 7 8 2 0
1 8 2 2 7 0
1 7 3 4 0 5
Create the density distribution of each of the 6 symbols and find the expected value of each. Solution a) Bar 2 1 1 on the dials P(Bar = 0) = 18/20 * 19/20 * 19/20 = 0.81225 P(Bar = 1) = (2 * 19 * 19 + 18 * 1 * 19 + 18 * 19 * 1)/8000 = 0.17575 P(Bar = 2) = (2 * 1 * 19 + 2 * 19 * 1 + 18 * 1 * 1)/8000 = 0.01175 P(Bar = 3) = (2 * 1 * 1)/8000 = 0.00025 Bar 2 1 1 Bar
0
1
2
3
P( Bar) 0.81225 0.17575 0.01175 0.00025 E(Bar) = 0.17575 + 0.0235 + 0.00075 = 0.2 per play. b) Bell 1 8 7 P ( Bell = 0) = 19 * 12 * 13/8000 = 2964/8000 = 0.3705 P ( Bell = 1) = (1 * 12 * 13 + 19 * 8 * 13 + 19 * 12 * 7)/8000 = 0.466 P ( Bell = 2) = (1 * 8 * 13 + 1 * 12 * 7 + 19 * 8 * 7)/8000 = 1252/8000 = 0.1565 P ( Bell = 3) = (1 * 8 * 7)/8000 = 56/8000 = 0.007 Bell 0 1 2 3 P( Bell)
0.3705 0.466 0.1565 E(Bell) = 0.8 per play.
0.007
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 103
2.3 Moments
103
c) Plum 7 2 3 P ( Plum = 0) = 13 * 18 * 17/8000 = 3978/8000 = 0.49725 P ( Plum = 1) = (7 * 18 * 17 + 13 * 2 * 17 + 13 * 18 * 3)/8000 = 3286/8000 = 0.41075 P ( Plum = 2) = (7 * 2 * 17 + 7 * 3 * 18 + 13 * 2 * 3)/8000 = 694/8000 = 0.08675 P ( Plum = 3) = 7 * 2 * 3/8000 = 42/8000 = 0.00525 Plum P( Plum )
0
1
2
3
0.49725
0.41075
0.08675
0.00525
P( Plum ) = 0.6 per play. d) Orange 8 2 4 P (Orange = 0) = 12 * 18 * 16/8000 = 3456/8000 = 0.432 P (Orange = 1) = (8 * 18 * 16 + 12 * 2 * 16 + 12 * 18 * 4)/8000 = 3552/8000 = 0.444 P (Orange = 2) = (8 * 2 * 16 + 12 * 2 * 4 + 8 * 18 * 4)/8000 = 928/8000 = 0.116 P (Orange = 3) = 8 * 2 * 4/8000 = 0.008 Orange
0
P(Orange)
1
0.432 0.444
2
3
0.116
0.008
E(Orange) = 0.444 + 0.232 + 0.024 = 0.7 per play. e) Cherry 2 7 0 P (Cherry = 0) = 18 * 13 * 20/8000 = 4680/8000 = 0.585 P (Cherry = 1) = (2 * 13 * 20 + 18 * 7 * 20)/8000 = 3040/8000 = 0.38 P (Cherry = 2) = 2 * 7 * 20/8000 = 280/8000 = 0.035 Cherry P(Cherry )
0
1
2
3
0.585
0.38
0.035
0
E(Cherry ) = 0.38 + 0.070 = 0.45 per play. f) Lemon 0 0 5 P ( Lemon = 0) = 20 * 20 * 15/8000 = 0.75 P ( Lemon = 1) = 20 * 20 * 5/8000 = 0.025 Lemon
0
P( Bell)
0.75
1 0.25
E( Lemon) = 0.25 per play.
P369463-Ch002.qxd
104
9/2/05
11:01 AM
Page 104
Chapter 2 Random Variables, Moments, and Distributions
(sim-slot-machine 10000) returned Theoretical
Empirical
2000 8000 6000 7000 4500 2500
2021 8034 5880 7063 4527 2475
Bar Bell Orange Plum Cherry Lemon
Information Content (Entropy) Conditional probability depends on the information content gained by knowing that one event called the conditional event occurred and how it affects the occurrence of another event. The higher the probability of the conditional event, the lower the information content. For example, in the experiment of tossing a fair die, let A be the event that the face with numeral 2 occurred, B the event that the outcome was less than 6, C the event that the outcome was even, and D the event that the outcome was an even prime. Then P(A | B) = 1/5 < P(A | C) = 1/3 < P(A | D) = 1. The information content of an event A is defined as I A = Log 2
1 P ( A)
= -Log 2 P ( A),
where IA is the number of bits of information necessary to conclude A. The information content of a certain event is 0. A bit or one binary digit is the unit used to measure uncertainty. For example, the probability of selecting the king of diamonds from a deck of 52 cards is 1/52 and the number of bits of information is about 5.7 bits/trial as 25 = 32 < 52 < 64 = 26 (log252 ª 5.7). To demonstrate, we proceed as follows: Is the card black? No. Is the card a heart? No. Using the ace value as 1, is the card 7 or higher? Yes. Is the card a face card? Yes. Is the card the jack or queen of diamonds? No. Now we are done using only 5 bits, but had the previous answer been yes, then we would have had to expend the 6th bit to distinguish between the last two cards. The method is the basis of binary search. Entropy is defined as the expected information content of RV I(X), expressed in bits by n
H = -Â pi log 2 pi , i =1
and define H = 0 when p = 0, since lim p Log 2 (1/ p) = 0. pÆ 0 +
For example, the entropy of RV X being the outcome from a fair coin flip is (-1/2 * -1) + (-1/2 * -1) = 1 bit and the entropy of a biased coin with P(Heads) = 3/4 is (-3/4 * -0.415) + (-1/4 * -2) ª 0.811 bit. Notice that maximum entropy occurs for a fair coin where P(Heads) = 1/2.
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 105
2.3 Moments
EXAMPLE 2.15
105
How much information is conveyed by a) b) c) d)
a full house in 5-card poker and a sum of 7 in rolling 2 fair dice? Compute the entropy in 2 fair coin flips coded for RV Y as HH HT TH TT. Compute the entropy when RV X is the number of heads in 2 fair coin flips.
Solution a) P(full house) = 3744 / 2,598,960 ª 0.00144058 fi 9.44 bits. (- (log 0.00144058 2)) = (log 2598960/3744 2) = 9.44. b) P(S7) = 1/6 fi I(S7) = 2.585 bits. (- (log 1/6 2)) Æ 2.585. c) P(HH) = P(HT) = P(TH) = P(TT) = 1/4. 4
n
H Y = -Â pi log 2 pi = -Â i =1
d) X P(X)
0 /4
1
i =1
1 1 /2
1 4
log 2
1
1 = - ( -2 - 2 - 2 - 2) = 2 bits. 4 4
2 1 /4
n
H = -Â pi log 2 pi fi H x = ( -1/4 * -2) + (1/2 * -1) + ( -1/4 * -2) = 1.5 bits. i =1
EXAMPLE 2.16
Use entropy considerations to determine which is the more likely target to be hit when one is using 2 shots with probability 1/2 of a hit at each firing or one is using three shots with probability of 1/3 of a hit at each firing. Solution Let RVs X and Y be the number of hits. Then for the first target X can be 0, 1, or 2 hits, as shown, and Y can be 0, 1, 2, or 3 for the second target. Both X and Y are binomial RVs (discussed in Chapter 3). Target 1 X P(X)
0 1/4
1 1/2
Target 2 2 1/4
Y P(Y)
0 8/27
1 12/27
2 6/27
3 1/27
HX = -1/4 * log 2 1/4 - 1/2 * log 2 1/2 - 1/4 * log 2 1/4 = (-0.25 * -2) + (-0.5 * -1) + (-0.25 * -2) = 1.5; since Log 21/4 = -2, Log 21/2 = -1. H Y = -8/27 * log 2 8/27 - 12/27 * log 2 12/27 - 6/27 * log 2 6/27 - 1/27 * log 2 1/27 = -8/27 * -1.7548 - 12/27 * -1.1699 - 6/27 * -2.1699 - 1/27 * -4.7548 = 1.6982. Firing 2 shots with probability 1/2 of each shot and hitting the target has the lower entropy and thus the higher probability.
P369463-Ch002.qxd
9/2/05
106
11:01 AM
Page 106
Chapter 2 Random Variables, Moments, and Distributions
The command (info p) returns the number of bits of information given probability p; e.g., (info 1/2) Æ 1. The command (entropy list-of-probabilities) returns the entropy. For example, (entropy '(1/4 1/2 1/4)) returns 1.5 and (entropy '(8/27 12/27 6/27 1/27)) returns 1.698.
EXAMPLE 2.17
There are n identical balls with one ball imperceptibly heavier or lighter than the other (n - 1) balls. With a balance scale determine the number of balls to place on each side of the scale to minimize the expected number of balls remaining to be weighed or to maximize the information content of the weighing. For n = 12 balls find the minimum number of weighings. Solution The experiment is to place 2x balls on the balance scale to determine if the scale balances. For the scale to balance we can choose any of the 2x balls from the (n - 1) good balls in n-1C2x ways. Thus Ê n - 1ˆ Ë 2x ¯
P ( balance) =
=
( n - 1)! Ê ˆ Ë (2x )!( n - 2x - 1)!¯ n! Ê ˆ Ë (2x )!( n - (2x )!) ¯
Ê nˆ Ë 2x ¯
=
n - 2x
= 1-
n
2x
.
n
2x ˆ 2x Ê P ( no balance) = 1 - P ( balance) = 1 - 1 = . Ë n¯ n As a figure of merit, let RV X be the number in the set containing the odd ball after the event balance or left side goes down (LSD) or right side goes down (RSD). We seek E(X). E( X ) = Â x * P ( X = x ) = ( n - 2x ) =
( n - 2x )
n ( n - 2 x )2 + 2 x 2
+x
x
+x
n
x
for 0 £ x £
n 2
n
.
n
Table 2.8 Event X P(X)
Density Function for Balancing n Balls Balance
LSD
RSD
n - 2x n - 2x
X x
X x
n
n
n
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 107
2.3 Moments
E ¢( x ) =
107
-4( n - 2x ) + 4 x
= 0 when n - 2x = x or when x =
8+4
=
12
n
.
3
n E ¢¢( x ) =
n
> 0 fi a relative minimum for n > 0.
n
n Ê nˆ n = we see that is an absolute minimum on the Ë 2¯ 2 3 n n interval 0 £ x £ . Thus place balls (round up) on each side of the balance 3 2 scale for the first weighing.
Since E(0) = n and E
Tree Solution Let G, L, and H denote Good, Light, and Heavy, respectively. There are 3 events for each weighing: the left side goes down, the scale balances, or the right side goes down (see Figure 2.16). Notice there are 24 leafs, 12 balls, light or heavy (12 * 2 = 24). Alternative solution using entropy Entropy is maximized when the probabilities are equal. If the scale balances, the odd ball resides in the n - 2x remaining n - 2x balls with probability . For a side to go down, n x the odd ball is one of the x balls with probability . Equating the pron n - 2x x n babilities yields = => x = . n n 3
1234|5678
9 10 | 11 G
367|48G
3| G
3H
8L
Figure 2.16
12
6 |7
1H 5L 2H 6L 4H 7L
9 |10
9H
12 | G
3 6 7 |4 8 G
9 | 10
6|7
12
11L 10H 12H 12L 9L 11H 10L 6H 4L 7H 2L 5H 1L
Tree Diagram for Balancing Balls
G|8
3L 8H
P369463-Ch002.qxd
108
9/2/05
11:01 AM
Page 108
Chapter 2 Random Variables, Moments, and Distributions
Higher Moments The rth moment of a continuous RV X around the origin is defined as E( X r ) =
Ú
•
-• n
x r f ( x )dx
(continuous RVs)
= Â x ir P ( X i = x i )
(2–3)
(discrete RVs).
i =1
From the definition, the 0th moment is then equal to 1, and the first moment around the origin is the expected value of the random variable X, E(X ) = m. Moments are often defined about their mean value as well as the origin. The second moment about the mean is defined as the variance of X and is denoted by the symbol 2
V ( x ) = E[( X - m )] =
Ú
•
-• n
( x - m )2 f ( x )dx
(continuous RVs)
(2–4)
= Â ( X i - m )2 P ( X i = x i ) (discrete RVs). i =1
The variance indicates the spread or dispersion of the RV around its mean. Since the expected value operator is linear and ( X - m )2 = X 2 - 2mX + m 2 , V ( X ) = E[( X - m )2 ] = E( X 2 - 2mX + m 2 ) 2
(2–5)
2
= E( X ) - 2mE( X ) + E( m ) = E( X 2 ) - 2m 2 + m 2 = E( X 2 ) - m 2 = E( X 2 ) - E 2 ( X ). In words the variance of a random variable (second moment about the mean) is the difference between the second moment about the origin and the square of the first moment about the origin. Formula (2-5) is frequently useful in computing the variance. The nth moment around the mean m is given by E[( X - m )n ] =
Ú
•
-•
( x - m )n f ( x )dx
from which n n n E[( X - m )n ] = E ÈÍÊË ˆ¯ x n - ÊË ˆ¯ x n -1m + . . . + ( -1)n ÊË ˆ¯ x n - n m n ˘˙, Î 0 ˚ 1 n relating the moments about m to the moments around the origin. For example, by using the binomial expansion we have E[( X - m )4 ] = E( X 4 ) - 4mE( X 3 ) + 6m 2 E( X 2 ) - 4m 3 E( X ) + m 4 .
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 109
2.3 Moments
109
Let E[(x - m)n] designate the nth central moment about the mean and mn designate the nth moment about the origin. Then central moments are related to moments about the origin by the relation n n n E[( X - m )n ] = ÊË ˆ¯ E( X n ) - ÊË ˆ¯ mE( X n -1 ) + . . . + ( -1)r ÊË ˆ¯ m r E( X n - r ) 0 1 r n + . . . + ( -1)n E( X 0 )ÊË ˆ¯ m n . n For n = 2, E[(x - m)2] = E(X 2) - 2mE(X) + m2. Using the population {1 2 3 4 5 6 7 8 9 10}, we have 8.25 = 38.5 - 2 * 5.52 + 1 * 5.52. For n = 3, E[(x - m)3] = E(X 3) - 3mE(X 2) + 3m2E(X) - m3. 0 = 302.5 - 3 * 5.5 * 38.5 + 3 * 5.52 * 5.5 - 5.53.
(moments-mu (moments-mu (moments-mu (moments-o 3 (moments-o 2 (moments-o 1 (moments-o 0
EXAMPLE 2.18
3 (upto 10)) 2 (upto 10)) 1 (upto 10)) (upto 10)) (upto 10)) (upto 10)) (upto 10))
Æ 0 Æ 8.25 Æ 0 Æ 302.5 Æ 38.5 Æ 5.5 Æ 1
Compute the variance of RV X being the number of heads in a fair coin flip. Solution
X P(X)
0 1 /2 1/2
1
E( X ) = 0 * 1/2 + 1 * 1/2 = 1/2 . V ( X ) = E[( X - m )2 ] = (0 - 1/2)2 * P ( X = 0) + (1 - 1/2)2 * P ( X = 1) = 1/4 * = 1/4
1/2
+ 1/4
*
1/2
or V ( X ) = E( X 2 ) - E 2 ( X ) = [0 2 * 1/2 + 12 * 1/2] - (1/2)2 = 1/2 - 1/4 = 1/4. The variance is denoted by the symbol s 2. The square root of the variance is called the standard deviation and is denoted by the symbol s. The units for m and s are the same.
P369463-Ch002.qxd
9/2/05
110
EXAMPLE 2.19
11:01 AM
Page 110
Chapter 2 Random Variables, Moments, and Distributions
Compute V(X) or s 2 for fX(x) = 2x on [0, 1]. E( X ) =
1
Ú
0
x(2x )dx =
E( X 2 ) =
1
2x 3 1 2 = ; 3 0 3
x 2 (2x )dx =
Ú
0
1
;
2 2
1 Ê 2ˆ 1 V ( X ) = E( X ) - E ( X ) = = . Ë ¯ 2 3 18 2
2
The expected value operator can also be used to find the expected value of a function of a random variable. That is, E[ g( X )] =
EXAMPLE 2.20
Ú
•
-•
g( x ) f ( x )dx.
Let density function fX (x) = 2x for 0 £ x £ 1 and let RV Y = X2. Find E(Y) and the density function fY (y). Solution FY (y) = P(Y £ y) = P(X2 £ y) = P(X £ y ) = Ú0 y 2xdx = y. That is, FY (y) = y. F¢(y) = fY (y) = 1, for 0 £ y £ 1, and E(Y ) = 1/2. However, we can find E(Y ) directly since E(Y ) = E(X 2) = Ú10x2(2xdx) = 2x 4 1 1 = . 4 0 2
EXAMPLE 2.21
Compute the variance of RV X, the outcome from rolling a fair die, and also of RV Y, the outcome sum from rolling a fair pair of dice. Solution 1-die: V ( X ) = E( X 2 ) - E 2 ( X ) = (12 + 22 + 32 + 42 + 52 + 62 )/6 - (21/6)2 = 91/6 - 441/36 = 105/36 = 35/12. 2-dice: V(Y ) = V(Y1 + Y2) = V(Y1) + V(Y2) since Y1 and Y2 are independent RVs = 35/12 + 35/12 = 35/6.
EXAMPLE 2.22
Compute V(aX + b) by letting RV Z = aX + b with V(Z) = E[(Z - mz)2].
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 111
2.3 Moments
111
[ Z
-
mz
2
]
2
2
V ( aX + b) = E[( aX + b) - E( aX + b)] = E[aX + b - am - b] = a 2 E[( X - m )2 ] = a 2 V ( X ). The variance operator is not linear since V ( X + X ) = V (2 X ) = 4V ( X ) π V ( X ) + V ( X ) = 2V ( X ).
Observe that V(X + X) π V(X + Y) where, for example, RVs X and Y are the outcomes of throwing a fair die. EXAMPLE 2.23
a) Given RV X with E(X) = m, V(X) = s 2. Find E(X + c) and V(X + c) for constant c. b) Find E(cX) and V(cX). c) Compute m and s 2 of population X = {1 3 5 7 9} in feet. d) Recompute m and s 2 of the population in inches. Solution a) E(X + c) = E(X) + c = m + c and V(X + c) = V(X). b) E(cX) = cE(X) = cm. V(cX) = c2V(X) = c2s 2. 5
Âx c) m = E( X ) =
i =1
i
= (1 + 3 + 5 + 7 + 9)/5 = 5 feet;
5 5
Â(x s 2 = V( X ) =
i
i =1
- 5)2 = (16 + 4 + 0 + 4 + 16)/5 = 8 ft 2 .
5 d) In inches Population Y = {12 36 60 84 108} or Y = 12X. E(Y ) = E(12 X ) = 12 E( X ) = 12 * 5 = 60 inches V (Y ) = V (12 X ) = 122 V ( X ) = 144 * 8 = 1152 inches = ( -48)2 + ( -24)2 + 02 + 242 + 482 )/5 = 5760/5 = 1152.
The variance of constant a is zero, V(a) = 0. The variance of RV X plus constant a is variance of X, V(X + a) = V(X). The variance of a constant a times RV X is a2 times variance of X, V(aX) = a2V(X). V(-X) = V(X). EXAMPLE 2.24
For independent random variables X and Y, mx = 5, s x2 = 9, my = 4, s Y2 = 16. Find the mean and variance of RV 2X - 3Y.
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 112
Chapter 2 Random Variables, Moments, and Distributions
112
Solution E(2 X - 3Y ) = 2 E( X ) - 3 E(Y ) = 2(5) - 3(4) = -2. V (2 X - 3Y ) = 4V ( X ) + 9V (Y ) = 4(9) + 9(16) = 180.
2.4
Standardized Random Variables If we subtract the mean mX from a random variable X and divide the result by the standard deviation sX, the resulting random variable is said to be standardized, that is, Z=
X - mX sX
.
The expected value of any standardized RV is 0 and the variance is 1, that is, E(XS) = 0 and V(XS) = 1. E( Z ) =
E( X - m x ) s
=
mx - mx s
= 0.
and V( Z) = V
2 Ê X - m X ˆ V( X - mx ) V( X ) s x = = = = 1. Ë s ¯ s 2x s 2x s 2x X
Standard scores reflect the number of standard deviations above or below the mean m. Standardized random variables play a prominent role in developing suitable hypothesis tests in statistics. EXAMPLE 2.25
The mean of 40 test scores is 80 with a standard deviation s of 2. a) Find the standard score for a raw score of 90. b) Convert the raw score of 90 to a standard score with m = 75 and s = 1. Solution a) Z =
X - mX sX
=
90 - 80
= 5 = z.
2
The raw score of 90 is converted to standard score 100, 5 standard deviations (5 * 2 = 10) above the mean score of 80. b) The raw score 90 is converted to a z-score of (90 - 75)/1 = 15 fi 75 + 15 = 90 standard score since the standard deviation is 1.
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 113
2.4 Standardized Random Variables
EXAMPLE 2.26
113
(Chebyshev’s inequality). For any RV X with mean m and standard deviation s, show that P(|X - m| < ks) > 1 - 1/k2 for k > 0. Assume X is continuous. Solution s 2 = E[( X - m )2 ] = =
Ú
≥
Ú
m - ks
•
-•
( x - m )2 f ( x )dx
( x - m )2 f ( x )dx + Ú
m + ks
m - ks
-• m - ks
-•
Ú
k 2s 2 f ( x )dx +
Ú
•
m + ks
( x - m )2 f ( x )dx +
Ú
•
m + ks
( x - m )2 f ( x )dx
k 2s 2 f ( x )dx
since in both cases x £ m - ks fi x - m < - ks fi ( x - m )2 ≥ k 2s 2 and x ≥ m + ks fi x - m ≥ ks fi ( x - m )2 ≥ k 2s 2 . Hence, s 2 ≥ k 2s 2 [ P ( X £ m - ks ) + P ( X ≥ m + ks )] or P ( X - m ≥ ks ) £
1 k2
,
implying P ( X - m < ks ) > 1 -
1 k2
.
For k = 2, the probability that RV X lies within 2 standard deviations of its mean is at least 3/4, where X is any random variable. Chebyshev’s inequality provides conservative bounds on the distribution of any random variable. The Law of Large Numbers is a consequence of Chebyshev’s inequality. The law states that the proportion of a specific outcome becomes closer and closer to the underlying probability of that outcome as the number of observations increases. That is, P
2 Ê X1 + X 2 + . . . X n ˆ s -m >e £ Æ 0 as n Æ • with E(Xi) = m. Ë ¯ ne 2 n
The Law of Large Numbers accounts for the oxymoronic frequent occurrence of rare events as the number of trials n increases. EXAMPLE 2.27
a) Let RV X indicate the number of heads in the 3 flips of a fair coin. Compare the actual probability with the bounds given by the Chebyshev’s inequality for k = 1, k = 2, and k = 3. b) Estimate P(4 < X < 20) for an unknown distribution with m = 12 and s 2 = 16.
P369463-Ch002.qxd
114
9/2/05
11:01 AM
Page 114
Chapter 2 Random Variables, Moments, and Distributions
Solution a) E(X) = 1.5; V(X) = 3/4 with s = 0.866. Chebyshev’s Bound on Probability Actual Probability k = 1: P [ X - 1.5 < 1(0.866)] > 0 vs. P (0.63 £ X £ 2.4) => P (1 £ X £ 2) = 3/4. k = 2: P [ X - 1.5 < 2(0.866)] > 3/4 vs. P ( -0.23 £ X £ 3.2) => P (0 £ X £ 3) = 1. k = 3: P [ X - 1.5 < 3(0.866)] > 8/9 vs. P ( -1.1 £ X £ 4.1) => P (0 £ X £ 3) = 1. 1
b) P (4 < X < 20) = P ( -8 < X - 12 < 8) = P ( X - 12 < 8 > 1 -
2
=
2
2.5
3 4
.
Jointly Distributed Random Variables We are often interested in how two or more random variables are related. For example, we may be interested in height versus weight or success in college with high school grade point average and scholastic achievement tests. Jointly distributed RVs pervade correlation analysis and multiple regression. We define the continuous joint cumulative distribution function of two RVs X and Y (bivariate distribution) as FXY ( x, y ) = P ( X £ x, Y £ y ) =
x
Ú Ú
y
-• -•
f ( x, y )dydx.
The double integral is simply singular integrals in turn and f(x, y) is the joint density of the RVs. The individual density functions are called marginal densities. We can find f(x) by integrating the joint density over all the y values since FX ( x ) = P ( X £ x ) = P ( X £ x, Y £ •) = FXY ( x, •). Similarly, FY ( x ) = P (Y £ y ) = P ( X £ •, Y £ y ) = FXY ( •, y ). Thus we have f x ( x) =
Ú
•
-•
f ( x, y )dy and f y ( y ) =
Ú
•
-•
f ( x, y )dx.
(2–6)
Whereas for a singular RV, the probability of its whereabouts is the area under the curve of the density function, for 2 RVs, the volume under the surface of the joint density function represents the joint probability. For joint density fXY, i) f(x, y) ≥ 0 for all (x, y) • • ii) Ú-• Ú-• f(x, y)dxdy = 1 y x iii) P(X < x, Y < y) = Ú-• Ú-• f(x, y)dxdy.
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 115
2.5 Jointly Distributed Random Variables
EXAMPLE 2.28
115
Let joint density f(x, y) = 4xy, for 0 £ x £ 1; and 0 £ y £ 1. a) b) c) d)
Verify that the integral of the joint density is 1. Find marginal densities f(x) and f(y). Find E(X ) from the joint density and also from the marginal density. Find E(XY ).
Solution 1 4 xy 2 1 dx = Ú 2xdx = 1. 0 0 0 0 2 0 b) fX(x) = Ú104xydy = 2x for 0 £ x £ 1, and by symmetry fY(y) = 2y for 0 £ y £ 1.
a)
1 1
1
Ú Ú 4xydydx = Ú
Note that fXY (x, y) = fX * fY, implying that X and Y are independent RVs. c) E( X ) =
1 1
Ú Ú 4x 0 0
2
ydydx =
1
Ú (2 x 0
2
1 y 2 ) dx = 0
1
Ú 2x 0
2
dx = 2/3.
Observe that after integrating with respect to y, the resulting integral is the expression for the expected value of the marginal density of X, E(X) = Ú102x2dx. 1 1
ÚÚ
d) E( XY ) =
EXAMPLE 2.29
0 0
xy * 4 xydxdy =
3 Ê 4x 2 ˆ 1 y dy = 0 ¯0 3 1
ÚË
1
4y 2
0
3
Ú
dy = 4/9.
Given joint density f(x, y) = 1/y, 0 < x < y < 1, compute a) fX(x) and fY (y), d) P(X £ 1/4, Y £ 1/2),
b) P(X £ 1/2), e) P(X > 1/4, Y > 1/2),
c) P(Y £ 1/4), f) P(X ≥ 1/2 or Y ≥ 1/2).
Solution a) f X ( x ) =
1
Ú
dy
x
= - Ln x, 0 < x < 1;
y
b) P ( X £ 1/2) =
1/ 2
Ú
0
fY ( y ) =
y
dx
0
y
Ú
= 1, 0 < y < 1.
1/ 2
- Ln( x )dx = - xLn( x ) - x
0
= -(0.5 Ln 0.5 - 0.5)
= 0.84657. c) P (Y £ 1/4) =
1/ 4
Ú
0
1dy = 1/4.
d) Method I: one integration using dy then dx P ( X £ 1/4, Y £ 1/2) = 1/4 1/4 1/2 dydx 1/4 Ú0 Úx y = Ú0 ( Ln 0.5 - Ln x )dx = xLn 0.5 - ( x Lnx - x ) 0 = 0.423.
P369463-Ch002.qxd
116
9/2/05
11:01 AM
Page 116
Chapter 2 Random Variables, Moments, and Distributions
1/2
y y=x
x
1/4
One Integration Using dydx
Figure 2.17
Method II: Two integrations using dx then dy 1/4 y dxdy 1/2 1/4 dxdy 1/2 dy P ( X £ 1/4, Y £ 1/2) = Ú Ú +Ú Ú = 0.25 + Ú 0 x 1/4 0 1/4 y y 4y = 0.025 + 0.25( Ln 0.5 - Ln 0.25) = 0.4233. e) Method I: One integration using dx then dy 1
Ú Ú
P ( X > 1/4, Y > 1/2) =
y
dxdy
1/2 1/4
=
1
Ú
1/2
1ˆ Ê 1dy Ë 4y ¯
y = 0.5 + 0.25 Ln 0.5 = 0.3267.
y 1/2
x
1/4
One Integration Using dxdy
Figure 2.18
Method II: Two integrations using dy then dx P ( X > 1/4, Y > 1/2) = =
1/2 1
Ú Ú
1/4 1/2 1/2
Ú
1/4
dydx y
1
+Ú
1
Ú
dydx
1/2 x
y
1
- Ln 0.5 dx + Ú - Ln x dx 1/2
= 0.1733 + 1 + 0.5 Ln 0.5 - 0.5 = 0.3267. f) P ( X ≥ 1/2 or Y ≥ 1/2) = P ( X ≥ 1/2) + P (Y ≥ 1/2) - P ( X ≥ 1/2, Y ≥ 1/2) y dxdy 1 1 Ê 1ˆ = (1 - 0.84657) + 0.5 - Ú Ú = Ú 1dy 1/2 1/2 1/2 Ë y 2y ¯ = 0.153 + 0.5 - (0.5 - 0.347) = 0.5.
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 117
2.5 Jointly Distributed Random Variables
Table 2.9
Y
117
Discrete Joint Density Function
1 2 3 4
1
X 2
3
0 2/8 0 1/8
0 0 2/8 0
1/8 2/8 0 0
f (x, y)
0
1
2
3
4
y
1 2 3 x
Figure 2.19
Graph of Discrete Joint Density Function
In computing probabilities for joint densities, economy of effort can be achieved at times by wisely choosing the order of integration, which depends on the region.
Discrete Joint Density Functions We define the discrete joint distribution function of two RVs X and Y as f XY ( x, y ) = P ( X = x, Y = y ). Discrete joint density functions are often shown in tables. The graph of the probability mass or density function f(x, y) = P(X = x, Y = y) for Table 2.9 is shown in Figure 2.19. Notice that the sum of the probabilities in the table is 1. EXAMPLE 2.30
Let RV X be the number of heads in 3 fair coin flips and RV Y the number of runs in the flip outcomes. Verify the joint density f(x, y) for RVs X and Y shown in Table 2.10, and compute a) P(X ≥ 2, Y ≥ 2); b) the marginal densities of each; c) E(X) and E(Y ); d) V(X) and V(Y ); e) E(XY ). f ) Are X and Y independent? HHH is a run of H whereas HTH has 3 runs, H, T, H and THH has 2 runs. HHH Æ 1, HHT Æ 2, HTH Æ 3, HTT Æ 2, THH Æ 2, THT Æ 3, TTH Æ 2, TTT Æ 1.
P369463-Ch002.qxd
118
9/2/05
11:01 AM
Page 118
Chapter 2 Random Variables, Moments, and Distributions Number of Runs Density 1 2/8
Y P(Y )
2 4/8
Table 2.10
Number of Heads Density 3 2/8
0 1/8
X P(X)
1 3/8
2 3/8
3 1/8
Discrete Joint Density f(x, y) X
1 Y 2 3 f(x)
0
1
2
3
f(y)
1/8 0 0 1/8
0 2/8 1/8 3/8
0 2/8 1/8 3/8
1/8 0 0 1/8
2/8 4/8 2/8 1
Solution a) P(X ≥ 2, Y ≥ 2) = f(2, 2) + f(2, 3) + f(3, 2) + (3, 3) = 2/8 + 1/8 + 0 + 0 = 3/8. b) Notice that the marginal density functions are in the margins of Table 2.10. x f(x)
0 1/8
1 3/8
2 3/8
3 1/8
y
f (y)
1 2 3
2/8 4/8 2/8
c) E(X) = 0 * 1/8 + 1 * 3/8 + 2 * 3/8 + 3 * 1/8 = 12/8 = 3/2. E(Y ) = 1 * 2/8 + 2 * 4/8 + 3 * 2/8 = 2. d) E(X 2) = (1/8)(0 + 3 + 12 + 9) = 24/8 = 3; E(Y 2) = (1/8)(2 + 16 + 18) = 9/2; V(X) = E(X 2) - E2(X) fi V(X) = 3 - 9/4 = 3/4; V(Y ) = 9/2 - 4 = 1/2. e) E(XY) = (0 * 1 * 1/8) + (1 * 1 * 0) + (2 * 1 * 0) + (3 * 1 * 1/8) + (0 * 2 * 0) + (1 * 2 * 2/8) + (2 * 2 * 2/8) + (3 * 2 * 0) + (0 * 3 * 0) + (1 * 3 * 1/8) + (2 * 3 * 1/8) + (3 * 3 * 0) = 3. f) Observe that E(XY ) = 3 = E(X) * E(Y ) = 3/2 * 2. The fact that E(XY ) = E(X) * E(Y ) does not necessarily imply that the RVs X and Y are independent. However, if X and Y are independent, then P(X = x, Y = y) = P(X = x) * P(Y = y) and E(XY ) = E(X)E(Y ). Similarly, continuous RVs X and Y are independent if f(x, y) = f(x) * f(y). From the table, however, P ( X = 1, Y = 1) = 0 π P ( X = 1) * P (Y = 1) =
3 2 3 * = . 8 8 32
RVs X and Y are dependent. If you were told that there was only one run, you would know that the outcome is either HHH or TTT; the outcome value of RV Y affects the probability outcome of RV X.
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 119
2.5 Jointly Distributed Random Variables
Table 2.11
119
Discrete Joint Density f(x, y) X
0 1 Y 2 3 fx
EXAMPLE 2.31
0
1
2
3
fy
1 12 18 4 35
15 60 30 0 105
30 40 0 0 70
10 0 0 0 10
56 112 48 4 220
In an urn are 5 red, 4 white, and 3 blue marbles. Three marbles are randomly selected. Let RV X be the number of red marbles and RV Y the number of white marbles. a) Create the joint density function f(x, y). b) Find the marginal densities and compute E(X), E(Y ) and E(XY ). c) Find P(X + Y £ 1). Solution a) There are 12C3 = 220 ways to select 3 of the 12 marbles. Each entry in Table 2.11 then is to be divided by 220. Ê 5ˆ Ê 4ˆ Ë 2¯ Ë 1¯ 40 For example, P ( X = 2, Y = 1) = = . 220 Ê12ˆ Ë 3¯ b) The marginal densities are shown in the margins. E(X) = [0*35 + 1*105 + 2*70 + 3*10]/220 = 275/220; E(Y ) = [0*56 + 1*112 + 2*48 + 3*4]/220 = 1; E(XY ) = x * y * P(X = x, Y = y) = [1*1*60 + 2*1*40 + 1*2*30]/220 = 200/220. c) P(X + Y £ 1) = f(0, 0) + f(0, 1) + f(1, 0) = (1 + 12 + 15)/220 = 28/220.
EXAMPLE 2.32
In an urn are 2 red, 1 white, and 1 blue marble. A sample of 2 is selected. Create the joint density function f(x, y, z) for RVs X, Y, and Z being the number of corresponding colored marbles selected. Find the marginal densities of X, Y, and Z. Compute a) P(X £ 1), b) P(X = 1), c) P(X = 1 | Y = 1), d) P(X = 1 OR Z = 1), e) P(X = 1 | Y = 1, Z = 0), g) E(X | Y = 1) Solution
X Y Z P(X,Y,Z)
2 0 0 1/6
0 1 1 1/6
1 1 0 2/6
1 0 1 2/6
f) E(XYZ),
P369463-Ch002.qxd
9/2/05
11:01 AM
Page 120
Chapter 2 Random Variables, Moments, and Distributions
120
The marginal densities of X, Y and Z are X f ( x)
0 1/6
1 2 4/6 1/6
Y f ( y)
0 1 /2
1 1 /2
0 1 /2
Z f ( z)
1 /2
1
a) P(X £ 1) = P(X = 0) + P(X = 1) = 1/6 + 4/6 = 5/6 from the marginal density function of X. b) P(X = 1) = 4/6 from the marginal density of X. c) P(X = 1 | Y = 1) = P(X = 1, Y = 1)/P(Y = 1) = (2/6) / (3/6) = 2/3. d) P(X = 1 OR Z = 1) = P(X = 1) + P(Z = 1) - P(X = 1 AND Z = 1) = 4/6 + 3/6 - 2/6 = 5/6. e) P(X = 1 | Y = 1, Z = 0) = P(X = 1, Y = 1, Z = 0)/P(Y = 1, Z = 0) = (2/6)/(2/6) = 1. f) E(XYZ) = 2*0*0*1/6 + 0*1*1*1/6 + 1*1*0*1/6 + 1*0*1*2/6 = 0. g) E(X | Y = 1) = 2/3.
0 1/3
X f(X|Y = 1)
2.6
1 2/3
2 0
Two RVs X and Y are said to be independent if f(x, y) = fX(x) * fY(y) for all (x, y) in the domain. If there are more than 2 RVs, the test for independence is similar to the test for independence among 3 or more events: they must be independent pair-wise, 3-wise, etc. If the RVs are independent, the joint density function is the product of the marginal density functions. For example, fXY(x, y) = 4xy = 2x * 2y = fX * fY. Then

E(XY) = ∫∫ xy f(x, y) dy dx = ∫ x fX(x) dx * ∫ y fY(y) dy = E(X) * E(Y),   (2-7)

where each integral runs from −∞ to ∞. For independent RVs X and Y, the mean of their product is the product of their means. Two discrete RVs X and Y are said to be independent if

P(X = x, Y = y) = P(X = x) * P(Y = y)

for all x- and y-values.

EXAMPLE 2.33
Determine if RVs X and Y are independent given joint density function f(x, y) = (2/3)(x + 2y); 0 ≤ x ≤ 1; 0 ≤ y ≤ 1.
Solution
fX(x) = ∫0^1 (2/3)(x + 2y) dy = (2/3)(x + 1) on [0, 1].
fY(y) = ∫0^1 (2/3)(x + 2y) dx = (2/3)(2y + 1/2) on [0, 1].
As f(x, y) ≠ fX(x) * fY(y), X and Y are dependent.
Whenever we sample randomly from a distribution, each member of our sample is a random variable. Because the sample is randomly selected, the joint density function of the sample is just the product of each sample member's density function. This fact underlies the importance of a random sample.
2.7  Covariance and Correlation
If two random variables are independent, information about the value of one does not provide any help in determining the value of the other. But if the RVs are dependent, then information about the value of one helps determine the value of the other. We measure this help with correlation. To define correlation, we first need the covariance between RVs. The covariance of two random variables is the expected value of the product of each variable's deviation from its mean, and it is a measure of the linear correspondence between the random variables. The covariance of RVs X and Y is denoted by C(X, Y), Cov(X, Y), or σXY and is defined as

C(X, Y) = E[(X − μX)(Y − μY)].   (2-8)

Since E[(X − μX)(Y − μY)] = E(XY − YμX − XμY + μXμY) = E(XY) − μYμX − μXμY + μXμY,

C(X, Y) = E(XY) − E(X)E(Y).   (2-9)

Note that C(X, Y) = C(Y, X), C(aX, bY) = ab * C(X, Y), and C(X, X) = V(X). Recall that if X and Y are independent, E(XY) = E(X)E(Y), and consequently C(X, Y) = 0. However, it is not necessarily true that if C(X, Y) = 0, then X and Y are independent.
EXAMPLE 2.34
Let discrete joint density f(x, y) for RVs X and Y be given as shown below, where RV Y = X². Show that C(X, Y) = 0 even though X and Y are not independent.
Solution

               X
         −1    0    1    fY(y)
Y   0     0   1/3   0     1/3
    1    1/3   0   1/3    2/3
 fX(x)   1/3  1/3  1/3     1

E(X) = −1 * 1/3 + 0 * 1/3 + 1 * 1/3 = 0;  E(Y) = 0 * 1/3 + 1 * 2/3 = 2/3 = E(X²);
E(XY) = E(X³) = −1 * 1/3 + 0 * 1/3 + 1 * 1/3 = 0;
C(X, Y) = E(XY) − E(X)E(Y) = 0 − 0 * 2/3 = 0.
So although the covariance of X and Y is zero, RV Y depends on X, i.e., Y = X². P(Y = 1 | X = 0) = 0 ≠ P(Y = 1) = 2/3, implying X and Y are dependent.
Whereas V(X) ≥ 0, C(X, Y) can be positive, negative, or zero. With a positive covariance, if X is large, Y tends to be large; with a negative covariance, if X is large, Y tends to be small. The correlation coefficient ρ(X, Y) is defined as the ratio of the covariance to the square root of the product of the variances:

ρ(X, Y) = C(X, Y) / √(V(X)V(Y)).   (2-10)

We show that −1 ≤ ρ ≤ 1, i.e., −1 ≤ C(X, Y)/√(V(X)V(Y)) ≤ 1:

V(X/σX + Y/σY) = V(X)/σX² + V(Y)/σY² + 2C(X, Y)/(σXσY) = 1 + 1 + 2ρ(X, Y) ≥ 0 ⇒ ρ(X, Y) ≥ −1.

Similarly,

V(X/σX − Y/σY) = V(X)/σX² + V(Y)/σY² − 2ρ(X, Y) = 1 + 1 − 2ρ(X, Y) ≥ 0 ⇒ ρ(X, Y) ≤ 1.

Hence,

−1 ≤ ρ(X, Y) ≤ 1.   (2-11)
The correlation coefficient measures the linear relationship between RVs X and Y. Notice that the denominator cannot be negative, implying that ρ and C(X, Y) must have the same sign. If ρ = 0, there is neither a positive nor a negative linear relationship: the two RVs are not linearly correlated and C(X, Y) = 0. If ρ is positive, then Y tends to increase as X increases; if ρ is negative, then Y tends to decrease as X increases. Specifically, outcomes with X > μX and Y > μY, or with X < μX and Y < μY, contribute positively to ρ; outcomes with X < μX and Y > μY, or with X > μX and Y < μY, contribute negatively to ρ.

EXAMPLE 2.35
Let density function f(x) for RV X be given as shown below. Let RV Y = 2X. Show that X and Y are perfectly correlated, i.e., ρ = 1.
Solution

X      −1    0    1        Y      −2    0    2
f(x)   1/3  1/3  1/3       f(y)   1/3  1/3  1/3

E(X) = (−1 + 0 + 1)/3 = 0;  E(Y) = E(2X) = 2E(X) = 0.
E(XY) = E(2X²) = 2[(1/3)(1 + 0 + 1)] = 4/3;  C(X, Y) = E(XY) − E(X)E(Y) = 4/3.
V(X) = E(X²) − 0 = (1 + 0 + 1)/3 = 2/3;  V(Y) = V(2X) = 4V(X) = 8/3.

Hence ρ(X, Y) = C(X, Y)/√(V(X)V(Y)) = (4/3)/√((2/3)(8/3)) = 1.
Note that X and Y are linearly related. This result holds whenever the slope is positive: here the slope 2 of Y = 2X is positive, indicating a perfect positive correlation. If the slope were negative, then ρ = −1, a perfect negative correlation.

EXAMPLE 2.36
Calculate ρ(X, Y) from the joint density table for RVs X and Y.

             Y
          3    5    7    f(x)
    1    1/4   0    0    1/4
X   2     0   1/2   0    1/2
    3     0    0   1/4   1/4
  f(y)   1/4  1/2  1/4    1
Whenever there is exactly one probability greater than zero in each row, Y is a function of X. Notice that for X = 1, it is a certainty that Y = 3. Similarly, X = 2 implies Y = 5, and X = 3 implies Y = 7. In this case Y = 2X + 1 and ρ = 1.

EXAMPLE 2.37
Let RV X have density f(x) = 2x for 0 ≤ x ≤ 1 and let RV Y = aX + b. Calculate the expression for ρ(X, Y).

E(X) = ∫0^1 2x² dx = 2/3;  E(X²) = ∫0^1 2x³ dx = 1/2;  V(X) = 1/2 − 4/9 = 1/18.
E(Y) = aE(X) + b = 2a/3 + b = (2a + 3b)/3;  V(Y) = a²V(X) = a²/18.
E(XY) = E[X(aX + b)] = E(aX² + bX) = a/2 + 2b/3 = (3a + 4b)/6.
C(X, Y) = E(XY) − E(X)E(Y) = (3a + 4b)/6 − (2/3) * (2a + 3b)/3 = a/18.

ρ(X, Y) = σXY/(σXσY) = (a/18)/√((1/18)(a²/18)) = a/|a|.

Note that if a > 0, ρ = 1; if a < 0, ρ = −1; and if a = 0, ρ = 0.
We are often interested in the sum and difference of random variables, in particular in E(X + Y) and V(X + Y). Since E is linear, the expected value of the sum is the sum of the expected values. We can compute V(X + Y) by using the shortened form V(X) = E(X²) − E²(X). Thus

V(X + Y) = E[(X + Y)²] − E²(X + Y)
         = E[(X + Y)²] − [E(X) + E(Y)][E(X) + E(Y)]
         = E(X² + 2XY + Y²) − E²(X) − 2E(X)E(Y) − E²(Y)
         = E(X²) − E²(X) + E(Y²) − E²(Y) + 2[E(XY) − E(X)E(Y)]
         = V(X) + V(Y) + 2C(X, Y).   (2-12)

V(X − Y) = V(X) + V(Y) − 2C(X, Y),   (2-13)

and in general

V(aX + bY) = a²V(X) + b²V(Y) + 2ab * C(X, Y).   (2-14)

Note that C(X, X) = E(X²) − E(X)E(X) = V(X) and that V(X + X) = V(X) + V(X) + 2C(X, X) = 4V(X) = V(2X).
For independent random variables Xi,

V(Σ Xi) = Σ V(Xi).

EXAMPLE 2.38
Given joint density f(x, y) = (x + y)/3 for 0 ≤ x ≤ 1, 0 ≤ y ≤ 2, compute V(3X − 2Y − 5).
Solution
fX(x) = ∫0^2 (x + y)/3 dy = (2x + 2)/3 = 2(x + 1)/3 for 0 ≤ x ≤ 1;
fY(y) = ∫0^1 (x + y)/3 dx = (2y + 1)/6 for 0 ≤ y ≤ 2;
E(X) = (2/3)∫0^1 (x² + x) dx = 5/9;  E(X²) = (2/3)∫0^1 (x³ + x²) dx = 7/18;
V(X) = 7/18 − (5/9)² = 13/162.
Similarly, E(Y) = 11/9;  E(Y²) = 16/9;  V(Y) = 16/9 − (11/9)² = 23/81;
E(XY) = (1/3)∫0^1 ∫0^2 (x²y + xy²) dy dx = (1/3)∫0^1 (2x² + 8x/3) dx = 2/3;
C(X, Y) = E(XY) − E(X)E(Y) = 2/3 − (5/9)(11/9) = −1/81.

Using Equation (2-14), we have

V(3X − 2Y − 5) = 9V(X) + 4V(Y) − 12C(X, Y)
             = 9 * 13/162 + 4 * 23/81 − 12 * (−1/81)
             = 325/162 = 2.0061728.
EXAMPLE 2.39
Find the sample correlation coefficient between two samples of size 20 drawn randomly from the uniform distribution on [5, 10]. Then compute the correlation coefficient between the two samples after each is sorted ascending. The command (setf s1 (sim-uniform 5 10 20) s2 (sim-uniform 5 10 20)) assigns random samples from the continuous uniform on [5, 10] to s1 and s2. For example, s1 may be assigned to
6.1850 5.1285 5.8795 8.3230 7.8035 6.1580 7.2920 6.7135 8.0265 6.3065 5.3185 8.9090 9.0365 8.8315 6.7430 7.6685 8.0580 5.7040 6.6240 6.5815

and s2 may be assigned to

9.2415 8.8265 7.7615 7.8905 9.3095 9.7360 9.0350 7.0460 9.7500 9.7760 6.8145 5.9700 7.4285 5.7255 5.0540 6.3965 6.6430 8.7300 7.8050 7.7855

The command (rho s1 s2) returns the correlation coefficient −0.3398, but the command (rho (sort s1 #'<) (sort s2 #'<)) returns a correlation close to +1, since sorting both samples ascending imposes a strong linear relationship between them.
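The same effect is easy to reproduce outside the book's environment. A NumPy sketch (seed and exact values are illustrative):

    # Unrelated uniform samples are nearly uncorrelated; sorted, they correlate near +1.
    import numpy as np

    rng = np.random.default_rng(0)
    s1 = rng.uniform(5, 10, 20)
    s2 = rng.uniform(5, 10, 20)
    print(np.corrcoef(s1, s2)[0, 1])                     # near 0
    print(np.corrcoef(np.sort(s1), np.sort(s2))[0, 1])   # near +1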
PROBLEMS

1. Given the density function below for discrete RV X, find a) c; b) P(X > 0); c) P(X > 2); d) E(X); e) E(X²); f) V(X).

X      0    1    2    3    4
P(x)   c   2c²  2c   3c²   c

ans. 0.2  0.8  0.32  2.04  5.96  1.7984.
2. The density function for RV X is f(x) = cx on the interval [0, 1]. Find a) c; b) P(X > 1/2); c) P(1/4 < X < 1/2); d) E(X); e) E(X²); f) V(X).
3. The experiment is the roll of a pair of fair dice. Let RV X denote the maximum of the dice, RV Y the minimum of the dice, and RV Z the absolute difference between the dice. a) Find E(X), E(Y), E(Z).
b) Find V(X), V(Y), V(Z). ans. 161/36 91/36 70/36 1.97 1.97 2.05.
4. The experiment is 3 flips of a fair coin. Let RV X be the number of heads, RV Y the outcome of the second flip, and RV Z the difference between the number of heads and the number of tails. a) Find E(X), E(Y), and E(Z). b) Find V(X), V(Y), V(Z).
5. The daily demand for a store's computers is a RV X shown below. Compute the distribution of a 2-day demand (assume day-to-day independence) and the expected value of the 2-day demand.

X      0     1     2
P(X)  0.25  0.5   0.25     (1-day demand)

ans.
Y      0     1     2     3     4
P(Y)  1/16  4/16  6/16  4/16  1/16     (2-day demand)
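The 2-day distribution in the answer is the convolution of the 1-day demand with itself, which a few lines of plain Python can check (independent of the book's software):

    # Problem 5: 2-day demand as a convolution of the 1-day demand with itself.
    one_day = {0: 0.25, 1: 0.5, 2: 0.25}
    two_day = {}
    for i, p in one_day.items():
        for j, q in one_day.items():
            two_day[i + j] = two_day.get(i + j, 0) + p * q
    print(two_day)                                 # {0: 1/16, 1: 4/16, 2: 6/16, 3: 4/16, 4: 1/16}
    print(sum(k * p for k, p in two_day.items()))  # expected 2-day demand = 2.0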
6. Ten balls numbered from 1 to 10 are placed in a box. You can pay $1 and randomly choose a ball from the box to receive that number in dollars, or you can pay $1 and accept the ball if it is greater than 5 but pay $2 more to replace the ball if it is not greater than 5 and draw again and accept the number in dollars on the next ball selected. Compute the expected profit of each option. (p2-6 1000) returns the simulation of the latter.
7. Compute the variance of the a) outcomes from rolling a fair die; b) sum of n rolls of a fair die. ans. 35/12  35n/12.
8. Compute the expected number of aces in four random picks from a deck without replacement.
9. Compute the expected number of die tosses until a 4 appears. ans. 6.
10. In your pocket are 3 quarters, 5 dimes, 2 nickels, and 4 pennies. You reach into your pocket and grab one coin. What is the expected value? What are your assumptions?
11. a) If an integer is randomly chosen from a to b, where a to b are consecutive integers, show that the expected value is (a + b)/2. b) If you tried the n keys in your pocket one at a time without replacement to open a door, find the probability that the kth key tried opened the door. Repeat the problem with replacement. ans. 1/n  (n − 1)^(k−1)/n^k.
12. A die is rolled with RV X as the outcome. You can accept either $E(1/X) or $1/E(X). Which is more?
13. In front of you are four bins. One is filled with $5 bills, another with $10 bills, a third with $20 bills, and a fourth with $50 bills. You randomly choose a bin and withdraw one bill. It costs $1 to perform a trial of this experiment, and an additional dollar each time a trial is repeated; for example, the third trial costs $3 to play. a) To maximize your expected return, how many times should you play and what is the expected profit? b) How many times can you play before expecting to lose? ans. 21  215.25  41.
14. The density function for RV X is given by f(x) = 2x on [0, 1]. RV Y = 5X + 10 on [10, 15]. Find a. E(X); b. E(Y) from Y's density function; c. E(Y) from X's density function; d. E(Y²).
15. a) Write the joint density function for the RVs X and Y, representing the respective outcomes from a roll of a pair of fair dice. b) Write the joint density for RV X being the number of 4's and RV Y being the number of 5's in a roll of a pair of fair dice. ans. f(x, y) = 1/36; x, y = 1-6.

           X
         0      1     2
    0  16/36  8/36  1/36
Y   1   8/36  2/36    0
    2   1/36    0     0
16. Compute E(X) and V(X) for an indicator RV X.
17. Given the joint density function of X and Y in the table below, find a) fX, b) fY, c) fX|Y=1, d) E(X) and E(Y), e) E(Y | X = 0).

           Y
         0    1    2    3
X   0   1/8  2/8  1/8   0
    1    0   1/8  2/8  1/8
18. Given f(x, y) = 6(x² + y)/5 on [0, 1] for x and y, find P(X < 1/2, Y < 1/4).
19. Given f(x, y) = c(2x + y) for 2 < x < 6; 0 < y < 5, find c, fX(x), and fY(y).
ans. c = 1/210;  fX(x) = (20x + 25)/420 on (2, 6);  fY(y) = (32 + 4y)/210 on (0, 5).
20. Given f(x, y) = 2x + 2y − 4xy for 0 ≤ x ≤ 1; 0 ≤ y ≤ 1, compute the following.
a) fX(x) = _____   b) fY(y) = _____
c) E(X) = _____    d) E(Y) = _____
e) E(X²) = _____   f) E(Y²) = _____
g) E(XY) = _____   h) C(X, Y) = _____
i) V(X) = _____    j) V(Y) = _____
k) ρ = _____       l) V(X − Y) = _____
21. Prove E(X²) ≥ E²(X).
22. Given the table for joint density fXYZ(x, y, z) for RVs X, Y, and Z, a) show that the RVs are independent pair-wise but fXYZ ≠ f(x)*f(y)*f(z); b) write the conditional density fXY|Z=1.
X  Y  Z  fXYZ
0  0  0  1/4
1  1  0  1/4
1  0  1  1/4
0  1  1  1/4
23. Calculate ρ for the joint density distribution for RVs X and Y. ans. 1.

             Y
          3    5    7    f(x)
    1    1/4   0    0    1/4
X   2     0   1/2   0    1/2
    3     0    0   1/4   1/4
  f(y)   1/4  1/2  1/4
24. Given joint density f(x, y) = 3x for 0 ≤ y ≤ x ≤ 1, compute V(3X − 2Y − 5).
25. Complete the solution for Example 2.17 for n = 12 balls in 3 weighings.
26. a. Which conveys more information, 2 pairs or a triple in 5-card poker? b. Compute the entropy H in a coin flip with P(heads) = 0.90. c. Compute the entropy in a fair die roll.
27. Show that if E[(X − Y)²] = 0 for RVs X and Y, then X = Y.
28. Prove that if X and Y are independent RVs, then E(XY) = E(X)E(Y).
29. Given fXY = 1 for 0 ≤ x ≤ 1; 0 ≤ y ≤ 1 and that RV U = X + Y and RV V = X − Y, show that fUV(u, v) = 1/2 for 0 ≤ u ≤ 2, −1 ≤ v ≤ 1, and that fU(u) = u for 0 ≤ u ≤ 1 and fU(u) = 2 − u for 1 ≤ u ≤ 2.
30. Given fXY = 1 for 0 ≤ x ≤ 1; 0 ≤ y ≤ 1 and that RV U = X + Y and RV V = Y, find fUV(u, v), fU, and fV.
31. Given joint density function f(x, y) = 1/2 for 0 < x < 1 and 0 < y < 2 for independent RVs X and Y, let RV U = X + Y and RV V = X − Y, and find the joint density fUV. ans. 1/4.
32. The experiment is to randomly pick two marbles from an urn containing 4 red marbles and 6 blue marbles. Let RV X indicate the number of red marbles in the sample. Find the density function for RV Y = 1/(X + 1).
33. The command (Covar x y) returns the covariance σxy, (Var x) returns the variance σx², and (rho x y) returns the correlation coefficient for RV X and RV Y, where the domains for X and Y are each a list of numbers. If RVs X and Y are each discrete uniform on the set of integers from 0 to 99, predict the value of the correlation coefficient. If RV Z is randomly selected from a set of integers 0 to 99, guess the value of the correlation coefficient of X and Z. Then predict with certainty the value of the correlation coefficient of Y and Z. (setf X (upto 99) Y (upto 99) Z (swr 100 (upto 100))) creates the 3 RVs. ans. 1  ≈0  same.
34. In an urn are 3 red, 4 white, and 5 blue marbles. Three marbles are randomly selected. Let RV X denote the number of white marbles and RV Y the number of blue marbles in the sample. Create the joint density function fXY and the conditional density function fY|X=1. Determine if X and Y are independent RVs.
35. For the joint density given below, find a) the conditional density of X given Y = 1; b) ρ, the correlation coefficient.
           Y
         0    1    2    3
X   0   1/8  2/8  1/8   0
    1    0   1/8  2/8  1/8
36. Given joint density f(x, y) = (2/5)(2x + 3y) for 0 ≤ x ≤ 1; 0 ≤ y ≤ 1, find the conditional variance of Y given X = 1/4.
37. Let RV X have density f(x) = 3x² on [0, 1] and RV Y = 2X + 3. Find ρ(X, Y). ans. 1.
38. Show that E(X + Y) = E(X) + E(Y) given joint density f(x, y).
39. a) Use Chebyshev bounds to find P(|X − 15| > 6) for RV X with μ = 15 and σ² = 9 from an unknown distribution. b) For RVs with nonnegative domains, P(X ≥ x) ≤ μ/x (Markov's inequality). For RV X with density f(x) = 2x on [0, 1], find the upper bound and the actual probability that X > 3/4. ans. ≤ 1/4;  ≤ 8/9 vs. 7/16.
40. Find the covariance of RVs X and Y with joint density f(x, y) = 6xy² for x and y on [0, 1].
41. Show that RV Y's density function is fY(y) = fX(y) + fX(−y) when RV Y = |X|, where fX(x) is the value of the probability density at X = x.
42. Find the regression curve of Y on X for joint density function f(x, y) = 2 for 0 < x < y < 1.
43. Show that discrete f(x) = 2x/(n(n + 1)) for n a positive integer and x = 1, . . . , n is a valid density.
44. Show that f(x) = 2(c − x)/c² on [0, c] is a valid density.
45. Given RVs Y and X where Y = X + 1, find Y's density if X is
a) continuous uniform on [0, 1];  ans. 1 on [1, 2]
b) exponential with parameter k = 1;  ans. e^−(y−1), y > 1
c) exponential with parameter k = 2;  ans. 2e^−2(y−1), y > 1
d) f(x) = 2x on [0, 1];  ans. 2(y − 1) on [1, 2]
e) f(x) = 3x² on [0, 1];  ans. 3(y − 1)² on [1, 2]
REVIEW
1. a. How many ways can 5 identical (indistinguishable) items be put in 3 bins? ans. 7C2 = 21.
005 014 023 032 041 050 104 113 122 131 140 203 212 221 230 302 311 320 401 410 500
b. How many ways if each bin must have at least 1 item? ans. 4C2 = 6.
113 122 131 212 221 311
2. There are 3 boxes, A, B, and C. Box A contains 2 gold coins, Box B has 1 gold and 1 silver coin, and Box C has 2 silver coins. Experiment: Randomly choose a box. Clearly the probability of getting matching coins is 2/3. Randomly choose a coin in the chosen box. You see that the coin is gold, eliminating Box C. Compute the probability that the other coin in the box is also gold.
3. Compute the probability of winning the lottery if you must pick 6 numbers in correct order from the numbers 1 to 99. If you bought 100 lottery tickets, your probability of winning would be 100 times greater than if you bought one lottery ticket. What is that probability? ans. 1.24e-12, 1.24e-10.
4. In a matching problem of n hats returned randomly to n owners, what is the expected number of matches? Hint: Let Xi be an indicator RV and let Xi = 1 if the ith hat matches the owner's hat and 0 if it does not. Let RV X = ΣXi.
5. In problem 4, a. What is the probability of exactly n − 1 matches from n hats? ans. 0. b. What is the probability of exactly 3 matches from 5 hats? ans. 1/12.
6. In randomly selecting r shoes from n pairs of shoes, what is the expected number of matches? Hint: Pick one shoe. The probability of the next shoe matching is p = 1/(2n − 1). So the expected value of indicator RV X (any match) is p = 1 * 1/(2n − 1). How many selections of 2 shoes can one select from r shoes? Compute the expected number of matches for 5 pairs of shoes from randomly selecting 6 shoes.
7. In rolling a pair of dice let RV X be the outcome of the first die and let RV Y be the sum of the dice. Create a table displaying the joint density f(x, y).
8. Find E(X) where RV X is a randomly selected integer from the integers 1 to n.
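Review problem 4's indicator-RV argument gives E(matches) = n * (1/n) = 1 for any n, which a plain Python simulation (independent of the book's commands) confirms:

    # Expected number of hat matches is 1, regardless of n.
    import random

    def matches(n):
        perm = list(range(n))
        random.shuffle(perm)
        return sum(i == p for i, p in enumerate(perm))

    n, trials = 10, 100_000
    print(sum(matches(n) for _ in range(trials)) / trials)   # approximately 1.0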
9. Let f(x) = 1 for 0 ≤ x ≤ 1. Find M(t), E(X), and V(X) from the generating function.
10. In selecting from a set of n items with replacement, how many items can be expected to be missed after n selections? Hint: Use indicator RVs for each item. (sim-n-items n m) simulates the expected number of missed items in n selections for m trials.
11. The mean of RV X is 50; the variance is 25. Give a bound on P(40 ≤ X ≤ 60).
12. Let X and Y be RVs representing the outcomes of two dice and let W = max(X, Y). Create the joint density distribution table for fWX. Repeat for W = min(X, Y).
13. Show that fX|Y(x | y) = f(x, y)/fY(y) is a valid density by integrating f(x, y)/fY(y) with respect to x.
14. A pair of fair dice is tossed. Compute P(S7), the probability that the sum is 7. You are then told that the outcome of the first die is an odd prime number. Now compute P(S7).
15. Show that C(c + X, Y) = C(X, Y) for constant c.
16. Show that a) C(X, Y + Z) = C(X, Y) + C(X, Z); b) C(X + Y, X − Y) = V(X) − V(Y).
17. Show that ∫0^1 ∫v^(2−v) (u² − v²) du dv integrates to 1.
18. Given fXY(x, y) = cx²y for 0 ≤ y ≤ x² ≤ 1, compute a) c, b) P(X ≤ 2Y), and c) P(X ≥ 2Y).
19. Prove that C(X, X) = V(X).
20. Use indicator RVs to solve for the probability of no envelope getting the correct letter when n letters are randomly inserted into n envelopes. Let X = ∏(i=1 to n)(1 − Xi) where Xi = 1 if the correct letter is in the ith envelope and 0 otherwise.
21. Let f(x, y) = 4xy for 0 < x < 1, 0 < y < 1. Find the joint probability distribution of U = X² and V = XY. ans. g(u, v) = 4u^(1/2) * v * u^(−1/2) * |1/(2u)| = 2v/u for 0 < v < 1 and v² < u < 1.
22. A person has fire insurance on a $200,000 home for $1900 a year. Past statistics show the probability of fire in the area is 0.9%. Compute the expected profit of the insurance company if administration costs are $60 a year.
23. Write out the coefficient of x³y⁴z⁶ in the expansion of (x + y + z)¹³. ans. 13!/(3!4!6!).
24. Mr. and Mrs. Yancey wish to name their daughter with 3 unique initials in alphabetical order. How many such monograms are there for the Yanceys? . . . for the Zimbos?
25. Cards from 2 shuffled decks are turned up simultaneously. How many matched pairs would one expect? Suppose the cards from one deck were announced in order ace 2 3 . . . king of spades, ace 2 3 . . . king of hearts, ace 2 3 . . . king of diamonds, and ace 2 3 . . . king of clubs. How many correct calls would one expect? Compute the probability of 0 or 1 match. Compute the probability of no match and thus show the closeness to 1/e. (D-print-map n) returns the number of maps for 0 to n matches. (D-print-map 5) returns (44 45 20 10 0 1). ans. 1  1.
(float (/ (sum (firstn 2 (D-print-map 52))) (f 52))) → 0.73576   (0 or 1 match)
(float (/ (sum (firstn 1 (D-print-map 52))) (f 52))) → 0.3678794 vs. 1/e   (no match)
26. If 4 hats are randomly returned to their owners, show that the expected number of correct hat-owners is 1. Use the software command (print-map 4) to return the number of ways to have 0 to 4 matches. For example, (print-map 4) prints

Digit   0  1  2  3  4
Count   9  8  6  0  1

The probability of a 0 match is thus 9/24 and of 1 match is 8/24, etc. Verify that the expected number of correct hats returned is 1.
27. Compute the probability that 20 randomly selected people all have their birthdays in the same 3 months (inclusion-exclusion principle).
ans. 12C3(3^20 − 3 * 2^20 + 3)/12^20.
28. Given RV X with density f(x) = x³ on 0 ≤ x ≤ √2, find a) P(X < 1); b) E(X); c) E(Y) where Y = 2X³ + 5.
29. The following 10 numbers are in a hat: 1 3 5 7 9 14 16 17 19 20. Six numbers are randomly selected from the hat. Compute the probability that the 3rd lowest number is 9. Explain how to simulate this situation. ans. 2/7.
PARADOXES
1. In flipping a fair coin, you are awarded $2^x if the event heads does not occur until the xth flip. Compute the expected value of the award (St. Petersburg paradox). ans. ∞.
2. At one clinic, Drug A cures 100 people out of 400 treated, for a cure rate of 25%. Drug B cures 10 people out of 20 treated, for a cure rate of 50%. Drug B is then reported to be twice as effective as Drug A. At another clinic, Drug A cures 2 people out of 20 treated, for a cure rate of 10%, while Drug B cures 80 people out of 400 treated, for a cure rate of 20%. Drug B is again reported to be twice as effective as Drug A. Which do you think is the more effective drug? 3. An urn has 1 white ball and 1 red ball. A ball is randomly picked from the urn. A white ball ends the experiment, but a red ball results in another red ball added to the urn and the experiment repeated. P(W) = 1/2 on the first selection and P(RW) = 1/6 for a white to occur on the second selection (1/2)*(1/3). What is the expected number of picks until a white ball is selected? Try (sim-paradox-wr n) to see the series develop. (simparadox-wr 10) may return (R R R W). The command returns (W), or a s series of R’s less than n followed by a W or exactly n R’s. The command (sim-uwr n m) returns m trials of (sim-paradox-wr n). To see that “any” length of R’s can occur, give the following commands. a. (setf trial-list (sim-urn-wr 700 1200)) b. (setf trial-length (repeat #' length trial-list)) c. (setf x (position 700 trial-length)) d. (nth x trial-list)
; returns 1200 trials of (simparadox-wr 700). ; returns lengths of each of 1200 trials. ; returns position of string of length 700. ; returns string of 700 R’s if event occurred.
Let RV X denote the trial at which a white ball is selected from the urn. X P(X)
1 1/2
2 1/6
3 1/12
4 1/20
5 1/30
6 1/42
7 1/56
8 1/72
... ...
E( X ) = (1 * 1/2) + (2 * 1/2 * 1/3) + (3 * 1/2 * 2/3 * 1/4) + (4 * 1/2 * 2/3 * 3/4 * 1/5) + . . . = 1/2 + 1/3 + 1/4 + 1/5 + . . . + 1/ n + . . . • 1 =Â = •; Harmonic Series. n -1 n + 1 4. Assume that 2 people have random amounts of money in their wallets. These people meet and compare amounts. The person with the more money loses that amount to the other person. Both persons a priori reason that each has more to gain than to lose, i.e., that the wallet game seems to favor both persons. Note that the lesser of the two sums of money is not at risk. Explain the wallet paradox. Try the command (sim-wallet max-amount n) where a random number from 0 to max-amount is put in each wallet and the results are returned for n wagers. For example, (sim-wallet 100 10) may return ((51 85 78 85) (65 76 58 77 79 56)),
showing one person won 4 times, the other 6 times, with sums being 299 and 411, respectively. Try (repeat #' sum (sim-wallet 100 100)). Vary the max amount and n, the number of iterations.
5. Answer orally the next question either "yes" or "no." Will the next word spoken by you be "no"? Will the next word spoken by you be "yes"? ans. Not true  True.
6. When waiting for an elevator on the 5th floor of a 20-floor building, which is the more likely direction for the next elevator: up or down? Explain.
7. Let density function f(x) = c/(x² + 1) for −∞ < x < ∞. Find the value of the constant c and show that f(x) is a valid density function. Then show that E(X) does not exist for this Cauchy density. ans. c = 1/π. Hint: (arctan x)′ = 1/(x² + 1).
SOFTWARE EXERCISES
1. (random n) returns an integer from 0 to n − 1 if n is an integer. (random 10) may return 7. In decimal format, (random n) returns a number x in 0 ≤ x ≤ |n|. (random 10.0) may return 4.3701995; (random -10.0) may return −5.433834.
2. (all-occur integer) returns the expected number of trials before all n events in an equally likely sample space occur. Compute the expected number of coin flips until each event heads and tails occurs with (all-occur 2) and the expected number of dice rolls until each face occurs with (all-occur 6).
3. (sim-all-occur integer) returns a simulated number of trials before all n items are selected from an equally likely sample space. (sim-all-occur 2) should return a value close to the theoretical value of 3. (sim-all-occur 6) should return a number close to the theoretical value of 14.7.
4. (sim-n-all-occur integer n) returns the average of n simulations of (sim-all-occur integer). (sim-n-all-occur 6 100) returns the average of a list of 100 trial runs of (sim-all-occur 6). The result should be close to 14.7. Expect the result to be closer, the larger the number of trials. Compare the simulated results with the theoretical results from (all-occur integer).
5. Repeat software exercise 4 for a fair coin toss. Try (sim-n-all-occur 2 100). Simulate Review Problem 10 with (n-items 100 100) to show that the expected number of missed selections is close to 100/e.
6. Find the expected number of trials to randomly select with replacement all the integers from 1 to 10. Try (sim-n-all-occur 10 100) for a simulation and (all-occur 10) for an exact answer. Use the ↑ key to repeat the simulation as desired. Try changing 100 to 200 to 500 to see the value close in on (all-occur 10).
7. (pick-until target n) returns a simulated sequence of integers between 1 and n until the target integer appears. (pick-until 4 6) simulates repeated tossings of a die until 4 appears. For example, (pick-until 4 6) may return ((1 1 2 6 5 3 3 6 2 4) 10), indicating that the target 4 occurred at the tenth die roll. How many times do we expect a die to be tossed before a 4 appears? Take the average of 100 returns with (sim-pick-until 4 6 100). We expect 6 times before a 4 occurs. (pick-until 12345 100000) may give you a feel for innumeracy.
8. (EV list-1 list-2) returns the expected value, where one list is the probabilities and the other list is the values assumed by the random variable at these probabilities. (EV '(0 1 2 3) '(1/8 3/8 3/8 1/8)) → 1.5.
9. In a continuous random sequence of heads and tails, which 3-sequence will occur first more frequently, TTH or HTT? To simulate, try the command (H-vs-T '(T T H) '(H T T)). For example, (H-vs-T '(T T H) '(H T T)) may return (HTT wins). Show that THH > HHT > HTT > TTH > THH, where > means "wins over with 2:1 to 3:1 odds." The command (sim-h-vs-t seq1 seq2 n) returns the results from n replications. For example, (sim-h-vs-t '(T T H) '(H T T) 100) may return (26 74), implying that HTT occurred 74 times before TTH. What sequence would you pick to beat TTT? ans. HTT 7.5:1.
10. We seek a random sample from the density f(x) = 2x on [0, 1]. Note that F(x) = x². Suppose RV Y = X². Then Y must be continuous uniform on [0, 1], implying that X = √Y. To simulate a random sample, first use the command (setf y (sim-uniform 0 1 100)), which returns 100 random samples from the continuous uniform on [0, 1]. Then (setf x (repeat #' sqrt y)) takes the square root of each sample. Finally, the command (mu x) returns the average of these 100 x-values. What is the theoretical expectation? Try (mu (repeat #' sqrt (sim-uniform 0 1 100))) and use the ↑ key to repeat the simulation of E(X). Increase the sample size 100 to 1000 to see the simulated estimate of the expected value approach the theoretical expected value of 2/3.
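Exercise 10 is the inverse-transform method. A NumPy sketch of the same idea, outside the book's environment:

    # Inverse transform: with F(x) = x**2, X = sqrt(U) has density f(x) = 2x on [0, 1].
    import numpy as np

    rng = np.random.default_rng(3)
    u = rng.uniform(0, 1, 1000)
    x = np.sqrt(u)
    print(x.mean())     # close to the theoretical E(X) = 2/3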
11. (XYZ-dice n) returns the maximums, minimums, and absolute values of the differences from n rolls of a pair of fair dice (see Problem 3). Then (repeat #' mu *) computes estimates for the expected values of the maximum, minimum, and absolute value of the difference. Try (setf d-data (XYZ-dice 300)) followed by (repeat #' mu d-data) to see how close these estimates are to the theoretical values of (4.47 2.53 1.94) for E(X), E(Y), and E(Z), respectively. Try (repeat #' mu (XYZ-dice 100)) in conjunction with the ↑ key to repeat the simulation as desired.
12. (XYZ-coins n) is a simulation returning from n experiments of 3 fair coin flips the following: X, the number of heads; Y, the outcome of the second flip; and Z, the difference between the number of heads and the number of tails. See Problem 4. Try (repeat #' mu (XYZ-coins 100)) to see estimates of the theoretical values 1.5, 0.5, and 0, respectively.
13. (setf x (upto 100) y (reverse (upto 100))) assigns x to the first 100 integers and y to the same integers but from 100 to 1. (Covar x y) returns σxy, the covariance of x and y, and (Var x) returns σx², the variance of x. Predict the correlation coefficient and use the software commands to verify your prediction with (rho x y), which returns the correlation coefficient of the x-y data.
14. Try (var (firstn 100 (pi1000))) and repeat for secondn, thirdn, fourthn, and fifthn for the first 1000 digits of π to see similar variance values. Values should be close to 8.25 if all the digits from 0 to 9 are equally represented, i.e., for pseudo-random π.
15. Test the pseudo-randomness of π. Use (setf pi (pi1000)). Then (firstn 100 pi) returns the first 100 decimal integers of π. The command (secondn 100 pi) returns the second 100 digits of π. Predict the correlation coefficient between these two sets of π digits and check your prediction with (rho (firstn 100 pi) (secondn 100 pi)). Use the ↑ key and try the first and third set with (rho (firstn 100 pi) (thirdn 100 pi)). Similarly, repeat with thirdn and fourthn and fourthn and fifthn. Predict the correlation coefficient for (rho (secondn 100 pi) (secondn 100 pi)).
16. (sim-dice-roll n) returns the outcome sums Si from n rolls of a fair pair of dice. Try the command (setf d-sums (sim-dice-roll 1296)) followed by (print-count-a-b 2 12 d-sums), which returns the number of Si in the 1296 outcomes d-sums, i.e., the number of occurrences of the sums from 2 to 12. For example, we expect 1296/6 = 216 outcomes of the sum 7. Check for the dice sums.

X        2   3    4    5    6    7    8    9   10   11   12
Count   41  73  122  143  169  202  182  133  108   82   41
Expect  36  72  108  144  180  216  180  144  108   72   36
17. Let RV Y = X³ where fX(x) = 3x² for x on [0, 1]. Show that Y has density fY(y) = 1 on [0, 1] and simulate a random sample of size 100 from X's density. Then estimate E(X) from the sample. The software commands are (setf Y (sim-uniform 0 1 100)) to return a random sample from Y's density; (setf X (repeat #' cube-root Y)) to take the cube root of each; (mu X) to return the average of the sample, an estimate for E(X) = 3/4.
18. Complete the blanks with the expected value from the following software commands:
a) (mu (sim-uniform 5 10 100)) returns ______
b) (mu (firstn 500 (pi1000))) returns ______
c) (rho (firstn 100 (pi1000)) (thirdn 100 (pi1000))) returns ______
d) (rho (secondn 100 (pi1000)) (fourthn 100 (pi1000))) returns ______
e) (mu (sim-coins 100 19/20 10)), with 10 experiments of flipping a coin 100 times with probability of success 19/20. The expected average is ______.
19. (mu-svar list) returns the mean and variance of a list of numbers. For RV X with density f(x) = 3x² on [0, 1], (setf x-data (repeat #' cube-root (sim-uniform 0 1 100))) draws a sample of size 100 from the distribution of RV X. That is, RV Y = X³ and Y is continuous uniform on [0, 1]. (mu-svar x-data) should return values close to ______ and ______. ans. 3/4  0.0375. Try (mu-svar (repeat #' cube-root (sim-uniform 0 1 100))) repeatedly with the F3 key.
20. A die is rolled with RV X as the outcome. You can accept either $E(1/X) or $1/E(X). Which is more? Simulate the problem. f(x) = P(X = x) = 1/6 where x is an integer from 1 to 6. E(X) = 3.5 and 1/E(X) = 1/3.5 = 0.2857. Let RV Y = 1/X with the discrete density

Y      1    1/2  1/3  1/4  1/5  1/6
P(Y)  1/6   1/6  1/6  1/6  1/6  1/6

E(Y) = E(1/X) = 0.4083 > 1/E(X) = 0.2857. The software command (mu (repeat #' recip (die 1000))) simulates the results from 1000 die rolls. The recip function returns the reciprocal. This simulation returned 0.4123666.
21. Compute the probability that a random point, composed of two random numbers from the continuous uniform on [0, x], falls inside a circle inscribed in a square of side x. ans. π/4.
22. Explain how to simulate the area of a random triangle in a rectangle of dimensions x by y.
a) Generate 3 random points (xi, yi) where 0 < xi < x and 0 < yi < y.
b) Compute the distance between each pair of points.
c) Use Heron's formula to find the area of the triangle, given by A = √(s(s − a)(s − b)(s − c)) where s = (a + b + c)/2 for distances a, b, and c.
d) Repeat as desired to find the average of such areas.
Try the command (random-triangle x y n); for example, (random-triangle 10 10 1000) may return the average of 1000 areas of random triangles in a 10 by 10 square as 7.54, from which p ≈ 0.0754.
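The same simulation is a few lines of plain Python (independent of the book's (random-triangle ...) command; the max(..., 0.0) guards against tiny negative values from floating-point roundoff):

    # Average area of random triangles in an x-by-y rectangle via Heron's formula.
    import math, random

    def triangle_area(x, y):
        pts = [(random.uniform(0, x), random.uniform(0, y)) for _ in range(3)]
        a, b, c = (math.dist(pts[i], pts[(i + 1) % 3]) for i in range(3))
        s = (a + b + c) / 2
        return math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))

    areas = [triangle_area(10, 10) for _ in range(10_000)]
    print(sum(areas) / len(areas))   # around 7.6, i.e., about 0.076 of the square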
23. Compute the probability of a randomly chosen point falling inside a randomly chosen triangle in a rectangle of dimensions 10 by 10. See Exercise 22. ans. ≈ 0.075.
24. a) The experiment is to randomly pick an integer from 1 to 10 and compute its number of divisors. Find the expected number of divisors if the experiment is done repeatedly.

#       1  2  3  4  5  6  7  8  9  10
# Div   1  2  2  3  2  4  2  4  3   4

X      1     2     3     4
P(X)  1/10  4/10  2/10  3/10

E(X) = (1 + 8 + 6 + 12)/10 = 2.7 divisors.
(swor n list) returns n items from list without replacement. For example, (swor 6 (upto 10)) may return (7 10 6 8 5 3), no duplicates, while (swr 6 (upto 10)) may return (7 9 9 1 8 3), with duplicates possible. (divisors-of n) returns a list of the divisors of n; (divisors-of 10) returns (1 2 5 10). (num-divisors-of 10) returns 4.
(mu (repeat #' num-divisors-of (swr 20 (upto 10)))) returned the simulated value of 2.85.
(mu (repeat #' num-divisors-of (swor 10 (upto 10)))) returns the exact 2.7 value, of course.
b) Simulate the expected number of divisors of an integer randomly selected from the integers 1 to 1000 using the following commands. (setf integer-list (swor 100 (upto 1000))) returns a list of 100 randomly selected integers from 1 to 1000. (mu (repeat #' num-divisors-of integer-list)) returned 7.13, the average number of divisors in the integer-list. Try repeatedly (mu (repeat #' num-divisors-of (swor 100 (upto 1000)))) to get an estimate of the expected number of divisors of a randomly chosen integer from 1 to 1000. Vary the sample size with and without replacement. What will (mu (repeat #' num-divisors-of (swor 1000 (upto 1000)))) always return?
25. Variance, covariance, and correlation. Assign x and y to random samples of size 10 from the integers 1 to 100 using the commands (setf x (swor 10 (upto 100))) (setf y (swor 10 (upto 100))). Show that V(X) = C(X, X) by using the commands (var x) and (covar x x). Check C(X, Y) with (covar x y). If the covariance is negative, which is greater: V(X + Y) or V(X − Y)? Verify by using the commands (setf x+y (repeat #' + x y)) and (setf x-y (repeat #' - x y)). Notice that the mnemonic variable x+y is the list of adding the x-list to the y-list element-wise. Then verify that V(X + Y) = V(X) + V(Y) + 2C(X, Y) and V(X − Y) = V(X) + V(Y) − 2C(X, Y). Next assign W to 5X with the command (setf w (repeat #' * x (list-of 10 5))). Now W = 5X and V(W) = 25V(X). Verify with (var w) and (var x).
(setf x (sample 10 (upto 100))) → (57 93 46 93 44 26 32 21 86 31)
(setf y (sample 10 (upto 100))) → (79 94 84 75 49 73 28 25 92 84)
(var x) → 711.29;  (covar x x) → 711.29
(setf x+y (repeat #' + x y)) → (136 187 130 168 93 99 60 46 178 115)
(setf x-y (repeat #' - x y)) → (−22 −1 −38 18 −5 −47 4 −4 −6 −53)
(var x) → 711.29  (var y) → 576.81  (* 2 (covar x y)) → 788.86
(var x+y) → 2076.96
(var x-y) → 499.24 = 711.29 + 576.81 − 788.86
W = 5x becomes (setf w (repeat #' * x (list-of 10 5))) → (285 465 230 465 220 130 160 105 430 155)
(var x) → 711.29  (var w) → 17782.25  (* 25 (var x)) → 17782.25
(covar x x) → 711.29  (covar x y) → 394.43  (covar x x+y) → 1105.72
26. For RV X with density f(x) = 1/x on [1, e], compute E(X) and then devise a simulation. Y = F(X) = Ln X implies that Y is continuous uniform on [0, 1] and X = e^Y.

E(X) = ∫1^e x(1/x) dx = e − 1 ≈ 1.718, and equivalently E(e^Y) = ∫0^1 e^y dy = e − 1.

Simulate with the command (repeat #' exp (sim-uniform 0 1 100)), which returns a sample from the X-distribution; an estimate of the expected value is the average of the values. Try (mu (repeat #' exp (sim-uniform 0 1 100))) to see the closeness to 1.718.
27. With population (upto 6), the integers from 1 to 6, and (setf data (upto 6)), use the commands (moments-mu moment-number data) for central moments and (moments-o moment-number data) for moments about the origin to verify

E[(X − μ)⁴] = 4C0 * E(X⁴) − 4C1 * μE(X³) + 4C2 * μ²E(X²) − 4C3 * μ³E(X) + 4C4 * μ⁴.

(moments-mu 4 (upto 6)) returns 14.729 = E(X − μ)⁴
(moments-o 4 (upto 6)) returns 379.166 = E(X⁴)
(* 4 (mu (upto 6)) (moments-o 3 (upto 6))) returns 1029 = 4μE(X³)
(* 6 (square (mu (upto 6))) (moments-o 2 (upto 6))) returns 1114.75 = 6μ²E(X²)
(* 4 (expt (mu (upto 6)) 3) (moments-o 1 (upto 6))) returns 600.25 = 4μ³E(X)
(expt (mu (upto 6)) 4) returns 150.0625 = μ⁴
That is, E[(X − μ)⁴] = E(X⁴) − 4μE(X³) + 6μ²E(X²) − 4μ³E(X) + μ⁴:
14.729 = 379.166 − 1029 + 1114.75 − 600.25 + 150.0625.
28. To simulate a sample of size 15 from density f(x) = 2x on [0, 1], we can use the command (setf x1 (sim-uniform 0 1 15)), returning
(0.46 0.72 0.74 0.60 0.47 0.50 0.42 0.40 0.92 0.49 0.95 0.23 0.05 0.71 0.44).
Notice that the complement of the random sample is also a random sample; i.e., (setf x2 (repeat #' - (list-of 15 1) x1)) returned
(0.53 0.27 0.25 0.39 0.52 0.49 0.57 0.59 0.07 0.50 0.04 0.76 0.94 0.28 0.55).
(mu (repeat #' sqrt x1)) → 0.71, while (mu (repeat #' sqrt x2)) → 0.64. Both are estimates for the expected value of RV X. The average of the 2 estimates, 0.6779, is usually better.
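Exercise 27's moment expansion can also be checked with a few lines of plain Python for the fair-die population (independent of the book's moments-mu/moments-o commands):

    # Verify E[(X - mu)^4] = E(X^4) - 4 mu E(X^3) + 6 mu^2 E(X^2) - 4 mu^3 E(X) + mu^4.
    pop = range(1, 7)
    n = 6
    m = lambda k: sum(x**k for x in pop) / n       # kth moment about the origin
    mu = m(1)                                      # 3.5
    central4 = sum((x - mu)**4 for x in pop) / n   # 14.7291...
    expansion = m(4) - 4*mu*m(3) + 6*mu**2*m(2) - 4*mu**3*m(1) + mu**4
    print(central4, expansion)                     # both 14.7291666...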
SELF QUIZ 2: MOMENTS
1. A number is randomly selected from the integers 1 to 12. Let RV X be the number of divisors of the selected integer. E(X) equals
a) 35;  b) 35/12;  c) 12;  d) 3;  e) answer not given.
2. Given fXY(x, y) = 2 for 0 < x < y < 1, compute the following:
a) fX(x) = _____   b) fY(y) = _____
c) E(X) = _____    d) E(Y) = _____
e) E(X²) = _____   f) E(Y²) = _____
g) E(XY) = _____   h) C(X, Y) = _____
i) V(X) = _____    j) V(Y) = _____
k) ρ = _____       l) V(3X − 2Y) = _____
3. The experiment is to flip a fair coin 2 times. Let RV X be the number of heads and RV Y the number of runs. Recall that the number of runs in the event HT is 2. The joint density distribution fXY is

a)        Y                  b)        Y
        1     2                      1     2
   0   1/4    0                 0   1/16  3/16
X  1    0    1/2              X 1   1/4   1/4
   2   1/4    0                 2   1/4    0

c)        Y                  d)        Y
        1     2                      1     2
   0   1/8    0                 0   1/8   1/4
X  1   1/8   1/2              X 1   1/8   2/16
   2   1/4    0                 2   1/8   1/8

P(X = 0 | Y = 1) is a) 1/4; b) 1/2; c) 1/8; d) 3/8.
E(X | Y = 1) is a) 0; b) 1/2; c) 1; d) 1/4.
4. Given the daily demand of an item RV X shown below and assuming independence between daily demands, the probability of a demand of 3 items in 2 days is

X      0    1    2
P(X)  0.1  0.6  0.3

a) 0.18;  b) 0.36;  c) 0.12;  d) 0.06.
5. Given RV X has density f(x) = 2x on [0, 1] and that RV Y = X², the density function for Y is
a) 4x²;  b) 2y;  c) 1;  d) 1/X;  e) answer not given.
6. Choose T for true or F for false.
a) If RVs X and Y are independent, then their covariance C(X, Y) = 0.
b) If C(X, Y) = 0, then RVs X and Y are independent.
c) If RV Y = F(X) where F is a cumulative distribution function, then Y may be an exponential RV.
d) C(X, X) = V(X).
e) V(X) = pq for any indicator RV with probability of success p and failure q.
7. Given the probability density function for discrete RV X below, compute

X      0    1    2    3    4
f(x)   c   2c²  2c   3c²   c

a) c;  b) P(X ≤ 3);  c) P(X > 2);  d) E(X);  e) V(X).
8. The density function for RV X is given by f(x) = 2x for 0 ≤ x ≤ 1. RV Y = 5X + 10; 10 ≤ y ≤ 15. Find a. E(X); b. E(Y) from Y's density function; c. E(Y) from X's density function; d. E(X²).
9. Describe a method for simulating the value of π using software commands.
10. Describe a method for sampling from the density distribution f(x) = 2x for 0 < x < 1.
Chapter 3
Special Discrete Distributions
Mathematical formulas have their own life; they are smarter than we, even smarter than their own authors, and provide more than what has been put into them. Heinrich Hertz
Certain probability scenarios recur in modeling the laws of nature. These scenarios give rise to some of the more useful discrete probability distributions: the discrete uniform, Bernoulli, binomial, negative binomial, geometric, hypergeometric, and Poisson. The expected values, variances, and moment generating functions are derived for each along with their entropies. Applications to probability scenarios are presented in the examples. The reader is encouraged to think of these special distributions determined by their parameters and to visualize the characteristics of their data sets.

3.0 Introduction
3.1 Discrete Uniform
3.2 Bernoulli Distribution
3.3 Binomial Distribution
3.4 Multinomial Distribution
3.5 Hypergeometric Distribution
3.6 Geometric Distribution
3.7 Negative Binomial Distribution
3.8 Poisson Distribution
3.9 Summary
3.0  Introduction
Common discrete random variables and their distributions are discussed along with appropriate assumptions for their use. The domain and range of the RVs offer bounds on the whereabouts of the RVs, and the distribution densities provide probabilities on the location of the RVs. We look at the first few moments and related quantities: E(X), E(X²), V(X), and M(t), with example problems. The two most important discrete distributions are the binomial and the Poisson. The two most important processes are the Bernoulli and Poisson. The Bernoulli process forms the basis for many of the special discrete distributions, and the Poisson process is related to the continuous exponential distribution. Probability problem descriptions carry clues to the applicable density distribution.
3.1  Discrete Uniform
The discrete uniform or rectangular distribution with parameter n has RV X with equal probability for all n values that X can assume. The principle of symmetry assumes that all outcomes are equally likely. The density function is

f(x) = 1/n  for x = x1, x2, . . . , xn.

The expected value μ is

E(X) = (1/n) Σ(i=1 to n) xi = x̄,   (3-1)

and the variance σ² is

V(X) = E(X²) − E²(X) = (1/n) Σ(i=1 to n) xi² − x̄².   (3-2)

When the xi are consecutive integers from a to b, the expected value simplifies to

E(X) = (a + b)/2   (3-3)

and the variance to

V(X) = (n² − 1)/12.   (3-4)

To see this, note that the sum of consecutive integers from 1 to n is
Σ(x=1 to n) x = n(n + 1)/2,

and the expected value is

E(X) = (1/n) Σ xi = (1/n) * n(n + 1)/2 = (n + 1)/2.

Similarly, in calculating the second moment E(X²) for consecutive integers xi from 1 to n,

Σ(x=1 to n) x² = n(n + 1)(2n + 1)/6,

and dividing by n,

E(X²) = (n + 1)(2n + 1)/6,

we find

V(X) = E(X²) − E²(X) = (n + 1)(2n + 1)/6 − ((n + 1)/2)² = (n² − 1)/12.
The variances of any n consecutive integers are equal since V(X) = V(X + c), where c is a constant. For discrete uniform RV X on integer domain [a, b],

V(X) = ((b − a + 1)² − 1)/12, since n = b − a + 1.

The moment generating function M(t) = E(e^tX) for the discrete uniform is given by

M(t) = (1/n) Σ(i=1 to n) e^(t xi) = (1/n)(e^(t x1) + e^(t x2) + . . . + e^(t xn)).

Observe that

M(0) = (1/n)(1 + 1 + . . . + 1) = 1.

The expected value E(X) and the second moment E(X²) are calculated from

M′(t) = (1/n) Σ xi e^(t xi);  M′(0) = (1/n) Σ xi = E(X) = x̄.
M″(t) = (1/n) Σ xi² e^(t xi);  M″(0) = (1/n) Σ xi² = E(X²).
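A quick numeric check of (3-3) and (3-4), in plain Python (independent of the book's commands):

    # E(X) = (a + b)/2 and V(X) = (n^2 - 1)/12 for the discrete uniform on [a, b].
    a, b = 5, 12
    vals = range(a, b + 1)
    n = b - a + 1
    mean = sum(vals) / n
    var = sum(x**2 for x in vals) / n - mean**2
    print(mean, (a + b) / 2)        # 8.5   8.5
    print(var, (n**2 - 1) / 12)     # 5.25  5.25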
The entropy H of the discrete uniform is the expected value of the information I(X) = −Log₂ P(X = x), where Log₂ x = Ln x/Ln 2:

H(X) = −(1/n) Σ(i=1 to n) Log₂ P(X = xi) = −(1/n) Σ(i=1 to n) Log₂(1/n) = Log₂ n.

For discrete uniform distributions with probability 1/n, H(n) = Log₂ n. The discrete uniform has the maximum entropy (maximum uncertainty) of all discrete distributions on n outcomes.

EXAMPLE 3.1
a) Find E(X) and V(X) where RV X is the outcome of a fair die roll. b) How many bits of information (yes/no questions) would it take to determine the outcome of any roll? c) Find the expected values of the sums from rolling 2, 3, and 4 fair dice. Repeat for the expected values of the products. Verify E(XY) = E(X)E(Y) by creating the distribution of RV Z = XY, the product of two fair dice.
Solution
a) E(X) = (1 + 6)/2 = 3.5 and V(X) = (6² − 1)/12 = 35/12 = 2.917.
b) Log₂ 6 = 2.585 bits = 6 * (−1/6) * Log₂(1/6).
c) E(X + Y) = E(X) + E(Y) = 3.5 + 3.5 = 7; E(X + Y + Z) = 10.5; E(W + X + Y + Z) = 14.
E(XY) = E(X)E(Y) = 3.5² = 12.25; E(XYZ) = 3.5³ = 42.875; E(WXYZ) = 3.5⁴ = 150.0625, from the independence of the dice. The density table for RV Z, the product of two fair dice, is shown. The probabilities P(Z) are each divided by 36.

Z     1  2  3  4  5  6  8  9  10  12  15  16  18  20  24  25  30  36
P(Z)  1  2  2  3  2  4  2  1   2   4   2   1   2   2   2   1   2   1   (each /36)
(sim-dice-product m n) returns the products from rolling m dice in n trials. (sim-dice-product 3 10) → (30 180 60 16 4 20 15 48 8 60).
(mu (sim-dice-product 2 1000)) returned 12.334 as an estimate for μ = 12.25.
(mu (sim-dice-product 3 1000)) returned 42.773 as an estimate for μ = 42.875.
(mu (sim-dice-product 4 1000)) returned 151.404 as an estimate for μ = 150.0625.
EXAMPLE 3.2
RV X has the discrete uniform distribution on the integers 5 ≤ x ≤ 12. Find a) E(X), b) V(X) and σ, c) P(μ − σ ≤ X ≤ μ + σ).
Solution
a) E(X) = (a + b)/2 = (5 + 12)/2 = 8.5 = (5 + 6 + 7 + 8 + 9 + 10 + 11 + 12)/8.
b) V(X) = (n² − 1)/12 = (8² − 1)/12 = 5.25 = σ² ⇒ σ ≈ 2.29, where there are 8 integers in the closed interval [5, 12].
c) P(μ − σ ≤ X ≤ μ + σ) = P(8.5 − 2.29 ≤ X ≤ 8.5 + 2.29) = P(6.21 ≤ X ≤ 10.79) = P(7 ≤ X ≤ 10) = 4/8, since X is between 7 and 10 inclusive with P(X = x) = 1/8.
The commands:
(cdiscrete-uniform a b x) returns P(X ≤ x) on [a, b]. (cdiscrete-uniform 5 12 7) → 3/8.
(cdiscrete-uniform-a-b a b x1 x2) returns P(x1 ≤ X ≤ x2). (cdiscrete-uniform-a-b 5 12 7 10) → 1/2.
(sim-d-uniform a b n) returns n random samples from the discrete uniform on [a, b]; (sim-d-uniform 5 12 10) returned (8 6 7 5 12 7 12 9 5 9).
(mu (sim-d-uniform a b n)) returns an estimate for the expected value from the discrete uniform on [a, b]; (mu (sim-d-uniform 5 12 100)) returned 8.55 vs. the theoretical 8.5.
(var list) returns the population variance of a list of numbers; (var (from-a-to-b 5 12)) returns 5.25.
(svar (sim-d-uniform a b n)) returns a simulated value for the variance on the interval [a, b]; (svar (sim-d-uniform 5 12 100)) returned the sample variance value of 5.78 compared to the theoretical value 5.25.
(mu-var list) returns the mean and variance of list; (mu-var (upto 10)) returns (5.5 8.25).
EXAMPLE 3.3
Suppose RV X is discrete uniform on the integers 2 through 12 inclusive. a) Find E(X) and V(X). b) Find E(X) and V(X) where X is the outcome sum from rolling a fair pair of dice. Note the difference.
Solution
a) E(X) = (2 + 12)/2 = 7; V(X) = (11² − 1)/12 = 10.
b) Let X₁ and X₂ denote the respective outcomes. E(X) = E(X₁ + X₂) = 3.5 + 3.5 = 7;
V(X) = V(X₁ + X₂) = (6² − 1)/12 + (6² − 1)/12 = 35/6.
(mu-svar (sim-d-uniform 2 12 1000)) returned (6.999 10.097096) vs. (7 10).
(mu-svar (sim-dice-roll 1000)) returned (6.943 5.787535) vs. (7 5.83).
3.2  Bernoulli Distribution
For the Bernoulli distribution there are two events or states of interest: success and failure. These events can be designated yes/no, up/down, left/right, on/off, 1/0, true/false, defect/no defect, etc. The probability of "success" is denoted by p and of "failure" by q; however, which event is labeled "success" is arbitrary. The numerical designator for a success is 1 and for a failure is 0. The Bernoulli RV X is an indicator RV and its density function is given by

f(x) = p^x q^(1−x) for x = 0, 1;  P(X = 1) = p;  P(X = 0) = q;  and p + q = 1.

E(X) = 1 * p + 0 * q = p;  E(X²) = 1² * p + 0² * q = p;
V(X) = E(X²) − E²(X) = p − p² = p(1 − p) = pq.   (3-5)

Viewing the variance as a function of p, V(p) = p − p² gives V′(p) = 1 − 2p = 0 when p = 1/2, where V(1/2) = 1/4 is a maximum.

The Bernoulli RV as an indicator RV is a trial in performing experiments with the binomial, negative binomial, and geometric distributions. The typical (not necessarily fair) coin flip experiment is an example of a Bernoulli RV. The moment generating function and the first three moments (0 to 2) are given by
M(t) = E(e^tX) = q * e^(0t) + p * e^(1t) = q + pe^t. Notice the 0th moment M(0) = 1.
M′(t) = pe^t, from which the 1st moment M′(0) = p = E(X).
M″(t) = pe^t, from which the 2nd moment M″(0) = p = E(X²).
The entropy for a Bernoulli RV is H = −p Log₂ p − (1 − p) Log₂(1 − p). Maximum entropy (maximum uncertainty, like maximum variance) occurs at p = 1/2. A Bernoulli RV X with p = q = 1/2 is also a discrete uniform RV, with V(X) = (2² − 1)/12 = 1/4.
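Both the variance pq and the entropy H peak at p = 1/2, which a short plain-Python sketch makes concrete (independent of the book's commands):

    # Bernoulli variance pq and entropy H = -p*log2(p) - q*log2(q) across p.
    from math import log2

    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        q = 1 - p
        H = -p * log2(p) - q * log2(q)
        print(p, p * q, round(H, 4))   # variance 0.25 and H = 1 bit at p = 0.5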
The command (Bernoulli p) returns either a 1 for success or 0 for failure, where p is the probability of success. (Bernoulli 19/20) probably returns a 1.
(sim-Bernoulli p n) returns a list of n Bernoulli trials.
(sim-Bernoulli 1/3 20) → (0 1 0 0 0 0 1 0 1 0 0 1 0 0 1 1 0 1 1 0)
3.3  Binomial Distribution
The binomial RV X indicates the number of successes in a sequence of Bernoulli trials. The interest is in the probability of exactly x successes in n trials. The assumptions for its use concern the stability of these trials:
1. a sequence of Bernoulli trials indicating success or failure;
2. the trials are conducted under identical experimental conditions;
3. the probability p remains constant from trial to trial, a stationary process; and
4. each trial is independent of every other trial.
The binomial RV X is a sum of Bernoulli RVs, X = X₁ + X₂ + . . . + Xₙ, for Bernoulli RVs Xi.
The 3-coin flip experiment in which X is the number of heads fits the binomial assumptions. Suppose we are interested in computing P(X = 2). One outcome sequence of flips is HHT, a canonical pattern with probability (1/2) * (1/2) * (1/2) = 1/8. How many such ways are there? 3!/(2!1!) = 3.
Observe that an equivalent question is how many ways 2 objects (heads) can occur from 3 flips, and recall the combination 3C2. Also note that the equivalent number of ways to obtain 1 object (tails) from 3 flips is 3C1. Thus

P(X = 2) = 3C2 p² q^(3−2).

To compute the cumulative probability P(X ≤ x), sum for each discrete value less than or equal to x. For example,

P(X ≤ 2) = P(X = 0) + P(X = 1) + P(X = 2)
         = 3C0(1/2)⁰(1/2)³ + 3C1(1/2)¹(1/2)² + 3C2(1/2)²(1/2)¹
         = 1/8 + 3/8 + 3/8 = 7/8.

Also,

P(X ≤ 2) = 1 − P(X = 3) = 1 − (1/2)³ = 1 − 1/8 = 7/8.
(3–6)
where X is the number of successes, n is the number of Bernoulli trials, and p is the probability of success. k
The cumulative probability P ( X £ k) =
n
 ÊË xˆ¯ p
x
qn-x .
(3–7)
x =0
The sum of the probabilities for x = 0 to n equals one because Ê nˆ p0 q n + Ê nˆ p1q n -1 + . . . + Ê nˆ pn = Ë 0¯ Ë 1¯ Ë n¯
n
n
 ÊË x ˆ¯ p q x
n -x
= ( p + q) n = 1n. (3–8)
x =0
Since a binomial RV X is a sum of Bernoulli RVs Xi, that is, X = X1 + X 2 + . . . + X n , in which each Xi is a Bernoulli (indicator) RV, we have E(Xi) = p and V(Xi) = pq. For binomial RV X, E( X ) = E( X1 + X 2 + . . . + X n ) = p+ p+ ...+ p = np.
V(X) = V(X₁ + X₂ + . . . + Xₙ) = pq + pq + . . . + pq = npq.

Notice the use of assumption 4, independence of trials, in computing the variance. The moment generating function is given by

M(t) = E(e^tX) = E(e^(tΣXi)) = E(e^(tX₁) * e^(tX₂) * . . . * e^(tXₙ)) = (pe^t + q) * (pe^t + q) * . . . * (pe^t + q).

Hence, M(t) = (pe^t + q)ⁿ. Observe M(0) = (p + q)ⁿ = 1;

M′(t) = npe^t(pe^t + q)^(n−1), from which M′(0) = E(X) = np.
M″(t) = npe^t(n − 1)(pe^t + q)^(n−2)(pe^t) + npe^t(pe^t + q)^(n−1).
M″(0) = E(X²) = np²(n − 1) + np.
V(X) = E(X²) − E²(X) = np²(n − 1) + np − n²p² = npq.

The binomial entropy H = −np Log₂ p − n(1 − p) Log₂(1 − p).
a) Create the density function for the binomial RV X being the number of heads in 5 tosses of a fair coin. Compute b) the expected number of heads; c) P(X £ 3); d) P(m - s £ X £ m + s). Solution X f(X)
0
1
2
3
4
5
0.0313
0.1563
0.3125
0.3125
0.1563
0.0313
a) Using (binomial n p x) for x = 0, 1, . . . , 5, we have x
5 Ê 1ˆ Ê 1ˆ P ( X = x ) = ÊË ˆ¯ x Ë 2¯ Ë 2¯
5- x
.
For example P(X = 3) = 5C3 * 0.53 * 0.52 = 0.3125 = (binomial 5 1/2 3). Observe the symmetry in the binomial coefficients of the distribution since p = q for a fair coin flip. See Figure 3.1b. b) Since each flip is a Bernoulli indicator RV with E(Xi) = p, E( X ) = np = 5 * 1 / 2 = 2.5 head. V ( X ) = npq = 5 / 4; s = 5 / 2. The expected value can be verified by summing the products of xi * f(xi).
P369463-Ch003.qxd 9/2/05 11:12 AM Page 167
3.3 Binomial Distribution
167
c) P ( X £ 3) = 0.0313 + 0.1563 + 0.3125 + 0.3125 = 0.8125 = 1 - P ( X = 4) - P ( X = 5) = 1 - 0.1563 - 0.0313. d) P ( m - s £ X £ m + s ) = P (2.5 - 1.12 £ X £ 2.5 + 1.12) = = = = =
P (1.38 £ X £ 3.62) P (2 £ X £ 3) P ( X = 2) + P ( X = 3) 0.3125 + 0.3125 0.625.
The command (binomial n p x) returns the probability of exactly x successes in n Bernoulli trials. For example, (binomial 5 1/2 3) = 0.3125 = P(X = 3). The command (binomial-density n p) returns the binomial density function. For example, (binomial-density 5 1/2) returns x
P(X = x)
0 1 2 3 4 5
0.03125 0.15625 0.31250 0.31250 0.15625 0.03125
The command (cbinomial n p x) returns the cumulative probability for x successes. For example, (cbinomial 5 1/2 3) returns 0.8125 = P(X £ 3). The command (cbinomial-a-b n p a b) returns P(a £ X £ b). For example, (cbinomial-a-b 5 1/2 2 3) returns 0.625. EXAMPLE 3.5
a) Create the binomial density function f(x) for n = 5, p = 0.7 using the command (binomial-density 5 0.7), b) Compute E(X), c) Find P(X £ 3), d) Find P(m - s £ X £ m + s). Solution a) (binomial-density 5 0.7) returns X f(X)
0
1
2
3
4
5
0.00243
0.02835
0.1323
0.3087
0.36015
0.16807
P369463-Ch003.qxd 9/2/05 11:12 AM Page 168
168
Chapter 3 Special Discrete Distributions
Notice that because of the difference in p and q, the distribution is skewed left (direction of longer tail) from symmetry. b) E(X) = np = 5 * 0.7 = 3.5 heads. Notice that most of the probability is centered around the expected value 3.5. See Figure 3.1a, b, and c for (binomial 10 p X). c) P(X £ 3) = 0.00243 + 0.02835 + 0.1323 + 0.3097 = 0.47178. (cbinomial 5 0.7 3) returns 0.47178. d) P ( m - s £ X £ m + s ) = P (3.5 - 1.025 £ X £ 3.5 + 1.025 ) = P (2.475 £ X £ 4.525 ) = P (3 £ X £ 4) = 0.3087 + 0.36015 = 0.66885. (cbinomial-a-b 5 0.7 3 4) returns 0.66885.
0.3 0.25 0.2 0.15 0.1 0.05 0 0 1 2 3 4 5 6 7 8 9 10 a. Binomial(10, 0.7, X ) 0.3 0.25 0.2 0.15 0.1 0.05 0 b. Binomial(10, 0.5, X ) 0.3 0.25 0.2 0.15 0.1 0.05 0 0 1 2 3
4 5
6 7
8
c. Binomial(10, 0.3, X )
Figure 3.1
Binomial Distributions
9 10
P369463-Ch003.qxd 9/2/05 11:12 AM Page 169
3.3 Binomial Distribution
EXAMPLE 3.6
169
Compute the probability of a) exactly 3 S9 in 5 rolls of a pair of fair dice where S9 is the event that the sum of the dice is 9; b) at least 1 S9 in 5 rolls of a pair of fair dice; c) at least 3 S9 in tossing a pair of fair dice 5 times. a) With P(S9) = 4/36 = 1/9, we have
Solution
3
2
5 Ê 1ˆ Ê 8ˆ (binomial n p x) for P ( X = 3; n = 5, p = 1/ 9) = ÊË ˆ¯ = 0.010838. 3 Ë 9¯ Ë 9¯ (binomial 5 1/9 3) returns 0.0108384. 0
5
5 Ê 1ˆ Ê 8ˆ b) P ( X ≥ 1) = 1 - P ( X = 0) = 1 - ÊË ˆ¯ = 1 - 0.5549 = 0.4451. 0 Ë 9¯ Ë 9¯ (cbinomial-a-b 5 1/9 1 5) returns 0.4450710. c) P(S9) = 1/9. We seek at least 3 occurrences with p = 1/9. (cbinomial-a-b 5 1/9 3 5) = 0.01153, that is, P ( X ≥ 3) =
P ( X = 3) 3
+ 2
P ( X = 4) 4
+ 1
P ( X = 5) 5
0
8 8 8 5 1 5 1 5 1 = ÊË ˆ¯ Ê ˆ Ê ˆ + ÊË ˆ¯ Ê ˆ Ê ˆ + ÊË ˆ¯ Ê ˆ Ê ˆ Ë ¯ Ë ¯ Ë ¯ Ë ¯ Ë ¯ Ë 3 9 4 9 5 9 9 9 9¯ = 0.01084 + 6.774E-4 + 1.694E-5 = 0.01153. Alternately, P ( X ≥ 3) = 1 - P ( X £ 2) = 1 - (cbinomial 5 1/9 2) = 1 - 0.9885 = 0.0115. (cbinomial 5 1/9 2) calculates the cumulative binomial from x = 0 to x = 2. EXAMPLE 3.7
A fair coin is flipped 100 times, resulting in 60 heads. How many standard deviations is 60 above the expected number of heads? Solution n = 100, p = 1/2, s 2 = npq = 100 * 1/2 * 1/2 = 25, implying s = 5 and 60 is (60 - 50)/ 5 = 2 standard deviations above the mean of 50.
EXAMPLE 3.8
Compute the probability of exactly 3 fours occurring in the experiment of rolling 5 fair dice. Solution Method I: Using a counting argument, the 3 dice bearing a four can be selected from the 5 dice in 5C3 ways, and each of the two remaining dice can bear any of the other 5 numbers other than four in 5 ways, with the total ways the 5 dice can fall given by 65.
P369463-Ch003.qxd 9/2/05 11:12 AM Page 170
170
Chapter 3 Special Discrete Distributions
P ( X = 3 fours) =
Ê 5ˆ Ê 5ˆ Ê 5ˆ Ë 3¯ Ë 1¯ Ë 1¯
=
6
250
= 0.03215.
7776
Method II: Use the binomial command with X = 3; n = 5, p = 1/6. 3
2
5 5 1 (binomial 5 1/6 3) = ÊË ˆ¯ Ê ˆ Ê ˆ , Ë ¯ Ë 3 6 6¯ which can be seen to be equivalent to the result from Method I, 10 * 25 65 EXAMPLE 3.9
.
Find the mean and variance of RV Y =
X
where RV X is binomial with
n parameters n and p. Solution
E(Y ) =
E( X ) n
=
np
= p;
V (Y ) = V
n
RV Y is the proportion of successes in n trials. EXAMPLE 3.10
Ê X ˆ npq pq = = . Ë n¯ n2 n
The game chuck-a-luck is usually played at carnivals. You may bet $n and select 1 of the 6 faces to occur on three fair dice. You win $n for each die bearing your selection. Compute the probability of winning to determine if chuck-a-luck is a fair game. Solution There are 6 * 6 * 6 = 216 ways that 3 dice can occur, with 1/6 probability of success on each die. Method I: Inclusion/Exclusion Principle Suppose we select the outcome x. Let Di denote outcome success (x) on die i for i = 1, 2, 3. P ( win) = P ( D1 + D2 + D3 ) = P ( D1 ) + P ( D2 ) + P ( D3 ) - P ( D1 D2 ) - P ( D1 D3 ) - P ( D2 D3 ) + P ( D1 D2 D3 ) 1 1 1 1 1 = + + - 3Ê ˆ + Ë ¯ 6 6 6 36 216 =
91
.
216 Method II: Complimentary Probability
P369463-Ch003.qxd 9/2/05 11:12 AM Page 171
3.3 Binomial Distribution
171
P ( win ) = 1 - P ( Lose) = 1 -
5 5 5 216 - 125 91 * * = = . 6 6 6 216 216 91 .
(- 1 (binomial 3 1/6 0)) returns 0.4212962 or
216 Method III: Binomial Distribution X P(X)
0
1
2
125
75
15
3 1
216
216
216
216
P ( win ) = (cbinomial-a-b 3 1/6 3) x
3
3 Ê 1ˆ Ê 5ˆ =  ÊË ˆ¯ Ë 6¯ Ë 6¯ x =1 x
3- x
=
75 216
+
15 216
+
1 216
=
91
.
216
Method IV: Counting Argument—as to where the first winner occurs: If on the first die, then 1 * 6 * 6 = 36 ways; if on the 2nd die, then 5 * 1 * 6 = 30; if on the 3rd die, then 5 * 5 * 1 = 25, again resulting in 91 favorable ways. Each time the game is played a player’s net expected gain is P(win) - P(lose), n * 75 + 2 n * 15 + 3 n * 1 216
-
125 n
=
216
The odds of winning are expressed as
-17 n
p
:1 =
q Thus, chuck-a-luck is not a fair game.
= -7.87 cents on $n bet.
216 91
:1, or 0.728 to 1 for.
125
The command (sim-chuck-a-luck die-num n) returns n simulated runs showing the number of times winning (with the money won in parentheses), the number of times losing, the probability for winning, and the number of times the selected die-num occurred from each of n games of chuck-a-luck. For example, one simulation run of (sim-chucka-luck 3 100) printed (WIN 101 120 101
= 0 0 0
41 ($46) LOSE = 59 p = 0.41 (2 1 0 0 0 0 0 0 0 1 0 0 0 1 0 1121101121101010010100101000010 0012010210110010000000000000110 0 0 1 1 0 0 0 0 1 0 1))
P369463-Ch003.qxd 9/2/05 11:12 AM Page 172
172
Chapter 3 Special Discrete Distributions
The player’s chosen number was 3 and the game was played 100 times, resulting in winning 41 times (winning $46) and losing 59 times (losing $59), for a simulated probability of winning 41 41 + 59
= 0.41 ª
91
.
216
(cbinomial-a-b 3 1/6 1 3) returns 0.42129, the theoretical probability of winning 1, 2, or 3 times.
EXAMPLE 3.11
Find a) E(X) for binomial RV X with parameters n = 6, p = 0.5, and b) show that P[X = E(X)] π 0.5. c) Find p when P(X = 3 | n = 6) = 0.2. Solution a) E(X) = np = 6 * 0.5 = 3; 3
3
5 6 Ê 1ˆ Ê 1ˆ = = 0.3125 π 0.5. b) P ( X = 3) = ÊË ˆ¯ Ë ¯ Ë ¯ 3 2 2 16 c)
EXAMPLE 3.12
C3 p3 (1 - p)3 - 0.2 fi p3 (1 - p)3 = 0.01; (0.01)1 3 fi 1- p = or p2 - p + 0.215443469 = 0. p (quadratic 1 - 1 0.215443469) returns the two roots for p as (0.685894 0.314106) (binomial 6 0.685894 3) Æ 0.2 (binomial 6 0.314105 3) Æ 0.2. 6
Let binomial RV X have parameters n = 3 and p = 1/2, 1 and let Y = . Find density fY. 3+ x Solution
First note that the density fX(x) is given by X
0
1
2
3
fX(x)
1/8
3/8
3/8
1/8
Y
1/3
1/4
1/5
1/6
fy(y)
1/8
3/8
3/8
1/8
and density fY(y) by
.
P369463-Ch003.qxd 9/2/05 11:12 AM Page 173
3.3 Binomial Distribution
Substituting x =
173
1 - 3y
into the binomial form for X yields
y x
3 Ê 1ˆ Ê 1ˆ fx( x ) = ÊË ˆ¯ x Ë 2¯ Ë 2¯
3- x
Ê 3 ˆ 3 3 Á 1 - 3y ˜ Ê 1 ˆ 1 3ˆ Ê ˆ Ê =Ë ¯ , producing fY ( y ) = Á ˜Ë ¯ . x Ë 2¯ Á y ˜ 2 Ë ¯
As a check, we see that 1 3 3 1 fY Ê ˆ = ÊË ˆ¯ Ê ˆ = . Ë 5¯ Ë ¯ 2 8 8 EXAMPLE 3.13
Consider the 4 ways that 3 boxes of two different detergents A and B can be bought {(3 0), (0 3), (1 2), (2 1)}, with the ordered pairs representing (A, B). Suppose that the relative market shares of the products are P(A) = 1/4 and P(B) = 3/4. Find the probability of each way. 3
0
3!
Ê 1ˆ Ê 3ˆ P (3, 0) = * = 1 64 ; 3! 0! Ë 4 ¯ Ë 4 ¯ 1
0
Ê 1ˆ Ê 3ˆ P (0, 3) = * = 27 64 ; 0! 3! Ë 4 ¯ Ë 4 ¯
2
2
3!
Ê 1ˆ Ê 3ˆ P (1, 2) = * = 27 64 ; 1! 2! Ë 4 ¯ Ë 4 ¯
3
3!
1
3!
Ê 1ˆ Ê 3ˆ = 9 64 . P (2, 1) = * 2!1! Ë 4 ¯ Ë 4 ¯
Notice that (binomial n p x) = (binomial (n q (-n x)). For example, (binomial 3 1/4 1) Æ 27/64 = (binomial 3 3/4 2). Notice also that the sum of the probabilities of the four cases is 1. EXAMPLE 3.14
Which target has the higher probability of being hit: target A fired at twice with p = 1/2 or target B fired at thrice with p = 1/3? A
0
1
2
B
0
1
2
3
P(A)
0.25
0.5
0.25
P(B)
0.30
0.44
0.22
0.04
Using entropy, (entropy ¢(1/4 1/2/ 1/4) = HA = 1.5; (entropy ¢(0.3 0.44 0.22 0.04)) = HB ª 1.7. Using probability, (cbinomial-a-b 2 1/2 1 2) Æ 0.75 (cbinomial-a-b 3 1/3 1 3) Æ 0.704. Target A is more likely to be hit (less entropy, higher probability, more certainty).
P369463-Ch003.qxd 9/2/05 11:12 AM Page 174
Chapter 3 Special Discrete Distributions
174
3.4
Multinomial Distribution Closely akin to the binomial is the multinomial distribution. Rather than the 2 possible outcomes present in the binomial, the multinomial has k possible outcomes: x1, x2, . . . , xk for the corresponding Bernoulli RVs Xi, with respective probabilities p1, p2, . . . , pk, where Spi = 1 and Sxi = n independent trials. The multinomial density is given by f ( x1, x2 , . . . x k ) =
n! x1! x2 ! . . . x k !
p1x1 p2x2 . . . pkxk
(3–9)
for nonnegative integers xi. The sum of the probabilities is 1 since Spi = 1 and n!
 x !x ! . . . x !p 1
ÂX
i
2
x1 1
p2x2 . . . pkxk = ( p1 + p2 + . . . pk ) n .
k
= n; E( X i ) = npi ; V ( X i ) = npi q i ; C( X i , X j ) = - npi p j for i π j.
Observe that p1x1p2x2 . . . pkxk is the canonical pattern and
n!
is the x1! x2 ! . . . x k ! number of such patterns and that sampling is with replacement. The negative value of the covariance makes sense since the more successes in one category, the fewer successes in the others. For k = 2, the multinomial density becomes the binomial density. EXAMPLE 3.15
In a box are 12 coins of which 6 are fair, 2 are 2-headed, and 4 are 2-tailed. Eight coins are randomly selected from the box and flipped. Compute the probability that the 8 coins show 4 fair, 1 2-headed, and 3 2-tailed. Solution
Using equation 3–9 with n = 8, x1 = 4 p1 = 6 /12,
x2 = 1, p2 = 2 /12,
x3 = 3, p3 = 4 /12,
we find that P ( X1 = 4, X 2 = 1, X 3 = 3) =
EXAMPLE 3.16
8!
(1 / 2)4 (1 / 6)1 (1 / 3)3 = 0.1080.
4!1! 3!
In rolling a pair of fair dice 6 times, compute the probability of a sum of a 6 or 7 or 8 twice, a sum less than 6 three times, and a sum greater than 8 once. b) Then write out the expansion of (x + y + z)4. Solution
P ( S6 + S7 + S8 ) = 16 / 36; P (Sum < 6) = 10 / 36; P (Sum > 8) = 10 / 35.
P369463-Ch003.qxd 9/2/05 11:12 AM Page 175
3.5 Hypergeometric Distribution
175 3
3
1
Ê 6! ˆ Ê 16 ˆ Ê 10 ˆ Ê 10 ˆ f (2, 3, 1; 16 36, 10 36, 10 36) = Ë 2! 3!1!¯ Ë 36 ¯ Ë 36 ¯ Ë 36 ¯ = 12500 177147 ª 0.0706. b) ( x + y + z )4 = x 4 + y 4 + z 4 + 4 x 3 y + 4 x 3 z + 4y 3 x + 4y 3 z + 4 z 3 x + 4 z 3 y + 6 x 2 y 2 + 6 x 2 z 2 + 6y 2 z 2 + 12x 2 yz + 12xy 2 z + 12xyz 2 . For example, the coefficient for y2z2 is calculated from
4!
= 6 and for
0! 2! 2! x 2 yz from
4!
= 12.
2!1!1!
The command (multinomial n x-list p-list) returns the multinomial probability where x-list is the list of xi and p-list is a list of pi. For example, (multinomial 6 ¢(2 3 1) ¢(16/36 10/36 10/36)) returns 12500/ 177147.
3.5
Hypergeometric Distribution Whereas the binomial RV was a sequence of independent trials, that is, sampling with replacement, the hypergeometric RV is a sequence of trials characterized as sampling without replacement. There are still only two outcomes, success and failure, for the hypergeometric RV. Suppose we have an urn with 5 white marbles, 7 blue marbles, and 8 red marbles. The experiment is that we randomly select n = 5 marbles. We seek the probability of exactly 2 red marbles in this sample size of 5. Once the selection of interest is designated red for success, the other marbles combined become nonred for failure. How many ways can we select 2 red marbles from the red population of 8 red marbles? 8
C2 = 28.
How many ways can we choose the remaining 3 samples from a population of 12 nonred marbles? 12
C3 = 220.
How many ways can we select 5 marbles from the total population of 20 marbles? 20
C5 = 15504.
P369463-Ch003.qxd 9/2/05 11:12 AM Page 176
176
Chapter 3 Special Discrete Distributions
Thus,
P ( X = 2 red) =
Ê 8ˆ Ê 12ˆ Ë 2¯ Ë 3 ¯
6160
=
Ê 20ˆ Ë 3¯
= 0.3973.
15504
The hypergeometric RV X with parameters A successes, B failures, and n samples to get exactly x successes is given by the density
f ( A, B, n; x ) = P ( X = x ) =
Ê Aˆ Ê B ˆ Ë x ¯ Ë n - x¯ Ê A + Bˆ Ë n ¯
, max{0, n - B} £ X £ min{ A, n}. (3–10)
Observe that X, the number of successes, can never be greater than the min {A, n} or less than the max {0, n - B), by considering the special case B = 0 and the case A = 0. Finding the hypergeometric expected value and variance provides a partial review of Chapter 2 and the combinatorics of Chapter 1. To find the expected value, we use indicator RVs Xi, which equal 1 if the ith item is an element of A (success) and 0 if the ith item is an element of B (failure). Then the number of successes X = SXi, P ( X i = 1) = p =
A A+ B
and P ( X i = 0) = q =
E( X ) = E( SX i ) = SE( X i ) =
nA A+ B
B A+ B
; and (3–11)
.
To find the variance, V ( X i ) = pq =
A A+ B
*
B A+ B
=
AB ( A + B )2
.
Recall that for random variables X and Y, V ( X + Y ) = V ( X ) + V (Y ) + 2C ( X , Y ). For n random variables X1, X2, . . . , Xn, where RV X = SXi, V ( X ) = V ( X1 + X 2 + . . . + X n ) nAB n = + 2ÊË ˆ¯ C ( X i , X j ), 2 2 ( A + B) where the covariances are taken pairwise for all i < j. We now seek C(Xi, Xj). E( X i X j ) = 1 * P ( X i X j = 1) + 0 * P ( X i X j = 0) = P ( X i X j = 1).
P369463-Ch003.qxd 9/2/05 11:12 AM Page 177
3.5 Hypergeometric Distribution
177
0.4 0.3 0.2 0.1 0
a. Hypergeometric A = 20, B = 10, n = 10
Figure 3.2
10
8
6
4
2
0
10
8
6
4
2
0
0.3 0.2 0.1 0
b. Binomial (n = 10, p = 2/3, X )
Hypergeometric vs. Binomial Now P ( X i X j = 1) = P ( X i = 1) * P ( X j = 1 X i = 1) =
A A+ B
*
A -1 A + B -1
since Xi and Xj are not independent. The covariance C ( X i , X j ) = =
A A+ B
*
A -1
A + B -1 - AB
-
( A + B)2 ( A + B - 1)
A2 ( A + B )2
,
,
and finally V( X ) =
With N = A + B, p =
A
nAB ( A + B)
, and q =
B
2
*
A+ B - n A + B -1
, V ( X ) = npq *
.
(3–12)
N-n
and the npq term N N -1 N is the variance of a binomial RV. For n = 1, V(X) = pq, the variance of a Bernoulli trial, and for A or B = 0, V(X) = 0, implying that X is a constant, that is, all red marbles or all nonred marbles. For n > 1, the hypergeometric variance is less than the binomial variance. Deriving the moment generating function for the hypergeometric RV is relatively complex. Figure 3.2 shows the hypergeometric distribution for x = 0, 1, . . . , 10 with n = 10, A = 20, and B = 10, along with the binomial approximation for p = 20/30. EXAMPLE 3.17
A job plant can pass inspection if there are no defects in 3 randomly chosen samples from a lot size of 20. a) Compute the probability of the plant passing inspection if there are 2 defects in the lot. b) Compute the probability of at least one defect. c) Compute the probability of at most one defect. d) Compute the probability that the second item chosen is defective.
P369463-Ch003.qxd 9/2/05 11:12 AM Page 178
178
Chapter 3 Special Discrete Distributions
Solution a) A = 2 defects; B = 18 nondefects; n = 3; x = 0.
P ( X = 0) =
Ê 2ˆ Ê 18ˆ Ë 0¯ Ë 3 ¯ Ê 20ˆ Ë 3¯
b) P ( X ≥ 1) = P ( X = 1) + P ( X = 2) =
=
816
= 0.7158.
1140
Ê 2ˆ Ê18ˆ + Ê 2ˆ Ê18ˆ Ë 1¯ Ë 2 ¯ Ë 2¯ Ë 1 ¯ Ê 20ˆ Ë 3¯
= 0.2842 = 1 - 0.7158.
c) P ( X £ 1) = P ( X = 0) + P ( X = 1) =
Ê 2ˆ Ê 18ˆ + Ê 2ˆ Ê 18ˆ Ë 0¯ Ë 3 ¯ Ë 1¯ Ë 2 ¯
= 0.98421.
Ê 20ˆ Ë 3¯ d) Let D denote defect and N denote no defect. The tree diagram indicates, the probability of the second item being defective is 2/380 + 36/380 = 38/380 = 0.1.
Defect 1/19
DD
Defect 2/20
2/380
No Defect 18/20 ND Defect 2/19
36/380
P369463-Ch003.qxd 9/2/05 11:12 AM Page 179
3.5 Hypergeometric Distribution
179
The commands (hyperg A B n x) returns P(X = x), for example, (hyperg 2 18 3 0) returns 0.7158; (hyper-density A B n) returns the density function for X = 0 to n. (hyper-density 2 18 3) returns x
P(X = x)
0 1 2 3
0.71579 0.26842 0.01579 0.00000
(chyperg A B n x) returns P(X £ x), for example, (chyperg 2 18 3 1) returns 0.9842. (chyperg-a-b A B n x1 x2) returns P(x1 £ x £ x2), for example, (chyperg-a-b 2 18 3 1 2) Æ 0.2842 = P(1 < x £ 2). The command (sim-hyperg A B n m) returns m samples from the hypergeometric. For example, (sim-hyperg 60 40 30 10) may return (19 22 18 18 22 19 17 18 17 18), from which the average is 18.8 ª nA/(A + B) = 18.
EXAMPLE 3.18
In drawing 5 cards from a deck of 52 cards, compute the probability of the following: a) b) c) d)
a heart flush, at most 3 spades, 3 kings, 3 of same rank.
Solution a) Let RV X be the number of hearts, A = 13, B = 39, n = 5, x = 5, which results in
P ( X = 5 hearts) =
Ê13ˆ Ê 39ˆ Ë 5 ¯Ë 0 ¯
= 0.0004952.
Ê 52ˆ Ë 5¯ b) Let RV X be the number of spades, A = 13, B = 39, n = 5, x £ 3. 13 39 13 39 13 39 13 39 P ( X £ 3 spades) = ÊË ˆ¯ ÊË ˆ¯ + ÊË ˆ¯ ÊË ˆ¯ + ÊË ˆ¯ ÊË ˆ¯ + ÊË ˆ¯ ÊË ˆ¯ 0 5 1 4 2 3 3 2 = 0.9888.
Ê 52ˆ Ë 5¯
c) Let RV X be the number of kings, A = 4 kings, B = 48, n = 5, x = 3.
P369463-Ch003.qxd 9/2/05 11:12 AM Page 180
Chapter 3 Special Discrete Distributions
180
4 48 P ( X = 3 kings) = ÊË ˆ¯ ÊË ˆ¯ 3 2
Ê 52ˆ = 0.001736. Ë 5¯
d) Let RV X be the number of a specified rank, A = 4, B = 48, n = 5, x = 3. Since P(3 kings) = 0.001736, P(3 of same rank) is 13 * 0.001736 = 0.022569. EXAMPLE 3.19
In a drawer are 4 black socks, 6 gray socks, and 10 white socks. One reaches in and randomly grabs 3 socks. Compute the probability of a matching pair. P(BB) or P(BBB) or P(GG) or P(GGG) or P(WW) or P(WWW)
Solution
ÈÊ 4ˆ * Ê16ˆ + Ê 4ˆ + Ê 6ˆ * Ê14ˆ + Ê 6ˆ + Ê10ˆ * Ê10ˆ + Ê10ˆ ˘ ÎÍË 2¯ Ë 1 ¯ Ë 3¯ Ë 2¯ Ë 1 ¯ Ë 3¯ Ë 2 ¯ Ë 1 ¯ Ë 3 ¯ ˚˙ =
900
Ê 20ˆ Ë 3¯
= 0.7895.
1140 (+ (hyperg 4 16 3 2) (hyperg 4 16 3 3) (hyperg 6 14 3 3) (hyperg 6 14 3 2) (hyperg 10 10 3 3) (hyperg 10 10 3 2)) Æ 0.78947. Alternately, 4 6 10 P ( match ) = 1 - P ( no match ) = 1 - ÊË ˆ¯ ÊË ˆ¯ ÊË ˆ¯ 1 1 1 240 900 = 1= . 1140 1140
3.6
Ê 20ˆ Ë 3¯
Geometric Distribution Often we are interested in knowing the probability of the first success occurring at a specified Bernoulli trial. For example, how many times must a coin be flipped before the first Head occurs? The geometric RV X denotes the number of Bernoulli trials for the first success to occur for x = 1, 2, 3, . . . , •. The sample space is countably infinite. However, we know that x - 1 failures must occur before the first success. Thus the density for a geometric RV X is f ( x ) = q x -1 p for x, a positive integer. Observe that
(3–13)
P369463-Ch003.qxd 9/2/05 11:12 AM Page 181
3.6 Geometric Distribution
181
34
31
28
25
22
19
16
13
10
4
1
Figure 3.3
7
f (x )
0.16 0.14 0.12 0.1 0.08 0.06 0.04 0.02 0
x
Geometric Density Function for p = 1/7.
•
 pq
x -1
= p + pq + pq 2 + . . . + pq n + . . .
x =1
= p(1 + q + q 2 + . . . + q n + . . .) 1 = p 1- q = 1. Notice the presence of the Bernoulli trials. It is assumed that each trial is independent of all the other trials and that the probability p remains fixed throughout the trials. A graph of the density function for the geometric RV X with p = 1/7 is shown in Figure 3.3. EXAMPLE 3.20
In a fair die roll experiment, a) compute the probability that the first occurrence of the event “four” occurs on the 3rd roll, and b) show that the probability of the first success of a geometric RV is the average of having exactly 1 success in x Bernoulli trials. Solution 2
5
5 1 25 a) With q = and p = , P ( X = 3) = q p = Ê ˆ = . Ë ¯ 6 6 6 6 216
b) P ( X = x ) = q x -1 p =
1
2
Ê xˆ p1q x -1 Ë 1¯ x
= q x -1 p.
P369463-Ch003.qxd 9/2/05 11:12 AM Page 182
182
Chapter 3 Special Discrete Distributions
The expected value of the geometric RV X is derived as •
 xq
E( X ) =
x -1
 xq q
x =1
p
=
•
p
p=
x
*
q
(1 - q)2
=
1
(q + 2q 2 + 3q 3 + 4q 4 + . . .)
q
x =1
q
p
=
.
p
Dividing q by 1 - 2q + q2 produces the series. E(X) can also be derived from the moment generating function. •
M (t) =
 etx q x -1 p = x =1
•
p
 (qe ) q
t x
=
x =1
p Ê qet ˆ pet = for q Ë 1 - qet ¯ 1 - qet
qet < 1 fi t < - Ln q. pet (1 - qet ) + pqe2 t pet = . Note M (0) = 1; M ¢(t) = (1 - qet )2 (1 - qet )2 p 1 M ¢(0) = = = E( X ). 2 p p pet (1 - qet )2 + 2 pqe2 t (1 - qet )
M ¢¢(t) =
(1 - qet )4
M ¢¢(0) = V( X ) =
1+ q
p2 1+ q 2
p EXAMPLE 3.21
=
pe4 (1 + qet ) (1 - qet )3
.
+ E( X 2 ), from which
-
1 2
p
=
q p2
.
a) What is the expected number of trials in rolling a fair die until a specified face occurs? b) What is the expected number of trials in rolling a fair die until all six faces occur? c) If one sampled 200 times with replacement from the integers 1 to 100, how many would one expect to miss? Solution a) Rolling a fair die until a specified face occurs is a geometric RV X. E( X ) =
1 p
=
1
= 6 times.
1/ 6
b) Let Xi = the number of rolls for the ith new face to occur after the (ith 1) face occurred. Then X = X(1) + X(2) + . . . + X(6), where the subscripts are ordinal and not necessarily the die value. The probability of the first new face to occur is 6/6, of the second new face 5/6, third face 4/6, etc. The reciprocals of the probabilities are the E(Xi). The expected value E(X) = E(X1) + E(X2) + E(X3) + E(X4) + E(X5) + E(X6)
P369463-Ch003.qxd 9/2/05 11:12 AM Page 183
3.6 Geometric Distribution
=
6 6
+
6 5
183
+
6
+
4
6 3
+
6 2
+
6 1
= 6*
147
= 14.7 rolls.
60
c) Let RV X = X1 + X2 + . . . + X100 for indicator RVs Xi where X is the number of integers not selected. P(Xi = 1) = (n - 1)/n on any one trial or [(n - 1)/n]m where m is the total number of trials. Thus for n = 100 and m = 200, 200
the number of integers not selected = 100 * [99 /100] = 100 * 0.134 = 13.4. Note that
Ê n - 1ˆ Ë n ¯
n
n
1 = Ê 1 - ˆ ª e -1 as n Æ • and ª e -2 for m = 2 n. Ë n¯
The command (sim-%-missing m n) returns the simulated percent of the missing items along with the list of missing items. (sim-%-missing 100 200) returned (14% (2 4 15 32 47 52 57 58 62 73 76 81 82 92)) where 0.14 ª 1/e2
EXAMPLE 3.22
Compute the probability of the event Heads on the 10th trial of fair coins flip, given that Heads has not occurred by the 7th trial with P(Heads) = 1/4. Solution
Using P(A | B) = P(AB)/P(B), we find that
P ( first H on 10 th trial no heads up to 7 th trial) = q 9 p / q 7 = q 2 p 2
Ê 3ˆ Ê 1ˆ Ê 9 ˆ = = . Ë 4 ¯ Ë 4 ¯ Ë 64 ¯ The geometric RV is said to be memoryless. EXAMPLE 3.23
A job plant makes chips at a 60% reliability rate. A test is devised to authenticate this rate. One hundred chips are selected for testing. If the first failure occurs before the 4th chip is tested, the test fails. Compute the probability of passing the test. Note that while the sampling is without replacement, the dependency is rather weak. Solution q is ª 0.4.
The probability of success p is ª 0.6 and probability of failure
P ( pass test ) = P ( no failures in first 3 chips) = 0.63 = 0.216. P ( failure before 4th chip) = 0.4 + 0.6 * 0.4 + 0.62 * 0.4 = 0.784 = 1 - 0.216. q pq ppq
P369463-Ch003.qxd 9/2/05 11:12 AM Page 184
184
Chapter 3 Special Discrete Distributions
The command (geometric p x) returns the probability of the first success occurring on the x trial. (geometric 3/5 3) returns 0.096. (geometric-density p n) returns the geometric density for P(X = x) from X = 0 to n. (geometric-density 1/2 3) returns X
P(X)
1 2 3
0.5 0.25 0.125
The command (cgeometric p x) returns the cumulative probability that the first success occurs at or before the xth trial. For example, (cgeometric 3/5 3) returns 0.936, the probability of success at or before the completion of the xth trial. The command (cgeometric-a-b 1/3 2 5) returns 0.535, P(2 £ x £ 5).
3.7
Negative Binomial Distribution The geometric RV is the variable number of Bernoulli trials to obtain the first success. The negative binomial RV is the variable number of Bernoulli trials to obtain a fixed number k of successes. The binomial RV is the exact number of successes in a fixed number of Bernoulli trials. Observe that the last outcome for the negative binomial is the final kth success, implying that k - 1 successes occurred in exactly x - 1 trials (a binomial RV). The density function for the negative binomial RV with parameters k as the specified number of successes and p as the probability of success is given by x - 1ˆ k x-k f ( x; k, p) = ÊË p q for x = k, k + 1, k + 2, . . . k - 1¯
(3–14)
Observe that Ê x - 1ˆ p k -1q x - k Ë k - 1¯ is a binomial RV with k - 1 successes in x - 1 trials, which when multiplied by p for the probability of success at the last independent Bernoulli trial becomes the negative binomial density. Also note that the geometric RV is equivalent to the negative binomial RV when k = 1, similar to the Bernoulli RV being a binomial RV when n = 1. The
P369463-Ch003.qxd 9/2/05 11:12 AM Page 185
3.7 Negative Binomial Distribution
185
Ê 1 - qˆ distribution gets it name from the expansion of Ë p p¯ -k 1 q Note that Ê - ˆ = 1. Ë p p¯ EXAMPLE 3.23
-k
for x = k, k + 1, . . .
Compute the probability that the third head occurs at the 5th trial in a repeated coin flip when the probability p of Heads (success) is 1/4. Solution There must be exactly 2 successes in 4 trials, followed by a success on the final, fifth trial. Thus 1ˆ 1ˆ Ê Ê P ( X = 5; k, p) = P 5; 3, = Binomial 2; 4, *p Ë ¯ Ë 4 4¯ 2
2
1 4 Ê 1ˆ Ê 3ˆ Ê 1ˆ Ê ˆ = ÊË ˆ¯ = 0.0527 = negbin 3 5 . Ë ¯ Ë ¯ Ë ¯ Ë ¯ 2 4 4 4 4 EXAMPLE 3.24
Compute the probability that a) the 5th child of a family is the 3rd son; b) there are at least 2 daughters when the 5th child is born; c) the 1st, 2nd, or 3rd son occurs at the birth of the 5th child. Solution 2
3
x - 1ˆ x - k k Ê 4ˆ Ê 1 ˆ Ê 1 ˆ a) P ( X = 5; k = 3, p = 1 / 2) = Ê Ë k - 1¯ q p = Ë 2¯ Ë 2 ¯ Ë 2 ¯ = 0.1875. b) With use of the cumulative negative binomial command for k = 2 to 5, (cnegbinomial-a-b 1/2 2 5 5) returns 0.46875, where k1 = 2, k2 = 5, x = 5. With use of each negative binomial command and adding, (+ (negbin (negbin (negbin (negbin
1/2 1/2 1/2 1/2
2 3 4 5
5) 5) 5) 5)) returns 0.46875.
c) P ( X = 1 or X = 2 or X = 3) 4
2
3
3
4 Ê 1ˆ Ê 1ˆ 4 Ê 1ˆ Ê 1ˆ 4 Ê 1ˆ Ê 1ˆ = ÊË ˆ¯ + ÊË ˆ¯ + ÊË ˆ¯ Ë ¯ Ë ¯ Ë ¯ Ë ¯ 0 2 2 1 2 2 Ë 2¯ Ë 2¯ 2 = 0.03125 + 0.125 + 0.1875 = 0.34375.
2
P369463-Ch003.qxd 9/2/05 11:12 AM Page 186
186
Chapter 3 Special Discrete Distributions
The command (negbin p k x) returns the probability of xth trials needed for k successes. For example (negbin 1/2 3 5) returns P(X = 5 trials, given k = 3 successes) Æ 0.1875. (NegBin-density p k n) returns P(X = x) for X = k to n. (NegBin-density 1/2 2 5) returns X
P(X)
2 3 4 5
0.25 0.25 0.1875 0.125
The expected value of the negative binomial can be found from the moment generating function of the geometric RV or by summing directly the geometric RVs similar to deriving the binomial RV from the Bernoulli. Let negative binomial RV X = X1 + X2 + . . . + Xk where k is the number of successes and Xi is geometric RV or the number of trials to obtain the ith success. Then E( X i ) =
1 p
q
, V( Xi ) =
q
2
, E( X ) =
k
, and V ( X ) =
kq
p
p2
since the Xi RVs are independent. Alternatively, with use of the moment generating function, k
t Ê pe ˆ M ( t) = ; Ë 1 - qe t ¯
M (0) = 1.
Let x = pet and y = qet; then x = dx and y = dy and k
M ( t) =
Ê x ˆ Ë1- y¯
M ¢( t) =
k( x )k -1[(1 - y )dx + xdy] (1 - y )k -1(1 - y )2
M ¢(0) = E( X ) =
kp k p
Similarly, M ¢( t ) =
k +1
=
kx k (1 - y )k + 1
k p ;
.
=
kx k -1dx (1 - y )k +1
=
kx k (1 - y )k +1
=
k( pe t )k (1 - qe t )k +1
P369463-Ch003.qxd 9/2/05 11:12 AM Page 187
3.8 Poisson Distribution
k 2 x k (1 - y )k +1 + k( k + 1) x k (1 - y )k y
M ¢¢( t) =
(1 - y )2 k +1
M ¢¢(0) =
V( X ) =
187
k 2 p k p k +1 + k( k + 1) p k p k q p 2 k +1
k 2 + kq - k 2 p
Thus E( X ) =
k
2
=
kq
EXAMPLE 3.25
k 2 p 2 k +1 + k 2 qp 2 k + kqp 2 k p 2 k +1
=
k 2 + kq p2
.
p2
and V ( X ) =
p
=
kq p2
.
Compute the probabilities of a negative binomial distribution until exactly 2 successes occur with parameter p = 3/4. The computed probabilities for negative binomial X with parameter k = 2 successes at the Xth Bernoulli trial with probability of success parameter p = 3/4 are shown as follows. X P(X)
2
3
4
5
6
7
8
9
10
11
...
0.563
0.281
0.105
0.035
0.011
0.003
0.001
ª0
ª0
ª0
...
(negbin-density 3/4 2 11) returns the probabilities of exactly 2 successes from 2 to 11 trials (0.5625 0.28125 0.10546 0.03515 0.01098 0.00329 0.00096 0.00027 0.00007 0.00002), The command (cnegbinomial p k x) returns the cumulative probability of k or more successes. (cnegbinomial 3/4 2 11) returns 0.9999918. (cnegbinomial-a-b p a b x) returns the sum of the probabilities from a to b successes in x trials. (cnegbinomial-a-b 3/4 2 5 11) Æ 0.0148.
3.8
Poisson Distribution In the previous discrete distributions discussed in this chapter, the Bernoulli trial is the essence in each. The Poisson process is not made up of Bernoulli trials. The Poisson process occurs over time or space. Some examples of the Poisson process are the number of accidents at a certain intersection in a year, the number of errors per chapter in a book, the number of flaws in a bolt of cloth per yard, the number of phone calls in the next hour, the number
P369463-Ch003.qxd 9/2/05 11:12 AM Page 188
188
Chapter 3 Special Discrete Distributions
of cars arriving at a toll station per minute, and the number of earthquakes in a year. With the Poisson distribution we are interested in calculating the probability of the number of successes over some time or space unit. The Poisson process is epitomized by random arrivals or occurrences where the time for an arrival or the space for an occurrence is small relative to the observation time or space. Given a specified time or space interval, the Poisson process is characterized by: 1) The number of occurrences in any subinterval of time or space is independent of the number of occurrences in any other subinterval. 2) The mean number of occurrences is proportional to the size of the interval. If an average of 2 phone calls in 1 hour is assumed, then 4 calls in two hours are assumed. 3) The occurrences do not occur simultaneously. The probability of more than one occurrence in an arbitrary small interval is assumed to be 0. 4) The probability of occurrence in an interval remains constant for all such intervals. The density for the Poisson RV X with parameter k > 0 (where k is the mean number of occurrences per unit time or space) is given by f ( x; k) =
e-kkx
for x = 0, 1, 2, . . .
(3–15)
x! x2
Recall that e x = 1 + x +
+
x3
2!
3!
+ ...+ ...
The sum of the probabilities is 1 as •
e-kkx
Â
x!
x =0
•
= e-k  x =0
kx
= e - k * e k = 1.
(3–16)
x!
The expected value of a Poisson random variable is •
E( X ) =
 x =0
xe - k k x x!
•
=Â x =1
xe - k k x x!
•
= k x =1
e - k k x -1 ( x - 1)!
= k,
(3–17)
since the summation is 1 from (3–16), with the first term being 0 at x = 0. Similarly, •
E( X 2 ) =
 x =0
x2e-kk x x!
•
=Â x =1
xe - k k x ( x - 1)!
•
= k y =0
( y + 1)e - k k y +1 ( y )!
= k * E(Y + 1) = k( k + 1) = k 2 + k, with change of variable y = x - 1. The variance is V ( X ) = k 2 + k - k 2 = k.
(3–18)
P369463-Ch003.qxd 9/2/05 11:12 AM Page 189
3.8 Poisson Distribution
189
k=1
0.3
k=4
0.3
k=9
0.2
0.15 0.2
Figure 3.4
15
12
0
8
9
0 6
0
3
0.05
12
0.1
4
12
9
6
3
0
0
0.1
0
0.1
Poisson Densities with k = 1, 4, 9
Observe that the expected value and variance have the same numerical value (but different units), a defining characteristic of the Poisson RV. The moment generating function is derived as follows: •
M ( t) =
Â
e tx e - k k x
x =0
x!
•
= e-k  x =0
( ke t ) x
t
= e - k e ke .
x!
Note that M(0) = 1. t
M ¢( t) = e - k ke t e ke fi M ¢(0) = k, t
t
M ¢¢( t) = ke - k ( e t e ke ke t + e ke e t ) fi M ¢¢(0) = ke - k ( ke k + e k ) = k 2 + k, and V ( X ) = k 2 + k - k 2 = k. Figure 3.4 shows Poisson densities for parameter k at 1, 4, and 9 with s at 1, 2, and 3, respectively. EXAMPLE 3.26
The average number of defective chips manufactured daily at a plant is 5. Assume the number of defects is a Poisson RV X. Compute a) P(X = 5) and b) P(X ≥ 1). c) Compute k and s 2 of the distribution of X if P(X = 0) = 0.0497. Solution a) P ( X = 5) =
e -5 55
= 0.1755.
(poisson 5 5) Æ 0.1755
5! b) P ( X ≥ 1) = 1 - P ( X = 0) = 1 - e -5 = 1 - 0.0067 = 0.9933. c) P ( X = 0) = 0.0497 = e - kfi - k = Ln 0.0497 = -3.0018 fi k ª 3, s 2 = 3.
P369463-Ch003.qxd 9/2/05 11:12 AM Page 190
190
EXAMPLE 3.27
Chapter 3 Special Discrete Distributions
Recall the matching problem (permutation maps) and suppose for our example that n hats are randomly returned to n people. Compute the expected number of matches, the exact probability, and the Poisson approximate probability of exactly r persons receiving their correct hats for r = 10, n = 20. Solution Let RV X = X1 + X2 + . . . + Xn where Xi is an indicator RV with value 1 if the hat and person match. Since the expected value of an 1 indicator RV is p = , the expected number of matches is then given by n E( X ) = E( X1 ) + E( X 2 ) + . . . + E( X n ) = nE( X i ) = n *
1
= 1.
n
Here with Poisson parameter k = 1 the probability of exactly r persons getting their own hats back is estimated by P ( X = r k = 1) =
e -11r
= 1.01377711963E-7.
r! The exact probability of 10 out of 20 people getting their own hats back is given by Ê 20ˆ N(10, 0) = Ê 20ˆ 1334961 = 1.01377718333E-7. Ë 10¯ Ë 10¯ 20! 20! where N(10, 0) is number of permutation maps with zero matches (derangements). For n ≥ 7, the probability is practically independent of n.
The template (pm n r) returns the exact probability of exactly r matches in n! permutation maps. For example, the command (pm 20 10) returns 0.0000001, as does the command (poisson 1 10). Try (pm n r) for n = 25, 30, 35, 40 with r fixed at 10. (zero-maps n) returns the number of zero matches or derangements for a given n. (zero-maps 10) Æ 1334961. (N n r) returns the number of maps with exactly r matches.
EXAMPLE 3.28
A well-shuffled deck of 52 cards is in hand. The player says aloud, specifying a suit, “Ace, 2 ,3, 4, 5, . . . , 10, jack, queen, king.” Then the player repeats, specifying a second suit, with “Ace, 2, 3 . . .” as each card is presented. The player loses as soon as a card is called correctly. Compute the probability of winning at this form of solitaire.
P369463-Ch003.qxd 9/2/05 11:12 AM Page 191
3.8 Poisson Distribution
191
Solution P(0 matches in 13 calls) ª e-1 = 0.3679. (/ (derangements 13) (! 13)) Æ 63633137/172972800 ª 0.367879. P(0 matches in 52 calls) = (/ (derangements 52) (! 52)) Æ 0.367879. EXAMPLE 3.29
Show that the binomial distribution can be approximated by a Poisson distribution for large n and small p. Solution Equate the Poisson parameter k, the average number of occurrences over time or space to the expected number of successes in the Binomial (X; n, p). k = E( X ) = np; p =
k
.
n x
Binomial( x; n, p) =
n!
x
p q
x!( n - x )!
n-x
n!
n-x
kˆ Ê kˆ Ê 1= Ë ¯ Ë x!( n - x )! n n¯ n
kˆ Ê 1k x n( n - 1) . . . ( n - x + 1) Ë n¯ = x x x! n kˆ Ê 1Ë n¯ x
n
kˆ kˆ Ê Ê As n Æ •, 1 Æ 1 and 1 Æ e-k. Ë ¯ Ë ¯ n n Thus Binomial ( x; n, p) Æ EXAMPLE 3.30
e-kkx x!
for n large and p small.
Compute the probability that S2, a sum of 2, occurs in 10 rolls of a pair of fair dice at most 2 times. Use the Poisson approximation to the binomial. Solution Exact Binomial: P ( S2 ) =
1
and
36
1ˆ Ê Binomial X £ 2, 10, = P ( X = 0) + P ( X = 1) + P ( X = 2) Ë 36 ¯ = 0.7545 + 0.2156 + 0.0277 = 0.9978. (cbinomial 10 1/36 2) Æ 0.9977790.
P369463-Ch003.qxd 9/2/05 11:12 AM Page 192
192
Chapter 3 Special Discrete Distributions
Poisson approximation: Setting k = np =
10
,
36 10 ˆ Ê P X £2 k= = P ( X = 0) + P ( X = 1) + P ( X = 2) Ë 36 ¯ = 0.7575 + 0.2104 + 0.0292 = 0.9971. (cpoisson 10 / 36 2) Æ 0.9970953.
The command (poisson k x) returns the probability of x occurrences in time or space when the average number of occurrences in time or space is k. (poisson 2 3) returns 0.1804. The command (cpoisson k x) returns the cumulative probability. (cpoisson 10/36 2) returns 0.9971. (cpoisson-a-b k a b) returns P(a £ x £ b | k). (cpoisson-a-b 10/36 0 2) Æ 0.9971. (poisson-density k n) returns P(X) for X = 0 to n. (poisson-density 2 4) returns
EXAMPLE 3.31
X
P(X)
0 1 2 3 4
0.1353 0.2707 0.2701 0.1804 0.0902
Suppose your household averages 5 phone calls per hour. a) Compute the probability of taking a 15-minute nap without the phone ringing. b) Repeat for a 30-minute nap. c) Repeat for two 15-minute naps one hour apart. Solution a) 5 phone calls in 60 minutes is k = 5/4 for 15 minutes. P ( X = 0; k = 5 / 4) = e -5 / 4 = 0.2865. b) 5 phone calls in 60 minutes gives k = 5/2 for 30 minutes. P ( X = 0, k = 5 / 2) = e -5 / 2 = 0.0821. c) Taking two 15-minute naps one hour apart without the phone ringing is equivalent to taking one 30-minute nap. The problem can be worked by
P369463-Ch003.qxd 9/2/05 11:12 AM Page 193
3.8 Poisson Distribution
193
considering one 30-minute interval as shown above or two 15-minute intervals separately as shown below. For 15-minute intervals, k = 5/4. P ( X = 0, k = 5 / 2) = P ( X = 0, k = 5 / 4) * P ( X = 0, k = 5 / 4) = 0.2865 * 0.2865 = 0.0821. EXAMPLE 3.32
The average rate of vehicles arriving randomly at a toll station is 20 per minute. Ten percent of the vehicles are trucks. Compute the probability that a) b) c) d) e) f) g)
50 vehicles arrive within 2 minutes. 40 cars arrive within 2 minutes. 10 trucks arrive within 2 minutes. at least 5 cars will arrive before the first truck. at least 5 trucks arrive in the next 40 vehicles. exactly 2 cars arrive in a 10-second period. exactly 2 cars arrive in any two 5-second intervals.
Solution minute.
Let RV X = # vehicles, Y = # cars, Z = # trucks arriving per
P(X = 50 | k = 40) = e-40 * 4050/50! = (poisson 40 50) Æ 0.0177. P(Y = 40 | k = 36) = e-36 * 3640/40! = (poisson 36 40) Æ 0.0508. P(Z = 10 | k = 4) = e-4 * 410/10! = (poisson 4 10) Æ 0.0053. P(Y ≥ 5 before Z = 1) = 0.95 = (-1 (cgeometric-a-b 0.1 1 5)) Æ 0.5905. P(truck arrival) = 0.1 and success of first truck is 6 or greater. e) n = 40, p = 0.1, Z ≥ 5; (-1 (cbinomial 40 0.1 4)) Æ 0.3710. f) P(Y = 2 | k = 3) = e-3 32/2! = (poisson 3 2) Æ 0.2240. g) To get exactly two cars in the 2 separate intervals, we consider (1, 1), (0, 2), and (2, 0), where the ordered pairs indicate the number of arrivals in the first interval and in the second interval.
a) b) c) d)
2
For (1, 1): [ P ( X = 1 k = 1.5] = ( * (poisson 1.5 1) (poisson 1.5 1)) Æ 0.1120. For (0, 2): P ( X = 0 k = 1.5) * P ( X = 2 k = 1.5) = (0.2231) * (0.2510) = 0.0560. For (2, 0): P ( X = 2 k = 1.5) * P ( X = 0 k = 1.5) = (0.2510) * (0.2231) = 0.0560. The sum of the three probabilities is 0.2240. Observe that the problem is equivalent to P(Y = 2 | k = 3), the probability of exactly 2 arrivals in a 10-second period. EXAMPLE 3.33
a) Compute P(| X - m | £ 4) for a Poisson RV with parameter k = 16. b) Use the normal approximation with continuity correction to check the Poisson probability.
P369463-Ch003.qxd 9/2/05 11:12 AM Page 194
Chapter 3 Special Discrete Distributions
194
0.12 0.1
N(16, 16) Poisson k = 16
0.08 0.06 0.04 0.02 0 5
0
Figure 3.5
10
15
20
25
30
35
Normal Approximation to Poisson
Solution a) P(|X - 16 | £ 4) fi P(12 £ X £ 20) = (- (cpoisson 16 20) (cpoisson 16 11)). = 0.71412. b) Poisson k = 16: N(16, 16) P (11.5 £ X £ 20.5) = FÊ Ë
20.5 - 16 ˆ 11.5 - 16 ˆ - FÊ ¯ Ë ¯ 4 4
= 0.8697 - 0.1151 = 0.7546.
3.9
Summary The most frequently used discrete distributions are the binomial and the Poisson. The Bernoulli trial is the essence of the binomial, the negative binomial, and the geometric. Whereas the binomial process is sampling with replacement, the hypergeometric process is sampling without replacement. The Poisson RV can approximate the binomial RV when the probability of success is small and the number of trials is large. The binomial RV can approximate the hypergeometric RV when the number of trials is large. The geometric RV is the number of the Bernoulli trials for the first success to occur. The negative binomial RV is the number of Bernoulli trials for the kth success to occur. The binomial RV is the number of successes in a fixed number of Bernoulli trials. All three of these RVs are sampling with replacement. The hypergeometric RV is the number of successes in n trials by sam-
P369463-Ch003.qxd 9/2/05 11:12 AM Page 195
3.9 Summary
195
pling without replacement. The Poisson RV is the number of independent occurrences in a fixed observation time or space where the average number of occurrences is a constant. EXAMPLE 3.34
A scientist is breeding flies (Poisson process) at an average rate of 10 flies a minute. Assume P(male) = 0.4. Compute the probability that a) b) c) d) e) f)
exactly 10 flies are born in the next minute. exactly 10 flies are born in exactly 3 of any 5 one-minute intervals. the third fly born is the first female. the sixth fly born is the 3rd male. exactly 2 flies will be born in the next 10 seconds. exactly 40 male flies occur in a sample of 100 from 200 flies.
Solution a) b) c) d) e) f)
(Poisson k = 10, x = 10) = 0.1251. (poisson 10 10) (Binomial n = 5, p = 0.1251, x = 3) = 0.015. (binomial 5 0.12513) (Geometric p = 0.6, x = 3) = 0.096. (geometric 0.6 3) (NegBinomial p = 0.4, k = 3, x = 6) = 0.13824. (negbin 0.4 3 6) (Poisson k = 10/6, x = 2) = 0.2623. (poisson 10/6 2) (Hyperg A = 0.4 * 200, B = 0.6 * 200, n = 100, x = 40) = 0.1147. (hyperg 80 120 100 40)
A summary of these special discrete distributions is shown in Table 3.1. Table 3.1
Special Discrete Distributions
RV X
Density f(x)
E(X)
V(X)
M(t)
n
Uniform
1
Âx
x
n
Geometric Bernoulli Binomial
Hypergeometric
Negative Binomial Poisson
q x-1p x 1-x
pq
Ê nˆ p x q n - x Ë x¯ Ê Aˆ Ê B ˆ Ë x ¯ Ë n - x¯
2 i
i =1
n
-
n
x2
Â
e txi n
i =1
1
q
pet
p
p2
1 - qet
p
pq
q + pet
np
npq
(q + pet)n
nA
nA
A+ B - n
B
—
Ê A + Bˆ Ë n ¯
A+ B
A + B A + B A + B -1
Ê x - 1ˆ q x - k p k Ë k - 1¯
k
kq
( pet )
p
p2
(1 - qet )
k
k
ek(e -1)
e-kkx x!
t
k k
P369463-Ch003.qxd 9/2/05 11:12 AM Page 196
196
Chapter 3 Special Discrete Distributions
PROBLEMS DISCRETE UNIFORM DISTRIBUTION 1. A person has 12 keys to try to open a locker. What is the probability that the a) 3rd or 4th key tried will open the locker? b) kth of n keys will open the locker? ans. 2/12. 1/n. 2. Find E(X) and V(X) for the discrete uniform RV X on the integers 1, 2, 3, 4, 5. Then repeat for Y = 3X for the integers Y = 3, 6, 9, 12, 15. 3. Show that E( X ) =
a+b
for the discrete uniform RV on the integers
2 [a, b] and that V ( X ) =
n2 - 1
, where n is the number of integers in 12 [a, b]. Then compute the variance of the sum of a fair dice roll. 4. Which conveys more information for guessing an integer from 1 to 10: being told that a) the number is less than 7 or b) the number is odd? Hint: Information is Log2 (1/p). BERNOULLI DISTRIBUTION 5. Show that a Bernoulli indicator RV is a binomial RV with n = 1. Show that the expected value of a Bernoulli RV (indicator) is p. 6. Calculate the entropy in flipping a biased coin when P(Heads) is a) 0.9, b) 0.1, c) 0.5. BINOMIAL DISTRIBUTION 7. The experiment is rolling a fair die 3 times. Compute the probability of at least one face showing a 6, using a) binomial formulation, b) inclusion/exclusion principle, and c) complementary probability or back door approach. ans. 91/216. 8. Seven soldiers fire sequentially at a target with probability p = 3/8 of a hit for each soldier. Compute the probability of at least 2 hits before the last soldier fires. 9. RV X has density f(x) = 2x on [0, 1]. A random sample of 5 data points is taken from the distribution. Compute the probability that exactly two of the five data points exceed 1/2. ans. 0.0879.
P369463-Ch003.qxd 9/2/05 11:12 AM Page 197
Problems
197
10. Two fair dice are rolled. a) Compute the probability that one die specified in advance is greater than the other. b) If the dice are rolled 7 times, compute the probability that one die specified in advance exceeds the other in exactly 3 of the rolls. 11. Five people enter an elevator on the bottom floor of a building with 6 floors. Use the binomial distribution repeatedly to compute the probability that exactly one gets off on each of the 5 floors, assuming that each is equally likely to get off as not on each floor. Then recompute using the birthday problem model. ans. 0.0384. 12. Compute the probability of a baseball player with a 0.333 batting average getting at least one hit in 56 consecutive games. Assume the player bats 4 times a game. 13. Two fairly evenly matched tennis players A and B have probability of winning of 0.45 and 0.55, respectively. Should A play B 2 out of 3 or 3 out of 5? In general, what is the best strategy to adopt when forced to play a superior opponent? ans. 2 out of 3. 14. Ten batteries randomly selected from a manufacturer with a known 5% defect rate are sold. The cost C of a defective battery is C = 2X 2 + X + 4 where RV X is the number of defects. Find the expected repair cost. 15. Past data show that 10% of the parts shipped from a plant are defective. The repair cost for defects is C(X) = 5X 2 + 2X + 3 where RV X is the number of defects. If 20 parts are shipped, compute the expected repair cost. ans. 36. 16. Find the mean, variance, and standard variation of binomial RV X with n = 50 and p = 0.7. Compute P(m - 2s < X < m + 2s). 17. a) For binomial RV X with p = 0.1 and n unknown, find the smallest n such that P(X ≥ 1) ≥ 0.5. b) For binomial RV X with n = 10, find the largest p such that P(X £ 2) ≥ 0.5. ans. 7 0.258. 18. A system has n independent sensors monitoring a heat source area, with each sensor having probability 1/4 of detection. How many sensors are needed to have probability 0.95 of at least one sensor detecting the heat source when in the area? 19. Two out of 100 people qualify for the IQ society Mensa. Compute the probability that exactly a) 2 people qualify from a random sample of 100 or b) 10 qualify. c) Repeat, using the Poisson approximation. ans. binomial Æ 0.2734 vs. Poisson Æ 0.2707 binomial Æ 0.0000287 vs. Poisson Æ 0.0000381.
P369463-Ch003.qxd 9/2/05 11:12 AM Page 198
198
Chapter 3 Special Discrete Distributions
20. Compute the probability that the sum of 7 (S7) will occur before a sum of 3 (S3) exactly twice in 5 trials of rolling a pair of fair dice, with a trial defined as either S7 or S3 occurring. Interpret the consequences of the experiment. 21. Of 100 families selected at random from a pool of families that have 4 children, show the expected number of families containing 0, 1, 2, 3, and 4 boys. ans. 6.25 25 37.5 25 6.25. 22. For binomial RV X with m = 10 and s 2 = 9, find a) n and p. Compute b) P(X < 2); c) P(1 < X < 3). 23. Find a recursive relation for the binomial density. ans. P ( X = x + 1) =
p( n - x ) q( x + 1)
P ( X = x ).
24. Compute E(X) and V(X) for binomial RV X directly from the density function. 25. Compute the entropy for the number of heads in 3 flips of a fair coin. ans. 1.811. 26. Binomial RV X has E(X) = 3 and V(X) = 9/4. Compute the probability of exactly 3 successes.
MULTINOMIAL DISTRIBUTION 27. a) In a drawer are 4 black socks, 6 gray socks, and 10 white socks. One reaches in and randomly grabs a sock and records the color. The sock is returned to the drawer. If this process is done 6 times (i.e., sampling with replacement), compute the probability of 1 black, 2 gray, and 3 white socks resulting. Also solve the problem on the fly. ans. 0.135. b) An urn contains 12 balls: 5 red, 4 white, and 3 blue. A trial consists of selecting one ball from the urn and replacing the ball. Compute the probability that in 3 trials all three balls are different in color. ans. 5/24. 28. a) A fair die is rolled 10 times. Compute the probability of 1 one, 2 twos, 3 threes, and 4 fours. b) Ten fair pairs of dice are thrown. Compute the probability of 4 sums of 7, 3 sums of 8, 2 sums of 10, and 1 sum of 12. c) Four fair dice are thrown. Compute the probability of getting the same number of faces showing a 2 as well as a 5. d) A manufacturer ships appliances in three colors: almond, white, and green. A shipment of 30 appliances arrived. If the probabilities of almond, white, and green are 0.3, 0.5, and 0.2, respectively, compute the probability that 10 of each color arrived.
P369463-Ch003.qxd 9/2/05 11:12 AM Page 199
199
Problems
HYPERGEOMETRIC DISTRIBUTION 29. a) In a drawer are 4 black socks, 6 gray socks, and 10 white socks. One reaches in and randomly grabs 3 socks. Compute the probability of a matching pair. ans. 0.7895. b) In a narrow-neck bottle where only one marble can pass through at a time are 5 red marbles and 2 white marbles. A wager is offered that 3 red marbles will roll out before 1 white marble does. Should you accept this wager? How many more red marbles must be added before accepting the wager? 30. A box has 4 white, 2 red, and 1 green marble. Three are picked at random without replacement. Let RV X be the number of white marbles selected. Find a) P(X = 1), using the hypergeometric model, and compare with the on the fly method; b) P(X ≥ 1); c) P(X = 2 | X ≥ 1); d) P(The second marble drawn is white). 31. A shipment contains 100 items of which a sample of 10 is tested. If 2 or less items are found defective, the shipment passes the test. If the manufacturer has a 20% defect rate, compute the probability of the shipment passing the test. Use the binomial approximation and check with the exact hypergeometric probability. ans. 0.6778 vs. 0.6812. 32. Compute the probability of exactly 3 of a rank, then the probability of a full house, then the probability of just a triple in 5-card poker and verify P(A + B) = P(A) + P(B) - P(AB). Just a triple is exactly 3 of same rank minus full house. 33. Compute the expected number of aces in picking 1, 4, and 13 random selections from a deck without replacement. Use the command (Hy n) to return the probabilities. For example, (Hy 1) returns the probabilities (0.9230769 0.0769230) corresponding to 0 and 1. Verify the expected number of aces is n/13. ans. 1/13 4/13 8/13 1 34. Seventy-five wolves are trapped and tagged and released into the wilderness. After the wolves disperse, 25 wolves are caught and 15 of them have tags. Estimate the number of wolves in the area. Make appropriate assumptions. 35. Consider independent binomial RVs X and Y with parameters A and p and B and p, respectively. Show that the conditional density of X, given that X + Y = n, is hypergeometric. See Chapter 1 Coin Problem 9. Hint: Find
P( X = x X + Y = n) =
P ( X = x, Y = n - x ) P( X + Y = n)
.
P369463-Ch003.qxd 9/2/05 11:12 AM Page 200
200
Chapter 3 Special Discrete Distributions
GEOMETRIC DISTRIBUTION 36. Let RV X denote the trial at which the first success occurs. In a repeated coin flip experiment, show that the a) probability of the first head occurring on an odd number of trials p . is (1 - q 2 ) Hint: P(x is odd) = p + q2p + q4p + . . . + q2mp + . . . for m = 0, 1, 2, . . . p . b) probability that x is even is 1+ q c) P(x is odd) π 1/2. Hint: Let
q 1 - q2
= 1 / 2 and reach a contradiction.
37. Let RV X denote the number of rolls of a pair of fair dice at which the ans. (8/9)4. first sum of 9 (S9) occurs. Compute P(X ≥ 5). 38. Three identical fair coins are thrown simultaneously until all three show the same face. Find the probability that they are thrown more than three times. Use both front door and back door approaches. 39. How many tosses of a fair coin are expected before both events (Head and Tail) occur? ans. 3. 40. Given that P(head) in a loaded coin is 1/20, a) find the expected number of times to flip the coin until 5 heads occur. b) How may flips are expected before both heads and tails occur? •
41. a) Find E(X) for the geometric RV X given that E( X ) = Â xP ( X = x ). x =1
E( X ) = q 0 p + 2qp + 3q 2 p + 4q 3 p + . . .
= p(1+ 2q + 3q 2 + 4q 3 + . . . + nq n -1 + . . . b) Show that P(X = x + k | X > k) = P(X = x). NEGATIVE BINOMIAL DISTRIBUTION 42. Compute the probability of getting the third S9 in exactly 10 rolls of a pair of fair dice. Compute the expected number of rolls in order to get 3 S9 successes. 43. A couple will have children until two female children occur. Find the probability that they will have four children. ans. 0.1875. 44. During the World Series in baseball, the first team to win 4 games is the winner. Assuming that both teams are equally likely to win, compute the probability of the series ending in a) 4 games, b) 5 games, c) 6 games, d) 7 games.
P369463-Ch003.qxd 9/2/05 11:12 AM Page 201
Problems
201
POISSON DISTRIBUTION 45. Trucks arrive at a weighing station at an average rate of 30 per hour. Compute the probability of exactly a) 20 trucks arriving in the next hour; b) 10 trucks arriving in 30 minutes; c) no truck arriving in 6 minutes. d) Which is more likely, exactly 20 trucks arriving in 30 minutes, followed by exactly 20 trucks arriving in the next 30 minutes, or exactly 40 trucks arriving in one hour? e) Suppose truck arrival is now 4 per hour. Show that the probability of exactly 2 arriving in an hour is equivalent to the probability of exactly 2 arriving in any two separate half-hour periods. ans. Using (poisson k x), (poisson 30 20) Æ 0.01341. (poisson 15 10) Æ 0.0486. (poisson 3 0) Æ 0.0498. (* (poisson 15 20) (poisson 15 20)) Æ 0.001748 (poisson 30 40) Æ 0.0139. e) (poisson 4 2) = (+ (* 2 (poisson 2 2) (poisson 2 0)) ; 2, 0 or 0, 2 (* (poisson 2 1) (poisson 2 1))) ; or 1, 1 Æ 0.1465. a) b) c) d)
46. A certain insect is distributed randomly as a Poisson RV X with k = 2 insects per square yards. Find a radius r such that P(X ≥ 1) = 3/4. 47. Machine A breakdown is a Poisson RV X with mean t/2 hours daily with its cost given by CA(t) = 21t + 7X 2. Machine B is also a Poisson RV Y with mean t/6 and daily cost CB(t) = 42t + 12Y 2. Find the expected cost of each machine for a 12-hour day. ans. E(CA) = 546 E(CB) = 576. 48. The number of accidents at an intersection each year is a Poisson RV with k = 5. A widening of the road was completed, which was claimed to reduce the parameter k to 2 for 60% of the drivers, with no effect on the remaining 40%. If a driver uses the intersection for a year and does not have an accident, find the probability that the widening benefited the person. 49. If the average number of potholes in a mile of highway is 20, compute the probability of a) 2 or more potholes in 1/4 mile of highway, b) 8 potholes occurring in a 1/2-mile stretch of highway, given that there were 5 potholes in the first quarter mile. ans. 0.96 0.14. Miscellaneous 50. The mean number of arriving customers at a restaurant is 20 per hour. Compute the probability that 20 customers will eat at the restaurant in 3 of the next 5 hours.
P369463-Ch003.qxd 9/2/05 11:12 AM Page 202
202
Chapter 3 Special Discrete Distributions
51. Compute the probability of getting at least 5 heads on 10 flips, given that 3 heads occurred on 5 flips of a fair coin. ans. 0.8125.
52. In flipping a fair coin 10 times, find the probability of getting the same number of heads on the first 5 flips as on the second 5 flips.
53. Show that the sum of two independent Poisson RVs is also a Poisson RV.
54. Given hypergeometric RV X with A = 4, B = 6, and sample n = 2, find the density function for RV Y = 1/(X + 1).
55. Find the E(X) and V(X) for discrete uniform RV X on the integers in 13 ≤ x ≤ 39. ans. 26 60.7.
56. Compare by completing the table entries for a binomial RV with parameters n = 10 and p = 0.1 and a Poisson RV with parameter k = 1. Use templates (binomial n p x) and (poisson 1 x).

X          0        1    2    3    4    5
Binomial   0.3487
Poisson    0.3679
57. For the multinomial distribution

f(x1, x2, . . . , xk) = [n!/(x1! x2! . . . xk!)] p1^x1 p2^x2 . . . pk^xk,

show that C(Xa, Xb) = -n pa pb for a ≠ b.
Hint: Let the number of successes Xa = Σ(i=1 to n) Ai, where each Ai is an indicator RV with E(Ai) = pa, and similarly Xb = Σ(j=1 to n) Bj with E(Bj) = pb. Show that C(Ai, Bi) = -pa pb, since Ai * Bi = 0. Then show that

C(Xa, Xb) = Σ(i=j) C(Ai, Bj) + Σ(i≠j) C(Ai, Bj).
58. Show that negative-binomial(p, k, x) = (k/x) binomial(p, k, x).
59. In a bag are numerals from 1 to 12. If 1 to 4 are selected, A wins. If 5 to 8 are selected, B wins. If 9 to 12 are selected, C wins. The selections are made with replacement in the order A, then B, followed by C, until one wins. Compute the probability of winning for A, B, and C. ans. 9/19 6/19 4/19.
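A quick way to verify the answer to problem 59 (an added check, not in the original): each selection wins with probability 4/12 = 1/3, and a full round of three misses has probability (2/3)³ = 8/27, so

P(A) = (1/3)/(1 - 8/27) = 9/19, P(B) = (2/3)(1/3)/(19/27) = 6/19, P(C) = (2/3)²(1/3)/(19/27) = 4/19.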
60. Two red cards and two black cards are face down on a table. Two of the cards are randomly selected. Compute the probability that the 2 cards have the same color.
61. In an urn are red and black marbles, with black marbles outnumbering red marbles 2 to 1. One random sample reveals 12 black and 8 red. Another random sample reveals 4 black and 1 red. Which sample offers more evidence that the urn contains twice as many black marbles as red marbles? ans. larger sample.
62. A fair coin is flipped 3 times, constituting a trial. Compute the probability of getting all heads or all tails for the second time at the fifth trial. Interpret and solve accordingly.
63. Given P(Poisson RV X = 0) = 0.2, find k. ans. 1.61.
64. Given P(Poisson RV X ≤ 20) = 0.2, find k.
65. Compute the probability that the 13 spades in a deck of 52 playing cards are distributed in Bridge (4 hands of 13) so that North gets 4, East gets 5, South gets 3, and West gets 1. ans. 0.0053877.
66. Skewness α3 is defined as the ratio of the third moment about the mean to σ³, that is, α3 = E(X - μ)³/σ³. If α3 is positive, the distribution is skewed right toward the longer tail. Determine the skewness for the following density.
X      0        1       2       3       4       5
f(x)   0.07776  0.2592  0.3456  0.2304  0.0768  0.01024
67. Find the expected value of the series expansion of the constant e by dividing the expansion terms by e and using these values as the probabilities for the expansion terms.

e^x = x⁰/0! + x¹/1! + x²/2! + x³/3! + . . .
e = 1 + 1 + 1/2! + 1/3! + . . . (divide by e to get)
1 = 1/e + 1/e + 1/(2e) + 1/(6e) + . . . (probabilities sum to 1)
Expected value = 1/e + 1/e + 1/(4e) + 1/(36e) + . . . = 0.838613

(expect-e n) returns the first n series terms, probabilities, expected value, and entropy.

68. A slot machine has the following number of symbol patterns on three randomly rotating dials.
Symbol   Dial 1   Dial 2   Dial 3
Bar        2        1        1
Bell       1        8        7
Plum       7        2        3
Orange     8        2        4
Cherry     2        7        0
Lemon      0        0        5
Total     20       20       20
Compute the probability that
a) the first orange on Dial 1 occurs on the 5th trial,
b) the first orange on Dial 2 occurs on the 5th trial,
c) exactly 3 bells occur on Dial 3 out of 10 plays,
d) the 3rd orange on Dial 1 occurs on the 6th trial,
e) the 3rd orange on Dial 1 occurs on the 5th trial,
f) RV Plum ≤ 3 in 10 plays on Dial 1.
69. Identify the discrete random variables from the following descriptions and compute the probability when applicable.
a) Probability of getting a "5" on a roll of a die. ans. discrete uniform 1/6; Bernoulli.
b) Probability a "5" occurs for the first time on the 3rd roll of a fair die. geometric 0.1157.
c) Probability no "5" will occur for 6 rolls of a fair die. (binomial 6 1/6 0) → 0.3349; (binomial 6 5/6 6) → 0.3349; (- 1 (cgeometric 1/6 6)) → 0.3349.
d) Probability the third "5" occurs on the 12th roll of a fair die. (negbin 1/6 3 12) → 0.0493.
e) Probability exactly 3 fives occur in 12 rolls of a fair die. (binomial 12 1/6 3) → 0.1974.
f) Probability exactly 5 hearts occur in a hand of 13 cards. (hyperg 13 39 13 5) → 0.1247.
g) Number of times per hour a pair of dice is rolled on a casino's craps table. Poisson.
h) Drawing a winning lottery ticket. discrete uniform.
i) Number of aircraft arriving at an airport. Poisson.
j) Determining whether a sum of 7 occurred in rolling 2 fair dice. Bernoulli.
k) Probability of getting a matching pair of shoes from randomly selecting 3 shoes from 4 different pairs. (* 4 (hyperg 2 6 3 2)) → 3/7; (/ (* (comb 4 2)(comb 2 1)(comb 2 2)(comb 2 1)) (comb 8 3)) → 3/7.
REVIEW

1. Three marbles are randomly put into 2 bins. Compute the probability that the 2nd bin has at least one marble. Repeat for 3 bins. ans. 7/8 19/27.
2. a) Find constants a and b so that f(x) = ax² + b is a density function on [0, 1] with E(X) = 1. b) Repeat for E(X) = 3/4. Hint: Caution.
3. There are n pairs of shoes. You randomly select k shoes. Compute the expected value of the number of matches. ans. E(X) = k(k - 1)/[2(2n - 1)].
4. An urn contains 10 red, 5 white, 6 blue, and 3 green marbles. Ten marbles are randomly selected. Compute the probability of 4 red and 3 green marbles.
5. Sketch the cumulative distribution function for the experiment of tossing a fair die.
6. From a set of integers (1 to n), two numbers are randomly selected without replacement. Determine the joint density of the two RVs X and Y and create the joint table for n = 6. Take note of the marginal densities.
7. In the matching problem, compute the probability of randomly returning to their owners exactly 5 hats from a total of 15 hats. Find the difference between the Poisson approximation (poisson 1 5) and the exact (pm 15 5). ans. 0.0030656620 0.0000000002.
8. In a town are 3 hotels in which 3 people randomly get a room. Compute the probability that each person checked into a different hotel by a) enumeration, b) sophisticated counting, c) repeated binomial application, d) on the fly.
9. Find any constants a and b satisfying discrete density f(x) = (x + a)/b for x = 0, 1, 2, 3. ans. a = 1, b = 10, and infinitely more.
10. An urn contains 2W and 3B from which 2 are randomly drawn. Let X = 1 if the first drawn is W and 0 if not. Let Y = 1 if the second drawn is B and 0 if not. Create the joint density function both with and without replacement. Show that X and Y are independent with replacement but dependent without replacement in forming the joint density functions of each respectively.
11. Prove that the probability of the absolute difference between x/n, the ratio of successes to the total number of Bernoulli trials, and p, the proportion of successes, tends to zero as n tends to infinity (Weak Law of Large Numbers).
12. Find k and show that f(x) = x² + kx + 1 is not a valid density on [0, 3].
13. Compute the probability of getting a sum of 19 with 4 fair dice. Let x1 + x2 + x3 + x4 = 19 subject to the constraint 1 ≤ xi ≤ 6 for each die. The 4 dice bins need 3 dividing lines among 19 items. Since 0 on a die is not possible, one item is forced into each bin, leaving 15 free items and 3 dividers, or 18 positions, for 18C15 = 816 solutions. Solutions with any xi > 6 must be removed: there are 4C1 ways to choose one xi > 6 (leaving 12C9 solutions each) and 4C2 ways to choose two of the xi > 6 (leaving 6C3 solutions each). The inclusion/exclusion principle gives 816 - 4C1 * 12C9 + 4C2 * 6C3 = 56 ways to get a sum of 19.
14. A song contains 36 notes of 8 different sounds, with frequencies of the 8 notes being 8, 7, 6, 5, 4, 3, 2, and 1, respectively. How many different tunes could be composed without regard for timing?
15. Compute the probability in tossing a fair coin 10 times and getting a) the sequence HTHTHTHTHT, b) 5 heads. ans. 1/1024 0.246094.
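The count of 56 in problem 13 can be confirmed by brute force; the helper below is an added sketch, not part of the book's software:

(defun count-dice-sum (target)
  ;; count the 6^4 outcomes of 4 fair dice whose faces sum to target
  (let ((count 0))
    (dotimes (a 6)
      (dotimes (b 6)
        (dotimes (c 6)
          (dotimes (d 6)
            (when (= (+ a b c d 4) target)  ; faces are 1..6, hence the + 4
              (incf count))))))
    count))

(count-dice-sum 19) returns 56, so P(sum = 19) = 56/1296 ≈ 0.0432.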
SOFTWARE EXERCISES

1. (sim-d-uniform a b n) returns n random samples from the discrete uniform on [a, b]. (mu (sim-d-uniform a b n)) returns a simulated expected value for the discrete uniform on [a, b]. For example, (setf x (sim-d-uniform 5 12 10)) returned the following 10 samples: (12 8 7 5 9 11 9 7 7 8). (mu x) returned a simulated expected value of 8.3, where the theoretical value is 8.5.
2. (sim-bernoulli p) returns 1 if success occurs with probability p or 0 if failure occurs with probability q, where p + q = 1. (sim-bernoulli 1/20) probably returns 0.
3. (binomial n p x) returns the binomial probability nCx p^x q^(n-x); for example, (binomial 3 1/2 2) → 0.375. (sim-binomial n p m) returns m replications of the number of successes from n trials with probability p. (sim-binomial 3 1/2 10) → (2 2 2 2 0 1 3 1 1 3). (cbinomial n p x) returns P(X ≤ x; n, p), the sum of the binomial probabilities from 0 to x: (cbinomial 10 1/2 5) → 0.6230. (cbinomial-a-b n p a b) returns P(a ≤ X ≤ b); (cbinomial-a-b 10 1/4 2 5) returns 0.7362. (binomial-density n p) returns the probabilities for X = 0 to n. (binomial-density 3 1/2) →
X    P(X)
0    0.125
1    0.375
2    0.375
3    0.125
4. (multinomial n x-lst p-lst) returns the multinomial probability [n!/(x1! x2! . . . xk!)] p1^x1 p2^x2 . . . pk^xk. (multinomial 6 '(1 2 3) '(4/20 6/20 10/20)) returns 27/200 = 0.135.
5. (poisson k x) returns P(X = x). (poisson 5 4) → 0.175467. (cpoisson k x) returns the cumulative probability P(X ≤ x). (cpoisson 5 4) → 0.440493. (cpoisson-a-b k a b) returns P(a ≤ X ≤ b). (cpoisson-a-b 5 2 6) returns 0.7218. (poisson-density k n) returns P(X) for X = 0 to n. (poisson-density 2 5) returns

X    P(X)
0    0.1353352
1    0.2706705
2    0.2706705
3    0.1804470
4    0.0902235
5    0.0360894
6. (hyperg A B n x) returns the probability that hypergeometric RV X = x. (hyperg 5 6 4 2) → 0.4545. (chyperg A B n x) returns the cumulative probability that X ≤ x for the hypergeometric RV X. (chyperg 5 6 4 2) → 0.8030. (chyperg-a-b A B n a b) returns P(a ≤ X ≤ b). (chyperg-a-b 5 6 4 1 3) returns 0.94. (hyperg-density A B n) returns the P(X) for X = 0 to n. (hyperg-density 10 5 3) returns

X    P(X)
0    0.0219780
1    0.2197802
2    0.4945054
3    0.2637362
7. (geometric p x) returns P(X = x), the first success occurring at trial x. (geometric 3/4 3) → 0.0469. (cgeometric p x) returns the cumulative probability P(X ≤ x). (cgeometric 1/4 3) → 0.578125. (cgeometric-a-b p a b) returns P(a ≤ X ≤ b). (cgeometric-a-b 3/4 3 5) returns 0.2344. (geometric-density p n) returns P(X) for X = 1 to n. (geometric-density 1/2 5) prints

X    P(X)
1    0.5
2    0.25
3    0.125
4    0.0625
5    0.03125
(sim-geometric p n) returns n runs of the number of trials at which the first success occurs. Try (sim-geometric 1/4 100) followed by (mu *) to see how close the sample average is to the theoretical average of 4. Use (mu (sim-geometric 1/4 100)) and press the F3 key to repeat the command.
8. (all-occur n) returns the expected number of trials before all of the numbers 1 to n appear. (all-occur 6) → 14.7 expected trials before each face of a die appears. Consider the integers from 1 to 100. Simulate how many would be missed in sampling 300 trials with replacement. (sim-%-missed 100 300) → (5 % (9 32 71 79 92)). The number missed should be close to 100e⁻³ ≈ 5. How many expected trials are needed for all of the integers from 1 to 100 to occur? (all-occur 100) → 518.74. Try (sim-%-missed 100 519).
9. (negbinomial p k x) returns P(X = x) for the kth success at trial x. (negbinomial 1/2 3 5) → 0.1875. (cnegbinomial p k x) returns P(X ≤ x) for k successes by the kth to the xth trial. (cnegbinomial 1/4 2 5) returns 0.36718. (cnegbin-density p k x) returns the probabilities of the kth to xth success. (cnegbin-density 1/4 2 5) → (0.0625 0.0938 0.10545 0.10557). (negbin-density p k n) returns the P(X) for X = k to k + n. (negbin-density 1/2 3 5) prints
X    P(X)
3    0.125
4    0.1875
5    0.1875
6    0.15625
7    0.1171875
8    0.0820312
10. (sim-neg-binomial p k n) returns a list of n values where each value is the number of trials occurring to achieve k successes with probability p of success. (sim-neg-binomial 1/2 5 10) may return (14 16 12 14 17 8 15 22 7 10). Try (sim-neg-binomial 1/2 5 100) and verify that the mean is close to 10 (= 5/0.5 = k/p) and the variance close to 10 (= (5 * 0.5)/0.5² = kq/p²) with (mu-var (sim-neg-binomial 1/2 5 100)).
11. (sim-ex-negbin p k n m) returns m simulated values from the negative binomial. (sim-ex-negbin 0.1 3 20 10) → close to 30 = 3/0.1 = k/p.
12. Compute the probabilities below by direct computation and check with the programs. a) (binomial 5 1/2 3); b) (negbin 1/2 3 5); c) (cbinomial 5 1/2 3).
13. (skewness sample) returns the skewness of the sample, α3 = E[(X - μ)³]/σ³. Show that the skewness of a binomial with p = 1/2 is ≈ 0 by simulating (skewness (sim-binomial 10 1/2 1000)), where sim-binomial returns from 1000 samples the number of successes in 10 Bernoulli trials with p = 1/2. Repeat with p = 3/4 and show that skewness is less than 0; repeat with p = 1/4 to show that skewness is greater than 0.
14. (pm n r) returns the probability of randomly returning to their owners exactly r hats from n hats. (poisson 1 r) returns an estimate of this probability. Rework Review problem 7 and try some other combinations to see that for n ≥ 7, the probability is practically independent of n. (pm 15 5) → 0.0030656622; (poisson 1 5) → 0.0030656620.
15. (sim-d-uniform a b n) returns n random samples from the discrete uniform on [a, b]. Find and compare a simulated sample mean and variance of the discrete uniform on the interval [2, 22] with its theoretical expected value and variance of 12 and 36.67, respectively, using (mu-svar (sim-d-uniform 2 22 1000)).
16. Compare the binomial probabilities with the Poisson probabilities for binomial parameters n = 200, p = 1/20, and Poisson parameter k = 10
for 0 to 9 occurrences. The software command (binomial-vs-poisson n p) returns the comparison for the first 10 x-values 0 to 9; for example, (binomial-vs-poisson 200 1/20) returned the table below. Try various values for n and p and observe that the difference is smaller for larger n and smaller p.

X    Binomial     Poisson
0    0.0000350    0.0000453
1    0.0003689    0.0004539
2    0.0019322    0.0022699
3    0.0067120    0.0075666
4    0.0173984    0.0189166
5    0.0358956    0.0378332
6    0.0614005    0.0630554
7    0.0895616    0.0900792
8    0.1137197    0.1125990
9    0.1276853    0.1251100
17. Consider the first quadrant of a unit circle defined by x² + y² = 1. If two randomly chosen values from the continuous uniform on [0, 1] are squared and summed, resulting in a value u, the probability that the point lies within the circle portion of the first quadrant is π/4. RV X is binomial in that the value u is either less than 1 (inside the circle) or greater than 1 (outside the circle). Thus the probability is fixed, and the trials are independent. Further, V(X) = npq = n * π/4 * (1 - π/4) = 0.1685478 * n. Thus an estimate for π is given by 4 times the number of successes (in's) divided by the number of trials. The command (pi-hat n) returns the estimate for π by counting the number of successes. For example, (pi-hat 500) returned (PI-HAT = 3.216 IN = 402 OUT = 98). If n = 1000, how many points are expected to fall within the unit circle portion of the first quadrant? ans. 1000π/4 = 785.3982300. (pi-hat 1000) may return (PI-HAT = 3.128 IN = 782 OUT = 218). Estimate π by running the following commands to complete the table below: (setf test-data (repeat #'pi-hat '(100 200 300 400 500 600 700 800 900 1000))) to find the number of successes in the first quadrant as the number of trials varies from 100 to 1000. (setf predictions (repeat #'* (list-of 10 (/ pi 4)) '(100 200 300 400 500 600 700 800 900 1000)))
Estimate the variance by using the command (setf npq (repeat #'* (list-of 10 0.1685478) '(100 200 300 400 500 600 700 800 900 1000))).

Trials (n)   Pi-hats   Successes (c)   Predictions   Variance = npq
100                                      78.54          16.85
200                                     157.10          33.71
300                                     235.62          50.56
400                                     314.16          67.42
500                                     392.70          84.27
600                                     471.24         101.13
700                                     549.78         117.98
800                                     628.32         134.84
900                                     706.86         151.69
1000                                    785.40         168.55
18. The game chuck-a-luck is played with the rolling of three dice. A player selects a number from 1 to 6 and places a bet, and the dice are rolled. The player receives what is bet on each die bearing the number and loses the bet if none of the dice bear the number. The software template (sim-chuck-a-luck die-num n) returns the simulated number of times of winning (with the money won in parentheses), the number of times losing, the probability of winning, and the number of times the selected die-num occurred from each of n games of chuck-a-luck. Try (sim-chuck-a-luck 3 100) for a simulated probability of winning at this game. (sim-chuck-a-luck 3 100) → (# Wins = 45 returning $51, # Losses = 55, P(win) = 0.45).
19. (sim-geometric p n) returns the number of trials at which the first success occurs. Tally and compare the experimental results with the theoretical results in the table for n = 100 trials and p = 1/2 using the following commands: (setf sg (sim-geometric 1/2 100)) followed by (print-count-a-b 1 12 sg). Simulated trials 1 (Y1) and 2 (Y2) are shown in the following table.
Number of Trials   Theoretical 100q^(x-1)p   Trial 1 Y1   Trial 2 Y2   Trial 3 Y3   Trial 4 Y4
1                  50                        51           48
2                  25                        26           27
3                  12.5                      15           14
4                  6.25                      4            7
5                  3.125                     3            2
6                  1.5625                    0            1
7                  0.7813                    0            0
8                  0.3906                    1            1
9                  0.1953                    0            0
10                 0.0977                    0            0
11                 0.0488                    0            0
12                 0.0244                    0            0
The expected value of a geometric RV with p = 1/2 is 2. The following software command multiplies the x-values 1-12 by their respective probabilities and sums to yield an estimate: (sum (repeat #'* (upto 12) (repeat #'expt (list-of 12 1/2) (upto 12)))). The command returned 1.9966.

20. Find p for which Σ(x=0 to 20) 50Cx p^x q^(50-x) = 0.7.
ans. (inv-binomial 50 20 0.7) → 0.374707. (cbinomial 50 0.3747075 20) → 0.7.
21. Find k for which P(Poisson RV X ≤ 10) = 0.3. ans. (inv-poisson 0.3 10) → 12.46. (cpoisson 12.46 10) → 0.30.
22. Find p for which Σ(x=0 to 20) 50Cx p^x q^(50-x) = 0.7.
23. Find k for which P(Poisson RV X ≤ 10) = 0.3. (inv-poisson k x); (inv-poisson 0.3 10) → 12.46.
24. Simulate the slot machine. (sim-slot n) returns n plays of the slot machine in problem 68. For example, (setf slots (sim-slot 12)) may return
((ORANGE BELL LEMON) (PLUM BELL BELL) (CHERRY CHERRY PLUM) (ORANGE PLUM BELL) (PLUM CHERRY ORANGE) (PLUM CHERRY BELL) (ORANGE CHERRY BELL) (PLUM BELL LEMON) (ORANGE CHERRY ORANGE) (CHERRY ORANGE LEMON) (ORANGE BELL BELL) (PLUM BELL ORANGE))

(repeat #'member (list-of 12 'orange) slots) returned
((ORANGE BELL LEMON) (ORANGE) (ORANGE CHERRY ORANGE) NIL NIL (ORANGE LEMON) NIL (ORANGE CHERRY BELL) (ORANGE BELL BELL) (ORANGE PLUM BELL) NIL (ORANGE))

Count the number of lists with orange as a member, 8, to get a rough estimate of 8/12 for at least one orange. The exact probability is 0.568. Try (pick-a-fruit fruit n) using the command (pick-a-fruit 'orange 1000) to simulate the probability, and repeat the command several times to see that the estimate fluctuates about 0.568. Compute and verify through simulation the probability of at least one bar. (pick-a-fruit 'bar 1000) → 0.19.

25. Compute the probability that the sum of 2 dice is greater on the second roll. ans. 0.443133. For example, (binomial 2 1/36 2) is the probability that the two sums are 2, (binomial 2 2/36 2) is the probability that the two sums are 3, and so forth. The probability that the two sums are the same is given by the command (sum (repeat #'binomial (list-of 11 2) (repeat #'/ '(1 2 3 4 5 6 5 4 3 2 1) (list-of 11 36)) (list-of 11 2))). The command returns 0.112654, summing the list of probabilities (7.716049e-4 3.08642e-3 6.944444e-3 0.012346 0.01929 0.027778 0.01929 0.012346 6.944444e-3 3.08642e-3 7.716049e-4), the probability of the same totals from 2-12 on both rolls. P(different sums) = (- 1 0.112654) → 0.887346 ⟹ (/ 0.887346 2) → 0.443673, the probability of the sum being greater (exclusively, or lesser) on the second roll. (sim-sum-2-dice 15000) → 6647 for a simulated probability of 6647/15,000 = 0.443133.

(defun sim-sum-2-dice (n)
  (let ((cnt 0))
    (dotimes (i n cnt)                    ; when done, return count
      (if (< (+ 2 (random 6) (random 6))  ; first dice sum
             (+ 2 (random 6) (random 6))) ; < second dice sum
          (incf cnt)))))                  ; increment count
SELF QUIZ 3: DISCRETE DISTRIBUTIONS

1. The variance of RV X being the outcome sum from rolling a fair pair of dice is _____.
2. A firm sells 5 items randomly from a large lot of items of which 10% are defective. The cost of these defectives is given by C = 2X² + 3X + 2, where X is the number of defectives among the 5 items sold. Find the expected repair cost.
3. The moment generating function for a binomial RV is given by _____.
4. You arrive at a bus stop at 8 o'clock, knowing that the bus arrives at a time uniformly distributed between 8 and 8:30. If at 8:15 the bus has not arrived, the probability that you will have to wait at least an additional 10 minutes is _____.
5. The number of colds a person gets each year is Poisson with k = 4. A new drug reduces k to 3 for 60% of the population and has no effect on the remaining 40%. If a person tries the drug for a year and has 1 cold, find the probability that the drug benefits the person.
6. Compute the p that favors a 1-engine aircraft over a 3-engine aircraft under the assumptions of using identical engines and that at least half of the engines are needed to sustain flight.
7. The daily outputs of four sugar refineries are exponential RVs with mean equal to 3 tons. Compute the probability that exactly 2 of the 4 plants process more than 3 tons on a given day.
8. A missile protection system has n radar sets that independently monitor the same area, with each having a probability of detection of 0.25. The number of radar sets needed to have probability 0.95 of detecting entering aircraft is _____.
9. A couple will have children until 3 female children occur. The probability that the third child is the first female is _____, and the probability that they will have 7 children is _____. Assume P(female) = 0.4.
10. A shipment contains 30 printed circuit boards. A sample of 10 is tested. If no more than 1 defect is found, the shipment is accepted. Assume a 10% defect rate to find the probability of acceptance.
11. A coin is flipped repeatedly with P(heads) = 1/20. The expected number of flips until heads occurs is _____, and the expected number of flips until both faces occur is _____.
12. For a Poisson fly-breeding process with k = 10 flies a minute and P(male) = 0.4, compute the probability using software commands that a) exactly 10 flies are born in the next minute; b) exactly 10 flies are born in exactly 3 of any of the next 5 1-minute intervals; c) the 3rd fly born is the first female; d) the 6th fly born is the 3rd female; e) exactly 2 flies will be born in the next 10 seconds; f) exactly 40 male flies occur in a sample of 100 from 200 flies.
Chapter 4
Special Continuous Distributions
How I wonder where you are!
This chapter introduces the more important continuous distributions—the continuous uniform, exponential, Gamma, normal, T, chi-square, F, beta, and Weibull. The normal distribution is the most used distribution for modeling natural phenomena and activities. For each distribution the expected value, variance, and moment generating function along with entropy are derived. Examples illustrate applications of the distributions.
4.0 Introduction
4.1 Continuous Uniform Distribution
4.2 Gamma Function
4.3 Gamma Family (Gamma, Exponential, Chi-Square)
4.4 Exponential Distribution
4.5 Chi-Square Distribution
4.6 Normal Distribution
4.7 Student t Distribution
4.8 Beta Distribution
4.9 Weibull Distribution
4.10 F Distribution
4.11 Summary
4.0 Introduction

Continuous random variables may assume an uncountable, infinite number of values and at times serve as adequate models for discrete RVs with countably infinite ranges. The most special continuous distribution is the normal distribution, also called the bell or Gaussian curve. The continuous uniform is most useful in illustrating mathematical concepts of randomness and in the simulation of other random variables. The gamma distribution consists of a family of useful distributions of which the exponential and chi-square are members. This family of distributions is often used to model queuing situations. The normal distribution is often used to model natural phenomena and activities. The Weibull is used in reliability theory for time-to-fail models. Interrelationships of the distributions are shown.
4.1 Continuous Uniform Distribution

The continuous uniform distribution is the usual model for the mathematical concept of randomness. Often the designator U[0, 1] is used for the uniform on the interval [0, 1]. The density function for the continuous uniform RV X on [a, b] is

f(x) = 1/(b - a) for a ≤ x ≤ b.  (4-1)

E(X) = [1/(b - a)] ∫(a to b) x dx = x²/[2(b - a)] evaluated from a to b = (b² - a²)/[2(b - a)] = (b - a)(b + a)/[2(b - a)] = (a + b)/2.

Similarly, E(X²) = (b³ - a³)/[3(b - a)] = (b² + ab + a²)/3,

from which

V(X) = E(X²) - E²(X) = (b² + ab + a²)/3 - (a² + 2ab + b²)/4 = (b - a)²/12.

M(t) = [1/(b - a)] ∫(a to b) e^(tx) dx = e^(tx)/[t(b - a)] evaluated from a to b = (e^(bt) - e^(at))/[t(b - a)] for t ≠ 0;
M(t) = 1 when t = 0, using L'Hospital's rule.
The moments can be seen more clearly from the series expansions

e^x = 1 + x + x²/2! + x³/3! + . . . + x^n/n! + . . .
e^(bt) = 1 + bt + b²t²/2! + b³t³/3! + . . .
e^(at) = 1 + at + a²t²/2! + a³t³/3! + . . . ,

so that

M(t) = (e^(bt) - e^(at))/[(b - a)t]
     = [(1 - 1) + (b - a)t + (b² - a²)t²/2! + (b³ - a³)t³/3! + . . .]/[(b - a)t]
     = 1 + (b + a)t/2! + (b² + ab + a²)t²/3! + . . . .

M'(t) = (b + a)/2! + 2(b² + ab + a²)t/3! + . . .
M'(0) = (b + a)/2 = E(X).
M''(t) = 2(b² + ab + a²)/3! + . . .
M''(0) = 2(b² + ab + a²)/3! = (b² + ab + a²)/3 = E(X²).
V(X) = E(X²) - E²(X) = (b² + ab + a²)/3 - (b + a)²/4 = (b - a)²/12.
The entropy of a continuous density function is defined as H(X) = -∫(-∞ to ∞) f(x) Log2 f(x) dx. The continuous uniform entropy is thus computed as

H(X) = -∫(a to b) [1/(b - a)] Log2[1/(b - a)] dx = Log2(b - a) = Log2(2√3 σ).

Note that entropy depends on the length of the interval, as does the variance causing the uncertainty. Note also that defining entropy for continuous RVs can violate H(X) ≥ 0: for (b - a) ≤ 1, the continuous uniform entropy is negative. Among all distributions on a finite interval, the continuous uniform has the maximum entropy.
The template (sim-uniform a b n) returns a random sample from the continuous uniform on [a, b]. For example, (sim-uniform 0 1 100) may return

0.12 0.04 0.13 0.22 0.64 0.69 0.99 0.76 0.90 0.87
0.06 0.30 0.33 0.73 0.76 0.54 0.11 0.94 0.28 0.22
0.94 0.25 0.81 0.50 0.63 0.18 0.24 0.95 0.57 0.10
0.31 0.66 0.02 0.81 0.03 0.26 0.61 0.64 0.65 0.96
0.56 0.63 0.50 0.62 0.59 0.22 0.66 0.70 0.41 0.44
0.06 0.55 0.28 0.81 0.57 0.31 0.08 0.21 0.22 0.44
0.89 0.42 0.55 0.66 0.96 0.30 0.75 0.78 0.56 0.52
0.06 0.74 0.15 0.59 0.63 0.83 0.90 0.40 0.80 0.84
0.36 0.56 0.54 0.74 0.87 0.41 0.37 0.07 0.10 0.25
0.18 0.09 0.72 0.25 0.64 0.44 0.04 0.22 0.19 0.87

from which (mu-svar *) returned 0.49 ≈ 0.5 = μ and 0.07 ≈ 1/12 = σ².
EXAMPLE 4.1

Find and graph the cumulative distribution F(x) and density f(x) for the continuous uniform random variable X on the interval [0, 1], showing that F' = f.

Solution

F(x) = ∫(-∞ to x) f(t) dt = x on [0, 1], with F(x) = 0 for x < 0 and F(x) = 1 for x > 1. (The graph of F rises linearly from 0 to 1 across [0, 1].)

f(x) = F' = 1 on [0, 1]. (The graph of f is the constant 1 on [0, 1].)
EXAMPLE 4.2
The number of defective solder joints follows a Poisson distribution. In a particular 8-hour workday, one defect was found. Compute the probability that the defect was produced during a) the first hour of soldering or b) the last hour of soldering. c) Given that no defects were found during the first 4 hours, compute the probability that a defect was found during the fifth hour.
Solution Since exactly one defect was found, the time of occurrence of the defect is continuous uniform on [0, 8] throughout the day, while the number of occurrences is Poisson.

a) P(X ≤ 1) = ∫(0 to 1) (1/8) dx = 1/8.
b) P(7 ≤ X ≤ 8) = ∫(7 to 8) (1/8) dx = 1/8.
c) P(4 ≤ X ≤ 5 | X > 4) = P(4 ≤ X ≤ 5, X > 4)/P(X > 4) = [∫(4 to 5) (1/8) dx]/[∫(4 to 8) (1/8) dx] = (1/8)/(1/2) = 1/4.

EXAMPLE 4.3
Suppose waiting time X is uniformly distributed from 1 to 3 hours. The cost in dollars of a delay is given by C = 12 + 5X². a) Compute the probability that the waiting time is 2 or more hours. b) Compute the expected cost of a delay. c) Compute the probability of X exceeding μ + 2σ.

Solution
a) Since X is continuous uniform on [1, 3], f(x) = 1/(b - a) = 1/(3 - 1) = 1/2, so P(X ≥ 2) = ∫(2 to 3) (1/2) dx = 1/2.

b) E(C) = E(12 + 5X²) = 12 + 5E(X²) = 12 + 5 ∫(1 to 3) x²(1/2) dx = 12 + 5x³/6 evaluated from 1 to 3 = 101/3, so E(C) = $33.67.

c) μ = E(X) = (1 + 3)/2 = 2; σ = (3 - 1)/√12 = 1/√3 ≈ 0.5774.
P(X > μ + 2σ) = P(X > 2 + 2 * 0.577) = P(X > 3.155) = 0, since the density is 0 beyond x = 3.
EXAMPLE 4.4

Given that RV X is continuous uniform on [0, 1], find the density function for RV Y = X^n for integers n > 1.

Solution Using the transformation of variables formula fY(y) = fX[g⁻¹(y)] |dx/dy| with Y = g(X) = X^n, X = g⁻¹(y) = y^(1/n), and dy = nx^(n-1) dx, we have

fY(y) = 1 * |dx/dy| = 1/(nx^(n-1)) = 1/[n(y^(1/n))^(n-1)] = (1/n) y^((1-n)/n); 0 ≤ y ≤ 1.

Equivalently, FY(y) = P(Y ≤ y) = P(X^n ≤ y) = P(X ≤ y^(1/n)) = ∫(0 to y^(1/n)) dx = y^(1/n), and

FY'(y) = fY(y) = (1/n) y^((1-n)/n) for 0 ≤ y ≤ 1, n = 1, 2, . . . .

In particular, for n = 2, fY(y) = 1/(2√y) for y on [0, 1].

EXAMPLE 4.5
Show that if RV X is continuous uniform on [a, b], then RV Y = cX + d is continuous uniform on [ca + d, cb + d].

Solution Using the transformation of variables with dx/dy = 1/c, we see that

fY(y) = [1/(b - a)] * |1/c| = 1/[c(b - a)]; ca + d ≤ y ≤ cb + d.

Note that the interval length is (cb + d) - (ca + d) = c(b - a), and the area under the density function y = 1/[c(b - a)] is equal to 1.
4.2 Gamma Function

The Gamma function serves as a normalizing constant for the gamma family of continuous distributions, including the exponential and the chi-square distributions, and also for the Weibull, beta, and F distributions. The Gamma function is capitalized to distinguish its reference from the gamma density function. The Gamma function is defined by the integral

Γ(α) = ∫(0 to ∞) x^(α-1) e^(-x) dx, α > 0.  (4-2)

Γ(1) = ∫(0 to ∞) e^(-x) dx = -e^(-x) evaluated from 0 to ∞ = 1.

Γ(α + 1) = ∫(0 to ∞) x^α e^(-x) dx = -x^α e^(-x) from 0 to ∞ + ∫(0 to ∞) αx^(α-1) e^(-x) dx = 0 + αΓ(α) = αΓ(α),

with u = x^α, du = αx^(α-1) dx, dv = e^(-x) dx, v = -e^(-x), and integrating u dv by parts. Thus Γ(α + 1) = αΓ(α), and for any nonnegative integer n, Γ(n + 1) = n!. The Gamma function is frequently referred to as the factorial function because of this recursive multiplicative property. Note that Γ(n) = (n - 1)!.
EXAMPLE 4.6

Use the Gamma function to show that ∫(0 to ∞) e^(-x) dx = 1.

Solution
Method I: ∫(0 to ∞) e^(-x) dx = Γ(1) = 0! = 1.
Method II: Recognize the integral as an exponential density function with k = 1.
Method III: Integrate directly: ∫(0 to ∞) e^(-x) dx = -e^(-x) evaluated from 0 to ∞ = 1.

EXAMPLE 4.7

Use the Gamma function to evaluate
a) ∫(0 to ∞) x⁵ e^(-x/2) dx;  b) ∫(0 to ∞) (x - 2)² e^(-x/2) dx.

Solution
a) Let y = x/2 with 2dy = dx and x = 2y. Then
∫(0 to ∞) x⁵ e^(-x/2) dx = ∫(0 to ∞) 32y⁵ e^(-y) 2dy = 64Γ(6) = 64 * 5! = 7680.
b) ∫(0 to ∞) (x - 2)² e^(-x/2) dx = ∫(0 to ∞) (2y - 2)² e^(-y) 2dy = 2 ∫(0 to ∞) (4y² - 8y + 4) e^(-y) dy = 8Γ(3) - 16Γ(2) + 8Γ(1) = 16 - 16 + 8 = 8.

The command (Gamma a) returns (a - 1)!: (Gamma 5) → 24 = 4!; (Gamma 5/3) returns 0.903 as the fractionalized factorial value. (inc-Gamma-fn a x) returns the value of the incomplete Gamma integral ∫(0 to x) t^(a-1) e^(-t) dt normalized by Γ(a). (inc-Gamma-fn 5 3) returns 0.185, the value of ∫(0 to 3) x⁴ e^(-x) dx divided by Γ(5) = 24.
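A crude numerical check of the normalized incomplete Gamma value (an added sketch using a midpoint-rule sum; the helper name is hypothetical, not the book's implementation):

(defun inc-gamma-approx (a x &optional (steps 100000))
  ;; approximate [integral from 0 to x of t^(a-1) e^(-t) dt] / Gamma(a)
  ;; for integer a, with Gamma(a) = (a - 1)!
  (let ((h (/ (float x) steps))
        (sum 0.0)
        (gamma-a (loop with g = 1 for i from 1 below a
                       do (setf g (* g i)) finally (return g))))
    (dotimes (i steps (/ sum gamma-a))
      (let ((tm (* h (+ i 0.5))))      ; midpoint of each subinterval
        (incf sum (* h (expt tm (- a 1)) (exp (- tm))))))))

(inc-gamma-approx 5 3) returns approximately 0.185.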
4.3 Gamma Family (Exponential, Chi-Square, Gamma)

With the Gamma function we now define the gamma density distribution with parameters k and α for RV X as

f(x; α, k) = k^α x^(α-1) e^(-kx)/Γ(α) for k, α, and x ≥ 0.
Figure 4.1a Gamma Densities for k = 1 and α = 2, 3, 4, 5
Figure 4.1b Gamma Densities for α = 1 and k = 1/2, 1/3, 1/4, 1/5
For k = 1, the distribution is called the standard gamma. The parameter k determines the scale of the gamma density: the smaller the parameter k, the less peaked (lower kurtosis) the gamma density distribution. For α = 1, the gamma distribution simplifies to the exponential distribution. The parameter α determines the location and shape of the gamma density. Using a combination of k and α creates a suitable family of functions to fit various experimental data. See Figures 4.1a and 4.1b. The gamma distributions are appropriate for component failure data and waiting times for service at queues. The calculations of moments can be done just as simply for general parameters as for specific parameters.
E(X^n) = [k^α/Γ(α)] ∫(0 to ∞) x^(α+n-1) e^(-kx) dx = [k^α/(k^(α+n) Γ(α))] ∫(0 to ∞) y^(α+n-1) e^(-y) dy = Γ(α + n)/[k^n Γ(α)],

with y = kx and dy = k dx. For n = 1,

E(X) = Γ(α + 1)/[kΓ(α)] = αΓ(α)/[kΓ(α)] = α/k.

Similarly, for n = 2,

E(X²) = Γ(α + 2)/[k²Γ(α)] = (α + 1)αΓ(α)/[k²Γ(α)] = (α + α²)/k²,

implying that

V(X) = E(X²) - E²(X) = (α + α²)/k² - (α/k)² = α/k².

M(t) = [k^α/Γ(α)] ∫(0 to ∞) x^(α-1) e^(-(k-t)x) dx = k^α/(k - t)^α.  (4-3)

M(0) = 1, and the first moment as well as all the others can be recovered through successive differentiation. For example,

M'(t) = αk^α(k - t)^(α-1)/(k - t)^(2α) ⟹ M'(0) = αk^α k^(α-1)/k^(2α) = α/k.

Since M(t) = k^α/(k - t)^α, the sum of independent gamma RVs Xi with parameters αi and common k is also a gamma RV X. That is, for X = X1 + X2 + . . . + Xn,

MX(t) = Π MXi(t) = [k/(k - t)]^(α1 + . . . + αn).

If E(X) and E(Ln X) are specified for RV X, the gamma density has the maximum entropy.
EXAMPLE 4.8
a) Given a gamma RV X with α = 3 and k = 2, suppose X indicates the time in years between failures of an electronic component. Should one be suspicious of the model if the first failure occurred in a month? b) Compute P(X ≤ 1.5 years) and P(X ≤ 1 month).
Solution

a) E(X) = α/k = 3/2 year; V(X) = α/k² = 3/4. With an expectation of 1.5 years, or 18 months, one should be suspicious of the model for the occurrence of a failure in just one month.

b) With the substitution y = 2x,

P(X ≤ 1.5 years) = [2³/Γ(3)] ∫(0 to 1.5) x² e^(-2x) dx = (1/2) ∫(0 to 3) y² e^(-y) dy = 1 - 8.5e⁻³ = 0.5768 = (inc-Gamma-fn 3 3), since Γ(3) = 2 cancels the factor 1/2.

P(X ≤ 1/12 year) = (inc-Gamma-fn 3 1/6) = 0.00068.

The command (sim-gamma a k n) returns n random samples from the gamma distribution with parameters a and k. For example, (setf data (sim-gamma 3 2 100)) may return
0.71 0.78 1.33 1.20 1.79 1.63 0.96 1.68 3.30 2.21
0.86 0.49 0.93 2.61 2.32 1.86 2.26 1.90 0.90 1.55
1.01 0.40 1.10 1.41 1.45 2.10 2.15 1.04 2.12 3.24
2.37 1.23 0.60 5.29 1.64 1.92 0.53 1.96 1.53 3.63
1.86 2.15 0.19 0.52 2.85 1.41 1.20 0.71 0.79 0.93
1.18 1.01 0.96 2.18 1.17 1.02 0.50 2.47 1.45 1.73
1.03 1.22 0.92 0.62 1.24 1.66 0.90 1.36 0.40 1.53
0.96 1.28 1.91 2.15 0.80 1.54 1.39 1.57 0.50 0.39
0.45 0.47 1.52 1.87 1.68 1.87 1.49 1.60 1.52 2.53,
from which (mu-svar data) returned 1.52 = x̄ ≈ 3/2 = μ and 0.66 = s² ≈ 3/4 = σ².
4.4 Exponential Distribution

The exponential distribution for RV X is frequently used to model the failure rates of electronic components and the waiting times in queues. The exponential density (Figure 4.2) serves as an excellent model for the time between random occurrences, for example, the time between Poisson occurrences. Suppose there are k occurrences per time or space measurement of an event for a Poisson RV X. If there are no occurrences of the event in the time (or space) interval [0, x], then
F(x) = P(X ≤ x) = 1 - P(X > x) = 1 - P(no occurrence in time interval [0, x]) = 1 - e^(-kx). Then F'(x) = f(x) = ke^(-kx). P(exponential RV X > x) = P(Poisson RV X = 0 occurrences | kx occurrences per time), that is,

P(X > x) = ∫(x to ∞) ke^(-kt) dt = -e^(-kt) evaluated from x to ∞ = e^(-kx).

The density function for the exponential random variable X is given by

f(x) = ke^(-kx) for x > 0, parameter k > 0,

which we see is a gamma density distribution with α = 1. From the expected value α/k for the gamma density, E(X) = 1/k for the exponential. The cumulative distribution function for the exponential is F(x) = P(X ≤ x) = 1 - e^(-kx).

Figure 4.2 Exponential Densities for k = 1, 2

From the gamma density, the moment generating function (Equation 4-3) for the exponential with α = 1 is

M(t) = k/(k - t), k > t.
M'(t) = k/(k - t)² ⟹ M'(0) = E(X) = 1/k.
M''(t) = 2k(k - t)/(k - t)⁴ ⟹ M''(0) = E(X²) = 2/k²,
from which V(X) = 1/k².
The entropy of an exponential RV is Log2(e/k) = Log2(σe), since σ = 1/k. If only E(X) of RV X is specified, the exponential distribution has the maximum entropy of all the distributions on [0, ∞). As more constraints are specified, the entropy or amount of uncertainty lessens.

EXAMPLE 4.9
RV X has density function f(x) = ce^(-2x) for x > 0. Find a) P(X > 2) and P(X < 1/c); b) P(1σ < X < 2σ) and P(X < 5σ).

Solution First compute the value of the constant c:

∫(0 to ∞) ce^(-2x) dx = -(c/2)e^(-2x) evaluated from 0 to ∞ = c/2 = 1 ⟹ c = 2.

(Knowing that f(x) is an exponential density, we can conclude directly that c = 2.)

a) P(X > 2) = ∫(2 to ∞) 2e^(-2x) dx = e⁻⁴ = 0.018316 = (U-exponential 2 2).
P(X < 1/c) = P(X < 1/2) = ∫(0 to 1/2) 2e^(-2x) dx = 1 - e⁻¹ = 0.63212. (L-exponential 2 1/2) → 0.6321212.
Note that about 63% of any exponentially distributed RV falls below the mean, that is, ∫(0 to 1/k) ke^(-kx) dx = 0.63212.

b) σ² = 1/k² = 1/4 ⟹ σ = 1/2.
P(1σ < X < 2σ) = P(1/2 < X < 1) = (exponential-a-b 2 1/2 1) → 0.232544.
P(X ≤ 5σ) = P(X ≤ 5/2) = (exponential 2 5/2) → 0.99362.
The command (exponential-a-b k a b) returns P(a < X < b) for exponential RV X with parameter k. For example, P(1 < X < 2 | k = 2) is (exponential-a-b 2 1 2), returning 0.1170. (L-exponential k x) returns the lower tail: P(X < 1 | k = 1) is (L-exponential 1 1), returning 0.63212. (U-exponential k x) returns the upper tail: P(X > 1 | k = 1) is (U-exponential 1 1), returning 0.36788.
EXAMPLE 4.10
The length of time X to complete a job is exponentially distributed with E(X) = μ = 1/k = 10 hours. a) Compute the probability of the time between two consecutive job completions exceeding 20 hours. b) The cost of job completion is given by C = 4 + 2X + 2X². Find the expected value of C.

Solution

a) P(X ≥ 20) = ∫(20 to ∞) (1/10)e^(-x/10) dx = e⁻² = 0.13533; (U-exponential 1/10 20).
Equivalently, using the Poisson formulation, in 20 hours the average number of job completions is 2: P(XP = 0 | kP = 2 jobs) = e⁻² = 0.13533; (poisson 2 0).

b) For exponential RV X, E(X) = μ = 1/k = 10, and V(X) = 1/k² ⟹ E(X²) = V(X) + E²(X) = 2/k² = 2μ² = 200.
E(C) = E(4 + 2X + 2X²) = 4 + 2μ + 2(2μ²) = 4 + 2 * 10 + 2 * 200 = $424.
(sim-exponential k n) returns n random samples from the exponential distribution with parameter k. For example, (setf data (sim-exponential 1/10 100)) may return
18.57 0.45 1.01 2.30 20.65 18.76 3.02 10.05 12.61 11.00
16.39 6.68 4.05 6.57 7.78 11.32 13.81 1.23 5.26 7.35 3.10 14.07 9.76 2.18 0.99 8.49 41.36 8.95 5.36 3.40 5.10 40.32 21.06 2.50 27.01 11.62 7.80 8.47 4.66 32.81
8.96 3.15 9.11 9.17 10.93 3.48 12.49 19.56 18.24 0.21 6.56 2.87 13.07 35.91 3.99 32.79 0.11 0.42 1.08 10.60 7.83 0.36 6.52 3.93 11.62 11.09 11.86 2.16 1.69 9.03 24.25 40.86 14.61 2.16 6.49 15.58 6.40 34.85 31.17 6.42 22.27 8.03 4.07 2.99 2.85 19.16 7.68 11.30 1.83 13.72
from which (mu-svar data) returns 10.66 = x̄ ≈ 10 = μ and 94.21 = s² ≈ 100 = σ².
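The draws behind (sim-exponential k n) can be sketched with the inverse-transform method (an assumed implementation, not the book's code): if U is uniform on (0, 1), then X = -Ln(U)/k is exponential with parameter k, since P(X > x) = P(U < e^(-kx)) = e^(-kx).

(defun sim-exponential (k n)
  ;; n exponential(k) samples: X = -ln(U)/k with U in (0, 1]
  (loop repeat n collect (/ (- (log (- 1.0 (random 1.0)))) k)))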
EXAMPLE 4.11
Memoryless Property

a) Show that for exponential RV X with f(x) = ke^(-kx), P(X > a + b | X > b) = P(X > a). b) Also show, equivalently, that P(X < a + b | X > a) = P(X < b). c) Then show that the minimum of independent exponential RVs is exponential with parameter k = Σki.

Solution

a) For example, the probability of a component surviving more than 50 minutes, given that it has survived for at least 30 minutes, is equal to the probability that a component will survive for at least the next 20 minutes.

P(X > a + b | X > b) = P(X > a + b, X > b)/P(X > b) = P(X > a + b)/P(X > b)
= [∫(a+b to ∞) ke^(-kx) dx]/[∫(b to ∞) ke^(-kx) dx] = e^(-k(a+b))/e^(-kb)
= e^(-ka) = 1 - F(a) = P(X > a).

b) For example, the probability that a component fails in less than 50 minutes, given that it has survived for at least 30 minutes, is equal to the probability that a component fails during its next 20 minutes.

P(X < a + b | X > a) = P(X < a + b, X > a)/P(X > a) = P(a < X < a + b)/P(X > a)
= [∫(a to a+b) ke^(-kx) dx]/[∫(a to ∞) ke^(-kx) dx] = [e^(-ka) - e^(-k(a+b))]/e^(-ka)
= 1 - e^(-kb) = F(b) = P(X < b).

c) Let Y = min{X1, X2, . . . , Xn} with parameter ki for RV Xi. Note that ∫(y to ∞) ke^(-kx) dx = e^(-ky). Then

P(Y > y) = P(X1 > y)P(X2 > y) . . . P(Xn > y) = e^(-k1y) e^(-k2y) . . . e^(-kny) = e^(-(Σki)y),

so Y is exponential with parameter Σki.
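The memoryless property can be spot-checked by simulation (a hypothetical helper in the spirit of the book's sim- templates):

(defun check-memoryless (k a b n)
  ;; compare P(X > a) with P(X > a + b | X > b) by simulation
  (let ((gt-a 0) (gt-b 0) (gt-ab 0))
    (dotimes (i n)
      (let ((x (/ (- (log (- 1.0 (random 1.0)))) k)))  ; one exponential draw
        (when (> x a) (incf gt-a))
        (when (> x b)
          (incf gt-b)
          (when (> x (+ a b)) (incf gt-ab)))))
    (list (float (/ gt-a n)) (float (/ gt-ab gt-b)))))

For example, (check-memoryless 1 1 2 100000) returns two ratios, both close to e⁻¹ = 0.36788.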
4.5 Chi-Square Distribution

Another of the gamma family distributions, with α = v/2 and k = 1/2, is the chi-square distribution, given by density

f(x; v) = (1/2)^(v/2) x^(v/2-1) e^(-x/2)/Γ(v/2) for x ≥ 0.  (4-4)
The degrees of freedom parameter is v, and the density is often designated by the chi-square symbol χ². The chi-square distribution is useful for hypothesis testing in determining independence of RVs. The moments of the chi-square RV can be generated from the gamma's moment generating function with α = v/2 and k = 1/2:

M(t) = k^α/(k - t)^α = [(1/2)/(1/2 - t)]^(v/2).

Note that M(0) = 1, which implies that the integral of (4-4) is 1.

M'(t) = v(1/2)^(v/2)/[2(1/2 - t)^(v/2+1)],

and M'(0) = E(X) = (v/2)/(1/2) = v, the degrees of freedom. M''(0) = E(X²) = v² + 2v, from which V(X) = v² + 2v - v² = 2v.

A chi-square (χ²) RV is the sum of squared independent standard normal RVs. That is, χ² = Z1² + Z2² + . . . + Zv² with v degrees of freedom (df). Also note that chi-square RVs are additive: the sum of two chi-square RVs with df of v1 and v2 is a chi-square RV with v1 + v2 df. The chi-square density distributions are shown in Figure 4.3 for 1, 10, and 20 df.
Figure 4.3 Chi-Square Densities with v = 1, 10, 20
EXAMPLE 4.12
Compute the probability that chi-square RV X < 4 with v = 4.

Solution

P(X < 4) = [1/(4Γ(2))] ∫(0 to 4) x e^(-x/2) dx = ∫(0 to 2) y e^(-y) dy = e^(-y)(-y - 1) evaluated from 0 to 2 = -3e⁻² + 1 = 0.594.

A change of variable with y = x/2 is used in the calculation of the integral. Notice that ∫(0 to 2) y e^(-y) dy is equivalent to (inc-Gamma-fn 2 2) → 0.593994.
The command (chi-square v x) returns P(X < x | v df). For example, (chi-square 4 4) returns 0.59399, which is P(X ≤ 4) with 4 degrees of freedom. The inverse function (inv-chi-sq 4 0.59399) returns the chi-square value x = 4.
EXAMPLE 4.13
Compute E(Xn) for the gamma distribution and use the concept to a) compute E(X), E(X 2), and V(X) for the chi-square with a = v/2 and k = 1/2, and b) confirm E(X), E(X 2), and V(X) for the exponential with a = 1. E( X n ) =
Ú
•
ka x a + n -1e - kx
0
ka
dx =
G(a )
k
a +n
G(a )
G(a + n ) =
G(a + n ) k n G(a )
Solution a) Chi-square For n = 1, E( X ) =
G(a + 1)
For n = 2, E( X 2 ) =
kG(a )
=
G(a + 2) 2
k G(a )
aG(a )
=
kG(a ) =
a
2v
=
= v.
2
k
a (a + 1)G(a ) 2
k G(a )
=
a (a + 1) k2
V ( X ) = E( X 2 ) - E 2 ( X ) = v( v + 2) - v2 = 2v. b) Exponential For n = 1, E( X ) =
G(1 + 1)
For n = 2, E( X 2 ) =
kG(1)
=
G(1 + 2) 2
k G(1)
1
.
k =
2 k2
.
= v( v + 2).
P369463-Ch004.qxd 9/2/05 11:13 AM Page 231
4.6 Normal Distribution
231
V( X ) =
2 k
2
1
-
k
=
2
1 k2
.
(sim-chi-square v n) returns n random samples from the χ² distribution with parameter v. For example, (setf data (sim-chi-square 10 100)) followed by (mu-svar data) returned 11.28 = x̄ ≈ 10 = μ and 22.37 = s² ≈ 20 = σ².
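The representation χ² = Z1² + . . . + Zv² gives a direct way to simulate a single chi-square draw (helper names hypothetical; the Box-Muller transform below is a standard normal generator, not necessarily the book's method):

(defun random-normal ()
  ;; one standard normal via the Box-Muller transform
  (* (sqrt (* -2 (log (- 1.0 (random 1.0)))))
     (cos (* 2 pi (random 1.0)))))

(defun one-chi-square (v)
  ;; sum of v squared independent standard normals
  (loop repeat v sum (expt (random-normal) 2)))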
4.6 Normal Distribution

The most frequent distribution used to model natural phenomena is the normal distribution. Observing the heights, weights, IQ scores, and other measurements of people seems to indicate the appropriateness of the normal distribution. Watching a group of children run around the track will soon show in evidence the few leaders, the mass in the middle, and the few stragglers. The normal distribution is frequently used to approximate the binomial and Poisson distributions. The standard normal RV arises from the Central Limit Theorem, which states that for a set of independent and identically distributed (iid) RVs {Xi}, W = ΣXi has E(W) = nμ and V(W) = nσ². The implication from the Central Limit Theorem is that the density function for the standardized W approaches the standard normal density as n → ∞, regardless of the distribution from which the Xi were obtained. Thus the Central Limit Theorem is the reason why so many natural occurrences in our everyday world seem so normally distributed.

The standard normal RV Z is usually indicated as N(0, 1), corresponding to N(μ, σ²). The symbol Φ(z) is used to indicate the cumulative distribution function. That is,

Φ(z) = P(Z ≤ z) = [1/√(2π)] ∫(-∞ to z) e^(-z²/2) dz.

The normal RV X with distribution N(μ, σ²) has density function given by

f(x) = [1/(√(2π)σ)] e^(-(x-μ)²/(2σ²)) for -∞ < x < ∞.  (4-5)
Setting f'(x) = 0 reveals (x - μ) = 0, with a maximum occurring at x = μ, and setting f''(x) = 0 shows points of inflection at x = μ ± σ. Also, f(μ - x) = f(μ + x), indicating symmetry about the vertical axis through x = μ. The normal curve is often called the bell curve because of its shape, or the Gaussian curve because Carl Gauss studied it. The standard normal density function and the normal density with μ = 10 and σ = 2 are shown in Figure 4.4. A set of normal curves centered at μ = 0 with standard deviations of 1/4, 1, 2, 3, and 4 is shown in Figure 4.5. The cumulative standard normal distribution is shown in Figure 4.6. We can use the moment generating function to generate the first and second moments.

Figure 4.4 Normal Densities: Standard or Unit Normal (μ = 0, σ = 1); Normal Density with μ = 10, σ = 2
Figure 4.5 Normal Curves Centered at μ = 0 with σ = 1/4, 1, 2, 3, 4
Figure 4.6 Cumulative Standard Normal Distribution
M(t) = E(e^(tX)) = [1/(√(2π)σ)] ∫(-∞ to ∞) e^(tx) e^(-(x-μ)²/(2σ²)) dx;

let y = (x - μ)/σ with σ dy = dx. Then e^(tX) = e^(t(σy+μ)), and

M(t) = [e^(μt)/√(2π)] ∫(-∞ to ∞) e^(σty) e^(-y²/2) dy = [e^(μt)/√(2π)] ∫(-∞ to ∞) e^(-(y²-2σty)/2) dy.

Completing the square yields

-(y² - 2σty)/2 = [-(y² - 2σty + σ²t²) + σ²t²]/2,

so that

M(t) = e^(μt+σ²t²/2) [1/√(2π)] ∫(-∞ to ∞) e^(-(y-σt)²/2) dy = e^(μt+σ²t²/2), -∞ < t < ∞.

Observe that [1/√(2π)] ∫(-∞ to ∞) e^(-(y-σt)²/2) dy is equal to 1, since the integrand is the density function of a normal RV with mean σt and variance 1.

With M(t) = e^(μt+σ²t²/2), M(0) = 1, and M'(t) = (μ + σ²t)M(t), with M'(0) = μ = E(X). M''(t) = (μ + σ²t)M'(t) + σ²M(t), with M''(0) = μ² + σ² = E(X²), from which V(X) = σ².

Note that in the form M(t) = exp(μt + σ²t²/2) for the normal RV, the mean is the coefficient of t and the variance is the coefficient of t²/2.

For X ~ N(μ, σ²), H(X) = Log2(√(2πσ²e)). Note again that entropy is a function of the variance, but not the mean. If the mean and variance of an
RV are specified, the normal distribution has the maximum entropy of all the distributions. The entropy for the unit normal is 2.05. The normal density function cannot be integrated in closed forms. That is, there are neither elementary functions nor sums of elementary functions whose derivatives yield the normal density. The integration is done numerically. The importance of normal RVs stems from their closure property that the sum and differences of normal RVs are also normal RVs. With this property coupled with random sampling, the independent RVs have additive variances as well. Further, the sampling distribution of means where the sample is taken from any distribution tends to the normal distribution as the sample size approaches infinity.
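The closure property can be spot-checked by simulation (a hypothetical snippet reusing the random-normal helper sketched at the end of Section 4.5): if X ~ N(1, 2) and Y ~ N(3, 4) are independent, then X + Y should behave as N(4, 6).

(defun check-normal-sum (n)
  ;; sample mean and variance of X + Y for X ~ N(1, 2), Y ~ N(3, 4)
  (let ((sum 0.0) (sumsq 0.0))
    (dotimes (i n)
      (let ((w (+ 1.0 (* (sqrt 2.0) (random-normal))
                  3.0 (* (sqrt 4.0) (random-normal)))))
        (incf sum w)
        (incf sumsq (* w w))))
    (let ((mean (/ sum n)))
      (list mean (- (/ sumsq n) (* mean mean))))))  ; should be near (4 6)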
EXAMPLE 4.14

Show that if X is N(μ, σ²), then RV Z = (X - μ)/σ is standard (unit) normal, by a) finding E(Z) and V(Z) and b) using the transformation of variables method.

Solution

a) E(Z) = E(X - μ)/σ = (μ - μ)/σ = 0;
V(Z) = V[(X - μ)/σ] = V(X - μ)/σ² = σ²/σ² = 1.

b) f(x) = [1/(√(2π)σ)] e^(-(x-μ)²/(2σ²)), dx/dz = σ;
then fZ(z) = [1/(√(2π)σ)] e^(-(σz+μ-μ)²/(2σ²)) σ = [1/√(2π)] e^(-z²/2).

EXAMPLE 4.15

a) Show that if X is N(μ, σ²), RV Y = aX + b is N(aμ + b, a²σ²). b) Find the density function for RV Y in degrees Centigrade, given that RV X is N(50, 81) in degrees Fahrenheit. c) Confirm part b by using the transformation of variables.

Solution

a) MY(t) = E(e^(tY)) = E[e^(t(aX+b))] = E(e^(atX+bt)) = e^(bt) E(e^(atX)) = e^(bt) MX(at) = e^(bt) e^(μat+σ²a²t²/2) = exp[bt + μat + (σat)²/2] = exp[(aμ + b)t + a²σ²(t²/2)],

which implies that RV Y is N(aμ + b, a²σ²) from the coefficients of t and t²/2. Also, E(Y) = E(aX + b) = aμ + b and V(Y) = V(aX + b) = a²V(X) = a²σ².

b) Given X ~ N(50, 81) in Fahrenheit, then Y = (5/9)(X - 32), E(Y) = (5/9)(50 - 32) = 10, V(Y) = (25/81) * 81 = 25, and Y ~ N(10, 25).

c) Y = (5/9)(X - 32) ⟹ X = 9Y/5 + 32, with dx/dy = 9/5. Using f(y) = f(x)|dx/dy|,

f(y) = [1/(√(2π) * 9)] e^(-(9y/5 + 32 - 50)²/(2*81)) * (9/5) = [1/(√(2π) * 5)] e^(-(y-10)²/(2*25)).

EXAMPLE 4.16
Find the following probabilities from the normal distributions.

a) P(X < 5) for N(4, 4).
b) P(X > 3) for N(4, 9).
c) P(2 < X < 6) for N(4, 4).
d) Given RVs X ~ N(50, 9) and Y ~ N(60, 16) with X independent of Y, compute
   i. P(X > 53 AND Y > 64),
   ii. P(X > 53 OR Y > 64),
   iii. P(Y - X < 7).
e) Compute P(X² - X < 6) for N(2, 4).

Solution

Command (L-normal mu var X) returns the lower tail probability; (U-normal mu var X) returns the upper tail probability; (del-normal mu var X1 X2) returns the interval probability.

a) P(X < 5) = Φ((5 - 4)/2) = Φ(0.5) = 0.6915 = (L-normal 4 4 5).
b) P(X > 3) = 1 - Φ((3 - 4)/3) = 1 - Φ(-0.3333) = 1 - 0.3694 = 0.6306 = (U-normal 4 9 3).
c) P(2 < X < 6) = Φ((6 - 4)/2) - Φ((2 - 4)/2) = Φ(1) - Φ(-1) = 0.6827 = (del-normal 4 4 2 6).
d) i. P(X > 53, Y > 64) = (* (U-normal 50 9 53) (U-normal 60 16 64)) = [1 - Φ(1)] * [1 - Φ(1)] = 0.1587² = 0.02517.
ii. P[(X > 53) OR (Y > 64)] = P(X > 53) + P(Y > 64) - P(X > 53, Y > 64) = 0.1587 + 0.1587 - 0.02517 = 0.29214.
(- (+ (U-normal 50 9 53) (U-normal 60 16 64)) (* (U-normal 50 9 53) (U-normal 60 16 64)))
iii. As RVs Y and X are normal, so is Y - X, with E(Y - X) = 60 - 50 = 10 and V(Y - X) = V(Y) + V(X) = 16 + 9 = 25. Thus Y - X ~ N(10, 25).
P(Y - X < 7) = (L-normal 10 25 7) → 0.2743 = Φ[(7 - 10)/5] = Φ(-0.6) = (phi -0.6) → 0.2743.
e) P(X² - X < 6) = P(X² - X - 6 < 0) = P[(X - 3)(X + 2) < 0] = P(X > -2, X < 3) = P(-2 < X < 3) = (del-normal 2 4 -2 3) → 0.6687.

EXAMPLE 4.17
Show that Γ(1/2) = √π by letting x = y²/2 in the Gamma function with α = 1/2.

Solution First note that since [1/√(2π)] ∫(-∞ to ∞) e^(-x²/2) dx = 1, we have ∫(-∞ to ∞) e^(-x²/2) dx = √(2π). Then, with x = y²/2 and dx = y dy,

Γ(1/2) = ∫(0 to ∞) x^(-1/2) e^(-x) dx = ∫(0 to ∞) (√2/y) e^(-y²/2) y dy = (√2/2) ∫(-∞ to ∞) e^(-y²/2) dy = (√2/2)√(2π) = √π.

More generally, Γ(n + 1/2) = [1 * 3 * . . . * (2n - 1)]√π/2^n for positive integer n.

The command (Gamma n) returns Γ(n) = (n - 1)!, where n is an integer or a fraction. (Gamma 5) returns 24, and

Γ(5/2) = (3/2)Γ(3/2) = (3/2)(1/2)Γ(1/2) = (3/4)√π.

(Gamma 1/2) returns 1.7724539 ≈ √π; (Gamma 5/2) returns 1.32934.

EXAMPLE 4.18
Compute E(X) for normal RV X directly from the definition, with Z = (X - μ)/σ.

Solution

E(X) = [1/(√(2π)σ)] ∫(-∞ to ∞) x e^(-(x-μ)²/(2σ²)) dx
     = [1/√(2π)] ∫(-∞ to ∞) (σz + μ) e^(-z²/2) dz
     = σ[1/√(2π)] ∫(-∞ to ∞) z e^(-z²/2) dz + μ[1/√(2π)] ∫(-∞ to ∞) e^(-z²/2) dz
     = σ * E(Z) + μ = σ * 0 + μ = μ.
EXAMPLE 4.19

Let RV X be N(1, 2) and independent RV Y be N(3, 4). Compute

a) P(X < 1.5, Y < 2);
b) P[(X < 1.5) OR (Y < 2)];
c) P(Y - X < 0);
d) P(2X + 3Y > 9).

Solution

a) P(X < 1.5) = Φ((1.5 - 1)/√2) = 0.6382.
P(Y < 2) = Φ((2 - 3)/2) = 0.3085.
P(X < 1.5, Y < 2) = 0.6382 * 0.3085 = 0.1969.
b) P(X < 1.5 OR Y < 2) = P(X < 1.5) + P(Y < 2) - P(X < 1.5, Y < 2) = 0.6382 + 0.3085 - 0.1969 = 0.7498.
c) Y - X ~ N(2, 6), since E(Y - X) = μY - μX = 3 - 1 = 2 and V(Y - X) = V(Y) + V(X) = 6.
P(Y - X < 0) = Φ((0 - 2)/√6) = 0.2071 = (phi -0.8165) → 0.2071.
d) Let RV W = 2X + 3Y. Then W is N(11, 44), since E(W) = 2E(X) + 3E(Y) = 2(1) + 3(3) = 11 and V(W) = 4V(X) + 9V(Y) = 4 * 2 + 9 * 4 = 44.
Thus P(W > 9) = 1 - Φ((9 - 11)/√44) = 1 - 0.3815 = 0.6185 = (U-normal 11 44 9).
Table 4.1 is a short computer-generated table of cumulative standard normal probabilities. Symmetry can be used to find probabilities for negative z. The
area under the curve from -∞ up to z is the probability that the value of RV Z is less than z. The relationship Φ(-z) = 1 - Φ(z) can be used to find probabilities for negative z, for example, Φ(-1) = 1 - Φ(1) = 1 - 0.8413 = 0.1587. The complete normal table is in Table 3 of Appendix B.

Table 4.1 Normal Curve Probabilities: Cumulative Standard Normal Table, Φ(z) = P(Z ≤ z) = [1/√(2π)] ∫(-∞ to z) e^(-z²/2) dz

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
3.1   0.9990  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
3.2   0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
3.3   0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
3.4   0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998
P369463-Ch004.qxd 9/2/05 11:13 AM Page 239
4.6 Normal Distribution
239
0.45 0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0 –4
–3
–2
–1
0
1
2
3
4
68.27% 95.45% 99.73%
Figure 4.7
Standard Normal Probabilities
The command (phi z) returns P(Z £ z). For example, F(1) = 0.8413 from the table and (phi 1) returns 0.8413 from the software program. (del-phi z1 z2) returns F(z2) - F(z1). For example, (del-phi -1 1) returns 0.6827, (del-phi -2 2) returns 0.9545, and (del-phi -3 3) returns 0.9973. See Figure 4.7 for those probabilities. (U-normal mu s 2 X) returns the upper tail probability; (U-normal 0 1 1) returns 0.1587. (L-normal mu s 2 X) returns the lower tail probability; (L-normal 0 1 1) returns 0.8413. (del-normal m s 2 x1 x2) returns P(x1 < X < x2), e.g., (del-normal 50 4 48 52) returns 0.6827 = P(48 £ X £ 52) given X ~ N(50, 4). EXAMPLE 4.20
Find two z-values with the shortest interval (z2 - z1) that gives a) 95% probability, Solution
b) 99% probability.
Symmetry provides the shortest interval.
a) Select the two z-values for 0.025 and 0.975, which are z0.025 = -1.96 and z0.975 = 1.96, giving a total interval length of 3.92. Note that the length between the z-values with probabilities 0.01 and 0.96 also constitutes 95% of the total probability, with total length given by
P369463-Ch004.qxd 9/2/05 11:13 AM Page 240
240
Chapter 4 Special Continuous Distributions
z96 - z01 = 1.751 - ( -2.326) = 4.078 > 3.92. ( - (inv-phi 0.96)(inv-phi 0.01)) Æ 4.077862 b) Select for z0.005 and z0.995, which are -2.575 and 2.575, respectively, for an interval length of 5.15. The command (inv-phi p) returns the z-value such that F(z) = p. (inv-phi 0.025) returns -1.96. a) Use the fact that the binomial distribution can be approximated by the normal distribution to find the probability that Binomial RV X = 13, where X is the number of heads in 20 flips of a fair coin. See Figures 4.8a and b. b) For binomial RV X, find c such that P(X < c) = 0.90, with n = 300 and p = 1/4. Solution a) Because the probability is 1/2, the distribution is not skewed and the normal approximation should be quite good for a sample size of just 20. The command (binomial n p x) ~ (binomial 20 1/2 13) returns 0.0739.
0.2 0.15 0.1 0.05
Figure 4.8a
18 20
14 16
8 10 12
6
4
2
0
0
Binomial(X; n = 20, p = 0.5)
0.2 0.15 0.1 0.05
Figure 4.8b
N(10, 5)
18 20
14 16
8 10 12
6
4
2
0 0
EXAMPLE 4.21
P369463-Ch004.qxd 9/2/05 11:13 AM Page 241
4.6 Normal Distribution
241
m = np = 20 * 0.5 = 10 and s 2 = npq = 5. The continuity correction for 13 heads assumes X is between 12.5 and 13.5. Thus P(12.5 < X < 13.5) given N(10, 5) is Ê 13.5 - 10 ˆ Ê 12.5 - 10 ˆ F -F = 0.94124 - 0.868224 = 0.07301. Ë ¯ Ë 5 5 ¯ (del-normal 10 5 12.5 13.5) Æ 0.07301 (normal approximation); (binomial 20 1/2 13) Æ 0.0739 (exact binomial). b) E(X) = np = 300/4 = 75; V(X) = npq = 900/16 = 56.25. fi F( c - 75)/7.5 = 1.28173 = (inv-phi 0.9) fi c = 84.6. As a check on the approximation, (cbinomial 300 1/4 85) Æ 0.918. (cbinomial 300 1/4 84) Æ 0.896. EXAMPLE 4.22
Use the normal approximation to a) binomial RV X to compute the probability that the number of heads in 100 flips of a fair coin is greater than 60 and b) Poisson RV Y to compute the probability of 25 phone calls in the next hour, where the mean number of calls per hour is 25. Solution a) Normal approximation to the binomial m = np = 100(1/2) = 50; s 2 = npq = 50(1/2) = 25; N (50, 25) Ê 60.5 - 50 ˆ 1- F = 1 - F(2.1) = 1 - 0.9821 = 0.01786 = (U-phi 2.1). Ë ¯ 5 The command (-1 (cbinomial 100 1/2 60)) returns 0.0176, the value of 60
60
40
100ˆ Ê 1 ˆ Ê 1 ˆ P ( X > 60) = 1 - Â ÊË x ¯ Ë 2¯ Ë 2¯ x =0
= 1 - 0.9824 = 0.0176.
b) Normal approximation N(25, 25) to the Poisson E(Y) = 25 = V(Y) is (normal 25 25 25.5) Æ 0.5398 (normal approximation) (cpoisson 25 25) Æ 0.5529 (exact Poisson). EXAMPLE 4.23
Find the value of a)
Ú
2
2
-1
e - x /2 dx and b)
1
Ú
-•
2
e - x /2 dx.
Solution
Ú
a) The value of
2
-1
2
e - x /2 dx = 2p * [F(2) - F( -1)], which from the normal
table is 2p * (0.9772 - 0.1587) = 2.052 = (* (sqrt (* 2pi)) (del-phi-1 2)). b) The value of
1
Ú
-•
2
e - x /2 dx = 2p * F(1) ª 2.507 * 0.8413 ª 2.109.
P369463-Ch004.qxd 9/2/05 11:13 AM Page 242
242
EXAMPLE 4.24
Chapter 4 Special Continuous Distributions
Show that if RV Z is N(0, 1) and X = Z2, then X is chi-square with v = 1 degree of freedom. Solution f Z( z) =
1
e- z
2 /2
for - • < z < •. We can use the transformation formula 2p for Z since Z is strictly monotonic on the interval (-•, 0) and also on the interval (0, •), where z = ± x , X = z 2 fi dx = 2zdz and dz/dx = 1/2z = 1/2 x . dz 1 1 f X ( x) = 2 f Z ( x ) * =2 e - x /2 dx 2p 2 x x -1/2 e - x/2 = for x > 0. 2p The density is the chi-square or gamma density with k =
1
and a =
1
, by
2 2 comparison of the density function with the standard gamma given by f ( X ; k, a ) =
ka x a -1e - kx G(a )
or c 2 with v = 1 given by f ( X ; v) = 2
=
1 1/2
G(1/2)2
1/2 x -1/2 e - x/2 G(1/2) x1/2 -1e - x/2 .
2
A c RV with v = 1 is Z , implying that 2P(Z > z) = P(c 2 > z2) or 2[1 - F(z)] = 1 - c 2(1, z2) or 2 F(z) - 1 = c 2 (1, z2); That is, for any z, say, for example z = 0.123, using the software command ( - (* 2 (phi 0.123)) 1) returns 0.0978929 = 2 F( z) - 1; (chi- square 1 (square 0.123)) returns 0.0978929; c 2 (1, z2 )
The command (setf data (sim-normal m s n)) returns n random samples from the normal distribution with parameters m and s. For example, (setf data (sim-normal 5 1 100)) may return 4.61 6.66 4.94 5.64 5.42 5.58 4.93
3.47 5.94 3.33 3.84 6.36 4.05 6.55
5.14 5.74 3.58 3.68 3.93 5.28 3.66
4.50 5.85 4.21 5.99 3.88 3.08 5.58
5.62 7.20 8.00 5.25 5.64 5.42 3.10
4.35 6.59 6.44 4.76 6.76 4.55 7.02
5.55 5.00 4.64 4.11 6.18 4.56 6.42
5.74 3.34 5.40 4.29 6.12 5.12 4.21
5.32 6.56 3.50 5.68 4.42 5.91 4.03
5.99 5.31 4.13 3.75 3.54 6.33 4.02
P369463-Ch004.qxd 9/2/05 11:13 AM Page 243
4.7 Student t Distribution
7.18 5.31 3.86
5.33 4.38 4.78
4.72 6.47 5.91
243
5.30 6.87 5.42
5.50 3.58 4.47
4.24 4.29 5.37
4.16 4.83 7.02
5.05 5.06 4.38
4.41 6.19 6.24
4.00 3.51 5.26
from which (mu-svar data) returned 5.08 = c ª 5 = m 1.17 = s2 ª 1 = s 2
4.7
Student t Distribution The t distribution gets its name student from the fact that W. S. Gosset published his findings under the pseudonym Student because the brewery for which he worked did not allow publications by its workers. The t distribution is often used in hypothesis testing of the parameter m when the variance of the normal distribution is unknown. For independent RVs Z and c v2 where Z is the unit normal and c v2 is a chisquare random variable with v degrees of freedom, RV t is the ratio defined as t=
Z c 2 /v
,
with the density function given by Ê v + 1ˆ - ( v +1) /2 Ë 2 ¯ Ê t2 ˆ 1+ ,-• 2.508).
Solution a) From Table 4 in Appendix B showing upper tail probabilities, P(t10 < 1.372) = 1 - 0.100 = 0.900, or by using the software command (L-tee 10 1.372) Æ 0.90.
P369463-Ch004.qxd 9/2/05 11:13 AM Page 245
4.8 Beta Distribution
245
b) P(t12 < 3) = (L-tee 12 3) Æ 0.994. Notice that Table 4 in Appendix B does not have the exact entry. c) P(t22 > 2.508) = (U-tee 22 2.508) Æ 0.10 and also from Table 4.
The template (tee n x) returns the P(X £ x) given n degrees of freedom. For example, (tee 15 2.132) returns 0.975. (U-tee 15 2.132) returns 0.0250. The command (inv-t n a) returns the critical t-value computed with a in percent or decimal. For example, (inv-t 15 2.5) returns 2.132.
4.8
Beta Distribution The density function for a beta RV X on domain [0, 1] is given by f (a , b ) =
G(a + b ) G(a )G( b )
x a -1(1 - x )b -1, a , b > 0.
(4–7)
When a = b = 1, the beta distribution becomes the continuous uniform distribution on [0, 1]. Given that f is a density, the reciprocal of the constant term must be the value of the integral from 0 to 1. This integral is often referred to as the beta function. That is, Beta(a , b ) =
E( X n ) = =
1
Ú
0
x a -1(1 - x )b -1dx =
G(a + b )
1
G(a )G( b ) G( a + b )
Ú G(a )G( b )
x a + n -1(1 - x )b -1 dx
G(a + b )
G(a + n )G( b )
0
G(a )G( b )
*
G(a + n + b )
=
G(a + b )G(a + n ) G(a )G(a + n + b )
Observe that for n = 1, E( X ) =
a a +b
and for n = 2, E( X 2 ) =
.
(a + 1)a (a + b + 1)(a + b )
,
.
P369463-Ch004.qxd 9/2/05 11:13 AM Page 246
246
Chapter 4 Special Continuous Distributions
2.5 2
a=5 b = 10
a=2 b=2
a=2 b=3
1.5 a=3 b=3
1 0.5 0 0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Beta Densities for a and b.
Figure 4.10 from which
(a + 1)a
Ê a ˆ V( X ) = (a + b + 1)(a + b ) Ë a + b ¯ =
ab ( a + b + 1)(a + b)2
2
.
Figure 4.10 shows the beta densities for some combination values of a and b. If the mean, variance, minimum, and maximum of a distribution are specified, then the beta distribution has the maximum entropy or maximum uncertainty of all the distributions. EXAMPLE 4.27
a) For a beta RV X with a = 2 and b = 2, find P(X < 1/2) and the expected value E(X). b) Find a and b given that RV X has a beta distribution with expected value m = 1/3 and variance s 2 = 1/18. Solution G(2 + 2)
a) P ( X < 1/2) = E( X ) =
G(2)G(2)
a a +b
b) Setting m =
1
= =
3 setting s 2 =
1/2
1 18
2 2+ 2 a a +b =
Ú
0
=
1
2 x 3 ˆ 1/2 Êx x(1 - x )dx = 6 = 1/2. Ë 2 3 ¯0
.
2 fi b = 2a , and ab
(a + b + 1)(a + b )2
fi 3a + 1 = 4, or a = 1 and b = 2.
P369463-Ch004.qxd 9/2/05 11:13 AM Page 247
4.9 Weibull Distribution
EXAMPLE 4.28
247
Project Evaluation and Review Technique (PERT) is a method of laying out activities in needed order of completion. Times to complete the activities are often modeled by using the beta distribution. If one activity has a = 5 and b = 2, compute the probability that the activity will be completed within 75% of the maximum time. Solution P ( X £ 3/4) =
G(5 + 2) G(5)G(2)
Ú
3 /4
0
x 4 (1 - x )dx =
6!Ê x 5 x 6 ˆ 3/4 = 0.534. 4!Ë 5 6 ¯0
The template (Inc-beta a b x) returns the P(X £ x) for the beta density on [0, x], called the incomplete beta. For example, (Inc-beta 5 2 3/4) returns 0.535 as the approximate value of G(5 + 2)
Ú G(5)G(2)
3 /4
0
4.9
x 4 (1 - x )dx = 0.534.
Weibull Distribution The exponential distribution is suitable for components experiencing a constant failure rate. However, there are many components that experience a high failure rate during the initial burn-in period and a constant failure rate thereafter until such time when physical deterioration causes the failure rate to rise. The Weibull distribution contains an aging parameter a making it applicable to model the lifetime of such components. The density function for the Weibull RV X is given by a
f ( x; a , k, c) = ka ( x - c)a -1 e - k ( x - c ) for 0 £ x £ •, k, a > 0,
(4–8)
where the constant c is a kind of guaranteed time of survival. The surest guarantee is c = 0. With c = 0, a
f ( x; a , k) = kax a -1e - kx .
(4–9)
By substituting y = kxa and making use of the Gamma function, the expected value of X is computed as 1/a •
a
E( X ) = ka Ú x a e - kx dx = 0
=
1 1/ a
k
G
Ê a + 1ˆ . Ë a ¯
Ú
•
0
Ê yˆ Ë k¯
e - y dy (4–10)
P369463-Ch004.qxd 9/2/05 11:13 AM Page 248
248
Chapter 4 Special Continuous Distributions
Similarly, the second moment about the origin is 2 /a •
a
E( X 2 ) = ka Ú x a +1e - kx dx = 0
1
= k
2 /a
G
Ú
•
0
Ê yˆ Ë k¯
e - y dy
Ê a + 2ˆ . Ë a ¯
The variance is then V( X ) =
1 È Ê a + 2ˆ Ê a + 1ˆ ˘ G - G2 . Í 2/ a Î Ë ¯ Ë a ¯ ˚˙ k a
(4–11)
When a = 1, the Weibull density in Equation 4–9 becomes ke-kx, which is the exponential density. The Weibull distribution thus encompasses the exponential for constant failure rate but allows other positive values for the shaping parameter a, resulting in a choice of failure rates. We integrate the Weibull density f(x) to obtain F(x), the cumulative Weibull distribution. x
a
F ( x ) = P ( X £ x ) = ka Ú x a -1e - kx dx = 0
Ú
kxa
0
e - y dy = - e -y
a kx a = 1 - e - kx , 0
by letting y = kxa with dy = akxa-1dx. Thus a
F ( x ) = 1 - e - kx .
(4–12)
The survivor (reliability) function S(x) is defined as P(X > x) which is equivalent to 1 - F(x). Let RV T be the time to fail with density f and cumulative distribution F. The ratio h( t) =
f ( t) 1 - F ( t)
is called the hazard function or failure rate at time t. If a = 1, the failure rate is ke - kt 1 - (1 - e - kt )
= k,
a constant rate. If a π 1, the failure rate is a
ka ta -1e - ( kt )
a
1 - (1 - e - ( kt ) )
= kata -1,
a variable rate. For a < 1, the failure rate decreases with time; for a > 1, the failure rate increases with time; for a = 1, the failure rate is k, the exponential parameter. The Weibull density function is shown in Figure 4.11 with a = 2 and 3 and k = 0.5 and 1.
P369463-Ch004.qxd 9/2/05 11:13 AM Page 249
4.9 Weibull Distribution
249
0.6
0 0. 4 0. 8 1. 2 1. 6
3
3.6
0 2.4
0 1.8
0.2 1.2
0.2 0.6
0.6 0.4
0
0.4
a. Weibull Density a = 2, k = 0.5
2 2. 4 2. 8
1 0.8
b. Weibull Density a = 2, k = 1
0.8
.5
0.6
1
2
0
d. Weibull Density a = 3, k = 1
c. Weibull Density a = 3, k = 0.5
Figure 4.11
EXAMPLE 4.29
3
0 2
0 2. 4 2. 8 3. 2
.5
0. 4 0. 8 1. 2 1. 6
0.2
1
0.4
Weibull Densities
For a Weibull RV X with parameters a = 3 and k =
1
, compute
100 P(X < 5). a
Solution F(x) = 1 - e-kx ; P(X < 5) = 1 - e-1.25 = 1 - 0.2865 = 0.7135.
EXAMPLE 4.30
Given a =
and k =
5 Solution
EXAMPLE 4.31
3
1
for a Weibull RV X, compute
P(X > 2000).
200
P(X > 2000) = e-95.64/200 = e-0.4782 = 0.6199.
The failure distribution in years of an appliance is Weibull with a = 3 and k = 0.2. Find P(X £ 2) and determine m and s until a failure occurs. Solution
a
F(X) = 1 - e-kx ; P(X < 2) = 1 - e-0.2*8 = 1 - 0.2019 = 0.7981.
m = E( X ) =
1 1/a
k
G
Ê a + 1ˆ = 5 -1/3 G(4/3) = 0.585 * 0.89338 = 0.552 years. Ë a ¯
P369463-Ch004.qxd 9/2/05 11:13 AM Page 250
250
Chapter 4 Special Continuous Distributions 2/a
È Ê a + 2ˆ 2 Ê a + 1ˆ ˘ ÍÎGË a ¯ - G Ë a ¯ ˙˚ = 52/3 [G(5/3) - G 2 (4/3)] = 2.92[0.9033 - 0.89342 ] = 0.307.
Ê 1ˆ V( X ) = Ë k¯
Thus s = 0.307 = 0.5541.
The command (Weibull a k x) returns P(X £ x). For example, (Weibull 3 0.2 2) Æ 0.7981. (U-Weibull 3 0.2 2) Æ 0.201897, the upper tail probability.
4.10 F Distribution The F distribution, named for Sir Ronald Fisher, is applicable to samples from a normal population. The RV Fn,m is the ratio of two independent chi-square (C2) RVs divided by their respective degrees of freedom n and m. Fn , m = c n2 =
( n - 1) S12 s 12
c n2 / n
(4–13)
2 cm /m
2 ; cm =
( m - 1) S22 s 22
.
We designate the RV F as Fn,m and its specific value as Fn,m,a. P ( Fn , m £ Fn , m ,a ) = 1 - a =P
2 Ê c n /n ˆ £ Fn , m ,a , Ë c 2 /m ¯ m
where Fn,m,a can be obtained from the F tables or the software commands for a specified a and degrees of freedom for both the numerator and denominator. P ( Fn , m ≥ Fn , m ,a ) = a =P
2 Ê c n /n ˆ ≥ Fn , m ,a . Ë c 2 /m ¯ m
Fn , m ,a = 1/ Fm , n ,1-a .
(4–14)
Note that the degrees of freedom are reversed on the right in equation (4–14). For example,
P369463-Ch004.qxd 9/2/05 11:13 AM Page 251
4.10 F Distribution
251
F5,10,0.05 is software command (inv-F 5 10 0.05), which returns 3.34. F10,5,0.95 is software command (inv-F 10 5 0.95), which returns 0.299 ª 1/3.34. The density distribution of Fn,m is given by G f ( x) =
Ê n + m ˆ n/2 m /2 ( n - 2 )/2 ( n m )x Ë 2 ¯
for x > 0
(4–15)
( n + m )/2 Ê nˆ Ê mˆ G G [ m + nx] Ë 2¯ Ë 2 ¯
The expected value of F is m=
m m -2
for m > 2;
(4–16)
the variance of F is s2 =
m 2 (2m + 2n - 4) n( m - 2)2 ( m - 4)
, m > 4.
(4–17)
One of the most useful properties of the F RV is that if s12 and s22 are the sample variances of two independent random samples of sizes n1 and n2 from normal populations with variances s 12 and s 22, then F =
S12 /s 12 S22 /s 22
=
s 22 S12 s 12 S22
is an F distribution with n1 - 1 and n2 - 1 degrees of freedom. The F distribution is related to the t distribution in that F(1, v, x) = t2(v, x). For example, ta = 0.05, v =12 = 2.179; Fa = 0.05,1, v =12 = 2.1792 = 4.748. We make extensive use of the F statistic in the analysis of variance discussed in Chapter 9. In Figure 4.11a, the highest to lowest peaks are given by numerator degrees of freedom n = (15, 10, 5) while denominator degrees of freedom m remains at 10, and in Figure 4.12b the peaks are for n = 10 while m = (15, 10, 5). EXAMPLE 4.32
Given that P(F5,10 ≥ 3.33) = 0.05, compute F10,5,0.95. Solution Using the relationship in Equation 4-14, F10,5,0.95 =
EXAMPLE 4.33
1 F5,10,0.05
=
1 3.33
= 0.30
(inv-F 10 5 0.95).
If s12 and s22 are sample variances from independent samples of sizes 21 and 25, respectively, with s 12 = 16 and s 22 = 20, find P(s12/s22 > 3.425).
P369463-Ch004.qxd 9/2/05 11:13 AM Page 252
252
Chapter 4 Special Continuous Distributions
(a)
15 10
0.8 0.6
5
0.4 0.2
0
0 0
Figure 4.12
Solution
F =
1
2
3
n = 10; m = 15 n = 10; m = 10 n = 10; m = 5
0 1 2 3 4 5 6 7
4
F-Densities for n and m
s 22 S12
=
20 S12
2
fiP
Ê 1.25 S1 ˆ > 3.425 = P ( F20,24 > 2.74) Ë S2 ¯
s 12 S22 16 S22 ª 0.01. (U-Fd 20 24 2.74) Æ 0.009956.
EXAMPLE 4.34
(b)
n = 15; m = 10 n = 10, m = 10 n = 5; m = 10
20
2
Machine 1 makes 12 parts with standard deviation 0.005 of length. Machine 2 makes 15 parts with standard deviation 0.012. If both machines are to make identical parts, compute the probability of these standard deviations occurring. Solution
(0.005)2 ˆ Ê P s12 / s22 < = P [ F (11, 14) < 0.1736] = 0.0030. Ë (0.012)2 ¯
(0.012)2 ˆ Ê P s22 / s12 > = P [ F (14, 11) > 5.76] = P [ F (11, 14) < 1/5.76] = 0.0030. Ë (0.005)2 ¯ Hence the total probability is the sum. The command ( + (L - Fd 11 14 0.1736) (U- Fd 14 11 5.76)) retums 0.006. We may conclude that something is awry with at least one of the machines.
4.11 Summary The special continuous distributions serve as models for many natural phenomena. The normal and gamma distributions are most important. The continuous uniform is frequently used for simulating samples from the other
P369463-Ch004.qxd 9/2/05 11:13 AM Page 253
4.11 Summary
253
continuous distributions. The more parameters a distribution has, the more suitable is the distribution for modeling complex phenomena. ka x a -1e - kx The gamma density f ( x ) = is exponential with scale parameG(a ) ter k when shape parameter a = 1, and chi-square when k = 1/2 and a = v/2 for integer degrees of freedom parameter v. A chi-square RV is a sum of the squared unit normal RV Z, that is, c 2 = Z12 + Z22 + . . . + Zv2. In particular c 2 = Z2 fi P(Z2 < 1) = P(-1 < Z < 1) = P(X12 £ 1) = 0.6827. Also, 2 * F(z) - 1 = c 2(1, z2). The Weibull density is exponential with parameter k when a = 1. The beta density is continuous uniform when a = b = 1. For RVs X and Y where Y = g(X), a strictly monotonic function on the domain of X, the density function of Y is obtained by the transformation formula fY ( y ) = f X [ g -1( y )]
dg -1( y )
.
dy
A summary of these special continuous distributions is shown in Table 4.2.
EXAMPLE 4.35
Find P(X < 1/2) for the following densities: a) c) e) g) i)
continuous uniform on [0, 1], b) exponential with k = 2, unit normal on (-•, •), d) t with v = 2 df, chi-square with v = 2 on [0, •), f) gamma with a = k = 2, Weibull with a = 1, k = 2, h) beta with a = 2 and b = 3 on [0, 1], F with n = m = 2.
Solution 1/2
a)
Ú
b)
Ú
1/2
e)
2e -2 x dx = - e -2 x
0
1
1/2
Ú 2p
c) d)
1dx = 1/2 = (continuous-uniform 0 1 1/2).
0
-•
1/2 = 1 - e -1 = 0.6321 = (L-exponential 2 1/2). 0
2
e - x /2 dx = F(1/2) = 0.6915 from normal tableor command (phi1/2).
x2 ˆ Ê 1 + Ú 2¯ 2p G(1) -• Ë
G(3/2)
1
1/2
Ú 2
0
-3/2
1/2
e - x/2 dx = - e - x/2
dx =
1
Ú 2
9 /8
-•
u -3/2 du =
2 = (L-Tee 2 1/2).
2 2 2
u -1/2
9/8 = 0.6667 -•
1/2 = 1 - e -1/4 = 0.2212 = (chi-sq 2 1/2). 0
P369463-Ch004.qxd 9/2/05 11:13 AM Page 254
Chapter 4 Special Continuous Distributions
254
Table 4.2
Special Continuous Distributions
RV X Uniform
Exponential
Unit Normal
Density f(x)
E(X)
V(X)
M(t)
1
a+b
( b - a )2
e tb - e ta
b-a
2
12
t(b - a )
ke-kt 1
e-x
2 /2
1
1
k
k
k2
k-t
0
1
et /2
m
s2
emt+s
2
2p Normal
1
2 /2s 2
e -( x - m )
2 2
t /2
2p
T
Chi-Square
Gamma
v + 1ˆ - ( v+1)/2 GÊ Ë 2 ¯ Ê x2 ˆ Á1 + ˜ v¯ vp G(v/2) Ë 1 G(v/2)2v/2 ka G(a )
Weibull
Beta
F
G(a + b )
GÊ Ë
x v/2-1e - x/2
2
a
a
k
k2
1
a
G(a )G( b )
—
v-2
xa -1e - kx
kaxa-1e-kx
v
0
k1/a xa -1(1 - x) b -1
n + m ˆ n/2 m/2 ( n -2)/2 ( n m )x 2 ¯
( n+ m )/2 n m G Ê ˆ G Ê ˆ [ m + nx] Ë 2¯ Ë 2 ¯
a + 1ˆ GÊ Ë a ¯
Ê 1ˆ Ë k¯
1 (1 - 2t) v/2 Ê k ˆ Ë k - t¯ 2/a
ÈG Ê a + 2 ˆ - G 2 Ê a + 1 ˆ ˘ Ë a ¯ ˚˙ ÎÍ Ë a ¯
a
ab
a+b
(a + b )2 (a + b + 1)
m
m 2 (2 m + 2 n - 4)
m -2
n( m - 2)2 ( m - 4)
—
—
—
-2 x
1/2 - xe f) 4Ú xe -2 x dx = 4È ÍÎ 2 0
1/2 1 1/ 2 -2 x ˘ + Ú e dx ˙ = 1 - 2e -1 = 0.2642 ˚ 0 2 0 = (inc-Gamma-fn 2(* 2 1/2)). 1/2 1/2 -2 x g) 2Ú e dx = - e -2 x = 1 - e -1 = 0.6321 = (Weibull 1 2 1/2). 0 0 1/2 G(5) 4! 1/2 1˘ È1 1 h) x(1 - x )2 dx = Ú ( x - 2x 2 + x 3 )dx = 12Í + ˙ Ú 0 0 Î G(2)G(3) 2! 8 12 64 ˚ = 0.6875 = (inc-Beta 2 3 1/2). i)
4G(2)
1/2
Ú G(1)G(1)
0
1
1/2 3 3 1 dx = 4Ú (2 + 2x )-2 dx = 2Ú u -2 du = -2u -1 = 0 2 2 3 (2 + 2 x ) = ( Fd 2 2 1/2). 2
a
P369463-Ch004.qxd 9/2/05 11:13 AM Page 255
255
Problems
PROBLEMS CONTINUOUS UNIFORM ON [A, B]
1. A call for a taxi reveals that the arrival will be sometime between 7:00 and 7:20 (continuous uniform RV). Find the probability that the taxi will arrive a) before 7:07, b) after 7:12, c) after 7:15, given that the taxi had not arrived by 7:10. ans. 7/20 8/20 5/10. 2. If RV X is continuous uniform on [0, 1], find the density function for a) RV Y = 1/ X ; b) RV Y = X 2 . c) Find E( Z ) and V ( Z ) for RV Z = 10 X 2 . 3. Given random sample X1, X2, . . . , Xn from the continuous uniform on [0, 1], find the densities and expected values for Ymin, the minimum of the {Xi}, and Ymax, the maximum of the {Xi}. 4. If RV X is continuous uniform for 0 < x < 1, find a) E(e-x),
b) E(sinx cosx),
c) P(m - s < X < m + s).
5. A stick of unit length is randomly broken in two pieces. a) Compute the probability that the shorter piece is less than 0.4. ans. 0.8 b) Find the expected value of the longer piece. ans. 3/4. c) If the stick is randomly broken in two places, compute the probability that the 3 pieces form a triangle. See Software Exercise 10 for a simulation. ans. 1/4. 6. RVs X and Y are uniformly distributed over [0, a] and [0, b], respectively, and the variance of Y is 4 times the variance of X. Show that E(Y) = a.
GAMMA WITH PARAMETERS a, K
7. a) Substitute z2/2 for x in G(a) = Ú•0 xa-1e-xdx to create an expression for G(1/2) and solve. Hint:
Ú
•
-•
2
e - z /2 dz = 2p .
b) Although the Gamma function is not defined for a < 0, find a value for G(-1/2) using the relationship G(a) = G(a + 1)/a. ans. -3.545. 8. a) Find P(X < 4) for the gamma density f(x; a = 2, k = 1/3). 7 ans. 1 - e -4/3 = 0.3849. 3 b) The time in minutes between calls to a hospital emergency room is exponential with parameter k = 3. Compute the probability of 10 or more calls in 6 minutes. Verify with the Poisson model.
P369463-Ch004.qxd 9/2/05 11:13 AM Page 256
256
Chapter 4 Special Continuous Distributions
9. Find the specific gamma density for RV X, given a) a = 1; b) k = 1/2 and a = v/2; c) E(X) = 3 and V(X) = 1.5, and write P(X < 5). d) Evaluate Ú10(-Ln x)-1/2dx. ans. Exponential chi-square k = 2, a = 6 p . EXPONENTIAL WITH PARAMETER K
10. Find the cumulative distribution for exponential density f(x) = ke-kx for x > 0. 11. Service time at a tollgate is exponential, with a mean of 10 seconds. Find the probability that the time to wait is less than 20 seconds. Verify using the Poisson. ans. 0.865. 12. The lifetime of a tube has density f(x) = 100/x2 for x > 100 hours. Find the probability that exactly 2 of 5 tubes will be replaced within the first 150 hours of operation. 13. RV X has density f(x) = ce-2x for x > 0 and constant c. Find a) P(X > 2), b) P(X > m + s). ans. 0.0183 0.1353. 14. The mean arrival rate of customers at a shop is 15 per hour (assume Poisson process). Find the probability that the next customer will arrive after 12 minutes. Use both the exponential density and the Poisson density. 15. The lifetime of a computer battery is exponential, with mean m equal to 1/2 year. Find the probability that exactly 1 of 5 such batteries will last longer than 2 years. ans. (binomial 5 (U-exponential 2 2) 1) Æ 0.0851. 16. Show that Y is exponential with parameter k, given that RV X is continLn(1 - X ) . uous uniform on [0, 1] and Y = k 17. Let X1, X2, . . . , Xn be n independent exponential random variables with parameters ki. Show that the minimum (X1, X2, . . . , Xn) is exponential with parameter k = Ski. 18. Memoryless Property. Suppose that the life of a CD player is exponential with parameter k = 1/15 and that the player has been in use for 5 years. Find the probability that the CD player is still working after 15 more years.
CHI-SQUARE (c 2)
19. Let X and Y be chi-square RVs with 2 and 3 degrees of freedom, respectively. Compute the probability that X + Y < 7. ans. (chi-sq 5 7) = 0.77935. 20. Compute the probability that the variance of a random sample of size 11 ( n - 1) S 2 from N(5, 16) is less than 8. Hint: is a chi-square RV. s2
P369463-Ch004.qxd 9/2/05 11:13 AM Page 257
257
Problems
21. For a chi-square (c 2) RV with v = 10 df, find P(5 £ c 2 £ 10). ans. (-(chi-sq 10 10) (chi-sq 10 5)) = 0.560 - 0.109 = 0.451. 22. Find the mean and variance of a chi-square RV with 16 degrees of freedom. 23. a) Let X = Z2 where Z is distributed N(0, 1). Find the density for X. ans. c 12. b) Let independent unit normal RVs X and Y represent a point in the plane. Compute the probability of the point lying within the unit circle centered at the origin. (See software exercise 22). ans. 0.396. c) Let independent unit normal RVs X, Y, and Z represent a point. Find the radius of the smallest sphere containing the point with probability 0.90. ans. 6.25.
NORMAL N(m, s 2)
24. Suppose RV X is distributed N(m = 1, s 2 = 4). Find the probability of each of the following: a) P(X < 3),
b) P(X > 1.5),
c) P(2 < X < 5),
25. Find the expected value and variance of
X -m s
d) P(-1 < X < 0.5).
where X is normal. ans. 0 1.
26. RV X is N(5, 25). a) Compute P(|X - 5| < 3). b) Find c, given P(2 < X < c) = 0.703. 27. Compute E(X) for RV X ~ N(m, s 2) directly from the definition of expected value. 28. Find the probability of getting exactly 20 heads or 30 tails in 50 tosses of a fair coin by approximating the binomial distribution with the normal distribution, using the continuity correction. 29. The mean of a machine for drilling holes is set at 1 inch. The standard deviation s is 0.0004. Find the percent of drilled holes between 1.001 and 1.003 inches. Assume a normal distribution. ans. 0.62%. 30. Evaluate a) d)
3
1
2
Ú
-•
Ú 2 * G(7 / 2)
e - x /2 dx; •
-•
2
b)
e - ( x - 5 ) /8 dx;
Ú
•
-•
2
x 2 e - x /2 dx;
e)
Ú
•
0
e - x x n dx;
c)
Ú
•
0
f)
3 x 6 e -4 x dx;
Ú
•
0
x -1/2 e - x dx.
31. a) Determine the setting of the mean for filling cans so that the probability of exceeding 20 ounces is less than 1% for a normal RV with s = 2. ans. 15.35. b) Determine the largest variance for a normal distribution with mean 3000 hours that serves as a model for the lifetime of a motor so that 90% of the motors last at least 2700 hours. ans. 54, 783.54.
P369463-Ch004.qxd 9/2/05 11:13 AM Page 258
258
Chapter 4 Special Continuous Distributions
32. The yearly demand for a product is RV X distributed N(500, 1000). It is desired to satisfy the demand 96% of the time. Determine the manufacturing level for the year. 33. For RV X distributed as N(2, 4), find a) P(2X - 5 < 1) and b) E(C2 + 2X + 1). ans. 0.6915 13. 34. If 20% of a population with high blood pressure is N(140, 25) and the rest is N(120, 25), find the probability that a randomly selected person X has high blood pressure, given that the person’s blood pressure exceeds 130. 35. a) Find the density function for RV Y given that Y = |X| and RV X is N(0, 1). b) If RV X is distributed N(20, 2) in degrees Centigrade, find the distribution in degrees Fahrenheit. 2 2e - y /2 ans. for y > 0 N(68, 162/25). 2p 36. The time in minutes to complete the first step of a two-step process is distributed N(35, 5) while the time for completing the second step is distributed N(20, 20). Find the probability of the two-step process exceeding an hour. 37. Show that a) the entropy of N ( m, s 2 ) = Log 2 2pes 2 = Ln 2ps 2 , b) among all density functions with a specified mean and variance, the normal distribution has the largest entropy c) the entropy of the unit normal is 1.419 base e or 2.047 base 2.
BETA B(a, b )
38. Show that the beta density integrates to 1. 39. Find the P(X < 1/2), E(X), and E(1 - x)3 for a beta RV X with a = 3 and b = 5. 40. Find c such that f(x) = cx2(1 - x)3 is a beta density. ans. G(3 + 4)/[G(3)G(4)] = 60. 41. Given a = 1 and b = 2 for beta RV X and g(x) = 2X2 + 3X + 1, find E[g(x)]. 42. Find the beta density on the interval [a, b] and compute m and s 2. G(a + b ) È ( x - a ) ˘ * ans. f ( x ) = ( b - a ) G( a )G( b ) ÍÎ ( b - a ) ˙˚ 1
a -1
È (b - x ) ˘ ÍÎ ( b - a ) ˙˚
b -1
.
43. In Program Evaluation and Review Technique (PERT) analysis the most likely time to complete a task is 2 days. The parameters a and b can show the percentage of maximum time to complete an activity. The pessimistic and optimistic times are 5 days and 1 day, respectively.
P369463-Ch004.qxd 9/2/05 11:13 AM Page 259
259
Problems
a) Find a relationship between a and b. b) Given a = 1, find P(X £ 3 days). c) Find the probability that an activity will be complete within 85% of maximum time, given a = 6 and b = 2. 44. A beta RV X distribution for a manufacturing machine with a = 2 and b = 1 has an associated repair cost given by RV C = X2 + 3X + 5. a) Determine E(X), E(X2), V(X), and E(C). b) Verify these answers for density f(x) = 2x on [0, 1], using the beta formulas for E(X) and V(X). ans. 2/3 1/2 1/18 7.5. 45. Show that f(x, a, b = 1) = axa-1 for the beta density. 1
Ú
46. Show that
WEIBULL W(a, k)
0
x -1/2 (1 - x )-1/2 dx = p .
47. Find the probability that Weibull RV X exceeds 10 years with parameters k = 1/50 and a = 2. 48. Find the probability that Weibull RV X exceeds 10 years with parameters a = 2 and k = 1/50, given that X > 7 years. ans. e-51/50 = 0.361. 49. For the Weibull distribution with parameter a = 1, write F the cumulative distribution, S the survivor function, f the density function, and H the hazard function. What property of the exponential RV does the hazard function suggest? f ( x ) = ke - kx => F ( x ) = P ( X £ x ) =
Ú
x
0
ke - kt dt = - e - kt
x = 1 - e - kx . 0
Survivor function S( x ) = P ( X > x ) = 1 - P ( X £ x ) = 1 - [1 - e - kx ] = e - kx . Hazard function H ( x ) = •
2
50. Evaluate Ú x 5 e -3 x dx. 0
f ( t) 1 - F ( t)
=
ke - kt e - kt
= k => constant failure rate. ans. 1/27.
F 51. Write the density for the F with n = m = 4 and compute P(F ≥ 1/2). DISTRIBUTION Ê n + m ˆ n/2 m /2 ( n - 2 )/2 G ( n m )x Ë 2 ¯ 6 * 16 * 16 x f ( x) = = . ( n + m )/ 2 ( 4 + 4 x )4 Ê nˆ Ê mˆ G G [ m + nx] Ë 2¯ Ë 2 ¯ 52. Compute P(s12/s22 > 2.03) for two samples of sizes 21 and 25 from the same normal population. ans. F20,24,.05 = 2.03; 0.05. 53. If s12 and s22 are sample variances from independent samples of sizes 16 and 25, respectively, with s 12 = 20 and s22 = 16, find P(s12/s22) > 1.25.
P369463-Ch004.qxd 9/2/05 11:13 AM Page 260
260
Chapter 4 Special Continuous Distributions
T 54. Show that the t density approaches the unit normal density as the DISTRIBUTION degrees of freedom parameter v approaches infinity. 55. Show that tv2 = F1,v, that is, a t distribution with v degrees of freedom squared is an F distribution with 1 and v degrees of freedom. 56. Find a) P(t15 > 2); b) P(t10 < 2);
c) P(t60 < 1.671). ans. 0.032 0.963 0.95.
MISCELLANEOUS 1. A radius of a circle is randomly selected from the continuous uniform on [0, 1]. Find the probability that the area of the circle is less than one. ans. 1/ p = 0.5642. 2. For RV X with density f(x) = 3x2 on [0, 1], find the probability that exactly 3 of 5 random samples exceed 1/2. 3. Suppose 20 random samples are chosen with replacement from the integers 1 to 25. Find the probability that at most 12 of the samples are less than 16. Then use the command (SWR 20 (upto 25)) to sample with replacement 20 samples with p = 15/25 and count the number less than 16. ans. (cbinomial 20 15/25 12) Æ 0.5841. (SWR 20 (upto 25)) may return (21 5 23 25 6 10 24 23 4 10 7 12 18 12 24 19 8 24 3 11) of which 11 are less than 16. The following code returns the average of 100 such samples. (Let (( z nil)) (dotimes (i 100 (mu z)) (push (length (filter (swr 20 (upto 25)) '(lambda (x) (<x 16)))) z))) 4. Suppose RVs X and Y are jointly uniform on the unit circle. Find f(x, y) and fX(x). Hint: The cylindrical volume (probability) representing the integral of the joint density function must have a value of 1. 5. a) Use the normal approximation to the binomial to determine the probability of getting exactly 100 heads in 200 tosses of a fair coin. ans. (binomial 200 1/2 100) (binomial-vs-normal 200 1/2). b) Find c given P(X < c) 0.95 for binomial RV X with n = 300 and p = 3/4. ans. 237. c) Approximate P(22 £ X < 27) for binomial RV X with n = 100 and p = 1/4 by using the normal distribution with continuity correction. ans. (del-normal 25 18.75 21.5 27.5) Æ 0.5087 (cbinomial-a-b 100 1/4 22 27) Æ 0.5109
P369463-Ch004.qxd 9/2/05 11:13 AM Page 261
261
Review
6. The output from a radioactive process is a Poisson RV X with k = 80 per hour. a) Find P(X < 75) b) Find P(X < 75) with use of normal approximation to the Poisson. c) Compare the Poisson with the normal approximation with use of (poisson n-vs-normal 80). 7. Let RV X be distributed N(m, s 2) and let Y = aX + b. Using the transformation of variables technique, show that Y is distributed N(am + b, a2s 2). 8. You and your friend agree to randomly meet sometime between 8 and 9 in the morning but will wait no longer than 20 minutes for the arrival of the other. Compute the probability that you will meet your friend. 9. The mean number of signals received each hour is 30 (Poisson process). Compute the probability of no received signal in the next 5 minutes, using the Poisson and the exponential densities. ans. 0.0821. 2
10. If RV X has Rayleigh density f(x) = kxe-kx /2 for x > 0, find the first two moments about the origin using the Gamma function, and then find V(x). 11. Find the expected distribution of the total time in minutes for three sequential independent processes distributed N(30, 20), N(50, 30), and N(25, 35). Find the probability that a product passing through the three processes takes more than 90 minutes to be completed. ans. N(105, 85) 0.95. 12. Given RV X ~ N(m, s 2) and X = Ln Y, find Y’s density distribution, called the log-normal. 13. A failure rate is set so that the probability of exceeding it is 1/50. Estimate the rate given the following sample from a log-normal distribution (44 24 143 68 68 24 100 63 70 211 211 24). See Problem 12. ans. 331.
REVIEW 1. A continuous distribution is given by f(x) = 2x for 0 £ x £ 1. Find mean, variance, and the cumulative distribution. ans. 2/3 1/18 x2. 2. The time to repair a computer is an RV X whose density is given by f(x) = 1/4 for 0 < x < 4. The cost C = 20 + 30÷x depends on the time x. Find E(C). 3. Ten balls are randomly chosen from an urn containing 20 W (white) and 30 R (red). Let RV X be the number of Ws chosen. Find E(X). ans. 4.
P369463-Ch004.qxd 9/2/05 11:13 AM Page 262
262
Chapter 4 Special Continuous Distributions
4. Let X be continuous uniform on [3, 7] and let Y = 2X + 3. Find F(y), f(y), and E(Y) from Y’s density as well as from X’s density, using Y’s relationship to X. 5. RV X is given by f(x) = a2xe-ax for x ≥ 0. Find E(X) by inspection. ans. 2/a. 6. Prove that if X is an RV that is always nonnegative, then for any value v > 0, P(X ≥ v) £ E(X)/v (Markov’s Inequality). 7. In the expansion of (3x + 2y)20, find the coefficient of x12y8. ans. 17138079429120 20C12 * 312 * 28. 8. Confirm that the expected number of hats returned correctly is 1 from checking 5 hats and having them randomly returned. Use the density for RV X below, with X being the number of hats returned correctly. X P(X)
0 11/30
1 3/8
2 1/6
3 1/12
4 0
5 1/120
SOFTWARE EXERCISES 1. (continuous-uniform a b x) returns P(X £ x | a, b); (continuousuniform 2 7 4) Æ 2/5. 2. (phi z) returns P(Z £ z) where Z is the standard normal distribution. (phi 1) Æ 0.8413. (del-phi a b) returns F(b) - F(a). (del-phi -1 1) Æ 0.6827. (L-normal mu var x) returns the lower tail probability; (L-normal 50 4 52) Æ 0.8413. (U-normal mu var x) returns the upper tail probability; (U-normal 50 4 52) Æ 0.1587. (del-normal mu var x1 x2) returns P(x1 < X < x2); for example, (delnormal 0 1 –1 1) Æ 0.6827. 3. (inv-phi p) returns the z value such that P(Z < z) = p. (inv-phi 0.25) Æ -0.6741891. Note that (phi -0.6741891) Æ 0.25. 4. (Gamma n) returns (n - 1)! for nonnegative integer n. (Gamma 5) Æ 4! = 24, (Gamma 1/2) Æ p = 1.7724, (Gamma 3/2) Æ 1/2 * G(1/2) = p /2, (Gamma 5/2) Æ 3/2 * G(3/2) = (3/2) * (1/2) * p , etc. 5. (exponential-a-b k a b) returns the P(a £ x £ b) for an exponential RV X with parameter k. (L-exponential k x) returns P(X £ a); (U-exponential k x) returns P(X > x).
P369463-Ch004.qxd 9/2/05 11:13 AM Page 263
Software Exercises
263
Rework Problem 17, using the software. Recall P(A | B) = P(AB) / P(B); P(X ≥ 20 | X ≥ 5) = P(X ≥ 20, X ≥ 5) / P(X ≥ 5) = P(X ≥ 20) / P(X ≥ 5). ans. (/ (U-exponential 1/15 20) (U-exponential 1/15 5)) Æ 0.3678794. 6. (chi-sq v x) returns P(C 2v £ x) where v is the degrees of freedom for the chi-square random variable. (chi-sq 12 6.3) returns 0.1; (chi-sq-inv 12 0.1) returns 6.3. 7. The command (L-tee n x) returns P(X < x) with n degrees of freedom. For example, (L-tee 15 2.131) returns 0.975. (inv-tee n alpha) returns x for which P(X > x) = alpha in percent or decimal. For example, (invtee 15 0.025) returns 2.131. 8. (sim-normal m s n) returns n sample values from a normal distribution N(m, s 2). For example, (setf data (sim-normal 50 2 100)) returns a random sample of size 100 from the population. (mu-svar data) should return x 50 and sample variance s2 22 = 4. (stem&leaf data) should show the normal bell shape. Try (stem&leaf (sim-normal 50 2 100)) and repeat with the F 3 key. 9. (U-FD n d x) returns the upper tail P(X > x) for an F-distributed RV X with n degrees of freedom for the numerator and d degrees of freedom for the denominator. (U-Fd 15 10 2) returns 0.865. (L-Fd 15 10 2) Æ 0.135. 10. c 2m = Z21 + Z22 + . . . + Z2n; that is, a chi-square RV with n degrees of freedom is a sum of the squares of n unit normal RVs. To simulate, generate a random sample of size 30 from the standard normal distribution N(0, 1). Then square each value in the sample and sum. The expected value of the chi-square RV with 30 degrees of freedom is 30. (setf data (sim-normal 0 1 30)) returned the following random sample of size 30 from N(0, 1): -1.14 0.71 0.76 0.54 -1.26 1.50 -2.48 -0.00 -0.36 0.46 -1.14 0.55 1.09 0.34 0.18 -0.90 -0.71 1.04 -0.09 -1.01 0.63 0.64 -0.73 0.92 -1.99 -2.28 -0.68 0.01 0.27 0.28. (setf sq-data (repeat #' square data)) returned the following list of data squared: (1.30 0.51 0.59 0.30 1.61 2.25 6.17 0.00 0.13 0.21 1.30 0.31 1.19 0.12 0.03 0.82 0.51 1.09 0.00 1.02 0.39 0.42 0.53 0.85 3.97 5.21 0.47 0.00 0.07 0.08). (sum sq-data) returns 31.55 (30), the simulated sum of 30 unit normal RVs squared.
P369463-Ch004.qxd 9/2/05 11:13 AM Page 264
264
Chapter 4 Special Continuous Distributions
The expected value of the chi-square RV with 100 degrees of freedom is 100. The commands are combined in (sim-X-sq n) for n such runs of n normal samples squared and averaged. (mu-svar (sim-X-sq n)) should return values near the expected value n and sample variance 2n. (mu-svar (sim-X-sq 100)) Æ (99.79 213.46). 11. Simulate 100 samples from the continuous uniform on [0, 1]. Find the first two sample moments and compute the variance. Check the nearness to the theoretical value of 1/12. 1. (setf data (sim-uniform 0 1 100)) 2. (setf M1 (sample-moments data 1)) 3. (setf M2 (sample-moments data 2)) 4. (- M2 (sq M1))
; generate sample of size 100 from the uniform. ; assign M1 to the first moment, i.e., X. ; assign M2 to the second moment. ; subtract square of 1st moment from 2nd.
Note: (svar sample) returns the sample variance directly, theoretically equal to 1/12. 12. a) A stick is randomly broken in two places. Simulate the probability that the three pieces form a triangle. The command (stick a b n) returns n random samples of 2 values from the continuous uniform on [a, b] along with the sizes of the left, middle, and right pieces. (stick 5 10 16) returned the data below. Notice that the sum of the 3 values (Left, Middle, Right) in each row is (b - a) = 10 - 5) = 5. The final row simulates the probability of forming a triangle, followed by a list of 1s and 0s, with 1 denoting that a triangle is possible and 0 that it is not. In this run the simulated value was identical to the theoretical value of 1/4. Random 7.663 5.027 5.672 8.693 7.595 7.549 9.456 8.3875 5.05 8.196 8.1305 8.797 8.0785 5.077 8.955 8.6015
Cuts
Left
Middle
Right
Triangle
5.9885 8.703 6.24 9.8095 9.8175 7.643 9.964 7.8535 9.9855 7.636 7.1415 9.0475 5.769 7.15 9.2465 6.9545
0.9885 0.027 0.672 3.6935 2.595 2.549 4.456 2.8535 0.05 2.636 2.1415 3.797 0.769 0.077 3.955 1.9545
1.675 3.676 0.568 1.116 2.2225 0.094 0.508 0.534 4.9355 0.56 0.989 0.2505 2.3095 2.073 0.2915 1.647
2.3365 1.297 3.76 0.1905 0.1825 2.357 0.036 1.6125 0.0145 1.804 1.8695 0.9525 1.9215 2.85 0.7535 1.3985)
YES NO NO NO NO NO NO NO NO NO YES NO YES NO NO YES
P369463-Ch004.qxd 9/2/05 11:13 AM Page 265
265
Software Exercises
(0.25 (1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1)) b) Simulate the expected value of the shorter and the longer piece of a stick randomly cut at one place with (stick-2 length n). The shorter piece should be close to 1/4 of the length and the longer piece should be close to 3/4 of the length. (stick-2 10 1000) returned (2.507074 7.49292) as f(x) = 2 on [0, 1 /2] and on [1/2, 1]. 13. Compare the binomial probabilities with the normal probabilities for binomial parameters n = 200, p = 1/2, and normal approximation m = 100, s 2 = 50 from 100 to 109 occurrences. The command (binomial-vs-normal n p) returns the comparison of the 10 x-values 100 to 109. For example, the following table was generated by the command (binomial-vs-normal 200 1/2). np = m; npq = s 2 X
Binomial
Normal
100 101 102 103 104 105 106 107 108 109
0.0563484 0.0557905 0.0541496 0.0515210 0.0480532 0.0439344 0.0393751 0.0345912 0.0297869 0.0251412
0.0563719 0.0558120 0.0541652 0.0515278 0.0480498 0.0439208 0.0393529 0.0345631 0.0297562 0.0251113
Try various values for n and p and observe the difference is smaller for larger n and for p close to 1/2 (e.g., 0.45, 0.55, 0.6). 14. Compare the Poisson probabilities with the normal probabilities for large values of Poisson parameter k = 200 and normal approximation parameters m = k = s 2 = 200 for X = 200 to 209 occurrences. The following table was generated by the command (poisson-vs-normal 100) poisson-vs-normal k = m = s 2 = 100 X 100 101 102 103 104 105 106 107 108 109
Poisson
Normal
0.0398610 0.0394663 0.0386925 0.0375655 0.0361207 0.0344007 0.0324535 0.0303303 0.0280836 0.0257648
0.0398776 0.0396789 0.0390886 0.0381243 0.0368142 0.0351955 0.0333136 0.0312188 0.0289648 0.0266064
P369463-Ch004.qxd 9/2/05 11:13 AM Page 266
266
Chapter 4 Special Continuous Distributions
Try various values for k and observe that the difference is smaller for larger k. 15. (sim-gamma a k n) returns n samples from a gamma distribution with parameters a and k. For example, (setf data (sim-gamma 5 4 100)) returned (1.10 1.63 2.90 0.97 1.92 0.66 1.01 1.09 1.66 2.91 1.19 0.65 0.80 2.97 1.10 1.21 2.26 0.95 1.39 2.16 3.18 1.29 1.76 2.01 2.38 0.85 1.82 1.76 1.58 1.68 0.81 2.85 1.10 0.62 0.58 0.84 1.36 1.74 0.71 1.28 1.29 0.89 0.81 0.98 0.92 0.84 2.17 0.98 1.40 0.78 0.58 1.15 1.30 0.83 0.56 0.85 1.21 0.90 0.67 1.20 2.40 1.41 2.23 2.70 0.55 1.89 0.80 1.09 1.30 0.85 1.14 0.93 1.54 0.42 1.52 0.92 0.85 1.29 1.91 1.18 0.65 0.94 1.15 1.47 0.40 1.76 1.50 2.43 1.04 1.22 1.75 1.35 0.85 1.29 0.94 1.18 0.49 1.61 0.58 1.42). (mu-svar data) returned x = 1.32 5/4 = m and s2 = 0.39 5/15 = s 2. 16. (sim-rayleigh k n) returns n samples from the Rayleigh distribution with parameter k. See Miscellaneous problem 10, where the density is 2 given by f(x) = kxe-x /2 for x > 0. For example, (sim-rayleight 2 100) returned 0.40 0.86 0.13 0.93 1.54 0.88 1.56 1.27
1.25 0.33 0.87 0.48 0.91 1.01 0.19 0.18 0.31 0.32 0.04 0.26 0.30 0.52 1.57.
0.94 1.33 0.87 0.66 2.43 0.47 1.44
1.01 0.02 1.34 0.95 0.09 0.35 0.96
1.69 1.30 0.38 0.11 1.33 1.14 2.04
0.13 0.78 0.80 0.62 0.37 0.15 1.36
0.23 0.65 0.70 0.44 0.47 0.64 0.68
1.36 0.54 0.50 0.23 0.89 0.59 0.80
1.94 1.63 0.39 1.79 1.05 0.75 0.62
1.05 0.75 1.26 0.59 0.85 0.20 0.08
1.15 0.74 1.23 0.72 0.75 0.70 0.80
0.38 1.88 1.20 0.27 0.52 0.77 0.19
0.59 1.71 1.62 0.56 0.87 0.59 2.28
The average of the sample is 0.82 and the sample variance is 0.28. 17. Simulate E(sin xcos x) =
1
Ú sin x cos xdx = 0
sin 2 x 1 = 0.354. The com2 0
mands are ; take a random sample from the continuous uniform. (setf cdata (repeat #' cos data)) ; take the cosine of each value. (setf sdata (repeat #' sin data)) ; take the sine of each value. (setf ndata (repeat #' * cdata sdata)) ; multiply the two lists. (mu ndata) ; take the average of the multiplied list. (setf data (sim-uniform 0 1 100))
The commands are combined in (Esinxcosx n). (Esinxcosx 100) Æ 0.356.
P369463-Ch004.qxd 9/2/05 11:13 AM Page 267
267
Software Exercises
18. (sim-exponential k n) returns n samples from the exponential with parameter k. Try (sim-exponential 1/2 100) and the command (stem&leaf (sim-exponential 1/2 100)). 19. (Inc-beta a b x) returns P(X < x) for the beta density defined on [0, x] for x < 1. For example, (inc-beta 3 5 1/2) returns 0.7734375. G(a + b )
Ú G(a )G( b )
x
0
x (a -1) (1 - x )( b -1) dx
is the incomplete beta function yielding P(X £ x). 20. The minimum of n independent exponential RVs is exponential, with k = Ski. (min-exponential k1 k2 k3 n) returns the minimum value of three random samples from exponentials with parameters k1 k2 k3 taken n times. (mu-svar (min-exponential 2 3 7 100)) should return a value close to 1/12 and 1/144. Try (stem&leaf (min-exponential 1/10 1/5 1/7 100)). 21. (Weibull a k x) returns P(X £ x). (Weibull 1 2 5) Æ 0.9999546 ¨ (exponential 2 0 5). (sim-weibull 5 2 20) returns a random sample from the Weibull distribution with a = 5 and k = 2. (HDP (sim-weibull 5 2 100)) returns a horizontal dot plot of the sample, which should show normal tendencies since the sample size 100 is large. 22. Let RVs X and Y from N(0, 1) represent a point in the plane. Simulate the probability that the point lies within the unit circle centered at the origin. A chi-square RV with 2 degrees of freedom is the sum of the squares of two unit normal RVs: c2 = x2 + Y 2 (sim-circle-pts 1000) returned (412 1000 0.412) vs. (chi-sq 2 1) = 0.3958951 (defun sim-circle-pts (n) (let ((in 0)) ; number inside circle (dotimes (i n (list in n (/ in n))) ; do n times, return in, n and in/n (if (‹ (sum (mapcar 'sq (sim-normal 0 1 2))) 1) ; if within circle, (incf in))))) ; add 1 to in.
P369463-Ch004.qxd 9/2/05 11:13 AM Page 268
268
Chapter 4 Special Continuous Distributions
SELF QUIZ 4: CONTINUOUS DISTRIBUTIONS 1. Compute P(2 < X < 6) for N(m = 5, s 2 = 4). 2. Annual rainfall in inches is N(m = 31, s 2 = 25). Compute probability that in 3 of the next 5 years the rainfall will be less than 30 inches. 3. Find the mean of RV X (a chi-square RV) with v = 16 degrees of freedom, given that E( X ) =
Ú
•
0
1 G( v/2)2
v/2
x ( v/2 ) e - x/2 dx.
4. The tread wear in miles of 500 tires is exponential with parameter k. How many tires will last as long as m? 5. Find E(X5) for a gamma RV X with a = 2 and k = 5. •
6. a.
Ú
c.
Ú
e.
-• •
1
2
x2 e - x /2 dx = _______________.
b.
Ú
•
0
z7 e - z/2 dz = _____________.
1
2
e - x /2 dx = _________________. d.
2p 1
3 2p
Ú
•
-•
2 /18
xe -( x -5 )
dx = ______.
f.
Ú
•
0
Ú
•
0
x2 e - x/2 dx = ________.
6xe -6 x dx = ______________.
7. If men’s and women’s heights are normally distributed as N(70, 9) and N(66, 4) respectively, compute the probability that a woman’s height will exceed a man’s height. Assume independence. 8. RV X is continuous uniform on [-3, 7]. Find the probability that a) X is greater than 0, given that X is greater than -1 and b) P(X > m + s). 9. In a standard test set with a mean of 80 with s = 5, what is the cutoff grade for an A if the cutoff is at 10%? 10. Find the probability that 3 independent measurements from a normal distribution with m = 20 and s = 4 are within the interval [18, 22]. 11. For a sample space consisting of a square, compute the probability of a random point falling inside a circle inscribed in the square. 12. Given RV X is continuous uniform on [0, 1], find the density of Y = X2. 13. If RV X is continuous uniform for 0 £ x £ 1, find E[Ln(x)].
P369463-Ch005.qxd 9/2/05 11:14 AM Page 269
Chapter 5
Sampling, Data Displays, Measures of Central Tendencies, Measures of Dispersion, and Simulation One picture is worth more than a 1000 words.
This chapter transitions from the realm of probability into the realm of statistics or applied probability. A few descriptive and analytic tools are used to display data and to help characterize the risks involved with statistical inferences. The data displays lead to data analysis to explore for estimators and confidence intervals (Chapter 6) and to confirm through hypothesis testing (Chapter 7). The fundamental theorem of parametric statistics is also featured, along with estimators for location and dispersion parameters, as well as Monte Carlo simulations of the special discrete and continuous random variables discussed in chapters 3 and 4. A short description of order statistics is included. 5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8
Introduction Data Displays Measures of Location Measures of Dispersion Joint Distribution of X and S2 Simulation of Random Variables Using Monte Carlo for Integration Order Statistics Summary 269
P369463-Ch005.qxd 9/2/05 11:14 AM Page 270
270
5.0
Chapter 5 Sampling, Data Displays, Measures of Central Tendencies
Introduction How does one describe 100,000 numbers? If the numbers are from a normal distribution, then N(m, s 2) suffices, that is, only two parameters. A list of measures of location such as the mean, median, mode, minimum and maximum, or measures of dispersion as variance, standard deviation, and range all aid in statistical summaries of data, leading to inferential statistics. When the probability density function of a random variable is known, a lot can be said about the whereabouts of the RV. When the density function is not known, it is common to conjecture about the underlying distribution of the RV of interest and to take samples from a population from which inferential statistics can help describe the distribution. Probability statements are made from distributions before the underlying experiment is performed; inferences about unknown parameters of distributions are made from samples after the experiment is performed. See Figure 5.1. But first there are data from which information is extracted. Scores of baseball games like 3-2 and 5-1 constitute data. More informed data emerge when the teams having each score are known. The data take on more meaning merely by identifying the numbers runs. Data analysis takes data from samples, observes patterns, extracts information, and makes inferences. The individual elements constituting the random samples are reasonably thought to be independent of one another. Each element of the sample can be regarded as the value of a random variable. A random sample of n elements from the population is usually denoted as X1, X2, . . . , Xn. Each RV Xi has the exact same density distribution, and random implies independence of the Xis. Any such manipulation or condensing of the sample quantities that do not depend on any unknown parameters of the population is called a statistic. All statistics are random variables with density functions of the Xi. Population is the term used for the totality of elements from which a sample may be drawn. The symbol N denotes the population size; the symbol n denotes the sample size. The N for theoretical or conceptional populations is infinite. Samples have statistics; populations have parameters. Statistics are used to describe or summarize properties of the sample as well as to infer properties of the distribution from which the sample was
Probability Population
Sample Inferential Statistics
Figure 5.1
Probability and Statistics
P369463-Ch005.qxd 9/2/05 11:14 AM Page 271
5.1 Data Displays
271
taken. A sampling distribution is obtained by selecting with replacement all possible samples of size n from a population. It is too expensive to sample every member of a population for most purposes such as political polls, brand preferences, and research studies; thus sampling is essential to determine people’s preferences and testing outcomes. From just a few random samples, valid inferences can be made regarding the population. The main question is to determine how few to sample to ensure valid inferences at appropriate costs. Some of the population parameters of interest are the mean, median, mode, standard deviation, variance, range, probability of success, and degrees of freedom. Each of these parameters can be estimated from statistics (random variables). The statistics have their own sampling distributions. The standard deviation of sampling distributions is often called the standard error. The joint density of a random sample is the product of the individual densities, as randomness implies independence. If all the samples are drawn from the same population, the samples are said to be independent and identically distributed (iid) and the joint density is the product of the densities: n
’f
xi
(5–1)
( x i ).
i =1
Before discussing sample statistics, we present several ways of organizing and displaying data.
5.1
Data Displays One useful display for organizing data is the ordered stem and leaf, introduced by John Tukey. Consider the following sorted, random sample of size 100 taken from a normal distribution with m = 50 and s = 16. (setf data (repeat #'round (sim-normal 50 16 100))) returned 3 37 48 63
13 37 49 63
21 38 49 63
22 38 49 64
23 39 50 65
24 39 50 65
25 39 50 66
26 41 51 66
27 41 51 66
27 42 54 68
28 43 55 69
28 44 55 69
28 44 55 69
29 44 56 71
31 44 57 72
31 45 57 74
32 45 58 74
32 45 58 74
33 46 59 76
36 47 60 76
37 47 61 76
37 47 61 81
37 47 61 88
37 48 62 94
37 48 62 94.
An ordered stem and leaf display, which uses the first digit as the stem and the second digit as the leaf is shown in Figure 5.2. The same data are presented in finer resolution as Figure 5.3 by creating separate stems for the leaves 0–4 and 5–9. Determining the stem (first digits) and leaf (last digits) selections for an accurate display requires judgment. For 3 or more digits or decimal displays (e.g., 234 or 4.0157), the selection of the stems depends on what is to be emphasized and conveyed. Usually the first one or two digits are used as the stems. The first column shows the number of entries in each row, the second column shows the stems, and the third column shows the leafs. The command (stem&leaf data) prints the following.
P369463-Ch005.qxd 9/2/05 11:14 AM Page 272
272
Chapter 5 Sampling, Data Displays, Measures of Central Tendencies
Cum 1 2 14 32 54 69 88 96 98 100
Stem 0 1 2 3 4 5 6 7 8 9
Figure 5.2
Leaf n = 100 3 3 123456778889 112236777777788999 1123444455567777888999 000114555677889 0111223334556668999 12444666 18 44
Stem and Leaf Data
0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9
3 3 12345 6778889 11223 677777778899 11234444 5556777788999 000114 55677889 0111223334 55666899 12444 666 1 8 44
Figure 5.3
Finer Resolution of Stem and Leaf Data
If one visualizes the display on its side in Figures 5.2 and 5.3, the shape of the distribution is discernible. Clearly the sample is not from an exponential distribution but could be from a normal distribution. The command (stem&leaf data) returns a stem and leaf plot of the data. The first column tallies the number in the row, the second is the stem, and the third the leaves. The command (Hdp data) returns a horizontal dot plot pictorial with asterisks used in lieu of the actual values (Figure 5.4).
P369463-Ch005.qxd 9/2/05 11:14 AM Page 273
5.1 Data Displays
273
* * *********** ************** ***************** ******************** ***************** ************ **** ***
Figure 5.4
Distribution of Sample Data
Group A 07 5778 135 147 59 3
Figure 5.5
Stem Group B 1 2 18 3 8 4 379 5 69 6 27 7 57 8 058 9
Back-to-Back Stem and Leaf Display
An advantage of the stem and leaf display occurs in comparing two sets of data. Suppose Group A consists of scores 10 17 25 27 27 28 31 33 35 41 44 47 55 59 63 and Group B consists of scores 21 28 38 43 47 49 56 59 62 67 75 77 80 85 88. A back-to-back display is shown in Figure 5.5, from which it is obvious that Group B has higher scores.
Boxplots Boxplots, also called box-and-whisker plots, are an excellent way to portray the distributional quantities of data. The boxplot shows the median in a box with tails emanating from the box, up and down (or left and right), showing the range of the data. The upper (or right) border of the box shows the upper quartile, which is the median of the data above the median. The lower (or left) border of the box shows the lower quartile, which is the median of the data below the median. The IQR is the interquartile range. Relatively shorter
P369463-Ch005.qxd 9/2/05 11:14 AM Page 274
Chapter 5 Sampling, Data Displays, Measures of Central Tendencies
274
lines mark the boundaries of the 75th percentile, plus 1.5 times the IQR, and the 25th percentile, minus 1.5 times the IQR, denoting the outlier region. EXAMPLE 5.1
Prepare a boxplot (Figure 5.6a) for the following data. 24 21 20 31 27 41 43 34 17 25 32 30 28 26 33 36 39 27 28 31 7 63.
The sorted ascending data are 7 17 20 21 24 25 26 27 27 28 28 30 31 31 32 33 34 36 39 41 43 63. The minimum value is 7, the maximum value is 63, and the median is 29, the average of the 11th and 12th values (28 and 30). There are 11 values below the median and 11 above, rendering the first quartile at 24.75 and the third quartile at 34.5. The relatively small horizontal lines mark the 1.5 times the IQR above the 75th percentile and below the 25th percentile. Data outside these boundaries (63 and 7) are plotted separately as outliers, denoted by asterisks. Figure 5.6b shows multiple boxplots of data.
The command (boxplot data) returns the minimum, the 25th percentile, the median, the 75th percentile, and the maximum of the data. With the above data assignment, (boxplot data) returns 63
34.5
29
24.75 7
min = 7, q25 = 24.75, median = 29, q75 = 34.5, max = 63.
Frequency Distributions and Histograms Oftentimes raw data need to be classified and summarized to be useful. Frequency distributions are used to classify data, and plots of the data are called
P369463-Ch005.qxd 9/2/05 11:14 AM Page 275
5.1 Data Displays
275
(a) 60
* 63
50 40 34.5 29
30
24.75 20 10 * 7 0
Figure 5.6a
Boxplot of Data
(b) 80 70 60
C1
50 40 30 20 10 0 C1
C2
C3
Figure 5.6b Multiple Boxplots (Minitab)
C1
C2
P369463-Ch005.qxd 9/2/05 11:14 AM Page 276
Chapter 5 Sampling, Data Displays, Measures of Central Tendencies
276
Histogram of Data, with Normal Curve 6 Frequency
5 4 3 2 1 0 15
20
25
30
35
40
45
Data
Figure 5.7
Frequency Distribution and Histogram histograms. For example, consider the following data that are displayed as a frequency distribution and as a histogram with normal curve in Figure 5.7. 24 21 20 31 27 41 43 34 17 25 32 30 28 26 34 36 39 27 28 31
The command (setf data (sample 100 (upto 1000))) assigns data to a random sample of size 100 from the integers 1 to 1000. The template (histo-info data number-intervals) returns the sorted data of numbers and determines the number of class intervals, unless number-intervals is specified, and boundaries. For example, (histoinfo data 7) generated the results in Example 5.2 below, where data are a random sample of size 100 from the integers 1 to 1000. EXAMPLE 5.2
Prepare a histogram and boxplot of the following data of size 100. Solution The command (histo-info (sample 100 (upto 100)) 7) returned the following display. The sorted sample is 12 168 321 494 701 863
19 170 341 495 704 863
35 173 355 496 738 870
93 179 376 507 750 870
104 182 382 522 759 880
114 183 383 557 760 907
115 216 388 560 778 908
130 240 412 565 779 910
139 245 416 578 780 919
140 253 421 585 787 919
141 259 438 625 792 954
149 260 456 640 792 957
151 268 457 653 800 965
156 275 460 662 818 988
158 161 287 313 481 483 674 690 840 848 997.
164 321 492 695 860
P369463-Ch005.qxd 9/2/05 11:14 AM Page 277
5.2 Measures of Location
277
Class Boundaries 11.5 152.5 152.5 293.5 293.5 434.5 434.5 575.5 575.5 716.5 716.5 857.5 857.5 998.5
Frequency 13 19 12 15 11 14 16 100
997 778.75 487.5 241.25 12
Figure 5.8
Boxplot
20 15 10 5 0 11.5
Figure 5.9
152.5
293.5
434.5
575.5
716.5
857.5
Histogram
The range is 985 = (997 - 12). The number of recommended intervals is 7 (27 = 128 > 100 = sample size). The minimum category length is 140.7 (985/7). Round up to use 141. The boundaries of the 7 class intervals are then (11.5 152.5) (152.5 293.5) (293.5 434.5) (434.5 575.5) (575.5 716.5) (716.5 857.5) (857.5 998.5). The boxplot is shown in Figure 5.8 and the histogram is shown in Figure 5.9.
5.2
Measures of Location
Mean Suppose an estimate for m is desired. Immediate candidates are the mean, median, and mode. An estimate is said to be robust if its value does not
P369463-Ch005.qxd 9/2/05 11:14 AM Page 278
278
Chapter 5 Sampling, Data Displays, Measures of Central Tendencies
change much when spurious data occur in the underlying sample. We will see that the median and mode are more robust than the mean. The most useful statistic from a sample is the sample mean, designated as X and defined as X =
X1 + X 2 + . . . + X n
(5–2)
.
n X is a random variable and, as such, has an expected value, variance, and density function, and so on, accorded to all RVs. We use X to indicate the RV before the random sample is selected and x to indicate the value of the RV after the sample is taken. The expected value of X can be calculated from (5–2) as E( X ) = E
Ê X1 + X 2 + . . . + X n ˆ m + m + . . . + m nm = = = m, Ë ¯ n n n
and the variance V( X) is computed as V
2 2 2 ns 2 s 2 Ê X1 + X 2 + . . . + X n ˆ s + s + . . . + s = = = . Ë ¯ n n2 n2 n
The mean and variance of a sample are usually designated as m x and s 2x. Notice that the samples are independent from the random selection process. Recall that V(aX + b) = a2V(X). Also note that as the sample size increases, the variance of the sample decreases.
EXAMPLE 5.3
For the population {10 12 14 8 16 20 12 8}, find m and s. Solution
Ê Â x i ˆ Ê 100 ˆ m =Á = 12.5; ˜= Ë N ¯ Ë 8 ¯ s2 =
Â(X
- m )2
i
N
EXAMPLE 5.4
= 14.75 with s = 14.75 = 3.84.
Find the mean and variance of RV X with density f(x) = 3e-3x, x > 0. Solution Recognizing the exponential density distribution, we see that the mean m = E( X ) =
1 k
=
1 3
= 0.3333 and s 2 =
1 k
2
=
1 9
= 0.1111.
P369463-Ch005.qxd 9/2/05 11:14 AM Page 279
5.2 Measures of Location
279
The command (sim-exponential k n) returns a sample of size n from the exponential distribution with parameter k. For example (setf data (sim-exponential 3 30)) returned the following random sample: 0.650 0.188 0.147 1.181 0.081 0.396 0.187 0.238 0.109 0.306 0.006 0.020 1.025 0.482 0.237 0.010 0.294 0.353 0.085 0.237 1.089 0.009 0.525 0.048 0.648 0.416 0.249 0.687 0.871 0.159. The command (mu-svar data) returned 0.364 = x ª 1/3 = m and 0.112 = s2 ª 1/9 = s 2.
EXAMPLE 5.5
Let X1, X2, and X3 be random samples from a normal distribution with mean 1 m π 0 and s 2 = . Find values of a and b to ensure that RV W = aX1 + bX2 25 - 2X3 is standard normal, N(0, 1). Solution
E( W ) = E( aX1 + bX 2 - 2 X 3 ) = am + bm - 2m = ( a + b - 2)m = 0.
Since m π 0, a + b - 2 = 0, or a + b = 2. V ( W ) = a 2s 2 + b2s 2 + 4s 2 =
a 2 + b2 + 4
= 1, or a 2 + b 2 = 21.
25 Solving, a 2 + (2 - a )2 = 21 fi 2a 2 - 4a - 17 = 0 or a = 4.0822, b = -2.0822.
The (ex5.5 m n) returns n random sample means of size 3 from a normal distribution N(m, 1/25). For example, (setf data (ex5.5 10 50)) returned 50 sample means with a sample size of 3: 2.110 1.609 -2.321 -0.562 -1.245 -1.209 -1.151 -0.354 -0.808 -0.150 0.599 0.849 -0.589 1.520 0.502 0.612 -1.267 1.971 -0.318 -0.286 0.744 -0.361 0.797 1.380 0.473 -0.428 1.293 0.170 0.322 0.769 0.543 0.627 0.905 -0.612 -0.658 -0.291 0.396 0.484 0.708 1.204 1.128 -1.980 -1.750 -0.042.
-0.319 -0.645 -0.536 -1.949 -1.113 0.682
(mu-svar data) returned 0.029 = x ª 0 = m and 1.064 = s2 ª 1. Try (mu-svar (ex5.5 68 50)) for various values of m and n.
Median Another measure of central tendency is the median X˜ of a distribution for RV X. The median is defined as that value of X such that P(X £ X˜ ) ≥ 1/2 and
P369463-Ch005.qxd 9/2/05 11:14 AM Page 280
280
Chapter 5 Sampling, Data Displays, Measures of Central Tendencies
P(X ≥ X˜ ) ≥ 1/2. The value X˜ is the unique midpoint of a continuous distribution but is not necessarily unique for a discrete distribution. If the length of the sorted sample from a discrete distribution is odd, the median is the middle value. If the length of the sample is even, then any value between the two middle values is considered the median. However, it is common to average the middle two values and to call this average the median. The median minimizes the sum of the absolute values, that is, E[|Xi - X˜|] is a minimum from the median, whereas the mean minimizes the sum of the squared deviations, E[(x - x )2]. EXAMPLE 5.6
a) Find the median for the population if each member is equally likely to occur. b) Verify that the sum of the absolute deviations from the median is less than the sum of the absolute deviations from the mean. c) Verify that the sum of the squared deviations from the mean is less than the sum of the squared deviations from the median. Population: {20 8 12 8 12 16 14 10} has a mean of 12.5. Solution The command (setf data (sort '(20 8 12 8 12 16 14 10) #' 0.
m
m
m (2m + 1)! È lx x ˘ l e - lx [1 - e - lx ] = * Í- e ˙ Î ˚ 0 m! m! (2m + 1)! = [1 - e -2 mlx ]l e - lx for x > 0. m! m!
It can be shown that the variance of the median for random samples from a ps 2 . We will see that the normal population of size 2n + 1 is approximately 4n mean is more efficient when we compare the efficiency of the mean with the efficiency of the median in the next chapter.
5.8
Summary Data displays can show information precisely and concisely. However, we must be wary of misguided uses of some displays with inflated or deflated scales to show huge gains or small losses. Displaying data requires judgment. One of the most useful results for statistical applications is the joint distribution of X and S 2 when sampling from a normal distribution. X is a ( n - 1) S 2 normal random variable and is a chi-square RV independent of X. s2 The normal distribution is the only distribution for which this result is true. The method of inverse cumulative distribution functions is used to simulate samples from random variable distributions. The method is based on the fact that a RV Y, which is a function of a cumulative distribution function of a RV X, is continuous uniform on the interval [0, 1]. The polar method can be used to simulate normal random variables. The Monte Carlo technique for estimating integrals is based on the expected value of a function defined on the interval [a, b] with use of the continuous uniform density. It is simple to use and produces estimates that are useable in statistical applications.
EXAMPLE 5.38
Describe the following data set of 100 random numbers in regards to the mean, median, mode, IQR, MAD, 5% trimmed mean, range, sample
P369463-Ch005.qxd 9/2/05 11:14 AM Page 311
5.8 Summary
311
variance, sample standard deviation, skewness, kurtosis, IQR, and outliers. Solution The data is already assigned to the variable data - 5.38.
(setf data=5.38 '(320 845 668 714 3972 454 852 181 164 478 19 301 722 919 768 199 248 412
124 389 557 713 875 630 510 909 39 49 815 258 731 2569 540 833 434 929 261 635 560 24 649 789
27 552 764 738 791 339
925 458 300 140 227 971
883 427 379 879 337 938
927 371 477 665 174 380 247 858 247 371 904 710 925 5981 70 486 739 431 227 569 630 163 795 622 509 929 120 253 984 436 378 630 397 318 342))
258 350 543 248 642
A boxplot (boxplot data-5.38) of the data shows minimum value = 19, Q25 = 300.25, median = 509.5, Q75 = 783.75, maximum = 5981.
19
5981 300.25
509.5
783.75
A summary of some useful data commands are as follows: mean is (mu data - 5.38) Æ 624.39, median is (median data - 5.38) Æ 509.5, mode is (mode data - 5.38) Æ 630, interquartile range is (IQR data) Æ 483.5, median absolute deviation from the median is (MAD data - 5.38) Æ 250, 5% trimmed mean (mu (trim-mean data - 5.38 5)) Æ 531.15, range is (range data - 5.38) Æ 5962, sample standard variance is (svar data - 5.38) Æ 526205.35, sample standard error is (sqrt (svar data - 5.38)) Æ 725.4, and list of outliers is (outliers data - 5.38) Æ (3972 5981 2569). The command (depict data-5.38) returns
P369463-Ch005.qxd 9/2/05 11:14 AM Page 312
312
Chapter 5 Sampling, Data Displays, Measures of Central Tendencies
N 100
Mean 624.390
Median 509.500
Mode 630.000
Trim-5% 531.156
Sum 62439
Mssd 537993.560
Std-Dev 725.400
Se-Mean 72.540
SVAR 526205.44
IQR 483.500
MAD 250.000
Range 5962
Mid-Range 2981.000
Q-1 300.250
Q-3 783.750
Min 19.000
Max 5981.000
Skewness 5.137
Kurtosis 34.841
CV 1.162
along with a horizontal dot plot. Horizontal Dot Plot N = 100 **************************************** ********************************* * *
EXAMPLE 5.39
The command (random-sample n) generates a random sample from a distribution. Try to infer the distribution from the sample statistics and horizontal dot plots. Then try to infer the parameters of the distribution. Enter *RS* to see the command that generated the data. (random-sample 50) returned
Solution
(0 1
1 0
1 0
0 0
1 1
1 0
0 0
0 1
0 0
0 1
0 1
1 0
0 1
1 1
0 1
0 0
1 1
1 1
1 0
1 0
1 0
1 1
0 0
1 1),
an extremely easy sample to decipher. The distribution is discrete. (HDP *) ************************* ************************ We can conclude that the sample is from a Bernoulli distribution. This exercise can be challenging and warrants your appreciation of inferential statistics.
P369463-Ch005.qxd 9/2/05 11:14 AM Page 313
313
Problems
PROBLEMS 1. Find the mode for a binomial RV given n and p and, specifically, for a binomial RV X with n = 20 and p = 3/4. ans. p(n + 1) - 1 £ x £ p(n + 1) 15. 2. Find the mode for a) binomial RV given n = 10, p = 1/8); b) binomial RV given n = 30, p = 5/8). 3. Let X1, X2, and X3 be a random sample from a normal distribution with mean m π 0 and s 2 = 1/36. Find values a and b for standard normal RV W = aX1 - 4X2 + 2bX3. ans. ( a, b) ~ (2 + 6, 1 - 6/2) or (2 - 6, 1 + 6/2). 4. Given that RV X has a continuous uniform distribution on the interval [3, 5], a) find and sketch the density distribution of the sample mean from random samples of size n = 49. Compute P( X < 4.1). 5. Find the median of the Weibull density and the exponential density functions. Check the simulated estimate, using (median (sim-weibull a k n)) with parameter values for a and k; for example (median (sim-weibull 2 3 100)). ans. median = [(Ln 2)/k]1/a. 6. Find k and the median of the distributions for density functions 2
a) f(x) = kxe-x for x > 0,
b) f ( x )
k x +1
on [0, 1].
7. For exponential RV X with k = 1, find fY(y) for Y = -Ln X. -y ans. e-(e +y) for y on (-•, •). 8. The waiting time in minutes at a clinic is distributed N(30, 5). a. Compute the probability that the waiting time for a randomly selected patient exceeds 28 minutes. b. Compute the probability that the average waiting time of 12 randomly selected patients exceeds 28 minutes. 9. Boxplot and make a histogram with class size 5 for the following data: (45 39 41 49 26 28 41 40 33 31 38 28 41 49 36 49 31 49 28 31 45 28 30 40 39) 10. Find a) skewness a3 and b) kurtosis a4 for N(m, s 2). 11. Find the skewness a3 and the kurtosis a4 for the Poisson distribution with parameter k. ans. 1/ k 3 + 1/k. 12. Find the coefficient of variation D for the binomial distribution with parameters n and p and for a Poisson distribution with parameter k.
P369463-Ch005.qxd 9/2/05 11:14 AM Page 314
314
Chapter 5 Sampling, Data Displays, Measures of Central Tendencies
13. Let RV Y = eX where RV X is N(m, s 2). Use transformation of variables to find fY(y) and compute P(2 £ Y £ 4) = P(Ln 2 £ X £ Ln 4) for m = 0 and s 2 = 1. ans. 0.16128. 14. Find the mean, median, and mode for the a) exponential density distribution given by f(x) = e-x for x ≥ 0, b) density distribution f(x) = 4x(1 - x2) for x on [0, 1]. 15. Find the expected mean, median, and mode of the outcomes from tossing a pair of fair dice repeatedly. ans. 7 7 7. 16. A sample of size n = 33 is taken from N(2, 4). Compute E( X) and P(S2 > 4). 17. For the population {1 2 6 7}, select all the samples of size 2 with replacement and compute the mean and variance of the population (m, s 2) and 2 Ê s ˆ the mean and variance of the sampling distribution of means x , . Ë n¯ s2 Show that E( X) = m = 4 and that s 2x = = 6.5/12 = 3.25. For random 2 samples of size 5 from the integers 1 to 100, V( X) = ___. ans. 833.25/5. 18. Simulate a random sample of size 100 from the density f(x) = 2x on [0, 1] and show that the mean of the samples is close to E(X) = 2/3. Set u to the antiderivative of 2x, which is x2. Then solve for x to get x = u , a member of the sample. Generate 100 samples from the continuous uniform on [0, 1] and take the square root of each. The average of such values is an estimator for E(X). The command (mu (mapcar #'sqrt (sim-uniform 0 1 100))) should return a value near 2/3. 19. Simulate 100 samples from the density f(x) = 3x2 on the domain [0, 1] and show that the mean of the samples is close to E(X) = 3/4. (mu (mapcar #'cube-root (sim-uniform 0 1 100))) should return a value near 3/4. 20. Simulate 100 samples from the density f(x) = 6x-2 for x in [2, 3] and show that the mean of the samples is close to E(X) = 6(Ln 3 - Ln 2) = 2.43279. 21. Minimize the sum of the squared deviations about c where n
Â(x f ( c) =
i
- c )2
i =1
n -1
to show that c = X .
22. Find the IQR for RV X with density function given by f(x) = 3x2, 0 £ x £ 1.
P369463-Ch005.qxd 9/2/05 11:14 AM Page 315
315
Problems
23. Find the median absolute deviation from the median (MAD) for the following data: (15 35 65 37 86 35 98 49 50 64 65)
ans. 15.
24. Find the sample standard deviation, IQR, and MAD for the following data sets: a) 18 12 34 65 27 74 58 45 76 24 56 100; b) 18 12 34 65 27 74 58 45 76 24 56 500. c) Compare the sample standard deviation s with
IQR 1.35
and
MAD
as an
0.675
indication of the dispersion of each set. 25. Determine the candidate outliers for the two data sets in Problem 24. ans. 500. 26. Find the mean, median, and IQR of a Weibull RV x with parameters k = 1 and a = 2. 27. Verify that the Median (aX + b) = a * Median(x) + b for a = 5, b = 12, and x = ¢(1 2 3). Use software commands to generalize. (median (mapacar #'+ (list-of 11 12) (mapcar #'* (list-of 11 5) (upto 3))))) Æ 22. (+ (* 5 (median (upto 3))) 12) Æ 22. (defun verify-median (a b x) (let* ((n (length x)) (med1 (median (mapcar #'+ (list-of n b) ; median (ax + b) (mapcar #'* (list-of n a) x)))) (med 2 (+ (* a (median x)) b))) ; a * median (x) + b (list med1 and med2))) The command (verify-median a b x) returns the two medians computed accordingly. (verify-median 5 12 (upto 3)) returns (22 22), showing the median is 22. 28. A sample of size n = 25 is taken from N(30, 16). Compute a) E( X),
b) V( X),
c) P( X > 31),
d) P(S2 < 20).
29. Consider two baseball players A and B, where player A has a higher batting average than Player B during both halves of the season, but Player B has the higher batting average over the entire season. Use weighted mean concept to distinguish the importance of different data. 30. Find the mean and variance of RV Y = X 3, where RV X has density f(x) = 3x2 on [0, 1]. Check values by finding E(Y ), E(Y 2), and V(Y ). 31. Find the mode of the beta distribution on the interval [0, 1]. ans. (a - 1)/(a + b + 1).
P369463-Ch005.qxd 9/2/05 11:14 AM Page 316
316
Chapter 5 Sampling, Data Displays, Measures of Central Tendencies
SOFTWARE EXERCISES The variables pi - 100 return the first 100 digits, pi - 200 return the second 100 digits, . . . , and pi - 900 return the ninth 100 digits of pi for data to use in the excercises and already are assigned. 1. (upto n) returns a list of integers from 1 to n. (upto 5) returns (1 2 3 4 5). (upto n) returns a list of integers from 0 to n. (upto 5) returns (0 1 2 3 4 5). 2. (swr n population) returns a random n-sample from population sampling with replacement. (swr 3 (upto 10)) may return (4 9 5). (swor n population) returns a random sample from population without replacement. (swor 10 (upto 12)) Æ (6 3 2 4 7 9 8 11 10 5). (swr 10 (upto 12)) Æ (9 9 9 11 5 10 10 4 12 7). 3. (sim-uniform a b n) returns a random n-sample from the continuous uniform on [a, b]. (sim-uniform 0 1 5) may return (0.3445 0.6676 0.3217 0.5698 0.8767). 4. (sim-binomial n p m) returns m random samples from a binomial with parameters n and p. (sim-binomial 10 1/2 5) may return (4 6 5 7 4). (mode (sim-binomial n p m)) returns the mode of the binomial samples. Check the validity of the analytical answers to Problem 2. 5. (sim-poisson k n) returns a random n-sample from a Poisson distribution with parameter k. (sim-poisson 5 10) may return (5 6 5 3 4 5 6 5 4 5). For large n (n ≥ 30), the distribution is nearly normal. Try (HDP (sim-poisson 10 100)) to see a horizontal dot plot of a nearly normal distribution. 6. (sim-exponential k n) returns a random sample of size n from an exponential distribution with parameter k. (sim-exponential 2 5) may return (0.0040 0.3682 0.6878 2.040 0.2991). The average of the samples is an estimator for k. Try (mu-svar (sim-exponential 2 100)) to see x and s2 estimates. 7. (sim-normal m s n) returns a random n-sample from a normal distribution with parameters m and s. Try (mu-svar (sim-normal 0 1 20)) to see x and s2 estimates. (sim-normalX+-Y mx sx nx my sy ny) prints stem and leafs with use of simulated samples of size nx and ny for RVs X, Y, X + Y, and X - Y,
P369463-Ch005.qxd 9/2/05 11:14 AM Page 317
317
Software Exercises
along with estimates for their parameters, where X ~ N(mx, s x2) and Y ~ N(my, s 2y). Try (sim-normalX+-Y 10 3 100 16 2 100). 8. (sim-gamma a k n) returns a random n-sample from a gamma distribution with parameters a and k. Try (mu-svar (sim-gamma 2 5 10)) to recover estimates for the expected value a/k and sampling variance a/k2. 9. (sim-weibull a k n) returns a random n-sample from a Weibull distribution with parameters k and a. (sim-weibull 2 3 5) may return (0.2701250 0.4768672 0.6331509 0.6594265 0.5239947). 10. (from-a-to-b a b step) returns a list of integers with step difference of 1 if not specified. (from-a-to-b 5 10 1) returns (5 6 7 8 9 10). 11. (Boxplot data) for the following data: a) (upto 10),
b) (swr 10 (upto 10)),
c) (from-a-to-b 10 50).
12. Find the mean, median, and mode from data generated from the software command (swr 10 (upto 10)). 13. Stem and leaf the population generated from the software command (from-a-to-b 10 90)) What is the distribution? Verify that E(X) = 50 and V(X) ª 546.67 by using the command (mu-var (from-a-to-b 10 90)). (stem&leaf (from-a-to-b 10 90)) returns a stem and leaf plot of the actual values. Compare (sim-normal 50 5 500) with (stem&leaf (simnormal 50 5 500)) to see the value of the stem and leaf plot. 14. Generate 100 random samples from the exponential distribution with parameter k = 5. Then compute an estimate for E(X) = 1/5 with the command (mu (setf data (sim-exponential 5 100))). Try (Hdp data) to see sample distribution. 15. Generate a random sample of size 100 from the population 1–100. Compute x and compare with m = 50. Vary the size of the sample as well as the population. Predict m and compare with X computed with the command (mu (swr 100 (upto 100))). 16. Generate 100 random samples from the continuous uniform on [7, 17], compute X , and compare with (7 + 17)/2. (mu (sim-uniform 7 17 100)). 17. Generate 100 random samples from the binomial distribution with parameters n = 12 and p = 3/4. Compute X and compare with np. (mu (sim-binomial 12 3/4 100)).
P369463-Ch005.qxd 9/2/05 11:14 AM Page 318
318
Chapter 5 Sampling, Data Displays, Measures of Central Tendencies
18. Generate 100 samples from a Weibull distribution with a = 1 (exponential) and k = 2. Then take the average of the sample to test the nearness to the expected value. Repeat for 100 samples from an exponential with k = 2 for similar results. Why are they similar? (mu (sim-weibull 1 2 100)) ª (mu (sim-exponential 2 100)). 19. (sim-beta a n) returns a random sample of size n with parameters a and b = n - a + 1. Show that (mu (sim = beta 10 19)) Æ ª 10/20 = 0.5. 20. Verify that the skewness is 0 and the kurtosis is 3 for N(m, s 2) through simulation, taking the following steps: a) (sim-normal m s n) returns n random samples from N(m, s 2). Choose values for m, s, and n. For example, (setf n-data (simnormal 5 12 100)). b) (skewness n-data) returns a3 of the data (should be close to 0). c) (kurtosis n-data) returns a4 of the data (should be close to 3). 21. Reference Software Exercise 5, (setf p-data (sim-poisson 4 100)), to generate 100 samples from a Poisson distribution with k = 4. The (skewness p-data) and (kurtosis p-data) should return values close to the theoretical values of 1 1 . and 3 + k k 22. Verify for a binomial distribution that skewness a 3 = ( q - p)/ npq and kurtosis a 4 = 3 +
1 - 6 pq
.
npq (msk (sim-binomial 10 3/4 100)) returns the mean, skewness, and kurtosis for random sample of size 100 from a binomial distribution with parameters n = 10 and p = 3/4. 23. To simulate a sample from a chi-square distribution, use the command (sim-chi-sq v n), where v is number of degrees of freedom and n is the desired number of samples. Try (sim-chi-sq 7 100) and check the closeness of x to 7 and s2 to 14. Recall that for a chi-square RV X, E(X ) = v and V(X ) = 2v. For software exercises 24–28, use the software command (simintegrate function a b n) to return an estimate of Úba function(x)dx, where n is the number of values taken in the interval and averaged. Compare with the exact value for simple integrations. Increase n to see that the accuracy of the estimate generally improves. Usually a large number of iterations are necessary for sufficient accuracy. Try the value of 100 for n. The density function is the continuous uniform random
P369463-Ch005.qxd 9/2/05 11:14 AM Page 319
Software Exercises
319
variable on [a, b], and we are taking the expected values of functions of this random variable. The command (simpson function a b n) is an implementation of Simpson’s rule for integrating function from a to b with n the number of subintervals. The Monte Carlo simulation may be compared with Simpson’s computation. 24. Find an estimate using the sqrt function for Ú425 xdx and compare the estimate to the exact answer. 25. Estimate Ú10 2 Log n xdx, using the Monte Carlo simulation with the continuous uniform density on the interval [2, 10]. ans. E(Ln x) = Ú10 2 Ln xdx ª (sim-integrate 'Log 2 10 100) ª 13.64. 26. Estimate Ú10 sin xdx, using the Monte Carlo simulation with the continuous uniform on [0, 1]. Sin is a software function. 1 27. Estimate Ú 0 sec xdx, using the Monte Carlo procedure for integration. ans. 1.23.
Compare with Simpson’s rule (simpson 'sec 0 1 100) Æ 1.22619. ans. E(sec x) = Ú10 sec xdx ª (sim-integrate 'sec 0 1 100) ª 1.23. 28. The function (sin x)/x is written as (sinx/x x), where x is the value to be evaluated. 1 sin x dx. Estimate Ú 0 x 29. Check the theoretical median [(Ln 2)/k]1/a of a Weibull RV with the simulated value, using the software template (median (sim-weibull a k n)). (Median (sim = weibull 3 5 1000)) Æ 0.5142 ª [(Ln 2)/s]1/3 = 0.5175) 30. (percentile percent list) returns the value from the list of discrete values that is the percentile of the list. For example, (percentile 50 '(1 2 3 4 5)) returns the value 3, the median. a) Find the 12th percentile of the following numbers: 12 17 23 18 24 18 20 19 20 27. b) Find the 60th percentile of the exponential with parameter k = 2. (nth 59 (sort (sim-exponential 2 100) #' 31 = ________. 9. How could the value of the integral sin x / x over the interval [0, 1] be simulated? 10. Find the mode for the binomial RV given n and p and specifically for n = 12 and p = 1/4.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 325
Chapter 6
Point and Interval Estimation
Guess if you can, choose if you dare. ~ Héraclius
This chapter introduces techniques for estimating unknown parameters of distributions from sampled data. Suppose we have a sample from N(m, s 2) and want to find point estimates for m and s 2. The point estimators X and S2 come to mind. Point estimators are derived with use of the methods of moments and the maximum likelihood. Confidence intervals are derived about parameters in regard to precision and sample size. The bootstrap procedure for estimating is illustrated.
6.0 6.1 6.2 6.3 6.4 6.5 6.6 6.7
Introduction Unbiased Estimators and Point Estimates Methods of Finding Point Estimates Interval Estimates (Confidence Intervals) Prediction Intervals Central Limit Theorem (Revisited) Parametric Bootstrap Estimation Summary 325
P369463-Ch006.qxd 9/2/05 11:16 AM Page 326
326
6.0
Chapter 6 Point and Interval Estimation
Introduction Estimation methods are used to provide estimates for the unknown parameters of distributions and to assist in statistical inference. Data can be collected by surveys, simulation, or controlled experiments. If observed data are thought to be from a certain distribution, estimates of the parameters of the distribution would help in the use of the distribution in making inferences and in simulation exercises. Because everything continuously measurable is an estimate in that Heisenberg’s uncertainty principle applies to continuous measurements, it is beneficial to have information pertaining to the bounds on the errors of the estimates. There are two kinds of estimators: point estimators and interval estimators. Actually, even point estimators are interval estimators, with the intervals being negligible for the specified purpose. For example, when asked for the time, one usually gives a point estimate, even though extreme accuracy carries with it a specified error interval. Similarly, the value for gravity varies along the surface of the earth but is considered as a point estimate or a constant in most applications. Sample data are collected for both point and interval estimators to answer questions pertaining to the “best” estimate of the parameters of the underlying population and the confidence in stating that the actual parameters lie between lower and upper bounds. We can never be 100% sure of our estimate, precisely because it is an estimate, but we can be confident to some extent. Parameter estimates are made from statistics from one or more samples. Samples are characterized by their statistics; populations are characterized by their parameters. The parameters of the population are usually unknown. Random samples from populations are indicated X1, X2, . . . Xn; the population members are indicated by x1, x2, . . . , xN where N is the number of members in the population and can be infinite. That is, xi is the ith member of the population and Xi is the ith member of the random sample. Before the sample is taken, each Xi is an RV. To emphasize this point, consider the sampling done as follows and see that each Xi is an RV with sampling done by replacement from the binomial distribution, with n = 100 and p = 1/2.
(swr 10 (sim-binomial 100 1/2 50)) (swr 10 (sim-binomial 100 1/2 50)) ................................ (swr 10 (sim-binomial 100 1/2 50)) (swr 10 (sim-binomial 100 1/2 50))
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 Æ (45 44 54 47 48 45 47 56 49 48) Æ (53 43 48 51 39 49 41 49 51 46) ............................... Æ (48 48 53 55 49 54 53 41 46 41) Æ (51 50 58 44 54 49 50 51 50 57)
A statistic is any function g(X1, X2, . . . , Xn) of the n random samples X1, X2, . . . , Xn. All statistics are random variables before the data are collected. If a single value is used for the statistic, the value is referred to as a point estimate. The statistic before the data are computed is referred to as a point estimator. For example, suppose data are gathered and the sample mean is to be computed. This sample mean X may be used to estimate m, the mean of the
P369463-Ch006.qxd 9/2/05 11:16 AM Page 327
6.1 Unbiased Estimators and Point Estimates
327
population. The sample mean X is a random variable (point estimator or prestatistic) before the data are collected, and its value x is the point estimate when the data are known. Point estimates can be computed by the method of moments or by the maximum likelihood method. Interval estimates about a parameter q are computed from samples from distributions and are expressed as qˆ ± an error term, where qˆ is an estimator for q. The objective of estimation is to specify the population parameters from the samples with a desired level of confidence. Several properties of estimators are discussed. Three generic properties of all estimators are validity, precision, and reliability. An interval estimate of the average height of a human would be extremely reliable if given as 0 to 10 feet, but the estimate would not be very precise. In contrast, an average height given as 3 feet, 7.231457 inches, would be extremely precise but not very reliable for that precision. A scale that consistently weighs people exactly 15 pounds heavy is precise, reliable, but not valid. The standard error of the sampling distribution determines the precision of the estimator. The smaller the standard error, the greater is the precision. Statistics may be reliable without being valid but can never be valid without being reliable. No estimator can be perfect, but estimators can have desirable properties, including the ease of computing the estimator. We seek the estimator that yields the most information for the least cost at an acceptable risk. Estimators have properties such as unbiasedness, efficiency, consistency, sufficiency, and least variance.
6.1
Unbiased Estimators and Point Estimates The symbol q is used to indicate a generic parameter of the population (for example, m, p, s 2), and the symbol qˆ is used to denote the statistical estimator for q. If the expected value of the estimator is equal to the parameter, that is, if (6–1) E(qˆ) = q , the estimator is said to be unbiased, a desirable property in that sometimes the estimate is too high and other times it is too low, but neither always too high nor always too low. If qˆ is biased, the bias B is the absolute difference between the expected and actual value of the parameter. (6–2) B = E(qˆ) - q
EXAMPLE 6.1
a) A coin is flipped n times, revealing X heads. Show that an unbiased X estimator for p, the probability of a head, is Pˆ = . n b) If the experiment is repeated m times, resulting in X1, X2, . . . , Xm heads, x show that an unbiased estimator for p is Pˆ = . n
P369463-Ch006.qxd 9/2/05 11:16 AM Page 328
328
Chapter 6 Point and Interval Estimation
Solution X a) For binomial(X; n, p), E(X ) = np. Using the estimator Pˆ = , we have n Ê X ˆ np ˆ E( P ) = E = = p, Ë n¯ n implying that
X
is an unbiased estimator for p.
n X mnp b) E( Pˆ ) = EÊ ˆ = = p. Ë n¯ mn Unbiased estimators are not unique. Suppose X1, X2, . . . , Xn is a random sample from a normal distribution, N(m, s 2). We seek an estimator for m. Let X1 + X 2 X1 + 2 X 2 X1 - 4 X 2 + 8 X 3 qˆ1 = X1; qˆ2 = ; qˆ3 = ; qˆ4 = . 2 3 4 Observe that m+m E(qˆ2 ) = = m; 2 m + 2m m - 4m + 8m 5m E(qˆ3 ) = = m; E(qˆ4 ) = = , 3 4 4 indicating that qˆ1, qˆ2, and qˆ3 are unbiased estimators but that qˆ4 is a biased estimator. Which is the preferred estimator? The answer in general is the unbiased estimator with the least variance. Since E(qˆ1 ) = m;
V (qˆ1 ) = s 2 ,
V (qˆ2 ) =
2s 2
= 0.5s 2 ,
4
and V (qˆ4 ) =
81s 2
V (qˆ3 ) =
5s 2
= 0.555s 2 ,
9
= 5.0625s 2 ,
16
qˆ2 is preferred over qˆ3 and qˆ3 is preferred over qˆ1. There are times, however, when a biased estimator has a smaller error than an unbiased estimator. An estimator is said to be consistent if the precision and reliability of its estimate improve with sample size. That is, the bias approaches 0 as the sample size approaches infinity. Precisely, lim P ( qˆ - q ≥ e ) = 0 for any e > 0. nÆ•
For example, the estimator X =
Âx
i
becomes more precise and reliable with
n increasing n. The variance
s2
for X decreases with increasing n. All unbin ased estimators are consistent estimators.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 329
6.1 Unbiased Estimators and Point Estimates
329
Another property of estimators involves their mean square error. The mean square error estimator is the expected value of the square of the difference between the estimator and the parameter. MSE(qˆ) = E[(qˆ - q )2 ].
(6–3)
If E(qˆ) = q, that is, if qˆ is an unbiased estimator for q, then the MSE(qˆ) reduces to the V(qˆ). MSE(qˆ) = E[(qˆ - q )2 ] = E(qˆ2 - 2qqˆ + q 2 ) = E(qˆ2 ) - 2qE(qˆ) + q 2 = V (qˆ) + E 2 (qˆ) - 2qE(qˆ) + q 2 2 = V (qˆ) + [ E(qˆ) - q ] = V (qˆ) + B 2 .
(6–4)
Thus, if qˆ is unbiased, MSE(qˆ) = V(qˆ). A minimum variance estimator qˆ for q has the property that V(qˆ) £ V(q*) for all other estimators q*. If the estimator is unbiased, MSE(qˆ) = V(qˆ). An estimator qˆ is said to be sufficient if the conditional distribution of the random samples given qˆ does not depend on the parameter q for any xi. For ˆ = 0.56 from 56 heads out of 100 coin flips does not depend on example, p the order of the flips. An estimator is said to be more efficient than another estimator if it is more precise and reliable for the same sample size n. If qˆ is an estimator for m, then V(qˆ) cannot be smaller than s 2/n. Since V( X) = s 2/n, X is an efficient estimator. Notice that variability determines efficiency.
Cramér-Rao Inequality When there are several unbiased estimators of the same parameter, the one with the least variance is sought. A test to determine whether an unbiased estimator has minimum variance is given by the Cramér-Rao inequality. A minimum variance unbiased estimator of q must satisfy 1
V (qˆ) =
=
ÈÊ ∂ Ln f ( x ) ˆ nE Í ¯ ∂q ÍÎË 1 nE
2
or ˘ ˙ ˙˚
2 Ê ∂ Ln f ( x ) ˆ Ë ¯ ∂q 2
One formulation may be easier to compute than the other. Recall the entropy of a continuous density for RV X is H ( X ) = E[ Log2 f ( x )] =
Ú
•
-•
f ( x ) Log2 f ( x )dx.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 330
330
Chapter 6 Point and Interval Estimation
Interchanging differentiation with integration to maximize the information 2 Ê ∂ Ln f ( x ) ˆ term nE and thus to minimize the entropy leads to the minimum Ë ¯ ∂q 2 variance unbiased estimator. EXAMPLE 6.2
Show that X is a minimum variance unbiased estimator for parameter m from a normal distribution N(m, s 2). Solution f ( x) =
1
2
2
e - ( x - m ) / 2s ;
2p s Ln f ( x ) = - Ln 2p s ÈÊ ∂ Ln f ( x ) ˆ nE Í ¯ ∂q ÍÎË V( X ) =
s2
=
1
2
( x - m )2 ∂ Ln f ( x ) ( x - m ) ; = 2s 2 ∂m s2
2 ˘ n n È( x - m ) ˘ = E( Z 2 ) = . ˙ = nE Í ˙ 4 2 Î s ˚ s s2 ˙˚
=
s2
and X is a minimum variance unbiased 2 n ÈÊ ∂ Ln f ( x ) ˆ ˘ nE Í ¯ ˙˙ ∂q ÍÎË ˚ estimator. However, there still may be a biased estimator with less variance. n
Another Method For any constant c, E[( x - c)2 ] = E[( x - m + m - c)2 ] = E[( x - m )2 + 2( x - m )( m - c) + ( m - c)2 ] = E[( x - m )2 + 0 + E[( m - c)2 ] ≥ E[( x - m )2 ]. That is, X is a minimum variance estimator for m. EXAMPLE 6.3
ˆ is a minimum variance unbiased estimator for parameter Show that 1/ X = K k from an exponential distribution. Solution
f(x) = ke-kx; Ln f(x) = Ln k - kx ÈÊ ∂ Ln f ( x ) ˆ nE Í ¯ ∂k ÍÎË
2
2
ÈÊ 1 ˘ Ê 1 2x ˆ ˆ ˘ + x2 ˙ = nE Í - x ˙ = nE 2 Ëk ¯ Ë ¯ k ÍÎ k ˙˚ ˙˚ n 2 1 1ˆ Ê 1 =n + + = . Ë k2 k2 k2 k2 ¯ k2
Note that E(X 2) = V(X ) + E2(X ) = 1/k2 + (1/k)2.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 331
6.1 Unbiased Estimators and Point Estimates
331
ˆ ) = k2/n implies that X is a minimum variance unbiased estimator for an V( K exponential distribution. The relative efficiency of two estimators qˆ1 to qˆ2 is defined as the ratio of their MSEs, Reff =
MSE(qˆ1 ) . MSE(qˆ2 )
(6–5)
A ratio less than 1 implies that qˆ1 (numerator) is more efficient (smaller mean square error). The estimator X is an unbiased, consistent, sufficient, minimum variance estimator for m.
EXAMPLE 6.4
In considering a random sample X1, X2, . . . , Xn from N(m, s 2), which of these estimators for m is more efficient? X1 + X 2 2 X1 + 3 X 3 qˆ1 = X1, qˆ2 = , qˆ3 = , qˆ4 = X 2 4 Solution Observe that qˆ1, qˆ2, and qˆ4 are unbiased estimators but that qˆ3 5m - 4m m is a biased estimator, with B3 = = = 0.25m. 4 4 The relative efficiency MSE(qˆ1 ) s2 = =2 MSE(qˆ2 ) s 2 / 2 implies that X1 + X 2 qˆ2 = 2 is more efficient (smaller mean square error) than qˆ1 = X1. In considering sample sizes, the variance of the qˆ2 estimator of a sample size of 2n is the same as the variance for the qˆ1 estimator for a sample size of n. Similarly, MSE(qˆ2 ) s 2 / 2 n = = , MSE(qˆ4 ) s 2 / n 2 which implies that qˆ4 is more efficient than qˆ2 for n > 2.
EXAMPLE 6.5
For a normal distribution the sample mean X and sample median X˜ are s2 both unbiased estimators for m. Given that V ( X ) = and V( X˜) = n ps 2 , determine which estimator is more efficient. 2( n - 1)
P369463-Ch006.qxd 9/2/05 11:16 AM Page 332
332
Chapter 6 Point and Interval Estimation
Solution Since both are unbiased estimators, we compare variances. V( X ) s 2/ n 2( n - 1) = = . 2 ˜ V ( X ) ps / 2( n - 1) np As n Æ •, the relative efficiency Æ
2
ª 0.64 < 1 fi that X is 64% more effip cient in estimating m than is the median, which translates into lesser samples and reduced cost. If the sample median X˜ of size of 100 is used to estimate m, the sample mean X of size 64 can be used to estimate m with the same confidence. The command (Median-vs-Mu m s n) returns n trials each, estimating the mean with a sample size of 64 and the median with a sample size of 100 from N(m, s 2). For example, (Median-vs-Mu 50 5 10) prints Medians Æ 49.30 50.05 49.87 49.05 50.04 49.83 49.86 50.40 49.36 50.12 Means Æ 50.75 50.93 49.83 49.95 50.80 48.81 50.44 50.15 49.95 48.41
EXAMPLE 6.6
Show that E( X) = m and V( X) = s 2/n when sampling is with replacement of all possible samples of size n = 2 from the population {1 3 6 10}. Solution The population mean m = (1 + 3 + 6 + 10)/4 = 5; N
The population variance s 2 =
 (x
i
- m )2 / N
i =1
= (16 + 4 + 1 + 25)/4 = 11.5. The 16 samples are: ( 1 1) ( 3 1) ( 6 1) (10 1)
( 1 3) ( 3 3) ( 6 3) (10 3)
X P( X )
( 1 6) ( 3 6) ( 6 6) (10 6)
The 16 sample means are:
( 1 10) ( 3 10) ( 6 10) (10 10)
1 2 3 3.5 [1 2 1 2
1.0 2.0 3.5 5.5
4.5 2
5.5 2
6 6.5 1 2
2.0 3.0 4.5 6.5
3.5 4.5 6.0 8.0
5.5 6.5 8.0 10.0
8 10 2 1]/16
E( X ) = 80/16 = 5; E( X 2 ) = 492/16 = 30.75; V ( X ) = 30.75 - 25 = 5.25 = 11.5/2. That is, E( X) = m; V( X) = s 2/n; and X is an unbiased estimator for m.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 333
6.2 Methods of Finding Point Estimates
EXAMPLE 6.7
333
Given two random samples from the same distribution with sample sizes n1 and n2, show that estimator X = a X1 + (1 - a) X2 is unbiased for m where 0 £ a £ 1. Solution E( X ) = E[aX1 + (1 - a ) X 2 ] = am + (1 - a )m = m fi X is an unbiased estimator for u.
6.2
Methods of Finding Point Estimates Two frequent methods used to find point estimates are the 1) Method of Moments (Karl Pearson) and 2) Maximum Likelihood (R. A. Fisher). In general, maximum likelihood estimators have more desirable properties than method of moments estimators. We will compare the two methods for different estimators.
Method of Moments Estimators (MME) To find method of moments estimators (MME), express the parameter of interest in terms of the population moments. Then use sample moments for population moments. For example, to estimate the population moment E(X ), we use the sample moment X. Population Moments E( X )
Sample Moments  Xi
E( X 2 )
Â(X )
n
E( X r )
n ... Â ( X i )r
...
n ...
...
EXAMPLE 6.8
2
i
(6–6)
Find the method of moments estimators for m and s 2 from N(m, s 2). Solution Express the parameters of interest, m and s 2, in terms of the population moments. E(X ) = m implies that n
ÂX mˆ =
i =1
n
i
= X,
P369463-Ch006.qxd 9/2/05 11:16 AM Page 334
334
Chapter 6 Point and Interval Estimation
and V(X ) = E(X 2) - E2(X ) implies that sˆ 2 =
Â(x ) i
n
2
2
( x i - x )2 Ê Â xi ˆ . -Á ˜ =Â Ë n ¯ n
Recall (see Equation 5–3) that by defining n
Â(X S2 =
i
- X )2
i =1
n -1
,
E( S 2 ) = s 2 . That is, S2 is an unbiased estimator for s 2. Note that the MMEsˆ 2 is slightly biased, tending to underestimate s 2, since the denominator n is larger than the denominator n - 1. Also note that even though S2 is an unbiased estimator for s 2, S is not an unbiased estimator for s. EXAMPLE 6.9
Find the MME for q given the density for RV X is f(x; q) = qe-qx, x ≥ 0. Solution Express q in terms of the population moments by finding the expected value of X. The expected value for an exponential RV is 1/q. E( X ) =
Ú
•
0
xq e -qx dx =
1
1 . Thus qˆ = , q X
where we have substituted the sample moment X for the population moment E(X ).
The command (sample-moment nth sample) Æ the nth moment of the sample. (setf sample (sim-exponential 2 100)) assigns 100 values from the exponential with parameter q = 2 to the variable sample. (sample-moment 1 sample) returns the 1st sample moment; (sample-moment 2 sample) returns the 2nd sample moment. Both moments can be used to estimate q. E(X ) = 1/q fi qˆ = 1/ X. E(X 2) = V(X ) + E2(X ) = 1/q 2 + 1/q 2 = 2/q 2. fi qˆ2 = 2/M2; where M2 is the 2nd sample moment. (sample-moment 1 sample) returned 0.529 with qˆ = 1/0.529 = 1.8 ª 2. (sample-moment 2 sample) returned 0.498 with qˆ =
2 0.498
ª 2.00.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 335
6.2 Methods of Finding Point Estimates
EXAMPLE 6.10
335
a) Find the MME for q given the density f(x; q) = 1/q on [0, q] for RV X. b) Determine if the estimator is unbiased. c) Sample from the continuous uniform on [0, q = 5] and compute the estimates for q using the first and second moments of the sample. Solution x2 q q = fi qˆ = 2 X . Ú0 q 2q 0 2 q x2 x3 q q 2 E( X 2 ) = Ú dx = = fi qˆ = 3 M 2 . 0 q 3q 0 3
a) E( X ) =
q
x
dx =
2(0 + q ) b) E(qˆ) = E(2 X ) = 2 E( X ) = 2m = = q fi the estimator is unbiased. 2
c) (setf sample (sim-uniform 0 5 100)) (sample-moment 1 sample) returned 2.58 as the first moment M, or X, leading to qˆ = 2x = 2 * 2.58 = 5.16 ª q = 5. (sample-moment 2 sample) returned 8.78 for the second moment M2, leading to qˆ = 3 * 8.78 = 5.13 ª q = 5.
EXAMPLE 6.11
For continuous uniform RVs X and Y on [0, q], a) b) c) d) e)
find the MME for q for RV Z = max(X, Y); show that this MME is unbiased; simulate Z using 100 samples for q = 5, that is, f(x) = 1/5 for 0 £ x £ 5; recover an estimate for q from the sample; show that the wallet paradox problem (Chapter 2 problem set Paradox 4) has expected value of 2q/3, using the command (mu (wallet 600 1000)), which returns the average amount won by each player with up to $600 in each wallet played 1000 times. A random number from 1 to 600 is selected for each wallet.
The person with the smaller number receives the larger number in dollars. The wager appears favorable to both. Solution
a) Given RV Z = max(X, Y),
FZ ( z ) = P ( Z £ z ) = P ( X £ z, Y £ z ) =
2 Ê1 z ˆ Ê1 z ˆ z dx * Ú dy = . Ú Ëq 0 ¯ Ëq 0 ¯ q2
P369463-Ch006.qxd 9/2/05 11:16 AM Page 336
336
Chapter 6 Point and Interval Estimation
Differentiating FZ to get fZ, f Z( z) = E( Z ) =
2
2z
on [0, q ] and
q2 q
2z 2
0
q2
Ú
E( Z ) =
Ú
q
0
dz =
2z 3 q 2q 3Z = fi qˆ = . 2 3q 0 3 2 2
2z 3
2z 4 q q 2 q 2 Ê 2q ˆ q2 = fi V( Z) = = dz = . q2 4q 2 0 2 2 Ë 3¯ 18
3 2q Ê 3Z ˆ 3 = E( Z ) = * = q, b) E(qˆ) = E Ë 2 ¯ 2 2 3 implying that qˆ is an unbiased estimator. c) The command (setf X (sim-uniform 0 5 100)) assigned the following 100 random samples to X. 4.36 3.91 1.57 0.83 4.64 1.33 0.53
3.89 2.62 4.73 3.91 0.45 3.52 3.31
2.15 0.79 4.82 0.83 0.47 3.84 2.93
1.26 2.64 0.12 4.19 4.82 0.85 2.85 3.39 1.59 2.89 2.98 1.67 3.27.
0.01 3.77 2.99 0.74 4.20 1.80
4.30 4.42 4.55 4.26 0.87 2.93
2.99 3.59 3.78 3.39 3.95 3.34
3.62 3.55 4.56 1.63 4.64 4.35
1.39 2.63 4.79 4.61 4.04 2.48
3.66 3.18 1.13 2.45 1.32 4.62
0.05 2.93 0.55 2.87 2.11 2.38
0.16 4.00 2.26 0.74 3.54 3.93
4.20 4.88 0.16 0.51 1.58 1.38
1.23 3.60 2.35 2.37 3.04 1.68
4.62 1.29 2.08 3.32 2.90 4.91
The command (setf Y (sim-uniform 0 5 100)) assigned the following 100 random samples to Y. 4.52 1.57 0.40 0.13 1.61 3.30 2.28
3.11 0.05 4.28 1.31 1.55 2.43 0.61
1.57 1.20 4.87 2.29 3.62 2.89 1.41
3.31 1.01 3.77 4.27 3.29 0.60 3.72 1.48 0.59 2.24 2.85 2.94 2.58.
0.08 3.26 0.12 0.35 2.12 4.36
2.36 3.34 3.36 1.00 4.24 3.92
2.91 0.11 3.62 0.00 2.53 0.97
0.24 3.13 0.49 0.92 2.72 4.80
1.51 4.71 2.95 0.80 3.76 0.51
0.90 4.86 2.09 2.73 4.07 3.97
2.83 2.96 1.80 2.47 1.61 4.16
1.37 2.08 4.77 4.61 1.05 3.77
2.69 1.01 2.71 3.94 2.99 2.44
2.08 4.54 2.65 3.01 0.85 3.47
0.21 4.09 2.48 0.32 1.70 4.48
The command (setf Z (repeat #' max X Y )) assigned the maximum of the 100 pairs of X and Y to Z. 4.52 3.91 1.57 0.83 4.64 3.30 2.28
3.89 2.62 4.73 3.91 1.55 3.52 3.31
2.15 1.20 4.87 2.29 3.62 3.84 2.93
3.31 2.64 3.77 4.27 4.82 0.85 3.72 3.39 1.59 2.89 2.98 2.94 3.27.
0.08 3.77 2.99 0.74 4.20 4.36
4.30 4.42 4.55 4.26 4.24 3.92
2.99 3.59 3.78 3.39 3.95 3.34
3.62 3.55 4.56 1.63 4.64 4.80
1.51 4.71 4.79 4.61 4.04 2.48
3.66 4.86 2.09 2.73 4.07 4.62
2.83 2.96 1.80 2.87 2.11 4.16
1.37 4.00 4.77 4.61 3.54 3.93
4.20 4.88 2.71 3.94 2.99 2.44
2.08 4.54 2.65 3.01 3.04 3.47
4.62 4.09 2.48 3.32 2.90 4.91
P369463-Ch006.qxd 9/2/05 11:16 AM Page 337
6.2 Methods of Finding Point Estimates
337
d) The command (mu Z) returned z as 3.37, the average of the maximums. 3 Z 3 * 3.37 qˆ = = = 5.06 ª 5. 2 2 e) (mu (wallet 600 1000)) may return estimated (397 409), showing that each player won 2 * 600/3 ª 400. EXAMPLE 6.12
Let X1, X2, . . . , Xn be a random sample from a common density function f. Let U = max{Xi} and V = min{Xi}. Find the cumulative distribution and density functions of U and V. Solution
FU ( u ) = P (U £ u ) = P ( X1 £ u ) * P ( X 2 £ u ) * . . . * P ( X n £ u ) =
[Ú
u
-•
]
n
f ( x )dx .
[Ú
Fu¢( u ) = fU ( u ) = n
u
-•
f ( x )dx
]
n -1
* f ( u ) = n * f ( u ) * [ F ( u )]
n -1
.
FV ( v) = P ( V £ v) = P ( X1 £ v) * P ( X 2 ≥ v) * . . . * P ( X n ≥ v) =
[Ú
•
v
]
[Ú
Fv¢( v) = f v( v) = n EXAMPLE 6.13
n
f ( x )dx . •
v
f ( x )dx
]
n -1
* f ( v) = n * f ( v) * [1 - F ( v)]
n -1
.
The following data are taken from a gamma distribution with unknown parameters a and k. Find the MM estimates for a and k. Solution (setf gamma-data '(11.94 29.05 40.89 44.13 23.32 27.91 27.21 11.61 35.41 22.40 34.50 15.49 11.9 11.89 26.48 7.09 16.52 36.53 15.28 20.46 22.46 38.96 41.60 17.20 16.74 36.15 8.65 17.55 18.90 10.57)) (mu-svar gamma-data) returned (23.30 120.44), x = 23.30, and s2 = 120.44. x = aˆ / kˆ = 23.3 and s 2 = aˆ / kˆ2 = 120.5 kˆ = x / s 2 = 23.3/120.5 = 0.19, aˆ = x * kˆ = 4.4. The data were simulated from (sim-gamma 5 1/5 30) with a = 5 and k = 1/5.
Maximum Likelihood Estimators (MLE) Suppose there are q unknown black marbles in an urn containing a total of 6 marbles, from which 3 are drawn without replacement. We notice that there are 2 black marbles in the sample. Let RV X be the number of black marbles
P369463-Ch006.qxd 9/2/05 11:16 AM Page 338
338
Chapter 6 Point and Interval Estimation
that occur in a sample. What value of q would maximize the occurrence of the event that x = 2 black marbles in a sample of 3?
P ( X = 2 q = 2) =
Ê 2ˆ Ê 4ˆ Ë 2¯ Ë 1¯
=
Ê 6ˆ Ë 3¯
P ( X = 2 q = 4) =
Ê 4ˆ Ê 2ˆ Ë 2¯ Ë 1¯ Ê 6ˆ Ë 3¯
4
;
P ( X = 2 q = 3) =
Ê 3ˆ Ê 3ˆ Ë 2¯ Ë 1¯
20
=
12
=
Ê 6ˆ Ë 3¯
;
P ( X = 2 q = 5) =
Ê 5ˆ Ê1ˆ Ë 2¯ Ë1¯
20
Ê 6ˆ Ë 3¯
9
;
20
=
10
.
20
We conclude that the MLE for q is 4 since the largest probability, 12/20, occurs when q = 4. In other words, by assuming that q = 4, we get the maximum probability of exactly 2 black marbles occurring in a sample size of 3, that probability being 12/20. This is essentially the idea behind the MLE. We regard each RV Xi from the random sample X1, X2, . . . , Xn to be from identical distributions and use the terms independent and identically distributed (iid) to indicate such. Since the samples are independent, the product of the marginal densities is the joint density, also called the likelihood function. L( x i q ) = f ( x1, q ) * f ( x2 , q ) * . . . * f ( x n , q ) n
= ’ f ( xi q ) i =1
= P[ X1 = x1, X 2 = x2 , . . . , X n = x n ).
(6–6)
After the xi data are collected, L is a function of only q. We seek the q that maximizes the joint density function. For discrete distributions this maximization is equivalent to maximizing the probability of occurrence P(Xi = xi) of the sample. It is often easier to take the log of the likelihood function before attempting to find the critical points where the derivative vanishes. That is, the function and the log of the function have the same critical points. For example, consider y = f(x) = 2x3 - 3x2 -36x + 10. 1) y¢ = 6x2 - 6x - 36 = 0 when x2 - x - 6 = 0 or when (x - 3)(x + 2) = 0. Critical values at x = -2, 3. d d 6 x 2 - 6 x - 36 2) = 0 when [ Lny] = [ Ln(2x 3 - 3 x 2 - 36 x + 10)] = 3 dx dx 2x - 3 x 2 - 36 x + 10 x2 - x - 6 = 0. Some properties of logarithms often used to simplify the log of the likelihood function before differentiating are:
P369463-Ch006.qxd 9/2/05 11:16 AM Page 339
6.2 Methods of Finding Point Estimates
i. ii. iii. iv. v.
Ln Ln Ln Ln Ln
339
ab = Ln a + Ln b; a/b = Ln a - Ln b; ab = b Ln a; e = 1; 1 = 0,
where the natural log to the base e is denoted by Ln and is the inverse of the exponential ex. The basic relationship between logs and exponents is Log base number = exponent ¤ Base exponent = number. Log 2 32 = 5 = log 2 25 = 5 Log 2 2 fi 25 = 32. To find the maximum of a function f(x) on an interval [a, b], compare f(a) with f(b) with f evaluated at each of the critical x-values where the first derivative is zero (implying a horizontal tangent) or does not exist (corner point). The largest of these values indicates where the independent variable X assumes the maximum. EXAMPLE 6.14
Find the MLE for q given the exponential density f(x; q) = qe-qx and compare with the MME computed in Example 6.7. Solution n
x -q L( x i ; q ) = ’ q e -qxi = q n e  i (we seek the q that maximizes L); i =1 n
Ln[ L( x i ; q )] = n Lnq - q  x i
(taking the natural log of both sides);
i =1
dLn[ L( x i , q )] dq
=
n q
n
- Â xi = 0
(taking and setting the derivative to zero);
i =1
qˆ = 1/ x
(solving for the qˆ that maximizes the function).
The MLE for the exponential parameter is the same as the MME. Note also that the second derivative is -n/q 2, implying that the value is a relative maximum. EXAMPLE 6.15
Find the MLE for a Poisson parameter from a random sample given by X1, X2, . . . , Xn.
Solution
L( x i ; q ) =
e -q q x1
*
x1!
e -q q x2 x2 !
*. . .*
e -q q x n
=
e - nq q Â
xi
’x !
xn !
,
i
Ln L( x i ; q ) = - nq + S x i * Lnq - S Ln x i !, dLn[ L( x i , q )] dq
= -n +
Âx q
i
= 0 when qˆ =
Âx n
i
= X.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 340
340
EXAMPLE 6.16
Chapter 6 Point and Interval Estimation
Given density function f(x) = (q + 1)xq on [0, 1], a) find the MME and the MLE of q; b) use software commands to simulate estimates for q = 2. Solution a) MME: E( X ) =
1
Ú (q + 1)x
q +1
0
dx =
(q + 1) x q + 2 1 q + 1 = from which 0 q +2 q +2
2x - 1 qˆ = . 1- x MLE: L( x i q ) = (q + 1)n X1q X 2q . . . X nq Ln L( x i , q ) = n Ln(q + 1) + q S Ln X i n +  Lnx -n +  LnX i = 0 when qˆ = = - 1. dq q +1 - Lnx  Lnx b) Simulate 100 samples from the density when q = 2, f(x) = 3x2 on [0, 1]. RV U = X3 = F(X ) = > X = U1/3 and U is continuous uniform on [0, 1]. d Ln[ L( x i , q )]
=
n
1. (setf U (sim-uniform 0 1 100)) returns 100 samples from U on [0, 1]. 0.33 0.86 0.85 0.57 0.84 0.21 0.16
0.39 0.01 0.83 0.35 0.45 0.80 0.30
0.41 0.40 0.84 0.22 0.95 0.04 0.77
0.30 0.73 0.25 0.73 0.47 0.84 0.42 0.77 0.86 0.44 0.71 0.18 0.06.
2. (setf X (repeat 100 samples. 0.69 0.95 0.94 0.83 0.94 0.60 0.55
0.73 0.26 0.94 0.71 0.77 0.93 0.67
0.74 0.73 0.94 0.61 0.98 0.34 0.91
#'
0.08 0.68 0.00 0.51 0.02 0.11
0.12 0.95 0.20 0.73 0.39 0.64
0.47 0.72 0.74 0.12 0.57 0.83
0.54 0.59 0.01 0.47 0.24 0.44
0.13 0.53 0.39 0.60 0.12 0.45
0.04 0.66 0.81 0.36 0.92 0.56
0.76 0.32 0.25 0.41 0.41 0.51
0.67 0.23 0.51 0.57 0.24 0.39
0.29 0.50 0.74 0.98 0.72 0.51
0.38 0.45 0.50 0.44 0.46 0.75
0.03 0.76 0.99 0.77 0.21 0.44
cube-root U)) returns the cube root of each of the
0.67 0.90 0.63 0.90 0.77 0.94 0.75 0.92 0.95 0.76 0.89 0.56 0.39.
0.44 0.88 0.19 0.80 0.27 0.49
0.49 0.98 0.58 0.90 0.73 0.86
0.78 0.89 0.90 0.50 0.82 0.94
0.81 0.84 0.26 0.77 0.62 0.76
0.51 0.81 0.73 0.84 0.50 0.76
0.35 0.87 0.93 0.71 0.97 0.82
0.91 0.68 0.63 0.74 0.74 0.80
0.87 0.61 0.80 0.82 0.62 0.73
0.66 0.79 0.90 0.99 0.89 0.80
0.72 0.76 0.79 0.76 0.77 0.91
3. (mu X) returns 0.74 = x, from which 2x - 1 2 * 0.74 - 1 = = 1.85 ª 2. qˆMME = 1- x 1 - 0.74 4. (sum (repeat #' Log X)) returns -33.37 = SLn Xi, from which n + Â Lnx 100 - 33.37 qˆMLE = = 2. -Â Lnx 33.37
0.34 0.91 0.99 0.92 0.59 0.76
P369463-Ch006.qxd 9/2/05 11:16 AM Page 341
6.2 Methods of Finding Point Estimates
341
The command (MMvsML q-range n m) returns m trials using n random samples with a random value of q chosen from the integer range [0, q-range - 1] for density f(x) = (q + 1)xq on [0, 1] and compares the MME with the MLE by tallying the results. (MMvsML 20 30 15) returned
q 15 7 14 6 11 10 2 3 0 12 13 16 17 9 16
EXAMPLE 6.17
Method of Moments
Maximum Likelihood
WINNER
15.253 6.588 14.939 5.871 10.946 9.717 1.913 3.456 -0.057 11.950 12.497 14.586 18.296 9.124 19.191
15.491 6.524 14.907 5.880 11.017 9.723 2.034 3.477 0.009 11.893 12.510 14.381 18.309 9.182 19.024
MME MME MLE MLE MLE MLE MLE MME MLE MME MLE MME MME MME MLE
MME WINS = 7
MLE WINS = 8
For a continuous uniform RV X on [0,q], a) find the MLE, b) compare the MLE with the MME in regards to their relative efficiency. Solution The likelihood function L(xi; q) = 1/q for 0 £ xi £ q. We seek the q value that maximizes L. Checking the end values on the closed interval [0 q] shows that L(0) = L(q) = 1/q, and taking a derivative does not produce any critical x values. L grows larger with smaller values of q, and q must be at least as large as each Xi in the random sample X1, X2, . . . , Xn. To maximize L, make q as small as possible subject to the constraint that q must be at least as large as every Xi. Thus, qˆ = max{ X i }. Recall from Example 6.10 that the MME for q is 2 X and that the estimator is unbiased. The MLE for q is max {Xi} and cannot be unbiased since we do not expect the maximum of a sample to equal the maximum of a population.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 342
342
Chapter 6 Point and Interval Estimation
To determine the more efficient estimator, we compute the mean square error (MSE) of each. Since the MME 2 X is unbiased, its MSE is just the variance, that is, 4s 2
MSE(2 X ) = V (2 X ) =
=
4(q 2 /12)
n
n
=
q2
.
3n
Designate the MLE estimator Z, the max {Xi}. Then FZ ( z ) = P ( Z £ z ) = P (each X i £ z ) n
n
È z dx ˘ Ê zˆ = ÍÚ = . Ëq¯ Î 0 q ˚˙ Differentiating the cumulative distribution FZ produces the density functon nz n -1
f Z( z) =
qn
, 0 £ z £ q,
from which
Ú
E( Z ) =
q
0
nz n q
n
dz =
nz n +1
nq q = . ( n + 1)q 0 n + 1 n
Similarly, E( Z 2 ) =
q
nz n +1
0
qn
Ú
dz =
nz n + 2
nq 2 q = ( n + 2)q n 0 ( n + 2)
and V( Z) =
nq 2 ( n + 1)2 ( n + 2)
The bias B = E( Z ) - q =
with B 2 =
.
nq n +1
-q = -
q n +1
q2 ( n + 1)2
MSE( Z ) = V ( Z ) + B 2 nq 2 q2 2q 2 q2 = + = £ , ( n + 1)2 ( n + 2) ( n + 1)2 ( n + 1)( n + 2) 3 n indicates that the MSE( Z ) = max{ X i } =
2q 2
£
q2
= MSE(2 X ), ( n + 1)( n + 2) 3 n and the biased MLE is more efficient than the unbiased MM estimator.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 343
6.2 Methods of Finding Point Estimates
343
The command (MME-vs-MLE q-range ss n) compares the MME with the MLE with density f(x) = 1/q on [0, q] for various values of q, showing which estimator is closer to q, the MME of 2 x or the MLE max{Xi}, and tallies the results. The value of q is randomly chosen from the interval 1 to q-range for n trials of sample size ss. For example, (MME-vsMLE 25 100 20) returned the following.
EXAMPLE 6.18
q
MME
MLE
Winner
5 5 14 22 8 11 18 15 12 17 12 7 17 5 7 25 14 3 6 3
21.980 25.748 26.190 25.227 26.487 22.517 25.075 22.661 25.845 24.649 24.833 24.970 23.238 25.170 23.701 25.103 26.011 25.498 26.005 23.350
24.853 24.732 24.818 24.847 24.753 24.702 24.766 24.672 24.597 24.589 24.825 24.904 24.971 24.932 24.360 24.971 24.893 24.654 24.562 24.604
MME MLE MLE MLE MLE MME MLE MME MLE MLE MLE MLE MME MLE MME MLE MLE MLE MLE MME
MME wins 6
MLE wins 14
For a random sample from the shifted exponential density function f(x; k, q) = ke-k(x-q) for x ≥ q, a) find the MME for q and k and compute each estimate from the data; b) find the MLE for q and k and compute each estimate from the data. Solution 0.87))
(setf data '(0.85 0.80 0.85 0.75 0.89 0.57 0.74 0.86 0.95
a) MME: Since there are two parameters, we need two moment equations. • È - xe Ú0 kxe - k( x -q )dx = ke kq Ú0 xe - kx dx = ke kq ÍÎ k = q + 1/ k and thus X = M1 = q + 1/ kˆ.
E( X ) =
•
- kx
• 1 • - kx ˘ + Ú e dx ˙, ˚ 0 k q
P369463-Ch006.qxd 9/2/05 11:16 AM Page 344
344
Chapter 6 Point and Interval Estimation •
•
0
q
E( X 2 ) = ke kq Ú x 2 e - kx dx = ke kq {- x 2 e - kx / k} + 2/ k * ke kq Ú xe - kx dx = q 2 + 2q / k + 2q / k 2 and thus M 2 = qˆ2 + 2qˆ/ kˆ + 2/ kˆ2 . 1) qˆ + 1/ kˆ = M1 fi kˆ =
1 M1 - qˆ
.
2) qˆ2 + 2qˆ/ kˆ + 2/ kˆ2 = M 2 fi qˆ2 + 2qˆ( M1 - qˆ) + 2( M1 - qˆ)2 = M 2 ; qˆ2 - 2 M1qˆ + 2 M12 - M 2 = 0. qˆ2 = M1 ± M 2 - M12 and kˆ =
1 M 2 - M12
.
n
Âx M1 =
i
i =1
= (sample-moment 1 data) Æ 0.813,
n n
Âx M2 =
2 i
i =1
= (sample-moment 2 data) Æ 0.6711,
n
qˆ = M1 ± M 2 - M12 = 0.9137 and kˆ =
1
= 9.93.
M 2 - M12
b) MLE: L( x i ; k, q ) = k n * e - kSx * e nkq L is maximized by the largest q subject to the constraint that each xi ≥ q, which implies qˆ = min{x i }. Ln L = n Lnk - k Sx i + nkq; ∂ ln L ∂k
=
n k
- Sx i + nq = 0 when kˆ =
1 x - qˆ
.
From the data, qˆ = min{x i } = 0.57 and kˆ =
1 0.813 - 0.57
= 4.1152.
Notice that the MME and the MLE for k are identical in the form kˆ = but that for q the estimators differ.
1 x - qˆ
P369463-Ch006.qxd 9/2/05 11:16 AM Page 345
6.2 Methods of Finding Point Estimates
EXAMPLE 6.19
345
Find the MLE for parameters m and s 2 from a random sample X1, X2, . . . , Xn taken from the normal distribution N(m, s 2). Solution The normal density function is 1
f ( x) =
2 /2 s 2
e -( x - m )
, -• < x < •.
2p s The joint density is the product of the marginal densities. n
1
L( x; m, s ) = ’ 2
2p s
i =1
Ln L = -
n 2
(2ps 2 )n / 2
e
2
2s 2
.
n
=Â
∂m
=
1
Â
Ln(2ps 2 ) - Â ( x i -m )2 /2s 2 .
n
∂ Ln L
e
- ( xi -m )2 /2s 2
Ê n ˆ -Á xi -m ˜ Ë i =1 ¯
i =1
2( x i - m ) 2s 2
i =1
= 0 when
Sx i = nmˆ or when mˆ = x . n
∂ Ln L
=
∂s 2
-n 2s 2
Â(x +
i
- m )2
i =1
2(s 2 )2
= 0 when
n
Â(x sˆ 2 =
i
- x )2
i =1
. n
The MLE for s 2 is the same as the MME and is similarly biased.
EXAMPLE 6.20
Given density function f(x) = 1/2(1 + qx) for -1 £ x £ 1, 0 £ q £ 1, a) find the MME qˆ; b) compute V(X ), V( X), and V(qˆ); and c) find P(|qˆ| > 1/3) when q = 0 and n = 30. Solution 1 È x 2 qx 3 ˘ 1 q + = . Thus qˆ = 3 X . Í ˙ -1 2Î 2 3 ˚ -1 3 3 1 1 1 Èx qx 4 ˘ 1 1 b) E( X 2 ) = Ú ( x + qx 2 )dx = Í + = . ˙ -1 Î ˚ 2 2 3 4 -1 3 a) E( X ) =
1
1
Ú 2
( x + qx 2 )dx =
P369463-Ch006.qxd 9/2/05 11:16 AM Page 346
346
Chapter 6 Point and Interval Estimation 2
3 -q2 Êqˆ V( X ) = = ; 3 Ë 3¯ 9 1
V( X ) =
s2
=
3 -q2
;
9n
n
3 -q2 V (qˆ) = V (3 X ) = 9V ( X ) = . n c) Given q = 0 and n = 30 (large sample size), with qˆ ~ N(0, 3/30), when q = 0, P(|qˆ| > 1/3) = 1 - P(-1/3 < qˆ < 1/3) = 0.2918. EXAMPLE 6.21
Find MM and ML estimators for a and b from the continuous uniform on [a, b]. Solution MME: E(X ) = (b + a)/2 fi M1 = ( bˆ + aˆ)/2 E( X 2 ) = ( b - a )2 /12 + ( b + a )2 /4 fi M 2 = ( bˆ - aˆ )2 /12 + M12 bˆ + aˆ = 2 M1 bˆ - aˆ = 2 3( M 2 - M12 ) bˆ = M1 + 3( M 2 - M12 ) aˆ = M1 - 3( M 2 - M12 ) MLE: aˆ = Min{ X i } and bˆ = Max{ X i } Command (UabMMML a b sample-size n) returns n MM and ML estimates for aˆ and bˆ and indicates which ones are closer to a and to b. For example, (UabMMML 5 10 30 15) returned the following. METHOD OF MOMENTS A-HAT
B-HAT
5.138 9.722 5.149 10.293 5.138 9.613 4.651 9.734 4.793 9.477 4.476 10.457 4.955 10.189 4.702 10.600 4.763 9.976 5.279 10.478 4.451 10.002 5.227 9.189 5.722 10.083 5.087 10.494 5.145 10.026 MME wins 10
MAXIMUM LIKELIHOOD A-HAT 5.207 5.107 5.108 5.139 5.076 5.024 5.114 5.245 5.161 5.347 5.131 5.165 5.368 5.114 5.223 MLE wins
WINNER
B-HAT
A-HAT
B-HAT
9.978 9.794 9.676 9.955 9.617 9.997 9.913 9.890 9.741 9.868 9.776 8.998 9.750 9.664 9.787 20
MME MLE MLE MLE MLE MLE MME MLE MLE MME MLE MLE MLE MME MME
MLE MLE MLE MLE MLE MLE MLE MLE MME MLE MME MME MME MLE MME
P369463-Ch006.qxd 9/2/05 11:16 AM Page 347
6.3 Interval Estimates (Confidence Intervals)
6.3
347
Interval Estimates (Confidence Intervals) The Central Limit Theorem allows us to make probability statements about the mean of the sampling distributions of means for large samples (n ≥ 30). When X is the mean of a random sample taken from any population with mean m and variance s 2, then X tends to N(m, s 2/n) and Z=
X -m s
tends to N (0, 1) as n Æ •.
n
When we take a random sample X1, X2 . . . , Xn from a normal distribution with an unknown mean m but with s 2 known, we can quantify the closeness of X to m. The point estimate is x, but we now seek an interval about the estimator X that contains m with a desired degree of confidence. X -m Consider the unit normal RV Z = . s n The probability that z0.025 £ Z £ z0.975 is the same as the probability that z0.025 £
X -m s
£ z0.975 .
n
That is, P ( z0.025 £ Z £ z0.975 ) = P ( -1.96 £
X -m s
£ 1.96) = 0.95.
n
We can express this probability as an interval about m to get specifically P ( X - 1.96s / n £ m £ X + 1.96s / n ) = 0.95 or generally P ( X - za /2s / n £ m £ X + za /2s / n ) = 1 - a . Note that we have a probability statement about an unknown parameter m with random variables on the left and right side of the inequality, as X is an RV. Regarding the endpoints enclosing the interval, we have z1-a /2s za /2s ˆ Ê x,x+ or, equivalently, ( x ± za /2s / n ). Ë n n ¯ We call this interval a confidence interval when X is evaluated and say that we expect m to be inside this interval 100(1 - a) times out of 100 (see Figure 6.1). That is, with m unknown, we gather data from the distribution and compute x. We then state with the appropriate confidence that we expect m to be inside the interval.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 348
348
Chapter 6 Point and Interval Estimation
95% z0.025
Figure 6.1
0
z0.975
95% Confidence Interval
If we desire more confidence, say 99%, we must include a larger interval, thus sacrificing precision. A confidence interval around x says that m Œ x ± za /2s / n with 100(1 - a)% confidence. EXAMPLE 6.22
a) Find 95% and 99% confidence intervals for data from which x = 20.3, s = 4, and n = 49. b) Find the confidence for the interval (19.3, 21.3). Solution The large size of n (49 ≥ 30) allows us to assume that the sampling distribution of X is asymptotically normal. a) z0.025 = -1.96; z0.975 = 1.96 m Œ x ± z0.975s / n m Œ 2.06 ± 1.96 * 4/ 7 or m Œ(19.18, 21.42), with 95% confidence. z0.005 = -2.58; z0.995 = 2.58 m Œ 20.3 ± 2.58 * 4/ 7 or m Œ(18.83, 21.77), with 99% confidence. Notice the widening of the interval (less precision) for the greater confidence. b) The midpoint is x = 20.3 and 20.3 - 19.3 = 1 = |za/2| * 4/7 fi za/2 = 7/4 = 1.75. (phi 1.75) Æ 0.96 fi a/2 = 4% fi a = 8% fi (100 - a) = 92% confidence interval.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 349
6.3 Interval Estimates (Confidence Intervals)
349
The command (mu-Z-ci x s n a) returns a (1 - a)% confidence interval for m. (mu-Z-ci 20.3 4 49 5) Æ (19.18, 21.42) with 95% confidence. (mu-Z-ci 20.3 4 49 1) Æ (18.83, 21.77) with 99% confidence.
EXAMPLE 6.23
a) Simulate 100 95% confidence intervals by sampling from the normal distribution N(m = 5, s 2 = 4). b) Compute the length of each of these intervals. Solution
a) (sim-nci m s ss n a) returns n (100 - a)% simulated confidence intervals, using a sample size of ss. For example, (sim-nci 5 2 100 100 5) generated the following 100 95% confidence intervals of length 0.784. The 5 starred intervals do not contain the parameter m. ((4.816 (4.605 (4.794 (4.782 (4.726 (4.871 (4.707 (4.279 (4.516 (4.532 (4.625 (4.632 (4.348 (4.620 (4.588 (4.873 (4.466 (4.600 (4.457 (4.343 (4.711 (4.889 (4.973 (4.532 (4.549
5.600) 5.389) 5.578) 5.566) 5.510) 5.655) 5.492) 5.063) 5.300) 5.316) 5.409) 5.416) 5.133) 5.404) 5.372) 5.657) 5.250) 5.384) 5.241) 5.127) 5.495) 5.673) 5.757) 5.316) 5.333)
(4.635 (4.646 (4.423 (4.121 (4.652 (4.538 (4.554 (4.524 (4.353 (4.671 (4.812 (4.503 (4.568 (4.199 (4.729 (4.477 (4.585 (4.612 (4.341 (4.841 (4.979 (4.720 (4.645 (4.584 (4.566
5.419) 5.430) 5.207) 4.905)* 5.436) 5.322) 5.338) 5.308) 5.137) 5.455) 5.596) 5.287) 5.352)* 4.983) 5.513) 5.261) 5.369) 5.397) 5.125) 5.625) 5.763) 5.504) 5.429) 5.369) 5.351)
(4.341 (4.956 (4.547 (4.612 (4.372 (4.831 (4.808 (4.677 (4.389 (4.411 (5.242 (4.609 (4.881 (4.369 (4.403 (4.763 (4.413 (4.921 (4.755 (4.514 (4.216 (4.587 (4.891 (4.553 (4.629
5.125) 5.740) 5.331) 5.396) 5.156) 5.615) 5.592) 5.461) 5.173) 5.195) 6.026) 5.393) 5.665) 5.153) 5.187) 5.547) 5.197) 5.705) 5.539) 5.298) 5.000) 5.371) 5.676) 5.337) 5.413)
(4.611 (4.749 (4.618 (4.648 (4.689 (4.818 (4.517 (4.183 (4.928 (5.151 (4.170 (4.699 (4.475 (4.974 (4.705 (4.372 (4.832 (4.443 (4.592 (4.827 (4.572 (4.371 (4.996 (4.759 (4.384
5.395) 5.533) 5.402) 5.432) 5.473) 5.602) 5.302) 4.968)* 5.712) 5.935)* 4.954)* 5.484) 5.259) 5.758) 5.489) 5.156) 5.616) 5.228) 5.376) 5.611) 5.356) 5.155) 5.780) 5.543) 5.168))
P369463-Ch006.qxd 9/2/05 11:16 AM Page 350
350
Chapter 6 Point and Interval Estimation
b) The length of each interval is 2 * za /2s / n = 2 * 1.96 * 2/10 = 0.784. The command (sim-plot-ci m s n m a) plots the m(100 - a)% confidence intervals for a sample size of n. Notice that changing the sample size n from 15 to 35 in the top 2 displays resulted in smaller confidence intervals, while changing a from 10% to 1% in the bottom 2 displays resulted in longer confidence intervals. The dotted vertical line is the mean 15 of the distribution. (sim-plot-ci 15 20 15 5 10) Interval Length = 16.9916706, m = 15, s = 20, n = 15 90% Conf Intervals m = 15 _________________ _________________ _________________ _________________ ______________
(0.9122595, (2.1381694, (6.8590531, (5.1267114, (-2.764294,
17.9039301) 19.1298401) 23.8507238) 22.1183821) 14.2273765)
(sim-plot-ci 15 20 35 5 10) Interval Length = 11.1236595, m = 15, s = 20, n = 35 90% Conf Intervals ____________ _____________ _____________ ____________ ____________
(6.6173535, (7.1033287, (5.2398832, (13.519462, (9.5946313,
17.7410130) 18.2269883) 16.3635428) 24.6431221) 20.7182909)
(sim-plot-ci 15 20 35 5 1) Interval Length = 17.4185350, m = 15, s = 20, n = 35 99% Conf Intervals __________________ __________________ __________________ __________________ __________________
(-1.5013761, (6.8253402, (9.1140984, (4.3726454, (6.5834725,
15.9171589) 24.2438753) 26.5326335) 21.7911805) 24.0020075)
P369463-Ch006.qxd 9/2/05 11:16 AM Page 351
6.3 Interval Estimates (Confidence Intervals)
351
Trade-Off: Sample Size Which is better, a 95% confidence interval or a 99% confidence interval? Although in general a 99% confidence interval is to be preferred, we cannot say that a 99% confidence interval is always to be preferred over a 95% confidence interval. The trade-off is between precision and confidence of the estimate. The maximum error za /2s / n associated with a confidence interval is L/2, where L is the length of the interval. L=
2 za /2 s
(6–7)
n Observe that to seek smaller L implies larger a (smaller za/2 with less precision), smaller s, and larger n. Solving for n allows us to specify the sample size for the desired interval length or precision. n=
Ê 2 za /2 s ˆ Ë L ¯
2
(6–8)
Recall that V( X) = s 2/n so that the standard error s x becomes smaller as n increases. But as n increases—that is, there are more data—the associated costs of data-gathering increase as well.
EXAMPLE 6.24
When sampling from a N(5, 16), a) find n for a 90% confidence interval of length 1/2; b) repeat for a length of 1/4; c) repeat a) for a 99% confidence interval. Solution 2
a) n =
2
Ê 2za /2s ˆ Ê 2 * 1.645 * 4 ˆ = = 692.74 ª 693. Ë ¯ Ë L ¯ 1/2
b) For L = 1/4, smaller error and greater precision call for increased n. n = (32 * 1.645)2 = 2771 ª 4 * 692.74. To halve the error is to quadruple the sample size to maintain the same confidence. 2
Ê 2 * 2.576 * 4 ˆ c) n = ª 1699. Ë ¯ 1/2 The only way to increase confidence (reliability) with smaller error (precision) is to increase the sample size.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 352
Chapter 6 Point and Interval Estimation
352
Confidence Interval When s Is Not Known Suppose that neither m nor s is known. If the sample size from a normal distribution is large, for example, n ≥ 30, we can invoke the Central Limit Theorem to assume that the sampling distribution of the mean is asymptotically normal and use X and S2 as unbiased estimators for m and s 2. The sample size applicable for the Central Limit Theorem is a function of the skewness of the population. A symmetrical population requires less than 30 samples, but heavily skewed populations like the exponential require more than 30 samples. A confidence interval for m with unknown s from large samples may be expressed as x ± za/2 * S / n . If the sample size from a normal distribution is small (less than 30), then we can use the t-statistic. t=
X -m
.
(6–9)
S/ n We then say that m Œ x ± tn -1,a /2 s/ n , with 100(1 - a) percent confidence at n - 1 degrees of freedom. A degree of freedom is lost since s is used as an estimate for s. EXAMPLE 6.25
Find a) a 99% confidence interval for the mean of a normal distribution with unknown variance, given that x = 5.5, n = 49, and s2 = 5.76; and b) an exact 99% confidence interval using the t-statistic. Solution Since the sample size is considered large, the 99% confidence interval is given by a) x ± z0.005 * s/ n = 5.5 ± 2.576 * 2.4/7 = 5.5 ± 0.883 or m Œ (4.62, 6.38). (mu-Z- ci 5.5 (sqrt 5.76) 49 1) Æ (4.62, 6.38) with 99% confidence. b) x ± t48,0.005 * s/ n = 5.5 ± 2.683 * 2.4/7 = 5.5 ± 0.920 or m Œ (4.58, 6.42). (mu-T- ci 5.5 (sqrt 5.76) 49 1) Æ (4.58, 6.42) with 99% confidence.
EXAMPLE 6.26
a) Find a 95% confidence interval for the mean of a normal distribution with unknown variance based on the 20 samples assigned to data. b) Find the confidence for the t-interval (3.738, 6.962) about m. Solution
(setf data '(7 9 3 2 3 8 4 6 2 6 4 3 8 3 2 7 9 5 8 8)) (mu data) Æ 5.35 = x; (std-err data) Æ 2.52 = s.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 353
6.3 Interval Estimates (Confidence Intervals)
353
a) Since the sample size is less than 30 and s is unknown, we use a tconfidence interval for m. x ± tn -1,a /2 * s/ n = 5.35 ± (2.094 * 2.52/4.47) or m Œ(4.17, 6.53), with 95% confidence. The t-value for 19 degrees of freedom and a/2 = 0.025 is 2.0935 from Table 4 in Appendix B or from the command (Inv-t 19 2.5). b) The midpoint x = 5.35 and the error is 5.35 - 3.738 = 1.612 = t19,a/2*2.52/4.47. t19,a/2 = 2.861 fi a/2 = (m-tee 19 2.861) Æ 0.005 fi a = 0.010 fi 99% confidence.
Confidence Interval for the Difference between Two Means (m1 - m2) In establishing a confidence interval for the difference between two means, it is helpful to think in terms of two independent random variables X1 and X2. If the sample sizes are large, the Central Limit Theorem paves the way. If X1 and X2 are normal RVs, then so is X1 - X2, with m = E( X1 - X 2 ) = m1 - m 2 and s 2 = V ( X1 - X 2 ) =
s 12
+
s 22
n1
n2
for independent samples n1 and n2. Then a 100(1 - a)% confidence interval for m1 - m2 is s 12
m1 - m 2 Œ ( x1 - x2 ) ± za /2
+
n1
s 22
(6–10)
.
n2
If the samples are large and the variances are unknown, the sample variances may be used for the population variances. That is, for large samples, a 100(1 - a)% confidence interval for m1 - m2 is ( x1 - x2 ) ± za /2
s12 n1
+
s22
.
n2
If the sample sizes are small but are assumed to have been drawn from independently different normal distributions with the same but unknown variance, the pooled sample variance may be used: m1 - m 2 Œ ( x1 - x2 ) ± tn1+n 2-2,a /2 * spooled * with 100(1 - a)% confidence,
1 n1
+
1 n2
P369463-Ch006.qxd 9/2/05 11:16 AM Page 354
354
Chapter 6 Point and Interval Estimation
where 2 = SPooled
( n1 - 1) S12 + ( n2 - 1) S22 n1 + n2 - 2
(6–11)
.
Notice the similarity in sampling from normal distributions. E( X1 - X2) = m1 - m2, and with the variance unknown, each sample variance is an unbiased estimator for s 2. Thus the pooled sample variance is the best choice for the population variance. The t-distribution is used instead of the normal distribution because of the small sample sizes.
EXAMPLE 6.27
2 a) Find a 95% confidence interval for m1 - m2 with x1 = 10, x2 = 9, s 1 = 9, 2 s 2 = 4, and n1 = n2 = 100 when sampling from two independent normal distributions. b) Find a 95% confidence interval for m1 - m2 with x1 = 10, x2 = 9, s12 = 9, s22 = 4, and n1 = n2 = 100 when sampling from two independent normal distributions. c) Find a 95% confidence interval for m1 - m2 with x1 = 10, x2 = 9, s12 = 9, s22 = 4, n1 = n2 = 10 in sampling from two independent normal distributions with the same but unknown variance.
Solution a) m1 - m 2 Œ ( x1 - x2 ) ± za /2
s 12 n1
+
s 22
.
n2
m1 - m 2 Œ (10 - 9) ± 1.96 *
9
+
100
4
= 1 ± 0.707.
100
b) The confidence interval remains essentially the same as in the first case since the large sample size permits s2 to be used for s 2. m1 - m 2 Œ (10 - 9) ± 1.96 *
9
+
100
4
= 1 ± 0.707.
100
c) The pooled t confidence interval is appropriate. m1 - m 2 Œ ( x1 - x2 ) ± tn1+ n 2 - 2,a /2 * spooled *
1 n1
+
1 n2
m1 - m 2 Œ (10 - 9) ± 2.101 * 2.55 * 0.2 = 1 + 2.4, 2 where spooled =
(10 - 1)(9 + 4)
= 6.5.
18
When the sample sizes are the same, the pooled variance is merely the average of the variances, that is, equal weights.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 355
6.3 Interval Estimates (Confidence Intervals)
355
Confidence Interval for s 2 of a Normal Distribution Recall that
( n - 1) S 2 2
is chi-square (x 2 RV) with v = n - 1 degrees of freedom.
s A 100(1 - a)% confidence interval for s 2 is ( n - 1) S 2 c n2 -1,1-a /2
£s2 £
( n - 1) S 2 c n2 -1,a /2
,
from which s2 Œ
EXAMPLE 6.28
2 2 Ê ( n - 1) S ( n - 1) S ˆ , with 100(1 - a )% confidence. Ë c n2 -1,1-a /2 c n2 -1,a /2 ¯
Find a 95% confidence interval for s 2 and s, given the following random sample of size 10 from a normal population: 22 34 23 25 30 26 27 25 27 29. Solution
(svar '(22 34 23 25 30 26 27 25 27 29)) Æ 12.4 = s2.
Entering the chi-square table in the Appendix with v = 9 degrees of freedom, the chi-square UPPER tail values for a = 0.975 and a = 0.025 are 2.70 and 19.0, respectively. The commands (inv-chi-sq 9 97.5) and (inv-chi-sq 9 2.5) return 19.02 and 2.674, respectively. With s 2 = 12.4, s 2 Œ s2 Œ
2 2 Ê ( n - 1) S ( n - 1) S ˆ , Ë c n2 -1,1-a /2 c n2 -1,a /2 ¯
Ê (10 - 1)12.4 (10 - 1)12.4 ˆ , ª (5.87, 41.33) Ë ¯ 19.0 2.7
s Œ(2.42, 6.43).
Confidence Interval for a Proportion To establish a confidence interval for a proportion p, we can first estimate X ˆ = , where RV X is the number p with the sample proportion statistic p n X ˆ ) = EÊ ˆ = of successes from n Bernoulli trials. The expected value E( p Ë n¯ np X is an unbiased estimator for p. = p shows that n n
P369463-Ch006.qxd 9/2/05 11:16 AM Page 356
356
Chapter 6 Point and Interval Estimation
ˆ) = V V( p
Ê X ˆ npq pq = = . Ë n¯ n2 n
Thus for large sample sizes the Central Limit Theorem can be invoked to X - np X /n - p show that or equivalently asymptotically approaches the unit npq pq/ n normal distribution as n gets large. A 100(1 - a)% confidence interval for p is given by ˆˆ pq
ˆ ± za /2 pŒp
.
n
For small samples, the confidence interval estimates for proportions can be fraught with peril. Proportions close to 0 and 1 can lead to erratic confidence intervals regardless of sample size. ˆ is computed as Also, when the population size is small, the variance of p s 2pˆ = EXAMPLE 6.29
ˆ(1 - p ˆ) p n
*
N-n N -1
.
It is reported that 300 of 500 subjects have been helped with a new drug. Find a 99% confidence interval for p, the proportion of those people helped.
Solution
ˆ ± za /2 pŒp
ˆˆ pq
fi p Œ 0.6 ± 2.576 *
0.6 * 0.4
n 500 fi p Œ (054, 0.66) with 99% confidence. The command (cip p-hat n a) returns a (100 - a)% confidence interval for p. For example, (cip 300/500 500 0.01) returns (0.544 0.656).
EXAMPLE 6.30
In a sampling yes/no poll the sampling error is reported to be ±3%. State the assumptions and find the sample size. Solution The Bernoulli density f(x) = px(1 - p)1-x for x = 0, 1 has E(X ) = p and E(X2) = p. Thus V(X ) = p(1 - p) = p - p2 for 0 £ p £ 1. Let function g(p) = p - p2 for 0 £ p £ 1 with g(0) = g(1) = 0. Then g¢(p) = 1 - 2p = 0 when p = 1/2 and g≤(p) < 0 fi absolute maximum for p = 1/2.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 357
6.3 Interval Estimates (Confidence Intervals)
357
Assuming a = 5%, length L = 2 * 0.03 = 0.06, and for a 95% confidence interval, the sample size is 2
2
2 Ê 1.96 p(1 - p) ˆ Ê 19.6 0.5 * 0.5 ˆ Ê za/2s ˆ n= = = Ë L /2 ¯ Ë ¯ Ë ¯ 0.03 L /2 = 1067.11 ª 1068.
2
Confidence Interval for the Difference between Two Proportions For large samples the sampling distribution of p for a binomial distribution ˆ 1 and p ˆ 2 are the sample proportions of two large is approximately normal. If p ˆ1 - p ˆ 2) = p1 - p2 and an estimate random samples of size n1 and n2, then E( p ˆ ˆ ˆ ˆ pq pq ˆ1 - p ˆ2 ) = 1 1 + 2 2 . for V ( p n1 n2 A 100(1 - a)% confidence interval for p1 - p2 is given by ˆ1 - p ˆ 2 ) ± za /2 p1 - p2 Œ ( p
EXAMPLE 6.31
ˆ1qˆ1 p
+
ˆ 2 qˆ2 p
n1
(6–12)
.
n2
In an election 250 of 300 voted for A and 400 of 600 voted for B. a) Find a 95% confidence interval for the difference in the proportions. b) Find an upper 95% confidence limit for the difference in the proportions from these data: 22 34 23 25 30 26 27 25 27 29. Solution ˆ1 = a) p
250 300
ˆ2 = = 0.833; p
400
= 0.667.
600
ˆ1 - p ˆ 2 ) ± za /2 p1 - p2 Œ ( p
ˆ1qˆ1 p
+
ˆ 2 qˆ2 p
n1
n2
Œ (0.833 - 0.667) ± 1.96
0.833 * 0.166 300
+
0.667 * 0.333 600
Œ 0.167 ± 0.063 Œ(0.104, 0.230) with 95% confidence. ˆ1 - p ˆ 2 ) + za b) p1 - p2 £ ( p
ˆ1qˆ1 p n1
+
ˆ 2 qˆ2 p
.
n2
p1 - p2 £ (0.833 - 0.667) + 1.645 * 0.02888 = 0.213 with 95% confidence.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 358
Chapter 6 Point and Interval Estimation
358
Confidence Interval for the Paired T-Test Sometimes data are collected in pairs when two measurements for each subject or object are made or when the same person is involved with before and after treatments. In some sense the measurements (for example, a person’s test score before and after instruction) may not be entirely random. The procedure to establish a confidence interval is to regard the difference in the pairs as a random sample and to proceed accordingly. From n random paired observations (2n measurements) we compute the n differences D between the pairs (X - Y). An example will illustrate the procedure.
EXAMPLE 6.32
Find the upper-tailed 99% confidence limit for the random paired data below. The differences D are post-test scores - pre-test scores.
Students Pre-test Scores Post-Test Scores D = Differences
1 60 75 -15
2 45 65 -20
3 80 90 -10
4 87 80 7
5 79 89 -10
6 75 95 -20
7 60 85 -25
8 30 69 -39
9 45 40 5
Solution (setf data '( -15 -20 -10 7 -10 -20 -25 -39 5)) d = (mu data) Æ -14. 1; s 2D = (svar data) Æ 206.6 1; s D = 14.37; n = 9 pairs, from which we use the upper confidence limit t9-1,a=0.01 to get mD £
d
+ t9 -1,a * s D / n
£ - 14. 1 + 2.896 * 14.37 / 3 ª -0.24.
Confidence Intervals for Ratio of Variances s 22/s 12 The F-statistic is the ratio of two independent chi-squared RVs divided by their respective degrees of freedom, that is, S12 F =
2 2 s 12 s 2 S1 = with ( n1 - 1) degrees of freedom for the numerator S22 s 12 S22
s 22
P369463-Ch006.qxd 9/2/05 11:16 AM Page 359
6.3 Interval Estimates (Confidence Intervals)
Table 6.1
359
Portion of F Table for a = 0.05 V1
V2
1
2
3
4
5
6
7
8
9
10
11
1 2 3 4
161. 18.51 10.13 7.71
299 19.0 9.55 6.94
216 19.16 9.28 6.59
225 19.25 9.12 6.39
230 19.30 9.01 6.26
234 19.33 8.94 6.16
237 19.36 8.88 6.09
239 19.37 8.84 6.04
241 19.38 8.81 6.00
242 19.39 8.78 5.96
243 19.40 8.76 5.93
5 6 7 8 9
6.61 5.99 5.59 5.32 5.12
5.79 5.14 4.74 4.46 4.26
5.41 4.76 4.35 4.07 3.86
5.19 4.53 4.12 3.84 3.63
5.05 4.39 3.97 3.69 3.48
4.95 4.28 3.87 3.58 3.37
4.88 4.21 3.79 3.50 3.29
4.82 4.15 3.73 3.44 3.23
4.78 4.10 3.68 3.39 3.18
4.74 4.06 3.63 3.34 3.13
4.70 4.03 3.60 3.31 3.10
and n2 - 1 degrees of freedom for the denominator. A 100(1 - a)% confidence interval for the ratio s 22 s 12 s22 s12
using
S22 S12
as the point estimator is given by
F1-a / 2, n1 -1, n2 -1 £
s 22 s 12
£
s22 s12
Fa / 2, n1 -1, n2 -1.
(6–13)
A property of the F RV is that F1-a(v1, v2) = 1/Fa(v2, v1) for getting the lower tail values. A portion of the F distribution table for a = 0.05 is shown in Table 6.1. For example, F0.05(7, 4) = 6.09 (upper tail) and therefore F0.95(4, 7) = 1/6.09 = 0.1642. We seek F0.05(4, 7) from the table in order to find F0.95 (7, 4) = 1/F0.05(4, 7) = 1/4.12 = 0.2427.
EXAMPLE 6.33
Two random samples of size n1 = 10 and n2 = 9 from a normal distribution produced sample variances of s12 = 3 and s22 = 2.7. Find a 90% confidence interval for the ratio s 22/s 12. Solution Using Equation 6–13, we have a 90% confidence interval 2.7 s 22 2.7 given by F0.95 (9, 8) £ £ F0.05 (9, 8), or from the upper tail F-table 3 s 12 3 F0.95(9, 8) = 1/F0.05(8, 9) = 1/3.23 = 0.31; F0.05(9, 8) = 3.39. The 90% confidence interval is (0.9 * 0.31, 0.9 * 3.39) = (0.28, 3.1).
P369463-Ch006.qxd 9/2/05 11:16 AM Page 360
Chapter 6 Point and Interval Estimation
360
The command (cif s2-2 s2-1 n1 n2 alpha) returns a 100(1 - a)% confidence interval for the ratio s 12/s 22. For example, (cif 2.7 3 10 9 0.10) returns (0.264 2.915). Command (inv-f 10 8 0.05) returns 3.36, (inv-f 10 8 0.95) returns 0.32, (inv-f 8 10 0.05) returns 3.08, the reciprocal (1/3.08) of 0.32. Also command (L-Fd 8 10 3.08) returns 0.95, P(F £ 3.08); (U-Fd 8 10 3.08) returns 0.05, P(F > 3.08). Testing the ratio of two variances is the concept of an analysis of variance (ANOVA), discussed in Chapter 9.
6.4
Prediction Intervals Given a random sample X1, X2, . . . , Xn from a normal population, we are interested in predicting a confidence interval for the next sample Xn+1. The mean Xn of the first n samples is normally distributed. Now regard the RV Xn - Xn+1. Its expected value is 0 and its variance is s 2/n + s 2 or s 2(1/n + 1). If s 2 is unknown, s2 can be used as its estimator from the n samples. Thus a prediction interval can be computed with use of the t-statistic with n - 1 degrees of freedom. T=
X n - X n +1 S
1
(6–15)
+1
n EXAMPLE 6.34
A cereal-filling machine is set at 18 ounces. The first 20 boxes have a mean fill of 18.1 ounces with a computed standard deviation of s = 0.12. a) Compute a 95% prediction interval for filling the 21st box. b) Compute a 95% confidence interval for m. Solution a) Denote the mean of the first 20 samples as X20. The 95% prediction interval is given by x20 ± tn -1,a /2 * s *
1
+1
n
or 18.1 ± 2.086 * 0.12 * 1.0247
P369463-Ch006.qxd 9/2/05 11:16 AM Page 361
6.5 Central Limit Theorem (Revisited)
361
18.1 ± 0.257. Note that (inv.+ 20 0.025) fi 2.086. Hence X 21 Œ(17.84, 18.36) with 95% confidence. b) m Œ 18.1 ± 2.094 * 0.12/4.472 fi m Œ (18.043, 18.156). Note that the 95% prediction interval for Xn+1, the 21st sample, is longer than the 95% confidence interval for m.
6.5
Central Limit Theorem (Revisited) The Central Limit Theorem implies that sampled mean values tend to be normally distributed for large sample sizes of 30 or more. The command (simbinomial 10 1/2 100) repeats a binomial experiment 100 times to determine the number of successes from 10 trials with p = 1/2. The mean of the 100 outcomes then is one sampling mean in the distribution of sampling means. The command (sim-clt-binomial 10 1/2 50) performs the same experiment 50 times, resulting in a list of 50 sampling means, each mean determined by 100 repetitions of the binomial experiment with n = 10, p = 1/2. For example, (sim-binomial 10 1/2 100) may return 4 2 6 4
5 6 6 7
3 7 3 7
4 5 3 7
5 7 4 4
5 3 3 6
5 3 2 4
2 7 3 7
6 3 5 4
8 5 7 4
4 5 6 6
4 5 6 4
6 4 6 7
6 4 3 7
6 3 8 7
6 5 8 3
2 4 2 7
6 3 6 5
4 4 6 5
8 7 6 4
3 4 8 5
3 5 5 5
4 6 4 5
4 2 4 5
7 7 7 6.
The mean of the sample is 4.98. The command (sim-clt-binomial 10 1/2 100) returns 100 such means, as shown below. (4.81 5.08 5.09 4.83 5.08 4.91
5.02 4.91 4.96 5.09 4.65 5.32
4.57 4.85 4.95 4.98 5.32 5.07
4.84 5 4.81 5.18 5.23 4.78 4.99 5.17 5.16 5.18 5.22 4.95 5.14 5.04 4.6 5.13 4.98 4.72 4.77 4.86 4.85 4.7 5.03 5.12 4.82 5.15 5.05 4.89 5.3 5.12 5.14 5.2 4.96 4.59 5.05 5.16 4.96 5.21 5.17 5.2 5.06 5.17 4.83 5.06 5 4.86 5.25 4.82 4.75 5.2 5.09 5.01 4.98 5.06 4.89 4.98 4.66 4.93 5.01 5.39 4.82 5 5.06 4.94 5.06 4.69 4.83 5.15 5.27 4.72 5 5.27 5.01 4.73 5.1 5.11 5.16 5.02 4.98 5 5.12 5.27).
The overall mean is 5.002 as an estimate for E( X) and the overall sample s2 npq 10 pq 1 1 1 1 variance is 0.0325, an estimate for = = = * * = = ss ss 100 10 2 2 40 0.025.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 362
362
Chapter 6 Point and Interval Estimation
The Central Limit Theorem says that for any infinite distribution with finite variance and existing mean, the mean X of the sampling distribution of means is a normal RV with E( X) = m, the mean of the distribution from which s2 the samples were drawn, V ( X ) = , where s 2 is the variance of the distriss bution from which the samples were drawn and ss, the sample size, is sufficiently large (greater or equal to 30). EXAMPLE 6.35
Horizontal dot plot a sample of size 100 from the exponential with parameter k = 2 and also 50 such sample means from this distribution. Solution
(0.02 0.20 0.89 1.00 0.55 0.33
0.50 1.15 0.19 0.07 0.40 0.26
(setf e-data (sim-exponential 2 100)) returned
0.45 1.03 1.36 0.92 1.98 0.16
0.52 0.32 0.13 0.00 2.08 0.18
0.23 0.26 0.12 0.09 1.54 0.46
0.07 0.74 0.48 0.34 0.63 0.34
1.46 0.25 0.11 0.51 0.14 1.59
0.53 0.86 1.32 0.34 0.09 0.18
0.41 0.61 0.33 0.91 0.15 0.74
0.00 1.82 0.57 0.04 0.93 0.52
0.65 0.54 0.55 0.80 0.53 0.54
0.64 0.03 1.04 0.07 0.12 0.28
0.25 1.33 0.13 0.83 0.25 1.73
0.73 0.95 0.07 0.14 0.39 0.48
0.06 0.32 0.16 0.79 0.81 0.12 0.14 0.23 0.05 1.24 2.34)
1.56 0.32 0.71 0.73 0.08
(mu-svar e-data) returns 0.57 for x and 0.26 for s2. (hdp e-data) displays the exponential nature of the curve. ******************************* ******************** ***************** ************** ***** **** **** ** ** (setf e-clt (sim-clt-exponential 2 50)) returns (0.54 0.52 0.60 0.45 0.48 0.49 0.50 0.57 0.48 0.55 0.57 0.41 0.49 0.53 0.60 0.50 0.49 0.45 0.51 0.57 0.48 0.49 0.49 0.46 0.46 0.48 0.48 0.49 0.55 0.61 0.52 0.50 0.42 0.42 0.53 0.47 0.48 0.59 0.48 0.51 0.57 0.44 0.44 0.42 0.46 0.52 0.52 0.41 0.57 0.49) The mean is 0.5067, a simulated value for x; the variance s2 is 0.0028. (hdp e-clt) displays the normal-like nature of the sampling distribution of means due to the Central Limit Theorem.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 363
6.6 Parametric Bootstrap Estimation
* * * *
6.6
* * * *
* * * *
* * * *
363
* *********** *************** ******
Parametric Bootstrap Estimation With the pervasive presence of the computer in statistics, a method called bootstrap estimation is frequently used to provide estimates for standard errors of parameters without any knowledge of the underlying true parameters. From an original data set of a known distribution with unknown parameters, repeated resamples called bootstraps are created by sampling with replacement from the original data or from a fitted model. Desired sample statistics are computed from each bootstrap sample. When we sample from a known distribution, the original data sample can be used in lieu of the population to generate approximate confidence intervals for certain statistics. An example illustrates the procedure to show the effectiveness of a parametric bootstrap.
EXAMPLE 6.36
Suppose we sample from a gamma distribution with parameters a = 3 and k = 5. From our sample of size 100, we seek to provide estimates for a and k. We assume that we are aware that the sample is from a gamma distribution. Our sample is generated from the command (setf Fn (sim-gamma 3 5 100)) for a = 3 and k = 5 and returns what we call the original gamma sample known as Fn, which serves as a substitute for our population. Original Gamma Sample X1, X2, . . . , Xn from F denoted as Fn 0.4317 0.7079 1.0056 0.1994 0.4194 0.4508 0.9219 0.4414 0.8124 0.6509 0.8637 0.2604 0.1844 0.4334 0.5144 1.2160 1.1621 0.9133 0.5080.
0.5149 1.3280 0.3926 0.3545 1.0760 0.7676 0.9108 0.9630 0.6004
0.6545 0.1881 0.2060 0.3541 0.1244 1.0576 0.1946 0.3399 0.5293
0.8785 0.6564 0.7021 0.2370 1.0558 0.4594 1.0910 0.7095 0.2675
0.5756 0.5373 0.3281 0.8496 0.3114 0.2624 0.8097 1.0533 0.2874
0.5663 0.4192 0.4367 1.4458 0.6643 0.6271 0.8715 0.4095 0.4284
0.3954 1.3699 0.5878 0.4987 2.0912 0.3176 0.5239 0.7529 0.1849
0.2562 0.8972 0.3993 0.9776 0.5678 0.3948 0.2402 0.2787 0.4104
0.3531 0.2198 0.7544 0.0823 0.4005 0.4221 0.7438 0.2494 0.4920
0.7794 0.7296 0.2096 1.5236 0.5193 0.4478 0.5967 0.3469 1.2801
The mean of this sample is X = 0.6089, which is a good estimate since E(X ) = a/k = 3/5 = 0.6.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 364
364
Chapter 6 Point and Interval Estimation
The variance of the sample is 0.1257, which also is a good estimate since V ( X ) = a / k 2 = 3/25 = 0.12. We now have this data sample and want to determine approximate confidence intervals for a and k, the parameters of the distribution. This is the beginning of the bootstrap procedure with just the sample data. After plotting the data and seeing that a gamma distribution is reasonable, we proceed to find estimates for the parameters a and k. From the original sample designated Fn, the method of moments estimators are aˆ =
x2 V ( Fn )
=
0.60892
= 2.9491;
0.1257
kˆ =
X
=
V ( Fn )
0.6089
= 4.8440.
0.1257
The parameters are fitted to a gamma density. A bootstrap procedure (Monte Carlo simulation) is used to generate many such random samples of size 100 from the gamma sample Fn now regarded as the population. For example, generate 1000 iterations of (sim-gamma 2.9491 4.8440 1000); compute an aˆ and kˆ for each of 1000 samples. The average of these aˆ’s is a and the average of the kˆ’s is k. The estimate for the standard error of aˆ is then saˆ =
1 n
n
 (a
i
- a )2 = 0.3545
i =1
for n = 1000, and similarly, skˆ =
1
n
 (k n
i
- k )2 = 0.8486.
i =1
Thus an approximate 95% confidence interval for a is given by aˆ ± 1.96 * saˆ or 2.9493 ± 1.96 * 0.3545 = (2.2543, 3.6439) and for k by 4.8440 ± 1.96 * 0.8486 = (3.1807, 6.5073). We can also compute a 95% confidence interval for a, using the percentile method. Since we have 1000 estimates of a, we can take the 2.5 and 97.5 percentiles. The command (percentile percent list) returns the given percentile of a list of numbers. For example, (percentile 50 (upto 100)) returns 50.5. The interval (P2.5, P97.5) is (2.2425, 4.1172) from the 1000 samples of aˆ. 0.025 = P (aˆ - a ) £ L = 2.2425 - 2.9491 = -0.7066
P369463-Ch006.qxd 9/2/05 11:16 AM Page 365
6.6 Parametric Bootstrap Estimation
365
and 0.975 = P (aˆ - a ) £ U = 4.1172 - 2.9491 = 1.1681. where L and U are the lower and upper confidence bounds. Then P ( L £ aˆ - a £ U ) = 0.95 or P (aˆ - U £ a £ aˆ - L) = 0.95 with (aˆ - U , aˆ - L) being a 95% confidence interval. (aˆ - U , aˆ + L) = (2.9491 - 1.1681, 2.9491 - ( -0.7066) = (1.781, 3.6557). The overall a is 3.0534. The overall k is 5.0312. The 1000 estimates for a and k are generated by (bootstrap-ak 2.9491 4.8440 Fn 1000). The command (bootstrap-ak a-hat k-hat (sim-gamma a k n) n) can be used to generate the lists of 1000 aˆ’s and 1000 kˆ’s. Specifically, (bootstrap-ak 2.9491 4.8440 Fn 1000) generates the lists, where Fn is the original datum generated from the command (sim-gamma 3 4 100). The bootstrap samples were all taken from the simulated population Fn. Although we sought estimates for a and k, the bootstrap procedure can work for any statistic by computing that statistic from the bootstrap ensemble.
EXAMPLE 6.37
Provide an estimate for the standard error s from the following exponential sample: 0.165 0.322 0.208 0.052 1.793 0.202 1.055 0.016 0.145 1.101 1.059 0.129 Solution
(setf e-data '(0.165 0.322 0.208 0.052 1.793 0.202 1.055 0.016 0.145 1.101 1.059 0.129))
(bootstrap-parameter n parameter data) returns the mean of n bootstrap samples taken from data for the parameter, where parameter can be the mean, median, mode, variance, or standard deviation. For example, (bootstrap-parameter 1000 (stdev e-data)) returned 0.544 as an estimate for the standard error and (bootstrap-parameter 1000 (mu e-data)) Æ 0.520 as an estimate for m.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 366
Chapter 6 Point and Interval Estimation
366
Thus we use 0.52 as an estimate for m with 0.577 as an estimate for s. The e-data was generated from exponential with parameter k = 2, s = 0.5. Notice that in the parametric bootstrap the initial sample is fitted to a distribution with parameters inferred from the data. Subsequent bootstrap samples are then taken from the inferred distribution. In the nonparametric bootstrap, subsequent samples are taken from the initial data sample. The nonparametric bootstrap is discussed further in Chapter 10.
6.7
Summary We discussed two methods for finding point estimators: the method of moments and the method of maximum likelihood estimators. A third method, called least square estimators, is discussed in Chapter 8, regression. In general, maximum likelihood estimators have more desirable properties than method of moments estimators. Ease of computation may dictate what estimate is used; for example, the range is often used to indicate dispersion rather than the standard error in statistical process control. Interval estimates are derived for various confidence intervals, all of which depend heavily on the Central Limit Theorem. The theorem states that the sampling distribution of the mean from any distribution with finite variance tends toward a normal distribution, as sample size increases toward infinity. The expected value of RV X is m of the distribution from which the sample was taken E( X) = m. The variance of X is s 2 of the distribution from which the sample was taken, divided by the sample size n. V( X) = s 2/n. In the next chapter we look at methods to test hypotheses from which the sample data will cause us to reject or fail to reject our hypotheses. A summary of some of the common confidence intervals is shown in Table 6.1.
EXAMPLE 6.38
Find both MME and MLE point estimators and point estimates for a and k of the gamma density from the following data: (setf data '(2.1 0.6 2.2 0.9 1.6 1.5 1.6 2.6 1.3 1.2 1.4 2.6 2.2 3.1 1.4 1.5 0.8 2.1 1.3 0.9)) Solution
f ( x) =
MME:
ka x a -1e - kx G(a ) E( X ) =
a
fi
k E( X 2 ) =
a k2
aˆ = kˆ
+
a2 k2
ÂX n fi
i
= X = M1;
aˆ aˆ 2 + = kˆ2 kˆ2
ÂX n
2 i
= M2 .
P369463-Ch006.qxd 9/2/05 11:16 AM Page 367
6.7 Summary
367
Solving for aˆ and kˆ yields aˆ =
M12 2 1
M2 - M
; kˆ =
M1 M 2 - M12
.
M1 = 1.645; M 2 = 3.1305; aˆ = 6.4; kˆ = 3.9. -k x k na Px ia -1e  i MLE: L( x i , a , k) = n [G (a )] Ln L = na Ln k + (a - 1) Ln X i - k X i - n Ln G (a ) ∂L ∂a
= n Ln k + Â LnX i ∂L ∂k
=
na k
nG ¢(a ) G (a )
(1)
;
- Â Xi.
(2)
aˆ Setting the second equations to zero yields kˆ = , and substituting this X into the first partial equation yields an equation that can be solved only numerically. nG (aˆ ) nG ¢(aˆ ) = n Ln aˆ - n Ln X + Â Ln X i = 0. X G (aˆ ) G (aˆ ) However, the MM estimates are aˆ = 6.4 and kˆ = 3.9, and invoking the bootstrap with the command (bootstrap-ak 6.4 3.9 data 1000) returned the following. n Ln
aˆ
+ Â LnX i -
The mean and variance of a-hat are (6.7 3.7). A 95% confidence interval for parameter alpha is (6.2, 6.5). The mean and variance of kˆ are (4.1 1.4). A 95% confidence interval for parameter k is (3.5, 4.2). A list of outliers for aˆ and kˆ are also printed. EXAMPLE 6.39
Let X1, X2 . . . , Xn be a random sample from the continuous uniform on [0, q]. Find the MME and the MLE estimators and the mean and variance of each estimator. Solution f(x) = 1/q on [0, q]. E( X ) =
1
Ú q
q
0
xdx =
q 2
fi qˆMME = 2x and E(2x ) = q fi qˆMME is ubiased.
V (qˆMME ) = V (2x ) = 4V ( x ) = 4 * q 2 /12n = q 2 /3 n; qˆMLE = max{ X i }.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 368
Chapter 6 Point and Interval Estimation
368
Table 6.2
Confidence Intervals
Population
State of s
Sample Size
Parameter
Confidence Interval
Normal
N
Known
m
x ± ( za /2s / n )
Normal
n < 30
Unknown
m
x ± (tn -1,a /2 s/ n )
Arbitrary
n ≥ 30
Unknown
m
x ± ( za /2 s/ n )
Normal
n < 30 paired data
Unknown
mD
d ± (tn -1,a /2 s D / n )
Normal
N
Known
m1 - m2
( x1 - x2 ) + za /2
s 12 n1 s12
+
s 22
+
s22
n2
Arbitrary
n ≥ 30
Unknown
m1 - m2
( x1 - x2 ) + za /2
Normal
n < 30
Unknown
m1 - m2
( x1 - x2 ) + tn +n -2,a /2 sPooled
Normal
N
Unknown
s2
Binomial
n ≥ 30
Unknown
p
ˆ ± za /2 p
Binomial
n1 ≥ 30 n2 ≥ 30
Unknown
p1 - p2
ˆ1 - p ˆ2 ) ± za /2 (p
Normal
n1 n2
Unknown
s 22
s22
s 12
s12
1
( n - 1)s 2 c n2 -11 , - (a /2 )
n1
n2
2
£ s2 £
1 n1
+
1 n2
( n - 1)s 2 c n2 -1,a /2
ˆˆ pq n ˆ1qˆ1 p n1
+
ˆ2qˆ2 p n2
Ê s 2 ˆ s2 F1-(a /2)(v1, v2 ) £ Á 2 ˜ £ 2 Fa /2 (v1, v2 ) Ë s 2 ¯ s2 1 1
Note: When sampling without replacement from a population that is small relative to the sample size, a correction factor is appropriate. With N the population size and n the sample size, the correction factor is N-n given by . The correction factor pertains to sampling when the sample size is greater than 10% of the N -1 population size and is multiplied by the error.
To seek its density function, designate qˆMLE = Z. Then n
zn Ê z dx i ˆ FZ ( z ) = P ( Z < z ) = P ( X i < z, . . . , X n < z ) = Ú = . Ë 0 q ¯ qn f ( z) = E( Z ) =
nz n -1
on [0, q ].
qn
Ú
E( Z 2 ) =
nz n
q
0
q
n
dz =
q
nz n +1
0
qn
Ú
nq n +1 ( n + 1)q
dz =
n
=
nq n + 2 ( n + 2)q n
nq n +1 =
;
nq 2 n+2
Bias = .
nq n +1
-q =
q n +1
.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 369
369
Problems 2
nq 2
nq 2 Ê nq ˆ V( Z) = = . n + 2 Ë n + 1¯ ( n + 2)( n + 1)2 nq 2
2
2q 2 Ê q ˆ MSE Z = V ( Z ) + B = + = . ( n + 2)( n + 1)2 Ë n + 1¯ ( n + 1)( n + 2) ( n + 1)qˆMLE An unbiased MLE estimator for q is . n 2
With an understanding of estimation theory and confidence intervals, we now take up hypothesis testing in Chapter 7.
PROBLEMS ESTIMATORS
1. Given that RV X has a continuous uniform distribution on the interval [3, 5], find and sketch the density distribution of the sample mean with sample size 49. ans. N[4, (5 - 3)2/(12*49)]. 2. Consider 36 random samples from the continuous uniform distribution for RV X on [5, 10]. Find E( X), V( X), and P( X < 7.55). 3. Let X1, X2, . . . , Xn be a random sample from a normal distribution. Compute E( X 2). Is X 2 an unbiased estimator for m2? Determine the bias and the effects of a larger sample size. ans. s 2/n + m2 no s 2/n decreases bias. 4. Find the maximum likelihood estimator for Bernoulli parameter p. 5. A coin is flipped 5 times, revealing heads on the first, third, and last flip and tails on the second and fourth; that is, the sample data are 1 0 1 0 1. Write the likelihood function and find the MLE for p, the probability of a heads. ans. 0.6. 6. Find the method of moments estimators for gamma distribution parameters a and k. 7. Given E(X1) = E(X2) = m, V(X1) = 5, and V(X2) = 12, find the a) variance of mˆ = 1/2 X1 + 1/4 X2 and b) value of p which minimize the variance of ans. 2 12/17. mˆ = pX1 + (1 - p)X2. 8. Unbiased MME qˆ = c X is used to estimate q from density f(x) = 1/q for 0 < x < q. Find c and the variance of qˆ with sample size n. 9. RV X has density f(x; q) = (q + 1)xq for 0 £ x £ 1; q > -1. A random sample of size 10 (0.92 0.79 0.90 0.65 0.86 0.47 0.73 0.97 0.94 0.77) has x = 0.8. a) Find the MME qˆ for q and compute the estimate from the data. ans. 3.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 370
370
Chapter 6 Point and Interval Estimation
b) Find the MLE qˆ for q and compute the estimate from the data. ans. 3.11.
ÂX 10. Show that
2
n
n
i =1
n
2 i
-
Ê ˆ X ËÂ i¯ i =1
n2
n
Â(X =
i
- X )2
i =1
. n
11. Find the MME and MLE for q, given a random sample X1, X2, . . . , Xn from a population with density f(x; q) = e-(x-q), for x > q. ans. MME: x - 1 MLE: min{Xi}. 12. Show that S2 is an unbiased estimator for s 2 and thus S2/n is an unbiased estimator for s 2x . 13. Given X1, X2, . . . , Xn from the following distributions, find the method of moments and maximum likelihood estimators for p. a. Geometric
b. Bernoulli
c. Binomial
d. Negative binomial
14. Given X1, X2, . . . , Xn from the following distributions, find the method of moments estimators for k. a. Poisson
b. Exponential
15. a) Given a population of four values {1 5 11 13}, find the sampling distribution of the mean and verify that E( X ) = m and V( X ) = s 2/n for all possible samples of size 2 with replacement. ans. X = 7.5; V( X ) = 11.375. b) Show that the median X˜ of all random samples of size 3 taken without replacement from this population {3 5 10 12 15.5} is an unbiased ˜ ) = 9.1. estimator of m. ans. E( X 16. Find the better of the two estimators from a random sample X1, X2 . . . X1 + 2 X 3 X5 and compute the relative efficiency, given that qˆ1 = and 3 X1 + X 2 + X 3 qˆ2 = . 3 17. Given data 50 51 57 54 52 55 56 60 59 52 55 51 taken from a gamma distribution with a = 2, k unknown, find the a) MM estimator for k and the estimate; b) ML estimator for k and the estimate.
ans. 2/ x 0.0368. ans. 2/ x 0.0368.
18. Let X1, X2, . . . , Xn be a random sample from the continuous uniform on [0, q]. Find the MME and MLE and the mean and variance of each estimator. 19. a) A sample size of 25 has V( X ) = 4. Compute the comparable sample size if V( X ) = 5. b) A sample size of 25 has V( X ) = 4. Compute the comparable sample size if V(X ) = 2. ans. 20 50.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 371
371
Confidence Intervals
20. The following sample is taken from an exponential with parameter k. Find the MME for k. (1.0 0.4 0.0 2.0 0.1 4.5 1.1 0.4 1.3 2.3 3.3 1.0 1.3 0.3 1.0 0.3 1.3 0.9 1.9 3.0 0.2 4.3 1.5 4.0 0.4 2.3 0.9 1.8 0.9 0.2) 21. The sample was taken from N(m, s 2). Find the MM and ML estimates for m and s 2. 49.8 47.8 52.3 48.2 50.4 49.7 49.9 54.7 48.6 53.3 41.2 45.2 48.5 45.1 46.1 49.2 48.8 49.4 53.3 46.6.
ans. 48.9 9.6 from N(50, 16)
ˆ from a 22. Find the MME and MLE point estimators and estimates for p sample of size 30 taken from a geometric distribution: 3 2 1 3 1 1 2 2 1 6 4 2 15 2 6 3 4 3 1 4 2 4 2 2 3 7 9 1 1 4. 23. Show that the maximum likelihood estimator X is an unbiased minimum variance estimator for k from the Poisson distribution. 24. Find the MLE estimator for parameter k from a gamma distribution with a = 3. 25. Find the MLE estimator for k from a Weibull distribution with a = 3. n
ans n/ Â X i3 . i =1
26. If the joint density likelihood function of n random samples can be factored as f(x1, x2, . . . , xn, q) = T[g(x1, x2, . . . , xn); q] h(x1, x2, . . . , xn), where T depends on the sample for g but not for h and h does not depend on q, then statistic g(x1, x2, . . . , xn) is sufficient for q. Show that x is sufficient for the k parameter of a Poisson distribution. unbiased estimator for normal parameter s. Hint: n s ( X i - X )2 E( c n2 -1 ). Use the =Â ~ c n2 -1; E( S ) = 2 2 s s n -1 i =1 Gamma function G(a) to evaluate the c 2 density function. 2G [( v + 1)/2]s ans. E(S) = . n - 1G ( v/2)
27. Find an ( n - 1) S 2
CONFIDENCE INTERVALS Assume random sampling from a normal distribution N(m, s 2) unless otherwise specified. 1. Rewrite P(-za/2 £ Z £ za/2) = (1 - a) about m where Z =
X -m
. s n ans. x - za /2s / n < m < x + Za /2s / n .
P369463-Ch006.qxd 9/2/05 11:16 AM Page 372
372
Chapter 6 Point and Interval Estimation
2. a) Find 95% and 99% confidence intervals for m with n = 10, x = 42, and s = 1.6. Note which interval is longer. How would the length of the interval change if b) n increased to 100? c) s increased to 4? d) Find n if the error of the 95% confidence interval can be no more than 0.5. 3. Specify the confidence level for the following intervals when sampling from N(m, s 2): a) x + 2.575s / n , b) x + 0.26s / n , c) x + 2.33s / n , d) x ± s / n . ans. 99% 20% 98% 68.26%. 4. Find a 95% confidence interval for m and s 2 from the following data: 28.7 25.8 24.0 25.9 26.4 28.9 25.4 22.7 25.1 27.9 29.3 28.9 24.3 24.8 23.6 25.2 5. Compute a 99% confidence interval for m, given n = 40, x = 9.46, and s = 0.58. ans. (9.22, 9.70). 6. Find a 95% confidence interval for m1 - m2 from 16 paired dependent observations, given that d = 1.25 and s D = 14.7. 7. Find the smallest 95% confidence interval for s 2, given a random sample from a normal distribution with n = 25, s2 = 0.25, and a = 0.05. ans. (0.149, 0.472). 8. Find a 95% confidence interval for parameter p, given 12 successes from n = 75 trials. 9. Find a 90% confidence interval for m1 - m2, given n1 = 15, n2 = 20, x1 = 90, x 2 = 85, s 12 = 4, and s 22 = 6.25. ans. (3.75, 6.25). 10. Find a 95% confidence interval for m1 - m2 from normal distributions with unknown but same variance, given that n1 = 12, n2 = 15, x1 = 24.6, x2 = 22.1, s1 = 0.85, s2 = 0.98. 11. Find a 95% confidence interval for m1 - m2, given 20 paired observations, where the mean difference d = 1.21 with s D = 12.68. ans. (-4.72, 7.14). 12. Find 95% confidence intervals for s 2 and s from a normal random sample with n = 20 and s2 = 0.0225. 13. Derive a confidence interval for binomial parameter p with large random sample size n and the help of the Central Limit Theorem. 14. Two random samples of size n1 = 8 and n2 = 7 from a normal distribu2 tion produced sample variances of s1 = 4 and s22 = 3.6. Find a 90% con2 fidence interval for the ratio s 22/s 1.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 373
373
Miscellaneous
15. Find a 95% confidence interval for m of a normal distribution with unknown variance from a random sample of 20; x = 10 and s = 4. ans. (7.44, 12.56). 16. Evaluate the following integrals: a. d.
1 2p 1 2p
1.96
Ú
-1.96
Ú
•
-•
2
e - z /2 dz;
z2 e
1
b.
2 2p - z2 /2
dz; e.
Ú
•
0
Ú
4
2
2
e - ( x - 5 ) /8 ;
2( x - 0.5)2 e -2 x dx;
1
c. f.
Ú
6
2
2 2p • ka x a e - kx
Ú
0
2
e - ( x - 5 ) /8 ;
G(a )
dx.
17. Explain these two confidence intervals: P(m £ x + 0.25) = 0.95 and P(m ≥ x + 0.25) = 0.95. 18. Find a 95% confidence interval for the ratio s 22/s 12, given that s12 = 9 and s22 = 15 with respective sample sizes of 12 and 20 from independent normal distributions. 19. Suppose the mean of a random sample of size 15 from N(m, s 2) with m and s 2 unknown is 20, with sample error 4. Find a 95% prediction interval for the 21th sample. ans. (11.415, 28.585). 20. Cereal box contents by weight are normally distributed with a standard deviation of 0.3 ounce. A random sample of 25 boxes showed a mean weight of 16.2 ounces. Find a 95% confidence interval for m, the true mean weight.
MISCELLANEOUS 1. The lifetime of an electronic component is distributed N(m, s = 60 hours). Find the sample size to ensure a 95% confidence interval for m with an error of at most 10 hours. ans. 139. 2. Show that S is a biased estimator for s by assuming that S is unbiased and reaching a contradiction. 3. Determine a 95% confidence interval for the parameter cost C to repair 3 machines, where C = 2m1 + 5m2 + 3m3 and 100 random records of each machine have been examined to reveal the following data: ans. (127.15, 134.85). x1 = 10 s12 = 4
x2 = 15 s22 = 9
x3 = 12 s32 = 16.
4. To be 95% confident in the results of a yes/no poll, how many people must be polled if an error of 3% is tolerable?
P369463-Ch006.qxd 9/2/05 11:16 AM Page 374
374
Chapter 6 Point and Interval Estimation
5. Given random sample X1, X2, . . . , X10 = 45 40 43 43 44 45 42 41 42 43 ( X = 42.8), find an unbiased estimate for a) b) c) d)
x/100 = 0.428 s2 = 2.62 x = 42.8 2/ x = 0.0467
p if the sample is from a binomial(x; 100, p). s 2 if the sample is from N(m, s 2) k if the sample is from a Poisson. k if the sample is from a gamma with a = 2.
6. For the following stem and leaf data from a normal distribution, find confidence intervals for a = 1, 5, and 10%.
Cum 3 7 13 22 32 42 50 56 61 64
Stem 0 1 2 3 4 5 6 7 8 9
Leaf 4 6 2 5 1 1 2 4 3 3 1 1 0 1 2 4 0 1 1 1
n = 64 7 7 7 2 5 6 5 7 7 4 5 5 2 3 4 4 5 6 5 5 6 1 1 2 3
7 8 6 6 6 7
8 7 7 8
9 9 8 8 9 7 8 9 9
7. Find MMEs for the parameters a and b of a continuous uniform RV X on [a, b]. See Software Exercise 11. ans. aˆ = M1 - 3( M 2 - M12 ), bˆ = M1 + 3( M 2 - M12 ). 8. Let X1, X2, . . . , X35 be a random sample from N(4, 21) and Y1, Y2, . . . , Y49 be a random sample from N(2, 28). Find the sampling distribution of X - Y . 9. Which of the following symbols are RVs? a) m
b) S
c)
s2
n g) N (population size)
d) X
e)
S2 n
f ) Max{Xi}
h) s 2
10. Find a 95% prediction interval for the 16th sample, given that the first 15 random samples were taken from a normal distribution with x = 1.95 and s = 0.01. 11. Find the method of moments estimator qˆ for a random sample X1, X2, . . . , Xn taken from the continuous uniform on the interval [1, q]. Determine whether the MME estimator is unbiased. ans. 2 X - 1, unbiased. 12. Find the sample sizes needed for 95% and 99% confidence intervals if the standard deviation is 4 and the error is less than 1.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 375
375
Software Exercises
SOFTWARE EXERCISES 1. Random Sampling. Consider the following 100 samples: 29 41 09 89 74
82 55 41 37 14
52 34 77 02 50
91 71 42 59 22
76 73 92 03 25
55 34 50 51 01
07 88 01 84 71
29 40 34 89 14
67 66 33 25
33 26 14 36
43 37 38 26
19 46 36 84
76 72 96 85
18 8 37 19 53 32 48 47 58 59 79 04 65 13 25 57
56 02 88 5 81 82 20 71 31 18 26 82 96 41 29 55 100 01 37 100 54 12 96 13
a) Quickly select the 10 values that in your judgment best represent these samples and compute the sample mean. Cross out the selected data and repeat for 10 other represented values. Compute the sample mean of each. Use (mu data) to compute x of your chosen values. b) Compute the sample mean from the combined 20 samples. c) (swor m (up to 10)) returns m random selections from the integers 1 to n. (swor 10 (upto 100)) may return (21 59 95 47 54 29 15 64 57 10). You may use these numbers as ordinal positions from the data list to indicate a random selection of 10 values to compute the sample mean. You may also run (swor 10 (upto 100)) for another list of 10 random numbers. The corresponding samples for the random 10 integer ordinal positions are shown in bold. 21 59 95 47 54 29 15 64 57 10 5 96 22 9 34 34 8 96 38 33.
d) Compare the sample means from using your judgment with the sample mean from the randomly selected data. Compute the sample mean for all the sampled data. ans. x = 46.64; s2 = 800.25.
CENTRAL LIMIT THEOREM
For exercises 2 thru 9, to demonstrate the CLT at work, use the following sequence of commands (sim-distribution 1/2 100) by substituting the name for the distribution. (mu-svar *) (HDP **) (sim-clt-distribution 1/2 100) (mu-svar *) (HDP **)
;;; depicts the distribution in dot plot ;;; central limit theorem ;;; depicts the asymptotic normal
2. (sim-binomial n p m) returns the results from n Bernoulli trials with probability of success p, repeated m times. Try (setf data (simbinomial 32 1/4 100)). Then use (print- count-a-b 3 14 b-data)
P369463-Ch006.qxd 9/2/05 11:16 AM Page 376
376
Chapter 6 Point and Interval Estimation
to see the distribution of successes. Perform a horizontal dot plot with the command (HDP data). When turned on its side, what distribution should be depicted? On what value should the distribution center be near? (sim-clt-binomial n p m) returns a list of m sampling means from (sim-binomial n p 100). That is, m trials of (sim-binomial n p 100) are performed, returning a list of the m sampling means. Try (setf data (sim-clt-binomial 10 1/2 25)). Predict the mean and the variance. Confirm with (mu-svar data). ans. 5 2.5/100. 3. (sim-poisson k n) returns the number of occurrences of a Poisson distribution with parameter k repeated n times. Perform a horizontal dot plot using commands (setf data (sim-poisson 50 100)) followed by (HDP data). When turned on its side, what distribution should be depicted? What should (mu-svar data) return? ans. normal 50. (sim-clt-poisson k n) returns a list of n sampling means from (simpoisson k 100); n trials of (sim-poisson k 100) are performed, from which E( X ) = m and V( X ) = s 2/100. Try (setf data (sim-clt-poisson 5 25)). Predict the mean and the variance. Confirm with (mu-svar data). Try (HDP data). ans. 5 5/25. 4. (sim-geometric p n) returns n simulated values from a geometric distribution with probability of success p. The values represent the number of Bernoulli trials for the first success to occur. Stem and leaf (simgeometric 1/20 100). Predict the number of trials at which the first success occurs. (sim-clt-geometric p n) returns a list of n sampling means from (sim-geometric p 100). That is, n trials of (sim-geometric p 100) are performed, from which E( X ) = m and V( X ) = s 2/100 can be calculated. Try (setf data (sim-clt-geometric 1/2 25)). Predict m x and s 2x. Confirm with (mu-svar data)). Try (HDP data). 5. (sim-gamma a k n) returns n samples from a gamma density distribution with parameters a and k. Try (setf data (sim-gamma 2 5 100)). Predict m x and s 2x. Confirm with (mu-svar data) to see the nearness to a/k = 0.4 and the nearness to V(X ) = a/k2 = 0.08. Find estimators aˆ and kˆ and the estimates from a sample of size 100 from a gamma density with x = 0.536 and s2 = 0.064. ans. aˆ = 4.489 kˆ = 8.375 (sim-clt-gamma a k n) returns a list of n sampling means from (simgamma a k 100). N trials of (sim-gamma a k 100) are performed, from which E( X ) = m and V( X ) = s 2/100 can be calculated. Predict the
P369463-Ch006.qxd 9/2/05 11:16 AM Page 377
Software Exercises
377
mean and variance of data in (setf data (sim-clt-gamma 2 10 30)). Confirm with (mu-svar data)) and (HDP data). ans. 2/10 2/1000. 6. (sim-weibull a k n) returns n samples from a Weibull density distribution with parameters a and k. (setf data (sim-weibull 2 10 50)) followed by (mu-svar data) to get the results. 1 Ê 3ˆ Is the simulated mean near G = 0.280 ? 1/2 Ë 2¯ 10 (sim-clt-weibull a k n) returns a list of n sampling means from (simweibull a k 100). That is, n trials of (sim-weibull a k 100) are performed, from which E( X ) = m and V( X ) = s 2/100 can be calculated. Predict the mean and the variance of (sim-clt-weibull 2 10 30). 7. (sim-clt-exponential k n) returns a list of n sampling means from (simexponential k 100). That is, n trials of (sim-exponential k 100) are performed, returning a list of the n sampling means. Try (hdp (simexponential 1/2 100)) to see an exponential plot of one sample. * * * * * * *
* * * * * *
* * * * *
* * * * *
* * * *
* * * *
******************************** ************* ******* *
Then try (setf e-data (sim-clt-exponential 1/2 100)), followed by (hdpe e-data), and notice the normal-looking sampling distribution of the exponential mean. * * * * *
* * * *
* * * *
* * * *
* * * *
* * * *
**************** **************************** *********************** *
Predict the mean and the variance of (sim-clt-exponential 1/2 100). ans. 2 4/100. 8. (sim-clt-uniform a b n) returns a list of n sampling means from (simuniform a b 100). That is, n trials of (sim-uniform a b 100) are performed, from which E( X ) = m and V( X ) = s 2/100 can be calculated. Try (setf u-data (sim-uniform 5 20 100)), followed by (hdp u-data). Compare horizontal dot plot with the plot of (hdp (sim-clt-uniform 5 20 50)). 9. (sim-clt-neg-binomial p k n) returns the results of n number of trials at which the kth success occurred; the mean and variance are returned,
P369463-Ch006.qxd 9/2/05 11:16 AM Page 378
378
Chapter 6 Point and Interval Estimation
along with a list of n sampling means from (setf data (sim-negbinomial p k 100)). That is, n trials of (sim-neg-binomial p 100) are performed, returning a list of the n sampling means. Predict mean and variance of (sim-clt-neg-binomial 1/2 5 25). Confirm with (mu-svar data) and (HDP data). ans. 10 0.1. 10. (sim-beta a n) returns a random sample of size n with b = n = a + 1. Generate for a = 40 and n = 100 to see how close x is to m = a/(a + b). Try (mu (sim-beta 40 100)). 11. (MME-vs-MLE q ss n) compares the n method of moments estimator with the n maximum likelihood estimator for f(x) = 1/q on [1,q], with the values of each showing which is closer to q. The MME is 2 x - 1 and the MLE is the max {Xi}. The command returns the number of times each was closer, using a sample size ss. Try (MME-vs-MLE 10 30 15) for 15 experiments of reporting the better estimator in 15 samples. 12. Revisit Miscellaneous Problem 7 through the following software commands. a) (setf sample (sim-uniform 5 12 30)) ; sample size 30 from U[5, 12]. b) (setf m1 (mu sample) s2 ; m1 is x 1st moment, s2 is (svar sample))) s2. c) (setf m2 (+ s2 (square m1)) ; assigns m2 to s2 + m21 (second moment) d) (setf x (sqrt (* 3 (- m2 (square m1))) e) (list (- m1 x) (+ m1 x)) ; returns unbiased estimates for 5 and 12. The command (UabMMML a b ss n) returns n estimates each of the MME and MLE and the closer estimate. (UabMMML 5 12 30 5) returns the 5 estimates each of parameters a = 5 and b = 12, using a sample size of 30.
METHOD OF MOMENTS
MAXIMUM LIKELIHOOD
WINNER
A-HAT
B-HAT
A-HAT
B-HAT
A-HAT
B-HAT
4.441 5.241 5.287 5.110 5.011
11.915 11.771 12.383 12.241 11.691
5.123 5.802 5.103 5.066 5.046
11.977 11.712 11.720 11.894 11.884
MLE MME MLE MLE MME
MLE MME MLE MLE MLE
(Uab-MM-ML-LS 5 12 30 5) returns MM ML and least-squares estimates (Chapter 8).
P369463-Ch006.qxd 9/2/05 11:16 AM Page 379
379
Software Exercises MME
MLE
LEAST-SQUARES
A-HAT
B-HAT
A-HAT
B-HAT
A-HAT
B-HAT
4.0149 4.5580 4.9197 4.7132 5.7834
11.7250 11.9379 12.3341 11.9518 12.0680
5.1962 5.1129 5.0113 5.1943 5.4279
11.8996 11.9482 11.7610 11.6037 11.7221
5.3694 5.2834 5.1783 5.3675 5.6089
12.2963 12.3465 12.1531 11.9905 12.1128
1
for -• < x < •. Show that f(x) is p ( x 2 + 1) a valid density function. The density function is called Cauchy. Show that E(X ) does not exist for the Cauchy density. Hint: (arctan x)¢ = 1/(x2 + 1). Given that F(X ) = 1/2 + (1/p) * arctan x, show how to simulate a Cauchy RV and compute E(X ).
13. Let density function f ( x ) =
Let U = 1/2 + (1/p) * arctan x. Solving for x yields x = tan[p(u - 1/2)], where U is continuous uniform on [0, 1]. How do you expect E(X ) not to exist from a random sample of the Cauchy distribution? Try (mu-svar (sim-cauchy 100)) and check the fluctuations in the mean and the variance. 14. This exercise looks at the efficiency of the sample median with the efficiency of the sample mean. The command (Median-vs-Mu m s n) returns n runs each of computing the mean of a normal distribution N(m, s 2), using a sample size of 100 for the median and a sample size of 64 for the sample mean. From a normal sample of size 100, 64 were randomly selected to compute the mean, while the entire sample was used to compute the median. I. (median-vs-mu 2 5 10) Medians Æ 2.31 2.74 1.84 1.96 1.80 1.55 0.95 3.09 2.66 1.86 Means Æ 2.13 3.06 1.50 2.93 2.26 1.16 1.14 2.02 2.30 2.08 II. (median-vs-mu 50 20 10) Medians Æ 51.27 51.42 54.97 48.90 45.57 55.39 48.83 50.07 48.57 51.05 Means Æ 52.67 52.99 55.40 47.48 48.87 48.52 53.61 49.49 51.01 54.44 15. The command (sample-moment n data) returns the nth sample moment of data, a list of numbers. Data may be from various distributions. (sample-moment 2 (sim-normal 2 2 100)) returned 8.179 as an estimator for E(X2) = s 2 + m2 = 4 + 22 = 8. 16. Estimate the MME and MLE for q, given density function f(x; q) = (q + 1)xq for 0 £ x £ 1; q > -1. For example, with q = 3, f(x) = 4x3 with E(X ) = 0.8 and F(X ) = X4. To simulate a sample, set U = x4 with
P369463-Ch006.qxd 9/2/05 11:16 AM Page 380
380
Chapter 6 Point and Interval Estimation
x = U1/4. Then take n = 30 samples from the continuous uniform on [0, 1] and take the fourth root of each. The command (setf U (simuniform 0 1 30)) returns a random sample of 30 from the continuous uniform distribution on [0, 1]. (setf X (repeat #'expt U (list-of 30 1/4))) takes the fourth root of each uniform sample. (mu X) returns 2x - 1 x as the method of moments estimator for m = E( X ) = 0.8. qˆ = . 1- x (sum (repeat #'log X)) adds the sum of the natural log of the sample data X. See how close your estimate is to 3 using n + Â Ln( x i ) 2x - 1 qˆMME = and qˆMLE = = -1 1- x -Â Ln( x i )
n
 Ln( x )
,
i
that is, (+ -1 (/ -30 (sum (repeat #'log x)))) for the MLE. 17. (MMvsML q-range n m) returns m trials from n random samples with a random value of q chosen from the integer range [0, q-range - 1], which specifies the distribution to compare the MME with the MLE by tallying the results. The density is f(x) = (q + 1)xq for 0 £ x £ 1. Try (MMvsML 20 50 10). 18. Simulate 100 samples from the maximum of two continuous uniform RVs X and Y on the interval [0, 10]. See Example 6.8, where Z = X + Y and z = 2qˆ/3. The commands are 1. 2. 3. 4.
(setf X (sim-uniform 0 10 100)) (setf Y (sim-uniform 0 10 100)) (setf Z (repeat #'max X Y)) (mu Z)
; ; ; ;
generates 100 samples of X generates 100 samples of Y picks the max {Xi, Yi} expect a value close to 20/3.
Try the command (mu (repeat #'max (sim-uniform 0 10 100) (sim-uniform 0 10 100))) in conjunction with the ≠ key to repeat the command to return estimates of mZ = 20/3. 19. The command (s2pool data) returns the pooled variance from a list of sampled data. For example, (s2pool '((1 2 3)(4 5 6 7) (8 9 10 11 12))) returns 1.8888. 20. Repeat Example 6.18, using the command (sim-nci m s ss n a) to generate n confidence intervals, and count the number of intervals containing m. See Software Exercise 23. (sim-nci 10 2 50 20 5) should return about 19 intervals containing the value 10.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 381
381
Software Exercises
The following is an actual run, showing one interval in italics not containing the mean m = 10. (9.36 (8.96 (9.47 (9.49
10.47) 10.07) 10.58) 10.60)
(9.77 10.87) (9.62 10.73) (9.74 10.85) (10.02 11.13)
(9.60 (9.52 (9.30 (9.65
10.71) 10.63) 10.41) 10.76)
(9.03 (9.20 (9.54 (9.71
10.14) 10.31) 10.65) 10.82)
(9.56 (9.49 (9.94 (9.09
10.66) 10.60) 11.05) 10.20)
21. The experiment is tossing a fair die. E(X ) = 3.5 and V(X ) = 35/6. (die n) returns the simulated results from tossing a fair die n times. For example, (die 25) may return 3 6 3 1 1 3 5 6 3 3 3 2 5 2 6 6 4 4 6 2 2 2 3 3 3, from which x = 3.48 and s2 = 2.59. The following software code generates m sample runs of x with n tosses of a fair die each time. (defun dx (m n) (let ((x nil)) (dotimes (i m x) (push (mu (die n)) x)))). (dx 10 25) may return 3.92 3 4.28 3.04 3.96 3.6 3.92 3.4 3.32 3.04, from which x = 3.54 and s2 = 0.20, compared with the theoretical a + b 1+ 6 E( X ) = = = 3.5 and 2 2 V( X ) =
n2 - 1 12 * ss
=
62 - 1 12 * ss
=
35 12 * 10
= 0.29.
Try (mu-svar (dx 100 100)) along with the ≠ repeat key to see similar runs. (print-count-a-b 1 6 (die 1296)) may return Integer Count
1 216
2 204
3 220
4 201
5 226
6 229.
Try the ≠ repeat key to see similar returns. 22. The command (sim-u a b n m) returns m simulated means for the estimators from a continuous uniform density on [a, b] of sample size n. Use the command (setf sample (sim-uniform 5 10 100) clt-sample (sim-u 5 10 100 100)) to compare the continuous uniform distribution with the sampling distribution of its mean.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 382
382
Chapter 6 Point and Interval Estimation
Compute the sample mean and variance of sample and clt-sample with the commands (mu-svar sample) and (mu-svar clt-sample). Horizontal dot plot with the command (HDP sample) to see the shape of the uniform distribution on the interval [5, 10]. Then horizontal dot plot with the command (HDP clt-sample) to see the central limit theorem at work with a somewhat normal distribution. (mu-svar sample) returned (7.462 2.125) and (mu-svar clt-sample) returned (7.515 0.021). 23. The template (ci data alpha) returns a 100(1 - a)% confidence interval for the mean and variance of the data list. For example, (ci (sim-uniform 0 1 100) 5) returned the following: 95% Confidence Interval for m is: (0.4589 0.5852) 95% Confidence Interval for s 2 is (0.0780 0.1365) 24. The template (sim-plot-ci m s n m a) plots m confidence intervals of sample size n, given m, s, and a. For example, (sim-plot-ci 15 20 36 15 5) may return the following:
INTERVAL LENGTH = 13.06, m = 15, s = 20, n = 36
______________ ______________ ______________ ______________ ______________ ______________ ______________ _____________ ______________ ______________ ______________ ______________ ______________ ______________ ______________
95% CONF INTERVALS (12.67, 25.74) (7.20, 20.27) (9.05, 22.12) (5.41, 18.48) (3.96, 17.03) (3.38, 16.45) (15.19, 28.26) (8.77, 21.84) (15.15, 28.22) (1.41, 14.48) (12.15, 25.22) (5.64, 18.71) (9.74, 22.81) (9.13, 22.20) (4.80, 17.87)
Try (sim-plot-ci 30 25 20 8 50) and expect half to contain the mean 30.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 383
383
Software Exercises INTERVAL LENGTH = 7.5376, m = 30, s = 25, n = 20
________ ________ ________ ________ _________ ________ ________ ________
50% CONF INTERVALS (31.0861, 38.6238) (27.8750, 35.4126) (29.3015, 36.8392) (37.2209, 44.7585) (19.8807, 27.4183) (26.5550, 34.0927) (30.7267, 38.2643) (17.9823, 25.5200)
25. The command (mu-Z-ci x sigma n a) returns a 100(1 - a)% confidence interval for m. (mu-Z-ci 50 4 49 1) Æ (48.53, 51.47) with 99% confidence. The command (mu-T-ci x s n a) returns a 100(1 - a)% confidence interval for m. (mu-T-ci 50 4 49 1) Æ (48.47, 51.53) with 99% confidence. 26. (random-sample max-size) returns a random sample from an unknown distribution of max-size. Try to determine the distribution from testing the sample. Typing *RS* reveals the distribution. For example, (setf data (random-sample 30)) may return (59 41 49 51 72 60 69 53 52 52 64 53 58 52 57 56 55 44 50 46 61 47 41 55 46 59 58 67 63 55). (depict data) returns N 30 STD-DEV 7.746 Q-1 49.750
MEAN 54.833
MEDIAN 55.000
MODE 55.000
TRIM-5% 54.714
SUM 1645
MSSD 50.103
SE-MEAN 1.414
SVAR 60.006
IQR 9.500
MAD 4.500
RANGE 31
MID-RANGE 15.500
Q-3 59.250
MIN 41.000
MAX 72.000
SKEWNESS 0.196
KURTOSIS 2.487
CV 0.141
Horizontal Dot Plot N = 30 * * * * *
* ***** ********** ******* *
*RS* returns (SIM-POISSON 55 30), the distribution from which the sample was drawn.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 384
384
Chapter 6 Point and Interval Estimation
27. Suppose you know that the following sample is from a normal distribution with unknown mean and variance. Use the bootstrap procedure to generate a 95% confidence interval for the mean. (setf data '(51.63 50.36 50.59 50.33 50.15 49.7 45.09 55.26 48.86 50.64)) (mu-std-err data) returns ( x = 50.26, s = 2.50). We use these as parameters of our sampling distribution. Now take 10 samples with replacing from N(50.26, 6.24) using the command (sim-normal 50.26 2.50 10) and for each sample compute x. Do this 1000 times or more. Bootstrap sample sizes are usually 1000 or more. Then take the 97.5 and 2.5 percentiles of the returned x ’s. Try (bootstrap data 10) to see the complete return before running (bootstrap data 1000), which will return a list of 1000 x for each sample of 10 and a list of 1000 standard errors. If you want to see the data, run the command (bootstrap data 1000). Otherwise (setf boot-data (first (bootstrap data 1000)) followed by (list (percentile 2.5 boot-data) (percentile 97.5 boot-data)). The command returned (48.21 53.31) a 95% confidence interval for m. The sample was taken from N(50, 4). The mean x of the x ’s was mˆ = 50.06 and the mean of the standard errors was sˆ = 2.49.
SELF QUIZ 6: ESTIMATION AND CONFIDENCE INTERVALS 1. a) Let X1, X2, . . . , X49 be a random sample from a geometric distribution with probability of success p = 1/4. Then E( X ) = ___ and V( X ) = ___ b) Given population {2 5 7 8 9 10 12} and considering the distribution of all possible samples X1, X2, . . . , X4 with replacement, the mean of the sampling mean distribution of means is ____ and the variance of the sampling mean of this distribution is ____. 2. Given random sample X1, X2, . . . , X4 = {0.78 0.97 0.34 0.25} from density f(x, q) = 1/q on [0, q], qˆMME = ____ from the data values. a) MMEq = ____. qˆMLE = ____ from the data values. b) MLEq = ____. ˆ c) V(q MME) = ____.
P369463-Ch006.qxd 9/2/05 11:16 AM Page 385
385
Self Quiz 6: Estimation and Confidence Intervals
3. Given random sample X1, X2, . . . , X36 from density distribution f(x) = 2e-2x for x > 0, then P( X > 0.48) = ____. 4. a) In sampling from N(m, s 2) with s known, the value of a for the interval x ± 1.72s/ n is ____. b) The size n for a total length 1/2 of a 95% confidence interval with s = 1.75 is ____ 5. For a random sample randomly sampled data 3 18 6 3 4 15 4 7 13 9 22 8 from a normal distribution, a) Write a 99% confidence interval for the population mean m. b) Write a 95% confidence interval for the population variance s 2. 6. The expected daily cost of the downtime of 3 machines is given by C = 5m1 + 3m2 + 4m3. Provide a 95% confidence interval for the cost if random selections from company records revealed: n1 = 200 x1 = 12 s1 = 6
n2 = 200 x2 = 19 s2 = 4
n3 = 200 x3 = 14 s3 = 5
7. A study reported that 200 of 500 subjects benefited significantly from a new drug. Provide a 99% confidence interval for the proportion p who benefited. 8. Let X1, X2, . . . , X36 be a random sample from N(6, 25) and Y1, Y2, . . . , Y64 be a random sample from N(8, 4). Describe the sampling distribution of RV X - Y . 9. Given a random sample X1, X2, . . . , X10 = 118 115 111 122 104 113 114 114 111 108, an unbiased estimate for a) p if the sample is from binomial(X; n = 200) is _______. b) s 2 if the sample is from N(m, s 2) is _______. c) k if the sample is from a Poisson is _______. 10. Two random samples from the same distribution were taken as shown below. Find the best estimate for the variance of the distribution. Sample 1 Sample 2
12 9 12 11 24 7 11 14 15 10 17 19 2 6 9 10 15 2 4 8 14 12 7 11 20 6 16 11 10 5.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 386
Chapter 7
Hypothesis Testing
Factual evidence can never “prove” a hypothesis; it can only fail to disprove it, which is what we generally mean when we say, somewhat inexactly, that the hypothesis is “confirmed” by experience. ~ Milton Friedman
After we have estimated a parameter, how can we test to see if our estimate is satisfactory? How can we test to determine if our sample is from a binomial or normal or any other distribution? How can we test hypotheses about parameters of distributions? In this chapter we use inferential statistics to discuss the concepts and methods of hypothesis testing. We will be concerned with our hypothesized distributions, the sample sizes (to invoke the central limit theorem), whether variances are known, and the similarity of variance measurements when comparing two distributions. 7.0 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 386
Introduction Terminology in Statistical Tests of Hypotheses Hypothesis Tests: Means Hypothesis Tests: Proportions Hypothesis Tests for Difference between Two Means: Small Samples (n £ 30) s 2 Known Hypothesis Test with Paired Samples Hypothesis Tests: Variances Hypothesis Tests for Independence, Homogeneity, and Goodness of Fit Summary
P369463-Ch007.qxd 9/2/05 11:17 AM Page 387
7.1 Terminology in Statistical Tests of Hypotheses
7.0
387
Introduction In one sense a lot about the testing of hypotheses is already done because the confidence intervals in chapter 6 can serve as a method of determining the acceptance or rejection of hypotheses. However, the whole pathway of collecting, organizing, and evaluating data is paved with potholes and pitfalls. A general procedure is to first plan the experiment completely and evaluate the plan as to fairness and contingencies. How much data to collect, what kind of data, what measurement or instrument to use to get the data, from what group or groups, how much money is involved, the time constraints, the precision and reliability desired or required, the kind of study (quick and dirty, probe, research hypothesis, judgment, poll, etc.), and the implications of possible results are all things that need to be considered when planning an experiment. Oftentimes convenience samples are available. Imagine the difficulty of being a graduate student doing a dissertation and attempting to get two groups of students already assigned to classes in different schools to be your randomly selected controlled group and your randomly selected experimental group. What almost always happens is that you accept the groups as is and your research becomes more of a probe. The reader is advised to determine the sponsor of research studies (who furnished the dollars) that have surprising results. Even though the statistical procedures are sound, the execution of the research may be faulty. Bias may creep in unknowingly or even be deliberately put in and interpreted to favor the researcher’s hypothesis. We cannot overestimate the up front effort required before the collection of data. However, in this chapter we assume that all the proper steps have been taken to secure the data. Recall the acronym GIGO: garbage in, garbage out. It is just as applicable to statistical analysis as to computer programming. In hypothesis testing, we may conjecture a value for a parameter of a population. For example, we may think that m is 50 for a normal population. We then randomly collect data from the population and compute an estimator for m, namely, the value x of the statistic X. The nearness of x to our hypothesis value 50 for m needs to be quantified in a probability statement with acceptable error or risk for the test. The risk is designated by the symbol a. The tests are designed to favor the null hypothesis unless the alternative hypothesis is clearly a significantly better choice.
7.1
Terminology in Statistical Tests of Hypotheses Karl Pearson developed the groundwork for modern hypothesis testing. The null hypothesis, designated H0, is usually reserved for the status quo of the
P369463-Ch007.qxd 9/2/05 11:17 AM Page 388
388
Chapter 7 Hypothesis Testing
situation or the standard operating procedure of no effect or difference, although any of the statistical hypotheses can be designated the null. The name null derives from the expectation of no significant difference between the two test groups. For example, if an experimenter wanted to test a new production process against the current production process by comparing the mean daily production of each, the null hypothesis would be formulated to reflect no significant difference between the two means of the production processes. It is much easier to disprove the hypothesis by counterexample than it is to prove the hypothesis. The alternative hypothesis, designated H1, is often referred to as the researcher’s hypothesis, indicating a difference or what the experimenter (researcher) really wants to justify statistically. In the new production process, the alternative hypothesis would be stated to reflect that the new process is superior to the current process. Of course, merely stating it does not statistically prove it. If the null hypothesis is stated as H0: q = q0 versus the alternative H1 = q π q0, the test is said to be two-tailed or nondirectional in that the experimenter is interested in q values greater than or less than q0, as designated in H1. If the null hypothesis is stated as H0: q = q0 versus the alternative H1 = q < q0, the test is said to be one-tailed (lower tail from the H1 specification) or directional in that the experimenter is interested only in q values less than q0, as designated in H1. If the null hypothesis is stated as H0: q = q0 versus the alternative H1 = q > q0, the test is said to be one-tailed (upper tail from the H1 direction) in that the experimenter is interested only in q values greater than q0, as designated in H1. Again, the type of test, one-tailed or two-tailed, is determined by the alternative hypothesis. Whether to use directional hypotheses is moot. The overall effect does not need to be as large to be statistically significant in a directional test. A simple hypothesis is one in which the underlying distribution is completely specified. Specifying the parameter k for an exponential distribution is an example of a simple hypothesis. For example, H0: k = 2 for the exponential distribution is simple. The hypothesis k ≥ 2 for an exponential distribution is not simple and is called composite, because specifying that k ≥ 2 does not completely specify the distribution. The test statistic is an appropriate estimator used to determine its nearness to the hypothesized parameter by using an appropriate statistical test, e.g., normal, t, chi-square, etc. In testing the null hypothesis H0: m = m0 with n samples from a normal distribution N(m, s 2), X is an appropriate estimator for m with the Z-statistic to determine the closeness given by
Z=
X -m s/ n
.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 389
7.1 Terminology in Statistical Tests of Hypotheses
389
If the z-value falls in the acceptance region, the null hypothesis is not rejected. If the z-value falls outside the acceptance region in the rejection region, the null hypothesis is rejected. The critical region of the test refers to the area under the standard normal density function in which the null hypothesis is rejected; the acceptance region is the area in which the null hypothesis is not rejected. The area of the critical region is determined by the significance level or the level of significance of the test and is designated by the symbol a and is the risk assumed when the null hypothesis is simple. There are two types of errors, Type I and Type II. A Type I error occurs when the null hypothesis is erroneously rejected. The probability of a Type I error is a. Usually the hypotheses are formulated with the Type I error being the more serious, since the Type I error is under the control of the experimenter. In manufacturing consumer products, a Type I error is considered to be the producer’s risk of rejecting a good batch of products to sell. A Type II error occurs when the null hypothesis is erroneously accepted. The probability of a Type II error is designated as b(q). A Type II error is a function of the assumed true value of the parameter q. In manufacturing consumer products, a Type II error is considered to be the consumer’s risk of accepting (buying) a bad batch of products. In Figure 7.1, the b error is the area under the right normal curve to the left of the bold vertical line, which indicates the upper critical x value.
H0
H1
Power
b a xcritical
Figure 7.1
Type I and Type II Errors and Power
P369463-Ch007.qxd 9/2/05 11:17 AM Page 390
390
Chapter 7 Hypothesis Testing
It is desirable to keep both a and b errors small, but decreasing a increases b, and decreasing b increases a when the other factors remain constant. The only way to decrease both a and b errors is to increase the sample size n, which increases costs. The power of a hypothesis test is denoted as 1 - b(q) and is the probability of rejecting the null hypothesis H0 when the specified alternative hypothesis H1 is true. The closer H1 is to H0, the lower the power and the higher the probability of a Type II error. In Figure 7.1, when the alternative hypothesis is true, the power is the area under the right hand curve to the right of the vertical bold line and the Type II error is b as shown. When the null hypothesis is true, the Type I error is a.
EXAMPLE 7.1
The fill of a cereal box machine is required to be 18 ounces, with the variance s 2 already established at 0.25. The past 100 boxes revealed an average of 17.94 ounces. A testing error of a = 1% is considered acceptable. If the box overfills, profits are lost; if the box underfills, the consumer is cheated. a) Can we conclude that the machine is set at 18 ounces? b) Find a 99% confidence interval for m. c) Find b if the true mean is 18.1, and compute the power of the test. d) Would the result change if the sample variance were 0.25 from the data rather than knowing s 2 = 0.25? Solution a)
Null Hypothesis H0: m = 18 versus Alternative Hypothesis H1: m π 18 (two-sided test or nondirectional).
The value 17.94 for RV X is an estimate for parameter m, assumed to be 18. Statistical test Z =
X -m s/ n
=
17.94 - 18
= -1.2 = z0.1151 > z0.005 = -2.576.
0.5/ 100
Since the computed z-value -1.2 is greater than (to the right of ) the critical table value -2.576, that is, in the acceptance region, the null hypothesis H0 cannot be rejected (see Figure 7.3). b) Observe that H0 is a simple hypothesis in that the distribution is completely specified as N(18, 0.25). An equivalent test of H0 is to determine if the parameter m = 18 is in a 99% confidence interval about x = 17.94. If so, H0 cannot be rejected; if not, H0 can be rejected. The 99% confidence interval for the null hypothesis m = 18 is given by m Œ x ± za /2 * s / n with (100 - a )% confidence. 18 Œ17.94 ± 2.576 * 0.5/ 100 with (100 - 1)% = 99% confidence = 17.94 ± 0.12875 or in the range (17.81,18.07). (mu-Z-ci 17.94 0.5 100 0.01) Æ (17.81 18.07) with 99% confidence.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 391
7.1 Terminology in Statistical Tests of Hypotheses
391
99%
Figure 7.2
99% Confidence Interval for m Œ [17.81, 18.07]
The value 18 is in the interval (17.81, 18.07), indicating that the null hypothesis cannot be rejected (Figure 7.2). c) Given that the true mean m is 18.1, the null hypothesis m0 = 18 is accepted if X lies between m 0 - za /2 * s / n < X < m 0 + za /2 * s / n or 18 - 2.576 * 0.5/ 100 < x < 18 + 2.576 * 0.5/ 100 or 17.87 < x < 18.13. b (18.1) = P (Type II error m = 18.1) = P ( accept m 0 = 18 m = 18.1) = P (17.87 < x < 18.13 m = 18.1) Ê 18.13 - 18.1ˆ Ê 17.87 - 18.1ˆ =F -F Ë 0.5/10 ¯ Ë 0.5/10 ¯ = F(0.6) - F( -4.6) ª 0.73. With use of the command template (del-normal m s 2/n x1 x2), the specific command (del-normal 18.1 1/400 17.87 18.13) returns 0.7257.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 392
392
Chapter 7 Hypothesis Testing
Notice that the relatively high b error results from the closeness of the alternative hypothesis m1 = 18.1 to the null hypothesis m0 = 18. The greater the distance between H0 and H1, the less the b error. The probability of rejecting the null hypothesis H0 when H0 is false, that is, when m = 18.1, is given by the power of the test. Ê 18 + 2.565 * 0.5/10 - 18.1ˆ P ( X ≥ m 0 + za / 2s / n m = 18.1) = 1 - F ª 0.27 Ë ¯ 0.5/10 = 1 - b (18.1) = 1 - 0.73 = 0.27. The power of 0.27 indicates that the statistical test will reject 27% of the time the hypothesized mean m0 of 18 whenever the real mean m1 = 18.1. d) No, the result would not change. The large sample size allows the central limit theorem to use s2 as an estimate for s 2.
The command (beta-b m0 m1 s n a) returns the two-tailed beta error for a in decimal or percent. For example, (beta-b 18.0 18.1 0.5 100 1) returns 0.718. The command (sim-hypothesis mu sigma a n) returns the simulated two-tailed p-value for a random sample of size n from N(m, s 2) and accepts or rejects depending on the given a. (sim-hypothesis 18 0.5 0.01 100) may return (p-value = 0.4234 ACCEPT) Repeat the command a few times; then change the values of sigma, alpha, and sample size n. Accept should occur 99 times out of 100 with a set at 0.01.
EXAMPLE 7.2
The diameter of a drill is supposedly 3/8 inch. Random samples of 49 holes are drilled with mean diameter 12/33 inch with the standard error s equal to 0.03 inch. With the level of significance a set at 1%, can the null hypothesis m = 3/8 inch be rejected? Solution Null Hypothesis: H0: m = 3/8 versus Alternative Hypothesis: H1: m π 3/8. Statistical Test: Z =
X -m
where s suffices for s since n = 49 ≥ 30.
s/ n z=
(12/33 - 3/8) 0.03/ 49
= -2.65 = z0.004 < -2.575 = za /2 = z0.005 ,
P369463-Ch007.qxd 9/2/05 11:17 AM Page 393
7.1 Terminology in Statistical Tests of Hypotheses
393
Unt Normal Density
Rejection Region
Rejection Region Acceptance Region
–4
Figure 7.3
–3
–2
–1
0
1
2
3
4
Unit Normal Curve Critical Regions
falling in the lower (left) tail of the rejection region, and hence H0 is rejected. The a-error comprises the rejection region |Z| ≥ za/2. Figure 7.3 depicts the two-tailed rejection region. Because H0 was rejected, m0 = 3/8 (or 0.375) should not be in a 99% confidence interval about x = 12/33. The 99% confidence interval for m0 = 0.375 is 12/33 ± 2.575 * 0.03/7 or the interval (0.3526, 0.3747). The command (Z-test x m s n) returns the z-value and one-tail p-value. For example, (Z-test 12/33 3/8 0.03 49) returns z = -2.6515151, pvalue = 0.004 (lower tail). The p-value is the smallest a-value for rejecting the null hypothesis.
EXAMPLE 7.3
a) Given H0: m = m0 and H1: m > m0 when sampling from N(m, s 2) with sample size n, find xc, the value of the critical region. b) Formulate the probability of a Type II error for m = m1 given a and that m1 > m0 (Figure 7.4). c) Find the sample size when s = 4, m0 = 50, m1 = 52, a = 5%, b = 10%. d) Verify b(52) = 10%, using the sample size in part c, above.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 394
394
Chapter 7 Hypothesis Testing
a
b
m0
Figure 7.4
m1
xc
Errors a and b with Critical xc
Solution a) The critical value xc = m0 + za s/ n , assuming that H0 is true, and from assuming that H1 is true, xc = m1 - zbs/ n . Equating these two values for xc gives xc = m 0 +
za s
= m1 -
zb s
n
.
n
Solving for sample size n at the specified a and b, 2
È s ( za + zb ) ˘ n=Í . Î m1 - m 0 ˙˚
(7–1)
b) The Type I error a is the area under the normal curve centered at m0 to the right of xc. The Type II error b is the area under the curve centered at m1 to the left of xc. The probability of a Type II error is given by Ê ( m 0 + za s / n ) - m1 ˆ Ê X c - m1 ˆ b ( m1 ) = F = FÁ ˜. Ë s/ n ¯ Ë ¯ s/ n If the test statistic X lies to the left of xc, then H0 will not be rejected. 2
2
È s ( za + zb ) ˘ È 4(1.645 + 1.282) ˘ c) n = Í =Í ˙˚ = 34.27 ª 35. Î Î m1 - m 0 ˙˚ (52 - 50)
P369463-Ch007.qxd 9/2/05 11:17 AM Page 395
7.1 Terminology in Statistical Tests of Hypotheses
395
( m + za s / n ) - m1 ˆ Ê (50 + 1.645 * 4/ 35 - 52) ˆ d) b (52) = FÊÁ 0 ˜ = FÁ ˜ Ë ¯ Ë ¯ s/ n 4/ 35 = F( -1.31) ª 10%. EXAMPLE 7.4
The mean breaking strength of a new cable is hypothesized to be 260, whereas the breaking strength of the old cable is normally distributed with mean breaking strength 250 and standard deviation 25. A test is devised so that an a error of 5% and a b error of 10% are acceptable. Determine the sample size for the test. Solution H 0: m = 250 vs. H1: m = 260. za = z0.05 = -1.645; zb = z0.10 = -1.28. 2
2
È s ( za + zb ) ˘ Ê 25( -1.645 - 1.28) ˆ n=Í = = 53.47 or 54 cables. Ë ¯ Î m1 - m 0 ˙˚ 260 - 250 EXAMPLE 7.5
Create an upper beta operating curve (Figure 7.5) for Example 7.4 with H0 = 250, a = 5%, n = 54, and s = 25. Solution We first determine the acceptance region under the null H0: m = 250. This region is given by m0 +
za s n
= 250 +
1.645 * 25
= 250 + 5.56 = 255.6.
54
1.2 1 0.8 0.6 0.4 0.2 0 248 249 250 251 252 253 254 255 256 257 258 259 260
Figure 7.5
Beta Operating Curve
P369463-Ch007.qxd 9/2/05 11:17 AM Page 396
396
Chapter 7 Hypothesis Testing
Table 7.1 m b
248 0.99
250 0.950
252 0.855
254 0.681
256 0.453
258 0.240
260 0.098
That is, we cannot reject H0 if our test statistic X from our sample of 54 cables is less than or equal to 255.6. We then compute P( X £ 255.6 | m = mb) for various values around our hypothesized mean of 250. For example, Ê 255.6 - 254 ˆ P ( X £ 255.6 m b = 254) = F = 0.681. Ë 25/ 54 ¯ The data for the operating characteristic curve is shown in Table 7.1, generated by the command (beta-table u0 beta-list sd n alpha tail) or specifically by (beta-table 250 '(248 250 252 254 256 258 260) 25 54 5 'U) Observe that the curve dips to 95% where b(m0 = 250) = 95%, xc = 255.6. When the assumed true “alternative” mean is the same as the hypothesized null value of the mean, b = (1 - a). Decreasing a or n, setting m0 closer to b, or increasing s increases the probability of a beta error; increasing a or n, setting m0 farther from b, or decreasing s decreases the probability of a beta error. EXAMPLE 7.6
Consider RV X with density given by f(x; q) = (q + 1) xq for 0 £ x £ 1; q > -1. The null hypothesis q = 1 is to be rejected if and only if X exceeds 0.8. The alternative hypothesis is q = 2. Find a and b for q = 2 and q = 3. Solution a = P ( X > 0.8 q = 1) =
EXAMPLE 7.7
1
Ú
0.8
2xdx = 0.36;
b q = 2 = P ( X £ 0.8 q = 2) =
Ú
b q = 3 = P ( X £ 0.8 q = 3) =
Ú
0.8
0 0.8
0
3 x 2 dx = 0.512; 4 x 2 dx = 0.4096.
A coin is to be flipped 20 times to determine if it is fair. If the number of heads is 15 or more or 5 or less, the fairness of the coin is rejected. a) Determine the significance level a of the test. b) Compute the b error and power of the test if the probability of a head is 0.6. Solution
H 0 : p = 0.5 vs. H1: p = 0.6.
a) a = P ( reject H 0 p = 0.5) = 1 - P (6 £ X £ 14 n = 20, p = 0.5) = 1 - 0.9586 = 0.0414.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 397
7.2 Hypothesis Tests: Means
397
b) b = P ( accept p = 0.5 p = 0.6) = P (6 £ X £ 14 n = 20, p = 0.6) = 0.8728. Power = 1 - b(0.6) = 1 - 0.8728 = 0.1272. P(a £ X £ b | n, p) = (cbinomial-a-b n p a b) P(6 £ X £ 14 | n = 20, p = 0.5) = (cbinomial-a-b 20 0.5 6 14) Æ 0.9586 P(6 £ X £ 14 | n = 20, p = 0.6) = (cbinomial-a-b 20 0.6 6 14) Æ 0.8728
7.2
Hypothesis Tests: Means We develop tests for the parameters m and s 2 of the normal distribution, the mean of any distribution with large sample size, and the probability of success p for the binomial distribution. In hypothesis testing, we seek to quantify the closeness of the parameter estimates from the data to the hypothesized parameters. The closeness is determined by how much error we are willing to accept.
EXAMPLE 7.8
(Normal with s known) a) Given the following 20 random samples from a normal distribution with unknown m but with s 2 = 4, test whether H0: m = 10 is acceptable with a set at 5%. b) Determine b(9) and b(10). Solution Data:13 8 10 10 8 9 10 11 6 8 12 11 11 12 10 12 7 10 11 8 ( x = 9.85) a) Ho: m = 10 versus H1: m π 10 (two-tailed). Ê x -mˆ Ê 9.85 - 10 ˆ F =F = F( -0.3354) = 0.369 > 0.0250 = F( -1.96) Ë s/ n ¯ Ë 2/ 20 ¯ fi Cannot Reject at 5%. b) The critical boundaries for accepting H0 are m0 -
za /2s
< X < m0 +
za /2s
n
n
or 10 -
1.96 * 2
< X < 10 +
1.96 * 2
4.47 or 9.12 < X < 10.88.
4.47
P369463-Ch007.qxd 9/2/05 11:17 AM Page 398
398
Chapter 7 Hypothesis Testing
(beta-b m 0 m1 s n a ) b (9) = P (9.12 < X < 10.88 m = 9) = (beta-b 10 9 2 20 0.05) Æ 0.39. Ê 10.88 - 9 ˆ Ê 9.12 - 9 ˆ =F -F ; (del-normal 9 4/20 9.12 10.88) Ë 2/ 20 ¯ Ë 2/ 20 ¯ = 0.39. Even though there is only a 5% chance of rejecting H0 when H0 is true, there is a nearly 40% chance of accepting H0 = 10 when the true mean is 9. Observe that b = 1 - a when H0 = H1. (del-normal m s 2 ¥ 1 ¥ 2) b (10) = P (9.12 < X < 10.88 m = 10); (del-normal 10 4/20 9.12 10.88) = F[4.47(10.88 - 10)/2] - F[4.47(9.12 - 10)/2] = F(1.96) - F( -1.96) (beta-b m 0 m1 s n alpha) = 0.975 - 0.025 = 0.95.
(beta-b 10 10 2 20 0.05)
Even though there is only a 5% chance of rejecting H0 when H0 = 10 is true, there is a 95% chance of accepting H0 when indeed H0 is true. [Note: the 20 samples were simulated from N(10, 4)].
EXAMPLE 7.9
Random sampling from N(m, s 2) produced the following data: 13 8 10 10 8 9 10 11 6 8 12 11 11 12 10 12 7 10 11 8 ( x = 9.85). Test H0: m = 10 versus H1: m π 10 (two-tailed with data). Solution Since we are sampling from a normal distribution, we use x = 9.85 as an estimate for m and s2 = 3.50 as an estimate for s 2. Since sample size n = 20 is considered small with s 2 unknown, the t-test is appropriate. H 0 : m = 10 versus H1: m π 10 ( two-tailed ).
t=
n(x - m) s
= t19,.975
=
20 (9.85 - 10)
1.87 fi NO REJECT.
= -0.359 > -2.093
P369463-Ch007.qxd 9/2/05 11:17 AM Page 399
7.2 Hypothesis Tests: Means
399
The command (t-test m0 s n x-bar) returns the t- and p-values for testing H0: m = m0 vs. H1:m π m0. For example, (t-test 10 2 20 9.85) returns (t = -0.335, p-value = 0.741). The command (one-sample-t data m0) returns the t- and p-values from the sample data; for example, (setf data '(13 8 10 10 8 9 10 11 6 8 12 11 11 12 10 12 7 10 11 8)) Then (one-sample-t data 10 5) prints
n 20
df 19
s 1.872
se mean 0.418
x-bar 9.85
t -0.358
p-value 0.724
and returns (t = -0.3584 p-value = 0.724).
EXAMPLE 7.10
Let X1, X2, . . . , X16 be a random sample from a normal distribution with mean m and variance s 2 = 16. In testing the null H0: m = 3 vs. H1: m = 4, the critical region is X > xc. If the significance level of the test a is set at 0.04, find the respective value of xc and the probability of a Type II error. Solution
xc = m + zas / n = 3 + 1.75 * 4/4 = 4.75; (inv-phi 0.96) returns 1.75.
Ê 4.75 - 4 ˆ b (4) = P ( X < 4.75 m = 4) = F = F(0.75) = 0.773 = (phi 3/4). Ë 4/4 ¯ EXAMPLE 7.11
Let X1, . . . , X6 be a random sample from a distribution with density function f(x) = qxq-1; 0 £ x £ 1 with q > 0. The null hypothesis H0: q = 1 is to be rejected in favor of the alternative H1: q = 2 if and only if at least 4 of the sample observations are larger than 0.8. Find the probability of a) Type I error (a), and b) Type II error (b). Solution a) The probability of a Type I error is the probability of erroneously rejecting a true null hypothesis. When H0 is true, q = 1 and f(x) = 1. Then a = P ( X ≥ x; n, p q = 1) = (cbinomial n p x ) where p = P ( X ≥ 0.8 q = 1). The probability of any one sample exceeding 0.8 is
P369463-Ch007.qxd 9/2/05 11:17 AM Page 400
Chapter 7 Hypothesis Testing
400
p=
1
Ú
0.8
1 dx = 1 - 0.8 = 0.2.
For 4 or more samples the cumulative binomial probability is 3
6 1 - P ( X £ 3; n = 6, p = 0.2) = 1 -  ÊË ˆ¯ 0.2 x 0.86 - x = 1 - 0.983 = 0.1696. x =0 x The Type I error a is ª 1.7% = (cbinomial-a-b 6 0.2 4 6). b) For b, assuming the true q = 2, f(x) = 2x, and the probability of any one sample being less than 0.8 is given by p = Ú0.8 0 2xdx = 0.64. The probability that 3 or less samples are less than 0.8 is given by b (2) = binomial ( X £ x, n, p) = binomial ( X £ 3, n = 6, p = P ( X < 0.8 q = 2) = binomial ( X £ 3, n = 6, p = 0.64) 3
=
6
 ÊË xˆ¯ 0.64
x
0.366 - x
x =0
= 0.3732 = (cbinomial 6 0.64 3) The Type II error of erroneously accepting H0 is b = 37.32%.
The command (cbinomial n p x) returns P(X £ x | n, p). (cbinomial 6 0.64 3) returns 0.3732. (cbinomial-a-b n p a b) returns the sum of the probabilities from a to b. (cbinomial-a-b 6 0.2 4 6) returns 0.01696.
P-value The p-value of a test is the critical boundary between acceptance and rejection of the null hypothesis. It is the probability of making a Type I error if the actual sample value is used for rejection; that is, it is the smallest level of significance for rejecting the null hypothesis. If the reported p-value is greater than any a specified for the test, then the null hypothesis cannot be rejected. If the p-value is less than a specified a for the test, the null hypothesis can be rejected. Often when tests are conducted, the results are reported that H0 was rejected at a level of significance a = 5%. What many readers would like to know is how close H0 was to being accepted. (One prefers to know the score of a game rather than just who won). Was it rejected by a substantial margin or could H0 have been accepted at a = 10%? Researchers usually report a p-value for the test without regards to specifying an a.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 401
7.2 Hypothesis Tests: Means
Table 7.2 z 2.0 2.1 2.2 2.3 2.4
401
Partial Normal Table of Probabilities
0.00
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.9772 0.9821 0.9861 0.9893 0.9918
0.9778 0.9826 0.9864 0.9896 0.9920
0.9783 0.9830 0.9867 0.9898 0.9922
0.9788 0.9834 0.9871 0.9901 0.9925
0.9793 0.9838 0.9875 0.9904 0.9927
0.9798 0.9842 0.9878 0.9906 0.9929
0.9803 0.9846 0.9881 0.9909 0.9931
0.9808 0.9850 0.9884 0.9911 0.9932
0.9812 0.9854 0.9887 0.9913 0.9934
0.9817 0.9857 0.9890 0.9916 0.9936
EXAMPLE 7.12
a) Test H0: m = 10 versus H1: m > 10 for data that reveal x = 10.7, s2 = 4, n = 49, with a set at 5%. b) Find the p-value for the test. c) Show that if a > p-value, the null hypothesis H0 is rejected. Solution a) The hypothesis assumes that the underlying distribution is normally distributed as N(10, s 2). The Z-test applies since n = 49 is relatively large. z=
x-m
=
s/ n
10.7 - 10
= 2.45 > 1.645 = z0.95 fi Reject.
2/ 49
b) The p-value (Table 7.2) is 1 time (2 times for two-tailed test) the area under the standard normal curve from the computed z = 2.45 to z = •, that is, 1 - F(2.45) = 1 - 0.9929 = 0.0071. With a two-tailed alternative hypothesis, the p-value would be 2 * 0.0071 = 0.0142. c) Suppose a is set at 0.008 (0.8%). Then z=
10.7 - 10
= 2.45 > z0.992 = 2.41 fi Reject.
2/ 49 That is, the p-value for z = 2.45 is 0.7%, which is less than the a-value set at 0.8%, which implies rejection.
EXAMPLE 7.13
Let X1, . . . , X16 be the gas mileage data from an assumed normal distribution. The lot manager claims the miles per gallon (mpg) are 27. The collected data show x = 25.9 mpg and s2 = 4. Compute the p-value of the test. H 0: m = 27 versus H1: m £ 27 ( lower one-tailed test ). Test Statistic: T =
X -m S/ n
=
25.9 - 27
= -2.2.
2/ 16
(L-tee df x ) p-value = t15 ( -2.2) = 0.0219 ª 2.2%. (L-tee 15 – 2.2) Æ 0.021948. If a were set at 5%, the claim would be rejected; if a were set at 1%, the claim would not be rejected.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 402
Chapter 7 Hypothesis Testing
402
Degrees of Freedom v 14 15 16
Figure 7.6
p-value = 0.022 a = 0.100 1.345 1.341 1.337
a = 0.050 1.761 1.753 1.746
a = 0.025 2.145 2.131 2.120
a = 0.010 2.624 2.602 2.583
a = 0.005 2.977 2.947 2.921
Partial T-table
Figure 7.6 depicts a portion of a t-table. Notice that the t-value of 2.2 for 15 degrees of freedom lies between values 2.131 and 2.602, corresponding to a-values of 0.025 and 0.010, with the calculated p-value of 0.022 somewhere between the two but closer to 0.025. The actual p-value is computed with the command (inv-t 15 2.2), returning 2.1993. Notice that as the t-value becomes larger, the critical a (p-value) becomes smaller.
Directional Tests Caution is necessary in deciding whether to perform directional or one-tailed tests. The experimenter should have some justification before performing one-tailed tests, because it is easier to reject the null hypothesis. EXAMPLE 7.14
Suppose daily productivity at a factory has averaged 100 with s = 25 and temperature controlled at 70 degrees Fahrenheit. The experimenter thinks productivity will increase if the temperature is reduced to 60 degrees Fahrenheit. The experiment is performed, with a set at 5% for the next 36 days, with the average productivity x = 108. The experimenter tested H0: m = 100 vs. H1: m > 100. Is the one-tailed hypothesis justified? Solution
x = 108; n = 36 days; z =
108 - 100
= 1.92 > 1.645 = z0.95
25/ 36 fi REJECT. Note that if x was exceedingly below 100, H0 could not be rejected. But a two-tailed hypothesis test shows H 0: m = 100 vs. H1: m π100 Z = 1.92 < 1.96 = z0.975 fi NO REJECT.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 403
7.3 Hypothesis Tests: Proportions
403
The one-tailed tests make it easier to reject the null hypothesis. Temperature could have had a significant detrimental effect on productivity, an effect which goes untested in this experiment.
7.3
Hypothesis Tests: Proportions Given a binomial RV X with parameters n and p and with n sufficiently large, ˆ = X/n, V( p ˆ ) = V(X/n) = npq/n2 = pq/n. E(X ) = np and V(X ) = npq. With p Then RV ˆ-p p pq/ n is approximately standard normal. Therefore, to test H0: p = p0 versus H1: p π p0, H0 is rejected if ˆ-p p
> za /2 .
pq/ n EXAMPLE 7.15
(Proportion) It is hypothesized that at least 75% of a factory’s employees favor a new health bill. When polled, 360 out of 500 workers voted in favor of the new health bill. Can the null hypothesis of p = 0.75 be rejected with a set at 5%? Solution
H 0 : p ≥ 0.75 versus H1: p < 0.75 ( lower one-tailed test ).
For large samples, the standardized RV Pˆ is approximately normal. ˆ = 360/500 = 0.72. p Thus z=
ˆ-p p pq/ n
=
0.72 - 0.75 0.75 * 0.25/500
= -1.549 > -1.645 = z0.05.
Cannot reject H0. Notice that F(-1.549) = 0.0600 fi p-value = zcritical = 6% > a = 5%, confirming the decision not to reject. EXAMPLE 7.16
If a machine produces more than 10% defectives, Repair is in order. In a random sample of 50 items, 7 defectives were found. a) Does this sample evidence support Repair at a = 1%? b) Find the p-value. c) Determine the critical number of defects to reject H0. Assume that a large lot of items are produced daily.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 404
404
Chapter 7 Hypothesis Testing
Solution a) H 0 : p = 0.10 vs. H1: p > 0.10; ˆ = 7/50 = 0.14. p z=
ˆ-p p
=
0.14 - 0.10
0.10 * 0.90/50 pq/ n fi Cannot reject.
= z0.8272 = 0.9431 < 2.33 = z0.99
b) The p-value is 1 - F(0.9431) = 1 - 0.8272 = 0.1728 > 0.01 = a fi Cannot reject. (U-phi 0.9428) Æ 0.1729. ˆ - 0.10 p ˆ = 0.1988; (phi 2.33) Æ 0.99. = z0.99 = 2.33 fi p c) 0.10 * 0.90/50 ˆ = 50 * 0.1988 = 9.94 ª 10 The critical number of defects is given by n p defects. That is, we would reject H0 that p = 0.10 at a = 1% if we observed 10 or more defects. Note: (cbinomial 50 0.10 10) Æ 0.9906 > 0.99 fi Reject. Consider two binomial random variables X1 and X2 with parameters n1, p1 and n2, p2 with n1 and n2 both large. Sampling from each of the populations x ˆ1 = 1 and can produce estimates for the unknown p parameters with p n1 x2 ˆ2 = p . Under the hypothesis that there is no difference between the n2 proportions, that is, p1 - p2 = 0, the best estimate for the true proportion would be the pooled proportion given by ˆ pooled = p
x1 + x2 n1 + n2
1.
The approximately standard normal RV is given by ( Pˆ1 - Pˆ2 ) - ( p1 - p2 ) 1 1ˆ ˆ Pooled qˆPooled Ê p + Ën n2 ¯ 1 and can be used to test the null hypothesis H0: p1 - p2 = 0 versus H1: p1 - p2 π 0. n1 p1q1 n2 p2 q2 Ê X1 X 2 ˆ Ê X1 X 2 ˆ = p1 - p2 and V = + , the RV 2 Ën ¯ Ë ¯ n2 n1 n2 n1 n2 1 given by
Since E
P369463-Ch007.qxd 9/2/05 11:17 AM Page 405
7.3 Hypothesis Tests: Proportions
405
( Pˆ1 - Pˆ2 ) - ( p1 - p2 ) p1q1
+
p2 q2
n1
n2
is also approximately standard normal when n1 and n2 are sufficiently large. Under H0 with p1 = p2 ,
ˆ1 - p ˆ2 ) (p p1q1
+
becomes
.
1 1ˆ ˆ Pooled qˆPooled Ê p + Ën n2 ¯ 1
p2 q2
n1
ˆ1 - p ˆ2 ) (p
n2
Either form may be used to test the null hypothesis. However, the pooled proportion should be used only when it is assumed that p1 - p2 = 0.
EXAMPLE 7.17
In a preference test a new deodorant was preferred by 320 of 400 people asked in the North and 300 of 425 people asked in the South. Is there a difference between the two groups at a 5% level of significance? Solution
H0: p1 - p2 = 0 versus H1: p1 - p2 π 0. ˆ1 = p
Method I: z =
320 400
ˆ2 = = 0.8; p
300
620 = 0.71; Pˆpooled = = 0.76. 425 825
(0.8 - 0.71) - 0
= 3.025 > 1.96 = z0.975 fi REJECT.
1 ˆ Ê 1 0.76 * 0.24 * + Ë 400 425 ¯ The p-value for the test is 2[1 - F(3.025)] = 0.00249 ª 0.25% < 5%. Method II: z =
(0.8 - 0.71) - 0 0.8 * 0.2 400
+
0.71 * 0.29
= 3.026 > 1.96 = z0.975 fi REJECT.
425
Fisher-Irwin Test Suppose we want to test the difference between two binomial random variables X and Y, where the normality assumption may not be appropriate because of the small sample size. Let p be the probability of successes for both RVs with A Bernoulli trials for X and B Bernoulli trials for Y. Suppose we observe that there were x successes in X and y successes in Y where x + y = n. Under the hypothesis H0: p1 = p2 vs. H1: p1 π p2, we should reject the null if there is a significant disparity between the two proportions. Then
P369463-Ch007.qxd 9/2/05 11:17 AM Page 406
406
Chapter 7 Hypothesis Testing
P( X = x X + Y = n) =
=
=
P ( X = x, Y = n - x ) P( X + Y = n) A Ê ˆ p x q A - x Ê B ˆ p n-x q B-n+x Ë x¯ Ë n - x¯ Ê A + B ˆ p n q A+ B - n Ë n ¯ Ê Aˆ Ê B ˆ Ë x ¯ Ë n - x¯ Ê A + Bˆ Ë n ¯
for x = 0, 1, . . . n
is a hypergeometric probability and thus independent of the probability of success. The p-value for the test is given by p-value = 2 min{ P ( X £ x ), P ( X ≥ x )}.
EXAMPLE 7.18
Suppose that Shift A produced 3 defective chips out of 15 and Shift B produced 10 defective chips out of 20. Is there a difference between the two shifts with a set at 5%? Use the Fisher-Irwin test as well as the pooled p-test. Solution Fisher-Irwin 3
P -value = 2 * P ( X £ 3, A = 15, B = 20, n = 13) = 2 * Â x =0
Ê15ˆ Ê 20ˆ Ë 3 ¯ Ë 10¯
= 0.1404,
Ê 35ˆ Ë 13¯
fi no reject.
The command (chyperg A B n x) returns the cumulative probability of x or fewer successes. For example, (chyperg 15 20 13 3) returns 0.0702.
Pooled p-test (0.2 - 0.5) - 0 ˆ = 2F( -1.536) = 2 * 0.0622 = 0.1245. P-value = 2FÊ Á 13 22 Ê 1 1 ˆ˜ * * + Á ˜ Ë 35 35 Ë 15 20 ¯ ¯
P369463-Ch007.qxd 9/2/05 11:17 AM Page 407
7.4 Hypothesis Tests for Difference between Two Means
407
7.4 Hypothesis Tests for Difference between Two Means: Small Samples (n £ 30) s 2 Known The confidence intervals established in the last chapter are the bases for the appropriate test statistics. Assume one sample X1, X2 . . . Xn is from a normal population with mean mX and variance s x2, and another independent sample Y1, Y2 . . . Ym is drawn from another normal population with mean mY and variance s Y2. Then 2 2 È Ê s X sY ˆ˘ RV X - Y ~ N Í( m X - m Y , + . Ë n Î m ¯ ˙˚
That is, X - Y is a normal RV with m = E( X - Y ) = m X - m Y and V ( X - Y ) =
s 2X n
+
s Y2 m
under the assumption of random sampling. Thus the appropriate test statistic is Z=
( X - Y ) - (m X - mY ) s 2X
+
s Y2
n
.
(7–2)
m
If the samples are large, the sample variances may be substituted for the appropriate unknown population variances.
EXAMPLE 7.19
(Difference between Two Means m1 - m2). Test the hypotheses Ho: mX - mY = 0 versus H1: mX - mY π 0 at a = 0.05 for the following data:
Solution
Z=
(X - Y ) - 0 s
2 X
+
s
2 Y
n = 100
m = 144
x = 25
y = 26.5
S X2 = 16
SY2 = 25
=
25 - 26.2 16
+
25
n m 100 144 = -2.08 < - z0.025 = -1.96 fi REJECT. The p-value is (* 2 (phi -2.08)) = 2(0.01876) = 0.0375 or 3.75%.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 408
Chapter 7 Hypothesis Testing
408
n < 30; s 2 Unknown Recall a fundamental theorem of statistics that states when sampling from a normal distribution, X and S2 are independent random variables and RV X -m has a t distribution with n - 1 degrees of freedom. S/ n Consider two random samples drawn from normal populations. Assume one sample X1, . . . , Xn is from a normal population with mean mX and variance s 2, and another independent sample Y1, . . . , Ym is drawn from another normal population with mean mY and the same variance s 2. Then RV X - Y is distributed È Ê 1 1 ˆ˘ N Í( m X - m Y , s 2 + . Ë n m ¯ ˙˚ Î That is, X - Y is a normal RV with m = E( X - Y ) = m X - m Y
and
V( X - Y ) =
s2 n
+
s2 m
under the assumption of random sampling. We can use the sample variance SX2 from the X1, . . . , Xn and the sample variance SY2 from the Y1, . . . , Ym to provide unbiased, independent estimates for s 2. These estimates are then 2 pooled with use of a weighted average. If n = m, then spooled = (SX2 + SY2)/2. 2 2 By assuming homogeneity of variance, that is, s X = sY, when s 2 is unknown, RV ( X - Y ) - (m X - mY ) 2 pooled
S
(7–3)
Ê1 1ˆ + Ë n m¯
has a t-distribution. If the parameter s were used instead of Spooled, the RV would be standard normal. With s 2 being estimated by the pooled (weighted) average of the sample variances, a t RV with n1 + n2 - 2 degrees of freedom results. Thus for small sample sizes from a normal population with unknown m and s, under the null hypothesis of no significant difference between the means, each sample variance is an independent estimate of s 2, and the pooled variance can be used for each in the t-test. These samples usually result from drawing a large sample from a population and randomly assigning the subjects to two X and Y groups receiving different treatments (placebo vs. drug). The two samples should show homogeneity of variance. EXAMPLE 7.20
Test the null hypothesis of no difference between means at a = 5% from the following data taken from two normal populations with the same variance.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 409
7.4 Hypothesis Tests for Difference between Two Means
Brand A Brand B
Solution
43 62
53 43
65 54
49 67
55 59
409
60 45
47 46
50 63
60 65
55 45
H 0: m A - m B = 0 versus H1: m A - m B π 0.
From the data, x A = 53.7; s 2A = 45.1; x B = 54.9; s 2B = 88.8; n A = n B = 10. 2 Since n A = n B , sPooled =
( X A - Y B ) - (m A - m B ) 2 Pooled
S
=
9 * 45.1 + 9 * 88.8
=
45.1 + 88.8
18 (53.7 - 54.9) 66.95 * 2/10
Ê 1 + 1 ˆ Ën nB ¯ A
= 66.95.
2 = -0.328 > t0.025 ,18 = -2.101
fi Cannot Reject. The p-value is 2 * t(df = 18, -0.328) = 2 * 0.373 ª 0.75 or 75%, which strongly indicates that both samples came from the same population. (t-pool x y) prints test statistics and the 95% confidence interval for X -Y. (t-pool '(43 53 65 49 55 60 47 50 60 55) '(62 43 54 67 59 45 46 63 65 45)) Æ x1-bar = 53.70 svar-1 = 45.12 sŸ2-pooled = 6.94 t-statistic = -0.3280 95% confidence interval is
x2-bar = 54.90 svar-2 = 88.77 s-pooled = 8.18 two-tailed p-value = 0.7467325 (-8.889 6.489)
However, there are occasions when the pooled variance procedure is not appropriate, that is, s 12 π s 22. When the sample sizes are less than 30 and the assumption of homogeneity of variances is not met, the Smith-Satterthwaite (S-S) test can be applied. The reader should be aware that erroneous conclusions could result from using the pooled variance with the t-test when the assumption is not warranted. The S-S test approximates the t-test with the degrees of freedom estimated from the data and rounded down when not an integer. The Smith-Satterthwaite t-test is given by t S- S =
X 1 - X 2 - ( m1 - m 2 ) S12 n1
+
S22
,
n2
with conservative degrees of freedom computed as
P369463-Ch007.qxd 9/2/05 11:17 AM Page 410
410
Chapter 7 Hypothesis Testing
v=
( S 12 / n1 + S 22 / n2 )2 ( S 12 / n1 )2 n1 - 1
EXAMPLE 7.21
+
( S 22 / n2 )2
.
n2 - 1
Consider the t-test between random samples of size 12 from RV X1 distributed N(50, 4) and size 22 from RV X2 distributed N(52, 169). Test H 0: m1 - m 2 = 0 vs. H1: m1 - m 2 π 0. Solution The command (t-pool (sim-normal m1 s n1) (sim-normal m2 s n2)) is used for the results. Assuming falsely that both samples came from normal distributions with the same variance, the command (t-pool (sim-normal 50 2 12) (sim-normal 52 13 22)) returned (notice the standard deviation of X2 is assumed to be 13 and not 2) x1 = 50.88
x2 = 55.28
s 12 = 2.30
s 22 = 124.98
spooled = 9.10
t-stat = -1.35, two-sided p-value = 0.1878, 95% confidence interval = (-11.05, 2.26). The null hypothesis of no significant difference cannot be rejected at a = 18%. However, the command (t-pool (sim-normal 50 2 12) (sim-normal 52 2 22)) with the true assumption of equal variances of 4 returns x1 = 50.58 2 1
s = 3.00
x2 = 51.97 2 2
s = 4.03
s-pooled = 1.92
t-stat = -1.99 two-sided p-value = 0.03, rejecting H0. 95% confidence interval is (-2.77, -0.03) Be sure to check the assumption of equal variance before using the pooled t-test.
The Bartlett test for testing variances is discussed in Chapter 9. The command (bartlett (list (sim-normal 50 2 12) (sim-normal 52 13 22))) returned (B = 29.2527974, p-value = 0.0000000) implying the variances 22 and 132 are not near enough in value to assume homogeneity of variance. The F-distribution test in Section 7.6 can also be used to test for equality of variances.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 411
7.5 Hypothesis Test with Paired Samples
EXAMPLE 7.22
411
Test the null hypothesis of no difference between means with a at 5% from the following data modified from Example 7.20 in that the A-entry 47 is replaced by 147. Brand A Brand B
43 62
53 43
65 54
49 67
55 59
60 45
50 63
147 46
60 65
55 45
Solution H0: m1 = m2 versus H1: m1 π m2 The sample means are 63.7 for Brand A and 54.9 for Brand B. The sample variance s12 = 896.23 and s22 = 88.77. With this wide disparity in the sample variances, the pooled t-test would not be appropriate (ignoring outlier implications). tS - S =
X 1 - X 2 - ( m1 - m 2 ) S
2 1
+
n1 v=
2 1
2 2
( s / n1 + s / n2 ) 2 1
( s / n1 ) n1 - 1
2
+
S
2 2
( s / n2 )
2
896.23 10
n2
2
2 2
63.7 - 54.9
=
+
= 0.887.
88.77 10
(896.23/10 + 88.77/10)2
=
(896.23/10)
n2 - 1
(10 - 1)
2
+
(88.77/10)
2
=
9702.25
ª 10.
901.23
(10 - 1)
The p-value is 0.396, which implies that the null hypothesis cannot be rejected. The values for the pooled procedures do not vary much in this problem, except for the degrees of freedom going from 18 to 10. If the assumption of normality is not warranted, a distribution-free procedure (discussed in Chapter 10) can be used.
The command (Two-sample-t x-data y-data) returns the t-value, the degrees of freedom, and the p-value for testing the difference of two means, assuming unequal variances. The F-test and Bartlett test are performed to determine homogeneity of variance and the appropriate t-test. The command (s-pool x-data y-data) returns the pooled standard error.
7.5
Hypothesis Test with Paired Samples When determining the difference between small independent samples, the subjects are often paired, and the difference between the means of the populations is tested with a t-test because the differences are independent. For
P369463-Ch007.qxd 9/2/05 11:17 AM Page 412
Chapter 7 Hypothesis Testing
412
example, the two samples could be student scores from a pretest followed by a post-test after a treatment was administered. Then too, independent samples could be paired by grouping pairs according to age, weight, height, etc. However, when paired, the two samples are no longer independent. The procedure to establish a test for the null hypothesis of no significant difference between the two samples is to regard the difference in the pairs as a random sample and proceed accordingly. For example, from n random paired observations (2n measurements) we compute the n differences between the pairs (X - Y ) and use a t-test with n - 1 degrees of freedom. EXAMPLE 7.23
Determine if there is a significant difference between the pretest and posttest scores at a = 1%. Solution Students Pretest Scores Post-Test Scores D = Difference
1 60 75 -15
2 45 65 -20
3 80 90 -10
4 87 80 7
5 79 89 -10
6 75 95 -20
7 60 85 -25
8 30 69 -39
9 45 40 5
H 0: m D = 0 vs. H1: m D π 0. (mu-svar '( -15 - 20 -10 7 -10 - 20 - 25 - 39 5)) Æ ( -14.11 206.61) d = -14.1; s 2D = 206.61; s D = 14.37 with n = 9, t=
d -0 sD
=
-14.1
= -2.945 > -3.355 = t8,0.005 with p-value 0.019
14.37/ 9
fi Cannot Reject.
Paired vs. Unpaired Sometimes a paired design can be more powerful than an unpaired design. Designate the two populations as distributions for two random variables X and Y with RV D = X - Y. Then if X and Y are independent, E( D) = m X - m Y and V ( D) = s 2X + s Y2 . An estimator for E(D) is D = X -Y with E( D ) = E( X - Y ) = m X - m Y
P369463-Ch007.qxd 9/2/05 11:17 AM Page 413
7.5 Hypothesis Test with Paired Samples
413
and V ( D ) = (s 2X + s Y2 )/ n. If the samples are paired, then they are no longer independent and V ( D ) = V ( X ) + V (Y ) - 2C( X , Y ) = (s 2X + s Y2 )/ n - 2C( X , Y ) with C( X , Y ) = rs X s Y . Then V(D) =
s 2X + s Y2 - 2rs X s Y
(paired and dependent).
n V(D) =
s 2X + s Y2
(unpaired and independent).
n Under the assumption of equal variances, V(D) =
2s 2 (1 - r )
.
(paired and dependent)
n V(D) =
2s 2
(unpaired and independent)
.
n Since both estimators are unbiased, a comparison of their minimum squared error reveals 2s 2 (1 - r ) V( D)D V( D)I
=
n 2s 2
= 1 - r,
n indicating that the paired samples could have a smaller variance if the correlation coefficient is positive, even though the independent size is 2n and the dependent size is n. The paired t-test can even be used when the prevariances and postvariances are not homogeneous.
The command (paired-t pre post a) returns the t- and p-values and a 100 (1 - a)% confidence interval for the paired t-test. (paired-t '(60 45 80 87 79 75 60 30 45) '(75 65 90 80 89 95 85 69 40) 5) returns n 9
D-bar -14.111
Std Error 4.791
t-value -2.945
p-value 0.018
Confidence Interval (-25.161, -3.061)
P369463-Ch007.qxd 9/2/05 11:17 AM Page 414
Chapter 7 Hypothesis Testing
414
Statistically Significant vs. Practically Significant At times test results can be statistically significant without any practical value. For example, if a drug could reduce the risk of a disease from 1 in 10 million to 1 in a million, even though the drug’s effectiveness is statistically significant (factor of 10), it has little practical significance. Suppose the effects of disposable contacts were tested where subjects wore a defective lens in one eye and a nondefective lens in the other. Further suppose that the results showed that the defective lens averaged 10 microcysts per eye, while the good lens averaged only 3 microcysts per eye, resulting in a p-value of 0.02. But if fewer than 50 microcysts per eye require no clinical action, then the test results were statistically significant but not practically significant. Suppose in a 2-month weight-loss experiment, the following data resulted.
Before 120 131 190 185 201 121 115 145 220 190 pounds After 119 130 188 183 188 119 114 144 243 188 Delta 1 1 2 2 3 2 1 1 -23 2
When tested at 5%, H0 of no difference in weight loss could not be rejected. However, 9 out of 10 lost weight on the diet. These test results have practical significance in that 9 out of 10 lost weight, even though the results were not statistically significant.
7.6
Hypothesis Tests: Variances Oftentimes we are interested in testing the spread of a distribution and we hypothesize bounds for the spread. Tests for these bounds center around the ( n - 1) S 2 chi-square (c 2) RV with confidence interval: s2 ( n - 1) S 2 c 12-a /2,n -1
£s2 £
( n - 1) S 2 c a2/2,n -1
.
(7–4)
The test for variance determines whether a sample with variance S2 came from a normal distribution with population variance H0 = s 2.
EXAMPLE 7.25
Let X1, . . . , X12 be a random sample from a normal distribution for which m and s 2 are unknown. Test H0: s 2 = 4 vs. H1: s 2 π 4 at a = 0.05 using the following sample: 55 49 30 61 33 37 42 50 63 50 43 62
P369463-Ch007.qxd 9/2/05 11:17 AM Page 415
7.6 Hypothesis Tests: Variances
415
Solution (svar '(55 49 30 61 33 37 42 50 63 50 43 62)) Æ 125.36. Using equation (7–4), ( n - 1) s 2 c 12-a /2,n -1 (12 - 1)125.36
£s2 £
£s2 £
21.92
( n - 1) s 2 c a2/2,n -1 (12 - 1)125.36 3.79
or 62.91 £ s 2 £ 363.84, which implies we must reject the null, since 4 is not in the interval. Equivalently, the c2 value (11 * 125.36)/4 = 86.185 > c211,a/2=0.025 = 21.92. Since the c2 distribution is not symmetric, the approximate two-tailed p-value is given by the command (* 2 (U-chi-sq 11 344.73)) Æ 0.0000.
(variance-test data H0) returns the chi-square statistic with two-tail p-value for testing. H 0: s 2 = c vs. H1: s 2 π c (variance-test '(55 49 30 61 33 37 42 50 63 50 43 62) 4) returned chi-square statistic = 344.73 with p-value 0.0000, 95% confidence interval is (62.90, 363.66), 99% confidence interval is (51.41, 542.54).
EXAMPLE 7.26
It is desired to test H0: s 2 = 25 vs. H1: s 2 ≥ 25 with a set at 0.05. A sample size of 16 from an assumed normal distribution reveals the sample variation to be 28. Can the hypothesis be rejected? If not, determine how large the sample variation can be before rejection occurs. Solution c2 =
(16 - 1)28 25
2 = 16.8 < 25.0 = c 15 ,0.95
fi p-value = 0.33 and H 0 cannot be rejected. The critical sample variation is given by 2 2 15 * SCritical /25 or SCritical = 25 * 25/15 = 41.7.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 416
Chapter 7 Hypothesis Testing
416
Hypothesis Tests for the Equality of Two Variances When given two random samples from two independent normal populations with unknown means and variances, it may be desirable to test for the equality of the population variances, especially before conducting a t-test of the difference between two means. s 12 = 1 vs. To test the H0: s 12 = s 22 vs. H1: s 12 π s 22, or equivalently H0: s 22 S 12 s 12 π 1, use the F statistic, the ratio of two chi-square RVs with being a S 22 s 22 s 12 point estimator for . s 22 Associated with the statistic are (n1 - 1) degrees of freedom for the numerator and (n2 - 1) degrees of freedom for the denominator. A 100 (1 - a)% confidence interval is given by P
2 s 12 S 12 Ê S1 F1-a /2 ( v2 , v1 ) < Fa /2 ( v2 , v1 ) = 100(1 - a )%. < Ë S2 s 22 S 22 2
We compute f0, the value of the ratio of the variances. We reject H0 if upper tail f0 > fa/2 ( n2 - 1, n1 - 1) or if lower tail f0 < f1-a/2 ( n2 - 1, n1 - 1) where f1-a /2 ( n2 - 1, n1 - 1) = 1/ fa /2 ( n1 - 1, n2 - 1). Although the designation of the population variances is arbitrary, it is customary to designate the larger population variance as s 12 for one-tailed tests.
EXAMPLE 7.27
Is there a difference between the two sample variances from normal populations at a = 0.10 for the following data? Find the p-value. If not, is there a difference between the means of the two normal populations with a set at 0.05? x1 = 15
x2 = 22
n1 = 9
n2 = 7
2 1
s = 30 Solution
H0: s 12 = s 22 vs. H1: s 12 π s 22
s 22 = 50
P369463-Ch007.qxd 9/2/05 11:17 AM Page 417
7.7 Hypothesis Tests for Independence, Homogeneity, and Goodness of Fit
417
The command (L-Fd 8 6 3/5) returns 0.2458, which is P(X < 3/5) with 8 and 6 degrees of freedom for the F-ratio value 30/50 from the F-distribution. The two-sided p-value is given by P(X < 3/5) + P(X > 5/3) or by the command ( + (L-Fd 8 6 3/5)(U-Fd 6 8 5/3)), returning 0.2458 + 0.2458 = 0.4917, the lower and upper tails, respectively. Equivalently, the command (cif 30 50 8 6 0.10) returns (0.1209 2.4007), a 90% confidence interval for s12/s 22. Note that 30/50 = 0.6 is in the interval. To now test H0: m1 = m2 vs. H1: m1 π m2, ( X 1 - X 2 ) - ( m1 - m 2 ) 2 Pooled
S
1ˆ Ê 1 + Ën n2 ¯ 1
=
(15 - 22) - 0 Ê 1 1ˆ 38.57 + Ë 9 7¯
= -2.237 with p-value 0.0420.
The command (Fdata-test sample-1 sample-2) returns the F-ratio of s12/s22 and the p-value for testing the equality of the two variances. For example, (Fdata-test '(60 45 80 87 79 75 60 30 45) '(75 65 90 80 89 95 85 69 40)) returns (F-ratio = 1.318 p-value = 0.702). (F-test n1 n2 ratio) returns the two-tailed p-value for testing the equality of two variances given the degrees of freedom for the numerator n1 and denominator n2. (L-Fd n1 n2 x) returns P(X < x); (U-Fd n1 n2 x) returns P(X > x), for example, (L-Fd 8 6 3/5) returns 0.2461.
7.7
Hypothesis Tests for Independence, Homogeneity, and Goodness of Fit Suppose we have data from an unknown population and we hypothesize that the population is Poisson or binomial or some other distribution. How can we test such a hypothesis? Karl Pearson developed the chi-square test, which provides an appropriate but approximate statistic, namely, m
c2 = Â i =1
( Q j - E j )2 Ei
.
(7–5)
P369463-Ch007.qxd 9/2/05 11:17 AM Page 418
418
Chapter 7 Hypothesis Testing
We reject the null hypothesis if the test value exceeds the critical c 2 value for the appropriate degrees of freedom and level of significance. Each Oi represents an observed value, and each Ei is the expected value for that observation under the hypothesis. The degrees of freedom v associated with the test is m - k - 1, where m is the number of class intervals or cells and k is the number of parameters for which estimates have been used. The a level for the chi-square test is always upper-tailed. The number in each class as a rule should be at least 1 and at least 80% of the class sizes should be larger than 5. If not, the diminished classes can be lumped into the adjoining classes with appropriate adjustments. If the observations are assumed to be from a binomial distribution, then m
 i =1
(Oi - Ei )2 Ei
m
=Â
O 2i - 2Oi Ei + E 2i Ei
i =1 m
and 7–5 can be simplified to
O
2 i
ÂE i =1
m
=Â i =1
O 2i Ei
m
m
- 2Â Oi + Â Ei , i =1
i =1
- n since n = SOi = SEi, further imply-
i
ing S(Oi - Ei) = 0. The chi-square test is also applicable for testing the independence between two factors or for the randomness of data. For example, the chisquare test can reveal whether males vote differently from females, whether married people vote differently from those who are single, or whether proficiency in one subject matter is similar to proficiency in another subject matter. These concepts are illustrated in the examples.
R ¥ C Contingency Tables Test for Homogeneity and Independence When n sample items are classified by different figures of merit, one may wonder whether one figure of merit favored the samples more than another. For example, we expect ACT scores to be highly correlated with SAT scores; that is, we do not expect the scores to be independent. The two tests for homogeneity and independence for r ¥ c contingency data are computed identically. The test for homogeneity consists of r rows of data with c columns, for example, the 6 categories of a die rolled 1296 times:
Die-Face Count
1 215
2 212
3 221
4 204
5 199
6 245
.
The expected number for each face is 1296/6 = 216. m
c2 = Â i =1
(Oi - Ei )2 Ei
=
(1 + 16 + 25 + 144 + 289 + 841) 216
= 6.09 < c 52,0.05 = 11.07.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 419
7.7 Hypothesis Tests for Independence, Homogeneity, and Goodness of Fit
419
Thus the data support the hypothesis of homogeneity. The data for testing independence may be organized into a contingency table where one criterion classification is shown in the rows and another criterion classification is shown in the columns. Consider the following table pertaining to the marital status and gender of employees. We seek to determine if there is a relationship between gender and marital status. The null hypothesis is that gender and marital status are independent.
Married Unmarried Totals
Male 25 35 60
Female 15 25 40
Totals 40 60 100
The probability of being in the first row is 40/100; the probability of being in the first column is 60/100. The probability of being in the first row and the first column is (40/100) * (60/100) = 0.24, under the assumption that there is no relationship (independence). We then expect 0.24 * 100 = 24 subjects (in bold) to be in this category of a married male, first row first column.
Married Unmarried Totals
Male 25 (24) 35 (36) 60
Female 15 (16) 25 (24) 40
Totals 40 60 100
Thus, to get the expected number in each category, we simply multiply the row total by the column total and divide by the total of all entries. The expected numbers are shown bold in parentheses. Also, for this example, we have to compute only the expected number for one cell; the other cells can be found by subtracting from the total in the category. The degrees of freedom associated with this chi-square test is (r - 1) * (c - 1), where r is the number of rows and c is the number of columns. There is 1 degree of freedom. The null hypothesis is stated as, “Gender is independent of marital status,” or more specifically, H0: Oij = Eij for all cells vs. H1: Oij π Eij for at least one cell. Alternative hypotheses are usually two-tailed. Using the chi-square statistic m
X2 = Â i =1
(Oi - Ei )2
,
Ei
we have c 12 = (1/24 + 116 / + 1/36 + 1/24) = 0.1736 with an approximate p-value of 0.68 for 1 degree of freedom.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 420
420
Chapter 7 Hypothesis Testing
The command (chi-square-test list-of-row-lists) prints the chi-square contribution from each cell and the expected value of each cell and returns the value of the chi-square statistic, the p-value, and degree of freedom. chi-square-test '((25 15)(35 25))) Æ 4 cell chi-squares = 0.0416 + 0.0625 + 0.0277 + 0.0416 = 0.1736 The expected values are: (24 16) (36 24)). (chi-square = 0.1736 p-value = 0.677 df = 1) (chi-square-test '((215 212 221 204 199 245))) returned the die count from 1296 trials. 5 cell chi-squares = 0.005 + 0.074 + 0.116 + 0.667 + 1.338 + 3.894 = 6.093 The expected values are: (216 216 216 216 216 216)) (Chi-square = 6.093 p-value = 0.297 df = 5)
EXAMPLE 7.28
Test at a = 1% whether the first 1000 decimal digits of p show homogeneity.
0 88
1 107
2 97
3 103
4 92
5 95
6 88
7 92
8 98
9 100
Solution H0: oi = ei for all cells vs. H1: oi π ei for at least one cell. Observe that the chi-square test is for homogeneity among the outcomes and that the sample size is under the control of the experimenter. Under the hypothesis that the first 1000 digits of p are homogenous, the expected number for each of the 9 digits is 1000/10 = 100. Then, if we use m
 i =1
(Oi - Ei )2
,
Ei
X2 = 0.667 + 1.260 + 0.010 + 0.510 + 0.167 + 0.010 + 0.667 + 0.167 + 0.042 + 0.167 = 3.667; P-value = 0.932; df = 9; cannot reject. EXAMPLE 7.29 In a city the results of a poll show the following preference by three groups of people for Candidate X and Candidate Y, who are running in a local election. Is there a relationship between preference and group at a = 1%?
P369463-Ch007.qxd 9/2/05 11:17 AM Page 421
7.7 Hypothesis Tests for Independence, Homogeneity, and Goodness of Fit
Prefer X White Black Hispanic Total
525 (502.2) 400 (461.6) 600 (561.3) 1525
421
Prefer Y
No Preference
Total
105 (149.8) 200 (137.7) 150 (167.5)
50 (28) 25 (25.7) 10 (31.3)
680 625 760
455
85
2065
Solution The null hypothesis is that there is no relationship among the group preferences or that the preferences are random. (chi-square-test '((525 105 50)(400 200 25)(600 150 10))) returned c 42 = 87.1 > 13.3 = c 42,0.01, which strongly implies that there is a relationship among the group preferences. The p-value is P(c42 > 87.6) ª 4E-18 4.46) = 0.216 > a = 0.05. We cannot reject the hypothesis that the underlying distribution is Poisson with parameter k = 3/4. If the Poisson k parameter is not specified, an estimate can be provided from the data. However, the c2 statistic then has one less degree of freedom. For a more accurate test the cells should have at least 5 observations in each class. Those classes with less than 5 can be combined with the other classes to achieve this goal before testing.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 426
426
Chapter 7 Hypothesis Testing
The command (poisson-gf x-values frequency k-optional), with x-values being the list of the number of defects and frequency the list of the corresponding frequencies, returns the estimated value for k (if k-value is not entered), the chi-square value, and the p-value for the test. For example, (poisson-gf '(0 1 2 3) '(67 40 10 3)) returns X = (0 1 2 3) Frequency = (67 40 10 3) Total = 120 Probabilities = (0.563 0.324 0.093 0.018) Expected = (67.525 38.827 11.163 2.140) k-hat = 0.575, c 2 = 0.5067, v = 2, p-value = 0.224.
EXAMPLE 7.35
Perform a goodness-of-fit test on the following data generated by the command (setf nb-data (sim-neg-binomial 3/4 3 100)) with parameters p = 3/4 and k = 3 successes. 3 3 5 4 3 3 5 4 3 3 4 4 3 3 5 6 4 3 3 3 4 3 5 4 3 4 3 5 6 5 3 3 3 4 4 4 3 6 3 6 5 3 3 3 3 3 5 3 3 4 4 3 3 3 3 5 4 4 4 4 6 37 4 3 4 3 4 5 6 6 3 3 3 3 6 3 3 4 3 4 3 3 3 37 5 4 4 4 3 4 4 4 4 47 4 3 6 Solution
(print-count-a-b 3 7 nb-data) returns
Integer Count tabled as
3 46
X Frequency P(X ) Expect
4 31
3 46 0.422 42.2
5 11
6 9
4 31 0.316 31.6
7 3,
5 11 0.158 15.8
≥6 12 0.1035 10.4
We combine the count 3 for X = 7 into the cell for 6 or more. (mu-svar nb-data) returns (3.92 1.206), showing that that average number of trials is 3.92 ª k/p = 4; the sample variance is 1.21 ª (kq/p2) = 1.33. If the parameters are unknown, these two equations can be solved simultaneously to get ˆ = 0.764 and kˆ = 2.995 ª 3. p For example, P(X = 3 successes in n = 4 trials with p = 3/4) is given by (neg-binomial 0.75 3 4) Æ 0.3164. Then
P369463-Ch007.qxd 9/2/05 11:17 AM Page 427
7.7 Hypothesis Tests for Independence, Homogeneity, and Goodness of Fit
c2 =
(46 - 42.2)2 42.2
+
(31 - 31.6)2
+
31.6
(11 - 15.8)2
+
427
(12 - 10.4)2
15.8
= 2.06
10.4
for v = 1 df, with p-value 0.151, implying that we cannot reject the hypothesis that the data came from a negative binomial distribution.
The command (negbin-gf data k) returns the chi-square value, v degrees of freedom, and p-value for testing whether the sample came from a negative binomial. For example, (negbin-gf (sim-neg-binomial 3/4 3 100) 3) printed X Frequency Probabilities Expected
3 44 0.422 42.19
4 34 0.316 31.64
5 17 0.158 15.82
6 4 0.066 6.59
7 1 0.025 2.47
and returned ˆ = 3/4 , k = 3, c 2 = 2.237, v = 4, p-value = 0.308). (p
EXAMPLE 7.36
Determine if the following sorted random sample of size 30 came from N(50, 25). 41.99 43.49 43.50 44.20 44.43 45.37 45.45 46.16 46.34 46.88 47.23 47.47 47.72 49.71 51.42 51.49 51.80 51.82 52.30 52.80 53.28 55.01 55.34 55.40 56.43 56.53 57.52 58.30 60.80 60.80 Solution H0: Random sample is from N(50, 25) vs. H1: Sample is not from N(50, 25). To demonstrate, we compute quartiles. The 25th and 75th percentiles of N(50, 25) are P25 = 50 + 5( -0.6742) = 46.629; the command (inv-phi 1/4) returns -0.6742. P50 = 50 P75 = 50 + 5(0.6742) = 53.371; the command (inv-phi 3/4) returns 0.6742. Now count the number in each of the four cells and use the chi-square test. There are 9 samples less than 46.629, 5 between p25 and p50, 7 between p50 and p75, and 9 above p75. These cell numbers do not arouse suspicion since the expected number in each is 7.5. The command (chi-square-test '((9 5 7 9)) returns c 2 = 1.467, v = 3 df, p-value = 0.6893, and the null hypothesis is accepted.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 428
Chapter 7 Hypothesis Testing
428
The command (phi-test mu sigma sample) returns the chi-square value and v degrees of freedom and p-value for testing whether the sample came from N(m, s 2). For example, (phi-test 50 5 sample), where sample is the data in Example 7.36, returned Number in each cell: (9 5 7 9) x = 50.699, s = 5.440 4 cell chi-squares = 0.300 + 0.833 + 0.033 + 0.300 = 1.467 The expected cell values are: ((7.5 7.5 7.5 7.5)) ( c 2 = 1.467, p-value = 0.69, df = 3). Experiment with (phi-test 50 4 (sim-normal 48 10 100)) by varying m and s with the simulated normal data. (normal-test sample) similarly returns the cell chi-squares into 10 deciles from testing the data with computed mean and standard deviation. (normal-test (sim-exponential 2 100)) returned (chi-square = 34.2 p-value = 8.25e - s df = s).
Probability Plots A qualitative method for assessing goodness of fit by using the eye is the graphical probability plot. If we sample from a continuous uniform on [0, 1] and then sort the sample in ascending order, a plot of the sorted data versus the expected values of the ordered ascending values (X(k)) should appear approximately linear with slope 1 if our assumption of uniformity is correct. The expected value of each ordered sample is given by E( X ( k ) ) =
k n +1
,
where n is the number of samples and where the expected values may be viewed as ordered percentiles. That is, a plot of the cumulative distribution k function F(X(k)) versus is of essentially similar data and should be n +1 approximately linear with slope 1 if the sample data were drawn from the assumed distribution where X(k) is a typical ordered statistic. Special probability paper exists for the normal, lognormal, gamma, Weibull, and other distributions. However, ordinary graph paper is sufficient after appropriate transformations of the data. An example will illustrate this procedure. EXAMPLE 7.37
Create a probability plot for a random sample of size 25 from the continuous uniform on [0, 1], using the software command (sim-uniform 0 1 25).
P369463-Ch007.qxd 9/2/05 11:17 AM Page 429
7.7 Hypothesis Tests for Independence, Homogeneity, and Goodness of Fit
429
Solution 1) Generate the random samples. The template (sim-uniform a b n) returns a random sample of size n from the interval [a, b]. For example, (setf U (sim-uniform 0 1 25)) assigned the following 25 samples to U: 0.206 0.898 0.995 0.595 0.995 0.396 0.060 0.707 0.633 0.517 0.465 0.922 0.092 0.542 0.853 0.784 0.933 0.039 0.449 0.640 0.188 0.003 0.379 0.425 0.135.
2) Sort the random samples in ascending order.
The command (sort U #' 16, can H0 be rejected? Repeat when H1: m π 16. 3. To test H0: m = 50 vs. H1: m > 50, a random sample of 10 items is taken from N(m, 4). The decision is to reject H0 if the sample mean exceeds 51.8. Find the significance level of the test and determine if the following data reject the H0: 51 47 51 51 47 55 45 51 45 53. ans. 0.05017 51.5 < 51.8 fi Do not reject.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 438
Chapter 7 Hypothesis Testing
438
Table 7.4
Common Hypothesis Tests
Hypothesis
Test Statistic
H0: m = m0 vs. H1: m π m0
Z=
X - m0
H0: X = X0 vs. H1: X π X0
Z=
X - X0
H0: p = p0 vs. H1: p π p0
X - p0 Z= n p0q0
s/ n
npq
Rejection Criteria
Comment
|z| > za/2 |z| > za/2
Binomial X n large
|z| > za/2
Proportion p
n H0: m1 = m2 vs. H1: m1 π m2
X1 - X 2
Z=
s 12 n1
H0: p1 = p2 vs. H1: p1 π p2
H0: mD = d0 vs. H1: mD π d0
T=
X - m0 S/ n D - d0 Sd / n
T=
X1 - X 2 1 1 + n1 n2
SPooled c2 =
( n - 1) S 2 s 02
|z| > za/2
n2
ˆPooledqˆPooled Ê 1 + 1 ˆ p Ën n2 ¯ 1 T=
H0: s 2 = s 20 vs. H1: s 2 π s 20
s 22
ˆ1 - p ˆ2 p
Z=
H0: m = m0 vs. H1: m π m0
H0: m1 = m2 vs. H1: m1 π m2
+
|z| > za/2
Pooled Proportion
|t| > tn-1,a/2
t Distribution
|t| > tn-1,a/2
n Pairs
|t| > tn1+n2-2,a/2
Pooled S2
c2 < c2n-1,1-a/2 c2 > c2n-1,a/2
H0: s 12 = s 22 vs. H1: s 12 π s 22
F=
S12 S22
f0 > fa/2(n - 1, m - 1) f0 < f1-a/2(n - 1, m - 1)
4. Given n = 36, a = 0.01, s = 4 for directional testing H0: m = 90 vs. H1: m = 93, find the probability of a Type II error when sampling from a normal distribution. 5. Equate the a rejection region boundary point under the supposition that the null value m = m0 with the same boundary point under the supposition that the true mean m = m1 to derive a formula expressing n as a function of the critical a and b values, s, and the hypothesized means. Assume m1 < m0.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 439
439
Problems
6. Test the formula derived from Problem 5 on Problem 4 with a at 0.01 and the computed b at 0.0148 to verify that the required sample size is 36 by considering symmetry. 7. Find the sample size from a normal distribution that will yield errors of a = 0.05 and b = 0.01, testing H0: m = 130 vs. H1: m = 129 with s = 2.1. ans. 70 8. Use two-sided hypothesis testing to find the required sample size n from a normal distribution for each of the following: a) b) c) d)
m0 m0 m0 m0
= = = =
100, 100, 100, 100,
m1 m1 m1 m1
= = = =
110, 110, 110, 105,
s s s s
= = = =
25, 25, 25, 25,
a a a a
= = = =
0.05, 0.10, 0.05, 0.05,
b b b b
= = = =
0.10; 0.10; 0.20; 0.05.
9. a) Find the p-value when the number of heads from 100 flips of a fair coin is between 40 and 60 inclusive, using the normal approximation with continuity correction to the binomial. ans. (del-normal 50 25 39.5 60.5) Æ 0.0357. b) State the decision if a is set at 5%. ans. Reject. c) Find Type II error if P(heads) = 0.6. ans. (del-normal 60 25 39.5 60.5) Æ 0.541. 10. Find the p-value for testing H0: m = 1000 vs. H1: m π 1000 given s = 125, n = 8, x = 1070, and a = 0.05 when sampling from a normal distribution. 11. If a machine produces more than 10% defectives, repair is in order. In a random sample of 100 items, 15 defectives were found. Assume that a large lot of items are produced daily. Does this sample evidence support Repair at a = 0.01? Find the p-value. ans. no p-value = 0.048. 12. Given x = 2960, s = 36, n = 8, test H0: m ≥ 3000 versus H1: m < 3000 at a = 0.05 when sampling from a normal distribution. 13. Test H0: s 2 £ 0.02 versus H1: s 2 > 0.02 with n = 10, s2 = 0.03, a = 0.05 when sampling from a normal distribution. ans. p-value = .14126. 14. To test H0: m £ m0 vs. H1: m > m0, given s = 28, n = 100, the decision rule is to accept H0 if x £ 110 and to reject H0 if x > 110. a. Determine a when m0 = 110 and b(m = 115). b. Determine a when m0 = 112 and b(m = 115). Is the new a smaller or larger? . . . the new b? ans. 0.7625 0.0371. 15. Is there a difference between the two population means at a = 5% for the following data? ans. yes p-value = 0.0006. n1 = 100
n2 = 100
x1 = 50
x2 = 50
s12 = 18
s22 = 16
P369463-Ch007.qxd 9/2/05 11:17 AM Page 440
440
Chapter 7 Hypothesis Testing
16. Is there a difference between the two population variances at a = 10% for the following data? Use F-statistic to test H0: s 12 = s 22 vs. H1: s 12 π s 22. n1 = 10
n2 = 10
x1 = 25
x2 = 37 2 1
( n1 - 1) s = 200
( n2 - 1) s22 = 180
17. Show that the pooled variance estimator is unbiased for s 2. 18. Test the claim that a new drug is 90% effective in controlling arthritis if 160/200 people experience control at a = 0.05. Test for both the number and the proportion of people experiencing control. 19. Quiz 1 for 50 students and quiz 2 for 60 students had mean scores, respectively, of 80 and 83 with standard deviations of 5 and 6. Is there a significant difference in quiz difficulty at a = 1%? ans. z0.005 = -2.860 p-value = 0.004. 20. Determine if there is a difference in means at a = 5% when sampling from two normal populations with the same variance. n1 = 20
n2 = 18
x1 = 110
x2 = 108
s12 = 9
s22 = 16
21. The two samples were taken from normal distributions with unknown means but with equal variances. Determine if there is a difference in their means at a = 5%. ans. p-value = 0.498. Sample 1: 25 34 34 27 28 27 36 Sample 2: 36 29 28 22 25 37 20 30 28 22. A standard box is to be filled with 25 oz. with standard deviation of 0.25 oz. A random sample of 15 boxes revealed a sampling error s equal to 0.36 oz. Test the hypothesis H0: s 2 = 0.0625 versus H1: s 2 π 0.0625 with a set at 5%. 23. Let X1, . . . , X16 be a random sample from a normal distribution with mean m and variance 4. In testing H0: m = 0 vs. H1: m = 1, the critical region is x > xc. If the significance level of the test a is set at 4%, find the respective value of xc and the probability of a Type II error. ans. 0.875 0.4013 24. Use the chi-square test at a = 5% to determine whether the number of occurrences of the 301 to 400 digits of p are biased. The command (setf pi-digits pi-400) generates 72458 70066 06315 58817 48815 20920 96282 92540 91715 36436 78925 90360 01133 05305 48820 46652 13841 46951 94151 16094.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 441
441
Problems
The command (print-count-a-b 0 9 pi-digits) returns Digit Count
0 13
1 13
2 9
3 8
4 10
5 12
6 11
7 5
8 10
9 9
25. a) Determine the expected number of 5-card poker hands containing just 3 of the same rank formed from the first 1000 decimal digits of p. ans. 14.4 vs. 16. (setf pi-5 (re-group (pi1000) (list-of 200 5))) (sum (repeat #' rank-n pi-5 (list-of 200 3))) Æ 16 triples vs. 14.4 expected b) Repeat for expected number of exactly one pair. ans. 100.8 vs. 97. (sum (repeat #' rank-n pi-5 (list-of 200 2))) Æ 97 vs. 100.8 expected 26. Von Mises (1964) tested the first 2035 (1017 pairs) digits of p grouped by 2 for the number of each from 00 to 99. (re-group (firstn 2035 pi-2500) (list-of 1017 2)) generates the 1017 double digits pairs. ((1 4) (1 5) (9 2) (6 5) (3 5) (8 9) (7 9) (3 2) (3 8) (4 6) (2 6) (4 3) (3 8) (3 2) (7 9) (5 0) (2 8) (8 4) (1 9) (7 1) (6 9) (3 9) (9 3) (7 5) (1 0) (5 8) (2 0) (9 7) (4 9) (4 4) (5 9) (2 3) (0 7) (8 1) (6 4) (0 6) (2 8) (6 2) (0 8) (9 9) (8 6) (2 8) (0 3) (4 8) (2 5) (3 4) (2 1) (1 7) (0 6) (7 9) (8 2) ... ... ... (cnt-von-2) returned the number of each of the 100 double digits from 00 to 99, with each double digit having a probability of 1/100. (9 12 10 7 14 10 11 6 6 15 9 13 11 11 6 4 4 11 10 20 7 15 8 8 8 15 15 14 14 7 9 11 9 10 9 12 7 9 8 9 8 9 9 8 10 15 11 12 15 13 11 7 8 12 4 12 10 7 9 13 16 8 9 11 11 10 10 7 9 12 5 12 10 11 5 17 10 13 16 12 5 17 13 10 9 13 14 5 7 10 5 10 12 9 12 10 9 5 9 13) That is, there were 9 00's, 12 01's, 10 02's, etc., and 13 99's. Use the chi-square test to determine if the expected number of double digits is similar to the actual number. (chi-square- test (list (cnt-von-2))) returns (chi-square- value = 100.306, df = 99, p-value = 0.4444) fi cannot reject. (print-von-2) print the respective count below each digit pair from 00 to 99. 27. Compute the probability of a full house, using 5 digits at a time, from the first 1000 digits of p. (print-5 (repeat #'juta-list (re-group (pi1000) (list-of 200 5))(list-of 200 5))) returns list of 200 5-card poker hands. ans. theoretical = 0.009 actual = 0.005 = 1/200.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 442
442
Chapter 7 Hypothesis Testing
28. The following data reflect the number of defects in working on two parts by two shifts. Is there a relationship between the shifts and parts or are they independent? Part 1 Part 2
Shift 1 5 20
Shift 2 15 10
29. Parts from three suppliers, X, Y, and Z, were sorted as to Acceptable (A), defective but repairable (DBR), and condemned (C). Determine if there is a relationship between the suppliers and their parts. ans. no c 24 = 1.64 p-value ª 0.802. X Y Z
A 125 225 250
DBR 10 12 15
C 2 5 3
30. In flipping a coin 1000 times, the tally is 530 heads and 470 tails. Use the normal approximation (without continuity correction) to the binomial to test for a fair coin and report the p-value. Repeat the test for a fair coin by using the chi-square test. Show that z2 = c 2. 31. a) The test of 3 batteries for a calculator revealed the following number of defects from a random sample of 450 calculators. Determine if the distribution of the defects is binomial. Number of Defects Frequency Binomial Probability Expected Frequency
0 250 0.4408 198.4
1 100 0.4152 186.8
2 75 0.1304 58.7
3 25 0.014 6.3
b) Determine whether the tabulated random sample below is binomial. x = 3.89. X Frequency
0 2
1 6
2 21
3 12
4 19
5 22
6 9
7 7
8 2
9 0
The command (binomial-gf n X-list frequency p-optional) returns the probabilities and expected frequencies with the c2 value. For example, (binomial-gf 9 '(0 1 2 3 4 5 6 7 8 9) '(2 6 21 12 19 22 9 7 2 0)) returns the display X Frequency Probability Expected
0 2 0.0061 0.61
1 6 0.0420 4.20
2 21 0.1279 12.97
3 12 0.2272 22.72
4 19 0.2595 25.95
5 22 0.1975 19.75
6 9 0.1002 10.02
7 7 0.0327 3.27
(p-hat = 0.432, c2 = 23.807, v = 8, p-value = 4.616)
8 2 0.0062 0.62
9 0 0.0005 0.05
P369463-Ch007.qxd 9/2/05 11:17 AM Page 443
443
Problems
32. a) The following frequencies were generated using the commands (setf X (upt0 7)) Æ (0 1 2 3 4 5 6 7); (setf frequency (repeat #' rest (count&pair (sim-poisson 3 50)))) Æ (3 9 11 8 8 5 3 3). Assume you do not know this and check to see if the distribution is Poisson with parameter k = 3 and a set at 5%. 0 3
X Frequency
1 9
2 11
3 8
4 8
5 5
6 3
7 3
The command (poisson-gf X frequency 3) returns a display of the probabilities and expected frequencies along with the c 2 and p-values. b. Determine if the following random sample is Poisson ( x = 47.8, s2 = 53.75) 40 51 37 60 52 54 48 54 43 51 47 39 34 39 51 55 52 52 56 41. 33. a) A person is predicting the color of each card from a deck of 52 cards. The experimenter does not give feedback until all 52 cards have been predicted. How many cards must be predicted correctly to be significant at a = 0.05? b) Repeat for predicting the suit. c) Repeat for predicting the rank. ans. 32 18–19 7. 34. Show that Z2 = c 2 for a chi-square test with two categories with probability p. 35. Create the beta operating curve with a set at 0.05 for testing H0: m = 100 and s = 10 in increments of 2 units about H0 of m = 100 for a random sample X1, . . . , X25. Accepted for x in (100 ± 1.645 * 10/5) or (96.71, 103.29). Then repeat for upper-tail alternative hypotheses. Two-tail
m b
96 0.930
98
100
102
104
106
Upper-tail
m b
96 0.9999
98
100
102
104
106
36. Create a beta operating curve for a = 0.01 about H0: m = 10 in increments of 1 with s = 4 and n = 36 for accepting x in m ± z0.005s/ n . Two-tail
m b
8 0.336
9
10
11
12
13
Upper-tail
m b
8 0.999
9
10
11
12
13
P369463-Ch007.qxd 9/2/05 11:17 AM Page 444
444
Chapter 7 Hypothesis Testing
37. Are the following numbers random? Could a goodness-of-fit test for the number of each digit expected detect patterns? What number comes after every 6? ans. no no 1 3 0 6 1 4 6 1 3 4 9 8 5 6 1 9 5 8 8 7 3 4 5 8 7 2 4 5 7 6 14 9 83 4 5 7 6 1 3 8 6 1 0 4 4 8 3 3 2 3 8 3 8 6 1 3 4 5 7 9 6 1 33 388 6 161 2 4 8 5 6 1 3 0 94 6 1 7 3 8 6 10 3 6 18 615617 38. The personnel department claims that they hire without regard to sex and show the last 30 hires: m f m f m f m f m f m f m f m f m f m f m f m f m f m f m f, where m indicates male and f female. Can one refute the claim? 39. Given that Drug A cured 15 of 25 and Drug B cured 5 of 25 people, determine if there is a difference between the two drugs, using a pooled proportion test. Repeat using the chi-square test for the contingency ans. 2.88672 = 8.333 p-value ª 0.004. table and show that z2 = c 2. Drug A
Drug B
15 10
5 20
Cured Not Cured
40. For the following X and Y data sets from normal distributions, determine if the pooled t-test can be used (F-test). If so, do so; if not, do so and also use the Smith-Satterthwaite procedure to compare results. Determine if there is a significant difference between the means of the populations. X Y
27.5 25.6
30.7 23.4
27.6 23.1
38.9 23.3
37.1 22.7
28.9 25.8
41. Determine if there is a significant difference in systolic blood pressure readings before and after taking medication (see Software Exercise 13). ans. t = 1.626 p-value = 0.1426. Pre-BP Readings Post-BP Readings
160 165
125 130
180 150
157 140
149 135
135 140
140 155
160 135
145 120
42. Determine if the following sorted random sample of 20 numbers could have come from N(25, 81). 6.44 7.78 9.87 15.00 17.54 18.26 20.18 20.65 21.12 21.96 24.31 25.19 26.05 26.12 27.02 30.22 32.07 33.29 36.34 37.59 43. In a test for the effectiveness of a drug, 100 people were given both a placebo and the drug in a double-blind experiment. After a time the treatments received were reversed. The results of the test are shown below, indicating yes/no response to the question, “Did the treatment help?” Report the p-value of testing the null hypothesis of an ineffective drug. ans. 0.000022.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 445
445
Miscellaneous
Drug
Yes No
Placebo Yes No 12 37 19 32
44. A car salesperson claims that the standard deviation of miles per gallon (mpg) for a certain brand of car is 4. The last 10 tests showed mpg of 26, 24, 23, 28, 24, 28, 20, 20, 26, and 30. Can the data refute the claim?
MISCELLANEOUS 1. The density function for random variable X is given by f(x) = (q + 1)xq; 0 £ x £ 1; q > -1. a) Find the method of moments estimator for q.
ans.
1 - 2x
x -1 b) Find the maximum likelihood estimator (MLE) for q. ans. (n/ -Sln Xi) -1. c) The hypothesis H0: q = 1 is to be rejected in favor of H1: q = 2 if and only if X > 0.9. The probability of a Type I error is _______. ans. 0.19. d) The probability of a Type II error is _______. ans 0.729. 2. A coin is tossed 6 times and comes up heads each time. Determine if the coin is fair at a = 1%, using a one-tailed test. 3. Asking the question “Do you do drugs?” or “Do you cheat on your spouse?” is not likely to be answered by most people. In order to depersonalize the responses, mark 50 cards with an A and 50 cards with a B to distribute to 100 respondents with the instructions to answer the personal question if an A-card is received and the question “Does the last digit of your social security number end in a 7, 8, or 9?” if a B-card is received. Let k equal the probability of a yes response. Then X, the number of yes responses, is binomial with parameters n and k. kˆ = 1/2 * p + 1/2 * 0.3, from ˆ can be calculated. For example, suppose 25 yes responses were which p counted from the 100. Then ˆ ª 0.2. 0.25 = kˆ = 1/2 * p + 0.15 fi p ˆ after counting 12 yes responses from 25 A-cards and 25 Calculate p B-cards. ans. 0.18. 4. Two processes (assume Poisson) for making material revealed the following flaws per 10 yards of material.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 446
446
Chapter 7 Hypothesis Testing
Site A
4
6
4
6
Site B
2
1
7
4
5
4
5
7
Test H0: kA = 2kB vs. H1: kA π 2kB to determine if there is a significant difference between the respective k parameters. 5. A business office assumes that its daily number of telephone calls (assume a Poisson process) is less than 50. A 2-week (10-day) log revealed 42 67 70 39 47 53 80 35 46 60. Test H0: k £ 50 vs. H1: k > 50 ans. p-value = 0.044. 6. The numbers of homicides occurring by month are indicated below. Are the numbers of homicides independent of month at a = 5%? Jan 8
Feb 5
Mar 6
Apr 4
May 2
Jun 7
Jul 11
Aug 6
Sep 2
Oct 8
Nov 8
Dec 4
7. A coin is tossed with RV X being the number of tosses until a head occurs. The experiment was repeated 1000 times, resulting in the table below. Test to determine if distribution fits a geometric with p = 1/2.
X
1
Freq
495
2
3
4
5
6
7
8
9
10
241
135
67
33
12
8
4
4
1
8. Use the poker test to determine if the following 200 digits are random by completing the table below. Flushes and straights are not considered. Check the probabilities, using sampling with replacement; for example, P(Bust) = 1 * 0.9 * 0.8 * 0.7 * 0.6 = 0.0324. 6 8 4 6 4
6 0 1 2 0
620 00458 670 88990 135 45263 735 22454 754 12758
Poker Hand Bust One pair Two pair Three of a rank Four of a rank Five of a rank Full house
5 2 8 4 9
4 7 6 1 3
1 1 4 2 2
0 1 8 8 9
6 7 4 2 8
4 0 8 4 8
5425 6436 8188 2045 7742
9 2 5 3 5
1 4 6 5 4
4 7 7 0 1
5 1 2 9 8
0 6 3 9 1
3 5 9 5 5
3 7 5 6 7
6 3 3 9 0
1 5 9 8 9
2 1 4 8 5
8 2 2 0 4
1 7 0 9 2
531 488 553 008 425
1 1 3 8 6
2 0 7 6 2
6 7 0 7 7
7 9 8 0 8
7 3 8 2 9
Observed
Probability
Expected
10 20 7 2 1 0 0
0.3024 1 * 0.1 * 0.9 * 0.8 * 0.7 * 5C2 = 0.504 1 * 0.1 * 0.9 * 0.1 * 0.8 * 5!/(2!)3 = 0.108 1 * 0.1 * 0.1 * 0.9 * 0.8 * 5!/(3!) = 0.072 1 * 0.1 * 0.1 * 0.1 * 0.9 * 5!/4! = 0.0045 1 * 0.1 * 0.1 * 0.1 * 0.1 * 1 = 0.0001 1 * 0.1 * 0.1 * 0.9 * 0.1 * 5!/(3! * 2!) = 0.009
12.096 20.16 4.32 2.88 0.18 0.004 0.36
P369463-Ch007.qxd 9/2/05 11:17 AM Page 447
447
Miscellaneous
9. Determine if the following data are from a normal population. (37.4 53.8 34.9 52.2 29.6 44.1 43.4 41.6 57.5 38.2 53.8 52.7 42.4 50.2 50.0 56.1 49.0 51.9 59.9 53.2 62.1 37.8 45.7 52.0 39.1 62.2 45.7 52.7 46.5 45.2 81.0 49.0 47.3 47.3 47.2 33.5 52.7 23.3 39.3 62.5 41.1 50.1 53.8 57.7 52.4 32.0 50.6 41.5 37.7 70.3) ans. Cannot rule out. 10. Mean of the Square of the Successive Differences (MSSD) Another measure of dispersion is the mean square of the successive differences of a sample. Given sample (setf data '(1 7 5 6 8 19 1 13 12 19)), the statistic is computed as n -1
Â(X MSSD =
i +1
- X i )2
i =1
. 2n
MSSD = [(7 - 1)2 + (5 - 7)2 + (6 - 5)2 + . . . + (12 - 13)2 + (19 - 12)2 ]/2(10) = 38, with sample size 10. (MSSD data) return 38; (svar data) Æ 42.54. The ratio of the MSSD to the sample variance S2 is 38/42.54 = 0.893184. The statistic C = 1-
MSSD
S2 = 1 - 0.893184 = 0.106816 can be tested for large sample size as
Z=
C
.
n -2 n2 - 1 The null hypothesis for the test is that the data are random. For example, (mssd-test (upto 100)) returns z = 10.095, p-value = 0.0000, C = 0.999, MSSD = 0.50, rejecting the hypothesis while (mssd-test (swr 100 (upto 1000))) returned z = 0.207, p-value = 0.4178, C = 0.021, MSSD = 67241.88, failing to reject the hypothesis. (mssd-test data) returned z = 0.376, p-value = 0.3535, C = 0.107, MSSD = 38.
P369463-Ch007.qxd 9/2/05 11:17 AM Page 448
448
Chapter 7 Hypothesis Testing
SOFTWARE EXERCISES 1. For the first 500 digits of p, find the mean, variance, skewness, and kurtosis and how many digits occur before all 10 digits occur. 1 2 9 6 5 2 1 9 4 2 4 2 5 8 5 2 3 6 9
4 7 7 2 1 3 0 5 4 0 5 7 2 9 2 7 2 2 3
1 9 4 8 3 1 2 4 6 1 4 3 0 2 1 0 6 7 8
5 5 9 0 2 7 7 9 1 9 3 7 9 5 3 3 1 7 1
9 0 4 3 8 2 0 3 2 0 2 2 2 9 8 6 1 4 8
2 2 4 4 2 5 1 0 8 9 6 4 0 0 4 5 7 9 3
6 8 5 8 3 3 9 3 4 1 6 5 9 3 1 7 9 5 0
5 8 9 2 0 5 3 8 7 4 4 8 6 6 4 9 3 6 1
3 4 2 5 6 9 8 1 5 5 8 7 2 0 6 5 1 9 1
5 1 3 3 6 4 5 9 6 6 2 0 8 0 9 9 0 3 9
8 9 0 4 4 0 2 6 4 4 1 0 2 1 5 1 5 5 4
9 7 7 2 7 8 1 4 8 8 3 6 9 1 1 9 1 1 9
7 1 8 1 0 1 1 4 2 5 3 6 2 3 9 5 1 8 1
9 3 6 9 1 6 1 7 9 3 2 8 0 5 2 8 3 3 6 6 9 3 0 6 5 4 3 0 4 1 3 0 8 5 8 5 2.
2 3 4 0 8 4 5 8 7 9 6 3 0 5 5 9 4 7
3 9 0 6 4 8 5 1 8 2 0 1 9 3 1 2 8 5
8 9 6 7 4 1 9 0 6 3 7 5 1 0 1 1 0 2
4 3 2 9 6 1 6 9 7 4 2 5 4 5 6 8 7 7
6 7 8 8 0 1 4 7 8 6 6 8 1 4 0 6 4 2
2 5 6 2 9 7 4 3 6 0 0 8 5 8 9 1 4 4
6 1 2 1 5 4 6 6 1 3 2 1 3 8 4 1 6 8
4 0 0 4 5 5 2 6 6 4 7 7 6 2 3 7 2 9
3 5 8 9 0 0 2 5 5 8 9 4 4 0 3 3 3 1
3 8 9 0 5 2 9 9 2 6 1 8 3 4 0 8 7 2
8 2 9 8 8 8 4 3 7 1 4 8 6 6 5 1 9 2
3 0 8 6 2 4 8 3 1 0 1 1 7 6 7 9 9 7
(setf w (pi500)) assigns the variable w to the first 500 digits of p. (all-occur 10) returns the expected number of digits before all digits occur. Try (all-occur 10) and count the integers in p until all the integers 0–9 occur. The last digit to occur is 0 in row 1. (mu w) returns the mean of w. We expect 4.5. What would you expect the skewness of the digits to be if the digits are seemingly random? Try (skewness w). What do you expect for the variance? Recall that for discrete uniform RV X, x = 0, 1, 2, . . . , 9, E(X ) = 4.5 and E(X 2) = 28.5, from which V(X ) = 8.25. (Try (svar w)). 2. Problems 28 and 29 can be solved with command (chi-sq-test list-oflists), which returns the value of the chi-square statistic, the p-value, and the list of expected values. Problem 28: (chi-sq-test '((125 10 2) (225 12 5) (250 15 3))). 3. (phi z) returns P(Z £ z) = F(z); (del-phi z1 z2) returns F(z2) - F(z1). 4. (beta-error m0 m1 s n alpha tail) returns the beta error given the null hypothesis m0, the alternative hypothesis m1, the sampling error s,
P369463-Ch007.qxd 9/2/05 11:17 AM Page 449
449
Software Exercises
sample size n, and tail 'L for Lower, ‘U for Upper and ‘B for both. (betaerror 250 254 30 70 5 'B) returns 0.7998530. Verify problem 4 with (beta-error 90 93 4 36 1 'U) Æ 0.0148. (beta-table m0 beta-list s n a) returns a table of beta errors for alternative hypotheses in the beta-list. (beta-table 300 '(298 300 302 304 306 308) 20 36 5) prints Beta Operating Table m b
298 0.9876
300 0.9500
H0 = 300 302 304 0.8520 0.6719
306 0.4385
308 0.2252.
Beta Operating Curve 1 0.9 0.8 0.7 beta
0.6 0.5 0.4 0.3 0.2 0.1 0 296
298
300
302
304 mu
306
308
310
312
5. The command (binomial n p x) returns the probability of exactly x successes from n Bernoulli trials, where p is the probability of success. Use this command for Problem 31. 6. (t-pool sample-1 sample-2) prints the sample means, sample variances, the pooled variance, and the t-statistic value with the 2-tailed p-value under the assumption that the samples are from normal distributions with the same but unknown variance. 7. (binomial-gf n x-list frequency p-optional) returns the estimated value for p (if p-optional is not given), the chi-square value, and the p-value for a binomial goodness-of-fit test with the x-list the number of successes and frequency the corresponding list of frequencies. For problem 31 the command is, (binomial-gf 3 '(0 1 2 3) '(250 100 75 25)).
P369463-Ch007.qxd 9/2/05 11:17 AM Page 450
450
Chapter 7 Hypothesis Testing
8. (poisson-gf x frequency k-optional) returns the estimated value for k (if k-optional is not given), the chi-square value, and the p-value for a Poisson goodness-of-fit test with x the list of the number of occurrences and frequency the corresponding list of frequencies. Rework problem 32 with the software. (poisson-test data) returns the p-value for testing if the sample data is from a Poisson distribution. (poisson-test (sim-binomial 10 1/2 100)) may return a p-value of 0.006. Change p from 1/2 to 1/20 and expect the p-value to increase sharply. The binomial parameters n = 10 and p = 1/20 render a Poisson approximation k of 1/2. a) The following command simulates 100 random samples from a binomial distribution with parameters n = 6 and p = 1/3. Can the goodness-of-fit test for a Poisson distribution detect the difference? b) Reverse the roles of the Poisson with the binomial setting Poisson k to binomial np. a) (poisson-gf (upt0 6) (repeat #'rest (count-a-b 0 6 (sim-binomial 6 1/3 100)))). b) (binomial-gf 7 (upt0 6) (repeat #'rest (count-a-b 0 6 (simpoisson 2 100)))). 9. Create a probability plot for a sample drawn from an exponential distribution with parameter k = 2. Use the command (sim-exponential 2 25) for a sample of 25. 10. Create a probability plot for a random sample drawn from a normal distribution with mean 5 and standard deviation 4. 11. This exercise correlates two random normal samples from the same distribution. Use software commands (setf x(sim-normal 5 10 100)) (setf y(sim-normal 5 10 100)). Then use the command
(sqrt (R-sq x y))
to find the correlation of the unsorted samples. We expected 0, as the samples are independent. Now use the commands (setf sx (sort (copy-list x) ' tn - 2,0.025 = 2.776 / nS xx 0.447 * 91/ 6 * 17.5 0.416 fi p-value = 0.045; reject the hypothesis that the intercept is zero.
Âx
2 i
The standard error of A (sA) is the denominator value 0.416. A 95% confidence interval for the parameter a is A ± tn -2, a /2 * s A = 1.2 ± 2.776 * 0.416 = (0.0448, 2.355). Notice that the value zero is not in this interval, confirming the rejection of the hypothesis that the intercept a = 0.
The command (test-alpha x y a) returns the t- and p-values for testing H0: a = A; If the value for a is omitted, the value of zero is used. (test-alpha x y 0) returns (t = 2.88, p-value = 0.045). The commands (sa x y) returns 0.416 for sA, the standard error for A. (cia x y a) returns (0.0466 2.353) or 1.2000 ± 1.1534, a (100 - a)% confidence interval for A.
Distribution of RV Ŷ

Ŷ is a point estimator for the mean response α + βx. Since Ŷ|x = A + Bx, and A and B are normally distributed, Ŷ is also normally distributed.

E(Ŷ) = E(A + Bx) = E(A) + xE(B) = α + βx = E(Y|x) = μ(Y|x).

V(Ŷ) = V(A + Bx) = V(A) + x²V(B) + 2x·C(A, B) = σ²Σxi²/(nSxx) + x²σ²/Sxx − 2x·x̄·σ²/Sxx,

where

C(A, B) = C(Ȳ − Bx̄, B) = E(BȲ − B²x̄) − E(Ȳ − Bx̄)E(B)
        = E(BȲ) − x̄E(B²) − αβ
        = β(α + βx̄) − x̄(σ²/Sxx + β²) − αβ
        = −x̄σ²/Sxx.

The interested reader is asked to show that C(B, Ȳ) = 0 and thus E(BȲ) = E(B)E(Ȳ). By substituting (Ȳ − Bx̄) for A in the equation V(Ŷ) = V(A + Bx), we have

V(Ŷ) = V(Ȳ − Bx̄ + Bx) = V[Ȳ + (x − x̄)B] = σ²/n + (x − x̄)²σ²/Sxx = σ²[1/n + (x − x̄)²/Sxx].

Notice that the variance is a minimum at x = x̄ and increases as |x − x̄| increases. When V(Ŷ) is viewed as a function of x,

V′(Ŷ) = 2(x − x̄)σ²/Sxx = 0 when x = x̄;
V″(Ŷ) = 2σ²/Sxx > 0 ⇒ relative minimum.

By equating the two derivations,

σ²[1/n + (x − x̄)²/Sxx] = σ²Σxi²/(nSxx) + x²σ²/Sxx − 2x·x̄·σ²/Sxx.

The interested reader is asked to show the equivalency in the problems.

RV Ŷ = A + Bx is distributed as N(α + xβ, σ²[1/n + (x − x̄)²/Sxx]);

RV (Ŷ − α − βx) / √(σ²[1/n + (x − x̄)²/Sxx]) is standard normal;

RV (Ŷ − α − βx) / √(S²[1/n + (x − x̄)²/Sxx]) is distributed t(n−2).
EXAMPLE 8.13
Find a 95% confidence interval for E(Y | x = 4) from the data in Example 8.11 repeated below.
Solution. s² = 0.2.

X:  1  2  3  4  5   6
Y:  3  5  6  9  10  12
Ŷ = 1.2 + 1.8x, with n = 6; x̄ = 3.5; Sxx = 17.5; SSError = 0.8.

(A + Bx) ± t(n−2, α/2)·√(S²[1/n + (x − x̄)²/Sxx])
  = 1.2 + 1.8·4 ± 2.776·√((0.8/(6 − 2))·[1/6 + (4 − 3.5)²/17.5])
  = 8.4 ± 0.527.
E(Y | x = 4) ∈ (7.87, 8.93) with 95% confidence. The command (ciY x y x0 a-level) returns a (100 − a-level)% confidence interval for Y | x. For example, (ciY '(1 2 3 4 5 6) '(3 5 6 9 10 12) 4 5) returns (7.873 8.927), or (8.4 ± 0.527).
What would be a suitable prediction interval for just a single predicted value of Y at x? Since RV Y is distributed N(α + βx, σ²), look at RV Y − Ŷ, the residual at Y | x. Ŷ is obtained upon repeated samples at x, where Y is the actual ordinate at x.

E(Y − Ŷ) = E(Y) − E(Ŷ) = α + βx − α − βx = 0;
V(Y − Ŷ) = V(Y) + V(Ŷ) = σ² + σ²[1/n + (x − x̄)²/Sxx] = σ²[1 + 1/n + (x − x̄)²/Sxx].
Then a prediction interval about Y at x is

Y | x ∈ A + Bx ± t(α/2, n−2)·S·√(1 + 1/n + (x − x̄)²/Sxx).
In Figure 8.6 and Figure 8.7, observe that the mean error is the difference between the mean Yˆ and the true mean a + bx while the predicted error is between the mean Yˆ and the value of Y at x.
Figure 8.6 Mean Error; Figure 8.7 Prediction Error (sketches comparing Ŷ = A + Bx with the true mean E(Y) = α + βx at a point xp).

EXAMPLE 8.14
Find the expected value, a 90% confidence interval, and a 90% prediction interval for Y at x = 3.5, given the data from Example 8.13, from which Ŷ = 1.2 + 1.8x.

X:  1  2  3  4  5   6
Y:  3  5  6  9  10  12
Solution

SxY = ΣxY − n·x̄·Ȳ = 189 − 6·3.5·7.5 = 31.5.
Sxx = Σx² − n·x̄² = 91 − 73.5 = 17.5.

The regression line is Ŷ = 1.2 + 1.8x, and thus E(Y | x = 3.5) = 1.2 + 1.8·3.5 = 7.5. A 90% confidence interval for the mean Y at x = 3.5 is

E(Y | x) ∈ (A + Bx) ± t(α/2)·S·√(1/n + (x − x̄)²/Sxx)
         ∈ 7.5 ± 2.132·0.447·√(1/6 + (3.5 − 3.5)²/17.5)
         ∈ 7.5 ± 0.39 = (7.11, 7.89).

A 90% prediction interval about RV Y|x=3.5 is

Y|x=3.5 ∈ A + Bx ± t(α/2)·S·√(1 + 1/n + (x − x̄)²/Sxx)
        ∈ 7.5 ± 2.132·0.447·1.08
        ∈ 7.5 ± 1.03 = (6.47, 8.53).
Figure 8.8 Prediction Limits Versus Confidence Limits: the 90% prediction limits (6.47, 8.53) bracket the 90% confidence limits (7.11, 7.89), centered at E(Y | x = 3.5) = 7.5.
Observe that the prediction interval is wider than the confidence interval (Figure 8.8), because the prediction interval is an interval for a future response (random variable), whereas a confidence interval is an interval for a mean response (constant).
The command (ciYp x y x0 a-level) returns a (100 - a-level)% confidence interval for a predicted value of Y at x = x0. (ciYp '(1 2 3 4 5 6) '(3 5 6 9 10 12) 4 5) returns (8.4 ± 1.35). The command (sYm x y x0) returns the standard error of the mean value of Y at x = x0. At x0 = 4, (sYm '(1 2 3 4 5 6) '(3 5 6 9 10 12) 4) returns 0.190. (sYp x y x0) returns the standard error of the prediction of Y for a single value of x = x0. At x0 = 4, (sYp '(1 2 3 4 5 6) '(3 5 6 9 10 12) 4) returns 0.486.
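The interval arithmetic above is easy to reproduce outside the book's package. The following is a minimal sketch in plain Common Lisp (none of the book's commands are used; the variable names and the hand-entered t-quantile are ours):

(let* ((x '(1 2 3 4 5 6))
       (y '(3 5 6 9 10 12))
       (n (length x))
       (xbar (/ (reduce #'+ x) n))
       (ybar (/ (reduce #'+ y) n))
       (sxx (- (reduce #'+ (mapcar #'* x x)) (* n xbar xbar)))
       (b (/ (- (reduce #'+ (mapcar #'* x y)) (* n xbar ybar)) sxx))
       (a (- ybar (* b xbar)))
       (s (sqrt (/ (reduce #'+ (mapcar (lambda (xi yi)
                                         (expt (- yi a (* b xi)) 2))
                                       x y))
                   (- n 2))))
       (t90 2.132)            ; t(4, 0.05) for 90% intervals, from tables
       (x0 3.5)
       (h (+ (/ 1 n) (/ (expt (- x0 xbar) 2) sxx))))
  (list :point (float (+ a (* b x0)))        ; E(Y | x0) = 7.5
        :ci-half (* t90 s (sqrt h))          ; 0.39, mean response
        :pi-half (* t90 s (sqrt (+ 1 h)))))  ; 1.03, single prediction

Evaluating the form returns the point estimate 7.5 with half-widths of about 0.39 and 1.03, matching Example 8.14.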
EXAMPLE 8.15
The command (lr-stats x y) prints many of the regression computations. For example, (lr-stats '(1 2 3 4 5 6) '(3 5 6 9 10 12)) prints

Σx = 21  ΣY = 45  Σx² = 91  ΣY² = 395  ΣxY = 189
Ŷ = 1.2 + 1.8x
Sxx = 17.5000  Sxy = 31.5  SYY = 57.5
SSerror = 0.8  s² = 0.2  Explained Variation = 56.7
R² = 0.9861
For b = 0, (t = 16.837, p-value = 7.292e-5)
For a = 0, (t = 2.8823, p-value = 0.045)
sa = 0.416334  sb = 0.1069
(1.5038, 2.0962) or 1.8 ± 0.2962, 95% Conf Interval for b
(0.0466, 2.3534) or 1.2 ± 1.1534, 95% Conf Interval for a
Residuals are: -0.000 0.200 -0.600 0.600 -0.200 0.000
The yhats are (3.0 4.8 6.6 8.4 10.2 12.0)
The b coefficients in Y's are -0.1429 -0.0857 -0.0286 0.0286 0.0857 0.1429
The a coefficients in Y's are 0.6667 0.4667 0.2667 0.0667 -0.1333 -0.3333
(6.9942, 8.0058) or 7.5 ± 0.5058, 95% Confidence Interval for Ŷ at x = x̄
(6.1618, 8.8382) or 7.5 ± 1.3382, 95% Conf Interval for Y-Predict at x = x̄
F-ratio = Explain/(Error/4) = 283.4989
ANALYSIS OF VARIANCE

Source            SS       DF    MS       F         p-value
Regression        56.700    1    56.700   283.499   0.0001
Residual Error     0.800    4     0.200
Total             57.500    5

8.4 Variation

The quantities in regression may seem similar at first reading, and the notation can be confusing. The true value of Y is given by α + βx; but the observed Yi has a random component and thus is equal to α + βxi + ei. The observed
Yi fluctuate about the regression line because of the random error component e. That is, Y = Ŷ + (Y − Ŷ), which is the mean predicted value plus the error. Ŷ is equal to A + Bx and is the mean y-value exactly on the regression line, meaning that the expected value of the observed Y is Ŷ. RV Ȳ is the mean of the observed y-values. The three RV Y-values of interest are then Y, Ŷ, and Ȳ. The square of the differences between any two of these RVs is a source of variation. The three variations (shown in Figure 8.9) are

Σ(Y − Ȳ)², the total variation,
Σ(Ŷ − Ȳ)², the explained variation due to the linear regression relation, and
Σ(Y − Ŷ)², the unexplained variation, the sum of the squares of the residuals.

The relationship among the three variations is Total Variation = Explained Variation + Unexplained Variation:

Σ(Y − Ȳ)² = Σ(Ŷ − Ȳ)² + Σ(Y − Ŷ)².    (8–24)
Figure 8.9 Three Variations (the vertical gaps Y − Ȳ, Y − Ŷ, and Ŷ − Ȳ at a point Yi about the fitted line).
We drop the summation subscripts and derive (8–24), starting with the identity

(Y − Ȳ) = (Y − Ŷ) + (Ŷ − Ȳ),
(Y − Ȳ)² = [(Y − Ŷ) + (Ŷ − Ȳ)]²,
Σ(Y − Ȳ)² = Σ(Y − Ŷ)² + 2Σ(Y − Ŷ)(Ŷ − Ȳ) + Σ(Ŷ − Ȳ)².

We show that 2Σ(Y − Ŷ)(Ŷ − Ȳ) = 0 to complete the derivation. Substituting A + Bx for Ŷ in the expression 2Σ(Y − Ŷ)(Ŷ − Ȳ) yields

2Σ(Y − A − Bx)(A + Bx − Ȳ) = 2AΣ(Y − A − Bx) + 2BΣ(xY − Ax − Bx²) − 2ȲΣ(Y − A − Bx)
                            = 2A·0 + 2B·0 − 2Ȳ·0 = 0

from the normal equations (8–9) and (8–10).
The commands (Syy y) returns the total variation; (Sexplain x y) returns the explained variation; and (SSerror x y) returns the unexplained variation, where x and y are lists of data points. For example, the command (Syy '(3 5 9)) returns 18.67.
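The decomposition (8–24) can also be verified numerically. Here is a minimal sketch in plain Common Lisp (independent of the book's package; the names are ours) for the data of Example 8.13:

(let* ((x '(1 2 3 4 5 6))
       (y '(3 5 6 9 10 12))
       (n (length x))
       (xbar (/ (reduce #'+ x) n))
       (ybar (/ (reduce #'+ y) n))
       (b (/ (- (reduce #'+ (mapcar #'* x y)) (* n xbar ybar))
             (- (reduce #'+ (mapcar #'* x x)) (* n xbar xbar))))
       (a (- ybar (* b xbar)))
       (yhat (mapcar (lambda (xi) (+ a (* b xi))) x))
       (syy (reduce #'+ (mapcar (lambda (yi) (expt (- yi ybar) 2)) y)))
       (sexplain (reduce #'+ (mapcar (lambda (yh) (expt (- yh ybar) 2)) yhat)))
       (sserror (reduce #'+ (mapcar (lambda (yi yh) (expt (- yi yh) 2)) y yhat))))
  (list (float syy) (float (+ sexplain sserror))))

The form returns (57.5 57.5); that is, the total variation 57.5 equals 56.7 + 0.8, the explained plus the unexplained variation.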
Coefficient of Determination

The coefficient of determination R² is defined as the ratio of the explained variation to the total variation SYY:

R² = Σ(Ŷ − Ȳ)² / Σ(Y − Ȳ)² = SExplain / SYY.    (8–25)

Since 0 ≤ R² ≤ 1, the value of R² indicates the proportion of the total variation explained by the linear model, that is, the gain in more accurate prediction of the Y-value than by random guessing. Recall that

ρ = C(X, Y) / √(V(X)·V(Y)) = σxy/(σx·σy).
The statistic R = √(R²) is referred to as the sample correlation coefficient and is an estimator of ρ when both the X and the Y variables are random, for example, the correlation between the observed Yi and the Ŷi. The value of R is used as an indicator of the appropriateness of the linear regression model
with the given data. Usually a value of R close to 1 or −1 indicates a good fit, but the correlation coefficient really only shows the improvement in using the x-values to predict a linear relationship with Y rather than using Ȳ as the estimate. Thus a low correlation value does not necessarily mean that there is no relationship between the RVs X and Y but that there is little linear relationship between the variables. Nor does a high correlation coefficient necessarily imply a causal relationship. R² tends to increase with smaller sample sizes. The correlation coefficient is often referred to as the index of fit and can also be expressed as

R = SxY / √(Sxx·SYY).    (8–26)

Note that the b-estimator B = SxY/Sxx is identical in sign and similar in form to R. The estimators for V(X), V(Y), and C(X, Y) are maximum likelihood estimators and are given, respectively, by Sxx/n, SYY/n, and SxY/n. The correlation between the Ŷi and the observed Yi also provides information as to fit.
The template (R-sq x y) returns R2, (rho x y) returns the correlation between x and Y or between Yˆ and Y, and (Sexplain x y) returns SExplain, the explained variation.
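Equation (8–25) and the square of (8–26) must agree. A small plain Common Lisp sketch, again independent of the book's package, checks this for the running data set:

(let* ((x '(1 2 3 4 5 6))
       (y '(3 5 6 9 10 12))
       (n (length x))
       (xbar (/ (reduce #'+ x) n))
       (ybar (/ (reduce #'+ y) n))
       (sxx (- (reduce #'+ (mapcar #'* x x)) (* n xbar xbar)))
       (syy (- (reduce #'+ (mapcar #'* y y)) (* n ybar ybar)))
       (sxy (- (reduce #'+ (mapcar #'* x y)) (* n xbar ybar))))
  (list :r-sq (float (/ (* sxy sxy) (* sxx syy)))   ; square of (8-26)
        :r (/ sxy (sqrt (* sxx syy)))))             ; index of fit

The form returns R² = 0.9861 and R = 0.9930, agreeing with SExplain/SYY = 56.7/57.5.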
8.5 Residual Analysis

Before performing any linear regression, always plot the points in a scatter plot to see if there is a linear component present and to detect other obvious patterns. After performing the regression, the adequacy of the assumptions made for linear regression can be tested. These tests focus on analysis of the residuals (Y − Ŷ). To standardize the residuals, we divide (Y − Ŷ) by S, where

S² = SSError/(n − 2).

We expect the standardized residuals

(Yi − Ŷi)/S = Ei/S
to be independent unit normal random variables, with approximately 95% of the values within two standard deviations of the zero mean. There should be no obvious patterns in the plot of the standardized residuals. We can examine the assumptions with a plot of Yi versus Ŷi as well as by normal probability plots of the standardized residuals. We plot the standardized residuals for three contrived examples with x being the integers from 1 to 30.

1) Y = 3 + 5x + ei. Regression yielded Ŷ = 6.1 + 5.1x, R² = 0.92, with the ei random components from N(0, 16) added to the deterministic component. Table 8.1 shows that the plot of the standardized residuals indicates appropriate assumptions.
Table 8.1 Residual Analysis (plots of the standardized residuals omitted)

Equation: Y = 3 + 5x + E. Fit: Ŷ = 6.1 + 5.1x with R² = 0.92. Comment: A random component from N(0, 16) has been added to a straight line to show the appearance of standardized residuals when the assumptions of linear regression are adequate.

Equation: Y = −2x². Fit: Ŷ = 330.7 − 62x. Comment: The high 0.94 R² is the % reduction in total variation. The linear model, though inappropriate, indicates the improvement over using Ȳ. The residuals show the quadratic pattern.

Equation: Y = 3 + 5x + E with 1 outlier value. Comment: The last Y value of 153 was replaced by the value 5 to indicate the effects of an outlier. The R² value changed from 0.92 to 0.59, the error term s² from 4792 to 25323, and the line from 6.1 + 5.1x to 16.3 + 4.1x with the one outlier value.
2) Y = −2x². Inappropriate regression yielded Ŷ = 330.66 − 62x with R² = 0.94. The presence of the quadratic component still produced a high R², which does not necessarily indicate that the fit is appropriate. Neither does a high R² necessarily indicate a steep slope.

3) Y = 3 + 5x with an outlier y-value of 5 instead of 153 for x = 30. The equation changed from a perfect fit of 3 + 5x to 12.9 + 4x, R² changed from 1 to 0.65, and the error term from 0 to 683.
The template (residuals x y) returns a list of the residuals. For example, (residuals '(1 2 3) '(3 5 9)) returns (1/3 -2/3 1/3), where Ŷ = -1/3 + 3x. The template (Ei x y) returns a list of the standardized residuals (Yi − Ŷi)/s. For example, (Ei '(1 2 3) '(3 5 9)) returns (0.408 -0.816 0.408). (pro (residuals x y)) returns a plot of ordered residuals. For Example 2, (setf y-data (repeat #'- (sim-normal 0 4 30) (repeat #'* (list-of 30 -2) (repeat #'square (upto 30))))) adds a normal error component to Y = −2x², resulting in the following sample: (-0.4 -6.0 -21.0 -31.8 -55.2 -64.6 -92.7 -129.7 -158.0 -208.4 -242.4 -288.4 -334.3 -389.1 -451.4 -508.3 -579.1 -644.4 -720.6 -800.6 -882.3 -968.9 -1062.4 -1150.6 -1246.6 -1357.3 -1463.4 -1562.3 -1678.5 -1803.3). The Minitab plot of ordered residuals shows a quadratic factor present in the data (Figure 8.10).
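For readers without the software, here is a minimal plain Common Lisp sketch of the standardized-residual computation on the same 3-point example (the helper names are ours):

(let* ((x '(1 2 3))
       (y '(3 5 9))
       (n (length x))
       (xbar (/ (reduce #'+ x) n))
       (ybar (/ (reduce #'+ y) n))
       (b (/ (- (reduce #'+ (mapcar #'* x y)) (* n xbar ybar))
             (- (reduce #'+ (mapcar #'* x x)) (* n xbar xbar))))
       (a (- ybar (* b xbar)))
       (res (mapcar (lambda (xi yi) (- yi a (* b xi))) x y))   ; residuals
       (s (sqrt (/ (reduce #'+ (mapcar (lambda (e) (* e e)) res))
                   (- n 2)))))
  (mapcar (lambda (e) (/ e s)) res))

The form returns approximately (0.408 -0.816 0.408), matching the (Ei ...) template output above.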
A violation of the constant-variance assumption for the error terms is shown in Figure 8.11. The plot of the residuals fans out, or grows with the x-values, indicating that V(Ei) is not constant (heteroscedastic instead of homoscedastic).

EXAMPLE 8.16
With the indicated data, a) find the regression line of Y on x. b) Assign residuals with the command (setf residuals (residuals x y)) and predict the regression line (y-hat residuals Y), (y-hat residuals x), and (y-hat x residuals). c) Predict the regression line (y-hat x x) and predict (r-sq x residuals) and (r-sq y residuals).
Figure 8.10 Ordered Residuals of Y = −2x² (residuals versus observation order; response is Y).
Figure 8.11 Heteroscedastic Variance (growing, fan-shaped standardized residuals).

Solution
(setf x (upto 12) Y '(5 6 7 8 10 11 12 13 15 17 18 19))
(mu y) → 11.75; (mu x) → 6.5; (R-sq x y) → 0.992916.

a) (y-hat x y) → Y-hat = 3.182 + 1.318x.
b) (setf residuals (residuals x y)) returned (0.5 0.181 -0.136 -0.454 0.227 -0.090 -0.409 -0.727 -0.045 0.636 0.318 0).
(y-hat residuals Y) → Y-hat = 11.75 + 1x, where A is Ȳ and B is 1.
(y-hat residuals x) → Y-hat = 6.5 + 0x, where A is x̄ and B is 0.
(y-hat x residuals) → Y-hat = 0 + 0x, where 0 is the mean of the residuals and B is 0.
(y-hat y residuals) → Y-hat = -0.0832 + 0.0071X, where R² = 1 − 0.0071.
c) (y-hat x x) → Y-hat = 0 + 1x, (r-sq x residuals) → 0, (r-sq y residuals) → 7.083771e-3 = 1 − (R-sq x y).
The command (xy-residuals n) prints the regression equations in Example 8.16, using n random samples from the integers 1 to 100 for both x and y.
Lack of Fit F-Test

Residual and probability plots are visual tests for the linear regression assumptions of common variance of independent normal random variables with zero mean. We now seek an analytical test for these assumptions. In order to perform a suitable hypothesis test, we need at least one level of input x-values with multiple y-values, called replicates (Table 8.2). The sum of squares error term SSError can be partitioned into two parts: 1) pure experimental or measurement error SSPure and 2) lack of fit error SSLOF. That is, SSError = SSPure + SSLOF. The pure sum of squares error can be calculated at each x-level using

Σj (Yij − Ȳi)², summed over j = 1 to ni,

where
Table 8.2 Replicated Y-values

Level   1                       2                  . . .   i                  . . .   k
X       x1                      x2                         xi                         xk
Y       Y11, Y12, . . . , Y1n1  Y21, . . . , Y2n2          Yi1, . . . , Yini          Yk1, . . . , Yknk
Ȳi = Σj Yij / ni, for j = 1 to ni.

The sum of the pure error sums is

Σi Σj (Yij − Ȳi)², for i = 1 to k and j = 1 to ni.

SSLOF, the sum of squares due to lack of fit, is attained by subtracting the pure error from the total sum of squares error SSError. The degrees of freedom v for SSError is n − 2; for SSPure, v = n − k; and for SSLOF, v = (n − 2) − (n − k) = k − 2. With k the number of x-levels and n the total sample size, a suitable F-statistic for testing

H0: The true regression model is linear versus H1: The true regression model is not linear

is

F = [SSLOF/(k − 2)] / [SSPure/(n − k)] = MSLOF/MSPure,

with the rejection region being F > F(α, k−2, n−k).
EXAMPLE 8.17
Find the lack of fit F-statistic to test the adequacy of the linear regression model for the following data (Figure 8.12).
Figure 8.12 Lack of Fit Test (scatter plot of the data).
ni:  3  3  1  2  6  1  2  3  1  3
x:   20 20 20 25 25 25 30 35 35 40 40 40 40 40 40 45 50 50 55 55 55 60 65 65 65
Y:   3 3.5 5 2 4 5 7 10.5 12 14 16 18 19 16 17 16.5 19 22.5 23 25 27 30 27 29 33
H0: The true regression model is linear vs. H1: The true regression model is not linear.

Observe that k = 10 levels of x, ni = (3 3 1 2 6 1 2 3 1 3) for i = 1 to 10, and n = Σ ni = 25.
(setf x '(20 20 20 25 25 25 30 35 35 40 40 40 40 40 40 45 50 50 55 55 55 60 65 65 65) Y '(3 3.5 5 2 4 5 7 10.5 12 14 16 18 19 16 17 16.5 19 22.5 23 25 27 30 27 29 33)) The command (SSerror x y) returns SSError = 106.07 with v = 23 (25 - 2) degrees of freedom. The command (Pure-error x y) returns (15 56.0833 (2.1666 4.6666 0 1.125 15.3333 0 6.125 8 0 18.6666)) indicating v = 15 df and SSPure = 56.08. The individual internal sums of squares are the sums of the squared deviations about the Y-means and are computed as
(2.1667 4.6667 0 1.125 15.3333 0 6.125 8 0 18.6667). For example, the 3 Y-values at x = 20 are 3, 3.5, and 5, with a mean of 3.8333 and sum of squared deviations 2.1667 = (3 − 3.8333)² + (3.5 − 3.8333)² + (5 − 3.8333)². The x-levels with only one Y value do not contribute to this sum of squares. The error term due to lack of fit is obtained by subtracting the pure error from the total error SSError. Thus SSLOF = 106.07 − 56.08 ≈ 50, with 8 degrees of freedom (k − 2). The F-statistic is computed as

F = [SSLOF/(k − 2)] / [SSPure/(n − k)] = (50/8) / (56.08/15) = 1.67, with p-value = 0.186.

Thus we cannot reject the hypothesis that the true regression model is linear. The command (test-beta x y) returns a 23 degrees of freedom t-value of 20.95, with p-value ≈ 0, affirming that β ≠ 0.
The template (anova-lof x y) returns an analysis of variance table for the explained and unexplained variation. For example, for the x and y data in Example 8.17, (anova-lof x y) returns
Source           SS        df    MS        F        p-value
Explained        2023.29    1    2023.29   438.71   0.000
Unexplained       106.07   23       4.61
  Lack of fit      50.00    8       6.25     1.67   0.186
  Pure error       56.08   15       3.74
Total            2129.36   24
8.6 Convertible Nonlinear Forms for Linear Regression

When it is obvious that the Y response is not linearly related to the input x-value, in certain cases a transformation of the data can result in appropriate use of the linear regression model. For example, consider the model Y|x = αβˣ·ε, estimated by Ŷ|x = ABˣ. Performing a natural logarithmic transformation yields

Ln Y = Ln α + x Ln β + Ln ε  and  Ln Ŷ = Ln A + x Ln B.

The Ln εi are assumed to be independent normal RVs with expected value 0 and constant variance σ². If we let W = Ln Ŷ, W becomes a linear function of x. We perform linear regression on W and x and then transform back to get our estimates for A and B for our original model. It should be recognized that finding suitable transformations of the data to attain a linear relationship can be a challenge. Some convertible forms are

Y = a·xᵇ·ε,  Y = a + b/x + ε,  1/Y = a + bx + ε,  1/Y = a + b/x + ε,  Y = e^(a + bx + ε).
Such forms are called intrinsically linear, as they can be transformed into linear forms. To illustrate this procedure, we use
Y = a·bˣ, corresponding to Y = 2·5ˣ, to simply generate 5 points, perform the transformation, do the regression, and transform back to get our original model.

EXAMPLE 8.18
Given the x and Y values generated by Y = 2·5ˣ, using the model Y = a·bˣ, find the corresponding regression estimates for A and B.

Solution. We transform the Y values into Ln Y values as shown and perform linear regression with the (x, Ln Y) data pairs: Ln Y = W = Ln A + (Ln B)x.

Y      X    Ln Y   X²    X Ln Y
10     1    2.3     1     2.3
50     2    3.9     4     7.8
250    3    5.5     9    16.5
1250   4    7.1    16    28.4
6250   5    8.7    25    43.5
Σ      15   27.5   55    98.5
The normal equations are:

n(Ln A) + (Ln B)Σxi = Σ Ln Yi ⇒ 5 Ln A + 15 Ln B = 27.5.
Σxi(Ln A) + (Ln B)Σxi² = Σxi Ln Yi ⇒ 15 Ln A + 55 Ln B = 98.5.

Solving simultaneously, 10 Ln B = 16 ⇒ Ln B = 1.6 ⇒ B = e^1.6 ≈ 5, and Ln A = 0.7 ⇒ A ≈ e^0.7 = 2. Thus the original model Y = 2·5ˣ is retrieved.
(repeat #'log list) returns the Ln of each number in the list. (Y-hat '(1 2 3 4 5) (repeat #'log '(10 50 250 1250 6250))) returns Ln Y = Ln A + (Ln B)x as Y-hat = 0.693 + 1.609x, from which A and B are recoverable as A = (exp 0.693) = e^0.693 ≈ 2 and B = (exp 1.609) = e^1.609 ≈ 5, yielding Y = 2·5ˣ.
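The transform-fit-back procedure of Example 8.18 can also be written directly in plain Common Lisp; the following is a sketch independent of the book's package (variable names are ours):

(let* ((x '(1 2 3 4 5))
       (y '(10 50 250 1250 6250))
       (w (mapcar #'log y))            ; W = Ln Y
       (n (length x))
       (xbar (/ (reduce #'+ x) n))
       (wbar (/ (reduce #'+ w) n))
       (ln-b (/ (- (reduce #'+ (mapcar #'* x w)) (* n xbar wbar))
                (- (reduce #'+ (mapcar #'* x x)) (* n xbar xbar))))
       (ln-a (- wbar (* ln-b xbar))))
  (list :a (exp ln-a) :b (exp ln-b)))

The form returns approximately A = 2.0 and B = 5.0, recovering the original model.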
8.7 Polynomial Regression

If the linear assumptions do not hold, we can adjust the model by adding square terms, cubic terms, etc. We may assume that our data can be fitted to a model described as
Y = β0 + β1x + β2x² + . . . + βrxʳ + ε.

The βs are called beta weights or regression coefficients and can be estimated from the data, similar to simple linear regression. To illustrate the procedure for r = 2, assume our model is Y = A + Bx + Cx², where the symbols A, B, and C are used to avoid subscripts. Then SSError = Σ(Y − A − Bx − Cx²)², and

∂SSError/∂A = Σ 2(Y − A − Bx − Cx²)(−1) = 0 when nA + BΣx + CΣx² = ΣY.
∂SSError/∂B = Σ 2(Y − A − Bx − Cx²)(−x) = 0 when AΣx + BΣx² + CΣx³ = ΣxY.
∂SSError/∂C = Σ 2(Y − A − Bx − Cx²)(−x²) = 0 when AΣx² + BΣx³ + CΣx⁴ = Σx²Y.

The three equations are the normal equations and can be continued for higher degrees, although the lowest degree polynomial that fits the data should be used.

nA + BΣx + CΣx² = ΣY.    (8–27)
AΣx + BΣx² + CΣx³ = ΣxY.    (8–28)
AΣx² + BΣx³ + CΣx⁴ = Σx²Y.    (8–29)

A simplified contrived example shows the procedure.

EXAMPLE 8.19
Consider the polynomial Y = 1 + 2x + 3x². That is, we know in advance that A = 1, B = 2, C = 3. We use the five x-values 0, 1, 2, −1, and −2 to create the data table below and use equations (8–27) to (8–29) to verify the coefficients.

Y    x    x²   x³   x⁴   xY    x²Y
1    0    0    0    0    0     0
6    1    1    1    1    6     6
17   2    4    8    16   34    68
2    -1   1    -1   1    -2    2
9    -2   4    -8   16   -18   36
Totals: 35   0   10   0   34   20   112

Solution

5A + 0B + 10C = 35
0A + 10B + 0C = 20
10A + 0B + 34C = 112

Notice that (A, B, C) = (1, 2, 3) satisfies all three equations.
(poly-regress x-data y-data degree) returns the polynomial regression equation corresponding to the degree of the polynomial. If degree is 2, a quadratic equation is returned. For example, (poly-regress '(0 1 2 -1 -2) '(1 6 17 2 9) 2) → Y-hat = 1 + 2X^1 + 3X^2, corresponding to Y-hat = 1 + 2x + 3x². Note that in trying to fit the data to a cubic, (poly-regress '(0 1 2 -1 -2) '(1 6 17 2 9) 3) returns Y-hat = 1 + 2X^1 + 3X^2 + 0X^3, again corresponding to Y-hat = 1 + 2x + 3x².
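For readers who want to see the normal equations (8–27) to (8–29) solved directly, here is a sketch in plain Common Lisp using Cramer's rule on the Example 8.19 data (det3 and the other helper names are ours, not the book's):

(defun det3 (m)
  ;; determinant of a 3x3 matrix given as a list of three row lists
  (destructuring-bind ((a b c) (d e f) (g h i)) m
    (+ (* a (- (* e i) (* f h)))
       (- (* b (- (* d i) (* f g))))
       (* c (- (* d h) (* e g))))))

(let ((x '(0 1 2 -1 -2))
      (y '(1 6 17 2 9)))
  (flet ((sx (k) (reduce #'+ (mapcar (lambda (xi) (expt xi k)) x)))
         (sxy (k) (reduce #'+ (mapcar (lambda (xi yi) (* (expt xi k) yi))
                                      x y))))
    (let* ((n (length x))
           (m (list (list n      (sx 1) (sx 2))    ; (8-27)
                    (list (sx 1) (sx 2) (sx 3))    ; (8-28)
                    (list (sx 2) (sx 3) (sx 4))))  ; (8-29)
           (rhs (list (sxy 0) (sxy 1) (sxy 2)))
           (d (det3 m)))
      (flet ((swap-col (j)   ; column j of m replaced by the right-hand side
               (mapcar (lambda (row r)
                         (let ((row (copy-list row)))
                           (setf (nth j row) r)
                           row))
                       m rhs)))
        (list (/ (det3 (swap-col 0)) d)       ; A
              (/ (det3 (swap-col 1)) d)       ; B
              (/ (det3 (swap-col 2)) d))))))  ; C

The form returns (1 2 3), recovering A, B, and C exactly.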
Since polynomial regression involves the powers of x, the regression coefficients are highly dependent (multicollinearity). To lessen this effect, the polynomial models are often formulated with the powers of the algebraic deviations.

EXAMPLE 8.20
Given the following x-y data, recover a) the quadratic polynomial regression coefficients and b) the regression coefficients when the model is formulated in the deviations (Xi − X̄).

Solution. Given the polynomial y = x² − 17x + 60, the x-y data pairs are

X:  0   1   2   3   4   5   6   7    8
Y:  60  44  30  18  8   0   -6  -10  -12
(setf x (upt0 8) y '(60 44 30 18 8 0 -6 -10 -12)) ; assign the x and y data
(setf deviations (repeat #'- x (list-of 9 (mu x)))) ; subtract x̄ from each xi

a) (poly-regress x y 2) returns Y-hat = 60 - 17X^1 + 1X^2.
b) (poly-regress deviations Y 2) returns Y-hat = 8 - 9X^1 + 1X^2.

Now note the correlations of x with x² and of the deviations with their squares: (rho x (repeat #'square x)) returns (r = 0.9621576), while (rho deviations (repeat #'square deviations)) returns (r = 0). Although R² is an excellent figure of merit for linear regression, it is not a perfect indicator. See the following example.

EXAMPLE 8.21
Compute R² from the (x, y) data set. Find the linear Ŷ values. Plot the residuals. Then use polynomial (quadratic) regression. Compare the Ŷ's from the linear regression with the Ŷ's from the quadratic regression.
Figure 8.13 Plot of Ordered Residuals (residuals versus observation order; response is Y).
Solution

(setf X '(25 282 462 605 726 831 926 1011 1090 1163 1231 1296)
      Y '(1356 1414 1469 1522 1572 1621 1667 1712 1756 1798 1839 1879))
(y-hat x y) → Y-hat = 1292.499 + 0.424X ; linear regression
(R-sq x y) → 0.9756 (may seem like a nice fit).

The linear regression Ŷ's are (Yhats x y) → (1303.1 1412.2 1488.6 1549.3 1600.6 1645.2 1685.5 1721.6 1755.1 1786.1 1815.0 1842.6). A plot of the residuals (Figure 8.13) reveals the presence of a quadratic factor. (polynomial-regress x y 2) returns Y-hat = 1351 + 0.173471X + 0.0001812X². Figure 8.14 shows the close fit. (yhats (list x (repeat #'square x)) y) returns the quadratic regression Ŷ's:
Figure 8.14 Fitted Quadratic Regression: Y = 1351.00 + 0.173471x + 0.0001811x², with S = 0.648346, R-Sq = 100.0%, R-Sq(adj) = 100.0%.
(1355.4 1414.3 1469.8 1522.2 1572.4 1620.2 1666.9 1711.5 1755.3 1797.7 1839.0 1880.0). (R-sq (list x (repeat #'square x)) y) returns 0.999979, an improved R². Polynomial regression is particularly precarious in extrapolating beyond the range of the data. The regression curve may fit nicely, for example, on the way up but horribly when the curve bends down beyond the data. A second-order polynomial can pass exactly through 3 points, a third order through 4 points, etc. Thus, the higher the degree of the fitted polynomial, the seemingly better the fit. However, the relation between y and the powers of x is camouflaged, and extrapolation becomes overly sensitive.
8.8 Multiple Linear Regression

Often the dependent criterion variable depends on two or more independent variables. For example, we may want to describe, control, or predict a
person's blood pressure (dependent variable) when given the person's weight and age (independent input variables). Added regression variables may account for more of the variability than just the lone prediction variable used in simple regression. However, it is highly unlikely, when 4 or 5 variables account for most of the variation, that an additional prediction variable will account for some of the remaining variation without being highly correlated with one or more of the predictor variables. Further, adding more predictor variables complicates analyzing the causal relationship between these variables and the response. Thus, an optimum number of predictor variables is sought. The general regression model for two prediction variables is given by Y = β0 + β1x1 + β2x2 + ε. The procedure is similar to simple regression. Following the procedures for differentiating the sum of squares errors with respect to the estimators, we arrive at the normal equations given by

nA + BΣx1 + CΣx2 = ΣY,    (8–30)
AΣx1 + BΣx1² + CΣx1x2 = Σx1Y,    (8–31)
AΣx2 + BΣx1x2 + CΣx2² = Σx2Y.    (8–32)
We show the procedure for multiple linear regression in the following simple example.
EXAMPLE 8.22
With model Y = A + Bx1 + Cx2, for A = -21, B = 3, and C = 3, we use x1-values 1, 2, and 3 and x2-values 5, 7, and 8 to generate the Y-values -3, 6, and 12 and complete the table.
Y    x1   x2   x1x2   x1²   x2²   x1Y   x2Y
-3   1    5    5      1     25    -3    -15
6    2    7    14     4     49    12    42
12   3    8    24     9     64    36    96
Total: 15   6   20   43   14   138   45   123
Our normal equations from equations (8–30) to (8–32) are:

3A + 6B + 20C = 15
6A + 14B + 43C = 45
20A + 43B + 138C = 123.

Notice again that (A, B, C) = (-21, 3, 3) satisfies all three equations. The procedure is similarly continued for more independent variables.
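A short plain Common Lisp sketch can build the sums in equations (8–30) to (8–32) from the raw data and confirm that (A, B, C) = (-21, 3, 3) satisfies them (the helper s is ours):

(let* ((x1 '(1 2 3))
       (x2 '(5 7 8))
       (y '(-3 6 12))
       (n (length y))
       (a -21) (b 3) (c 3))
  (flet ((s (&rest lists)   ; sum of elementwise products
           (reduce #'+ (apply #'mapcar #'* lists))))
    ;; each entry pairs a left-hand side with its right-hand side
    (list (list (+ (* n a) (* b (s x1)) (* c (s x2)))            (s y))
          (list (+ (* a (s x1)) (* b (s x1 x1)) (* c (s x1 x2))) (s x1 y))
          (list (+ (* a (s x2)) (* b (s x1 x2)) (* c (s x2 x2))) (s x2 y)))))

The form returns ((15 15) (45 45) (123 123)); each left side equals its right side.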
(mlr-solve x1 x2 y) prints the regression equations and returns the solution. (mlr-solve '(1 2 3) '(5 7 8) '(-3 6 12)) returns (-21 3 3).
EXAMPLE 8.23
a) By inspection, write the regression equation for the tabled data.
Y:   12   17    8    9    33
x1:  25   34    56   78   90
x2:  3    7     9    12   40
x3:  6    8.5   4    4.5  16.5
x4:  7    12    16   18   20
b) Use polynomial regression to fit the data below.
X:  6  6  2    3   5  7  9
Y:  0  0  240  72  0  0  72
Solution

a) (setf x-data '((25 34 56 78 90) (3 7 9 12 40) (6 8.5 4 4.5 16.5) (7 12 16 18 20)) Y-data '(12 17 8 9 33)) assigns the data to the variables x-data and y-data. The command (y-hat x-data y-data) returns Y-hat = 0 + 0X1 + 0X2 + 2X3 + 0X4; that is, Y = 2x3.

b) (setf x '(6 6 2 3 5 7 9) Y '(0 0 240 72 0 0 72))
(polynomial-regress x Y 4) returns Y-hat = 1260 - 852X^1 + 215X^2 - 24X^3 + 1X^4, an exact fit.
(polynomial-regress x y 2 'r-sq) → 0.927 (degree 2).
(polynomial-regress x y 3 'r-sq) → 0.977 (degree 3).
(polynomial-regress x y 4 'r-sq) → 1 (degree 4).
Notice that R² increases with the higher degree polynomial regression.
Multiple Linear Regression with Matrices

The normal equations for describing, predicting, or controlling Y with regression variables X1, . . . , Xk are generated by differentiating SSError = Σ(Yi − Ŷi)² with respect to the beta weights for the model Y = β0 + β1x1i + β2x2i + . . . + βkxki + Ei. We denote the beta estimates by Bi. That is,

SSError = Σ [Yi − (B0 + B1x1i + B2x2i + . . . + Bkxki)]², for i = 1 to n.

The normal equations from setting the partials of the error with respect to the Bi to zero are

n·B0 + Σx1i·B1 + Σx2i·B2 + . . . + Σxki·Bk = ΣYi
Σx1i·B0 + Σx1i²·B1 + Σx1ix2i·B2 + . . . + Σx1ixki·Bk = Σx1iYi
. . .
Σxki·B0 + Σxkix1i·B1 + Σxkix2i·B2 + . . . + Σxki²·Bk = ΣxkiYi,

where the subscript i varies from 1 to n, with n being the number of data inputs. In matrix notation we have XᵀXB = XᵀY, where (rows separated by semicolons)

X = |1 x11 x21 . . . xk1; 1 x12 x22 . . . xk2; . . . ; 1 x1n x2n . . . xkn|,  B = |B0; B1; . . . ; Bk|,  Y = |Y1; Y2; . . . ; Yn|.

Observe that XᵀX contains exactly the normal-equation coefficients:

XᵀX = |n Σx1i . . . Σxki; Σx1i Σx1i² . . . Σx1ixki; . . . ; Σxki Σxkix1i . . . Σxki²|.

From the matrix equation XB = Y we have XᵀXB = XᵀY, where Xᵀ is the transpose of matrix X. The transpose of a matrix has the columns of the matrix in its rows and the rows of the matrix in its columns. For example, if X = |1 2; 3 4|, then Xᵀ = |1 3; 2 4|. By finding the inverse of the matrix XᵀX, denoted by (XᵀX)⁻¹, we can then solve the equation for the beta estimators by matrix-multiplying (XᵀX)⁻¹ times XᵀY. That is,
XB = Y
XᵀXB = XᵀY
(XᵀX)⁻¹(XᵀX)B = (XᵀX)⁻¹XᵀY, or IB = B = (XᵀX)⁻¹XᵀY,

where I is the identity matrix. We designate the matrix C = (XᵀX)⁻¹Xᵀ, from which B = CY. Observe that XᵀX and its inverse are symmetric, so that the variance-covariance matrix

CCᵀ = (XᵀX)⁻¹Xᵀ[(XᵀX)⁻¹Xᵀ]ᵀ = (XᵀX)⁻¹XᵀX[(XᵀX)⁻¹]ᵀ = (XᵀX)⁻¹.

The covariance of B is equal to the covariance of CY, and the variance of B can be obtained by multiplying the main diagonal entries of (XᵀX)⁻¹ by σ², since C(Bi, Bi) = V(Bi). Armed with the variances of the beta coefficients, we can obtain confidence intervals for each βi and test for the significance of the regression. The residuals matrix R can be determined from R = Y − XB, and

S² = SSError/(n − k − 1),

where k is the number of x-variables and SSError is the sum of the squares of the residuals. Notice that for simple linear regression, k = 1. A simple illustrative example may help make these matrices clear.
EXAMPLE 8.24
Fit a matrix equation, solve for the beta weights, and write the matrices X, Xᵀ, XᵀX, Y, XᵀY, (XᵀX)⁻¹ = CCᵀ, B, C, the residual matrix R, SSError, and the sample variances for the regression estimators, given the following data.
Y    x1   x2   x1²   x2²   x1x2   x1Y   x2Y
1    0    2    0     4     0      0     2
3    1    4    1     16    4      3     12
-1   5    3    25    9     15     -5    -3
Total: 3   6   9   26   29   19   -2   11
The matrices are given as follows (rows separated by semicolons):

X = |1 0 2; 1 1 4; 1 5 3|,  Xᵀ = |1 1 1; 0 1 5; 2 4 3|,  XᵀX = |3 6 9; 6 26 19; 9 19 29|,  Y = |1; 3; -1|.

Observe that our XᵀX matrix contains the coefficients of the normal equations

3A + 6B + 9C = 3
6A + 26B + 19C = -2
9A + 19B + 29C = 11,

with

XᵀY = |1 1 1; 0 1 5; 2 4 3|·|1; 3; -1| = |3; -2; 11|

representing the right-hand side of the equations. Then

(XᵀX)⁻¹ = |4.851 -0.037 -1.481; -0.037 0.074 -0.037; -1.481 -0.037 0.518| = |131/27 -1/27 -40/27; -1/27 2/27 -1/27; -40/27 -1/27 14/27|.

Note the symmetry. The B matrix of regression coefficients is

B = (XᵀX)⁻¹XᵀY = |131/27 -1/27 -40/27; -1/27 2/27 -1/27; -40/27 -1/27 14/27|·|3; -2; 11| = |-5/3; -2/3; 4/3|,

yielding the beta estimates (B0, B1, B2) = (-1.67, -0.67, 1.33). Observe that

C = (XᵀX)⁻¹Xᵀ = |17/9 -10/9 2/9; -1/9 -1/9 2/9; -4/9 5/9 -1/9|.
CCᵀ = |17/9 -10/9 2/9; -1/9 -1/9 2/9; -4/9 5/9 -1/9|·|17/9 -1/9 -4/9; -10/9 -1/9 5/9; 2/9 2/9 -1/9| = |131/27 -1/27 -40/27; -1/27 2/27 -1/27; -40/27 -1/27 14/27| = (XᵀX)⁻¹.

R = Y − XB = |1; 3; -1| − |1 0 2; 1 1 4; 1 5 3|·|-5/3; -2/3; 4/3| = |0; 0; 0|.
SSError = 0, since 3 noncollinear points uniquely determine a plane. The variance estimators are the main diagonal elements of the CCᵀ matrix multiplied by s², which is an unbiased estimator for σ²: (4.851, 0.074, 0.518)·0 = (0, 0, 0).

The following commands generate the matrices used in multiple linear regression.

(setf x-data '((0 1 5 7)(2 4 3 8)) y-data '(1 3 -1 9) x-values '(2 7) alpha 0.05) assigns the x and y data, the x1 and x2 values needed for predicted y-values, and the a level of significance.
(X-matrix x-data) returns the matrix X, #2A((1 0 2)(1 1 4)(1 5 3)(1 7 8)).
(Y-matrix y-data) returns the matrix Y, #2A((1)(3)(-1)(9)).
(Xt x-data) returns the matrix Xᵀ, #2A((1 1 1 1)(0 1 5 7)(2 4 3 8)).
(XtX x-data) returns the matrix XᵀX, #2A((4 13 17)(13 75 75)(17 75 93)).
(XtY x-data y-data) returns the matrix XᵀY, #2A((12)(61)(83)).
(inverse (XtX x-data)) returns (XᵀX)⁻¹, #2A((225/193 11/193 -50/193)(11/193 83/1158 -79/1158)(-50/193 -79/1158 131/1158)).
(Beta-estimates x-data y-data) returns the Bi's, (-4.03627 -0.606218 2.119171).
(SSerror x-data y-data) returns SSError, 1.523315.
(Sexplain x-data y-data) returns SExplained, 54.476684.
(R-sq x-data y-data) returns R², the coefficient of determination, 0.972798.
(B-matrix x-data y-data) returns the B matrix of beta coefficients, #2A((-4.036268)(-0.606218)(2.119171)).
(C-matrix x-data) returns the C matrix, #2A((125/193 36/193 130/193 -98/193)(-46/579 -167/1158 122/579 5/386)(-19/579 145/1158 -151/579 65/386)).
(R-matrix x-data y-data) returns the residuals, #2A((0.797926)(-0.834198)(-0.290156)(0.326424)).
(Residuals x-data y-data) displays the actual Yi versus Ŷi:

Actual     Predict    Residual
1.0000     0.20207    0.79793
3.0000     3.8342    -0.83420
-1.0000   -0.70985   -0.29015
9.0000     8.6736     0.32643
(s^2 x-data y-data) returns S², the estimate of σ², 1.523315.
(s^2-y0 x-data y-data x-values) returns V(Ypredict) given the xi values, 2.569115.
(y-predict x-data y-data x-values) returns the predicted y-value, 9.585492.
(ci-mlr x-data y-data alpha) returns 100(1 − α)% confidence intervals for the regression parameters:
-4.03627 ± 12.95041
-0.60622 ± 3.21111
2.11917 ± 4.03415.
(test-betas x-data y-data) returns a list of t- and p-values for testing H0: βi = 0 vs. H1: βi ≠ 0:

Predictor   Coef     SE Coef   T-statistic   P-values
X0          -4.036   1.333     -3.029        0.2030
X1          -0.606   0.330     -1.835        0.3177
X2           2.119   0.415      5.105        0.1231
(F-ratio-mlr x-data y-data) returns the value of the F-statistic and p-value in testing

F-statistic = [SExplained/k] / [SSError/(n − k − 1)] = (F = 17.881, p-value = 0.16493).

(Regress-anova x-data y-data) returns the analysis of variance table.
ANALYSIS OF VARIANCE

Source   SS       DF   MS       F        p-value
Model    54.477    2   27.238   17.881   0.164930
Error     1.523    1    1.523
Total    56.000    3
(y-hat x-data y-data) returns the regression equation, Y-hat = -4.0363 - 0.6062X1 + 2.1192X2. (display-mlr x-data y-data) prints many of the above outputs to a file.
8.9 Multiple Regression Techniques

Among the techniques for multiple regression analysis are forward selection, backward elimination (selection), and stepwise selection, based on criteria of choice. Among the criteria for variable selection are Mean Square Error (MSE), coefficient of determination (R²), Prediction Sum of Squares (PRESS), Mallow statistic (CP), and variance inflation factors (VIF). We briefly discuss techniques and criteria for variable selection, stating that choosing the appropriate model is an art assisted by some science. Further, the elimination of candidate variables is usually done when the list is considerable. However, if the analysis is to uncover the effects of a certain variable on the response variable, then elimination techniques cannot be used on that variable.
Forward Selection

In forward selection we consider adding candidate variables to the model one at a time until the criteria of choice do not improve with the added variable. Once a variable is added and significantly improves the model, it is never retracted. Suppose our criterion of choice is R² and that we have 3 candidate variables x1, x2, and x3. We compute R² for each single model xi and Y and choose the highest-R² regressor. If it is significantly better than using Ŷ = Ȳ, the regressor is kept and the procedure is repeated by considering the 2-variable models.

EXAMPLE 8.25
Analyze the following data set using R² and MSE criteria with the R-sq and MSE-r software commands.

Solution
The command (ex8.25) assigns the data to the xi and Y.
Y:   90 91 94 96 97 99 103 105 109 111 116 119 123 125 125 130 132 131 134 137
x1:  42 45 45 47 47 48 48  48  49  50  50  50  51  51  52  52  53  56  56  59
x2:  43 44 46 46 48 48 48  50  51  51  52  52  54  54  54  54  55  55  58  60
x3:  43 43 43 44 45 45 50  50  50  50  52  53  53  56  57  59  59  59  61  63
x4:  27 31 37 38 38 40 41  42  43  43  45  45  46  46  47  50  58  61  62  69
x5:  2  2  4  5  11 12 13  16  19  20  21  25  30  30  32  35  35  39  64  66
1) First examine all 1-regressor models.

(R-sq x1 y) → 0.874   (MSE-r (list x1) y) → 33.12
(R-sq x2 y) → 0.942   (MSE-r (list x2) y) → 15.17
(R-sq x3 y) → 0.969   (MSE-r (list x3) y) → 8.18
(R-sq x4 y) → 0.835   (MSE-r (list x4) y) → 43.59
(R-sq x5 y) → 0.851   (MSE-r (list x5) y) → 38.99

Choose x3, since it has the highest R² value, 0.969, and the lowest MSE value, 8.18.

2) Next look at all 2-regressor models with variable x3 present.

(R-sq (list x3 x1) y) → 0.970   (MSE-r (list x3 x1) y) → 8.36
(R-sq (list x3 x2) y) → 0.977   (MSE-r (list x3 x2) y) → 6.36
(R-sq (list x3 x4) y) → 0.969   (MSE-r (list x3 x4) y) → 8.63
(R-sq (list x3 x5) y) → 0.969   (MSE-r (list x3 x5) y) → 8.66

Now choose the model with regressors x3 and x2, since the MSE was reduced to 6.36 and the R² value increased to 0.977.

3) Then seek all 3-regressor models with variables x3 and x2 present.

(R-sq (list x3 x2 x1) y) → 0.978   (MSE-r (list x3 x2 x1) y) → 6.41
(R-sq (list x3 x2 x4) y) → 0.979   (MSE-r (list x3 x2 x4) y) → 6.22
(R-sq (list x3 x2 x5) y) → 0.981   (MSE-r (list x3 x2 x5) y) → 5.49

Notice there is little improvement in the 3-regressor models, and the final choice is to remain with the two-variable model x3 and x2 or accept x2, x3, and x5. The command (Y-hat (list x2 x3) y) returns Ŷ = -28.42 + 1.14X2 + 1.62X3, while (Y-hat (list x2 x3 x5) y) → Ŷ = -60.298 + 1.714X2 + 1.753X3 - 0.200X5. Take a look at (mlr-stats '(x1 x2 x3 x4 x5) y) for all-regressor models.
Backward Elimination

In this procedure, all the candidate regressor variables are used for the initial model, and variables are eliminated one at a time to determine any improvement. When no further improvement results, the previous model is selected.
EXAMPLE 8.26
Use the backward elimination technique to fit the best regressor equation from the data of Example 8.25.

Solution

Step 1) (R-sq (list x1 x2 x3 x4 x5) y) returns 0.982; (MSE-r (list x1 x2 x3 x4 x5) y) returns 6.176.

Step 2)
(R-sq (list x1 x2 x3 x4) y) returns 0.979; (MSE-r (list x1 x2 x3 x4) y) returns 6.635.
(R-sq (list x1 x2 x3 x5) y) returns 0.982; (MSE-r (list x1 x2 x3 x5) y) returns 5.822.
(R-sq (list x1 x2 x4 x5) y) returns 0.943; (MSE-r (list x1 x2 x4 x5) y) returns 18.062.
(R-sq (list x1 x3 x4 x5) y) returns 0.972; (MSE-r (list x1 x3 x4 x5) y) returns 8.945.
(R-sq (list x2 x3 x4 x5) y) returns 0.982; (MSE-r (list x2 x3 x4 x5) y) returns 5.773.
We eliminate regressor variables X1 and X4, since without each we have a higher R² and a lower MSE.

Step 3) Continuing, we arrive at the same conclusion as forward selection: regressor variables x3 and x2, or x2, x3, and x5, are adequate.
Model Variables Selection Criteria

Among the criteria for variable selection are the Mean Square Error (MSE), R², the Prediction Sum of Squares (PRESS), the Mallow CP statistic, and the variance inflation factors (VIF). If our purpose is to predict a response Y, then we seek the minimum number of variables adequate for the prediction. However, if our purpose is to determine the effects of certain candidate variables on a designated response, we must then use and determine the effects of all the candidate variables. Since we are already familiar with the MSE and R², we look at the PRESS, Mallow CP, and VIF criteria.

PRESS

PRESS is an acronym for prediction sum of squares. This statistic can be used to distinguish among models formed with a different number of candidate predictor variables. To calculate the PRESS statistic,

1) Remove the first data point, compute the beta estimates from the remaining data points i = 2, 3, . . . , n, predict the first data point Ŷp1, and compute the residual Ŷp1 − Y1.
2) Repeat this operation for each of the remaining data points to get a list of predicted values. The predicted residual is Yi − Ŷpi.
3) The PRESS statistic is the sum of the squares of the predicted residuals. A model with a low PRESS value and acceptable R² is sought.
PRESS = Σ (Ŷpi,−i − Yi)², for i = 1 to n.
Consider the following data set assigned by the command (Press.ex).
Y:   12   -26   43   92   27   41   57   3    30   -5
x1:  20   28    28   28   23   25   26   21   23   23
x2:  17   15    16   22   21   17   20   18   21   17
x3:  34   33    31   31   34   35   33   29   35   35
x4:  25   35    27   25   29   26   28   30   32   29
Compute the beta estimates by first removing the first data entry 12 from Y, and the data entries 20, 17, 34, and 25 from the x1, x2, x3, and x4 rows, respectively. Using the command (beta-estimates (repeat #'rest x-data) (rest y-data)), we get the beta estimates (-35.71 5.01 7.83 -0.69 -6.31). From these estimates, Ŷp1 = B0 + B1x1 + B2x2 + B3x3 + B4x4, or Ŷp1 = -35.71 + 5.01·20 + 7.83·17 − 0.69·34 − 6.31·25 = 16.39, and the first predicted residual is computed as Y1 − Ŷp1 = 12 − 16.39 = -4.39. The procedure is repeated for the remaining residuals. The PRESS statistic is the sum of the squares of these predicted residuals. The following is an all-models display of the beta estimates with the computed PRESS statistics.
Variables          β̂0        β̂1     β̂2     β̂3      β̂4      PRESS
x1, x2, x3, x4    -48.43     5.24    8.02   -0.76    -6.12    2655.97
x1, x2, x3       -212.83     4.67    9.66   -1.57     —       13439.65
x1, x2, x4        -74.88     5.35    8.02    —       -6.17    1855.46
x1, x3, x4        140.57     4.95    —      -0.71    -7.38    12681.38
x2, x3, x4        124.50     —       7.74   -2.32    -5.69    8067.39
x1, x2           -270.37     4.89    9.67    —        —       9875.72
x1, x3            -17.66     4.15    —      -1.72     —       23185.99
x1, x4            115.73     5.06    —       —       -7.42    11710.05
x2, x3            -47.35     —       9.30   -2.92     —       13839.91
x3, x4            297.97     —       —      -2.19    -6.93    12676.55
x1                -80.28     4.39    —       —        —       15016.26
x2               -143.75     —       9.30    —        —       9699.67
x3                123.79     —       —      -2.92     —       17935.64
x4                228.42     —       —       —       -7.03    9350.25
We observe that the full model (x1, x2, x3, x4) and the reduced model (x1, x2, x4) have the best PRESS statistics, shown in bold.
The command (Print-PRESS x-data y-data) prints the beta estimates after each Yi is removed and returns the PRESS statistic; for each deletion it prints the beta estimates, the computed predicted value Y-Predict, the observed Y, and the difference between the two values.
B0       B1     B2     B3      B4      Ypredict   Yobs     Ypredict-Yobs
-35.71   5.01   7.83   -0.69   -6.31    16.352     12.00     4.35
-91.76   6.12   6.96   -0.65   -4.72    -2.715    -26.00    23.28
-69.35   4.86   8.52   -0.38   -5.86    33.090     43.00    -9.91
-50.11   6.11   9.22   -1.44   -6.72   110.978     92.00    18.98
-70.44   5.03   8.76   -0.46   -5.94    41.198     27.00    14.20
-41.23   5.03   8.47   -1.53   -5.64    28.382     41.00   -12.62
-41.18   5.10   7.82   -0.80   -6.10    50.581     57.00    -6.42
-158.66  6.60   8.14    1.90   -6.65   -17.982      3.00   -20.98
38.78    5.60   6.47   -1.86   -7.30     4.777     30.00   -25.22
-56.33   5.07   7.67   -0.14   -6.14     7.571     -5.00    12.57
PRESS = 2655.9424
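The leave-one-out idea is easy to state in code. The following plain Common Lisp sketch computes PRESS for the simple one-regressor case (the table above uses four regressors, which requires the matrix machinery of Section 8.8; slr-fit and press-slr are our names, not the book's):

(defun slr-fit (x y)   ; least-squares (A B) for Y-hat = A + Bx
  (let* ((n (length x))
         (xbar (/ (reduce #'+ x) n))
         (ybar (/ (reduce #'+ y) n))
         (b (/ (- (reduce #'+ (mapcar #'* x y)) (* n xbar ybar))
               (- (reduce #'+ (mapcar #'* x x)) (* n xbar xbar)))))
    (list (- ybar (* b xbar)) b)))

(defun press-slr (x y)   ; leave-one-out prediction sum of squares
  (loop for i from 0 below (length x)
        sum (let* ((x-i (append (subseq x 0 i) (subseq x (1+ i))))
                   (y-i (append (subseq y 0 i) (subseq y (1+ i))))
                   (fit (slr-fit x-i y-i))
                   (pred (+ (first fit) (* (second fit) (nth i x)))))
              (expt (- pred (nth i y)) 2))))

For example, (press-slr '(1 2 3 4 5 6) '(3 5 6 9 10 12)) returns approximately 1.234, versus SSError = 0.8 for the same fit; PRESS is always at least as large as the residual sum of squares, since each point is predicted without its own help.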
EXAMPLE 8.27 Candidate Selection

Given the following data, use candidate selection procedures to best fit the model in regard to the least number of independent variables for the highest R² and lowest PRESS statistic. The command (ex8.27) assigns the data.
Y:   90 91 94 96 97 99 103 105 109 111 116 119 123 125 125 130 132 131 134 137
x1:  42 45 45 47 47 48 48  48  49  50  50  50  51  51  52  52  53  56  56  59
x2:  43 44 46 46 48 48 48  50  51  51  52  52  54  54  54  54  55  55  58  60
x3:  43 43 43 44 45 45 50  50  50  50  52  53  53  56  57  59  59  59  61  63
x4:  27 31 37 38 38 40 41  42  43  43  45  45  46  46  47  50  58  61  62  69
x5:  2  2  4  5  11 12 13  16  19  20  21  25  30  30  32  35  35  39  64  66
We begin by testing each independent variable with the dependent response variable Y. Using the command (R-Sq xi Y) for i = 1 to 5 and taking the variables one at a time, we get
       X1      X2      X3      X4      X5
R²     0.874   0.942   0.969   0.835   0.851
two at a time, for example, (R-Sq (list x1 x2) Y )
       X1,X2   X1,X3   X1,X4   X1,X5   X2,X3   X2,X4   X2,X5   X3,X4   X3,X5   X4,X5
R²     0.942   0.970   0.874   0.885   0.977   0.942   0.943   0.969   0.969   0.867
three at a time,

       X1X2X3  X1X2X4  X1X2X5  X1X3X4  X1X3X5  X1X4X5  X2X3X4  X2X3X5  X2X4X5  X3X4X5
R²     0.978   0.942   0.943   0.9712  0.9706  0.886   0.979   0.981   0.943   0.967
four at a time,

       X1X2X3X4  X1X2X3X5  X1X2X4X5  X1X3X4X5  X2X3X4X5
R²     0.9789    0.9815    0.943     0.9716    0.9817
and five at a time,

       X1X2X3X4X5
R²     0.9817
Adding extra regressor variables only increases R². To compensate for this tendency, an adjusted R² is computed with the use of

R²Adjust = 1 − (1 − R²p)·(n − 1)/(n − k − 1),

where R²p is based on p regressors instead of k. Still another figure of merit for candidate selection is the PRESS R-Sq statistic, called R²Predict. The computation is given as

R²Predict = 1 − PRESS/SYY.

Although the value can range from below 0 to above 1, the statistic is truncated to lie in the range 0 to 1.
The command (R-sq (list x1 x2 x3 x4 x5) y) returns 0.9817. (R-sq-Adjust x-data y-data) returns the adjusted R²; for example, (R-sq-adjust (list x1 x2 x3 x4 x5) y) returns 0.9751. (R-sq-Predict x-data y-data) returns the R²Predict value; for example, (R-sq-Predict (list x1 x2 x3 x4 x5) y) returns 0.9623.
Variance Inflation Factors (VIF)

Often in multiple linear regression the regressor variables can be highly correlated, since the regressor variables are selected to affect the response variable. This correlation among regressors is called multicollinearity, a condition that can skew the response.

EXAMPLE 8.28
Show that Ŷ = -1 + 3x1 + 2x2 and Ŷ = -4 + 2x1 + 3x2 both satisfy the data because of the multicollinearity of x1 and x2. Infinitely many models do so.

Y:   10   15   20   25
x1:  1    2    3    4
x2:  4    5    6    7
Solution

The variables x1 and x2 are perfectly correlated, with x2 = x1 + 3.

-1 + 3·1 + 2·4 = 10 = -4 + 2·1 + 3·4
-1 + 3·2 + 2·5 = 15 = -4 + 2·2 + 3·5
-1 + 3·3 + 2·6 = 20 = -4 + 2·3 + 3·6
-1 + 3·4 + 2·7 = 25 = -4 + 2·4 + 3·7
The command (vif '((1 2 3 4)(4 5 6 8))) returns (29.166 29.166), indicating strong multicollinearity between x1 and x2. Note that the x2 = 7 value was changed to 8; the VIF would become infinite for perfectly correlated variables, and a singular XᵀX matrix would result. To check for multicollinearity among the variables, we compute the Variance Inflation Factor for Xj as

VIF(Xj) = 1/(1 − R²j),

where R²j is the coefficient of determination obtained by regressing the omitted "response" Xj on the remaining regressor variables. A zero coefficient of determination, that is, R²j = 0 for Xj, implies no multicollinearity, giving a VIF of 1. Any VIF value exceeding 1 implies some degree of multicollinearity. Regressor variables with VIFs in the 5 to 10 range, depending on the size of the data sets, may need to be excluded from the regression. For the previous data set in Example 8.25, the VIFs are

       X1      X2      X3      X4      X5
VIF    40.57   25.58   13.81   27.36   15.17

Notice that the variable X3 has the highest R² (0.969) and the lowest VIF (13.81).
The command (VIF x-data) returns a list of the variance inflation factors. (VIF (list x1 x2 x3 x4 x5)) returns (40.57 25.58 13.81 27.36 15.17) for X1 through X5.
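For two regressors the VIF reduces to 1/(1 − r²), where r is their correlation. A plain Common Lisp sketch reproducing the 29.166 above (the variable names are ours):

(let* ((x1 '(1 2 3 4))
       (x2 '(4 5 6 8))
       (n (length x1))
       (m1 (/ (reduce #'+ x1) n))
       (m2 (/ (reduce #'+ x2) n))
       (s11 (- (reduce #'+ (mapcar #'* x1 x1)) (* n m1 m1)))
       (s22 (- (reduce #'+ (mapcar #'* x2 x2)) (* n m2 m2)))
       (s12 (- (reduce #'+ (mapcar #'* x1 x2)) (* n m1 m2)))
       (r-sq (/ (* s12 s12) (* s11 s22))))   ; R-squared of x2 on x1
  (float (/ 1 (- 1 r-sq))))                  ; the VIF

The form returns 29.1667, agreeing with the (vif ...) output.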
The command (R-sq (list x1 x2 x4 x5) x3) returns 0.927 for R²3, so the VIF for X3 = 1/(1 − 0.927) = 13.81. VIFs over 10 arouse suspicion of multicollinearity; notice the high VIFs for these regressors.

Mallow CP Statistic

The Mallow CP statistic is computed as

Cp = (MSEp/MSEk)·[n − p − 1] − [n − 2(p + 1)] = SSEp/MSEk − (n − 2(p + 1)),
where p is the number of regressor variables used in the (reduced) model, k is the maximum number of regressor variables available, and n is the sample size. If CP is greater than (p + 1), the model may contain unneeded regressor variables (overspecified); if less, the model may not contain a sufficient number of needed regressor variables (underspecified). The best model will have a Cp value close to (p + 1).
Y:   90 91 94 96 97 99 103 105 109 111 116 119 123 125 125 130 132 131 134 137
x1:  42 45 45 47 47 48 48  48  49  50  50  50  51  51  52  52  53  56  56  59
x2:  43 44 46 46 48 48 48  50  51  51  52  52  54  54  54  54  55  55  58  60
x3:  43 43 43 44 45 45 50  50  50  50  52  53  53  56  57  59  59  59  61  63
x4:  27 31 37 38 38 40 41  42  43  43  45  45  46  46  47  50  58  61  62  69
x5:  2  2  4  5  11 12 13  16  19  20  21  25  30  30  32  35  35  39  64  66
For example, (CP '(x1 x2 x3 x4 x5) '(1 2 5) y) ; reduced model uses x1, x2, and x5
returns 31.8, where MSEp = 16.939, MSEk = 6.182, k = 5, p = 3, and n = 20; while
(CP '(x1 x2 x3 x4 x5) '(2 3) y) returns 3.5, and
(CP '(x1 x2 x3 x4 x5) '(2 3 5) y) returns 2.2,
indicating that regressors x2 and x3, or x2, x3, and x5, are sufficient (agreeing with the forward and backward selection procedures).
The command (Cp x-data index y-data) returns the value of the CP statistic, where x-data is a symbolic list of the names of the x-data, index is a list of the numbers of the variables (x1 . . . xn) constituting the partial model, and y-data is a list of the y-data. For example, (Cp '(x1 x2 x3 x4 x5) '(1 2 5) y) returns 31.8.
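The Cp arithmetic for the quoted example can be checked directly; here is a minimal sketch in plain Common Lisp, with the values copied from above:

(let* ((mse-p 16.939)   ; MSE of the reduced model (x1 x2 x5)
       (mse-k 6.182)    ; MSE of the whole 5-regressor model
       (n 20)
       (p 3)
       (sse-p (* mse-p (- n p 1))))           ; SSEp = MSEp (n - p - 1)
  (- (/ sse-p mse-k) (- n (* 2 (1+ p)))))     ; Cp

The form returns approximately 31.8, agreeing with the (CP ...) command.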
Stepwise Regression

In performing linear regression with the given data, it is advisable to set aside, if possible, a randomly selected 30% of the data for validation. After developing the model with the remaining 70%, attempt to validate the model on the 30% set aside. Often, when one is determining what the response variable Y depends on, the x input or regressor variables that cause a significant response in Y must be selected. It is desirable to attain an economical model by eliminating those inputs that have little or no effect on the response. We may want to eliminate a variable that is difficult and expensive to measure and which offers little effect on the response. Stepwise regression (forward or backward) is such a procedure, helping select the important factors and eliminate the weak factors. However, this data-driven procedure is no substitute for intelligent analysis of the factors affecting the response. Stepwise regression helps the investigator focus on the more important aspects of the regression.

Step 1: In stepwise regression, first take the response variable and each x variable in turn and run a linear fit to test the hypothesis β1 = 0 at the desired level of risk in the model

Y = β0 + β1xi.

The x variable with the highest absolute, significant t-value or R², or lowest MSE, is then selected as the best single predictor of Y and becomes x1. If the
t-value is not significant, then that regressor variable is eliminated from the equation and subsequent testing. Equivalent tests to determine the best regressor variables are the highest R² values or lowest MSE.

Step 2: Next take each of the remaining variables in turn and run the regression for the model

Y = β0 + β1x1 + β2xi.

Similarly, the tests are run on the hypothesis that β2 = 0, and the x-variable with the largest t-value is selected as the second best predictor. This variable becomes x2.

Step 3: The process is repeated until none of the remaining x-variables has a significant β. As an extra caution, after a second x-variable is selected in the model Y = β0 + β1x1 + β2x2, the t-value for β1 should be rechecked for significance. If the significance is below the α-level, then the x1 variable should be eliminated from the model, and the remaining x-variables may be introduced to test their corresponding t-values for their betas. Notice, however, that in forward or backward elimination, once a variable is accepted or discarded, it is accepted or discarded permanently.

This procedure does not guarantee the best-selected x-variables, since only estimates are provided. But the procedure can help prune the candidate variables objectively. The procedure is shown in the following example.
EXAMPLE 8.29
Use the stepwise regression procedure for the following data set:
Y:   53   73   117  117  153  174  231  299  400  559
X1:  1    1    2    3    5    8    13   21   34   55
X2:  19   17   15   13   11   9    7    5    3    1
X3:  2    4    6    8    10   11   14   16   18   20
X4:  1    4    1    5    9    2    6    5    3    2
Solution

The command (ex8.29) assigns the data to the xi and to the Y.

(setf Y '(53 73 117 117 153 174 231 299 400 559)
      X1 '(1 1 2 3 5 8 13 21 34 55)
      X2 '(19 17 15 13 11 9 7 5 3 1)
      X3 '(2 4 6 8 10 11 14 16 18 20)
      X4 '(1 4 1 5 9 2 6 5 3 2))
The command (test-beta x-data y-data b0) returns the t- and p-values for testing the null hypothesis β = b0. If no value for b0 is given, the test is assumed to be for β = 0.

(test-beta x1 Y) returns t = 19.07; p-value = 3.8E-8.   (R-sq x1 Y) returns 0.9785.
(test-beta x2 Y) returns t = -7.14; p-value = 9.8E-5.   (R-sq x2 Y) returns 0.8643.
(test-beta x3 Y) returns t = 7.54; p-value = 6.7E-5.    (R-sq x3 Y) returns 0.8765.
(test-beta x4 Y) returns t = -0.23; p-value = 0.8226.   (R-sq x4 Y) returns 0.0067.
The same command (test-beta x-data y-data) returns the degrees of freedom, a list of t-values and a corresponding list of p-values for testing multiple regressor variables. (test-beta (list x1 x2 x3 x4) y) returns Predictor X0 X1 X2 X3 X4
Coef
SE Coef
T-statistic
P-values
-128.853 6.276 8.005 16.648 -1.266
253.612 0.550 12.076 13.004 1.854
-0.508 11.403 0.663 1.280 -0.683
0.6330 0.0001 0.5367 0.2566 0.5251
(R-sq (list x1 x2 x3 x4) y) returns 0.9982 as the value for R2.
The model begins with Y = b0 + b1x1, as the variable x1 has the highest significant t-value of 19.07 or R2 value of 0.9785. Regressor variable x4 is eliminated from consideration because its t-value fails to reject the hypothesis b = 0. Proceed with model Y = b0 + b1x1 + b2xi and check x2 and x3 with x1 for the highest significant t-value. (test-beta (list x1 x2) y) The t-values for x1 and x2 are 19.56 and -7.40, with R2 = 0.9976. The t-values for x1 and x3 are 20.37 and 8.16, with R2 = 0.9980. (test-beta (list x1 x2 x3) y) returns ( df = 6 the t-values = ( -0.168 18.249 0.360 1.140) with p-values = (0.871 0.000 0.731 0.297)); while
P369463-Ch008.qxd 9/2/05 2:56 PM Page 517
8.9 Multiple Regression Techniques
517
(test-beta (list x1 x2) y) returns ( df = 7 the t-values = (13.262 19.558 - 7.401) with p-values = (0.000 0.000 0.000)). Notice the strong correlation between x2 and x3 (multicollinearity). (rho x2 x3) Æ -0.9986. Maintaining both of these strongly correlated variables would lead to erroneous results. Also check all previous b t-values for significance even though they passed the initial test. Notice the higher t-values and lower p-values for regressor x1 and x2 rather than x1, x2, and x3. (y-hat (list x1 x2) y) returns the final regression equation Yˆ = 196.3 + 6.7x1 - 7.4x2. Notice that x3 could be used in lieu of x2, but it would be erroneous to use both. (y-hat (list x1 x3) y) Æ Y-hat = 38.805 + 6.585X1 + 7.763X3. To emphasize this point, (R-sq (list x1 x2 x3) y) Æ 0.998; (R-sq (list x1 x2) y) Æ 0.998; (R-sq (list x1 x3) y) Æ 0.998. In fact, for explaining the relationship, it may be easier just to use x1 with (y-hat x1 y) Æ Y-hat = 90.268 + 8.904X1 and (R-sq x1 y) Æ 0.978. We could also start with all the regressor variables and eliminate each one at a time to see if the R2 value increases. If R2 increases, the eliminated regressor variable remains eliminated. When R2 no longer increases but starts to decrease, we have arrived at our final model.
The template (Y-hat x-data y) returns the regression equation. For example (Y-hat x1 y) returns Y-hat = 90.268 + 8.904X, while (Y-hat (list x1 x2) y) returns Y-hat = 196.31 + 6.70x1 - 7.45x2. The template (MR x-symbolic-data y-data function) returns function values for all possible combinations of variables for model selection, with the use of R-Sq, PRESS, MSE-mlr, VIF, or Beta-Estimates for function. For example, (MR '(x1 x2 x4)Y 'R-sq) prints Model (X1 X2 X4) (X1 X2) (X1 X4) (X1) (X2 X4) (X2) (X4)
R-Square 0.99756 0.99756 0.98473 0.97848 0.90622 0.86429 0.00667
The template (mlr-stats x-data-symbolic y-data) prints a list of the various models formed from the candidate variables, including the
P369463-Ch008.qxd 9/2/05 2:56 PM Page 518
Chapter 8 Regression
518
corresponding MSE, R2, PRESS, Cp, and R2predict statistics to the screen or to a file. These statistics help in making informed choices on the appropriateness of each model. (mlr-stats '(x1 x2 x4) y) prints Model Variables None (X1 X2 X4) (X1 X2) (X1 X4) (X1) (X2 X4) (X2) (X4)
R2
MSE
PRESS
CP
R2 PREDICT
0.000 0.997 0.998 0.985 0.978 0.906 0.864 0.007
25660.71 93.73 80.47 503.94 621.35 3093.90 3917.76 28675.84
291756.56 1661.61 1091.16 9079.44 10561.20 50823.32 59468.73 332458.50
2452.04 4.00 2.01 33.64 47.04 227.07 328.40 2441.62
0.000 0.993 0.995 0.961 0.954 0.780 0.742 -0.440
where the adequate model (x1 x2) in all criteria, shown in bold, yields equation Y -hat = 196.31 + 6.70 X 1 - 7.45 X 2.
EXAMPLE 8.30
Y x1 x2 x3 x4 x5 x6
Perform candidate selection using R2, PRESS, Mallow’s CP, the MSE, and the VIF from the following data.
-551.8 -623.9 -544.3 -688 -676.8 -659.1 -574.2 -696 -639.4 -697.1 -553.2 -679.3 -645.7 -563 -610 17 48 22 50 72 12
10 45 29 56 72 18
17 48 28 57 76 10
14 28 24 60 80 12
20 32 29 59 78 17
17 49 29 59 73 15
15 22 27 50 72 17
12 23 21 54 76 19
14 36 24 50 77 20
10 50 24 57 80 19
14 32 29 51 78 16
20 47 29 59 80 19
14 47 23 56 72 13
20 37 21 53 72 10
11 22 28 60 71 10
Solution The command (ex8.30) assigns the data to the xi and to Y. With use of stepwise regression techniques, x1 and x2 are eliminated. The command (mlr-stats '(x3 x4 x5 x6) Y) returned the display below, from which the optimal model appears to be x3, x4, and x6. The beta coefficients from the command (beta-estimates (list x3 x4 x6) Y ) are (62.665 9.333 -13.497 -12.060) and the regression equation Yˆ = 62.665 + 9.333 X 3 - 13.497 X 4 - 12.060 X 6 is returned by the command (y-hat (list x3 x4 x6) Y ).
P369463-Ch008.qxd 9/2/05 2:56 PM Page 519
8.9 Multiple Regression Techniques
Model Variables None (X3 X4 X5 X6) (X3 X4 X5) (X3 X4 X6) (X3 X4) (X3 X5 X6) (X3 X5) (X3 X6) (X3) (X4 X5 X6) (X4 X5) (X4 X6) (X4) (X5 X6) (X5) (X6)
EXAMPLE 8.31
519
R2
MSE
PRESS
CP
R2-PREDICT
0 0.949 0.547 0.948 0.398 0.404 0.275 0.306 0.003 0.734 0.455 0.731 0.309 0.380 0.261 0.287
3233.410 235.130 1863.381 213.896 2270.446 2449.792 2733.111 2617.101 3470.095 1093.012 2053.935 1013.345 2404.343 2335.128 2570.023 2481.616
0 5599.364 34564.589 4247.174 40801.519 8606.141 49414.945 45820.013 62132.25 23089.677 34123.372 18102.911 40186.375 43908.709 41909.633 41164.834
51965.510 5.000 80.173 3.006 106.873 107.607 130.485 124.564 180.855 44.133 95.823 42.716 121.932 110.174 131.092 126.204
180.591 0.876 0.236 0.906* 0.098 -0.073 -0.091 -0.012 -0.372 0.489 0.246 0.600 0.112 0.030 0.074 0.090
Find the beta coefficients along with the corresponding t-values, p-values, and variance inflation factors for the following data.
Y
90 91 94 96 97 99 103 105 109 111 116 119 123 125 125 130 132 131 134 137
x1 x2 x3 x4 x5
42 43 43 27 2
45 44 43 31 2
45 46 43 37 4
47 46 44 38 5
47 48 45 38 11
48 48 45 40 12
48 48 50 41 13
48 50 50 42 16
49 51 50 43 19
50 51 50 43 20
50 52 52 45 21
50 52 53 45 25
51 54 53 46 30
51 54 56 46 30
52 54 57 47 32
52 54 59 50 35
53 55 59 58 35
56 55 59 61 39
56 58 61 62 64
59 60 63 69 66
Solution The command (ex8.31) assigns the data to the xi and to Y. The template (beta-t-p-vif x-symbolic y-data) returns the corresponding display. For example, (beta-t-p-vif '(x1 x2 x3 x4 x5) y-data) prints
Variables
Beta
t-values
p-values
VIF
Intercept X1 X2 X3 X4 X5
-62.7466 0.0570 1.7665 1.7713 -0.0990 -0.1777
-2.2056 0.0640 2.7759 5.4611 -0.3454 -1.4500
0.0446 0.9497 0.0149 0.0000 0.7348 0.1690
0 40.5663 25.5796 13.8132 27.3611 15.1735
P369463-Ch008.qxd 9/2/05 2:56 PM Page 520
520
Chapter 8 Regression
8.10 Correlation Analysis In regression analysis, the predetermined x-values (input) are not random variables. However, in correlation analysis, both the X and Y values are random variables with joint density f(x, y). Assuming fX and fY|x are normal with mY|x = a + bx, the joint distribution can be written as f ( x, y ) = f x * fY X . Since Y = a + bX + E, V (Y x) = s 2 , m Y = a + bm x , and V (Y ) = s Y2 = b 2s x2 + s 2 , where the error RV E ~ N (0, s 2 ). That is, f ( x, y ) =
2 ( y - a - b x)2 ˘˘ È 1 È (x - m x ) exp Í- Í + ˙˚˙˚; -• < x < •; -• < y < •. Î 2 Î s 2x 2ps xs s2
1
Note that the bivariate normal distribution can also be written as f ( x, y ) =
1
(8–33)
2ps xs y 1 - r 2
2 2 1 Ï È( X - m x ) Ê x - m x ˆ Ê Y - m y ˆ (Y - m y ) ˘ ¸ * expÌ2 r + ˙˚˝˛ Ë s ¯Ë s ¯ Ó 2(1 - r ) ÍÎ s 2x s y2 x y
by substituting my - bmx for a, rsy/sx for b, and s 2y - b 2s 2x for s 2. Correlation analysis assumes the joint distribution of RVs X and Y is bivariate normal (8–33), with the 5 parameters mX, s X2 , mY, s Y2, and r. Correlation analysis is closely related to regression and to the least squares method of finding estimates A, B, and R for the parameters a, b, and r. From regression analysis, R2 =
( S XY )2
,
S xx SYY and B=
S XY
;
S XX thus R=B
S XY SYY
Normalize both xi and yi by letting
.
P369463-Ch008.qxd 9/2/05 2:56 PM Page 521
8.10 Correlation Analysis
521
ui =
xi - x
yi - y
and vi =
S xx
.
Syy
Then Suu = Svv = 1 with Suv = R. Regressing on u, given v, results in ˆi = Rui with intercept A = - R u = 0 since ( u, ) = (0, 0). If the slope of the regression line is zero, then so is the correlation coefficient. Also it is evident that R and B have the same sign. To test the null hypothesis that r = 0, we could test the hypothesis b = 0 as done previously. However, we could not use the beta test for hypothesized values of r other than zero, because the skewness precludes the assumption of normality in the population. A suitable test statistic equivalent to the b = 0 test is given by R n -2
t=
1 - R2
,
which is the t distribution with v = n - 2. Recall that t2 = F and thus an equivalent test, as in the lack of fit F test, is R 2 ( n - 2)
F =
1 - R2
.
In correlation analysis, both X and Y are assumed to be normal random variables with a joint bivariate normal distribution. It can be shown that Fisher Z transformation of r to Z given by Z=
1 2
Ln
1+ r 1- r
= tanh -1r (inverse hyperbolic tangent)
(8–34)
is approximately normal with E( Z ) = m Z =
1
Ln
2
1+ r 1- r
= tanh -1r,
(8–35)
.
(8–36)
and V( Z) =
1 n -3
Hence, z=
Z - mZ sZ
= ( Z - m Z ) n - 3,
implying P ( - za /2 £ ( Z - m Z ) n - 3 £ za /2 ) = 1 - a , implying Z-
za /2 n -3
< mZ < Z +
za /2 n -3
= 100(1 - a )% confidence interval
P369463-Ch008.qxd 9/2/05 2:56 PM Page 522
522
Chapter 8 Regression
Also r=
EXAMPLE 8.32
e2 Z - 1 e2 Z + 1
= tanh z (hyperbolic tangent).
Given r = 0.6 from a random sample of 28 x-y pairs, construct a 95% confidence interval for the population correlation coefficient r and also test the null hypothesis that r = 0.7, versus the alternative hypothesis that r π 0.7. Solution For r = 0.6, Z0.6 = atanh-1 0.6 = 0.6931, V(Z) = 1/(28 - 3) = 1/25, and the 95% confidence interval for mZ is Z-
za /2 n -3
< mZ < Z +
za /2 n -3
0.6913 - 1.96/5 < m Z < 0.6913 + 1.96/5, where mZ is in the interval (0.2993, 1.0833) with 95% confidence. Transforming from Z to r, we have, using tanh z, 0.2907 < r < 0.7944. To test H0: r = 0.7 vs. H1: r π 0.7, z=
Z0.6 - Z0.7
s = 5(0.693 - 0.867) = -0.87
with a p-value = 0.384. Thus we cannot reject the hypothesis that r = 0.7. We also note that r = 0.7 lies within the 95% confidence interval, confirming that our null hypothesis cannot be rejected.
The command (r-to-z r) converts the r-value to a z-value. (r-to-z 0.6) Æ 0.693147. The command (z-to-r z) converts the z-value to an r-value. (z-to-r 0.693147) Æ 0.6. The z-interval given by (0.2993, 1.0833) may be converted into an r-interval, using (Z-list-to-R-list '(0.2993 1.0833)) to get (0.2907 0.7944). The command (rho x-data y-data) returns the r correlation coefficient. For example, (rho '(1 2 3 4 5) '(2 4 6 8 10)) returns 1.
P369463-Ch008.qxd 9/2/05 2:56 PM Page 523
8.10 Correlation Analysis
EXAMPLE 8.33
523
The correlation coefficient from one random sample of size 35 was r1 = 0.67 and from another random sample of size 42 was r2 = 0.90. Test for a difference between the coefficients at a = 5%. H 0 : r1 - r 2 = 0 vs. H1: r1 π r 2 , Solution Z0.67 = 0.811 and Z0.9 = 1.472 and V ( Z1 - Z2 ) =
1 35 - 3
+
1 42 - 3
= 0.057
fi s = 0.239. z=
Z1 - Z2 - ( m Z1 - m Z2 ) s Z1-Z2
=
0.811 - 1.472
= -2.77.
0.2389
The p-value is 0.0056, and the null hypothesis of no difference between the coefficients is rejected. The reader should be cautious in thinking that a high correlation between two variables necessarily implies a causal relation between the two variables. For example, more people die from tuberculosis in the state of Arizona than in any other state (high correlation). This fact does not necessarily mean that the climate in Arizona causes tuberculosis. Considering the fact that people suffering from tuberculosis go to Arizona to enjoy the climate, naturally more people will eventually die from tuberculosis in Arizona.
EXAMPLE 8.34
Given X is the first 100 digits of pi and Y is the second 100 digits, perform the T and Z test for testing H0: r = 0 vs. H1: r π 0. The command (setf x pi-100) sets X to the 1st 100 digits of p as: 1 4 1 5 9 26 5 3 5 8 97 9 3 23 8 4 6 26 4 3 3 8 3 27 9 5 0 28 8 4 1 971 6 9 3 9 9 37 5 10 5 8 20 97 4 9 4 4 5 9 23 07 8 1 6 4 0 6 2 8 6 2 0 8 9 9 8 6 2 8 0 3 4 8 2 5 3 4 2 1 1 7 0 6 7 9. The command (setf y pi-200) sets Y to the 2nd 100 digits of p as: 8 2 1 4 8 0 8 6 5 1 3 2 8 2 3 0 66 47 0 9 3 8 4 4 6 0 9 5 5 0 5 8 2 2 3 1 7 2 5 3 5 9 4 0 8 12 84 8 11 1 7 4 5 0 2 8 4 10 27 0 1 9 3 8 5 2 1 1 0 5 5 5 9 6 4 4 6 2 2 9 4 8 9 5 4 9 3 0 3 8 1 9 6. The command (rho x y) returns 0.2542550 with p-value = 0.00522 (one tail). T=
R n -2 1- R
2
=
0.254255 100 - 2 1 - 0.0646456
= 2.602522 with p-value = 0.0107,
P369463-Ch008.qxd 9/2/05 2:56 PM Page 524
524
Chapter 8 Regression
Z = atanh -1 0.254255 = 0.259957. (r-to-z 0.254255) Æ 0.259957. z=
Z - mZ sZ
= ( Z - m Z ) n - 3 = (0.259957 - 0) * 100 - 3 = 2.56027
with p-value = 0.0105. EXAMPLE 8.35
The data below show the number of units produced by the day shift and the night shift for the past two weeks. Find the correlation coefficient. Day Shift Night Shift
Solution
25 28
28 20
28 23
28 23
20 20
25 24
20 23
24 28
26 20
27 22
(setf day-shift '(25 28 28 28 20 25 20 24 26 27) night-shift '(28 20 23 23 20 24 23 28 20 22))
(rho day-shift night-shift) returned r = -0.0507, p-value = 0.4466 (one-tail).
8.11 Summary When both paired data sets X and Y are randomly chosen, the problem is one of correlation analysis. When one set is predetermined, the problem is one of linear regression. The following comprehensive simple example of simple linear regression offers a review of the concepts and calculations covered in the chapter. EXAMPLE 8.36
Use the method of least squares to fit a line for the data given by columns Y and x in the display below. x
x2
xY
Y2
Yˆ
(Y - Yˆ)
(Y - Yˆ)2
1
1
1
1
1
0.857
0.143
0.020
2
2
4
4
4
1.714
0.286
0.082
Y
2
3
9
6
4
2.571
-0.571
-0.327
3
4
16
12
9
3.429
-0.429
0.184
5
5
25
25
25
4.286
0.714
0.510
5
6
36
30
25
5.143
-0.143
0.020
Totals: 18
21
91
78
68
18.000
0
1.143
Y = 3, x = 3.5. A plot of the data is shown in Figure 8.15a, and that of the residuals versus Yˆ is shown in Figure 8.15b.
P369463-Ch008.qxd 9/2/05 2:56 PM Page 525
8.11 Summary
525
Residuals vs. Predicted Y-hat
6 4 2 0 0
2
8
6
4
Residuals
Y
Y vs. x
1 0.5 0 –0.5 0 –1
4
2
x
6
Predicted Values
(a)
(b)
Figure 8.15(a) Y vs. x
Figure 8.15(b)
Residuals vs. Yˆ
a) Write the regression equation. From the data Yˆi = 0 + 0.857x. b) Compute the Yˆi, (Y - Yˆi), and the (Y - Yˆi)2 entries. Observe that SYi = S Yˆij = 18 fi sum of the residuals equal 0. c) The residual at x = 2 is 0.286. d) SSError = S(Yi - Yˆi)2 = 1.143. e) Sxx = S(xi - x)2 = 6.25 + 2.25 + .25 + .25 + 2.25 + 6.25 = 17.5 = S x 2 - nx 2 = 91 - 6 * 3.52 = 91 - 73.5 = 17.5. f ) SxY = SxY - n x Y = 78 - 6 * 3.5 * 3 = 15. g) SYY = SY2 - n Y 2 = 68 - 6 * 32 = 14. h) SExplained = BSxY = .857 * 15 = 12.857. i) Check that SYY = SExplained + SSError. 14 = 12.857 + 1.143. j) R 2 =
Sexplained
=
12.857
= 1-
SSError
= 1-
1.143
= 0.918. 14 SYY 14 SSError 1.143 = = 0.286. k) s 2 = n -2 4 s2 0.286 = = 0.016 with sB = 0.128. l) s 2B = S xx 17.5 m) Find the p-value for testing the hypothesis that the slope b = 0. Show that SYY
T=
R n -2
T2 =
is also equivalent for testing H 0 : b = 0. Show that
1 - R2
R 2 (n - 2) 1- R
2
=F =
SExplained /1 SSError /( n - 2)
.
P369463-Ch008.qxd 9/2/05 2:56 PM Page 526
526
Chapter 8 Regression
B
=
0.857
= 6.708 fi p-value = 0.003
0.128
sB
T=
R n-2 1- R
2
SExplained /1 SSError /( n - 2)
=
0.958 6 - 2
ª 6.708.
1 - 0.918 =
12.857/1
ª 45 ª 6.7082.
1.143/4
n) Find the p-value for testing the hypothesis that the intercept a = 0. t = 0 with p-value = 1. s 2 Â x 2 0.286 * 91 = = 0.248 with s A = 0.498. o) s 2A = nS xx 6 * 17.5 p) A 95% confidence interval for a is 0 ± 2.776 * 0.498 = 0 ± 1.379. q) A 99% confidence interval for b is 0.857 ± 4.604 * 0.128 = 0.857 ± 0.589. 2 1 ( x - x )2 ˆ Ê 1 (3.5 - 3) ˆ r) V ( A + Bx ) x =3 = s 2 Ê + = 0.286 + = 0.052. Ën Ë6 S xx ¯ 17.5 ¯ s) Compute a 95% confidence interval for the mean value Yˆ|x=3.5. Yˆ
x = 3.5
Œ 3 ± 2.776 * 0.218 = 3 ± 0.605.
t) Predict Y | x = 3.5 and compute a 95% prediction interval for a + 3.5b at x = 3.5. The predicted value of Y | x = 3.5 = 0 + 0.857 * 3.5 = 3. The variance of a predicted response at x = 3.5 is 0.577. A 95% prediction interval for a + 3.5b at x = 3.5 is 3 ± 2.776 * 0.577 = 3 ± 1.602. Observe that a prediction interval always includes the confidence interval of the mean value at the x-value. Performing standardized residual analysis can check the assumptions for linear regression. About 95% of these values should lie between -2 and 2 standard deviations. Always plot the data for a quick check of the linear relationship. Stepwise regression is a useful procedure for pruning the candidate space for input factors. There are cautions in the procedure. Be wary of X-variables that are strongly correlated with other X-variables and use the best representative of the set. Candidate variable selection is more an art than a science. Be wary of drawing casual relationships among variables when the correlation coefficient is high. The correlation coefficient only indicates the explainable variation due to a possible existing linear relationship.
P369463-Ch008.qxd 9/2/05 2:56 PM Page 527
527
Problems
Table 8.3 Parameter
a
Estimator
Distribution
Â
A
Confidence Interval S2
xi2 ˆ Ê s2 N Áa, ˜ nS xx ¯ Ë
A ± tn -2,a /2
Âx
2
nS xx
b
B
s2 ˆ N ÊÁ b, ˜ Ë S ¯ xx
B ± tn -2,a /2 S 2 / S xx
Y|x
Yˆ
1 ( x - x )2 ˆ ˘ N ÈÍa + b x, s 2 ÊÁ + ˜ Ë Î n S xx ¯ ˙˚
1 ( x - x )2 ˘ A + Bx ± tn -2,a /2 S 2 ÈÍ + În S xx ˙˚
Y|x
YPredicted*
1 ( x - x )2 ˆ ˘ N ÈÍa + b x, s 2 ÊÁ1 + + ˜ Ë Î n S xx ¯ ˙˚
1 ( x - x )2 ˘ A + Bx ± tn -2,a /2 S 2 ÈÍ1 + + Î n S xx ˙˚
* The “confidence interval” for predicted response YPredicted is a prediction interval.
A summary of the simple linear regression estimators is shown in Table 8.3.
PROBLEMS 1. a) Perform the linear regression to find the equation for Y on x, using the data below. b) Then perform the linear regression for X on y and show that x and y satisfy both regression lines. c) Find Sxx, SYY, SxY, SSError, and the p-value for testing b = 0. d) Find a 95% confidence interval for the intercept parameter a. ˆ = 2.67 + 0.5y ans. Yˆ = 3 + 0.21x X 2 18.7 8 4 7.1 p-value = 0.8 3 ± 41.6. Y x xY x 2 6 4
2 4 8
2. Find Yˆ for data pairs (x, Y ) = {(1, 3), (5, 6) (8, 9)}. Find Y for x = 7. Find Y for x = 25. (Careful!) 3. A person has 3 sons aged 2, 4, and 8 years with respective heights in inches of 35, 42, and 60. Compute R2 and then predict the height of the 2-year old when he is 6 and when he is 20. See Problem 2.
ans. 0.9966 4.3 feet 9.2 feet?
P369463-Ch008.qxd 9/2/05 2:56 PM Page 528
528
Chapter 8 Regression
4. Given 20
Âx
20 i
i =1 20
Âx
20
= 23.92, Â Yi = 1843.21, Â Y 2i = 170, 044.53, i =1 20
2 i
i =1
= 29.29, Â x i Yi = 2, 214.66,
i =1
i =1
write the equation of the linear regression model and compute SSError. 5. Fill in the blanks, given the data below. x Y
4 1
6 3
8 10 6 8
12 14 16 18 14 16 20 21
Yˆ
__ __ __ __ __ __ __ __ ˆ Y -Y __ __ __ __ __ __ __ __ 2 ˆ (Y - Y ) __ __ __ __ __ __ __ __ a) b) c) d) e) f) g) h) i) j)
The estimated regression line is ______. ans. Yˆ = -5.96 - 1.55x. Sxx = ____ 168__ SxY = ____ 261 SSError = ____ 7.39 s2 = ____ 1.23 SYY = ____ 412.9 R2 = ____ 0.98 sA = ____ 1.02 sB = ____. 0.09 tb=0 = ____ 18.14 ta=0 = ____ -5.84 The residuals are: ____ 0.75 -0.36 -0.46 -1.57 1.32 0.21 1.10 -1 The sum of the residuals is ______ 0 The sum of the square of the residuals is ______ 7.39 The estimate Y for x = 7 is ______ 4.89 The explained variation is ______ 405.5 Express B in terms of the Yi’s coefficients with the command (B-Ycoef x), which returns (-0.041 -0.029 -0.017 -0.005 0.005 0.017 0.029 0.041). When each is multiplied by the Yi (1 3 6 8 14 16 20 21) and summed, the B value is returned, that is, (dot-product (BY-coef x) Y ) Æ 1.553, the slope. Observe the symmetry of the coefficients due to the spacing in the x-values. The sum of the B coefficients is always zero.
6. Given joint density f ( x, y ) =
2
(2x + 3y ) for 0 £ x £ 1 and 0 £ y £ 1, find
5 mY|x and mx|y. 7. Fill in the space below for the following data: Y 3.6 2.8 5.6 7.0 9.0 10.5 S 38.5
x 1.5 2.0 2.5 3.0 3.5 4.0 16.5
x2
xY
Yˆ
Y - Yˆ
(Y - Yˆ)2
P369463-Ch008.qxd 9/2/05 2:56 PM Page 529
529
Problems
a) The least squares line is __________. ans. Y-hat = -2.15 + 3.11. b) Plot the points and graph the least squares line. Fill in the entries for Yˆ, (Y - Yˆ), and (Y - Yˆ)2. (See Software Exercise 19.) c) The residual at x = 2 is __________. -1.28 d) SSError = S(Yi -Yˆ )2 = ________ and s2 = ________. 2.94 0.734. e) Sxx = __________. 4.35. f) SxY = __________. 13.63. g) SYY = __________. 45.37. h) The computed t-value for testing the hypothesis b = 0 at a set at 0.05 is __________. 7.60. i) The explained variation is __________ and the unexplained variation is __________. 42.43 2.94. j) R2 = Explained/Total = __________. 0.935. k) A 95% confidence interval for the mean value of a + bx at x = 3.5 is __________. (7.46, 10.04). l) A 95% prediction interval for Y at x = 3.5 is __________. (6.05, 11.45). m) A 95% confidence interval for a is __________. (-5.42, 1.12). n) A 95% confidence interval for b is __________. (1.98, 4.25). o) True or False: TTTT.
 Yˆ e i
i
= 0;
ii)
ÂX e
i =1
n
n
n
n
i)
i
= 0;
i
iii)
i =1
Âe
i
= 0;
iv)
i =1
n
 Y =  Yˆ . i
i =1
i
i =1
8. Set up normal equations for the model Y = b0 + b1x1 + b2x2 + E and solve, using the data below. Y
x1
x2
1 2 3
2 3 3
3 2 5
Y2
x12
x22
x1Y
x2Y
x1x2
S
9. Given the joint density f(x, y) = 6x for 0 < x < y < 1, show that mY|x = (1 + x)/2 and mx|y = 2y/3. 10. Given joint density f(x, y) = 2 for 0 < y < x < 1, find mY|x and mx|y. 11. Fit a linear regression equation Yˆ = A + B x to the following data: ans. Yˆ = 21.69 - 3.471x. x Y
4 31
9 58
10 65
14 73
4 37
7 44
12 60
22 91
1 21
17 84
12. Find the linear regression equation from the following data and estimate the amount of chemical remaining after 5 hours. Find the residual at x = 6 hours. Can you find the residual at 7 hours? Hours Chemical
2 1.8
4 1.5
6 1.4
8 1.1
10 1.1
12 0.9
14 0.6
P369463-Ch008.qxd 9/2/05 2:56 PM Page 530
530
Chapter 8 Regression
13. Fit the data below to the model Y = ab x by transforming into Ln Y = Ln a + (Ln b)x. ans. Yˆ = 1.37 * 1.38x. x Y
1 2
2 2.4
4 5.1
5 7.3
6 9.4
8 18.3
14. Prove that if the regression of Y on x is linear, then m Y
x
= mY + r
sY sx
(x - mx). 15. a) Show that V ( A) =
s 2 Â x i2
n S xx b) Show that C( Y, B) = 0.
=
s2 n
+
x 2s 2
.
n S xx
16. Compute to show that the coefficient of determination is exactly 1 for the linear set of points {(1, 2) (2, 4) (3, 6)}. Use the commands to show that (R-sq x Y ) = (R-sq Y x). 17. Use a transformation to find an exact relationship between x and Y for the following data:
{(0, 1/2)(1, 1/5)( -1 -1)}. Hint: See answer.
ans.
1
= 2 + 3x.
Y
18. Use a transformation to find an exact relationship between x and Y for the following data:
{(1, 3)( -1, 11)(2, 5)}. 19. Use a transformation to find an exact relationship between x and Y for the following data:
{(1, -1/2)( -1, 1/8)(5, 1/2)}. Hint: See answer.
ans. 1/Y = 3 - 5/x.
20. Fit a line for the x-Y pairs where the x-set is the first 25 decimal digits of p and the Y-set is the second 25 decimal digits of p. Naturally, before fitting a line, one should make a scatter plot of the data to see if linear regression is applicable. However, in this case, proceed to see the SSError term. Predict the intercept value of A and R2. x Y
1 4 1 5 9 2 6 5 3 5 8 9 7 9 3 2 3 8 4 6 2 6 4 3 3 8 3 2 7 9 5 0 2 8 8 4 1 9 7 1 6 9 3 9 9 3 7 5 1 0
The command (setf x (Sublist pi-100 0 24)) assigns x to the first 25 digits of pi; (setf y (Sublist pi-100 25 49)) assigns y to the second 25 digits; (SSerror x y) returns the error; (A x y) returns the intercept; (Sexplain x y) returns the explained variation. (R-sq x y) returns R2.
P369463-Ch008.qxd 9/2/05 2:56 PM Page 531
531
Problems
21. Generate y-values by using the command (sim-lr a b sigma n). The command returns a list of Y-data with an added error component from N(0, s 2). (y-hat (upto n) y-data) retrieves the Yˆ equation, which should show A and B very close to a and b, depending on the size of n. With true equation Y = 5 - 2x, (sim-lr 5 -2 4 10) generated the y-data X Y
1 2.1
2 3.3
3 -1.4
4 -2.6
5 -5.1
6 -8.7
7 -9.5
8 -10.0
9 -13.5
10 -14.1
from which the retrieved estimate is Yˆ = 5.1 - 2.0X. Try (y-hat (upto 30) (sim-lr 25 37 4 30)) to retrieve estimates close to 25 and 37. Vary the sigma value 4 and the sample size 30 to see how accuracy is affected. 22. Find a 95% confidence interval for the population correlation coefficient r if a random sample of 39 x-y pairs has a correlation coefficient r of 0.63. Test H0: r = 0.5 versus H1: r π 0.5 at a = 5%. 23. Test for a difference between correlation coefficients at a = 1%, given the following data. ans. p-value = 0.1911. r1 = 0.78 n1 = 45
r2 = 0.65 n2 = 56.
24. Show that 2 2 2 x 2s 2 2xx s 2 È 1 (x - x) ˘ s  x s2Í + = + . În S xx ˙˚ nS xx S xx S xx
MULTIPLE LINEAR REGRESSION WITH MATRICES
Y 1 2 3 S 6
x1 2 3 3 8
x2 3 2 5 10
Y2 1 4 9 14
x12 4 9 9 22
x22 9 4 25 38
x1Y 2 6 9 17
x2Y 3 4 15 22
x1x2 6 6 15 27
25. a) Solve Problem 8 using the matrix approach. (setf x-data '((2 3 3)(3 2 5) y-data '(1 2 3)) (setf X-matrix (X-matrix x-data)) (setf Xt-matrix (transpose-matrix X-matrix)) (setf Y-matrix (Y-matrix y-data)) (setf XtX-matrix (XtX x-data)) (setf XtY-matrix (XtY x-ddata y-data)) (setf B-matrix (B-matrix x-data y-data)) È1 2 3˘ È1 1 1˘ È14 13 22˘ Í ˙ T Í ˙ T X = 1 3 2 , X = 2 3 3 , XX = Í13 14 20˙ Í ˙ Í ˙ Í ˙ ÍÎ3 2 5˙˚ ÍÎ22 20 35˙˚ ÍÎ1 3 5˙˚
P369463-Ch008.qxd 9/2/05 2:56 PM Page 532
532
Chapter 8 Regression
È11.8 -3.7 -0.4˘ È 6˘ È1˘ Í ˙ Í ˙ T -1 t Y = 2 , B = ( X X ) X Y = Í -3.7 1.5 -0.1˙ Í17˙ = Í ˙ Í ˙ ÍÎ-0.4 -0.1 0.2˙˚ ÍÎ22˙˚ ÎÍ3˚˙
È -2.6˘ Í ˙ Í 1.3˙ ÍÎ 0.3˙˚
b) Show that the normal equations are given by XTXB = XTY. (multiply-matrix XtX-matrix B-matrix) Æ #2A ((6.00)(17.00)(22.00)) (multiply-matrix Xt-matrix Y-matrix) Æ #2A ((6)(17)(22)) c) Find the regression coefficients (beta-estimates) by solving the normal equations. (beta-estimates x-data y-data) Æ (-2.67 1.33 0.33). (mLr - solve '(2 3 3) '(3 2 5) '(1 2 3)). 26. For the model E(Y | x) = b0 + b1x1 + b2x2 with the data below, write the following matrices. Show that the fit is exact and explain why so, that is, SSError = 0 or that R2 = 1. Y
x1
x2
3 5 4
4 3 1
8 7 6
a) X b) XT c) XTX d) Y matrix e) B matrix T -1 g) (X X ) h) (XTX )-1 XTY i) C j) R-matrix
f) XTY
27. Fit a matrix equation, solve for the beta weights, and write the matrices X, XT, XTX, XTY, (XTX )-1, B, C, R-matrix, and SSError, given the following data: Y
x1
x2
x3
1 3 6 9
0 1 2 3
2 4 5 7
1 -1 3 5
(setf x-data '((0 1 2 3)(2 4 5 7)(1 -1 5 5)) y-data '(1 3 6 9)) (setf X-matrix (X-matrix x-data)) (setf Xt-matrix (transpose-matrix X-matrix)) (setf Y-matrix (Y-matrix y-data)) (setf XtX-matrix (XtX x-data)) (setf XtY-matrix (XtY x-data y-data)) (setf B-matrix (B-matrix x-data y-data)) (setf C-matrix (C-matrix x-data)) (setf R-matrix (R-matrix x-data y-data)) (repeat #' print-matrix (list X-matrix Y-matrix XtX-matrix XtY-matrix B-matrix C-matrix R-matrix)) (setf SSerror (SSerror x-data y-data))
P369463-Ch008.qxd 9/2/05 2:57 PM Page 533
533
Miscellaneous
ans. = #2A((1 1 2 1)(1 1 4 1)(1 2 5 5)(1 3 7 5)) = #2A((1 1 1 1)(1 1 2 3)(2 4 5 7)(1 1 5 5)) = #2A((1)(3)(6)(9)) = #2A((4 7 18 12)(7 15 37 27)(18 37 94 66)(12 27 66 52)) = #2A((19)(43)(107)(79)) = #2A((-2.250002)(1.000027)(1.0)(0.249996) = #2A((9/8 1/8 1/2 -3/4)(1 -1 -1 1)(-1/2 1/2 0 0)(-1/8 -1/8 1/2 -1/4) = #2A((-2.098083e - 5)(-2.098083e - 5)(-3.242493e - 5) (-5.912781e - 5) SSerror = 0 X XT Y XTX XTY B C R
MISCELLANEOUS 28. For joint density fXY (x, y) = 2, 0 < y < x < 1, find the regression curve of Y on x and verify the curve by finding the parameters for the regression line mY
x
= mY + r
sY sx
( x - m x ).
29. For the following data, fit a regression curve. Check R2 and the t-test for b = 0. Then perform the lack of fit F-test and interpret the results. Refer to the data and residual plots. ans. Yˆ = -5.23 - 3.54x R2 = 0.96 t = 22.2 f = 10.57. 1 1 1 2 3 3 3 4 5 6 6 6 7 8 8 9 9 10 10 10 0 1/2 1/4 3 5 6 5 9 8 12 12 14 20 22 24 27 29 31 32 33
x Y
SSError = 89.57 with v = 18 df ; SSPure = 9.46 with v = 10 df ; SSLOF = 80.11 with v = 8 df .
35 30 25 20 15 10 5 0 0
5
10
Standardized Residuals
Residual Analysis 2 0 –2 0 –4
10
20
P369463-Ch008.qxd 9/2/05 2:57 PM Page 534
534
Chapter 8 Regression
30. Examine the residual plot and make an observation.
2 1 0 0
10
20
30
–1 –2 –3
31. Find the maximum likelihood estimators for parameters a, b, and s 2 in the simple linear regression model given by Yi = a + bxi for RV Yi. 32. Given X = (1 2 3 4 5), Y = (10 12 37 56 98), Z = ( 2 2 2 3 2 4 2 5 2 ), use software commands to check the following statements. The command (p8.32) assigns the data to X, Y and Z. True or False? a) (SSerror x y) = (SSerror z y) b) (B x y) = (B z y) c) (A x Y ) = (A z Y ) c) (Residuals x y) = (Residuals z y) d) (Sb x y) = (Sb z y) e) (Test-beta x Y ) = (Test-beta Z y) 33. Show that for x-data multiplied by a constant a and for y-data multiplied by a constant b, the t-statistic (Test-beta x y) = (Test-beta ax by). ans. t = B/ s B = [ S xY / S xx ]/ s 2 / S xx = [abS xY /a 2 S xx ]/( b/a ) s 2 / S xx . 34. Write the Yˆ equation, using the data in the table. What is the slope if 5 is added to each xi? Y x1
3 2
7 7
12 20
20 50
50 60
35. Find the regression equation to predict one’s emotional quotient (TEQ) from knowing scores for Intrapersonal, Interpersonal, Adaptability, Stress Management, and General Mood from the data. The command (Mp8.35) assigns the data to the variables. TEQ
95 83 104 89 78 89 101 94 90 94 74 88 89 82 52 91 95 104 97 90 79 113 92 82 123 93 87 108 104 90 96 82 97 56 95 105 116 107
INTRA
84 85 101 99 89 84 96 98 112 105 74 83 101 69 52 91 95 98 96 90 84 115 91 76 118 105 83 115 106 80 76 78 102 51 90 106 108 110
P369463-Ch008.qxd 9/2/05 2:57 PM Page 535
Miscellaneous
535
INTER
125 69 99 98 71 104 103 92 85 109 55 100 98 94 41 94 94 112 89 94 64 104 91 77 125 76 85 103 106 90 111 97 103 63 99 102 127 102
ADAPT
84 88 94 98 88 91 100 97 91 79 89 81 82 87 71 92 107 92 109 103 88 115 91 81 114 94 103 102 103 98 100 91 91 75 89 99 113 104
STRESS
100 98 107 68 73 97 113 81 58 77 104 83 72 92 60 101 86 108 102 85 94 109 100 94 124 98 89 96 103 100 125 66 81 77 99 106 121 100
GMOOD
97 85 100 94 80 87 95 101 93 104 63 112 102 96 75 88 86 114 91 81 83 113 93 121 123 78 82 110 103 99 101 97 108 58 103 112 117 108 ans. TEQ-hat = -15.702 - 0.401 X 1 + 0.192 X 2 + 0.217 X 3 + 0.207 X 4 + 0.137 X 5.
36. True or False? a) SYi2 = SYˆi2
b)
Â
(Yi - Yˆi )
=0
c) S(Y - Yˆ )(Yˆ - Y ) = 0
s
R ( n - 2)
e) Â Yˆi ei = 0 1 - R2 f) Â (Yˆ - Y )2 - B Â ( x - x )(Y - Y ) = Â (Y - Yˆ )2
d) T =
Given X = (1 2 3 4 5), Y = (12 37 98 56 10), Z = (1999 1998 1997 1996 1995), X2 = (3 6 9 12 15). True or False? g) h) i) j) k) l) m) n) o) p) q) r) s) t)
(SSerror X Y ) = (SSerror Z Y ) (B X Y ) = -B(Z Y ) (A X2 Y ) = (A X Y ) (residuals X Y ) = (residuals Z Y ) (A X Y ) = -(A Z Y ) R2 does not measure the magnitudes of the slopes. R2 measures the strength of the linear component of the model. R2 for x2 + y2 = 7 is 0. A large R2 does not necessarily means high predictability. R2 increases when sample size decreases. R2 varies as to the intercept A. (B x y) = (dot-product (B-Y-coef x) y) (A x y) = (dot-product (A-Y-coef x) y) (sum (yhats (swor 5 (upto 100)) Y )) = SYi
37. In a chemical process, 8 pressure measures are taken at 8 different temperatures. Determine the best values of coefficients A and B in using the exponential model Pressure = AeB*temperature. First evaluate a linear relationship and then perform a logarithmic transformation on the pressure to get the exponential relationship. Then perform polynomial regression for the quadratic and the cubic. Comment on the best relationship, using the R2 criterion.
P369463-Ch008.qxd 9/2/05 2:57 PM Page 536
536
Chapter 8 Regression Pressure (mm of mercury) Temperature (°C)
15.45 20
19.23 25
26.54 30
34.52 35
48.32 40
68.11 50
98.34 60
(T°C)
Pressure
Linear
Quad
Cubic
Exponential
20
15.45
7.27
12.99
15.32
17.12
25
19.23
18.24
20.17
19.57
21.12
30
26.54
29.20
28.20
26.40
26.05
35
34.52
40.16
37.07
35.37
32.14
40
48.32
51.13
46.79
46.03
39.65
50
68.11
73.06
68.76
70.67
60.34
60
98.34
94.99
94.12
96.78
91.84
70
120.45
116.91
122.86
120.82
139.77
120.45 70
The linear model is the simplest, but the exponential model may be more appropriate. The polynomial models deal with temperatures squared and cubed, which gives one pause as to why pressure would react to squared or cubed temperatures.
SOFTWARE EXERCISES LINEAR REGRESSION
x is a list of the x-values and Y is the corresponding list of the y-values. The command (setf x '(1 2 3 4 5) y '(6 5 4 0 -4)) assigns x to the list (1 2 3 4 5 ) and y to the list (6 5 4 0 -4). 1. (Sxx x) returns Sxx for x, a list of x-values. (Sxx x) returns 10; (Syy y) returns Syy = 68.8; (Sxy x y) returns SxY = -25. 2. (Predictions x y) returns a list of x, y values with the predicted y and the residuals. (Predictions x y) returns Obs
X
Y
Ypredict
Residual
1 2 3 4 5
1 2 3 4 5
6 5 4 0 -4
7.2 4.7 2.2 -0.3 -2.8
-1.2 0.3 1.8 0.3 -1.2
3. (Test-Betas (list x) y) prints Predictor
Coef
SE Coef
T-statistic
p-value
A B
9.7 -2.5
1.519 0.458
6.382 -5.455
0.007 0.012
S = 1.449
R-sq = 0.908
P369463-Ch008.qxd 9/2/05 2:57 PM Page 537
537
Software Exercises (Test - Regress (list x) y) prints Analysis of Variance Source SS DF Regression 62.5 1 Residual Error 6.3 3 Total 68.8 4
MS 62.5 2.1
F 29.761
p-value 0.009
(Layout x y) prints a columnar display of y, x, x2, xY, Yˆ, and (Y - Yˆ) Y 6 5 4 0 -4 S 11
x
x2
xY
Y-Hat
1 2 3 4 5 15
1 4 9 16 25 55
6 10 12 0 -20 8
7.2 4.7 2.2 -0.3 -2.8 11
(Y-YHat)
(Y-YHat)2
-1.2 0.3 1.8 0.3 -1.2 0
1.44 0.09 3.24 0.09 1.44 6.3
Y-hat = 9.7 - 2.5 X 4. (SSerr x y) returns SSError; (SSerr x y) returns 6.3. 5. (B x y) returns the B value. 6. (A x y) returns the A value. 7. (Y-hat x y) returns the equation Yˆ = A + Bx. 8. (R-sq x y) returns R2. 9. (Sexplain x y) returns SExplain, the explained variation. 10. (Sa x y) returns the standard error of A, sA. 11. (Sb x y) returns the standard error of B, sB. 12. (s2 x y) returns s2. 13. (sYm x y x0) returns the standard error for the mean value of Y, at x = x0. 14. (sYp x y x0) returns the predicted standard error of Y, at x = x0. 15. (cia x y a-level) returns a (1 - a-level/2) confidence interval for a. 16. (cib x y a-level) returns a (1 - a-level/2) confidence interval for b. 17. (ciYm x y x0 a-level) returns a (1 - a-level /2) confidence interval for the mean value of a + bx0. 18. (ciYp x y x0 a-level) returns a (1 - a-level/2) confidence interval for the predicted value of a + bx0. 19. (Residuals x y) returns a list of the residuals; (yhats x y) returns a list of the Yˆ values.
P369463-Ch008.qxd 9/2/05 2:57 PM Page 538
538
Chapter 8 Regression
20. (Test-beta x Y B0) returns the t- and p-values for testing the null hypothesis b = B0. (Test-alpha x y A0) returns the t- and p-values for testing the null hypothesis a = A0. The value of A0 defaults to 0 if unspecified. 21. The least squares estimator for f(x) = 1/(q - 1) on [1, q] may be computed from the following command for 30 data points: (LSE (sim-uniform 1 20 30)) Vary the sample size 30 and q = 20 for other values. 22. (Y-predict x y x0) returns the predicted mean value of Y at x = x0. 23. Generate data with a normal error value added by using the software command (setf data (sim-lr a b s n), which returns a list of n Y-values from the equation Y = a + bx, with added noise from N(0, s 2); that is, Y = a + bx + E. The command (Y-hat (upto n) data) retrieves the Yhat equation that should show A and B close to a and b, depending on the size of s and n. Try (setf data (sim-lr 240 1000 4 100)) followed by (y-hat (upto 100) data) to retrieve the Y-hat line estimate of the true regression line Y = 240 + 1000x. What does (Y-hat (SWR 100 (upto 100)) data) return? NOISE. To show the outlier effect, (setf y-data (sim-lr 3 5 2 30)) returns 30 data points about the line Y = 3 + 5x with an added N(0, 4) component. Use (pro y-data) to see that linear regression is appropriate, followed by (pro (residuals (upto 30) y-data)) to see that the residual plots are acceptable. Now (setf (nth 29 y-data) 5) changes Y | x = 30 to 5, an outlier. Repeat (pro (residuals (upto 30) y-data)) to see the devastating effect of an outlier.
P369463-Ch008.qxd 9/2/05 2:57 PM Page 539
539
Software Exercises
Residuals Versus the Order of the Data (response is C1) 4 3
Residual
2 1 0 –1 –2 –3 –4 5
10
15
20
25
30
Observation Order
Residuals Versus the Order of the Data (response is C1)
Residual
0
–50
–100
5
10
15
20
Observation Order
25
30
P369463-Ch008.qxd 9/2/05 2:57 PM Page 540
540
Chapter 8 Regression
24. (display-mlr X-data Y-data) prints to a file various outputs from multiple linear regression analysis, and (lr-stats x y) prints various statistics for simple linear regression. 25. (mlr X1 X2 Y-data) prints the normal equations for the model Y-Hat = A + Bx1 + Cx2. Use (setf x1 '(15 40 50 55 42 33 23 21 19 15) x2 '(19 17 17 15 15 15 16 17 18 18)) (self Y-data '(53 73 116 117 153 174 231 299 400 599)). Use the command to find the normal equations. With use of the matrix approach, the command (y-hat (list x1 x2) y-data) returns Y-hat = 767.721 - 7.513 X 1 - 18.864 X 2. Y
x1
x2
53 73 116 117 153 174 231 299 400 559
15 40 60 55 42 33 23 21 19 15
19 17 17 15 15 15 16 17 18 18
26. Find the regression Yˆ plane, using the matrix approach. Y x1 x2
2 1 2
5 3 3
7 5 5
8 7 7
9 9 11
11 11 13
13 13 17
17 15 19
27. (print-matrix matrix-A) prints matrix A. For example, (print-matrix (X-matrix '((1 2)(4 3))) returns 1 1
1 2
4 3
(inverse matrix) returns the inverse. (inverse (make-mat '((1 2)(4 3)))) returns # 2A(( -3/5 2/5)(4/5 -1/5)) (print-matrix (inverse (make-mat '((1 2)(4 3))))) prints -3/5 4/5
2/5 -1/5.
28. For the given data x1, x2, and Y below (assigned by command (p8.28), use commands a) (R-sq x1 x2) to show that x1 and x2 are uncorrelated, and that (B x1 x2) Æ 0.
P369463-Ch008.qxd 9/2/05 2:57 PM Page 541
541
Software Exercises
b) (y-hat x1 Y ), (y-hat x2 Y ), and (y-hat (list x1 x2) Y ) to show that simple linear regression slopes are the same for multiple regression, and c) (Regress-Anova x1 Y ), (Regress-Anova x2 Y ), and (RegressAnova (list x1 x2) Y ) to show that the explained regression for the multiple linear model is the sum of the explained regression for the two simple linear regressions. Y X1 X2
3 5 6 7 11 11 19 20 7 7 7 7 10 10 10 10 5 5 12 12 5 5 12 12
29. The command (X-matrix x-data) returns the matrix X. (Y-matrix y-data) returns the matrix Y. (XT x-data) returns the matrix XT. (XTX x-data) returns the matrix XTX. (XTY x-data y-data) returns the matrix XTY. (C-matrix x-data y-data) returns the matrix C = (XTX )-1XT. (Beta-estimates x-data y-data) returns a list of the Bi’s. (SSerror x-data y-data) returns SSerror. (sŸ2 x-data y-data) returns the estimate for s 2. (sŸ2Y0 x-data y-data x-values) returns the variance estimate for Ypredict. (R-sq x-data y-data) returns R2. (inverse matrix) returns the inverse of square matrix matrix. (R-matrix x-data y-data) returns the matrix of residuals. (Sexplain x-data y-data) returns SSExplain. (CCT x-data y-data) returns (XTX )-1. (ci-mlr x-data y-data alpha) returns the bi 1 - alpha/2 confidence intervals. SSExplained / k . (Test-regress x-data y-data) returns the F-statistic SSError/( n-k-1) (Test-betas x-data y-data alpha) returns a list of degrees of freedom, t- and p-values for testing the individual null hypotheses bi = 0 vs. bi π 0. 30. Use the commands in Software Exercise 29 to perform multiple linear regression analysis on the following data to create the matrices X
XT
Y
XTX
XTY
(X T X) -1 = CC T B
C
R.
Find 95% confidence intervals for the beta coefficients. Find the t-values for testing the individual hypotheses bi = 0. Test for the significance of the regression. (setf x-data '((1 2 3 4)(95 4 2 3)) y-data '(18 12 9 7)) assigns the data.
P369463-Ch008.qxd 9/2/05 2:57 PM Page 542
542
Chapter 8 Regression
È1 Í1 X =Í Í1 ÍÎ 1
Y
x1
x2
x12
x22
x1x2
x1Y
x2Y
18 12 9 7
1 2 3 4
5 4 2 3
1 4 9 16
25 16 4 9
5 8 6 12
18 24 27 28
90 48 18 21
1 2 3 4
5˘ È1 1 1 1˘ È 4 10 14˘ È 46˘ 4˙ T Í ˙, X = 1 2 3 4˙, X T X = Í10 30 31˙, X T Y = Í 97˙ Í ˙ Í ˙ Í ˙ 2˙ ÍÎ177˙˚ ÍÎ5 4 2 3˙˚ ÍÎ14 31 54˙˚ ˙˚ 3
È18˘ È 18.31 -2.94 -3.05˘ Í12˙ Í ˙ T -1 T Y = Í ˙, ( X X ) = CC = Í -2.94 -0.5 0.4 ˙ Í 9˙ ÍÎ-3.05 0.4 0.5 ˙˚ ÍÎ ˙˚ 7 È659/36 -53/18 -55/18˘ = Í -53/18 5/9 4/9 ˙ Í ˙ ÍÎ -55/18 4/9 5/9 ˙˚ È0.083 0.194 3.361 2.638˘ Í ˙ C = Í 0.16 -0.05 -0.38 0.61 ˙ = ÍÎ 0.16 0.05 -0.61 0.38 ˙˚ È 0.83˘ È15.61˘ Í ˙ Í ˙ B = Í -2.8 ˙, R = Í -1.38˙ Í 0.27 ˙ ÍÎ 0.8 ˙˚ Í ˙ Î 0.27 ˚
/ 7/36 121/36 -95/36˘ È112 Í 1/6 -118 / -7/18 1118 / ˙ Í ˙ ÍÎ 1/6 118 / -1118 / 7/18 ˙˚
b confidence intervals: b 0 15.61 ± 69.30 intercept SSError = 2.7; s 2 = 2.7 b1 -2.89 ± 12.07 b 2 0.89 ±12.07 Total Variation SYY = 69 2
R = 0.9597
Explained Variation SExplain = 66.2
Ti -values = (1.036 0.492 - 0.313)
Test for regression, using the F-statistic = SExplain /2 = 66.2/2 = 11.92. SSError /1 2.7/1 31. (R-to-Z R) converts the correlation coefficient R to Z. (Z-to-R Z) converts the Z-value to R. 32. Predict the return from (R-sq (sim-uniform 0 1 100) (sim-uniform 0 1 100)). Expect the correlation to be close to 0. Then repeat for
P369463-Ch008.qxd 9/2/05 2:57 PM Page 543
543
Software Exercises
(R-sq (sort (sim-uniform 0 1 100)#' 3.98 = fc(2, 11) with p-value = F2,11(4.3) = 40.4 0.042, which is less than a = 0.05, implying that the data reject the hypothesis of equal means with a set at 5%. Equivalent computational formulas for the sum of squares are SSBetween = c Ti2 T 2 Â r - N where Ti is the total of each column, T is the sum of all the i =1 i treatments, and N is the total number of observations. For example, if we rework Example 9.1, The F-value is
SSBetween =
4202 6
+
2602 4
+
2322 4
-
(420 + 260 + 232)2
= 345.71.
14
SS Total = S x ij2 - N x 2 = 652 + 70 2 + 752 + 792 + 70 2 + 612 + 70 2 + 722 + 652 + 532 + 60 2 + 552 + 592 + 582 - 14 * 65.12 = 60200 - 59410.29 = 789.71. SS Within = SS Total - SSBetween = 789.71 - 345.71 = 444.
P369463-Ch009.qxd 9/2/05 2:59 PM Page 558
558
Chapter 9 Analysis of Variance
The commands (SSb data) returns SSBetween, (SSw data) returns SSWithin, (SSt data) returns SSTotal, (anova data) returns the entire table. (setf data '((65 70 75 79 70 61) (70 72 65 53) (60 55 59 58))). Then (SSb data) returns 345.7, (SSw data) returns 444.0 The command (mu-svar data) returns the sample means and variances of each treatment. (mu-svar data) returns ((70.0 42.4) (65.0 72.67) (58.0 4.67)).
EXAMPLE 9.2
Show the equivalence of the F statistic for ANOVA and the square of the t statistic for testing the difference between the following two samples. Verify that the mean square error value is equal to the pooled variance 2 sPooled .
X1 X2
5 7 10 9 9 4 6 8 7 10
H 0: m1 - m 2 = 0 vs. H1: m1 - m 2 π 0 Solution ¥2) yields
(setf ¥1 '(5 7 10 9 9) ¥2 '(4 6 8 7 10)) followed by (t-pool ¥1
x1 = 8;
x2 = 7,
s x21 = 4
sx22 = 5,
2 sPooled = 4.5,
spooled = 2.12,
t = 0.7454,
p-value = 0.4774;
95% confidence interval is (-2.09414 4.09414). t=
( x1 - x2 ) - ( m1 - m 2 ) 1 ˆ Ê 1 2 sPooled + Ën n x2 ¯ x1
=
(8 - 7 ) - 0
= 0.7453559.
Ê 1 1ˆ 4.5 + Ë 5 5¯
Observe t2 = (0.7453559)2 = 0. 5 = F (see the ensuing ANOVA table). The pooled variance 4.5 is the mean square error value. The degrees of freedom are 8 for the t statistic and (1, 8) for the F statistic.
P369463-Ch009.qxd 9/2/05 2:59 PM Page 559
9.1 Single-Factor Analysis
559
The command (ANOVA data) returns the ANOVA table. (ANOVA '((5 7 10 9 9) (4 6 8 7 10))) returns
EXAMPLE 9.3
Source
SS
df
MS
F
p-value
Between Within
2.5 36.0
1 8
2.5 4.5
0.555
0.48
Total
38.5
9
Show that the square of the t-value for testing H0: b = 0 in using the regression data is equivalent to the F-test ratio of explained variation to mean square error.
1 12
x Y
Solution
2 15
3 20
4 23
5 27
(setf ¥ '(1 2 3 4 5) Y '(12 15 20 23 27))
The command (Y-hat ¥ y) returns Yˆ = 8 + 3.8x; (Sb x y) returns the standard error of B, sB = 0.1633, from which
t=
B-b SB
=
3.8 - 0
= 23.27.
0.1633
The command (Sexplain x y) returns SSExplain = 144.4 with (2 - 1) = 1 df. (SSerror x y) returns SSError = 0.8 with (5 - 2) = 3 df, from which MSE =
0.8
= 0.26,
3 F =
144.4 0.26
= 541.5 = 23.27 2 = t 2 .
P369463-Ch009.qxd 9/2/05 2:59 PM Page 560
Chapter 9 Analysis of Variance
560
EXAMPLE 9.4
Test the following data consisting of five treatments, each containing 20 samples, with a set at 0.05, to determine if there is a difference among the means. H 0: m i = m j vs. H1: m i π m j for i π j for i = 1, 2 . . . , 5 Solution
(19 (40 (33 (26 (30
(EX 9.4) assigns the data below to the variable cables.
24 47 15 22 54
12 33 23 26 49
33 35 21 27 61
32 33 24 15 40
19 35 23 31 47
12 46 22 23 37
11 19 25 16 50
24 23 30 25 58
28 36 33 33 61
12 26 35 22 57
13 17 30 26 27
18 32 38 22 29
21 24 12 39 34
23 22 39 16 46
20 29 44 26 26
22 35 29 31 58
12 25 36 26 48
15 23 27 35 36
33) 39) 41) 20) 34)))
The command (anova cables) returns the table. The 5 means are 20.15, 30.95, 29, 25.35, and 44.1, returned by (mu cables). The 5 sample variances are 53.081, 72.155, 72.842, 39.818, and 141.674, returned by (svar cables).
Source
SS
df
MS
F
p-value
Between Within Total
6386.34 7211.85 13598.19
4 95 99
1596.585 75.914
21.03
0.000
Note that the F-value 21.03 indicates a significant difference at a = 0.05, since the p-value ª 0, confirming the rejection of equal means. Thus we reject the null hypothesis of no difference among the means. For equal treatment sizes the pooled variance, also known as mean square error (MSE), is the average of the treatment variances. MSE = 75.914 = (53.081 + 72.155 + 72.842 + 39.818 + 141.673)/5. Notice the disparity in the treatment means 20.15, 30.95, 29, 25.35, and 44.1 (sŸ2-pooled cables) Æ 75.914.
The Bartlett Test for Homogeneity of Variances One of the assumptions of an ANOVA is that the treatment groups are random samples from independent normal populations. The Bartlett test can be used to check the null hypothesis for equal variances, that is, H0: s 2i = s 2j versus H1: s 2i π s 2j for some i and j.
P369463-Ch009.qxd 9/2/05 2:59 PM Page 561
9.1 Single-Factor Analysis
561
Under H0, the best estimate for the variance is the pooled variance or mean square error. The Bartlett test statistic is c 2 - Â ( ri - 1)Ln S 2i ( N - c)Ln SPooled i =1
B= 1
,
c
1 1 ˆ Ê 1+ Â Ë 3( c - 1) i =1 ri - 1 N - c ¯ where N is total sample size, c is the number of treatments, ri is the sample 2 size of the ith treatment, and SPooled is the pooled variance. The B statistic is a chi-square random variable with (c - 1) degrees of freedom. EXAMPLE 9.5
Use the Bartlett test to check for equal variances at a = 5% in Example 9.4. 2 = 75.914, and the Bartlett Solution With N = 100, c = 5, ri = 20, sPooled test statistic B = 8.77 with p-value = 0.0672, the null hypothesis of homogeneity of variances cannot be rejected.
The command (Bartlett cables) returns the Bartlett B statistic with the p-value for testing H 0: s i2 = s 2j for all i and j versus H1: s i2 π s 2j for some i and j. (Bartlett cables) returned (B = 8.768 p-value = 0.067) for the data in Example 9.4, failing to reject equal variances at a = 5%.
EXAMPLE 9.6
Check the homogeneity assumption for the data in Example 9.1 repeated below. Find a 95% confidence interval for the mean of Treatment 1 and for the mean of the difference between Treatments 1 and 2. Treatment 1 65 70 75 79 70 61
Treatment 2
Treatment 3
70 72 65 53
60 55 59 58
Solution With data assigned to a list of the treatment groups, the software commands (setf data ' ((65 70 75 79 70 61)(70 72 65 53)(60 55 59 58))) (Bartlett data) returned B = 3.95, p-value = 0.139, implying a safe homogeneity assumption.
P369463-Ch009.qxd 9/2/05 2:59 PM Page 562
562
Chapter 9 Analysis of Variance
When the assumption of equal variances is not met, transformations can be considered or a nonparametric ANOVA (Chapter 10) can be performed. For a balanced design, a 95% confidence interval (CI) for mi is x i ± tr (c -1),0.025
MSE
,
n
where the t-value is computed with use of the degrees of freedom for the mean square error term. In the case of an unbalanced design, a 95% CI for m1, with use of the treatment 1 data (65 70 75 79 70 61), is 70 ± t11,0.025
MSE
= 70 ± 2.201
40.363
= (64.47, 75.53).
6
ni
The 95% CI for the difference between the means of Treatments 1 and 2 is Ê 1 1ˆ (70 - 65) ± t11,0.025 MSE + = 5 ± 2.201* 4.1 = 5 ± 9.03 = ( -4.03,14.03). Ë 6 4¯
9.2
Two-Way ANOVA without Replication The two-way procedure for ANOVA is a generalization of the one-way. Several advantages accrue from a two-way over a one-way ANOVA: fewer subjects are required to control more of the error, with the ability to study interaction with less masking of the treatment effects. We show a simple example to compute the variations and show the layout for the two-way ANOVA. We have effects of a factor Y of interest at different levels A through N. For example, we may have the factor size at levels large, medium, and small. We also have effects of a factor X at different levels 1 through k. The layout of the data is shown in Table 9.3. The two main factors X and Y, with 1 to k levels of X and A to N levels of Y, are called the main effects. Table 9.3
Two-Way ANOVA without Replication FACTOR Y
Factor X
Levels
A
B
...
N
1 2 ... K
x11 x21
x12 x22
xk1
xk2
... ... ... ...
x1n x2n ... xkn
P369463-Ch009.qxd 9/2/05 2:59 PM Page 563
9.2 Two-Way ANOVA without Replication
Table 9.4
563
Two-Way Layout without Replication SS
df
MS
F
Fc a = 0.05
p-value
r-1 c-1 (r - 1)(c - 1) N-1
Rows Columns Error Total
We assume that there is only one subject in each cell. That is, x11 is a single measurement (no replication) in Treatment 1 at level 1. Sometimes we may want to control for an extraneous or suspected important factor and block on such a variable. For example, in testing general purpose vegetable fertilizers, we may want to also block on plot location to control for such extraneous factors as elevation, drainage, soil fertility, and amount of sunshine. The analysis of the data is performed by calculating entries for the table shown in Table 9.4. Variations for SSRows and SSColumns are computed similarly to those for SSBetween in the one-way analysis. The total sample size n is equal to r * c. In computing the degrees of freedom in the analysis of variance, visualize the computation and subtract the givens or the already specified. For example, if there are r row means with a specified overall mean, the degree of freedom value is equal to r - 1 for SSRows. For SSError, there are rc cell means minus the specified row means minus the specified column means plus one, since the specified row means and specified column means each implicitly specify the overall mean, giving (rc - r - c + 1) = (r - 1)(c - 1). EXAMPLE 9.7
Three different general fertilizers were applied on four acres in different fields to study their effects on yields of corn. Perform the ANOVA, using location (Acre) as a blocking variable, and repeat the analysis, ignoring the effects of location.
FERTILIZER
Acre
1 2 3 4 S c s2
A
B
C
S
r
s2
7 6 7 8 28 7 0.67
14 7 8 7 36 9 11.3
6 11 12 3 32 8 18
27 24 27 18
9 8 9 6
19 7 7 7
x=8
P369463-Ch009.qxd 9/2/05 2:59 PM Page 564
564
Chapter 9 Analysis of Variance
Solution
(setf data '((7 6 7 8)(14 7 8 7)(6 11 12 3))).
Compute SSRows and SSColumns as in the one-way analysis. The four row means are, respectively, 9, 8, 9, and 6, and the overall mean x is 8. SS Rows = 3[(9 - 8)2 + (8 - 8)2 + (9 - 8)2 + (6 - 8)2 ] = 3(1 + 0 + 1 + 4) = 18, with 4 - 1 = 3 df (see first row of Table 9.5a). (SSrows data 4) Æ 18. The three column means are, respectively, 7, 9, and 8. SS Columns = 4[(7 - 8)2 + (9 - 8)2 + (8 - 8)2 ] = 4(1 + 1) = 8 with 2 df (see second row of Table 9.5a). (SScols data) Æ 8. The error term for this 2-way without replication is calculated as the square of the cell mean, (a single xij) minus the row mean minus the column mean plus the overall mean. SS Error = (7 - 9 - 7 + 8)2 + (14 - 9 - 9 + 8)2 + (6 - 9 - 8 + 8)2 + (6 - 8 - 7 + 8)2 + (7 - 8 - 9 + 8)2 + (11 - 8 - 8 + 8)2 + (7 - 9 - 7 + 8)2 + (8 - 9 - 9 + 8)2 + (12 - 9 - 8 + 8)2 + (8 - 6 - 7 + 8)2 + (7 - 6 - 9 + 8)2 + (3 - 6 - 8 + 8)2 = 1 + 16 + 9 + 1 + 4 + 9 + 1 + 4 + 9 + 9 + 0 + 9 (SSerror-a data 4) Æ 72. = 72 with ( r - 1) * ( c - 1) = 6 df (see third row of Table 9.5a). SSTotal is computed as in the one-way ANOVA. Of course, we can simply add the already computed rows, columns, and error variations to get 18 + 8 + 72 = 98 for the total variation. (SSt data) Æ 98. The MS column is the SS column divided by the df column (SS/df), and the F column is the ratios of the corresponding MS to the MSE (that is, 6/12 = 0.50 and 4/12 = 0.33). The results of the ANOVA without the blocking variable are shown in Table 9.5b. Table 9.5a
ANOVA with Blocking Variable
Source
SS
df
MS
F
p-value
Fc a = 0.05
Rows Columns Error Total
18 8 72 98
3 2 6 11
6 4 12
0.50 0.33
0.695 0.730
4.76 5.10
P369463-Ch009.qxd 9/2/05 2:59 PM Page 565
9.2 Two-Way ANOVA without Replication
Table 9.5b
565
ANOVA without Blocking Variable
Source
SS
df
MS
F
p-value
Columns Error Total
8 90 98
2 9 11
4 10
0.4
0.682
Fc a = 0.05
The command (anova data number-of-rows) returns the completed ANOVA table where data is a list of column data and r is the number of rows. For example, (anova '((7 6 7 8)(14 7 8 7)(6 11 12 3)) 4) returns the completed ANOVA Table 9.5a.
To Block or Not to Block To control for extraneous error in testing 2 treatments or samples, the paired t-test was used. To control in testing more than 2 treatments, blocking is used. In Example 9.7, performing the ANOVA with the blocking variable location reduced the sum of squares error from 90 to 72, with 18 accounted for by the blocking variable. But the degrees of freedom were also reduced from 9 to 6. This loss in freedom implies that to block unnecessarily requires a higher F-ratio to be significant, thus resulting in a lack of power. The MSE has fewer degrees of freedom when blocking, implying that the pooled variance is less precise. In a balanced design, the degrees of freedom associated with the one-way error variation is c(r - 1), where c and r are the number of columns and rows, respectively. In the block design the degrees of freedom associated with the error variation is (r - 1)(c - 1) = rc - r - c + 1. Thus the difference is (rc - c) - (rc - r - c + 1) = r - 1. In determining whether to block or not, an indicator is the ratio of the mean sum of squares for the blocking factor divided by mean square error. In this example the ratio is 6/12 = 0.5, indicating not to block. A ratio greater than one indicates that blocking is effective. There are trade-offs in a rigidly controlled experiment versus one of randomization. A rigidly controlled experiment may not generalize or scale under different controls. The model for a two-way analysis of variance with replication is given by X ij = X + ( X r - X ) + ( X c - X ) + ( X cell - X r - X c + X ) + Eij ( X ij - X ) = ( X r - X ) + ( X c - X ) + ( X cell - X r - X c + X ) + Eij = ( X r - X ) + ( X c - X ) + ( X cell - X r - X c + X ) + ( X ij - [ X + ( X r - X ) + ( X c - X ) + ( X cell - X r - X c + X ).
P369463-Ch009.qxd 9/2/05 2:59 PM Page 566
Chapter 9 Analysis of Variance
566
Table 9.6
Blocking Effects a. NO BLOCKING
Source
SS
df
MS
F
p-value
Between Within Total
56 60.25 116.25
2 9 11
28 6.69
4.19
0.049
b. BLOCKING Source
SS
df
MS
F
p-value
Rows Columns Block Error Total
24.08 56 8.66 27.5 116.25
1 2 2 6 11
24.08 28 4.33 4.58
5.26 6.11 0.94
0.062 0.036 0.441
Squaring both sides yields S( X ij - X )2 = nc S( X j - X )2 + nr S( X r - X )2 + ncell S( X ij - X r - X c + X )2 + ( X ijk - X cell )2 Total = Column effects + Row effects + Interaction effects + Error EXAMPLE 9.8
Perform an ANOVA at a = 5% for the 2 ¥ 3 design and determine whether to block or not on Rows. COLUMNS Levels
C1
C2
C3
R1
12 14
10 15
14 18
R2
13 14
16 16
19 22
Rows
Using the (ANOVA data) and (ANOVA data 2) produced the one-way and two-way ANOVA tables shown in Table 9.6. The blocking ratio (24.08/4.58 = 5.26 > 1) indicates that blocking was proper and effective. In the one-way ANOVA the column effects were not significant at a = 5%, but after blocking on the Rows, the column effects were found to be significant.
9.3
Two-Way ANOVA with Replication Another variation of the two-way is a two-way with replication, where there is a source of variation called interaction. Interaction occurs when the effects
P369463-Ch009.qxd 9/2/05 2:59 PM Page 567
9.3 Two-Way ANOVA with Replication
Table 9.7
567
Two-Way Layout with Replication
Source
df
MS
Rows Columns RC Error Total
r-1 c-1 (r - 1)(c - 1) rc(nc - 1) N-1
F
p-value
Fc a = 0.05
of each factor depend on the specific levels or values of the other factors. SSrc designates the interaction effect. In studying the behavior of high school children, the behavior of boys in an all-boys class and the behavior of girls in an all-girls class can be significantly different from the behavior of boys and girls in a mixed class because of the interaction of the boys with the girls. Similarly, a certain brand may be significantly different at a certain level of input from the other brands. For example, in testing the efficiency of airconditioners, we may have efficiency indicators for brand A and brand B at high, medium, and low settings. A two-way layout with replication is shown in Table 9.7. The number in each cell is denoted by nc, r is the number of rows, c is the number of columns, N is the total sample size. The model for a two-way analysis of variance with replication is given by m ij = m.. + Ri + C j + ( RC )ij + Eij where m.. is the overall mean, Ri is the main row effect at level i, Cj is the main column effect at level j, and (RC)i,j is the interaction of the R effect at level i with the column effect at level j. SS Rows = r1( r1 - x )2 + r2 ( r2 - x )2 + . . . + rr ( rr - x )2 , where rr indicates the size of the rth row, rr is the mean of the rth row, and x is the overall mean. Similarly, SS Columns = c1( c1 - x )2 + c2 ( c2 - x )2 + . . . + cc ( cm - x ), where cc indicates the size of the cth column and cc is the mean of the cth column. SS rc = nc[( rc11 - r1 - c1 + x )2 + ( rc12 - r1 - c2 + x )2 + . . . + ( rc km - rk - cm + x )2 ]. where nc is the cell size (assumed the same for each cell), ri is the mean of the ith row, cj is the mean of the jth column, and rcij is the cell mean of the ith row and jth column cell. We next show a two-way analysis with replication using a simplified version of the data in Example 9.7. EXAMPLE 9.9
Perform a two-way analysis with replication on the data in the table below, which has 2 types (rows) of wheat and 3 columns (the three fertiliz-
P369463-Ch009.qxd 9/2/05 2:59 PM Page 568
568
Chapter 9 Analysis of Variance
ers), with yields in bushels per acre indicated by the data. The cell means are in bold. Solution
(setf data '((7 6 7 8)(14 7 8 7)(6 11 12 3))) Fertilizer A
Fertilizer B
Fertilizer C
7 6 6.5 7 8 7.5 28 7
14 7 10.5 8 7 7.5 36 9
6 11 8.5 12 3 7.5 32 8
Wheat 1 Wheat 2 S c
SS Rows = r1( r1 - x )2 + r2 ( r2 - x )2 + . . . + rr ( rr - x )2 = 6(8.5 - 8)2 + 6(7.5 - 8)2 = 3 with 2 - df (see first row of Table 9.8). 2
2
S
r
51
8.5
45
7.5
x=8
(SSrows data 2) Æ 3. 2
SS Columns = c1( c1 - x ) + c2 ( c2 - x ) + . . . + cc ( cc - x ) = 4(7 - 8)2 + 4(9 - 8)2 + 4(8 - 8)2 = 8 with 2 df (see second row of Table 9.8)
(SScols data) Æ 8.
The interaction is computed similarly to that for the error term without replication. The interaction is the number in each cell (2) times the square of the cell mean minus the row mean minus the column mean plus the overall mean. SS rc = nc [( rc11 - r1 - c1 + x )2 + ( rc12 - r1 - c2 + x )2 + . . . + ( rc rc - rr - cc + x )2 ] = 2[(6.5 - 8.5 - 7 + 8)2 + (10.5 - 8.5 - 9 + 8)2 + (8.5 - 8.5 - 8 + 8)2 + (7.5 - 7.5 - 7 + 8)2 + (7.5 - 7.5 - 9 + 8)2 + (7.5 - 7.5 - 8 + 8)2 = 8 with 2 df (see third row of Table 9.8) (SSrc data 2) Æ 8. The between cells sum of squares is the sum of the row, column, and interaction effects. That is, SS Cells = 2[(6.5 - 8)2 + (10.5 - 8)2 + (8.5 - 8)2 + (7.5 - 8)2 + (7.5 - 8)2 + (7.5 - 8)2 ] = 2[2.25 + 6.25 + 0.25 + 0.25 + 0.25 + 0.25] = 19. (SScells data 2) Æ 19. Table 9.8
Two-Way with Replication
Source
SS
df
MS
F
p-value
Rows Columns RC Error Total
3 8 8 79 98
1 2 2 6 11
3 4 4 13.2
0.23 0.30 0.30
0.65 0.75 0.75
Fc a = 0.05 5.99 5.14 5.14
P369463-Ch009.qxd 9/2/05 2:59 PM Page 569
9.3 Two-Way ANOVA with Replication
569
The error term is computed similarly to the one-way analysis in that each entry in each cell is subtracted from its cell mean, squared, and summed. SS Error = (7 - 6.5)2 + (6 - 6.5)2 + (14 - 10.5)2 + (7 - 10.5)2 + (6 - 8.5)2 + (11 - 8.5)2 + (7 - 7.5)2 + (8 - 7.5)2 + (8 - 7.5)2 + (7 - 7.5)2 + (12 - 7.5)2 + (3 - 7.5)2 = 0.25 + 0.25 + 12.25 + 12.25 + 6.25 + 6.25 + 0.25 + 0.25 + 0.25 + 0.25 + 20.25 + 20.25 = 79 with 6 df ( rcnc - rc) where nc is the number in each cell. (SSerror-a data 2) With the high p-values, the null hypothesis of equal means cannot be rejected. Again, the F column is the ratio of the MS effects (3, 4, 4) to the MSE (13.2), giving 4 4 ˆ Ê 3 , , = (0.23, 0.30, 0.30), respectively. Ë 13.2 13.2 13.2 ¯
The command (Row-means data-list r) returns the row means, r the number of rows (Column-means data-list) returns the columns means, (Cell-means data-list r) returns the cell means, (SSrows data-list r) returns SSRows, (SScols data-list) returns SSColumns (SSrc data-list r) returns SSrc, (SScells data-list r) returns SSCells, which equals SSColumns + SSRows + SSrc, (SSerror-a data list r) returns SSerror. (SSt data-list) returns SSTotal. (anova data-list nmu-rows) returns the completed table, where datalist is a list of column data, num-rows is the number of rows. (Row-means '((7 6 7 8) (14 7 8 7) (6 11 12 3)) 2) returns the list of row means (8.5 7.5).
EXAMPLE 9.10
Analyze the interaction (Fig. 9.7) for the data in the 3 ¥ 3 layout (Table 9.9a) by plotting the row and column means (Table 9.9b). Then complete the twoway ANOVA (Table 9.9c). Row effect = R1 + R2 + R3 = (8 - 8) + (7.5 - 8) + (8.5 - 8) = 0 - 0.5 + 0.5 = 0. Column effects = C1 + C2 + C3 = (6 - 8) + (8 - 8) + (10 - 8) = -2 + 0 + 2 = 0.
P369463-Ch009.qxd 9/2/05 2:59 PM Page 570
570
Chapter 9 Analysis of Variance
15 10 5 0 0
1
Figure 9.7
Table 9.9a
2
3
4
Interaction
3 ¥ 3 ANOVA Data COLUMNS Levels
C1
C2
C3
R1
2.5 3.5 6 8 9 7
7 9 7 7 8 10
12 14 8 9 8 9
R2
Rows
R3
Table 9.9b
Cell, Row, and Column Means C1
C2
C3
¯ R
3 7 8 6
8 7 9 8
13 8.5 8.5 10
8 7.5 8.5 x=8
R1 R2 R3 ¯ C
Table 9.9c
Two-Way ANOVA with Interaction
Source
SS
df
MS
F
p-value
Rows Columns RC Error Total
3 48 56 11.5 118.5
2 2 4 9 17
1.5 24 14 1.28
1.17 18.78 10.96
0.341 0.001 0.002
P369463-Ch009.qxd 9/2/05 2:59 PM Page 571
9.4 Multiple Comparisons of Treatment Means
(a) 40 20 0 0
2
4
No Interaction
Figure 9.8
571
(b) 20 10 0 0
2
4
Interaction
Additive and Multiplicative Effects on Interaction
RC effects = ( RC )1,1 + . . . + ( RC )3,3 = (3 - 8 - 6 + 8) + (8 - 8 - 8 + 8) + (13 - 8 - 10 + 8) + (7 - 7.5 - 6 + 8) + (7 - 7.5 - 8 + 8) + (8.5 - 7.5 - 10 + 8) + (8 - 8.5 - 6 + 8) + (9 - 8.5 - 8 + 8) + (8.5 - 8.5 - 10 + 8) = -1 + 0 + 1 + 1.5 - 0.5 - 1 + 1.5 + 0.5 - 2 = 0. The algebraic deviations about a mean always sum to zero, but the sum of the squared deviations about the mean is not zero unless each deviation is zero. Thus there is row, column, and interaction effect when the algebraic deviations are squared. In general, a plot of the treatment means shows no interaction when the lines connecting the cell means are parallel and shows interaction when the lines are not parallel. The additive effects in Figure 9.8a show no interaction where the multiplicative effects in Figure 9.8b show interaction. The more sharply the cell means deviate from parallelism, the stronger the interaction.
9.4
Multiple Comparisons of Treatment Means After possibly discovering that there is at least one treatment group distinguished from the others in the analysis of variance, we are faced with the problem of ferreting out the significant measures. One might think that all that is necessary is to perform all the combinations of tests, taken 2 at a time, using hypothesis-testing procedures. But if we perform 10 such independent tests, each at a = 5%, the Type I error rate would increase from 5% to 1 - 0.9510 = 40%. Each individual test would carry the a risk, but the risk for the ensemble of tests would be 40%. Multiple comparisons may be determined before the ANOVA (preplanned) or selected after looking at the data (post-hoc). There is a strong
P369463-Ch009.qxd 9/2/05 2:59 PM Page 572
Chapter 9 Analysis of Variance
572
tendency to misuse post-hoc comparisons. Preplanned comparisons should normally be used. Post-hoc comparisons should normally be done when the H0 is rejected. The experimenter should remain wary of Type I errors. Paired comparisons can be made using the LSD (least significant difference) procedure, the Fisher LSD procedure, the Duncan multiple range test, the Scheffe procedure, the Tukey procedure, the Bonferroni procedure, and others. We illustrate the procedures for continuing with the pair-wise comparisons based on the Tukey and the Bonferroni procedures. It is to be emphasized that only one such test should be used (decided in advance of seeing the data), that is, the tests must not be used until the experimenter is satisfied with the results. The Tukey method uses the studentized range distribution and each treatment must have the same sample size. The Bonferroni method uses the student t distribution, and sample sizes need not be identical. We first introduce the concept of contrasts.
Contrasts When more specific hypotheses are to be tested rather than just whether there is any difference among the means in an ANOVA, the use of contrasts emerges. Contrasts are a way of testing specific hypotheses. Creating appropriate contrasts to test can be challenging and are determined by the experiment and the experimenter. A contrast is a linear combination of the treatment means where the sum of the coefficients is zero. That is, contrast k
L = Â ci m i , i =1 k
where
Âc
i
= 0. An unbiased estimator for contrast L is
i =1
ˆ = c1 x1 + c2 x2 + . . . + ck x k . L ˆ is norUnder the assumption of sampling from normal distributions, RV L mally distributed with k
ˆ ) = L = Â ci m i E( L i =1
and k
2
ˆ ) = s 2 Â ci . V( L i =1 n i Thus, RV ˆ-L L k
s
Ân i =1
(9–6)
ci2 i
P369463-Ch009.qxd 9/2/05 2:59 PM Page 573
9.4 Multiple Comparisons of Treatment Means
573
is unit normal, RV ˆ - L )2 (L k 2
s
(9–7)
ci2
Ân i =1
i
is chi-square with 1 degree of freedom, and RV ˆ-L L
(9–8)
k
Âc SP
2 i
i =1
ni k
has a t distribution with N - k degrees of freedom where N = Â ni . i =1
All contrasts have one degree of freedom so that SSL = MSL. Thus, t2 = F =
MSL
,
MSE where MSL is mean square of the contrast and MSE is the mean square error. The t-test and the equivalent F-test are suitable for testing the contrast hypotheses. That is, ˆ2 L MSL t2 = =F = , 2 k MSE Ê ˆ MSE Â ci x / n Ë ¯ i =1
2
k
MSE * F = MSL =
Ê ˆ cx ËÂ i ¯ i =1
.
k
Ê ˆ c2 / n ËÂ i¯ i =1
Suppose there are 4 treatments differing by an increasing amount of a material present. We may want to test the hypothesis that m1 = (m2 + m3 + m4)/3 to see whether the amount of material present makes a difference, where Treatment 1(placebo) has none of the material present. This null hypothesis is the contrast L = 3m1 - m2 - m3 - m4 = 0. The sum of squares for a contrast is given by k
SS L =
2
Ê ˆ cx ËÂ i i¯ i =1 k
Ê ˆ c2 / n ËÂ i¯ i =1
,
(9–9)
P369463-Ch009.qxd 9/2/05 2:59 PM Page 574
574
Chapter 9 Analysis of Variance
where k is the number of treatments and n is the common sample size. Again, all contrasts have one degree of freedom so that SSL = MSL. For unbalanced designs, the added constraint of Scini = 0 is needed in order to compare treatment means. The F statistic with 1 and N - k degrees of freedom is then given by SSL
F (1, N - k) =
.
MSE Two contrasts are called orthogonal if the scalar product of their coefficik
ents is zero, that is, contrasts L1 =
k
Âc m i
i
and L2 =
i =1 k
Âd m i
i
are orthogonal if
i =1 k
 ci di = 0 with equal sample size or if
Ân c d
i =1
i =1
i i
i
= 0 for unequal sample sizes.
Given k treatment means, k - 1 mutually orthogonal contrasts can always be constructed. A set of orthogonal contrasts is not unique. Each orthogonal contrast results in an independent sum of squares with one degree of freedom. The sum of orthogonal contrast sum of squares equals the treatment (between) sum of squares and the sum of each degree of freedom equates to the between degrees of freedom in a one-way ANOVA. EXAMPLE 9.11
Use the following four treatment data sets to complete the exercises.
T1
T2
T3
T4
2 3 5 2 S 12 xi 3
3 3 5 5 16 4
8 7 7 6 28 7
3 1 2 2 8 2
a) Test H0: m1 = (m2 + m3 + m4)/3 vs. H1: m1 π (m2 + m3 + m4)/3 or L = 3m1 m2 - m3 - m4 = 0 vs. L = 3m1 - m2 - m3 - m4 π 0. b) Perform the ANOVA. c) Show that the sum of the individual contrast sum of squares equals the sum of squares between (SSBetween) using the 3 orthogonal contrasts with coefficients La = (111 -3), Lb = (11 -2 0), and Lc = (1 -1 0 0). Test each contrast for significance at a = 5%.
P369463-Ch009.qxd 9/2/05 2:59 PM Page 575
9.4 Multiple Comparisons of Treatment Means
575
d) Verify that t2 = F for testing Part a. e) Show that orthogonal contrasts Ld = (1 0 0 -1), Le = (1 -1 -1 1) and Lf = (0 1 -1 0) also partition SSB. Solution (setf data ' ((2 3 5 2)(3 3 5 5)(8 7 7 6)(3 1 2 2))) (mu-svar data) Æ ((3.0 2.0)(4.0 1.333)(7.0 0.667)(2.0 0.667)). a) The four treatment means are 3, 4, 7, and 2, respectively with an overall mean of 4. The four sample variances are 2, 4/3, 2/3, and 2/3, respectively. For example, s 12 = [(2 - 3)2 + (3 - 3)2 + (5 - 3)2 + (2 - 3)2 ]/(4 - 1) = 6/3 = 2. ˆ = 3 x1 - x2 - x3 - x4 , we find Using estimator L ˆ ) = 3 m1 - m 2 - m 3 - m 4 , E( L ˆ = 3(3) - 1(4) - 1(7) - 1(2) = -4, L ˆ ) = (9 + 1 + 1 + 1)s 2 /4 = 3s 2 . V( L Then RV ˆ-L L s 3 is unit normal under the assumption that the independent treatment populations are normal with the same variance. Further, RV
ˆ-L L SP 3
has a t distribution with 16 - 4 = 12 degrees of freedom, where Sp is the pooled standard error and the 4 lost degrees of freedom are due to the 4 estimated means. Thus a suitable test statistic for the null hypothesis L = 0 is t=
ˆ-L L
.
SP 3 The pooled variance is the MSE of the ANOVA or (2 + 4/3 + 2/3 + 2/3)/4 = 7/6, and the pooled mean sampling error is 7/6 = 1.08. t=
-4 - 0
=
-4
= -2.138 with 7/6 * 3 3.5 p-value = 0.054 = (* 2 ( L-tee 12 - 2.138)).
P369463-Ch009.qxd 9/2/05 2:59 PM Page 576
576
Chapter 9 Analysis of Variance
b) For the ANOVA, see the table below.
Source
SS
df
MS
F
p-value
Between Within
56 14
3 12
18. 6 1.1 6
16
0.0002
Total
70
15
c) The respective treatment sums are 12, 16, 28, and 8. The sum of squares for contrast coefficients La = (1 1 1 -3) is designated SSLa and is equal to 2
k
SSL a =
Ê ˆ cx ËÂ i i¯ i =1
=
k
n
(1 * 12 + 1 * 16 + 1 * 28 - 3 * 8)2
=
4(12)
Ê ˆ c2 ËÂ i ¯
1024 48
=
64 3
i =1
2
k
=
Ê ˆ cx ËÂ i i¯ i =1
1 Ê k 2ˆ Â ci ¯ nË i =1
=
(1 * 3 + 1 * 4 + 1 * 7 - 3 * 2)2 (1 + 1 + 1 + 9)/4 4
=
64
.
3
Similarly, for contrast Lb coefficients (1 1 -2 0) testing the average of m1 and m2 with m3, SSLb =
(1 * 3 + 1 * 4 - 2 * 7 + 0 * 2)2 (1 + 1 + 4 + 0)/4
=
98
,
3
and Lc contrast coefficients (1 -1 0 0) testing m1 = m2, SSLc =
(1 * 3 - 1 * 4 + 0 * 7 + 0 * 2)2 (1 + 1 + 0 + 0)/4
= 2.
Observe that the total contrast sum of squares equals the between treatment sum of squares, that is, (64/3 + 98/3 + 2 = 56 = SSBetween). Further, since the contrasts are independent by being mutually orthogonal, each can be F-tested for significance at the specified a by dividing each SSLi by the mean square error. The corresponding F-ratios for the contrasts are: 64/3 7/6
=
128 7
= 18.29,
P369463-Ch009.qxd 9/2/05 2:59 PM Page 577
9.4 Multiple Comparisons of Treatment Means
98/3
=
196
7/6
577
= 28,
7
and 2
=
12
7/6
= 1.71.
7
Contrast La and Lb are significant but contrast Lc is not. The critical F at 1 and 12 degrees of freedom is 4.75. Note that the average of the F-ratios (18.29 + 28 + 1.71)/3 equal 16, the Fratio for the ANOVA. d) From Part a, the t-value is -
4 6
ª -2.138, with t2 =
16 * 6
21
21
= 4.5714.
The sum of squares for contrast coefficients (3 -1 -1 -1) is computed as SS L = The MSE is
7
(3 * 3 - 1 * 4 - 1 * 7 - 1 * 2)2 (9 + 1 + 1 + 1)/4
=
16
.
3
. The F-value is computed to be 4.5714, as
6 SS L MSE
=
16/3
=
32
=
16 * 6
= t2
7/6 7 21 = (/ (SSL data '(3 -1 -1 -1))(S Ÿ 2-pooled data)).
e) For orthogonal contrasts Ld = (1 0 0 -1), Le = (1 -1 -1 1), and Lf = (0 1 -1 0) with means 3, 4, 7, and 2, we have SSL d =
(3 - 2)2 2/4
= 2; SSL e =
(3 - 4 - 7 + 2)2 4/4
= 36; SSL f =
( 4 - 7 )2
= 18;
2/4
and SSL d + SSL e + SSL f = 2 + 36 + 18 = 56 = SSBetween .
(setf data '((2 3 5 2)(3 3 5 5)(8 7 7 6)(3 1 2 2))). The command (SSL data contrast) returns the contrast sum of squares for the specified contrast. For example, (SSL data '(1 1 1 -3)) returns 21.333333. The template (F-L data contrast) returns the F-ratio and p-value for the contrast. For example, (F-L data '(1 1 1 -3)) returns F = 18.29 with p-value = 0.001.
P369463-Ch009.qxd 9/2/05 2:59 PM Page 578
Chapter 9 Analysis of Variance
578
(C-anova data list-of-contrasts) prints the ANOVA table with the Between sum of squares SSb, and the user’s specified set of contrasts sum of squares or SSL, partitioned by a default set of orthogonal contrasts. The contrast effects are shown in offset (C-anova data '((1 1 1 -3)(1 1 -2 0)(1 -1 0 0))) displays Source Between (1 1 1 -3) (1 1 -2 0) (1 -1 0 0) Within Total
SS
df
MS
F
p-value
56 21.33 32.67 2.00 14 70
3 1 1 1 12 15
18.67 21.33 32.27 2.00 1.16
16.00 18.29 28.00 1.71
0.000 0.001 0.000 0.164
Notice that the average of the orthogonal contrast mean squares 21.33, 32.67 and 2 is the mean square of SSb = 56/3, and the average of the F-ratios 18.29, 28, and 1.71 is the Between-F-ratio 16.
Contrast Confidence Intervals k
ˆ = c1 x1 + c2 x2 + . . . + ck x k, a 100 Given contrast L = Â ci m i with estimator L i =1
(1 - a)% confidence interval can be established using the pooled sampling error. The confidence interval for the ith mean is given by x i ± tn - k,a /2 s p / ni , and the confidence interval for the difference between two means xi - xj is given by
x i - x j ± tn - k,a /2 s p
1 ni
+
1
.
nj
The confidence interval for any arbitrary contrast L is given by 2 2 2 ˆl ± tn - k,a /2 s p c 1 + c 2 + . . . + c k . n1 n2 nk
EXAMPLE 9.12
Use the four treatment data sets below to find 95% confidence intervals for k
a) T3; b) T3 - T2, c) L = Â ci m i with use of contrast (-1 1 -1 1). i =1
P369463-Ch009.qxd 9/2/05 2:59 PM Page 579
9.4 Multiple Comparisons of Treatment Means
579
T1
T2
T3
T4
2 3 5 2 12 3
3 3 5 5 16 4
8 7 7 6 28 7
3 1 2 2 8 2
S xi
Solution a) (s-pooled data) returns sp = 1.08 and x i ± tn - k,a /2 s p / ni = 7 ± 2.179 * 1.08/2 = (5.823, 8.177). b) x i - x j ± tn - k,a /2 s p
1 ni
+
1
= (7 - 4) ± 2.179 * 1.08 * 0.707
nj = (1.336, 4.664).
2 1
2 2
2 ˆ ± tn - k,a /2 s p c + c + . . . + c k = ( -3 + 4 - 7 + 2) ± 2.179 * 1.08 * 1 c) L n1 n2 nk = ( -6.353, -1.647).
Note that 0 is not in the interval and the contrast hypothesis is rejected.
(setf data '((2 3 5 2)(3 3 5 5)(8 7 7 6)(3 1 2 2))). The command (L-hat-ci data contrast row/col nrows a) returns the contrast 100(1 - a)% confidence interval. For example, (L-hat-ci data '(-1 1 -1 1) 'c 1 5) returns (-4 ± 2.353) or (-6.353, -1.647). (L-hat-ci data '(-1 1) 'r 2 1) returns 99% confidence interval is 0.5 ± 2.274
Least Significant Difference (LSD), Fisher LSD, and Scheffe Procedures Several multiple-comparison procedures can be used to determine the significance of the treatments. The experimenter must choose the one that meets the assumptions of the experiment and the desired control over the overall alpha risk. The least significant difference procedure makes all possible pair-wise comparisons using the t-test. Any two means which vary by more than the LSD may be significant. The LSD is given by
P369463-Ch009.qxd 9/2/05 2:59 PM Page 580
580
Chapter 9 Analysis of Variance
1ˆ Ê 1 LSD = ta /2, n1 + n2 - 2 MSE + = ta /2, n1 + n2 - 2 2MSE/ n when n1 = n2 , Ën n2 ¯ 1 where MSE is the mean square error and n is the sample size. The Fisher LSD procedure is similar, except that the procedure is not used unless the initial ANOVA null hypothesis was rejected. There is no control over the experiment-wise error rate; therefore, the LSD and Fisher LSD should not be used when several tests are planned. The Scheffe test procedure uses the statistic below to control the overall experiment-wise error rate. The Scheffe can test all possible comparisons. The critical Scheffe S for contrasts is S = ( c - 1) Fa Su i2 ( MSE/ n ), where c is the number of treatment means, the ui ¢s are the contrast coefficients, and the F statistic is computed with (c - 1) degrees of freedom for the numerator and c * (n - 1) for the denominator. The significant test for a pair of means is given by |X i - X j| 1ˆ Ê 1 MSE + Ën nj ¯ i EXAMPLE 9.13
≥ Fa ,c -1, df .
Use a) the LSD procedures to determine which drugs are significant at a = 0.05 and b) the Scheffe procedure to test a set of orthogonal contrasts given by (1 1 -2) and (1 -1 0) for the drug data below. x Drug A 5 6 8 7 9 7 Drug B 6 7 5 6 5 5.8 Drug C 3 2 4 3 4 3.2
Solution
(anova '((5 6 8 7 9)(6 7 5 6 5)(3 2 4 3 4)) returns Source
SS
df
MS
F
p-value
Between Within Total
37.7 15.6 53.3
2 12 14
18.9 1.30
14.5
0.0006
a) LSD: The 3 means for the drugs are A = 7, B = 5.8, and C = 3.2. LSD = ta/2,c (n -1) 2MSE/ n = t0.025,3 (5 -1) 2 * 13/5 = 2.179 * 0.721 = 1.57 A - B = 7 - 5.8 = 1.2 < 1.57 fi not significant, A - C = 7 - 3.2 = 3.8 > 1.57 fi significant, B - C = 5.8 - 3.2 = 2.6 > 1.57 fi significant.
P369463-Ch009.qxd 9/2/05 2:59 PM Page 581
9.4 Multiple Comparisons of Treatment Means
581
Again, there is no control over the a-level. For these 3 tests, the overall experimental a = 1 - (1 - a)3 = 1 - 0.953 = 14.26%. b) Scheffe: The means of the three drugs are 7, 5.8, and 3.2. ˆa = 1 7 + 1 5.8 - 2 3.2 = To test hypothesis m1 + m2 - 2m3 = 0, compute L * * * 6.4. The critical S = ( c - 1) Fa Su i2 ( MSE/ n ) = 2 * 3.89 * (1 + 1 + 4) * 1.3/5 = 3.48. As 6.4 exceeds the critical S of 3.48, the hypothesis is rejected. That is, the mean 3.2 is not statistically equivalent to the average of 7 and 5.8. ˆb = 7 - 5.8 = 1.2 < 3.48 fi Cannot reject the hypothesis m1 = m 2 . L 2
k
SSL b =
Ê ˆ cx ËÂ i i¯ i =1
1 n
k
Âc
= (1 * 7 + 1 * 5.8 - 2 * 3.2)2 /[(12 + 12 + -22 )/5] = 34.133.
2 i
i =1
The F statistic for testing the contrast is F = 34.133/1.3 = 26.6, confirming the rejection.
The command (Scheffe data contrast alpha) returns the result of a customized contrast. (Scheffe '((5 6 8 7 9) (6 7 5 6 5) (3 2 4 3 4)) '(1 1 -2) 5) returns S-Critical = 4,
L-hat = 6.4, REJECT.
Tukey Method Suppose we have 4 groups (number of treatments). The total number of pairwise comparisons for the 4 groups is 4C2 = 6 different pair-wise tests. The interval estimates for the difference in means is given by m i - m j Œ x i - x j ± q1-a ,c,c ( n -1)
MSE
(9–10)
n
where c is the number of treatments (number of columns), n is the common sample size, MSE is the mean square error (pooled variance) of the ANOVA, and q is the critical studentized range value. The term q
MSE n
is called the Tukey honestly significant difference (HSD).
P369463-Ch009.qxd 9/2/05 2:59 PM Page 582
Chapter 9 Analysis of Variance
582
Table 9.10
Portion of Studentized Range q.95 CRITICAL VALUES FOR THE STUDENTIZED RANGE q95
DENOMINATOR
NUMBER OF TREATMENTS
df
2
3
4
5
6
7
8
9
10
11
12
1 2 3 4
18.0 6.08 4.50 3.93
27.0 8.33 5.91 5.04
32.8 9.80 6.82 5.76
37.1 10.9 7.50 6.29
40.4 11.7 8.04 6.71
43.1 12.4 8.48 7.05
45.4 13.0 8.85 7.35
47.4 13.5 9.18 7.60
49.1 14.0 9.46 7.83
50.6 14.4 9.72 8.03
52.0 14.7 9.95 8.21
5 6 7 8 9
3.64 3.46 3.34 3.26 3.20
4.60 4.34 4.16 4.04 3.95
5.22 4.90 4.68 4.53 4.41
5.67 5.30 5.06 4.89 4.76
6.03 5.63 5.36 5.17 5.02
6.33 5.90 5.61 5.40 5.24
6.58 6.12 5.82 5.60 5.43
6.80 6.32 6.00 5.77 5.59
6.99 6.49 6.16 5.92 5.74
7.17 6.65 6.30 6.05 5.87
7.32 6.79 6.43 6.18 5.98
10 11 12 13 14
3.15 3.11 3.08 3.06 3.03
3.88 3.82 3.77 3.73 3.70
4.33 4.26 4.20 4.15 4.11
4.65 4.57 4.51 4.45 4.41
4.91 4.82 4.75 4.69 4.64
5.12 5.03 4.95 4.88 4.83
5.30 5.20 5.12 5.05 4.99
5.46 5.35 5.27 5.19 5.13
5.60 5.49 5.39 5.32 5.25
5.72 5.61 5.51 5.43 5.36
5.83 5.71 5.71 5.53 5.46
The entering arguments for the studentized range q are c for the numerator and c(n - 1) for the denominator. A small portion of the distribution is shown for q0.95 in Table 9.10. Confidence intervals for all pair-wise comparisons are then formed. If zero is not contained in the confidence interval, the test is significant at the collective 95% confidence level. Equivalently, the mean difference between any two samples must be at least as large as the HSD to be significantly different. The studentized range statistic is q=
X max - X min SP / n
where SP is the pooled standard error and n is the common sample size. EXAMPLE 9.14
Use the Tukey method to determine which drugs are collectively significant at a = 0.05. x Drug A 5 6 8 7 9 7 Drug B 6 7 5 6 5 5.8 Drug C 3 2 4 3 4 3.2
Source
SS
df
MS
F
p-value
Between Within Total
37.7 15.6 53.3
2 12 14
18.9 1.30
14.5
0.0006
P369463-Ch009.qxd 9/2/05 2:59 PM Page 583
9.4 Multiple Comparisons of Treatment Means
583
Solution The low p-value indicates rejection of the null hypothesis of equal means. q=
xmax - xmin
=
sP / n
7 - 3.2
= 7.45 > 3.77 fi REJECT.
1.14/ 5
The Tukey HSD is q1-a,c,c ( n -1)
MSE
= 3.77 *
1.3
= 1.922.
5
n
The value 3.77 for q3,12 is from the q.95 studentized range table. 1. x1 - x2 = 7 - 5.8 = 1.2 < 1.922 fi not significantly different at a = 5%. 2. x1 - x3 = 7 - 3.2 = 3.8 > 1.922 fi significantly different at a = 5%. 3. x2 - x3 = 5.8 - 3.2 = 2.6 > 1.922 fi significantly different at a = 5%. Similarly, for confidence intervals, m i - m j Œ ( x i - x j ) ± q1-a ,c,c ( n -1)
m1 - m 2 Œ (7 - 5.8) ± q95,3,12
MSE
imply that
n
1.3 5
Œ1.2 ± 3.77 * 0.51 = ( -0.723, 3.123) and 0 is in the interval, m1 - m 3 Œ (7 - 3.2) ± 3.77 * 0.51 = (1.877, 5.723) and 0 is not in the interval, m 2 - m 3 Œ (5.8 - 3.2) ± 3.77 * 0.51 = (0.677, 4.523) and 0 is not in the interval. Performing simultaneous multiple comparisons exacts a price. If we had simply performed 3 separate t-tests using the pooled standard error to compare two drug means, the significant difference would have been Ê 1 1ˆ t12,025 MSE + = 2.179 * 1.14 * 0.632 = 1.57, Ë n n¯
compared to the Tukey HSD of 1.92.
Bonferroni Method The pooled t-test for comparing 2 means is given by Tm1mn ˆ ˆ2 - 2,a /2 =
( X 1 - X 2 ) - ( m1 - m 2 ) 2 Pooled
S
.
1ˆ Ê 1 + Ën n2 ¯ 1
For several t-tests, the mean squared error MSE is an improvement on the pooled variance.
P369463-Ch009.qxd 9/2/05 2:59 PM Page 584
584
Chapter 9 Analysis of Variance
Before the Bonferroni method is used, the desired pair-wise comparisons to test must be stated in advance. That is, we do not need to make all pairwise tests as in the Tukey method. Let m be the total number of desired interval estimates. The Bonferroni collective confidence intervals are then given by 1ˆ Ê 1 m i - m j Œ x i - x j ± t N - c,a /(2 m ) MSE + Ën nj ¯ i
(9–11)
where m is the number of estimates, ni is the sample size of treatment i, and the degree of freedom associated with the critical t-value is the total sample size minus the number of treatments (N - c). Observe the nonstandard a for entering the t-table. The value is a /(2m). Each of the m comparisons is made with confidence 1 - a /m, resulting in a 1 - a confidence level for the set of comparisons. Comparisons that do not exceed 1ˆ Ê 1 t N - c,a /(2 m ) MSE + Ën nj ¯ i are not significant. EXAMPLE 9.15
Use the Bonferroni method for the data in Example 9.10 for m = 3 interval estimates at experiment a = 5%. x Drug A 5 6 8 7 9 7 Drug B 6 7 5 6 5 5.8 Drug C 3 2 4 3 4 3.2
The critical difference is 1ˆ Ê 1 Ê 1 1ˆ ta/(2 m ), N - c MSE + = t0.00833,15 - 3 1.3 + = 2.004. Ën ¯ Ë 5 5¯ nj i The critical t-value is located in the t-table for a%/2m = 5/6 = 0.833% with (15 - 3) degrees of freedom. With use of interpolation, t12,0.00833 ª 2.8, or the software command (inv-t 12 5/6) returns 2.7798. 1. x1 - x2 = 7 - 5.8 = 1.2 < 2.004 fi not significantly different. 2. x1 - x3 = 7 - 3.2 = 3.8 > 2.004 fi significantly different. 3. x2 - x3 = 5.8 - 3.2 = 2.6 > 2.004 fi significantly different. Similarly, 1. m1 - m 2 Œ7 - 5.8 ± 2.8 * 1.3 * .4 = 1.2 ± 2.004 = ( -0.804, 3.204), which is not significant and agrees with the Tukey method.
P369463-Ch009.qxd 9/2/05 2:59 PM Page 585
9.4 Multiple Comparisons of Treatment Means
585
Note that the Bonferroni interval (-0.804, 3.204) is wider than the Tukey interval (-0.723, 3.123). 2. m1 - m3 Œ (7 - 3.2) ± 2.004 = 3.8 ± 2.004 or the interval (1.796, 5.804), which is at least significant at a = 5%. 3. m2 - m1 Œ(5.8 - 3.2) ± 2.004 = 2.6 ± 2.004 or the interval (0.596, 4.604), which is at least significant at a = 5%. The template (Bonferroni data L a) returns the paired means and the critical Bonferroni point where data is the list of treatment groups and L is a list of the locations of the means. For example, (setf data '((5 6 8 7 9)(6 7 5 6 5)(3 2 4 3 4))), ( Bonferroni data '((1 2)(1 3)(2 3)) 5) prints the display below. The means are (7 5.8 3.2). Comparisons
Critical Point
Significant Difference
7 - 5.8 = 1.2 < 2.004 7 - 3.2 = 3.8 > 2.004 5.8 - 3.2 = 2.6 > 2.004
NO YES YES
Comparisons between means: ((1 2)(1 3)(2 3)) t-value = 2.779, df = 12, a /2 = 0.0083. (Bonferroni - t data number = tests alpha) returns the t value for the family of tests. (Bonferroni = t data 3 5) Æ 2.780.
Tukey Method vs. Bonferroni Method The Bonferroni intervals are wider (less precise) than the Tukey intervals, but the Bonferroni method can be used for unequal sample sizes and also when only a limited number of tests are desired. The number of interval estimates must be specified in advance for the Bonferroni method. A 95% confidence interval for just considering the first two treatments m1 - m2 in Example 9.11 is given by m1 - m 2 Œ7 - 5.8 ± t0.025 * 0.8 = 1.2 ± 2.306 * 0.8 = ( -0.65, 3.04), where sP = 0.8. The same interval considering the variance of the entire sample (MSE) is given by 1ˆ Ê 1 m i - m j Œ x1 - x j ± ta /2,8 * MSE + , Ën nj ¯ i
P369463-Ch009.qxd 9/2/05 2:59 PM Page 586
586
Chapter 9 Analysis of Variance
which is 1.2 ± 2.306 * 0.721 = 1.2 ± 1.66 = ( -0.463, 2.86). The four values for the interval estimates m1 - m2 are ( -0.65, 3.04) for a pair-wise comparison considering just the 2 treatments, ( -0.46, 2.86) using the variance of all three treatments, ( -0.43, 2.83) using the Tukey method, and ( -0.82, 3.22) using the Bonferroni method.
9.5
ANOVA and Regression Regression and ANOVA use common techniques centered on the sum of squares expressions Sxx, SXY, and SYY. In the next example a simple 2K factorial design is used to illustrate the closeness of the two statistical tools. In a 22 factorial design, there are two main effects, called A and B, and the interaction effect, AB. Each of the 2 factors A and B have two levels usually called Low and High. The symbol (1) is used to indicate A and B are both Low, the symbol a indicates that A is High and B is Low, the symbol b indicates that B is High and A is Low, and the symbol ab indicates both A and B are High. Note the presence of the symbol indicates High; the absence, Low. The symbol I indicates A * B * AB. For any two numbers designated Low and High, coded variables, which have 1 and -1 values, can be used for Low and High. For example, consider the Low setting at 20 and the High setting at 45, say, for some temperature measurement. Then whenever T is High at 45 or Low at 20, X High = X Low =
T - ( High + Low )/2 ( High - Low )/2 T - ( High + Low )/2 ( High - Low )/2
= =
45 - (45 + 20)/2 (45 - 20)/2 20 - (45 + 20)/2 (45 - 20)/2
=
12.5
= 1.
12.5 =
-12.5
= -1.
12.5
In the diagram below, the A effects are averaged from the difference in the temperature response in A going from Low to High.
b High
ab
Low (1)
a High
P369463-Ch009.qxd 9/2/05 2:59 PM Page 587
9.5 ANOVA and Regression
587
A Effects = [( -1) + a - b + ab]/2
A at High level vs. A at Low level (from to -1).
B Effects = [( -1) - a + b + ab]/2 B at High level vs. B at Low level. AB = [(1) - a - b + ab]/2 A and B both at High level and Low level F Effects vs. A and B at Low-High and High-Low. Notice that the coefficients of the A effects are -1 1 -1 1 using the order (1) a b ab; that the coefficients of the B effects are -1 -1 1 1; that the coefficients of the AB effects are 1 -1 -1 1; and that the contrasts are orthogonal. EXAMPLE 9.16
Analyze the 22 factorial design and verify that the main effects for Factors A, B, and AB are twice the beta coefficients for the multiple linear regression model, that the grand mean is the intercept, and that the contrasts are orthogonal. B
A
Low 24 68 5
Low High B-Means
High 8 10 12 14 11
A-Means 6 10 8
The cell means are (1) = 3 (A Low B Low), a = 7 (A High, B Low), b = 9 (B High, A low), and ab = 13 (A High, B High).
(1) a b ab
I
A
B
AB
1 1 1 1
-1 1 -1 1
-1 -1 1 1
1 -1 -1 1
High b = 9
ab = 13
Low (1) = 3
a = 7 High
A Effects = [-(1) + a - b + ab]/2 = [-3 + 7 - 9 + 13]/2 = 4. B Effects = [-(1) - a + b + ab]/2 = [-3 - 7 + 9 + 13]/2 = 12 /2 = 6. AB Effects = [(1) - a - b + ab]/2 = [3 - 7 - 9 + 13]/2 = 0. Y
LA
LB
LAB
2 4 6 8 8 10 12 14
-1 -1 1 1 -1 -1 1 1
-1 -1 -1 -1 1 1 1 1
1 1 -1 -1 -1 -1 1 1
P369463-Ch009.qxd 9/2/05 2:59 PM Page 588
588
Chapter 9 Analysis of Variance
(setf x-data '(( -1 -1 1 1 -1 -1 1 1)( -1 -1 -1 -1 1 1 1 1) (1 1 -1 -1 -1 -1 1 1 1)) y-data '(2 4 6 8 8 10 12 14)) (y-hat x-data y-data) returns Y -hat = 8 + 2 X 1 + 3 X 2 + 0 X 3. Observe that 8, the intercept, is the grand mean; 2 and 3 are half of the A and B effects since the coded variables go from -1 to 1 representing a change of 2 and 0 indicates the absence of interaction. The ANOVA table shows SSRows at 32. Using the contrast for the A effects of -1 1 -1 1, corresponding to the means of 2
4
Ê ˆ cx ËÂ i i¯ (1), a, b, and ab, we get
i =1 4
Âc
2 i
=
( -3 + 7 - 9 + 13)2 2
/n
= 32 = SSRows .
i =1
Similarly SSColumns, with contrast (-1 -1 1 1), is
( -3 - 7 + 9 + 13)2
= 72, and
2
SSrc, with contrast 1 -1 -1 1 is (3 - 7 - 9 + 13)2/2 = 0. (anova '((2 4 6 8)(8 10 12 14)) 2) prints Source
SS
df
MS
F
p-value
Rows Columns RC Error Total
32 72 0 8 112
1 1 1 4 7
32 72 0 2
16 36 0
0.0161 0.0039
Analysis of variance can be solved using multiple regression techniques and indicator RVs. The indicator RVs indicate the populations from which the samples are taken. The number of indicator RVs required is one less than the number of treatments. Use (print-matrix matrix) to print the matrices (X-matrix x-data) (Y-matrix y-data) (XtX x-data) È1 -1 -1 1˘ È2˘ Í1 -1 -1 1˙ Í4˙ Í ˙ Í ˙ Í1 1 -1 -1˙ Í6˙ È8 0 0 0˘ Í ˙ Í ˙ Í0 8 0 0˙ 1 1 -1 -1˙ 8 ˙ X -Matrix = Í ; Y -Matrix = Í ˙; X T X -Matrix = Í Í1 -1 1 -1˙ Í8˙ Í0 0 8 0˙ Í ˙ Í ˙ ÍÎ ˙ 0 0 0 8˚ Í1 -1 1 -1˙ Í10˙ Í1 1 1 1˙ Í12˙ Í ˙ Í ˙ Î1 1 1 1˚ Î14˚
P369463-Ch009.qxd 9/2/05 2:59 PM Page 589
9.5 ANOVA and Regression
589
(inverse (Xtx x- data))(XtY x-data y-data) 0 0˘ È1/8 0 È64˘ Í 0 1/8 0 ˙ Í16˙ 0 ˙ X TY = Í ˙ ( X T X )-1 -Matrix = Í Í0 0 1/8 0 ˙ Í24˙ ÍÎ ˙˚ ÍÎ ˙˚ 0 0 0 1/8 0 È 1/8 1/8 Í -1/8 -1/8 C=Í Í -1/8 -1/8 ÍÎ 1/8 1/8
(C-matrix x-data) 1/8 1/8 1/8 1/8 1/8 1/8 -1/8 -1/8 -1/8 -1/8 1/8 1/8 -1/8 -1/8 -1/8 -1/8
1/8 1/8 1/8 1/8
1/8˘ 1/8˙ ˙ = ( X T X ) -1 X T 1/8˙ ˙ 1/8˚
(B-matrix x-data y-data) È8˘ Í2˙ B = CY = Í ˙ fi 8 + 2 X1 + 3 X 2 + 0 X1 X 2 . Í3˙ ÍÎ ˙˚ 0 EXAMPLE 9.17
The drying times in minutes for 3 different paints are shown below. Relate the ANOVA analysis with multiple linear regression. Paint A 122 125 120 124 125
Paint B
Paint C
130 138 135 135 130
145 148 149 145 150
Solution With use of dummy indicator variables, the data are shown as follows.
Paint: X1 X2
X1
X2
Paint
-1 -1 1
-1 1 1
A B C
A
B
C
122 125 120 124 125 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
130 138 135 135 130 -1 -1 -1 -1 -1 1 1 1 1 1
145 148 149 145 150 1 1 1 1 1 1 1 1 1 1
P369463-Ch009.qxd 9/2/05 2:59 PM Page 590
590
Chapter 9 Analysis of Variance
(serf x-data '(( -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 1 1 1 1) ( -1 -1 -1 -1 -1 1 1 1 1 1 1 1 1 1 1)) y-data '(122 125 120 124 125 130 138 135 135 130 145 148 149 145 150)) (y-hat x-data y-data) Æ Y -hat = 135.3 + 6.9 X 1 + 5.2 X 2. (test-beta x-data y-data) Æ Predictor X0 X1 X2
Coef
SE Coef
t-statistic
p-value
135.300 6.900 5.200
0.862 0.862 0.862
156.907 8.002 6.030
0.00 0.00 0.00
(regress-anova x-data y-data) Æ ANALYSIS OF VARIANCE Source Model Error Total
SS
df
MS
F
p-value
1473.711 89.226 1562.937
2 12 14
736.856 7.436
99.100
0.000000
(Sexplain x-data y-data) Æ 1473.7 (SSerror x-data y-data) Æ
9.6
89.2
Analysis of Means (ANOM) Another approach to seeking differences among treatments is the analysis of means procedures conceived by Ott. In this graphical procedure a decision interval is established between upper and lower decision points. Treatment means falling within the interval do not show any significant difference. Treatment means outside the interval do. The upper decision line (UDL) is given by x + ha s p ( k - 1)/ kn and the lower decision line (LDL) is given by x - ha s p ( k - 1)/ kn , where x is the grand mean, k is the number of treatments. n is the common treatment sample size, and ha is the critical multivariate t distribution value used for the analysis.
P369463-Ch009.qxd 9/2/05 2:59 PM Page 591
9.6 Analysis of Means (ANOM)
EXAMPLE 9.18
591
Use the cable data below to perform an ANOM at a = 1%. Cable Cable Cable Cable Cable
1 2 3 4 5
19 40 33 26 30
24 47 15 22 54
12 33 23 26 49
33 35 21 27 61
32 33 24 15 40
19 35 23 31 47
12 46 22 23 37
11 19 25 16 50
24 23 30 25 58
28 36 33 33 61
12 26 35 22 57
13 17 30 26 27
18 32 38 22 29
21 24 39 39 34
23 22 12 16 46
20 29 44 26 26
22 35 29 31 58
12 25 36 26 48
15 23 27 35 36
33 39 41 20 34
20.15 30.95 29 25.35 44.1 with x = 29.91.
Sample means
53.08 72.16 72.84 39.82 141.67 with s 2P = 75.91.
Sample variances
Sample standard errors 7.29 Decision limit interval
8.49 8.53
6.31 11.90
with s P = 8.71.
(LDL, UDL) = (24.391, 35.428).
The error term ha s p ( k - 1)/(kn) = 3.17 * 8.71 * (5 - 1)/(5 * 20) = 5.52, where h(a = 0.01, k = 5, df = 95) = 3.17 and s P = 8.71. The command (H-alpha k v a) returns ha with k as number of treatments, v as the degrees of freedom associated with the pooled variance s2P, and a as the specified significance level. Thus, LDL = 29.91 - 5.52 = 24.39 and HDL = 29.91 + 5.52 = 35.43. Figure 9.9 shows that mean values 20.15 and 44.1 are outside the decision limits and are significantly different from the three mean values 30.95, 29, and 25.35 inside the interval. The one-way ANOVA is given to compare the results. Notice that the mean square error is the pooled variance.
50 44.1
40
UDL 30.95 29
30
25.35
LDL
20.15
20 10 0 0
Figure 9.9
1
2
3
Analysis of Means
4
5
6
P369463-Ch009.qxd 9/2/05 2:59 PM Page 592
592
Chapter 9 Analysis of Variance
Source
SS
df
MS
F
p-value
Between Within Total
6386.34 7211.85 13598.19
4 95 99
1596.58 75.91
21.03
0.0001
The command (ANOM cables a) returns the means of the cable data, the sample variances and standard errors, the pooled sample standard error, and the upper and lower decision limits comprising the critical decision interval. (ANOM cables 1) returned the tabled data where cables is a list of the cable data at a = 1%. The command (ANOM-plot data optional-a) plots the UDL, the LDL, and the means. For example, (ANOM-plot cables 5) prints Analysis of Means 44.10 34.44 —————————————— UDL 30.95 29.00 25.38 —————————————— LDL 25.35 20.15
Graphical Analysis of Treatment Means Plot the treatment means along the horizontal axis displaying a normal distribution under the null hypothesis of no significant difference among these means. This distribution can be estimated as N(m = x, s 2 = MSE). Then try to visualize x ± 3s, the spread of 6s of the normal distribution, encompassing the treatment means. For example, given the following data,
T1
T2
T3
T4
2 3 5 2 S 12 3 x
3 3 5 5 16 4
8 7 7 6 28 7
3 1 2 2 8 2
P369463-Ch009.qxd 9/2/05 2:59 PM Page 593
9.7 Summary
593
0.4 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0
0
1
2
3
2
Figure 9.10
3
4
5
6
4
7
8
7
ANOM N(m = x = 4, s 2 = MSE = 1.17)
the ANOVA table is Source
SS
df
MS
F
p-value
Between Within Total
56 14 70
3 12 15
18.666 1.166
16
0.0001
In Figure 9.10, notice that the mean of 7 is outside the 6s spread of the estimated normal distribution. The distribution can be slid along the axis to see if the treatment means can be covered within the spread.
9.7
Summary When more than two comparisons are necessary, the analysis of variance techniques can be used to indicate if there is significant difference among the treatment means using the F statistic. The assumptions for an ANOVA are sampling from normal populations with homogeneity of variance. The Bartlett test is used to test for homogeneity of variance. The population variance between treatments is compared to the variance within treatments. Under the null hypothesis of no significant difference, the
P369463-Ch009.qxd 9/2/05 2:59 PM Page 594
594
Chapter 9 Analysis of Variance
ratio should be close to 1. The variance between treatments is higher than the variation within treatments when the null hypothesis is rejected, that is, when the F-ratio is significantly higher than 1. If there is a significance difference among the means, the Tukey or Bonferroni method can be used to determine which pair-wise comparisons are significant collectively. The more graphical but less powerful analysis of means (ANOM) can also be used. Although we have shown several methods for making comparisons between pairs of means, we have done so to illustrate the methods. Such practice is not proper. That is, one does not keep trying different tests until one succeeds. The proper test should be planned in advance of seeing the data. If a significant difference is found when comparing the means with a t-test for each in the collection, the experimenter may repeat the experiment with random sampling data from the two respective populations and t-test for significance. EXAMPLE 9.19
Perform a one-way ANOVA on the cable data at a = 5%.
Cable Cable Cable Cable Cable
1 2 3 4 5
19 40 33 26 30
24 47 15 22 54
12 33 23 26 49
33 35 21 27 61
32 33 24 15 40
19 35 23 31 47
12 46 22 23 37
11 19 25 16 50
24 23 30 25 58
28 36 33 33 61
12 26 35 22 57
13 17 30 26 27
18 32 38 22 29
21 24 12 39 34
23 22 39 16 46
20 29 44 26 26
22 35 29 31 58
12 25 36 26 48
15 23 27 35 36
33 39 41 20 34
H 0: s i2 = s 2j vs. H1: s i2 π s 2j for some i π j. 1. Perform the Bartlett test for homogeneity of variance. (Bartlett cables) returns B = 8.7689 with p-value = 0.0669, implying we cannot reject the assumption of homogeneity of variances at a = 5%. 2. The command (anova cables) prints the ANOVA table.
Source
SS
df
MS
F
p-value
Between Within Total
6386.34 7211.85 13598.19
4 95 99
1596.58 75.91
21.03
0.0001
A significant F statistic permits multiple comparisons tests. 3. With use of the Tukey procedures, there are 5C2 = 10 paired mean comparisons.
P369463-Ch009.qxd 9/2/05 2:59 PM Page 595
9.7 Summary
595
2 The MSE or pooled variance s Pooled of the cable data is 75.91. The HSD is
MSE
q1-a,c,c ( n -1)
n
= q0.95,5,95
75.91
= 3.95 * 1.95 = 7.70,
20
where q0.95,5,95 = 3.95. 4. The five cable means x1, x2, x3, x4, and x5 are, respectively, 20.15, 30.95, 29, 25.35, and 44.1. Performing the 10 comparisons (10C2), we have, x2 - x1 = 30.95 - 20.15 = 10.75 > 7.70 fi 30.95 is significantly higher than 20.15 x2 - x3 = 30.95 - 29.00 = 1.95 < 7.70 fi 30.95 is not significantly different from 29. x2 - x4 = 30.95 - 25.35 = 5.6 < 7.70 fi 30.95 is not significantly different from 25.35. x5 - x1 = 44.10 - 20.15 = 23.95 > 7.70 fi 44.1 is significantly higher than 20.15. x5 - x2 = 44.10 - 30.95 = 13.15 > 7.70 fi 44.1 is significantly higher than 30.95. x5 - x3 = 44.10 - 29.00 = 15.1 > 7.70 fi 44.1 is significantly higher than 29. x5 - x4 = 44.10 - 25.35 = 18.75 > 7.70 fi 44.1 is significantly higher than 25.35. x3 - x1 = 29.00 - 20.15 = 8.85 > 7.70 fi 29 is significantly higher than 20.15. x3 - x4 = 29.00 - 25.35 = 3.65 < 7.70 fi 29 is not significantly higher than 25.35. x3 - x1 = 29.00 - 20.15 = 8.85 > 7.70 fi 29 is significantly higher than 20.15. Each test is at the 5% level of significance. 5. Use the Bonferroni method to test all combinations of the 10 means. The command (Bonferroni cables (combination-list (upto 5) 2) 5) returns the following. The means are (20.15, 30.95, 29, 25.35, 44.1).
P369463-Ch009.qxd 9/2/05 2:59 PM Page 596
596
Chapter 9 Analysis of Variance
50 44.1
40
UDL
30
30.95
x
29
25.35
LDL
20.15
20 10 0 1
0
Figure 9.11
2
-
4
5
6
Analysis of Means
Comparisons 20.15 20.15 30.95 20.15 30.95 29 20.15 30.95 29 25.35
3
30.95 29 29 25.35 25.35 25.35 44.1 44.1 44.1 44.1
Critical Point = = = = = = = = = =
10.8 8.85 1.95 5.2 5.6 3.65 23.95 13.15 15.1 18.75
> > < < < < > > > >
Significant Difference
7.919 7.919 7.919 7.919 7.919 7.919 7.919 7.919 7.919 7.919
Comparisons between means: t-value = 2.874
YES YES NO NO NO NO YES YES YES YES
df = 95 a /2 = 0.0025
((1 2)(1 3)(2 3)(1 4)(2 4)(3 4)(1 5)(2 5)(3 5)(4 5)) 6. Observe the advantage of the ANOM graphical display of the results in Figure 9.11 over the ANOVA table and subsequent analysis in spite of the ANOM’s slight lack of precision. The plot is displayed in Figure 9.11.
PROBLEMS Under ANOVA assumptions of random samples from normally distributed populations with common but unknown variance s 2, solve the following problems. 1. The gas mileage for 3 different types of cars is shown in the table below. Is there a difference in gas mileage among the cars at a = 0.05? ans. p-value = 0.285.
P369463-Ch009.qxd 9/2/05 2:59 PM Page 597
597
Problems Car A
Car B
Car C
24 25 26 24 25 23
25 27 27 23 25 25
22 25 24 25 26 22 Source
SS
df
MS
F
p-value
Between Within Total
5.44 30.83 36.27
2 15 17
2.72 2.06
1.32
0.285
2. Set a at 5% to test whether the 3 methods of teaching are significant. Theoretical
Empirical
Mixed Mode
77 86 71 75 80 77
83 91 75 78 82 80
78 85 64 77 81 75
3. Test at a = 5% to see if there is a significance difference in tensile strength of 4 different brands of cables. ans. p = value = 0.005. Brand W
Brand X
Brand Y
Brand Z
136 138 129 140 132
164 160 170 180 155
171 177 165 162 180
130 192 120 125 130
4. a) Test H0: m1 = m2 = m3 at a = 5% to determine if population means are equal. Sample 1 3.1 4.3 1.2
Sample 2
Sample 3
5.4 3.6 4.0
1.1 0.2 3.0
b) After doing the conceptual computations, verify using the following computational formulas: SS Total = S x 2 - ( S x )2 / N , SS Between = ST i2 / ni - ( S x )2 / N 2 c) Compute s2 for each sample and show that MSE = sPooled .
P369463-Ch009.qxd 9/2/05 2:59 PM Page 598
598
Chapter 9 Analysis of Variance
5. Twelve independent samples were taken of each of 4 brands. The x’s for each brand are shown in the table and SSError is 700. Is there significant difference among the means at a = 1%? Brand x
1 80
2 81
3 86
4 90
6. Determine if there is a difference among the 3 levels of executive rank. Notice that you will have to sort the data by level before performing an ANOVA. Person 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Level C C C C L L M C M L M L L M M L L M C M Score 92 104 90 80 130 90 156 108 93 128 80 105 110 133 128 130 144 159 140 135
7. Perform a two-way ANOVA without replication on the data below. ans. 32 12.33. Factor Y Levels E F
Factor X
A 5 9
B 3 8
C 8 13
D 4 6
8. Perform a two-way ANOVA with 2 replications. Factor Y Levels E
A 5 7 11 9
Block X F
B 3 4 10 8
C 8 6 15 13
D 4 6 8 6
9. Perform an ANOVA on the regression problem. Show that the square for the critical t-value at level a is equal to the critical F-value (1, 4 df ) at 2a. ans. t2 = 7.6032 = 57.805 = F. x Y
1.5 3.6
2.0 2.8
2.5 5.6
3.0 7.0
3.5 9.0
4.0 10.5
10. The following 5 groups of 20 digits are the first 100 decimal digits of the constant e. Is there a difference among the groups? Observe the frequency of the digit 9 in each row. Use the Bartlett test for homogeneity. 7 0 4 6 4
1 2 7 2 5
8 8 0 7 7
2 7 9 7 1
8 4 3 2 3
1 7 6 4 8
8 1 9 0 2
2 3 9 7 1
8 5 9 6 7
4 2 5 6 8
5 6 9 3 5
9 6 5 0 2
0 2 7 3 5
4 4 4 5 1
5 9 9 3 6
2 7 6 5 6
3 7 6 4 4
5 5 9 7 2
3 7 6 5 7
6 2 7 9 4
11. The F-ratio in a one-way ANOVA was computed as F = 4.23 with degrees of freedom 4, 36. Find the number of treatments, the total sample size, and the significance of the test. ans. 5 41 p-value = 0.00658.
P369463-Ch009.qxd 9/2/05 2:59 PM Page 599
599
Problems
12. Given that the coefficient of determination R2 = 0.8 for 12 x-Y pairs in linear regression, determine the F-ratio. See Review Problem 3. 13. Show that m
m
m
n
 Â(x
ij
n
- x )2 = n  ( x i . - x )2 +   ( x ij - x i . )2
i =1 j =1
i =1
i =1 j =1
for m treatment groups with n samples in each. 14. Perform a two-way ANOVA with 4 replications on the data below, where the scores indicate pounds lost on 2 different diets and 4 different exercise regimens, with a = 5%.
Levels E
A 5 7 8 3
Exercise B C 3 8 4 6 5 7 7 8
F
2 11 12 9
1 10 6 8
D 4 6 9 10
Diet 6 15 14 13
7 9 6 6
15. Consider the following three data sets.
T1 10 10 15
Set T2 15 20 25
1 T3 30 35 40
T4 45 50 50
U1 15 15 20
Set U2 10 15 20
2 U3 40 45 50
U4 35 40 40
V1 5 10 20
Set V2 20 20 20
3 V3 25 45 35
V4 60 40 45
a) If SSWithin = 133.3 in Set 1, why does it also equal 133.3 in Set 2? b) If SSBetween = 2072.91 in Set 1, why does it also equal 2072.91 in Set 3? c) Do the following two commands equate? ans. Yes. (anova '((10 10 15)(15 20 25)(30 35 40)(45 50 50))) (anova '((0 0 5)(5 10 25)(20 25 30)(35 40 40))) d) What is the relation of F-ratios and sum of squares in the following two commands? (anova '((10 10 15)(15 20 25)(30 35 40)(45 50 50))) ans. Same F-ratios ; note that data differ by the multiplicative constant 5. SSb & SSw factor of 25 (anova '((2 2 3)(3 4 5)(6 7 8)(9 10 10)))
P369463-Ch009.qxd 9/2/05 2:59 PM Page 600
600
Chapter 9 Analysis of Variance
16. Use the Tukey HSD test to determine which of the 4 means below are significant where the results of the ANOVA comparing 4 treatments of sample size 5 produced a significant F-ratio with MSE = 25. x1 = 25
x2 = 32
x3 = 35
x4 = 36
17. Complete the ANOVA table from the data for the 2 ¥ 4 factor experiment.
Factor Y Levels E
A 3 8
B 12 9
C 10 11
D 20 18
F
5 4
11 10
7 8
10 9
Factor X
Source
SS
df
MS
F
Rows Columns RC Error Total
18. a) Determine if interaction is present in the cell display of means for a two-way ANOVA.
7
8
15
7
9
12
7
9
10
b) Show that the significant interaction can mask the main effects using the following contrived yield data from two levels of pressure and temperature:
TEMPERATURE
PRESSURE High
Low
High
20, 18, 16
7, 9, 12,
Low
12, 9, 7
16, 18, 20
P369463-Ch009.qxd 9/2/05 2:59 PM Page 601
601
Problems
19. Find the contrast sum of squares from the following data and contrasts (Software Exercise 15). T1 T2 T3
5 6 4 5 9 7 7 5 2 5 2 3
a) La = (11 -2) b) Lb = ( -1 -1 2) c) Lc = (1 -2 1)
d) Ld = ( -2 11)
ans. 24 24 24 0.
20. Use the cable data to show that the sum of squares for the following set of orthogonal contrasts equals the between sum of squares for a oneway ANOVA: (1 1 1 1 -4)(1 1 1 -3 0)(1 1 -2 0 0)(1 -1 0 0 0). For example, (SSL cables '(1 1 1 1 - 4)) returns 5033.90. Repeat for orthogonal contrasts ( -4 1 1 1 1), (0 -3 1 1 1), (0 0 -2 1 1), and (0 0 0 -1 1). 21. Determine whether it is appropriate to block on the rows for the following data. Columns Levels R1
C1 4 6 7 8 10 9
Rows R2
C2 5 9 11 5 6 8
C3 9 9 12 11 9 10
22. Find the contrast sum of squares for contrast La = (1 1 0 -2) and Lb = (1 -1 0 0) for the following data set. T1 2 3 5 2 4 5 5
T2
T3
T4
3 3 5 5 4 3
8 7 7 6 7
3 1 2 2
23. Multiple linear regression with 3 regressor variables and 20 observations produced the ANOVA table with some blanks. Complete the table and test the hypothesis of no linear regression, using a = 5%. Source Regression Error Total
SS
df
2700
3 16
5000
MS
F
p-value
P369463-Ch009.qxd 9/2/05 2:59 PM Page 602
602
Chapter 9 Analysis of Variance
24. A multiple linear regression with 4 regressor variables and 25 observations produced the ANOVA table. Complete the table and test the hypothesis of no linear regression at a = 5%. Source
SS
Regression Error Total
9575
df
MS
F
p-value
10312
25. (rest (Dice-4 14) ) returned the following canonical patterns: ((11 6 6)(1 2 5 6)(1 3 4 6)(1 3 5 5)(1 4 4 5)(2 2 4 6) (2 2 5 5)(2 3 3 6)(2 3 4 5)(2 4 4 4)(3 3 3 5)(3 3 4 4)). Predict the F-ratio and p-value for a one-way ANOVA on the 12 groups. ans. 0 1.
REVIEW 1. The following student test scores resulted from two different methods of teaching. Determine if there is a significant difference between the groups. A-scores B-scores
82 90 78 76 88 60 77 89 90 85 90 65 95 87 78 97 67 84 87 93
2. Perform an ANOVA on the data in Review problem 1 and verify that the F-ratio is equal to the square of the t-value. 3. Show that the F-ratio in regression analysis, the ratio of the mean square–explained variation to the mean square error, is given by the formula below, where R2 is the coefficient of determination. F =
R2 (1 - R 2 )/( n - 2)
.
4. Find the t-value for testing the null hypothesis that the slope b = 0 versus the alternative hypothesis that b π 0 in simple linear regression and show that the square of the value is equal to the F-ratio for the following data: x 1 2 3 4 5 Y 2 5 8 9 7 5. Show that confidence interval testing, hypothesis testing, regression testing of b = 0, and the analysis of variance F-ratio test for a set at 5% lead to the same conclusion for the two samples below, drawn from normal distributions with the same variance.
P369463-Ch009.qxd 9/2/05 2:59 PM Page 603
603
Software Exercises
Sample 1: 2.02 3.12 5.42 4.27 1.65 5.73 4.06 1.62 3.89 1.60 (x = 3.34, s 2 = 2.48) Sample 2: 4.75 7.47 9.58 8.29 5.78 7.18 8.85 5.32 8.27 11.01 (x = 7.65, s 2 = 3.87) 6. Show that the F-test for a contrast is the square of the t-statistic for a pooled t-test with equal samples with the same variance. Use the following data with contrast L = (1 - 1). X: (19 24 12 33 32 19 12 11 24 28 12 13 18 21 23 20 22 12 15 33) Y : ( 40 47 33 35 33 35 46 19 23 36 26 17 32 24 22 29 35 25 23 39) (F-L (list x y) '(1 - 1)) Æ 18.62711 Æ -4.3159
(T-pool x y)
SOFTWARE EXERCISES (setf data '((22 25 24 25 26 22)(24 25 26 24 25 23) (25 27 27 23 25 25))) 1. (SSb data) returns the between variation for a one-way ANOVA where data is a list of the columns. In Problem 1, (SSb data) returns (5.44 2). 2. (SSw data) returns the within variation or error for a one-way ANOVA. 3. (SSt data) returns the total variation for a one-way ANOVA. 4. (anova data) returns an ANOVA table displaying the results. Source
SS
df
MS
F
P-value
Between Within Total
5.443 30.833 36.278
2 15 17
2.722 2.056
1.324
0.2954
5. (row-means data nrows) returns the row means where nrows is the number of rows for a two-way ANOVA with replication. (row-means data 2) Æ (25 24.22). 6. (column-means data) returns the column means for a two-way ANOVA. (column-means data) returns (24.0 24.5 25.333334) 7. (cell-means data num-rows) returns the cell means for a two way ANOVA where num-rows is the number of rows. (cell-means data 2) Æ ((23.666666 25.0 26.333334) (24.333334 24.0 24.333334))
P369463-Ch009.qxd 9/2/05 2:59 PM Page 604
604
Chapter 9 Analysis of Variance
8. (anova data num-rows) returns an ANOVA table. (anova data 2) returns Source
SS
df
MS
F
P-value
Rows Columns RC Error Total
2.722 5.444 5.444 22.667 36.278
1 2 2 12 17
2.722 2.722 2.722 1.889
1.441 1.441 1.441
0.2531 0.2748 0.2748
9. Knowing that there were three treatment groups of respective sample sizes 15, 21, and 24, complete the following ANOVA table. Source
SS
Between Within Total
df
MS
F
124 800
10. (Bartlett data) returns the Bartlett statistic B and p-value for testing homogeneity of variances, where data is a list of the treatment groups. (Bartlett data) Æ ( B = 1.0029 p-value = 0.6067) 11. Determine if there is a significant difference among the preference brands for soap shown. A 7.7 8.9 7.5 8.9 6.5 5
B
C
D
7.8 8.8 7.3 9 8 8
8 8 9 9 7 6
9 9 9 9 6 7
Find the mean square error without performing an ANOVA. ans. pooled variance = 1.44 12. Is there a difference among the 5 different treatment groups? T1 14 15.7 12.9 13.6 14.2
T2
T3
T4
T5
15.6 14.2 13.5 18.2 16.3
12.5 16 15.4 14.8 15.2
15 14.3 13.8 16.2 15.9
16.8 17.5 14.9 17.4 17.8
P369463-Ch009.qxd 9/2/05 2:59 PM Page 605
605
Software Exercises
13. Try the following command: (anova (list (sim-normal 50 10 30)(sim-norma 55 10 30) (sim-normal 65 10 30))) Vary the mean and variance for (sim-normal m s n) to see when the ANOVA becomes significant. Try various sizes for n. One trial returned Source Between Within Total
SS 16318.035 436329.840 452647.870
df 2 87 89
MS 8159.018 5015.286
F 1.627
p-value 0.202476
14. The command (Bonferroni data L a) returns the paired means and the critical Bonferroni point where data is the list of treatment groups and L is a list of the locations of the means. (setf data '((22 25 24 25 26 22) (24 25 26 24 25 23) (25 27 27 23 25 25))) (Bonferroni data '((1 2) (1 3) (2 3)) 5) prints the display and returns the t-value 2.694. The means are (24 24.5 24.666) Comparisons
Critical Point
24 - 24.5 = 0.500 < 2.230 24 - 24.66 = 0.666 < 2.230 24.5 - 24.66 = 0.166 < 2.230
Significant Difference NO NO NO
3 Comparisons between means 1 to 3: ((1 2)(1 3)(2 3)) t-value = 2.694 15. The command (SSL data L) returns the contrast sum of squares for contrast L coefficients. T1 T2 T3
(SSL (SSL (SSL (SSL
'((5 '((5 '((5 '((5
6 6 6 6
5 6 4 5 9 7 7 5 2 5 2 3
4 4 4 4
5)(9 5)(9 5)(9 5)(9
7 7 7 7
7 7 7 7
a) La = (1 1 -2)
b) Lb = ( -1 -1 2)
c) Lc = (1 -2 1)
d) Ld = ( -2 1 1)
5)(2 5)(2 5)(2 5)(2
5 5 5 5
2 2 2 2
3)) 3)) 3)) 3))
'(1 1 -2)) returns 24 for La. '(-1 -1 2)) returns 24 for Lb. '(1 -2 1)) returns 24 for Lc. '(-2 1 1)) returns 0 for Ld.
16. The command (F-L data contrast) returns the F-ratio and p-value for testing the contrast. Test the contrasts in software exercise 15. 17. The command (ANOM data a) returns the means of the data, the sample variances and standard errors, the pooled sample, standard error,
P369463-Ch009.qxd 9/2/05 2:59 PM Page 606
606
Chapter 9 Analysis of Variance
and a graphical display of the upper and lower decision limits comprising the critical decision interval. (anom cables 5) returns Means Grand Mean Variations
(20.15 30.95 29 25.35 44.1) 29.91 (53.081 72.155 72.842 39.818 141.673) with 2 sPooled = 75.914. (7.285 8.494 8.534 6.310 11.902) with spooled = 8.712 LDL = 25.384 UDL = 34.435.
Standard Errors Decision Limits
18. The command (ANOM-plot data a) returns a graphical display of the group means. (anom-plot cables 5) returns Analysis of Means 20.15 25.384 —————————————— 25.35 ——————— LDL 29 30.95 34.435 ———————————————————————— — UDL 44.1
19. Use the treatment data below to find the sum of squares for the following contrasts: a) La = (1 1 -2), c) Lc = (1 -2 1), T1 T2 T3
16 20 16 13 14 14 19 13 14 17 13 15
b) Lb = (-1 -1 2), d) Ld = (-2 1 1).
(setf data '((16 20 16 13)(14 14 19 13)(14 17 13 15))) a) b) c) d)
(SSL (SSL (SSL (SSL
data data data data
'(1 1 2)) Æ 2.041667, '(-1 -1 2)) Æ 2.041667, '(1 -2 1)) Æ 0.666667, '(-2 1 1) Æ 5.041667,
20. Use the cable data in Example 9.12 to show that the sum of squares for the following set of orthogonal contrasts equals the between sum of squares for a one-way ANOVA. Repeat for orthogonal contrasts (-4 1 1 1 1), (0 -3 1 1 1), (0 0 -2 1 1), and (0 0 0 -1 1). Using Command (SSL (SSL (SSL (SSL
cables cables cables cables
'(1 '(1 '(1 '(1
1 1 1 -4)) 1 1 -3 0)) 1 -2 0 0)) -1 0 0 0))
returns 5033.90 returns 27.34 returns 158.70 returns 1166.40 totaling 6386.34.
Notice that the between sum of squares 6386.34 has been partitioned by the orthogonal contrasts.
P369463-Ch009.qxd 9/2/05 2:59 PM Page 607
607
Self Quiz 9: Analysis of Variance
Source
SS
df
MS
F
p-value
Fc a = 0.05
Between Within Total
6386.34 7211.85 139598.19
4 95 99
1596.58 75.91
21.03
0.000¢
2.46
(SSL (SSL (SSL (SSL
cables cables cables cables
'(-4 1 1 1 '(0 -3 1 1 '(0 0 -2 1 '(0 0 0 -1
1)) returns 2381.44; 1 1)) returns 52.27. 1)) returns 437.01; 1)) returns 3515.63.
The sum is 6386.34. Try (C-anova cables '((-4 1 1 1 1)(0 -3 1 1 1)(0 0 -2 1 1)(0 0 0 -1 1))).
SELF QUIZ 9: ANALYSIS OF VARIANCE 1. Given the following incomplete ANOVA table, indicate true or false for the statements. Source
SS
df
Between Within
900 640
2 8
True or False? a) b) c) d) e)
The null hypothesis states that all 4 means are equal. The null hypothesis can be rejected at a = 5%. There are 10 observations in the experiment. The mean square error (MSE) is 1.41. If each ANOVA observation is doubled, SSB and SSW would also double.
2. a) Complete the table from the following data for 3 different treatments.
Source
T1
T2
T3
10 11 12
12 14 16
16 20 12
SS
df
MS
F
Fc
Between Within Total
b) Compute the contrast sum of squares for the following contrasts. i) La = (1 1 -2) ii) Lb = (1 -1 0)
P369463-Ch009.qxd 9/2/05 2:59 PM Page 608
608
Chapter 9 Analysis of Variance
3. Complete the table for the ANOVA 2 ¥ 4 experiment and indicate F-test results at a = 5%. Factor Y Levels E
A 4 8
B 12 10
C 10 8
D 4 6
F
4 4
10 8
10 8
10 12
Factor X
Source
SS
df
MS
F
Reject y or n
Rows Columns RC Error Total
4. Find the MSE for the following three treatment data sets. T1: 10 15 12 13 12 17 T2: 20 19 17 16 17 18 T3: 18 12 10 22 25 18
16 18
15 19
14
12
5. A random sample of size 16 from N(75, 64) and a second sample of size 9 from N(70, 144) are taken, resulting in respective means X 1 and X 2. a) P( X 1 - X 2 > 4) = ______. b) P(3.5 £ X 1 - X 2 £ 5.5) = ______. 6. (anova '((2.1 3.2 6.8)(4.5 6.2 7.3)(5.3 2.5 4.8)(7.3 8.9 9.2))) returned 60.922 for SSTotal. (anova '((4.2 6.4 13.6)(9 12.4 14.6)(10.6 5 9.6) (14.6 17.8 18.4))) returns SSTotal = ______.
P369463-Ch010.qxd 9/2/05 3:00 PM Page 609
Chapter 10
Nonparametric Statistics
Sire, I have no need of that hypothesis. ~ Laplace
If small samples are not taken from normal distributions, the parametric tests used for confidence intervals, hypothesis testing, and linear regression are not applicable. Nonparametric statistics are free from assumptions that the data came from a specified distribution. The only assumption that needs to be made is that the population from which the sample is drawn is continuous. Some nonparametric tests need the additional assumption of symmetry. Nonparametric statistics have widespread applicability for qualitative data on an ordinal scale. 10.0 10.1 10.2 10.3 10.4 10.5 10.6 10.7 10.8 10.9 10.10 10.11 10.12
Introduction The Sign Test Nonparametric Bootstrap Estimation The Sign Test for Paired Data The Wilcoxon Signed-Rank Test Wilcoxon-Mann-Whitney (WMW) Rank Test for Two Samples Spearman Rank Order Correlation Coefficient Kendall’s Rank Correlation Coefficient (t) Nonparametric Tests for Regression Nonparametric Tests for ANOVA Runs Test Randomization Tests Summary 609
P369463-Ch010.qxd 9/2/05 3:00 PM Page 610
610
10.0
Chapter 10 Nonparametric Statistics
Introduction In parametric statistics, in our hypothesis testing of the parameters, and in our estimation of these parameters, we carried the strong assumptions that our underlying population for our samples was normal, with homogeneity of variances. In regression we assumed that our error was normally distributed with a mean of zero. In the analysis of variance we again assumed that our samples were taken from normal populations with equal variances. Of course, we must test our assumptions for validity before, during, and after the analysis. Although the normality assumption is relatively robust, there are times when the assumption does not hold, and to assume that it does when it doesn’t can result in serious errors. Also some of our data may not be naturally quantifiable, as, for example, yes/no preference responses of likes and dislikes. Many nonparametric techniques exist for testing data without assuming normality of the underlying population and for data that can be ranked (ordinal scale). The tests are called distribution free tests. Nonparametric tests do not seek the values of the population parameters, but rather focus on the sample populations. Because nonparametric statistics do not rely on underlying assumptions, they have widespread applicability. Nonparametric statistics are based on order statistics. As such, they have the further advantage of being applicable when actual measurements are not practical, as long as rank order is available. On the other hand, nonparametric tests are less precise (have less power) in detecting differences between two groups.
10.1
The Sign Test When a small sample of data is collected from a population that is very much unlike a normal, the t-test is not applicable. Suppose we want to test the null hypothesis that RV X of a continuous distribution is less than the median of the distribution. The median of a distribution indicated by m˜ is defined as the value for which P ( X £ m˜ ) = P ( X ≥ m˜ ) = 1/2.
(10–1)
If the distribution is normal, then m = m˜ . Although we can test the sample data by the t-test if the population is normal, we can use the sign test for any continuous distribution. The procedure for the sign test is as follows. Let X1, . . . , Xn be a random sample from the population. Compare each Xi with the hypothesized median m˜ and count the number above, equal to, and below m˜ . The probability of X equaling the median for a continuous distribution is zero but can occur in the gathering of the sample through the lack of precision in measurement. The values, which exactly equal the
P369463-Ch010.qxd 9/2/05 3:00 PM Page 611
10.1 The Sign Test
611
median, should be removed from the analysis, and the sample size should be reduced accordingly. We now have a binomial random variable with parameter p = 1/2 to test whether the difference between the number above the median is significantly different from the number below the median. We can test both one-sided hypotheses and two-sided hypotheses. We illustrate the procedure in the following example with simulated data from a normal distribution with m = 5 and s = 4. EXAMPLE 10.1
The 12 samples below were taken from a continuous distribution with unknown median m˜ . Use the sign test to test H 0: m˜ = 5 vs. H1: m˜ > 5 at a = 0.05. Data: 4.6 -2.9 4.8 2.2 10.9 0.1 5.6 4.1 0.3 3.3 5.1 7.7. Solution
The 4 samples above the median are underlined.
(setf data '(4.6 - 2.9 4.8 2.2 10.9 0.1 5.6 4.1 0.3 3.3 5.1 7.7)) The stem and leaf diagram generated from (stem&leaf data) prints a fairly symmetric display of the data.
Stem and Leaf 1 -2 9 1 -1 3 0 03 3 1 4 2 2 5 3 3 8 4 168 10 5 16 10 6 11 7 7 11 8 11 9 12 10 9
N = 12
The mean x of the 12 samples is 3.8, the median is 4.35, and the sample standard deviation is 3.6. Eight of the 12 values are below the hypothesized median of 5, and 4 of the 12 values are above the median. This sign statistic is then a binomial random variable with parameters n = 12 and P(X > m˜ ) = p = 1/2. The p-value for the test is computed as P(X ≥ 8) = (-1 (cbinomial n p x)) = (-1 (cbinomial 12 1/2 7)) Æ 0.1938 > a = 0.05, and H0 cannot be rejected.
P369463-Ch010.qxd 9/2/05 3:00 PM Page 612
612
Chapter 10 Nonparametric Statistics
For X = 3, P ( X < 3) = (cbinomial 12 1/2 2) Æ 0.019 and P ( X > 9) = (cbinomial-a-b 12 1/2 10 12) Æ 0.019. A 96% confidence limit for the median is (X(2), X(10)) = (0.1, 5.6). Note that the H0 median 5 is in the interval, confirming the decision to not reject. The sorted data elements are ( -2.9 0.1 0.3 2.2 3.3 4.1 4.6 4.8 5.1 5.6 7.7 10.9) The sign test is not as powerful as the t-test when both tests are applicable, since the t-test uses more of the information in the samples than the sign test. The sign test is usually reserved for testing the median of a population.
The command (sign-test data H0 tail-type), where data is a list of the values and H0 is the median under the assumption that the null hypothesis is true and tail-type is upper, lower, or two-sided, yields a summary table for the sign test. For example, (sign-test '(4.6 - 2.9 4.8 2.2 10.9 0.0 5.6 4.1 0.3 3.3 5.1 7.7) 5 'upper) generated the following table:
n 12
Below 8
Equal 0
Above 4
p-value 0.1938
Median 4.35 b
The command (cbinomial-a-b n p a b) returns
.
n
 ÊË xˆ¯ p
x
q ( n - x ),
x=a
P(a £ X £ b). (cbinomial-a-b 12 1/2 8 12) Æ 0.1938 = P ( X ≥ 8).
EXAMPLE 10.2
Use the normal approximation to the binomial and test the data in Example 10.1 to see if H0: m˜ = 5 versus H1: m˜ > 5 at a = 0.05 can be rejected. Solution The normal distribution can be used to approximate a binomial RV with parameters n = 12 and p = 1/2, with m = np = 12 * 1/2 = 6, and s 2 = npq = 12* 1/2 * 1/2 = 3. The test is considered valid for n ≥ 12. Z=
X -m s
=
X - 0.5 n 0.5 n
=
4-6
= -1.155
0.5 12
with p-value = 0.1241, again failing to reject the null hypothesis.
P369463-Ch010.qxd 9/2/05 3:00 PM Page 613
10.2 Nonparametric Bootstrap Estimation
613
With continuity correction the calculation is Z=
( X - 0.5) - 0.5 n
=
0.5 n with p-value = 0.0745.
10.2
(4 - 0.5) - 6
= -1.4433
0.5 12
Nonparametric Bootstrap Estimation Whenever only vague knowledge is known about a distribution from which a sample was drawn, the bootstrap procedure becomes appropriate. Vague knowledge may pertain to the continuity of the distribution or to symmetry assumptions. Multiple samples from the original sample are taken and for each sample the statistic of interest is calculated. Usually 1000 or more bootstrap samples are used to evaluate the statistic or to provide a confidence interval.
EXAMPLE 10.3
a) Use a bootstrap procedure to find a 95% confidence interval for the median 4.35 of the data in Example 10.2: ( -2.9 0.1 0.3 2.2 3.3 4.1 4.6 4.8 5.1 5.6 7.7 10.9). b) Find a 99% confidence interval for the standard deviation of 24 diamonds with the following number of imperfections: (2 3 5 7 8 9 10 1114 17 19 20 22 25 30 31 32 34 36 37 38 42 44 47) from a diamond population not thought to be normally distributed. A larger sample is much more likely to represent the population than a smaller sample. Solution a) Sampling with replacement from Data and using the nonparametric bootstrap command (np-bootstrap data 100) return the sorted list of 100 medians, each computed from a random bootstrap sample of size 12. 1.25 2.75 3.70 4.35 4.60 4.80 5.1
1.25 3.15 3.70 4.35 4.60 4.80 5.35
1.25 3.15 3.70 4.35 4.60 4.80 5.6
1.25 3.30 3.70 4.35 4.60 4.80 6.4
2.20 3.30 3.70 4.35 4.60 4.95
2.20 3.30 3.95 4.35 4.60 4.95
2.20 3.30 3.95 4.35 4.60 4.95
2.20 3.65 3.95 4.35 4.70 4.95
2.20 3.65 4.10 4.35 4.70 4.95
2.75 3.70 4.10 4.35 4.70 4.95
2.75 3.70 4.10 4.35 4.70 4.95
2.75 3.70 4.10 4.45 4.80 5.1
2.75 3.70 4.10 4.45 4.80 5.1
2.75 3.70 4.35 4.45 4.80 5.1
2.75 3.70 4.35 4.60 4.80 5.1
2.75 3.70 4.35 4.60 4.80 5.1
The median is 4.35, with the 95% confidence interval (1.25, 5.35) corresponding to the third and 98th elements in the sorted bootstrap. b) (setf d- flaws '(2 3 5 7 8 9 10 11 14 17 19 20 22 25 30 31 32 34 36 37 38 42 44 47))
P369463-Ch010.qxd 9/2/05 3:00 PM Page 614
614
Chapter 10 Nonparametric Statistics
The template (bs-sd data n alpha) returns a 100(1 - a)% confidence interval for the standard deviation of the diamond population. The command offers the option of seeing the list of n computed standard errors from the n random bootstrap samples. (bs-sd d- flaws 1000 0.01) returned (10.752 16.790) with 99% confidence. Each bootstrap sample is generated by random sampling of size 24 with replacement from the d-flaws data. Bootstrap Sample 1: (swr 24 d-flaws) Æ s1 = 12.789 (25 17 30 9 44 37 36 8 37 19 44 47 44 14 20 34 42 17 10 38 37 25 32 11) Bootstrap Sample 2: (swr 24 d-flaws) Æ s2 = 13.288 (32 47 9 8 9 37 22 19 34 36 2 19 30 10 8 36 14 11 22 25 31 47 42 22) Bootstrap Sample 3: (swr 24 d-flaws) Æ s3 = 11.706 (22 30 19 37 30 25 8 34 2 47 25 20 25 7 7 32 31 5 22 10 36 20 25 11) ... ... ... ... ... Bootstrap Sample 1000: (swr 24 d-flaws) Æ s1000 = 12.789 (11 9 8 22 34 37 32 36 3 3 22 11 30 3 25 11 32 44 42 11 47 3 37 22) The list of si is sorted. The 6th and 995th positions of the sorted bootstrap standard errors constitute the 99% confidence interval. The first 10 sorted bootstrap sample standard errors are (10.149 10.1787 10.36110.392 10.524 10.701 10.84110.942 10.950 11.033). The last sorted 10 bootstrap sample standard errors are (16.554 16.59116.642 16.644 16.828 16.945 17.017 17.385 17.445 18.101). The 6th and 995th are in bold for the 99% confidence interval for the standard deviation of the diamond population.
10.3
The Sign Test for Paired Data The sign test can also be used for paired data when we are interested in determining whether the two samples came from populations with the same median. Suppose we have two samples represented by X1, . . . , Xn and Y1, . . . , Yn and wish to determine if their underlying populations have the same median parameter. We simply apply the sign test to the paired differences with the null hypothesis that the median is zero, versus the alternative hypothesis that the median is not zero. The procedure is illustrated in the next example.
P369463-Ch010.qxd 9/2/05 3:00 PM Page 615
10.4 The Wilcoxon Signed-Rank Test
EXAMPLE 10.3
615
A new measuring system was used to test the performance of a brand of motor oil. The old measuring system was also used. Test to determine if the median score is the same for both the old and the new measuring systems at a = 10%, versus the alternative hypothesis that measuring systems vary in their medians. H 0: m˜old = m˜new versus H1: m˜old π m˜new New Old Diff Sign
10 9 1 +
12 11 1 +
8 9 -1 -
13 14 -1 -
14 12 2 +
9 8 1 +
13 11 2 +
11 10 1 +
We have 2 negative signs and 6 positive signs. Since the alternative hypothesis is two-sided, the computed p-value is doubled. 2
8 2 * P ( X £ 2) = 2 ÊË ˆ¯ 0.58 = 0.2891 > 0.10 fi we cannot reject H 0 . x =0 x The old and new measuring systems have a common median. (sign- test '(1 1 -1 -1 2 1 2 1) 0 ' both) returns the following.
The command (setf data '(1 1 -1 -1 2 1 2 1)) followed by (sign-test data 0 'both) with H0 = 0 and tail set to both generates the following table: n 8
Below 2
Equal 0
Above 6
p-value 0.2890
Median 1
Type II Beta Error for the Sign-Test While the sign-test controls the Type I a error, the Type II b error depends not only on the alternative median but also on the assumption of the underlying distribution. The b error for a median displaced by one unit depends on the density function, whether normal, exponential, or any other continuous density function. However, once the underlying distribution is specified, the sign-test statistic is binomial and the b error can be computed.
10.4
The Wilcoxon Signed-Rank Test In the sign test we see that some of the available information is discarded, namely, the magnitudes of the differences above and below the hypothesized
P369463-Ch010.qxd 9/2/05 3:00 PM Page 616
616
Chapter 10 Nonparametric Statistics
median. The Wilcoxon signed-rank test makes use of these magnitudes. However, where the sign test assumes only that the underlying population is continuous, the Wilcoxon signed-rank test adds the assumption that the underlying population is symmetric as well as continuous. The Wilcoxon signed-rank test can test for symmetry under the continuous distribution assumption. If the test H0: m˜ = m˜ 0 vs. H1: m˜ π m˜ 0 is rejected, either the distribution is symmetric but m˜ π m˜ 0 or the distribution is not symmetric. When the magnitudes above and below the median are similar, the null hypothesis is accepted. When the magnitudes above differ significantly from those below, the null hypothesis is rejected. The Wilcoxon signed-rank test procedures for testing H0: m˜ = m˜ 0 vs. H1: m˜ π m˜ 0 are as follows. 1) Determine the difference in magnitudes between each sample value and the hypothesized median. 2) Sort the absolute magnitudes. 3) Sum the ranks of the initial positives to W + and the initial negatives to W -. W = min (W +, W -) is the test statistic. Reject H0 if W £ the critical Wn,a. EXAMPLE 10.4
Use the 12 samples in the data shown below from Example 10.1 to run the Wilcoxon signed-ranked test for testing H0: m˜ = 5 vs. H1: m˜ > 5. Data
4.66
-2.95
4.80
2.27
10.91
0.07
5.63
4.18
0.31
3.30
5.01
7.70
Solution (setf data '(4.66 - 2.95 4.80 2.27 10.91 0.07 5.63 4.18 0.31 3.30 5.01 7.70)) The magnitudes of the differences from the median value of 5 are given by the command (repeat #' - data (list-of 12 5)) -0.34 -7.95 -0.2
-2.73
5.91 -4.93
0.63 -0.82 -4.69 -1.7
0.01
The absolute magnitudes of the differences are then 0.34
7.95
0.2
2.73
5.91
4.93
0.63
0.82
4.69
1.7
0.01
2.7
The sorted absolute magnitudes with their signs are 0.01 + 1
0.2 2
0.34 3
0.63 + 4
0.82 5
1.7 6
2.7 + 7
2.73 8
4.69 9
4.93 10
5.91 + 11
7.95 12
2.7
P369463-Ch010.qxd 9/2/05 3:00 PM Page 617
10.4 The Wilcoxon Signed-Rank Test
617
In case of ties in magnitude, each is given the average of the ranks. For example, if the 3rd, 4th, and 5th positions were equal, each would be assigned a rank score of 4. n
The sum of all the ranks is
Âi = i =1
n( n + 1)
= 6(13) = 78.
2
The value of RV W + = w+, the sum of the ranks for the + entries, is 23 = (1 + 4 + 7 + 11). The value of RV W - = w-, the sum of the ranks for the negative entries, is 55. The value of RV W * = w* = min (23, 55) = 23.
The template (Wilcoxon-sign-test data median-value alpha side) returns W +, W -, W *, and Wcritical. For example, (Wilcoxon-sign-test '(4.66 - 2.95 4.80 2.27 10.91 0.07 5.63 4.18 0.31 3.30 5.01 7.70) 5 5 1) returns ( W * = 23 W - = 55 W * = 23 W -critical = 17 No-Reject ).
From the Wilcoxon signed rank statistic W + table, the critical w for n = 12 and a = 0.05 (one-sided) is 17 too few runs. Since 23 is not less than 17, we cannot reject the null. See a portion of the table in Table 10.1. Entering arguments are shown in bold. See Table 8 in Appendix. If the sample size for the Wilcoxon signed rank test is greater than 20, then W + can be shown to be approximately normal with
Table 10.1
Portion of Wilcoxon Signed-Rank Test Table CRITICAL VALUES FOR THE WILCOXON SIGNED-RANK TEST a-VALUES
n ... 11 12 13 14
0.10 0.05
0.050 0.025
0.02 0.01
0.010 0.005
... 13 17 21 25
... 10 13 17 21
... 7 9 12 15
... 5 7 9 12
two-sided tests one-sided tests
P369463-Ch010.qxd 9/2/05 3:00 PM Page 618
Chapter 10 Nonparametric Statistics
618
Z=
W - n( n + 1)/4 -0.5
.
n( n + 1)(2n + 1)/24 EXAMPLE 10.5
Use the data in Example 10.4 to repeat the test of H0: m˜ = 5 vs. H1: m˜ > 5 at a = 5%, using the normal approximation of W+ even though the sample size n is only 12. Solution
Data
4.66 -2.95
4.80
2.27
W + = 23 fi z =
10.91
0.07
5.63
4.18
|23 - 12(13)/4|-0.5
0.31
3.30
5.01
7.70
= -1.216,
12(13)(25)/24 with a one-tail p-value of 0.1120, and again the null hypothesis that the median is 5 cannot be rejected.
The template (WST-normal data H0) returns the W value, the z-value, and a one-tailed p-value. For the data in Example 10, (WST-normal data 5) returns ( w = 23
10.5
z = -1.255
p-value = 0.1047).
Wilcoxon-Mann-Whitney (WMW) Rank Test for Two Samples The Wilcoxon-Mann-Whitney rank test, also referred to as the Mann-Whitney U test or the Wilcoxon rank-sum test, can detect the difference between the medians of two samples from independent populations. The Wilcoxon test is used for two different treatments on the same sample (repeated measurements). The statistic tests the null hypothesis of no difference between the medians of the two underlying population distributions. Suppose that we have two independent random samples represented by X1, . . . , Xn and Y1, . . . , Ym and wish to test the hypothesis that the samples came from the same underlying population. We make no assumptions on the underlying population other than that the density function is continuous. Observe that the number in each sample does not have to be equal. The WMW rank test sums for each sample can be used to provide a test statistic for this hypothesis. First we combine the n + m values and rank each. Under the null hypothesis that both samples came from the same underlying population, the X- and Y-values should be distributed throughout.
P369463-Ch010.qxd 9/2/05 3:00 PM Page 619
10.5 Wilcoxon-Mann-Whitney (WMW) Rank Test for Two Samples
619
Let SX indicate the sum assigned to the X-value ranks and SY indicate the sum assigned to the Y-value ranks. The sum of the ranks is given by the formula for the sum of the first k integers, namely, k
(1 + 2 + . . . + k) = Â i =
k( k + 1)
.
2
i =1
Similarly, [1 + 2 + . . . + ( n + m )] =
( n + m )( n + m + 1)
.
2 The average of the ranks is found by dividing the sum by the total number (n + m), ( n + m )( n + m + 1) 2 n+m
r=
=
( n + m + 1)
.
2
The expected value of SX, which has n entries, is then given by n * r or E( S X ) =
n( n + m + 1)
,
2
and, similarly, E( SY ) =
m( n + m + 1)
.
2
Thus if SX is not near enough to E(SX) or if SY is not near enough to E(SY), the null hypothesis can be rejected. In case of a tie, the ranks corresponding to the identical entries are summed and averaged, and the average is added to each group’s total. The procedures are illustrated in the next example. EXAMPLE 10.6
Use the WMW rank test to determine whether there is a significant difference in the medians for the two sets of scores at a = 0.05.
X-Test Scores Y-Test Scores
60 76
45 65
81 90
87 80
79 89
75 95
30 85
69
45
Solution H 0: m˜ x = m˜ y vs. H1: m˜ x π m˜ y 1. Assign the X- and Y-test scores, using the command (setf x '(60 45 81 87 79 75 30) y '(76 65 90 80 89 95 85 69 45)).
P369463-Ch010.qxd 9/2/05 3:00 PM Page 620
620
Chapter 10 Nonparametric Statistics
2. Join the two lists. The command (append x y) performs the joining, and returns (60 45 81 87 79 75 30 76 65 90 80 89 95 85 69 45). 3. Rank the joined list. The command (rank (append x y)) returns the ranking. X -ranking Y -ranking (4 2.5 1113 9 7 1 8 5 15 10 14 16 12 6 2.5) 4. Determine the sum of the ranks in each group. In case of a tie, the corresponding ranks are averaged and the average is assigned to each. The command (sum-ranks (x y)) returns (47.5 88.5).
60 45 4.5 2.5
79
81 87 9 11 13 9 SX = 47.5
75 7
30 1
76 8
65 5
90 80 89 95 85 69 45 Joined 15 10 14 16 12 6 2.5 Rank SY = 88.5
Notice the two values of 45 of the 2nd and last entries in the joined list. The corresponding ranks are summed (2 + 3) and the average of these two ranks, 2.5, is assigned to each. S X = 47.5; SY = 88.5. The Wilcoxon W statistic is min (SX, SY) = min (47.5, 88.5) = 47.5. 16
The sum of the ranks is
Âi =
16 * 17
r=
= 136 = 47.5 + 88.5.
2
i =1
n + m +1 2
=
7 + 9 +1
= 8.5
2
E( S X ) = n * r = 7 * 8.5 = 59.5 versus S X = 47.5. E( SY ) = 9 * 8.5 = 76.5 vs. 88.5. 5. Enter the table with W = 47.5, n = 7, and m = 9. A portion of the table for the Wilcoxon-Mann-Whitney statistic is shown in Table 10.2 for a = 0.05. The critical W = 40 < 47.5 indicates failure to reject the null hypothesis of no significant difference between the X- and Y-test scores. The normal approximation to the WMW test is given by Z=
W - n1( n1 + n2 + 1)/2
,
n1 n2 ( n1 + n2 + 1)/12 where n1 is the smaller sample size of one group and n2 the sample size of the other.
P369463-Ch010.qxd 9/2/05 3:00 PM Page 621
10.5 Wilcoxon-Mann-Whitney (WMW) Rank Test for Two Samples
Table 10.2
621
Portion of Wilcoxon-Mann-Whitney n
EXAMPLE 10.7
m
4
5
6
7
8
9
10
8 9 10
14 15 15
21 2 23
29 31 32
38 40 42
49 51 53
63 65
78
11
12
13
14
Use the normal approximation to the WMW rank test to determine whether there is a significant difference in means for the two sets of scores at a = 0.05.
X-Test Scores Y-Test Scores
60 76
45 65
81 90
87 80
79 89
75 95
30 85
69
45
Solution Z=
W - n1( n1 + n2 + 1)/2
=
47.5 - 7(7 + 9 + 1)/2
= -1.270
n1 n2 ( n1 + n2 + 1)/12 7 * 9(7 + 9 + 1)/12 with p-value = 0.240.
The template (WMW-normal X Y ) returns the z and 2-tailed p-values for normal approximation testing of the null hypothesis of no difference between the two groups. (WMW-normal '(60 45 81 87 79 75 30) '(76 65 90 80 89 95 85 69 45)) prints Median of the first data set is 75.0 Median of the second data set is 80.0 Ranks of first group (4.0 2.5 11.0 13.0 9.0 7.0 1.0) Ranks of second group (8.0 5.0 15.0 10.0 14.0 16.0 12.0 6.0 2.5)
sum to 47.50 sum to 88.50
z = -1.270 with p-value = 0.2040, W = 47.50.
In some experiments, each subject undergoes two levels of the independent factor to determine whether the two levels represent two different distribu-
P369463-Ch010.qxd 9/2/05 3:00 PM Page 622
622
Chapter 10 Nonparametric Statistics
tions. An appropriate model to use is the McNemar test. The assumptions for the test are that the sample of n subjects has been randomly selected from the population of interest, each subject’s measurement is independent of the other subjects, and the measurements are categorical. The McNemar test can be used for pretest versus post-test, placebo versus drug, or favorable versus unfavorable results. The generic model is given by Favorable Yes No a b a+b c d c+d a+c b + d n = a + b + c + d.
Favorable Yes No
EXAMPLE 10.8
In an experiment with 100 randomly chosen people who suffered from frequent acid indigestion, the subjects were given two remedies A and B alternately to try for the next two episodes of indigestion. The subjects reported whether the remedy gave relief. Is there a difference in the two remedies, given the following results?
Relief 35 30 65
Remedy B Relief No Relief
Remedy A No Relief 20 15 35
Total 55 45 100
Solution H 0 : pb = pc vs. H1: pb π pc . In this experiment, each of 100 subjects reported 2 observations. Remedy B gave 55% relief and Remedy A gave 65% relief. Since the observations are paired, the McNemar test can be used. In this test the concordant pairs (agreed) are thrown out. For example, 35 experienced relief/relief and 15 experienced no relief/no relief. These pairs are removed from the analysis. Method 1: A c 2 test can be made on the discordant pairs 30 and 20 under the null hypothesis of no difference between the two remedies. c2 =
( b - c )2 b+c
=
(20 - 30)2 20 + 30
= 2,
which is similarly shown with an expected 25 for each. c2 =
(30 - 25)2 25
+
(20 - 25)2
= 2 fi p-value = 0.1573 for v = 1df .
25
(chi-square re-test '((20 30))) Æ c 2 = 2, p-value = 0.1573.
P369463-Ch010.qxd 9/2/05 3:00 PM Page 623
10.6 Spearman Rank Order Correlation Coefficient
623
Method 2: A normal approximation to the proportion test can be made under the hypothesis H0: p = 0.5. z=
30/50 - 25/50
= 1.414 fi p-value of 0.1573 for a 2-way test.
0.5 * 0.5/50 Note that z2 = 1.4142 = 2 = X 2. This is also similar to z=
b-c
=
b-c
10.6
20 - 30 20 + 30
= -1.414.
Spearman Rank Order Correlation Coefficient Recall that the Pearson correlation coefficient ranged from -1 to +1 to indicate the extent of an assumed linear relationship between two measures. For ordinal x-y data, there may not be much of a linear relationship between x and y, yet there may be interest in knowing perhaps to what extent y increases (decreases) each time x increases. The Spearman rank order correlation coefficient can indicate the consistency of change. The range of the coefficient is from -1 (perfectly monotonic) to 0 (no consistent change) to 1 (perfectly monotonic), with the sign indicating the direction of the trend. The Spearman correlation deals with the ranks rather than the actual measures of the two variables. For example, the Spearman correlation can indicate whether practice makes perfect: each time one practices, does one’s performance increase? The procedures for finding the Spearman correlation are as follows. 1) Convert the x and y data into ranks separately. 2) Compute the sum of the squares of the differences d between the x-ranks and corresponding y-ranks. 3) Spearman r is denoted by rs = 1 -
6Â d 2 n( n 2 - 1)
for small samples ( n £ 10)
(10–2)
and by the unit normal Z=
n( n 2 - 1) - 6Â d 2
for large samples ( n > 10).
n( n + 1) n - 1 The Spearman correlation is equal to the Pearson correlation of ranks when there are no ties.
P369463-Ch010.qxd 9/2/05 3:00 PM Page 624
624
EXAMPLE 10.9
Chapter 10 Nonparametric Statistics
Below are scores for X and Y as well as their respective ordinal ranks. The row indicated by D is the difference in the respective ranks (x-rank–y-rank) and D2 is the square of the differences.
RANK rank X Y X-ranks Y-ranks d d2
1 3 4 2 3 -1 1
2 6 9 4 4 0 0
3 7 15 5 5 0 1
4 4 3 3 2 1 0
5 2 1 1 1 0 0
(Pearson-r '(3 6 7 4 2) '(4 9 15 3 1)) Æ the Pearson correlation 0.945. Recall Pearson’s r = rP =
S XY
,
S XX SYY where S XY = S( x i - x )( yi - y ) and S XX = S( x i - x )2 . Spearman’s r = rS = 1 -
6Â d 2
n( n 2 - 1) = 1 - 6 * 2/(5 * 24) = 1 - 0.1 = 0.9.
The command (setf x '(3 6 7 4 2) y '(4 9 15 3 1)) assigns x and y to the data. (rank x) returns (2 4 5 3 1), the ranks of x. (Spearman-r x y) returns 0.9, the Spearman correlation. (Pearson- r x y) returns 0.945, the Pearson correlation. (Pearson- r (rank x) (rank y)) returns 0.9, the Spearman correlation.
In assigning the ranks to the x and y ordinal data, ties may occur. For example, suppose that ranks 4, 5, and 6 all have the same x-value. Then the sum of the ranks (4 + 5 + 6) divided by the number of x-data with the same value (3), is assigned as the rank for each same x-data (5). EXAMPLE 10.10
a) Find the Spearman correlation for x and y scores. b) Show that the Pearson correlation between the ranks is equal to the Spearman correlation.
P369463-Ch010.qxd 9/2/05 3:00 PM Page 625
10.7 Kendall’s Rank Correlation Coefficient (t)
X Y
625
12 16 18 9 23 11 12 34
17 15
Solution The command (Spearman-r '(12 16 18 9 17) '(23 11 12 34 15)) returns rS = -0.7. The command (rank '(12 16 18 9 17) returns the x-ranks as 2 3 5 1 4, and the command (rank '(23 11 12 34 15) returns the y-ranks as 4 1 2 5 3. The command (Pearson-r '(2 3 5 1 4) '(4 1 2 5 3)) returns the Pearson correlation rP = -0.7. EXAMPLE 10.11
Data for 15 workers tested for IQ and salary earned are shown below. Find the Spearman rank order correlation coefficient from the data. Worker 1 2 3 4 5 6 7 8 9 10 11 IQ 110 121 98 102 132 126 117 119 127 122 118 Salary 35 42 28 30 57 65 46 50 62 70 65 IQ-rank Sal-rank D D2
5 4 1 1
11 7 4 16
1 2.5 15 13 1 3 11 13.5 0 -.5 4 -.5 0 .25 16 .25
7 9.5 14 9 10 12 -2 -.5 2 4 .25 4
12 13 14 15 102 111 105 119 45 38 29 38
12 8 2.5 6 15 13.5 8 5.5 -3 -5.5 -5.5 .5 9 30.25 30.25 .25
4 2 2 4
9.5 5.5 4 16
The command (ex 10.11) assigns the IQ and salary data. Solution Notice, for example, that there are two IQ scores of 102 occurring at the 2nd and 3rd ranks. Thus, each is given a rank of 2.5, the average of 2 and 3. Sd 2 = 131.5, from which rs = 1 - 6 * 131.5/(15 * 224) = 0.7652. = (spearman-r IQ salary). The Pearson correlation 0.7643 for the ranks is nearly the same and is slightly lower because of the ties.
10.7
Kendall’s Rank Correlation Coefficient (t) In x-y paired data, often high (low) x-values associate with high (low) y values. For i < j, if xi < xj and yi < yj, then pairs (xi, yi) and (xj, yj) are said to concordant. Thus pairs (1, 3) and (4, 6) are concordant, pairs (1, 3) and (2, 1) are discordant, while pairs (4, 7) and (4, 9), pairs (5, 10) and (7, 10), and pairs (5, 10) and (5, 10) all constitute a tie. In n observations there are nC2 = n (n - 1)/2 pairings. Kendall’s t is defined as t =
nc - nd n( n - 1)/2
,
where nc and nd are number of concordant and discordant pairs, respectively.
P369463-Ch010.qxd 9/2/05 3:00 PM Page 626
Chapter 10 Nonparametric Statistics
626
If all the pairs are discordant, t = -1; if all are concordant, t = 1; if nc = nd then t = 0, implying independence of the x-y data. EXAMPLE 10.12
a) Find the Kendall t and test the correlation for x and y scores. b) Compare with the Spearman-r. c) Repeat the Kendall t for ranked data. X Y
12 16 18 9 23 11 12 34
17 15
Solution a) H 0: t = 0 vs. H1: t π 0. The 8 discordant pairs are: (12, 23) with (16, 11), (18, 12), (17, 15) and (9, 34); (16, 11) with (9, 34); (9, 34) with (17, 15); (18, 12) with (9, 34) and (17, 15). The 2 concordant pairs are: (16, 11) with (18, 12) and (17, 15). With use of the template (Kendall-tau x-data y-data a tail), where a is 5% or 1% and tail is 1 or 2, (Kendall-tau '(12 16 18 9 17) '(23 11 12 34 15) 0.05 1) returned Concordant-Pairs = 2 Discordant-Pairs = 8 Ties = 0 Kendall’s Tau = -0.6 ( nc - nd ) = 6 Critical = 8 Cannot Reject b) The command (Spearman-r '(12 16 18 9 17) '(23 11 12 34 15)) returns rS = -0.7. The Kendall t is lower than the Spearman-r. c) (Kendall-tau (rank '(12 16 18 9 17)) (rank '(23 11 12 34 15)) 5 2) returns the exact same output as with the actual data.
10.8
Nonparametric Tests for Regression When the assumptions of least squares regression cannot be met, regression methods focusing on the median rather than the mean can be used. One such method developed by Thiel estimates the slope parameter by using the median of all slopes of the lines determined by the x-y data points. Consider the following example.
EXAMPLE 10.13
Given the following 7 x-y data pairs, fit an apt regression line. X 0 Y 11
Solution
1 2 7 6.1
3 4 5 6 7 8 5.4 4.7 4 3.3 2.7 2
(setf x (upt0 8) y '(11 7 6.1 5.4 4.7 4 3.3 2.7 2)).
P369463-Ch010.qxd 9/2/05 3:00 PM Page 627
10.8 Nonparametric Tests for Regression
627
Residuals Versus the Order of the Data (Response is y)
Residual
2
1
0
–1 1
2
3
4
5
6
7
8
9
Observation Order Figure 10.1
Ordered Residuals
If we use (Y-hat x y), the equation returned is Y - hat = 8.86 - 0.932 X . (residuals x y) Æ (2.14 -0.928 -0.896 -0.665 -0.433 -0.201 0.03 0.361 0.593). Notice how the residuals start positive for a value and then go negative for 5 values, returning positive for the last 3 values. A plot of the ordered residuals using the command (pro (residuals x y)) is shown in Figure 10.1. The plot gives one concern for the assumptions of least squares regression being applicable, as the first y-value appears high but is not an outlier. It seems desirable to lessen the effect of wayward values. Using Thiel’s method, we first find the slopes by fixing each x-y point in turn and computing the slope from each of the other x-y data pairs. The command (Thiel-b x y) returns the following slopes with median b = -0.7. As there are 9 data points, there are 9C2 = 36 slopes to compute.
P369463-Ch010.qxd 9/2/05 3:00 PM Page 628
628
Chapter 10 Nonparametric Statistics
(-4 -2.45 -1.866 -1.575 -1.4 -1.283 -1.185 -1.125) ; Point (0, 11) fixed; (-0.9 -0.8 -0.766 -0.75 -0.74 -0.716 -0.714) ; Point (1, 7) fixed; (-0.7 -0.7 -0.7 -0.7 -0.68 -0.683) ; Point (2, 6.1) fixed; (-0.7 -0.7 -0.7 -0.675 -0.68) ; Point (3, 5.4) fixed; (-0.7 -0.7 -0.666 -0.675) ; Point (4, 4.7) fixed; (-0.7 -0.65 -0.666) ; Point (5, 4) fixed; (-0.6 -0.65) ; Point (6, 3.3) fixed; (-0.7) ; Point (7, 2.7) fixed; 9C2 = (8n -
n - 1 slopes n - 2 slopes n - 3 slopes n - 4 slopes n - 5 slopes n - 6 slopes n - 7 slopes n - 8 slopes 36) = 36
Fixing the first pair of x-y data (0, 11), the slopes are calculated as yi +1 - yi x i +1 - x i
=
7 - 11 1- 0
= -4,
6.1 - 11 2-0
= -2.45, etc.
The intercept a is the median of the ai = yi - 0.7xi. The command (repeat #' -y (repeat #' * (list-of 9 -0.7) x)) returns the ai as (117.7 7.5 7.5 7.5 7.5 7.5 7.6 7.6) with median 7.5. The second value a2 = 7.7 is computed from y2 - bx2 = 7 - (-0.7)*1 = 7.7. The Thiel regression (Thiel-r x y) returns the equation y = 7.5 - 0.7X. See refitted data in Figure 10.2. The y-hats of this equation are generated from the command (repeat #' + (list-of 9 7.5) (repeat #' * (list-of 9 -0.7) x)), returning (7.5 6.8 6.1 5.4 4.7 4 3.3 2.6 1.9), compared to the original yi observations (117 6.1 5.4 4.7 4 3.3 2.7 2), curtailing the influence of the observed value y1 = 11.
P369463-Ch010.qxd 9/2/05 3:00 PM Page 629
10.8 Nonparametric Tests for Regression
629
11 10 9 8 y
7 6 5 4 3 2 0
1
2
3
4 x
5
6
7
8
INITIAL X-Y SCATTER PLOT 8 7
z
6 5 4 3 2 0
Figure 10.2
1
2
3
4 x
5
X-Y Scatter Plot (Thiel Regression)
6
7
8
P369463-Ch010.qxd 9/2/05 3:00 PM Page 630
Chapter 10 Nonparametric Statistics
630
EXAMPLE 10.14
a) Compare the least square regression with the Thiel method of regression for x-data from 1 to 100 and y-data from the squares of 1 to 100. b) Predict the equation if the ranks of the x-data and y-data were used. c) Predict the equation if second order polynomial regression is used. Solution a) (y-hat (upto 100) (repeat #' square (upto 100))) Æ Y-hat = -1717 + 101X; (thiel-r (upto 100) (repeat #' square (upto 100))) Æ y = -1925 + 101X. b) (Y-hat (rank (upto 100)) (rank (repeat #' square (upto 100)))) Æ Y-hat = 0 + 1X. (Thiel-r (rank (upto 100)) (rank (repeat #' square (upto 100)))) Æ Y-hat = 0 + 1X. Notice that the regression is nonlinear but strongly monotonic. c) (polynomial-regress (upto 100) (repeat #' square (upto 100)) 2) returned the exact equation Y-hat = 0 + 0XŸ1 + 1XŸ2 where x Ÿ2 is x2.
10.9
Nonparametric Tests for ANOVA
Kruskal-Wallis The ANOVA procedures along with the F-test assumed that the underlying distributions from which the samples were taken were normal, with homogeneity of variances. When that assumption cannot be safely met, the distribution-free Kruskal-Wallis (K-W) test is appropriate. The continuity assumption along with same shape distribution and random sampling hold for the K-W test. The procedure for the test is as follows. r
1) Rank all N samples from the r treatments together where N = Â ni . i =1
Ties result in each entry receiving the average rank. 2) Under H0: mi = mj, the ranks should be uniformly distributed throughout. n N +1 n( n + 1) , 3) With  i = , we would expect the rank of each sample to be 2 2 i =1 the expected value for a discrete uniform random variable. 4) The K-W test statistic is KW =
r
2
N + 1ˆ Ê ni Ri , Â Ë N ( N + 1) i =1 2 ¯ 12
P369463-Ch010.qxd 9/2/05 3:00 PM Page 631
10.9 Nonparametric Tests for ANOVA
631
which is equivalent to the computationally easier formula given by KW =
12 N( N +
r
 1) i =1
R 2i
- 3( N + 1).
ni
5) In case of several ties a more accurate K-W statistic is given by KW =
1 È r R 2i N ( N + 1)2 ˘ ÍÂ ˙˚, 4 S 2 Î i =1 ni
where S2 =
r nj N ( N + 1)2 ˘ È 2 r ij   ˙˚ Í N - 1 Î i =1 j =1 4
1
The K-W statistic can be compared with the critical chi-square table value for (r – 1) degrees of freedom. We show a simple example to follow the procedure by hand. EXAMPLE 10.15
The scores of sounds caused by three different brake materials are shown with their ranks in parentheses. B1 B2 B3
7 (1) 19 (3) 50 (9)
12 (2) 39 (7) 45 (8)
24 (4) 30 (6) 25 (5)
Solution N=9 R2 = 3 + 7 + 6 = 16; R22 = 162 = 256;
R1 = 1 + 2 + 4 = 7; R12 = 72 = 49; KW =
12
r
R 2i
Ân N ( N + 1) i =1
R3 = 9 + 8 + 5 = 22. R23 = 222 = 484.
Ê 49 256 484 ˆ + + - 3(9 + 1) 9(9 + 1) Ë 3 3 3 ¯ = 5.06 < 5.99 = X 22,0.05 .
- 3( N + 1) =
i
12
The critical chi-square value at a = 5% for 2 degrees of freedom is 5.99, implying we cannot reject the hypothesis of no difference among the brake materials. EXAMPLE 10.16
The grades below resulted from three different teaching methods of problemsolving. Compute the K-W statistic to test if there is a difference in the methods at a = 5%.
Interpretation Lecture Group Solve
87 58 63
95 89 95
88 79 89
78 87 76
92 90 68
84 69 88
75 79 92
90 88 85
95 95 77
P369463-Ch010.qxd 9/2/05 3:00 PM Page 632
632
Chapter 10 Nonparametric Statistics
Solution The template (KW-ranks list-of-numbers) returns an ordinal rank of the list of numbers. For example, (KW-ranks '((7 12 4))) returns (2 3 1). The command (ex10.16) assigns k1 to (87 95 88 78 92 84 75 90 95), k2 to (58 89 79 87 90 69 79 88 95), k3 to (63 95 89 76 68 88 92 85 77), and k-list to (list k1 k2 k3)). The ranks are then given by (KW-ranks k-list)), returning 13.5 25.5 16 8 22.5 11 5 20.5 25.5 1 18.5 9.5 13.5 20.5 4 9.5 16 25.5 2 25.5 18.5 6 3 16 22.5 12 7. The template (KW-rank-sums list) returns the sum of the respective treatment ranks. For example, (KW-rank-sums k-list) returns (147.5 118 112.5). The command (setf sq-list (repeat #' square '(147.5 118 112.5))) assigns and returns (21756.25 13924 12656.25). The command (sum sq-list) returns 48336.5. The value for S2 = 62.635 is given by (KW-s2 k-list). (KW-test k-list) returns (kw = 1.257, p-value = 0.533, SŸ2 = 62.635).
Observe that the sum of the ranks (147.5 + 118 + 112.5) = 378 =
N ( N + 1) 2
=
27(27 + 1)
.
2
Thus, KW =
1 È r R 2i N ( N + 1)2 ˘ 1 È 27(27 + 1)2 ˘ = 5370 . 72 Â Í ˙˚ ˙˚ S 2 Î i =1 ni 4 62.62 ÍÎ 4 = 1.2568 < 5.99 = c 20.05,2 ,
and the null hypothesis of no difference among the treatments cannot be rejected. An equivalent K-W test is an ANOVA on the ranks.
P369463-Ch010.qxd 9/2/05 3:00 PM Page 633
10.9 Nonparametric Tests for ANOVA
633
The command (anova (kw-ranks k-list)) prints the table below. Source
SS
df
MS
F
p-value
Between Within Total
78.72 1549.77 1628.50
2 24 26
39.36 64.57
0.6
0.53
Observe that the F-ratio and p-value confirm the Kruskal-Wallis test of not rejecting the null hypothesis. EXAMPLE 10.17
Perform a Kruskal-Wallis test on the data, followed by an ANOVA on the ranks. The ranks are shown in parentheses. A B C
6 (1) 18 (15.5) 9 (6)
9 (6) 19 (17) 8 (4)
7 (2.5) 18 (15.5) 10 (9)
7 (2.5) 20 (18) 10 (9)
9 (6) 17 (14) 14 (13)
10 (9) 13 (12) 11 (11)
Solution (ex10.17) assigns variables A, B, and C to their respective data and assigns variable k-list to (list A B C). (kw-ranks k-list) returned the respective ranks (1 6 2.5 2.5 6 9) (15.5 17 15.5 18 14 12) (6 4 9 9 13 11). (KW-s2 k-list) Æ 28.206. S2 =
r nj N ( N + 1)2 ˘ 1 È 18(18 + 1)2 ˘ È 2 1982.83 ˙˚ = ÍÎÂ Â rij Í ˙˚ = 28.21. N - 1 i =1 j =1 4 18 - 1 Î 4
1
(KW-test k-list) returns KW = 12.704, p-value = 10743e-3, SŸ2 = 28.206. KW =
1 È r R 2i N ( N + 1)2 ˘ 1 È729 + 8464 + 2704 18(18 - 1)2 ˘ = Â Í ˙ ˙˚ ˚ 28.21 ÍÎ S 2 Î i =1 ni 4 6 4 = 12.70 with p-value 0.0017.
The command (anova (KW-ranks k-list)) printed
ANOVA ON THE RANKS Source
SS
df
MS
F
p-value
SSBetween SSWithin SSTotal
358.33 121.16 479.50
2 15 17
179.16 8.07
22.18
0.000
P369463-Ch010.qxd 9/2/05 3:00 PM Page 634
Chapter 10 Nonparametric Statistics
634
Friedman Test The Friedman test is a distribution-free nonparametric ANOVA for a randomized block design when comparing k treatments for significant difference among the means. The observations are ranked within each block and summed within each treatment. The test statistic is given by Fr =
r
12
ÂR bk( k + 1)
2 i
- 3b( k + 1),
i =1
where b is the number of blocks and Fr has an approximate chi-square distribution with k - 1 degrees of freedom. First rank the measurements within each block and sum the ranks for each of the k treatments. EXAMPLE 10.18
Perform a nonparametric ANOVA for the randomized block experiment with four treatment effects (A, B, C, and D) and ten blocking levels (1 to 10) at a = 5%.
A Treatments B C D
Solution
1 20 42 64 30
2 19 21 48 16
3 19 7 13 21
4 23 90 30 75
Blocks 5 6 7 19 20 23 84 32 20 70 9 34 18 36 67
8 21 2 70 43
9 26 10 36 92
10 21 70 70 41
(ex10.18) assigns the data to A, B, C, and D and k-list is assigned to (list A B C D).
In testing for homogeneity of variance, use the command (Bartlett k-list) for the Bartlett test to get a B-value of 34.05 with a p-value ª 0. The low pvalue implies rejection of the null hypothesis homogeneity of variance. Thus, use the Friedman nonparametric test.
Block Ranks 1 1 3 4 2
2 3 4 5 6 7 8 9 10 2 3 1 2 2 2 2 2 1 3 1 4 4 3 1 1 1 3.5 4 2 2 3 1 3 4 3 3.5 1 4 3 1 4 4 3 4 2
Ri 18 24.5 29.5 28 S
R2i 324 600.25 870.25 784 2578.5
For example, in ranking Block 1, (rank '(20 42 64 30)) returns (1 3 4 2); in ranking Block 7, (rank '(23 20 34 67)) returns (2 1 3 4).
P369463-Ch010.qxd 9/2/05 3:00 PM Page 635
10.9 Nonparametric Tests for ANOVA
F x2 =
12
635
k
ÂR bk( k + 1)
2 i
- 3b( k + 1) =
i =1
12 10 * 4 * 5
* 2578.5 - 30 * 5 = 4.71
with p-value = 0.193. We cannot reject the hypothesis of no significant differences among the treatments.
The template (Friedman-chi-sq-test data) returns an approximate chi-square statistic and p-value for testing a randomized block design where data is a list of the treatment measurements. The variable k-list is assigned to the data with command (ex10.18). (Friedman-chi-sq-test k-list) returns Fr = 4.71, p-value = 0.194.
An improved Friedman test statistic is based on the F distribution and is given by k
ÂR
2 i
i =1
-
bk( k + 1)2 4
b
FF = ( b - 1)
k
k
b
,
  R - ÊË Â R 2i ˆ¯ /b 2 ij
i =1 j =1
i =1
with k - 1 and (k - 1) * (b - 1) degrees of freedom. Using this statistic, we have SRi2 = 2578.5, SSRij2 = 299.5, and FF = 1.7 with p-value = 0.1876.
The command (Friedman-F-test data) returns the Friedman F-test statistic and p-value. k
ÂR
2 i
i =1
FF = ( b - 1)
-
bk( k + 1)2 4
b k
k
b
ÂÂ R i =1 j =1
2 ij
-
Ê ˆ R 2 /b ËÂ i ¯ i =1
If the results show significant difference among the treatment means, then a multiple comparison test can be made, similar to the parametric tests. The critical difference is given by
P369463-Ch010.qxd 9/2/05 3:00 PM Page 636
636
Chapter 10 Nonparametric Statistics k b 1 b Ê ˆ 2b   R 2ij -  R 2i Ë b i =1 ¯ i =1 j =1
ta/2
( k - 1)(b - 1)
.
The degree of freedom for the critical t-value is (k - 1) * (b - 1).
EXAMPLE 10.19
Perform a nonparametric ANOVA for the randomized block experiment with four treatment effects (A, B, C, and D) and 5 blocking levels at a = 5%. Solution (ex10.19) assigns the data to the variable F-data.
1 A Treatments B C D
2
Blocks 3 4
5
10 19 19 23 19 42 21 7 20 34 64 48 53 60 70 18 26 21 25 18
Block Ranks 1 2 3 4 5 Ri Ri2/b 1 3 4 2
1 2 4 3
2 1 4 3
2 1 4 3
2 8 64 3 10 100 4 20 400 1 12 144 S 708/5
(Friedman-F-test F-data) returns F = 7.90 with p-value = 0.0036, and we reject the hypothesis of equal treatment means. The critical difference is computed as
tn -1,a /2
k b 1 b Ê ˆ 2b   R 2ij -  R 2i Ë ¯ b i =1 j =1 i =1
( k - 1)(b - 1)
= 2.179 *
2 * 5 * (150 - 141.6) 3*4
= 5.77,
where tn-1,a/2 = 2.1789. The rank totals are 8, 10, 20, and 12 for A, B, C, and D, respectively.
P369463-Ch010.qxd 9/2/05 3:00 PM Page 637
10.10 Runs Test
637
The six comparisons are |8 - 10| = 2 fi no difference. |8 - 20| = 12 fi significant. |8 - 12| = 4 fi no difference. |10 - 20| = 10 fi significant. |10 - 12| = 2 fi no difference. |20 - 12| = 8 fi significant. Thus C is significantly different from the other three treatments.
10.10 Runs Test In checking the number of yes/no, up/down, +/-, and true/false data, the actual number of each may not be as discerning as the occurrence of patterns called runs. A run is a repeated subsequence of an identical symbol, preceded and followed by a different symbol corresponding to the measurement of data using two distinct symbols. For example, consider the following outcomes from tossing a coin 20 times: T H H T T H H T H H H T T T T T H H T T. We see that there are 11 T’s and 9 H’s and 9 runs of lengths 1, 2, 2, 2, 1, 3, 5, 2, and 2 (underlined). There are 5 runs of T and 4 runs of H. The difference in the number of runs for the two symbols can only be -1, 0, or +1, depending on the first and last positions. To simply illustrate the number of T-runs minus H-runs, T H H T produces 2 - 1 = +1, T H T H produces 2 - 2 = 0, and H T T H produces 1 - 2 = -1. Can we test to determine if the outcomes were generated randomly? Imagine the fishy data if the run order were H H H H H H H H H H T T T T T T T T T T with 2 runs. A runs test can test the hypothesis that the order is random and reject the hypothesis if there are too many runs or an insufficient number of runs. Tables and software routines are available to test the probability of the runs occurring. In any n-sequence of two symbols, there will be n1 of one symbol and n2 of the other symbol, with n1 + n2 = n. The number of equally likely ways that the n-sequence can be ordered is given by Ê n ˆ = Ê n ˆ = n! . Ë n ¯ Ë n ¯ n !n ! 1 2 1 2 Let RV R be the number of runs in the sequences. First suppose that R is even. Recall that the number of distinguishable ways to distribute n objects n - 1ˆ . into r distinguishable cells with no cell empty (two symbols) is ÊË r - 1¯
P369463-Ch010.qxd 9/2/05 3:00 PM Page 638
638
Chapter 10 Nonparametric Statistics
Picture n black dots in a row ••••••••. . . . To make r cells, r - 1 white dots can be placed between any two black dots, but not at the beginning or end (no empty cell). Notice that with r as the total number of runs, Ê n1 - 1ˆ Ê n1 - 1ˆ ÁÁ r - 2 ˜˜ = Ë r1 - 1¯ Ë ¯ 2 n -1 r -2 where r1 is the number of H-runs. There are Ê 1 ˆ ways to form the ÁÁ r - 2 ˜˜ 2 Ë ¯ 2 n -1 runs from the n1 - 1 H-symbols. Similarly, there are Ê 2 ˆ ways to form ÁÁ r - 2 ˜˜ Ë ¯ 2 r -2 the runs from the n2 - 1 T-symbols. Since the starting symbols are 2 interchangeable, the number of arrangements is doubled. For an even number of runs there are Ê n1 - 1ˆ Ê n2 - 1ˆ 2Á r Á - 1˜˜ ÁÁ r - 1 ˜˜ Ë ¯Ë ¯ 2 2 arrangements. If the number of runs is odd, r = r1 + r2 where r1 = r2 ± 1. The discrete density function for P(R = r) is given by Ê n1 - 1ˆ Ê n2 - 1ˆ 2Á r Á - 1˜˜ ÁÁ r - 1 ˜˜ ¯ ¯Ë Ë 2 2 P( R = r ) = , r even, Ê nˆ Ën ¯ 1 1 n Ê 1 ˆ Ê n2 - 1ˆ Ê n1 - 1ˆ Ê n2 - 1ˆ ÁÁ r - 1 ˜˜ ÁÁ r - 3 ˜˜ + ÁÁ r - 3 ˜˜ ÁÁ r - 1 ˜˜ ¯ ¯Ë ¯ Ë ¯Ë Ë 2 2 2 2 = , r odd. Ê nˆ Ën ¯ 1 Notice that when R is odd, the numbers of H- and T-runs are
r -1 2
or vice versa.
and
r +1 2
P369463-Ch010.qxd 9/2/05 3:00 PM Page 639
10.10 Runs Test
639
Consider the run patterns of n (7) symbols of H and T with n1 = 2H and n2 = 5T. The 7C2 = 7C5 = 21 run patterns are shown below with the respective number of runs. The number of runs is bound between 2 and 2n1 if n1 = n2 or bound between 2 and 2n1 + 1 if n1 < n2. (H H T T T T T) (H T H T T T T) (H T T H T T T) (H T T T H T T) (H T T T T H T) 2 4 4 4 4 (H T T T T T H) (T H H T T T T) (T H T H T T T) (T H T T H T T) (T H T T T H T) 3 3 5 5 5 (T H T T T T H) (T T H H T T T) (T T H T H T T) (T T H T T H T) (T T H T T T H) 4 3 5 5 2 (T T T H H T T) (T T T H T H T) (T T T H T T H) (T T T T H H T) (T T T T H T H) 3 5 4 3 4 (T T T T T H H) 2 Ê n1 - 1ˆ Ê n2 - 1ˆ Ê n1 - 1ˆ Ê n2 - 1ˆ ÁÁ r - 1 ˜˜ ÁÁ r - 3 ˜˜ + ÁÁ r - 3 ˜˜ ÁÁ r - 1 ˜˜ ¯ ¯Ë ¯ Ë ¯Ë Ë 2 2 2 2 P ( R = 3) = Ê nˆ Ën ¯ 1 Ê 2 - 1ˆ Ê 4 - 1 ˆ Ê 2 - 1 ˆ Ê 4 - 1ˆ Á 3 - 1˜ Á 3 - 3 ˜ + Á 3 - 3 ˜ Á 3 - 1˜ Ë ¯Ë ¯ Ë ¯ ¯Ë 5 2 2 2 2 = = , 21 Ê 2 + 5ˆ Ë 2 ¯ and we note that there are 5 of 21 patterns depicting runs of 3. Ê n1 - 1ˆ Ê n2 - 1ˆ Ê 2 - 1ˆ Ê 5 - 1ˆ 2Á r 2 ˜ Á ˜ r Á 4 - 2˜ Á 4 - 2˜ Á - 1˜ Á - 1 ˜ Ë ¯Ë Ë ¯Ë ¯ ¯ 8 2 2 2 2 P ( R = 4) = , = = 21 Ê nˆ Ê 7ˆ Ën ¯ Ë 2¯ 1 and we note that there are 8 of 21 patterns depicting 4 runs. With RV R being the number of runs, a discrete density can be created. R 2 P(R) 2/21
3 5/21
4 8/21
5 6/21
To test H0: R £ 2 vs. H1: R ≥ 2, we see that P(R £ 2) = 2/21 = 0.095, a onetail p-value.
P369463-Ch010.qxd 9/2/05 3:00 PM Page 640
640
Chapter 10 Nonparametric Statistics
The command (run-patterns g-pattern symbol-1 symbol-2) returns the patterns and associated runs for each pattern. For example, (runpatterns '(H H H T T) 'H 'T) prints (H H H T T) (H H T H T) (H H T T H) (H T H H T) (H T H T H) 2 4 3 4 5 (H T T H H) (T H H H T) (T H H T H) (T H T H H) (T T H H H) 3 3 4 4 2 Naturally, with permutations involved, the g-patterns must be small (£9 symbols) for display. (run-density-table g-pattern symbol-1 symbol-2) creates the discrete density table. (run-density- table '(H H H T T) 'H 'T) prints
R P(R = r)
2 0.2
Discrete Density Table 3 4 5 . 0.3 0.4 0.1
(cum-run-density- table '(H H H T T) 'H 'T) prints Cumulative Discrete Distribution Table R 2 3 4 5. P(R < r) 0.2 0.5 0.9 1
(run-density n1 n2 r) returns the P(R = r). For example, (run-density 3 2 4) Æ 0.4. (cum-run-density n1 n2 r) returns P(R £ r). (cum-run-density 3 2 4) Æ 0.9.
When n1 and n2 are large (>10), R can be approximated by the normal distribution with E( R ) =
EXAMPLE 10.20
2n1 n2 n1 + n2
= 1 and V ( R ) =
2n1 n2 (2n1 n2 - n1 - n2 ) ( n1 + n2 )2 ( n1 + n2 - 1)
.
Students were asked to flip a coin 100 times and to record the outcomes for a homework assignment. One student turned in the following pattern. Determine statistically if the pattern was written without flipping or actually resulted from flipping a fair coin 100 times. Solution The variable rdata is assigned to the data by the command (ex10.20).
P369463-Ch010.qxd 9/2/05 3:00 PM Page 641
10.11 Randomization Tests
641
(setf rdata '(H H H H T T H H H H H H T T T T H H H H H H T H H H TTTHHHHHTTHHHHHHTTTHHHTHHH HHHTTTTHHHHHHHTTHHHHTTHHHT H H H H T T T H H H H H H H H H H H H T T H))
The command (n-runs data symbol-1 symbol-2) returns the number of runs in the data. For example, (n-runs rdata 'T 'H) returns 27.
There are 68 H’s, 32 T’s, and 27 runs. (cum-run-density 68 32 27) returns P(R £ 27) = 0.00005. The sequence is rejected as randomly generated. With use of large sample theory, (R-normal rdata 'H 'T) returned the normal probability of 0.9999678.
EXAMPLE 10.21
Dichotomize the following 20 data points about the mean and test to determine if the 0 -1 runs appear random: 6 3 8 5 6 3 3 1 4 6 11 6 4 5 4 7 7 6 4 6 Solution (mu '(6 3 8 5 6 3 3 1 4 6 11 6 4 5 4 7 7 6 4 6)) returns 5.25. (mu-runs '(6 3 8 5 6 3 3 1 4 6 11 6 4 5 4 7 7 6 4 6)) converts the points to 0 or 1, returning (1 0 1 0 1 0 0 0 0 1 1 1 0 0 0 1 1 1 0 1). (n-runs '(1 0 1 0 1 0 0 0 0 1 1 1 0 0 0 1 1 1 0 1) 0 1) returns 11 runs. As the mean is 5.25, the data points are converted to 0 for those below the mean and 1 for those above the mean. Those equal to the mean are discarded, yielding the following: 1 0 1 0 1 0 0 0 0 1 1 1 0 0 0 1 1 1 0 1. (Run-probability '(1 0 1 0 1 0 0 0 0 1 1 1 0 0 0 1 1 1 0 1) 0 1) returns P(R £ 11) = 1.
10.11
Randomization Tests The distribution-free tests are appropriate when assumptions of sampling from normal distributions and homogeneity of variance are not appropriate,
P369463-Ch010.qxd 9/2/05 3:00 PM Page 642
642
Chapter 10 Nonparametric Statistics
but the random sample assumption still prevails. Many samples are difficult to choose randomly and consume resources to do so. With available data that are not randomly selected, permutations can get relief from this assumption. A small example will illustrate the procedure of randomization tests, also called permutation tests. EXAMPLE 10.22
Assume homogeneity of variance and normal populations, but not randomly selected subjects, and use the pooled-t test to produce a p-value for testing whether there is a difference between the two processes, group A: 22 12 15 and group B: 10 8. The number of possible assignments of the 5 subjects chosen 3 at a time gives 10 possible combinations for Treatment A: (22 12 15)(22 12 10)(22 15 10)(12 15 10)(22 12 8)(22 15 8)(12 15 8)(22 10 8)(12 10 8)(15 10 8). The complementary 10 combinations for Treatment B are (10 8)(15 8)(12 8)(22 8)(15 10)(12 10)(22 10)(12 15)(22 15)(22 12). Consider the first t-test for A1 = (22 12 15) versus B1 = (10 8):

x̄A = 16.333, x̄B = 9, s²A = 26.333, s²B = 2, s²Pooled = 18.222,

t = (16.333 - 9) / √[18.222(1/3 + 1/2)] = 1.882.
The same t-test is performed on each of the 10 combinations; the t-value for the original data set computes as 1.882. Taking absolute values of the t-values for a two-tailed test, 2 of the 10 are as large as 1.882, and thus 2/10 = 0.2 is the two-tailed p-value. See Table 10.3, where each row pairs a Process A group with its complementary Process B group and the resulting t-value.

Table 10.3 Randomization Tests

Process A      Process B    t-value
(22 12 15)     (10 8)        1.882
(22 12 10)     (15 8)        0.580
(22 15 10)     (12 8)        1.197
(12 15 10)     (22 8)       -0.481
(22 12 8)      (15 10)       0.264
(22 15 8)      (12 10)       0.759
(12 15 8)      (22 10)      -0.836
(22 10 8)      (12 15)      -0.029
(12 10 8)      (22 15)      -2.829
(15 10 8)      (22 12)      -1.306

The command (pooled-t data r) automates the procedure: it prints the nCr test groups (Test Group 1 and Test Group 2) with the t-value computed for each, and returns the p-value. For example, (pooled-t '(22 12 15 10 8) 3) prints the group pairs and t-values of Table 10.3 and returns

(p-val = 0.2 t-values = (1.881 0.580 1.197 -0.480 0.263 0.758 -0.836 -0.028 -2.828 -1.305))
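The entire randomization procedure is also easy to sketch from scratch. The following Lisp is ours, not the book's package; mean, pooled-t-stat, combinations, and permutation-p-value are illustrative names:

    (defun mean (xs) (/ (reduce #'+ xs) (length xs)))

    (defun pooled-t-stat (a b)
      "Pooled-variance t statistic for groups A and B."
      (let* ((na (length a)) (nb (length b))
             (xa (mean a)) (xb (mean b))
             (ssa (reduce #'+ (mapcar (lambda (x) (expt (- x xa) 2)) a)))
             (ssb (reduce #'+ (mapcar (lambda (x) (expt (- x xb) 2)) b)))
             (s2 (/ (+ ssa ssb) (+ na nb -2))))   ; pooled variance
        (/ (- xa xb)
           (sqrt (* (float s2) (+ (/ 1.0 na) (/ 1.0 nb)))))))

    (defun combinations (k xs)
      "All k-element subsets of the list XS."
      (cond ((zerop k) (list '()))
            ((null xs) '())
            (t (append (mapcar (lambda (c) (cons (first xs) c))
                               (combinations (1- k) (rest xs)))
                       (combinations k (rest xs))))))

    (defun permutation-p-value (a b)
      "Two-tailed p-value: fraction of relabelings with |t| at least the
    observed |t|. Assumes all response values are distinct, so
    SET-DIFFERENCE recovers the complementary group."
      (let* ((data (append a b))
             (obs (abs (pooled-t-stat a b)))
             (groups (combinations (length a) data)))
        (/ (count-if (lambda (g)
                       (>= (abs (pooled-t-stat g (set-difference data g))) obs))
                     groups)
           (length groups))))

    ;; (permutation-p-value '(22 12 15) '(10 8)) => 1/5, the 0.2 found above.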
EXAMPLE 10.23

Suppose 5 subjects are randomly assigned to two treatments. Treatment A produced responses 22, 12, and 15, and Treatment B produced responses 10
and 8. Test for a difference between the two groups without any assumptions of normality, homogeneity of variance, or randomness.

Solution: Use ranks instead. For Treatment A, the possible rank combinations are given by (combination-list (rank '(22 12 15 10 8)) 3), returning

(5 3 4)(5 3 2)(5 4 2)(3 4 2)(5 3 1)(5 4 1)(3 4 1)(5 2 1)(3 2 1)(4 2 1)

with sums

12 10 11 9 9 10 8 8 6 7.

The sum of the ranks actually assigned to Treatment A is 12, versus 3 for Treatment B. Can this be due to chance alone? The probability due to chance can be computed. Regarding the sum of the ranks as the RV X, its density is
X       6    7    8    9    10   11   12
P(X)    0.1  0.1  0.2  0.2  0.2  0.1  0.1
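This density can be generated by direct enumeration. A small sketch (again under our own names), reusing the combinations helper from the previous sketch:

    (defun rank-sum-density (ranks k)
      ;; Tally the sum of every k-subset of RANKS; return (sum . probability).
      (let* ((subsets (combinations k ranks))
             (n (length subsets))
             (table (make-hash-table)))
        (dolist (s subsets)
          (incf (gethash (reduce #'+ s) table 0)))
        (sort (loop for total being the hash-keys of table
                      using (hash-value cnt)
                    collect (cons total (/ cnt n)))
              #'< :key #'car)))

    ;; (rank-sum-density '(1 2 3 4 5) 3)
    ;; => ((6 . 1/10) (7 . 1/10) (8 . 1/5) (9 . 1/5) (10 . 1/5) (11 . 1/10) (12 . 1/10))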
P(22 12 15) ⇒ P(ranks = (5 3 4)) = P(X = 12) = 0.1.

Randomization tests are also appropriate for ANOVAs when the assumption of independent samples cannot be made. Rather than using all permutations, which can be a very large number, a random sample of the permutations is used. The F-ratio serves as the test statistic or, equivalently, the sum over levels of the squared level total divided by its sample size. The next example illustrates the procedure.
EXAMPLE 10.24
Suppose 12 subjects are not randomly selected but randomly assigned to three groups of sizes 3, 4, and 5, with the following responses.
A 20 45 25
B 30 35 40 35
C 40 50 30 50 45
Compute the F-ratio for the current assignment, which is the test statistic.

Solution: (setf data-0 '((20 45 25) (30 35 40 35) (40 50 30 50 45))). (F-anova data-0) returns 2.269. Generate the desired number of random permutation assignments and compute an F-ratio for each. For example, (gen-r-perms data-0 3) generated the following 3 permutation assignments.
(setf rn-data '(((30 40 50) (20 40 45 25) (30 35 35 50 45))
                ((35 30 30) (50 40 50 45) (20 40 25 45 35))
                ((40 45 45) (25 50 35 30) (40 35 50 20 30))))

(repeat #'F-anova rn-data) returns the 3 F-ratios (0.6432, 4.4309, 0.8113).
We can determine how many F-ratios were as large as or larger than 2.269 and divide by the number of permutations used to get an approximate p-value.

An equivalent test statistic to the ANOVA F-ratio is Σ Ti²/ni, the sum of the squared treatment totals each divided by its sample size, since the degrees of freedom of the F-ratio are constant and the total variation equals the between variation plus the within variation. The sum of the totals squared for the initial assignment is

(20 + 45 + 25)²/3 + (30 + 35 + 40 + 35)²/4 + (40 + 50 + 30 + 50 + 45)²/5 = 16,845.
The sum is computed for all the other random permutation assignments to determine how many are as large or larger than 16,845.
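A sketch of this shortcut in our own notation (gen-p-stats below is the book's command; the functions here are merely illustrative):

    (defun t2-over-n (groups)
      ;; Sum over groups of (group total)^2 / group size. Because total
      ;; variation is fixed under relabeling, ordering assignments by this
      ;; sum is equivalent to ordering them by the F-ratio.
      (reduce #'+ (mapcar (lambda (g) (/ (expt (reduce #'+ g) 2) (length g)))
                          groups)))

    (defun shuffle (list)
      "Fisher-Yates shuffle of a fresh copy of LIST."
      (let ((v (coerce list 'vector)))
        (loop for i from (1- (length v)) downto 1
              do (rotatef (aref v i) (aref v (random (1+ i)))))
        (coerce v 'list)))

    (defun regroup (flat sizes)
      "Split FLAT into consecutive groups of the given SIZES."
      (loop for n in sizes
            collect (subseq flat 0 n)
            do (setf flat (nthcdr n flat))))

    (defun approx-p-value (groups nperm)
      "Fraction of NPERM random reassignments whose statistic is at least
    the observed one."
      (let ((flat (apply #'append groups))
            (sizes (mapcar #'length groups))
            (observed (t2-over-n groups)))
        (/ (loop repeat nperm
                 count (>= (t2-over-n (regroup (shuffle flat) sizes)) observed))
           nperm)))

    ;; (t2-over-n '((20 45 25) (30 35 40 35) (40 50 30 50 45))) => 16845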
The command (gen-p-stats data nperm) returns two identical p-values: one using the sum of the squared group totals, each divided by its sample size, for every permutation assignment, and the other computing and comparing the F-ratios. The number of permutations used is nperm. For example, (gen-p-stats '((20 45 25) (30 35 40 35) (40 50 30 50 45)) 5) printed
Random Permutations                              T²/ni        F-ratio
((35 40 30) (25 35 30 40) (50 45 45 50 20))    16720.000     1.218
((30 50 35) (45 40 50 40) (25 30 35 20 45))    16869.584     2.523
((35 30 35) (20 45 45 40) (30 50 40 25 50))    16563.334     0.287
((50 45 25) (50 20 30 35) (35 40 45 30 40))    16576.250     0.352
((35 35 30) (40 45 40 50) (30 45 25 50 20))    16769.584     1.593
and returned (T-p-value = 1/5 F-p-value = 1/5). The command (repeat #'SSt rn-data) returns (1022.92 1022.92 1022.92 1022.92 1022.92), as the total variation for each permutation assignment remains the same. (gen-p-stats '((20 45 25) (30 35 40 35) (40 50 30 50 45)) 1000) returned (T-p-value = 147/1000 F-p-value = 147/1000).
10.12 Summary

Nonparametric tests need not assume homogeneity of variances nor that samples come from normal distributions; for this reason they are often called distribution-free tests. Further, the data need not be numerical: nonparametric tests can readily use categorical data. Recall the sign tests.

EXAMPLE 10.25
Given the data below, complete the following problems.

TEQ:    100  90  91 107  91  65  91 106 125 117 104 127
INTRA:  101  99 117  74  97  97 107  74  96  99 107  90
INTER:  105 100 102  81 117 103  77 100  90  97  75  87
ADAPT:  115 103  92 103  98 113 106  91  90 105  99 109
STRESS: 108  78 105  98 101 102 108  92 102 125  98  98
a) Test the hypothesis that the median of TEQ is 100.

(sign-test TEQ 100 'two-sided) returned
n    Below    Equal    Above    Median    p-value
11     5        1        6        104        1
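The arithmetic behind this output is a binomial tail: with ties discarded, the count of values below the hypothesized median is Binomial(n, 1/2) under H0. A minimal sketch under our own names (choose, sign-test-p-value), not the book's sign-test:

    (defun choose (n k)
      (if (or (zerop k) (= k n))
          1
          (/ (* n (choose (1- n) (1- k))) k)))

    (defun sign-test-p-value (data h0)
      "Two-sided sign-test p-value for H0: median = h0."
      (let* ((below (count-if (lambda (x) (< x h0)) data))
             (above (count-if (lambda (x) (> x h0)) data))
             (n (+ below above))        ; ties with h0 are discarded
             (k (min below above))
             ;; P(X <= k) for X ~ Binomial(n, 1/2)
             (tail (/ (loop for i from 0 to k sum (choose n i))
                      (expt 2 n))))
        (min 1 (* 2 tail))))

    ;; Part (a): 5 below and 6 above give n = 11, k = 5, and
    ;; 2 * P(X <= 5) = 2 * (1024/2048) = 1, the p-value shown above.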
b) Repeat the above test using the Wilcoxon signed rank test.

(Wsign-test TEQ 100) returned (W+ = 33, W- = 30, W* = 30). The critical W for n = 12 at α = 5% is 13. As 30 > 13, the null hypothesis cannot be rejected.

c) Use the Wilcoxon-Mann-Whitney statistic to test at α = 5% whether the sample INTRA scores and INTER scores are from populations with the same median.

(WMW INTRA INTER) returned

Rank of first group (17 13.5 23.5 1.5 11 11 21.5 1.5 9 13.5 21.5 7.5) with sum 152
Rank of second group (20 15.5 18 5 23.5 19 4 15.5 7.5 11 3 6) with sum 148
(Z = -0.1154 with p-value = 0.9080; W = 148)

d) Use the Spearman rank order correlation coefficient to determine whether ADAPT and STRESS are significantly correlated.

(Spearman-r ADAPT STRESS) returned 0.340, and the null hypothesis of no significant correlation cannot be rejected.

e) Use the Kruskal-Wallis ANOVA to test for differences among INTRA, INTER, ADAPT, and STRESS.

(anova (kw-ranks (list INTRA INTER ADAPT STRESS))) returned
Source        SS        df    MS       F      p-value
SSBetween    688.87      3    229.62   1.19   0.3200
SSWithin    8502.62     44    193.24
SSTotal     9191.50     47
and the null hypothesis of no significant difference cannot be rejected.

f) Test whether the run data on TEQ are random.

100 90 91 107 91 65 91 106 125 117 104 127
(mu-runs '(100 90 91 107 91 65 91 106 125 117 104 127)) dichotomizes the data into zeros and ones for scores below and above the mean as 0 0 0 1 0 0 0 1 1 1 1 1.
(Run-Probability '(0 0 0 1 0 0 0 1 1 1 1 1) 0 1) returns P(R ≤ 4) = 0.069, a one-tailed p-value, or 0.138 for a two-tailed p-value.

Nonparametric statistics can be quite useful. Being essentially distribution-free, the methods have widespread applicability and are simple to perform. While the normality assumption is relatively robust, always check assumptions before performing the analysis. Parametric tests, when applicable, are more powerful than nonparametric tests. Nonparametric tests can also serve to provide confirming evidence in the rejection or acceptance of hypotheses, and they are certainly simple enough to perform.
PROBLEMS

1. a) Test whether the median of a population is 7 at α = 10%, given the following sample: 9 5 2 20 15 4 8 12 19 3 11
ans. p-value = 0.5488.
b) Repeat the test using the normal approximation to the binomial. ans. p-value = 0.3658.

2. Use the Mann-Whitney test at α = 5% to determine whether there is a difference between male and female salaries.

Male:   68 55 62 59 53 63 64 67 53 70
Female: 56 46 64 59 55 50 54 53 52 57
3. Determine whether the median of the population is 4.5 at α = 5% for the following sample: 3 4 5 5 7 5 1 9 1 8 9 5
ans. p-value = 0.3876.
4. Twelve men were given two different brands of razor blades with which to shave for a month. Six shaved with Brand A for half the month and switched to Brand B for the other half; the other six shaved with Brand B first and switched to Brand A. Nine of the men reported that Brand B shaved better, while three thought that Brand A shaved better. Determine whether Brand B is better at α = 5%. The command (cbinomial 3 12 1/2) returns 0.073 > 0.05.

5. Emissions in a steel mill, in parts per million, were recorded during the week as 10 12 13 12 9 7 6 9 10 7 8.
Determine whether the median is greater than 7 at α = 5%. ans. p-value = 0.0195.

The command (sign-test '(10 12 13 12 9 7 6 9 10 7 8) 7 'upper) returns

n    Below    Equal    Above    Median    p-value
9      1        2        8        10      0.0195
6. Determine whether there is a difference between the following two groups at α = 5%. The samples are from a continuous distribution. ans. W = 31.5

X: 25 54 35 47 54 34
Y: 32 47 60 73 54 49
7. A new machine is introduced for making parts, which can be scored as to quality of make. We want to test the new machine's effectiveness. We randomly select 3 parts made by the new machine and 3 parts made by the old machine, taking care in selecting the parts, as quality can vary with time of day (shift changes), heat build-up, and so on. What is our risk of concluding that the new machine is significantly better if all three parts made on the new machine are better than each of the 3 parts made on the old machine? ans. α = 0.05

8. Test the following data to determine whether the median is 20, using the sign test and the normal approximation: 2 8 29 34 15 18 34 23 19 22 28 8 17 27 15 17 18 19 22 30
The command (sign-test data 20 'two-sided) returns

n    Below    Equal    Above    Median    p-value
20     11       0        9        19      0.8238
X = 11 with μ = np = 20(1/2) = 10 and σ² = npq = 20(1/2)(1/2) = 5, so

Z = (9 - 10)/√5 = -0.4472

and the p-value equals 2Φ(-0.4472) = 2(0.3274) = 0.6547.

9. Test the following two measurements from two separate treatments at α = 5%, using the Wilcoxon-Mann-Whitney rank test. ans. W = 51.

X: 25 13 24 12 16 18  9
Y: 15 23 19 14 17 21 11
10. Test the following two measurements from two separate treatments at α = 5%, using the Wilcoxon-Mann-Whitney rank test.

X: 25 23 24 22 26 18 19
Y: 15 23  9 14  7 21 11
11. Test the following two measurements from two separate treatments at α = 5%, using the Wilcoxon-Mann-Whitney rank test and the normal approximation. Note the many ties in the scores. ans. W = 48.5, z = -0.511.

X: 5 3 3 9 5 5 4
Y: 5 3 9 1 7 4 1
12. Use both the WMW test and the normal curve approximation to the WMW rank test to determine whether there is a significant difference in means for the two sets of scores at α = 5%.

A-Test Scores: 12 17 18 16 19 15 22 30 40
B-Test Scores: 13 23 14 28 29 11 25 26 36
13. For the following two samples compute the sample means and variances. Determine whether the assumption of homogeneity of variances is warranted for using the t-test. Test for a significant difference between the two samples, using the WMW test at α = 5%.

Sample X: 20 29 18 25 27 12 33 11 16
Sample Y:  3  5  2  1  4  7  4  6  2
14. Determine whether there is a difference in the ordinal pretest and post-test scores at α = 0.05.

Student:          1   2   3   4   5   6   7   8   9  10
Pretest Score:   10   9  11   7  13  15   8   7  12  10
Post-Test Score: 12   8  14   8  12  10   8   8  11  13
15. a) Find the Spearman correlation for x and y scores. b) Show that the Pearson correlation between the ranks is equal to the Spearman correlation. ans. -0.7.
X: 12 16 18  9 17
Y: 23 11 12 34 15
16. Find the Spearman correlation for the pretest and post-test scores in Problem 14. Compute the Pearson correlation and show that rP < rS.

17. Perform a Kruskal-Wallis test on the following data. ans. h = 3.02.

Method 1: 25 35 45 20 10    Ranks: 5 10 13.5 2.5 1
Method 2: 30 25 30 30 25    Ranks: 8 5 8 8 5
Method 3: 40 20 40 50 45    Ranks: 11.5 2.5 11.5 15 13.5

18. In a random block design, the time in seconds to perform four similar assembly operations is shown blocked on 4 subjects. Use the Friedman F-test to determine whether there is a significant difference among the assemblies.

             Blocks
Treatment   1   2   3   4
A1         12  19  16  19
A2         12  20  18  20
A3         14  22  20  26
A4         15  24  21  25
19. Check the Pearson and Spearman correlation coefficients for the following data pairs. Then add the pair (50, 50) to the set and repeat both the Pearson and Spearman correlations.

X:  6  3  0  9
Y: 12  0  8  4
(Pearson-r '(6 3 0 9) '(12 0 8 4)) returns 0; (Spearman-r '(6 3 0 9) '(12 0 8 4)) returns 0. (Pearson-r '(6 3 0 9 50) '(12 0 8 4 50)) returns 0.96215; (Spearman-r '(6 3 0 9 50) '(12 0 8 4 50)) returns 0.5.

20. Suppose nonrandom Process A produced values 22, 12, 9 and nonrandom Process B produced values 15, 20, 23, 8. Perform a randomization test to see whether there is a difference between the two processes.

21. Six company officials ranked five investment projects as shown. Are the rankings random?
A
B
C
D
E
O-1 O-2 O-3 O-4 O-5 O-6
3 3 2 1 2 3
1 5 4 2 3 1
5 1 3 5 4 4
4 4 1 4 5 5
2 2 5 3 1 2
22. Perform an ANOVA randomization test on the simplified data for Treatments A, B, and C under the assumption that the samples are not randomly selected but are randomly assigned to the treatments.

A  B  C
5  4  7
4  7  5
8  6  9
1) Assign the data: (setf data '((5 4 8) (4 7 6) (7 5 9))).

2) Secure the F-ratio for this arrangement: (F-anova data) returned 0.5.

3) Randomly generate 25 permutation groupings. (setf rn-data (gen-n-perms data 25)) returned

((6 5 7 5 8 9 4 4 7) (8 4 5 7 9 6 7 5 4) (4 7 4 8 5 9 5 6 7) (7 5 6 4 5 8 7 9 4) (5 4 4 7 9 5 8 6 7)
 (4 4 5 7 6 8 9 7 5) (4 7 9 7 5 8 5 6 4) (4 7 9 6 5 8 5 7 4) (6 9 8 4 7 5 5 7 4) (7 5 4 9 5 6 4 7 8)
 (5 4 8 6 7 7 9 4 5) (6 5 8 7 4 9 5 4 7) (4 8 9 4 7 5 6 7 5) (9 8 4 6 7 4 5 5 7) (6 9 5 4 7 4 7 8 5)
 (5 4 6 5 9 7 7 4 8) (5 7 8 4 9 7 4 5 6) (8 6 4 4 9 5 5 7 7) (7 4 9 7 5 5 6 4 8) (4 9 7 5 8 4 7 6 5)
 (7 8 5 5 4 9 6 7 4) (5 4 5 8 6 9 7 7 4) (9 6 8 4 5 7 7 5 4) (7 9 5 4 7 8 5 4 6) (7 4 6 9 4 5 7 5 8))

Note: Generating all distinguishable permutations would give 9!/(2! 2! 2!) = 45,360 of them.
4) Regroup into three treatments. (setf rassigns (repeat # ' re-group rn-data (list-of 25 '(3 3 3)))) returned (((6 ((4 ((9 ((7 ((7 ((9 ((8 ((4 ((4
5 9 5 5 5 6 6 8 7
7) 7) 6) 4) 6) 8) 4) 9) 9)
(5 (5 (4 (9 (4 (5 (4 (4 (7
8 8 7 5 5 4 9 7 5
9) 4) 4) 6) 8) 7) 5) 5) 8)
(4 (7 (7 (4 (7 (7 (5 (6 (5
4 6 8 7 9 5 7 7 6
7)) ((4 5)) ((8 5)) ((7 8)) ((5 4)) ((5 4)) ((5 7)) ((7 5)) ((7 4))).
7 4 8 4 4 4 9 4
9) 5) 5) 6) 8) 4) 5) 9)
(6 (7 (5 (5 (6 (7 (4 (7
5 9 4 9 7 9 7 5
8) 6) 9) 7) 7) 5) 8) 5)
(5 (7 (6 (7 (9 (8 (5 (6
7 5 7 4 4 6 4 4
4)) ((9 8 4) (6 7 4) (5 5 4)) ((6 9 8) (4 7 5) (5 7 4)) ((4 7 4) (8 5 9) (5 6 8)) ((5 4 5) (8 6 9) (7 7 5)) ((5 7 8) (4 9 7) (4 5 7)) ((6 5 8) (7 4 9) (5 4 6)) ((4 4 5) (7 6 8) (9 7 8)) ((7 4 6) (9 4 5) (7 5
7)) 4)) 7)) 4)) 6)) 7)) 5)) 8))
5) Compute the F-ratio for each regrouped assignment. (repeat #'F-anova rassigns) returns

(1.48 0.394 0.5 0.2 1.148 2.333 0.862 0.2 1.48 0.394 1.0 3.588 0.2 0.2 0.862 2.333 4.0 0.393 0.027 1.0 4.0 0.612 0.2 0.2 0.862)

6) Count the number as large as or larger than the observed 0.5: there are 15, implying a p-value estimate of 15/25 = 0.6.
7) The steps are combined in the command (gen-p-stats data 1000), which returned (T-p-value = 323/500, F-p-value = 323/500 = 0.646).

23. The drying times in minutes for two different paint brands are shown below. Test whether there is a difference in their drying performance.

Paint A: 8 5 7 6 10 20 12 14 12 18 14 7 12
Paint B: 10 9 9 10 13 10 6 9 9 6

(WMW-normal '(8 5 7 6 10 20 12 14 12 18 14 7 12) '(10 9 9 10 13 10 6 9 9 6)) returns

ranks of first group (7 1 5.5 3 13.5 23 17 20.5 17 22 20.5 5.5 17) with sum 172.5
ranks of second group (13.5 9.5 9.5 13.5 19 13.5 3 9.5 9.5 3) with sum 103.5
(Z = -1.0232 with p-value = 0.3061; W = 103.5)

24. Consider the pattern '(H T H T H T H) of two symbols with 7 runs. Is this a random pattern? Assuming equally likely patterns, how many patterns can be generated with 7 symbols of which 3 are T's and 4 are H's? How many can have 2 runs?

25. Dichotomize the following 100 data points and test for randomness of the runs.

(setf data '(8 8 22 22 13 15 10 5 12 12 16 10 18 13 15 10 14 12 16 11 20 18 18 16 9 14 11 12 13 10 11 6 15 15 13 7 10 13 15 12 9 7 16 18 15 19 11 11 16 8 18 12 16 17 14 16 10 11 9 15 10 12 8 13 14 14 9 17 13 12 15 12 9 10 13 14 17 15 18 11 12 12 16 9 17 15 12 11 8 10 19 13 12 10 15 12 9 22 13 11))
(setf pdata (mu-runs data)) returns 0 if the data point is below the mean and 1 if above. (0 0 1 1 0 1 0 0 0 0 1 0 1 0 1 0 1 0 1 0 1 1 1 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 1 1 1 1 0 0 1 0 1 0 1 1 1 1 0 0 0 1 0 0 0 0 1 1 0 1 0 0 1 0 0 0 0 1 1 1 1 0 0 0 1 0 1 1 0 0 0 0 1 0 0 0 1 0 0 1 0 0)
(prob-runs pdata 0 1) returns 51 runs with P(R ≤ 51) = 0.08019.

26. In a run arrangement of two symbols of length n, with n1 of one symbol and n2 of the other, the number of runs is determined by the beginning symbol of each run. Let R = I1 + I2 + . . . + In, where each Ik is an indicator RV equal to 1 when the kth symbol starts a run. I1 obviously equals 1. Find the expected value of R by finding p = P(Ik = 1).
27. Take a random sample of size 51 from the exponential with parameter k = 1/20. Find both the parametric and nonparametric 99% confidence intervals for the standard deviation. ans. parametric (14.700 ≤ σ ≤ 24.772), nonparametric (9.177 ≤ σ ≤ 26.212)

28. Take a random sample of size 50 from N(50, 16) and compute the 95% confidence interval for the standard deviation, using both the parametric and nonparametric procedures.

29. a) Use the Thiel method of regression to fit a line to the following data. b) Compute the covariance as an indicator of a monotone property. c) Find a 95% confidence interval for the median slope b, using the Kendall τ.
x: 33    58    108   158   163   195   208   245   295   298   308   330
y: 16.13 15.51 16.18 16.7  17.22 17.28 17.22 17.48 17.38 17.38 17.64
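For part (a), the Thiel slope is the median of the slopes over all pairs of points. A sketch under assumed names (pairwise-slopes, median, theil-slope; the book's command is thiel-b), assuming the x values are distinct:

    (defun pairwise-slopes (xs ys)
      ;; Slope of the segment joining each pair (xi yi), (xj yj), i < j.
      (loop for (x1 . xrest) on xs
            for (y1 . yrest) on ys
            append (mapcar (lambda (x2 y2) (/ (- y2 y1) (- x2 x1)))
                           xrest yrest)))

    (defun median (xs)
      (let* ((s (sort (copy-list xs) #'<))
             (n (length s)))
        (if (oddp n)
            (nth (floor n 2) s)
            (/ (+ (nth (1- (floor n 2)) s) (nth (floor n 2) s)) 2))))

    (defun theil-slope (xs ys)
      (median (pairwise-slopes xs ys)))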
30. Find the Pearson and Spearman correlation coefficients and Kendall's τ, using the following 10 data pairs.

X: 1.1 2.9 4.25 8.1 9.12 9.5 9.7 9.9 10.4 10.7
Y: 2.1 2.7 2.9  3.7 4.7  4.3 6.5 7.6 10.3 12.5
SOFTWARE EXERCISES

1. (sign-test data H0) returns a sign-test table of results for data (a list of values) and H0 (the value of the median for the null hypothesis).

2. Recall that (sim-normal m s n) returns n random samples from a normal distribution with mean m and standard deviation s. By combining the sign-test function with the random samples, we can conduct a sign test. Try (sign-test (sim-normal 10 2 15) 8) to determine whether the sample is rejected at α = 0.05. Notice that the median of a normal population equals the mean and so has the value 10, while the null assumption is 8.

3. Try (sign-test (sim-exp 2 12) 0.5) to do a sign test on 12 samples from an exponential with the k-parameter equal to 2. The median for an exponential is given by (ln 2)/k.

4. (wsign-test data H0) returns the sum of the ranks for both the positive and negative magnitudes for the Wilcoxon signed rank test. H0 is the value of the median for the null hypothesis.
5. (WST-normal data H0) returns a normal approximation to the Wilcoxon sign test for sample sizes larger than 20, using the statistic

Z = (W - n(n + 1)/4) / √(n(n + 1)(2n + 1)/24).

6. (wmw x y) returns the minimum sum for testing the Wilcoxon-Mann-Whitney statistic for ranked sums of two groups x and y.

7. Try (sign-test (sim-bin n p m) h) to test binomial samples with varying median null hypothesis values h. The parameter n is the number of binomial trials, p is the probability of success, m is the number of samples, and h is the hypothesized median.

8. The software command (WMW-normal X Y) returns the z and two-tailed p-values for normal approximation testing of the null hypothesis of no difference between the two groups. (WMW-normal '(60 45 81 87 79 75 62 30 40) '(76 65 90 80 89 95 85 69 45)) returns z = -1.766 and p-value = 0.077.

9. (rank data) returns the ordinal ranks of data. For example, (rank '(21 45 53 44 62)) returns (1 3 4 2 5).

10. (Pearson-r x y) returns the Pearson correlation coefficient for data sets x and y.

11. (Spearman-r x y) returns the Spearman rank correlation coefficient for ordinal data sets x and y.

12. Solve Problem 15 in steps.

X: 12 16 18  9 17
Y: 23 11 12 34 15
i) (setf x '(12 16 18 9 17) y '(23 11 12 34 15))  ; assigns x and y to their respective data sets
ii) (rank x) returns (2 3 5 1 4) and (rank y) returns (4 1 2 5 3).
iii) (repeat #'- (rank x) (rank y)) returns (-2 2 3 -4 1)  ; d is the difference in the respective ranks
iv) (repeat #'square '(-2 2 3 -4 1)) returns (4 4 9 16 1).
v) (sum '(4 4 9 16 1)) returns 34, which is Σd². (A sketch of the underlying identity follows exercise 13 below.)

The command (Spearman-r x y) returns -0.7 = rs = 1 - 6(34)/(5(5² - 1)) = 1 - 204/120.

13. (KW-ranks data) returns the ranks of the data for a Kruskal-Wallis test.
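Returning to exercise 12: steps (iii) through (v) implement the identity rs = 1 - 6Σd²/(n(n² - 1)). A one-function sketch (spearman-from-ranks is our own name, not the book's):

    (defun spearman-from-ranks (rx ry)
      (let* ((n (length rx))
             (d2 (reduce #'+ (mapcar (lambda (a b) (expt (- a b) 2)) rx ry))))
        (- 1 (/ (* 6 d2) (* n (- (* n n) 1))))))

    ;; (spearman-from-ranks '(2 3 5 1 4) '(4 1 2 5 3)) => -7/10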
14. (KW-rank-sums data) returns the sum of the treatment ranks.

15. (KW-sq-of-ranks data) returns the square of the treatment ranks.

16. (KW-s2 data) returns s² for the data ranks.

17. (KW-test data) returns the K-W statistic for the ranks, which serves as a distribution-free ANOVA statistic comparable to the F-ratio. (setf data (list pi-100 pi-200)).

18. Work Problem 17, using the software commands. Also use the command (ANOVA-1 (KW-ranks data)) to compare the ANOVA F on ranks with the K-W statistic. (setf data (list pi-100 pi-200)).

19. (Friedman-rank data) returns the ordinal ranks for data, a list of treatment columns. (Friedman-rank '((1 6 7) (9 34 87) (62 4 90))) returns (1 2 3) (2 3 1) (1 2 3).

20. (Friedman-F-test data) returns the Friedman F-test statistic and p-value.

21. The command (Friedman-chi-sq-test data) returns an approximate chi-square statistic and p-value for testing a randomized block design, where data is a list of the treatment measurements. Try (Friedman-chi-sq-test data) for data assigned by the command (setf data '((22 45 24 1 49 18 38 13 48 17) (38 21 22 6 19 22 3 2 21 41) (45 8 31 24 37 2 21 22 16 38))).

22. The command (n-runs pattern symbol1 symbol2) returns the number of runs in the pattern. For example, (n-runs '(H T H T T H T) 'H 'T) returns 6.

23. The command (run-patterns pattern symbol1 symbol2) prints and returns the possible patterns of two symbols with the associated runs. For example, (run-patterns '(H T H T T H) 'H 'T) prints and returns

(T T T H H H)(T T H T H H)(T T H H T H)(T T H H H T)(T H T T H H)
     2            4            4            3            4
(T H T H T H)(T H T H H T)(T H H T T H)(T H H T H T)(T H H H T T)
     6            5            4            5            3
(H T T T H H)(H T T H T H)(H T T H H T)(H T H T T H)(H T H T H T)
     3            5            4            5            6
(H T H H T T)(H H T T T H)(H H T T H T)(H H T H T T)(H H H T T T)
     4            3            4            4            2

24. (run-density n1 n2 r) returns P(R = r), where n1 is the number of one symbol in a two-symbol sequence, n2 is the number of the other symbol, and r is the number of runs. For example, (run-density 62 38 50) returns 0.0747.
(cum-run-density n1 n2 r) returns P(R ≤ r). (cum-run-density 62 38 50) returns 0.691. (N-runs g-pattern symbol-1 symbol-2) returns the number of runs in the g-pattern of two symbols. (N-runs '(H H T H T H H T H) 'H 'T) returns 7.

25. (run-density-table g-pattern symbol-1 symbol-2) prints the discrete density table. (run-density-table '(H H H H H T T T T T T T) 'H 'T) prints

Discrete Density Table
R         2       3       4       5       6       7       8       9       10      11      12
P(R = r)  0.0025  0.0126  0.0606  0.1212  0.2273  0.2273  0.2020  0.1010  0.0379  0.0076  0.0000

(cum-run-density-table '(H H H H H T T T T T T T) 'H 'T) prints

Cumulative Discrete Distribution Table
R         2     3     4     5     6     7     8     9     10    11    12
P(R ≤ r)  0.00  0.02  0.08  0.20  0.42  0.65  0.85  0.95  0.99  1.00  1.00
26. Find the number of runs in the first 1000 digits of π by replacing each digit below the mean with a 0 and each digit above the mean with a 1. (setf pi-data (mu-runs (pi1000))) converts digits below the mean to 0 and those above to 1. (Number-of-Runs pi-data 0 1) returns 522 runs, consisting of 508 0's and 492 1's.
(Run-probability pi-data 0 1) returns P(R ≤ 522) = 0.914. (R-normal pi-data 0 1) returns an upper-tail probability of 0.0905683.

27. Determine whether the following sequence of 60 H's and 56 T's with 70 runs was randomly generated. Use the runs test.

(setf run-data '(H T T H H H T T H T H T H H T H H T T T H H T H H H T T H
                 H H T T T H T H H T H H H T T H H T T H T T H H H T T T T H T H T H T H
                 T H H T T H H T T T H T H T T T H H H H T H T H T H H T H T H T T H H H
                 T T T H H T H T H H T H T H T))
28. Take a random sample of size 50 from N(50, 16) and compute the 95% confidence interval for the standard deviation, using both the parametric and nonparametric procedures.

29. Using the nonparametric bootstrap procedure, find a 95% confidence interval for the data 79 88 39 17 40 27 45 100 50 71. We show 2 of several methods.

Method 1: Generate the bootstrap ensembles and compute the 2.5 and 97.5 percentiles.

Method 2: From the ensembles, compute x̄ and use (2x̄ - 97.5 percentile value, 2x̄ - 2.5 percentile value).

The command (bootstrap-ci data n optional-function) returns both computations. (bootstrap-ci '(79 88 39 17 40 27 45 100 50 71) 500 'mu) was executed 5 times, returning

2.5 and 97.5 percentiles     (2x̄ - x0.975, 2x̄ - x0.025)
(39.4816, 72.3814)           (38.8054, 71.7052)
(39.0520, 71.4793)           (39.7387, 72.1660)
(39.3393, 71.3940)           (39.9025, 71.9572)
(38.9111, 71.4971)           (39.4703, 72.0563)
(39.0522, 71.4945)           (39.6894, 72.1287)
30. Perform Thiel regression on the following data set, where the assumptions of least-squares regression are not appropriate. The x-data is 1 to 20. Use the command (sort (flatten (thiel-b x y)) #'