Statistical Theory and Methodology in Science and Engineering
Second Edition
K. A. BROWNLEE
Associate Professor of Statistics
The University of Chicago
John Wiley & Sons, Inc., New York · London · Sydney
Copyright © 1960, 1965 by John Wiley & Sons, Inc.
All Rights Reserved. This book or any part thereof must not be reproduced in any form without the written permission of the publisher.
ISBN 0 471 11355 7
Library of Congress Catalog Card Number: 65-12717
PRINTED IN THE UNITED STATES OF AMERICA
Preface
The original intent of this book, to serve for a three-quarter sequence in statistical methods, remains unchanged. The main objective is to develop competence and self-confidence in the use of statistical methods, which require some understanding of the theoretical background and some practice in the application of statistical methods to actual data. This edition assumes that the reader has had a one-year course in elementary calculus. This change has permitted a substantial expansion in the number and scope of the exercises.

Topics discussed at greater length than in the first edition include transformations of density functions (Sections 1.13 and 1.14), two × two tables (Section 5.4), Bartlett's test (Section 9.5), the confidence interval for the entire line in linear regression (Section 11.11), the effects of errors of measurement on observations from a bivariate normal population (Section 12.8), and the partial correlation coefficient (Section 13.4).

Material added to this edition includes Bayes' theorem (Section 1.7), curtailed binomial sampling (Section 1.11), sampling inspection (Sections 3.15-3.17), queuing theory (Chapter 4), estimation of the parameter of an exponential distribution from truncated observations (Section 5.5), the distribution of numbers of runs of elements of two types (Section 6.3), the Friedman $\chi_r^2$ and Cochran Q tests (Sections 7.9 and 7.10), the regression fallacy (Section 12.5), correlation coefficients between indices (Section 13.5), and the likelihood ratio test for goodness of fit in multiple regression (Section 13.8).

Room for the foregoing material has been made by omitting from this edition the discussion of weighted regression and the chapter on the multiple regression approach to the analysis of variance.

This text is in a form that permits wide choice in the sequence in which
the material is read. The sketch below shows which chapters are prerequisite to the succeeding chapters:
[Sketch of chapter prerequisites for Chapters 1 through 17; the layout of the diagram is not recoverable here.]
More compression is possible; Chapter 12 can be taken up after Section 11.4 and Chapter 13 can be taken up after Section 12.4.

Chicago, Illinois
September 1964
K. A. BROWNLEE
Acknowledgments
The body of statistical theory and techniques expounded in this book is largely due to Professors R. A. Fisher and J. Neyman and their associates. I am deeply conscious of how feeble and anemic present-day statistical theory and practice would be without their work.

My thanks are due R. R. Blough, W. H. Kruskal, H. V. Roberts, and several anonymous reviewers who commented on parts of an earlier draft of this textbook. I am particularly indebted to D. L. Wallace and G. W. Haggstrom, who made very many valuable comments and suggestions. The foregoing obviously have no responsibility for any inadequacies of the present form.

My thanks for permission to reproduce data are due the editors of the American Journal of Public Health, Analytical Chemistry, the Astrophysical Journal, the Australian Journal of Applied Science, Chemical Engineering Progress, Food Research, Industrial and Engineering Chemistry, the Journal of the American Chemical Society, the Journal of the Chemical Society, the Journal of Hygiene, the Journal of the Institute of Actuaries, the New York State Journal of Medicine, the Philosophical Transactions of the Royal Society, the Proceedings of the American Society for Testing and Materials, the Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, and Science. My thanks are also due Chapman and Hall for permission to reproduce data from Principles of Biological Assay, by C. W. Emmens, and the New York World-Telegram and Sun for data from the World Almanac and Book of Facts.

I am indebted to Professor Sir Ronald A. Fisher, Cambridge; to Dr. Frank Yates, Rothamsted; and to Messrs. Oliver and Boyd, Edinburgh, for permission to reprint parts of Tables III and V from their book Statistical Tables for Biological, Agricultural and Medical Research. I am also indebted to Professor E. S. Pearson and the Biometrika Trustees for permission to quote extensively from some of the tables in Biometrika Tables for Statisticians, volume 1, edited by E. S. Pearson and H. O. Hartley, and to Dr. A. Hald and John Wiley and Sons for permission to quote extensively from some of the tables in Statistical Tables and Formulas.

K. A. B.
Contents

1 MATHEMATICAL IDEAS
1.1 Introduction
1.2 Concept of Sample Space
1.3 Probability
1.4 Conditional Probability
1.5 Independence
Exercises 1A
1.6 Bayes' Theorem
1.7 Permutations and Combinations
Exercises 1B
1.8 Random Variables, Probability Functions, and Cumulative Distribution Functions
1.9 The Binomial Distribution
1.10 Curtailed Binomial Sampling
Exercises 1C
1.11 Continuous Frequency Functions
1.12 Examples of Continuous Distributions
1.13 Transformations of Probability Functions and Density Functions
1.14 Another Example of the Transformation of a Density Function
1.15 The Concept of Expectation
1.16 The Expectation of a Function of a Random Variable
1.17 The Concept of Variance
Exercises 1D
1.18 The Properties of the Standardized Normal Distribution
1.19 Discrete Bivariate Distributions
1.20 Continuous Bivariate Distributions
1.21 Conditional Continuous Density Functions
1.22 Independent Continuous Random Variables
1.23 Expectation in the Multivariate Case
Exercises 1E
1.24 Covariance and the Correlation Coefficient
1.25 The Variance of a Mean
1.26 The Addition Theorem for the Normal Distribution
1.27 The $\chi^2$ Distribution
Exercises 1F

2 STATISTICAL IDEAS
2.1 Statistical Inference
2.2 Some Principles of Point Estimation
2.3 Maximum Likelihood Estimation
2.4 A Weighted Mean Unbiased and of Minimum Variance
2.5 Some Principles of Hypothesis Testing
2.6 A Criterion for Choosing between Alternative Tests
2.7 A One-Sided Test of an Observation from a Normal Population with Known Variance
2.8 A One-Sided Test of an Observation from a Binomial Distribution
2.9 The Testing of Composite Hypotheses
2.10 A Two-Sided Test of an Observation from a Normal Distribution with Known Variance
2.11 A Two-Sided Test of an Observation from a Normal Distribution with Unknown Variance
2.12 The Comparison of Two Means
2.13 The Concept of P Value
2.14 Confidence Limits: The General Method
2.15 Confidence Limits for the Mean of a Normal Distribution with Known Variance
2.16 Confidence Limits: The Pivotal Method
2.17 Confidence Limits for the Parameter $\theta$ of a Binomial Distribution
2.18 The Relationship between Confidence Limits and Tests of Hypotheses
Exercises
3 THE BINOMIAL, HYPERGEOMETRIC, AND POISSON DISTRIBUTIONS
3.1 The Normal Approximation to the Binomial Distribution
3.2 Testing Hypotheses about the Binomial Distribution with a Normal Approximation
3.3 The Angular Transformation and Other Variance Stabilizing Transformations
3.4 Testing Hypotheses about the Binomial Distribution with the Angular Transformation
3.5 Confidence Limits for the Parameter of a Binomial Distribution
3.6 Comparison of Two Observed Frequencies with the Normal Approximation
3.7 The Correlated Two × Two Table
Exercises 3A
3.8 The Hypergeometric Distribution
3.9 An Application of the Hypergeometric Distribution to Wild Life Population Estimation
3.10 Fisher's Exact Test for Two × Two Tables
3.11 The Poisson Distribution
3.12 An Alternative Derivation of the Poisson Distribution
3.13 Tests of Hypotheses about the Poisson Distribution
3.14 Confidence Limits for a Poisson Parameter
3.15 Simple Sampling Inspection
3.16 Relationship between Sampling Inspection and Hypothesis Testing
3.17 Rectifying Inspection
3.18 Double Sampling
3.19 The Addition Theorem for the Poisson Distribution
3.20 The Comparison of Two Poisson-Distributed Observations
3.21 The Comparison of Two Poisson-Distributed Observations with the Parameters in a Certain Hypothetical Ratio
3.22 An Application to Vaccine Testing
Exercises 3B
4 AN INTRODUCTION TO QUEUING THEORY
4.1 Introduction
4.2 Single-Channel, Infinite, Poisson Arrival, Exponential Service Queues
4.3 Queues with Arbitrary Service Time Distribution
4.4 Single-Channel, Finite, Poisson Arrival, Exponential Service Queues
4.5 Multichannel, Infinite, Poisson Arrival, Exponential Service Queues
4.6 Inventory Control
Exercises

5 THE MULTINOMIAL DISTRIBUTION AND CONTINGENCY TABLES
5.1 The Multinomial Distribution
5.2 The $\chi^2$ Approximation for the Multinomial Distribution
5.3 Contingency Tables
5.4 The Two × Two Table
5.5 Life Testing
Exercises

6 SOME TESTS OF THE HYPOTHESIS OF RANDOMNESS: CONTROL CHARTS
6.1 Introduction
6.2 The Mean Square Successive Difference Test
6.3 Runs of Elements of Two Types
6.4 An Approximation to the Distribution of the Number of Runs of Elements of Two Types
6.5 Runs above and below the Median
6.6 Control Charts for the Mean and Range
6.7 Control Charts for Poisson-Distributed Observations
Exercises

7 SOME NONPARAMETRIC TESTS
7.1 The Assumption of Normality
7.2 The Sign Test
7.3 The Median Test
7.4 The Mean and Variance of a Sample from a Finite Population
7.5 The Wilcoxon Two-Sample Rank Test
7.6 The Adjustment for Ties in the Wilcoxon Two-Sample Rank Test
7.7 The H Test
7.8 The Wilcoxon One-Sample Test
7.9 The Friedman Rank Test
7.10 The Cochran Q Test
Exercises
8 THE PARTITIONING OF SUMS OF SQUARES
8.1 The Distribution of Sample Estimates of Variance
8.2 The Partitioning of Sums of Squares into Independent Components
Exercise

9 TESTS OF EQUALITY OF VARIANCES AND MEANS
9.1 Introduction
9.2 Uses of the Sample Estimate of Variance
9.3 The Variance Ratio
9.4 The Interrelations of Various Distributions
9.5 A Test for the Equality of Several Variances
9.6 The One-Sample t Test
9.7 The Two-Sample t Test
9.8 The Two-Sample Test with Unequal Variances
9.9 A Comparison of Simple Tests for Means and Medians
Exercises

10 ONE-WAY ANALYSIS OF VARIANCE
10.1 Introduction: Models I and II
10.2 One-Way Analysis of Variance: Model I
10.3 The Problem of Multiple Comparisons
10.4 One-Way Analysis of Variance: Model II
10.5 Interpretation of a Model II One-Way Analysis of Variance
10.6 An Example of a Model II One-Way Analysis of Variance with Equal Group Sizes
10.7 An Example of Model II One-Way Analysis of Variance with Unequal Group Sizes
10.8 Simple Sampling Theory
10.9 The Power Function of Model II One-Way Analysis of Variance
Exercises
11 SIMPLE LINEAR REGRESSION
11.1 Introduction
11.2 The Model
11.3 An Analysis of Variance Representation
11.4 An Example of Linear Regression
11.5 The Use of the Regression Line in Reverse
11.6 The Comparison of Two Regression Lines
11.7 Parallel Line Biological Assay
11.8 An Example of Parallel Line Biological Assay
11.9 Regression through the Origin
11.10 The Use of the Regression Line through the Origin in Reverse
11.11 A Joint Confidence Region for $\alpha$, $\beta$
11.12 Linear Regression with Several Observations on y at Each x
11.13 An Example of Linear Regression with Several Observations on y at Each x
11.14 The Comparison of Several Regression Lines: Simple Analysis of Covariance
11.15 Simple Analysis of Covariance
11.16 Exponential Regression
11.17 Regression with Error in the Independent Variable
Exercises

12 THE BIVARIATE NORMAL DISTRIBUTION AND THE CORRELATION COEFFICIENT
12.1 Introduction
12.2 Transformations of Bivariate Distributions
12.3 The Bivariate Normal Distribution
12.4 Some Properties of the Bivariate Normal Distribution
12.5 The Regression "Fallacy"
12.6 Estimation of the Parameters of the Bivariate Normal Distribution
12.7 Tests of Significance for the Correlation Coefficient
12.8 The Effects of Errors of Measurement
Exercises
13 REGRESSION ON SEVERAL INDEPENDENT VARIABLES
13.1 Introduction
13.2 Linear Transformation of the Variables in a Bivariate Normal Distribution to Give Independent Variables
13.3 Regression on Two Independent Variables
13.4 The Partial Correlation Coefficient
13.5 Correlation Coefficients between Indices
13.6 Regression on Several Independent Variables
13.7 A Matrix Representation
13.8 A Test of Whether Regression on r Variables Gives a Significantly Better Fit than Regression on q Variables
13.9 Polynomial Regression
13.10 Further Uses for the c Matrix
13.11 Biases in Multiple Regression
13.12 An Example of Multiple Regression
Exercises

14 TWO-WAY AND NESTED ANALYSIS OF VARIANCE
14.1 Introduction: The Model for Model I Analysis
14.2 The Analysis of Variance
14.3 Computing Forms for Two-Way Analysis of Variance
14.4 Two-Way Analysis of Variance: Model II
14.5 The Interpretation of a Model II Analysis
14.6 Two-Way Analysis of Variance with Only One Observation per Cell
14.7 Nested or Hierarchical Analysis of Variance
14.8 The Two-Way Crossed Finite Population Model
14.9 Discussion of the Two-Way Crossed Finite Population Model
14.10 Nested Classifications in the Finite Model
Exercises

15 THREE-WAY AND FOUR-WAY ANALYSIS OF VARIANCE
15.1 The Model
15.2 Models I and II
15.3 Mixed Models
15.4 Confidence Limits in Three-Way Analysis
15.5 An Example of Three-Way Analysis of Variance
15.6 Orthogonal Contrasts
15.7 The Partitioning of Interactions into Orthogonal Contrasts
15.8 Four-Way Analysis of Variance
Exercises

16 PARTIALLY HIERARCHICAL SITUATIONS
16.1 A Partially Hierarchical Situation and Its Model
16.2 Calculation of Sums of Squares, Etc.
16.3 The Expectations of Mean Squares in Partially Hierarchical Models
16.4 Confidence Limits in Partially Hierarchical Models
Exercises

17 SOME SIMPLE EXPERIMENTAL DESIGNS
17.1 Completely Randomized Designs
17.2 Randomized Block Designs
17.3 The Split-Plot Situation
17.4 Relationship of Split-Plot to Partially Hierarchical Situations
Exercises

APPENDIX
Table I The Cumulative Standardized Normal Distribution Function
Table II Fractional Points of the t Distribution
Table III Fractional Points of the $\chi^2$ Distribution
Table IV Percentage Points of the F Distribution
Table V $y = 2 \arcsin \sqrt{x}$
Table VI Values of $d_n$ and Fractional Points of the Distribution of the Range
Table VII $u_P$
Table VIII Logarithms of n!
Table IX Fractional Points of the Studentized Range
Table X Fractional Points of the Largest of k Variance Ratios with One Degree of Freedom in the Numerator
Table XI Random Sampling Numbers

PARTIAL ANSWERS TO SELECTED EXERCISES

INDEX
CHAPTER 1

Mathematical Ideas
1.1. Introduction

This is primarily a textbook on statistics, not on probability, and we will deal with the latter only as much as is necessary. The two disciplines are, however, closely related, and are often confused. In probability, a branch of mathematics, we specify the structure of a problem, construct a mathematical model to correspond, specify the values of the parameters (the numerical constants of the system), and then deduce the behavior of the system, e.g., the distribution of the relative number of times each possible outcome will occur. In statistics, we assume the structure of the system and the corresponding model, but not numerical values for the parameters, and from a set of observations on the system we attempt to infer, e.g., the values of the parameters.

These characterizations will be clearer from a simple example. A random sample of size $n$ is taken from a lot of $N$ electric light bulbs containing a proportion $\theta$ of defectives. What is the distribution of the number of defectives, $X$, in repeated random samples? Specifically, suppose $n = 100$, $N = 10{,}000$, and $\theta = 0.1$. We will not get exactly $100 \times 0.1 = 10$ defectives in every sample. Often we will get 10, in fact more often than any other outcome, but we will also often get 9, and 11, and 8, etc. What proportions of the time in repeated random samples will $X$ equal ..., 6, 7, 8, ..., etc.? This is a question in probability. Conversely, suppose that we take a random sample of size $n$ and actually observe $X$ defectives. What can we say about $\theta$? For example, what is the most likely value of $\theta$, and what is the range of values of $\theta$ that is reasonably plausible, i.e., consistent in some sense with our observations? These are questions in statistics.

Modern statistics is the product of many diverse influences, and some potentially important contributions were lost in the mists of indifference
of their time. A conscientious historian would have to disinter these and give due credit even though he may be almost the first man to read them since their publication. Nevertheless, some of the main landmarks are generally agreed upon. Studies of gambling problems by the French mathematicians Pascal and Fermat in the year 1654 were the first significant investigations in probability. Over the next two centuries astronomers were interested in the theory of observational errors; in the early nineteenth century Laplace and Gauss made important contributions. For a general account up to the middle of the nineteenth century, see Todhunter [1]. By the beginning of the twentieth century a school under the leadership of Karl Pearson [2] in London was active in statistics, initially from the point of view of its application to biological measurements. An associate of this group, W. S. Gosset, published in 1908 under the pseudonym "Student" a solution to the important problem of the comparison of the mean of a small sample of normally distributed observations with a hypothetical value [3].

Modern statistics may be said to have begun with the appointment in 1919 of R. A. Fisher to the staff of the Rothamsted Experiment Station in England. Fisher's contributions [4-6] are threefold: first, a broad attack on the fundamental principles of estimation and inference; second, the solution of a large number of mathematical problems in distribution theory that were roadblocks to further progress; and third, the creation of the science of the design of experiments, involving three main principles, namely, the essentiality of replication and randomization, and the possibility of reduction of error by appropriate organization of the experiment. In the 'thirties J. Neyman, at that time in London, developed with E. S. Pearson [7] the theory of hypothesis testing and confidence intervals. In the 'forties, A. Wald and his associates of the Statistical Research Group at Columbia University created the ideas and techniques of sequential analysis [8] and statistical decision theory [9]. In recent years the volume of publication has become relatively enormous, much of it inspired, sometimes rather remotely, by the wide variety of practical problems to which statistics is now being applied. It is now difficult to be expert in more than one or two subdivisions of the field.

1.2. Concept of Sample Space

The concept of sample space is a convenient method of representing the outcome of an experiment. By experiment we mean some procedure upon which we embark and at whose completion we observe certain results. For example, we may feed a vitamin supplement to a group of hogs and observe their weights after a certain number of weeks, or we may
toss a coin a certain number of times and observe how many times it falls with the head uppermost. The set of all possible outcomes of an experiment is represented by the sample space $S$. Each possible outcome is represented by a sample point. For example, if our experiment is to drop two light bulbs in sequence, with possible outcomes U = undamaged, F = filament broken, and G = glass envelope broken, then the possible outcomes are as represented in Table 1.1, where the ordering of the symbols within the parentheses ( ) corresponds to the time sequence. Here the sample space contains 9 sample points.

Table 1.1
(U, U) (U, F) (U, G)
(F, U) (F, F) (F, G)
(G, U) (G, F) (G, G)
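
Table 1.1 can also be generated mechanically. The following is a minimal Python sketch, not part of the original text; the names `STATES` and `sample_space` are illustrative choices, and each sample point is represented as an ordered pair (first bulb, second bulb).

```python
from itertools import product

# Possible states of one dropped bulb, using the symbols of the text:
# U = undamaged, F = filament broken, G = glass envelope broken.
STATES = ["U", "F", "G"]

# Each sample point is an ordered pair: (first bulb, second bulb).
sample_space = list(product(STATES, repeat=2))

# Print the points row by row, grouped by the state of the first bulb.
for first in STATES:
    print(" ".join(f"({first}, {second})" for second in STATES))

print("number of sample points:", len(sample_space))
```

With three states per bulb and two bulbs, the enumeration yields the 3 × 3 = 9 ordered pairs of Table 1.1.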
An event is the sum of sample points with some specified property. Thus for Table 1.1 the event "both undamaged" consists of the outcome (U, U). In this case the event consists of only one sample point; such an event may be called a simple event. The event "one or more glass envelopes broken" is made up of the sample points (U, G), (F, G), (G, U), (G, F), and (G, G); such an event, which can be decomposed further into a set of simple events, may be called a compound event.

Suppose now that we consider a particular experiment. This will give rise to a fixed sample space. Consider an event $E$ defined as a particular set of the sample points. Then all the sample points not in this set form the complementary event "not $E$," denoted by $E^c$. Consider the foregoing experiment of dropping two light bulbs. Define the event $E_1$ as "one or more undamaged." Then $E_1$ and $E_1^c$ are as in Figure 1.1.

[Figure 1.1. The events $E_1$ and $E_1^c$ in the sample space $S$ of Table 1.1.]

Also, events $E_1$, $E_2$ may be defined such that a particular outcome may belong in more than one of them. The event $E_1 \cap E_2$, called "$E_1$ intersection $E_2$," is made up of those sample points belonging to both $E_1$ and $E_2$. For example, if the event $E_1$ as before is "one or more undamaged" and the event $E_2$ is "one or more filaments broken," then $E_1$ and $E_2$ are as in Figure 1.2, and $E_1 \cap E_2$ is made up of the points (U, F) and (F, U), as in Figure 1.3.

[Figure 1.2. The events $E_1$ and $E_2$ in the sample space $S$.]

[Figure 1.3. The event $E_1 \cap E_2$ in the sample space $S$.]

It is possible for two events $E_i$ and $E_j$ to be so defined that $E_i \cap E_j$ is empty of sample points; in other words, the event $E_i \cap E_j$ is impossible. We then say that $E_i$, $E_j$ are mutually exclusive events. For example, if we define $E_1$ as before as "one or more undamaged" and $E_3$ as "both glass envelopes broken," then $E_1 \cap E_3$ is empty of sample points (Figure 1.4).

[Figure 1.4. The mutually exclusive events $E_1$ and $E_3$ in the sample space $S$.]
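
The events of this section translate directly into set operations. The sketch below is an illustration rather than part of the original text: it represents the sample space of Table 1.1 as a Python set of ordered pairs and builds $E_1$, $E_2$, and $E_3$ from their verbal definitions. All variable names are assumptions made for the example.

```python
from itertools import product

STATES = ["U", "F", "G"]
S = set(product(STATES, repeat=2))      # the 9 sample points of Table 1.1

E1 = {p for p in S if "U" in p}         # "one or more undamaged"
E2 = {p for p in S if "F" in p}         # "one or more filaments broken"
E3 = {("G", "G")}                       # "both glass envelopes broken"

E1_c = S - E1                           # complementary event "not E1"

print("E1 and E2 :", sorted(E1 & E2))   # the points (U, F) and (F, U)
print("E1, E3 mutually exclusive:", E1 & E3 == set())
print("points in E1^c:", sorted(E1_c))
```

The intersection `E1 & E2` reproduces the two points named in the text, and the empty intersection `E1 & E3` is the set-level statement that $E_1$ and $E_3$ are mutually exclusive.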
[Figure 1.5. The event $E_1 \cup E_2$ in the sample space $S$ of Table 1.1.]
A further piece of symbolism is useful. By $E_1 \cup E_2$, read "$E_1$ union $E_2$," we mean the event "at least one of the two events, i.e., either $E_1$ but not $E_2$, or $E_2$ but not $E_1$, or $E_1$ and $E_2$ together." For the previous definitions of $E_1$ and $E_2$, $E_1 \cup E_2$ is shown in Figure 1.5. More generally, for events $E_1$ and $E_2$ defined on an arbitrary sample space $S$ by Figures 1.6a and b, Figures 1.6c and d illustrate $(E_1 \cup E_2)^c$ and $(E_1 \cap E_2)^c$ and the relationships

$$(E_1 \cup E_2)^c = E_1^c \cap E_2^c, \tag{2.1}$$

$$(E_1 \cap E_2)^c = E_1^c \cup E_2^c, \tag{2.2}$$

which can be proved from the axioms of elementary set theory.
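
Relationships (2.1) and (2.2) can be checked by enumeration on the two-bulb sample space. This is only a numerical check on one example, assuming the definitions of $E_1$ ("one or more undamaged") and $E_2$ ("one or more filaments broken") given earlier in the section; it is not a proof, and the helper name `complement` is an illustrative choice.

```python
from itertools import product

S = set(product("UFG", repeat=2))       # sample space of Table 1.1
E1 = {p for p in S if "U" in p}         # one or more undamaged
E2 = {p for p in S if "F" in p}         # one or more filaments broken


def complement(event):
    """Return the complementary event relative to the sample space S."""
    return S - event


# (2.1): complement of the union equals intersection of the complements.
lhs_21 = complement(E1 | E2)
rhs_21 = complement(E1) & complement(E2)

# (2.2): complement of the intersection equals union of the complements.
lhs_22 = complement(E1 & E2)
rhs_22 = complement(E1) | complement(E2)

print("(2.1) holds:", lhs_21 == rhs_21, sorted(lhs_21))
print("(2.2) holds:", lhs_22 == rhs_22, len(lhs_22), "points")
```

Here the only point outside $E_1 \cup E_2$ is (G, G), while seven of the nine points lie outside $E_1 \cap E_2$.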
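[Figure 1.6. Panels (a) and (b): the events $E_1$ and $E_2$ on a sample space $S$; panel (c): $(E_1 \cup E_2)^c = E_1^c \cap E_2^c$; panel (d): $(E_1 \cap E_2)^c = E_1^c \cup E_2^c$.]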
1.3. Probability
As an axiom we associate with every point $A_k$ in the sample space a number, called the probability of $A_k$, denoted by $\Pr\{A_k\}$. These probabilities must be nonnegative and comply with the condition that

$$\sum_k \Pr\{A_k\} = 1, \tag{3.1}$$

where the summation is over the entire sample space. We further suppose that the probability of an event $E$, $\Pr\{E\}$, is the sum of all the probabilities of the sample points $A_k$ in $E$. These axioms lead to some useful rules. In (3.1), if a certain set of the points $A_k$ corresponds to an event $E$, the complementary set will correspond to the complementary event $E^c$, so $E$ and $E^c$ will have associated with them the entire set of points, and therefore

$$\Pr\{E\} + \Pr\{E^c\} = 1. \tag{3.2}$$

When the events $E_i$ and $E_j$ are mutually exclusive, there are no sample points belonging simultaneously to both $E_i$ and $E_j$; i.e.,

$$\Pr\{E_i \cap E_j\} = 0. \tag{3.3}$$

In this situation, $\Pr\{E_i \cup E_j\}$ is the sum of the probabilities of the sample points lying in $E_i$ and those lying in $E_j$, there being no points lying in $E_i$ and $E_j$ simultaneously, and

$$\Pr\{E_i \cup E_j\} = \Pr\{E_i\} + \Pr\{E_j\}. \tag{3.4}$$

This equation holds only when $E_i$ and $E_j$ are mutually exclusive. If events $E_1, \ldots, E_k$ are all mutually exclusive, then there are no sample points belonging simultaneously to $E_i$ and $E_j$ for any combination of $i$ and $j$, and (3.3) holds for any pair of events $E_i$ and $E_j$. Also,

$$\Pr\{E_1 \cup E_2 \cup \cdots \cup E_k\} = \Pr\{E_1\} + \Pr\{E_2\} + \cdots + \Pr\{E_k\}. \tag{3.5}$$

If the events $E_1, \ldots, E_k$, in addition to being mutually exclusive, are also exhaustive, so that

$$E_1 \cup E_2 \cup \cdots \cup E_k = S, \tag{3.6}$$

then

$$\Pr\{E_1\} + \Pr\{E_2\} + \cdots + \Pr\{E_k\} = 1. \tag{3.7}$$
Consider now the case where $E_i$, $E_j$ are not mutually exclusive, so that the event $E_i \cap E_j$ is not empty of sample points. In Figure 1.7 we see that the region $E_i$ can be split into two parts, $E_i \cap E_j^c$ and $E_i \cap E_j$, and the region $E_j$ can likewise be split into parts, $E_i \cap E_j$ and $E_i^c \cap E_j$, the part $E_i \cap E_j$ being common to both $E_i$ and $E_j$. It follows that the event $E_i \cup E_j$ can be regarded as made up of three mutually exclusive events, so

$$\Pr\{E_i \cup E_j\} = \Pr\{E_i^c \cap E_j\} + \Pr\{E_i \cap E_j\} + \Pr\{E_i \cap E_j^c\}. \tag{3.8}$$

[Figure 1.7. Two events $E_i$ and $E_j$ that are not mutually exclusive, in the sample space $S$.]

We can simultaneously add and subtract $\Pr\{E_i \cap E_j\}$ to the right-hand side, leaving the equation unchanged:

$$\Pr\{E_i \cup E_j\} = \Pr\{E_i^c \cap E_j\} + \Pr\{E_i \cap E_j\} + \Pr\{E_i \cap E_j^c\} + \Pr\{E_i \cap E_j\} - \Pr\{E_i \cap E_j\}. \tag{3.9}$$

Now $E_i^c \cap E_j$ and $E_i \cap E_j$ are mutually exclusive events, so by (3.4)

$$\Pr\{E_i^c \cap E_j\} + \Pr\{E_i \cap E_j\} = \Pr\{(E_i^c \cap E_j) \cup (E_i \cap E_j)\} = \Pr\{E_j\}. \tag{3.10}$$

Likewise

$$\Pr\{E_i \cap E_j^c\} + \Pr\{E_i \cap E_j\} = \Pr\{(E_i \cap E_j^c) \cup (E_i \cap E_j)\} = \Pr\{E_i\}. \tag{3.11}$$

Thus, substituting (3.10) and (3.11) in (3.9),

$$\Pr\{E_i \cup E_j\} = \Pr\{E_i\} + \Pr\{E_j\} - \Pr\{E_i \cap E_j\}. \tag{3.12}$$
As an illustration, consider the experiment of drawing a single card from a well-shuffled deck. The sample space will consist of 52 sample points corresponding to the 52 possible cards that might be drawn. Intuitively, if the deck is well shuffled this implies that the probability of any one card being drawn is the same as that for any other card, i.e.,

$$\Pr\{A_1\} = \Pr\{A_2\} = \cdots = \Pr\{A_{52}\}. \tag{3.13}$$

But these 52 outcomes are the entire sample space, so by (3.1) $\Pr\{A_i\} = 1/52$ for $i = 1, 2, \ldots, 52$. Now define the event $E_1$ as the occurrence of a heart. $E_1$ will contain 13 sample points, all of probability 1/52, so

$$\Pr\{E_1\} = \Pr\{\text{heart}\} = 13 \times \frac{1}{52} = \frac{1}{4}. \tag{3.14}$$

Also define the event $E_2$ as the occurrence of an honor card (ace, king, queen, jack, or ten). There are $4 \times 5 = 20$ honor cards in the deck, so

$$\Pr\{E_2\} = \Pr\{\text{honor}\} = 20 \times \frac{1}{52} = \frac{5}{13}. \tag{3.15}$$

Further, there are 5 heart honors, so

$$\Pr\{E_1 \cap E_2\} = \Pr\{\text{heart honor}\} = \frac{5}{52}. \tag{3.16}$$

We can now use (3.12) to give us

$$\Pr\{\text{a heart or an honor or a heart honor}\} = \Pr\{E_1 \cup E_2\} = \Pr\{E_1\} + \Pr\{E_2\} - \Pr\{E_1 \cap E_2\} = \frac{13}{52} + \frac{20}{52} - \frac{5}{52} = \frac{28}{52}. \tag{3.17}$$
This we can readily check as there are 13 hearts, including the 5 heart honors, plus $3 \times 5$ other honors, making a total of 28 cards which are either hearts or honors or heart honors.

We will now develop a formula for the probability of the union of three events, analogous to (3.12). We first note that

$$E_1 \cap (E_2 \cup E_3) = (E_1 \cap E_2) \cup (E_1 \cap E_3). \tag{3.18}$$

We omit a formal proof of this, but in effect it would be a translation into mathematical language of the following argument. The left-hand side defines elements $x$ which belong to $E_1$ and which also belong to either $E_2$ or $E_3$ (or both); the right-hand side defines elements $y$ which belong either to $E_1$ and $E_2$ simultaneously or to $E_1$ and $E_3$ simultaneously (or to both $E_1$ and $E_2$ and $E_1$ and $E_3$ simultaneously); i.e., to $E_1$ and to either $E_2$ or $E_3$ (or both). Thus any $x$ is a $y$ and any $y$ is an $x$. We will assume without proof the generalization of (3.18):

$$E_1 \cap (E_2 \cup E_3 \cup \cdots \cup E_k) = (E_1 \cap E_2) \cup (E_1 \cap E_3) \cup \cdots \cup (E_1 \cap E_k). \tag{3.19}$$

We now consider $\Pr\{E_1 \cup E_2 \cup E_3\}$. The event considered here can be regarded as the union of $E_1$ with $E_2 \cup E_3$, so from (3.12)

$$\Pr\{E_1 \cup E_2 \cup E_3\} = \Pr\{E_1\} + \Pr\{E_2 \cup E_3\} - \Pr\{E_1 \cap (E_2 \cup E_3)\}. \tag{3.20}$$

For the last term on the right-hand side we use first (3.18) and then (3.12):

$$\Pr\{E_1 \cap (E_2 \cup E_3)\} = \Pr\{(E_1 \cap E_2) \cup (E_1 \cap E_3)\} = \Pr\{E_1 \cap E_2\} + \Pr\{E_1 \cap E_3\} - \Pr\{(E_1 \cap E_2) \cap (E_1 \cap E_3)\}. \tag{3.21}$$

The last term here can be written as

$$\Pr\{(E_1 \cap E_2) \cap (E_1 \cap E_3)\} = \Pr\{E_1 \cap E_2 \cap E_3\}. \tag{3.22}$$

Substituting this in (3.21), substituting (3.21) in (3.20), and using (3.12) for the term $\Pr\{E_2 \cup E_3\}$ in (3.20), we get

$$\Pr\{E_1 \cup E_2 \cup E_3\} = \Pr\{E_1\} + \Pr\{E_2\} + \Pr\{E_3\} - \Pr\{E_1 \cap E_2\} - \Pr\{E_1 \cap E_3\} - \Pr\{E_2 \cap E_3\} + \Pr\{E_1 \cap E_2 \cap E_3\}. \tag{3.23}$$
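
The addition rules (3.12) and (3.23) can be checked by direct counting on the 52-card deck of the illustration above. The sketch below is not from the original text. It assumes equally likely cards, as in (3.13), and uses exact fractions; the third event, "court card" (king, queen, or jack), is introduced here purely as a hypothetical example for (3.23).

```python
from fractions import Fraction
from itertools import product

SUITS = ["hearts", "diamonds", "clubs", "spades"]
RANKS = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
HONORS = {"A", "K", "Q", "J", "10"}     # honor cards, as defined in the text
COURT = {"K", "Q", "J"}                 # hypothetical third event, not in the text

deck = set(product(RANKS, SUITS))       # 52 equally likely sample points


def pr(event):
    """Probability of an event as (points in event) / (points in deck)."""
    return Fraction(len(event), len(deck))


hearts = {card for card in deck if card[1] == "hearts"}
honors = {card for card in deck if card[0] in HONORS}
courts = {card for card in deck if card[0] in COURT}

# Right-hand side of (3.12) for hearts and honors.
two_events = pr(hearts) + pr(honors) - pr(hearts & honors)

# Right-hand side of (3.23) for hearts, honors, and court cards.
three_events = (
    pr(hearts) + pr(honors) + pr(courts)
    - pr(hearts & honors) - pr(hearts & courts) - pr(honors & courts)
    + pr(hearts & honors & courts)
)

print(two_events, pr(hearts | honors))
print(three_events, pr(hearts | honors | courts))
```

Both right-hand sides agree with the probabilities obtained by counting the cards in the unions directly; the first equals 28/52, which `Fraction` reduces to 7/13.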
1.4. Conditional Probability

Suppose that the events $E_1$, $E_2$ are among the possible outcomes of an experiment and that we are interested in $\Pr\{E_1\}$. Suppose that we are informed that $E_2$ has occurred. What can we now say about the probability of $E_1$, which we now write as $\Pr\{E_1 | E_2\}$ (read as "the probability of $E_1$ given $E_2$")? We now are operating, not on the original entire sample space, but on the restricted sample space made up of the points belonging to $E_2$. This implies that the probabilities of the now restricted set of points $A_i$, say $\Pr\{A_i | E_2\}$, must be adjusted so that they sum to 1 in the new sample space. This can be achieved by multiplying all the original $\Pr\{A_i\}$ which lie in $E_2$ by $1/\Pr\{E_2\}$; i.e., we write

$$\Pr\{A_i | E_2\} = \frac{1}{\Pr\{E_2\}} \times \Pr\{A_i\}, \tag{4.1}$$

since if we sum the $\Pr\{A_i | E_2\}$ over the entire $E_2$ space we get

$$\sum_{A_i \in E_2} \Pr\{A_i | E_2\} = \frac{1}{\Pr\{E_2\}} \sum_{A_i \in E_2} \Pr\{A_i\} = \frac{1}{\Pr\{E_2\}} \times \Pr\{E_2\} = 1. \tag{4.2}$$

To obtain $\Pr\{E_1 | E_2\}$ we sum the $\Pr\{A_i | E_2\}$ over those points which lie in $E_1$; note, however, that we are already confined to $E_2$, so this summation is over points in $E_1 \cap E_2$. Thus

$$\Pr\{E_1 | E_2\} = \sum_{A_i \in E_1 \cap E_2} \Pr\{A_i | E_2\} = \frac{1}{\Pr\{E_2\}} \sum_{A_i \in E_1 \cap E_2} \Pr\{A_i\} = \frac{\Pr\{E_1 \cap E_2\}}{\Pr\{E_2\}}. \tag{4.3}$$

This implies

$$\Pr\{E_1 \cap E_2\} = \Pr\{E_2\} \Pr\{E_1 | E_2\}. \tag{4.4}$$
Clearly an analogous argument will give

$$\Pr\{E_2 | E_1\} = \frac{\Pr\{E_2 \cap E_1\}}{\Pr\{E_1\}} \tag{4.5}$$

and

$$\Pr\{E_2 \cap E_1\} = \Pr\{E_1\} \Pr\{E_2 | E_1\}. \tag{4.6}$$
For example, suppose that we have a deck of cards from which the 5 diamond honors have been removed. Let the experiment be to draw one card at random from the $52 - 5 = 47$ cards in the abbreviated deck. Let the event $E_1$ be that the chosen card is an honor and the event $E_2$ be that the chosen card is a heart. Then the event $E_1$ consists of 15 sample points each with probability 1/47, so

$$\Pr\{\text{honor}\} = \Pr\{E_1\} = \frac{15}{47} = 0.319. \tag{4.7}$$

The event $E_1 \cap E_2$ is the appearance of an honor heart. The number of sample points in $E_1 \cap E_2$ is 5, each with probability 1/47. The number of sample points in $E_2$ is 13, each with probability 1/47. Thus if we catch a glimpse of the card and know that it is a heart, we can then say that the probability that it is an honor is, by (4.3),

$$\Pr\{\text{honor} | \text{heart}\} = \Pr\{E_1 | E_2\} = \frac{\Pr\{E_1 \cap E_2\}}{\Pr\{E_2\}} = \frac{5/47}{13/47} = \frac{5}{13} = 0.385,$$
which is substantially greater than the unconditional figure of 0.319. This is a mild illustration of the bridge proverb, "one peep is worth two finesses. " Considering this situation from another viewpoint, we can calculate the probability of getting a heart honor in two ways, using either (4.4) or (4.6): Pr{honor heart}
= Pr{honor} Pr{heartlhonor} = Pr{E1} Pr{E 2IE1} = 15 X
47
2. = 2.,
(4.8)
:3 = :7 .
(4.9)
15
47
or Pr{heart honor} = Pr{heart} Pr{honorlheart}
= Pr{E2} Pr{E1IE2} = ~~
X
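The card example can be checked by direct enumeration of the 47 sample points. The following is a minimal sketch, not part of the original text; the names `deck`, `pr`, `is_honor`, and `is_heart` are illustrative assumptions, and exact fractions are used so that the results can be compared with (4.7) to (4.9).

```python
from fractions import Fraction

RANKS = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
HONORS = {"A", "K", "Q", "J", "10"}

# Abbreviated deck: the five diamond honors are removed, leaving 47 cards.
deck = [(rank, suit) for suit in SUITS for rank in RANKS
        if not (suit == "diamonds" and rank in HONORS)]


def pr(event, space):
    """Probability of an event (a predicate) when all points are equally likely."""
    return Fraction(sum(1 for card in space if event(card)), len(space))


def is_honor(card):
    return card[0] in HONORS


def is_heart(card):
    return card[1] == "hearts"


pr_honor = pr(is_honor, deck)                                  # (4.7)
pr_heart = pr(is_heart, deck)
pr_both = pr(lambda c: is_honor(c) and is_heart(c), deck)
pr_honor_given_heart = pr_both / pr_heart                      # (4.3)
pr_heart_given_honor = pr_both / pr_honor                      # (4.5)

print(pr_honor, pr_honor_given_heart, pr_heart_given_honor)
```

Both products `pr_honor * pr_heart_given_honor` and `pr_heart * pr_honor_given_heart` reduce to the same joint probability, which is the point of (4.8) and (4.9).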
Equation (4.4) extends to three events: write $E_2 \cap E_3$ for $E_2$ to get
$$\Pr\{E_1 \cap E_2 \cap E_3\} = \Pr\{E_1 \mid E_2 \cap E_3\} \Pr\{E_2 \cap E_3\} = \Pr\{E_1 \mid E_2 \cap E_3\} \Pr\{E_2 \mid E_3\} \Pr\{E_3\}. \tag{4.10}$$
1.5. Independence
Suppose that the probability of the event $E_1$ is the same whether or not the event $E_2$ occurs, i.e.,
$$\Pr\{E_1 \mid E_2\} = \Pr\{E_1 \mid E_2^c\}. \tag{5.1}$$
We then say that $E_1$ is independent of $E_2$. Equation (5.1) constitutes a satisfactorily intuitive definition of independence, but we will now show that it implies
$$\Pr\{E_1 \cap E_2\} = \Pr\{E_1\} \Pr\{E_2\}, \tag{5.2}$$
as this latter form is usually more convenient and is often given as the definition. From (4.3) we have
$$\Pr\{E_1 \mid E_2\} = \frac{\Pr\{E_1 \cap E_2\}}{\Pr\{E_2\}}. \tag{5.3}$$
We can substitute $E_2^c$ for $E_2$ to get
$$\Pr\{E_1 \mid E_2^c\} = \frac{\Pr\{E_1 \cap E_2^c\}}{\Pr\{E_2^c\}}. \tag{5.4}$$
Now if our definition of independence, (5.1), is satisfied, then the left-hand sides of (5.3) and (5.4) are equal, and hence
$$\frac{\Pr\{E_1 \cap E_2\}}{\Pr\{E_2\}} = \frac{\Pr\{E_1 \cap E_2^c\}}{\Pr\{E_2^c\}}, \tag{5.5}$$
whence
$$\Pr\{E_1 \cap E_2\} \Pr\{E_2^c\} = \Pr\{E_1 \cap E_2^c\} \Pr\{E_2\}. \tag{5.6}$$
But by (3.2), $\Pr\{E_2^c\} = 1 - \Pr\{E_2\}$, so
$$\Pr\{E_1 \cap E_2\}(1 - \Pr\{E_2\}) = \Pr\{E_1 \cap E_2^c\} \Pr\{E_2\}, \tag{5.7}$$
and hence
$$\Pr\{E_1 \cap E_2\} = \Pr\{E_1 \cap E_2\} \Pr\{E_2\} + \Pr\{E_1 \cap E_2^c\} \Pr\{E_2\} = (\Pr\{E_1 \cap E_2\} + \Pr\{E_1 \cap E_2^c\}) \Pr\{E_2\} = \Pr\{(E_1 \cap E_2) \cup (E_1 \cap E_2^c)\} \Pr\{E_2\} = \Pr\{E_1\} \Pr\{E_2\}. \tag{5.8}$$
We note also, substituting this in (5.3), that in the case of independence
$$\Pr\{E_1 \mid E_2\} = \Pr\{E_1\}. \tag{5.9}$$
Thus we have shown that our definition of independence, (5.1), implies the usual definition (5.2). The arguments can be used in reverse to show that (5.2) implies (5.1); these two definitions are therefore equivalent.
Presumably because the word independent and the phrase mutually exclusive have related connotations in ordinary English usage, their probabilistic definitions and implications are sometimes confused. Assume that $\Pr\{E_1\} > 0$ and $\Pr\{E_2\} > 0$. If the events are mutually exclusive, then the event $E_1 \cap E_2$ is impossible, and therefore $\Pr\{E_1 \cap E_2\} = 0$. But if $\Pr\{E_1 \cap E_2\} = 0$, then (5.2) cannot be satisfied, and therefore the events $E_1$ and $E_2$ cannot be independent. Conversely, if the events $E_1$ and $E_2$ are independent, (5.2) is satisfied, and therefore $\Pr\{E_1 \cap E_2\} > 0$ and so the events $E_1$ and $E_2$ cannot be mutually exclusive. It should be noted, however, that it is possible for two events $E_1$ and $E_2$ to be not mutually exclusive and not independent. For example, let $E_1$ be the event picking a heart and $E_2$ be the event picking a heart honor or diamond honor. Then $\Pr\{E_1\} = 1/4$ and $\Pr\{E_2\} = 10/52$. Since $\Pr\{E_1 \cap E_2\} = 5/52 \neq 0$, the events are not mutually exclusive, and since $5/52 \neq (1/4) \times (10/52)$ equation (5.2) is not satisfied and therefore the events are not independent.
(A1B1)  (A1B2)  (A1B3)  ...  (A1Bj)  ...  (A1Bm)
(A2B1)  (A2B2)  (A2B3)  ...  (A2Bj)  ...  (A2Bm)
  ...
(AiB1)  (AiB2)  (AiB3)  ...  (AiBj)  ...  (AiBm)
  ...
(AnB1)  (AnB2)  (AnB3)  ...  (AnBj)  ...  (AnBm)
Figure 1.8 (the joint sample space S12, with the row event Ai(12) and the column event Bj(12) marked)
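We now need to consider compound experiments made up of two separate independent experiments: consider, for example, the compound experiment formed by first throwing a 6-sided die and secondly drawing a card from a 52-card deck. The outcomes of the first experiment can be represented by points $A_i$, $i = 1, \ldots, n = 6$, in the sample space $S_1$ with probabilities $\Pr_1\{A_i\}$, and the outcomes of the second experiment can be represented by points $B_j$, $j = 1, \ldots, m = 52$, in the sample space $S_2$ with probabilities $\Pr_2\{B_j\}$. The outcomes of the joint experiment can be represented by the ordered pairs $(A_iB_j)$ in the joint sample space $S_{12}$ with probabilities $\Pr_{12}\{(A_iB_j)\}$; see Figure 1.8. In this sample space we can define the event $A_i^{(12)}$ as the union over all $j$ of the points $(A_iB_j)$:
$$A_i^{(12)} = (A_iB_1) \cup (A_iB_2) \cup \cdots \cup (A_iB_m). \tag{5.10}$$
Likewise,
$$B_j^{(12)} = (A_1B_j) \cup (A_2B_j) \cup \cdots \cup (A_nB_j). \tag{5.11}$$
The intersection of $A_i^{(12)}$ with $B_j^{(12)}$ is $(A_iB_j)$. We now assume that the separate experiments are independent, so that in $S_{12}$ we have
$$\Pr_{12}\{A_i^{(12)} \cap B_j^{(12)}\} = \Pr_{12}\{A_i^{(12)}\} \Pr_{12}\{B_j^{(12)}\}. \tag{5.12}$$
We further assume that the probability of $A_i$ is the same whether we consider $A_i$ as occurring in $S_1$ or $S_{12}$:
$$\Pr_{12}\{A_i^{(12)}\} = \Pr_1\{A_i\}. \tag{5.13}$$
Likewise,
$$\Pr_{12}\{B_j^{(12)}\} = \Pr_2\{B_j\}. \tag{5.14}$$
These assumptions give, on substituting (5.13) and (5.14) in (5.12),
$$\Pr_{12}\{(A_iB_j)\} = \Pr_1\{A_i\} \Pr_2\{B_j\}. \tag{5.15}$$
Now suppose that $E$ is an event defined on $S_1$ as the union of all points $A_i$ conforming to the definition $E$; i.e., $E = \bigcup_{i \in E} A_i$, and likewise $F$ is defined on $S_2$ as $\bigcup_{j \in F} B_j$. Represent $E$ in the joint sample space $S_{12}$ by $E_{12}$, i.e.,
$$E_{12} = \bigcup_{i \in E} \bigcup_{j=1}^{m} (A_iB_j).$$
Likewise
$$F_{12} = \bigcup_{i=1}^{n} \bigcup_{j \in F} (A_iB_j).$$
The occurrence of $E$ in the first experiment and $F$ in the second experiment, i.e., the ordered pair $(EF)$ in the joint experiment, is equivalent in $S_{12}$ to the intersection of $E_{12}$ and $F_{12}$, i.e., to
$$E_{12} \cap F_{12} = \bigcup_{i \in E} \bigcup_{j \in F} (A_iB_j). \tag{5.16}$$
By (5.15), the probability of this event is
$$\Pr_{12}\{E_{12} \cap F_{12}\} = \sum_{i \in E} \sum_{j \in F} \Pr_1\{A_i\} \Pr_2\{B_j\} = \Pr_1\{E\} \Pr_2\{F\}. \tag{5.17}$$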
Thus the probability of observing in the joint experiment $E$ followed by $F$ is the product of the probability of $E$ in the first experiment with the probability of $F$ in the second experiment, it being assumed that the separate experiments are independent.
Equation (3.12), or the lack of it, is the basis of a common probabilistic fallacy. Suppose that two missiles are fired at a target independently and that each has a probability of 0.2 of destroying the target. The popular misconception is that the probability of the destruction of the target is 0.2 + 0.2 = 0.4. The fallaciousness of this argument is evident if the probability of either missile destroying the target was 0.6; this argument would give 0.6 + 0.6 = 1.2 as the probability of the destruction of the target, an obviously impossible result. The correct answer is obtained as follows. Let $E_1$ be the event "target destroyed by first missile" and $E_2$ the event "target destroyed by second missile." By (3.12)
$$\Pr\{\text{target destroyed}\} = \Pr\{\text{target destroyed by first missile, or second missile, or both}\} = \Pr\{E_1 \cup E_2\} = \Pr\{E_1\} + \Pr\{E_2\} - \Pr\{E_1 \cap E_2\}. \tag{5.18}$$
Here $\Pr\{E_1 \cap E_2\}$ is the probability of the destruction of the target by both missiles. By (5.17) when the events are independent, $\Pr\{E_1 \cap E_2\} = \Pr\{E_1\} \Pr\{E_2\}$, so
$$\Pr\{\text{target destroyed}\} = 0.2 + 0.2 - 0.2 \times 0.2 = 0.36. \tag{5.19}$$
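The missile calculation can be sketched in a few lines. This is an illustrative sketch only; the function names are assumptions of ours. It evaluates (5.18) for independent events and checks it against the complement form, 1 minus the probability that both missiles fail.

```python
def pr_at_least_one(p1, p2):
    """Pr{E1 or E2} for independent events: (5.18) with Pr{E1 and E2} = p1 * p2."""
    return p1 + p2 - p1 * p2


def pr_at_least_one_via_complement(p1, p2):
    """Same probability obtained as 1 - Pr{neither event occurs}."""
    return 1 - (1 - p1) * (1 - p2)


for p in (0.2, 0.6):
    # Naive addition versus the two correct forms.
    print(p, p + p, pr_at_least_one(p, p), pr_at_least_one_via_complement(p, p))
```

For p = 0.6 the naive sum exceeds 1, whereas both correct forms stay within the interval 0 to 1.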
As a similar example, consider an experiment consisting of drawing one card from a deck, replacing it, and drawing another. Let $E_1$ be the event of getting a spade on the first draw and $E_2$ be the event of getting a spade on the second draw. Then $\Pr\{E_1\} = 13/52 = 1/4$ and $\Pr\{E_2\} = 13/52 = 1/4$. Then
$$\Pr\{\text{both cards are spades}\} = \Pr\{E_1 \cap E_2\} = \Pr\{E_1\} \Pr\{E_2\} = \frac{1}{4} \times \frac{1}{4} = \frac{1}{16}, \tag{5.20}$$
since the two draws are independent. We might note in passing that
$$\Pr\{\text{at least one card is a spade}\} = \Pr\{E_1 \cup E_2\} = \Pr\{E_1\} + \Pr\{E_2\} - \Pr\{E_1 \cap E_2\} = \frac{1}{4} + \frac{1}{4} - \frac{1}{16} = \frac{7}{16}, \tag{5.21}$$
or alternatively,
$$\Pr\{\text{at least one card is a spade}\} = 1 - \Pr\{\text{neither card is a spade}\} = 1 - \Pr\{E_1^c \cap E_2^c\} = 1 - \Pr\{E_1^c\} \Pr\{E_2^c\} = 1 - \frac{3}{4} \times \frac{3}{4} = \frac{7}{16}. \tag{5.22}$$
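We might also note that
$$\Pr\{\text{exactly one card is a spade}\} = \Pr\{(E_1 \cap E_2^c) \cup (E_1^c \cap E_2)\} = \Pr\{E_1 \cap E_2^c\} + \Pr\{E_1^c \cap E_2\} = \Pr\{E_1\} \Pr\{E_2^c\} + \Pr\{E_1^c\} \Pr\{E_2\} = \frac{1}{4} \times \frac{3}{4} + \frac{3}{4} \times \frac{1}{4} = \frac{6}{16}. \tag{5.23}$$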
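The three results (5.20), (5.21), and (5.23) can be confirmed by listing the joint sample space. A small sketch, under the simplifying assumption (ours, for illustration) that only the suit of each card is recorded, so that drawing with replacement gives 16 equally likely ordered pairs of suits:

```python
from fractions import Fraction
from itertools import product

SUITS = ["spade", "heart", "diamond", "club"]

# With replacement, each ordered pair of suits has probability 1/16.
pairs = list(product(SUITS, repeat=2))


def pr(event):
    """Probability of an event (a predicate on an ordered pair of suits)."""
    return Fraction(sum(1 for pair in pairs if event(pair)), len(pairs))


both = pr(lambda s: s[0] == "spade" and s[1] == "spade")   # (5.20)
at_least_one = pr(lambda s: "spade" in s)                  # (5.21), (5.22)
exactly_one = pr(lambda s: s.count("spade") == 1)          # (5.23)

print(both, at_least_one, exactly_one)
```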
To illustrate the difference dependence may make, consider the related experiment in which first one card and then another are withdrawn from the deck, this time without replacement of the first card. Here
$$\Pr\{\text{both cards are spades}\} = \Pr\{E_1 \cap E_2\} = \Pr\{E_1\} \Pr\{E_2 \mid E_1\} = \frac{13}{52} \times \frac{12}{51} = \frac{1}{17}, \tag{5.24}$$
since on the second draw, if a spade has already been withdrawn from the deck on the first draw then there are only 12 spades in the remaining 51 cards. The result of the second drawing is dependent on the result of the first drawing.
Given three events $E_1$, $E_2$, and $E_3$, we say that they are pairwise independent if the following equations are satisfied:
$$\Pr\{E_1 \cap E_2\} = \Pr\{E_1\} \Pr\{E_2\}, \tag{5.25}$$
$$\Pr\{E_1 \cap E_3\} = \Pr\{E_1\} \Pr\{E_3\}, \tag{5.26}$$
$$\Pr\{E_2 \cap E_3\} = \Pr\{E_2\} \Pr\{E_3\}. \tag{5.27}$$
We say that the events are (completely) independent if in addition to (5.25)-(5.27) the following equation is also satisfied:
$$\Pr\{E_1 \cap E_2 \cap E_3\} = \Pr\{E_1\} \Pr\{E_2\} \Pr\{E_3\}. \tag{5.28}$$
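That (5.28) is a genuinely additional requirement can be seen from a standard illustration which is not taken from the text above: two tosses of a fair coin, with $E_1$ = "first toss is a head," $E_2$ = "second toss is a head," and $E_3$ = "the two tosses agree." The sketch below (all names are illustrative assumptions) checks (5.25)-(5.27) and then (5.28) by enumeration.

```python
from fractions import Fraction
from itertools import product

# Sample space: two tosses of a fair coin, four equally likely points.
space = list(product("HT", repeat=2))


def pr(event):
    return Fraction(sum(1 for pt in space if event(pt)), len(space))


def e1(pt):
    return pt[0] == "H"       # first toss is a head


def e2(pt):
    return pt[1] == "H"       # second toss is a head


def e3(pt):
    return pt[0] == pt[1]     # the two tosses agree


# Pairwise independence: (5.25), (5.26), (5.27).
pairwise = all(
    pr(lambda pt: a(pt) and b(pt)) == pr(a) * pr(b)
    for a, b in [(e1, e2), (e1, e3), (e2, e3)]
)

# Complete independence additionally requires (5.28).
triple = pr(lambda pt: e1(pt) and e2(pt) and e3(pt))
complete = pairwise and triple == pr(e1) * pr(e2) * pr(e3)

print(pairwise, triple, complete)
```

Here the three events are pairwise independent, but the triple intersection has probability 1/4 rather than 1/8, so they are not completely independent.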
It can be shown that the above definition implies a commonsense equivalent, i.e.,
$$\Pr\{E_1 \mid E_2 \cap E_3\} = \Pr\{E_1 \mid E_2^c \cap E_3\} = \Pr\{E_1 \mid E_2 \cap E_3^c\} = \Pr\{E_1 \mid E_2^c \cap E_3^c\}, \tag{5.29}$$
and two similar equations obtained by cyclically permuting the suffixes.
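We will prove only the first part of (5.29). Starting with (5.28) and using (5.26), we have
$$\Pr\{E_1 \cap E_2 \cap E_3\} = \Pr\{E_1 \cap E_3\} \Pr\{E_2\} = \Pr\{(E_1 \cap E_2 \cap E_3) \cup (E_1 \cap E_2^c \cap E_3)\} \Pr\{E_2\} = [\Pr\{E_1 \cap E_2 \cap E_3\} + \Pr\{E_1 \cap E_2^c \cap E_3\}] \Pr\{E_2\} = \Pr\{E_1 \cap E_2 \cap E_3\} \Pr\{E_2\} + \Pr\{E_1 \cap E_2^c \cap E_3\} \Pr\{E_2\}. \tag{5.30}$$
Therefore
$$\Pr\{E_1 \cap E_2 \cap E_3\}(1 - \Pr\{E_2\}) = \Pr\{E_1 \cap E_2^c \cap E_3\} \Pr\{E_2\}. \tag{5.31}$$
Substituting $\Pr\{E_2^c\}$ for $(1 - \Pr\{E_2\})$ and dividing both sides by $\Pr\{E_2\} \Pr\{E_2^c\} \Pr\{E_3\}$ gives
$$\frac{\Pr\{E_1 \cap E_2 \cap E_3\}}{\Pr\{E_2\} \Pr\{E_3\}} = \frac{\Pr\{E_1 \cap E_2^c \cap E_3\}}{\Pr\{E_2^c\} \Pr\{E_3\}} \tag{5.32}$$
or
$$\frac{\Pr\{E_1 \cap (E_2 \cap E_3)\}}{\Pr\{E_2 \cap E_3\}} = \frac{\Pr\{E_1 \cap (E_2^c \cap E_3)\}}{\Pr\{E_2^c \cap E_3\}}, \tag{5.33}$$
so
$$\Pr\{E_1 \mid E_2 \cap E_3\} = \Pr\{E_1 \mid E_2^c \cap E_3\}, \tag{5.34}$$
which was to be proved. The other parts of (5.29) can be proved similarly.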
EXERCISES 1A
1A.1. One card is drawn at random from a standard 52-card deck. Consider the events defined as the card being $E_1$, a spade; $E_2$, an honor (A, K, Q, J, 10); $E_3$, the ace of spades; $E_4$, the ace of hearts. Answer the following questions, giving reasons for your answers: (a) Are $E_1$ and $E_2$ independent? (b) Are $E_1$ and $E_2$ mutually exclusive? (c) Are $E_3$ and $E_4$ independent? (d) Are $E_3$ and $E_4$ mutually exclusive? (e) Are $E_2$ and $E_3$ independent?
3, $p\{x\}$ is too small numerically to be visible on the scale used. Figure 1.16 shows the binomial probability function with $n = 9$, $\theta = 1/2$: for $x = 0$ and for $x = 9$, $p\{x\} = (1/2)^9 = 1/512 = 0.00195$, which is too small to be visible on the scale used. For more awkward instances logarithms can be used:
$$\log_{10} p_n\{x\} = \log_{10} \binom{n}{x} + x \log_{10} \theta + (n - x) \log_{10}(1 - \theta). \tag{9.8}$$
Figure 1.16
Table VIII of $\log_{10} n!$ can be used to evaluate
$$\log_{10} \binom{n}{x} = \log_{10} n! - \log_{10} x! - \log_{10} (n - x)!. \tag{9.9}$$
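The logarithmic route of (9.8) and (9.9) is easy to mimic numerically. In the sketch below, which is ours and not part of the text, a table of $\log_{10} n!$ is replaced by the standard-library function `math.lgamma`, using the fact that $\ln n! = \ln\Gamma(n+1)$; the function names are illustrative assumptions.

```python
import math


def log10_factorial(n):
    """log10(n!) computed as ln(Gamma(n + 1)) / ln(10)."""
    return math.lgamma(n + 1) / math.log(10)


def log10_binom_pmf(x, n, theta):
    """log10 of the binomial probability function, following (9.8) and (9.9)."""
    log_comb = log10_factorial(n) - log10_factorial(x) - log10_factorial(n - x)  # (9.9)
    return log_comb + x * math.log10(theta) + (n - x) * math.log10(1 - theta)    # (9.8)


# The case quoted for Figure 1.16: n = 9, theta = 1/2, x = 0.
print(10 ** log10_binom_pmf(0, 9, 0.5))
```

Working on the logarithmic scale avoids forming very large factorials and very small powers separately, which is the same motivation as for the use of tables.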
1.10. Curtailed Binomial Sampling
Consider a contest in which $n$ games are played by two players P and Q. It is agreed that if P wins at least $k$ games, then he has won the contest. Ordinarily, of course, $n$ is odd, and determined by the equation $n = 2k - 1$. For example, it may be that to win the contest P has to win 4 games out of 7. This special relation between $n$ and $k$ is not essential to the problem, however. It could be that P is given an advantageous handicap so that he is declared the winner of the contest if he wins 3 games out of 8. In another example, if the fender on our automobile stays on so long as $k$ out of $n$ bolts do not vibrate loose, $n$ and $k$ are determined purely by the technology of the situation. Suppose that the probability of P winning any game is $\theta$, and that successive games are independent. Then the probability of P winning $x$ games is given by (9.3), and the probability of his winning the contest, say $A_k^n(\theta)$, is the sum of these terms for $x = k, k + 1, \ldots, n$; i.e.,
$$A_k^n(\theta) = \sum_{x=k}^{n} \binom{n}{x} \theta^x (1 - \theta)^{n-x}. \tag{10.1}$$
Let us consider a particular case, say with $n = 5$, $k = 3$. Looking at the problem from first principles, each of the 5 games has 2 possible outcomes, so the total number of points in the sample space, which we will represent by symbols of the form (WWWWW), (WWWWL), etc., is $2^5 = 32$. The following classes of points correspond to the definition of P winning the contest:
(a) (WWWWW) with probability $\theta^5$.
(b) (WWWWL), (WWWLW), etc., making a total of $\binom{5}{4}$ such arrangements each with probability $\theta^4(1 - \theta)^1$.
(c) (WWWLL), (WWLWL), etc., making a total of $\binom{5}{3}$ such arrangements each with probability $\theta^3(1 - \theta)^2$.
The sum of the probabilities of these points can be found directly or from (10.1) as
$$A_3^5(\theta) = \sum_{x=3}^{5} \binom{5}{x} \theta^x (1 - \theta)^{5-x} \tag{10.2}$$
$$= \theta^3(10 - 15\theta + 6\theta^2). \tag{10.3}$$
It is obvious that once P has won $k$ games there is no point in playing any more: in reality, the contest will be terminated as soon as P has won $k$ games, or as soon as Q has won $n - k + 1$ games. The previous analysis is thus unrealistic, although intuitively it is clear that nevertheless it must give the correct answer. A more realistic analysis involves a restricted sample space, since a point (WWWWW) will not exist in reality; if player P won the first three games he would be declared the winner of the contest and the last two games would never be played. The realistic sample space consists of the following classes of points:
(a) (WWW) with probability $\theta^3$.
(b) (LWWW), (WLWW), (WWLW), each with probability $\theta^3(1 - \theta)$.
(c) (LLWWW) plus other arrangements in which the first four games result in 2 W's and 2 L's, the last game always being a W. There are a total of $\binom{4}{2}$ such arrangements, all with probability $\theta^3(1 - \theta)^2$.
(d) (LLWWL) plus other arrangements in which the first four games result in 2 W's and 2 L's, the last game always being an L. There are a total of $\binom{4}{2}$ such arrangements, all with probability $\theta^2(1 - \theta)^3$.
(e) (WLLL) plus other arrangements in which the first three games result in 1 W and 2 L's, the last game always being an L. There are a total of $\binom{3}{1}$ such arrangements, all with probability $\theta(1 - \theta)^3$.
(f) (LLL) with probability $(1 - \theta)^3$.
The entire realistic sample space therefore consists of 1 + 3 + 6 + 6 + 3 + 1 = 20 points, and the sum of the probabilities of these 20 points is of course 1. Those points in classes (a), (b), and (c) conform with the definition of P winning the contest. The probability of P winning the contest, say $B_k^n(\theta)$, in this instance is
$$B_3^5(\theta) = \theta^3 + \binom{3}{1} \theta^3(1 - \theta) + \binom{4}{2} \theta^3(1 - \theta)^2 \tag{10.4}$$
$$= \theta^3(10 - 15\theta + 6\theta^2), \tag{10.5}$$
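which is equal to $A_3^5(\theta)$ in (10.3). The general form for $B_k^n(\theta)$ can be obtained by the following arguments:
$$B_k^n(\theta) = \Pr\Big\{\underbrace{W \cdots W}_{k-1}\,W + \underbrace{LW \cdots W}_{k}\,W + \cdots + \underbrace{L \cdots L\,W \cdots W}_{n-1}\,W\Big\}, \tag{10.6}$$
where the bracketed sequences preceding the final W can occur in $\binom{k-1}{0}$, $\binom{k}{1}$, ..., $\binom{n-1}{n-k}$ permutations respectively.
The $x$th term, for example, consists of $k + x$ games altogether, ending in a W for player P. The preceding sequence of $k - 1 + x$ games must include $k - 1$ W's, and therefore $(k - 1 + x) - (k - 1) = x$ L's. There will be $\binom{k-1+x}{x}$ such possible preceding sequences. Thus
$$\Pr\{\text{winning contest in } k + x \text{ games}\} = (1 - \theta)^x \theta^k \binom{k-1+x}{x}, \tag{10.7}$$
and the probability of winning the contest is
$$B_k^n(\theta) = \theta^k \sum_{x=0}^{n-k} (1 - \theta)^x \binom{k-1+x}{x}. \tag{10.8}$$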
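The agreement between the full-sample-space sum (10.1) and the curtailed-sampling sum (10.8) can be checked numerically. A minimal sketch follows; the function names `a_full` and `b_curtailed` are assumptions of ours, and exact fractions are used so that equality is tested without rounding error.

```python
from fractions import Fraction
from math import comb


def a_full(n, k, theta):
    """Pr{P wins at least k of n games}, all n games being played: (10.1)."""
    return sum(comb(n, x) * theta ** x * (1 - theta) ** (n - x)
               for x in range(k, n + 1))


def b_curtailed(n, k, theta):
    """Pr{P wins the contest} when play stops at P's k-th win: (10.8)."""
    return theta ** k * sum(comb(k - 1 + x, x) * (1 - theta) ** x
                            for x in range(n - k + 1))


theta = Fraction(1, 3)
print(a_full(5, 3, theta), b_curtailed(5, 3, theta))
print(theta ** 3 * (10 - 15 * theta + 6 * theta ** 2))   # (10.3) and (10.5)
```

Trying other values of n, k, and theta (for instance n = 8, k = 3 for the handicap example) gives the same agreement between the two functions.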
Since $A_k^n(\theta)$ and $B_k^n(\theta)$ give answers to two problems which must have the same answer, we must have
$$\sum_{y=k}^{n} \binom{n}{y} \theta^y (1 - \theta)^{n-y} = \theta^k \sum_{x=0}^{n-k} (1 - \theta)^x \binom{k-1+x}{x}, \tag{10.9}$$
a rather remarkable identity which can be proved algebraically. In a contest such as the World Series, in which $n = 7$, $k = 4$, the number of games required to terminate the contest, say $X$, is of interest. Since either team can win, we have
$$\Pr\{X = 4\} = \theta^4 + (1 - \theta)^4,$$
$$\Pr\{X = 5\} = \binom{4}{1} [\theta^4(1 - \theta) + \theta(1 - \theta)^4],$$
$$\Pr\{X = 6\} = \binom{5}{2} [\theta^4(1 - \theta)^2 + \theta^2(1 - \theta)^4],$$
$$\Pr\{X = 7\} = \binom{6}{3} [\theta^4(1 - \theta)^3 + \theta^3(1 - \theta)^4].$$
For $\theta = 1/2$, the probabilities are 2/16, 4/16, 5/16, and 5/16, and for $\theta = 3/4$, the probabilities are $328/4^5$, $336/4^5$, $225/4^5$, and $135/4^5$. For an intensive examination of the World Series, see Mosteller [12].
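The distribution of the length of the series can be tabulated for any value of the single-game probability. The following sketch is illustrative only (the function name and argument names are ours); it applies (10.7) to each team in turn and adds the two contributions.

```python
from fractions import Fraction
from math import comb


def series_length_dist(theta, k=4):
    """Pr{X = k + x}, x = 0, ..., k - 1, for a contest won by the first team to win k games."""
    dist = {}
    for x in range(k):                      # x = number of games won by the losing team
        either_team = theta ** k * (1 - theta) ** x + theta ** x * (1 - theta) ** k
        dist[k + x] = comb(k - 1 + x, x) * either_team
    return dist


for theta in (Fraction(1, 2), Fraction(3, 4)):
    dist = series_length_dist(theta)
    print(theta, dist, sum(dist.values()))
```

The printed fractions appear in lowest terms, so that 328/1024, for example, is shown as 41/128.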
EXERCISES 1C
1C.1. A professor of psychology writes the following: "Let us now consider whether estimates of the probability of success in a given task obey rules similar to those of mathematical probability or are subject to different, psychological rules. One rule of mathematical probability convenient for such a test is the additive theorem: namely, that small, independent probabilities of a particular event add up to a larger probability. Thus if you are drawing for a lucky ticket in a pool, your chances of success will increase in proportion to the number of tickets you take. In one of our experiments we confronted our subjects with a choice between taking a single large probability or a set of smaller probabilities; e.g., they were allowed to draw either one ticket from a box of 10 or 10 tickets from 100, in the latter case putting back the ticket drawn each time before making the next draw. Mathematically, of course, the chance of drawing the prize ticket was exactly the same in both cases. But most of the subjects proved to be guided mainly by psychological rather than mathematical considerations. "If the 10 draws had to be made from 100 tickets in one box, about four-fifths of the subjects preferred to make a single draw from a box of 10." Comment.
1C.2. An aircraft has 4 engines, 2 on the left wing and 2 on the right wing. Suppose that the probability of any one engine failing on a certain transocean flight is 0.1, and that the probability of any one engine failing is independent of the behavior of the others. What is the probability of the crew getting wet (a) if the plane will fly on any 2 engines? (b) If the plane requires at least one (i.e., one or more) engine operating on both sides in order to fly?
1C.3. The Chevalier de Mere (1654) wanted to know the following probabilities: (a) probability of seeing one or more sixes in 4 throws of a 6-sided die; (b) probability of seeing one or more double sixes in 24 throws with a pair of dice. The Chevalier thought that these two probabilities should be the same, but he threw dice so assiduously he convinced himself they were different. Evaluate these two probabilities.
1C.4. Suppose that the probability that a light in a classroom will be burnt out is 1/4. The classroom has six lights, and is unusable if the number of lights burning is less than four. What is the probability that the classroom is unusable on a random occasion?
1C.5. "The United States SAC (Strategic Air Command) is supposed to be based upon about fifty home bases. If the Soviets happened to acquire, unknown to us, about 300 missiles, then they could assign about six missiles to the destruction of each base. If the Soviet missiles had, let us say, one chance in two of completing their countdown and otherwise performing reliably, then there would be ... (about) an even chance that all the bases would be destroyed, about one chance in three that one base would survive, and a small chance that two or more bases would survive." (From Herman Kahn, "The Arms Race and Some of Its Hazards," Daedalus, Volume 89, No. 4 of the Proceedings of the American Academy of Arts and Sciences, pp. 744-80, 1960.) Calculate the numerical value of the "small chance that two or more bases would survive."
1C.6. Large tomatoes are packaged at random three to a container. Let $\theta$ be the probability that a tomato has some flavor. Let $E_1$ be the event that not more than one tomato in a container has some flavor, and let $E_2$ be the event that at least one tomato has some flavor. (a) For what values of $\theta$, excluding 0 and 1, are $E_1$ and $E_2$ independent? (b) As in (a), but suppose that a container contains four tomatoes.
1C.7. Consider making a sequence of independent trials with probability of success at each trial equal to $\theta$. Let $X$ be the number of trials necessary to observe a success. (a) Show that the probability function is
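$$p_X\{x\} = (1 - \theta)^{x-1} \theta.$$
$$p\{x\} = 1/a, \quad -a/2 < x < a/2, \qquad = 0, \quad \text{otherwise}, \tag{12.2}$$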
and is graphed in the upper part of Figure 1.23. The cumulative distribution function is
$$P\{x\} = \int_{-\infty}^{x} p\{t\}\,dt = \frac{1}{a} \int_{-a/2}^{x} dt = \frac{t}{a} \Big|_{-a/2}^{x} = \frac{1}{a} \left[ x - \left( -\frac{a}{2} \right) \right] = \frac{x}{a} + \frac{1}{2} \tag{12.3}$$
for $x$ satisfying the inequality $-a/2 < x < a/2$. For $x = -a/2$, $P\{x\} = 0$; for $x = 0$, $P\{x\} = 1/2$; and for $x = a/2$, $P\{x\} = 1$. The cumulative distribution function is thus zero up to $-a/2$, a straight line with slope $1/a$ from $(-a/2, 0)$ to $(a/2, 1)$, and 1 for $x > a/2$, as shown in the lower part of Figure 1.23.
Figure 1.23
Another simple family of continuous probability density functions is the negative exponential:
$$p\{x\} = \theta e^{-\theta x}, \quad 0 \le x < \infty, \qquad = 0, \quad \text{otherwise}, \tag{12.4}$$
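where $\theta$ is a fixed parameter greater than zero, and $e$ is the base of natural logarithms. The cumulative distribution function is
$$P\{x\} = \theta \int_{0}^{x} e^{-\theta t}\,dt = \theta \left( -\frac{1}{\theta} e^{-\theta t} \right) \Big|_{0}^{x} = -e^{-\theta t} \Big|_{0}^{x} = 1 - e^{-\theta x}. \tag{12.5}$$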
The probability density function (12.4) and the cumulative distribution function (12.5) are graphed in Figure 1.24 for the case $\theta = 0.5$. Each value of $\theta$ will give a different but similar distribution.
Figure 1.24
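The relation between the density (12.4) and the cumulative distribution function (12.5) can be verified numerically for the case $\theta = 0.5$ of Figure 1.24. The sketch below is an illustration under our own assumptions: the midpoint rule and the helper name `integrate` are not from the text.

```python
import math


def exp_pdf(x, theta):
    """Negative exponential density (12.4)."""
    return theta * math.exp(-theta * x) if x >= 0 else 0.0


def exp_cdf(x, theta):
    """Cumulative distribution function (12.5)."""
    return 1 - math.exp(-theta * x) if x >= 0 else 0.0


def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation to the integral of f from a to b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


theta = 0.5
for x in (1.0, 3.0, 6.0):
    numeric = integrate(lambda t: exp_pdf(t, theta), 0.0, x)
    print(x, numeric, exp_cdf(x, theta))
```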
A particularly important family of continuous probability density functions is the so-called normal or gaussian:
$$p\{x\} = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\xi)^2/2\sigma^2}, \tag{12.6}$$
where $\xi$ and $\sigma$ are fixed parameters, $\pi$ has its traditional meaning, and $e$ is the base of natural logarithms. The cumulative distribution function is
$$P\{x\} = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(t-\xi)^2/2\sigma^2}\,dt.$$
two events must be equal:
$$p_Y\{y_1\}\,\Delta y_1 = p_X\{x_1\} [g(y_1 + \Delta y_1) - g(y_1)], \tag{13.6}$$
whence, dropping the subscript and substituting $g(y)$ for $x$:
$$p_Y\{y\} = p_X\{g(y)\} \frac{g(y + \Delta y) - g(y)}{\Delta y}. \tag{13.7}$$
In the limit $\Delta y \to 0$ we obtain
$$p_Y\{y\} = p_X\{g(y)\} \frac{dg}{dy}. \tag{13.8}$$
In words, the probability density function for $Y$ equals the formula for the probability density for $X$, with $g(y) = x$ substituted for $x$, multiplied by the derivative of $g(y)$ with respect to $y$. The foregoing discussion assumed that $y$ was a strictly increasing function of $x$ everywhere. The same result is readily obtained if $y$ is a strictly decreasing function of $x$ everywhere, the only difference being that in (13.8) in place of $dg/dy$ the absolute value of $dg/dy$, namely $|dg/dy|$, appears:
$$p_Y\{y\} = p_X\{g(y)\} \left| \frac{dg}{dy} \right|. \tag{13.9}$$
As an example, consider the normal density function (12.6). As it stands, it is a different function for every pair of values of $\xi$ and $\sigma$, and it is obviously impossible to tabulate an infinity of normal distributions. However, all these normal distributions can be reduced to a common form, the standardized normal distribution, by a simple transformation. Define
$$y = f(x) = \frac{x - \xi}{\sigma} = \frac{x}{\sigma} - \frac{\xi}{\sigma}. \tag{13.10}$$
The inverse function $x = g(y)$ is obtained by solving (13.10) for $x$:
$$x = g(y) = \sigma y + \xi. \tag{13.11}$$
The derivative of $g(y)$ with respect to $y$ is $dg(y)/dy = \sigma$. We now substitute in (13.8):
$$p_Y\{y\} = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-y^2/2} \cdot \sigma = \frac{1}{\sqrt{2\pi}} e^{-y^2/2}.$$
This is the standardized normal distribution, for which we reserve the symbol $\phi(u)$:
$$\phi(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}. \tag{13.12}$$
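The change of variable leading to the standardized normal density can be checked numerically. In the sketch below (ours, with illustrative names and the arbitrary values $\xi = 100$, $\sigma = 10$), the transformed density is built exactly as (13.9) prescribes, from the density of $X$ evaluated at $g(y)$ and the derivative of $g$, and is then compared with $\phi(y)$.

```python
import math


def normal_pdf(x, xi, sigma):
    """Normal density (12.6) with mean xi and standard deviation sigma."""
    return math.exp(-((x - xi) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def phi(u):
    """Standardized normal density (13.12)."""
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)


def transformed_pdf(y, xi, sigma):
    """Density of Y = (X - xi) / sigma obtained from (13.9)."""
    g = sigma * y + xi          # inverse function (13.11)
    dg_dy = sigma               # derivative of g with respect to y
    return normal_pdf(g, xi, sigma) * abs(dg_dy)


for y in (-2.0, 0.0, 0.5, 3.0):
    print(y, transformed_pdf(y, 100.0, 10.0), phi(y))
```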
The cumulative form, for which we use the symbol $\Phi$
$$\int_{-\infty}^{-u} \phi(t)\,dt = \int_{u}^{\infty} \phi(t)\,dt. \tag{18.6}$$
But
$$\int_{u}^{\infty} \phi(t)\,dt = 1 - \int_{-\infty}^{u} \phi(t)\,dt, \tag{18.7}$$
so
$$\Phi(-u) = 1 - \Phi(u). \tag{18.8}$$
Let $u_P$ be that value of $u$ corresponding to the $P$ point of the cumulative distribution function; i.e., if we take an observation at random from a standardized normal distribution the probability is $P$ that the observation is less than $u_P$:
$$\Pr\{u < u_P\} = P. \tag{18.9}$$
But $\Pr\{u < u_P\} = \Phi(u_P)$, i.e.,
$$\Phi(u_P) = P. \tag{18.10}$$
From (18.8), $\Phi(-u_P) = 1 - \Phi(u_P)$, so $P = 1 - \Phi(-u_P)$, and $\Phi(-u_P) = 1 - P$. But $\Phi(u_{1-P}) = 1 - P$, so
$$u_{1-P} = -u_P. \tag{18.11}$$
Table I in the Appendix gives the values of $P$ corresponding to specified values of $u_P$. For example, if $u_P = -2$, $P = 0.02275$, and if $u_P = 2$, $P = 0.97725$. Table I can be used conversely to find $u_P$ for a specified value of $P$, but Table VII does this directly for the more commonly used values of $P$. For example, if $P = 0.025$, $u_P = -1.960$, and if $P = 0.975$, $u_P = 1.960$. These points are plotted in Figure 1.28. These two operations, namely given $u_P$ to find $P$ and given $P$ to find $u_P$, can also be performed readily for any normal distribution if we know its mean $\xi$ and its variance $\sigma^2$. In the equation for the cumulative distribution function of the normal distribution,
$$P\{x\} = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x} e^{-(t-\xi)^2/2\sigma^2}\,dt, \tag{18.12}$$
change the variable from $t$ to $u = (t - \xi)/\sigma$. Then
$$P\{x\} = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{(x-\xi)/\sigma} e^{-u^2/2}\,\sigma\,du = \Phi\left( \frac{x - \xi}{\sigma} \right). \tag{18.13}$$
For example, suppose that we are given that $\xi = 100$, $\sigma = 10$, and we want the probability that an observation taken at random from this distribution is less than 120. We have
$$\Pr\{X < 120\} = P\{120\} = \Phi\left( \frac{120 - 100}{10} \right) = \Phi(2) = 0.97725. \tag{18.14}$$
Conversely, given the same distribution, what is the value of $x_P$ that a fraction $P$ of all observations are less than? We solve the equation
$$\Phi\left( \frac{x_P - \xi}{\sigma} \right) = P. \tag{18.15}$$
For example, if $P$ is specified as 0.95, we know that the solution of $u_{0.95}$ in $\Phi(u_{0.95}) = 0.95$ is $u_{0.95} = 1.645$, so
$$\Phi(1.645) = 0.95 = \Phi\left( \frac{x_{0.95} - \xi}{\sigma} \right). \tag{18.16}$$
Thus $1.645 = (x_{0.95} - \xi)/\sigma = (x_{0.95} - 100)/10$, or $x_{0.95} = 116.45$. Equation (18.13) will give the probability that an observation will lie $k$ or more times the standard deviation below the mean:
$$\Pr\{X < \xi - k\sigma\} = \Phi\left( \frac{\xi - k\sigma - \xi}{\sigma} \right) = \Phi(-k), \tag{18.17}$$
and likewise above the mean:
$$\Pr\{X > \xi + k\sigma\} = 1 - \Pr\{X < \xi + k\sigma\} = 1 - \Phi(k) = \Phi(-k). \tag{18.18}$$
We can also ask for the probability that an observation deviates from the mean by more than $k$ times the standard deviation in either direction, i.e., that $X$ lies outside the interval $\xi \pm k\sigma$. This is
$$\Pr\{X < \xi - k\sigma\} + \Pr\{X > \xi + k\sigma\} = \Phi(-k) + \Phi(-k) = 2\Phi(-k) = 2[1 - \Phi(k)]. \tag{18.19}$$
For example, for $k = 1.96$, $2\Phi(-1.96) = 2 \times 0.025 = 0.05$, or alternatively $2[1 - \Phi(1.96)] = 2(1 - 0.975) = 0.05$. The left-hand side of (18.19) can be written in an alternative form:
$$\Pr\{X < \xi - k\sigma\} + \Pr\{X > \xi + k\sigma\} = \Pr\{X - \xi < -k\sigma\} + \Pr\{X - \xi > k\sigma\} = \Pr\{-(X - \xi) > k\sigma\} + \Pr\{X - \xi > k\sigma\} = \Pr\{|X - \xi| > k\sigma\} = 2\Phi(-k), \tag{18.20}$$
where $k$ is a positive number. Complementarily,
$$\Pr\{|X - \xi| < k\sigma\} = 1 - 2\Phi(-k). \tag{18.21}$$
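The converse operation, finding $x_P$ from $P$, and the two-sided tail probability of (18.19) can both be sketched with the standard-library class `statistics.NormalDist`, whose `inv_cdf` method plays the role of Table VII. The helper names `quantile` and `two_sided_tail` are assumptions of ours.

```python
from statistics import NormalDist

std = NormalDist()              # standardized normal distribution


def quantile(p, xi, sigma):
    """x_P solving (18.15): x_P = xi + sigma * u_P."""
    return xi + sigma * std.inv_cdf(p)


def two_sided_tail(k):
    """Pr{|X - xi| > k * sigma} = 2 * Phi(-k), as in (18.19) and (18.20)."""
    return 2 * std.cdf(-k)


print(round(quantile(0.95, 100, 10), 2))     # compare with x_0.95 = 116.45
print(round(two_sided_tail(1.96), 4))        # compare with 0.05
print(round(1 - two_sided_tail(1.96), 4))    # (18.21)
```

Since the table value 1.645 is itself rounded, the computed quantile agrees with 116.45 only to the number of figures quoted.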
1.19. Discrete Bivariate Distributions
Consider the experiment consisting of dropping three light bulbs in sequence with possible outcomes U = undamaged, F = filament broken, and G = glass envelope broken. The sample space contains 27 sample points. If the successive drops are independent, then for example Pr{U, U, U} = Pr{U} Pr{U} Pr{U}. Define two random variables, X = number of U's, Y = number of F's, and let Pr{U} = 1/2, Pr{F} = 1/3, Pr{G} = 1/6. For each of the 27 points in the sample space we can tabulate the values taken by X and Y and also the probability of that point occurring; see Table 1.15. For example, for the sample point (U, U, U) we have x = 3, y = 0, and Pr{U, U, U} = (1/2)(1/2)(1/2) = 1/8 = 27/216. This sample point is therefore labeled (U, U, U) (3, 0) 27, the denominator 216 being omitted for conciseness. From Table 1.15 we can construct Table 1.16 showing the probability of obtaining any pair of values of x and y.
Table 1.15
(U, U, U) (3, 0)  27     (U, F, U) (2, 1)  18     (U, G, U) (2, 0)   9
(U, U, F) (2, 1)  18     (U, F, F) (1, 2)  12     (U, G, F) (1, 1)   6
(U, U, G) (2, 0)   9     (U, F, G) (1, 1)   6     (U, G, G) (1, 0)   3
(F, U, U) (2, 1)  18     (F, F, U) (1, 2)  12     (F, G, U) (1, 1)   6
(F, U, F) (1, 2)  12     (F, F, F) (0, 3)   8     (F, G, F) (0, 2)   4
(F, U, G) (1, 1)   6     (F, F, G) (0, 2)   4     (F, G, G) (0, 1)   2
(G, U, U) (2, 0)   9     (G, F, U) (1, 1)   6     (G, G, U) (1, 0)   3
(G, U, F) (1, 1)   6     (G, F, F) (0, 2)   4     (G, G, F) (0, 1)   2
(G, U, G) (1, 0)   3     (G, F, G) (0, 1)   2     (G, G, G) (0, 0)   1
Table 1.16
(xi, yj)    216 p{xi, yj}
(0, 0)      1
(1, 0)      3 + 3 + 3 = 9
(2, 0)      9 + 9 + 9 = 27
(3, 0)      27
(0, 1)      2 + 2 + 2 = 6
(1, 1)      6 + 6 + 6 + 6 + 6 + 6 = 36
(2, 1)      18 + 18 + 18 = 54
(0, 2)      4 + 4 + 4 = 12
(1, 2)      12 + 12 + 12 = 36
(0, 3)      8
Table 1.17
216 p{xi, yj}
                              yj
xi                     0     1     2     3     Sum over j of 216 p{xi, yj}
0                      1     6    12     8     27
1                      9    36    36     0     81
2                     27    54     0     0     81
3                     27     0     0     0     27
Sum over i of
216 p{xi, yj}         64    96    48     8
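Tables 1.15 to 1.17 can be regenerated by enumerating the 27 sample points. The sketch below is illustrative (the variable names are ours); it accumulates the joint probability function of X and Y with exact fractions and then forms the two marginal probability functions by summation.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

PR = {"U": Fraction(1, 2), "F": Fraction(1, 3), "G": Fraction(1, 6)}

joint = defaultdict(Fraction)       # joint probability function p{x, y}
for drops in product("UFG", repeat=3):
    x = drops.count("U")            # X = number of undamaged bulbs
    y = drops.count("F")            # Y = number of broken filaments
    joint[(x, y)] += PR[drops[0]] * PR[drops[1]] * PR[drops[2]]

marg_x = defaultdict(Fraction)      # row sums of the two-way table
marg_y = defaultdict(Fraction)      # column sums of the two-way table
for (x, y), p in joint.items():
    marg_x[x] += p
    marg_y[y] += p

for x in range(4):
    # Entries multiplied by 216, as in Table 1.17; pairs that cannot occur give 0.
    print(x, [int(joint.get((x, y), 0) * 216) for y in range(4)], int(marg_x[x] * 216))
print([int(marg_y[y] * 216) for y in range(4)])
```

A conditional probability such as the probability that Y = 2 given X = 0 is then `joint[(0, 2)] / marg_x[0]`.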
It is natural to express Table 1.16 in a two-way form (Table 1.17) and also to represent it graphically (Figure 1.29).
Figure 1.29
In general, a discrete bivariate distribution will appear as in Table 1.18, where $x$ and $y$ are confined to the discrete values $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$. The probability of any particular pair of values, $x_i$, $y_j$, is $p\{x_i, y_j\}$. The marginal probabilities for $X$ can be obtained by summing over $y$, and vice versa:
$$p_X\{x_i\} = \sum_{j}^{m} p\{x_i, y_j\}, \tag{19.1}$$
$$p_Y\{y_j\} = \sum_{i}^{n} p\{x_i, y_j\}. \tag{19.2}$$
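For example, for Table 1.17,
$$p_Y\{2\} = \frac{12 + 36 + 0 + 0}{216} = \frac{48}{216} = \frac{2}{9}.$$
Table 1.18 (general two-way layout: rows $x_1, \ldots, x_n$; columns $y_1, \ldots, y_m$; entries $p\{x_i, y_j\}$)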
This result can be checked by regarding $Y$ as a binomially distributed variable with $n = 3$, $\theta = 1/3$, and using (9.3):
$$\Pr\{Y = 2\} = \binom{3}{2} \left( \frac{1}{3} \right)^2 \left( 1 - \frac{1}{3} \right)^{3-2} = \frac{2}{9}.$$
The conditional probability that $X = x_i$, given that $Y = y_j$, can be found by using (4.3):
$$\Pr\{X = x_i \mid Y = y_j\} = \frac{\Pr\{X = x_i \text{ and } Y = y_j\}}{\Pr\{Y = y_j\}} = \frac{p\{x_i, y_j\}}{p\{y_j\}}. \tag{19.3}$$
It is convenient to call this the conditional probability function and use the symbol $p\{x \mid y\}$. Also, if $X$ and $Y$ are independent, using (5.9), namely $\Pr\{E_1 \mid E_2\} = \Pr\{E_1\}$, gives
$$\Pr\{X = x_i \mid Y = y_j\} = \Pr\{X = x_i\}. \tag{19.4}$$
Substituting from (19.3) and writing $\Pr\{X = x_i\} = p\{x_i\}$ we find that in the case of $X$, $Y$ independent
$$p\{x_i, y_j\} = p\{x_i\}\,p\{y_j\}; \tag{19.5}$$
i.e., the joint probability function is the product of the two marginal probability functions. In the case of the bivariate distribution of Table 1.16, for example, we have just seen that $p\{Y = 2\} = 2/9$, whereas from the table, for example, $p\{Y = 2 \mid X = 0\} = (12/216)/(27/216) = 4/9$, so for this distribution $p\{y \mid x\} \neq p\{y\}$ and hence $X$ and $Y$ are not independent. Similarly, $p\{X = 3\} = 27/216$, so $p\{X = 3\}\,p\{Y = 2\} = (27/216)(2/9)$, which is not equal to $p\{X = 3, Y = 2\}$, which is 0.
1.20. Continuous Bivariate Distributions
Consider a function $p\{x, y\} \ge 0$ for all $x$, $y$ such that
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p\{x, y\}\,dy\,dx = 1. \tag{20.1}$$
This function $p\{x, y\}$ can be regarded as a two-dimensional, or bivariate, probability density function and is the analog of the univariate probability density functions we have considered hitherto. Analogously to (11.5) in the univariate case, we can define a cumulative distribution function
$$P\{x_i, y_j\} = \Pr\{X \le x_i, Y \le y_j\} = \int_{-\infty}^{x_i} \int_{-\infty}^{y_j} p\{x, y\}\,dy\,dx. \tag{20.2}$$
Graphically, we can use the two horizontal dimensions to represent $x$, $y$ and the vertical dimension to represent $p\{x, y\}$. Thus $p\{x, y\}$ generates a surface above the plane $p\{x, y\} = 0$. The total volume enclosed between this surface and this plane, by (20.1), is 1. $P\{x_i, y_j\}$ is the volume under the surface measured over the shaded area only (Figure 1.30).
Figure 1.30
In manipulating bivariate probability density functions we will often find it convenient to reverse the order of integration, i.e., to assume
$$\int_{x_a}^{x_b} \int_{y_c}^{y_d} p\{x, y\}\,dy\,dx = \int_{y_c}^{y_d} \int_{x_a}^{x_b} p\{x, y\}\,dx\,dy. \tag{20.3}$$
This is always legitimate in the situation in which we are involved. Technically, it is permissible if the integral is absolutely convergent. The requirement that $p\{x, y\} \ge 0$ implies that if the integrals converge then they are absolutely convergent, and the requirement (20.1) implies that they are convergent, and hence they are absolutely convergent. The only example of a continuous bivariate distribution we will encounter in this book is the bivariate normal:
(20.4)
This will be discussed in detail in Chapter 12. We may wish to find the marginal probability density function of $X$ alone, given $p\{x, y\}$. Intuitively, we would expect to get this by integrating $p\{x, y\}$ over $y$. More formally, we can proceed as follows. Define $Q_X(r)$ as
$$Q_X(r) = \int_{-\infty}^{\infty} p_{X,Y}\{r, y\}\,dy. \tag{20.5}$$
Then
$\Pr\{-\infty < X < x\} = \Pr\{-\infty < X < x,\ -\infty < Y$
$\xi_0$. When the alternative hypothesis is $\xi_1 < \xi_0$, the critical region is $x < \xi_0 + u_\alpha \sigma$ for a single observation, and
$$\bar{x} < \xi_0 + u_\alpha \sigma / \sqrt{n} \tag{7.14}$$
for the mean of $n$ observations. In this latter case the power function is
$$\pi(\xi_1) = \Phi\left( u_\alpha + \frac{\xi_0 - \xi_1}{\sigma / \sqrt{n}} \right). \tag{7.15}$$
Thus for a test at the level of significance $\alpha = 0.05$, $u_\alpha = -1.645$, and for $(\xi_0 - \xi_1)/(\sigma/\sqrt{n}) = 1$, $\pi(\xi_1) = \Phi(-1.645 + 1) = 0.2594$. Therefore we can use Table 2.2 as it stands for the power of this other alternative hypothesis, $H_1$: $\xi = \xi_1 < \xi_0$, if we replace $(\xi_1 - \xi_0)/(\sigma/\sqrt{n})$ in the table by $(\xi_0 - \xi_1)/(\sigma/\sqrt{n})$.
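The power function (7.15) for the lower-tailed alternative is straightforward to evaluate. The sketch below is ours; the function name and its arguments are illustrative assumptions, and `statistics.NormalDist` supplies both $u_\alpha$ and $\Phi$.

```python
from statistics import NormalDist

std = NormalDist()              # standardized normal distribution


def power_lower_tail(xi0, xi1, sigma, n, alpha):
    """Power of the one-sided test against xi1 < xi0, following (7.15)."""
    u_alpha = std.inv_cdf(alpha)                 # negative for alpha < 1/2
    delta = (xi0 - xi1) / (sigma / n ** 0.5)     # standardized difference
    return std.cdf(u_alpha + delta)


# Standardized difference equal to 1 at the 0.05 level of significance.
print(round(power_lower_tail(0.0, -1.0, 1.0, 1, 0.05), 4))
```

When the two means coincide the function returns the level of significance itself, as it should, and the power increases with the number of observations.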
We can use (7.12) to calculate the number of observations necessary to give a specified power, $1 - \beta$, assuming that $\sigma$ is known. Putting (7.12) equal to $1 - \beta$,
is the entire righthand quadrant and is also a composite hypothesis.
Figure 2.9
Now suppose that, though $\theta_{10}$ is fixed under the null hypothesis, $\theta_{1a}$ is not precisely specified under the alternative hypothesis, and the remaining parameters $\theta_2, \ldots, \theta_k$ are not specified under either hypothesis, so that we are comparing two composite hypotheses. The two hypotheses will define regions in the parameter space. The symbol $\omega$ is given to the region corresponding to the null hypothesis, and we will use the symbol $\Omega'$ to represent the region corresponding to the alternative hypothesis. We cannot compute the left-hand side of (9.1) to use as a test statistic, as the values of $\theta_{1a}$ and $\theta_2, \ldots, \theta_k$ are all unspecified. However, we can compute an analog of (9.1), in which the unspecified parameters are given their maximum likelihood values. The numerator and denominator of (9.1) are both likelihood functions, and by definition the maximum likelihood estimates of the unspecified parameters will maximize the likelihoods. If $L(\omega)$ and $L(\Omega')$ are these two likelihood functions, then the analog of (9.1) is $L(\Omega')/L(\omega)$ and we would reject the null hypothesis for large values of this statistic.

The usual likelihood ratio test procedure is a slight modification of the above. If $\Omega$ represents the region in the parameter space made up of the union of the regions of the null and alternative hypotheses, i.e., $\Omega = \Omega' \cup \omega$, then we reject the null hypothesis for small values of the likelihood ratio $\lambda$,
$$\lambda = \frac{L(\omega)}{L(\Omega)}. \tag{9.2}$$
[Note that $\lambda$ is a function of the observations and hence is a random variable. Although using a Greek letter ($\lambda$) for a random variable violates our practice of confining the use of Greek letters to parameters, the use of $\lambda$ to represent the likelihood ratio is such a widely established convention that we continue the practice.] Clearly $\lambda$ is positive since it is a ratio of products of probability functions which must always be positive. Also $\lambda$ cannot be greater than 1 since the maximum value for $L$ varying the parameters in a region $\omega$ cannot exceed the maximum value for $L$ varying the parameters in a region $\Omega$, where $\omega$ is a subset of $\Omega$. Thus $\lambda$ must lie in the interval 0 to 1. A small value of $\lambda$ indicates that the likelihood computed using $\omega$, corresponding to the null hypothesis, is relatively unlikely, and so we should reject the null hypothesis. Conversely, a value of $\lambda$ close to 1 indicates that the null hypothesis is very plausible and should be accepted. We therefore define a critical region as
$$0 < \lambda < \lambda_c, \tag{9.3}$$
where $\lambda_c$ is a constant chosen so that the level of significance of the test is at the desired level $\alpha$.

In a two-parameter situation the test can be pictured graphically as follows. The two horizontal coordinates represent the two parameters and the vertical coordinate the likelihood. First, confined to region $\omega$ we wander round till we find as large a likelihood as possible: this is $L(\omega)$. Second, confined to a larger region $\Omega$ (which includes $\omega$) we wander round till we find a likelihood as large as possible: this is $L(\Omega)$. If $L(\Omega)$ is not much larger than $L(\omega)$, i.e., if $\lambda$ is close to 1, then the extra liberty of selection permitted under the alternative hypothesis has yielded small improvement in the likelihood, and so the null hypothesis is plausible. On the other hand, if $L(\Omega)$ is much larger than $L(\omega)$, then the null hypothesis appears implausible.

The likelihood ratio procedure produces a plausible statistic $\lambda$, and will indicate the general nature of the critical region. The critical region cannot be determined exactly for a test at the level of significance $\alpha$ unless we know the distribution of $\lambda$ under the null hypothesis. Wilks [9] showed that for large sample size (and moderate restrictions on the continuity of the likelihood function) the distribution of $-2\log\lambda$ tends to the $\chi^2(r - s)$ distribution where $r$ is the number of parameters fitted under $\Omega$ and $s$ the number of parameters fitted under $\omega$.

2.10. A Two-Sided Test of an Observation from a Normal Population with Known Variance

To apply the procedure of the previous section to testing that a group of $n$ observations came from a normal population with mean $\xi_0$ and known variance $\sigma^2$ against the alternative that the mean is $\xi_1 \neq \xi_0$, we write down
the likelihood function (3.6):
$$L = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} e^{-(1/2\sigma^2)\sum (x_i - \xi)^2}. \tag{10.1}$$
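In Section 2.3 we saw that, allowing $\xi$ to vary, $L$ was maximized by putting $\hat{\xi} = \sum x_i/n = \bar{x}$, (3.10). Thus
$$L(\Omega) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} e^{-(1/2\sigma^2)\sum (x_i - \bar{x})^2}. \tag{10.2}$$
Under the null hypothesis $H_0$ no parameters are allowed to vary, since we are assuming $\sigma^2$ to be fixed and known, and $\xi$ is fixed at $\xi_0$; thus
$$L(\omega) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} e^{-(1/2\sigma^2)\sum (x_i - \xi_0)^2}. \tag{10.3}$$
Substituting these values for $L(\omega)$ and $L(\Omega)$ in the definition of the likelihood ratio, (9.2), and canceling out common factors gives
$$\lambda = \frac{e^{-(1/2\sigma^2)\sum (x_i - \xi_0)^2}}{e^{-(1/2\sigma^2)\sum (x_i - \bar{x})^2}}. \tag{10.4}$$
But
$$\sum (x_i - \xi_0)^2 = \sum [(x_i - \bar{x}) + (\bar{x} - \xi_0)]^2 = \sum (x_i - \bar{x})^2 + n(\bar{x} - \xi_0)^2, \tag{10.5}$$
since
$$2\sum (x_i - \bar{x})(\bar{x} - \xi_0) = 2(\bar{x} - \xi_0)\sum (x_i - \bar{x}) = 2(\bar{x} - \xi_0) \cdot 0 = 0. \tag{10.6}$$
Thus the likelihood ratio is
$$\lambda = e^{-(n/2\sigma^2)(\bar{x} - \xi_0)^2}. \tag{10.7}$$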
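The reduction of the ratio of two normal likelihoods to a function of $\bar{x}$ alone can be checked numerically. The sketch below is illustrative only: the data values, $\xi_0 = 10$ and $\sigma = 1$ are invented for the check, and the function names are hypothetical. It computes $L(\omega)/L(\Omega)$ directly from the normal likelihood and compares it with $\exp\{-n(\bar{x} - \xi_0)^2/2\sigma^2\}$.

```python
import math


def log_likelihood(xs, mean, sigma):
    """Log of the normal likelihood for observations xs with known sigma."""
    n = len(xs)
    ss = sum((x - mean) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - ss / (2 * sigma ** 2)


def likelihood_ratio(xs, xi0, sigma):
    """lambda = L(omega) / L(Omega): mean fixed at xi0 versus mean at x-bar."""
    xbar = sum(xs) / len(xs)
    return math.exp(log_likelihood(xs, xi0, sigma) - log_likelihood(xs, xbar, sigma))


def likelihood_ratio_closed_form(xs, xi0, sigma):
    """exp(-n (x-bar - xi0)^2 / (2 sigma^2)), the reduced form of the ratio."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.exp(-n * (xbar - xi0) ** 2 / (2 * sigma ** 2))


# Hypothetical observations, chosen only to exercise the identity.
data = [10.2, 9.6, 11.1, 10.8, 9.9]
lam_direct = likelihood_ratio(data, xi0=10.0, sigma=1.0)
lam_closed = likelihood_ratio_closed_form(data, xi0=10.0, sigma=1.0)
print(lam_direct, lam_closed)
```

Both values agree to rounding error, and both lie between 0 and 1, as the general argument of Section 2.9 requires.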
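The edge of the critical region, $\bar{x}_c$, is determined by
$$e^{-(n/2\sigma^2)(\bar{x}_c - \xi_0)^2} = \lambda_c \tag{10.8}$$
or, solving for $\bar{x}_c$,
$$\bar{x}_c = \xi_0 - \sqrt{-2\log\lambda_c}\,\frac{\sigma}{\sqrt{n}} \quad\text{and}\quad \xi_0 + \sqrt{-2\log\lambda_c}\,\frac{\sigma}{\sqrt{n}}. \tag{10.9}$$
Since $\lambda_c$ is less than 1, $\log\lambda_c$ is negative, so $\bar{x}_c$ is of the form
$$\bar{x}_c = \xi_0 \pm k\frac{\sigma}{\sqrt{n}} \tag{10.10}$$
where $k = \sqrt{-2\log\lambda_c}$. From (10.7) we see that if $\bar{x}$ is far from $\xi_0$, in either direction, then $(\bar{x} - \xi_0)^2$ will be large, and hence $\lambda$ small. Thus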
large values of $|\bar{x} - \xi_0|$ call for rejection of the null hypothesis. The critical region is that part of the line below $\xi_0 - k\sigma/\sqrt{n}$ plus that part of the line above $\xi_0 + k\sigma/\sqrt{n}$; see Figure 2.10. The total area of the curve in the critical region has to equal $\alpha$: since the normal distribution is symmetrical about its mean this implies that the area of each tail is $\alpha/2$. The critical region will therefore be
$$\bar{x} < \xi_0 + u_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\text{and}\quad \bar{x} > \xi_0 + u_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}. \tag{10.11}$$

Figure 2.10 (critical regions of area $\alpha/2$ below $\xi_0 - k\sigma/\sqrt{n}$ and above $\xi_0 + k\sigma/\sqrt{n}$, with the region of acceptance between them)
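Since $u_{\alpha/2} = -u_{1-\alpha/2}$, the left-hand term can be written $\bar{x} < \xi_0 - u_{1-\alpha/2}\sigma/\sqrt{n}$, and (10.11) is equivalent to saying: reject if $(\bar{x} - \xi_0)/(\sigma/\sqrt{n})$ is less than $-u_{1-\alpha/2}$ or greater than $+u_{1-\alpha/2}$. This is equivalent to saying: reject if
$$\left|\frac{\bar{x} - \xi_0}{\sigma/\sqrt{n}}\right| > u_{1-\alpha/2}. \tag{10.12}$$
The power of this test, when $\xi = \xi_1$, is
$$\begin{aligned}
\pi(\xi_1) &= \Pr\left\{\frac{\bar{x} - \xi_0}{\sigma/\sqrt{n}} < -u_{1-\alpha/2}\right\} + \Pr\left\{\frac{\bar{x} - \xi_0}{\sigma/\sqrt{n}} > u_{1-\alpha/2}\right\} \\
&= \Pr\left\{\frac{\bar{x} - \xi_1}{\sigma/\sqrt{n}} + \frac{\xi_1 - \xi_0}{\sigma/\sqrt{n}} < -u_{1-\alpha/2}\right\} + \Pr\left\{\frac{\bar{x} - \xi_1}{\sigma/\sqrt{n}} + \frac{\xi_1 - \xi_0}{\sigma/\sqrt{n}} > u_{1-\alpha/2}\right\} \\
&= \Pr\left\{\frac{\bar{x} - \xi_1}{\sigma/\sqrt{n}} < u_{\alpha/2} - \frac{\xi_1 - \xi_0}{\sigma/\sqrt{n}}\right\} + 1 - \Pr\left\{\frac{\bar{x} - \xi_1}{\sigma/\sqrt{n}} < u_{1-\alpha/2} - \frac{\xi_1 - \xi_0}{\sigma/\sqrt{n}}\right\}.
\end{aligned} \tag{10.13}$$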
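The last line of (10.13) is easy to evaluate once the two probabilities are recognized as values of the unit normal cumulative distribution function. The following is a sketch under that reading; the function name and its arguments are illustrative, not the text's notation.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_power(xi0, xi1, sigma, n, alpha=0.05):
    """Evaluate the last line of (10.13) with Pr{u < k} = Phi(k).

    delta is the standardized shift (xi1 - xi0) / (sigma / sqrt(n)).
    """
    z = NormalDist()
    delta = (xi1 - xi0) / (sigma / sqrt(n))
    u_low = z.inv_cdf(alpha / 2)       # u_{alpha/2}, negative
    u_high = z.inv_cdf(1 - alpha / 2)  # u_{1-alpha/2}, positive
    return z.cdf(u_low - delta) + 1.0 - z.cdf(u_high - delta)


# Hypothetical settings: sigma = 1, n = 25, alternatives on either side of xi0 = 0.
power_at_null = two_sided_power(0.0, 0.0, 1.0, 25)
power_above = two_sided_power(0.0, 0.5, 1.0, 25)
power_below = two_sided_power(0.0, -0.5, 1.0, 25)
print(power_at_null, power_above, power_below)
```

When $\xi_1 = \xi_0$ the function returns $\alpha$, and alternatives at equal distances on either side of $\xi_0$ give equal power, as the symmetry of the critical region suggests.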
Now under the alternative hypothesis $\xi = \xi_1$, the random variable $(\bar{x} - \xi_1)/(\sigma/\sqrt{n})$ is a unit normal deviate. Each of these two probability statements is therefore of the form $\Pr\{u < k\} = \Phi(k)$ ...

... $T_2(\theta_0)\} = 1 - P_2$,
which imply
$$\Pr\{T_1(\theta_0) < T < T_2(\theta_0)\} = P_2 - P_1. \tag{14.5}$$
Now suppose that we have taken a sample of observations and have calculated the numerical value of the estimate, say $T_0$. In Figure 2.14 let the horizontal line parallel to the $\theta$ axis through the point $T_0$ on the $T$ axis intercept the two curves $T_2(\theta)$ and $T_1(\theta)$ at points $A$ and $B$. Drop lines parallel to the $T$ axis through the points $A$ and $B$ to intercept the $\theta$ axis at points $\underline{\theta}$ and $\bar{\theta}$. We assert that the interval $(\underline{\theta}, \bar{\theta})$ is a $P_2 - P_1$ confidence interval for $\theta$; i.e.,
$$\Pr\{\underline{\theta} < \theta < \bar{\theta}\} = P_2 - P_1. \tag{14.6}$$
The justification for this assertion is as follows. Enter $\theta_0$, the true value of $\theta$, on the $\theta$ axis. Erect the perpendicular at this point to cut the curves $T_1(\theta)$ at $C$ and $T_2(\theta)$ at $D$. At both these points $\theta$ has the value $\theta_0$, so, at $C$, $T = T_1(\theta_0)$, and, at $D$, $T = T_2(\theta_0)$. Now draw the horizontal lines through $C$ and $D$ to cut the $T$ axis at $T_1(\theta_0)$ and $T_2(\theta_0)$ respectively.

Figure 2.14

As drawn, Figure 2.14 has three properties:
(a) $AB$ intersects $CD$,
(b) $T_0$ lies in the interval $(T_1(\theta_0), T_2(\theta_0))$,
(c) The interval $(\underline{\theta}, \bar{\theta})$ includes $\theta_0$.

Figure 2.15 is constructed by the same procedure as Figure 2.14, the difference being that in Figure 2.15 we assume that $\theta_0$ is greater than $\bar{\theta}$. Figure 2.15 has three properties:
(d) $AB$ does not intersect $CD$,
(e) $T_0$ does not lie in the interval $(T_1(\theta_0), T_2(\theta_0))$,
(f) The interval $(\underline{\theta}, \bar{\theta})$ does not include $\theta_0$.

Thus the two statements
(b) $T_0$ lies in the interval $(T_1(\theta_0), T_2(\theta_0))$,
(c) The interval $(\underline{\theta}, \bar{\theta})$ includes $\theta_0$,
are always true simultaneously or false simultaneously. But by (14.5), the event (b) has probability $P_2 - P_1$, so the event (c) must also have the same probability, namely $P_2 - P_1$. Hence we can write
$$\Pr\{\underline{\theta} < \theta_0 < \bar{\theta}\} = P_2 - P_1. \tag{14.7}$$
It will be noted that $\underline{\theta}$ and $\bar{\theta}$ are functions of $T_0$, which is a function of the observations and hence is a random variable, and therefore $\underline{\theta}$ and $\bar{\theta}$ are random variables. The confidence interval $(\underline{\theta}, \bar{\theta})$ is thus a random variable.

Figure 2.15

On repeating the experiment of taking the sample, calculating the estimate $T_0$, and constructing the confidence interval, in general we will get different numerical values for $\underline{\theta}$, $\bar{\theta}$. Nevertheless, our procedure guarantees that in a proportion $P_2 - P_1$ of those experiments the intervals $(\underline{\theta}, \bar{\theta})$ will include the true value of $\theta$.
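The repeated-experiment interpretation can be illustrated by simulation. The sketch below is not the text's construction: it assumes the familiar interval $\bar{x} \pm u\,\sigma/\sqrt{n}$ for a normal mean with known $\sigma$, and the true mean, $\sigma$, sample size, number of trials and seed are all invented for the illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(true_mean, sigma, n, level=0.95, trials=4000, seed=1):
    """Fraction of simulated intervals x-bar +/- u * sigma / sqrt(n) that cover true_mean.

    Assumes normal observations with known sigma; all settings are hypothetical.
    """
    rng = random.Random(seed)
    u = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = u * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        if xbar - half_width < true_mean < xbar + half_width:
            hits += 1
    return hits / trials


observed_coverage = coverage(true_mean=5.0, sigma=2.0, n=10)
print(observed_coverage)
```

The limits change from sample to sample, but the observed fraction of intervals covering the true mean settles near the nominal level, here $P_2 - P_1 = 0.95$.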
2.15. Confidence Limits for the Mean of a Normal Distribution with Known Variance

We will now illustrate the above procedure by finding confidence limits for the mean $\xi$ of a normal distribution with known variance $\sigma^2$. We assume that we have a sample $x_1, \ldots, x_n$. We know that the mean of the observations, say $\bar{X}$, is an unbiased estimator of $\xi$, with density function (1.26.1). We also know from (1.26.3) that the cumulative distribution function of $\bar{X}$ is
$$\Pr\{\bar{X} < \bar{x}\} = \int_{-\infty}^{\bar{x}} p\{\bar{x}\}\,d\bar{x} = \ldots$$

... distribution (1C.8), for a single observation $x$ show that the maximum likelihood estimator of $\theta$ is $a/x$. This is a biased estimator. Show that $(a - 1)/(x - 1)$ is an unbiased estimator of $\theta$.

2.5. (a) Given a sample of size $n$, namely $x_1, \ldots, x_n$, from an exponential distribution,
$$p\{x\} = \theta e^{-\theta x}, \qquad 0 \le x < \infty,$$
...
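$$\Pr\{X = x\} \simeq \Phi\left(\frac{x + 1/2 - n\theta}{\sqrt{n\theta(1-\theta)}}\right) - \Phi\left(\frac{x - 1/2 - n\theta}{\sqrt{n\theta(1-\theta)}}\right). \tag{1.24}$$
Suppose $n = 100$, $\theta = 0.1$, $x = 10$; then
$$\Pr\{X = 10\} \simeq \Phi\left(\frac{10 + 1/2 - 100 \times 0.1}{\sqrt{100 \times 0.1 \times 0.9}}\right) - \Phi\left(\frac{10 - 1/2 - 100 \times 0.1}{\sqrt{100 \times 0.1 \times 0.9}}\right) = \Phi(0.1667) - \Phi(-0.1667) = 0.5662 - 0.4338 = 0.1324. \tag{1.25}$$
The exact answer is 0.1319. The alternative forms of (1.22) and (1.23) referring to $h = x/n$, obtained by dividing the numerator and denominator of the argument of $\Phi(\ )$ by $n$, are
$$\Pr\{H \le h\} \simeq \Phi\left(\frac{h + 1/2n - \theta}{\sqrt{\theta(1-\theta)/n}}\right), \tag{1.26}$$
$$\Pr\{H < h\} \simeq \Phi\left(\frac{h - 1/2n - \theta}{\sqrt{\theta(1-\theta)/n}}\right). \tag{1.27}$$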
For a rigorous proof of the results of this section, with careful attention to remainder terms, etc., the reader is referred to Uspensky [5] or Feller [6]. The approximations developed in this section are usually satisfactory if $n\theta(1 - \theta) > 9$. Thus if $\theta = 1/2$, an $n$ of 36 is large enough, but if $\theta = 1/10$, $n$ needs to be $\ge 100$.
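The worked example of this section, with $n = 100$, $\theta = 0.1$ and $x = 10$, can be reproduced in a few lines. This is a minimal sketch; the function names are illustrative, and the continuity correction of $\pm 1/2$ is the one used in the example.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_point_exact(x, n, theta):
    """Exact binomial probability of exactly x successes in n trials."""
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)


def binomial_point_normal(x, n, theta):
    """Normal approximation with continuity correction of one half on each side."""
    z = NormalDist()
    sd = sqrt(n * theta * (1 - theta))
    return z.cdf((x + 0.5 - n * theta) / sd) - z.cdf((x - 0.5 - n * theta) / sd)


approx = binomial_point_normal(10, 100, 0.1)
exact = binomial_point_exact(10, 100, 0.1)
print(approx, exact)
```

The two values agree with the figures 0.1324 and 0.1319 quoted in the example, and $n\theta(1-\theta) = 9$ here sits right at the boundary of the rule of thumb.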
3.2. Testing Hypotheses about the Binomial Distribution with a Normal Approximation

Suppose that we observe $x$ defectives in a sample of size $n$ and wish to test the null hypothesis that the fraction defective $\theta$ equals $\theta_0$ against the alternative hypothesis $\theta = \theta_1$ where $\theta_1 > \theta_0$. This problem was discussed in Section 2.8, in terms of hypothesis testing at a stated level of significance. Alternatively, we can set
$$\Pr\{X \ge x\} = \sum_{\nu = x}^{n} \binom{n}{\nu}\theta_0^{\nu}(1 - \theta_0)^{n - \nu} = P, \tag{2.1}$$
and, for example, using the tables cited in [1], [2], [3], or [4], evaluate $P$. This will be the $P$ value of the null hypothesis. To use the normal approximation to the binomial distribution, we note that
$$\Pr\{X < x\} + \Pr\{X \ge x\} = 1, \tag{2.2}$$
whence, using (1.23),
$P = \Pr\{X \ge x\} = 1 -$
Pr{X ...

... is a random variable, and likewise $n_{.2}$. We wish to test the null hypothesis that $\theta_1 = \theta_2 = \theta$, say. Define
$$h_1 = \frac{n_{11}}{n_{1.}}, \qquad h_2 = \frac{n_{21}}{n_{2.}}. \tag{6.1}$$
Under the null hypothesis, (6.2)
$$V[h_1 - h_2] = \theta(1 - \theta)\left(\frac{1}{n_{1.}} + \frac{1}{n_{2.}}\right). \tag{6.3}$$
To estimate $V[h_1 - h_2]$ we need an estimate of
$$\theta(1 - \theta) = \theta - \theta^2. \tag{6.4}$$
Under the null hypothesis that $\theta_1 = \theta_2 = \theta$ we can use the column totals as referring to a single sample of size $n$ from a population with fraction defective $\theta$. From (1.15.8), $n_{.1}/n$ is an unbiased estimator of $\theta$, and from (1.17.9),
$$\frac{n_{.1}(n_{.1} - 1)}{n(n - 1)} \tag{6.5}$$
is an unbiased estimator of $\theta^2$. Thus an unbiased estimator of $\theta(1 - \theta)$ is
$$\frac{n_{.1}}{n} - \frac{n_{.1}(n_{.1} - 1)}{n(n - 1)} = \frac{n_{.1}n_{.2}}{n(n - 1)}, \tag{6.6}$$
whence
$$\hat{V}[h_1 - h_2] = \frac{n_{.1}n_{.2}}{n(n - 1)}\left(\frac{1}{n_{1.}} + \frac{1}{n_{2.}}\right) = \frac{n_{.1}n_{.2}}{n(n - 1)} \cdot \frac{n_{2.} + n_{1.}}{n_{1.}n_{2.}} = \frac{n_{.1}n_{.2}}{(n - 1)n_{1.}n_{2.}}. \tag{6.7}$$
Therefore, under the null hypothesis that $E[h_1] - E[h_2] = 0$,
$$\frac{h_1 - h_2}{\sqrt{n_{.1}n_{.2}/[(n - 1)n_{1.}n_{2.}]}} \tag{6.8}$$
$$= \frac{n_{11} - n_{1.}n_{.1}/n}{\left[n_{1.}n_{.1}n_{2.}n_{.2}/n^2(n - 1)\right]^{1/2}} \tag{6.9}$$
is a unit deviate asymptotically normal. The form (6.9) will be discussed in Section 3.10. Empirical investigation seems to indicate that the approximation is improved by replacing the $n - 1$ in (6.8) by $n$. The statistic can then be written in the form
$$\frac{h_1 - h_2}{\sqrt{h(1 - h)(1/n_{1.} + 1/n_{2.})}} \tag{6.10}$$
where $h = n_{.1}/n$. The approximation is also improved by introducing corrections for continuity. If the alternative hypothesis is that $\theta_1 > \theta_2$, then the $P$ value is derived from
$$u_{1-P} = \frac{(h_1 - 1/2n_{1.}) - (h_2 + 1/2n_{2.})}{\sqrt{h(1 - h)(1/n_{1.} + 1/n_{2.})}}. \tag{6.11}$$
If the alternative hypothesis is $\theta_1 < \theta_2$, then the $P$ value is derived from
$$u_{P} = \frac{(h_1 + 1/2n_{1.}) - (h_2 - 1/2n_{2.})}{\sqrt{h(1 - h)(1/n_{1.} + 1/n_{2.})}}. \tag{6.12}$$
For a two-sided test, if $h_1 > h_2$ we use (6.11) with $u_{1-P}$ replaced by $u_{1-P/2}$, and if $h_1 < h_2$ we use (6.12) with $u_P$ replaced by $u_{P/2}$.

A criterion for deciding whether the approximation of this section is satisfactory is to compute the quantities $n_{i.}n_{.j}/n$ for all four combinations
of $i$ and $j$. If in every instance this quantity is greater than 5, then the approximation is good. The figure 5 is perhaps rather conservative, and something of the order of 3.5 will usually be adequate. If this criterion is not satisfied, so we cannot use this normal approximation, then this situation can be handled with Fisher's exact test, described in Section 3.10.

As an example, Table 3.2 shows the number of cases with reactions observed on using two types of rubber tubing for injection of a certain substance. If we assume that patients were allocated at random to the two "treatments," rubber A and rubber B, any difference between the groups will be attributable to a difference between the rubbers. (On the other hand, if random allocation was not used, it will probably be impossible to conclude anything useful from the data. If there is a significant difference, we will not know to what it is due.)

Table 3.2. Number of cases

Type of rubber    With reactions    Without reactions    Totals
A                       27                 13               40
B                        5                 10               15
Totals                  32                 23               55

The smallest of the four quantities $n_{i.}n_{.j}/n$ is $23 \times 15/55 = 6.27$. Since this is greater than 5, the normal approximation of this section will be satisfactory. We calculate $h = 32/55 = 0.5818$ and then compute, for a two-tailed test,
$$\frac{\left(\dfrac{27}{40} - \dfrac{1}{2 \times 40}\right) - \left(\dfrac{5}{15} + \dfrac{1}{2 \times 15}\right)}{\left[0.5818(1 - 0.5818)\left(\dfrac{1}{40} + \dfrac{1}{15}\right)\right]^{1/2}} = 1.981 = u_{1-P/2}, \tag{6.13}$$
whence $P = 0.0476$. We would thus conclude at the 5 per cent level of significance that the two groups differed in their percentage reactions, presumably on account of the different rubber tubing.

For calculating powers and necessary sample sizes, the angular transformation is convenient: $y_1 = 2\arcsin\sqrt{h_1}$, $y_2 = 2\arcsin\sqrt{h_2}$ have variances $1/n_1$, $1/n_2$, writing $n_1 = n_{1.}$, $n_2 = n_{2.}$, so $V[y_1 - y_2] = 1/n_1 + 1/n_2$. Let $\eta_1$, $\eta_2$ be equal to $2\arcsin\sqrt{\theta_1}$, $2\arcsin\sqrt{\theta_2}$ respectively. The power of a two-sided test is given by appropriate substitutions in (2.10.15):
$$\pi(\theta_1 - \theta_2) = \Phi\left(u_{\alpha/2} - \frac{\eta_1 - \eta_2}{\sqrt{1/n_1 + 1/n_2}}\right) + \Phi\left(u_{\alpha/2} + \frac{\eta_1 - \eta_2}{\sqrt{1/n_1 + 1/n_2}}\right), \tag{6.14}$$
and for a one-sided test against the alternative $\theta_1 > \theta_2$, (2.7.12) gives
$$\pi(\theta_1 - \theta_2) = \Phi\left(u_{\alpha} + \frac{\eta_1 - \eta_2}{\sqrt{1/n_1 + 1/n_2}}\right). \tag{6.15}$$
For calculating necessary sample sizes we need to make some assumption about the relative magnitudes of $n_1$ and $n_2$. It is easy to show that $n_1 + n_2$ for a specified power, etc., is a minimum when $n_1 = n_2 = m$, say, so that the standard deviation of $y_1 - y_2$ is $\sqrt{2/m}$. For the two-sided test, (2.10.19) gives
$$m = \frac{2(u_{1-\beta} + u_{1-\alpha/2})^2}{(\eta_1 - \eta_2)^2}, \tag{6.16}$$
and for the one-sided test, (2.7.18) gives
$$m = \frac{2(u_{1-\beta} + u_{1-\alpha})^2}{(\eta_1 - \eta_2)^2}. \tag{6.17}$$
For example, if we wish to have a power $1 - \beta$ of 0.95 of rejecting the null hypothesis $\theta_1 = \theta_2$ at the level of significance $\alpha = 0.01$, making a one-sided test, when in fact $\theta_1 = 0.10$, $\theta_2 = 0.05$, then
$$m = \frac{2(1.645 + 2.326)^2}{(0.6435 - 0.4510)^2} = 850. \tag{6.18}$$
Note that $m = 850$ implies that $n_1 = n_2 = 850$, i.e., we need 850 observations on each population, making a total of 1700 observations.
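The two numerical examples of this section, the rubber-tubing comparison of Table 3.2 and the sample size (6.18), can be reproduced with a short script. This is a sketch under the formulas (6.11) and (6.17) as given above; the function names are illustrative, and the first function assumes $h_1 > h_2$, as in the example.

```python
from math import asin, sqrt
from statistics import NormalDist

Z = NormalDist()


def two_proportion_z(x1, n1, x2, n2):
    """Continuity-corrected statistic (6.11); assumes x1 / n1 > x2 / n2."""
    h1, h2 = x1 / n1, x2 / n2
    h = (x1 + x2) / (n1 + n2)  # pooled proportion n.1 / n
    numerator = (h1 - 1 / (2 * n1)) - (h2 + 1 / (2 * n2))
    return numerator / sqrt(h * (1 - h) * (1 / n1 + 1 / n2))


def sample_size_one_sided(theta1, theta2, alpha, power):
    """Observations per group from (6.17), using eta = 2 arcsin sqrt(theta)."""
    eta1 = 2 * asin(sqrt(theta1))
    eta2 = 2 * asin(sqrt(theta2))
    return 2 * (Z.inv_cdf(power) + Z.inv_cdf(1 - alpha)) ** 2 / (eta1 - eta2) ** 2


z_stat = two_proportion_z(27, 40, 5, 15)  # Table 3.2: rubber A versus rubber B
p_two_sided = 2 * (1 - Z.cdf(z_stat))     # two-sided P value
m = sample_size_one_sided(0.10, 0.05, alpha=0.01, power=0.95)
print(z_stat, p_two_sided, m)
```

The statistic and $P$ value match 1.981 and 0.0476; the sample size comes out within a unit or two of the 850 quoted in (6.18), the small difference being due to rounding of the normal points and angles in the hand calculation.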
3.7. The Correlated Two × Two Table

The standard way of presenting the results of two sets of independent trials is a 2 × 2 table such as Table 3.1, and the null hypothesis that the proportion of "successes" is the same in each population can be tested by the methods of Section 3.6. However, there are some experimental situations which give rise to data which may also be put in a 2 × 2 table but which cannot correctly be analyzed in that way.

If we were given Table 3.3 (from Mosteller [8]) and asked to test the null hypothesis that the probability of nausea was the same with either drug, we might be tempted to proceed with the normal approximation for the comparison of two proportions, Section 3.6, without pausing to inquire how the data were obtained.

Table 3.3

Drug      Nausea    No nausea    Totals
A           18         82          100
B           10         90          100
Totals      28        172          200

A naive interpretation of Table 3.3 would be that we took two samples each of 100 patients and gave one set of 100 drug A and the other set of 100 drug B. In actual fact, the data were obtained from just 100 patients, each of whom received both drugs, with results as sketched in Table 3.4, where $N$ and $\bar{N}$ mean nausea and no nausea.

Table 3.4

Patient    Drug A       Drug B
1          $N$          $N$
2          $N$          $\bar{N}$
3          $\bar{N}$    $N$
4          $\bar{N}$    $\bar{N}$
5          ...          ...
...
100        ...          ...

Nine patients, like patient number 1, had nausea with both drugs; 9 patients like number 2 had nausea with drug A but not with drug B; only 1 patient, number 3, had nausea with drug B but not with A; and 81 patients like patient number 4 had no nausea with either drug. The data of Table 3.4 should therefore be summarized as in Table 3.5.

Table 3.5

                      Drug A
Drug B        $N$    $\bar{N}$    Totals
$N$             9        1          10
$\bar{N}$       9       81          90
Totals         18       82         100

Table 3.6

                      Drug A
Drug B        $N$           $\bar{N}$      Totals
$N$           $\pi_{11}$    $\pi_{12}$     $\pi_{1.}$
$\bar{N}$     $\pi_{21}$    $\pi_{22}$     $\pi_{2.}$
Totals        $\pi_{.1}$    $\pi_{.2}$
As can be seen from the column totals, there were 18 cases of nausea and 82 without nausea with drug A, which is how the first row of Table 3.3 was formed. Likewise the row totals, 10 and 90, are the figures for drug B in Table 3.3. The essential features of the original data in Table 3.4 have been lost in the summarization of Table 3.3 but have been retained in Table 3.5. Table 3.3 can be constructed from Table 3.5, but the reverse is not true.

Table 3.6 gives the population proportions corresponding to the observed frequencies of Table 3.5. We are interested in the difference $\pi_{1.} - \pi_{.1}$, which is
$$\pi_{1.} - \pi_{.1} = (\pi_{11} + \pi_{12}) - (\pi_{11} + \pi_{21}) = \pi_{12} - \pi_{21}. \tag{7.1}$$
Thus in Table 3.5, the 9 patients who had nausea with both drugs, and the 81 who had nausea with neither, tell us nothing about the difference between the drugs. All the information on this question is contained in the other diagonal, presented in Table 3.7.

Table 3.7. Patients Who Responded Differently to the Two Drugs

Favorably to A and unfavorably to B       1
Unfavorably to A and favorably to B       9
Total                                    10

If there was no difference between the drugs, we would expect these 10 patients to be split on the average 50:50 between the two categories in Table 3.7. The one-sided $P$ value for the null hypothesis is thus given by the sum of the terms in the binomial tail:
$$\Pr\{X \le x \mid n, \theta\} = \sum_{\nu = 0}^{x}\binom{n}{\nu}\theta^{\nu}(1 - \theta)^{n - \nu}. \tag{7.2}$$
With $\theta = 1/2$, $n = 10$, $x = 1$, we have
$$\Pr\{X \le 1 \mid 10, 1/2\} = \sum_{\nu = 0}^{1}\binom{10}{\nu}\left(\frac{1}{2}\right)^{\nu}\left(1 - \frac{1}{2}\right)^{10 - \nu} = \frac{1}{2^{10}}\left(\frac{10!}{0!\,10!} + \frac{10!}{1!\,9!}\right) = 0.01074. \tag{7.3}$$
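The binomial tail in (7.3) is small enough to evaluate directly. The following minimal sketch sums the terms of (7.2); the function name is illustrative.

```python
from math import comb


def binomial_lower_tail(x, n, theta=0.5):
    """Sum of binomial terms for 0, 1, ..., x successes, as in (7.2)."""
    return sum(comb(n, v) * theta ** v * (1 - theta) ** (n - v) for v in range(x + 1))


# Table 3.7: of the 10 patients who responded differently, 1 favored drug A.
p_one_sided = binomial_lower_tail(1, 10)
print(p_one_sided)
```

The sum is $11/1024$, which rounds to the 0.01074 of (7.3). Only the 10 discordant patients enter the calculation; the 90 patients who responded alike to both drugs play no part.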
Alternatively, with the normal approximation (1.22), Pr{X ≤ x} ...

... $\theta = 1/6$, what is the probability of the null hypothesis being rejected if in fact $\theta = 1/5$? Use both the methods of Sections 3.2 and 3.4.

(b) If instead of fixing the sample size at 115, the sample size is to be chosen so that the probability of rejecting the null hypothesis $\theta = 1/6$ when in fact $\theta = 1/5$ is 0.90, what should this sample size be? Use both the methods of Sections 3.2 and 3.4.

Suppose that the experiment is actually carried out and in 115 jobs 27 have the front cylinder in the worst condition.

(c) What is the $P$ value of the null hypothesis that $\theta = 1/6$ (assuming that the alternative hypothesis is $\theta > 1/6$)?

(d) Give 95 per cent confidence limits for the proportion $\theta$. Since the sample size is rather large, a normal approximation will be satisfactory.

3A.3. Fertilizer was stored in drums of a certain type. After a certain period, it was observed that in a sample of 63 drums 1 had split seams. Calculate 90 per cent confidence limits for the proportion of split drums in that population of drums (a) by a normal approximation, (b) exactly.

3A.4. The Chevalier de Mere thought that it paid to bet evens on (i) getting one or more sixes in 4 throws of a die; but not on (ii) getting one or more double sixes in 24 throws with a pair of dice. In point of fact, the true probabilities (assuming fair dice) for these events are 0.51775 and 0.49141 respectively. Suppose that you are planning experiments to test empirically the null hypotheses (a) that the probability of (i) is 0.5, against the alternative that it is 0.51775; (b) that the probability of (i) is equal to the probability of (ii), against the alternative that they are 0.51775 and 0.49141 respectively. Assuming one-sided tests of significance with $\alpha = 0.05$, in each case, (a) and (b), how many observations should be taken to give probability 0.9 of rejecting the null hypothesis?

3A.5. Suppose that you are planning a clinical trial for a proposed vaccine against a disease which has a low and variable incidence from year to year. Therefore it is necessary to run a control group. Suppose that the average incidence for a season is 1 per 50,000. Suppose that you wish the trial to have a probability of 0.99 of rejecting the null hypothesis that the vaccine is without effect if in fact the vaccine reduces the incidence by one-half. Suppose that a level of significance 0.01 is to be used, and only a one-sided alternative is to be considered. Assuming that the two groups will be made the same size, what is the total number of subjects required? (Note that for very small values of $x$, $\sin x \simeq x$ and $\arcsin x \simeq x$.)

3A.6. In an experiment to test whether seeding clouds of a certain type causes them to produce radar echoes corresponding to the occurrence of rain (R. R. Braham, L. J. Battan, and H. R. Byers, "Artificial Nucleation of Cumulus Clouds," Report No. 24, Cloud Physics Project, University of Chicago), on each suitable day a plane flew through two clouds that met certain criteria and one of the pair, chosen at random, was seeded and the other not. For 46 flights, echoes occurred in both clouds 5 times, in the unseeded cloud only 6 times, in the seeded cloud only 17 times, and in neither cloud 18 times. Find the one-sided $P$ value for the null hypothesis that seeding is without effect.
3.8. The Hypergeometric Distribution

Suppose that from a population of $N$ elements, of which $M$ are defective and $N - M$ are nondefective, we draw a sample of size $n$, without replacement. What is the probability that our sample contains exactly $x$ defectives, $p\{x\}$? The following equations are evident:
$$\Pr\{\text{first is defective}\} = \frac{M}{N}, \tag{8.1}$$
$$\Pr\{\text{second is defective} \mid \text{first is defective}\} = \frac{M - 1}{N - 1}, \tag{8.2}$$
$$\Pr\{x\text{th is defective} \mid \text{first, second}, \ldots, (x - 1)\text{th are defective}\} = \frac{M - (x - 1)}{N - (x - 1)}, \tag{8.3}$$
$$\Pr\{(x + 1)\text{th is nondefective} \mid \text{first } x \text{ were defective}\} = \frac{N - M}{N - x}, \tag{8.4}$$
$$\Pr\{(x + 2)\text{th is nondefective} \mid \text{first } x \text{ were defective and the } (x + 1)\text{th was nondefective}\} = \frac{N - M - 1}{N - x - 1}, \tag{8.5}$$
$$\Pr\{[x + (n - x)]\text{th is nondefective} \mid \text{first } x \text{ were defective and } (x + 1)\text{th}, \ldots, [x + (n - x - 1)]\text{th were nondefective}\} = \frac{N - M - (n - x - 1)}{N - (x + n - x - 1)} = \frac{N - M - n + x + 1}{N - n + 1}. \tag{8.6}$$
Thus the probability of drawing the sequence $D \cdots D\,\bar{D} \cdots \bar{D}$, in which $D$ recurs $x$ times and $\bar{D}$ recurs $n - x$ times, is the product of the probabilities (8.1), ..., (8.6):
$$\Pr\{D \cdots D\,\bar{D} \cdots \bar{D}\} = \frac{M(M - 1)\cdots(M - x + 1) \times (N - M)(N - M - 1)\cdots(N - M - n + x + 1)}{N(N - 1)\cdots(N - x + 1) \times (N - x)(N - x - 1)\cdots(N - n + 1)}. \tag{8.7}$$
If we multiply numerator and denominator by
$$(N - n)!\,(N - M - n + x)!\,(M - x)! \tag{8.8}$$
we obtain
$$\Pr\{D \cdots D\,\bar{D} \cdots \bar{D}\} = \frac{(N - n)!\,M!\,(N - M)!}{(M - x)!\,(N - M - n + x)!\,N!}. \tag{8.9}$$
This probability was derived for the specific sequence of $x$ $D$'s and $n - x$ $\bar{D}$'s, but will hold for any other sequence of the same numbers of $D$'s and $\bar{D}$'s. By (1.7.7) the number of such sequences is $n!/x!\,(n - x)!$. Thus the probability of obtaining $x$ $D$'s and $n - x$ $\bar{D}$'s, the particular sequence not being specified and being regarded as irrelevant, is
$$p\{x\} = \frac{n!}{x!\,(n - x)!} \cdot \frac{(N - n)!\,M!\,(N - M)!}{(M - x)!\,(N - M - n + x)!\,N!} = \frac{\dbinom{M}{x}\dbinom{N - M}{n - x}}{\dbinom{N}{n}}, \tag{8.10}$$
where there are certain restrictions on the variable $x$, namely
$$0 \le x \le n, \qquad 0 \le x \le M, \qquad 0 \le n - x \le N - M. \tag{8.11}$$
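The agreement between the sequential argument and the closed form (8.10) can be verified with exact rational arithmetic. The sketch below is illustrative: the population values $N = 20$, $M = 5$, $n = 6$ are invented for the check, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(x, N, M, n):
    """Closed form (8.10), returning zero outside the range (8.11)."""
    if not (0 <= x <= min(n, M) and 0 <= n - x <= N - M):
        return Fraction(0)
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))


def sequence_probability(x, N, M, n):
    """Product (8.7): x defectives drawn first, then n - x nondefectives."""
    p = Fraction(1)
    for i in range(x):            # factors (M - i) / (N - i), as in (8.1) to (8.3)
        p *= Fraction(M - i, N - i)
    for j in range(n - x):        # factors (N - M - j) / (N - x - j), as in (8.4) to (8.6)
        p *= Fraction(N - M - j, N - x - j)
    return p


# Hypothetical population: 20 elements, 5 defective, sample of 6.
N_pop, M_def, n_samp = 20, 5, 6
pmf = [hypergeom_pmf(x, N_pop, M_def, n_samp) for x in range(n_samp + 1)]
via_sequences = [comb(n_samp, x) * sequence_probability(x, N_pop, M_def, n_samp)
                 for x in range(n_samp + 1)]
print(pmf == via_sequences, sum(pmf))
```

Multiplying the probability of one ordered sequence by the number of orderings reproduces (8.10) term by term, and the probabilities sum to one over the admissible values of $x$.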
This result can be obtained more directly by the following argument. There are $\binom{N}{n}$ different ways, all assumed equally likely, in which to draw a sample of size $n$ from a population of $N$ elements. We want the number of ways which give samples with $x$ defective elements. The $x$ defective elements must have been drawn from the $M$ defective elements in the population, and this can be done in $\binom{M}{x}$ ways. Likewise the $n - x$ nondefective elements must have been drawn from the $N - M$ nondefective elements in the population, and this can be done in $\binom{N - M}{n - x}$ ways. These two drawings are independent. Therefore there are $\binom{M}{x}\binom{N - M}{n - x}$ ways of drawing a sample of size $n$ with $x$ defectives. Thus $p\{x\}$, being the number of ways of drawing a sample of size $n$ with $x$ defectives divided by the total number of ways of drawing a sample of size $n$, is
$$p\{x\} = \frac{\dbinom{M}{x}\dbinom{N - M}{n - x}}{\dbinom{N}{n}},$$
which is the same result as (8.10).

The hypergeometric probability function (8.10) is tedious to handle numerically, except with the aid of a published table which goes up to $N = 100$ [9]. We might expect that in the limiting case in which the population size is large compared with the sample size it might be approximated by a binomial probability function, since in this case the distinction between sampling without replacement (hypergeometric) and with replacement (binomial) tends to zero. This we will now demonstrate. We write (8.10) in the form
$$\begin{aligned}
p\{x\} &= \frac{M!}{x!\,(M - x)!} \cdot \frac{(N - M)!}{(n - x)!\,(N - M - n + x)!} \cdot \frac{n!\,(N - n)!}{N!} \\
&= \frac{M(M - 1)\cdots(M - x + 1)}{x!} \times \frac{(N - M)(N - M - 1)\cdots(N - M - n + x + 1)}{(n - x)!} \times \frac{n!}{N(N - 1)\cdots(N - n + 1)} \\
&= \frac{n!}{x!\,(n - x)!} \times \frac{M(M - 1)\cdots(M - x + 1)\,(N - M)(N - M - 1)\cdots(N - M - n + x + 1)}{N(N - 1)\cdots(N - n + 1)}.
\end{aligned} \tag{8.12}$$
Ignoring the first part of (8.12), $n!/x!\,(n - x)!$, the numerator is made up of two sequences of terms, the first sequence, $M(M - 1)\cdots$, being $x$ in number, and the second sequence, $(N - M)(N - M - 1)\cdots$, being $n - x$ in number, so the total number of these terms is $x + (n - x) = n$. In the denominator, the sequence $N(N - 1)\cdots$ is made up of $n$ terms. Dividing both numerator and denominator by $N^n$, there will be one $N$ for every term in the numerator and denominator. Denoting the proportion of defectives in the population, $M/N$, as $\theta$, we get
$$p\{x\} = \frac{n!}{x!\,(n - x)!} \cdot \frac{\theta\left(\theta - \dfrac{1}{N}\right)\cdots\left(\theta - \dfrac{x - 1}{N}\right)(1 - \theta)\left(1 - \theta - \dfrac{1}{N}\right)\cdots\left(1 - \theta - \dfrac{n - x - 1}{N}\right)}{1\left(1 - \dfrac{1}{N}\right)\cdots\left(1 - \dfrac{n - 1}{N}\right)}. \tag{8.13}$$
If now $N$ tends to infinity with $\theta$ held constant, and $n$ and $x$ also fixed,
$$p\{x\} \to \binom{n}{x}\theta^{x}(1 - \theta)^{n - x}, \tag{8.14}$$
which is the usual binomial probability function. A rough criterion for the validity of the approximation is that $n/N < 0.1$; i.e., for the sample size to be less than 10 per cent of the population size. As an illustration of the degree of approximation given in one case with $n/N = 0.1$, Table 3.8 gives the values of $p\{x\}$ for the hypergeometric probability function with $N = 1000$, $M = 20$, $n = 100$ along with the values of $p\{x\}$ for the binomial probability function with $\theta = M/N = 20/1000 = 0.02$, $n = 100$. The column for the binomial with $\theta = 0.01$, $n = 200$, and the column headed Poisson will be referred to in Section 3.11.

Table 3.8

         Hypergeometric       Binomial          Binomial          Poisson
  x      N = 1000, M = 20,    θ = 0.02,         θ = 0.01,         ξ = 2
         n = 100              n = 100           n = 200
  0         0.1190             0.1326            0.1340            0.1353
  1         0.2701             0.2707            0.2707            0.2707
  2         0.2881             0.2734            0.2720            0.2707
  3         0.1918             0.1823            0.1813            0.1804
  4         0.0895             0.0902            0.0902            0.0902
  5         0.0311             0.0353            0.0357            0.0361
  6         0.0083             0.0114            0.0117            0.0120
  7         0.0018             0.0031            0.0033            0.0034
  8         0.0003             0.0007            0.0008            0.0009
  9         0.0000             0.0002            0.0001            0.0002
 10         0.0000             0.0000            0.0000            0.0000
100         0.0000             0.0000            0.0000            0.0000
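The entries of Table 3.8 can be recomputed directly from the probability functions. The sketch below assumes the standard binomial and Poisson forms, the latter with mean 2 to match the last column; the function names are illustrative.

```python
from math import comb, exp, factorial


def hypergeom_pmf(x, N, M, n):
    """Hypergeometric probability (8.10)."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


def binom_pmf(x, n, theta):
    """Binomial probability, the limiting form (8.14)."""
    return comb(n, x) * theta ** x * (1 - theta) ** (n - x)


def poisson_pmf(x, mean):
    """Poisson probability with the given mean."""
    return exp(-mean) * mean ** x / factorial(x)


rows = [
    (x,
     hypergeom_pmf(x, 1000, 20, 100),
     binom_pmf(x, 100, 0.02),
     binom_pmf(x, 200, 0.01),
     poisson_pmf(x, 2.0))
    for x in range(11)
]
for x, hyper, b100, b200, pois in rows:
    print(f"{x:3d}  {hyper:.4f}  {b100:.4f}  {b200:.4f}  {pois:.4f}")
```

With $n/N = 0.1$ the hypergeometric and binomial columns differ in the second decimal place near the mode, which gives a feel for how rough the "10 per cent" criterion is at its boundary.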
3.9. An Application of the Hypergeometric Distribution to Wild Life Population Estimation

Suppose that we have an enclosed lake containing $N$ fish. We wish to estimate $N$. We take a sample of size $M$ and mark the fish and return them to the lake. We now have a population of size $N$ containing $M$ marked elements. Suppose that there exists a period sufficiently long to allow adequate mixing but not so long that births and deaths will have appreciable effects on $N$ and $M$. After such a period we take a second sample of size $n$ and find $x$ marked fish. The frequency function of $x$ is, by (8.10),
$$p\{x\} = \frac{\dbinom{M}{x}\dbinom{N - M}{n - x}}{\dbinom{N}{n}}. \tag{9.1}$$
We know $M$, $n$, and $x$. We may estimate $N$ by the method of maximum likelihood; i.e., we find that $N$ which maximizes $L = p\{x\}$. We regard $p\{x\}$ as a function of $N$, say $p_N\{x\}$, and consider the ratio $p_N\{x\}/p_{N-1}\{x\}$. Increasing $N$ from $M$ we find the largest value of $N$ for which this ratio is greater than 1. Since each successive $p_N\{x\}$ in this sequence is larger than its immediate predecessor, the last in the sequence must give the maximum value for $L = p_N\{x\}$. The ratio
$$\frac{p_N\{x\}}{p_{N-1}\{x\}} = \frac{N^2 - MN - nN + nM}{N^2 - MN - nN + Nx}$$
is greater than one for
$$nM > Nx, \tag{9.2}$$
so the maximum likelihood estimator $\hat{N}$ is the integer just less than $nM/x$.

If we take $\hat{N} = nM/x$, we might note that maximum likelihood estimators sometimes have disconcerting properties. Thus
$$E[\hat{N}] = E\left[\frac{nM}{x}\right] = nM\,E\left[\frac{1}{x}\right] = nM\sum_{x = 0}^{\min(n, M)}\frac{1}{x} \cdot \frac{\dbinom{M}{x}\dbinom{N - M}{n - x}}{\dbinom{N}{n}}. \tag{9.3}$$
Since the summation includes the term for $x = 0$, for which $1/x = \infty$, the expected value of $\hat{N}$ is $\infty$. However, the expected value of $1/\hat{N}$ is better behaved, being unbiased:
$$E\left[\frac{1}{\hat{N}}\right] = E\left[\frac{x}{nM}\right] = \frac{1}{nM}E[x] = \frac{1}{nM} \cdot \frac{nM}{N} = \frac{1}{N}, \tag{9.4}$$
using the result of exercise (3B.2a). For a detailed examination of these topics see Chapman [10].

3.10. Fisher's Exact Test for Two × Two Tables
In his Design of Experiments [11] Fisher discussed an experimental investigation of a lady's claim to be able to tell by taste whether the tea was added to the milk or the milk was added to the tea. In one form of such an experiment, four cups would be prepared by each method, making a total of eight cups, and presented to the lady for tasting. She would be informed that there were four of each. Each cup could then be categorized (A), according to how it was made, and (B), according to how the lady says it was made. The data from such an experiment could be represented generally as in Table 3.9.

Table 3.9

                     Category B
Category A      $B_1$       $B_2$       Totals
$A_1$           $n_{11}$    $n_{12}$    $n_{1.}$
$A_2$           $n_{21}$    $n_{22}$    $n_{2.}$
Totals          $n_{.1}$    $n_{.2}$    $n$
An important feature of Table 3.9 is that the row totals $n_{1.}$ and $n_{2.}$ were fixed by the experimenter, and the column totals $n_{.1}$ and $n_{.2}$ are also fixed, since presumably the lady, knowing that in fact there are precisely $n_{1.}$ cups with tea added to the milk, will adjust her judgements so that the sum of her $n_{11}$ and $n_{21}$, equal to $n_{.1}$, equals the known number $n_{1.}$. Fisher's example has, as stated above, the feature that $n_{.1} = n_{1.}$, and hence $n_{.2} = n_{2.}$, but this is special to his example. More generally, consider an urn containing $n$ balls of which $n_{1.}$ are marked $A_1$ and $n_{2.}$ are marked $A_2$. Imagine that we have a row of $n$ cells, $n_{.1}$ marked $B_1$ and $n_{.2}$ marked $B_2$. Then before the sampling starts $n_{1.}$, $n_{2.}$, $n_{.1}$, and $n_{.2}$ are all fixed, and there is no requirement that $n_{1.} = n_{.1}$. We now sample the balls one at a time, without replacement, placing them serially in the cells, till all the balls have been withdrawn. Let $n_{ij}$ be the number of type $A_i$ balls which were placed in type $B_j$ cells. Then the results of this sampling experiment can be assembled as in Table 3.9. Since all the marginal totals are fixed, knowing say $n_{11}$ implies that we know all the other $n_{ij}$.

The probability of obtaining a particular value of $n_{11}$ can be calculated immediately from the hypergeometric probability function (8.10). We can suppose that we have a finite population of size $N = n$ in which $M = n_{1.}$ elements are defective. We take a sample of size $n_{.1}$ (the sample size in the notation of (8.10)), and in this sample we observe $x = n_{11}$ defectives. Then

$$N - M = n - n_{1.} = n_{2.}, \qquad n_{.1} - x = n_{.1} - n_{11} = n_{21}, \tag{10.1}$$

and (8.10) gives

$$P\{n_{11}\} = \frac{\binom{n_{1.}}{n_{11}}\binom{n_{2.}}{n_{21}}}{\binom{n}{n_{.1}}} = \frac{n_{1.}!\, n_{2.}!\, n_{.1}!\, n_{.2}!}{n_{11}!\, n_{12}!\, n_{21}!\, n_{22}!\, n!}. \tag{10.2}$$
This is the probability of obtaining precisely that value for $n_{11}$. For a test of significance, we want not only this probability but also the sum of the probabilities of the possible results more extreme in the same direction; i.e., we need to sum the tail of the distribution. For a long tail this calculation will be tedious, but it is some help to note that the factor

$$C = \frac{n_{1.}!\, n_{2.}!\, n_{.1}!\, n_{.2}!}{n!} \tag{10.3}$$

is common to all terms in the series.

The data of Table 3.2 will be used to illustrate the arithmetic, though they were not collected under the condition, supposed in this section, that both sets of marginal totals are fixed. In Table 3.2 the row totals could have been fixed, but the column totals would be random variables and would therefore not be fixed. This point will be discussed further at the end of this section. Table 3.10 contains the arithmetic. The logarithm of $C$, (10.3), is

$$\log C = \log\frac{40!\, 15!\, 32!\, 23!}{55!} = 47.911645 + 12.116500 + 35.420172 + 22.412494 - 73.103681 = 44.757130. \tag{10.4}$$
In constructing the upper part of Table 3.10, in this instance the observed proportion of reactions with rubber D, 5/15, is less than 27/40, and so we write down all possible tables in which this proportion is smaller than 5/15, always subject to the restriction that the marginal totals are unchanged. When the entry in this cell has gone from 5 to 0, it can go no further. This gives the 2 × 2 tables across the upper part of Table 3.10. The sum of the probabilities in the last row is 0.02406. This is the P value for this tail. For a two-sided test, the usual procedure is to double it, getting 0.04812.

Table 3.10. Observed table and more extreme tables (logarithms to base 10; a bar over the leading digit of log C − sum marks a negative characteristic)

                 Observed     More extreme tables
n11  n12          27  13      28  12      29  11      30  10      31   9      32   8
n21  n22           5  10       4  11       3  12       2  13       1  14       0  15

log n11!         28.036983   29.484141   30.946539   32.423660   33.915022   35.420172
log n12!          9.794280    8.680337    7.601156    6.559763    5.559763    4.605521
log n21!          2.079181    1.380211    0.778151    0.301030    0.000000    0.000000
log n22!          6.559763    7.601156    8.680337    9.794280   10.940408   12.116500

Sum              46.470207   47.145845   48.006183   49.078733   50.415193   52.142193
log C − sum       2̄.286923    3̄.611285    4̄.750947    5̄.678397    6̄.341937    8̄.614937
Probability       0.01936     0.00409     0.00056     0.00005     0.00000     0.00000
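The tail sum above can also be checked directly from (10.2) without logarithm tables. The following is a minimal sketch, assuming the Table 3.2 counts as they appear in the text (row totals 40 and 15, first column total 32, observed $n_{11} = 27$); the function names are illustrative and not from the text.

```python
from math import comb

def hypergeom_prob(n11, row1, row2, col1):
    """P{n11} from (10.2) for a 2 x 2 table with all margins fixed."""
    n = row1 + row2
    n21 = col1 - n11
    return comb(row1, n11) * comb(row2, n21) / comb(n, col1)

def fisher_upper_tail(n11, row1, row2, col1):
    """Sum P{k} for k = n11 up to the largest value the margins allow."""
    k_max = min(row1, col1)
    return sum(hypergeom_prob(k, row1, row2, col1) for k in range(n11, k_max + 1))

# Counts as used in the text: rows 40 and 15, first column 32, observed n11 = 27.
p_observed = hypergeom_prob(27, 40, 15, 32)
p_one_tail = fisher_upper_tail(27, 40, 15, 32)
p_doubled = 2 * p_one_tail
print(f"{p_observed:.5f} {p_one_tail:.5f} {p_doubled:.5f}")
```

Moving toward larger $n_{11}$ is the same direction as letting the entry 5 fall to 0, so the loop visits exactly the six tables of Table 3.10.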
Calculation of P values by the Fisher exact test may be somewhat tedious, but tables have been prepared by Mainland [12] giving the probabilities of all possible samples for $n_1 = n_2 \le 20$ and the probabilities of samples in the region of probabilities 0.005 and 0.025 for $n_1 < n_2 \le 20$. Another set of tables by Finney [13] deals with all samples, equal and unequal, up to $n_1 = n_2 = 15$, and these were extended by Latscha [14] up to $n_1 = n_2 = 20$.

For large samples we can develop an approximate test. By exercise (3B.2) we know that

$$E[n_{11}] = \frac{n_{1.}\, n_{.1}}{n}, \tag{10.5}$$

$$V[n_{11}] = \frac{n_{1.}\, n_{.1}\, n_{2.}\, n_{.2}}{n^2(n-1)}, \tag{10.6}$$

so that under the null hypothesis of random sampling under the specified model we will have a unit deviate approximately normal:

$$u = \frac{n_{11} - E[n_{11}]}{\sqrt{V[n_{11}]}} = \frac{n_{11} - n_{1.}\, n_{.1}/n}{\sqrt{n_{1.}\, n_{.1}\, n_{2.}\, n_{.2} / [n^2(n-1)]}}. \tag{10.7}$$

It appears from empirical comparisons that the approximation is improved by replacing the $n - 1$ in (10.7) by $n$, so we use

$$u = \frac{n_{11} - n_{1.}\, n_{.1}/n}{\sqrt{n_{1.}\, n_{.1}\, n_{2.}\, n_{.2} / n^3}} \tag{10.8}$$

with a correction for continuity of ±1/2 with the sign chosen to reduce the absolute value of the numerator. An alternative form of the statistic (10.8) is

$$u = \frac{(n_{11} n_{22} - n_{12} n_{21})\sqrt{n}}{\sqrt{n_{1.}\, n_{.1}\, n_{2.}\, n_{.2}}}, \tag{10.9}$$

which when corrected for continuity has its numerator written as

$$\left\{ |n_{11} n_{22} - n_{21} n_{12}| - \frac{n}{2} \right\}\sqrt{n}. \tag{10.10}$$
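As a rough check on (10.9) with the corrected numerator (10.10), the sketch below evaluates the statistic for the counts 27, 13, 5, 10 used in Table 3.10. The helper names are hypothetical, the normal tail area is obtained from the error function rather than from a table, and the guard that stops the correction from driving the numerator below zero is an added assumption.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def corrected_u(n11, n12, n21, n22):
    """Statistic (10.9) with the continuity-corrected numerator (10.10)."""
    n = n11 + n12 + n21 + n22
    row1, row2 = n11 + n12, n21 + n22
    col1, col2 = n11 + n21, n12 + n22
    cross = abs(n11 * n22 - n12 * n21)
    # Assumption: never let the correction push the numerator below zero.
    numerator = max(cross - n / 2.0, 0.0) * sqrt(n)
    return numerator / sqrt(row1 * row2 * col1 * col2)

u = corrected_u(27, 13, 5, 10)
p_two_sided = 2.0 * (1.0 - normal_cdf(u))
print(f"u = {u:.3f}, two-sided P = {p_two_sided:.4f}")
```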
For the data of Table 3.2, these statistics give a P value of 0.0476, to be compared with the exact value of 0.04812. It will be noted that the statistic (10.7), developed for the 2 × 2 table with all margins fixed, is identical with the statistic (6.9), developed for the 2 × 2 table with one set of margins fixed. This suggests that the Fisher exact test can be used for the latter case when the sample sizes are too small to justify the normal approximation of Section 3.6. This in fact was shown to be the case by Tocher [15].

3.11. The Poisson Distribution
One approach to the Poisson distribution is to consider a limiting case of the binomial distribution. For the binomial, $E[X] = \xi = n\theta$; we suppose that $n$ tends to infinity and $\theta$ tends to zero in such a way that $n\theta = \xi$ remains a nonzero, noninfinite quantity. Then

$$p\{x\} = \binom{n}{x}\theta^x(1-\theta)^{n-x} = \frac{n!}{x!\,(n-x)!}\left(\frac{\xi}{n}\right)^x\left(1-\frac{\xi}{n}\right)^{n-x} = \left[\frac{n(n-1)\cdots(n-x+1)}{n^x}\right]\frac{\xi^x}{x!}\left(1-\frac{\xi}{n}\right)^{n}\left(1-\frac{\xi}{n}\right)^{-x}. \tag{11.1}$$

As $n$ tends to infinity, the factor in brackets tends to 1, and so does $(1 - \xi/n)^{-x}$; also the limit of $(1 - \xi/n)^n$ is known to be $e^{-\xi}$, so

$$p\{x\} = e^{-\xi}\,\frac{\xi^x}{x!}, \qquad x = 0, 1, \ldots, \tag{11.2}$$
and this is the frequency function of the Poisson distribution.

From the form of the above derivation, it is apparent that the Poisson distribution can be used as an approximation to the binomial for large $n$ and small $\theta$. Table 3.8 gives the values of $p\{x\}$ for the binomial with $\theta = 0.02$, $n = 100$ and $\theta = 0.01$, $n = 200$ and for the Poisson with $\xi = n\theta = 100 \times 0.02$. It will be noted that as $n$ increases and $\theta$ decreases the Poisson probability function becomes a better approximation to the binomial.

The expectation of the Poisson distribution is easily found:

$$E[X] = \sum_{x=0}^{\infty} x\,p\{x\} = \sum_{x=0}^{\infty} x\,e^{-\xi}\frac{\xi^x}{x!} = e^{-\xi}\left[0\cdot\frac{\xi^0}{0!} + 1\cdot\frac{\xi^1}{1!} + 2\cdot\frac{\xi^2}{2!} + \cdots\right] = e^{-\xi}\left[0 + \xi\left(1 + \frac{\xi}{1!} + \frac{\xi^2}{2!} + \cdots\right)\right] = e^{-\xi}\cdot\xi\cdot e^{\xi} = \xi, \tag{11.3}$$

since

$$e^{\xi} = 1 + \frac{\xi}{1!} + \frac{\xi^2}{2!} + \cdots. \tag{11.4}$$
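To find the variance we use (1.17.3), which requires finding

$$E[X(X-1)] = \sum_{x=0}^{\infty} x(x-1)\,e^{-\xi}\frac{\xi^x}{x!} = \sum_{x=2}^{\infty} x(x-1)\,e^{-\xi}\frac{\xi^{x-2}\,\xi^2}{x(x-1)(x-2)!} = \xi^2\sum_{x=2}^{\infty} e^{-\xi}\frac{\xi^{x-2}}{(x-2)!} = \xi^2\sum_{y=0}^{\infty} e^{-\xi}\frac{\xi^{y}}{y!}, \tag{11.5}$$

making the substitution $y = x - 2$ in the summation. The final sum is the total of all the Poisson probabilities and so equals 1. Thus

$$E[X(X-1)] = \xi^2, \tag{11.6}$$

whence

$$V[X] = E[X(X-1)] - E[X](E[X]-1) = \xi^2 - \xi(\xi-1) = \xi. \tag{11.7}$$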
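The binomial comparison and the two moment results can be illustrated numerically. This is a small sketch using the parameter values quoted in the text ($\theta = 0.02$, $n = 100$; $\theta = 0.01$, $n = 200$; $\xi = 2$); the function names and the truncation of the infinite sums at $x = 60$ are assumptions made for the illustration.

```python
from math import comb, exp, factorial

def poisson_pmf(x, xi):
    """Poisson frequency function (11.2)."""
    return exp(-xi) * xi**x / factorial(x)

def binomial_pmf(x, n, theta):
    """Binomial frequency function with parameters n and theta."""
    return comb(n, x) * theta**x * (1.0 - theta) ** (n - x)

xi = 2.0
# Largest absolute gap between binomial and Poisson probabilities over x = 0..10.
gap_100 = max(abs(binomial_pmf(x, 100, 0.02) - poisson_pmf(x, xi)) for x in range(11))
gap_200 = max(abs(binomial_pmf(x, 200, 0.01) - poisson_pmf(x, xi)) for x in range(11))

# Truncated sums for E[X] and E[X(X-1)]; terms beyond x = 60 are negligible for xi = 2.
mean = sum(x * poisson_pmf(x, xi) for x in range(61))
fact2 = sum(x * (x - 1) * poisson_pmf(x, xi) for x in range(61))
variance = fact2 - mean * (mean - 1.0)
print(gap_100, gap_200, mean, fact2, variance)
```

The gap for $n = 200$ comes out smaller than for $n = 100$, in line with the remark that the approximation improves as $n$ increases and $\theta$ decreases.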
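For a random sample of observations $x_1, \ldots, x_n$ from a Poisson distribution, the maximum likelihood estimator is found by maximizing the likelihood,

$$L = \prod_{i=1}^{n} e^{-\xi}\frac{\xi^{x_i}}{x_i!} = e^{-n\xi}\,\xi^{\sum x_i}\Big/\prod_{i=1}^{n} x_i!, \tag{11.8}$$

or equivalently, maximizing the logarithm of the likelihood,

$$\log L = -n\xi + \left(\sum_{i=1}^{n} x_i\right)\log\xi - \log\left(\prod_{i=1}^{n} x_i!\right). \tag{11.9}$$

To maximize, we differentiate with respect to $\xi$ and equate to zero:

$$\frac{d\log L}{d\xi} = -n + \left(\sum_{i=1}^{n} x_i\right)\cdot\frac{1}{\xi} = 0, \tag{11.10}$$

whence $\hat{\xi} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$.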
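For $x > 0$:

$$p_x(t + dt) = \Pr\{x \text{ events in interval } (0, t + dt)\} = \Pr\{x \text{ in interval } (0, t)\}\Pr\{0 \text{ in interval } (t, t + dt)\} + \Pr\{x - 1 \text{ in interval } (0, t)\}\Pr\{1 \text{ in interval } (t, t + dt)\} = p_x(t)\cdot(1 - \xi\,dt) + p_{x-1}(t)\cdot\xi\,dt. \tag{12.6}$$

Rearranging gives

$$\frac{p_x(t + dt) - p_x(t)}{dt} = \xi\,p_{x-1}(t) - \xi\,p_x(t). \tag{12.7}$$

The limit of the left-hand side is the derivative of $p_x(t)$ with respect to $t$, so we have the differential equation

$$\frac{dp_x(t)}{dt} = \xi\,p_{x-1}(t) - \xi\,p_x(t). \tag{12.8}$$

A solution is

$$p_x(t) = \frac{(\xi t)^x}{x!}\,e^{-\xi t}, \tag{12.9}$$

as is easily checked, since

$$\frac{dp_x(t)}{dt} = \frac{\xi^x x\,t^{x-1}}{x!}\,e^{-\xi t} + \frac{(\xi t)^x}{x!}(-\xi)\,e^{-\xi t} = \xi\cdot\frac{(\xi t)^{x-1}}{(x-1)!}\,e^{-\xi t} - \xi\cdot\frac{(\xi t)^x}{x!}\,e^{-\xi t} = \xi\,p_{x-1}(t) - \xi\,p_x(t). \tag{12.10}$$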
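The same check can be made numerically. The sketch below compares a central-difference estimate of $dp_x(t)/dt$ with the right-hand side of the differential equation for a few values of $x$; the parameter value 3.0, the time 0.7, the step size, and the function names are arbitrary choices for the illustration, not values from the text.

```python
from math import exp, factorial

def p(x, t, xi):
    """Poisson probability of x events in an interval of length t."""
    return (xi * t) ** x * exp(-xi * t) / factorial(x)

def derivative_gap(x, t, xi, h=1e-6):
    """Central-difference dp_x/dt minus xi*p_{x-1}(t) - xi*p_x(t)."""
    numeric = (p(x, t + h, xi) - p(x, t - h, xi)) / (2.0 * h)
    rhs = xi * p(x - 1, t, xi) - xi * p(x, t, xi)
    return numeric - rhs

# Evaluate the gap for x = 1, ..., 5 at one arbitrary point.
gaps = [derivative_gap(x, t=0.7, xi=3.0) for x in range(1, 6)]
worst = max(abs(g) for g in gaps)
print(worst)
```

The largest gap should be at the level of rounding error, which is what one expects if the solution satisfies the equation.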
Thus (12.9) does satisfy (12.8). Therefore the distribution of the number of events in the time interval $(0, t)$ is given by (12.9), which is a Poisson distribution with parameter $\xi t$. There is nothing special about the origin in the interval $(0, t)$, so (12.9) applies to any interval of length $t$.

From the present approach we see that if the probability of each radioactive atom in a mass disintegrating is a constant, then the number of atoms disintegrating in a time period $t$ has the distribution (12.9). Similarly, if over a given part of the day the probability of a telephone call being received by a switchboard is constant, then the distribution of the number of calls per time interval is given by (12.9). The same would apply to the number of flaws per yard of insulated wire, the number of misprints per page, the number of blood cells on individual squares on a haemocytometer, etc.

The Poisson distribution, which is a discrete distribution, is closely related to the negative exponential distribution (1.12.4), $p\{x\} = \theta e^{-\theta x}$, $0 < x < \infty$, which is a continuous distribution. In the foregoing derivation of the Poisson we assumed that the probability of an event occurring in the interval $(t, t + dt)$ was $\xi\,dt$. The probability that the event occurs 0 times in the interval $(0, t)$ was then found to be $p_0(t) = e^{-\xi t}$, (12.4). We can write

$$\Pr\{\text{interval between successive events} > t\} = \Pr\{\text{event occurs 0 times in interval } (0, t)\} = e^{-\xi t}. \tag{12.11}$$

Hence

$$\Pr\{\text{interval between successive events} < t\} = 1 - e^{-\xi t}. \tag{12.12}$$

The left-hand side of this is in the form of a cumulative distribution function for a random variable $T$ defined as the interval between successive events. By (1.11.6) the corresponding probability density function is obtained by differentiation with respect to $t$. Differentiating (12.12) we get

$$\frac{dP\{t\}}{dt} = p\{t\} = \xi e^{-\xi t}, \tag{12.13}$$

which has the form of the negative exponential probability density function.
In other words, if the probability of an event in the time interval $(t, t + dt)$ is $\xi\,dt$, then the distribution of the number of events per interval of time $t$ is Poisson of the form (12.9) and the distribution of time between successive events is negative exponential of the form (12.13).

3.13. Test of Hypotheses about the Poisson Distribution

To obtain sums of terms, $\sum_{x=0}^{n} p\{x\}$, of Poisson distributions we can use Molina's table [17]. Alternatively, we can use a relationship with the cumulative $\chi^2$ distribution, tabulated in Table III. From (5.1) the cumulative sum of the terms of the binomial distribution is given exactly as

Pr{X ≤ x} = P{x} = 1 − Pr{F ...

... $c$, the lot is to be rejected. From the arguments of Section 3.8, we have

$$\Pr\{\text{lot is accepted}\} = \Pr\{x \le c\}. \tag{15.1}$$

Under the usual industrial conditions, the lot size $N$ is large and the sampling fraction $n/N$ is small, say ...

$$\Pr\{X_1 \ge x_1 \mid X_1 + X_2 = x_1 + x_2\} = \Pr\left\{F < \frac{x_1 + x_2 - x_1 + 1}{x_1}\right\} = \Pr\left\{F < \frac{x_2 + 1}{x_1}\right\}, \tag{20.6}$$
where the degrees of freedom for $F$ are $2x_1$, $2(x_2 + 1)$. Our P value for a one-sided test against the alternative $\xi_1 > \xi_2$ is determined by

$$\Pr\left\{F(2x_1, 2(x_2 + 1)) < \frac{x_2 + 1}{x_1}\right\} = P, \tag{20.7}$$

and, since $\Pr\{F < F_P\} = P$,

$$F_P(2x_1, 2(x_2 + 1)) = \frac{x_2 + 1}{x_1}, \tag{20.8}$$

or

$$F_{1-P}(2(x_2 + 1), 2x_1) = \frac{x_1}{x_2 + 1}. \tag{20.9}$$
To revert to our initial example, suppose that the number of flaws in carpeting A is $x_1 = 9$ and in carpeting B is $x_2 = 2$. Then

$$F_{1-P}(2(2 + 1), 2 \times 9) = \frac{9}{2 + 1},$$

or $F_{1-P}(6, 18) = 3$. From Table IV, $F_{0.95}(6, 18) = 2.66$ and $F_{0.975}(6, 18) = 3.22$, so $1 - P \simeq 0.965$, and $P \simeq 0.035$. The normal approximation, (20.1), gives ...

... $x_2/T_2$. We therefore need to sum the binomial series:

$$\Pr\{X_1 \ge x_1 \mid X_1 + X_2 = x_1 + x_2\} = \Pr\left\{F(2x_1, 2(x_2 + 1)) < \frac{x_2 + 1}{x_1}\cdot\frac{T_1}{T_2}\right\}. \tag{21.6}$$

Suppose that this probability is $P$. Then

$$F_P(2x_1, 2(x_2 + 1)) = \frac{x_2 + 1}{x_1}\cdot\frac{T_1}{T_2}, \tag{21.7}$$

$$F_{1-P}(2(x_2 + 1), 2x_1) = \frac{x_1}{x_2 + 1}\cdot\frac{T_2}{T_1}. \tag{21.8}$$
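The binomial series referred to above can also be summed directly instead of going through the $F$ tables. The sketch below does this for the carpeting example ($x_1 = 9$, $x_2 = 2$). It assumes that, under the null hypothesis of equal rates and conditional on the total count, each event falls in the first sample with probability $T_1/(T_1 + T_2)$, which is 1/2 when the two exposures are equal; the function and argument names are hypothetical.

```python
from math import comb

def conditional_binomial_p(x1, x2, t1=1.0, t2=1.0):
    """One-sided P value Pr{X1 >= x1 | X1 + X2 = x1 + x2} under equal rates."""
    n = x1 + x2
    # Assumed null probability that a given event belongs to sample 1.
    theta = t1 / (t1 + t2)
    return sum(comb(n, k) * theta**k * (1.0 - theta) ** (n - k)
               for k in range(x1, n + 1))

p_carpet = conditional_binomial_p(9, 2)
print(f"{p_carpet:.4f}")
```

With equal exposures the sum is $(55 + 11 + 1)/2^{11} = 67/2048$, about 0.033, close to the value near 0.035 read off by interpolating in the $F$ table.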
3.22. An Application to Vaccine Testing

Suppose that we are testing lots of vaccine containing $y$ live particles per liter and we test a sample of $Q$ milliliters (ml) drawn from the well-stirred lot. Then on the average the sample will contain $yQ/1000$ live particles: call this quantity $\xi$. Suppose that the test of the sample will find a live particle if it is present, and that if one or more live particles are found then the lot is rejected. The probability of rejection will be

$$\Pr\{X > 0\} = 1 - \Pr\{X = 0\} = 1 - \left. e^{-\xi}\frac{\xi^x}{x!}\right|_{x=0} = 1 - e^{-\xi}.$$
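As a rough numerical sketch of this rejection rule, the following evaluates the probability of rejection for an illustrative lot with 5 live particles per liter and a 60 ml sample, and inverts the relation to find the sample volume that would accept only 5 per cent of such lots. The function names are hypothetical.

```python
from math import exp, log

def rejection_probability(y_per_liter, q_ml):
    """1 - exp(-xi), where xi = y*Q/1000 is the expected particle count in the sample."""
    xi = y_per_liter * q_ml / 1000.0
    return 1.0 - exp(-xi)

def sample_ml_for_acceptance(y_per_liter, p_accept):
    """Sample volume in ml for which Pr{X = 0} = exp(-xi) equals p_accept."""
    xi = -log(p_accept)
    return 1000.0 * xi / y_per_liter

p_reject = rejection_probability(5.0, 60.0)
q_needed = sample_ml_for_acceptance(5.0, 0.05)
print(f"{p_reject:.3f} {q_needed:.0f}")
```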
Suppose that $y = 5$ particles per liter and $Q = 60$ ml; then $\xi = 0.3$, and the probability of rejection is $(1 - e^{-0.3}) = 0.259$. Thus 25.9 per cent of such lots will be rejected.

Suppose that we wish to determine the size of sample that will accept only a fraction $p$ of lots containing 5 live particles per liter; i.e., we require $\Pr\{x = 0\} = p$. But $\Pr\{x = 0\} = e^{-\xi}$, so $\xi = -\log_e p$. Suppose that we require $p = 0.05$; then $\log_e p = \log_e 5 - \log_e 100 = 1.60944 - 4.60517 = -2.99573$, from a table of natural logarithms, so $\xi = 2.99573$. Then $Q = \xi/y = 2.99573/5 = 0.599$ liters $= 599$ ml.

EXERCISES 3B

3B.1. Interpret in words the conditions (8.11).

3B.2. Show that for the hypergeometric distribution (8.10)
(a) $E[x] = n\dfrac{M}{N}$,
(b) $V[x] = n\dfrac{M}{N}\left(1 - \dfrac{M}{N}\right)\left(1 - \dfrac{n-1}{N-1}\right)$.
3B.3. You are dealt a hand of 13 cards from a well-shuffled bridge deck of 52 cards. What is the probability of receiving (a) exactly zero aces, (b) exactly two aces?

3B.4. Find an expression for the variance of the estimator $1/\hat{N} = x/nM$ discussed in Section 3.9.

3B.5. Fertilizer was stored in drums of two types. After a certain period it was observed that of 57 of the first type, 7 had split seams, and of 63 of the second type, 1 had split seams. Calculate the one-sided P value for the null hypothesis that the two types of drums do not differ in their liability to splitting (a) with the Fisher exact test, (b) with the normal approximation.

3B.6. A small car-hire company has two cars which it rents out by the day. Suppose that the number of demands for a car on each day is distributed as a Poisson distribution with mean 1.5. (a) On what proportion of days is neither car required? (b) On what proportion of days is the demand in excess of the company's capacity? (Note that $e^{-1.5} = 0.223$.)

3B.7. Suppose that a system involves $n$ components in series, each of which must be operative for the system to function. Suppose that the probability of the $i$th component failing in the time interval $(t, t + dt)$, given that it has not already failed, is $\lambda_i\,dt$. What is the expected value of the length of time to failure of the system?

3B.8. (a) Suppose that a system requires a certain function to be performed at a certain stage, and that the component responsible for performing this function has probability of failure in the time interval $(t, t + dt)$ of $\lambda\,dt$. What is the expected value of the length of time before failure?

[Sketch: the component duplicated, two components in parallel]

(b) Suppose that the component is duplicated, as in the sketch. Both components are operating, and the system functions if one or the other or both the components are functioning but fails if both components have failed. What is the expected value of the length of time before failure of the system?
(c) Do the same as in (b), but with $n$ components in parallel, instead of two. If $T$ is the time to failure of the system, show that

$$E[T] = \frac{1}{\lambda}\left[\binom{n}{1}\frac{1}{1} - \binom{n}{2}\frac{1}{2} + \cdots + (-1)^{n-1}\binom{n}{n}\frac{1}{n}\right].$$

Feller [6], Volume 1, p. 63, gives the identity

$$\binom{n}{1}\frac{1}{1} - \binom{n}{2}\frac{1}{2} + \cdots + (-1)^{n-1}\binom{n}{n}\frac{1}{n} = 1 + \frac{1}{2} + \cdots + \frac{1}{n}.$$

Prove this using Feller's hint of integrating the identity

$$\sum_{v=0}^{n-1}(1 - t)^v = [1 - (1 - t)^n]\,t^{-1}.$$

It follows that

$$E[T] = \frac{1}{\lambda}\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}\right).$$

3B.9. (a) Do the same as in (3B.8b), but the system starts with $C_1$ only in operation, and $C_1$ has the property that when it fails it switches $C_2$ on.
(b) As in (a), but with a total of $k$ components functioning on a standby and takeover basis. Show that if $T$ is the time to failure of the system the density function of $T$ is

$$p_T\{t\} = \frac{\lambda^k}{(k-1)!}\,t^{k-1}e^{-\lambda t}, \qquad E[T] = \frac{k}{\lambda}.$$

3B.10. (a) Show that

$$\sum_{x=0}^{z}\binom{n}{x}\binom{m}{z-x} = \binom{m+n}{z}.$$

(b) Given that $X$ is binomially distributed with parameters $n$, $\theta$, and that $Y$ is binomially distributed with parameters $m$, $\theta$, show that $Z = X + Y$ is binomially distributed with parameters $(n + m, \theta)$.
3B.11. A company installed two compressors at the same time. These compressors are used continuously. By the end of a year one of them has had 13 breakdowns and the other 3. Can you say that the first compressor is significantly inferior to the second? Make both exact and approximate tests. Discuss the assumptions underlying your test.

3B.12. Suppose that the company in (3B.11) above is going to expand its plant and acquire more compressors of the second type. They would like to know within what limits the average number of breakdowns per compressor per year will lie, with 95 per cent confidence. Find these limits.

3B.13. W. Allen Wallis [20] found that for the 96-year period 1837-1932 there were 59 years in which no vacancies occurred in the U.S. Supreme Court, 27 years with 1 vacancy, 9 years with 2 vacancies, 1 year with 3 vacancies, and 0 years with more than 3 vacancies. (a) Calculate the expected numbers of 0, 1, 2, 3, and more than 3, vacancies on the assumption that the number of vacancies has a Poisson distribution. (If tables of natural logarithms or the exponential function are not readily available, note that if $z = e^y$, then $y = \log_e z = 2.3026 \log_{10} z$.) (b) What is the probability that a President will serve out his (first) four-year term without being presented with the opportunity to make any appointments?

3B.14. A careful inspection shows 2 flaws in 1200 feet of Brand A magnetic tape and 10 flaws in 1800 feet of Brand B magnetic tape. Assuming that the probabilities of flaws in any length $dx$ are $\xi_A\,dx$ and $\xi_B\,dx$, obtain exact bounds for the P value of the null hypothesis that $\xi_A = \xi_B$.

3B.15. You decide to replace the three 12AX7 tubes in the preamplifier of your hi-fi system. You purchase four such tubes from a retailer who has twelve of the tubes on his shelf, five of which are defective. What is the probability that you receive at least three good tubes, i.e., three or four good tubes? Reduce your answer to a simple numerical fraction.

3B.16. A consumer agrees to purchase large lots of items provided that samples of size $n$ are to be taken from each lot and the lot will be accepted if the number of defectives $x \le c$. The consumer requires that if the fraction defective in a lot is 4.5 per cent then that lot has a probability of 0.90 or more of being rejected. The producer requires that if the fraction defective in a lot is 2.25 per cent then that lot has a probability of 0.10 or less of being rejected. (a) What are the smallest $c$ and $n$ with the desired properties? (b) Suppose that the producer agrees to institute rectifying inspection. What is the worst possible average fraction defective that the consumer could find himself accepting?

3B.17. "(The committee) came up with 12 deaths from pulmonary embolism among 1.0 million Enovid users in 1962, compared with 203 deaths among the general population of 25.6 million women of childbearing age, or a death rate of (7.9) per million," Wall Street Journal, September 18, 1963. Test the null hypothesis that the death rates among Enovid users and the general population are equal.
REFERENCES

1. National Bureau of Standards, Table of the Binomial Probability Distribution, Applied Mathematics Series 6 (1950).
2. Romig, Harry G., 50-100 Binomial Tables. New York: John Wiley and Sons, 1953.
3. Tables of the Cumulative Binomial Probabilities. Ordnance Corps Pamphlet ORDP 20-1. U.S. Government Printing Office, 1952.
4. Staff of the Computation Laboratory, Tables of the Cumulative Binomial Probability Distribution. Cambridge: Harvard University Press, 1955.
5. Uspensky, J. V., Introduction to Mathematical Probability. New York: McGraw-Hill Book Co., 1937.
6. Feller, William, An Introduction to Probability Theory and Its Applications, Volume 1. 2nd ed. New York: John Wiley and Sons, 1957.
7. Hald, A., Statistical Theory with Engineering Applications. New York: John Wiley and Sons, 1952.
8. Mosteller, Frederick, "Some Statistical Problems in Measuring the Subjective Response to Drugs," Biometrics, 8 (1952), 220-26.
9. Lieberman, Gerald J., and Donald B. Owen, Tables of the Hypergeometric Probability Distribution. Stanford: Stanford University Press, 1961.
10. Chapman, Douglas G., "Some Properties of the Hypergeometric Distribution with Application to Zoological Sample Censuses," University of California Publications in Statistics, 1 (1951), 131-60.
11. Fisher, R. A., The Design of Experiments. 7th ed. Edinburgh: Oliver and Boyd, 1960.
12. Mainland, Donald, "Statistical Methods in Medical Research: I: Qualitative Statistics (Enumeration Data)," Canadian Journal of Research, E, 26 (1948), 1-166.
13. Finney, D. J., "The Fisher-Yates Test of Significance in 2 × 2 Contingency Tables," Biometrika, 35 (1948), 145-56.
14. Latscha, R., "Tests of Significance in a 2 × 2 Contingency Table: Extension of Finney's Table," Biometrika, 40 (1953), 74-86.
15. Tocher, K. D., "Extension of the Neyman-Pearson Theory of Tests to Discontinuous Variates," Biometrika, 37 (1950), 130-144.
16. Clarke, R. D., "An Application of the Poisson Distribution," Journal of the Institute of Actuaries, 72 (1946), p. 481.
17. Molina, E. C., Poisson's Exponential Binomial Limit. New York: Van Nostrand, 1942.
18. Dodge, Harold F., and Romig, Harry G., Sampling Inspection Tables. 2nd ed. New York: John Wiley and Sons, 1959.
19. Wald, A., Sequential Analysis. New York: John Wiley and Sons, 1947.
20. Wallis, W. Allen, "The Poisson Distribution and the Supreme Court," Journal of the American Statistical Association, 31 (1936), 376-80.
CHAPTER 4
An Introduction to Queuing Theory
4.1. Introduction
An important branch of probability theory deals with the behavior of queues. Customers waiting in a barber shop, airplanes waiting their turn for a runway at an airport, looms needing the attention of an operator in a spinning mill, are all examples of queues. Queues are classified according to:
(a) The time distribution of arrivals. Usually this is considered to be a Poisson process as discussed in Section 3.12.
(b) The distribution of service times. The assumption that this is exponential leads to the easiest mathematics, and is in some instances, e.g., duration of calls from a telephone which is not a pay phone, reasonably realistic, but in other cases other distributions, which may lead to intractable mathematics, may be appropriate.
(c) The queue discipline. The simplest assumption is "first come, first served," i.e., the queue waits in line and the head of the line is the next to receive service. Sometimes, however, the service is at random. For example, the operator servicing looms can pick a loom at random from the group needing service rather than pick the one which has been waiting longest.
(d) The number of service channels. An airport with only one usable runway has only one channel; a barber shop with $k$ barbers has $k$ channels.
Queuing theory considers primarily such questions as the distribution of the length of the queue, the distribution of waiting times, and the number of customers lost because when they arrive they see that the queue is too long so they go away. Historically, some of the earliest studies of these problems were in the context of a manual telephone exchange receiving calls from subscribers.
A. K. Erlang first published on this topic in 1909 and his most important paper appeared in 1917 (reprinted by Brockmeyer et al. [1]). The field was brought to the attention of modern statisticians by D. G. Kendall [2] in 1951; doubtless his interest was stimulated by the ubiquity of queues in all aspects of British existence in that grey and depressing era. For a monograph with many applications (and a bibliography of 910 items) see Saaty [3]. Other monographs are by Morse [4] and Takacs [5].
4.2. Single-Channel, Infinite, Poisson Arrival, Exponential Service Queues

In this section we consider the simplest case, in which the arrivals are Poisson with parameter $\lambda$, i.e., the probability of a new customer arriving in the interval $(t, t + \Delta t)$ is $\lambda\,\Delta t$, and there is a single server with service time having an exponential distribution with parameter $\mu$. In other words, the probability of a given number of arrivals in a time interval of length $t$ is given by the Poisson frequency function with parameter $\lambda t$, (3.12.9). The expected value for a variable with a Poisson distribution is equal to the parameter, (3.11.3). Thus the average number arriving per time $t$ equals $\lambda t$, and the average number arriving per unit time equals $\lambda$. Also, from (1.15.12), the expected value of an exponential distribution written as $p\{t\} = \mu e^{-\mu t}$ is $E[t] = 1/\mu$. Thus the average service time is $1/\mu$, so the average number which could be served per unit time is $\mu$. Hence $\lambda/\mu$ is the ratio of arrival rate to potential service rate. In order to be able to reach a state of equilibrium it is necessary that $\lambda < \mu$, since otherwise the queue would grow indefinitely.

We assume that there are no restrictions on the possible length of the queue, and that the queue discipline is "first come, first served." A customer is regarded as being a member of the queue until his service is completed; i.e., when receiving service he is still a member of the queue.

Let $n$ be the number of customers waiting in the queue at time $t$ and $p_n(t)$ be the probability that at time $t$ the queue contains $n$ customers. For $n > 0$, and small $\Delta t$, we have

$$\begin{aligned}
\Pr\{n \text{ customers in queue at } t + \Delta t\} ={}& \Pr\{n \text{ customers in line at time } t\}\,\Pr\{0 \text{ customers arrive in } (t, t + \Delta t)\}\,\Pr\{0 \text{ customers discharged in } (t, t + \Delta t)\} \\
&+ \Pr\{n + 1 \text{ customers in line at time } t\}\,\Pr\{0 \text{ customers arrive in } (t, t + \Delta t)\}\,\Pr\{1 \text{ customer discharged in } (t, t + \Delta t)\} \\
&+ \Pr\{n - 1 \text{ customers in line at time } t\}\,\Pr\{1 \text{ customer arrives in } (t, t + \Delta t)\}\,\Pr\{0 \text{ customers discharged in } (t, t + \Delta t)\} \\
&+ \Pr\{n \text{ customers in line at time } t\}\,\Pr\{1 \text{ customer arrives in } (t, t + \Delta t)\}\,\Pr\{1 \text{ customer discharged in } (t, t + \Delta t)\}.
\end{aligned} \tag{2.1}$$
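It is assumed, of course, that the probabilities are independent, e.g., that the probability of an arrival is independent of the probability of a customer being accepted. Thus (2.1) can be written as

$$p_n(t + \Delta t) = p_n(t)(1 - \lambda\,\Delta t)(1 - \mu\,\Delta t) + p_{n+1}(t)(1 - \lambda\,\Delta t)\mu\,\Delta t + p_{n-1}(t)\lambda\,\Delta t\,(1 - \mu\,\Delta t) + p_n(t)\lambda\,\Delta t\,\mu\,\Delta t \tag{2.2}$$

$$\simeq p_{n-1}(t)\lambda\,\Delta t + p_n(t)(1 - \lambda\,\Delta t - \mu\,\Delta t) + p_{n+1}(t)\mu\,\Delta t, \tag{2.3}$$

terms involving $(\Delta t)^2$ being neglected in (2.3). Equation (2.3) can be rearranged to give

$$\frac{p_n(t + \Delta t) - p_n(t)}{\Delta t} \simeq \lambda p_{n-1}(t) - (\lambda + \mu)p_n(t) + \mu p_{n+1}(t). \tag{2.4}$$

We now take the limit as $\Delta t$ tends to zero:

$$\frac{dp_n(t)}{dt} = \lambda p_{n-1}(t) - (\lambda + \mu)p_n(t) + \mu p_{n+1}(t). \tag{2.5}$$

We now consider in exactly the same way $\Pr\{0 \text{ customers in queue at } t + \Delta t\}$, obtaining

$$p_0(t + \Delta t) = p_0(t)(1 - \lambda\,\Delta t) + p_1(t)(1 - \lambda\,\Delta t)\mu\,\Delta t \tag{2.6}$$

$$\simeq p_0(t)(1 - \lambda\,\Delta t) + p_1(t)\mu\,\Delta t, \tag{2.7}$$

whence, rearranging and taking the limit as before,

$$\frac{dp_0(t)}{dt} = -\lambda p_0(t) + \mu p_1(t). \tag{2.8}$$

We now assume that a stationary state is reached so that $p_n(t)$ is independent of $t$ and can be denoted by $p_n$, i.e., we assume that $dp_n/dt = 0$ for $n = 0, 1, \ldots$. Equations (2.8) and (2.5) then become

$$-\lambda p_0 + \mu p_1 = 0 \qquad \text{when } n = 0, \tag{2.9}$$

$$\lambda p_{n-1} - (\lambda + \mu)p_n + \mu p_{n+1} = 0 \qquad \text{when } n > 0. \tag{2.10}$$

From (2.9),

$$p_1 = \frac{\lambda}{\mu}\,p_0. \tag{2.11}$$

From (2.10), with $n = 1$,

$$\lambda p_0 - (\lambda + \mu)p_1 + \mu p_2 = 0, \tag{2.12}$$

whence, solving for $p_2$ and substituting for $p_1$ from (2.11),

$$p_2 = \left(\frac{\lambda}{\mu}\right)^2 p_0. \tag{2.13}$$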
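Continuing for $n = 2, 3, \ldots$, it is apparent that

$$p_n = \left(\frac{\lambda}{\mu}\right)^n p_0. \tag{2.14}$$

Since the queue must be of some length,

$$\sum_{i=0}^{\infty} p_i = 1. \tag{2.15}$$

From (2.14)

$$\sum_{i=0}^{\infty} p_i = p_0\sum_{i=0}^{\infty}\left(\frac{\lambda}{\mu}\right)^i. \tag{2.16}$$

If we assume that $\lambda/\mu < 1$, then the summation is the sum of an infinite geometric series, and so

$$\sum_{i=0}^{\infty}\left(\frac{\lambda}{\mu}\right)^i = \frac{1}{1 - \lambda/\mu}. \tag{2.17}$$

Substituting (2.17) and (2.15) in (2.16), we have

$$p_0 = 1 - \frac{\lambda}{\mu}, \tag{2.18}$$

and substituting this in (2.14) gives

$$p_n = \left(\frac{\lambda}{\mu}\right)^n\left(1 - \frac{\lambda}{\mu}\right). \tag{2.19}$$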
Table 4.1 gives the first five terms of $p_n$ for four values of $\lambda/\mu$. The probability that a queue has more than 4 members is thus 1/1024 = 0.00098 when $\lambda/\mu = 1/4$ and 525.22/1024 = 0.5129 when $\lambda/\mu = 7/8$.

Table 4.1. Values of $1024\,p_n$

                  λ/μ
n         1/4      1/2      3/4      7/8
0         768      512      256      128
1         192      256      192      112
2          48      128      144       98
3          12       64      108       85.75
4           3       32       81       75.03
≥5          1       32      243      525.22
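The entries of Table 4.1 can be regenerated from (2.19). The sketch below uses exact fractions so that the scaled values $1024\,p_n$ come out without rounding noise; the function name is illustrative, and the tail probability is taken as $(\lambda/\mu)^5$, which follows from summing the geometric series.

```python
from fractions import Fraction

def queue_length_probs(rho, n_terms=5):
    """Return [p_0, ..., p_{n_terms-1}] from (2.19) and the tail Pr{n >= n_terms}."""
    probs = [(1 - rho) * rho**n for n in range(n_terms)]
    tail = rho**n_terms  # sum of (1 - rho) * rho**n over n >= n_terms
    return probs, tail

for rho in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(7, 8)):
    probs, tail = queue_length_probs(rho)
    row = [float(1024 * p) for p in probs] + [float(1024 * tail)]
    print(rho, [round(v, 2) for v in row])
```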
The average value of the length of the queue is the simplest aspect of the distribution, though clearly not the entire story, for if two distributions had the same expected value, we would usually prefer the one which had fewer very large queues. The expected value of the length is

$$E[n] = \sum_{n=0}^{\infty} n\,p_n = \left(1 - \frac{\lambda}{\mu}\right)\sum_{n=0}^{\infty} n\left(\frac{\lambda}{\mu}\right)^n. \tag{2.20}$$

We recall the formula for the sum of an infinite geometric series with $a < 1$:

$$1 + a + a^2 + a^3 + \cdots = \frac{1}{1 - a}. \tag{2.21}$$

Differentiating both sides gives

$$1 + 2a + 3a^2 + \cdots = \frac{1}{(1 - a)^2}. \tag{2.22}$$

Using this in (2.20) gives, for $\lambda/\mu < 1$,

$$E[n] = \left(1 - \frac{\lambda}{\mu}\right)\frac{\lambda}{\mu}\cdot\frac{1}{(1 - \lambda/\mu)^2} = \frac{\lambda/\mu}{1 - \lambda/\mu}. \tag{2.23}$$
As $\lambda/\mu$ approaches 1, the expected value of the queue length tends rapidly to infinity; for $\lambda/\mu = 1/2$, $E[n] = 1$, but for $\lambda/\mu = 15/16$, $E[n] = 15$.

A customer will be interested not only in the length of queue as measured by the number of customers in it, but also in the time spent waiting for service, say $t_w$. The average waiting time can be derived as follows. At any random instant the probability that the queue contains $n$ customers is $p_n$, given by (2.19). The expected service time for each of these customers is $1/\mu$, since the service time has an exponential distribution with parameter $\mu$. The expected total service time for these $n$ customers will be $n/\mu$. The expected waiting time $E[t_w]$ will be

$$E[t_w] = \sum_{n=0}^{\infty}(\text{probability of } n \text{ customers}) \times (\text{waiting time for } n \text{ customers}) = \sum_{n=0}^{\infty} p_n\left(\frac{n}{\mu}\right) = \sum_{n=0}^{\infty}\frac{n}{\mu}\left(\frac{\lambda}{\mu}\right)^n\left(1 - \frac{\lambda}{\mu}\right) = \frac{1}{\mu}\sum_{n=1}^{\infty} n\left(\frac{\lambda}{\mu}\right)^n\left(1 - \frac{\lambda}{\mu}\right). \tag{2.24}$$

Comparing the summation with (2.20), we see that

$$E[t_w] = \frac{1}{\mu}E[n] = \frac{1}{\mu}\cdot\frac{\lambda/\mu}{1 - \lambda/\mu} = \frac{\lambda}{\mu}\cdot\frac{1}{\mu - \lambda}. \tag{2.25}$$

If we add on the expected value of the service time for ourselves, $1/\mu$, we obtain for the expected value for the sum of time waiting for service plus time for service, say $E[t_q]$,

$$E[t_q] = \frac{\lambda}{\mu}\cdot\frac{1}{\mu - \lambda} + \frac{1}{\mu} = \frac{1}{\mu - \lambda}. \tag{2.26}$$

It will be noted that $p_0$, (2.18), is the fraction of the time there are no customers either waiting or being serviced and the service facility is unused. It can be made close to zero by making $\lambda/\mu$ close to 1, but then the proportion of each customer's time that is wasted waiting in the queue rises drastically. This latter may be regarded as

$$\frac{E[t_w]}{E[t_q]} = \frac{\lambda}{\mu}. \tag{2.27}$$

Thus, for example, if $\lambda/\mu$ is 0.9, then 10 per cent of the server's time is wasted (resting!) but of the time each customer spends in the queue on the average 90 per cent is wasted (fuming!).
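The quantities derived in this section can be collected into one small helper. This is a sketch only: the function name and the dictionary keys are hypothetical, and the formulas are (2.18), (2.23), (2.25), and (2.26) under the stated assumption that the arrival rate is less than the service rate.

```python
def mm1_summary(lam, mu):
    """Equilibrium summaries for Poisson arrivals (rate lam), exponential service (rate mu)."""
    if not lam < mu:
        raise ValueError("equilibrium requires lam < mu")
    rho = lam / mu
    return {
        "p0": 1.0 - rho,                       # fraction of time the server is idle
        "mean_length": rho / (1.0 - rho),      # E[n]
        "mean_wait": rho / (mu - lam),         # E[t_w]
        "mean_time_in_queue": 1.0 / (mu - lam),  # E[t_q]
    }

s = mm1_summary(lam=0.9, mu=1.0)
wasted_fraction = s["mean_wait"] / s["mean_time_in_queue"]
print(s["p0"], s["mean_length"], wasted_fraction)
```

For a ratio of 0.9 the server is idle about 10 per cent of the time while about 90 per cent of each customer's time in the queue is spent waiting, as in the example above.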
4.3. Queues with Arbitrary Service Time Distribution

In Section 4.2 we derived the distribution function of the length of a queue (2.19) where the distribution of service times was exponential, and from that we obtained the expected value of the length of the queue $E[n]$ (2.23) and the expected value of the waiting time $E[t_w]$ (2.25). In this section we relax the assumption about the form of the distribution of service time and merely suppose that it has an expected value $E[t_s] = T$ and variance $\sigma^2$. We will obtain formulas for $E[n]$ and $E[t_w]$. The general argument follows section 2 of Kendall's paper [2].

An essential concept in the argument is that at certain time instants, called by Kendall "points of regeneration," a knowledge of the state of the process at such an instant has the property that all relevant information is contained in that knowledge alone, and further knowledge of the past history is useless for predicting the future behavior of the process. For a Poisson process, all instants have this property and are thus points of regeneration. For a queuing process with Poisson input and arbitrary service time distribution, the instant at which service is completed on a customer is a point of regeneration. Also, all instants at which the queue is empty and no one is being serviced are points of regeneration.

Consider an instant at which customer $C_i$ leaves. Let $n$ be the size of the queue at this instant ($n$ does not include the departing customer). Since on occasion the departing customer may leave the line empty, it is possible for $n$ to equal zero. Let the next customer $C_{i+1}$ to be served have service time $t_s$ ($t_s$ is a random variable with the service time distribution) and during this time suppose that $r$ new customers arrive. Then, conditionally on the value of $t_s$, $r$ is a Poisson variable with parameter $t_s\lambda$ and probability function analogous to (3.12.9). Using (3.11.3)

$$E[r] = t_s\lambda, \tag{3.1}$$

and from (3.11.6)

$$E[r(r - 1)] = E[r^2] - E[r] = (t_s\lambda)^2, \tag{3.2}$$

whence it follows that

$$E[r^2] = (t_s\lambda)^2 + t_s\lambda. \tag{3.3}$$

Equations (3.1) and (3.3) are conditional on the particular value of $t_s$, and if we take expectations over the distribution of $t_s$ we obtain

$$E[r] = E[t_s\lambda] = \lambda T \tag{3.4}$$

and

$$E[r^2] = \lambda^2 E[t_s^2] + \lambda T. \tag{3.5}$$

Now the variance of the service time distribution was defined as $\sigma^2$, so

$$\sigma^2 = E[t_s^2] - (E[t_s])^2 = E[t_s^2] - T^2, \tag{3.6}$$

whence

$$E[t_s^2] = \sigma^2 + T^2, \tag{3.7}$$

and substituting in (3.5),

$$E[r^2] = \lambda^2(\sigma^2 + T^2) + \lambda T. \tag{3.8}$$
Now let $n'$ be the length of the queue which customer $C_{i+1}$ leaves behind him. We assume that statistical equilibrium exists, so that the marginal distributions of $n$ and $n'$ are identical and in particular have the same means and the same mean squares. We assume therefore that
$$E[n] = E[n'], \tag{3.9}$$
$$E[n^2] = E[n'^2]. \tag{3.10}$$
We now demonstrate that $n$ and $n'$ are related by the equation
$$n' = \max(n - 1, 0) + r. \tag{3.11}$$
There are two cases to consider. First, suppose that when customer $C_i$ leaves the queue length is zero; i.e., $n = 0$. Then we wait for the next customer $C_{i+1}$ to arrive, and eventually he does. He receives immediate service lasting a time $t_s$; in this time $r$ further customers arrive and wait in line, so that in this case
$$n' = r. \tag{3.12}$$
Second, suppose that when customer $C_i$ leaves the queue length is $n > 0$. Then the next in line moves into service immediately, leaving $n - 1$ waiting in line. Customer $C_{i+1}$ takes time $t_s$ for service, in which time $r$ customers arrive and add themselves to the line, so when $C_{i+1}$ leaves the queue length is
$$n' = n - 1 + r. \tag{3.13}$$
It is easy to check that (3.12) and (3.13) can be written as (3.11). We now define a variable $d$ as a function of $n$, $\delta(n)$, with the properties
$$d = \delta(n) = 1 \quad \text{if } n = 0, \tag{3.14}$$
$$d = \delta(n) = 0 \quad \text{if } n > 0. \tag{3.15}$$
We can then write (3.11) as
$$n' = n - 1 + d + r. \tag{3.16}$$
It is easy to check that when $n = 0$, $d = 1$, and (3.16) gives $n' = r$ corresponding to (3.12), and that when $n > 0$, $d = 0$, and (3.16) gives $n' = n - 1 + r$ corresponding to (3.13). We note that in consequence of the definitions (3.14) and (3.15)
$$d^2 = d, \tag{3.17}$$
$$n(1 - d) = n, \tag{3.18}$$
as can be readily checked by considering the cases $n = 0$, $n > 0$. We now use the assumption (3.9) that $E[n] = E[n']$ and take expectations of (3.16) to get
$$E[d] = 1 - E[r] = 1 - \lambda T, \tag{3.19}$$
using (3.4) for $E[r]$. We now square (3.16), written as $n' = n - (1 - d) + r$, to get
$$n'^2 = n^2 + 1 - 2d + d^2 + r^2 - 2n(1 - d) + 2nr - 2(1 - d)r, \tag{3.20}$$
and substitute from (3.17) for $d^2$ and from (3.18) for $n(1 - d)$, to get
$$n'^2 - n^2 = 2n(r - 1) + (r - 1)^2 + d(2r - 1). \tag{3.21}$$
We now take expectations and use the assumption (3.10) that $E[n^2] = E[n'^2]$ to get
$$2E[n(r - 1)] + E[(r - 1)^2] + E[d(2r - 1)] = 0. \tag{3.22}$$
An essential part of the argument is that $r$ is independent of $n$ and $d$ ($d$, of course, is a function of $n$): this permits us to write (3.22) as
$$2E[n]E[r - 1] + E[(r - 1)^2] + E[d]E[2r - 1] = 0. \tag{3.23}$$
Solving for $E[n]$, this gives
$$E[n] = \frac{E[(r - 1)^2] + E[d]E[2r - 1]}{2E[1 - r]} = \frac{E[r^2] - 2E[r] + 1 + E[d]\{2E[r] - 1\}}{2(1 - E[r])}. \tag{3.24}$$
Substituting from (3.8) for $E[r^2]$, from (3.4) for $E[r]$, and from (3.19) for $E[d]$ gives
$$E[n] = \lambda T + \frac{\lambda^2 T^2 + \lambda^2\sigma^2}{2(1 - \lambda T)}. \tag{3.25}$$
This equation expresses the average length of the queue in terms of the arrival rate $\lambda$ and the service time mean $T$ and variance $\sigma^2$.

Let $t_w$ be the time a customer waits in the queue for service: $t_w$ does not include his service time $t_s$. Thus $t_w + t_s$ is the total time from arrival to discharge, and the average total time is
$$E[t_w + t_s] = E[t_w] + T. \tag{3.26}$$
Since the arrival rate is $\lambda$, the average number of arrivals in this time is $\lambda(E[t_w] + T)$. But this must equal the average queue length immediately following his discharge, $E[n]$, given by (3.25); thus
$$\lambda(E[t_w] + T) = \lambda T + \frac{\lambda^2 T^2 + \lambda^2\sigma^2}{2(1 - \lambda T)}, \tag{3.27}$$
so
$$E[t_w] = \frac{\lambda(T^2 + \sigma^2)}{2(1 - \lambda T)}. \tag{3.28}$$
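As a numerical illustration, the mean queue length (3.25) and the mean wait obtained by solving (3.27) for $E[t_w]$ can be evaluated directly. The following is only a sketch: the function names and the numerical values are assumptions made for illustration, and the check that $\lambda T < 1$ is an added safeguard that the text leaves implicit in the denominator $1 - \lambda T$.

```python
def mean_queue_length(lam, T, sigma2):
    """Mean queue length E[n] from (3.25); assumes 0 < lam * T < 1."""
    rho = lam * T
    if not 0 < rho < 1:
        raise ValueError("this sketch assumes 0 < lam * T < 1")
    return rho + (lam**2 * T**2 + lam**2 * sigma2) / (2 * (1 - rho))


def mean_wait(lam, T, sigma2):
    """Mean wait before service E[t_w], found by solving (3.27) for E[t_w]."""
    return mean_queue_length(lam, T, sigma2) / lam - T


# Illustrative values (assumptions, not taken from the text).
lam, T = 0.5, 1.0
wait_constant = mean_wait(lam, T, 0.0)       # constant service time, variance 0
wait_exponential = mean_wait(lam, T, T**2)   # exponential service, variance T^2
print(wait_constant, wait_exponential)
```

With these values the constant service time gives half the average wait of the exponential service time with the same mean, in line with the remark on the effect of the service time variance.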
This shows that, among all service time distributions with the same average service time, the one with $\sigma^2$ a minimum (i.e., zero), that is, a constant service time, gives the smallest average waiting time.

4.4. Single-Channel, Finite, Poisson Arrival, Exponential Service Queues

The situation considered in this section is identical with that of Section 4.2 except for the difference that the maximum permissible queue length is $N$ rather than infinity. This situation could arise, e.g., in a downtown gasoline station for which the available space will only accommodate a certain number of cars and customers arriving when the station is full are not permitted to wait on the street.

For $n$ in the interval $(0, N - 1)$ the arguments of (2.1) to (2.14) apply unchanged in the present situation. We will consider now the special case of $n = N$. Similar to but not identical to (2.1), we have the equation
$$\begin{aligned}
\Pr\{N \text{ customers in line at } t + \Delta t\} ={}& \Pr\{N \text{ customers in line at time } t\}\,\Pr\{0 \text{ customers discharged in } (t, t + \Delta t)\} \\
&+ \Pr\{N - 1 \text{ customers in line at time } t\}\,\Pr\{1 \text{ customer arrives in } (t, t + \Delta t)\}\,\Pr\{0 \text{ customers discharged in } (t, t + \Delta t)\}.
\end{aligned} \tag{4.1}$$
This can be written as
$$p_N(t + \Delta t) = p_N(t)(1 - \mu\,\Delta t) + p_{N-1}(t)\cdot\lambda\,\Delta t\cdot(1 - \mu\,\Delta t) \tag{4.2}$$
$$\simeq p_N(t) - \mu p_N(t)\,\Delta t + \lambda p_{N-1}(t)\,\Delta t, \tag{4.3}$$
which rearranged gives
$$\frac{p_N(t + \Delta t) - p_N(t)}{\Delta t} = \lambda p_{N-1}(t) - \mu p_N(t). \tag{4.4}$$
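When we take the limit of this as $\Delta t \to 0$, the left-hand side becomes $dp_N(t)/dt$, and if we assume that a stationary state is reached, so that $dp_N(t)/dt = 0$, we obtain
$$\lambda p_{N-1} = \mu p_N. \tag{4.5}$$
But for $n < N$, (2.14) applies, so
$$p_{N-1} = \left(\frac{\lambda}{\mu}\right)^{N-1} p_0. \tag{4.6}$$
Thus, substituting this in (4.5), we have
$$p_N = \frac{\lambda}{\mu}\,p_{N-1} = \frac{\lambda}{\mu}\left(\frac{\lambda}{\mu}\right)^{N-1} p_0 = \left(\frac{\lambda}{\mu}\right)^{N} p_0. \tag{4.7}$$
Therefore (2.14) is valid in the present case for $0 \le n \le N$:
$$p_n = \left(\frac{\lambda}{\mu}\right)^{n} p_0, \qquad 0 \le n \le N. \tag{4.8}$$
Analogous to (2.15), the queue must be of some length, so
$$1 = \sum_{i=0}^{N} p_i = p_0 \sum_{i=0}^{N} \left(\frac{\lambda}{\mu}\right)^{i}. \tag{4.9}$$
The formula for the sum of a finite geometric series is
$$1 + a + a^2 + \cdots + a^N = \frac{1 - a^{N+1}}{1 - a}, \tag{4.10}$$
so
$$\sum_{i=0}^{N} \left(\frac{\lambda}{\mu}\right)^{i} = \frac{1 - (\lambda/\mu)^{N+1}}{1 - \lambda/\mu}. \tag{4.11}$$
Substituting this in (4.9) gives
$$p_0 = \frac{1 - \lambda/\mu}{1 - (\lambda/\mu)^{N+1}}, \tag{4.12}$$
and substituting this in (4.8) gives
$$p_n = \frac{1 - \lambda/\mu}{1 - (\lambda/\mu)^{N+1}}\left(\frac{\lambda}{\mu}\right)^{n}, \qquad 0 \le n \le N. \tag{4.13}$$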
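The distribution (4.13) is easy to tabulate numerically. The sketch below normalizes the geometric weights of (4.8) by direct summation, which is equivalent to (4.12) and also avoids a division by zero when $\lambda = \mu$; the function name and the numerical values are illustrative assumptions, not part of the text.

```python
def finite_queue_distribution(lam, mu, N):
    """Return [p_0, ..., p_N] for a queue limited to N customers, as in (4.13)."""
    rho = lam / mu
    weights = [rho**n for n in range(N + 1)]   # (lam/mu)^n, as in (4.8)
    total = sum(weights)                       # the sum appearing in (4.9)
    return [w / total for w in weights]


# Illustrative values (assumptions, not taken from the text).
p = finite_queue_distribution(lam=3.0, mu=4.0, N=5)
mean_length = sum(n * p_n for n, p_n in enumerate(p))   # average queue length
prob_full = p[-1]                                       # proportion of time the queue is full
print(round(mean_length, 4), round(prob_full, 4))
```

The average queue length and the proportion of time the queue is full are obtained here by direct summation over the $N + 1$ states.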
Manipulations analogous to (2.20)-(2.23) give as the average length of the queue
$$E[n] = \rho\,\frac{1 - (N + 1)\rho^{N} + N\rho^{N+1}}{(1 - \rho)(1 - \rho^{N+1})}, \tag{4.14}$$
where $\rho = \lambda/\mu$. The proportion of the time customers are turned away because the queue is full is equal to the proportion of the time the queue is of size $N$, i.e.,
$$p_N = \frac{1 - \rho}{1 - \rho^{N+1}}\,\rho^{N} = 1 - \frac{1 - \rho^{N}}{1 - \rho^{N+1}}. \tag{4.15}$$

4.5. Multichannel, Infinite, Poisson Arrival, Exponential Service Queues

In our discussion (Section 4.2) of a single-channel queue, with Poisson arrivals and exponential service time distribution, we had to consider the two situations $n = 0$, $n > 0$. With $k$ service channels we have to consider the three situations $n = 0$, $1 \le n < k$, $n \ge k$. The discussion for the case $n = 0$ leading to (2.9) remains unchanged and we continue to have
$$\lambda p_0 = \mu p_1, \qquad n = 0. \tag{5.1}$$
For the case $1 \le n < k$, (2.1) applies as stated. The terms involving the probabilities of 0 or 1 customers being discharged need consideration. For example, if there are $n$ customers in line at time $t$ and the probability of any one being discharged in $(t, t + \Delta t)$ is $\mu\,\Delta t$, then the probability of 0 being discharged is given by the binomial probability function
$$\binom{n}{0}(\mu\,\Delta t)^0(1 - \mu\,\Delta t)^{n-0} = (1 - \mu\,\Delta t)^n = 1 - \binom{n}{1}(\mu\,\Delta t)^1 + \binom{n}{2}(\mu\,\Delta t)^2 - \cdots \simeq 1 - n\mu\,\Delta t. \tag{5.2}$$
Similarly, if there are $n + 1$ customers in line the probability of 1 being discharged is
$$\binom{n+1}{1}(\mu\,\Delta t)^1(1 - \mu\,\Delta t)^{n} \simeq (n + 1)\mu\,\Delta t, \tag{5.3}$$
and if there are $n - 1$ customers in line the probability of 0 being discharged is
$$(1 - \mu\,\Delta t)^{n-1} \simeq 1 - (n - 1)\mu\,\Delta t, \tag{5.4}$$
and if there are $n$ customers in line the probability of 1 being discharged is
$$\binom{n}{1}(\mu\,\Delta t)^1(1 - \mu\,\Delta t)^{n-1} \simeq n\mu\,\Delta t. \tag{5.5}$$
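Substituting (5.2)-(5.5) in (2.1) we get
$$\begin{aligned}
p_n(t + \Delta t) \simeq{}& p_n(t)(1 - \lambda\,\Delta t)(1 - n\mu\,\Delta t) + p_{n+1}(t)(1 - \lambda\,\Delta t)(n + 1)\mu\,\Delta t \\
&+ p_{n-1}(t)\lambda\,\Delta t\,[1 - (n - 1)\mu\,\Delta t] + p_n(t)\lambda\,\Delta t\,n\mu\,\Delta t \\
\simeq{}& p_n(t)(1 - \lambda\,\Delta t - n\mu\,\Delta t) + p_{n+1}(t)(n + 1)\mu\,\Delta t + p_{n-1}(t)\lambda\,\Delta t.
\end{aligned} \tag{5.6}$$
By the same manipulations that led from (2.3) to (2.5), this gives
$$(\lambda + n\mu)p_n = \lambda p_{n-1} + (n + 1)\mu p_{n+1}, \qquad 1 \le n < k. \tag{5.7}$$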
If $n > k$, (2.1) applies as stated, and in place of (5.2) we have $1 - k\mu\,\Delta t$, in place of (5.3) we have $k\mu\,\Delta t$, in place of (5.4) we have $1 - k\mu\,\Delta t$, and in place of (5.5) we have $k\mu\,\Delta t$. Substituting these expressions in (2.1) we get
$$\begin{aligned}
p_n(t + \Delta t) \simeq{}& p_n(t)(1 - \lambda\,\Delta t)(1 - k\mu\,\Delta t) + p_{n+1}(t)(1 - \lambda\,\Delta t)k\mu\,\Delta t \\
&+ p_{n-1}(t)\lambda\,\Delta t\,(1 - k\mu\,\Delta t) + p_n(t)\lambda\,\Delta t\,k\mu\,\Delta t
\end{aligned} \tag{5.8}$$
$$\simeq p_n(t)(1 - \lambda\,\Delta t - k\mu\,\Delta t) + p_{n+1}(t)k\mu\,\Delta t + p_{n-1}(t)\lambda\,\Delta t. \tag{5.9}$$
If $n = k$, the changes listed above for the case $n > k$ for (5.2), (5.3), and (5.5) apply, but in place of the change for (5.4), namely $1 - k\mu\,\Delta t$, we have $1 - (k - 1)\mu\,\Delta t$. Making this change in (5.8) leads to the same expression as (5.9), however. By the same manipulations that lead from (2.3) to (2.5) we obtain
$$(\lambda + k\mu)p_n = \lambda p_{n-1} + k\mu p_{n+1}, \qquad k \le n. \tag{5.10}$$
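From (5.1) we obtain
$$p_1 = \frac{\lambda}{\mu}\,p_0. \tag{5.11}$$
Putting $n = 1$ in (5.7) gives
$$(\lambda + \mu)p_1 = \lambda p_0 + (1 + 1)\mu p_{1+1}, \tag{5.12}$$
whence
$$p_2 = \frac{1}{2}\left(\frac{\lambda}{\mu}\right)^{2} p_0, \tag{5.13}$$
and more generally
$$p_n = \frac{1}{n!}\left(\frac{\lambda}{\mu}\right)^{n} p_0, \qquad 1 \le n < k, \tag{5.14}$$
and this equation is also true for $n = 0$. When we put $n = k$ in (5.10) we can derive
$$p_{k+1} = \left(\frac{\lambda}{k\mu}\right)^{k+1} \frac{k^k}{k!}\,p_0, \tag{5.15}$$
and when we put $n = k + 1$ we can derive
$$p_{k+2} = \left(\frac{\lambda}{k\mu}\right)^{k+2} \frac{k^k}{k!}\,p_0, \tag{5.16}$$
and in general for $n \ge k$,
$$p_n = \left(\frac{\lambda}{k\mu}\right)^{n} \frac{k^k}{k!}\,p_0, \qquad k \le n. \tag{5.17}$$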
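We need to evaluate $p_0$. The condition $\sum_{n=0}^{\infty} p_n = 1$ must be satisfied, so
$$p_0 \sum_{n=0}^{k-1} \left(\frac{\lambda}{\mu}\right)^{n} \frac{1}{n!} + p_0 \sum_{n=k}^{\infty} \left(\frac{\lambda}{k\mu}\right)^{n} \frac{k^k}{k!} = 1, \tag{5.18}$$
and
$$\frac{1}{p_0} = \sum_{n=0}^{k-1} \left(\frac{\lambda}{\mu}\right)^{n} \frac{1}{n!} + \frac{k^k}{k!} \sum_{n=k}^{\infty} \left(\frac{\lambda}{k\mu}\right)^{n}. \tag{5.19}$$
The second term on the right-hand side can be written as
$$\frac{k^k}{k!} \sum_{n=k}^{\infty} \left(\frac{\lambda}{k\mu}\right)^{n} = \frac{k^k}{k!} \left\{ \sum_{n=0}^{\infty} \left(\frac{\lambda}{k\mu}\right)^{n} - \sum_{n=0}^{k-1} \left(\frac{\lambda}{k\mu}\right)^{n} \right\}. \tag{5.20}$$
For the right-hand side we use the formulas for the sums of infinite and finite geometric series to obtain
$$\frac{k^k}{k!} \sum_{n=k}^{\infty} \left(\frac{\lambda}{k\mu}\right)^{n} = \frac{k^k}{k!} \left[ \frac{1}{1 - \lambda/k\mu} - \frac{1 - (\lambda/k\mu)^k}{1 - \lambda/k\mu} \right] = \frac{(\lambda/\mu)^k}{k!\,(1 - \lambda/k\mu)}. \tag{5.21}$$
Substituting in (5.19) we get
$$\frac{1}{p_0} = \sum_{n=0}^{k-1} \left(\frac{\lambda}{\mu}\right)^{n} \frac{1}{n!} + \frac{(\lambda/\mu)^k}{k!\,(1 - \lambda/k\mu)}. \tag{5.22}$$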
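The formulas (5.14), (5.17), and (5.22) can be checked numerically. The sketch below is an illustration only: the function names and the numerical values are assumptions, and the requirement $\lambda < k\mu$ is the convergence condition for the infinite geometric series used in (5.21), which the text does not state explicitly.

```python
from math import factorial


def multichannel_p0(lam, mu, k):
    """p_0 from (5.22); assumes lam < k * mu so the geometric series converges."""
    a = lam / mu
    if a >= k:
        raise ValueError("this sketch assumes lam < k * mu")
    head = sum(a**n / factorial(n) for n in range(k))
    tail = a**k / (factorial(k) * (1 - a / k))
    return 1.0 / (head + tail)


def multichannel_pn(lam, mu, k, n):
    """p_n from (5.14) for n < k and from (5.17) for n >= k."""
    a = lam / mu
    p0 = multichannel_p0(lam, mu, k)
    if n < k:
        return a**n / factorial(n) * p0
    return (a / k) ** n * k**k / factorial(k) * p0


# Illustrative values (assumptions, not taken from the text).
lam, mu, k = 3.0, 2.0, 3
prob_wait = 1.0 - sum(multichannel_pn(lam, mu, k, n) for n in range(k))
print(round(prob_wait, 4))
```

Here `prob_wait` is the probability that all $k$ channels are busy, computed as one minus the probability of finding fewer than $k$ customers present.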
Substituting the value for $p_0$ implied by this equation into (5.14) and (5.17) gives the solutions for all $p_n$. As an application of these formulas, we have that the probability of a new random arrival having to wait in line is
$$\Pr\{n \ge k\} = \sum_{n=k}^{\infty} p_n = p_0\,\frac{k^k}{k!} \sum_{n=k}^{\infty} \left(\frac{\lambda}{k\mu}\right)^{n} = p_0\,\frac{(\lambda/\mu)^k}{k!\,(1 - \lambda/k\mu)}, \tag{5.23}$$
where $p_0$ is given by (5.22).

4.6. Inventory Control
In this section we will merely indicate how the methods of queuing theory are applicable to inventory control. In Section 4.4 we discussed single-channel queues of finite length $N$, and in Section 4.5 we discussed multichannel infinite queues. Suppose that we continued with the natural extension to multichannel finite queues, and then particularized to the case where the maximum queue length $N$ equals the number of channels $k$.

Now consider a retailer of large, single, standard items, whose customers will purchase an item if he can deliver it from stock but otherwise will go away and look for it at a competitor's store. This retailer is assumed to place a reorder every time he makes a sale. This situation is the exact analogue of the queuing situation discussed in the previous paragraph. We have the relationships

Queuing situation                              Inventory situation
Number of channels = k, maximum length         Maximum size of inventory = N
  of queue = N, with N = k
Idle channel                                   Item in stock
Arrival of customer and start of service       Item purchased and order for replacement sent immediately
End of service                                 Delivery of ordered item
Service time                                   Time for reordered item to be delivered
All channels busy                              No item in stock

It is apparent that the solution of the queuing problem is the solution of the inventory problem. The optimum inventory will depend on the relative costs of holding an item in inventory, the profit on a sale when made, and the loss in customer good will when a customer is turned away because there are no items in stock. The reader is referred to Chapter 10 of [4] for an examination of these questions.
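The text does not carry out the extension to a multichannel queue with $N = k$. As a hedged sketch of what it might look like, suppose that the equilibrium probabilities keep the form (5.14) for $0 \le n \le k$ and are renormalized over those $k + 1$ states, with exponentially distributed delivery times; this truncation is an assumption introduced here, as are the function name and the figures used. Under that assumption the probability that no item is in stock is the probability that all $k$ channels are busy.

```python
from math import factorial


def stockout_probability(lam, mu, k):
    """Probability that all k channels are busy (no item in stock).

    Assumes p_n is proportional to (lam/mu)^n / n! for 0 <= n <= k,
    i.e. the form (5.14) truncated at n = k and renormalized.
    """
    a = lam / mu
    weights = [a**n / factorial(n) for n in range(k + 1)]
    return weights[-1] / sum(weights)


# Hypothetical figures: 2 customers per week, mean delivery time 1 week (mu = 1).
for stock in range(1, 6):
    print(stock, round(stockout_probability(2.0, 1.0, stock), 4))
```

A table of this kind shows how quickly the fraction of time without stock falls as the maximum inventory is raised, which is one input to the cost comparison described above.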
EXERCISES

4.1. For a single-channel infinite queue with Poisson input with parameter $\lambda$ and exponential service time with parameter $\mu$, what is the mode of the distribution of the queue length $n$?

4.2. For a queue of the type discussed in Section 4.2, with Poisson input with parameter $\lambda$ and exponential service time with parameter $\mu$, show that
$$V[n] = E[n] + (E[n])^2,$$
where $E[n]$ is given by (2.23).

4.3. Consider a queuing situation in which the same corporation operates both the customers and the service facility; e.g., a large cab company owning and operating both cabs and a service facility for the cabs. Suppose the cabs arrive randomly with Poisson parameter $\lambda$ and a single service channel is available for which the service times are exponential with parameter $\mu$. Suppose that the cost per hour of providing a service facility with parameter $\mu$ is $M\mu$, and that the cost in lost time, etc., for a cab in waiting and being serviced is $C$ per hour. The constants $M$ and $C$ are fixed, as is the parameter $\lambda$. Suppose that $\mu$ can be varied, e.g., by providing a larger service staff or more equipment. What value should $\mu$ be to minimize the total cost per hour to the corporation (owning both cabs and service facility)?

4.4. In a queuing situation with single server, with Poisson input with parameter $\lambda$ and exponential service time with parameter $\mu$, let $n_w$ be the number waiting in line for service (i.e., the total queue length $n$ minus the customer being served, if any). Find the expected value of $n_w$.

4.5. A manager operates a service facility in which customers arrive randomly with Poisson parameter $\lambda$ and queue for service. The service times are exponential with parameter $\mu$. The manager can vary $\mu$. Suppose that he wishes to choose $\mu$ so that only in a fraction $\alpha$ of the time, on the average, will the total queue length equal or exceed $n$ members. Express the value of $\mu$ that achieves this condition explicitly in terms of $\lambda$, $\alpha$, and $n$.

4.6. For a single-channel finite queue of fixed maximum length $N$, prove (4.15) and show that the average length of the queue $E[n]$ has the following limits:
(a) $\lim_{\rho \to 0} E[n] = \rho + \rho^2$,
(b) $\lim_{\rho \to 1} E[n] = N/2$,
(c) $\lim_{\rho \to \infty} E[n] = N - 1/\rho$.

4.7. Show that (3.25) is consistent with (2.23), and that (3.28) is consistent with (2.25), when the appropriate values of $T$ and $\sigma$ are inserted in (3.25) and (3.28).

4.8. For a multichannel queue of the type discussed in Section 4.5, show that the average number waiting for service is
4.9. Consider two single-channel, Poisson arrival, infinite queuing situations. In queue a, the service time is a constant, say $T$. In queue b, the service time has an exponential distribution with mean $T$. Obtain a simple result for the ratio of the average waiting time in queue a to the average waiting time in queue b.

4.10. Consider two single-channel, Poisson arrival, infinite queuing situations. In queue a, the service time has a rectangular distribution over the interval $(0, T)$. In queue b, the service time has a constant value equal to the mean of the service time in a. Obtain a simple result for the ratio of the average waiting time in queue a to the average waiting time in queue b.

4.11. Let $P_{0,1}$ be the probability of an arriving customer not having to wait for service with a single-channel infinite queue with Poisson arrivals with parameter $\lambda$ and exponential service time distributed with parameter $\mu_1$. Let $P_{0,2}$ be the probability of an arriving customer not having to wait for service with a two-channel infinite queue with Poisson arrivals with parameter $\lambda$ and exponential service time distributed with parameter $\mu_2 = \mu_1/2$. Thus the average service time in the single-channel queue is one-half the average service time in the two-channel queue. Obtain a simple expression for the ratio $P_{0,2}/P_{0,1}$ in terms of $\lambda/\mu_2 = \rho_2$. If $\rho_2 = 1$, what is $P_{0,2}/P_{0,1}$?
REFERENCES

1. Brockmeyer, E., Halstrøm, H. L., and Jensen, Arne, The Life and Works of A. K. Erlang. Transactions of the Danish Academy of Technical Sciences, Number 2. Copenhagen, 1948.
2. Kendall, David G., "Some Problems in the Theory of Queues," Journal of the Royal Statistical Society, B 13 (1951), 151-173.
3. Saaty, Thomas L., Elements of Queuing Theory. New York: McGraw-Hill Book Co., 1961.
4. Morse, Philip M., Queues, Inventories and Maintenance. New York: John Wiley and Sons, 1958.
5. Takacs, L., Introduction to the Theory of Queues. London: Oxford University Press, 1962.
CHAPTER 5

The Multinomial Distribution and Contingency Tables

5.1. The Multinomial Distribution

So-called contingency tables are the main topic of this chapter. Their analysis requires a consideration of the multinomial distribution, which we discuss in this section. Section 5.2 treats a method of obtaining approximate percentage points in terms of the $\chi^2$ distribution. Contingency tables themselves are discussed in Section 5.3.

The multinomial distribution is a generalization of the binomial distribution in which instead of two possible outcomes $A$ and $\bar{A}$ there are $k$ mutually exclusive and exhaustive outcomes $A_1, \ldots, A_k$ with probabilities $\theta_1, \ldots, \theta_k$, where $\sum_i^k \theta_i = 1$. We want the probability that, on $n$ trials, $A_1$ occurs exactly $x_1$ times, ..., $A_k$ occurs exactly $x_k$ times, where $\sum_i^k x_i = n$. The probability of getting the specific sequence
$$\underbrace{A_1 \cdots A_1}_{x_1 \text{ times}} \cdots \underbrace{A_k \cdots A_k}_{x_k \text{ times}} \tag{1.1}$$
is, since successive trials are independent, $\theta_1^{x_1} \cdots \theta_k^{x_k}$. The ordering of the sequence is irrelevant to our purpose, as we are merely interested in the total number of times $A_1$ occurs, etc. All sequences with a specified set of $x_i$ have the same probability. It remains to evaluate the number of sequences with a specified set of $x_i$. Equation (1.7.6) gave the number of arrangements of $x_1$ objects of type $A_1$, $x_2$ objects of type $A_2$, etc., as $n!/(x_1!\,x_2! \cdots x_k!)$, so
$$\Pr\{x_1, \ldots, x_k\} = \frac{n!}{x_1! \cdots x_k!}\,\theta_1^{x_1} \cdots \theta_k^{x_k} \tag{1.2}$$
is the frequency function of the multinomial distribution.

The multivariate distribution considered in Section 1.19 is an example of a multinomial distribution, with $\theta_U = 1/2$, $\theta_F = 1/3$, $\theta_G = 1/6$. In that section we calculated from first principles the entire distribution, giving the results in Table 1.17. Equation (1.2) will give the same results directly. For example,
$$\Pr\{2, 0, 1\} = \frac{3!}{2!\,0!\,1!} \left(\frac{1}{2}\right)^{2} \left(\frac{1}{3}\right)^{0} \left(\frac{1}{6}\right)^{1} = \frac{1}{8}. \tag{1.3}$$
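The frequency function (1.2) is simple to evaluate exactly with rational arithmetic. The following sketch (the function name is an assumption made for illustration) reproduces the probability of the outcome $\{2, 0, 1\}$ for the three cell probabilities $1/2$, $1/3$, $1/6$.

```python
from fractions import Fraction
from math import factorial


def multinomial_probability(counts, probs):
    """Frequency function (1.2): n!/(x_1! ... x_k!) * product of theta_i ** x_i."""
    n = sum(counts)
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)   # each partial quotient is an integer
    probability = Fraction(coefficient)
    for x, theta in zip(counts, probs):
        probability *= Fraction(theta) ** x
    return probability


thetas = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
print(multinomial_probability([2, 0, 1], thetas))   # 1/8
```

Summing the function over all sets of counts with a fixed total gives 1, which is a convenient check on any tabulation of the distribution.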
In the case of $k = 2$, the multinomial distribution reduces immediately to the binomial. In general, however, it is awkward to work with, and we outline in the next section the development of an approximation to (1.2) which can then be used to obtain certain cumulative probabilities.

5.2. The $\chi^2$ Approximation for the Multinomial Distribution

We use the symbol $P^*$ for the probability defined in (1.2). As in the derivation of an approximation to the binomial distribution, so here we use Stirling's approximation (3.1.2) for the factorials in (1.2). Thus
$$P^* \simeq \frac{n^{n + 1/2}}{(2\pi)^{(k-1)/2}\,\prod_i^k x_i^{x_i + 1/2}}\,\prod_i^k \theta_i^{x_i}. \tag{2.1}$$
This involves $n$ to the $(n + 1/2)$th power. We can write
$$n + \frac{1}{2} = -\frac{1}{2}(k - 1) + \sum_i^k \left(x_i + \frac{1}{2}\right), \tag{2.2}$$
and also
$$\prod_i^k \theta_i^{x_i} = \prod_i^k \theta_i^{-1/2}\,\theta_i^{x_i + 1/2}. \tag{2.3}$$
Substituting in (2.1) gives
$$P^* \simeq (2\pi n)^{-(k-1)/2} \left(\prod_i^k \theta_i\right)^{-1/2} \prod_i^k \left(\frac{n\theta_i}{x_i}\right)^{x_i + 1/2}. \tag{2.4}$$
Taking logarithms,
$$\log P^* \simeq -\frac{1}{2}(k - 1)\log(2\pi n) - \frac{1}{2}\sum_i^k \log\theta_i - \sum_i^k \left(x_i + \frac{1}{2}\right)\log\left(\frac{x_i}{n\theta_i}\right). \tag{2.5}$$
The expected value for the $i$th cell is $n\theta_i = e_i$, say. Define the deviation
of the observed value in the $i$th cell from its expected value as $d_i = x_i - e_i$. Then
$$\frac{x_i}{n\theta_i} = \frac{d_i + e_i}{e_i} = 1 + \frac{d_i}{e_i} \quad \text{and} \quad \left(x_i + \frac{1}{2}\right) = \left(e_i + d_i + \frac{1}{2}\right). \tag{2.6}$$
Substituting in (2.5),
$$\log P^* \simeq -\frac{1}{2}(k - 1)\log(2\pi n) - \frac{1}{2}\sum_i^k \log\theta_i - \sum_i^k \left(e_i + d_i + \frac{1}{2}\right)\log\left(1 + \frac{d_i}{e_i}\right). \tag{2.7}$$
Again as in the binomial case we use the expansion
$$\log(1 + y) = y - \frac{y^2}{2} + \frac{y^3}{3} - \cdots. \tag{2.8}$$
For the first terms of this series to give a good approximation we need $|y| \ll 1$. Here we are going to substitute $y = d_i/e_i$. We therefore need to check on the magnitude of $|d_i/e_i|$. From (1.18.21) we have, when $x$ has the distribution $N(\xi, \sigma^2)$,
$$\Pr\{|x - \xi| > k\sigma\} = 2\Phi(-k). \tag{2.9}$$
We can regard $x_i$ as a binomial variable which will be approximately normally distributed with expectation $n\theta_i = e_i$ and variance $n\theta_i(1 - \theta_i)$. Hence
$$\Pr\{|x_i - e_i| > k\sqrt{n\theta_i(1 - \theta_i)}\} = 2\Phi(-k). \tag{2.10}$$
Choose for $k$ the value $\sqrt{n\theta_i/(1 - \theta_i)}$. Then we have
$$\Pr\left\{|d_i| > \sqrt{\frac{n\theta_i}{1 - \theta_i}}\,\sqrt{n\theta_i(1 - \theta_i)}\right\} = 2\Phi\left(-\sqrt{\frac{n\theta_i}{1 - \theta_i}}\right), \tag{2.11}$$
or, since $\sqrt{n\theta_i/(1 - \theta_i)}\,\sqrt{n\theta_i(1 - \theta_i)} = n\theta_i = e_i$,
$$\Pr\{|d_i| > e_i\} = 2\Phi\left(-\sqrt{\frac{n\theta_i}{1 - \theta_i}}\right). \tag{2.12}$$
For moderate values and beyond of $e_i = n\theta_i$, say 5 or greater, $\sqrt{n\theta_i/(1 - \theta_i)}$ will be greater than $\sqrt{5}$, and hence $\Phi(-\sqrt{n\theta_i/(1 - \theta_i)})$ will be very small. Hence $\Pr\{|d_i/e_i| > 1\}$ will be very small, and we will be justified in using the expansion (2.8). We note here that $d_i$ will be of the order of its standard deviation $\sqrt{n\theta_i(1 - \theta_i)}$, or approximately $\sqrt{e_i}$. We now examine
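the second summation term in (2.7), using (2.8) to expand $\log(1 + d_i/e_i)$:
$$\begin{aligned}
&\sum_i^k \left(e_i + d_i + \frac{1}{2}\right)\left[\frac{d_i}{e_i} - \frac{1}{2}\left(\frac{d_i}{e_i}\right)^{2} + \frac{1}{3}\left(\frac{d_i}{e_i}\right)^{3} - \frac{1}{4}\left(\frac{d_i}{e_i}\right)^{4} + \cdots\right] \\
&\quad = \sum_i^k \left(d_i - \frac{1}{2}\frac{d_i^2}{e_i} + \frac{1}{3}\frac{d_i^3}{e_i^2} - \cdots + \frac{d_i^2}{e_i} - \frac{1}{2}\frac{d_i^3}{e_i^2} + \frac{1}{3}\frac{d_i^4}{e_i^3} - \cdots + \frac{1}{2}\frac{d_i}{e_i} - \frac{1}{4}\frac{d_i^2}{e_i^2} + \cdots\right).
\end{aligned} \tag{2.13}$$
We note that
$$\sum_i^k d_i = \sum_i^k (x_i - e_i) = \sum_i^k x_i - \sum_i^k e_i = 0.$$
Examining the terms in powers of $e_i$, recalling that $d_i$ is of order $\sqrt{e_i}$, we see that the zero-order terms are
$$-\frac{1}{2}\sum_i^k \frac{d_i^2}{e_i} + \sum_i^k \frac{d_i^2}{e_i} = \frac{1}{2}\sum_i^k \frac{d_i^2}{e_i}, \tag{2.14}$$
and the terms of order $1/\sqrt{e_i}$ are
$$\sum_i^k \left(\frac{1}{3}\frac{d_i^3}{e_i^2} - \frac{1}{2}\frac{d_i^3}{e_i^2} + \frac{1}{2}\frac{d_i}{e_i}\right). \tag{2.15}$$
For moderately large $e_i$, say greater than 5, all terms except those of zero order may be neglected. Hence, substituting back in (2.7) and taking antilogarithms, we get
$$P^* \simeq (2\pi n)^{-(k-1)/2} \left(\prod_i^k \theta_i\right)^{-1/2} \exp\left[-\frac{1}{2}\sum_i^k \frac{d_i^2}{e_i}\right]. \tag{2.16}$$
Here all the terms preceding the exponential part, involving $\pi$, $n$, $k$, and the $\theta_i$, are constants or parameters. Thus $P^*$ is, to this approximation, determined by $\sum_i^k d_i^2/e_i$, and historically the symbol $\chi^2$ was given to this statistic:
$$\chi^2 = \sum_i^k \frac{d_i^2}{e_i} = \sum_i^k \frac{(x_i - e_i)^2}{e_i}. \tag{2.17}$$
For a test of significance based on this statistic, we need to be able to calculate the probability of obtaining a value of the statistic as large as or larger than the observed value. We can write (2.17) in the form
$$\chi^2 = \sum_i^k (1 - \theta_i)\left[\frac{x_i - e_i}{\sqrt{n\theta_i(1 - \theta_i)}}\right]^{2}. \tag{2.18}$$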
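The quality of the approximation (2.16) can be examined by comparing it with the exact probability (1.2) for particular outcomes. The sketch below does this; the function names and the choice of $n = 30$ with three equally likely cells are assumptions made for illustration.

```python
from math import exp, factorial, pi, prod


def exact_multinomial(counts, probs):
    """Exact probability (1.2)."""
    coefficient = factorial(sum(counts))
    for x in counts:
        coefficient //= factorial(x)
    return coefficient * prod(t**x for t, x in zip(probs, counts))


def approximate_multinomial(counts, probs):
    """Approximation (2.16), with e_i = n * theta_i and d_i = x_i - e_i."""
    n, k = sum(counts), len(counts)
    chi_square = sum((x - n * t) ** 2 / (n * t) for x, t in zip(counts, probs))
    constant = (2 * pi * n) ** (-(k - 1) / 2) * prod(probs) ** -0.5
    return constant * exp(-chi_square / 2)


probs = [1 / 3, 1 / 3, 1 / 3]   # illustrative cell probabilities, so e_i = 10
for counts in ([10, 10, 10], [12, 9, 9]):
    print(counts, exact_multinomial(counts, probs), approximate_multinomial(counts, probs))
```

For these outcomes, with every $e_i = 10$, the two values agree to within a few per cent.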
If we consider a particular cell, say the $i$th, and regard ourselves as in a binomial situation, then approximately
$$\frac{x_i - e_i}{\sqrt{n\theta_i(1 - \theta_i)}} \sim N(0, 1). \tag{2.19}$$
Hence our "$\chi^2$," as defined by (2.17), is a weighted sum, the weights being $1 - \theta_i$, of squares of approximate unit normal deviates. These approximate unit normal deviates are not quite independent, since $\sum_i^k (x_i - e_i) = 0$. Thus our "$\chi^2$," because of this restriction, and on account of the weights, is not a true $\chi^2(k)$ as defined in (1.27.1). However, it may be shown that it is approximately a true $\chi^2(k - 1)$.

As an example, suppose that the lost-time accidents reported for a certain period for three shifts are 1, 7, and 7. We wish to test the null hypothesis that $\theta_1 = \theta_2 = \theta_3$, i.e., that $\theta_i = 1/3$. There are a total of $n = 15$ accidents; so the expected number for each shift is $e_i = n\theta_i = 15 \times 1/3 = 5$, and the corresponding values of $d_i = x_i - e_i$ for the three shifts are $1 - 5 = -4$, $7 - 5 = 2$, and $7 - 5 = 2$. The test statistic, with degrees of freedom $k - 1 = 2$, is
$$\chi^2 = \sum_i^k \frac{d_i^2}{e_i} = \frac{(-4)^2}{5} + \frac{2^2}{5} + \frac{2^2}{5} = 4.80. \tag{2.20}$$
The 0.90 and 0.95 points of $\chi^2$ with 2 degrees of freedom are 4.61 and 5.99. The observed value of 4.80 being intermediate between these two points, our $P$ value is approximately $1 - 0.91 = 0.09$.

As another example, consider the data of Table 3.11. Under the null hypothesis that the observations are Poisson distributed, we have a sample of size 576 which should be multinomially distributed with $\theta_i$ given by the column headed $p\{x\}$. The column headed $np\{x\}$ gives the expected numbers $e_i$; the last two columns are $d_i = x_i - e_i$ and $d_i^2/e_i$. Since the $\chi^2$ approximation for the multinomial distribution is not very satisfactory for $e_i < 5$, the cells for $x \ge 4$ are combined into a single cell. Summing the entries in the last column gives $\chi^2 = 1.03$. As regards the degrees of freedom for this $\chi^2$, here we have five classes; so $k = 5$, and we have constrained $\sum d_i$ to be zero, and we have also estimated the parameter of the Poisson distribution which we fitted to the data. The degrees of freedom are therefore $5 - 1 - 1 = 3$. Table III of the appendix gives for three degrees of freedom the 0.10 and 0.25 points of the $\chi^2$ distribution as 0.58 and 1.21; so $1 - P \simeq 0.20$ and $P \simeq 0.8$. The data of Table 3.11 are thus very well fitted by a Poisson distribution.
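The accident example can be reproduced in a few lines. The sketch below computes the statistic (2.17) and, as an added convenience not used in the text, obtains the $P$ value from the fact that the upper tail area of $\chi^2$ with exactly 2 degrees of freedom is $e^{-\chi^2/2}$; the function name is an assumption made for illustration.

```python
from math import exp


def chi_square_statistic(observed, expected):
    """Statistic (2.17): sum of d_i^2 / e_i."""
    return sum((x - e) ** 2 / e for x, e in zip(observed, expected))


accidents = [1, 7, 7]
n = sum(accidents)
expected = [n / 3] * 3                  # theta_i = 1/3 under the null hypothesis
statistic = chi_square_statistic(accidents, expected)

# For exactly 2 degrees of freedom the upper tail of chi-square is exp(-x/2).
p_value = exp(-statistic / 2)
print(round(statistic, 2), round(p_value, 3))   # 4.8 0.091
```

The computed tail area agrees with the value of about 0.09 read from the tabulated percentage points.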
5.3. Contingency Tables

Suppose we have a random sample of $n$ objects, cross-classified according to two attributes $A_i$, $B_j$. Let the probability of an object having the attributes $A_i$, $B_j$ be $\theta_{ij}$, and the number observed in the sample be $n_{ij}$:
$$\sum_i^k \sum_j^m n_{ij} = n, \qquad \sum_i^k \sum_j^m \theta_{ij} = 1.$$
Let the marginal row probabilities be $\theta_{i.}$ and the marginal row sums be $n_{i.}$ as in Tables 5.1 and 5.2. By (1.19.1) and (1.19.2),
$$\theta_{i.} = \sum_j^m \theta_{ij}, \qquad \theta_{.j} = \sum_i^k \theta_{ij}. \tag{3.1}$$

Table 5.1. The $k \times m$ array of cell probabilities $\theta_{ij}$, with rows $A_i$ and columns $B_j$, row margins $\theta_{i.}$, and column margins $\theta_{.1}, \ldots, \theta_{.m}$.

Table 5.2. The corresponding array of observed numbers $n_{ij}$, with row margins $n_{i.}$, column margins $n_{.1}, \ldots, n_{.m}$, and grand total $n$.

If the true probabilities $\theta_{ij}$ are known, the agreement between the observed and hypothetical distribution can be tested by the statistic (2.17) developed in the previous section,
$$\chi^2 = \sum_i^k \sum_j^m \frac{(n_{ij} - n\theta_{ij})^2}{n\theta_{ij}}, \tag{3.2}$$
with $km - 1$ degrees of freedom. However, usually in practice the $\theta_{ij}$ have to be estimated from the data. We are usually interested in testing the hypothesis that the row and column classifications are independent, i.e., that
$$\Pr\{A_i, B_j\} = \Pr\{A_i\}\Pr\{B_j\}, \tag{3.3}$$
i.e.,
$$\theta_{ij} = \theta_{i.}\theta_{.j}. \tag{3.4}$$
We now find the maximum likelihood estimators of $\theta_{i.}$ and $\theta_{.j}$. We first find a suitable expression for the likelihood function. Assuming
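that we have a multinomial distribution of the form (1.2), if we take a sample of size 1 then the $x_{ij}$ are all zero except for one of them, which has the value 1. Thus the $x_{ij}!$ equal either $0! = 1$ or $1! = 1$. Thus the frequency function is
$$\frac{1!}{x_{11}! \cdots x_{km}!}\,\theta_{11}^{x_{11}} \cdots \theta_{km}^{x_{km}} = \prod_{i=1}^{k} \prod_{j=1}^{m} \theta_{ij}^{x_{ij}}. \tag{3.5}$$
Assume now that we take a sequence of $n$ samples, each of size 1, and let $n_{ij}$ be the sum of the $x_{ij}$ summed over this sequence of $n$ observations. Thus $n_{ij}$ is equal to the number of occurrences in the $ij$th cell. Writing $x_{ij\nu}$ for the value of $x_{ij}$ in the $\nu$th sample, the likelihood function (2.3.1) is
$$L = p\{x_1\} \cdots p\{x_n\} = \prod_{\nu=1}^{n} \prod_i^k \prod_j^m \theta_{ij}^{x_{ij\nu}} = \prod_i^k \prod_j^m \theta_{ij}^{n_{ij}}. \tag{3.6}$$
Under the null hypothesis of independence of the row and column classifications, (3.4), we have
$$L = \prod_i^k \prod_j^m (\theta_{i.}\theta_{.j})^{n_{ij}} = \left(\prod_i^k \prod_j^m \theta_{i.}^{n_{ij}}\right)\left(\prod_i^k \prod_j^m \theta_{.j}^{n_{ij}}\right). \tag{3.7}$$
Now we can write for the first parenthesis,
$$\prod_i^k \prod_j^m \theta_{i.}^{n_{ij}} = \prod_i^k \left(\theta_{i.}^{n_{i1}}\,\theta_{i.}^{n_{i2}} \cdots \theta_{i.}^{n_{im}}\right) = \prod_i^k \theta_{i.}^{\sum_j n_{ij}} = \prod_i^k \theta_{i.}^{n_{i.}}, \tag{3.8}$$
and analogously for the second parenthesis. Thus
$$L = \prod_i^k \theta_{i.}^{n_{i.}} \prod_j^m \theta_{.j}^{n_{.j}}. \tag{3.9}$$
Now since $\sum_i^k \theta_{i.} = 1$ we can write $\theta_{k.}$ as
$$\theta_{k.} = 1 - \sum_{i=1}^{k-1} \theta_{i.}. \tag{3.10}$$
Substituting in (3.9), we get
$$L = \left(1 - \sum_{i=1}^{k-1} \theta_{i.}\right)^{n_{k.}} \prod_{i=1}^{k-1} \theta_{i.}^{n_{i.}} \prod_{j=1}^{m} \theta_{.j}^{n_{.j}}, \tag{3.11}$$
and, taking logarithms,
$$\log L = n_{k.}\log\left(1 - \sum_{i=1}^{k-1} \theta_{i.}\right) + \sum_{i=1}^{k-1} n_{i.}\log\theta_{i.} + \sum_{j=1}^{m} n_{.j}\log\theta_{.j}. \tag{3.12}$$
To find the value of $\theta_{i.}$ which maximizes this, we differentiate with respect to $\theta_{i.}$ and equate to zero:
$$\frac{\partial \log L}{\partial \theta_{i.}} = n_{k.}(-1)\,\frac{1}{1 - \sum_{i=1}^{k-1} \theta_{i.}} + n_{i.}\,\frac{1}{\theta_{i.}} = 0. \tag{3.13}$$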
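Using (3.10), this gives
$$\hat{\theta}_{i.} = n_{i.}\left(\frac{\hat{\theta}_{k.}}{n_{k.}}\right), \tag{3.14}$$
so
$$\sum_i^k \hat{\theta}_{i.} = \left(\frac{\hat{\theta}_{k.}}{n_{k.}}\right)\sum_i^k n_{i.} = n\left(\frac{\hat{\theta}_{k.}}{n_{k.}}\right). \tag{3.15}$$
But $\sum_i^k \hat{\theta}_{i.} = 1$, so $\hat{\theta}_{k.}/n_{k.} = 1/n$. Substituting in (3.14),
$$\hat{\theta}_{i.} = n_{i.}\left(\frac{\hat{\theta}_{k.}}{n_{k.}}\right) = \frac{n_{i.}}{n}. \tag{3.16}$$
Similar arguments must apply to the estimation of $\theta_{.j}$; so $\hat{\theta}_{.j} = n_{.j}/n$. The maximum likelihood estimators of the $\theta_{ij}$ are thus
$$\hat{\theta}_{ij} = \frac{n_{i.}}{n}\,\frac{n_{.j}}{n}, \tag{3.17}$$
and these can be inserted in (3.2) to give
$$\chi^2 = \sum_i^k \sum_j^m \frac{(n_{ij} - n_{i.}n_{.j}/n)^2}{n_{i.}n_{.j}/n}. \tag{3.18}$$
However, this $\chi^2$ does not have the degrees of freedom of the $\chi^2$ in (3.2), namely $km - 1$, since we have estimated a number of parameters from the data. We have estimated $k$ parameters $\theta_{i.}$, but, in view of the restriction that $\sum_i^k \theta_{i.} = 1$, only $k - 1$ of these are independent. Likewise $m - 1$ degrees of freedom are taken in estimation of the $\theta_{.j}$. The degrees of freedom for the $\chi^2$ in (3.18) are thus
$$(km - 1) - (k - 1) - (m - 1) = (k - 1)(m - 1). \tag{3.19}$$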
As an illustration of a contingency table, Abrahamson et al. [1] present data on the bacterial count of three types of pastry (Table 5.3). We ask whether the distribution of class of bacterial count varies according to type of pastry.

Table 5.3
Type of pastry          Low    Medium    High    Totals
Eclairs                  92        37      46       175
Napoleons                53        15      19        87
Coconut custard pies     75        19      12       106
Totals                  220        71      77       368

We estimate the expected number for each cell as $n_{i.}n_{.j}/n$. For $i = 1$, $j = 1$, for example, this is $175 \times 220/368 = 104.620$. The calculations are given in detail in Table 5.4. Summing all the entries in the third part of this table,
$$\sum_i^k \sum_j^m \frac{d_{ij}^2}{e_{ij}} = 11.389,$$
and this will be distributed as $\chi^2$ with degrees of freedom $(3 - 1)(3 - 1) = 4$ under the null hypothesis. The 0.975 point of $\chi^2(4)$ is 11.1; so $P < 0.025$, and therefore we reject the null hypothesis that the distribution of bacterial count is independent of the type of pastry. In other words, the pastries do differ in their distributions of bacterial counts.

Table 5.4
                          Low      Medium      High
Expectations $e_{ij}$
Eclairs                104.620     33.764     36.617
Napoleons               52.011     16.785     18.204
Coconut custard pies    63.370     20.451     22.179
Observed minus expectation $d_{ij}$
Eclairs                -12.620      3.236      9.383
Napoleons                0.989     -1.785      0.796
Coconut custard pies    11.630     -1.451    -10.179
$d_{ij}^2/e_{ij}$
Eclairs                 1.5223     0.3101     2.4044
Napoleons               0.0188     0.1898     0.0348
Coconut custard pies    2.1344     0.1029     4.6716

This is all that the $\chi^2$ test as applied to a contingency table will do for us. It does not tell us in what specific way or ways the pastries differ. To form an opinion on that question we need to compare visually the table of expectations with the table of observations and see if it is obvious in what way the discrepancy between expectation and observation is arising. This has to be done on a common-sense basis. If we attempt to formulate and test any particular hypothesis, its real significance level is distorted by the fact that we are making multiple tests suggested by the data. In the present instance, there are no obvious difficulties to the obvious interpretation of the $d_{ij}$ in Table 5.4: Coconut custard pies have a relatively small number of high counts and a relatively large number of low counts, napoleons are average, and eclairs have a relatively large
number of high counts and a small number of low counts. In other words, stay away from eclairs.

For a very extensive examination of contingency tables with particular emphasis on measures of association that are relevant to particular purposes, see Goodman and Kruskal [2-4]. For a general review of the $\chi^2$ test see Cochran [5, 6]. The examination of three-dimensional contingency tables proves surprisingly difficult: for a recent review see Goodman [7].

5.4. The Two × Two Table

The $2 \times 2$ contingency table is a special, but frequently occurring, case of the $k \times m$ contingency table. We suppose that in an infinite population all elements are categorized as $A_1$ or $A_2$ and simultaneously as $B_1$ or $B_2$. For example, $A_1$ and $A_2$ could be fair hair or otherwise and $B_1$ and $B_2$ could be blue eyes or otherwise. We suppose that the fraction of the population classified as $A_iB_j$ is $\theta_{ij}$. We take a random sample of size $n$ from this population and obtain the results in Table 5.5.

Table 5.5
           B_1       B_2       Totals
A_1        n_11      n_12      n_1.
A_2        n_21      n_22      n_2.
Totals     n_.1      n_.2      n

The total number of observations $n$ is fixed, but both the row totals and the column totals are random variables. We are interested in testing the null hypothesis (3.4), for which an appropriate statistic is (3.18), which in this case will have 1 degree of freedom. For Table 5.5, (3.18) can be written after some manipulation as
$$\chi^2 = \frac{n(n_{11}n_{22} - n_{12}n_{21})^2}{n_{1.}\,n_{2.}\,n_{.1}\,n_{.2}}. \tag{4.1}$$
If the rows in Table 5.5 were labeled Population 1 and Population 2 and the columns were labeled Defective and Nondefective, then Table 5.5 would be identical, apart from the difference in sampling procedure, with Table 3.1. For Table 5.5 the null hypothesis is that $\theta_{ij} = \theta_{i.}\theta_{.j}$ for all $i$ and $j$, and in particular $\theta_{11} = \theta_{1.}\theta_{.1}$ and $\theta_{21} = \theta_{2.}\theta_{.1}$, so that, under
216
MULTINOMIAL DISTRIBUTION AND CONTINGENCY TABLES
CHAP.
5
the null hypothesis of independence, 0 °0 = ° = 0· 11
1.
21
.1
(4.2)
2.
From the equation for conditional probability, (1.4.5), we can write
I
Pr{defective item from population i} _ Pr{defective and from population i} Pr{item from population i}
(4.3)
Identifying the right-hand side of (4.3), for $i = 1, 2$, with the first and last terms in (4.2), we have equivalent to the null hypothesis of independence the relationship

$$\Pr\{\text{defective} \mid \text{item from population 1}\} = \Pr\{\text{defective} \mid \text{item from population 2}\}. \tag{4.4}$$
Thus a test of the null hypothesis of independence $\theta_{ij} = \theta_{i.}\theta_{.j}$ is identical with a test that the proportion of defectives is the same in the two populations.

We have encountered three situations giving rise to similar $2 \times 2$ tables, namely Tables 3.1, 3.9, and 5.5. The differences between the three situations were first emphasized by Barnard [8] and Pearson [9]. In Pearson's nomenclature, Problem I is what we discussed in Section 3.10, where both row totals and column totals were fixed, Problem II is what we discussed in Section 3.6, where one set of marginal totals (rows) was fixed and the other a random variable, and Problem III is what we discussed in this section, where both sets of marginal totals are random variables.

The large sample statistics for testing the appropriate null hypotheses are (3.10.7) or (3.10.9) for Problem I, (3.6.8) or (3.6.10) for Problem II, and (4.1) for Problem III. We first note that (3.10.9) is algebraically identical with (3.6.10). Secondly, the square root of (4.1) will be distributed as a unit normal deviate, since $\chi^2(1) = u^2$. But the square root of (4.1) is identical with (3.10.9).

In all three cases the correction for continuity improves the approximation. For (4.1) the correction takes the form of bringing each $n_{ij}$ closer to its expectation by 1/2. This is equivalent to replacing the factor $(n_{11}n_{22} - n_{12}n_{21})^2$ in the numerator by

$$\left(|n_{11}n_{22} - n_{12}n_{21}| - \frac{n}{2}\right)^2, \tag{4.5}$$

the other terms in (4.1) remaining unchanged. In this form the adjustment is widely known as "Yates' correction."
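As a numerical check on (4.1) and on the corrected numerator (4.5), the following minimal Python sketch evaluates both forms for a $2 \times 2$ table and compares the uncorrected value with the usual sum of (observed - expected)^2/expected over the four cells. The function names and the counts are illustrative assumptions, not data from the text.

```python
def chi_square_2x2(n11, n12, n21, n22, yates=False):
    """Statistic (4.1); with yates=True the numerator factor is replaced by (4.5)."""
    n = n11 + n12 + n21 + n22
    row1, row2 = n11 + n12, n21 + n22
    col1, col2 = n11 + n21, n12 + n22
    cross = n11 * n22 - n12 * n21
    if yates:
        factor = (abs(cross) - n / 2) ** 2
    else:
        factor = cross ** 2
    return n * factor / (row1 * row2 * col1 * col2)


def pearson_sum(n11, n12, n21, n22):
    """Sum of (observed - expected)^2 / expected over the four cells."""
    n = n11 + n12 + n21 + n22
    rows = (n11 + n12, n21 + n22)
    cols = (n11 + n21, n12 + n22)
    observed = ((n11, n12), (n21, n22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            total += (observed[i][j] - expected) ** 2 / expected
    return total


def minimum_expectation(n11, n12, n21, n22):
    """Smallest expected cell count, n_i. * n_.j / n."""
    n = n11 + n12 + n21 + n22
    rows = (n11 + n12, n21 + n22)
    cols = (n11 + n21, n12 + n22)
    return min(rows) * min(cols) / n


table = (10, 20, 30, 40)  # hypothetical counts, chosen only for illustration
plain = chi_square_2x2(*table)
corrected = chi_square_2x2(*table, yates=True)
general = pearson_sum(*table)
smallest = minimum_expectation(*table)
print(plain, corrected, general, smallest)
```

The agreement between `plain` and `general` illustrates that (4.1) is only an algebraic rearrangement; the corrected value is always the smaller of the two.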
These three large sample approximations are, of course, only valid for large samples. For most purposes, the approximation is satisfactory if the minimum expectation, calculated as $n_{i.}n_{.j}/n$, regarding the $2 \times 2$ table as being Problem III whether it is or not, exceeds some number such as 5. There is some reason to suppose that this is rather conservative and that it would be satisfactory to use 3.5 as the minimum expectation.

If the minimum expectation is too small to allow the large sample approximation to be used, and we are dealing with a Problem I, we can use the Fisher exact test of Section 3.10 as an exact solution. If we are dealing with a Problem II or III, it was shown by Tocher [10] and Sverdrup [11] that the Fisher exact test is an exact solution to these other situations. An exposition of their arguments is beyond the level of this book. See also Lehmann [12], Sections 4.4 and 4.5.

5.5. Life Testing

Suppose that $n$ electron tubes are placed on life test and that we assume that their life has an exponential distribution $p\{t\} = \theta e^{-\theta t}$. Tubes will fail at varying times, but there is an appreciable probability that we will have to wait a very long time for the longest surviving tube to fail. The problem we discuss in this section is the derivation of the maximum likelihood estimator of $\theta$ from the observations $t_1, \ldots, t_r$, $r < n$, i.e., when the observations are terminated after the failure of the $r$th tube and we have no knowledge of what the lives of the remaining $n - r$ tubes would be other than that they are greater than $t_r$.

[Figure 5.1: the $t$ axis divided at $0, t_1, t_1 + dt_1, \ldots, t_r, t_r + dt_r$, with the numbers $0, 1, 0, \ldots, 1, n - r$ of tubes failing in the successive intervals shown above the axis.]

We imagine the $t$ axis divided into intervals as in Figure 5.1. The numbers above the axis are the numbers of tubes whose lives terminate in the corresponding intervals. The probability element $p\{t_1, t_2, \ldots, t_r\}\,dt_1\,dt_2 \cdots dt_r$ is equal to the probability given by the multinomial probability function (1.2), where the $x$'s are $0, 1, 0, \ldots, 1, n - r$, and the corresponding $\theta$'s are

$$\int_0^{t_1} p\{t\}\,dt, \quad \int_{t_1}^{t_1+dt_1} p\{t\}\,dt, \quad \ldots, \quad \int_{t_r}^{t_r+dt_r} p\{t\}\,dt, \quad \int_{t_r+dt_r}^{\infty} p\{t\}\,dt.$$
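Thus

$$p\{t_1, t_2, \ldots, t_r\}\,dt_1\,dt_2 \cdots dt_r = \frac{n!}{0!\,1!\,0! \cdots 1!\,(n-r)!} \left(\int_0^{t_1} p\{t\}\,dt\right)^{0} \left(\int_{t_1}^{t_1+dt_1} p\{t\}\,dt\right)^{1} \cdots \left(\int_{t_r+dt_r}^{\infty} p\{t\}\,dt\right)^{n-r} \tag{5.1}$$

$$= \frac{n!}{(n-r)!} \int_{t_1}^{t_1+dt_1} p\{t\}\,dt \cdots \int_{t_r}^{t_r+dt_r} p\{t\}\,dt \left(\int_{t_r+dt_r}^{\infty} p\{t\}\,dt\right)^{n-r}. \tag{5.2}$$

By the mean value theorem, $\int_a^b f(x)\,dx = (b - a)f(\xi)$ where $a \le \xi \le b$, so

$$\int_{t_i}^{t_i+dt_i} p\{t\}\,dt \simeq p\{t_i\}\,dt_i = \theta e^{-\theta t_i}\,dt_i, \qquad i = 1, \ldots, r. \tag{5.3}$$

Also, using (1.12.5),

$$\int_{t_r+dt_r}^{\infty} p\{t\}\,dt \simeq \int_{t_r}^{\infty} p\{t\}\,dt = e^{-\theta t_r}. \tag{5.4}$$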
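Thus substituting (5.3) and (5.4) in (5.2), and canceling out the $dt_i$, $i = 1, \ldots, r$, we get

$$p\{t_1, \ldots, t_r\} = \frac{n!}{(n-r)!}\,\theta e^{-\theta t_1} \cdots \theta e^{-\theta t_r}\,(e^{-\theta t_r})^{n-r} = \frac{n!}{(n-r)!}\,\theta^r \exp\left(-\theta \sum_i^r t_i\right) \exp[-(n-r)\theta t_r]. \tag{5.5}$$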
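This is also the likelihood function, so

$$\log L = \log n! - \log (n-r)! + r \log \theta - \theta \sum_i^r t_i - (n-r)\theta t_r, \tag{5.6}$$

and

$$\frac{d \log L}{d\theta} = \frac{r}{\theta} - \sum_i^r t_i - (n-r)t_r. \tag{5.7}$$

Equating this to zero and solving for $\hat\theta$ gives

$$\hat\theta = \frac{r}{\sum_i^r t_i + (n-r)t_r}. \tag{5.8}$$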
For a further discussion of this topic, see Epstein and Sobel [13], who show that $2r\theta/\hat\theta$ is distributed as $\chi^2(2r)$, which implies that $1/\hat\theta$ is an unbiased estimator of the mean life $1/\theta$ and that it has variance $1/(r\theta^2)$, which is independent of $n$, and which permits tests to be made and confidence intervals to be constructed. Epstein and Sobel give a table which shows, for example, that the expected time to the tenth failure in a sample of twenty is 0.23 times the expected time to the tenth failure in a sample of ten.
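The estimator (5.8) and the log likelihood (5.6) are easy to evaluate directly. The sketch below does so for a small set of failure times; the times and sample size are hypothetical, and the function names are illustrative. The last helper uses a standard result for exponential order statistics, $E[t_r] = \theta^{-1}\sum_{i=1}^{r} 1/(n - i + 1)$, which is not derived in this section and is included only to check the quoted ratio for the tenth failure.

```python
import math


def censored_exponential_mle(failure_times, n):
    """Estimator (5.8): r / (sum of observed lives + (n - r) * t_r)."""
    times = sorted(failure_times)
    r = len(times)
    if r == 0 or r > n:
        raise ValueError("need 1 <= r <= n observed failures")
    t_r = times[-1]
    return r / (sum(times) + (n - r) * t_r)


def log_likelihood(theta, failure_times, n):
    """Log likelihood (5.6) for testing stopped at the r-th failure."""
    times = sorted(failure_times)
    r = len(times)
    t_r = times[-1]
    return (math.lgamma(n + 1) - math.lgamma(n - r + 1)
            + r * math.log(theta)
            - theta * sum(times) - (n - r) * theta * t_r)


def expected_time_to_rth_failure(r, n, theta=1.0):
    """Assumed standard result: E[t_r] = (1/theta) * sum_{i=1..r} 1/(n-i+1)."""
    return sum(1.0 / (n - i + 1) for i in range(1, r + 1)) / theta


lives = [10.0, 25.0, 40.0, 60.0]   # hypothetical failure times, r = 4
n_tubes = 10                       # hypothetical number of tubes on test
theta_hat = censored_exponential_mle(lives, n_tubes)

# The log likelihood should be lower on either side of theta_hat.
at_max = log_likelihood(theta_hat, lives, n_tubes)
below = log_likelihood(0.9 * theta_hat, lives, n_tubes)
above = log_likelihood(1.1 * theta_hat, lives, n_tubes)

ratio = expected_time_to_rth_failure(10, 20) / expected_time_to_rth_failure(10, 10)
print(theta_hat, at_max > below, at_max > above, round(ratio, 2))
```

Note that the denominator of (5.8) is the total time on test accumulated by all $n$ tubes, failed or not, up to the moment the test is stopped.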
EXERCISES

5.1. In a particular hour interval on four consecutive days the numbers of bugs of a certain type caught by a bug-catching device were 11, 25, 19, and 35. Test the null hypothesis that the expected numbers of bugs caught per hour were the same on the four days.

5.2. In intervals of 50, 70, 50, and 75 minutes on four consecutive days the numbers of bugs of a certain type caught by a bug-catching device were 11, 25, 19, and 35. Test the null hypothesis that the expected numbers of bugs caught per hour were the same on the four days.

5.3. In a series of autopsies on 199 heavy smokers evidence of hypertension was found in 38.2 per cent of the cases. For moderate, light, and nonsmokers the corresponding numbers of cases were 288, 152, and 161 and the corresponding percentages 40.3, 45.5, and 50.3. Test the null hypothesis that the probability of hypertension is independent of the smoking category. [Data from Wilens, Sigmund L., and Cassius M. Plais, "Cigarette Smoking and Arteriosclerosis," Science, 138 (1962), 975-977.]
5.4. The number of window air conditioners in nine rows of row houses in an eastern city are as follows:

Row    Number of houses    Number of houses with air conditioners
1            23                      5
2            43                      8
3            43                     18
4            41                      3
5            41                     17
6            42                     11
7            42                     25
8            39                     19
9            36                     18

Test the null hypothesis that the probability that a house has an air conditioner is independent of which row it is in.

5.5. In two high-altitude balloon flights near the north magnetic pole, the numbers of positive and negative electrons in cosmic rays were counted, and further each particle was categorized by the energy range into which it fell:

Energy interval (MEV)    Number of electrons
                         Positive    Negative
50-100                       9          20
100-300                     32          51
300-1000                    23         117

[Source: De Shong, James A. Jr., Roger H. Hildebrand, and Peter Meyer, "Ratio of Electrons to Positrons in the Primary Cosmic Radiation," Physical Review Letters, 12 (1964), 3-6.]

Test the null hypothesis that the relative proportions of positive and negative electrons are independent of the energy range.
REFERENCES

1. Abrahamson, Abraham E., Rubin Field, Leon Buchbinder, and Anna V. Catilli, "A Study of the Control of Sanitary Quality of Custard Filled Bakery Products in a Large City," Food Research, 17 (1952), 268-77.
2. Goodman, Leo A., and William H. Kruskal, "Measures of Association for Cross Classifications," Journal of the American Statistical Association, 49 (1954), 732-64.
3. Goodman, Leo A., and William H. Kruskal, "Measures of Association for Cross Classifications. II: Further Discussion and References," Journal of the American Statistical Association, 54 (1959), 123-63.
4. Goodman, Leo A., and William H. Kruskal, "Measures of Association for Cross Classifications. III: Approximate Sampling Theory," Journal of the American Statistical Association, 58 (1963), 310-64.
5. Cochran, William G., "The $\chi^2$ Test of Goodness of Fit," Annals of Mathematical Statistics, 23 (1952), 315-45.
6. Cochran, William G., "Some Methods for Strengthening the Common $\chi^2$ Tests," Biometrics, 10 (1954), 417-51.
7. Goodman, Leo A., "Simple Methods for Analyzing Three-Factor Interaction in Contingency Tables," Journal of the American Statistical Association, 59 (1964), 319-52.
8. Barnard, G. A., "Significance Tests for 2 x 2 Tables," Biometrika, 34 (1947), 123-38.
9. Pearson, E. S., "The Choice of Statistical Tests Illustrated on the Interpretation of Data Classified in a 2 x 2 Table," Biometrika, 34 (1947), 139-67.
10. Tocher, K. D., "Extension of the Neyman-Pearson Theory of Tests to Discontinuous Variates," Biometrika, 37 (1950), 130-44.
11. Sverdrup, Erling, "Similarity, Unbiassedness, Minimaxibility, and Admissibility of Statistical Test Procedures," Skandinavisk Aktuarietidskrift, 36 (1953), 64-86.
12. Lehmann, E. L., Testing Statistical Hypotheses. New York: John Wiley and Sons, 1959.
13. Epstein, Benjamin, and Milton Sobel, "Life Testing," Journal of the American Statistical Association, 48 (1953), 486-502.
CHAPTER 6

Some Tests of the Hypothesis of Randomness: Control Charts

6.1. Introduction

Data are frequently obtained serially in time or space. For example, a series of determinations of the velocity of light may be spread over weeks or months. The quality of insulation of a long length of wire may be determined at a number of points along its length. In calculating the variance of the mean as $\sigma^2/n$, we are making the assumption that the observations are independent and are identically distributed. It is therefore desirable to have some method of checking on this assumption. In this chapter we will consider two such tests, the first appropriate for continuous observations with a normal distribution, and the second appropriate to a sequence of dissimilar elements of two types. This latter test is immediately adaptable to any continuous measurements by classifying them as above or below the median. The resulting test assumes continuity in the distribution but makes no assumptions about the form of the distribution, and is in fact an example of a nonparametric test, a class of tests to be discussed in Chapter 7.

6.2. The Mean Square Successive Difference Test

The mean square successive difference test is a test of the null hypothesis that we have a sequence of independent observations $x_1, \ldots, x_n$ from a population $N(\xi, \sigma^2)$. We compute estimates of $\sigma^2$ in two ways. The first is the unbiased estimator (2.3.23),

$$s^2 = \frac{1}{n-1} \sum_i^n (x_i - \bar{x})^2. \tag{2.1}$$
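In computing this, it is usually convenient to use the identity

$$\sum_i^n (x_i - \bar{x})^2 = \sum_i^n x_i^2 - 2\bar{x}\sum_i^n x_i + n\bar{x}^2 = \sum_i^n x_i^2 - \frac{\left(\sum_i^n x_i\right)^2}{n}. \tag{2.2}$$

The second estimator of $\sigma^2$ is $d^2/2$, where $d^2$ is defined as

$$d^2 = \frac{1}{n-1} \sum_{i=1}^{n-1} (x_{i+1} - x_i)^2. \tag{2.3}$$

Since the observations are independent under the null hypothesis, its expected value is

$$E[d^2] = \frac{1}{n-1}\left\{\sum_i^{n-1} E[x_{i+1}^2] + \sum_i^{n-1} E[x_i^2] - 2\sum_i^{n-1} E[x_{i+1}]E[x_i]\right\} = 2\{E[x^2] - (E[x])^2\} = 2V[x] = 2\sigma^2. \tag{2.4}$$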
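It was proved [1] that, under the null hypothesis,

$$E\left[\frac{d^2/2}{s^2}\right] = 1, \tag{2.5}$$

and

$$V\left[\frac{d^2/2}{s^2}\right] = \frac{n-2}{n^2-1}. \tag{2.6}$$

Thus the test statistic

$$\frac{(d^2/2)/s^2 - 1}{\sqrt{(n-2)/(n^2-1)}} \tag{2.7}$$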
is approximately distributed as a unit normal deviate under the null hypothesis. The exact distribution under the null hypothesis has been tabulated for $n$ over the range 4 to 60 (but note that in [2] the sample estimate $s^2$ was defined with $n$ in the denominator instead of $n - 1$). This tabulation shows that even for $n$ as small as 10 the normal approximation (2.7) is good.

The alternative to the null hypothesis is usually that consecutive observations tend to be correlated positively with their predecessors. The successive differences $x_{i+1} - x_i$ therefore tend to be smaller than they would be under complete randomness, and so the expected value of $d^2/2$ is less than $\sigma^2$. The numerator of (2.7) will tend to be negative, and the $P$ value for the null hypothesis is obtained by putting (2.7) equal to $u_P$.

Table 6.1 gives the results of 23 determinations, ordered in time, of the density of the earth by Cavendish [3]. The right-hand column of the table is referred to in Section 6.5.
Table 6.1

 i     x_i      d_i
 1     5.36    -0.07     B
 2     5.29    +0.29     B
 3     5.58    +0.07     A
 4     5.65    -0.08     A
 5     5.57    -0.04     A
 6     5.53    +0.09     A
 7     5.62    -0.33     A
 8     5.29    +0.15     B
 9     5.44    -0.10     B
10     5.34    +0.45     B
11     5.79    -0.69     A
12     5.10    +0.17     B
13     5.27    +0.12     B
14     5.39    +0.03     B
15     5.42    +0.05     B
16     5.47    +0.16     A
17     5.63    -0.29     A
18     5.34    +0.12     B
19     5.46    -0.16     Median
20     5.30    +0.45     B
21     5.75    -0.07     A
22     5.68    +0.17     A
23     5.85              A

Using (2.1) and (2.2), $s^2$ is computed as

$$s^2 = \frac{(5.36^2 + \cdots + 5.85^2) - (5.36 + \cdots + 5.85)^2/23}{23 - 1} = 0.036260.$$

We compute $d^2$ from (2.3):

$$d^2 = \frac{(-0.07)^2 + \cdots + (0.17)^2}{23 - 1} = 0.061941.$$

From (2.6) we obtain

$$V\left[\frac{d^2/2}{s^2}\right] = \frac{23 - 2}{23^2 - 1} = 0.03977.$$

Thus our test statistic is

$$u_P = \frac{(0.061941/2)/0.036260 - 1}{\sqrt{0.03977}} = \frac{-0.14588}{0.1994} = -0.732,$$

whence $P = 0.23$. This is substantially greater than 0.05; so we would not reject the null hypothesis of randomness at the 5 per cent level of significance.
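The arithmetic of this example can be retraced in a few lines. The following Python sketch applies (2.1), (2.3), (2.6), and (2.7) to the 23 determinations of Table 6.1; the function name is illustrative, and the lower-tail normal probability is obtained from the error function rather than from tables.

```python
import math

# Cavendish's 23 determinations of the density of the earth (Table 6.1).
x = [5.36, 5.29, 5.58, 5.65, 5.57, 5.53, 5.62, 5.29, 5.44, 5.34, 5.79, 5.10,
     5.27, 5.39, 5.42, 5.47, 5.63, 5.34, 5.46, 5.30, 5.75, 5.68, 5.85]


def successive_difference_test(values):
    """Return (s2, d2, u, p) for the mean square successive difference test."""
    n = len(values)
    mean = sum(values) / n
    s2 = sum((v - mean) ** 2 for v in values) / (n - 1)                   # (2.1)
    d2 = sum((b - a) ** 2 for a, b in zip(values, values[1:])) / (n - 1)  # (2.3)
    variance = (n - 2) / (n ** 2 - 1)                                     # (2.6)
    u = ((d2 / 2) / s2 - 1) / math.sqrt(variance)                         # (2.7)
    p = 0.5 * (1 + math.erf(u / math.sqrt(2)))  # lower-tail normal probability
    return s2, d2, u, p


s2, d2, u, p = successive_difference_test(x)
print(f"s2 = {s2:.6f}, d2 = {d2:.6f}, u = {u:.3f}, P = {p:.2f}")
```

The lower tail is used because the alternative of positive serial correlation makes the statistic negative.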
6.3. Runs of Elements of Two Types

Suppose that we have $m$ elements of type A and $n$ of type B, $m + n = N$, and these $N$ elements are selected randomly one at a time without replacement and the sequence of A's and B's recorded. In this section we consider the distribution of the number of runs, say $u$, a run being a sequence of like elements. For example, if $m = n = 5$, and we observe any of the sequences

ABBBBBAAAA, AABBBBBAAA, AAABBBBBAA, AAAABBBBBA, or BAAAAABBBB, etc., (3.1)

then $u = 3$. We want $\Pr\{U = 3\}$.

We will consider first two combinatorial problems. We will determine the number of ways in which $r$ indistinguishable objects can be placed in $n$ identifiable cells, firstly with no restriction on the number of objects that can be put in any one cell (other than, of course, the obvious restriction that we cannot put in any one cell more objects than we have, namely $r$), and secondly with the restriction that no cell can be empty.

Consider a row of $n$ cells with $r$ objects to be placed in them. The $n$ cells can be represented by the spaces between $n + 1$ bars; e.g.,

|AAA||AA|||

represents a row of 5 cells with 3, 0, 2, 0, and 0 objects A in the cells. Of the $n + 1$ bars, one must always be in the first position and one in the last, but the remaining $n - 1$ may be anywhere. Each arrangement of $n - 1$ objects of one type, namely bars, and $r$ objects of a second type, namely A's, determines a different way of filling the $n$ cells, and by (1.7.7) the number of such arrangements, and hence the number of ways of filling the $n$ cells, is

$$\frac{[(n-1) + r]!}{(n-1)!\,r!}. \tag{3.2}$$
Now suppose that the number of objects exceeds the number of cells, i.e., $r > n$, and we impose the condition that no cell is to be empty. Now no interval between objects can have more than one bar, since if it did have, say, two, then these two bars would define a cell which would be empty. Of the $r - 1$ spaces between the $r$ objects, we can choose $n - 1$ of them as places to put the bars (we have a total of $n + 1$ bars with two of them committed to the two end positions). The number of ways in which we can make this choice is, by (1.7.7),

$$\binom{r-1}{n-1} = \frac{(r-1)!}{(n-1)!\,(r-n)!}. \tag{3.3}$$
This is the number of ways in which we can place $r$ objects in $n$ cells, $r \ge n$, with none of the cells empty. An alternative argument is as follows. Given $r$ objects, first place $n$ of them in the $n$ cells so that every cell contains precisely one object. This leaves $r - n$ objects which can be placed in the $n$ cells now without any restriction. Using (3.2) with $r$ replaced by $r - n$, the number of ways is

$$\frac{[(n-1) + (r-n)]!}{(n-1)!\,(r-n)!} = \frac{(r-1)!}{(n-1)!\,(r-n)!}, \tag{3.4}$$

as before in (3.3).

We now turn to obtaining $\Pr\{U = u\}$ given $m$ objects of one kind and $n$ objects of a second kind. From (1.7.7), the total number of arrangements is $\binom{m+n}{m} = \binom{m+n}{n}$. We assume that all these arrangements have equal probability, so

$$\Pr\{U = u\} = \frac{\text{number of arrangements giving } u \text{ runs}}{\text{total number of arrangements}}. \tag{3.5}$$
Now $u$ is either odd or even. Let us consider the case where it is even, so that $u = 2v$ where $v$ is an integer. Then there must be $v$ runs of objects of the first kind and $v$ runs of objects of the second kind. Consider the runs of objects of the first kind and regard each run as a cell. By (3.3), the number of ways we can fill these $v$ cells from $m$ objects in such a way that no cell is empty is $\binom{m-1}{v-1}$. Similarly, there are $\binom{n-1}{v-1}$ ways in which the $n$ objects of the second kind can be placed in $v$ cells, no cell being empty. Finally, given a sequence A···B···A···, we can exchange every A group with the B group following it so that we have a sequence B···A···B···, and this sequence will have the same number of runs as the first. Thus the number of arrangements giving $u$ runs is

$$2\binom{m-1}{v-1}\binom{n-1}{v-1}. \tag{3.6}$$

Substituting this in (3.5) gives

$$\Pr\{U = u = 2v\} = \frac{2\binom{m-1}{v-1}\binom{n-1}{v-1}}{\binom{m+n}{m}}. \tag{3.7}$$
A similar argument for the case of $u$ odd, say equal to $2v + 1$, gives

$$\Pr\{U = u = 2v + 1\} = \frac{\binom{m-1}{v}\binom{n-1}{v-1} + \binom{m-1}{v-1}\binom{n-1}{v}}{\binom{m+n}{m}}. \tag{3.8}$$
For the example of (3.1), $m = n = 5$, $u = 3$, and

$$\Pr\{U \le 3\} = \Pr\{U = 2\} + \Pr\{U = 3\} = \frac{2}{252} + \frac{8}{252} = 0.0397. \tag{3.9}$$
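The counts behind (3.9) can be checked by enumerating all 252 arrangements. The sketch below codes the count (3.6) for even $u$ and the corresponding two-term count for odd $u$, and compares them with a brute-force tally; the function names are illustrative, and `math.comb` requires Python 3.8 or later.

```python
from itertools import combinations
from math import comb


def arrangements_with_runs(u, m, n):
    """Number of arrangements of m A's and n B's (m, n >= 1) with exactly u runs."""
    if u < 2:
        return 0
    if u % 2 == 0:
        v = u // 2
        return 2 * comb(m - 1, v - 1) * comb(n - 1, v - 1)   # (3.6)
    v = (u - 1) // 2
    # Odd case: either kind may supply the extra run.
    return (comb(m - 1, v) * comb(n - 1, v - 1)
            + comb(m - 1, v - 1) * comb(n - 1, v))


def runs_probability(u, m, n):
    """Pr{U = u} from (3.5)."""
    return arrangements_with_runs(u, m, n) / comb(m + n, m)


def count_runs(sequence):
    """Number of runs of like elements in a sequence."""
    return 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


def brute_force_counts(m, n):
    """Tally the number of runs over every distinct arrangement."""
    counts = {}
    for a_positions in combinations(range(m + n), m):
        a_set = set(a_positions)
        seq = ["A" if i in a_set else "B" for i in range(m + n)]
        u = count_runs(seq)
        counts[u] = counts.get(u, 0) + 1
    return counts


m, n = 5, 5
p_le_3 = runs_probability(2, m, n) + runs_probability(3, m, n)
counts = brute_force_counts(m, n)
formula_matches = all(arrangements_with_runs(u, m, n) == c
                      for u, c in counts.items())
print(round(p_le_3, 4), formula_matches)
```

Enumeration of this kind is practical only for small $m$ and $n$, since the number of arrangements grows as $\binom{m+n}{m}$.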
This problem was discussed by Stevens [4] and Wald and Wolfowitz [5]. The cumulative probability $\Pr\{U \le u\}$ has been tabulated for $m \le n \le 20$ by Swed and Eisenhart [6].

6.4. An Approximation to the Distribution of the Number of Runs of Elements of Two Types

The formulas for the exact probabilities of numbers of runs of two types of elements obtained in the previous section are too cumbersome to use in practice, particularly when we are beyond the range of Swed and Eisenhart's tables [6]. In this section we obtain the expected value and the variance of the number of runs, $u$, and then assuming that $u$ is normally distributed we can obtain $P$ values for observed $u$'s.

It is convenient to consider the number of transitions $t$ rather than the number of runs $u$. A transition is defined as the point where one run ends and another begins. The number of transitions must be one less than the number of runs, as the last run ends at the end of the sequence, and this point is not counted as a transition. Hence

$$t = u - 1. \tag{4.1}$$

We want to find $E[u] = E[t] + 1$ and $V[u] = V[t]$ under the null hypothesis. Denote the total number of gaps between elements where transitions could occur as $N' = N - 1$. Define $t_i$ as the number of transitions at the $i$th gap, i.e.,

$$t_i = \begin{cases} 0 & \text{if either A, A or B, B on the two sides of the gap,} \\ 1 & \text{if either A, B or B, A on the two sides of the gap.} \end{cases} \tag{4.2}$$
Then the total number of transitions $t = \sum_i^{N'} t_i$, and $E[t_i]$ is the probability that a pair of consecutive elements are dissimilar. We have

$$\Pr\{\text{first in a pair is an A}\} = \frac{m}{m+n}, \tag{4.3}$$

$$\Pr\{\text{second in a pair is a B} \mid \text{first was an A}\} = \frac{n}{m+n-1}; \tag{4.4}$$

so

$$\Pr\{\text{a pair is an AB}\} = \frac{m}{m+n} \cdot \frac{n}{m+n-1}. \tag{4.5}$$
Using these and similar results we can tabulate the probabilities of all possible types of pairs (Table 6.2), along with the corresponding values of $t_i$ and $t_i^2$.

Table 6.2

Type of pair    t_i    t_i^2    Probability
AA               0      0       m(m - 1)/(m + n)(m + n - 1)
AB               1      1       mn/(m + n)(m + n - 1)
BA               1      1       mn/(m + n)(m + n - 1)
BB               0      0       n(n - 1)/(m + n)(m + n - 1)
It follows that

$$E[t_i] = 1 \times \frac{mn}{(m+n)(m+n-1)} + 1 \times \frac{mn}{(m+n)(m+n-1)} = \frac{2mn}{(m+n)(m+n-1)}, \tag{4.6}$$

$$E[t_i^2] = \frac{2mn}{(m+n)(m+n-1)}. \tag{4.7}$$
We thus obtain

$$E[t] = E\left[\sum_i^{N'} t_i\right] = \sum_i^{N'} E[t_i] = N'\,\frac{2mn}{(m+n)(m+n-1)} = \frac{2mn}{m+n}, \tag{4.8}$$

since $N' = m + n - 1$, and the expected number of runs is

$$E[u] = 1 + \frac{2mn}{m+n}. \tag{4.9}$$
We now want the variance of $u$:

$$V[u] = V[t] = V\left[\sum_i^{N'} t_i\right] = \sum_i^{N'} V[t_i] + 2\sum_{i=1}^{N'-1}\sum_{j=i+1}^{N'} \mathrm{Cov}[t_i, t_j]. \tag{4.10}$$

The first term is readily evaluated:

$$\sum_i^{N'} V[t_i] = N'V[t_i] = N'\{E[t_i^2] - (E[t_i])^2\}. \tag{4.11}$$
The second term in (4.10) involves the $\mathrm{Cov}[t_i, t_j]$, which can be calculated as

$$\mathrm{Cov}[t_i, t_j] = E[t_i t_j] - E[t_i]E[t_j] = E[t_i t_j] - (E[t_i])^2, \tag{4.12}$$

since $E[t_j] = E[t_i]$. The terms $t_i t_j$ are best considered in two groups:

1. The terms $t_i t_{i+1}$ which involve adjacent gaps, in which the pair of elements determining $t_i$ has as its second element the first element of the second pair determining $t_{i+1}$.
2. The terms $t_i t_k$, where $k > i + 1$, for which the two pairs of elements do not have an element in common; i.e., the $i$th and $k$th gaps are separated by one or more other gaps.
We need the numbers of terms of these two types. The total number of pairs $t_i, t_j$, with $i < j$, is the number of combinations that can be formed from $N'$ items taken two at a time, i.e.,

$$\binom{N'}{2} = \frac{N'!}{2!\,(N'-2)!} = \frac{N'(N'-1)}{2}. \tag{4.13}$$
The number of pairs of type 1 is $N - 2 = N' - 1$, since, if we consider a typical sequence of elements 1, 2, 3, 4, 5, then the only adjacent pairs of $t$'s that can be formed are $t_1, t_2$ from elements 1, 2, 3; $t_2, t_3$ from elements 2, 3, 4; $t_3, t_4$ from elements 3, 4, 5. The number of pairs of type 2 can then be obtained as the total number of all pairs minus the number of pairs of type 1:

$$\frac{N'(N'-1)}{2} - (N'-1) = \frac{(N'-1)(N'-2)}{2}. \tag{4.14}$$
To compute the $\mathrm{Cov}[t_i, t_j]$ from (4.12) we need $E[t_i t_j]$ for the two types of pairs. For type 1, the pairs $t_i, t_{i+1}$ are those pairs of transitions with one element in common. All possible examples are listed in Table 6.3. The only sequences giving nonzero values for $t_i t_{i+1}$ are ABA and BAB.

Table 6.3

Sequences of elements    t_i    t_{i+1}    t_i t_{i+1}
AAA                       0       0           0
AAB                       0       1           0
ABA                       1       1           1
ABB                       1       0           0
BAA                       1       0           0
BAB                       1       1           1
BBA                       0       1           0
BBB                       0       0           0

These have probabilities

$$\frac{m}{m+n} \cdot \frac{n}{m+n-1} \cdot \frac{m-1}{m+n-2} \quad \text{and} \quad \frac{n}{m+n} \cdot \frac{m}{m+n-1} \cdot \frac{n-1}{m+n-2}. \tag{4.15}$$

Thus

$$E[t_i t_{i+1}] = 1 \times \frac{mn(m-1)}{(m+n)(m+n-1)(m+n-2)} + 1 \times \frac{mn(n-1)}{(m+n)(m+n-1)(m+n-2)} = \frac{mn}{(m+n)(m+n-1)}. \tag{4.16}$$

Inserting this in (4.12) we get for the covariances of the pairs of the type $t_i, t_{i+1}$

$$\mathrm{Cov}[t_i, t_{i+1}] = \frac{mn}{(m+n)(m+n-1)} - (E[t_i])^2. \tag{4.17}$$
There are $N' - 1 = m + n - 2$ such terms.

To compute $\mathrm{Cov}[t_i, t_k]$, $k > i + 1$, we need $E[t_i t_k]$ for the pairs of $t$'s formed by the types of pairs of elements listed in Table 6.4. The only types giving nonzero values for $t_i t_k$ are AB···AB, AB···BA, BA···AB, and BA···BA, all of which involve two A's and two B's. These types all have the same probability

$$\frac{m}{m+n} \cdot \frac{n}{m+n-1} \cdot \frac{m-1}{m+n-2} \cdot \frac{n-1}{m+n-3}. \tag{4.18}$$

Thus

$$E[t_i t_k] = \frac{4mn(m-1)(n-1)}{(m+n)(m+n-1)(m+n-2)(m+n-3)}. \tag{4.19}$$
Inserting this in (4.12), we get for the covariances of pairs of the type $t_i, t_k$, $k > i + 1$,

$$\mathrm{Cov}[t_i, t_k] = \frac{4mn(m-1)(n-1)}{(m+n)(m+n-1)(m+n-2)(m+n-3)} - (E[t_i])^2. \tag{4.20}$$

By (4.14), there are $(m+n-2)(m+n-3)/2$ such terms.

Table 6.4

Types of pairs   t_i  t_k  t_i t_k      Types of pairs   t_i  t_k  t_i t_k
AA···AA           0    0     0          BA···AA           1    0     0
AA···AB           0    1     0          BA···AB           1    1     1
AA···BA           0    1     0          BA···BA           1    1     1
AA···BB           0    0     0          BA···BB           1    0     0
AB···AA           1    0     0          BB···AA           0    0     0
AB···AB           1    1     1          BB···AB           0    1     0
AB···BA           1    1     1          BB···BA           0    1     0
AB···BB           1    0     0          BB···BB           0    0     0
We now substitute in (4.10) the variance term (4.11) and the two covariance terms (4.17) and (4.20), the two latter with the appropriate coefficients, namely, the numbers of terms of the two types:

$$V[u] = (m+n-1)\{E[t_i^2] - (E[t_i])^2\} + 2(m+n-2)\left\{\frac{mn}{(m+n)(m+n-1)} - (E[t_i])^2\right\}$$
$$\quad + 2 \times \frac{1}{2}(m+n-2)(m+n-3) \times \left\{\frac{4mn(m-1)(n-1)}{(m+n)(m+n-1)(m+n-2)(m+n-3)} - (E[t_i])^2\right\}. \tag{4.21}$$
The sum of the coefficients of $(E[t_i])^2$ is easily shown to be $-(m+n-1)^2$. Substituting for $E[t_i]$ from (4.6) and for $E[t_i^2]$ from (4.7), (4.21) reduces to

$$V[u] = \frac{2mn(2mn - m - n)}{(m+n)^2(m+n-1)}. \tag{4.22}$$

Thus, if $u$ is an observed number of runs, the statistic

$$\frac{u - [2mn/(m+n) + 1]}{\sqrt{2mn(2mn - m - n)/(m+n)^2(m+n-1)}} \tag{4.23}$$

is under the null hypothesis a standardized variable which for large $m$ and $n$ is approximately normal.
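The moments (4.9) and (4.22) can be confirmed for a small case by enumerating every arrangement, and the statistic (4.23) is then a one-line calculation. The sketch below does both for $m = n = 5$, $u = 3$, the example of (3.1); the function names are illustrative, and the optional continuity adjustment simply moves $u$ toward its expectation by 1/2.

```python
import math
from itertools import combinations


def runs_mean_variance(m, n):
    """E[u] from (4.9) and V[u] from (4.22)."""
    total = m + n
    mean = 1 + 2 * m * n / total
    variance = 2 * m * n * (2 * m * n - m - n) / (total ** 2 * (total - 1))
    return mean, variance


def runs_statistic(u, m, n, continuity=False):
    """Standardized statistic (4.23), optionally with a continuity adjustment."""
    mean, variance = runs_mean_variance(m, n)
    deviation = u - mean
    if continuity:
        # Bring u closer to its expectation by 1/2, but never past it.
        deviation = math.copysign(max(abs(deviation) - 0.5, 0.0), deviation)
    return deviation / math.sqrt(variance)


def normal_cdf(z):
    """Lower-tail probability of a unit normal deviate."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def enumerated_mean_variance(m, n):
    """Mean and variance of the number of runs over all arrangements."""
    run_counts = []
    for a_positions in combinations(range(m + n), m):
        a_set = set(a_positions)
        labels = [i in a_set for i in range(m + n)]
        run_counts.append(1 + sum(x != y for x, y in zip(labels, labels[1:])))
    mean = sum(run_counts) / len(run_counts)
    variance = sum((u - mean) ** 2 for u in run_counts) / len(run_counts)
    return mean, variance


m, n, u = 5, 5, 3
formula_mean, formula_var = runs_mean_variance(m, n)
exact_mean, exact_var = enumerated_mean_variance(m, n)
z_plain = runs_statistic(u, m, n)
z_adjusted = runs_statistic(u, m, n, continuity=True)
p_adjusted = normal_cdf(z_adjusted)
print(formula_mean, formula_var, z_plain, z_adjusted, p_adjusted)
```

For samples this small the normal tail probability is only a rough guide to the exact value 0.0397 of (3.9), which is why exact tables are preferred at small $m$ and $n$.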
The above statistic has been studied by Stevens [4] and Wald and Wolfowitz [5]. Wallis [7] suggested making a correction for continuity by bringing the observed $u$ closer to its expectation by 1/2. For very small values of $m$ and $n$ the normal approximation will be unreliable, and the exact values of the distribution tabulated by Swed and Eisenhart [6] should be used.

Usually the alternative hypothesis envisaged is one-sided: we anticipate the runs of like elements to be greater in length and hence fewer in number than under the hypothesis of randomness. For example, if a row of tomato plants contains diseased plants we might expect these to occur in groups. The usual critical region for the statistic (4.23) is therefore

will occur in (4.5). The samples containing $x_1$ are made up by selecting $n - 1$ other elements from the available population of $N - 1$ elements, and this can be done in $\binom{N-1}{n-1}$ ways. There will therefore be $\binom{N-1}{n-1}$ samples containing $x_1$. In other words, $x_1$ occurs $\binom{N-1}{n-1}$ times in (4.5). The same will apply to every other $x_i$. Thus

$$E[\bar{x}] = \binom{N}{n}^{-1}\left[\frac{1}{n}\binom{N-1}{n-1}\sum_i^N x_i\right] = \frac{1}{N}\sum_i^N x_i = \xi \tag{4.6}$$

by (4.1). The expected value of the sample mean is thus the population mean.

The variance of $\bar{x}$ is

$$V[\bar{x}] = E[\bar{x}^2] - (E[\bar{x}])^2. \tag{4.7}$$

We have already found $E[\bar{x}]$. Now consider $E[\bar{x}^2]$:

$$E[\bar{x}^2] = \sum_i^{\binom{N}{n}} \bar{x}_i^2\,p\{\bar{x}_i\} = \binom{N}{n}^{-1}\sum_i^{\binom{N}{n}} \bar{x}_i^2. \tag{4.8}$$
But

$$\sum_i^{\binom{N}{n}} \bar{x}_i^2 = \left[\frac{1}{n}(x_1 + x_2 + \cdots + x_n)\right]^2 + \cdots + \left[\frac{1}{n}(x_{N-n+1} + \cdots + x_N)\right]^2. \tag{4.9}$$

When we square each term, each $x_i$ will give rise to an $x_i^2$, and we have already seen that in the similar expression (4.5) each $x_i$ occurs $\binom{N-1}{n-1}$ times; so the squared part of (4.9) is

$$\frac{1}{n^2}\binom{N-1}{n-1}(x_1^2 + \cdots + x_N^2). \tag{4.10}$$
The expression (4.9) will also give rise to product terms $x_i x_j$. Any particular $x_i$ and $x_j$ will occur together in a given sequence in $\binom{N-2}{n-2}$ of the samples; so the product part of (4.9) is

$$\frac{2}{n^2}\binom{N-2}{n-2}(x_1x_2 + x_1x_3 + \cdots + x_{N-1}x_N), \tag{4.11}$$

the factor 2 arising from the fact that $x_i x_j = x_j x_i$. Substituting these back into (4.8) gives

$$E[\bar{x}^2] = \binom{N}{n}^{-1}\left[\frac{1}{n^2}\binom{N-1}{n-1}(x_1^2 + \cdots + x_N^2) + \frac{2}{n^2}\binom{N-2}{n-2}(x_1x_2 + \cdots + x_{N-1}x_N)\right]. \tag{4.12}$$
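We thus have $E[\bar{x}^2]$ ready for substitution into (4.7) to give $V[\bar{x}]$. Equation (4.7) also requires $(E[\bar{x}])^2$. In (4.6) we found $E[\bar{x}]$ to be $(1/N)\sum_i^N x_i$; so

$$(E[\bar{x}])^2 = \frac{1}{N^2}(x_1^2 + \cdots + x_N^2) + \frac{2}{N^2}(x_1x_2 + \cdots + x_{N-1}x_N). \tag{4.13}$$

Substituting (4.12) and (4.13) in (4.7), we get

$$V[\bar{x}] = \left[\binom{N}{n}^{-1}\frac{1}{n^2}\binom{N-1}{n-1} - \frac{1}{N^2}\right](x_1^2 + \cdots + x_N^2) + \left[\binom{N}{n}^{-1}\frac{2}{n^2}\binom{N-2}{n-2} - \frac{2}{N^2}\right](x_1x_2 + \cdots + x_{N-1}x_N). \tag{4.14}$$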
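The coefficient of $x_1^2 + \cdots + x_N^2$ can be written as

$$\frac{\binom{N-1}{n-1}}{n^2\binom{N}{n}} - \frac{1}{N^2} = \frac{1}{n^2} \cdot \frac{(N-1)!}{(n-1)!\,(N-n)!} \cdot \frac{n!\,(N-n)!}{N!} - \frac{1}{N^2} = \frac{1}{nN} - \frac{1}{N^2} = \frac{N-n}{nN^2} = \left(\frac{1}{N} - \frac{1}{N^2}\right)\frac{N-n}{n(N-1)}, \tag{4.15}$$

and the coefficient of $(x_1x_2 + \cdots + x_{N-1}x_N)$ is

$$\frac{2\binom{N-2}{n-2}}{n^2\binom{N}{n}} - \frac{2}{N^2} = \frac{2}{n^2} \cdot \frac{(N-2)!}{(n-2)!\,(N-n)!} \cdot \frac{n!\,(N-n)!}{N!} - \frac{2}{N^2} = \frac{2(n-1)}{nN(N-1)} - \frac{2}{N^2} = -\frac{2}{N^2} \cdot \frac{N-n}{n(N-1)}. \tag{4.16}$$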
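Then, inserting these results in (4.14), we get

$$V[\bar{x}] = \frac{N-n}{n(N-1)}\left[\left(\frac{1}{N} - \frac{1}{N^2}\right)(x_1^2 + \cdots + x_N^2) - \frac{2}{N^2}(x_1x_2 + \cdots + x_{N-1}x_N)\right]. \tag{4.17}$$

The part in brackets is identical with the expression for $\sigma^2$, (4.3), so

$$V[\bar{x}] = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} \tag{4.18}$$

$$= \frac{\sigma^2}{n}\left(1 - \frac{n-1}{N-1}\right). \tag{4.19}$$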
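Results (4.6) and (4.18) can be verified by listing every possible sample from a small finite population. The sketch below does this for a hypothetical population of five values and samples of size two; the population values are invented for illustration, and $\sigma^2$ is taken, as in the bracket of (4.17), to be the mean of the squares minus the square of the mean.

```python
from itertools import combinations


def enumerate_sample_means(population, n):
    """Means of all samples of size n drawn without replacement."""
    return [sum(sample) / n for sample in combinations(population, n)]


population = [2.0, 3.0, 5.0, 7.0, 11.0]   # hypothetical finite population
n = 2
N = len(population)

xi = sum(population) / N                                # population mean
sigma2 = sum(x * x for x in population) / N - xi ** 2   # population variance

means = enumerate_sample_means(population, n)
mean_of_means = sum(means) / len(means)
variance_of_means = sum(m * m for m in means) / len(means) - mean_of_means ** 2

formula = (sigma2 / n) * (N - n) / (N - 1)              # (4.18)
print(mean_of_means, xi, variance_of_means, formula)
```

The factor $(N - n)/(N - 1)$ is what distinguishes sampling without replacement from the infinite-population case.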
Clearly, if $N$ tends to infinity, then $V[\bar{x}]$ tends to $\sigma^2/n$, its customary form for an infinite population.

7.5. The Wilcoxon Two-Sample Rank Test

The Wilcoxon two-sample rank test [6] is a test of the null hypothesis that two populations are identical, against the alternative hypothesis that they differ by a linear translation. We substitute ranks for the actual observations. As an example, Table 7.6 lists five determinations of the atomic weight of carbon from one preparation and four determinations from another preparation [8]. Ranks are allocated to the observations in order of increasing magnitude without regard to the division into two samples.

Table 7.6

     Preparation A                Preparation B
Determination    Rank        Determination    Rank
   12.0072         8            11.9853         1
   12.0064         7            11.9949         2
   12.0054         5            11.9985         3
   12.0016         4            12.0061         6
   12.0077         9
Suppose that one sample is of size $n$ and the other of size $N - n$. The test assumes that any combination of the ranks into these two groups is equally likely. The total number of ways of grouping the ranks, given $N$ and $n$, is the number of ways of picking $n$ elements out of $N$, $\binom{N}{n}$. The test then counts how many of the possible combinations give a rank sum as extreme as or more extreme than that observed.
In Table 7.6, regarding the observations from preparation B as the sample of size $n$, the rank sum is $1 + 2 + 3 + 6 = 12$. The only ways we could get a rank sum as small as or smaller than this, given $n = 4$, are

1 + 2 + 3 + 4 = 10,    1 + 2 + 3 + 5 = 11,
1 + 2 + 3 + 6 = 12,    1 + 2 + 4 + 5 = 12.

For example, $1 + 3 + 4 + 5 = 13$ is greater than our observed rank sum 12. The $P$ value is equal to the ratio of the number of ways we can form a rank sum as extreme as or more extreme than that observed, namely, 4 ways, to the total possible number of ways of forming sums of $n = 4$ ranks, namely, $\binom{N}{n} = \binom{9}{4} = 126$ ways. Thus $P = 4/126 = 0.0317$ for a one-sided test.

The rationale of this procedure is that, if one distribution is displaced relative to the other, the low ranks will tend to fall in one sample and the high ranks in the other sample, and so the rank sums will be relatively low or high.

In the case of small $N$ and $n$, it is relatively easy to compute the $P$ value directly as in the above example. For large samples, an approximate test is available based on the fact that the mean of the ranks of a sample is distributed around its expected value approximately normally.

Let $R$ be the rank sum and $\bar{R}$ the mean of the ranks of the sample of size $n$. By (4.6), the expected value for $\bar{R}$ is $E[\bar{R}] = (1/N)\sum_i^N x_i$ where the $x_i$ are the elements of the finite population, here the integers 1 to $N$. The sum of the first $N$ integers is

$$1 + 2 + \cdots + N = \frac{N(N+1)}{2}, \tag{5.1}$$

and also the sum of their squares is

$$1^2 + 2^2 + \cdots + N^2 = \frac{N(N+1)(2N+1)}{6}. \tag{5.2}$$
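Using (4.1) and (5.1),

$$E[\bar{R}] = \frac{1}{N} \cdot \frac{N(N+1)}{2} = \frac{N+1}{2}. \tag{5.3}$$

The variance of $\bar{R}$, $V[\bar{R}]$, will be given by (4.18) where $\sigma^2$ is given by substituting (5.1) and (5.2) in (4.2):

$$\sigma^2 = \frac{1}{N} \cdot \frac{N(N+1)(2N+1)}{6} - \left[\frac{1}{N} \cdot \frac{N(N+1)}{2}\right]^2 = \frac{N^2 - 1}{12}. \tag{5.4}$$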
Thus, substituting this in (4.18),

$$V[\bar{R}] = \frac{(N^2 - 1)/12}{n} \cdot \frac{N-n}{N-1} = \frac{(N+1)(N-n)}{12n}. \tag{5.5}$$
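Both the exact $P$ value for Table 7.6 and the moments (5.3) and (5.5) can be checked by listing all $\binom{9}{4} = 126$ ways of choosing four ranks. The Python sketch below is one way of doing so; the variable names are illustrative, and the ranking step assumes there are no ties, as is the case for these data.

```python
from itertools import combinations

# Atomic weight determinations from Table 7.6.
prep_a = [12.0072, 12.0064, 12.0054, 12.0016, 12.0077]
prep_b = [11.9853, 11.9949, 11.9985, 12.0061]

pooled = sorted(prep_a + prep_b)
# Rank 1 goes to the smallest observation (valid because there are no ties).
rank = {value: position for position, value in enumerate(pooled, start=1)}

N, n = len(pooled), len(prep_b)
observed = sum(rank[v] for v in prep_b)

# Every equally likely choice of n ranks out of 1..N.
rank_sums = [sum(choice) for choice in combinations(range(1, N + 1), n)]
as_extreme = sum(1 for s in rank_sums if s <= observed)
p_value = as_extreme / len(rank_sums)

# Moments of the mean rank over all choices, for comparison with (5.3), (5.5).
mean_ranks = [s / n for s in rank_sums]
mean_rbar = sum(mean_ranks) / len(mean_ranks)
var_rbar = sum((r - mean_rbar) ** 2 for r in mean_ranks) / len(mean_ranks)

expected_mean = (N + 1) / 2                    # (5.3)
expected_var = (N + 1) * (N - n) / (12 * n)    # (5.5)
print(observed, as_extreme, len(rank_sums), round(p_value, 4))
print(mean_rbar, expected_mean, var_rbar, expected_var)
```

The enumeration grows as $\binom{N}{n}$, which is the practical reason for turning to the normal approximation in larger samples.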
We then have

$$\frac{\bar{R} - E[\bar{R}]}{\sqrt{V[\bar{R}]}} = \frac{\bar{R} - (N+1)/2}{\sqrt{(N+1)(N-n)/12n}} \tag{5.6}$$

$$= \frac{2R - n(N+1)}{\sqrt{n(N+1)(N-n)/3}} \tag{5.7}$$

2. As in the Wilcoxon test, the entire set of observations, here in $k$ groups of size $n_i$ rather than two groups as in the Wilcoxon test, are ranked, and mean ranks $\bar{R}_i$ are calculated for each group. $\bar{R}_i$ has an expectation $(N+1)/2$ (5.3), and a variance $(N+1)(N-n_i)/12n_i$ (5.5). The ratios $\{\bar{R}_i - E[\bar{R}_i]\}/\sqrt{V[\bar{R}_i]}$ will be standardized variables approximately normal. Kruskal and Wallis denoted by $H$ the sum of their squares multiplied by a weighting factor $1 - n_i/N$, and showed that $H$ has approximately a $\chi^2(k-1)$ distribution; i.e.,

$$H = \sum_i^k \frac{[\bar{R}_i - (N+1)/2]^2}{(N+1)(N-n_i)/12n_i}\left(1 - \frac{n_i}{N}\right) \underset{\text{approx}}{\sim} \chi^2(k-1). \tag{7.1}$$
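Two identical forms for $H$, more convenient for calculation, are

$$H = \frac{N-1}{N}\sum_i^k \frac{[\bar{R}_i - (N+1)/2]^2}{(N^2-1)/12n_i} \tag{7.2}$$

$$= \frac{12}{N(N+1)}\sum_i^k \frac{R_i^2}{n_i} - 3(N+1), \tag{7.3}$$

where $R_i$ is the rank sum of the $i$th group.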
If ties occur, $H$ should be divided by the factor

$$1 - \frac{\sum T}{N^3 - N}, \tag{7.4}$$
where T = (I  1)1(1 + 1) is calculated and included in the sum for each group of ties. While H is distributed asymptotically as X2(k  1) for large k and 11;, in small samples the approximation is not very good, and Kruskal and Wallis [11] provided tables of the exact distribution for the
SECT.
7.7
THE
H
257
TEST
case of $k = 3$, $n_i \le 5$. For intermediate cases they proposed approximations based on the incomplete gamma and incomplete beta distributions, for details of which see their paper. Wallace [12] has discussed several approximations, and Box [13] suggested an approximation based on the F distribution.

As an example of the application of the H test, Table 7.9 gives the terminal digits of observations on six days on the mechanical equivalent of heat [14]. The first observation was 4.1849, but only the last two digits are given in Table 7.9. There are several groups of ties, the first being three observations at 27 which receive the mean rank $(4 + 5 + 6)/3 = 5$.

Table 7.9

         Day 1         Day 2         Day 3         Day 4         Day 5         Day 6
       Obs.  Rank    Obs.  Rank    Obs.  Rank    Obs.  Rank    Obs.  Rank    Obs.  Rank
        49   18       42   12½      46   17       11    1       38    8       24    3
        52   21½      43   15       52   21½      27    5       50   19       40    9
        43   15       42   12½      27    5       71   23       41   10½      27    5
                      43   15       51   20       23    2       41   10½      37    7
n_i      3             4             4             4             4             4
R_i     54½           55            63½           31            48            24

Substituting in (7.3),

$$H = \frac{12}{23(23 + 1)}\left[\frac{(54\tfrac12)^2}{3} + \frac{(55)^2}{4} + \cdots\right] - 3 \times (23 + 1) = 8.753. \tag{7.5}$$
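The arithmetic behind (7.3) and the divisor (7.4) can be checked with a short Python sketch. This is an illustration only, not part of the original text: the function names `midranks` and `kruskal_wallis` are hypothetical, and the data are the two-digit values of Table 7.9 as read from the table.

```python
from collections import Counter

# Terminal digits from Table 7.9, one list per day.
days = [
    [49, 52, 43],
    [42, 43, 42, 43],
    [46, 52, 27, 51],
    [11, 27, 71, 23],
    [38, 50, 41, 41],
    [24, 40, 27, 37],
]


def midranks(values):
    """Map each distinct value to its mean rank in the pooled sample."""
    ordered = sorted(values)
    ranks = {}
    position = 0
    while position < len(ordered):
        value = ordered[position]
        count = ordered.count(value)
        # Ranks position+1 .. position+count are replaced by their mean.
        ranks[value] = position + (count + 1) / 2
        position += count
    return ranks


def kruskal_wallis(groups):
    """Return H from (7.3), the tie factor (7.4), and the corrected H."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    rank_of = midranks(pooled)
    # (7.3): H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)
    h = 12 / (n_total * (n_total + 1)) * sum(
        sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n_total + 1)
    # (7.4): 1 - sum(T) / (N^3 - N), with T = (t-1)t(t+1) per tie group;
    # untied values have t = 1 and contribute nothing.
    tie_sum = sum((t - 1) * t * (t + 1) for t in Counter(pooled).values())
    factor = 1 - tie_sum / (n_total ** 3 - n_total)
    return h, factor, h / factor


h, factor, h_corrected = kruskal_wallis(days)
print(round(h, 3), round(factor, 5), round(h_corrected, 3))
```

The three printed values correspond to the uncorrected H, the tie factor, and the corrected H for this table.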
However, we need the correction for ties, there being two groups of three tied ranks and three groups of two tied ranks. For each of the triplets the correction $(t - 1)t(t + 1)$ is $(3 - 1) \cdot 3 \cdot (3 + 1) = 24$, and for each of the doublets the correction is $(2 - 1) \cdot 2 \cdot (2 + 1) = 6$. The correction factor (7.4) is

$$1 - \frac{24 \times 2 + 6 \times 3}{23^3 - 23} = 0.99457, \tag{7.6}$$

so the corrected $H$ is $8.753/0.99457 = 8.801$ with $6 - 1 = 5$ degrees of freedom. The 0.90 point of $\chi^2(5)$ is 9.24, and so the null hypothesis is acceptable at the 0.10 level.

We can apply the H test in the case where there are only two samples, so that $k = 2$. The $H$ statistic (7.1) will then have approximately the $\chi^2$ distribution with $2 - 1 = 1$ degree of freedom. In Section 1.27 we saw that $\chi^2(n)$ was defined as $\sum_i^n u_i^2$; so $\chi^2(1)$ is just the square of a single unit normal deviate. We shall see in Section 9.4 that specifically $\chi^2_{1-\alpha}(1) = u^2_{1-\alpha/2}$.
It is easy to show that the $H$ statistic in the case of $k = 2$ is identically equal to the square of the Wilcoxon statistic (5.7). In other words, the H test for the case of $k = 2$ is identical with the two-sample Wilcoxon test.

7.8. The Wilcoxon One-Sample Test

Wilcoxon [7] proposed a test for the median of a single sample, in which we give ranks to the absolute magnitudes of the observations and then give to the ranks the signs of the corresponding observations. Essentially the test is of the null hypothesis that the distribution of the observations is symmetric about zero, so that any rank is equally likely to be positive or negative. Frequently in application the observations are differences between paired observations, as in Table 7.10, which reproduces the data of Table 7.1 on the difference in potency for 16 lots of a pharmaceutical product reported by two methods of analysis, here with ranks attached to the absolute magnitude of the differences.

Table 7.10

Difference  0.3  6.3  3.7  2.8  5.8  1.4  1.7  2.3  1.7  1.6  1.8  0.6  4.5  1.9  2.4  6.8
Rank        1    15   12   11   14   3    5½   9    5½   4    7    2    13   8    10   16

Table 7.10 includes the complication of two differences, 1.7 and −1.7, tying in absolute magnitude. As in the unpaired Wilcoxon test, such ties receive the mean of the ranks necessary for the group of ties, here $\tfrac12(5 + 6) = 5\tfrac12$. The test statistic is the sum of the ranks of either sign, $15\tfrac12$ for the negative ranks or $120\tfrac12$ for the positive ranks, and the P value is the probability of getting a rank sum equal to or more extreme than the observed value.

Under the null hypothesis any rank is equally likely to be positive or negative. The total number of ways rank sums can be produced is thus $2^N$, $2^{16}$ in the case of Table 7.10. In moderate situations it is possible to enumerate all the rank sums less than or equal to the observed rank sum. Here the observed negative rank sum is $3 + 5\tfrac12 + 7 = 15\tfrac12$. We proceed to write down all ways in which we can form rank sums less than or equal to $15\tfrac12$. Starting with 0, 1, ..., 15; continuing with 1 + 2, ..., 1 + 14, 2 + 3, ..., 2 + 13; continuing on up to 7 + 8; then counting triplets such as 1 + 2 + 3, quadruplets such as 1 + 2 + 3 + 4, and the quintuplets $1 + 2 + 3 + 4 + 5\tfrac12_a$ and $1 + 2 + 3 + 4 + 5\tfrac12_b$; there are a total of 140 combinations of ranks with sums $\le 15\tfrac12$. The one-sided P value is thus $140/2^{16} = 0.00214$.
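The enumeration described above can be mechanized. The following Python sketch is an illustration under stated assumptions, not the author's procedure: it counts subsets by dynamic programming rather than by listing them, and the name `count_subsets_at_most` is hypothetical. Ranks are doubled so that the tied rank 5½ becomes an integer. Run as written, it counts 143 combinations with sum at most 15½ rather than the 140 quoted, which would give P = 143/2^16 ≈ 0.00218; the difference is small and does not alter the conclusion.

```python
# Ranks of the absolute differences in Table 7.10 (two ties at 5.5).
ranks = [1, 15, 12, 11, 14, 3, 5.5, 9, 5.5, 4, 7, 2, 13, 8, 10, 16]
observed = 3 + 5.5 + 7  # observed negative rank sum, 15.5


def count_subsets_at_most(ranks, limit):
    """Count subsets of the ranks (including the empty one) with sum <= limit."""
    doubled = [round(2 * r) for r in ranks]  # half-integer ranks become integers
    cap = round(2 * limit)
    ways = [0] * (cap + 1)  # ways[s] = number of subsets with doubled sum s
    ways[0] = 1
    for r in doubled:
        # Go downward so that each rank is used at most once.
        for s in range(cap, r - 1, -1):
            ways[s] += ways[s - r]
    return sum(ways)


count = count_subsets_at_most(ranks, observed)
p_exact = count / 2 ** len(ranks)
print(count, round(p_exact, 5))
```

The two tied ranks are treated as distinct elements, as in the text's quintuplets with $5\tfrac12_a$ and $5\tfrac12_b$.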
Obviously, direct enumeration can be excessively tedious and a normal approximation is useful. Consider the ranks $R_i$, $i = 1, 2, \ldots, N$. Construct subsidiary variables $d_i$, where $d_i$ is attached to the $i$th rank. These variables $d_i$ take the values 0 when a rank is negative and 1 when a rank is positive. Under the null hypothesis a rank is equally likely to be positive as negative. Then

$$E[d_i] = 1 \times \tfrac12 + 0 \times \tfrac12 = \tfrac12, \tag{8.1}$$

$$E[d_i^2] = 1^2 \times \tfrac12 + 0^2 \times \tfrac12 = \tfrac12, \tag{8.2}$$

$$E[d_i d_j]_{i \ne j} = 1 \times 1 \times \tfrac14 + 1 \times 0 \times \tfrac14 + 0 \times 1 \times \tfrac14 + 0 \times 0 \times \tfrac14 = \tfrac14. \tag{8.3}$$

The sum of the positive ranks, say $S$, is $\sum_i^N d_i R_i$. Under the null hypothesis,

$$E[S] = \sum_i^N R_i E[d_i] = \frac12 \sum_i^N R_i, \tag{8.4}$$

$$E[S^2] = \sum_i^N R_i^2 E[d_i^2] + \sum_i^N \sum_{j \ne i}^N R_i R_j E[d_i d_j] = \frac12 \sum_i^N R_i^2 + \frac14 \sum_i^N \sum_{j \ne i}^N R_i R_j. \tag{8.5}$$

Thus

$$V[S] = E[S^2] - (E[S])^2 = \frac14 \sum_i^N R_i^2. \tag{8.6}$$

If the $R_i$ are the integers 1 to $N$, then using (5.1) and (5.2),

$$E[S] = \frac{N(N + 1)}{4}, \tag{8.7}$$

$$V[S] = \frac{N(N + 1)(2N + 1)}{24}. \tag{8.8}$$
If, instead of the integers 1 to $N$, the ranks contain a set of $t$ ties, as in Section 7.6 the sum of the ranks is unchanged, so $E[S]$ continues to be given by (8.7). The sum of squares of the ranks is changed, however. In place of the ranks $R_j + 1, R_j + 2, \ldots, R_j + t$ we have the mean rank $R_j + (t + 1)/2$ a total of $t$ times [see (6.1)], so the sum of squares of the ranks is increased by

$$t\left(R_j + \frac{t + 1}{2}\right)^2 - \sum_{m=1}^{t} (R_j + m)^2, \tag{8.9}$$

which reduces to $-(t - 1)t(t + 1)/12$. Making this change in $\sum_i^N R_i^2$ in (8.6) and (8.8) gives

$$V[S] = \frac{N(N + 1)(2N + 1)}{24} - \frac{(t - 1)t(t + 1)}{48}. \tag{8.10}$$
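The normal approximation built from (8.4), (8.6), and (8.10) can be sketched in Python as follows. This is a minimal illustration, not the author's computation: the function name `signed_rank_normal` is hypothetical, the variance is taken directly from (8.6) as one quarter of the sum of squared ranks (which absorbs the tie adjustment automatically), and the upper-tail normal probability is obtained from the standard library's `math.erfc`. The ranks and the positive rank sum are those of Table 7.10.

```python
import math

# Ranks of the absolute differences in Table 7.10 and the positive rank sum.
abs_ranks = [1, 15, 12, 11, 14, 3, 5.5, 9, 5.5, 4, 7, 2, 13, 8, 10, 16]
s_positive = 120.5


def signed_rank_normal(abs_ranks, s):
    """Normal approximation to the upper-tail P value of the rank sum s."""
    mean = sum(abs_ranks) / 2                 # (8.4): E[S] = (1/2) sum R_i
    var = sum(r * r for r in abs_ranks) / 4   # (8.6): V[S] = (1/4) sum R_i^2
    u = (s - mean) / math.sqrt(var)
    p_upper = 0.5 * math.erfc(u / math.sqrt(2))  # P{unit normal deviate > u}
    return mean, var, u, p_upper


mean, var, u, p_upper = signed_rank_normal(abs_ranks, s_positive)

# Cross-check against (8.10) with N = 16 and one pair of ties (t = 2).
var_810 = 16 * 17 * 33 / 24 - (2 - 1) * 2 * (2 + 1) / 48
print(mean, var, var_810, round(u, 3), round(p_upper, 4))
```

Computing the variance from the actual ranks avoids having to decide how (8.10) extends to several tie groups; for a single pair of ties the two routes agree.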
For the data of Table 7.10, the sum of the positive ranks $S = 120\tfrac12$, and there is one pair of ties; so

$$E[S] = \frac{N(N + 1)}{4} = \frac{16 \times (16 + 1)}{4} = 68, \tag{8.11}$$

$$V[S] = \frac{16 \times (16 + 1)(2 \times 16 + 1)}{24} - \frac{(2 - 1) \times 2 \times (2 + 1)}{48} = 374 - 0.125 = 373.875, \tag{8.12}$$

$$u_{1-P} = \frac{S - E[S]}{\sqrt{V[S]}} = \frac{120.5 - 68}{\sqrt{373.875}} = 2.715, \tag{8.13}$$
whence $P = 0.0033$, to be compared with the exact figure 0.0021 previously obtained. While nothing generally appears to be known about the use of a continuity correction with this approximation, its use does not seem to be indicated.

We have indicated how to handle ties in the absolute values of the ranks. However, if the observations are differences between paired samples, some samples may be tied in one or more pairs, giving rise to differences which are zero. It is not clear how these zeros should be handled. One procedure is to ignore their existence completely, i.e., to delete them from the sample.

7.9. The Friedman Rank Test

Consider the data of Table 7.11, which gives a function of the daily determination of the efficiency of a chemical plant for each of six runs, each run lasting seven days. We ask whether the null hypothesis that the median efficiencies for the different runs are the same should be rejected. At first glance we might consider using the H test of Section 7.7, but this
is not appropriate to the present problem; it assumes under the null hypothesis that all observations are identically distributed, whereas in the present problem we postulate as the null hypothesis merely that all observations on day 1 are identically distributed and that all observations on day 2 are identically distributed, etc., but we do not postulate that the day 2 distribution is the same as the day 1 distribution, etc.

Table 7.11

                      Run number
Day        1       2       3       4       5       6
1        60(4)   64(5)   14(1)   30(2)   72(6)   35(3)
2        62(5)   63(6)   46(4)   41(3)   38(2)   35(1)
3        58(5)   63(6)   47(4)   40(2)   46(3)   33(1)
4        52(6)   36(1)   39(2)   41(3)   47(5)   46(4)
5        31(1)   34(2)   42(5)   37(3)   38(4)   47(6)
6        23(2)   32(3)   43(4)   17(1)   60(6)   47(5)
7        26(2)   27(3)   57(6)   12(1)   41(5)   38(4)
Σ R_ij    25      26      26      15      31      24
In general, suppose that there are $r$ rows and $c$ columns, that we wish to test for column differences, and that the rank in the $ij$th cell, the ranking being within rows, is $R_{ij}$. For Table 7.11, for the first row the observations are 60, 64, 14, 30, 72, and 35, so these are allocated the ranks 4, 5, 1, 2, 6, and 3 given in parentheses beside the corresponding observations. For the $i$th row for the ranks $R_{ij}$ we have the integers $1, \ldots, c$, so $E[R_{ij}] = (c + 1)/2$, $V[R_{ij}] = (c^2 - 1)/12$ [see (5.3) and (5.4)]. Then since $\bar R_{.j}$, the mean of the ranks in the $j$th column, is the mean of $r$ $R_{ij}$'s,

$$V[\bar R_{.j}] = \frac{c^2 - 1}{12r}. \tag{9.1}$$

If we assume that $\bar R_{.j}$ is asymptotically normal we have

$$\frac{\bar R_{.j} - E[\bar R_{.j}]}{\sqrt{V[\bar R_{.j}]}} \sim N(0, 1), \tag{9.2}$$

and the sum of squares of $c$ such quantities will have approximately a $\chi^2(c)$ distribution. However, the $\bar R_{.j}$ are not independent, since

$$\sum_j^c \bar R_{.j} = \sum_j^c \frac1r \sum_i^r R_{ij} = \frac1r \sum_i^r \sum_j^c R_{ij} = \frac{c(c + 1)}{2}. \tag{9.3}$$
Along these lines Friedman [15] derived the statistic which he called $\chi_r^2$,

$$\chi_r^2 = \frac{c - 1}{c} \sum_j^c \frac{(\bar R_{.j} - E[\bar R_{.j}])^2}{V[\bar R_{.j}]}, \tag{9.4}$$

and showed that under the null hypothesis it has approximately the $\chi^2(c - 1)$ distribution. The statistic can be put in the form

$$\chi_r^2 = \frac{12}{rc(c + 1)} \sum_j^c \left(\sum_i^r R_{ij}\right)^2 - 3r(c + 1). \tag{9.5}$$
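Form (9.5) lends itself to direct computation. The sketch below is illustrative only: the function name `friedman_statistic` is hypothetical, the table is Table 7.11 entered with days as rows and runs as columns, and the ranking step assumes no ties within a row, which holds for these data.

```python
# Table 7.11: rows are days 1..7, columns are runs 1..6.
table = [
    [60, 64, 14, 30, 72, 35],
    [62, 63, 46, 41, 38, 35],
    [58, 63, 47, 40, 46, 33],
    [52, 36, 39, 41, 47, 46],
    [31, 34, 42, 37, 38, 47],
    [23, 32, 43, 17, 60, 47],
    [26, 27, 57, 12, 41, 38],
]


def friedman_statistic(table):
    """Rank within rows (no ties assumed) and evaluate (9.5)."""
    r, c = len(table), len(table[0])
    column_rank_sums = [0] * c
    for row in table:
        # Column indices ordered by increasing observation within this row.
        order = sorted(range(c), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            column_rank_sums[j] += rank
    statistic = (
        12 / (r * c * (c + 1)) * sum(s * s for s in column_rank_sums)
        - 3 * r * (c + 1)
    )
    return column_rank_sums, statistic


column_rank_sums, chi_r_squared = friedman_statistic(table)
print(column_rank_sums, round(chi_r_squared, 3))
```

Transposing the table before calling the function would give the corresponding test for differences between days.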
For Table 7.11,

$$\chi_r^2 = \frac{12}{7 \times 6(6 + 1)}(25^2 + \cdots + 24^2) - 3 \times 7 \times (6 + 1) = 5.612, \tag{9.6}$$
which is substantially less than $\chi^2_{0.90}(5) = 9.24$, so the null hypothesis can be accepted. We could, of course, use the same procedure to test whether the median efficiencies for the different days are equal. An alternative, but essentially identical, form of this test was introduced by M. G. Kendall under the name coefficient of concordance [4].

7.10. The Cochran Q Test

Consider the data of Table 7.12 taken from Cochran [16]. A number of specimens, 69 in all, were tested on four different media A, B, C, and D for the presence of a certain organism. For a particular specimen on a particular medium the outcome could be growth or no growth, represented by 0 or 1. For a particular specimen, therefore, the results can be 1111, 1110, 1101, ..., 1000, 0000. Table 7.12 gives the number of specimens with particular results: for example, there were 4 1111's, 0 1110's (not shown in the table), 2 1101's, etc.

Table 7.12

        A    B    C    D    Number of cases
        1    1    1    1          4
        1    1    0    1          2
        0    1    1    1          3
        0    1    0    1          1
        0    0    0    0         59
T_j     6   10    7   10

We are interested in testing the null hypothesis that the probabilities of growth with the different media, averaged over all specimens, are equal. This situation is the generalization to $k$ columns of the situation discussed in Section 3.7. There, if each patient had been tested on $k$ drugs rather than two, then we would have the situation discussed here. Cochran proposed for a test of the null hypothesis the statistic $Q$ defined as

$$Q = \frac{c(c - 1) \sum_j^c (T_j - \bar T)^2}{c \sum_i^r u_i - \sum_i^r u_i^2}, \tag{10.1}$$
where $u_i$ is the number of 1's in the $i$th row and $T_j$ is the sum of the entries in the $j$th column, allowing, of course, for the fact that in Table 7.12 each row is really repeated a total number of times equal to the corresponding number of cases. The proof that the distribution of $Q$ under the null hypothesis tends to $\chi^2(c - 1)$ as the number of rows other than those all 0 and all 1 tends to infinity is quite involved, and we can obtain this result more simply by regarding the Cochran Q test as a special case of the Friedman test with ties in the ranks.* Let $x_{ij}$ be a variable with the property that

$$x_{ij} = \begin{cases} 1 & \text{if the } ij\text{th cell has a success,} \\ 0 & \text{if the } ij\text{th cell has a failure.} \end{cases} \tag{10.2}$$

Then since $T_j$ is the number of successes in the $j$th column,

$$\sum_i^r x_{ij} = T_j, \tag{10.3}$$

and since $u_i$ is the number of successes in the $i$th row,

$$\sum_j^c x_{ij} = u_i. \tag{10.4}$$

It follows that

$$\sum_i^r u_i = \sum_j^c T_j = c\bar T. \tag{10.5}$$
Consider a given row with $u_i$ successes. The successes are allocated ranks $1, \ldots, u_i$ with sum $u_i(u_i + 1)/2$ and hence mean rank $(u_i + 1)/2$. The $c - u_i$ cells with failures are allocated ranks $u_i + 1, \ldots, c$, and the mean of these ranks is $(c + u_i + 1)/2$. In $\sum_j^c R_{ij}$ we have the mean rank $(u_i + 1)/2$ occurring $u_i$ times and the mean rank $(c + u_i + 1)/2$ occurring $c - u_i$ times. Thus

$$\frac1c \sum_j^c R_{ij} = \frac1c \left[u_i \frac{u_i + 1}{2} + (c - u_i)\frac{c + u_i + 1}{2}\right] = \frac{c + 1}{2}. \tag{10.6}$$

From (4.2), the variance of the ranks

$$V[R_{ij}] = \frac1c \sum_j^c R_{ij}^2 - \left(\frac1c \sum_j^c R_{ij}\right)^2 = \frac1c \left\{u_i \left[\frac{u_i + 1}{2}\right]^2 + (c - u_i)\left[\frac{c + u_i + 1}{2}\right]^2\right\} - \left[\frac{c + 1}{2}\right]^2 = \frac{cu_i - u_i^2}{4}. \tag{10.7}$$

Therefore

$$V[\bar R_{.j}] = V\left[\frac1r \sum_i^r R_{ij}\right] = \frac{1}{4r^2} \sum_i^r (cu_i - u_i^2). \tag{10.8}$$

* This was demonstrated to the author by Nancy D. Bailey following a surmise by William H. Kruskal at a seminar by Bailey.
In the $i$th row the ranks $R_{ij}$ are equal to $(u_i + 1)/2$ or to $(c + u_i + 1)/2$ depending on whether $x_{ij} = 1$ or 0. We can write

$$R_{ij} = \tfrac12 [u_i + 1 + c(1 - x_{ij})], \tag{10.9}$$

so

$$\bar R_{.j} = \frac1r \sum_i^r R_{ij} = \frac{1}{2r} \sum_i^r u_i + \frac12 (c + 1) - \frac{c}{2r} \sum_i^r x_{ij}. \tag{10.10}$$

Now $\bar R_{..} = (c + 1)/2$, and using (10.3) and (10.5) for the first and last terms on the right-hand side we get

$$\bar R_{.j} - \bar R_{..} = -\frac{c}{2r}(T_j - \bar T). \tag{10.11}$$

Therefore

$$\sum_j^c (\bar R_{.j} - \bar R_{..})^2 = \frac{c^2}{4r^2} \sum_j^c (T_j - \bar T)^2. \tag{10.12}$$

We now substitute (10.12) and (10.8) in (9.4) to get

$$\chi_r^2 = \frac{c - 1}{c} \cdot \frac{\dfrac{c^2}{4r^2} \sum_j^c (T_j - \bar T)^2}{\dfrac{1}{4r^2} \sum_i^r (cu_i - u_i^2)} = \frac{c(c - 1) \sum_j^c (T_j - \bar T)^2}{c \sum_i^r u_i - \sum_i^r u_i^2}, \tag{10.13}$$

which is identical with Cochran's $Q$, (10.1).
To compute $Q$ for the data of Table 7.12 we need a table showing the number of specimens having each possible value of $u_i$ (Table 7.13) from which we get

$$\sum_i^r u_i = 4 \times 4 + 3 \times 5 + 2 \times 1 + 1 \times 0 + 0 \times 59 = 33, \tag{10.14}$$

$$\sum_i^r u_i^2 = 4^2 \times 4 + 3^2 \times 5 + 2^2 \times 1 + 1^2 \times 0 + 0^2 \times 59 = 113. \tag{10.15}$$

We also need $\sum_j^c (T_j - \bar T)^2$, which can be calculated as

$$\sum_j^c T_j^2 - \frac1c \left(\sum_j^c T_j\right)^2 = 6^2 + 10^2 + 7^2 + 10^2 - \frac{33^2}{4} = 12.75. \tag{10.16}$$

Thus

$$Q = \frac{4(4 - 1) \times 12.75}{4 \times 33 - 113} = 8.05. \tag{10.17}$$

Since $\chi^2_{0.95}(3) = 7.81$, $P < 0.05$.

Table 7.13

u_i                    4    3    2    1    0
Number of specimens    4    5    1    0   59
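The computation of $Q$ from (10.1) can be sketched in Python. This is an illustration, not Cochran's or the author's code: the name `cochran_q` is hypothetical, and each response pattern of Table 7.12 is stored once together with the number of specimens showing it.

```python
# Table 7.12: (pattern over media A, B, C, D; number of specimens).
patterns = [
    ((1, 1, 1, 1), 4),
    ((1, 1, 0, 1), 2),
    ((0, 1, 1, 1), 3),
    ((0, 1, 0, 1), 1),
    ((0, 0, 0, 0), 59),
]


def cochran_q(patterns):
    """Evaluate (10.1) from response patterns with their multiplicities."""
    c = len(patterns[0][0])
    # T_j: number of successes in column j, counting repeated rows.
    column_totals = [
        sum(row[j] * count for row, count in patterns) for j in range(c)
    ]
    # Sums of u_i and u_i^2 over all rows, u_i being the row total.
    sum_u = sum(sum(row) * count for row, count in patterns)
    sum_u2 = sum(sum(row) ** 2 * count for row, count in patterns)
    t_bar = sum(column_totals) / c
    ss_columns = sum((t - t_bar) ** 2 for t in column_totals)
    q = c * (c - 1) * ss_columns / (c * sum_u - sum_u2)
    return column_totals, sum_u, sum_u2, ss_columns, q


column_totals, sum_u, sum_u2, ss_columns, q = cochran_q(patterns)
print(column_totals, sum_u, sum_u2, ss_columns, round(q, 2))
```

Rows that are all 0 or all 1 contribute $cu_i - u_i^2 = 0$ to the denominator and nothing to the spread of the column totals, so the 59 specimens with no growth on any medium do not affect $Q$.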
EXERCISES

7.1. Apply the appropriate Wilcoxon rank test to the data of Table 7.4.

7.2. A group of mice are allocated to individual cages randomly. The cages are allocated in equal numbers, randomly, to two treatments, a control A and a certain drug B. All animals are infected, in a random sequence, with tuberculosis [Note: the experimenter usually wants to infect all the controls and then all the B group, but this is terrible]. The mice die on the following days after infection [one mouse got lost].

Control A: 5, 6, 7, 7, 8, 8, 8, 9, 12
Drug B: 7, 8, 8, 8, 9, 9, 12, 13, 14, 17

A preliminary experiment having established that the drug is not toxic, it can be assumed that the test group cannot be worse (die sooner) than the control group under any reasonable alternative hypothesis. Report the P value for the null hypothesis that the drug is without effect, using the appropriate Wilcoxon test.

7.3. Acid is concentrated continuously in a certain type of plant. Part of the plant corrodes and eventually fails. The throughput in hundreds of tons obtained between installation and failure is recorded. These parts were obtained from three separate foundries. Test the hypothesis that the median life is the same for the three foundries.

Foundry    Throughputs obtained
A          84, 60, 40, 47, 34, 46
B          67, 92, 95, 40, 98, 60, 59, 108, 86, 117
C          46, 93, 100, 92, 92
7.4. In a rainmaking (?) experiment, rainfall measurements were made on 16 pairs of days. On one day in each pair the clouds were seeded and on the other day no seeding was done. The choice of which day in a pair to seed was made randomly. The total rainfall over the network of gauges for the 16 pairs of days is below. Test the null hypothesis that seeding is without effect with (a) the sign test, (b) the appropriate Wilcoxon test. Report one-sided P values.

Pair no.    S       NS        Pair no.    S       NS
1           0       1.37      9           0       0
2           2.09    0         10          1.87    0.62
3           0.07    0         11          2.50    0
4           0.30    0.10      12          3.15    5.54
5           0       0.44      13          0.15    0.01
6           2.55    0         14          2.96    0
7           1.62    1.01      15          0       0
8           0       0.54      16          0       0.75
7.5. In a trial of two types of rain gauge, 69 of type A and 12 of type B were distributed at random over a certain area. In a certain period 14 storms occurred, and the average amounts of rain found in the two types of gauge were as shown in the accompanying table.

Storm    Type A    Type B      Storm    Type A    Type B
1        1.38      1.42        8        2.63      2.69
2        9.69      10.37       9        2.44      2.68
3        0.39      0.39        10       0.56      0.53
4        1.42      1.46        11       0.69      0.72
5        0.54      0.55        12       0.71      0.72
6        5.94      6.15        13       0.95      0.93
7        0.59      0.61        14       0.50      0.53

Data from E. L. Hamilton, "The Problem of Sampling Rainfall in Mountainous Areas," pp. 469-475 of Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, J. Neyman (ed.), University of California Press, Berkeley and Los Angeles, 1949.

Obtain the one-sided P value for the hypothesis that the two types of gauge are giving similar results, using (a) the appropriate Wilcoxon test by exact enumeration, (b) the appropriate Wilcoxon test using a normal approximation, (c) the sign test.

7.6. On a number of days coffee is prepared by two different electric coffee makers, and the time in minutes and tenths of a minute for the coffee-making operation to be completed by each machine is recorded. The results were as below.
It is probable that the time required varies from day to day due to variation in such factors as the temperature of the water or line voltage. Use the appropriate Wilcoxon test to find the exact P value for the null hypothesis that the median times for the two machines are the same, against the one-sided alternative that machine B takes longer than machine A. Make a similar test with the sign test.

7.7. Fruit juice is stored for a number of months in four types of container and then rated for quality by eight panels of tasters. The taste scores assigned by the panels were as below.

                                   Panel
Container    1       2       3       4       5       6       7       8
A            6.14    5.72    6.90    5.80    6.23    6.06    5.42    6.04
B            6.55    6.29    7.40    6.40    6.28    6.26    6.22    6.76
C            5.54    5.61    6.60    5.70    5.31    5.58    5.57    5.84
D            4.81    5.09    6.61    5.03    5.15    5.05    5.77    6.17
(a) Test the null hypothesis that there is no difference between the containers. (b) Test the null hypothesis that there is no difference between the panels.

7.8. Samples of soil were taken at five locations on a farm and tested for the presence of a certain fungus. The test was of the form that yields only yes or no answers, represented by 1 or 0 in the table below. The same five locations were sampled on a total of 18 occasions. Test the null hypothesis that the probability of a sample being positive is the same for all five locations.

                                      Occasion
Location    1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18
1           0  0  1  1  1  1  0  1  1  1   0   0   0   0   0   0   0   1
2           0  0  0  1  1  0  0  1  0  0   0   0   0   0   0   0   1   1
3           1  1  1  1  1  0  1  1  1  1   0   0   0   0   0   0   1   1
4           1  1  1  1  0  1  1  0  1  0   0   0   0   0   1   0   1   0
5           0  0  0  1  1  0  0  0  0  0   0   0   0   0   0   0   1   0
7.9. A sample of five observations, $x_1, \ldots, x_5$, is taken from one population and a sample of five observations $y_1, \ldots, y_5$ is taken from another population. It is found that four of the x's lie below the joint sample median. (a) Using the median test, what is the one-sided P value for the null hypothesis that the samples came from populations with the same medians? (b) Suppose that the density function of $y$ is $p_y\{y\} = 1$ ... $= 0$ ...

CHAPTER 8

The Partitioning of Sums of Squares

8.1. The Distribution of Sample Estimates of Variance

... $x_n$ from a normal population of unknown mean $\xi$ and variance $\sigma^2$. In this section we will show that the usual unbiased estimator of the variance

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^n (x_i - \bar x)^2 \sim \sigma^2 \frac{\chi^2(n - 1)}{n - 1}, \tag{1.1}$$
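where $\sim$ is used to denote "is distributed as," and that $s^2$ is independent of $\bar x$. These two results are the basis of the Student t test to be discussed in Section 9.6. We can write

$$\sum_i^n (x_i - \xi)^2 = \sum_i^n [(x_i - \bar x) + (\bar x - \xi)]^2 = \sum_i^n (x_i - \bar x)^2 + n(\bar x - \xi)^2. \tag{1.2}$$

Dividing by $\sigma^2$,

$$\sum_i^n \left(\frac{x_i - \xi}{\sigma}\right)^2 = \sum_i^n \left(\frac{x_i - \bar x}{\sigma}\right)^2 + \left(\frac{\bar x - \xi}{\sigma/\sqrt n}\right)^2. \tag{1.3}$$

Define $u_i = (x_i - \xi)/\sigma$, so that $x_i = \sigma u_i + \xi$. Then

$$\bar x = \frac1n \sum_i^n x_i = \frac1n \sum_i^n (\sigma u_i + \xi) = \sigma \bar u + \xi, \tag{1.4}$$

so

$$\frac{x_i - \bar x}{\sigma} = \frac{\sigma u_i + \xi - (\sigma \bar u + \xi)}{\sigma} = u_i - \bar u, \tag{1.5}$$

and

$$\frac{\bar x - \xi}{\sigma/\sqrt n} = \frac{\sigma \bar u + \xi - \xi}{\sigma/\sqrt n} = \frac{\bar u}{1/\sqrt n} = \sqrt n\, \bar u. \tag{1.6}$$

Thus (1.3) can be written as

$$\sum_i^n u_i^2 = \sum_i^n (u_i - \bar u)^2 + (\sqrt n\, \bar u)^2 \tag{1.7}$$

$$= Q_2 + Q_1, \tag{1.8}$$

say, where $Q_2 = \sum_i^n (u_i - \bar u)^2$ and $Q_1 = (\sqrt n\, \bar u)^2$.

The left-hand side $\sum_i^n u_i^2$ is distributed as $\chi^2(n)$. On the right-hand side, $\sqrt n\, \bar u = \bar u/(1/\sqrt n)$ is a unit normal deviate, and hence its square $Q_1$ is distributed as $\chi^2(1)$. We shall now prove that $Q_2$ is distributed as $\chi^2(n - 1)$ and is independent of $Q_1$. This will establish that $\sum_i^n (x_i - \bar x)^2$ is distributed as $\sigma^2 \chi^2(n - 1)$ and is independent of $(\bar x - \xi)^2$.

Define $l_i = u_i - \bar u$, so that $Q_2 = \sum_i^n l_i^2$, and

$$\sum_i^n l_i = \sum_i^n u_i - n\bar u = \sum_i^n u_i - n\left(\sum_i^n u_i / n\right) = 0. \tag{1.9}$$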
The number of degrees of freedom for a set of variables is defined as the number of variables minus the number of independent linear relations or constraints between them. The number of degrees of freedom of the sum of squares of a set of variables is the same as the number of degrees of freedom of the variables themselves. In the present instance, $\sum_i^n u_i^2$ involves $n$ variables $u_i$ with no linear relationship between them, and so it has $n$ degrees of freedom; but $Q_2$ involves $n$ variables $l_i$ with the restriction $\sum_i^n l_i = 0$, and so $Q_2$ has $n - 1$ degrees of freedom. The $n$ variables $l_i$ are not independent [see exercise (8.1)], but we can construct from them $n - 1$ variables $v_i$ ($i = 2, 3, \ldots, n$) with the properties that

(a) $\sum_{i=1}^n l_i^2 = \sum_{i=2}^n v_i^2$,
(b) the $v_i$ are independent unit normal deviates,
(c) the $v_i$ ($i = 2, \ldots, n$) are independent of $v_1$ defined as $\sqrt n\, \bar u$.
Thus the distribution of $\sum_{i=2}^n v_i^2$ is $\chi^2(n - 1)$, and therefore the distribution of $Q_2 = \sum_{i=1}^n l_i^2$ is also $\chi^2(n - 1)$ and is independent of $Q_1$.

The first step is to eliminate $l_1$ from $\sum_{i=1}^n l_i^2$ by using the relation $\sum_{i=1}^n l_i = 0$, which implies that

$$l_1 = -(l_2 + l_3 + \cdots + l_n), \tag{1.10}$$

so

$$l_1^2 = l_2^2 + 2(l_2 l_3 + l_2 l_4 + \cdots + l_2 l_n) + l_3^2 + 2(l_3 l_4 + \cdots + l_3 l_n) + \cdots + l_n^2. \tag{1.11}$$

Substituting this for $l_1^2$ in $Q_2 = \sum_{i=1}^n l_i^2$ will give $Q_2$ in the form of a function of the $n - 1$ variables $l_i$, $i = 2, 3, \ldots, n$:

$$Q_2 = 2(l_2^2 + l_2 l_3 + l_2 l_4 + \cdots + l_2 l_n + l_3^2 + l_3 l_4 + \cdots + l_3 l_n + \cdots + l_n^2). \tag{1.12}$$

This can be expressed as the sum of $n - 1$ squares, by first collecting all terms that involve $l_2$, the first row, and writing them as a perfect square plus whatever is left over:

$$2(l_2^2 + l_2 l_3 + l_2 l_4 + \cdots + l_2 l_n) = \left\{\sqrt 2 \left[l_2 + \tfrac12 (l_3 + l_4 + \cdots + l_n)\right]\right\}^2 - \tfrac12 l_3^2 - l_3 l_4 - \cdots - l_3 l_n - \tfrac12 l_4^2 - \cdots - l_4 l_n - \cdots - \tfrac12 l_n^2. \tag{1.13}$$
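Substituting this in (1.12),

$$Q_2 = \left\{\sqrt 2 \left[l_2 + \tfrac12 (l_3 + l_4 + \cdots + l_n)\right]\right\}^2 + \tfrac32 l_3^2 + l_3 l_4 + \cdots + l_3 l_n + \tfrac32 l_4^2 + \cdots + l_4 l_n + \cdots + \tfrac32 l_n^2. \tag{1.14}$$

The next step is to collect all the terms in $l_3$ excluding those in the first row: they are the terms $\tfrac32 l_3^2 + l_3 l_4 + \cdots + l_3 l_n$ of (1.14). We write them as a perfect square plus whatever is left over, just as we did for $l_2$. The square part is

$$\left\{\sqrt{\tfrac32} \left[l_3 + \tfrac13 (l_4 + \cdots + l_n)\right]\right\}^2. \tag{1.15}$$

This procedure can be continued until we are finally left with only $l_n^2$. These squares can be denoted by $v_i^2$, $i = 2, \ldots, n$:

$$v_2 = \sqrt 2 \left[l_2 + \tfrac12 (l_3 + l_4 + \cdots + l_n)\right],$$
$$v_3 = \sqrt{\tfrac32} \left[l_3 + \tfrac13 (l_4 + l_5 + \cdots + l_n)\right],$$
$$v_i = \sqrt{\frac{i}{i - 1}} \left[l_i + \frac1i (l_{i+1} + l_{i+2} + \cdots + l_n)\right],$$
$$v_n = \sqrt{\frac{n}{n - 1}}\, l_n. \tag{1.16}$$

We have thus expressed $Q_2 = \sum_{i=1}^n l_i^2$ as $\sum_{i=2}^n v_i^2$. It remains to show that these $v_i$ are unit normal deviates and are independent of each other and of $v_1$.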
To do this we express the $v_i$, $i > 1$, in terms of the $u_i$ by the substitution $l_i = u_i - \bar u$:

$$v_i = \frac{1}{\sqrt{i(i - 1)}} (i l_i + l_{i+1} + \cdots + l_n)$$
$$= \frac{1}{\sqrt{i(i - 1)}} (i u_i - i\bar u + u_{i+1} - \bar u + \cdots + u_n - \bar u)$$
$$= \frac{1}{\sqrt{i(i - 1)}} (i u_i + u_{i+1} + \cdots + u_n - n\bar u)$$
$$= \frac{1}{\sqrt{i(i - 1)}} (i u_i + u_{i+1} + \cdots + u_n - u_1 - u_2 - \cdots - u_i - u_{i+1} - \cdots - u_n)$$
$$= -\frac{1}{\sqrt{i(i - 1)}} [u_1 + u_2 + \cdots + u_{i-1} - (i - 1)u_i]. \tag{1.17}$$
The $v_i$ will be normally distributed since they are linear functions of the $u_i$ which are normally distributed, and

$$E[v_i] = -\frac{1}{\sqrt{i(i - 1)}} \{E[u_1] + \cdots + E[u_{i-1}] - (i - 1)E[u_i]\} = 0, \tag{1.18}$$

$$V[v_i] = \frac{1}{i(i - 1)} \{V[u_1] + \cdots + V[u_{i-1}] + (i - 1)^2 V[u_i]\} = 1, \tag{1.19}$$

$$\mathrm{Cov}[v_i, v_j] = E[v_i v_j] - E[v_i]E[v_j] = \frac{1}{\sqrt{i(i - 1)j(j - 1)}} E[(u_1 + \cdots + u_{i-1} - (i - 1)u_i)(u_1 + \cdots + u_{j-1} - (j - 1)u_j)]. \tag{1.20}$$

For $i < j$, the expectation $E[\;]$ will include the squared terms $u_1^2, \ldots, u_i^2$, and their expectation is

$$E[u_1^2 + \cdots + u_{i-1}^2 - (i - 1)u_i^2] = 0. \tag{1.21}$$

The expectation $E[\;]$ will also include cross products of the form $k_{rs} u_r u_s$, $r \ne s$, where $k_{rs}$ is some constant. Their expectation is

$$E[k_{rs} u_r u_s] = k_{rs} E[u_r] E[u_s] = 0. \tag{1.22}$$

Thus $\mathrm{Cov}[v_i, v_j] = 0$. It will be shown in Section 12.3 that, if $v_1, v_2$ are linear functions of normally distributed variables $u_1, u_2$, then $v_1, v_2$ are jointly distributed in the bivariate normal distribution, and that, if $\mathrm{Cov}[v_1, v_2] = 0$, then $v_1, v_2$
are independent. Thus the $v_i$ are independent, and since they have zero means and unit variances, and are normally distributed,

$$Q_2 = \sum_{i=2}^n v_i^2 \sim \chi^2(n - 1), \tag{1.23}$$

where as before the symbol $\sim$ is used to denote "is distributed as." It is easy to show that $v_1$ and the $v_i$, $i = 2, \ldots, n$, are independent, and hence $Q_1$ is independent of $Q_2$. To recapitulate, if we write

$$\sum_{i=1}^n u_i^2 = (\sqrt n\, \bar u)^2 + \sum_{i=1}^n (u_i - \bar u)^2 = Q_1 + Q_2, \tag{1.24}$$

then $Q_1$ and $Q_2$ have independent $\chi^2$ distributions with degrees of freedom 1 and $n - 1$. Substituting $u_i = (x_i - \xi)/\sigma$ and multiplying both sides by $\sigma^2$ gives

$$\sum_i^n (x_i - \xi)^2 = n(\bar x - \xi)^2 + \sum_i^n (x_i - \bar x)^2, \tag{1.25}$$

for which the left-hand side is distributed as $\sigma^2 \chi^2$ with $n$ degrees of freedom and the two terms on the right-hand side are distributed as $\sigma^2 \chi^2$ with degrees of freedom 1 and $n - 1$, and are independent. This implies that for normally distributed random variables

$$s^2 = \frac{1}{n - 1} \sum_i^n (x_i - \bar x)^2 \sim \sigma^2 \frac{\chi^2(n - 1)}{n - 1}, \tag{1.26}$$

and also that $\bar x$ is independent of $s^2$.
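The algebraic part of this argument, that the $n - 1$ variables $v_i$ of (1.17) reproduce $Q_2$ exactly, can be verified numerically. The Python sketch below is illustrative only; the function name `helmert_v` and the test values of $u_i$ are arbitrary choices made here, not taken from the text.

```python
import math


def helmert_v(u):
    """Return [v_2, ..., v_n] computed from (1.17)."""
    n = len(u)
    v = []
    for i in range(2, n + 1):
        head = sum(u[: i - 1])  # u_1 + ... + u_{i-1}
        v.append(-(head - (i - 1) * u[i - 1]) / math.sqrt(i * (i - 1)))
    return v


# Arbitrary illustrative values standing in for unit normal deviates.
u = [0.3, -1.2, 2.5, 0.8, -0.4]
n = len(u)
u_bar = sum(u) / n

q1 = n * u_bar ** 2                       # (sqrt(n) * u_bar)^2
q2 = sum((x - u_bar) ** 2 for x in u)     # sum of l_i^2
v = helmert_v(u)
sum_v_squared = sum(x * x for x in v)

print(round(q2, 10), round(sum_v_squared, 10))
print(round(sum(x * x for x in u), 10), round(q1 + q2, 10))
```

The first printed pair confirms property (a), and the second confirms the partition (1.24); the distributional claims (b) and (c) rest on the covariance argument in the text and are not tested by this check.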
8.2. The Partitioning of Sums of Squares into Independent Components

The result of the preceding section will be used for the Student t test in Chapter 9. For the comparison of $k$ means, in Chapter 10 and following, we need a stronger result, Cochran's theorem [2]. A restricted form of this theorem is as follows. We have $n$ unit normally distributed and independent random variables $u_i$, $i = 1, \ldots, n$. The sum of their squares is distributed as $\chi^2(n)$. Suppose that $Q_j$, $j = 1, \ldots, k$, are sums of squares with $f_j$ degrees of freedom. Then, if

$$\sum_j^k Q_j = Q_1 + \cdots + Q_k = \sum_i^n u_i^2 \tag{2.1}$$

and

$$\sum_j^k f_j = f_1 + \cdots + f_k = n, \tag{2.2}$$

then the theorem states that the $Q_j$ are independently distributed as $\chi^2(f_j)$. Actually the theorem is somewhat stronger in that the $Q_j$ can be quadratic forms,

$$\sum_i \sum_j a_{ij} x_i x_j = a_{11} x_1^2 + a_{12} x_1 x_2 + \cdots + a_{1n} x_1 x_n + a_{21} x_1 x_2 + a_{22} x_2^2 + a_{23} x_2 x_3 + \cdots, \tag{2.3}$$

where $a_{ij} = a_{ji}$. We assume that each $Q_j$ is the sum of the squares of $m_j$ variables $l_{ij}$, $i = 1, \ldots, m_j$; i.e.,

$$Q_j = l_{1j}^2 + l_{2j}^2 + \cdots + l_{m_j j}^2 = \sum_i^{m_j} l_{ij}^2. \tag{2.4}$$

We assume that, among the $m_j$ variables $l_{ij}$, there are $r_j$ linear relations, so that the number of degrees of freedom for $Q_j$ is $f_j = m_j - r_j$. We are going to show that each $Q_j = \sum_i^{m_j} l_{ij}^2$ can be written in the form $Q_j = \sum_i^{f_j} v_{ij}^2$, in which the $v_{ij}$ are independent unit normal deviates. This will show that the $Q_j$ have $\chi^2(f_j)$ distributions and are independent.

If there are $r_j$ linear relations among the $m_j$ variables $l_{ij}$, then we can use these relations to eliminate $r_j$ of the $l_{ij}$ and convert $Q_j$ into a quadratic form in only $m_j - r_j = f_j$ variables $l_{ij}$. Thus it is possible to obtain $Q_j$ as a quadratic form in $f_j$ of the $l_{ij}$. (2.5) Further, this quadratic form can be linearly transformed into another quadratic form in $f_j$ variables $v_{1j}, \ldots, v_{f_j j}$ involving only squares,

$$Q_j = \sum_i^{f_j} b_{ij} v_{ij}^2, \tag{2.6}$$

in which the $b_{ij} = \pm 1$ (see, e.g., Section 51 of [3]). Combining (2.6) and (2.1), we have

$$\sum_j^k \sum_i^{f_j} b_{ij} v_{ij}^2 = \sum_i^n u_i^2. \tag{2.7}$$

Now, if the condition (2.2) for Cochran's theorem is satisfied, $\sum_j^k f_j = n$, so that the number of terms on the left-hand side of (2.7) equals the number of terms on the right-hand side. The theory of quadratic forms contains the result (see, e.g., Section 50 of [3]) that, if a real quadratic form of rank $r$ is reduced by two nonsingular transformations to the forms

$$c_1 x_1^2 + \cdots + c_r x_r^2 \quad \text{and} \quad k_1 y_1^2 + \cdots + k_r y_r^2, \tag{2.8}$$
then the number of positive c's equals the number of positive k's. In (2.7) all the coefficients of the $u_i^2$ are positive, in fact being all +1, and hence all the coefficients of the $v_{ij}^2$ must also be positive. They were already known to be $\pm 1$; so it follows that they are all +1, and (2.6) becomes

$$Q_j = \sum_i^{f_j} v_{ij}^2. \tag{2.9}$$

We now have two sets of transformations of the quadratic form $\sum_j^k Q_j$, where $Q_j$ is written as in (2.5), one leading to $\sum_i^n u_i^2$ and the other to $\sum_j^k \sum_i^{f_j} v_{ij}^2$. In other words, $\sum_i^n u_i^2$ can be linearly transformed to $\sum_j^k \sum_i^{f_j} v_{ij}^2$, where $\sum_j^k f_j = n$; i.e., the $n$ $v_{ij}$'s are linear functions of the $n$ $u_i$'s. It will be convenient at this point to change the index on $v$ from $ij$, $i = 1, \ldots, f_j$, $j = 1, \ldots, k$, to $g = 1, \ldots, n$. If the transformation is

$$v_g = c_{g1} u_1 + c_{g2} u_2 + \cdots + c_{gn} u_n = \sum_j^n c_{gj} u_j, \qquad g = 1, \ldots, n, \tag{2.10}$$

then

$$\sum_g^n v_g^2 = \sum_g^n \left(\sum_j^n c_{gj} u_j\right)^2 = \sum_j^n \sum_{j'}^n \left(\sum_g^n c_{gj} c_{gj'}\right) u_j u_{j'}. \tag{2.11}$$

We know that $\sum_g^n v_g^2 = \sum_j^n u_j^2$; so

$$\sum_j^n u_j^2 = \sum_j^n \sum_{j'}^n \left(\sum_g^n c_{gj} c_{gj'}\right) u_j u_{j'}, \tag{2.12}$$

and the coefficients of the u's on the two sides of this equation must be equal. For any $j = j'$,

$$\sum_g^n c_{gj}^2 = c_{1j}^2 + c_{2j}^2 + \cdots + c_{nj}^2 = 1. \tag{2.13}$$

There are no cross products $u_j u_{j'}$, $j \ne j'$, in the left-hand side of (2.12); so on the right-hand side the coefficient of $u_j u_{j'}$, $j \ne j'$, must be zero:

$$c_{1j} c_{1j'} + c_{2j} c_{2j'} + \cdots + c_{nj} c_{nj'} = \sum_g^n c_{gj} c_{gj'} = 0. \tag{2.14}$$
If we multiply the first of the $n$ equations (2.10) by $c_{11}$, the second by $c_{21}$, etc., we get

$$c_{11} v_1 = c_{11}^2 u_1 + c_{11} c_{12} u_2 + \cdots + c_{11} c_{1n} u_n,$$
$$c_{21} v_2 = c_{21}^2 u_1 + c_{21} c_{22} u_2 + \cdots + c_{21} c_{2n} u_n, \tag{2.15}$$

and so on. Adding these $n$ equations gives

$$c_{11} v_1 + c_{21} v_2 + \cdots + c_{n1} v_n = u_1 (c_{11}^2 + c_{21}^2 + \cdots + c_{n1}^2) + u_2 (c_{11} c_{12} + c_{21} c_{22} + \cdots + c_{n1} c_{n2}) + \cdots + u_n (c_{11} c_{1n} + c_{21} c_{2n} + \cdots + c_{n1} c_{nn}) = u_1, \tag{2.16}$$

since by (2.13) the coefficient of $u_1$ equals 1 and by (2.14) the coefficients of $u_2, \ldots, u_n$ are all zero. Proceeding in this manner, we multiply the $n$ equations (2.10) by $c_{12}, c_{22}$, etc., to get

$$c_{12} v_1 + c_{22} v_2 + \cdots + c_{n2} v_n = u_2, \tag{2.17}$$

and so on. Thus in general,

$$u_j = c_{1j} v_1 + c_{2j} v_2 + \cdots + c_{nj} v_n = \sum_g^n c_{gj} v_g, \tag{2.18}$$

and so

$$\sum_j^n u_j^2 = \sum_j^n \left(\sum_g^n c_{gj} v_g\right)^2 = \sum_j^n \sum_g^n \sum_{g'}^n c_{gj} c_{g'j} v_g v_{g'} = \sum_g^n \sum_{g'}^n \left(\sum_j^n c_{gj} c_{g'j}\right) v_g v_{g'}. \tag{2.19}$$

But $\sum_j^n u_j^2 = \sum_g^n v_g^2$; so the coefficients of the $v_g$'s on the two sides of the equation

$$\sum_g^n v_g^2 = \sum_g^n \sum_{g'}^n \left(\sum_j^n c_{gj} c_{g'j}\right) v_g v_{g'} \tag{2.20}$$

are equal. Thus, for $g = g'$,

$$\sum_j^n c_{gj} c_{g'j} = \sum_j^n c_{gj}^2 = c_{g1}^2 + c_{g2}^2 + \cdots + c_{gn}^2 = 1, \tag{2.21}$$

and, for $g \ne g'$,

$$\sum_j^n c_{gj} c_{g'j} = c_{g1} c_{g'1} + c_{g2} c_{g'2} + \cdots + c_{gn} c_{g'n} = 0. \tag{2.22}$$

We will now use these relations to show that the $v_g$ are independent unit normal deviates.
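Relations (2.13), (2.14), (2.21), and (2.22) say that both the columns and the rows of the coefficient array $c_{gj}$ are orthonormal. As a concrete instance, the Python sketch below builds the coefficients of the transformation of Section 8.1, with first row $v_1 = \sqrt n\, \bar u$ and remaining rows from (1.17), and checks both sets of relations numerically. The helper names `helmert_matrix`, `gram`, and `is_identity` are hypothetical and introduced only for this illustration.

```python
import math


def helmert_matrix(n):
    """Coefficients c_gj with v_1 = sqrt(n) * u_bar and v_i from (1.17)."""
    rows = [[1 / math.sqrt(n)] * n]
    for i in range(2, n + 1):
        scale = 1 / math.sqrt(i * (i - 1))
        # -(u_1 + ... + u_{i-1} - (i-1) u_i) / sqrt(i(i-1)); zeros after u_i.
        rows.append([-scale] * (i - 1) + [(i - 1) * scale] + [0.0] * (n - i))
    return rows


def gram(vectors):
    """Matrix of inner products between every pair of vectors."""
    return [[sum(a * b for a, b in zip(x, y)) for y in vectors] for x in vectors]


def is_identity(matrix, tol=1e-12):
    return all(
        abs(value - (1.0 if g == h else 0.0)) < tol
        for g, row in enumerate(matrix)
        for h, value in enumerate(row)
    )


c = helmert_matrix(5)
rows_orthonormal = is_identity(gram(c))                # (2.21) and (2.22)
columns_orthonormal = is_identity(gram(list(zip(*c))))  # (2.13) and (2.14)
print(rows_orthonormal, columns_orthonormal)
```

This checks one particular transformation only; the text's argument shows that the column relations imply the row relations for any transformation that preserves the sum of squares.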
From the definition of $v_g$ in (2.10), it follows that

$$E[v_g] = c_{g1} E[u_1] + \cdots + c_{gn} E[u_n] = 0, \tag{2.23}$$

$$V[v_g] = c_{g1}^2 V[u_1] + \cdots + c_{gn}^2 V[u_n] = c_{g1}^2 + \cdots + c_{gn}^2 = 1, \tag{2.24}$$

using (2.21). The $v_g$ are thus unit normal deviates, and it remains to show that they are independent. This we do by showing that they have zero covariance, this implying independence by the theorem of Section 12.3. The covariance is

$$\mathrm{Cov}[v_g, v_{g'}] = E[v_g v_{g'}] - E[v_g]E[v_{g'}] = E[(c_{g1} u_1 + \cdots + c_{gn} u_n)(c_{g'1} u_1 + \cdots + c_{g'n} u_n)]$$
$$= E[c_{g1} c_{g'1} u_1^2 + \cdots + c_{gn} c_{g'n} u_n^2] + E[\text{cross products in } u_g, u_{g'}]. \tag{2.25}$$

But $E[u_g u_{g'}] = E[u_g]E[u_{g'}] = 0$, since $u_g, u_{g'}$ are independent and have zero expectation. Hence

$$\mathrm{Cov}[v_g, v_{g'}] = c_{g1} c_{g'1} E[u_1^2] + \cdots + c_{gn} c_{g'n} E[u_n^2] = c_{g1} c_{g'1} + \cdots + c_{gn} c_{g'n} = 0 \tag{2.26}$$

by (2.22). Since the $v_g$ are independent unit normal deviates, each $Q_j = \sum_i^{f_j} v_{ij}^2$, changing the index on $v$ back to $ij$ from $g$, has the $\chi^2(f_j)$ distribution, and is independent of the other $Q_j$. This completes the proof of the slightly restricted form of Cochran's theorem which we stated in (2.1) and (2.2).

The independence of the sample mean and variance, established at the end of Section 8.1, was required for Student's original exposition of the t test in 1908 [4]. A formal proof was given by Fisher in 1925 [5], an earlier proof by Helmert in 1876 being overlooked. Cochran's theorem was implicit in Fisher's early use of the analysis of variance [6, 7], but apparently did not receive a formal statement until 1934 [2]. Two papers by Irwin [8, 9] reviewed the mathematical basis of the analysis of variance. For a modern proof see Section 11.11 of Cramer [10].
EXERCISE

8.1. Show that $l_i$, defined in Section 8.1 as $u_i - \bar u$, has the properties $E[l_i] = 0$, $V[l_i] = (n - 1)/n$, $\mathrm{Cov}[l_i, l_j] = -1/n$.
REFERENCES

1. Hald, A., Statistical Theory with Engineering Applications. New York: John Wiley and Sons, 1952.
2. Cochran, William G., "The Distribution of Quadratic Forms in a Normal System," Proceedings of the Cambridge Philosophical Society, 30 (1934), 178-91.
3. Bocher, Maxime, Introduction to Higher Algebra. New York: The Macmillan Co., 1907.
4. "Student," "The Probable Error of a Mean," Biometrika, 6 (1908), 1-25.
5. Fisher, R. A., "Applications of Student's Distribution," Metron, 5 (1925), 90-104.
6. Fisher, R. A., "The Goodness of Fit and Regression Formulae, and the Distribution of Regression Coefficients," Journal of the Royal Statistical Society, 85 (1922), 597-612.
7. Fisher, R. A., Statistical Methods for Research Workers. 1st ed.: Edinburgh: Oliver and Boyd, 1925.
8. Irwin, J. O., "Mathematical Theorems Involved in the Analysis of Variance," Journal of the Royal Statistical Society, 94 (1931), 284-300.
9. Irwin, J. O., "On the Independence of the Constituent Items in the Analysis of Variance," Supplement to the Journal of the Royal Statistical Society, 1 (1934), 236-52.
10. Cramer, H., Mathematical Methods of Statistics. Princeton: Princeton University Press, 1951.
CHAPTER 9
Tests of Equality of Variances and Means
9.1. Introduction

In this chapter we discuss tests as follows. Suppose that we have a sample $x_{11}, \ldots, x_{1n_1}$, distributed normally with mean $\xi_1$ and variance $\sigma_1^2$. In Section 9.2 we give the test for the null hypothesis $\sigma_1^2 = \sigma_0^2$. In Section 9.7 we give the test for the null hypothesis that $\xi_1 = \xi_0$. When we have a second sample, $x_{21}, \ldots, x_{2n_2}$, distributed normally with mean $\xi_2$ and variance $\sigma_2^2$, we may wish to test the null hypothesis that $\sigma_1^2 = \sigma_2^2$ (Section 9.3), or that $\xi_1 = \xi_2$ assuming that $\sigma_1^2 = \sigma_2^2$ (Section 9.7), or that $\xi_1 = \xi_2$ without any assumption about $\sigma_1^2$ and $\sigma_2^2$ (Section 9.8). Supposing that we have $k$ samples, $k > 2$, the test for the equality of variances is given in Section 9.5 and the test for the equality of means is given in Chapter 10. Section 9.9 reviews the various tests for means in this chapter and for medians in Chapter 7.
9.2. Uses of the Sample Estimate of Variance

We saw in Section 8.1 that $s^2 = \sum_i^n (x_i - \bar{x})^2/(n-1)$ is distributed as $\sigma^2\chi^2(n-1)/(n-1)$. If we use $f$ for the number of degrees of freedom, it follows that $fs^2/\sigma^2$ is distributed as $\chi^2(f)$. We can obtain confidence limits for $\sigma^2$ from

$$\Pr\{\chi^2_{P_1}(f) < \chi^2(f) < \chi^2_{P_2}(f)\} = P_2 - P_1, \tag{2.1}$$
whence, on substituting $fs^2/\sigma^2$ for $\chi^2(f)$, taking reciprocals of the terms of the inequality, multiplying by $s^2 f$, and reversing the order of the inequality, we get

$$\Pr\left\{\frac{fs^2}{\chi^2_{P_2}(f)} < \sigma^2 < \frac{fs^2}{\chi^2_{P_1}(f)}\right\} = P_2 - P_1. \tag{2.2}$$

As discussed in Section 2.15, we might choose $P_1$ and $P_2$ to make the confidence interval as short as possible in some sense, but most people would prefer that the limits be symmetric in probability so that $P_1 = 1 - P_2 = \alpha/2$. For example, if we observe $s^2 = 23.394$ with degrees of freedom $f = 66$, and want 90 per cent confidence limits, we need $\chi^2_{0.95}(66) = 86.0$ and $\chi^2_{0.05}(66) = 48.3$, whence the limits are

$$\left(\frac{66 \times 23.394}{86.0}, \; \frac{66 \times 23.394}{48.3}\right) = (17.95, \; 31.97). \tag{2.3}$$
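The arithmetic of the confidence limits can be checked with a short sketch. The function name and argument names below are illustrative assumptions, not part of the text; the two chi-square points are taken from the table values quoted in the example rather than computed.

```python
def variance_confidence_limits(s2, f, chi2_lower_point, chi2_upper_point):
    """Return (lower, upper) confidence limits for sigma^2.

    chi2_lower_point and chi2_upper_point are the P1 and P2 points of
    chi-square with f degrees of freedom, read from a table.
    The larger chi-square point gives the lower limit, and vice versa.
    """
    lower = f * s2 / chi2_upper_point
    upper = f * s2 / chi2_lower_point
    return lower, upper


# Worked example from the text: s^2 = 23.394, f = 66, 90 per cent limits,
# using the tabulated 0.05 and 0.95 points 48.3 and 86.0.
lower, upper = variance_confidence_limits(23.394, 66, 48.3, 86.0)
print(f"{lower:.2f}, {upper:.2f}")
```

Passing the table points in as arguments keeps the sketch free of any assumption about how the chi-square percentage points are obtained.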
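To derive a likelihood ratio test of the null hypothesis that $\sigma^2 = \sigma_0^2$ against the alternative hypothesis that $\sigma^2 \neq \sigma_0^2$, we need the logarithm of the likelihood function (2.3.7),

$$\log L = -\frac{n}{2}\log 2\pi\sigma^2 - \frac{1}{2\sigma^2}\sum (x_i - \xi)^2. \tag{2.4}$$

For $\omega$, $\sigma^2$ is fixed as $\sigma_0^2$, and we need the maximum likelihood estimator of $\xi$. Differentiating with respect to $\xi$ and equating to zero gives $\hat{\xi} = \bar{x}$, whence

$$L(\hat{\omega}) = \left(\frac{1}{2\pi\sigma_0^2}\right)^{n/2} \exp\left[-\frac{1}{2\sigma_0^2}\sum (x_i - \bar{x})^2\right]. \tag{2.5}$$

For $\Omega$, both $\sigma^2$ and $\xi$ are allowed to vary, and we know from (2.3.13) that $\hat{\sigma}^2 = \sum (x_i - \bar{x})^2/n$ and from (2.3.10) that $\hat{\xi} = \bar{x}$. Hence

$$L(\hat{\Omega}) = \left(\frac{1}{2\pi\hat{\sigma}^2}\right)^{n/2} e^{-n/2}. \tag{2.6}$$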
The likelihood ratio $\lambda = L(\hat{\omega})/L(\hat{\Omega})$ can be reduced to

$$\lambda = \left(\frac{F}{e^{F}}\right)^{n/2} e^{n/2} \tag{2.7}$$

if we define $F = \hat{\sigma}^2/\sigma_0^2$. The likelihood ratio $\lambda$ (2.7) tends to zero as $F \to 0$ and as $F \to \infty$. Since we know the distribution of $s^2$, in principle we can find the distribution of $\hat{\sigma}^2$, of $F$, and of $\lambda$. We could then make our critical region correspond to equal tail areas in the distribution of $\lambda$. There is no theorem that states we have to follow the likelihood ratio procedure exactly, however, and in practice we use equal tail areas in the distribution of $s^2$. The usual critical region is thus made up of

$$s^2 < \sigma_0^2\,\frac{\chi^2_{\alpha/2}(f)}{f} \tag{2.8}$$

and

$$s^2 > \sigma_0^2\,\frac{\chi^2_{1-\alpha/2}(f)}{f}. \tag{2.9}$$
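For one-sided tests, $\alpha/2$ in the above formulas is replaced by $\alpha$. For the alternative $\sigma_1^2 > \sigma_0^2$, the power of the test is

$$\pi(\sigma_1^2) = \Pr\left\{s^2 > \sigma_0^2\,\frac{\chi^2_{1-\alpha}(f)}{f} \,\Big|\, E[s^2] = \sigma_1^2\right\} = \Pr\left\{\frac{fs^2}{\sigma_1^2} > \frac{\sigma_0^2}{\sigma_1^2}\,\chi^2_{1-\alpha}(f) \,\Big|\, E[s^2] = \sigma_1^2\right\} = \Pr\left\{\chi^2(f) > \frac{\sigma_0^2}{\sigma_1^2}\,\chi^2_{1-\alpha}(f)\right\}. \tag{2.10}$$

For example, suppose that we are making a test with $\alpha = 0.05$, using $f = 19$. We want the probability of rejecting the null hypothesis if $\sigma_1^2/\sigma_0^2 = 3/2$. Substituting $\chi^2_{0.95}(19) = 30.1$,

$$\pi\left(\sigma_1^2 = \tfrac{3}{2}\sigma_0^2\right) = \Pr\left\{\chi^2(19) > \tfrac{2}{3} \times 30.1\right\} = \Pr\{\chi^2(19) > 20.0\} \simeq 0.39. \tag{2.11}$$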
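The power in this example can be reproduced numerically. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the upper-tail probability of chi-square is evaluated with the standard power series for the incomplete gamma function (not a method given in the text), and the 0.95 point 30.1 is the table value quoted above.

```python
import math


def chi2_upper_tail(x, f):
    """Pr{chi-square(f) > x}, from the series for the incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, y = f / 2.0, x / 2.0
    # Series: gamma(a, y) = y^a e^{-y} * sum_{n>=0} y^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= y / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(y) - y - math.lgamma(a))
    return 1.0 - lower


def power_variance_test(ratio, f, chi2_point):
    """Power of the one-sided test when sigma_1^2 / sigma_0^2 = ratio."""
    return chi2_upper_tail(chi2_point / ratio, f)


# alpha = 0.05, f = 19, ratio 3/2, tabulated 0.95 point of chi-square(19) = 30.1
power = power_variance_test(1.5, 19, 30.1)
print(round(power, 2))
```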
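Equation (2.10) can be used to calculate the number of degrees of freedom necessary to give specified power. When the alternative hypothesis is $E[s^2] = \sigma_1^2 > \sigma_0^2$, we put (2.10) equal to $1 - \beta$:

$$\pi(\sigma_1^2) = \Pr\left\{\chi^2 > \frac{\sigma_0^2}{\sigma_1^2}\,\chi^2_{1-\alpha}(f)\right\} = 1 - \beta. \tag{2.12}$$

But $\Pr\{\chi^2 > \chi^2_\beta\} = 1 - \beta$; so

$$\frac{\sigma_1^2}{\sigma_0^2} = \frac{\chi^2_{1-\alpha}(f)}{\chi^2_\beta(f)}. \tag{2.13}$$

For specified values of the ratio $\sigma_1^2/\sigma_0^2$, $\alpha$, and $\beta$, this can be solved for $f$ with the $\chi^2$ table. For example, for $\sigma_1^2/\sigma_0^2 = 3/2$, $\alpha = 0.05$, $1 - \beta = 0.90$, we find that, for $f = 100$, $\chi^2_{0.95}(100) = 124.3$ and $\chi^2_{0.10}(100) = 82.4$; so $\chi^2_{0.95}(100)/\chi^2_{0.10}(100) = 124.3/82.4 = 1.508$, very close to the required value of 1.500. Thus the solution is $f$ slightly greater than 100. For large values of $f$, we can use the approximation (1.27.13)

$$\chi^2_P(f) \simeq \tfrac{1}{2}\left(\sqrt{2f - 1} + u_P\right)^2, \tag{2.14}$$

substitute this in (2.13):

$$\frac{\sigma_1^2}{\sigma_0^2} \simeq \frac{\left(\sqrt{2f - 1} + u_{1-\alpha}\right)^2}{\left(\sqrt{2f - 1} + u_\beta\right)^2}, \tag{2.15}$$

and, since $u_\beta = -u_{1-\beta}$, solve for $f$:

$$f \simeq \frac{1}{2} + \frac{1}{2}\left\{\frac{u_{1-\alpha} + u_{1-\beta}\sqrt{\sigma_1^2/\sigma_0^2}}{\sqrt{\sigma_1^2/\sigma_0^2} - 1}\right\}^2. \tag{2.16}$$

For $\alpha = 0.05$, $u_{1-\alpha} = 1.645$, $1 - \beta = 0.90$, $u_{1-\beta} = 1.282$, and $\sigma_1^2/\sigma_0^2 = 3/2$, we obtain $f \simeq 102.9$.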
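The large-sample solution for $f$ lends itself to a direct calculation. The following sketch assumes the form of (2.16) in which $f$ equals one half plus one half the square of $(u_{1-\alpha} + u_{1-\beta}\sqrt{r})/(\sqrt{r} - 1)$, with $r = \sigma_1^2/\sigma_0^2$; the function name is hypothetical, and the normal deviates are computed rather than read from a table, so the result may differ from the quoted value in the first decimal place.

```python
from math import sqrt
from statistics import NormalDist


def degrees_of_freedom_for_power(ratio, alpha, power):
    """Approximate f needed to detect sigma_1^2/sigma_0^2 = ratio (one-sided test)."""
    u_alpha = NormalDist().inv_cdf(1 - alpha)   # u_{1 - alpha}
    u_beta = NormalDist().inv_cdf(power)        # u_{1 - beta}
    r = sqrt(ratio)
    return 0.5 + 0.5 * ((u_alpha + u_beta * r) / (r - 1)) ** 2


# Example from the text: ratio 3/2, alpha = 0.05, power 0.90.
f_needed = degrees_of_freedom_for_power(1.5, 0.05, 0.90)
print(round(f_needed, 1))
```

The answer is a little over 100, in line with the table-based solution of (2.13).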
9.3. The Variance Ratio

Let $s_1^2$, $s_2^2$ be independent sample estimates of $\sigma^2$ based on $f_1$, $f_2$ degrees of freedom. We have seen that $s_i^2$ is distributed as $\sigma^2\chi^2(f_i)/f_i$. Denote by $F$ the ratio of two such independent mean squares which are estimates of the same $\sigma^2$:

$$F = \frac{s_1^2}{s_2^2}. \tag{3.1}$$

A variance ratio will have associated with it two numbers of degrees of freedom, one for the numerator sample variance and one for the denominator sample variance. We will, when necessary, attach these in parentheses following the $F$: for the $F$ in (3.1), e.g., we write $F(f_1, f_2)$. If we substitute $\sigma^2\chi^2(f_i)/f_i$ for $s_i^2$ in (3.1), we obtain

$$F = \frac{\sigma^2\chi^2(f_1)/f_1}{\sigma^2\chi^2(f_2)/f_2} = \frac{\chi^2(f_1)/f_1}{\chi^2(f_2)/f_2}. \tag{3.2}$$

The distribution of $\chi^2$ is known, and the distribution of the ratio of two $\chi^2$'s, and hence of $F$, can be found. The cumulative form of $F$ is tabulated in Table IV. Since $\chi^2$ is never less than zero, $F$ is distributed from 0 to $\infty$. It can be shown that $E[F] = f_2/(f_2 - 2)$, which tends to 1 as $f_2$ tends to $\infty$, and the mode is at $f_2(f_1 - 2)/f_1(f_2 + 2)$. Table IV of the Appendix gives only the upper tail of the $F$ distribution. We obtain points in the lower tail as follows. We have

(3.3)
Taking reciprocals of the inequality inside the braces will reverse the inequality sign: {S2 Pr..1 s~
>
I} = 1 
F i  P(fi,J2)
so
pr{~
$s_2^2$ then we need only consider whether (3.12) is satisfied, and if $s_1^2 < s_2^2$ we can take reciprocals of (3.11) to give (3.13) and now consider whether this inequality is satisfied. For one-sided tests, $\alpha/2$ in (3.12) and (3.13) is replaced by $\alpha$. For the alternative $\sigma_1^2 > \sigma_2^2$, the power of the test for a specified value of the ratio $\sigma_1^2/\sigma_2^2 = \phi$ is

$$\pi(\phi) = \Pr\left\{\frac{s_1^2}{s_2^2} > F_{1-\alpha}(f_1, f_2) \,\Big|\, E[s_1^2] = \sigma_1^2 = \phi\sigma_2^2 = \phi E[s_2^2]\right\}. \tag{3.14}$$

Now

$$\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F(f_1, f_2). \tag{3.15}$$

Hence (3.14) is equivalent to

$$\pi(\phi) = \Pr\left\{F(f_1, f_2) > \frac{1}{\phi}\,F_{1-\alpha}(f_1, f_2)\right\}. \tag{3.16}$$
For example, if we are making a test of the null hypothesis $\sigma_1^2 = \sigma_2^2$ at the level of significance $\alpha = 0.05$, with $f_1 = 30$, $f_2 = 10$, then $F_{0.95}(30, 10) = 2.70$, and the power for $\phi = 5$ is

$$\Pr\left\{F(30, 10) > \frac{2.70}{5}\right\} = \Pr\left\{F(10, 30) < \frac{5}{2.70}\right\} \simeq 0.90, \tag{3.17}$$

and the power for $\phi = 1.25$ is

$$\Pr\left\{F(30, 10) > \frac{2.70}{1.25}\right\} = 1 - \Pr\{F(30, 10) < 2.16\} \simeq 1 - 0.90 = 0.10. \tag{3.18}$$
To find the number of degrees of freedom necessary to achieve a specified power, we put (3.16) equal to $1 - \beta$. But also

$$\Pr\{F(f_1, f_2) > F_\beta(f_1, f_2)\} = 1 - \beta; \tag{3.19}$$

so

$$\phi = \frac{F_{1-\alpha}(f_1, f_2)}{F_\beta(f_1, f_2)} = F_{1-\alpha}(f_1, f_2)\,F_{1-\beta}(f_2, f_1), \tag{3.20}$$

which can be solved iteratively with the $F$ tables. For example, with $\alpha = 0.05$, $1 - \beta = 0.90$, $F_{0.95}(80, 80) = 1.45$ and $F_{0.90}(80, 80) = 1.33$; so their product is 1.93. The solution for $\phi = 2$ will therefore require $f_1 = f_2$ slightly smaller than 80, about 75. For a given sum $f_1 + f_2$, it can be shown that an equal allocation of the total number of observations to the two samples, so that $f_1 = f_2$, is not exactly optimum but so close that further consideration is unnecessary.

9.4. The Interrelations of Various Distributions
In (1.27.1), $\chi^2(n)$ was defined as $\sum_i^n u_i^2$, where the $u_i$ were independent unit normal deviates. When $n = 1$, $\chi^2(1) = u^2$. To get corresponding probability levels for the two distributions $u$ and $\chi^2$, we proceed as follows. Consider small areas $\alpha$ in the lower and upper tails of the standardized normal distribution: these will be defined as $u < u_\alpha$ and $u > u_{1-\alpha}$. The sum of the areas in the two tails is $2\alpha$, and the area between $u_\alpha$ and $u_{1-\alpha}$ is $1 - 2\alpha$. Then

$$\Pr\{u_\alpha < u < u_{1-\alpha}\} = 1 - 2\alpha. \tag{4.1}$$

From the symmetry of the normal distribution, $u_\alpha = -u_{1-\alpha}$, and so (4.1) becomes

$$\Pr\{-u_{1-\alpha} < u < u_{1-\alpha}\} = 1 - 2\alpha. \tag{4.2}$$

This implies

$$\Pr\{u^2 < u_{1-\alpha}^2\} = 1 - 2\alpha. \tag{4.3}$$

Now put $1 - 2\alpha = P$, so that $1 - \alpha = (1 + P)/2$. Then (4.3) becomes

$$\Pr\{u^2 < u_{(1+P)/2}^2\} = P. \tag{4.4}$$

Also,

$$\Pr\{\chi^2(1) < \chi^2_P(1)\} = P. \tag{4.5}$$

Comparing (4.4) and (4.5), and, since $u^2 = \chi^2(1)$,

$$\chi^2_P(1) = u_{(1+P)/2}^2. \tag{4.6}$$

The exponents 2 in (4.6) need careful interpretation. The left-hand side, $\chi^2_P(1)$, is the $P$ point of the distribution of $\chi^2$ with 1 degree of freedom; the right-hand side, $u_{(1+P)/2}^2$, is the square of the $(1 + P)/2$ point of the distribution of $u$. For example, if $P = 0.95$, $(1 + P)/2 = 0.975$, and

$$\chi^2_{0.95}(1) = 3.84 = (1.96)^2 = u_{0.975}^2. \tag{4.7}$$

Or, if $P = 0.10$, $(1 + P)/2 = 0.55$, and

$$\chi^2_{0.10}(1) = 0.0158 = (0.126)^2 = u_{0.55}^2. \tag{4.8}$$
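The correspondence between the $P$ point of $\chi^2$ with 1 degree of freedom and the square of the $(1 + P)/2$ point of the unit normal deviate can be verified numerically. This is a minimal sketch; the function name is an assumption made for illustration, and the normal percentage point comes from Python's standard library.

```python
from statistics import NormalDist


def chi2_point_one_df(p):
    """P point of chi-square(1), obtained as the square of u_{(1+P)/2}."""
    u = NormalDist().inv_cdf((1 + p) / 2)
    return u * u


# The two examples in the text: P = 0.95 and P = 0.10.
print(round(chi2_point_one_df(0.95), 2))
print(round(chi2_point_one_df(0.10), 4))
```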
In (1.27.3) we noted that $E[\chi^2(f)] = f$, and in (1.27.9) that $V[\chi^2(f)] = 2f$. It follows from the former that $E[\chi^2(f)/f] = f/f = 1$, and from the latter that

$$V\left[\frac{\chi^2(f)}{f}\right] = \frac{2f}{f^2} = \frac{2}{f}. \tag{4.9}$$

Thus, as $f$ tends to infinity, $\chi^2(f)/f$ becomes closer and closer to 1 with high probability. In (3.2) put $f_2 = \infty$; the denominator $\chi^2(f_2)/f_2$ becomes 1 and we get

$$F(f_1, \infty) = \frac{\chi^2(f_1)}{f_1}. \tag{4.10}$$

For example, the 0.99 point of $\chi^2$ with 10 degrees of freedom is 23.2; the 0.99 point of $F$ with degrees of freedom 10 and $\infty$ is 2.32, which equals 23.2/10.

The distribution of the ratio of a unit normal deviate to the square root of an independent $\chi^2$ with $f$ degrees of freedom divided by $f$ is known as the $t$ distribution with $f$ degrees of freedom:

$$t(f) = \frac{u}{\sqrt{\chi^2(f)/f}}. \tag{4.11}$$

As stated earlier, $\chi^2(f)/f$ tends to 1 as $f$ tends to infinity. Thus the $t$ distribution with infinite degrees of freedom is identical with the standardized normal distribution. The $t$ distribution is related to the $F$ distribution, for, if we put $f_1 = 1$ in (3.2), the numerator becomes $\chi^2(1)/1$, which is just $u^2$. Hence

$$F(1, f_2) = \frac{u^2}{\chi^2(f_2)/f_2}. \tag{4.12}$$

Comparing (4.11) with (4.12), we see that $t^2 = F$. Corresponding probability levels of $t(f)$ and $F(1, f)$ are obtained in the same way that we found the relationship for $u$ and $\chi^2(1)$. The $t$ distribution is symmetric about zero; so, with $t(f)$ replacing $u$ and $F(1, f)$ replacing $\chi^2(1)$, the argument proceeds exactly analogously to (4.1) through (4.6) and gives

$$F_P(1, f) = t_{(1+P)/2}^2(f). \tag{4.13}$$

For example, for $f = 12$ and $P = 0.95$,

$$F_{0.95}(1, 12) = 4.75 = (2.179)^2 = t_{0.975}^2(12). \tag{4.14}$$
9.5. A Test for the Equality of Several Variances

We saw in Section 9.3 that the null hypothesis that two sample variances are estimates of a common variance can be tested with the $F$ test. Neyman constructed a test for the $k$-sample case by the likelihood ratio procedure. Suppose that we have observations $x_{ij}$, $i = 1, \ldots, k$, $j = 1, \ldots, n_i$, which are distributed normally with means $\xi_i$ and variances $\sigma_i^2$. We wish to test the null hypothesis

$$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 \tag{5.1}$$

against the alternative that the $\sigma_i^2$ are in general different. The density function of $x_{ij}$ is

$$p\{x_{ij}\} = \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\left[-\frac{1}{2}\left(\frac{x_{ij} - \xi_i}{\sigma_i}\right)^2\right]. \tag{5.2}$$

The likelihood function, defined as in (2.3.1), is

$$L = \frac{1}{(2\pi)^{n/2}\,\sigma_1^{n_1}\cdots\sigma_k^{n_k}}\exp\left[-\frac{1}{2}\sum_i^k\sum_j^{n_i}\left(\frac{x_{ij} - \xi_i}{\sigma_i}\right)^2\right].$$
1= (la/2(1I1 + 112 I .J1/111 + 1/112 2
(7.10)
2).
S
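Confidence limits can be obtained analogously to Section 2.16 as

$$\Pr\left\{(\bar{x}_1 - \bar{x}_2) - t_{P_2}\,s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \xi_1 - \xi_2 < (\bar{x}_1 - \bar{x}_2) - t_{P_1}\,s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right\} = P_2 - P_1. \tag{7.11}$$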
As an illustration of the two-sample $t$ test, we will use two series of determinations of the atomic weight of scandium by Honigschmid [7] (Table 9.3). The figures are given in units in the third decimal place; so 79 corresponds to 45.079. Certain calculations are given in Table 9.4.

Table 9.3
Series A: 79, 84, 108, 114, 120, 103, 122, 120
Series B: 91, 103, 90, 113, 108, 87, 100, 80, 99, 54

Table 9.4
Here $\nu$ runs over the $n_i$ observations in series $i$.

Quantity                                   Series A     Series B
$n_i$                                      8            10
$\sum_\nu x_{i\nu}$                        850          925
$\bar{x}_i$                                106.25       92.50
$\sum_\nu x_{i\nu}^2$                      92,250       88,109
$(\sum_\nu x_{i\nu})^2/n_i$                90,312.50    85,562.50
$\sum_\nu (x_{i\nu} - \bar{x}_i)^2$        1,937.50     2,546.50
From (7.4),

$$s^2 = \frac{1937.50 + 2546.50}{(8 - 1) + (10 - 1)} = 280.25, \tag{7.12}$$

and for the test statistic (7.10) we have

$$t = \frac{106.25 - 92.50}{\sqrt{280.25}\sqrt{1/8 + 1/10}} = \frac{13.75}{7.9408} = 1.732, \tag{7.13}$$

which is referred to the $t$ table with 16 degrees of freedom. The 0.90 and 0.95 points of $t(16)$ are 1.337 and 1.746, and so the two-sided $P$ value is just greater than 0.10. For 95 per cent confidence limits we need $t_{0.975}(16) = 2.120$, $t_{0.025}(16) = -2.120$, $\bar{x}_1 - \bar{x}_2 = 13.75$, and $s\sqrt{1/n_1 + 1/n_2} = 7.9408$, to get $13.75 \pm 2.120 \times 7.9408 = (-3.08, 30.58)$.
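The scandium calculation can be followed step by step in a short sketch. The function name `pooled_t` and its return layout are assumptions made for illustration; the data are those of Table 9.3, and the 0.975 point of $t(16)$ is the value 2.120 quoted in the text rather than something computed here.

```python
from math import sqrt
from statistics import mean

series_a = [79, 84, 108, 114, 120, 103, 122, 120]
series_b = [91, 103, 90, 113, 108, 87, 100, 80, 99, 54]


def pooled_t(x1, x2):
    """Two-sample t statistic assuming a common variance.

    Returns (t, pooled variance, standard error of the difference, degrees of freedom).
    """
    n1, n2 = len(x1), len(x2)
    m1, m2 = mean(x1), mean(x2)
    ss1 = sum((x - m1) ** 2 for x in x1)   # sum of squares about the mean, series 1
    ss2 = sum((x - m2) ** 2 for x in x2)   # sum of squares about the mean, series 2
    df = n1 + n2 - 2
    s2 = (ss1 + ss2) / df
    se = sqrt(s2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, s2, se, df


t_stat, s2, se, df = pooled_t(series_a, series_b)
diff = mean(series_a) - mean(series_b)
half_width = 2.120 * se                    # 0.975 point of t(16), from the text
limits = (diff - half_width, diff + half_width)
print(round(t_stat, 3), df, [round(v, 2) for v in limits])
```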
9.8. The Two-Sample Test with Unequal Variances

Suppose that we have two independent samples $x_{11}, \ldots, x_{1n_1}$ and $x_{21}, \ldots, x_{2n_2}$ from normal distributions with means $\xi_1$ and $\xi_2$ and variances $\sigma_1^2$ and $\sigma_2^2$. We wish to test the null hypothesis that $\xi_1 = \xi_2$. We do not assume knowledge of $\sigma_1^2$ or $\sigma_2^2$, and in distinction to the previous section we do not assume that $\sigma_1^2 = \sigma_2^2$. For example, we may measure the same quantity with two different instruments and wish to test the null hypothesis that the difference between the instruments is zero: there is, in general, no reason to suppose that the measurements made by the two instruments have the same variance.

In this problem both the null and the alternative hypotheses are composite. The likelihood ratio procedure for constructing a test fails in this instance. The maximum likelihood estimator for $\xi$ in $\omega$ turns out to be a solution of a cubic equation, and the distribution of $\lambda$ involves the ratio $\sigma_1^2/\sigma_2^2$, which is unknown (see Mood and Graybill [8]). No generally accepted solution to this problem, often called the Behrens-Fisher problem on account of a solution proposed by Behrens, exists. We will discuss a treatment by Welch [9, 10].

We saw in Section 2.12 that the means $\bar{x}_1$, $\bar{x}_2$ are normally distributed with variances $\sigma_1^2/n_1$, $\sigma_2^2/n_2$, and

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\xi_1 - \xi_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \tag{8.1}$$

is a unit normal deviate. However, we do not know the $\sigma_i^2$, and the problem is to determine the distribution of

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\xi_1 - \xi_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \tag{8.2}$$

formed by substituting sample estimates $s_i^2$ with degrees of freedom $n_i - 1$ for the $\sigma_i^2$ in (8.1).

We will first discuss a more general question. Suppose that $s_i^2$, $i = 1, \ldots, k$, are independent mean squares with degrees of freedom $f_i$ and expected values $\sigma_i^2$. Suppose that we are concerned with a linear combination of these mean squares, $S^2$, defined as

$$S^2 = a_1 s_1^2 + \cdots + a_k s_k^2 = \sum_i^k a_i s_i^2, \tag{8.3}$$
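where the $a_i$ are known constants. We have

$$E[S^2] = E\left[\sum_i^k a_i s_i^2\right] = \sum_i^k a_i E[s_i^2] = \sum_i^k a_i\sigma_i^2. \tag{8.4}$$

From (8.1.26), $s_i^2 \sim \sigma_i^2\chi^2(f_i)/f_i$, and from (4.9), $V[\chi^2(f_i)/f_i] = 2/f_i$, so

$$V[s_i^2] = V\left[\sigma_i^2\,\frac{\chi^2(f_i)}{f_i}\right] = (\sigma_i^2)^2\,V\left[\frac{\chi^2(f_i)}{f_i}\right] = \frac{2(\sigma_i^2)^2}{f_i}, \tag{8.5}$$

and

$$V[S^2] = \sum_i^k a_i^2 V[s_i^2] = 2\sum_i^k \frac{a_i^2(\sigma_i^2)^2}{f_i}. \tag{8.6}$$

We propose to approximate the distribution of $S^2$ by the distribution of a mean square, say $S'^2$, where $S'^2$ has some number of degrees of freedom, say $f'$, by choosing $S'^2$ and $f'$ so that $S^2$ and $S'^2$ have the same expectation and variance. Thus we put

$$E[S'^2] = E[S^2] = \sum_i^k a_i\sigma_i^2. \tag{8.7}$$

From (8.5), which gives the variance of a variance estimate,

$$V[S'^2] = (E[S'^2])^2\,\frac{2}{f'} = \left(\sum_i^k a_i\sigma_i^2\right)^2\frac{2}{f'}. \tag{8.8}$$

Equating $V[S^2]$, (8.6), and $V[S'^2]$, (8.8), and solving for $f'$ gives

$$f' = \frac{\left(\sum_i^k a_i\sigma_i^2\right)^2}{\sum_i^k a_i^2(\sigma_i^2)^2/f_i}. \tag{8.9}$$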
In practice, we do not know the values of the $\sigma_i^2$, and so we have to substitute the sample estimates $s_i^2$; so $f'$ is estimated as

$$f' \simeq \frac{\left(\sum_i^k a_i s_i^2\right)^2}{\sum_i^k a_i^2(s_i^2)^2/f_i}. \tag{8.10}$$

To revert to our particular problem, we are concerned with the linear combination of two variances $s_1^2$ and $s_2^2$, namely, $s_1^2/n_1 + s_2^2/n_2$, and so our coefficients are $a_1 = 1/n_1$, $a_2 = 1/n_2$. Substituting these values of the coefficients in (8.10), we get

$$f' \simeq \frac{\left[(s_1^2/n_1) + (s_2^2/n_2)\right]^2}{\dfrac{(1/n_1)^2(s_1^2)^2}{n_1 - 1} + \dfrac{(1/n_2)^2(s_2^2)^2}{n_2 - 1}}. \tag{8.11}$$

If we define

$$c = \frac{s_1^2/n_1}{s_1^2/n_1 + s_2^2/n_2}, \tag{8.12}$$

we can derive, with $f_i = n_i - 1$,

$$\frac{1}{f'} = \frac{c^2}{f_1} + \frac{(1 - c)^2}{f_2}. \tag{8.13}$$

It is then easy to show, by differentiating $1/f'$ with respect to $c$ and then equating to zero, that the maximum value that $f'$ can take is $f_1 + f_2$. This occurs when

$$\frac{s_1^2}{n_1 f_1} = \frac{s_2^2}{n_2 f_2}. \tag{8.14}$$
The smallest value that $f'$ can approach is the minimum of $f_1$ and $f_2$. This occurs when either $1 - c$ or $c$ approaches zero, i.e., when $(s_2^2/n_2)/(s_1^2/n_1)$ or $(s_1^2/n_1)/(s_2^2/n_2)$ approaches zero.

As an example of this procedure we will use the data of Dean on the atomic weight of carbon, quoted in Table 7.6, for an illustration of the Wilcoxon two-sample test which gave an exact one-sided $P$ value of 0.0317 and a normal approximation of 0.033. Here we need to compute the separate $s_i^2$. Using (6.8), for the first sample A, we can subtract a constant, 12.0000, from all observations:

$$s_1^2 = \frac{(0.0072)^2 + \cdots + (0.0077)^2 - (0.0072 + \cdots + 0.0077)^2/5}{5 - 1} = \frac{(18{,}381 - 16{,}017.80) \times 10^{-8}}{4} = 590.80 \times 10^{-8}. \tag{8.15}$$

Similarly $s_2^2 = 7460.00 \times 10^{-8}$. Also $\bar{x}_1 = 12.00566$ and $\bar{x}_2 = 11.99620$. For the null hypothesis $\xi_1 = \xi_2$, the test statistic (8.2) is

$$\frac{(12.00566 - 11.99620) - 0}{\sqrt{590.80 \times 10^{-8}/5 + 7460.00 \times 10^{-8}/4}} = 2.124. \tag{8.16}$$

The approximate number of degrees of freedom of this $t$-like statistic is given by (8.11). First calculating

$$\frac{s_1^2}{n_1} = \frac{590.80 \times 10^{-8}}{5} = 118.16 \times 10^{-8}, \tag{8.17}$$

$$\frac{s_2^2}{n_2} = \frac{7460.00 \times 10^{-8}}{4} = 1865.00 \times 10^{-8}, \tag{8.18}$$

we get

$$f' \simeq \frac{(118.16 + 1865.00)^2}{(118.16)^2/(5 - 1) + (1865.00)^2/(4 - 1)} = 3.38. \tag{8.19}$$
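The Welch statistic and its approximate degrees of freedom are easy to compute from the summary figures alone. The sketch below is illustrative: the function name and argument order are assumptions, and the inputs are the means, variances, and sample sizes quoted in the example above.

```python
from math import sqrt


def welch_statistic(mean1, var1, n1, mean2, var2, n2):
    """Return the t-like statistic and its approximate degrees of freedom.

    var1 and var2 are the separate sample variances s_1^2 and s_2^2.
    """
    v1, v2 = var1 / n1, var2 / n2          # s_i^2 / n_i
    t_like = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_like, df


# Carbon atomic weight example: sample A (n = 5) and sample B (n = 4).
t_like, approx_df = welch_statistic(12.00566, 590.80e-8, 5,
                                    11.99620, 7460.00e-8, 4)
print(round(t_like, 3), round(approx_df, 2))
```

Note that the degrees of freedom are generally fractional, so the percentage point has to be interpolated between adjacent rows of the $t$ table.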
The $t$ table (Table II of the Appendix) shows that for degrees of freedom 3 and 4 the 0.95 points are 2.353 and 2.132. For 3.38 degrees of freedom the 0.95 point will be approximately 2.27. The one-sided $P$ value for the observed value of the statistic, 2.124, is thus greater than 0.05. If we had treated this problem as an ordinary two-sample $t$ test we would have obtained $t = 2.372$ with 7 degrees of freedom, corresponding to a one-sided $P$ value of just less than 0.025, so making the assumption that the variances are from the same population makes a substantial difference in the conclusions reached. The two-sample Wilcoxon test agrees closely with the two-sample $t$ test, but it too makes the assumption that the variances of the two populations are the same.

It is interesting to note the results of falsely assuming that the two sample variances are samples of a common variance and using (7.8) instead of (8.2). The numerators of these two statistics are identical and they differ only in their denominators. If $f_i = n_i - 1$, and if $s_1^2/s_2^2 = F$, so that $s_1^2 = Fs_2^2$, the square of the denominator of (7.8) is

$$\frac{f_1 s_1^2 + f_2 s_2^2}{f_1 + f_2}\left(\frac{1}{f_1 + 1} + \frac{1}{f_2 + 1}\right), \tag{8.20}$$

and the square of the denominator of (8.2) is

$$\frac{s_1^2}{f_1 + 1} + \frac{s_2^2}{f_2 + 1}. \tag{8.21}$$

The ratio of (8.20) to (8.21) is reducible to

$$\frac{(Ff_1 + f_2)(f_1 + f_2 + 2)}{(f_1 + f_2)(Ff_2 + F + f_1 + 1)}. \tag{8.22}$$
If we are using equal sample sizes, so that $f_1 = f_2 = f$, say, then the ratio (8.22) equals

$$\frac{(Ff + f)(2f + 2)}{2f(Ff + F + f + 1)} = 1 \tag{8.23}$$

no matter what the value of $F$. Thus, for equal sample sizes, the test statistic has the same numerical value, and the only difference would be that we would refer (7.8) to the $t$ table with $2f$ degrees of freedom whereas we would refer (8.2) to the $t$ table with degrees of freedom given by (8.11), which in the case of $f_1 = f_2 = f$ becomes

$$f' = \frac{f(F + 1)^2}{F^2 + 1}, \tag{8.24}$$

which will always be less than $2f$ except when $F = 1$. But, for $0.5 < F < 2$, the reduction in the number of degrees of freedom does not exceed 10 per cent; for $0.333 < F < 3$, not 20 per cent; so that, when the sample sizes are equal and sample variances nearly equal, there is little difference between the procedures. When the sample sizes are unequal, however, e.g., when $f_1 \gg f_2$, (8.22) tends to $F$; i.e., the test statistic will be in error by a ratio approaching $\sqrt{F}$. Since there are no theoretical bounds on $F$, the test statistic could be in error by any amount.

We conclude, therefore, that the two-sample $t$ test is rather sensitive to the assumption that the sample variances come from a common population, unless the sample sizes are close to equality. For markedly unequal sample sizes, and when the assumption that the sample variances are from a common population cannot be justified, a preferable procedure would be to use the methods of this section. A further conclusion is that the sample sizes should be made equal if possible.

The treatment of the two-sample test with unequal variances is subject to disagreements which are still unresolved; see Fisher [11]. However important the controversy from the point of view of statistical theory, it seems that the Welch procedure described here is close enough to the truth to be used in practice without qualms. The $k$-sample form appears to have been first published by Satterthwaite [12].
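The behaviour of the two procedures for equal and unequal sample sizes can be explored numerically. The sketch below assumes that the ratio of squared denominators is $(Ff_1 + f_2)(f_1 + f_2 + 2)/[(f_1 + f_2)(Ff_2 + F + f_1 + 1)]$ and that, for $f_1 = f_2 = f$, the Welch degrees of freedom reduce to $f(F + 1)^2/(F^2 + 1)$; the function names are hypothetical.

```python
def denominator_ratio(F, f1, f2):
    """Ratio of the squared denominators of the pooled and Welch statistics."""
    return (F * f1 + f2) * (f1 + f2 + 2) / ((f1 + f2) * (F * f2 + F + f1 + 1))


def equal_size_df(F, f):
    """Approximate Welch degrees of freedom when f1 = f2 = f."""
    return f * (F + 1) ** 2 / (F ** 2 + 1)


# Equal sample sizes: the ratio is 1 whatever the variance ratio F.
print(denominator_ratio(3.0, 9, 9))

# Loss of degrees of freedom relative to 2f = 20 for F = 2 and F = 3.
print(equal_size_df(2.0, 10), equal_size_df(3.0, 10))

# Very unequal sample sizes: the ratio approaches F itself.
print(round(denominator_ratio(4.0, 1000, 4), 2))
```

With $f = 10$ the two equal-size cases give 18 and 16 degrees of freedom, the 10 and 20 per cent reductions mentioned in the text.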
9.9. A Comparison of Simple Tests for Means and Medians

We have discussed seven tests for means and medians, and Table 9.5 is intended to assist in choosing the most appropriate for a particular situation. In general, we should use as powerful a test as we feel can be justified. Definitions and discussion of the power of these tests are beyond the scope of this book, but it is not misleading to state that when the observations are normally distributed the powers of the Wilcoxon tests are of the order of $3/\pi$, and the powers of the median and sign tests are of the order of $2/\pi$, of the corresponding $t$ tests.

Table 9.5. Assumptions Involved in Various Tests
All tests assume that the observations are independent and have continuous distributions.

t tests
  Two samples ($x_{11}, \ldots, x_{1n_1}$; $x_{21}, \ldots, x_{2n_2}$): Normality and equality of variances.
  One sample ($y_1, \ldots, y_n$): Normality.
  One sample formed as differences between paired observations ($d_i = x_{1i} - x_{2i}$, $i = 1, \ldots, n$): The parent populations must be normally distributed but may have different variances.

Wilcoxon tests
  Two samples: Identity of distributions (which implies equality of variances).
  One sample: Symmetry about zero of the distribution.
  Paired differences: Necessary and sufficient condition is that $p\{x_1, x_2\}$ is symmetric, so that $p\{a, b\} = p\{b, a\}$.

Median test (two samples) and sign test (one sample)
  Two samples: Identity of distributions.
  One sample: Observations have median zero.
  Paired differences: A sufficient condition is that $p_i\{x_{1i}, x_{2i}\}$ is symmetric. The $p_i\{\ ,\ \}$ can be different for different $i$. Not necessary for the $d_i$ to have identical distributions, nor that the distribution of any $d_i$ be symmetrical.

Welch test
  Two samples: Normality. Inequality of variances permitted.

An important topic in applied statistics is the question of the robustness of statistical procedures. A robust procedure is one which is affected only slightly by appreciable departures from the assumptions involved. Implicit in the concept is the feeling that conclusions in error by so much are not going to make us look foolish. For example, with a certain amount of a certain type of nonnormality the 0.05 point of the $t$ statistic may actually be the 0.055 point, but that would not harm us appreciably. Also implicit is the feeling that in many areas of application the departures from the assumption will rarely exceed a certain degree. To discuss robustness usefully we would need to quantify these various feelings. We will not attempt this. We merely remark that, e.g., at the end of Section 9.8 we saw that in the two-sample $t$ test when the two sample sizes are equal the test is very little affected by large departures from the assumption of equality of variances in the two populations. On the other hand, at the end of Section 9.5 we referred to Box and Andersen's conclusion [5] that Bartlett's test for the equality of $k$ variances was very sensitive to nonnormality. These authors found, on the other hand, that the test for the comparison of $k$ means (the analysis of variance, to be discussed in Chapter 10) was quite robust against nonnormality.
EXERCISES

9.1. Construct 95 per cent confidence limits (a) for the variance, and (b) for the mean, of Cavendish's measurements of the density of the earth given in Table 6.1.

9.2. Construct 95 per cent confidence limits for the ratio of the variances of the series A and series B measurements in Table 9.3.

9.3. Test the null hypotheses that (a) the variances are equal and (b) that the means are equal, and (c) construct 95 per cent confidence limits for the difference between the means, for the data of Table 7.4. Do this (i) on the data as it stands, (ii) on the logarithms of the observations, and (iii) on the reciprocals of the observations. In the case of (ii) and (iii), transform the confidence limits for the transformed variable back into the original scale.

9.4. Samples of very pure iron prepared by two different methods had the following melting points:
A: 1493, 1519, 1518, 1512, 1512, 1514, 1489, 1508, 1508, 1494
B: 1509, 1494, 1512, 1483, 1507, 1491
(a) Test the null hypothesis that the two methods give iron with the same melting point. (b) Construct 95 per cent confidence limits for the difference A - B. Assume normality, and (i) assume the variances are the same, and (ii) do not assume that the variances are the same.
9.5. For the data of exercise (7.5), obtain the one-sided $P$ value for the null hypothesis that the difference between the two types of gauges is zero, using (a) the data as it stands, (b) the logarithms of the data. Give an explanation for the difference between the results, and indicate which you would prefer.

9.6. I. For the data of exercise (7.2), (a) test the null hypothesis that the two samples could have come from populations with the same variance. (b) Test the null hypothesis that the means are the same, assuming normality, (i) assuming that the variances are the same and (ii) not assuming that the two variances are the same.

Data for Exercise 9.7
Region A
Region B
Region C
Region D
84.0 83.5 84.0 85.0 83.1
82.4 82.4 83.4 83.3 83.1
83.2 82.8 83.4 80.2 82.7
80.2 82.9 84.6 84.2 82.8
83.5 81.7 85.4
83.3 82.4 83.3 82.6 82.0 83.2
83.0 85.0 83.0
83.0 82.9 83.4
85.0 83.7 83.6
83.1 83.5 83.6
83.1 82.5
83.3 83.8 85.1 83.1 84.2
86.7 82.6 82.4 83.4 82.7
80.6 82.3
82.9 83.7 81.5 81.9 81.7 82.5
84.1 83.0 85.8 84.0 84.2 82.2 83.6 84.9
Data from O. C. Blade, "National MotorGasoline Survey," Bureau of Mines Report of Investigatioll 5041.
II. The table below gives the reciprocals of the observations of exercise (7.2). (a), (b) Perform the same tests on these data as in (a) and (b) above. (c) In exercise (7.2), the original data were tested with the appropriate Wilcoxon test: do the same on the reciprocals.

III. Compare Ia with IIa, Ib(i) with Ib(ii), Ib(i) with IIb(i), IIc with exercise (7.2). Comment on similarities and differences in the results.

Control A: 0.200, 0.167, 0.143, 0.143, 0.125, 0.125, 0.125, 0.111, 0.083
Drug B: 0.143, 0.125, 0.125, 0.125, 0.111, 0.111, 0.083, 0.077, 0.071, 0.059

9.7. The data on p. 306 are the results of octane determinations on samples of gasoline obtained in four regions of the northeastern United States in the summer of 1953. Test the null hypothesis that the variability in octane number is the same for all four regions. Some calculations which may be useful are summarized below.

Region                     A         B         C         D
$n_i$                      16        13        18        22
$\sum_\nu x_{i\nu}$        62.0      37.0      58.0      66.2
$\sum_\nu x_{i\nu}^2$      258.06    107.98    215.86    232.24
A constant 80 has been subtracted from all observations before making these calculations.

9.8. Rosa and Dorsey [see exercise (10.3)] measured the ratio of the electromagnetic to the electrostatic unit of electricity with great precision. In a long series of observations they on occasion disassembled their apparatus, cleaned it, and reassembled it. The variances of the groups of observations, multiplied by $10^8$, and numbers of observations in each group, were as follows.

Group    Number of observations    Variance
1        11                        1.5636
2        8                         1.1250
3        6                         3.7666
4        24                        4.1721
5        15                        4.2666

Test the null hypothesis that these sample variances could come from a common population.
9.9. Let $s_1^2$, $s_2^2$ be independent unbiased estimators of $\sigma_1^2$, $\sigma_2^2$ with $n_1 - 1$, $n_2 - 1$ degrees of freedom respectively. Show that the likelihood ratio statistic for a test of the null hypothesis $\sigma_1^2 = \sigma_2^2$ can be written as
A=
(nl
+ n2) ~ X 1.72} = Pr{F{21, 66) > 0.43} = Pr{F{66, 21) < 2.326},
(9.3)
taking reciprocals and reversing the sign of the inequality. Since $\Pr\{F(66, 21) < 2.173\} = 0.975$, the power is somewhat greater than 0.975.

EXERCISES

10.1. For the data of exercise (9.7), irrespective of the outcome of that exercise:
(a) Test the null hypothesis that the mean octane number is the same in the four regions.
(b) Construct 95 per cent confidence limits for the difference in means of regions A and B (i) assuming that it had been your original intention so to do, (ii) assuming that the idea occurred to you after looking at the data.
(c) Construct 95 per cent confidence limits for the contrast defined by the difference between region A and the mean of the other three regions. Assume that this contrast was suggested by the data.

10.2. The table below gives 100 times the logarithm to the base 10 of one-tenth the throughputs listed in exercise (7.3). The logarithmic transformation is selected as it seems likely to stabilize the variance as a function of the mean: the specific form mentioned gives numbers easy to work with.

Foundry    Transformed throughputs
A          92, 78, 60, 67, 53, 66
B          83, 96, 98, 60, 99, 78, 77, 103, 93, 107
C          66, 97, 100, 96, 96
(a) Test the null hypothesis that the means of the three foundries are the same. (b) Construct 95 per cent confidence limits for the difference between foundry A and the mean of foundries B and C, assuming (i) that this had been your original intent, and (ii) that this contrast was suggested to you by the data. Quote these limits in terms of the transformed numbers and also in terms of the original scale.

10.3. The data below give some of the results of Rosa and Dorsey ["A New Determination of the Ratio of the Electromagnetic to the Electrostatic Unit of Electricity," Bulletin of the National Bureau of Standards, 3 (1907), 433-604] on the ratio of electromagnetic to electrostatic units of electricity, a constant which is equal to the velocity of light. The figures below have had 2.99 subtracted from them and have been multiplied by 10,000 to give numbers simple to work with. The groups correspond to successive dismantling and reassembly of the apparatus. Certain sums which may be useful are given in the lower part of the table.

Group 1: 62, 64, 62, 62, 65, 64, 65, 62, 62, 63, 64
Group 2: 65, 64, 63, 62, 65, 63, 64, 63
Group 3: 65, 64, 67, 62, 65, 62
Group 4: 62, 66, 64, 64, 63, 62, 64, 64, 66, 64, 66, 63, 65, 63, 63, 63, 61, 56, 64, 64, 65, 64, 64, 65
Group 5: 66, 64, 65, 65, 65, 64, 66, 67, 66, 69, 70, 68, 69, 63, 65

Group $i$                  1         2         3         4         5         Total
$n_i$                      11        8         6         24        15        64
$\sum_\nu x_{i\nu}$        695       509       385       1,525     992       4,106
$\sum_\nu x_{i\nu}^2$      43,927    32,393    24,723    96,997    65,664    263,704

(a) Make a conventional analysis of variance of these data, giving the expectations of the mean squares. (b) Obtain estimates of the components of variance for within groups and between groups. Test the null hypothesis that the latter is zero. (c) Form the weighted mean which has minimum variance. (d) What is the estimated variance of this mean? (e) Supposing that all 64 observations were regarded as 64 independent observations, what would the variance of the simple mean be?

10.4. A manufacturer has been making all his product from a single large uniform batch of raw material and has achieved a reputation for uniformity of product. This reputation has brought him more business, and he now needs to consider a much larger output. The raw material is no longer obtainable in very large batches, and he must now use relatively large numbers of small batches. He considers that he will lose his reputation for uniformity if the standard deviation of his new output exceeds by 20 per cent the standard deviation of the old. In other words, if the new standard deviation is $\sigma_{T'}$, and the old $\sigma_T$, then $\sigma_{T'}$ must not exceed $1.2\sigma_T$. He decides to run a trial on a random sample of five batches of raw material. He is going to make a number, say $n$, of parts from each batch and test for batch differences with a one-way analysis of variance using $\alpha$ as his level of significance.
(a) Suppose he chooses $n = 9$, $\alpha = 0.01$; what is the probability of his detecting a deterioration, i.e., an increase in total standard deviation of the magnitude specified above? (b) Suppose he uses $\alpha = 0.1$, what then? (c) Suppose he requires $\alpha = 0.01$ and a probability of detecting the specified deterioration in uniformity of 0.99, what should $n$ be?
REFERENCES
1. Heyl, Paul R., "A Redetermination of the Constant of Gravitation," Journal of Research of the Bureau of Standards, 5 (1930), 1243-50.
2. Scheffé, H., "A Method for Judging All Contrasts in the Analysis of Variance," Biometrika, 40 (1953), 87-104.
3. Tukey, John W., "The Problem of Multiple Comparisons." Unpublished dittoed manuscript.
4. Pearson, E. S., and H. O. Hartley (eds.), Biometrika Tables for Statisticians, Vol. 1. London: Cambridge University Press, 1954.
5. Dunnett, Charles W., "A Multiple Comparison Procedure for Comparing Several Treatments with a Control," Journal of the American Statistical Association, 50 (1955), 1096-1121.
6. Bross, Irwin, "Fiducial Intervals for Variance Components," Biometrics, 6 (1950), 136-44.
7. Scheffé, Henry, Analysis of Variance. New York: John Wiley and Sons, 1959.
8. Williams, J. S., "A Confidence Interval for Variance Components," Biometrika, 49 (1962), 278-81.
9. Cochran, W. G., "The Combination of Estimates from Different Experiments," Biometrics, 10 (1954), 101-29.
10. Cochran, William G., Sampling Techniques, 2nd ed. New York: John Wiley and Sons, 1963.
CHAPTER 11
Simple Linear Regression
11.1. Introduction
When an investigator observes simultaneously two variables $x$ and $y$, usually with good reason he plots the observations on a graph. If there is any sign of an association, he is usually seized with the impulse to fit a line, usually a straight line or, rather infrequently, a parabola or cubic. The purpose of this arithmetic penance of curve fitting is often not very clearly defined; one purpose, however, might be to predict $x$ from a new observed $y$ or vice versa. Another purpose is to use the route of testing the significance of the parameters of the line as a means of testing for association between $x$ and $y$. The standard technique for fitting a line is known as "least squares," and the line so fitted is known as a regression line for curious historical reasons (see Section 12.5).

In this chapter we shall consider the case in which a series of values of $x$ have been selected by the experimenter and he observes $y$ at those values of $x$. The so-called independent variable $x$ is assumed to be measured without appreciable error. The situation in which the variables $x$ and $y$ vary at random outside the control of the experimenter and are only observed will be discussed in the next chapter.

The present chapter is of unusual length, and some guidance to the reader is called for. Sections 11.2 and 11.3 give the usual fitting of a straight line to data by the method of least squares, and Section 11.4 is a numerical example. This is as much as is found in some elementary textbooks. Section 11.5 deals with finding confidence limits for $x$ from an observed $y$. In Section 11.6 the problem of comparing two regression lines is covered. An important application of this statistical technique is to parallel-line biological assay (Section 11.7); a numerical example follows in
Section 11.8. The line discussed thus far is the two-parameter line $Y = a + b(x - \bar{x})$. In Section 11.9 the one-parameter line through the origin, $Y = bx$, is described, with its use in reverse in Section 11.10. The construction of joint confidence regions for $(\alpha, \beta)$ is considered in Section 11.11. Sections 11.12 and 11.13 discuss the case where we have several observations on $y$ at some of the $x$'s and it becomes possible to check the line for goodness of fit. Section 11.14 extends the methods of Section 11.6, for comparing two regression lines, to the case of more than two lines. This technique is also known as the analysis of covariance, as it amounts to making an analysis of variance of $y$ adjusted for concomitant variation in $x$. For a thorough review of linear regression see Acton [1].

In general we assume that we have observations $y_{i\nu}$ as follows:

$$(y_{i\nu}, x_i), \qquad \nu = 1, \ldots, n_i, \tag{1.1}$$

where $i = 1, \ldots, k$. If all the $n_i = 1$, so that there is only one observation on $y$ at each $x$, we have a special and important case, for which the analysis, both theoretical and arithmetical, is simpler. We will therefore deal with this case first. The general case, where the $n_i$ are not all equal to 1, can always be treated as the special case by merely ignoring the fact that some of the $x_i$ happen to be identical. For example, if $n_1 = 2$, $n_2 = 3$ and $n_3 = 1$, we can regard the observations

$$(y_{11}, x_1),\ (y_{12}, x_1),\ (y_{21}, x_2),\ (y_{22}, x_2),\ (y_{23}, x_2),\ (y_{31}, x_3) \tag{1.2}$$

as

$$(y_1, x_1),\ (y_2, x_2),\ (y_3, x_3),\ (y_4, x_4),\ (y_5, x_5),\ (y_6, x_6). \tag{1.3}$$

The fact that in this second set of observations $x_1 = x_2$, etc., can in general be disregarded. The only case in which it would matter is where all the $x$'s are identical; then it is obviously meaningless to attempt to fit a line. For the present we will deal with the simpler situation and suppose that we have observations $(x_i, y_i)$, $i = 1, \ldots, k$. Except where otherwise obvious or explicitly stated, all summation operations, $\sum_i^k$, will be over $i = 1, \ldots, k$, and so we will omit the index of the summation and write merely $\sum$.

11.2. The Model
We assume that $y$ is distributed normally about an expected value $\eta$ with variance $\sigma^2$, and that all observations are independent. We further assume that $\eta$ is a simple linear function of $x$:

$$\eta = \alpha + \beta(x - \bar{x}). \tag{2.1}$$
The problem is to obtain, from the data, sample estimates $a$, $b$ and $s^2$ of $\alpha$, $\beta$ and $\sigma^2$ and to determine the distribution of these estimates. The estimated regression equation is

$$Y = a + b(x - \bar{x}) \tag{2.2}$$

where $\bar{x} = \sum x_i / k$. We write (2.1) and (2.2) in the forms given rather than as $\eta = \alpha' + \beta x$, $Y = a' + bx$ because in this latter form $a'$, $b$ are dependent whereas in the form (2.2) $a$, $b$ are independent, and this property is convenient when we come to consider $V[Y]$.

The standard method of estimation in regression is the method of least squares; this is to use those values of $a$, $b$ which will minimize the sum of squares of deviations, say $R$, between the observed values $y_i$ and the predictions $Y_i$ given by inserting the values of $x_i$ in the estimated equation (2.2). Thus we minimize

$$R = \sum (y_i - Y_i)^2 = \sum [y_i - a - b(x_i - \bar{x})]^2. \tag{2.3}$$

The method of least squares appears to be largely due to Gauss. As far as estimation of the parameters is concerned, it does not require the assumption of normality, but this assumption is necessary for construction of confidence intervals for or tests of hypotheses about the parameters. With the assumption of normality, the method of maximum likelihood gives results identical with those of the method of least squares. The method of least squares has the desirable properties that the estimators it gives are unbiased and, among all unbiased linear estimators, have minimum variance. Detailed discussions have been given by David and Neyman [2] and Plackett [3].

To find the values of $a$ and $b$ that minimize $R$ we differentiate (2.3) with respect to $a$ and $b$ and equate to zero:
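$$\frac{\partial R}{\partial a} = -2 \sum [y_i - a - b(x_i - \bar{x})] = 0, \tag{2.4}$$

$$\frac{\partial R}{\partial b} = -2 \sum [y_i - a - b(x_i - \bar{x})](x_i - \bar{x}) = 0. \tag{2.5}$$

These two equations can be written as

$$\sum (y_i - Y_i) = 0, \tag{2.6}$$

$$\sum (y_i - Y_i)(x_i - \bar{x}) = 0. \tag{2.7}$$

Rearranging (2.4) and (2.5) gives

$$ka + b \sum (x_i - \bar{x}) = \sum y_i, \tag{2.8}$$

$$a \sum (x_i - \bar{x}) + b \sum (x_i - \bar{x})^2 = \sum (x_i - \bar{x}) y_i. \tag{2.9}$$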
Since $\sum (x_i - \bar{x}) = 0$, we have as estimators for $\alpha$ and $\beta$:

$$a = \frac{\sum y_i}{k} = \bar{y}, \tag{2.10}$$

$$b = \frac{\sum (x_i - \bar{x}) y_i}{\sum (x_i - \bar{x})^2}. \tag{2.11}$$
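As a minimal sketch of (2.10) and (2.11), the estimators can be computed directly from paired data. The function name `fit_line` and the four data points are hypothetical, chosen only so that the answer is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least squares estimates a, b for Y = a + b(x - xbar), following (2.10) and (2.11)."""
    k = len(xs)
    xbar = sum(xs) / k
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; no line can be fitted")
    a = sum(ys) / k                                        # (2.10): a is the mean of y
    b = sum((x - xbar) * y for x, y in zip(xs, ys)) / sxx  # (2.11)
    return a, b, xbar


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b, xbar = fit_line(xs, ys)
```

Because the line is written about $\bar{x}$, the estimate `a` is the mean of the $y_i$ rather than the intercept at $x = 0$; the latter would be `a - b * xbar`.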
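The numerator of this expression for $b$ can be written slightly differently. Since $\sum (x_i - \bar{x}) = 0$, then $\bar{y} \sum (x_i - \bar{x}) = 0$, and

$$\sum (x_i - \bar{x}) y_i = \sum (x_i - \bar{x}) y_i - \bar{y} \sum (x_i - \bar{x}) = \sum (x_i - \bar{x})(y_i - \bar{y}). \tag{2.12}$$

Thus an alternative form for $b$ is

$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}. \tag{2.13}$$

We can readily check that $b$ is an unbiased estimator of $\beta$. We assumed that the expected value of $y_i$ was $\eta_i$, given by (2.1) with $x = x_i$. Then

$$E[b] = \frac{\sum (x_i - \bar{x}) E[y_i]}{\sum (x_i - \bar{x})^2} = \frac{\sum (x_i - \bar{x})[\alpha + \beta(x_i - \bar{x})]}{\sum (x_i - \bar{x})^2} = \alpha \frac{\sum (x_i - \bar{x})}{\sum (x_i - \bar{x})^2} + \beta \frac{\sum (x_i - \bar{x})^2}{\sum (x_i - \bar{x})^2} = \beta. \tag{2.14}$$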
The variances of $a$ and $b$ can be obtained directly, for inspection of (2.10) and (2.11) shows that they are linear functions of the $y_i$, which are assumed to be independent and have a normal distribution with variance $\sigma^2$:

$$V[a] = V\left[\frac{\sum y_i}{k}\right] = \frac{1}{k^2} \sum V[y_i] = \frac{\sigma^2}{k}, \tag{2.15}$$

$$V[b] = V\left[\frac{\sum (x_i - \bar{x}) y_i}{\sum (x_i - \bar{x})^2}\right] = \frac{\sum (x_i - \bar{x})^2 V[y_i]}{\left[\sum (x_i - \bar{x})^2\right]^2} = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}. \tag{2.16}$$

We will defer the estimation of $\sigma^2$ to the next section.

A demonstration that in general the method of least squares produces estimators of smallest variance among all unbiased linear estimators is somewhat involved [2, 3], but it is often quite easy to show that a least squares estimator, when it has been obtained, is of minimum variance. We shall show that the estimator $b$ (2.11) has the smallest variance of all unbiased linear estimators. Suppose that there exists an alternative linear estimator $b'$,

$$b' = \sum c_i y_i. \tag{2.17}$$
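We have

$$E[b'] = \sum c_i E[y_i] = \sum c_i [\alpha + \beta(x_i - \bar{x})] = \alpha \sum c_i + \beta \sum (x_i - \bar{x}) c_i. \tag{2.18}$$

For $b'$ to be an unbiased estimator, so that $E[b'] = \beta$, we require

$$\sum c_i = 0, \tag{2.19}$$

$$\sum (x_i - \bar{x}) c_i = 1. \tag{2.20}$$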
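The two conditions (2.19) and (2.20) are easy to check numerically for any candidate set of weights $c_i$. The sketch below is illustrative only: the $x$ values and the perturbation `d` are hypothetical, and `var_factor` returns $\sum c_i^2$, which is $V[b']/\sigma^2$ when the $y_i$ are independent with common variance $\sigma^2$.

```python
xs = [1.0, 2.0, 3.0, 4.0]            # hypothetical x values
k = len(xs)
xbar = sum(xs) / k
sxx = sum((x - xbar) ** 2 for x in xs)

# Least squares weights: b = sum(c_i * y_i) with c_i = (x_i - xbar) / sxx.
c_ls = [(x - xbar) / sxx for x in xs]

# A perturbation chosen to be orthogonal both to the constant vector and to
# (x_i - xbar), so that the perturbed weights still satisfy (2.19) and (2.20).
d = [0.1, -0.3, 0.3, -0.1]
c_alt = [c + e for c, e in zip(c_ls, d)]


def is_unbiased(c, tol=1e-12):
    """True if the weights satisfy sum(c) = 0 and sum((x - xbar) * c) = 1."""
    cond_19 = abs(sum(c)) < tol
    cond_20 = abs(sum((x - xbar) * ci for x, ci in zip(xs, c)) - 1.0) < tol
    return cond_19 and cond_20


def var_factor(c):
    """V[b'] / sigma^2 for independent y_i with common variance."""
    return sum(ci ** 2 for ci in c)
```

For these numbers both sets of weights are unbiased, but the perturbed weights give a variance factor of 0.4 against 0.2 for the least squares weights.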
The variance of this alternative estimator $b'$ is

$$V[b'] = \sum c_i^2 V[y_i] = \sigma^2 \sum c_i^2 = \sigma^2 \sum \left[ c_i - \frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2} + \frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2} \right]^2. \tag{2.21}$$

On expanding the square, the cross product term is zero, for

$$\sum \left[ c_i - \frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2} \right] \frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2} = \frac{\sum c_i (x_i - \bar{x})}{\sum (x_i - \bar{x})^2} - \frac{\sum (x_i - \bar{x})^2}{\left[\sum (x_i - \bar{x})^2\right]^2} = 0, \tag{2.22}$$

using (2.20). Thus

$$V[b'] = \sigma^2 \sum \left[ c_i - \frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2} \right]^2 + \frac{\sigma^2}{\sum (x_i - \bar{x})^2}, \tag{2.23}$$

in which the last term is a constant. Hence to minimize $V[b']$ we can only make adjustments to the first term; by putting

$$c_i = \frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2} \tag{2.24}$$

we make the first term zero and hence make $V[b']$ a minimum. But, with this value of $c_i$, our alternative estimator (2.17) is

$$b' = \sum c_i y_i = \frac{\sum (x_i - \bar{x}) y_i}{\sum (x_i - \bar{x})^2}, \tag{2.25}$$

which is our original least squares estimator $b$ (2.11). Therefore $b' = b$, and $b$ is the minimum variance unbiased linear estimator. Incidentally, from (2.23) we see that when $c_i$ is given the value in (2.24), $V[b] = \sigma^2 / \sum (x_i - \bar{x})^2$, as found earlier in (2.16).

11.3. An Analysis of Variance Representation
We will now consider regression analysis from the point of view of analysis of variance. From this approach we will obtain the variances of
$a$ and $b$ (which we have already found directly), and in addition we will be able to show that $a$ and $b$ are independent of each other and of $s^2$, our estimate of $\sigma^2$. The deviation of an observation $y_i$ from the value predicted by the true regression equation (2.1) can be written as

$$y_i - \eta_i = (y_i - Y_i) + (Y_i - \eta_i) = (y_i - Y_i) + [a + b(x_i - \bar{x}) - \{\alpha + \beta(x_i - \bar{x})\}] = (y_i - Y_i) + (a - \alpha) + (b - \beta)(x_i - \bar{x}). \tag{3.1}$$

[Figure 11.1. Axes: $x$ (horizontal), $y$ (vertical).]

This equation is represented graphically in Figure 11.1. Squaring and summing over $i$ gives

$$\sum (y_i - \eta_i)^2 = k(a - \alpha)^2 + (b - \beta)^2 \sum (x_i - \bar{x})^2 + \sum (y_i - Y_i)^2, \tag{3.2}$$

using (2.6) and (2.7) to show that two of the cross products are zero. On the left-hand side of (3.2), $\sum (y_i - \eta_i)^2$ is a sum of squares with $k$ degrees of freedom and distributed as $\sigma^2 \chi^2(k)$. It is partitioned into three components. The first and second each involve one variable, $a$ and $b$ respectively, and will each have one degree of freedom. The third involves $k$ variables $y_i - Y_i$, but these are subject to the two restrictions (2.6) and (2.7), so the degrees of freedom are $k - 2$. Thus the sum of the three
sums of squares on the right-hand side of (3.2) equals the sum of squares on the left-hand side, and the degrees of freedom likewise, so by Cochran's theorem the sums of squares on the right-hand side are distributed as $\sigma^2 \chi^2$ with the corresponding degrees of freedom and are independent. Since $\sum (y_i - Y_i)^2$ is distributed as $\sigma^2 \chi^2(k - 2)$, $s^2$, defined as

$$s^2 = \frac{1}{k - 2} \sum (y_i - Y_i)^2, \tag{3.3}$$

will be distributed as $\sigma^2 \chi^2(k - 2)/(k - 2)$, will have expected value $\sigma^2$, and will be independent of $a$ and $b$. Also, since $k(a - \alpha)^2$ is distributed as $\sigma^2 \chi^2(1)$ it has expected value $\sigma^2$, and therefore

$$V[a] = E[(a - \alpha)^2] = \frac{\sigma^2}{k}. \tag{3.4}$$

By a similar argument, $V[b] = \sigma^2 / \sum (x_i - \bar{x})^2$. We can make a test of the null hypothesis $\beta = 0$ by substituting $s^2$ for $\sigma^2$ in the expression for $V[b]$. Alternatively, since the last two terms on the right-hand side of (3.2) are independent and have distributions $\sigma^2 \chi^2(1)$, $\sigma^2 \chi^2(k - 2)$ respectively, we have that

$$\frac{(b - \beta)^2 \sum (x_i - \bar{x})^2}{s^2} \sim F(1, k - 2). \tag{3.5}$$

Thus under the null hypothesis $\beta = 0$,

$$\frac{b^2 \sum (x_i - \bar{x})^2}{s^2} \sim F(1, k - 2). \tag{3.6}$$

To calculate the numerator of $s^2$, (3.3), we usually use an identity obtained as follows. We note that

$$Y_i - \bar{y} = [a + b(x_i - \bar{x})] - a = b(x_i - \bar{x}), \tag{3.7}$$

so by (2.7)

$$\sum (y_i - Y_i)(Y_i - \bar{y}) = b \sum (y_i - Y_i)(x_i - \bar{x}) = 0. \tag{3.8}$$

Thus when we square and sum over $i$ the identity

$$y_i - \bar{y} = (y_i - Y_i) + (Y_i - \bar{y}) \tag{3.9}$$

the cross product is zero. Therefore

$$\sum (y_i - \bar{y})^2 = \sum (Y_i - \bar{y})^2 + \sum (y_i - Y_i)^2. \tag{3.10}$$

This equation is entered in the second column of Table 11.1.
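The partition (3.10) can be verified numerically. The following is a small sketch on hypothetical data; the function name `anova_partition` is an assumption of this example, not notation from the text.

```python
def anova_partition(xs, ys):
    """Return (total SS, regression SS, residual SS, s^2) for a fitted straight line."""
    k = len(xs)
    xbar = sum(xs) / k
    ybar = sum(ys) / k
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx                                              # (2.13)
    fitted = [ybar + b * (x - xbar) for x in xs]               # (2.2) with a = ybar
    ss_total = sum((y - ybar) ** 2 for y in ys)
    ss_reg = sum((f - ybar) ** 2 for f in fitted)              # due to regression
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))     # about the line
    s2 = ss_res / (k - 2)                                      # (3.3)
    return ss_total, ss_reg, ss_res, s2


# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
ss_total, ss_reg, ss_res, s2 = anova_partition(xs, ys)
```

For these five points the total sum of squares 10 splits into 6.4 due to regression and 3.6 about the line, giving $s^2 = 3.6/3 = 1.2$.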
To calculate $\sum (Y_i - \bar{y})^2$, known as the sum of squares due to regression, we use (3.7):

$$\sum (Y_i - \bar{y})^2 = b^2 \sum (x_i - \bar{x})^2 = b \sum (x_i - \bar{x})(y_i - \bar{y}) \tag{3.11}$$

$$= \frac{\left[\sum (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum (x_i - \bar{x})^2}. \tag{3.12}$$

Table 11.1

Source of variance    Sums of squares             Degrees of freedom    Mean squares    E[M. S.]
Due to regression     $\sum (Y_i - \bar{y})^2$    $1$                   $s_1^2$         $\sigma^2 + \beta^2 \sum (x_i - \bar{x})^2$
Remainder             $\sum (y_i - Y_i)^2$        $k - 2$               $s^2$           $\sigma^2$
Total                 $\sum (y_i - \bar{y})^2$    $k - 1$

The other term on the right-hand side of (3.10), $\sum (y_i - Y_i)^2$, is the sum of squares of deviations of observed values about the estimated line, and is commonly known as the residual or remainder sum of squares. It is, of course, the numerator of $s^2$ defined as in (3.3). Using (3.10) and (3.12), it can be calculated as

$$\sum (y_i - Y_i)^2 = \sum (y_i - \bar{y})^2 - \frac{\left[\sum (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum (x_i - \bar{x})^2}. \tag{3.13}$$

To find the expected value of $s_1^2$ defined as in Table 11.1, we note that $(b - \beta)^2 \sum (x_i - \bar{x})^2$ is distributed as $\sigma^2 \chi^2(1)$ and hence has expected value $\sigma^2$. Thus

$$\sigma^2 = \sum (x_i - \bar{x})^2 E[b^2] - 2\beta \sum (x_i - \bar{x})^2 E[b] + \beta^2 \sum (x_i - \bar{x})^2. \tag{3.14}$$

Now from (2.14), $E[b] = \beta$, and from (3.11),

$$\sum (x_i - \bar{x})^2 E[b^2] = E\left[\sum (Y_i - \bar{y})^2\right]. \tag{3.15}$$

Thus substituting in (3.14) and rearranging,

$$E\left[\sum (Y_i - \bar{y})^2\right] = \sigma^2 + \beta^2 \sum (x_i - \bar{x})^2 \tag{3.16}$$

$$= \sigma^2 \left(1 + \frac{\beta^2}{V[b]}\right). \tag{3.17}$$

The latter form makes it clear that $E[s_1^2]$ is a function of the ratio of $\beta^2$ to the variance of $b$.
Using (2.2) for $Y$, (2.15) and (2.16) for $V[a]$ and $V[b]$, and the fact that $a$ and $b$ have zero covariance, we can write, for any fixed $x$,

$$V[Y] = V[a + b(x - \bar{x})] = V[a] + (x - \bar{x})^2 V[b] = \sigma^2 \left[ \frac{1}{k} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right]. \tag{3.18}$$

The expected value of $Y$ is $\alpha + \beta(x - \bar{x}) = \eta$. $Y$ is a linear function of $a$ and $b$, and $a$ and $b$ are linear functions of the $y_i$, which are normally distributed. So $Y$ is normally distributed; inserting $s^2$ for $\sigma^2$ in (3.18) to give the estimated variance $\hat{V}[Y]$, we have

$$\frac{Y - \eta}{\sqrt{\hat{V}[Y]}} \sim t(k - 2). \tag{3.19}$$

This gives confidence limits for $\eta$, the true value at some specified $x$:

$$\Pr\left\{ Y - t_{P_2} \sqrt{\hat{V}[Y]} < \eta < Y - t_{P_1} \sqrt{\hat{V}[Y]} \right\} = P_2 - P_1. \tag{3.20}$$

A new single observation $y$ at $x$ will be distributed about $\eta$, with a variance $\sigma^2$, independently of $Y$, so

$$E[y - Y] = E[y] - E[Y] = \eta - \eta = 0, \tag{3.21}$$

$$V[y - Y] = V[y] + V[Y] = \sigma^2 \left[ 1 + \frac{1}{k} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right], \tag{3.22}$$

and

$$\frac{y - Y}{\sqrt{\hat{V}[y - Y]}} \sim t(k - 2). \tag{3.23}$$

Thus we have the conclusion

$$\Pr\left\{ Y + t_{P_1} \sqrt{\hat{V}[y - Y]} < y < Y + t_{P_2} \sqrt{\hat{V}[y - Y]} \right\} = P_2 - P_1. \tag{3.24}$$

This resembles a confidence interval statement, but differs, for whereas a confidence interval statement is an interval for a parameter, (3.24) is an interval for a random variable $y$. The interval (3.24) is known as a prediction interval and represents the limits between which we are $P_2 - P_1$ confident that a new single observation $y$ taken at the specified $x$ will lie. The interval (3.20) represents the limits between which we are $P_2 - P_1$ confident that the true value of the parameter $\eta$ will lie.
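The difference between (3.20) and (3.24) is only the extra term $\sigma^2$ in the variance. A minimal sketch, assuming symmetric limits (so that a single value `t` $= t_{P_2} = -t_{P_1}$ is supplied by the user from a table of $t$ with $k - 2$ degrees of freedom); the function name and all numerical inputs below are hypothetical.

```python
import math


def interval_at(x, a, b, xbar, sxx, s2, k, t, new_observation=False):
    """Symmetric limits at x: for eta as in (3.20), or for a new y as in (3.24)."""
    y_hat = a + b * (x - xbar)                       # (2.2)
    var = s2 * (1.0 / k + (x - xbar) ** 2 / sxx)     # (3.18) with s2 in place of sigma^2
    if new_observation:
        var += s2                                    # (3.22): add the variance of the new y
    half_width = t * math.sqrt(var)
    return y_hat - half_width, y_hat + half_width


# Hypothetical fit: k = 5 points, a = 3, b = 0.8, xbar = 3, sxx = 10, s2 = 1.2.
# t = 3.182 is the tabulated 0.975 point of t with 3 degrees of freedom.
conf = interval_at(3.0, 3.0, 0.8, 3.0, 10.0, 1.2, 5, 3.182)
pred = interval_at(3.0, 3.0, 0.8, 3.0, 10.0, 1.2, 5, 3.182, new_observation=True)
```

Both intervals are centered on the same $Y$, and the prediction interval is always the wider of the two.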
11.4. An Example of Linear Regression
The data of Table 11.2 (personal communication from Dr. D. W. Cugel), plotted in Figure 11.2, give the results of an experiment to determine the behavior of a method of measuring blood flow. The $y_i$ are rates of flow estimated by this method for a series of $x_i$; the $x_i$ were accurately determined by a direct method.

Table 11.2

$x_i$    $y_i$      $x_i$    $y_i$      $x_i$    $y_i$
1190     1115       1900     1830       2720     2630
1455     1425       1920     1920       2710     2740
1550     1515       1960     1970       2530     2390
1730     1795       2295     2300       2900     2800
1745     1715       2335     2280       2760     2630
1770     1710       2490     2520       3010     2970

We need the following: $k = 18$, $\sum x_i = 38{,}970$, $\sum y_i = 38{,}255$, $\sum x_i^2 = 89{,}394{,}900$, $\sum y_i^2 = 86{,}125{,}825$, $\sum x_i y_i = 87{,}719{,}100$,

[Figure 11.2. Vertical axis: $10^{-2} \times y$.]

from which we derive

$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{k} = 89{,}394{,}900 - \frac{(38{,}970)^2}{18} = 5{,}024{,}850,$$

$$\sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{k} = 86{,}125{,}825 - \frac{(38{,}255)^2}{18} = 4{,}823{,}323.6,$$

$$\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{\sum x_i \sum y_i}{k} = 87{,}719{,}100 - \frac{38{,}970 \times 38{,}255}{18} = 4{,}897{,}025.0.$$

The sum of squares due to regression is, from (3.12),

$$\sum (Y_i - \bar{y})^2 = \frac{(4{,}897{,}025.0)^2}{5{,}024{,}850.0} = 4{,}772{,}451.7. \tag{4.1}$$

We can now assemble Table 11.3, finding the residual sum of squares by difference [see (3.13)].

Table 11.3

Source of variance    Sums of squares    Degrees of freedom    Mean squares    E[M. S.]
Due to regression     4,772,451.7        1                     4,772,451.7     $\sigma^2 + \beta^2 \sum (x_i - \bar{x})^2$
Residual              50,871.9           16                    3,179.49        $\sigma^2$
Total                 4,823,323.6        17
Under the null hypothesis $\beta = 0$ the variance ratio $4{,}772{,}451.7/3179.49 = 1501.0$ is distributed as $F(1, 16)$. Here the result is obviously significant. An alternative test would be to use the fact that

$$\frac{b - \beta}{\sqrt{\hat{V}[b]}} \sim t(k - 2), \tag{4.2}$$

for which we need $b$, from (2.13),

$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{4{,}897{,}025.0}{5{,}024{,}850.0} = 0.974561, \tag{4.3}$$

and $\hat{V}[b]$, which we get by replacing $\sigma^2$ in (2.16) by $s^2$:

$$\hat{V}[b] = \frac{s^2}{\sum (x_i - \bar{x})^2} = \frac{3179.49}{5{,}024{,}850.0} = 6.32753 \times 10^{-4}. \tag{4.4}$$

A test of the null hypothesis that $\beta = 0$ is given by

$$\frac{b - 0}{\sqrt{\hat{V}[b]}} = \frac{0.974561 - 0}{\sqrt{6.32753 \times 10^{-4}}} = 38.74, \tag{4.5}$$

which will be distributed as $t(16)$. The null hypothesis obviously has to be rejected. Of course, $t^2 = (38.74)^2 = 1501.0 = F$ of the previous test. In this particular example there is no reason to doubt the significance of the regression. A more interesting null hypothesis is that $\beta = 1$; for this, (4.2) gives

$$\frac{0.974561 - 1}{\sqrt{6.32753 \times 10^{-4}}} = -1.012, \tag{4.6}$$

which is distributed as $t(16)$ under this null hypothesis. Clearly the null hypothesis that $\beta = 1$ can be accepted. We may wish to construct 95 per cent confidence limits for $\beta$; for these, $t_{0.975}(16) = 2.120$ and $\sqrt{\hat{V}[b]} = 0.02515$; so $t\sqrt{\hat{V}[b]} = 0.05333$, and the 95 per cent confidence limits are $0.97456 \pm 0.05333$ or $(0.9212, 1.028)$.

To construct the estimated regression line we need $\bar{y} = 38{,}255/18 = 2125.278$, $\bar{x} = 38{,}970/18 = 2165.000$, and $b = 0.974561$, whence the estimated regression line is

$$Y = 2125.278 + 0.974561(x - 2165.000). \tag{4.7}$$

At $x = 0$, $(Y)_{x=0} = 15.352$. From (3.18),

$$\hat{V}[(Y)_{x=0}] = 3179.49 \left[ \frac{1}{18} + \frac{(0 - 2165.000)^2}{5{,}024{,}850.0} \right] = 3142.493. \tag{4.8}$$

Under the hypothesis that $(\eta)_{x=0} = 0$, the ratio

$$\frac{15.352 - 0}{\sqrt{3142.493}} = 0.274 \tag{4.9}$$

is distributed as $t(16)$. Clearly the null hypothesis of a zero intercept is acceptable. Confidence limits for the intercept are

$$15.352 \pm 2.120 \times \sqrt{3142.493} = (-103.5, 134.2). \tag{4.10}$$

Let us summarize what has been so far established.
(i) The estimated regression line of $y$ on $x$ is as given in (4.7).
(ii) 95 per cent confidence limits for the regression coefficient $\beta$ are $(0.921, 1.028)$; so the null hypothesis $\beta = 1$ is acceptable.
(iii) The intercept on the $y$ axis has 95 per cent confidence limits $(-103, 134)$; so the null hypothesis that the intercept is zero is acceptable.
(iv) The estimated variance of an observation around its true value is 3179.49, corresponding to a standard deviation of 56.39.
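The arithmetic of this example can be retraced with a short script. This is a sketch only: the data are those of Table 11.2, the value 2.120 for $t_{0.975}(16)$ is taken from the text rather than computed, and the variable names are assumptions of this example.

```python
import math

# Data of Table 11.2: x by the direct method, y by the method under test.
xs = [1190, 1455, 1550, 1730, 1745, 1770, 1900, 1920, 1960,
      2295, 2335, 2490, 2720, 2710, 2530, 2900, 2760, 3010]
ys = [1115, 1425, 1515, 1795, 1715, 1710, 1830, 1920, 1970,
      2300, 2280, 2520, 2630, 2740, 2390, 2800, 2630, 2970]

k = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sxx = sum(x * x for x in xs) - sum_x ** 2 / k
syy = sum(y * y for y in ys) - sum_y ** 2 / k
sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / k

b = sxy / sxx                        # (4.3)
ss_reg = sxy ** 2 / sxx              # (4.1)
ss_res = syy - ss_reg                # residual, by difference (3.13)
s2 = ss_res / (k - 2)                # residual mean square
var_b = s2 / sxx                     # (4.4)

t_beta_0 = b / math.sqrt(var_b)          # (4.5): test of beta = 0
t_beta_1 = (b - 1) / math.sqrt(var_b)    # (4.6): test of beta = 1

T975 = 2.120                         # t_{0.975}(16), as quoted in the text
ci_beta = (b - T975 * math.sqrt(var_b), b + T975 * math.sqrt(var_b))

xbar, ybar = sum_x / k, sum_y / k
intercept = ybar + b * (0 - xbar)                    # (4.7) evaluated at x = 0
var_intercept = s2 * (1 / k + xbar ** 2 / sxx)       # (4.8)
ci_intercept = (intercept - T975 * math.sqrt(var_intercept),
                intercept + T975 * math.sqrt(var_intercept))
```

Working from the raw data rather than from rounded intermediate figures can change the last printed digit of some quantities, so small differences from the values in the text are to be expected.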
We might ask, if we observe a new value of $y$, what is our estimate of $x$, and what are confidence limits on the true value of $x$ corresponding to this observed $y$? This question is investigated in the following section. A further question arising from items (ii) and (iii) above is the following: While the null hypothesis that $\beta = 1$, the intercept unspecified, is acceptable, and also the null hypothesis that the intercept is zero, $\beta$ being unspecified, is acceptable, is the null hypothesis that $\beta = 1$ and the intercept is zero simultaneously acceptable? This question is discussed from the testing point of view in Section 11.9 and from the confidence region point of view in Section 11.11.
11.5. The Use of the Regression Line in Reverse
Suppose that we have a regression line $Y = a + b(x - \bar{x})$, based on $k$ observations $(x_i, y_i)$. Suppose that we now observe a new $\bar{y}'$, the mean of $m$ new observations known to have arisen from the same $x$, and we wish to predict the $x$ corresponding to this $\bar{y}'$, and to construct confidence limits for this prediction. Of course, $m$ may equal 1, so that we just have a single observation $y'$. We can solve the estimated regression equation (2.2) for $x$, inserting $\bar{y}'$ for $Y$, where $\bar{y}'$ is the mean of the new set of observations on $y$, to get

$$\hat{x} = \bar{x} + \frac{\bar{y}' - a}{b}, \tag{5.1}$$

a point estimate of the value of $x$ corresponding to the new observed value $\bar{y}'$.

The expected value of the new $\bar{y}'$ is $E[\bar{y}'] = \eta$. Corresponding to this value of $\eta$ is a value of $x$ given by solving the true regression equation (2.1) for $x$. Denote this value of $x$ by $\xi$. Then

$$\xi = \bar{x} + \frac{\eta - \alpha}{\beta} \tag{5.2}$$

and

$$\eta - \alpha - \beta(\xi - \bar{x}) = 0. \tag{5.3}$$

We now define a new variable $z$ as

$$z = \bar{y}' - a - b(\xi - \bar{x}). \tag{5.4}$$

This variable will have expected value

$$E[z] = E[\bar{y}'] - E[a] - (\xi - \bar{x}) E[b] = \eta - \alpha - (\xi - \bar{x})\beta = 0 \tag{5.5}$$

by (5.3). Its variance is

$$V[z] = V[\bar{y}'] + V[a] + (\xi - \bar{x})^2 V[b] = \frac{\sigma^2}{m} + \frac{\sigma^2}{k} + (\xi - \bar{x})^2 \frac{\sigma^2}{\sum (x_i - \bar{x})^2} = \sigma^2 \left[ \frac{1}{m} + \frac{1}{k} + \frac{(\xi - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right]. \tag{5.6}$$

The random variable $z$ is a linear function of three random normally distributed variables $\bar{y}'$, $a$ and $b$, and hence will itself be normally distributed. Thus $z/\sqrt{V[z]}$ is $N(0, 1)$, and, on replacing $\sigma^2$ in (5.6) by its estimate $s^2$ from (3.3), we have

$$\frac{\bar{y}' - a - b(\xi - \bar{x})}{s \sqrt{1/m + 1/k + (\xi - \bar{x})^2 / \sum (x_i - \bar{x})^2}} \sim t(k - 2). \tag{5.7}$$
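As a numerical sketch of how the pivot (5.7) constrains $\xi$: setting its square equal to $t^2$ gives a quadratic in $\xi - \bar{x}$, whose two roots are the limits. The function names are hypothetical; the inputs are the rounded estimates of Section 11.4 together with an assumed single new reading of 3000, and `t` is the tabulated 0.975 point of $t(16)$ quoted in that section. The sketch assumes symmetric limits and real roots.

```python
import math


def pivot(xi, y_new, m, a, b, xbar, sxx, s2, k):
    """The ratio on the left-hand side of (5.7), evaluated at a trial value xi."""
    spread = math.sqrt(s2 * (1 / m + 1 / k + (xi - xbar) ** 2 / sxx))
    return (y_new - a - b * (xi - xbar)) / spread


def inverse_limits(y_new, m, a, b, xbar, sxx, s2, k, t):
    """Values of xi at which the squared pivot equals t^2 (roots of a quadratic)."""
    d = y_new - a
    lead = b * b - t * t * s2 / sxx            # coefficient of (xi - xbar)^2
    q = lead * (1 / m + 1 / k) + d * d / sxx   # quantity under the square root
    if lead <= 0 or q < 0:
        raise ValueError("slope not clearly non-zero; no finite interval from this sketch")
    centre = xbar + b * d / lead
    half = t * math.sqrt(s2 * q) / lead
    return centre - half, centre + half


# Estimates from Section 11.4; the new reading y' = 3000 with m = 1 is assumed.
args = dict(y_new=3000.0, m=1, a=2125.278, b=0.974561,
            xbar=2165.0, sxx=5024850.0, s2=3179.49, k=18)
lower, upper = inverse_limits(t=2.120, **args)
```

At each returned limit the absolute value of `pivot` equals `t`, which is a convenient check that the roots have been computed correctly.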
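We can insert the above expression in place of $t$ in the statement $\Pr\{t_1 < t < t_2\} = P_2 - P_1$ to get confidence limits for $\xi$. If $\underline{\xi}$ is the lower confidence limit, it is given by the equation

$$\frac{\bar{y}' - a - b(\underline{\xi} - \bar{x})}{s \sqrt{1/m + 1/k + (\underline{\xi} - \bar{x})^2 / \sum (x_i - \bar{x})^2}} = t_2, \tag{5.8}$$

where $t_2$ is the $P_2$ point of $t$ with the degrees of freedom of $s$, namely $k - 2$. Squaring and expanding and collecting terms in $\underline{\xi}^2$, $\underline{\xi}$, we get a quadratic equation in $\underline{\xi}$:

$$\underline{\xi}^2 \left[ b^2 - \frac{t_2^2 s^2}{\sum (x_i - \bar{x})^2} \right] + 2\underline{\xi} \left[ \frac{t_2^2 s^2 \bar{x}}{\sum (x_i - \bar{x})^2} - b^2 \bar{x} - b(\bar{y}' - a) \right] + (\bar{y}' - a + b\bar{x})^2 - t_2^2 s^2 \left[ \frac{1}{m} + \frac{1}{k} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2} \right] = 0. \tag{5.9}$$

Using the usual formula for the solution to a quadratic equation, using the negative sign for the square root term, we get

$$\underline{\xi} = \bar{x} + \frac{b(\bar{y}' - a)}{b^2 - t_2^2 s^2 / \sum (x_i - \bar{x})^2} - \frac{t_2 s}{b^2 - t_2^2 s^2 / \sum (x_i - \bar{x})^2} \left\{ \left[ b^2 - \frac{t_2^2 s^2}{\sum (x_i - \bar{x})^2} \right] \left( \frac{1}{m} + \frac{1}{k} \right) + \frac{(\bar{y}' - a)^2}{\sum (x_i - \bar{x})^2} \right\}^{1/2}. \tag{5.10}$$

The upper confidence limit $\overline{\xi}$ is obtained in the identical manner, using $t_1$ in place of $t_2$, and the solution is the same as (5.10) with this change.

Now let us suppose that we choose $P_1$ and $P_2$ so that the limits are symmetric in probability, i.e., $t_1 = t_{\alpha/2}$, $t_2 = t_{1 - \alpha/2}$, and where there is no danger of confusion we will omit the suffix on the $t$'s. The behavior of (5.10) is determined by the quantity we define as $g$: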
$$g = \frac{t^2}{\left( b / \sqrt{\hat{V}[b]} \right)^2} = \frac{t^2 s^2}{b^2 \sum (x_i - \bar{x})^2}. \tag{5.11}$$

First, suppose that $g < 1$. Then

$$\left| \frac{b}{\sqrt{\hat{V}[b]}} \right| > t, \tag{5.12}$$

and so $b$ is significantly different from zero at the level of significance $\alpha$ when tested against the two-sided alternative. Also, when $g < 1$,

$$b^2 - \frac{t^2 s^2}{\sum (x_i - \bar{x})^2} > 0, \tag{5.13}$$

so the quantity under the square root sign in (5.10), say $Q$, is positive and we obtain a real solution for $\underline{\xi}$.

Second, suppose that $g > 1$. Then in (5.12) the inequality sign is reversed and $b$ is not significantly different from zero. Also, the inequality sign in (5.13) is reversed. We will get imaginary solutions to (5.10) if $Q < 0$, i.e., if

$$\left[ b^2 - \frac{t^2 s^2}{\sum (x_i - \bar{x})^2} \right] \left( \frac{1}{m} + \frac{1}{k} \right) + \frac{(\bar{y}' - a)^2}{\sum (x_i - \bar{x})^2} < 0. \tag{5.14}$$

Otherwise $Q \geq 0$ and (5.10) gives real solutions.

Thirdly, if $g$ is small, say $g < 0.1$, then

$$b^2 - \frac{t^2 s^2}{\sum (x_i - \bar{x})^2} = b^2 \left[ 1 - \frac{t^2 s^2}{b^2 \sum (x_i - \bar{x})^2} \right] = b^2 (1 - g) \simeq b^2, \tag{5.16}$$

and (5.10) and its analog for $\overline{\xi}$ become

$$\underline{\xi} \simeq \bar{x} + \frac{\bar{y}' - a}{b} - \frac{t_2 s}{|b|} \left[ \left( \frac{1}{m} + \frac{1}{k} \right) + \frac{(\bar{y}' - a)^2}{b^2 \sum (x_i - \bar{x})^2} \right]^{1/2}, \tag{5.17}$$

$$\overline{\xi} \simeq \bar{x} + \frac{\bar{y}' - a}{b} - \frac{t_1 s}{|b|} \left[ \left( \frac{1}{m} + \frac{1}{k} \right) + \frac{(\bar{y}' - a)^2}{b^2 \sum (x_i - \bar{x})^2} \right]^{1/2}, \tag{5.18}$$

as approximate confidence limits for $\xi$. These are usually considered valid for most purposes when $g < 0.1$.

To illustrate these results, suppose that in the example of Section 11.4 we observe a single new observation $y' = 3000$. To construct 95 per cent confidence limits for $\xi$ we need $t_{0.975} = -t_{0.025}$. For that example $s^2 = 3179.49$, $b = 0.974561$, $\sum (x_i - \bar{x})^2 = 5{,}024{,}850$. Inserting these values in (5.11) we obtain $g = 0.00299$, which permits use of the approximations
(5.17) and (5.18). The 95 per cent confidence limits for $\xi$ are

$$2165.000 + \frac{3000.000 - 2125.277}{0.974561} \pm \frac{2.120 \sqrt{3179.49}}{0.974561} \left[ \left( \frac{1}{1} + \frac{1}{18} \right) + \frac{(3000.000 - 2125.277)^2}{(0.974561)^2 \times 5{,}024{,}850} \right]^{1/2} = 3062.5 \pm 135.2.$$

Thus if we observe a flow rate of 3000 by our new method, we can be 95 per cent confident that the true flow rate is in the interval $(2927, 3198)$.

11.6. The Comparison of Two Regression Lines
We suppose that we have two sets of observations,
$$(x_{i\nu}, y_{i\nu}), \qquad i = 1, 2; \quad \nu = 1, \ldots, n_i.$$

We will discuss the procedure for deciding whether a single common regression line is an adequate fit, or whether separate regression lines

$$Y_1 = a_1 + b_1(x - \bar{x}_1), \qquad Y_2 = a_2 + b_2(x - \bar{x}_2) \tag{6.1}$$

are necessary. We start by fitting separate lines, obtaining estimates $a_i$, $b_i$ and $s_i^2$, $i = 1, 2$. If the lines are identical, the $s_i^2$ will be estimates of a common $\sigma^2$, and their ratio will be distributed as $F$. It is easier to put the larger $s_i^2$ in the numerator of the variance ratio; a two-sided test will usually be appropriate. If the null hypothesis is rejected, then the lines differ in this regard, but further examination is difficult because the dissimilar variances involve the Behrens-Fisher problem. If the null hypothesis of a common residual variance is accepted, we can form a joint estimate of $\sigma^2$ [cf. (9.7.6)],

$$s^2 = \frac{(n_1 - 2) s_1^2 + (n_2 - 2) s_2^2}{n_1 + n_2 - 4}. \tag{6.2}$$

If the null hypothesis $\beta_1 - \beta_2 = 0$ is true, $b_1 - b_2$ will be normally distributed about 0 with a variance

$$V[b_1 - b_2] = \sigma^2 \left[ \frac{1}{\sum_\nu^{n_1} (x_{1\nu} - \bar{x}_1)^2} + \frac{1}{\sum_\nu^{n_2} (x_{2\nu} - \bar{x}_2)^2} \right], \tag{6.3}$$
SIMPLE LINEAR REGRESSION
CHAP.
11
where $\sigma^2$ will be estimated by (6.2). Thus under the null hypothesis, $\beta_1 - \beta_2 = 0$,

$$\frac{b_1 - b_2}{s \left\{ \left[ \sum_\nu^{n_1} (x_{1\nu} - \bar{x}_1)^2 \right]^{-1} + \left[ \sum_\nu^{n_2} (x_{2\nu} - \bar{x}_2)^2 \right]^{-1} \right\}^{1/2}} \sim t(n_1 + n_2 - 4). \tag{6.4}$$

If we reject this null hypothesis, then the lines differ in slope and hence are different. If, on the other hand, we accept this null hypothesis, then we fit a joint estimate of the common slope and proceed to test whether the lines are coincident as well as parallel.

We next need to estimate the common slope. We have two true equations for the two parallel lines,

$$\eta_1 = \alpha_1 + \beta(x - \bar{x}_1), \qquad \eta_2 = \alpha_2 + \beta(x - \bar{x}_2), \tag{6.5}$$

and two estimated equations,

$$Y_1 = a_1 + b(x - \bar{x}_1), \qquad Y_2 = a_2 + b(x - \bar{x}_2). \tag{6.6}$$
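The sum of squares of deviations of the observations from the two parallel estimated lines is

$$R = \sum_\nu^{n_1} (y_{1\nu} - Y_{1\nu})^2 + \sum_\nu^{n_2} (y_{2\nu} - Y_{2\nu})^2 = \sum_\nu^{n_1} [y_{1\nu} - a_1 - b(x_{1\nu} - \bar{x}_1)]^2 + \sum_\nu^{n_2} [y_{2\nu} - a_2 - b(x_{2\nu} - \bar{x}_2)]^2. \tag{6.7}$$

The least squares estimates are those which make this a minimum. Differentiating with respect to $a_1$, $a_2$ and $b$ and equating to zero gives

$$\frac{\partial R}{\partial a_1} = -2 \sum_\nu^{n_1} [y_{1\nu} - a_1 - b(x_{1\nu} - \bar{x}_1)] = 0, \tag{6.8}$$

$$\frac{\partial R}{\partial a_2} = -2 \sum_\nu^{n_2} [y_{2\nu} - a_2 - b(x_{2\nu} - \bar{x}_2)] = 0, \tag{6.9}$$

$$\frac{\partial R}{\partial b} = -2 \sum_\nu^{n_1} [y_{1\nu} - a_1 - b(x_{1\nu} - \bar{x}_1)](x_{1\nu} - \bar{x}_1) - 2 \sum_\nu^{n_2} [y_{2\nu} - a_2 - b(x_{2\nu} - \bar{x}_2)](x_{2\nu} - \bar{x}_2) = 0. \tag{6.10}$$

The first two of these equations give

$$a_1 = \bar{y}_1, \qquad a_2 = \bar{y}_2, \tag{6.11}$$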
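and the third gives

$$b = \frac{\sum_\nu^{n_1} y_{1\nu}(x_{1\nu} - \bar{x}_1) + \sum_\nu^{n_2} y_{2\nu}(x_{2\nu} - \bar{x}_2)}{\sum_\nu^{n_1} (x_{1\nu} - \bar{x}_1)^2 + \sum_\nu^{n_2} (x_{2\nu} - \bar{x}_2)^2}. \tag{6.12}$$

The same arguments that were used to obtain (2.16) give

$$V[b] = \frac{\sigma^2}{\sum_i^2 \sum_\nu^{n_i} (x_{i\nu} - \bar{x}_i)^2}. \tag{6.13}$$

To estimate $\sigma^2$ for the model (6.5), the sum of squares of deviations from the two parallel lines is given by (6.7). Inserting the solutions for $a_1$, $a_2$, and $b$, straightforward manipulation leads to

$$R = \sum_i^2 \sum_\nu^{n_i} (y_{i\nu} - \bar{y}_i)^2 - \frac{\left[ \sum_i^2 \sum_\nu^{n_i} (x_{i\nu} - \bar{x}_i)(y_{i\nu} - \bar{y}_i) \right]^2}{\sum_i^2 \sum_\nu^{n_i} (x_{i\nu} - \bar{x}_i)^2}. \tag{6.14}$$

Since we have fitted three parameters to the data, this sum of squares has $n_1 + n_2 - 3$ degrees of freedom.

We now proceed to test whether the two parallel lines are identical, i.e., lie on top of each other. If the true lines (6.5) are identical, then $\eta_1 = \eta_2$ for all $x$, and hence

$$\alpha_1 + \beta(x - \bar{x}_1) = \alpha_2 + \beta(x - \bar{x}_2) \tag{6.15}$$

and

$$(\alpha_1 - \alpha_2) - \beta(\bar{x}_1 - \bar{x}_2) = 0. \tag{6.16}$$

It follows that the quantity

$$(a_1 - a_2) - b(\bar{x}_1 - \bar{x}_2) \tag{6.17}$$

will have expected value zero, and be distributed normally with variance

$$\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2} + (\bar{x}_1 - \bar{x}_2)^2 V[b] = \sigma^2 \left[ \frac{1}{n_1} + \frac{1}{n_2} + \frac{(\bar{x}_1 - \bar{x}_2)^2}{\sum_i^2 \sum_\nu^{n_i} (x_{i\nu} - \bar{x}_i)^2} \right]. \tag{6.18}$$

Thus, if the lines are identical,

$$\frac{(a_1 - a_2) - b(\bar{x}_1 - \bar{x}_2)}{s \left[ 1/n_1 + 1/n_2 + (\bar{x}_1 - \bar{x}_2)^2 / \sum_i^2 \sum_\nu^{n_i} (x_{i\nu} - \bar{x}_i)^2 \right]^{1/2}} \sim t(n_1 + n_2 - 3), \tag{6.19}$$

where $s$ is derived from the sum of squares in (6.14). A numerical example will be discussed in Section 11.8.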
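The sequence of steps in this section (pooled variance, test of equal slopes, common slope, test of coincidence) can be sketched as follows. The function names and the two small data sets are hypothetical, and the returned $t$ ratios would still have to be referred to tables of $t$ with the stated degrees of freedom.

```python
import math


def sums(xs, ys):
    """Sample size, means, and corrected sums of squares and products for one line."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    syy = sum((y - ybar) ** 2 for y in ys)
    return n, xbar, ybar, sxx, sxy, syy


def compare_lines(x1, y1, x2, y2):
    n1, xb1, yb1, sxx1, sxy1, syy1 = sums(x1, y1)
    n2, xb2, yb2, sxx2, sxy2, syy2 = sums(x2, y2)
    b1, b2 = sxy1 / sxx1, sxy2 / sxx2
    s1_sq = (syy1 - sxy1 ** 2 / sxx1) / (n1 - 2)
    s2_sq = (syy2 - sxy2 ** 2 / sxx2) / (n2 - 2)
    pooled = ((n1 - 2) * s1_sq + (n2 - 2) * s2_sq) / (n1 + n2 - 4)      # (6.2)
    t_slopes = (b1 - b2) / math.sqrt(pooled * (1 / sxx1 + 1 / sxx2))    # (6.4)
    b = (sxy1 + sxy2) / (sxx1 + sxx2)                                   # (6.12)
    ss_parallel = syy1 + syy2 - (sxy1 + sxy2) ** 2 / (sxx1 + sxx2)      # (6.14)
    s_parallel = math.sqrt(ss_parallel / (n1 + n2 - 3))
    t_coincide = ((yb1 - yb2) - b * (xb1 - xb2)) / (
        s_parallel * math.sqrt(1 / n1 + 1 / n2 + (xb1 - xb2) ** 2 / (sxx1 + sxx2))
    )                                                                   # (6.19)
    return {"b1": b1, "b2": b2, "pooled_s2": pooled, "t_slopes": t_slopes,
            "b_common": b, "t_coincide": t_coincide}


# Hypothetical data: two nearly parallel lines about one unit apart.
result = compare_lines([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8],
                       [1, 2, 3, 4], [3.0, 5.1, 6.9, 9.0])
```

With these numbers the separate slopes are 1.94 and 1.98 and the common slope is 1.96; the preliminary variance-ratio test of $s_1^2$ against $s_2^2$ is omitted from the sketch.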
11.7. Parallel Line Biological Assay
An important application of the foregoing section is to a common form of biological assay for vitamins, hormones, etc., in which the response of the organism, usually an animal, over a certain range is linearly proportional to the logarithm of the dose. For complicated substances not readily susceptible of chemical analysis this provides a method of assaying the potency of an unknown preparation in terms of a standard. Suppose that animals receive various known log doses $x_{1\nu}$ of the standard and other animals receive various known log doses $x_{2\nu}$ of the unknown. Then the responses $y_1$, $y_2$ may be fitted by two straight lines, one for the standard and one for the unknown, with common slope $b$:

$$Y_1 = a_1 + b(x - \bar{x}_1), \qquad Y_2 = a_2 + b(x - \bar{x}_2). \tag{7.1}$$

These lines are estimates of the corresponding true lines:

$$\eta_1 = \alpha_1 + \beta(x - \bar{x}_1), \qquad \eta_2 = \alpha_2 + \beta(x - \bar{x}_2). \tag{7.2}$$

If the basic preparations of the standard and the unknown from which the various doses of each have been prepared are of the same potency, then the two lines (7.2) will be identical, but if, e.g., the unknown is $\rho$ times as potent as the standard then it will take $1/\rho$ units of the unknown to give the same response as is given by 1 unit of the standard. Graphically, the line for the unknown will be displaced horizontally relative to the line for the standard by an amount equal to $\log \rho = \mu$, say. Since the lines have the same slope, they are parallel, and therefore the horizontal distance between them, $\mu$, is the same for all values of the response $\eta$. Let $\xi_1$ be the value of $x_1$ when $\eta_1$ takes some convenient value, say $(\alpha_1 + \alpha_2)/2$, and let $\xi_2$ be the value of $x_2$ when $\eta_2$ takes the same value. Then $\xi_1 - \xi_2 = \mu$, and

$$\alpha_1 + \beta(\xi_1 - \bar{x}_1) = \frac{\alpha_1 + \alpha_2}{2} = \alpha_2 + \beta(\xi_2 - \bar{x}_2), \tag{7.3}$$

whence

$$\mu = \xi_1 - \xi_2 = \bar{x}_1 - \bar{x}_2 - \frac{\alpha_1 - \alpha_2}{\beta}. \tag{7.4}$$

The difference $\xi_1 - \xi_2 = \mu$ is the difference in the logarithms of equivalent quantities of the standard and unknown; i.e., $\mu$ is the logarithm of the potency ratio, and the antilogarithm of $\mu$, $\rho$, is the potency ratio. To obtain an estimate $M$ of $\mu$ we replace $\alpha_1$, $\alpha_2$ and $\beta$ in (7.4) by the estimates $a_1$, $a_2$ and $b$. From (6.11), $a_1 = \bar{y}_1$, $a_2 = \bar{y}_2$, and $b$ is given by (6.12):

$$M = \bar{x}_1 - \bar{x}_2 - \frac{\bar{y}_1 - \bar{y}_2}{b}. \tag{7.5}$$
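A minimal sketch of (7.5), assuming that doses are expressed as common (base 10) logarithms so that the potency ratio is recovered as $10^M$. The function name and all numerical inputs are hypothetical and are not taken from any assay in the text.

```python
def log_potency(xbar1, ybar1, xbar2, ybar2, b):
    """Estimate M of the log potency ratio, following (7.5)."""
    if b == 0:
        raise ValueError("common slope is zero; the potency ratio is undefined")
    return xbar1 - xbar2 - (ybar1 - ybar2) / b


# Hypothetical summary values: mean log doses, mean responses, common slope.
M = log_potency(xbar1=0.45, ybar1=90.0, xbar2=1.20, ybar2=85.0, b=50.0)
potency_ratio = 10 ** M    # antilogarithm, assuming base-10 log doses
```

Here $M = -0.85$, so under these assumed figures the unknown would be estimated to be about 0.14 times as potent as the standard.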
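The estimated potency ratio, say $r$, is the antilogarithm of $M$. To obtain confidence limits for the logarithm of the potency ratio $\mu$, consider the variable $z$ defined as

$$z = (\bar{y}_1 - \bar{y}_2) + b[(\xi_1 - \xi_2) - (\bar{x}_1 - \bar{x}_2)]. \tag{7.6}$$

Its expected value is

$$E[z] = (\alpha_1 - \alpha_2) + \beta[(\xi_1 - \xi_2) - (\bar{x}_1 - \bar{x}_2)] = 0, \tag{7.7}$$

substituting for $\xi_1 - \xi_2$ from (7.4). The variance of $z$ is

$$V[z] = V[\bar{y}_1] + V[\bar{y}_2] + [(\xi_1 - \xi_2) - (\bar{x}_1 - \bar{x}_2)]^2 V[b] = \sigma^2 \left[ \frac{1}{n_1} + \frac{1}{n_2} + \frac{[(\xi_1 - \xi_2) - (\bar{x}_1 - \bar{x}_2)]^2}{\sum_i^2 \sum_\nu^{n_i} (x_{i\nu} - \bar{x}_i)^2} \right], \tag{7.8}$$

and, on replacing $\sigma^2$ by its estimate $s^2$,

$$\frac{(\bar{y}_1 - \bar{y}_2) + b[(\xi_1 - \xi_2) - (\bar{x}_1 - \bar{x}_2)]}{s \left\{ 1/n_1 + 1/n_2 + [(\xi_1 - \xi_2) - (\bar{x}_1 - \bar{x}_2)]^2 / \sum_i^2 \sum_\nu^{n_i} (x_{i\nu} - \bar{x}_i)^2 \right\}^{1/2}} \sim t(n_1 + n_2 - 3). \tag{7.9}$$

This is similar to (5.7), with $a$ replaced by $\bar{y}_1 - \bar{y}_2$, $\xi$ by $\xi_1 - \xi_2$, $\bar{x}$ by $\bar{x}_1 - \bar{x}_2$, $1/k$ by $1/n_1 + 1/n_2$, and $\sum (x_i - \bar{x})^2$ by $\sum_i^2 \sum_\nu^{n_i} (x_{i\nu} - \bar{x}_i)^2$, and with $\bar{y}'$ and $1/m$ omitted. Equation (7.9) can thus be handled in the same way as (5.7). If we define $g$ as

$$g = \frac{t^2 s^2}{b^2 \sum_i^2 \sum_\nu^{n_i} (x_{i\nu} - \bar{x}_i)^2}, \tag{7.10}$$

analogous to (5.11), we get as the confidence limits for $\xi_1 - \xi_2 = \mu$

$$(\bar{x}_1 - \bar{x}_2) - \frac{(\bar{y}_1 - \bar{y}_2)/b}{1 - g} \pm \frac{ts}{b(1 - g)} \left[ (1 - g) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) + \frac{(\bar{y}_1 - \bar{y}_2)^2}{b^2 \sum_i^2 \sum_\nu^{n_i} (x_{i\nu} - \bar{x}_i)^2} \right]^{1/2}, \tag{7.11}$$

where $t$ takes the probability levels $P_2$ and $P_1$ to give $P_2 - P_1$ confidence limits.

For a comprehensive review of the statistical problems in biological assay see two books by Finney [4, 5]. The first deals mainly with the awkward situation where the response at each $x$ is not a continuous variable but instead an all-or-none affair, e.g., alive or dead. At each dose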
we have a proportion $h_i$ of animals surviving. The fitting of a regression line of $h$ on $x$ is not straightforward since the $h_i$ are binomial variables with variances $\theta_i(1 - \theta_i)/n_i$ which are not constant but instead a function of $\theta_i$. In fitting the line the points $h_i$ have to be weighted inversely as their variances. Furthermore, the variances involve the $\theta_i$, which are unknown but which can be estimated from a provisional line. This will give a better line, which will give better estimates of the $\theta_i$, which give a still better line. This iterative procedure converges, but a number of theoretical problems are involved, and the calculations in practice are tedious. The second text of Finney is a comprehensive examination of all types of bioassay. See also Emmens [6].

11.8. An Example of Parallel Line Biological Assay

In an assay of an estrogenic hormone [6], three groups of rats received 0.2, 0.3, and 0.4 mg of the standard and two groups received 1 and 2.5 mg of the unknown. Table 11.4 gives a linear function of the logarithm of the weight of the rats' uteri. The $\nu$th ($\nu = 1, \ldots, n_{ij}$) observation at the $j$th level ($j = 1, \ldots, m_i$) with the $i$th preparation ($i = 1$ for the standard and $i = 2$ for the unknown) is $(x_{ij\nu}, y_{ij\nu})$. The lower part of Table 11.4 assembles certain sums, sums of squares and sums of products. We further need

$$\sum_j^{m_1}\sum_\nu^{n_{1j}} (x_{1j\nu} - \bar{x}_{1..})^2 = \sum_j^{m_1}\sum_\nu^{n_{1j}} x_{1j\nu}^2 - \frac{\left(\sum_j^{m_1}\sum_\nu^{n_{1j}} x_{1j\nu}\right)^2}{\sum_j^{m_1} n_{1j}} \tag{8.1}$$

$$= 4.175858 - \frac{(8.632)^2}{19} = 0.254205, \tag{8.2}$$

$$\sum_j^{m_1}\sum_\nu^{n_{1j}} (x_{1j\nu} - \bar{x}_{1..})(y_{1j\nu} - \bar{y}_{1..}) = \sum_j^{m_1}\sum_\nu^{n_{1j}} x_{1j\nu} y_{1j\nu} - \frac{\sum_j^{m_1}\sum_\nu^{n_{1j}} x_{1j\nu} \sum_j^{m_1}\sum_\nu^{n_{1j}} y_{1j\nu}}{\sum_j^{m_1} n_{1j}} = 794.598 - \frac{8.632 \times 1721}{19} = 12.720526. \tag{8.3}$$
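The two corrected sums in (8.2) and (8.3) can be checked directly from the responses for the standard preparation in Table 11.4. The following is a minimal sketch, not part of the original text; the dictionary layout and the function name `corrected_sums` are illustrative choices.

```python
# Standard preparation (i = 1) of Table 11.4:
# key = x = log10(dose * 10), value = responses y at that dose.
standard = {
    0.301: [73, 69, 71, 91, 80, 110],
    0.477: [77, 93, 116, 78, 87, 86, 101, 104],
    0.602: [118, 85, 105, 76, 101],
}

def corrected_sums(groups):
    """Return the corrected sum of squares of x and sum of products of x and y."""
    n = sum(len(ys) for ys in groups.values())
    sum_x = sum(x * len(ys) for x, ys in groups.items())
    sum_y = sum(sum(ys) for ys in groups.values())
    sum_xx = sum(x * x * len(ys) for x, ys in groups.items())
    sum_xy = sum(x * sum(ys) for x, ys in groups.items())
    sxx = sum_xx - sum_x ** 2 / n        # as in (8.1)-(8.2)
    sxy = sum_xy - sum_x * sum_y / n     # as in (8.3)
    return sxx, sxy

sxx, sxy = corrected_sums(standard)
print(sxx, sxy)
```

The printed values should agree with (8.2) and (8.3) up to rounding in the last decimal place.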
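Table 11.4*

| Preparation | Standard | Standard | Standard | Standard, total | Unknown | Unknown | Unknown, total |
|---|---|---|---|---|---|---|---|
| Dose, mg | 0.2 | 0.3 | 0.4 | | 1.0 | 2.5 | |
| $x = \log(\text{dose} \times 10)$ | 0.301 | 0.477 | 0.602 | | 1.000 | 1.398 | |
| $i$ | 1 | 1 | 1 | | 2 | 2 | |
| $j$ | 1 | 2 | 3 | | 1 | 2 | |
| Response $y_{ij\nu}$ | 73, 69, 71, 91, 80, 110 | 77, 93, 116, 78, 87, 86, 101, 104 | 118, 85, 105, 76, 101 | | 79, 87, 71, 78, 92, 92 | 101, 86, 105, 111, 102, 107, 102, 112 | |
| $n_{ij}$ | 6 | 8 | 5 | 19 | 6 | 8 | 14 |
| $\sum_\nu y_{ij\nu}$ | 494 | 742 | 485 | 1,721 | 499 | 826 | 1,325 |
| $\sum_\nu y_{ij\nu}^2$ | 41,912 | 70,100 | 48,151 | 160,163 | 41,863 | 85,744 | 127,607 |
| $\sum_\nu x_{ij\nu}$ | $6 \times 0.301$ | $8 \times 0.477$ | $5 \times 0.602$ | 8.632 | $6 \times 1.000$ | $8 \times 1.398$ | 17.184 |
| $\sum_\nu x_{ij\nu}^2$ | $6 \times 0.301^2$ | $8 \times 0.477^2$ | $5 \times 0.602^2$ | 4.175858 | $6 \times 1.000^2$ | $8 \times 1.398^2$ | 21.635232 |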
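Table 11.11

| Source of variance | Sums of squares | Degrees of freedom | Mean squares |
|---|---|---|---|
| Between $\bar{b}$ and $\hat{b}$ | $(\hat{b} - \bar{b})^2 \Big/ \left\{\left[\sum_i^k n_i(\bar{x}_{i.} - \bar{x}_{..})^2\right]^{-1} + \left[\sum_i^k\sum_\nu^{n_i}(x_{i\nu} - \bar{x}_{i.})^2\right]^{-1}\right\}$ | $1$ | $s_4^2$ |
| Deviations of the group means about their regression line | $\sum_i^k n_i[\bar{y}_{i.} - \{\bar{y}_{..} + \hat{b}(\bar{x}_{i.} - \bar{x}_{..})\}]^2$ | $k - 2$ | $s_3^2$ |
| Between the individual slopes $b_i$ | $\sum_i^k (b_i - \bar{b})^2 \sum_\nu^{n_i}(x_{i\nu} - \bar{x}_{i.})^2$ | $k - 1$ | $s_2^2$ |
| About the individual lines | $\sum_i^k\sum_\nu^{n_i}\{y_{i\nu} - [\bar{y}_{i.} + b_i(x_{i\nu} - \bar{x}_{i.})]\}^2$ | $\sum_i^k n_i - 2k$ | $s_1^2$ |
| About the overall line | $\sum_i^k\sum_\nu^{n_i}\{y_{i\nu} - [\bar{y}_{..} + b(x_{i\nu} - \bar{x}_{..})]\}^2$ | $\sum_i^k n_i - 2$ | |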
Each term in (14.16) has a useful interpretation.

1. $y_{i\nu} - [\bar{y}_{i.} + b_i(x_{i\nu} - \bar{x}_{i.})]$ is the deviation between $y_{i\nu}$ and the value predicted by the $i$th individual regression line (14.6). The corresponding sum of squares is the numerator of $s_1^2$ defined in (14.5).
2. $(b_i - \bar{b})(x_{i\nu} - \bar{x}_{i.})$ is the deviation between the slope of the $i$th individual line (14.6) and the parallel lines (14.8), weighted by the deviation of $x_{i\nu}$ from $\bar{x}_{i.}$.
3. $\bar{y}_{i.} - [\bar{y}_{..} + \hat{b}(\bar{x}_{i.} - \bar{x}_{..})]$ is the deviation between the mean of the $i$th group and the value predicted by the regression line for group means, (14.10).
4. From the identity (14.17) we see that the sum of squares derived from the last term in (14.16) is a function of the difference between $\hat{b}$, the regression coefficient for the group means, and $\bar{b}$, the regression coefficient for the parallel lines.

Table 11.12 gives the results of observations on three lime kilns. The variables $x$ and $y$ were daily observations on the tons of lime made per day, $x$, and a measure of the quality of the lime, $y$. The mean qualities $\bar{y}_{1.}$, $\bar{y}_{2.}$, $\bar{y}_{3.}$ (column 4) were 70.767, 69.933, and 79.514. A simple analysis of variance could be made on $y_{i\nu}$ to test the null hypothesis that there was no difference between the kilns as regards quality, but it was known that $y$ was roughly linearly related to $x$, and the tonnage means $\bar{x}_{1.}$, $\bar{x}_{2.}$, and $\bar{x}_{3.}$ (column 6) were 74.100, 69.733, and 66.541. We therefore want to compare the $y$ means adjusted for variation in the $x$ means. We need the sums of squares and products of $x_{i\nu}$ and $y_{i\nu}$, and these are given in columns 7, 8, and 9. We next calculate $\sum_\nu^{n_i}(y_{i\nu} - \bar{y}_{i.})^2$, etc., in columns 10 through 15 of Table 11.12.

The various sums of squares can be calculated as follows. The sum of squares for $s_1^2$ is the sum of the sums of squares of deviations about the separate lines. Each of these sums of squares is

$$\sum_\nu^{n_i}(y_{i\nu} - \bar{y}_{i.})^2 - \frac{\left[\sum_\nu^{n_i}(x_{i\nu} - \bar{x}_{i.})(y_{i\nu} - \bar{y}_{i.})\right]^2}{\sum_\nu^{n_i}(x_{i\nu} - \bar{x}_{i.})^2}, \tag{14.18}$$

so the sum over $i$ is

$$\sum_i^k\sum_\nu^{n_i}(y_{i\nu} - \bar{y}_{i.})^2 - \sum_i^k\left\{\frac{\left[\sum_\nu^{n_i}(x_{i\nu} - \bar{x}_{i.})(y_{i\nu} - \bar{y}_{i.})\right]^2}{\sum_\nu^{n_i}(x_{i\nu} - \bar{x}_{i.})^2}\right\}. \tag{14.19}$$
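This calculation is carried out in Table 11.12. The separate sums of squares for each line, (14.18), are tabulated in column 17, and their sum over $i$, (14.19), is given in the last row of column 17. The sum of squares for $s_2^2$ is

$$\sum_i^k (b_i - \bar{b})^2 \sum_\nu^{n_i}(x_{i\nu} - \bar{x}_{i.})^2 = \sum_i^k\left\{\frac{\left[\sum_\nu^{n_i}(x_{i\nu} - \bar{x}_{i.})(y_{i\nu} - \bar{y}_{i.})\right]^2}{\sum_\nu^{n_i}(x_{i\nu} - \bar{x}_{i.})^2}\right\} - \frac{\left[\sum_i^k\sum_\nu^{n_i}(x_{i\nu} - \bar{x}_{i.})(y_{i\nu} - \bar{y}_{i.})\right]^2}{\sum_i^k\sum_\nu^{n_i}(x_{i\nu} - \bar{x}_{i.})^2} \tag{14.20}$$

$$= [16] - \frac{[15]^2}{[13]} = 13{,}158.245 - \frac{(21{,}972.630)^2}{38{,}196.689} = 518.498,$$

where the numbers in square brackets are the column totals from Table 11.12. For the sum of squares for $s_3^2$ we need

$$\sum_i^k n_i(\bar{y}_{i.} - \bar{y}_{..})^2 = \sum_i^k \frac{\left(\sum_\nu^{n_i} y_{i\nu}\right)^2}{n_i} - \frac{\left(\sum_i^k\sum_\nu^{n_i} y_{i\nu}\right)^2}{\sum_i^k n_i} \tag{14.21}$$

$$= [10] - \frac{[3]^2}{[2]} = 604{,}246.590 - \frac{(8212)^2}{112} = 2131.019,$$

$$\sum_i^k n_i(\bar{x}_{i.} - \bar{x}_{..})(\bar{y}_{i.} - \bar{y}_{..}) = \sum_i^k \frac{\sum_\nu^{n_i} x_{i\nu}\sum_\nu^{n_i} y_{i\nu}}{n_i} - \frac{\sum_i^k\sum_\nu^{n_i} x_{i\nu}\sum_i^k\sum_\nu^{n_i} y_{i\nu}}{\sum_i^k n_i} \tag{14.22}$$

$$= [14] - \frac{[5][3]}{[2]} = 572{,}527.370 - \frac{7823 \times 8212}{112} = -1066.166,$$

$$\sum_i^k n_i(\bar{x}_{i.} - \bar{x}_{..})^2 = \sum_i^k \frac{\left(\sum_\nu^{n_i} x_{i\nu}\right)^2}{n_i} - \frac{\left(\sum_i^k\sum_\nu^{n_i} x_{i\nu}\right)^2}{\sum_i^k n_i} \tag{14.23}$$

$$= [12] - \frac{[5]^2}{[2]} = 547{,}370.311 - \frac{(7823)^2}{112} = 947.731.$$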
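The column arithmetic of Table 11.12 and the quantities (14.19) through (14.23) can be reproduced from the six raw sums per kiln. This is a sketch for checking the arithmetic only; the variable names are illustrative and do not come from the text.

```python
# Raw sums per kiln from columns 2, 3, 5, 7, 8, 9 of Table 11.12.
n      = [30, 45, 37]
sum_y  = [2123, 3147, 2942]
sum_x  = [2223, 3138, 2462]
sum_yy = [159813, 232975, 247854]
sum_xx = [176675, 233698, 175194]
sum_xy = [162518, 227882, 204100]

# Within-kiln corrected sums: columns 11, 13 and 15.
e_yy = [syy - sy ** 2 / k for syy, sy, k in zip(sum_yy, sum_y, n)]
e_xx = [sxx - sx ** 2 / k for sxx, sx, k in zip(sum_xx, sum_x, n)]
e_xy = [sxy - sx * sy / k for sxy, sx, sy, k in zip(sum_xy, sum_x, sum_y, n)]

# Column 16 and the sum of squares about the individual lines, (14.19).
col16 = [exy ** 2 / exx for exy, exx in zip(e_xy, e_xx)]
ss_s1 = sum(eyy - c for eyy, c in zip(e_yy, col16))

# Sum of squares between the individual slopes, (14.20).
ss_s2 = sum(col16) - sum(e_xy) ** 2 / sum(e_xx)

# Between-kiln sums of squares and products, (14.21)-(14.23).
total_n = sum(n)
t_yy = sum(sy ** 2 / k for sy, k in zip(sum_y, n)) - sum(sum_y) ** 2 / total_n
t_xy = (sum(sx * sy / k for sx, sy, k in zip(sum_x, sum_y, n))
        - sum(sum_x) * sum(sum_y) / total_n)
t_xx = sum(sx ** 2 / k for sx, k in zip(sum_x, n)) - sum(sum_x) ** 2 / total_n

print(ss_s1, ss_s2, t_yy, t_xy, t_xx)
```

Note that the between-kiln sum of products comes out negative: the kiln with the highest mean tonnage does not have the highest mean quality.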
Table 11.12

| (1) $i$ | (2) $n_i$ | (3) $\sum_\nu y_{i\nu}$ | (4) $\bar{y}_{i.}$ | (5) $\sum_\nu x_{i\nu}$ | (6) $\bar{x}_{i.}$ | (7) $\sum_\nu y_{i\nu}^2$ | (8) $\sum_\nu x_{i\nu}^2$ | (9) $\sum_\nu y_{i\nu}x_{i\nu}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 30 | 2123 | 70.767 | 2223 | 74.100 | 159,813 | 176,675 | 162,518 |
| 2 | 45 | 3147 | 69.933 | 3138 | 69.733 | 232,975 | 233,698 | 227,882 |
| 3 | 37 | 2942 | 79.514 | 2462 | 66.541 | 247,854 | 175,194 | 204,100 |
| Total | 112 | 8212 | | 7823 | | 640,642 | 585,567 | 594,500 |

Table 11.12 (Continued)

| $i$ | (10) $\left(\sum_\nu y_{i\nu}\right)^2/n_i$ | (11) $= [7] - [10]$, $\sum_\nu (y_{i\nu} - \bar{y}_{i.})^2$ | (12) $\left(\sum_\nu x_{i\nu}\right)^2/n_i$ | (13) $= [8] - [12]$, $\sum_\nu (x_{i\nu} - \bar{x}_{i.})^2$ | (14) $\sum_\nu x_{i\nu}\sum_\nu y_{i\nu}/n_i$ | (15) $= [9] - [14]$, $\sum_\nu (x_{i\nu} - \bar{x}_{i.})(y_{i\nu} - \bar{y}_{i.})$ | (16) $= [15]^2/[13]$ | (17) $= [11] - [16]$, sum of squares for $s_1^2$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 150,237.633 | 9,575.367 | 164,724.300 | 11,950.700 | 157,314.300 | 5,203.700 | 2,265.850 | 7,309.517 |
| 2 | 220,080.200 | 12,894.800 | 218,823.200 | 14,874.800 | 219,450.800 | 8,431.200 | 4,778.897 | 8,115.903 |
| 3 | 233,928.757 | 13,925.243 | 163,822.811 | 11,371.189 | 195,762.270 | 8,337.730 | 6,113.498 | 7,811.745 |
| Total | 604,246.590 | 36,395.410 | 547,370.311 | 38,196.689 | 572,527.370 | 21,972.630 | 13,158.245 | 23,237.165 |
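The sum of squares for $s_3^2$ is

$$\sum_i^k n_i[\bar{y}_{i.} - \{\bar{y}_{..} + \hat{b}(\bar{x}_{i.} - \bar{x}_{..})\}]^2 = \sum_i^k n_i(\bar{y}_{i.} - \bar{y}_{..})^2 - \frac{\left[\sum_i^k n_i(\bar{x}_{i.} - \bar{x}_{..})(\bar{y}_{i.} - \bar{y}_{..})\right]^2}{\sum_i^k n_i(\bar{x}_{i.} - \bar{x}_{..})^2} \tag{14.24}$$

$$= 2131.019 - \frac{(-1066.166)^2}{947.731} = 931.618.$$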
For the sum of squares for $s_4^2$ we first substitute [15] and [13] in (14.9),

$$\bar{b} = \frac{21{,}972.630}{38{,}196.689} = 0.575250,$$

and (14.22) and (14.23) in (14.11),

$$\hat{b} = \frac{-1066.166}{947.731} = -1.124967.$$

We can now calculate the sum of squares for $s_4^2$ from (14.17) as

$$(-1.124967 - 0.575250)^2\left(\frac{1}{947.731} + \frac{1}{38{,}196.689}\right)^{-1} = 2673.310.$$
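As a check on the signs involved, the two slopes and the sums of squares for $s_3^2$ and $s_4^2$ can be recomputed from the between-kiln and within-kiln quantities already found. A minimal sketch, with variable names chosen here for illustration:

```python
# Between-kiln sums of squares and products, (14.21)-(14.23).
t_xx, t_xy, t_yy = 947.731, -1066.166, 2131.019
# Within-kiln sums, totals of columns 13 and 15 of Table 11.12.
e_xx, e_xy = 38196.689, 21972.630

b_bar = e_xy / e_xx   # common slope of the parallel lines, (14.9)
b_hat = t_xy / t_xx   # slope of the line through the group means, (14.11)

ss_s3 = t_yy - t_xy ** 2 / t_xx                        # (14.24)
ss_s4 = (b_hat - b_bar) ** 2 / (1 / t_xx + 1 / e_xx)   # (14.17)

print(b_bar, b_hat, ss_s3, ss_s4)
```

Because the two slopes have opposite signs, their difference is about 1.70 in absolute value, which is what makes the sum of squares for $s_4^2$ as large as it is.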
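For the sum of squares about the overall line we first need

$$\sum_i^k\sum_\nu^{n_i}(y_{i\nu} - \bar{y}_{..})^2 = \sum_i^k\sum_\nu^{n_i} y_{i\nu}^2 - \frac{\left(\sum_i^k\sum_\nu^{n_i} y_{i\nu}\right)^2}{\sum_i^k n_i} \tag{14.25}$$

$$= [7] - \frac{[3]^2}{[2]} = 640{,}642 - \frac{(8212)^2}{112} = 38{,}526.429,$$

$$\sum_i^k\sum_\nu^{n_i}(y_{i\nu} - \bar{y}_{..})(x_{i\nu} - \bar{x}_{..}) = \sum_i^k\sum_\nu^{n_i} x_{i\nu}y_{i\nu} - \frac{\sum_i^k\sum_\nu^{n_i} x_{i\nu}\sum_i^k\sum_\nu^{n_i} y_{i\nu}}{\sum_i^k n_i} \tag{14.26}$$

$$= [9] - \frac{[5][3]}{[2]} = 594{,}500 - \frac{7823 \times 8212}{112} = 20{,}906.464,$$

and

$$\sum_i^k\sum_\nu^{n_i}(x_{i\nu} - \bar{x}_{..})^2 = \sum_i^k\sum_\nu^{n_i} x_{i\nu}^2 - \frac{\left(\sum_i^k\sum_\nu^{n_i} x_{i\nu}\right)^2}{\sum_i^k n_i} \tag{14.27}$$

$$= [8] - \frac{[5]^2}{[2]} = 585{,}567 - \frac{(7823)^2}{112} = 39{,}144.420.$$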
Then the sum of squares of deviations about the overall line is

$$\sum_i^k\sum_\nu^{n_i}(y_{i\nu} - Y_{i\nu})^2 = \sum_i^k\sum_\nu^{n_i}\{y_{i\nu} - [\bar{y}_{..} + b(x_{i\nu} - \bar{x}_{..})]\}^2 \tag{14.28}$$

$$= 38{,}526.429 - \frac{(20{,}906.464)^2}{39{,}144.420} = 27{,}360.591.$$

The sum of squares due to regression on the overall line is

$$\sum_i^k\sum_\nu^{n_i}(Y_{i\nu} - \bar{y}_{..})^2 = \frac{\left[\sum_i^k\sum_\nu^{n_i}(y_{i\nu} - \bar{y}_{..})(x_{i\nu} - \bar{x}_{..})\right]^2}{\sum_i^k\sum_\nu^{n_i}(x_{i\nu} - \bar{x}_{..})^2} \tag{14.29}$$

$$= \frac{(20{,}906.464)^2}{39{,}144.420} = 11{,}165.838.$$

Finally, the total sum of squares about the grand mean is

$$\sum_i^k\sum_\nu^{n_i}(y_{i\nu} - \bar{y}_{..})^2 = 38{,}526.429.$$
We now assemble all these sums of squares in Table 11.13. Various interpretations of Tables 11.11 and 11.13 are illustrated in Figure 11.9. In Figure 11.9a the slopes $\beta_i$ of the individual lines are different and in general not equal to the average slope $\bar{\beta}$ of the separate parallel lines. The test of the null hypothesis $\beta_i = \bar{\beta}$ is given by $s_2^2/s_1^2 = 259.249/219.219 = 1.18$, which under the null hypothesis is distributed here as $F(2, 106)$. Clearly here the null hypothesis of parallelism of the separate lines is acceptable.
Figure 11.9
If we can accept the foregoing null hypothesis we then proceed to test whether the group means can be regarded as lying on a least squares line. In Figure 11.9b they do not, whereas in Figure 11.9c they do; the test is $s_3^2/s_1^2 = 931.618/219.219 = 4.25$, which under the null hypothesis is distributed here as $F(1, 106)$. In this instance this null hypothesis is rejected. This establishes that the lines for the groups do differ in the way illustrated by Figure 11.9b.
Table 11.13

| Source of variance | Sums of squares | Degrees of freedom | Mean squares |
|---|---|---|---|
| Between $\bar{b}$ and $\hat{b}$ | 2,673.310 | 1 | $2673.310 = s_4^2$ |
| Deviations of the group means about their regression line | 931.618 | 1 | $931.618 = s_3^2$ |
| Between the individual slopes | 518.498 | 2 | $259.249 = s_2^2$ |
| About the individual lines | 23,237.165 | 106 | $219.219 = s_1^2$ |
| About the overall line | 27,360.591 | 110 | |
| Due to the overall line | 11,165.838 | 1 | |
| Total | 38,526.429 | 111 | |
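The mean squares and variance ratios of Table 11.13 follow mechanically from the sums of squares and degrees of freedom. The sketch below only illustrates that bookkeeping; the row labels are shortened and the data structure is an assumption of this example.

```python
# (label, sum of squares, degrees of freedom) for the first four rows of Table 11.13.
table = [
    ("between b_bar and b_hat", 2673.310, 1),
    ("group means about their line", 931.618, 1),
    ("between individual slopes", 518.498, 2),
    ("about individual lines", 23237.165, 106),
]

mean_squares = {label: ss / df for label, ss, df in table}
error_ms = mean_squares["about individual lines"]   # s_1^2

# Each of the other mean squares is tested against s_1^2.
f_ratios = {
    label: ms / error_ms
    for label, ms in mean_squares.items()
    if label != "about individual lines"
}

for label, ratio in f_ratios.items():
    print(f"{label}: F = {ratio:.2f}")
```

The ratios for the individual slopes and for the group means reproduce the values 1.18 and 4.25 quoted in the text; each would be referred to the $F$ distribution with the degrees of freedom shown in the table.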
If this null hypothesis had been accepted, then there would remain the possibility that though the individual lines are parallel with slope $\bar{\beta}$ and the group means do lie on a line with slope $\hat{\beta}$, yet $\bar{\beta} \neq \hat{\beta}$, as illustrated in Figure 11.9c. The alternative is that $\bar{\beta} = \hat{\beta}$, as illustrated in Figure 11.9d, and this implies that a single line, the overall regression line, is an adequate fit to all groups. The test for this in Tables 11.11 and 11.13 is $s_4^2/s_1^2$.
In some forms of analysis of covariance, the parallelism of the separate lines is assumed, so that $s_1^2$ and $s_2^2$ are pooled, and also the null hypotheses tested by $s_4^2$ and $s_3^2$ are tested jointly, by pooling these. This will be discussed in the next section.

If we accept the null hypothesis that the individual lines are parallel, we can construct the adjusted means as follows. The individual lines with common slope (14.8) are

$$Y_{i\nu} = \bar{y}_{i.} + \bar{b}(x_{i\nu} - \bar{x}_{i.}), \tag{14.30}$$

and the $y$ mean adjusted for $x$ equal to $\bar{x}_{..}$ is

$$\bar{y}_{i(\bar{x})} = \bar{y}_{i.} + \bar{b}(\bar{x}_{..} - \bar{x}_{i.}). \tag{14.31}$$

For example, $\bar{x}_{..} = 7823/112 = 69.848$, and

$$\bar{y}_{1(\bar{x})} = 70.767 + 0.57525(69.848 - 74.100) = 68.321.$$

The variance between two adjusted means is

$$V[\bar{y}_{i(\bar{x})} - \bar{y}_{i'(\bar{x})}] = V[\bar{y}_{i.} + \bar{b}(\bar{x}_{..} - \bar{x}_{i.}) - \bar{y}_{i'.} - \bar{b}(\bar{x}_{..} - \bar{x}_{i'.})] = V[\bar{y}_{i.}] + V[\bar{y}_{i'.}] + (\bar{x}_{i.} - \bar{x}_{i'.})^2\, V[\bar{b}], \tag{14.32}$$

where $V[\bar{b}]$ is given by a formula analogous to (6.13). For example, the estimated variance between the adjusted means for kilns 1 and 2 is

$$\hat{V}[\bar{y}_{1(\bar{x})} - \bar{y}_{2(\bar{x})}] = 219.219\left[\frac{1}{30} + \frac{1}{45} + \frac{(74.100 - 69.733)^2}{38{,}196.689}\right] = 12.29,$$
and the confidence limits can be immediately constructed.

Table 11.14

| Source of variance | Sums of squares | Degrees of freedom | Mean squares |
|---|---|---|---|
| Between kilns | 2,131.019 | 2 | 1065.509 |
| Within kilns | 36,395.410 | 109 | 333.903 |
| Total | 38,526.429 | 111 | |
It is natural to inquire whether the analysis of covariance of these data has achieved any advantages over a simple analysis of variance of $y$. From the calculations given, we can readily assemble Table 11.14. The variance ratio is 3.191, which can be compared with 8.222 obtained from Table 11.13 by pooling $s_4^2$ and $s_3^2$. We can also compute the variance between the unadjusted means as

$$\hat{V}[\bar{y}_{1.} - \bar{y}_{2.}] = 333.903\left(\frac{1}{30} + \frac{1}{45}\right) = 18.550,$$

which is some 50 per cent larger than the variance between the adjusted means.
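The adjusted means (14.31) for all three kilns, and the comparison of the two variances for kilns 1 and 2, can be worked through as follows. This is an illustrative sketch; only the numbers are taken from the text, and the variable names are assumptions of this example.

```python
n = [30, 45, 37]
y_bar = [70.767, 69.933, 79.514]   # column 4 of Table 11.12
x_bar = [74.100, 69.733, 66.541]   # column 6 of Table 11.12
b_bar = 0.57525                    # common slope, from (14.9)
e_xx = 38196.689                   # total of column 13
s1_sq = 219.219                    # mean square about the individual lines
within_ms = 333.903                # within-kilns mean square, Table 11.14

x_grand = 7823 / 112

# Adjusted means, (14.31): move each kiln mean along the common slope to x_grand.
adjusted = [y + b_bar * (x_grand - x) for y, x in zip(y_bar, x_bar)]

# Estimated variance of the difference between kilns 1 and 2.
var_adjusted = s1_sq * (1 / n[0] + 1 / n[1] + (x_bar[0] - x_bar[1]) ** 2 / e_xx)
var_unadjusted = within_ms * (1 / n[0] + 1 / n[1])

print(adjusted)
print(var_adjusted, var_unadjusted, var_unadjusted / var_adjusted)
```

The last ratio is about 1.5, which is the "some 50 per cent larger" of the text.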
11.15. Simple Analysis of Covariance

Quite frequently the analysis of the previous section is simplified. Two changes are made:

1. The sums of squares and degrees of freedom for $s_4^2$ and $s_3^2$ in Table 11.11 are combined into a single line measuring differences between the adjusted $y$ means.
2. The test for parallelism of the individual slopes involving $s_2^2$ is omitted.

The effect of these two changes is to simplify the arithmetic considerably. This modified analysis starts from a table of sums of squares and products (Table 11.15).

Table 11.15

| Source of variance | Degrees of freedom | Sums of squares and products for $x^2$ | for $xy$ | for $y^2$ |
|---|---|---|---|---|
| Groups | $k - 1$ | $\sum_i^k n_i(\bar{x}_{i.} - \bar{x}_{..})^2 = T_{xx}$ | $\sum_i^k n_i(\bar{x}_{i.} - \bar{x}_{..})(\bar{y}_{i.} - \bar{y}_{..}) = T_{xy}$ | $\sum_i^k n_i(\bar{y}_{i.} - \bar{y}_{..})^2 = T_{yy}$ |
| Error | $\sum_i^k n_i - k$ | $\sum_i^k\sum_\nu^{n_i}(x_{i\nu} - \bar{x}_{i.})^2 = E_{xx}$ | $\sum_i^k\sum_\nu^{n_i}(x_{i\nu} - \bar{x}_{i.})(y_{i\nu} - \bar{y}_{i.}) = E_{xy}$ | $\sum_i^k\sum_\nu^{n_i}(y_{i\nu} - \bar{y}_{i.})^2 = E_{yy}$ |
| Total | $\sum_i^k n_i - 1$ | $\sum_i^k\sum_\nu^{n_i}(x_{i\nu} - \bar{x}_{..})^2 = S_{xx}$ | $\sum_i^k\sum_\nu^{n_i}(x_{i\nu} - \bar{x}_{..})(y_{i\nu} - \bar{y}_{..}) = S_{xy}$ | $\sum_i^k\sum_\nu^{n_i}(y_{i\nu} - \bar{y}_{..})^2 = S_{yy}$ |
To pool $s_4^2$ and $s_3^2$ in Table 11.11 involves pooling the degrees of freedom, $1 + (k - 2) = k - 1$, and the corresponding sums of squares. Call the pooled mean square $s_6^2$. Then the corresponding sum of squares will be $(k - 1)s_6^2$. Straightforward manipulation gives for the sum of squares for the adjusted $y$ means

$$(k - 1)s_6^2 = T_{yy} - \frac{(E_{xy} + T_{xy})^2}{E_{xx} + T_{xx}} + \frac{E_{xy}^2}{E_{xx}}, \tag{15.1}$$

changing from the notation of Table 11.11 to that of Table 11.15. To dispense with the test for parallelism of the individual lines amounts to assuming that the individual lines are parallel, when $s_2^2$ of Table 11.11 will have expected value $\sigma^2$. Pooling the degrees of freedom of $s_2^2$ and $s_1^2$ gives

$$(k - 1) + \left(\sum_i^k n_i - 2k\right) = \sum_i^k n_i - k - 1. \tag{15.2}$$

Straightforward manipulation gives the error sum of squares as

$$\left(\sum_i^k n_i - k - 1\right)s_5^2 = E_{yy} - \frac{E_{xy}^2}{E_{xx}}, \tag{15.3}$$

changing from the notation of Table 11.11 to that of Table 11.15. Under the null hypothesis that the adjusted means are equal, $s_6^2/s_5^2$ will be distributed as $F\left(k - 1, \sum_i^k n_i - k - 1\right)$.
To apply this simplified procedure to the data of the example of the previous section, Table 11.16 gives the numerical values corresponding to the data of Table 11.12. The $T$'s and the $S$'s have been calculated earlier; the $E$'s are most easily obtained by difference. Using (15.1), the sum of squares for adjusted $y$ means is

$$2s_6^2 = 2131.019 - \frac{(20{,}906.464)^2}{39{,}144.420} + \frac{(21{,}972.630)^2}{38{,}196.689} = 3604.927,$$

so $s_6^2 = 1802.463$. It will be noted that, as a check, the sum of the sums of squares for $s_4^2$ and $s_3^2$ in Table 11.13 is $2673.310 + 931.618 = 3604.928$. Using (15.3), the sum of squares for error is

$$(112 - 3 - 1)s_5^2 = 36{,}395.410 - \frac{(21{,}972.630)^2}{38{,}196.689} = 23{,}755.664,$$

so $s_5^2 = 219.960$. It will be noted, as a check, that the sum of the sums of squares for $s_2^2$ and $s_1^2$ in Table 11.13 is $518.498 + 23{,}237.165 = 23{,}755.663$.
Table 11.16

| Source of variance | Degrees of freedom | Sums of squares and sum of products of deviations for $x^2$ | for $xy$ | for $y^2$ |
|---|---|---|---|---|
| Groups | 2 | $T_{xx} = 947.731$ | $T_{xy} = -1{,}066.166$ | $T_{yy} = 2{,}131.019$ |
| Error | 109 | $E_{xx} = 38{,}196.689$ | $E_{xy} = 21{,}972.630$ | $E_{yy} = 36{,}395.410$ |
| Total | 111 | $S_{xx} = 39{,}144.420$ | $S_{xy} = 20{,}906.464$ | $S_{yy} = 38{,}526.429$ |
Hence the test of the null hypothesis of equality of the adjusted means is

$$\frac{s_6^2}{s_5^2} = \frac{1802.463}{219.960} = 8.19 \sim F(2, 108).$$
The arithmetic necessary for this simplified analysis of covariance is substantially less complex than in the full analysis of the previous section. Nevertheless, the full analysis would ordinarily be indicated except in routine applications where a substantial body of previous experience had established that the separate lines, if not truly parallel, at least must be very nearly so. Nonparallelism of the lines is an important feature and should not be overlooked. The interpretation of an analysis of covariance can prove remarkably tricky, and a paper by Fairfield Smith [7] should be studied in this context. For a general discussion see Cochran [8].
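The whole simplified analysis reduces to a few lines of arithmetic on the entries of Table 11.16. The following is a hedged sketch of (15.1) through (15.3) for the lime kiln data; the variable names are illustrative and not the author's.

```python
k, n_total = 3, 112

# Table 11.16: group (T) and error (E) sums of squares and products.
t_xx, t_xy, t_yy = 947.731, -1066.166, 2131.019
e_xx, e_xy, e_yy = 38196.689, 21972.630, 36395.410

# Sum of squares for the adjusted y means, (15.1).
ss_adjusted = t_yy - (e_xy + t_xy) ** 2 / (e_xx + t_xx) + e_xy ** 2 / e_xx
# Error sum of squares, (15.3), with the degrees of freedom of (15.2).
ss_error = e_yy - e_xy ** 2 / e_xx
df_adjusted = k - 1
df_error = n_total - k - 1

ms_adjusted = ss_adjusted / df_adjusted   # pooled mean square for adjusted means
ms_error = ss_error / df_error            # pooled error mean square
f_ratio = ms_adjusted / ms_error

print(ss_adjusted, ss_error, f_ratio)
```

The ratio would then be referred to $F(2, 108)$, as in the text. The sketch takes parallelism for granted, so it inherits the caution stated above about skipping the test on the individual slopes.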
11.16. Exponential Regression

Suppose that our model is

$$\eta = \alpha e^{\beta x}. \tag{16.1}$$

The sum of squares of deviations between the observations $y_i$ and the predictions $Y_i$ given by the estimated equation $Y = ae^{bx}$ is

$$R = \sum_i (y_i - ae^{bx_i})^2. \tag{16.2}$$

Differentiating this with respect to $a$ and $b$ and equating to zero gives

$$\frac{\partial R}{\partial a} = -2\sum_i y_i e^{bx_i} + 2a\sum_i e^{2bx_i} = 0, \tag{16.3}$$

$$\frac{\partial R}{\partial b} = -2a\sum_i x_i y_i e^{bx_i} + 2a^2\sum_i x_i e^{2bx_i} = 0, \tag{16.4}$$

whence we have the two simultaneous equations

$$\sum_i y_i e^{bx_i} = a\sum_i e^{2bx_i}, \tag{16.5}$$

$$\sum_i x_i y_i e^{bx_i} = a\sum_i x_i e^{2bx_i}, \tag{16.6}$$

in the two unknowns $a$ and $b$. An exact solution can only be approximated to by a tedious iterative procedure. An alternative approach is to take logarithms of (16.1):

$$\log \eta = \log \alpha + \beta x, \tag{16.7}$$

and obtain a least squares solution for $\log \alpha$ and $\beta$ by minimizing

$$R = \sum_i (\log y_i - \log Y_i)^2 \tag{16.8}$$

in the usual way, i.e., handle the problem as if it were to regress $\log y$ on $x$. The use of $\log y$ in place of $y$ means that we are minimizing the sums of squares of deviations of $\log y$ from $\log Y$ instead of $y$ from $Y$; so we will obtain a different solution. Also, if $V[y|x] = \sigma^2$, a constant, then $V[\log y|x]$ will not be a constant, and the least squares analysis based on $\log y$ will be incorrect. Quite often, however, $V[y|x] = k^2\eta^2$; i.e., the standard deviation is proportional to the mean, and then, as discussed in Section 3.3, $\log y$ will have a constant variance. In these circumstances, then, we will be correct in regressing $\log y$ on $x$.

11.17. Regression with Error in the Independent Variable

Throughout this chapter we have assumed that the independent variable $x$ was observed without error. We have further supposed that $y$ was
distributed normally about $\eta$ with variance $\sigma^2$, and it is irrelevant whether this variation in $y$ is in some sense genuine or represents measurement error. For example, if we were measuring the current $y$ in a circuit for various values of the applied voltage $x$, then presumably variation in $y$ is largely measurement error. On the other hand, if we were measuring the weights of hogs $y$ fed various quantities of a ration supplement $x$, then the larger part of the variation in $y$ for a fixed $x$ would represent genuine variation in the hogs' weight. The theory of this chapter is the same whether the variation in $y$ is measurement error or genuine variation.

The assumption that $x$ is known without error is, however, fundamental. Suppose that there is a true linear relation

$$v = \alpha + \beta u, \tag{17.1}$$

where $\alpha$ and $\beta$ are constants. Suppose that we obtain the various values of $x$ in the following manner. The controls of the plant are set to bring $u$ into roughly the desired region, and we then measure $u$ with error $d$, independent of $u$, so that we record the measurement as $x'$ where

$$x' = u + d. \tag{17.2}$$

The variable $v$ is observed and measured with error $e$, independent of $v$, so that we record

$$y' = v + e. \tag{17.3}$$

Thus $v = y' - e$, $u = x' - d$. Substituting in (17.1),

$$y' = \alpha + \beta x' + (e - \beta d). \tag{17.4}$$
This appears at first sight to be a standard regression model, $y'$ being given by a linear relation $\alpha + \beta x'$ plus a random error $e - \beta d$. However, in standard regression analysis it is assumed that the random error is independent of the independent variable. In the present case this condition is not satisfied. We demonstrate this as follows:

$$\operatorname{Cov}[x', e - \beta d] = E[(u + d)(e - \beta d)] - E[u + d]\,E[e - \beta d]$$

$$= E[ue] + E[de] - \beta E[ud] - \beta E[d^2] - \{E[u] + E[d]\}\{E[e] - \beta E[d]\}. \tag{17.5}$$

Now we are assuming $d$ to be independent of $u$ and of $e$, and $e$ to be independent of $u$, and $d$ and $e$ to have zero expectations. It follows that

$$\operatorname{Cov}[x', e - \beta d] = -\beta E[d^2] = -\beta V[d] \neq 0, \tag{17.6}$$

and therefore $x'$ and $e - \beta d$ are not independent. For reviews of what can be done in this situation see Madansky [9] and Keeping [10].
Berkson [11] pointed out that if we operate the controls to bring a gauge recording $u$ to a selected value $x'$, it is true that due to random errors in the gauge, say $d$, we do not get the plant set at $x'$ but instead at

$$u = x' - d. \tag{17.7}$$

If we substitute this value of $u$ and $v = y' - e$ from (17.3) in (17.1), on rearranging we get

$$y' = \alpha + \beta x' + (e - \beta d). \tag{17.8}$$

This is a standard linear regression situation, since $x'$ is a fixed, chosen variable and the error term $e - \beta d$ is independent of $x'$. Hence our sample estimate of $\beta$ will be unbiased. For a review of Berkson's model, see Lindley [12], and for a generalization, see Scheffé [13].
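The contrast between the two situations can be illustrated by a small simulation. The sketch below is not from the text: the parameter values, the uniform distribution for $u$, the five gauge settings, and the function names are all assumptions made only for illustration.

```python
import random

def ols_slope(xs, ys):
    """Least squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)

def simulate(n=40000, alpha=1.0, beta=2.0, sd_d=1.0, sd_e=1.0, seed=12345):
    rng = random.Random(seed)
    levels = [0.0, 2.5, 5.0, 7.5, 10.0]   # hypothetical chosen gauge settings
    x_cl, y_cl, err_cl = [], [], []
    x_bk, y_bk = [], []
    for _ in range(n):
        # Situation of (17.2): u arises first and x' = u + d is what we record.
        u = rng.uniform(0.0, 10.0)
        d = rng.gauss(0.0, sd_d)
        e = rng.gauss(0.0, sd_e)
        x_cl.append(u + d)
        y_cl.append(alpha + beta * u + e)
        err_cl.append(e - beta * d)
        # Situation of (17.7): x' is chosen and the plant settles at u = x' - d.
        x_set = rng.choice(levels)
        d = rng.gauss(0.0, sd_d)
        e = rng.gauss(0.0, sd_e)
        x_bk.append(x_set)
        y_bk.append(alpha + beta * (x_set - d) + e)
    return {
        "cov_classical": covariance(x_cl, err_cl),
        "slope_classical": ols_slope(x_cl, y_cl),
        "slope_berkson": ols_slope(x_bk, y_bk),
    }

result = simulate()
print(result)
```

With these assumed settings the sample covariance in the first situation should come out close to $-\beta V[d] = -2$, in line with (17.6), and the fitted slope falls below $\beta$, whereas in Berkson's situation the fitted slope should be close to $\beta = 2$.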
EXERCISES

11.1. The solubility of nitrous oxide in nitrogen dioxide was determined with results as below for temperature ranging from 263 to 283 degrees absolute. The reciprocal temperature is expressed as $1000/T$. A number of independent determinations were made at each temperature. [Data from W. Arthur Rocker, "Solubility and freezing point depression of nitrous oxide in liquid nitrogen dioxide," Analytical Chemistry, 24 (1952), 1322-1324.]

| Reciprocal temperature | 3.801 | 3.731 | 3.662 | 3.593 | 3.533 |
|---|---|---|---|---|---|
| Solubility, % by weight | 1.28, 1.33, 1.52 | 1.21, 1.27 | 1.11, 1.04 | 0.81, 0.82 | 0.65, 0.59, 0.63 |

(a) Fit a straight line of the regression of solubility on reciprocal temperature. (b) Test this line for deviations from linearity. (c) If the hypothesis of linearity is acceptable, form a pooled estimate of the variance of the solubility measurements. (d) What is your estimate of the true solubility for a reciprocal temperature of 3.78? (e) What are the 95 per cent confidence limits for this estimate? (f) What are the 95 per cent confidence limits for the slope of the line? (g) Suppose you took a sample and found its solubility to be 1.30. Between what limits would you have 95 per cent confidence that the reciprocal temperature lay (assuming the sample to be saturated)? Use (i) an exact method, (ii) an approximate method.

11.2. The data below are similar to those of Table 11.9 but on a sample of a different make of automobile.

| $x_i$ | 19.65 | 31.15 | 35.95 | 50.15 | 59.65 |
|---|---|---|---|---|---|
| $y_i$ | 3.44, 3.93 | 4.98, 5.45 | 6.40 | 8.88 | 11.22 |
I. (a) Obtain the regression line of $y$ on $x$. (b) Test the null hypothesis of linearity for this line. (c) Extrapolate the line to $x = 0$, and construct 95 per cent confidence limits for $\eta$ at this value of $x$.

II. Compare this line with that for the data of Table 11.9. (a) Compare the variances about the regression lines. (b) Compare the slopes. (c) Test whether the lines can be regarded as coincident.

11.3. In a comparison of a new method of gas analysis with a standard method, the following results were obtained on a series of samples:

| Standard method ($x$) | 2.97 | 3.56 | 6.45 | 1.12 | 6.66 | 1.37 | 6.80 |
|---|---|---|---|---|---|---|---|
| New method ($y$) | 2.94 | 3.54 | 6.48 | 1.08 | 6.73 | 1.33 | 6.86 |

Data from Jere Mead, "A critical orifice CO2 analysis suitable for student use," Science, 121 (1955), 103-104. Fit the line $y = a + b(x - \bar{x})$, and test at the level of significance $\alpha = 0.05$ the null hypotheses (a) $\beta = 1$; (b) the intercept on the $y$ axis equals zero; (c) items (a) and (b) jointly.

11.4. In an assay of a preparation of insulin of unknown potency against a standard, three doses of the unknown and three doses of the standard were injected into rabbits, and the percentage fall in blood sugar after a certain period was observed; these are the data below. In insulin assay of this type, it is usually assumed that the plot of the above variable against log dose is a straight line. (a) Test the parallelism of the two lines. (b) Calculate the point estimate of the potency ratio. (c) Calculate 95 per cent confidence limits for the potency ratio.

| Preparation | Log dose | Percentage fall in blood sugar |
|---|---|---|
| Standard | 0.36 | 17, 21, 49, 54 |
| Standard | 0.56 | 64, 48, 34, 63 |
| Standard | 0.76 | 62, 72, 61, 91 |
| Unknown | 0.36 | 33, 37, 40, 16, 21, 18, 25 |
| Unknown | 0.56 | 41, 64, 34, 64, 48, 34 |
| Unknown | 0.76 | 56, 62, 57, 72, 73, 72, 81, 60 |

11.5. A sample of 56 subjects were given a test involving mental addition, $x$. They were divided into four groups randomly, and each group drank one of four beverages, which could not be distinguished by the subject. After a short time
interval to allow the drugs in the beverages to act, the subjects were retested by a replicate of the first test, $y$. The scores are as below (data from H. Nash).

| Drug A, $x$ | Drug A, $y$ | Drug B, $x$ | Drug B, $y$ | Drug C, $x$ | Drug C, $y$ | Drug D, $x$ | Drug D, $y$ |
|---|---|---|---|---|---|---|---|
| 24 | 24 | 23 | 18 | 27 | 35 | 27 | 28 |
| 28 | 30 | 33 | 32 | 27 | 31 | 44 | 40 |
| 38 | 39 | 39 | 33 | 44 | 55 | 39 | 34 |
| 42 | 41 | 36 | 35 | 38 | 43 | 27 | 27 |
| 24 | 27 | 18 | 19 | 32 | 44 | 59 | 47 |
| 39 | 46 | 28 | 28 | 32 | 28 | 36 | 30 |
| 45 | 56 | 43 | 41 | 24 | 33 | 19 | 24 |
| 19 | 25 | 37 | 37 | 13 | 13 | 34 | 28 |
| 19 | 18 | 30 | 33 | 39 | 39 | 22 | 21 |
| 22 | 25 | 49 | 39 | 52 | 58 | 28 | 28 |
| 34 | 31 | 37 | 38 | 17 | 18 | 39 | 39 |
| 52 | 52 | 40 | 41 | 20 | 17 | 29 | 26 |
| 27 | 38 | 36 | 38 | 49 | 41 | 55 | 46 |
| 42 | 45 | 41 | 36 | 29 | 25 | 49 | 42 |

I. Make a simple analysis of variance on the $y$'s.

II. Make a simple analysis of variance on the differences $y - x$.

III. (a) Make an analysis of covariance of $y$, including a test for the parallelism of the separate regression lines. (b) Discuss the difference between I, II, and III(a). (c) Construct a simple (non-multiple-comparison) 95 per cent confidence interval for the difference between the adjusted means for drugs A and B. (d) B is a placebo and C and D are two levels of a certain drug, where the level of D is twice the level of C. The contrast $B - 2C + D$ will measure the linearity of response. Construct a 95 per cent multiple comparison confidence interval for this contrast.

Certain sums and sums of squares and products which may be useful are below.

| $i$ | $n_i$ | $\sum_\nu y_{i\nu}$ | $\sum_\nu x_{i\nu}$ | $\sum_\nu y_{i\nu}^2$ | $\sum_\nu x_{i\nu}^2$ | $\sum_\nu x_{i\nu}y_{i\nu}$ |
|---|---|---|---|---|---|---|
| 1 | 14 | 497 | 455 | 19,367 | 16,249 | 17,623 |
| 2 | 14 | 468 | 490 | 16,332 | 18,008 | 17,066 |
| 3 | 14 | 480 | 443 | 18,842 | 15,787 | 17,018 |
| 4 | 14 | 460 | 507 | 16,040 | 20,265 | 17,941 |
| $\sum_i^k$ | 56 | 1905 | 1895 | 70,581 | 70,309 | 69,648 |
REFERENCES

1. Acton, Forman S., Analysis of Straight-Line Data. New York: John Wiley and Sons, 1959.
2. David, F. N., and J. Neyman, "Extension of the Markov Theorem of Least Squares," Statistical Research Memoirs, 2 (1938), 105-116.
3. Plackett, R. L., "A Historical Note on Least Squares," Biometrika, 36 (1949), 458-60.
4. Finney, D. J., Probit Analysis. 2nd ed.; London: Cambridge University Press, 1952.
5. Finney, D. J., Statistical Methods in Biological Assay. New York: Hafner Publishing Co., 1952.
6. Emmens, C. W., Principles of Biological Assay. London: Chapman and Hall, 1948.
7. Smith, H. Fairfield, "Interpretation of Adjusted Treatment Means and Regressions in Analysis of Covariance," Biometrics, 13 (1957), 282-308.
8. Cochran, William G., "Analysis of Covariance: Its Nature and Uses," Biometrics, 13 (1957), 261-81.
9. Madansky, Albert, "The Fitting of Straight Lines When Both Variables are Subject to Error," Journal of the American Statistical Association, 54 (1959), 173-205.
10. Keeping, E. S., Introduction to Statistical Inference. Princeton, N.J.: D. Van Nostrand, 1962.
11. Berkson, J., "Are There Two Regressions?" Journal of the American Statistical Association, 45 (1950), 164-80.
12. Lindley, D. V., "Estimation of a Functional Relationship," Biometrika, 40 (1953), 47-49.
13. Scheffé, H., "Fitting Straight Lines When One Variable Is Controlled," Journal of the American Statistical Association, 53 (1958), 106-117.
CHAPTER 12

The Bivariate Normal Distribution and the Correlation Coefficient
12.1. Introduction

In the preceding chapter we supposed that we observed $y$ at a number of selected values of $x$. The values of $x$ used were what we cared to make them, and $x$ was not a random variable. In this chapter we discuss the case where both $x$ and $y$ are random variables, drawn from some hypothetical population. For example, $x$ could be the girth of a hog and $y$ its marketable weight; or $x$ could be the "intelligence" of a brother and $y$ the "intelligence" of his sister; or $x$ could be the grade average of members of a class and $y$ their income in dollars after Federal income taxes, 10 years later. In every case care should be taken to define quite precisely the population from which the sample is taken, and to recognize that any inferences possible are strictly applicable only to that population.

12.2. Transformations of Bivariate Distributions

In Section 1.14 we saw how to obtain the distribution of a function of $x$, given the distribution of $x$. In this section we do the analogous thing for a bivariate distribution. We are given a bivariate distribution with the probability density determined by the function $p_x(x_1, x_2)$. We are also given that $y_1$ is some function of $x_1, x_2$, namely, $y_1 = f_1(x_1, x_2)$, and likewise $y_2$ is another function of $x_1, x_2$, namely, $y_2 = f_2(x_1, x_2)$. We assume that these functions have continuous first partial derivatives, and that to each point in the $(x_1, x_2)$ plane there is one and only one point in the $(y_1, y_2)$ plane and vice versa. Let the inverse functions be $x_1 = g_1(y_1, y_2)$, $x_2 = g_2(y_1, y_2)$.
Now, if $x_1, x_2$ are random variables, then $y_1, y_2$ will be random variables. Our problem is to determine $p_y(y_1, y_2)$, the joint probability density function of $y_1, y_2$, given that we know $p_x(x_1, x_2)$ and the functions $f_1$ and $f_2$.

Figure 12.1
Consider the area $R$ in the $(y_1, y_2)$ plane defined by the four lines on which the first coordinate equals $y_1$ or $y_1 + dy_1$ and the second coordinate equals $y_2$ or $y_2 + dy_2$ (Figure 12.1). The probability that a random point falls in the rectangle $R$ is approximately equal to the product of the probability density $p_y(y_1, y_2)$ and the area $dy_1\,dy_2$:

$$\Pr\{(y_1, y_2) \text{ in } R\} \simeq p_y(y_1, y_2)\,dy_1\,dy_2. \tag{2.2}$$

Now, for a fixed $y_1$, the equation $y_1 = f_1(x_1, x_2)$ will determine a line in the $(x_1, x_2)$ plane, say $A$. Similarly the equations $y_1 + dy_1 = f_1(x_1, x_2)$, $y_2 = f_2(x_1, x_2)$, $y_2 + dy_2 = f_2(x_1, x_2)$ will determine three more lines $B$, $C$, and $D$ (Figure 12.2). These four lines will enclose an area $S$. Now, because of the one-to-one correspondence of points $(y_1, y_2)$ with points $(x_1, x_2)$, whenever a random point $(y_1, y_2)$ falls inside the rectangle $R$ in Figure 12.1, the corresponding point $(x_1, x_2)$ falls inside the figure $S$ in Figure 12.2. The probability of $(x_1, x_2)$ falling inside $S$ is approximately equal to the product of the probability density $p_x(x_1, x_2)$ times the area of $S$; hence

$$p_y(y_1, y_2)\,dy_1\,dy_2 \simeq p_x(x_1, x_2) \times (\text{area of } S). \tag{2.3}$$
We therefore need to find the area of $S$. $S$ is approximately a parallelogram. It is known from coordinate geometry that, if $(x_1, x_2)$, $(x_1', x_2')$, and $(x_1'', x_2'')$ are three of the vertices of a parallelogram, then the area of the parallelogram is given by the absolute value of the determinant

$$\begin{vmatrix} 1 & x_1 & x_2 \\ 1 & x_1' & x_2' \\ 1 & x_1'' & x_2'' \end{vmatrix}. \tag{2.4}$$
We therefore need the coordinates of three of the vertices of $S$.

Figure 12.2

To obtain the coordinates of $P_1$, the intersection of the lines $A$ and $C$, we note that along the line $A$ we have values of $(x_1, x_2)$ satisfying the equation $y_1 = f_1(x_1, x_2)$ and along the line $C$ we have values of $(x_1, x_2)$ satisfying the equation $y_2 = f_2(x_1, x_2)$. At the intersection of these two lines these two equations must be satisfied simultaneously; i.e., the coordinates of $P_1$ are the solutions for $x_1, x_2$ in

$$y_1 = f_1(x_1, x_2), \qquad y_2 = f_2(x_1, x_2). \tag{2.5}$$

But we know that when the transformed variables take the values $y_1$, $y_2$,

$$x_1 = g_1(y_1, y_2), \qquad x_2 = g_2(y_1, y_2), \tag{2.6}$$
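so these $x$ values are the coordinates of $P_1$. Similarly, the point $P_2$ is the intersection of the lines $B$ and $C$, and along the line $B$ we have values of $(x_1, x_2)$ satisfying the equation $y_1 + dy_1 = f_1(x_1, x_2)$, so the coordinates of $P_2$ are the solutions for $x_1$, $x_2$ in
$$y_1 + dy_1 = f_1(x_1, x_2), \qquad y_2 = f_2(x_1, x_2), \tag{2.7}$$
and we know that when $y_1 = y_1 + dy_1$, $y_2 = y_2$,
$$x_1 = g_1(y_1 + dy_1, y_2), \qquad x_2 = g_2(y_1 + dy_1, y_2), \tag{2.8}$$
so these $x$ values are the coordinates of $P_2$. But, by Taylor series,
$$g_1(y_1 + dy_1, y_2) = g_1(y_1, y_2) + \frac{\partial g_1}{\partial y_1}\,dy_1 + \text{higher-order terms}; \tag{2.9}$$
$$g_2(y_1 + dy_1, y_2) = g_2(y_1, y_2) + \frac{\partial g_2}{\partial y_1}\,dy_1 + \text{higher-order terms}; \tag{2.10}$$
so the coordinates of $P_2$ can be written, ignoring the higher-order terms, as approximately
$$\left( g_1(y_1, y_2) + \frac{\partial g_1}{\partial y_1}\,dy_1,\; g_2(y_1, y_2) + \frac{\partial g_2}{\partial y_1}\,dy_1 \right). \tag{2.11}$$
Similarly, the coordinates of $P_3$ are approximately
$$\left( g_1(y_1, y_2) + \frac{\partial g_1}{\partial y_2}\,dy_2,\; g_2(y_1, y_2) + \frac{\partial g_2}{\partial y_2}\,dy_2 \right). \tag{2.12}$$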
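Substituting these three coordinates in (2.4), we get for the area of $S$ the absolute value of
$$\begin{vmatrix} 1 & g_1(y_1, y_2) & g_2(y_1, y_2) \\[4pt] 1 & g_1(y_1, y_2) + \dfrac{\partial g_1}{\partial y_1}\,dy_1 & g_2(y_1, y_2) + \dfrac{\partial g_2}{\partial y_1}\,dy_1 \\[4pt] 1 & g_1(y_1, y_2) + \dfrac{\partial g_1}{\partial y_2}\,dy_2 & g_2(y_1, y_2) + \dfrac{\partial g_2}{\partial y_2}\,dy_2 \end{vmatrix}. \tag{2.13}$$
Expanding this determinant (subtract the first row from each of the others) gives
$$\left( \frac{\partial g_1}{\partial y_1}\frac{\partial g_2}{\partial y_2} - \frac{\partial g_2}{\partial y_1}\frac{\partial g_1}{\partial y_2} \right) dy_1\,dy_2 = \begin{vmatrix} \dfrac{\partial g_1}{\partial y_1} & \dfrac{\partial g_2}{\partial y_1} \\[6pt] \dfrac{\partial g_1}{\partial y_2} & \dfrac{\partial g_2}{\partial y_2} \end{vmatrix} dy_1\,dy_2; \tag{2.14}$$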
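substituting this for the area of $S$ in (2.3) gives
$$p_Y(y_1, y_2)\,dy_1\,dy_2 = p_X(x_1, x_2) \begin{vmatrix} \dfrac{\partial g_1}{\partial y_1} & \dfrac{\partial g_2}{\partial y_1} \\[6pt] \dfrac{\partial g_1}{\partial y_2} & \dfrac{\partial g_2}{\partial y_2} \end{vmatrix} dy_1\,dy_2 \tag{2.15}$$
and
$$p_Y(y_1, y_2) = p_X(x_1, x_2) \begin{vmatrix} \dfrac{\partial g_1}{\partial y_1} & \dfrac{\partial g_2}{\partial y_1} \\[6pt] \dfrac{\partial g_1}{\partial y_2} & \dfrac{\partial g_2}{\partial y_2} \end{vmatrix}, \tag{2.16}$$
where the absolute value of the determinant is used.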
12.3. The Bivariate Normal Distribution
In this section we shall postulate a certain model which will lead us to the bivariate normal distribution. Let $x_1$, $x_2$ be independent normally distributed variables with zero means and unit variances. Let $y_1'$, $y_2'$ be linear functions of $x_1$, $x_2$ defined by constants $\eta_i$, $l_{ij}$:
$$y_1' = \eta_1 + l_{11}x_1 + l_{12}x_2, \qquad y_2' = \eta_2 + l_{21}x_1 + l_{22}x_2. \tag{3.1}$$
It will be more convenient to deal with $y_1 = y_1' - \eta_1$, $y_2 = y_2' - \eta_2$, since these new variables will have expected values zero. Thus $y_1$ and $y_2$ are functions of $x_1$ and $x_2$:
$$y_1 = f_1(x_1, x_2) = l_{11}x_1 + l_{12}x_2, \qquad y_2 = f_2(x_1, x_2) = l_{21}x_1 + l_{22}x_2. \tag{3.2}$$
The variables $y_1$, $y_2$ will be normally distributed with expectations zero, and
$$V[y_1] = l_{11}^2 + l_{12}^2, \qquad V[y_2] = l_{21}^2 + l_{22}^2, \tag{3.3}$$
$$\operatorname{Cov}[y_1, y_2] = E[y_1 y_2] - E[y_1]E[y_2] = E[l_{11}l_{21}x_1^2 + l_{12}l_{22}x_2^2 + l_{11}l_{22}x_1x_2 + l_{12}l_{21}x_1x_2] = l_{11}l_{21} + l_{12}l_{22}. \tag{3.4}$$
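We can solve the equations (3.2) to give the inverse functions of $x$ in terms of $y_1$, $y_2$:
$$x_1 = g_1(y_1, y_2) = \frac{l_{22}y_1 - l_{12}y_2}{l_{11}l_{22} - l_{12}l_{21}} = \frac{l_{22}y_1 - l_{12}y_2}{\lambda}, \tag{3.5}$$
$$x_2 = g_2(y_1, y_2) = \frac{-l_{21}y_1 + l_{11}y_2}{\lambda}, \tag{3.6}$$
where
$$\lambda = l_{11}l_{22} - l_{12}l_{21}. \tag{3.7}$$
The random variables $x_1$, $x_2$ by definition are independent and $N(0, 1)$; so
$$p\{x_1, x_2\} = \frac{1}{2\pi}\, e^{-\frac{1}{2}(x_1^2 + x_2^2)}. \tag{3.8}$$
To get the distribution of $(y_1, y_2)$ we use (2.16). We need
$$\frac{\partial g_1}{\partial y_1} = \frac{l_{22}}{\lambda}, \quad \frac{\partial g_1}{\partial y_2} = -\frac{l_{12}}{\lambda}, \quad \frac{\partial g_2}{\partial y_1} = -\frac{l_{21}}{\lambda}, \quad \frac{\partial g_2}{\partial y_2} = \frac{l_{11}}{\lambda}, \tag{3.9}$$
so
$$\begin{vmatrix} \dfrac{\partial g_1}{\partial y_1} & \dfrac{\partial g_2}{\partial y_1} \\[6pt] \dfrac{\partial g_1}{\partial y_2} & \dfrac{\partial g_2}{\partial y_2} \end{vmatrix} = \frac{l_{11}l_{22} - l_{12}l_{21}}{\lambda^2} = \frac{1}{\lambda}. \tag{3.10}$$
We now use (2.16):
$$p\{y_1, y_2\} = \frac{1}{2\pi|\lambda|} \exp\left\{ -\frac{1}{2\lambda^2}\left[ (l_{22}y_1 - l_{12}y_2)^2 + (-l_{21}y_1 + l_{11}y_2)^2 \right] \right\}. \tag{3.11}$$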
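Define $\rho$ as the correlation coefficient between $y_1$ and $y_2$:
$$\rho = \frac{\operatorname{Cov}[y_1, y_2]}{\sqrt{V[y_1]V[y_2]}} = \frac{l_{11}l_{21} + l_{12}l_{22}}{\sqrt{(l_{11}^2 + l_{12}^2)(l_{21}^2 + l_{22}^2)}}. \tag{3.12}$$
Let $V[y_1] = \sigma_1^2$, $V[y_2] = \sigma_2^2$. Then
$$\sigma_1^2\sigma_2^2(1 - \rho^2) = (l_{11}^2 + l_{12}^2)(l_{21}^2 + l_{22}^2)\left[ 1 - \frac{(l_{11}l_{21} + l_{12}l_{22})^2}{(l_{11}^2 + l_{12}^2)(l_{21}^2 + l_{22}^2)} \right] = (l_{11}l_{22} - l_{12}l_{21})^2 = \lambda^2, \tag{3.13}$$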
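and the constant part of (3.11) outside the exponent is
$$\frac{1}{2\pi|\lambda|} = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}. \tag{3.14}$$
The exponent of (3.11) is
$$-\frac{1}{2\lambda^2}\left\{ (l_{22}^2 + l_{21}^2)y_1^2 - 2(l_{11}l_{21} + l_{12}l_{22})y_1y_2 + (l_{11}^2 + l_{12}^2)y_2^2 \right\} = -\frac{1}{2(1 - \rho^2)}\left[ \left(\frac{y_1}{\sigma_1}\right)^2 - 2\rho\,\frac{y_1}{\sigma_1}\frac{y_2}{\sigma_2} + \left(\frac{y_2}{\sigma_2}\right)^2 \right]. \tag{3.15}$$
Substituting in (3.11),
$$p\{y_1, y_2\} = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)}\left[ \left(\frac{y_1}{\sigma_1}\right)^2 - 2\rho\,\frac{y_1}{\sigma_1}\frac{y_2}{\sigma_2} + \left(\frac{y_2}{\sigma_2}\right)^2 \right] \right\}. \tag{3.16}$$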
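Finally, we may make the further transformation back to $y_1'$, $y_2'$, which is very simple since the determinant is merely 1:
$$p\{y_1', y_2'\} = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)}\left[ \left(\frac{y_1' - \eta_1}{\sigma_1}\right)^2 - 2\rho\,\frac{y_1' - \eta_1}{\sigma_1}\,\frac{y_2' - \eta_2}{\sigma_2} + \left(\frac{y_2' - \eta_2}{\sigma_2}\right)^2 \right] \right\}, \tag{3.17}$$
and this is the general form for the bivariate normal distribution. We can note at once an important result. If the covariance of $y_1'$, $y_2'$ is zero, then $\rho = 0$, and (3.17) becomes
$$p\{y_1', y_2'\} = \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left[ -\frac{1}{2}\left(\frac{y_1' - \eta_1}{\sigma_1}\right)^2 \right] \cdot \frac{1}{\sqrt{2\pi}\,\sigma_2} \exp\left[ -\frac{1}{2}\left(\frac{y_2' - \eta_2}{\sigma_2}\right)^2 \right] = p\{y_1'\}\,p\{y_2'\}, \tag{3.18}$$
so under this circumstance $y_1'$ and $y_2'$ are independent. This is the result which was assumed in the discussion in Chapter 8.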
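As a numerical check on the algebra of this section, the following minimal Python sketch evaluates the density of $(y_1, y_2)$ in two ways: directly from the transformation rule (2.16), and from the standardized form (3.16). The function names and the sample constants are illustrative assumptions, not part of the text; the two values should agree to rounding error for any constants with $l_{11}l_{22} - l_{12}l_{21} \neq 0$.

```python
import math


def density_from_linear_map(y1, y2, l11, l12, l21, l22):
    """Density of (y1, y2) from the change-of-variables rule (2.16)."""
    lam = l11 * l22 - l12 * l21            # the determinant lambda; assumed nonzero
    x1 = (l22 * y1 - l12 * y2) / lam       # inverse function g1
    x2 = (-l21 * y1 + l11 * y2) / lam      # inverse function g2
    p_x = math.exp(-0.5 * (x1 * x1 + x2 * x2)) / (2.0 * math.pi)
    return p_x / abs(lam)                  # absolute Jacobian is 1/|lambda|


def density_standard_form(y1, y2, l11, l12, l21, l22):
    """The same density written with sigma_1, sigma_2 and rho, as in (3.16)."""
    s1 = math.sqrt(l11 ** 2 + l12 ** 2)
    s2 = math.sqrt(l21 ** 2 + l22 ** 2)
    rho = (l11 * l21 + l12 * l22) / (s1 * s2)
    q = (y1 / s1) ** 2 - 2.0 * rho * (y1 / s1) * (y2 / s2) + (y2 / s2) ** 2
    norm = 2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho ** 2)
    return math.exp(-q / (2.0 * (1.0 - rho ** 2))) / norm


if __name__ == "__main__":
    args = (0.7, -1.2, 1.0, 0.5, 0.3, 2.0)   # arbitrary illustrative values
    print(density_from_linear_map(*args), density_standard_form(*args))
```

Agreement of the two functions at a handful of points is only a spot check of the derivation, not a proof.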
12.4. Some Properties of the Bivariate Normal Distribution
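The general form is given in (3.17). Just as with the univariate normal distribution we found it convenient to have a standardized form, so it will be here. We define new variables
$$u_1 = \frac{y_1' - \eta_1}{\sigma_1}, \qquad u_2 = \frac{y_2' - \eta_2}{\sigma_2}, \tag{4.1}$$
with inverse functions
$$y_1' = \sigma_1 u_1 + \eta_1, \qquad y_2' = \sigma_2 u_2 + \eta_2, \tag{4.2}$$
so that the determinant of the partial derivatives is
$$\begin{vmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{vmatrix} = \sigma_1\sigma_2. \tag{4.3}$$
The probability density of $(u_1, u_2)$ will be, using (2.16),
$$p\{u_1, u_2\} = \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left[ -\frac{1}{2(1 - \rho^2)}\left( u_1^2 - 2\rho u_1 u_2 + u_2^2 \right) \right]. \tag{4.4}$$
It will be convenient to use the symbol $\phi(u_1, u_2)$ for this standardized form. Of course, $\phi(u_1, u_2)$ is a function of $\rho$. In studying the properties of the standardized bivariate normal distribution (4.4), it will be convenient to make a further pair of transformations. We define new variables
$$u_1' = u_1, \tag{4.5}$$
$$u_2' = \frac{u_2 - \rho u_1}{\sqrt{1 - \rho^2}}, \tag{4.6}$$
with inverse functions
$$u_1 = g_1(u_1', u_2') = u_1', \tag{4.7}$$
$$u_2 = g_2(u_1', u_2') = \rho u_1' + \sqrt{1 - \rho^2}\,u_2', \tag{4.8}$$
so that
$$\frac{\partial g_1}{\partial u_1'} = 1, \quad \frac{\partial g_1}{\partial u_2'} = 0, \quad \frac{\partial g_2}{\partial u_1'} = \rho, \quad \frac{\partial g_2}{\partial u_2'} = \sqrt{1 - \rho^2}, \tag{4.9}$$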
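and the determinant is
$$\begin{vmatrix} \dfrac{\partial g_1}{\partial u_1'} & \dfrac{\partial g_2}{\partial u_1'} \\[6pt] \dfrac{\partial g_1}{\partial u_2'} & \dfrac{\partial g_2}{\partial u_2'} \end{vmatrix} = \begin{vmatrix} 1 & \rho \\ 0 & \sqrt{1 - \rho^2} \end{vmatrix} = \sqrt{1 - \rho^2}. \tag{4.10}$$
Also
$$u_1^2 - 2\rho u_1 u_2 + u_2^2 = u_1'^2 - 2\rho u_1'\left(\rho u_1' + \sqrt{1 - \rho^2}\,u_2'\right) + \left(\rho u_1' + \sqrt{1 - \rho^2}\,u_2'\right)^2 = (1 - \rho^2)(u_1'^2 + u_2'^2). \tag{4.11}$$
We now write down the probability density of $(u_1', u_2')$, using (2.16) again:
$$p\{u_1', u_2'\} = \frac{1}{2\pi} \exp\left[ -\frac{1}{2}(u_1'^2 + u_2'^2) \right] = \frac{1}{\sqrt{2\pi}}\,e^{-u_1'^2/2} \cdot \frac{1}{\sqrt{2\pi}}\,e^{-u_2'^2/2}. \tag{4.12}$$
Thus $u_1'$, $u_2'$ are independent. We get the marginal distribution of $u_1'$ by integrating $p\{u_1', u_2'\}$ over $u_2'$ [see (1.20.10)]:
$$p\{u_1'\} = \int_{-\infty}^{\infty} p\{u_1', u_2'\}\,du_2' = \frac{1}{\sqrt{2\pi}}\,e^{-u_1'^2/2}, \tag{4.13}$$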
since the integral is just the integral of a unit normal distribution which must equal 1. Hence the marginal distribution of $u_1' = u_1$ is a unit normal distribution, and so the marginal distribution of $y_1'$ will be a normal distribution with mean $\eta_1$ and variance $\sigma_1^2$. The same argument shows that the marginal distribution of $y_2'$ is normal with mean $\eta_2$ and variance $\sigma_2^2$. The conditional distribution of $u_2$, given $u_1$ [see (1.21.7)], is
$$p\{u_2 \mid u_1\} = \frac{p\{u_1, u_2\}}{p\{u_1\}} = \frac{1}{\sqrt{2\pi}\sqrt{1 - \rho^2}} \exp\left[ -\frac{(u_2 - \rho u_1)^2}{2(1 - \rho^2)} \right], \tag{4.14}$$
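which can be regarded as a normal distribution of $u_2$ with mean $\rho u_1$ and variance $1 - \rho^2$. We now transform this by defining $x_2 = f_2(u_1, u_2) = u_2\sigma_2 + \xi_2$, so that the inverse function is $u_2 = g_2(x_1, x_2) = (x_2 - \xi_2)/\sigma_2$, and $dg_2/dx_2 = 1/\sigma_2$. We also define $x_1 = u_1\sigma_1 + \xi_1$. Then (4.14) becomes
$$p\{x_2 \mid x_1\} = \frac{1}{\sqrt{2\pi}\,\sigma_2\sqrt{1 - \rho^2}} \exp\left\{ -\frac{\left[ x_2 - \xi_2 - \rho(\sigma_2/\sigma_1)(x_1 - \xi_1) \right]^2}{2\sigma_2^2(1 - \rho^2)} \right\}, \tag{4.15}$$
which can be regarded as a normal distribution of a random variable $x_2$ with mean
$$E[x_2 \mid x_1] = \xi_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x_1 - \xi_1) \tag{4.16}$$
and variance
$$V[x_2 \mid x_1] = \sigma_2^2(1 - \rho^2). \tag{4.17}$$
The system is, of course, symmetric in $x_1$ and $x_2$, and so we have
$$E[x_1 \mid x_2] = \xi_1 + \rho\,\frac{\sigma_1}{\sigma_2}(x_2 - \xi_2), \tag{4.18}$$
$$V[x_1 \mid x_2] = \sigma_1^2(1 - \rho^2). \tag{4.19}$$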
Thus the means of the conditional distributions are linear functions of the other variable, and the variances are constants. We recall that in Section 11.2 our model for linear regression was that $y$ was normally distributed around its expected value with a constant variance, and this expected value was a simple linear function of another variable $x$. Therefore the conditions for a conditional regression analysis of $x_1$ on $x_2$, and for a conditional regression analysis of $x_2$ on $x_1$, are both satisfied. Here there are two true regression lines, (4.16) and (4.18), with regression coefficients
$$\beta_{x_2|x_1} = \rho\,\frac{\sigma_2}{\sigma_1}, \qquad \beta_{x_1|x_2} = \rho\,\frac{\sigma_1}{\sigma_2}. \tag{4.20}$$
The product of the two regression coefficients is $\rho^2$. To find the point of intersection of the two regression lines, we can substitute $x_2$ from (4.16) into (4.18), which gives $x_1 - \xi_1 = \rho^2(x_1 - \xi_1)$, which (for $\rho^2 \neq 1$) is only true if $x_1 = \xi_1$,
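and similarly we obtain $x_2 = \xi_2$. The point of intersection is therefore $(\xi_1, \xi_2)$. We find the angle between the two regression lines as follows. In Figure 12.3,
$$\beta_{x_1|x_2} = \tan A, \qquad \beta_{x_2|x_1} = \tan C, \tag{4.21}$$
and $B$ is the angle between the two regression lines.

Figure 12.3

For three angles, say $A$, $B + 90°$, and $C$, whose sum is $180°$,
$$\tan A + \tan(B + 90°) + \tan C = \tan A \tan(B + 90°) \tan C, \tag{4.22}$$
whence
$$\tan(B + 90°) = \frac{\rho(\sigma_1/\sigma_2) + \rho(\sigma_2/\sigma_1)}{\rho(\sigma_1/\sigma_2)\,\rho(\sigma_2/\sigma_1) - 1} = \frac{\rho}{\rho^2 - 1}\,\frac{\sigma_1^2 + \sigma_2^2}{\sigma_1\sigma_2}, \tag{4.23}$$
so, since $\tan B = -1/\tan(B + 90°)$,
$$\tan B = \frac{1 - \rho^2}{\rho}\,\frac{\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2}. \tag{4.24}$$
When $\rho = 1$, $\tan B = 0$, and so the two regression lines are identical. When $\rho = 0$, $\tan B = \infty$, and so the lines are at right angles to each other.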
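The conditional moments (4.16) and (4.17) and the angle formula (4.24) are simple enough to tabulate directly. The sketch below is a minimal illustration; the function names are hypothetical, and the convention of returning infinity at $\rho = 0$ is an assumption made here to mirror the limiting statement in the text.

```python
import math


def conditional_mean_and_variance(x1, xi1, xi2, sigma1, sigma2, rho):
    """Mean (4.16) and variance (4.17) of x2 given x1 for a bivariate normal."""
    mean = xi2 + rho * (sigma2 / sigma1) * (x1 - xi1)
    variance = sigma2 ** 2 * (1.0 - rho ** 2)
    return mean, variance


def tan_angle_between_regression_lines(sigma1, sigma2, rho):
    """tan B from (4.24); infinite when rho = 0 (lines at right angles)."""
    if rho == 0:
        return math.inf
    return (1.0 - rho ** 2) / rho * (sigma1 * sigma2) / (sigma1 ** 2 + sigma2 ** 2)


if __name__ == "__main__":
    print(conditional_mean_and_variance(2.0, 0.0, 0.0, 1.0, 1.0, 0.6))
    print(tan_angle_between_regression_lines(1.0, 1.0, 0.5))
```

With $\sigma_1 = \sigma_2$ and $\rho = 0.5$ the two lines have slopes 0.5 and 2 in the $(x_1, x_2)$ plane, and the tangent of the angle between them, $(2 - 0.5)/(1 + 2 \times 0.5) = 0.75$, agrees with (4.24).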
We can write (4.17) and (4.19) as
$$\frac{V[x_2 \mid x_1]}{\sigma_2^2} = \frac{V[x_1 \mid x_2]}{\sigma_1^2} = 1 - \rho^2. \tag{4.25}$$
Thus, when $\rho = 1$, the conditional variance $V[x_2 \mid x_1] = 0$, and, when $\rho = 0$, the conditional variance equals the unconditional variance. Thus $\rho^2$ can be regarded as measuring the fraction of the variance of $x_2$ "explained" by the regression on $x_1$, and vice versa. This is one of the most useful interpretations of the correlation coefficient.

The nature of the surface generated by the bivariate normal distribution can be seen from the equation for its probability density (3.17). If we place this equal to a constant, this implies
$$\left(\frac{y_1 - \eta_1}{\sigma_1}\right)^2 - 2\rho\,\frac{y_1 - \eta_1}{\sigma_1}\,\frac{y_2 - \eta_2}{\sigma_2} + \left(\frac{y_2 - \eta_2}{\sigma_2}\right)^2 = \text{constant}, \tag{4.26}$$
dropping the primes from the $y$'s. This equation defines ellipses, for different values of the constant, at an angle to the axes. When $\rho = 0$, this equation becomes
$$\left(\frac{y_1 - \eta_1}{\sigma_1}\right)^2 + \left(\frac{y_2 - \eta_2}{\sigma_2}\right)^2 = \text{constant}, \tag{4.27}$$
which defines ellipses whose principal axes are parallel to the $(y_1, y_2)$ axes.

We can use the results of this section to indicate the proof of an important theorem we assumed in Section 1.26, namely, that any linear combination of independent random normal variables is itself normally distributed. To prove this for $n$ random variables, it will be sufficient to prove it for two random variables. The proof is along the following lines. In (3.2) we defined $y_1$ as $l_{11}x_1 + l_{12}x_2$, where $l_{11}$ and $l_{12}$ are constants and $x_1$ and $x_2$ are independent random variables normally distributed with means zero and unit variances. In (3.16) we found the joint density function $p\{y_1, y_2\}$, where $y_2$ was another linear combination $l_{21}x_1 + l_{22}x_2$. We defined $u_1$ as $(y_1' - \eta_1)/\sigma_1$, which is equal to $y_1/\sigma_1$. We subsequently defined $u_1'$ as $u_1$, and found in (4.13) that $u_1'$ was normally distributed with mean zero and unit variance. It follows that the linear combination $l_{11}x_1 + l_{12}x_2 = y_1 = u_1\sigma_1$ will be normally distributed, with zero mean and variance $\sigma_1^2 = l_{11}^2 + l_{12}^2$. The arguments are changed only in detail if, instead of being distributed $N(0, 1)$, the $x_i$ are distributed $N(\xi_i, \sigma_i^2)$.
12.5. The Regression "Fallacy"
Equations (4.16) and (4.18) are the basis of the phenomenon of regression noted by Galton. Galton observed that on the average the sons of tall fathers are not as tall as their fathers, and similarly the sons of short fathers are not as short as their fathers; i.e., the second generation tended to regress towards the mean. But if we look at the data the other way round, we find that on the average the fathers of tall sons are not as tall as their sons and the fathers of short sons are not as short as their sons, so the first generation tends to regress towards the mean. It seems implausible that both statements can be true simultaneously, so this phenomenon has been called the regression fallacy. In (4.16) let $x_1$ be the height of a father and $x_2$ be the height of his son:
$$E[x_2 \mid x_1] = E[x_2] + \rho\,\frac{\sigma_2}{\sigma_1}(x_1 - E[x_1]), \tag{5.1}$$
so
$$E[x_2 \mid x_1] - x_1 = (E[x_2] - x_1) + \rho\,\frac{\sigma_2}{\sigma_1}(x_1 - E[x_1]). \tag{5.2}$$
For convenience, assume that the population of fathers and sons is stable, one generation to the next, so that $E[x_1] = E[x_2]$, $\sigma_1 = \sigma_2$. Then (5.2) becomes
$$E[x_2 \mid x_1] - x_1 = -(1 - \rho)(x_1 - E[x_1]). \tag{5.3}$$
Suppose that the correlation coefficient between $x_1$ and $x_2$ is positive and less than 1. Consider a particular value of $x_1$ greater than the mean, $E[x_1]$, so that $x_1 - E[x_1] > 0$. Then the right-hand side of (5.3) is negative, and we have
$$E[x_2 \mid x_1] < x_1. \tag{5.4}$$
Thus the expected value for the son's height $x_2$ for fathers of a particular height $x_1$ (greater than the average of $x_1$) is less than that particular father's height. In other words, "on the average, sons of tall fathers are not as tall as their fathers." The foregoing arguments are symmetrical in $x_1$ and $x_2$, so for a particular $x_2$ greater than $E[x_2]$
$$E[x_1 \mid x_2] < x_2, \tag{5.5}$$
so "on the average, fathers of tall sons are not as tall as their sons."
The situation is illustrated in Figure 12.4, in which a single constant probability density ellipse, (4.26), is drawn for a bivariate normal distribution with means $E[x_1] = E[x_2] = 0$, variances $\sigma_1^2 = \sigma_2^2$, and $\rho = 0.6$. It is apparent that $BA < AO$, which is (5.4), and $DC < CO$, which is (5.5). The fallacy in the regression fallacy consists in supposing that there is a fallacy.
Figure 12.4
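That both statements hold at once can also be seen by simulation. The sketch below is only an illustration under stated assumptions: heights are measured in standardized units (mean 0, variance 1 in both generations), $\rho = 0.6$ as in Figure 12.4, and the pairs are generated with the construction $u_2 = \rho u_1 + \sqrt{1 - \rho^2}\,u_2'$ of (4.8). The function names, the sample size, the seed, and the cutoff of one standard deviation are all hypothetical choices.

```python
import math
import random


def simulate_heights(n, rho, seed=0):
    """Draw n (father, son) pairs from a standardized bivariate normal."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        father = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        son = rho * father + math.sqrt(1.0 - rho ** 2) * noise
        pairs.append((father, son))
    return pairs


def means_given_tall(pairs, index, cutoff=1.0):
    """Average height of the selected member (index 0 = father, 1 = son)
    and of the other member, over pairs whose selected member exceeds cutoff."""
    chosen = [p for p in pairs if p[index] > cutoff]
    selected = sum(p[index] for p in chosen) / len(chosen)
    other = sum(p[1 - index] for p in chosen) / len(chosen)
    return selected, other


if __name__ == "__main__":
    data = simulate_heights(20000, 0.6)
    print("tall fathers, their sons:", means_given_tall(data, 0))
    print("tall sons, their fathers:", means_given_tall(data, 1))
```

In both directions the second mean is smaller than the first, which is the content of (5.4) and (5.5).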
12.6. Estimation of the Parameters of the Bivariate Normal Distribution
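We use the method of maximum likelihood to obtain estimators of the parameters of the bivariate normal distribution
$$p\{x_1, x_2\} = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)}\left[ \left(\frac{x_1 - \xi_1}{\sigma_1}\right)^2 - 2\rho\,\frac{x_1 - \xi_1}{\sigma_1}\,\frac{x_2 - \xi_2}{\sigma_2} + \left(\frac{x_2 - \xi_2}{\sigma_2}\right)^2 \right] \right\}. \tag{6.1}$$
The likelihood $L = \prod_\nu^n p\{x_{1\nu}, x_{2\nu}\}$ is
$$L = \left( 2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2} \right)^{-n} \exp\left\{ -\frac{1}{2(1 - \rho^2)}\left[ \frac{\sum (x_{1\nu} - \xi_1)^2}{\sigma_1^2} - 2\rho\,\frac{\sum (x_{1\nu} - \xi_1)(x_{2\nu} - \xi_2)}{\sigma_1\sigma_2} + \frac{\sum (x_{2\nu} - \xi_2)^2}{\sigma_2^2} \right] \right\}, \tag{6.2}$$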
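where the summation is over $\nu$. As is usually the case, it is easier to maximize the logarithm of the likelihood:
$$\log L = -n \log\left( 2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2} \right) - \frac{1}{2(1 - \rho^2)}\left[ \frac{\sum (x_{1\nu} - \xi_1)^2}{\sigma_1^2} - 2\rho\,\frac{\sum (x_{1\nu} - \xi_1)(x_{2\nu} - \xi_2)}{\sigma_1\sigma_2} + \frac{\sum (x_{2\nu} - \xi_2)^2}{\sigma_2^2} \right]. \tag{6.3}$$
We differentiate with respect to $\xi_1$, $\xi_2$, $\sigma_1$, $\sigma_2$, and $\rho$: (6.4) (6.5)
(6.6)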
Equating (6.4) and (6.5) to zero and rearranging gives 
1 ~
"".;'2
~1) A
£., (Xlv 
0'1
0'10'2
f1
(6.9)
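$b_2$ and equate to zero:
$$\frac{\partial R}{\partial b_1} = -2\sum_\nu^n \left[ y_\nu - a - b_1(x_{1\nu} - \bar{x}_1) - b_2(x_{2\nu} - \bar{x}_2) \right](x_{1\nu} - \bar{x}_1) = 0, \tag{3.5}$$
$$\frac{\partial R}{\partial b_2} = -2\sum_\nu^n \left[ y_\nu - a - b_1(x_{1\nu} - \bar{x}_1) - b_2(x_{2\nu} - \bar{x}_2) \right](x_{2\nu} - \bar{x}_2) = 0. \tag{3.6}$$
These three equations imply
$$\sum_\nu^n (y_\nu - Y_\nu) = 0, \tag{3.7}$$
$$\sum_\nu^n (y_\nu - Y_\nu)(x_{1\nu} - \bar{x}_1) = 0, \tag{3.8}$$
$$\sum_\nu^n (y_\nu - Y_\nu)(x_{2\nu} - \bar{x}_2) = 0. \tag{3.9}$$
Since $\sum_\nu^n (x_{1\nu} - \bar{x}_1) = 0 = \sum_\nu^n (x_{2\nu} - \bar{x}_2)$, (3.4) gives us
$$\sum_\nu^n y_\nu - na - b_1\sum_\nu^n (x_{1\nu} - \bar{x}_1) - b_2\sum_\nu^n (x_{2\nu} - \bar{x}_2) = \sum_\nu^n y_\nu - na = 0, \tag{3.10}$$
whence
$$a = \frac{1}{n}\sum_\nu^n y_\nu. \tag{3.11}$$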
From (3.5) and (3.6) we get a pair of simultaneous equations, known as the normal equations, for the two unknown $b$'s with coefficients involving sums of squares and sums of products which can be calculated from the data:
$$b_1\sum_\nu^n (x_{1\nu} - \bar{x}_1)^2 + b_2\sum_\nu^n (x_{1\nu} - \bar{x}_1)(x_{2\nu} - \bar{x}_2) = \sum_\nu^n y_\nu(x_{1\nu} - \bar{x}_1), \tag{3.12}$$
$$b_1\sum_\nu^n (x_{1\nu} - \bar{x}_1)(x_{2\nu} - \bar{x}_2) + b_2\sum_\nu^n (x_{2\nu} - \bar{x}_2)^2 = \sum_\nu^n y_\nu(x_{2\nu} - \bar{x}_2). \tag{3.13}$$
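Space can be saved by using the definitions
$$\sum{}' x_1^2 = \sum_\nu^n (x_{1\nu} - \bar{x}_1)^2, \qquad \sum{}' x_2^2 = \sum_\nu^n (x_{2\nu} - \bar{x}_2)^2, \qquad \sum{}' x_1x_2 = \sum_\nu^n (x_{1\nu} - \bar{x}_1)(x_{2\nu} - \bar{x}_2),$$
$$\sum{}' yx_1 = \sum_\nu^n y_\nu(x_{1\nu} - \bar{x}_1), \qquad \sum{}' yx_2 = \sum_\nu^n y_\nu(x_{2\nu} - \bar{x}_2), \tag{3.14}$$
with which the normal equations (3.12) and (3.13) read
$$b_1\sum{}' x_1^2 + b_2\sum{}' x_1x_2 = \sum{}' yx_1, \tag{3.15}$$
$$b_1\sum{}' x_1x_2 + b_2\sum{}' x_2^2 = \sum{}' yx_2. \tag{3.16}$$
If we multiply (3.15) by $\sum' x_2^2$ and (3.16) by $\sum' x_1x_2$ we get
$$b_1\sum{}' x_1^2\sum{}' x_2^2 + b_2\sum{}' x_1x_2\sum{}' x_2^2 = \sum{}' yx_1\sum{}' x_2^2, \tag{3.17}$$
$$b_1\left(\sum{}' x_1x_2\right)^2 + b_2\sum{}' x_1x_2\sum{}' x_2^2 = \sum{}' yx_2\sum{}' x_1x_2. \tag{3.18}$$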
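Subtracting (3.18) from (3.17) and solving for $b_1$ gives
$$b_1 = \frac{\sum' yx_1\sum' x_2^2 - \sum' yx_2\sum' x_1x_2}{\sum' x_1^2\sum' x_2^2 - \left(\sum' x_1x_2\right)^2}. \tag{3.19}$$
By similar manipulations,
$$b_2 = \frac{\sum' yx_2\sum' x_1^2 - \sum' yx_1\sum' x_1x_2}{\sum' x_1^2\sum' x_2^2 - \left(\sum' x_1x_2\right)^2}. \tag{3.20}$$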
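The closed forms (3.11), (3.19), and (3.20) translate directly into a few lines of code. The following Python sketch is a minimal illustration, assuming the data arrive as three equal-length lists; the function name and the small data set in the usage example are hypothetical and were constructed so that $y = 3 + 2x_1 - x_2$ exactly.

```python
def fit_two_variable_regression(y, x1, x2):
    """Return (a, b1, b2) for Y = a + b1*(x1 - mean(x1)) + b2*(x2 - mean(x2))."""
    n = len(y)
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    d1 = [v - mean1 for v in x1]                  # deviations x1v - mean(x1)
    d2 = [v - mean2 for v in x2]                  # deviations x2v - mean(x2)
    s11 = sum(u * u for u in d1)                  # sum' x1^2
    s22 = sum(w * w for w in d2)                  # sum' x2^2
    s12 = sum(u * w for u, w in zip(d1, d2))      # sum' x1 x2
    sy1 = sum(yv * u for yv, u in zip(y, d1))     # sum' y x1
    sy2 = sum(yv * w for yv, w in zip(y, d2))     # sum' y x2
    det = s11 * s22 - s12 ** 2                    # common denominator of (3.19), (3.20)
    if det == 0:
        raise ValueError("x1 and x2 are exactly collinear")
    b1 = (sy1 * s22 - sy2 * s12) / det            # (3.19)
    b2 = (sy2 * s11 - sy1 * s12) / det            # (3.20)
    a = sum(y) / n                                # (3.11)
    return a, b1, b2


if __name__ == "__main__":
    x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
    x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
    y = [3.0 + 2.0 * u - w for u, w in zip(x1, x2)]
    print(fit_two_variable_regression(y, x1, x2))
```

Because the model is written in terms of deviations from the means, the constant $a$ is the mean of the $y$'s rather than the intercept 3 of the generating equation.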
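The solution for $b_1$, (3.19), can be written in the form
$$b_1 = \left[ \sum{}' x_1^2\sum{}' x_2^2 - \left(\sum{}' x_1x_2\right)^2 \right]^{-1} \times \left[ \sum{}' x_2^2\sum_\nu^n (x_{1\nu} - \bar{x}_1)y_\nu - \sum{}' x_1x_2\sum_\nu^n (x_{2\nu} - \bar{x}_2)y_\nu \right]$$
$$= \left[ \sum{}' x_1^2\sum{}' x_2^2 - \left(\sum{}' x_1x_2\right)^2 \right]^{-1} \times \left\{ \sum_\nu^n \left[ \left(\sum{}' x_2^2\right)(x_{1\nu} - \bar{x}_1) - \left(\sum{}' x_1x_2\right)(x_{2\nu} - \bar{x}_2) \right] y_\nu \right\}, \tag{3.21}$$
which shows that $b_1$ is a linear function of the $y$'s. Since the $y$'s are normally distributed, $b_1$ will be normally distributed, with variance
$$V[b_1] = \frac{\sigma^2\sum' x_2^2}{\sum' x_1^2\sum' x_2^2 - \left(\sum' x_1x_2\right)^2}. \tag{3.22}$$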
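If we define $\rho_{x_1x_2}$ as the "correlation coefficient" between $x_1$ and $x_2$, i.e.,
$$\rho_{x_1x_2} = \frac{\sum' x_1x_2}{\sqrt{\sum' x_1^2\sum' x_2^2}}, \tag{3.23}$$
we can write (3.22) as
$$V[b_1] = \frac{\sigma^2}{\sum' x_1^2} \cdot \frac{1}{1 - \dfrac{\left(\sum' x_1x_2\right)^2}{\sum' x_1^2\sum' x_2^2}} = \frac{\sigma^2}{\sum' x_1^2} \cdot \frac{1}{1 - \rho_{x_1x_2}^2}. \tag{3.24}$$
This can be compared with the expression for the variance of the regression coefficient of $y$ on $x_1$ alone, $\sigma^2/\sum' x_1^2$, (11.2.16). Similarly
$$V[b_2] = \frac{\sigma^2\sum' x_1^2}{\sum' x_1^2\sum' x_2^2 - \left(\sum' x_1x_2\right)^2} = \frac{\sigma^2}{\sum' x_2^2} \cdot \frac{1}{1 - \rho_{x_1x_2}^2}. \tag{3.25}$$
The same type of manipulation leads to
$$\operatorname{Cov}[b_1, b_2] = -\frac{\sigma^2\sum' x_1x_2}{\sum' x_1^2\sum' x_2^2 - \left(\sum' x_1x_2\right)^2}, \tag{3.26}$$
whence the correlation coefficient between $b_1$ and $b_2$ is
$$\rho_{b_1b_2} = \frac{\operatorname{Cov}[b_1, b_2]}{\sqrt{V[b_1]V[b_2]}} = -\rho_{x_1x_2}. \tag{3.27}$$
If in (2.17) we substitute $b_1$ for $x_1$, $b_2$ for $x_2$, etc., we get (3.28). We now substitute (3.22) for $V[b_1]$, (3.25) for $V[b_2]$, and use (3.27) to get (3.29):
$$\left[ (b_1 - \beta_1)^2\sum{}' x_1^2 + 2(b_1 - \beta_1)(b_2 - \beta_2)\sum{}' x_1x_2 + (b_2 - \beta_2)^2\sum{}' x_2^2 \right] \sim \sigma^2\chi^2(2). \tag{3.30}$$
We will assume without proof the generalization of this to $r$ independent variables $x_1, \ldots, x_r$:
$$\sum_i^r (b_i - \beta_i)^2\sum{}' x_i^2 + 2\sum_{i=1}^{r-1}\sum_{j=i+1}^{r} (b_i - \beta_i)(b_j - \beta_j)\sum{}' x_ix_j \sim \sigma^2\chi^2(r). \tag{3.31}$$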
We can write the deviation between the observation $y_\nu$ and the true value $\eta_\nu$ as
$$y_\nu - \eta_\nu = (y_\nu - Y_\nu) + (Y_\nu - \eta_\nu) = (y_\nu - Y_\nu) + (a - \alpha) + (b_1 - \beta_1)(x_{1\nu} - \bar{x}_1) + (b_2 - \beta_2)(x_{2\nu} - \bar{x}_2). \tag{3.32}$$
Squaring and summing, and using (3.7), (3.8), and (3.9) to dispose of cross products,
$$\sum_\nu^n (y_\nu - \eta_\nu)^2 = \sum_\nu^n (y_\nu - Y_\nu)^2 + n(a - \alpha)^2 + (b_1 - \beta_1)^2\sum_\nu^n (x_{1\nu} - \bar{x}_1)^2 + 2(b_1 - \beta_1)(b_2 - \beta_2)\sum_\nu^n (x_{1\nu} - \bar{x}_1)(x_{2\nu} - \bar{x}_2) + (b_2 - \beta_2)^2\sum_\nu^n (x_{2\nu} - \bar{x}_2)^2. \tag{3.33}$$
The left-hand side of this equation is distributed as $\sigma^2\chi^2(n)$. On the right-hand side $\sum_\nu^n (y_\nu - Y_\nu)^2$ has $n - 3$ degrees of freedom since the $y_\nu - Y_\nu$ have to satisfy the three linear restrictions (3.7), (3.8), and (3.9). The second term, involving $a$, has one degree of freedom. Third, we showed in (3.30) that the last three terms jointly are distributed as $\sigma^2\chi^2(2)$. Thus the conditions for Cochran's theorem are satisfied, and the three component sums of squares, namely, $\sum_\nu^n (y_\nu - Y_\nu)^2$, $n(a - \alpha)^2$, and (3.30), are independently distributed as $\sigma^2\chi^2$. If we define $s^2$ as
$$s^2 = \frac{1}{n - 3}\sum_\nu^n (y_\nu - Y_\nu)^2, \tag{3.34}$$
it will be distributed as $\sigma^2\chi^2(n - 3)/(n - 3)$ and have expected value $\sigma^2$. We can make separate tests of the null hypotheses $\beta_1 = 0$ and $\beta_2 = 0$ by substituting this estimate $s^2$ for $\sigma^2$ in (3.22) and (3.25), but in general these tests are not independent since usually $\operatorname{Cov}[b_1, b_2] \neq 0$. We can make a joint test of the null hypothesis $\beta_1 = \beta_2 = 0$ as follows. Since (3.30) is distributed as $\sigma^2\chi^2(2)$,
$$\frac{1}{2}\left[ (b_1 - \beta_1)^2\sum{}' x_1^2 + 2(b_1 - \beta_1)(b_2 - \beta_2)\sum{}' x_1x_2 + (b_2 - \beta_2)^2\sum{}' x_2^2 \right] \sim \frac{\sigma^2\chi^2(2)}{2} \tag{3.35}$$
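and has expected value $\sigma^2$, and the distribution of the ratio of it to $s^2$ will be $F(2, n - 3)$. Hence, under the null hypothesis $\beta_1 = \beta_2 = 0$,
$$\frac{1}{2s^2}\left( b_1^2\sum{}' x_1^2 + 2b_1b_2\sum{}' x_1x_2 + b_2^2\sum{}' x_2^2 \right) \sim F(2, n - 3). \tag{3.36}$$
Both the numerator of $s^2$, $\sum_\nu^n (y_\nu - Y_\nu)^2$, and the expression in parentheses in (3.36), known as the sum of squares due to regression, can be computed in the forms given. However, alternative identities are convenient. We obtain them as follows. Squaring and summing the identity
$$y_\nu - \bar{y} = (y_\nu - Y_\nu) + (Y_\nu - \bar{y}) \tag{3.37}$$
gives
$$\sum_\nu^n (y_\nu - \bar{y})^2 = \sum_\nu^n (y_\nu - Y_\nu)^2 + \sum_\nu^n (Y_\nu - \bar{y})^2, \tag{3.38}$$
since the cross product
$$2\sum_\nu^n (y_\nu - Y_\nu)(Y_\nu - \bar{y}) = 2\sum_\nu^n (y_\nu - Y_\nu)\left[ \bar{y} + b_1(x_{1\nu} - \bar{x}_1) + b_2(x_{2\nu} - \bar{x}_2) - \bar{y} \right] = 2b_1\sum_\nu^n (y_\nu - Y_\nu)(x_{1\nu} - \bar{x}_1) + 2b_2\sum_\nu^n (y_\nu - Y_\nu)(x_{2\nu} - \bar{x}_2) = 0, \tag{3.39}$$
by (3.8) and (3.9). Rearranged, (3.38) becomes
$$\sum_\nu^n (y_\nu - Y_\nu)^2 = \sum_\nu^n (y_\nu - \bar{y})^2 - \sum_\nu^n (Y_\nu - \bar{y})^2. \tag{3.40}$$
The first term on the right-hand side is simply the sum of squares of deviations about the grand mean, and so we are left with finding a convenient form for $\sum_\nu^n (Y_\nu - \bar{y})^2$. We note that
$$\sum_\nu^n (Y_\nu - \bar{y})^2 = \sum_\nu^n \left[ a + b_1(x_{1\nu} - \bar{x}_1) + b_2(x_{2\nu} - \bar{x}_2) - a \right]^2 = \sum_\nu^n \left[ b_1(x_{1\nu} - \bar{x}_1) + b_2(x_{2\nu} - \bar{x}_2) \right]^2$$
$$= b_1^2\sum_\nu^n (x_{1\nu} - \bar{x}_1)^2 + 2b_1b_2\sum_\nu^n (x_{1\nu} - \bar{x}_1)(x_{2\nu} - \bar{x}_2) + b_2^2\sum_\nu^n (x_{2\nu} - \bar{x}_2)^2. \tag{3.41}$$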
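We can obtain an alternative expression for the right-hand side as follows. Multiply the normal equations (3.12) and (3.13) by $b_1$ and $b_2$ respectively:
$$b_1^2\sum_\nu^n (x_{1\nu} - \bar{x}_1)^2 + b_1b_2\sum_\nu^n (x_{1\nu} - \bar{x}_1)(x_{2\nu} - \bar{x}_2) = b_1\sum_\nu^n y_\nu(x_{1\nu} - \bar{x}_1), \tag{3.43}$$
$$b_1b_2\sum_\nu^n (x_{1\nu} - \bar{x}_1)(x_{2\nu} - \bar{x}_2) + b_2^2\sum_\nu^n (x_{2\nu} - \bar{x}_2)^2 = b_2\sum_\nu^n y_\nu(x_{2\nu} - \bar{x}_2), \tag{3.44}$$
and add:
$$b_1^2\sum_\nu^n (x_{1\nu} - \bar{x}_1)^2 + 2b_1b_2\sum_\nu^n (x_{1\nu} - \bar{x}_1)(x_{2\nu} - \bar{x}_2) + b_2^2\sum_\nu^n (x_{2\nu} - \bar{x}_2)^2 = b_1\sum_\nu^n y_\nu(x_{1\nu} - \bar{x}_1) + b_2\sum_\nu^n y_\nu(x_{2\nu} - \bar{x}_2). \tag{3.45}$$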
Thus the sum of squares for regression is conveniently calculated from the right-hand side of (3.45) and the remainder sum of squares is then found from (3.40). The test implicit in (3.36) is conveniently put in tabular form (Table 13.1).

Table 13.1

Source of variance     Sums of squares                       Degrees of freedom     Mean squares
Due to regression      $b_1\sum' yx_1 + b_2\sum' yx_2$       $2$                    $s_2^2$
Remainder              $\sum_\nu^n (y_\nu - Y_\nu)^2$        $n - 3$                $s^2$
Total                  $\sum_\nu^n (y_\nu - \bar{y})^2$      $n - 1$
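The entries of Table 13.1 can be computed as in the following Python sketch. It is a minimal illustration under the assumptions of this section (two independent variables, a constant term, $n > 3$); the function name, the dictionary keys, and the data in the usage example are hypothetical. The remainder sum of squares is obtained both from the identity (3.40) and directly from the residuals, so that the two routes can be compared.

```python
def regression_anova(y, x1, x2):
    """Sums of squares and the F ratio of (3.36) for regression on x1 and x2."""
    n = len(y)
    ybar = sum(y) / n
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    d1 = [v - mean1 for v in x1]
    d2 = [v - mean2 for v in x2]
    s11 = sum(u * u for u in d1)
    s22 = sum(w * w for w in d2)
    s12 = sum(u * w for u, w in zip(d1, d2))
    sy1 = sum(yv * u for yv, u in zip(y, d1))
    sy2 = sum(yv * w for yv, w in zip(y, d2))
    det = s11 * s22 - s12 ** 2
    b1 = (sy1 * s22 - sy2 * s12) / det           # (3.19)
    b2 = (sy2 * s11 - sy1 * s12) / det           # (3.20)

    ss_total = sum((yv - ybar) ** 2 for yv in y)
    ss_regression = b1 * sy1 + b2 * sy2          # right-hand side of (3.45)
    ss_remainder = ss_total - ss_regression      # identity (3.40)

    # Direct route: residuals about the fitted values Y = ybar + b1*d1 + b2*d2.
    fitted = [ybar + b1 * u + b2 * w for u, w in zip(d1, d2)]
    ss_remainder_direct = sum((yv - f) ** 2 for yv, f in zip(y, fitted))

    s2 = ss_remainder / (n - 3)                  # (3.34)
    f_ratio = (ss_regression / 2.0) / s2         # refer to F(2, n - 3)
    return {
        "regression": ss_regression,
        "remainder": ss_remainder,
        "remainder_direct": ss_remainder_direct,
        "total": ss_total,
        "f_ratio": f_ratio,
    }


if __name__ == "__main__":
    y = [3.2, 5.7, 5.1, 8.4, 6.6, 9.0]
    x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
    for name, value in regression_anova(y, x1, x2).items():
        print(name, value)
```

The subtraction in (3.40) can lose precision when the fit is very close; the direct residual sum is then the safer of the two routes.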
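Finally, the predicted value $Y$,
$$Y = a + b_1(x_1 - \bar{x}_1) + b_2(x_2 - \bar{x}_2), \tag{3.46}$$
has expected value
$$\eta = \alpha + \beta_1(x_1 - \bar{x}_1) + \beta_2(x_2 - \bar{x}_2). \tag{3.47}$$
Since $a$ is independent of $b_1$ and $b_2$,
$$V[Y] = V[a] + (x_1 - \bar{x}_1)^2V[b_1] + (x_2 - \bar{x}_2)^2V[b_2] + 2(x_1 - \bar{x}_1)(x_2 - \bar{x}_2)\operatorname{Cov}[b_1, b_2]$$
$$= \sigma^2\left[ \frac{1}{n} + \frac{(x_1 - \bar{x}_1)^2\sum' x_2^2 - 2(x_1 - \bar{x}_1)(x_2 - \bar{x}_2)\sum' x_1x_2 + (x_2 - \bar{x}_2)^2\sum' x_1^2}{\sum' x_1^2\sum' x_2^2 - \left(\sum' x_1x_2\right)^2} \right]. \tag{3.48}$$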
13.4. The Partial Correlation Coefficient
In the preceding discussion in this chapter the so-called independent variables $x_1$, $x_2$ were regarded as fixed variables. They may, however, be random variables, in which case $(y, x_1, x_2)$ will have a trivariate distribution. An important trivariate distribution is the trivariate normal:
$$p\{x_1, x_2, x_3\} = \frac{1}{(2\pi)^{3/2}\sigma_1\sigma_2\sigma_3\sqrt{\omega}}\,e^{-\phi/2}, \tag{4.1}$$
where $\omega$ is the determinant
$$\omega = \begin{vmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{21} & 1 & \rho_{23} \\ \rho_{31} & \rho_{32} & 1 \end{vmatrix} \tag{4.2}$$
$$= 1 - \rho_{12}^2 - \rho_{13}^2 - \rho_{23}^2 + 2\rho_{12}\rho_{13}\rho_{23}, \tag{4.3}$$
since $\rho_{ij} = \rho_{ji}$, and $\phi$ is defined as
$$\phi = \frac{1}{\omega}\left( \omega_{11}\frac{x_1'^2}{\sigma_1^2} + \omega_{22}\frac{x_2'^2}{\sigma_2^2} + \omega_{33}\frac{x_3'^2}{\sigma_3^2} + 2\omega_{12}\frac{x_1'x_2'}{\sigma_1\sigma_2} + 2\omega_{13}\frac{x_1'x_3'}{\sigma_1\sigma_3} + 2\omega_{23}\frac{x_2'x_3'}{\sigma_2\sigma_3} \right), \tag{4.4}$$
in which $x_i' = x_i - \xi_i$, $i = 1, 2, 3$, and $\omega_{ij}$ is the cofactor of the $ij$th element in $\omega$. Thus
$$\omega_{11} = 1 - \rho_{23}^2, \qquad \omega_{22} = 1 - \rho_{13}^2, \qquad \omega_{33} = 1 - \rho_{12}^2,$$
$$\omega_{12} = -(\rho_{12} - \rho_{13}\rho_{23}), \qquad \omega_{13} = \rho_{12}\rho_{23} - \rho_{13}, \qquad \omega_{23} = -(\rho_{23} - \rho_{12}\rho_{13}). \tag{4.5}$$
The trivariate normal is the generalization of the bivariate normal to three variables. Geometrically it can be represented by concentric ellipsoids of constant density in three-dimensional space. We showed in Section 12.4 that when a bivariate normal distribution in $x_1$, $x_2$ is integrated over $x_2$ it gives a univariate normal distribution in $x_1$. Likewise, a trivariate normal in $x_1$, $x_2$, $x_3$ when integrated over $x_3$ gives a bivariate normal distribution in $x_1$, $x_2$ with parameters $\xi_1$, $\xi_2$, $\sigma_1$, $\sigma_2$, $\rho_{12}$.

Now suppose that we have a trivariate normal distribution in $x_1$, $x_2$, $x_3$ and we choose to consider the correlation coefficient between $x_1$ and $x_2$ with $x_3$ "held constant," i.e., we take a thin slice through the trivariate distribution parallel to the $x_1$, $x_2$ plane and consider the distribution of $x_1$, $x_2$. In other words, we consider the distribution function $p\{x_1, x_2 \mid x_3\}$. By a simple extension of (1.21.7),
$$p\{x_1, x_2 \mid x_3\} = \frac{p\{x_1, x_2, x_3\}}{p\{x_3\}}. \tag{4.6}$$
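For simplicity, but with no loss of generality, we will assume that the means $\xi_1$, $\xi_2$, $\xi_3$ are zero. The substitutions are straightforward. The function $p\{x_1, x_2, x_3\}$ is given by (4.1) to (4.5). Since integrating $p\{x_1, x_2, x_3\}$ over $x_1$ gives a bivariate normal distribution in $x_2$, $x_3$, when we further integrate over $x_2$ we get as the distribution of $x_3$ a univariate normal distribution with parameters $(0, \sigma_3^2)$. After some manipulation, we can get (4.6) in the form
$$p\{x_1, x_2 \mid x_3\} = \left\{ 2\pi\,\sigma_1\sqrt{1 - \rho_{13}^2}\;\sigma_2\sqrt{1 - \rho_{23}^2}\left[ 1 - \left( \frac{\rho_{12} - \rho_{13}\rho_{23}}{\sqrt{(1 - \rho_{13}^2)(1 - \rho_{23}^2)}} \right)^2 \right]^{1/2} \right\}^{-1}$$
$$\times \exp\left\{ -\frac{1}{2}\left[ 1 - \left( \frac{\rho_{12} - \rho_{13}\rho_{23}}{\sqrt{(1 - \rho_{13}^2)(1 - \rho_{23}^2)}} \right)^2 \right]^{-1} \left[ \frac{\left[ x_1 - \rho_{13}(\sigma_1/\sigma_3)x_3 \right]^2}{\sigma_1^2(1 - \rho_{13}^2)} - 2\,\frac{\rho_{12} - \rho_{13}\rho_{23}}{\sqrt{(1 - \rho_{13}^2)(1 - \rho_{23}^2)}} \cdot \frac{\left[ x_1 - \rho_{13}(\sigma_1/\sigma_3)x_3 \right]\left[ x_2 - \rho_{23}(\sigma_2/\sigma_3)x_3 \right]}{\sigma_1\sqrt{1 - \rho_{13}^2}\;\sigma_2\sqrt{1 - \rho_{23}^2}} + \frac{\left[ x_2 - \rho_{23}(\sigma_2/\sigma_3)x_3 \right]^2}{\sigma_2^2(1 - \rho_{23}^2)} \right] \right\}. \tag{4.7}$$
This has the form of a bivariate normal density function of random variables $x_1 \mid x_3$ and $x_2 \mid x_3$ with parameters
$$E[x_1 \mid x_3] = \rho_{13}\,\frac{\sigma_1}{\sigma_3}\,x_3 = \beta_{13}x_3, \qquad E[x_2 \mid x_3] = \rho_{23}\,\frac{\sigma_2}{\sigma_3}\,x_3 = \beta_{23}x_3,$$
$$V[x_1 \mid x_3] = \sigma_1^2(1 - \rho_{13}^2), \qquad V[x_2 \mid x_3] = \sigma_2^2(1 - \rho_{23}^2), \tag{4.8}$$
and with the quantity $(\rho_{12} - \rho_{13}\rho_{23})/\sqrt{(1 - \rho_{13}^2)(1 - \rho_{23}^2)}$ in the place occupied by the correlation coefficient in (12.3.17).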
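The quantity that occupies the place of the correlation coefficient in (4.7), $(\rho_{12} - \rho_{13}\rho_{23})/\sqrt{(1 - \rho_{13}^2)(1 - \rho_{23}^2)}$, is easily computed from the three simple correlation coefficients. The sketch below is a minimal illustration; the function name is hypothetical, and the check that $|\rho_{13}|$ and $|\rho_{23}|$ are less than 1 is an assumption added here to avoid division by zero.

```python
import math


def partial_correlation(r12, r13, r23):
    """Correlation of x1 and x2 with x3 held constant, as it appears in (4.7)."""
    if abs(r13) >= 1.0 or abs(r23) >= 1.0:
        raise ValueError("r13 and r23 must lie strictly between -1 and 1")
    return (r12 - r13 * r23) / math.sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))


if __name__ == "__main__":
    # x1 and x2 correlated only through x3: the partial correlation vanishes.
    print(partial_correlation(0.6, 0.8, 0.75))
    # x3 unrelated to either variable: the simple correlation is unchanged.
    print(partial_correlation(0.5, 0.0, 0.0))
```

The first example shows how a sizeable simple correlation between $x_1$ and $x_2$ can disappear once $x_3$ is held constant.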
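... $x_2$ and $x_3$ are all zero, i.e., if $\rho_{12} = \rho_{13} = \rho_{23} = 0$, then (5.9) becomes
$$\rho_{13,23} = \frac{\sigma_3^2/\xi_3^2}{\left[ \left( \dfrac{\sigma_1^2}{\xi_1^2} + \dfrac{\sigma_3^2}{\xi_3^2} \right)\left( \dfrac{\sigma_2^2}{\xi_2^2} + \dfrac{\sigma_3^2}{\xi_3^2} \right) \right]^{1/2}}, \tag{5.10}$$
which is never zero except in the trivial case of $\sigma_3 = 0$, which would imply that $x_3$ was a constant. If the three coefficients of variation are equal, i.e., if
$$\frac{\sigma_1}{\xi_1} = \frac{\sigma_2}{\xi_2} = \frac{\sigma_3}{\xi_3}, \tag{5.11}$$
then (5.10) gives
$$\rho_{13,23} = \frac{1}{2}. \tag{5.12}$$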
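The size of this spurious correlation between the indices $x_1/x_3$ and $x_2/x_3$ depends only on the three coefficients of variation $\sigma_i/\xi_i$. The following sketch evaluates the expression (5.10) as it is read here, for uncorrelated $x_1$, $x_2$, $x_3$; the function name and argument names are hypothetical.

```python
import math


def index_correlation_uncorrelated(cv1, cv2, cv3):
    """Correlation between x1/x3 and x2/x3 when x1, x2, x3 are uncorrelated.

    cv1, cv2, cv3 are the coefficients of variation sigma_i / xi_i.
    """
    denominator = math.sqrt((cv1 ** 2 + cv3 ** 2) * (cv2 ** 2 + cv3 ** 2))
    return cv3 ** 2 / denominator


if __name__ == "__main__":
    print(index_correlation_uncorrelated(0.1, 0.1, 0.1))   # equal coefficients of variation
    print(index_correlation_uncorrelated(0.1, 0.1, 0.0))   # x3 constant
```

With equal coefficients of variation the value is one half, as in (5.12), and it falls to zero only when the common divisor $x_3$ does not vary.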
13.6. Regression on Several Independent Variables
The extension of regression analysis to $r$ independent variables is straightforward. We have $(r + 1)$-tuples of observations, $(y_\nu, x_{1\nu}, \ldots, x_{r\nu})$, $\nu = 1, \ldots, n$. We assume that $y$ is normally distributed with variance $\sigma^2$ about $\eta$, where $\eta$ is a simple linear function:
$$\eta = \alpha + \beta_1(x_1 - \bar{x}_1) + \cdots + \beta_r(x_r - \bar{x}_r). \tag{6.1}$$
The estimated equation is
$$Y = a + b_1(x_1 - \bar{x}_1) + \cdots + b_r(x_r - \bar{x}_r), \tag{6.2}$$
REGRESSION ON SEVERAL INDEPENDENT VARIABLES
CHAP.
13
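and the sample estimates $a, b_1, \ldots, b_r$ are obtained by minimizing the sum of squares of deviations between the observed and predicted values,
$$R = \sum_\nu^n (y_\nu - Y_\nu)^2 = \sum_\nu^n \left\{ y_\nu - \left[ a + b_1(x_{1\nu} - \bar{x}_1) + \cdots + b_r(x_{r\nu} - \bar{x}_r) \right] \right\}^2, \tag{6.3}$$
by differentiating with respect to $a, b_1, \ldots, b_r$ and equating to zero. This procedure gives
$$a = \frac{1}{n}\sum_\nu^n y_\nu = \bar{y} \tag{6.4}$$
and a set of $r$ simultaneous linear equations, the so-called normal equations:
$$\begin{aligned} b_1\sum{}' x_1^2 + b_2\sum{}' x_1x_2 + \cdots + b_r\sum{}' x_1x_r &= \sum{}' yx_1, \\ b_1\sum{}' x_1x_2 + b_2\sum{}' x_2^2 + \cdots + b_r\sum{}' x_2x_r &= \sum{}' yx_2, \\ &\;\;\vdots \\ b_1\sum{}' x_1x_r + b_2\sum{}' x_2x_r + \cdots + b_r\sum{}' x_r^2 &= \sum{}' yx_r. \end{aligned} \tag{6.5}$$
These equations can be solved for the $b$'s. However, an alternative procedure which will also give us $V[b_i]$ and $\operatorname{Cov}[b_i, b_j]$ is to be preferred. Suppose that there exist constants $c_{11}, c_{12}, \ldots, c_{1r}$ with properties to be defined later. Multiply the first of the normal equations by $c_{11}$, the second by $c_{12}$, etc., sum the resulting equations, and rearrange so as to collect all the coefficients of $b_1$ together, all the coefficients of $b_2$ together, etc.:
$$\begin{aligned} &b_1\left( c_{11}\sum{}' x_1^2 + c_{12}\sum{}' x_1x_2 + \cdots + c_{1r}\sum{}' x_1x_r \right) \\ &\quad + b_2\left( c_{11}\sum{}' x_1x_2 + c_{12}\sum{}' x_2^2 + \cdots + c_{1r}\sum{}' x_2x_r \right) + \cdots \\ &\quad + b_r\left( c_{11}\sum{}' x_1x_r + c_{12}\sum{}' x_2x_r + \cdots + c_{1r}\sum{}' x_r^2 \right) = c_{11}\sum{}' yx_1 + c_{12}\sum{}' yx_2 + \cdots + c_{1r}\sum{}' yx_r. \end{aligned} \tag{6.6}$$
Let us require that the coefficient of $b_1$ in this equation is 1 and the coefficients of all the other $b$'s are zero, i.e., that
$$\begin{aligned} c_{11}\sum{}' x_1^2 + c_{12}\sum{}' x_1x_2 + \cdots + c_{1r}\sum{}' x_1x_r &= 1, \\ c_{11}\sum{}' x_1x_2 + c_{12}\sum{}' x_2^2 + \cdots + c_{1r}\sum{}' x_2x_r &= 0, \\ &\;\;\vdots \\ c_{11}\sum{}' x_1x_r + c_{12}\sum{}' x_2x_r + \cdots + c_{1r}\sum{}' x_r^2 &= 0. \end{aligned} \tag{6.7}$$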
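This is a set of $r$ simultaneous linear equations for $r$ unknowns, namely, the constants $c_{1j}$, $j = 1, \ldots, r$, and they can be solved to give solutions for the $c_{1j}$ in terms of the observed $x_{i\nu}$, since the various sums of squares and products $\sum' x_i^2$, $\sum' x_ix_j$ are readily calculated. Substituting the equations (6.7) in (6.6) gives
$$b_1 = c_{11}\sum{}' yx_1 + c_{12}\sum{}' yx_2 + \cdots + c_{1r}\sum{}' yx_r; \tag{6.8}$$
so once the $c_{1j}$ have been found $b_1$ can be calculated simply, since the $\sum' yx_j$ are known numbers. For each $b_i$, we can find a similar set of $c_{ij}$ ($j = 1, \ldots, r$) which will do the same trick for $b_i$ as (6.8) does for $b_1$. For $b_i$ we have the set of simultaneous equations
$$\begin{aligned} c_{i1}\sum{}' x_1^2 + c_{i2}\sum{}' x_1x_2 + \cdots + c_{ir}\sum{}' x_1x_r &= 0, \\ &\;\;\vdots \\ c_{i1}\sum{}' x_1x_i + c_{i2}\sum{}' x_2x_i + \cdots + c_{ii}\sum{}' x_i^2 + \cdots + c_{ir}\sum{}' x_ix_r &= 1, \\ &\;\;\vdots \\ c_{i1}\sum{}' x_1x_r + c_{i2}\sum{}' x_2x_r + \cdots + c_{ir}\sum{}' x_r^2 &= 0, \end{aligned} \tag{6.9}$$
where the right-hand sides are 1 for the $i$th equation and zero for all the others. The $i$th equation has $\sum' x_i^2$ as the coefficient of $c_{ii}$. Analogous to (6.8), using (6.9) in (6.6),
$$b_i = c_{i1}\sum{}' yx_1 + c_{i2}\sum{}' yx_2 + \cdots + c_{ir}\sum{}' yx_r. \tag{6.10}$$
Recalling that $\sum' yx_i = \sum_\nu^n y_\nu(x_{i\nu} - \bar{x}_i)$, we can write this as
$$b_i = \sum_\nu^n y_\nu\left[ c_{i1}(x_{1\nu} - \bar{x}_1) + c_{i2}(x_{2\nu} - \bar{x}_2) + \cdots + c_{ir}(x_{r\nu} - \bar{x}_r) \right], \tag{6.11}$$
which shows that $b_i$ is a linear function of the $y_\nu$. Since the $y_\nu$ are normally distributed, $b_i$ will be also. The quantity in square brackets in (6.11) is solely a function of the $x$'s. It will be convenient to have a symbol for it, say
$$k_{i\nu} = c_{i1}(x_{1\nu} - \bar{x}_1) + c_{i2}(x_{2\nu} - \bar{x}_2) + \cdots + c_{ir}(x_{r\nu} - \bar{x}_r). \tag{6.12}$$
Then
$$b_i = \sum_\nu^n y_\nu k_{i\nu} \tag{6.13}$$
and
$$V[b_i] = \sum_\nu^n k_{i\nu}^2V[y_\nu] = \sigma^2\sum_\nu^n k_{i\nu}^2. \tag{6.14}$$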
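Thus when we have found $\sum_\nu^n k_{i\nu}^2$, we will have $V[b_i]$. Now
$$\sum_\nu^n k_{i\nu}^2 = \sum_\nu^n k_{i\nu}k_{i\nu} = \sum_\nu^n k_{i\nu}\left[ c_{i1}(x_{1\nu} - \bar{x}_1) + \cdots + c_{ir}(x_{r\nu} - \bar{x}_r) \right] = c_{i1}\sum_\nu^n k_{i\nu}(x_{1\nu} - \bar{x}_1) + \cdots + c_{ir}\sum_\nu^n k_{i\nu}(x_{r\nu} - \bar{x}_r). \tag{6.15}$$
In the sequence of terms with the index $ij$ on the $c$'s, $i$ fixed and $j = 1, \ldots, r$, examine a particular one, say that for $j = h$:
$$\begin{aligned} c_{ih}\sum_\nu^n k_{i\nu}(x_{h\nu} - \bar{x}_h) &= c_{ih}\sum_\nu^n \left[ c_{i1}(x_{1\nu} - \bar{x}_1) + \cdots + c_{ih}(x_{h\nu} - \bar{x}_h) + \cdots + c_{ir}(x_{r\nu} - \bar{x}_r) \right](x_{h\nu} - \bar{x}_h) \\ &= c_{ih}\left[ c_{i1}\sum_\nu^n (x_{1\nu} - \bar{x}_1)(x_{h\nu} - \bar{x}_h) + \cdots + c_{ih}\sum_\nu^n (x_{h\nu} - \bar{x}_h)^2 + \cdots + c_{ir}\sum_\nu^n (x_{h\nu} - \bar{x}_h)(x_{r\nu} - \bar{x}_r) \right] \\ &= c_{ih}\left( c_{i1}\sum{}' x_1x_h + c_{i2}\sum{}' x_2x_h + \cdots + c_{ih}\sum{}' x_h^2 + \cdots + c_{ir}\sum{}' x_hx_r \right). \end{aligned} \tag{6.16}$$
We have two cases to consider: $h = i$ and $h \neq i$. If $h = i$, (6.16) is
$$c_{ii}\left( c_{i1}\sum{}' x_1x_i + c_{i2}\sum{}' x_2x_i + \cdots + c_{ii}\sum{}' x_i^2 + \cdots + c_{ir}\sum{}' x_ix_r \right),$$
and the part in brackets is identically equal to the left-hand side of the $i$th equation in the set (6.9), for which the right-hand side is 1. Thus, when $h = i$, (6.16) equals $c_{ii}$. For all values of $h$ other than $i$, the part in brackets of (6.16) is one of the equations (6.9) which equals zero. Thus in (6.15) all the terms for which $i \neq j$ are zero, and the one term for which $i = j$ is equal to $c_{ii}$:
$$\sum_\nu^n k_{i\nu}^2 = c_{ii}, \tag{6.17}$$
so, from (6.14),
$$V[b_i] = \sigma^2c_{ii}. \tag{6.18}$$
To obtain the covariance of $b_i$, $b_j$, we proceed as follows. We have
$$V[b_i + b_j] = V[b_i] + V[b_j] + 2\operatorname{Cov}[b_i, b_j]. \tag{6.19}$$
From (6.13),
$$b_i + b_j = \sum_\nu^n y_\nu k_{i\nu} + \sum_\nu^n y_\nu k_{j\nu} = \sum_\nu^n y_\nu(k_{i\nu} + k_{j\nu}), \tag{6.20}$$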
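with variance
$$V[b_i + b_j] = \sum_\nu^n (k_{i\nu} + k_{j\nu})^2V[y_\nu] = \sigma^2\sum_\nu^n (k_{i\nu} + k_{j\nu})^2 = \sigma^2\sum_\nu^n k_{i\nu}^2 + \sigma^2\sum_\nu^n k_{j\nu}^2 + 2\sigma^2\sum_\nu^n k_{i\nu}k_{j\nu} = V[b_i] + V[b_j] + 2\sigma^2\sum_\nu^n k_{i\nu}k_{j\nu}. \tag{6.21}$$
Substituting in (6.19) gives
$$\operatorname{Cov}[b_i, b_j] = \sigma^2\sum_\nu^n k_{i\nu}k_{j\nu}. \tag{6.22}$$
From the definition of $k_{i\nu}$, (6.12),
$$\sum_\nu^n k_{i\nu}k_{j\nu} = c_{i1}\sum_\nu^n k_{j\nu}(x_{1\nu} - \bar{x}_1) + \cdots + c_{ir}\sum_\nu^n k_{j\nu}(x_{r\nu} - \bar{x}_r). \tag{6.23}$$
In the sequence of these terms with the index $il$ on the $c$'s, $i$ fixed and $l = 1, \ldots, r$, examine a particular one, say that for $l = h$:
$$\begin{aligned} c_{ih}\sum_\nu^n k_{j\nu}(x_{h\nu} - \bar{x}_h) &= c_{ih}\sum_\nu^n \left[ c_{j1}(x_{1\nu} - \bar{x}_1) + \cdots + c_{jh}(x_{h\nu} - \bar{x}_h) + \cdots + c_{jr}(x_{r\nu} - \bar{x}_r) \right](x_{h\nu} - \bar{x}_h) \\ &= c_{ih}\left( c_{j1}\sum{}' x_1x_h + \cdots + c_{jh}\sum{}' x_h^2 + \cdots + c_{jr}\sum{}' x_hx_r \right). \end{aligned} \tag{6.24}$$
We have two cases to consider: $h = j$ and $h \neq j$. If $h = j$, (6.24) is
$$c_{ij}\left( c_{j1}\sum{}' x_1x_j + \cdots + c_{jj}\sum{}' x_j^2 + \cdots + c_{jr}\sum{}' x_jx_r \right), \tag{6.25}$$
and, except that $j$ appears in place of $i$, the part in brackets is identical with the $i$th equation in the set (6.9), for which the right-hand side is 1. For all values of $h$ other than $j$, the part in brackets of (6.24) is one of the equations (6.9) for which the right-hand side is zero. Thus, in (6.23), all the terms are zero except one, which is $c_{ij}$. Therefore
$$\sum_\nu^n k_{i\nu}k_{j\nu} = c_{ij}, \tag{6.26}$$
and so, from (6.22),
$$\operatorname{Cov}[b_i, b_j] = \sigma^2c_{ij}. \tag{6.27}$$
We can write the deviation between the observation $y_\nu$ and the true value $\eta_\nu$ as
$$y_\nu - \eta_\nu = (y_\nu - Y_\nu) + (a - \alpha) + (b_1 - \beta_1)(x_{1\nu} - \bar{x}_1) + \cdots + (b_r - \beta_r)(x_{r\nu} - \bar{x}_r). \tag{6.28}$$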
Squaring and summing gives
$$\sum_\nu^n (y_\nu - \eta_\nu)^2 = \sum_\nu^n (y_\nu - Y_\nu)^2 + n(a - \alpha)^2 + \sum_i^r (b_i - \beta_i)^2\sum{}' x_i^2 + 2\sum_{i=1}^{r-1}\sum_{j=i+1}^{r} (b_i - \beta_i)(b_j - \beta_j)\sum{}' x_ix_j. \tag{6.29}$$
The left-hand side is distributed as $\sigma^2\chi^2(n)$. On the right-hand side $\sum_\nu^n (y_\nu - Y_\nu)^2$ is distributed as $\sigma^2\chi^2(n - 1 - r)$, $n(a - \alpha)^2$ as $\sigma^2\chi^2(1)$, and the remaining part, as surmised in (3.31), is distributed as $\sigma^2\chi^2(r)$. Each of these components is independent of the others. If we define
$$s^2 = \frac{1}{n - 1 - r}\sum_\nu^n (y_\nu - Y_\nu)^2, \tag{6.30}$$
it will be distributed as $\sigma^2\chi^2(n - 1 - r)/(n - 1 - r)$ and have expected value $\sigma^2$. We can make separate tests of the null hypotheses $\beta_i = 0$ by substituting this estimate $s^2$ for $\sigma^2$ in (6.18), but in general these tests are not independent since usually $\operatorname{Cov}[b_i, b_j] \neq 0$. We can make a joint test of the null hypothesis $\beta_i = 0$ for all $i$ as follows. Since (3.31) is distributed as $\sigma^2\chi^2(r)$,
$$\frac{1}{r}\left[ \sum_i^r (b_i - \beta_i)^2\sum{}' x_i^2 + 2\sum_{i=1}^{r-1}\sum_{j=i+1}^{r} (b_i - \beta_i)(b_j - \beta_j)\sum{}' x_ix_j \right] \sim \frac{\sigma^2\chi^2(r)}{r} \tag{6.31}$$
and has expected value $\sigma^2$, and the distribution of the ratio of it to $s^2$, (6.30), will be $F(r, n - 1 - r)$. Hence, under the null hypothesis $\beta_i = 0$ for all $i$,
$$\frac{1}{rs^2}\left( \sum_i^r b_i^2\sum{}' x_i^2 + 2\sum_{i=1}^{r-1}\sum_{j=i+1}^{r} b_ib_j\sum{}' x_ix_j \right) \sim F(r, n - 1 - r). \tag{6.32}$$
We could calculate the quantity in parentheses, known as the sum of squares due to regression, in this form, but an identity is usually more convenient. We multiply the normal equations (6.5) by $b_1$, $b_2$, etc. and sum:
$$\begin{aligned} &b_1^2\sum{}' x_1^2 + b_2^2\sum{}' x_2^2 + \cdots + b_r^2\sum{}' x_r^2 + 2b_1b_2\sum{}' x_1x_2 + \cdots + 2b_{r-1}b_r\sum{}' x_{r-1}x_r \\ &\quad = \sum_i^r b_i^2\sum{}' x_i^2 + 2\sum_{i=1}^{r-1}\sum_{j=i+1}^{r} b_ib_j\sum{}' x_ix_j = b_1\sum{}' yx_1 + \cdots + b_r\sum{}' yx_r, \end{aligned} \tag{6.33}$$
which is the usual form for computing the sum of squares due to regression. To calculate $\sum_\nu^n (y_\nu - Y_\nu)^2$, arguments similar to those of (3.41) to (3.45)
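will give
$$\sum_\nu^n (Y_\nu - \bar{y})^2 = b_1\sum{}' yx_1 + \cdots + b_r\sum{}' yx_r, \tag{6.34}$$
and as in (3.40),
$$\sum_\nu^n (y_\nu - Y_\nu)^2 = \sum_\nu^n (y_\nu - \bar{y})^2 - \sum_\nu^n (Y_\nu - \bar{y})^2. \tag{6.35}$$
Finally, the predicted value $Y$, (6.2), has expected value $\eta$, (6.1), and variance
$$V[Y] = V[a] + \sum_{i=1}^{r} (x_i - \bar{x}_i)^2V[b_i] + 2\sum_{i=1}^{r-1}\sum_{j=i+1}^{r} (x_i - \bar{x}_i)(x_j - \bar{x}_j)\operatorname{Cov}[b_i, b_j].$$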
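The procedure of this section, in which the sets of equations (6.9) are solved for the $c_{ij}$ and the $b_i$ then follow from (6.10), can be sketched in Python using only the standard library. The function names, the Gauss-Jordan solver, and the data in the usage example are illustrative assumptions; the elimination routine simply solves all $r$ sets (6.9) at once by working on the coefficient array bordered with a unit matrix.

```python
def solve_c_multipliers(s):
    """Solve the r sets of equations (6.9): returns c with c*s equal to the unit matrix."""
    r = len(s)
    # Border the coefficients with the r right-hand sides of (6.9).
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(r)]
           for i, row in enumerate(s)]
    for col in range(r):
        pivot = max(range(col, r), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for k in range(r):
            if k != col:
                f = aug[k][col]
                aug[k] = [vk - f * vc for vk, vc in zip(aug[k], aug[col])]
    return [row[r:] for row in aug]


def regression_with_c_multipliers(y, xs):
    """Return (a, b, c): a from (6.4), b from (6.10), c from (6.9).

    xs is a list of r lists, one per independent variable.
    """
    n = len(y)
    r = len(xs)
    dev = []
    for column in xs:
        mean = sum(column) / n
        dev.append([v - mean for v in column])
    # s[i][j] = sum' x_i x_j, the coefficients of the normal equations (6.5).
    s = [[sum(u * w for u, w in zip(dev[i], dev[j])) for j in range(r)]
         for i in range(r)]
    g = [sum(yv * u for yv, u in zip(y, dev[i])) for i in range(r)]   # sum' y x_i
    c = solve_c_multipliers(s)
    b = [sum(c[i][j] * g[j] for j in range(r)) for i in range(r)]     # (6.10)
    a = sum(y) / n                                                   # (6.4)
    return a, b, c


if __name__ == "__main__":
    xs = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 6.0]]
    y = [3.0, 6.0, 5.0, 8.0, 7.0]
    a, b, c = regression_with_c_multipliers(y, xs)
    print(a, b)
    print(c)   # multiply by an estimate of sigma^2 for V[b_i] and Cov[b_i, b_j]
```

By (6.18) and (6.27), multiplying the returned $c_{ij}$ by $s^2$ from (6.30) gives estimates of the variances and covariances of the $b_i$.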
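13.7. A Matrix Representation
The set of equations (6.9) gives a set of $c_{ij}$, $j = 1, \ldots, r$, for each value of $i$, $i = 1, \ldots, r$. These $c_{ij}$ can be written in a square matrix, say $C$, which is known as the $c$ matrix:
$$C = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1r} \\ c_{21} & c_{22} & \cdots & c_{2r} \\ \vdots & \vdots & & \vdots \\ c_{r1} & c_{r2} & \cdots & c_{rr} \end{pmatrix}. \tag{7.1}$$
This section, which uses matrix notation, is not used explicitly in the remainder of this chapter and may be omitted. Its purpose is to show that $C$ is actually the inverse of the matrix of the coefficients in the normal equations (6.5). Let the $n \times r$ matrix of values of $x_{i\nu} - \bar{x}_i$, corresponding to the values of $x_i$ at which observations were made on $y$, be $X$:
$$X = \begin{pmatrix} x_{11} - \bar{x}_1 & x_{21} - \bar{x}_2 & \cdots & x_{r1} - \bar{x}_r \\ x_{12} - \bar{x}_1 & x_{22} - \bar{x}_2 & \cdots & x_{r2} - \bar{x}_r \\ \vdots & \vdots & & \vdots \\ x_{1n} - \bar{x}_1 & x_{2n} - \bar{x}_2 & \cdots & x_{rn} - \bar{x}_r \end{pmatrix}. \tag{7.2}$$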
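Then the transpose of $X$, namely $X'$, is
$$X' = \begin{pmatrix} x_{11} - \bar{x}_1 & x_{12} - \bar{x}_1 & \cdots & x_{1n} - \bar{x}_1 \\ x_{21} - \bar{x}_2 & x_{22} - \bar{x}_2 & \cdots & x_{2n} - \bar{x}_2 \\ \vdots & \vdots & & \vdots \\ x_{r1} - \bar{x}_r & x_{r2} - \bar{x}_r & \cdots & x_{rn} - \bar{x}_r \end{pmatrix} \tag{7.3}$$
and, using the notation of (3.14),
$$X'X = \begin{pmatrix} \sum' x_1^2 & \sum' x_1x_2 & \cdots & \sum' x_1x_r \\ \sum' x_1x_2 & \sum' x_2^2 & \cdots & \sum' x_2x_r \\ \vdots & \vdots & & \vdots \\ \sum' x_1x_r & \sum' x_2x_r & \cdots & \sum' x_r^2 \end{pmatrix} \tag{7.4}$$
is the matrix of the coefficients of the $b$'s in the normal equations (6.5). Define $B$ and $G$ as
$$B = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_r \end{pmatrix}, \qquad G = \begin{pmatrix} \sum' yx_1 \\ \sum' yx_2 \\ \vdots \\ \sum' yx_r \end{pmatrix}. \tag{7.5}$$
Then the normal equations (6.5) can be written as
$$X'XB = G. \tag{7.6}$$
Premultiply this by $(X'X)^{-1}$. Since $(X'X)^{-1}(X'X) = I$, this gives
$$B = (X'X)^{-1}G. \tag{7.7}$$
Thus the regression coefficients $b_i$ can be obtained by premultiplying $G$, the right-hand sides of the normal equations, by the inverse of $(X'X)$, $(X'X)$ being the matrix of the coefficients of the $b$'s in the normal equations. Suppose that $C = (X'X)^{-1}$. Then (7.7) is equivalent to
$$\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_r \end{pmatrix} = \begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1r} \\ c_{21} & c_{22} & \cdots & c_{2r} \\ \vdots & \vdots & & \vdots \\ c_{r1} & c_{r2} & \cdots & c_{rr} \end{pmatrix} \begin{pmatrix} \sum' yx_1 \\ \sum' yx_2 \\ \vdots \\ \sum' yx_r \end{pmatrix}, \tag{7.8}$$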
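which gives

$$b_k = c_{k1} \sum{}' y x_1 + c_{k2} \sum{}' y x_2 + \cdots + c_{kr} \sum{}' y x_r = \sum_{i}^{r} c_{ki} \sum{}' y x_i, \tag{7.9}$$

which is identical to (6.10). Substituting $C = (X'X)^{-1}$ in $(X'X)^{-1}(X'X) = I$ gives

$$\begin{pmatrix} c_{11} & c_{12} & \cdots & c_{1r} \\ c_{21} & c_{22} & \cdots & c_{2r} \\ \vdots & \vdots & & \vdots \\ c_{r1} & c_{r2} & \cdots & c_{rr} \end{pmatrix} \begin{pmatrix} \sum{}' x_1^2 & \sum{}' x_1 x_2 & \cdots & \sum{}' x_1 x_r \\ \sum{}' x_1 x_2 & \sum{}' x_2^2 & \cdots & \sum{}' x_2 x_r \\ \vdots & \vdots & & \vdots \\ \sum{}' x_1 x_r & \sum{}' x_2 x_r & \cdots & \sum{}' x_r^2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}. \tag{7.10}$$

Now two matrices are equal if and only if the corresponding elements are equal. If we carry out the matrix multiplication on the left-hand side of (7.10), we get a single matrix, and if we put each element in the first row equal to the corresponding element in the first row of the matrix on the right-hand side of (7.10), we get

$$\begin{aligned} c_{11} \sum{}' x_1^2 + c_{12} \sum{}' x_1 x_2 + \cdots + c_{1r} \sum{}' x_1 x_r &= 1, \\ c_{11} \sum{}' x_1 x_2 + c_{12} \sum{}' x_2^2 + \cdots + c_{1r} \sum{}' x_2 x_r &= 0, \\ &\ \,\vdots \end{aligned} \tag{7.11}$$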
which set of equations is identical with the set (6.7), and in general equating the ith columns gives (6.9). This shows that the matrix of the $c_{ij}$ obtained by solving the sets of equations (6.9) is in fact the inverse of the X′X matrix.

13.8. A Test of Whether Regression on r Variables Gives a Significantly Better Fit than Regression on q Variables

Clearly there is no point in using a regression equation of y on r variables $x_i$, $i = 1, \ldots, r$, unless it gives a significantly better fit than the regression equation of y on a subset of the $x_i$, $i = 1, \ldots, q$. In this section we derive a likelihood ratio test.
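We first need the maximum likelihood estimators of $\alpha$ and the $\beta_i$. The density function of $y_\nu$ is

$$p\{y_\nu\} = N\Big(\alpha + \sum_{i}^{r} \beta_i(x_{i\nu} - \bar{x}_i),\ \sigma^2\Big) = \frac{1}{\sqrt{2\pi}\sqrt{\sigma^2}} \exp\Big\{-\frac{1}{2\sigma^2}\Big[y_\nu - \alpha - \sum_{i}^{r} \beta_i(x_{i\nu} - \bar{x}_i)\Big]^2\Big\}. \tag{8.1}$$

The likelihood function is

$$L = \prod_{\nu}^{n} p\{y_\nu\} = (\sqrt{2\pi})^{-n} (\sigma^2)^{-n/2} \exp\Big\{-\frac{1}{2\sigma^2} \sum_{\nu}^{n} \Big[y_\nu - \alpha - \sum_{i}^{r} \beta_i(x_{i\nu} - \bar{x}_i)\Big]^2\Big\}. \tag{8.2}$$

The logarithm is easier to handle:

$$\log L = -n \log \sqrt{2\pi} - \frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{\nu}^{n} \Big[y_\nu - \alpha - \sum_{i}^{r} \beta_i(x_{i\nu} - \bar{x}_i)\Big]^2. \tag{8.3}$$

As usual, we differentiate with respect to the parameters:

$$\frac{\partial \log L}{\partial \alpha} = -\frac{1}{2\sigma^2}(-1)(2) \sum_{\nu}^{n} \Big[y_\nu - \alpha - \sum_{i}^{r} \beta_i(x_{i\nu} - \bar{x}_i)\Big] = 0, \tag{8.4}$$

$$\frac{\partial \log L}{\partial \beta_i} = -\frac{1}{2\sigma^2}(-1)(2) \sum_{\nu}^{n} \Big[y_\nu - \alpha - \sum_{j}^{r} \beta_j(x_{j\nu} - \bar{x}_j)\Big](x_{i\nu} - \bar{x}_i) = 0, \tag{8.5}$$

$$\frac{\partial \log L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} - \frac{1}{2}(-1)\frac{1}{(\sigma^2)^2} \sum_{\nu}^{n} \Big[y_\nu - \alpha - \sum_{i}^{r} \beta_i(x_{i\nu} - \bar{x}_i)\Big]^2 = 0. \tag{8.6}$$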
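From (8.4) we get $\hat{\alpha} = \sum_{\nu}^{n} y_\nu / n$, identical with the least squares solution for a in (6.4). From the set of r equations (8.5) we get a set of r simultaneous equations identical to those of (6.5) with $\hat{\beta}_i = b_i$. From (8.6) we get

$$\hat{\sigma}^2 = \frac{1}{n} \sum_{\nu}^{n} \Big[y_\nu - \hat{\alpha} - \sum_{i}^{r} \hat{\beta}_i(x_{i\nu} - \bar{x}_i)\Big]^2 = \frac{1}{n} \sum_{\nu}^{n} (y_\nu - Y_\nu)^2. \tag{8.7}$$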
Thus the maximum likelihood estimator of $\sigma^2$ is biased due to the occurrence of the divisor n in place of n − 1 − r. The maximum likelihood estimators of $\alpha$ and $\beta_i$ are identical with the least squares estimators, but whereas the former are derived under the assumption of normality, (8.1), the derivation of the least squares estimators involved no assumption about the form of the distribution.
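The size of this bias is easy to see numerically. The following is a minimal sketch, not part of the original derivation, that applies the two divisors to the residual sum of squares quoted in the example of Section 13.12 (n = 21, r = 3); the variable names are illustrative only.

```python
# Residual sum of squares and dimensions quoted from the stack-loss example
# of Section 13.12 (taken from the text, not recomputed here).
residual_ss = 178.832   # sum over v of (y_v - Y_v)^2
n = 21                  # number of observations
r = 3                   # number of independent variables

sigma2_mle = residual_ss / n              # (8.7): divisor n, biased
s2_unbiased = residual_ss / (n - 1 - r)   # usual estimator: divisor n - 1 - r

# The two estimators differ only by the constant factor n / (n - 1 - r).
ratio = s2_unbiased / sigma2_mle

print(f"maximum likelihood estimate = {sigma2_mle:.3f}")
print(f"unbiased estimate s^2       = {s2_unbiased:.3f}")
print(f"ratio n/(n - 1 - r)         = {ratio:.4f}")
```

With only 21 observations and three fitted coefficients the maximum likelihood estimate is noticeably smaller than $s^2$; the gap shrinks as n grows relative to r.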
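We now construct a likelihood ratio test of the null hypothesis that y can be fitted by a regression equation with parameters $\alpha, \beta_1, \ldots, \beta_q$ against the alternative hypothesis that additional parameters $\beta_{q+1}, \ldots, \beta_r$ are necessary. Let $\hat{\sigma}_q^2$ be the estimate of the residual variance when q $\beta$'s are used, i.e., for $\omega$, and let $\hat{\sigma}_r^2$ be the same quantity when all r $\beta$'s are used, i.e., for $\Omega$. Similarly, let $\hat{\beta}_{iq}$, $\hat{\beta}_{ir}$ be the estimators of $\beta_i$ when the regression equation includes q and r $\beta$'s respectively. The likelihood ratio is

$$\lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})} = \frac{[(2\pi\hat{\sigma}_q^2)^{n/2}]^{-1} \exp\big\{-(2\hat{\sigma}_q^2)^{-1} \sum_{\nu} [y_\nu - \hat{\alpha} - \sum_{i}^{q} \hat{\beta}_{iq}(x_{i\nu} - \bar{x}_i)]^2\big\}}{[(2\pi\hat{\sigma}_r^2)^{n/2}]^{-1} \exp\big\{-(2\hat{\sigma}_r^2)^{-1} \sum_{\nu} [y_\nu - \hat{\alpha} - \sum_{i}^{r} \hat{\beta}_{ir}(x_{i\nu} - \bar{x}_i)]^2\big\}}. \tag{8.8}$$

The summations in the two exponents are identical with the summations in the expression for $\hat{\sigma}^2$, (8.7), so

$$\lambda = \Big(\frac{\hat{\sigma}_r^2}{\hat{\sigma}_q^2}\Big)^{n/2}. \tag{8.9}$$

Any monotonic function of $\lambda$ will serve as our statistic. The function below on the left-hand side of (8.10) is a function of $s_q^2/s_r^2$, where $s_q^2$, $s_r^2$ are the usual unbiased estimators of the residual variance, and $s_q^2$, $s_r^2$ are, of course, functions of $\hat{\sigma}_q^2$, $\hat{\sigma}_r^2$. The statistic is usually calculated in the form given on the right-hand side, where $G_r$ and $G_q$ are as defined in Table 13.2:

$$\frac{(r - q)^{-1}[(n - 1 - q)s_q^2 - (n - 1 - r)s_r^2]}{s_r^2} = \frac{(G_r - G_q)/(r - q)}{s_r^2}. \tag{8.10}$$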
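Table 13.2

| Source of variance | Sums of squares | Degrees of freedom | Mean squares |
|---|---|---|---|
| A. Regression on $x_1, \ldots, x_q, x_{q+1}, \ldots, x_r$ | $G_r$ | r | |
| B. Regression on $x_1, \ldots, x_q$ | $G_q$ | q | |
| C. Difference A − B = difference E − D | $G_r - G_q = (n - 1 - q)s_q^2 - (n - 1 - r)s_r^2$ | r − q | $(G_r - G_q)/(r - q)$ |
| D. Remainder using $x_1, \ldots, x_q, x_{q+1}, \ldots, x_r$ | $(n - 1 - r)s_r^2$ | n − 1 − r | $s_r^2$ |
| E. Remainder using $x_1, \ldots, x_q$ | $(n - 1 - q)s_q^2$ | n − 1 − q | $s_q^2$ |
| F. Total | | n − 1 | |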
While we have shown that this statistic is equivalent to the likelihood ratio statistic, we have not given any indication as to its distribution under the null hypothesis. In fact, under the null hypothesis it is distributed as F(r − q, n − 1 − r). A proof of this for the general case is complicated (see, e.g., Section 14.3 of Anderson and Bancroft [2] and Section 5.8 of Kempthorne [3]) and we consider only the case of r = 2, q = 1. For this special case, Table 13.2 becomes Table 13.3.

Table 13.3

| Source of variance | Sums of squares | Degrees of freedom | Mean squares |
|---|---|---|---|
| Regression on $x_1, x_2$ jointly | A | 2 | |
| Regression on $x_1$ only | B | 1 | |
| Difference due to adding $x_2$ to $x_1$ | C = A − B | 1 | C |
| Remainder using regression on $x_1, x_2$ jointly | D = F − A | n − 3 | $s^2$ |
| Total | F | n − 1 | |
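Our objective is to show that the difference, say C, between the sum of squares due to regression on $x_1, x_2$ jointly, say A, and the sum of squares due to regression on $x_1$ only, say B, is independent of the remainder sum of squares, say D, and that under the null hypothesis $\beta_2 = 0$ the ratio $C/s^2$, where $s^2 = D/(n - 3)$, has the F(1, n − 3) distribution. Now

$$C = A - B = b_1 \sum{}' y x_1 + b_2 \sum{}' y x_2 - \frac{(\sum{}' x_1 y)^2}{\sum{}' x_1^2}, \tag{8.11}$$

and substituting for $b_1$ and $b_2$ gives

$$C = \frac{[\sum{}' x_1 y \sum{}' x_1 x_2 - \sum{}' x_2 y \sum{}' x_1^2]^2}{\sum{}' x_1^2 [\sum{}' x_1^2 \sum{}' x_2^2 - (\sum{}' x_1 x_2)^2]}. \tag{8.12}$$

We will derive (8.12) by a different route, which will make it clear that it is independent of $s^2 = D/(n - 3)$. Consider the regression of $x_2$ on $x_1$.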
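The estimated regression equation is

$$X_2 = \bar{x}_2 + \frac{\sum{}' x_1 x_2}{\sum{}' x_1^2}(x_1 - \bar{x}_1). \tag{8.13}$$

We define $x_{2.1}$ as the deviation of $x_2$ from the value predicted by this equation, i.e.,

$$x_{2.1} = x_2 - \bar{x}_2 - \frac{\sum{}' x_1 x_2}{\sum{}' x_1^2}(x_1 - \bar{x}_1). \tag{8.14}$$

Similarly, the estimated equation for the regression of y on $x_1$ is

$$Y = \bar{y} + \frac{\sum{}' x_1 y}{\sum{}' x_1^2}(x_1 - \bar{x}_1), \tag{8.15}$$

and we define $y_{2.1}$ as

$$y_{2.1} = y - \bar{y} - \frac{\sum{}' x_1 y}{\sum{}' x_1^2}(x_1 - \bar{x}_1). \tag{8.16}$$

We now consider the regression of $y_{2.1}$ on $x_{2.1}$. Since $\sum x_{2.1} = 0$ and $\sum y_{2.1} = 0$ we can write

$$b_{y_{2.1} x_{2.1}} = \frac{\sum x_{2.1} y_{2.1}}{\sum x_{2.1}^2}. \tag{8.17}$$

From the definitions of $x_{2.1}$ and $y_{2.1}$ it is straightforward to show that

$$\sum x_{2.1} y_{2.1} = \sum{}' x_2 y - \frac{\sum{}' x_1 x_2 \sum{}' x_1 y}{\sum{}' x_1^2}, \tag{8.18}$$

$$\sum x_{2.1}^2 = \sum{}' x_2^2 - \frac{(\sum{}' x_1 x_2)^2}{\sum{}' x_1^2}. \tag{8.19}$$

Making these substitutions in (8.17) and noting (3.20), we have

$$b_{y_{2.1} x_{2.1}} = \frac{(\sum{}' x_1^2)^{-1}[\sum{}' x_1^2 \sum{}' x_2 y - \sum{}' x_1 x_2 \sum{}' x_1 y]}{(\sum{}' x_1^2)^{-1}[\sum{}' x_1^2 \sum{}' x_2^2 - (\sum{}' x_1 x_2)^2]} = b_2. \tag{8.20}$$

The sum of squares due to regression of $y_{2.1}$ on $x_{2.1}$, say C′, is, by (11.3.11),

$$C' = \frac{(\sum x_{2.1} y_{2.1})^2}{\sum x_{2.1}^2} = \frac{[\sum{}' x_1^2 \sum{}' x_2 y - \sum{}' x_1 x_2 \sum{}' x_1 y]^2}{\sum{}' x_1^2 [\sum{}' x_1^2 \sum{}' x_2^2 - (\sum{}' x_1 x_2)^2]}. \tag{8.21}$$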
The remainder sum of squares for the regression of $y_{2.1}$ on $x_{2.1}$, say D′, is

$$D' = \sum y_{2.1}^2 - (\text{sum of squares due to regression of } y_{2.1} \text{ on } x_{2.1}) = \sum \Big[y - \bar{y} - \frac{\sum{}' x_1 y}{\sum{}' x_1^2}(x_1 - \bar{x}_1)\Big]^2 - C', \tag{8.22}$$

substituting for $y_{2.1}$ from (8.16) and using (8.21). It is straightforward to show that (8.22) can be written as

$$D' = \sum (y - \bar{y})^2 - \Big\{\frac{(\sum{}' x_1 y)^2}{\sum{}' x_1^2} + \frac{(\sum{}' x_1^2 \sum{}' x_2 y - \sum{}' x_1 x_2 \sum{}' x_1 y)^2}{\sum{}' x_1^2 [\sum{}' x_1^2 \sum{}' x_2^2 - (\sum{}' x_1 x_2)^2]}\Big\}. \tag{8.23}$$

Now from the identity of (8.11) and (8.12),

$$\frac{(\sum{}' x_1 y)^2}{\sum{}' x_1^2} + \frac{[\sum{}' x_1 y \sum{}' x_1 x_2 - \sum{}' x_2 y \sum{}' x_1^2]^2}{\sum{}' x_1^2 [\sum{}' x_1^2 \sum{}' x_2^2 - (\sum{}' x_1 x_2)^2]} = \text{sum of squares due to regression of } y \text{ on } x_1, x_2 \text{ jointly}. \tag{8.24}$$

Therefore

$$D' = \sum (y - \bar{y})^2 - (\text{sum of squares due to regression of } y \text{ on } x_1, x_2 \text{ jointly}) = D. \tag{8.25}$$

By the usual arguments (see Section 11.3) for simple linear regression on one independent variable, under the null hypothesis that the regression coefficient is zero the remainder sum of squares D′ is independent of the sum of squares due to regression C′, and they both have the same expected value. It therefore follows under the null hypothesis that D = D′ is independent of C = C′, and that the ratio of the corresponding mean squares has the F distribution with the appropriate degrees of freedom.

The test of Table 13.3 of whether $x_2$ adds to the fit of the regression of y on $x_1$ is identical with the test of the null hypothesis that $\beta_2$ in the joint regression equation is zero. For the latter test we would use (3.20) for $b_2$ and (3.25), modified by the substitution of $s^2$ for $\sigma^2$, for $\hat{V}[b_2]$ for the statistic

$$\frac{b_2}{\sqrt{\hat{V}[b_2]}} = \frac{1}{s} \cdot \frac{\sum{}' x_2 y \sum{}' x_1^2 - \sum{}' x_1 y \sum{}' x_1 x_2}{(\sum{}' x_1^2)^{1/2} [\sum{}' x_1^2 \sum{}' x_2^2 - (\sum{}' x_1 x_2)^2]^{1/2}}, \tag{8.26}$$
and this will have the t(n − 1 − 2) distribution under the null hypothesis. Its square is identical with the test of Table 13.3, the ratio of C as in (8.12) to $s^2$, which has the F(1, n − 1 − 2) distribution. The two tests are therefore identical.

13.9. Polynomial Regression

The conventional designation of $x_1$, $x_2$ as "independent" variables is unfortunate, for in general, except in suitably designed experiments, they are not independent, since usually their covariance is not zero. Nothing in the foregoing development is assumed about the $x_i$ except that they are known variables measured without error. We are therefore quite at liberty to use as $x_2$ the square of $x_1$, i.e., to fit the equation

$$\eta = \alpha + \beta_1(x_1 - \bar{x}_1) + \beta_2[(x_1^2) - \overline{(x_1^2)}] = \text{constant} + \beta_1 x_1 + \beta_2 x_1^2.$$
The foregoing techniques can therefore be used to fit a second-degree curve if we think it necessary. The test of Table 13.3 enables us to test whether a second-degree curve gives a significantly better fit than a straight line.

13.10. Further Uses for the c Matrix

One advantage of the c matrix method of handling multiple regression calculations is that once the c matrix has been obtained it is a relatively trivial matter to carry out the regression analysis of another dependent variable, say z, on the same x's. We merely need the $\sum{}' z x_i$, $i = 1, \ldots, r$, and substitute them in (6.10) to obtain the regression coefficients of z on the x's. In this way, e.g., we can readily compare the results of regressing on the x's various functions of y, e.g., log y or 1/y, to find the functional form which gives the best fit. A statistic appropriate for comparing fits is the multiple correlation coefficient, the sample value of which is defined by

$$R^2 = 1 - \frac{\text{remainder sum of squares of } y}{\text{unconditional sum of squares of } y} \tag{10.1}$$

$$\phantom{R^2} = \frac{\text{sum of squares due to regression}}{\text{unconditional sum of squares of } y}. \tag{10.2}$$
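As a brief illustration of (10.1) and (10.2), the sketch below evaluates both forms for the sums of squares reported for the example of Section 13.12 (Table 13.9). The function name `multiple_r_squared` is hypothetical and is introduced here only for illustration.

```python
def multiple_r_squared(regression_ss, total_ss):
    """Return R^2 computed by (10.1) and, as a cross-check, by (10.2)."""
    remainder_ss = total_ss - regression_ss
    r2_from_remainder = 1.0 - remainder_ss / total_ss   # (10.1)
    r2_from_regression = regression_ss / total_ss       # (10.2)
    return r2_from_remainder, r2_from_regression


# Sums of squares quoted from Table 13.9.
r2_a, r2_b = multiple_r_squared(regression_ss=1890.406, total_ss=2069.238)
print(f"R^2 by (10.1) = {r2_a:.4f}, by (10.2) = {r2_b:.4f}")
```

To compare log y or 1/y with y itself, the same function would be applied to the sums of squares obtained for each transformed variable.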
So far in this chapter, with the exception of Section 13.4 on the partial correlation coefficient, we have supposed that the $x_i$, $i = 1, \ldots, r$, are known fixed variables. In general, however, they may be obtained in any of three different ways.
1. They may be chosen deliberately in some specified pattern. In particular, suppose that there are three variables on numerical scales, $x_1$ taking only r values, $x_2$ only t values, and $x_3$ only u values, and we obtain an observation on y for every combination of $x_1$ with $x_2$ and with $x_3$. The total number of observations will be rtu. In this case the simple and multiple regression coefficients of each x on each of the other x's will be zero, and by (12.4.20) and (4.12) the corresponding correlation coefficients will be zero also.

2. Certain ranges for the variables $x_i$ may be selected, and combinations of values of $x_1$, $x_2$, and $x_3$ chosen by some randomization procedure. So far the only difference from case 1 is that the x's are chosen at random instead of by some deliberate pattern or design. However, now the regression coefficients of the x's among themselves will be distributed about expected values of zero with some variance. Furthermore, if the original choices of the $x_i$ were made from normal distributions, then we will have a multivariate normal distribution with the sample correlation coefficients $r_{x_i x_j}$ distributed around population values $\rho_{x_i x_j} = 0$.

3. We may observe a system from the outside, without exercising any control over it. If the $x_i$ have normal distributions, then we may have a multivariate normal distribution. Unlike case 2, in general the expected values of the regression coefficients of the x's among themselves will not now be zero.

In cases 2 and 3 we may be prepared to assume the multivariate normality. As implied in Section 13.4, just as we can carry out a simple regression analysis on a bivariate normal population, so we can carry out a multiple regression analysis on a multivariate normal population. We may wish to make tests of significance and set confidence limits on the regression coefficients of $x_\mu$ on the other x's in the regression equation of $x_\mu$ on the other x's. The c matrix gives us directly these regression equations.

Write out the equations (6.9) with $i = \mu$, omitting that with 1 on the right-hand side; at the same time move over to the right-hand side the terms involving $c_{\mu\mu}$:

$$\begin{aligned} c_{\mu 1} \sum{}' x_1^2 + c_{\mu 2} \sum{}' x_1 x_2 + \cdots + c_{\mu r} \sum{}' x_1 x_r &= -c_{\mu\mu} \sum{}' x_\mu x_1, \\ &\ \,\vdots \\ c_{\mu 1} \sum{}' x_1 x_r + c_{\mu 2} \sum{}' x_2 x_r + \cdots + c_{\mu r} \sum{}' x_r^2 &= -c_{\mu\mu} \sum{}' x_\mu x_r. \end{aligned} \tag{10.3}$$
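Now divide throughout by $-c_{\mu\mu}$:

$$\begin{aligned} \Big(-\frac{c_{\mu 1}}{c_{\mu\mu}}\Big) \sum{}' x_1^2 + \Big(-\frac{c_{\mu 2}}{c_{\mu\mu}}\Big) \sum{}' x_1 x_2 + \cdots + \Big(-\frac{c_{\mu r}}{c_{\mu\mu}}\Big) \sum{}' x_1 x_r &= \sum{}' x_\mu x_1, \\ &\ \,\vdots \\ \Big(-\frac{c_{\mu 1}}{c_{\mu\mu}}\Big) \sum{}' x_1 x_r + \Big(-\frac{c_{\mu 2}}{c_{\mu\mu}}\Big) \sum{}' x_2 x_r + \cdots + \Big(-\frac{c_{\mu r}}{c_{\mu\mu}}\Big) \sum{}' x_r^2 &= \sum{}' x_\mu x_r. \end{aligned} \tag{10.4}$$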
Comparing this set of equations with a typical set of normal equations (6.5), it is apparent that these are the normal equations for the regression of $x_\mu$ on $x_1, \ldots, x_r$ (omitting $x_\mu$), and the regression coefficient of $x_\mu$ on $x_1$ is $-c_{\mu 1}/c_{\mu\mu}$, etc. Thus, having found the c matrix (7.10), we can immediately write down the regression equation of $x_\mu$ on the other x's:

$$X_\mu = \bar{x}_\mu + \Big(-\frac{c_{\mu 1}}{c_{\mu\mu}}\Big)(x_1 - \bar{x}_1) + \cdots + \Big(-\frac{c_{\mu r}}{c_{\mu\mu}}\Big)(x_r - \bar{x}_r). \tag{10.5}$$

Obviously in the sequence $i = 1, \ldots, r$ on the right-hand side, $i = \mu$ is omitted. Each row in the c matrix gives us the regression equation of a different x on the remaining x's.

A more explicit notation for regression coefficients is sometimes desirable. Let $x_j$ stand for the sequence $x_1, x_2, \ldots, x_r$ in which $x_\mu$ and $x_k$ are omitted. Then by $b_{x_\mu x_k \cdot x_j}$ we mean the regression coefficient of $x_\mu$ on $x_k$ in the regression equation of $x_\mu$ on all the x's, $x_1$ to $x_r$ (but naturally excluding $x_\mu$). Thus in (10.5)

$$b_{x_\mu x_k \cdot x_j} = -\frac{c_{\mu k}}{c_{\mu\mu}}, \qquad j = 1, \ldots, r \ (\text{but } j \neq k, \mu). \tag{10.6}$$

We can readily obtain from the c matrix the estimated variances of the regression coefficients. We first obtain the estimated residual variance of $x_\mu$ about the regression plane on the remaining x's, say $s_\mu^2$. In the set of equations (10.3) we omitted the equation in (6.9) whose right-hand side was 1:

$$c_{\mu 1} \sum{}' x_1 x_\mu + c_{\mu 2} \sum{}' x_2 x_\mu + \cdots + c_{\mu\mu} \sum{}' x_\mu^2 + \cdots + c_{\mu r} \sum{}' x_r x_\mu = 1. \tag{10.7}$$
Dividing by $(-c_{\mu\mu})$:

$$\Big(-\frac{c_{\mu 1}}{c_{\mu\mu}}\Big) \sum{}' x_1 x_\mu + \Big(-\frac{c_{\mu 2}}{c_{\mu\mu}}\Big) \sum{}' x_2 x_\mu + \cdots + \Big(-\frac{c_{\mu r}}{c_{\mu\mu}}\Big) \sum{}' x_r x_\mu = \sum{}' x_\mu^2 - \frac{1}{c_{\mu\mu}}. \tag{10.8}$$

Since $-c_{\mu 1}/c_{\mu\mu}$ is the regression coefficient of $x_\mu$ on $x_1$, etc., the left-hand side of this is exactly (6.33), the sum of squares due to regression. But we do not need to calculate it in that form, for (10.8) gives it as $\sum{}' x_\mu^2 - 1/c_{\mu\mu}$. And, comparing (6.35), it is obvious that the residual sum of squares of $x_\mu$ about the regression plane on the other x's is $1/c_{\mu\mu}$. This residual sum of squares will have [n − 1 − (r − 1)] = n − r degrees of freedom; so $s_\mu^2 = (1/c_{\mu\mu})/(n - r)$.

We may note here another interesting relationship.* If $R_i$ is the multiple correlation coefficient of $x_i$ on the remaining x's, we can substitute the right-hand side of (10.8) in (10.2) to get

$$R_i^2 = \frac{\sum{}' x_i^2 - 1/c_{ii}}{\sum{}' x_i^2} = 1 - \frac{1}{c_{ii} \sum{}' x_i^2}. \tag{10.9}$$

Solving for $c_{ii}$ and substituting this solution in (6.18) gives

$$V[b_i] = \frac{\sigma^2}{\sum{}' x_i^2 (1 - R_i^2)}, \tag{10.10}$$
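analogous to (3.24). This shows that $V[b_i]$ increases as $x_i$ is more highly correlated with the other x's.

* Pointed out to the author by Cuthbert Daniel.

So far we have the regression coefficients of $x_\mu$ on the other x's, and also the residual variance, but to get the variance and covariance of these regression coefficients we need the c matrix corresponding to the equations (10.4). This can be obtained from the original c matrix by the formula

$$c'_{ij} = c_{ij} - \frac{c_{i\mu} c_{\mu j}}{c_{\mu\mu}}, \tag{10.11}$$

where $c'_{ij}$ is the ijth element of the c matrix obtained when $x_\mu$ is omitted. To prove this formula, order the x's so that $x_r$ is the variable to be eliminated. Then corresponding to (6.7), but omitting the last of those equations, we have

$$\begin{aligned} c_{11} \sum{}' x_1^2 + c_{12} \sum{}' x_1 x_2 + \cdots + c_{1(r-1)} \sum{}' x_1 x_{r-1} + c_{1r} \sum{}' x_1 x_r &= 1, \\ c_{11} \sum{}' x_1 x_2 + c_{12} \sum{}' x_2^2 + \cdots + c_{1(r-1)} \sum{}' x_2 x_{r-1} + c_{1r} \sum{}' x_2 x_r &= 0, \\ &\ \,\vdots \\ c_{11} \sum{}' x_1 x_{r-1} + c_{12} \sum{}' x_2 x_{r-1} + \cdots + c_{1(r-1)} \sum{}' x_{r-1}^2 + c_{1r} \sum{}' x_r x_{r-1} &= 0. \end{aligned} \tag{10.12}$$

The corresponding set of equations for the reduced c matrix is

$$\begin{aligned} c'_{11} \sum{}' x_1^2 + c'_{12} \sum{}' x_1 x_2 + \cdots + c'_{1(r-1)} \sum{}' x_1 x_{r-1} &= 1, \\ c'_{11} \sum{}' x_1 x_2 + c'_{12} \sum{}' x_2^2 + \cdots + c'_{1(r-1)} \sum{}' x_2 x_{r-1} &= 0, \\ &\ \,\vdots \\ c'_{11} \sum{}' x_1 x_{r-1} + c'_{12} \sum{}' x_2 x_{r-1} + \cdots + c'_{1(r-1)} \sum{}' x_{r-1}^2 &= 0. \end{aligned} \tag{10.13}$$

Subtracting corresponding equations in (10.12) from (10.13) gives

$$\begin{aligned} (c'_{11} - c_{11}) \sum{}' x_1^2 + (c'_{12} - c_{12}) \sum{}' x_1 x_2 + \cdots + (c'_{1(r-1)} - c_{1(r-1)}) \sum{}' x_1 x_{r-1} - c_{1r} \sum{}' x_1 x_r &= 0, \\ &\ \,\vdots \\ (c'_{11} - c_{11}) \sum{}' x_1 x_{r-1} + (c'_{12} - c_{12}) \sum{}' x_2 x_{r-1} + \cdots + (c'_{1(r-1)} - c_{1(r-1)}) \sum{}' x_{r-1}^2 - c_{1r} \sum{}' x_r x_{r-1} &= 0, \end{aligned} \tag{10.14}$$

whence

$$\begin{aligned} \frac{c'_{11} - c_{11}}{c_{1r}} \sum{}' x_1^2 + \frac{c'_{12} - c_{12}}{c_{1r}} \sum{}' x_1 x_2 + \cdots + \frac{c'_{1(r-1)} - c_{1(r-1)}}{c_{1r}} \sum{}' x_1 x_{r-1} &= \sum{}' x_1 x_r, \\ &\ \,\vdots \\ \frac{c'_{11} - c_{11}}{c_{1r}} \sum{}' x_1 x_{r-1} + \frac{c'_{12} - c_{12}}{c_{1r}} \sum{}' x_2 x_{r-1} + \cdots + \frac{c'_{1(r-1)} - c_{1(r-1)}}{c_{1r}} \sum{}' x_{r-1}^2 &= \sum{}' x_{r-1} x_r. \end{aligned} \tag{10.15}$$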
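These equations are identical with the normal equations for the regression of $x_r$ on $x_1, \ldots, x_{r-1}$. Thus

$$\frac{c'_{1j} - c_{1j}}{c_{1r}} = b_{x_r x_j \cdot x_k}. \tag{10.16}$$

Substituting for $b_{x_r x_j \cdot x_k}$ from (10.6) gives

$$\frac{c'_{1j} - c_{1j}}{c_{1r}} = -\frac{c_{rj}}{c_{rr}}, \tag{10.17}$$

whence

$$c'_{1j} = c_{1j} - \frac{c_{1r} c_{rj}}{c_{rr}}, \tag{10.18}$$

and in general, with $x_\mu$ rather than $x_r$ as the eliminated variable,

$$c'_{ij} = c_{ij} - \frac{c_{i\mu} c_{\mu j}}{c_{\mu\mu}}. \tag{10.19}$$

Thus, to get the estimated variance of the regression coefficient of $x_\mu$ on $x_k$, $s_\mu^2 c'_{kk}$, we need

$$c'_{kk} = c_{kk} - \frac{c_{k\mu}^2}{c_{\mu\mu}}, \tag{10.20}$$

and then

$$\hat{V}[b_{x_\mu x_k \cdot x_j}] = s_\mu^2 c'_{kk} = \frac{1}{n - r} \cdot \frac{c_{kk} c_{\mu\mu} - c_{k\mu}^2}{c_{\mu\mu}^2}. \tag{10.21}$$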
The formula (10.19) can be used also to drop from a regression equation a variable, say $x_\mu$, that appears, after construction of the regression equation, to be nonsignificant. We could, of course, start again from the beginning, but, if we have four or more independent variables, it is more expeditious to remove the $\mu$th row and column from the c matrix and compute the new b's from (6.10). Alternatively, it can be shown that, if $b_i$ is the regression coefficient of y on $x_i$ in the multiple regression equation of y on all the x's, and if $b'_i$ is the regression coefficient of y on $x_i$ in the multiple regression equation of y on all the x's excluding $x_\mu$, then

$$b'_i = b_i - \frac{c_{i\mu}}{c_{\mu\mu}} b_\mu. \tag{10.22}$$

13.11. Biases in Multiple Regression

In (10.22), $b'_i$ represents the regression coefficient of y on $x_i$, the regression equation containing $x_1, x_2, \ldots, x_m$ but excluding $x_\mu$. We can write this as $b_{y x_i \cdot x_j}$, where $x_j$ is understood to stand for the sequence of x's from $x_1$ to $x_m$ but excluding $x_i$ and $x_\mu$. Also, $b_i$ represents the regression coefficient of y on $x_i$, the regression equation containing all the x's. We can write this as $b_{y x_i \cdot x_j'}$, where $x_j'$ is understood to stand for the sequence of x's from $x_1$ to $x_m$, excluding $x_i$, but including $x_\mu$. Also, $b_\mu$ is the regression coefficient of y on $x_\mu$, the regression equation containing all the x's. We can write this as $b_{y x_\mu \cdot x_j''}$, where $x_j''$ is understood to stand for the sequence of x's from $x_1$ to $x_m$ but excluding $x_\mu$.

The remaining item in (10.22) is $-c_{i\mu}/c_{\mu\mu}$. Since $c_{i\mu} = c_{\mu i}$, this item is equal to $-c_{\mu i}/c_{\mu\mu}$. Reference to (10.5) shows that this is the regression coefficient of $x_\mu$ on $x_i$, the regression equation containing as independent variables all the x's except $x_\mu$. We can write this as $b_{x_\mu x_i \cdot x_j}$, where $x_j$ is the same as in the preceding paragraph. We thus write (10.22) as

$$b_{y x_i \cdot x_j} = b_{y x_i \cdot x_j'} + b_{x_\mu x_i \cdot x_j}\, b_{y x_\mu \cdot x_j''}, \tag{11.1}$$

where

$x_j = x_1, \ldots, x_m$ but excluding $x_i$, $x_\mu$,
$x_j' = x_1, \ldots, x_m$ but excluding $x_i$,
$x_j'' = x_1, \ldots, x_m$ but excluding $x_\mu$.
The implication of this equation may be clearer if we consider a simple specific instance, say m = 3, i = 1, $\mu$ = 3. Then $x_j = x_2$, $x_j' = x_2 x_3$, $x_j'' = x_1 x_2$, and (11.1) reads

$$b_{y x_1 \cdot x_2} = b_{y x_1 \cdot x_2 x_3} + b_{x_3 x_1 \cdot x_2}\, b_{y x_3 \cdot x_1 x_2}. \tag{11.2}$$

This equation shows that the regression coefficient of y on $x_1$, ignoring $x_3$, is a biased estimator of the regression coefficient of y on $x_1$, not ignoring $x_3$, by an amount a function of the product of the regression coefficient of y on $x_3$ and the regression coefficient of $x_3$ on $x_1$.

Equation (11.2) illustrates why the application of multiple regression techniques to observational data can be so treacherous and misleading. The apparent regression of y on $x_1$ may really be due to the fact that y is dependent on $x_3$, and $x_3$ is correlated with $x_1$. We may fail to observe $x_3$, and attribute the regression of y on $x_1$ to a functional dependence which may be wholly false. In most circumstances, therefore, any indications produced by a multiple regression analysis of observational data are merely a good hint to try for confirmation by a proper experiment. In a true experiment the independent variables will be properly randomized with a table of random numbers and will have low correlations differing from zero by only random fluctuation, or else in a completely balanced experiment the correlations will be exactly zero.
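The identity (11.2) can be checked against the coefficients obtained in the example of Section 13.12. The sketch below simply plugs in the values quoted there; the variable names are illustrative and not part of the original notation.

```python
# Coefficients quoted from the example of Section 13.12.
b_y1_23 = 0.715645    # y on x1, with x2 and x3 in the equation
b_y3_12 = -0.152125   # y on x3, with x1 and x2 in the equation
b_31_2 = 0.292433     # x3 on x1, with x2 in the equation

# (11.2): coefficient of y on x1 when x3 is ignored.
b_y1_2 = b_y1_23 + b_31_2 * b_y3_12

# The bias incurred by ignoring x3 is the product term alone.
bias = b_y1_2 - b_y1_23

print(f"b(y on x1, ignoring x3) = {b_y1_2:.6f}")
print(f"bias from ignoring x3   = {bias:.6f}")
```

Here the bias is small because y depends only weakly on $x_3$; with a strong dependence on an unobserved variable the same product could dominate the coefficient.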
The justification sometimes advanced that a multiple regression analysis on observational data can be relied upon if there is an adequate theoretical background is utterly specious and disregards the unlimited capability of the human intellect for producing plausible explanations by the carload lot. For attempts to investigate these difficulties, see Tukey [4] and Simon [5].

A further reason for being suspicious of inferences from a multiple regression analysis on observational data is that there is no guarantee that the residuals are independent.

13.12. An Example of Multiple Regression

The data of Table 13.4 will be used to illustrate the foregoing procedures. They were obtained from 21 days of operation of a plant for the oxidation of ammonia to nitric acid. The $x_1$ column represents the rate of operation of the plant. The nitric oxides produced are absorbed in a countercurrent absorption tower. The third variable is the concentration of acid circulating, minus 50, times 10: i.e., 89 corresponds to 58.9 per cent acid. The $x_2$ column is the temperature of cooling water circulated through coils in the absorption tower. The dependent variable y is 10 times the percentage of the ingoing ammonia to the plant that escapes from the absorption column unabsorbed, i.e., an (inverse) measure of the overall efficiency of the plant.

Table 13.4

| $x_1$ | $x_2$ | $x_3$ | y |
|---|---|---|---|
| 80 | 27 | 89 | 42 |
| 80 | 27 | 88 | 37 |
| 75 | 25 | 90 | 37 |
| 62 | 24 | 87 | 28 |
| 62 | 22 | 87 | 18 |
| 62 | 23 | 87 | 18 |
| 62 | 24 | 93 | 19 |
| 62 | 24 | 93 | 20 |
| 58 | 23 | 87 | 15 |
| 58 | 18 | 80 | 14 |
| 58 | 18 | 89 | 14 |
| 58 | 17 | 88 | 13 |
| 58 | 18 | 82 | 11 |
| 58 | 19 | 93 | 12 |
| 50 | 18 | 89 | 8 |
| 50 | 18 | 86 | 7 |
| 50 | 19 | 72 | 8 |
| 50 | 19 | 79 | 8 |
| 50 | 20 | 80 | 9 |
| 56 | 20 | 82 | 15 |
| 70 | 20 | 91 | 15 |

$x_1$ = air flow; $x_2$ = cooling water inlet temperature; $x_3$ = acid concentration; y = stack loss.

We will fit a linear regression equation similar to (6.2). We need all the sums, sums of squares, and sums of products (Table 13.5), and these are expressed as sums of squares and products of deviations from the means in Table 13.6.

Table 13.5

| | y | $x_1$ | $x_2$ | $x_3$ |
|---|---|---|---|---|
| y | 8,518 | 23,953 | 8,326 | 32,189 |
| $x_1$ | | 78,365 | 27,223 | 109,988 |
| $x_2$ | | | 9,545 | 38,357 |
| $x_3$ | | | | 156,924 |
| Totals | 368 | 1,269 | 443 | 1,812 |

Table 13.6

| | y | $x_1$ | $x_2$ | $x_3$ |
|---|---|---|---|---|
| y | 2069.238 | 1715.286 | 562.952 | 435.857 |
| $x_1$ | | 1681.143 | 453.143 | 491.429 |
| $x_2$ | | | 199.810 | 132.429 |
| $x_3$ | | | | 574.286 |

We now set up the equations (6.9). To give more convenient numbers we will temporarily multiply the right-hand sides by 10,000. Also, although we can obtain the $b_i$ from the c matrix from (6.10), we might as well add an additional set of right-hand sides $\sum{}' y x_i$. We are thus solving four separate sets of three simultaneous linear equations, in which the numerical coefficients on the left-hand sides, $\sum{}' x_1^2$, $\sum{}' x_1 x_2$, etc., are identical for the four sets, but those on the right-hand sides do differ.

The equations (6.5) and (6.9) are repeated in rows 1 through 3 of Table 13.7, and the numerical values for the coefficients entered from Table 13.6 in rows 4 through 6. The problem of solving simultaneous linear equations, though simple in theory, is arduous in practice, and has a long history. For conventional desk calculation a method known as the Doolittle, though probably due to Gauss, is one of the most satisfactory. Various modifications have been proposed, but the effort in learning their details seems to outweigh the slight savings they achieve. For r greater than 5, the work becomes excessive, and recourse should be made to an electronic digital computer. Programs are available for the standard machines.
Table 13.7

| Line | Coefficient of $c_{i1}$ | Coefficient of $c_{i2}$ | Coefficient of $c_{i3}$ | Right-hand side $\sum{}' y x_i$ | Right-hand side, i = 1 | Right-hand side, i = 2 | Right-hand side, i = 3 |
|---|---|---|---|---|---|---|---|
| 1 | $\sum{}' x_1^2$ | $\sum{}' x_1 x_2$ | $\sum{}' x_1 x_3$ | $\sum{}' y x_1$ | 10,000 | 0 | 0 |
| 2 | $\sum{}' x_1 x_2$ | $\sum{}' x_2^2$ | $\sum{}' x_2 x_3$ | $\sum{}' y x_2$ | 0 | 10,000 | 0 |
| 3 | $\sum{}' x_1 x_3$ | $\sum{}' x_2 x_3$ | $\sum{}' x_3^2$ | $\sum{}' y x_3$ | 0 | 0 | 10,000 |
| 4 | 1681.143 | 453.143 | 491.429 | 1715.286 | 10,000 | 0 | 0 |
| 5 | 453.143 | 199.810 | 132.429 | 562.952 | 0 | 10,000 | 0 |
| 6 | 491.429 | 132.429 | 574.286 | 435.857 | 0 | 0 | 10,000 |
| 7 | −1.0 | −0.269544590 | −0.292318381 | −1.020309396 | −5.94833896 | 0 | 0 |
| 8 | 453.143 | 199.810 | 132.429 | 562.952 | 0 | 10,000 | 0 |
| 9 | −453.143 | −122.142244 | −132.462028 | −462.346062 | −2,695.44590 | 0 | 0 |
| 10 | | 77.667756 | −0.033028 | 100.605938 | −2,695.44590 | 10,000 | 0 |
| 11 | | −1.0 | 0.000425247 | −1.295337256 | 34.704825 | −128.753559 | 0 |
| 12 | 491.429 | 132.429 | 574.286 | 435.857 | 0 | 0 | 10,000 |
| 13 | −491.429 | −132.462028 | −143.653730 | −501.409626 | −2,923.18381 | 0 | 0 |
| 14 | | 0.033027 | −0.000014 | 0.042782 | −1.14623 | 4.25247 | 0 |
| 15 | | −0.000001 | 430.632256 | −65.509844 | −2,924.33004 | 4.25247 | 10,000 |
| 16 | | | 1.0 | −0.152124796 | −6.7907826 | 0.0098749 | 23.221669 |
| 17 | | −1.0 | −0.000064691 | −1.295337256 | | | |
| 18 | | | −0.002887760 | | 34.704825 | | |
| 19 | | | +0.000004199 | | | −128.753559 | |
| 20 | | | +0.009874945 | | | | 0 |
| 21 | | 1.0 | | 1.295272565 | −34.707713 | 128.753563 | 0.009874945 |
| 22 | −1.0 | −0.349133712 | +0.044468874 | −1.020309396 | | | |
| 23 | | 9.355276 | 1.985070 | | −5.948339 | | |
| 24 | | −34.704837 | −0.002887 | | | 0 | |
| 25 | | −0.002662 | −6.788121 | | | | 0 |
| 26 | 1.0 | | | 0.715644558 | 17.288615 | −34.707724 | −6.790783 |
The procedure for solving the equations is as follows.

1. In line 7 divide line 4 by minus the coefficient of $c_{i1}$ in line 4, i.e., by −1681.143.

2. In line 8 write out line 5 again. In line 9 multiply line 4 by the coefficient of $c_{i2}$ in line 7, i.e., by −0.269544590. Add lines 8 and 9 together; the coefficient of $c_{i1}$ vanishes, and we get line 10.

3. In line 11 divide line 10 by minus the coefficient of $c_{i2}$ in line 10, i.e., by −77.667756.

4. In line 12 write out line 6 again. In line 13 multiply line 4 by the coefficient of $c_{i3}$ in line 7, i.e., by −0.292318381. In line 14 multiply line 10 by the coefficient of $c_{i3}$ in line 11, i.e., by 0.000425247. Add lines 12, 13, and 14 together; the coefficients of $c_{i1}$ and $c_{i2}$ vanish, and we get line 15.

5. In line 16 divide line 15 by the coefficient of $c_{i3}$ in line 15, i.e., by 430.632256. This gives successively, on the right-hand side, $b_3$ = −0.152125, $c_{13}$ = −6.790783, $c_{23}$ = 0.009875, and $c_{33}$ = 23.221669. Of course, the three $c_{i3}$ need multiplying by $10^{-4}$ to give the correct values.

6. Now go back to line 11. This line represents the following four equations:

$$\begin{aligned} -1.0\,b_2 + 0.000425247\,b_3 &= -1.295337256, \\ -1.0\,c_{12} + 0.000425247\,c_{13} &= 34.704825, \\ -1.0\,c_{22} + 0.000425247\,c_{23} &= -128.753559, \\ -1.0\,c_{32} + 0.000425247\,c_{33} &= 0. \end{aligned}$$

Substituting the solution for $b_3$ from line 16, i.e., −0.152124796, gives

$$-1.0\,b_2 - 0.000064691 = -1.295337256,$$

and this is schematically given in line 17. Solving for $b_2$ gives $b_2$ = 1.295272, as is schematically given in line 21. Similarly, line 18 corresponds to substituting $c_{13}$ = −6.7907826 in the second of the four equations above, $-1.0\,c_{12} - 0.002887760 = 34.704825$, whence $c_{12}$ = −34.707713, as is given in line 21. The solutions for $c_{22}$ and $c_{32}$ are obtained similarly.

7. Now go back to line 7, which represents the following four equations:

$$\begin{aligned} -1.0\,b_1 - 0.269544590\,b_2 - 0.292318381\,b_3 &= -1.020309396, \\ -1.0\,c_{11} - 0.269544590\,c_{12} - 0.292318381\,c_{13} &= -5.94833896, \\ -1.0\,c_{21} - 0.269544590\,c_{22} - 0.292318381\,c_{23} &= 0, \\ -1.0\,c_{31} - 0.269544590\,c_{32} - 0.292318381\,c_{33} &= 0. \end{aligned}$$

Substituting $b_2$ = 1.295272 and $b_3$ = −0.152124796 in the first of these gives $-1.0\,b_1 - 0.349133712 + 0.044468874 = -1.020309396$, which is represented schematically in line 22. The solution for $b_1$ is $b_1$ = 0.715644558, as is given in line 26. The three remaining equations above for $c_{11}$, $c_{21}$, and $c_{31}$ give lines 23, 24, and 25, respectively, and the final solutions are in the last three columns of line 26.

8. Assemble these solutions in Table 13.8.

Table 13.8. $10^4 c_{ij}$

| | | |
|---|---|---|
| $c_{11}$ = 17.288615 | $c_{12}$ = −34.707713 | $c_{13}$ = −6.790783 |
| $c_{21}$ = −34.707724 | $c_{22}$ = 128.753563 | $c_{23}$ = 0.009875 |
| $c_{31}$ = −6.790783 | $c_{32}$ = 0.009875 | $c_{33}$ = 23.221669 |
| $b_1$ = 0.715645 | $b_2$ = 1.295272 | $b_3$ = −0.152125 |
9. Check the accuracy of these solutions by inserting them in the left-hand side of line 6 and seeing how the right-hand sides coincide with the required values. Here we get as left-hand sides 435.8569994, −0.034, −0.005, 10,000.000, which compare very well with the values in line 6. However, the need for carrying a large number of significant figures is pointed up by the discrepancy in the fifth decimal place between $c_{21}$ and $c_{12}$.

In the above calculations we obtained the $b_i$ directly by carrying the column $\sum{}' y x_i$ as an additional right-hand side. This was not strictly necessary, for we can use (6.10) to obtain, e.g.,

$$b_1 = [1715.286 \times 17.288615 + 562.952 \times (-34.707713) + 435.857 \times (-6.790783)] \times 10^{-4} = 0.715633,$$

which differs from the direct solution in the fifth decimal place on account of rounding errors.

To make a joint test of the null hypothesis $\beta_i = 0$ for all i, we compute the sum of squares due to regression from (6.33):

$$0.715644 \times 1715.286 + 1.295273 \times 562.952 + (-0.152125) \times 435.857 = 1890.406.$$

The residual sum of squares about the regression plane is given by (6.35) as 2069.238 − 1890.406 = 178.832, and with 21 − 1 − 3 = 17 degrees of freedom this gives $s^2$ = 10.520. These results are assembled in Table 13.9. The variance ratio, 630.135/10.520 = 60, being distributed as F(3, 17) under the null hypothesis, is overwhelmingly significant.
Table 13.9

| Source of variance | Sums of squares | Degrees of freedom | Mean squares |
|---|---|---|---|
| Due to regression | 1890.406 | 3 | 630.135 |
| About the regression plane | 178.832 | 17 | 10.520 |
| Total | 2069.238 | | |
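On a digital computer the whole of the desk calculation above reduces to a few lines. The following is a minimal sketch in plain Python, using ordinary Gaussian elimination with partial pivoting rather than the Doolittle layout of Table 13.7; the function name `solve` and the variable names are assumptions made for illustration. It starts from the rounded entries of Table 13.6, so its results should agree with Tables 13.8 and 13.9 only to about the accuracy of those entries.

```python
def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]   # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            for j in range(col, n + 1):
                m[k][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                       # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Sums of squares and products of deviations, from Table 13.6.
xtx = [[1681.143, 453.143, 491.429],
       [453.143, 199.810, 132.429],
       [491.429, 132.429, 574.286]]
g = [1715.286, 562.952, 435.857]     # sums of products of y with x1, x2, x3
total_ss = 2069.238
n_obs, r = 21, 3

b = solve(xtx, g)                    # regression coefficients, as in (7.7)

# Each unit right-hand side yields one column of the c matrix, as in (6.9).
columns = [solve(xtx, [1.0 if i == j else 0.0 for i in range(r)])
           for j in range(r)]
c = [[columns[j][i] for j in range(r)] for i in range(r)]

regression_ss = sum(bi * gi for bi, gi in zip(b, g))
residual_ss = total_ss - regression_ss
s2 = residual_ss / (n_obs - 1 - r)
f_ratio = (regression_ss / r) / s2

print("b        =", [round(v, 6) for v in b])
print("10^4 c   =", [[round(1e4 * v, 4) for v in row] for row in c])
print("F(3, 17) =", round(f_ratio, 2))
```

Because floating-point arithmetic carries many more figures than the desk calculation, the computed c matrix is symmetric to within rounding, and small differences from the tabulated values in the later decimal places are to be expected.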
Separate t tests on the individual $b_i$ are made, using (6.18). In Table 13.10 tests of the individual null hypotheses $\beta_i = 0$ are performed. Under the null hypothesis, the ratios in the last column are distributed as t(17). The first two are obviously very significant, and the last nonsignificant. The individual confidence intervals for the separate $b_i$ can be constructed if desired.

Table 13.10

| i | $\hat{V}[b_i]$ | $\sqrt{\hat{V}[b_i]}$ | $b_i$ | $b_i/\sqrt{\hat{V}[b_i]}$ |
|---|---|---|---|---|
| 1 | $10.520 \times 17.288 \times 10^{-4}$ | 0.1349 | 0.7156 | 5.305 |
| 2 | $10.520 \times 128.754 \times 10^{-4}$ | 0.3680 | 1.2953 | 3.520 |
| 3 | $10.520 \times 23.222 \times 10^{-4}$ | 0.1563 | −0.1521 | −0.973 |

The regression equation of y on $x_1, x_2, x_3$ is

$$\begin{aligned} Y &= 17.524 + 0.715633(x_1 - 60.429) + 1.295273(x_2 - 21.095) + (-0.152125)(x_3 - 86.286) \\ &= -39.919 + 0.7156x_1 + 1.2953x_2 - 0.1521x_3. \end{aligned}$$
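The entries of Table 13.10 follow directly from $s^2$ and the diagonal of the c matrix. A minimal sketch, assuming the values of Table 13.8 and $s^2$ = 10.520 as quoted above (the variable names are illustrative):

```python
import math

s2 = 10.520                                            # residual mean square
c_diag = [17.288615e-4, 128.753563e-4, 23.221669e-4]   # c11, c22, c33
b = [0.715645, 1.295272, -0.152125]                    # b1, b2, b3

# (6.18) with s^2 in place of sigma^2: estimated variance of b_i is s^2 c_ii.
std_err = [math.sqrt(s2 * cii) for cii in c_diag]
t_ratio = [bi / se for bi, se in zip(b, std_err)]

for i, (se, t) in enumerate(zip(std_err, t_ratio), start=1):
    print(f"b{i}: standard error = {se:.4f}, t = {t:.3f}")
```

The ratios agree with the last column of Table 13.10 to within the rounding used in the table.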
To test whether $x_3$ adds significantly to the regression on $x_1$ and $x_2$, we need the latter regression. When we have only three independent variables, to drop one it is about as quick to start from scratch with the two we want. However, we will illustrate the procedure of Section 13.10. Since we are omitting $x_3$, in (10.20) $\mu$ = 3, and

$$c'_{11} = \Big[17.288615 - \frac{(-6.790783)^2}{23.221670}\Big] \times 10^{-4} = 15.302766 \times 10^{-4},$$

$$c'_{22} = \Big[128.753603 - \frac{(0.009875)^2}{23.221670}\Big] \times 10^{-4} = 128.753599 \times 10^{-4},$$

and, from (10.19),

$$c'_{12} = \Big[-34.707713 - \frac{(-6.790783)(0.009875)}{23.221670}\Big] \times 10^{-4} = -34.704825 \times 10^{-4} = c'_{21}.$$

We can calculate $b'_1$ from (6.10) as

$$b'_1 = [1715.286 \times 15.302766 + 562.952 \times (-34.704825)] \times 10^{-4} = 0.671159$$

and similarly $b'_2$ = 1.295338. Alternatively, we can use (10.22),

$$b'_1 = 0.715645 - \frac{(-6.790783)(-0.152125)}{23.221669} = 0.671159,$$

which comes to the same thing. The sum of squares due to regression on $x_1$ and $x_2$ only is, again using (6.33),

$$0.671159 \times 1715.286 + 1.295338 \times 562.952 = 1880.443.$$
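The reduction just carried out by hand can be sketched in a few lines. The helper names `drop_variable` and `drop_coefficients` below are hypothetical; the sketch applies (10.19) and (10.22) to the values of Table 13.8, with indices counted from zero so that $x_3$ is index 2.

```python
def drop_variable(c, mu):
    """(10.19): c'_ij = c_ij - c_i,mu c_mu,j / c_mu,mu, with row and column mu removed."""
    keep = [i for i in range(len(c)) if i != mu]
    return [[c[i][j] - c[i][mu] * c[mu][j] / c[mu][mu] for j in keep]
            for i in keep]


def drop_coefficients(b, c, mu):
    """(10.22): b'_i = b_i - (c_i,mu / c_mu,mu) b_mu for every i other than mu."""
    return [b[i] - c[i][mu] / c[mu][mu] * b[mu]
            for i in range(len(b)) if i != mu]


# Table 13.8; the c entries there are 10^4 c_ij, so scale them back.
c = [[v * 1e-4 for v in row] for row in (
    [17.288615, -34.707713, -6.790783],
    [-34.707724, 128.753563, 0.009875],
    [-6.790783, 0.009875, 23.221669],
)]
b = [0.715645, 1.295272, -0.152125]
g = [1715.286, 562.952, 435.857]     # sums of products of y with x1, x2, x3

c_reduced = drop_variable(c, mu=2)   # omit x3
b_reduced = drop_coefficients(b, c, mu=2)
regression_ss_12 = sum(bi * gi for bi, gi in zip(b_reduced, g[:2]))

print("10^4 c' =", [[round(1e4 * v, 6) for v in row] for row in c_reduced])
print("b'      =", [round(v, 6) for v in b_reduced])
print("SS due to regression on x1, x2 =", round(regression_ss_12, 3))
```

Using (10.22) for the new coefficients avoids the loss of significant figures that arises when large products of nearly equal size are subtracted in (6.10).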
We now construct Table 13.11 analogous to Table 13.3. The test of the null hypothesis that adding $x_3$ to $x_1$, $x_2$ does not improve the fit is given by the variance ratio 9.963/10.520, which is clearly nonsignificant. The test of the joint hypothesis $\beta'_1 = \beta'_2 = 0$ is given by the variance ratio 940.221/[(2069.238 − 1880.443)/(20 − 2)], which is clearly overwhelmingly significant.

Table 13.11

| Source of variance | Sums of squares | Degrees of freedom | Mean squares |
|---|---|---|---|
| Due to regression jointly on $x_1, x_2, x_3$ | 1890.406 | 3 | |
| Due to regression jointly on $x_1, x_2$ | 1880.443 | 2 | 940.221 |
| Due to adding $x_3$ to $x_1, x_2$ | 9.963 | 1 | 9.963 |
| Deviations about regression plane on $x_1, x_2, x_3$ | 178.832 | 17 | 10.520 |
| Total | 2069.238 | 20 | |

The regression equation for $x_1$ and $x_2$ only is

$$Y = 17.524 + 0.671159(x_1 - 60.429) + 1.295338(x_2 - 21.095) = -50.359 + 0.671159x_1 + 1.295338x_2.$$
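The variance ratio for the added variable is the statistic (8.10) of Section 13.8. A minimal sketch, assuming the sums of squares of Table 13.11; the function name `extra_sum_of_squares_f` is hypothetical.

```python
def extra_sum_of_squares_f(g_r, g_q, r, q, total_ss, n):
    """Return the statistic (8.10) and its degrees of freedom (r - q, n - 1 - r)."""
    s2_r = (total_ss - g_r) / (n - 1 - r)   # residual mean square, all r variables
    extra_ms = (g_r - g_q) / (r - q)        # mean square for the added variables
    return extra_ms / s2_r, (r - q, n - 1 - r)


f_stat, df = extra_sum_of_squares_f(
    g_r=1890.406,       # regression on x1, x2, x3
    g_q=1880.443,       # regression on x1, x2
    r=3, q=2,
    total_ss=2069.238,
    n=21,
)
print(f"F{df} = {f_stat:.3f}")
```

With a single added variable the square root of this ratio has the same magnitude as the t ratio for $b_3$ in Table 13.10, which is the equivalence shown in Section 13.8.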
We might ask for 95 per cent confidence limits for $\eta$ at $x_1 = 50$, $x_2 = 18$. For these values,
\[ Y = -50.359 + 0.671159 \times 50 + 1.295338 \times 18 = 6.516. \]
The estimated variance of $Y$ can be calculated from (3.48), which is applicable to the special case of two independent variables, or from (6.36), which is applicable to the general case of $r$ independent variables. We will use the latter form here. Since we are using only two variables, $s^2$ is given by pooling the last two lines in the body of Table 13.11: $s^2 = (9.963 + 178.832)/(1 + 17) = 10.489$. Also $x_1 - \bar{x}_1 = 50 - 60.4286 = -10.4286$, $x_2 - \bar{x}_2 = 18 - 21.0952 = -3.0952$. Then
\[ \hat{V}[Y] = 10.489\left[\frac{1}{21} + \{15.302766 \times (-10.4286)^2 + 128.753603 \times (-3.0952)^2 + 2(-34.704825)(-10.4286)(-3.0952)\} \times 10^{-4}\right] = 1.18893. \]
The square root of this is 1.0904, and $t_{0.975}(18) = 2.101$; so confidence limits for $\eta$ are at $6.516 \pm 2.101 \times 1.0904 = (4.23, 8.81)$.
Finally, we will illustrate the use of the $c$ matrix to give the regression equation of $x_3$ on $x_1$, $x_2$. Reading off the third line of Table 13.8, we have, for (10.5),
\[ X_3 = \bar{x}_3 + \left(-\frac{-6.790783}{23.221669}\right)(x_1 - \bar{x}_1) + \left(-\frac{0.009875}{23.221669}\right)(x_2 - \bar{x}_2) = 86.286 + 0.292433(x_1 - 60.429) - 0.000425(x_2 - 21.095). \]
To test the significance, say, of the regression coefficient of $x_3$ on $x_1$ in this equation, we would use (10.21):
\[ \hat{V}[b_{31.2}] = \frac{1}{21 - 3} \cdot \frac{17.288615 \times 23.221669 - (-6.790783)^2}{(23.221669)^2} = 0.03651. \]
A test of the null hypothesis $\beta_{31.2} = 0$ is given by $0.292433/\sqrt{0.03651} = 1.53$, which is distributed as $t(18)$.
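The confidence limits for $\eta$ at $x_1 = 50$, $x_2 = 18$ obtained above can be reproduced in a few lines. This is only a sketch under the assumptions already stated in the text: the reduced $c'$ elements, the pooled $s^2$, 21 observations, and the tabulated value $t_{0.975}(18) = 2.101$ are all copied from the example; the variable names are illustrative.

```python
import math

# Pooled residual variance from the last two lines of Table 13.11.
s2 = (9.963 + 178.832) / (1 + 17)

# Deviations of the chosen point from the means of x1 and x2.
d = [50 - 60.4286, 18 - 21.0952]

# Reduced c' matrix for x1, x2 (units of 1e-4), as used in the text.
c_reduced = [[15.302766, -34.704825],
             [-34.704825, 128.753603]]

n_obs = 21
quad = sum(c_reduced[i][j] * d[i] * d[j] for i in range(2) for j in range(2)) * 1e-4
var_y = s2 * (1 / n_obs + quad)          # estimated variance of Y at the point

y_hat = -50.359 + 0.671159 * 50 + 1.295338 * 18
t_crit = 2.101                           # t_0.975(18), taken from tables
half_width = t_crit * math.sqrt(var_y)
limits = (y_hat - half_width, y_hat + half_width)

print(round(var_y, 5), limits)
```

Small differences in the last digit from the text arise only from rounding of the intermediate quantities.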
EXERCISES
13.1. The following table gives data on death rate due to heart disease in males in the 55 to 59 age group, along with the proportionate number of telephones, and of fat and protein in the diet.
(a) Test the significance of the regression of $y$ on $x_1$, in the regression of $y$ on $x_1$ alone.
(b) Construct the multiple regression equation of $y$ on $x_1$, $x_2$.
(c) Make a joint test of the null hypothesis $\beta_1 = \beta_2 = 0$.
(d) Test whether adding $x_2$ to the regression equation (on $x_1$) has significantly improved the fit.
(e) Construct the multiple regression equation of $y$ on $x_1$, $x_2$, and $x_3$.
(f) Give 95 per cent confidence limits for $\beta_3$ in this equation.
(g) Give 95 per cent confidence limits for $\eta$ at $x_1 = 221$, $x_2 = 39$, $x_3 = 7$.
(h) Test whether $x_2$ and $x_3$ together add anything to the regression of $y$ on $x_1$.
(i) Construct the multiple regression equation of $x_1$ on $x_2$ and $x_3$.
(j) Give 95 per cent confidence limits for the regression coefficient of $x_1$ on $x_3$.
Country             x1    x2    x3     y
Australia          124    33     8    81
Austria             49    31     6    55
Canada             181    38     8    80
Ceylon               4    17     2    24
Chile               22    20     4    78
Denmark            152    39     6    52
Finland             75    30     7    88
France              54    29     7    45
Germany             43    35     6    50
Ireland             41    31     5    69
Israel              17    23     4    66
Italy               22    21     3    45
Japan               16     8     3    24
Mexico              10    23     3    43
Netherlands         63    37     6    38
New Zealand        170    40     8    72
Norway             125    38     6    41
Portugal            15    25     4    38
Sweden             221    39     7    52
Switzerland        171    33     7    52
United Kingdom      97    38     6    66
United States      254    39     8    89

x1 = 1000 (telephones per head)
x2 = fat calories as per cent of total calories
x3 = animal protein calories as per cent of total calories
y = 100 [log (number deaths from heart disease per 100,000 for males in 55 to 59 age group) - 2]

Sources: x1 from World Almanac and Book of Facts (New York: New York World Telegram, 1951) except the figures for Ireland and Ceylon, which were obtained by private communication from the countries concerned. x2, x3, and y from J. Yerushalmy and Herman E. Hilleboe, "Fat in the Diet and Mortality from Heart Disease: a Methodological Note," New York State Journal of Medicine, 57 (1957), 2343-54.
Sums of Squares and Products

               y           x1          x2          x3
y           78,624      123,591      39,409       7,504
x1                      288,068      68,838      13,226
x2                                   21,807       4,042
x3                                                  772
Sums         1,248        1,926         667         124

Sums of Squares and Products of Deviations

               y             x1             x2             x3
y          7828.364     14,334.273      1,571.909      469.818182
x1                     119,455.455     10,445.182     2370.363636
x2                                      1,584.773      282.545455
x3                                                      73.090909
13.2. In the table below, referring to the year 1950, for 46 states,
y = standardized liver cirrhosis death rate
x1 = per cent of population "urban" (1950 Census definition of urban)
x2 = 100 (number of children ever born to women 45-49 years old)^{-1}
x3 = wine consumption in hundredths of U.S. gallons of absolute alcohol per capita of total population
x4 = spirits consumption, likewise.
Sources: y, x3, and x4 from Wolfgang Schmidt and Jean Bronetto, "Death from Liver Cirrhosis and Specific Alcohol Consumption: An Ecological Study," American Journal of Public Health, 52 (1962), 1473-82. x1 and x2 from Statistical Abstract of the United States, 1955, U.S. Department of Commerce. Two states, Oklahoma and Mississippi, are excluded on account of legal restrictions on the sale of liquor.
State                y      x1     x2     x3    x4
Alabama            41.2     44    33.2     5    30
Idaho              31.7     43    33.8     4    41
Iowa               39.4     48    40.6     3    38
Maine              57.5     52    39.2     7    48
Michigan           74.8     71    45.5    11    53
Montana            59.8     44    37.5     9    65
New Hampshire      54.3     57    44.2     6    73
N. Carolina        47.9     34    31.9     3    32
Ohio               77.2     70    45.6    12    56
Oregon             56.6     54    45.9     7    57
Pennsylvania       80.9     70    43.7    14    43
Utah               34.3     65    32.1    12    33
Vermont            53.1     36    36.9    10    48
Virginia           55.4     47    38.9    10    69
Washington         57.8     63    47.6    14    54
W. Virginia        62.8     35    33.0     9    47
Wyoming            67.3     50    38.9     7    68
Arizona            56.7     55    35.7    18    47
Arkansas           37.6     33    31.2     6    27
California        129.9     81    53.8    31    79
Colorado           70.3     63    42.5    13    59
Connecticut       104.2     78    53.3    20    97
Delaware           83.6     63    47.0    19    95
Florida            66.0     65    44.9    10    81
Georgia            52.3     45    35.6     4    26
Illinois           86.9     78    50.5    16    76
Indiana            66.6     60    42.3     9    37
Kansas             40.1     52    43.8     6    46
Kentucky           55.7     37    33.2     6    40
Louisiana          58.1     55    36.0    21    76
Maryland           74.3     69    47.6    15    70
Massachusetts      98.1     84    50.0    17    66
Minnesota          40.7     54    43.8     7    63
Missouri           66.7     61    45.0    13    59
Nebraska           48.0     47    42.2     8    55
Nevada            122.5     57    53.0    28   149
New Jersey         92.1     87    51.6    23    77
New Mexico         76.0     50    31.9    22    43
New York           97.5     85    56.1    23    74
N. Dakota          33.8     27    31.5     7    56
Rhode Island       90.5     84    50.0    16    63
S. Carolina        29.7     37    32.4     2    41
S. Dakota          28.0     33    36.1     6    59
Tennessee          51.6     44    35.3     3    32
Texas              55.7     63    39.3     8    40
Wisconsin          55.5     58    43.8    13    57
Sums             2920.7   2588  1907.9   533
(a) Give 95 per cent confidence limits for the regression coefficient $b_{yx_1}$.
(b) Add $x_2$ to the regression equation. Test the null hypothesis $\beta_{yx_2 \cdot x_1} = 0$.
(c) Add $x_3$ to the regression equation. Test the null hypothesis $\beta_{yx_3 \cdot x_1x_2} = 0$.
(d) Test the joint null hypothesis that $\beta_{yx_2 \cdot x_1x_3} = \beta_{yx_3 \cdot x_1x_2} = 0$.
(e) Give the regression equation of $x_3$ on $x_1$ and $x_2$. Give 95 per cent confidence limits for $\beta_{x_3x_2 \cdot x_1}$.
(f) If you have strong views about hard liquor, make up your own exercises involving $x_4$.
Sums of Squares and Products of Deviations

               y              x1             x2             x3
y         24,741.3481    12,446.4783     5817.9129     6167.47609
x1                       11,158.8696     4209.4870     3327.95658
x2                                       2233.0237     1403.6457
x3                                                     2155.152174
13.3. Suppose that $x_1$, $x_2$, $x_3$ are distributed in a trivariate normal distribution. Let $\beta_{12.3}$ be the regression coefficient of $x_1$ on $x_2$ in the regression equation of $x_1$ on $x_2$ and $x_3$, and let $\beta_{12}$ be the regression coefficient of $x_1$ on $x_2$ in the regression equation of $x_1$ on $x_2$ alone.
(a) Express the relationship between $\beta_{12.3}$ and $\beta_{12}$ as a function involving $\rho_{12}$, $\rho_{13}$, and $\rho_{23}$.
(b) What relationships involving $\rho_{12}$, $\rho_{13}$, and $\rho_{23}$ will make $\beta_{12.3} = \beta_{12}$?
13.4. Assume that $x_1$ and $x_2$ are random variables.
(a) Obtain an expression for the correlation coefficient between $x_1/x_2$ and $x_2$, involving the parameters $\sigma_{x_1}$, $\sigma_{x_2}$, $\xi_1$, $\xi_2$, and $\rho_{x_1x_2}$.
(b) Suppose that the coefficients of variation of $x_1$ and $x_2$ are equal, i.e., that $\sigma_{x_1}/\xi_1 = \sigma_{x_2}/\xi_2$. Give a simple form for the correlation coefficient between $x_1/x_2$ and $x_2$.
(c) Suppose further that $\rho_{x_1x_2} = 0$. Obtain a numerical value for the correlation coefficient between $x_1/x_2$ and $x_2$.
13.5. Starting from equations of the form (11.2.13) for $b_{yx_1}$ and $b_{x_1x_2}$, and from equations of the form (3.19) for $b_{yx_1 \cdot x_2}$ and $b_{yx_2 \cdot x_1}$, show that
\[ b_{yx_1} = b_{yx_1 \cdot x_2} + b_{x_1x_2}\, b_{yx_2 \cdot x_1}. \]

REFERENCES
1. Pearson, Karl, "Mathematical Contributions to the Theory of Evolution. On a Form of Spurious Correlation Which May Arise When Indices are Used in the Measurement of Organs," Proceedings of the Royal Society, 60 (1897), 489-98.
2. Anderson, R. L., and T. A. Bancroft, Statistical Theory in Research. New York: McGraw-Hill Book Co., 1952.
3. Kempthorne, O., The Design and Analysis of Experiments. New York: John Wiley and Sons, 1952.
4. Tukey, John W., "Causation, Regression, and Path Analysis," Chapter 3, pp. 35-66, in Statistics and Mathematics in Biology, Oscar Kempthorne et al. (eds.). Ames, Iowa: Iowa State College Press, 1954.
5. Simon, Herbert A., "Spurious Correlation: a Causal Interpretation," Journal of the American Statistical Association, 49 (1954), 467-79.
CHAPTER 14
Two-Way and Nested Analysis of Variance
14.1. Introduction: The Model for Model I Analysis
In Chapter 10 we considered the analysis of variance where the data were classified in one way. In this chapter we will discuss the analysis of two-way tables, first when both classifications are model I, then when both are model II, and finally the mixed case where one classification is model I and the other model II. We suppose that the data are in the form of Table 14.1. For example, rows could correspond to varieties of corn, and columns to quantity of some fertilizer, and we have $n$ independent estimates $x_{ij\nu}$ for each row × column combination. We assume that the $rt$ sets of $n$ observations are random samples from $rt$ separate populations, each normally distributed about means $\xi_{ij}$ but all with the same variance $\sigma^2$. The model is
\[ x_{ij\nu} = \xi_{ij} + z_{ij\nu}, \qquad z_{ij\nu} \sim N(0, \sigma^2), \qquad i = 1, \ldots, r;\; j = 1, \ldots, t;\; \nu = 1, \ldots, n. \tag{1.1} \]
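The means of the sample observations in each cell are $\bar{x}_{ij\cdot} = (1/n)\sum_\nu^n x_{ij\nu}$. The mean of the sample means in the $i$th row is
\[ \bar{x}_{i\cdot\cdot} = \frac{1}{t}\sum_j^t \bar{x}_{ij\cdot} = \frac{1}{tn}\sum_j^t\sum_\nu^n x_{ij\nu}, \tag{1.2} \]
and the mean of the sample means in the $j$th column is
\[ \bar{x}_{\cdot j\cdot} = \frac{1}{r}\sum_i^r \bar{x}_{ij\cdot} = \frac{1}{rn}\sum_i^r\sum_\nu^n x_{ij\nu}. \tag{1.3} \]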
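Table 14.1 [Layout of the two-way table of observations: rows $i = 1, 2, \ldots, r$; columns $j = 1, 2, \ldots, t$; row averages $\bar{x}_{i\cdot\cdot}$ and column averages $\bar{x}_{\cdot j\cdot}$ in the margins.]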
Table 14.2 gives the population means $\xi_{ij}$ for each cell, the row and column means, defined as $\xi_{i\cdot} = (1/t)\sum_j^t \xi_{ij}$, $\xi_{\cdot j} = (1/r)\sum_i^r \xi_{ij}$, and the grand mean defined as
\[ \xi = \frac{1}{r}\sum_i^r \xi_{i\cdot} = \frac{1}{rt}\sum_i^r\sum_j^t \xi_{ij} = \frac{1}{t}\sum_j^t \xi_{\cdot j}. \tag{1.4} \]
The deviations of each row mean from the grand mean are denoted by $\eta_i$,
\[ \eta_i = \xi_{i\cdot} - \xi, \tag{1.5} \]
and the deviations of each column mean from the grand mean by $\zeta_j$:
\[ \zeta_j = \xi_{\cdot j} - \xi. \tag{1.6} \]
Clearly, both these sets of deviations sum to zero:
\[ \sum_i^r \eta_i = 0 = \sum_j^t \zeta_j. \tag{1.7} \]
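Table 14.2

i \ j         1                          2                          ...   t                          Means             Deviations
1             $\xi_{11}$                 $\xi_{12}$                 ...   $\xi_{1t}$                 $\xi_{1\cdot}$    $\xi_{1\cdot} - \xi = \eta_1$
2             $\xi_{21}$                 $\xi_{22}$                 ...   $\xi_{2t}$                 $\xi_{2\cdot}$    $\xi_{2\cdot} - \xi = \eta_2$
...
r             $\xi_{r1}$                 $\xi_{r2}$                 ...   $\xi_{rt}$                 $\xi_{r\cdot}$    $\xi_{r\cdot} - \xi = \eta_r$
Means         $\xi_{\cdot 1}$            $\xi_{\cdot 2}$            ...   $\xi_{\cdot t}$            $\xi$
Deviations    $\xi_{\cdot 1} - \xi = \zeta_1$   $\xi_{\cdot 2} - \xi = \zeta_2$   ...   $\xi_{\cdot t} - \xi = \zeta_t$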
We define quantities $\theta_{ij}$ as the difference between the true mean for the $ij$th cell, $\xi_{ij}$, and what we would expect on the basis of an additive model,
\[ \text{Grand mean} + \text{row effect} + \text{column effect} = \xi + \eta_i + \zeta_j; \tag{1.8} \]
i.e.,
\[ \theta_{ij} = \xi_{ij} - (\xi + \eta_i + \zeta_j). \tag{1.9} \]
Thus, if the true cell mean is equal to the prediction (1.8), then the additive model holds and $\theta_{ij} = 0$. In other words,
\[ \xi_{ij} = \xi + \eta_i + \zeta_j + \theta_{ij}, \tag{1.10} \]
the $\theta_{ij}$ measuring the departure from the additive model (1.8). The $\theta_{ij}$ are known as the interaction constants. When we say that we have zero interaction, we mean that we have an additive model as in (1.8). The $\theta_{ij}$ sum to zero over each suffix, for each value of the other suffix; for the sum over $i$,
\[ \sum_i^r \theta_{ij} = \sum_i^r \xi_{ij} - r\xi - \sum_i^r \eta_i - r\zeta_j. \tag{1.11} \]
But $\sum_i^r \eta_i = 0$, from (1.7), and, from (1.6),
\[ r\zeta_j = r\xi_{\cdot j} - r\xi = \sum_i^r \xi_{ij} - r\xi; \tag{1.12} \]
so
\[ \sum_i^r \theta_{ij} = \sum_i^r \xi_{ij} - r\xi - 0 - \sum_i^r \xi_{ij} + r\xi = 0. \tag{1.13} \]
Specifically, we have $t$ relations
\[ \sum_i^r \theta_{i1} = \sum_i^r \theta_{i2} = \cdots = \sum_i^r \theta_{it} = 0, \tag{1.14} \]
and $r$ relations
\[ \sum_j^t \theta_{1j} = \sum_j^t \theta_{2j} = \cdots = \sum_j^t \theta_{rj} = 0. \tag{1.15} \]
However, effectively there are only $r + t - 1$ independent restrictions on the $\theta_{ij}$. This is because we have, say, the $t$ restrictions (1.14), which when summed determine the relation $\sum_j^t \left(\sum_i^r \theta_{ij}\right) = 0$. But then the $r$ relations (1.15), whose sum is $\sum_i^r \left(\sum_j^t \theta_{ij}\right) = \sum_j^t \left(\sum_i^r \theta_{ij}\right)$, must also sum to zero. In other words, only $r - 1$ of the $r$ relations (1.15) will be independent. Thus the total number of independent restrictions on the $\theta_{ij}$ is $r + t - 1$. Since the number of the $\theta_{ij}$ is $rt$, the number of independent constants in this group is
\[ rt - (r + t - 1) = (r - 1)(t - 1). \tag{1.16} \]
We now rewrite the original model (1.1) as
\[ x_{ij\nu} = \xi + \eta_i + \zeta_j + \theta_{ij} + z_{ij\nu}. \tag{1.17} \]
The original model involved $rt$ independent cell means $\xi_{ij}$; this model involves the grand mean $\xi$, $r - 1$ independent row constants $\eta_i$, $t - 1$ independent column constants $\zeta_j$, and $(r - 1)(t - 1)$ independent interaction constants $\theta_{ij}$. The total number of independent parameters in the new model (1.17) is
\[ 1 + (r - 1) + (t - 1) + (r - 1)(t - 1) = rt, \tag{1.18} \]
the same as the original model (1.1). If the full number of parameters are needed, very little will be gained by the change in the model. However, if the null hypothesis of additivity can be accepted, we can regard the interaction constants $\theta_{ij}$ as zero, and then we have only to consider the grand mean and row and column parameters, a very substantial reduction in the number of parameters. Furthermore, it may happen that either the row or column parameters, or both, can be regarded as zero, which would allow further reduction in the number of parameters necessary to describe the situation. In fact, if both were zero, the only parameter left is the grand mean $\xi$.
Corresponding to the population parameters $\xi$, $\eta_i$, $\zeta_j$ and $\theta_{ij}$, we will have sample analogs. The sample analog of $\xi$ (1.4) is
\[ \bar{x}_{\cdot\cdot\cdot} = \frac{1}{rtn}\sum_i^r\sum_j^t\sum_\nu^n x_{ij\nu}, \]
that of $\eta_i$ (1.5) is $(\bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot})$, that of $\zeta_j$ (1.6) is $(\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot\cdot\cdot})$, and that of $\theta_{ij}$ (1.9) is
\[ \bar{x}_{ij\cdot} - [\bar{x}_{\cdot\cdot\cdot} + (\bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot}) + (\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot\cdot\cdot})] = \bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot}. \tag{1.19} \]
Using these expressions we can write an identity analogous to (1.10):
\[ \bar{x}_{ij\cdot} = \bar{x}_{\cdot\cdot\cdot} + (\bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot}) + (\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot\cdot\cdot}) + (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot}). \tag{1.20} \]
If we subtract (1.10) from this, we get an expression for the deviation of the sample mean for the $ij$th cell from the true value for the mean of that cell:
\[ \bar{x}_{ij\cdot} - \xi_{ij} = (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot} - \theta_{ij}) + (\bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot} - \eta_i) + (\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot\cdot\cdot} - \zeta_j) + (\bar{x}_{\cdot\cdot\cdot} - \xi). \tag{1.21} \]
This identity will be used in the following section.
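The sample analogs just defined are easy to compute directly. The sketch below uses a small set of made-up observations (purely hypothetical, with $r = 2$, $t = 3$, $n = 2$) to form the cell, row, column and grand means, and then the sample analogs of $\eta_i$, $\zeta_j$ and $\theta_{ij}$; the variable names are illustrative.

```python
# Hypothetical observations x[i][j][v]: r = 2 rows, t = 3 columns, n = 2 per cell.
x = [
    [[4.0, 6.0], [7.0, 9.0], [10.0, 14.0]],
    [[1.0, 3.0], [8.0, 8.0], [6.0, 8.0]],
]
r, t, n = len(x), len(x[0]), len(x[0][0])

cell = [[sum(x[i][j]) / n for j in range(t)] for i in range(r)]      # cell means
row = [sum(cell[i]) / t for i in range(r)]                           # row means
col = [sum(cell[i][j] for i in range(r)) / r for j in range(t)]      # column means
grand = sum(row) / r                                                 # grand mean

eta_hat = [row[i] - grand for i in range(r)]                         # analog of eta_i
zeta_hat = [col[j] - grand for j in range(t)]                        # analog of zeta_j
theta_hat = [[cell[i][j] - row[i] - col[j] + grand for j in range(t)]
             for i in range(r)]                                      # analog of theta_ij, (1.19)

print(grand, eta_hat, zeta_hat, theta_hat)
```

As with the population constants, the sample row and column deviations sum to zero, and the sample interaction terms sum to zero over each suffix, so that the identity (1.20) reproduces each cell mean exactly.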
14.2. The Analysis of Variance
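We start with the identity
\[ x_{ij\nu} - \bar{x}_{\cdot\cdot\cdot} = (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot}) + (\bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot}) + (\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot\cdot\cdot}) + (x_{ij\nu} - \bar{x}_{ij\cdot}), \tag{2.1} \]
and square and sum over $i$, $j$ and $\nu$:
\[ \sum_i^r\sum_j^t\sum_\nu^n (x_{ij\nu} - \bar{x}_{\cdot\cdot\cdot})^2 = n\sum_i^r\sum_j^t (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot})^2 + nt\sum_i^r (\bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot})^2 + nr\sum_j^t (\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot\cdot\cdot})^2 + \sum_i^r\sum_j^t\sum_\nu^n (x_{ij\nu} - \bar{x}_{ij\cdot})^2. \tag{2.2} \]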
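This equation is entered in the second column of Table 14.3. We wish to obtain the distributions of the various mean squares $s_4^2$, etc., under the various null hypotheses $\eta_i = 0$ for all $i$, etc.

Table 14.3

Source of variance   Sums of squares                                                                                                   Degrees of freedom   Mean squares   E[M.S.]
Rows                 $nt\sum_i^r (\bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot})^2$                                                 $r - 1$              $s_4^2$        $\sigma^2 + \frac{nt}{r-1}\sum_i^r \eta_i^2$
Columns              $nr\sum_j^t (\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot\cdot\cdot})^2$                                                $t - 1$              $s_3^2$        $\sigma^2 + \frac{nr}{t-1}\sum_j^t \zeta_j^2$
Interaction          $n\sum_i^r\sum_j^t (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot})^2$   $(r - 1)(t - 1)$     $s_2^2$        $\sigma^2 + \frac{n}{(r-1)(t-1)}\sum_i^r\sum_j^t \theta_{ij}^2$
Within cells         $\sum_i^r\sum_j^t\sum_\nu^n (x_{ij\nu} - \bar{x}_{ij\cdot})^2$                                                     $rt(n - 1)$          $s_1^2$        $\sigma^2$
Total                $\sum_i^r\sum_j^t\sum_\nu^n (x_{ij\nu} - \bar{x}_{\cdot\cdot\cdot})^2$                                             $rtn - 1$

Consider the identity
\[ x_{ij\nu} - \xi_{ij} = (x_{ij\nu} - \bar{x}_{ij\cdot}) + (\bar{x}_{ij\cdot} - \xi_{ij}). \tag{2.3} \]
We square this and then sum over $\nu$:
\[ \sum_\nu^n (x_{ij\nu} - \xi_{ij})^2 = \sum_\nu^n (x_{ij\nu} - \bar{x}_{ij\cdot})^2 + n(\bar{x}_{ij\cdot} - \xi_{ij})^2. \tag{2.4} \]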
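The left-hand side is distributed as $\sigma^2\chi^2(n)$, and the two terms on the right-hand side are distributed as $\sigma^2\chi^2(n - 1)$ and $\sigma^2\chi^2(1)$. Now sum over $i$ and $j$:
\[ \sum_i^r\sum_j^t\sum_\nu^n (x_{ij\nu} - \xi_{ij})^2 = \sum_i^r\sum_j^t\sum_\nu^n (x_{ij\nu} - \bar{x}_{ij\cdot})^2 + n\sum_i^r\sum_j^t (\bar{x}_{ij\cdot} - \xi_{ij})^2. \tag{2.5} \]
By the additivity of $\chi^2$'s, the left-hand side will be distributed as $\sigma^2\chi^2(rtn)$, and the two terms on the right-hand side as $\sigma^2\chi^2(rt(n - 1))$ and $\sigma^2\chi^2(rt)$. If we define
\[ s_1^2 = \frac{\sum_i^r\sum_j^t\sum_\nu^n (x_{ij\nu} - \bar{x}_{ij\cdot})^2}{rt(n - 1)}, \tag{2.6} \]
it will have expected value $\sigma^2$ and be independent of the other term on the right-hand side of (2.5). Now consider this other term. If we square the identity (1.21) and sum over $i$, $j$ and $\nu$ (summing over $\nu$ amounts to multiplication by $n$), we get
\[ n\sum_i^r\sum_j^t (\bar{x}_{ij\cdot} - \xi_{ij})^2 = n\sum_i^r\sum_j^t (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot} - \theta_{ij})^2 + nt\sum_i^r (\bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot} - \eta_i)^2 + nr\sum_j^t (\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot\cdot\cdot} - \zeta_j)^2 + ntr(\bar{x}_{\cdot\cdot\cdot} - \xi)^2. \tag{2.7} \]
The left-hand side is distributed as $\sigma^2\chi^2(rt)$. The degrees of freedom for the first term on the right-hand side follow from the fact that the random variables $\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot}$ are subject to the conditions
\[ \sum_i^r (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot}) = 0 \quad \text{for each } j, \tag{2.8} \]
\[ \sum_j^t (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot}) = 0 \quad \text{for each } i, \tag{2.9} \]
and, similar to the argument for the analogous $\theta_{ij}$, these form only $r + t - 1$ independent relations. The degrees of freedom for this term are therefore
\[ rt - (r + t - 1) = (r - 1)(t - 1). \tag{2.10} \]
For the second and third terms we have the conditions
\[ \sum_i^r (\bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot}) = 0 = \sum_j^t (\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot\cdot\cdot}), \tag{2.11} \]
and so the degrees of freedom are $r - 1$ and $t - 1$, respectively. The last term involving the single random variable $\bar{x}_{\cdot\cdot\cdot}$ has 1 degree of freedom. Hence, by Cochran's theorem, these sums of squares will be distributed as $\sigma^2\chi^2$ and be independent.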
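If we define $s_2'^2$ as
\[ s_2'^2 = \frac{n\sum_i^r\sum_j^t (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot} - \theta_{ij})^2}{(r - 1)(t - 1)}, \tag{2.12} \]
then
\[ s_2'^2 \sim \sigma^2\,\frac{\chi^2((r - 1)(t - 1))}{(r - 1)(t - 1)}, \tag{2.13} \]
and $s_2'^2$ will have expected value $\sigma^2$ and be independent of $s_1^2$. Hence the ratio $s_2'^2/s_1^2$ has the $F$ distribution with degrees of freedom $(r - 1)(t - 1)$, $rt(n - 1)$. Now define $s_2^2$ as
\[ s_2^2 = \frac{n\sum_i^r\sum_j^t (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot})^2}{(r - 1)(t - 1)}. \tag{2.14} \]
To find its expected value, consider the expected value of the numerator of $s_2'^2$:
\[ E\left[n\sum_i^r\sum_j^t (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot} - \theta_{ij})^2\right] = E\left[n\sum_i^r\sum_j^t (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot})^2\right] + n\sum_i^r\sum_j^t \theta_{ij}^2 - 2n\sum_i^r\sum_j^t E[\theta_{ij}(\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot})] = E[(r - 1)(t - 1)s_2^2] - n\sum_i^r\sum_j^t \theta_{ij}^2. \tag{2.15} \]
The left-hand side equals $(r - 1)(t - 1)\sigma^2$, whence it follows that
\[ E[s_2^2] = \sigma^2 + \frac{n}{(r - 1)(t - 1)}\sum_i^r\sum_j^t \theta_{ij}^2. \tag{2.16} \]
If the null hypothesis $\theta_{ij} = 0$ for all $i, j$ is true, then $s_2^2 = s_2'^2$, and so
\[ \frac{s_2^2}{s_1^2} \sim F((r - 1)(t - 1),\, rt(n - 1)). \tag{2.17} \]
If the null hypothesis is false, so that some $\theta_{ij} \neq 0$, then $E[s_2^2] > \sigma^2$ and the ratio $s_2^2/s_1^2$ will not have the $F$ distribution but instead a distribution displaced upwards in the direction of larger values of $s_2^2/s_1^2$. The critical region is therefore large values of $s_2^2/s_1^2$. The other two mean squares in Table 14.3 can be treated similarly.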
If the null hypothesis $\theta_{ij} = 0$ for all $i, j$ is rejected, then the true cell means are not given by an additive model, and we might as well deal with the individual cell means. On the other hand, if the null hypothesis is acceptable, then we may assume that the $\theta_{ij}$ are zero, and proceed to test the row and column effects. If the $\eta_i = 0$ for all $i$, then $s_4^2$ is an estimate of $\sigma^2$ independent of $s_1^2$, and the ratio $s_4^2/s_1^2$ will be distributed as $F(r - 1, rt(n - 1))$.
We may wish to construct confidence limits for $\xi_{i\cdot} - \xi_{i'\cdot}$, the difference between two row means. Since, using (1.5),
\[ \xi_{i\cdot} - \xi_{i'\cdot} = (\xi_{i\cdot} - \xi) - (\xi_{i'\cdot} - \xi) = \eta_i - \eta_{i'}, \tag{2.18} \]
confidence limits for $\xi_{i\cdot} - \xi_{i'\cdot}$ will be identical with confidence limits for $\eta_i - \eta_{i'}$. Averaging the model (1.17) over $j$ and $\nu$ gives
\[ \bar{x}_{i\cdot\cdot} = \xi + \eta_i + \bar{\zeta}_\cdot + \bar{\theta}_{i\cdot} + \bar{z}_{i\cdot\cdot} = \xi + \eta_i + \bar{z}_{i\cdot\cdot}, \tag{2.19} \]
since $\bar{\zeta}_\cdot = (1/t)\sum_j^t \zeta_j = 0$ by (1.7) and $\bar{\theta}_{i\cdot} = (1/t)\sum_j^t \theta_{ij} = 0$ by (1.15). Hence
\[ \bar{x}_{i\cdot\cdot} - \bar{x}_{i'\cdot\cdot} = (\eta_i - \eta_{i'}) + (\bar{z}_{i\cdot\cdot} - \bar{z}_{i'\cdot\cdot}), \tag{2.20} \]
with expectation
\[ E[\bar{x}_{i\cdot\cdot} - \bar{x}_{i'\cdot\cdot}] = \eta_i - \eta_{i'} \tag{2.21} \]
and variance
\[ V[\bar{x}_{i\cdot\cdot} - \bar{x}_{i'\cdot\cdot}] = V[\bar{z}_{i\cdot\cdot}] + V[\bar{z}_{i'\cdot\cdot}] = \frac{2\sigma^2}{tn}. \tag{2.22} \]
We can therefore derive confidence limits for $\eta_i - \eta_{i'}$ from
\[ \frac{(\bar{x}_{i\cdot\cdot} - \bar{x}_{i'\cdot\cdot}) - (\eta_i - \eta_{i'})}{\sqrt{2s_1^2/tn}} \sim t(rt(n - 1)). \tag{2.23} \]
Usually, there is little or no point in constructing confidence limits for $\xi_{i\cdot} - \xi_{i'\cdot}$ unless $\xi_{ij} - \xi_{i'j}$ is constant over $j$, i.e., unless the difference between the two rows is the same for all columns. From (1.10),
\[ \xi_{ij} - \xi_{i'j} = \eta_i - \eta_{i'} + \theta_{ij} - \theta_{i'j}. \tag{2.24} \]
Thus (2.23) will also give confidence limits for $\xi_{ij} - \xi_{i'j}$ if we can assume that the interaction constants $\theta_{ij}$ are zero. If we cannot make this assumption, we can obtain confidence limits for $\xi_{ij} - \xi_{i'j}$ as follows. Since
\[ \bar{x}_{ij\cdot} - \bar{x}_{i'j\cdot} = \eta_i - \eta_{i'} + \theta_{ij} - \theta_{i'j} + \bar{z}_{ij\cdot} - \bar{z}_{i'j\cdot} \tag{2.25} \]
has expectation
\[ E[\bar{x}_{ij\cdot} - \bar{x}_{i'j\cdot}] = \eta_i - \eta_{i'} + \theta_{ij} - \theta_{i'j} = \xi_{ij} - \xi_{i'j}, \tag{2.26} \]
from (2.24), and variance
\[ V[\bar{x}_{ij\cdot} - \bar{x}_{i'j\cdot}] = V[\bar{z}_{ij\cdot}] + V[\bar{z}_{i'j\cdot}] = \frac{2\sigma^2}{n}, \tag{2.27} \]
confidence limits for $(\xi_{ij} - \xi_{i'j})$ can be obtained from
\[ \frac{(\bar{x}_{ij\cdot} - \bar{x}_{i'j\cdot}) - (\xi_{ij} - \xi_{i'j})}{\sqrt{2s_1^2/n}} \sim t(rt(n - 1)). \tag{2.28} \]
Comparison of the denominator of (2.28) with the denominator of (2.23) shows the advantage gained if the assumption of zero interaction is permissible.

14.3. Computing Forms for Two-Way Analysis of Variance
Table 14.4 gives the per cent reduction in blood sugar a certain time after injection of insulin into rabbits. A group of 24 rabbits was divided at random into six groups of four rabbits each, and each rabbit received an injection of insulin. Two factors were involved, the dose at three levels and the preparation of insulin, A and B, at two levels.

Table 14.4
                            Dose
Preparation       2.29            3.63            ...             Totals
A                 17 21 49 54     64 48 34 63     62 72 61 91
  (cell totals)   141             209             286             636
B                 33 37 40 16     41 64 34 64     56 62 57 72
  (cell totals)   126             203             247             576
Totals            267             412             533             1212
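While the sums of squares in Table 14.3 can be calculated in the forms given there, it is usually more satisfactory to use identities that involve totals rather than means. We need the following sums of squares:
\[ \sum_i^r\sum_j^t\sum_\nu^n x_{ij\nu}^2 = 17^2 + 21^2 + \cdots + 72^2 = 69{,}358, \tag{3.1} \]
\[ \frac{1}{n}\sum_i^r\sum_j^t \left(\sum_\nu^n x_{ij\nu}\right)^2 = \frac{141^2 + \cdots + 247^2}{4} = 65{,}863.00, \tag{3.2} \]
\[ \frac{1}{nr}\sum_j^t \left(\sum_i^r\sum_\nu^n x_{ij\nu}\right)^2 = \frac{267^2 + \cdots + 533^2}{4 \times 2} = 65{,}640.25, \tag{3.3} \]
\[ \frac{1}{nt}\sum_i^r \left(\sum_j^t\sum_\nu^n x_{ij\nu}\right)^2 = \frac{636^2 + 576^2}{4 \times 3} = 61{,}356.00, \tag{3.4} \]
\[ \frac{1}{nrt}\left(\sum_i^r\sum_j^t\sum_\nu^n x_{ij\nu}\right)^2 = \frac{1212^2}{4 \times 2 \times 3} = 61{,}206.00. \tag{3.5} \]
By manipulations similar to those of (10.2.31) we get for the total sum of squares
\[ \sum_i^r\sum_j^t\sum_\nu^n (x_{ij\nu} - \bar{x}_{\cdot\cdot\cdot})^2 = \sum_i^r\sum_j^t\sum_\nu^n x_{ij\nu}^2 - \frac{1}{nrt}\left(\sum_i^r\sum_j^t\sum_\nu^n x_{ij\nu}\right)^2 = 69{,}358.00 - 61{,}206.00 = 8152.00. \tag{3.6} \]
The within-cells sum of squares is
\[ \sum_i^r\sum_j^t\sum_\nu^n (x_{ij\nu} - \bar{x}_{ij\cdot})^2 = \sum_i^r\sum_j^t\sum_\nu^n x_{ij\nu}^2 - \frac{1}{n}\sum_i^r\sum_j^t \left(\sum_\nu^n x_{ij\nu}\right)^2 = 69{,}358.00 - 65{,}863.00 = 3495.00. \tag{3.7} \]
The rows sum of squares is
\[ nt\sum_i^r (\bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot})^2 = \frac{1}{tn}\sum_i^r \left(\sum_j^t\sum_\nu^n x_{ij\nu}\right)^2 - \frac{1}{rtn}\left(\sum_i^r\sum_j^t\sum_\nu^n x_{ij\nu}\right)^2 = 61{,}356.00 - 61{,}206.00 = 150.00. \tag{3.8} \]
Similarly, the columns sum of squares is
\[ nr\sum_j^t (\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot\cdot\cdot})^2 = \frac{1}{rn}\sum_j^t \left(\sum_i^r\sum_\nu^n x_{ij\nu}\right)^2 - \frac{1}{rtn}\left(\sum_i^r\sum_j^t\sum_\nu^n x_{ij\nu}\right)^2 = 65{,}640.25 - 61{,}206.00 = 4434.25. \tag{3.9} \]
A convenient computing form for the interaction sum of squares is (see Exercise 14.5)
\[ n\sum_i^r\sum_j^t (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot})^2 = \frac{1}{n}\sum_i^r\sum_j^t \left(\sum_\nu^n x_{ij\nu}\right)^2 - \frac{1}{rtn}\left(\sum_i^r\sum_j^t\sum_\nu^n x_{ij\nu}\right)^2 - nt\sum_i^r (\bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot})^2 - nr\sum_j^t (\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot\cdot\cdot})^2 \tag{3.10} \]
\[ = 65{,}863.00 - 61{,}206.00 - 150.00 - 4434.25 = 72.75. \]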
It will sometimes be worth while to calculate directly the quantities (1.19), because they are estimates of the $\theta_{ij}$ and a pattern in them may tell us something about the nature of the interaction. The cell means $\bar{x}_{ij\cdot}$ are given in Table 14.5a. For $i = 1$, $j = 1$, for example, we have $\bar{x}_{11\cdot} = 141/4 = 35.25$. Also $\bar{x}_{1\cdot\cdot} = 636/(4 \times 3) = 53.00$, $\bar{x}_{\cdot 1\cdot} = 267/(4 \times 2) = 33.375$, and $\bar{x}_{\cdot\cdot\cdot} = 1212/(4 \times 2 \times 3) = 50.50$, so
\[ \bar{x}_{11\cdot} - \bar{x}_{1\cdot\cdot} - \bar{x}_{\cdot 1\cdot} + \bar{x}_{\cdot\cdot\cdot} = 35.25 - 53.00 - 33.375 + 50.50 = -0.625. \tag{3.11} \]
The quantities $\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot}$ are tabulated in Table 14.5b. They have the property of summing to zero in each row and in each column [see (2.8) and (2.9)]. Their sum of squares, multiplied by $n$, is of course the interaction sum of squares, and this provides an alternative to the identity (3.10):
\[ n\sum_i^r\sum_j^t (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot})^2 = 4[(-0.625)^2 + \cdots + (-2.375)^2] = 72.75. \tag{3.12} \]
Table 14.5

        (a) $\bar{x}_{ij\cdot}$                 (b) $\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot}$
i \ j     1        2        3                  1          2          3
1       35.25    52.25    71.50              -0.625     -1.750     +2.375
2       31.50    50.75    61.75              +0.625     +1.750     -2.375

These results are assembled in Table 14.6. It is apparent that the null hypothesis that the interaction is zero is acceptable, and hence there is unlikely to be anything of interest in Table 14.5b. The null hypothesis that the difference between preparations is zero is acceptable since we have $F(1, 18) < 1$, but the dose effect, with a variance ratio 2217.125/194.167 = 11.4 being distributed under the null hypothesis as $F(2, 18)$, is highly significant.
Table 14.6

Source of variance     Sums of squares   Degrees of freedom   Mean squares   E[M.S.]
Rows = preparations          150.00              1               150.000     $\sigma^2 + \frac{12}{1}\sum_i^r \eta_i^2$
Columns = doses             4434.25              2              2217.125     $\sigma^2 + \frac{8}{2}\sum_j^t \zeta_j^2$
Interaction                   72.75              2                36.375     $\sigma^2 + \frac{4}{2}\sum_i^r\sum_j^t \theta_{ij}^2$
Within cells                3495.00             18               194.167     $\sigma^2$
Total                       8152.00             23
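The computing forms (3.1) to (3.10) translate directly into a short program. The following is a minimal sketch applied to the data of Table 14.4; the layout of the nested list and all variable names are illustrative choices, not part of the original computation.

```python
# Table 14.4: per cent reduction in blood sugar, indexed x[preparation][dose][rabbit].
x = [
    [[17, 21, 49, 54], [64, 48, 34, 63], [62, 72, 61, 91]],  # preparation A
    [[33, 37, 40, 16], [41, 64, 34, 64], [56, 62, 57, 72]],  # preparation B
]
r, t, n = len(x), len(x[0]), len(x[0][0])

# The five basic quantities (3.1)-(3.5), all built from totals rather than means.
raw = sum(v * v for row in x for cell in row for v in cell)
cells = sum(sum(cell) ** 2 for row in x for cell in row) / n
cols = sum(sum(sum(x[i][j]) for i in range(r)) ** 2 for j in range(t)) / (n * r)
rows = sum(sum(sum(cell) for cell in row) ** 2 for row in x) / (n * t)
correction = sum(v for row in x for cell in row for v in cell) ** 2 / (n * r * t)

ss = {
    "rows": rows - correction,                         # (3.8)
    "columns": cols - correction,                      # (3.9)
    "interaction": cells - rows - cols + correction,   # equivalent to (3.10)
    "within": raw - cells,                             # (3.7)
    "total": raw - correction,                         # (3.6)
}
df = {
    "rows": r - 1,
    "columns": t - 1,
    "interaction": (r - 1) * (t - 1),
    "within": r * t * (n - 1),
    "total": r * t * n - 1,
}
ms = {k: ss[k] / df[k] for k in ss if k != "total"}

# Model I variance ratios: each effect is tested against the within-cells mean square.
f_ratio = {k: ms[k] / ms["within"] for k in ("rows", "columns", "interaction")}

for k in ("rows", "columns", "interaction", "within", "total"):
    print(k, ss[k], df[k], ms.get(k))
print(f_ratio)
```

The sums of squares and mean squares agree with Table 14.6, and the three variance ratios are those referred to $F(1, 18)$, $F(2, 18)$ and $F(2, 18)$ in the text.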
14.4. Two-Way Analysis of Variance: Model II
In model II two-way analysis of variance we assume that both rows and columns are random effects, sampled from infinite populations. For example, in a large factory with very many identical machines, machines could correspond to rows, columns to a random sample of batches of raw material, and the replicates in the cells are several items made from each batch on each machine. The model is
\[ x_{ij\nu} = \xi + g_i + e_j + y_{ij} + z_{ij\nu}, \tag{4.1} \]
where $g_i$, $e_j$, $y_{ij}$ and $z_{ij\nu}$ are independently sampled from normal populations with zero means and variances $\psi_1^2$, $\psi_2^2$, $\omega^2$ and $\sigma^2$, respectively. The usual objective of the analysis is to estimate and construct confidence limits for the parameters of the model, namely, the grand mean and the four components of variance. The analysis of variance involves the same equation (2.2) as in the model I analysis. The computing forms, the degrees of freedom, and the mean squares are all the same as in Table 14.3, and the only differences between the two models lie in the expectations and distributions of the mean squares. The mean square $s_1^2$ continues to have the same expected value $\sigma^2$ and the same distribution as before.
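To find the expected value of the interaction mean square $s_2^2$, we proceed as follows. We form various averages of the model (4.1),
\[ \bar{x}_{ij\cdot} = \xi + g_i + e_j + y_{ij} + \bar{z}_{ij\cdot}, \tag{4.2} \]
\[ \bar{x}_{i\cdot\cdot} = \xi + g_i + \bar{e}_\cdot + \bar{y}_{i\cdot} + \bar{z}_{i\cdot\cdot}, \tag{4.3} \]
\[ \bar{x}_{\cdot j\cdot} = \xi + \bar{g}_\cdot + e_j + \bar{y}_{\cdot j} + \bar{z}_{\cdot j\cdot}, \tag{4.4} \]
\[ \bar{x}_{\cdot\cdot\cdot} = \xi + \bar{g}_\cdot + \bar{e}_\cdot + \bar{y}_{\cdot\cdot} + \bar{z}_{\cdot\cdot\cdot}; \tag{4.5} \]
so we have the identity
\[ \bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot} = (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot}) + (\bar{z}_{ij\cdot} - \bar{z}_{i\cdot\cdot} - \bar{z}_{\cdot j\cdot} + \bar{z}_{\cdot\cdot\cdot}). \tag{4.6} \]
Now
\[ \frac{1}{n}E[(r - 1)(t - 1)s_2^2] = E\left[\sum_i^r\sum_j^t (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot})^2\right] = E\left[\sum_i^r\sum_j^t (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2\right] + E\left[\sum_i^r\sum_j^t (\bar{z}_{ij\cdot} - \bar{z}_{i\cdot\cdot} - \bar{z}_{\cdot j\cdot} + \bar{z}_{\cdot\cdot\cdot})^2\right], \tag{4.7} \]
the expectation of the cross product being zero since our model assumes independence of the $y$'s and $z$'s. Consider the two parts of (4.7) separately. It is straightforward to show that
\[ E\left[\sum_i^r\sum_j^t (\bar{z}_{ij\cdot} - \bar{z}_{i\cdot\cdot} - \bar{z}_{\cdot j\cdot} + \bar{z}_{\cdot\cdot\cdot})^2\right] = E\left[\sum_i^r\sum_j^t (\bar{z}_{ij\cdot} - \bar{z}_{\cdot\cdot\cdot})^2\right] - tE\left[\sum_i^r (\bar{z}_{i\cdot\cdot} - \bar{z}_{\cdot\cdot\cdot})^2\right] - rE\left[\sum_j^t (\bar{z}_{\cdot j\cdot} - \bar{z}_{\cdot\cdot\cdot})^2\right]. \tag{4.8} \]
Now $(rt - 1)^{-1}\sum_i^r\sum_j^t (\bar{z}_{ij\cdot} - \bar{z}_{\cdot\cdot\cdot})^2$ is the ordinary sample estimate of the variance of the $\bar{z}_{ij\cdot}$, which is $\sigma^2/n$, so
\[ E\left[\sum_i^r\sum_j^t (\bar{z}_{ij\cdot} - \bar{z}_{\cdot\cdot\cdot})^2\right] = \frac{rt - 1}{n}\sigma^2. \tag{4.9} \]
Analogously, $(r - 1)^{-1}\sum_i^r (\bar{z}_{i\cdot\cdot} - \bar{z}_{\cdot\cdot\cdot})^2$ is the ordinary sample estimate of the variance of the $\bar{z}_{i\cdot\cdot}$, which is $\sigma^2/tn$, so
\[ E\left[\sum_i^r (\bar{z}_{i\cdot\cdot} - \bar{z}_{\cdot\cdot\cdot})^2\right] = \frac{r - 1}{tn}\sigma^2. \tag{4.10} \]
Likewise,
\[ E\left[\sum_j^t (\bar{z}_{\cdot j\cdot} - \bar{z}_{\cdot\cdot\cdot})^2\right] = \frac{t - 1}{rn}\sigma^2. \tag{4.11} \]
Thus
\[ E\left[\sum_i^r\sum_j^t (\bar{z}_{ij\cdot} - \bar{z}_{i\cdot\cdot} - \bar{z}_{\cdot j\cdot} + \bar{z}_{\cdot\cdot\cdot})^2\right] = \left(\frac{rt - 1}{n} - t\,\frac{r - 1}{tn} - r\,\frac{t - 1}{rn}\right)\sigma^2 = \frac{1}{n}(r - 1)(t - 1)\sigma^2. \tag{4.12} \]
The consideration of $E\left[\sum_i^r\sum_j^t (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2\right]$, the other part of (4.7), will follow the analogous course, the only difference being that terms of the type $V[y_{ij}] = \omega^2$, $V[\bar{y}_{\cdot j}] = \omega^2/r$, etc., involve $\omega^2$ and not $\sigma^2$, and omit the factor $1/n$. We can thus assert
\[ E\left[\sum_i^r\sum_j^t (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2\right] = \omega^2(r - 1)(t - 1). \tag{4.13} \]
Substituting (4.12) and (4.13) in (4.7) gives
\[ E\left[\sum_i^r\sum_j^t (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot})^2\right] = \frac{1}{n}(r - 1)(t - 1)\sigma^2 + (r - 1)(t - 1)\omega^2, \tag{4.14} \]
whence, multiplying by $n/(r - 1)(t - 1)$,
\[ E[s_2^2] = \sigma^2 + n\omega^2. \tag{4.15} \]
Now consider $E[s_4^2]$. We have from (4.3) and (4.5) that
\[ \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot} = (g_i - \bar{g}_\cdot) + (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) + (\bar{z}_{i\cdot\cdot} - \bar{z}_{\cdot\cdot\cdot}), \tag{4.16} \]
so
\[ E\left[\sum_i^r (\bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot})^2\right] = E\left[\sum_i^r (g_i - \bar{g}_\cdot)^2\right] + E\left[\sum_i^r (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2\right] + E\left[\sum_i^r (\bar{z}_{i\cdot\cdot} - \bar{z}_{\cdot\cdot\cdot})^2\right], \tag{4.17} \]
the expectations of the cross products being zero since our model assumes independence of the $g$'s, $y$'s and $z$'s. In (4.10) we have evaluated the last term, and the first two are analogously
\[ E\left[\sum_i^r (g_i - \bar{g}_\cdot)^2\right] = (r - 1)\psi_1^2, \tag{4.18} \]
\[ E\left[\sum_i^r (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2\right] = (r - 1)\frac{\omega^2}{t}. \tag{4.19} \]
Substituting in (4.17) and multiplying by $tn/(r - 1)$ gives
\[ E[s_4^2] = \sigma^2 + n\omega^2 + tn\psi_1^2. \tag{4.20} \]
Similarly,
\[ E[s_3^2] = \sigma^2 + n\omega^2 + rn\psi_2^2. \tag{4.21} \]
These results are assembled in Table 14.7.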
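Table 14.7

Source of variance   Sums of squares                                                                                                   Degrees of freedom   Mean squares   E[M.S.]
Rows                 $nt\sum_i^r (\bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot})^2$                                                 $r - 1$              $s_4^2$        $\sigma^2 + n\omega^2 + tn\psi_1^2$
Columns              $nr\sum_j^t (\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot\cdot\cdot})^2$                                                $t - 1$              $s_3^2$        $\sigma^2 + n\omega^2 + rn\psi_2^2$
Interaction          $n\sum_i^r\sum_j^t (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x}_{\cdot\cdot\cdot})^2$   $(r - 1)(t - 1)$     $s_2^2$        $\sigma^2 + n\omega^2$
Within cells         $\sum_i^r\sum_j^t\sum_\nu^n (x_{ij\nu} - \bar{x}_{ij\cdot})^2$                                                     $rt(n - 1)$          $s_1^2$        $\sigma^2$
Total                $\sum_i^r\sum_j^t\sum_\nu^n (x_{ij\nu} - \bar{x}_{\cdot\cdot\cdot})^2$                                             $rtn - 1$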
14.5. The Interpretation of a Model II Analysis
The tests of significance for the null hypotheses $\omega^2 = 0$, $\psi_1^2 = 0$, $\psi_2^2 = 0$ are obvious from the column of expected mean squares in Table 14.7. Whereas, in the model I analysis (Table 14.3), the main effects for rows and columns were tested against the within-cells mean square, here they are tested against the interaction mean square. The parameters of the model are estimated in the obvious ways:
\[ \hat{\xi} = \bar{x}_{\cdot\cdot\cdot}, \quad \hat{\sigma}^2 = s_1^2, \quad \hat{\omega}^2 = \frac{s_2^2 - s_1^2}{n}, \quad \hat{\psi}_1^2 = \frac{s_4^2 - s_2^2}{tn}, \quad \hat{\psi}_2^2 = \frac{s_3^2 - s_2^2}{rn}. \tag{5.1} \]
Exact confidence limits for $\sigma^2$ and for $\omega^2/\sigma^2$, and approximate confidence limits for $\omega^2$, $\psi_1^2$ and $\psi_2^2$ can be obtained as in Section 10.5. For confidence limits for $\xi$, from (4.5) $E[\bar{x}_{\cdot\cdot\cdot}] = \xi$, and
\[ V[\bar{x}_{\cdot\cdot\cdot}] = \frac{\psi_1^2}{r} + \frac{\psi_2^2}{t} + \frac{\omega^2}{rt} + \frac{\sigma^2}{rtn} = \frac{\sigma^2 + n\omega^2 + nt\psi_1^2 + nr\psi_2^2}{rtn}. \tag{5.2} \]
To obtain a mean square with expected value equal to the numerator in (5.2) we use the linear combination $s_4^2 + s_3^2 - s_2^2$, since it has expected value
\[ E[s_4^2 + s_3^2 - s_2^2] = \sigma^2 + n\omega^2 + nt\psi_1^2 + nr\psi_2^2. \tag{5.3} \]
Thus
\[ \frac{\bar{x}_{\cdot\cdot\cdot} - \xi}{\sqrt{(s_4^2 + s_3^2 - s_2^2)/rtn}} \underset{\text{approx}}{\sim} t(f'), \tag{5.4} \]
where $f'$ is given by application of (9.8.10). Confidence limits for $\xi$ can be derived in the usual way, but these will be approximate on account of the approximation in (5.4).

14.6. Two-Way Analysis of Variance with Only One Observation per Cell
Sometimes we have data in a two-way classification with only one observation per cell. The analysis is similar to that with $n$ observations per cell, but, with $n = 1$ the model becomes, in the model I case,
\[ x_{ij} = \xi + \eta_i + \zeta_j + \theta_{ij} + z_{ij}, \tag{6.1} \]
and in the model II case
\[ x_{ij} = \xi + g_i + e_j + y_{ij} + z_{ij}. \tag{6.2} \]
In the table of analysis of variance there is no item "within cells," and the analysis is as in Table 14.8. The expectations of the mean squares for the two models are given in Table 14.9. In both models there is no test for interaction. In the model II analysis, the tests for the main effects of rows and columns are unchanged. In the model I analysis, the test for the row and column main effects may, if there is appreciable interaction so that
r
t
i
i
L L 0~1 =;t:. 0, be inefficient, since the denominator mean square will
be inflated by the extra component. On the other hand, if either variance ratio is significant, then it may be taken that the corresponding effect is real. 14.7. Nested or Hierarchical Analysis of Variance In our discussion of oneway model II analysis of variance in Section 10.4 we postulated a sampling situation in which sacks of wool were taken from a large consignment, and then samples were taken from each sack at random. If we now suppose that several analyses are performed on each sample, an appropriate model would be (7.1)
SECT.
14.7
483
NESTED OR HIERARCHICAL ANALYSIS OF VARIANCE
Table 14.8 /
Source of variance
Sums of squares
Rows
1
r
I
(XI' 
1
i
X/j
i
Mean squares
rl
s'«
1
s'3
x.. )'
C J rllCI I
= 1 Ir I
Degrees of freedom
 
t X/j )'
i
i
t
rI
Columns
(X'i 
x.. )'
t 
i
= 1rIit C Ii Xli Remainder
r
t
i
i
II
Total
J lCI I J 
~
t Xii
It
i
i
(by difference)
(,.  1)(1  1)
x.. )'
,.1  1
(X/j 
rt = IIx~i i
i
lC
~
t IIx/j
It
i
i
,
s'
J
where $\xi$ is the grand mean, $v_i$ corresponds to the sack effect, $y_{ij}$ to the sample within sack effect, and $z_{ij\nu}$ to analyses within samples. The random variables $v_i$, $y_{ij}$, and $z_{ij\nu}$ are independently normally distributed with zero means and variances $\psi^2$, $\omega^2$, and $\sigma^2$. Suppose there are $r$ sacks, $t$ samples per sack, and $n$ analyses per sample. Squaring and summing the identity

$$x_{ij\nu} - \bar{x}_{...} = (\bar{x}_{i..} - \bar{x}_{...}) + (\bar{x}_{ij.} - \bar{x}_{i..}) + (x_{ij\nu} - \bar{x}_{ij.}) \tag{7.2}$$
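Table 14.9

Source of variance | Mean squares | E[M.S.], model I | E[M.S.], model II
Rows | $s_4^2$ | $\sigma^2 + \frac{t}{r-1}\sum_i^r \eta_i^2$ | $\sigma^2 + \omega^2 + t\psi^2$
Columns | $s_3^2$ | $\sigma^2 + \frac{r}{t-1}\sum_j^t \zeta_j^2$ | $\sigma^2 + \omega^2 + r \times$ (variance of the column effects $e_j$)
Remainder | $s_2^2$ | $\sigma^2 + \frac{1}{(r-1)(t-1)}\sum_i^r\sum_j^t \theta_{ij}^2$ | $\sigma^2 + \omega^2$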
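gives

$$\sum_i^r\sum_j^t\sum_\nu^n (x_{ij\nu} - \bar{x}_{...})^2 = nt\sum_i^r (\bar{x}_{i..} - \bar{x}_{...})^2 + n\sum_i^r\sum_j^t (\bar{x}_{ij.} - \bar{x}_{i..})^2 + \sum_i^r\sum_j^t\sum_\nu^n (x_{ij\nu} - \bar{x}_{ij.})^2. \tag{7.3}$$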
These sums of squares are entered in Table 14.10, along with their degrees of freedom. We now determine the expected values of the mean squares, as given in the last column.

Table 14.10

Source of variance | Sums of squares | Degrees of freedom | Mean squares | E[M.S.]
Between sacks | $nt\sum_i^r (\bar{x}_{i..} - \bar{x}_{...})^2$ | $r - 1$ | $s_3^2$ | $\sigma^2 + n\omega^2 + nt\psi^2$
Between samples within sacks | $n\sum_i^r\sum_j^t (\bar{x}_{ij.} - \bar{x}_{i..})^2$ | $r(t - 1)$ | $s_2^2$ | $\sigma^2 + n\omega^2$
Between analyses within samples | $\sum_i^r\sum_j^t\sum_\nu^n (x_{ij\nu} - \bar{x}_{ij.})^2$ | $rt(n - 1)$ | $s_1^2$ | $\sigma^2$
Total | $\sum_i^r\sum_j^t\sum_\nu^n (x_{ij\nu} - \bar{x}_{...})^2$ | $rtn - 1$ | |
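Considering $s_1^2$, averaging the model (7.1) over $\nu$ gives

$$\bar{x}_{ij.} = \xi + v_i + y_{ij} + \bar{z}_{ij.}; \tag{7.4}$$

so

$$x_{ij\nu} - \bar{x}_{ij.} = z_{ij\nu} - \bar{z}_{ij.}. \tag{7.5}$$

Squaring and summing over $\nu$ gives

$$\sum_\nu^n (x_{ij\nu} - \bar{x}_{ij.})^2 = \sum_\nu^n (z_{ij\nu} - \bar{z}_{ij.})^2. \tag{7.6}$$

Divided by $n - 1$, this is a sample estimate of the variance of the $z_{ij\nu}$, namely, $\sigma^2$. Pooling the sums of squares and degrees of freedom through summation over $i$ and $j$ leaves this unchanged; so we obtain

$$E[s_1^2] = E\left[\frac{\sum_i^r\sum_j^t\sum_\nu^n (x_{ij\nu} - \bar{x}_{ij.})^2}{rt(n - 1)}\right] = \sigma^2. \tag{7.7}$$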
Now consider $s_2^2$. Averaging (7.4) over $j$ gives

$$\bar{x}_{i..} = \xi + v_i + \bar{y}_{i.} + \bar{z}_{i..}; \tag{7.8}$$

so

$$\bar{x}_{ij.} - \bar{x}_{i..} = (y_{ij} - \bar{y}_{i.}) + (\bar{z}_{ij.} - \bar{z}_{i..}). \tag{7.9}$$
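Squaring and summing over $j$ and then taking expectations gives

$$E\left[\sum_j^t (\bar{x}_{ij.} - \bar{x}_{i..})^2\right] = E\left[\sum_j^t (y_{ij} - \bar{y}_{i.})^2\right] + E\left[\sum_j^t (\bar{z}_{ij.} - \bar{z}_{i..})^2\right], \tag{7.10}$$

the expectation of the cross product vanishing by reason of the independence of the $y_{ij}$ and $z_{ij\nu}$. It is apparent that $\sum_j^t (y_{ij} - \bar{y}_{i.})^2/(t - 1)$ is a sample estimate of the variance of the $y_{ij}$, namely, $\omega^2$. Similarly $\sum_j^t (\bar{z}_{ij.} - \bar{z}_{i..})^2/(t - 1)$ is a sample estimate of the variance of the $\bar{z}_{ij.}$. Since the $z_{ij\nu}$ have variance $\sigma^2$, the $\bar{z}_{ij.}$ have variance $\sigma^2/n$, and so

$$E\left[\frac{\sum_j^t (\bar{x}_{ij.} - \bar{x}_{i..})^2}{t - 1}\right] = \omega^2 + \frac{\sigma^2}{n}. \tag{7.11}$$

Pooling the sums of squares and degrees of freedom by summation over $i$, and multiplying by $n$, gives

$$E[s_2^2] = E\left[\frac{n\sum_i^r\sum_j^t (\bar{x}_{ij.} - \bar{x}_{i..})^2}{r(t - 1)}\right] = \sigma^2 + n\omega^2. \tag{7.12}$$

Now consider $s_3^2$.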
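Averaging (7.8) over $i$ gives

$$\bar{x}_{...} = \xi + \bar{v}_{.} + \bar{y}_{..} + \bar{z}_{...}; \tag{7.13}$$

so, subtracting this from (7.8) gives

$$\bar{x}_{i..} - \bar{x}_{...} = (v_i - \bar{v}_{.}) + (\bar{y}_{i.} - \bar{y}_{..}) + (\bar{z}_{i..} - \bar{z}_{...}). \tag{7.14}$$

Squaring and summing over $i$ and then taking expectations gives

$$E\left[\sum_i^r (\bar{x}_{i..} - \bar{x}_{...})^2\right] = E\left[\sum_i^r (v_i - \bar{v}_{.})^2\right] + E\left[\sum_i^r (\bar{y}_{i.} - \bar{y}_{..})^2\right] + E\left[\sum_i^r (\bar{z}_{i..} - \bar{z}_{...})^2\right], \tag{7.15}$$

the expectations of the cross products vanishing since our model assumes independence of the $v$'s, $y$'s, and $z$'s. It is apparent that $\sum_i^r (\bar{z}_{i..} - \bar{z}_{...})^2/(r - 1)$ is a sample estimate of the variance of the $\bar{z}_{i..}$, namely $\sigma^2/tn$.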
Table 14.11

For each batch the four entries "a, b" are the duplicate subsample determinations on samples 1 to 4; $x'$ is the quantity and $x = 100(\log x' - 1)$.

Batch | $x'$: subsamples | $x'$: sample totals | $x'$: batch total | $x$: subsamples | $x$: sample totals | $x$: batch total
1 | 76, 85; 69, 82; 72, 78; 75, 84 | 161, 151, 150, 159 | 621 | 88, 93; 84, 91; 86, 89; 88, 92 | 181, 175, 175, 180 | 711
2 | 110, 109; 119, 106; 120, 121; 111, 119 | 219, 225, 241, 230 | 915 | 104, 104; 108, 103; 108, 108; 105, 108 | 208, 211, 216, 213 | 848
3 | 130, 140; 143, 121; 141, 147; 129, 140 | 270, 264, 288, 269 | 1091 | 111, 115; 116, 108; 115, 117; 111, 115 | 226, 224, 232, 226 | 908
4 | 62, 67; 50, 61; 71, 74; 66, 67 | 129, 111, 145, 133 | 518 | 79, 83; 70, 79; 85, 87; 82, 83 | 162, 149, 172, 165 | 648
5 | 62, 64; 48, 50; 80, 86; 87, 91 | 126, 98, 166, 178 | 568 | 79, 81; 68, 70; 90, 93; 94, 96 | 160, 138, 183, 190 | 671
6 | 91, 97; 87, 90; 78, 74; 87, 83 | 188, 177, 152, 170 | 687 | 96, 99; 94, 95; 89, 87; 94, 92 | 195, 189, 176, 186 | 746
7 | 101, 97; 89, 96; 78, 96; 76, 87 | 198, 185, 174, 163 | 720 | 100, 99; 95, 98; 89, 98; 88, 94 | 199, 193, 187, 182 | 761
8 | 136, 123; 108, 131; 128, 119; 96, 82 | 259, 239, 247, 178 | 923 | 113, 109; 103, 112; 111, 108; 98, 91 | 222, 215, 219, 189 | 845
9 | 140, 136; 92, 80; 107, 114; 84, 113 | 276, 172, 221, 197 | 866 | 115, 113; 96, 90; 103, 106; 92, 105 | 228, 186, 209, 197 | 820
10 | 81, 99; 86, 83; 103, 94; 85, 87 | 180, 169, 197, 172 | 718 | 91, 100; 93, 92; 101, 97; 93, 94 | 191, 185, 198, 187 | 761
11 | 108, 98; 102, 102; 102, 103; 109, 111 | 206, 204, 205, 220 | 835 | 103, 99; 101, 101; 101, 101; 104, 105 | 202, 202, 202, 209 | 815
12 | 106, 107; 100, 104; 99, 98; 102, 91 | 213, 204, 197, 193 | 807 | 103, 103; 100, 102; 100, 99; 101, 96 | 206, 202, 199, 197 | 804
13 | 93, 89; 85, 89; 78, 80; 89, 87 | 182, 174, 158, 176 | 690 | 97, 95; 93, 95; 89, 90; 95, 94 | 192, 188, 179, 189 | 748
14 | 116, 117; 104, 116; 118, 119; 112, 109 | 233, 220, 237, 221 | 911 | 106, 107; 102, 106; 107, 108; 105, 104 | 213, 208, 215, 209 | 845
Likewise $\sum_i^r (\bar{y}_{i.} - \bar{y}_{..})^2/(r - 1)$ is a sample estimate of the variance of the $\bar{y}_{i.}$, namely $\omega^2/t$, and $\sum_i^r (v_i - \bar{v}_{.})^2/(r - 1)$ is a sample estimate of the variance of the $v$'s, namely $\psi^2$. Thus, multiplying (7.15) by $tn/(r - 1)$, we get

$$E[s_3^2] = E\left[\frac{nt\sum_i^r (\bar{x}_{i..} - \bar{x}_{...})^2}{r - 1}\right] = \sigma^2 + n\omega^2 + nt\psi^2. \tag{7.16}$$
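The three expectations (7.7), (7.12), and (7.16) can be checked numerically. The following is a minimal sketch, not part of the original text: it simulates the nested model (7.1) with $\xi = 0$ and with arbitrary, hypothetical values of $r$, $t$, $n$, $\psi^2$, $\omega^2$, and $\sigma^2$, and averages the three mean squares over many replicates. The function names are illustrative only.

```python
import random


def nested_mean_squares(x, r, t, n):
    """Mean squares (s1^2, s2^2, s3^2) for balanced nested data x[i][j][v]."""
    cell = [[sum(x[i][j]) / n for j in range(t)] for i in range(r)]  # sample means
    sack = [sum(cell[i]) / t for i in range(r)]                      # sack means
    grand = sum(sack) / r                                            # grand mean
    ss1 = sum((x[i][j][v] - cell[i][j]) ** 2
              for i in range(r) for j in range(t) for v in range(n))
    ss2 = n * sum((cell[i][j] - sack[i]) ** 2
                  for i in range(r) for j in range(t))
    ss3 = n * t * sum((sack[i] - grand) ** 2 for i in range(r))
    return ss1 / (r * t * (n - 1)), ss2 / (r * (t - 1)), ss3 / (r - 1)


def average_mean_squares(r, t, n, psi2, omega2, sigma2, reps, seed):
    """Average the three mean squares over `reps` simulated data sets."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _rep in range(reps):
        x = []
        for _i in range(r):
            v = rng.gauss(0.0, psi2 ** 0.5)            # sack effect
            samples = []
            for _j in range(t):
                y = rng.gauss(0.0, omega2 ** 0.5)      # sample-within-sack effect
                samples.append([v + y + rng.gauss(0.0, sigma2 ** 0.5)
                                for _v in range(n)])   # analyses within sample
            x.append(samples)
        ms = nested_mean_squares(x, r, t, n)
        totals = [a + b for a, b in zip(totals, ms)]
    return [a / reps for a in totals]


# Hypothetical design and variance components, chosen only for illustration.
r, t, n = 4, 3, 2
psi2, omega2, sigma2 = 4.0, 2.0, 1.0
observed = average_mean_squares(r, t, n, psi2, omega2, sigma2, reps=3000, seed=12345)
expected = [sigma2, sigma2 + n * omega2, sigma2 + n * omega2 + n * t * psi2]
print("simulated averages:", observed)
print("theoretical values:", expected)
```

With enough replicates the simulated averages should lie close to the theoretical values; the agreement is statistical, not exact.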
Equations (7.7), (7.12), and (7.16) are entered in the last column of Table 14.10.

The data of Table 14.11 give the results of taking four samples from each of 14 batches of a slurry. The quantity of matter in suspension in each sample, $x'$, was determined in duplicate by dividing each sample into two subsamples. The model for the analysis will be (7.1). The range $w_{ij}$ between the duplicates $x'_{ij\nu}$ and $x'_{ij\nu'}$ can be used as an estimate of the standard deviation $\sigma$, since $\hat{\sigma}_{ij} = w_{ij}/d_2$; see (6.6.2). If we tabulate the ranges according as the batch total is in the intervals (500-599), (600-699), etc., we get mean ranges of 4.25, 5.50, 7.75, 6.83, 8.25, and 12.25. It appears that the standard deviation $\sigma$ is increasing approximately linearly with the mean. The model (7.1) makes the assumption that the $z_{ij\nu}$ are distributed normally with a constant variance $\sigma^2$. However, we saw in Section 3.3 that, when the variance is proportional to the square of the mean, the logarithm of the variable will have a constant variance. The analysis will therefore be performed on $x = 100[(\log x') - 1]$, this transformation producing numbers easy to handle.

Computing forms for the sums of squares in Table 14.10 are easily obtained. Defining

$$A = \sum_i^r\sum_j^t\sum_\nu^n x_{ij\nu}^2 = 88^2 + 93^2 + \cdots + 104^2 = 1{,}078{,}281, \tag{7.17}$$

$$B = \frac{1}{n}\sum_i^r\sum_j^t \left(\sum_\nu^n x_{ij\nu}\right)^2 = \frac{181^2 + \cdots + 209^2}{2} = 1{,}077{,}756.5, \tag{7.18}$$

$$C = \frac{1}{tn}\sum_i^r \left(\sum_j^t\sum_\nu^n x_{ij\nu}\right)^2 = \frac{711^2 + \cdots + 845^2}{4 \times 2} = 1{,}075{,}573.375, \tag{7.19}$$

$$D = \frac{1}{rtn}\left(\sum_i^r\sum_j^t\sum_\nu^n x_{ij\nu}\right)^2 = \frac{(10{,}931)^2}{14 \times 4 \times 2} = 1{,}066{,}846.080, \tag{7.20}$$
straightforward manipulation yields

$$nt\sum_i^r (\bar{x}_{i..} - \bar{x}_{...})^2 = C - D = 8727.295, \tag{7.21}$$

$$n\sum_i^r\sum_j^t (\bar{x}_{ij.} - \bar{x}_{i..})^2 = B - C = 2183.125, \tag{7.22}$$

$$\sum_i^r\sum_j^t\sum_\nu^n (x_{ij\nu} - \bar{x}_{ij.})^2 = A - B = 524.500, \tag{7.23}$$

$$\sum_i^r\sum_j^t\sum_\nu^n (x_{ij\nu} - \bar{x}_{...})^2 = A - D = 11{,}434.920. \tag{7.24}$$
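The computing forms (7.17) to (7.24) translate directly into a few lines of code. The sketch below is an illustration, not part of the original text: it re-enters the transformed data $x$ of Table 14.11 (variable names such as `X` and `ss` are arbitrary) and recomputes $A$, $B$, $C$, $D$ and the four sums of squares.

```python
# Transformed data x = 100(log x' - 1) from Table 14.11.
# X[batch][sample] = (first subsample, second subsample)
X = [
    [(88, 93), (84, 91), (86, 89), (88, 92)],
    [(104, 104), (108, 103), (108, 108), (105, 108)],
    [(111, 115), (116, 108), (115, 117), (111, 115)],
    [(79, 83), (70, 79), (85, 87), (82, 83)],
    [(79, 81), (68, 70), (90, 93), (94, 96)],
    [(96, 99), (94, 95), (89, 87), (94, 92)],
    [(100, 99), (95, 98), (89, 98), (88, 94)],
    [(113, 109), (103, 112), (111, 108), (98, 91)],
    [(115, 113), (96, 90), (103, 106), (92, 105)],
    [(91, 100), (93, 92), (101, 97), (93, 94)],
    [(103, 99), (101, 101), (101, 101), (104, 105)],
    [(103, 103), (100, 102), (100, 99), (101, 96)],
    [(97, 95), (93, 95), (89, 90), (95, 94)],
    [(106, 107), (102, 106), (107, 108), (105, 104)],
]
r, t, n = len(X), 4, 2  # batches, samples per batch, subsamples per sample

# Computing forms (7.17)-(7.20).
A = sum(v * v for batch in X for pair in batch for v in pair)
B = sum(sum(pair) ** 2 for batch in X for pair in batch) / n
C = sum(sum(sum(pair) for pair in batch) ** 2 for batch in X) / (t * n)
D = sum(v for batch in X for pair in batch for v in pair) ** 2 / (r * t * n)

# Sums of squares (7.21)-(7.24) and the corresponding mean squares.
ss = {
    "between batches": C - D,
    "samples within batches": B - C,
    "subsamples within samples": A - B,
    "total": A - D,
}
df = {
    "between batches": r - 1,
    "samples within batches": r * (t - 1),
    "subsamples within samples": r * t * (n - 1),
    "total": r * t * n - 1,
}
for source in ss:
    print(f"{source:28s} SS = {ss[source]:11.3f}  df = {df[source]:3d}  "
          f"MS = {ss[source] / df[source]:8.3f}")
```

Working from totals in this way avoids forming the means explicitly, which is the point of the open computing forms.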
The analysis of variance is in Table 14.12. The test of the null hypothesis $\omega^2 = 0$ is given by the variance ratio 51.979/9.366 = 5.55; $F_{0.9995}(30, 50) = 2.86$, so clearly this null hypothesis is rejected. The test of the null hypothesis $\psi^2 = 0$ is given by the variance ratio 671.330/51.979 = 12.9; $F_{0.9995}(10, 40) = 4.21$, so this null hypothesis is rejected.

Table 14.12

Source of variance | Sums of squares | Degrees of freedom | Mean squares | E[M.S.]
Between batches | 8,727.295 | 13 | 671.330 | $\sigma^2 + 2\omega^2 + 8\psi^2$
Between samples within batches | 2,183.125 | 42 | 51.979 | $\sigma^2 + 2\omega^2$
Between subsamples within samples | 524.500 | 56 | 9.366 | $\sigma^2$
Total | 11,434.920 | 111 | |

The point estimates of the components of variance are

$$\hat{\sigma}^2 = 9.366, \qquad \hat{\omega}^2 = \frac{51.979 - 9.366}{2} = 21.3, \qquad \hat{\psi}^2 = \frac{671.330 - 51.979}{4 \times 2} = 77.4. \tag{7.25}$$
It is apparent that there is considerable variation between samples within a batch. The largest component of variance, however, is between batches. For estimating the mean of a batch, the estimated variance of one subsample from one sample would be $\hat{\sigma}^2 + \hat{\omega}^2 = 9.4 + 21.3 = 30.7$. The estimated variance of the mean of two subsamples from one sample would be $\hat{\sigma}^2/2 + \hat{\omega}^2 = 9.4/2 + 21.3 = 26.0$; so there is very little gain in the precision of the estimation of a batch mean through analyzing two subsamples instead of one. It might, nevertheless, be justifiable to analyze two subsamples as a check against gross errors and mistakes.
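The arithmetic behind the component estimates and the precision comparison can be sketched as follows. This is an illustration only: the function names are hypothetical, the mean squares are those quoted in Table 14.12, and the formula for the variance of a batch mean assumes the nested model (7.1) with the batch held fixed.

```python
def variance_components(ms_batches, ms_samples, ms_subsamples, t, n):
    """Point estimates (sigma^2, omega^2, psi^2) from the three mean squares."""
    sigma2 = ms_subsamples
    omega2 = (ms_samples - ms_subsamples) / n
    psi2 = (ms_batches - ms_samples) / (t * n)
    return sigma2, omega2, psi2


def batch_mean_variance(sigma2, omega2, samples, subsamples):
    """Variance of a batch mean based on `samples` samples,
    each analyzed `subsamples` times."""
    return omega2 / samples + sigma2 / (samples * subsamples)


sigma2, omega2, psi2 = variance_components(671.330, 51.979, 9.366, t=4, n=2)
one_by_one = batch_mean_variance(sigma2, omega2, samples=1, subsamples=1)
one_by_two = batch_mean_variance(sigma2, omega2, samples=1, subsamples=2)
print(f"sigma^2 = {sigma2:.3f}, omega^2 = {omega2:.1f}, psi^2 = {psi2:.1f}")
print(f"one sample, one subsample:  {one_by_one:.1f}")
print(f"one sample, two subsamples: {one_by_two:.1f}")
```

Because $\hat{\omega}^2$ is divided only by the number of samples, duplicating the analysis on a single sample reduces only the smaller term.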
Confidence limits for the overall mean of the process, averaged over batches, can be constructed as follows. The variance of (7.13) is

$$V[\bar{x}_{...}] = V[\bar{v}_{.}] + V[\bar{y}_{..}] + V[\bar{z}_{...}] = \frac{\psi^2}{r} + \frac{\omega^2}{rt} + \frac{\sigma^2}{rtn} = \frac{\sigma^2 + n\omega^2 + nt\psi^2}{rtn}, \tag{7.26}$$

and

$$\hat{V}[\bar{x}_{...}] = \frac{s_3^2}{rtn} = \frac{671.330}{14 \times 4 \times 2} = 5.994. \tag{7.27}$$

Since

$$\frac{\bar{x}_{...} - \xi}{\sqrt{\hat{V}[\bar{x}_{...}]}} \sim t(r - 1), \tag{7.28}$$

we find 95 per cent confidence limits for $\xi$ to be (92.3, 102.9). These are on our transformed scale. Transforming back, the 95 per cent confidence limits for the process average on the original scale are approximately (83.8, 106.9).

14.8. The Two-Way Crossed Finite Population Model

The model II analysis of Section 14.4 postulated random sampling from infinite populations. Random sampling from finite populations was first considered by Tukey [1], Cornfield and Tukey [2], and Bennett and Franklin [3]. This finite population model is of interest in itself, for sometimes the assumption that a population, e.g., of machines, is infinite is too gross. This model is also of interest because, if we let the population sizes go to infinity, with the additional assumption of normality, then we get model II, and, if we decrease the population size until it equals the sample size, so that the sample contains the entire population, then we get model I. Our main motivation for considering the finite population model, however, is that it will give us a procedure for handling the mixed model, in which one category, say rows, is model I and the other, columns, is model II. Such mixed models occur frequently in practice. The arguments to follow involve only the ideas of expectation and of combinations, but they are somewhat lengthy and involved, and some readers may be content to read only through (8.6) and then proceed to Section 14.9.

For a two-way crossed classification with replication in the cells the model is

$$x_{ij\nu} = \xi + \eta_i + \zeta_j + \theta_{ij} + z_{ij\nu}, \tag{8.1}$$

similar to (1.17), with $i = 1, \ldots, r$; $j = 1, \ldots, t$; $\nu = 1, \ldots, n$. However, here the $\eta_i$ and $\zeta_j$, referring to rows and columns, respectively, are random samples from populations of sizes $R$ and $T$ and satisfy the conditions

$$\sum_i^R \eta_i = \sum_j^T \zeta_j = 0. \tag{8.2}$$
Selecting a particular $i$ and a particular $j$ determines the row and column and hence the cell that forms their intersection, and with this cell is associated the interaction constant $\theta_{ij}$. The interaction constants satisfy the conditions

$$\sum_i^R \theta_{ij} = 0 \text{ for each } j, \qquad \sum_j^T \theta_{ij} = 0 \text{ for each } i. \tag{8.3}$$
We make the following definitions:

$$\sigma_\eta^2 = \frac{1}{R - 1}\sum_i^R \eta_i^2, \tag{8.4}$$

$$\sigma_\zeta^2 = \frac{1}{T - 1}\sum_j^T \zeta_j^2, \tag{8.5}$$

$$\sigma_\theta^2 = \frac{1}{(R - 1)(T - 1)}\sum_i^R\sum_j^T \theta_{ij}^2. \tag{8.6}$$

These definitions are not consistent with (7.4.2), which would give, e.g., $\sigma_\eta^2 = (1/R)\sum_i^R \eta_i^2$, but they are more convenient in the present instance.
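The conventional partitioning of the total sum of squares is identical with that of Table 14.3. We will now evaluate the expectations of the corresponding mean squares under this different model. We will first need the expected values of the squares of certain sums of the $z_{ij\nu}$, $\theta_{ij}$, $\zeta_j$, and $\eta_i$. The $z_{ij\nu}$ have zero expectation and are independent and hence have zero covariances, and so

$$E[z_{ij\nu}^2] = \sigma^2.$$

Therefore

$$E\left[\sum_i^r\sum_j^t\sum_\nu^n z_{ij\nu}^2\right] = rtn\sigma^2. \tag{8.7}$$

Also

$$E\left[\sum_i^r\sum_j^t \left(\sum_\nu^n z_{ij\nu}\right)^2\right] = \sum_i^r\sum_j^t E\left[\left(\sum_\nu^n z_{ij\nu}\right)^2\right].$$

But the variance of the sum of $n$ independent observations each with variance $\sigma^2$ is $n\sigma^2$, and so

$$E\left[\sum_i^r\sum_j^t \left(\sum_\nu^n z_{ij\nu}\right)^2\right] = \sum_i^r\sum_j^t n\sigma^2 = rtn\sigma^2. \tag{8.8}$$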
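Similarly,

$$E\left[\sum_i^r \left(\sum_j^t\sum_\nu^n z_{ij\nu}\right)^2\right] = E\left[\sum_j^t \left(\sum_i^r\sum_\nu^n z_{ij\nu}\right)^2\right] = E\left[\left(\sum_i^r\sum_j^t\sum_\nu^n z_{ij\nu}\right)^2\right] = rtn\sigma^2. \tag{8.9}$$

We next find $E\left[\sum_i^r \eta_i^2\right]$. The total number of ways a sample of $r$ can be taken from $R$ is $\binom{R}{r}$, and these are all assumed equally likely. Thus

$$E\left[\sum_i^r \eta_i^2\right] = \binom{R}{r}^{-1}\left[(\eta_1^2 + \cdots + \eta_r^2) + \cdots + (\eta_{R-r+1}^2 + \cdots + \eta_R^2)\right]. \tag{8.10}$$

Any particular $\eta_i$ will occur in $\binom{R-1}{r-1}$ samples and hence appear in the square brackets this number of times as $\eta_i^2$. Therefore, using (8.4),

$$E\left[\sum_i^r \eta_i^2\right] = \binom{R}{r}^{-1}\binom{R-1}{r-1}\sum_i^R \eta_i^2 = r\left(1 - \frac{1}{R}\right)\sigma_\eta^2. \tag{8.11}$$

We next find $E\left[\left(\sum_i^r \eta_i\right)^2\right]$. By similar arguments,

$$E\left[\left(\sum_i^r \eta_i\right)^2\right] = \binom{R}{r}^{-1}\left[\binom{R-1}{r-1}\sum_i^R \eta_i^2 + \binom{R-2}{r-2}\sum_i^R\sum_{j \neq i}^R \eta_i\eta_j\right]. \tag{8.12}$$

Now

$$\left(\sum_i^R \eta_i\right)^2 = \sum_i^R \eta_i^2 + \sum_i^R\sum_{j \neq i}^R \eta_i\eta_j, \tag{8.13}$$

and by (8.2) the left-hand side is zero. Hence, substituting in (8.12),

$$E\left[\left(\sum_i^r \eta_i\right)^2\right] = \binom{R}{r}^{-1}\left[\binom{R-1}{r-1} - \binom{R-2}{r-2}\right]\sum_i^R \eta_i^2 = r\left(1 - \frac{r}{R}\right)\sigma_\eta^2. \tag{8.14}$$

Expressions for $\zeta$ similar to (8.11) and (8.14) are obtained by the same arguments. We now find $E\left[\left(\sum_i^r\sum_j^t \theta_{ij}\right)^2\right]$.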
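The results (8.11) and (8.14) can be confirmed by brute force for a small population, since the expectation is simply an average over all $\binom{R}{r}$ equally likely samples. The sketch below is illustrative only: the population `eta` is a hypothetical set of five effects chosen to sum to zero, as (8.2) requires.

```python
from itertools import combinations


def expectations_by_enumeration(eta, r):
    """Average sum(eta_i^2) and (sum(eta_i))^2 over all samples of size r."""
    samples = list(combinations(eta, r))   # every sample drawn without replacement
    mean_sum_of_squares = sum(sum(v * v for v in s) for s in samples) / len(samples)
    mean_square_of_sum = sum(sum(s) ** 2 for s in samples) / len(samples)
    return mean_sum_of_squares, mean_square_of_sum


eta = [-3.0, -1.0, 0.0, 1.5, 2.5]   # hypothetical population of row effects; sums to zero
R, r = len(eta), 2
sigma_eta2 = sum(v * v for v in eta) / (R - 1)   # definition (8.4)

enumerated = expectations_by_enumeration(eta, r)
formula_8_11 = r * (1 - 1 / R) * sigma_eta2
formula_8_14 = r * (1 - r / R) * sigma_eta2
print(enumerated, (formula_8_11, formula_8_14))
```

Setting `r = R` in this sketch makes the second quantity zero, which is the model I limit; letting `R` grow with `r` fixed drives both factors in parentheses toward 1, the model II limit.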
Specifying a particular row, say the $p$th, and a particular column, say the $q$th, determines a particular cell, and associated with this cell is the constant $\theta_{pq}$. The total number of ways we can choose $r$ rows out of a possible $R$, and $t$ columns out of a possible $T$, is $\binom{R}{r}\binom{T}{t}$. However, if we specify that the sample of $r$ rows is to include the $p$th row, and the sample of $t$ columns is to include the $q$th column, the number of ways in which we can select the remaining $r - 1$ rows and $t - 1$ columns is $\binom{R-1}{r-1}\binom{T-1}{t-1}$. Thus, in the summation of all possible $\left(\sum_i^r\sum_j^t \theta_{ij}\right)^2$, the coefficient of the direct square of each element is $\binom{R-1}{r-1}\binom{T-1}{t-1}$.

The cross products in $\left(\sum_i^r\sum_j^t \theta_{ij}\right)^2$ are of three types:

1. Those with both $\theta$'s in the same row but different columns, $2\theta_{pq}\theta_{pu}$, $q \neq u$.
2. Those with both $\theta$'s in the same column but different rows, $2\theta_{pq}\theta_{sq}$, $p \neq s$.
3. Those with the two $\theta$'s differing both in row and in column number, $2\theta_{pq}\theta_{su}$, $p \neq s$, $q \neq u$.

In how many of the possible $\left(\sum_i^r\sum_j^t \theta_{ij}\right)^2$ does a particular cross product of the first type, $2\theta_{pq}\theta_{pu}$, occur? A sample of the $\theta_{ij}$ will arise from a choice of $r$ rows and $t$ columns. Suppose that it contains a specified row, say the $p$th, and two specified columns, say the $q$th and the $u$th. Then the remaining $r - 1$ rows can be selected from the $R - 1$ available in $\binom{R-1}{r-1}$ ways, and the remaining $t - 2$ columns from the $T - 2$ available in $\binom{T-2}{t-2}$ ways. Thus a specified $2\theta_{pq}\theta_{pu}$ occurs $\binom{R-1}{r-1}\binom{T-2}{t-2}$ times when we consider all possible $\left(\sum_i^r\sum_j^t \theta_{ij}\right)^2$. From (8.3) we can write

$$0 = \sum_j^T \theta_{pj} = \theta_{pq} + \sum_{j \neq q}^T \theta_{pj}; \tag{8.15}$$

so

$$-\theta_{pq}^2 = \theta_{pq}\sum_{j \neq q}^T \theta_{pj} = \theta_{pq}\theta_{p1} + \cdots$$

P > 0.05) for the null hypotheses that there are (i) no interaction between batches and treatments, (ii) no differences between batches, and (iii) no differences between treatments. (b) Give estimates of the components of variance (i) between cylinders in the same batch given the same treatment, (ii) for interaction between batches and treatments, and (iii)
between batches. (c) Give 95 per cent confidence limits for the difference between treatments C2 and Ca. Batch
2
3
4
5
613 631 603
656 637 649
648 638 649
637 637 617
602 585 608
591 591 597
618 613 619
575 608 612
614 591 580
545 534 547
583 609 614
641 617 634
641 634 614
625 639 627
597 566 593
The sum of squares of all observations is 16,815,853. [Source: George Werner, "The Effect of Type of Capping Material on the Compressive Strength of Concrete Cylinders," Proceedings of the American Society for Testing and Materials, 58 (1958), 1166-86.]

14.3. For the data of exercise (7.7), (a) test the null hypothesis that there is no difference between containers, (b) test the null hypothesis that there is no difference between panels, (c) construct 95 per cent confidence limits for the difference between containers A and B (i) assuming that that was your original intention and (ii) assuming that this comparison was suggested by the data. (d) What is the estimated variance of the difference between observations on two containers by a single panel, i.e., if $x_{ij}$ is the observation on the $i$th container by the $j$th panel, what is $V[x_{ij} - x_{i'j}]$? (e) What is the estimated variance of the difference between an observation on one container by one panel and an observation on another container by another panel, i.e., what is $V[x_{ij} - x_{i'j'}]$?

14.4. Concrete beams were made with different "cement factors," namely 5, 6, and 7 sacks of cement per cubic yard. For each cement factor five replicate batches of concrete were prepared, and from each batch two replicate beams were cast. The modulus of rupture was determined on each beam. Thus in the table below, 671 and 595 are the results for the two beams from the first batch, and 648 and 618 are the results for the two beams from the second batch, etc., made with cement factor 5. The cement factor is a model I factor. Regard the five batches of concrete prepared for each level of the cement factor as a random sample from an infinite population, and the two beams cast from each batch as a random sample from an infinite population.

(a) Give bounds for the P values (e.g., 0.10 > P > 0.05) for the null hypotheses that there are no differences between (i) batches of concrete made with the same cement factor, and (ii) levels of the cement factor. (b) Give estimates of the components of variance (i) between duplicate beams from the same batch, and (ii) between batches made with the same cement factor. (c) Give 95 per cent confidence limits for the difference between cement factors 6 and 7.

Moduli of rupture of concrete beams, in psi. Five batches made at each level of cement factor; two beams cast from each batch.

Cement factor | Batch 1 | Batch 2 | Batch 3 | Batch 4 | Batch 5
5 | 671, 595 | 648, 618 | 548, 559 | 604, 640 | 519, 596
6 | 714, 618 | 684, 688 | 618, 628 | 644, 657 | 629, 624
7 | 708, 617 | 683, 696 | 633, 665 | 725, 672 | 634, 608

The sum of squares of all observations is 12,283,103. [Source: Stanton Walker and Delmar L. Bloem, "Studies of Flexural Strength of Concrete. Part 3: Effects of Variations in Testing Procedure," Proceedings of the American Society for Testing and Materials, 57 (1957), 1127-1139.]

14.5. Prove the identity (3.10).

14.6. Prove the identity (4.8).
REFERENCES

1. Tukey, John W., "Interaction in a Row-by-Column Design," Memorandum Report 18, Statistical Research Group, Princeton University.
2. Cornfield, Jerome, and John W. Tukey, "Average Values of Mean Squares in Factorials," Annals of Mathematical Statistics, 27 (1956), 907-949.
3. Bennett, Carl A., and Norman L. Franklin, Statistical Analysis in Chemistry and the Chemical Industry. New York: John Wiley and Sons, 1954.
CHAPTER 15

Three-Way and Four-Way Analysis of Variance
15.1. The Model

The methods of analysis of two-way classification of data in the previous chapter generalize to the three-way case, in which observations can be classified according to three independent criteria. Imagine a three-dimensional lattice in which the index $i$ refers to rows, which might correspond to varieties of corn, $j$ to columns, which might correspond to quantity of fertilizer, and $k$ to arrays, say date of harvesting. Suppose that in general in the population there are $R$ rows, $T$ columns, and $U$ arrays, so that there will be $RTU$ cells in the lattice. Suppose that in the sample there are $rtu$ cells formed by the intersection of $r$ rows, $t$ columns, and $u$ arrays. Let $\nu$ be the index of the observation in each cell, going to $n$ in the sample. Let $\xi_{ijk}$ be the true mean for the $ijk$th cell. Then the model is

$$x_{ijk\nu} = \xi_{ijk} + z_{ijk\nu}, \tag{1.1}$$

where the $z_{ijk\nu}$ are normally distributed with zero mean and variance $\sigma^2$. To represent averaging of the $\xi_{ijk}$ over any suffix, we will use a bar over the $\xi$ and replace the suffix averaged over by an x. The overall average of the $\xi_{ijk}$, i.e., $\bar{\xi}_{xxx}$, we will however represent simply by $\xi$. The deviation of the $i$th row mean from the overall mean we represent by $\eta_{i..}$:

$$\eta_{i..} = \bar{\xi}_{ixx} - \xi. \tag{1.2}$$

Similarly the column and array effects are

$$\eta_{.j.} = \bar{\xi}_{xjx} - \xi, \tag{1.3}$$

$$\eta_{..k} = \bar{\xi}_{xxk} - \xi. \tag{1.4}$$
The deviations of the cell means in the row and column table, formed by averaging over arrays, from the values expected on the assumption that they would be the grand mean $\xi$ plus the row effect plus the column effect are denoted by $\zeta_{ij.}$:

$$\zeta_{ij.} = \bar{\xi}_{ijx} - (\xi + \eta_{i..} + \eta_{.j.}). \tag{1.5}$$

The $\zeta_{ij.}$ are the constants for the row × column interaction. The row × array interaction will be represented by $\zeta_{i.k}$ and the column × array interaction by $\zeta_{.jk}$. The deviations of the cell means $\xi_{ijk}$ from the values expected on the assumption that they would be the grand mean plus the row, column, and array effects plus the three two-way interactions are represented by $\theta_{ijk}$:

$$\theta_{ijk} = \xi_{ijk} - (\xi + \eta_{i..} + \eta_{.j.} + \eta_{..k} + \zeta_{ij.} + \zeta_{i.k} + \zeta_{.jk}). \tag{1.6}$$

The $\theta_{ijk}$ are the constants for the three-way interaction among rows, columns, and arrays. We can now rewrite the model (1.1) in the form

$$x_{ijk\nu} = \xi + \eta_{i..} + \eta_{.j.} + \eta_{..k} + \zeta_{ij.} + \zeta_{i.k} + \zeta_{.jk} + \theta_{ijk} + z_{ijk\nu}. \tag{1.7}$$

To represent averaging any of the constants we will use the same convention as for the $\xi_{ijk}$. Thus $\bar{\zeta}_{.jx}$ is the average over arrays for the $j$th column of the column × array interaction constants.

We will consider the finite population model, which will give the other models as special cases. The algebra of the partitioning of the sums of squares and the arithmetic of the calculation of the sums of squares are identical for all models. In the finite population model we suppose that the $\eta_{i..}$ are a sample of size $r$ from a population of size $R$, and the $\eta_{.j.}$ and the $\eta_{..k}$ are samples of $t$ and $u$ from populations of size $T$ and $U$, respectively. In the population, the various parameters sum to zero over each index:

$$\sum_i^R \eta_{i..} = \sum_j^T \eta_{.j.} = \sum_k^U \eta_{..k} = 0,$$

$$\sum_i^R \zeta_{ij.} = \sum_j^T \zeta_{ij.} = \sum_i^R \zeta_{i.k} = \sum_k^U \zeta_{i.k} = \sum_j^T \zeta_{.jk} = \sum_k^U \zeta_{.jk} = 0, \tag{1.8}$$

$$\sum_i^R \theta_{ijk} = \sum_j^T \theta_{ijk} = \sum_k^U \theta_{ijk} = 0.$$
We make the definitions

$$\sigma_A^2 = \frac{1}{R - 1}\sum_i^R \eta_{i..}^2, \text{ etc.,}$$

$$\sigma_{AB}^2 = \frac{1}{(R - 1)(T - 1)}\sum_i^R\sum_j^T \zeta_{ij.}^2, \text{ etc.,} \tag{1.9}$$

$$\sigma_{ABC}^2 = \frac{1}{(R - 1)(T - 1)(U - 1)}\sum_i^R\sum_j^T\sum_k^U \theta_{ijk}^2.$$
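We can write the deviation of $x_{ijk\nu}$ from the grand mean $\bar{x}_{....}$ as an identity in which the successive terms are sample estimates of the parameters:

$$\begin{aligned}
x_{ijk\nu} - \bar{x}_{....} ={}& (\bar{x}_{i...} - \bar{x}_{....}) + (\bar{x}_{.j..} - \bar{x}_{....}) + (\bar{x}_{..k.} - \bar{x}_{....}) \\
&+ (\bar{x}_{ij..} - \bar{x}_{i...} - \bar{x}_{.j..} + \bar{x}_{....}) + (\bar{x}_{i.k.} - \bar{x}_{i...} - \bar{x}_{..k.} + \bar{x}_{....}) \\
&+ (\bar{x}_{.jk.} - \bar{x}_{.j..} - \bar{x}_{..k.} + \bar{x}_{....}) \\
&+ (\bar{x}_{ijk.} - \bar{x}_{ij..} - \bar{x}_{i.k.} - \bar{x}_{.jk.} + \bar{x}_{i...} + \bar{x}_{.j..} + \bar{x}_{..k.} - \bar{x}_{....}) \\
&+ (x_{ijk\nu} - \bar{x}_{ijk.}).
\end{aligned} \tag{1.10}$$

If we write $\bar{x}_{ij..} - \bar{x}_{i...} - \bar{x}_{.j..} + \bar{x}_{....}$ in the form

$$\bar{x}_{ij..} - [\bar{x}_{....} + (\bar{x}_{i...} - \bar{x}_{....}) + (\bar{x}_{.j..} - \bar{x}_{....})], \tag{1.11}$$

it is more obviously a sample estimate of $\zeta_{ij.}$; see (1.5). Similarly, the penultimate term in (1.10) is more obviously a sample estimate of $\theta_{ijk}$, see (1.6), if written as

$$\begin{aligned}
\bar{x}_{ijk.} - [&\bar{x}_{....} + (\bar{x}_{i...} - \bar{x}_{....}) + (\bar{x}_{.j..} - \bar{x}_{....}) + (\bar{x}_{..k.} - \bar{x}_{....}) \\
&+ (\bar{x}_{ij..} - \bar{x}_{i...} - \bar{x}_{.j..} + \bar{x}_{....}) + (\bar{x}_{i.k.} - \bar{x}_{i...} - \bar{x}_{..k.} + \bar{x}_{....}) \\
&+ (\bar{x}_{.jk.} - \bar{x}_{.j..} - \bar{x}_{..k.} + \bar{x}_{....})].
\end{aligned} \tag{1.12}$$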
Squaring and summing (1.10) over all indices gives an equation which is entered in column 2 of Table 15.1: the sums of all the cross products are zero. The last column gives the expectations of the mean squares, which can be derived by an extension of the methods of Section 14.8 [1]. The closed forms of sums of squares in Table 15.1 are inconvenient for calculation, and open forms involving totals instead of means are more satisfactory. For example, the sum of squares for the main effect for A is

$$tun\sum_i^r (\bar{x}_{i...} - \bar{x}_{....})^2 = \frac{1}{tun}\sum_i^r \left(\sum_j^t\sum_k^u\sum_\nu^n x_{ijk\nu}\right)^2 - \frac{1}{rtun}\left(\sum_i^r\sum_j^t\sum_k^u\sum_\nu^n x_{ijk\nu}\right)^2. \tag{1.13}$$

For the AB interaction

$$\begin{aligned}
nu\sum_i^r\sum_j^t (\bar{x}_{ij..} - \bar{x}_{i...} - \bar{x}_{.j..} + \bar{x}_{....})^2 ={}& \frac{1}{un}\sum_i^r\sum_j^t \left(\sum_k^u\sum_\nu^n x_{ijk\nu}\right)^2 - \frac{1}{rtun}\left(\sum_i^r\sum_j^t\sum_k^u\sum_\nu^n x_{ijk\nu}\right)^2 \\
&- ntu\sum_i^r (\bar{x}_{i...} - \bar{x}_{....})^2 - nru\sum_j^t (\bar{x}_{.j..} - \bar{x}_{....})^2,
\end{aligned} \tag{1.14}$$
Within cells
ABC
BC
AC
AB
C
B
A
Source of variance
I
I
..
I
"
I
,
..
11
. .
k
i
I
k
v
~ ~ ~ ~ (xijkv 
;
i
~ ~ ~ ~ (xilkV 
,
Xii •• 
x ••• Y
Xiik.)·
:i ••. .).
rtun 
1
rtu(n· 1)
(r  1)(t  1){u 
I)(u  I)
(t 
+ X ••• .)2
x.lk·
I)(u  I)
(r 
(r  I)(t 
1)
E[M. S.]
j)a"u
a'
a2 + nahc
a2 + n( 1  j)ah c + nraic
2
1 f)( 1 :i)akc + nt(1 j)a~c + nr(1 f)a i c + nrta~ a + n(1 ~)ahc + nuak a' + n(1  f)ah c + nta~c
+ nr(1  o)aic + nruai
n(1  j)(1  ~)akc + nU(1 
+ nt(1  ~)a~c + ntual
n(1  f)(1  o)a"uc + nU(1  f)ak
a2 + n(
a2 +
1
t
uI
a2 +
1
r
I)
Degrees of freedom
+ X ••• Y
+ X ••• .)2
Xi.k. 
X •• k.
X •• k.
X.I ••
+ Xi ••• + x.I •• + X ••k. 
n ~i ~; ~k (xilk. 
,
x.f •• 
..
nr ~I ~k (x.lk. 
I
Xi ••• 
Xi .•• 
X .•. .)2
X ••• .)·
X ••• .)·
nt ~i ~k (Xi.k. 
.
,
I
nu ~i ~ (Xii •• 
r
k
Ilrt ~ (X •• k. 
.
I
nru ~ (:i. I •• 
I
i
ntu ~ (Xi ••• 
,
Sums of squares
Table 15.1
>!
VI
.I
0
t"'
0 t::I m
a::
!f\
:I:
Y'
~

'"!f\()
508
THREEWAY AND FOURWAY ANALYSIS OF VARIANCE
CHAP.
15
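where the last two terms are the sums of squares for the A and B main effects. The total sum of squares is

$$\sum_i^r\sum_j^t\sum_k^u\sum_\nu^n x_{ijk\nu}^2 - \frac{1}{rtun}\left(\sum_i^r\sum_j^t\sum_k^u\sum_\nu^n x_{ijk\nu}\right)^2, \tag{1.15}$$

and the within-cells sum of squares is

$$\sum_i^r\sum_j^t\sum_k^u\sum_\nu^n x_{ijk\nu}^2 - \frac{1}{n}\sum_i^r\sum_j^t\sum_k^u \left(\sum_\nu^n x_{ijk\nu}\right)^2. \tag{1.16}$$

The sum of squares for the ABC interaction is obtained by difference.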
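The equivalence of the closed forms (in terms of means) and the open forms (in terms of totals) in (1.13) and (1.14) can be illustrated numerically. The following sketch is not from the original text: the data are generated by an arbitrary formula purely for illustration, and the variable names are hypothetical.

```python
from itertools import product

r, t, u, n = 3, 2, 2, 2
# Hypothetical observations x[i, j, k, v]; the generating formula has no meaning.
x = {(i, j, k, v): (i + 1) * (j + 2) + 3 * k + (i + 2 * j + k + v) % 3
     for i, j, k, v in product(range(r), range(t), range(u), range(n))}

N = r * t * u * n
G = sum(x.values())          # grand total
grand_mean = G / N

# Row totals, column totals, and row-by-column cell totals.
row_total = [sum(val for (i, j, k, v), val in x.items() if i == a) for a in range(r)]
col_total = [sum(val for (i, j, k, v), val in x.items() if j == b) for b in range(t)]
cell_total = {(a, b): sum(val for (i, j, k, v), val in x.items() if i == a and j == b)
              for a in range(r) for b in range(t)}

# Closed forms, written in terms of means as in Table 15.1.
ss_a_closed = t * u * n * sum((row_total[a] / (t * u * n) - grand_mean) ** 2
                              for a in range(r))
ss_b_closed = r * u * n * sum((col_total[b] / (r * u * n) - grand_mean) ** 2
                              for b in range(t))
ss_ab_closed = u * n * sum(
    (cell_total[a, b] / (u * n) - row_total[a] / (t * u * n)
     - col_total[b] / (r * u * n) + grand_mean) ** 2
    for a in range(r) for b in range(t))

# Open forms (1.13) and (1.14), written in terms of totals.
correction = G * G / N
ss_a_open = sum(T * T for T in row_total) / (t * u * n) - correction
ss_ab_open = (sum(T * T for T in cell_total.values()) / (u * n)
              - correction - ss_a_closed - ss_b_closed)

print(ss_a_closed, ss_a_open)
print(ss_ab_closed, ss_ab_open)
```

The two versions of each sum of squares agree up to floating-point rounding; the open forms need only the totals and one correction term.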
15.2. Models I and II

We will get the expected mean squares for a model I three-way analysis of variance by putting $r = R$, $t = T$, and $u = U$ in the coefficients of the components of variance in the last column in Table 15.1. All the factors in parentheses will be zero, giving the second column of Table 15.2, in which all the components of variance are really sums of squares of constants, e.g., $\sigma_A^2 = \sum_i^R \eta_{i..}^2/(R - 1)$, and are not true variances. All the effects are clearly to be tested against the within-cell mean square. If there is only one observation per cell, the analysis will be unsatisfactory unless $\sigma_{ABC}^2$ is small compared with $\sigma^2$.

Table 15.2

Source of variance | E[M.S.], model I | E[M.S.], model II
A | $\sigma^2 + ntu\sigma_A^2$ | $\sigma^2 + n\sigma_{abc}^2 + nu\sigma_{ab}^2 + nt\sigma_{ac}^2 + ntu\sigma_a^2$
B | $\sigma^2 + nru\sigma_B^2$ | $\sigma^2 + n\sigma_{abc}^2 + nu\sigma_{ab}^2 + nr\sigma_{bc}^2 + nru\sigma_b^2$
C | $\sigma^2 + nrt\sigma_C^2$ | $\sigma^2 + n\sigma_{abc}^2 + nt\sigma_{ac}^2 + nr\sigma_{bc}^2 + nrt\sigma_c^2$
AB | $\sigma^2 + nu\sigma_{AB}^2$ | $\sigma^2 + n\sigma_{abc}^2 + nu\sigma_{ab}^2$
AC | $\sigma^2 + nt\sigma_{AC}^2$ | $\sigma^2 + n\sigma_{abc}^2 + nt\sigma_{ac}^2$
BC | $\sigma^2 + nr\sigma_{BC}^2$ | $\sigma^2 + n\sigma_{abc}^2 + nr\sigma_{bc}^2$
ABC | $\sigma^2 + n\sigma_{ABC}^2$ | $\sigma^2 + n\sigma_{abc}^2$
Within cells | $\sigma^2$ | $\sigma^2$

We will get the expected mean squares for a model II analysis by putting $R = T = U = \infty$ in the coefficients of the components of variance in the last column of Table 15.1. All the factors in parentheses will be equal to 1, giving the last column of Table 15.2. Here the components of variance are true variances, and we will adopt the convention of using lower-case letters in the subscript to denote this fact. The usual testing procedure is to test the three-way interaction against the within-cells mean square. The next step depends on our point of view. A strict "non-pooler" proceeds to test the two-way interactions against the three-way interaction, irrespective of the outcome of this first test. Then he would test the main effects against appropriate linear combinations. For example, to test the A main effect he would use the mean squares for AB + AC − ABC, since the expected value for this combination is

$$\sigma^2 + n\sigma_{abc}^2 + nu\sigma_{ab}^2 + nt\sigma_{ac}^2. \tag{2.1}$$

This linear combination of three mean squares would have its approximate degrees of freedom estimated by (9.8.10).

Alternatively, some people would indulge in some judicious pooling. If at the first test the mean square for ABC was neither statistically significant at the chosen level of significance nor had a variance ratio exceeding 2, the "sometimes-pooler" would pool the sums of squares and degrees of freedom of the ABC and within-cells terms, and use this as an estimate of $\sigma^2$. The assumption is being made that $\sigma_{abc}^2 = 0$, and so $\sigma_{abc}^2$ is stricken out of all the expected mean squares. Similarly, if one of the interactions, say AB, was statistically significant at the chosen level of significance, or if the variance ratio exceeded 2, the sometimes-pooler would leave this mean square untouched, but otherwise he would assume that $\sigma_{ab}^2$ was zero, pool its sum of squares and degrees of freedom with those for ABC + within cells to get a better estimate of $\sigma^2$, and also strike $\sigma_{ab}^2$ out of the expectations of the mean squares wherever it occurred. With this procedure, if none of the two-way interactions was significant nor had variance ratios exceeding 2, they would all end up by being pooled with the within-cells mean square, and the three main effects would be tested against this pooled error term.

The "never-pooler" can be confident that his errors of the first kind have the stated probability. The sometimes-pooler may be somewhat uncomfortable about this, but he will claim that his errors of the second kind have smaller probability than those of the never-pooler. If the sample sizes are large, so that degrees of freedom are plentiful, the motivation, or temptation, to be a sometimes-pooler is less. The rule about the factor 2 comes from Paull [2].

If, in a model II analysis, there is only one observation per cell, of course there is no within-cell mean square, but putting $n = 1$ in the expectations of mean squares in Table 15.2 does not affect the testing procedure.
A further complication in the testing of complicated analyses of variance is the multiplicity of tests being performed. For example, in a five-factor analysis there will be 5 main effects, 10 two-way interactions, and 10 three-way interactions. Thus, ignoring the four-way and the five-way interactions, we may be testing 5 + 10 + 10 = 25 effects. Suppose that $x_1, \ldots, x_n$ are identically distributed independent observations. Then one of the sample of $n$ will be the largest: call this $x_{\max}$. The condition that the largest is less than some value, say $x$, is the same as the condition that they are all less than $x$, i.e.,

$$\Pr\{x_1 < x, x_2 < x, \ldots, x_n < x\} = \Pr\{x_{\max} < x\}. \tag{2.2}$$

Also, since the observations are independent,

$$\Pr\{x_1 < x, x_2 < x, \ldots, x_n < x\} = \Pr\{x_1 < x\}\Pr\{x_2 < x\} \cdots \Pr\{x_n < x\} = (\Pr\{x_i < x\})^n; \tag{2.3}$$

so

$$\Pr\{x_{\max} < x\} = (\Pr\{x_i < x\})^n. \tag{2.4}$$

Now there will be a $P$ point of the cumulative distribution of $x_{\max}$ such that

$$\Pr\{x_{\max} < x_{\max,P}\} = P. \tag{2.5}$$

Also, substituting $x_{\max,P}$ for $x$ in (2.4),

$$\Pr\{x_{\max} < x_{\max,P}\} = (\Pr\{x_i < x_{\max,P}\})^n. \tag{2.6}$$

Comparing (2.5) and (2.6), we see that

$$(\Pr\{x_i < x_{\max,P}\})^n = P, \tag{2.7}$$

or

$$\Pr\{x_i < x_{\max,P}\} = P^{1/n}. \tag{2.8}$$

In other words, the $P^n$ point of the distribution of $x_{\max}$ is equal to the $P$ point of the distribution of $x$. Thus, if we are testing 10 independent F ratios, all with the same degrees of freedom, so that they have the same distribution under the various null hypotheses, the 0.99 point of the distribution for the largest of the 10 is actually the $\sqrt[10]{0.99} \simeq 0.999$ point of the ordinary F distribution.

Of course, in the usual analysis-of-variance situation the F ratios are not independent since they use a common denominator mean square. Finney [3] showed that for moderately large, say greater than 10 or preferably 20, denominator degrees of freedom the F ratios could be assumed to be independent without serious error. For the special case where the numerator degrees of freedom are 1, Nair [4] tabulated the 0.95 and 0.99 points of the largest variance ratio for denominator degrees of freedom starting at 10, reproduced as Table X of the Appendix. Unfortunately his table only goes to the largest of 10 variance ratios. Daniel [5] and Birnbaum [6] have developed a scheme for the significance testing of a large number of mean squares with single degrees of freedom which has promise.

However, a completely satisfactory procedure must give weight to the relevant a priori probabilities. It is found by experience that main effects are more frequently significant than two-way interactions, and two-way interactions are more frequently significant than three-way interactions, and so on. Thus, if we find two effects, one a main effect and the other a four-way interaction, both significant at the 0.025 level, we would have little hesitation in accepting the former as real and dismissing the latter as an instance of random fluctuation. Also, the pattern of significance conveys relative information. If we find the main effects A, B, and C and their interactions AB, AC, and BC significant, we would not be surprised to find ABC significant, whereas, if ABC was significant without any of the other effects mentioned being significant, we would be tempted to regard this as an accident of random fluctuation. It seems clear that an efficient analysis of a multifactor experiment is at present somewhat subjective. One could certainly lay down certain rules, but they would lead to a higher proportion of errors, of the first and second kinds, than an intelligent and experienced practitioner would commit.

15.3. Mixed Models

In three-way analysis of variance there are two mixed models, one in which one factor is random and two factors are fixed, and vice versa. In the first case, if a is random and B and C are fixed, we put $R = \infty$, $t = T$, and $u = U$ in the expectations of mean squares in Table 15.1, and obtain the left-hand side of Table 15.3. On a non-pooling basis, the tests of significance are clear.
For example, aB and aC are tested against within cells whereas BC is tested against aBC, and a is tested against within cells whereas Band C are tested against the aB and aC interactions. If there is only one observation per cell, then satisfactory tests for a, aB, and aC exist only if (J'~BC is small, but the tests for B, C, and BC remain valid. When a and b are random and C fixed we put R = T = 00, U = U in Table 15.1, and this gives the righthand side of Table 15.3. The tests are straightforward except that a linear combination of mean squares aC + bC  abC is necessary to provide a satisfactory error term for C. If there is only one observation per cell, so that there is no withincell mean square, the tests are unchanged except for ab, which requires (J'~bC to be small for a satisfactory test. 
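Table 15.3

a random, B and C fixed

Effect          E[M.S.]
a               $\sigma^2 + ntu\sigma^2_a$
B               $\sigma^2 + nu\sigma^2_{aB} + nru\sigma^2_B$
C               $\sigma^2 + nt\sigma^2_{aC} + nrt\sigma^2_C$
aB              $\sigma^2 + nu\sigma^2_{aB}$
aC              $\sigma^2 + nt\sigma^2_{aC}$
BC              $\sigma^2 + n\sigma^2_{aBC} + nr\sigma^2_{BC}$
aBC             $\sigma^2 + n\sigma^2_{aBC}$
Within cells    $\sigma^2$

a and b random, C fixed

Effect          E[M.S.]
a               $\sigma^2 + nu\sigma^2_{ab} + ntu\sigma^2_a$
b               $\sigma^2 + nu\sigma^2_{ab} + nru\sigma^2_b$
C               $\sigma^2 + n\sigma^2_{abC} + nt\sigma^2_{aC} + nr\sigma^2_{bC} + nrt\sigma^2_C$
ab              $\sigma^2 + nu\sigma^2_{ab}$
aC              $\sigma^2 + n\sigma^2_{abC} + nt\sigma^2_{aC}$
bC              $\sigma^2 + n\sigma^2_{abC} + nr\sigma^2_{bC}$
abC             $\sigma^2 + n\sigma^2_{abC}$
Within cells    $\sigma^2$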
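15.4. Confidence Limits in Three-Way Analysis

We will discuss the situation where there is only one observation per cell and it is assumed that the three-way interaction is zero; i.e., the model (1.7) with $n = 1$ and $\theta_{ijk} = 0$ becomes
$$x_{ijk} = \xi + \eta_{i\cdot\cdot} + \eta_{\cdot j\cdot} + \eta_{\cdot\cdot k} + \zeta_{ij\cdot} + \zeta_{i\cdot k} + \zeta_{\cdot jk} + z_{ijk}. \tag{4.1}$$
We will discuss the construction of confidence limits for the difference between two row means. Using (1.2),
$$\eta_{i\cdot\cdot} - \eta_{i'\cdot\cdot} = (\xi_{i\cdot\cdot} - \xi) - (\xi_{i'\cdot\cdot} - \xi) = \xi_{i\cdot\cdot} - \xi_{i'\cdot\cdot}. \tag{4.2}$$
Thus confidence limits for $\eta_{i\cdot\cdot} - \eta_{i'\cdot\cdot}$ are identical with confidence limits for $\xi_{i\cdot\cdot} - \xi_{i'\cdot\cdot}$. In the model I case, if we want confidence limits for the difference between two row means, then we will be making the assumption that the interactions involving rows, namely, the $\zeta_{ij\cdot}$ and $\zeta_{i\cdot k}$, are zero; so the model becomes
$$x_{ijk} = \xi + \eta_{i\cdot\cdot} + \eta_{\cdot j\cdot} + \eta_{\cdot\cdot k} + \zeta_{\cdot jk} + z_{ijk}. \tag{4.3}$$
Averaging over j and k, we get
$$\bar{x}_{i\cdot\cdot} = \xi + \eta_{i\cdot\cdot} + \frac{1}{t}\sum_j^t \eta_{\cdot j\cdot} + \frac{1}{u}\sum_k^u \eta_{\cdot\cdot k} + \frac{1}{tu}\sum_j^t\sum_k^u \zeta_{\cdot jk} + \bar{z}_{i\cdot\cdot} = \xi + \eta_{i\cdot\cdot} + \bar{z}_{i\cdot\cdot}, \tag{4.4}$$
since
$$\frac{1}{t}\sum_j^t \eta_{\cdot j\cdot} = \frac{1}{T}\sum_j^T \eta_{\cdot j\cdot} = 0, \tag{4.5}$$
since $t = T$, and similarly the averages of the $\eta_{\cdot\cdot k}$ and of the $\zeta_{\cdot jk}$ are zero. Thus
$$\bar{x}_{i\cdot\cdot} - \bar{x}_{i'\cdot\cdot} = (\eta_{i\cdot\cdot} - \eta_{i'\cdot\cdot}) + (\bar{z}_{i\cdot\cdot} - \bar{z}_{i'\cdot\cdot}), \tag{4.6}$$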
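with expectation and variance
$$E[\bar{x}_{i\cdot\cdot} - \bar{x}_{i'\cdot\cdot}] = \eta_{i\cdot\cdot} - \eta_{i'\cdot\cdot}, \tag{4.7}$$
$$V[\bar{x}_{i\cdot\cdot} - \bar{x}_{i'\cdot\cdot}] = V[\bar{z}_{i\cdot\cdot}] + V[\bar{z}_{i'\cdot\cdot}] = \frac{2\sigma^2}{tu}. \tag{4.8}$$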
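Confidence limits for $\eta_{i\cdot\cdot} - \eta_{i'\cdot\cdot}$ can be obtained by inserting in place of $\sigma^2$ the mean square for ABC, which is assumed to estimate $\sigma^2$.

In the mixed model of the type (a, B, C), if we are going to construct confidence limits for the difference between two column means, $\eta_{\cdot j\cdot} - \eta_{\cdot j'\cdot}$, we will be making the assumption that the BC interaction $\zeta_{\cdot jk} = 0$, and so the model (4.1) becomes
$$x_{ijk} = \xi + \eta_{i\cdot\cdot} + \eta_{\cdot j\cdot} + \eta_{\cdot\cdot k} + \zeta_{ij\cdot} + \zeta_{i\cdot k} + z_{ijk}. \tag{4.9}$$
Averaging over i and k, we get
$$\bar{x}_{\cdot j\cdot} = \xi + \frac{1}{r}\sum_i^r \eta_{i\cdot\cdot} + \eta_{\cdot j\cdot} + \frac{1}{u}\sum_k^u \eta_{\cdot\cdot k} + \bar{\zeta}_{\cdot j\cdot} + \frac{1}{ru}\sum_i^r\sum_k^u \zeta_{i\cdot k} + \bar{z}_{\cdot j\cdot}, \tag{4.10}$$
where $\bar{\zeta}_{\cdot j\cdot} = \frac{1}{r}\sum_i^r \zeta_{ij\cdot}$, whence
$$\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot j'\cdot} = (\eta_{\cdot j\cdot} - \eta_{\cdot j'\cdot}) + (\bar{\zeta}_{\cdot j\cdot} - \bar{\zeta}_{\cdot j'\cdot}) + (\bar{z}_{\cdot j\cdot} - \bar{z}_{\cdot j'\cdot}). \tag{4.11}$$
The $\zeta_{ij\cdot}$ is a mixed interaction subject to the conditions
$$\sum_i^R \zeta_{ij\cdot} = \sum_j^T \zeta_{ij\cdot} = 0,$$
but, since $R = \infty$ and $T = t$, these conditions become
$$\sum_i^\infty \zeta_{ij\cdot} = \sum_j^t \zeta_{ij\cdot} = 0.$$
Thus $\zeta_{ij\cdot}$ is distributed normally with zero mean and variance $\sigma^2_{aB}$ for each level of j, and hence $\bar{\zeta}_{\cdot j\cdot}$ is distributed normally with zero mean and variance $\sigma^2_{aB}/r$ for each level of j. Hence
$$E[\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot j'\cdot}] = \eta_{\cdot j\cdot} - \eta_{\cdot j'\cdot}, \tag{4.12}$$
$$V[\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot j'\cdot}] = \frac{2\sigma^2_{aB}}{r} + \frac{2\sigma^2}{ru} = \frac{2}{ru}(\sigma^2 + u\sigma^2_{aB}), \tag{4.13}$$
$$\hat{V}[\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot j'\cdot}] = \frac{2}{ru}(\text{M.S. for } aB), \tag{4.14}$$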
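and confidence limits follow in the usual way.

In the mixed model of the type (a, b, C), we have
$$\bar{x}_{\cdot\cdot k} = \xi + \frac{1}{r}\sum_i^r \eta_{i\cdot\cdot} + \frac{1}{t}\sum_j^t \eta_{\cdot j\cdot} + \eta_{\cdot\cdot k} + \frac{1}{rt}\sum_i^r\sum_j^t \zeta_{ij\cdot} + \bar{\zeta}_{a\cdot k} + \bar{\zeta}_{\cdot bk} + \bar{z}_{\cdot\cdot k}, \tag{4.15}$$
where $\bar{\zeta}_{a\cdot k} = \frac{1}{r}\sum_i^r \zeta_{i\cdot k}$ and $\bar{\zeta}_{\cdot bk} = \frac{1}{t}\sum_j^t \zeta_{\cdot jk}$, so that
$$\bar{x}_{\cdot\cdot k} - \bar{x}_{\cdot\cdot k'} = (\eta_{\cdot\cdot k} - \eta_{\cdot\cdot k'}) + (\bar{\zeta}_{a\cdot k} - \bar{\zeta}_{a\cdot k'}) + (\bar{\zeta}_{\cdot bk} - \bar{\zeta}_{\cdot bk'}) + (\bar{z}_{\cdot\cdot k} - \bar{z}_{\cdot\cdot k'}). \tag{4.16}$$
The $\zeta_{i\cdot k}$ and $\zeta_{\cdot jk}$ are mixed interactions distributed normally with zero means and variances $\sigma^2_{aC}$, $\sigma^2_{bC}$ for each level of k. Hence
$$E[\bar{x}_{\cdot\cdot k} - \bar{x}_{\cdot\cdot k'}] = \eta_{\cdot\cdot k} - \eta_{\cdot\cdot k'}, \tag{4.17}$$
$$V[\bar{x}_{\cdot\cdot k} - \bar{x}_{\cdot\cdot k'}] = \frac{2\sigma^2_{aC}}{r} + \frac{2\sigma^2_{bC}}{t} + \frac{2\sigma^2}{rt} = \frac{2}{rt}(\sigma^2 + t\sigma^2_{aC} + r\sigma^2_{bC}), \tag{4.18}$$
$$\hat{V}[\bar{x}_{\cdot\cdot k} - \bar{x}_{\cdot\cdot k'}] = \frac{2}{rt}[\text{M.S. for } aC + bC - abC], \tag{4.19}$$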
and the linear combination of mean squares will have its approximate degrees of freedom given by (9.8.10). It will be noticed from the foregoing results that it is unnecessary in practice to go through these calculations to obtain the estimated variance of the difference between two means, since the answer always comes out to be twice the appropriate error mean square for testing the corresponding effect divided by the number of original observations in each mean being compared.

15.5. An Example of Three-Way Analysis of Variance

The data of Table 15.4 (taken from [7]) were obtained in a research program aimed at developing procedures for the bacteriological testing of milk. Milk samples were tested in an apparatus which involved two major components, a bottle and a tube.

Table 15.4

                     Bottle (k): I           Bottle (k): II
                     Tube (j)                Tube (j)
Milk sample (i)      A     B     C           A     B     C
 1                   1     1     1           1     3     2
 2                   3     4     2           2     1     3
 3                   3     2     4           3     3     6
 4                   2     4     1           1     0     0
 5                   2     1     3           2     4     6
 6                   1     1     2           0     2     1
 7                   5     5     5           3     5     5
 8                   1     1     1           0     2     0
 9                   0     1     2           2     2     2
10                   3     4     5           1     1     3
11                   0     0     4           0     2     1
12                   0     1     2           0     3     1

The result of a single test is simply
recorded as growth or failure to grow. All six combinations of two types of bottle and three types of tube were tested on each sample, and 10 tests were run with each sample × bottle × tube combination. Table 15.4 gives the number of positive tubes in each set of 10. As discussed in Section 3.3, this variable should be binomially distributed, and, to obtain a variable with a stable variance, we should use the inverse sine transformation. However, our main purpose in presenting this example is as an illustration of the calculations for a three-way analysis of variance, and this will be
achieved better by using the simple integers of Table 15.4. It is, however, instructive also to carry out the analysis of variance on the transformed variable and see how little the conclusions are affected.

Let i refer to samples, j to tubes, and k to bottles. Regarding Table 15.4 as a three-way classification with one observation per cell, the operations of summing over cells and calculating a within-cells sum of squares do not arise. The first step in the analysis is to form sums over every index and every combination of indices; e.g., we sum over samples (i) to obtain a tube (j) × bottle (k) table containing $\sum_i^r x_{ijk}$ (Table 15.5). This table is then summed over bottles (k) to give the tube totals $\sum_i^r \sum_k^u x_{ijk}$ and over tubes (j) to give the bottle totals $\sum_i^r \sum_j^t x_{ijk}$. The sum of the tube totals, equal to the sum of the bottle totals, is the grand total $\sum_i^r \sum_j^t \sum_k^u x_{ijk}$. Tables 15.6 and 15.7 are obtained similarly.

Table 15.5. Sums over Samples (i), $\sum_i^r x_{ijk}$

                                   Tube (j)
Bottle (k)                         A      B      C      $\sum_i^r \sum_j^t x_{ijk}$
I                                  21     25     32     78
II                                 15     28     30     73
$\sum_i^r \sum_k^u x_{ijk}$        36     53     62     151

Table 15.6. Sums over Tubes (j), $\sum_j^t x_{ijk}$

                                   Bottle (k)
Sample (i)                         I      II     $\sum_j^t \sum_k^u x_{ijk}$
 1                                  3      6      9
 2                                  9      6     15
 3                                  9     12     21
 4                                  7      1      8
 5                                  6     12     18
 6                                  4      3      7
 7                                 15     13     28
 8                                  3      2      5
 9                                  3      6      9
10                                 12      5     17
11                                  4      3      7
12                                  3      4      7
$\sum_i^r \sum_j^t x_{ijk}$        78     73    151

Table 15.7. Sums over Bottles (k), $\sum_k^u x_{ijk}$

                                   Tube (j)
Sample (i)                         A      B      C      $\sum_j^t \sum_k^u x_{ijk}$
 1                                  2      4      3      9
 2                                  5      5      5     15
 3                                  6      5     10     21
 4                                  3      4      1      8
 5                                  4      5      9     18
 6                                  1      3      3      7
 7                                  8     10     10     28
 8                                  1      3      1      5
 9                                  2      3      4      9
10                                  4      5      8     17
11                                  0      2      5      7
12                                  0      4      3      7
$\sum_i^r \sum_k^u x_{ijk}$        36     53     62    151
With these preliminary summations the subsequent calculations are straightforward. The sum of squares for samples is given by (1.13) with the modification that $n = 1$ and summation over $\nu$ is not involved:
$$\text{S.S. for samples} = \frac{9^2 + 15^2 + \cdots + 7^2}{2 \times 3} - \frac{(151)^2}{2 \times 3 \times 12} = 93.486,$$
$$\text{S.S. for tubes} = \frac{36^2 + 53^2 + 62^2}{12 \times 2} - \frac{(151)^2}{2 \times 3 \times 12} = 14.527,$$
$$\text{S.S. for bottles} = \frac{78^2 + 73^2}{12 \times 3} - \frac{(151)^2}{2 \times 3 \times 12} = 0.347.$$
For the sample × tube interaction, the sum of squares is given by (1.14):
$$\frac{2^2 + 5^2 + 6^2 + \cdots + 3^2}{2} - \frac{(151)^2}{2 \times 3 \times 12} - 93.486 - 14.527 = 22.806.$$
The other two interactions are calculated similarly. The total sum of squares is given by (1.15):
$$(1^2 + 3^2 + \cdots + 1^2) - \frac{(151)^2}{2 \times 3 \times 12} = 184.319.$$
All these items are entered in Table 15.8 and the three-way interaction is calculated by difference.

Table 15.8

Source of variance             Sums of squares   Degrees of freedom   Mean squares   E[M.S.]
Samples                         93.486           11                   8.499          $\sigma^2 + 3 \times 2\sigma^2_s$
Tubes                           14.527            2                   7.263          $\sigma^2 + 2\sigma^2_{sT} + 2 \times 12\sigma^2_T$
Bottles                          0.347            1                   0.347          $\sigma^2 + 3\sigma^2_{sB} + 3 \times 12\sigma^2_B$
Samples × tubes                 22.806           22                   1.037          $\sigma^2 + 2\sigma^2_{sT}$
Samples × bottles               27.153           11                   2.468          $\sigma^2 + 3\sigma^2_{sB}$
Tubes × bottles                  1.695            2                   0.847          $\sigma^2 + 12\sigma^2_{TB}$
Samples × tubes × bottles       24.305           22                   1.105          $\sigma^2$
Total                          184.319           71
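The arithmetic behind Table 15.8 can be checked with a short script. The following is only a sketch, assuming NumPy is available; the array layout, the helper `crude_ss`, and the dictionary names are illustrative choices and not part of the text. It forms each sum of squares from the squared totals, as in the computing formulas above, and obtains the three-way interaction by difference.

```python
import numpy as np

# Columns of Table 15.4, read down the 12 milk samples:
# bottle I with tubes A, B, C, then bottle II with tubes A, B, C.
columns = np.array([
    [1, 3, 3, 2, 2, 1, 5, 1, 0, 3, 0, 0],
    [1, 4, 2, 4, 1, 1, 5, 1, 1, 4, 0, 1],
    [1, 2, 4, 1, 3, 2, 5, 1, 2, 5, 4, 2],
    [1, 2, 3, 1, 2, 0, 3, 0, 2, 1, 0, 0],
    [3, 1, 3, 0, 4, 2, 5, 2, 2, 1, 2, 3],
    [2, 3, 6, 0, 6, 1, 5, 0, 2, 3, 1, 1],
], dtype=float)

# Rearrange to x[i, j, k]: sample i, tube j, bottle k.
x = columns.reshape(2, 3, 12).transpose(2, 1, 0)
r, t, u = x.shape
correction = x.sum() ** 2 / x.size  # (151)^2 / (2 * 3 * 12)


def crude_ss(summed_axes):
    """Squared totals over `summed_axes`, divided by the number of
    observations per total, minus the correction term."""
    totals = x.sum(axis=summed_axes)
    return (totals ** 2).sum() / (x.size / totals.size) - correction


ss = {
    "samples": crude_ss((1, 2)),
    "tubes": crude_ss((0, 2)),
    "bottles": crude_ss((0, 1)),
}
ss["samples x tubes"] = crude_ss((2,)) - ss["samples"] - ss["tubes"]
ss["samples x bottles"] = crude_ss((1,)) - ss["samples"] - ss["bottles"]
ss["tubes x bottles"] = crude_ss((0,)) - ss["tubes"] - ss["bottles"]
ss["total"] = (x ** 2).sum() - correction
# Three-way interaction by difference, as in the text.
ss["samples x tubes x bottles"] = ss["total"] - sum(
    value for name, value in ss.items() if name != "total"
)

df = {
    "samples": r - 1,
    "tubes": t - 1,
    "bottles": u - 1,
    "samples x tubes": (r - 1) * (t - 1),
    "samples x bottles": (r - 1) * (u - 1),
    "tubes x bottles": (t - 1) * (u - 1),
    "samples x tubes x bottles": (r - 1) * (t - 1) * (u - 1),
}
ms = {name: ss[name] / df[name] for name in df}

for name in df:
    print(f"{name:28s} {ss[name]:9.3f} {df[name]:3d} {ms[name]:8.3f}")
```

The printed values agree with Table 15.8 up to rounding in the last decimal place.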
The tube and bottle are fixed effects and the sample is a random effect. If s, T, and B refer to samples, tubes, and bottles, application of the left half of Table 15.3, with $n = 1$ and $\sigma^2_{sTB} = 0$, gives the expectations of mean squares in the last column of Table 15.8. Interpreting this table as a nonpooler, it is apparent that bottles, samples × tubes, and tubes × bottles are nonsignificant. Samples × bottles has a variance ratio 2.23, and $F_{0.95}(11, 22) = 2.26$; so it does not quite reach the 0.05 level of significance. The tube main effect, tested against the sample × tube interaction, has a variance ratio of 7.00, and $F_{0.995}(2, 22) = 6.81$; so there is no doubt as to the significance of the tube effect. The sample effect is also highly significant, and the component of variance $\sigma^2_s$ is estimated as (8.499 − 1.105)/6 = 1.232. Confidence limits can be constructed with (4.14). For example, for 95 per cent confidence limits for the difference between the two bottles, we want $t_{0.975}(11) = 2.201$, and $\bar{x}_{\cdot\cdot I} - \bar{x}_{\cdot\cdot II} = (78 - 73)/(12 \times 3) = 0.139$. Also
$$\hat{V}[\bar{x}_{\cdot\cdot I} - \bar{x}_{\cdot\cdot II}] = \frac{2}{12 \times 3}[\text{M.S. for samples} \times \text{bottles}] = 0.1371.$$
The confidence limits are $0.139 \pm 2.201\sqrt{0.1371}$, or (−0.676, 0.954). Confidence limits for the difference between any two tube averages can be constructed similarly, although if we are interested in more than one comparison it would be advisable to use Tukey's multiple-comparison technique (cf. Section 10.3).

15.6. Orthogonal Contrasts

It is possible to split the r − 1 degrees of freedom of a model I factor into r − 1 separate single degrees of freedom, each corresponding to a specific contrast, in such a way that the r − 1 contrasts are independent and the sums of squares corresponding to the contrasts add up to the original unpartitioned sum of squares. Suppose that we have r means $\bar{x}_i$ distributed normally with means $\xi_i$ and variances $\sigma^2/n_i$. We define a contrast as in (10.3.1) as $\theta = \sum_i^r c_i\xi_i$, where $\sum_i^r c_i = 0$. As in (10.3.2) the contrast is estimated as $H = \sum_i^r c_i\bar{x}_i$, and H is distributed normally with expectation $\theta$ and variance $V[H] = \sigma^2\sum_i^r c_i^2/n_i$. Therefore
$$\frac{H - \theta}{\left(\sigma^2\sum_i^r c_i^2/n_i\right)^{1/2}} \sim N(0, 1), \tag{6.1}$$
and
$$\frac{(H - \theta)^2}{\sigma^2\sum_i^r c_i^2/n_i} \sim \chi^2(1). \tag{6.2}$$
Thus under the null hypothesis $\theta = 0$ the quantity M defined as
$$M = \frac{\left(\sum_i^r c_i\bar{x}_i\right)^2}{\sum_i^r c_i^2/n_i} \tag{6.3}$$
will be distributed as $\sigma^2\chi^2(1)$, and the ratio of (6.3) to an independent estimate $s^2$ of $\sigma^2$ with f degrees of freedom will be distributed as F(1, f). More generally, taking expectations of (6.2),
$$\sigma^2\sum_i^r \frac{c_i^2}{n_i} = E[H^2] + \theta^2 - 2\theta E[H] = E[H^2] - \theta^2, \tag{6.4}$$
whence, substituting $H = \sum_i^r c_i\bar{x}_i$ and rearranging,
$$E[M] = \sigma^2 + \frac{\theta^2}{\sum_i^r c_i^2/n_i}. \tag{6.5}$$
If the $\bar{x}_i$ are based on totals $T_i$ of $n_i$ observations, so that $\bar{x}_i = T_i/n_i$, we can write M, (6.3), as
$$M = \frac{\left(\sum_i^r c_iT_i/n_i\right)^2}{\sum_i^r c_i^2/n_i}, \tag{6.6}$$
and, when the $n_i$ all equal n, this becomes
$$M = \frac{\left(\sum_i^r c_iT_i\right)^2}{n\sum_i^r c_i^2}. \tag{6.7}$$
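Suppose now that we have two contrasts,
$$H_1 = \sum_i^r c_{1i}\bar{x}_i, \qquad H_2 = \sum_i^r c_{2i}\bar{x}_i, \tag{6.8}$$
and we require these to be independent. We saw in Section 12.3 that for independence it is sufficient, when $H_1$, $H_2$ are jointly normal, to show that they have zero covariance. Now
$$V[H_1 + H_2] = V[H_1] + V[H_2] + 2\,\mathrm{Cov}[H_1, H_2]; \tag{6.9}$$
so
$$2\,\mathrm{Cov}[H_1, H_2] = V[H_1 + H_2] - V[H_1] - V[H_2]. \tag{6.10}$$
But
$$V[H_1 + H_2] = V\left[\sum_i^r c_{1i}\bar{x}_i + \sum_i^r c_{2i}\bar{x}_i\right] = V\left[\sum_i^r (c_{1i} + c_{2i})\bar{x}_i\right] = \sum_i^r (c_{1i} + c_{2i})^2 V[\bar{x}_i] = V[H_1] + V[H_2] + 2\sigma^2\sum_i^r \frac{c_{1i}c_{2i}}{n_i}; \tag{6.11}$$
so
$$2\,\mathrm{Cov}[H_1, H_2] = 2\sigma^2\sum_i^r \frac{c_{1i}c_{2i}}{n_i}. \tag{6.12}$$
Thus independence between $H_1$ and $H_2$ implies
$$\sum_i^r \frac{c_{1i}c_{2i}}{n_i} = 0, \tag{6.13}$$
or, when the $n_i$ are all equal to n,
$$\sum_i^r c_{1i}c_{2i} = 0. \tag{6.14}$$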
For r − 1 degrees of freedom, it is possible to construct infinitely many sets of r − 1 orthogonal contrasts, but we are only interested in contrasts that are reasonable. In the milk testing example in Section 15.5, the three tubes were such that tubes B and C were quite similar in size but A was much smaller. It is therefore reasonable to define contrasts with coefficients
$$c_{1j} = 0, -1, 1, \qquad c_{2j} = -2, 1, 1. \tag{6.15}$$
These coefficients do define contrasts since $\sum_j^t c_{mj} = 0$ for m = 1 and 2. Also they are orthogonal as they satisfy (6.14):
$$\sum_j^t c_{1j}c_{2j} = 0 \times (-2) + (-1) \times 1 + 1 \times 1 = 0. \tag{6.16}$$
The first contrast will measure the difference between B and C, and the second will measure twice the difference between A and the mean of B and C. By (6.7) the sum of squares for the contrast $H_m$ will be
$$M_m = \frac{\left(\sum_j^t c_{mj}\sum_i^r\sum_k^u x_{ijk}\right)^2}{ru\sum_j^t c_{mj}^2}. \tag{6.17}$$
For m = 1, this is
$$\frac{[0 \times 36 + (-1) \times 53 + 1 \times 62]^2}{12 \times 2[0^2 + (-1)^2 + 1^2]} = 1.687, \tag{6.18}$$
and, for m = 2, it is
$$\frac{[(-2) \times 36 + 1 \times 53 + 1 \times 62]^2}{12 \times 2[(-2)^2 + 1^2 + 1^2]} = 12.840. \tag{6.19}$$
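The two contrast sums of squares in (6.18) and (6.19) can be reproduced numerically. This is a minimal sketch in plain Python; the function name `contrast_ss` and the variable names are illustrative and not taken from the text. It applies (6.7) to the tube totals, each of which is a sum of r × u = 24 observations, and also checks the orthogonality condition (6.14).

```python
def contrast_ss(coeffs, totals, n_per_total):
    """Sum of squares for a contrast applied to totals of equal size, as in (6.7)."""
    if sum(coeffs) != 0:
        raise ValueError("contrast coefficients must sum to zero")
    numerator = sum(c * total for c, total in zip(coeffs, totals)) ** 2
    return numerator / (n_per_total * sum(c * c for c in coeffs))


tube_totals = [36, 53, 62]   # tubes A, B, C, from Table 15.5
n_per_total = 12 * 2         # r samples times u bottles

c1 = [0, -1, 1]              # B versus C
c2 = [-2, 1, 1]              # A versus the mean of B and C

m1 = contrast_ss(c1, tube_totals, n_per_total)
m2 = contrast_ss(c2, tube_totals, n_per_total)
orthogonal = sum(a * b for a, b in zip(c1, c2)) == 0

print(m1, m2, m1 + m2, orthogonal)
```

The two components add up to the tube sum of squares found earlier, which is the defining property of a complete set of orthogonal contrasts.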
Each of these sums of squares has one degree of freedom, and the sum 1.687 + 12.840 = 14.527 is the unpartitioned sum of squares for tubes with two degrees of freedom. The appropriate error term for the partitioned mean square is the same as for the unpartitioned mean square, the sample × tube interaction in this case. For the two components the F's are 1.687/1.037 = 1.63 and 12.840/1.037 = 12.4. The first contrast, between the similar tubes, is completely nonsignificant, but the second contrast, between the small tube and the mean of the larger tubes, is highly significant.

The set of orthogonal contrasts in Table 15.9 is applicable to the case where the factor is a numerically valued variable for which the gaps between successive levels are equal. The coefficients $\xi'_1$, $\xi'_2$, etc., will give the sums of squares for linear, quadratic, cubic, etc., terms in a polynomial regression of the dependent variable y on the factor x. Obviously, for k levels only k − 1 terms can be fitted. To construct the polynomial regression equation is not difficult, but we will not deal with it here: see [8], Chapter 16. Coefficients up to the fifth degree for numbers of levels k up to 52 are given in [9].

Table 15.9

k = 3                 k = 4                          k = 5
$\xi'_1$  $\xi'_2$    $\xi'_1$  $\xi'_2$  $\xi'_3$   $\xi'_1$  $\xi'_2$  $\xi'_3$  $\xi'_4$
 −1        1           −3        1        −1          −2        2        −1        1
  0       −2           −1       −1         3          −1       −1         2       −4
  1        1            1       −1        −3           0       −2         0        6
                        3        1         1           1       −1        −2       −4
                                                       2        2         1        1
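A quick numerical check confirms that each column set in Table 15.9 is a set of mutually orthogonal contrasts in the sense of (6.14). The sketch below assumes NumPy; the dictionary `coeffs` and the helper name are illustrative. Each row of an array holds one set of coefficients (linear, quadratic, and so on) across the k levels.

```python
import numpy as np

# Orthogonal polynomial coefficients for k = 3, 4, 5 equally spaced levels.
coeffs = {
    3: [[-1, 0, 1],
        [1, -2, 1]],
    4: [[-3, -1, 1, 3],
        [1, -1, -1, 1],
        [-1, 3, -3, 1]],
    5: [[-2, -1, 0, 1, 2],
        [2, -1, -2, -1, 2],
        [-1, 2, 0, -2, 1],
        [1, -4, 6, -4, 1]],
}


def is_orthogonal_contrast_set(rows):
    """True if every row sums to zero and distinct rows have zero dot product."""
    c = np.array(rows)
    gram = c @ c.T
    off_diagonal = gram - np.diag(np.diag(gram))
    return bool((c.sum(axis=1) == 0).all() and (off_diagonal == 0).all())


checks = {k: is_orthogonal_contrast_set(rows) for k, rows in coeffs.items()}
print(checks)
```

The diagonal of `gram` holds the sums of squared coefficients, which are the divisors $\sum c_i^2$ needed in (6.7).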
15.7. The Partitioning of Interactions into Orthogonal Contrasts

A t × u two-way table will have (t − 1)(u − 1) degrees of freedom for interaction, and, if both the classifications are fixed effects which have been partitioned into orthogonal contrasts, it is possible to compute sums of squares with single degrees of freedom corresponding to the pairwise interaction of single degrees of freedom of the two sets of orthogonal contrasts. An example will make this much clearer.

Table 15.10

(a)              $c_{11} = 0$    $c_{12} = -1$    $c_{13} = 1$
$d_1 = -1$        0               1               −1
$d_2 = 1$         0              −1                1

(b)              $c_{21} = -2$   $c_{22} = 1$     $c_{23} = 1$
$d_1 = -1$        2              −1               −1
$d_2 = 1$        −2               1                1
For the milk testing example in Section 15.5, we partitioned the main effect of tubes into two contrasts in Section 15.6. The other fixed effect, bottles, has only two levels, and so only one degree of freedom, but we can regard this as a single contrast defined by coefficients $d_k$: $d_1 = -1$, $d_2 = 1$. We then set up Table 15.10. We form the products $d_kc_{mj}$ in the body of the table. The coefficients so formed are contrasts, since within each table they sum to zero, and they are orthogonal, since the sum of products of corresponding coefficients is zero. (If the factor bottles had four levels, and we had partitioned it into three contrasts, there would have been three sets of d's, namely, $d_{1k}$, $d_{2k}$, and $d_{3k}$, and we would have gotten 6 two-way tables of coefficients corresponding to the 6 individual degrees of freedom of the interaction.) To get the sums of squares corresponding to these two degrees of freedom, we use (6.7) again, and get the sum of the products of the coefficients in Tables 15.10a and b with the totals in Table 15.5. From Table 15.10a,
$$\frac{[0 \times 21 + 1 \times 25 + (-1) \times 32 + 0 \times 15 + (-1) \times 28 + 1 \times 30]^2}{12[0^2 + 1^2 + (-1)^2 + 0^2 + (-1)^2 + 1^2]} = \frac{(-5)^2}{12 \times 4} = 0.521. \tag{7.1}$$
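The products of coefficients and the resulting single-degree-of-freedom sums of squares can be sketched as follows, assuming NumPy; the function name `interaction_contrast_ss` and the variable names are illustrative. The coefficient table is the outer product of the bottle contrast with a tube contrast, and (6.7) is applied to the bottle × tube totals, each of which is a sum over the r = 12 samples.

```python
import numpy as np

# Bottle x tube totals from Table 15.5: rows are bottles I, II; columns tubes A, B, C.
totals = np.array([[21, 25, 32],
                   [15, 28, 30]])
n_per_total = 12                  # each total is a sum over the 12 samples

d = np.array([-1, 1])             # bottle contrast
c1 = np.array([0, -1, 1])         # tube contrast: B versus C
c2 = np.array([-2, 1, 1])         # tube contrast: A versus the mean of B and C


def interaction_contrast_ss(d, c, totals, n_per_total):
    """Sum of squares for the interaction of contrasts d and c, using (6.7)."""
    coefficients = np.outer(d, c)             # body of Table 15.10
    value = (coefficients * totals).sum()
    return float(value ** 2 / (n_per_total * (coefficients ** 2).sum()))


ss_a = interaction_contrast_ss(d, c1, totals, n_per_total)
ss_b = interaction_contrast_ss(d, c2, totals, n_per_total)
print(ss_a, ss_b, ss_a + ss_b)
```

The sum of the two components reproduces the tubes × bottles sum of squares of Table 15.8, up to rounding.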
Table 15.10b gives a sum of squares 1.173, and the sum of these two components is 1.694 with 2 degrees of freedom, which is equal to the unpartitioned sum of squares for the bottle × tube interaction.

In the example of Section 15.5, two of the two-way interactions are mixed model interactions. For example, the tube × sample interaction has tube as a fixed effect, which is partitionable into orthogonal contrasts, and samples as a random effect, which cannot be partitioned into individual degrees of freedom. However, the (r − 1)(t − 1) degrees of freedom can be partitioned into t − 1 components, each consisting of r − 1 degrees of freedom, representing the interaction of each component of the tube effect with samples. We calculate the value of the contrast for each sample, and then find the variance of these. For example, consider the contrast defined by the coefficients $c_{1j} = 0, -1, 1$. Let $H_{1i}$ be the value of this contrast for the ith sample:
$$H_{1i} = \sum_j^t c_{1j}\bar{x}_{ij\cdot}. \tag{7.2}$$
Then
$$E[H_{1i}] = \sum_j^t c_{1j}E[\bar{x}_{ij\cdot}] = \sum_j^t c_{1j}\xi_{ij\cdot} = \theta_{1i}, \tag{7.3}$$
$$V[H_{1i}] = \sum_j^t c_{1j}^2 V[\bar{x}_{ij\cdot}] = \frac{\sigma^2}{u}\sum_j^t c_{1j}^2. \tag{7.4}$$
Suppose that the null hypothesis that all the $\theta_{1i} = \bar{\theta}_{1\cdot} = \theta_1$ is true. Then $\sum_i^r (H_{1i} - \bar{H}_{1\cdot})^2/(r - 1)$ is the usual sample estimate of the variance of the $H_{1i}$ and
$$\frac{1}{r - 1}\sum_i^r (H_{1i} - \bar{H}_{1\cdot})^2 \sim \frac{\sigma^2}{u}\sum_j^t c_{1j}^2\,\frac{\chi^2(r - 1)}{r - 1}, \tag{7.5}$$
so if we define N as
$$N = \frac{u\sum_i^r (H_{1i} - \bar{H}_{1\cdot})^2}{(r - 1)\sum_j^t c_{1j}^2}, \tag{7.6}$$
it is distributed as $\sigma^2\chi^2(r - 1)/(r - 1)$. Thus under the null hypothesis $\theta_{1i} = \theta_1$, the ratio $N/s^2$, where $s^2$ is an independent estimate of $\sigma^2$ with f degrees of freedom, has the F(r − 1, f) distribution. Substituting (7.2) for $H_{1i}$, N can be calculated from the relationship
$$(r - 1)N = \frac{\sum_i^r\left(\sum_j^t c_{1j}\sum_k^u x_{ijk}\right)^2}{u\sum_j^t c_{1j}^2} - \frac{\left(\sum_j^t c_{1j}\sum_i^r\sum_k^u x_{ijk}\right)^2}{ru\sum_j^t c_{1j}^2}. \tag{7.7}$$
We will now find E[N]. We have the identity
$$H_{1i} - \theta_{1i} = (H_{1i} - \bar{H}_{1\cdot}) - (\theta_{1i} - \bar{\theta}_{1\cdot}) + (\bar{H}_{1\cdot} - \bar{\theta}_{1\cdot}), \tag{7.8}$$
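whence
$$\sum_i^r (H_{1i} - \theta_{1i})^2 = \sum_i^r [(H_{1i} - \bar{H}_{1\cdot}) - (\theta_{1i} - \bar{\theta}_{1\cdot})]^2 + r(\bar{H}_{1\cdot} - \bar{\theta}_{1\cdot})^2, \tag{7.9}$$
since the cross product is zero:
$$2\sum_i^r (\bar{H}_{1\cdot} - \bar{\theta}_{1\cdot})[(H_{1i} - \bar{H}_{1\cdot}) - (\theta_{1i} - \bar{\theta}_{1\cdot})] = 0. \tag{7.10}$$
The left-hand side of (7.9) is distributed as $(\sigma^2/u)\sum_j^t c_{1j}^2\,\chi^2(r)$. Thus taking expected values, we have
$$r\,\frac{\sigma^2}{u}\sum_j^t c_{1j}^2 = E\left[\sum_i^r (H_{1i} - \bar{H}_{1\cdot})^2\right] - \sum_i^r (\theta_{1i} - \bar{\theta}_{1\cdot})^2 + \frac{\sigma^2}{u}\sum_j^t c_{1j}^2. \tag{7.11}$$
Thus
$$E\left[\sum_i^r (H_{1i} - \bar{H}_{1\cdot})^2\right] = (r - 1)\frac{\sigma^2}{u}\sum_j^t c_{1j}^2 + \sum_i^r (\theta_{1i} - \bar{\theta}_{1\cdot})^2, \tag{7.12}$$
whence, referring to (7.6),
$$E[N] = \sigma^2 + \frac{u\sum_i^r (\theta_{1i} - \bar{\theta}_{1\cdot})^2}{(r - 1)\sum_j^t c_{1j}^2}. \tag{7.13}$$
The sums $\sum_k^u x_{ijk}$ are obtained by summing over bottles and were given in Table 15.7. The sums $\sum_k^u x_{11k}$, $\sum_k^u x_{12k}$, $\sum_k^u x_{13k}$ are given by the entries in the first row, i = 1, in that table: 2, 4, and 3, respectively. Then for i = 1, 2, 3, ..., r, we have $uH_{1i}$ taking the values
$$uH_{11} = \sum_j^t c_{1j}\sum_k^u x_{1jk} = [0 \times 2 + (-1) \times 4 + 1 \times 3] = -1, \tag{7.14}$$
$$uH_{12} = \sum_j^t c_{1j}\sum_k^u x_{2jk} = [0 \times 5 + (-1) \times 5 + 1 \times 5] = 0, \tag{7.15}$$
and
$$uH_{1r} = \sum_j^t c_{1j}\sum_k^u x_{rjk} = [0 \times 0 + (-1) \times 4 + 1 \times 3] = -1. \tag{7.16}$$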
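The per-sample contrast values can be tabulated for all twelve samples at once. The sketch below assumes NumPy; the array holds the sample × tube totals (the sums over bottles of Table 15.4), and the function name is illustrative. It returns the values $uH_{mi}$ and the sum of squares for the interaction of the contrast with samples, computed as the squared per-sample values divided by $u\sum c^2$ minus the main-effect sum of squares of the same contrast.

```python
import numpy as np

# Sample x tube totals (sums over the u = 2 bottles); columns are tubes A, B, C.
sample_tube_totals = np.array([
    [2, 4, 3], [5, 5, 5], [6, 5, 10], [3, 4, 1],
    [4, 5, 9], [1, 3, 3], [8, 10, 10], [1, 3, 1],
    [2, 3, 4], [4, 5, 8], [0, 2, 5], [0, 4, 3],
])
u = 2


def contrast_by_samples_ss(c, totals, u):
    """Return (u*H_i for each sample, S.S. for contrast x samples)."""
    c = np.asarray(c)
    r = totals.shape[0]
    u_h = totals @ c                                   # u * H_i, one per sample
    divisor = u * (c ** 2).sum()
    first_term = (u_h ** 2).sum() / divisor            # squared per-sample values
    main_effect = u_h.sum() ** 2 / (r * divisor)       # contrast main-effect S.S.
    return u_h, float(first_term - main_effect)


u_h1, ss1 = contrast_by_samples_ss([0, -1, 1], sample_tube_totals, u)
u_h2, ss2 = contrast_by_samples_ss([-2, 1, 1], sample_tube_totals, u)
print(u_h1.tolist(), ss1)
print(u_h2.tolist(), ss2)
print(ss1 + ss2)
```

The two interaction components together account for the whole sample × tube sum of squares of Table 15.8.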
Then, since u = 2, the first term in (7.7) is
$$\frac{\sum_i^r\left(\sum_j^t c_{1j}\sum_k^u x_{ijk}\right)^2}{u\sum_j^t c_{1j}^2} = \frac{(-1)^2 + 0^2 + 5^2 + \cdots + (-1)^2}{2 \times [0^2 + (-1)^2 + 1^2]} = \frac{75}{2 \times 2} = 18.750, \tag{7.17}$$
whence the sum of squares for the interaction of this contrast with samples is 18.750 − 1.687 = 17.063. A similar calculation for the second contrast gives for the interaction sum of squares
$$\frac{3^2 + 0^2 + 3^2 + \cdots + 7^2}{2[(-2)^2 + 1^2 + 1^2]} - 12.840 = 18.583 - 12.840 = 5.743. \tag{7.18}$$
The sum of these two components, 17.063 + 5.743 = 22.806, equals the unpartitioned sum of squares for the sample × tube interaction.

15.8. Four-Way Analysis of Variance
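The analysis of a four-way classification is the obvious extension of three-way analysis, and we will review it only briefly. The model is
$$x_{ijkm\nu} = \xi_{ijkm} + z_{ijkm\nu}, \tag{8.1}$$
where the $\xi_{ijkm}$ have the structure
$$\xi_{ijkm} = \xi + \eta_{i\cdots} + \eta_{\cdot j\cdot\cdot} + \eta_{\cdot\cdot k\cdot} + \eta_{\cdots m} + \zeta_{ij\cdot\cdot} + \zeta_{i\cdot k\cdot} + \zeta_{i\cdot\cdot m} + \zeta_{\cdot jk\cdot} + \zeta_{\cdot j\cdot m} + \zeta_{\cdot\cdot km} + \theta_{ijk\cdot} + \theta_{ij\cdot m} + \theta_{i\cdot km} + \theta_{\cdot jkm} + \omega_{ijkm}, \tag{8.2}$$
and the $z_{ijkm\nu}$, $\nu = 1, \ldots, n$, are normally distributed with zero mean and variance $\sigma^2$. The sample sizes for i, j, k, m are r, t, u, v, and the population sizes are R, T, U, V. The $\eta_{i\cdots}$ are a sample of size r from a population of size R and sum to zero in the population,
$$\sum_i^R \eta_{i\cdots} = 0; \tag{8.3}$$
similarly for the other main effects. The two-way terms sum to zero in the population over each index, e.g.,
$$\sum_i^R \zeta_{ij\cdot\cdot} = \sum_j^T \zeta_{ij\cdot\cdot} = 0. \tag{8.4}$$
The three-way terms and the four-way terms also sum to zero in the population over each index, e.g.,
$$\sum_i^R \theta_{ijk\cdot} = \sum_j^T \theta_{ijk\cdot} = \sum_k^U \theta_{ijk\cdot} = 0 \tag{8.5}$$
and
$$\sum_i^R \omega_{ijkm} = \sum_j^T \omega_{ijkm} = \sum_k^U \omega_{ijkm} = \sum_m^V \omega_{ijkm} = 0. \tag{8.6}$$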
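We define $\sigma^2_A$, $\sigma^2_{AB}$, etc., analogously to the definitions (1.9). The usual identity $x_{ijkm\nu} - \bar{x}_{\cdot\cdot\cdot\cdot\cdot}$ = etc. has on its right-hand side the following groups of terms:
1. Sample estimates of the four main effects, e.g., $\bar{x}_{i\cdot\cdot\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot\cdot\cdot}$, which is an estimate of $\eta_{i\cdots}$.
2. Sample estimates of the six two-way interactions, e.g., $\bar{x}_{ij\cdot\cdot\cdot} - \bar{x}_{i\cdot\cdot\cdot\cdot} - \bar{x}_{\cdot j\cdot\cdot\cdot} + \bar{x}_{\cdot\cdot\cdot\cdot\cdot}$, which is an estimate of $\zeta_{ij\cdot\cdot}$.
3. Sample estimates of the four three-way interactions, e.g., $\bar{x}_{ijk\cdot\cdot} - \bar{x}_{ij\cdot\cdot\cdot} - \bar{x}_{i\cdot k\cdot\cdot} - \bar{x}_{\cdot jk\cdot\cdot} + \bar{x}_{i\cdot\cdot\cdot\cdot} + \bar{x}_{\cdot j\cdot\cdot\cdot} + \bar{x}_{\cdot\cdot k\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot\cdot\cdot}$, which is an estimate of $\theta_{ijk\cdot}$.
4. The sample estimate of the single four-way interaction, which will have the form $\bar{x}_{ijkm\cdot}$ − [$\bar{x}_{\cdot\cdot\cdot\cdot\cdot}$ + main effects + two-way interactions + three-way interactions].
5. The deviations of the individual observations from the cell means, e.g., $x_{ijkm\nu} - \bar{x}_{ijkm\cdot}$.

Squaring and summing over all indices gives the partitioning of the total sum of squares. Typical sums of squares and computing identities are
$$A = nvut\sum_i^r (\bar{x}_{i\cdot\cdot\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot\cdot\cdot})^2 = \frac{1}{tuvn}\sum_i^r\left(\sum_j^t\sum_k^u\sum_m^v\sum_\nu^n x_{ijkm\nu}\right)^2 - \frac{1}{rtuvn}\left(\sum_i^r\sum_j^t\sum_k^u\sum_m^v\sum_\nu^n x_{ijkm\nu}\right)^2, \tag{8.7}$$
$$AB = nvu\sum_i^r\sum_j^t (\bar{x}_{ij\cdot\cdot\cdot} - \bar{x}_{i\cdot\cdot\cdot\cdot} - \bar{x}_{\cdot j\cdot\cdot\cdot} + \bar{x}_{\cdot\cdot\cdot\cdot\cdot})^2 = \frac{1}{uvn}\sum_i^r\sum_j^t\left(\sum_k^u\sum_m^v\sum_\nu^n x_{ijkm\nu}\right)^2 - \frac{1}{rtuvn}\left(\sum_i^r\sum_j^t\sum_k^u\sum_m^v\sum_\nu^n x_{ijkm\nu}\right)^2 - \text{sum of squares for } A - \text{sum of squares for } B, \tag{8.8}$$
$$ABC = nv\sum_i^r\sum_j^t\sum_k^u (\bar{x}_{ijk\cdot\cdot} - \bar{x}_{ij\cdot\cdot\cdot} - \bar{x}_{i\cdot k\cdot\cdot} - \bar{x}_{\cdot jk\cdot\cdot} + \bar{x}_{i\cdot\cdot\cdot\cdot} + \bar{x}_{\cdot j\cdot\cdot\cdot} + \bar{x}_{\cdot\cdot k\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot\cdot\cdot})^2 = \frac{1}{vn}\sum_i^r\sum_j^t\sum_k^u\left(\sum_m^v\sum_\nu^n x_{ijkm\nu}\right)^2 - \frac{1}{rtuvn}\left(\sum_i^r\sum_j^t\sum_k^u\sum_m^v\sum_\nu^n x_{ijkm\nu}\right)^2 - (\text{sums of squares for } A, B, C, AB, AC, BC), \tag{8.9}$$
$$ABCD = \frac{1}{n}\sum_i^r\sum_j^t\sum_k^u\sum_m^v\left(\sum_\nu^n x_{ijkm\nu}\right)^2 - \frac{1}{rtuvn}\left(\sum_i^r\sum_j^t\sum_k^u\sum_m^v\sum_\nu^n x_{ijkm\nu}\right)^2 - (\text{sums of squares for all main effects, two-way and three-way interactions}), \tag{8.10}$$
$$\text{Within cells} = \sum_i^r\sum_j^t\sum_k^u\sum_m^v\sum_\nu^n (x_{ijkm\nu} - \bar{x}_{ijkm\cdot})^2 = \sum_i^r\sum_j^t\sum_k^u\sum_m^v\sum_\nu^n x_{ijkm\nu}^2 - \frac{1}{n}\sum_i^r\sum_j^t\sum_k^u\sum_m^v\left(\sum_\nu^n x_{ijkm\nu}\right)^2. \tag{8.11}$$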
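The degrees of freedom for the foregoing are r − 1, (r − 1)(t − 1), (r − 1)(t − 1)(u − 1), (r − 1)(t − 1)(u − 1)(v − 1), and rtuv(n − 1), respectively. The expectations of the mean squares are as follows:
$$E[\text{M.S. for } A] = \sigma^2 + n\left(1 - \frac{t}{T}\right)\left(1 - \frac{u}{U}\right)\left(1 - \frac{v}{V}\right)\sigma^2_{ABCD} + nv\left(1 - \frac{t}{T}\right)\left(1 - \frac{u}{U}\right)\sigma^2_{ABC} + nu\left(1 - \frac{t}{T}\right)\left(1 - \frac{v}{V}\right)\sigma^2_{ABD} + nt\left(1 - \frac{u}{U}\right)\left(1 - \frac{v}{V}\right)\sigma^2_{ACD} + nuv\left(1 - \frac{t}{T}\right)\sigma^2_{AB} + ntv\left(1 - \frac{u}{U}\right)\sigma^2_{AC} + ntu\left(1 - \frac{v}{V}\right)\sigma^2_{AD} + ntuv\sigma^2_A, \tag{8.12}$$
$$E[\text{M.S. for } AB] = \sigma^2 + n\left(1 - \frac{u}{U}\right)\left(1 - \frac{v}{V}\right)\sigma^2_{ABCD} + nv\left(1 - \frac{u}{U}\right)\sigma^2_{ABC} + nu\left(1 - \frac{v}{V}\right)\sigma^2_{ABD} + nuv\sigma^2_{AB}, \tag{8.13}$$
$$E[\text{M.S. for } ABC] = \sigma^2 + n\left(1 - \frac{v}{V}\right)\sigma^2_{ABCD} + nv\sigma^2_{ABC}, \tag{8.14}$$
$$E[\text{M.S. for } ABCD] = \sigma^2 + n\sigma^2_{ABCD}. \tag{8.15}$$
The various models can be obtained from this finite population model.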
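The pattern in (8.12) to (8.15) is regular enough to be generated mechanically: every variance component whose subscript contains the effect appears, multiplied by n, by the sample sizes of the factors absent from the subscript, and by a factor (1 − sample size/population size) for each factor in the subscript that is not part of the effect. The following sketch encodes that reading of the equations in plain Python; the function name, argument names, and single-letter factor labels are illustrative assumptions, and `math.inf` is used for an infinite (random, model II) population.

```python
from itertools import combinations
from math import inf, prod


def expected_ms_coefficients(effect, sample, population, n):
    """Coefficients of the variance components in E[M.S.] for `effect`.

    `sample` and `population` map factor labels to sample and population
    sizes; n is the number of observations per cell. The sigma^2 term,
    whose coefficient is always 1, is not included in the result.
    """
    effect = set(effect)
    factors = list(sample)
    others = [f for f in factors if f not in effect]
    coefficients = {}
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            term = "".join(f for f in factors if f in effect or f in extra)
            multiplier = n * prod(sample[f] for f in others if f not in extra)
            shrink = prod(1 - sample[f] / population[f] for f in extra)
            coefficients[term] = multiplier * shrink
    return coefficients


# Milk example: A = samples (random), B = tubes (fixed), C = bottles (fixed).
sample = {"A": 12, "B": 3, "C": 2}
population = {"A": inf, "B": 3, "C": 2}
tubes = expected_ms_coefficients("B", sample, population, n=1)
print(tubes)
```

For the tube main effect this gives coefficient 24 for the tube component and 2 for the samples × tubes component, with the remaining components vanishing, in agreement with the corresponding line of Table 15.8.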
EXERCISES

15.1. Four strains of a microorganism producing an antibiotic in a fermentation process were to be compared. The comparison was made on four batches of the main ingredient of the broth. A further factor was introduced into the experiment, namely, concentration of the broth, at three levels, equally spaced. The yields were as follows:

                     Concentration
Batch    Strain      1       2       3
1        A           40      69      70
         B           52      71      91
         C           78     100     110
         D           59      76     108
2        A           47      76      91
         B           64      72      99
         C           73     122     143
         D           77     106     127
3        A           55      79     102
         B           61      83      94
         C           71     106     106
         D           78     103     127
4        A           44      77      85
         B           69      75     116
         C           87     106     131
         D           76     107     125
The strains are a model I effect, and in fact the main objective of the experiment is to select the best strain. The batches are a model II effect, and the concentration is a model I effect, partitionable into linear and quadratic components.
(a) Give the conventional analysis of variance with the expectations of the mean squares. Make tests of significance of the various effects. State succinctly your interpretation of the effects of the various factors.
(b) Give 95 per cent confidence limits for the difference between strain C and strain D, averaged over batches and concentrations.

15.2. The table below gives the results of an experiment on the amounts of niacin found in peas after various treatments. The factors were:
(a) A comparison between blanched peas $P_0$ and processed peas $P_1$.
(b) Two temperatures of cooking: $C_0$ = 175°F, $C_1$ = 200°F.
(c) Two times of cooking: $T_0$ = 2½ min, $T_1$ = 8 min.
(d) Three different sieve sizes of peas: $S_1$, $S_2$, $S_3$.

                $P_0$               $P_1$
C      S        $T_0$    $T_1$      $T_0$    $T_1$
0      1         91       72         86       68
       2         92       68         85       72
       3        112       73        101       73
1      1         84       78         83       76
       2         94       78         90       71
       3         98       73         94       76

Source: R. Wagner, F. M. Strong, and C. A. Elvehjem, "Effects of Blanching on the Retention of Ascorbic Acid, Thiamine, and Niacin in Vegetables," Industrial and Engineering Chemistry, 39 (1947), 990-93.

Make an analysis of variance of these data. Suppose that the levels of the factor sieve size are spaced equally, and partition the main effect of sieve size into components corresponding to linear and quadratic terms; likewise for the first-order interactions. Include in the analysis the second-order interactions, but do not bother with any partitioning of these. The highest-order interaction mean square will have too few degrees of freedom to be a satisfactory estimate of error (since there is no replication, there is no explicit estimate available). Therefore, for an estimate of error pool the second-order interactions with the highest-order interaction (i.e., assume that the second- and third-order interactions are all zero and hence the corresponding mean squares are all estimates of $\sigma^2$). What effects do you consider to be statistically significant at the 0.05 level? For uniformity, (a) be a never-pooler (apart from the pooling recommended in the previous sentence); (b) in testing the first-order interactions, make due allowance for the fact that there are quite a large number of them; (c) on the other hand, be more generous to the main effects, and test them individually. Summarize in appropriate tables the effects you find statistically significant, and list the effects you consider nonsignificant.
15.3. In a comparison of three different compositions of mortar, three lots of cement were supplied to each of five laboratories. Each laboratory using each lot of cement made up batches of mortar according to the three compositions. Two-inch cubes of mortar were cast and the compressive strength determined. The factor composition is a model I effect, and the factors cements and laboratories are model II effects.
(a) Make a conventional analysis of variance of these data, presenting the results in standard tabular form, including the expectations of the mean squares.
(b) Test the null hypotheses that the various main effects and interactions are zero.
(c) Give estimates of the components of variance corresponding to the various random terms in your model.
(d) Give 95 per cent confidence limits for the difference between compositions 1 and 2.
The compositions were such that it would be interesting to consider the following contrasts: (i) between compositions 1 and 2; (ii) between the mean of compositions 1 and 2 and composition 3.
(e) Partition the main effect for composition into single degrees of freedom corresponding to these two contrasts. Test the null hypotheses that each contrast is zero.
(f) Partition the laboratory × composition interaction into two components and test the null hypotheses that they are zero.

         Cement 1              Cement 2              Cement 3
         Composition           Composition           Composition
Lab      1      2      3       1      2      3       1      2      3
1        812    839    723     870    838    798     859    863    761
2        746    744    689     799    797    759     787    765    709
3        797    802    731     771    737    724     781    821    753
4        850    896    757     864    877    779     843    854    759
5        829    829    735     887    903    765     863    940    777

The sum of squares of all observations is 29,068,527.

Source: Working Committee on Plastic Mortar Tests, "Report on Investigation of Mortars by Seven Laboratories," Proceedings of the American Society for Testing and Materials, 40 (1940), 210-25.
REFERENCES

1. Cornfield, Jerome, and John W. Tukey, "Average Values of Mean Squares in Factorials," Annals of Mathematical Statistics, 27 (1956), 907-949.
2. Paull, A. E., "On a Preliminary Test for Pooling Mean Squares in the Analysis of Variance," Annals of Mathematical Statistics, 21 (1950), 539-56.
3. Finney, D. J., "The Joint Distribution of Variance Ratios Based on a Common Error Mean Square," Annals of Eugenics, 11 (1941), 136-40.
4. Nair, K. R., "The Studentized Form of the Extreme Mean Square Test in the Analysis of Variance," Biometrika, 35 (1948), 16-31.
5. Daniel, C., "Fractional Replication in Industrial Research," Third Berkeley Symposium on Mathematical Statistics and Probability, vol. 5, 87-98. J. Neyman (ed.). Berkeley: University of California Press, 1956.
6. Birnbaum, A., "On the Analysis of Factorial Experiments without Replication." Contributed paper at Annual Meeting of the Institute of Mathematical Statistics, Cambridge, Massachusetts, August 25-28, 1958.
7. Barkworth, H., and J. O. Irwin, "Comparative Detection of Coliform Organisms in Milk and Water by the Presumptive Coliform Test," Journal of Hygiene, 43 (1943), 129-35.
8. Anderson, R. L., and T. A. Bancroft, Statistical Theory in Research. New York: McGraw-Hill Book Co., 1952.
9. Fisher, R. A., and F. Yates, Statistical Tables for Biological, Agricultural, and Medical Research. 3rd ed.; Edinburgh: Oliver and Boyd, 1948.
CHAPTER 16
Partially Hierarchical Situations
16.1. A Partially Hierarchical Situation and Its Model

In an investigation of the can-making properties of tin plate, two methods of annealing were studied. Three coils were selected at random out of a supposedly infinite population of coils made by each of these two methods. From each coil, samples were taken from two particular and reproducible locations, namely, the head and tail of each coil. From each sample, two sets of cans were made up independently, and from each set an estimate of the can life was obtained: these are the data in Table 16.1, taken from [1].

Table 16.1

                              Annealing method i
                              1                          2
                              Coils within anneals j(i)
Location k    Duplicate ν     1(1)    2(1)    3(1)       1(2)    2(2)    3(2)
1             1               288     355     329        310     303     299
              2               295     369     343        282     321     328
2             1               278     336     320        288     302     289
              2               272     342     315        287     297     284

For a definitive study, one would require a substantially larger sample of coils, but the data of Table 16.1 will suffice to demonstrate the principles involved in the analysis.

The structure of this experiment is rather different from what we have previously encountered. If the coils were crossed across annealing method, so that the first coil with annealing method 1 corresponded in some way with the first coil with annealing method 2, and the second coils likewise, etc., then we would have a three-way analysis with replication in the cells. However, this is not the case: the coils are not so crossed, but instead are nested within the annealing methods. Alternatively, if the locations were random samples from each coil, so that they were nested within the coils, with no crossing of location 1 across coils or annealing methods, then we would have a purely nested or hierarchical situation, with four classifications, namely, annealing methods, coils within annealing methods, locations within coils, and replications within locations. However, this is not the case: the locations are a model I effect crossed across coils and annealing methods. What we have is a partially hierarchical or crossed nested situation, which can be represented by the model
$$x_{ijk\nu} = \xi + \alpha_i + \{c(\alpha)\}_{j(i)} + \lambda_k + (\lambda\alpha)_{ik} + \{\lambda c(\alpha)\}_{kj(i)} + z_{ijk\nu}, \tag{1.1}$$
in which ~ is the grand mean, OC i is the annealing method effect, {c(oc)}j(i) represents the random coil effect within annealing method, Ak represents the location effect, (AOC)ik represents the interaction of locations with annealing methods, {AC( oc) hj(i) represents the interaction of locations with coils within anneals, and ZOkv is a random error distributed normally with zero mean and variance a 2 • In (1.1), single symbols could be used in place of {c(oc)}, (AOC), {AC(OC)}, but the use of these multiple symbols is helpful in identifying immediately the meaning of each. It is also helpful to use Greek letters to denote fixed, model I effects, e.g., oc and A, and Roman letters for random, model II effects, e.g., C and z, and this has been done in (1.1), somewhat prematurely, as we will first consider the finite population model and then move to this particular case. Thus for the finite population model, i goes to r in the sample and R in the population, j to t in the sample and T in the population, k to u in the sample and U in the population, and 11, which is from an infinite population, to n in the sample. There are various side restrictions on the model (1.1). First, R
LOCi
U
R
U
k
i
k
= 0, L Ak = 0, L (AOC);k = L (AOC)ik = O.
i
(1.2)
These conditions are similar to those for a twoway model, (14.8.2) and (14.8.3). Second, the coilswithinannealingmethods term {C(OC)}j(i) is a standard nested term similar to the Aj(i) of Section 14.10 and Table 14.15, i.e.; T
L; {c(oc)}J(;) = 0
for each i.
(1.3)
The complete array of {AC(OC)hj(i) is indicated in Table 16.2. For each
value of $i$ we have a two-way table of interaction constants with the usual property of two-way interaction constants of summing to zero in each row and in each column [see (14.8.3)], i.e.,

$$\sum_j^T \{\lambda c(\alpha)\}_{kj(i)} = 0 \quad \text{for each } (k, i), \tag{1.4}$$

$$\sum_k^U \{\lambda c(\alpha)\}_{kj(i)} = 0 \quad \text{for each } j(i). \tag{1.5}$$

However, for a fixed $k$, i.e., in Table 16.2 for a fixed row, the $\{\lambda c(\alpha)\}_{kj(i)}$ do not sum to zero over $i$ for a fixed $j$. In this respect they behave similarly to the $\lambda_{j(i)}$ of Table 14.15.
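16.2. Calculation of Sums of Squares, Etc.

The identity corresponding to the model (1.1) is

$$
\begin{aligned}
x_{ijk\nu} - \bar{x}_{\cdot\cdot\cdot\cdot} ={}& (\bar{x}_{i\cdot\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot\cdot}) + (\bar{x}_{ij\cdot\cdot} - \bar{x}_{i\cdot\cdot\cdot}) + (\bar{x}_{\cdot\cdot k\cdot} - \bar{x}_{\cdot\cdot\cdot\cdot}) \\
&+ (\bar{x}_{i\cdot k\cdot} - \bar{x}_{i\cdot\cdot\cdot} - \bar{x}_{\cdot\cdot k\cdot} + \bar{x}_{\cdot\cdot\cdot\cdot}) \\
&+ (\bar{x}_{ijk\cdot} - \bar{x}_{ij\cdot\cdot} - \bar{x}_{i\cdot k\cdot} + \bar{x}_{i\cdot\cdot\cdot}) + (x_{ijk\nu} - \bar{x}_{ijk\cdot}).
\end{aligned}
\tag{2.1}
$$

The terms on the right-hand side are the sample estimates of the terms on the right-hand side of the model (1.1), excluding the grand mean. The first and third terms need no comment. The second term is similar to an ordinary nested term such as the second term in (14.7.2). The fourth term is an ordinary two-way interaction resembling (15.1.11). The fifth term can be obtained by regarding it as the difference between $\bar{x}_{ijk\cdot}$ and what would be predicted as the grand mean $\bar{x}_{\cdot\cdot\cdot\cdot}$ plus the annealing method effect $\bar{x}_{i\cdot\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot\cdot}$ plus the coils-within-annealing-method effect $\bar{x}_{ij\cdot\cdot} - \bar{x}_{i\cdot\cdot\cdot}$ plus the location effect $\bar{x}_{\cdot\cdot k\cdot} - \bar{x}_{\cdot\cdot\cdot\cdot}$ plus the location × anneal two-way interaction $\bar{x}_{i\cdot k\cdot} - \bar{x}_{i\cdot\cdot\cdot} - \bar{x}_{\cdot\cdot k\cdot} + \bar{x}_{\cdot\cdot\cdot\cdot}$; i.e.,

$$
\begin{aligned}
\bar{x}_{ijk\cdot} - \bigl[\bar{x}_{\cdot\cdot\cdot\cdot} &+ (\bar{x}_{i\cdot\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot\cdot}) + (\bar{x}_{ij\cdot\cdot} - \bar{x}_{i\cdot\cdot\cdot}) + (\bar{x}_{\cdot\cdot k\cdot} - \bar{x}_{\cdot\cdot\cdot\cdot}) \\
&+ (\bar{x}_{i\cdot k\cdot} - \bar{x}_{i\cdot\cdot\cdot} - \bar{x}_{\cdot\cdot k\cdot} + \bar{x}_{\cdot\cdot\cdot\cdot})\bigr] = \bar{x}_{ijk\cdot} - \bar{x}_{ij\cdot\cdot} - \bar{x}_{i\cdot k\cdot} + \bar{x}_{i\cdot\cdot\cdot}.
\end{aligned}
\tag{2.2}
$$

An alternative way of constructing partially hierarchical models is to consider them as degenerate cases of fully crossed models. For example, we suppose momentarily that the coil effect is fully crossed, so that there will be a coil main effect $\bar{x}_{\cdot j\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot\cdot}$ and an anneal × coil interaction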
Table 16.2

For each $i = 1, \ldots, R$, the interaction constants form a two-way array with $U$ rows and $T$ columns:

$$
\begin{array}{c|ccc|c}
 & j(i) = 1(i) & \cdots & j(i) = T(i) & \\ \hline
k = 1 & \{\lambda c(\alpha)\}_{11(i)} & \cdots & \{\lambda c(\alpha)\}_{1T(i)} & \sum_j^T \{\lambda c(\alpha)\}_{1j(i)} = 0 \\
\vdots & \vdots & & \vdots & \vdots \\
k = k & \{\lambda c(\alpha)\}_{k1(i)} & \cdots & \{\lambda c(\alpha)\}_{kT(i)} & \sum_j^T \{\lambda c(\alpha)\}_{kj(i)} = 0 \\
\vdots & \vdots & & \vdots & \vdots \\
k = U & \{\lambda c(\alpha)\}_{U1(i)} & \cdots & \{\lambda c(\alpha)\}_{UT(i)} & \sum_j^T \{\lambda c(\alpha)\}_{Uj(i)} = 0 \\ \hline
 & \sum_k^U \{\lambda c(\alpha)\}_{k1(i)} = 0 & \cdots & \sum_k^U \{\lambda c(\alpha)\}_{kT(i)} = 0 &
\end{array}
$$
$\bar{x}_{ij\cdot\cdot} - \bar{x}_{i\cdot\cdot\cdot} - \bar{x}_{\cdot j\cdot\cdot} + \bar{x}_{\cdot\cdot\cdot\cdot}$. Now we admit that the coil effect is not a main effect and pool it with its interaction with anneals:

$$(\bar{x}_{\cdot j\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot\cdot}) + (\bar{x}_{ij\cdot\cdot} - \bar{x}_{i\cdot\cdot\cdot} - \bar{x}_{\cdot j\cdot\cdot} + \bar{x}_{\cdot\cdot\cdot\cdot}) = \bar{x}_{ij\cdot\cdot} - \bar{x}_{i\cdot\cdot\cdot}, \tag{2.3}$$

which is the second term on the right-hand side of (2.1). Similarly, if coils was a crossed effect, then it would have an interaction with locations, and its interaction with anneals would also have an interaction with locations. But, since coils is not a crossed effect, these two interactions are pooled together:

$$
\begin{aligned}
(\bar{x}_{\cdot jk\cdot} &- \bar{x}_{\cdot j\cdot\cdot} - \bar{x}_{\cdot\cdot k\cdot} + \bar{x}_{\cdot\cdot\cdot\cdot}) \\
&+ (\bar{x}_{ijk\cdot} - \bar{x}_{ij\cdot\cdot} - \bar{x}_{i\cdot k\cdot} - \bar{x}_{\cdot jk\cdot} + \bar{x}_{i\cdot\cdot\cdot} + \bar{x}_{\cdot j\cdot\cdot} + \bar{x}_{\cdot\cdot k\cdot} - \bar{x}_{\cdot\cdot\cdot\cdot}) \\
&= \bar{x}_{ijk\cdot} - \bar{x}_{ij\cdot\cdot} - \bar{x}_{i\cdot k\cdot} + \bar{x}_{i\cdot\cdot\cdot},
\end{aligned}
\tag{2.4}
$$

which is identical with (2.2). This viewpoint also applies to the degrees of freedom. For coils within anneals, each anneal contributes $t - 1$ degrees of freedom and there are $r$ anneals; so the degrees of freedom are $r(t - 1)$. But, taking the viewpoint of (2.3), the degrees of freedom should be

$$(t - 1) + (r - 1)(t - 1) = r(t - 1), \tag{2.5}$$

which is the same result. For the interaction of locations × coils within anneals, since locations have $u - 1$ degrees of freedom and coils within anneals have $r(t - 1)$ degrees of freedom, their interaction will have $r(t - 1)(u - 1)$ degrees of freedom. From the viewpoint of (2.4), the degrees of freedom will be

$$(t - 1)(u - 1) + (r - 1)(u - 1)(t - 1) = r(t - 1)(u - 1), \tag{2.6}$$

which is the same result. Squaring and summing over all indices (2.1) gives an equation entered in the second column of Table 16.3. To calculate these sums of squares we form Table 16.4 by summing over $\nu$. We next sum over $k$ to get the coils-within-annealing-method table. We also sum over $j$ to get the locations × annealing method table. Summing this over $i$ gives the location totals and over $k$ gives the annealing method totals. The sum of squares for annealing methods is an ordinary main effect [see (15.1.13)]:

$$ntu \sum_i^r (\bar{x}_{i\cdot\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot\cdot})^2 = \frac{3842^2 + 3590^2}{2 \times 3 \times 2} - \frac{(7432)^2}{2 \times 2 \times 3 \times 2} = 2646.000. \tag{2.7}$$
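The arithmetic in (2.7) can be checked directly from the data of Table 16.1. The following is a minimal sketch in plain Python, not the computing scheme of the text (which works from tables of totals). The nested-list layout and the variable names are illustrative assumptions, and within each coil the first pair of duplicates is taken to be location 1 and the second pair location 2. The sketch evaluates every term of the identity (2.1) as a sum of squares and checks that the components add up to the total sum of squares about the grand mean.

```python
from itertools import product

# x[i][j][k] holds the two duplicates for annealing method i, coil j(i), location k.
# Assumption: within each coil the first pair is location 1, the second pair location 2.
x = [
    [[(288, 295), (278, 272)], [(355, 369), (336, 342)], [(329, 343), (320, 315)]],
    [[(310, 282), (288, 287)], [(303, 321), (302, 297)], [(299, 328), (289, 284)]],
]
r, t, u, n = 2, 3, 2, 2  # anneals, coils per anneal, locations, duplicates


def mean(values):
    values = list(values)
    return sum(values) / len(values)


I, J, K, V = range(r), range(t), range(u), range(n)

grand = mean(x[i][j][k][v] for i, j, k, v in product(I, J, K, V))
m_i = [mean(x[i][j][k][v] for j, k, v in product(J, K, V)) for i in I]
m_ij = [[mean(x[i][j][k][v] for k, v in product(K, V)) for j in J] for i in I]
m_k = [mean(x[i][j][k][v] for i, j, v in product(I, J, V)) for k in K]
m_ik = [[mean(x[i][j][k][v] for j, v in product(J, V)) for k in K] for i in I]
m_ijk = [[[mean(x[i][j][k]) for k in K] for j in J] for i in I]

# One sum of squares per term on the right-hand side of (2.1).
ss = {
    "annealing methods": n * t * u * sum((m_i[i] - grand) ** 2 for i in I),
    "coils within anneals": n * u * sum(
        (m_ij[i][j] - m_i[i]) ** 2 for i, j in product(I, J)),
    "locations": n * r * t * sum((m_k[k] - grand) ** 2 for k in K),
    "locations x anneals": n * t * sum(
        (m_ik[i][k] - m_i[i] - m_k[k] + grand) ** 2 for i, k in product(I, K)),
    "locations x coils within anneals": n * sum(
        (m_ijk[i][j][k] - m_ij[i][j] - m_ik[i][k] + m_i[i]) ** 2
        for i, j, k in product(I, J, K)),
    "within cells": sum(
        (x[i][j][k][v] - m_ijk[i][j][k]) ** 2 for i, j, k, v in product(I, J, K, V)),
}
total_ss = sum((x[i][j][k][v] - grand) ** 2 for i, j, k, v in product(I, J, K, V))

for source, value in ss.items():
    print(f"{source:35s} {value:12.3f}")
print(f"{'total':35s} {total_ss:12.3f}")
```

The first line printed reproduces the value 2646.000 of (2.7), and the components sum to the total because (2.1) is an orthogonal decomposition.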
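Table 16.3

$$
\begin{array}{llll}
\text{Source of variance} & \text{Sums of squares} & \text{Degrees of freedom} & E[\text{M.S.}] \\ \hline
\text{Annealing methods} & ntu \sum_i^r (\bar{x}_{i\cdot\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot\cdot})^2 & r - 1 & \sigma^2 + \left(1 - \frac{u}{U}\right)\left(1 - \frac{t}{T}\right) n\sigma^2_{\lambda c(\alpha)} + \left(1 - \frac{u}{U}\right) tn\sigma^2_{\lambda\alpha} + \left(1 - \frac{t}{T}\right) un\sigma^2_{c(\alpha)} + tun\sigma^2_{\alpha} \\
\text{Coils within annealing methods} & nu \sum_i^r \sum_j^t (\bar{x}_{ij\cdot\cdot} - \bar{x}_{i\cdot\cdot\cdot})^2 & r(t - 1) & \sigma^2 + \left(1 - \frac{u}{U}\right) n\sigma^2_{\lambda c(\alpha)} + un\sigma^2_{c(\alpha)} \\
\text{Locations} & nrt \sum_k^u (\bar{x}_{\cdot\cdot k\cdot} - \bar{x}_{\cdot\cdot\cdot\cdot})^2 & u - 1 & \sigma^2 + \left(1 - \frac{t}{T}\right) n\sigma^2_{\lambda c(\alpha)} + \left(1 - \frac{r}{R}\right) tn\sigma^2_{\lambda\alpha} + rtn\sigma^2_{\lambda} \\
\text{Locations} \times \text{annealing methods} & nt \sum_i^r \sum_k^u (\bar{x}_{i\cdot k\cdot} - \bar{x}_{i\cdot\cdot\cdot} - \bar{x}_{\cdot\cdot k\cdot} + \bar{x}_{\cdot\cdot\cdot\cdot})^2 & (r - 1)(u - 1) & \sigma^2 + \left(1 - \frac{t}{T}\right) n\sigma^2_{\lambda c(\alpha)} + tn\sigma^2_{\lambda\alpha} \\
\text{Locations} \times \text{coils within annealing methods} & n \sum_i^r \sum_j^t \sum_k^u (\bar{x}_{ijk\cdot} - \bar{x}_{ij\cdot\cdot} - \bar{x}_{i\cdot k\cdot} + \bar{x}_{i\cdot\cdot\cdot})^2 & r(t - 1)(u - 1) & \sigma^2 + n\sigma^2_{\lambda c(\alpha)} \\
\text{Within cells} & \sum_i^r \sum_j^t \sum_k^u \sum_\nu^n (x_{ijk\nu} - \bar{x}_{ijk\cdot})^2 & rtu(n - 1) & \sigma^2
\end{array}
$$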
••• , $l_u$, would be selected, so that locations would be a model II effect.

(c) Instead of the single random sample of locations $l_1, \ldots, l_u$ being taken for all coils for all annealing methods, as in (b), one random sample of locations is taken for the first annealing method, a second random sample of locations is taken for the second annealing method, and so on. All coils for a particular annealing method are sampled at the same locations.

(d) Instead of a single random sample of locations serving for all coils for a particular annealing method, as in (c), a separate random sample of locations is taken for every coil.

For each of these situations, using as far as possible the sample terminology and notation, but where necessary making appropriate changes, list the sources of variance for an analysis of variance table with the corresponding degrees of freedom and the expected values of the mean squares. State the error terms appropriate for testing the null hypotheses that there are no differences between annealing methods and between locations.
REFERENCES

1. Vaurio, V. W., and C. Daniel, "Evaluation of Several Sets of Constants and Several Sources of Variability," Chemical Engineering Progress, 50 (1954), 81-86.
2. Bennett, Carl A., and Norman L. Franklin, Statistical Analysis in Chemistry and the Chemical Industry. New York: John Wiley and Sons, 1954.
CHAPTER 17
Some Simple Experimental Designs
17.1. Completely Randomized Designs

We have in previous chapters developed techniques suitable for analyzing the simpler experimental designs. The simplest experiment perhaps is to compare $r$ levels of a single experimental factor: we may be comparing three brands of gasoline, or four thicknesses of shoe leather, or five quantities of supplement to a hog ration. If we make $n_i$ independent observations on each level of the factor, in a completely random order, the model for the experiment will be

$$x_{i\nu} = \xi + \eta_i + z_{i\nu},$$

and the appropriate analysis will be model I one-way analysis of variance as discussed in Section 10.2. Usually it will be preferable to have the $n_i$ all equal, since then the variances of the differences between any pair will be the same, and also since Tukey's method of multiple comparisons, which requires equal $n_i$ and is more efficient than Scheffé's method for simple comparisons, can then be applied. However, if our objective is to compare all the other levels of the factor with one particular level, the "control," as can be done effectively with Dunnett's technique, it is advantageous to have the $n_i$ for the control equal to $\sqrt{k}$ times the $n_i$ for the other levels, where $k$ is the number of other levels.

The next simplest experiment is to investigate two factors simultaneously in all combinations with $n$ independent replicates per combination. For example, one factor could be brand of gasoline at $r$ levels, and the other mean speed of automobile at $t$ levels. Or one factor could be quantity of supplement to the hog ration and the other factor the breed of hog. The total number of observations required is $rtn$, and these should be obtained in a completely random order. The appropriate analysis will be two-way
analysis of variance as discussed in Chapter 14. The model may be model I, or II, or mixed, depending on the nature of the factors. For example, in the hog example just given, the first factor, the quantity of supplement, is a fixed or model I factor, and the second factor, breed of hog, would be a model I factor if we were interested only in these particular breeds, but a model II factor if we were interested in generalizing to a larger population of breeds.

Table 17.1

Treatments        11    12    13    14    15    21    22    23    24    25    31    32    33    34    35
Random numbers    15    77    01    64  69(2) 69(8)   58    40    81    16    60    20    00    84    22
Ordering           3    13     2    10    11    12     8     7    14     4     9     5     1    15     6
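The ordering row of Table 17.1 is simply the rank of each random number, with the two tied 69's separated by their appended digits. A minimal sketch of that step in Python follows. The function and variable names are illustrative assumptions, and the random numbers are copied from the table rather than drawn afresh.

```python
# Treatments "ij" (jth replicate of the ith treatment) in the systematic
# order of the first row of Table 17.1.
treatments = ["11", "12", "13", "14", "15",
              "21", "22", "23", "24", "25",
              "31", "32", "33", "34", "35"]

# Second row of Table 17.1. The tied 69's carry their tie-breaking digits,
# written here as 69.2 and 69.8.
random_numbers = [15, 77, 1, 64, 69.2, 69.8, 58, 40, 81, 16, 60, 20, 0, 84, 22]


def ordering(numbers):
    """Return the rank (1 = smallest) of each entry of numbers."""
    positions = sorted(range(len(numbers)), key=lambda idx: numbers[idx])
    ranks = [0] * len(numbers)
    for rank, idx in enumerate(positions, start=1):
        ranks[idx] = rank
    return ranks


ranks = ordering(random_numbers)

# Plot p receives the treatment whose random number has rank p.
plot_to_treatment = {rank: trt for rank, trt in zip(ranks, treatments)}

print(ranks)
print(plot_to_treatment[1], plot_to_treatment[2])
```

In a new experiment the list of random numbers would be drawn from a random number table or generator, and the ranking step would be unchanged.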
In discussing the principles of experimental design, it is convenient to have a general word for the basic experimental unit that gives rise to the measurement which we analyze. In the preceding paragraph, a given trip with a particular automobile using a particular gasoline will give rise to a single determination of gasoline consumption and is the basic experimental unit. In the hog feeding example, the individual hog is the experimental unit. The great bulk of the theory of experimental design was developed in agronomy, in which the basic experimental unit is a plot of ground on which is grown a crop and to which are applied fertilizers, etc. It is convenient to use the word plot for basic experimental unit in general.

Randomization of treatments on to plots is best performed with a table of random numbers (Table XI). Suppose that we have an experiment involving five replicates of three treatments. We can denote the $j$th replicate of the $i$th treatment as $ij$. We write out in any order, systematic if convenient, these 3 × 5 = 15 treatments (Table 17.1, first row). Under each treatment we enter a two-digit random number from Table XI. We then order in increasing magnitude these random numbers. The particular set we chose happens to have a tie, two 69's. This tie is resolved by picking two further random digits and attaching one to each of the ties. The next two-digit random number is 28, and so the first 69 is regarded as 69.2 and the second as 69.8. The third row gives the ordering of the random numbers. Then plot 1 receives treatment (33), plot 2 receives treatment (13), etc.

17.2. Randomized-Block Designs

If we are studying a single factor at $t$ levels, with $r$ replicates on each level, it may be possible to arrange the observations in $r$ blocks of $t$ observations. In agricultural experimentation each observation of, say,
yield, comes from a plot of ground, and we may group $t$ adjacent plots to form a block. In executing the experiment, we randomly allocate the $t$ levels to the $t$ plots in the first block, repeat the randomization for the second block, and so on. This is quite different from the completely randomized experiment where there was a single randomization of $rn$ treatments, actually $r$ treatments repeated $n$ times, on to $rn$ plots. Usually the blocks will be regarded as a random or model II factor, and the factor under study will be a model I factor. The appropriate analysis will be a mixed model with one observation per cell. A suitable modification of the model (14.8.1) to this circumstance would be (2.1)
and Table 14.14 with no withincell term and n = 1 becomes Table 17.2. The remainder mean square is an appropriate error term for testing the effect of the factor and for constructing confidence limits for differences. The test of the block effect is only satisfactory if it can be assumed that is small compared to 0'2. This limitation is usually of small import as our main objective is to study the factor.
0':
Source of variance Blocks Factor Remainder
Table 17.2 Mean square Type of effect S2 4 S2 3 S2
II I
2
E[M.S.] a 2 + ta~ a2+a:+ra~ a 2 + a:
The objective of using a randomized-block design instead of a completely randomized arrangement is to reduce experimental error. Adjacent plots clustered together in a block should be more alike in their response to the same treatment than plots at opposite ends of the field. Let $\sigma^2_{rb}$ and $\sigma^2_{cr}$ be the error terms in the two forms. We define the more efficient design, the randomized blocks, as the standard, and the efficiency of the other is defined as

$$I = \frac{\sigma^2_{rb}}{\sigma^2_{cr}}. \tag{2.2}$$

This is a reasonable definition of efficiency, for to get the same accuracy on the comparison of treatment means we would have to use $1/I$ times as many replicates with the less efficient design; e.g., if the efficiency is 0.50, we need 1/0.5 = 2 times as many observations with the inefficient design compared with the efficient design to get the same accuracy.
Suppose that we have $t$ treatments and $r$ blocks, or replicates. Let $E$ be the error mean square and $B$ be the block mean square, and let $F = B/E$. Then it can be shown that the estimated efficiency of the completely randomized design relative to the randomized-block design is

$$\hat{I} = \frac{\hat{\sigma}^2_{rb}}{\hat{\sigma}^2_{cr}} = \frac{E}{[(r - 1)B + r(t - 1)E]/(rt - 1)} = \frac{rt - 1}{F(r - 1) + r(t - 1)}. \tag{2.3}$$
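As a numerical illustration of (2.3), the sketch below evaluates the estimated relative efficiency both from the two mean squares and from the F ratio. The function names and the mean squares used (B = 30, E = 10 with r = 4 blocks and t = 5 treatments) are hypothetical values chosen only to show the arithmetic.

```python
def relative_efficiency(block_ms, error_ms, r, t):
    """Estimated efficiency of the completely randomized design relative to
    randomized blocks, from the middle form of (2.3)."""
    estimated_cr_error = ((r - 1) * block_ms + r * (t - 1) * error_ms) / (r * t - 1)
    return error_ms / estimated_cr_error


def relative_efficiency_from_f(f_ratio, r, t):
    """The same quantity from the right-hand form of (2.3), with F = B/E."""
    return (r * t - 1) / (f_ratio * (r - 1) + r * (t - 1))


# Hypothetical mean squares, for illustration only.
B, E, r, t = 30.0, 10.0, 4, 5
print(relative_efficiency(B, E, r, t))
print(relative_efficiency_from_f(B / E, r, t))
```

With these hypothetical values both forms give 19/25 = 0.76, so about 1/0.76, or roughly 1.3, times as many replicates would be needed in the completely randomized design. When F = 1 the efficiency is exactly 1, i.e., blocking has gained nothing.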
As we would expect, the larger the $F$, the lower is the estimated relative efficiency of the completely randomized design.

If we have two factors, say $B$ and $C$, at $t$ and $u$ levels, and $r$ blocks containing $tu$ plots, we can randomly allocate the $tu$ treatment combinations to the plots in the first block, repeat the process for the second block, etc. If the blocks are regarded as a random effect and the two factors are fixed effects, the appropriate analysis is similar to that discussed in Section 15.3, and the expected values of the mean squares are like those given in the left-hand side of Table 15.3, where $\sigma^2_a$ corresponds to blocks, with the modification that $n = 1$ and there is no within-cells sum of squares. The appropriate error term for the factor $B$ is its interaction with blocks, and analogously for $C$, and the two-way interaction $B \times C$ is to be tested against the remainder mean square. Actually, many practitioners of the art pool the sums of squares and degrees of freedom for all the interactions with blocks, namely $aB$, $aC$, and $aBC$, and use this pooled mean square as an error term. This procedure has implicit in it the assumption that $\sigma^2_{aB} = \sigma^2_{aC} = \sigma^2_{aBC} = 0$. It is not clear in what fields of experimentation such a set of assumptions is or is not valid.

17.3. The Split-Plot Situation
Suppose that we have two factors, say $B$ corresponding to varieties of potato and $C$ to quantity of fertilizer. We might plan the experiment as in the previous section, in $r$ randomized blocks, each block containing $tu$ plots. Alternatively, suppose we group these $tu$ plots into $t$ groups of $u$ plots, and now change the terminology so that the ultimate unit is a subplot and the groups of $u$ subplots are whole plots, the block containing $t$ such whole plots. We now design the experiment as follows. We randomly allocate the $t$ levels of factor $B$ (variety of potato) to the $t$ whole plots in the first block, repeat this procedure for the second block, etc. We now randomly allocate the $u$ levels of factor $C$ (quantity of fertilizer) to the $u$ subplots contained
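in the first whole plot in the first block, repeat this procedure for the second whole plot, etc. If there are three varieties and four fertilizers, the first two blocks might have the arrangement shown in Table 17.3.

Table 17.3

$$
\begin{array}{cc|cc|cc}
\multicolumn{6}{c}{\text{Block I}} \\ \hline
v_3f_4 & v_3f_2 & v_1f_1 & v_1f_3 & v_2f_2 & v_2f_1 \\
v_3f_1 & v_3f_3 & v_1f_2 & v_1f_4 & v_2f_4 & v_2f_3 \\ \hline
\multicolumn{6}{c}{\text{Block II}} \\ \hline
v_1f_1 & v_1f_3 & v_3f_1 & v_3f_4 & v_2f_2 & v_2f_1 \\
v_1f_4 & v_1f_2 & v_3f_2 & v_3f_3 & v_2f_3 & v_2f_4
\end{array}
$$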
Such an arrangement is known as a split-plot design, here in randomized blocks. We would be motivated to use it in the present example if, on the one hand, it was technically inconvenient to plant the varieties in small subplots, switching from variety to variety as we cover the four subplots in a whole plot, whereas on the other hand it was not troublesome to switch from fertilizer to fertilizer going from one subplot to another in the same whole plot. The essential feature of a split-plot design is that instead of the $rtu$ ultimate experimental units being obtained after randomization over the whole number of $rtu$ units, as in a completely randomized design, or being obtained after $r$ separate randomizations over $tu$ units, as in a simple randomized block design, they are obtained by first randomizing the treatments $C$ on to the $u$ subplots, this randomization being performed $rt$ times, and then randomizing the treatments $B$ on to the $t$ whole plots, this randomization being performed $r$ times, once for each of the $r$ blocks. An appropriate model for the experiment of Table 17.3 is

$$x_{ijk} = \xi + b_i + \psi_j + e_{j(i)} + \phi_k + (\phi\psi)_{jk} + (b\phi)_{ik} + z_{ijk}. \tag{3.1}$$

In this model, $b_i$ is the random-block effect, $\psi_j$ is the fixed variety effect, $\phi_k$ is the fixed fertilizer effect, $(\phi\psi)_{jk}$ is their interaction, and $(b\phi)_{ik}$ is the interaction of blocks with fertilizer. The motivation for writing $e_{j(i)}$ in this form is that if the experiment had been run with every whole plot receiving identical varieties and every subplot receiving identical fertilizers, then we would have a simple nested situation. The random variation of whole plots within the blocks is represented by the unrestricted error term $e_{j(i)}$, and the random variation of the subplots within the whole plots
is represented by the unrestricted error term $z_{ijk}$. Let $e$ and $z$ have variances $\omega^2$ and $\sigma^2$, respectively. To obtain the expected values of the mean squares in the analysis of variance we follow the procedure of Section 16.3 and set up Table 17.4.

Table 17.4

$$
\begin{array}{l|ccc}
 & i & j & k \\
 & (r, R) & (t, T) & (u, U) \\ \hline
z_{ijk} & 1 & 1 & 1 \\
(b\phi)_{ik} & 1 - r/R & t & 1 - u/U \\
(\phi\psi)_{jk} & r & 1 - t/T & 1 - u/U \\
\phi_k & r & t & 1 - u/U \\
e_{j(i)} & 1 & 1 & u \\
\psi_j & r & 1 - t/T & u \\
b_i & 1 - r/R & t & u
\end{array}
$$

In general, in split-plot experiments the whole-plot treatments may be either model I or model II, and likewise the subplot treatments, though it is more usual for them both to be model I, as in the present case. Thus here $R = \infty$, so $1 - r/R = 1$, and $t = T$ and $u = U$, so $1 - t/T = 1 - u/U = 0$. We can now write down the expectations of the mean squares in Table 17.5.

Table 17.5

$$
\begin{array}{lll}
\text{Source of variance} & \text{Degrees of freedom} & E[\text{M.S.}] \\ \hline
\text{Blocks} & r - 1 & \sigma^2 + u\omega^2 + tu\sigma^2_b \\
\text{Varieties} & t - 1 & \sigma^2 + u\omega^2 + ru\sigma^2_{\psi} \\
\text{Whole-plot error} & (r - 1)(t - 1) & \sigma^2 + u\omega^2 \\
\text{Fertilizers} & u - 1 & \sigma^2 + t\sigma^2_{b\phi} + rt\sigma^2_{\phi} \\
\text{Varieties} \times \text{fertilizers} & (t - 1)(u - 1) & \sigma^2 + r\sigma^2_{\phi\psi} \\
\text{Blocks} \times \text{fertilizers} & (r - 1)(u - 1) & \sigma^2 + t\sigma^2_{b\phi} \\
\text{Subplot error} & (r - 1)(t - 1)(u - 1) & \sigma^2 \\ \hline
\text{Total} & rtu - 1 &
\end{array}
$$

It is a matter of opinion in any particular instance whether the term $(b\phi)_{ik}$ should or should not be included in the model. It is perhaps more usual to exclude it. In that case its sum of squares and degrees of freedom are pooled with subplot error to form a new estimate of $\sigma^2$ based on $(r - 1)(u - 1) + (r - 1)(t - 1)(u - 1) = t(r - 1)(u - 1)$ degrees of freedom, and of course $t\sigma^2_{b\phi}$ gets stricken out of the expectation of the fertilizer mean square.
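The step from Table 17.4 to Table 17.5 is mechanical and can be sketched in a few lines of Python. The rule applied here is the usual one for tables of this kind and is stated as an assumption, since Section 16.3 is not reproduced in this chapter: for a given source, every model term whose subscripts include all the subscripts of the source contributes its variance component, with coefficient equal to the product of its entries in the columns not belonging to the source. The dictionary layout, the term labels, and the values r = 2, t = 3, u = 4 are illustrative.

```python
r, t, u = 2, 3, 4  # illustrative: 2 blocks, 3 varieties, 4 fertilizers

# Table 17.4 after substituting R = infinity (1 - r/R = 1) and
# t = T, u = U (1 - t/T = 1 - u/U = 0).
# Each row: (subscripts carried by the term, entries in columns i, j, k).
table = {
    "z":       ({"i", "j", "k"}, {"i": 1, "j": 1, "k": 1}),
    "b*phi":   ({"i", "k"},      {"i": 1, "j": t, "k": 0}),
    "phi*psi": ({"j", "k"},      {"i": r, "j": 0, "k": 0}),
    "phi":     ({"k"},           {"i": r, "j": t, "k": 0}),
    "e":       ({"i", "j"},      {"i": 1, "j": 1, "k": u}),
    "psi":     ({"j"},           {"i": r, "j": 0, "k": u}),
    "b":       ({"i"},           {"i": 1, "j": t, "k": u}),
}


def expected_ms(source):
    """Coefficients of the variance components in E[M.S.] for one source."""
    source_subscripts = table[source][0]
    coefficients = {}
    for term, (subscripts, row) in table.items():
        if source_subscripts <= subscripts:      # term carries all source subscripts
            coefficient = 1
            for column in ("i", "j", "k"):
                if column not in source_subscripts:
                    coefficient *= row[column]
            if coefficient:
                coefficients[term] = coefficient
    return coefficients


for source in table:
    print(source, expected_ms(source))
```

For the fertilizer row, for example, this returns coefficients 1, t and rt for the components of z, b*phi and phi, in agreement with the corresponding line of Table 17.5.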
17.4. Relationship of Split-Plot to Partially Hierarchical Situations

In our discussion of the split-plot experiment of the preceding section, nothing in the split-plot concept required that the experiment be run in randomized blocks. As far as the whole-plot part of the experiment is concerned, it might just as well be run as $r$ completely randomized replicates of the $t$ whole-plot treatments. If we omit the block effects from the model (3.1), we get (4.1). If we now refer back to the model (16.1.1) for the partially hierarchical situation discussed in Section 16.1, and modify it to conform to the situation where there is only one replicate per cell by omitting the suffix $\nu$ and the final error term $z_{ijk\nu}$, and by substituting $z_{ijk}$ for $\{\lambda c(\alpha)\}_{kj(i)}$, we get

$$x_{ijk} = \xi + \alpha_i + \{c(\alpha)\}_{j(i)} + \lambda_k + (\lambda\alpha)_{ik} + z_{ijk}, \tag{4.2}$$

which is essentially identical with (4.1). We therefore see that the split-plot situation is an agricultural example of a partially hierarchical classification, and the two can be considered together. We can use the results of Section 16.4 on variances of various types of differences, with the aforementioned modifications. Thus the analog of (16.4.13) becomes

$$V[\bar{x}_{\cdot j\cdot} - \bar{x}_{\cdot j'\cdot}] = \frac{2}{ru}\,(\text{M.S. for whole-plot error}). \tag{4.3}$$

Since, with $\nu = 1$, the $\{\lambda c(\alpha)\}_{kj(i)}$ has been replaced by $z_{ijk}$, (16.4.18) becomes

$$V[\bar{x}_{\cdot\cdot k} - \bar{x}_{\cdot\cdot k'}] = \frac{2}{rt}\,(\text{M.S. for subplot error}), \tag{4.4}$$

and (16.4.22) becomes

$$V[\bar{x}_{\cdot jk} - \bar{x}_{\cdot jk'}] = \frac{2}{r}\,(\text{M.S. for subplot error}). \tag{4.5}$$

From (16.4.24) and (16.4.25),

$$V[\bar{x}_{\cdot jk} - \bar{x}_{\cdot j'k}] = \frac{2}{ru}\,[(u - 1)(\text{M.S. for subplot error}) + (\text{M.S. for whole-plot error})]. \tag{4.6}$$
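Formulas (4.3) to (4.6) can be collected into one small helper. The sketch below is illustrative only: the function name, the dictionary keys and the mean squares in the usage lines are hypothetical and are not taken from any data in the text.

```python
def splitplot_difference_variances(ms_whole, ms_sub, r, t, u):
    """Estimated variances of differences between means in a split-plot
    experiment with r blocks, t whole-plot treatments and u subplot treatments."""
    return {
        # (4.3): two whole-plot treatment (variety) means
        "whole-plot treatments": 2 * ms_whole / (r * u),
        # (4.4): two subplot treatment (fertilizer) means
        "subplot treatments": 2 * ms_sub / (r * t),
        # (4.5): two subplot treatments within the same whole-plot treatment
        "subplot treatments, same whole-plot treatment": 2 * ms_sub / r,
        # (4.6): two whole-plot treatments at the same subplot treatment
        "whole-plot treatments, same subplot treatment":
            2 * ((u - 1) * ms_sub + ms_whole) / (r * u),
    }


# Hypothetical mean squares: subplot error 1.0, whole-plot error 3.0.
variances = splitplot_difference_variances(ms_whole=3.0, ms_sub=1.0, r=2, t=3, u=4)
for comparison, value in variances.items():
    print(f"{comparison:50s} {value:.4f}")
```

If the whole-plot error mean square estimates $\sigma^2 + u\omega^2$ and the subplot error mean square estimates $\sigma^2$, the last entry estimates $2(\sigma^2 + \omega^2)/r$. With the hypothetical values above ($\sigma^2 = 1$, $\omega^2 = 0.5$) that is 1.5, which is what the sketch returns.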
EXERCISES

17.1. Suppose that in a single experiment $k$ treatments are to be compared only with a control treatment, and are not to be compared with each other. Suppose that $n_0$ replications are to be run on the control and $n_t$ replicates are to be run on each of the $k$ other treatments. The total number of replicates $N = n_0 + kn_t$ is fixed. Show that the choice of $n_0$ and $n_t$ such that $n_0 = \sqrt{k}\,n_t$ makes $V[\bar{x}_{0\cdot} - \bar{x}_{i\cdot}]$, averaged over $i$, a minimum. Assume $n_0$ and $n_t$ to be large enough to be regarded as continuous variables.

17.2. An experiment was run to compare three similar magnesium salts, A, B, C, in the production of an antibiotic by fermentation. In the first replication, three fermentations were started, one containing salt A, another salt B, and the third salt C. After five days, samples for analysis were withdrawn from each fermentation, and likewise after six days. The whole operation was repeated a total of four times. The replications should be regarded as blocks. (a) Make an appropriate analysis of variance for these data, and report the F values, with the corresponding degrees of freedom, for the main effects of salt, age (five days versus six), and their interaction. (b) Give 95 per cent confidence limits for (i) salt A − salt B, (ii) six days − five days, (iii) salt A − salt B, both at six days.

Magnesium salt           A                 B                 C
Replication       5 days  6 days    5 days  6 days    5 days  6 days
     1              69      84        91      98        81      86
     2              82      78        75      82        72      77
     3              67      74        78      92        66      79
     4              69      77        85      92        73      81
17.3. A comparison was made of the effects of two levels of temper, half and quarter, on the shearing modulus of elasticity of a certain stainless steel. From each of five random lots of this type of steel three tubes were chosen at random. Each tube was cut in half, and one portion given the quarter-temper treatment and the other portion the half-temper treatment.

Sample, j(i)          1(i)              2(i)              3(i)
Temper, k          k = 1   k = 2     k = 1   k = 2     k = 1   k = 2
Lot, i = 1          1073    1024      1026     990       996     972
     i = 2          1022     962      1009    1005      1150     985
     i = 3           942     900       927     887       942     885
     i = 4          1011    1000      1058    1006      1038    1022
     i = 5          1044    1010      1027    1010      1035    1004

Sum of squares of all observations = 30,013,864. Source: C. W. Muhlenbruch, V. N. Krivobok, and C. R. Mayne, "Mechanical Properties in Torsion and Poisson's Ratio for Certain Stainless Steel Alloys," Proceedings of the American Society for Testing and Materials, 51 (1951), 837-52.
(a) Write down an appropriate linear model and compute the corresponding analysis of variance. Give the expectations of the various mean squares. (b) Test the null hypothesis that the difference between tempering treatments is constant over lots. (c) Give a 95 per cent confidence limit of the difference between the two tempering treatments. (d) Estimate the various components of variance.

17.4. An experiment is run to compare $r$ treatments $\tau_i$, $i = 1, \ldots, r$, in $t$ randomized blocks $b_j$, $j = 1, \ldots, t$. The treatments are randomized on to the plots in each block. Observations are made at the end of 1 year. The same treatments are used on the same plots for a total of $u$ years $y_k$, $k = 1, \ldots, u$, and observations made each year. Thus no further randomization beyond the initial one is used. It is to be assumed, however, that the sequence of $u$ years is in some sense a random sample of all years. The treatments are to be regarded as a model I effect, and the blocks as a model II effect. (a) Write down a suitable linear model for this experiment. (b) Present a table listing the names of the mean squares you would compute, with the corresponding numbers of degrees of freedom and the expectations of the mean squares. (c) Indicate how you would test the main effect for treatments.

17.5. Suppose that the experiment in exercise (17.4) was replicated each year at several places. Let the index on the place symbol be $m$, $m = 1, \ldots, v$. Suppose that the places are (i) a random sample from an infinite population, and (ii) the only places of interest. (a) Write down suitable linear models for these two situations, and (b) present a table listing the names of the mean squares you would compute, with the corresponding degrees of freedom and the expectations of the mean squares for the two situations. (c) Under supposition (i), how would you test the main effect for treatments? (d) Under supposition (ii), how would you test the treatment × place interaction, and assuming that this was zero, the treatment main effect?
Appendix
Table I. The Cumulative Standardized Normal Distribution Function*

$$\Phi(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-z^2/2}\,dz \qquad \text{for } -4.99 \le u \le 0.00$$
"
°00
°01
°oz
003
004
005
°06
°°7
°08
°°9
°O °1 °2 °3 °4
°5000 °4602 °4 207 °3821 °3446
°4960 °4562 °4 168 °3783 °3409
°49 20 °45 22 °4 129 °3745 °337 2
°4880 °44 83 °40go °37°7 °3336
°4 840 °4443 °4052 °3669 °3300
°4801 °4404 °401 3 °3632 °3 264
°4761 °4364 °3974 °3594 °3228
°4721 °4325 °3936 °3557 °3 19 2
°4681 °4286 °3897 °3520 °3156
°4 64 L °4 247 °3 859 °34 83 °3121
°5 06 0708 °9
°3085 02743 02420 °21I9 0184 1
°3050 :27°9 °2389 °20go 0181 4
°3015 °2676 023511 02061 01 788
02 981 02643 °23 27 02033 01 762
°2946 °261I °22 97 02005 °1736
°2912 °2578 02266 °1977 °17 1I
02877 °2546 °2236 01 949 01685
02843 °2514 02206 °19 22 01660
02810 °24113 °2177 011194 01635
°2776 0245 1 °2148 01867 °161I
01562 °1335 °1I31 °°9510 °°7927
°1539 °1314 °11I2 009342 007780
°15 15 °1292 °log3 °°9 17 6 °07636
01 492 °1271 01075 °Ogo12 007493
°1469 °1251 01056 008851 °°7353
°1446 °1230 01038 008691 °°7 21 5
°1423
1°2 1°3  1°4
°1587 °1357 °1I51 00g680 °08°76
008534 °07078
01 401 °1I90 °1003 °08379 006944
°1379 °1I7° °og853 008226 0068u
 1°5 1 06  1°7 1 08  1°9
006681 °05480 004457 °03593 002872
006552 °0537° 004363 003515 002807
°06426 005262 0°4272 003438 002 743
006301 °05 155 004182 003362 002680
006178 °05050 004093 003288 002619
006057 004947 004006 003216 °02559
005938 °04846 °03920 °03144 °02500
005821 °04746 003836 003074 002 44 2
°057°5 °04648 003754 °03005 °02385
005592 004551 °03673 002938 °02330
2°O
0022 75 °017116 °01 39° °01072 00'11198
'02222
002169 001 700 001321 00101 7 °°'7760
°021I8 001659 001287 °O'9go3 °0'7549
002068 001618 °012 55 00'9642 00'7344
002018 °01 578 00'9387 00'7 143
001 9 23 °01500 °01I60 00'8894 00'6756
°01876 °01463 °01I3° 00' 8656 00'6569
001831 001 426
'01222
001 97° °01 539 °OU91 0°'9137 00'6947
00'5868 00'4396 00'3264
00'5543 00'4 145 00'3072 °°'2256 °0' 1641
00'5386 00'4025 °0'2980 °0'2186 °°'1589
00'5234 °O'3go7 °o' 2890 °O'21I8 °0'1538
00'5085 00'3793 °0'28°3 °0'1489
00'4940 00'3611 1 °0'27 18 °°'1988 °0'144 1
00'4799 00'3573 °0' 2635 °°'1926 °0'1395
'011223
°O' 1I07 0°'7888 °O' 5571 °°'3 897 °°'27° 1
°0' 1°7° 0°'7622 °°'5377 °°'375 8 °0'2602
°0' 1°35 °°'736 4 °°'5 19° °°'362 4 °0'25°7
°0'7u 4 00'5009 00'3495 °O' 2415
°°' 1854
°O' 1785 °°' 121 3 00'8162 °°'544 2 °°'3594
°O' 1718 °0'1I66 °°'7 84 1 °0'5223 °°'3446
°0'7532 °0'5012 °0'3304
°°'2252 °°'1458 °0'9345 °°'5934 °°'373 2
°°' 21 57 °0'1395 00'8934 00'5668 °O' 3561
°°'2325 °O' 1434 00'8765 006 53 04 °0'3 179
°0'2216 °°'1366 00'8339 0065042 °0'3° 19

1'0 1'1
2'1
2'2
2°3  2°4 2°5 2 06  2°7 2 08  2°9 3°O 3°1 3°2 3°3
3"4 3°5 3°6
3"7 3°11 3°9 4°O 4° 1 4°2 4°3
4"4 4°5 4°6 4°7 4°8 4°9
'0 1 6210
001 743 °01 355 001044 00'7976
'1210 '1020
00'466 1 00'3467 °0'2555 °0'1866
00' 6037 00'4527 00'3364 °0'2477 °0' 18°7
°0'175°
00'5703 °O'426g 00'3167 °0'2327 °0' 1695
°0'135° °°'96 76 00' 6871 °°'4 834 °°'336 9
°0'13°6 00'9354 00'6637 00'4665 °°'3 248
°0' 1264 °0'go43 00'6410 °0'4501 °0'313 1
00'8740 °0'61go 00'4342 °O' 3018
°O'1I83 00'8447 °°'5976 °°'4 189 °0'29°9
°O' 1I44 00' 8164 °0'5770 °0'4041 °°' 28°3
°0'2326 °°'1591 °°'1°7 8 00'7235 °0'4810
°O' 2241 °O' 1531 °°' 1036 00'6948 00'4615
°°' 21 58 °0'1473 00'9961 00'6673 00'4427
°°' 2°7 8 °°'14 17 00'9574 00'6407 00'4247
'0)2001
°°'1363 00'9201 00' 6152 00'4074
°O' 1926 °0' 13Il 00'8842 °°'5906 °O'3go8
°°'3 167 °0'2066 °0'1335 00'8540 °O' 5413
°°'3°36 °°'1978 °°' 1277 00'8163 °°'5 169
°°'2910 °°' 1894 °°'7 801 °°'4935
°°'27 89 °°' 181 4 °O'1I68 °0'7455 °0'47 12
°°' 2673 °0'1737 °O'Il18 00'7124 00'4498
°0'2561 °0'1662 °°' 1069 00'6807 °°'4 294
00'6503 °°'4°98
°0'2351 °°'15 23 0°'9774 00'6212 °0'39Il
°°'3°9 2 °°'19 19 °o' Il79 00'7 178 °06 4327
°0'2949 °0'1828 °0' 1I23 00' 6827 °064IlI
°0'2813 °°'174 2 °°' 1069 00'6492 °O'3go6
°0'2682 °0'1660 00' 1017 0066173 °06 37"
°°'2558 °O' 1581 00'9680 00'58 6g 006 3525
°0'2439 °0'15°6 00'92Il 00'5580 °06 3348
°°'3398
°0'3241
'0'2112
'052013
°0'13°1 00'7933 °06 4792
°0' 1239 00'7547 00'4554
'012 4°1
'0 4 1222
'0 1 1261
00'8496 00'5669 00'3747 °0'2454 °0'1591 '04 1022
Example: $\Phi(-3.57) = 0.0^{3}1785 = 0.0001785$.
'0 1
2°52
'01101
00'8424 00'6387
'0 1 1001
°0'1653 '0 1 1121
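Entries of Table I can be spot-checked numerically. The sketch below is one way to do so and is not the method by which the table was computed: it evaluates $\Phi(u)$ through the complementary error function in Python's standard `math` module, using the identity $\Phi(u) = \tfrac{1}{2}\,\mathrm{erfc}(-u/\sqrt{2})$. The function name `phi` is an illustrative choice.

```python
import math


def phi(u):
    """Cumulative standardized normal distribution function.

    Uses Phi(u) = erfc(-u / sqrt(2)) / 2, which keeps good relative
    accuracy in the lower tail where Phi(u) is very small.
    """
    return 0.5 * math.erfc(-u / math.sqrt(2.0))


# Spot checks against the table, including its worked example.
for u in (-3.57, -1.96, 0.0, 1.96):
    print(f"Phi({u:5.2f}) = {phi(u):.7f}")
```

The value printed for u = -3.57 agrees with the worked example to the four significant figures tabulated.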
Table I. The Cumulative Standardized Normal Distribution Function (continued)

$$\Phi(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-z^2/2}\,dz \qquad \text{for } 0.00 \le u \le 4.99$$
'0O
'01
'02
'03
'04
'05
'06
'07
'08
'09
'0
'5°4° '5438 '5832 ,6217 ,6591
'5°8o '5478 '5871 ,6255 '6628
'512O '55 17 '59 1o '62 93 ,6664
'5160 '5557 '5948 '633[ ,67°°
'5 199 '5596 '5987 ,6368 ,6736
'5 239 '5636 ,6026 ,64°6 ,6772
'5 279 '5 675 ,6064 ,6443 ,6808
'53 19 '57 14 '6103 ,6480 ,6844
'5359 '5753 '614[ ,6517 ,6879
,695° '7 291 '7611 '79 1O '8186
'6g85 '7324 '7642 '7939 '8212
'701 9 '7357 '7 673 '7967 '8238
'7°54 '7389 '77°3 '7995 ,8264
'7088 '7422 '7734 '8023 ,8289
'7 123 '7454 '7764 '8051 ,8315
'7 157 '7486 '7794 '8078 ,834°
'7 1go '75 17 '782 3 ,8106 ,8365
'7224 '7549 '7852 ,8133 ,8389
hr
'5000 '5398 '5793 '6179 '6554 ,6915 '7 257 '758o '7 881 '81 59 ,8413 ,8643 ,8849 'go32O '91924
,8438 ,8665 ,8869 'go4go '92°73
'846[ ,8686 ,8888 'go658 '92220
,8485 ,87°8 '8go7 'go824 '92364
,85°8 ,8729 ,8925 'go988 '925°7
,8531 ,8749 ,8944 '9 11 49 '9 2647
,8554 ,877° ,8962 ' '9 1309 '9 2785
,8577 '87go '8g80 '91466 '9 2922
,8599 ,8810 ,8997 '91621 '93056
,8621 ,883° 'go147 '9 1774 '93 189
['5 [,6 1'7 1,8 1'9
'933 19 '94520 '95543 '96407 '97 128
'9344 8 '94630 '95637 '96485 '97[93
'93574 '94738 '957 28 '96562 '97 257
'93699 '94845 '958[8 '96638 '973 20
'93822 '94950 '95go7 '967[2 '9738[
'93943 '95053 '95994 '96784 '97441
'94062 '95[54 '96080 '96856 '97500
'94179 '95 254 '96164 '96g26 '97558
'94295 '95352 '962 46 '96g95 '9761 5
'94408 '95449 '96327 '97062 '97670
2'0 2'1 2'2 2'3 2'4
'977 25 '9821 4 '98610 '9 8928 '9'1802
'97778 '982 57 '98645 '98956 '9' 202 4
'97831 '98300 '98679 '98983 '9'2240
'97882 '9 834 1 '98713 '9'0097 '9'2451
'9793 2 '98382 '98745 '9'0358 '9' 2656
'97982 '98030 '980 77 '9812 4 '98169 '98422 '98461 '98500 '98537 '98574 '98778 '98809 '98840 'g887° '98899 '9' 0613 '9'0863 '9'1106 '9'1344 '9'1576 '9' 2857 '9'3053 '9'3 244 '9'343[ '9'361 3
2'5 2,6 2'7 2,8 2'9
'9'37go '9'5339 '9'6533 '9'7445 '9' 8134
'9'3963 '9'5473 '9'6636 '9'75 23 '9' 81 93
'9'4 132 '9'5604 '9'6736 '9'7599 '9' 82 50
'9'4 297 '9'5731 '9' 6833 '9'7 673 '9'8305
'9'4457 '9'5855 '9'6g28 '9'7744 '9'8359
'9'461 4 '9'5975 '9'7° 20 '9'781 4 '9'8411
'9'4766 '9' 6093 '9'7 110 '9'7 882 '9'8462
'9'49 15 '9' 6207 '9'7 197 '9'7948 '9'8511
'9'5060 '9'6319 '9'7 282 '9'8012 '9'8559
'9'5 201 '9'6427 '9'73 65 '9!8074 '9' 8605
3'0 3'1 3'2
3"4
'9' 8650 '9'0324 '9' 3 129 '9' 5166 '9'6631
'9' 8694 '9'°646 '9'3363 '9'5335 '9'6752
'9'8736 '9'0957 '9' 35go '9'5499 '9' 6869
'9'8777 '9' 881 7 '9'1260 '9'1553 '9'3 810 '9'402 4 '9'5658 '9' 5811 '9'6982 '9'709 1
'9'8856 '9,' [836 '9'4230 '9'5959 '9'7 197
'9'8893 '9'2112 '9'44 29 '9' 6103 '9'7 299
'9'8g30 '9'8965 '9'2378 '9' 2636 '9'4623 '9'481b '9' 62 42 '9'6376 '9'7398 '9'7493
'9'8999 '9'2886 '9'499 1 '9'6505 '9'75 85
3'5 3,6 3"7 3,8 3'9
'9'7674 '9'8409 '9'8922 '9'2765 '9'5 1go
'9'7759 '9'8469 '9'8964 '9'3052 '9'5385
'9'7842 '9'7922 '9'8527 '9'8583 '9'0039 '9'0426 '9'33~7 '9'3593 '9'5573 '9'5753
'9'7999 '9' 8637 '9'0799 '9'3848 '9'5926
'9' 8074 '9' 8689 '9'1158 '9'4094 '9' 6092
'9'8146 '9'8739 '9'1504 '9'433 1 '9' 6253
'9' 821 5 '9'8787 '9' 1838 '9'4558 '9'6406
'9'8282 '9'8834 '9' 21 59 '9'4777 '9'6554
'9'8347 '9'8879 '9'2468 '9'4988 '9'6696
4'0 4'1 4'2 4'3 4"4
'9' 6833 '9'7934 '9'8665 '9'1460 '9'4587
'9'6964 '9'8022 '9'872,3 '9' 1837 '9'483 1
'9'7090 '9'7 211 '9'73 27 '9'8106 '9'8186 '9' 8263 '9'8778 '9'8832 '9'8882 '9' 21 99 '9'2545 '9' 2876 '9'5065 '9'5 288 '9'5502
'9'7439 '9'8338 '9'8931 '9' 3 193 '9'5706
'9'7546 '9'84°9 '9'8978 '9'3497 '9'5go2
'9'7 649 '9'7748 '9'8477 '9'8542 '9'°226 '9'°655 '9'37 88 '9'4066 '9' 6089 '9'6268
'9'7843 '9' 8605 '9' 1066 '9'4332 '9'6439
4'5 4,6 4'7 4,8 4'9
'9'6602 '9'7888 '9' 8699 '962067 '965208
'9'6759 '9'7987 '9'8761 '962 453 '965446
'9'6go8 '9'8081 '9'882[ '962822 '965673
'9'7 187 '9' 8258 '9'893[ '963508 '966094
'9'73 18 '9'8340 '9'8983 '96382 7 '966289
'9'7442 '9'8419 '960320 '964[3[ '966475
:9'7561 '9'8494 '960789 '9644 20 '966652
'9'7675 '9'8566 '96 [235 '964696 '96682[
'9'77 84 '9' 8634 '96 [66[ '964958 '96698[
'I
'2 '3 '4 '5 ,6 '7 ,8 '9 ['0 1'[ 1'2 1'3
3"3
'9'7°5 1 '9' 81 72 '9'8877 '963[73 '96588 9
Example: $\Phi(3.57) = 0.9^{3}8215 = 0.9998215$.
* Abridged from Table II of Statistical Tables and Formulas by A. Hald, John Wiley & Sons, New York, 1952.
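The tabulated example can be checked numerically. The following is a minimal sketch, not part of the original table: it assumes Python's standard `math.erfc` as the evaluation route, and the helper names `phi` and `expand` are hypothetical. `expand` spells out the compact notation in which a repeated leading digit is written once with a count, so that 0.9 with a superscript 3 followed by 8215 stands for 0.9998215.

```python
import math


def phi(u: float) -> float:
    """Cumulative standardized normal distribution function.

    Uses the identity Phi(u) = 0.5 * erfc(-u / sqrt(2)).
    """
    return 0.5 * math.erfc(-u / math.sqrt(2.0))


def expand(digit: str, count: int, rest: str) -> float:
    """Expand the compact table notation into an ordinary decimal.

    expand("9", 3, "8215") builds the string "0.9998215".
    """
    return float("0." + digit * count + rest)


if __name__ == "__main__":
    # Upper-tail example quoted under the table.
    print(f"{phi(3.57):.7f}", expand("9", 3, "8215"))
    # Lower-tail counterpart, by symmetry Phi(-u) = 1 - Phi(u).
    print(f"{phi(-3.57):.7f}", expand("0", 3, "1785"))
```

Writing the function with `erfc` rather than `1 + erf` keeps precision in the lower tail, where the probabilities are small.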
Table II. Fractional Points of the t Distribution*

d.f.    0.750    0.900    0.950    0.975    0.990    0.995    0.999
   1    1.000    3.078    6.314   12.706   31.821   63.657  318
   2    0.816    1.886    2.920    4.303    6.965    9.925   22.3
   3    0.765    1.638    2.353    3.182    4.541    5.841   10.2
   4    0.741    1.533    2.132    2.776    3.747    4.604    7.173
   5    0.727    1.476    2.015    2.571    3.365    4.032    5.893

   6    0.718    1.440    1.943    2.447    3.143    3.707    5.208
   7    0.711    1.415    1.895    2.365    2.998    3.499    4.785
   8    0.706    1.397    1.860    2.306    2.896    3.355    4.501
   9    0.703    1.383    1.833    2.262    2.821    3.250    4.297
  10    0.700    1.372    1.812    2.228    2.764    3.169    4.144

  11    0.697    1.363    1.796    2.201    2.718    3.106    4.025
  12    0.695    1.356    1.782    2.179    2.681    3.055    3.930
  13    0.694    1.350    1.771    2.160    2.650    3.012    3.852
  14    0.692    1.345    1.761    2.145    2.624    2.977    3.787
  15    0.691    1.341    1.753    2.131    2.602    2.947    3.733

  16    0.690    1.337    1.746    2.120    2.583    2.921    3.686
  17    0.689    1.333    1.740    2.110    2.567    2.898    3.646
  18    0.688    1.330    1.734    2.101    2.552    2.878    3.610
  19    0.688    1.328    1.729    2.093    2.539    2.861    3.579
  20    0.687    1.325    1.725    2.086    2.528    2.845    3.552
Table II. Fractional Points of the t Distribution* (continued)

d.f.    0.750    0.900    0.950    0.975    0.990    0.995    0.999
  21    0.686    1.323    1.721    2.080    2.518    2.831    3.527
  22    0.686    1.321    1.717    2.074    2.508    2.819    3.505
  23    0.685    1.319    1.714    2.069    2.500    2.807    3.485
  24    0.685    1.318    1.711    2.064    2.492    2.797    3.467
  25    0.684    1.316    1.708    2.060    2.485    2.787    3.450

  26    0.684    1.315    1.706    2.056    2.479    2.779    3.435
  27    0.684    1.314    1.703    2.052    2.473    2.771    3.421
  28    0.683    1.313    1.701    2.048    2.467    2.763    3.408
  29    0.683    1.311    1.699    2.045    2.462    2.756    3.396
  30    0.683    1.310    1.697    2.042    2.457    2.750    3.385

  40    0.681    1.303    1.684    2.021    2.423    2.704    3.307
  60    0.679    1.296    1.671    2.000    2.390    2.660    3.232
 120    0.677    1.289    1.658    1.980    2.358    2.617    3.160
   ∞    0.674    1.282    1.645    1.960    2.326    2.576    3.090
*Abridged from Table 12 of Biometrika Tables for Statisticians, vol. I, edited by E. S. Pearson and H. O. Hartley, Cambridge University Press, Cambridge (1954), and Table III of Statistical Tables for Biological, Agricultural, and Medical Research, R. A. Fisher and F. Yates, Oliver & Boyd, Edinburgh, 1953.
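The last row of Table II (infinite degrees of freedom) coincides with the fractional points of the standardized normal distribution, which gives a quick consistency check on the table. The sketch below is an illustration only and assumes Python's standard `statistics.NormalDist`; the names `P_LEVELS`, `T_INFINITY_ROW`, and `normal_fractional_points` are hypothetical and do not come from the text.

```python
from statistics import NormalDist

# Column headings of Table II and its last (infinite d.f.) row.
P_LEVELS = [0.750, 0.900, 0.950, 0.975, 0.990, 0.995, 0.999]
T_INFINITY_ROW = [0.674, 1.282, 1.645, 1.960, 2.326, 2.576, 3.090]


def normal_fractional_points(levels):
    """Return the standard normal fractional points, rounded to 3 decimals."""
    std = NormalDist()  # mean 0, standard deviation 1
    return [round(std.inv_cdf(p), 3) for p in levels]


if __name__ == "__main__":
    for p, tabled, computed in zip(
        P_LEVELS, T_INFINITY_ROW, normal_fractional_points(P_LEVELS)
    ):
        print(f"P = {p:.3f}  table = {tabled:.3f}  normal = {computed:.3f}")
```

For finite degrees of freedom the standard library has no t quantile function, so a check of the other rows would need a numerical library or a numerical integration of the t density.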
Table III. Fractional Points of the χ² Distribution*

   f      0.005     0.010     0.025      0.05      0.10      0.20      0.30      0.40
   1  0.0000393  0.000157  0.000982   0.00393    0.0158    0.0642     0.148     0.275
   2     0.0100    0.0201    0.0506     0.103     0.211     0.446     0.713      1.02
   3     0.0717     0.115     0.216     0.352     0.584      1.00      1.42      1.87
   4      0.207     0.297     0.484     0.711      1.06      1.65      2.19      2.75
   5      0.412     0.554     0.831      1.15      1.61      2.34      3.00      3.66

   6      0.676     0.872      1.24      1.64      2.20      3.07      3.83      4.57
   7      0.989      1.24      1.69      2.17      2.83      3.82      4.67      5.49
   8       1.34      1.65      2.18      2.73      3.49      4.59      5.53      6.42
   9       1.73      2.09      2.70      3.33      4.17      5.38      6.39      7.36
  10       2.16      2.56      3.25      3.94      4.87      6.18      7.27      8.30

  11       2.60      3.05      3.82      4.57      5.58      6.99      8.15      9.24
  12       3.07      3.57      4.40      5.23      6.30      7.81      9.03      10.2
  13       3.57      4.11      5.01      5.89      7.04      8.63      9.93      11.1
  14       4.07      4.66      5.63      6.57      7.79      9.47      10.8      12.1
  15       4.60      5.23      6.26      7.26      8.55      10.3      11.7      13.0

  16       5.14      5.81      6.91      7.96      9.31      11.2      12.6      14.0
  17       5.70      6.41      7.56      8.67      10.1      12.0      13.5      14.9
  18       6.26      7.01      8.23      9.39      10.9      12.9      14.4      15.9
  19       6.84      7.63      8.91      10.1      11.7      13.7      15.4      16.9
  20       7.43      8.26      9.59      10.9      12.4      14.6      16.3      17.8

  21       8.03      8.90      10.3      11.6      13.2      15.4      17.2      18.8
  22       8.64      9.54      11.0      12.3      14.0      16.3      18.1      19.7
  23       9.26      10.2      11.7      13.1      14.8      17.2      19.0      20.7
  24       9.89      10.9      12.4      13.8      15.7      18.1      19.9      21.7
  25       10.5      11.5      13.1      14.6      16.5      18.9      20.9      22.6

  26       11.2      12.2      13.8      15.4      17.3      19.8      21.8      23.6
  27       11.8      12.9      14.6      16.2      18.1      20.7      22.7      24.5
  28       12.5      13.6      15.3      16.9      18.9      21.6      23.6      25.5
  29       13.1      14.3      16.0      17.7      19.8      22.5      24.6      26.5
  30       13.8      15.0      16.8      18.5      20.6      23.4      25.5      27.4

  35       17.2      18.5      20.6      22.5      24.8      27.8      30.2      32.3
  40       20.7      22.2      24.4      26.5      29.1      32.3      34.9      37.1
  45       24.3      25.9      28.4      30.6      33.4      36.9      39.6      42.0
  50       28.0      29.7      32.4      34.8      37.7      41.4      44.3      46.9
  75       47.2      49.5      52.9      56.1      59.8      64.5      68.1      71.3
 100       67.3      70.1      74.2      77.9      82.4      87.9      92.1      95.8

   f     0.50    0.60    0.70    0.80    0.90    0.95   0.975   0.990   0.995   0.999
   1    0.455   0.708    1.07    1.64    2.71    3.84    5.02    6.63    7.88    10.8
   2     1.39    1.83    2.41    3.22    4.61    5.99    7.38    9.21    10.6    13.8
   3     2.37    2.95    3.67    4.64    6.25    7.81    9.35    11.3    12.8    16.3
   4     3.36    4.04    4.88    5.99    7.78    9.49    11.1    13.3    14.9    18.5
   5     4.35    5.13    6.06    7.29    9.24    11.1    12.8    15.1    16.7    20.5

   6     5.35    6.21    7.23    8.56    10.6    12.6    14.4    16.8    18.5    22.5
   7     6.35    7.28    8.38    9.80    12.0    14.1    16.0    18.5    20.3    24.3
   8     7.34    8.35    9.52    11.0    13.4    15.5    17.5    20.1    22.0    26.1
   9     8.34    9.41    10.7    12.2    14.7    16.9    19.0    21.7    23.6    27.9
  10     9.34    10.5    11.8    13.4    16.0    18.3    20.5    23.2    25.2    29.6

  11     10.3    11.5    12.9    14.6    17.3    19.7    21.9    24.7    26.8    31.3
  12     11.3    12.6    14.0    15.8    18.5    21.0    23.3    26.2    28.3    32.9
  13     12.3    13.6    15.1    17.0    19.8    22.4    24.7    27.7    29.8    34.5
  14     13.3    14.7    16.2    18.2    21.1    23.7    26.1    29.1    31.3    36.1
  15     14.3    15.7    17.3    19.3    22.3    25.0    27.5    30.6    32.8    37.7

  16     15.3    16.8    18.4    20.5    23.5    26.3    28.8    32.0    34.3    39.3
  17     16.3    17.8    19.5    21.6    24.8    27.6    30.2    33.4    35.7    40.8
  18     17.3    18.9    20.6    22.8    26.0    28.9    31.5    34.8    37.2    42.3
  19     18.3    19.9    21.7    23.9    27.2    30.1    32.9    36.2    38.6    43.8
  20     19.3    21.0    22.8    25.0    28.4    31.4    34.2    37.6    40.0    45.3

  21     20.3    22.0    23.9    26.2    29.6    32.7    35.5    38.9    41.4    46.8
  22     21.3    23.0    24.9    27.3    30.8    33.9    36.8    40.3    42.8    48.3
  23     22.3    24.1    26.0    28.4    32.0    35.2    38.1    41.6    44.2    49.7
  24     23.3    25.1    27.1    29.6    33.2    36.4    39.4    43.0    45.6    51.2
  25     24.3    26.1    28.2    30.7    34.4    37.7    40.6    44.3    46.9    52.6

  26     25.3    27.2    29.2    31.8    35.6    38.9    41.9    45.6    48.3    54.1
  27     26.3    28.2    30.3    32.9    36.7    40.1    43.2    47.0    49.6    55.5
  28     27.3    29.2    31.4    34.0    37.9    41.3    44.5    48.3    51.0    56.9
  29     28.3    30.3    32.5    35.1    39.1    42.6    45.7    49.6    52.3    58.3
  30     29.3    31.3    33.5    36.3    40.3    43.8    47.0    50.9    53.7    59.7

  35     34.3    36.5    38.9    41.8    46.1    49.8    53.2    57.3    60.3    66.6
  40     39.3    41.6    44.2    47.3    51.8    55.8    59.3    63.7    66.8    73.4
  45     44.3    46.8    49.5    52.7    57.5    61.7    65.4    70.0    73.2    80.1
  50     49.3    51.9    54.7    58.2    63.2    67.5    71.4    76.2    79.5    86.7
  75     74.3    77.5    80.9    85.1    91.1    96.2   100.8   106.4   110.3   118.6
 100     99.3   102.9   106.9   111.7   118.5   124.3   129.6   135.6   140.2   149.4
* Abridged from Table V of Statistical Tables and Formulas by A. Hald, John Wiley and Sons, New York, 1952.
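The first two rows of Table III can be checked without special software, because the χ² distribution has simple closed forms for 1 and 2 degrees of freedom. The sketch below is an illustration under those standard facts, which are not stated in the table itself: for f = 1 the P point is the square of the standard normal (1 + P)/2 point, and for f = 2 it is −2 ln(1 − P). The function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_point_f1(p: float) -> float:
    """P fractional point of chi-square with 1 degree of freedom.

    A chi-square variable with f = 1 is the square of a standard normal
    variable, so the P point is the square of the normal (1 + P)/2 point.
    """
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return z * z


def chi2_point_f2(p: float) -> float:
    """P fractional point of chi-square with 2 degrees of freedom.

    With f = 2 the distribution is exponential with mean 2, so the
    cumulative distribution function inverts to -2 ln(1 - P).
    """
    return -2.0 * math.log(1.0 - p)


if __name__ == "__main__":
    for p in (0.50, 0.90, 0.95, 0.990, 0.999):
        print(f"P = {p:.3f}  f=1: {chi2_point_f1(p):.3f}  f=2: {chi2_point_f2(p):.3f}")
```

The printed values can be compared with the f = 1 and f = 2 rows of the table, which are given to three significant figures.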
'"
1.72 1.66 1.60 1.55 1.49
1.54 1.47 1.40 1.32 1.24
1.57 1.51 1.44 1.37 1.30
1.61 1.54 1.48 1.41 1.34 1.64 1.57 1.51 1.45 1.38
1.67 1.61 1.54 1.48 1.42 1.77 1.71 1.66 1.60 1.55
1.82 1.76 1.71 1.65 1.60
1.85 1.79 1.74 1.68 1.63
1.88 1.83 1.77 1.72 1.67
1.93 1.87 1.82 1.77 1.72
1.98 1.93 1.87 1.82 1.77
2.05 2.00 1.95 1.90 1.85
2.14 2.09 2.04 1.99 1.94
2.28 2.23 2.18 2.13 2.08
2.49 2.44 2.39 2.35 2.30
2.88 2.84 2.79 2.75 2.71
30 40 60 120
1.59 1.58 1.57 1.56 1.55
1.63 1.61 1.60 1.59 1.58
1.66 1.65 1.64 1.63 1.62
1.69 1.68 1.67 1.66 1.65 1.72 1.11 1.70 1.69 1.68
1.77 1.76 1.75 1.74 1.73 1.82 1.81 1.80 1.79 1.78
1.87 1.86 1.85 1.84 1.83
1.89 1.88 1.87 1.87 1.86
1.93 1.92 1.91 1.90 1.89
1.97 1.96 1.95 1.94 1.93
2.02 2.01 2.00 2.00 1.99
2.09 2.08 2.07 2.06 2.06
2.18 2.17 2.17 2.16 2.15
2.32 2.31 2.30 2.29 2.28
2.53 2.52 2.51 2.50 2.50
2.92 2.91 2.90 2.89 2.89
1.72 1.70
25 26 27 28 29
1.74 1.72 1.70 1.69 1.67 1.13
1.77 1.75 1.79 1.78 1.76 1.74 1.73
1.84 1.83 1.81 1.80 1.78
1.89 1.87 1.86 1.84 1.83
1.94 1.92 1.90 1.89 1.88
1.96 1.95 1.93 1.92 1.91
2.00 1.98 1.97 1.95 1.94
2.04 2.02 2.01 1.99 1.98
2.09 2.08 2.06 2.05 2.04
2.16 2.14 2.13 2.11 2.10
2.25 2.23 2.22 2.21 2.19
2.38 2.36 2.35 2.34 2.33
2.59 2.57 2.56 2.55 2.54
2.97 2.96 2.95 2.94 2.93
20 21 22 23 24
1.15
1.71 1.69 1.67 1.66 1.64
1.73
1.68 1.66 1.64 1.62 1.61
1.75 1.72 1.70
1.18
1.82
1.85 1.81 1.78
1.87 1.84 1.81 1.78 1.76
2.11 2.03 1.96 1.90 1.86
3.14 2.76 2.51 2.34 2.21
1.50 1.42 \.35 1.26 1.17
1.56 1.54 1.53 1.52 1.51
1.64 1.62 1.60 1.59 1.57
i.79 1.75 1.72 1.69 1.67
2.08 2.00 1.93 1.88 1.83
3.12 2.74 2.49 2.32 2.18
63.06 9.48 5.14 3.78
62.79 9.47 5.15 3.79
1.90 1.87 1.84 1.81 1.79
1.92 1.89 1.86 1.84 1.81
1.97 1.94 1.91 1.89 1.86
2.02 1.99 1.96 1.93 1.91
2.06 2.03 2.00 1.98 1.96
2.09 2.06 2.03 2.00 1.98
2.12 2.09 2.06 2.04 2.02
2.16 2.13 2.10 2.08 2.06
2.21 2.18 2.15 2.13 2.11
2.27 2.24 2.22 2.20 2.18
2.36 2.33 2.31 2.29 2.27
2.49 2.46 2.44 2.42 2.40
2.70 2.67 2.64 2.62 2.61
3.07 3.05 3.03 3.01 2.99
3.16 2.78 2.54 2.36 2.23
3.17 2.80 2.56 2.38 2.25
120
60
2.13 2.05 1.99 1.93 1.89
62.53 9.47 5.16 3.80
40
62.26 9.46 5.17 3.82
30
2.16 2.08 2.01 1.96 1.91
15 16 17 18 19
2.18 2.10 2.04 1.98 1.94
2.20 2.12 2.06 2.01 1.96
2.24 2.17 2.10 2.05 2.01
2.28 2.21 2.15 2.10 2.05
2.32 2.25 2.19 2.14 2.\0
2.35 2.27 2.21 2.16 2.12
2.38 2.30 2.24 2.20 2.15
2.41 2.34 2.28 2.23 2.19
2.46 2.39 2.33 2.28 2.24
2.52 2.45 2.39 2.35 2.31
2.61 2.54 2.48 2.43 2.39
2.73 2.66 2.61 2.56 2.52
2.92 2.86 2.81 2.76 2.73
3.29 3.23 3.18 3.14 3.10
10 II 12 13 14
3.19 2.82 2.58 2.40 2.28
3.21 2.84 2.59 2.42 2.30
3.24 2.87 2.63 2.46 2.34
3.27 2.90 2.67 2.50 2.38
3.30 2.94 2.70 2.54 2.42
3.32 2.96 2.72 2.56 2.44
3.34 2.98 2.75 2.59 2.47
3.37 3.01 2.78 2.62 2.51
3.40 3.05 2.83 2.67 2.55
3.45 3.11 2.88 2.73 2.61
3.52 3.18 2.96 2.81 2.69
3.62 3.29 3.07 2.92 2.81
3.78 3.46 3.26 3.11 3.01
4.06 3.78 3.59 3.46 3.36
5 6 7 8 9
62.00 9.45 5.18 3.83
61.74 9.44 5.18 3.84
61.22 9.42 5.20 3.87
60.71 9.41 5.22 3.90
60.19 9.39 5.23 3.92
59.86 9.38 5.24 3.94
59.44 9.37 5.25 3.95
58.91 9.35 5.27 3.98
58.20 9.33 5.28 4.01
57.24 9.29 5.31 4.05
55.83 9.24 5.34 4.11
53.59 9.16 5.39 4.19
49.50 9.00 5.46 4.32
I
24
20
15
12
10
9
8
7
6
39.86 8.53 5.54 4.54
4
3
2
Percentage Points of the F Distribution*: 90 per cent points
2 3 4
~I
Table IV.
1.46 1.38 1.29 1.19 1.00
1.52 1.50 1.49 1.48 1.47
1.61 1.59 1.57 1.55 1.53
1.76 1.72 1.69 1.66 1.63
1.97 1.90 1.85 1.80
~.06
3.10 2.72 2.47 2.29 2.16
63.33 9.49 5.13 3.76
'"
VI
>
"C
4.77 4.69 4.62 4.56 4.51
6.20 6.12 6.04 5.98 5.92
5.87 5.83 5.79 5.75 5.72
5.69 5.66 5.63 5.61 5.59
5.57 5.42 5.29 5.15 5.02
15 16 17 18 19
20 21 22 23 24
25 26 27 28 29
30 40 60 120
co
5.46 5.26 5.10 4.97 4.86
6.94 6.72 6.55 6.41 6.30
10 II 12 13 14
3.86 3.82 3.78 3.75 3.72
3.69 3.67 3.65 3.63 3.61
3.59 3.46 3.34 3.23 3.12
4.29 4.27 4.24 4.22 4.20
4.18 4.05 3.93 3.80 3.69
4.15 4.08 4.01 3.95 3.90
4.83 4.63 4.47 4.35 4.24
7.76 6.60 5.89 5.42 5.08
4.46 4.42 4.38 4.35 4.32
8.43 7.26 6.54 6.06 5.71
10.01 8.81 8.07 7.57 7.21
5 6 7 8 9
3
4
3.25 3.13 3.01 2.89 2.79
3.35 3.33 3.31 3.29 3.27
3.51 3.48 3.44 3.41 3.38
3.80 3.73 3.66 3.61 3.56
4.47 4.28 4.12 4.00 3.89
7.39 6.23 5.52 5.05 4.72
799.5 864.2 899.6 39.00 39.17 39.25 16.04 15.44 15.10 10.65 9.98 9.60
647.8 38.51 17.44 12.22
2
I 2 3 4
~I
3.03 2.90 2.79 2.67 2.57
3.13 3.10 3.08 3.06 3.04
3.29 3.25 3.22 3.18 3.15
3.58 3.50 3.44 3.38 3.33
4.24 4.04 3.89 3.77 3.66
7:15 5.99 5.29 4.82 4.48
921.8 39.30 14.88 9.36 6.52 5.37 4.6"' 4.20 3.87
6.62 5.46 4.76 4.30 3.96 3.72 3.53 3.37 3.25 3.15
6.68 5.52 4.82 4.36 4.03 3.78 3.59 3.44 3.31 3.21 3.12 3.05 2.98 2.93 2.88
3.20 3.12 3.06 3.01 2.96
2.87 2.74 2.63 2.52 2.41
2.97 2.94 2.92 2.90 2.88
3.13 3.09 3.05 3.02 2.99
2.91 2.87 2.84 2.81 2.78
2.75 2.73 2.71 2.69 2.67
3.01 2.97 2.93 2.90 2.87
2.85 2.82 2.80 2.78 2.76
2.75 2.62 2.51 239 2.29
3.29 3.22 3.16 3.10 3.05
3.95 3.76 3.61 3.48 3.38
4.07 3.88 3.73 3.60 3.50
3.41 3.34 3.28 3.22 3.17
6.85 5.70 4.99 4.53 4.20
2.65 2.53 2.41 2.30 2.19
3.85 3.66 3.51 3.39 3.29
1.7!i 1.64 1.48 1.31 1.00 1.87 1.72 1.58 1.43 1.27
1.94 1.80 1.67 1.53 1.39 2.01 1.88 1.74 1.61 1.48 2.07 1.94 1.82 1.69 1.57 2.14 2.01 1.88 1.76 1.64 2.20 2.07 1.94 1.82 1.71 2.31 2.18 2.06 1.94 1.83 2.41 2.29 2.17 2.05 1.94 2.51 2.39 2.27 2.16 2.05
2.57 2.45 2.33 2.22 2.11
1.91 1.88 1.85 1.83 1.81 1.98 1.95 1.93 1.91 1.89 2.05 2.03 2.00 1.98 1.96 2.12 2.09 2.07 2.05 2.03 2.18 2.16 2.13 2.11 2.09 2.24 2.22 2.19 2.17 2.15 2.30 2.28 2.25 2.23 2.21 2.41 2.39 1.36 2.34 2.32 2.51 2.49 2.47 2.45 2.43
2.61 2.59 2.57 2.55 2.53
2.68 2.65 2.63 2.61 2.59
2.09 2.04 2.00 1.97 1.94 2.16 2.11 2.08 2.04 2.01 2.22 2.18 2.14 2.11 2.08 2.29 2.25 2.21 2.18 2.15 2.35 2.31 2.27 2.24 2.21 2.41 2.37 2.33 2.30 2.27 2.46 2.42 2.39 2.36 2.33
2.57 2.53 2.50 2.47 2.44
2.68 2.64 2.60 2.57 2.54
2.77 2.73 2.70 2.67 2.64
2.84 2.80 2.76 2.73 2.70
2.40 2.32 2.25 2.19 2.13 2.46 2.38 2.32 2.26 2.20
2.52 2.45 2.38 2.32 2.27 2.59 2.51 2.44 2.38 2.33 2.64 2.57 2.50 2.44 2.39 2.70 2.63 2.56 2.50 2.45
2.76 2.68 2.62 2.56 2.51
2.86 2.79 2.72 2.67 2.62
2.96 2.89 2.82 2.77 2.72
3.06 2.99 2.92 2.87 2.82
3.08 2.88 2.72 2.60 2.49 3.14 2.94 2.79 H6 2.55 3.20 300 2.85 2.72 2.61 3.26 3.06 2.91 2.78 2.67 3.31 3.12 2.96 2.84 2.73
3.37 3.17 3.02 2.89 2.79
3.42 3.23 3.07 2.95 2.84
6.02 4.85 4.14 3.67 3.33 6.07 4.90 4.20 3.73 3.39 6.12 4.96 4.25 3.78 3.45
6.18 5.01 4.31 3.84 3.51
3.52 3.33 3.18 3.05 2.95
co
3.62 3.43 3.28 3.15 3.05
120
6.23 5.07 4.36 3.89 3.56
60
6.28 5.12 4.42 3.95 3.61
40
6.33 5.17 4.47 4.00 3.67
30
6.43 5.27 4.57 4.10 3.77
24
lOIN 1014 1010 1006 997.2 1001 39.50 39.49 39.48 39.47 39.46 39.46 13.90 13.95 13.99 14.04 14.08 14.12 8.26 8.31 8.36 8.41 8.46 8.51
20 993.1 39.45 14.17 8.56
6.76 5.60 4.90 4.43 4.10
984.9 39.43 14.25 8.66
976.7 39.41 14.34 8.75
968.6 39.40 14.42 8.84
963.3 39.39 14.47 8.90
956.7 39.37 14.54 8.98
948.2 39.36 14.62 9.07
15
12
10
9
7
6.98 5.82 5.12 4.65 4.32
937.1 39.33 14.73 9.20
6
Table IV. Percentage Points of the F Distribution (continued) 97.5 per cent points
1'065° 1'0883 1'11°7 1'1329 1'1549
1'0679 1'°905 1'113° 1'1351 1'1571
'3° '31 '32 '33 '34
1'1593 1'1810 1'2025 1'2239 1'2451
1'1615 1'1832 1'2°47 1'2260 1'2472
1'1636 1'1853 1'2068 1'2281 1'2493
1'1658 1'1875 1'2°9° 1'23°3 1'2514
1'1680 1'1896 1'2324 1'2535
1'17°2 1'1918 1'2132 1'2345 1'2556
1'1723 1'1939 1'2154 1'2366 1'2577
1'1745 1'1961 1'21 75 1'2387 1'2598
1'1767 1'1982 1'2196 1'24°8 1'26'9
1'1788 1'2°°4 1'221 7 1'243° 1'264°
'35 '36 '37 '38 '39
1'2661 1' 287° 1'3°78 1'3284 1'349°
1'2682 1'2891 1'3°98 1'33°5 1'3510
1'27°3 1'2912 1'3119 1'3325 1'3531
1'2724 1'2932 1'3'40 1'3346 1'3551
1'2745 1'2953 1'3161 1'3367 1'3572
1'2766 1'2974 1'3181 1'3387 "359 2
1'2787 1'2995 1'3202 1'34°8 1'36 '3
1'28°7 1'30,6 1'3222 "34 28 "3633
1'2828 1'3°36 1'3243 1'3449 1'3654
1'2849 "3°57 1'3264 1'3469 1'3674
'4° '4 1 '4 2 '43 '44
1'3694 1'3898 1'4101 1'43°3 1'45°5
1'3715 1'3918 1'4121 1'4324 1"45 25
1'3735 1'3939 1'4142 1'4344 1'4545
1'3756 1'3959 1'4162 1'4364 1'4565
1'3776 1'3979 1'4182 1'4384 1'4586
1'3796 1'4°°0 1'4202 1'44°4 1'4606
1'3817 "4020 1'4222 1'4424 1'4626
1'3837 1'4°4° "4243 1'4445 1'4646
1'3857 1'4°6, 1'4263 1'4465 1'4666
1'3878 "4°81 1'4283 "44 85 "4686
'45 '4 6 '47 '4 8 '49
1'47°6 1'4907 1'5108 1'53°8 1'55°8
1'4726 1'4927 I'SI28 1'5328 1'5528
1'4746 1'4947 1'5148 1'5348 1'5548
1'476 7 1'496 7 1'5168 1'5368 1'5568
1'4787 1'4987 1'5188 1'5388 1'5588
1'48°7 1'5007 1'5208 1'54°8 1'5608
1'4827 1'5°27 1'5228 1'5428 1'5628
1'4847 1'5°48 1'5248 "5448 1'5648
1'4867 1'5°68 1'5268 1'5468 1'5668
1'4887 "5°88 1'5288 "5488 1'5688
0'2101
1'2111
1'0122
Example: $2 \arcsin \sqrt{0.296} = 1.1505$.
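The entries of Table V can be reproduced directly from the definition y = 2 arcsin √x, with y in radians. The following is a minimal sketch using Python's standard `math` module; the function name `angular_transform` is hypothetical and chosen only for this illustration.

```python
import math


def angular_transform(x: float) -> float:
    """Return y = 2 * arcsin(sqrt(x)) in radians for a proportion x in [0, 1]."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a proportion between 0 and 1")
    return 2.0 * math.asin(math.sqrt(x))


if __name__ == "__main__":
    # The worked example quoted with the table: x = 0.296.
    print(f"{angular_transform(0.296):.4f}")
    # At x = 0.500 the transform equals pi/2.
    print(f"{angular_transform(0.500):.4f}")
```

The range check reflects that the transformation is defined only for proportions; values outside [0, 1] would otherwise fail inside `math.sqrt` or `math.asin`.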
Table V. $y = 2 \arcsin \sqrt{x}$ (continued)
'000
'001
'002
'003
'004
'005
'006
'007
'008
'009
1'5708 1'5908 1,6108 1,63°8 1'65°9 1,6710 1'69II 1'7 II3 1'7315 1'7518
1'5728 1'5928 1,6128 1,6328 1,6529
1'5748 1'5948 1'6148 1,6348 1,6549
1'5768 1'5968 1,6168 1,6368 1,6569
1'5788 1'5988 1,6188 1,6388 1,6589
1'5808 1,6008 1,6208 1,64°9 1'6609
1'5828 1,6028 1,6228 1,6429 1,6629
1'5848 1'6048 1,6248 1,6449 1,6649
1'5868 1,6068 1,6268 1,6469 1,6669
1'5888 1,6088 1,6288 1,6489 1,6690
1,673° 1,6931 1'7133 1'7335 1'7538
1,675° 1,6951 1'7153 1'7355 1'7559
1,677° 1'6971 1'7173 1'7376 1'7579
1,6790 1,6992 1'7193 1'7396 1'7599
1,6810 1'7° 12 1'7214 1'7416 1'7620
1,683° 1'7°32 1'7234 1'7437 1'7640
1,685° 1'7°52 1'7254 1'7457 1'7660
1,6871 1'7°72 1'7274 17477 1'7681
1,6891 1'7°92 1'7295 1'7498 1'77°1
1'7722 1'7926 1,8132 1,8338 1,8546
1'7742 1'7947 1,8152 1,8359 1,8567
1'7762 1'7967 1'8173 1,8380 1,8588
1'7783 1'7988 1'8193 1,84°0 1,8608
1'7803 1,8008 1,8214 1,8421 1,8629
1'7824 1'8029 1,8235 1,8442 1,865°
1'7844 1,8°