INTRODUCTION TO STATISTICAL INFERENCE
by
Jerome
C. R. Li
Chairman, Department of Statistics Oregon State College
Distributed by Edwards Brothers, Inc., Ann Arbor, Michigan
1957
Copyright 1957 by JEROME C. R. LI. All rights reserved.
International Copyright 1957 by JEROME C. R. LI. All foreign rights reserved.
Composed by The Science Press, Inc., Lancaster, Pennsylvania, U.S.A. Printed by Edwards Brothers, Inc., Ann Arbor, Michigan, U.S.A. Published by JEROME C. R. LI. Distributed by Edwards Brothers, Inc., Ann Arbor, Michigan, U.S.A.
PREFACE
This book is essentially a nonmathematical exposition of the theory of statistics written for experimental scientists. It is an expanded version of lecture notes used for a one-year course in statistics taught at Oregon State College since 1949. Students in this course …

…ȳ > μ₀ + 1.96σ/√n. (The notations < and > stand for "less than" and "greater than," respectively.) It may be simply stated as u < −1.96 and u > 1.96. The inequality u < −1.96 implies that

    (ȳ − μ₀)/(σ/√n) < −1.96.

As both sides of the above inequality are multiplied by σ/√n, the resulting inequality is

    ȳ − μ₀ < −1.96σ/√n.

As μ₀ is added to both sides of the inequality, the result is

    ȳ < μ₀ − 1.96σ/√n.
Therefore, u < −1.96 is the same critical region as ȳ < μ₀ − 1.96σ/√n. Similarly, u > 1.96 is the same critical region as ȳ > μ₀ + 1.96σ/√n. In similar situations hereafter, the quantity u is calculated for the sample, and the critical regions will be stated as u < −1.96 and u > 1.96 if the 5% significance level and two alternative hypotheses are used. Each alternative hypothesis corresponds to a critical region. A test of hypothesis with two alternative hypotheses, and consequently two critical regions, is called a two-tailed test. If there is only one alternative hypothesis (Section 6.1), and consequently only one critical region, the test of hypothesis is called a one-tailed test.
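The equivalence of the two statements of the critical regions can be sketched in a short computation. The numbers μ₀ = 50, σ = 10, and n = 16 below are hypothetical, chosen only for illustration:

```python
import math

def u_statistic(y_bar, mu0, sigma, n):
    """Standardize a sample mean: u = (y_bar - mu0) / (sigma / sqrt(n))."""
    return (y_bar - mu0) / (sigma / math.sqrt(n))

# Hypothetical setup for illustration: mu0 = 50, sigma = 10, n = 16.
mu0, sigma, n = 50, 10, 16
lower = mu0 - 1.96 * sigma / math.sqrt(n)   # y-scale left critical boundary, 45.1
upper = mu0 + 1.96 * sigma / math.sqrt(n)   # y-scale right critical boundary, 54.9

for y_bar in (42.0, 48.0, 50.0, 55.1):
    u = u_statistic(y_bar, mu0, sigma, n)
    # The u-scale and y-scale forms of the critical regions reject
    # exactly the same sample means.
    assert (u < -1.96 or u > 1.96) == (y_bar < lower or y_bar > upper)
```

Whichever scale is used, the same samples fall inside the critical regions; the u-scale is simply more convenient because the boundaries ±1.96 do not change from problem to problem.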
6.8 Assumptions
The assumptions are the conditions under which a test of hypothesis is valid. In the example used in Sections 6.3 and 6.4, the most important assumption is that the sample is random. If the sample is deliberately selected so that the sample mean is close to or far from the hypothetical population mean, in order to accept or reject the hypothesis, the objectivity and therefore the validity of the test are completely destroyed. A random sample is a sample drawn from the population so that every observation in the population has an equal chance of being drawn. Another assumption is that the population is normal. Only a normal population produces normally distributed sample means (Theorem 5.2b).
But the sample size of 16 is large enough to insure the approximate normal distribution of the sample means (Theorem 5.2a) even if the population is not normal. Therefore this assumption is minor as compared to the assumption that the sample is random.

6.9 Procedures
The procedures of a test of hypothesis may be illustrated by an example. A random sample of 25 observations is used to test the hypothesis that the population mean is equal to 145 at the 1% level. The population standard deviation is known to be 20. The procedures are as follows: (1) Hypothesis: The hypothesis is that the population mean is equal to 145, that is, μ₀ = 145. (2) Alternative hypotheses: The alternative hypotheses are that (a) the population mean is less than 145, and (b) the population mean is greater than 145. (3) Assumptions: The assumptions are that (a) the sample is random (important), (b) the population is normal (minor), and (c) the population standard deviation is known. (4) Level of significance: The chosen significance level is 1%. (5) Critical regions: The critical regions are where (a) u < −2.576 and (b) u > 2.576 (Table 3, Appendix). (6) Computation of statistic: μ₀ = 145, n = 25 (given)
    Σy = 3,471 (the 25 observations are given but are not listed here)
    ȳ = 3,471/25 = 138.84
    ȳ − μ₀ = 138.84 − 145 = −6.16
    σ = 20 (given)
    σ/√n = 20/√25 = 4
    u = (ȳ − μ₀)/(σ/√n) = −6.16/4 = −1.54
which is outside the critical regions. (7) Conclusion: The population mean is equal to 145.
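The seven-step procedure can be sketched in a few lines of code; only the totals given in the text are used, since the 25 observations themselves are not listed:

```python
import math

def u_test(y_bar, mu0, sigma, n, critical=2.576):
    """Two-tailed u-test: return the statistic and whether the hypothesis is rejected."""
    u = (y_bar - mu0) / (sigma / math.sqrt(n))
    return u, (u < -critical or u > critical)

# Given quantities from the example: mu0 = 145, n = 25, sigma = 20, sum of y = 3471.
y_bar = 3471 / 25                       # = 138.84
u, rejected = u_test(y_bar, mu0=145, sigma=20, n=25)
print(round(u, 2))                      # -1.54, outside the critical regions
print(rejected)                         # False: the hypothesis is accepted
```

The function simply mechanizes steps (5) through (7); the scientific content of the test lies in steps (1) through (4), which no computation can supply.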
6.10 Remarks
The statistical test of hypothesis can be used only on numerical data representing either measurements, such as the temperature of a room, height of a person, etc., or counts, such as the number of insects on a
leaf or the number of books on a shelf. Before any statistical methods can be used, the information obtained must be expressed in numbers. The scientist's first problem is to devise ways of measurement. Before the thermometer was invented, temperature was described by words such as hot, warm, cold, and cool. The intangible is made tangible by the thermometer. The I.Q. is the psychologist's attempt to make intelligence tangible. As human knowledge advances, more and more intangible qualities are being made tangible quantities. The test of hypothesis is widely used even without expressing the information in numbers. Legal procedure in the United States and in many other countries is an example of a test of hypothesis. During a trial, the defendant is considered innocent; that is, his presumed innocence is merely a hypothesis which is subject to rejection.
As a matter
of fact, perhaps, the police, the district attorney, and the grand jury may already consider him guilty. An alternative hypothesis, therefore, is that he is guilty. To extend the analogy, the witnesses and exhibits are the observations of a sample. How much evidence is sufficient to convict a person depends on the jury's judgment. In other words, the jury determines the critical region. When the trial is completed, the jury has to decide, after deliberation, whether the defendant is innocent or guilty, that is, whether to accept or reject the hypothesis. If an innocent man is found guilty, the Type I error is committed. If the jury wants a great deal of evidence to convict a defendant, the probability of committing a Type I error is reduced, but because of this, a guilty person may escape punishment and thus a Type II error is committed. If the jury convicts the defendant on flimsy evidence to prevent a possibly guilty person from escaping punishment, an innocent person may be convicted and thus the Type I error is committed. The probability of committing both kinds of errors can be reduced only by increasing the sample size, which means the presentation of more evidence in court. With this analogy in mind, the reader may achieve better understanding of the two kinds of errors if he will read this chapter again.
EXERCISES
(1) A random sample of 16 observations was drawn from the basketful of tags, which is a normal population with mean equal to 50 and standard deviation equal to 10. The observations of the sample are as follows:

    62  43  60  49  72  36  45  46
    37  56  41  43  56  45  56  49

Let us pretend that the population mean is unknown. Using the 5% significance level, test the hypothesis that the population mean is equal to (a) 40, (b) 49, (c) 50, (d) 51, and (e) 60. This is a two-tailed test. Following the procedures given in Section 6.9, write a complete
report for (a) only. For (b), (c), (d), (e), just compute u and state the conclusions. Since the population mean is actually known to be 50, it can be determined whether a conclusion is right or wrong. For each of the five cases state whether the conclusion is correct, or a Type I error is made, or a Type II error is made. This exercise is intended to show that a Type II error is likely to be committed if the hypothetical population mean is close to the true population mean. [(a) u = 3.90; no error. (b) u = 0.30; Type II error. (c) u = −0.10; no error. (d) u = −0.50; Type II error. (e) u = −4.10; no error.]
(2) A random sample of 2,500 observations was drawn, with replacement, from the basketful of tags. The sample mean is 49.9. Using the 1% significance level, test the same five hypotheses of Exercise (1). Are the conclusions different from those obtained in Exercise (1)? This exercise is intended to show that the probability of committing both Type I and Type II errors can be reduced at the same time by increasing the sample size.
(3) A random sample of 25 observations was drawn from the basketful of tags (μ = 50, σ = 10). The sample mean was found to be 54. Using both the 1% and 5% significance levels, test the hypothesis that the population mean is equal to (a) 50 and (b) 49. There are four tests altogether. For each test, compute u and state the conclusion. Since the population mean is actually known to be 50, it can be determined whether the conclusion is right or wrong. For each of the four cases, state whether the conclusion is correct, or whether a Type I error or a Type II error is made. This exercise is intended to show that a change in the significance level without a change in the sample size gains something and also loses something. The use of the 5% significance level is more likely to lead to a Type I error, and less likely to lead to a Type II error, than the use of the 1% significance level.
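As a check, the bracketed u-values of Exercise (1) can be computed directly from the 16 observations:

```python
import math

# The 16 observations of Exercise (1); sigma = 10 is known, so sigma/sqrt(n) = 2.5.
obs = [62, 37, 43, 56, 60, 41, 49, 43, 72, 56, 36, 45, 45, 56, 46, 49]
y_bar = sum(obs) / len(obs)             # 49.75
se = 10 / math.sqrt(len(obs))           # 2.5

results = {}
for mu0 in (40, 49, 50, 51, 60):
    u = (y_bar - mu0) / se
    results[mu0] = round(u, 2)

print(results)   # {40: 3.9, 49: 0.3, 50: -0.1, 51: -0.5, 60: -4.1}
```

Note how the hypotheses close to the true mean of 50 produce u-values well inside (−1.96, 1.96), which is exactly why the Type II errors occur for (b) and (d).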
QUESTIONS
(1) Define the following terms: (a) Hypothesis (b) Assumption (c) Type I error (d) Type II error (e) Significance level (f) Critical region.
(2) The quantities 5% and 1% are used repeatedly in this chapter. They refer to the percentages of what?
(3) What are the mean and variance of the u's?
(4) What is the consequence of reducing the significance level from 5% to 1% without changing the sample size?
(5) What is the consequence of increasing the sample size without changing the significance level?
(6) What is the consequence of using a zero percent significance level? What does this mean in terms of the analogy given in Section 6.10?
(7) Does one need a large or a small sample to reject a false hypothesis which is very close to the true one? Why?
(8) If a hypothesis is already rejected by a sample of 10 observations, is it likely to be rejected or accepted by a sample of 100 observations? Why?
(9) When one tests a hypothesis with the same significance level, is a large sample or a small sample more likely to cause the rejection of the hypothesis? Why?
(10) Regardless of the sample size, one can test a hypothesis. Why does one prefer a large sample?
CHAPTER 7
SAMPLE VARIANCE; χ²-DISTRIBUTION
Chapters 5 and 6 collectively deal with the deductive and inductive relations between a population and its samples. The deductive relation is shown in Chapter 5, which describes the characteristics of the sample means drawn from a population. The direction is from the population to the samples. Chapter 6, on the other hand, in showing how a single sample can be used to test a hypothesis about the population, illustrates the inductive relation between a population and its samples: the direction is from a sample to the population. Furthermore, in Chapters 5 and 6 the center of discussion is the mean. Now, in Chapter 7, all of what is described above is repeated, but the center of discussion is shifted to the variance.
7.1 Purposes of Studying Sample Variance
The most obvious reason for the study of the sample variance is to acquire knowledge about the population variance. A second reason, however, is at least as important as the first. It is that the knowledge of the variance is indispensable even if one's interest is in the mean. The u-test is introduced in Section 6.7 to test the hypothesis that the population mean is equal to a given value, where

    u = (ȳ − μ)/(σ/√n).    (1)

It can be seen from the above equation that the population standard deviation σ must be known before this test can be used. But σ usually is unknown, and this test therefore has a very limited use. To remove this limitation, it is essential to find a way to estimate σ or σ² from a sample. In other words, whether one's interest is in the population mean or in the population variance, the knowledge of the sample variance is indispensable.
7.2 Sample Variance
The first problem in studying sample variance is to determine what the sample variance should be to provide a good estimate of the population variance. A hint can be obtained from the sample mean. The population mean is

    μ = Σy/N = (y₁ + y₂ + … + y_N)/N    (1)
and the population variance is

    σ² = Σ(y − μ)²/N.    (2)

…

n = 5; ȳ = Σy/n = 2. Σ(y − ȳ)² = 4; s² = Σ(y − ȳ)²/(n − 1) = 1.
Σy² − (Σy)²/n = 24 − (10)²/5 = 24 − 20 = 4
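The agreement of the two ways of computing SS can be sketched with a hypothetical sample chosen so that Σy = 10 and Σy² = 24, as in the totals above (the individual observations are not given in the text, so the values below are illustrative only):

```python
# Hypothetical sample chosen so that sum(y) = 10 and sum(y^2) = 24,
# matching the totals in the example above.
obs = [1, 1, 2, 3, 3]
n = len(obs)
y_bar = sum(obs) / n                                        # = 2.0

ss_definition = sum((y - y_bar) ** 2 for y in obs)          # sum of squared deviations
ss_shortcut = sum(y * y for y in obs) - sum(obs) ** 2 / n   # 24 - 100/5 = 4

assert ss_definition == ss_shortcut == 4
s2 = ss_definition / (n - 1)                                # s^2 = SS/(n - 1)
print(s2)                                                   # 1.0
```

The shortcut form avoids computing the individual deviations, which is why it was the preferred hand-computation method.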
7.5 χ²-Distribution
The normal distribution is described in Chapter 3 as one of the most important distributions in statistics. In this section another important frequency distribution, called the χ²-distribution, is described. This distribution is closely related to the normal distribution and will be introduced through reference to the normal distribution. If all possible samples of size n are drawn from a normal population with mean equal to μ and variance equal to σ², a sample mean ȳ can be computed from each sample. Theorem 5.2b states that the distribution of these sample means follows the normal distribution. However, from each sample one can compute not only the mean, but also other statistics, such as the sum Σy and variance s². If, for each sample, the statistic Σu², where u = (y − μ)/σ (Equation 1, Section 3.2), is computed, the value of Σu², like any other statistic, will change from sample to sample. The fluctuation of the values of Σu² can be illustrated by the four random samples given in
Table 4.2. These samples are drawn from the tag population, which is a normal population with mean equal to 50 and variance equal to 100 (Section 4.1). The first sample consists of the observations 50, 57, 42, 63, and 32, and the value of Σu² for this sample is

    Σu² = ((50−50)/10)² + ((57−50)/10)² + ((42−50)/10)² + ((63−50)/10)² + ((32−50)/10)²
        = [(50−50)² + (57−50)² + (42−50)² + (63−50)² + (32−50)²]/100
        = (0 + 49 + 64 + 169 + 324)/100 = 606/100 = 6.06.
The second sample consists of the observations 55, 44, 37, 40, 52, and the value of Σu² is

    Σu² = [(55−50)² + (44−50)² + (37−50)² + (40−50)² + (52−50)²]/100
        = (25 + 36 + 169 + 100 + 4)/100 = 334/100 = 3.34.
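These Σu² values can be verified directly:

```python
def sum_u_squared(obs, mu=50, var=100):
    """Sum of the squared standardized deviations from the population mean."""
    return sum((y - mu) ** 2 for y in obs) / var

print(sum_u_squared([50, 57, 42, 63, 32]))   # 6.06
print(sum_u_squared([55, 44, 37, 40, 52]))   # 3.34

# Using only the first two observations of each sample shows how
# the value shrinks with the sample size (see the next paragraph).
print(sum_u_squared([50, 57]))               # 0.49
print(sum_u_squared([55, 44]))               # 0.61
```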
If all possible samples are drawn from a normal population, each sample will have its own value of Σu². The distribution of these values of Σu² is called the χ²-distribution. The value of Σu² is influenced not only by the change of observations from sample to sample, but also by n, the sample size. For example, if the first sample consists of only the first two of its five observations, the value of Σu² is

    Σu² = [(50−50)² + (57−50)²]/100 = 49/100 = 0.49,

instead of 6.06 for the five observations. If the second sample consists of only its first two observations instead of five, the value of Σu² is

    Σu² = [(55−50)² + (44−50)²]/100 = 61/100 = 0.61,
instead of 3.34. On the average, Σu² is larger if the sample size is larger. Therefore, for each sample size, there will be a different χ²-distribution, the mean of which increases with the sample size. Consequently, the χ²-distribution is not represented by a single frequency curve but by a family of curves. What uniquely identifies a particular curve is the mean of the distribution. This mean, which is denoted by ν, is called the number of degrees of freedom (for which the abbreviation is
d.f. or DF) of the χ²-distribution. The reason for adopting this name for the mean is not explained here, but its meaning will be revealed as the subject develops. The curves for the χ²-distributions with 1, 4, and 5 degrees of freedom are shown in Fig. 7.5a.

[Fig. 7.5a: frequency curves of the χ²-distributions with 1, 4, and 5 degrees of freedom]
The discussion in this section may be summarized in the following theorem:
Theorem 7.5 If all possible samples of size n are drawn from a normal population with mean equal to μ and variance equal to σ², and for each sample Σu² is computed, where

    Σu² = Σ(y − μ)²/σ²,    (2)

the frequency distribution of Σu² follows the χ²-distribution with n degrees of freedom (that is, ν = n).
Elaborate mathematics must be used to derive the χ²-distribution from the normal population. The theorem, however, can be verified experimentally by reference to the sampling experiment described in Chapter 4. Briefly, 1000 random samples, each consisting of five observations, are drawn from the tag population, which is a normal population with mean
SAMPLE VAHIANCE~DlSTRIBUTI0N
68
Ch.7
equal to 50 and variance equal to 100. For each sample, the value of Σu² is computed. The values for four such samples are shown in Table 4.2. The frequency table of the 1000 values of Σu² is given in Table 7.5, where both the theoretical and the observed relative frequencies are shown. The theoretical relative frequency is the would-be frequency if all possible samples of size 5 were drawn, and the observed relative frequency is that obtained from the 1000 samples. It can be seen from
TABLE 7.5

Σu²       Observed       Observed     Theoretical    Midpt.
class     frequency f    r.f. (%)     r.f. (%)       m          mf
0-1           35            3.5           3.7         0.5        17.5
1-2          109           10.9          11.3         1.5       163.5
2-3          148           14.8          14.9         2.5       370.0
3-4          171           17.1          15.1         3.5       598.5
4-5          139           13.9          13.4         4.5       625.5
5-6          106           10.6          11.1         5.5       583.0
6-7           77            7.7           8.6         6.5       500.5
7-8           64            6.4           6.4         7.5       480.0
8-9           53            5.3           4.7         8.5       450.5
9-10          28            2.8           3.4         9.5       266.0
10-11         18            1.8           2.4        10.5       189.0
11-12         20            2.0           1.7        11.5       230.0
12-13         11            1.1           1.1        11.5       126.5
13-14          6             .6            .8        13.5        81.0
14-15          8             .8            .5        14.5       116.0
Over 15        7             .7           1.0        18.1       126.7
Total       1000          100.0         100.1                  4924.2

Mean of Σu² = Σmf/n = 4924.2/1000 = 4.9
(This sampling experiment was done cooperatively by about 75 students at Oregon State College in the Fall of 1949.)
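The sampling experiment can be replayed in code. This is only a sketch: the pseudo-random samples will not reproduce the 1949 class results exactly, only their general behavior around the theoretical values:

```python
import random

random.seed(1)                        # reproducible run of the experiment
MU, SIGMA = 50, 10                    # tag population: mean 50, variance 100

values = []
for _ in range(1000):
    sample = [random.gauss(MU, SIGMA) for _ in range(5)]
    values.append(sum((y - MU) ** 2 for y in sample) / SIGMA ** 2)

mean = sum(values) / len(values)
# Theorem 7.5: the mean of the chi-square distribution equals n = 5.
assert 4.5 < mean < 5.5
# About 5% of the values should exceed 11.0705 (Table 4, 5 d.f., 5% point).
tail = sum(v > 11.0705 for v in values) / 1000
assert 0.02 < tail < 0.09
print(round(mean, 2), tail)
```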
Table 7.5 that the theoretical and observed frequencies fit closely but not exactly. The theoretical frequency is based on all possible samples of size 5, while the observed frequency is based on only 1000 samples. Therefore the theoretical and observed frequencies are not expected to agree perfectly. The close agreement between the theoretical and observed frequencies can also be seen in Fig. 7.5b, which shows the histogram of the distribution of the 1000 Σu² values and the theoretical frequency curve of the χ²-distribution with 5 degrees of freedom. The fact that the mean of Σu² is equal to the sample size n can also be shown by Table 7.5. The mean of the 1000 values of Σu² could be easily found. Unfortunately, however, the individual value loses its identity after the frequency table is made. Even so, the approximate mean can be found from
the frequency table by considering any value of Σu² in the class 0-1 as 0.5, any value in the class 1-2 as 1.5, etc. These midpoints of the various classes are designated by m in Table 7.5. For the class "over 15," the mean, 18.1, of the 7 values of Σu² in that class is used for m. Then the approximate mean of Σu² is
    Σmf/Σf = 4924.2/1000 = 4.9,
which is approximately equal to the sample size 5. If all possible samples were drawn from the normal population, it can be proved mathematically that the mean of the distribution is exactly 5. This completes the experimental verification of Theorem 7.5. Like the tables showing the relative cumulative frequency for the normal distribution (Section 3.2), tables for the χ²-distribution are also available. An abbreviated table is shown in Table 4, Appendix. Each line of the table represents a different number of degrees of freedom (d.f.), such as 1, 2, …, 100, shown in the extreme left column. Each column shows a different percentage. For example, the tabulated value corresponding to 10 d.f. and 5% is 18.3070. This means that 5% of the χ²-values with 10 d.f. are greater than 18.3070. The χ²-value corresponding to 5 d.f. and 5% is 11.0705. This means that 5% of the χ²-values
with 5 d.f. are greater than 11.0705. This value may be compared with that obtained by the sampling experiment. From Table 7.5, it can be seen that 5.2% (i.e., 2.0 + 1.1 + .6 + .8 + .7) of the 1000 values of Σu² are greater than 11. This percentage is approximately equal to 5%, as expected.

7.6 Distribution of u²
It can be deduced from Theorem 7.5 that u², being Σu² when n = 1, follows the χ²-distribution with 1 degree of freedom. The distribution curves of u and u² are shown in Fig. 7.6a and 7.6b. The distribution of
[Fig. 7.6a: the normal curve of u, with 2.5% of the area beyond −1.96 and 2.5% beyond 1.96]
u is the normal distribution with mean equal to 0 and variance equal to 1. The distribution of χ² with 1 degree of freedom, being u², is the doubled-up version of u (Fig. 7.6a and 7.6b), because (−u)² = u². The square of any value between 0 and 1 is a value between 0 and 1. Likewise, the square of any value between 0 and −1 is a value between 0 and 1. For example, (0.5)² = 0.25 and (−0.5)² = 0.25. The 68% of u-values lying between −1 and 1 yield the 68% of u²-values between 0 and 1. The square of any value greater than 1.96 is greater than (1.96)² or 3.84, and the square of any value less than −1.96 is also greater than 3.84. For example, either 2² or (−2)² is 4, which is greater than 3.84. Since there are 2.5% of u-values greater than 1.96 and 2.5% of u-values less than −1.96, a total of 5% of u²-values are greater than 3.84. Therefore, there are 68% of χ²-values
[Fig. 7.6b: the χ²-curve with 1 d.f.; 5% of the area lies beyond 3.84 = (1.96)²]
between 0 and 1, and 5% of χ²-values greater than 3.84 (cf. Table 4, Appendix). It should be noted from Fig. 7.6a and 7.6b that the middle portion of the curve of u becomes the left tail of the χ²-curve with 1 d.f. The fact that u² follows the χ²-distribution with 1 d.f. is frequently recalled in later chapters. The above discussion about u and u² may be summarized in the following theorem:
Theorem 7.6 If a statistic u follows the normal distribution with mean equal to zero and variance equal to 1, u² follows the χ²-distribution with 1 degree of freedom (that is, ν = 1).

7.7 Distribution of SS/σ²
In the two preceding sections it is shown that

    Σu² = Σ(y − μ)²/σ²    (1)
follows the χ²-distribution with n degrees of freedom and u² follows the
χ²-distribution with 1 degree of freedom. This section shows that SS/σ² follows the χ²-distribution with n − 1 degrees of freedom, where

    SS = Σ(y − ȳ)² = (y₁ − ȳ)² + (y₂ − ȳ)² + … + (yₙ − ȳ)².    (2)

The two quantities Σu² and SS/σ² are not the same. It can be observed from Equations (1) and (2) that the former quantity deals with the deviations of the observations of a sample from the population mean, while the latter quantity deals with the deviations of the observations from the sample mean. Thus they are not the same, yet they are related. The relation is as follows:

    Σ(y − μ)²/σ² = Σ(y − ȳ)²/σ² + n(ȳ − μ)²/σ².    (3)
The algebraic proof of the above identity is given in Section 7.8. In this section, the relation is verified numerically. For example, a sample of five observations, 50, 57, 42, 63, 32, is drawn from a normal population with mean μ equal to 50 and variance σ² equal to 100. Each of the three quantities in Equation (3) can be calculated for this sample, with the details of the computation shown in Table 7.7a. The result is that

    606/σ² = 598.80/σ² + 5(1.44)/σ².

Since 606 = 598.80 + 7.20, this example verifies the identity shown in Equation (3).
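The numerical verification of Equation (3) can be sketched as:

```python
mu, var = 50, 100
obs = [50, 57, 42, 63, 32]
n = len(obs)
y_bar = sum(obs) / n                                   # 48.8

total = sum((y - mu) ** 2 for y in obs) / var          # 6.06,  left side of (3)
within = sum((y - y_bar) ** 2 for y in obs) / var      # 5.988, SS/sigma^2
between = n * (y_bar - mu) ** 2 / var                  # 0.072, n(y_bar - mu)^2/sigma^2

# Equation (3): the two right-hand terms add up to the left side.
assert abs(total - (within + between)) < 1e-9
print(total)                                           # 6.06
```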
TABLE 7.7a

  y      y − μ     (y − μ)²     y − ȳ      (y − ȳ)²
 50        0           0          1.2         1.44
 57        7          49          8.2        67.24
 42       −8          64         −6.8        46.24
 63       13         169         14.2       201.64
 32      −18         324        −16.8       282.24

244                  606            0       598.80
 Σy            Σ(y − μ)²      Σ(y − ȳ)        SS

n = 5; ȳ = 48.8; μ = 50; (ȳ − μ)² = 1.44
The left side of Equation (3) is shown to follow the χ²-distribution with n degrees of freedom in Section 7.5. The term on the extreme right
of Equation (3) is

    n(ȳ − μ)²/σ² = ((ȳ − μ)/(σ/√n))² = u²    (4)

(Equation 1, Section 6.7), which is shown to follow the χ²-distribution with 1 degree of freedom in Section 7.6. Then it is reasonable to expect that the middle term

    Σ(y − ȳ)²/σ² = SS/σ²    (5)
of Equation (3) follows the χ²-distribution with n − 1 degrees of freedom. This expectation can be verified by the sampling experiment described in Chapter 4. Briefly, 1000 random samples, each consisting of five observations, are drawn from a normal population with mean equal to 50 and variance equal to 100. For each sample, the value SS is calculated by the method given in Section 7.4. The SS-values of four such samples are given in Table 4.2. Since σ² is equal to 100, it is an easy matter to obtain SS/σ² once SS is calculated. The frequency table of the 1000 values of SS/σ² is given in Table 7.7b. The theoretical relative frequency given in the same table is that of the χ²-distribution with 4 degrees of freedom. TABLE 7.7b
SS/σ²     Observed       Observed     Theoretical    Midpt.
class     frequency f    r.f. (%)     r.f. (%)       m          mf
0-1           93            9.3           9.0         0.5        46.5
1-2          181           18.1          17.4         1.5       271.5
2-3          189           18.9          17.8         2.5       472.5
3-4          152           15.2          15.2         3.5       532.0
4-5          116           11.6          11.9         4.5       522.0
5-6           86            8.6           8.8         5.5       473.0
6-7           64            6.4           6.3         6.5       416.0
7-8           35            3.5           4.4         7.5       262.5
8-9           38            3.8           3.1         8.5       323.0
9-10          21            2.1           2.1         9.5       199.5
10-11          8             .8           1.4        10.5        84.0
11-12          7             .7            .9        11.5        80.5
12-13          3             .3            .6        12.5        37.5
Over 13        7             .7           1.1        16.4       114.8
Total       1000          100.0         100.0                  3835.3

Mean of SS/σ² = Σmf/n = 3835.3/1000 = 3.8
(This sampling experiment was conducted cooperatively by about 80 students at Oregon State College in the Fall of 1950.)
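This experiment, too, can be replayed in code. Again, this is a sketch whose pseudo-random results vary around the theoretical values rather than a reproduction of the 1950 class data:

```python
import random

random.seed(2)                        # reproducible run
MU, SIGMA = 50, 10                    # normal population: mean 50, variance 100

ratios = []
for _ in range(1000):
    sample = [random.gauss(MU, SIGMA) for _ in range(5)]
    y_bar = sum(sample) / 5
    ss = sum((y - y_bar) ** 2 for y in sample)
    ratios.append(ss / SIGMA ** 2)

mean = sum(ratios) / len(ratios)
# SS/sigma^2 follows the chi-square distribution with n - 1 = 4 degrees of
# freedom, so the mean should be near 4 (the text's experiment gave 3.8).
assert 3.5 < mean < 4.5
print(round(mean, 2))
```

Note the only change from the earlier sketch: deviations are taken from the sample mean rather than the population mean, which is exactly what costs one degree of freedom.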
[Fig. 7.7: histogram of the 1000 values of SS/σ² and the χ²-curve with 4 degrees of freedom]
The histogram of the 1000 values of SS/σ² and the χ²-curve with 4 degrees of freedom are given in Fig. 7.7. It can be observed from Table 7.7b or Fig. 7.7 that the agreement between the observed and the theoretical frequencies is close. This shows that the values of SS/σ², which were calculated from 1000 samples, each consisting of 5 observations, follow the χ²-distribution with 4 degrees of freedom, and thus verifies the contention that SS/σ² follows the χ²-distribution with n − 1 degrees of freedom. The approximate mean of the 1000 values of SS/σ² is found to be 3.8 (Table 7.7b), which is close to the number of degrees of freedom, 4. This further verifies the contention that SS/σ² follows the χ²-distribution with n − 1 degrees of freedom. The following theorem summarizes the discussion in this section:
Theorem 7.7a If all possible samples of size n are drawn from a normal population with variance equal to σ², and for each sample the value Σ(y − ȳ)²/σ² is computed, the values of Σ(y − ȳ)²/σ² follow the χ²-distribution with n − 1 degrees of freedom (that is, ν = n − 1).
To avoid the lengthy statement given in the above theorem, in practice it is usually said that Σ(y − ȳ)² or SS has n − 1 degrees of freedom. Since
this number of degrees of freedom is also the divisor used in obtaining s², that is,

    s² = SS/(n − 1) = SS/ν    (6)

(Equation 6, Section 7.2), the sample variance s² is the SS divided by its number of degrees of freedom. The number of degrees of freedom of a sum of squares can also be interpreted as the least number of deviations which have to be known in order that the remaining ones can be calculated. The quantity SS is the sum of the squares of the n deviations (y − ȳ), that is,

    SS = Σ(y − ȳ)² = (y₁ − ȳ)² + (y₂ − ȳ)² + … + (yₙ − ȳ)².    (7)
But it is known that the sum of these deviations is equal to zero, that is, Σ(y − ȳ) = 0 (Equation 5, Section 7.4). Therefore, when n − 1 of these deviations are known, the remaining one becomes automatically known. So the SS has n − 1 degrees of freedom. For example, the mean of the five observations 3, 2, 1, 3, 1 is 2, and the five deviations from the mean are 1, 0, −1, 1, −1. If any four of the five deviations are known, the remaining one will be known, because the sum of these five deviations is equal to zero. The number of degrees of freedom is used in connection with every method presented in later chapters. Whenever the number of degrees of freedom is used, it is used directly or indirectly in connection with the χ²-distribution. If one states that an s² has ν degrees of freedom, he means that the statistic νs²/σ² = SS/σ² follows the χ²-distribution with ν degrees of freedom. For example, if one says s² has 4 degrees of freedom, he means that 4s²/σ² follows the χ²-distribution with 4 degrees of freedom. In later chapters, the number of degrees of freedom is often used without reference to the χ²-distribution, although it is taken for granted that the χ²-distribution is the point of origin. From the distribution of SS/σ², the distribution of s²/σ² can be deduced, because s² = SS/ν. In the case of n = 5, or ν = 4, SS/4 = s². In other words, SS is 4 times as large as s². When a value of SS falls between 0 and 4, the corresponding value of s² will fall between 0 and 1. From Table 7.7b it can be seen that 93 out of 1000 samples have values of SS/σ² falling between 0 and 1. Without a further sampling experiment, it can be deduced that the same 93 samples will have values of s²/σ² falling between 0 and 0.25. In other words, if the classes of Table 7.7b are changed to 0-0.25, 0.25-0.50, etc., the new frequency table will be that of s²/σ². Consequently, the statistic s²/σ² follows the distribution of χ²/ν.
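The interpretation of n − 1 degrees of freedom can be sketched with the text's example of five observations, where any four deviations determine the fifth:

```python
obs = [3, 2, 1, 3, 1]                     # the text's example
y_bar = sum(obs) / len(obs)               # = 2.0
deviations = [y - y_bar for y in obs]     # [1.0, 0.0, -1.0, 1.0, -1.0]

# The deviations from the sample mean always sum to zero, so knowing
# any four of the five fixes the remaining one: SS has n - 1 = 4 d.f.
assert sum(deviations) == 0
recovered = -sum(deviations[:-1])         # recover the fifth deviation
assert recovered == deviations[-1]        # -1.0
print(recovered)                          # -1.0
```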
Table 4, Appendix, shows the relative cumulative frequencies of the χ²-distribution, which is the distribution of SS/σ². If each line of the
values of the same table is divided by ν, the number of degrees of freedom, the resulting table, which is Table 5, Appendix, gives the relative frequency of χ²/ν, which is the distribution of s²/σ². The mean of the distribution of s²/σ² can be deduced from that of SS/σ². Since SS/σ² is ν times as large as s²/σ², the mean of SS/σ² is also ν times as large as that of s²/σ². But the mean of SS/σ² is equal to ν, and therefore the mean of s²/σ² is ν/ν or 1. This same result can be obtained by a different method. Theorem 7.2 states that the mean of the variances s² of all possible samples is equal to σ². If each of the s²-values is divided by σ², then the new mean is equal to the old mean divided by σ²; that is, the mean of s²/σ² is equal to σ²/σ² or 1 (Theorem 2.4b). For convenience of future reference, the result is stated in the following theorem:
Theorem 7.7b If all possible samples of the same size are drawn from a given population with variance equal to σ², and the variance s² is computed for each sample, the mean of all the ratios s²/σ² is equal to 1.
After the mean of s²/σ² is considered, it is also of interest to know how the variation of s²/σ² is affected by the sample size. If the size of the sample is increased, the sample variance s² becomes a more reliable estimate of the population variance σ², and the values of s² of all possible samples of a given size cluster more closely around the population variance σ², or the values of s²/σ² of all possible samples hug the value 1 more closely. As the sample size n approaches infinity, every s² becomes σ². Then the value of s²/σ² becomes 1 for every sample. This phenomenon can be observed in Table 5, Appendix. For example, both the 97.5% point and the 2.5% point converge toward 1 as the number of degrees of freedom increases.

7.8 Algebraic Identities
The algebraic identity given in Equation 4, Section 7.4, can be proved as follows:
    Σ(y − ȳ)² = Σ(y² − 2ȳy + ȳ²)
              = Σy² − 2ȳΣy + nȳ²
              = Σy² − 2(Σy/n)Σy + n(Σy/n)²
              = Σy² − (Σy)²/n.    (1)
The algebraic identity gi veo in Equation 5, Section 7.4, can be proved as follows
I(y  y) = (Yl  1) + (r2  1) + ••• + (yII  y) =~ny
Iy
= 1:y  n n
(2)
The algebraic identity given in Equation 3, Section 7.7, can be proved as follows:

    Σ(y − μ)² = Σ[(y − ȳ) + (ȳ − μ)]²
              = Σ[(y − ȳ)² + 2(y − ȳ)(ȳ − μ) + (ȳ − μ)²]
              = [(y₁ − ȳ)² + 2(y₁ − ȳ)(ȳ − μ) + (ȳ − μ)²] +
                [(y₂ − ȳ)² + 2(y₂ − ȳ)(ȳ − μ) + (ȳ − μ)²] + … +
                [(yₙ − ȳ)² + 2(yₙ − ȳ)(ȳ − μ) + (ȳ − μ)²]
              = Σ(y − ȳ)² + 2(ȳ − μ)Σ(y − ȳ) + n(ȳ − μ)².

Since Σ(y − ȳ) = 0, the middle term vanishes, and

    Σ(y − μ)² = Σ(y − ȳ)² + n(ȳ − μ)².    (3)

…

…SS/σ₀² > 11.1433 (Fig. 7.10a), if the 5% significance level is chosen. The statistic SS/σ₀² is equal to 480/100, or 4.80, with 4 degrees of freedom. The value 4.80 is outside the critical regions; therefore, the hypothesis is accepted, and the conclusion is that σ² = 100. If it were in the left critical region, the conclusion would be that the true population
Fig. 7.10a  (The χ²-distribution with 4 degrees of freedom, with the two critical regions shaded: below .484419 and above 11.1433.)
variance is less than 100. If it were in the right critical region, the conclusion would be that the true population variance is greater than 100. If the hypothesis is true and this procedure of testing the hypothesis is followed, 5% of all possible samples will lead to the erroneous conclusion that the population variance is not equal to 100. Therefore, the Type I error (shaded by vertical lines in Fig. 7.10a) is 5%. The reason for risking the Type I error is the ever-present possibility that the hypothesis might be false. If the hypothetical variance σ₀² is smaller than the true variance σ², e.g., σ₀² = (½)σ², the statistic SS/σ₀² is 2(SS/σ²) and consequently SS/σ₀² does not follow the true χ²-distribution, but a distorted one such as the curve on the right shown in Fig. 7.10b. If the hypothetical variance σ₀² is larger than the true variance σ², e.g.,
80
Ch. 7
SAMPLE VARIANCE - χ²-DISTRIBUTION
Fig. 7.10b  (The true χ²-distribution with 4 d.f. and the distorted curve followed by SS/σ₀² when σ₀² = (½)σ²; more than 2.5% of the distorted curve lies beyond 11.1433.)
σ₀² = 2σ², the statistic SS/σ₀² is (½)(SS/σ²) and consequently SS/σ₀² does not follow the true χ²-distribution, but a distorted one such as the curve on the left shown in Fig. 7.10c. In the case where σ₀² = (½)σ², more than 2.5% of the values of SS/σ₀² are greater than 11.1433 (Fig. 7.10b); and in the case where σ₀² = 2σ², more than 2.5% of the values of SS/σ₀² are less than .484419 (Fig. 7.10c). In other words, when the statistic SS/σ₀² is large enough or small enough to fall inside one of the critical regions, the event is due to one of two reasons. One is that the hypothesis is true, or σ₀² = σ², and the sample is one of the 5% of all possible samples which have the values of SS/σ² in the critical regions (Fig. 7.10a). The other reason is that the hypothesis is false, e.g. σ₀² = (½)σ² or σ₀² = 2σ², and the χ²-value is made too large or too small by using σ₀² as the divisor in the statistic SS/σ₀², as in the portions of the distorted curves not shaded by horizontal lines in Figs. 7.10b and 7.10c. Whenever the conclusion that σ² > σ₀² or σ² < σ₀² is reached because SS/σ₀² is inside one of the critical regions, the second reason overrides the first. In terms of the three curves given in Figs. 7.10a, 7.10b, and 7.10c, whenever SS/σ₀² is inside either one of the critical regions, the sample is considered drawn from a distorted χ²-distribution rather than from the true χ²-distribution. But because the sample could come from the true χ²-distribution (shaded by vertical lines in Fig. 7.10a), a Type I error may be committed. When the statistic SS/σ₀² falls outside the critical regions and the conclusion that σ² = σ₀² is reached, it is still possible that the sample may have come from a distorted χ²-distribution (the portion of the distorted curve shaded by horizontal lines in Figs. 7.10b and 7.10c). In other words, when a hypothesis is accepted, a Type II error may be committed. The probability of committing a Type II error is
Fig. 7.10c  (The true χ²-distribution with 4 d.f. and the distorted curve followed by SS/σ₀² when σ₀² = 2σ²; more than 2.5% of the distorted curve lies below .484419.)
represented by the portion of a distorted curve shaded by horizontal lines. All the principles concerning the test of a hypothesis given in Section 6.6 apply here. As the sample size n increases, and consequently the number of degrees of freedom, n − 1, increases, the overlapping area of the true χ²-curve and a distorted χ²-curve will decrease, and thus the probability of committing a Type II error will decrease. As the ratio σ²/σ₀² drifts away from 1, e.g. σ₀² = 10σ² instead of σ₀² = 2σ², or σ₀² = σ²/10 instead of σ₀² = σ²/2, the overlapping area of the true χ²-curve and a distorted χ²-curve will also decrease, and thus the probability of committing a Type II error will decrease. In other words, if the significance level remains unchanged, either the increase of sample size or the drifting away of σ₀² from σ² will reduce the probability of committing a Type II error. The Type II error shown in Fig. 7.10b or 7.10c is so large because (a) the sample size is small and (b) the ratio σ²/σ₀² is not greatly different from 1 (σ²/σ₀² = 2 in one case, and σ²/σ₀² = ½ in the other).

The distribution of s²/σ² can also be used in testing the hypothesis that the population variance is equal to a given value. For illustration, the same example of SS = 480 and n = 5 is used. Since s² = 480/4 = 120, the statistic s²/σ₀² = 120/100 or 1.20. The critical regions can be determined with the aid of Table 5, Appendix. They are where χ²/ν < .121105 and where χ²/ν > 2.7858, where ν is the number of degrees of freedom. These two values which determine the critical regions are one-fourth as large as their corresponding values obtained from the χ²-table, and the statistic s²/σ₀² is also one-fourth as large as SS/σ₀². Therefore the conclusions reached by the two methods should always be the same. There is really no need for both methods. The reason for introducing the statistic s²/σ₀² is that it provides the common-sense way of looking at the test of the hypothesis that the population variance is equal to σ₀². The statistic s² is an estimate of the true variance σ². If s²/σ₀² is too much greater than 1, the indication is that the true variance is greater than the hypothetical variance being tested. If s²/σ₀² is too much less than 1, the indication is that the true variance is less than the hypothetical variance being tested.

7.11 Procedures of Test of Hypothesis

The procedures in the test of the hypothesis that the population variance is equal to a given value can be illustrated by an example. A sample of 10 observations is drawn from a population. The observations are tabulated as follows:

4.8  5.6  3.2  3.6  4.7
5.3  4.8  5.1  6.1  7.6
The problem is to determine whether the variance of the population is equal to 4. The test procedure is as follows:
1. Hypothesis: The hypothesis is that the population variance is equal to 4, that is, σ² = σ₀² = 4.
2. Alternative hypotheses: The alternative hypotheses are (a) that the population variance is less than 4, that is, σ² < 4, and (b) that the population variance is greater than 4, that is, σ² > 4.
3. Assumptions (conditions under which the test is valid): The sample is a random sample drawn from a normal population.
4. Level of significance: The chosen significance level is 5%.
5. The critical regions are where (a) χ² < 2.70039 and (b) χ² > 19.0228. (Values obtainable from Table 4, Appendix; χ² with 9 d.f. An example of a one-tailed test is given in Section 7.12.)
6. Computation of statistic:

σ₀² = 4 (given)
n = 10
Σy = 50.8
Σy² = 271.80
(Σy)² = 2580.64
(Σy)²/n = 258.064
SS = 13.736   [SS = Σy² − (Σy)²/n (Section 7.4)]
χ² = 3.4340 with 9 d.f.   (χ² = SS/σ₀²)

7. Conclusion: Since the computed χ²-value, 3.4340, is outside the critical regions, the hypothesis is accepted. The conclusion is that the population variance is equal to 4. (If the computed χ²-value were less than 2.70039, the conclusion would be that the population variance is less than 4. If the computed χ²-value were greater than 19.0228, the conclusion would be that the population variance is greater than 4.)

7.12 Applications

The test of the hypothesis that the population variance is equal to a given value is extensively used in industry. Manufacturers ordinarily require that their products have a certain degree of uniformity in length, weight, etc. Consequently, the permissible variance (or standard deviation) of a certain measurement of a product is usually specified as a production standard. Samples are taken periodically and the hypothesis that the production standard is being maintained is tested. For instance, the average drained weight of a can of cherries may be specified as 12 ounces, and the standard deviation may be specified as ¼ ounce. As long as whole cherries are canned, it is almost impossible to make the drained weight of every can exactly 12 ounces. One more cherry might cause overweight and one less might cause underweight; therefore, it is expected that the drained weights of canned cherries will vary from can to can. The problem, however, is to prevent the variation from becoming too great. A random sample of n, say 10, cans is inspected periodically. The cherries are drained and weighed. The weights are the observations, y. The hypothetical variance is (¼)², or σ₀² = 1/16. The procedures given in Section 7.11 may be followed in testing this hypothesis. If the objective is to prevent the population variance from becoming too large, the one-tailed test should be used. In other words, there is only one critical region, on the right tail of the χ²-distribution.
The two critical regions where χ² < 2.70039 and χ² > 19.0228, as given in the preceding section, should be replaced by one critical region where χ² > 16.9190. If the hypothesis is accepted after the statistical test, the production standard is maintained and no action is needed. If the hypothesis is rejected, the conclusion is that the drained weight varies more from can to can than the specified standard allows. Then some action is needed to correct the situation. The advantage of testing the hypothesis periodically is
that any defects in the manufacturing process are in this way revealed before a great deal of damage is done. This example of what is called quality control is an illustration of the general principles of the application of statistics in industry. The details may be found in the many books written on the subject. It should be realized that the term "quality control" used here is really quantity control. It has nothing to do with the quality of the cherries.

7.13 Remarks

The purpose of this chapter, like that of the preceding chapters, is to introduce some of the basic principles of statistics. The application is incidental. Even though the test of the hypothesis that the population variance is equal to a given value is extensively used in industrial quality control, its use among research workers is quite limited. Examples of its usefulness to research workers can be found; by and large, however, its importance as compared to the methods presented in later chapters is not very great. Nevertheless, principles presented in this chapter are repeatedly used in developing more useful methods. The χ²-test is used in Chapters 21, 22, and 24 to test a wide variety of hypotheses other than the one mentioned in this chapter.
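The arithmetic of Sections 7.11 and 7.12 is easy to check with modern software. The following is a minimal sketch, assuming NumPy and SciPy are available (the variable names are illustrative, not the book's notation); it reproduces the two-tailed test of Section 7.11 and the one-tailed boundary of Section 7.12:

```python
# Two-tailed chi-square test that the population variance equals 4 (Section 7.11),
# using the ten observations of that section.
import numpy as np
from scipy.stats import chi2

y = np.array([4.8, 5.6, 3.2, 3.6, 4.7, 5.3, 4.8, 5.1, 6.1, 7.6])
var0 = 4.0                                  # hypothetical variance, sigma_0 squared
n = len(y)
ss = np.sum((y - y.mean()) ** 2)            # SS = 13.736
chi2_stat = ss / var0                       # SS / sigma_0^2 = 3.4340 with 9 d.f.

lower = chi2.ppf(0.025, n - 1)              # 2.70039, left critical boundary
upper = chi2.ppf(0.975, n - 1)              # 19.0228, right critical boundary
reject = chi2_stat < lower or chi2_stat > upper   # False: the hypothesis is accepted

one_tailed = chi2.ppf(0.95, n - 1)          # 16.9190, the single boundary of Section 7.12
```

Here `chi2.ppf` plays the role of the book's Table 4: it returns the percentage points of the χ²-distribution that the text quotes for 9 degrees of freedom.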
EXERCISES

(1) Draw all possible samples of size 2, with replacement, from the population which consists of the observations 2, 3, 4, 5. Find the mean ȳ and variance s² for each sample. Show that (a) ȳ is an unbiased estimate of μ and (b) s² is an unbiased estimate of σ².

(2) A random sample of 8 observations is drawn from the tag population, which is a normal population with mean equal to 50 and standard deviation equal to 10. The observations of the sample are as follows:

62  43  60  37
56  41  49  43.

Pretend that the population mean and variance are unknown. Using the 5% significance level, test the hypothesis that the population variance is equal to (a) 10, (b) 99, (c) 100, (d) 101, and (e) 1000. This is a two-tailed test. Following the procedure given in Section 7.11, write a complete report for (a) only. For (b), (c), (d), (e), simply compute the χ²-values and state the conclusions. For each of the five cases state whether the conclusion is correct, or whether a Type I error or a Type II error is made. The purpose of this exercise is to acquaint the student with the basic principles of the test of a hypothesis. It is not an application to a practical problem: in practice one does not test various hypotheses with the same sample. [(a) χ² = 61.8875; no error. (b) χ² = 6.2513; Type II. (c) χ² = 6.1888; no error. (d) χ² = 6.1275; Type II. (e) χ² = 0.6189; no error.]

(3) Using the 8 observations given in Exercise 2, verify the three identities proved in Section 7.8 (μ = 50, σ = 10). Do not drop the decimals in your computations.

(4) The drained weights, in ounces, of 12 cans of cherries are:
11.9  12.7
12.6  11.9
12.3  11.3
11.8  12.0
12.1  11.8
11.5  12.1.
The specified standard deviation is ¼ ounce. Is this specification being met? The purpose of this test of hypothesis is to detect the possibility that the standard deviation may become too large; therefore, the one-tailed test should be used. Use the 1% significance level.

(5) Find the variance s² for each of the 25 samples of Exercise 1, Chapter 5. Then show that the mean of s² is equal to σ².

(6) Repeat Exercise 1 with the sample size changed from 2 to 4.

(7) Draw all possible samples of size 3, with replacement, from the population which consists of the observations 3 and 6. Find the mean ȳ and variance s² for each sample. Show that (a) the mean of ȳ is equal to μ and (b) the mean of s² is equal to σ².
(8) The specified standard deviation of a certain machine part is allowed to be 0.010 inches. A sample of 10 parts is measured, and the measurements are recorded as follows:
1.011 0.975
0.998 0.995
0.980 0.970
1.021 1.000
1.025 1.031.
Test the hypothesis that the population standard deviation is equal to 0.010, at the 5% level, with the intention to detect the possibility that the standard deviation may become too large.

QUESTIONS

(1) One thousand random samples, each consisting of 5 observations, are drawn from the tag population, which is a normal population with mean equal to 50 and variance equal to 100. If the statistic SS/σ² were calculated for each sample, (a) what distribution does this statistic follow? (b) what is the mean of this distribution?

(2) If the sampling experiment were done as described in Question 1 except that each sample consists of 10 instead of 5 observations, (a) what distribution would the statistic SS/σ² follow? (b) what would be the mean of this distribution?
(3) If the sampling experiment were done as described in Question (1) except that 2000 instead of 1000 samples were drawn, (a) what distribution would the statistic SS/σ² follow? (b) what would be the mean of this distribution?

(4) If 10 were added to each of the observations in the population before the sampling experiment of Question (1) were carried out, (a) what distribution would the statistic SS/σ² follow? (b) what would be the mean of this distribution?

(5) If the statistic concerned is Σ(y − μ)²/σ² instead of Σ(y − ȳ)²/σ², what are the answers to Questions (1)-(4)?

(6) What is the relation between the distribution of SS/σ² and that of s²/σ²?
(7) (a) What is u? (b) What distribution does u² follow?

(8) What is the analysis of variance?

(9) What is an unbiased estimate?

(10) What can one do to reduce the probability of committing a Type II error, if the significance level is fixed at 5%?

(11) What are the assumptions underlying the χ²-test?

(12) The χ²-test may be used in testing the hypothesis that the population variance is equal to a given value. If the hypothesis is accepted, the χ²-value must be in the neighborhood of what value?
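The sampling experiments described in Questions (1) and (2) can be imitated numerically. A minimal sketch, assuming NumPy (the sample count and seed are arbitrary choices, not the book's):

```python
# Draw many samples of size 5 from a normal population with mean 50 and
# variance 100, and examine the distribution of SS / sigma^2.
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma = 5, 50.0, 10.0
samples = rng.normal(mu, sigma, size=(100_000, n))
ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
ratios = ss / sigma**2          # follows the chi-square distribution with n - 1 = 4 d.f.
mean_ratio = ratios.mean()      # should be close to nu = 4
```

Adding 10 to every observation, as in Question (4), leaves each SS, and hence each value of SS/σ², unchanged, since SS measures deviations from the sample mean.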
REFERENCES

Kendall, Maurice G.: Advanced Theory of Statistics, Vol. II, Charles Griffin & Company, London, 1946.
Mood, Alexander M.: Introduction to the Theory of Statistics, McGraw-Hill Book Company, New York, 1950.
Peach, Paul: An Introduction to Industrial Statistics and Quality Control, Edwards & Broughton Co., Raleigh, N.C., 1947.
Pearson, Karl (Editor): Tables for Statisticians and Biometricians, Part I, Table XII, Biometric Laboratory, University College, London, 1930.
CHAPTER 8
STUDENT'S t-DISTRIBUTION

This chapter introduces another important frequency distribution called Student's t-distribution, named after W. S. Gosset, who used the pseudonym "Student" in his statistical writings. First developed by Gosset early in this century, this distribution was subsequently modified by R. A. Fisher. It is this modified version which is presented in this chapter.
8.1 Description of t-Distribution

The u-test is introduced in Section 6.7 to test the hypothesis that the population mean is equal to the given value μ₀, where

u = (ȳ − μ₀) / √(σ²/n).     (1)

However, the u-test has limited practical use, because the population variance σ² is usually unknown. The t-distribution is developed to overcome this difficulty. If the population variance σ² in Equation (1) is replaced by the sample variance s², the resulting statistic is

t = (ȳ − μ₀) / √(s²/n).     (2)
The purpose of introducing t, therefore, is to remove the restrictive condition that the population variance must be known. The variance s² in Equation (2) can be computed from a sample. Thus even though u has limited practical use, it is instrumental in introducing t. Since the mean of the means of all possible samples of the same size is equal to the population mean μ, it is conceivable that the mean of t is equal to 0, or is the same as that of u. It is also conceivable that the variance of t is greater than that of u (variance of u = 1, Section 6.7). The statistic u is made of four elements, namely, ȳ, μ, σ², and n. Of these four elements only ȳ changes from sample to sample. But while the statistic t is also made of four elements, namely, ȳ, μ, s², and n, two of these elements, ȳ and s², change from sample to sample. As a result, it is expected that t will fluctuate more from sample to sample than u will. Therefore, it is expected that the variance of t is greater than 1, which is the variance of u. The t-distribution is not a single frequency curve, but a family of curves. The number of degrees of freedom of s² uniquely identifies a
Fig. 8.1  (Student's t-distributions with 1, 4, and ∞ degrees of freedom.)
particular t-curve. The variation, from sample to sample, of s² diminishes as the number of degrees of freedom increases. As a result, the variation of t also diminishes as the number of degrees of freedom of s² increases. As the number of degrees of freedom approaches infinity, s² approaches σ²; consequently t approaches u. Thus u becomes a special case of t. The number of degrees of freedom of s² is also called the number of degrees of freedom of t. In other words, a particular t-distribution is identified by the number of degrees of freedom of s² in Equation (2). From the graphs of the t-distributions, with 1, 4, and ∞ degrees of freedom, given in Fig. 8.1, it can be seen that a t-curve is bell-shaped and looks very much like the normal curve. Therefore, casual observation of the graphs will not enable one to distinguish them. It is the relative frequency, not the general appearance, that distinguishes one frequency curve from another. The t-distribution with ∞ degrees of freedom shown in Fig. 8.1 is the u-distribution, which is the normal distribution with mean equal to 0 and variance equal to 1. The above discussion can be summarized in the following theorems:

Theorem 8.1a If all possible samples of size n are drawn from a normal population with mean equal to μ, and for each sample the statistic t, where

t = (ȳ − μ) / √(s²/n),     (3)
is calculated, the frequency distribution of the t-values follows the Student's t-distribution with ν degrees of freedom, where ν is the number of degrees of freedom of s² (ν = n − 1 in this case).

Theorem 8.1b As the number of degrees of freedom of s² approaches infinity, the Student's t-distribution approaches the normal distribution with mean equal to zero and variance equal to 1; that is, t approaches u as ν approaches infinity.

The experimental verification of Theorem 8.1a is given in the following section.
8.2 Experimental Verification of t-Distribution

Theorem 8.1a can be verified experimentally. The details of the sampling experiment are described in Chapter 4. Briefly, 1000 random samples, each consisting of 5 observations, are drawn from the tag population, which is a normal population with mean equal to 50 and variance equal to 100. For each sample, the statistic t is calculated. As an example, the computing procedure of t for the sample consisting of the observations 50, 57, 42, 63, 32 is shown as follows:
n = 5
Σy = 244
ȳ = 244/5 = 48.8
(Σy)² = (244)² = 59,536
(Σy)²/n = 59,536/5 = 11,907.2
Σy² = 12,506
SS = 12,506 − 11,907.2 = 598.8 (Section 7.4)
s² = 598.8/4 = 149.7 (Section 7.4)
s²/n = 149.7/5 = 29.94
√(s²/n) = √29.94 = 5.472
ȳ − μ = 48.8 − 50 = −1.2
t = −1.2/5.472 = −0.219.
For each of the 1000 samples, the t-value is calculated as shown above. An example of the t-values of four samples is given in Table 4.2. This is not so formidable a computing project as it seems. The values of ȳ and SS are already computed for previous sampling experiments, and very little additional computation is needed to obtain a t-value for each sample. Since s² has n − 1 or 4 degrees of freedom, t also has 4 degrees of freedom. The frequency table of these 1000 t-values is given in Table 8.2. The theoretical frequency given in that table is that of the t-distribution with 4 degrees of freedom. The histogram of the 1000 t-values with the superimposed t-curve with 4 degrees of freedom is shown in Fig. 8.2. It can be seen either from Table 8.2 or Fig. 8.2 that
TABLE 8.2

  Class of t      Midpoint m   Observed f   Observed r.f.(%)   Theoretical r.f.(%)     mf
  Below −4.5         −5              8             .8                  .5              −40
  −4.5 to −3.5       −4              6             .6                  .7              −24
  −3.5 to −2.5       −3             23            2.3                 2.1              −69
  −2.5 to −1.5       −2             85            8.5                 7.1             −170
  −1.5 to −0.5       −1            218           21.8                21.8             −218
  −0.5 to 0.5         0            325           32.5                35.6                0
  0.5 to 1.5          1            219           21.9                21.8              219
  1.5 to 2.5          2             80            8.0                 7.1              160
  2.5 to 3.5          3             25            2.5                 2.1               75
  3.5 to 4.5          4              4             .4                  .7               16
  Above 4.5           5              7             .7                  .5               35

  Total                           1000          100.0               100.0              −16

Mean of t = Σmf/Σf = −16/1000 = −.016. (This sampling experiment was conducted cooperatively by about 75 students at Oregon State College in the Fall of 1952.)
the observed frequency and the theoretical frequency fit very closely. The observed frequency of t is based on the t-values of 1000 samples, while the theoretical frequency is based on all possible samples of size 5. They do not exactly agree. The mean of the 1000 t-values could be found easily if they were available. Unfortunately, however, the identity of each individual t-value is lost after the frequency table is made. Yet the approximate value of the mean can be found by using the midpoint m of a class to represent all the t-values in that class. For example, the class −.5 to .5 is represented by 0 and the class .5 to 1.5 is represented by 1. The two extreme classes do not have definite class limits. They are arbitrarily assigned the values −5 and 5 respectively. Then the approximate mean of the 1000 t-values is

Σmf/Σf = −16/1000 = −.016,

which is very close to 0, as expected. It can also be observed from Table 8.2 that the variance of t is larger than that of u. For example, the relative frequency of u beyond −3.5 and +3.5 is almost equal to 0 (Table 3, Appendix). But in Table 8.2, it can be seen that 1.4% of the 1000 t-values are less than −3.5 and that 1.1% of the t-values are larger than 3.5. This shows that the variance of t must be larger than that of u. This experiment verifies the t-distribution and confirms the speculation (Section 8.1) that the mean of t is equal to zero and that the variance of t is larger than 1.

After the verification of Theorem 8.1a, the theorem itself may appear to have been awkwardly stated. If every sample has n observations, s² must have n − 1 degrees of freedom, and consequently t has n − 1 degrees of freedom. Certainly there is no need to say that t has ν degrees of freedom, where ν is the number of degrees of freedom of s². However, the reason for stating the theorem in such an apparently awkward manner is that ȳ and s² in
t = (ȳ − μ) / √(s²/n)     (1)
need not come from the same sample, nor even from samples of the same size. For example, if 2000 random samples were drawn from the tag population, with the sizes of the first, third, ..., 1999th samples being 4 and the sizes of the second, fourth, ..., 2000th samples being 9, one t-value could be calculated from each pair of samples of sizes 4 and 9 respectively. The sample mean ȳ could be calculated from the first sample with 4 observations, and the sample variance s² could be calculated from the second sample with 9 observations. If the sampling experiment were conducted this way, a t-value
Fig. 8.2  (Histogram of the 1000 t-values with the superimposed t-curve with 4 degrees of freedom.)
could be calculated for each pair of samples. The quantity n in Equation (1) in this case would be 4, which is the number of observations from which ȳ is calculated, and the number of degrees of freedom of s² would be 8. Then the frequency distribution of these 1000 t-values would follow the t-distribution with 8 degrees of freedom. Although it may be difficult at this stage to see why ȳ and s² should be calculated from different samples, as the subject develops the desirability of doing so becomes obvious.
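The sampling experiment of this section can be repeated on a computer in a few lines. A minimal sketch, assuming NumPy and SciPy (the number of samples and the seed are arbitrary choices, not the book's):

```python
# Draw many samples of size 5 from a normal population (mean 50, variance 100),
# compute t for each, and check the result against Theorem 8.1a.
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(1)
n = 5
samples = rng.normal(50.0, 10.0, size=(200_000, n))
ybar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)                # sample variance with n - 1 divisor
t_vals = (ybar - 50.0) / np.sqrt(s2 / n)        # t with 4 degrees of freedom

mean_t = t_vals.mean()                          # close to 0
crit = t_dist.ppf(0.975, n - 1)                 # 2.776
frac_outside = np.mean(np.abs(t_vals) > crit)   # close to .05
```

With a large number of samples, about 5% of the t-values fall beyond ±2.776, in agreement with the t-table for 4 degrees of freedom.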
8.3 t-Table

The relative cumulative frequency of the t-distribution for various numbers of degrees of freedom is given in Table 6, Appendix. Each line of the t-table represents a particular number of degrees of freedom. For example, for 4 degrees of freedom, 2.5% of the t-values are greater than 2.776. Since the t-curve is symmetrical, this tabulated value also indicates that 2.5% of the t-values are less than −2.776. As the number of degrees of freedom increases, the tabulated values in the column labeled 2.5% in the t-table become smaller and reach 1.960 as the limit. This shows that t approaches u as the number of degrees of freedom approaches infinity, and also shows that the variance of t decreases as the number of degrees of freedom increases.

8.4 Test of Hypothesis

The preceding three sections deal with the deductive relation between a population and its samples; or, more specifically, the distribution of the t-values of all possible samples of the same size drawn from a given normal population. This section deals with the drawing of inductive inferences about the population from a given sample, or, more specifically, the test of the hypothesis that the population mean is equal to a given value. It is necessary to establish the t-distribution before the test of hypothesis is discussed, because the t-values which are needed to establish the critical regions come from the t-table, which is made according to the distribution of the t-values of all possible samples of the same size drawn from the same normal population. The use of t is similar to that of u in testing the hypothesis that the population mean is equal to a given value. The only difference is that, in the t-test, the sample variance s² is used, while in the u-test, the population variance σ² is used. Since the two distributions are not the same, the critical regions are also different.
If the 5% significance level is used, the boundary values of the critical regions for a two-tailed u-test are −1.96 and 1.96, while in the t-test these values are replaced by the corresponding values in the t-table with the appropriate number of degrees of freedom. For 4 degrees of freedom, these values are −2.776 and 2.776 (Table 6, Appendix).
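The convergence of the t boundaries toward the u boundaries can be seen directly; a minimal sketch, assuming SciPy is available:

```python
# 2.5% points of the t-distribution shrink toward the normal value 1.960
# as the number of degrees of freedom increases.
from scipy.stats import norm, t

for df in (4, 30, 120, 10_000):
    print(df, round(t.ppf(0.975, df), 3))

crit_4 = t.ppf(0.975, 4)        # 2.776, the boundary used above
crit_norm = norm.ppf(0.975)     # 1.960, the u boundary
```

Here `t.ppf` and `norm.ppf` replace the book's Table 6 and Table 3 lookups.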
All the discussions of significance level, Type II error, and sample size concerning the u-test, as given in Sections 6.4, 6.5, 6.6, and 6.7, apply to the t-test. To avoid duplication, only the rudiments of these discussions are repeated here. The test of hypothesis can be explained in terms of the sampling experiment of Section 8.2. It is important to realize that, in verifying the t-distribution with 4 degrees of freedom, the true population mean 50 is used in computing the 1000 t-values, that is,

t = (ȳ − 50) / √(s²/n).
However, in testing the hypothesis that the population mean is equal to 50, the critical regions are where t < −2.776 and where t > 2.776, if the 5% significance level is used. Since 5% (relative frequency) of all possible samples of size 5 yield t-values falling inside these regions, and thus lead to the erroneous conclusion that the population mean is not equal to 50, the probability of one sample, drawn at random, committing the Type I error is .05. The reason for risking the Type I error is the ever-present possibility that the hypothetical mean being tested may be false. For the sake of discussion, consider the true population mean μ to be 50 and the hypothetical mean μ₀ to be 60. In testing a hypothesis, the hypothesis is considered true until proved false. Therefore, the computed t is

t′ = (ȳ − μ₀) / √(s²/n) = (ȳ − 60) / √(s²/n).     (1)

Since the statistic

t = (ȳ − 50) / √(s²/n)     (2)

follows the t-distribution, t′ in Equation (1) cannot follow the t-distribution, because the wrong value of the population mean is used. The effect of using 60 instead of 50 is to make t′ less than t. When a computed t-value is small (a large negative number) enough to fall inside the left critical region, it could be because the hypothetical mean μ₀ is larger than the true mean μ; or it could be because the sample is unusual while μ₀ and μ are really equal. But the decision is to reject the hypothesis; that is, the former reason overrides the latter one. The conclusion is that the true mean μ is less than the hypothetical mean μ₀. If, for fear
of committing a Type I error, the hypothesis is accepted no matter how small or how large a t-value is, the large t-value, derived from the fact that μ₀ < μ, or the small t-value, derived from the fact that μ₀ > μ, will escape detection. Consequently, any hypothesis, correct or false, will be accepted. In other words, one risks a Type I error to make possible the rejection of the hypothesis if it seems false. If the significance level is made low, the probability of committing a Type I error is reduced, but that of committing a Type II error is increased if the sample size remains the same.

8.5 Procedures

The procedures of the t-test may be illustrated by an example, in which one-digit observations are used to make the computing procedures easy to follow. The observations of a given sample are 5, 3, 1, 4, 2. A two-tailed test, with 5% significance level, is used to test the hypothesis that the population mean is equal to 5.
1. Hypothesis: The hypothesis is that the population mean is equal to 5, that is, μ₀ = 5.
2. Alternative hypotheses: The alternative hypotheses are that (a) the population mean is less than 5 or (b) the population mean is greater than 5.
3. Assumptions: The given sample is a random sample drawn from a normal population.
4. Level of significance: The 5% significance level is chosen.
5. Critical regions: The critical regions are where t < −2.776 and where t > 2.776.
6. Computation of t:

μ₀ = 5
n = 5
Σy = 15
ȳ = 3
(Σy)² = 225
(Σy)²/n = 45
Σy² = 55
SS = 10
s² = 2.5
s²/n = .5
√(s²/n) = .7071
ȳ − μ₀ = −2
t = (ȳ − μ₀)/√(s²/n) = −2/.7071 = −2.83 with 4 d.f.
7. Conclusion: Since t is inside the left critical region, the conclusion is that the population mean is less than 5. (If the t-value were between −2.776 and 2.776, the conclusion would be that the population mean is equal to 5. If the t-value were greater than 2.776, the conclusion would be that the population mean is greater than 5.)
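The procedure above can be sketched numerically; a minimal sketch assuming NumPy and SciPy (`ttest_1samp` is used only as a cross-check on the hand computation):

```python
# Two-tailed t-test of Section 8.5: is the population mean equal to 5?
import numpy as np
from scipy.stats import t as t_dist, ttest_1samp

y = np.array([5.0, 3.0, 1.0, 4.0, 2.0])
mu0 = 5.0
n = len(y)
t_stat = (y.mean() - mu0) / np.sqrt(y.var(ddof=1) / n)   # -2.83 with 4 d.f.
crit = t_dist.ppf(0.975, n - 1)                          # 2.776
reject = abs(t_stat) > crit                              # True: mean judged less than 5

res = ttest_1samp(y, mu0)    # library routine computes the same statistic
```

Since −2.83 lies inside the left critical region, the decision agrees with the conclusion reached above.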
It should be noted that t, like u, has no unit of measurement. If the observations y are measured in inches, ȳ is a number of inches; μ₀ is a number of inches; s² is a number of square inches; n has no unit of measurement. Therefore, the unit of t is

(ȳ inches − μ₀ inches) / √(s² sq. in. / n) = (number of inches) / (number of inches),

a pure number.
Consequently, the unit used in the observations has no bearing on the t-value. Similarly, if the same quantity is subtracted from or added to all the observations, the t-value again is unaffected. If 32 is added to each of the observations, ȳ is increased by 32; μ₀ is also increased by 32; consequently ...

... the right critical region is where F > 7.3879 (7.3879 is the 2.5% point of the F-distribution with 4 and 5 degrees of freedom); and the left critical region is where F < 1/9.3645 = 0.10679 (9.3645 is the 2.5% point of the F-distribution with 5 and 4 degrees of freedom). Since the F-value 14 is in the right critical region, the conclusion is that
u:
s:/ s:
u:
s:
u: u:.
0:
s:/s:
u:
u:.
s: s: u: u:,
u:.
110
Ch. 9
VARIANCERATIOFDISTRIBUTION
o!
is greater than o~. But theoretically 2.5% of the Fvalues are greater than 7.3879 if the hypothesis is true, that is, o~ = Yet when F falls inside the critical region, the hypothesis is rejected. This decision is made because of the possibility that the hypothesis is false and is greater than All the discussions on the interlocking relation between Type I error, Type II error, and sample si ze given in Sections 6.3, 6.4, ti.5, and 6.6, apply to the test of the hypothesis that o! =
0:.
0:
0;.
0:.
9.5 Procedures

The procedures in the test of the hypothesis that σ₁² = σ₂² can be illustrated by the following example. The observations of the first sample are 2, 3, 7 and those of the second sample are 8, 6, 5, 1. The procedures are as follows:
1. Hypothesis: The hypothesis is that the two population variances are equal, that is, σ₁² = σ₂².
2. Alternative hypotheses: The alternative hypotheses are that (a) σ₁² < σ₂² and (b) σ₁² > σ₂².
3. Assumptions: The given samples are random samples drawn from normal populations.
4. Level of significance: The 5% significance level is chosen.
5. Critical regions: The critical regions for F with 2 and 3 degrees of freedom are where F < .025533 (.025533 = 1/39.165, and 39.165 is the 2.5% point of the F-distribution with 3 and 2 d.f.) and where F > 16.044.
6. Computation of F: The details of the computation of F are given in Table 9.5. The F-value is 7/8.667 = .8077 with 2 and 3 degrees of freedom.
TABLE 9.5

                              Sample No.
                          1              2
Observations           2, 3, 7      8, 6, 5, 1
n                         3              4
Σy                       12             20
(Σy)²                   144            400
(Σy)²/n                  48            100
Σy²                      62            126
SS = Σy² − (Σy)²/n       14             26
s² = SS/(n − 1)           7          8.667
7. Conclusion: Since F is outside the critical regions, the conclusion is that the population variances are equal, that is, σ₁² = σ₂². (If F were less than .025533, the conclusion would be that σ₁² < σ₂². If F were greater than 16.044, the conclusion would be that σ₁² > σ₂².)
It should be noted that F is a pure number. If the observations are measured in inches, s² is a number of square inches. Then F = s₁²/s₂², which is a number of square inches divided by another number of square inches, is a pure number.
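The computation of Table 9.5 and the decision in step 7 can be sketched in a few lines; the two critical bounds are the tabulated values quoted in step 5, taken as constants.

```python
from statistics import variance

# Samples from Section 9.5.
y1 = [2, 3, 7]
y2 = [8, 6, 5, 1]

s2_1 = variance(y1)   # 7, with 2 d.f.
s2_2 = variance(y2)   # 8.667, with 3 d.f.
F = s2_1 / s2_2       # .8077, with 2 and 3 d.f.

# Two-tailed test at the 5% level; both bounds are tabulated
# 2.5% points: 1/39.165 = .025533 and 16.044.
reject = F < 0.025533 or F > 16.044   # False: accept the hypothesis
```

Because F = .8077 lies between the two bounds, the hypothesis σ₁² = σ₂² is accepted, as in step 7.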
9.6 Weighted Mean of Sample Variances
In the example of Section 9.5, s₁² = 7 with 2 degrees of freedom, and s₂² = 8.667 with 3 degrees of freedom. The conclusion drawn from the test of the hypothesis is that the two population variances are equal. Then both s₁² and s₂² are estimates of the variance σ² which is common to both populations. The problem is to combine s₁² and s₂² to form a single estimate of σ². The average of s₁² and s₂² could be used for this purpose, but the average (½)(s₁² + s₂²) ignores the fact that s₂², with 3 degrees of freedom, is a more accurate estimate of σ² than s₁², with 2 degrees of freedom. Therefore, in order to give more weight to the more accurate estimate, the weighted average sₚ² of s₁² and s₂² is used. Using the numbers of degrees of freedom as weights,

    sₚ² = (ν₁s₁² + ν₂s₂²) / (ν₁ + ν₂)    (1)

where ν₁ is the number of degrees of freedom of s₁² and ν₂ is the number of degrees of freedom of s₂². The weighted average, sₚ², of s₁² and s₂² is also called the pooled estimate of σ². For example, where s₁² = 7 with 2 d.f. and s₂² = 8.667 with 3 d.f., the pooled estimate of σ² is

    sₚ² = [(2 × 7) + (3 × 8.667)] / (2 + 3) = (14 + 26)/5 = 8.

It should be noted that νs² = SS, because SS/ν = s². Therefore the pooled estimate of σ² is actually

    sₚ² = (SS₁ + SS₂) / (ν₁ + ν₂)    (2)

with (ν₁ + ν₂) d.f., where SS₁ is the SS of the first sample with ν₁ degrees of freedom, and SS₂ is the SS of the second sample with ν₂ degrees of freedom. From Table 9.5, it can be seen that SS₁ = 14 and SS₂ = 26. The numerator of sₚ² is (14 + 26) or 40 and the denominator is (2 + 3) or 5. The pooled variance sₚ² is equal to 40/5 or 8, with (2 + 3) or 5 degrees of freedom. This method of obtaining the pooled estimate of σ² stems from the following theorem, which is verified experimentally.
Theorem 9.6 If the statistic SS₁/σ² follows the χ²-distribution with ν₁ degrees of freedom, the statistic SS₂/σ² follows the χ²-distribution with ν₂ degrees of freedom, and SS₁/σ² and SS₂/σ² are obtained from independent samples, the statistic

    (SS₁ + SS₂)/σ²

follows the χ²-distribution with (ν₁ + ν₂) degrees of freedom.
This theorem can be explained and also verified by the sampling experiment. In Section 7.7 it is shown that the values of SS/σ² of the 1000 samples follow the χ²-distribution with 4 degrees of freedom. Five hundred pairs of samples can be made from the 1000 samples. The first and second samples form a pair; the third and fourth form a pair, and so on. An example of the SS-values of 4 random samples, each consisting of 5 observations, is given in Table 4.2. The SS-values are 598.8, 237.2,

TABLE 9.6

(SS₁ + SS₂)/100   Observed      r.f. (%)   Theoretical   Mid-pt. m      mf
                  Frequency f              r.f. (%)
  0– 1                 1           .2          .2            .5          .5
  1– 2                 6          1.2         1.7           1.5         9.0
  2– 3                29          5.8         4.7           2.5        72.5
  3– 4                32          6.4         7.7           3.5       112.0
  4– 5                54         10.8        10.0           4.5       243.0
  5– 6                54         10.8        11.0           5.5       297.0
  6– 7                56         11.2        11.1           6.5       364.0
  7– 8                58         11.6        10.3           7.5       435.0
  8– 9                52         10.4         9.1           8.5       442.0
  9–10                30          6.0         7.7           9.5       285.0
 10–11                29          5.8         6.3          10.5       304.5
 11–12                31          6.2         5.1          11.5       356.5
 12–13                19          3.8         3.9          12.5       237.5
 13–14                12          2.4         3.0          13.5       162.0
 14–15                13          2.6         2.3          14.5       188.5
 15–16                 5          1.0         1.7          15.5        77.5
 16–17                 7          1.4         1.2          16.5       115.5
 17–18                 3           .6          .9          17.5        52.5
 18–19                 2           .4          .6          18.5        37.0
 19–20                 3           .6          .5          19.5        58.5
 Over 20               4           .8         1.0          23.1        92.4
 Total               500        100.0       100.0                   3942.4

 Mean = Σmf/Σf = 3942.4/500 = 7.9

(This sampling experiment was done cooperatively by about 75 students at Oregon State College in the Fall of 1952.)
396.8, and 319.2 respectively. The value of (SS₁ + SS₂)/σ² for the first pair is (598.8 + 237.2)/100 or 8.360; and that for the second pair is (396.8 + 319.2)/100 or 7.160. Such a value is obtained from each of the 500 pairs of samples. It should be noted that the 1000 samples are independent samples (Section 4.2). This is an important condition for the validity of Theorem 9.6. The frequency distribution of the 500 values of (SS₁ + SS₂)/σ² is given in Table 9.6. The theoretical frequency given in the table is that of the χ²-distribution with 8 degrees of freedom. The histogram of the 500 values of (SS₁ + SS₂)/σ², with the superimposed χ²-curve with 8 degrees of freedom, is shown in Fig. 9.6. Both the table and the graph show that the observed and the theoretical frequencies fit closely.

Fig. 9.6 (Histogram of the 500 values of (SS₁ + SS₂)/σ² with the superimposed χ²-curve, 8 d.f.)
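The sampling experiment behind Table 9.6 is easy to imitate on a computer. The sketch below is not the original 1952 classroom experiment, but it follows the same plan: 500 pairs of samples of 5 from a normal population with mean 50 and variance 100, and the 500 values of (SS₁ + SS₂)/σ². The seed is an arbitrary choice made here for reproducibility.

```python
import random
from statistics import mean

random.seed(1952)  # arbitrary seed, chosen for reproducibility
SIGMA2 = 100.0     # population variance used in the experiment

def draw_ss(n=5, mu=50.0, sigma=10.0):
    # SS of one random sample of n observations from N(mu, sigma^2)
    ys = [random.gauss(mu, sigma) for _ in range(n)]
    m = sum(ys) / n
    return sum((y - m) ** 2 for y in ys)

# 500 pairs of independent samples: each SS/sigma^2 is chi-square
# with 4 d.f., so each pair sum should behave like chi-square with 8 d.f.
pair_sums = [(draw_ss() + draw_ss()) / SIGMA2 for _ in range(500)]
avg = mean(pair_sums)  # expected to be close to 8
```

The average of the 500 values comes out near 8, the number of degrees of freedom asserted by Theorem 9.6.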
Since the mean of the χ²-distribution is its number of degrees of freedom, the mean of the 500 values of (SS₁ + SS₂)/σ² is expected to be close to 8. The approximate mean of these values can be found from Table 9.6, using the midpoint m of a class to represent all the values in that class. The class 0–1 is represented by .5, the class 1–2 by 1.5, and so forth. The class "over 20" has no upper limit. The value 23.1, which is the mean of the 4 values falling inside that class, is assigned to that class. The approximate mean of the 500 values of (SS₁ + SS₂)/σ² is

    Σmf/Σf = 3942.4/500 = 7.9
which is close to 8 as expected. This completes the verification of Theorem 9.6 and thus justifies the assignment of (ν₁ + ν₂) degrees of freedom to the pooled variance sₚ², where ν₁ and ν₂ are the numbers of degrees of freedom of s₁² and s₂² respectively.
The pooled estimate of σ² is obtained by Equation (2) only if the two populations have the same variance σ² but different means. If the two populations have the same mean and the same variance, the observations of the two samples should be lumped together to form a single sample of n₁ + n₂ observations; and the s² of this single sample, with (n₁ + n₂ − 1) degrees of freedom, is the pooled estimate of σ².
The pooled estimate, sₚ², of the population variance σ² can be obtained from any number of sample variances. The use of this method is not limited to two samples. In general, if a random sample is drawn from each of k populations with the same variance σ², the pooled estimate of σ², based on the k samples, is

    sₚ² = (sum of the k SS-values) / (sum of the k numbers of d.f.)    (3)
9.7 Relation Between F-Distribution and χ²-Distribution

It is shown in Section 7.7 that the statistic SS/σ² follows the χ²-distribution with ν (ν = n − 1) degrees of freedom; and in Section 9.2 that the statistic s₁²/s₂² follows the F-distribution with ν₁ and ν₂ degrees of freedom. Since s² and SS are related, that is, s² = SS/ν, the two distributions F and χ² must be related. In fact, they are related in more ways than one. The relation demonstrated in this section has the most practical importance.
Since s² is an estimate of σ², s² approaches σ² as the number of degrees of freedom approaches infinity. In other words, if infinitely many observations are included in a sample, the sample becomes the population itself, and s² becomes σ². Therefore, the statistic F = s₁²/s₂² becomes s₁²/σ² if ν₂ approaches ∞ (infinity). But this statistic s²/σ² also follows the distribution of χ²/ν (Section 7.7). This relation between χ² and F is stated in the following theorem.
Theorem 9.7 If a statistic χ² follows the χ²-distribution with ν degrees of freedom, the statistic χ²/ν follows the F-distribution with ν and ∞ degrees of freedom.
The relation stated in Theorem 9.7 can be observed from Tables 5 and 7 of the Appendix. For example, the 5% point of χ²/ν with 10 degrees of freedom is 1.8307, which is also the 5% point of F with 10 and ∞ degrees of freedom. In fact, the whole column of 5% points of χ²/ν with various numbers of degrees of freedom is the bottom line of the 5% F-table. The column of 1% points of χ²/ν is the bottom line of the 1% F-table. This relation holds for all the percentage points.
Because of this relation, the test of the hypothesis that the population variance is equal to a given value (Section 7.10) can be performed by three different methods. Either
the statistic χ² = SS/σ² with n − 1 degrees of freedom, or the statistic χ²/(n − 1) = s²/σ² with n − 1 degrees of freedom, or the statistic F = s²/σ² with n − 1 and ∞ degrees of freedom, may be used. Of course, the three seemingly different methods are really the same method, and the conclusions reached through these methods should always be the same.
9.8 Remarks

The test of the hypothesis that two population variances are equal can be applied in comparing the uniformity of a product made by two different processes. For example, two manufacturing processes are used to make a machine part with a specified length of 1.5 inches. With the average length of the machine parts being equal to 1.5 inches, the process which can produce parts with more uniform length is the more desirable process. A part will not fit into a machine if it is too long or too short. In other words, small variance is a desirable feature of the machine parts. To compare the variances of the length of the machine parts made by the two processes, one would obtain a random sample of the product made by each of the two processes, and measure the length of each machine part. Then, following the procedures given in Section 9.5, he can test the hypothesis that σ₁² = σ₂². If one accepts the hypothesis, the conclusion is that the two processes can produce the machine parts with equal uniformity. If one rejects the hypothesis, the conclusion is that the process with the smaller variance is the more desirable one. Scientific research workers may use a similar test to compare the refinement of two experimental techniques. The technique which produces observations with the smaller variance is the more refined one.
Despite the examples cited above, the direct applications of this test to experimental data by research workers are limited. The main purpose of introducing the F-distribution at this stage is to prepare for a future topic called the analysis of variance (Chapter 12), and not to test the hypothesis that two population variances are equal. The F-table is awkward to use for a two-tailed test, because the percentage points of the left tail of the F-distribution are not tabulated. However, the F-table is not made for this purpose, but is largely for use in connection with the analysis of variance, where the F-test is a one-tailed test.
The two-tailed F-test is not used in the remainder of this text.

EXERCISES

(1) Two random samples are drawn from the tag population, which is a normal population with mean equal to 50 and variance equal to 100. After the samples are drawn, 10 is subtracted from each observation of the first sample and 10 is added to each observation of the second sample. Then the first sample is actually a random sample drawn
from a normal population with mean equal to 40 and variance still equal to 100. The second sample is a random sample drawn from a normal population with mean equal to 60 and variance equal to 100. The observations of the two samples are tabulated as follows:

Sample 1:  41, 46, 27, 53, 45, 37, 57, 46
Sample 2:  58, 72, 81, 67, 67, 72, 59, 63, 59
Pretending that the sources of the two samples are unknown, test the hypothesis that the two population variances are equal, at the 5% level. Following the procedure given in Section 9.5, write a complete report. Since the population variances are actually known to be equal, state whether your conclusion is correct or whether you made a Type I error. Note that a Type II error cannot be made, because the hypothesis being tested is true. The purpose of changing the observations of the samples is to illustrate that the means of the populations have no bearing on this test. (F = 1.48 with 7 and 8 d.f.)
(2) If the hypothesis is accepted in Exercise (1), find the pooled estimate of the variance σ² which is common to both populations. Find the number of degrees of freedom of this pooled variance sₚ². (sₚ² = 71.6147 with 15 d.f.)
(3) Lump the observations of the two samples of Exercise (1) together to form a single sample of 17 observations. Find the sample variance s² of this sample. Compare this s² with the sₚ² obtained in Exercise (2). Which quantity is greater? Why? Under what condition is s² the appropriate estimate of σ²? Why? Under what condition is sₚ² the more desirable estimate of σ²? Why?
(4) Multiply each observation of the second sample of Exercise (1) by 1.1, and recalculate s₂². Test the hypothesis that σ₁² = σ₂² at the 5% level. When each observation of the second sample is multiplied by 1.1, this sample becomes a sample drawn from the population with variance equal to (1.1)² × 100 or 121 (Theorem 2.4b). Thus σ₁² = 100 and σ₂² = 121. Since the population variances are actually known, state whether your conclusion is correct or whether you have made a Type II error. A Type I error cannot be made when the hypothesis being tested is false. The purpose of this exercise is to illustrate that a false hypothesis is likely to be accepted with small samples if the hypothesis is not too far from the truth.
(5) Multiply each observation of the second sample of Exercise (1) by 5, and recalculate s₂². Test the hypothesis that σ₁² = σ₂² at the 5% level. When each observation of the second sample is multiplied by 5, this sample becomes a sample drawn from the population with variance equal to 5² × 100 or 2500 (Theorem 2.4b). Thus σ₁² = 100 and σ₂² = 2500. Since the population variances are actually known, state whether your conclusion is correct or whether you have made a Type II error. A Type I error cannot be made when the hypothesis being tested is false. The purpose of this exercise is to illustrate that a false hypothesis can be rejected with relatively small samples if the hypothesis is far enough from the truth.
(6) For the data of Exercise 4, Chapter 7, test the same hypothesis by the F-test.
(7) For the data of Exercise 8, Chapter 7, test the same hypothesis by the F-test.
(8) Two methods are available for packing frozen strawberries. The packer wishes to know which one gives him packages of more uniform weights. The weights of the packages produced by the two methods are as follows:
Method A:  16.1, 16.2, 15.9, 16.1, 16.1, 15.9, 16.0, 16.2, 16.0, 16.2, 16.1, 16.0
Method B:  15.8, 16.4, 16.3, 16.2, 16.4, 15.7, 16.0, 16.1, 16.5, 16.0, 16.1, 16.3
Test the hypothesis that the two population variances are equal, at the 5% level.
(9) In a study of the method for counting bacteria in rat feces, a new method and the old method were compared. Films were made by both methods and the resulting slides were fixed and stained with crystal violet. Twenty-five random fields were examined with a microscope and the number

is −2 > −4 and not −2 < −4. When both sides of an inequality 2 < 4 are divided by −2, the resulting inequality is −1 > −2 and not −1 < −2.

11.2 Estimation by Interval

The problem of estimation of a parameter has already been considered in the preceding chapters. For example, the sample mean is used to estimate the population mean μ. But a sample mean is very seldom equal to the population mean. The four samples, each consisting of 5 observations, shown in Table 4.2, are random samples drawn from the tag population whose mean is equal to 50, but none of the four sample means is equal to 50. It is obvious that a sample mean is not in general an adequate estimate of the population mean. Therefore, this chapter introduces a new approach to the problem of estimation, the use of an interval to estimate the population mean.
Ch. 11  CONFIDENCE INTERVAL
It is a common practice in everyday life to use an interval for the purpose of estimation. For example, in estimating a person's age, one might say that Jones is in his fifties, that is, Jones' age is somewhere between 50 and 60. One would never estimate Jones' age as 52 years 5 months and 14 days. The former estimate by the interval 50 to 60 appears to be rough, but it is likely to be correct, while the latter estimate is precise, but it is likely to be wrong. The longer the interval, the greater the chance that the estimate will be correct. For example, one can estimate Jones' age as somewhere between 0 and 200. This interval is bound to include Jones' correct age. One has 100% confidence in the correctness of this estimate. But the interval is so long that it becomes useless, because anybody's age is between 0 and 200. But if the interval is kept short, for example in the estimate that Jones' age is between 52 and 53, one's confidence in the correctness of this estimate is not nearly so great as is his confidence in the correctness of the estimate that Jones is from 0 to 200 years old. Therefore, the length of an interval and the degree of confidence in the correctness of an estimate are of primary interest in the estimation of a population mean by an interval.

11.3 Confidence Interval and Confidence Coefficient

In this section, the estimation of the population mean is used as an illustration of the terms confidence interval and confidence coefficient. It is known that the statistic
    u = (ȳ − μ) / √(σ²/n)

follows the normal distribution with mean equal to zero and standard deviation equal to 1 (Section 6.7). If all possible samples of size n are drawn from a population, and the u-value is calculated for each sample, 95% of these u-values fall between −1.96 and 1.96, that is, the inequality

    −1.96 < u < 1.96    (1)

holds for 95% of all possible samples of size n. Two intervals can be derived from Inequality (1). When each element of Inequality (1) is multiplied by √(σ²/n) and then μ is added to each element, the interval, expressed by the resulting inequality, is

    μ − 1.96√(σ²/n) < ȳ < μ + 1.96√(σ²/n).    (2)
When each element of the original Inequality (1) is multiplied by −√(σ²/n), the resulting inequality is

    1.96√(σ²/n) > −ȳ + μ > −1.96√(σ²/n).

Then ȳ is added to each of the elements of the above inequality. The interval, expressed by the resulting inequality, is

    ȳ + 1.96√(σ²/n) > μ > ȳ − 1.96√(σ²/n)

or

    ȳ − 1.96√(σ²/n) < μ < ȳ + 1.96√(σ²/n).    (3)

A parallel derivation applies when σ² is unknown and the statistic t = (ȳ − μ)/√(s²/n) is used in place of u: for 95% of all possible samples, −t.025 < t < t.025, and multiplying each term by −√(s²/n) gives

    t.025√(s²/n) > −ȳ + μ > −t.025√(s²/n).

When ȳ is added to each of the three terms, the resulting inequality is

    ȳ − t.025√(s²/n) < μ < ȳ + t.025√(s²/n).    (2)

The above inequality specifies the confidence interval of μ with a confidence coefficient of 95%. It should be realized that the confidence interval specified by Inequality (2) is not one interval, but a collection of intervals. For each sample of n observations, ȳ and s² can be calculated and t.025 can be obtained from the t-table, and therefore a confidence interval of μ can be calculated. If all possible samples of size n are drawn from a normal population, 95% of the samples yield confidence intervals which will include the population mean.
The sampling experiment described in Chapter 4 may be used to demonstrate the meaning of this confidence interval and its confidence coefficient. The four samples, each consisting of 5 observations, given in Table 4.2 are random samples drawn from a normal population with mean equal to 50. The values of ȳ and √(s²/n) are already computed for each of the samples. The value of t.025 is 2.7764. For the first sample, the confidence interval of the population mean, with a confidence coefficient of 95%, is

    48.8 − (2.7764 × 5.472) < μ < 48.8 + (2.7764 × 5.472)

or

    33.6 < μ < 64.0.

Since the population mean is known to be 50, this sample yields a confidence interval which includes the population mean. The confidence intervals calculated from the other three samples are 36.0 to 55.2, 46.8 to 71.6, and 44.3 to 66.5 respectively. Each of these three confidence intervals also includes the population mean 50. If all possible samples of size 5 are drawn and, for each sample, a confidence interval is calculated, 95% of all the intervals will include the population mean 50.
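The interval for the first sample can be reproduced from the summary statistics quoted in the text (ȳ = 48.8, √(s²/n) = 5.472, and t.025 = 2.7764 for 4 d.f.); the raw observations of Table 4.2 are not needed.

```python
# Summary statistics of the first sample of Table 4.2, as given in the text.
ybar = 48.8
se = 5.472      # sqrt(s^2/n)
t025 = 2.7764   # tabulated 2.5% point of t with 4 d.f.

lower = ybar - t025 * se
upper = ybar + t025 * se
interval = (round(lower, 1), round(upper, 1))   # (33.6, 64.0)
```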
The confidence interval of the population mean can be applied to the same kind of problems described in Sections 8.6 and 8.7. In those sections, the problems are tests of hypotheses. In this section, there is no hypothesis and the problem is to estimate the population mean. The sugar beet experiment described in Section 8.7 may be used to differentiate the two kinds of problems. In testing the hypothesis that the population mean is equal to zero, the objective is to determine whether the use of fertilizer increases the yield of sugar beets. In finding the confidence interval of the population mean, the objective is to determine how much the yield of sugar beets is increased by the use of fertilizer.
In an application, only one sample, which may consist of many observations, is available and consequently only one interval is calculated. Before the confidence interval can be calculated, the confidence coefficient is arbitrarily chosen. If the 95% confidence coefficient is chosen, t.025 is used in computing the interval. If the 99% coefficient is chosen, t.005 is used in computing the interval. The confidence interval computed from a given sample may or may not include the population mean. Since the population mean is unknown, there is no way to determine whether the interval actually includes the population mean or not. The confidence coefficient is attached to the confidence interval as a performance rating based on the confidence intervals computed from all possible samples of the same size.
The method of computing the confidence limits of a population mean is quite simple. The method of computing ȳ and √(s²/n) is given in Section 8.5. When these two quantities are computed, the intervals are obtained by Inequality (2). The confidence intervals of the population mean are already computed from the four samples given in Table 4.2. The reader may familiarize himself with the computing procedure by recomputing these intervals.
The length of the confidence interval given in Inequality (2) is

    2 t.025 √(s²/n),

which changes from sample to sample, because s² changes from sample to sample. But the average length of the confidence intervals computed from all possible samples of size n will be decreased by increasing the sample size.

11.5 Confidence Interval of Difference Between Means

The confidence interval of the difference between two population means can be obtained from Theorem 10.4a. The algebraic manipulation involved is the same as that given in the preceding section and therefore
is omitted. The limits of the 95% confidence interval of μ₁ − μ₂ are

    (ȳ₁ − ȳ₂) ± t.025 √(sₚ²/n₁ + sₚ²/n₂).    (1)

If the confidence coefficient of 99% is desired, the 2.5% point is replaced by the .5% point of the t-distribution with n₁ + n₂ − 2 degrees of freedom.
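The limits can be computed in a few lines. This sketch assumes, as is standard for the pooled two-sample interval, that the quantity under the radical in (1) is sₚ²(1/n₁ + 1/n₂); it reuses the two samples of Section 9.5 (pooled variance sₚ² = 8 from Section 9.6) and takes 2.571 as the tabulated 2.5% point of t with n₁ + n₂ − 2 = 5 d.f.

```python
from statistics import mean

y1 = [2, 3, 7]
y2 = [8, 6, 5, 1]
n1, n2 = len(y1), len(y2)

sp2 = 8.0       # pooled variance from Section 9.6
t025 = 2.571    # tabulated 2.5% point of t with 5 d.f.

diff = mean(y1) - mean(y2)                 # 4 - 5 = -1
se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5      # about 2.16
lower = diff - t025 * se
upper = diff + t025 * se
```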
+ (y  1>.
(1)
The above equation is an algebriac identity. After simplification, it can be seen that the equation is y = y. For example, the first observation of Table 12.1a is 3; the first sample mean is 5 aDd the general mean is 6. Then by Equation (1),
3
=6 + (5 
6) + (3  5) = 6  1  2.
The quantity (1) is tbe deviation of the first sample mean from the general meaD and the quantity (2) is tbe deviation of the observation from the first sample meaD. The components of eacb of tbe 15 observations are sbown in Table 12.1b. The purpose of breaking down eacb observaTABLE 12.1b
I
Sample No.
(1)
(2)
(3)
CompolleDts of observations
612 61+2 61+2 61+1 613
6+3+0 6+3+3 6 +3 +2 6 + 31 6 +34
623 6 22 62+2 6 2 +0 6 2 +3
61+0
6+3+0
6 2 +0
1+(11> +(11> S. .ple me_
r
12.1
MECHANICS OF PARTITION OF SUM OF SQUARES
153
tion into components is to explain the algebraic identity that
L
(y 
1)2 =
L: (y ;)2 + L:
(y 
1)2.
(2)
The SUID of squares I(y  ;)2 in Equation (2) is called the total SS, which is the SS of the composite sample of the len or 15 observations. For the given example (Table 12.1a), AI"
L(y  9>2
= (3  6)2 + (7  6)2 + ••• + (7  6)1
a
148.
The middle term of Equation (2) is the sum of squares of the middle components of the observations (Equation 1 and Table 12.1b), that is, Σ(ȳ − ȳ̄)².
The sum of squares Σ(y − ȳ)² is pooled for all the k samples. But for each sample, Σ(y − ȳ) = 0 (Table 12.1b). Therefore, if n − 1 of the deviations (y − ȳ) are known, the remaining one becomes automatically known; and consequently Σ(y − ȳ)² has n − 1 degrees of freedom for each sample. Then the pooled value Σ(y − ȳ)² for the k samples has k(n − 1) degrees of freedom.
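The partition in Equation (2) can be verified by the long method, that is, directly from the deviations. The three samples below are recovered from the components shown in Table 12.1b.

```python
from statistics import mean

# The three samples of Table 12.1a, recovered from Table 12.1b.
samples = [[3, 7, 7, 6, 2], [9, 12, 11, 8, 5], [1, 2, 6, 4, 7]]
k, n = len(samples), len(samples[0])

grand_mean = mean(y for s in samples for y in s)   # general mean = 6

total_ss = sum((y - grand_mean) ** 2 for s in samples for y in s)
among_ss = n * sum((mean(s) - grand_mean) ** 2 for s in samples)
within_ss = sum((y - mean(s)) ** 2 for s in samples for y in s)

# Equation (2): the total SS partitions exactly (148 = 70 + 78).
```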
12.3 Computing Method

The mechanics of the partition of the sum of squares are shown in detail in Section 12.1. In that section the example used, involving mostly one-digit numbers, was deliberately chosen to avoid complicated computation. The purpose of that section is to show the meaning of partitioning the total SS into the among-sample SS and within-sample SS, but the method of doing so is too tedious to be of practical value. In this section, a shortcut method is presented for computing the total SS, among-sample SS, within-sample SS, and also the F-value. The shortcut method is developed entirely from the identity (Equations 3 and 4, Section 7.4)
    SS = Σ(y − ȳ)² = Σy² − (Σy)²/n    (1)

and the procedure is devised mainly to suit a desk calculator. The notations used in this section are the same as those used in Section 12.1, that is:

    k — number of samples;
    n — number of observations of each sample;
    y — an observation;
    T — sample total, e.g., T₁, T₂, ..., Tₖ;
    ȳ — sample mean, e.g., ȳ₁, ..., ȳₖ;
    G — grand total;
    ȳ̄ — general mean.

Since the general mean ȳ̄ is the mean of the kn observations, the total SS is the SS of the composite sample of the kn observations. From Equation (1), it can be seen that

    total SS = Σ(y − ȳ̄)² = Σy² − G²/kn.    (2)

Since ȳ̄ is also the mean of the k sample means ȳ, the application of Equation (1) to the sample means leads to the result that

    among-sample SS = n Σ(ȳ − ȳ̄)².

But ȳ is the sample total T divided by n. Therefore,

    among-sample SS = ΣT²/n − G²/kn.    (3)
The SS for each sample is

    Σy² − T²/n.

The pooled SS for the k samples is the sum of the k SS-values, that is,

    within-sample SS = Σy² − ΣT²/n.    (4)
,
(I) : ;
(ll)
1:."'P;
(III)
i: r.
Then, from Equations (2), (3), and (4), it can be seen that total SS amongaample SS
withinaample SS
= (Ill)  (I) = (D)  (I) == (Ill)  01).
(5) (6)
(7)
Now it is apparent, from the above equations, that the total SS is equal to the amongsample SS plus the withinsample SS. The basic principle of the shortcut method is to replace the means by the totals in all steps of computation. The general mean y is replaced by the grand total G and the sample mean:; is replaced by the sample total T. The observation r remains intact. With this principle in mind, it is easy to remember of what particular combination of the three quantities (I), (II), and (III) a certain 55 consists. The total S5, being the sum of the squares of the deviations (r  9>, is equal to (III)  (I). The quantity (III) involves r and the quantity (I) involves G which is associated with y. The amonJl:sample (treatment) S5, being the sum of squares of the deviations (y  n, is equal to (II)  (I). The quantity (ll) involves T which is 8880ciated with and the quantity (I) involves G which is 88sociated with The error SS, being the sum of the squares of the deviations (r  1>, is equal to (III)  (II). The quantity (III) involves r and the quantity (II) involves T which is 88sociated with y. With these 88sociations in mind, the shortcut method becomes meaningful and easy to remember. In the analysis of variance calculations, the sample totals T and the grand total G are obtained first. The rest of the computation can be arranged in a tabular form shown in Table 12.3a. Note that the sample
y.
r,
162
Ch. 12
ANALYSIS OF VARlANCEoNEWAY CLASSIFICATION
r
mean and the general mean and its components. The quantity (Table 12.3a)
y are
If2 = T:+
not needed in computing the total SS
11 + ••• + r:
can be obtained in one continuous operation on a desk calculator. Of course, Σy² can also be obtained in the same way.

TABLE 12.3a
Preliminary Calculations

(1) Type of     (2) Total of   (3) No. of      (4) No. of Observations   (5) Total of Squares per
    Total           Squares        Items Squared    per Squared Item         Observation, (2) ÷ (4)
Grand               G²             1                kn                       (I)
Sample              ΣT²            k                n                        (II)
Observation         Σy²            kn               1                        (III)

Analysis of Variance

Source of Variation   Sum of Squares SS   Degrees of Freedom   Mean Square MS   F
Among-sample          (II) − (I)          k − 1                n·s_ȳ²           n·s_ȳ²/sₚ²
Within-sample         (III) − (II)        kn − k               sₚ²
Total                 (III) − (I)         kn − 1

The three SS-values in the lower half of Table 12.3a are obtained by subtracting one item from another item of column (5) of the upper half of the table. The number of degrees of freedom can be obtained by performing the corresponding subtraction among the items of column (3) in the upper half of the table. For example, the total SS is obtained by subtracting (I) from (III), and its number of degrees of freedom, kn − 1, can be obtained by subtracting the first item, 1, from the third item, kn, of column (3).
The total SS and its components of the example given in Table 12.1a were already found in Section 12.1. Now the shortcut method is used on the same 15 observations. The details of the computation are shown in Table 12.3b. It should be noted that the three SS-values obtained by the shortcut method given in Table 12.3b are the same as those obtained in Section 12.1 by the long method. For this particular example, the advantage of using the shortcut method is not obvious, because the total number of observations is small and the observations are mostly one-digit numbers. When one compares the two methods on a practical problem, however, one soon realizes that the shortcut method indeed deserves its name.

TABLE 12.3b
Preliminary Calculations

(1) Type of     (2) Total of   (3) No. of      (4) No. of Observations   (5) Total of Squares per
    Total           Squares        Items Squared    per Squared Item         Observation, (2) ÷ (4)
Grand               8,100          1                15                       540
Sample              3,050          3                5                        610
Observation           688          15               1                        688

Analysis of Variance

Source of Variation   Sum of Squares SS   Degrees of Freedom   Mean Square MS   F
Among-sample           70                  2                   35.0             5.38
Within-sample          78                 12                    6.5
Total                 148                 14

12.4 Variance Components and Models

In Section 12.2, it is shown that, if the k samples of size n are drawn from the same population, the variance among the k sample means
(1)
is an estimate of σ²/n. The average of s²_ȳ of all possible sets of k samples of size n drawn from the same population is equal to σ²/n (Theorem 7.2). It is of interest to know the average of s²_ȳ if the k samples are drawn from populations with different means but with the same variance. The sampling experiment of Section 12.2 may be used to determine this average. In this sampling experiment, where k = 2, n = 5, all 1000 samples are drawn from the same normal population with mean equal to 50 and variance equal to 100. However, the samples could be drawn from different populations. For example, the first sample of each of the 500 pairs of samples could be drawn from a population with mean equal to 60 and the second sample drawn from a population with mean equal to 40. If 10 is added to each of the 5 observations of the first sample of each pair and 10 is subtracted from each of the 5 observations of the second sample, then these samples, with changed observations, become samples drawn from normal populations with means equal to 60 and 40 respectively. But both variances are still equal to 100 (Theorem 2.4a). The effect of these changes in the observations on the variance among sample means can be seen from the example of four samples given in Table 4.2. The means of the first pair of samples are 48.8 and 45.6 respectively. These changes in observations, and the resulting changes in population means, will change the two sample means into 58.8 and 35.6 respectively, but with the general mean, ȳ̄ = 47.2, unchanged. When the two samples are drawn from the same population with mean equal to 50, the variance of the sample means is
s²_ȳ = [(48.8 − 47.2)² + (45.6 − 47.2)²] / (2 − 1) = 5.12.
Now when the first sample is drawn from the population with mean equal to 60 and the second sample drawn from the population with mean equal to 40, the variance of the sample means is

s²_ȳ = [(58.8 − 47.2)² + (35.6 − 47.2)²] / (2 − 1) = 269.12.
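The worked example above can be checked with a few lines of code; the sample means 48.8 and 45.6 are the ones quoted from Table 4.2, shifted by +10 and −10 to simulate population means of 60 and 40.

```python
# Variance among k sample means, s²_ȳ = Σ(ȳ − general mean)² / (k − 1),
# computed before and after shifting the two means by +10 and −10.
def var_of_means(means):
    g = sum(means) / len(means)                       # general mean
    return sum((m - g) ** 2 for m in means) / (len(means) - 1)

print(round(var_of_means([48.8, 45.6]), 2))   # 5.12
print(round(var_of_means([58.8, 35.6]), 2))   # 269.12
```

The net gain of 264 in the text is exactly the difference between these two values.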
Because of the variation among population means, the variance among sample means is increased from 5.12 to 269.12, with a net gain of 264. On the average, the gain is equal to the variance of the two population means. In general, the average of s²_ȳ for all possible sets of k samples drawn from k populations is equal to σ²/n + σ²_μ, where σ²_μ, the variance of the k population means, is defined as

σ²_μ = Σ(μ − μ̄)² / (k − 1).    (2)

The notation μ̄ in the above equation is the mean of the k population means, μ₁, μ₂, …, μ_k. The sampling experiment may be used to verify this point. The 1000 samples of the experiment are all drawn from the
same population with mean equal to 50. The variance among the sample means is

s²_ȳ = (1/2)(ȳ₁ − ȳ₂)².    (3)
If the two samples of each of the 500 pairs of samples are drawn from populations with means equal to 60 and 40, the effect is that ȳ₁ is increased by 10 and ȳ₂ is decreased by 10, and consequently the difference ȳ₁ − ȳ₂ is increased by 20. The variance of the new sample means is then

s²_ȳ = (1/2)(ȳ₁ − ȳ₂ + 20)².
The effect of the population means being 60 and 40, instead of being equal, is that the variance of a pair of sample means is increased by 20

2.7587. When an F-value is less than 2.7587, the hypothesis that the k population means are equal is accepted and thus a Type II error is committed. It can be seen from Fig. 12.5 that the probability of committing a Type II error, as shown by the shaded area of the distorted F-distribution, is very small. The procedure of the test of hypothesis, illustrated by the example given in Table 12.1a, is summarized as follows:
1. Hypothesis: The hypothesis is that the three population means are equal, that is, μ₁ = μ₂ = μ₃. The hypothesis can also be stated as σ²_μ = 0.
2. Alternative hypothesis: The alternative hypothesis is that the three population means are not all the same, or that σ²_μ > 0.
3. Assumptions: The assumption is that the three samples are random samples drawn from normal populations with the same variance.
4. Level of significance: The 5% significance level is used.
5. Critical region: The critical region is where F > 3.8853. (Since k = 3 and n = 5, the numbers of degrees of freedom of F are 2 and 12. This F-test is a one-tailed test; therefore, the 5% table is used for the 5% significance level.)
6. Computation of F: The details of the computation of F are shown in Table 12.3b. The F-value is equal to 5.38, with 2 and 12 degrees of freedom.
7. Conclusion: Since the F-value is inside the critical region, the hypothesis is rejected, and the conclusion is that the three populations do not have the same means. (If the F-value were less than 3.8853, the conclusion would be that the three populations do have the same means.)
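The computation in step 6 can be sketched directly from the preliminary quantities of Table 12.3b: (I) = 540, (II) = 610, and (III) = 688, with k = 3 samples of n = 5 observations. The value 3.8853 is the 5% point of F with 2 and 12 degrees of freedom quoted in step 5.

```python
# Shortcut-method F computation for the one-way analysis of variance,
# using the three preliminary quantities (I), (II), (III) of Table 12.3b.
k, n = 3, 5
I, II, III = 540, 610, 688
among_ss = II - I            # 70, with k − 1 = 2 degrees of freedom
within_ss = III - II         # 78, with kn − k = 12 degrees of freedom
among_ms = among_ss / (k - 1)        # 35.0
within_ms = within_ss / (k * n - k)  # 6.5
F = among_ms / within_ms
print(round(F, 2), F > 3.8853)   # 5.38 True
```

Since F exceeds 3.8853, the hypothesis is rejected, matching step 7.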
[Fig. 12.5. The F-distribution under the hypothesis and the distorted F-distribution under the alternative hypothesis; the shaded area of the distorted distribution shows the probability of committing a Type II error.]
12.6 Relation Between t-Distribution and F-Distribution
If two samples are available, the t-test may be used in testing the hypothesis that the two population means are equal (Section 10.6), but the F-test (analysis of variance with k = 2) can also be used for the same purpose. It is interesting to see whether the two tests will always lead to the same conclusion. In the t-test, if n₁ = n₂ = n,

t = (ȳ₁ − ȳ₂) / √[s²_p(1/n₁ + 1/n₂)] = (ȳ₁ − ȳ₂) / √(2s²_p/n),    (1)

with (n₁ + n₂ − 2) or 2n − 2 degrees of freedom. In the analysis of variance,
F = ns²_ȳ / s²_p    (2)

(Equation 4, Section 12.2) with 1 (k = 2, so k − 1 = 1) and 2(n − 1) degrees of freedom. After some algebraic manipulation, it can be shown that t² = F. The quantity s²_ȳ is equal to (1/2)(ȳ₁ − ȳ₂)² (Equation 3, Section 12.4). Hence

F = ns²_ȳ / s²_p = n(ȳ₁ − ȳ₂)² / 2s²_p = (ȳ₁ − ȳ₂)² / (2s²_p/n) = t²,
and t has 2(n − 1) degrees of freedom, while F has 1 and 2(n − 1) degrees of freedom. The t-distribution is symmetrical, with zero in the center of the distribution (Fig. 8.1). The F-distribution is asymmetrical with zero as the lower limit (Fig. 9.1). A t-value may be negative, zero, or positive, but t² can be only zero or positive, because the square of a negative number is positive. For t with 10 degrees of freedom, 2.5% of all t-values are less than −2.2281 and 2.5% of all t-values are greater than 2.2281, but (−3)² and 3² are both equal to 9 and greater than (2.2281)². Therefore, a total of 5% of all t²-values are greater than (2.2281)² or 4.96. This value 4.96 is the 5% point of F with 1 and 10 degrees of freedom. The F-distribution with 1 and ν degrees of freedom is then a doubled-up version of the t-distribution with ν degrees of freedom. The two tails of t are folded into the right tail of F. The center portion of the t-distribution becomes the left tail of the F-distribution. The square of the 2.5% point of t with ν degrees of freedom is the 5% point of F with 1 and ν degrees of freedom. This relation can be observed from the t-table and F-table. The 2.5% points of t for various numbers of degrees of freedom are 12.706, 4.3027, 3.1825, etc. The squares of these values
are 161.44, 18.513, 10.128, etc., which are the values given in the first column of the 5% F-table. Therefore, in testing a hypothesis that two population means are equal, either the two-tailed t-test or the one-tailed F-test (analysis of variance) may be used, and the two tests always yield the same conclusion. The fact that t² = F can be further substantiated. In Section 7.9, it is shown that
SS:    Σ(y − μ)² = n(ȳ − μ)² + Σ(y − ȳ)²    (4)
d.f.:  n = 1 + (n − 1)    (5)

Then

F = [n(ȳ − μ)²/1] / [Σ(y − ȳ)²/(n − 1)] = n(ȳ − μ)²/s² = (ȳ − μ)²/(s²/n)    (6)
follows the F-distribution with 1 and n − 1 degrees of freedom. In Theorem 8.1a, it is shown that
t = (ȳ − μ) / √(s²/n)    (7)
follows the t-distribution with n − 1 degrees of freedom. It can be seen from Equations (6) and (7) that t² = F. The relation between the t-distribution and the F-distribution can be summarized in the following theorem:
Theorem 12.6  If a statistic t follows the Student's t-distribution with ν degrees of freedom, t² follows the F-distribution with 1 and ν degrees of freedom.
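The identity t² = F of Theorem 12.6 can be verified numerically for two samples of equal size, computing both statistics from their defining formulas. The two samples below are hypothetical, used only for the check.

```python
# t (pooled-variance, Equation 1) and F (analysis of variance, Equation 2)
# computed for two samples of size n; Theorem 12.6 says t² = F.
def mean(xs):
    return sum(xs) / len(xs)

def t_and_f(y1, y2):
    n = len(y1)
    s2p = (sum((y - mean(y1)) ** 2 for y in y1) +
           sum((y - mean(y2)) ** 2 for y in y2)) / (2 * n - 2)  # pooled s²_p
    t = (mean(y1) - mean(y2)) / (2 * s2p / n) ** 0.5
    gm = mean(y1 + y2)                                          # general mean
    among_ms = n * ((mean(y1) - gm) ** 2 + (mean(y2) - gm) ** 2) / (2 - 1)
    return t, among_ms / s2p                                    # t and F

t, f = t_and_f([3, 5, 4, 6, 7], [8, 9, 7, 10, 11])
print(abs(t * t - f) < 1e-9)   # True
```

Any other pair of equal-size samples gives the same agreement, apart from floating-point rounding.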
12.7 Assumptions
The assumptions are the conditions under which a test of hypothesis is valid. In the analysis of variance, the assumptions are the conditions under which the statistic
F = among-sample mean square / within-sample mean square = treatment MS / error MS    (1)
follows the F-distribution. These conditions, which are already demonstrated in the sampling experiment of Section 12.2, are listed below:
(a) The k samples are random samples drawn from k populations.
(b) The k populations are normal.
(c) The variances of the k populations are equal.
Of course, equality of the k population means is another necessary condition under which the statistic F follows the F-distribution. But this condition is the hypothesis being tested. The hypothesis is considered true before the test, but it is subject to rejection after the test. The other three conditions listed above must be true, before and after the test. If the assumptions are not satisfied, the statistic F given in Equation (1) will not follow the F-distribution, and consequently the percentage points given in the F-table will not be correct. Then the 5% point is not really the 5% point but a different percentage point. Therefore, when the 5% point in the F-table is used in determining the critical region, the actual significance level is not exactly 5%, but more or less than 5%. Hence, the consequence of using the existing tables without satisfying the assumptions underlying a test of hypothesis is that the significance level is disturbed. Another conceivable consequence is that the probability of committing the Type II error may also be affected. The consequences of not satisfying the assumptions have been investigated by many workers. The results of these investigations are briefly summarized as follows:
(a) Randomness of samples. The randomness of the samples can be achieved by randomizing the experimental material (Section 10.9). The random number table may be used for this purpose. In other words, one avoids the consequences of non-randomness by making the samples random, instead of creating non-randomness and then worrying about the consequences.
(b) Normality of populations. Non-normality of the populations does not introduce serious error in the F-test or in the two-tailed t-test. If the F-table and t-table are used in determining the critical regions, the true significance level is actually larger than the one being specified. For example, if the 5% point of the F-table is used to determine the critical region, the true significance level is actually larger than 5%. Therefore, if the hypothesis is true, the rejection of it is more likely than the significance level indicates.
(c) Homogeneity of variances. If the variances of the k populations are not too much different, the F-test and the two-tailed t-test are not seriously affected. The effect of heterogeneity of variances can be reduced by using samples of the same size (Section 10.7).
In conclusion, a slight departure from the assumptions will not cause serious error in the F-test and the two-tailed t-test. In Chapter 23, methods are presented for correcting the departure from the assumptions.
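The meaning of the significance level under these assumptions can be illustrated by a small simulation, not from the book: k = 3 normal populations with equal means and variances (the mean 50 and standard deviation 10 are arbitrary choices), n = 5, so a true hypothesis should be rejected close to 5% of the time at the 5% point F = 3.8853 with 2 and 12 degrees of freedom.

```python
# Sampling experiment: when all assumptions hold and the hypothesis is true,
# the F-value exceeds the tabled 5% point in roughly 5% of the trials.
import random

random.seed(7)  # fixed seed so the run is reproducible

def f_value(samples):
    k, n = len(samples), len(samples[0])
    means = [sum(s) / n for s in samples]
    grand = sum(means) / k
    among_ms = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    within_ms = sum((y - sum(s) / n) ** 2 for s in samples for y in s) / (k * n - k)
    return among_ms / within_ms

trials = 2000
rejections = sum(
    f_value([[random.gauss(50, 10) for _ in range(5)] for _ in range(3)]) > 3.8853
    for _ in range(trials)
)
print(rejections / trials)   # close to 0.05
```

Replacing random.gauss with a skewed generator would show the disturbed significance level described in (b).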
12.8 Applications
The analysis of variance is a very comprehensive topic in statistics. What is presented in this chapter is only the simplest case. Originally,
the analysis of variance was developed by R. A. Fisher for use by agriculturists in analyzing data from their field experiments. At the present time, its applications are extended to various experimental sciences, some examples of which are given in this section. A manufacturer has three (k = 3) different processes of making fiber boards and wishes to determine whether these processes produce equally strong boards. A random sample of, say, 20 (n = 20) boards is to be obtained from each of the products manufactured by the three processes. The strength (observation) of each of the 60 boards is determined. Then the analysis of variance may be used to test the hypothesis that the average strengths of the boards produced by the three different processes are the same. The analysis is as follows (the case of multiple observations for each board is discussed in Section 18.9):
Sources of Variation   Degrees of Freedom
Among processes        2  (k − 1)
Within processes       57 (kn − k)
Total                  59 (kn − 1)
Five different kinds of feed may be compared for their fattening ability. A total of 50 animals may be divided into 5 groups, at random, with the aid of a random number table (Section 10.9). The animals are individually fed, but each group of 10 animals is fed a different ration. The animals are weighed, both before and after a feeding period. The gain in weight of each animal is an observation. Then the analysis of variance may be used in testing the hypothesis that the 5 groups of animals, on the average, gained the same amount of weight. The analysis is as follows:
Sources of Variation   Degrees of Freedom
Among rations          4  (k − 1)
Within rations         45 (kn − k)
Total                  49 (kn − 1)
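The degrees-of-freedom bookkeeping in the two layouts above follows one rule, sketched below for k treatments with n observations each:

```python
# Degrees of freedom for a one-way analysis of variance:
# among = k − 1, within = kn − k, total = kn − 1.
def dof(k, n):
    return k - 1, k * n - k, k * n - 1

print(dof(3, 20))   # (2, 57, 59)  fiber-board example
print(dof(5, 10))   # (4, 45, 49)  feeding example
```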
Two examples of possible applications of the analysis of variance are given in this section. The applications seem to be quite straightforward once the basic principles of the analysis of variance are mastered. However, it must be realized that these examples are deliberately chosen for their simplicity and, therefore, are deceiving. In general, the application of statistics involves the translation of the abstract idea of sample and population into a practical problem. This translation has never been proved easy. It is baffling even to the experts at times. In order to make an abstract idea (mathematical model) agree as closely as possible with a practical problem, the method of collecting data becomes all-important. The design of an experiment, which is a process of collecting data, is no longer an amateur's job. At the present time, research organizations usually have consulting statisticians on their staff. Their job is to recommend methods of collecting data to suit existing statistical methods, to modify the old statistical methods to fit practical problems, or to develop new statistical methods. An experimental scientist consults the statistician for this purpose before he starts his experiment and not after the experiment is completed, much as one consults an architect when he wishes to build a house. The purpose of consulting an architect is to obtain his advice before a house is built and not his blessing after the house is finished. An experimental scientist consults a statistician at the planning stage of an experiment mainly to take advantage of the statistician's knowledge. However, the limitation of the statistician's knowledge is also a reason for early consultation. Modern statistics is a very new field, as witnessed by the fact that most of the outstanding contributors to its development are still living. Therefore, even though a vast amount of statistical knowledge has been accumulated during the last 50 years, the number of statistical methods which can be directly applied to practical problems is still quite limited. In general, experiments are designed within the limit of existing methods. But if there is no existing method for dealing with a particular set of experimental data at hand, a statistician may be unable, or unwilling, to help the experimental scientist. The production of a tailor-made method usually requires both time and talent, and even the best brains do not produce drastically new methods in rapid succession. Moreover, a theorist, in statistics as in any field of science, usually prefers to select problems to suit his own talents and may not wish to spend his time on an assigned problem which he may or may not be able to solve. Therefore, the experimental scientist should consult a statistician before an experiment is started, to make sure that a statistical method is available, or a method can be developed, to handle the experimental data.
12.9 Specific Tests
There are many tests associated with the analysis of variance. The F-test described in Section 12.5 is a general one. It only tests the hypothesis that the k population means are equal. The other tests are designed to test more specific hypotheses concerning the k population means. Some of these tests are presented in Sections 15.3, 15.4, 15.5, and 17.8. The advantages of these tests over the general F-test are discussed in those sections.
12.10 Unequal Sample Sizes
So far, the analysis of variance being considered is the case of k samples, each consisting of n observations. In this section, the case of
unequal sample sizes is considered; that is, the k samples consist of n₁, n₂, …, n_k observations respectively. The hypothesis and assumptions are the same for both cases. The difference is mainly in the computing method, which is to be illustrated by the example given in Table 12.10a. The 3 (k = 3) samples consist of 2, 2, and 4 observations respectively, that is, n₁ = 2, n₂ = 2, n₃ = 4, and Σn = 8. It should be noted here that the total number of observations is Σn instead of kn. Each of the 8 observations can be partitioned into three components, that is,

y = ȳ̄ + (ȳ − ȳ̄) + (y − ȳ).    (1)
The components of the observations are shown in Table 12.10b. The among-sample SS is

Σn(ȳ − ȳ̄)² = n₁(ȳ₁ − ȳ̄)² + n₂(ȳ₂ − ȳ̄)² + n₃(ȳ₃ − ȳ̄)² = 2(3)² + …

Thus the hypothesis that σ²_μ = 0 is more likely to be rejected if the hypothesis is false. Therefore, the probability of committing the Type II error is reduced by equalizing the sample sizes. However, if the hypothesis is true, that is, σ²_μ = 0, the value of n₀ has no effect on the ratio, which is equal to 1. Consequently, the probability of committing a Type I error is not affected by the value of n₀. Therefore, equalizing the sample sizes has every advantage. It can be proved algebraically that
n₀ = Σn/k − s²_n/Σn = n̄ − s²_n/Σn,    (2)

which indicates that n₀ is always less than or equal to the average sample size, n̄. The magnitude of the difference between n₀ and n̄ depends on the variance, s²_n, of the sample sizes, where

s²_n = Σ(n − n̄)²/(k − 1) = [Σn² − (Σn)²/k]/(k − 1).    (3)
If the total number of observations is the same, the larger the variation among the sample sizes, the larger is the probability of committing a Type II error. The algebraic proof of Equation (2) is as follows:

n₀ = (1/(k − 1))[Σn − Σn²/Σn]
   = [(Σn)² − Σn²]/[(k − 1)Σn],

while

n̄ − s²_n/Σn = Σn/k − [Σn² − (Σn)²/k]/[(k − 1)Σn]
   = [(k − 1)(Σn)² − kΣn² + (Σn)²]/[k(k − 1)Σn]
   = [k(Σn)² − kΣn²]/[k(k − 1)Σn]
   = [(Σn)² − Σn²]/[(k − 1)Σn] = n₀.
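Equation (2) can be checked numerically for the sample sizes 2, 2, and 4 of Table 12.10a; the helper n0 below is a sketch of the definition n₀ = (1/(k − 1))[Σn − Σn²/Σn].

```python
# n0 computed two ways: from its definition, and as n̄ − s²_n/Σn (Equation 2).
def n0(sizes):
    k, tot = len(sizes), sum(sizes)
    return (tot - sum(m * m for m in sizes) / tot) / (k - 1)

sizes = [2, 2, 4]
k, tot = len(sizes), sum(sizes)
nbar = tot / k                                             # average sample size
s2n = (sum(m * m for m in sizes) - tot ** 2 / k) / (k - 1)  # variance of sizes
print(round(n0(sizes), 4), round(nbar - s2n / tot, 4))   # 2.5 2.5
```

Note that n₀ = 2.5 is indeed below the average size n̄ = 8/3 ≈ 2.67, as Equation (2) requires.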
χ²-distribution with 1 degree of freedom.
(2) The relation between u and t is given in Theorem 8.1b. The t-distribution approaches the u-distribution as the number of degrees of freedom approaches infinity.
(3) The relation between χ² and F is given in Theorem 9.7. If a statistic χ² follows the χ²-distribution with ν degrees of freedom, χ²/ν follows the F-distribution with ν and ∞ degrees of freedom.
(4) The relation between t and F is given in Theorem 12.6. If a statistic t follows the Student's t-distribution with ν degrees of freedom, t² follows the F-distribution with 1 and ν degrees of freedom.
(5) The relation between u and F can be deduced from the relation between u and t and that between t and F. If u follows the t-distribution with ∞ degrees of freedom, u² must follow the F-distribution with 1 and ∞ degrees of freedom.
It should be noted that each of the three distributions, u, t, and χ², is a special case of the F-distribution. The various relations are summarized in Table 13.2, which is made in the form of the F-table.

TABLE 13.2
F-table

ν₂ \ ν₁     1       2       3      …      ∞
   1        t²
   2        t²
   3        t²
   ⋮        ⋮
   ∞        u²    χ²/ν₁   χ²/ν₁    …    χ²/ν₁

If Table 13.2 is the 5% F-table, the figures in the first column of the table are the squares of the 2.5% points of the t-distribution (Section 12.6). The figures in the last line of the table are the 5% points of χ²/ν₁ (Table 5, Appendix). The lower left corner of the table is u². The 5% point of F with 1 and ∞ degrees of freedom is (1.960)² or 3.8415. The four distributions are introduced in the order of u, χ², t, and F. In terms of Table 13.2, the introduction is started from the lower left corner (u) and then extended to the whole bottom line (χ²). Then t is extended to the whole first column, and finally the whole table is covered. As
compared with the F-distribution, the other three distributions are relatively unimportant; but they are instrumental in introducing the statistical concept. To simplify the subject of statistical methods, it seems to be more practical to merely list a collection of methods, each for a particular purpose, and say nothing of the relations among the various methods. Yet anyone who reads research papers in the fields where statistics is used is seriously handicapped if he does not have some knowledge of the relations among the various distributions. For example, if one knows that the t-test can be used to test the hypothesis that two population means are equal, he may become bewildered when he encounters the analysis of variance being used for the same purpose. In fact, it is quite common among beginners to use the analysis of variance to test the hypothesis that two population means are equal after the t-test is already performed, hoping that at least one test will produce a favorable conclusion (Section 12.6).
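The squaring relation between the 2.5% points of t and the first column of the 5% F-table can be checked directly; the t-values 12.706, 4.3027, and 3.1825 are the ones quoted in Section 12.6.

```python
# Squares of the 2.5% points of t with 1, 2, and 3 degrees of freedom:
# these are the first-column entries of the 5% F-table.
for t in (12.706, 4.3027, 3.1825):
    print(round(t * t, 3))
# 161.442
# 18.513
# 10.128
```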
13.3 Tests of Hypotheses
The preceding section shows that the various distributions are introduced systematically. This section shows that the methods of testing various hypotheses are also introduced systematically. The methods are listed in Table 13.3. It should be noted that the F-test can be used in testing every hypothesis listed in the table. All of the four statistics, u, χ², t, and F, listed in Table 13.3 are pure numbers. Regardless of the unit of measurement used in the observations, the values of these four statistics remain unaffected. The length may be measured in inches or in centimeters; the yield of crops may be measured in pounds per plot or bushels per acre; the temperature may be measured in Fahrenheit or centigrade; whatever the unit of measurement, the values of u, χ², t, and F remain unchanged. As a result, the conclusions reached by the tests of hypothesis remain the same, even though the unit of measurement is changed.
TABLE 13.3

Hypothesis                 No. of Parameters   Statistic        Section No.
σ² = σ₀²                   1                   χ², χ²/ν = F     7.10; 9.7
μ = μ₀                     1                   u, t, F          7.10 & 7.11; 8.4 & 8.5; 7.9 & 12.6
σ₁² = σ₂²                  2                   F                9.4 & 9.5
μ₁ = μ₂                    2                   t, F             10.6; 12.5
μ₁ = μ₂ = ⋯ = μ_k          k                   F                12.5

The hypotheses listed in Table 13.3 are very simple. If two populations are available, it is a simple matter to determine which population has a greater mean. If μ₁ is known to be 49.2 and μ₂ is known to be 49.1, anyone can see that μ₁ is greater than μ₂ without committing any type of error and, furthermore, anyone can see that μ₁ − μ₂ = .1 without the aid of a confidence interval. Such simple arithmetic as this becomes a complicated statistical method when the populations are not available and samples must be relied upon. If ȳ₁ = 49.2 and ȳ₂ = 49.1, it cannot be concluded that μ₁ − μ₂ = .1, because the sample mean changes from sample to sample. A conclusion concerning the population means must be reached through the use of statistics. There are only two conditions under which statistics is not needed. Either the sample sizes are so large that the samples become the populations themselves, or the population variances are equal to zero. When all the observations of one population are the same, and, at the same time, all the observations of another population are the same (that is, both population variances are equal to zero), the difference between μ₁ and μ₂ can be determined with certainty by drawing one observation from each population. For example, all the observations in one population are equal to 49.2, and all the observations in another population are equal to 49.1.
Then the two population means are 49.2 and 49.1 respectively. As a result, only simple arithmetic is needed to determine the difference between the two population means, and a statistical method is not needed for this purpose. Whenever conclusions about the populations are reached without using statistics, one or both conditions mentioned above must be fulfilled. However, a problem in which the variance is equal to zero frequently ceases to be of interest to scientists. For example, sociologists may be interested to know the income distribution of a country. But if the income of every person is the same, immediately the problem itself disappears.

13.4 Significance
The word significance has a technical meaning in statistics. In general, it is used in connection with the rejection of a hypothesis. The meaning of the word significance depends on the hypothesis tested. This is the reason why various tests of hypotheses are presented before the word significance is introduced. In testing the hypothesis that the population mean is equal to a given value, say 60, the sample mean is said to be significantly different from 60 if the hypothesis is rejected. If the hypothesis is accepted, the sample mean is said to be not significantly different from 60. In testing the hypothesis that two population means are equal, the two sample means are said to be significantly different if the hypothesis is rejected, that is, if the conclusion is reached that the two population means are different. The mere conclusion that two population means are different does not imply that there is a substantial difference between them. The magnitude of the difference must be estimated by a confidence interval. The result of the analysis of variance is said to be significant if the conclusion is reached that the k population means are not equal. The word significance is used only in connection with statistics and never with parameters. Two sample means may be significantly different or not significantly different, depending on the rejection or acceptance of the hypothesis that the two population means are equal. But the word significantly is not used to modify the difference between two population means.

13.5 Sample Size
The sample size plays an important role in statistical methods. It seems to be intuitively obvious that the larger the sample size, the more accurate the result. However, the specific benefits to be derived from a large sample are not so obvious. Two types of errors are involved in testing a hypothesis.
The increase in accuracy of a test always means the reduction of the probability of committing one or both types of error. The probability of committing a Type I error is called the significance level, which may be fixed as large or as small as one wishes, regardless of the sample size. The advantage of having a large sample is to reduce the probability of committing a Type II error, after the significance level is fixed. If the hypothesis being tested is true, only a Type I error, if any, can be made. As long as the significance level remains the same and the
hypothesis being tested is true, a large sample has no advantage over a small one. If the hypothesis being tested is false, only a Type II error, if any, can be made. If the significance level remains the same, the probability of committing a Type II error will be decreased by an increase in sample size. In other words, a large sample is more likely to cause the rejection of a false hypothesis than is a small sample. This is the advantage of having a large sample in a test of hypothesis. In estimating a parameter by a confidence interval, the confidence coefficient and the length of the interval are of primary importance. The higher the coefficient, the more likely it is that an interval will catch the parameter. But the confidence coefficient is arbitrarily chosen and has nothing to do with the sample size. The advantage of having a large sample is to reduce the length of the interval, after the confidence coefficient is chosen. For example, the 95% confidence interval of μ is

ȳ − t.₀₂₅√(s²/n)  to  ȳ + t.₀₂₅√(s²/n).
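This interval can be sketched numerically. The sample values below are hypothetical; 2.2281 is the 2.5% point of t with 10 degrees of freedom (so n = 11), quoted earlier in the chapter.

```python
# 95% confidence interval of μ: ȳ ± t.025 √(s²/n), for a hypothetical
# sample with ȳ = 50, s² = 100, and n = 11 (10 degrees of freedom).
ybar, s2, n = 50.0, 100.0, 11
t025 = 2.2281
half = t025 * (s2 / n) ** 0.5        # half-length of the interval
print(round(ybar - half, 2), round(ybar + half, 2))   # 43.28 56.72
```

Doubling n (with the same s²) would shrink the half-length by a factor of about √2, which is the advantage of the larger sample.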
The length of this interval is 2t.₀₂₅√(s²/n). As ȳ changes from sample to sample, the center of the interval will change. As s² changes from sample to sample, the length of the interval will change, even if the sample size remains the same. The increase in sample size will reduce the average length of such intervals.

13.6 Simplified Statistical Methods
The various hypotheses and methods of testing them, as listed in Table 13.3, are presented in the preceding chapters. Yet they are not the only methods which can be used in testing these hypotheses. During the last decade many simplified methods have been developed for testing the same hypotheses. Some of these methods, which are unfortunately named inefficient statistics, are usually presented in books on industrial quality control, but none of them is given in this book. The advantage of these simplified methods lies in the simplicity of computation. For example, the quantity SS, which is needed in every statistic listed in Table 13.3, is never needed for the simplified methods. However, for the same given observations drawn from a normal population, the probability of committing a Type II error is greater if the simplified methods are used instead of those listed in Table 13.3. This is the reason why the simplified methods are called inefficient statistics. In contrast, the methods listed in Table 13.3 may be called efficient statistics. But it should be realized that the increase in sample size will decrease the probability of committing a Type II error. Therefore, this
probability may be equalized by using larger sample sizes for the inefficient statistics than for the efficient statistics. The choice of a method of testing a hypothesis depends on the cost of computation and that of collecting observations. If the cost of collecting observations is low, larger samples may be obtained and the inefficient statistics may be used to avoid the complicated computation. If the observations are costly to collect, they should be treated carefully with efficient statistics. The analogy is like an orange juice squeezer. If oranges are sold at a penny a dozen, there is no need to buy an expensive and efficient squeezer to squeeze an orange dry. If oranges are sold at one dollar apiece, a good squeezer is needed to squeeze every drop of juice out of an orange. In addition to cost, human inertia also plays an important part in the choice of a method. One naturally likes to use a familiar method rather than to struggle with a strange one. At the present time, inefficient statistics are usually used in industrial plants where observations are collected easily and cheaply. Efficient statistics are usually used by scientists in analyzing experimental data, which are usually collected at great effort and expense.

13.7 Error
The word error is used in several distinct senses in statistics. The computing error, the standard error, the Type I error, and the Type II error are really different kinds of errors. These different kinds of errors are listed below:
(1) A mistake made in computation is usually called an error. It may be due to the failure either of the calculator or of the operator. The seriousness of this kind of mistake usually is not realized by beginners in statistics. A mistake in computation, if undetected, may lead to a wrong conclusion, no matter how elaborate the statistical method applied to the experimental data.
(2) The term standard error does not imply that any mistakes are made. The sample mean changes from sample to sample.
The standard error of the mean (Section 5.3) measures the deviations of the sample means from the population mean. The fact that sample means deviate from the population mean is a natural phenomenon.
(3) The word error in the error SS of the analysis of variance also does not imply that any mistakes are made. An error here is the deviation of an observation from the sample mean.
(4) The Type I and the Type II errors are actual mistakes; but they are not mistakes due to human or mechanical failure. They are natural consequences of drawing conclusions about the populations while only samples are available.
QUESTIONS
(1) If the significance level remains the same, what is the advantage of
increasing the sample size in testing a hypothesis? (2) In testing a hypothesis with the same significance level, is a large sample or a small sample more likely to cause a rejection of the hypothesis? Why? (3) In testing the hypothesis that the population mean is equal to a given value, only one sample is needed. Why then are all possible samples dragged into the discussion? (4) What is the 1% point of F with 1 and ∞ degrees of freedom? Why? (5) What are the 1% and 5% points of F with ∞ and ∞ degrees of freedom? Why?
REFERENCES
Dixon, W. J. and Massey, F. J.: Introduction to Statistical Analysis, McGraw-Hill Book Company, New York, 1951.
Mosteller, Frederick: "On Some Useful 'Inefficient' Statistics," Annals of Mathematical Statistics, Vol. 17 (1946), pp. 377-408.
CHAPTER 14
RANDOMIZED BLOCKS

The term randomized blocks designates a particular design for an experiment. The design itself and possible applications of it are the same as those used in the method of paired observations (Section 8.7). However, in the latter method, only two treatments are involved, while in the randomized block design, any number of treatments may be involved in the experiment. As a result, the analysis of variance must be used, instead of the t-test, in testing the hypothesis that the treatment effects are equal.

14.1 Randomized Block Versus Completely Randomized Experiment

The randomized block design may be illustrated by the feeding experiment (Section 12.8) which involves 5 rations and 50 animals. The 50 animals are divided at random, with the aid of a random number table, into 5 groups, each consisting of 10 animals. Then each group is fed a different ration. Or this method may be modified by dividing the animals, before they are randomized, into 10 groups, each consisting of 5 animals. The grouping is done in such a way that the animals within a group are as nearly alike as possible in initial weight, age, etc. If the animals are pigs or mice rather than steers, each group may very well be animals of the same litter. Each group or litter of animals is called a block. Then the 5 animals in a block are assigned at random to the 5 different rations. An experiment of this kind, in which the randomization is preceded by deliberate selection, is called a randomized block experiment. If the 50 animals are assigned at random to the 5 different rations, without previous grouping into blocks, the experiment is said to be completely randomized. The applications of the analysis of variance, with one-way classification, as given in Chapter 12, are all completely randomized experiments. In the completely randomized experiment, the 50 animals are assigned at random to the 5 different rations, without any restriction.
In the randomized block experiment, the animals are deliberately divided into 10 blocks and then the randomization is restricted within the blocks. This restriction of randomization differentiates these two designs of experiments. Many books have been written about the design of experiments, which is a specialized field of statistics. Long lists of designs are available for use by experimental scientists. One design differs from another according to the restrictions imposed on the randomization.
The sugar beet experiment described in Section 8.7 is an example of a randomized block experiment. The field map showing the relative positions of the plots is given in Fig. 8.7. This example of two treatments, with and without fertilizer, is to show the possible application of the method of paired observations. However, if the number of treatments is increased from two to four (for example, if 0, 50, 100 and 150 pounds of available nitrogen per acre are used in the experiment), each block will include 4 plots instead of 2, and the 4 treatments are then assigned at random to the 4 plots within a block. When the random number table is used, the treatments are given code numbers: 1, 2, 3, and 4.

[Fig. 14.1: field map of a randomized block experiment with 4 treatments and 3 blocks, showing the block and plot numbers.]

The plots within a block are numbered consecutively according to their geographic
order (Fig. 14.1). The treatment whose code number occurs first in the random number table is assigned to the first plot; the treatment whose code number occurs second in the table is assigned to the second plot, and so forth. The randomization of the treatments is repeated for each block. A field map of 4 treatments with 3 blocks is given in Fig. 14.1 as an illustration. Although the block and plot numbers are given separately in Fig. 14.1, in actual practice the two numbers are usually combined into one. For example, the third plot of the second block is called Plot No. 23, 2 for the block and 3 for the plot. The term block was first used in field experiments where the word "block" meant a block of land. It was appropriate and descriptive. But it subsequently became a technical term used in experimental designs, and now even a litter of animals may be called a block. In the randomized block experiment, every treatment occurs once in each block. Then each block is really a complete experiment by itself. Therefore, the randomized block experiment is also called the randomized complete block experiment as distinguished from the more complicated designs which are called incomplete block experiments. The number of complete blocks is also called the number of replications of an experiment. The experiment shown in Fig. 14.1 is called a randomized block
experiment with 4 treatments (4 different quantities of fertilizers) and 3 replications (blocks). The advantage of the randomized block experiment over the completely randomized experiment is discussed in Section 14.8.
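The within-block randomization just described is easy to reproduce with a pseudo-random number generator standing in for the random number table (a sketch added by the editor, not part of the original text; the seed and variable names are arbitrary):

```python
import random

random.seed(42)  # arbitrary seed, standing in for a random number table

treatments = [1, 2, 3, 4]  # code numbers for the 4 fertilizer levels
n_blocks = 3

# Assign the treatments at random to the plots of each block,
# repeating the randomization independently for every block.
layout = {}
for block in range(1, n_blocks + 1):
    order = treatments[:]
    random.shuffle(order)
    for plot, treatment in enumerate(order, start=1):
        # Combined plot label: block digit followed by plot digit,
        # e.g. Plot No. 23 is the third plot of the second block.
        layout[10 * block + plot] = treatment

for plot_no, treatment in sorted(layout.items()):
    print(plot_no, treatment)
```

Because the shuffle is repeated for each block, every treatment occurs exactly once per block, which is the defining restriction of the design.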
14.2 Mechanics of Partition of Sum of Squares
For the randomized block experiment, the analysis of variance is used in testing the hypothesis that the treatment effects are equal. The mechanics of the partition of sum of squares are illustrated by an example of 3 treatments and 4 replications given in Table 14.2a. The data may be supposed to have come from a feeding experiment consisting of 3 feeds and 4 litters of animals or from a variety trial consisting of 3 varieties of wheat planted in 4 blocks in the field.

TABLE 14.2a
                 Treatment            Rep.      Rep.       Rep.
Replication     1     2     3         totals    means      effects
                                      Tr        ȳr         ȳr − ȳ
    1           6     7     8          21        7            1
    2           8     7     9          24        8            2
    3           7     2     3          12        4           −2
    4           7     4     4          15        5           −1

Treatment
totals Tt      28    20    24        G = 72
Treatment
means ȳt        7     5     6                  ȳ = 6
Treatment
effects
ȳt − ȳ          1    −1     0           0
The notations used here are the same as those used in Chapter 12. The letter k is the number of treatments and n the number of replications. In the given example, k = 3 and n = 4. The general mean and the grand total are still designated by ȳ and G respectively. However, the notations ȳ and T need modification. In Chapter 12, the analysis of variance is the case of one-way classification. The sample mean ȳ is the treatment mean. The sample total T is the treatment total. However, in the randomized block experiment, the analysis of variance is the case of two-way classification: one for replication and the other for treatment. To distinguish these two classifications, the subscripts r or t are attached to ȳ and T. The ȳr and Tr are the mean and total of a replication, and ȳt and Tt are the mean and total of a treatment. The replication effect is (ȳr − ȳ) and the treatment effect is (ȳt − ȳ). It can be observed from Table 14.2a that the sum of the replication effects is equal to zero and that the sum of the treatment effects is also equal to zero. Each of the kn or 12 observations can be expressed as the sum of the (1) general mean, (2) replication effect, (3) treatment effect, and (4) error,
that is,

    y = ȳ + (ȳr − ȳ) + (ȳt − ȳ) + (y − ȳr − ȳt + ȳ).    (1)

The above equation is an algebraic identity. After simplification, it can be reduced to y = y. The error term (y − ȳr − ȳt + ȳ) appears to be complicated, but it is simply the left-over portion of an observation, after the general mean, replication effect, and treatment effect are accounted for. For example, the observation in the third replication and the second treatment (Table 14.2a) can be expressed as

    2 = 6 + (−2) + (−1) + (−1),

where 6 is the general mean, (−2) is the effect of the third replication, and (−1) is the effect of the second treatment. The error (−1) is the left-over portion of the observation after the general mean, the replication effect, and the treatment effect are accounted for. In other words, the error (−1) is the quantity needed to balance the equation.
TABLE 14.2b
observation = general mean + replication effect + treatment effect + error

                            Treatment
Rep.          1               2               3
 1       6 + 1 + 1 − 2   6 + 1 − 1 + 1   6 + 1 + 0 + 1
 2       6 + 2 + 1 − 1   6 + 2 − 1 + 0   6 + 2 + 0 + 1
 3       6 − 2 + 1 + 2   6 − 2 − 1 − 1   6 − 2 + 0 − 1
 4       6 − 1 + 1 + 1   6 − 1 − 1 + 0   6 − 1 + 0 − 1
of freedom. Similarly, the n replications have n − 1 degrees of freedom. Since the sum of the errors, for each replication and for each treatment, is equal to zero, if the errors for (n − 1) replications and (k − 1) treatments are known, the remaining terms can be automatically determined. Therefore, the errors have (k − 1)(n − 1) degrees of freedom. For example, the errors of the first 2 treatments and the first 3 replications of Table 14.2b are these:
    −2     1     ?
    −1     0     ?
     2    −1     ?
     ?     ?     ?
The errors of the last treatment and the last replication are not given but are designated by question marks. Since the sum of the errors is equal to zero for every replication and for every treatment, these missing errors can be determined. Therefore, the errors have (k − 1)(n − 1) or (3 − 1)(4 − 1) or 6 degrees of freedom. The sum of squares of the kn or 12 treatment effects,

    n Σ(ȳt − ȳ)² = 4[1² + (−1)² + 0²] = 8,

is the treatment SS with (k − 1) or 2 degrees of freedom (Equation 3, Section 12.1). The sum of squares of the kn or 12 replication effects,

    k Σ(ȳr − ȳ)² = 3[1² + 2² + (−2)² + (−1)²] = 30,

is the replication SS with (n − 1) or 3 degrees of freedom. The sum of squares of the kn or 12 error terms,

    (−2)² + (−1)² + ··· + (−1)² = 16,

is the error SS with (k − 1)(n − 1) or 6 degrees of freedom. The total SS, which is the SS of the kn observations, is

    Σ(y − ȳ)² = (6 − 6)² + (8 − 6)² + ··· + (4 − 6)² = 54,

which is the sum of the replication SS, treatment SS, and error SS, that is,

    54 = 30 + 8 + 16.

The algebraic proof of this identity is similar to that given in Section 12.1 and, therefore, is omitted here.
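The partition just carried out can be checked mechanically (a computation added by the editor using the data of Table 14.2a; the variable names are not the text's):

```python
# Observations of Table 14.2a: 4 replications (rows) x 3 treatments (columns).
data = [[6, 7, 8],
        [8, 7, 9],
        [7, 2, 3],
        [7, 4, 4]]

n = len(data)      # replications
k = len(data[0])   # treatments

grand_mean = sum(sum(row) for row in data) / (n * k)
rep_means = [sum(row) / k for row in data]
trt_means = [sum(data[r][t] for r in range(n)) / n for t in range(k)]

# Each effect is squared kn times, hence the multipliers k and n.
rep_ss = k * sum((m - grand_mean) ** 2 for m in rep_means)
trt_ss = n * sum((m - grand_mean) ** 2 for m in trt_means)
total_ss = sum((data[r][t] - grand_mean) ** 2
               for r in range(n) for t in range(k))
error_ss = total_ss - rep_ss - trt_ss

print(rep_ss, trt_ss, error_ss, total_ss)  # 30, 8, 16, 54
```

The printed values reproduce the identity 54 = 30 + 8 + 16 of the text.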
14.3 Statistical Interpretation of Partition of Sum of Squares

The meaning of the partition of the sum of squares with two-way classification is to be illustrated by the tag population which is a normal population with mean equal to 50 and variance equal to 100. Six samples, each consisting of a single observation, are drawn from the tag population. The observations or samples are arranged in the form of 3 treatments and 2 replications as shown in Table 14.3a.

TABLE 14.3a

                Treatment
Rep.          1     2     3     Rep. mean
 1           41    57    37        45
 2           47    55    57        53
Trt. mean    44    56    47      ȳ = 49

Since all of the 6 observations
are drawn from the same population, the treatment effects and the replication effects do not really exist. For illustration, these 6 observations may be considered the yields of 6 plots of wheat. The treatments and replications have no effect on the yields. The variation in yields among the 6 plots may be considered the natural characteristics of the plants, just as different people may have different heights. The 6 observations are drawn from the same population with mean equal to 50, even though the 6 observations themselves are not the same. Now suppose the land of the first replication is more fertile than that of the second replication. To simulate this difference in fertility of the two replications, one may add 10 to each of the 3 observations (yields) in the first replication and add nothing to the second replication. Then the population means of the 6 observations are no longer the same. Now further suppose that the 3 treatments, such as different kinds of fertilizers, also influence the yields. To simulate the treatment effects, one may add 10 to each of the 2 observations of the first treatment; 20 to each of the 2 observations of the second treatment; and 30 to each of the 2 observations of the third treatment. Then the population means of the 6 observations are further disturbed. The resultant population means, given in Table 14.3b, are no longer the same. But each population mean can be expressed in the form

    μ = μ̄ + (μr − μ̄) + (μt − μ̄),    (1)

where μ is the mean of any of the 6 populations; μ̄ is the mean of the 6 population means; μr is the mean of the 3 population means in the same replication; and μt is the mean of the 2 population means in the same treatment. The 6 population means, expressed in the form of Equation (1), are given in Table 14.3c. Note that the 6 population means given in Tables 14.3b and 14.3c are the same even though they are expressed in different forms. The quantity μr − μ̄ is called the replication effect. It can be observed from Table 14.3c that the two replication effects are 5 and −5 which are originally introduced by adding 10 to each of the observations of the first replication and nothing to the second replication.
TABLE 14.3b

                          Treatment                               Rep. mean   Rep. effect
Rep.          1                   2                   3              μr         μr − μ̄
 1       50 + 10 + 10 = 70   50 + 10 + 20 = 80   50 + 10 + 30 = 90   80           5
 2       50 +  0 + 10 = 60   50 +  0 + 20 = 70   50 +  0 + 30 = 80   70          −5

Treatment
mean μt         65                  75                  85         μ̄ = 75
Treatment
effect μt − μ̄  −10                   0                  10            0

TABLE 14.3c

                     Treatment
Rep.          1                 2                 3
 1       75 + 5 − 10 = 70   75 + 5 + 0 = 80   75 + 5 + 10 = 90
 2       75 − 5 − 10 = 60   75 − 5 + 0 = 70   75 − 5 + 10 = 80
This modification of observations causes the average yield of the two replications to increase by (10 + 0)/2 or 5. The effect 5 of the first replication is obtained by (10 − 5) and the effect −5 of the second replication is obtained by (0 − 5). In other words, an individual replication effect is defined in terms of the deviation from the average of all replication effects. The same is true for the treatment effects (μt − μ̄). The treatment effects are originally introduced by adding 10, 20, and 30 to the observations of the first, second, and third treatments respectively. The average increase is 20 and the three treatment effects are −10, 0 and 10 respectively. After the population means are changed by introducing the replication effects, the observations will be changed correspondingly. An observation 43, from a population whose mean is 50, becomes 63, if the population mean is increased to 70. The error is defined as (y − μ) which is the deviation of an observation y from its population mean μ. The error of the observation 43 is (43 − 50) or −7. If one adds (y − μ) to both sides of Equation (1), the resulting equation is

    y = μ̄ + (μr − μ̄) + (μt − μ̄) + (y − μ).
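The decomposition of Tables 14.3b and 14.3c can be confirmed directly (an editor's sketch; the nested list simply restates the six population means of Table 14.3b):

```python
# Population means of Table 14.3b: 2 replications (rows) x 3 treatments (columns).
mu = [[70, 80, 90],
      [60, 70, 80]]

grand = sum(sum(row) for row in mu) / 6            # general mean, 75
rep_eff = [sum(row) / 3 - grand for row in mu]     # replication effects
trt_eff = [sum(mu[r][t] for r in range(2)) / 2 - grand
           for t in range(3)]                      # treatment effects

# Every population mean is the general mean plus its replication
# and treatment effects, exactly as displayed in Table 14.3c.
for r in range(2):
    for t in range(3):
        assert mu[r][t] == grand + rep_eff[r] + trt_eff[t]

print(grand, rep_eff, trt_eff)
```

The identity holds exactly here because the means were constructed by adding a constant per replication and per treatment, so no interaction is present.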
The meaning of the above expression can be interpreted from the absence of interaction, that is, the interaction term being equal to zero, or

    ȳt − ȳA − ȳB + ȳ = 0,

or

    ȳt = ȳ + (ȳA − ȳ) + (ȳB − ȳ).
It would be interesting to observe the characteristics of a set of treatment means such that all the interaction terms are equal to zero or the interaction SS is equal to zero. From the general mean, A effects, and B effects given in Table 18.2a, one can create such a set of treatment means. The 12 means thus created are given in Table 18.3, which shows that the corresponding elements of any two rows or any two columns maintain the same difference.

TABLE 18.3

 B\A         1               2              3              4          mean
  1      6 + 4 − 2 = 8   6 + 0 − 2 = 4   6 − 3 − 2 = 1   6 − 1 − 2 = 3    4
  2      6 + 4 − 1 = 9   6 + 0 − 1 = 5   6 − 3 − 1 = 2   6 − 1 − 1 = 4    5
  3      6 + 4 + 3 = 13  6 + 0 + 3 = 9   6 − 3 + 3 = 6   6 − 1 + 3 = 8    9
 mean        10              6              3              5              6

The four means of the first row are 8, 4, 1, 3, and those of the second row are 9, 5, 2, 4. The difference between the corresponding means is 1; that is, 9 − 8 = 5 − 4 = 2 − 1 = 4 − 3 = 1. This is true for any two rows or any two columns. The implication of the absence of the interaction may be seen from an example. Suppose the 4 levels of the factor A are 4 different quantities of the fertilizer A and the 3 levels of the factor B are 3 different quantities of the fertilizer B. Then the 12 treatment means are the average yields. Since the means 8, 4, 1, 3 and 9, 5, 2, 4 maintain a constant difference of 1, the interpretation is that the second quantity of the fertilizer B enables the crop to yield, on the average, one unit more than the first quantity, regardless of the level of fertilizer A used. Another example of absence of interaction may help to clarify further the meaning of interaction. Suppose the 4 levels of the factor A are the 4 classes of high school, senior, junior, sophomore, and freshman, and the 3 levels of the factor B are 3 different schools. Each of the classes of the 3 schools has n students. The 12 treatment means are the average scores for a given examination. The absence of interaction implies that the students of school No. 2 scored, on the average, one point higher than those of school No. 1, regardless of the grade of the students. Any departure from this condition means the existence of interaction between the schools and the years. The interaction SS measures this departure quantitatively. It is equal to zero only if all the interaction terms are equal to zero (Column 6, Table 18.2d). The interaction discussed here is a descriptive measure of the data. The statistical inference about the interaction is given in Section 18.5.

18.4 Computing Method

The mechanics of the partition of the sum of squares are shown in detail in Section 18.2.
In that section the simple example, involving mostly one-digit numbers, is deliberately chosen to avoid complicated computation. The purpose of that section is to show the meaning of partitioning the total SS into various components, but the method of doing so is too tedious to be of practical value. In this section, a short-cut method is presented for the computation of the analysis of variance.

TABLE 18.4a

 B\A       1     2    ...    a    Total
  1                               TB
  2
 ...
  b
 Total    TA                       G

(each cell entry is a treatment total Tt)

The
basic principles behind the short-cut method are discussed in Sections 12.3 and 14.4 and will not be repeated here. Furthermore, this section is mainly concerned with the partitioning of the treatment SS, with k − 1 or ab − 1 degrees of freedom, into three components, namely the A SS, B SS, and AB SS. The method of obtaining the treatment SS, error SS, and total SS for a completely randomized experiment is given in Section 12.3 and that of obtaining the replication SS, treatment SS, error SS, and total SS for a randomized block experiment is given in Section 14.4. For both cases, the method of partitioning the treatment SS into the three components is the same. The notations used here are the same as those defined in Section 18.2. The first step in the computation is to find the

TABLE 18.4b
Preliminary Calculations

(1)            (2)         (3)         (4)              (5)
Type of        Total of    No. of      No. of Obser-    Total of Squares
Total          Squares     Items       vations per      per Observation
                           Squared     Squared Item     (2) ÷ (4)
Grand          G²            1         nab              I
Factor A       ΣT²A          a         nb               II
Factor B       ΣT²B          b         na               III
Treatment      ΣT²t          ab        n                IV

Analysis of Variance

Source of       Sum of               Degrees of         Mean
Variation       Squares              Freedom            Square    F
A               II − I               a − 1
B               III − I              b − 1
AB              IV − II − III + I    (a − 1)(b − 1)
Treatment       IV − I               ab − 1
ab treatment totals, Tt, and then arrange these totals in a two-way table such as Table 18.4a. The rest of the computing procedure is shown in Table 18.4b. This table, however, does not give the complete computing procedure. It should be used in conjunction with Table 12.3a for a completely randomized experiment and with Table 14.4a for a randomized block experiment.

TABLE 18.4c
 B\A       1     2     3     4     TB
  1       14     8     4     6     32
  2       22     8     4     6     40
  3       24    20    10    18     72
  TA      60    36    18    30    144
The computing procedure may be illustrated by the example given in Table 18.2a. The 12 treatment totals are shown in Table 18.4c and the analysis of variance computation is shown in Table 18.4d. It should be noted that the various SS-values are the same as those shown in Table 18.2e.
TABLE 18.4d

Preliminary Calculations

(1)            (2)         (3)        (4)              (5)
Type of        Total of    No. of     No. of Obser-    Total of Squares
Total          Squares     Items      vations per      per Observation
                           Squared    Squared Item     (2) ÷ (4)
Grand          20,736        1        24                 864
Factor A        6,120        4         6               1,020
Factor B        7,808        3         8                 976
Treatment       2,312       12         2               1,156
Observation     1,192       24         1               1,192

Analysis of Variance

Source of     Sum of      Degrees of     Mean
Variation     Squares     Freedom        Square    F
A               156           3            52
B               112           2            56
AB               24           6             4
Error            36          12             3
Total           328          23
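The short-cut computation of Tables 18.4c and 18.4d is easy to reproduce (an editor's sketch; only the treatment totals are needed, with n = 2 observations per total, and the total of squared observations, 1,192, is taken from Table 18.4d):

```python
# Treatment totals of Table 18.4c: 3 levels of B (rows) x 4 levels of A (columns).
totals = [[14, 8, 4, 6],
          [22, 8, 4, 6],
          [24, 20, 10, 18]]
n = 2              # observations per treatment total
b, a = 3, 4
sum_sq_obs = 1192  # total of squared observations (from Table 18.4d)

T_A = [sum(totals[j][i] for j in range(b)) for i in range(a)]
T_B = [sum(row) for row in totals]
G = sum(T_B)

# The Roman-numeral quantities of column (5) in Table 18.4b.
I = G * G / (n * a * b)
II = sum(t * t for t in T_A) / (n * b)
III = sum(t * t for t in T_B) / (n * a)
IV = sum(t * t for row in totals for t in row) / n

A_ss, B_ss = II - I, III - I
AB_ss = IV - II - III + I
error_ss = sum_sq_obs - IV
total_ss = sum_sq_obs - I
print(A_ss, B_ss, AB_ss, error_ss, total_ss)  # 156, 112, 24, 36, 328
```

The printed sums of squares agree with the analysis of variance in Table 18.4d.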
18.5 Statistical Interpretation - Fixed Model

The nab observations of a completely randomized a × b factorial experiment with n replications are considered ab random samples, each consisting of n observations, drawn from ab normal populations with the same variance σ². An example of six population means of a 3 × 2 factorial experiment is shown in Table 18.5a.

TABLE 18.5a

 B\A                1      2      3     Total    Mean μB    B effect μB − μ̄
  1                64     51     53     168        56           4
  2                50     47     47     144        48          −4
 Total            114     98    100     312
 Mean μA           57     49     50              μ̄ = 52
 A effect μA − μ̄    5     −3     −2       0

Each population mean μ is made of the general mean μ̄, A effect μA − μ̄, B effect μB − μ̄, and the interaction AB effect

    (μ − μ̄) − (μA − μ̄) − (μB − μ̄) = μ − μA − μB + μ̄,

that is,

    μ = μ̄ + (μA − μ̄) + (μB − μ̄) + (μ − μA − μB + μ̄).    (1)

The above equation is an algebraic identity. After simplification it becomes μ = μ. The method of finding the general mean, A effect, B effect, and interaction AB effect is the same as that given in Section 18.2. The only difference is in the notations used. Such differences in notations are introduced to differentiate the parameters and statistics. The A effects and B effects are given in Table 18.5a. The interaction effects are given

TABLE 18.5b
 B\A       1     2     3    Total
  1        3    −2    −1      0
  2       −3     2     1      0
 Total     0     0     0      0

in Table 18.5b. The six population means, expressed in terms of their components (Equation 1), are shown in Table 18.5c. The contrasting notations for the parameters and statistics are listed in Table 18.5d.
TABLE 18.5c

 B\A          1                  2                  3
  1      52 + 5 + 4 + 3     52 − 3 + 4 − 2     52 − 2 + 4 − 1
  2      52 + 5 − 4 − 3     52 − 3 − 4 + 2     52 − 2 − 4 + 1

TABLE 18.5d

                   Parameter                Statistic
General Mean       μ̄                        ȳ
A Effect           μA − μ̄                   ȳA − ȳ
B Effect           μB − μ̄                   ȳB − ȳ
AB Effect          μ − μA − μB + μ̄          ȳt − ȳA − ȳB + ȳ
There are three variances connected with the ab population means of a two-factor factorial experiment. The variance of the A effects is defined as

    σ²A = Σ(μA − μ̄)² / (a − 1),    (2)

that of the B effects is defined as

    σ²B = Σ(μB − μ̄)² / (b − 1),    (3)

and that of the interaction effects is defined as

    σ²AB = Σ(μ − μA − μB + μ̄)² / (a − 1)(b − 1).    (4)

In terms of the example given in Tables 18.5a, b, c,

    σ²A = [(5)² + (−3)² + (−2)²] / (3 − 1) = 38/2 = 19,

    σ²B = [(4)² + (−4)²] / (2 − 1) = 32/1 = 32,

and

    σ²AB = [(3)² + (−2)² + (−1)² + (−3)² + (2)² + (1)²] / (3 − 1)(2 − 1) = 28/2 = 14.

Note that the divisors for the three variances are not the number of items squared but the number of degrees of freedom.
The variance of the k or ab population means is defined as (Equation 2, Section 12.4)

    σ²μ = Σ(μ − μ̄)² / (ab − 1).    (5)

For the example given in Table 18.5a,

    σ²μ = [(64 − 52)² + ··· + (47 − 52)²] / (6 − 1) = 200/5 = 40.

This variance of the k or ab population means is a linear combination of the three variances σ²A, σ²B, and σ²AB. More specifically, it is the weighted average of bσ²A, aσ²B, and σ²AB, with the numbers of degrees of freedom being the weights. In symbolic form, the relation is

    σ²μ = [(a − 1)bσ²A + a(b − 1)σ²B + (a − 1)(b − 1)σ²AB] / (ab − 1),    (6)

which can be verified by the example under consideration, that is,

    40 = [(3 − 1)(2)(19) + (3)(2 − 1)(32) + (3 − 1)(2 − 1)(14)] / [(3 × 2) − 1] = 200/5.
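Relation (6) can be checked mechanically from the six population means of Table 18.5a (an editor's sketch; the nested list simply restates the table):

```python
# Population means of Table 18.5a: 2 levels of B (rows) x 3 levels of A (columns).
mu = [[64, 51, 53],
      [50, 47, 47]]
b, a = 2, 3

grand = sum(sum(row) for row in mu) / (a * b)                        # 52
A_eff = [sum(mu[j][i] for j in range(b)) / b - grand for i in range(a)]
B_eff = [sum(row) / a - grand for row in mu]

# Equations (2), (3), (4): divisors are degrees of freedom, not counts.
var_A = sum(e * e for e in A_eff) / (a - 1)                          # 19
var_B = sum(e * e for e in B_eff) / (b - 1)                          # 32
var_AB = sum((mu[j][i] - grand - A_eff[i] - B_eff[j]) ** 2
             for j in range(b) for i in range(a)) / ((a - 1) * (b - 1))  # 14
var_mu = sum((mu[j][i] - grand) ** 2
             for j in range(b) for i in range(a)) / (a * b - 1)      # 40

# Equation (6): the weighted average reproduces the overall variance.
lhs = ((a - 1) * b * var_A + a * (b - 1) * var_B
       + (a - 1) * (b - 1) * var_AB) / (a * b - 1)
print(var_A, var_B, var_AB, var_mu, lhs)
```

The last two printed values agree, confirming the weighted-average relation.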
The average mean squares for treatment and error are given in Equations (4) and (5) of Section 12.4 as:

    Average treatment MS = σ² + nσ²μ    (7)
    Average error MS = σ²    (8)

The average mean squares for A, B, and AB are given as follows:

    Average A MS = σ² + nbσ²A    (9)
    Average B MS = σ² + naσ²B    (10)
    Average AB MS = σ² + nσ²AB    (11)

The above equations are given without any justification, but they are reasonable to expect. The treatment MS is the weighted average of the three mean squares due to A, B, and AB; that is,

    treatment MS = [(a − 1)(A MS) + (b − 1)(B MS) + (a − 1)(b − 1)(AB MS)] / (ab − 1)
                 = [(A SS) + (B SS) + (AB SS)] / (ab − 1)
                 = treatment SS / (ab − 1).
The average treatment mean square is also the weighted average of the three average mean squares, with the numbers of degrees of freedom being the weights. This can be verified by the example given in Table 18.5a. It is already determined that a = 3, b = 2, σ²A = 19, σ²B = 32, σ²AB = 14, and σ²μ = 40. Suppose n = 10 and σ² = 100. Then the values of the average mean squares are:

    Treatment:  σ² + nσ²μ = 100 + 10(40) = 500
    A:  σ² + nbσ²A = 100 + 10(2)(19) = 480
    B:  σ² + naσ²B = 100 + 10(3)(32) = 1060
    AB:  σ² + nσ²AB = 100 + 10(14) = 240

It can be observed that

    500 = [2(480) + 1(1060) + 2(240)] / 5 = 2500/5.

The algebraic proof of the stated relation can be carried out similarly by utilizing the equation that

    ab − 1 = (a − 1) + (b − 1) + (a − 1)(b − 1)

and also Equation (6). The average mean squares given in Equations (7) to (11) can be explained in terms of a sampling experiment. A group of 6 random samples,
TABLE 18.5e

[The 60 observations of the sampling experiment, 10 for each of the 6 treatment combinations; the six sample totals are given in Table 18.5f.]
STATISTICAL INTERPRETATIONFIXED MODEL
18.5
TABLE 18.5f
X
1
2
3
Total
1 2
672 497
496 486
502 512
1,670 1,495
Total
1,169
982
1,014
3,165
each consisting of 10 observations, may be drawn from the tag population (Chapter 4) which is a normal population with mean equal to 50 and variance equal to 100. Then 14 is added to each of the 10 observations of the first sample, and 1 is added to each of the 10 observations of the second sample and so forth, so that the 6 population means would be equal to those given in Table 18.5a. An example of such a group of 60 observations is given in Table 18.5e. The treatment (sample) totals are given in Table 18.5f, and the mean squares are shown in Table 18.5g. If another group of 6 samples of the same size were drawn from the same 6 populations, a different set of mean squares would be obtained. When infinitely many groups of samples are drawn, there will be infinitely many sets of values for the mean square. An average mean square is the average of the infinitely many values of a particular mean square. It should be noted from Table 18.5g that the mean squares computed from

TABLE 18.5g
Preliminary Calculations

(1)            (2)            (3)        (4)             (5)
Type of        Total of       No. of     Observations    Total of Squares
Total          Squares        Items      per Squared     per Observation
                              Squared    Item            (2) ÷ (4)
Grand          10,017,225       1        60              166,953.75
A               3,359,081       3        20              167,954.05
B               5,023,925       2        30              167,464.17
Treatment       1,694,953       6        10              169,495.30
Observation       175,981      60         1              175,981.00

Analysis of Variance

Source of     Sum of      Degrees of     Mean       Average
Variation     Squares     Freedom        Square     MS
A             1,000.30        2           500.15      480
B               510.42        1           510.42     1060
AB            1,030.83        2           515.42      240
Error         6,485.70       54           120.11      100
Total         9,027.25       59
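The last column of Table 18.5g can be tied back to Equations (7) and (9) to (11) with a short computation (an editor's sketch; the parameter values are those assumed in the text's sampling experiment):

```python
# Fixed-model parameters assumed in the text: n = 10 replications,
# a = 3 levels of A, b = 2 levels of B, population variance 100.
n, a, b = 10, 3, 2
sigma2 = 100
var_A, var_B, var_AB, var_mu = 19, 32, 14, 40

avg_A_ms = sigma2 + n * b * var_A    # Equation (9)
avg_B_ms = sigma2 + n * a * var_B    # Equation (10)
avg_AB_ms = sigma2 + n * var_AB      # Equation (11)
avg_trt_ms = sigma2 + n * var_mu     # Equation (7)

print(avg_A_ms, avg_B_ms, avg_AB_ms, avg_trt_ms)  # 480 1060 240 500
```

The mean squares actually obtained in Table 18.5g (500.15, 510.42, 515.42, 120.11) are single draws that scatter about these average values, which is the point of the sampling experiment.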
one group of 6 samples do not necessarily agree with their respective average mean squares.

18.6 Models - Test of Hypotheses

In the preceding section, the ab population means are considered fixed quantities. This interpretation is called the linear hypothesis model or the fixed model (Section 12.4) of the analysis of variance. However, other interpretations of the population means are also possible. The μA's and μB's may be considered samples of large numbers of μA's and μB's. This interpretation of the population means is called the component of variance model or the random variable model. It is also possible to mix these two models; that is, one of the two factors fits the linear hypothesis model, while the other fits the component of variance model. An example at this stage may help to distinguish the various models. The data of the 3 × 2 factorial experiment given in Table 18.5e can serve this purpose. The three levels of the factor A may be considered three high schools and the two levels of the factor B two teaching methods used in instructing a certain subject. An observation y is the test score of a student. If one's interest is only in comparing these three particular schools and these two particular teaching methods, the linear hypothesis model is the appropriate interpretation. On the other hand, the component of variance model is the appropriate interpretation, if one is interested in knowing whether, in general, schools and teaching methods would affect the students' scores and only incidentally selected these three schools and two teaching methods for experimentation. If one is interested only in the comparison of these two particular teaching methods and he incidentally selected the three schools out of many, the mixed model is the appropriate interpretation. The reason for discussing the various models of the analysis of variance is that the average mean squares are different for the different cases.
Those given in Equations (9), (10), and (11) of Section 18.5 are correct for the linear hypothesis model only. As a contrast, the average mean squares for the linear hypothesis model and the component of variance model are listed in Table 18.6. The three variances σ²A, σ²B, and σ²AB for the linear hypothesis model are defined in Equations (2), (3), and (4), Section 18.5. The three variances σ'²A, σ'²B, and σ'²AB for the component of variance model constitute a different set of quantities. They are the variances of all the population means rather than the particular population means involved in the experiment. The prime is used to differentiate these two kinds of variances. The purpose of listing the average mean squares for the two models, however, is not to show the difference in the variances with or without the prime, but to furnish a guide in finding the F value in testing a particular hypothesis. In the linear hypothesis model, each of the A, B, and AB mean squares should be divided by the error mean square to test the three hypotheses that (1) the factor A has no effect on the means of y, (2) the factor B has no effect on the means of y, and (3) there is no interaction between factors A and B. But in the component of variance model, to test the same three hypotheses, the A and B mean squares should be divided by the AB mean square (Table 18.6) and the AB mean square should be divided by the error mean square. The test procedure for a mixed model is the same as that for the component of variance model. Like all analyses of variance, the F test is a one-tailed test. The numbers of degrees of freedom of F, as usual, are those of the mean squares used as the numerator and the denominator in finding the F-value.

TABLE 18.6
Component         Linear Hypothesis       Component of Variance
Factor A          σ² + nbσ²_A             σ² + nσ'²_AB + nbσ'²_A
Factor B          σ² + naσ²_B             σ² + nσ'²_AB + naσ'²_B
Interaction AB    σ² + nσ²_AB             σ² + nσ'²_AB
Error             σ²                      σ²
The number of degrees of freedom for the error MS depends on the design used for the experiment. For a completely randomized experiment, the number of degrees of freedom is k(n - 1) or ab(n - 1). For a randomized block experiment, the number of degrees of freedom is (k - 1)(r - 1) or (ab - 1)(r - 1).
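The rule of Table 18.6 can be turned into a short computation. The sketch below is ours, not the book's; the function names and the data layout are assumptions. It computes the mean squares of a completely randomized a x b factorial experiment with n replications and then forms the F-ratios both ways: every mean square over the error mean square under the linear hypothesis (fixed) model, and the A and B mean squares over the AB mean square under the component of variance (random) model.

```python
# A minimal sketch (ours, not the text's) of the a x b factorial analysis
# and the model-dependent F-ratios of Table 18.6.
def two_way_anova(data):
    # data[i][j] is the list of n observations for level i of A, level j of B
    a, b, n = len(data), len(data[0]), len(data[0][0])
    grand = sum(y for row in data for cell in row for y in cell)
    C = grand ** 2 / (a * b * n)                    # correction term
    ss_total = sum(y * y for row in data for cell in row for y in cell) - C
    TA = [sum(y for cell in row for y in cell) for row in data]          # A-totals
    TB = [sum(y for i in range(a) for y in data[i][j]) for j in range(b)]
    ss_a = sum(t * t for t in TA) / (b * n) - C
    ss_b = sum(t * t for t in TB) / (a * n) - C
    ss_cells = sum(sum(cell) ** 2 for row in data for cell in row) / n - C
    ss_ab = ss_cells - ss_a - ss_b
    ss_error = ss_total - ss_cells
    return {"A": ss_a / (a - 1), "B": ss_b / (b - 1),
            "AB": ss_ab / ((a - 1) * (b - 1)),
            "error": ss_error / (a * b * (n - 1))}

def f_ratios(ms, model="fixed"):
    # Fixed model: every mean square is tested against the error MS.
    # Random model: the A and B mean squares are tested against the AB MS.
    denom_main = ms["error"] if model == "fixed" else ms["AB"]
    return {"A": ms["A"] / denom_main, "B": ms["B"] / denom_main,
            "AB": ms["AB"] / ms["error"]}
```

With a mixed model, the test procedure is that of the random model, as the text states.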
18.7 Tests of Specific Hypotheses

The methods of testing the three general hypotheses are given in the preceding section. In addition to these methods, the individual degree of freedom (Section 15.3), the least significant difference (Section 15.4), the linear regression (Section 17.8), and the multiple range test (Section 15.5) can also be used in connection with the factorial experiment for testing more specific hypotheses. Furthermore, these methods can be used not only on the treatment means, each based on n observations, as described previously, but also on the means of the a levels of the factor A and the means of the b levels of the factor B. However, one must be aware that each of the A means is based on nb observations and that
FACTORIAL EXPERIMENT                                                Ch. 18
each of the B means is based on na observations. In using the individual degree of freedom, the letter n in Equation (11), Section 15.3, and Equation (6), Section 17.8, should be interpreted as the number of observations from which a total T is computed. The example given in Table 18.5e may be used as an illustration. If an individual degree of freedom is used on the 6 treatment totals, n is equal to 10. If it is used on the 3 A-totals, n is replaced by nb or 20; if it is used on the 2 B-totals, n is replaced by na or 30. The use of the least significant difference between means also follows the same rule. The least significant difference as given in Inequality (2), Section 15.4, is

    t √(2s²/n).    (1)
It may be used on the treatment means, A-means, or B-means, but the letter n is to be interpreted as the number of observations from which a mean is computed. This principle also applies to the new multiple range test. In finding √(s²/n), the letter n is subject to the same interpretation.
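As a sketch of this rule (ours, not the book's; the error mean square and the t value below are hypothetical numbers, and the usual form t√(2s²/n) of the least significant difference is assumed), the LSD is computed three times, with n reinterpreted for each kind of mean:

```python
# Hedged sketch: the LSD applied to treatment means, A-means, and B-means of
# a 3 x 2 factorial with 10 replications.  The rule of Section 18.7 is only
# that "n" is the number of observations behind each mean being compared.
import math

def lsd(t_value, error_ms, obs_per_mean):
    # two means differ significantly when they differ by more than this
    return t_value * math.sqrt(2.0 * error_ms / obs_per_mean)

a, b, n = 3, 2, 10            # levels of A, levels of B, replications
error_ms, t = 12.0, 2.00      # hypothetical error MS and 5%-level t value

lsd_treatment = lsd(t, error_ms, n)       # treatment means: n = 10 obs each
lsd_a_means = lsd(t, error_ms, n * b)     # A-means: nb = 20 obs each
lsd_b_means = lsd(t, error_ms, n * a)     # B-means: na = 30 obs each
```

The more observations behind a mean, the smaller the difference that is declared significant.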
18.8 Hierarchical Classification

An experiment involving multiple factors is not necessarily a factorial one, which requires that factors A and B crisscross to form the treatment combinations. If the factors do not crisscross, but one factor nests inside another, the experiment is called a hierarchical one. For example, samples of iron ore may be sent to four different chemical laboratories for analysis, and each laboratory may assign three technicians to carry out the chemical analysis, each technician making two determinations of the percentage of the iron content in the ore. Such a set of data may be tabulated as shown in Table 18.2a, where the four levels of the factor A are the laboratories, the three levels of the factor B are the technicians, and the two observations in each cell are the determinations. Such a tabulation indeed resembles the factorial experiment; however, the data are really of the hierarchical classification. The three technicians of the first laboratory may be Jones, Smith, and Brown; those of the second may be White, Johnson, and Miller; those of the third may be Carpenter, Robinson, and Riley; and those of the fourth may be Anderson, Howard, and Walker. These people are 12 individual technicians employed by four independent laboratories. They are not three sets of quadruplets crossing the laboratory lines. Instead of calling the laboratories and technicians the factors A and B, to call them tiers A and B would be
more appropriate. So this set of data has three tiers, namely laboratory, technician, and determination.
Another illustration may help to clarify further the meaning of the hierarchical classification. A college may be divided into a number of schools, such as the school of science and the school of engineering. The school of science may have such departments as mathematics, physics, and chemistry; the school of engineering may have the departments of civil engineering, mechanical engineering, and electrical engineering; and within the departments are the faculty members. Even though both schools are subdivided into departments, there is no one-to-one correspondence between the departments of the school of science and those of the school of engineering. The schools and departments are two tiers rather than two factors. The difference between the factorial and hierarchical experiments can be illustrated by the following diagrams:

[Diagrams: in the factorial experiment, every level of the factor B is combined with every level of the factor A, so the levels of B cut across the levels of A; in the hierarchical experiment, each level of tier A carries its own separate set of levels of tier B.]
In the hierarchical classification, each observation may be broken down into (1) the general mean, (2) the A effect, (3) the B effect within tier A, and (4) the error within tier B; that is,

    y = μ̄ + (μ_A - μ̄) + (μ_AB - μ_A) + (y - μ_AB).    (1)
The notations used here have exactly the same definitions as given in Section 18.2. In comparing the above equation with Equation (1), Section 18.2, one will notice that the general mean, A effect, and error are the same for both equations. The only difference is that the B effect and
AB effect of the factorial experiment are added together to form the B effect within tier A of the hierarchical experiment. As a result, the total SS, A SS, and error SS are the same for both cases. The B SS within tier A is equal to the sum of the B SS and the AB SS. Therefore, the details of the partition of the sum of squares are omitted here. The shortcut method of computation is given in Table 18.8a. The letter a represents the number of levels of the tier A; the letter b is the number of levels of the tier B within each level of A; the letter n is the number of observations within each level of tier B.
As a further illustration, the data of Table 18.2a may be considered a hierarchical experiment and analyzed as such. The result is given in Table 18.8b. The purpose of analyzing the same set of data by two different methods, factorial and hierarchical, is to show the numerical relation between these two methods; for example, the B SS within the tier A is equal to the sum of the B SS and the AB SS. One should not acquire the

Table 18.8a  Preliminary Calculations
(1)            (2)          (3)          (4)              (5)
Type of        Total of     No. of       Observations     Total of Squares
Total          Squares      Items        per Squared      per Observation
                            Squared      Item             (2) / (4)
Grand          G²           1            nab              I
Tier A         ΣT²_A        a            nb               II
B within A     ΣT²          ab           n                III
Observation    Σy²          nab          1                IV

Analysis of Variance

Source of Variation     Sum of Squares    Degrees of Freedom    Mean Square    F
Tier A                  II - I            a - 1
B within A              III - II          a(b - 1)
Error (within B)        IV - III          ab(n - 1)
Total                   IV - I            nab - 1
impression that the method of analysis for a given set of data is arbitrarily chosen. The method of analysis is determined by the method by which the experiment is carried out, or by the physical meaning of the experiment, and not by the appearance of the tabulation of the data. As long as two-dimensional paper is used for tabulation, all tables have similar appearances. The figures are arranged either in columns, in rows, or in columns and rows. Therefore, the appearance of a table can hardly
be used as a guide in selecting an appropriate statistical method.
The average mean squares are given in Table 18.8b. The variances σ² and σ²_A have the same definitions as given in Section 18.5, while σ²_B(A) is the variance of the B effects within tier A. In terms of the example of the iron ore, σ² is the variance among determinations made by a technician, σ²_B(A) is the variance among the means of the determinations made by different technicians of a laboratory, and σ²_A is the variance among the means of the determinations made by different laboratories. The average mean squares (Table 18.8b) may be used as a guide in selecting the appropriate denominator for the F-value in testing a hypothesis. The statistic
    F = Tier A MS / B within A MS

is used in testing the hypothesis that σ²_A = 0. The statistic

    F = B within A MS / Error MS

is used in testing the hypothesis that σ²_B(A) = 0.
Table 18.8b  Preliminary Calculations

(1)            (2)          (3)          (4)              (5)
Type of        Total of     No. of       Observations     Total of Squares
Total          Squares      Items        per Squared      per Observation
                            Squared      Item             (2) / (4)
Correction     20,736        1           24                 864
Tier A          6,120        4            6               1,020
B within A      2,312       12            2               1,156
Observation     1,192       24            1               1,192

Analysis of Variance

Source of           Sum of     Degrees of    Mean      Average
Variation           Squares    Freedom       Square    Mean Square
Tier A                156         3            52      σ² + nσ²_B(A) + nbσ²_A
B within A            136         8            17      σ² + nσ²_B(A)
Error (within B)       36        12             3      σ²
Total                 328        23
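The shortcut of Table 18.8a reduces to four running quantities. The sketch below is ours, not the book's (the function name and the data layout are assumptions); it returns each sum of squares with its degrees of freedom for equal subdivisions:

```python
# Hedged sketch of the Table 18.8a shortcut for a hierarchical classification:
# a levels of tier A, b levels of tier B within each A, n observations per B.
def nested_anova(data):
    # data[i][j] is the list of n observations of level j of B inside level i of A
    a, b, n = len(data), len(data[0]), len(data[0][0])
    G = sum(y for A in data for B in A for y in B)          # grand total
    I = G ** 2 / (n * a * b)                                # grand
    II = sum(sum(y for B in A for y in B) ** 2 for A in data) / (n * b)  # tier A
    III = sum(sum(B) ** 2 for A in data for B in A) / n     # B within A
    IV = sum(y * y for A in data for B in A for y in B)     # observations
    return {"A": (II - I, a - 1),
            "B within A": (III - II, a * (b - 1)),
            "error": (IV - III, a * b * (n - 1)),
            "total": (IV - I, n * a * b - 1)}
```

Each sum of squares is the difference of two adjacent Roman-numeral quantities, exactly as in the table.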
The analysis of variance with hierarchical classification does not present any difficulty in computation, even if the numbers of subdivisions are not the same for the different levels of a tier. Since the hierarchical classification is a one-way classification (Chapter 12) within another, the computing method for single classification with unequal numbers of observations may be used repeatedly at different tiers. The basic principles of this method are given in Section 12.10 and therefore are not repeated here. The difference in the computing method between the equal and unequal numbers of subdivisions is in replacing

    (T₁² + T₂² + ... + T_a²) / n    by    T₁²/n₁ + T₂²/n₂ + ... + T_a²/n_a = Σ(T²/n),

Table 18.8c
Tier A    Tier B    Obs. y            y²                    T      n     T²/n
  1         1       19, 23            361, 529              42     2       882
            2       17, 15, 16        289, 225, 256         48     3       768
            3       18                324                   18     1       324
  2         1       12, 16            144, 256              28     2       392
            2       20, 24            400, 576              44     2       968
  3         1       21, 23, 22        441, 529, 484         66     3     1,452
            2       18, 21, 17, 16    324, 441, 289, 256    72     4     1,296
            3       28                784                   28     1       784
            4       19, 15            361, 225              34     2       578

Totals of tier A:                                           T      n     T²/n
  Tier A = 1                                               108     6     1,944
  Tier A = 2                                                72     4     1,296
  Tier A = 3                                               200    10     4,000

Grand total:                                               380    20     7,220

Sum of T²/n:      7,220      7,240      7,444      7,494
No. of totals:        1          3          9         20
                      I         II        III         IV
where T is any kind of total and n is the number of observations in that total. The rest of the computing procedure is the same for both cases.
An example of computation for the hierarchical classification with unequal numbers of subdivisions is shown in Table 18.8c. The first two columns identify an observation, and the observation y itself is shown in the third column. The rest of the table shows the details of computation. The procedure involves finding the totals (T) and counting the numbers (n) of observations for different levels of different tiers. Then one finds the quantity Σ(T²/n) for each tier and counts the number of totals in each such quantity, as shown at the bottom of the table. The sums of squares and their numbers of degrees of freedom can be obtained by finding the differences between two adjacent terms, that is, II - I for A, III - II for B within A, IV - III for error within B, and IV - I for total. The analysis of variance table showing these components is given in Table 18.8d.
Table 18.8d

Source of        Sum of     Degrees of    Mean
Variation        Squares    Freedom       Square     F
Tier A              20          2         10.00     0.29
B within A         204          6         34.00     7.47
Error               50         11          4.55
Total              274         19
The amount of computing work involved is not nearly so great as it seems. Many of the intermediate steps shown in Table 18.8c may be omitted. Each of the quantities Σy² and Σ(T²/n) may be obtained in one continuous operation on a desk calculator. Therefore the columns for y² and T²/n for the various tiers are not really necessary.
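The entire computation of Tables 18.8c and 18.8d fits in a few lines. The sketch below is ours, not the book's, but it uses the book's own data and reproduces the sums of squares 20, 204, and 50 and the two F-values:

```python
# Hedged sketch: the unequal-subdivision computation of Tables 18.8c-18.8d.
# Each inner list holds the observations of one level of tier B; the outer
# lists group them by level of tier A.  Every squared total is divided by
# its own number of observations.
data = [[[19, 23], [17, 15, 16], [18]],
        [[12, 16], [20, 24]],
        [[21, 23, 22], [18, 21, 17, 16], [28], [19, 15]]]

obs = [y for A in data for B in A for y in B]
I = sum(obs) ** 2 / len(obs)                              # grand:        7,220
II = sum(sum(y for B in A for y in B) ** 2 /
         sum(len(B) for B in A) for A in data)            # tier A:       7,240
III = sum(sum(B) ** 2 / len(B) for A in data for B in A)  # B within A:   7,444
IV = sum(y * y for y in obs)                              # observations: 7,494

ss_a, df_a = II - I, len(data) - 1
ss_b, df_b = III - II, sum(len(A) - 1 for A in data)
ss_e, df_e = IV - III, sum(len(B) - 1 for A in data for B in A)
f_a = (ss_a / df_a) / (ss_b / df_b)     # tests sigma_A = 0; about 0.29
f_b = (ss_b / df_b) / (ss_e / df_e)     # tests sigma_B(A) = 0; about 7.48
```

The text's F of 7.47 comes from using the rounded mean square 4.55 as the denominator.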
18.9 Sampling Error

The hierarchical classification of the analysis of variance presented in the preceding section is often used in combination with other experimental designs. An example of the application of the analysis of variance given in Section 12.8 may be used as an illustration. A manufacturer has three different processes of making fiber boards and wishes to determine whether these processes produce equally strong boards. A random sample of 20 boards is to be obtained from each of the products manufactured by the three processes. The strength (observation) of each of the 60 boards is determined. Then the analysis of variance may be used to test the hypothesis that the average strengths of the boards
produced by the three different processes are the same. The analysis, as shown in Section 12.8, is as follows:

Sources of Variation       Degrees of Freedom
Among processes                     2
Within processes                   57
Total                              59

However, one may take more than one observation for each board. Suppose 4 measurements of strength are made on each board. There will then be 240 observations. The analysis (Table 18.8a) is as follows:

Sources of Variation                                   Degrees of Freedom
Among processes                                                  2
Among boards, within processes (experimental error)             57
Within boards (sampling error)                                 180
Total                                                          239
In this experiment, a board is called a sampling unit. The variation among the sampling units is called the experimental error (Section 14.8), while the variation within the sampling units is called the sampling error.
The tomato experiment of Table 14.5a may be used as an illustration of the use of the hierarchical classification in a randomized block experiment. There are 6 varieties and 5 replications and, therefore, 30 plots in this experiment. Here a plot is a sampling unit. The error with 20 degrees of freedom (Table 14.5b) is the experimental error. If the weights of tomatoes of individual plants were recorded, and supposing there were 10 plants in each plot, the analysis would be as follows:

Sources of Variation       Degrees of Freedom
Replication                         4
Variety                             5
Experimental error                 20
Sampling error                    270
Total                             299

The procedure of testing a hypothesis in this experiment is the same as that described in the preceding section. The item variety is interpreted as tier A, the experimental error as tier B within A, and the sampling error as within tier B.
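The degree-of-freedom bookkeeping of the two layouts above can be sketched as follows (the function and key names are ours, not the book's):

```python
# Hedged sketch: degrees of freedom for subsampling designs (Section 18.9).
def crd_subsampling_df(k, m, n):
    # completely randomized: k treatments, m sampling units per treatment,
    # n observations per sampling unit
    return {"treatments": k - 1,
            "experimental error": k * (m - 1),
            "sampling error": k * m * (n - 1),
            "total": k * m * n - 1}

def rbd_subsampling_df(r, v, n):
    # randomized blocks: r replications, v varieties, n observations per plot
    return {"replication": r - 1,
            "variety": v - 1,
            "experimental error": (r - 1) * (v - 1),
            "sampling error": r * v * (n - 1),
            "total": r * v * n - 1}
```

The fiber-board example gives 2, 57, 180, and 239 degrees of freedom; the tomato example gives 4, 5, 20, 270, and 299.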
EXERCISES

1. Make three sets of fictitious treatment (sample) means for a 3 x 4 factorial experiment so that (i) the A SS is equal to zero, but the B SS and AB SS are not equal to zero; (ii) the B SS is equal to zero, but the A SS and AB SS are not equal to zero; (iii) the AB SS is equal to zero, but the A SS and B SS are not equal to zero.

2. The following data are those of a completely randomized 3 x 2 factorial experiment.
    B \ A      1         2         3
    1        17, 25    10, 4     11, 5
    2        12, 10     9, 5      2, 10

(i) Express each of the 12 observations as the sum of the general mean, A effect, B effect, AB effect, and error. (ii) Find the total SS, A SS, B SS, AB SS, and error SS from the components of the observations. (iii) Find the same set of SS-values by the shortcut method and note that the values thus obtained are the same as those obtained in (ii).
3. For the following set of 10 population means, find σ_A, σ_B, and σ_AB.

    B \ A      1     2     3     4     5
    1         63    84    45    78    60
    2         75    92    57    46    85
4. Twelve random samples, each consisting of two observations, are drawn from the tag population with mean equal to 50 and variance equal to 100. The data are tabulated as follows:

    B \ A       1         2         3         4
    1         46, 37    53, 49    58, 46    60, 57
    2         46, 55    61, 51    56, 30    53, 42
    3         55, 47    58, 54    53, 53    46, 46
Test the hypotheses, at the 5% level, that (i) the population means of the four levels of the factor A are the same (F = 1.16), (ii) the population means of the three levels of the factor B are the same (F = 0.36), and (iii) the interaction between the factors does not exist (F = 0.33). Since the sources of these samples are known, state whether your conclusions are correct or errors (Type I or II) have been made.

5. Add appropriate numbers to the observations of Exercise 4 so that the 12 population means are as follows:
    B \ A      1     2     3     4
    1         50    50   150   150
    2         50    50    50    50
    3        150   150    50    50

Which of the three hypotheses are correct? Test the hypotheses at the 5% level and see if your conclusions are correct (Exercise 7).

6. Add appropriate numbers to the observations of Exercise 4 so that the 12 population means are as follows:
    Fertilizer      Yields (four replications)
    Blank            75    56    90    65
    NuGreen         317   173   251   238
    (NH4)2SO4       303   288   245   238
    CaCN2           281   265   241   209

    CIPC
    Blank           152   103   179   154
    NuGreen         461   383   391   339
    (NH4)2SO4       403   466   387   388
    CaCN2           344   295   388   274
Using the 5% significance level, test the hypotheses that (1) there is no interaction between fertilizers and weed killers; (2) the different fertilizers have the same effect on the yield of grass; and (3) the different weed killers have the same effect on the yield of grass. After the analysis, summarize your conclusions in a short paragraph.
13. The following table gives the % shrinkage during dyeing of four types of fabrics at four different dye temperatures.

    Fabric     210°F        215°F        220°F        225°F
    I          1.8, 2.1     2.0, 2.1     4.6, 5.0      7.5,  7.9
    II         2.2, 2.4     4.2, 4.0     5.4, 5.6      9.8,  9.2
    III        2.8, 3.2     4.4, 4.8     8.7, 8.4     13.2, 13.0
    IV         3.2, 3.6     3.3, 3.5     5.7, 5.8     10.9, 11.1
This is a completely randomized 4 x 4 factorial experiment with 2 replications. Test the main effects and the interaction at the 5% level.

14. An experiment was conducted to investigate the effects of (a) the date of planting and (b) the application of fertilizer on the yield of soybeans. The randomized block design with four replications was used. The yields of the 32 plots are given in the following table:
    Date of                            Replicate
    Planting    Fertilizer      1       2       3       4
    Early       Check         28.6    36.8    32.7    32.6
                Aero          29.1    29.2    30.6    29.1
                Na            28.4    27.4    26.0    27.7
                K             29.2    28.2    32.0    29.3
    Late        Check         30.3    32.3    31.6    30.9
                Aero          32.7    30.8    31.0    33.8
                Na            30.3    32.7    33.0    33.9
                K             32.7    31.7    31.8    29.4
Test the various hypotheses at the 5% level and write a summary on the findings of the experiment.

15. The following data were obtained in a study of the effectiveness of benzaldehyde 3-thiosemicarbazone and two of its analogs against vaccinia virus in chick embryos. Six eggs were used at each of three virus dilutions for each compound tested. The entire experiment was done twice at different times (replications). The following table shows the mean reciprocal survival times (the mean of the values of 10^4/time in hours) obtained for each group of six eggs.

                                Substituent
    Virus        p-amino          p-methoxy        Unsubstituted
    Dilution     Rep. 1  Rep. 2   Rep. 1  Rep. 2   Rep. 1  Rep. 2
    10^-4.0        87      90       82      71       72      77
    10^-4.3        79      77       80      73       72      70
    10^-4.6        66      81       72      68       62      61

Test the main effects and the interaction at the 5% level. (Hamre, Dorothy, Brownlee, K. A., and Donovick, Richard: "Studies on the Chemotherapy of Vaccinia Virus, The Activity of Some Thiosemicarbazones," The Journal of Immunology, Vol. 67, 1951, pp. 305-312.)

16. In a study of bacterial counts by the plate method, twelve observers counted each of twelve plates three times. The first time the plates were counted, the plates were labelled 1 to 12. They were then taken away and renumbered 13 to 24 in an order different from that previously used. For the third count they were renumbered 25 to 36, again in a different order. While the same twelve plates were counted three times by each observer, the impression was given that 36 different plates were being provided. Each observer entered up the count on a slip of paper; this was removed after each series of twelve plates so that, if suspicion was aroused that the same plates
were being recounted, no reference could be made to the previous figures obtained. The results of the counts are given in the accompanying table. Does the average bacterial count vary with the observer?

Observer
Plate No.
A.
E.
F.
H.
J.
K.
C.
Rl
353 339 345 347 339 340
346 344 344
340 382 374 384 356 362 349 391 364
375 359 372 358 375 359
355 333 332 341 334 336 340 336 328
334 334 328
R4
201 197 211 210 198 199
205 209 204
202 205 206 203 206 205 201 204 211
203 203 211 207 214 206
200 188 194 186 176 192 174 188 191
201 199 200
I
,i R6
G.
I.
L.
59 62
50 45 54
35 39 45
43 45 46
46 48 45
139 146 138 145 140 140
147 149 150 144 138 145
133 130 132
126 116 132
138 135 133
135 133 136
137 134 135
144 145 143 154 147 153
148 155 147 156 157 153
160 164 160
145 143 150
142 148 141
152 141 142
240 250 242
218 225 223
268 272 261 267 252 253
266 266 262 264 261 261
227 239 235
220 234 231 233 236 228
231 228 233
55 57 64
63 55 61
55 52 55
53 55 55
167 158 151
155 144 148
51 53 67 174 153 175
54 55 54
168 145 148 160 141 157
54 67 58 158 150 169 166 161 165
51 53 55
P3
58 70 64 154 160 163 186 177 156
159 152 130
149 145 143
138 137 152
P5
88 86 85 101 90 94
81 93 102
84 93 93
106 96 103 101 109 105
101 91 93
85 95 97
87 93 117
65 92 108
83 98 82
84 82 93
170 173 174 187 173 175
174 182 182
165 161 170
182 180 181 194 191 170
179 183 183 176 182 172
165 163 171
160 158 169
166 166 161
163 166 171
121 159 128 144 138 137 126 155 144 155 127 150
146 142 128
123 120 121
143 139 125 139 132 143
137 142 130 132 141 138
123 143 154 120 140 155 126 143 143
143 164 140 135 147 143
101 112 112 103 116 112
127 128 121
138 138 135
115 126 119 127 134 139
128 115 121 125 122 123
58 51 57
54 62 58
52 54 59
50 41 44
R7
146 139 136 140 134 146
137 142 139
138 134 133
RIO
141 139 139
149 157 150
158 146 151
R12
239 261 223 259 224 246
I
I
D.
B.
I
P2
I I
P8
P9
P11
,
55 59 71
51 52 60
56 67 53
57 66 59
56 57 53
50
57 59 56
117 124 123
(Wilson, G. S.: "The Bacteriological Grading of Milk," His Majesty's Stationery Office, London, 1935)
17. The purpose of this experiment is to study the effect of 4 different baking temperatures and 5 different recipes on the size of the cake, whose cross-section is measured in square inches. The 40 cakes used in this experiment were individually mixed and baked. The areas of the cross-sections of the cakes are given below:
    Temperature    Plain Cake    3% GMS        6% GMS        3% Aldo       6% Aldo
    218°C          4.26, 4.49    5.35, 5.39    5.67, 5.67    5.30, 5.67    5.52, 5.80
    190°C          4.59, 4.45    4.75, 5.10    5.30, 5.57    5.00, 5.02    5.41, 5.29
    163°C          4.63, 4.63    4.56, 4.91    4.80, 4.86    4.79, 4.88    4.65, 4.80
    149°C          4.01, 4.08    3.87, 3.74    4.13, 4.03    3.98, 4.11    4.16, 4.35
Using the 5% significance level, test the hypotheses that (1) there is no interaction between recipe and temperature, (2) the recipe has no effect on the size of the cake, and (3) the temperature has no effect on the size of the cake. If the temperature is significant, use the new multiple range test (Section 15.5) to rank the temperatures according to the average cake sizes. Test the specific hypotheses concerning the recipes by the individual degrees of freedom (Exercise 5, Chapter 15). After the analysis, list all the conclusions.

18. In an effort to reduce labor absenteeism in a large plant, 2 variables were studied. Once hired, applicants were assigned at random to one of 6 groups. These groups were in different parts of the plant, and treatments were as nearly as possible alike save for 3 different lengths of work week and the presence or absence of music. This is a completely randomized 3 x 2 factorial experiment with 30 replications. The results, in terms of numbers of half days absent during the succeeding quarter, are given in the accompanying table. Do the length of the work week and the presence of music affect absenteeism?
Length of Work Week

    35-hour, music:     0 0 0 0 0 0 1 1 1 1 1 2 2 2 3 3 4 4 4 4 4 5 8 12 12 13 20 20 21 29
    35-hour, no music:  0 0 0 0 1 2 2 2 2 2 2 3 3 3 4 4 4 5 5 5 6 7 8 10 10 12 14 14 16 38
    40-hour, music:     0 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 2 2 2 3 4 4 4 5 6 6 10 10 12 16
    40-hour, no music:  0 0 0 0 0 0 1 1 1 1 2 2 2 2 2 2 2 3 3 3 4 4 4 4 6 6 6 6 10 28
    48-hour, music:     0 0 0 0 0 1 1 1 2 2 2 2 2 2 2 3 3 5 8 8 10 10 10 16 16 20 22 23 30 30
    48-hour, no music:  0 0 0 0 0 1 1 2 2 2 2 2 3 4 4 6 6 6 6 6 6 7 10 10 11 12 14 14 29 34
19. A group of 60 auto mechanics working at several different garages, all of which are agencies for the same car manufacturer, are chosen as the subjects in a study of the effectiveness of the company's on-the-job refresher courses. First, all are given a trade examination. They are then classified into three groups of 20 each, namely the top, middle, and bottom thirds according to their performance on the test. Each group of 20 is then divided at random into two subgroups. One subgroup of 10 mechanics attended the refresher course; the other did not. At the end of the course, all 60 mechanics are given another examination. The scores of the second examination are as follows:
                                    First Examination
                   Bottom                          Middle                          Top
    Training       17 18 20 20 22 22 22 24 25 26   26 29 30 30 33 33 34 35 35 37   35 36 38 38 40 41 41 44 44 46
    Course
    No Training    13 14 16 16 16 17 17 18 18 20   24 26 26 28 28 28 32 32 33 34   29 33 34 36 36 38 41 41 43 47
    Course
This is a completely randomized 3 x 2 factorial experiment with 10 replications.
(a) Test the hypothesis that the interaction is absent, at the 5% level. What is the meaning of interaction in this experiment?
(b) Test the hypothesis that the training course is valueless, at the 5% level.

20. The following data were obtained in a study of the effect of induced maltase formation on the free glutamic acid depletion of nitrogen
    Flask    Inductor Absent    Inductor Present
    1        19.6, 20.4         10.3, 10.1
    2        17.9, 17.2         10.9, 10.8
    3        17.2, 18.0          9.9,  9.9
    4        18.9, 19.6         11.1, 11.5
    5        17.3, 17.5         12.0, 11.8
replenished yeast cells. α-methylglucoside was used to induce maltase formation. Cells were nitrogen-starved for 80 minutes in a synthetic medium, replenished with NH4Cl for 15 minutes, centrifuged, washed, and resuspended in buffer. Equal aliquots were placed in 10 flasks containing glucose with or without the inductor. Following incubation, two determinations per flask were made of the free glutamic acid content of centrifuged washed cells, obtained manometrically by the decarboxylase method. Test for a difference in the average glutamic acid between the two treatments, at the 5% level. (Halvorson, Harlyn O., and Spiegelman, S.: "Net Utilization of Free Amino Acids During the Induced Synthesis of Maltozymase in Yeast," Journal of Bacteriology, Vol. 65, 1953, pp. 601-608.)

21. This is exactly the same experiment as Exercise 20. Arginine instead of glutamic acid was measured. Test for a difference in arginine between the two treatments, at the 5% level.
    Flask    Inductor Absent    Inductor Present
    1        3.2, 3.2           1.7, 1.7
    2        2.4, 2.4           1.9, 2.3
    3        2.6, 2.9           1.8, 1.6
    4        2.8, 3.0           1.9, 1.8
    5        2.6, 2.6           2.0, 1.9
QUESTIONS

1. What is a factorial experiment?
2. What are the advantages of a factorial experiment over a series of simple experiments?
3. What are the sources of variation and their numbers of degrees of freedom for a 5 x 7 factorial experiment with 4 replications, if (a) the completely randomized design is used and (b) the randomized block design is used?
4. What are the sources of variation and their numbers of degrees of freedom for a 4 x 6 factorial experiment with 10 replications, if (a) the completely randomized design is used and (b) the randomized block design is used?
5. In a factorial experiment, the factor A consists of three schools and the factor B consists of two teaching methods, and an observation y is the test score of a student. (a) What is the meaning of the presence of interaction? (b) What is the meaning of the absence of interaction?
6. In a factorial experiment, three different teaching methods are used on the students of five different levels of I.Q. An observation is the test score of a student. (a) Suppose it is found that one method is best suited to students with a high I.Q. and that another method is best suited to those with a low I.Q. Would you say that there is an interaction or no interaction between the teaching methods and the levels of I.Q. of the students? (b) Suppose the three teaching methods have the same relative merits for students of all levels of I.Q. Would you say that there is interaction or no interaction?
7. What is the difference between the experimental error and the sampling error?
8. In a factorial experiment, one may test the hypothesis that the ab treatment means (μ) are equal. (a) Then why should one bother to test the three component hypotheses? (b) What are the three component hypotheses?
9. Both factorial and hierarchical experiments are cases of the analysis of variance with multiple classifications. What is the difference between them?
10. What hypotheses can be tested by the hierarchical experiment?
REFERENCES
Anderson, R. L., and Bancroft, T. A.: Statistical Theory in Research, McGraw-Hill Book Company, New York, 1952.
Bennett, Carl A., and Franklin, Norman L.: Statistical Analysis in Chemistry and the Chemical Industry, John Wiley and Sons, Inc., New York, 1954.
Fisher, R. A.: The Design of Experiments, Hafner Publishing Company, New York, 1951.
Mood, A. M.: Introduction to the Theory of Statistics, McGraw-Hill Book Company, New York, 1950.
Ostle, Bernard: Statistics in Research, Iowa State College Press, Ames, 1954.
CHAPTER 19

ANALYSIS OF COVARIANCE

The distributions of the regression coefficient and the adjusted mean are presented in Sections 17.3 and 17.4, but the discussion is limited to a single sample. In this chapter, the discussion is extended to k samples. Therefore, the material presented here is essentially an extension of the subject of linear regression. The new technique used here, called the analysis of covariance, involves the partitioning of the sum of the products of x and y into various components. In many ways, this technique is similar to the analysis of variance, which involves the partitioning of the sum of squares. Incidentally, this chapter also integrates linear regression and the analysis of variance.
19.1 Test of Homogeneity of Regression Coefficients

Section 17.3 shows that the regression coefficient b varies from sample to sample even though all the samples are drawn from the same population. Therefore, a mere inspection of the values of the sample regression coefficients will not enable one to tell whether the k population regression coefficients, β, are equal. In this section a method is given to test the hypothesis that k population regression coefficients are equal, under the assumption that the samples are drawn at random from the populations described in Section 16.2.
The physical meaning of the regression coefficient is extensively discussed in Chapters 16 and 17. The age (x) and the height (y) of children is one of the illustrations used. The regression coefficient in this example is the rate of growth of the children. If one wishes to know whether the children of different races grow at the same rate (β), one may obtain a random sample of children from each race and test the hypothesis that the growth rates are equal. The rejection of this hypothesis implies that the growth rate of children varies from one race to another. Geometrically, the hypothesis that all β's are equal means that the regression lines of the k populations are parallel.
The test of homogeneity of means (analysis of variance, Chapter 12) is accomplished by comparing the variation among the sample means (among-sample MS) with that among the observations (within-sample MS). The test of homogeneity of regression coefficients is based on the same principle, except that the comparison is between the variation among the sample regression coefficients and that among the observations. Since these two tests have so much in common, one test can be explained in terms of the other.
r
The mean ȳ and the regression coefficient b, different though they are, yet have many properties in common. This fact should not be surprising, for both ȳ and b are linear combinations of the n observations (y) of a sample (Equation 3, Section 15.1 and Equation 4, Section 17.3), and consequently (Theorem 15.2) both follow the normal distribution if the population is normal. Furthermore, the similarities do not end here. In fact, the analogy can be carried to the numerators and denominators of these two statistics. The mean ȳ is equal to T/n, and b is equal to SP/SSₓ. The two numerators T and SP, and the two denominators n and SSₓ, also play corresponding roles. For example, the variance of ȳ is equal to σ²/n (Theorem 5.3) and that of b is equal to σ²/SSₓ (Theorem 17.3). The general mean ȳ is the weighted mean of the k sample means ȳ₁, ȳ₂, …, ȳₖ, with the sample sizes n as the weights (Equation 4, Section 12.10), that is,

    ȳ = (n₁ȳ₁ + n₂ȳ₂ + ··· + nₖȳₖ)/(n₁ + n₂ + ··· + nₖ) = (T₁ + T₂ + ··· + Tₖ)/Σn = G/Σn.    (1)
Then it is to be expected that the mean b̄ of the k regression coefficients b₁, b₂, …, bₖ should be the weighted mean of the b's, with the values of SSₓ of the k samples as the weights, that is,

    b̄ = (SSₓ₁b₁ + SSₓ₂b₂ + ··· + SSₓₖbₖ)/(SSₓ₁ + SSₓ₂ + ··· + SSₓₖ) = (SP₁ + SP₂ + ··· + SPₖ)/ΣSSₓ,    (2)

where SSₓ₁ is the SSₓ for the first sample; SSₓ₂ is the SSₓ for the second sample; and so forth. The fact that bSSₓ is equal to SP is derived from the equation b = SP/SSₓ (Equation 5, Section 16.9). The among-sample SS is (Equation 2, Section 12.10)

    Σ nᵢ(ȳᵢ − ȳ)².
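The pooled coefficient b̄ of Equation (2) can be computed directly from the per-sample sums of products and sums of squares. A minimal sketch in Python; the SP and SSₓ values of the three samples are hypothetical, chosen only for illustration:

```python
# Pooled regression coefficient (Equation 2): the weighted mean of the
# sample b's, with the SSx values as the weights.
# The SP and SSx totals below are hypothetical.
SP = [30.0, 45.0, 24.0]    # sum of products, one value per sample
SSx = [20.0, 30.0, 12.0]   # sum of squares of x, one value per sample

b = [sp / ssx for sp, ssx in zip(SP, SSx)]   # b = SP/SSx for each sample
b_bar = sum(SP) / sum(SSx)                   # pooled coefficient b-bar

# The same value, computed explicitly as the weighted mean of the b's:
b_bar_check = sum(ssx * bi for ssx, bi in zip(SSx, b)) / sum(SSx)
```

Because b·SSₓ = SP, the two computations must agree exactly; the second form shows that b̄ is literally a weighted mean of the sample coefficients.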
21.6 Difference Between Two Means

The statistic

    u = [(ȳ₁ − ȳ₂) − (μ₁ − μ₂)] / √[ μ₁(1−μ₁)/n₁ + μ₂(1−μ₂)/n₂ ]    (1)
approximately follows the normal distribution, with mean equal to 0 and variance equal to 1. This statistic, with some modification, may be used in testing the hypothesis that the difference between two population means is equal to a given value. The necessary modification is that the unknown variances σ₁² and σ₂² of Equation (1) must be replaced by their estimates. How the population variances should be estimated is determined by the hypothesis being tested. In a binomial population, the variance is equal to μ(1−μ). If two population means are equal, the population variances are bound to be equal. Therefore, in testing the hypothesis that μ₁ − μ₂ is equal to a value other than zero, the population variances are considered to be different. Then the estimates of σ₁² and σ₂² are ȳ₁(1−ȳ₁) and ȳ₂(1−ȳ₂) respectively. Thus the revised statistic is
    u = [(ȳ₁ − ȳ₂) − (μ₁ − μ₂)] / √[ ȳ₁(1−ȳ₁)/n₁ + ȳ₂(1−ȳ₂)/n₂ ].    (2)
On the other hand, if the hypothesis is μ₁ − μ₂ = 0, the two population variances are considered equal. Then the pooled estimate ȳ(1−ȳ) should
Ch. 21  SAMPLING FROM BINOMIAL POPULATION
be used to replace both σ₁² and σ₂² of Equation (1). Therefore, in testing the hypothesis that μ₁ = μ₂, the appropriate statistic to use is

    u = (ȳ₁ − ȳ₂) / √[ ȳ(1−ȳ)(1/n₁ + 1/n₂) ],    (3)
where ȳ is the general mean of the two samples.

The application of Equation (2) may be illustrated by an example. Suppose a manufacturer claims that his new formula for a certain insecticide can kill 8% more insects than his old formula. In statistical terms, the hypothesis is that μ₁ − μ₂ = 0.08. To see whether this claim is justified, an investigator conducted an experiment and found that the new formula killed 320 out of 400 insects, while the old one killed 60 out of 100 insects. The data are given in Table 21.6a in the form of a two-way frequency table.

TABLE 21.6a

    Observation      New     Old     Total
    Killed           320      60      380
    Not killed        80      40      120
    Sample size      400     100      500

Considering a killed insect as a success, the mean for the new formula is 320/400 or 0.80, and the mean for the old formula is 60/100 or 0.60. The hypothesis being tested here is μ₁ − μ₂ = 0.08, without specifying the values of μ₁ and μ₂. The value of u is (Equation 2)

    u = [(.80 − .60) − .08] / √[ (.80)(.20)/400 + (.60)(.40)/100 ] = .12/.0529 = 2.27,
with the details of computation shown in Table 21.6b.
TABLE 21.6b

    Item             New        Old        Combination
    T                320         60
    n                400        100
    ȳ                .800000    .600000    .200000  (−)
    1 − ȳ            .200000    .400000
    ȳ(1−ȳ)           .160000    .240000
    ȳ(1−ȳ)/n         .000400    .002400    .002800  (+)
    Standard error                         .052915  (√)

Since 2.27 is
greater than 1.96, the hypothesis should be rejected, if the 5% significance level is used. In plain English, the conclusion is that the margin of superiority of the new formula over the old one is wider than what the manufacturer claims.

The confidence interval of μ₁ − μ₂ can be obtained from Equation (2). For a confidence coefficient of 0.95, the limits are (Chapter 11)

    (ȳ₁ − ȳ₂) ± 1.960 √[ ȳ₁(1−ȳ₁)/n₁ + ȳ₂(1−ȳ₂)/n₂ ].    (5)

For the data given in Table 21.6a, the limits are

    (.80 − .60) ± 1.960(.0529)    or
.20 ± .10, with the details of the computation shown in Table 21.6b. Therefore the difference μ₁ − μ₂ is somewhere between 0.10 and 0.30, with a confidence coefficient of 0.95. In other words, the new formula can kill somewhere between 10 to 30 percent more insects than the old one can. In order to obtain a more exact estimate of the difference μ₁ − μ₂, that is, to narrow the interval, larger sample sizes are required.

The sample sizes of the data given in Table 21.6a are deliberately made different to show the general method of computation. However, in conducting an experiment, efforts should be made to equalize the sample sizes. Instead of using 400 and 100 insects for the two treatments, 250 should be used for each. For the same total number of observations, the equalization of sample sizes reduces the standard error of the difference between two sample means (Section 10.7). As a result, in the test of a hypothesis the probability of making a Type II error is reduced; and in the estimation of a parameter, the confidence interval is narrowed.

The computing method for Equation (3) can be illustrated by the data given in Table 21.6a. The new item involved here is the general mean ȳ, which is the grand total G divided by the total number of observations. For the given set of data, the general mean is equal to 380/500 or 0.76 (Table 21.6a). Of course, ȳ is also the weighted mean of ȳ₁ and ȳ₂, with the sample sizes as the weights; that is,

    ȳ = (n₁ȳ₁ + n₂ȳ₂)/(n₁ + n₂) = G/Σn.    (6)

The physical meaning of this general mean is that 76 percent of the insects are killed for the two treatments combined. With the general
mean computed, one can proceed to find the value of u, which is (Equation 3)

    u = (.80 − .60) / √[ (.76)(.24)(1/400 + 1/100) ] = .20/.04775 = 4.188,    (7)
with the details of the computation shown in Table 21.6c. Since 4.188 is greater than 1.960, the conclusion is that μ₁ is greater than μ₂, if the 5% significance level is used. Of course, this is a foregone conclusion, because it is already established by the confidence interval that μ₁ − μ₂ is somewhere between 0.10 and 0.30. However, there is a reason for using this same set of data here as an illustration. The purpose is to show that the estimated standard error of ȳ₁ − ȳ₂ is not the same, when σ₁² and σ₂² are estimated in different ways. The standard error is equal to 0.0529 (Table 21.6b), if the two variances are estimated individually; and is equal to 0.0477 (Table 21.6c), if the two variances are estimated jointly.
TABLE 21.6c

    Item                   New        Old        Combination
    T                      320         60        380       G  (+)
    n                      400        100        500       Σn (+)
    ȳ                      .800000    .600000    .200000   (−)
    ȳ = G/Σn                                     .760000
    1 − ȳ                                        .240000
    ȳ(1−ȳ)                                       .182400
    1/n                    .002500    .010000    .012500   (+)
    ȳ(1−ȳ)(1/n₁ + 1/n₂)                          .002280
    Standard error                               .047749   (√)
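The two u-statistics of this section, together with the confidence limits, can be reproduced in a few lines. A sketch of the computations of Tables 21.6b and 21.6c:

```python
from math import sqrt

# Data of Table 21.6a: new formula killed 320 of 400, old killed 60 of 100.
T1, n1 = 320, 400
T2, n2 = 60, 100
y1, y2 = T1 / n1, T2 / n2                  # sample means .80 and .60

# Equation (2): variances estimated separately, for H0: mu1 - mu2 = 0.08
se2 = sqrt(y1 * (1 - y1) / n1 + y2 * (1 - y2) / n2)   # .0529 (Table 21.6b)
u2 = ((y1 - y2) - 0.08) / se2              # about 2.27

# 95% confidence limits for mu1 - mu2 (Equation 5)
lower = (y1 - y2) - 1.960 * se2
upper = (y1 - y2) + 1.960 * se2            # roughly .10 to .30

# Equation (3): pooled variance, for H0: mu1 = mu2
ybar = (T1 + T2) / (n1 + n2)               # general mean .76
se3 = sqrt(ybar * (1 - ybar) * (1 / n1 + 1 / n2))     # .0477 (Table 21.6c)
u3 = (y1 - y2) / se3                       # about 4.19
```

The two standard errors differ because the variances are estimated individually in one case and jointly in the other, exactly as the text describes.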
1.96; and the critical region for χ² is χ² > 3.84 (Section 7.6).

The physical meaning of the number of degrees of freedom can be noted from the hypothetical frequencies of Table 21.6d. The four marginal totals, n₁, n₂, G, and Σn − G, are fixed by those of the observed frequencies. When one of the four hypothetical frequencies, h, is determined, the other three are automatically determined. Therefore, the number of degrees of freedom is 1. The meaning of the number of degrees of freedom can also be noted from the discrepancies, f − h, of Table 21.6e. The four values are all equal to 16, with the only difference being in the sign. When one value is determined, the other three are automatically determined. If the first discrepancy were 9, the others would be −9, −9, and 9. Therefore, the number of degrees of freedom is 1.

There is a short-cut method of computing the value of χ². The method is
    χ² = [T₁(n₂ − T₂) − T₂(n₁ − T₁)]²(n₁ + n₂) / [(n₁)(n₂)(G)(Σn − G)].    (12)
For the data given in Table 21.6a,

    χ² = [320(40) − 60(80)]²(500) / [(400)(100)(380)(120)] = 32,000,000,000/1,824,000,000 = 17.54.    (13)
The value obtained above is the same as that shown in Table 21.6e. The two versions of χ², as shown in Equations (11) and (12), are algebraically identical, despite their difference in appearance. Equation (12), however, is a computing short cut, because it does not need the hypothetical frequencies.

Large samples are required for the statistic u to follow the normal distribution, and consequently large samples are also required for u² to follow the χ²-distribution. To insure validity of the u-test or the χ²-test, the sample sizes should be large enough so that none of the four hypothetical frequencies is less than 5. This is the same working rule given in Section 21.3.
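The short-cut formula (12) and the relation χ² = u², with u taken from Equation (3) of this section, can be checked numerically; a sketch:

```python
from math import sqrt

# Short-cut chi-square of Equation (12) for the data of Table 21.6a.
T1, n1 = 320, 400          # successes and size, new formula
T2, n2 = 60, 100           # successes and size, old formula
G = T1 + T2                # total successes, 380
N = n1 + n2                # total observations, 500

num = (T1 * (n2 - T2) - T2 * (n1 - T1)) ** 2 * N
den = n1 * n2 * G * (N - G)
chi2 = num / den           # about 17.54, with 1 degree of freedom

# u of Equation (3); its square equals the chi-square above
ybar = G / N
u = (T1 / n1 - T2 / n2) / sqrt(ybar * (1 - ybar) * (1 / n1 + 1 / n2))
```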
21.7 Test of Homogeneity of Means

The method of testing the hypothesis that two population means are equal is given in the preceding section. In this section, a method is developed to cope with the case of k samples. The hypothesis is that k population means are equal. This method serves the binomial populations in the same way that the analysis of variance serves the normal populations.

As shown in Section 21.2, the sample means drawn from a binomial population follow approximately the normal distribution with mean equal to μ and variance equal to μ(1−μ)/n (Table 21.2b). Then the sample means ȳ may be considered a sample of k observations drawn from a normal population with mean equal to μ and variance equal to σ²/n (Section 12.2). From this point of view, it can be seen that the statistic
    χ² = Σ(ȳᵢ − ȳ)²/(σ²/n) = nΣ(ȳᵢ − ȳ)²/σ² = among-sample SS / μ(1−μ)    (1)
follows the χ²-distribution with k − 1 degrees of freedom (Theorem 7.7a and Equation 2, Section 12.2). If the sample sizes are not equal, the statistic

    χ² = Σn(ȳᵢ − ȳ)²/σ² = among-sample SS / μ(1−μ)    (2)
follows the χ²-distribution with k − 1 degrees of freedom. The numerator

    n₁(ȳ₁ − ȳ)² + n₂(ȳ₂ − ȳ)² + ··· + nₖ(ȳₖ − ȳ)²    (3)
is still the among-sample SS (Equation 2, Section 12.10). The distributions of the statistics given in Equations (1) and (2) are derived from Theorem 7.7a; but these statistics cannot be used to test any hypotheses until the population variance is replaced by its estimate. Since the variance of a binomial population is equal to μ(1−μ), the quantity ȳ(1−ȳ) may be used as a pooled estimate of σ² (Equation 3, Section 21.6). This estimate is a good one, if the sample sizes n are large, or if the number of samples, k, is large, or if both the n's and k are large, because ȳ is an estimate based on Σn observations. With μ replaced by ȳ, the new statistic is

    χ² = Σn(ȳᵢ − ȳ)²/ȳ(1−ȳ) = among-sample SS / ȳ(1−ȳ),    (4)
which follows approximately the χ²-distribution, with k − 1 degrees of freedom. This statistic may be used in testing the hypothesis that k population means are equal.

An example may be used as an illustration of the purpose of the χ²-test. Table 21.7a shows the frequencies of the successes and failures of 4 samples. In application, the samples are the treatments; the successes and failures are the observations. Therefore, the numbers of successes and failures may be the numbers of insects killed and not killed; the 4 samples may be 4 kinds of insecticides, or 4 dosages of the same insecticide. In terms of germination rate, the numbers of successes and failures are the numbers of seeds germinated and not germinated;

TABLE 21.7a

    Observation    Sample No. 1       2        3        4      Total
    Success             154         172      146      244        716
    Failure              46          28       54      156        284
    Sample size         200         200      200      400      1,000

    Computation:
    Mean ȳ             .770        .860     .730     .610      ȳ = .716000
    T²               23,716      29,584   21,316   59,536      G² = 512,656
    T²/n             118.58      147.92   106.58   148.84      G²/Σn = 512.656

    1 − ȳ = .284000;  ȳ(1−ȳ) = .203344;  Σ(T²/n) − G²/Σn = 9.264;  χ² = 45.56
the 4 samples may be 4 varieties of grass. In terms of medicine, the numbers of successes and failures are the numbers of patients responding favorably and unfavorably; the 4 samples may be 4 kinds of drugs. One can think of numerous examples of this sort, and the χ²-test can be used to determine whether the percentages of successes of the k treatments differ significantly. In statistical terms, the hypothesis being tested is that the k μ's are equal.

The short-cut methods of calculating the among-sample SS given in Sections 12.3 and 12.10 can be utilized in computing the value of χ². The among-sample SS is equal to

    Σ(T²/n) − G²/Σn    (5)
if the sample sizes are not the same; it reduces to

    (T₁² + T₂² + ··· + Tₖ²)/n − G²/kn    (6)
if the sample sizes are the same. The computing procedure is illustrated by the data given in Table 21.7a. The details of the computation are shown in the lower half of the table. The notations used here are the same ones used in the preceding sections. The number of successes in a sample remains the sum T of that sample. The among-sample SS is

    118.58 + 147.92 + 106.58 + 148.84 − 512.656 = 9.264.
The general mean ȳ is equal to 716/1000 or 0.716. Therefore, the value of χ² is (Equation 4)

    χ² = 9.264 / (.716)(.284) = 9.264/.203344 = 45.56
with 3 degrees of freedom. Because of the magnitude of 45.56, the hypothesis is rejected and the conclusion is that the percentages of successes of the four treatments are not the same. Needless to say, this χ²-test is a one-tailed test. The hypothesis is rejected only because χ² is too large.

If there are only two treatments, that is, k = 2, the χ² given in Equation (4) is exactly the same as those given in Equations (11) and (12) of Section 21.6. For the data of Table 21.6a,

    χ² = [ (320)²/400 + (60)²/100 − (380)²/500 ] / [ (380/500)(120/500) ] = 3.2/.1824 = 17.54,
which is the same value shown in Table 21.6e and Equation (13), Section 21.6. Thus another computing method is introduced for the two-sample case.

Another method of testing the hypothesis that k population means are equal is the chi-square test of independence, which is described in the preceding section. The computing method is the same as that given in Tables 21.6d and 21.6e, except that the number of samples is changed from 2 to k. For the data given in Table 21.7a the value of χ² computed by this method is 45.56 (Table 21.7b), which is the same as that obtained previously (Table 21.7a). Therefore, these two χ²-tests are really the same one, despite their difference in terminology, notations, and computing methods.
TABLE 21.7b

    Sample No.   Observation      f         h       f − h     (f−h)²     (f−h)²/h
    1            Success         154     143.2       10.8     116.64       0.8145
                 Failure          46      56.8      −10.8     116.64       2.0535
    2            Success         172     143.2       28.8     829.44       5.7922
                 Failure          28      56.8      −28.8     829.44      14.6028
    3            Success         146     143.2        2.8       7.84       0.0547
                 Failure          54      56.8       −2.8       7.84       0.1380
    4            Success         244     286.4      −42.4   1,797.76       6.2771
                 Failure         156     113.6       42.4   1,797.76      15.8254
    Total                      1,000   1,000.0        0.0                 45.5582
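The equality of the two computations, Equation (4) applied to Table 21.7a and the test of independence of Table 21.7b, can be verified directly; a sketch:

```python
# Chi-square test of homogeneity for the four samples of Table 21.7a,
# computed two ways: from the among-sample SS (Equation 4) and as the
# test of independence of Table 21.7b.
T = [154, 172, 146, 244]                   # successes per sample
n = [200, 200, 200, 400]                   # sample sizes
G, N = sum(T), sum(n)                      # 716 and 1000
ybar = G / N                               # general mean .716

among_ss = sum(t * t / m for t, m in zip(T, n)) - G * G / N   # 9.264
chi2 = among_ss / (ybar * (1 - ybar))      # 45.56, with k-1 = 3 d.f.

# Same value from observed and hypothetical frequencies (Table 21.7b)
chi2_ind = 0.0
for t, m in zip(T, n):
    for f, h in ((t, m * G / N), (m - t, m * (N - G) / N)):
        chi2_ind += (f - h) ** 2 / h
```

The agreement of the two values is not an accident of this data set; as the text says, the two χ²-tests are algebraically the same one.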
21.8 Analysis of Variance Versus χ²-Test

The choice of a statistic in testing the hypothesis that k population means are equal depends on the kind of populations involved. For the normal populations, the statistic used is

    F = [among-sample SS/(k − 1)] / [within-sample SS/(Σn − k)] = among-sample MS / within-sample MS    (1)

with k − 1 and Σn − k degrees of freedom; for the binomial populations, the statistic used is

    χ² = among-sample SS / ȳ(1−ȳ)    (2)

with k − 1 degrees of freedom. There is undoubtedly a great deal of similarity between these two statistics. For the purpose of comparison, the χ² may be expressed in terms of F (Theorem 9.7); that is,

    F′ = χ²/(k − 1) = [among-sample SS/(k − 1)] / ȳ(1−ȳ) = among-sample MS / ȳ(1−ȳ)    (3)

with k − 1 and ∞ degrees of freedom. Note that a prime is attached to the above F to distinguish it from the F of the analysis of variance. Observing Equations (1) and (3), one will notice that the numerators of F and F′ are the same; the denominators, though different, are both pooled estimates of the population variance. For normal populations, σ² is estimated directly by s², the error mean square; for binomial populations, σ² is estimated through the general mean ȳ, because σ² = μ(1−μ). Therefore, the basic principles underlying these two tests, χ² and F, are the
same. The only difference between them is the manner in which σ² is estimated: directly through s² for normal populations and indirectly through ȳ for binomial populations. It would be interesting to see what would happen if the variance σ² of the binomial populations is estimated by s² instead of by ȳ(1−ȳ).

The relation between s² and ȳ(1−ȳ) can be readily seen through a numerical example. Table 21.8a shows 4 samples, with the observations given individually rather than in the familiar form of a frequency table. The analysis of variance of the data is shown in the lower part of the table.
TABLE 21.8a

    Item        (1)     (2)     (3)     (4)     Total
    T            2       3       2       2      G = 9
    n            4       5       5       4      Σn = 18
    T²           4       9       4       4      G² = 81
    T²/n        1.0     1.8     0.8     1.0     G²/Σn = 4.5

    Analysis of Variance
    Source of Variation      SS      DF       MS
    Among-sample             0.1      3     0.0333
    Within-sample            4.4     14     0.3143
    Total                    4.5     17     0.2647
The computation is very simple, because y² = y, for y = 0 or 1. The quantity Σy² for all Σn observations is simply equal to the grand total G. By the computing method given in Section 12.10, the among-sample SS is

    Σ(T²/n) − G²/Σn = 1.0 + 1.8 + 0.8 + 1.0 − 4.5 = 0.1;    (4)

the within-sample SS is

    Σy² − Σ(T²/n) = G − Σ(T²/n) = 9 − 1.0 − 1.8 − 0.8 − 1.0 = 4.4;    (5)

the total SS is

    Σy² − G²/Σn = G − G²/Σn = 9 − 4.5 = 4.5.    (6)
It is through these components of SS that the relation between s² and ȳ(1−ȳ) can be established. The total mean square, which has never been used in the analysis of variance, is approximately equal to ȳ(1−ȳ). When the total SS is divided by Σn instead of Σn − 1, its number of degrees of freedom, the result is (Equation 6)

    [G − G²/Σn]/Σn = G/Σn − (G/Σn)² = ȳ − ȳ² = ȳ(1−ȳ).    (7)
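Equation (7) can be verified on data like that of Table 21.8a. The 0-and-1 arrangement below reproduces the sample totals T = 2, 3, 2, 2 and n = 4, 5, 5, 4; the order of the observations within each sample is immaterial and is chosen arbitrarily:

```python
# For 0-1 observations, the total SS divided by the total number of
# observations equals ybar(1 - ybar) exactly (Equation 7).
samples = [[1, 1, 0, 0], [1, 1, 1, 0, 0], [1, 1, 0, 0, 0], [1, 1, 0, 0]]
obs = [y for s in samples for y in s]
G, N = sum(obs), len(obs)                  # grand total 9, 18 observations
ybar = G / N                               # general mean 0.5

total_ss = G - G * G / N                   # 9 - 4.5 = 4.5 (Equation 6)
total_ms = total_ss / (N - 1)              # 4.5/17 = 0.2647, the total MS
exact = total_ss / N                       # equals ybar*(1 - ybar) = 0.25
```

The divisor N versus N − 1 is the whole difference between ȳ(1−ȳ) and the total mean square, which is why the two quantities agree for large samples.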
Therefore, the total mean square is only slightly greater than ȳ(1−ȳ). For the data of Table 21.8a, ȳ(1−ȳ) = .5(1 − .5) = 0.25 as compared with 0.2647, the total mean square shown in Table 21.8a. The difference between 0.25 and 0.2647, though small, is still misleadingly large, because Σn is only equal to 18. The difference in these numbers is induced by the difference in the divisors, Σn and Σn − 1. Therefore, the larger the total number of observations, the smaller the difference between the total mean square and ȳ(1−ȳ). Since the χ²-test is used only for large samples and never for samples as small as those shown in Table 21.8a, the total mean square and ȳ(1−ȳ) may be considered equal for all practical purposes.

Now the relation between s² and ȳ(1−ȳ) is established. The former is the within-sample mean square, while the latter is approximately the total mean square of the analysis of variance. Therefore the χ²-test is equivalent to the analysis of variance, with the
total mean square being used as the error term. The practice of using the total mean square as the error term in the analysis of variance may seem shocking at first, but it is not nearly so bad as it appears. The consequences of this practice are illustrated by Table 21.8b, which shows the analysis of variance of the data given in Table 21.7a. The total mean square is the weighted average of the among-sample and within-sample mean squares, with their numbers of degrees of freedom being the weights.
TABLE 21.8b

    Source of Variation      SS        DF       MS        F        F′
    Among-sample             9.264      3     3.0880    15.84    15.17
    Within-sample          194.080    996      .1949
    Total                  203.344    999      .2035
In terms of the mean squares given in Table 21.8b, the relation among them is

    0.2035 = [3(3.0880) + 996(.1949)] / (3 + 996).    (8)
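The mean squares of Table 21.8b and the weighted-average relation (8) can be reproduced as follows; a sketch:

```python
# Mean squares of Table 21.8b and the weighted-average relation (8).
among_ss, among_df = 9.264, 3
within_ss, within_df = 194.080, 996

among_ms = among_ss / among_df             # 3.0880
within_ms = within_ss / within_df          # about .1949
total_ms = (among_ss + within_ss) / (among_df + within_df)   # about .2035

F = among_ms / within_ms                   # about 15.8 (15.84 in the table)
F_prime = among_ms / total_ms              # about 15.17, smaller than F

# Relation (8): the total MS is the weighted mean of the other two,
# with the degrees of freedom as the weights.
check = (among_df * among_ms + within_df * within_ms) / (among_df + within_df)
```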
The above equation is obviously true, because a mean square multiplied by its number of degrees of freedom is equal to the SS. Because of this relation, the magnitude of the total mean square must lie somewhere between those of the among-sample and within-sample mean squares. But the total mean square is much closer to the within-sample mean square, because its number of degrees of freedom, Σn − k, is much larger than k − 1, and therefore the within-sample mean square carries more weight in the weighted average (Table 21.8b).

If F is greater than 1, that is, if the among-sample mean square is greater than the within-sample mean square, the total mean square, being between the two values, is also greater than the within-sample mean square. Therefore F′ (Equation 3), being obtained from a larger divisor, is less than F (Table 21.8b). But the difference is slight for large samples, because large samples make Σn − k much greater than k − 1, and consequently make the total mean square and the within-sample mean square more alike (Table 21.8b). For samples not very large, the difference in F and F′ is further compensated by the difference in their critical regions. The critical region of F′, which has ∞ degrees of freedom for the denominator, is determined by the values given in the bottom line of the F-table; while that of F is determined by the larger values given in the upper lines. Therefore to be significant F needs a larger value than F′ does.
All the discussions so far indicate that the χ²-test and the analysis of variance usually yield the same conclusion in testing the hypothesis that k population means are equal. Due to the fact that F′ is less than F when F is greater than 1, the χ²-test will not reject the hypothesis as frequently as the analysis of variance will. Therefore, the χ²-test seems to have a slightly higher probability of committing a Type II error than the analysis of variance does. However, one must realize that the F-test is not beyond reproach, because the populations under consideration are binomial and not normal. It is known that the analysis of variance tends to reject the true hypothesis more frequently than the significance level specified, if the populations are not normal (Section 12.7). Therefore, the F-test seems to have a higher probability of committing a Type I error than the χ²-test does.

Now the discussion about the χ²-test versus the F-test can be boiled down to this: For binomial populations, either method may be used in testing the hypothesis that k population means are equal. The conclusions reached by the two methods are usually the same. On rare occasions when they do not agree, one test may be as much at fault as the other. After all, both of them are approximate tests in the sense that the validity of the χ²-test requires large samples and the validity of the F-test requires normal populations. Neither of these requirements is entirely fulfilled in sampling from
binomial populations. Therefore, one has no basis for favoring one over the other.

The discussion about F and F′ is equally applicable to t and u, where

    t = (ȳ₁ − ȳ₂) / √[ s²(1/n₁ + 1/n₂) ]

(Equation 1, Section 10.6), and

    u = (ȳ₁ − ȳ₂) / √[ ȳ(1−ȳ)(1/n₁ + 1/n₂) ]

(Equation 3, Section 21.6). Here the contrast is also between s² and ȳ(1−ȳ). This is, of course, to be expected, because t² = F and u² = F′ (Theorems 7.6 and 9.7).
21.9 Individual Degree of Freedom

The χ²-test described in the preceding sections is used in testing the general hypothesis that the k population means are equal. For testing a more specific hypothesis an individual degree of freedom may be used. The among-sample SS (Equation 4, Section 21.7) may be broken down into k − 1 components, each of which is to be divided by ȳ(1−ȳ).
CHAPTER 22

SAMPLING FROM MULTINOMIAL POPULATION

This chapter is essentially an extension of the preceding one. Instead of dealing with the binomial population which consists of only two kinds of observations, this chapter discusses the multinomial population whose observations can be classified in more than two categories. However, the basic techniques remain the same. The χ²-test of goodness of fit is used on the one-sample case, while the χ²-test of independence is used on the k-sample case.

22.1 Multinomial Population

A multinomial population is a set of observations which can be classified into a finite number of categories. The number of categories may be designated by the letter r. If r = 2, the multinomial population becomes a binomial population. For example, the five grades A, B, C, D, and F given to students at the end of a term constitute a multinomial population with r = 5. If the grades consist only of passing and failing, the population becomes binomial. In answering a question, if the answers "yes", "no", and "I don't know" are permitted, the answers constitute a multinomial population with r = 3. If the answers are restricted to only "yes" and "no," the population becomes binomial.

There is an abundance of examples of multinomial populations in everyday life. Cars may be classified by their makes, by their body styles, or by their colors. Houses may be classified by their architectural styles, or by their construction material. Skilled workers may be classified by their trades. Army officers may be classified by their ranks. People may be classified by their races. Whenever the observations, such as the different makes of cars, can be divided into a finite number of categories, they constitute a multinomial population.

A multinomial population is described by the relative frequencies of the observations in the r categories. These relative frequencies are designated by π₁, π₂, …,
and πᵣ. Suppose 5% of the students receive the grade A, 20% receive B, 50% receive C, 20% receive D, and 5% receive F. Then π₁ = .05, π₂ = .20, π₃ = .50, π₄ = .20, and π₅ = .05. The sum of these relative frequencies π is equal to 1. If r = 2, the relative frequencies π₁ and π₂ are equivalent to π and (1 − π) of a binomial population.
22.2 Test of Goodness of Fit

The test of goodness of fit is first introduced in Section 21.4 to test the hypothesis that the relative frequency of successes of a binomial
population is equal to a given value. The same test may be used in testing the hypothesis that the r relative frequencies, π₁, π₂, …, πᵣ, of a multinomial population are equal to certain specified values. With a random sample of n observations, the statistic used to test this hypothesis is

    χ² = Σ(f − h)²/h = (f₁ − h₁)²/h₁ + (f₂ − h₂)²/h₂ + ··· + (fᵣ − hᵣ)²/hᵣ.    (1)
For large samples, this statistic approximately follows the χ²-distribution with r − 1 degrees of freedom, if the hypothesis is correct. The f's, which are the numbers of observations falling into the r categories, are called the observed frequencies. The sum of these frequencies is equal to n. The h's, which are equal to nπ₁, nπ₂, …, and nπᵣ, are called the hypothetical frequencies. The sum of the h's is also equal to n, because the sum of the π's is equal to 1.

A sampling experiment conducted by the author may be used to verify the fact that the statistic χ² given in Equation (1) approximately follows the χ²-distribution with r − 1 degrees of freedom. A multinomial population consists of 4000 beads, of which 1600 are red, 1200 are blue, and 1200 are white. The observations are the 3 colors, red, blue, and white. The relative frequencies of the three categories (colors) are 0.4, 0.3, and 0.3, the sum of which is equal to 1. From this population, 1000 random samples, each consisting of 20 observations, are drawn. For each sample, the numbers of red, blue, and white beads are recorded. The average numbers of red, blue, and white beads for the 1000 samples are 7.98, 5.99, and 6.03, respectively. These values are approximately equal to 8, 6, and 6, which are nπ₁, nπ₂, and nπ₃ respectively. If all possible samples were drawn, the average frequencies would be exactly 8, 6, and
6. For each of the 1000 samples, the statistic χ² is computed. Suppose f₁, f₂, and f₃ are the numbers of red, blue, and white beads in a sample. The statistic for that sample is

    χ² = (f₁ − 8)²/8 + (f₂ − 6)²/6 + (f₃ − 6)²/6.    (2)
For example, the first sample consists of 10 red, 4 blue, and 6 white beads. The statistic for this sample is

    χ² = (10 − 8)²/8 + (4 − 6)²/6 + (6 − 6)²/6 = 1.17.    (3)
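The author's bead experiment can be imitated with pseudo-random sampling. A sketch, assuming Python's standard random module; the seed is arbitrary:

```python
import random
random.seed(1)

# Imitation of the sampling experiment: 1000 samples of 20 beads from a
# population that is 40% red, 30% blue, 30% white; chi-square of each
# sample against the true hypothetical frequencies 8, 6, 6.
h = [8, 6, 6]
chi2_values = []
for _ in range(1000):
    draws = random.choices(["red", "blue", "white"], weights=[4, 3, 3], k=20)
    f = [draws.count(c) for c in ("red", "blue", "white")]
    chi2_values.append(sum((fi - hi) ** 2 / hi for fi, hi in zip(f, h)))

mean_chi2 = sum(chi2_values) / len(chi2_values)   # near 2, the d.f.
```

The mean of the simulated χ²-values should come out near 2, the number of degrees of freedom, in agreement with the experiment described in the text.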
After the 1000 χ²-values are computed, a frequency table and a histogram may be made to show the shape of the distribution. The mean of the 1000
χ²-values is approximately equal to 2. This indicates that the distribution has r − 1 or 3 − 1 or 2 degrees of freedom. Out of the 1000 χ²-values, 60 or 6% exceed 5.99, the 5% point of the χ²-distribution with 2 degrees of freedom. The discrepancy between 6% and 5% is not excessive for an experiment consisting of only 1000 samples. If a larger number of samples were drawn, the discrepancy would be expected to diminish.

From Equations (2) and (3), it can be seen that the statistic χ² is a measurement of the deviations of the observed frequencies of a sample from the true average frequencies, nπ₁, nπ₂, …, and nπᵣ. If these average frequencies are incorrect because of a wrong hypothesis concerning the values of the π's, the result is that the χ²-value of a sample will be affected. In general, it is increased rather than decreased. For example, the χ²-value for the sample consisting of 10 red, 4 blue, and 6 white beads is 1.17 (Equation 3). In testing the hypothesis that π₁ = .2, π₂ = .4, and π₃ = .4, the same sample yields a statistic of
    χ² = (10 − 4)²/4 + (4 − 8)²/8 + (6 − 8)²/8 = 11.5.
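The χ²-values discussed in this section, 1.17 under the true hypothesis and 11.5 under a wrong one (and, as the next paragraph notes, 0 under a wrong hypothesis that happens to match the sample), can be computed as follows; a sketch:

```python
# Chi-square of the sample (10 red, 4 blue, 6 white) under three
# hypotheses about the population relative frequencies.
f, n = [10, 4, 6], 20

def chi_square(pi):
    return sum((fi - n * p) ** 2 / (n * p) for fi, p in zip(f, pi))

true_chi2 = chi_square([.4, .3, .3])       # 1.17: the true hypothesis
wrong_chi2 = chi_square([.2, .4, .4])      # 11.5: a wrong hypothesis, rejected
lucky_chi2 = chi_square([.5, .2, .3])      # 0: a wrong hypothesis, accepted
```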
This large value of χ² enables one to reject the wrong hypothesis. Therefore in testing the hypothesis that the π's are equal to a given set of values, a large χ²-value indicates that the hypothesis might be wrong. Since the χ²-value can also be large sometimes, even if the hypothesis is correct, there is always the possibility of committing a Type I error in rejecting a hypothesis. On the other hand, a wrong hypothesis can sometimes yield a small value of χ². In testing the hypothesis that π₁ = .5, π₂ = .2, and π₃ = .3, the sample consisting of 10 red, 4 blue, and 6 white beads yields a χ²-value of zero. On the basis of the value of χ², this wrong hypothesis would be accepted. Therefore, in accepting a hypothesis there is always the chance of making a Type II error.

The rolling of dice may be used as an illustration of the application of the test of goodness of fit. If a die is well balanced, the six faces
TABLE 22.2

    No. of Spots: 1, 2, 3, 4, 5, 6, Total
    (columns: Observed Frequency f; Hypothetical Frequency h; f − h)
f, the number of successes, is 45 + 43 or 88. The value of χ² is (Equation 2, Section 21.4)

    χ² = (f − nπ)² / nπ(1−π) = (88 − 200/3)² / [200(1/3)(2/3)] = [(64)²/9] / (400/9) = 10.24,    (5)
which is the same value given in Equation (4). The χ²-test of the individual degree of freedom is equivalent to a two-tailed u-test (Section 7.6). At times, the one-tailed u-test is more desirable. For example, because of the nature of the game, one suspects that the sides 3 and 4 of the die occur more frequently, and not less frequently, than 1, 2, 5, and 6. In this case, the one-tailed u-test should be used. For the example of Table 22.2,

    u = (f − nπ) / √[nπ(1−π)] = (88 − 200/3) / √[200(1/3)(2/3)] = 3.2,    (6)
which is the square root of χ² or 10.24. However, the difference in the one- or two-tailed u-tests is not in the value of u, but in the critical regions. For the 5% significance level, the critical regions for a two-tailed test are u < −1.960 and u > 1.960; the critical region for a one-tailed test is u > 1.645. If u is greater than 1.645, the hypothesis is rejected by a one-tailed test; while in a two-tailed test, u must exceed 1.960 before the hypothesis is rejected. Therefore, the one-tailed test is more powerful than the two-tailed one, if the sides 3 and 4 are suspected to occur only more frequently than the other sides.

22.4 Fitting Frequency Curves
Many applications of the test of goodness of fit can be found in this book. Various distributions of the statistics, such as the t-, F-, and χ²-distributions, are verified by sampling experiments consisting of 1000 samples. Such an empirical distribution may be tested for agreement with the theoretical one. The sampling experiment of Section 8.2 is used here as an example. The observed and hypothetical frequencies of the
1000 tvalues are given in Table 22.4&. The observed frequencies are transferred from Table 8.2. The theoretical relative frequencies of Table 8.2 are the values of "'s. The hypothetical frequencies are equal to 1000 times the theoretical relative frequencies. The computation of Table 22.4& shows that )( = 1l.S9 with 10 degrees of freedom. Therefore, the conclusion is that the result of the sampling experiment does not refute the theoretical tdistribution. In fitting the empirical data to a theoretical frequency curve, the value and the number of degrees of freedom are affected by the manner in of which the observations are grouped. In Table 22.4a, the tvalues are arbitrarily classified into 11 groups. But the same 1000 tvalues can also be classified into 3 groups as shown in Table 22.4b. For this grouping, )(  4.23 with 2 degrees of freedom. This example illustrates the effect of the grouping of observations on the )(1est of goodness of fit. Sometimes the conclusions reached through different groupings may even be different. Comparing Tables 22.4a and 22.4b, one sees that the latter grouping obscures the possible discrepancies between the observed and hypothetical frequencies at the tail ends of the tdistribution.
TABLE 22.4a

                  Observed      Hypothetical
t                Frequency f    Frequency h    f - h    (f - h)²   (f - h)²/h
Below -4.5            8              5            3          9        1.800
-4.5 to -3.5          6              7           -1          1         .143
-3.5 to -2.5         23             21            2          4         .190
-2.5 to -1.5         85             71           14        196        2.761
-1.5 to -0.5        218            218            0          0         .000
-0.5 to  0.5        325            356          -31        961        2.699
 0.5 to  1.5        219            218            1          1         .005
 1.5 to  2.5         80             71            9         81        1.141
 2.5 to  3.5         25             21            4         16         .762
 3.5 to  4.5          4              7           -3          9        1.286
Above 4.5             7              5            2          4         .800
Total              1000           1000            0               χ² = 11.587
TABLE 22.4b

                  Observed      Hypothetical
t                Frequency f    Frequency h    f - h    (f - h)²   (f - h)²/h
Below -0.5          340            322           18        324        1.006
-0.5 to 0.5         325            356          -31        961        2.699
Above 0.5           335            322           13        169         .525
Total              1000           1000            0                χ² = 4.230
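The χ²-values for the two groupings of the same 1000 t-values can be checked with a short sketch; the function name is illustrative, and the frequencies are those of Tables 22.4a and 22.4b:

```python
def chi_square_goodness_of_fit(observed, hypothetical):
    # chi-square = sum of (f - h)^2 / h over all classes
    return sum((f - h) ** 2 / h for f, h in zip(observed, hypothetical))

# Table 22.4a: the 1000 t-values in 11 classes
f11 = [8, 6, 23, 85, 218, 325, 219, 80, 25, 4, 7]
h11 = [5, 7, 21, 71, 218, 356, 218, 71, 21, 7, 5]

# Table 22.4b: the same 1000 t-values in 3 classes
f3 = [340, 325, 335]
h3 = [322, 356, 322]

chi11 = chi_square_goodness_of_fit(f11, h11)   # about 11.59, with 10 d.f.
chi3 = chi_square_goodness_of_fit(f3, h3)      # about 4.23, with 2 d.f.
```

The coarser grouping loses the tail classes entirely, which is why its χ² is so much smaller.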
438
SAMPLING FROM MULTINOMIAL POPULATION
Ch.22
In general, a more minute grouping of observations is more useful in detecting the difference in the shapes of the empirical and the theoretical frequency distributions. However, the number of groups cannot be too large either. If the observations are divided into minute groups, some of the hypothetical frequencies may be too small. The time-honored working rule is that all the hypothetical frequencies must be at least 5 (Section 21.3). But this working rule need not be strictly observed. As long as 80% of the hypothetical frequencies are equal to or greater than 5, and the other 20% are not less than 1, the test of goodness of fit still can be used. The reason for compromising the working rule is to make the test more sensitive to the possible discrepancies at the tail ends of the empirical and theoretical distributions.
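The compromised working rule can be expressed as a small check; the function name and return convention are illustrative only:

```python
def expected_frequencies_ok(h):
    """Working rule of this section: at least 80% of the hypothetical
    frequencies must be 5 or more, and none may fall below 1."""
    at_least_5 = sum(1 for x in h if x >= 5)
    return at_least_5 / len(h) >= 0.80 and min(h) >= 1

# The hypothetical frequencies of Table 22.4a satisfy the rule
h11 = [5, 7, 21, 71, 218, 356, 218, 71, 21, 7, 5]
ok = expected_frequencies_ok(h11)
```

A grouping that fails the check should be coarsened, preferably by merging adjacent classes in the tails.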
22.5 Test of Independence

The test of independence is first introduced in Section 21.6 to test the hypothesis that two samples are drawn from the same binomial population. The same test can be used in testing the hypothesis that k samples are drawn from the same r-categoried multinomial population. The statistic used to test this hypothesis is
    χ² = Σ (f - h)²/h,    (1)

with the sum taken over all kr cells,
where f is an observed frequency and h the corresponding hypothetical frequency. For large samples, this statistic follows the χ²-distribution with (k - 1)(r - 1) degrees of freedom.

The sampling experiment of Section 22.2 can be used to verify the distribution of the statistic χ². The multinomial population consists of 4000 colored beads, 40% red, 30% blue, and 30% white. One thousand random samples, each consisting of 20 beads, are drawn from this three-categoried population. The numbers of red, blue, and white beads are recorded for each of the 1000 samples. Then the 1000 samples are organized into 500 pairs of samples. The first and the second samples constitute a pair; the third and the fourth constitute another pair; and so forth. One such pair of samples is shown in Table 22.5a. For each pair of samples, a χ²-value can be computed.

TABLE 22.5a

                 Sample No.        Total        Pooled Relative
Observation      1        2      Frequency        Frequency
Red             10        7         17              .425
Blue             4        8         12              .300
White            6        5         11              .275
Sample Size     20       20         40             1.000
The end result is 500 χ²-values. The purpose of this sampling experiment is to show that the statistic χ² approximately follows the χ²-distribution with (k - 1)(r - 1) or (2 - 1)(3 - 1) or 2 degrees of freedom.
TABLE 22.5b

Sample                  Observed      Hypothetical
No.    Observation    Frequency f    Frequency h    f - h   (f - h)²   (f - h)²/h
1      Red                10             8.5          1.5      2.25       .2647
       Blue                4             6.0         -2.0      4.00       .6667
       White               6             5.5           .5       .25       .0455
2      Red                 7             8.5         -1.5      2.25       .2647
       Blue                8             6.0          2.0      4.00       .6667
       White               5             5.5          -.5       .25       .0455
Total                     40            40.0           .0              χ² = 1.9538
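The computation for one pair of samples can be sketched as follows, using the frequencies of Table 22.5a; the function names are hypothetical, and the second function is the equal-size shortcut given in Equation (2) of this section:

```python
def chi_square_pair(sample1, sample2):
    """Chi-square for a pair of samples of equal size n: the hypothetical
    frequency of each category is n times its pooled relative frequency."""
    n = sum(sample1)                           # both samples are of size n
    totals = [a + b for a, b in zip(sample1, sample2)]
    chi = 0.0
    for obs in (sample1, sample2):
        for f, t in zip(obs, totals):
            h = n * t / (2 * n)                # n times the pooled relative frequency
            chi += (f - h) ** 2 / h
    return chi

def chi_square_shortcut(sample1, sample2):
    # Shortcut: valid only for two samples of the same size
    return sum((a - b) ** 2 / (a + b) for a, b in zip(sample1, sample2))

pair1, pair2 = [10, 4, 6], [7, 8, 5]           # red, blue, white (Table 22.5a)
full = chi_square_pair(pair1, pair2)           # about 1.954
short = chi_square_shortcut(pair1, pair2)      # the same value
```

The two functions agree exactly, which is the point of the shortcut: it avoids computing the hypothetical frequencies altogether.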
The details of computation for the pair of samples of Table 22.5a are shown in Table 22.5b. The hypothetical frequencies given in the table are not n times the π's as in the test of goodness of fit, but n times the pooled relative frequencies of the r categories. The hypothetical frequencies 8.5, 6.0, and 5.5 are equal to 20 times .425, .300, and .275 (Table 22.5a) respectively. Judging from Table 22.5b, it seems extremely tedious to calculate 500 χ²-values. However, a computing shortcut can be used to reduce the amount of work. For the pair of samples given in Table 22.5a,

    χ² = (10 - 7)²/(10 + 7) + (4 - 8)²/(4 + 8) + (6 - 5)²/(6 + 5) = 1.954.    (2)
The numerators are the squares of the differences between the two observed frequencies, and the denominators are the sums of the same two frequencies. The χ²-value obtained this way is the same as that given in Table 22.5b. Even though this shortcut is applicable only to the case of two samples of the same size, it does save a tremendous amount of time for this sampling experiment. After the 500 χ²-values are computed, the empirical distribution may be checked against the theoretical χ²-distribution with 2 degrees of freedom. Out of the 500 χ²-values computed, 27 or 5.4% exceed 5.99, the tabulated 5% point of the χ²-distribution with 2 degrees of freedom. The discrepancy between 5.4% and 5.0% for a sampling experiment involving only 500 χ²-values is not excessive.

22.6 An Example of Test of Independence

The data given in Table 22.6a may be used as an illustration of the application of the test of independence. The table shows the grade distribution of 693 students in a freshman chemistry class. The students are classified by the grades they received and also by the school in which they are registered. Such a two-way frequency table is called a contingency table. Table 22.6a is a 5 x 6 contingency table. The purpose of compiling such data is to determine whether the percentages of students receiving the five different grades vary from school to school of the same college. If the five percentages remain the same for all the six schools, the grade distribution is said to be independent of the schools. In terms of statistics, the 5 grades may be interpreted as the 5 categories of a multinomial population. The 6 schools may be considered as 6 random samples drawn from 6 multinomial populations. The hypothesis is that the 6 populations have the same set of relative frequencies; that is, the grade distribution is independent of the schools. Moreover, another interpretation can be given the data. The 6 schools may be interpreted as the 6 categories of a multinomial population and the 5 grades as 5 samples drawn from the multinomial populations. It does not matter which interpretation is adopted. The final conclusion remains the same.

TABLE 22.6a

                              Grade                          Sample
School            A      B      C      D      F               Size
Agriculture      22     59     49     38     23                191
Engineering      28     66     41     23     17                175
Science           8     11     21     16      7                 63
Home Economics   15     27     16      9      2                 69
Pharmacy          9     34     40     15      6                104
Unclassified      4     20     38     19     10                 91
Total frequency  86    217    205    120     65                693
Relative
frequency   .124098 .313131 .295815 .173160 .093795        .999999

(This table is published with the permission of Mr. Dennis Krzyzan, instructor in Chemistry, South Dakota State College.)

The computing procedure for this set of data is the same as that described for the sampling experiment. The 30 cells of Table 22.6a contain the observed frequencies. The pooled relative frequencies for the 5 grades are given in the bottom of the table. The hypothetical frequencies are equal to n times these pooled relative frequencies. For example, the hypothetical frequencies for the school of agriculture are equal to 191 times the pooled relative frequencies; for the school of engineering, they are equal to 175 times the same set of relative frequencies. The 30 hypothetical frequencies thus computed are given in Table 22.6b. The value of χ² is equal to 47.82, which exceeds 31.41, the 5% point of the χ²-distribution with 20 degrees of freedom. If the 5% significance level is used, the conclusion is that the percentages of the students receiving the various grades are not the same for all six schools; that is, the grade distribution is not independent of the schools.
TABLE 22.6b

                            Observed        Hypothetical
School           Grade    Frequency f       Frequency h
Agriculture        A          22               23.70
                   B          59               59.81
                   C          49               56.50
                   D          38               33.07
                   F          23               17.91
Engineering        A          28               21.72
                   B          66               54.80
                   C          41               51.77
                   D          23               30.30
                   F          17               16.41
Science            A           8                7.82
                   B          11               19.73
                   C          21               18.64
                   D          16               10.91
                   F           7                5.91
Home Economics     A          15                8.56
                   B          27               21.61
                   C          16               20.41
                   D           9               11.95
                   F           2                6.47
Pharmacy           A           9               12.91
                   B          34               32.57
                   C          40               30.76
                   D          15               18.01
                   F           6                9.75
Unclassified       A           4               11.29
                   B          20               28.49
                   C          38               26.92
                   D          19               15.76
                   F          10                8.54
Total                        693              693.00
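The full computation for a contingency table of this kind can be sketched as follows, using the observed frequencies of Table 22.6a; the function name is illustrative:

```python
def chi_square_independence(table):
    """Test of independence for a contingency table: the hypothetical
    frequency of each cell is (row total) x (column total) / (grand total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            h = row_totals[i] * col_totals[j] / grand
            chi += (f - h) ** 2 / h
    return chi

# Table 22.6a: grades A, B, C, D, F for the six schools
grades = [
    [22, 59, 49, 38, 23],   # Agriculture
    [28, 66, 41, 23, 17],   # Engineering
    [ 8, 11, 21, 16,  7],   # Science
    [15, 27, 16,  9,  2],   # Home Economics
    [ 9, 34, 40, 15,  6],   # Pharmacy
    [ 4, 20, 38, 19, 10],   # Unclassified
]
chi = chi_square_independence(grades)   # about 47.82, with 20 d.f.
```

Multiplying each row total by the pooled relative frequencies, as the text does, gives the same hypothetical frequencies as the row-total-times-column-total form used here.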
The t-test described here is applicable to any population instead of to a normal population only.

24.3 Completely Randomized Experiment
To test the hypothesis that the medians of k populations are equal, the binomial transformation may be used. First the median of the pooled observations of the k samples is determined. Then all the observations falling above this pooled median are considered successes and the rest failures. After the transformation, the hypothesis becomes that the k π's are equal (Sections 21.7 and 21.10). Strictly speaking, this hypothesis is not exactly equivalent to the one that the k population medians are equal. The reason is that the observations which are equal to the pooled median of the k samples are classified, with those below the median, as failures. But for practical purposes, these two hypotheses may be considered the same. After the transformation, all the methods dealing with binomial populations may be used. If some of the percentages of successes are too close to 0 or 100, the binomial data can be transformed one step further into normal data by the angular transformation. Through the two successive transformations, a bridge is built between the distribution-free methods and the normal-distribution methods.
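The two steps of the median test can be sketched as follows; the function names are hypothetical, and the χ² in the final call uses the counts of Table 24.3b:

```python
def median_test_counts(samples):
    """Binomial transformation for the k-sample median test: observations
    above the pooled median are successes, the rest failures."""
    pooled = sorted(x for s in samples for x in s)
    m = len(pooled)
    median = (pooled[(m - 1) // 2] + pooled[m // 2]) / 2
    return median, [sum(1 for x in s if x > median) for s in samples]

def chi_square_k_proportions(successes, sizes):
    # Among-sample SS divided by p(1 - p), as in Equation 5 of Section 21.7
    T, N = sum(successes), sum(sizes)
    ss = sum(t * t / n for t, n in zip(successes, sizes)) - T * T / N
    p = T / N
    return ss / (p * (1 - p))

# Table 24.3b: 12, 9, and 5 of the 20 observations per sample lie above
# the pooled median of the three samples
chi = chi_square_k_proportions([12, 9, 5], [20, 20, 20])  # about 5.02, 2 d.f.
```

With k samples the statistic has k - 1 degrees of freedom, here 2.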
TABLE 24.3a
[Three samples (treatments 1, 2, and 3) of 20 observations each; the column alignment of the 60 observations is not recoverable from the scan. Their pooled median is 56.]

TABLE 24.3b

                                      Treatment
Frequency                        1      2      3     Total
Greater than median             12      9      5      26
Less than or equal to median     8     11     15      34
Sample size                     20     20     20      60
The interpretation of the data is not at all complicated by these transformations. A treatment which has a large average angle also has a large percentage of observations above the pooled median. This fact, in turn, implies that the treatment has a large median.

As an illustration of the binomial transformation, the data of Table 24.3a may be used. The three treatments given in the table are actually three random samples, each consisting of 20 observations, drawn from a normal population with the lower half of the distribution cut off. Therefore, the population from which these samples are drawn is decidedly not normal and not even symmetrical. Now suppose that the source of these samples is unknown. The hypothesis that the three population medians are equal is tested. The first step is to rearrange the observations according to their magnitudes and to determine the pooled median of the three samples. Among the 60 observations given in Table 24.3a, 26 are above 56; 6 are equal to 56; 28 are below 56. Therefore, the pooled median is 56. The next step is to count the number of observations above 56 for each of the three samples. These frequencies are shown in Table 24.3b. The among-sample SS is (Equation 5, Section 21.7)

    (12)²/20 + (9)²/20 + (5)²/20 - (26)²/60 = 1.2333;    (1)

the value of p(1 - p) is equal to

    (26/60)(34/60) = 0.2456;    (2)
474
DISTRIBUTIONFREE METHODS
Ch.24
the value of χ² is

    χ² = 1.2333/.2456 = 5.02,    (3)
with 2 degrees of freedom. If the 5% significance level is used, the conclusion is that no difference has been detected among the three population medians, because 5.02 is less than 5.99, the 5% point of the χ²-distribution with 2 degrees of freedom. Of course, the hypothesis being tested here is a general one. For more specific hypotheses, the individual degrees of freedom and linear regression may be used. Since the detailed description of these methods is given in Section 21.9, it is omitted here.

The binomial data can be transformed further, if necessary, into normal data by the angular transformation. The samples given in Table 24.3a are drawn from the same population. Consequently, the percentages of successes, or the percentages of the observations exceeding the pooled median, are all fairly close to 50. If the samples were drawn from different populations, some of the percentages of successes might be close to 0 or 100. Should this occur, the angular transformation might be used. The advantage of such an additional transformation is that it enables one to take advantage of the numerous methods developed for normal populations, including the multiple range test (Section 15.5).

The method presented in this section is basically the binomial transformation. Therefore, in using this method, large samples are required. The sizes of the k samples should all be equal to or larger than 10 (Section 21.3).

24.4 Randomized Block Experiment

The normal score transformation (Section 23.5) may be used in a randomized block experiment with k treatments and n replications. The observations in a replication may be ranked according to their magnitude and subsequently replaced by the corresponding normal scores. Then the analysis of variance may be used on the transformed observations. The procedure of computation may be illustrated by the data given in Table 24.4a.
The original observations of the 5 treatments and 10 replications are shown in the upper half of the table, while the transformed ones are shown in the lower half. The details of the analysis of variance are given in Table 24.4b. It should be noted that the median of every replication is replaced by 0. A positive or negative normal score indicates that the original observation is above or below its replication median. Therefore, the hypothesis being tested is that the medians of the k treatments are equal.
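The ranking step can be sketched as follows, assuming the normal scores -1.16, -0.50, 0.00, 0.50, and 1.16 for five observations (the values that appear in the transformed half of Table 24.4a); the function name and the example replication are illustrative:

```python
# Normal scores for a replication of five observations; the values below are
# assumed from the transformed half of Table 24.4a (the book's Table 11 gives
# the general table of normal scores).
NORMAL_SCORES_5 = [-1.16, -0.50, 0.00, 0.50, 1.16]

def normal_score_transform(replication):
    """Replace each observation in a replication by the normal score
    of its rank within that replication."""
    order = sorted(range(len(replication)), key=lambda i: replication[i])
    scores = [0.0] * len(replication)
    for rank, i in enumerate(order):
        scores[i] = NORMAL_SCORES_5[rank]
    return scores

# One replication of five observations (illustrative values)
scores = normal_score_transform([46, 50, 69, 48, 44])
```

The median observation of each replication receives the score 0, so every replication total is 0 after the transformation.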
TABLE 24.4a
[Original observations (upper half) and their normal-score transforms (lower half) for the 5 treatments and 10 replications; the cell alignment is not recoverable from the scan. The treatment totals of the normal scores are, apart from signs lost in the scan, 5.14, 1.82, 2.66, 3.98, and 0.32.]
TABLE 24.4b

Preliminary Calculations
(1)              (2)          (3)           (4)              (5)
Type of          Total of     No. of Items  Observations     Total of Squares
Total            Squares      Squared       per Squared      per Observation
                                            Item             (2) / (4)
Treatment        52.7504          5             10               5.27504
Observation      31.9120         50              1              31.91200

Analysis of Variance
Source of        Sum of       Degrees of    Mean
Variation        Squares      Freedom       Square         F
Treatment         5.27504         4         1.3188        1.78
Error            26.63696        36          .7399
Total            31.91200        40
The positive and negative signs of the normal scores also suggest that the binomial transformation may be used. Indeed, if the numbers of replications and treatments are both large, the binomial transformation is an alternative to the normal score transformation. The observations which exceed their respective replication medians are considered successes; those which do not are considered failures. Then, for each of the k treatments, the numbers of successes and failures can be obtained. Such a 2 x 5 contingency table, which is obtained from the data of Table 24.4a, is given in Table 24.4c.

TABLE 24.4c

                                        Treatment
Frequency                        1     2     3     4     5    Total
Greater than median              2     3     6     5     4     20
Less than or equal to median     8     7     4     5     6     30
No. of replications             10    10    10    10    10     50

The value of χ² is (Section 21.7)

    χ² = [(2² + 3² + 6² + 5² + 4²)/10 - (20)²/50] / [(20/50)(30/50)] = 1.00/.24 = 4.17,    (1)

with 4 degrees of freedom. The binomial transformation for a randomized block experiment can be used only if both k and n are large. If k, the number of treatments, is small, the χ²-value needs to be corrected. The corrected value is

    χ²c = [(k - 1)/k] χ².    (2)

In terms of the example under consideration, the corrected χ² is

    χ²c = (4/5)(4.17) = 3.34,    (3)

with 4 degrees of freedom. This correction term originated from the relation between the chi-square test of independence and the analysis of variance (Section 21.8). The χ²-value for a completely randomized experiment is approximately equal to the treatment SS divided by the total mean square and exactly equal to the treatment SS divided by p(1 - p) (Table 21.10). The total mean square is the total SS divided by kn - 1, and p(1 - p) is the same SS divided by kn, the total number of observations (Section 21.8). Since the ratio of kn - 1 to kn is almost equal to 1 for large values of kn, the conclusions reached through the analysis of variance and the χ²-test are almost always the same.
TABLE 24.4d
[The transformed 0-and-1 observations for the 5 treatments and 10 replications; the cell alignment is not recoverable from the scan. The treatment totals are 2, 3, 6, 5, and 4, every replication total is 2, and the grand total is 20.]

TABLE 24.4e

Preliminary Calculations
(1)              (2)          (3)           (4)              (5)
Type of          Total of     No. of Items  Observations     Total of Squares
Total            Squares      Squared       per Squared      per Observation
                                            Item             (2) / (4)
Grand              400            1             50               8
Replication         40           10              5               8
Treatment           90            5             10               9
Observation         20           50              1              20

Analysis of Variance
Source of        Sum of         DF          Mean
Variation        Squares                    Square
Replication          0            0
Treatment            1            4
Error               11           36
Total               12           40          .30
p(1 - p)            12           50          .24

    χ² = 4.1667        χ²c = 3.3333
However, with the randomized block experiment, the situation is different. The difference can be shown through the example given in Table 24.4a. Basically, what the binomial transformation does is to replace all observations above their respective replication medians by 1 and the others by 0. The data thus transformed are given in Table 24.4d. The analysis of variance of these transformed observations is given in Table 24.4e. Since the replication totals are always the same, the replication SS is always equal to zero. As a result, the 9 or n - 1 degrees of freedom for replication disappear completely. Consequently, the number of degrees of freedom for the total SS is correspondingly reduced from 49 or kn - 1 to 40 or (k - 1)n. The uncorrected chi square χ² is the treatment SS divided by p(1 - p), which is the total SS divided by 50 or kn. The corrected chi square χ²c is the treatment SS divided by the total mean square. Therefore, the correction term is simply the ratio of the total number of degrees of freedom to the total number of observations; that is,

    χ²c / χ² = (k - 1)n / kn = (k - 1)/k.

As long as k, the number of treatments, is large, this correction need not be used. However, when the number of treatments is small, such as k = 2, this correction is absolutely essential (Section 24.5).
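The computation of Equations (1) through (3) can be sketched as follows; the function name is hypothetical, and the counts are those of Table 24.4c:

```python
def corrected_chi_square(successes, n):
    """Chi-square for a randomized block experiment after the binomial
    transformation, with the (k - 1)/k correction of Equation (2)."""
    k = len(successes)
    T = sum(successes)
    N = k * n
    ss = sum(t * t / n for t in successes) - T * T / N  # treatment SS
    p = T / N
    chi = ss / (p * (1 - p))            # uncorrected, Equation (1)
    return chi, (k - 1) / k * chi       # corrected, Equation (3)

# Table 24.4c: 2, 3, 6, 5, and 4 of the 10 observations per treatment
# exceed their replication medians
chi, chi_c = corrected_chi_square([2, 3, 6, 5, 4], 10)
```

Both values carry k - 1 = 4 degrees of freedom; only the divisor changes.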
24.5 Sign Test

The sign test is a method which deals with the randomized block experiment with 2 treatments and n replications (Sections 8.7 and 14.6). Since the general method for the randomized block experiment is presented in the preceding section, there is little need to emphasize the two-treatment case. However, the computing method for this case is simplified to such an extent that the sign test deserves some special attention. The use of the sign test may be illustrated by the data of a randomized block experiment with 2 treatments and 12 replications given in Table 24.5a.

TABLE 24.5a

          Treatment     Sign of                Treatment     Sign of
Rep. No.   1     2     Difference   Rep. No.    1     2     Difference
   1      18    37         -           7       30    86         -
   2       4    56         -           8       84    81         +
   3      18    52         -           9       75    66         +
   4      76    25         +          10       41     5         +
   5       1    71         -          11       51    91         -
   6      35    83         -          12       50    75         -

A plus or minus sign is given to each of the 12 replications, depending on whether the observation of the first treatment is greater or less than the observation of the second treatment. If there is no difference between the effects of the two treatments, there should be approximately the same number of plus and minus signs. If the effect of the first treatment is greater than that of the second treatment, there is an excess of plus signs; otherwise, a deficit in plus signs. Therefore, the hypothesis that the two treatment effects are equal is the same as the hypothesis that the relative frequency of plus signs is equal to 0.5, or π = 0.5. Here
TABLE 24.5b

                         Treatment
Frequency               1        2       Total
Greater than median     4        8        12
Less than median        8        4        12
No. of replications    12       12        24
again, a distribution-free method is essentially the binomial transformation. To test the hypothesis that π = 0.5, either the u-test or the χ²-test (Section 21.4) may be used, provided that the number of replications is greater than or equal to 10 (Section 21.3). For the example given in Table 24.5a, there are 4 plus signs and 8 minus signs. By the test of goodness of fit, the two hypothetical frequencies are both equal to nπ or n/2 or 6. Therefore,

    χ² = (4 - 6)²/6 + (8 - 6)²/6 = 8/6 = 1.33,    (1)

with 1 degree of freedom. By the u-test, the statistic is

    u = (T - nπ)/√(nπ(1 - π)) = (4 - 6)/√(12(.5)(.5)) = -2/√3 = -1.15.    (2)
Of course, it is expected that u² = χ² (Theorem 7.6). The u-test is a two-tailed test; the χ²-test is a one-tailed test. If the 5% significance level is used, the conclusion is that no difference between the two treatment effects is detected, because 1.33 is less than 3.84 and -1.15 is greater than -1.96.

Strictly speaking, the sign test is applicable only to the case in which all the n signs are either positive or negative. But in practice the two observations of a replication are sometimes equal. When this occurs, such a replication may be excluded from the test.

The χ²-value of the sign test is exactly the corrected chi square for the randomized block experiment with 2 treatments and n replications of the preceding section. This relation can be shown through the example given in Table 24.5a. The 2 x 2 contingency table showing the numbers of observations greater than or less than their respective replication medians is given in Table 24.5b. In testing the hypothesis that the two treatment medians are equal,
    χ² = [(4)²/12 + (8)²/12 - (12)²/24] / [(12/24)(12/24)] = 8/3.    (3)
Since the correction term is equal to (k - 1)/k or 1/2, the corrected chi square is

    χ²c = (1/2)(8/3) = 4/3 = 1.33,    (4)

which is the same value given in Equation (1).
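The whole procedure can be sketched as follows; the function name is hypothetical, and the pairs are the observations of Table 24.5a:

```python
def sign_test(pairs):
    """Sign test for a randomized block experiment with 2 treatments.
    Tied replications are excluded, as the section recommends."""
    diffs = [a - b for a, b in pairs if a != b]
    n = len(diffs)
    T = sum(1 for d in diffs if d > 0)           # number of plus signs
    chi = (2 * T - n) ** 2 / n                   # Equation (5), 1 d.f.
    u = (T - n * 0.5) / (n * 0.25) ** 0.5        # u-test form
    return T, n, chi, u

# Table 24.5a: treatment 1 versus treatment 2 in 12 replications
pairs = [(18, 37), (4, 56), (18, 52), (76, 25), (1, 71), (35, 83),
         (30, 86), (84, 81), (75, 66), (41, 5), (51, 91), (50, 75)]
T, n, chi, u = sign_test(pairs)   # 4 plus signs; chi about 1.33, u about -1.15
```

The only data the test ever uses are the n signs, which is what makes the computation so simple.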
This relation can also be shown algebraically. The median of a replication is the average of the two observations in that replication. A plus sign implies that the first observation is greater than the second one in that replication (Table 24.5a). Therefore, the number of plus signs is the number of observations greater than their replication medians for the first treatment and also the number of observations less than their replication medians for the second treatment. Therefore, the 2 x 2 contingency table is as follows:

                         Treatment
                        1         2       Total
Greater than median     T       n - T       n
Less than median      n - T       T         n
Total                   n         n         2n

The letter T in the above table is the number of plus signs. By the sign test,

    χ² = u² = (T - n/2)² / [n(.5)(.5)] = (2T - n)²/n.    (5)

By the method for the randomized block experiment,

    χ²c = (1/2) [T²/n + (n - T)²/n - n/2] / [(1/2)(1/2)],    (6)

which can be reduced to the same expression given in Equation (5).

The sign test is also closely related to the normal score transformation. By the normal score transformation, the larger one of the two observations in a replication is replaced by 0.56 (Table 11, Appendix) and the smaller one is replaced by -0.56. Then the t-test may be used on the n differences to test the hypothesis that the population mean of the differences is equal to zero (Section 8.7); or the transformed scores may be analyzed as a randomized block experiment with 2 treatments and n replications. The two methods are equivalent (Section 14.6). For the example of Table 24.5a,
    t = 1.1726,    (7)

with 11 degrees of freedom;

    F = 1.3750,    (8)

with 1 and 11 degrees of freedom. The numerical values of t and F are slightly greater than those of u and χ² of the sign test (Equations 1 and 2). But the conclusions reached through the sign test and the normal score transformation are usually the same.

The similarity of the sign test and the normal score transformation stems from the fact that the normal score transformation also amounts to the binomial transformation when the number of treatments is two. If the observation in the first treatment is greater than that in the second treatment, the difference between the two normal scores is 0.56 - (-0.56) or 2(.56). Similarly, if the first observation is less than the second one, the difference is -2(.56). So long as there are no ties, the difference between a pair of normal scores can only assume the value of either 2(.56) or -2(.56). Therefore, the normal score transformation, like the sign test, also amounts to the binomial transformation. The difference between the sign test and the normal score transformation stems from the fact that the sign test uses the hypothetical population variance π(1 - π) or (.5)(.5) or 0.25 as the divisor in obtaining the χ²-value, while the normal score transformation uses the estimated variance

    s² = (T/n) · (n - T)/(n - 1)    (9)

as the divisor. The letter T in the above equation is the number of plus signs and n - T is the number of minus signs in the n replications. If the factor (n - 1) in the above equation were replaced by n, s² would be equal to p(1 - p). This is another example showing the relation between the analysis of variance and the χ²-test (Section 21.8). In general, the ratio of the χ²-value of the sign test to the F-value of the normal score transformation is equal to the ratio of s² to 0.25; that is,

    χ²/F = s²/0.25.

For the example of Table 24.5a,

    s² = (4/12)(8/11) = .242424.

The ratio χ²/F = 1.3333/1.3750 = .9697, while the ratio s²/0.25 is also equal to .9697. In view of the fact that the various distribution-free methods dealing with the randomized block experiment with 2 treatments and n replications are
practically alike, it does not make much difference which method is used. However, because of its simplicity in computation, the sign test deserves some special consideration. During recent years, it has rightfully become one of the most commonly used statistical methods. The purpose of this section is not only to introduce the sign test, but also to show that this simple method is as good as other seemingly more refined methods.

24.6 Remarks

No examples of applications are given in this chapter, because the distribution-free methods presented here are applied to the same types of problems for which the analysis of variance is used. The only difference is that the distribution-free methods are used to deal with populations which are far from being normal.

In this chapter the distribution-free methods are explained in an unorthodox fashion, in terms of transformations and the analysis of variance. The main purpose of developing the distribution-free methods is to free statistics from the shackles of the normal populations. Transformations and the analysis of variance are not ordinarily considered indigenous to the subject of distribution-free methods. Yet, because both techniques are presented in the earlier portion of this book, it is expedient to use them in explaining the distribution-free methods. Moreover, as a by-product, this method of presentation bridges the gap between distribution-free methods and normal-distribution methods.

All methods presented in this chapter are called large-sample methods. Before one can make use of these methods, he must have large samples. The minimum sample sizes associated with the various methods are derived from the working rule given in Section 21.3.

Computation in the distribution-free methods often appears to be simple. Actually, what is simple and what is complicated depend entirely upon what tools are available.
If only pencil and paper are available, the distribution-free methods are decidedly simpler than the analysis of variance. On the other hand, if desk calculators are available, the analysis of variance is actually much simpler than the distribution-free methods. The determination of the pooled median for k samples is a tedious job when the sample sizes are large. But if more elaborate tools, such as punch card machines, are available, the distribution-free methods become simpler than the analysis of variance. These machines enable one to rearrange and classify the observations and tabulate the frequencies very efficiently, while the analysis of variance is a little more time-consuming because more complicated computations are required. Obviously, therefore, it is difficult to evaluate a statistical method as to its simplicity without specifying what tools are available.
In general, a method considered simple usually implies that it is simple when no tools other than pencil and paper are used.

The distribution-free methods can be used on any kind of population. By contrast, the normal-distribution methods, such as the analysis of variance, seem to have very limited use. The distribution-free methods appear to be able to replace the normal-distribution methods completely. However, in reality, this is not true. By analogy, the normal-distribution methods are like suits made to fit a tailor's dummy. People who are too fat or too thin cannot wear these suits. Thus the transformations are devices for changing people's weight to fit them into the ready-made suits. Besides, even without transformations, these suits are flexible enough to fit many people. They are not so rigid as they were once thought to be (Section 12.7). As a result, a large number of people can wear them. In terms of the analogy, then, the distribution-free methods are like loose coats. They are made so big that they can cover anybody but fit nobody in particular. Choosing the more desirable kind of clothes depends very much on a man's size. Therefore, in using the distribution-free methods, one may have something to gain and also something to lose, depending on what kind of population he is dealing with. If the normal-distribution methods are used on populations far from being normal, the significance level may be seriously disturbed. If the population is nearly normal, the distribution-free methods are less powerful than the normal-distribution methods.
EXERCISES

(1) For the following sets of data, test the hypothesis that the median of the population is equal to the given value:
    (a) Exercise 1, Chapter 8.
    (b) Exercise 2, Chapter 8.
(2) Use the sign test on the following exercises of Chapter 8:
    (a) Exercise 3.
    (b) Exercise 6.
    (c) Exercise 7.
    (d) Exercise 8.
    (e) Exercise 9.
(3) Do the following exercises of Chapter 12 by the distribution-free method given in Section 24.3:
    (a) Exercise 2.
    (b) Exercise 3.
    (c) Exercise 4.
    (d) Exercise 8.
    (e) Exercise 9.
    (f) Exercise 10.
    (g) Exercise 11.
    (h) Exercise 12.
    (i) Exercise 13.
    (j) Exercise 14.
    (k) Exercise 15.
    (l) Exercise 16.
(4) Do the following exercises of Chapter 14 by the distribution-free method (based on the median) given in Section 24.4:
    (a) Exercise 2.
    (b) Exercise 3.
    (c) Exercise 4.
    (d) Exercise 8.
    (e) Exercise 10.
    (f) Exercise 11.
(5) Repeat Exercise 4 by the normal score transformation.
QUESTIONS

(1) What is a distribution-free method?
(2) What is a median?
(3) The mean and median of a normal population are both equal to μ. Why is the sample mean more desirable than the sample median as an estimate of μ?
(4) Practically all the methods presented in this chapter are based on a particular transformation. What is it?
(5) What are the advantages and disadvantages of the distribution-free methods as compared with the normal-distribution methods?
REFERENCES

Dixon, W. J. and Mood, A. M.: "The Statistical Sign Test," Journal of the American Statistical Association, Vol. 41 (1946), pp. 557-566.
Dixon, W. J.: "Power Functions of the Sign Test and Power Efficiency for Normal Alternatives," Annals of Mathematical Statistics, Vol. 24 (1953), pp. 467-473.
Mood, A. M.: Introduction to the Theory of Statistics, McGraw-Hill Book Company, New York, 1950.
APPENDIX
TABLE 1*  Table of Random Normal Numbers with Mean Equal to 50 and Variance Equal to 100

[The body of this table (ten columns of two-digit random normal deviates, spanning pages 487 to 505 of the original) is not legible in this scan and is omitted.]

*This table is derived, with the permission of Professor E. S. Pearson, from Tracts for Computers, No. XXV, Department of Statistics, University College, University of London.
TABLE 2*  Table of Random Sampling Numbers

[The body of this table (blocks of random digits labeled FIRST THOUSAND through TENTH THOUSAND, spanning pages 507 to 516 of the original) is not legible in this scan and is omitted.]

*This table is reproduced with the permission of Professor E. S. Pearson from Tracts for Computers, No. XXIV, Department of Statistics, University College, University of London.
TABLE 3  Area Under the Normal Curve

   u     Area        u     Area        u     Area
 -3.0   .0013      -1.0   .1587       1.0   .8413
 -2.9   .0019       -.9   .1841       1.1   .8643
 -2.8   .0026       -.8   .2119       1.2   .8849
 -2.7   .0035       -.7   .2420       1.3   .9032
 -2.6   .0047       -.6   .2743       1.4   .9192
 -2.5   .0062       -.5   .3085       1.5   .9332
 -2.4   .0082       -.4   .3446       1.6   .9452
 -2.3   .0107       -.3   .3821       1.7   .9554
 -2.2   .0139       -.2   .4207       1.8   .9641
 -2.1   .0179       -.1   .4602       1.9   .9713
 -2.0   .0228        0    .5000       2.0   .9772
 -1.9   .0287        .1   .5398       2.1   .9821
 -1.8   .0359        .2   .5793       2.2   .9861
 -1.7   .0446        .3   .6179       2.3   .9893
 -1.6   .0548        .4   .6554       2.4   .9918
 -1.5   .0668        .5   .6915       2.5   .9938
 -1.4   .0808        .6   .7257       2.6   .9953
 -1.3   .0968        .7   .7580       2.7   .9965
 -1.2   .1151        .8   .7881       2.8   .9974
 -1.1   .1357        .9   .8159       2.9   .9981
                                      3.0   .9987

      u       Area          u       Area
  -2.5758    .005        2.5758    .995
  -1.9600    .025        1.9600    .975
  -1.6449    .050        1.6449    .950
   -.6745    .250         .6745    .750
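The entries of Table 3 are values of the cumulative normal distribution, and a modern reader can regenerate them from the error function. The sketch below is an illustration (the function is our own, not part of the text) using only the Python standard library:

```python
from math import erf, sqrt

def normal_area(u):
    """Area under the standard normal curve to the left of u, as in Table 3.

    Uses the identity  Phi(u) = (1 + erf(u / sqrt(2))) / 2.
    """
    return 0.5 * (1 + erf(u / sqrt(2)))
```

For instance, normal_area(1.9600) is approximately .9750, agreeing with the lower panel of the table, and normal_area(1.0) reproduces the tabled .8413.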
TABLE 4*  Percentage Points of the χ²-Distribution

 ν (d.f.)      99.5%           97.5%           5%        2.5%       1%        .5%
    1     392704 × 10⁻¹⁰  982069 × 10⁻⁹    3.84146    5.02389    6.63490    7.87944
    2       0.0100251       0.0506356      5.99147    7.37776    9.21034   10.5966
    3       0.0717212       0.215795       7.81473    9.34840   11.3449    12.8381
    4       0.206990        0.484419       9.48773   11.1433    13.2767    14.8602
    5       0.411740        0.831211      11.0705    12.8325    15.0863    16.7496
    6       0.675727        1.237347      12.5916    14.4494    16.8119    18.5476
    7       0.989265        1.68987       14.0671    16.0128    18.4753    20.2777
    8       1.344419        2.17973       15.5073    17.5346    20.0902    21.9550
    9       1.734926        2.70039       16.9190    19.0228    21.6660    23.5893
   10       2.15585         3.24697       18.3070    20.4831    23.2093    25.1882
   11       2.60321         3.81575       19.6751    21.9200    24.7250    26.7569
   12       3.07382         4.40379       21.0261    23.3367    26.2170    28.2995
   13       3.56503         5.00874       22.3621    24.7356    27.6883    29.8194
   14       4.07468         5.62872       23.6848    26.1190    29.1413    31.3193
   15       4.60094         6.26214       24.9958    27.4884    30.5779    32.8013
   16       5.14224         6.90766       26.2962    28.8454    31.9999    34.2672
   17       5.69724         7.56418       27.5871    30.1910    33.4087    35.7185
   18       6.26481         8.23075       28.8693    31.5264    34.8053    37.1564
   19       6.84398         8.90655       30.1435    32.8523    36.1908    38.5822
   20       7.43386         9.59083       31.4104    34.1696    37.5662    39.9968
   21       8.03366        10.28293       32.6705    35.4789    38.9321    41.4010
   22       8.64272        10.9823        33.9244    36.7807    40.2894    42.7956
   23       9.26042        11.6885        35.1725    38.0757    41.6384    44.1813
   24       9.88623        12.4011        36.4151    39.3641    42.9798    45.5585
   25      10.5197         13.1197        37.6525    40.6465    44.3141    46.9278
   26      11.1603         13.8439        38.8852    41.9232    45.6417    48.2899
   27      11.8076         14.5733        40.1133    43.1944    46.9630    49.6449
   28      12.4613         15.3079        41.3372    44.4607    48.2782    50.9933
   29      13.1211         16.0471        42.5569    45.7222    49.5879    52.3356
   30      13.7867         16.7908        43.7729    46.9792    50.8922    53.6720
   40      20.7065         24.4331        55.7585    59.3417    63.6907    66.7659
   50      27.9907         32.3574        67.5048    71.4202    76.1539    79.4900
   60      35.5346         40.4817        79.0819    83.2976    88.3794    91.9517
   70      43.2752         48.7576        90.5312    95.0231   100.425    104.215
   80      51.1720         57.1532       101.879    106.629    112.329    116.321
   90      59.1963         65.6466       113.145    118.136    124.116    128.299
  100      67.3276         74.2219       124.342    129.561    135.807    140.169

*This table is reproduced with the permission of Professor E. S. Pearson from Biometrika, vol. 32, pp. 188-189.
TABLE 5*  Percentage Points of χ²/ν

 ν (d.f.)      99.5%           97.5%           5%        2.5%       1%        .5%
    1     392704 × 10⁻¹⁰  982069 × 10⁻⁹    3.84146    5.02389    6.63490    7.87944
    2       0.0050126       0.0253178      2.99574    3.68888    4.60517    5.29830
    3       0.0239071       0.0719317      2.60491    3.11613    3.78163    4.27937
    4       0.0517475       0.1211048      2.37193    2.78583    3.31918    3.71505
    5       0.0823480       0.1662422      2.21410    2.56650    3.01726    3.34992
    6       0.1126212       0.2062245      2.09860    2.40823    2.80198    3.09127
    7       0.1413236       0.2414100      2.00959    2.28754    2.63933    2.89681
    8       0.1680524       0.2724663      1.93841    2.19183    2.51128    2.74438
    9       0.1927696       0.3000433      1.87989    2.11364    2.40733    2.62103
   10       0.2155850       0.3246970      1.83070    2.04831    2.32093    2.51882
   11       0.2366555       0.3468864      1.78865    1.99273    2.24773    2.43245
   12       0.2561517       0.3669825      1.75218    1.94473    2.18475    2.35829
   13       0.2742331       0.3852877      1.72016    1.90274    2.12987    2.29380
   14       0.2910486       0.4020514      1.69177    1.86564    2.08152    2.23709
   15       0.3067293       0.4174760      1.66639    1.83256    2.03853    2.18675
   16       0.3213900       0.4317288      1.64351    1.80284    1.99999    2.14170
   17       0.3351318       0.4449518      1.62277    1.77594    1.96522    2.10109
   18       0.3480450       0.4572639      1.60385    1.75147    1.93363    2.06424
   19       0.3602095       0.4687658      1.58650    1.72907    1.90478    2.03064
   20       0.3716930       0.4795415      1.57052    1.70848    1.87831    1.99984
   21       0.3825552       0.4896633      1.55574    1.68947    1.85391    1.97148
   22       0.3928509       0.4991955      1.54202    1.67185    1.83134    1.94525
   23       0.4026270       0.5081957      1.52924    1.65547    1.81037    1.92093
   24       0.4119263       0.5167125      1.51730    1.64017    1.79083    1.89827
   25       0.4207880       0.5247880      1.50610    1.62586    1.77256    1.87711
   26       0.4292423       0.5324577      1.49558    1.61243    1.75545    1.85730
   27       0.4373185       0.5397519      1.48568    1.59979    1.73937    1.83870
   28       0.4450464       0.5467107      1.47633    1.58788    1.72422    1.82119
   29       0.4524517       0.5533483      1.46748    1.57663    1.70993    1.80468
   30       0.4595567       0.5596933      1.45910    1.56597    1.69641    1.78907
   40       0.5176625       0.6108275      1.39396    1.48354    1.59227    1.66915
   50       0.5598140       0.6471480      1.35010    1.42840    1.52308    1.58980
   60       0.5922433       0.6747950      1.31803    1.38829    1.47299    1.53253
   70       0.6182171       0.6965371      1.29330    1.35747    1.43464    1.48879
   80       0.6396500       0.7144150      1.27349    1.33276    1.40411    1.45401
   90       0.6577367       0.7294067      1.25717    1.31262    1.37907    1.42554
  100       0.6732760       0.7422190      1.24342    1.29561    1.35807    1.40169
    ∞       1.0000000       1.0000000      1.00000    1.00000    1.00000    1.00000

*This table is obtained from Table 4.
TABLE 6*  Percentage Points of the t-Distribution

 ν (d.f.)    5%      2.5%      1%      .5%
    1      6.314   12.706   31.821   63.657
    2      2.920    4.303    6.965    9.925
    3      2.353    3.182    4.541    5.841
    4      2.132    2.776    3.747    4.604
    5      2.015    2.571    3.365    4.032
    6      1.943    2.447    3.143    3.707
    7      1.895    2.365    2.998    3.499
    8      1.860    2.306    2.896    3.355
    9      1.833    2.262    2.821    3.250
   10      1.812    2.228    2.764    3.169
   11      1.796    2.201    2.718    3.106
   12      1.782    2.179    2.681    3.055
   13      1.771    2.160    2.650    3.012
   14      1.761    2.145    2.624    2.977
   15      1.753    2.131    2.602    2.947
   16      1.746    2.120    2.583    2.921
   17      1.740    2.110    2.567    2.898
   18      1.734    2.101    2.552    2.878
   19      1.729    2.093    2.539    2.861
   20      1.725    2.086    2.528    2.845
   21      1.721    2.080    2.518    2.831
   22      1.717    2.074    2.508    2.819
   23      1.714    2.069    2.500    2.807
   24      1.711    2.064    2.492    2.797
   25      1.708    2.060    2.485    2.787
   26      1.706    2.056    2.479    2.779
   27      1.703    2.052    2.473    2.771
   28      1.701    2.048    2.467    2.763
   29      1.699    2.045    2.462    2.756
   30      1.697    2.042    2.457    2.750
   40      1.684    2.021    2.423    2.704
   60      1.671    2.000    2.390    2.660
  120      1.658    1.980    2.358    2.617
    ∞      1.645    1.960    2.326    2.576

*This table is reproduced with the permission of Professor E. S. Pearson from Biometrika, vol. 32, p. 311.
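The ν = ∞ row of Table 6 reproduces the normal deviates of Table 3, since the t-distribution approaches the normal distribution as the degrees of freedom grow. As an illustrative check (the function is our own, not the book's), the normal percentage point can be recovered by bisection on the cumulative normal curve:

```python
from math import erf, sqrt

def normal_point(p, lo=-10.0, hi=10.0):
    """Find u such that the area to the left of u equals p, by bisection.

    The cumulative area 0.5 * (1 + erf(u / sqrt(2))) is increasing in u,
    so repeatedly halving the bracket [lo, hi] converges to the answer.
    """
    for _ in range(60):
        mid = (lo + hi) / 2
        if 0.5 * (1 + erf(mid / sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Thus normal_point(.975) gives about 1.960 and normal_point(.995) about 2.576, agreeing with the 2.5% and .5% entries of the ν = ∞ row.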
TABLE 7a*  5% Points of the F-Distribution

TABLE 7b*  2.5% Points of the F-Distribution

[The bodies of these tables (degrees of freedom for the numerator across the top, degrees of freedom for the denominator down the side) are not legible in this scan and are omitted.]

*These tables are reproduced with the permission of Professor E. S. Pearson from Biometrika, vol. 33.
5 6 7 8 9
1 2 3 4
i~
va
II
2.5437 2.4935 2.4499 2.4117 2.3779 2.3479 2.3210 2.2967 2.2747 2.2547 2.2365 2.2197 2.2043 2.1900 2.1768 2.1646 2.0772 1.9926 1.!»105 1.8307
_ _ _ _ _ L_
2.0921 2.0035 1.9174 1.8337 1.7522
2.1649 2.1479 2.1323 2.1179 2.1045
2.2'Z76 2.2504 2.2258 2.2036 2.1834
I
I
2.2033 2.1757 2.1508 2.1282 2.1077 2.0889 2.0716 2.0558 2.0411 2.0275 2.0148 1.9245 1.8364 L7505 1.6664
2.8450 2.7186 2.6169 2.5331 2.4630 2.4035 2.3522 2.3077 2.2686 2.2341
2.9130 2.7876 2.6866 2.6037 2.5342 2.4753 2.4247 2.3807 2.3421 2.3080
I
4.6188 3.9381 3.5108 3.2184 3.0061
4.6777 3.9999 3.5747 3.2840 3.0729
4.7351 4.0600 3.6365 3.3472 3.1373 2.9782 2.8536 2.7534 2.6710 2.6021
245.95 19.429 8.7029 5.8578
15
243.91 19.413 8.7446 5.9117
12
241.88 19.396 8.7855 5.9644
10
L9317 1.8389 1.7480 1.6587 1.5705
2.0075 1.9898 1.9736 1.9586 1.9446
2.1242 2.0960 2.0707 2.0476 2.0267
2.3275 2.2756 2.2304 2.1906 2.1555
2.7740 2.6464 2.5436 2.4589 2.3879
4.5581 3.8742 3.4445 3.1503 2.9365
248.01 19.446 8.6602 5.8025
20
1.8874 L7929 L7001 L6084 1.5173
2.0825 2.0540 2.0283 2.0050 L9838 L9643 L9464 1.9299 1.9147 1.9005
2.7372 2.6090 2.5055 2.4202 2.3487 2.2878 2.2354 2.1898 2.1497 2.1141
4.5272 3.8415 3.4105 3.1152 2.9005
249.05 19.454 8.6385 5.7744
24
L8409 L7444 L6491 1.5543 L4591
1.9192 L9010 1.8842 L8687 L8543
2.0391 2.0102 L9842 1.9605 L9390
2.2468 .2.1938 2.1477 2.1071 2.0712
2.6996 2.5705 2.4663 2.3803 2.3082
4.4957 3.8082 3.3758 3.0794 2.8637
250.09 19.462 8.6166 5.7459
30
De,ree. of Freedom for Numer.tor
5" Point. of the FDimibutioa
TABLE 7. (eonalllleel)
1.7918 L6928 L5943 1.4952 1.3940
LB718 L8533 L8361 L8203 1.8055
L9938 L9645 L9380 1.9139 1.89Z
2.6609 2.5309 2.4259 2.3392 2.2664 2.2043 2.1507 2.1040 2.0629 2.0264
4.4638 3.7743 3.3404 3.0428 2.8259
25L14 19.471 8.5944 5.7170
40
L7396 L6173 1.5343 L4290 1.3180
L9464 L9165 L8895 LB649 L8424 L8217 LB027 1.7851 L7689 L7537
2.6211 2.4901 2.3842 2.2966 2.2230 2.1601 2.1058 2.0584 2.0166 L9796
252.Z 19.479 8.5720 5.6878 4.4314 3.73. 3.3043 3.0053 2.7872
60
L6223 1.5089 L3893 L2539 1.0000
1.7110 L6906 1.6717 L6541 L6377 1.7684 1.7488 1.7307 L7138 L6981 1.6835 L5766 L4673 Ll519 L2214
L8432 1.8117 1.7831 1.7570 1.7331
2.5379 2.4045 2.2962 2.2064 2.1307 2.0658 2.0096 L9604 1.9168 L8780
4.3650 3.6688 3.2298 2.9276 2.7067
254.32 19.496 8.5265 5.6281
00
1.8963 1.8657 L8380 1.8128 1.7897
2.1141 2.0589 2.0107 1.9681 1.9302
4.3984 3.7047 3.2674 2.9669 2.7475 2.5801 2.4480 2.3410 2.2524 2.1778
253.25 19.487 8.5494 5.6581
120
I
I
>
." ."
~ ~
en
• ••iii • CI
0
i! ...'"'
I
oS
CI
0
Ill)
60

30 40
27 28 29
•
:II 21 22 23 24 25
15 16 1'7 18 19
5.9781 5.9216 5.8715 5.8266 5.7863 5.7498 5.7167 506864 5.6586 5.6331 5.6096 5.5878 5.5675 5.'239 5.2857 5.1524 5.0239
6.0.20
6.1995 6.1151
6.9367 6.7241 6.5538 6.4143 6.2979
647.79 38.506 17.443 12.218 10.007 8.8131 8.0727 7.5709 7.2093
1
I
. . .55 .. 2421 .. 2205 4.2006 .. 1821 ..0510 3.9253 3.8046 3.6889
.. 2909
'.3187
•• 3492
•• 3828
4.4613
...."
5.'564 5.2559 5.0959 '.9653 '.8567 .. 7650 '.6867 .. 6189 .. 5597 ..5075
8.4336 7.2598 6.5415 6.0595 5.71'7
799.50 39.000 16.044 10.649
2
3.8587 3.8l88 3.7129 3.7505 3.7211 3.6943 3.6697 3.6472 3.6264 3.6072 3.5894 3.4633 3.M25 3.2270 3.1161
3.9539 3.90M
4.0112
'.1528 ..0768
'.8256 .. 6300 ...742 '.3472 .. 2417
86.. 16 39.165 15.'39 9.9792 7.7636 6.5988 5.8898 5."60 5.0781
3
3.51'7 3••75. 3.4401 3.4083 3.379' 3.3530 3.3289 3.3067 3.2863 3.2674 3.2499 3.1.1 3.0077 2.8943 2.7858
3.6083 3.5587
a.6648
3.8043 3.729'
'.4683 .. 2751 '.1212 3."59 3.8919
899.58 39.248 15.101 9.6045 7.3879 6.2272 5.5226 5.0526 •• 7181
4
6 937.11 39.331 1'.735 9.1973 6.9777 5.8197 5.1186 '.6517 •• 3197
•• 0721 3.8807 3.7283 3.6043 3.501. 3.5764 3•• 1.7 3.5021 3.3406 a••a~ 3.2767 3.3820 3.2209 3.3327 3.1718 3.2891 3.1283 3.2501 3.0895 3.2151 3.0546 3.1835 3.0232 3.15. 2."" 3.1287 2.9685 3.1048 2.9447 3.0128 2.9228 3.0625 2.9027 3.04S8 2.8140 3.0265 2.8667 2.9037 2.7444 2. '7163 2.6274 2.6740 2.515. 2.5665 2.4082 fro. B'elrlAa. yol. 33, '.2361 '.0440 3.8911 3.7667 3.66M
92L85 39.298 1'.885 9.3645 7.1464 5.9876 5.2852 '.8173 .. 4844
5
Dell'eee of Freedom for Numerelor
eN. lui. ie resodaced wilb lbe peralee'_ of ~aI•••_ E. S. P ••IIOD
•..
JI
!
5 6 7 8 9 10 11 12 13 1.
•
1 2 3
~
V2
TABLE 7b· 2.5" Pointe of lbe P·Dl.tribulioa
,Po
7
3.007' 2.9686 2.9U8 2.9026 2.8738 2.8478 2.8240 2.8021 2.7820 2.7633 2.74060 2.62. 2.5061 2.39. 2..75 8z.83.
3.0999 3.05009
3.1SS6
3.9498 3.7586 3.6065 3•• 27 3.3799 3. 29M 3.2194
948.22 39.as5 1'.62' 9.01'1 6.8531 5.m55 '."49 .. 5286 .. 1971
8
3.1987 3.1248 3.0610 3.0053 2.9563 2.9128 2.87.0 2.8392 2.1077 2.7791 2.7531 2.7293 2.707. 2.6872 2.6686 2.6513 2.5289 2.4117 2.299. 2.1918
3.8549 3.6638 3.5118 3.3880 3.2853
956.66 39.373 1'.540 8.9796 6.7572 5.5996 • .899. '.4332 •• 1020
2.8365 2.7977 2.7628 2.7313 2. 7027 2.6766 2.6528 2.6309 2.6106 2.5919 2.5746 2.4519 2.3344 2.2217 2.1116
3.1227 3.0488 2.0849 2.9291 2.8800
3.1790 3.5879 3.4358 3.U2O 3.2093
M3.28 39.387 1...73 1.90'7 6.6810 5.52M ..8232 .. 3572 ..0260
9
II
I
I
I
I
W
tit t.:I
TABLE 7b (continued). 2.5% Points of the F-Distribution
Degrees of freedom for numerator (columns): 10, 12, 15, 20, 24, 30, 40, 60, 120, ∞. Degrees of freedom for denominator (rows): 1 to 30, 40, 60, 120, ∞.
[The tabled entries are not legible in this reproduction.]
TABLE 7c*. 1% Points of the F-Distribution
Degrees of freedom for numerator (columns): 1 to 9. Degrees of freedom for denominator (rows): 1 to 30, 40, 60, 120, ∞.
[The tabled entries are not legible in this reproduction.]

*This table is reproduced with the permission of Professor E. S. Pearson from Biometrika, vol. 33, pp. 84-85.

TABLE 7c (continued). 1% Points of the F-Distribution
Degrees of freedom for numerator (columns): 10, 12, 15, 20, 24, 30, 40, 60, 120, ∞. Degrees of freedom for denominator (rows): 1 to 30, 40, 60, 120, ∞.
[The tabled entries are not legible in this reproduction.]
TABLE 7d*. 0.5% Points of the F-Distribution
Degrees of freedom for numerator (columns): 1 to 9. Degrees of freedom for denominator (rows): 1 to 30, 40, 60, 120, ∞.
[The tabled entries are not legible in this reproduction.]

*This table is reproduced with the permission of Professor E. S. Pearson from Biometrika, vol. 33, pp. 86-87.

TABLE 7d (continued). 0.5% Points of the F-Distribution
Degrees of freedom for numerator (columns): 10, 12, 15, 20, 24, 30, 40, 60, 120, ∞. Degrees of freedom for denominator (rows): 1 to 30, 40, 60, 120, ∞.
[The tabled entries are not legible in this reproduction.]
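Tables 7a through 7d list upper percentage points of the F-distribution. Where modern software is at hand, the same points can be recovered from any inverse-F routine; the sketch below assumes the SciPy library (not part of this book):

```python
# Recompute upper percentage points of the F-distribution, as tabled
# in Tables 7a-7d, from the inverse cumulative distribution function.
# SciPy is an assumed dependency; any F-quantile routine would serve.
from scipy.stats import f


def f_point(alpha, df_num, df_den):
    """Upper 100*alpha% point of F with the given degrees of freedom."""
    return f.ppf(1.0 - alpha, df_num, df_den)


# Spot checks against the printed tables:
# 5% point with 24 and 10 degrees of freedom (Table 7a),
# 2.5% point with 1 and 1 degrees of freedom (Table 7b).
print(round(f_point(0.05, 24, 10), 4))
print(round(f_point(0.025, 1, 1), 2))
```

A printed table is consulted with the numerator degrees of freedom across the top and the denominator degrees of freedom down the side; the function above takes them in the same order.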
TABLE 8a*. Significant Studentized Ranges for a 5% Level New Multiple Range Test
Number of means spanned, p (columns): 2 to 10, 12, 14, 16, 18, 20, 50, 100. Error degrees of freedom, n2 (rows): 1 to 20, 22, 24, 26, 28, 30, 40, 60, 100, ∞.
[The tabled entries are not legible in this reproduction.]

*This table is reproduced from David B. Duncan, "Multiple range and multiple F tests," Biometrics, Volume 11 (1955), p. 3, with the permission of the author of the article and Professor Gertrude M. Cox, the editor of Biometrics.

TABLE 8b*. Significant Studentized Ranges for a 1% Level New Multiple Range Test
Same layout as Table 8a.
[The tabled entries are not legible in this reproduction.]

*This table is reproduced from David B. Duncan, "Multiple range and multiple F tests," Biometrics, Volume 11 (1955), p. 4, with the permission of the author of the article and Professor Gertrude M. Cox, the editor of Biometrics.
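Duncan's significant studentized ranges are ordinary studentized-range percentage points taken at the special protection level (1 - α) raised to the power p - 1, where p means are spanned. A sketch of that relation, assuming SciPy's studentized-range distribution (available only in recent SciPy releases; not part of this book):

```python
# Duncan's significant studentized range for spanning p means, with
# df_error degrees of freedom for error, at level alpha: the
# studentized-range quantile at protection level (1 - alpha)**(p - 1).
# scipy.stats.studentized_range is an assumed modern dependency.
from scipy.stats import studentized_range


def duncan_range(p, df_error, alpha=0.05):
    protection = (1.0 - alpha) ** (p - 1)
    return studentized_range.ppf(protection, p, df_error)


# For p = 2 this reduces to the ordinary 5% studentized range,
# roughly 2.95 for 20 error degrees of freedom (cf. Table 8a).
print(round(duncan_range(2, 20), 2))
```

For p = 2 the protection level is simply 1 - α, so the first column of Tables 8a and 8b agrees with the usual studentized-range tables; the columns for larger p grow more slowly than the unadjusted ranges would.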
TABLE 9a*. 95% Confidence Interval of Percentage of Successes of Binomial Population
For each number of successes T (0 to 50) the table gives the lower and upper confidence limits, in percent, for samples of size n = 10, 15, 20, 30, 50, and 100; a companion section gives the limits for each relative frequency of successes ȳ = T/n (.00 to .50) for n = 250 and 1000.
[The tabled entries are not legible in this reproduction.]

*This table is reproduced from Statistical Methods, fourth edition (1946), pp. 4-5, with the permission of Professor George W. Snedecor and the Iowa State College Press.
TABLE 9b*. 99% Confidence Interval of Percentage of Successes of Binomial Population
For each number of successes T (0 to 50) the table gives the lower and upper confidence limits, in percent, for samples of size n = 10, 15, 20, 30, 50, and 100; a companion section gives the limits for each relative frequency of successes ȳ = T/n (.00 to .50) for n = 250 and 1000.
[The tabled entries are not legible in this reproduction.]

*This table is reproduced from Statistical Methods, fourth edition (1946), pp. 4-5, with the permission of Professor George W. Snedecor and the Iowa State College Press.
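Confidence limits of the kind tabled in Tables 9a and 9b (exact, or Clopper-Pearson, limits) can be computed directly from quantiles of the beta distribution. A sketch, assuming SciPy (not part of this book):

```python
# Exact (Clopper-Pearson) confidence limits for a binomial proportion,
# the quantity tabled in Tables 9a (95%) and 9b (99%).
# SciPy supplies the beta-distribution quantiles used here.
from scipy.stats import beta


def binomial_ci(successes, n, confidence=0.95):
    """Return (lower, upper) confidence limits for the success fraction."""
    a = (1.0 - confidence) / 2.0
    lower = 0.0 if successes == 0 else beta.ppf(a, successes, n - successes + 1)
    upper = 1.0 if successes == n else beta.ppf(1.0 - a, successes + 1, n - successes)
    return lower, upper


# e.g. 5 successes in 10 trials: roughly 19% to 81% at 95% confidence.
lo, hi = binomial_ci(5, 10)
```

Multiplying the limits by 100 puts them on the percentage scale used in the printed tables.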
TABLE 10. Transformation of Percentages to Degrees

 %     0     1     2     3     4     5     6     7     8     9
 0     0   5.7   8.1  10.0  11.5  12.9  14.2  15.3  16.4  17.5
10  18.4  19.4  20.3  21.1  22.0  22.8  23.6  24.4  25.1  25.8
20  26.6  27.3  28.0  28.7  29.3  30.0  30.7  31.3  31.9  32.6
30  33.2  33.8  34.4  35.1  35.7  36.3  36.9  37.5  38.1  38.6
40  39.2  39.8  40.4  41.0  41.6  42.1  42.7  43.3  43.9  44.4
50  45.0  45.6  46.1  46.7  47.3  47.9  48.4  49.0  49.6  50.2
60  50.8  51.4  51.9  52.5  53.1  53.7  54.3  54.9  55.6  56.2
70  56.8  57.4  58.1  58.7  59.3  60.0  60.7  61.3  62.0  62.7
80  63.4  64.2  64.9  65.6  66.4  67.2  68.0  68.9  69.7  70.6
90  71.6  72.5  73.6  74.7  75.8  77.1  78.5  80.0  81.9  84.3

Table 10 is reprinted from Table XII of Fisher and Yates: Statistical Tables for Biological, Agricultural, and Medical Research, published by Oliver and Boyd Ltd., Edinburgh, by permission of the authors and publishers.

TABLE 11. Normal Scores for Ranks
(Zero and negative values omitted)
Number of Objects
Rank      2     3     4     5     6     7     8     9    10
  1     .56   .85  1.03  1.16  1.27  1.35  1.42  1.49  1.54
  2                .30   .50   .64   .76   .85   .93  1.00
  3                            .20   .35   .47   .57   .66
  4                                        .15   .27   .38
  5                                                    .12

Number of Objects
Rank     11    12    13    14    15    16    17    18    19    20
  1    1.59  1.63  1.67  1.70  1.74  1.76  1.79  1.82  1.84  1.87
  2    1.06  1.12  1.16  1.21  1.25  1.28  1.32  1.35  1.38  1.41
  3     .73   .79   .85   .90   .95   .99  1.03  1.07  1.10  1.13
  4     .46   .54   .60   .66   .71   .76   .81   .85   .89   .92
  5     .22   .31   .39   .46   .52   .57   .62   .67   .71   .75
  6           .10   .19   .27   .34   .39   .45   .50   .55   .59
  7                       .09   .17   .23   .30   .35   .40   .45
  8                                   .08   .15   .21   .26   .31
  9                                               .07   .13   .19
 10                                                           .06

Table 11 is abridged from Table XX of Fisher and Yates: Statistical Tables for Biological, Agricultural, and Medical Research, published by Oliver and Boyd Ltd., Edinburgh, by permission of the authors and publishers. For larger numbers of objects, up to 50, see the original table.
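Both of the preceding tables are simple functions of the normal distribution: Table 10 is the angular transformation, the angle whose sine is the square root of the proportion, and Table 11 gives the expected values of the standard normal order statistics. Either can be recomputed from first principles; a sketch using only the Python standard library (function names are illustrative):

```python
import math


def percent_to_degrees(pct):
    """Angular transformation of Table 10: arcsin(sqrt(p)), in degrees."""
    return math.degrees(math.asin(math.sqrt(pct / 100.0)))


def normal_score(rank, n, steps=4000):
    """Expected value of the rank-th largest of n standard normal
    observations (Table 11), by Simpson's-rule integration."""
    phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    coef = (math.factorial(n)
            / (math.factorial(rank - 1) * math.factorial(n - rank)))
    a, b = -10.0, 10.0          # the integrand is negligible outside
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        w = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += w * x * Phi(x) ** (n - rank) * (1 - Phi(x)) ** (rank - 1) * phi(x)
    return coef * total * h / 3.0
```

For example, percent_to_degrees(50) gives 45.0, as in Table 10, and normal_score(1, 2) gives about .56, the first entry of Table 11.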
INDEXES

INDEX TO THEOREMS

Theorem                Page
2.4a ..................   5
2.4b ..................   6
5.2a ..................  33
5.2b ..................  35
5.3 ...................  35
7.2 ...................  62
7.5 ...................  67
7.6 ...................  71
7.7 ...................  74
7.7b ..................  76
8.1 ...................  88
8.1b ..................  89
9.1 ................... 106
9.6 ................... 112
9.7 ................... 114
10.1a ................. 122
10.1b ................. 124
10.1c ................. 124
10.4 .................. 128
10.4b ................. 128
12.6 .................. 172
15.2 .................. 222
17.2 .................. 276
17.3 .................. 280
17.4 .................. 286
21.1 .................. 393
INDEX TO TABLES

Table      Page
2.5a ............ 7
2.5b ............ 7
4.1a ........... 23
4.1b ........... 25
4.2 ............ 27
5.1a ........... 30
5.1b ........... 31
5.2 ............ 32
5.3 ............ 35
5.6 ............ 38
7.2 ............ 61
7.4 ............ 65
7.5 ............ 68
7.7 ............ 72
7.7b ........... 73
8.2 ............ 90
8.7 ............ 97
9.2 ........... 107
9.5 ........... 110
9.6 ........... 112
10.1a ......... 120
10.1b ......... 120
10.1c ......... 121
10.1d ......... 124
10.2 .......... ...
10.5 .......... 130
10.6 .......... 132
12.1a ......... 151
12.1b ......... 152
12.1c ......... 155
12.1d ......... 155
12.1e ......... 155
12.2 .......... 158
12.3a ......... 162
12.3b ......... 163
12.10 ......... 176
12.10b ........ 176
12.10c ........ 177
12.10d ........ 178
13.2 .......... 190
13.3 .......... 191
14.2a ......... 198
14.2b ......... 199
14.3 .......... 201
14.3b ......... 202
14.3c ......... 203
14.3d ......... 204
14.4a ......... 205
14.4b ......... 206
14.5 .......... 208
14.5b ......... 208
14.6 .......... 209
14.7 .......... 211
14.8 .......... 213
15.2 .......... 223
15.3a ......... 229
15.3b ......... 231
15.5a ......... 240
15.5b ......... ...
16.1 .......... 245
16.3 .......... 250
16.3b ......... 252
16.3c ......... 254
16.4 .......... 258
16.5a ......... 259
16.5b ......... 260
16.5c ......... 261
16.9a ......... 267
16.9b ......... 268
17.1 .......... 275
17.3 .......... 279
17.4 .......... 285
17.6a ......... 293
17.6b ......... 294
17.6c ......... 294
17.6d ......... 295
17.6e ......... 295
17.7 .......... 296
17.7b ......... 298
17.8 .......... 300
18.1 .......... 309
18.2a ......... 311
18.2b ......... 312
18.2c ......... 312
18.2d ......... 313
18.2e ......... 315
18.3 .......... 315
18.4a ......... 317
18.4b ......... 317
18.4c ......... 318
18.4d ......... 318
18.5a ......... 319
18.5b ......... 319
18.5c ......... 320
18.5d ......... 320
18.5e ......... 322
18.5f ......... 323
18.5g ......... 323
18.6 .......... 325
18.8a ......... 328
18.8b ......... 329
18.8c ......... 330
18.8d ......... 331
19.1a ......... 346
19.1b ......... 347
19.1c ......... 348
19.1d ......... 349
19.2 .......... 350
19.4 .......... 354
19.6a ......... 360
19.6b ......... 361
19.6c ......... 361
19.6d ......... 362
19.6e ......... 362
19.6f ......... 363
19.7 .......... 364
19.8 .......... 366
19.8b ......... 367
19.8c ......... 368
19.8d ......... 369
19.9a ......... 371
19.9b ......... 371
19.9c ......... 372
19.9d ......... 373
19.9e ......... 373
19.9f ......... 374
19.9g ......... 376
19.9h ......... 378
21.1 .......... 391
21.2a ......... 394
21.2b ......... 396
21.3 .......... 397
21.4 .......... 404
21.6a ......... 408
21.6b ......... 408
21.6c ......... 410
21.6d ......... 411
21.6e ......... 411
21.7 .......... 414
21.7b ......... 416
21.8 .......... 417
21.8b ......... 418
21.9 .......... 421
21.9b ......... 423
21.10 ......... 426
22.2 .......... 434
22.4a ......... 437
22.4b ......... 437
22.5a ......... 438
22.5b ......... 439
22.6a ......... 440
22.6b ......... 441
22.7 .......... 442
22.8 .......... 443
23.1 .......... 448
23.1b ......... 451
23.2b ......... 452
23.2c ......... 453
23.2d ......... 454
23.3a ......... 454
23.3b ......... 455
23.3c ......... 455
23.4 .......... 458
23.5a ......... 460
23.5b ......... 461
23.5c ......... 461
24.3a ......... 473
24.3b ......... 473
24.4a ......... 475
24.4b ......... 475
24.4c ......... 476
24.4d ......... 477
24.4e ......... 477
24.5a ......... 478
24.5b ......... 479
INDEX TO FIGURES

Figure     Page
2.6a ........... 8
2.6b ........... 9
2.6c .......... 10
3.1a .......... 14
3.1b .......... 16
3.1c .......... 17
3.3a .......... 19
3.3b .......... 20
4.1a .......... 24
4.1b .......... 25
5.2a .......... 32
5.2b .......... 33
5.2c .......... 33
5.2d .......... 34
5.6 ........... 40
6.3a .......... 47
6.3b .......... 48
6.4 ........... 50
6.5 ........... 51
7.5a .......... 67
7.5b .......... 69
7.6a .......... 70
7.6b .......... 71
7.7 ........... 74
7.10a ......... 79
7.10b ......... 80
7.10c ......... 81
8.1 ........... 88
8.2 ........... 91
8.7 ........... 96
9.1 .......... 106
9.2 .......... 108
9.6 .......... 113
10.1 ......... 123
10.2 ......... 125
10.5 ......... 131
12.2 ......... 159
12.5 ......... 170
14.1 ......... 197
15.2 ......... 224
16.1 ......... 246
16.2 ......... 249
16.3 ......... 255
16.4 ......... 256
16.5 ......... 262
17.4 ......... 286
17.7 ......... 297
19.4 ......... 356
19.9 ......... 377
21.2 ......... 395
21.3a ........ 398
21.3b ........ 399
21.3c ........ 400
21.3d ........ 401
23.1 ......... 447
23.1b ........ 448
23.1c ........ 449
23.3a ........ 456
23.3b ........ 457
INDEX TO SUBJECT MATTER

A
Additive components, 203
Adjusted mean, 283
  distribution of, 283
  factors affecting, 287
  mean of, 284
  randomized block, 366
  sampling experiment, 284, 356
  test of homogeneity of, 353
    with equal regression coefficients, 363
    with unequal regression coefficients, 353
  variance of, 284
  weight of, 355, 359
Advantages of equal sample sizes, 133, 148, 179
Advantages of large sample, 147, 168
Among-sample mean square, 158
  average value of, 166
Among-sample SS, 155
Analysis of covariance, 344
  relation with factorial experiment, 370
Analysis of variance, 77, 151
  applications, 173
  assumptions, 172
  computing method for equal sample sizes, 159
  computing method for unequal sample sizes, 177
  model of one-way classification, 163
  models, 163, 214
  relation with linear regression, 290
  relation with chi square test, 416
  summary, 384
  tests of specific hypotheses, 175, 221, 226, 233, 238, 298
  unequal sample sizes, 175
Angular transformation, 447
  example, 450
  sampling experiments, 450
Applications of statistics, 388
Array, 245
  mean of, 245
Assumptions, 54

B
Binomial distribution, 396
  in contrast with binomial population, 396, 397
  mean, 396
  minimum sample size, 397
  variance of, 396
Binomial population, 390
  in contrast with binomial distribution, 396, 397
  mean, 391, 393
  notations, 425, 426
  sampling experiments, 393, 397
  variance, 391, 393
Binomial transformation, 471, 472, 481
Block, 196, 197

C
Central Limit Theorem, 34
Chi square, see Chi square distribution and tests under X
Class, 6
Complete block, 197
Completely randomized experiment, 196
  average values of mean squares, 166
  distribution-free methods, 472
  factorial experiments, see Factorial experiment
  one-way classification, 151
  vs. randomized blocks, 196
Components of variance, see Mean of mean squares
Components of variance model, 166, 324
Confidence coefficient, 142, 144
Confidence interval, 142, 144
  effect of sample size on, 147, 193
  length of, 144, 145, 147, 193
  of adjusted mean, 287
  of difference between adjusted means, 358
  of difference between means, 147
  of difference between regression coefficients, 350
  of mean of binomial population, 405
  of population mean, 145, 278
  of regression coefficient, 282, 283
Confidence limits, 144
Contingency table, 440
  2 x 2, 408, 442
  2 x k, 414, 438
  r x k, 438, 440
Correlation coefficient, 265
Critical region, 47
Curve of regression, 247
  connection with factorial experiment, 377
Curvilinear regression, 298

D
Degrees of freedom, 66
  physical meaning, 159, 405
Descriptive statistics, 1, 3
Dichotomous, 390
Distribution-free methods, 469, 482
  advantages and disadvantages, 483
  completely randomized experiment, 472
  randomized block experiment, 474
  sign test, 478
Distribution of rare events, 457
Dummy value for missing observation, 210
Duncan's test, see New multiple range test

E
Equally spaced x-values, 300
Error, 167, 194
Error SS, 158, 167
Estimation by interval, 141
Expected frequency, see Hypothetical frequency
Expected value, see Mean of
Experimental error, 212, 331, 452

F
Factorial experiment, 309
  average values of mean squares, 321, 325
  computing method, 316
  description of, 309
  model, 324
  partition of sum of squares, 311
  relation with analysis of covariance, 370
  tests of specific hypotheses, 325
  vs. hierarchical classification, 326
Failure, 390
F-distribution, 105
  description of, 105
  relation with t-distribution, 171
  relation with chi square distribution, 114
Fitting frequency curves, 436
Fixed model, 166, 324
Frequency, 6
Frequency curve, 9
  fitting to data, 436
Frequency table, 6

G
General mean, 152, 177
Grand total, 151

H
Hierarchical classification, 326
Histogram, 8
Hypothesis, 44
Hypothetical frequency, 404, 410, 433, 438

I
Incomplete block, 197
Independent sample, 26, 122
Individual degree of freedom, 226
  advantage of, 302, 386
  binomial population, 420
  linear regression, 298
  multinomial population, 442
  orthogonal set, 229, 230, 360
  relation with least significant difference, 235
  summary, 386
  use and misuse, 237
  chi square test of goodness of fit, 435
Induction, 2, 43
Inefficient statistics, 193
Inequality, 141
Interaction, 315
  relation with homogeneity of regression coefficients, 371

L
Large sample method, 482
Least significant difference, 234, 326
  between two treatment means, 234, 326
  between two treatment totals, 235
  relation with individual degree of freedom, 235
  relation with new multiple range test, 238
  sampling experiment, 237
  use and misuse, 236
Least squares, 253
  advantages of, 277, 281
Length of confidence interval, 144, 147
Level of significance, 46
Linear combination, 221
  distribution of, 222
  mean of, 222
  sampling experiment, 223
  variance of, 223
Linear hypothesis model, 166, 324
Linear regression, 244, 247
  algebraic identities, 267
  as individual degree of freedom, 298
  assumptions, 248
  computing methods, 268
  distribution of adjusted mean, 283
  distribution of regression SS, 260
  distribution of regression coefficient, 278
  distribution of residual SS, 261
  distribution of sample (unadjusted) mean, 276
  estimation of parameters, 250
  estimate of variance of array, 263
  line of regression, 247
  model, 248
  partition of sum of squares, 255
  planned experiment, 300, 302, 375
  relation with analysis of variance, 290
  sampling experiments, 259, 274, 278, 284
  spacing x-values, 282, 375
  test of hypothesis, 264
  test of linearity, 295
  variance component, 288
Line of regression, 247
Logarithmic transformation, 458

M
Mean, 3
  vs. median, 470
Mean of,
  adjusted mean, 284
  binomial distribution, 396
  binomial population, 391, 393
  difference between means, 124
  mean squares, 166, 204, 289, 321, 325
  population, 3
  regression coefficient, 281
  sample, 28
  sample mean, 31, 276
Mean square, 158
Mechanics of partition of sum of squares, 151, 198
Median, 469
  test of hypothesis, 471
  vs. mean, 470
Minimum sample size for binomial distribution, 397
Missing observation, 209
Model, 166, 324
Multinomial population, 432
  sampling experiment, 433

N
New multiple range test, 238
  relation with least significant difference, 238
Nonparametric methods, 469
Normal curve, 14
Normal probability graph paper, 19
Normal scores, 459
Normal score transformation, 459
  quantitative data, 474
  ranked data, 459
  relation with sign test, 481

O
Observation, 1
Observed frequency, 433
Observed relative frequency, 39
One-tailed test, 54
Orthogonal, 229, 230, 360

P
Paired observations, 96
  relation with randomized blocks, 208
  sign test, 478
Parameter, 28
Parameters and statistics of randomized blocks, 204
Poisson distribution, 456
  test of hypothesis, 458
Pooled estimate of variance, 114
Pooled SS, 155
  sampling experiment, 112
Pooled variance, 114
Population, 1
  mean, 3
  variance, 4
Population mean, 3
  confidence interval of, 145, 278
Power of a test, 389
Probability, 21, 31
Probability of obtaining a failure on a single trial, 425
Probability of obtaining a success on a single trial, 425

Q
Quality control, 84

R
Random number table, 135, 396
Randomization, 135
Randomized blocks, 196
  binomial transformation, 472
  computing method, 205
  distribution-free methods, 474
  models, 214
  normal score transformation, 459
  relation with paired observations, 208
  sign test, 478
  summary, 387
  vs. completely randomized experiment, 196
Random sample, 1, 26
Random variable model, 166, 324
Range, 4
Ranked data, 459
Regression, 244
  curve of, 247
  line of, 247
  linear, see Linear regression
Regression coefficient, 247
  confidence interval of, 282, 283
  distribution of, 278
  factors affecting, 281
  in contrast with mean, 344, 349
  mean of, 280
  relation with interaction, 371
  sampling experiment, 278, 351
  test of homogeneity, 344
  test of hypothesis that beta equals a given value, 282
  test of hypothesis that two regression coefficients are equal, 344, 350
  variance of, 281
  weight of, 345, 350, 359
Regression equation, 247
Regression SS, 260
  distribution of, 260
  mean of, 289
Relation among various distributions, 189
Relative cumulative frequency, 7
Relative frequency, 7, 21
Reliability of sample mean, 36
Replication, 197
Replication effect, 201, 204
Replication mean square, 204
  average value of, 204
Residual SS, 254, 255
  distribution of, 261

S
Sample, 1
Sample mean, 28
  distribution of, 31, 276
  in contrast with regression coefficient, 344, 349
  mean of, 35, 276
  variance of, 36, 276
  weight of, 177, 345, 350, 359
Sample size, 50, 147, 192
  advantage of equal sample sizes, 133, 179
  advantage of large sample, 147, 168, 192
  effect on confidence interval, 147, 193
  minimum for binomial distribution, 397
Sample sum, 393
Sample variance, 62
  computing method, 64
  distribution of, 75
  linear regression, 263
  weighted mean of, 111
Sampling error, 331
Sampling experiments, 23
  adjusted mean, 284, 356
  angular transformation, 450
  binomial population, 393, 397
  description, 23
  difference between means, 125
  F-distribution, 107, 157
  least significant difference, 237
  linear combination, 223
  multinomial population, 433
  normal score, 459
  pooled SS, 112
  regression coefficient, 278, 351
  regression SS, 259
  residual SS, 261
  sample mean, 38
  chi square distribution, 66, 73
  t-distribution, 89, 129
Sampling unit, 332
Sampling with replacement, 30
Shortest significant range, 238
Sign test, 478
  as binomial transformation, 481
  relation with corrected chi square, 479, 481
  relation with normal score transformation, 480
Significance, 192
Significance level, 46, 192
Significant studentized range, 238
Single degree of freedom, see Individual degree of freedom
Size of the sample, see Sample size
Square root transformation, 454
Standard deviation, 4
Standard error, 36, 194
Statistic, 28
Statistical inference, 1
Snedecor's F-distribution, see F-distribution
Student's t-distribution, see t-distribution
Subsample, 250
Success, 390

T
Table of random numbers, 135, 396
t-distribution, 87, 127
  description, 87
  relation with F-distribution, 171
  sampling experiment, 89, 129
Test of homogeneity of,
  adjusted means for randomized block experiment, 366
  adjusted means with equal regression coefficients, 363
  adjusted means with unequal regression coefficients, 353
  means of binomial populations, 413
  means, see Analysis of variance
  regression coefficients, 344
  relation with interaction, 371
Test of hypothesis, 43, 78, 92, 109, 190
Test of linearity of regression, 295
Test of significance, see Test of hypothesis
Tests of specific hypotheses, 175, 221, 226, 233, 238, 298
  factorial experiment, 325
Theoretical frequency, 427
Theoretical relative frequency, 39
Tier, 326
Total frequency, 6
Total SS, 153
Transformation, 447
  angular, 447
  binomial, 471, 472, 481
  logarithmic, 458
  normal score, 459
  square root, 454
Treatment effect, 167, 203, 204
Treatment SS, 158, 200
Two-tailed test, 54
Type I error, 45, 192
Type II error, 45, 190

U
Unadjusted mean, 283
Unbiased estimate, 63
Units of measurement, 388
u-test, 53

V
Variance, 4
Variance component model, 166, 324
Variance components, see Mean of mean squares
Variance of,
  adjusted mean, 284
  binomial distribution, 396
  binomial population, 391, 393
  difference between means, 124
  population, 4
  population means, 164
  regression coefficient, 281
  sample, 62
  sample mean, 36, 277

W
Weighted mean of,
  adjusted means, 355, 359
  means, 177, 345, 350, 359
  regression coefficients, 345, 350, 359
  variances, 111
Within-sample mean square, 158
  average value of, 166
Within-sample SS, 155

X
Chi square distribution, 65
  mean of, 66
  relation with F-distribution, 114
  correction, 476, 479
Chi square test, 78
  relation with analysis of variance, 416
Chi square test of goodness of fit, 404, 432
  computing shortcut, 443
  fitting frequency curves, 436
  individual degree of freedom, 435
Chi square test of independence, 410, 438
  computing shortcut, 2 x 2, 412
  2 x k, 414
  r x k, 443
  individual degree of freedom, 442

Z
z-distribution, 105
SYMBOLS AND ABBREVIATIONS

LOWER CASE LETTERS

a = (1) The mean of a sample which consists of several arrays, or the least squares estimate of α, in connection with linear regression. Page 250. (2) The number of levels of the factor A of a factorial experiment. Page 310.
b = (1) The regression coefficient of a sample, or the least squares estimate of β. Page 250. (2) The number of levels of the factor B of a factorial experiment. Page 310.
b_k = The regression coefficient of the kth sample, e.g., b₁, b₂, etc. Page 345.
b̄ = The weighted mean of the regression coefficients of several samples. Page 345.
d = The dummy value for a missing observation. Used in Section 14.7 only. Page 210.
d.f. = Degrees of freedom. Page 67.
f = Frequency, as used in Table 2.5a. Page 7.
f₀ = The number of failures in a binomial population. Page 392.
f₁ = The number of successes in a binomial population. Page 392.
g = The number of treatment means in a group. Page 239.
h = The hypothetical, theoretical, or expected frequency in the chi square test of goodness of fit and the chi square test of independence. Pages 404, 410, 433, 438.
k = The number of samples or treatments. Pages 151, 198.
m = (1) A number. Used only in Theorem 2.4b. Page 6. (2) The midpoint of a class in a frequency table, such as Table 7.7b, page 73; Table 8.2, page 90; Table 9.6, page 112; Table 10.5, page 130; Table 16.5c, page 261. (3) The mean of the observations (y) with the same x-value. Used only in Section 17.7. Page 296.
n = (1) The sample size or the number of observations in a sample. Page 29. (2) The number of replications in a randomized block experiment. Page 198.
n₀ = Defined in Equation (5), Section 12.10. Page 178.
n̄ = The average sample size. Page 180.
n_k = The size of the kth sample, e.g., n₁, n₂, etc. Page 119.
p = The relative frequency of successes of a sample. Table 21.10, page 426.
p̄ = The relative frequency of successes of several samples combined. Table 21.10, page 426.
r = (1) Correlation coefficient. Used in Chapters 16 and 17 only. Page 265. (2) The number of categories of a multinomial population. Used in Chapter 22 only. Page 432.
r.f. = Relative frequency. Table 2.5a, page 7.
r.c.f. = Relative cumulative frequency. Table 2.5a, page 7.
s = The standard deviation of a sample, or the square root of s². Page 62.
s² = The variance of a sample, or the unbiased estimate of σ², the variance of a population. Pages 62, 263.
s_k² = The variance of the kth sample, e.g., s₁², s₂², etc. Page 105.
s̄² = The weighted mean of several sample variances, or pooled estimate of a population variance. Pages 111, 114.
s_ȳ² = The variance of the means of several samples of the same size. Page 157.
t = A statistic which follows Student's t-distribution. Pages 88, 128, 227, 277, 282, 287, 353, 357, 366.
t.025 = The 2.5% point of Student's t-distribution. Page 145.
t.005 = The 0.5% point of Student's t-distribution. Page 147.
u = A statistic which follows the normal distribution with mean equal to zero and variance equal to one. Pages 18, 83, 126, 226, 277, 282, 287, 402, 403, 407, 408.
v = A linear combination of a number of observations. Page 221.
x = A value associated with an observation y, or a quantitative treatment. Page 244.
x' = A particular value of x. Page 283.
x̄ = The mean of the x-values of a sample. Page 246.
x̿ = The mean of the x-values of several samples. Page 354.
y = An observation. Page 3.
y' = An original observation in the sampling experiment described in Section 17.1. Page 275.
ȳ = The mean of a sample. Page 28.
ȳ̿ = The mean of several samples combined, or the weighted mean of several sample means. Page 152.
ȳ_A = The mean of the observations belonging to a particular level of the factor A of a factorial experiment. Page 311.
ȳ_B = The mean of the observations belonging to a particular level of the factor B of a factorial experiment. Page 311.
ȳ_k = The mean of the kth sample, e.g., ȳ₁, ȳ₂, etc. Page 119.
ȳ_r = The mean of a replication of a randomized block experiment. Page 198.
ȳ_t = The mean of a treatment of a randomized block experiment. Page 198.
ȳ_x = An adjusted sample mean or a subsample mean. Page 250.
ŷ_x' = The estimated mean of y at x = x'. Page 283.
ŷ_x̿ = The estimated mean of y at x = x̿. Page 355.
CAPITAL LETTERS

AB = The interaction between the factors A and B of a factorial experiment. Page 312.
C = Defined in Equation (11), Section 16.3. Used in this section only. Page 253.
DF = Degrees of freedom. Page 67.
E = A sample estimate of a parameter, such as ȳ, b, and ȳ_x. Used in Section 19.6 only. Page 359.
E_k = An estimate of a parameter obtained from the kth sample, e.g., E₁, E₂, etc. Page 359.
F = A statistic which follows the F-distribution. Pages 106, 158, 204, 227, 263.
F.05 = The 5% point of the F-distribution. Page 235.
G = The grand total. Page 151.
G_x = The grand total of x-values. Page 347.
G_y = The grand total of y-values. Page 347.
LSD = The least significant difference between two means. Page 233.
LSD.05 = LSD with the significance level equal to 5%. Page 234.
M = A multiplier used in obtaining an individual degree of freedom of the sum of squares. Page 221.
M_k = The kth multiplier used in obtaining an individual degree of freedom, such as M₁, M₂, etc. Page 221.
MS = Mean square. Page 158.
N = The number of observations in a population. Page 4.
Q = An individual degree of freedom of the treatment sum of squares, e.g., Q₁, Q₂, etc. Pages 227, 228.
R = (1) The sum of the observations of the replication which has a missing observation. Used in Section 14.7 only. Page 210. (2) A quantity used in the shortcut method of computing the chi square value for a k x r contingency table. Used in Section 22.8 only. Page 443.
S = The grand total of a randomized block experiment with a missing observation. Used in Section 14.7 only. Page 210.
SP = The sum of products. Page 266.
SP_k = The sum of products for the kth sample, e.g., SP₁, SP₂, etc. Page 345.
SS = The sum of squares. Page 64.
SS_x = The SS for the x-values. Page 267.
SS_xk = The SS for the x-values of the kth sample, e.g., SS_x1, SS_x2, etc. Page 345.
SS_y = The SS for the y-values. Page 267.
SSR = The shortest significant range. Page 238.
T = A sample total, or the sum of the observations of a sample. Page 151.
T_A = The sum of the observations belonging to a particular level of the factor A of a factorial experiment. Page 311.
T_B = The sum of the observations belonging to a particular level of the factor B of a factorial experiment. Page 311.
T_k = The total of the kth sample, such as T₁, T₂, etc. Page 151.
T_r = A replication total, or the sum of the observations belonging to a particular replication of a randomized block experiment. Page 198.
T_t = A treatment total, or the sum of the observations belonging to a particular treatment of a randomized block experiment. Page 198.
T_x = The sum of the x-values of a sample. Page 364.
T_y = The sum of the y-values of a sample. Page 364.
U = A version of sample variance defined in Equation (4), Section 7.2. Page 60.
V = A version of sample variance defined in Equation (5), Section 7.2. Page 61.
W = The weight used in obtaining a weighted mean. Page 359.

GREEK LETTERS

α = The mean of a population with several x-arrays. Page 247.
β = The population regression coefficient of y on x. Page 247.
β̄ = The mean of several population regression coefficients. Page 349.
β_k = The regression coefficient of the kth population, e.g., β₁, β₂, etc. Page 349.
μ = The mean of a population. Page 3.
μ̄ = The mean of several population means. Page 166.
μ₀ = The hypothetical value of a population mean. Page 46.
μ_a = The mean of the statistic a. Page 277.
μ_A = The population mean of the factor A of a factorial experiment, or the mean of the statistic ȳ_A. Page 319.
μ_b = The mean of the statistic b. Page 280.
μ_B = The population mean of the factor B of a factorial experiment, or the mean of the statistic ȳ_B. Page 319.
μ_k = The mean of the kth population, e.g., μ₁, μ₂, etc. Page 119.
μ_r = The population mean of a replication of a randomized block experiment, or the mean of the statistic ȳ_r. Page 201.
μ_t = The population mean of a treatment of a randomized block experiment, or the mean of the statistic ȳ_t. Page 201.
μ_ȳ = The mean of the statistic ȳ. Page 35.
μ_v = The mean of the statistic v. Page 222.
μ_y.x = The mean of the observations y within an x-array of a population. Page 245.
μ_y.x' = The mean of y at x = x'. Page 247.
μ_y.x1 = The mean of y at x = x₁. Page 247.
μ_ȳx = The mean of the statistic ȳ_x. Page 284.
μ_(ȳ₁ - ȳ₂) = The mean of the statistic ȳ₁ - ȳ₂, which is the difference between two sample means. Page 121.
ν = The number of degrees of freedom. Page 66.
ν₁ = (1) The number of degrees of freedom of the sum of squares of the first sample. Page 106. (2) The number of degrees of freedom of the numerator of the statistic F. Page 105.
ν₂ = (1) The number of degrees of freedom of the sum of squares of the second sample. Page 106. (2) The number of degrees of freedom of the denominator of the statistic F. Page 105.
π = The relative frequency of successes of a binomial population. Table 21.10, page 426.
π_r = The relative frequency of the observations of the rth category of a multinomial population, e.g., π₁, π₂, etc. Page 432.
σ = The standard deviation of a population. Page 4.
σ² = The variance of a population. Page 4.
σ₀² = The hypothetical value of a population variance. Page 78.
σ₁² = The variance of the first population. Page 105.
σ_a² = The variance of the statistic a. Page 277.
σ_A² = The variance due to factor A: for the linear hypothesis model, as defined in Equation (2), Section 18.5, page 320; for the variance component model, page 324.
σ_AB² = The variance due to the interaction AB: for the linear hypothesis model, as defined in Equation (4), Section 18.5, page 320; for the variance component model, page 324.
σ_b² = The variance of the statistic b. Page 280.
σ_B² = The variance due to factor B: for the linear hypothesis model, as defined in Equation (3), Section 18.5, page 320; for the variance component model, page 324.
σ_r² = The variance due to replications. Page 204.
σ_t² = The variance due to treatments. Page 204.
σ_v² = The variance of the statistic v. Page 223.
σ_ȳ = The standard error of the mean, or the standard deviation of the statistic ȳ. Page 36.
σ_ȳ² = The variance of the sample mean ȳ. Page 36.
σ_ȳx² = The variance of the statistic ȳ_x. Page 284.
σ_μ² = The variance of a number of population means, as defined in Equation (2), Section 12.4. Page 164.
σ_(ȳ₁ - ȳ₂) = The standard error of the difference between two sample means. Page 124.
σ²_(ȳ₁ - ȳ₂) = The variance of the difference between two sample means. Page 124.
σ²_(ȳx1 - ȳx2) = The variance of the difference between two adjusted means at x = x'. Page 365.
θ = An angle to which a percentage is transformed. Page 448.
χ² = A statistic which follows the chi square distribution. Pages 65, 404, 405, 412, 413, 433, 435, 438, 443, 444.