Copyright © 2006 New Age International (P) Ltd., Publishers Published by New Age International (P) Ltd., Publishers All rights reserved. No part of this ebook may be reproduced in any form, by photostat, microfilm, xerography, or any other means, or incorporated into any information retrieval system, electronic or mechanical, without the written permission of the publisher. All inquiries should be emailed to
[email protected] ISBN : 978-81-224-2429-4
PUBLISHING FOR ONE WORLD
NEW AGE INTERNATIONAL (P) LIMITED, PUBLISHERS 4835/24, Ansari Road, Daryaganj, New Delhi - 110002 Visit us at www.newagepublishers.com
PREFACE
Statistics is used in research and data analysis in almost every field. Official government statistics are among our oldest records and serve as historical evidence. Many people have contributed to the refinement of the statistical methods that we use today; their development has been a long process. We now have many statistical tools for the analysis of data in fields such as business, medicine, engineering, agriculture and management. Many people find it difficult to decide which statistical technique should be applied, and where. Even though computer software has reduced the work involved, a basic knowledge of the methods is a must for their proper application. This book provides the important and widely used statistical tests, with worked-out examples and exercises drawn from real-life applications. The material is presented in a simple and understandable manner. It will be useful to researchers who need to apply these tests in their data analysis, and statisticians will also find it useful for easy reference. It is a good companion for all who need statistical tools in their field. The author is greatly indebted to the authorities of Annamalai University for permitting the publication of this book.
V. Rajagopalan
CONTENTS

Preface

1. INTRODUCTION

2. PARAMETRIC TESTS
Test – 1 Test for a Population Proportion
Test – 2 Test for a Population Mean (Population variance is known)
Test – 3 Test for a Population Mean (Population variance is unknown)
Test – 4 Test for a Population Variance (Population mean is known)
Test – 5 Test for a Population Variance (Population mean is unknown)
Test – 6 Test for Goodness of Fit
Test – 7 Test for Equality of two Population Proportions
Test – 8 Test for Equality of two Population Means (Population variances are equal and known)
Test – 9 Test for Equality of two Population Means (Population variances are unequal and known)
Test – 10 Test for Equality of two Population Means (Population variances are equal and unknown)
Test – 11 Test for Paired Observations
Test – 12 Test for Equality of two Population Standard Deviations
Test – 13 Test for Equality of two Population Variances
Test – 14 Test for Consistency in a 2×2 table
Test – 15 Test for Homogeneity of Several Population Proportions
Test – 16 Test for Homogeneity of Several Population Variances (Bartlett's test)
Test – 17 Test for Homogeneity of Several Population Means
Test – 18 Test for Independence of Attributes
Test – 19 Test for Population Correlation Coefficient Equals Zero
Test – 20 Test for Population Correlation Coefficient Equals a Specified Value
Test – 21 Test for Population Partial Correlation Coefficient
Test – 22 Test for Equality of two Population Correlation Coefficients
Test – 23 Test for Multiple Correlation Coefficient
Test – 24 Test for Regression Coefficient
Test – 25 Test for Intercept in a Regression

3. ANALYSIS OF VARIANCE TESTS
Test – 26 Test for Completely Randomized Design
Test – 27 ANOCOVA Test for Completely Randomized Design
Test – 28 Test for Randomized Block Design
Test – 29 Test for Randomized Block Design (More than one observation per cell)
Test – 30 ANOCOVA Test for Randomized Block Design
Test – 31 Test for Latin Square Design
Test – 32 Test for 2² Factorial Design
Test – 33 Test for 2³ Factorial Design
Test – 34 Test for Split Plot Design
Test – 35 ANOVA Test for Strip Plot Design

4. MULTIVARIATE TESTS
Test – 36 Test for Population Mean Vectors (Covariance matrix is known)
Test – 37 Test for Population Mean Vector (Covariance matrix is known)
Test – 38 Test for Equality of Population Mean Vectors (Covariance matrices are equal and known)
Test – 39 Test for Equality of Population Mean Vectors (Covariance matrices are equal and unknown)
Test – 40 Test for Equality of Population Mean Vectors (Covariance matrices are unequal and unknown)

5. NON-PARAMETRIC TESTS
Test – 41 Sign Test for Median
Test – 42 Sign Test for Medians (Paired observations)
Test – 43 Median Test
Test – 44 Median Test for two Populations
Test – 45 Median Test for K Populations
Test – 46 Wald–Wolfowitz Run Test
Test – 47 Kruskal–Wallis Rank Sum Test (H Test)
Test – 48 Mann–Whitney–Wilcoxon Rank Sum Test
Test – 49 Mann–Whitney–Wilcoxon U-Test
Test – 50 Kolmogorov–Smirnov Test for Goodness of Fit
Test – 51 Kolmogorov–Smirnov Test for Comparing two Populations
Test – 52 Spearman Rank Correlation Test
Test – 53 Test for Randomness
Test – 54 Test for Randomness of Rank Correlation
Test – 55 Friedman's Test for Multiple Treatment of a Series of Objects

6. SEQUENTIAL TESTS
Test – 56 Sequential Test for Population Mean (Variance is known)
Test – 57 Sequential Test for Standard Deviation (Mean is known)
Test – 58 Sequential Test for Dichotomous Classification
Test – 59 Sequential Test for the Parameter of a Bernoulli Population
Test – 60 Sequential Probability Ratio Test

7. TABLES

REFERENCES
CHAPTER – 1
INTRODUCTION

Testing of statistical hypotheses is a remarkable aspect of statistical theory, which helps us to make decisions where there is uncertainty. There are many real-life situations in which we would like to take a decision for further action. Further, there are problems for which we would like to determine whether certain claims are acceptable or not. Suppose that we are interested in testing the following claims:

1. The average consumption of electricity in city ‘A’ is 175 units per month.
2. Bath soap ‘B’ reduces the rate of skin infections by 50%.
3. Oral polio vaccine is more potent than parenteral polio vaccine.
4. A new variety of paddy yields 16.5 tonnes per hectare.
5. Drug ‘C’ produces less drug dependence than drug ‘D’.
6. Health drink ‘E’ improves weight gain by 25% for children.
7. A plant produced by cloning grows 50% faster than an ordinary one.
8. A door-to-door campaign increases the sales of a washing powder by 20%.
9. Machine ‘F’ produces more items within specifications than Machine ‘G’.
10. The proportion of defective items in a large consignment of coconuts is less than 4%.

These are a few of the many kinds of problems that can be solved only with the help of statisticians. To solve such problems, we need the following basic and important concepts of statistical theory.

1. POPULATION
In any statistical investigation, the interest usually lies in the assessment of general magnitude with respect to one or more characters relating to the individuals belonging to a group. Such a group of individuals under study is called a population. The number of units in a population is known as the population size, which may be either finite or infinite. In a finite population, the size is denoted by ‘N’. Thus, in statistics, a population is an aggregate of objects, animate or inanimate, under study. In a statistical survey, complete enumeration of the population is tedious if the population size is too large or infinite. In some situations, even though 100% inspection is possible, the units are destroyed in the course of inspection. As there are various constraints on conducting a complete enumeration, namely manpower, time, expenditure etc., we take the help of sampling.
Selected Statistical Tests
2. SAMPLE
A finite, small subset of the units of a population is called a sample, and the number of units in a sample is called the sample size, denoted by ‘n’. The process of selecting a sample is known as sampling. Every member of a sample is called a sample unit, and the numerical values of such sample units are called observations. If each unit of the population has an equal chance of being included in the sample, then such a sample is called a random sample. A sample of n observations is denoted by X1, X2, …, Xn.

3. PARAMETERS
Statistical measures such as the mean, standard deviation, variance and correlation coefficient are called parameters when they are calculated from the whole population. If the population information is not completely available, or the population is not finite, the parameters cannot be evaluated. In such cases, the parameters are termed unknown.

4. STATISTICS
Statistical measures obtained from the sample alone are called statistics. Any function of the sample observations is also known as a statistic. The following is the list of standard symbols used for parameters and statistics:

Statistical measure        Parameter    Statistic
Mean                       µ            X̄
Median                     M            m
Standard deviation         σ            s
Variance                   σ²           s²
Proportion                 P            p
Correlation coefficient    ρ            r
Regression coefficient     β            b
5. SAMPLING ERROR
Errors arise because only a part of the population (i.e., a sample) is used to estimate the parameters and to draw inferences about the population. Such an error is called sampling error.

6. STATISTICAL INFERENCE
The process of arriving at valid conclusions about the population based on a sample or samples is called statistical inference. It has two major divisions, namely estimation and testing of hypotheses.

7. ESTIMATION
When the parameters are unknown, they are estimated by their respective statistics based on the samples. Such a process is called estimation. A statistic that is used to estimate an unknown parameter is called an estimator. For example, the sample mean is an estimator of the population mean. A specific value of the estimator, computed from a given sample, is called an estimate. Estimation is broadly classified into two types, namely point estimation and interval estimation.
8. POINT AND INTERVAL ESTIMATION
If a single value is used as an estimate of the unknown parameter, it is called a point estimate. If instead we choose two values a and b (a < b) such that the unknown parameter is expected to lie between a and b, the interval (a, b) found for estimating the parameter is called an interval estimate.

9. TESTING OF HYPOTHESIS
Hypothesis testing begins with an assumption, or hypothesized value, that we make about the unknown population parameter. Sample data are collected and sample statistics are obtained from them. These statistics are used to test whether the assumption we made about the parameter is correct. The difference between the hypothesized value and the actual value of the sample statistic is determined. Then we decide whether the difference is significant or not. The smaller the difference, the greater the likelihood that our hypothesized value is correct. We cannot accept or reject the hypothesized value of a population parameter simply by intuition. The statistical tests for testing the significance of the difference between the hypothesized value and the actual value of the sample statistic, or of the difference between any set of sample statistics, are called tests of significance.

10. STANDARD ERROR
The standard deviation of any statistic is known as its standard error, abbreviated as S.E. It plays an important role in statistical tests. The standard errors of some well-known statistics for large samples are given below:

S.No.   Statistic                  Standard error
1       $\bar{X}$                  $\sigma/\sqrt{n}$
2       $p$                        $\sqrt{PQ/n}$
3       $s$                        $\sigma/\sqrt{2n}$
4       $s^2$                      $\sigma^2\sqrt{2/n}$
5       $r$                        $(1-\rho^2)/\sqrt{n}$
6       $\bar{X}_1-\bar{X}_2$      $\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}$
7       $s_1-s_2$                  $\sqrt{\sigma_1^2/(2n_1)+\sigma_2^2/(2n_2)}$
8       $p_1-p_2$                  $\sqrt{P_1Q_1/n_1+P_2Q_2/n_2}$
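As a rough illustration of how the large-sample standard errors listed above are evaluated, the following Python sketch codes four of them directly from the formulas. The function names are hypothetical and chosen only for this example, and the population values (σ, P) are assumed to be known or replaced by their hypothesized values.

```python
import math


def se_mean(sigma, n):
    """S.E. of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def se_proportion(P, n):
    """S.E. of the sample proportion: sqrt(P * Q / n), where Q = 1 - P."""
    return math.sqrt(P * (1.0 - P) / n)


def se_diff_means(sigma1, n1, sigma2, n2):
    """S.E. of the difference of two independent sample means."""
    return math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


def se_diff_proportions(P1, n1, P2, n2):
    """S.E. of the difference of two independent sample proportions."""
    return math.sqrt(P1 * (1.0 - P1) / n1 + P2 * (1.0 - P2) / n2)


if __name__ == "__main__":
    # Illustrative inputs only: sigma = 20 with n = 50, and P = 0.3 with n = 500.
    print(se_mean(20, 50))
    print(se_proportion(0.3, 500))
```

In each case the standard error shrinks as the sample size grows, which is why these formulas are used for large samples.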
11. PARAMETRIC TESTS
The statistical tests for testing the parameters of a population are called parametric tests. The different kinds of parametric tests are studied in Chapter 2. The following are the steps of the test procedure that we adopt in studying the parametric tests in a systematic manner:

11.1 Null Hypothesis
It is a tentative statement about the unknown population parameter, to be tested on the basis of the sample data. It always asserts that there is no difference between the hypothesized value and the actual value of the sample statistic. It is tested for possible rejection under the assumption that it is true. It is usually denoted by H0.

11.2 Alternative Hypothesis
Any hypothesis which is complementary to the null hypothesis is called an alternative hypothesis. It is usually denoted by H1.

11.3 Type-I and Type-II Errors
In hypothesis testing, we draw inferences about the population parameters on the basis of the sample data alone. Due to sampling errors, there is a possibility of rejecting a true null hypothesis, called a Type-I error, and of accepting a false null hypothesis, called a Type-II error. The possible outcomes are tabulated as follows:

                                    Situation
Conclusion                          H0 is true (H1 is false)    H0 is false (H1 is true)
H0 is accepted (H1 is rejected)     Correct decision            Type-II error
H0 is rejected (H1 is accepted)     Type-I error                Correct decision
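The probabilities of the two kinds of error in the table can be worked out for a simple case. The sketch below is only an illustration under stated assumptions: a right-sided normal test of H0: µ = µ0 with known σ, which rejects H0 when the standardized sample mean exceeds a chosen critical value. The function name and the numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist


def error_probabilities(mu0, mu1, sigma, n, crit):
    """Right-sided test of H0: mu = mu0 that rejects H0 when Z > crit.

    Returns a pair:
      P(Type-I error)  = P(reject H0 | true mean is mu0)
      P(Type-II error) = P(accept H0 | true mean is mu1)
    """
    std_normal = NormalDist()  # standard normal distribution
    type1 = 1.0 - std_normal.cdf(crit)
    # When the true mean is mu1, Z is normal with mean `shift` and variance 1.
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))
    type2 = std_normal.cdf(crit - shift)
    return type1, type2


if __name__ == "__main__":
    # Hypothetical numbers: mu0 = 100, true mean 103, sigma = 10, n = 25.
    print(error_probabilities(100, 103, 10, 25, 1.645))
```

Raising the critical value lowers the first probability and raises the second, so the two errors cannot be reduced together for a fixed sample size.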
The acceptance or rejection of H0 depends on the test criterion that is used in hypothesis testing. In any hypothesis test, we would like to control both Type-I and Type-II errors. The probability of committing a Type-I error is denoted by α and the probability of committing a Type-II error is denoted by β.

11.4 Level of Significance
There is no standard or universal level of significance for testing hypotheses. In some instances, a 5 percent or a 1 percent level of significance is used. The level of significance should, however, be kept small: the higher the significance level, the higher the probability of rejecting a null hypothesis when it is true. The level of significance is the size of the Type-I error (usually either 5% or 1%), and it is to be fixed in advance, before collecting the sample information.

11.5 Critical Region
A region, corresponding to a statistic t, in the sample space S which amounts to rejection of H0 is termed the region of rejection or critical region. If ω is the critical region and t is a statistic based on a sample of size n, then P (t ∈ ω | H0) = α. That is, the null hypothesis is rejected if the observed value of the statistic falls in the critical region. The boundary value of the critical region is called the critical value. Let it be Zα.

11.6 One-sided and Two-sided Tests
In any test, the critical region is represented by a portion of the area under the probability curve of the sampling distribution of the statistic. A statistical test in which the alternative hypothesis is one-sided (left-sided or right-sided) is called a one-sided test. For example, a test of the mean of a population, H0: µ = µ0, against the alternative hypothesis H1: µ < µ0 (left-sided) or H1: µ > µ0 (right-sided) is a one-sided test, whereas a test of H0 against H1: µ ≠ µ0 (two-sided) is known as a two-sided test.

11.7 Test Statistic
A statistical test is conducted by means of a test statistic, whose probability distribution is determined under the assumption that the null hypothesis is true. It is based on the statistic, the expected value of the statistic (the hypothesized value assumed in H0) and the standard error of the statistic. The value of the test statistic computed from the observed data is called the observed value of the test statistic. Let it be Z; we use this value for arriving at the conclusion.

11.8 Conclusion
The conclusion is arrived at by comparing the observed value of the test statistic with the critical value. If Z ≤ Zα, we conclude that there is no evidence against the null hypothesis H0 and hence it may be accepted. If Z > Zα, we conclude that there is evidence against the null hypothesis H0 and in favor of H1. Hence, H0 is rejected and H1 is accepted.

12. ANALYSIS OF VARIANCE
Analysis of variance is a powerful statistical tool in tests of significance. Under parametric tests, we discussed the statistical tests relating to the mean of a population or the equality of the means of two populations. When we have three or more samples to consider at a time, an alternative procedure is needed for testing the hypothesis that all the samples are drawn from populations which have the same mean. Analysis of variance (ANOVA) was introduced by R.A. Fisher to deal with problems in the analysis of agricultural data. Variation in the observations is inherent in nature. The total variation in the observed data is due to two kinds of causes, namely (i) assignable causes and (ii) chance causes. By this technique, the total variation in the sample data can be split into variation between samples and variation within samples. The second kind of variation is due to experimental error. These tests are very much applicable in agricultural field experiments, where one wants to compare the yields obtained with different kinds of seeds, fertilizers, pesticides, irrigation, cultivation methods etc. Accordingly, there are different types of ANOVA tests, which are provided in Chapter 3. In ANOVA tests, we need the following terms with their definitions:

12.1 Treatments
The various factors or methods that we adopt in a comparative experiment are termed treatments. For example, in field experiments, different varieties of paddy seeds, different kinds of fertilizers, different methods of cultivation etc. are called treatments.

12.2 Experimental Unit
A small area of the experimental material to which a treatment is applied is called an experimental unit. In agricultural experiments, the cultivated land, usually called the experimental material, is divided into smaller plots to which different treatments can be applied. Such plots are called experimental units.
12.3 Blocks
In field experiments, the experimental material is first divided into relatively homogeneous divisions, known as blocks. All the blocks are further divided into small plots, the experimental units.

12.4 Replication
The repetition of the treatments on the experimental units under investigation is called replication. In agricultural experiments, each block receives all the treatments, so the same treatments are repeated as many times as there are blocks. Hence, in the analysis, the number of blocks is the same as the number of replications.

12.5 Randomization
The allocation of the various treatments to the experimental units in a random manner is called randomization. Different kinds of randomization are adopted in the ANOVA tests, namely complete randomization, randomization within blocks, row-wise, column-wise etc., according to the type of experimental design.

13. MULTIVARIATE DATA ANALYSIS
The analysis of data on more than one character (variable), usually called multivariate analysis, plays an important role in the theory of statistics. Such data are two-dimensional. For example, in a study of the physical characters age (X1), height (X2) and weight (X3) of ‘N’ individuals, the data can be arranged in the form of a matrix of order 3 × N, one direction being the sample numbers and the other the variables. Hence, matrix theory has a major role in multivariate data analysis, and readers should have a knowledge of matrix algebra. The tests of significance relating to multivariate data are provided in Chapter 4.

14. NON-PARAMETRIC METHODS
The hypothesis tests mentioned above make inferences about population parameters. These parametric tests use the statistics of samples drawn from the population being tested. For those tests, we made assumptions about the population from which the samples were drawn. There are tests which do not require any restriction or assumption about the population from which we sample. They are known as distribution-free or non-parametric tests. The hypotheses of non-parametric tests are concerned with something other than the value of a population parameter. The different kinds of non-parametric tests are discussed in Chapter 5.

15. SEQUENTIAL TESTS
The statistical tests mentioned earlier are based on a fixed sample size. That is, the number of sample observations for those tests is a constant. In sequential tests, however, the number of observations required depends on the outcome of the observations and is therefore not pre-determined, but a random variable. The sequential test of a hypothesis H0 against H1 is described as follows. At each stage of the experiment, a sample observation is drawn and one of the following three decisions is made: (i) accept H0, (ii) reject H0 (or accept H1), or (iii) continue the experiment by making an additional observation. Thus, such a test procedure is carried out sequentially. Some of the sequential tests are provided in Chapter 6.
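The three-way decision rule of a sequential test, described in Section 15, can be sketched in code. The example below is not taken from the text: it assumes Wald's sequential probability ratio test for a Bernoulli proportion with simple hypotheses H0: P = p0 and H1: P = p1 (p1 > p0), using Wald's approximate stopping boundaries. The function name and the inputs are hypothetical.

```python
import math


def sprt_bernoulli(observations, p0, p1, alpha, beta):
    """Sequential test of H0: P = p0 against H1: P = p1 for 0/1 observations.

    Assumes Wald's approximate boundaries on the log-likelihood ratio.
    Returns (decision, number of observations used).
    """
    upper = math.log((1.0 - beta) / alpha)  # reject H0 at or above this value
    lower = math.log(beta / (1.0 - alpha))  # accept H0 at or below this value
    llr = 0.0  # running log-likelihood ratio of H1 to H0
    for count, x in enumerate(observations, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        if llr >= upper:
            return "reject H0", count
        if llr <= lower:
            return "accept H0", count
    # Neither boundary was crossed: one more observation is needed.
    return "continue sampling", len(observations)


if __name__ == "__main__":
    print(sprt_bernoulli([1, 1, 1, 1], p0=0.1, p1=0.3, alpha=0.05, beta=0.05))
```

The number of observations used is not fixed in advance; it depends on when the running ratio first crosses one of the two boundaries.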
CHAPTER – 2
PARAMETRIC TESTS
TEST – 1
TEST FOR A POPULATION PROPORTION
Aim
To test whether the population proportion P may be regarded as P0, based on a random sample. That is, to investigate the significance of the difference between the observed sample proportion p and the assumed population proportion P0.

Source
If X is the number of occurrences of an event in n independent trials with constant probability P of occurrence of that event in each trial, then E(X) = nP and V(X) = nPQ, where Q = 1 – P is the probability of non-occurrence of that event. It has been proved that, for large n, the binomial distribution tends to the normal distribution. Hence, the normal test can be applied. In a random sample of size n, let X be the number of persons possessing the given attribute. Then the observed proportion in the sample is p = X/n, so that E(p) = P and

$$\mathrm{S.E.}(p) = \sqrt{\mathrm{Var}(p)} = \sqrt{\frac{P(1 - P)}{n}}.$$
Assumption
The sample size must be sufficiently large (i.e., n > 30) to justify the normal approximation to the binomial distribution.

Null Hypothesis
H0: The population proportion P may be regarded as P0. That is, there is no significant difference between the observed sample proportion p and the assumed population proportion P0, i.e., H0: P = P0.

Alternative Hypotheses
H1(1): P ≠ P0
H1(2): P > P0
H1(3): P < P0
Level of Significance ( α ) and Critical Region
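(1) | Z | > | Zα | such that P { | Z | > | Zα | } = α
[Figure: standard normal curve with rejection regions of area α/2 in each tail, to the left of –Zα/2 and to the right of Zα/2]

(2) Z > Zα such that P {Z > Zα} = α
[Figure: standard normal curve with a rejection region of area α in the right tail, beyond Zα]

(3) Z < –Zα such that P {Z < –Zα} = α
[Figure: standard normal curve with a rejection region of area α in the left tail, below –Zα]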
Critical Values (Zα)

                        Level of significance (α)
Critical value (Zα)     1%            5%            10%
1. Two-sided test       Zα = 2.58     Zα = 1.96     Zα = 1.645
2. Right-sided test     Zα = 2.33     Zα = 1.645    Zα = 1.28
3. Left-sided test      Zα = –2.33    Zα = –1.645   Zα = –1.28
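The tabulated critical values can be reproduced, up to rounding, from the inverse of the standard normal distribution function. The following is a minimal sketch using Python's standard library; the function name `critical_value` and the tail labels are assumptions made for this illustration.

```python
from statistics import NormalDist


def critical_value(alpha, tail):
    """Critical value of the standard normal test at level alpha.

    tail is "two-sided", "right" or "left"; the left-sided value is negative.
    """
    std_normal = NormalDist()
    if tail == "two-sided":
        return std_normal.inv_cdf(1.0 - alpha / 2.0)
    if tail == "right":
        return std_normal.inv_cdf(1.0 - alpha)
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two-sided', 'right' or 'left'")


if __name__ == "__main__":
    for tail in ("two-sided", "right", "left"):
        print(tail, [round(critical_value(a, tail), 3) for a in (0.01, 0.05, 0.10)])
```

For the two-sided test the level α is shared equally between the two tails, which is why the value is taken at 1 – α/2.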
Test Statistic
$$Z = \frac{p - P}{\sqrt{\dfrac{P(1 - P)}{n}}} \qquad (\text{under } H_0: P = P_0)$$
The statistic Z follows the standard normal distribution.

Conclusions
1. If |Z| ≤ Zα, we conclude that the data do not provide any evidence against the null hypothesis H0. Hence, it may be accepted at the α% level of significance. Otherwise, reject H0 and accept H1(1).
2. If Z ≤ Zα, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at the α% level of significance. Otherwise, reject H0 and accept H1(2).
3. If Z ≥ –Zα (that is, Z does not fall below the negative critical value tabulated for the left-sided test), we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at the α% level of significance. Otherwise, reject H0 and accept H1(3).

Example 1
Hindustan Lever Ltd. expects that more than 30% of the households in Delhi will consume its product if it manufactures a new face cream. A random sample of 500 households from the city is surveyed, and 163 are in favor of the product. Examine whether the expectation of the company would be met, at the 5% level.

Solution
Aim: To test whether the new face cream manufactured by the HLL Company will be consumed by 30% of the households in Delhi, or by more.
H0: The new face cream will be consumed by 30% of the households in Delhi, i.e., H0: P = 0.3.
H1: The new face cream will be consumed by more than 30% of the households in Delhi, i.e., H1: P > 0.3.
Level of Significance: α = 0.05 and Critical Value: Zα = 1.645
Based on the above data, we observe that n = 500 and p = 163/500 = 0.326.
Test Statistic:

$$Z = \frac{p - P}{\sqrt{\dfrac{P(1 - P)}{n}}} = \frac{0.326 - 0.3}{\sqrt{\dfrac{(0.3)(0.7)}{500}}} = 1.27 \qquad (\text{under } H_0: P = 0.3)$$

Conclusion: Since Z < Zα, we conclude that the data do not provide any evidence against the null hypothesis H0. Hence, accept H0 at the 5% level of significance. That is, the data are consistent with 30% of the households in Delhi consuming the new face cream; the company's expectation of more than 30% is not supported.

Example 2
A plastic surgery department wants to know the necessity of mesh repair of hernia. They think that only 15% of hernia patients need mesh. In a sample of 250 hernia patients from hospitals, only 42 needed mesh. Test, at the 2% level of significance, whether the expectation of the department for mesh repair of hernia patients is true.

Solution
Aim: To test whether the necessity of hernia repair with mesh is 15% or not.
H0: The necessity of mesh repair of hernia is 15%, i.e., H0: P = 0.15.
H1: The necessity of mesh repair of hernia is not 15%, i.e., H1: P ≠ 0.15.
Level of Significance: α = 0.02 and Critical Value: Zα = 2.33
Based on the above data, we observe that n = 250 and p = 42/250 = 0.168.
Test Statistic:

$$Z = \frac{p - P}{\sqrt{\dfrac{P(1 - P)}{n}}} = \frac{0.168 - 0.15}{\sqrt{\dfrac{(0.15)(0.85)}{250}}} = 0.80 \qquad (\text{under } H_0: P = 0.15)$$

Conclusion: Since |Z| < Zα, we conclude that the data do not provide any evidence against the null hypothesis H0. Hence, accept H0 at the 2% level of significance. That is, the necessity of mesh repair of hernia may be taken as 15%, as expected by the plastic surgery department.
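The arithmetic of the two worked examples can be checked with a few lines of Python. This is only a sketch of the test statistic of Test 1; the function name `proportion_z` is hypothetical, and the critical values used in the comparison are the ones quoted in the examples.

```python
import math


def proportion_z(successes, n, P0):
    """Z = (p - P0) / sqrt(P0 * (1 - P0) / n), with p = successes / n."""
    p = successes / n
    standard_error = math.sqrt(P0 * (1.0 - P0) / n)
    return (p - P0) / standard_error


if __name__ == "__main__":
    # Example 1: right-sided test at the 5% level, critical value 1.645.
    z1 = proportion_z(163, 500, 0.30)
    print(round(z1, 2), z1 > 1.645)       # 1.27 False, so H0 is not rejected

    # Example 2: two-sided test at the 2% level, critical value 2.33.
    z2 = proportion_z(42, 250, 0.15)
    print(round(z2, 2), abs(z2) > 2.33)   # 0.8 False, so H0 is not rejected
```

Note that the standard error is computed from the hypothesized proportion P0, not from the observed proportion p, because the distribution of Z is derived under H0.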
EXERCISES
1. A random sample of 400 apples was taken from a large consignment and 35 were found to be bad. Examine, at the 1% level, whether the proportion of bad items in the lot may be regarded as 7%.
2. 150 people were attacked by a disease, of whom 5 died. Will you reject the hypothesis that the death rate, if attacked by this disease, is 3% against the hypothesis that it is more, at the 5% level?
TEST – 2
TEST FOR A POPULATION MEAN (Population Variance is Known)
Aim
To test whether the population mean µ may be regarded as µ0, based on a random sample. That is, to investigate the significance of the difference between the sample mean X̄ and the assumed population mean µ0.

Source
Let X̄ be the mean of a random sample of n independent observations drawn from a population whose mean µ is unknown and whose variance σ² is known.

Assumptions
(i) The population from which the sample is drawn is assumed to follow a normal distribution.
(ii) The population variance σ² is known.

Null Hypothesis
H0: The sample has been drawn from a population with mean µ0. That is, there is no significant difference between the sample mean X̄ and the assumed population mean µ0, i.e., H0: µ = µ0.

Alternative Hypotheses
H1(1): µ ≠ µ0
H1(2): µ > µ0
H1(3): µ < µ0

Level of Significance (α) and Critical Region
(As in Test 1)
Test Statistic
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \qquad (\text{under } H_0: \mu = \mu_0)$$
The Statistic Z follows Standard Normal distribution. Conclusions
(As in Test 1)
Example 1
The daily wages of a factory's workers are assumed to be normally distributed. A random sample of 50 workers has an average daily wage of rupees 120. Test whether the average daily wage in that factory may be regarded as rupees 125, with a standard deviation of rupees 20, at 5% level of significance.
Solution
Aim: To test the null hypothesis that the average daily wage of the factory's workers may be regarded as rupees 125, with a standard deviation of rupees 20.
H0: The average daily wage of the factory's workers is 125 rupees. i.e., H0: µ = 125.
H1: The average daily wage of the factory's workers is not 125 rupees. i.e., H1: µ ≠ 125.
Level of Significance: α = 0.05 and Critical Value: Zα/2 = 1.96
Test Statistic:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{120 - 125}{20/\sqrt{50}} = -1.77 \quad \text{(under } H_0: \mu = 125\text{)}$$
Conclusion: Since the observed value of the test statistic, |Z| = 1.77, is smaller than the critical value 1.96 at 5% level of significance, the data do not provide any evidence against the null hypothesis H0. Hence H0 is accepted, and we conclude that the average daily wage of the factory's workers may be regarded as rupees 125 with a standard deviation of rupees 20.
Example 2
A bulb manufacturing company hypothesizes that the average life of its product is 1,450 hours. The standard deviation of bulb life is known to be 210 hours. From a sample of 100 bulbs, the company finds a sample mean of 1,390 hours. At 1% level of significance, should the company conclude that the average life of the bulbs is less than the hypothesized 1,450 hours?
Solution
Aim: To test whether the average life of the bulbs may be regarded as 1,450 hours or less.
H0: The average life of bulbs is 1,450 hours. i.e., H0: µ = 1450.
H1: The average life of bulbs is below 1,450 hours. i.e., H1: µ < 1450.
Level of Significance: α = 0.01 and Critical Value: –Zα = –2.33
Test Statistic:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{1390 - 1450}{210/\sqrt{100}} = -2.86 \quad \text{(under } H_0: \mu = 1450\text{)}$$
Conclusion: Since the observed value of the test statistic, Z = –2.86, is smaller than the critical value –2.33 at 1% level of significance, the data provide evidence against the null hypothesis H0 and in favor of H1. Hence H1 is accepted, and we conclude that the average life of the bulbs is significantly less than the hypothesized 1,450 hours.
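The two worked examples of Test 2 can be checked with a few lines of Python. This is only a minimal sketch: the function names `z_statistic` and `critical_z` are illustrative choices, not part of the text, and the critical values are computed from the standard library's `statistics.NormalDist` instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(sample_mean, mu0, sigma, n):
    """Z = (sample mean - mu0) / (sigma / sqrt(n)), with sigma known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


def critical_z(alpha, two_sided):
    """Upper critical value of the standard normal distribution."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


# Example 1: two-sided test of H0: mu = 125 at the 5% level.
z1 = z_statistic(120, 125, 20, 50)
reject1 = abs(z1) > critical_z(0.05, two_sided=True)

# Example 2: left-sided test of H0: mu = 1450 at the 1% level.
z2 = z_statistic(1390, 1450, 210, 100)
reject2 = z2 < -critical_z(0.01, two_sided=False)

print(round(z1, 2), reject1)  # -1.77 False
print(round(z2, 2), reject2)  # -2.86 True
```

For a right-sided alternative the rejection rule would be `z > critical_z(alpha, two_sided=False)`.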
EXERCISES
1. A film producer knows that his movies ran for an average of 100 days in the cities of Tamilnadu, with a standard deviation of 8 days. A researcher randomly chose 80 theatres in the southern districts and found that they ran the movie for an average of 86 days. Test the hypothesis at 2% significance level.
2. A sample of 50 children observed from rural areas of a district has an average birth weight of 2.85 kg. Past records show that the standard deviation of birth weight in the district is 0.3 kg. Can we expect that the average birth weight of the children in the district will be more than 3 kg at 5% level?
TEST – 3
TEST FOR A POPULATION MEAN (Population Variance is Unknown)
Aim
To test whether the population mean µ may be regarded as µ0, based on a random sample. That is, to investigate the significance of the difference between the sample mean $\bar{X}$ and the assumed population mean µ0.
Source
A random sample of n observations $X_i$ (i = 1, 2,…, n) is drawn from a population whose mean µ and variance $\sigma^2$ are unknown.
Assumptions
(i) The population from which the sample is drawn is Normal.
(ii) The population variance $\sigma^2$ is unknown. (Since $\sigma^2$ is unknown, it is replaced by its unbiased estimate $S^2$.)
Null Hypothesis
H0: The sample has been drawn from a population with mean µ0. That is, there is no significant difference between the sample mean $\bar{X}$ and the assumed population mean µ0. i.e., H0: µ = µ0.
Alternative Hypotheses
H1(1): µ ≠ µ0
H1(2): µ > µ0
H1(3): µ < µ0
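Level of Significance (α) and Critical Region
(1) $|t| > t_{\alpha/2,\,n-1}$ such that $P\{|t| > t_{\alpha/2,\,n-1}\} = \alpha$ (two-sided: area α/2 in each tail, below $-t_{\alpha/2,\,n-1}$ and above $t_{\alpha/2,\,n-1}$)
(2) $t > t_{\alpha,\,n-1}$ such that $P\{t > t_{\alpha,\,n-1}\} = \alpha$ (right-sided: area α in the upper tail)
(3) $t < -t_{\alpha,\,n-1}$ such that $P\{t < -t_{\alpha,\,n-1}\} = \alpha$ (left-sided: area α in the lower tail)
Critical values ($t_{\alpha,\,n-1}$) are obtained from Table 2.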
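Test Statistic
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} \quad \text{(under } H_0: \mu = \mu_0\text{)}$$
where
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2 .$$
The statistic t follows the t distribution with (n – 1) degrees of freedom.
Conclusions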
1. If $|t| \le t_{\alpha/2,\,n-1}$, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at α level of significance. Otherwise reject H0 or accept H1(1).
2. If $t \le t_{\alpha,\,n-1}$, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at α level of significance. Otherwise reject H0 or accept H1(2).
3. If $t \ge -t_{\alpha,\,n-1}$, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at α level of significance. Otherwise reject H0 or accept H1(3).
Example 1
A sample of 12 students from a school has the following scores in an I.Q. test: 89 87 76 78 79 86 74 83 75 71 76 92. Do these data support the claim that the mean I.Q. mark of the school students is 80? Test at 5% level.
Solution
Aim: To test whether the mean I.Q. mark of the school students may be regarded as 80 or not.
H0: The mean I.Q. mark of the school students is 80. i.e., H0: µ = 80.
H1: The mean I.Q. mark of the school students is not 80. i.e., H1: µ ≠ 80.
Level of Significance: α = 0.05 and Critical Value: $t_{0.025,\,11}$ = 2.20 (two-sided)
Test Statistic: From the data, $\bar{X} = 966/12 = 80.5$ and $S^2 = 495/11 = 45$, so that S = 6.71.
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{80.5 - 80}{6.71/\sqrt{12}} = 0.26 \quad \text{(under } H_0: \mu = 80\text{)}$$
Conclusion: Since |t| < 2.20, we conclude that the data do not provide any evidence against the null hypothesis H0. Hence, accept H0 at 5% level of significance. That is, the mean I.Q. mark of the school students may be regarded as 80.
Example 2
The average breaking strength of steel rods is specified as 22.25 kg. To test this, a sample of 20 rods was examined. The mean and standard deviation obtained were 21.35 kg and 2.25 kg respectively. Is the result of the experiment significant at 5% level?
Solution
Aim: To test whether the specified average breaking strength of 22.25 kg for the steel rods is true or not.
H0: The average breaking strength of steel rods is 22.25 kg, as specified. i.e., H0: µ = 22.25.
H1: The average breaking strength of steel rods is not 22.25 kg. i.e., H1: µ ≠ 22.25.
Level of Significance: α = 0.05 and Critical Value: $t_{0.025,\,19}$ = 2.09 (two-sided)
Test Statistic:
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{21.35 - 22.25}{2.31/\sqrt{20}} = -1.74 \quad \text{(under } H_0: \mu = 22.25\text{)}$$
Here S = 2.31 is the unbiased estimate obtained from the sample standard deviation s = 2.25 as $S = s\sqrt{n/(n-1)} = 2.25\sqrt{20/19}$.
Conclusion: Since |t| < 2.09, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at 5% level of significance. That is, the specified average breaking strength of 22.25 kg for the steel rods may be taken as true.
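The arithmetic of Example 2 can be sketched in Python as follows. The sketch assumes, as the figure 2.31 in the worked solution suggests, that the quoted standard deviation 2.25 was computed with divisor n and must be converted to the unbiased estimate S. The helper names are illustrative, and the critical value 2.09 is simply the tabulated value quoted in the example.

```python
from math import sqrt


def unbiased_sd(sd_divisor_n, n):
    """Convert a standard deviation computed with divisor n to divisor n - 1."""
    return sd_divisor_n * sqrt(n / (n - 1))


def t_statistic(sample_mean, mu0, s_unbiased, n):
    """t = (sample mean - mu0) / (S / sqrt(n)), with (n - 1) degrees of freedom."""
    return (sample_mean - mu0) / (s_unbiased / sqrt(n))


n = 20
S = unbiased_sd(2.25, n)             # assumption: 2.25 uses divisor n
t = t_statistic(21.35, 22.25, S, n)

T_CRIT = 2.09                        # two-sided 5% value for 19 d.f., as quoted in the example
reject = abs(t) > T_CRIT

print(round(S, 2), round(t, 2), reject)  # 2.31 -1.74 False
```

When the raw observations are available, S can instead be obtained directly with `statistics.stdev`, which already uses the divisor n - 1.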
EXERCISES
1. A salesperson says that the average sales of pickle in a week will be 120 units. The sales observed in a sample of 10 weeks were 112 124 110 114 108 114 115 118 125 126. Examine whether the claim of the salesperson is true at 1% significance level.
2. A sample of 10 coconut trees from a grove has the following yields of coconuts in a season: 68 56 47 52 62 70 56 54 63 60. Shall we conclude that the average yield of coconuts from the grove is 65? Test at 2% level.
TEST – 4
TEST FOR A POPULATION VARIANCE (Population Mean is Known)
Aim
To test whether the population variance $\sigma^2$ may be regarded as $\sigma_0^2$, based on a random sample. That is, to investigate the significance of the difference between the assumed population variance $\sigma_0^2$ and the sample variance $s^2$.
Source
A random sample of n observations $X_i$ (i = 1, 2,…, n) is drawn from a normal population with known mean µ and unknown variance $\sigma^2$.
Assumption
The population from which the sample is drawn is Normal.
Null Hypothesis
H0: The population variance $\sigma^2$ is $\sigma_0^2$. That is, there is no significant difference between the assumed population variance $\sigma_0^2$ and the sample variance $s^2$. i.e., H0: $\sigma^2 = \sigma_0^2$.
Alternative Hypotheses
H1(1): $\sigma^2 \ne \sigma_0^2$
H1(2): $\sigma^2 > \sigma_0^2$
H1(3): $\sigma^2 < \sigma_0^2$
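Level of Significance (α) and Critical Region
(1) $\chi^2 < \chi^2_{1-(\alpha/2),\,n}$ or $\chi^2 > \chi^2_{(\alpha/2),\,n}$ such that $P\{\chi^2 < \chi^2_{1-(\alpha/2),\,n} \,\cup\, \chi^2 > \chi^2_{(\alpha/2),\,n}\} = \alpha$ (two-sided: area α/2 in each tail)
(2) $\chi^2 > \chi^2_{\alpha,\,n}$ such that $P\{\chi^2 > \chi^2_{\alpha,\,n}\} = \alpha$ (right-sided: area α in the upper tail)
(3) $\chi^2 < \chi^2_{1-\alpha,\,n}$ such that $P\{\chi^2 < \chi^2_{1-\alpha,\,n}\} = \alpha$ (left-sided: area α in the lower tail)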
The critical values for the left-sided and right-sided tests are obtained from Table 3.
Test Statistic
$$\chi^2 = \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\sigma_0^2}$$
The statistic $\chi^2$ follows the $\chi^2$ distribution with n degrees of freedom.
Conclusions
1. If $\chi^2_{1-(\alpha/2)} \le \chi^2 \le \chi^2_{(\alpha/2)}$, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at α level of significance. Otherwise reject H0 or accept H1(1).
2. If $\chi^2 \le \chi^2_{\alpha}$, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at α level of significance. Otherwise reject H0 or accept H1(2).
3. If $\chi^2 \ge \chi^2_{1-\alpha}$, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at α level of significance. Otherwise reject H0 or accept H1(3).
Example 1
An agriculturist expects that, in a coconut grove, the average yield is 63 coconuts per tree per year with a variance of 20.25. A random sample of 10 coconut trees has the following yields in a year: 76 65 64 56 58 54 62 68 76 78. Test whether the variance may be regarded as 20.25 at 5% level of significance.
Solution
Aim: To test whether the variance of the yield of coconut in the grove may be regarded as 20.25 or not.
H0: The variance of the yield of coconut in the grove is 20.25. i.e., H0: $\sigma^2$ = 20.25
H1: The variance of the yield of coconut in the grove is not 20.25. i.e., H1: $\sigma^2$ ≠ 20.25
Level of Significance: α = 0.05
Critical Values: $\chi^2_{(.975),\,10}$ = 3.247 and $\chi^2_{(.025),\,10}$ = 20.483
Critical Region: $P(\chi^2 < 3.247) + P(\chi^2 > 20.483) = 0.05$
Test Statistic: With µ = 63, the squared deviations $(X_i - 63)^2$ are 169, 4, 1, 49, 25, 81, 1, 25, 169 and 225, whose sum is 749.
$$\chi^2 = \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\sigma_0^2} = \frac{749}{20.25} = 36.99$$
Conclusion: Since $\chi^2 = 36.99 > \chi^2_{(\alpha/2)} = 20.483$, we conclude that the data provide evidence against the null hypothesis H0 and in favor of H1. Hence, H1 is accepted at 5% level of significance. That is, the variance of the yield of coconut in the grove cannot be regarded as 20.25.
Example 2
The variation of birth weight (as measured by the variance) of children in a region is expected to be more than 0.16. The mean birth weight is known to be 2.4 kg. A sample of 11 children is selected, whose birth weights are as follows.
Weight (in kg): 2.7 2.5 2.6 2.6 2.7 2.5 2.5 2.3 2.4 2.3 2.5
Set up the hypotheses and test this expectation at 5% level of significance.
Solution
Aim: To test whether the variance of the birth weight of the children is 0.16 or more.
H0: The variance of the birth weight of children in the region is 0.16. i.e., H0: $\sigma^2$ = 0.16
H1: The variance of the birth weight of children in the region is more than 0.16. i.e., H1: $\sigma^2$ > 0.16
Level of Significance: α = 0.05 and Critical Value: $\chi^2_{0.05,\,11}$ = 19.675
Test Statistic: With µ = 2.4, the sum of the squared deviations $(X_i - 2.4)^2$ is 0.32.
$$\chi^2 = \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\sigma_0^2} = \frac{0.32}{0.16} = 2.00$$
Conclusion: Since $\chi^2 < \chi^2_{\alpha}$, we conclude that the data do not provide any evidence against the null hypothesis H0. Hence, H0 is accepted at 5% level of significance. That is, the variance of the birth weight of children in the region may be regarded as 0.16.
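The mechanics of Test 4 can be sketched in Python as follows. The data, the known mean 50 and the hypothesized variance 9 below are hypothetical values chosen only to illustrate the arithmetic; they do not come from the text. The two critical values are the tabulated 5% two-sided values for 10 degrees of freedom quoted in Example 1, and the function names are illustrative.

```python
def chi_square_known_mean(data, mu, sigma0_sq):
    """chi2 = sum (Xi - mu)^2 / sigma0^2, with n degrees of freedom (mu known)."""
    return sum((x - mu) ** 2 for x in data) / sigma0_sq


def reject_two_sided(chi2, lower_crit, upper_crit):
    """Reject H0 when the statistic falls in either tail."""
    return chi2 < lower_crit or chi2 > upper_crit


# Hypothetical sample of n = 10 observations (illustration only).
data = [47, 53, 52, 48, 50, 55, 46, 51, 49, 52]
MU = 50          # assumed known population mean
SIGMA0_SQ = 9    # hypothesized variance under H0

chi2 = chi_square_known_mean(data, MU, SIGMA0_SQ)

# Tabulated values for 10 d.f. at the 5% level (two-sided), as quoted in Example 1.
LOWER, UPPER = 3.247, 20.483
reject = reject_two_sided(chi2, LOWER, UPPER)

print(round(chi2, 2), reject)  # 8.11 False
```

Note that the degrees of freedom equal n here, not n - 1, because the population mean is known and is not estimated from the sample.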
EXERCISES
1. A psychologist is aware of studies showing that the mean and variability (measured as variance) of the attention spans of 5-year-olds can be summarized as 80 and 64 minutes respectively. She wants to study whether the variability of attention span of 6-year-olds is different. A sample of 20 6-year-olds has the following attention spans in minutes: 86 89 84 78 75 74 85 71 84 71 75 68 75 71 82 85 81 78 79 78. State explicit null and alternative hypotheses and test at 5% level.
2. The average and variance of the daily expenditure of office-going women are known to be Rs.30 and Rs.10 respectively. A sample of 10 office-going women is selected, whose daily expenditures are 35 33 40 30 25 28 35 28 35 40. Test whether the variance of the daily expenditure of office-going women is 10 at 1% level of significance.
TEST – 5
TEST FOR A POPULATION VARIANCE (Population Mean is Unknown)
Aim
To test whether the population variance $\sigma^2$ may be regarded as $\sigma_0^2$, based on a random sample. That is, to investigate the significance of the difference between the assumed population variance $\sigma_0^2$ and the sample variance $s^2$.
Source
A random sample of n observations $X_i$ (i = 1, 2,…, n) is drawn from a normal population with mean µ and variance $\sigma^2$ (both unknown). The unknown population mean µ is estimated by its unbiased estimate $\bar{X}$.
Assumption
The population from which the sample is drawn is Normal.
Null Hypothesis
H0: The population variance $\sigma^2$ is $\sigma_0^2$. That is, there is no significant difference between the assumed population variance $\sigma_0^2$ and the sample variance $s^2$. i.e., H0: $\sigma^2 = \sigma_0^2$.
Alternative Hypotheses
H1(1): $\sigma^2 \ne \sigma_0^2$
H1(2): $\sigma^2 > \sigma_0^2$
H1(3): $\sigma^2 < \sigma_0^2$
Level of Significance (α) and Critical Region
(As in Test 4, with (n – 1) degrees of freedom in place of n)
Test Statistic
$$\chi^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma_0^2}$$
The statistic $\chi^2$ follows the $\chi^2$ distribution with (n – 1) degrees of freedom.
Conclusions
(As in Test 4)
Example 1
A Statistics professor conducted an examination for a class of 31 freshmen and sophomores. The mean score was 72.7 and the sample standard deviation was 15.9. Past experience leads the professor to believe that a standard deviation of about 13 points on a 100-point examination indicates that the exam does a good job. Does this exam meet his goodness criterion at 10% level?
Solution
Aim: To test whether the examination meets the professor's goodness criterion or not.
H0: The variance of the score on the exam may be regarded as 13² (= 169). i.e., H0: $\sigma^2$ = 169
H1: The variance of the score on the exam is not 169. i.e., H1: $\sigma^2$ ≠ 169
Level of Significance: α = 0.10
Critical Values: $\chi^2_{(.95),\,30}$ = 18.493 and $\chi^2_{(.05),\,30}$ = 43.773
Critical Region: $P(\chi^2 < 18.493) + P(\chi^2 > 43.773) = 0.10$
Test Statistic:
$$\chi^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma_0^2} = \frac{n s^2}{\sigma_0^2} = \frac{31 \times (15.9)^2}{13^2} = 46.37$$
Conclusion: Since $\chi^2 > \chi^2_{(\alpha/2)}$, we conclude that the data provide evidence against the null hypothesis H0 and in favor of H1. Hence, H1 is accepted at 10% level of significance. That is, this examination does not meet his goodness criterion of a standard deviation of 13.
Example 2
The variance of daily sales in a vegetable mart is reported as Rs.100. A sample of 20 days was observed, with a variance of Rs.160. Test whether the variance of the sales in the vegetable mart may be regarded as Rs.100 or not at 5% level of significance.
Solution
Aim: To test whether the variance of the sales in the vegetable mart may be regarded as Rs.100 or not.
H0: The variance of the sales in the vegetable mart is Rs.100. i.e., H0: $\sigma^2$ = 100
H1: The variance of the sales in the vegetable mart is not Rs.100. i.e., H1: $\sigma^2$ ≠ 100
Level of Significance: α = 0.05
Critical Values: $\chi^2_{(.975),\,19}$ = 8.907 and $\chi^2_{(.025),\,19}$ = 32.852
Critical Region: $P(\chi^2 < 8.907) + P(\chi^2 > 32.852) = 0.05$
Test Statistic:
$$\chi^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma_0^2} = \frac{n s^2}{\sigma_0^2} = \frac{20 \times 160}{100} = \frac{3200}{100} = 32$$
Conclusion: Since $\chi^2_{1-(\alpha/2)} < \chi^2 < \chi^2_{(\alpha/2)}$, we conclude that the data do not provide any evidence against the null hypothesis H0. Hence, H0 is accepted at 5% level of significance. That is, the variance of the sales in the vegetable mart may be regarded as Rs.100.
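Both examples of Test 5 start from summary figures, so the statistic reduces to n s² / σ0². The sketch below reproduces them in Python. It assumes, as the worked solutions do, that the quoted sample variance uses divisor n, so that n s² equals the sum of squared deviations from the sample mean. The function names are illustrative, and the critical values are the tabulated ones quoted in the examples.

```python
def chi_square_from_summary(n, sample_variance, sigma0_sq):
    """chi2 = n * s^2 / sigma0^2, where s^2 is the sample variance with divisor n."""
    return n * sample_variance / sigma0_sq


def reject_two_sided(chi2, lower_crit, upper_crit):
    """Reject H0 when the statistic falls in either tail."""
    return chi2 < lower_crit or chi2 > upper_crit


# Example 1: n = 31, s = 15.9, sigma0 = 13, 10% level, 30 d.f.
chi2_exam = chi_square_from_summary(31, 15.9 ** 2, 13 ** 2)
reject_exam = reject_two_sided(chi2_exam, 18.493, 43.773)

# Example 2: n = 20, s^2 = 160, sigma0^2 = 100, 5% level, 19 d.f.
chi2_sales = chi_square_from_summary(20, 160, 100)
reject_sales = reject_two_sided(chi2_sales, 8.907, 32.852)

print(round(chi2_exam, 2), reject_exam)    # 46.37 True
print(round(chi2_sales, 2), reject_sales)  # 32.0 False
```

In Example 2 the statistic 32 lies only just below the upper critical value 32.852, so the decision is sensitive to the level chosen.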
EXERCISES
1. A manufacturer claims that the lifetime of a certain brand of batteries produced by his company has a variance of more than 6800 hours. A sample of 20 batteries selected from the production department of that company has a variance of 5000 hours. Test the manufacturer's claim at 5% level.
2. A manufacturer recorded the cut-off bias (volts) of a sample of 10 tubes as follows: 21.9 22.2 22.2 22.1 22.3 21.8 22.0 22.4 22.0 22.1. The variability of cut-off bias for tubes of a standard type, as measured by the standard deviation, is 0.210 volts. Is the variability of the new tube with respect to cut-off bias less than that of the standard type at 1% level?
TEST – 6
TEST FOR GOODNESS OF FIT
Aim
To test whether the observed frequencies are a good fit to the theoretical frequencies. That is, to investigate the significance of the difference between the observed frequencies and the expected frequencies, arranged in K classes.
Source
Let $O_i$ (i = 1, 2,…, K) be a set of observed frequencies in K classes based on an experiment, and let $E_i$ (i = 1, 2,…, K) be the corresponding set of expected (theoretical or hypothetical) frequencies.
Assumptions
(i) The observed frequencies in the K classes should be independent.
(ii) $\sum_{i=1}^{K} O_i = \sum_{i=1}^{K} E_i = N$.
(iii) The total frequency N should be sufficiently large (i.e., N > 50).
(iv) Each expected frequency in the K classes should be at least 5.
Null Hypothesis
H0: The observed frequencies are a good fit to the theoretical frequencies. That is, there is no significant difference between the observed frequencies and the expected frequencies, arranged in K classes.
Alternative Hypothesis
H1: The observed frequencies are not a good fit to the theoretical frequencies. That is, there is a significant difference between the observed frequencies and the expected frequencies, arranged in K classes.
Level of Significance (α) and Critical Region
$\chi^2 > \chi^2_{\alpha,\,(K-1)}$ such that $P\{\chi^2 > \chi^2_{\alpha,\,(K-1)}\} = \alpha$
Test Statistic
$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}$$
The statistic $\chi^2$ follows the $\chi^2$ distribution with (K – 1) degrees of freedom.
Conclusion
If $\chi^2 \le \chi^2_{\alpha,\,(K-1)}$, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at α level of significance. Otherwise reject H0 or accept H1.
Example 1
The sales of milk from a milk booth vary from day to day. A sample of one week's sales (number of liters) is observed as follows.
Day:    Monday  Tuesday  Wednesday  Thursday  Friday  Saturday  Sunday
Sales:  154     145      152        140       135     165       173
Examine whether the sales of milk are the same over the entire week at 1% level of significance.
Solution
Aim: To test whether the sales of milk are the same over the entire week or not.
H0: The sales of milk are the same over the entire week.
H1: The sales of milk are not the same over the entire week.
Level of Significance: α = 0.01
Critical Value: $\chi^2_{0.01,\,6}$ = 16.812
Under H0 each expected frequency is 1064/7 = 152.
Day         Observed (Oi)   Expected (Ei)   (Oi − Ei)²   (Oi − Ei)²/Ei
Monday      154             152             4            0.0263
Tuesday     145             152             49           0.3224
Wednesday   152             152             0            0.0000
Thursday    140             152             144          0.9474
Friday      135             152             289          1.9013
Saturday    165             152             169          1.1118
Sunday      173             152             441          2.9013
Total       1064            1064                         7.2105
Test Statistic:
$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i} = 7.2105$$
Conclusion: Since $\chi^2 < \chi^2_{\alpha,\,(K-1)}$, we conclude that the data do not provide any evidence against the null hypothesis H0. Hence, H0 is accepted at 1% level of significance. That is, the sales of milk are the same over the entire week.
Example 2
In an experiment on pea breeding, Mendel obtained the following frequencies from 560 seeds: 312 round and yellow (RY), 104 wrinkled and yellow (WY), 112 round and green (RG), and 32 wrinkled and green (WG). Theory predicts that the frequencies should be in the proportion 9:3:3:1 respectively. Set up the hypothesis and test it at 1% level.
Solution
Aim: To test whether the observed frequencies of the pea breeding are in the ratio 9:3:3:1.
H0: The observed frequencies of the pea breeding are in the ratio 9:3:3:1.
H1: The observed frequencies of the pea breeding are not in the ratio 9:3:3:1.
Level of Significance: α = 0.01
Critical Value: $\chi^2_{0.01,\,3}$ = 11.345
Seed type   Observed (Oi)   Expected (Ei)   (Oi − Ei)²   (Oi − Ei)²/Ei
RY          312             315             9            0.0286
WY          104             105             1            0.0095
RG          112             105             49           0.4667
WG          32              35              9            0.2571
Total       560             560                          0.7619
Test Statistic:
$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i} = 0.7619$$
Conclusion: Since $\chi^2 < \chi^2_{\alpha,\,(K-1)}$, we conclude that the data do not provide any evidence against the null hypothesis H0. Hence, H0 is accepted at 1% level of significance. That is, the observed frequencies of the pea breeding are in the ratio 9:3:3:1.
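The two goodness-of-fit tables can be reproduced with a short Python sketch. The function names `chi_square_gof` and `expected_from_ratio` are illustrative, not part of the text; the critical values are the tabulated ones quoted in the examples.

```python
def chi_square_gof(observed, expected):
    """chi2 = sum (Oi - Ei)^2 / Ei over the K classes, with (K - 1) d.f."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_from_ratio(total, ratio):
    """Split a total frequency according to a theoretical ratio such as 9:3:3:1."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


# Example 1: milk sales, equal expected sales on each of the 7 days.
sales = [154, 145, 152, 140, 135, 165, 173]
expected_sales = [sum(sales) / len(sales)] * len(sales)   # 152 for each day
chi2_milk = chi_square_gof(sales, expected_sales)

# Example 2: pea seeds in the ratio 9:3:3:1.
seeds = [312, 104, 112, 32]
expected_seeds = expected_from_ratio(sum(seeds), [9, 3, 3, 1])  # 315, 105, 105, 35
chi2_peas = chi_square_gof(seeds, expected_seeds)

print(round(chi2_milk, 4), chi2_milk > 16.812)  # 7.2105 False
print(round(chi2_peas, 4), chi2_peas > 11.345)  # 0.7619 False
```

Before relying on the result, it is worth checking the assumptions stated above, in particular that every expected frequency is at least 5.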
EXERCISES
1. A chemical extraction plant processes seawater to collect sodium chloride and magnesium. It is known that seawater contains sodium chloride, magnesium and other elements in the ratio 62:4:34. A sample of 300 tonnes of seawater has yielded 195 tonnes of sodium chloride and 9 tonnes of magnesium. Are these data consistent with the known composition of seawater at 10% level?
2. Among 80 offspring of a certain cross between guinea pigs, 42 were red, 16 were black and 22 were white. According to the genetic model, these numbers should be in the ratio 9:3:4. Are these data consistent with the model at 1% level of significance?
TEST – 7
TEST FOR EQUALITY OF TWO POPULATION PROPORTIONS
Aim
To test whether the two population proportions $P_1$ and $P_2$ are equal, based on two random samples. That is, to investigate the significance of the difference between the two sample proportions $p_1$ and $p_2$.
Source
In a random sample of $n_1$ observations, $X_1$ observations possess an attribute A, so that the sample proportion is $p_1 = X_1/n_1$. Let the corresponding proportion in the population be denoted by $P_1$, which is unknown. In another sample of $n_2$ observations, $X_2$ observations possess the attribute A, so that the sample proportion is $p_2 = X_2/n_2$. Let the corresponding proportion in the population be denoted by $P_2$, which is unknown.
Assumption
The sizes of the two samples are sufficiently large (i.e., $n_1, n_2 \ge 30$) to justify the normal approximation to the binomial.
Null Hypothesis
H0: The two population proportions $P_1$ and $P_2$ are equal. That is, there is no significant difference between the two sample proportions $p_1$ and $p_2$. i.e., H0: $P_1 = P_2$.
Alternative Hypotheses
H1(1): $P_1 \ne P_2$
H1(2): $P_1 > P_2$
H1(3): $P_1 < P_2$
Level of Significance (α) and Critical Region
(As in Test 1)
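Test Statistic
$$Z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{\hat{P}(1 - \hat{P})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \quad \text{(under } H_0: P_1 = P_2\text{)}$$
where
$$\hat{P} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}.$$
The statistic Z follows the Standard Normal distribution.
Conclusions
(As in Test 1)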
Example 1
Random samples of 300 male and 400 female students were asked whether they would like the CBCS system to be introduced in their university. 160 male and 230 female students were in favor of the proposal. Test the hypothesis that the proportions of male and female students in favor of the proposal are equal, at 2% level.
Solution
Aim: To test whether the proportions of male and female students in favor of introducing the CBCS system in their university are equal or not.
H0: The proportions of male ($P_1$) and female ($P_2$) students in favor of the proposal of introducing the CBCS system in their university are equal. i.e., H0: $P_1 = P_2$.
H1: The proportions of male and female students in favor of the proposal of introducing the CBCS system in their university are not equal. i.e., H1: $P_1 \ne P_2$.
Level of Significance: α = 0.02 and Critical Value: Zα/2 = 2.33
Based on the data, we observe that
$$n_1 = 300,\quad p_1 = \frac{160}{300} = 0.53,\qquad n_2 = 400,\quad p_2 = \frac{230}{400} = 0.58,$$
$$\hat{P} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{(300 \times 0.53) + (400 \times 0.58)}{300 + 400} = 0.56.$$
Test Statistic:
$$Z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{\hat{P}(1 - \hat{P})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.53 - 0.58}{\sqrt{0.56 \times 0.44\left(\dfrac{1}{300} + \dfrac{1}{400}\right)}} = -1.32 \quad \text{(under } H_0: P_1 = P_2\text{)}$$
(This value uses the proportions rounded to two decimals; carrying more decimals gives Z ≈ –1.10, with the same conclusion.)
Conclusion: Since |Z| = 1.32 is smaller than the critical value 2.33, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it is accepted at 2% level of significance. That is, the proportions of male and female students in favor of the proposal of introducing the CBCS system in their university are equal.
Example 2
In a random sample of 1000 children selected from rural areas of a district in Tamilnadu, five are found to be affected by polio. In another sample of 1500 children from urban areas of the same district, three are affected. Is it reasonable to claim that the proportion of polio-affected children in the rural area is more than that in the urban area, at 1% level?
Solution
Aim: To test whether the proportion of polio-affected children in the rural area is the same as, or more than, that in the urban area.
H0: The proportions of polio-affected children in rural ($P_1$) and urban ($P_2$) areas are equal. i.e., H0: $P_1 = P_2$.
H1: The proportion of polio-affected children in the rural area is more than that in the urban area. i.e., H1: $P_1 > P_2$.
Level of Significance: α = 0.01 and Critical Value: Zα = 2.33
Based on the data, we observe that
$$n_1 = 1000,\quad p_1 = \frac{5}{1000} = 0.005,\qquad n_2 = 1500,\quad p_2 = \frac{3}{1500} = 0.002,$$
$$\hat{P} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{(1000 \times 0.005) + (1500 \times 0.002)}{1000 + 1500} = 0.0032.$$
Test Statistic:
$$Z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{\hat{P}(1 - \hat{P})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.005 - 0.002}{\sqrt{0.0032 \times 0.9968\left(\dfrac{1}{1000} + \dfrac{1}{1500}\right)}} = 1.30 \quad \text{(under } H_0: P_1 = P_2\text{)}$$
Conclusion: Since Z < Zα, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it is accepted at 1% level of significance. That is, the proportions of polio-affected children in rural and urban areas are equal.
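The pooled two-proportion statistic of Example 2 can be sketched in Python as follows. The function name `two_proportion_z` is an illustrative choice, and the critical value 2.33 is the one quoted in the example. The function works from the counts X1 and X2, so no intermediate rounding of the proportions takes place.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Z for H0: P1 = P2, using the pooled estimate of the common proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # same as (n1*p1 + n2*p2) / (n1 + n2)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error


# Example 2: 5 of 1000 rural children and 3 of 1500 urban children affected.
z = two_proportion_z(5, 1000, 3, 1500)

Z_CRIT = 2.33            # right-sided 1% critical value, as quoted in the example
reject = z > Z_CRIT

print(round(z, 2), reject)  # 1.3 False
```

With counts as small as 5 and 3, the normal approximation assumed by this test is rough, so the result should be read with some caution.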
EXERCISES
1. In a sample of 300 births in city-A in a year, 163 are female. In another sample of 250 births in city-B in the same year, 132 are female. Test whether the proportions of female births in the two cities are equal at 1% level of significance.
2. In a sample of 500 persons selected from a city in Tamilnadu, 210 are tea drinkers. In another sample of 300 persons from a city in Kerala, 160 are tea drinkers. Test the hypothesis that the proportion of tea drinkers in Tamilnadu is less than that in Kerala at 10% level.
TEST – 8
TEST FOR EQUALITY OF TWO POPULATION MEANS (Population Variances are Equal and Known)
Aim
To test whether the two population means are equal, based on two random samples. That is, to investigate the significance of the difference between the two sample means $\bar{X}_1$ and $\bar{X}_2$.
Source
A random sample of $n_1$ observations with mean $\bar{X}_1$ is drawn from a population with unknown mean $\mu_1$. A random sample of $n_2$ observations with mean $\bar{X}_2$ is drawn from another population with unknown mean $\mu_2$.
Assumptions
(i) The populations from which the two samples are drawn are assumed to be Normal.
(ii) The two population variances are equal and known; the common variance is denoted by $\sigma^2$.
Null Hypothesis
H0: The two population means $\mu_1$ and $\mu_2$ are equal. That is, there is no significant difference between the two sample means $\bar{X}_1$ and $\bar{X}_2$. i.e., H0: $\mu_1 = \mu_2$.
Alternative Hypotheses
H1(1): $\mu_1 \ne \mu_2$
H1(2): $\mu_1 > \mu_2$
H1(3): $\mu_1 < \mu_2$
Level of Significance (α) and Critical Region
(As in Test 1)
Test Statistic
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \quad \text{(under } H_0: \mu_1 = \mu_2\text{)}$$
The statistic Z follows the Standard Normal distribution.
Conclusions
(As in Test 1)
Example 1
TVS Company wanted to compare the mileage of its two-wheelers with that of other brands. A random sample of 125 TVS two-wheelers gave an average mileage of 90 km. A random sample of 150 two-wheelers of all other brands gave an average mileage of 80 km. It is known that the standard deviation for both TVS and all other brands is 12 km. At 5% level of significance, do TVS vehicles give a better mileage?
Solution
Aim: To test whether the average mileage of TVS two-wheelers is equal to, or more than, that of other brands.
H0: The average mileages of TVS two-wheelers ($\mu_1$) and all other brands ($\mu_2$) are equal. i.e., H0: $\mu_1 = \mu_2$.
H1: The average mileage of TVS two-wheelers is more than that of all other brands. i.e., H1: $\mu_1 > \mu_2$.
Level of Significance: α = 0.05 and Critical Value: Zα = 1.645
Test Statistic:
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{90 - 80}{12\sqrt{\dfrac{1}{125} + \dfrac{1}{150}}} = 6.88 \quad \text{(under } H_0: \mu_1 = \mu_2\text{)}$$
Conclusion: Since the observed value of the test statistic, Z = 6.88, is larger than the critical value 1.645 at 5% level of significance, the data provide evidence against the null hypothesis H0 and in favor of H1. Hence, H1 is accepted, and we conclude that the average mileage of TVS two-wheelers is more than that of all other brands.
Example 2
A random sample of 1000 persons from Chennai city has an average height of 67 inches, and another random sample of 1200 persons from Mumbai city has an average height of 68 inches. Can the samples be regarded as showing that the average heights of persons from the two cities are equal, given a common standard deviation of 5 inches? Test at 2% level of significance.
Solution
Aim: To test whether the average heights of persons from the cities Chennai and Mumbai are equal or not.
H0: The average heights of persons from the cities Chennai ($\mu_1$) and Mumbai ($\mu_2$) are equal. i.e., H0: $\mu_1 = \mu_2$.
H1: The average heights of persons from the cities Chennai and Mumbai are not equal. i.e., H1: $\mu_1 \ne \mu_2$.
Level of Significance: α = 0.02 and Critical Value: Zα/2 = 2.33
Test Statistic:
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{67 - 68}{5\sqrt{\dfrac{1}{1000} + \dfrac{1}{1200}}} = -4.67 \quad \text{(under } H_0: \mu_1 = \mu_2\text{)}$$
Conclusion: Since the observed value of the test statistic, |Z| = 4.67, is larger than the critical value 2.33 at 2% level of significance, the data provide evidence against the null hypothesis H0 and in favor of H1. Hence, H1 is accepted, and we conclude that the average heights of persons from the cities Chennai ($\mu_1$) and Mumbai ($\mu_2$) are not equal.
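A minimal Python sketch of the Test 8 statistic, applied to the two examples above, is given here. The function name `two_mean_z_common_sigma` is illustrative, and the critical values are those quoted in the examples.

```python
from math import sqrt


def two_mean_z_common_sigma(mean1, mean2, sigma, n1, n2):
    """Z for H0: mu1 = mu2 when both populations share a known sigma."""
    return (mean1 - mean2) / (sigma * sqrt(1 / n1 + 1 / n2))


# Example 1: TVS (n1 = 125, mean 90) against other brands (n2 = 150, mean 80), sigma = 12.
z_mileage = two_mean_z_common_sigma(90, 80, 12, 125, 150)
reject_mileage = z_mileage > 1.645          # right-sided test at the 5% level

# Example 2: Chennai (n1 = 1000, mean 67) against Mumbai (n2 = 1200, mean 68), sigma = 5.
z_height = two_mean_z_common_sigma(67, 68, 5, 1000, 1200)
reject_height = abs(z_height) > 2.33        # two-sided test at the 2% level

print(round(z_mileage, 2), reject_mileage)      # 6.88 True
print(round(abs(z_height), 2), reject_height)   # 4.67 True
```

The two-sided decision compares |Z| with the critical value, which is why the sign of the height statistic does not affect the conclusion.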
EXERCISES
1. A sample of 100 households from Chidambaram has an average monthly income of Rs. 6000, and a sample of 125 households from Cuddalore has an average monthly income of Rs. 5400. It is known that the standard deviation of monthly income in both places is Rs. 500. Is it reasonable to say that the average monthly income in Chidambaram is more than that in Cuddalore at 10% level?
2. Two research laboratories have independently produced drugs that provide relief to arthritis sufferers. The first drug was tested on a group of 85 arthritis sufferers, producing an average of 6.8 hours of relief. The second drug was tested on 95 arthritis sufferers, producing an average of 7.2 hours of relief. The standard deviation of hours of relief is the same for both drugs and equals 2 hours. At 1% level of significance, does the first drug provide a significantly shorter period of relief?
TEST – 9
TEST FOR EQUALITY OF TWO POPULATION MEANS (Population Variances are Unequal and Known)
Aim
To test whether the two population means are equal, based on two random samples. That is, to investigate the significance of the difference between the two sample means $\bar{X}_1$ and $\bar{X}_2$.
Source
A random sample of $n_1$ observations with mean $\bar{X}_1$ is drawn from a population with unknown mean $\mu_1$ and known variance $\sigma_1^2$. A random sample of $n_2$ observations with mean $\bar{X}_2$ is drawn from another population with unknown mean $\mu_2$ and known variance $\sigma_2^2$.
Assumptions
(i) The populations from which the two samples are drawn are Normal.
(ii) The population variances $\sigma_1^2$ and $\sigma_2^2$ are known.
Null Hypothesis
H0: The two population means $\mu_1$ and $\mu_2$ are equal. That is, there is no significant difference between the two sample means $\bar{X}_1$ and $\bar{X}_2$. i.e., H0: $\mu_1 = \mu_2$.
Alternative Hypotheses
H1(1): $\mu_1 \ne \mu_2$
H1(2): $\mu_1 > \mu_2$
H1(3): $\mu_1 < \mu_2$
Level of Significance (α) and Critical Region
(As in Test 1)
Test Statistic
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \quad \text{(under } H_0: \mu_1 = \mu_2\text{)}$$
The statistic Z follows the Standard Normal distribution.
Note: If $\sigma_1^2$ and $\sigma_2^2$ are not known, they are estimated by the respective sample variances $s_1^2$ and $s_2^2$ (for large samples, the sample variance is an asymptotically unbiased estimate of the population variance). In this case, the test statistic becomes
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \quad \text{(under } H_0: \mu_1 = \mu_2\text{)}$$
Conclusions
(As in Test 1)
Example 1
The average daily wage of a sample of 140 workers in Factory-A was Rs. 120 with a standard deviation of Rs. 15. The average daily wage of a sample of 190 workers in Factory-B was Rs. 125 with a standard deviation of Rs. 20. Can we conclude that the daily wages paid by Factory-A are lower than those paid by Factory-B at 5% level?
Solution
Aim: To test whether the average daily wage of Factory-A is equal to, or less than, that of Factory-B.
H0: The average daily wages of Factory-A ($\mu_1$) and Factory-B ($\mu_2$) are equal. i.e., H0: $\mu_1 = \mu_2$
H1: The average daily wage of Factory-A is less than that of Factory-B. i.e., H1: $\mu_1 < \mu_2$
Level of Significance: α = 0.05 and Critical Value: –Zα = –1.645
Test Statistic:
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{120 - 125}{\sqrt{\dfrac{(15)^2}{140} + \dfrac{(20)^2}{190}}} = -2.60 \quad \text{(under } H_0: \mu_1 = \mu_2\text{)}$$
Conclusion: Since the observed value of the test statistic, Z = –2.60, is smaller than the critical value –1.645 at 5% level of significance, the data provide evidence against the null hypothesis H0 and in favor of H1. Hence H1 is accepted, and we conclude that the average daily wage of Factory-A is less than that of Factory-B.
Example 2
In a survey of buying habits, 390 women shoppers are chosen at random in super market-A located at Calcutta. Their average weekly food expenditure is Rs. 500 with a standard deviation of Rs. 60. From a random sample of 240 women shoppers chosen from super market-B of the same city, the average weekly food expenditure is Rs. 520 with a standard deviation of Rs. 75. Can we agree that the average weekly food expenditure of the women shoppers from the two super markets is equal at 2% level?
Solution
Aim: To test whether the average weekly food expenditures of women shoppers from the two super markets A and B are equal or not.
H0: The average weekly food expenditures of women shoppers from super market-A (µ1) and super market-B (µ2) are equal. i.e., H0: µ1 = µ2
H1: The average weekly food expenditures of women shoppers from super market-A and super market-B are not equal. i.e., H1: µ1 ≠ µ2
Level of Significance: α = 0.02 and Critical Value: Zα = 2.33
Test Statistic:
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{500 - 520}{\sqrt{\dfrac{(60)^2}{390} + \dfrac{(75)^2}{240}}} = -3.50 \quad \text{(under } H_0: \mu_1 = \mu_2\text{)}$$
Conclusion: Since the observed value of the test statistic |Z| = 3.50 is larger than the critical value 2.33 at 2% level of significance, the data provide us evidence against the null hypothesis H0 and in favor of H1. Hence, H1 is accepted and it is concluded that the average weekly food expenditures of women shoppers from the two super markets A and B are not equal.
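The arithmetic of the two examples above can be checked with a short script. This is only a sketch: the function name `two_sample_z` and the variable names are illustrative choices, not part of the text, and the critical values are computed from the standard normal distribution rather than read from a table.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, mean2, sd1, sd2, n1, n2):
    """Z statistic for H0: mu1 = mu2, so the (mu1 - mu2) term is zero."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Example 1 (factory wages): left-tailed test at the 5% level.
z_wages = two_sample_z(120, 125, 15, 20, 140, 190)
crit_wages = NormalDist().inv_cdf(0.05)         # lower 5% point of N(0, 1)
reject_wages = z_wages < crit_wages

# Example 2 (food expenditure): two-tailed test at the 2% level.
z_food = two_sample_z(500, 520, 60, 75, 390, 240)
crit_food = NormalDist().inv_cdf(1 - 0.02 / 2)  # upper 1% point of N(0, 1)
reject_food = abs(z_food) > crit_food

print(f"Example 1: Z = {z_wages:.2f}, critical value = {crit_wages:.3f}, reject H0: {reject_wages}")
print(f"Example 2: Z = {z_food:.2f}, critical value = {crit_food:.3f}, reject H0: {reject_food}")
```

The same function applies whether the standard deviations are known population values or large-sample estimates, since the two forms of the statistic differ only in which values are substituted.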
EXERCISES
1. Suppose that the number of hours spent watching television in a day by middle-aged women is normally distributed with a standard deviation of 30 minutes in urban areas and 45 minutes in rural areas. From a sample of 75 women in urban areas and 100 women in rural areas, the average number of hours spent watching television is 6 hours and 7 hours per day respectively. Can you claim that the average number of hours spent by middle-aged women in rural and urban areas is equal at 1% level?
2. The marks obtained by students from Public schools and Matriculation schools in a city are normally distributed with standard deviations of 12 and 15 marks respectively. A random sample of 60 students from Public schools has a mean mark of 84, and a random sample of 80 students from Matriculation schools has a mean mark of 90. Can we claim that the students of Public schools get lower marks than those of Matriculation schools at 1% level?
TEST – 10
TEST FOR EQUALITY OF TWO POPULATION MEANS (Population Variances are Equal and Unknown)
Aim
To test the null hypothesis that the means of the two populations are equal, based on two random samples. That is, to investigate the significance of the difference between the two sample means $\bar{X}_1$ and $\bar{X}_2$.
Source
A random sample of n1 observations $X_{1i}$ (i = 1, 2,…, n1) is drawn from a population with unknown mean µ1. A random sample of n2 observations $X_{2j}$ (j = 1, 2,…, n2) is drawn from another population with unknown mean µ2.
Assumptions
(i) The populations from which the two samples are drawn are Normal.
(ii) The two population variances are equal and unknown; their common value is denoted by σ². (Since σ² is unknown, it is replaced by its unbiased estimate S².)
Null Hypothesis
H0: The two population means µ1 and µ2 are equal. That is, there is no significant difference between the two sample means $\bar{X}_1$ and $\bar{X}_2$. i.e., H0: µ1 = µ2
Alternative Hypotheses
H1(1): µ1 ≠ µ2
H1(2): µ1 > µ2
H1(3): µ1 < µ2
Level of Significance (α) and Critical Region
1. $|t| > t_{\alpha,(n_1+n_2-2)}$ such that $P\{|t| > t_{\alpha,(n_1+n_2-2)}\} = \alpha$
2. $t > t_{\alpha,(n_1+n_2-2)}$ such that $P\{t > t_{\alpha,(n_1+n_2-2)}\} = \alpha$
3. $t < -t_{\alpha,(n_1+n_2-2)}$ such that $P\{t < -t_{\alpha,(n_1+n_2-2)}\} = \alpha$
Critical values $t_{\alpha,(n_1+n_2-2)}$ are obtained from Table 2.
Test Statistic
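$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \quad \text{(under } H_0: \mu_1 = \mu_2\text{)}$$
where
$$\bar{X}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} X_{1i}, \qquad \bar{X}_2 = \frac{1}{n_2}\sum_{j=1}^{n_2} X_{2j}, \qquad S^2 = \frac{\sum_{i=1}^{n_1}(X_{1i} - \bar{X}_1)^2 + \sum_{j=1}^{n_2}(X_{2j} - \bar{X}_2)^2}{n_1 + n_2 - 2}.$$
The statistic t follows the t distribution with (n1 + n2 – 2) degrees of freedom.
Conclusions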
(As in Test 3)
Example 1
The gains in weight of two random samples of chicks on two different diets A and B are given below. Examine whether the difference in mean increases in weight is significant.
Diet A: 2.5 2.25 2.35 2.60 2.10 2.45 2.5 2.1 2.2
Diet B: 2.45 2.50 2.60 2.77 2.60 2.55 2.65 2.75 2.45 2.50
Solution
Aim: To test whether the mean increases in weight by diet-A (µ1) and diet-B (µ2) are equal or not.
H0: The mean increases in weight by both diets are equal. i.e., H0: µ1 = µ2
H1: The mean increases in weight by both diets are not equal. i.e., H1: µ1 ≠ µ2
Level of significance: α = 0.05 (say) and Critical value: t0.05 for 17 d.f = 2.11
Test Statistic: From the data, $\bar{X}_1 = 2.339$, $\bar{X}_2 = 2.582$ and S = 0.152, so
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{2.339 - 2.582}{0.152\sqrt{\dfrac{1}{9} + \dfrac{1}{10}}} = -3.48 \quad \text{(under } H_0: \mu_1 = \mu_2\text{)}$$
Conclusion: Since |t| = 3.48 > tα = 2.11, we conclude that the data provide us evidence against the null hypothesis H0 and in favor of H1. Hence, H1 is accepted at 5% level of significance. That is, the mean increases in weight by the two diets A and B are not equal.
Example 2
A researcher is interested to know whether the performance in a public examination of students of schools from the Tsunami affected area is poorer than that of other students. A random sample of 10 students from coastal area schools is selected, whose marks are given below.
68 72 64 65 56 72 64 56 60 73
Another sample of 8 students from non-coastal area schools has the following marks.
76 78 68 72 83 85 88 78
Test the hypothesis at 1% level.
Solution
Aim: To test whether the performance in a public examination of students of schools from the Tsunami affected area is equal to or less than that of other students.
H0: The performance in a public examination of students of schools from the Tsunami affected area (µ1) is equal to that of other students (µ2). i.e., H0: µ1 = µ2
H1: The performance in a public examination of students of schools from the Tsunami affected area is less than that of other students. i.e., H1: µ1 < µ2
Level of Significance: α = 0.01 and Critical value: t0.01 for 16 d.f = –2.58
Test Statistic: From the data, $\bar{X}_1 = 65$, $\bar{X}_2 = 78.5$ and S = 6.48, so
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{65 - 78.5}{6.48\sqrt{\dfrac{1}{10} + \dfrac{1}{8}}} = -4.39 \quad \text{(under } H_0: \mu_1 = \mu_2\text{)}$$
Conclusion: Since t = –4.39 is less than the critical value –2.58 (that is, |t| > |tα|), we conclude that the data provide us evidence against the null hypothesis H0 and in favor of H1. Hence, H1 is accepted at 1% level of significance. That is, the performance in a public examination of students of schools from the Tsunami affected area is less than that of other students.
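The pooled estimate S and the statistic t can be recomputed directly from the raw observations of the two examples. The sketch below uses only the Python standard library; the function name `pooled_t` and the variable names are illustrative, and the critical values are simply the tabulated figures quoted in the examples.

```python
import math
from statistics import mean


def pooled_t(sample1, sample2):
    """Two-sample t statistic for H0: mu1 = mu2 with a pooled variance estimate."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = mean(sample1), mean(sample2)
    ss1 = sum((x - m1) ** 2 for x in sample1)   # sum of squared deviations, sample 1
    ss2 = sum((x - m2) ** 2 for x in sample2)   # sum of squared deviations, sample 2
    df = n1 + n2 - 2
    pooled_sd = math.sqrt((ss1 + ss2) / df)     # S in the test statistic
    t = (m1 - m2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, pooled_sd, df


diet_a = [2.5, 2.25, 2.35, 2.60, 2.10, 2.45, 2.5, 2.1, 2.2]
diet_b = [2.45, 2.50, 2.60, 2.77, 2.60, 2.55, 2.65, 2.75, 2.45, 2.50]
t_diet, s_diet, df_diet = pooled_t(diet_a, diet_b)
reject_diet = abs(t_diet) > 2.11                # two-tailed, 5% level, 17 d.f.

coastal = [68, 72, 64, 65, 56, 72, 64, 56, 60, 73]
non_coastal = [76, 78, 68, 72, 83, 85, 88, 78]
t_marks, s_marks, df_marks = pooled_t(coastal, non_coastal)
reject_marks = t_marks < -2.58                  # left-tailed, 1% level, 16 d.f.

print(f"Diets: S = {s_diet:.3f}, t = {t_diet:.2f}, d.f. = {df_diet}, reject H0: {reject_diet}")
print(f"Marks: S = {s_marks:.2f}, t = {t_marks:.2f}, d.f. = {df_marks}, reject H0: {reject_marks}")
```

Working from the raw data avoids carrying rounded means and a rounded S through the calculation, which matters here because the denominator is small.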
EXERCISES
1. A paper company produces covers on two machines. The average number of items produced per hour by the two machines is 250 and 280, with standard deviations 16 and 20 respectively, based on records of 50 hours of production. Can we expect that the two machines are equally efficient at 10% level of significance?
2. The yields of two varieties of brinjal from two independent samples of 10 and 12 plants are given below. Test whether the yield of Variety-A is more than that of Variety-B at 2% level of significance.
Variety-A: 18 15 16 20 22 20 23 18 20 25
Variety-B: 12 14 16 13 16 20 22 24
TEST – 11
TEST FOR PAIRED OBSERVATIONS
Aim
To test whether the treatment applied is effective or not, based on a random sample. That is, to investigate the significance of the difference between the observations made before and after the treatment in the sample.
Source
Let Xi (i = 1, 2,…, n) be the observations made initially on n individuals forming a random sample of size n. A treatment is applied to these individuals, and the observations made after the treatment are denoted by Yi (i = 1, 2,…, n). That is, (Xi, Yi) denotes the pair of observations obtained from the ith individual before and after the treatment. Let µX be the unknown population mean before the treatment and µY the unknown population mean after the treatment.
Assumptions
(i) The observations for the two samples must be obtained in pairs.
(ii) The population from which the sample is drawn is Normal.
Null Hypothesis
H0: The treatment applied is ineffective. That is, there is no significant difference between the observations before and after the treatment. i.e., H0: µd = µX – µY = 0.
Alternative Hypotheses
H1(1): µd ≠ 0
H1(2): µd > 0
H1(3): µd < 0
Level of Significance (α) and Critical Region
(As in Test 3)
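Test Statistic
$$t = \frac{\bar{d} - \mu_d}{S_d/\sqrt{n}} \quad \text{(under } H_0: \mu_d = 0\text{)}$$
where
$$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad d_i = X_i - Y_i, \qquad S_d^2 = \frac{1}{n-1}\sum_{i=1}^{n}(d_i - \bar{d})^2.$$
The statistic t follows the t distribution with (n–1) degrees of freedom.
Conclusions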
(As in Test 3)
Example 1
A health spa has advertised a weight-reducing program and has claimed that the average participant in the program loses more than 5 kgs. A random sample of 10 participants has the following weights before and after the program. Test this claim at 5% level of significance.
Weights before: 80 78 75 86 90 87 95 78 86 90
Weights after: 76 75 70 80 84 83 91 72 83 83
Solution
Aim: To test whether the claim of the health spa, that the average weight reduction is more than 5 kgs, is supported.
H0: The average weight reduction is only 5 kgs. i.e., H0: µd = µx – µy = 5
H1: The average weight reduction is more than 5 kgs. i.e., H1: µd > 5.
Level of Significance: α = 0.05 and Critical value: t0.05,9 = 1.83
Test Statistic: The differences di = Xi – Yi are 4, 3, 5, 6, 6, 4, 4, 6, 3, 7, so that $\bar{d} = 4.8$ and $S_d = 1.40$. Hence
$$t = \frac{\bar{d} - \mu_d}{S_d/\sqrt{n}} = \frac{4.8 - 5}{1.40/\sqrt{10}} = -0.45 \quad \text{(under } H_0: \mu_d = 5\text{)}$$
Conclusion: Since t = –0.45 < tα = 1.83, we conclude that the data do not provide us any evidence against the null hypothesis H0, and hence it is accepted at 5% level of significance. That is, the claim that the average weight reduction is more than 5 kgs is not supported by the data.
Example 2
A manufacturer claims that a significant gain in weight will be attained by infants who are given a new variety of health drink marketed by him. A sample of 10 babies was selected and given the above diet for a month, and the weights were observed before (A) and after (B) the diet. Examine whether the claim of the manufacturer is true at 2% level of significance.
A: 3.50 3.75 3.65 4.10 3.65 3.55 3.60 4.20 3.80 3.50
B: 3.80 4.20 3.90 4.50 3.75 4.20 3.60 4.35 4.20 3.40
Solution
Aim: To test whether the claim of the manufacturer, that the new variety of health drink promotes weight gain, is true or not.
H0: The claim of the manufacturer that the new variety of health drink promotes weight gain is not true. i.e., H0: µd = 0.
H1: The claim of the manufacturer that the new variety of health drink promotes weight gain is true. i.e., H1: µd ≠ 0.
Level of Significance: α = 0.02 and Critical value: t0.02,9 = 2.82
Test Statistic: With di = Ai – Bi, the data give $\bar{d} = -0.26$ and $S_d = 0.227$. Hence
$$t = \frac{\bar{d} - \mu_d}{S_d/\sqrt{n}} = \frac{-0.26}{0.227/\sqrt{10}} = -3.62 \quad \text{(under } H_0: \mu_d = 0\text{)}$$
Conclusion: Since |t| = 3.62 > tα = 2.82, we conclude that the data provide us evidence against the null hypothesis H0 and in favor of H1. Hence, H1 is accepted at 2% level of significance. That is, the claim of the manufacturer that the new variety of health drink promotes weight gain is true.
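A minimal sketch of the paired test, computed from the raw before and after observations of the two examples, is given below. The function name `paired_t` and its parameter `mu_d0` (the value of µd stated in H0: 5 for the health spa, 0 for the health drink) are illustrative names, and the critical values are the tabulated figures quoted in the examples.

```python
import math
from statistics import mean, stdev


def paired_t(before, after, mu_d0=0.0):
    """Paired t statistic for H0: mu_d = mu_d0, where d = before - after."""
    diffs = [x - y for x, y in zip(before, after)]
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)                      # divisor n - 1, as in the definition of S_d
    t = (d_bar - mu_d0) / (s_d / math.sqrt(n))
    return t, d_bar, s_d, n - 1


weights_before = [80, 78, 75, 86, 90, 87, 95, 78, 86, 90]
weights_after = [76, 75, 70, 80, 84, 83, 91, 72, 83, 83]
t_spa, d_spa, s_spa, df_spa = paired_t(weights_before, weights_after, mu_d0=5)
reject_spa = t_spa > 1.83                   # right-tailed, 5% level, 9 d.f.

infants_a = [3.50, 3.75, 3.65, 4.10, 3.65, 3.55, 3.60, 4.20, 3.80, 3.50]
infants_b = [3.80, 4.20, 3.90, 4.50, 3.75, 4.20, 3.60, 4.35, 4.20, 3.40]
t_drink, d_drink, s_drink, df_drink = paired_t(infants_a, infants_b)
reject_drink = abs(t_drink) > 2.82          # two-tailed, 2% level, 9 d.f.

print(f"Health spa:   d-bar = {d_spa:.2f}, S_d = {s_spa:.3f}, t = {t_spa:.2f}, reject H0: {reject_spa}")
print(f"Health drink: d-bar = {d_drink:.2f}, S_d = {s_drink:.3f}, t = {t_drink:.2f}, reject H0: {reject_drink}")
```

Passing the hypothesised mean difference explicitly keeps the numerator consistent with whatever H0 states, which is the step most easily overlooked when the claim is about a non-zero reduction.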
EXERCISES
1. The following data show the additional hours of sleep gained by 15 patients in an experiment to test the effect of a drug. Do these data show evidence that the drug produces additional hours of sleep at 2% level?
Hours gained: 2.5 3.0 2.25 3.25 1.75 1.5 2.5 2.25 3.0 3.25 3.0 2.5 2.75 3.25 3.75
2. A coaching centre for the civil service examination claims that its coaching gives a significant improvement in the scores obtained by students. A random sample of 12 students was selected. They were examined before and after the coaching, and their scores are given below. Test the claim of the coaching centre at 1% level of significance.
Student    Score Before Coaching    Score After Coaching
1          68                       78
2          72                       75
3          74                       78
4          67                       80
5          79                       80
6          78                       85
7          82                       80
8          78                       75
9          77                       90
10         77                       92
11         80                       95
12         78                       90
TEST – 12
TEST FOR EQUALITY OF TWO POPULATION STANDARD DEVIATIONS
Aim
To test whether the standard deviations of the two populations, σ1 and σ2, are equal, based on two random samples. That is, to investigate the significance of the difference between the two sample standard deviations s1 and s2.
Source
A random sample of n1 observations is drawn from a population whose mean µ1 and standard deviation σ1 are unknown. A random sample of n2 observations is drawn from another population whose mean µ2 and standard deviation σ2 are unknown. Let s1 and s2 be the sample standard deviations of the respective samples.
Assumptions
(i) The two samples are independently drawn from two normal populations.
(ii) The sample sizes are sufficiently large.
(iii) Since the population standard deviations σ1 and σ2 are unknown, they are replaced by their estimates s1 and s2.
Null Hypothesis
H0: The two population standard deviations σ1 and σ2 are equal. That is, there is no significant difference between the two sample standard deviations s1 and s2. i.e., H0: σ1 = σ2.
Alternative Hypotheses
H1(1): σ1 ≠ σ2
H1(2): σ1 > σ2
H1(3): σ1 < σ2
Level of Significance (α) and Critical Region
(As in Test 1)
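Test Statistic
$$Z = \frac{s_1 - s_2}{\sqrt{\dfrac{s_1^2}{2n_1} + \dfrac{s_2^2}{2n_2}}}$$
where
$$s_1 = \sqrt{\frac{1}{n_1}\sum_{i=1}^{n_1} X_i^2 - (\bar{X})^2}, \qquad s_2 = \sqrt{\frac{1}{n_2}\sum_{i=1}^{n_2} Y_i^2 - (\bar{Y})^2}.$$
The statistic Z follows the Standard Normal distribution.
Conclusions
(As in Test 1).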
Example 1
Two types of rods are manufactured by an industry for a specific task. A random sample of 50 items of rod-1 has a standard deviation of 0.85 and a sample of 80 items of rod-2 has a standard deviation of 0.72. Test whether the two types of rods are equal in their variation of specifications at 5% level of significance.
Solution
Aim: To test whether the two types of rods are equal in their variation of specifications or not.
H0: The two types of rods are equal in their variation of specifications. i.e., H0: σ1 = σ2
H1: The two types of rods are not equal in their variation of specifications. i.e., H1: σ1 ≠ σ2
Level of Significance: α = 0.05 and Critical value: Zα = 1.96
Test Statistic:
$$Z = \frac{s_1 - s_2}{\sqrt{\dfrac{s_1^2}{2n_1} + \dfrac{s_2^2}{2n_2}}} = \frac{0.85 - 0.72}{\sqrt{\dfrac{(0.85)^2}{2 \times 50} + \dfrac{(0.72)^2}{2 \times 80}}} = 1.27$$
Conclusion: Since the observed value of the test statistic |Z| = 1.27 is smaller than the critical value 1.96 at 5% level of significance, the data do not provide us evidence against the null hypothesis H0. Hence, H0 is accepted and it is concluded that the two types of rods are equal in their variation of specifications.
Example 2
In a competitive examination, the marks of a random sample of 100 students from a private school have a standard deviation of 12.35. The marks of another sample of 150 students from a government school in the same examination have a standard deviation of 10.25. Test whether the standard deviations of marks in the two schools are equal at 5% level of significance.
Solution
Aim: To test whether the standard deviations of marks in a competitive examination in the two schools are equal or not.
H0: The standard deviations of marks in a competitive examination in the two schools are equal. i.e., H0: σ1 = σ2
H1: The standard deviations of marks in a competitive examination in the two schools are not equal. i.e., H1: σ1 ≠ σ2
Level of Significance: α = 0.05 and Critical value: Zα = 1.96
Test Statistic:
$$Z = \frac{s_1 - s_2}{\sqrt{\dfrac{s_1^2}{2n_1} + \dfrac{s_2^2}{2n_2}}} = \frac{12.35 - 10.25}{\sqrt{\dfrac{(12.35)^2}{2 \times 100} + \dfrac{(10.25)^2}{2 \times 150}}} = 1.99$$
Conclusion: Since the observed value of the test statistic |Z| = 1.99 is greater than the critical value 1.96 at 5% level of significance, the data provide us evidence against the null hypothesis H0 and in favor of H1. Hence, H1 is accepted and it is concluded that the standard deviations of marks in a competitive examination in the two schools are not equal.
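The two worked values of Z can be reproduced with a few lines of code. This is a sketch only; `sd_difference_z` is an illustrative name, and the critical value 1.96 is the tabulated two-tailed 5% point used in both examples.

```python
import math


def sd_difference_z(s1, s2, n1, n2):
    """Large-sample Z statistic for H0: sigma1 = sigma2."""
    standard_error = math.sqrt(s1 ** 2 / (2 * n1) + s2 ** 2 / (2 * n2))
    return (s1 - s2) / standard_error


z_rods = sd_difference_z(0.85, 0.72, 50, 80)        # Example 1
z_marks = sd_difference_z(12.35, 10.25, 100, 150)   # Example 2

for label, z in (("Rods", z_rods), ("Marks", z_marks)):
    decision = "reject H0" if abs(z) > 1.96 else "do not reject H0"
    print(f"{label}: Z = {z:.2f} -> {decision} at the 5% level")
```

The second example sits very close to the critical value, so the decision there is sensitive to rounding of the sample standard deviations.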
EXERCISES
1. A random sample of 1500 adult males is selected from France whose mean height (in inches) is 72.25 with a standard deviation of 6.5. Another sample of 1200 adult males is selected from Japan whose mean height (in inches) is 58.75 with a standard deviation of 7.25. Examine whether the standard deviations of heights of adult males in the two countries are equal or not.
2. A large organization produces electric bulbs in each of its two factories. It is suspected that the efficiency in the two factories is not the same, so a test is carried out by ascertaining the variability of the life of the bulbs produced by each factory. The data are as follows:
                                  Factory-A    Factory-B
Number of bulbs in the sample     150          250
Average life                      1200 hrs     950 hrs
Standard deviation                250 hrs      200 hrs
Based on the above data, determine whether the difference between the variability of life of bulbs from each sample is significant at 1 percent level of significance.
TEST – 13
TEST FOR EQUALITY OF TWO POPULATION VARIANCES
Aim
To test whether the variances of the two populations are equal, based on two random samples. That is, to investigate the significance of the difference between the two sample variances.
Source
Let $X_{1i}$ (i = 1, 2,…, n1) be a random sample of n1 observations drawn from a population with unknown variance $\sigma_1^2$. Let $X_{2j}$ (j = 1, 2,…, n2) be a random sample of n2 observations drawn from another population with unknown variance $\sigma_2^2$.
Assumption
The populations from which the samples are drawn are Normal.
Null Hypothesis
H0: The two population variances $\sigma_1^2$ and $\sigma_2^2$ are equal. That is, there is no significant difference between the two sample variances $S_1^2$ and $S_2^2$. i.e., H0: $\sigma_1^2 = \sigma_2^2$.
Alternative Hypotheses
H1(1): $\sigma_1^2 \neq \sigma_2^2$
H1(2): $\sigma_1^2 > \sigma_2^2$
H1(3): $\sigma_1^2 < \sigma_2^2$
Level of Significance (α) and Critical Values (Fα)
The critical values of F for the right-tailed test are available in Table 4. That is, the critical region is determined by the right tail areas. Thus the significant value $F_{\alpha,(n_1-1,\,n_2-1)}$ at level of significance α and (n1 – 1, n2 – 1) degrees of freedom is determined by $P\{F > F_{\alpha,(n_1-1,\,n_2-1)}\} = \alpha$. The critical region for the left-tailed test is $F < F_{(1-\alpha),(n_1-1,\,n_2-1)}$, and for the two-tailed test it is $F > F_{(\alpha/2),(n_1-1,\,n_2-1)}$ or $F < F_{(1-\alpha/2),(n_1-1,\,n_2-1)}$. We have the following reciprocal relation between the upper and lower α significant points of the F-distribution:
$$F_{\alpha}(n_1, n_2) = \frac{1}{F_{1-\alpha}(n_2, n_1)} \;\Rightarrow\; F_{\alpha}(n_1, n_2) \times F_{1-\alpha}(n_2, n_1) = 1.$$
Critical Regions
1. $F > F_{(\alpha/2),(n_1-1,\,n_2-1)}$ or $F < F_{(1-\alpha/2),(n_1-1,\,n_2-1)}$ such that $P\{F > F_{(\alpha/2),(n_1-1,\,n_2-1)}\} + P\{F < F_{(1-\alpha/2),(n_1-1,\,n_2-1)}\} = \alpha$
[Figure: F distribution with an area of α/2 shaded in each tail, below $F_{(1-\alpha/2),(n_1-1,\,n_2-1)}$ and above $F_{(\alpha/2),(n_1-1,\,n_2-1)}$]
2. $F > F_{\alpha,(n_1-1,\,n_2-1)}$ such that $P\{F > F_{\alpha,(n_1-1,\,n_2-1)}\} = \alpha$.
[Figure: F distribution with an area of α shaded in the right tail, above $F_{\alpha,(n_1-1,\,n_2-1)}$]
3. $F < F_{(1-\alpha),(n_1-1,\,n_2-1)}$ such that $P\{F < F_{(1-\alpha),(n_1-1,\,n_2-1)}\} = \alpha$
[Figure: F distribution with an area of α shaded in the left tail, below $F_{(1-\alpha),(n_1-1,\,n_2-1)}$]
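Test Statistic
$$F = \frac{S_1^2}{S_2^2}$$
where
$$\bar{X}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} X_{1i}, \qquad \bar{X}_2 = \frac{1}{n_2}\sum_{j=1}^{n_2} X_{2j}, \qquad S_1^2 = \frac{\sum_{i=1}^{n_1}(X_{1i} - \bar{X}_1)^2}{n_1 - 1}, \qquad S_2^2 = \frac{\sum_{j=1}^{n_2}(X_{2j} - \bar{X}_2)^2}{n_2 - 1}.$$
The statistic F follows the F distribution with (n1 − 1, n2 − 1) degrees of freedom.
Conclusions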
1. If $F_{(1-\alpha/2),(n_1-1,\,n_2-1)} \le F \le F_{(\alpha/2),(n_1-1,\,n_2-1)}$, we conclude that the data do not provide us any evidence against the null hypothesis H0, and hence it may be accepted at α% level of significance. Otherwise reject H0 or accept H1(1).
2. If $F \le F_{\alpha,(n_1-1,\,n_2-1)}$, we conclude that the data do not provide us any evidence against the null hypothesis H0, and hence it may be accepted at α% level of significance. Otherwise reject H0 or accept H1(2).
3. If $F \ge F_{(1-\alpha),(n_1-1,\,n_2-1)}$, we conclude that the data do not provide us any evidence against the null hypothesis H0, and hence it may be accepted at α% level of significance. Otherwise reject H0 or accept H1(3).
Example 1
A quality control supervisor for an automobile manufacturer is concerned with uniformity in the number of defects in cars coming off the assembly line. If one assembly line has significantly more variability in the number of defects, then changes have to be made. The supervisor has obtained the following data.
Number of Defects    Assembly Line-A    Assembly Line-B
Mean                 12                 14
Variance             20                 13
Sample size          16                 20
Does assembly line A have significantly more variability in the number of defects? Test at 5% level of significance.
Solution
Aim: To test whether assembly line A has significantly more variability than assembly line B in the number of defects or not.
H0: There is no significant difference in variability between assembly line A and assembly line B in the number of defects. i.e., H0: $\sigma_1^2 = \sigma_2^2$.
H1: Assembly line A has significantly more variability than assembly line B in the number of defects. i.e., H1: $\sigma_1^2 > \sigma_2^2$.
Level of Significance: α = 0.05 and Critical value: F0.05,(16–1, 20–1) = 2.23
Test Statistic:
$$F = \frac{S_1^2}{S_2^2} = \frac{20}{13} = 1.54$$
Conclusion: Since $F < F_{\alpha,(n_1-1,\,n_2-1)}$, we conclude that the data do not provide us any evidence against the null hypothesis H0, and hence it is accepted at 5% level of significance. That is, there is no significant difference in variability between assembly line A and assembly line B in the number of defects.
Example 2
An insurance company is interested in the length of hospital-stays for various illnesses. The company has selected 15 patients from hospital A and 10 from hospital B who were treated for the same ailment. The amount of time spent in hospital A had an average of 2.6 days with a standard deviation of 0.8 day. The treatment time in hospital B averaged 2.2 days with a standard deviation of 1.2 days. Do patients in hospital A have significantly less variability in their recovery time? Test at 1% level of significance.
Solution
Aim: To test whether the patients in hospital A have significantly less variability in their recovery time than the patients in hospital B.
H0: There is no significant difference in the variability of recovery time between the patients in hospital A and hospital B. i.e., H0: $\sigma_1^2 = \sigma_2^2$.
H1: The patients in hospital A have significantly less variability in their recovery time than the patients in hospital B. i.e., H1: $\sigma_1^2 < \sigma_2^2$ ⇒ H1: $\sigma_2^2 > \sigma_1^2$.
Level of Significance: α = 0.01 and Critical value: F0.01,(10–1, 15–1) = 4.03.
Test Statistic:
$$F = \frac{S_2^2}{S_1^2} = \frac{1.44}{0.64} = 2.25$$
Conclusion: Since $F < F_{\alpha,(n_2-1,\,n_1-1)}$, we conclude that the data do not provide us any evidence against the null hypothesis H0, and hence it is accepted at 1% level of significance. That is, patients at hospital A do not have significantly less variability in their recovery times.
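The variance ratios in the two examples can be checked as follows. The sketch assumes that the hospital B standard deviation is 1.2 days, which is the value implied by the variance 1.44 used in the worked F ratio; `variance_ratio` is an illustrative name, and the critical values 2.23 and 4.03 are the tabulated figures quoted in the examples.

```python
def variance_ratio(var_numerator, var_denominator, n_numerator, n_denominator):
    """F statistic and its (numerator, denominator) degrees of freedom."""
    f_value = var_numerator / var_denominator
    return f_value, (n_numerator - 1, n_denominator - 1)


# Example 1: H1 is sigma1^2 > sigma2^2, so line A's variance goes in the numerator.
f_lines, df_lines = variance_ratio(20, 13, 16, 20)
reject_lines = f_lines > 2.23        # F(0.05; 15, 19) as quoted in the example

# Example 2: H1 is sigma2^2 > sigma1^2, so hospital B's variance goes in the numerator.
# Assumption: standard deviation of hospital B taken as 1.2 days (variance 1.44).
f_hosp, df_hosp = variance_ratio(1.2 ** 2, 0.8 ** 2, 10, 15)
reject_hosp = f_hosp > 4.03          # F(0.01; 9, 14) as quoted in the example

print(f"Assembly lines: F = {f_lines:.2f}, d.f. = {df_lines}, reject H0: {reject_lines}")
print(f"Hospitals:      F = {f_hosp:.2f}, d.f. = {df_hosp}, reject H0: {reject_hosp}")
```

Placing the variance expected to be larger under H1 in the numerator turns a left-tailed alternative into a right-tailed one, so only upper critical values from the F table are needed.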
EXERCISES 1. Two brand managers were in disagreement over the issue of whether urban homemakers had greater variability in grocery shopping patterns than did rural homemakers. To test their conflicting ideas, they took random samples of 25 homemakers from urban areas and 15 homemakers from rural areas. They found that the variance for the urban homemaker was 4.25 and rural homemaker was 3.5. Is the difference in the variances in days between shopping visits significant at 5% level? 2. The diameters of two random samples, each of size 10, of bullets produced by two machines have standard deviations 0.012 and 0.018. Test the hypothesis that the two machines are equally consistent in diameters at 1% level of significance.
TEST – 14
TEST FOR CONSISTENCY IN A 2×2 TABLE
Aim
To test whether two given attributes, each classified into two classes, are independent, based on the observed frequencies obtained from a sample survey.
Source
A random sample of size N is classified into 2 classes by attribute-A and 2 classes by attribute-B. The observed frequencies can be expressed in the following table, known as a 2 × 2 contingency table.
                   Attribute-B
Attribute-A    Class-1    Class-2    Total
Class-1        a          b          a + b
Class-2        c          d          c + d
Total          a + c      b + d      N
Assumptions
(i) The sample size N should be sufficiently large (i.e., N > 20).
(ii) The cell frequencies should be independent.
(iii) Each cell frequency is at least 3.
Null Hypothesis
H0: The two attributes are independent.
Alternative Hypothesis
H1: The two attributes are not independent.
Level of Significance (α) and Critical Region
$\chi^2 > \chi^2_{\alpha,(1)}$ such that $P\{\chi^2 > \chi^2_{\alpha,(1)}\} = \alpha$
Test Statistic
$$\chi^2 = \frac{N(ad - bc)^2}{(a + b)(a + c)(b + d)(c + d)}$$
The statistic χ² follows the χ² distribution with one degree of freedom.
Conclusion
If $\chi^2 \le \chi^2_{\alpha,(1)}$, we conclude that the data do not provide us any evidence against the null hypothesis H0, and hence it may be accepted at α% level of significance. Otherwise reject H0 or accept H1.
Example 1
Out of 5000 households in a town, 3200 are self-employed; out of 2200 graduate households, 1400 are self-employed. Examine whether there is any association between graduation and nature of employment at 5% level of significance.
Solution
Aim: To test whether the two attributes, graduation and nature of employment, are independent.
H0: Graduation and nature of employment are independent.
H1: Graduation and nature of employment are dependent.
Level of Significance: α = 0.05 and Critical value: χ²0.05,1 = 3.841
                   Employment
Graduation       Self-employed    Others    Total
Graduates        1400             800       2200
Non-graduates    1800             1000      2800
Total            3200             1800      5000
Test Statistic:
$$\chi^2 = \frac{N(ad - bc)^2}{(a + b)(a + c)(b + d)(c + d)} = \frac{5000[(1400 \times 1000) - (800 \times 1800)]^2}{2200 \times 3200 \times 1800 \times 2800} = 0.23$$
Conclusion: Since $\chi^2 < \chi^2_{\alpha,(1)}$, we conclude that the data do not provide us any evidence against the null hypothesis H0, and hence it is accepted at 5% level of significance. That is, graduation and nature of employment are independent.
Example 2
A sample survey of 300 persons from a town was conducted to study the association between drinking habit and opinion about the sales of liquor. The following two questions were asked and the responses are reported below.
(A) Do you drink?
(B) Are you in favor of sales of liquor?
                 Question-B
Question-A     Yes    No    Total
Yes            100    40    140
No             140    20    160
Total          240    60    300
Test whether the drinking habit and the opinion about the sales of liquor are associated or independent at 1% level of significance.
Solution
Aim: To test whether the drinking habit and the opinion about the sales of liquor are associated or independent.
H0: The drinking habit and the opinion about the sales of liquor are independent.
H1: The drinking habit and the opinion about the sales of liquor are associated.
Level of Significance: α = 0.01 and Critical value: χ²0.01,1 = 6.635
Test Statistic: With a = 100, b = 40, c = 140 and d = 20,
$$\chi^2 = \frac{N(ad - bc)^2}{(a + b)(a + c)(b + d)(c + d)} = \frac{300[(100 \times 20) - (40 \times 140)]^2}{140 \times 240 \times 60 \times 160} = 12.05$$
Conclusion: Since $\chi^2 > \chi^2_{\alpha,(1)}$, we conclude that the data provide us evidence against the null hypothesis H0 and in favor of H1. Hence, H1 is accepted at 1% level of significance. That is, the drinking habit and the opinion about the sales of liquor are associated.
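The 2 × 2 statistic is easy to evaluate directly from the four cell frequencies, taking the marginal totals from the cells themselves rather than copying them by hand. The sketch below applies it to the two tables given in the examples; `chi_square_2x2` is an illustrative name.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic (1 d.f.) for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (a + c) * (b + d) * (c + d)
    return numerator / denominator


# Example 1: rows are graduates / non-graduates, columns are self-employed / others.
chi2_graduation = chi_square_2x2(1400, 800, 1800, 1000)

# Example 2: rows are Question-A (yes / no), columns are Question-B (yes / no).
chi2_liquor = chi_square_2x2(100, 40, 140, 20)

print(f"Graduation and employment: chi-square = {chi2_graduation:.3f}")
print(f"Drinking and liquor sales: chi-square = {chi2_liquor:.3f}")
```

Deriving the row and column totals inside the function guards against a common slip, namely substituting a marginal total for a cell frequency in the (ad − bc) term.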
EXERCISES
1. In an experiment on immunization of cattle from tuberculosis, the following data were obtained.
                  Affected    Unaffected    Total
Inoculated        12          68            80
Not Inoculated    98          22            120
Total             110         90            200
Examine the effect of vaccine in controlling the incidence of the disease at 2% level.
2. A sample survey of 500 students was conducted to know their response about the introduction of the CBCS system in the university. The following data were obtained:
          Favor    Against    Total
Male      135      115        250
Female    120      130        250
Total     255      245        500
Test whether the opinion about the introduction of the CBCS system depends on the gender of the students at 2% level of significance.
TEST – 15
TEST FOR HOMOGENEITY OF SEVERAL POPULATION PROPORTIONS
Aim
To test whether the k population proportions are equal, based on k independent samples. That is, to investigate the significance of the difference among the k sample proportions.
Source
Let there be k populations from which k independent random samples are drawn. Let Oi be the observed frequency of a specific kind obtained from the ith sample of ni observations, i = 1, 2,…, k.
Null Hypothesis
H0: The k population proportions are equal. That is, there is no significant difference among the k sample proportions. i.e., H0: P1 = P2 = … = Pk.
Alternative Hypothesis
H1: P1 ≠ P2 ≠ … ≠ Pk.
Level of Significance (α) and Critical Region
$\chi^2 < \chi^2_{1-(\alpha/2),(k-1)} \,\cup\, \chi^2 > \chi^2_{(\alpha/2),(k-1)}$ such that $P\{\chi^2 < \chi^2_{1-(\alpha/2),(k-1)} \,\cup\, \chi^2 > \chi^2_{(\alpha/2),(k-1)}\} = \alpha$
Test Statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - n_i p)^2}{n_i p q}, \qquad \text{where } p = \frac{\sum O_i}{\sum n_i} \text{ and } q = 1 - p.$$
The statistic χ² follows the χ² distribution with (k–1) degrees of freedom.
Conclusion
If $\chi^2_{1-(\alpha/2),(k-1)} \le \chi^2 \le \chi^2_{(\alpha/2),(k-1)}$, we conclude that the data do not provide us any evidence against the null hypothesis H0, and hence it may be accepted at α% level of significance. Otherwise reject H0 or accept H1.
Example 1
In an experiment on the efficiency of different insecticides in the control of mottle streak disease in finger millet, 50 plants were selected at random from the field from each group. The number of plants affected by the disease in each group was observed as follows:
No.    Insecticide        Number of diseased plants
1      Endosulfan         8
2      Methyl dematon     7
3      Monocrotophos      5
4      Phosphamidon       6
5      Dimethoate         4
Test whether the proportions of diseased plants under the various insecticides are equal at 5% level of significance.
Solution
Aim: To test whether the proportions of diseased plants under the various insecticides are equal or not.
H0: The proportions of diseased plants under the various insecticides are equal. i.e., H0: P1 = P2 = P3 = P4 = P5.
H1: The proportions of diseased plants under the various insecticides are not equal. i.e., H1: P1 ≠ P2 ≠ P3 ≠ P4 ≠ P5.
Level of Significance: α = 0.05
Critical Values: χ²(0.975),4 = 0.484 and χ²(0.025),4 = 11.143
Critical Region: P(χ² < 0.484) + P(χ² > 11.143) = 0.05
$$p = \frac{\sum O_i}{\sum n_i} = \frac{30}{250} = 0.12 \quad \text{and} \quad q = 1 - p = 0.88$$
Insecticide number    Number of diseased plants (Oi)    Sample size (ni)    ni p    (Oi – ni p)²/(ni p q)
1                     8                                 50                  6       0.7576
2                     7                                 50                  6       0.1894
3                     5                                 50                  6       0.1894
4                     6                                 50                  6       0.0000
5                     4                                 50                  6       0.7576
Total                 30                                250                 30      1.8940
Test Statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - n_i p)^2}{n_i p q} = 1.894$$
Conclusion: Since 0.484 < χ² < 11.143, we conclude that the data do not provide us any evidence against the null hypothesis H0, and hence it is accepted at 5% level of significance. That is, the proportions of diseased plants under the various insecticides are equal.
Example 2
A sample survey was conducted in 4 villages to study the consumption of tobacco products. A random sample was selected from each village and the number of smokers was observed as follows. Examine whether the proportions of smokers in all the four villages are the same at 2% level of significance.
Village    Sample size    No. of smokers
A          60             14
B          70             16
C          80             17
D          90             13
Solution
Aim: To test whether the proportions of smokers in all the four villages are equal or not.
H0: The proportions of smokers in all the four villages are equal. i.e., H0: P1 = P2 = P3 = P4.
H1: The proportions of smokers in all the four villages are not equal. i.e., H1: P1 ≠ P2 ≠ P3 ≠ P4.
Level of Significance: α = 0.02
Critical Values: χ²(0.99),3 = 0.115 and χ²(0.01),3 = 11.345
Critical Region: P(χ² < 0.115) + P(χ² > 11.345) = 0.02
$$p = \frac{\sum O_i}{\sum n_i} = \frac{60}{300} = 0.2 \quad \text{and} \quad q = 1 - p = 0.8$$
Village    Number of smokers (Oi)    Sample size (ni)    ni p    (Oi – ni p)²/(ni p q)
A          14                        60                  12      0.4167
B          16                        70                  14      0.3571
C          17                        80                  16      0.0781
D          13                        90                  18      1.7361
Total      60                        300                 60      2.5880
Test Statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - n_i p)^2}{n_i p q} = 2.5880$$
Conclusion: Since 0.115 < χ² < 11.345, we conclude that the data do not provide us any evidence against the null hypothesis H0, and hence it is accepted at 2% level of significance. That is, the proportions of smokers in all the four villages are equal.
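The tabular working of the two examples can be reproduced with a short function that pools the samples to estimate p and then sums the k contributions. This is a sketch with illustrative names (`proportion_homogeneity_chi2` and its arguments); the critical values are the tabulated figures quoted in the examples.

```python
def proportion_homogeneity_chi2(observed, sizes):
    """Chi-square statistic for H0: P1 = ... = Pk, with k - 1 degrees of freedom."""
    p = sum(observed) / sum(sizes)          # pooled proportion
    q = 1 - p
    terms = [(o - n * p) ** 2 / (n * p * q) for o, n in zip(observed, sizes)]
    return sum(terms), terms, len(observed) - 1


# Example 1: diseased plants out of 50 under each of five insecticides.
chi2_plants, terms_plants, df_plants = proportion_homogeneity_chi2(
    [8, 7, 5, 6, 4], [50, 50, 50, 50, 50])
accept_plants = 0.484 < chi2_plants < 11.143    # 5% level, 4 d.f.

# Example 2: smokers in samples from four villages.
chi2_smokers, terms_smokers, df_smokers = proportion_homogeneity_chi2(
    [14, 16, 17, 13], [60, 70, 80, 90])
accept_smokers = 0.115 < chi2_smokers < 11.345  # 2% level, 3 d.f.

print(f"Insecticides: chi-square = {chi2_plants:.4f} on {df_plants} d.f., accept H0: {accept_plants}")
print(f"Villages:     chi-square = {chi2_smokers:.4f} on {df_smokers} d.f., accept H0: {accept_smokers}")
```

The list of individual terms is returned alongside the total so that each row of the working table can be compared with the code's output.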
EXERCISES
1. The number of defective items was observed from 4 lots of fruits by taking random samples as follows. Can we regard the proportions of defective items in all four varieties of fruits as the same at 5% level?
Fruits    Number of defectives (Oi)    Sample size (ni)
A         12                           100
B         17                           100
C         10                           100
D         11                           100
2. A clinical survey was conducted at four taluks of Thanjavur district to study the attack of filariasis. The following data were obtained. Test whether the ratio of filariasis is the same in all the four taluks at 10% level of significance.
Taluk    Patients affected    Sample size
A        6                    200
B        3                    300
C        5                    400
D        2                    100
TEST – 16
TEST FOR HOMOGENEITY OF SEVERAL POPULATION VARIANCES (BARTLETT'S TEST)
Aim
To test whether the variances of k populations are equal, based on k random samples. That is, to investigate the significance of the differences among k sample variances.
Source
Let X_ij (i = 1, 2, …, k; j = 1, 2, …, n_i) be the observations of k random samples, the i-th sample having n_i observations, drawn from k independent populations whose variances are respectively σ1², σ2², …, σk². Let X̄1, X̄2, …, X̄k be the means of the k samples.
Assumptions
(i) The populations from which the k samples are drawn are Normal.
(ii) The unknown variances σ1², σ2², …, σk² are estimated by their respective unbiased estimates S1², S2², …, Sk².
Null Hypothesis
H0: The variances of the k populations σ1², σ2², …, σk² are equal. That is, there is no significant difference among the k unbiased estimates S1², S2², …, Sk² of the population variances. i.e., H0: σ1² = σ2² = … = σk².
Alternative Hypothesis
H1: The variances σ1², σ2², …, σk² are not all equal.
Level of Significance (α) and Critical Region
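χ² < χ²(1–α/2),(k–1) or χ² > χ²(α/2),(k–1), such that
P{χ² < χ²(1–α/2),(k–1) ∪ χ² > χ²(α/2),(k–1)} = α
Test Statistic
$$\chi^2 = \frac{\sum_{i=1}^{k} \nu_i \log\left(\dfrac{S^2}{S_i^2}\right)}{1 + \dfrac{1}{3(k-1)}\left(\sum_{i=1}^{k}\dfrac{1}{\nu_i} - \dfrac{1}{\nu}\right)} = \frac{\nu \log S^2 - \sum_{i=1}^{k} \nu_i \log S_i^2}{1 + \dfrac{1}{3(k-1)}\left(\sum_{i=1}^{k}\dfrac{1}{\nu_i} - \dfrac{1}{\nu}\right)}$$
where
$$\nu_i = n_i - 1, \qquad \nu = \sum_{i=1}^{k}\nu_i, \qquad S_i^2 = \frac{1}{\nu_i}\sum_{j=1}^{n_i}\left(X_{ij} - \bar{X}_i\right)^2, \qquad S^2 = \frac{\sum_{i=1}^{k}\nu_i S_i^2}{\nu}$$
and log denotes the natural logarithm.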
The statistic χ² follows the χ² distribution with (k–1) degrees of freedom.
Conclusion
If χ²(1–α/2),(k–1) ≤ χ² ≤ χ²(α/2),(k–1), we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at level of significance α. Otherwise reject H0 or accept H1.

Example 1
Three experts interviewed the candidates and assigned marks independently. A random sample of 5 candidates is selected, whose marks are as follows. Examine whether there is variation among the experts in assigning the marks at the 5% level of significance.

           Candidates
Experts    1    2    3    4    5
A          64   78   86   65   92
B          68   72   80   74   80
C          70   75   78   70   85
Solution
Aim: To test whether the variances among the experts in assigning the marks are equal.
H0: The variances among the experts in assigning the marks are equal.
H1: The variances among the experts in assigning the marks are not equal.
Level of Significance: α = 0.05
Critical Values: χ²(0.975),2 = 0.0506 and χ²(0.025),2 = 7.378
Critical Region: P(χ² < 0.0506) + P(χ² > 7.378) = 0.05
Calculations:
k = 3; ν_i = n_i – 1 = 5 – 1 = 4 for i = 1, 2, 3; ν = Σ ν_i = 12; k – 1 = 2.
The sample means are X̄1 = 77, X̄2 = 74.8 and X̄3 = 75.6, so that
$$S_i^2 = \frac{1}{\nu_i}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2: \qquad S_1^2 = \frac{620}{4} = 155, \quad S_2^2 = \frac{108.8}{4} = 27.2, \quad S_3^2 = \frac{157.2}{4} = 39.3$$
$$S^2 = \frac{\sum \nu_i S_i^2}{\nu} = \frac{4(155 + 27.2 + 39.3)}{12} = 73.8333; \qquad \log S^2 = 4.3018$$

ν_i   S_i²     log S_i²   ν_i log S_i²
4     155.0    5.0434     20.1737
4     27.2     3.3032     13.2129
4     39.3     3.6712     14.6849
                Σ ν_i log S_i² = 48.0715

Test Statistic:
$$\chi^2 = \frac{\nu \log S^2 - \sum_{i=1}^{k}\nu_i \log S_i^2}{1 + \dfrac{1}{3(k-1)}\left(\sum \dfrac{1}{\nu_i} - \dfrac{1}{\nu}\right)} = \frac{(12 \times 4.3018) - 48.0715}{1 + \dfrac{1}{3 \times 2}\left(\dfrac{3}{4} - \dfrac{1}{12}\right)} = \frac{3.5501}{1.1111} = 3.195$$
Conclusion: Since χ²(0.975),2 < χ² < χ²(0.025),2, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at the 5% level of significance. That is, the variances among the experts in assigning the marks are equal.

Example 2
An agricultural experiment was carried out to compare the yield of four varieties of brinjals. The following are the yields (in kg) of the four varieties grown in different plots.

Variety   Sample size   Yield
A         4             12.50  16.25  14.50  16.50
B         5             10.50  12.75  14.50  13.25  14.25
C         6              8.50   9.50   9.75  16.75  15.50  10.50
D         7             16.50  15.65  15.35  14.25  16.25  15.55  16.75

Test whether the variances of the yield of the four varieties of brinjals are equal at the 2% level of significance.
Solution
Aim: To test whether the variances of the yield of the four varieties of brinjals are equal.
H0: The variances of the yield of four varieties of brinjals are equal.
H1: The variances of the yield of four varieties of brinjals are not equal.
Level of Significance: α = 0.02
Critical Values: χ²(0.99),3 = 0.115 and χ²(0.01),3 = 11.345
Critical Region: P(χ² < 0.115) + P(χ² > 11.345) = 0.02
Calculations:
ν_i = n_i – 1: ν1 = 3, ν2 = 4, ν3 = 5, ν4 = 6; ν = Σ ν_i = 18.
$$S_i^2 = \frac{1}{\nu_i}\sum_{j=1}^{n_i}(X_{ij} - \bar{X}_i)^2: \qquad S_1^2 = \frac{10.2969}{3} = 3.4323, \quad S_2^2 = \frac{10.1750}{4} = 2.5438, \quad S_3^2 = \frac{60.25}{5} = 12.05, \quad S_4^2 = \frac{4.2721}{6} = 0.7120$$
$$S^2 = \frac{\sum \nu_i S_i^2}{\nu} = \frac{84.9940}{18} = 4.7219; \qquad \log S^2 = 1.5522$$

ν_i   S_i²      log S_i²   ν_i log S_i²
3     3.4323    1.2332     3.6997
4     2.5438    0.9336     3.7346
5     12.0500   2.4891     12.4453
6     0.7120    –0.3396    –2.0379
                 Σ ν_i log S_i² = 17.8417

Test Statistic:
$$\chi^2 = \frac{\nu \log S^2 - \sum_{i=1}^{k}\nu_i \log S_i^2}{1 + \dfrac{1}{3(k-1)}\left(\sum \dfrac{1}{\nu_i} - \dfrac{1}{\nu}\right)} = \frac{(18 \times 1.5522) - 17.8417}{1 + \dfrac{1}{3 \times 3}\left(\dfrac{1}{3} + \dfrac{1}{4} + \dfrac{1}{5} + \dfrac{1}{6} - \dfrac{1}{18}\right)} = \frac{10.0979}{1.0994} = 9.185$$
Conclusion: Since 0.115 < χ² = 9.185 < 11.345, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at the 2% level of significance. That is, the variances of the yield of four varieties of brinjals are equal.
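Bartlett's statistic involves several intermediate quantities, so a short program is a convenient check on hand calculation. The following is a minimal sketch in Python that works directly from the raw observations; the function name `bartlett_statistic` is an illustrative choice, and natural logarithms are assumed, as in the worked examples.

```python
import math


def bartlett_statistic(samples):
    """Bartlett's chi-square statistic for k samples given as lists of numbers."""
    k = len(samples)
    nus = [len(s) - 1 for s in samples]          # nu_i = n_i - 1
    variances = []
    for s in samples:
        mean = sum(s) / len(s)
        variances.append(sum((x - mean) ** 2 for x in s) / (len(s) - 1))
    nu = sum(nus)
    pooled = sum(v * s2 for v, s2 in zip(nus, variances)) / nu   # S^2
    numerator = nu * math.log(pooled) - sum(
        v * math.log(s2) for v, s2 in zip(nus, variances)
    )
    correction = 1 + (sum(1 / v for v in nus) - 1 / nu) / (3 * (k - 1))
    return numerator / correction


# Marks assigned by the three experts in Example 1
marks = [[64, 78, 86, 65, 92], [68, 72, 80, 74, 80], [70, 75, 78, 70, 85]]
chi2 = bartlett_statistic(marks)
print(round(chi2, 3))
```

The printed value is compared with the two tabulated χ² critical values for k – 1 degrees of freedom, exactly as in the Conclusion step of the test.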
EXERCISES
1. A manufacturer produces three types of iron rods. Random samples are drawn from each type, whose lengths (in mm) are as follows. Test whether the variances of the three types are equal at the 5% level of significance.

Type   Sample size   Length of rods
A      6             22  24  22  21  23  24
B      5             20  25  26  21  22
C      6             20  26  22  21  25  27

2. A sample survey was conducted in three localities, with 10 households each, whose monthly expenditure on food is as follows. Do these samples agree with the claim that the variation of monthly food expenses is the same in the three localities? Test at the 5% significance level.

Location   Monthly expenditure of 10 households
I          1450  1800  1620  1540  1870  1680  1530  1850  1650  1950
II         1250  2500  2400  2600  1800  1500  1800  1950  1800  1550
III        2450  2300  2020  2500  2400  2650  2550  2450  2800  2600
TEST – 17
TEST FOR HOMOGENEITY OF SEVERAL POPULATION MEANS
Aim
To test whether the means of k populations are equal, based on k independent random samples. That is, to investigate the significance of the difference among the k sample means.
Source
Let X_ij (i = 1, 2, …, k; j = 1, 2, …, n_i) be the observations of k random samples, the i-th sample having n_i observations, drawn from k independent populations whose means µ1, µ2, …, µk are unknown and whose variances are equal but unknown. Let X̄1, X̄2, …, X̄k be the means of the k samples. Let n1 + n2 + … + nk = n.
Assumptions
(i) The populations from which the k samples are drawn are Normal.
(ii) Each observation is independently drawn.
Null Hypothesis
H0: The means of the k populations µ1, µ2, …, µk are equal. That is, there is no significant difference among the k sample means X̄1, X̄2, …, X̄k. i.e., H0: µ1 = µ2 = … = µk.
Alternative Hypothesis
H1: The means µ1, µ2, …, µk are not all equal.
Level of Significance (α) and Critical Region
F > Fα,(k–1, n–k) such that P[F > Fα,(k–1, n–k)] = α. The critical value of F at level of significance α and degrees of freedom (k–1, n–k) is obtained from Table 4.
Method
Calculate the following, based on the sample observations.
1. Grand total of all the observations, $G = \sum_{i=1}^{k}\sum_{j=1}^{n_i} X_{ij}$
2. Correction Factor, CF = G²/n
3. Total Sum of Squares, $TSS = \sum_{i=1}^{k}\sum_{j=1}^{n_i} X_{ij}^2 - CF$
4. Sum of Squares between the Samples, $SSS = \sum_{i=1}^{k} \dfrac{T_i^2}{n_i} - CF$, where T_i is the sum of the i-th sample observations.
5. Error Sum of Squares (Sum of Squares within the samples), ESS = TSS – SSS.
6. Analysis of Variance (ANOVA) Table:

Sources of variation   Degrees of freedom   Sum of squares   Mean sum of squares
Between samples        k – 1                SSS              SSS/(k – 1)
Within samples         n – k                ESS              ESS/(n – k)
Total                  n – 1                TSS              –

Test Statistic
$$F = \frac{SSS/(k-1)}{ESS/(n-k)}$$
The statistic F follows the F distribution with (k–1, n–k) degrees of freedom.
Conclusion
If F ≤ Fα,(k–1, n–k), we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at level of significance α. Otherwise reject H0 or accept H1.
Note: This test is the same as the test for a completely randomized design with unequal numbers of replications on k treatments, where the i-th treatment has n_i replications.

Example 1
The following data give the marks in an examination of three independent samples of students selected from three batches. Test whether the mean marks of the three batches of students are equal at the 5% level of significance.

Batch A: 62 68 64 76
Batch B: 82 88 74 86 80
Batch C: 83 87 80

Solution
Aim: To test whether the mean marks of the three batches of students in the examination are equal.
H0: The mean marks of all the three batches of students in the examination are equal. i.e., H0: µ1 = µ2 = µ3
H1: The mean marks of the three batches of students in the examination are not all equal.
Level of Significance: α = 0.05 and Critical Value: F0.05,(2,9) = 4.26
Calculations:
Number of samples k = 3; n1 = 4, n2 = 5, n3 = 3; n = 12
T1 = 270, T2 = 410, T3 = 250; G = 930
Correction Factor, CF = 930²/12 = 72075
Total Sum of Squares, TSS = 62² + … + 80² – CF = 863
Sum of Squares between samples, $SSS = \dfrac{270^2}{4} + \dfrac{410^2}{5} + \dfrac{250^2}{3} - 72075 = 603.33$
Error Sum of Squares, ESS = TSS – SSS = 259.67
ANOVA Table:

Sources of variation   Degrees of freedom   Sum of squares   Mean sum of squares
Samples                2                    603.33           301.67
Error                  9                    259.67           28.85
Total                  11                   863
Test Statistic:
$$F = \frac{SSS/(k-1)}{ESS/(n-k)} = \frac{301.67}{28.85} = 10.46$$
Conclusion: Since F > F0.05,(2,9) = 4.26, we conclude that the data provide evidence against the null hypothesis H0 and in favor of H1. Hence, H1 is accepted at the 5% level of significance. That is, the mean marks of the three batches of students in the examination are not all equal.

Example 2
The following data give the life of electric bulbs of four varieties. Test whether the average life of the four varieties of bulbs is homogeneous at the 5% level of significance.

Variety   Sample size   Life of the electric bulbs in hours
I         8             1560  1670  1580  1650  1640  1680  1600  1650
II        9             1450  1460  1480  1450  1460  1440  1450  1480  1470
III       9             1430  1440  1450  1440  1430  1420  1410  1450  1470
IV        8             1540  1570  1550  1560  1570  1580  1530  1590
Solution
Aim: To test whether the average life of the four varieties of bulbs is equal.
H0: The average life of four varieties of bulbs is equal. i.e., H0: µ1 = µ2 = µ3 = µ4.
H1: The average life of the four varieties of bulbs is not equal for all varieties.
Level of Significance: α = 0.05 and Critical Value: F0.05,(3,30) = 2.92
Calculations:
Shifting the origin to 1410 and then dividing by 10, the above data reduce to

I     15  26  17  24  23  27  19  24
II    04  05  07  04  05  03  04  07  06
III   02  03  04  03  02  01  00  04  06
IV    13  16  14  15  16  17  12  18

Number of samples k = 4; n1 = 8, n2 = 9, n3 = 9, n4 = 8; n = 34
T1 = 175, T2 = 45, T3 = 25, T4 = 121; G = 366
Correction Factor, CF = 366²/34 = 3939.88
Total Sum of Squares, TSS = 15² + … + 18² – CF = 2216.12
Sum of Squares between samples, $SSS = \dfrac{175^2}{8} + \dfrac{45^2}{9} + \dfrac{25^2}{9} + \dfrac{121^2}{8} - 3939.88 = 2012.81$
Error Sum of Squares, ESS = TSS – SSS = 203.31
ANOVA Table:

Sources of variation   Degrees of freedom   Sum of squares   Mean sum of squares
Samples                3                    2012.81          670.94
Error                  30                   203.31           6.78
Total                  33                   2216.12

Test Statistic:
$$F = \frac{SSS/(k-1)}{ESS/(n-k)} = \frac{670.94}{6.78} = 98.96$$
Conclusion: Since F > F0.05,(3,30), we conclude that the data provide evidence against the null hypothesis H0 and in favor of H1. Hence, H1 is accepted at the 5% level of significance. That is, the average life of four varieties of bulbs is not equal.
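The steps of the Method (grand total, correction factor, sums of squares, F ratio) translate directly into code. The following is a minimal sketch in Python, shown on the marks of Example 1; the function name `one_way_anova_f` is an illustrative choice.

```python
def one_way_anova_f(samples):
    """Return (SSS, ESS, F) for k independent samples given as lists."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand_total = sum(sum(s) for s in samples)            # G
    cf = grand_total ** 2 / n                             # correction factor
    tss = sum(x * x for s in samples for x in s) - cf     # total sum of squares
    sss = sum(sum(s) ** 2 / len(s) for s in samples) - cf # between samples
    ess = tss - sss                                       # within samples
    f = (sss / (k - 1)) / (ess / (n - k))
    return sss, ess, f


# Marks of the three batches in Example 1
batches = [[62, 68, 64, 76], [82, 88, 74, 86, 80], [83, 87, 80]]
sss, ess, f = one_way_anova_f(batches)
print(round(sss, 2), round(ess, 2), round(f, 2))  # 603.33 259.67 10.46
```

Because F is unchanged by shifting the origin or rescaling all observations, the same function can be applied either to the raw bulb lives of Example 2 or to the coded values.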
EXERCISES
1. Three varieties of coal were analyzed by four chemists and the ash content in the varieties was obtained as follows.

             Chemists
Varieties    1   2   3   4
A            6   7   7   8
B            7   6   8   7
C            4   3   5   6

Do the varieties differ significantly in their ash content?

2. Three processes A, B and C are tested to see whether their outputs are equivalent. The following observations of output are made:

A   12  15  17  18  15  17
B   14  17  18  14  16  14
C   14  18  17  15  15  19  16  17  19

Examine whether the outputs of these three processes differ significantly at the 1% level of significance.
TEST – 18
TEST FOR INDEPENDENCE OF ATTRIBUTES
Aim
To test whether two given attributes are independent, based on the observed frequencies obtained from a sample survey.
Source
A random sample of N observations is classified into m classes by attribute A and n classes by attribute B. The observed frequencies can be arranged in the following table, known as an m × n contingency table.

                      Attribute B
Attribute A    1      2      …    j      …    n      Total
1              O11    O12    …    O1j    …    O1n    O1.
2              O21    O22    …    O2j    …    O2n    O2.
…              …      …      …    …      …    …      …
i              Oi1    Oi2    …    Oij    …    Oin    Oi.
…              …      …      …    …      …    …      …
m              Om1    Om2    …    Omj    …    Omn    Om.
Total          O.1    O.2    …    O.j    …    O.n    N
Assumptions
(i) The sample size N should be sufficiently large.
(ii) The cell frequencies Oij should be independent.
(iii) Each cell frequency Oij should be at least 5.
Null Hypothesis
H0: The two attributes are independent.
Alternative Hypothesis
H1: The two attributes are dependent.
Level of Significance (α) and Critical Region
χ² > χ²α,(m–1)×(n–1) such that P{χ² > χ²α,(m–1)×(n–1)} = α
Test Statistic
$$\chi^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{O_{i.} \times O_{.j}}{N}$$
The statistic χ² follows the χ² distribution with (m–1) × (n–1) degrees of freedom.
Conclusion
If χ² ≤ χ²α,(m–1)×(n–1), we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at level of significance α. Otherwise reject H0 or accept H1.

Example 1
A newspaper publisher, trying to pinpoint his market's characteristics, wondered whether newspaper readership in the community is related to the readers' educational achievement. A survey questioned adults in the area on their level of education and their frequency of readership. The results are shown in the following table.

                          Level of educational achievement
Frequency of readership   Post graduate   Graduate   Secondary   Primary   Total
Never                     15              18         22          25        80
Sometimes                 16              24         15          25        80
Morn or Even              22              14         18          16        70
Both Editions             27              14         15          14        70
Total                     80              70         70          80        300
Solution
Aim: To test whether the frequency of readership of the newspaper is independent of the level of educational achievement.
H0: The frequency of readership of the newspaper is independent of the level of educational achievement.
H1: The frequency of readership of the newspaper depends on the level of educational achievement.
Level of Significance: α = 0.05
Critical Value: χ²0.05,(4–1)×(4–1) = χ²0.05,9 = 16.919
Calculations: E_ij = (O_i. × O_.j)/N

Education       Readership      Oij   Eij     (Oij – Eij)²   (Oij – Eij)²/Eij
Post graduate   Never           15    21.33   40.0689        1.8785
Post graduate   Sometimes       16    21.33   28.4089        1.3319
Post graduate   Morn or Even    22    18.67   11.0889        0.5939
Post graduate   Both Editions   27    18.67   69.3889        3.7166
Graduate        Never           18    18.67   0.4489         0.0240
Graduate        Sometimes       24    18.67   28.4089        1.5216
Graduate        Morn or Even    14    16.33   5.4289         0.3324
Graduate        Both Editions   14    16.33   5.4289         0.3324
Secondary       Never           22    18.67   11.0889        0.5939
Secondary       Sometimes       15    18.67   13.4689        0.7214
Secondary       Morn or Even    18    16.33   2.7889         0.1708
Secondary       Both Editions   15    16.33   1.7689         0.1083
Primary         Never           25    21.33   13.4689        0.6315
Primary         Sometimes       25    21.33   13.4689        0.6315
Primary         Morn or Even    16    18.67   7.1289         0.3818
Primary         Both Editions   14    18.67   21.8089        1.1681
Total                           300   300                    14.1386

Test Statistic:
$$\chi^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 14.1386$$
Conclusion: Since χ² = 14.1386 < χ²0.05,9 = 16.919, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at the 5% level of significance. That is, the frequency of readership of the newspaper is independent of the level of educational achievement.

Example 2
In a survey, a random sample of 200 farms was classified into three classes according to tenure status as owned, rented and mixed. They were also classified according to the level of soil fertility as highly fertile, moderately fertile and low fertile farms. The results are given below. Test whether tenure status is independent of soil fertility at the 1% level of significance.

                 Tenure status
Soil fertility   Owned   Rented   Mixed   Total
High             45      15       10      70
Moderate         20      10       15      45
Low              20      25       40      85
Total            85      50       65      200
Solution
Aim: To test whether the tenure status is independent of soil fertility.
H0: The tenure status and soil fertility are independent of each other.
H1: The tenure status depends on soil fertility.
Level of Significance: α = 0.01
Critical Value: χ²0.01,(3–1)×(3–1) = χ²0.01,4 = 13.277
Calculations: E_ij = (O_i. × O_.j)/N

Tenure   Soil fertility   Oij   Eij      (Oij – Eij)²   (Oij – Eij)²/Eij
Owned    High             45    29.750   232.5625       7.8172
Owned    Moderate         20    19.125   0.7656         0.0400
Owned    Low              20    36.125   260.0156       7.1977
Rented   High             15    17.500   6.2500         0.3571
Rented   Moderate         10    11.250   1.5625         0.1389
Rented   Low              25    21.250   14.0625        0.6618
Mixed    High             10    22.750   162.5625       7.1456
Mixed    Moderate         15    14.625   0.1406         0.0096
Mixed    Low              40    27.625   153.1406       5.5436
Total                     200   200                     28.9115

Test Statistic:
$$\chi^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 28.9115$$
Conclusion: Since χ² > χ²0.01,4, we conclude that the data provide evidence against the null hypothesis H0 and in favor of H1. Hence, H1 is accepted at the 1% level of significance. That is, the tenure status depends on soil fertility.
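With many cells the expected frequencies and their contributions are easy to get wrong by hand, so a short program is a useful check. The following is a minimal sketch in Python applied to the farm data of Example 2; the function name `chi2_independence` is an illustrative choice.

```python
def chi2_independence(table):
    """Chi-square statistic and degrees of freedom for an m x n contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total   # E_ij
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Rows: High, Moderate, Low fertility; columns: Owned, Rented, Mixed
farms = [[45, 15, 10], [20, 10, 15], [20, 25, 40]]
chi2, dof = chi2_independence(farms)
print(round(chi2, 2), dof)  # 28.91 4
```

The function keeps the expected frequencies unrounded, so results may differ in the last decimal places from a table built with rounded E_ij.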
EXERCISES
1. Two researchers adopted different sampling techniques while investigating the same group of students to find the number of students falling in different intelligence levels. The data are as follows. Can you say that the sampling techniques adopted by the two researchers are significantly different?

             Level of students
Researcher   Below average   Average   Above average   Genius
A            64              42        36              24
B            56              58        44              26

2. In an organization, a random sample of 100 employees was selected and their educational level and employment status were observed. Examine whether the employment status depends on the level of education at the 10% level of significance.

                    Level of education
Employment status   Primary   Secondary   Graduates
Assistants          15        14          5
Clerical            12        18          8
Supervisors         8         8           12
TEST – 19
TEST FOR POPULATION CORRELATION COEFFICIENT EQUALS ZERO
Aim
To test whether the population correlation coefficient is zero, based on a bivariate random sample. That is, to investigate the significance of the difference between the sample correlation coefficient r and zero.
Source
Let (X_i, Y_i), (i = 1, 2, …, n) be a random sample of n pairs of observations drawn from a bivariate normal population whose correlation coefficient ρ is unknown. Let r be the correlation coefficient based on the above sample.
Assumptions
(i) The population from which the sample is drawn is a bivariate normal population.
(ii) The relationship between X and Y is linear.
Null Hypothesis
H0: The population correlation coefficient ρ is zero. That is, there is no significant difference between the sample correlation coefficient r and zero. i.e., H0: ρ = 0
Alternative Hypothesis
H1: ρ ≠ 0
Level of Significance (α) and Critical Region
|t| > tα,(n–2) such that P{|t| > tα,(n–2)} = α
Test Statistic
$$t = \frac{r}{\sqrt{1 - r^2}}\sqrt{n - 2}, \qquad r = \frac{\dfrac{1}{n}\sum XY - \bar{X}\,\bar{Y}}{\sqrt{\left(\dfrac{1}{n}\sum X^2 - \bar{X}^2\right)\left(\dfrac{1}{n}\sum Y^2 - \bar{Y}^2\right)}}$$
The statistic t follows the t distribution with (n–2) degrees of freedom.
Conclusion
If |t| ≤ tα, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at level of significance α. Otherwise reject H0 or accept H1.

Example 1
The marks in Mathematics and English of a random sample of 10 students are given below. Test whether correlation exists between the marks of the two subjects at the 2% level of significance.

Marks in Mathematics: 68 54 78 75 76 85 54 68 87 75
Marks in English:     59 68 72 67 72 78 64 58 68 74

Solution
Aim: To test whether the correlation coefficient between the marks in Mathematics and English is zero.
H0: The correlation coefficient between the marks in Mathematics and English is zero. i.e., H0: ρ = 0
H1: The correlation coefficient between the marks in Mathematics and English is not zero. i.e., H1: ρ ≠ 0
Level of Significance: α = 0.02 and Critical Value: t0.02,8 = 2.896
Based on the data, ΣX = 720; ΣY = 680; ΣX² = 52984; ΣY² = 46606; ΣXY = 49293
$$r = \frac{\dfrac{1}{n}\sum XY - \bar{X}\,\bar{Y}}{\sqrt{\left(\dfrac{1}{n}\sum X^2 - \bar{X}^2\right)\left(\dfrac{1}{n}\sum Y^2 - \bar{Y}^2\right)}} = \frac{\dfrac{1}{10}\times 49293 - (72 \times 68)}{\sqrt{\left(\dfrac{1}{10}\times 52984 - 72^2\right)\left(\dfrac{1}{10}\times 46606 - 68^2\right)}} = 0.51$$
Test Statistic:
$$t = \frac{r}{\sqrt{1 - r^2}}\sqrt{n - 2} = \frac{0.51 \times 2.83}{0.86} = 1.68$$
Conclusion: Since |t| < tα, we conclude that the data do not provide any evidence against the null hypothesis H0. Hence, H0 is accepted at the 2% level of significance. That is, the correlation coefficient between the marks in Mathematics and English is zero.

Example 2
A random sample of 10 students is selected from a kindergarten school, whose height (in cm) and weight (in kg) are given below. Test whether the height and weight of the students of that school are correlated at the 1% level of significance.

Height: 92    96    88    96    98    95    89    96    90    90
Weight: 18.50 19.25 17.75 19.50 19.00 19.25 18.00 19.50 18.50 18.75
Solution
Aim: To test whether the correlation coefficient between the height and weight of the students is zero.
H0: The correlation coefficient between the height and weight of the students is zero. i.e., H0: ρ = 0
H1: The correlation coefficient between the height and weight of the students is not zero. i.e., H1: ρ ≠ 0
Level of Significance: α = 0.01 and Critical Value: t0.01,8 = 3.355
Based on the data, ΣX = 930; ΣY = 188; ΣX² = 86606; ΣY² = 3537.75; ΣXY = 17501.25
$$r = \frac{\dfrac{1}{n}\sum XY - \bar{X}\,\bar{Y}}{\sqrt{\left(\dfrac{1}{n}\sum X^2 - \bar{X}^2\right)\left(\dfrac{1}{n}\sum Y^2 - \bar{Y}^2\right)}} = \frac{\dfrac{1}{10}\times 17501.25 - (93 \times 18.8)}{\sqrt{\left(\dfrac{1}{10}\times 86606 - 93^2\right)\left(\dfrac{1}{10}\times 3537.75 - 18.8^2\right)}} = \frac{1.725}{\sqrt{11.6 \times 0.335}} = 0.8751$$
Test Statistic:
$$t = \frac{r}{\sqrt{1 - r^2}}\sqrt{n - 2} = \frac{0.8751 \times 2.83}{0.4840} = 5.11$$
Conclusion: Since t > tα , we conclude that the data provide us evidence against the null hypothesis H0 and in favor of H1. Hence, H1 is accepted at 1% level of significance. That is, the correlation coefficient between the height and weight of the students is not zero.
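The correlation coefficient and the t statistic can be computed together from the raw pairs. The following is a minimal sketch in Python using the marks of Example 1; the function name `correlation_t` is an illustrative choice. Because the code does not round r before computing t, its values may differ slightly in the second decimal place from a hand calculation that rounds intermediate results.

```python
import math


def correlation_t(x, y):
    """Return (r, t) for testing H0: rho = 0 from paired observations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
    var_x = sum(a * a for a in x) / n - mean_x ** 2
    var_y = sum(b * b for b in y) / n - mean_y ** 2
    r = cov / math.sqrt(var_x * var_y)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)   # t with n - 2 d.f.
    return r, t


maths = [68, 54, 78, 75, 76, 85, 54, 68, 87, 75]
english = [59, 68, 72, 67, 72, 78, 64, 58, 68, 74]
r, t = correlation_t(maths, english)
print(round(r, 2))  # 0.51
```

The absolute value of t is then compared with the tabulated two-sided critical value of t for n – 2 degrees of freedom.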
EXERCISES
1. The following bivariate data give the monthly income (in rupees) and the electricity consumption (in units) of a sample of ten households. Examine whether the monthly income and the electricity consumption of the households are correlated at the 5% level of significance.

Income:        12150   16500   17610   10800   16300
Electricity:   165     174     180     170     185
Income:        15300   14800   16500   14800   16800
Electricity:   155     168     188     175     185

2. A random sample of 15 students is selected; the correlation coefficient between their IQ and their English aptitude is obtained as 0.68. Examine whether, in general, IQ and English aptitude are correlated at the 1% level of significance.
TEST – 20
TEST FOR POPULATION CORRELATION COEFFICIENT EQUALS A SPECIFIED VALUE Aim
To test the correlation coefficient in the population ρ be regarded as ρ0 (assumed value), based on a bivariate random sample. That is, to investigate the significance of the difference between the assumed population correlation coefficient ρ0 and the sample correlation coefficient r. Source
Let (X i, Yi), (i = 1, 2,…, n) be a random sample of n pairs of observations drawn from a bivariate normal population whose correlation coefficient ρ is unknown. Let r be the correlation coefficient based on the above sample. Assumptions
(i) The population from which, the sample drawn, is a bivariate normal population. (ii) The relationship between X and Y is linear. (iii) The variance in the Y values is independent of the X values. Null Hypothesis
H0 : The population correlation coefficient ρ is ρ0. That is, there is no significant difference between the sample correlation coefficient r and the assumed population correlation coefficient ρ0. i.e., H0: ρ = ρ0 Alternative Hypothesis
H1: ρ ≠ ρ0 Level of Significance ( α ) and Critical Region:
(As in Test 1)
Test Statistic
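$$Z = \frac{U - \xi}{\sqrt{\dfrac{1}{n - 3}}} \quad (\text{under } H_0: \rho = \rho_0)$$
where
$$U = \frac{1}{2}\log_e\frac{1 + r}{1 - r} \qquad \text{and} \qquad \xi = \frac{1}{2}\log_e\frac{1 + \rho_0}{1 - \rho_0}$$
The statistic Z follows the Standard Normal distribution.
Conclusion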
If |Z| ≤ Zα, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at level of significance α. Otherwise reject H0 or accept H1.

Example 1
Past records show that the correlation coefficient between age (X) and height (Y) of children is 0.83. For a random sample of 50 children, age and height are observed and the correlation coefficient is obtained as 0.88. Test whether the sample information is consistent with the past record at the 2% level.

Solution
Aim: To test whether the correlation coefficient between the age and height of the children in the sample is consistent with the past record.
H0: The correlation coefficient between the age and height of the children is 0.83. i.e., H0: ρ = 0.83.
H1: The correlation coefficient between the age and height of the children is not 0.83. i.e., H1: ρ ≠ 0.83.
Level of Significance: α = 0.02 and Critical Value: Zα = 2.33
Calculations:
$$U = \frac{1}{2}\log_e\frac{1 + r}{1 - r} = \frac{1}{2}\log_e\frac{1 + 0.88}{1 - 0.88} = 1.3757$$
and
$$\xi = \frac{1}{2}\log_e\frac{1 + \rho}{1 - \rho} = \frac{1}{2}\log_e\frac{1 + 0.83}{1 - 0.83} = 1.1881$$
Test Statistic:
$$Z = \frac{U - \xi}{\sqrt{\dfrac{1}{n - 3}}} = \frac{1.3757 - 1.1881}{\sqrt{\dfrac{1}{50 - 3}}} = 1.29 \quad (\text{under } H_0: \rho = 0.83)$$
Conclusion: Since |Z| < Zα, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence accept H0 at the 2% level of significance. That is, the correlation coefficient between the age and height of the children is 0.83.

Example 2
The sellers expect the correlation coefficient between sales of textile cloths and advertising expenditure to be 0.65 during the festival season. For a random sample of 30 sellers, the amount of sales and the expenditure on advertisement are observed, and the correlation coefficient between them is obtained as 0.52. Examine whether the expectation of the sellers is true at the 1% level.

Solution
Aim: To test whether the expectation of the sellers is true, namely that the correlation coefficient between sales of textile cloths and advertising expenditure is 0.65.
H0: The expectation of the sellers is true, that is, the correlation coefficient between sales of textile cloths and advertising expenditure is 0.65. i.e., H0: ρ = 0.65
H1: The expectation of the sellers is not true, that is, the correlation coefficient between sales of textile cloths and advertising expenditure is not 0.65. i.e., H1: ρ ≠ 0.65
Level of Significance: α = 0.01 and Critical Value: Zα = 2.58
Calculations:
$$U = \frac{1}{2}\log_e\frac{1 + r}{1 - r} = \frac{1}{2}\log_e\frac{1 + 0.52}{1 - 0.52} = 0.5763$$
and
$$\xi = \frac{1}{2}\log_e\frac{1 + \rho}{1 - \rho} = \frac{1}{2}\log_e\frac{1 + 0.65}{1 - 0.65} = 0.7753$$
Test Statistic:
$$Z = \frac{U - \xi}{\sqrt{\dfrac{1}{n - 3}}} = \frac{0.5763 - 0.7753}{\sqrt{\dfrac{1}{30 - 3}}} = -1.03 \quad (\text{under } H_0: \rho = 0.65)$$
Conclusion: Since |Z| = 1.03 < Zα = 2.58, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence accept H0 at the 1% level of significance. That is, the expectation of the sellers is true: the correlation coefficient between sales of textile cloths and advertising expenditure is 0.65.
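The logarithmic transformation used in this test is easy to mistype on a calculator, so it may help to compute it in code. The following is a minimal sketch in Python using the figures of Example 1 (r = 0.88, ρ0 = 0.83, n = 50); the function name `fisher_z_test` is an illustrative choice.

```python
import math


def fisher_z_test(r, rho0, n):
    """Z statistic for H0: rho = rho0 using the log transformation of r."""
    u = 0.5 * math.log((1 + r) / (1 - r))          # transformed sample value
    xi = 0.5 * math.log((1 + rho0) / (1 - rho0))   # transformed hypothesised value
    return (u - xi) * math.sqrt(n - 3)             # same as (u - xi) / sqrt(1/(n-3))


z = fisher_z_test(0.88, 0.83, 50)
print(round(z, 2))  # 1.29
```

The absolute value of Z is compared with the two-sided critical value of the Standard Normal distribution.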
EXERCISES
1. Medical records show that the correlation between the age of the mother and the birth weight of her first child is –0.24. For a random sample of eight mothers, the age and the birth weight of the first child are observed as follows.

Age of the Mother:       35    28    24    26    29    30    34    32
Birth weight of Child:   2.85  3.25  3.50  3.25  3.00  2.75  2.90  3.00

Examine whether the medical record provides the true information at the 1% level of significance.

2. The ages of husbands and their wives in India are correlated with correlation coefficient 0.75. A random sample of 9 pairs is selected whose ages are given below. Test whether these data support a population correlation coefficient of 0.75 at the 5% level of significance.

Age of Husband:   58  54  46  49  37  36  35  28  29
Age of Wife:      53  52  40  42  35  32  30  24  26
TEST – 21
TEST FOR POPULATION PARTIAL CORRELATION COEFFICIENT Aim
To test the population partial correlation coefficient ρ12.34…(k+2) be regarded as zero, based on a random sample. That is, to investigate the significance of the difference between zero and the partial correlation coefficient of order k (< n), r12.34…(k+2), (observed in a sample of size n from a multivariate normal population). Assumption
The sample is drawn, from a multivariate normal population. Source
A random sample of n observations be drawn from a multivariate normal population whose sample partial correlation coefficient of order k is r12.34…(k+2). Null Hypothesis
H0: The Population partial correlation coefficient ρ12.34…(k+2) = 0. That is, there is no significant difference between the sample partial correlation coefficient r12.34…(k+2) and zero. Alternative Hypothesis
H1: ρ12.34…(k+2) ≠ 0 Level of Significance ( α ) and Critical Region
|t| > tα,(n–k–2) such that P{|t| > tα,(n–k–2)} = α
Test Statistic
$$t = \frac{r_{12.34\ldots(k+2)}}{\sqrt{1 - r_{12.34\ldots(k+2)}^2}}\sqrt{n - k - 2}$$
The statistic t follows the t distribution with (n–k–2) degrees of freedom.
Conclusion
(As in Test 3).
Example
An agricultural experiment was conducted to study the factors that influence the yield of paddy. The yield of paddy (Y) depends on factors such as fertilizer used (X1), irrigation (X2), pesticides (X3) and seed type (X4). A sample study was conducted in 15 experimental units and it was found that the sample partial correlation coefficient between irrigation and fertilizer used was 0.23. Test whether the partial correlation coefficient of irrigation and fertilizer used in the yield of paddy is zero at the 5% level of significance.

Solution
H0: The partial correlation coefficient of irrigation and fertilizer used in the yield of paddy is zero. i.e., H0: ρ12.34 = 0.
H1: The partial correlation coefficient of irrigation and fertilizer used in the yield of paddy is not zero. i.e., H1: ρ12.34 ≠ 0.
Level of Significance: α = 0.05 and Critical Value: t0.05,11 = 2.201
Test Statistic:
$$t = \frac{r_{12.34\ldots(k+2)}}{\sqrt{1 - r_{12.34\ldots(k+2)}^2}}\sqrt{n - k - 2} = \frac{0.23 \times \sqrt{15 - 2 - 2}}{\sqrt{1 - (0.23)^2}} = 0.7838$$
Conclusion: Since |t| < t0.05,11, H0 is accepted and we conclude that the partial correlation coefficient of irrigation and fertilizer used in the yield of paddy is zero.
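The test statistic depends only on the sample partial correlation, the sample size and the order k, so it can be wrapped in a one-line function. The following is a minimal sketch in Python using the values that enter the computation above (r = 0.23, n = 15, k = 2); the function name `partial_correlation_t` is an illustrative choice.

```python
import math


def partial_correlation_t(r, n, k):
    """t statistic for H0: partial correlation of order k equals zero."""
    dof = n - k - 2                       # degrees of freedom
    return r * math.sqrt(dof) / math.sqrt(1 - r * r)


t = partial_correlation_t(0.23, 15, 2)
print(round(t, 4))  # 0.7838
```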
TEST – 22
TEST FOR EQUALITY OF TWO POPULATION CORRELATION COEFFICIENTS
Aim
To test the two population correlation coefficients ρ1and ρ2 are equal, based on two independent bivariate random samples. That is, to investigate the significance of the difference between the two sample correlation coefficients r1 and r2. Source
A random sample of n1 pairs of observations be drawn from a bivariate population whose correlation coefficient ρ1 is unknown. A random sample of n2 pairs of observations be drawn from another bivariate population whose correlation coefficient ρ2 is unknown. The sample correlation coefficients of those two samples are r1 and r2 respectively. Assumptions
(i) The population from which the sample drawn is a bivariate normal population. (ii) The relationship between X and Y is linear. (iii) The variance in the Y values is independent of the X values. Null Hypothesis
H0: The two population correlation coefficients ρ1 and ρ2 are equal. That is, there is no significant difference between the sample correlation coefficient r1 and r2. i.e., H0: ρ1 = ρ2 Alternative Hypothesis
H1: ρ1 ≠ ρ2 Level of Significance ( α ) and Critical Region
(As in Test 1)
Test Statistic
$$Z = \frac{(U_1 - U_2) - (\xi_1 - \xi_2)}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}} \quad (\text{under } H_0: \rho_1 = \rho_2 \Rightarrow \xi_1 = \xi_2)$$
where
$$U_1 = \frac{1}{2}\log_e\frac{1 + r_1}{1 - r_1}, \quad U_2 = \frac{1}{2}\log_e\frac{1 + r_2}{1 - r_2}, \quad \xi_1 = \frac{1}{2}\log_e\frac{1 + \rho_1}{1 - \rho_1} \quad \text{and} \quad \xi_2 = \frac{1}{2}\log_e\frac{1 + \rho_2}{1 - \rho_2}$$
The statistic Z follows the Standard Normal distribution.
Conclusion
If Z ≤ Z α , we conclude that the data do not provide us any evidence against the null hypothesis H0, and hence it may be accepted at α% level of significance. Otherwise reject H0 or accept H1. Example A random sample of 29 children in City-A has the correlation coefficient between age and weight 0.72. Another sample of 29 children in City-B has the correlation coefficient between age and weight 0.8. Test whether the correlation coefficient between the age and height of the children in two cities is equal at 5% level of significance. Solution H0: The correlation coefficient between the age and height of the children in two cities is equal. i.e., H0: ρ1 = ρ2. H1: The correlation coefficient between the age and height of the children in two cities is not equal. i.e., H1: ρ1 ≠ ρ 2. Level of Significance: α = 0.05 and Critical value: Z0.05 = 1.96. Calculations: (1 + r1 ) 1 (1 + 0.72) 1 U1 = 2 log e (1 − r ) = 2 log e (1 − 0.72) = 0.91 1 (1 + r2 ) 1 (1 + 0.80) 1 U2 = 2 log e (1 − r ) = 2 log e (1 − 0.80) = 1.1 2
Test Statistic:
$$Z = \frac{(U_1 - U_2) - (\xi_1 - \xi_2)}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}} = \frac{0.91 - 1.10}{\sqrt{\dfrac{1}{29 - 3} + \dfrac{1}{29 - 3}}} = -0.685$$
(Under H0: ρ1 = ρ2 ⇒ ξ1 = ξ2)
Conclusion: Since |Z| < Z0.05, H0 is accepted, and we conclude that the correlation coefficients between the age and weight of the children in the two cities are equal.
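The calculation above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `fisher_z` and `two_correlation_z` are hypothetical and not part of the text, and the code works with unrounded values of U1 and U2, so its result may differ slightly in the later decimals from a hand calculation that rounds U1 and U2 first.

```python
import math


def fisher_z(r):
    """Fisher's transformation U = (1/2) * log_e((1 + r) / (1 - r))."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def two_correlation_z(r1, n1, r2, n2):
    """Z statistic for H0: rho1 = rho2 (so that xi1 - xi2 = 0)."""
    u1, u2 = fisher_z(r1), fisher_z(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (u1 - u2) / standard_error


# Values from the example: r1 = 0.72, r2 = 0.80, n1 = n2 = 29.
z = two_correlation_z(0.72, 29, 0.80, 29)
critical_value = 1.96  # Z at the 5% level, as quoted in the example
print("Z =", round(z, 3), "| accept H0:", abs(z) <= critical_value)
```

The decision compares |Z| with the critical value because the alternative hypothesis is two-sided.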
TEST – 23
TEST FOR MULTIPLE CORRELATION COEFFICIENT
Aim
To test whether the multiple correlation coefficient in the population is zero, based on a sample multiple correlation coefficient. That is, to investigate the significance of the difference between the observed sample multiple correlation coefficient and zero.
Source
Let a random sample of size n be drawn from a (k+1)-variate population, with sample multiple correlation coefficient R. That is, R is the observed multiple correlation coefficient of a variate (say, X1) with k other variates (say, X2, X3, …, Xk+1). Let ρ be the corresponding multiple correlation coefficient in the population.
Assumptions
(i) The population from which the sample is drawn is a (k+1)-variate normal population.
(ii) The relationships among X1, X2, …, Xk+1 are linear.
Null Hypothesis
H0: The population multiple correlation coefficient ρ is zero. That is, there is no significant difference between the sample multiple correlation coefficient R and zero. i.e., H0: ρ = 0.
Alternative Hypothesis
H1: ρ ≠ 0.
Level of Significance (α) and Critical Region (Fα)
F > Fα,(k, n–k–1) such that P{F > Fα,(k, n–k–1)} = α. The critical value of Fα is obtained from Table 4.
Test Statistic
$$F = \frac{R^2}{1 - R^2}\cdot\frac{n - k - 1}{k}$$
The statistic F follows the F distribution with (k, n–k–1) degrees of freedom.
Conclusion
If F ≤ Fα, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at the α% level of significance. Otherwise reject H0 or accept H1.
Example
A random sample of 15 students was selected from a school and their marks in three subjects were recorded. The multiple correlation coefficient of the first subject on the other two subjects for the 15 students is found to be 0.65. Test whether the multiple correlation coefficient of the first subject on the other two subjects among the school students is zero at 5% level of significance.
Solution
H0: The multiple correlation coefficient of the first subject on the other two subjects among the school students is zero.
H1: The multiple correlation coefficient of the first subject on the other two subjects among the school students is not zero.
Level of Significance: α = 0.05. Here n = 15 and k = 2, since the first subject is related to two other subjects. Critical value: F0.05,(2,12) = 3.89
Test Statistic:
$$F = \frac{R^2}{1 - R^2}\cdot\frac{n - k - 1}{k} = \frac{(0.65)^2}{1 - (0.65)^2}\cdot\frac{15 - 2 - 1}{2} = 4.39$$
Conclusion: Since F > F0.05,(2,12), H0 is rejected, and we conclude that the multiple correlation coefficient of the first subject on the other two subjects among the school students is not zero.
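As a rough check of the test statistic, the formula F = [R²/(1 – R²)] × [(n – k – 1)/k] can be evaluated directly. The sketch below is illustrative only: the function name `multiple_correlation_f` is hypothetical, and it is evaluated for n = 15 and R = 0.65 with k = 2 other variates, following the definition in the Source paragraph (R relates one variate to k others).

```python
def multiple_correlation_f(r, n, k):
    """F statistic for H0: rho = 0, with (k, n - k - 1) degrees of freedom.

    r is the sample multiple correlation coefficient of one variate
    with k other variates, and n is the sample size.
    """
    r_squared = r * r
    return (r_squared / (1.0 - r_squared)) * ((n - k - 1) / k)


# One subject related to two other subjects: k = 2, n = 15, R = 0.65.
f_value = multiple_correlation_f(0.65, 15, 2)
print("F =", round(f_value, 2), "with degrees of freedom", (2, 15 - 2 - 1))
```

The critical value still has to be read from the F table for the chosen level of significance.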
TEST – 24
TEST FOR REGRESSION COEFFICIENT
Aim
To test whether the population regression coefficient of Y on X, denoted by β, may be regarded as zero, based on a bivariate random sample. That is, to investigate the significance of the difference between the sample regression coefficient of Y on X, b, and zero.
Source
Let (Xi, Yi), (i = 1, 2, …, n) be a random sample of n pairs of observations drawn from a bivariate normal population whose regression coefficient of Y on X is β. The sample regression coefficient of Y on X is denoted by b.
Assumptions
(i) The population from which the sample is drawn is a bivariate normal population.
(ii) The relationship between X and Y is linear.
Null Hypothesis
H0: The population regression coefficient of Y on X, β, is zero. That is, there is no significant difference between the sample regression coefficient of Y on X, b, and zero. i.e., H0: β = 0.
Alternative Hypothesis
H1: β ≠ 0
Level of Significance (α) and Critical Region
|t| > tα,(n–2) such that P{|t| > tα,(n–2)} = α
Test Statistic
$$t = (b - \beta)\sqrt{\frac{(n - 2)\sum_i (X_i - \bar{X})^2}{\sum_i (Y_i - \hat{y}_i)^2}}$$
(Under H0: β = 0)
where
$$b = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}$$
and $\hat{y}_i = \bar{Y} + b(X_i - \bar{X})$ is the estimate of Y for a given value (say) xi of X from the regression line of Y on X (for the given sample).
The statistic t follows the t distribution with (n–2) degrees of freedom.
Conclusion
(As in Test 3)
Example
A sample study was conducted on the weight (Y) and age (X) of a sample of 8 children from a city. The regression coefficient of Y on X is found to be 0.665, the sum of squares of the deviations of Y from the fitted regression line is 44, and the sum of squares of the deviations of X from its mean is 36. Test whether the regression coefficient of weight on age of the children in the city is zero at 5% level of significance.
Solution
H0: The regression coefficient of weight on age of the children in the city is zero. i.e., β = 0.
H1: The regression coefficient of weight on age of the children in the city is not zero. i.e., β ≠ 0.
Level of significance: α = 0.05 and Critical value: t0.05,6 = 2.45
Test Statistic:
$$t = (b - \beta)\sqrt{\frac{(n - 2)\sum_i (X_i - \bar{X})^2}{\sum_i (Y_i - \hat{y}_i)^2}} = 0.665 \times \sqrt{\frac{(8 - 2) \times 36}{44}} = 1.4734$$
Conclusion: Since |t| < t0.05,6, H0 is accepted, and we conclude that the regression coefficient of weight on age of the children in the city is zero.
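The same arithmetic can be sketched in Python. This is a minimal sketch under stated assumptions: the function name `slope_t_statistic` and its parameter names are hypothetical, `ss_x` stands for Σ(Xi – X̄)², and `ss_resid` stands for Σ(Yi – ŷi)², with the value 44 used in that position exactly as in the worked example.

```python
import math


def slope_t_statistic(b, n, ss_x, ss_resid, beta0=0.0):
    """t statistic for H0: beta = beta0, with n - 2 degrees of freedom.

    ss_x     : sum of squares of X about its mean
    ss_resid : sum of squares of Y about the fitted regression line
    """
    return (b - beta0) * math.sqrt((n - 2) * ss_x / ss_resid)


# Values used in the example: b = 0.665, n = 8, 36 for X and 44 for Y.
t_value = slope_t_statistic(b=0.665, n=8, ss_x=36.0, ss_resid=44.0)
critical_value = 2.45  # t at the 5% level with 6 degrees of freedom, as quoted
print("t =", round(t_value, 4), "| accept H0:", abs(t_value) <= critical_value)
```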
TEST – 25
TEST FOR INTERCEPT IN A REGRESSION
Aim
To test whether the regression passes through the origin. That is, to investigate the significance of the difference between the intercept of a regression and zero.
Source
Let a random sample of size n be drawn from a bivariate population. The intercept of the regression in the population is denoted by α. The regression with α = 0 is known as regression through the origin. The linear regression in the sample is y = a + bx, where a is the intercept and b is the slope of the linear regression.
Assumptions
(i) The population from which the sample is drawn is a bivariate normal population.
(ii) The relationship between Y and X is linear.
Null Hypothesis
H0: The intercept of the regression in the population is zero. That is, there is no significant difference between the intercept of the linear regression in the sample and zero. i.e., H0: α = 0.
Alternative Hypothesis
H1: α ≠ 0.
Level of Significance (α) and Critical Region (tα)
|t| > tα,(n–2) such that P{|t| > tα,(n–2)} = α. The critical value of tα is obtained from Table 2.
Method
For the given bivariate data on n observations, with Y as the dependent variable and X as the independent variable, calculate the following:
(i) $\sum y$; $\sum y^2$; $\sum x$; $\sum x^2$; $\sum xy$; $\bar{x}$ and $\bar{y}$.
(ii) Sum of Squares of the observations y: $SS(Y) = \sum y^2 - \dfrac{(\sum y)^2}{n}$.
(iii) Sum of Squares of the observations x: $SS(X) = \sum x^2 - \dfrac{(\sum x)^2}{n}$.
(iv) Sum of Products of the observations x and y: $SP(XY) = \sum xy - \dfrac{(\sum x)(\sum y)}{n}$.
(v) The regression coefficient, $b = \dfrac{SP(XY)}{SS(X)}$.
(vi) The intercept of the regression, $a = \bar{y} - b\bar{x}$.
(vii) Sum of Squares due to regression: $SS(b) = \dfrac{[SP(XY)]^2}{SS(X)}$.
(viii) ESS = SS(Y) – SS(b).
(ix) Error Mean Square, $s_e^2 = \dfrac{ESS}{n - 2}$.
Test Statistic
$$t = \frac{a - 0}{\sqrt{s_e^2\left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{SS(X)}\right]}}$$
The statistic t follows t distribution with (n–2) degrees of freedom. Conclusion
If |t| ≤ tα, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at the α% level of significance. Otherwise reject H0 or accept H1.
Example
From a Sorghum field, 36 plants were selected at random. The length of panicles (x) and the number of grains per panicle (y) of the selected plants were recorded. The results are given below. Fit a regression line of Y on X and test whether the intercept is zero at 5% level of significance.
y      x       y      x       y      x
95     22.4    143    24.5    112    22.9
109    23.3    127    23.6    113    23.9
133    24.1    92     21.1    147    24.8
132    24.3    88     21.4    90     21.2
136    23.5    99     23.4    110    22.2
116    22.3    129    23.4    106    22.7
126    23.9    91     21.6    127    23.0
124    24.0    103    21.4    145    24.0
137    24.9    114    23.3    85     20.6
90     20.0    124    24.4    94     21.0
107    19.8    143    24.4    142    24.0
108    22.0    108    22.5    111    23.1
Solution
H0: The intercept of the regression in the population is zero. That is, there is no significant difference between the intercept of the linear regression in the sample and zero. i.e., H0: α = 0.
H1: α ≠ 0.
Level of Significance: α = 0.05 and Critical value: t0.05,34 = 2.04
Calculations:
(i) $\sum y = 4174$; $\sum y^2 = 496258$; $\sum x = 822.9$; $\sum x^2 = 18876.83$; $\sum xy = 96183.4$; $\bar{x} = 22.86$ and $\bar{y} = 115.94$.
(ii) Sum of Squares of the observations y: $SS(Y) = \sum y^2 - \dfrac{(\sum y)^2}{n} = 12305.89$.
(iii) Sum of Squares of the observations x: $SS(X) = \sum x^2 - \dfrac{(\sum x)^2}{n} = 66.7075$.
(iv) Sum of Products of the observations x and y: $SP(XY) = \sum xy - \dfrac{(\sum x)(\sum y)}{n} = 772.7167$.
(v) The regression coefficient, $b = \dfrac{SP(XY)}{SS(X)} = 11.5837$.
(vi) The intercept of the regression, $a = \bar{y} - b\bar{x} = -148.8396$.
(vii) Sum of Squares due to regression: $SS(b) = \dfrac{[SP(XY)]^2}{SS(X)} = 8950.884$.
(viii) ESS = SS(Y) – SS(b) = 3355.0048.
(ix) Error Mean Square, $s_e^2 = \dfrac{ESS}{n - 2} = 98.6766$.
Test Statistic:
$$t = \frac{a - 0}{\sqrt{s_e^2\left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{SS(X)}\right]}} = \frac{-148.8396 - 0}{\sqrt{98.6766\left[\dfrac{1}{36} + \dfrac{(22.86)^2}{66.7075}\right]}} = -5.344$$
Conclusion: Since |t| > t0.05,34, H0 is rejected, and we conclude that the intercept α is significantly different from zero. In other words, the regression does not pass through the origin.
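Steps (ii) to (ix) of the method and the test statistic can be sketched in Python starting from the raw totals in step (i). The sketch is illustrative only: the function name `intercept_t_test` and its parameter names are hypothetical, and the error mean square is taken with n – 2 degrees of freedom, which is an assumption chosen to match the t distribution with (n–2) degrees of freedom quoted for the statistic.

```python
import math


def intercept_t_test(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Slope, intercept and t statistic for H0: intercept = 0."""
    x_bar, y_bar = sum_x / n, sum_y / n
    ss_y = sum_y2 - sum_y ** 2 / n          # SS(Y)
    ss_x = sum_x2 - sum_x ** 2 / n          # SS(X)
    sp_xy = sum_xy - sum_x * sum_y / n      # SP(XY)
    b = sp_xy / ss_x                        # regression coefficient
    a = y_bar - b * x_bar                   # intercept
    ess = ss_y - sp_xy ** 2 / ss_x          # error sum of squares
    se2 = ess / (n - 2)                     # error mean square (assumed n - 2)
    t = (a - 0.0) / math.sqrt(se2 * (1.0 / n + x_bar ** 2 / ss_x))
    return {"b": b, "a": a, "ESS": ess, "se2": se2, "t": t}


# Totals quoted in step (i) of the worked example.
result = intercept_t_test(n=36, sum_x=822.9, sum_y=4174,
                          sum_x2=18876.83, sum_y2=496258, sum_xy=96183.4)
for name, value in result.items():
    print(name, "=", round(value, 4))
```

Because the alternative hypothesis is two-sided, it is |t| that is compared with the tabulated critical value.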
CHAPTER – 3
ANALYSIS OF VARIANCE TESTS
TEST – 26
TEST FOR COMPLETELY RANDOMIZED DESIGN Aim
To test the significance of the t treatment effects based on the observations from n experimental units. Source
Let yij, (i = 1, 2,…, t; j = 1, 2,…, r) be the observations of t treatments, each replicated with (equal number of replications) r times in n experimental units (i.e., n = tr). In this design, treatments are allocated at random to the experimental units over the entire experimental material. That is, the entire experimental material is divided into n experimental units and the treatments are distributed completely at random over the units. Linear Model
The linear model is yij = µ + τi + εij ; (i = 1, 2,…, t; j = 1, 2,…, r), where yij is the observation from the jth replication of the ith treatment, µ is the overall mean effect, τi is the effect due to the ith treatment and εij is the error effect due to chance causes. Assumptions
(i) The populations from which the observations are drawn are normally distributed.
(ii) The observations are independent.
(iii) The various effects are additive in nature.
(iv) εij are identically and independently distributed as Normal with mean zero and variance σε².
Null Hypothesis
H0: The t treatments have equal effect. i.e., H0: τ1 = τ2 = … = τt.
Alternative Hypothesis
H1: The t treatments do not all have equal effect. i.e., H1: τi ≠ τj for at least one pair (i, j).
Level of Significance (α) and Critical Region (Fα)
F > Fα,(t–1, n–t) such that P[F > Fα,(t–1, n–t)] = α. The critical values of F at level of significance α and degrees of freedom (t–1, n–t) are obtained from Table 4.
Method
Calculate the following, based on the observations:
1. Grand total of all the observations, $G = \sum_{i=1}^{t}\sum_{j=1}^{r} y_{ij}$
2. Correction Factor, $CF = G^2/n$
3. Total Sum of Squares, $TSS = \sum_{i=1}^{t}\sum_{j=1}^{r} y_{ij}^2 - CF$
4. Sum of Squares between Treatments, $SST = \dfrac{1}{r}\sum_{i=1}^{t} T_i^2 - CF$, where Ti is the total of the ith treatment observations from all the replications.
5. Error Sum of Squares (Sum of Squares within treatments), ESS = TSS – SST
Analysis of Variance (ANOVA) Table
Sources of variation    Degrees of freedom    Sum of squares    Mean sum of squares
Treatments              t – 1                 SST               SST/(t – 1)
Error                   n – t                 ESS               ESS/(n – t)
Total                   n – 1                 TSS               –
Test Statistic
$$F = \frac{SST/(t - 1)}{ESS/(n - t)}$$
The Statistic F follows F distribution with (t–1, n–t) degrees of freedom. Conclusion
If F ≤ F α,(t–1,n–t), we conclude that the data do not provide us any evidence against the null hypothesis H0, and hence it may be accepted at α% level of significance. Otherwise reject H0 or accept H1.
Example 1
The following data give the weight gains of 20 chicks on which four tropical feedstuffs A, B, C and D were tried. All twenty chicks were treated alike in all respects except the feeding treatments, and each feeding treatment was given to five chicks. Test whether all four feedstuffs are alike in weight gain of the chicks at 5% level of significance.
A:    55     49     42     21     52
B:    61    112     30     89     63
C:    42     97     81     95     92
D:   169    137    169     85    154
Solution
Aim: To test whether all four feedstuffs are equal in weight gain of chicks.
H0: The four feedstuffs are equal in weight gain of chicks.
H1: The four feedstuffs are not equal in weight gain of chicks.
Level of Significance: α = 0.05 and Critical value: F0.05,(3,16) = 3.24
Calculations:
Number of treatments, t = 4; n = 20
T1 = 219, T2 = 355, T3 = 407, T4 = 714
Grand Total, G = 1695
CF = 1695²/20 = 143651.25
TSS = 55² + … + 154² – CF = 181445 – 143651.25 = 37793.75
SST = (1/5)(219² + … + 714²) – CF = 26234.95
ESS = TSS – SST = 11558.80
ANOVA Table:
Sources of variation    Degrees of freedom    Sum of squares    Mean sum of squares
Treatments              3                     26234.95          8744.98
Error                   16                    11558.80          722.42
Total                   19                    37793.75          –
Test Statistic:
$$F = \frac{SST/(t - 1)}{ESS/(n - t)} = \frac{8744.98}{722.42} = 12.105$$
Conclusion: Since F > F0.05,(3,16), we conclude that the data provide evidence against the null hypothesis H0 and in favor of H1. Hence, H1 is accepted at 5% level of significance. That is, the four feedstuffs are not equal in weight gain of chicks.
Example 2
In order to study the yield of five types of sesame, say, A, B, C, D, E, an experiment was conducted using CRD with four pots per type. The outputs are given below. Examine whether all the five types of sesame are equal in their yield at 1% level of significance.
A:   25   21   21   18
B:   25   28   24   25
C:   24   24   16   21
D:   20   17   16   19
E:   14   15   13   11
Solution
Aim: To test whether all five types of sesame are equal in their yields.
H0: The five types of sesame are equal in their yields.
H1: The five types of sesame are not equal in their yields.
Level of Significance: α = 0.01 and Critical value: F0.01,(4,15) = 4.89
Calculations:
Number of treatments, t = 5; n = 20
Grand Total, G = 397
T1 = 85, T2 = 102, T3 = 85, T4 = 72, T5 = 53
CF = 397²/20 = 7880.45
TSS = 25² + … + 11² – CF = 8307 – 7880.45 = 426.55
SST = (1/4)(85² + … + 53²) – CF = 331.30
ESS = TSS – SST = 95.25
ANOVA Table:
Sources of variation    Degrees of freedom    Sum of squares    Mean sum of squares
Treatments              4                     331.30            82.825
Error                   15                    95.25             6.35
Total                   19                    426.55            –
Test Statistic:
$$F = \frac{SST/(t - 1)}{ESS/(n - t)} = \frac{82.825}{6.35} = 13.04$$
Conclusion: Since F > F0.01,(4,15), we conclude that the data provide evidence against the null hypothesis H0 and in favor of H1. Hence, H1 is accepted at 1% level of significance. That is, the five types of sesame are not equal in their yields.
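The method for the completely randomized design can be sketched in Python as below. This is only an illustrative sketch: the function name `crd_anova` is hypothetical, the data are entered as one list per treatment, and the critical value is still taken from the F table rather than computed.

```python
def crd_anova(groups):
    """One-way ANOVA for a completely randomized design.

    groups is a list of lists, one inner list of observations per treatment.
    """
    t = len(groups)
    n = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    cf = grand_total ** 2 / n                               # correction factor
    tss = sum(y * y for g in groups for y in g) - cf        # total SS
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cf    # treatment SS
    ess = tss - sst                                         # error SS
    f_value = (sst / (t - 1)) / (ess / (n - t))
    return {"TSS": tss, "SST": sst, "ESS": ess,
            "df": (t - 1, n - t), "F": f_value}


# Example 1: weight gains of chicks on four feedstuffs.
feeds = [[55, 49, 42, 21, 52],
         [61, 112, 30, 89, 63],
         [42, 97, 81, 95, 92],
         [169, 137, 169, 85, 154]]
# Example 2: yields of five types of sesame.
sesame = [[25, 21, 21, 18],
          [25, 28, 24, 25],
          [24, 24, 16, 21],
          [20, 17, 16, 19],
          [14, 15, 13, 11]]

anova_feeds = crd_anova(feeds)
anova_sesame = crd_anova(sesame)
print(anova_feeds)
print(anova_sesame)
```

Dividing each squared treatment total by its own group size means the same sketch would also cover unequal replication, although the text only treats the equal-replication case.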
EXERCISES
1. To test the effect of a small proportion of coal in the sand used for manufacturing concrete, several batches were mixed under identical conditions except for the variation in the percentage of coal. From each batch, several cylinders were made and tested for breaking strength. The results obtained are given below.
Percentage of coal
.00      .05      .10      .50      1.00
1560     1650     1740     1540     1490
1575     1560     1680     1490     1510
1650     1640     1690     1560     1540
1665     1670     1710     1480     1470
Test whether all the five percentages of coal give equal breaking strength.
2. A varietal trial on green gram was conducted in a CRD with five varieties. The results are given below. Test whether all the five varieties of green gram are equal in their yields at 1% level of significance.
Varieties
1        2        3        4        5
12.5     14.2     14.6     15.2     13.5
14.2     13.5     14.3     14.8     14.2
13.2     12.8     13.8     15.6     14.6
14.3     12.9     12.9     14.9     15.2
15.2     13.2     14.2     15.3     14.9
TEST – 27
ANOCOVA TEST FOR COMPLETELY RANDOMIZED DESIGN Aim
To test the significance of the treatment effects and the significance of the regression coefficient of Y on X, based on the observations from n experimental units. Source
Let (Yij, Xij) (i = 1, 2, …, t; j = 1, 2, …, r) be the observations from an experiment consisting of t treatments, each replicated r times, on two variables Y and X. In addition to the main variable Y under study, observations on an auxiliary or concomitant variable X are available for each of the experimental units. When Y and X are associated, a part of the variation in Y is due to variation in the values of X. After eliminating the effects of blocks and treatments, one can estimate a relationship between Y and X and use that relationship to predict the value of Y for a given value of X. This test is used for assessing the significance of the relationship between X and Y. If there is a significant association between X and Y, one may calculate the adjusted treatment sum of squares and perform the test for the homogeneity of treatment effects. Let n = t × r. The observed data are arranged as follows:
                                Treatments
                    1               2           …           t
                    Y      X        Y      X    …      Y      X
                    Y11    X11      Y21    X21  …      Yt1    Xt1
                    Y12    X12      Y22    X22  …      Yt2    Xt2
                    …      …        …      …    …      …      …
                    Y1r    X1r      Y2r    X2r  …      Ytr    Xtr
Treatment totals    TY1    TX1      TY2    TX2  …      TYt    TXt
Linear Model
The linear model is Yij = µ + τi + b(Xij – X̄) + εij, where Yij is the observation from the jth replication of the ith treatment on the variable Y, Xij is the observation from the jth replication of the ith treatment on the concomitant variable X, X̄ is the mean of X, µ is the overall mean effect, τi is the effect due to the ith treatment, b is the regression coefficient of Y on X and εij is the error effect due to chance causes.
Assumptions
(i) The populations from which the observations are drawn are normally distributed.
(ii) The observations are independent.
(iii) The various effects are additive in nature.
(iv) εij are identically and independently distributed as Normal with mean zero and variance σε².
(v) The auxiliary variable X is correlated with Y.
Null Hypotheses
H0(1): The regression coefficient b is insignificant.
H0(2): The t treatments have equal effect. i.e., H0(2): τ1 = τ2 = … = τt.
Alternative Hypotheses
H1(1): The regression coefficient b is significant.
H1(2): The t treatments do not all have equal effect. i.e., H1(2): τi ≠ τj for at least one pair (i, j).
Level of Significance (α) and Critical Region
F 1 > F α,(1,n–t–1) such that P [F 1 > F α,(1,n–t–1)] = α. F 2 > F α,(t–1,n–t–1) such that P [F 2 > F α,(t–1,n–t–1)] = α. The critical values of F at level of Significance α and degrees of freedoms (1,n–t–1) and (t–1, n–t–1) are given in Table 4. Method
Calculate the following, based on the observations.
For variable Y
1. Grand total of all the observations of Y, $G_Y = \sum_{i=1}^{t}\sum_{j=1}^{r} Y_{ij}$
2. Correction Factor, $CF_Y = \dfrac{G_Y^2}{n}$
3. Total Sum of Squares, $G_{YY} = \sum_{i=1}^{t}\sum_{j=1}^{r} Y_{ij}^2 - CF_Y$
4. Treatment Sum of Squares, $T_{YY} = \dfrac{1}{r}\sum_{i=1}^{t} T_{Yi}^2 - CF_Y$, where TYi is the total of the ith treatment observations of Y.
5. Error Sum of Squares, EYY = GYY – TYY
For variable X
6. Grand total of all the observations of X, $G_X = \sum_{i=1}^{t}\sum_{j=1}^{r} X_{ij}$
7. Correction Factor, $CF_X = \dfrac{G_X^2}{n}$
8. Total Sum of Squares, $G_{XX} = \sum_{i=1}^{t}\sum_{j=1}^{r} X_{ij}^2 - CF_X$
9. Treatment Sum of Squares, $T_{XX} = \dfrac{1}{r}\sum_{i=1}^{t} T_{Xi}^2 - CF_X$, where TXi is the total of the ith treatment observations of X, from all the replications.
10. Error Sum of Squares, EXX = GXX – TXX
For variables Y and X
11. Correction Factor, $CF_{YX} = \dfrac{G_Y \times G_X}{n}$
12. Total Sum of Products of Y and X, $G_{YX} = \sum_{i=1}^{t}\sum_{j=1}^{r} Y_{ij} X_{ij} - CF_{YX}$
13. Treatment Sum of Products of Y and X, $T_{YX} = \dfrac{1}{r}\sum_{i=1}^{t} T_{Yi} T_{Xi} - CF_{YX}$
14. Error Sum of Products, EYX = GYX – TYX
15. The regression coefficient within treatments, b = EYX/EXX
Test Statistic
$$F_1 = \frac{\left(E_{YX}^2/E_{XX}\right)/1}{\left(E_{YY} - E_{YX}^2/E_{XX}\right)/(n - t - 1)}$$
F1 follows the F distribution with (1, n–t–1) degrees of freedom.
Conclusion
If F1 ≤ Fα,(1,n–t–1), accept H0(1) and conclude that the regression coefficient of Y on X is insignificant. If F1 > Fα,(1,n–t–1), reject H0(1) or accept H1(1), conclude that the regression coefficient of Y on X is significant, and proceed to make adjustments for the covariate. Calculate the following adjusted values for the variable Y:
$$G'_{YY} = G_{YY} - \frac{G_{YX}^2}{G_{XX}};\qquad E'_{YY} = E_{YY} - \frac{E_{YX}^2}{E_{XX}};\qquad T'_{YY} = G'_{YY} - E'_{YY}$$
One degree of freedom is lost in error due to fitting a regression line. The above calculations are presented in a single table as follows:
Analysis of Covariance (ANOCOVA) Table
Sources of variation    Degrees of freedom    Sum of squares and products
                                              Y        X        YX
Treatments              t – 1                 TYY      TXX      TYX
Error                   n – t                 EYY      EXX      EYX
Total                   n – 1                 GYY      GXX      GYX
TAR denotes the Treatment Adjusted for the average Regression within treatments.
Sources of variation    Degrees of freedom    Sum of squares    Mean sum of squares
TAR                     t – 1                 T′YY              T′YY/(t – 1)
Error                   n – t – 1             E′YY              E′YY/(n – t – 1)
Total                   n – 2                 G′YY              –
Test Statistic
$$F_2 = \frac{T'_{YY}/(t - 1)}{E'_{YY}/(n - t - 1)}$$
The statistic F2 follows the F distribution with (t–1, n–t–1) degrees of freedom.
Conclusion
If F2 ≤ Fα,(t–1, n–t–1), we conclude that the data do not provide any evidence against the null hypothesis H0(2), and hence it may be accepted at the α% level of significance. Otherwise reject H0(2) or accept H1(2).
Example
The following data show the age, X (in months), and weight, Y (in kg), of samples of children from three states, namely Tamilnadu (A), Kerala (B) and Karnataka (C). Test whether the regression coefficient of Y on X is significant and whether the children from the three states are homogeneous.
       A                  B                  C
Y        X         Y        X         Y        X
7.25     9         10.5     10        8.5      8
8.65     10        12.5     11        12.5     9
12.5     12        7.5      6         18.5     15
15.5     14        15.5     12        16.5     13
16.5     15        16.5     14        13.5     10
Solution
H0(1): The regression coefficient of weight on age, b, is insignificant.
H0(2): The children from the three states are homogeneous.
H1(1): The regression coefficient of weight on age, b, is significant.
H1(2): The children from the three states are not homogeneous.
Level of Significance: α = 0.05
Critical Values: F0.05,(1,11) = 4.84 and F0.05,(2,11) = 3.98
Calculations:
For variable Y
1. $G_Y = 192.4$
2. $CF_Y = \dfrac{G_Y^2}{n} = 2467.85$
3. $G_{YY} = \sum_{i=1}^{t}\sum_{j=1}^{r} Y_{ij}^2 - CF_Y = 2660.3225 - 2467.85 = 192.4725$
4. $T_{YY} = \dfrac{1}{r}\sum_{i=1}^{t} T_{Yi}^2 - CF_Y = 2476.932 - 2467.85 = 9.082$
5. $E_{YY} = G_{YY} - T_{YY} = 192.4725 - 9.082 = 183.3905$
For variable X
6. $G_X = \sum_{i=1}^{t}\sum_{j=1}^{r} X_{ij} = 168$
7. $CF_X = \dfrac{G_X^2}{n} = 1881.6$
8. $G_{XX} = \sum_{i=1}^{t}\sum_{j=1}^{r} X_{ij}^2 - CF_X = 1982 - 1881.6 = 100.4$
9. $T_{XX} = \dfrac{1}{r}\sum_{i=1}^{t} T_{Xi}^2 - CF_X = 1886.8 - 1881.6 = 5.2$
10. $E_{XX} = G_{XX} - T_{XX} = 100.4 - 5.2 = 95.2$
For variables Y and X
11. $CF_{YX} = \dfrac{G_Y \times G_X}{n} = 2154.88$
12. $G_{YX} = \sum_{i=1}^{t}\sum_{j=1}^{r} Y_{ij} X_{ij} - CF_{YX} = 2278.25 - 2154.88 = 123.37$
13. $T_{YX} = \dfrac{1}{r}\sum_{i=1}^{t} T_{Yi} T_{Xi} - CF_{YX} = 2151.8 - 2154.88 = -3.08$
14. $E_{YX} = G_{YX} - T_{YX} = 123.37 - (-3.08) = 126.45$
15. $b = E_{YX}/E_{XX} = 126.45/95.2 = 1.3283$
Test Statistic:
$$F_1 = \frac{\left(E_{YX}^2/E_{XX}\right)/1}{\left(E_{YY} - E_{YX}^2/E_{XX}\right)/(n - t - 1)} = \frac{15989.602/95.2}{(183.3905 - 167.958)/11} = 119.71$$
Conclusion: Since F1 > F0.05,(1,11), reject H0(1), accept H1(1) and conclude that the regression coefficient of Y on X is significant. That is, the regression coefficient of weight on age of the children is significant. Calculate the following adjusted values for the variable Y:
$$G'_{YY} = G_{YY} - \frac{G_{YX}^2}{G_{XX}} = 192.4725 - \frac{(123.37)^2}{100.4} = 40.8773$$
$$E'_{YY} = E_{YY} - \frac{E_{YX}^2}{E_{XX}} = 183.3905 - \frac{(126.45)^2}{95.2} = 15.4325$$
$$T'_{YY} = G'_{YY} - E'_{YY} = 40.8773 - 15.4325 = 25.4448$$
ANOCOVA Table:
Sources of variation    Degrees of freedom    Sum of squares and products
                                              Y          X          YX
Treatments              2                     9.082      5.2        –3.08
Error                   12                    183.39     95.2       126.45
Total                   14                    192.47     100.4      123.37
TAR denotes the treatment adjusted for the average regression within treatments.
Sources of variation    Degrees of freedom    Sum of squares    Mean sum of squares
TAR                     2                     25.4448           12.7224
Error                   11                    15.4325           1.403
Total                   13                    40.8773           –
Test Statistic:
$$F_2 = \frac{T'_{YY}/(t - 1)}{E'_{YY}/(n - t - 1)} = \frac{12.7224}{1.403} = 9.068$$
Conclusion: Since F2 > F0.05,(2,11), we conclude that the data provide evidence against the null hypothesis H0(2) and in favor of H1(2). Hence H1(2) is accepted at 5% level of significance. That is, the children in the three states are not homogeneous in their weights after adjusting for age.
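The two F statistics can be sketched in Python from the error and total sums of squares and products. This is a minimal sketch under stated assumptions: the function name `ancova_f_tests` and its parameter names are hypothetical, and the inputs are the error and total rows of the ANOCOVA table taken exactly as quoted in the worked example, not recomputed from the raw data.

```python
def ancova_f_tests(t, n, e_yy, e_xx, e_yx, g_yy, g_xx, g_yx):
    """F1 (regression) and F2 (adjusted treatments) for ANOCOVA in a CRD."""
    b = e_yx / e_xx                        # regression coefficient within treatments
    regression_ss = e_yx ** 2 / e_xx       # sum of squares due to regression (1 d.f.)
    e_adj = e_yy - regression_ss           # adjusted error SS, E'YY
    g_adj = g_yy - g_yx ** 2 / g_xx        # adjusted total SS, G'YY
    t_adj = g_adj - e_adj                  # adjusted treatment SS, T'YY
    error_ms = e_adj / (n - t - 1)         # one error d.f. lost to the regression
    f1 = (regression_ss / 1) / error_ms
    f2 = (t_adj / (t - 1)) / error_ms
    return {"b": b, "E_adj": e_adj, "G_adj": g_adj, "T_adj": t_adj,
            "F1": f1, "F2": f2}


# Error and total rows quoted in the ANOCOVA table of the example (t = 3, n = 15).
ancova = ancova_f_tests(t=3, n=15,
                        e_yy=183.3905, e_xx=95.2, e_yx=126.45,
                        g_yy=192.4725, g_xx=100.4, g_yx=123.37)
for name, value in ancova.items():
    print(name, "=", round(value, 4))
```

F1 is referred to the F table with (1, n–t–1) degrees of freedom and F2 with (t–1, n–t–1) degrees of freedom.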
TEST – 28
TEST FOR RANDOMIZED BLOCK DESIGN Aim
To test the significance of the t treatment effects and the significance of the r block effects based on the observations from n experimental units. Source
Let yij (i = 1, 2, …, t; j = 1, 2, …, r) be the observations of t treatments, each replicated r times in n experimental units. In this design, the entire experimental material is divided into r homogeneous blocks, and each block is further divided into t sub-units such that t × r = n. The t treatments are allocated at random within each block, separately for each of the r blocks. That is, randomization is restricted within blocks.
Linear Model
The linear model is yij = µ + τi + βj + εij; (i = 1, 2, …, t; j = 1, 2, …, r), where yij is the observation on the ith treatment in the jth block, µ is the overall mean effect, τi is the effect due to the ith treatment, βj is the effect due to the jth block and εij is the error effect due to chance causes.
Assumptions
(i) The populations from which the observations are drawn are normally distributed.
(ii) The observations are independent.
(iii) The various effects are additive in nature.
(iv) εij are identically and independently distributed as Normal with mean zero and variance σε².
Null Hypotheses
H0(1): The t treatments have equal effect. i.e., H0(1): τ1 = τ2 = … = τt.
H0(2): The r blocks have equal effect. i.e., H0(2): β1 = β2 = … = βr.
Alternative Hypotheses
H1(1): The t treatments do not all have equal effect. i.e., H1(1): τi ≠ τj for at least one pair (i, j).
H1(2): The r blocks do not all have equal effect. i.e., H1(2): βi ≠ βj for at least one pair (i, j).
Level of Significance (α) and Critical Region
1. F1 > Fα,(t–1),(t–1)(r–1) such that P[F1 > Fα,(t–1),(t–1)(r–1)] = α.
2. F2 > Fα,(r–1),(t–1)(r–1) such that P[F2 > Fα,(r–1),(t–1)(r–1)] = α.
The critical values of F at level of significance α with degrees of freedom (t – 1, (t – 1)(r – 1)) and (r – 1, (t – 1)(r – 1)) are obtained from Table 4.
Method
Calculate the following, based on the observations.
1. Grand total of all the observations, $G = \sum_{i=1}^{t}\sum_{j=1}^{r} y_{ij}$
2. Correction Factor, $CF = G^2/n$
3. Total Sum of Squares, $TSS = \sum_{i=1}^{t}\sum_{j=1}^{r} y_{ij}^2 - CF$
4. Sum of Squares between Treatments, $SST = \dfrac{1}{r}\sum_{i=1}^{t} T_i^2 - CF$, where Ti is the total of the ith treatment observations.
5. Sum of Squares between Blocks, $SSB = \dfrac{1}{t}\sum_{j=1}^{r} B_j^2 - CF$, where Bj is the total of the jth block observations.
6. Error Sum of Squares, ESS = TSS – SST – SSB.
Analysis of Variance (ANOVA) Table
Sources of variation    Degrees of freedom    Sum of squares    Mean sum of squares
Treatments              t – 1                 SST               SST/(t – 1)
Blocks                  r – 1                 SSB               SSB/(r – 1)
Error                   (t – 1)(r – 1)        ESS               ESS/[(t – 1)(r – 1)]
Total                   n – 1                 TSS               –
Test Statistics
$$(1)\quad F_1 = \frac{SST/(t - 1)}{ESS/[(t - 1)(r - 1)]}$$
$$(2)\quad F_2 = \frac{SSB/(r - 1)}{ESS/[(t - 1)(r - 1)]}$$
The statistic F 1 follows F distribution with (t – 1),(t – 1)(r – 1) degrees of freedom and the statistic F 2 follows F distribution with (r – 1),(t – 1)(r – 1) degrees of freedom. Conclusions
If F1 ≤ Fα,(t–1),(t–1)(r–1), we conclude that the data do not provide any evidence against the null hypothesis H0(1), and hence it may be accepted at the α% level of significance. Otherwise reject H0(1) or accept H1(1). If F2 ≤ Fα,(r–1),(t–1)(r–1), we conclude that the data do not provide any evidence against the null hypothesis H0(2), and hence it may be accepted at the α% level of significance. Otherwise reject H0(2) or accept H1(2).
Example 1
The following results show the yield of three varieties of paddy manure in four plots each using an RBD layout.
Block     Paddy Varieties                   Total
          ADT36      IR20       PONNI
I         46.2       48.5       54.3        149
II        48.4       52.6       57.0        158
III       44.3       51.4       53.3        149
IV        49.1       53.5       51.4        154
Total     188        206        216         610
Solution
Aim:
1. To test whether the yields of all the three varieties of paddy are equal.
2. To test whether the yields in all the four blocks are equal.
H0(1): The yields of all the three varieties of paddy are homogeneous.
H0(2): The yields in all the four blocks are homogeneous.
H1(1): The yields of all the three varieties of paddy are not homogeneous.
H1(2): The yields in all the four blocks are not homogeneous.
Level of Significance: α = 0.05
Critical values: F0.05,(2,6) = 5.14 and F0.05,(3,6) = 4.76
Calculations:
No. of treatments, t = 3; No. of Blocks, r = 4; Grand total, G = 610
CF = 610²/12 = 31008.33
TSS = 46.2² + … + 51.4² – CF = 31153.86 – 31008.33 = 145.53
SST = (1/4)(188² + 206² + 216²) – CF = 100.67
SSB = (1/3)(149² + 158² + 149² + 154²) – CF = 19.003
ESS = TSS – SST – SSB = 25.857
ANOVA Table:
Sources of variation    Degrees of freedom    Sum of squares    Mean sum of squares
Treatments              2                     100.67            50.335
Blocks                  3                     19.003            6.334
Error                   6                     25.857            4.3095
Total                   11                    145.53            –
Test Statistics:
$$1.\quad F_1 = \frac{SST/(t - 1)}{ESS/[(t - 1)(r - 1)]} = \frac{50.335}{4.3095} = 11.68$$
$$2.\quad F_2 = \frac{SSB/(r - 1)}{ESS/[(t - 1)(r - 1)]} = \frac{6.334}{4.3095} = 1.47$$
Conclusions:
1. Since F1 > F0.05,(2,6), we conclude that the data provide evidence against the null hypothesis H0(1) and in favor of H1(1). Hence H1(1) is accepted at 5% level of significance. That is, the yields of the three varieties of paddy are not homogeneous.
2. Since F2 < F0.05,(3,6), we conclude that the data do not provide any evidence against the null hypothesis H0(2), and hence it may be accepted at 5% level of significance. That is, the yields in all the four blocks are homogeneous.
Example 2
A varietal trial was conducted on four varieties of sorghum at a research station. The design adopted was five randomized blocks of four plots each. The yields in lb. per plot obtained from the experiment are as follows. Analyze the data and comment on your findings.
Blocks    Varieties                              Total
          T1       T2       T3       T4
I         22.5     28.2     32.5     26.8        110
II        27.6     29.6     36.8     24.0        118
III       24.4     27.4     34.2     25.0        111
IV        28.6     30.8     35.3     26.3        121
V         25.9     31.0     36.2     23.9        117
Total     129      147      175      126         577
Solution
Aim:
1. To test whether the yields of all the four varieties of sorghum are equal.
2. To test whether the yields in all the five blocks are equal.
H0(1): The yields of all the four varieties of sorghum are homogeneous.
H0(2): The yields in all the five blocks are homogeneous.
H1(1): The yields of all the four varieties of sorghum are not homogeneous.
H1(2): The yields in all the five blocks are not homogeneous.
Level of Significance: α = 0.05
Critical values: F0.05,(3,12) = 3.49 and F0.05,(4,12) = 3.26
Calculations:
No. of treatments, t = 4; No. of Blocks, r = 5; Grand total, G = 577
CF = 577²/20 = 16646.45
TSS = 22.5² + … + 23.9² – CF = 17002.74 – 16646.45 = 356.29
SST = (1/5)(129² + 147² + 175² + 126²) – CF = 303.75
SSB = (1/4)(110² + 118² + 111² + 121² + 117²) – CF = 22.3
ESS = TSS – SST – SSB = 30.24
ANOVA Table:
Sources of variation    Degrees of freedom    Sum of squares    Mean sum of squares
Treatments              3                     303.75            101.25
Blocks                  4                     22.3              5.575
Error                   12                    30.24             2.52
Total                   19                    356.29            –
Test Statistics:
$$1.\quad F_1 = \frac{SST/(t - 1)}{ESS/[(t - 1)(r - 1)]} = \frac{101.25}{2.52} = 40.18$$
$$2.\quad F_2 = \frac{SSB/(r - 1)}{ESS/[(t - 1)(r - 1)]} = \frac{5.575}{2.52} = 2.21$$
Conclusions:
1. Since F1 > F0.05,(3,12), we conclude that the data provide evidence against the null hypothesis H0(1) and in favor of H1(1). Hence H1(1) is accepted at 5% level of significance. That is, the yields of the four varieties of sorghum are not homogeneous.
2. Since F2 < F0.05,(4,12), we conclude that the data do not provide any evidence against the null hypothesis H0(2), and hence it may be accepted at 5% level of significance. That is, the yields in all the five blocks are homogeneous.
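The randomized block calculations can be sketched in Python as follows. The sketch is illustrative only: the function name `rbd_anova` is hypothetical, and the data are entered as one row per block with one column per treatment, which is an assumed layout matching the tables in the two examples.

```python
def rbd_anova(blocks):
    """ANOVA for a randomized block design with one observation per cell.

    blocks is a list of rows; row j holds the t treatment values of block j.
    """
    r = len(blocks)
    t = len(blocks[0])
    n = r * t
    grand_total = sum(sum(row) for row in blocks)
    cf = grand_total ** 2 / n                                    # correction factor
    tss = sum(y * y for row in blocks for y in row) - cf         # total SS
    treatment_totals = [sum(row[i] for row in blocks) for i in range(t)]
    block_totals = [sum(row) for row in blocks]
    sst = sum(total ** 2 for total in treatment_totals) / r - cf  # treatments SS
    ssb = sum(total ** 2 for total in block_totals) / t - cf      # blocks SS
    ess = tss - sst - ssb                                         # error SS
    error_ms = ess / ((t - 1) * (r - 1))
    return {"SST": sst, "SSB": ssb, "ESS": ess,
            "F1": (sst / (t - 1)) / error_ms,
            "F2": (ssb / (r - 1)) / error_ms}


# Example 1: three paddy varieties (columns) in four blocks (rows).
paddy = [[46.2, 48.5, 54.3],
         [48.4, 52.6, 57.0],
         [44.3, 51.4, 53.3],
         [49.1, 53.5, 51.4]]
# Example 2: four sorghum varieties (columns) in five blocks (rows).
sorghum = [[22.5, 28.2, 32.5, 26.8],
           [27.6, 29.6, 36.8, 24.0],
           [24.4, 27.4, 34.2, 25.0],
           [28.6, 30.8, 35.3, 26.3],
           [25.9, 31.0, 36.2, 23.9]]

anova_paddy = rbd_anova(paddy)
anova_sorghum = rbd_anova(sorghum)
print(anova_paddy)
print(anova_sorghum)
```

F1 tests the treatments on (t–1, (t–1)(r–1)) degrees of freedom and F2 tests the blocks on (r–1, (t–1)(r–1)) degrees of freedom, with critical values taken from the F table.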
EXERCISE
1. An experiment was conducted to test the effect of different treatments of warp beams on the warp breakage rates during weaving. Four warp beams A, B, C and D were treated differently and were woven simultaneously on four looms over four days. At the end of each day, the warp beams were interchanged between the four experimental looms in such a manner as to ensure that, after completion of the experiment, each warp beam had worked on each of the four looms for one day. The plan of the experiment and the warp breakage rates are given in the following table. Analyze the data and draw your conclusions.
          Day of weaving
Loom      1           2           3           4
1         4.37(D)     5.24(C)     6.31(B)     6.28(A)
2         6.54(C)     6.58(B)     5.85(A)     5.94(D)
3         5.68(B)     6.12(A)     6.55(D)     5.85(C)
4         6.15(A)     5.85(D)     5.75(C)     6.25(B)
TEST – 29
TEST FOR RANDOMIZED BLOCK DESIGN (More than one observation per cell)
Aim
To test the significance of the t treatment effects, the significance of the r block effects and the interaction between treatments and blocks, based on the observations from n experimental units.
Source
Let yijk (i = 1, 2, …, t; j = 1, 2, …, r; k = 1, 2, …, m) be the kth observation in the ith treatment and in the jth block. Let n = t × r × m.
Linear Model
The linear model is $y_{ijk} = \mu + \tau_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$, where µ is the overall mean effect, τi is the effect due to the ith treatment, βj is the effect due to the jth block, γij is the interaction effect between the ith treatment and the jth block, and εijk is the error effect due to chance causes.
Assumptions
(i) The population from which the observations are drawn is Normal.
(ii) The observations are independent.
(iii) The various effects are additive in nature.
(iv) εijk are independently and identically distributed as Normal with mean zero and variance $\sigma_\varepsilon^2$.
(v) $\sum_{i=1}^{t} \tau_i = \sum_{j=1}^{r} \beta_j = 0$
(vi) $\sum_{i=1}^{t} \gamma_{ij} = 0$ for all j.
(vii) $\sum_{j=1}^{r} \gamma_{ij} = 0$ for all i.
Null Hypotheses
H0(1): The t treatments have equal effect, i.e., τ1 = τ2 = … = τt.
H0(2): The r blocks have equal effect, i.e., β1 = β2 = … = βr.
H0(3): The interaction effect between treatments and blocks is insignificant, i.e., γij = 0 for all i and j. That is, treatment effects and block effects are independent of each other.
Alternative Hypotheses
H1(1): The t treatments do not have equal effect, i.e., at least two of τ1, τ2, …, τt differ.
H1(2): The r blocks do not have equal effect, i.e., at least two of β1, β2, …, βr differ.
H1(3): The interaction effect between treatments and blocks is significant, i.e., γij ≠ 0 for at least one pair (i, j). That is, treatment effects and block effects interact with each other.
Level of Significance (α) and Critical Region
1. F 1 > F α,(t – 1),(tr(m – 1)) such that P[F 1 > F α,(t – 1),(tr(m – 1))] = α.
2. F 2 > F α,(r – 1),(tr(m – 1)) such that P[F 2 > F α,(r – 1),(tr(m – 1))] = α.
3. F 3 > F α,(t – 1)(r – 1),(tr(m – 1)) such that P[F 3 > F α,(t – 1)(r – 1),(tr(m – 1))] = α.
The critical values of F at level of significance α are obtained from Table 4.
Method
Calculate the following, based on the observations:
1. Grand total of all the observations, $G = \sum_{i=1}^{t}\sum_{j=1}^{r}\sum_{k=1}^{m} y_{ijk}$
2. Correction Factor, $\mathrm{CF} = G^2/n$
3. Total Sum of Squares, $\mathrm{TSS} = \sum_{i=1}^{t}\sum_{j=1}^{r}\sum_{k=1}^{m} y_{ijk}^2 - \mathrm{CF}$
4. Sum of Squares between Treatments, $\mathrm{SST} = \frac{1}{rm}\sum_{i=1}^{t} T_i^2 - \mathrm{CF}$, where Ti is the total of the ith treatment observations.
5. Sum of Squares between Blocks, $\mathrm{SSB} = \frac{1}{tm}\sum_{j=1}^{r} B_j^2 - \mathrm{CF}$, where Bj is the total of the jth block observations.
6. Sum of Squares due to interaction, $\mathrm{SSI} = \frac{1}{m}\sum_{i=1}^{t}\sum_{j=1}^{r} T_{ij}^2 - \mathrm{CF} - \mathrm{SST} - \mathrm{SSB}$, where Tij is the total of the m observations in the (i, j)th cell.
7. Error Sum of Squares, ESS = TSS – SST – SSB – SSI.
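The seven steps above translate directly into code. The sketch below is illustrative only: the function name `two_way_anova` and the small 2 × 2 × 2 data set `toy` are invented for the purpose and do not come from the text. It also forms the three F ratios, each mean square being divided by the error mean square ESS/[tr(m – 1)].

```python
def two_way_anova(y):
    """y[i][j] is the list of m observations for treatment i in block j."""
    t, r, m = len(y), len(y[0]), len(y[0][0])
    n = t * r * m
    values = [v for row in y for cell in row for v in cell]
    g = sum(values)                                   # step 1: grand total
    cf = g ** 2 / n                                   # step 2: correction factor
    tss = sum(v ** 2 for v in values) - cf            # step 3
    treat_totals = [sum(sum(cell) for cell in row) for row in y]
    block_totals = [sum(sum(y[i][j]) for i in range(t)) for j in range(r)]
    sst = sum(x ** 2 for x in treat_totals) / (r * m) - cf     # step 4
    ssb = sum(x ** 2 for x in block_totals) / (t * m) - cf     # step 5
    cell_term = sum(sum(cell) ** 2 for row in y for cell in row) / m
    ssi = cell_term - cf - sst - ssb                  # step 6
    ess = tss - sst - ssb - ssi                       # step 7
    error_ms = ess / (t * r * (m - 1))
    return {
        "SST": sst, "SSB": ssb, "SSI": ssi, "ESS": ess,
        "F1": (sst / (t - 1)) / error_ms,
        "F2": (ssb / (r - 1)) / error_ms,
        "F3": (ssi / ((t - 1) * (r - 1))) / error_ms,
    }


# Hypothetical data: 2 treatments x 2 blocks x 2 observations per cell.
toy = [[[1, 3], [2, 4]],
       [[5, 7], [10, 12]]]
out = two_way_anova(toy)
print(out)
```

The function assumes every cell holds the same number m of observations with m > 1, as the design requires; with m = 1 the error degrees of freedom would be zero.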
Analysis of Variance Tests
117
Analysis of Variance Table
Sources of variation    Degrees of freedom    Sum of squares    Mean sum of squares
Treatments              t – 1                 SST               SST/(t – 1)
Blocks                  r – 1                 SSB               SSB/(r – 1)
Interaction             (t – 1)(r – 1)        SSI               SSI/[(t – 1)(r – 1)]
Error                   tr(m – 1)             ESS               ESS/[tr(m – 1)]
Total                   n – 1                 TSS               –
Test Statistics
1. $F_1 = \dfrac{\mathrm{SST}/(t-1)}{\mathrm{ESS}/[tr(m-1)]}$
2. $F_2 = \dfrac{\mathrm{SSB}/(r-1)}{\mathrm{ESS}/[tr(m-1)]}$
3. $F_3 = \dfrac{\mathrm{SSI}/[(t-1)(r-1)]}{\mathrm{ESS}/[tr(m-1)]}$
The statistic F 1 follows F distribution with (t – 1), tr(m – 1) degrees of freedom, the statistic F 2 follows F distribution with (r – 1), tr(m – 1) degrees of freedom and the statistic F 3 follows F distribution with (t – 1)(r – 1), tr(m – 1) degrees of freedom.
Conclusions
If F 1 ≤ F α,(t–1),(tr(m–1)), we conclude that the data do not provide any evidence against the null hypothesis H0(1), and hence it may be accepted at level of significance α. Otherwise reject H0(1) or accept H1(1).
If F 2 ≤ F α,(r–1),(tr(m–1)), we conclude that the data do not provide any evidence against the null hypothesis H0(2), and hence it may be accepted at level of significance α. Otherwise reject H0(2) or accept H1(2).
If F 3 ≤ F α,(t–1)(r–1),(tr(m–1)), we conclude that the data do not provide any evidence against the null hypothesis H0(3), and hence it may be accepted at level of significance α. Otherwise reject H0(3) or accept H1(3).
Example
The following data show the birth weights of babies, classified according to the age of the mother and the order of gravida, there being three observations per cell. Test whether the age of mother and order of gravida significantly affect the birth weight of children.
Order of gravida    Age of mother
                    15 – 20          20 – 25          25 – 30          30 – 35          Above 35
1                   5.1 5.0 4.8      5.0 5.1 5.3      5.1 5.1 4.9      4.9 4.9 5.0      5.0 5.0 5.0
2                   5.2 5.2 5.4      5.3 5.3 5.5      5.3 5.2 5.2      5.2 5.0 5.5      5.1 5.3 5.9
3                   5.8 5.7 5.9      6.0 5.9 6.2      5.8 5.9 5.9      5.8 5.5 5.5      5.9 5.4 5.5
4                   6.0 6.0 5.9      6.2 6.5 6.0      6.0 6.1 6.0      6.0 5.8 5.5      5.8 5.6 5.5
5 & above           6.0 6.0 6.0      6.0 6.1 6.3      5.9 6.0 5.8      5.9 6.0 5.5      5.5 6.0 6.2
Solution
H0(1): The order of gravida is insignificant.
H0(2): The age of mother is insignificant.
H0(3): The interaction between the age of mother and the order of gravida is insignificant.
H1(1): The order of gravida is significant.
H1(2): The age of mother is significant.
H1(3): The interaction between the age of mother and the order of gravida is significant.
Level of Significance: α = 0.05.
Critical values: F 0.05,(4,50) = 2.57 and F 0.05,(16,50) = 2.13
Calculations:
Cell totals Tij., with the row totals Ti.. and the column totals T.j.:
Order of gravida    15 – 20    20 – 25    25 – 30    30 – 35    > 35       Total Ti..    Ti..²
1                   14.9       15.4       15.1       14.8       15.0       75.2          5655.04
2                   15.8       16.1       15.7       15.7       15.4       78.7          6193.69
3                   17.4       18.1       17.6       16.8       16.8       86.7          7516.89
4                   17.9       18.7       18.1       17.3       16.9       88.9          7903.21
≥ 5                 18.0       18.4       17.7       17.4       17.7       89.2          7956.64
Total T.j.          84.0       86.7       84.2       82.0       81.8       418.7         35225.5
T.j.²               7056       7516.89    7089.64    6724.00    6691.24    35077
$\mathrm{CF} = (418.7)^2/75 = 2337.40$
TSS = 2351.19 – 2337.40 = 13.79
$\mathrm{SSG} = \frac{1}{5 \times 3}\sum_{i} T_{i..}^2 - \mathrm{CF} = 10.96$
$\mathrm{SSM} = \frac{1}{5 \times 3}\sum_{j} T_{.j.}^2 - \mathrm{CF} = 1.12$
$\mathrm{SSI} = \frac{1}{3}\sum_{i}\sum_{j} T_{ij.}^2 - \mathrm{CF} - \mathrm{SSG} - \mathrm{SSM} = (7049.33/3) - 2337.40 - 10.96 - 1.12 = 0.30$
ESS = 13.79 – 10.96 – 1.12 – 0.30 = 1.41
ANOVA Table:
Sources of variation    Degrees of freedom    Sum of squares    Mean sum of squares
Order of gravida        4                     10.96             2.74
Mother’s age            4                     1.12              0.28
Interaction             16                    0.30              0.02
Error                   50                    1.41              0.03
Total                   74                    13.79             –
Test Statistics:
1. $F_1 = \dfrac{\mathrm{SSG}/(t-1)}{\mathrm{ESS}/[tr(m-1)]} = \dfrac{2.74}{0.03} = 91.33$
2. $F_2 = \dfrac{\mathrm{SSM}/(r-1)}{\mathrm{ESS}/[tr(m-1)]} = \dfrac{0.28}{0.03} = 9.33$
3. $F_3 = \dfrac{\mathrm{SSI}/[(t-1)(r-1)]}{\mathrm{ESS}/[tr(m-1)]} = \dfrac{0.02}{0.03} = 0.67$
Conclusions:
Since F 1 > F 0.05,(4,50), we conclude that the data provide evidence against the null hypothesis H0(1) and in favor of H1(1). Hence H1(1) is accepted at 5% level of significance. That is, the order of gravida is significant.
Since F 2 > F 0.05,(4,50), we conclude that the data provide evidence against the null hypothesis H0(2) and in favor of H1(2). Hence H1(2) is accepted at 5% level of significance. That is, the mother’s age is significant.
Since F 3 < F 0.05,(16,50), we conclude that the data do not provide any evidence against the null hypothesis H0(3), and hence it is accepted at 5% level of significance. That is, the interaction between the age of mother and the order of gravida does not significantly affect the birth weight of children.
TEST – 30
ANOCOVA TEST FOR RANDOMIZED BLOCK DESIGN
Aim
To test the significance of the treatment effects and the significance of the regression coefficient of Y on X, based on the observations from n experimental units under randomized block design.
Source
Let (Yij, Xij) (i = 1, 2, …, t; j = 1, 2, …, r) be the observations made from an experiment consisting of t treatments, each replicated in r blocks, on two variables Y and X. An observation on the auxiliary or concomitant variable X, apart from the main variable Y under study, is available for each of the experimental units. When Y and X are associated, a part of the variation of Y is due to variation in the values of X. After eliminating the effects of blocks and treatments, one can estimate a relationship between Y and X and use that relationship to predict the value of Y for a given value of X. This test is used for assessing the significance of the relationship between X and Y. If there is a significant association between X and Y, one may calculate the adjusted treatment sum of squares and perform the test for the homogeneity of treatment effects. Let n = t × r. The observed data are arranged as follows:
Blocks              Treatment 1     Treatment 2     …    Treatment t     Block totals
                    Y      X        Y      X        …    Y      X        Y      X
1                   Y11    X11      Y21    X21      …    Yt1    Xt1      BY1    BX1
2                   Y12    X12      Y22    X22      …    Yt2    Xt2      BY2    BX2
…                   …      …        …      …        …    …      …        …      …
r                   Y1r    X1r      Y2r    X2r      …    Ytr    Xtr      BYr    BXr
Treatment totals    TY1    TX1      TY2    TX2      …    TYt    TXt      GY     GX
Linear Model
The linear model is $Y_{ij} = \mu + \tau_i + \beta_j + b(X_{ij} - \bar{X}) + \varepsilon_{ij}$, where Yij is the observation of Y from the jth block of the ith treatment, Xij is the observation of the concomitant variable X from the jth block of the ith treatment, $\bar{X}$ is the mean of X, µ is the overall mean effect, τi is the effect due to the ith treatment, βj is the effect due to the jth block, b is the regression coefficient of Y on X, and εij is the error effect due to chance causes.
Assumptions
(i) The population from which the observations are drawn is Normal.
(ii) The observations are independent.
(iii) The various effects are additive in nature.
(iv) εij are independently and identically distributed as Normal with mean zero and variance $\sigma_\varepsilon^2$.
(v) The auxiliary variable X is correlated with Y.
Null Hypotheses
H0(1): The regression coefficient b is insignificant.
H0(2): The t treatments have equal effect. That is, H0(2): τ1 = τ2 = … = τt.
Alternative Hypotheses
H1(1): The regression coefficient b is significant.
H1(2): The t treatments do not have equal effect. That is, at least two of τ1, τ2, …, τt differ.
Level of Significance (α) and Critical Region
F 1 > F α,(1,(t–1)(r–1)–1) such that P[F 1 > F α,(1,(t–1)(r–1)–1)] = α.
F 2 > F α,(t–1,(t–1)(r–1)–1) such that P[F 2 > F α,(t–1,(t–1)(r–1)–1)] = α.
The critical values of F at level of significance α and degrees of freedom (1, (t − 1)(r − 1) − 1) and (t − 1, (t − 1)(r − 1) − 1) are obtained from Table 4.
Method
Calculate the following, based on the observations.
For variable Y
1. Grand total of all the observations of Y, $G_Y = \sum_{i=1}^{t}\sum_{j=1}^{r} Y_{ij}$
2. Correction Factor, $\mathrm{CF}_Y = G_Y^2/n$
3. Total Sum of Squares (TSS), $G_{YY} = \sum_{i=1}^{t}\sum_{j=1}^{r} Y_{ij}^2 - \mathrm{CF}_Y$
4. Treatment Sum of Squares (SST), $T_{YY} = \frac{1}{r}\sum_{i=1}^{t} T_{Yi}^2 - \mathrm{CF}_Y$, where TYi is the total of the ith treatment observations of Y.
5. Block Sum of Squares (BSS), $B_{YY} = \frac{1}{t}\sum_{j=1}^{r} B_{Yj}^2 - \mathrm{CF}_Y$, where BYj is the total of the jth block observations of Y.
6. Error Sum of Squares (ESS), $E_{YY} = G_{YY} - T_{YY} - B_{YY}$
For variable X
7. Grand total of all the observations of X, $G_X = \sum_{i=1}^{t}\sum_{j=1}^{r} X_{ij}$
8. Correction Factor, $\mathrm{CF}_X = G_X^2/n$
9. Total Sum of Squares (TSS), $G_{XX} = \sum_{i=1}^{t}\sum_{j=1}^{r} X_{ij}^2 - \mathrm{CF}_X$
10. Treatment Sum of Squares (SST), $T_{XX} = \frac{1}{r}\sum_{i=1}^{t} T_{Xi}^2 - \mathrm{CF}_X$, where TXi is the total of the ith treatment observations of X, from all the replications.
11. Block Sum of Squares (BSS), $B_{XX} = \frac{1}{t}\sum_{j=1}^{r} B_{Xj}^2 - \mathrm{CF}_X$, where BXj is the total of the jth block observations of X.
12. Error Sum of Squares (ESS), $E_{XX} = G_{XX} - T_{XX} - B_{XX}$
For variables Y and X
13. Correction Factor, $\mathrm{CF}_{YX} = G_Y G_X / n$
14. Total Sum of Products of Y and X (TSP), $G_{YX} = \sum_{i=1}^{t}\sum_{j=1}^{r} Y_{ij} X_{ij} - \mathrm{CF}_{YX}$
15. Treatment Sum of Products of Y and X (SPT), $T_{YX} = \frac{1}{r}\sum_{i=1}^{t} T_{Yi} T_{Xi} - \mathrm{CF}_{YX}$
16. Block Sum of Products of Y and X (BSP), $B_{YX} = \frac{1}{t}\sum_{j=1}^{r} B_{Yj} B_{Xj} - \mathrm{CF}_{YX}$
17. Error Sum of Products (ESP), $E_{YX} = G_{YX} - T_{YX} - B_{YX}$
18. The regression coefficient within treatment, $b = E_{YX}/E_{XX}$
19. $E = E_{YY} - b\,E_{YX}$
Test Statistic
$F_1 = \dfrac{(E_{YX}^2/E_{XX})/1}{(E_{YY} - E_{YX}^2/E_{XX})/[(t-1)(r-1)-1]}$
F 1 follows F distribution with 1, (t – 1)(r – 1) – 1 degrees of freedom.
Conclusion
If F 1 ≤ F α,(1,(t–1)(r–1)–1), accept H0(1) and conclude that the regression coefficient of Y on X is insignificant. If F 1 > F α,(1,(t–1)(r–1)–1), reject H0(1) or accept H1(1), conclude that the regression coefficient of Y on X is significant, and proceed to make adjustments for the covariate. Calculate the following adjusted values for the variable Y:
$E'_{YY} = E_{YY} + T_{YY}$; $E'_{YX} = E_{YX} + T_{YX}$; $E'_{XX} = E_{XX} + T_{XX}$
$\tilde{b} = E'_{YX}/E'_{XX}$; $E_1 = E'_{YY} - \tilde{b}\,E'_{YX}$
One degree of freedom is lost in error due to fitting a regression line. The above calculations are presented in a single table as follows.
Analysis of Covariance Table
Sources of variation    Degrees of freedom    Sum of squares and products
                                              Y        X        YX
Treatments              t – 1                 TYY      TXX      TYX
Blocks                  r – 1                 BYY      BXX      BYX
Error                   (t – 1)(r – 1)        EYY      EXX      EYX
Total                   n – 1                 GYY      GXX      GYX
TAR denotes the Treatment Adjusted for the average Regression within treatments and R.C denotes the regression coefficient.
Sources               R.C            Adj.SS     Adj.DF                  MSS
TAR                   -              E1 – E     t – 1                   (E1 – E)/(t – 1)
Error                 b              E          (t – 1)(r – 1) – 1      E/[(t – 1)(r – 1) – 1]
Treatment + Error     $\tilde{b}$    E1         r(t – 1) – 1            -
Test Statistic
$F_2 = \dfrac{(E_1 - E)/(t-1)}{E/[(t-1)(r-1)-1]}$
The statistic F 2 follows F distribution with (t – 1), (t – 1)(r – 1) – 1 degrees of freedom.
Conclusion
If F 2 ≤ F α,(t – 1),(t – 1)(r – 1) – 1, we conclude that the data do not provide any evidence against the null hypothesis H0(2), and hence it may be accepted at level of significance α. Otherwise reject H0(2) or accept H1(2).
Example
A fertilizer trial on ADT-31 paddy was conducted in RBD. The grain yield was the primary variable, Y. The number of productive tillers per hill, observed as the mean of ten hills, was the covariate, X. The results are given below. Analyze the data and state your comments.
Treatment    Block I         Block II        Block III       Block IV        Total
             Y       X       Y       X       Y       X       Y       X       Y        X
Control      7.0     5.1     6.4     5.5     8.0     5.0     6.9     5.5     28.3     21.1
AN1          10.8    6.5     9.0     6.3     10.5    6.7     9.6     6.5     39.9     26.0
AN2          13.0    7.6     12.6    7.6     12.0    7.3     13.0    8.6     50.6     31.1
AN3          15.0    8.5     14.8    8.9     14.0    9.5     14.0    9.5     57.8     36.4
AN4          14.8    10.4    15.0    9.5     13.0    9.7     14.1    10.1    56.9     39.7
UN1          9.9     6.3     10.5    6.4     9.0     6.3     9.6     6.2     39.0     25.2
UN2          13.1    7.5     11.9    7.1     12.9    7.8     12.5    7.9     50.4     30.3
UN3          14.4    8.1     14.2    9.5     13.5    9.5     14.1    8.8     56.2     35.9
UN4          15.0    9.2     14.8    10.1    13.8    10.4    12.8    9.9     56.4     39.6
Total        113.0   69.2    109.2   70.9    106.7   72.2    106.6   73.0    435.5    285.3
H0(1): The regression coefficient b is insignificant.
H0(2): The nine treatments have equal effect.
H1(1): The regression coefficient b is significant.
H1(2): The nine treatments do not have equal effect.
Level of Significance: α = 0.05
Critical values: F 0.05,(1,23) = 4.28 and F 0.05,(8,23) = 2.38
Calculations:
Analysis for Y
$\mathrm{CF} = (435.5)^2/36 = 5268.3403$
$\mathrm{TSS} = G_{YY} = (7.0)^2 + (10.8)^2 + \dots + (12.8)^2 - \mathrm{CF} = 227.6097$
$\mathrm{BSS} = B_{YY} = \frac{1}{9}[(113.0)^2 + (109.2)^2 + (106.7)^2 + (106.6)^2] - \mathrm{CF} = 3.003$
$\mathrm{SST} = T_{YY} = \frac{1}{4}[(28.3)^2 + (39.9)^2 + \dots + (56.4)^2] - \mathrm{CF} = 214.7272$
ESS = EYY = 9.8795
Analysis for X
$\mathrm{CF} = (285.3)^2/36 = 2261.0025$
$\mathrm{TSS} = G_{XX} = (5.1)^2 + (6.5)^2 + \dots + (9.9)^2 - \mathrm{CF} = 93.8875$
$\mathrm{BSS} = B_{XX} = \frac{1}{9}[(69.2)^2 + (70.9)^2 + (72.2)^2 + (73.0)^2] - \mathrm{CF} = 0.9186$
$\mathrm{SST} = T_{XX} = \frac{1}{4}[(21.1)^2 + (26.0)^2 + \dots + (39.6)^2] - \mathrm{CF} = 88.89$
ESS = EXX = 4.0789
Analysis for Y and X
$\mathrm{CF} = (435.5)(285.3)/36 = 3451.3375$
$\mathrm{TSP} = G_{YX} = (7.0)(5.1) + (10.8)(6.5) + \dots + (12.8)(9.9) - \mathrm{CF} = 130.7625$
$\mathrm{BSP} = B_{YX} = \frac{1}{9}[(113)(69.2) + (109.2)(70.9) + (106.7)(72.2) + (106.6)(73)] - \mathrm{CF} = 3449.7133 - 3451.3375 = -1.6242$
$\mathrm{SPT} = T_{YX} = \frac{1}{4}[(28.3)(21.1) + (39.9)(26.0) + \dots + (56.4)(39.6)] - \mathrm{CF} = 3582.9950 - 3451.3375 = 131.6575$
ESP = EYX = 0.7292
ANOCOVA Table:
Sources of variation    Degrees of freedom    Sum of squares and products
                                              YY          XX         YX
Blocks                  3                     3.003       0.9186     –1.6242
Treatments              8                     214.7272    88.8900    131.6575
Error                   24                    9.8795      4.0789     0.7292
Treat + Error           32                    224.6067    92.9689    132.3867
Total                   35                    227.6097    93.8875    130.7625
For the covariate X,
Treatment Mean Square, TMS = 88.89/8 = 11.1112
Error Mean Square, EMS = 4.0789/24 = 0.17
F = 11.1112/0.17 = 65.36
Since F is significant at 1% level of significance, we conclude that the covariate is also affected by the treatments.
The regression coefficient within treatment, $b = E_{YX}/E_{XX} = 0.7292/4.0789 = 0.1788$
$E = E_{YY} - E_{YX}^2/E_{XX} = 9.8795 - (0.7292)^2/4.0789 = 9.8795 - 0.13036 = 9.74914$
Test Statistic:
$F_1 = \dfrac{(E_{YX}^2/E_{XX})/1}{(E_{YY} - E_{YX}^2/E_{XX})/[(t-1)(r-1)-1]} = \dfrac{0.13036/1}{9.74914/23} = 0.3075$
Conclusion: Since F 1 < F 0.05,(1,23), F 1 is not significant and hence b is not significant. Since b is not significant, the effect of the covariate in reducing the error will not be significant.
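The regression test of this example depends only on the three entries of the error line of the ANOCOVA table. As a rough check, the sketch below (the function name `ancova_regression_test` is an illustrative choice, not from the text) recomputes b, the adjusted error sum of squares E and F 1 from those entries.

```python
def ancova_regression_test(e_yy, e_xx, e_yx, t, r):
    """Test of the within-treatment regression of Y on X in an RBD."""
    b = e_yx / e_xx                        # regression coefficient
    ss_regression = e_yx ** 2 / e_xx       # sum of squares due to regression (1 d.f.)
    adjusted_error = e_yy - ss_regression  # E = E_YY - b * E_YX
    df_error = (t - 1) * (r - 1) - 1       # one d.f. is lost to the fitted line
    f1 = (ss_regression / 1) / (adjusted_error / df_error)
    return b, adjusted_error, df_error, f1


# Error line of the paddy example: E_YY = 9.8795, E_XX = 4.0789, E_YX = 0.7292,
# with t = 9 treatments and r = 4 blocks.
b, adjusted_error, df_error, f1 = ancova_regression_test(
    9.8795, 4.0789, 0.7292, t=9, r=4
)
print(f"b = {b:.4f}, E = {adjusted_error:.5f}, d.f. = {df_error}, F1 = {f1:.4f}")
```

Only when F 1 exceeds the critical value would one go on to form the adjusted quantities and the statistic F 2 described in the Method.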
TEST – 31
TEST FOR LATIN SQUARE DESIGN
Aim
To test the significance of the m treatment effects, m row effects and m column effects based on the observations from m² experimental units.
Source
Let yijk (i, j, k = 1, 2, …, m) be the observations of m treatments, each replicated m times, in m² experimental units. In this design, the entire experimental material is divided into m² experimental units arranged in a square so that each row and each column contains m units. The m treatments are allocated at random to these rows and columns in such a way that every treatment occurs once and only once in each row and in each column. The advantage of this design is that the treatment effect and the two orthogonal effects, namely the row and column effects, can be studied simultaneously in m² experimental units.
Linear Model
The linear model is $y_{ijk} = \mu + \tau_i + \beta_j + \nu_k + \varepsilon_{ijk}$ (i, j, k = 1, 2, …, m), where yijk is the observation of the ith treatment obtained from the jth row and kth column, µ is the overall mean effect, τi is the effect due to the ith treatment, βj is the effect due to the jth row, νk is the effect due to the kth column and εijk is the error effect due to chance causes.
Assumptions
(i) The population from which the observations are drawn is Normal.
(ii) The observations are independent.
(iii) The various effects are additive in nature.
(iv) εijk are independently and identically distributed as Normal with mean zero and variance $\sigma_\varepsilon^2$.
Null Hypotheses
H0(1): The m treatments have equal effect, i.e., H0(1): τ1 = τ2 = … = τm.
H0(2): The m rows have equal effect, i.e., H0(2): β1 = β2 = … = βm.
H0(3): The m columns have equal effect, i.e., H0(3): ν1 = ν2 = … = νm.
Alternative Hypotheses
H1(1): The m treatments do not have equal effect, i.e., at least two of τ1, τ2, …, τm differ.
H1(2): The m rows do not have equal effect, i.e., at least two of β1, β2, …, βm differ.
H1(3): The m columns do not have equal effect, i.e., at least two of ν1, ν2, …, νm differ.
Level of Significance (α) and Critical Region
F i > F α,(m–1),(m–1)(m–2) such that P[F i > F α,(m–1),(m–1)(m–2)] = α for i = 1, 2, 3.
The critical values of F at level of significance α and degrees of freedom (m − 1, (m − 1)(m − 2)) are obtained from Table 4.
Method
Calculate the following, based on the observations.
1. Grand total of all the observations, $G = \sum_{j=1}^{m}\sum_{k=1}^{m} y_{ijk}$
2. Correction Factor, $\mathrm{CF} = G^2/m^2$
3. Total Sum of Squares, $\mathrm{TSS} = \sum_{j=1}^{m}\sum_{k=1}^{m} y_{ijk}^2 - \mathrm{CF}$
4. Sum of Squares between Treatments, $\mathrm{SST} = \frac{1}{m}\sum_{i=1}^{m} T_i^2 - \mathrm{CF}$, where Ti is the total of the ith treatment observations.
5. Sum of Squares between Rows, $\mathrm{SSR} = \frac{1}{m}\sum_{j=1}^{m} R_j^2 - \mathrm{CF}$, where Rj is the total of the jth row observations.
6. Sum of Squares between Columns, $\mathrm{SSC} = \frac{1}{m}\sum_{k=1}^{m} C_k^2 - \mathrm{CF}$, where Ck is the total of the kth column observations.
7. Error Sum of Squares, ESS = TSS – SST – SSR – SSC.
Analysis of Variance Table
Sources of variation    Degrees of freedom    Sum of squares    Mean sum of squares
Treatments              m – 1                 SST               SST/(m – 1)
Rows                    m – 1                 SSR               SSR/(m – 1)
Columns                 m – 1                 SSC               SSC/(m – 1)
Error                   (m – 1)(m – 2)        ESS               ESS/[(m – 1)(m – 2)]
Total                   m² – 1                TSS               –
Test Statistics
1. $F_1 = \dfrac{\mathrm{SST}/(m-1)}{\mathrm{ESS}/[(m-1)(m-2)]}$
2. $F_2 = \dfrac{\mathrm{SSR}/(m-1)}{\mathrm{ESS}/[(m-1)(m-2)]}$
3. $F_3 = \dfrac{\mathrm{SSC}/(m-1)}{\mathrm{ESS}/[(m-1)(m-2)]}$
Each of the statistics F 1, F 2, F 3 follows F distribution with (m – 1), (m – 1)(m – 2) degrees of freedom.
Conclusions
If F i ≤ F α,(m–1),(m–1)(m–2), we conclude that the data do not provide any evidence against the null hypothesis H0(i), and hence it may be accepted at level of significance α. Otherwise reject H0(i) or accept H1(i), for i = 1, 2, 3.
Example
1. An experiment was carried out to determine the effect of claying the ground on the yield of barley grains; the amounts of clay used were as follows. A: No clay; B: Clay at 100 per acre; C: Clay at 200 per acre; D: Clay at 300 per acre. The yields were in plots of 10 square meters, and the layout and yields were as follows. Analyze all the effects at 5% level of significance.
Row      Column I    Column II    Column III    Column IV    Total
I        D 34.7      A 35.6       B 38.2        C 35.5       144
II       C 38.2      D 34.4       A 42.8        B 37.6       153
III      A 36.4      B 37.2       C 41.7        D 36.7       152
IV       B 39.7      C 38.8       D 40.3        A 38.2       157
Total    149         146          163           148          606
Solution
H0(1): The yields under four types of clay are equal.
H0(2): All the four rows have equal yields.
H0(3): All the four columns have equal yields.
H1(1): The yields under four types of clay are not equal.
H1(2): All the four rows do not have equal yields.
H1(3): All the four columns do not have equal yields.
Level of Significance: α = 0.05 and Critical value: F 0.05,(3,6) = 4.76
Calculations:
m = No. of treatments = No. of rows = No. of columns = 4
No. of experimental units, n = 16.
Treatment totals: T1 = 153, T2 = 152.7, T3 = 154.2, T4 = 146.1
1. $G = \sum_{j=1}^{m}\sum_{k=1}^{m} y_{ijk} = 606$
2. $\mathrm{CF} = G^2/m^2 = 606^2/4^2 = 22952.25$
3. $\mathrm{TSS} = \sum_{j=1}^{m}\sum_{k=1}^{m} y_{ijk}^2 - \mathrm{CF} = 23038.58 - \mathrm{CF} = 86.33$
4. $\mathrm{SST} = \frac{1}{m}\sum_{i=1}^{m} T_i^2 - \mathrm{CF} = \frac{1}{4}(153^2 + 152.7^2 + 154.2^2 + 146.1^2) - \mathrm{CF} = 10.035$
5. $\mathrm{SSR} = \frac{1}{m}\sum_{j=1}^{m} R_j^2 - \mathrm{CF} = \frac{1}{4}(144^2 + 153^2 + 152^2 + 157^2) - \mathrm{CF} = 22.25$
6. $\mathrm{SSC} = \frac{1}{m}\sum_{k=1}^{m} C_k^2 - \mathrm{CF} = \frac{1}{4}(149^2 + 146^2 + 163^2 + 148^2) - \mathrm{CF} = 45.25$
7. ESS = TSS – SST – SSR – SSC = 8.795
ANOVA Table:
Sources of variation    Degrees of freedom    Sum of squares    Mean sum of squares
Treatments              3                     10.035            3.345
Rows                    3                     22.25             7.4167
Columns                 3                     45.25             15.08
Error                   6                     8.795             1.4658
Total                   15                    86.33             –
Test Statistics:
1. $F_1 = \dfrac{\mathrm{SST}/(m-1)}{\mathrm{ESS}/[(m-1)(m-2)]} = 2.28$
2. $F_2 = \dfrac{\mathrm{SSR}/(m-1)}{\mathrm{ESS}/[(m-1)(m-2)]} = 5.06$
3. $F_3 = \dfrac{\mathrm{SSC}/(m-1)}{\mathrm{ESS}/[(m-1)(m-2)]} = 10.29$
Conclusions:
Since F 1 < F 0.05,(3,6), we conclude that the data do not provide any evidence against the null hypothesis H0(1), and hence it may be accepted at 5% level of significance. That is, the four types of clay have equal yields.
Since F 2, F 3 > F 0.05,(3,6), we conclude that the data provide evidence against the null hypotheses H0(2) and H0(3) and in favor of H1(2) and H1(3). Hence, H1(2) and H1(3) are accepted at 5% level of significance. That is, the four rows do not have equal yields and the four columns do not have equal yields.
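The whole Latin square analysis can be reproduced from the layout itself. The following Python sketch (the function name `latin_square_anova` and the way the layout is stored as (treatment, yield) pairs are illustrative choices, not from the text) follows the seven steps of the Method and returns the three F ratios for the barley data.

```python
def latin_square_anova(layout):
    """layout[j][k] = (treatment label, observation) for row j, column k."""
    m = len(layout)
    values = [y for row in layout for _, y in row]
    g = sum(values)                            # grand total
    cf = g ** 2 / m ** 2                       # correction factor
    tss = sum(y ** 2 for y in values) - cf
    row_totals = [sum(y for _, y in row) for row in layout]
    col_totals = [sum(layout[j][k][1] for j in range(m)) for k in range(m)]
    treat_totals = {}
    for row in layout:
        for label, y in row:
            treat_totals[label] = treat_totals.get(label, 0.0) + y
    sst = sum(x ** 2 for x in treat_totals.values()) / m - cf
    ssr = sum(x ** 2 for x in row_totals) / m - cf
    ssc = sum(x ** 2 for x in col_totals) / m - cf
    ess = tss - sst - ssr - ssc
    error_ms = ess / ((m - 1) * (m - 2))
    return {
        "SST": sst, "SSR": ssr, "SSC": ssc, "ESS": ess,
        "F1": (sst / (m - 1)) / error_ms,      # treatments
        "F2": (ssr / (m - 1)) / error_ms,      # rows
        "F3": (ssc / (m - 1)) / error_ms,      # columns
    }


# Layout and yields of the barley example (rows I to IV, columns I to IV).
barley = [
    [("D", 34.7), ("A", 35.6), ("B", 38.2), ("C", 35.5)],
    [("C", 38.2), ("D", 34.4), ("A", 42.8), ("B", 37.6)],
    [("A", 36.4), ("B", 37.2), ("C", 41.7), ("D", 36.7)],
    [("B", 39.7), ("C", 38.8), ("D", 40.3), ("A", 38.2)],
]
anova = latin_square_anova(barley)
for name, value in anova.items():
    print(f"{name} = {value:.4f}")
```

The sketch does not verify that the layout is a valid Latin square (each treatment once per row and column); that is assumed of the input.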
TEST – 32
TEST FOR 2² FACTORIAL DESIGN
Aim
To test the significance of the main effects and the interaction effect, based on an experiment consisting of two factors each with two levels.
Source
In this design, let there be two treatments (factors), say A and B, called simple treatments, whose effects can be tested at two levels, say 0 (absent) and 1 (present). That is, we study the individual effects of A and B as well as their combined effect, called the interaction. This 2² factorial design consists of 4 treatment combinations, namely A0B0, A1B0, A0B1 and A1B1, denoted by ‘1’ (both factors at level 0, i.e., no application of either factor), ‘a’, ‘b’ and ‘ab’ respectively. From these the main effect A, the main effect B and the interaction AB are estimated. The design can be tested in r blocks (replications), so that it requires r × 2² = 4r = n experimental units. [1], [a], [b] and [ab] are called treatment totals and denote, respectively, the totals of the observations of the treatments ‘1’, ‘a’, ‘b’ and ‘ab’ from all the r blocks.
Null Hypotheses
H0(1): All the r blocks have equal effect.
H0(2): The main effect A is insignificant.
H0(3): The main effect B is insignificant.
H0(4): The interaction AB is insignificant.
Alternative Hypotheses
H1(1): All the r blocks do not have equal effect.
H1(2): The main effect A is significant.
H1(3): The main effect B is significant.
H1(4): The interaction AB is significant.
Level of Significance (α) and Critical Region
F 1 > F α,(r–1),3(r–1) such that P[F 1 > F α,(r–1),3(r–1)] = α
F i > F α,1,3(r–1) such that P[F i > F α,1,3(r–1)] = α, for i = 2, 3, 4
Method
Calculate the following:
1. Factorial effect total for the main effect ‘A’: [A] = [ab] + [a] – [b] – [1]
2. Factorial effect total for the main effect ‘B’: [B] = [ab] + [b] – [a] – [1]
3. Factorial effect total for the interaction ‘AB’: [AB] = [ab] – [a] – [b] + [1]
4. Sum of Squares due to main effect ‘A’, $\mathrm{SS}[A] = [A]^2/4r$
5. Sum of Squares due to main effect ‘B’, $\mathrm{SS}[B] = [B]^2/4r$
6. Sum of Squares due to interaction ‘AB’, $\mathrm{SS}[AB] = [AB]^2/4r$
7. The calculations of G, CF, TSS and SSB are the same as in RBD.
8. ESS = TSS – SSB – SS[A] – SS[B] – SS[AB]
Analysis of Variance Table
Sources of variation    Degrees of freedom    Sum of squares    Mean sum of squares
Blocks                  r – 1                 SSB               SSB/(r – 1)
Main effect ‘A’         1                     SS[A]             SS[A]/1
Main effect ‘B’         1                     SS[B]             SS[B]/1
Interaction ‘AB’        1                     SS[AB]            SS[AB]/1
Error                   3(r – 1)              ESS               ESS/[3(r – 1)]
Total                   n – 1                 TSS               –
Test Statistics
$F_1 = \dfrac{\mathrm{SSB}/(r-1)}{\mathrm{ESS}/[3(r-1)]}$
$F_2 = \dfrac{\mathrm{SS}[A]/1}{\mathrm{ESS}/[3(r-1)]}$
$F_3 = \dfrac{\mathrm{SS}[B]/1}{\mathrm{ESS}/[3(r-1)]}$
$F_4 = \dfrac{\mathrm{SS}[AB]/1}{\mathrm{ESS}/[3(r-1)]}$
Conclusions
If F 1 ≤ F α,(r–1),3(r–1), we conclude that the data do not provide any evidence against the null hypothesis H0(1), and hence it may be accepted at level of significance α. Otherwise reject H0(1) or accept H1(1).
If F i ≤ F α,(1,3(r–1)), we conclude that the data do not provide any evidence against the null hypothesis H0(i), and hence it may be accepted at level of significance α. Otherwise reject H0(i) or accept H1(i), for i = 2, 3, 4.
Example
An experiment was planned to study the effect of urea and potash on the yield of tomatoes. All the combinations of two levels of urea [0 cent (p0) and 5 cent (p1) per acre] and two levels of potash [0 cent (k0) and 5 cent (k1) per acre] were studied in an RBD with four replications each. The following are the yields. Analyze the data and state your conclusions.
Block    Treatment yields
I        (1) 23    k 25      p 22     pk 38
II       p 40      (1) 26    k 36     pk 38
III      (1) 29    k 20      pk 30    p 20
IV       pk 34     k 31      p 24     (1) 28
Solution
H0(1): All the four blocks have equal effect.
H0(2): The main effect p is insignificant.
H0(3): The main effect k is insignificant.
H0(4): The interaction pk is insignificant.
H1(1): All the four blocks do not have equal effect.
H1(2): The main effect p is significant.
H1(3): The main effect k is significant.
H1(4): The interaction pk is significant.
Level of Significance: α = 0.05.
Critical Values: F 0.05,(3,9) = 3.86 and F 0.05,(1,9) = 5.12
Calculations:
Treatment totals: [1] = 106; [p] = 106; [k] = 112; [pk] = 140
Block totals: B1 = 108; B2 = 140; B3 = 99; B4 = 117
1. Factorial effect total for the main effect ‘p’: [P] = [pk] + [p] – [k] – [1] = 140 + 106 – 112 – 106 = 28
2. Factorial effect total for the main effect ‘k’: [K] = [pk] + [k] – [p] – [1] = 140 + 112 – 106 – 106 = 40
3. Factorial effect total for the interaction ‘pk’: [PK] = [pk] – [p] – [k] + [1] = 140 – 106 – 112 + 106 = 28
4. Sum of Squares due to main effect ‘p’, $\mathrm{SS}[p] = [P]^2/(4 \times 4) = 28^2/16 = 49$
5. Sum of Squares due to main effect ‘k’, $\mathrm{SS}[k] = [K]^2/(4 \times 4) = 40^2/16 = 100$
6. Sum of Squares due to interaction ‘pk’, $\mathrm{SS}[pk] = [PK]^2/(4 \times 4) = 28^2/16 = 49$
7. G = 464, $\mathrm{CF} = 464^2/16 = 13456$, TSS = 14116 – 13456 = 660, $\mathrm{SSB} = \frac{1}{4}(108^2 + 140^2 + 99^2 + 117^2) - \mathrm{CF} = 13688.5 - 13456 = 232.5$
8. ESS = TSS – SSB – SS[p] – SS[k] – SS[pk] = 660 – 232.5 – 49 – 100 – 49 = 229.5
ANOVA Table:
Sources of variation    Degrees of freedom    Sum of squares    Mean sum of squares
Blocks                  3                     232.5             77.5
Main effect ‘p’         1                     49                49
Main effect ‘k’         1                     100               100
Interaction ‘pk’        1                     49                49
Error                   9                     229.5             25.5
Total                   15                    660               –
Test Statistics:
$F_1 = \dfrac{\mathrm{SSB}/(r-1)}{\mathrm{ESS}/[3(r-1)]} = \dfrac{77.5}{25.5} = 3.04$
$F_2 = \dfrac{\mathrm{SS}[p]/1}{\mathrm{ESS}/[3(r-1)]} = \dfrac{49}{25.5} = 1.92$
$F_3 = \dfrac{\mathrm{SS}[k]/1}{\mathrm{ESS}/[3(r-1)]} = \dfrac{100}{25.5} = 3.92$
$F_4 = \dfrac{\mathrm{SS}[pk]/1}{\mathrm{ESS}/[3(r-1)]} = \dfrac{49}{25.5} = 1.92$
Conclusions:
Since F 1 < F 0.05,(3,9), we conclude that the data do not provide any evidence against the null hypothesis H0(1), and hence it is accepted at 5% level of significance. That is, all the four blocks have equal effect.
Since F i < F 0.05,(1,9) for i = 2, 3, 4, we conclude that the data do not provide any evidence against the null hypothesis H0(i), and hence it is accepted at 5% level of significance. That is, the main effects p and k and the interaction effect pk are insignificant.
TEST – 33
TEST FOR 2³ FACTORIAL DESIGN
Aim
To test the significance of the main effects and interaction effects, based on an experiment consisting of three factors each with two levels.
Source
In this design, let there be three treatments (factors), say A, B and C, called simple treatments, whose effects can be tested at two levels, say 0 (absent) and 1 (present). That is, we study the individual effects of A, B and C as well as their combined effects, called interactions. This 2³ factorial design consists of 8 treatment combinations, namely A0B0C0, A1B0C0, A0B1C0, A0B0C1, A1B1C0, A1B0C1, A0B1C1 and A1B1C1, denoted by ‘1’ (all factors at level 0, i.e., no application of any factor), ‘a’, ‘b’, ‘c’, ‘ab’, ‘ac’, ‘bc’ and ‘abc’ respectively. From these the main effects A, B, C and the interactions AB, AC, BC, ABC are estimated. The design can be tested in r blocks (replications), so that it requires r × 2³ = 8r = n experimental units. [1], [a], [b], [c], [ab], [ac], [bc] and [abc] are called treatment totals and denote, respectively, the totals of the observations of the treatments ‘1’, ‘a’, ‘b’, ‘c’, ‘ab’, ‘ac’, ‘bc’ and ‘abc’ from all the r blocks.
Null Hypotheses
H0(1): All the r blocks have equal effect.
H0(2): The main effect A is insignificant.
H0(3): The main effect B is insignificant.
H0(4): The main effect C is insignificant.
H0(5): The interaction AB is insignificant.
H0(6): The interaction AC is insignificant.
H0(7): The interaction BC is insignificant.
H0(8): The interaction ABC is insignificant.
Alternative Hypotheses
H1(1): All the r blocks do not have equal effect.
H1(2): The main effect A is significant.
H1(3): The main effect B is significant.
H1(4): The main effect C is significant.
H1(5): The interaction AB is significant.
H1(6): The interaction AC is significant.
H1(7): The interaction BC is significant.
H1(8): The interaction ABC is significant.
Level of Significance (α) and Critical Region
F 1 > F α,(r–1),7(r–1) such that P[F 1 > F α,(r–1),7(r–1)] = α.
F m > F α,1,7(r–1) such that P[F m > F α,1,7(r–1)] = α for m = 2, 3, 4, 5, 6, 7, 8.
Method
Yates method of totals and sum of squares of factorial effects in a 2³ factorial experiment
Treatment combination    Treatment totals    Step (1)               Step (2)        Step (3)        Factorial effect totals    Sum of squares
‘1’                      [1]                 [1] + [a] = u1         u1 + u2 = v1    v1 + v2 = w1    G                          CF = G²/8r
a                        [a]                 [b] + [ab] = u2        u3 + u4 = v2    v3 + v4 = w2    [A]                        SSA = [A]²/8r
b                        [b]                 [c] + [ac] = u3        u5 + u6 = v3    v5 + v6 = w3    [B]                        SSB = [B]²/8r
ab                       [ab]                [bc] + [abc] = u4      u7 + u8 = v4    v7 + v8 = w4    [AB]                       SSAB = [AB]²/8r
c                        [c]                 [a] – [1] = u5         u2 – u1 = v5    v2 – v1 = w5    [C]                        SSC = [C]²/8r
ac                       [ac]                [ab] – [b] = u6        u4 – u3 = v6    v4 – v3 = w6    [AC]                       SSAC = [AC]²/8r
bc                       [bc]                [ac] – [c] = u7        u6 – u5 = v7    v6 – v5 = w7    [BC]                       SSBC = [BC]²/8r
abc                      [abc]               [abc] – [bc] = u8      u8 – u7 = v8    v8 – v7 = w8    [ABC]                      SSABC = [ABC]²/8r
The calculations of G, CF, TSS and BSS are the same as in RBD.
ESS = TSS – BSS – SSA – SSB – SSC – SSAB – SSAC – SSBC – SSABC
Analysis of Variance Table
Sources of variation    Degrees of freedom    Sum of squares    Mean sum of squares
Blocks                  r – 1                 BSS               BSS/(r – 1)
Main effect ‘A’         1                     SSA               SSA/1
Main effect ‘B’         1                     SSB               SSB/1
Main effect ‘C’         1                     SSC               SSC/1
Interaction ‘AB’        1                     SSAB              SSAB/1
Interaction ‘AC’        1                     SSAC              SSAC/1
Interaction ‘BC’        1                     SSBC              SSBC/1
Interaction ‘ABC’       1                     SSABC             SSABC/1
Error                   7(r – 1)              ESS               ESS/[7(r – 1)]
Total                   n – 1                 TSS               -
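Test Statistics
$F_1 = \dfrac{\mathrm{BSS}/(r-1)}{\mathrm{ESS}/[7(r-1)]}$
$F_2 = \dfrac{\mathrm{SSA}/1}{\mathrm{ESS}/[7(r-1)]}$
$F_3 = \dfrac{\mathrm{SSB}/1}{\mathrm{ESS}/[7(r-1)]}$
$F_4 = \dfrac{\mathrm{SSC}/1}{\mathrm{ESS}/[7(r-1)]}$
$F_5 = \dfrac{\mathrm{SSAB}/1}{\mathrm{ESS}/[7(r-1)]}$
$F_6 = \dfrac{\mathrm{SSAC}/1}{\mathrm{ESS}/[7(r-1)]}$
$F_7 = \dfrac{\mathrm{SSBC}/1}{\mathrm{ESS}/[7(r-1)]}$
$F_8 = \dfrac{\mathrm{SSABC}/1}{\mathrm{ESS}/[7(r-1)]}$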
Conclusions
If F 1 ≤ F α,(r–1),7(r–1), we conclude that the data do not provide any evidence against the null hypothesis H0(1), and hence it may be accepted at level of significance α. Otherwise reject H0(1) or accept H1(1).
If F m ≤ F α,(1,7(r–1)), we conclude that the data do not provide any evidence against the null hypothesis H0(m), and hence it may be accepted at level of significance α. Otherwise reject H0(m) or accept H1(m), for m = 2, 3, 4, 5, 6, 7, 8.
Example
The following data show the layout and results of a 2³ factorial design laid out in four replicates (blocks). The purpose of the experiment is to determine the effect of different kinds of fertilizers, Nitrogen (N), Potash (K) and Phosphate (P), on potato crop yield.
Block-I:      nk 291    kp 391    p 312     np 373    1 101     k 265     n 106       nkp 450
Block-II:     kp 407    p 324     k 272     nk 306    n 89      nkp 449   np 338      1 106
Block-III:    p 323     1 87      np 324    kp 423    nk 334    k 279     n 128       nkp 471
Block-IV:     np 361    nk 272    n 103     p 324     k 302     1 131     nkp 437     kp 445
Solution
H0: All the treatments as well as the blocks have homogeneous effects.
H1: The treatment effects and the block effects are significant.
Level of Significance: α = 0.05
Critical values: F0.05,(3,21) = 3.07 and F0.05,(1,21) = 4.32
Calculations: n = 32; G = 9324; CF = 9324²/32 = 2716780.5
Block totals: B1 = 2289; B2 = 2291; B3 = 2369; B4 = 2375
Treatment totals: '1' = 425; n = 426; k = 1118; nk = 1203; p = 1283; np = 1396; kp = 1666; nkp = 1807.
TSS = (291)² + (391)² + … + (445)² – CF = 3182118 – 2716780.5 = 465337.5
BSS = (1/8)[(2289)² + … + (2375)²] – CF = 843
SST = (1/4)[(425)² + … + (1807)²] – CF = 456955.5
ESS = TSS – BSS – SST = 7539
Yates method of totals and sums of squares of factorial effects in a 2³ factorial experiment:
Treatment      Total     (1)      (2)      (3)      Effect     Sum of
combination    yield                                totals     squares
'1'            425       851      3172     9324     G          2716780.5 (CF)
n              426       2321     6152     340      [N]        3612.5
k              1118      2679     86       2264     [K]        160178.0
nk             1203      3473     254      112      [NK]       392.0
p              1283      1        1470     2980     [P]        277512.5
np             1396      85       794      168      [NP]       882.0
kp             1666      113      84       –676     [KP]       14280.5
nkp            1807      141      28       –56      [NKP]      98.0

Test Statistic: the error mean square is ESS/7(r – 1) = 7539/7(4 – 1) = 359.
F1 = [BSS/(r – 1)] / [ESS/7(r – 1)] = [843/(4 – 1)] / 359 = 0.78
F2 = [SS[N]/1] / [ESS/7(r – 1)] = 3612.5/359 = 10.06
F3 = [SS[K]/1] / [ESS/7(r – 1)] = 160178/359 = 446.1
F4 = [SS[NK]/1] / [ESS/7(r – 1)] = 392/359 = 1.09
F5 = [SS[P]/1] / [ESS/7(r – 1)] = 277512.5/359 = 773.01
F6 = [SS[NP]/1] / [ESS/7(r – 1)] = 882/359 = 2.45
F7 = [SS[KP]/1] / [ESS/7(r – 1)] = 14280.5/359 = 39.7
F8 = [SS[NKP]/1] / [ESS/7(r – 1)] = 98/359 = 0.27
Conclusions:
1. Since F1 < F0.05,(3,21), we conclude that all the blocks have homogeneous effect.
2. Since F2, F3, F5 and F7 are > F0.05,(1,21), we conclude that the respective factorial effects, namely the main effects N, K and P and the interaction KP, are significant.
3. Since F4, F6 and F8 are < F0.05,(1,21), we conclude that the respective factorial effects, namely the interactions NK, NP and NKP, are insignificant.
TEST – 34
TEST FOR SPLIT PLOT DESIGN
Aim
To test the significance of the effect of main plot treatments and the effect of sub plot treatments.
Source
Suppose we are interested in testing two factors 'a' and 'b', factor 'a' being at p levels a1, a2, …, ap and factor 'b' at q levels b1, b2, …, bq. The different types of treatments are allotted at random to their respective plots. Such an arrangement is called a split-plot design. In this design, the larger plots are called main plots and the smaller plots within the larger plots are called sub-plots. The factor levels allotted to the main plots are called main plot treatments and the factor levels allotted to the sub-plots are called sub-plot treatments. The factor that requires greater precision is assigned to the sub-plots. Each replication is divided into as many main plots as there are main plot treatments, and each main plot is divided into as many sub-plots as there are sub-plot treatments. Hence, there are p main plot treatments, q sub plot treatments and r blocks (replications), so that there are rpq = n experimental units in total. The observations are arranged in a three-way table.
Linear Model
The model for this experiment in randomized blocks is
$Y_{ijk} = \mu + b_i + m_j + m_{ij} + s_k + \delta_{jk} + \varepsilon_{ijk}$ (i = 1, 2, …, r; j = 1, 2, …, p; k = 1, 2, …, q)
where
$Y_{ijk}$ is the observation in the ith block, jth main plot and kth sub plot,
$\mu$ is the overall mean effect,
$b_i$ is the effect due to the ith block,
$m_j$ is the effect due to the jth main plot treatment,
$m_{ij}$ is the main plot error or error (A),
$s_k$ is the effect due to the kth sub plot treatment,
$\delta_{jk}$ is the effect due to interaction between main and sub plot treatments, and
$\varepsilon_{ijk}$ is the error effect due to sub plot and interaction or error (B).
Assumptions
1. The main plot treatments are allocated randomly within each of the blocks.
2. The sub plot treatments are allocated randomly within each main plot.
3. $b_i$, $m_{ij}$ and $\varepsilon_{ijk}$ are independently normally distributed, each with mean zero and with variances $\sigma_b^2$, $\sigma_m^2$ and $\sigma_\varepsilon^2$ respectively.
4. $\sum_j m_j = 0$, $\sum_k s_k = 0$, $\sum_k \delta_{jk} = 0$ for all j, and $\sum_j \delta_{jk} = 0$ for all k.
Null Hypotheses
H0(1): The p main plot treatments have equal effect. i.e., H0(1): m1 = m2 = … = mp.
H0(2): The q sub plot treatments have equal effect. i.e., H0(2): s1 = s2 = … = sq.
H0(3): There is no interaction between main and sub plot treatments. i.e., H0(3): δjk = 0 for all j and k.
Alternative Hypotheses
H1(1): The p main plot treatments do not have equal effect. i.e., H1(1): not all of m1, m2, …, mp are equal.
H1(2): The q sub plot treatments do not have equal effect. i.e., H1(2): not all of s1, s2, …, sq are equal.
H1(3): There is interaction between main and sub plot treatments. i.e., H1(3): δjk ≠ 0 for at least one pair (j, k).
Level of Significance (α) and Critical Region
F1 > Fα,(p–1),(r–1)(p–1) such that P[F1 > Fα,(p–1),(r–1)(p–1)] = α.
F2 > Fα,(q–1),(r–1)p(q–1) such that P[F2 > Fα,(q–1),(r–1)p(q–1)] = α.
F3 > Fα,(p–1)(q–1),(r–1)p(q–1) such that P[F3 > Fα,(p–1)(q–1),(r–1)p(q–1)] = α.
The critical values of F at level of significance α and for the respective degrees of freedom are obtained from Table 4.
Method
Calculate the following, based on the observations.
Main Plot Analysis
1. Grand total of all the n observations, $G = \sum_{i=1}^{r}\sum_{j=1}^{p}\sum_{k=1}^{q} y_{ijk}$
2. Correction Factor, $CF = G^2/n$
3. Total Sum of Squares, $TSS = \sum_{i=1}^{r}\sum_{j=1}^{p}\sum_{k=1}^{q} y_{ijk}^2 - CF$
4. Form a two-way table (BM table) for Blocks × Main plot treatments as follows.

Blocks    Main plot treatments              Total
          1       2       …       p
1         Y11.    Y12.    …       Y1p.      B1
2         Y21.    Y22.    …       Y2p.      B2
…         …       …       …       …         …
r         Yr1.    Yr2.    …       Yrp.      Br
Total     M1      M2      …       Mp        G

5. Sum of Squares in BM table, $SSBM = \frac{1}{q}\sum_i\sum_j Y_{ij.}^2 - CF$
6. Sum of Squares between blocks, $SSB = \frac{1}{pq}\sum_i B_i^2 - CF$
7. Sum of Squares between Main plot treatments, $SSM = \frac{1}{rq}\sum_j M_j^2 - CF$
8. Error Sum of Squares in BM table (Error (A)), ESS(A) = SSBM – SSB – SSM
Sub Plot Analysis
9. Form a two-way table (MS table) for Main plot treatments × Sub plot treatments as follows:

Main plot     Sub plot treatments               Total
treatments    1       2       …       q
1             Y.11    Y.12    …       Y.1q      M1
2             Y.21    Y.22    …       Y.2q      M2
…             …       …       …       …         …
p             Y.p1    Y.p2    …       Y.pq      Mp
Total         S1      S2      …       Sq        G

10. Sum of Squares in MS table, $SSMS = \frac{1}{r}\sum_j\sum_k Y_{.jk}^2 - CF$
11. Sum of Squares between Sub plot treatments, $SSS = \frac{1}{rp}\sum_k S_k^2 - CF$
12. Sum of Squares of Interaction, SSI = SSMS – SSM – SSS
13. Error Sum of Squares (Error (B)), ESS(B) = TSS – SSB – SSM – ESS(A) – SSS – SSI.
Analysis of Variance Table

Sources of variation     Degrees of freedom    Sum of squares    Mean sum of squares
Blocks                   r – 1                 SSB               SSB/(r – 1)
Main Plot Treatments     p – 1                 SSM               SSM/(p – 1)
Error (A)                (r – 1)(p – 1)        ESS(A)            ESS(A)/(r – 1)(p – 1)
Total (BM)               rp – 1                SSBM              –
Sub Plot Treatments      q – 1                 SSS               SSS/(q – 1)
Interaction              (p – 1)(q – 1)        SSI               SSI/(p – 1)(q – 1)
Error (B)                (r – 1)p(q – 1)       ESS(B)            ESS(B)/(r – 1)p(q – 1)
Total (MS)               rp(q – 1)             SSMS              –
Total                    rpq – 1               TSS               –
Test Statistics
1. F1 = [SSM/(p – 1)] / [ESS(A)/(r – 1)(p – 1)]
2. F2 = [SSS/(q – 1)] / [ESS(B)/(r – 1)p(q – 1)]
3. F3 = [SSI/(p – 1)(q – 1)] / [ESS(B)/(r – 1)p(q – 1)]
The statistics F1, F2 and F3 follow F distributions with [(p – 1), (r – 1)(p – 1)], [(q – 1), (r – 1)p(q – 1)] and [(p – 1)(q – 1), (r – 1)p(q – 1)] degrees of freedom respectively.
Conclusions
If F1 ≤ Fα,(p–1),(r–1)(p–1), we conclude that the data do not provide any evidence against the null hypothesis H0(1), and hence it may be accepted at α% level of significance. Otherwise reject H0(1) and accept H1(1).
If F2 ≤ Fα,(q–1),(r–1)p(q–1), we conclude that the data do not provide any evidence against the null hypothesis H0(2), and hence it may be accepted at α% level of significance. Otherwise reject H0(2) and accept H1(2).
If F3 ≤ Fα,(p–1)(q–1),(r–1)p(q–1), we conclude that the data do not provide any evidence against the null hypothesis H0(3), and hence it may be accepted at α% level of significance. Otherwise reject H0(3) and accept H1(3).
Example
An experiment was conducted in split plot design to study the effect of fertilizer (F) and seed rate (S) on the yield of paddy raised under semi-dry condition. The main plot treatments were the seed rates 75, 100 and 125 kg/ha, denoted by s1, s2 and s3 respectively. The sub-plot treatments were the fertilizer rates. They were N:P:K in the rate 75:15:20 = f1; 75:15:40 = f2; 75:15:60 = f3; 75:30:20 = f4; 75:30:40 = f5; 75:30:60 = f6; 75:45:20 = f7; 45:45:40 = f8; 75:45:60 = f9 and 50:15:40 = f10. The layout plan and grain yield of paddy in kg/plot are given in the following table, in which each main plot is laid out as two rows of five sub-plots. Analyze the data and draw the conclusions.

Replication (Block) I
s2    f5 13.82    f2 13.21    f10 11.50   f8 14.46    f3 13.22
      f1 12.98    f6 13.80    f4 13.34    f7 14.10    f9 14.12
s1    f7 11.05    f1 9.75     f6 10.79    f9 11.83    f2 10.21
      f3 10.27    f10 8.06    f5 10.66    f4 10.53    f8 11.96
s3    f10 11.80   f7 14.01    f2 13.58    f9 14.31    f1 13.16
      f8 14.22    f4 13.70    f3 13.62    f6 13.88    f5 13.89

Replication (Block) II
s1    f9 12.31    f4 10.92    f2 10.67    f1 10.14    f3 10.79
      f6 11.31    f10 8.45    f8 12.22    f5 11.28    f7 11.44
s3    f4 13.72    f7 14.02    f1 13.26    f9 14.18    f3 13.65
      f5 13.84    f8 14.19    f6 13.91    f10 12.48   f2 13.56
s2    f10 11.30   f8 14.06    f1 13.12    f9 14.20    f7 13.78
      f2 13.26    f5 13.65    f6 13.70    f4 13.43    f3 13.31

Replication (Block) III
s2    f2 13.36    f8 14.22    f7 14.16    f4 13.69    f9 14.01
      f6 13.92    f10 11.60   f1 13.29    f3 13.48    f5 13.81
s3    f8 14.26    f6 13.81    f10 11.96   f7 14.04    f3 13.54
      f4 13.68    f1 13.31    f9 14.40    f2 13.49    f5 13.74
s1    f6 10.48    f8 11.82    f4 10.40    f10 7.80    f1 10.01
      f9 11.70    f5 10.46    f2 10.23    f7 10.79    f3 10.17
Solution
H0(1): The seed rates have equal effect.
H0(2): The fertilizer rates have equal effect.
H0(3): There is no interaction between seed rate and fertilizer rate.
H1(1): The seed rates do not have equal effect.
H1(2): The fertilizer rates do not have equal effect.
H1(3): There is interaction between seed rate and fertilizer rate.
Level of Significance: α = 0.05.
Critical Values: F0.05,(2,4) = 6.94; F0.05,(9,54) ≈ 2.06; F0.05,(18,54) = 1.79
Calculations: n = 90; r = 3; p = 3; q = 10; G = 1131.61; CF = 14228.2355; TSS = 235.9742
Block × Main plot (BM) table:

Blocks    Main plot (Seed rates)            Total
          s1        s2        s3
1         105.11    134.55    136.17        375.83
2         109.53    133.81    136.81        380.15
3         103.86    135.54    136.23        375.63
Total     318.50    403.90    409.21        1131.61

BM table SS, SSBM = (1/10)[(105.11)² + … + (136.23)²] – CF = 14402.9601 – 14228.2355 = 174.7246
SSB = (1/30)[(375.83)² + … + (375.63)²] – CF = 14228.6703 – 14228.2355 = 0.4348
SS due to Main plot, SSM = (1/30)[(318.50)² + … + (409.21)²] – CF = 14401.0095 – 14228.2355 = 172.7740
ESS(A) = SSBM – SSB – SSM = 1.5158
Main plot × Sub plot (MS) table:

Sub plot    Main plot                       Total
            s1        s2        s3
f1          29.90     39.39     39.73       109.02
f2          31.11     39.83     40.63       111.57
f3          31.23     40.01     40.81       112.05
f4          31.85     40.46     41.10       113.41
f5          32.40     41.28     41.47       115.15
f6          32.58     41.42     41.60       115.60
f7          33.28     42.04     42.07       117.39
f8          36.00     42.74     42.67       121.41
f9          35.84     42.33     42.89       121.06
f10         24.31     34.40     36.24       94.95
Total       318.50    403.90    409.21      1131.61

MS table SS, SSMS = (1/3)[(29.90)² + … + (36.24)²] – CF = 14461.44 – 14228.2355 = 233.2045
SSS = (1/9)[(109.02)² + … + (94.95)²] – CF = 14284.9961 – 14228.2355 = 56.7606
SSI = SSMS – SSM – SSS = 3.6699
ESS(B) = TSS – SSB – SSM – ESS(A) – SSS – SSI = 0.8191
ANOVA table:

Sources of variation     Degrees of freedom    Sum of squares    Mean sum of squares
Blocks                   2                     0.4348            0.2174
Main (Seed rate)         2                     172.7740          86.3870
Error (a)                4                     1.5158            0.3790
Sub (Fertilizer rate)    9                     56.7606           6.3067
Interaction              18                    3.6699            0.2039
Error (b)                54                    0.8191            0.0152
Total                    89                    235.9742          –

Test Statistics:
1. F1 = [SSM/(p – 1)] / [ESS(A)/(r – 1)(p – 1)] = (172.7740/2) / (1.5158/4) = 227.964
2. F2 = [SSS/(q – 1)] / [ESS(B)/(r – 1)p(q – 1)] = (56.7606/9) / (0.8191/54) = 414.914
3. F3 = [SSI/(p – 1)(q – 1)] / [ESS(B)/(r – 1)p(q – 1)] = (3.6699/18) / (0.8191/54) = 13.414
Conclusions:
Since F1 > F0.05,(2,4), we conclude that the data provide evidence against the null hypothesis H0(1) and in favor of H1(1). Hence H1(1) is accepted at 5% level of significance. That is, the seed rates do not have equal effect.
Since F2 > F0.05,(9,54), we conclude that the data provide evidence against the null hypothesis H0(2) and in favor of H1(2). Hence H1(2) is accepted at 5% level of significance. That is, the fertilizer rates do not have equal effect.
Since F3 > F0.05,(18,54), we conclude that the data provide evidence against the null hypothesis H0(3) and in favor of H1(3). Hence H1(3) is accepted at 5% level of significance. That is, there is an interaction between seed rate and fertilizer rate.
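The main plot part of this analysis (steps 5 to 8 of the method) depends only on the Block × Main plot table, so it can be verified separately. The following is a minimal Python sketch under that assumption; the function name `main_plot_analysis` and the dictionary keys are illustrative and not from the original text. The BM table entries are those of the example.

```python
def main_plot_analysis(bm, q):
    """Main plot sums of squares of a split plot design from the BM table.

    bm : r x p table of block-by-main-plot totals (each a sum of q sub plots)
    q  : number of sub plot treatments
    """
    r, p = len(bm), len(bm[0])
    n = r * p * q
    g = sum(sum(row) for row in bm)                # grand total G
    cf = g * g / n                                 # correction factor
    ss_bm = sum(y * y for row in bm for y in row) / q - cf
    ss_b = sum(sum(row) ** 2 for row in bm) / (p * q) - cf        # blocks
    ss_m = sum(sum(col) ** 2 for col in zip(*bm)) / (r * q) - cf  # main plot treatments
    ess_a = ss_bm - ss_b - ss_m                    # error (A)
    f1 = (ss_m / (p - 1)) / (ess_a / ((r - 1) * (p - 1)))
    return {"CF": cf, "SSBM": ss_bm, "SSB": ss_b, "SSM": ss_m, "ESS_A": ess_a, "F1": f1}


# BM table of the example: rows are blocks 1-3, columns are seed rates s1-s3
bm_table = [
    [105.11, 134.55, 136.17],
    [109.53, 133.81, 136.81],
    [103.86, 135.54, 136.23],
]
result = main_plot_analysis(bm_table, q=10)
```

The returned values agree with the hand computation to within rounding of the fourth decimal place.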
TEST – 35
ANOVA TEST FOR STRIP PLOT DESIGN
Aim
To test the significance of the effect of main plot treatments and the effect of sub plot treatments based on strip plot design.
Source
In this design, the main plot treatments are applied at random to rows and the sub plot treatments are applied at random to columns. Suppose we are interested in testing two factors 'a' and 'b', factor 'a' being at p levels a1, a2, …, ap and factor 'b' at q levels b1, b2, …, bq as in split plot design. Hence, there are p main plot treatments, q sub plot treatments and r replications (blocks), so that there are rpq = n experimental units in total. The observations are arranged in a three-way table.
Linear Model
The model for this experiment is
$Y_{ijk} = \mu + r_i + m_j + m_{ij} + s_k + e_{ik} + \delta_{jk} + \varepsilon_{ijk}$ (i = 1, 2, …, r; j = 1, 2, …, p; k = 1, 2, …, q)
where
$Y_{ijk}$ is the observation in the ith block, jth main plot and kth sub plot,
$\mu$ is the overall mean effect,
$r_i$ is the effect due to the ith block,
$m_j$ is the effect due to the jth main plot treatment,
$m_{ij}$ is the main plot error or error (A),
$s_k$ is the effect due to the kth sub plot treatment,
$e_{ik}$ is the sub plot error or error (B),
$\delta_{jk}$ is the effect due to interaction between main and sub plot treatments, and
$\varepsilon_{ijk}$ is the error effect due to interaction or error (C).
Assumptions
1. The main plot treatments are allocated randomly to the rows of each block.
2. The sub plot treatments are allocated randomly to the columns of each block.
3. $r_i$, $m_{ij}$, $e_{ik}$ and $\varepsilon_{ijk}$ are independently normally distributed, each with mean zero and with variances $\sigma_r^2$, $\sigma_m^2$, $\sigma_e^2$ and $\sigma_\varepsilon^2$ respectively.
4. $\sum_j m_j = 0$, $\sum_k s_k = 0$, $\sum_k \delta_{jk} = 0$ for all j, and $\sum_j \delta_{jk} = 0$ for all k.
Null Hypotheses
H0(1): The p main plot treatments have equal effect. i.e., H0(1): m1 = m2 = … = mp.
H0(2): The q sub plot treatments have equal effect. i.e., H0(2): s1 = s2 = … = sq.
H0(3): There is no interaction between main and sub plot treatments. i.e., H0(3): δjk = 0 for all j and k.
Alternative Hypotheses
H1(1): The p main plot treatments do not have equal effect. i.e., H1(1): not all of m1, m2, …, mp are equal.
H1(2): The q sub plot treatments do not have equal effect. i.e., H1(2): not all of s1, s2, …, sq are equal.
H1(3): There is interaction between main and sub plot treatments. i.e., H1(3): δjk ≠ 0 for at least one pair (j, k).
Level of Significance (α) and Critical Region
F1 > Fα,(p–1),(r–1)(p–1) such that P[F1 > Fα,(p–1),(r–1)(p–1)] = α
F2 > Fα,(q–1),(r–1)(q–1) such that P[F2 > Fα,(q–1),(r–1)(q–1)] = α
F3 > Fα,(p–1)(q–1),(r–1)(p–1)(q–1) such that P[F3 > Fα,(p–1)(q–1),(r–1)(p–1)(q–1)] = α
The critical values of F at level of significance α and for the respective degrees of freedom are obtained from Table 4.
Method
Calculate the following, based on the observations:
Main Plot Analysis
1. Grand total of all the n observations, $G = \sum_{i=1}^{r}\sum_{j=1}^{p}\sum_{k=1}^{q} y_{ijk}$
2. Correction Factor, $CF = G^2/n$
3. Total Sum of Squares, $TSS = \sum_{i=1}^{r}\sum_{j=1}^{p}\sum_{k=1}^{q} y_{ijk}^2 - CF$
4. Form a two-way table (BM table) for Block × Main plot treatments as follows.

Block     Main plot treatments              Total
          1       2       …       p
1         Y11.    Y12.    …       Y1p.      R1
2         Y21.    Y22.    …       Y2p.      R2
…         …       …       …       …         …
r         Yr1.    Yr2.    …       Yrp.      Rr
Total     M1      M2      …       Mp        G

5. Sum of Squares in BM table, $SSBM = \frac{1}{q}\sum_i\sum_j Y_{ij.}^2 - CF$
6. Sum of Squares between Blocks, $SSB = \frac{1}{pq}\sum_i R_i^2 - CF$
7. Sum of Squares between Main plot treatments, $SSM = \frac{1}{rq}\sum_j M_j^2 - CF$
8. Error Sum of Squares in BM table (Error (A)), ESS(A) = SSBM – SSB – SSM
Sub Plot Analysis
9. Form a two-way table (BS table) for Block × Sub plot treatments as follows:

Block     Sub plot treatments               Total
          1       2       …       q
1         Y1.1    Y1.2    …       Y1.q      R1
2         Y2.1    Y2.2    …       Y2.q      R2
…         …       …       …       …         …
r         Yr.1    Yr.2    …       Yr.q      Rr
Total     S1      S2      …       Sq        G

10. Sum of Squares in BS table, $SSBS = \frac{1}{p}\sum_i\sum_k Y_{i.k}^2 - CF$
11. Sum of Squares between Sub plot treatments, $SSS = \frac{1}{rp}\sum_k S_k^2 - CF$
12. Error Sum of Squares (Error (B)), ESS(B) = SSBS – SSB – SSS
13. Form a two-way table (MS table) for Main plot treatments × Sub plot treatments as follows:

Main plot     Sub plot treatments               Total
treatments    1       2       …       q
1             Y.11    Y.12    …       Y.1q      M1
2             Y.21    Y.22    …       Y.2q      M2
…             …       …       …       …         …
p             Y.p1    Y.p2    …       Y.pq      Mp
Total         S1      S2      …       Sq        G

14. Sum of Squares in MS table, $SSMS = \frac{1}{r}\sum_j\sum_k Y_{.jk}^2 - CF$
15. Sum of Squares of Interaction, SSI = SSMS – SSM – SSS
16. Error Sum of Squares (Error (C)), ESS(C) = TSS – SSB – SSM – ESS(A) – SSS – ESS(B) – SSI.
Analysis of Variance Table

Sources of variation     Degrees of freedom        Sum of squares    Mean sum of squares
Blocks                   r – 1                     SSB               SSB/(r – 1)
Main Plot Treatments     p – 1                     SSM               SSM/(p – 1)
Error (A)                (r – 1)(p – 1)            ESS(A)            ESS(A)/(r – 1)(p – 1)
Total (BM)               rp – 1                    SSBM              –
Sub Plot Treatments      q – 1                     SSS               SSS/(q – 1)
Error (B)                (r – 1)(q – 1)            ESS(B)            ESS(B)/(r – 1)(q – 1)
Total (BS)               rq – 1                    SSBS              –
Interaction              (p – 1)(q – 1)            SSI               SSI/(p – 1)(q – 1)
Error (C)                (r – 1)(p – 1)(q – 1)     ESS(C)            ESS(C)/(r – 1)(p – 1)(q – 1)
Total (MS)               pq – 1                    SSMS              –
Total                    rpq – 1                   TSS               –
Test Statistics
1. F1 = [SSM/(p – 1)] / [ESS(A)/(r – 1)(p – 1)]
2. F2 = [SSS/(q – 1)] / [ESS(B)/(r – 1)(q – 1)]
3. F3 = [SSI/(p – 1)(q – 1)] / [ESS(C)/(r – 1)(p – 1)(q – 1)]
The statistics F1, F2 and F3 follow F distributions with [(p – 1), (r – 1)(p – 1)], [(q – 1), (r – 1)(q – 1)] and [(p – 1)(q – 1), (r – 1)(p – 1)(q – 1)] degrees of freedom respectively.
Conclusions
If F1 ≤ Fα,(p–1),(r–1)(p–1), we conclude that the data do not provide any evidence against the null hypothesis H0(1), and hence it may be accepted at α% level of significance. Otherwise reject H0(1) and accept H1(1).
If F2 ≤ Fα,(q–1),(r–1)(q–1), we conclude that the data do not provide any evidence against the null hypothesis H0(2), and hence it may be accepted at α% level of significance. Otherwise reject H0(2) and accept H1(2).
If F3 ≤ Fα,(p–1)(q–1),(r–1)(p–1)(q–1), we conclude that the data do not provide any evidence against the null hypothesis H0(3), and hence it may be accepted at α% level of significance. Otherwise reject H0(3) and accept H1(3).
Example
Use the data of the example in Test 34 (split plot design), apply the strip plot design analysis, and draw your conclusions.
Solution
The main plot analysis is the same as in split plot design. Apart from this, we have to form a two-way table (BS table) for Block × Sub plot treatments as follows:

Block    Sub plot treatments
         f1      f2      f3      f4      f5      f6      f7      f8      f9      f10
I        35.89   37.00   37.11   37.57   38.37   38.47   39.16   40.64   40.26   31.36
II       36.52   37.49   37.75   38.07   38.77   38.92   39.24   40.47   40.69   32.23
III      36.61   37.08   37.19   37.77   38.01   38.21   38.99   40.30   40.11   31.36

SSBS = (1/3)[(35.89)² + … + (31.36)²] – CF = 42857.122/3 – 14228.236 = 57.471
SSS = 56.7606; SSI = 3.6699
ESS(B) = SSBS – SSB – SSS = 57.4710 – 0.4348 – 56.7606 = 0.2756
ESS(C) = TSS – SSB – SSM – ESS(A) – SSS – ESS(B) – SSI = 235.9742 – 0.4348 – 172.7740 – 1.5158 – 56.7606 – 0.2756 – 3.6699 = 0.5435
ANOVA Table:

Sources of variation     Degrees of freedom    Sum of squares    Mean sum of squares
Blocks                   2                     0.4348            0.2174
Main Plot Treatments     2                     172.7740          86.3870
Error (A)                4                     1.5158            0.3790
Total (BM)               8                     174.7246          –
Sub Plot Treatments      9                     56.7606           6.3067
Error (B)                18                    0.2756            0.0153
Total (BS)               29                    57.4710           –
Interaction              18                    3.6699            0.2039
Error (C)                36                    0.5435            0.0151
Total (MS)               29                    233.2045          –
Total                    89                    235.9742          –

Test Statistics:
1. F1 = [SSM/(p – 1)] / [ESS(A)/(r – 1)(p – 1)] = (172.7740/2) / (1.5158/4) = 227.964
2. F2 = [SSS/(q – 1)] / [ESS(B)/(r – 1)(q – 1)] = (56.7606/9) / (0.2756/18) = 411.91
3. F3 = [SSI/(p – 1)(q – 1)] / [ESS(C)/(r – 1)(p – 1)(q – 1)] = (3.6699/18) / (0.5435/36) = 13.50
Conclusions:
Since F1 > F0.05,(2,4) = 6.94, we conclude that the data provide evidence against the null hypothesis H0(1) and in favor of H1(1). Hence H1(1) is accepted at 5% level of significance. That is, the seed rates do not have equal effect.
Since F2 > F0.05,(9,18), we conclude that the data provide evidence against the null hypothesis H0(2) and in favor of H1(2). Hence H1(2) is accepted at 5% level of significance. That is, the fertilizer rates do not have equal effect.
Since F3 > F0.05,(18,36), we conclude that the data provide evidence against the null hypothesis H0(3) and in favor of H1(3). Hence H1(3) is accepted at 5% level of significance. That is, there is an interaction between seed rates and fertilizer rates.
CHAPTER – 4
MULTIVARIATE TESTS
TEST – 36
TEST FOR POPULATION MEAN VECTOR (Covariance Matrix is Known)
Aim
To test whether the mean vector µ of a multivariate population may be regarded as µ0, based on a multivariate random sample. That is, to investigate the significance of the difference between the assumed population mean vector µ0 and the sample mean vector $\bar{X}$.
Source
Let $X_{ij}$ (i = 1, 2, …, p; j = 1, 2, …, N) be a random sample of N observations on p variables drawn from a p-variate normal population whose mean vector $\mu = (\mu_1, \mu_2, \ldots, \mu_p)^T$ is unknown and whose covariance matrix
$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}$$
is known. The diagonal elements of Σ are variances, the non-diagonal elements are covariances, and the matrix is symmetric. Let $\bar{X} = (\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_p)^T$, with $\bar{X}_i = \frac{1}{N}\sum_{j=1}^{N} X_{ij}$ (i = 1, 2, …, p), be the sample mean vector, which is an unbiased estimate of the population mean vector µ.
Assumptions
(i) The population from which the sample is drawn is a p-variate normal population.
(ii) The covariance matrix Σ is known.
Null Hypothesis
H0: The population mean vector µ may be regarded as µ0. That is, there is no significant difference between the sample mean vector $\bar{X}$ and the assumed population mean vector µ0. i.e., H0: µ = µ0.
Alternative Hypothesis
H1: µ ≠ µ0
Level of Significance (α) and Critical Region
$\chi^2 > \chi^2_p(\alpha)$ such that $P\{\chi^2 > \chi^2_p(\alpha)\} = \alpha$
Test Statistic
$\chi^2 = N(\bar{X} - \mu_0)^T \Sigma^{-1} (\bar{X} - \mu_0)$ (under H0: µ = µ0)
The statistic χ² follows the χ² distribution with p degrees of freedom.
Conclusion
If $\chi^2 \le \chi^2_p(\alpha)$, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at α% level of significance. Otherwise reject H0 and accept H1.
Example
A random sample of 42 insects of a specific variety is selected, and the mean lengths of the left and right antennae are observed as 0.564 inches and 0.603 inches. Test whether the mean lengths of the left and right antennae of this variety may be regarded as $\mu_0 = (0.55, 0.60)^T$, given the known covariance matrix
$$\Sigma = \begin{pmatrix} 0.014 & 0.012 \\ 0.012 & 0.015 \end{pmatrix}$$
at 5% level of significance.
Solution
H0: The left and right antennae of this variety of insects have the mean lengths $(0.55, 0.60)^T$. i.e., H0: $\mu = (0.55, 0.60)^T$
H1: The mean lengths of the left and right antennae of this variety of insects are not $(0.55, 0.60)^T$. i.e., H1: $\mu \ne (0.55, 0.60)^T$
Level of Significance: α = 0.05 and Critical Value: $\chi^2_{0.05,(2)} = 5.99$
Test Statistic: $\chi^2 = N(\bar{X} - \mu_0)^T \Sigma^{-1} (\bar{X} - \mu_0)$ (under H0: µ = µ0)
$$\chi^2 = 42 \begin{pmatrix} 0.564 - 0.55 & 0.603 - 0.60 \end{pmatrix} \begin{pmatrix} 0.014 & 0.012 \\ 0.012 & 0.015 \end{pmatrix}^{-1} \begin{pmatrix} 0.564 - 0.55 \\ 0.603 - 0.60 \end{pmatrix}$$
$$= 42 \begin{pmatrix} 0.014 & 0.003 \end{pmatrix} \begin{pmatrix} 227.2727 & -181.8182 \\ -181.8182 & 212.1212 \end{pmatrix} \begin{pmatrix} 0.014 \\ 0.003 \end{pmatrix} = 42 \times 0.03118 = 1.31$$
Conclusion: Since $\chi^2 < \chi^2_{0.05,(2)}$, H0 is accepted, and we conclude that the left and right antennae of this variety of insects have the mean lengths $(0.55, 0.60)^T$.
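The test statistic of this example can be evaluated directly from the figures stated in it. The sketch below, in plain Python, is offered as an illustration only; the function name `chi2_mean_known_cov` is an assumption, and the explicit 2 × 2 inverse restricts it to the bivariate case. Evaluated this way, the statistic comes out near 1.31, well below the critical value 5.99.

```python
def chi2_mean_known_cov(n, xbar, mu0, sigma):
    """Chi-square statistic N (xbar - mu0)' Sigma^{-1} (xbar - mu0) for p = 2.

    sigma is a 2 x 2 covariance matrix given as nested lists; its inverse is
    (1/det) * [[d, -b], [-c, a]] for sigma = [[a, b], [c, d]].
    """
    (a, b), (c, d) = sigma
    det = a * d - b * c
    d1 = xbar[0] - mu0[0]
    d2 = xbar[1] - mu0[1]
    quad_form = (d * d1 * d1 - (b + c) * d1 * d2 + a * d2 * d2) / det
    return n * quad_form


# Figures of the antenna example
chi2 = chi2_mean_known_cov(
    n=42,
    xbar=[0.564, 0.603],
    mu0=[0.55, 0.60],
    sigma=[[0.014, 0.012], [0.012, 0.015]],
)
reject_h0 = chi2 > 5.99  # critical value of chi-square with 2 d.f. at the 5% level
```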
TEST – 37
TEST FOR POPULATION MEAN VECTOR (Covariance Matrix is Unknown)
Aim
To test the null hypothesis that the mean vector µ of a multivariate population may be regarded as µ0, based on a multivariate random sample. That is, to investigate the significance of the difference between the assumed population mean vector µ0 and the sample mean vector $\bar{X}$.
Source
Let $X_{ij}$ (i = 1, 2, …, p; j = 1, 2, …, N) be a sample of N observations on p variables drawn from a p-variate normal population whose mean vector $\mu = (\mu_1, \mu_2, \ldots, \mu_p)^T$ and covariance matrix Σ are unknown. Let $\bar{X} = (\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_p)^T$ be the sample mean vector, which is an unbiased estimate of the population mean vector µ. The unknown covariance matrix Σ is estimated by
$$S = \frac{A}{N-1}, \qquad A = \sum_{j=1}^{N} (X_j - \bar{X})(X_j - \bar{X})^T,$$
where $X_j = (X_{1j}, X_{2j}, \ldots, X_{pj})^T$ is the jth observation vector, and
$$S = \begin{pmatrix} S_{11} & S_{12} & \cdots & S_{1p} \\ S_{21} & S_{22} & \cdots & S_{2p} \\ \vdots & \vdots & & \vdots \\ S_{p1} & S_{p2} & \cdots & S_{pp} \end{pmatrix}$$
The diagonal elements of S are variances, the non-diagonal elements are covariances, and the matrix is symmetric.
Assumptions
(i) The population from which the sample is drawn is a p-variate normal population.
(ii) The covariance matrix Σ is unknown.
Null Hypothesis
H0: The population mean vector µ may be regarded as µ0. That is, there is no significant difference between the sample mean vector $\bar{X}$ and the assumed population mean vector µ0. i.e., H0: µ = µ0.
Alternative Hypothesis
H1: µ ≠ µ0
Level of Significance (α) and Critical Region
$F > F_{p,N-p}(\alpha)$ such that $P\{F > F_{p,N-p}(\alpha)\} = \alpha$
Test Statistic
$T^2 = N(\bar{X} - \mu_0)^T S^{-1} (\bar{X} - \mu_0)$ (under H0: µ = µ0), or equivalently
$\dfrac{T^2}{N-1} = N(\bar{X} - \mu_0)^T A^{-1} (\bar{X} - \mu_0)$, and
$F = \dfrac{T^2}{N-1} \cdot \dfrac{N-p}{p}$
The statistic F follows the F distribution with (p, N – p) degrees of freedom.
Conclusion
If $F \le F_{p,N-p}(\alpha)$, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at α% level of significance. Otherwise reject H0 and accept H1.
Note: This test is also known as Hotelling's T² test.
Example
Perspiration from 20 healthy females was analyzed. Three components, X1 = sweat rate, X2 = sodium content, and X3 = potassium content, were measured and the data are given below:

Person    Sweat rate (X1)    Sodium (X2)    Potassium (X3)
1         3.7                48.5           9.3
2         5.7                65.1           8.0
3         3.8                47.2           10.9
4         3.2                53.2           12.0
5         3.1                55.5           9.7
6         4.6                36.1           7.9
7         2.4                24.8           14.0
8         7.2                33.1           7.6
9         6.7                47.4           8.5
10        5.4                54.1           11.3
11        3.9                36.9           12.7
12        4.5                58.8           12.3
13        3.5                27.8           9.8
14        4.5                40.2           8.4
15        1.5                13.5           10.1
16        8.5                56.4           7.1
17        4.5                71.6           8.2
18        6.5                52.8           10.9
19        4.1                44.1           11.2
20        5.5                40.9           9.4

Test the hypothesis H0: µ = [4 50 10] against H1: µ ≠ [4 50 10] at 10% level of significance.
Solution
H0: The average perspiration of females (µ) is [4 50 10]. i.e., H0: µ = [4 50 10]
H1: The average perspiration of females (µ) is not [4 50 10]. i.e., H1: µ ≠ [4 50 10]
Level of Significance: α = 0.10; Critical Value: F0.10,(3,17) = 2.44
Calculations: Based on the above data,
$$\bar{X} = \begin{pmatrix} 4.640 \\ 45.400 \\ 9.965 \end{pmatrix}, \quad S = \begin{pmatrix} 2.879 & 10.002 & -1.810 \\ 10.002 & 199.798 & -5.627 \\ -1.810 & -5.627 & 3.628 \end{pmatrix}, \quad S^{-1} = \begin{pmatrix} 0.586 & -0.022 & 0.258 \\ -0.022 & 0.006 & -0.002 \\ 0.258 & -0.002 & 0.402 \end{pmatrix}$$
Test Statistic: $T^2 = N(\bar{X} - \mu_0)^T S^{-1} (\bar{X} - \mu_0)$ (under H0: µ = µ0)
$$T^2 = 20 \begin{pmatrix} 4.640 - 4 & 45.4 - 50 & 9.965 - 10 \end{pmatrix} \begin{pmatrix} 0.586 & -0.022 & 0.258 \\ -0.022 & 0.006 & -0.002 \\ 0.258 & -0.002 & 0.402 \end{pmatrix} \begin{pmatrix} 4.640 - 4 \\ 45.400 - 50 \\ 9.965 - 10 \end{pmatrix}$$
$$= 20 \begin{pmatrix} 0.640 & -4.600 & -0.035 \end{pmatrix} \begin{pmatrix} 0.467 \\ -0.042 \\ 0.160 \end{pmatrix} = 9.74$$
$$F = \frac{T^2}{N-1} \cdot \frac{N-p}{p} = \frac{9.74}{20-1} \times \frac{20-3}{3} = 2.9049$$
Conclusion: Since F > F0.10,(3,17), H0 is rejected, and we conclude that the average perspiration of females (µ) is not [4 50 10].
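The arithmetic of this example can be retraced with a few lines of plain Python. This is a sketch only; the names `quad_form` and `hotelling_one_sample` are illustrative assumptions, and the inverse matrix is the three-decimal S⁻¹ printed in the example. Because that inverse is rounded, the computed T² comes out close to, but not exactly equal to, the 9.74 obtained in the text.

```python
def quad_form(d, m):
    """Return d' M d for a vector d and a square matrix M (nested lists)."""
    size = len(d)
    return sum(d[i] * m[i][j] * d[j] for i in range(size) for j in range(size))


def hotelling_one_sample(n, xbar, mu0, s_inv):
    """Hotelling's T^2 and the corresponding F statistic for H0: mu = mu0."""
    p = len(xbar)
    diff = [x - m for x, m in zip(xbar, mu0)]
    t2 = n * quad_form(diff, s_inv)
    f = t2 / (n - 1) * (n - p) / p
    return t2, f


# Figures of the perspiration example (S^{-1} as printed, rounded to 3 decimals)
s_inverse = [
    [0.586, -0.022, 0.258],
    [-0.022, 0.006, -0.002],
    [0.258, -0.002, 0.402],
]
t2, f_stat = hotelling_one_sample(
    n=20,
    xbar=[4.640, 45.400, 9.965],
    mu0=[4.0, 50.0, 10.0],
    s_inv=s_inverse,
)
reject_h0 = f_stat > 2.44  # F(3, 17) critical value at the 10% level, from the example
```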
TEST – 38
TEST FOR EQUALITY OF POPULATION MEAN VECTORS (Covariance Matrices are Equal and Known)
Aim
To test whether the mean vectors µ(1) and µ(2) of two multivariate populations are equal, based on two multivariate random samples. That is, to investigate the significance of the difference between the sample mean vectors.
Source
Let $X_{ij}^{(1)}$ (i = 1, 2, …, p; j = 1, 2, …, N1) be a random sample of N1 observations on p variables, called sample-1, drawn from a p-variate normal population whose mean vector is $\mu^{(1)} = (\mu_1^{(1)}, \mu_2^{(1)}, \ldots, \mu_p^{(1)})^T$. Let $X_{ij}^{(2)}$ (i = 1, 2, …, p; j = 1, 2, …, N2) be a random sample of N2 observations on p variables, called sample-2, drawn independently from another p-variate normal population whose mean vector is $\mu^{(2)} = (\mu_1^{(2)}, \mu_2^{(2)}, \ldots, \mu_p^{(2)})^T$. The mean vectors µ(1) and µ(2) are unknown. The covariance matrices of the two populations are equal and known, and their common value is denoted by
$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}$$
The diagonal elements of Σ are variances, the non-diagonal elements are covariances, and the matrix is symmetric. Let $\bar{X}^{(1)} = (\bar{X}_1^{(1)}, \bar{X}_2^{(1)}, \ldots, \bar{X}_p^{(1)})^T$ be the sample mean vector of sample-1, which is an unbiased estimate of the population mean vector µ(1), and let $\bar{X}^{(2)} = (\bar{X}_1^{(2)}, \bar{X}_2^{(2)}, \ldots, \bar{X}_p^{(2)})^T$ be the sample mean vector of sample-2, which is an unbiased estimate of the population mean vector µ(2).
Assumptions
(i) The populations from which the samples are drawn are two independent p-variate normal populations.
(ii) The covariance matrices of the two populations are equal and known, denoted by Σ.
Null Hypothesis
H0: The two population mean vectors µ(1) and µ(2) are equal. That is, there is no significant difference between the two sample mean vectors $\bar{X}^{(1)}$ and $\bar{X}^{(2)}$. i.e., H0: µ(1) = µ(2).
Alternative Hypothesis
H1: µ(1) ≠ µ(2)
Level of Significance (α) and Critical Region
$\chi^2 > \chi^2_p(\alpha)$ such that $P\{\chi^2 > \chi^2_p(\alpha)\} = \alpha$
Test Statistic
$$\chi^2 = \frac{N_1 N_2}{N_1 + N_2} (\bar{X} - \mu)^T \Sigma^{-1} (\bar{X} - \mu), \qquad \bar{X} = \bar{X}^{(1)} - \bar{X}^{(2)}, \quad \mu = \mu^{(1)} - \mu^{(2)}$$
Under H0: µ(1) = µ(2), the test statistic becomes
$$\chi^2 = \frac{N_1 N_2}{N_1 + N_2} (\bar{X}^{(1)} - \bar{X}^{(2)})^T \Sigma^{-1} (\bar{X}^{(1)} - \bar{X}^{(2)})$$
The statistic χ² follows the χ² distribution with p degrees of freedom.
Conclusion
If $\chi^2 \le \chi^2_p(\alpha)$, we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at α% level of significance. Otherwise reject H0 and accept H1.
Example
Fifty observations are taken from the population Iris versicolor (1) and fifty from the population Iris setosa (2) on the characters sepal length (X1), sepal width (X2), petal length (X3) and petal width (X4), in centimeters, and the following measures are obtained:
$$\bar{X}^{(1)} = \begin{pmatrix} 5.936 \\ 2.770 \\ 4.260 \\ 1.326 \end{pmatrix}, \qquad \bar{X}^{(2)} = \begin{pmatrix} 5.006 \\ 3.428 \\ 1.462 \\ 0.246 \end{pmatrix}$$
with known covariance matrix
$$\Sigma = \begin{pmatrix} 19.1434 & 9.0356 & 9.7634 & 3.2394 \\ 9.0356 & 11.8658 & 4.6232 & 2.4746 \\ 9.7634 & 4.6232 & 12.2978 & 3.8794 \\ 3.2394 & 2.4746 & 3.8794 & 2.4604 \end{pmatrix}$$
Test whether the mean vectors of the given four characters of the two populations are equal at 5% level of significance.
Solution
H0: The mean vectors of the given four characters of the two populations are equal. i.e., H0: µ(1) = µ(2).
H1: The mean vectors of the given four characters of the two populations are not equal. i.e., H1: µ(1) ≠ µ(2).
Level of Significance: α = 0.05 and Critical value: $\chi^2_{0.05,(4)} = 9.49$
Test Statistic:
$$\chi^2 = \frac{N_1 N_2}{N_1 + N_2} (\bar{X}^{(1)} - \bar{X}^{(2)})^T \Sigma^{-1} (\bar{X}^{(1)} - \bar{X}^{(2)})$$
$$= \frac{50 \times 50}{50 + 50} \begin{pmatrix} 5.936 - 5.006 \\ 2.770 - 3.428 \\ 4.260 - 1.462 \\ 1.326 - 0.246 \end{pmatrix}^T \begin{pmatrix} 19.1434 & 9.0356 & 9.7634 & 3.2394 \\ 9.0356 & 11.8658 & 4.6232 & 2.4746 \\ 9.7634 & 4.6232 & 12.2978 & 3.8794 \\ 3.2394 & 2.4746 & 3.8794 & 2.4604 \end{pmatrix}^{-1} \begin{pmatrix} 5.936 - 5.006 \\ 2.770 - 3.428 \\ 4.260 - 1.462 \\ 1.326 - 0.246 \end{pmatrix} = 2580.732$$
Conclusion: Since $\chi^2 > \chi^2_{0.05,(4)}$, H0 is rejected, and we conclude that the mean vectors of the given four characters of the two populations are not equal.
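The structure of the two-sample statistic is easiest to see in the bivariate case, where Σ can be inverted by hand. The following plain Python sketch is illustrative only: the function name `chi2_two_sample_known_cov` is an assumption, and the sample sizes, mean vectors and covariance matrix in the usage lines are hypothetical figures chosen for easy checking, not data from the text.

```python
def chi2_two_sample_known_cov(n1, n2, xbar1, xbar2, sigma):
    """Two-sample chi-square statistic for p = 2 with a known common covariance.

    Computes (n1*n2/(n1+n2)) * d' Sigma^{-1} d with d = xbar1 - xbar2, using the
    explicit inverse (1/det) * [[d22, -s12], [-s21, s11]] of a 2 x 2 matrix.
    """
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21
    d1 = xbar1[0] - xbar2[0]
    d2 = xbar1[1] - xbar2[1]
    quad_form = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return n1 * n2 / (n1 + n2) * quad_form


# Hypothetical figures (not from the text), chosen so the result is easy to verify by hand
chi2_identity = chi2_two_sample_known_cov(10, 10, [1.0, 2.0], [0.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])
chi2_correlated = chi2_two_sample_known_cov(10, 10, [1.0, 1.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]])
```

With an identity covariance matrix the statistic reduces to N1N2/(N1 + N2) times the squared distance between the two sample mean vectors; a positive covariance between the variables shrinks the statistic when the two differences have the same sign.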
TEST – 39
TEST FOR EQUALITY OF POPULATION MEAN VECTORS (Covariance Matrices are Equal and Unknown)
Aim
To test whether the mean vectors µ(1) and µ(2) of two multivariate populations are equal, based on two multivariate random samples. That is, to investigate the significance of the difference between the two sample mean vectors.
Source
Let $X_{ij}^{(1)}$ (i = 1, 2, …, p; j = 1, 2, …, N1) be a random sample of N1 observations on p variables, called sample-1, drawn from a p-variate normal population whose mean vector is $\mu^{(1)} = (\mu_1^{(1)}, \mu_2^{(1)}, \ldots, \mu_p^{(1)})^T$. Let $X_{ij}^{(2)}$ (i = 1, 2, …, p; j = 1, 2, …, N2) be a random sample of N2 observations on p variables, called sample-2, drawn independently from another p-variate normal population whose mean vector is $\mu^{(2)} = (\mu_1^{(2)}, \mu_2^{(2)}, \ldots, \mu_p^{(2)})^T$. The mean vectors µ(1) and µ(2) are unknown. The covariance matrices of the two populations are equal but unknown, and their common value is denoted by Σ. The estimate of Σ is given by
$$S = \frac{1}{N_1 + N_2 - 2} \left[ \sum_{j=1}^{N_1} (X_j^{(1)} - \bar{X}^{(1)})(X_j^{(1)} - \bar{X}^{(1)})^T + \sum_{j=1}^{N_2} (X_j^{(2)} - \bar{X}^{(2)})(X_j^{(2)} - \bar{X}^{(2)})^T \right]$$
where $X_j^{(1)}$ and $X_j^{(2)}$ denote the jth observation vectors of sample-1 and sample-2, and
$$S = \begin{pmatrix} S_{11} & S_{12} & \cdots & S_{1p} \\ S_{21} & S_{22} & \cdots & S_{2p} \\ \vdots & \vdots & & \vdots \\ S_{p1} & S_{p2} & \cdots & S_{pp} \end{pmatrix}$$
The diagonal elements of S are variances, the non-diagonal elements are covariances, and the matrix is symmetric. Let $\bar{X}^{(1)} = (\bar{X}_1^{(1)}, \bar{X}_2^{(1)}, \ldots, \bar{X}_p^{(1)})^T$ be the sample mean vector of sample-1, which is an unbiased estimate of the population mean vector µ(1), and let $\bar{X}^{(2)} = (\bar{X}_1^{(2)}, \bar{X}_2^{(2)}, \ldots, \bar{X}_p^{(2)})^T$ be the sample mean vector of sample-2, which is an unbiased estimate of the population mean vector µ(2).
Assumptions
(i) The populations from which the samples are drawn are two independent p-variate normal populations.
(ii) The covariance matrices of the two populations are equal and unknown; the common matrix is denoted by Σ.
Null Hypothesis
H0: The two population mean vectors µ(1) and µ(2) are equal. That is, there is no significant difference between the two sample mean vectors X̄(1) and X̄(2). i.e., H0: µ(1) = µ(2).
Alternative Hypothesis
H1: µ(1) ≠ µ(2)
Level of Significance (α) and Critical Region
F > F p, N1+N2–p–1 (α) such that P{F > F p, N1+N2–p–1 (α)} = α
Test Statistic

$$T^2 = \frac{N_1 N_2}{N_1 + N_2}\,(\bar{X} - \mu)^T S^{-1} (\bar{X} - \mu), \qquad \bar{X} = \bar{X}^{(1)} - \bar{X}^{(2)},\quad \mu = \mu^{(1)} - \mu^{(2)}$$

Under H0: µ(1) = µ(2), hence the test statistic becomes

$$T^2 = \frac{N_1 N_2}{N_1 + N_2}\,(\bar{X}^{(1)} - \bar{X}^{(2)})^T S^{-1} (\bar{X}^{(1)} - \bar{X}^{(2)})$$

and

$$F = \frac{N_1 + N_2 - p - 1}{(N_1 + N_2 - 2)\,p}\; T^2$$

The statistic F follows the F distribution with (p, N1 + N2 – p – 1) degrees of freedom.
Conclusion
If F ≤ F p, N1+N2–p–1 (α), we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at level of significance α. Otherwise reject H0 or accept H1.
Note: This test is also known as Hotelling's T² test.
Example
Two random samples of sizes 45 and 55 were observed from households in Chennai city with and without air conditioning, respectively. Two measurements of electrical usage (in kilowatt hours) were considered. The first is a measure of total on-peak consumption (X1) during July and the second is a measure of total off-peak consumption (X2) during July. The resulting summary statistics are

$$N_1 = 45,\quad \bar{X}^{(1)} = \begin{pmatrix} 204.4 \\ 556.6 \end{pmatrix},\quad S_1 = \begin{pmatrix} 13825.3 & 23823.4 \\ 23823.4 & 73107.4 \end{pmatrix}$$

$$N_2 = 55,\quad \bar{X}^{(2)} = \begin{pmatrix} 130.0 \\ 355.0 \end{pmatrix},\quad S_2 = \begin{pmatrix} 8632.0 & 19616.7 \\ 19616.7 & 55964.5 \end{pmatrix}$$

Test whether the average on-peak and off-peak consumption of the two groups of households are equal at 5% level of significance.
Solution
H0: The average on-peak and off-peak consumption of the two groups are equal. i.e., H0: µ(1) = µ(2).
H1: The average on-peak and off-peak consumption of the two groups are not equal. i.e., H1: µ(1) ≠ µ(2).
Level of Significance: α = 0.05 and Critical value: F0.05,(2,97) = 3.10
Calculations: The pooled sample covariance matrix,

$$S = \frac{(N_1 - 1)S_1 + (N_2 - 1)S_2}{N_1 + N_2 - 2} = \begin{pmatrix} 10963.7 & 21505.5 \\ 21505.5 & 63661.3 \end{pmatrix}$$

$$S^{-1} = \begin{pmatrix} 0.00027035 & -0.000091327 \\ -0.000091327 & 0.00004656 \end{pmatrix}, \qquad \bar{X}^{(1)} - \bar{X}^{(2)} = \begin{pmatrix} 74.4 \\ 201.6 \end{pmatrix}$$

Test Statistic:

$$T^2 = \frac{N_1 N_2}{N_1 + N_2}\,(\bar{X}^{(1)} - \bar{X}^{(2)})^T S^{-1} (\bar{X}^{(1)} - \bar{X}^{(2)}) = \frac{45 \times 55}{45 + 55} \begin{pmatrix} 74.4 & 201.6 \end{pmatrix} \begin{pmatrix} 0.00027035 & -0.000091327 \\ -0.000091327 & 0.00004656 \end{pmatrix} \begin{pmatrix} 74.4 \\ 201.6 \end{pmatrix}$$

$$= \frac{2475}{100} \times \begin{pmatrix} 0.001699 & 0.002592 \end{pmatrix} \begin{pmatrix} 74.4 \\ 201.6 \end{pmatrix} = 24.75 \times 0.6489528 = 16.0616$$

and

$$F = \frac{N_1 + N_2 - p - 1}{(N_1 + N_2 - 2)\,p}\; T^2 = \frac{45 + 55 - 2 - 1}{(45 + 55 - 2) \times 2} \times 16.0616 = 7.9488$$

Conclusion: Since F > F0.05,(2,97), H0 is rejected and we conclude that the average on-peak and off-peak consumption of the two groups are not equal.
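The arithmetic of this example can be checked with a short script. The following is only a sketch for the two-variable case (p = 2), using the closed-form inverse of a 2 × 2 matrix; the function and variable names are illustrative and not part of the text. Because the pooled matrix is not rounded before inversion, the results differ slightly in the later decimals from the hand calculation above.

```python
# Summary statistics from the example (variables: on-peak, off-peak usage).
n1, n2, p = 45, 55, 2
mean1 = [204.4, 556.6]
mean2 = [130.0, 355.0]
S1 = [[13825.3, 23823.4], [23823.4, 73107.4]]
S2 = [[8632.0, 19616.7], [19616.7, 55964.5]]


def pooled_covariance(A, B, na, nb):
    """Pooled estimate ((na-1)A + (nb-1)B) / (na+nb-2), element by element."""
    dof = na + nb - 2
    return [[((na - 1) * a + (nb - 1) * b) / dof for a, b in zip(ra, rb)]
            for ra, rb in zip(A, B)]


def quad_form_inv_2x2(M, d):
    """Return d^T M^{-1} d for a 2x2 matrix M, using the closed-form inverse."""
    (a, b), (c, e) = M
    det = a * e - b * c
    return (e * d[0] ** 2 - (b + c) * d[0] * d[1] + a * d[1] ** 2) / det


S = pooled_covariance(S1, S2, n1, n2)
d = [m1 - m2 for m1, m2 in zip(mean1, mean2)]
T2 = n1 * n2 / (n1 + n2) * quad_form_inv_2x2(S, d)
F = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2
print(round(T2, 2), round(F, 2))
```

The value of F is then compared with the tabulated F value for (2, 97) degrees of freedom, as in the conclusion above.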
TEST – 40
TEST FOR EQUALITY OF POPULATION MEAN VECTORS (Covariance Matrices are Unequal and Unknown)
Aim
To test whether the mean vectors µ(1) and µ(2) of two multivariate populations are equal, based on two multivariate random samples. That is, to investigate the significance of the difference between the two sample mean vectors.
Source
Let Xij(1), (i = 1, 2, …, p; j = 1, 2, …, N1) be a random sample of N1 p-fold observations, called sample-1, drawn from a p-variate normal population whose mean vector is µ(1) = (µ1(1), µ2(1), …, µp(1))T. Let Xij(2), (i = 1, 2, …, p; j = 1, 2, …, N2) be a random sample of N2 p-fold observations, called sample-2, drawn independently from another p-variate normal population whose mean vector is µ(2) = (µ1(2), µ2(2), …, µp(2))T. The mean vectors µ(1) and µ(2) are unknown. The covariance matrices of the two populations are unequal and unknown and are denoted by Σ1 and Σ2. In this case Σ1 is estimated by S1 and Σ2 is estimated by S2, where S1 and S2 are the sample covariance matrices of the two samples.
Let X̄(1) = (X̄1(1), X̄2(1), …, X̄p(1))T be the sample mean vector of sample-1, which is an unbiased estimate of the population mean vector µ(1), and X̄(2) = (X̄1(2), X̄2(2), …, X̄p(2))T be the sample mean vector of sample-2, which is an unbiased estimate of the population mean vector µ(2).
Assumptions
(i) The populations from which the samples are drawn are two independent p-variate normal populations.
(ii) The covariance matrices of the two populations, denoted by Σ1 and Σ2, are unequal and unknown.
Null Hypothesis
H0: The two population mean vectors µ(1) and µ(2) are equal. That is, there is no significant difference between the two sample mean vectors X̄(1) and X̄(2). i.e., H0: µ(1) = µ(2).
Alternative Hypothesis
H1: µ(1) ≠ µ(2)
Level of Significance (α) and Critical Region
T² > χ²α,(p) such that P{T² > χ²α,(p)} = α
Test Statistic

$$T^2 = (\bar{X}^{(1)} - \bar{X}^{(2)})^T \left[\frac{1}{N_1} S_1 + \frac{1}{N_2} S_2\right]^{-1} (\bar{X}^{(1)} - \bar{X}^{(2)})$$

The statistic T² follows the χ² distribution with p degrees of freedom.
Conclusion
If T² ≤ χ²α,(p), we conclude that the data do not provide any evidence against the null hypothesis H0, and hence it may be accepted at level of significance α. Otherwise reject H0 or accept H1.
Example
For the problem given in Test 39, test whether the two samples can be regarded as drawn from populations with the same mean vector at 5% level of significance.
Solution
H0: The average on-peak and off-peak consumption of the two groups are equal. i.e., H0: µ(1) = µ(2).
H1: The average on-peak and off-peak consumption of the two groups are not equal. i.e., H1: µ(1) ≠ µ(2).
Level of Significance: α = 0.05 and Critical value: χ²0.05,(2) = 5.99
Calculations: Given that

$$N_1 = 45,\quad \bar{X}^{(1)} = \begin{pmatrix} 204.4 \\ 556.6 \end{pmatrix},\quad S_1 = \begin{pmatrix} 13825.3 & 23823.4 \\ 23823.4 & 73107.4 \end{pmatrix}$$

$$N_2 = 55,\quad \bar{X}^{(2)} = \begin{pmatrix} 130.0 \\ 355.0 \end{pmatrix},\quad S_2 = \begin{pmatrix} 8632.0 & 19616.7 \\ 19616.7 & 55964.5 \end{pmatrix}$$

$$\frac{1}{N_1} S_1 + \frac{1}{N_2} S_2 = \frac{1}{45} \begin{pmatrix} 13825.3 & 23823.4 \\ 23823.4 & 73107.4 \end{pmatrix} + \frac{1}{55} \begin{pmatrix} 8632.0 & 19616.7 \\ 19616.7 & 55964.5 \end{pmatrix} = \begin{pmatrix} 464.17 & 886.08 \\ 886.08 & 2642.15 \end{pmatrix}$$

Test Statistic:

$$T^2 = (\bar{X}^{(1)} - \bar{X}^{(2)})^T \left[\frac{1}{N_1} S_1 + \frac{1}{N_2} S_2\right]^{-1} (\bar{X}^{(1)} - \bar{X}^{(2)}) = \begin{pmatrix} 204.4 - 130.0 \\ 556.6 - 355.0 \end{pmatrix}^{T} \begin{pmatrix} 464.17 & 886.08 \\ 886.08 & 2642.15 \end{pmatrix}^{-1} \begin{pmatrix} 204.4 - 130.0 \\ 556.6 - 355.0 \end{pmatrix}$$

$$= \begin{pmatrix} 74.4 & 201.6 \end{pmatrix} (10^{-4}) \begin{pmatrix} 59.874 & -20.080 \\ -20.080 & 10.519 \end{pmatrix} \begin{pmatrix} 74.4 \\ 201.6 \end{pmatrix} = 15.66$$

Conclusion: Since T² > χ²α,(p), H0 is rejected and we conclude that the average on-peak and off-peak consumption of the two groups are not equal.
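The calculation for the unequal-covariance case can be sketched in the same way. The script below is an illustration only, restricted to p = 2 with a closed-form 2 × 2 inverse; the names are hypothetical and not from the text.

```python
# Summary statistics carried over from the example in Test 39.
n1, n2 = 45, 55
mean1 = [204.4, 556.6]
mean2 = [130.0, 355.0]
S1 = [[13825.3, 23823.4], [23823.4, 73107.4]]
S2 = [[8632.0, 19616.7], [19616.7, 55964.5]]

# Combined matrix S1/N1 + S2/N2 (no pooling, since covariances are unequal).
V = [[S1[i][j] / n1 + S2[i][j] / n2 for j in range(2)] for i in range(2)]

# Difference of the sample mean vectors.
d = [m1 - m2 for m1, m2 in zip(mean1, mean2)]

# T^2 = d^T V^{-1} d, written out with the closed-form 2x2 inverse.
det = V[0][0] * V[1][1] - V[0][1] * V[1][0]
T2 = (V[1][1] * d[0] ** 2
      - (V[0][1] + V[1][0]) * d[0] * d[1]
      + V[0][0] * d[1] ** 2) / det

print([[round(v, 2) for v in row] for row in V], round(T2, 2))
```

The resulting T² is compared with the tabulated χ² value for 2 degrees of freedom.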
CHAPTER – 5
NON–PARAMETRIC TESTS
TEST – 41
SIGN TEST FOR MEDIAN
Aim
To test whether the population median M can be regarded as M0.
Source
A random sample of n observations is drawn independently. Let M0 be a given value of the population median.
Assumption
Each observation in the sample should be independent of the others.
Null Hypothesis
H0 : M = M0
Alternative Hypotheses
H1(1) : M ≠ M0
H1(2) : M > M0
H1(3) : M < M0
Level of Significance (α) and Critical Value (Tα)
The critical value, Tα, for the level of significance α and sample size n is obtained from Table 5.
Method
1. Discard the sample observations whose value is equal to M0.
2. Count the number of observations below and above M0 and denote them by n1 and n2, respectively.
Test Statistic

$$T = \begin{cases} \min(n_1, n_2) & \text{for } H_1: M \neq M_0 \\ n_1 & \text{for } H_1: M > M_0 \\ n_2 & \text{for } H_1: M < M_0 \end{cases}$$

Conclusion
If T ≥ Tα, accept H0 and if T < Tα, reject H0 or accept H1.
Example
A random sample of 15 students is selected from a school and their heights (in cms) are given below. Test whether the median height of the school students can be regarded as 135 cms. Test at 5% level of significance.
132 134 138 139 142 132 140 136 135 140 139 132 131 136 138
Solution
Aim: To test whether the median height of the school students is 135 cms or not.
H0 : The median height of the school students is 135 cms. i.e., H0: M = 135.
H1 : The median height of the school students is not 135 cms. i.e., H1: M ≠ 135.
Level of Significance: α = 0.05 and Critical Value: T0.05, 15 = 9.
Calculations:
1. Discard the sample observation 135 as it is equal to the hypothesised median.
2. Number of observations below the median, n1 = 5.
3. Number of observations above the median, n2 = 9.
Test Statistic: T = Minimum (n1, n2) = 5.
Conclusion: Since T < T0.05, 15, H0 is rejected and H1 is accepted. Hence, we conclude that the median height of the school students is not 135 cms.
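The counting steps of the method can be sketched as follows. This is a minimal illustration for the two-sided alternative, with hypothetical variable names; the critical value still has to be read from the table.

```python
# Heights (in cms) of the 15 students in the example.
heights = [132, 134, 138, 139, 142, 132, 140, 136,
           135, 140, 139, 132, 131, 136, 138]
m0 = 135  # hypothesised median M0

# Observations equal to M0 are discarded by not counting them on either side.
n1 = sum(1 for h in heights if h < m0)  # number below M0
n2 = sum(1 for h in heights if h > m0)  # number above M0

# Two-sided alternative H1: M != M0.
T = min(n1, n2)
print(n1, n2, T)
```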
TEST – 42
SIGN TEST FOR MEDIAN (Paired Observations)
Aim
To test whether the population medians M1 and M2 are equal.
Source
Two random samples of n pairs of observations are drawn from two populations. The population medians M1 and M2 are unknown.
Assumptions
(i) Each pair of observations should be taken under the same conditions.
(ii) The different pairs need not be taken under similar conditions.
Null Hypothesis
H0 : M1 = M2
Alternative Hypothesis
H1 : M1 ≠ M2
Level of Significance (α) and Critical Value (Tα)
The critical value, Tα, for the level of significance α and sample size n is obtained from Table 6.
Method
1. Let (Xi, Yi), (i = 1, 2, …, n) be the pairs of observations.
2. Find Xi – Yi for each of the n pairs.
3. Put a '+' sign, if Xi – Yi > 0.
4. Put a '–' sign, if Xi – Yi < 0.
5. Count the number of '+' signs and denote it by T+.
6. Count the number of '–' signs and denote it by T–.
Test Statistic
T = Min (T+, T–)
Conclusion
If T ≥ Tα, accept H0 and if T < Tα, reject H0 or accept H1.
Example
A sample of 14 students is selected from a matriculation school whose marks in the internal assessment test (X) and the external examination (Y) are as follows.
X: 85 89 78 72 68 65 78 75 79 78 82 85 84 73
Y: 88 79 85 80 75 62 79 80 85 75 80 88 85 75
Examine whether the median marks of the two examinations are the same at 5% level of significance.
Solution
Aim: To test whether the median marks of the two examinations are equal or not.
H0: The median marks of the two examinations are equal.
H1: The median marks of the two examinations are not equal.
Level of Significance: α = 0.05 and Critical value: T0.05, 14 = 2.
Calculations:
X:    85  89  78  72  68  65  78  75  79  78  82  85  84  73
Y:    88  79  85  80  75  62  79  80  85  75  80  88  85  75
X–Y:   –   +   –   –   –   +   –   –   –   +   +   –   –   –
T+ = 4; T– = 10.
Test Statistic: T = Minimum (T+, T–) = 4
Conclusion: Since T > T0.05, 14, accept H0 and conclude that the median marks of the two examinations are equal.
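The sign counts for the 14 pairs used in the solution can be reproduced with a few lines. This is only a sketch; the variable names are illustrative.

```python
# Marks of the 14 students: internal assessment (X) and external examination (Y).
X = [85, 89, 78, 72, 68, 65, 78, 75, 79, 78, 82, 85, 84, 73]
Y = [88, 79, 85, 80, 75, 62, 79, 80, 85, 75, 80, 88, 85, 75]

# Sign of each difference X_i - Y_i; zero differences would get no sign.
diffs = [x - y for x, y in zip(X, Y)]
T_plus = sum(1 for d in diffs if d > 0)   # number of '+' signs
T_minus = sum(1 for d in diffs if d < 0)  # number of '-' signs

T = min(T_plus, T_minus)
print(T_plus, T_minus, T)
```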
TEST – 43
MEDIAN TEST
Aim
To test whether the two samples are drawn from populations having the same median.
Source
A random sample of n1 observations, arranged in order of magnitude as X1, X2, …, Xn1, is drawn from a population with density function f1(.) and a random sample of n2 observations, arranged in order of magnitude as Y1, Y2, …, Yn2, is drawn from another population with density function f2(.). The medians of the two populations are unknown. Let N = n1 + n2.
Assumptions
(i) The two samples drawn are independent.
(ii) The observations must be at least ordinal.
(iii) The sample sizes should be sufficiently large.
Null Hypothesis
H0: The two samples are drawn from populations having the same median.
Alternative Hypothesis
H1: The two samples are drawn from populations having different medians.
Level of Significance (α) and Critical Value
The critical value, χ²α,1, for 1 degree of freedom and level of significance α is obtained from Table 3.
Method
1. Combine the two samples and arrange the observations in order of magnitude, say, X1 X2 Y1 X3 Y2 Y3 X4 Y4 X5 … such that X1 < X2 …

… χ²α,(K – 1), reject H0 or accept H1.
Example
Five independent random samples are drawn with sizes 45, 65, 55, 85 and 62. The median of the combined sample is found, and the numbers of observations above and below the median for each sample are tabulated as follows. Examine whether the five random samples can be regarded as drawn from five populations with the same frequency distribution. Test at 5% level of significance.

                         Samples
                  1     2     3     4     5    Total
Above Median     20    30    25    40    30     145
Below Median     25    35    30    45    32     167
Total            45    65    55    85    62     312

Solution
H0: The populations from which the five samples are drawn have the same frequency distribution.
H1: The populations from which the five samples are drawn have different frequency distributions.
Level of Significance: α = 0.05 and Critical Value: χ²0.05, 4 = 9.49
Calculations:
e11 = 145 × 45/312 = 20.91    e21 = 167 × 45/312 = 24.08
e12 = 145 × 65/312 = 30.21    e22 = 167 × 65/312 = 34.79
e13 = 145 × 55/312 = 25.56    e23 = 167 × 55/312 = 29.44
e14 = 145 × 85/312 = 39.50    e24 = 167 × 85/312 = 45.50
e15 = 145 × 62/312 = 28.80    e25 = 167 × 62/312 = 33.18
Test Statistic:

$$\chi^2 = \sum_{j=1}^{K} \frac{(a_{1j} - e_{1j})^2}{e_{1j}} + \sum_{j=1}^{K} \frac{(a_{2j} - e_{2j})^2}{e_{2j}}$$

= 0.0396 + 0.0014 + 0.0123 + 0.0063 + 0.0500 + 0.0351 + 0.0013 + 0.0106 + 0.0055 + 0.0420 = 0.2041
Conclusion: Since χ² < χ²α,(K–1), accept H0 and conclude that the populations from which the five samples are drawn have the same frequency distribution.
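The expected frequencies and the χ² value of this example can be recomputed as below. This is a sketch with illustrative names; because it keeps the expected frequencies unrounded, the result agrees with the hand calculation only to about three decimal places.

```python
# Observed counts above and below the combined median for the five samples.
above = [20, 30, 25, 40, 30]
below = [25, 35, 30, 45, 32]

sizes = [a + b for a, b in zip(above, below)]  # sample sizes 45, 65, 55, 85, 62
N = sum(sizes)                                 # 312
row_totals = [sum(above), sum(below)]          # 145 and 167

chi2 = 0.0
for observed, row_total in zip([above, below], row_totals):
    for a_ij, n_j in zip(observed, sizes):
        e_ij = row_total * n_j / N             # expected frequency
        chi2 += (a_ij - e_ij) ** 2 / e_ij

dof = len(sizes) - 1                           # K - 1 degrees of freedom
print(round(chi2, 4), dof)
```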
TEST – 46
WALD–WOLFOWITZ RUN TEST
Aim
To test whether the two samples have been drawn from populations having the same density function.
Definition (RUN)
A run is defined as a sequence of letters of one type surrounded by a sequence of letters of the other type, and the number of elements in a run is referred to as the length of the run.
Source
A random sample of n1 observations, arranged in order of magnitude as X1, X2, …, Xn1, is drawn from a population with density function f1(.) and a random sample of n2 observations, arranged in order of magnitude as Y1, Y2, …, Yn2, is drawn from another population with density function f2(.).
Assumption
The two samples are drawn independently.
Null Hypothesis
H0: The populations from which the two samples are drawn have the same density function. i.e., H0: f1(.) = f2(.).
Alternative Hypothesis
H1: f1(.) ≠ f2(.).
Level of Significance (α) and Critical Value (Uα)
The critical value, Uα, for the level of significance α and for sample sizes n1 and n2 is obtained from Table 7.
Method
1. Combine the two samples and arrange the observations in order of magnitude, say, X1 X2 Y1 X3 Y2 Y3 X4 Y4 X5 … such that X1 < X2 …

… χ²(α), reject H0 or accept H1.
Example
The following table shows three independent samples of sizes 9, 6 and 5 drawn from three populations of children, giving their weights and the corresponding ranks. Test whether the mean weight of the children from the three populations is the same at 10% level of significance.

Sample   Value   Rank
  1      11.7     1
  1      11.9     2
  1      16.1     3
  1      17.5     4
  1      20.5     7
  1      25.1    10.5
  1      30.5    14
  1      32.1    15
  1      82.5    20
  2      19.6     6
  2      21.8     8
  2      25.2    12
  2      33.2    16.5
  2      33.2    16.5
  2      34.1    19
  3      18.4     5
  3      22.9     9
  3      25.1    10.5
  3      29.7    13
  3      33.5    18

Solution
H0: The mean weight of the children from the three populations is the same.
H1: The mean weight of the children from the three populations is not the same.
Level of Significance: α = 0.10 and Critical Value: χ²0.10,(2) = 4.61
Calculations:
n1 = 9; n2 = 6; n3 = 5; N = 20;
R1 = 76.5; R2 = 78; R3 = 55.5
Test Statistic:

$$H = \frac{12}{N(N+1)} \sum_{i} \frac{R_i^2}{n_i} - 3(N+1) = \frac{12}{20 \times 21} \left[\frac{76.5^2}{9} + \frac{78^2}{6} + \frac{55.5^2}{5}\right] - 3 \times 21 = 2.15$$

Conclusion: Since H < χ²(α), H0 is accepted and we conclude that the mean weight of the children from the three populations is the same.
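The ranking and the statistic H of this example can be checked with the sketch below. It assigns average ranks to tied values in the combined sample, as in the table above, and applies the formula for H without any further tie correction; the helper name is hypothetical.

```python
# Weights of the children in the three samples of the example.
samples = [
    [11.7, 11.9, 16.1, 17.5, 20.5, 25.1, 30.5, 32.1, 82.5],
    [19.6, 21.8, 25.2, 33.2, 33.2, 34.1],
    [18.4, 22.9, 25.1, 29.7, 33.5],
]
combined = [v for s in samples for v in s]
N = len(combined)


def average_rank(value, pool):
    """Rank of value in pool (1 = smallest); ties share the average rank."""
    less = sum(1 for w in pool if w < value)
    equal = sum(1 for w in pool if w == value)
    return less + (equal + 1) / 2


# Rank sum R_i of each sample within the combined ordering.
rank_sums = [sum(average_rank(v, combined) for v in s) for s in samples]

H = (12 / (N * (N + 1))
     * sum(R ** 2 / len(s) for R, s in zip(rank_sums, samples))
     - 3 * (N + 1))
print(rank_sums, round(H, 2))
```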
TEST – 48
MANN–WHITNEY–WILCOXON RANK SUM TEST
Aim
To test whether the two random samples are drawn from populations having the same mean, based on the rank sum of the samples.
Source
A random sample of n1 observations, arranged in order of magnitude as X1, X2, …, Xn1, is drawn from a population with density function f1(.) and a random sample of n2 observations, arranged in order of magnitude as Y1, Y2, …, Yn2, is drawn from another population with density function f2(.).
Assumptions
(i) The two samples drawn are independent.
(ii) The populations have continuous frequency distributions.
Null Hypothesis
H0: The populations from which the samples are drawn have the same mean.
Alternative Hypothesis
H1: The populations from which the samples are drawn have different means.
Level of Significance (α) and Critical Value (Rα)
The critical value, Rα, for the level of significance α and for sample sizes n1 and n2 is obtained from Table 8.
Method
1. Combine the two samples and arrange the observations in order of magnitude, say, X 1 X 2 Y1 X 3 Y2 Y3 X 4 Y4 X 5 … such that X 1 <X 2 30), the test statistic is Z = 1−
6r 2
n (n − 1)
where E(K) = n + 1 and Var(K) = n(2n − 2) / [2(2n − 1)]. The statistic Z follows the standard normal distribution and may be compared with Table 1.
Example
The following data denote the lengths of iron rods (in cms) in a sample of 24 units manufactured by an industry. Test whether the sample drawn is random at 10% level of significance.

21.02  20.08  20.05  19.70  19.13  17.09
20.09  19.40  20.56  20.97  20.17  21.35
19.64  20.82  21.26  20.75  20.74  21.59
20.75  21.01  19.09  18.73  18.45  19.80

Solution
H0: The sample observations obtained are random.
H1: The sample observations obtained are not random.
Level of Significance: α = 0.10.
Critical Value: K0.10,12 = 8 (lower), 18 (upper).
Calculations:
Number of observations, n = 24.
Median = 20.13
Number of observations above the median, n1 = 12
Number of observations below the median, n2 = 12

21.02  20.08  20.05  19.70  19.13  17.09
 (+)    (–)    (–)    (–)    (–)    (–)
20.09  19.40  20.56  20.97  20.17  21.35
 (–)    (–)    (+)    (+)    (+)    (+)
19.64  20.82  21.26  20.75  20.74  21.59
 (–)    (+)    (+)    (+)    (+)    (+)
20.75  21.01  19.09  18.73  18.45  19.80
 (+)    (+)    (–)    (–)    (–)    (–)

Test Statistic: K = Number of runs = 6.
Conclusion: Since K lies in the critical region, H0 is rejected and we conclude that the sample observations drawn are not random.
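The signs and the number of runs in this example can be reproduced as follows. This is a sketch with illustrative names, taking the observations row by row in the order in which they are listed.

```python
from statistics import median

# Lengths of the 24 iron rods, in the order in which they were observed.
lengths = [
    21.02, 20.08, 20.05, 19.70, 19.13, 17.09,
    20.09, 19.40, 20.56, 20.97, 20.17, 21.35,
    19.64, 20.82, 21.26, 20.75, 20.74, 21.59,
    20.75, 21.01, 19.09, 18.73, 18.45, 19.80,
]
med = median(lengths)

# '+' above the median, '-' below; values equal to the median are dropped.
signs = ["+" if x > med else "-" for x in lengths if x != med]
n_above = signs.count("+")
n_below = signs.count("-")

# A new run starts whenever the sign changes.
K = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
print(round(med, 2), n_above, n_below, K)
```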
TEST – 54
TEST FOR RANDOMNESS OF RANK CORRELATION
Aim
To test whether the fluctuations in a sample have a random nature.
Source
A sample of n observations is drawn as time series data.
Null Hypothesis
H0: The fluctuation in the sample is random.
Alternative Hypothesis
H1: The fluctuation in the sample is not random.
Level of Significance (α) and Critical Value (Zα)
The critical values Zα for level of significance α are obtained from Table 1.
Method
1. The observations in the sample are given serial numbers in the order in which they occur, denoted by Xi, i = 1, 2, …, n.
2. Ranks are given to the observations according to increasing order of magnitude, denoted by Yi, i = 1, 2, …, n.
3. Find di = Xi – Yi, i = 1, 2, …, n.
4. Find $\sum_{i=1}^{n} d_i^2$ and denote it by r.
Test Statistic

$$Z = \frac{6r - n(n^2 - 1)}{n(n + 1)\sqrt{n - 1}}$$

The statistic Z follows the standard normal distribution.
Conclusion
If |Z| ≤ Zα/2, accept H0 and if |Z| > Zα/2, reject H0 or accept H1.
Example
The monthly rainfall (in cms) recorded by a meteorological station over a period of twelve months in a city is given below. Test whether the rainfall is random over the entire year at 5% level of significance.
Month (X):   1     2     3     4     5    6     7     8     9     10    11    12
Rain (Y):   12.5  10.7  14.5  10.2  8.5  12.8  15.5  16.8  22.5  26.5  28.2  30.5
Solution
H0: The rainfall over the entire year is of random nature.
H1: The rainfall over the entire year is not of random nature.
Level of Significance: α = 0.05 and Critical Value: Z0.05 = 1.96.
Calculations: n = 12
RX:  1  2  3  4  5  6  7  8  9  10  11  12
RY:  4  3  6  2  1  5  7  8  9  10  11  12

$$r = \sum d_i^2 = 40$$

Test Statistic:

$$Z = \frac{6r - n(n^2 - 1)}{n(n + 1)\sqrt{n - 1}} = \frac{(6 \times 40) - 12(144 - 1)}{12 \times 13 \times \sqrt{11}} = \frac{240 - 1716}{517.39} = -2.85$$

Conclusion: Since |Z| > Zα/2, reject H0 and conclude that the rainfall over the entire year is not of random nature.
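The ranks, the sum of squared differences r and the statistic Z of this example can be verified with the sketch below. It assumes there are no tied observations, which holds for these data; the variable names are illustrative.

```python
from math import sqrt

# Monthly rainfall (in cms) for months 1 to 12.
rain = [12.5, 10.7, 14.5, 10.2, 8.5, 12.8, 15.5, 16.8, 22.5, 26.5, 28.2, 30.5]
n = len(rain)

serial = list(range(1, n + 1))                           # X_i: order of occurrence
ranks = [1 + sum(1 for w in rain if w < v) for v in rain]  # Y_i: rank by magnitude

# r = sum of squared differences between serial numbers and ranks.
r = sum((x - y) ** 2 for x, y in zip(serial, ranks))

Z = (6 * r - n * (n ** 2 - 1)) / (n * (n + 1) * sqrt(n - 1))
print(ranks, r, round(Z, 2))
```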
TEST – 55
FRIEDMAN'S TEST FOR MULTIPLE TREATMENT OF A SERIES OF OBJECTS
Aim
To test the significance of the differences in response for K treatments applied to n subjects.
Source
The data are obtained as a two-way table having n rows (subjects) and K columns (treatments).
Assumptions
(i) The response to one treatment by a subject is not affected by the same subject's response to another treatment.
(ii) The response distribution is continuous for each subject.
Null Hypothesis
H0: The effects of the K treatments are the same.
Alternative Hypothesis
H1: The effects of the K treatments are not the same.
Level of Significance (α) and Critical Value (χ²α)
The critical value χ²α,(K–1) for level of significance α is obtained from Table 3.
Method
1. The data are represented by a table of n rows and K columns.
2. The rank numbers 1, 2, …, K are assigned in increasing order of magnitude to the values in each row.
3. The rank sum Rj, (j = 1, 2, …, K) is calculated for each of the K columns.
Test Statistic
$$G = \frac{12}{nK(K + 1)} \sum_{j=1}^{K} R_j^2 - 3n(K + 1)$$
Conclusion
If G ≤ χ²(α), accept H0 and if G > χ²(α), reject H0 or accept H1.
Example
Four experts were appointed to an interview board and fifteen candidates attended the interview. The following are the points given to the candidates by the experts. Test whether the points given by the experts to the candidates differ significantly at 5% level of significance.

Candidates (n)    Points given by experts
                  C1    C2    C3    C4
      1            8     8    10    10
      2            7     9     9     9
      3           10     8     8    10
      4            8     8    10    10
      5            9     9     9    10
      6            9     9    10     9
      7            8     9     8     9
      8            8     8     8     8
      9            9     9    10     9
     10            9     9     9     9
     11            9    10     9    10
     12            7     9     9     9
     13           10    10    10    10
     14            9     9     9    10
     15            7    10    10    10

Solution
H0: The points given by the experts to the candidates do not differ significantly.
H1: The points given by the experts to the candidates differ significantly.
Level of Significance: α = 0.05 and Critical Value: χ²0.05,3 = 7.81
Calculations:

Candidates (n)            Ranks
                  R(C1)   R(C2)   R(C3)   R(C4)
      1            3.5     3.5     1.5     1.5
      2            4.0     2.0     2.0     2.0
      3            1.5     3.5     3.5     1.5
      4            3.5     3.5     1.5     1.5
      5            3.0     3.0     3.0     1.0
      6            3.0     3.0     1.0     3.0
      7            3.5     1.5     3.5     1.5
      8            2.5     2.5     2.5     2.5
      9            3.0     3.0     1.0     3.0
     10            2.5     2.5     2.5     2.5
     11            3.5     1.5     3.5     1.5
     12            4.0     2.0     2.0     2.0
     13            2.5     2.5     2.5     2.5
     14            3.0     3.0     3.0     1.0
     15            4.0     2.0     2.0     2.0
     Rj           47      39      35      29
     R̄            37.5    37.5    37.5    37.5
     Rj – R̄       +9.5    +1.5    –2.5    –8.5

n = 15; K = 4.
Rj = Sum of the ranks given by each expert;

$$\bar{R} = \frac{\sum R_j}{K} = 37.5, \qquad S = \sum (R_j - \bar{R})^2 = 171$$

ti = Number of times any observation is repeated for each of the candidates; fi = frequency of ti.

 ti      fi     fi ti    fi ti³
  1       7       7        7
  2      10      20       80
  3       7      21      189
  4       3      12      192
Total                     468

$$D = \sum f_i t_i^3 = 468$$

Test Statistic:

$$G = \frac{12(K - 1)S}{nK^3 - D} = \frac{12(4 - 1) \times 171}{15 \times 4^3 - 468} = 12.51$$

Conclusion: Since G > χ²0.05, 3, H0 is rejected and we conclude that the points given by the experts to the candidates differ significantly.
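The ranks, the tie count D and the tie-corrected statistic G used in this solution can be recomputed with the sketch below. Following the rank table of the example, rank 1 is given to the largest value in each row and tied values share the average rank; the helper name is hypothetical.

```python
from collections import Counter

# Points given by the four experts (columns) to the fifteen candidates (rows).
points = [
    [8, 8, 10, 10], [7, 9, 9, 9], [10, 8, 8, 10], [8, 8, 10, 10],
    [9, 9, 9, 10], [9, 9, 10, 9], [8, 9, 8, 9], [8, 8, 8, 8],
    [9, 9, 10, 9], [9, 9, 9, 9], [9, 10, 9, 10], [7, 9, 9, 9],
    [10, 10, 10, 10], [9, 9, 9, 10], [7, 10, 10, 10],
]
n, K = len(points), len(points[0])


def descending_average_ranks(row):
    """Rank 1 goes to the largest value; tied values share the average rank."""
    return [sum(1 for w in row if w > v) + (sum(1 for w in row if w == v) + 1) / 2
            for v in row]


rank_rows = [descending_average_ranks(row) for row in points]
rank_sums = [sum(col) for col in zip(*rank_rows)]      # R_j for each expert
mean_rank_sum = sum(rank_sums) / K                     # R-bar
S = sum((R - mean_rank_sum) ** 2 for R in rank_sums)

# D = sum of t^3 over every group of equal values within a row
# (a group of size t contributes t^3, which equals the sum of f_i * t_i^3).
D = sum(t ** 3 for row in points for t in Counter(row).values())

G = 12 * (K - 1) * S / (n * K ** 3 - D)
print(rank_sums, S, D, round(G, 2))
```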
CHAPTER – 6
SEQUENTIAL TESTS
TEST – 56
SEQUENTIAL TESTS FOR POPULATION MEAN (Variance is Known)
Aim
To test that the mean of a population has a specified value, based on sequential observations.
Source
A random sample of observations is drawn sequentially as necessary.
Assumption
The observations drawn are independent and follow a normal distribution with known variance σ².
Null Hypothesis
H0: The mean of a population, µ, has a specified value µ0. i.e., H0: µ = µ0.
Alternative Hypothesis
H1: The mean of a population, µ, has a specified value µ1. i.e., H1: µ = µ1.
Method
(i) Fix the probabilities of Type-I and Type-II errors, α and β, at a minimum level.
(ii) Choose 'c' as a convenient value close to (µ0 + µ1)/2.
(iii) Calculate the following two boundary lines for every successive observation m (taking µ1 > µ0):

$$a_m = \frac{\sigma^2}{\mu_1 - \mu_0} \log\frac{\beta}{1 - \alpha} + m\left(\frac{\mu_0 + \mu_1}{2} - c\right)$$

$$r_m = \frac{\sigma^2}{\mu_1 - \mu_0} \log\frac{1 - \beta}{\alpha} + m\left(\frac{\mu_0 + \mu_1}{2} - c\right)$$

(iv) Plot the above two lines in a graph.
(v) For each m, find the cumulative sum of (xi − c) and plot it in the graph.
(vi) At every stage m, the decision given in the conclusion is made.
Conclusion
(i) Accept H0 if $\sum_{i=1}^{m} (x_i - c) \le a_m$
(ii) Accept H1 if $\sum_{i=1}^{m} (x_i - c) \ge r_m$
(iii) Continue sampling if a m