Linear Models: A Mean Model Approach
Linear Models: A Mean Model Approach
This is a volume in PROBABILITY AND MATHEMATICAL STATISTICS Z. W. Birnbaum, founding editor David Aldous, Y. L. Tong, series editors A list of titles in this series appears at the end of this volume.
Linear Models: A Mean Model Approach Barry Kurt Moser Department of Statistics Oklahoma State University Stillwater, Oklahoma
Academic Press San Diego Boston New York London Sydney Tokyo Toronto
This book is printed on acid-free paper. Copyright © 1996 by ACADEMIC PRESS. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Academic Press, Inc., 525 B Street, Suite 1900, San Diego, California 92101-4495, USA http://www.apnet.com
Academic Press Limited, 24-28 Oval Road, London NW1 7DX, UK http://www.hbuk.co.uk/ap/

Library of Congress Cataloging-in-Publication Data
Moser, Barry Kurt.
Linear models : a mean model approach / by Barry Kurt Moser.
p. cm. — (Probability and mathematical statistics)
Includes bibliographical references and index.
ISBN 0-12-508465-X (alk. paper)
1. Linear models (Statistics) I. Title. II. Series.
QA279.M685 1996
519.5'35--dc20    96-33930 CIP

PRINTED IN THE UNITED STATES OF AMERICA
96 97 98 99 00 01 BC 9 8 7 6 5 4 3 2 1
To my three precious ones.
Contents

Preface  xi

Chapter 1  Linear Algebra and Related Introductory Topics  1
  1.1  Elementary Matrix Concepts  1
  1.2  Kronecker Products  12
  1.3  Random Vectors  16

Chapter 2  Multivariate Normal Distribution  23
  2.1  Multivariate Normal Distribution Function  23
  2.2  Conditional Distributions of Multivariate Normal Random Vectors  29
  2.3  Distributions of Certain Quadratic Forms  32

Chapter 3  Distributions of Quadratic Forms  41
  3.1  Quadratic Forms of Normal Random Vectors  41
  3.2  Independence  45
  3.3  The t and F Distributions  47
  3.4  Bhat's Lemma  49

Chapter 4  Complete, Balanced Factorial Experiments  53
  4.1  Models That Admit Restrictions (Finite Models)  53
  4.2  Models That Do Not Admit Restrictions (Infinite Models)  56
  4.3  Sum of Squares and Covariance Matrix Algorithms  58
  4.4  Expected Mean Squares Algorithm  64
  4.5  Applications  66

Chapter 5  Least-Squares Regression  81
  5.1  Ordinary Least-Squares Estimation  81
  5.2  Best Linear Unbiased Estimators  86
  5.3  ANOVA Table for the Ordinary Least-Squares Regression Function  87
  5.4  Weighted Least-Squares Regression  89
  5.5  Lack of Fit Test  91
  5.6  Partitioning the Sum of Squares Regression  94
  5.7  The Model Y = Xβ + E in Complete, Balanced Factorials  97

Chapter 6  Maximum Likelihood Estimation and Related Topics  105
  6.1  Maximum Likelihood Estimators of β and σ²  105
  6.2  Invariance Property, Sufficiency, and Completeness  108
  6.3  ANOVA Methods for Finding Maximum Likelihood Estimators  111
  6.4  The Likelihood Ratio Test for Hβ = h  119
  6.5  Confidence Bands on Linear Combinations of β  126

Chapter 7  Unbalanced Designs and Missing Data  131
  7.1  Replication Matrices  131
  7.2  Pattern Matrices and Missing Data  138
  7.3  Using Replication and Pattern Matrices Together  144

Chapter 8  Balanced Incomplete Block Designs  149
  8.1  General Balanced Incomplete Block Design  149
  8.2  Analysis of the General Case  152
  8.3  Matrix Derivations of Kempthorne's Interblock and Intrablock Treatment Difference Estimators  155

Chapter 9  Less Than Full Rank Models  161
  9.1  Model Assumptions and Examples  161
  9.2  The Mean Model Solution  164
  9.3  Mean Model Analysis When cov(E) = σ²I_n  165
  9.4  Estimable Functions  168
  9.5  Mean Model Analysis When cov(E) = σ²V  172

Chapter 10  The General Mixed Model  177
  10.1  The Mixed Model Structure and Assumptions  177
  10.2  Random Portion Analysis: Type I Sum of Squares Method  179
  10.3  Random Portion Analysis: Restricted Maximum Likelihood Method  182
  10.4  Random Portion Analysis: A Numerical Example  183
  10.5  Fixed Portion Analysis  184
  10.6  Fixed Portion Analysis: A Numerical Example  186

Appendix 1  Computer Output for Chapter 5  189

Appendix 2  Computer Output for Chapter 7  193
  A2.1  Computer Output for Section 7.2  193
  A2.2  Computer Output for Section 7.3  201

Appendix 3  Computer Output for Chapter 8  207

Appendix 4  Computer Output for Chapter 9  209

Appendix 5  Computer Output for Chapter 10  213
  A5.1  Computer Output for Section 10.2  213
  A5.2  Computer Output for Section 10.4  216
  A5.3  Computer Output for Section 10.6  218

References and Related Literature  221
Subject Index  225
Preface
Linear models is a broad and diversified subject area. Because the subject area is so vast, no attempt was made in this text to cover all possible linear models topics. Rather, the objective of this book is to cover a series of introductory topics that give a student a solid foundation in the study of linear models. The text is intended for graduate students who are interested in linear statistical modeling. It has been my experience that students in this group enter a linear models course with some exposure to mathematical statistics, linear algebra, normal distribution theory, linear regression, and design of experiments. The attempt here is to build on that experience and to develop these subject areas within the linear models framework. The early chapters of the text concentrate on the linear algebra and normal distribution theory needed for a linear models study. Examples of experiments with complete, balanced designs are introduced early in the text to give the student a familiar foundation on which to build. Chapter 4 of the text concentrates entirely on complete, balanced models. This early dedication to complete, balanced models is intentional. It has been my experience that students are generally more comfortable learning structured material. Therefore, the structured rules that apply to complete, balanced designs give the student a set of learnable tools on which to
build confidence. Later chapters of the text then expand the discussion to more complicated incomplete, unbalanced and mixed models. The same tools learned for the balanced, complete models are simply expanded to apply to these more complicated cases. The hope is that the text progresses in an orderly manner with one topic building on the next. I thank all the people who contributed to this text. First, I thank Virgil Anderson for introducing me to the wonders of statistics. Special thanks go to Julie Sawyer, Laura Coombs, and David Weeks. Julie and Laura helped edit the text. Julie also contributed heavily to the development of Chapters 4 and 8. David has generally served as a sounding board during the writing process. He has listened to my ideas and contributed many of his own. Finally, and most importantly, I thank my wife, Diane, for her generosity and support.
1  Linear Algebra and Related Introductory Topics
A summary of relevant linear algebra concepts is presented in this chapter. Throughout the text boldfaced letters such as A, U, T, X, Y, t, g, u are used to represent matrices and vectors, italicized capital letters such as Y, U, T, E, F are used to represent random variables, and lowercase italicized letters such as r, s, t, n, c are used as constants.
1.1  ELEMENTARY MATRIX CONCEPTS
The following list of definitions provides a brief summary of some useful matrix operations. Definition 1.1.1 Matrix: An r x s matrix A is a rectangular array of elements with r rows and s columns. An r x 1 vector Y is a matrix with r rows and 1 column. Matrix elements are restricted to real numbers throughout the text. Definition 1.1.2 Transpose: If A is an n x s matrix, then the transpose of A, denoted by A', is an s x n matrix formed by interchanging the rows and columns of A.
Definition 1.1.3 Identity Matrix, Matrix of Ones and Zeros: I_n represents an n × n identity matrix, J_n is an n × n matrix of ones, 1_n is an n × 1 vector of ones, and 0_{m×n} is an m × n matrix of zeros.

Definition 1.1.4 Multiplication of Matrices: Let a_ij represent the ij-th element of an r × s matrix A with i = 1, ..., r rows and j = 1, ..., s columns. Likewise, let b_jk represent the jk-th element of an s × t matrix B with j = 1, ..., s rows and k = 1, ..., t columns. The matrix multiplication of A and B is represented by AB = C where C is an r × t matrix whose ik-th element is c_ik = Σ_{j=1}^s a_ij b_jk. If the r × s matrix A is multiplied by a scalar d, then the resulting r × s matrix dA has ij-th element d a_ij.

Example 1.1.1
The following matrix multiplications commonly occur.
Definition 1.1.5 Addition of Matrices: The sum of two r × s matrices A and B is represented by A + B = C where C is the r × s matrix whose ij-th element is c_ij = a_ij + b_ij.
Definition 1.1.6 Inverse of a Matrix: An n × n matrix A has an inverse if AA⁻¹ = A⁻¹A = I_n, where the n × n inverse matrix is denoted by A⁻¹.

Definition 1.1.7 Singularity: If an n × n matrix A has an inverse, then A is a nonsingular matrix. If A does not have an inverse, then A is a singular matrix.

Definition 1.1.8 Diagonal Matrix: Let a_ii be the i-th diagonal element of an n × n matrix A and let a_ij be the ij-th off-diagonal element of A for i ≠ j. Then A is a diagonal matrix if all the off-diagonal elements a_ij equal zero.

Definition 1.1.9 Trace of a Square Matrix: The trace of an n × n matrix A, denoted by tr(A), is the sum of the diagonal elements of A. That is, tr(A) = Σ_{i=1}^n a_ii.
It is assumed that the reader is familiar with the definition of the determinant of a square matrix. Therefore, a rigorous definition is omitted. The next definition actually provides the notation used for a determinant.

Definition 1.1.10 Determinant of a Square Matrix: Let det(A) = |A| denote the determinant of an n × n matrix A. Note det(A) = 0 if A is singular.
Symmetric Matrix: Annxn matrix A is symmetric if A = A'.
Definition 1.1.12 Linear Dependence and the Rank of a Matrix: Let A be an n × s matrix (s < n) where a_1, ..., a_s represent the s n × 1 column vectors of A. The s vectors a_1, ..., a_s are linearly dependent provided there exist s elements k_1, ..., k_s, not all zero, such that k_1a_1 + ··· + k_sa_s = 0. Otherwise, the s vectors are linearly independent. Furthermore, if there are exactly r < s vectors of the set a_1, ..., a_s which are linearly independent, while the remaining s − r can be expressed as linear combinations of these r vectors, then the rank of A, denoted by rank(A), is r.

The following list shows the results of the preceding definitions and are stated without proof:

Result 1.1: Let A and B each be n × n nonsingular matrices. Then (AB)⁻¹ = B⁻¹A⁻¹.

Result 1.2: Let A and B be any two matrices such that AB is defined. Then (AB)' = B'A'.

Result 1.3: Let A be any matrix. Then A'A and AA' are symmetric.

Result 1.4: Let A and B each be n × n matrices. Then det(AB) = [det(A)][det(B)].

Result 1.5: Let A and B be m × n and n × m matrices, respectively. Then tr(AB) = tr(BA).

Quadratic forms play a key role in linear model theory. The following definitions introduce quadratic forms.

Definition 1.1.13 Quadratic Forms: A function f(x_1, ..., x_n) is a quadratic form if f(x_1, ..., x_n) = Σ_{i=1}^n Σ_{j=1}^n a_ij x_i x_j = X'AX where X = (x_1, ..., x_n)' is an n × 1 vector and A is an n × n symmetric matrix whose ij-th element is a_ij.
The symmetric matrix A is constructed by setting a_ij and a_ji equal to one-half the coefficient on the x_i x_j term for i ≠ j.

Example 1.1.3 Quadratic forms are very useful for defining sums of squares. For example, Σ_{i=1}^n x_i² = X'I_nX where the n × 1 vector X = (x_1, ..., x_n)'. The sum of squares around the sample mean, Σ_{i=1}^n (x_i − x̄)² = X'[I_n − (1/n)J_n]X, is another common example.
Definition 1.1.14 Orthogonal Matrix: An n × n matrix P is orthogonal if and only if P⁻¹ = P'. Therefore, PP' = P'P = I_n. If P is written as (p_1, p_2, ..., p_n) where p_i is an n × 1 column vector of P for i = 1, ..., n, then necessary and sufficient conditions for P to be orthogonal are p_i'p_i = 1 for all i and p_i'p_j = 0 for all i ≠ j.
Example 1.1.4
Let the n x n matrix
where PP' = P'P = I_n. The columns of P are created as follows:

The matrix P' in Example 1.1.4 is generally referred to as an n-dimensional Helmert matrix. The Helmert matrix has some interesting properties. Write P as P = (p_1 | P_n) where the n × 1 vector p_1 = (1/√n)1_n and the n × (n − 1) matrix P_n = (p_2, p_3, ..., p_n); then

The (n − 1) × n matrix P_n' will be referred to as the lower portion of an n-dimensional Helmert matrix. If X is an n × 1 vector and A is an n × n matrix, then AX defines n linear combinations of the elements of X. Such transformations from X to AX are very useful in linear models. Of particular interest are transformations of the vector X that produce multiples of X. That is, we are interested in transformations that
satisfy the relationship AX = λX, where λ is a scalar multiple. The above relationship holds if and only if (λI_n − A)X = 0, which for X ≠ 0 requires |λI_n − A| = 0. But the determinant of λI_n − A is an n-th degree polynomial in λ. Thus, there are exactly n values of λ that satisfy |λI_n − A| = 0. These n values of λ are called the n eigenvalues of the matrix A. They are denoted by λ_1, λ_2, ..., λ_n. Corresponding to each eigenvalue λ_i there is an n × 1 vector X_i that satisfies AX_i = λ_iX_i, where X_i is called the i-th eigenvector of the matrix A corresponding to the eigenvalue λ_i. Example 1.1.5 Find the eigenvalues and eigenvectors of the 3 × 3 matrix A = 0.6I_3 + 0.4J_3. First, set |λI_3 − A| = 0. This relationship produces a cubic equation in λ.
Therefore, the eigenvalues of A are λ_1 = 1.8, λ_2 = λ_3 = 0.6. Next, find vectors X_i that satisfy (A − λ_iI_3)X_i = 0_{3×1} for each i = 1, 2, 3. For λ_1 = 1.8, (A − 1.8I_3)X_1 = 0_{3×1} or (−1.2I_3 + 0.4J_3)X_1 = 0_{3×1}. The vector X_1 = (1/√3)1_3 satisfies this relationship. For λ_2 = λ_3 = 0.6, (A − 0.6I_3)X_i = 0_{3×1} or J_3X_i = 0_{3×1} for i = 2, 3. The vectors X_2 = (1/√2, −1/√2, 0)' and X_3 = (1/√6, 1/√6, −2/√6)' satisfy this condition. Note that the vectors X_1, X_2, X_3 are normalized and orthogonal since X_1'X_1 = X_2'X_2 = X_3'X_3 = 1 and X_1'X_2 = X_1'X_3 = X_2'X_3 = 0.

The following theorems address the uniqueness or nonuniqueness of the eigenvector associated with each eigenvalue.

Theorem 1.1.1 There exists at least one eigenvector corresponding to each eigenvalue.

Theorem 1.1.2 If an n × n matrix A has n distinct eigenvalues, then there exist exactly n linearly independent eigenvectors, one associated with each eigenvalue.

In the next theorem and corollary a symmetric matrix is defined in terms of its eigenvalues and eigenvectors.
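The eigenvalue computation in Example 1.1.5 can be verified numerically. A minimal sketch, assuming numpy:

```python
import numpy as np

# The 3 x 3 matrix A = 0.6*I_3 + 0.4*J_3 from Example 1.1.5.
A = 0.6 * np.eye(3) + 0.4 * np.ones((3, 3))

# eigh is numpy's symmetric eigensolver; eigenvalues are returned
# in ascending order, so 0.6 (multiplicity 2) precedes 1.8.
vals, vecs = np.linalg.eigh(A)

# Each column x_i of `vecs` satisfies A x_i = lambda_i x_i.
for lam, x in zip(vals, vecs.T):
    assert np.allclose(A @ x, lam * x)

print(vals)  # 0.6, 0.6, 1.8 up to floating-point error
```

Note that the eigenvectors returned for the repeated eigenvalue 0.6 need not match X_2 and X_3 above: any orthonormal basis of that eigenspace is valid, which is the nonuniqueness addressed by Theorems 1.1.1 and 1.1.2.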
Theorem 1.1.3 Let A be an n × n symmetric matrix. There exists an n × n orthogonal matrix P such that P'AP = D where D is a diagonal matrix whose diagonal elements are the eigenvalues of A and where the columns of P are the orthogonal, normalized eigenvectors of A. The i-th column of P (i.e., the i-th eigenvector of A) corresponds to the i-th diagonal element of D for i = 1, ..., n.
Let A be the 3 x 3 matrix from Example 1.1.5. Then P'AP = D or
Theorem 1.1.3 can be used to relate the trace and determinant of a symmetric matrix to its eigenvalues: if A is an n × n symmetric matrix with eigenvalues λ_1, ..., λ_n, then tr(A) = Σ_{i=1}^n λ_i and det(A) = Π_{i=1}^n λ_i, since tr(PDP') = tr(DP'P) = tr(D) by Result 1.5 and det(PDP') = det(D) by Result 1.4.
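These identities (trace as the sum of the eigenvalues, determinant as their product) can be checked on the matrix of Example 1.1.5. A minimal sketch, assuming numpy:

```python
import numpy as np

# A = 0.6*I_3 + 0.4*J_3 has eigenvalues 1.8, 0.6, 0.6 (Example 1.1.5).
A = 0.6 * np.eye(3) + 0.4 * np.ones((3, 3))
vals = np.linalg.eigvalsh(A)

# tr(A) = 1.8 + 0.6 + 0.6 = 3.0 and det(A) = 1.8 * 0.6 * 0.6 = 0.648.
print(np.trace(A), vals.sum())
print(np.linalg.det(A), vals.prod())
```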
The number of times an eigenvalue occurs is the multiplicity of the value. This idea is formalized in the next definition.

Definition 1.1.15 Multiplicity: The n × n matrix A has eigenvalue λ* with multiplicity m ≤ n if m of the eigenvalues of A equal λ*.

Example 1.1.7 All the n eigenvalues of the identity matrix I_n equal 1. Therefore, I_n has eigenvalue 1 with multiplicity n.

Example 1.1.8 Find the eigenvalues and eigenvectors of the n × n matrix G = (a − b)I_n + bJ_n. First, note that G1_n = [a + (n − 1)b]1_n.
Therefore, a + (n − 1)b is an eigenvalue of matrix G with corresponding normalized eigenvector (1/√n)1_n. Next, take any n × 1 vector X such that 1_n'X = 0. (One set of n − 1 vectors that satisfies 1_n'X = 0 are the column vectors p_2, p_3, ..., p_n from Example 1.1.4.) Rewrite G = (a − b)I_n + b1_n1_n'. Therefore, GX = (a − b)X + b1_n(1_n'X) = (a − b)X,
and matrix G has eigenvalue a − b. Furthermore,
Therefore, eigenvalue a + (n − 1)b has multiplicity 1 and eigenvalue a − b has multiplicity n − 1. Note that the 3 × 3 matrix A in Example 1.1.5 is a special case of matrix G with a = 1, b = 0.4, and n = 3.

It will be convenient at times to separate a matrix into its submatrix components. Such a separation is called partitioning.

Definition 1.1.16 Partitioning a Matrix: If A is an m × n matrix, then A can be separated or partitioned as
Most of the square matrices used in this text are either positive definite or positive semidefinite. These two general matrix types are described in the following definitions.

Definition 1.1.17 Positive Semidefinite Matrix: An n × n matrix A is positive semidefinite if

(i) A = A',
(ii) Y'AY ≥ 0 for all n × 1 real vectors Y, and
(iii) Y'AY = 0 for at least one n × 1 nonzero real vector Y.

Definition 1.1.18 Positive Definite Matrix: An n × n matrix A is positive definite if

(i) A = A' and
(ii) Y'AY > 0 for all nonzero n × 1 real vectors Y.

Example 1.1.10 The n × n identity matrix I_n is positive definite because I_n is symmetric and Y'I_nY > 0 for all nonzero n × 1 real vectors Y.

Theorem 1.1.5 Let A be an n × n positive definite matrix. Then

(i) there exists an n × n matrix B of rank n such that A = BB', and
(ii) the eigenvalues of A are all positive.
The following example demonstrates how the matrix B in Theorem 1.1.5 can be constructed.

Example 1.1.11 Let A be an n × n positive definite matrix. Thus, A = A' and by Theorem 1.1.3 there exist n × n matrices P and D such that P'AP = D, where P is the orthogonal matrix whose columns are the eigenvectors of A, and D is the corresponding diagonal matrix of eigenvalues. Therefore, A = PDP' = PD^{1/2}D^{1/2}P' = BB' where D^{1/2} is an n × n diagonal matrix whose i-th diagonal element is λ_i^{1/2} and B = PD^{1/2}.

Certain square matrices have the characteristic that A² = A. For example, let
Matrices of this type are introduced in the next definition.

Definition 1.1.19 Idempotent Matrices: Let A be an n × n matrix. Then

(i) A is idempotent if A² = A, and
(ii) A is symmetric, idempotent if A² = A and A = A'.

Note that if A is idempotent of rank n, then A = I_n. In linear model applications, idempotent matrices generally occur in the context of quadratic forms. Since the matrix in a quadratic form is symmetric, we generally restrict our attention to symmetric, idempotent matrices.
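A concrete symmetric, idempotent matrix that recurs throughout the text is the centering matrix I_n − (1/n)J_n, whose quadratic form is the sum of squares about the sample mean. A minimal sketch, assuming numpy; the data vector is a hypothetical illustration:

```python
import numpy as np

n = 5
# Centering matrix: symmetric and idempotent (Definition 1.1.19).
A = np.eye(n) - np.ones((n, n)) / n
assert np.allclose(A @ A, A)   # A^2 = A
assert np.allclose(A, A.T)     # A = A'

# Its quadratic form Y'AY equals the sum of squares about the mean.
Y = np.array([3.0, 1.0, 4.0, 1.0, 5.0])  # hypothetical data
print(Y @ A @ Y, ((Y - Y.mean()) ** 2).sum())  # both 12.8
```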
Theorem 1.1.6 Let B be an n × n symmetric, idempotent matrix of rank r < n. Then B is positive semidefinite.

The next theorem will prove useful when examining sums of squares in ANOVA problems.
The eigenvalues of the matrix I_n − (1/n)J_n are derived in the next example.

Example 1.1.12 The symmetric, idempotent matrix I_n − (1/n)J_n takes the form (a − b)I_n + bJ_n with a = 1 − 1/n and b = −1/n. Therefore, by Example 1.1.8, the eigenvalues of I_n − (1/n)J_n are a + (n − 1)b = (1 − 1/n) + (n − 1)(−1/n) = 0 with multiplicity 1 and a − b = (1 − 1/n) − (−1/n) = 1 with multiplicity n − 1.

The result that the eigenvalues of an idempotent matrix are all zeros and ones is generalized in the next theorem.

Theorem 1.1.8 The eigenvalues of an n × n symmetric matrix A of rank r are all zeros and ones if and only if A is idempotent, in which case r of the eigenvalues equal one and n − r equal zero.

… Z ~ N_n(0, I_n). Furthermore, Y'AY = (TZ + μ)'A(TZ + μ) = (Z + T⁻¹μ)'T'AT(Z + T⁻¹μ) = (Z + T⁻¹μ)'ΓDΓ'(Z + T⁻¹μ) where T'AT is an n × n symmetric matrix, Γ is the n × n orthogonal matrix of eigenvectors of T'AT, and D is the n × n diagonal matrix of eigenvalues of T'AT such that T'AT = ΓDΓ'. The eigenvalues of T'AT are λ_1, ..., λ_p, 0, ..., 0 and rank(T'AT) = p. Let W = (W_1, ..., W_n)' = Γ'(Z + T⁻¹μ). Therefore, Y'AY = W'DW = Σ_{i=1}^p λ_iW_i². By Theorem 2.1.2 with n × n matrix B = Γ' and n × 1 vector b = Γ'T⁻¹μ, W ~ N_n(Γ'T⁻¹μ, I_n). Therefore, the W_i² are independent χ²_1(δ_i ≥ 0) random variables for i = 1, ..., p. Furthermore, p = rank(T'AT) = rank(ATT') = rank(AΣ) because T is nonsingular. Finally, the eigenvalues of T'AT are found by solving the polynomial equation |λI_n − T'AT| = 0. Premultiplying the above expression by |T'⁻¹| and postmultiplying by |T'| we obtain |λI_n − ATT'| = |λI_n − AΣ| = 0. Thus, the eigenvalues of T'AT are the eigenvalues of AΣ.

We now reexamine the distributions of a number of quadratic forms previously derived in Section 2.3.

Example 3.1.1 From Example 2.3.1 let the n × 1 random vector Y ~ N_n(α1_n, σ²I_n). …

… which is independent of X_1'D_aX_1. Therefore, Z'GZ and Z'HZ are independent. The proof of the converse statement is supplied by Searle (1971). •

The following theorem considers the independence of a quadratic form and linear combinations of a normally distributed random vector.

Theorem 3.2.2 Let A and B be n × n and m × n constant matrices, respectively. Let the n × 1 random vector Y ~ N_n(μ, Σ). The quadratic form Y'AY and the set of linear combinations BY are independent if and only if BΣA = 0 (or AΣB' = 0).

Proof: The "if" portion can be proven by the same method used in the proof of Theorem 3.2.1. The proof of the converse statement is supplied by Searle (1971). •

In the following examples the independence of certain quadratic forms and linear combinations is examined.
3  Distributions of Quadratic Forms
Example 3.2.1 Consider the one-way classification described in Examples 1.2.10 and 2.1.4. The sum of squares due to the fixed factor is Y'A_2Y where A_2 = (I_t − (1/t)J_t) ⊗ (1/r)J_r is an idempotent matrix of rank t − 1. Furthermore, A_2Σ = [(I_t − (1/t)J_t) ⊗ (1/r)J_r][σ²I_t ⊗ I_r] = σ²A_2. The sum of squares due to the nested replicates is Y'A_3Y where A_3 = I_t ⊗ (I_r − (1/r)J_r) is an idempotent matrix of rank t(r − 1). Likewise, A_3Σ = [I_t ⊗ (I_r − (1/r)J_r)][σ²I_t ⊗ I_r] = σ²A_3. Therefore, by Corollary 3.1.2(a), Y'A_2Y ~ σ²χ²_{t−1}(λ_2) and Y'A_3Y ~ σ²χ²_{t(r−1)}(λ_3) where

λ_2 = [(μ_1, ..., μ_t)' ⊗ 1_r]'[(I_t − (1/t)J_t) ⊗ (1/r)J_r][(μ_1, ..., μ_t)' ⊗ 1_r]/(2σ²) = r Σ_{i=1}^t (μ_i − μ̄)²/(2σ²)

with μ̄ = Σ_{i=1}^t μ_i/t, and

λ_3 = [(μ_1, ..., μ_t)' ⊗ 1_r]'[I_t ⊗ (I_r − (1/r)J_r)][(μ_1, ..., μ_t)' ⊗ 1_r]/(2σ²) = 0.

Finally, by Theorem 3.2.1, Y'A_2Y and Y'A_3Y are independent since A_2ΣA_3 = σ²[(I_t − (1/t)J_t) ⊗ (1/r)J_r][I_t ⊗ (I_r − (1/r)J_r)] = 0_{tr×tr}.

Example 3.2.2 Reconsider Example 2.3.1 where Y = (Y_1, ..., Y_n)' ~ N_n(α1_n, σ²I_n), U = Σ_{i=1}^n (Y_i − Ȳ)²/σ² = Y'[(1/σ²)(I_n − (1/n)J_n)]Y and Ȳ = (1/n)1_n'Y. By Theorem 3.2.2, Ȳ and U are independent since (1/n)1_n'[σ²I_n][(1/σ²)(I_n − (1/n)J_n)] = 0_{1×n}.
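The independence condition of Theorem 3.2.2 used in Example 3.2.2 reduces to a matrix product being zero, which is easy to verify. A minimal sketch, assuming numpy; n and the variance are hypothetical choices:

```python
import numpy as np

n = 6
sigma2 = 2.0  # hypothetical variance

# Ybar = B Y with B = (1/n)1_n', and U = Y'AY with
# A = (1/sigma^2)(I_n - (1/n)J_n), as in Example 3.2.2.
B = np.ones((1, n)) / n
A = (np.eye(n) - np.ones((n, n)) / n) / sigma2
Sigma = sigma2 * np.eye(n)

# Theorem 3.2.2: BY and Y'AY are independent iff B Sigma A = 0.
print(np.allclose(B @ Sigma @ A, np.zeros((1, n))))  # True
```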
3.3  THE t AND F DISTRIBUTIONS

The normal and chi-square distributions were discussed at length in the previous sections. We now examine the distributions of certain functions of chi-square and normal random variables.

Definition 3.3.1 Noncentral t Random Variable: Let the random variable Y ~ N_1(α, σ²) and the random variable U ~ χ²_n(0). If Y and U are independent, then the random variable T = (Y/σ)/√(U/n) is distributed as a noncentral t random variable with n degrees of freedom and noncentrality parameter λ = α²/(2σ²). Denote this noncentral t random variable as t_n(λ).

Definition 3.3.2 Noncentral F Random Variable: Let the random variable U_1 ~ χ²_{n_1}(λ) and the random variable U_2 ~ χ²_{n_2}(0). If U_1 and U_2 are independent, then the random variable F = (U_1/n_1)/(U_2/n_2) is distributed as a noncentral F random variable with n_1 and n_2 degrees of freedom and noncentrality parameter λ. Denote this noncentral F random variable as F_{n_1,n_2}(λ).

A t random variable with n degrees of freedom and a noncentrality parameter equal to zero [i.e., t_n(λ = 0)] has a central t distribution. Likewise, an F random variable with n_1 and n_2 degrees of freedom and a noncentrality parameter equal to zero [i.e., F_{n_1,n_2}(λ = 0)] has a central F distribution.
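Definition 3.3.2 can be illustrated by simulation: a ratio of independent scaled central chi-square variables behaves like a central F variable. A minimal sketch, assuming numpy; the degrees of freedom and sample size are arbitrary choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, N = 3, 8, 200_000  # hypothetical degrees of freedom

# F = (U1/n1)/(U2/n2) with independent U1 ~ chi2(n1), U2 ~ chi2(n2)
# and noncentrality 0 is a central F(n1, n2) random variable.
U1 = rng.chisquare(n1, N)
U2 = rng.chisquare(n2, N)
F = (U1 / n1) / (U2 / n2)

# A central F(n1, n2) variable has mean n2/(n2 - 2) when n2 > 2.
print(F.mean())  # close to 8/6
```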
In recent years Smith and Lewis (1980, 1982), Pavur and Lewis (1983), Scariano, Neill, and Davenport (1984), and Scariano and Davenport (1984) have developed the theory of the corrected F random variable. The definition of the corrected F random variable is given next.

Definition 3.3.3 Noncentral Corrected F Random Variable: Let the random variable U_1 ~ c_1χ²_{n_1}(λ) and the random variable U_2 ~ c_2χ²_{n_2}(0). If U_1 and U_2 are independent, then the random variable F_c = (c_2/c_1)[(U_1/n_1)/(U_2/n_2)] ~ F_{n_1,n_2}(λ) is called a corrected F random variable, where the ratio c_2/c_1 is the correction factor.

In practice, we often encounter independent random variables U_1 and U_2 that are distributed as multiples of chi-square random variables (U_2 being a multiple of a central chi square). The random variable F = (U_1/n_1)/(U_2/n_2) in this case will be distributed as a noncentral F random variable if and only if c_1 = c_2 (i.e., c_2/c_1 = 1). Generally, c_1 and c_2 will be linear combinations of unknown variance parameters. In the following examples a number of central and noncentral t and F random variables are derived.

Example 3.3.1 Let the n × 1 random vector Y = (Y_1, ..., Y_n)' ~ N_n(α1_n, σ²I_n). By Example 2.1.1, Ȳ ~ N_1(α, σ²/n). By Example 3.1.1, Σ_{i=1}^n (Y_i − Ȳ)² = Y'[I_n − (1/n)J_n]Y ~ σ²χ²_{n−1}(0). …

… λ_2 > 0 is equivalent to the hypothesis H_0: μ_1 = μ_2 = ··· = μ_t versus H_1: the μ_i's are not all equal. Thus, under H_0, the statistic F* has a central F distribution with t − 1 and t(r − 1) degrees of freedom. A γ level rejection region for the hypothesis H_0 versus H_1 is as follows: reject H_0 if F* > F^γ_{t−1, t(r−1)}, where F^γ_{t−1, t(r−1)} is the 100(1 − γ)th percentile point of a central F distribution with t − 1 and t(r − 1) degrees of freedom.

Example 3.3.3 Consider the two-way cross classification described in Example 3.1.2. The sums of squares due to the random factor S and due to the random interaction ST are given by Y'A_2Y and Y'A_4Y, respectively. It was shown that Y'A_2Y = Y'[(I_s − (1/s)J_s) ⊗ (1/t)J_t]Y ~ (σ²_{ST} + tσ²_S)χ²_{s−1}(0). Furthermore, A_4Σ = [(I_s − (1/s)J_s) ⊗ (I_t − (1/t)J_t)]Σ = σ²_{ST}A_4, so that Y'A_4Y ~ σ²_{ST}χ²_{(s−1)(t−1)}(0). …

… The statistics S_1, ..., S_r are jointly sufficient for θ if and
6  Maximum Likelihood Estimation
only if the joint density factors as f_{Y_1, ..., Y_n}(y_1, ..., y_n; θ) = g(S, θ)h(y_1, ..., y_n), where g(S, θ) does not depend on Y_1, ..., Y_n except through S and h(Y_1, ..., Y_n) does not involve θ.

Example 6.2.2 Let the n × 1 random vector Y = (Y_1, ..., Y_n)' ~ N_n(α1_n, σ²I_n). The statistics S_1 = 1_n'Y and S_2 = Y'Y are jointly sufficient for θ = (α, σ²)' since f_Y(y; θ) = (2πσ²)^{−n/2} exp{−(S_2 − 2αS_1 + nα²)/(2σ²)}, which depends on the data only through S_1 and S_2.
The next theorem and example link the ideas of sufficiency and maximum likelihood estimation.

Theorem 6.2.2 If S = (S_1, ..., S_r)' is jointly sufficient for the vector θ and if θ̂ is a unique MLE of θ, then θ̂ is a function of S.

Proof:
By the factorization theorem
which means that the value of θ that maximizes f_Y(·) depends on the data only through S. If the MLE is unique, the MLE of θ must be a function of S. •

Example 6.2.3 Consider the problem from Example 6.2.2. Rewrite the model as Y = Xα + E where the n × 1 matrix X = 1_n and the n × 1 random vector E ~ N_n(0, σ²I_n). Therefore, the MLE of α is given by
and the MLE of σ² is
The MLEs α̂ = S_1/n and σ̂² = (S_2 − S_1²/n)/n are functions of the jointly sufficient statistics S_1 = 1_n'Y and S_2 = Y'Y.
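The MLEs of Example 6.2.3, written as functions of the sufficient statistics S_1 and S_2, can be sketched directly. A minimal example, assuming numpy; the data values are hypothetical:

```python
import numpy as np

# Hypothetical sample from Y ~ N_n(alpha*1_n, sigma^2*I_n).
Y = np.array([2.1, 1.9, 2.4, 1.6, 2.0])
n = Y.size

# Jointly sufficient statistics from Example 6.2.2.
S1 = Y.sum()          # 1_n' Y
S2 = (Y ** 2).sum()   # Y'Y

# MLEs expressed through S1 and S2 (Example 6.2.3).
alpha_hat = S1 / n
sigma2_hat = (S2 - S1 ** 2 / n) / n

# They agree with the sample mean and the (biased) sample variance.
print(alpha_hat, sigma2_hat)
```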
This section concludes with a discussion of completeness and its relation to minimum variance unbiased estimators.

Definition 6.2.2 Completeness: A family of probability distribution functions {f_T(t, θ), θ ∈ Θ} is called complete if E[w(T)] = 0 for all θ ∈ Θ implies that w(T) = 0 with probability one for all θ ∈ Θ.

… P_t')Y, the b(t − 1) × (t − 1) matrix K_2 = 1_b ⊗ I_{t−1}, the (t − 1) × 1 vector of unknown parameters θ_2 = P_t'β, and the b(t − 1) × 1 random vector E_2 ~ N_{b(t−1)}(0, σ²_{BT} I_b ⊗ I_{t−1}). Therefore, the MLE of the (t − 1) × 1 vector θ_2 is
The MLE of σ²_{BT} is given by
Note that the MLE of σ²_{BT} is the sum of squares for the block by treatment interaction divided by b(t − 1). Now the maximum likelihood estimators of (θ_1, θ_2) and the invariance property are used to derive the MLEs of the original parameters β = (β_1, ..., β_t)'. Note that
Premultiplying by the t × t matrix ((1/√t)1_t | P_t) we obtain
or
Therefore, by the invariance property, the MLE of ft is given by
The model from Example 6.3.1 belongs to a class of linear models where the MLE of β equals the ordinary least-squares estimator of β and the MLEs of the variance parameters in Σ are linear combinations of the ANOVA mean squares for the random effects. The next theorem provides formulas for the MLEs of a broad class of models, including the model from Example 6.3.1.

Theorem 6.3.1 Let Y = Xβ + E where Y is an n × 1 random vector, X is an n × p matrix of rank p, β is a p × 1 vector of unknown parameters, and the n × 1 random vector E ~ N_n(0, Σ). For i = 1, ..., m, let Y'B_iY and Y'C_iY be sums of squares corresponding to the various fixed and random effects, respectively, such that I_n = Σ_{i=1}^m (B_i + C_i), rank(B_i) = p_i ≥ 0, rank(C_i) = r_i > 0, p = Σ_{i=1}^m p_i, and n = Σ_{i=1}^m (p_i + r_i). If there exist unique constants a_i > 0 such that Σ = Σ_{i=1}^m a_i(B_i + C_i), then

i) the maximum likelihood estimator of β is given by β̂ = (X'X)⁻¹X'Y if and only if Σ_{i=1}^m B_iX = X, and

ii) under the conditions that induce i), the maximum likelihood estimator of a_i is given by â_i = Y'C_iY/(p_i + r_i).
Proof: By Theorem 1.1.7, B_i and C_i are idempotent matrices for i = 1, ..., m; B_iC_j = 0_{n×n} for any i, j = 1, ..., m; B_iB_j = 0_{n×n} for any i ≠ j; and therefore B_i + C_i is an idempotent matrix of rank p_i + r_i. These conditions imply Σ⁻¹ = Σ_{i=1}^m a_i⁻¹(B_i + C_i).
i) Assume Σ_{i=1}^m B_iX = X. Then B_iC_j = 0_{n×n} for all i, j implies C_jX = Σ_{i=1}^m C_jB_iX = 0_{n×p} for all j. Now let the n × p matrix Q = [Q_1|Q_2|···|Q_m] where Q_i is an n × p_i matrix of rank p_i such that B_i = Q_iQ_i' for each i = 1, ..., m. Thus, QQ' = Σ_{i=1}^m B_i and Σ_{i=1}^m a_i⁻¹B_i = QA⁻¹Q' where A is a p × p nonsingular block diagonal matrix with a_iI_{p_i} on the diagonal for i = 1, ..., m. Therefore, X = Σ_{i=1}^m B_iX = QQ'X, which implies Q = X(Q'X)⁻¹. The maximum likelihood estimator of β is given by
The "only if" portion of the proof of i) is omitted, but can be found in Moser and McCann (1995). ii) Before deriving the MLE of a,-, we need to derive a particular relationship between the matrices X and B,-. From the proof of i), Q = XCQ'X)"1. Therefore, X(X'X)-1XQ, = QJ; for / = 1,..., m. Premultiplying by Q't produces X(X'X)-1XB/ = B,-. Now, to find the MLE of a/, write the likelihood function with (3 replaced by ft and XT1 = 527=i "f^8' + Q)- Note tnat &)'; and the n x 1 random vector E = ( E m , . . . , EH,,,, • ••, Ebt\, • • • , Ebtn,,}' ~ NB(0, E). To construct the n x n covariance matrix E, redefine the random variable Eijv as
for v = 1, ..., r_{ij}, where the random variables B_i represent the random block effect such that B_i ~ iid N_1(0, σ²_B); the random variables BT_{ij} represent the random block treatment interaction such that the b(t − 1) × 1 vector (I_b ⊗ P_t')(BT_{11}, ..., BT_{bt})' ~ N_{b(t−1)}(0, σ²_{BT} I_b ⊗ I_{t−1}); and the random variables R(BT)_{(ij)v} represent the random nested replicates such that R(BT)_{(ij)v} ~ iid N_1(0, σ²_{R(BT)}). Furthermore, assume that B_i, (BT_{11}, ..., BT_{bt})', and R(BT)_{(ij)v} are uncorrelated. Next, construct the covariance matrix when there is one replicate observation per block treatment combination. If r_{ij} = 1 for all i, j, then the bt × bt covariance matrix is given by
where the subscript d on Σ_d denotes a covariance matrix for one replicate observation in each of the bt distinct block treatment combinations. Note that the variance of R(BT)_{(ij)v} is nonestimable when r_{ij} = 1 for all i, j. Therefore, σ²_{R(BT)} does not appear in Σ_d. Now expand the covariance structure Σ_d to include all n = Σ_{i=1}^b Σ_{j=1}^t r_{ij} observations by premultiplying Σ_d by R, postmultiplying Σ_d by R', and adding a variance component that will account for the estimable variance of the R(BT)_{(ij)v} variables when r_{ij} > 1. Therefore, the n × n covariance
7  Unbalanced Designs and Missing Data
Table 7.1.3  Type I Sums of Squares for Example 7.1.2

Source              df                Type I SS
Overall mean μ      1                 Y'S_1(S_1'S_1)⁻¹S_1'Y = Y'(1/n)J_nY
Block (B)|μ         b − 1             Y'[T_1(T_1'T_1)⁻¹T_1' − (1/n)J_n]Y
Treatment (T)|μ, B  t − 1             Y'[S_2(S_2'S_2)⁻¹S_2' − T_1(T_1'T_1)⁻¹T_1']Y
BT|μ, B, T          (b − 1)(t − 1)    Y'[T_2(T_2'T_2)⁻¹T_2' − S_2(S_2'S_2)⁻¹S_2']Y
Rep (BT)            n − bt            Y'[I_n − RD⁻¹R']Y
Total               n                 Y'Y
matrix E is given by
The ANOVA table with Type I sums of squares can also be constructed for this example. First, consider what the sums of squares would be if there were one replicate observation per block treatment combination. If r_ij = 1 for all i, j, the matrices for the sums of squares due to the overall mean, blocks, treatments, and the block by treatment interaction are given by
respectively, where X_1d = 1_b ⊗ 1_t, Z_1d = P_b' ⊗ 1_t, X_2d = 1_b ⊗ P_t', Z_2d = P_b' ⊗ P_t', and P_t is the (t - 1) × t lower portion of a t-dimensional Helmert matrix. Let S1 = RX_1d = 1_n, T1 = R[X_1d|Z_1d], S2 = R[X_1d|Z_1d|X_2d], T2 = R[X_1d|Z_1d|X_2d|Z_2d], and D = R'R. Matrices S1, T1, S2, T2, and R are used to construct the Type I sums of squares in Table 7.1.3. In the next section pattern matrices are used in data structures with crossed factors and missing data.
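The construction of S1, T1, S2, T2 from Kronecker products and the replication matrix can be checked numerically. The following numpy sketch is illustrative only (b = t = 2 and the replicate counts are hypothetical); it verifies that each Type I difference of projections is idempotent and that the column space of T2 equals that of R.

```python
import numpy as np

# Illustrative b = t = 2 layout with replicate counts r_ij = (1, 2, 1, 2).
reps = [1, 2, 1, 2]
n = sum(reps)
R = np.zeros((n, 4))
row = 0
for j, r in enumerate(reps):
    R[row:row + r, j] = 1.0
    row += r

one = np.ones((2, 1))
P = np.array([[1.0, -1.0]]) / np.sqrt(2.0)  # lower portion of a 2-dim Helmert matrix

X1d = np.kron(one, one)    # overall mean
Z1d = np.kron(P.T, one)    # blocks
X2d = np.kron(one, P.T)    # treatments
Z2d = np.kron(P.T, P.T)    # block-by-treatment interaction

S1 = R @ X1d
T1 = R @ np.hstack([X1d, Z1d])
S2 = R @ np.hstack([X1d, Z1d, X2d])
T2 = R @ np.hstack([X1d, Z1d, X2d, Z2d])
D = R.T @ R

proj = lambda A: A @ np.linalg.pinv(A.T @ A) @ A.T

# The Type I sums of squares matrices telescope: each difference of
# nested projections is idempotent, and col(T2) = col(R), so the
# projection onto T2 equals R D^{-1} R'.
A_block = proj(T1) - proj(S1)
print(np.allclose(A_block @ A_block, A_block))            # True
print(np.allclose(proj(T2), R @ np.linalg.inv(D) @ R.T))  # True
```

The trace of each difference matrix recovers the corresponding degrees of freedom (here trace(A_block) = b - 1 = 1).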
Linear Models
138
Figure 7.2.1 Missing Data Example.

7.2

PATTERN MATRICES AND MISSING DATA
In some data sets, certain data points accidentally end up missing. In other data sets, data points are intentionally not observed in certain cells. For example, in fractional factorials or incomplete block designs, data are not observed in certain cells. In either case, the overall structure of such experiments follows the complete, balanced factorial form except that the actual observed data set contains "holes" where no data are observed. These holes are located in patterns in fractional factorial experiments and incomplete block designs. However, in other experiments the holes appear irregularly. Such experiments with missing data can be examined using pattern matrices. The following example introduces the topic. Consider the two-way cross classification described in Figure 7.2.1. The experiment contains three random blocks and three fixed treatment levels. However, the observed data set contains no observations in the (1, 1), (2, 2), and (3, 3) block treatment combinations and one observation in each of the other six block treatment combinations. This may have arisen from a balanced incomplete block design. We begin our discussion by first examining the experiment when one observation is present in all nine block treatment combinations. In this complete, balanced design, let the 9 × 1 random vector Y* = (Y11, Y12, Y13, Y21, Y22, Y23, Y31, Y32, Y33)'. Write the model Y* = X*β + E* where the 9 × 3 matrix X* = 1_3 ⊗ I_3, the 3 × 1 vector β = (β1, β2, β3)', the 9 × 1 error vector E* = (E11, E12, E13, E21, E22, E23, E31, E32, E33)' ~ N9(0, Σ*), and
The 9 × 9 covariance matrix Σ* is built by setting E_ij = B_i + (BT)_ij and applying the covariance matrix algorithm from Chapter 4. For this complete, balanced data set, the sums of squares matrices for the mean, blocks, treatments, and the block
by treatment interaction are given by
respectively, where X1 = 1_3 ⊗ 1_3, Z1 = Q_3 ⊗ 1_3, X2 = 1_3 ⊗ Q_3, Z2 = Q_3 ⊗ Q_3, and
The actual data set only contains six observations since Y11, Y22, and Y33 are missing. Let the 6 × 1 vector Y = (Y12, Y13, Y21, Y23, Y31, Y32)' depict the actual observed data set. Note that Y = MY* where the 6 × 9 pattern matrix M is given by
Furthermore, note MM' = I_6 and the 9 × 9 matrix
The vector of actual observations Y contains the second, third, fourth, sixth, seventh, and eighth elements of the complete data vector Y*. Therefore, the second, third, fourth, sixth, seventh, and eighth diagonal elements of M'M are
ones and all other elements of M'M are zero. Furthermore, M is a 6 × 9 matrix of zeros and ones, with a one placed in the second, third, fourth, sixth, seventh, and eighth columns of rows one through six, respectively. Since the 9 × 1 complete vector Y* ~ N9(X*β, Σ*), the 6 × 1 vector of actual observations Y = MY* ~ N6(Xβ, Σ) where
and
The Type I sums of squares for this problem are presented in Table 7.2.1 using matrices S1 = MX1, T1 = M[X1|Z1], and S2 = M[X1|Z1|X2]. The sums of squares matrices A1,...,A4 in Table 7.2.1 were calculated numerically using PROC IML in SAS. The PROC IML output for this section is presented in Section A2.1 of Appendix 2. The resulting idempotent matrices are as follows:
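Independently of the PROC IML computation, the pattern matrix M and the properties noted in the text (MM' = I_6; M'M an idempotent diagonal matrix marking the observed cells) can be reproduced in a few lines. This numpy sketch is not part of the text; the stand-in data vector is arbitrary.

```python
import numpy as np

# Pattern matrix for the example: of the 9 cells of Y*, the 1st, 5th,
# and 9th (cells (1,1), (2,2), (3,3)) are missing.
observed = [1, 2, 3, 5, 6, 7]      # 0-based positions kept from Y*
M = np.zeros((6, 9))
for i, j in enumerate(observed):
    M[i, j] = 1.0

# M picks the observed elements out of the complete vector Y*.
Ystar = np.arange(1.0, 10.0)       # stand-in complete data vector
Y = M @ Ystar                      # elements 2,3,4,6,7,8 of Y*

print(np.allclose(M @ M.T, np.eye(6)))   # MM' = I_6
MtM = M.T @ M
print(np.allclose(MtM @ MtM, MtM))       # M'M idempotent
print(np.allclose(np.diag(MtM), [0, 1, 1, 1, 0, 1, 1, 1, 0]))
```

The zeros on the diagonal of M'M mark the locations of the missing cells in the complete vector, exactly as described above.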
Depending on the pattern of the missing data, some Type I sum of squares matrices may have zero rank. In the example data set, the Type I sums of squares
four sums of squares Y'A1Y,...,Y'A4Y are mutually independent. Therefore,
where λ3 = 0 under the hypothesis H0 : β1 = β2 = β3. A γ level rejection region for the hypothesis H0 : β1 = β2 = β3 versus H1 : not all β's equal is to reject H0 if F* > F_{2,1}^γ where F_{2,1}^γ is the 100(1 - γ) percentile point of a central F distribution with 2 and 1 degrees of freedom. Note that H0 is equivalent to the hypothesis that there is no treatment effect. The Type I sums of squares can also be used to provide unbiased estimators of the variance components σ²_B and σ²_BT. The mean square for BT|μ, B, T provides an unbiased estimator of σ²_BT since λ4 = 0 and E(Y'A4Y/1) = E(σ²_BT χ²_1(0)) = σ²_BT. Constructing an unbiased estimator for σ²_B involves a little more work. In complete balanced designs, the sum of squares for blocks can be used to find an unbiased estimator for σ²_B. However, in this balanced, incomplete design problem, the block effect is confounded with the treatment effect. Therefore, the Type I sum of squares for Block (B)|μ has a noncentrality parameter λ2 > 0 and cannot be used directly with Y'A4Y to form an unbiased estimator of σ²_B. One solution to this problem is to calculate the sum of squares due to blocks after the overall mean and the treatment effect have been removed. After doing so, the block effect does not contain any treatment effects. As a result, the Type I sum of squares due to Block (B)|μ, T has a zero noncentrality parameter and can be used with Y'A4Y to construct an unbiased estimator of σ²_B. The Type I sum of squares due to Block (B)|μ, T is given by
where A3* = T1*(T1*'T1*)^{-1}T1*' - S1*(S1*'S1*)^{-1}S1*' with S1* = M[X1|X2] and T1* = M[X1|X2|Z1]. Note that the matrices S1* and T1* now order the overall mean matrix X1 first, the treatment matrix X2 second, and the block matrix Z1 third. From the PROC IML output, the 6 × 6 matrix A3* for the example data set equals
Furthermore, tr(A3*Σ) = 3σ²_B + σ²_BT and
Therefore, an unbiased estimator of σ²_B is provided by the quadratic form (1/3)Y'[A3* - A4]Y since
The procedure just described is now generalized. Let the n* × 1 vector Y* represent the observations from a complete, balanced factorial experiment with model Y* = X*β + E* where X* is an n* × p matrix of constants, β is a p × 1 vector of unknown parameters, and the n* × 1 random vector E* ~ N_{n*}(0, Σ*) where Σ* can be expressed as a function of one or more unknown parameters. Suppose the n × 1 vector Y represents the actual observed data with n < n* where n* is the number of observations in the complete data set, n is the number of actual observations, and n* - n is the number of missing observations. The n × 1 random vector Y = MY* ~ N_n(Xβ, Σ) where M is an n × n* pattern matrix of zeros and ones, X = MX*, and Σ = MΣ*M'. Each of the n rows of M has a single value of one and (n* - 1) zeros. The ij-th element of M is a 1 when the i-th element in the actual data vector Y matches the j-th element in the complete data vector Y* for i = 1,...,n and j = 1,...,n*. Furthermore, the n × n matrix MM' = I_n and the n* × n* matrix M'M is an idempotent, diagonal matrix of rank n with n ones and n* - n zeros on the diagonal. The ones on the diagonal of M'M correspond to the ordered locations of the actual data points in the complete data vector Y* and the zeros on the diagonal of M'M correspond to the ordered locations of the missing data in the complete data vector Y*. Finally, let X_s(X_s'X_s)^{-1}X_s' and Z_s(Z_s'Z_s)^{-1}Z_s' be the sum of squares matrices for the fixed and random effects in the complete data set for s = 1,...,m where rank(X_s) = p_s > 0, rank(Z_s) = q_s > 0, and X1 = 1_{n*}. Let S_s = M[X1|Z1|X2|···|Z_{s-1}|X_s] and T_s = M[X1|Z1|X2|···|X_s|Z_s] for s = 1,...,m. The Type I sum of squares for the mean is Y'S1(S1'S1)^{-1}S1'Y = Y'(1/n)J_n Y. The Type I sums of squares for the intermediate fixed effects take the form
The Type I sums of squares for the intermediate random effects take the form
for s = 2,...,m. However, the missing data may cause some of these Type I sums of squares matrices to have zero rank. Furthermore, it may be necessary to calculate the Type I sums of squares in various orders to obtain unbiased estimators of the variance components. The estimation of variance components with Type I sums of squares is discussed in detail in Chapter 10.
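The general recipe, in which each Type I sum of squares matrix is a difference of projections onto nested column spaces, can be wrapped in a small helper. The numpy sketch below uses arbitrary illustrative matrices rather than any design from the text.

```python
import numpy as np

def type1_matrix(inner, outer):
    """A = P(outer) - P(inner): the Type I sum of squares matrix for
    the effect added when going from design matrix `inner` to `outer`.
    Assumes col(inner) is contained in col(outer)."""
    proj = lambda A: A @ np.linalg.pinv(A.T @ A) @ A.T
    return proj(outer) - proj(inner)

# Toy check on an arbitrary nested pair of design matrices.
rng = np.random.default_rng(0)
S = rng.standard_normal((8, 2))
T = np.hstack([S, rng.standard_normal((8, 3))])
A = type1_matrix(S, T)
print(np.allclose(A @ A, A))    # idempotent
print(round(np.trace(A)))       # rank = added degrees of freedom = 3
```

With missing data a column appended to `outer` may already lie in col(`inner`), in which case the trace (and hence the rank and degrees of freedom) of A drops, matching the zero-rank possibility noted above.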
7.3
USING REPLICATION AND PATTERN MATRICES TOGETHER
Replication and pattern matrices can be used together in factorial experiments where certain combinations of the factors are missing and other combinations of the factors contain an unequal number of replicate observations. For example, consider the two-way cross classification described in Figure 7.3.1. The experiment contains three random blocks and three fixed treatment levels. The data set contains no observations in the (1, 1), (2, 2), and (3, 3) block treatment combinations and either one or two observations in the other six combinations. As in Section 7.2, begin by examining an experiment with exactly one observation in each of the nine block treatment combinations. In this complete, balanced design the 9 × 1 random vector Y* = (Y111, Y121, Y131, Y211, Y221, Y231, Y311, Y321, Y331)'. Use the model Y* = X*β + E* where E* ~ N9(0, Σ*). The matrices X*, β, Σ*, X1, Z1, X2, Z2, and M are defined as in Section 7.2. For the data set in Figure 7.3.1, let the 9 × 6 replication matrix R identify the nine replicate observations in the six block treatment combinations that contain data. The replication matrix R is given by
Finally, let the 9 × 1 random vector of actual observations Y = (Y121, Y131, Y132, Y211, Y212, Y231, Y311, Y321, Y322)'. Therefore, Y ~ N9(Xβ, Σ) where
Figure 7.3.1 Missing Data Example with Unequal Replication.
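The matrices R and M for this layout can be assembled directly. The numpy sketch below is not from the text; the replicate counts follow an assumed reading of Figure 7.3.1 (two observations in cells (1,3), (2,1), and (3,2), one elsewhere among the observed cells).

```python
import numpy as np

# Pattern matrix from Section 7.2: cells (1,1), (2,2), (3,3) missing.
M = np.zeros((6, 9))
for i, j in enumerate([1, 2, 3, 5, 6, 7]):
    M[i, j] = 1.0

# Replication matrix: assumed counts (1, 2, 2, 1, 1, 2) for the six
# observed cells, giving the nine observations of Figure 7.3.1.
reps = [1, 2, 2, 1, 1, 2]
R = np.zeros((9, 6))
row = 0
for j, r in enumerate(reps):
    R[row:row + r, j] = 1.0
    row += r

D = R.T @ R      # diagonal matrix of replicate counts
RM = R @ M       # maps the 9 complete cells to the 9 observations

print(np.allclose(np.diag(D), reps))              # True
print(np.allclose(RM.sum(axis=1), np.ones(9)))    # each row selects one cell
```

Design matrices for the observed data are then formed as RM times the complete-data matrices, e.g. S1 = RMX1, exactly as used with Table 7.3.1.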
Table 7.3.1 Type I Sums of Squares for the Missing Data Example with Unequal Replication

Source               df   Type I SS
Overall mean μ       1    Y'S1(S1'S1)^{-1}S1'Y = Y'A1Y
Block (B)|μ          2    Y'[T1(T1'T1)^{-1}T1' - S1(S1'S1)^{-1}S1']Y = Y'A2Y
Treatment (T)|μ, B   2    Y'[S2(S2'S2)^{-1}S2' - T1(T1'T1)^{-1}T1']Y = Y'A3Y
BT|μ, B, T           1    Y'[RD^{-1}R' - S2(S2'S2)^{-1}S2']Y = Y'A4Y
Pure error           3    Y'[I_9 - RD^{-1}R']Y = Y'A5Y
Total                9    Y'Y
and
The Type I sums of squares for this problem are presented in Table 7.3.1 using matrices R, D, S1 = RMX1, T1 = RM[X1|Z1], and S2 = RM[X1|Z1|X2]. The sums of squares matrices A1,...,A5 in Table 7.3.1, the matrices A1ΣA1,...,A5ΣA5, A3*, A3*ΣA3*, and the noncentrality parameters λ1,...,λ5, λ3* were calculated numerically using PROC IML in SAS. The PROC IML output for this section is presented in Section A2.2 of Appendix 2. From the PROC IML output note that
Therefore, by Corollary 3.1.2(a),
where
The quadratic forms (3/8)[Y'A4Y - (Y'A5Y/3)] and Y'A5Y/3 are unbiased estimators of σ²_BT and σ²_R(BT), respectively, since
and
Furthermore, A_sΣA_t = 0_{9×9} for all s ≠ t, s, t = 1,...,5. By Theorem 3.2.1, the five sums of squares Y'A1Y,...,Y'A5Y are mutually independent. Therefore,
where λ3 = 0 under the hypothesis H0 : β1 = β2 = β3. A γ level rejection region for the hypothesis H0 : β1 = β2 = β3 versus H1 : not all β's equal is to reject H0 if F* > F_{2,1}^γ where F_{2,1}^γ is the 100(1 - γ) percentile point of a central F distribution with 2 and 1 degrees of freedom. The Type I sum of squares due to Block (B)|μ, T is given by
where A3* = T1*(T1*'T1*)^{-1}T1*' - S1*(S1*'S1*)^{-1}S1*' with S1* = RM[X1|X2] and T1* = RM[X1|X2|Z1]. Furthermore, β'X'A3*Xβ = 0, tr(A3*Σ) = 4σ²_B + (8/3)σ²_BT + 2σ²_R(BT), and E[Y'A3*Y] = 4σ²_B + (8/3)σ²_BT + 2σ²_R(BT). Therefore, an unbiased estimator of σ²_B is provided by (1/4){[Y'A3*Y] - [Y'A4Y + (Y'A5Y/3)]} since
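All of the unbiasedness claims in this section reduce to the identity E[Y'AY] = tr(AΣ) + β'X'AXβ for Y ~ N(Xβ, Σ). A minimal numerical check of the identity, on a hand-checkable example that is not from the text:

```python
import numpy as np

def expected_quadratic(A, Sigma, mean):
    """E[Y'AY] for Y ~ N(mean, Sigma): trace(A Sigma) + mean'A mean."""
    return np.trace(A @ Sigma) + mean @ A @ mean

# Hand-checkable case: A = I_3, Sigma = 2 I_3, mean = (1, 2, 3)'
# gives trace(A Sigma) = 6 and mean'A mean = 14, so E[Y'AY] = 20.
val = expected_quadratic(np.eye(3), 2 * np.eye(3), np.array([1.0, 2.0, 3.0]))
print(val)   # 20.0
```

Applying the identity with the A matrices of Table 7.3.1 and the Σ of this section reproduces the expectations used in the variance component estimators above.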
EXERCISES
1. If B1, B2, B3, and B4 are the n × n sum of squares matrices in Table 7.1.1, prove B_r² = B_r for r = 1,...,4 and B_rB_s = 0_{n×n} for r ≠ s.
2. In Example 7.1.1 prove that the sums of squares due to the mean, regression, and pure error are distributed as multiples of chi-square random variables. Find the three noncentrality parameters in terms of β1, β2, and β3.
3. In Example 7.1.2 let b = t = 3, r11 = r22 = r33 = 2, r12 = r23 = r31 = 1, r13 = r21 = r32 = 3, and thus n = 18. Let B1, B2, B3, B4, B5 be the Type I
sum of squares matrices for the overall mean μ, B|μ, T|μ,B, BT|μ,B,T, and R(BT)|μ,B,T,BT, respectively. Construct B1, B2, B3, B4, B5.
4. From Exercise 3, construct the n × n covariance matrix Σ.
5. From Exercises 3 and 4, calculate tr(B_rΣ) and β'X_d'R'B_rRX_dβ for r = 1,...,5.
6. From Exercise 3, find the distributions of Y'B_rY for r = 1,...,5. Are these five variables mutually independent?
8
Balanced Incomplete Block Designs
The analysis of any balanced incomplete block design (BIBD) is developed in this chapter.
8.1
GENERAL BALANCED INCOMPLETE BLOCK DESIGN
In Section 7.2 a special case of a balanced incomplete block design was discussed. As shown in Figure 7.2.1, the special case has three random blocks, three fixed treatments, two observations in every block, and two replicate observations per treatment, with six of the nine block treatment combinations containing data. We adopt Yates's/Kempthorne's notation to characterize the general class of BIBDs. Let b = the number of blocks t = the number of treatments
Figure 8.1.1 Balanced Incomplete Block Example.
k = the number of observations per block
r = the number of replicate observations per treatment.
The general BIBD has b random blocks, t fixed treatments, k observations per block, and r replicate observations per treatment, with bk (or tr) of the bt block treatment combinations containing data. Furthermore, the number of times any two treatments occur together in a block is λ = r(k - 1)/(t - 1). In the Figure 7.2.1 example, b = 3, t = 3, k = 2, and r = 2. The total number of block treatment combinations containing data is bk or tr, establishing the relationship bk = tr. To obtain a design where each block contains k treatments, the number of blocks equals all combinations of t treatments taken k at a time, or b = t!/[k!(t - k)!]. A second example of a BIBD is depicted in Figure 8.1.1. This design has b = 6 random blocks, t = 4 fixed treatments, k = 2 observations in every block, and r = 3 replicate observations per treatment, with bk = 12 of the bt = 24 block treatment combinations containing data. Next we seek a model for the general balanced incomplete block design. To this end, begin with a model for the complete balanced design with b blocks, t treatments, and one observation per block treatment combination. The model is
Y* = X*β + E*, where the bt × 1 vector Y* = (Y11,...,Y1t,...,Yb1,...,Ybt)' and the bt × t matrix X* = 1_b ⊗ I_t is given by
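The BIBD parameter identities above (bk = tr, and λ = r(k - 1)/(t - 1) a positive integer) can be checked programmatically. A small sketch, not part of the text:

```python
def bibd_params_ok(b, t, k, r):
    """Check the balanced-incomplete-block identities: bk = tr, and
    lambda = r(k - 1)/(t - 1) a positive integer."""
    if b * k != t * r:
        return False
    lam, rem = divmod(r * (k - 1), t - 1)
    return rem == 0 and lam > 0

# The two designs discussed in the text.
print(bibd_params_ok(3, 3, 2, 2))   # Figure 7.2.1: lambda = 1 -> True
print(bibd_params_ok(6, 4, 2, 3))   # Figure 8.1.1: lambda = 1 -> True
print(bibd_params_ok(4, 4, 2, 3))   # bk = 8 != tr = 12 -> False
```

These are necessary conditions on the parameters; they do not by themselves guarantee that a design with those parameters exists.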
Note that the statistic in question is a function of σ²_B and σ²_BT since the variances involved are functions of σ²_B and σ²_BT. Let μ_ij represent the expected value of the kth observation in the ij-th combination of fixed factors A and B. Use the mean model
where the 5 × 1 vector μ = (μ11, μ12, μ21, μ31, μ32)' and where Y, R, and E are defined as in Example 9.1.2. Note the 7 × 5 replication matrix R has full column rank k = 5. In general, the less than full rank model is given by
where the k × p matrix X_d has rank k < p. The equivalent mean model is
where the n × k replication matrix R has full column rank k and the elements of the k × 1 mean vector μ are the expected values of the observations in the k fixed factor combinations that contain data. Since the two models are equivalent, Rμ = RX_dβ. Premultiplying each side of this relationship by (R'R)^{-1}R' produces
This equation, μ = X_dβ, defines the relationship between the vector μ from the mean model and the vector β from the overparameterized model.
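The overparameterization that μ = X_dβ resolves can be seen numerically: distinct vectors β map to the same mean vector μ. A small numpy illustration with hypothetical values, using the usual one-way layout X_d = [1 | I_3]:

```python
import numpy as np

# Overparameterized one-way model: mu_i = alpha + alpha_i, Xd = [1 | I_3].
Xd = np.hstack([np.ones((3, 1)), np.eye(3)])

# Two different beta vectors produce the same cell-mean vector mu:
beta_a = np.array([0.0, 5.0, 2.0, 3.0])   # alpha = 0
beta_b = np.array([1.0, 4.0, 1.0, 2.0])   # alpha = 1, alpha_i shifted down

print(np.allclose(Xd @ beta_a, Xd @ beta_b))   # True: mu = (5, 2, 3)'
```

Because many β correspond to one μ, only μ (and functions of it) can be estimated without further restrictions, which is the theme of the next two sections.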
9.3
MEAN MODEL ANALYSIS WHEN COV(E) = σ²I_n
Confidence bands can be constructed on the linear combinations t'μ where t is a k × 1 nonzero vector of constants. Under the normality assumption t'μ̂ ~ N1(t'μ, σ²t'D^{-1}t) since
and
By Theorem 3.2.2, t'μ̂ and Y'A_peY are independent since
Therefore,
A 100(1 - γ)% confidence band on t'μ is given by
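A numerical sketch of computing such a band follows. It is not from the text: the data, replication pattern, and the critical value (3.182, the tabulated t value for γ = 0.05 with 3 degrees of freedom) are hypothetical, and the critical value is passed in rather than computed.

```python
import numpy as np

def mean_model_band(Y, R, tvec, crit):
    """Sketch of a confidence band for t'mu in the mean model
    Y = R mu + E with cov(E) = sigma^2 I.  `crit` is the t critical
    value with n - k degrees of freedom, supplied by the caller."""
    n, k = R.shape
    D = R.T @ R
    mu_hat = np.linalg.solve(D, R.T @ Y)           # cell means, D^{-1}R'Y
    Ape = np.eye(n) - R @ np.linalg.solve(D, R.T)  # pure-error projection
    s2 = Y @ Ape @ Y / (n - k)                     # estimate of sigma^2
    half = crit * np.sqrt(s2 * (tvec @ np.linalg.solve(D, tvec)))
    est = tvec @ mu_hat
    return est - half, est + half

# Hypothetical data: two cells with replicates (2, 3); band on mu_1 - mu_2.
Y = np.array([4.0, 6.0, 1.0, 2.0, 3.0])
R = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
lo, hi = mean_model_band(Y, R, np.array([1.0, -1.0]), crit=3.182)
print(lo < 3.0 < hi)   # True: the point estimate is 5 - 2 = 3
```

The half-width is crit · sqrt(σ̂² t'D^{-1}t), with σ̂² = Y'A_peY/(n - k), matching the displayed band.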
9.4
ESTIMABLE FUNCTIONS
In Section 9.1 the less than full rank model Y = RX_dβ + E was introduced. The mean model Y = Rμ + E was developed in Sections 9.2 and 9.3 to solve the difficulties caused by the less than full rank model. Arguably, there is no need to develop the less than full rank model since the mean model solved the overparameterization problem. However, less than full rank models are used (in
9
Less Than Full Rank Models
169
SAS, for example), so it seems worthwhile to explore them and their relationship to the mean model. For the less than full rank model, the least-squares estimator ft satisfies the system of normal equations
However, since X_d'DX_d is singular, no unique solution for β̂ exists. In fact, there are an infinite number of vectors β̂ that satisfy the normal equations. All of these solutions are linear combinations of the vector Y, but none of them is an unbiased estimator of β. As Graybill (1961, p. 227) points out, no linear combination of the vector Y exists that produces an unbiased estimator of β. So how should we think of the term β̂? As Searle (1971) states, for a less than full rank model, β̂ provides "a solution" to the normal equations "and nothing more." Therefore, β̂ should be thought of as a nonunique solution to the system of p normal equations, rather than as an estimator of β. Although no linear combination of the vector Y produces an unbiased estimator of β, unbiased estimators of g'β do exist for certain p × 1 nonzero vectors g. Unfortunately, unbiased estimators of g'β do not exist for all g. For example, let the p × 1 vector g = (1, 0, 0, ..., 0)'. The parameter g'β is not estimable in this case since β is not estimable and therefore no element of β is estimable. The term "estimable" has been introduced. Before continuing, we formally define estimability. Definition 9.4.1 Estimable: A parameter (or function of parameters) is estimable if there exists an unbiased estimate of the parameter (or function of parameters). Definition 9.4.2 Linearly Estimable: A parameter (or function of parameters) is linearly estimable if there exists a linear combination of the observations whose expected value is equal to the parameter (or function of parameters). For the remainder of this chapter we confine our attention to linearly estimable functions. Therefore, the term estimable will subsequently imply linearly estimable. The next example demonstrates that all linear combinations of the vector μ from the mean model are estimable. Example 9.4.1
For the mean model Y = Rμ + E
where t is any k × 1 nonzero vector. For the less than full rank model the question still remains: When is g'β estimable? The following theorem addresses this question. The answer lies in the relationship that links μ and β, namely, μ = X_dβ.
Theorem 9.4.1 The linear combination g'β is estimable if and only if there exists a k × 1 vector t such that g = X_d't.
Proof: By definition g'β is estimable if and only if there exists an n × 1 vector b such that E[b'Y] = g'β. First assume that g'β is estimable. Therefore, there exists an n × 1 vector b such that g'β = E[b'Y] = b'RX_dβ for all β, which implies g' = b'RX_d or g = X_d'R'b. Let t = R'b and there exists a k × 1 vector t such that g = X_d't. Now assume there exists a k × 1 vector t such that g = X_d't. Then E[t'μ̂] = E[t'D^{-1}R'Y] = t'μ = t'X_dβ = g'β. Therefore, g'β is estimable.
As mentioned earlier, the normal equations have an infinite number of solutions. If β0 represents any one of the solutions then the next theorem shows that g'β0 is invariant to the choice of β0 when g'β is estimable.
Theorem 9.4.2 If g'β is estimable then g'β0 = t'μ̂ provides a unique, unbiased estimate of g'β where β0 is any solution to the normal equations and t is defined in Theorem 9.4.1.
Proof:
Solve the normal equations for X_dβ0. Therefore,
or
Since g'β is estimable, g'β0 = t'X_dβ0 = t'μ̂ where t'μ̂ is a unique estimate. Furthermore, g'β0 is an unbiased estimate of g'β since E[g'β0] = E[t'μ̂] = t'μ = t'X_dβ = g'β.
The Gauss-Markov theorem is applied to find the BLUE of g'β.
Theorem 9.4.3 If g'β is estimable then the BLUE of g'β is g'β0 = t'μ̂.
Proof: By Theorem 9.4.2, t'μ̂ = g'β0. By Theorem 9.4.1, t'μ = t'X_dβ = g'β. By the Gauss-Markov theorem, t'μ̂ is the BLUE of t'μ = g'β.
The heart of the three previous proofs lies in the relationship μ = X_dβ. Since t'μ is always estimable by t'μ̂, t'X_dβ = t'μ is also estimable by t'μ̂. Therefore, g'β is estimable provided g' can be written as t'X_d for some k × 1 nonzero vector t. One could argue that the whole topic of estimable functions is viable only in so far as β is related to μ through the relationship μ = X_dβ. Stated more strongly, estimable functions have little meaning without the mean model and, because of the mean model, estimable functions are at best redundant.
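Theorem 9.4.1 also gives a computational test for estimability: g'β is estimable exactly when g lies in the row space of X_d, i.e. g = X_d't for some t. A numpy sketch (the one-way layout shown is illustrative, not a design from the text):

```python
import numpy as np

def is_estimable(g, Xd, tol=1e-10):
    """g'beta is estimable iff g lies in the row space of Xd,
    i.e. g = Xd' t for some t (Theorem 9.4.1)."""
    t = np.linalg.lstsq(Xd.T, g, rcond=None)[0]
    return bool(np.linalg.norm(Xd.T @ t - g) < tol)

# One-way layout with 3 treatments: Xd = [1 | I_3] is 3 x 4, rank 3.
Xd = np.hstack([np.ones((3, 1)), np.eye(3)])

print(is_estimable(np.array([0, 1.0, -1.0, 0]), Xd))  # alpha_1 - alpha_2: True
print(is_estimable(np.array([1.0, 0, 0, 0]), Xd))     # alpha alone: False
```

The least-squares call finds the best t; a zero residual means g is exactly X_d't, and any nonzero residual means g'β has no linear unbiased estimator.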
This section concludes with a SAS PROC GLM program that analyzes Federer's (1955) data from Example 9.1.1. Both the mean model and Searle's less than full rank model are run. The two models are then used to generate the same estimable functions. Example 9.4.2 The data set is given in Figure 9.4.1. The SAS program and output are presented in Appendix 4. The SAS output provides the following parameter estimates for the model Y_ij = α + α_i + E_ij (or equivalently for the model Y = RX_dβ + E).
Note that although the notation α, α1, α2, α3 is used, β0 should not be viewed as an estimate of β = (α, α1, α2, α3)'. Rather β0 is one of the normal equation solutions for β. The SAS output also provides the following estimates for the mean model Y_ij = μ_i + E_ij (or equivalently for the model Y = Rμ + E).
Suppose it is of interest to estimate α1 - α2 = g'β for g = (0, 1, -1, 0)'. By Theorem 9.4.1, g'β is estimable since there exists a 3 × 1 vector t = (1, -1, 0)' such that
Therefore, by Theorems 9.4.2 and 9.4.3, the unique BLUE of g'β = α1 - α2 is provided by g'β0 = t'μ̂ where
Figure 9.4.1 Federer's (1955) Data Set.
9.5

MEAN MODEL ANALYSIS WHEN COV(E) =