Measurement Error Models
Measurement Error Models WAYNE A. FULLER Iowa State University Ames, Iowa
JOHN WILEY & SONS
New York
Chichester
Brisbane
Toronto
Singapore
A NOTE TO THE READER
This book has been electronically reproduced from digital information stored at John Wiley & Sons, Inc. We are pleased that the use of this new technology will enable us to keep works of enduring scholarly value in print as long as there is a reasonable demand for them. The content of this book is identical to previous printings.
Copyright © 1987 by John Wiley & Sons, Inc. All rights reserved. Published simultaneously in Canada.
Reproduction or translation of any part of this work beyond that permitted by Section 107 or 108 of the 1976 United States Copyright Act without the permission of the copyright owner is unlawful. Requests for permission or further information should be addressed to the Permissions Department, John Wiley & Sons, Inc.

Library of Congress Cataloging in Publication Data:
Fuller, Wayne A.
Measurement error models.
(Wiley series in probability and mathematical statistics. Applied probability and statistics, ISSN 0271-6356)
Bibliography: p.
Includes index.
1. Error analysis (Mathematics) 2. Regression analysis. I. Title. II. Series.
QA275.F85 1987
ISBN 0-471-86187-1
Printed and bound in the United States of America by Braun-Brumfield, Inc.

10 9 8 7 6 5 4
To Doug and Bret
Preface
The study of regression models wherein the independent variables are measured with error predates the twentieth century. There has been a continuing interest in the problem among statisticians and there is considerable literature on the subject. Also, for over 80 years, studies have documented the presence of sizable measurement error in data collected from human respondents. Despite these two lines of research, only a fraction of the statistical studies appearing in the literature use procedures designed for explanatory variables measured with error.

This book is an outgrowth of research on the measurement error, also called response error, in data collected from human respondents. The book was written with the objective of increasing the use of statistical techniques explicitly recognizing the presence of measurement error. To this end, a number of real examples have been included in the text. An attempt has been made to choose examples from a variety of areas of application, but the reader will understand if many of the examples have an agricultural aspect.

The book may be used as a text for a graduate course concentrating on statistical analyses in the presence of measurement error. It is hoped that it will also find use as an auxiliary text in courses on statistical methodology that heretofore have ignored, or given cursory treatment to, the problems associated with measurement error.

Chapter 1 was developed to provide an introduction to techniques for a range of simple models. While the models of Chapter 1 are special cases of models discussed in later chapters, it is felt that the concepts are better communicated with the small models. There is some flexibility in the order in which the material can be covered. One can move from a section in Chapter 1 to the corresponding section in Chapter 2 or Chapter 4. To facilitate flexible use, Sections 1.2, 1.3, and 1.4 are largely self-supporting. As a result, there is some duplication in the treatment of topics such as prediction. Some repetition seems advantageous because the
models of this book differ from those typically encountered by students in courses on regression estimation. The proofs of most of the theorems require a background of statistical theory. One will be comfortable with the proofs only if one has an understanding of large sample theory. Also, the treatment assumes a background in ordinary linear regression methods. In attempting to make the book useful to those interested in the methods, as well as to those interested in an introduction to the theory, derivations are concentrated in the proofs of theorems. It is hoped that the text material, the statements of the theorems, and the examples will serve the person interested in applications.

Computer programs are required for any extensive application of the methods of this book. Perhaps the most general program for normal distribution linear models is LISREL VI by Jöreskog and Sörbom. LISREL VI is available in SPSSX and can be used for a wide range of models of the factor type. A program with similar capabilities, which can also perform some least squares fitting of the type discussed in Section 4.2, is EQS developed by Bentler. EQS is available from BMDP Statistical Software, Inc. Dan Schnell has placed the procedures of Chapter 2 and Section 3.1 in a program for the IBM Personal Computer AT. This program, called EV CARP, is available from the Statistical Laboratory, Iowa State University. The packages SAS and BMDP contain algorithms for simple factor analysis. A program, ISU Factor, written with Proc MATRIX of SAS by Sastry Pantula, Department of Statistics, North Carolina State University, can be used to estimate the factor model, to estimate multivariate models with known error variances, and to estimate the covariance matrix of the factor estimates. A program for nonlinear models, written with Proc MATRIX of SAS by Dan Schnell, is available from Iowa State University.

I have been fortunate to work with a number of graduate students on topics related to those of this text. Each has contributed to my understanding of the field, but none is to be held responsible for remaining shortcomings. I express my sincere thanks to each of them. In chronological order they are James S. DeGracie, Angel Martinez-Garza, George E. Battese, A. Ronald Gallant, Gordon D. Booth, Kirk M. Wolter, Michael A. Hidiroglou, Randy Lee Carter, P. Fred Dahm, Fu-hua Yu, Ronald Mowers, Yasuo Amemiya, Sastry Pantula, Tin-Chiu Chua, Hsien-Ming Hung, Daniel Schnell, Stephen Miller, Nancy Hasabelnaby, Edina Miazaki, Neerchal Nagaraj, and John Eltinge. I owe a particular debt to Yasuo Amemiya for proofs of many theorems and for reading and repair of much of the manuscript.

(LISREL is a registered trademark of Scientific Software, Inc. SPSSX is a trademark of SPSS, Inc. BMDP is a registered trademark of BMDP Statistical Software, Inc. SAS is a registered trademark of SAS Institute, Inc. IBM AT is a registered trademark of International Business Machines, Inc.)
I thank Sharon Loubert, Clifford Spiegelman, and Leonard Stefanski for useful comments. I also express my appreciation to the United Kingdom Science and Engineering Research Council and the U.S. Army European Research Office for supporting the "Workshop on Functional and Structural Relationships and Factor Analysis" held at Dundee, Scotland, August 24 through September 9, 1983. Material presented at that stimulating conference had an influence on several sections of this book. I am grateful to Jane Stowe, Jo Ann Hershey, and Christine Olson for repeated typings of the manuscript. A part of the research for this book was supported by joint statistical agreements with the United States Bureau of the Census and by cooperative research agreements with the Statistical Reporting Service of the United States Department of Agriculture.
Wayne A. Fuller
Ames, Iowa
February 1987
Contents
List of Examples
xv
List of Principal Results
xix
List of Figures
xxiii
1. A Single Explanatory Variable
1
1.1. Introduction, 1
  1.1.1. Ordinary Least Squares and Measurement Error, 1
  1.1.2. Estimation with Known Reliability Ratio, 5
  1.1.3. Identification, 9
1.2. Measurement Variance Known, 13
  1.2.1. Introduction and Estimators, 13
  1.2.2. Sampling Properties of the Estimators, 15
  1.2.3. Estimation of True x Values, 20
  1.2.4. Model Checks, 25
1.3. Ratio of Measurement Variances Known, 30
  1.3.1. Introduction, 30
  1.3.2. Method of Moments Estimators, 30
  1.3.3. Least Squares Estimation, 36
  1.3.4. Tests of Hypotheses for the Slope, 44
1.4. Instrumental Variable Estimation, 50
1.5. Factor Analysis, 59
1.6. Other Methods and Models, 72
  1.6.1. Distributional Knowledge, 72
  1.6.2. The Method of Grouping, 73
  1.6.3. Measurement Error and Prediction, 74
  1.6.4. Fixed Observed X, 79
Appendix 1.A. Large Sample Approximations, 85
Appendix 1.B. Moments of the Normal Distribution, 88
Appendix 1.C. Central Limit Theorems for Sample Moments, 89
Appendix 1.D. Notes on Notation, 95
2. Vector Explanatory Variables
100
2.1. Bounds for Coefficients, 100
2.2. The Model with an Error in the Equation, 103
  2.2.1. Estimation of Slope Parameters, 103
  2.2.2. Estimation of True Values, 113
  2.2.3. Higher-Order Approximations for Residuals and True Values, 118
2.3. The Model with No Error in the Equation, 124
  2.3.1. The Functional Model, 124
  2.3.2. The Structural Model, 139
  2.3.3. Higher-Order Approximations for Residuals and True Values, 140
2.4. Instrumental Variable Estimation, 148
2.5. Modifications to Improve Moment Properties, 163
  2.5.1. An Error in the Equation, 164
  2.5.2. No Error in the Equation, 173
  2.5.3. Calibration, 177
Appendix 2.A. Language Evaluation Data, 181
3. Extensions of the Single Relation Model
185
3.1. Nonnormal Errors and Unequal Error Variances, 185
  3.1.1. Introduction and Estimators, 186
  3.1.2. Models with an Error in the Equation, 193
  3.1.3. Reliability Ratios Known, 199
  3.1.4. Error Variance Functionally Related to Observations, 202
  3.1.5. The Quadratic Model, 212
  3.1.6. Maximum Likelihood Estimation for Known Error Covariance Matrices, 217
3.2. Nonlinear Models with No Error in the Equation, 225
  3.2.1. Introduction, 225
  3.2.2. Models Linear in x, 226
  3.2.3. Models Nonlinear in x, 229
  3.2.4. Modifications of the Maximum Likelihood Estimator, 247
3.3. The Nonlinear Model with an Error in the Equation, 261
  3.3.1. The Structural Model, 261
  3.3.2. General Explanatory Variables, 263
3.4. Measurement Error Correlated with True Value, 271
  3.4.1. Introduction and Estimators, 271
  3.4.2. Measurement Error Models for Multinomial Random Variables, 272
Appendix 3.A. Data for Examples, 281
4. Multivariate Models
292
4.1. The Classical Multivariate Model, 292
  4.1.1. Maximum Likelihood Estimation, 292
  4.1.2. Properties of Estimators, 303
4.2. Least Squares Estimation of the Parameters of a Covariance Matrix, 321
  4.2.1. Least Squares Estimation, 321
  4.2.2. Relationships between Least Squares and Maximum Likelihood, 333
  4.2.3. Least Squares Estimation for the Multivariate Functional Model, 338
4.3. Factor Analysis, 350
  4.3.1. Introduction and Model, 350
  4.3.2. Maximum Likelihood Estimation, 353
  4.3.3. Limiting Distribution of Factor Estimators, 360
Appendix 4.A. Matrix-Vector Operations, 382
Appendix 4.B. Properties of Least Squares and Maximum Likelihood Estimators, 396
Appendix 4.C. Maximum Likelihood Estimation for Singular Measurement Covariance, 404
Bibliography
409
Author Index
433
Subject Index
435
List of Examples

Number   Topic
1.2.1    Corn-nitrogen. Error variance of explanatory variable known. Estimates, 18
1.2.2    Corn-nitrogen. Estimated true values, 23
1.2.3    Corn-nitrogen. Residual plot, 26
1.3.1    Pheasants. Ratio of error variances known, 34
1.3.2    Rat spleens. Both error variances known, 40
1.3.3    Rat spleens. Tests and confidence intervals, 48
1.4.1    Earthquake magnitudes. Instrumental variable, 56
1.5.1    Corn hectares. Factor model, 63
1.5.2    Corn hectares. Standardized factors, 69
1.6.1    Corn-nitrogen. Prediction for random model, 75
1.6.2    Earthquakes. Prediction in another population, 77
2.2.1    Coop managers. Error variances estimated, 110
2.2.2    Coop managers. Estimated true values, 114
2.2.3    Corn-nitrogen. Variances of estimated true values and residuals, 121
2.3.1    Apple trees. Estimated error covariance, 130
2.3.2    Corn-moisture experiment. Estimated error covariance, 134
2.3.3    Coop managers. Test for equation variance, 138
2.3.4    Rat spleens. Variances of estimated true values and residuals, 142
2.4.1    Language evaluation. Instrumental variables, 154
2.4.2    Firm value. Instrumental variables, 158
2.5.1    Corn-nitrogen. Calibration, 179
3.1.1    Corn-nitrogen. Duplicate determinations used to estimate error variance, 197
3.1.2    Farm size. Reliability ratios known, 201
3.1.3    Textiles. Different slopes in different groups, 204
3.1.4    Pig farrowings. Unequal error variances, 207
3.1.5    Tonga earthquakes. Quadratic model, 214
3.1.6    Quadratic. Both error variances known, 215
3.1.7    Supernova. Unequal error variances, 221
3.2.1    Created data. Linear in true values, 226
3.2.2    Berea sandstone. Nonlinear, 230
3.2.3    Berea sandstone. Nonlinear multivariate, 234
3.2.4    Hip prosthesis. Implicit nonlinear, 244
3.2.5    Quadratic, maximum likelihood. Large errors, 247
3.2.6    Pheasants. Alternative estimators of variance of estimated slope, 255
3.2.7    Pheasants. Alternative form for estimated variance of slope, 257
3.2.8    Quadratic. Large errors. Modified estimators, 257
3.3.1    Quadratic. Error in the equation. Weighted, 266
3.3.2    Moisture response model. Nonlinear, 268
3.4.1    Unemployment. Binomial, 275
4.1.1    Mixing fractions. Known error covariance matrix, 308
4.1.2    Cattle genetics. Error covariance matrix estimated, 313
4.2.1    Two earthquake samples. Least squares estimation, 325
4.2.2    Corn hectares. Estimation of linear model, 330
4.2.3    Corn hectares. Distribution-free variance estimation, 332
4.2.4    Earthquakes. Least squares iterated to maximum likelihood, 337
4.2.5    Corn hectares. Least squares estimation, fixed and random models, 344
4.3.1    Bekk smoothness. One factor, 364
4.3.2    Language evaluation. Two factors, 369
4.3.3    Language evaluation. Not identified, 374
List of Principal Results

Theorem  Topic
1.2.1    Approximate distribution of estimators for simple model with error variance of explanatory variable known, 15
1.3.1    Approximate distribution of estimators for simple model with ratio of error variances known, 32
1.4.1    Approximate distribution of instrumental variable estimators for simple model, 53
1.6.1    Distribution of estimators when the observed explanatory variable is controlled, 81
1.A.1    Large sample distribution of a function of sample means, 85
1.C.1    Large sample distribution of first two sample moments, 89
1.C.2    Limiting distribution of sample second moments containing fixed components, IID observations, 92
1.C.3    Limiting distribution of sample second moments containing fixed components, independent observations, 94
2.2.1    Limiting distribution of estimators for vector model with error covariance matrix of explanatory variables known, 108
2.3.1    Maximum likelihood estimators for vector model with no error in the equation, 124
2.3.2    Limiting distribution of estimators for vector model with no error in the equation. Limit for small error variances and (or) large sample size, 127
2.4.1    Limiting distribution of instrumental variable estimator, 151
2.5.1    Moment properties of modified estimator for simple model with known error variance and an error in the equation, 164
2.5.2    Vector version of Theorem 2.5.1, 171
2.5.3    Moment properties of modified estimator for model with no error in the equation, 173
3.1.1    Limiting distribution of weighted estimator for unequal error variances and an error in the equation, 187
3.1.2    Limiting distribution of weighted estimator for unequal error variances with no error in the equation, 190
3.1.3    Limiting distribution of maximum likelihood estimator for model with unequal known error covariance matrices, 218
3.2.1    Limiting distribution of estimators for nonlinear model with no error in the equation and known covariance matrix, 240
3.2.2    Limiting distribution of estimator of variance for nonlinear model, 243
4.1.1    Multivariate maximum likelihood estimator, error covariance matrix known up to a constant, 293
4.1.2    Multivariate maximum likelihood estimator, error covariance matrix estimated, 296
4.1.3    Multivariate likelihood ratio statistic, 301
4.1.4    Strong consistency of multivariate maximum likelihood estimators, 303
4.1.5    Limiting distribution of multivariate maximum likelihood estimators, 305
4.2.1    Limiting distribution of least squares estimator for structural model, 323
4.2.2    Least squares estimation of a diagonal covariance matrix, 329
4.2.3    Limiting distribution of least squares for normal functional model, 339
4.2.4    Equivalence of least squares estimators for the normal functional and structural models, 342
4.3.1    Limiting distribution of estimators for factor model, 360
4.3.2    Condition under which the factor model is not identified, 372
4.B.1    Almost sure convergence of the estimator maximizing a continuous function, 398
4.B.2    Limiting distribution of estimator maximizing the normal distribution likelihood, structural model, 398
4.B.3    Limiting distribution of estimator maximizing normal distribution likelihood, functional model, 401
4.B.4    Limiting distribution of least squares estimator computed from sample covariances, 402
4.C.1    Maximum likelihood estimator for model with known singular error covariance matrix, 404
List of Figures

Figure   Title
1.2.1    Residual plot for the corn-nitrogen data, 26
1.3.1    Pheasant data and estimated structural line, 36
1.3.2    Estimated minimum distance line and estimated true values for two types of cells, 42
1.5.1    Plot of residuals against estimated true x values, 67
1.6.1    Prediction in a second population, 80
2.2.1    Plot of residual against estimated true value for value orientation, 116
2.2.2    Normal probability plot for residuals, 117
2.5.1    Histogram for 2000 maximum likelihood estimates, 168
2.5.2    Histogram for 2000 modified estimates, 168
3.1.1    Plot of mean of two Y observations against mean of two X observations for pig farrowing data, 209
3.1.2    Plot of weighted residuals against estimated true values, 212
3.2.1    Estimated function and observed value for wave velocity, 233
3.2.2    Estimated ellipse and observed data for image of hip prosthesis, 247
3.2.3    True function and maximum likelihood estimated function for 120 observations generated by quadratic model, 249
4.3.1    Deviation Z_t1 − x̂_t1 plotted against x̂_t1, 368
CHAPTER 1
A Single Explanatory Variable

In Section 1 of this chapter we introduce the linear model containing measurement error and investigate the effects of measurement error on the ordinary least squares estimators. Estimation for a simple model is considered. In Sections 2 and 3 we study the two-variable measurement error model for two types of information about the measurement error variances. Because the objective is to introduce the reader to models and estimation procedures, we concentrate on the normal bivariate model. Sections 4 and 5 treat situations in which three variables are observed. The estimation methods employed in Sections 2-5 are closely related to maximum likelihood estimation. Because of the simple nature of the models, most estimators can be obtained by the method of moments. Some specialized models and methods are considered in Section 6. Later chapters will be devoted to models of higher dimension and to an expanded treatment of models with fixed independent variables measured with error. In particular, Sections 2.1-2.4 of Chapter 2 extend the results of Sections 1.1-1.4 of this chapter to vector explanatory variables.
1.1. INTRODUCTION

1.1.1. Ordinary Least Squares and Measurement Error
The classical linear regression model with one independent variable is defined by

$$Y_t = \beta_0 + \beta_1 x_t + e_t, \qquad t = 1, 2, \ldots, n, \qquad (1.1.1)$$

where (x_1, x_2, ..., x_n) is fixed in repeated sampling and the e_t are independent N(0, σ_ee) random variables. We shall use the convention of identifying the mean and variance of a random variable within parentheses and shall use N to identify the normal distribution. It is well known that the least squares estimator

$$\hat{\beta}_1 = \left[\sum_{t=1}^{n}(x_t - \bar{x})^2\right]^{-1}\sum_{t=1}^{n}(x_t - \bar{x})(Y_t - \bar{Y})$$

is unbiased for β_1 and has the smallest variance of unbiased linear estimators. In a second form of the regression model, the x_t are assumed to be independent drawings from a N(μ_x, σ_xx) distribution. In the second model it is also assumed that the vector (e_1, e_2, ..., e_n) is independent of the vector (x_1, x_2, ..., x_n). The estimator β̂_1 is the maximum likelihood estimator for β_1 and is unbiased for β_1 under both models.

We shall study models of the regression type where one is unable to observe x_t directly. Instead of observing x_t, one observes the sum

$$X_t = x_t + u_t, \qquad (1.1.2)$$

where u_t is a (0, σ_uu) random variable. The observed variable X_t is sometimes called the manifest variable or the indicator variable. The unobserved variable x_t is called a latent variable in certain areas of application. Models with fixed x_t are called functional models, while models with random x_t are called structural models. To aid in remembering the meaning of functional and structural, note that "F" is the first letter of the words fixed and functional, while "S" is the first letter of the words stochastic and structural.

As an example of a situation where x_t cannot be observed, consider the relationship between the yield of corn and available nitrogen in the soil. Assume that (1.1.1) is an adequate approximation to the relationship between yield and nitrogen. The coefficient β_1 is the amount that yield is increased when soil nitrogen increases one unit. To estimate the available soil nitrogen, it is necessary to sample the soil of the experimental plot and to perform a laboratory analysis on the selected sample. As a result of the sampling and of the laboratory analysis, we do not observe x_t but observe an estimate of x_t. Therefore, we represent the observed nitrogen by X_t, where X_t satisfies (1.1.2) and u_t is the measurement error introduced by sampling and laboratory analysis.

The description of the collection of the soil nitrogen data permits two interpretations of the true values x_t. First, assume that the fields are a set of experimental fields managed by the experiment station in ways that produce different levels of soil nitrogen in the different fields. For example, the application of varying rates of fertilizer and the growing of different fractions of legumes in the rotations would produce different levels of soil nitrogen. In such a situation, one would treat the true, but unknown, nitrogen levels in the different fields as fixed. On the other hand, if the fields were a random
sample of farmers' fields in the state of Iowa, the true values of soil nitrogen could be treated as random variables.

Let us investigate the effect of measurement error on the least squares coefficient in the simple model (1.1.1) and (1.1.2), under the assumption that the x_t are random variables with σ_xx > 0. We assume

$$(x_t, e_t, u_t)' \sim \mathrm{NI}\left[(\mu_x, 0, 0)', \operatorname{diag}(\sigma_{xx}, \sigma_{ee}, \sigma_{uu})\right], \qquad (1.1.3)$$

where NI is an abbreviation for "distributed normally and independently," and diag(σ_xx, σ_ee, σ_uu) is a diagonal matrix with the given elements on the diagonal. It follows from the structural model (1.1.3) that the vector (Y_t, X_t)', where Y_t is defined by (1.1.1) and X_t is defined in (1.1.2), is distributed as a bivariate normal vector with mean vector

$$E\{(Y_t, X_t)\} = (\mu_Y, \mu_X) = (\beta_0 + \beta_1\mu_x, \; \mu_x)$$

and covariance matrix

$$\begin{bmatrix} \sigma_{YY} & \sigma_{XY} \\ \sigma_{XY} & \sigma_{XX} \end{bmatrix} = \begin{bmatrix} \beta_1^2\sigma_{xx} + \sigma_{ee} & \beta_1\sigma_{xx} \\ \beta_1\sigma_{xx} & \sigma_{xx} + \sigma_{uu} \end{bmatrix}. \qquad (1.1.4)$$

Let

$$\hat{\gamma}_{1\ell} = \left[\sum_{t=1}^{n}(X_t - \bar{X})^2\right]^{-1}\sum_{t=1}^{n}(X_t - \bar{X})(Y_t - \bar{Y}) \qquad (1.1.5)$$

be the regression coefficient computed from the observed variables. By the properties of the bivariate normal distribution,

$$E\{\hat{\gamma}_{1\ell}\} = \sigma_{XX}^{-1}\sigma_{XY} = \beta_1(\sigma_{xx} + \sigma_{uu})^{-1}\sigma_{xx} = \gamma_1. \qquad (1.1.6)$$

We conclude that, for the bivariate model with independent measurement error in X, the least squares regression coefficient is biased toward zero. It is important to remember that Equation (1.1.6) was derived under the assumption that the measurement error in X_t is independent of the true values, x_t, and of the errors, e_t. One way to describe the effect of measurement error displayed in (1.1.6) is to say that the regression coefficient has been attenuated by the measurement error. There are a number of names for the ratio κ_xx = σ_XX^{-1}σ_xx that defines the degree of attenuation. The ratio is called the reliability of X_t in the social science literature, but because reliability is a heavily used word in statistics, we call κ_xx the reliability ratio. The reliability ratio is called heritability in genetics. An observed characteristic of a plant or animal, the X value, is called the phenotype and the unobserved true genetic makeup of the individual, the x value, is called the
genotype. The phenotype is the sum of the genotype and the environment effect, where the environment effect is the measurement error.

Because the bias in γ̂_{1ℓ} as an estimator of β_1 is multiplicative, the test of the hypothesis that β_1 = 0 remains valid in the presence of independent measurement error. That is, if model (1.1.1)-(1.1.5) holds with β_1 = 0 and σ_eu = 0, the population correlation between X_t and Y_t is zero for all values of σ_uu. It follows that the usual regression "t test" of the hypothesis that β_1 = 0 has Student's t distribution when β_1 = 0, whether or not the explanatory variable is measured with error. The use of the t distribution for hypotheses other than H_0: β_1 = 0 leads to biased tests in the presence of measurement error. Also, the presence of measurement error will reduce the power of the test of β_1 = 0. See Exercise 1.1.

For the classical regression model with fixed x_t, the estimator

$$\hat{\beta}_1 = \sum_{t=1}^{n}\left[\sum_{j=1}^{n}(x_j - \bar{x})^2\right]^{-1}(x_t - \bar{x})Y_t$$

is called a linear estimator. The adjective "linear" is a modifier for the random variables Y_t and means that the estimator is linear in Y_t. The term linear can also be interpreted to mean that the error in the estimator is a linear function of the random variables e_t. If there are errors of measurement in the explanatory variable, the error in the least squares estimator is not linear in the full set of random errors. When X_t = x_t + u_t, where u_t is measurement error, the error in the least squares estimator is

$$\hat{\gamma}_{1\ell} - \beta_1 = \left\{\sum_{t=1}^{n}\left[(x_t - \bar{x}) + (u_t - \bar{u})\right]^2\right\}^{-1}\sum_{t=1}^{n}\left[(x_t - \bar{x}) + (u_t - \bar{u})\right](v_t - \bar{v}),$$

where v_t = e_t − u_tβ_1. The measurement errors u_t enter as squares and as products with x_t in the denominator, and as squares and as products with e_t in the numerator of γ̂_{1ℓ} − β_1. The covariance between u_t and v_t produces the bias characterized in (1.1.6). It is possible for the measurement errors u_t to be correlated with the true values x_t and with e_t. In such cases the terms (x_t − x̄)(u_t − ū) and (u_t − ū)(e_t − ē) will also make a contribution to the bias.

The population squared correlation between x_t and Y_t is

$$R^2_{xY} = (\sigma_{xx}\sigma_{YY})^{-1}\sigma_{xY}^2 = \sigma_{YY}^{-1}\beta_1^2\sigma_{xx},$$

while the population squared correlation between X_t and Y_t is

$$R^2_{XY} = (\sigma_{XX}\sigma_{YY})^{-1}\sigma_{XY}^2 = \kappa_{xx}R^2_{xY}.$$
Thus, the introduction of independent measurement error leads to a reduction in the squared correlation, where the factor by which the correlation is reduced is the factor by which the regression coefficient is biased toward zero. As with the regression coefficient, it is said that the correlation has been attenuated by the presence of measurement error.
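A small simulation can make the attenuation result concrete. The sketch below is an added illustration, not part of the original text; the model parameters, sample size, and random seed are arbitrary choices. It generates data from the structural model (1.1.1)-(1.1.3) and checks that the least squares slope computed from the observed X_t is close to κ_xxβ_1, as in (1.1.6), and that the squared correlation is attenuated by the same factor.

```python
# Illustrative sketch (not from the text): attenuation under the normal structural model.
# All parameter values below are arbitrary choices for demonstration.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta0, beta1 = 67.0, 0.40
sigma_xx, sigma_ee, sigma_uu = 250.0, 50.0, 57.0

x = rng.normal(70.0, np.sqrt(sigma_xx), n)   # true values x_t
e = rng.normal(0.0, np.sqrt(sigma_ee), n)    # equation errors e_t
u = rng.normal(0.0, np.sqrt(sigma_uu), n)    # measurement errors u_t
Y = beta0 + beta1 * x + e                    # model (1.1.1)
X = x + u                                    # observed values (1.1.2)

kappa_xx = sigma_xx / (sigma_xx + sigma_uu)  # reliability ratio

slope_obs = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)   # least squares slope on observed X
print(slope_obs, kappa_xx * beta1)           # agree up to sampling error, per (1.1.6)

R2_xY = np.corrcoef(x, Y)[0, 1] ** 2         # squared correlation with the true values
R2_XY = np.corrcoef(X, Y)[0, 1] ** 2         # squared correlation with the observed values
print(R2_XY, kappa_xx * R2_xY)               # attenuated by the same factor kappa_xx
```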
1.1.2. Estimation with Known Reliability Ratio
We have shown that the expected value of the least squares estimator of β_1 is the true β_1 multiplied by κ_xx. Therefore, if we know the ratio σ_XX^{-1}σ_xx, it is possible to construct an unbiased estimator of β_1. There are a number of situations, particularly in psychology, sociology, and survey sampling, where information about κ_xx is available.

Assume we are interested in a population of individuals possessing a trait x. Examples of such traits are intelligence, community loyalty, social consciousness, willingness to adopt new practices, managerial ability, and the probability that the individual will vote in a particular election. It is impossible to observe directly the value of the trait for a particular individual. One can obtain only an estimate for the individual. This estimate, or response, for an individual is obtained frequently as a score constructed from the answers to a number of questions. The battery of questions is sometimes called an instrument. The score of an IQ test is, perhaps, the best known example. The difference between the individual's value for the trait and the response is the measurement error. Often the instrument has been studied and the ratio κ_xx is so well estimated that it may be treated as known. The reliability ratio κ_xx is published as a part of the supporting material for a number of standard instruments, such as IQ tests.

Suppose that we draw a simple random sample of individuals from a population for which the ratio κ_xx is known. An unbiased estimator of the structural regression coefficient β_1 relating a characteristic Y to the true value of a trait x of model (1.1.1) is given by

$$\hat{\beta}_1 = \kappa_{xx}^{-1}\hat{\gamma}_{1\ell}, \qquad (1.1.7)$$

where γ̂_{1ℓ} is the least squares coefficient defined in (1.1.5). The coefficient (1.1.7) is sometimes called the regression coefficient corrected for attenuation. Because (Y_t, X_t) is distributed as a bivariate normal, the conditional distribution of γ̂_{1ℓ} is normal and the conditional mean and variance of γ̂_{1ℓ}, given X = (X_1, X_2, ..., X_n), are

$$E\{\hat{\gamma}_{1\ell}\,|\,X\} = \gamma_1, \qquad (1.1.8)$$

$$V\{\hat{\gamma}_{1\ell}\,|\,X\} = \left[\sum_{t=1}^{n}(X_t - \bar{X})^2\right]^{-1}(\sigma_{YY} - \gamma_1\sigma_{XY}). \qquad (1.1.9)$$
An unbiased estimator of (1.1.9) (for n > 2) is

$$\hat{V}\{\hat{\gamma}_{1\ell}\,|\,X\} = \left[\sum_{t=1}^{n}(X_t - \bar{X})^2\right]^{-1}s_{vv}, \qquad (1.1.10)$$

where

$$s_{vv} = (n - 2)^{-1}\sum_{t=1}^{n}\left[Y_t - \bar{Y} - \hat{\gamma}_{1\ell}(X_t - \bar{X})\right]^2.$$

The unconditional variance of γ̂_{1ℓ} is the variance of the conditional expectation plus the expected value of the conditional variance. Because the estimator γ̂_{1ℓ} is conditionally unbiased for γ_1, the unconditional variance is the expected value of the conditional variance. The quantity σ_XX^{-1} Σ_{t=1}^{n} (X_t − X̄)² is distributed as a chi-square random variable with n − 1 degrees of freedom, and the unconditional variance of γ̂_{1ℓ} is obtained by evaluating the expectation of (1.1.9). For n > 3,

$$V\{\hat{\gamma}_{1\ell}\} = [(n - 3)\sigma_{XX}]^{-1}(\sigma_{YY} - \gamma_1\sigma_{XY}). \qquad (1.1.11)$$

The estimated conditional variance (1.1.10) is an unbiased estimator of the unconditional variance because γ̂_{1ℓ} is conditionally unbiased. Therefore, an unbiased estimator of the conditional variance of β̂_1 of (1.1.7) is

$$\hat{V}\{\hat{\beta}_1\,|\,X\} = \left[\kappa_{xx}^2\sum_{t=1}^{n}(X_t - \bar{X})^2\right]^{-1}s_{vv}. \qquad (1.1.12)$$

The conditional and unconditional distributions of β̂_1 are closely related, and the estimated variance of the conditional distribution is also an estimator of the variance of the unconditional distribution. This is not true for

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X} \qquad (1.1.13)$$

because the conditional expected value of β̂_0 is a function of X. The limiting distribution of the estimator β̂_0 is established by noting that

$$\hat{\beta}_0 = \beta_0 + \bar{v} - (\hat{\beta}_1 - \beta_1)\bar{X}, \qquad (1.1.14)$$

where v̄ = n^{-1} Σ_{t=1}^{n} v_t and v_t = e_t − u_tβ_1. It follows that n^{1/2}(β̂_0 − β_0, β̂_1 − β_1) is normally distributed in the limit and that a consistent estimator of the covariance matrix of (β̂_0, β̂_1)' is

$$\hat{V}\{(\hat{\beta}_0, \hat{\beta}_1)'\} = \begin{bmatrix} n^{-1}s_v^2 + \bar{X}^2\,\hat{V}\{\hat{\beta}_1\,|\,X\} & -\bar{X}\,\hat{V}\{\hat{\beta}_1\,|\,X\} \\ -\bar{X}\,\hat{V}\{\hat{\beta}_1\,|\,X\} & \hat{V}\{\hat{\beta}_1\,|\,X\} \end{bmatrix}, \qquad (1.1.15)$$

where

$$s_v^2 = (n - 2)^{-1}\sum_{t=1}^{n}(Y_t - \hat{\beta}_0 - \hat{\beta}_1 X_t)^2$$

and V̂{β̂_1 | X} is defined in (1.1.12).
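The pieces above can be assembled directly. The sketch below is an added illustration, not part of the original text: for a sample of (Y_t, X_t) pairs and a reliability ratio treated as known, it computes the corrected slope (1.1.7), the intercept (1.1.13), and the estimated covariance matrix of (β̂_0, β̂_1)' built from s_v² and V̂{β̂_1 | X}. The data values and the value of κ_xx are arbitrary.

```python
# Sketch (not from the text): corrected estimators and their estimated covariance
# matrix for the simple model with known reliability ratio. Data values are arbitrary.
import numpy as np

X = np.array([3.9, 2.9, 4.4, 5.9, 5.4, 4.2, 2.3, 4.5, 3.5, 6.0])
Y = np.array([3.6, 2.5, 3.9, 5.0, 4.9, 4.5, 2.9, 5.2, 2.7, 5.8])
kappa_xx = 0.85                     # reliability ratio assumed known
n = len(Y)

Sxx = np.sum((X - X.mean()) ** 2)
gamma1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx        # least squares slope (1.1.5)
beta1 = gamma1 / kappa_xx                                     # corrected for attenuation (1.1.7)
beta0 = Y.mean() - beta1 * X.mean()                           # intercept (1.1.13)

s_vv = np.sum((Y - Y.mean() - gamma1 * (X - X.mean())) ** 2) / (n - 2)
V_beta1 = s_vv / (kappa_xx ** 2 * Sxx)                        # estimated variance (1.1.12)

s_v2 = np.sum((Y - beta0 - beta1 * X) ** 2) / (n - 2)
V_hat = np.array([[s_v2 / n + X.mean() ** 2 * V_beta1, -X.mean() * V_beta1],
                  [-X.mean() * V_beta1, V_beta1]])            # covariance estimator (1.1.15)

print(beta0, beta1)
print(V_hat)
```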
Also, by standard regression results,

$$\left[\hat{V}\{\hat{\beta}_1\,|\,X\}\right]^{-1/2}(\hat{\beta}_1 - \beta_1) \qquad (1.1.16)$$

is distributed as Student's t with n − 2 degrees of freedom. Any linear hypothesis about β_1 can be transformed into a hypothesis about γ_1 by using the reliability ratio. Therefore, in the bivariate situation, knowledge of the ratio σ_XX^{-1}σ_xx permits one to construct an unbiased estimator of the parameter β_1 and to apply the usual normal theory for hypothesis testing and confidence interval construction. Unfortunately, these simple results do not extend to the vector-x case. See Section 3.1.3.

To construct the estimator (1.1.7), we must assume that the X values are from a distribution where the reliability ratio is known. Generally, this means that the X values of the current study must be a random sample from the population that generated the random sample used to estimate the ratio. For example, assume that the population of graduate students scores higher on IQ tests than does the general population. Then the ratio σ_XX^{-1}σ_xx of an IQ test computed for the general population is not applicable for a sample of graduate students.

Recalling that

$$R^2_{xY} = (\sigma_{xx}\sigma_{YY})^{-1}\sigma_{xY}^2 = \kappa_{xx}^{-1}R^2_{XY},$$

an estimator of the squared correlation between x and Y is

$$\hat{R}^2_{xY} = \kappa_{xx}^{-1}\hat{R}^2_{XY}, \qquad (1.1.17)$$

where R̂²_XY = (m_XX m_YY)^{-1} m²_XY and (m_YY, m_XY, m_XX) is the sample estimator of (σ_YY, σ_XY, σ_XX). The estimator (1.1.17) is said to be the squared correlation corrected for attenuation. It is possible for R̂²_xY defined by (1.1.17) to exceed one. In such cases, the maximum likelihood estimator of R²_xY is one and the maximum likelihood estimator of β_1 is β̂_1 = (sgn m_XY)[m_XX^{-1}κ_xx^{-1}m_YY]^{1/2}.
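The correction of the squared correlation, and the boundary adjustment when the corrected value exceeds one, can be written out in a few lines. The sketch below is an added illustration, not part of the original text; the sample moments and the value of κ_xx are arbitrary and are chosen so that the boundary case occurs.

```python
# Sketch (not from the text): squared correlation corrected for attenuation, (1.1.17),
# with the boundary adjustment used when the corrected value exceeds one.
import numpy as np

m_XX, m_YY, m_XY = 1.10, 0.95, 0.98      # arbitrary sample moments of (X, Y)
kappa_xx = 0.85                          # reliability ratio assumed known

R2_XY = m_XY ** 2 / (m_XX * m_YY)        # squared correlation of observed variables
R2_xY = R2_XY / kappa_xx                 # corrected squared correlation (1.1.17)

if R2_xY > 1.0:
    # boundary maximum likelihood solution: the corrected R2 is set to one and the
    # slope is computed from beta1**2 * kappa_xx * m_XX = m_YY
    R2_xY = 1.0
    beta1 = np.sign(m_XY) * np.sqrt(m_YY / (kappa_xx * m_XX))
else:
    beta1 = (m_XY / m_XX) / kappa_xx     # correction for attenuation (1.1.7)

print(R2_xY, beta1)
```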
8
A SINGLE EXPLANATORY VARIABLE
TABLE 1.1.1. Attenuation coefficients for selected variables Sample Size, R i Y = 0.5 Variable
K,,
Sex" Age"
0.98 0.99 0.92 0.88 0.85 0.17 0.58
Age" (45-49)(0-1)
Educationb Incomeb
Unemployed' (0-1) Poverty statusd(0-1)
so/so 2157 6976 132 58
35 14 5
MEM Superior 97
171
27 19 15 11
7
All persons. Persons 14 and over. Persons 16 and over. All families. Unemployed attenuation coefficient is from Fuller and Chua (1984). All other coefficients calculated from data in U.S. Department of Commerce (1975). a
such as sex, the correlation between two independent, identically distributed, determinations defines the multiplicative effect of measurement error on the regression coefficient of the latent class model. See Section 3.4. The estimates of the table were constructed from repeated interview studies conducted by the United States Bureau of the Census. In such studies, the same data are collected in two different interviews. Most of the estimates of Table 1.1.1 come from a comparison of responses in the 1970 Census with the same data collected in the Current Population Survey. In survey sampling the measurement error in data collected from human respondents is usually called response error. The increase in response error as one moves down the table is associated with a corresponding increase in the complexity of the concept being measured. Sex and age are relatively well-defined characteristics, while income, poverty, and unemployment are all relatively complex. In fact, whether or not a family is above the poverty level, or an individual is unemployed, is determined by responses to a number of questions. Age illustrates how subdividing the population can increase the effect of measurement error. For the population as a whole about 1% of the observed variation in age is due to measurement error. However, few studies investigate the effect of age for the entire population. If one is interested in a single 5-year category, such as ages 45-49, the effect of measurement error on the estimated regression coefficient increases to about 8%. Measurement error variance is about 15% of total variation for income. This level of variation is typical of many related variables, such as occupational status and socioeconomic status. As with age, the percent of variation
1.1.
9
INTRODUCTION
due to measurement error increases if we restrict the population. For example, K,, = 0.82 for persons with some income. The 50/50 sample size given in the table is the sample size for which 50% of the mean square error of an ordinary least squares regression coefficient is squared measurement error bias, given that R i Y for the observed variables is 0.5. If the sample size is larger than that of the table, more than half of the mean square error is due to squared bias. The last column of Table 1.1.1 contains the sample size for which the correction-for-attenuation estimator (1.1.7) has the same mean square error as the ordinary least squares estimator in a population with R i Y = 0.5. For any larger sample size, the corrected coefficient of (1.1.7) has smaller mean square error. The sample size is determined by solving the equation (n - 3)-'(1 - Ri,,)
+ (1 - K , , ) ~ I c ; ~=R(n~-~ 3)-'~;:(1
-
Riy)
(1.1.18)
for n, where the left side of the equality is the mean square error of the ordinary least squares coefficient and the right side of the equality is the variance of the estimator (1.1.7), both in standardized units. This equation was derived under the assumption of bivariate normality, but it is a useful approximation for a wide range of distributions. References containing discussions of measurement error and its effects include Morgenstern (1963), Cochran (1968), Hunter (1980), and Pierce (1981). Dalenius (1977a-c) has given a bibliography for response errors in surveys. 1.1.3. Identification The model (1.1.1)-(1.1.3) can be used to illustrate the idea of identification. Identification is a concept closely related to the ability to estimate the parameters of a model from a sample generated by the model. For the purposes of our discussion we consider a model to be a specification of: (a) Variables and parameters of interest. (b) Relationships among the variables. (c) Assumptions about the stochastic properties of the random variables. It is understood that the parameters of interest are specified by a vector
8 E 0,where 0 is the space of possible parameter values and the dimension
of 8 is the minimum required to fully define the model. The observable random vectors have a sampling distribution defined on a probability space. Let Z be the vector of observable random variables DeJinirion Z.I.I. and let F,(a: 8 ) be the distribution function of Z for parameter 8 evaluated at Z = a. The parameter 8 is identified if, for any 8, E 0 and 8, E 0,8, # 8,
10
A SINGLE EXPLANATORY VARIABLE
implies that
Fz(a: 0,) # Fz(a: 0,) for some a. If the vector 8 is identified, we also say that the model is identified. Definition 1.1.2. The parameter Bi,where Bi is the ith element of 8, is identified if no two values of 0 E 0,for which Bi differ, lead to the same sampling distribution of the observable random variables. The model is identified if and only if every element of 0 is identified.
For the model of Section 1.1.1, Equations (1.1.1) and (1.1.2) give the algebraic relationship among the variables x,, e,, u,,X,, and where only I; and X I can be observed. For the normal structural model, the random vector (x,, u,, el) is distributed as a multivariate normal with G,, = ox,, = G,, = 0 and c,, > 0. This exhausts our prior information about the distribution of the variables. The vector of unknown parameters for the model is 0 = (pxlox,, flee, ouu,PO, P1). Under the assumptions, the distribution of the X I ) is bivariate normal. Hence, the distribution of (q,X,) observations is characterized completely by the elements of its mean vector and its covariance matrix, a total of five parameters. Because the model contains six parameters, there are many different parametric configurations (different 0) that lead to the same distribution of the observations. Therefore, the model is not identified. For example, the parameter sets
x,
(x,
81 = ( ~ x g, x x ,
gee, auu,
Po,
PI)
= (1,
1, 1, 1, 1, 1)
and 02
= (Px,
a x x , Gee, g u u ,
PO, 81) = (1, 2, 1.5,0, 1.5,0.5)
are both such that
While the normal structural model is not identified, one of the parameters is identified. The mean of x is equal to the mean of X. Thus, the parameter p x is identified because, given the sample distribution, the parameter vector 8 is restricted to a subspace where p, is equal to the mean of X. For a model that is not identified, it is possible for the sample distribution to contain some information about the parameters. We have seen that one of the parameters, the mean of x, is identified even though the other parameters are not identified. In fact, the sample distribution associated with model (1.1.1)-( 1.1.3) contains additional information about the parameters. We note that we are able to detect the situation P, = 0 from the distribution of
1.1.
11
INTRODUCTION
x).
That is, given o,, = 0, we have PI = 0 if and only if oxy = 0. However, we are still unable to determine o,, and IT,, when PI = 0. In general, given CT,, = 0, the distribution of ( X t , I;) permits one to establish bounds for ouu.Because o,, 2 0, we need only establish an upper bound for ouu.From the equations following (1.1.4), we can obtain an equation for o,, in terms of ouuand an equation for o,, in terms of gee.These-equations are
(Xt,
o'ee = o y y
nu,
=
- o;Y(~~xx- ~ u u ) - ' ,
o x x - ~ : ~ ( I T Y Y - gee)-'*
A maximum value for o,, is obtained by setting gee= 0 and a maximum value for oeeis obtained by setting o,, = 0. These bounds lead to the bounds for the parameters displayed in Table 1.1.2. When PI # 0, the bounds for PI are the population regression of Y on X and the inverse of the population regression of X on Y . The reader should remember that the assumptions vXx> 0 and oeu= 0 were used in this development. It is not posssible to set bounds for the parameters if ce, is also unknown. To summarize, the sampling distribution of ( X , , I;) tells us something about the measurement error model (1.1.1)-(1.1.3). However, the model is not identified because it is not possible to find a unique relationship between the parameter vector of the distribution of (X,, and the parameter vector 8. If we are to construct a consistent estimator for the vector 8, our model must contain a specification of additional information. In Section 1.1.2, knowledge of o&o,, enabled us to estimate the remaining parameters. In the following sections we use the method of moments and the method of maximum likelihood, methods that are often equivalent under normality, to construct estimators of the unknown parameter vector 8. The properties of the estimators depend on the type of information that is used to identify the model.
x)
TABLE 1.1.2. Bounds for the parameters of model (1.1.1)-(1.1.3) obtained from the distribution" of (X,,
x)
The covariance u,, is zero and u,, > 0.
12
A SINGLE EXPLANATORY VARIABLE
REFERENCES Allen (1939), Cochran (1968), Fisher (1966), Moran (197 l), Spiegelman (1982), Department of Commerce (1975).
US.
EXERCISES 1. (Section 1.1.1)Assume that model (1.1.1)-(1.1.3) holds with u,, = 1, uee= 0.1,and n = 100. Compute the power of the regression t test ( 0.05 level) of the hypothesis 8, = Oagainst H A :8 , # 0 at P , = 0.063for u,, = 0 and for u,, = 0.5. You may approximate the noncentral t distribution ' 2 variance 1. by the normal distribution with mean E { ~ l l } [ n - ' E { s ~ } ] - 1and 2. (Section 1.1.1)Let
Y = Po + PIX, + el, X I = x, + u,, k.el, u3' NI[(P,, 40)'.diag(a,,, ueet%AIy a, = U, - E { I;} - yl(X, - E { X , } ) , and 7, = U,:u,y.
-
(a) Show that V { a , } = u,, + y:uuu + (8, - Y~)~U,,. (b) Let fir = rn;irnxr. Give the unconditional variance of fir. Is.it possible for the unconditional variance of .ilc to be smaller than that of m;;rnxy? 3. (Section 1.1.3)In discussing the identification of models we often compared the number of unknown parameters of the model with the dimension of the minimal sufficient statistic. If a model is to be identified, it is necessary that the number of parameters be no greater than the dimension of the minimal sufficient statistic. Construct an example to demonstrate that not all parameters need be identified if the number of unknown parameters is less than or equal to the dimension of the minimal sufficient statistic. 4. (Section 1.1.3)Assume the model y, = Po t xJ,, with
+
-
(x,, e,, d' NI[(P,. 0,W,block diag(u,,,
&,)I
and (I;,X,) = (y,, x,) (e,, u,). Discuss carefully the identification status for cases (a), (b), and (c), giving the parameter space in each case. In all cases all parameters except those specified are unknown. (a) The parameters Po, u,,, and uuuare known. (b) The parameters ueeand u,, are known. (c) The parameters Po, u,,, and u,, are known. 5. (Section 1.1.2)Assume that the twelve (I;,X,)pairs (3.6,3.9), (2.5,2.9), (3.9,4.4), (5.0,5.9), (4.9,5.4),(4.5,4.2), (2.9,2.3), (5.2,4.5), (2.7,3.9, (5.8,6.0),(4.1,3.3),and (5.L4.1)satisfy model (1.1.1)-(1.1.3) with K,, = 0.85. Estimate (Po, PI). Estimate the covariance matrix of your estimators. 6. (Section 1.1.3)Let the following model hold:
2
I; = Po + j - 1
+
(e,,u,, xfY
XflP,
+ e,,
- NI[(O, 0,cxY,block diaidu,,,
o,,,
L)],
where XfI= xI1 u,. Obtain an expression for the expected value of the coefficient for X I , in the ordinary least squares regression of U, on X f I ,x,,, xf3,. . . ,x l p . Establish bounds for 8 , in XI). terms of the population covariance matrix of
(x,
1.2.
13
MEASUREMENT V A R I A N C E KNOWN
7. (Section 1.1.2, Appendix
l.B)Let
-
[(Y,, x,), (e,, 141’ NI[(I, 0, 0,OY, block diag(Z:,,, 0.21)],
(x,
+
where u,, = uyy= 1, uXy= p , and X , ) = ( y , , x,) (el, u,). (a) What is the reliability ratio of X,? Of X,+ Y,? (b) What is the reliability ratio of Y:? Of X:? Of X,Y,? (c) What is the reliability ratio of Y:? Of X:? Of X:Y,? 8. (Section 1.1.2) Verify the equation used to construct the last column ofTable 1.1.1 by showing that the left side of Equation (1.1.18) is the standardized mean sguared error of PIC and that the right side of (1.1.18) is the standardized mean squared error of 0,. both as estimators of P I . 9. (Section 1.1.3) Assume the model
-
+ e,, X,= x, + u,, NI[(P,, 0, OY, block diag(u,,, Zt,,)],
V, = Po + Blx,
(x,. e,, 4’
with ox,, u,,, ururand a,, unknown and u,, > 0. Show that, given any positive definite covariance matrix for (Y,,X,),the interval (- 00, 0 0 ) is the set of possible values for PI. 10. (Section 1.1.2, Appendix l.A) Show that PI of (1.1.7) satisfies
j , -B,
= (n - 1)-1u;~
1 ( x ,- X)(rl - F) + oP(n-l),
I= I
where r, = Y, - x, - y l X , and y,, = E( Y , yI X,]. Hence. i, and b, - PI. where V is dctined in (1.1.14). are independent in the limit. I I. (Section 1.1.2, Appendix l.A) In Section 1.1.2 it was assumed that kXx= u i i u x x was known. Prove the following theorem. ~
Theorem. Let model (1.1.1)-(1.1.3) hold and let ,?,I dependent of ( y qXI), r = 1 , 2 , . . . ,and
fl”*(Z,,
Let
- Kxx)
be an estimator of K,,, where I?,, is in-
N(0, owJ.
j , = Z;xl~lC. Then flliz(81
- 81) 5 “0,
Ki.{Uii(urr
- YIUXY) + K;:Y2y:U,,]].
1.2. MEASUREMENT VARIANCE KNOWN 1.2.1. Introduction and Estimators In this section, we retain the normal distribution model introduced in Section 1.1.1. That is, we assume X , = x, + u,, Y, = Po + P , x , + e,, (xu e,, u,Y NI[(P,, 0, 01’9 diag(g,,, g e e . %”)I.
(1.2.1)
The first equation of (1.2.1) is a classical regression specification, but the true explanatory variable x, is not observed directly. The observed measure of x,, denoted by X,,may be obtained by asking people questions, by reading an imperfect instrument, or by performing a laboratory analysis. It is assumed
14
A SINGLE EXPLANATORY VARIABLE
that the variance of the measurement error, u,,, has been determined, perhaps by making a large number of independent repeated measurements. Estimators of the remaining parameters will be derived under the assumption that a,, is known. X I )is bivariate normal, the sample mean 2 = ( y, P) and Because Z, = sample covariances (myy, m x y , m,,), where, for example,
(x,
C (x,- @(I; - P),
mxy = ( n - I ) - '
1=1
form a set of sufficient statistics for estimation of the parameters. If the parameter vector is identified, the maximum likelihood estimator will be a function of these statistics. See Kendall and Stuart (1979 Vol. 2, Chaps. 17 and 23). If there are no parametric restrictions on the covariance matrix of Z,, then n- ' ( n - l)rnzz is the maximum likelihood estimator of the covariance matrix of Z,, where
m,,
= (n - 1)-
n
* 1 (2,- Z)'(Z, - 2). r= 1
We shall use the unbiased estimator m,, in our discussion. We call mZzthe maximum likelihood estimator adjusted for degrees of freedom. X,)satisfy Recall that, under model (1.2.1), the population moments of (uYY,
oxy,oxx)= ( f l i o x x
+ Gee, f l l u x x , o x x + ow)
(x,
(1.2.2) See (1.1.4). We create estimators of the unknown parameters by replacing the unknown population moments on the left side of (1.2.2) with their sample estimators to obtain a system of equations in the unknown parameters. Solving, we have (1.2.3) a 1 = (mxx - %)-Imxy,
v,
(fix,
= (mxx - uUu, m y y - Blmxy), so, = (8, - s , q .
&ee)
v
The knowledge of a,, has enabled us to construct a one-to-one mapping from the minimal sufficient statistic to the vector (fix,
8x.x- Bo, B 1 , dee)*
For the quantities defined in (1.2.3) to be proper estimators (i.e., to be in the parameter space), 8,. and d,, must be nonnegative. The estimators of ox, and u,, in (1.2.3) will be positive if and only if m,y(mxx
-
UUU)
- m:x > 0.
(1.2.4)
1.2.
15
MEASUREMENT VARIANCE K N O W N
Estimators for samples in which (1.2.4) is violated are Be, = 0,
b, = m,/myy,
with
and 8,, = m -y1y m2x y ,
(fix,8,) defined by the last equation of (1.2.3).
1.2.2. Sampling Properties of the Estimators The sampling behavior of the estimator 8, defined in (1.2.3) is not obtained easily. While the properties of the normal distribution can be used to derive the distribution of the estimator conditional on the observed X values, the expressions are not particularly useful because the conditional mean is a function of the X,and the X,are functions of the measurement errors. (See Exercise 1.13.) Therefore, it seems necessary to u;e large sample theory to develop an approximation to the distribution of P1. The sample moments (m,,, m x y ,m x x ) are unbiased for the population moments and all have variances that are decreasing at the rate n-’. These properties enable us to obtain the limiting distribution of the estimators. Theorem 1.2.1. Let model (1.2.1)hold with a,, know?, CJ,, > 0, and a,, > 0. Then the vector n”’[(/?, - Po), - PI)], where (Po, is defined in (1.2.3), converges in distribution to a normal vector random variable with zero mean and covariance matrix
(8,
8,)
Po - X,Pl = e, - u,B1 and ax”= B,,,, = -Plaulr. Furtherwhere 5 more, nV{(Po, PI)’} converges in probability to r, where
B,, = m x x - a,,,,,and 8,,
=
-Plauu.
Proof. Under the normality assumption, mxy is unbiased for b X yand V m x y } = (n - l ) - l ( ~ x x ~ Y+Y4 Y ) .
(See Appendix 1.B for the moments of the normal distribution.) Because the sample moments are converging in probability to the population moments,
16
A SINGLE EXPLANATORY VARIABLE
we can expand 81
8, in a Taylor series about the population values to obtain = ( o x x - ~u,)-loxY
+ (oxx - cur#)-
-(ox, - ~ u u ) - 2 ~ X Y ( m X X- Cxx)
YmXY
- GXY)
+0pW1).
(See Appendix 1.A.) After algebraic simplification, this expression can be written nlQ,
- 0') = n1/20;:(mxv - ox,)
c:=
x)u,.
+ ~ , ( n -'"),
where m,, =*(n - l)-' (X, It follows that the limiting distribution of n1'2(p1- 8') is the same as the limiting distribution of n1/20;:(mxu - ouu)= n1/20;:(mx,
+ mu,- ou,,).
In a similar manner
fi0 = k - f i J = P o + plx + z- j l ( X + 6 ) = P o - (A - P l ) P x + + O,(n- '1,
where ij = i? - PIU, and the limiting distribution of n'12(fio - Po) is the same as that of n'12F - (/il
- Pl)Pxl.
By the properties of the normal distribution, the covariance between m,, and 0 is zero and, by Corollary 1.C.1 of Appendix 1.C, the limiting distribution of n"2(rnx, - ouv,ii) is bivariate normal. The limiting distribution of nllZ[(/io, fiJ - (Po, /I1)'] follows because, to the order required, the error in the estimators is a linear function of (mxu- o,,,,V), By the n y m a l moment properties, 6,, = oxx+ O,(n-'/z), and we have shown that - p1 = 0,(n-'12). Therefore, ,=1
= o,,
It follows that
nO{jl]
= oo :;,
+ O,(n-
112).
+ o,;t(~,,o,, + p:o;,,) + 0,(n-''2).
Similar results hold for the remaining two entries of (1.2.6).
0
The random variable ur, introduced in Theorem 1.2.1, will be central in our study of measurement error models. Its role is analogous to that of the deviation from the population regression line in ordinary fixed-x regression models. This analogy can be made more apparent if we substitute x, = X, w, into the first equation of (1.2.1) to obtain
1.2.
17
MEASUREMENT VARIANCE KNOWN
The u, differs from the error in the ordinary fixed-x regression model because X , and u, are correlated. In Theorem 1.2.1 it is assumed that the true values x, are normally distributed. The theorem holds if (e,, u,) is independent of x, and the x, satisfy mild regularity conditions. Normality of the error vector (e,,u,) permits us to give an explicit expression for the covariance matrix of the estimators in terms of second moments of the distribution of the original variables. The approximations based on normality remain useful for error distributions displaying modest departures from normality, provided the error variances are small relative to the variance of x. Also see Theorem 2.2.: and Section 3.1. The variance of the limiting distribution of n’/’(P1 - bl) is considerably larger than the corresponding variance for the ordinary least squares regression coefficient constructed with X,.The difference is due to several sources. The divisor, (mxx - ouu),has a smaller expectation than the divisor of the ordinary least squares estimator and the variance of u, is larger than the variance about the ordinary least squares line. The quantity nl/’ was used to standardize the estimators in obtaining the limiting distributions. We choose to use the divisor (n - 1) in the estimated variance (1.2.7) because the variance estimator so defined reduces to the ordinary least squares variance estimator when o,, = 0. For the same reason we used the divisor (n - 2) in the definition of suu. The use of the divisor ( n - 2) in the estimator of ouuleads to an internal inconsistency with the estimator of o,, define! in (1.2.3) because ouu= o,, b$~,,. The estimator n?{(Po, is an estimator of the covariance matrix of the limiting distribution of nilz($, - Po, - bl). Because the expectation of jl is not defined, it is notJechnically correct to speak of (1.2.7) as an estimator of the variance of PI. It is an estimator of the variance of the approximating distribution. Because n9{(bo,b1)’}is a consistent estimator of (1.2.5), it follows that
+
8,)’)
bl
(1.2.8) is approximately distributed as a N ( 0 , 1) random variable. In practice it seems reasonable to approximate the distribution of (1.2.8) with the distribution of Student’s t with n - 2 degrees of freedom. Care is required when using this approximation. Throughout, we have assumed oxx> 0. If nXxis small relative to the variance of mxx - o,,, the approximations of Theorem 1.2.1 will not perform well. A test of the hypothesis that a,, = 0 is given by the statistic 22
=a , ’
2 (X,-
,=1
X)2.
(1.2.9)
If ox, = 0, the distribution of x’ is that of a chi-square random variable with n - 1 degrees of freedom. For the approximations of Theorem 1.2.1 and Equation (1-2.8) to perform well, the population analogue of (1.2.9) should
18
A SINGLE EXPLANATORY VARIABLE
be large. That is, (n - ~ ) C T ~ ~ should ' C J ~ ~be large. As a rule of thumb, one might judge Student's t distribution with n - 2 degrees of freedom an adequate approximation to the distribution of (1.2.8) if (n -
1)-l(mXx
- C T ~ ~ ) - ~b u u Y ,
- (Cew
= m,-,l[mz,
guu)'ly
n
1 (Z, - Z)'(Z, - Z , X , - X). r=
(m,,, mzx)= (n - I)-'
1
As before, the use of estimators in constructing the predictor introduces an additional error that is O,(n - ' I 2 ) . If one ignores this additional component, an estimator of the variance of the prediction error is
f{gr - xr 1 Zr> = g u u - ( g e u ,
ouu)mi;(geu,
ouu)'.
(1.2.34)
Example 1.2.2. We use the estimates from Example 1.2.1 to construct an improved estimate of the soil nitroge? at each site. From Example 1.2.1, uuu= 57, be, = 43.29, Po = 67.56, and PI = 0.4232. Treating x, as a fixed unknown constant, the estimator (1.2.25')is A
2,
= 0.4510(x -
67.5613) + 0.8092X,
and the estimated variance (1.2.26) is f(2, - x,}
=
[bft?,' + 0A1]-' = 46.12.
Using the information available in reduces the variance of our estimate of soil nitrogen by about 20%, to a variance of 46 from a variance of 57. Table 1.2.2 contains the 2, and 8,. The reader may verify that (n - l ) - '
1 (x - Y)6, = 43.29 = d,,, n
1=
(n - 1)-'
1
n
1 ( X , - X)C, = -24.13
= -blgUu,
I =1
(n -
1)-1
1 (a, - X)G, = 0.
1-
1
We now construct the predictor of x, under the assumption that the x, are a random sample from a normal distribution. The estimated mean vector is (&
6,) = (Y,X) = (97.4545, 70.6364)
24
A SlNGLE EXPLANATORY VARIABLE
TABLE 1.2.2. Predicted soil nitrogen, estimated soil nitrogen, and deviations from fit
Observed Site 1 2 3 4 5 6
I
8 9 10 11
Observed Nitrogen
Yield YI
x,
4
86 115 90 86 110 91 99 96 99 104 96
70 97 53 64 95 64 50 70 94 69 51
-11.18 6.39 0.01 - 8.65 2.24 - 3.65 10.28 - 1.18 - 8.34 7.24 6.86
Estimated (Fixed x)
Predicted (Random x)
64.96 99.88 53.00 60.10 96.01 62.36 54.64 69.47 90.24 72.26 54.09
65.85 95.29 55.77 61.75 92.03 63.66 51.14 69.65 87.16 72.01 56.69
4
f,
and by (1.2.33),
I:] [ =
87.6727 104.8818]-'[ 104.8818 304.8545
104,88181 = [0,3801] 247.8545 0.6822
*
It follows that the estimated prediction equation for x,, given (Y;, X,)and treating x, as random, is
I, = - 14.5942 + 0.3801 Y;
+ 0.6822X,.
By (1.2.34), the estimated variance of the prediction error is
P(2,- x,((Y;,X,)} = 38.90. If we are willing to treat the x, as a random sample from a normal distribution and to condition on the observed Z,, the variance of the prediction error is about 84% of that of the estimate constructed treating x, as fixed. The values of 2,are compared to the values i f ,where the 2, were computed treating x, as fixed, in Table 1.2.2. Note that the IIvalues are all slightly closer to the mean of X than are the 2, values. See Exercise 1.18. on In Example 1.2.2 we constructed estimators of the x, values under the two assumptions, fixed x, and random x,. In fact, when we considered the x, to be random, we assumed them to be normally distributed. In the fixed-x case, the x values are the same for every possible sample that we are considering. Therefore, if x = (xl, x 2 , . . . , x,) is fixed and one is constructing
1.2.
MEASUREMENT VARIANCE K N O W N
25
unbiased estimators of x, one is asking that the average of the estimator be x, where the average is over all samples in the population of samples of
interest. In the soil nitrogen example the population of fixed-x samples are those samples always selected from the same set of 11 fields (sites). It is possible to think of a sampling scheme wherein the fields of Example 1.2.2 are selected at random and then the soil sampled in each selected field. If we consider the total sampling scheme consisting of random selection of fields combined with random selection of soil within fields, we have a population of samples in which the x values are random. However, it is also legitimate to consider the subset of all possible samples that contain the 11 fields actually selected. That is, we may restrict our attention to the conditional distribution of sample outcomes conditional on the particular set of 11 fields that was selected. When we restrict attention to the subpopulation of samples of soil selected from the 11 fields, we are treating the x values for those 11 fields as fixed. When the data are collected in an operation where fields are randomly selected, it is legitimate, in one context, to treat the x values as randomfor example, to estimate the variance of x-and in another context to treat the x values as fixed-for example, to estimate the individual x values for the 11 fields. In the first context the population of interest is the overall population of fields from which the sample was randomly selected. In the second context the population of interest is the set of 11 fields. 1.2.4.
Model Checks
In ordinary regression analysis it is good practice to plot the residuals from the fitted regression equation. For examples of such plots, see Draper and Smith (1981, Chap. 3). The construction of plots remains good practice when fitting measurement error models. The form of the plots for ordinary least squares and for measurement error models will differ somewhat because of the different nature of the measurement error problem. For an ordinary least squares problem, a common plot is that of the residuals against the independent variables. This plot often will give an indication of nonlinearity in the regression, of lack of homogeneity of the error variances, of nonnormality of the errors, or of outlier observations. For a fitted measurement error model, the quantites that correspond most closely to the residual and to the independent variable of ordinary least squares are G, and 2,, respectively. A measurement error model such as that given by (1.2.1) postulates constant variance for e, and u,. Therefore, u, = e, - plu, will also have constant variance. Because (e,, u,) is independent of x,, the expected value of u, given x, is zero. The best estimator of x, is 2,
26
A SINGLE EXPLANATORY VARIABLE
defined in (1.2.17) and (1.2.23). Under normality, u, and 2, are independent. Therefore, the mean of u, is zero for all 2, and the variance of v, is G,, for all xt. We are able to observe only 6,and i t ,not u, and i, but , the properties of the estimators should be similar to the properties of the true variables. It is suggested that the plot of Ct against P, be used in the same manner as the analogous ordinary least squares plot of deviations from fit against the explanatory variable. That is, the plot can be used to check for outliers, for nonlinearity, and for lack of variance homogeneity. Also, for samples of reasonable size, tests of normality can be applied to the G,. Estimators for models with heterogeneous error variances, for models with nonnormal errors, and for nonlinear models are discussed in Sections 3.1 and 3.2. As in ordinary least squares, the variances of the 6, are not all the same. Because Po and are estimated, the variances of the 6,will differ by a quantity of order n-’. An estimator of the approximate covariance matrix of (G,, 02, . . . , 6,) is given in Section 2.2.3. Using the estimated variances of that section, one could plot the standardized quantities [f{6,}]-1’28r against i t . However, for most purposes, the plot of 8, against 2, is adequate. Example 1.2.3. Figure 1.2.1 is a plot of the G values against the 2 values for the corn yield-soil nitrogen data, where the plotted values are taken from Table 1.2.2. This plot is for a small sample, but the plot contains no obvious anomalies. The mean and the range of the G values are similar for large, medium, and small P values. 00
10
-
8-
. 0
4-
l-
.
a
I 0 I
>
-4 -
-8 -
*
-10 I
0
I
I
I
I
I
1.2.
MEASUREMENT VARIANCE KNOWN
REFERENCES Birch (1964), DeGracie and Fuller (1972), Fuller (1980), Kendall a n d Stuart (1979), Lord (1960), Madansky (1959), Neyman (1951), Neyman and Scott (1948).
EXERCISES 12. (Sections 1.1.3, 1.2) (a) Assume model (1.2.1) holds with P , # 0 and no information availableAon o,, and (rep. Extend the results of Section 1.1.3 by constructing a pair of estimators and POL sych that the parameter Po is contained in the closed interval formed by plim pol and plim POL. (b) Assume that the data below were generated by model (1.2.1).
tot
t
1
2 3 4 5 6
1 8 9 10
XI
3.58 2.52 3.85 5.02 4.94 4.49 2.57 5.05 1.84 5.82
X I 3.94 2.44 3.95 5.93 5.31 4.55 2.60 3.81 1.75 6.04
t
11 12 13 14 15 16 17
18
19 20
XI
3.88 5.93 5.84 4.50 5.81 3.67 5.94 3.54 4.34 5.25
r, 3.32 5.70 5.25 4.24 4.6 I 3.63 4.20 4.03 4.22 4.33
Using these data construct an interval in p, space such that the probability of 8, being contained in a region so constructed is greater than or equal to 0.95. (c) Assume that the data listed below were generated by model (1.2.1). Construct a region in PI space such that the probability of P , being contained in a region so constructed is greater than or equal to 0.95.
t
1 2 3 4 5 6 7
a
9 10
x,
Y
t
x,
Y,
3.62 I .64 3.12 3.28 4.24 2.61 4.73 2.76 4.16 2.50
1.67 0.45 4.43 1.39 1.08 2.41 1.19 2.94 I .92 2.12
11 12 13 14 15 16 17 18 19 20
2.46 0.85 3.59 2.76 4.51 3.18 4.17 4.06 1.53 2.38
0.93 1.05 3.30 1.71 4.74 2.15 1.55 2.6 1 2.42 0.70
A SINGLE EXPLANATORY VARIABLE
28
13. (Section 1.2.2) (a) Let model (1.2.1) hold and let b, be defined by (1.2.3). Fpr samples with a,, and a,, positive, show that the conditional mean and variance of PI, given the sample values of XI, are
where X = ( X , , X,,. . . , XJ. (b) Show that (W
- c)-'dF(w),
where F(w)is the distribution function of a chi;square random variable and c is a positive number, is not defined. Hence, show that E { j , } is not defined. 14. (Section 1.2.2,Section 1.2.3) Reilly and Patino-Leal (1981) give the following data: Replicate ( j )
Condition (i)
v,
xi i
3.44 3.78 3.33 1.11 1.78 2.19
16.61 19.73 17.23 9.19 11.89 10.37 10.43 20.55 17.33
1.85
3
6.86 5.00
Assume the data satisfy the model
Xi = Po + P I X i j + eij, ( x i j ,eij, uijY
-
X,j = xij + uij,
NI[(pL,,0, O)', diag(u,,, u,,,
WI.
(a) Using the nine (Tj,Xi,) observations, estimate (Po, PI, ox,, uee).UsingATheorem 1.2.1, estimate the covariance matrix of the approximate distribution of (Po, PI). Estimate the x values treating them as fixed. Give the estimated variance of the estimated x values. Plot Bij against P,j. (b) Assume that uev= -0.4 and u,, = 0.8. Transform the data so that the covariance of the measurement errors in the transformed variables is zero. Estimate the transformed parameters using equations (1.2.3). Estimate the covariance matrix of the transformed parameters. Then, using the inverse transformation, estimate the original parameters. 15. (Section 1.2.1) Let XI = ( X I o ,Xll) = (1, XI,), and let (Po, /I1) of model (1.2.1) be estimated by (So,, S I M Y = (Mxx - %"I- ' M x , ,
I:=
Xi(x, XI) and Zuy= diag(0, uuu).How do these estimators differ where (M,,, M x x ) = C 1 from those defined in Equations (1.2.3)? 16. (Section 1.2.1) Show that a,, of (1.2.3) satisfies $,
= my" - (bl
- Pi).xr
t O,(n-') = m,, - /l:uUut oP(n-').
1.2.
29
MEASUREMENT VARIANCE KNOWN
17. (Section 1.2) Suppose that the relationship between phosphorus content in the leaves of corn ( Y ) and the concentration of inorganic phosphorus (x) in the soil is defined by the linear model Y, = Po + blx, + e,, X,= x, + u,, (x,, e,, 4’ NI[(P,, 0, O)’, diag(rrx,,,gee,0.25)],
-
where X, is the observed estimate of inorganic phosphorus in the soil. The chemical determinations for 18 soils are as follows: r
u,
x,
t
u,
XI
1
64 60 71 61 54 77
1.18 1.18 2.02 1.26 2.39 1.64 3.22 3.33 3.55
10 I1 12 13 14 15 16 17 18
51 76 96 77 93 95 54 68 99
3.69 3.45 4.9 I 4.91 4.75 4.91 1.70 5.27 5.56
2 3 4 5 6 7 8 9
81
93 93
(a) Calculate estimates of the parameters of the model. (b) Estimate the covariance matrix of the approximate distribution of (Po, P1). (c) Estimate the true x, values treating the x, as fixed. Estimate the variance of these estimates. (d) Plot 8, against L,. 18. (Section 1.2.3) (a) Using the I?, values of Table 1.2.2, regress X, on 0, using an ordinary regression program. Do not include an intercept in the regression. Verify that the deviations from fit of this regression are the 2, values of Table 1.2.2. How would you interpret the "Y-hat" values obtained from a regression of X I on u,? (b) Verify that the f, values of Example 1.2.2 satisfy A
2, -
x = [B""B,, + Beer7..]
~
.
%""BXX(L,- S),
where L, is defined in (1.2.25) and f, is defined in (1.2.33). Show that i l=
x + &?,
-
X),
where y* = [3{q,)]-'C{, 2 by
6 2 = (n - 2 ) - ’ ( n
- 1)i.
(1.3.34)
This definition of B2 is also consistent with the approximate chi-square distribution suggested for c?”,,. For unknown o2 there are three sample covariances (myy, mxy, mxx) available for the estimation of three parameters (oxx,PI, 0 2 ) .For known o2 (known I&&) there are three sample covariances available for the estimation of two parameters (oxx,pl). Therefore, knowledge of the full covariance matrix permits us to check for model validity. The smallest root of the determinantal equation (1.3.26) satisfies
1= (nee- 21jlae, + &,,)-l(myy
- 2/?,mxy
+ b&).
(1.3.35)
The numerator of this statistic is a multiple of the estimator of o,, constructed with deviations from the fitted line. The denominator is an estimator of c,, constructed with the known &. If were replaced by pl, the ratio would be distributed as Snedecor’s F with n - 1 and infinity degrees of freedom. In the next chapter it is demonstrated that (n - 2)- ‘ ( n - 1)x is approximately distributed as Snedecor’s F with n - 2 and infinity degrees of freedom, Therefore, when nee,oeu,and ouuare known, the st!tistic ( n - 2)-’(n - l)A can be used as a test of the model. If ( n - 2 ) - ’ ( n - l ) A is large relative to the tabular value of Snedecor’s F with n - 2 and infinity degrees of freedom, the model is suspect. Our derivation of Equation (1.3.27) demonstrates that the knowledge of o2 does not change the estimator of PI for the functional model. But knowledge of the error variance permits us to construct a test of model adequacy. This can be compared to the case of ordinary fixed-x regression. Knowledge of the error variance does not change the form of the fixed-x regression estimator of p,, but knowledge of the error variance permits construction of a test for lack of fit.
8,
Example 1.3.2. We analyze some data of Cohen, D’Eustachio, and Edelman (1977). See also, Cohen and D’Eustachio (1978) and Fuller (1978). The basic data, displayed in Table 1.3.2, are the numbers of two types of
1.2.
41
RATIO O F MEASUREMENT VARIANCES KNOWN
TABLE 1.3.2. Numbers of two types of cells in a fraction of the spleens of fetal mice
Individual
Number of Cells Forming Rosettes
Number of Nucleated Cells
Y
X
52 6 14 5 5
337 141 177 116 88
7.211 2.449 3.742 2.236 2.236
18.358 11.874 13.304 10.770 9.381
Source: Cohen and D’Eustachio (1978)
cells in a specified fraction (aliquot) of the spleens of fetal mice. Cohen and D’Eustachio (1978) argued, on the basis of sampling, that it is reasonable to assume the original counts to be Poisson random variables. Therefore, the square roots of the counts, given in the last two columns of the table, will have, approximately, constant variance equal to one-fourth. The postulated model is Y, = Po
+ BlX,,
where (y, X,) = (y,, x,) + (e,, u,), is the square root of the number of cells forming rosettes for the tth individual, and X , is the square root of the number of nucleated cells for the tth individual. On the basis of the sampling, the (e,,u,) have a covariance matrix that is, approximately, ZEE= diag(0.25,0.25). The square roots of the counts cannot be exactly normally distributed, but we assume the distribution is close enough to normal to permit the use of the formulas based on normality. Thus, our operating assumption is
(e,, u,)’
-
NI(O,0.251)
and we assume (e,, u,) is independent of xj for all t and j . The statistics associated with the data of Table 1.3.2 are (3.5748, 12.7374) and
(F, x)=
(my,, mxy, m x x ) = (4.5255, 7.1580, 11.9484).
The smallest root of
lmzz - A(0.25)II = 0 is i = 0.69605. Using (1.3.21) and (1.3.27), the estimated parameters of the line are (&, bl) = (-4.1686,0.6079).
12
A SINGLE EXPLANATORY VARIABLE
TABLE 1.3.3. Estimated numbers of two types of cells Observed t
I:
Observed
x,
E,
9,
4
1 2 3 4 5
7.211 2.449 3.742 2.236 2.236
18.358 11.874 13.304 10.770 9.381
7.0508 2.8878 3.8714 2.3403 1.7237
18.4563 11.6078 13.2260 10.7071 9.6929
0.2194 - 0.6009 -0.1772 - 0.1428 0.7015
The estimates of the x, and y, are, by (1.3.28) and (1.3.29),
+
+
it= 1.8503 0.4439Y; O.7302Xt, 9, = - 3.0438 0.2698Y; O.4439Xt.
+
+
The estimated true values are given in Table 1.3.3 and are plotted in Figure 1.3.2 as the dots on the estimated line. The original observations are plotted as crosses. In this example the covariance matrix of the measurement errors is proportional to the identity matrix. Therefore, the statistical distance is proportional to the Euclidean distance and the lines formed by joining (I;,X,) and (j$, 2,) are perpendicular to the estimated functional line. The statistical distance from (Y;, X,) to ( j , , a,) is, by (1.3.16) and (1.3.19), statistical distance =
(0.25)(1
+ &)
Y 6-
4-
2-
+
FIGURE 1.3.2. Estimated minimum distance line and estimated true values for two types of
cells.
I .?.
43
RATIO 01;M E A S U R E M E N T V A R I A N C E S K N O W N
Therefore, the statistical distance is proportional to the absolute value of 6,. The 6,are given in Table 1.3.3. As in Example 1.2.2,the sample correlation between 2, and 6,is zero. The estimation procedure has transformed the X,)into a new vector (C,, it), where the sample covariance observed vector between 0, and 2, is zero. By (1.3.30),the approximate variance of 2, is
(x,
q(2, - x,> = C(0.6079, 1)41(0.6079,1)’I-l = 0.1825. If the model is true, ( n - 2)-’(n - 1 ) x is approximately distributed as Snedecor’s F with three and infinity degrees of freedom. Because F = (3-’)4(0.69605)= 0.9281 is approximately equal to one, the data and the model are compatible. Figure 1.3.2might lead one to conclude that there is a nonlinear relationship between y, and x,. This interpretation should be tempered by two facts. First there are only five observations in the plot. Second, as the F statistic indicates, the deviations are not large relative to the inGependent estimate of the standard deviation of u,, where 6fi’ = O S ( 1 P?)”* = 0.59. Also see Exercise 3.10. We can check the assumption that c(x, - X)2 > 0 by computing
+
aui’mxx = 4(11.9484) = 47.79.
If mxx = ( n - l) - lx(x, - 2)’ were zero, the ratio would be distributed as Snedecor’s F with four and infinity degrees of freedom. It seems clear that mxx is positive and that the model is identified. By (1.3.31), w o ,
where 8,, = -bla,,
SJ>=
= -0.1520,
[
1
1.2720 - 0.0945 -0.0945 0.0074 ’
6,, = 0.3424,
q(2, - x,} = 11.7192, H z = (1 + /$)- ‘(jl, 1) = (0.4439,0.7302),
A,,
= H;m,,H2
-
A hypothesis of interest in the original study is the hypothesis that zero. On the basis of the approximate distribution of Theorem 1.3.1, t = (1.2608)-”’(-4.1686) = -3.71
Do is
is approximately distributed as a N ( 0 , 1) random variable when Po is zero. Thus, the hypothesis that the intercept is zero would be rejected at the 0.001 level. An alternative method of testing the hypothesis that Po = 0 is to estimate p1 using the uncorrected sums of squares and products. Under the null hypothesis that Po = 0, an estimator of PI is W X X
-
hJ ‘h?
44
A SINGLE EXPLANATORY VARIABLE
where (M,,,M,,) = n-'
X , ( x , X,)and
is the smallest root of
JMZz- I(0.25)I) = 0. For these data the smallest root is = 4.0573. If the model is true, then (n - l ) - ' d is approximately distributed as Snedecor's F with four and infinitity degrees of freedom. Because F = 5.07, the hypothesis that Po = 0 is rejected at the 0.001 level. The t test is a test of the hypothesis that y, = Pix, given th!t the alternative model for y, is y, = Po + Pix,. One conjectures that the I test will have smaller po_wer than the t test against the alternative of the affine model, because the 1 test has power against a wide range of 00 alternatives. One cannot hope to specify parametric configurations completely such that the approximate distribution of Theorem 1.3.1 will be appropriate in practice. However, some guidance is possible. One rule of thumb is to consider the approximations adequate if (n - 1)-1ii;:6:u
(1.3.36)
< 0.001.
This is roughly equivalent to requiring the coefficient of variation of (m,, - ouu)-lmXX as an estimator of m;:(mxx ouu) to be less than 5%. This suggestion is supported by Monte Carlo studies such as that of Martinez-Garza ( 1970) and Miller (1984), by higher-order expansions such as those described in Cochran (1977, p. 162), DeGracie and Fuller (1972), and Anderson (1974, 1976, 1977), and by the exact distributional theory of Sawa (1969) and Mariano and Sawa (1972). Also see Exercises 1.59 and 1.60. For the pheasant data of Example 1.3.1,
+
(n - l)-'rii;:&,
= (14)-'[(3.12895)-'(0.49229)]* = 0.0018.
For the cell data of Example 1.3.2,
(n - l)-'A;;a&
= (4)-'[(11.7744)-'0.25]*
= 0.0oO1.
Thus, although the sample size, n, for the pheasant example is larger, we expect the approximations to perform better for the cell example because A;:auu is smaller for the cell example. This is confirmed when we compare exact and approximate confidence intervals for the two examples. See Exercise 1.31 and Example 1.3.3. 1.3.4. Tests of Hypotheses for the Slope
pl,
While we are unable to establish the exact distribution of we can construct some exact tests for the model. Let model (1.3.1)-(1.3.2) hold with 6 =
I .3.
RATIO OF MEASUREMENT VARIANCES KNOWN
known and oe, = 0. We consider the null hypothesis that
fly, then
45
a, = by. If j1=
y, - B - p:(x,- X)= u, - 0, where u, = e, - P~u,, and the variance of u, is ow = b e e
+ @?)2cuu
=
[a + ( P ~ ) ' l ~ u u *
Because we know 6 and because (e,, u,) is normally distributed, we can construct a second random variable, say h,, that will be independent of u, if P1 = fly. We define
+ a 2 X ,= al(Po+ pix,) + a2x, + ale, + a,u,, where a , and a2 (a: + a: # 0) are chosen so that, under the null, the coh, = a,Y,
variance between h, and u, is zero. Because
C{h,, 4) = (a16 - ~zB:)b,Y, h, defined with a, = S - ' P y and a2 = 1 will be uncorrelated with u, under the null. If # 0, the definitions of h, and u, are symmetric. That is, the leads to a pair that is a multiple of the pair hypothesis that jl = -d(j:)-' (u,, A,), but with the identification interchanged. This symmetry means that we must consider two types of alternative hypothesis. First, assume that we know the sign of pl. Then the hypotheses are Ho:
= fly,
H A : p ,
20,
and
PI # P?,
(1.3.37)
where we have chosen nonnegative values as the parameter space for PI, with no loss of generality. Under alternative hypothesis (1.3.37),the null hypothesis p1 = by corresponds to the hypothesis that the correlation between u, and h, is zero. A t test or an F test can be used to test for zero correlation. The null hypothesis that PI = fly is accepted if the hypothesis of zero correlation is accepted, Furthermore, the set of PI with P, E [0, GO) for which the hypothesis of zero correlation is accepted at level a constitutes a 1 - CI confidence set for jl. We now consider the hypotheses, H,:
Pi
H A : p1
= f
p?*
p? # 0
With this alternative hypothesis the symmetry in the definition of u, and h, must be recognized. The acceptance of the hypothesis of zero correlation is equivalent to accepting the hypothesis that PI = or that PI = -6(fl:)-'.
46
A S l N G L E E X P L A N A T O R Y VARIABLE
However, if
= p: and oXx> 0, then
+
V { h , } = [S-'(p:)' l]'Oxx V {u,} = d,, (p:)20,, = [1
+
Therefore, the hypothesis that pothesis
pl =
H,: C{h,,v,} = 0 and
+ [s-'(pY + + (p:)26 - '160,".
1]Ouu7
corresponds to the composite hyV { h , } 2 6 - VCv,}.
Under the null hypothesis the three mean squares
1=
1
are mutually independent. This is because ms(,, is the residual mean square computed from the regression of u, on h, and mq2, is the mean square due to regression. Under the null, h, and u, are independent and, hence, mhhis independent of Xu;. Because the regression residual mean square ms,,, is independent of m h h , it follows that ms(2,is independent of m h h . Furthermore, E{ms(l,} = Ebs,,,} = 0"". Also, the sum
(n - l)muu= ( n - 2)ms(,,
+ ms,,,
is independent of the ratio [ms(,,]-'ms,,,. Therefore, the F test of the hypothesis that C{h,, u,} = 0, given by Ff-' = [ m ~ ~ ~ , ] - ' m s ( ~ , ,
is independent of the F test of the hypothesis that
F::
= 6-1m~'mu,.
O),h 2
(1.3.38)
6-'a,,,
given by (1.3.39)
These two tests may be combined in a number of different ways to obtain a test of the composite hypothesis. One procedure is composed of two steps.
Step 1. Test at level a1 the hypothesis that the correlation between h, and u, is zero using the F test Ff-, = [ms,,,]-'msol.
I .3.
47
RATIO OF MEASUREMENT VARIANCES K N O W N
If the hypothesis of zero correlation is rejected, reject the hypothesis that P, = Py. If the hypothesis of zero correlation is accepted, proceed to Step 2. Step 2. Test at level a2 the hypothesis that 2 h-la,, against the alternative that d-la,, > ah,, using the F test Fn n - 1l - d - l m ~ l m u v . If the hypothesis that h-'a,, ,< is accepted, accept the null hypothesis that 8, = fly. Otherwise, reject the hypothesis that p1 = fly. The probability of rejecting the hypothesis P{reject pylp,}
= a1
PI = fly
+ ( I - a,)P{reject d - ' ~ , ,
when it is true is ahhlPl
= fly}.
The probability of rejecting the hypothesis that S-lo,, d ahhis a function of the true gXx.This probability achieves its maximum of a2 for a,, = 0. Therefore, a1
< P{reject pyIpl = py} < a, + (1 - a1)az.
Given a test, it is possible to construct the associated confidence set. Therefore, a confidence set for 0 , with coverage probability greater than or equal to a1 + ( 1 - al)crZ is the set of fll for which and The confidence sets constructed by the methods of this section will not always be a single interval and can be the whole real line. For composite on the real line is consistent with the data when the quahypotheses any /I1 dratic in PI defined by (1.3.39) has no real root. This is roughly equivalent to the data failing to reject the hypothesis that axxis zero. Also, it is possible that the hypothesis of zero covariance will be accepted for all PI. In our treatment we have assumed that oeu= 0. This simplifies the discussion and represents no loss of generality. One can always transform the problem to one with equal-variance uncorrelated errors. For example, if Yae is known, we define Y: = (Yee- reur&lYue)1'2( I: - X,YL1Yue),
x: = y-l/zx uu
1'
Then the error covariance matrix for (YF, X : ) is a multiple of the identity matrix. The transformed model is
Y: = 0:
+ prx, + e,,
(1.3.40)
48
A S I NGLE EX P LA N A TOR Y VARIABLE
where the hypothesized value for
rl(u1/2rue.
fl:
is (Tee- TeuTi,1Tue)1/2T,!/2P~ -
Example 1.3.3. We illustrate the methods of this section using the data of Example 1.3.2. We construct a test of the hypothesis
H,:
fll
= 0.791
against the alternative, H A : /3, 2 0 and fll # 0.791. Under the null hypothesis I: - 0.791X, has variance
[1 + (0.791)’]0.25 = 0.4064
+
and is uncorrelated with h, = 0.791Y, X,.Because we assume we know that fl, 2 0, the null hypothesis that /3, = fly is equivalent to the hypothesis that y1 = 0 in the regression equation where u, obtain
-
- O.791Xr = y o
+
ylh,
+ u,,
NI(O,O.4064).Computing the regression of I; - O.791Xron h,, we
Thus, if fll = 0.791, the probability of getting a 9, this small or smaller is 0.025. Because we know crvv under the null hypothesis, the distribution of the “t statistic” is that of a N(0, 1) random variable. We chose the value 0.791 to yield a value of -1.96 for the test statistic. It can be verified that the hypothesis H,: PI = 0.450 will yield a test statistic of 1.96. Therefore, an exact 95% confidence interval for fll is the interval (0.450,0.791). That is, all values in the interval are accepted by the test. If we use the approximate distribution theory and the estimated variance P{I,} = 0.007418 calculated in Example 1.3.3, we obtain the approximate 95% confidence interval whose end points are 0.6079 f 1.96(0.0861). This interval is (0.439,0.777), which is very close to the exact interval (0.450,0.791). In this example the approximate theory works well because o,, is small relative to mxx. on REFERENCES Adcock (1877, 1878),Amemiya (1980), Anderson (1951b, 1976, 1977, 1984),Birch (1964), Fuller (1980),Kendall(l951, 1952),Kendall and Stuart (1979),Koopmans (1937),Kummell(1879),Pearson (1901),Sprent (1966),Tintner (1945),Villegas (1961). Section 1.3.4. Creasy (1956),Fieller (1954),Gleser and Hwang (1983, Williams (1955). Sections 1.3.1-1.3.3.
I. 3 .
49
RATIO OF MEASUREMENT VARIANCES KNOWN
EXERCISES 20. (Sections 1.3.2,1.3.3)The data below are 10 pairs of observations on hectares of corn for 10 area segments as determined by aerial photography ( y) and by personal interview (Xi).
Y = (97.1,89.8,84.2,88.2,87.0,93.1,99.6,94.7, 83.4,78.5), = (96.3, 87.4,88.6,88.6,88.6,93.5,92.9,99.0,77.7,76.1).
X
Assume that the data satisfy model (l.3.l), (1.3.2)with u,, = ow”. (a) Estimate (Po, BI, uuu)and the covariance matrix of the estimators. (b) Treating x , as fixed, compute the estimates of (x,, y,). (c) Plot the data, Ihe line, and (a,,j,). Plot G, against P,. (d) Compute an estimate of the variance of 4,. 21. (Sections 1.3.2,1.3.3) (a) Verify, for o,, = 0, that the estimators (1.3.27)and (1.3.7)are algebraically equivalent. (b) Verify Equation (1.3.19). 22. (Section 1.3.2) Estimate the parameters of the model studied in Example 1.3.1 under the assumption that 6 = 3. Estimate the covariance matrix of your estimators. 23. (Section 1.3.2) Let yi = Po + Plxrr ( x f ,e,, ulY
-
NI[(P,, 0,OY, diag(u,,,
u,,)l,
gee,
and (U,,Xi) = (y,, x,) + (e,,u,). Let yyy = ueeu;yl and y x x = uuuu,l, where uyy= P ~ u x xThe . ratio y x x = uuuu;xl sometimes is called the noise-to-signal ratio of X . Give the variance expression of Theorem 1.3.1 in terms of yyy and yxx. 24. (Sections 1.3.2,1.3.3)Let PI, 0,. and x^, be defined by (1.3.27),(1.3.28),and (1.3.29). (a) Show that I i?$, = 0. (b) Show that
x;=
(c) Let A,,, d ,,”, and d,, be defined by (1.3.31).Show that
x
where is the smallest root of (1.3.27). 25. (Sections 1.3.2,1.3.3)Estimate the true x and y values for Example 1.3.1,assuming the x values to be a random sample from a normal population. Give the approximate covariance matrix of your estimators. 26. (Section 1.3.4)Construct a 95% confidence interval for the PI of Example 1.3.1 given that the parameter space is the entire real line. 27. (Section 1.3.2)Show the equivalence of the two expressions for s,, defined following (1.3.12). 28. (Sections 1.3.2, 1.3.3)Let
(F, X , ) = (Y,, x,) + (e,, u,), for 1 = I, 2,. . . ,n, where E{(e,, u,)) = (0,O).Let u, = 0 so that y, = Px,,
%)I = o,,
E{(ei, @lY(ec,
diagtI, 0).
Use (1.3.27)to obtain an estimator of /I. Show that this estimator is the ordinary least squares regression coefficient obtained by regressing U, on x,. Use formula (1.3.12)to construct an estimator of the variance of the estimator.
50
A SINGLE EXPLANATORY VARIABLE
so
29. (Section 1.3.3) Treating and as known parameters, compute the estimated variance of 3, - y , for Example 1.3.2. Why is the estimated variance of 3, - y, smaller than the estimated variance of i,- x,? 30. (Section 1.3.3) Let X, = (1, X,,),
1 ( X , X,Y(U,, XJ, (1
M
= n-l
I= 1
F, x,2 - X 2 H K -
m=(n-
where X, = n-
'
X t 2 . Let iand f be the smallest roots of IM - LEI= 0 and Im - yI,I = 0,
respectively, where E diag(1, 0, 1) and I2 is the identity matrix of dimension two. What is the relationship between I and f? 31. (Section 1.3.4) Using the methods of Section 1.3.4, construct an exact 95% confidence interval for /l, for the data of Example 1.3.1. You may assume that it is known that PI > 0. Compare the exact and approximate intervals. 32. (Section 1.3.4) Using the data of Example 1.3.1, construct a test of the hypothesis H,: (/lo, PI) = (0.10,0.95) against the alternative, HA:(/lo, PI) # (0.10,0.95), PI > 0. 33. (Section 1.3.3) Let I, > I, be the two roots of lmzz -
Arezl= 0,
where (K, X,) satisfies model (1.3.2), with Zcc= Tapz,and u2 is unknown. Show that
a,,
=
[(PI, w;l&, i ~ ~ -- i2), ~ ( i ~
where a,, is defined in (1.3.8). 34. (Sections 1.3.2, 1.3.3) Assume that the data of Exercise 1.5 satisfy model (1.3.1)with aA= 0.5. Estimate (Po, P1, uuu).Estimate the covariance matrix of the approximate distribution of (Po, P1). Plot 8, against 2,.
1.4. INSTRUMENTAL VARIABLE ESTIMATION
In Sections 1.1.2, 1.2, and 1.3, we constructed estimators of the parameters of structural models that were identified by knowledge about the error variances. In this section we consider the use of a different type of auxiliary information. Assume that the model of interest specifies
y; = P o + P I X I +
el,
XI
= x,
+
(1.4.1)
UI,
for t = 1,2,. . . , n, where the e, are independent (0, gee)random variables. The vector XI)is observed, where u, is the measurement error in X,.In addition, we have available a third variable, denoted by known to be correlated with x,. For example, we might conduct an agronomic experiment where X, is the observed nitrogen in the leaves of the plant and is the dry weight of the plant. We would expect the true nitrogen in the leaves, x,,
(x,
w,
x
I .4.
51
INSTRUMENTAL VARIABLE ESTIMATION
w,
to be correlated with nitrogen fertilizer, applied to the experimental plot. Furthermore, it is reasonable to assume that both u, and el are independent of K. We give the definition of a variable such as K. Definitiun 1.4.1. Let model (1.4.1) hold and let a variable
tt: satisfy (1.4.2) (1.4.3)
where W = n - ' model (1.4.1).
I:= w.Then
is called an instrumental variable for x, of
It is convenient to have a parametric expression for the fact that x, and are related, and we use the parameters of the population regression of x, on to quantify the relationship. Let
n I 2= E ( X
-
(1.4.5)
nz,w}.
Condition (ii) of Definition 1.4.1 is equivalent to specifying n z z # 0. The II coefficients are defined with double subscripts so that they will be consistent with the notation of higher-order models. Using the n coefficients, we can write (1.4.6) t = 1,2,. . . , n, x, = n,, + n22M: + r,,
w.
By the where r, is the failure of x, to be perfectly linearly related to regression method of construction, r, has zero correlation with q.Equation (1.4.6) and the definition of X, yield
+
XI
= n,,
E{z:=,
+ %zw + a,,,
(1.4.7)
where afz = r, u, and Pi@,) = 0. In model (1.4.1) and Definition 1.4.1, x, can be fixed, random, or a sum of fixed and random components. Likewise, the variable Pi( can be fixed, random, or a sum of random and fixed components. If M: and x, contain fixed components, nZ2and n12could properly be subscripted with n because the expected value will be a function of the fixed components of the n observaare fixed, condition (ii) of Definition 1.4.1 reduces to tions. If both x, and
c (lqn
,-I
1= 1
W)X,
# 0.
(1.4.8)
52
A SINGLE EXPLANATORY VARIABLE
One possible choice for W; is a measurement of x, obtained by an independent method. If x, is fixed and if W; is a measure of x, containing error, then & is a sum of fixed and random components. We defined an instrumental variable in terms of model (1.4.1). If it is known that Po = 0, then the conditions (i) and (ii) become
Therefore, if the mean of x, is not zero, the variable that is identically one becomes a possible instrumental variable for the model with structural line passing through the origin. The idea behind instrumental variable estimation is seen easily when all variables are normally distributed. Therefore, to introduce the estimators, we assume (1.4.9)
where px = nI2 + n2,pw and we assume oXw= 7 1 2 2 ~ w w # 0. The model specifies two zero values in the mean vector and two zero covariances in the covariance matrix of (1.4.9), but the remaining parameters are unknown. In particular, we have not assumed oeuto be known or rsXuto be zero. Under the assumptions, the observed vector X,,F)is normally distributed with mean vector (PI,,Px3 Pw) = (Po + PI7112 + P17122Pw3 7112 + 7122PWI PW) and covariance matrix
(x,
+ 2 P 1 = x e + Gee + Ox, + P 1 o x u + oeu
fl:=xx
PI&
PlDxx
+ =xe + PI=,, + 6,” + 2=xu + =”u
=xx
P17122=ww
P17122=ww 71226ww
r22uww
=ww
1 ’
(1.4.10) The model (1.4.1), (1.4.9) contains 12 independent unknown parameters. One set of parameters is
b w - n1297122, P o , P I , =ww, =xxr
=xe, =xu, =uu, Gee, ueu).
The set of minimal sufficient statistics for a sample of n observations is the mean vector and the sample covariances of (Y,, X I &).There are nine statis-
I .4.
53
INSTRUMENTAL VARIABLE ESTIMATION
tics in this set. Therefore, we cannot hope to estimate all 12 of the parameters of the model. However, we note that the ratio of covariances ~i&JYW
=(
~ 2 2 ~ W W ) - 1 8 1 ~ 2 2= ~ w81. W
P1 and Po by j 1-1 - mXWmYWI
(1.4.11)
It follows that we can estimate
(1.4.12) (1.4.13)
p, = r - j,s,
where (7, X) = n-'
c:=(x,X , ) and
(my,+,,mxw> = (n - I ) - '
C (x- Y,X,- XI(& - TI).' n
,= I
Because the sample moments are consistent estimators of the population moments, the estimators of p1 and Po will be consistent under the model assumptions. The assumption that cxW# 0 is critical. If-cr,, = 0, the denominator of pl defined in (1.4.12) is estimating zero and 8, is not a consistent estimator of PI. The limiting properties of the estimators are given in Theorem 1.4.1. The assumptions of the theorem are less restrictive than the assumption of trivariate normality used in introducing the estimator.
w)'
Theorem 1.4.1. Let model (1.4.1) hold. Let the vectors (x,, e,, u,, be independently and identically distributed with mean (px,0, 0, p,)' and finite are zero and that fourth moments. Assume that the covariances oweand cWu oxW# 0. Also assume E { ( W - P W ) 2 u : ) = owwcvu, E { $ ( & - PW)) = 0,
I"[(.
(jo, j , ) be defined by (1.4.13) and (1.4.12). Then - Po] L + + P Y 2 2 -Pxv22]) >
where Y, = e, - Plu,. Let nl,2[Bo
(1.4.14)
81 - 81
0 '
["..
-PxVzz
v,2
where V2, = cr.&WWa,u.
Proof. The error in the sample moments is 0,(n-"2) because they are, approximately, the mean of n independent, identically distributed random variables with finite variance. Using the method of statistical differentials of Appendix 1.A, we have
54
A SINGLE EXPLANATORY VARIABLE
+
c:= (w
+
+
where we have used I: = Po PIX, u, and myw = Plmxw mwu. The variance of n - l - pw)u, is n-lowwo,,by assumption. The quantity n - 1 / 2 M W converges u in distribution to a normal random variable because the random variables (4- pW)v,are independently and identically distributed with zero mean and finite variance. The limiting distribution of n'/2(j1 - pl) then follows. Writing iio
- P o = v - PAiil = n-
c
- 01)
n
,=1
[Dl
+ O,(n- l )
- P x G k ( 4- Pw)01l
+ O,(n- l )
and using E{u?(W - pW)} = 0, we obtain the result for
j0.
0
Theorem 1.4.1 covers a broad range of possibilities. In particular, it is possible for (el, u,) to be correlated with x,, and e, may be correlated with u,. The critical assumptions owe = ow" = 0 and oXw# 0 permit the estimation of PI in the presence of measurement error that is correlated with the true values. The theorem was given for random (x,, but the result holds for fixed x, and for x, that are the sums of fixed and random components, under mild conditions. Because the sample moments are consistent estimators of the population moments, a consjsteFt estimator of the covariance matrix of the approximate is distribution of (/lo,/I1)
w),
s,,
= ( n - 21-1
I =1
-
F - jl(x,- X)]'.
(1.4.16)
In most practical applications of the method of instrumental variables, one will wish to check the hypothesis that oWx# 0. A test that owx = 0, under = 0, can be constructed by testing the hypothesis the assumption that oUw that the regression coefficient, in the regression of X , on i4( is zero. While the distribution of Theorem 1.4.1 is only approximate, one can construct an exact test of the hypothesis against the alternative, H A : P1 # P?, under the stronger assumptions that (e,,u,) is normally distributed and independent of i4(. The random variable
4 = r; - PYX, = Po + (P1 - P 3 x , - P%, + e,
(1.4.17)
I .4.
55
INSTRUMENTAL VARIABLE ESTIMATION
is then independent of W , if and only if p , = py. It follows that a test of the hypothesis that the population regression coefficient of h, on r/c: is zero is a test of the hypothesis that P1 = fly. Also, the set of values of for which the hypothesis is accepted at the q level constitutes a 1 - q confidence set for pl. The confidence set is the set of py for which ( n - 2)-'tf or, equivalently, the set of
(mhhmWW- miw)-lrn&,, for which
where t, is the q point of Student's t distribution with n - 2 degrees of freedom. The boundaries of the set are the solutions to the quadratic equation Imxxmww - [ 1 - [I
+ t i 2 ( n - 2)Im:wlP:
- 2{mwwmx,
+ t i 2 ( n - 2)]mywmxw}Pl+ mYYmWW- [I + a,
-
2)]miw = 0, (1.4.19)
provided the solutions are real. The values of satisfying (1.4.18)are generally those in the closed interval with end points given by the solutions to (1.4.19). However, the set is sometimes the real line with the open interval deleted and, if the solutions to (1.4.19) are imaginary, the set is the entire real line. The condition (1.4.18) must be satisfied for the confidence set. Given that one has an instrumental variable available, one might ask if the ordinary least squares estimator obtained by regressing k; on X , is unbiased for p,. The ordinary least squares estimator is unbiased when oxy= Pigxx, and under this condition, it follows from (1.4.10)that the population regression of I: on ( X , , W,) is
From (1.4.20)a test of the hypothesis that the ordinary least squares estimator of jl1is unbiased is equivalent to a test of the hypothesis that the coefficient of in the multiple regression of I; on X , and is zero. That is, we test the hypothesis that y = 0 in the equation
w
w
I: = P o + PlX, + YK +
ot*
(1.4.21)
The analysis associated with (1.4.21) also explains why variables whose theoretical coefficients are zero are sometimes significant in an ordinary least squares regression. If theory specifies Y to be a function of x only, x is measured imperfectly by X , and W is correlated with x, then the coefficient for W in the multiple regression of Y on X and W is not zero.
56
A SINGLE EXPLANATORY VARIABLE
Example 1.4.1. To illustrate the use of an instrumental variable we study reported magnitudes of Alaskan earthquakes for the period from 1969 to 1978. The data are from the National Oceanic and Atmospheric Administration’s Hypocenter Data File (Meyers and von Hake, 1976). These data have been studied by Ganse, Amemiya, and Fuller (1983). Three measures of earthquake magnitude are the logarithm of the seismogram amplitude of 20 second surface waves, denoted by Y,, the logarithm of the seismogram amplitude of longitudinal body waves, denoted by XI,and the logarithm of maximum seismogram trace amplitude at short distance, denoted by 4. Table 1.4.1 contains the three reported magnitudes for 62 Alaskan earthquakes. These magnitudes are designed to be measures of earthquake “strength.” Strength is a function of such things as rupture length and stress drop at the fault, both of which increase with strength. A model could be formulated to specify average rupture length and stress drop for a given strength. In addition to variations in fault length and stress drop from averages given by the strength model, there is a measurement error associated with the observations. The measurement error includes errors made in determining the amplitude of ground motion arising from such things as the orientation of a limited number of observation stations to the fault plane of the earthquake. In this example the relationship between the amplitude of surface waves, I;, and the true value of body waves, x,, is of interest. The proposed model is
I; = Po + P l x r + el,
X,= x, + u,,
where x, is the true earthquake strength in terms of body waves and (el, u,) is the vector of measurement errors, it being understood that “measurement error” also includes the failure of the basic model to hold exactly for each The earthquake. We assume (e,, u,) is uncorrelated with the measurement vectors (Y,, XI,W;) are assumed to satisfy the conditions of Theorem 1.4.1. The sample mean vector for the data of Table 1.4.1 is
w.
(Y,X,
W )= (5.0823,5.2145,5.2435)
and the sample covariance matrix is
[::: mwu
myx muw mwx mxx
1
0.6198 0.2673 0.4060 0.2673 0.2121 0.2261
.
mww m x w ] = [ 0.4060 0.2261 0.4051
The test of the hypothesis that g X w = 0 is constructed by regressing XIon and testing the hypothesis that the coefficient of W; is zero. The regression coefficient is 0.5581 and the t statistic is 9.39. Therefore, we are comfortable using as an instrumental variable. By (1.4.12)and (1.4.13),the instrumental
1.4.
57
INSTRUMENTAL VARIABLE ESTIMATION
TABLE 1.4.1. Three measures of strength for 62 Alaskan earthquakes Observation t 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
Surface Wave
K
Body Wave Xl
Trace
Observation
5.5 5.7 6.0 5.3 5.2 4.7 4.2 5.2 5.3 5.1 5.6 4.8 5.4 4.3 4.4 4.8 3.6 4.6 4.5 4.2 4.4 3.6 3.9 4.0 5.5 5.6 5.1 5.5 4.4 7.0 6.6
5.1 5.5 6.0 5.2 5.5 5.0 5.0 5.7 4.9 5.0 5.5 4.6 5.6 5.2 5.1 5.5 4.7 5.0 5.1 4.9 4.7 4.7 4.5 4.8 5.7 5.7 4.7 4.9 4.8 6.2 6.0
5.6 6.0 6.4 5.2 5.7 5.1 5.0 5.5 5.0 5.2 5.8 4.9 5.9 4.7 4.9 4.6 4.3 4.8 4.5 4.6 4.6 4.3 4.6 4.6 4.9 5.5 4.7 4.1 4.9 6.5 6.3
32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62
u:
Surface Wave
Body Wave Xl
Trace
5.4 5.3 5.7 4.8 6.4 4.2 5.8 4.6 4.7 5.8 5.6 4.8 6.2 6.8 6.0 4.6 4.1 4.4 4.0 5.0 5.9 5.7 5.0 5.3 5.7 4.7 4.6 4.2 6.5 4.9 4.4
5.7 5.7 5.9 5.8 5.8 4.9 5.7 4.8 5.0 5.7 5.0 5.1 5.2 5.5 5.8 4.9 4.7 4.9 5.3 5.0 5.6 5.5 4.1 5.5 5.5 4.8 4.8 4.5 6.0 4.8 5.0
5.1 5.7 5.8 5.7 5.7 4.9 5.9 4.3 5.2 5.9 5.4 5.5 5.8 6.2 5.8 4.7 4.5 4.6 5.2 5.6 5.9 5.9 5.3 5.5 5.4 4.7 4.6 4.6 7.1 4.6 5.3
I:
t
w
Source: Meyers and von Hake (1976).
variable estimator of (Po, P1) is
(bo,bl)= (-4.2829,
1.7960).
By (1.4.16) and (1.4.19, s,, = 0.3495 and the estimated covariance matrix is 1.2404
-0.2368
- 0.2368
1
0.0454
'
58
A SINGLE EXPLANATORY VARIABLE
Using P{bl},an approximate 95% confidence interval for p1is the interval (1.370,2.222). To illustrate the computation of an exact test for pi, we test the hypothesis that p1 = 1.3. We assume (el, u,)’to be normally distributed and regress the created variable
h, = Y, - 1.3X, on W;.Student’s t statistic for the hypothesis that the coefficient of is zero is t = 2.72. If we calculate the t values for several h, created with values of /.I1 near 1.37 and for several h, created with values of p1 near 2.22, we find that we obtain a t value of -2.00 for pi = 2.300 and a t value of 2.00 for p1 = 1.418.Note that the two-sided 5% point of Student’s t distribution with 60 degrees of freedom is 2.00. Therefore, an exact 95% confidence interval for p1 is (1.418, 2.300). Alternatively, one can construct the interval using (1.4.19)and (1.4.18). To test the hypothesis that the ordinary least squares regression of I: on X, would provide an unbiased estimator of PI, we compute the multiple regression of Y, on XIand K. The estimated regression line is
f
=
- 1.258 + O.474Xt+ 0.738%, (0.653) (0.195)
(0.141)
where the numbers in parentheses are the estimated standard errors. The test of the hypothesis that the coefficient of is zero is t = 5.23. Therefore, we reject the hypothesis that the ordinary least squares estimator of /.I1 is unbiased. See Exercise 1.42 for further analyses of these data. no
REFERENCES Anderson (1976), Anderson and Rubin (1949, 1950), Basmann (1960), Durbin (1954), Fuller (1977), Halperin (1961),Johnston (1972),Sargan(1958),Sargan and Mikhail (1971).
EXERCISES 35. (Section 1.4) (a) Let model (1.4.1) hold, let variable. Show that
‘f
1=1
j, be = 1=1
defined by (1.4.12),and let
w[x - F - B , ( X , - XI] = 0.
be the instrumental
I ,5.
59
FACTOR ANALYSIS
-
(b) Let model (1.4.1) hold with .Y, fixed and let a fixed instrumental variable W( be available. Let m,, = 0. If (e,, u,) NI(0, I d ) , what is the distribution of PI? 36. (Section 1.4) In Example 1.3.1 the estimated intercept was nonsignificant. Assuming the intercept to be zero, use the method of instrumental variables, with the constant function as the instrumental variable, to estimate PI of the model
Y, = Dlx, + el,
XI = x, + u,.
Estimate the variance of the approximate distribution of the instrumental variable estimator of 81. 37. (Section 1.4) Assume the model
x = Po + Pix, + el,
-
( x 1 3el, 4’
X,= xc+ u,,
NIC(P,, OY, diag(a,,,
%)I.
gee,
Let two independent identically distributed X determinations and a Y determination be made on a sample of n individuals. Consider the estimator
cr=
where XI, and XI2are the two determinations on XI, and X , i = n - ’ XIi.Give the variance of the limiting distribution of the estimator in terms of the parameters of the model. 38. (Section 1.4) Assume that only the value W,,= 6.5 is available for a new earthquake of the type studied in Example 1.4.1. Predict x63. If the vector (Y,,, x,,,w63)= (6.5, 6.1, 6.5) is available, estimate x , ~treating x6, as fixed. Compare the estimated variances of the two procedures under the added assumption that up”= 0. In computing the variances, treat all estimates as if they were parameters. (Hint: Under the assumption u,, = 0, one can estimate useand L.) 39. (Section 1.4) Estimate the parameters of the model of Exercise 1.14 using condition, where condition is the value of i, as an instrumental variable. Estimate the parameters using (i - 2)’ as an instrumental variable. Compare the estimated variance of Dl for the esimates constructed with the two different instrumental variables. 40. (Sections 1.1.2, 1.4) Let model (1.4.1) hold and let u: = x, + b,, where (e,, u,. b,) are independent identically distributed zero mean vectors with diagonal covariance matrix, finite fourth satismoments and uuu= ubb. Let (e,, u,, 6,) be independent of xj for all t and j . Show that fies the conditions for an instrumental variable. For x, satisfying the conditions of Theorem 1.4.1 show that the correlation between XIand I.t; estimates K ~ ~ .
1.5.
FACTOR ANALYSIS
The model of Section 1.4 contained a third variable and the assumption that the errors in the original ( Y , X)pair were uncorrelated with the third variable. These assumptions enabled us to construct estimators of the parameters of the equation involving x, but were not sufficient to permit estimation of all parameters. In this section we consider the model of Section 1.4 under added
60
A SINGLE EXPLANATORY VARIABLE
assumptions. Let
+ P l l X , + ell, x 2 = P o 2 + P 1 2 X r + er2, X I = Po1
x,= x, + UI,
(1.5.1)
where (XI, X2,X,)can be observed. Assume that ex, > 0, Pll# 0, and (Xp
eft, etz, d’ NI[(P~,O,O,O)’, diag(oxx, Q e e l l r ‘V
PI2 # 0,
eee22, cuu)I*
(175.2)
It is the assumption that the covariance matrix is diagonal that will enable us to estimate all parameters of the model. The model is the simplest form of the factor model used heavily in psychology and sociology. In the language of factor analysis, xris the commonfactor or latent factor in the three observed variables. The variables e,,, er2,and u, are sometimes called uniquefuctors. Note that the first subscript of Pi, identifies the x variable (factor) and that the second subscript identifies the Y variable. The two Y variables enter the model in a symmetric manner, and we choose to identify them with the same letter but different subscripts. This emphasizes the special assumptions of the factor model relative to the instrumental variable model. In fact, the three variables XI, X2,and X,enter the model in a completely symmetric way, as we illustrate in Example 1.5.2. Henceforth, we will often use the four subscript notation for population and sample moments. For example, the covariance matrix of e, = (e,,, er2)is denoted by
and the covariance between the first element of the vector Y,= (TI, the second element of e, is denoted by cYel2= C{ XI, e t 2 } . Under assumptions (151) and (1.5.2), P:lcxx
+ Dee11
PlIP12nxx
x2)and
P1lcx.x
(1.5.3) where PYl = P o 1 + P 1 1 P x ,and P Y Z = P o 2 + PltPX. The ratio of the variance associated with the common factor to the total variance of an observed variable is called the communality of the observed variable in factor analysis. Thus, the communality of k;, is ~ 4 = 1 CP:lcxx
+ ceellI-lP?lcxx
= 1 - ci*11geeii*
1.5.
61
FACTOR ANALYSIS
was called the reliability ratio in Section 1.1. The ratio uniqueness of variable E;, . Given a sample of n observations, the matrix of sample moments about the mean is Note that
nyyI - 1 I neel is called the
[
mYYll
mzz=
where Z, = ( X I ,
,I
mYXll
mYY12
myyzi
m ~ ~ 2 mrx21 2
mXYll
mXYlZ
3
mXXll
x2.X , ) , and, for example,
Since there is only one X variable, we can denote the sample variance of by m x x or by mxx11. The model contains nine independent unknown parameters. One set of pa~ , n,,, d.Using (1.53, we can rameters is (Pol, PI1, PO2,P12, n e e l l ,o , , ~ n,,,,, equate the sample moments to their expectations to obtain the estimators;
x
(ix, 801, 8 0 2 ) = (X, TI A
A
(0119 PI21
-
a,,x,yz
= (mx:12mYYI2,
(1.5.4)
-8 1 2 n
m,:I,m,Y2,),
for i = 172,
n*eel1. . = myyii - 8 ; i a x x nu,,
= mxx -
6xx
= miy112mxYllmxY12.
nxx,
The estimator of PI is the instrumental variable estimator for the relationship between and x, using as the instrumental variable. In the same way, the estimator of PI2 is the instrumental variable estimator of the relationship between q2 and x, using TI as the instrumental variable. For the # 0. The condimodel to be identified we must have n x y l l # 0 and cXyl2 tions on the two covariances correspond to the conditions required of instrumental variables. The estimators of the error variances are symmetric in the moments. For example, m~~,,m~,,,m,,,, is an estimator of that portion of the variance of XI that is associated with the common factor. The expression for 6,, is the analogous expression for X I . Alternative expressions for the estimators of the error variances are
x2
x,
GPeii= iiuuii - r(i:i6u,,,
(1.5.5)
i = 1,2,
6uu = ~ ~ 1 1 8 1 2 ~ - 1 ~ u u 1 2 ~ where iiuvij = (n - l ) - ' 6,i6,j, and 6,i = - 8 - (X, - X ) P , , . Because
c:=l
xi
-
E{4lU,2} = P 1 1 P 1 z ~ u u ,
+
E{u:} = neeii P:inuu,
i = 1, 2,
A
62
A S I NGLE EX P LA N A TOR Y VARIABLE
where uti = eti - urPli for i = 1,2, we see that the estimators of the error variances given in (k.5.5) are obtained by replacing (ql, ot2) with (8tl, 42), (Pll, P12) with pI2),and equating the resulting estimated sample moments of uIi to the expectation of the true sample moments. Under the normal distribution assumption, the estimators defined by (1.5.4) are maximum likelihood estimators adjusted for degrees of freedom, provided the solutions are in the parameter space. It is possible for some of the estimated variances of (1.5.4) to be negative. Maximum likelihood estimation for these types of samples is discussed in Section 4.3. The estimators (1.5.4) are continuous differentiable functions of the sample moments and we can use the method of statistical differentials to express the estimators as approximate linear functions of moments. By the arguments of Theorem 1.4.1 we have
(Ill,
811
/$i
- Pli = 0;:jlmYuji + ~ p ( n - ~ ) ,
- poi = vi - B(p*li- p l i )
+ OP(rP),
(1.5.6)
for j # i and i = 1,2. The vector of partial derivatives of 8,, with respect to the vector (mYXllrmyYl2,mxxI mxy12) is ( - m 2 1 2 m x y 1 2 , m;A
2mXY 1 191,
- m;A
2mxy 1 1).
If we evaluate these derivatives at the expected values of the moments we have
4"- 0," = ~ a 1 1 P 1 2 ) - 1 C ~ u u 1 2 ouu121 + beeii
qJ(n-1)9
(1.5.7)
- ceeii = muuii - PliP,lmvvij + Op(n-
for i # j and i = 1,2. The expressions for the estimated error variances can also be obtained directly from (1.5.5). The approximate expressions for the estimators enable us to construct the covariance matrix of the limiting normal distribution of the estimators. Construction of the estimated covariance matrix is illustrated in Example 1.5.1. .The approximate expressions for the estimators are informative for several reasons. First, in the limit, the estimators of the error variances, crceii, are functions only of the moments of u1 and u2. That is, the nature of the distribution of the true x vaiues has no influence on the distribution of the estimators of error variances. Second, given the independence otx, and the (et1,et2, uI), the covariance matrix of the limiting distribution of (poi,Sli)can be given an explicit expression that depends only on the second moments of the original distribution. Finally, only the distribution of 6,, depends on the higher moments of x,. This means that expressions for the limiting distributions of the estimated coefficients and for the estimated error variances can be obtained under quite mild assumptions on the x,.
1.5. FACTOR
63
ANALYSIS
An estimator of the uniqueness of variable 1 - Rii
TI is
-1
(1.5.8)
= mrriiceeii,
where R,, = m;;, ,(&,8J is the estimated reliability ratio (communality) for An estimator of the variance of the approximate distribution of R1 is
xl.
c{Rll}= m ; ~ l l ~ { r 3 e e l , }+ 2(n
-
l)-’(l - R1,)2(2R,1 - I),
(1.5.9)
where 3{6,e11 } is constructed in Example 1.5.1. It is sometimes of interest to estimate the true x, value that generated the vector X,). We can write model (1.5.1)as
(xIyx2,
[““,PO’] x 2 -Po2
=
I”:’]
4
PI2
+
[%] el2
(1.5.10)
.
Therefore, if (pol,/Io2, PI,, p12)and the covariance matrix of (e,,, er2, u,) are known, the generalized least squares estimator of x,, treating x, as fixed, is
See Exercise 1.43 and Section 1.2.3. The estimated value for x, obtained by replacing the parameters with estimators is
An alternative form for i,is i r
where mu”and
(fit,,
= Xt - ( f i t 1 7
A
-1
(1.5.13)
A
fii2)muv muu,
Gr2) are defined in (1.5.5) and h ” U
= (-bll&u9
-ij128uu)l.
See Exercise 1.43 and Chapter 4. An estimator of the variance of 2, - x, is
qXI,
-
x,) = (j:lLi&;
1
+
j:,s,:2
+
&;I)-
(1.5.14)
where the effect of estimating the parameters is ignored. The predictor of x,, treating x, as random, is given in Exercise 1.46 and is discussed in Section 4.3.
Example 1.5.1. In 1978 the U.S. Department of Agriculture conducted an experiment in which the area under specific crops was determined by three different methods. The three methods were digitized aerial photography, satellite imagery, and personal interview with the farm operator. We
64
A SINGLE EXPLANATORY VARIABLE
denote the hectares of corn determined for an area segment by the three and X,,respectively. An area segment is an area of the methods by &, earth's surface of approximately 250 hectares. Observations for a sample of 37 area segments in north-central Iowa are given in Table 1.5.1. We begin by assuming the data satisfy model (lS.l), (1.5.2). The sample mean vector is
x2,
(Fl, F2,X)= (123.28, 133.83, 120.32) and the sample covariance matrix for
m,,
=
[
(xl, x2,X I )is
1
1196.61 908.43 1108.74 908.43 1002.07 849.87 . 1108.74 849.87 1058.62
By the formulas (1.5.4) we have
[ ~ o l , ~ l l , ~ 0 2=, ~[-5.340, ,2] 1.069 35.246,0.819], (4.646,0.038, 11.738, 0.094) d,,, 8,,] = [11.47,305.74, 21.36 1037.261. [deel 8ee22, (23.10, 73.31, 20.70 250.27) The numbers in parentheses below the estimates are the estimated standard errors which will be obtained below. The estimated covariances of (utl, ut2) are
1 2 , ~ ~=~(35.8776, ) 18.7056, 320.0740), (fiuul fiuv21, where muu= (n - l)-' X).We note that
c;=l(Ctl, Cf2)'(Ctlr C,J
and Cjj =
xi - Fl - bli(X, -
8,, = 21.3557 = ( / ? ~ l / ? 1 2 ) - 1 ~ v u l ~ .
The (fiIl, Ct2) values are given in columns five and six of Table 1.5.1. The estimated values for myv21 and mYvl2are zero by the construction of the estimators. Also see Exercise 1.35. Under the normality assumption, the estimated covariance matrix of (myv21, mYvl2I muu119 ' n v v 1 2 7 muu22, mxx) is 998.663 569.442 569.442 10638.993 0 11.921 304.701 101.989 317.726 0 - 1077.782 - 1077.740
0 11.921 71.511 37.284 19.439 29.077
304.701 3 17.726 - 1077.782 101.989 0 - 1077.740 37.284 19.439 29.077 328.705 332.621 22.242 332.621 5691.520 17.014 22.242 17.014 62259.794
TABLE 1.5.1. Hectares of corn determined by three methods
Photograph Segment
XI
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
167.14 159.04 161.06 163.49 97.12 123.02 11 1.29 132.33 116.95 89.84 84.17 88.22 161.87 106.03 87.01 159.85 209.63 122.62 93.08 120.19 I 15.74 125.45 99.96 99.55 163.09 60.30 101.98 138.40 94.70 129.50 132.74 133.55 83.37 78.51 205.98 110.07 134.36
Satellite
L
168.30 162.45 129.60 166.05 94.05 140.85 110.70 158.85 121.95 106.65 99.00 153.00 159.75 1 17.45 84.15 157.50 194.40 165.15 99.45 166.05 154.35 153.90 132.30 92.70 142.20 65.25 113.40 131.85 92.70 135.90 159.75 132.75 100.35 113.85 206.55 130.50 138.15
Interview
xt
41
165.76 162.08 152.04 161.75 96.32 114.12 100.60 127.88 116.90 87.41 88.59 88.59 165.35 r04.00 88.63 153.70 185.35 116.43 93.48 121.00 109.91 122.66 104.21 92.88 149.94 64.75 99.96 140.43 98.95 131.04 127.07 133.55 77.70 76.08 206.39 108.33 118.17
- 4.702 - 8.869
3.883
- 4.066 - 0.497
6.376 9.098 0.978 - 2.665 1.747 -5.185 - 1.135 -9.534 0.204 -2.387 0.899 16.848 3.507 - 1.502 - 3.808 3.596 - 0.322 - 6.09 1 5.610 8.158 - 3.572 0.472 - 6.367 - 5.729 - 5.230 2.254 - 3.863 5.656 2.527 - 9.292 -0.385 13.387
4
2
- 2.760
- 5.594
-30.218 - 1.724 - 20.1 15 12.101 -6.972 18.827 -9.077 -0.215 -8.831 45.169 - 10.974 -3.007 -23.714 - 3.678 7.290 34.508 - 12.388 31.664 29.050 18.154 11.671 - 18.646 - 15.898 -23.048 - 3.747 - 18.456 -23.620 -6.712 20.391 - 11.919 1.441 16.269 2.201 6.495 6.083
3, 162.76 156.42 153.93 159.17 95.65 118.34 06.18 28.83 15.07 88.50 85.18 88.69 159.18 104.07 86.71 154.20 196.05 119.25 92.32 119.18 112.69 122.79 100.60 96.06 154.77 62.10 100.19 136.11 94.93 127.64 128.85 130.91 81.27 77.96 200.60 108.21 126.67
65
66
A SINGLE EXPLANATORY VARIABLE
This matrix is calculated using the expressions of Appendix l.B and the estimated moments, including mu". It follows from expressions (1.5.6) and (1.5.7) thtt the e$imated covariance matrix of the approximate distribution of (lOPll, ~ O P I Z Z, e e i i , Zee22, a^,, O.lO~,,) is -
0.138 0.060 -4.677 0.990 4.094 1.050 - 1.093 -0.705 0.060 0.865 -4.677 - 1.093 533.680 - 114.372 -447.073 92.102 0.990 -0.705 - 114.372 5374.730 428.550 4.094 1.050 -447.073 92.102 44.713 -40.315 - 1.678 - 1.077 -9.214
- 1.678-1.077 44.713 -9.214 -40.3 15 626.376
For example, the variance of the approximate distribution of Beell is estimated with where P{m,,,,} = 71.511 and P{mUul2}= 3?8.705. We omit the portion of the covariance matrix associated with &,,/Io2) because it is a simple function of the given covaTiance matrix and of the covariance matrix of (El, E J . See the equation for (poi - poi)in (1.5.6). The estimated communality and uniqueness for the three variables are given in Table 1.5.2. For example, the estimated uniqueness of variable is
x1
1 - G l l = m;$llBeell = 0.009586. By (1.5.9), the estimated variance of the approximate distribution of rZI1 is
P{tl1}= m&lP{Beell} = 0.0003777.
+ 2(n - 1)-'(1
-R
, ~ ) ~ ( ~ G-, 1) ,
The estimated values for x,, treating x, as fixed, are 2, = 2.7124
+ o.6270K1 + o.0180Y2 + 0.3151X,,
TABLE 1.5.2. Estimated communality and uniqueness Variable x1
x2
x,
Communality
Uniqueness
Standard Error
0.99041 0.69489 0.97983
0.00959 0.30511 0.02017
0.01944 0.08584 0.02008
I .5.
67
FACTOR A N A L Y S I S
where the coefficients are defined in (13.12). The estimated variance of the error in these estimators is given in (1.5.14) and is
P{it- x,}
=
6.7283.
The estimated true values are given in the last column of Table 1.5.1. Figure 1.5.1 contains a plot of i?,, against 2,. This plot is analogous to the plot of ordinary regression residuals against Y-hat. The variability of the seems to increase with ft.To test the hypothesis of homogeneous variance, we regress i?fl on 2,. The t statistic for itin the regression containing an intercept is t = 3.64. Therefore, we are led to reject the original model that postulated identically distributed errors.
I12.5
5
,
O
~
10.0
7.5
0,I
. .
5.0
..
2.5
0
.
*
-
-2.5
- 5.0 -7.5 -10.0 50
70
90
110
130
150
170
190
ESTIMATED TRUE VALUE FIGURE 1.5.1. Plot of residuals against estimated true x values.
21
-
68
A SINGLE EXPLANATORY VARIABLE
The estimated variances of the 8-coefficients do not require normality (see Theorem 1.4.1), but the estimated variances of the error variances (heell, hee22,6"") rest heavily on the normality assumption. The assumption of homoskedastic errors is required for all variance calculations. See Exercise 3.7 for alternative variance calculations for the instrumental variable estimator, Exercise 4.20 for alternative variance calculations for the vector of estimates of this example, and Exercise 1.44 for an alternative model. 0 0 In the form (1.5.1) the systematic part of the variable X, is called the factor. Assuming that Pll # 0 and P 1 2 # 0, we could as easily have designated y,, or yt2 as the factor. In factor analysis it is common to code the common factor by fixing its variance at one and its mean at zero. That is, the model is often written Zti
=noi
+ n l i f , + Eli,
(1.5.15)
NUO, I),
f,
for i = 1, 2, 3, where Z,= (Zll, ZI2,Z I 3 )= (XI, X2,X,), and e, = (ell, er2, t4J. The coefficients nil,n12,and n l 3 are calledfuctor loadings. Under the model ( 1.5.15), Zi
N
+ &,)
NI(nb,
(1.5.1 6)
where C,, = diag(ueell,c ~a ,,,,) ~ and~ nj = ~ (njl, ~ nj2, , nj3)for j = 0, 1. It follows that the parameters of model (1515) can be estimated by g e&ii . . = mZ Z i i fi:i fioi
-I
- mZZjkmZZjimZZki,
(1.5.17)
= mi.ZljkmZZjimZZki, -
= zi,
for i = 1,2, 3 and i # j # k. The estimators of the error variances are identical to those of (1.5.4). There is a sign ambiguity in the definition of xi, associated with model (1.5.15). If we choose the sign of n 1 3to be positive, then the sign of n l l is the sign of mzZl3 and the sign of xI2is the sign of mzz23. With this choice of standardization we see that ( 2 1 1 , A12, f i 1 3 )
-1/2
= Ox,
-
A
(fill, 8129
l),
(1.5.18)
and a,, are defined in (1.5.4). Because the estimators fiij are where jll,jl2, continuous differentiable functions of the moments, their sampling properties follow by the method of statistical differentials. The covariance matrix of A i 2 , A:3) is estimated by (1.5.19)
I .5,
FACTOR
69
ANALYSIS
where
Aij = m~~jljkmZZki for i # j # k, and ?{(rnZZ12, m Z z 1 3 , mZZ23)’} is the estimated covariance matrix of the vector of three moments. The covariance matrix of A , is constructed by using E { A l i , ;tlj}
&
(4;t1iA1j)-1E{R:i,A:j}.
Sometimes the observed variables are standardized and the factor parameters estimated from the sample correlation matrix. In this case the R f j are the estimated communalities of the original variables and the estimated “error variance” for a standardized variable is the estimated uniqueness of that variable. If we know the parameters of the model, the least squares estimator off, constructed on the basis of (1.5.15) is
I.=
xllgeiil(xl
+ x 1 2 ° i i 2 ( x 2 - ‘ 0 2 ) + x 1 3 g i 1 ( x t - n03) C l + n:zG?:z + .:3G1
- nOI) &
(1.5.20)
The estimator is the generalized least squares estimator treating f, as fixed. In practice, parameters on the right side of (1.5.20)can be replaced by sample estimators. The estimator obtained in this manner is a linear function of the estimator defined in (1.5.12).
1
Example 1.52. Using the data of Example 1.5.1, the estimated coefficients of the standardized factor are
(A,,,
$12, R 1 3 ) = (34.426,26.387,32.207)-
where we have chosen A 1 3 to be positive. Using (1.5.19),we find that the estimated covariance matrix of the approximate distribution of (R, 1, A,,, A,3), calculated under the assumption of trivariate normality, is
?(al) =
[
i
16.8900 12.5856 15.2983 12.5856 18.2291 11.8308 , 15.2983 1 1.8308 15.0968
where the matrix G,, is constructed using
(Al12, iI3, i2J = (1.3046, 1.0689,0.8193)
70
A SINGLE EXPLANATORY VARIABLE
and the covariance matrix of (rnZzlz, m Z Z 1 3 , ~ z z z 3 is ) constructed using the results of Appendix l.B and the sample moments given in Example 1.5.1. Q O While the parameterization (1.5.15) is more common in factor analysis than the parameterization (1.5.1), we prefer the parameterization (1.5.1). The p parameters of (1.5.1) are analogous to regression coefficients and the R parameters of (1.5.15) are analogous to correlation coefficients. Generally, regression coefficients have proved to be much more portable from one situation to another. This is because regression coefficients are largely independent of the parameters of the distribution of x. The mean of a regression coefficient does not depend on the distribution of x and the variance of the coefficient depends only on the second moments of the x distribution. The same is true of the jl,of the factor model. The distribution of a correlation coefficient is much more dependent on the form of the x distribution. For the factor model the mean of Al is a function of the variance of x and the distribution of 12, depends on the distribution of x. Any screening of the data that changes the variance of x changes the nl.If two different samples are to be compared, comparison of the R’S typically requires the distribution of x to be the same in the two populations. On the other hand, a specification of equality for the p’s of two samples can be made with the distribution of x in the two populations left unspecified. In the models discussed to this point, the number of sample covariances has never exceeded the number of parameters to be estimated. In applications of factor analysis, it is often the case that the number of covariances is much larger than the number of parameters. Consider a model with p > 3 observed variables that are a function of a single latent factor
+
i = 1 , Z . . . ,p . Z,, = nOi+ nljf, eti, There will be p means, $ p ( p + 1) sample covariances, and 3p parameters to be estimated from them. Inspection of a covariance matrix such as (1.5.3) will lead to several possible estimators for the parameters. In such a situation the parameters are said to be oueridentijied. The application of the method of maximum likelihood will lead to unique estimators of the parameters that are not equivalent to instrumental variable estimators. Estimation for models of higher dimension is discussed in Section 4.3.
REFERENCES
Barnett (1969), Harman (1976), Joreskog (1978), Lawley (1940, 1941, 1943), Lawley and Maxwell (1971).
I .5.
71
F A C T O R ANALYSIS
EXERCISES 41. (Section 1.5) As an example of a simple factor analysis model, consider some data studied by Grubbs (1948). The data given in the table are the time of burning for fuses on projectiles as recorded by three different observers. Assume the data satisfy model (1.5.1). The choice of observer to be identified with X I is arbitrary because we have no reason to believe that one observer is superior to the others. (a) Fit model (1.5.1) to these data. Using the covariance matrix of the approximate distribution, test the hypothesis that the variance of the measurement error is the same for the three observers. (b) Write the solution in the parametric form
&=s,,+n,,f;+e,,,
xi
i=l,2,3,
where u,, = I , is the observation on the tth fuse made by the ith observer, and the measurement error. Give the covariance matrix of (sol, so,,A,,, A , , , A,2, A13).
el, is
Data for Exercise 41: Observed fuse burning times Index
Observer B
Observer A
I
XI
YZ
I 2 3 4 5 6 7 8 9
10.07 9.90 9.85 9.71 9.65 9.83 9.75 9.56 9.68 9.89 9.61 10.23 9.83 9.58 9.60 9.73 10.32 9.86 9.64 9.49 9.56 9.53 9.89 9.52 9.52 9.43 9.67 9.76 9.84
10.10 9.98 9.89 9.79 9.67 9.89 9.82 9.59 9.76 9.93 9.62 10.24 9.84 9.62 9.60 9.74 10.32 9.86 9.65 9.50 9.56 9.54 9.89 9.53 9.52 9.44 9.67 9.77 9.86
10
II 12 13 14 15 16 17 18 19 20 21 22 23 24
25
26 27 28 29 Source: Grubbs (1948).
Observer C
x,
10.07 9.90 9.86 9.70 9.65 9.83 9.79 9.59 9.72 9.92 9.64 10.24 9.86 9.63 9.65 9.74 10.34 9.86 9.65 9.50 9.55 9.54 9.88 9.51 9.53 9.45 9.67 9.78 9.86
72
A SINGLE EXPLANATORY VARIABLE
42. (Section 1.5) Assume that the data of Table 1.4.1 of Example 1.4.1 satisfy the factor model (1.5.1) with surface wave equal to yI,, trace amplitude equal to and body wave equal to XI.
xz,
(a) Estimate the parameters of the model. (b) Calculate fill, Br2, and 2,. Plot &against 2,. Plot fiIz against 2,. (c) Estimate the parameters of the model with observation number 54 deleted. Plot fill and &2 against 2, for the reduced data set. (d) Estimate the covariance matrix of the approximate distribution of the estimates obtained in part (c). 43. (Section 1.5) (a) Using the expressions given in (1.5.6) and (1.5.7), derive expression (1.5.9). (b) Using the method associated with (1.2.16), derive expression (1.5.11) for the estimated true value x,, treating x, as fixed. (c) Show that f, of(1.5.11)can also be written XI
= X I - ii,,
where oIi = eli - u,Pli and 6, = (o,,, u l z ~ ~ ~ l ~ ~..zll'. uuull,
44. (Section 1.5) Fit the one-factor model to the logarithms of the data of Example 1.5.1.
Compute the covariance matrix of your estimates. Plot O,, and OI2 against the estimated true values, all for the model in logarithms. Regress ir?,,l and (&(on 2,. What do you conclude? 45. (Section 1.5) Compute the ordinary least squares regression of X I on Crl and fill for the data of Table 1.5.1. Compare the calculated residuals for this regression with 2, of Table 1.5.1. 46. (Section 1.5) Assume that the parameters of the factor model (1.5.1) are known. Let an observation (TI, yIz, X I )be given. Treating x, as random, obtain the best predictor of x,. Show that this predictor is x', = A + [ ~ { f , ~-lPJ, -'~xx(f, where i , is defined in (1.5.1 i), y:, = X;vlZor, and
v4 =
W
l
- P X l Z }= uxx
- Y."Z"..
Show that V{f, - x,} = u, - [V{xl}]-lu~x.,, 47. (Section 1.5) Find a and b such that 1;= (I bf,, where is defined in (1.5.20) and f, is defined in (1.5.11). 48. (Section 1.5) Prove that, at most, one of the variance estimates defined by Equations (1.5.4) can be negative.
+
1.6. OTHER METHODS AND MODELS In the first five sections of this chapter we have discussed methods that have found considerable use in application. In this section we mention some other procedures that appear in the literature associated with errors in variables. We also describe two special situations in which ordinary least squares remains an appropriate procedure in the presence of measurement error.
1.6.1. Distributional Knowledge In Sections 1.1-1.5 we often assumed both the errors and the true values to be normally distributed. Geary (1942, 1943) has demonstrated that the parameters of a model containing normal measurement error can be esti-
I .6.
73
OTHER METHODS AND MODELS
mated if it is known that the distribution of x, is not normal. Let
I; = Po + Pix, + e,, 6- N W ,&J,
X,= x, + u,,
(1.6.1)
where E, = (e,, u,) is independent of x j for all t andj. Also, assume, for example, that it is known that E b , - P A 3 } z 0.
(1.6.2)
This knowledge can be interpreted as the availability of an instrumental then variable. That is, if we set l4( = (X, -
x)2,
E { ( X , - P d w - P d I = (1 - n- 1)2E{(X, - P J 3 } f 0,
and, by the properties of the normal distribution, E { Wu,}= E { we,} = 0.
Therefore, the methods of Section 1.4 can be used to construct the estimator Pl
-1
= mxWmYW,
(1.6.3)
where
In this development, the assumption that the conditional mean of
I; given
x, is linear is used in a very critical way. It is a part of the identifying
information. The distributional theory of Section 1.4 can be extended to cover estimator (1.6.3), but the estimated variance expression (1.4.15) is not appropriate for estimator (1.6.3) because the U: of this section does not satisfy assumptions (1.4.14). See Exercise 3.7 for a variance estimator. Under model (1.6.1), (1.6.2), ( y - Y ) 2and (X, - z)(y- 7) are also possible instrumental variables. See Section 2.4 for the use of multiple instrumental variables. Reiersol (1950) showed that P , of (1.6.1) is identified for any model with nonnormal x. Estimation methods for P1 using only the fact that the distribution of x is not normal are possible, but the procedures are complex and have been little used in practice. See Spiegelman (1979) and Bickel and Ritov (1985) for discussions of such estimation. 1.6.2.
The Method of Grouping
Wald (1940) suggested an estimator of P1 for model (1.6.1) constructed by dividing the observations into two groups. Let the observations ( Yl, XI), ( Y2,X2),. . . , (K, X,) constitute group one and the remaining observations
’
74
A SINGLE EX P LA N A TOR Y VARIABLE
group two. Then Wald’s estimator is (1.6.4)
Wald showed that the estimator is consistent for PI of model (1.6.1) if the grouping is independent of the errors and if lim
n+ m
infix,,, -
> 0,
(1.6.5)
where Z(o is the mean of the true x values for the ith group. If the grouping is independent of the errors, this implies that there exists a variable,
1 if element is assigned to group one otherwise,
w={0
(1.6.6)
that is independent of the errors. Therefore, with grouping independent of the errors, Wald’s method reduces to the method of instrumental variables. Wald’s method has often been interpreted incorrectly. For example, it has been suggested that the method can be applied by randomly assigning elements to two equal-sized groups. The random method of assigning elements is independent of the errors of measurement, but (1.6.5) is not satisfied because both sample means are converging to the true mean. It has also been suggested that the groups be formed by splitting the sample on the basis of the size of the observed X values. Splitting the sample on the basis of X will satisfy (1.6.5) but the group an element falls into will generally be a function of u,. Therefore, the first condition of Wald’s theorem will generally be violated when the observed X values are used to form the groups. See Exercises 1.50 and 2.19. A method related to the method of grouping involves the use of ranks. It may be that the x values are so spaced, and the distribution of the errors is such that XI < X, implies x, < x,. Given this situation, the rank of the observed X values is independent of the measurement error and highly correlated with x,. Therefore, in the presence of these strong assumptions, the rank of the X can be used as an instrumental variable. Dorff and Gurland (1961a, b) and Ware (1972) have investigated this model. 1.6.3. Measurement Error and Prediction
One often hears the statement, “If the objective is prediction, it is not necessary to adjust for measurement error.” As with all broad statements, this statement requires a considerable number of conditions to be correct. Let
k; = Po + Plx, (x,, 8,Y
-
+ e,,
+
X , = x, u,, NI[(,u,, O)‘, block diag(a,,, U ] ,
(1.6.7)
1.6.
75
OTHER METHODS AND MODELS
where (e,, u,) = e,. Let a sample of n vectors (k;, X , ) be available and let X , , be observed. Because X,)' is distributed as a bivariate normal random variable, the best linear unbiased predictor of Y,,, conditional on (Xl, X 2 , .. . , Xn+Ais
(x,
I
Y,+l
= 'yoc + YIlGXn+l,
(1.6.8)
where r n yle =
1 ( X , - ""J
1-1
1 = 1
c n
I=
1
( X ,- X ) ( K -
Y),
I:=
yIoG = 7 - yllcX, and X = n - l X,. The assumption of zero covariances between (e,, u,) and x, is not required for the optimality of the predictor. Thus, if one chooses a random element from the same distribution, the simple regression of observed k; on observed XI gives the optimal prediction of Y,+ if(& X , ) is distributed as a bivariate normal random vector. If the joint distribution is not normal, the predictor is best in the class of linear predictors. The introduction of normal measurement error will destroy the linearity of the relationship between X and Y when x is not normally distributed. If
,
Y; = B o
+ B l X , + e,,
-
PI
f
0,
and XI = x, + u,, where (e,, u,)' NI(0, diag{c,,, G,,"}) and oUu> 0, then the expected value of Y given X is a linear function of X if and only if x is normally distributed. See Lindley (1947) and Kendall and Stuart (1979, p.
438).
It is worth emphasizing that the use of least squares for prediction requires the assumption that X , , be a random selection from the same distribution that generated the X , of the estimation sample. For example, it would be improper to use the simple regression of Yon X of Example 1.2.1 to predict yield for a field in which the nitrogen was determined from twice as many locations within the field as the number used to obtain the data of Table 1.2.1.
Example 1.6.1. Assume that the fields in Example 1.2.1 were selected at random from the fields on Marshall soil. Assume further that a 12th field is selected at random from the population of fields. Assume that twice as many soil samples are selected in the field and twice as many laboratory determinations are made. Under these assumptions, the variance of the measurement error in X12 for the 12th field is inuu= 28.5. Given that the observed soil nitrogen is 90, we wish to predict yield. We treat the estimates of Example
76
A SINGLE EXPLANATORY VARIABLE
1.2.1 as if they were known parameters. Because the variance of the measurement error in XI,is &,, the covariance matrix for (Y12, XI2)is Pl0.u [ P 12 ; o : x o e e
oxx
] [
+ +aUU
=
1
87.6727 104.8818 104.8818 276.3545
It follows that, conditioning on the observed value of X12,the predicted yield for the new randomly selected field is ?12(X12)
= 97.4546
+ 0.3795(90- 70.6364) = 104.80.
Under the assumption of known covariance structure, the expected value of the error in the predictions for fields with an observed nitrogen of 90 is zero. That is, the expected value of the prediction error conditional on X I 2is zero. Note that the average of the predictions is for a fixed observed value, not for a fixed true value. The unobserved true value is a random variable. The average squared prediction error for those samples with the observed X12equal to 90 is E([Y12 - F(X12)]21X12= 90) = oyy- 0.37950,~= 47.87.
Because of the bivariate normality, the variance of the prediction error is the same for all observed values of X. Let the true nitrogen value for the field be x I 2 and consider all possible predictions of yield that can be constructed for the field under consideration (or for all fields with true nitrogen equal to xlZ).The conditionally unbiased predictor of yield, holding xI2 fixed, is FI2(xl2fixed) = 97.4546
+ 0.4232(90- 70.6364)= 105.65.
Thus, if the true value of nitrogen is held fixed, the conditionally unbiased predictor of yield is obtained by using the structural equation as the prediction equation. The average squared prediction error for those fields with x = x I 2 and the same number of soil determinations as used to obtain XI,is
E{[Y12 -
f((xI2 fixed)I2lx = xI2}= oee+ (0.4232)*(~0,,) = 53.50.
The variance of the prediction error is a function of oeeand CT,, only. The variance does not depend on the true, but unknown, x value. Given that the variance of the prediction error for the prediction conditional on X is less than the variance conditional on x, why would one choose a prediction conditional on x? The model that permits one to use the prediction conditional on X assumes that x (the field) is selected at random from the population of fields. If this is not a reasonable assumption, the prediction constructed for fixed x should be used. As an example, assume that a farmer asks that his field be tested and corn yield be predicted for that field.
1.6. OTHER METHODS AND MODELS
77
Is it reasonable to treat the field as randomly selected? Assume further that the farmer is known to sell a soil additive product “Magic Grow Sand” that is 100% silica. Is it reasonable to treat the field as randomly selected? To investigate the random and fixed specification further, consider the problem of making statements about the true nitrogen x, on the basis of the observed nitrogen X,. If the sampled field is randomly chosen from the population of fields, it is reasonable to assume that the observed nitrogen and the true nitrogen are distributed as a bivariate normal. If we treat the estimates of Example 1.2.1 as the parameters, we have
I>
304.8545 247.8545 247.8545 247.8545
’
Therefore, the best predictor of true nitrogen conditioning on the observed nitrogen value is ?,(XI) = 70.6364 + O.81303(Xr- 70.6364).
Under bivariate normality, this predictor is conditionally unbiased, E{Z,(X,) - x , I X , = t} = o for all
5.
However, when we look at the conditional distribution of Z,(X,) holding
x fixed, we have
E{ x’,(X,)- X, IX,
= W ) = - 0.18697(0 - 70.6364).
Also,
E{[&(X,) - x , ] ~ ) x ,= W}= 37.6780
+ [0.18697(0 - 70.6364)]’.
It follows that the mean square error of x’,(X,)as an estimator for x, is greater than the mean square error of X, for Ix, - 70.63641 > 23.5101. Because the distribution of x, has a standard deviation of 15.7434, the mean square error of Z,(X,) as an estimator of x, is less than the mean square error of XIfor those fields whose true x is less than 1.493 standard deviations from the mean. See Exercise 1.5 1. no
In Example 1.6.1 we illustrated the construction of a predictor in a situation where the nature of the population of observables was changed by a change in the measuring procedure. In practice one may also find that the distribution of x in the prediction population differs from that of the estimation sample.
Example 1.6.2. To illustrate prediction for a second population, consider the earthquake data introduced in Example 1.4.1. In the National Oceanic and Atmospheric Administration’s Hypocenter Data File, values for surface
78
A SINGLE EXPLANATORY VARIABLE
waves (Y) are generally available for large earthquakes but not for small ones, while values for longitudinal body waves (X) are generally available for small earthquakes but not for large ones. The group of earthquakes for which the XI, y ) is available is a relatively small subset of the total. In the triplet data file there are 5078 Alaskan earthquakes with only X values reported. It is of interest to construct Y values for these earthquakes. The sample mean and variance for the group of earthquakes with only X values is
(x,
(X(2),a*,,(,))
= (4.5347, 0.2224),
while the corresponding vector for the data of Example 1.4.1 is (x(l), dxx(l))= (5.2145,0.2121).
Clearly, the mean of X differs in the two populations from which the two samples were chosen. Let us assume that the model Xi
= Po
+ BlXri + eti,
Xti = xri
+ Uri,
(1.6.9)
Wi = Y O + YlXti + Cri, > diag(axx(i),see, c u u , ~ c c ) ] , (xrij &ti)' N l [ ( ~ x ( i ) 01'9 holds for i = 1,2, where eri= (eri,uri, cri),i = 1 denotes the population from which the triplets were selected, and i = 2 denotes the population for which only X values are available. While the observations suggest that the X distribution is not exactly normal, it seems that normality remains a good working approximation. Model (1.6.9) is the factor model in one factor and contains stronger assumptions about the error covariance matrix than the instrumental variable model used in Example 1.4.1. Under the factor model, we can estimate the covariance matrix of (er,u,, cl). By Equations (1.5.6), the estimates are
(a*,,
8,,,
= (0.1398,0.0632,0.0617).
The estimates of the variances of x in the two populations are dxx(l)= dxx(1)- a*,, = 0.1489, dXx(2)= dxx(2) - 8," = 0.1592.
From Example 1.4.1, the estimated structural equation is =
-4.2829
+ 1.7960~~.
It follows that estimates of the remaining parameters of the (Y, X) distribution for population two are fiy(2)
= -4.2829
+ 1.796ox(2)= 3.8614,
dYy(,,= (1.7960)2(0.1592)+ 0.1398 = 0.6533, a*,,(,) = (1.7960)(0.1592) = 0.2859.
I .6.
OTHER METHODS AND MODELS
79
Let us assume that we are asked to predict the Y value for an earthquake with an X value of 3.9 selected from population two. On the basis of our model E { J;2) IX(2)) = PY(2) + 4 ( 2 ) 0 X Y ( 2 ) ( X ( 2 )- P X ( 2 ) )
( 1.6.10)
and
(1.6.11) Our estimators of these quantities are
B{ v2)1 X ( 2 ) }= 3.8614 + 1.2855(x,,, - 4.5347), V { y2,1 X ( 2 ) }= 0.2858.
Therefore, the predictor of Y for an observation in the second population with X = 3.9 is
P = 3.8614 + 1.2855(-0.6347) = 3.0455,
and the estimated standard deviation of the prediction error is 0.5346. One could modify the estimated prediction standard deviation by recognizing the contribution to the error of estimating the parameters. See Ganse, Amemiya, and Fuller (1983). Figure 1.6.1 illustrates the nature of prediction for the second population. The two clusters of points represent the two populations. The solid line represents the structural line common to the two populations. The two dashed lines are the least squares lines for the two populations. For our example the two clusters of points would overlap, but we have separated them in the figure for illustrative purposes. The point X, represents the value of X for which we wish to predict Y in the second population. The prediction is denoted by f,. The predictor computed by evaluating the least squares line for the first population at X, is denoted by Also see Exercise 1.52. on
t.
1.6.4.
Fixed Observed X
We have discussed two forms of the model
where e, = (e,, ut). In the first form, the x, are treated as random variables, often assumed to be normally distributed. In the second form of the model, the x, are treated as fixed constants. There is a third experimental situation that leads to a model that appears very similar to (1.6.12),but for which the stochastic behavior is markedly different. Assume that we are conducting an experiment on the quality of cement being created in a continuous mixing operation. Assume that quality is a function of the amount of water used in
80
A SINGLE EXPLANATORY VARIABLE
I
xp
I
1
PX(2)
PX(I)
X
FIGURE 1.6.1. Prediction in a second population.
the mixture. We can set the reading on the dial of a water valve controlling water entering the mixture. However, because of random fluctuations in water pressure, the amount of water actually delivered per unit of time is not that set on the dial. The true amount of water x, is equal to the amount set on the water dial XIplus a random error u,. If the dial has been calibrated properly, the average of the u, is zero. Thus, x, = XI- u,,
(1.6.13)
where the u, are (0, cUu)random variables. We used the negative sign on the right side of (1.6.13) so that the form of (1.6.12) is retained. It is important that in this experiment the observed XI,the reading on the dial, is Jixed. The observed value is controlled by the experimenter. Also, if it is assumed that variations in pressure are independent of the valve setting, the u, are In the model of earlier sections, x, and u, are independent, independent of XI. while X, and u, are correlated. In the cement experiment, x, and u, are
1.6. OTHER METHODS A N D MODELS
81
correlated, while X, and u, are independent. Berkson (1950) observed that when X,is controlled, ordinary least squares can be used to estimate the parameters of the line.
Theorem 1.6.1.
Let
x, = X , - u,, k; = Po + Pix, + e,, where (e,, ul), t = 1,2,, . . , n, are independent vectors with zero mean and
covariance matrix
E{(e,,uJ‘(e,, ut)} = diadoee, o,,), and X’= (Xl, X,, . . . , Xn)is a vector of fixed constants. Let -1
so,
n
(1.6.14)
,= 1
=
B - j,,x,
be the ordinary least squares estimators of Po and /I1. Then
N P o c , i j l d = (Po, P I )
and
where Axx = (n - l)mxx, u, = e, - u,Pl, and o,, = c,,
+ B?oUU.
Proof. Substituting the definition of x, into the equation for k;, we have
I: = P o + PIX, + 4,
(1.6.16)
where u, = e, - u,Pl. By the assumptions, the vector (e,, u,) is independent of X,. Therefore, u, is independent of X,. Substituting (1.6.16) into (1.6.14), we obtain BOl
- Po =
-
RSl, - P l ) ,
The mean and variance results then follow because the X, are fixed. Because (1.6.16) has the form of the classical fixed-X regression model, all the usual estimators of variance are appropriate. The residual mean square S”,
= (n - 2 ) 4
1= 1
[I:- F - ( X , - X ) f i 1 f ] 2
82
A SINGLE EXPLANATORY VARIABLE
is an unbiased estimator of ouXa n t replacing o,, of (1.6.15) with s,, produces an unbiased estimator of V{(Po,, P,,)'}. Theorem 1.6.1 and the discussion to this point suggest that errors of measurement associated with the controlled variable in an experiment can be ignored. This is not so. It is only estimation for the simplest model that remains unaffected. Berkson (1950)pointed out that tests in replicated experiments will be biased if the same error of measurement holds for several replicates. For example, assume that a fertilizer experiment is being conducted and consider the following two experimental procedures: A. Fertilizer is applied with a spreader. A rate is set on the spreader and fertilizer is applied to every plot randomly selected to receive that rate. The spreader is then set for the next experimental rate and the process is repeated. B. The fertilizer amounts are weighted separately for each plot. The amount shown by the scale is the same for all plots receiving the same treatment. The scale is calibrated with a standardized weight between each weighing. The quantity assigned to a particular plot is scattered evenly over the plot.
With procedure A, an error made in setting the spreader will be the same for all plots treated for that setting. We can write x,i =
x,- u, - a,,,
where u, is the error in the rate applied that is common for all plots treated with a single setting of the spreader and afi is the additional error arising from the fact that the material does not always feed through the spreader at the same rate. If the response Y is a linear function of the true application rate x,, we have Xi
+ eri = P o + P I X , + e,i - P l ~ r Plari= PO + P l x r
Because a part of the measurement error, u,, is common to the tth treatment, the treatment mean of the response variable is
z*
= Po
+ P I X , + e;. - P I % - P I G .
and the expected value of the within-treatment mean square is
where o,, is the variance of a,,, we assume e,,, u,, and a,, to be independent,
I .6.
83
OTHER METHODS AND MODELS
and we assume each of n treatements is observed on r plots in a completely randomized design. The expected value of the residual mean square obtained in the regression of treatment means on X, is bod
-
Therefore, with procedure A the estimator of PI is unbiased, but the usual within-treatment estimator of variance is a biased estimator of the true error variance. With procedure B, the setting for a plot (the weighing operation) is repeated for each plot, If the calibration produces unbiased readings, we can write
x,i = x, + Uti,
where one can reasonably assume the uti to be uncorrelated. Then
Ti = P o + PIX, + eti -
utiP1,
and the within-treatment mean square will be an unbiased estimator of 6””= o e e
+ a:guu.
Presence of measurement error in the experimental levels of a controlIed experiment produces complications if the response is not linear. To investigate the nature of this problem, assume that
Y; = Bo x, = (et,
+ P I X , + PZXT + e,,
x,- u,,
( 1.6.17)
4’ NI[O, diag(cee, ~ ~ u u ) ] , ‘V
where (e,, u,) is independent of X i for all t and j . Then, using x, = X, - u,, we obtain E { Y ; I X J = (Bo + P2fJ”U)
+ PIX, + Pa:.
(1.6.18)
If one estimates the quadratic function by ordinary least squares, one obtains unbiased estimators for p1 and pz, but the estimator of the intercept is an Also, the conditional variance of Y; given unbiased estimator of Po P2cUu. X,is the variance of
+
e, - P l U , - 2PZXtUt + P2(uT - %u), which is a function of X,. Therefore, ordinary least squares will not be the most efficient estimation procedure and the ordiyry p i p a t e s of the variance of the ordinary least squares estimators of (Po, PI, B2) will be biased. The effects of measurement errors in the controlled variable of experiments with nonlinear response have been discussed by Box (1961).
84
A SINGLE EXPLANATORY VARIABLE
REFERENCES Bickel and Ritov (1985), Geary (1942,1943), Kendall and Stuart (1979), Madansky (1959), Malinvaud (1970), Neyman and Scott (1951), Reiersol(1950), Spiegelman (1979).
Section 1.6.1. Section 1.6.2. (1972). Section 1.6.3. Section 1.6.4.
DortT and Gurland (1961a, 1961b), Pakes (1982), Wald (1940), Ware Ganse, Amemiya, and Fuller (1983), Lindley (1947). Berkson (1950), Box (1961), Draper and Beggs (1971).
EXERCISES 49. (Section 1.6.1) Assume that model (1.6.1) holds and that x, is distributed as an exponential random variable. That is, the density of xt is e-x, x 2 0.Give the variance of the approximate distribution of the estimator defined in (1.6.3). 50. (Section 1.6.2) Let the normal distribution model (1.2.1) of Section 1.2.1 hold. Let n be even and let r = 4 2 . Find the probability limit of Wald’s estimator (1.6.4) if group one is composed of those observations with the r smallest X values. 51. (Section 1.6.3) Assume that a 13th field is randomly selected from the population of fields of Example 1.2.1. Assume that only one-half as many soil sites and chemical determinations are made for the field. It follows that the measurement error associated with the observed soil nitrogen for the 13th field is 20: = 114.Assume that observed soil nitrogen is X I , = 60. Treating the estimates of Example 1.2.1 as parameters: (a) Estimate true nitrogen treating x I 3as fixed. Give the variance of your estimation error. (b) Estimate true nitrogen conditioning on X13and treating x I 3as random. Give the variance of your prediction error. (c) Predict observed yield conditioning on X I 3and treating x I 3as random. Give the variance of your prediction error. 52. (Section 1.6.3) Assume that the true parameters of a population of soil nitrogen values and yields such as that of Example 1.2.1 are IT,, = 200, px = 80, /lo = 50, PI = 0.5, and u,, = 60. Assume that a population of samples of X values with measurement error uuu= 100 is created by repeated sampling from a given field with true nitrogen equal to ~ 1 3 . (a) Find the mean and variance of the prediction error in the predicted value of true nitrogen constructed as f W 1 3)
= P*
+ ui;uxx(xl3 - A).
Compare the mean square error of this predictor to that of the unbiased estimator X I S . For what values of x , does ~ the predictor have smaller mean square error? (b) Assuming that ( Y , , , X,,)is observed, find the best predictor of x I 3conditional on (YI3, XIJ. Find the best estimator of x 1 3treating x I 3as fixed. For fixed x 1 3find the mean and variance of the error in the predictor of xIJthat conditions on (YI3, X13). 53. (Section 1.6.3) Let (X, x) be distributed as a bivariate normal random vector with mean (p,, /I,), u x x= uzx+ a,,, and uxr = uxx.Show that .?(X), defined in Example 1.6.1, has a smaller mean square error for x than that of X for Ix - /I# < a2axx,where a’ =
- u:,).
APPENDIX 1.A.
85
LARGE SAMPLE APPROXIMATIONS
APPENDIX LA. LARGE SAMPLE APPROXIMATIONS In this appendix we state a theorem that is used in numerous places throughout the text.
Theorem 1.A.1. Let g(a) be a real valued continuous function of the kdimensional vector a for a an element of k-dimensional Euclidean space. Let g(a) have continuous first derivatives at the point p = (pl, p z , . . , ,pk)'. Let the vector random variables X,= ( X , , ,X2,, . . . , X,J, t = 1,2,. . . , be inde... pendently and identically distributed- with mean p and covariance matrix Z x x . Let
x = ,-I
c x,. n
1=
1
(1.A-1)
converges in distribution to a normal random variable with mean zero and variance [g'l'(P), 9'2'(co, * . * 9 g'k'(P)l&x[g'l'(B), g'z'(P), '
' '
9
g'k'(P)l'9 (1.A.2)
where &)(p) is the partial derivative of g(a) with respect to aievaluated at a = p.
Proof. Because the derivatives are continuous at p, there exists a closed ball B with p as an interior point such that the derivatives are continuous on B. The sample mean vector X is converging to p in probability. Therefore, given E > 0, there is an N such that
P{X E B } > 1 - +&
(1.A.3)
for n > N . For X E B, by Taylor's theorem, = goC)
+ CI g"'(x*)(xi - pi), k
i=
where g'i'(x*) is the derivative of g(a) with respect to ai evaluated at x* and x* is on the line segment joining X and p. Because X is converging in probability to p and because g")(a) is continuous on B, g"'(x*) is converging in probability to g"'(p). It follows that, for % E B, i=I
[g(')(x*)
- g ( i ) ( p ) ] ( Xi pi)
86
A SINGLE EXPLANATORY VARIABLE
and that
i
plim n112 g(X) - g(p) -
k
C g“)(p)(Xi i= 1
-
I
p i ) = 0.
Therefore, the limiting distribution of nl’’[g(X) - g(p)] is the limiting distribution of
where = Cf= g(i)(p)(Xti - pi). See, for example, Fuller (1976, p. 193). The 4 are independently and identically distributed with mean zero and variance given in (1.A.2). Therefore, the conclusion follows by the Lindeberg central limit theorem. 0 The theorem extends immediately to vector valued functions. Corollary 1.A.1.
Let g(a) = [g,(a), gz(a), . *
* 9
g,(a>l’
be a vector valued function, where gi(a) are real valued functions satisfying the assumptions of Theorem l.A.l. Let X be as defined in Theorem l.A.l. Then n”2[g(x) - g(p)]
5 “0,
G%m,
where the 0th element of G is the derivative of gi(a) with respect to uj evaluated at a = I(.
Proof. Omitted.
0
If the function has continuous second derivatives, it is possible to evaluate the order of the remainder in the approximation. Corollary 1.A.2. Let the assumptions of Theorem 1.A.1 hold. In addition, assume that g(a) has continuous second derivatives at the point p. Then, given E > 0, there is an N and an M, such that
for all n > N.
APPENDIX I .A.
LARGE SAMPLE APPROXIMATIONS
87
Proof. Let B be the compact bail defined in Theorem 1.A.1. Then for of g(R) about the point p, with remainder, is
X E B, the first-order Taylor expansion
(1.A.4)
where g(ij)(x*) is the second partial derivative of g(a) with respect to a, and aj evaluated at a = x*, and x* is on the line segment joining p and X. By the continuity of the derivatives on B, there is an M , such that )g‘’j)(a)lc MI for a E B. Furthermore, there exists an M , , such that
P { ( ( X ,- pi)(Xj- pj)I > n-‘M2,}< +& for all n. Let M , ( 1 .AS).
=
(1.A.5)
M , M 2 , . The conclusion follows from (l.A.3),(l.A.4), and 0
The application of Theorem l.A.l is sometimes called the method of statistical differentials or the delta method. These terms are appropriate because the mean and variance expressions are obtained by expanding the function g(x) in a first-order Taylor series. As an example of the use of Theorem 1.A.1, we consider the function g(a) = u - l
and let R be the mean of n normal independent (p,a’) random variables, where p # 0. One method of expressing the fact that the distribution of is centered about the true value with a standard error that is proportional to n - ” , is to write
x
x -p =
op(n-l’2).
Formally, a sequence of random variables { Wn}is O,(a,) if, for every E > 0, there exists a positive real number M E such that P{JW,I> Mean} 6 for all n. See Fuller (1976, chap. 5). If we expand g(2) in a first-order Taylor series about the point p, we obtain g ( X ) = g ( p ) - p - 2 ( X - p)
+ OP(K1).
Ignoring the remainder, we have g(XE;)= g(p) - P- 2(f - PI.
88
A S I N G L E E X P L A N A T O R Y VARIABLE
The mean and variance of the right side of this expression are g(p) and n - 1p-4u2,respectively, which agrees with the theorem. It is not formally correct to say that p-402is the limiting variance of n”2Cg(8) - g(p)]. One should say that p-402 is the variance of the limiting distribution of n1/2[g(X) - g(p)]. Note that the theorem is not applicable if p = 0 because the function a - l is not continuous at a = 0. Very few restrictions are placed on the vector random variable X of Theorem l.A.l. For example, X, might be a function of X,.Consider the function g(a1, a,) = a;
1’2Ul.
Let XIbe a normal (0,a’) random variable and let X, = X:. Then we have
(
g(X1,X2)= n - I
,:1, x:, x )-I/,
= o-lxl
1
- X;”2X
1
+qfl-’),
because g(pl, p 2 ) = g(,I(pl, p,) = 0. We conclude that n’/’g(X,, 8,)is approximately distributed as a N(0, 1) random variable. For a function with g(‘)(p)= 0 for i = 1,2,. . . , k, the differencen’12[g(X) g(p)] converges in distribution to the constant zero. As an example, consider the function g(a) = a2
and let
X
be distributed as a normal (0, n - ’ ) random variable. Then n1’2(P
- 0)
converges in probability and, hence, in distribution to the constant zero.
APPENDIX l.B. MOMENTS OF THE NORMAL DISTRIBUTION
Let Y,= (XI, X,, . . . , Xk)be distributed as a multivariate normal with mean zero and covariance matrix X, where X has typical element oij. Given a random sample of vectors Y,, t = 1 , 2 , . . . , n, let
Y=n-’
c Y,, n
t=1
sij = (n - l)-’
11 ( y ,- E)( q, - T), n
I=
i = 1,2, . . . , k; j = 1,2, . . . , k.
APPENDIX l.C.
CENTRAL LIMIT THEOREMS FOR SAMPLE MOMENTS
89
It is well known that P is independent of all sij and that all odd moments of are zero. See, for example, Anderson (1958). Also,
for all i, j , k, m. If we let X, = (Xtl,X t z , . . . , X r k )be distributed as a multivariate normal with mean = (pi,p2,. . . , pk) and covariance matrix E,then C(Xtixtj,X
d , r )
= bircjq
+ (Jiqbjr + P i P q c j r + P i P r u j q + P j P q c i r + P j W i q .
APPENDIX l.C. CENTRAL LIMIT THEOREMS FOR SAMPLE MOMENTS In this appendix we give the limiting distribution for the properly normalized sample mean and covariance matrix constructed from a random sample of n observations. Two situations are considered. In the first, the vector of observations is a random vector. In the second, the vector of observations is a sum of a fixed vector and a random vector. See Appendix 4.A for the definition of vech used in Theorem 1.C.1.
Theorem 1.C.1. Let {Z,} be a sequence of independent identically distributed p-dimensional random vectors with mean p,, covariance matrix Czz, and finite fourth moments. Let be the sample mean, m , , the sample covariance matrix,
z
vech mzz = (mzzll, mzz21,. . . , mzzpl, mzz22,.. . m z z P 2 , .. . mzzPpY, vech x z z = (azz11,azz21,. ' ' rJZZp19 a22229 * * * azzpz, . f l Z Z P P ) l ' al = (2, - pZ,[ v e c W , - az)'(z,- aZ)- L41')7 9
7
'
' 3
and 0 = E{a,a;}.Then
nl"[l(Z - p,), (vech m,, - vech Z,,)']'
L
-+
N(0, a).
90
A SINGLE EXPLANATORY VARIABLE
Proof. We have mzz = (n - 1 I - I
[,IlC
1
(Z, - pzY(Zl - az) + n(Z - pZY@ - pz) , W . 1 )
where n‘I2(Z - pz)’(Z - pz) is converging to zero in probability. Therefore, the limiting distribution of n’/2(vech mzz - vech Ezz) is the same as that of n’I2(n - 1)-’
1=1
vech{(Z, - pz)’(Z,- pZ) - Zzz}.
For any arbitrary fixed row vector d, where dd’ # 0, the sequence {da,} is a sequence of independent identically distributed random variables with mean zero and variance dad’. The conclusion follows by the Lindeberg central limit theorem.
Corollary l.C.1. Let the Z,of Theorem 1.C.1 be normally distributed. Then fi = block diag(Xzz, fi22), where the element of fi22associated with rnzzij and mZZklis aZZikaZZjl
+ aZZiluZZjk*
Proof. For normal random variables Z and m, are independent. See, for example, Kendall and Stuart (1977, Vol. 1, p. 384). The covariances of the sample moments from a normal distribution are given in Appendix l.B. In Theorem 1.C.2 we give the limiting distribution of the vector composed of the means, mean squares, and mean products of variables that are the sum of fixed and random components. As a preliminary, we present two central limit theorems for weighted sums of random variables. Lemma l.C.l. Let e, be a sequence of independently and identically distributed (0, Zee)p-dimensional random row vectors, where Eeeis nonsingular. Let {c,} be a sequence of fixed p-dimensional row vectors with clc; # 0 and
lim
n-tm
Then
n
n-l
C c,CeecI= A > 0.
1=1
APPENDIX I.C.
91
CENTRAL LIMIT THEOREMS FOR SAMPLE MOMENTS
Proof. The random variables g r = cfei are independent with zero means and variances, E{g:)
=GeeC:.
We have
where F(e) is the distribution function of the vector e,
R , , = {e:(c,e')2> t 2 V n ) ,
and Ic,12 = c,ci. The ratio V,' assumption,
c:=l
Ic,J2 is
bounded and, by the limit
Because the random variables let[ are independently and identically distributed with finite second moment,
Therefore, the array { V - 1 / 2 g 1 }satisfies the conditions of the Lindeberg n central limit theorem. O
Lemma 1.C.2. Let e, be a sequence of independently distributed (0, Z,) p-dimensional random row vectors with uniformly bounded 2 6 (6 > 0) moments. Let { c f }be a sequence of fixed p-dimensional row vectors with c,c', # 0. Let n - ' X , where
+
be bounded above and below by positive real numbers for all n, assume lim n-' sup ]c,l'= 0 l s f s n
n-m
and assume
c n
n-'
lCf12
92
A SINGLE EXPLANATORY VARIABLE
Proof. We have
1c,I2, R , , = {e: (c,e’)2> where dnS= 0) moments, is given in Theorem 1.C.3. Theorem 1.C.3. Let Z, = z, + e,, where the e, are independent p-dimensional random row vectors with zero means, positive definite covariance
APPENDIX I.D.
95
NOTES ON NOTATION
matrices ?&,, and bounded 4 and let
+ 6 (6 > 0) moments. Let z, be a fixed sequence
lim Z = p,,
n+
Iim n-'
n-+m
where
zE,is
1.C.2. Then
lim m,,
n-t w
w
n
f=1
positive definite. Let
G,
1z 6
=
= m,,,
E,,,
and 0" be as defined in Theorem
"'(6 - 0,) A N(0, I),
where Gn= V{6}.
Proof. The proof parallels that of Theorem 1.C.2, with Lemma 1.C.2 used to establish normality. 0 APPENDIX 1.D. NOTES ON NOTATION In this section we summarize, for reference, some of the notation that is used throughout the book. Because the treatment of measurement errors is a topic in many areas of statistics and in many areas of application, there is no standard notation at the present time. Therefore, any notation chosen for book length treatment will, at some point, be in conflict with the reader's previous experience. We reserve capital I: and X,to denote observable random variables. If these letters are boldface, Y, and X,, they denote row vectors. Often Y,and X, are combined into a single row vector Z, = (Y,, X,). Lowercase letters y,, x,, y,, x,, and z, are reserved for the true values of q, X , , Y,, X,, and Z,, respectively. The variables denoted by lowercase letters may be either fixed or random, the nature of the variables being specified by the model. We desired a notation in which the observed and true values could be matched easily, and chose lowercase and capital letters as the simplest and most direct of the alternatives. The observed values, true values, and measurement errors are defined by
K = y , + e,, X,= x, + u,, z, = ( X , XI) = z, + e,, where e, = (e,, u,) is the row vector of measurement errors. Measurement error models are related to the regression model and we chose a notation with close ties to regression. To permit us to write models
96
A SINGLE E X P L A N A T O R Y VARIABLE
in the regression form
(1.D.I) I; = X t B + e,, to be a row vector and B to be a column vector. This causes
we define x, our notation to differ from that often used for multivariate models. When the regression model with an error in the equation is being considered, the error in the equation is denoted by q, and the pure measurement error in Y; by w,. In such models e, = w, + q, and
+ 4,.
Yr = xrB
To be consistent with the linear model, we define the vector of parameters to be a column vector identified with a Greek symbol. For example,
B=
(Po9
PI)’.
For models of the type (l.D.1) we reserve the letter u, for the deviation u, =
I; - X,B = e, - u,B.
If Y, is a row vector, there is a row vector of deviations v, = Y,
- X,fl = e, - u,B.
We shall follow the common statistical practice and use the same notation for a random variable and for a realization of that random variable, with a few exceptions. We use py to denote the mean of the random variable and C, to denote : , a,,, and V{Y;} the mean of the random vector Y,. The three symbols a will be used to denote the variance of the random variable Y. The bold V{X’} or Z x x will denote the covariance matrix of the column X’. The ijth element of X x x is a x x i jThe . covariance of the scalar random variables X and Y is denoted by ax, or by C ( X , Y}. The matrix Ex, = E { ( X - px)’(Y - p,)} is the covariance matrix of the column X and the row Y. The covariance between the ith element of X, and the jth element of Y, is c x y i j . Lowercase letter rn, appropriately subscripted, is used for the sample covariance matrix. For example,
Z,, and the ijth element of m,, where Z, = (Ztl,Z,,, . . . , Z,J, Z = n-’ is mZzij. Capital letter M, appropriately subscripted, is used for the matrix of raw mean squares and products. Thus,
M,,
= n-’
r=1
XIY,.
APPENDIX I.D.
97
NOTES ON NOTATION
We use the letter S, appropriately subscripted, for an estimator of the covariance matrix of the error vector. Thus, S,, is an estimator of the covariance matrix of E,. We shall generally use column vectors when discussing the distribution of a vector random variable. Thus, we write
z;
Iv
N W , &z),
where -NI(O, X z z ) signifies that the random column vectors Z;, t = 1,2, . . . , are distributed normally and independently with zero mean vector and covariance matrix Zzz. Special notation for mapping a matrix into a vector is defined in Appendix 4.A.
REFERENCES Appendix l.A. Cramer (1946), Fuller (1976), M a n n and Wald (1943). Appendix l.B. Anderson (1958). Appendix l.C. Kendall and Stuart (1977), Fuller (1976, 1980).
EXERCISES 54. (Appendix l . A ) Using the method of statistical differentials, find the mean and variance of the limiting distribution of nl”(Bx - ax), where
and the X, are normal independent (p, ):u random variables. 55. (Appendix l.A) Using the method of statistical differentials, obtain the mean and variance of the limiting distribution of R = Tx-’, where ( y , X,) are normal independent vectors with p, # 0 and pup; I = R. 56. (Appendix l.A) Let X, ( p x ,uXX),where px > 0. Let C x x= p;’u,, be the squared coefficient of variation of X . (a) Show that the coefficient of variation of the approximate distribution of X - ’ is equal to the coefficient of variation of 2. (b) Show that, for pi # 0 and p, # 0, the squared coefficient of variation of the approximate distribution of R obtained in Exercise 55 can be written
-
c,-9 = n-yc,,
- Zt,,
+ C,),
where C x y= p; ‘ p i I u X y . 57. (Appendixes 1.A, 1.C. Section 1.1) In Section 1.1 it was shown that E { f , , } = B , u i ~ u x x under the assumption that the (x,, e,, u,) are normally and independently distributed with diagonal covariance matrix.
98
A SINGLE EXPLANATORY VARIABLE
(a) Show that, for n
3,
WlA = ( n - 31- l ~ i : ( u X X ~ +p eP:%,%") under the assumption that the (x,, e,, u,) are normally and independently distributed with diagonal covariance matrix. (b) Assume model (l.l.l),(1.1.2) and assume {x,} is a fixed sequence with
Assume (e,, u,)'
-
I=!
Il-m
NI[O, diag(u,,, uuu)]and show that n'"(fll
+
where y l n = (mxx u J l ~ l m x l
v = bXX + U""F
- Y,,)5 N(O, v),
and lo,,
+ bXX + ~"")-4P:~xx~""(~:, + 4").
(c) Show that, under the assumptions of part (a),
~ ~ ~ C 9 1 ~ l ~=, ,2nl l' P h , , + ~ u u ) ~ 4 ( ~ ~ x ~ u u ) 2 . where the approximation is understood to mean that only the expectation of the leading term of the Taylor series is evaluated. 58. (Appendix LA, Section 1.2) Using the method of statistical differentials, find the limiting distribution of n112(s,, - u,,), where s,, is defined in (1.2.7). 59. (Appendix I.A, Section 1.2) Let X I NI(pc,,a,,) and let m x x = ( n - l ) - ' ( X I - XI2. (a) Obtain the exact mean and variance of m :; for n > 5. Compare this expression with the mean and variance of the approximate distribution derived by the methods of Appendix l.A. (b) Obtain the coefficient of variation of the approximate distribution of D = (1 - m;iusu). (c) Let PI = f(1 - m~:u,,)-', where 9 = m;:mx, and ( x , X,)are bivariate normal and satisfy model (1.2.1).Assume PI z-0. Show that the squared coefficient of variation of the approximate distribution of PI is C,, = C,, + C,,, where C,, and C,, are the squared coefficients of variation of the approximate distributions of 9 and D,respectively. (d) On the basis of higher-order approximations, Cochran (1977, p. 162) suggests that the approximate squared coefficient of variation of a ratio of uncorrelated approximately normally distributed random variables will be within 96% of the true squared coefficient of variation if the squared coefficient of variation of the denominator is less than 6%. For what values of u & ~and ~ n~ will the coefficient of variation of the approximate distribution of the denominator D be less than S%? 60. (Appendix 1.A. Section 1.2) Let (x,}? be a fixed sequence and let X I = x, u,. (a) Show that for u, NI(0, u,,),
V{m_XX − m_xx} = 4(n − 1)^{-2} Σ_{t=1}^{n} (x_t − x̄)²σ_uu + 2(n − 1)^{-1}σ_uu².
(b) Using result (a), show that the coefficient of variation of m̂_xx = m_XX − σ_uu is less than 0.2 if
m_xx^{-1}σ_uu < [0.02(n − 1) + 1]^{1/2} − 1.
Show that the coefficient of variation of m̂_xx is less than 0.2 if
[(n − 1)m_xx²]^{-1}σ_uu(σ_uu + m_xx) < 0.01.
(c) Using Exercise 56, show that the squared coefficient of variation of the approximate distribution of (m_XX − σ_uu)^{-1}m_xx is, to the order of approximation, 2(n − 1)^{-1}τ(τ + 2), where τ = m_xx^{-1}σ_uu.
61. (Appendix 1.A, Section 1.3) Let β̂₁, the estimator of β₁ of model (1.3.1), be defined by (1.3.7).
(a) Let θ = arctan β₁ and θ̂ = arctan β̂₁. Show that, for σ_uu = σ_ee,
tan 2θ̂ = 2[m_XX − m_YY]^{-1}m_XY.
Obtain the limiting distribution of θ̂.
(b) Show that the variance of the limiting distribution of n^{1/2}(s_vv − σ_vv) is 2σ_vv² + σ_vv²Γ₂₂, where s_vv is defined in (1.3.12) and Γ₂₂ is defined in Theorem 1.3.1.
62. (Appendix 1.A, Section 1.5) The estimates of σ_ee11 and σ_uu of Example 1.5.1 are quite similar. Using the estimates and the estimated covariance matrix of Example 1.5.1, construct estimates of the parameters subject to the restriction σ_ee11 = σ_uu. That is, use generalized least squares to construct restricted estimates under the assumption that the estimators of Example 1.5.1 are normally distributed with the estimated covariance matrix constructed in Example 1.5.1. Do not use the original data in your calculations. Estimate the covariance matrix of your estimators.
63. (Appendix 1.A) Let (Y_t, X_t) be distributed as a bivariate normal random vector. Using the methods of Appendix 1.A, obtain the approximate distribution of γ̂_XY^{-1}, where γ̂_XY = m_XX^{-1}m_XY.
64. (Appendix 1.B) Let Y_t ~ NI(μ_y, σ_yy). Prove that
V̂{m_YY} = 2(n + 1)^{-1}m_YY²
is an unbiased estimator of V{m_YY} = 2(n − 1)^{-1}σ_yy². Extend this result to the estimation of C{m_YY, m_XY}.
65. (Appendix 1.B) Prove the following lemma (Jöreskog, 1973).
Lemma. Let Z_t' ~ NI(0, Σ), t = 1, 2, . . . , and let
W = Σ^{-1}(m_ZZ − Σ)Σ^{-1},
m_ZZ = (n − 1)^{-1} Σ_{t=1}^{n} (Z_t − Z̄)'(Z_t − Z̄),   Z̄ = n^{-1} Σ_{t=1}^{n} Z_t.
Then
C{w_ij, w_rs} = (n − 1)^{-1}(σ^{ir}σ^{js} + σ^{is}σ^{jr}),
where σ^{ij} is the ijth element of Σ^{-1} and w_ij is the ijth element of W.
66. (Appendix 1.B) Let Z_t' = (Z_1t, Z_2t) ~ NI(0, Σ), where Σ is a 2 × 2 matrix with ijth element equal to σ_ij. Prove that the conditional variance of Z_1t, given Z_2t, is (σ^{11})^{-1}, where σ^{ij} is the ijth element of Σ^{-1}. Generalize this result to higher dimensions.
CHAPTER 2
Vector Explanatory Variables
In this chapter we study models with more than one x variable. Maximum likelihood estimation is investigated and certain problems associated with the likelihood function are identified. Estimators of parameters that are extensions of the estimators of Chapter 1 are presented. The first four sections of this chapter parallel the first four sections of Chapter 1. Section 2.5 is devoted to the small sample properties of estimators and to modifications designed to improve the small sample behavior. The calibration problem is also considered in Section 2.5.
2.1. BOUNDS FOR COEFFICIENTS
In Section 1.1.3 we demonstrated that it is possible to establish bounds for the population regression coefficient in a simple regression with the independent variable subject to measurement error, provided the measurement error is independent of the error in the equation. Given a diagonal error covariance matrix, such bounds can be constructed for the model with a vector of explanatory variables. Let
y_t = β₀ + x_tβ,   Z_t = z_t + ε_t,
(x_t, ε_t')' ~ NI[(μ_x, 0)', block diag(Σ_xx, Σ_εε)],    (2.1.1)
where Z_t = (Y_t, X_t), x_t is a k-dimensional row vector, β is a k-dimensional column vector, and the covariance matrix of Z_t, denoted by Σ_ZZ, is positive definite. Under this model, the unknown parameter vector must satisfy
(Σ_ZZ − Σ_εε)(1, −β')' = 0.
Thus, if Σ̃_εε is any positive semidefinite matrix, the vector (1, −β̃')' that satisfies
(Σ_ZZ − λ̃Σ̃_εε)(1, −β̃')' = 0,
where λ̃ is the smallest root of
|Σ_ZZ − λΣ̃_εε| = 0,
is an acceptable parameter vector.
We now assume that Σ_εε is a positive semidefinite diagonal covariance matrix. This permits us to establish upper bounds for the error variances. If we solve the equation
|Σ_ZZ − λ_iD_ii| = 0
for λ_i, where D_ii is a diagonal matrix with the ith diagonal element equal to one and all other entries equal to zero, we obtain the upper bound for the error variance of the ith component of Z_t. The coefficient λ_i is the population regression residual mean square obtained in the population regression of the ith element of Z_t on the remaining k elements of Z_t.
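A minimal computational sketch of the upper bounds λ_i may be useful. Because the root of |Σ_ZZ − λ_iD_ii| = 0 is the residual variance of the regression of the ith element of Z_t on the remaining elements, it equals the reciprocal of the ith diagonal element of Σ_ZZ^{-1}. The sketch below, in Python, assumes a hypothetical moment matrix `Szz`; in practice the sample moment matrix m_ZZ would be used.

```python
import numpy as np

def error_variance_bounds(Szz):
    """Upper bounds for the measurement error variances of the components
    of Z_t = (Y_t, X_t): the root lambda_i of |Sigma_ZZ - lambda_i D_ii| = 0
    equals 1 / (Sigma_ZZ^{-1})_{ii}, the residual variance of the regression
    of the ith element of Z on the remaining elements."""
    Szz = np.asarray(Szz, dtype=float)
    return 1.0 / np.diag(np.linalg.inv(Szz))

# Hypothetical 3 x 3 covariance matrix for Z_t = (Y_t, X_t1, X_t2).
Szz = np.array([[2.0, 1.2, 0.8],
                [1.2, 1.5, 0.5],
                [0.8, 0.5, 1.0]])
print(error_variance_bounds(Szz))
```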
Setting bounds for the elements of β is more difficult. We illustrate by considering a model with two explanatory variables. Let
Y_t = β₀ + β₁x_t1 + β₂x_t2 + e_t
and retain the assumptions of model (2.1.1). Then we can write the model as
w_t0 = β₀* + β₂w_t2 + e_t,
where w_t2 is the portion of x_t2 that is orthogonal to x_t1, w_t0 is the portion of Y_t that is orthogonal to x_t1, and (ε_t0, ε_t2) = (W_t0, W_t2) − (w_t0, w_t2) = (e_t, u_t2). If we fix a value for σ_uu11, say σ̃_uu11, in the permissible range for σ_uu11, then one can define the new variables
(W_t0, W_t2) = (Y_t − c₀₁X_t1, X_t2 − c₂₁X_t1),
where
(c₀₁, c₂₁) = σ̃_xx11^{-1}(σ_XY11, σ_XX12)
and σ̃_xx11 = σ_XX11 − σ̃_uu11. It follows that the bounds of Section 1.1.3 apply to the reduced model and, for σ_uu11 = σ̃_uu11 and σ_ww02 > 0,
σ_ww22^{-1}σ_ww02 ≤ β₂ ≤ σ_ww02^{-1}σ_ww00.
If σ_ww02 has the same sign for all σ̃_uu11 in the permissible range, then the derivatives of the bounds with respect to σ̃_uu11 also never change sign. It follows that the unconditional bounds for β₂ will be in the set of conditional bounds associated with the minimum and maximum values of σ̃_uu11. If σ̃_uu11 = 0, the lower bound for β₂ is the regression coefficient of X_2 obtained
in the regression of Y on X_1 and X_2, and the upper bound for β₂ is the inverse of the regression coefficient for Y in the regression of X_2 on Y and X_1. At the maximum value for σ̃_uu11, the conditional upper and lower bounds for β₂ coincide. The value of β₂ is then the negative of the ratio of the coefficient for X_2 to the coefficient for Y in the regression of X_1 on Y and X_2. If the conditional covariance between W_t0 and W_t2 has a different sign for two values of σ̃_uu11 in the permissible range, then there is a σ̃_uu11 for which σ_ww02 = 0 and the range of possible values for the β's is infinite. If the conditional covariances do not change sign, then the range of possible values is finite.
For the general problem, consider the set of k + 1 values of β constructed by using each of the k + 1 variables in Z_t as the dependent variable in an ordinary least squares regression. In all cases the least squares solution is standardized so that the equation is written in the form Y = Xβ. The bounds for β will be finite if each coefficient vector in the set of k + 1 vectors obtained from the k + 1 regressions has the same sign structure. If the k + 1 vectors are all in the same orthant of the k-dimensional space, the set of possible values for β is composed of the convex linear combinations of the k + 1 coefficients obtained from the k + 1 regressions. If the set of k + 1 population vectors are not all in the same orthant, the set of possible values for β is unbounded. See Klepper and Leamer (1984) and Patefield (1981). For the three-variable case, the possible set of values for (β₁, β₂) is a triangle in (β₁, β₂) space with vertices corresponding to the values obtained from the three ordinary least squares regressions using each of the three variables as the dependent variable, provided all three solutions are in the same quadrant.
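The construction just described can be sketched in a few lines of code. The sketch below, with a hypothetical moment matrix `Szz` for Z_t = (Y_t, X_t1, . . . , X_tk), computes the k + 1 coefficient vectors (each regression re-solved for Y in the form Y = Xβ) and checks whether they share a sign pattern; the function names are illustrative only.

```python
import numpy as np

def direction_vectors(Szz):
    """The k+1 coefficient vectors for beta obtained by using each element
    of Z_t = (Y_t, X_t1, ..., X_tk) in turn as the dependent variable in a
    population least squares regression, each fitted equation standardized
    to the form Y = X beta (intercepts suppressed)."""
    Szz = np.asarray(Szz, dtype=float)
    p = Szz.shape[0]                      # p = k + 1
    betas = []
    for j in range(p):
        others = [i for i in range(p) if i != j]
        c = np.linalg.solve(Szz[np.ix_(others, others)], Szz[others, j])
        if j == 0:                        # Y regressed on X: already Y = X beta
            betas.append(c)
        else:                             # X_j regressed on (Y, other X); solve for Y
            c_y = c[0]                    # coefficient of Y
            beta = np.empty(p - 1)
            beta[j - 1] = 1.0 / c_y
            pos = 1
            for i in others[1:]:          # remaining X variables
                beta[i - 1] = -c[pos] / c_y
                pos += 1
            betas.append(beta)
    return np.array(betas)

def bounds_are_finite(betas):
    """Klepper-Leamer condition: the set of possible beta values is bounded
    when all k+1 vectors lie in the same orthant (same sign pattern)."""
    signs = np.sign(betas)
    return bool(np.all(signs == signs[0]))

Szz = np.array([[2.0, 1.2, 0.8],
                [1.2, 1.5, 0.5],
                [0.8, 0.5, 1.0]])
B = direction_vectors(Szz)
print(B)
print("bounds finite:", bounds_are_finite(B))
```

When the orthant condition holds, the set of admissible β vectors is the convex hull of the rows of `B`; the sketch does not attempt to enumerate that hull.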
REFERENCES
Bekker, Wansbeek, and Kapteyn (1985), Klepper and Leamer (1984), Koopmans (1937), Patefield (1981).
EXERCISES
1. (Section 2.1) Let model (2.1.1) hold with Σ_ue = 0. For one explanatory variable and simple uncorrelated measurement error it was shown in Section 1.1.1 that the expected value of the ordinary least squares estimator of the slope was closer to zero than the true value.
(a) Show that the corresponding result holds for the vector case, where γ = Σ_XX^{-1}Σ_XY is the vector of population least squares coefficients. (Hint: See (4.A.20) of Appendix 4.A.)
(b) Let the two squared multiple correlations be defined by
R²_YX = σ_YY^{-1}Σ_YXΣ_XX^{-1}Σ_XY and R²_Yx = σ_YY^{-1}Σ_YxΣ_xx^{-1}Σ_xY.
Show that R²_YX ≤ R²_Yx. Show that the inequality is strict if at least one element of Σ_uu is positive.
2. (Section 2.1) Let model (2.1.1) hold and let X_t = (x_t1, X_t2) be observed, where β' = (β₁', β₂'), x_t1 is measured without error, and X_t2 = x_t2 + u_t2. Show that, typically, the least squares coefficients obtained in the regression of Y_t on X_t are biased for both β₁ and β₂. Show that the least squares estimator of β₁ is unbiased for β₁ if E{x_t1'x_t2} = 0. See Carroll, Gallo, and Gleser (1985).
2.2. THE MODEL WITH AN ERROR IN THE EQUATION
2.2.1. Estimation of Slope Parameters
The generalization of model (1.2.1) of Section 1.2 to a model containing a vector of x variables is
Y_t = x_tβ + e_t,   X_t = x_t + u_t,    (2.2.1)
for t = 1, 2, . . . , n, where x_t is a k-dimensional row vector, β is a k-dimensional column vector, and the (k + 1)-dimensional vectors ε_t = (e_t, u_t)' are independent normal (0, Σ_εε) random vectors. It is assumed that the vector of covariances between e_t and u_t, Σ_eu, and the covariance matrix of u_t, Σ_uu, are known. The variance of e_t, σ_ee, is unknown. Instead of assuming the x variables to be random variables, as we did in Section 1.2, we initially assume that {x_t} is a sequence of fixed k-dimensional row vectors. We examine the likelihood function assuming that Σ_εε is nonsingular and, with no loss of generality, that
Σ_uu = I and Σ_ue = 0.    (2.2.2)
With these assumptions, the density of (e_t, u_t)' is
(2π)^{-(k+1)/2}σ_ee^{-1/2} exp{−½(σ_ee^{-1}e_t² + u_tu_t')}.    (2.2.3)
Because x_t is fixed, the Jacobian of the transformation of (e_t, u_t) into (Y_t, X_t) is one, and the logarithm of the likelihood for a sample of n observations is
log L = −½n[(k + 1) log 2π + log σ_ee] − ½ Σ_{t=1}^{n} [σ_ee^{-1}(Y_t − x_tβ)² + (X_t − x_t)(X_t − x_t)'].    (2.2.4)
To obtain the maximum likelihood estimators, the likelihood is to be maximized with respect to β, σ_ee, and x_t, t = 1, 2, . . . , n. Differentiating (2.2.4),
we obtain
∂ log L/∂β = σ_ee^{-1} Σ_{t=1}^{n} (Y_t − x_tβ)x_t',
∂ log L/∂σ_ee = −½nσ_ee^{-1} + ½σ_ee^{-2} Σ_{t=1}^{n} (Y_t − x_tβ)²,    (2.2.5)
∂ log L/∂x_t = σ_ee^{-1}(Y_t − x_tβ)β' + (X_t − x_t),  t = 1, 2, . . . , n.
Equating the partial derivative with respect to x_t to zero and using the hat (^) to denote estimators, we obtain
x̂_t' = [σ̂_ee^{-1}β̂β̂' + I]^{-1}[σ̂_ee^{-1}β̂Y_t + X_t'],  t = 1, 2, . . . , n.    (2.2.6)
Noting that
[σ̂_ee^{-1}β̂β̂' + I]^{-1} = I − (σ̂_ee + β̂'β̂)^{-1}β̂β̂',
we obtain
Y_t − x̂_tβ̂ = (σ̂_ee + β̂'β̂)^{-1}(Y_t − X_tβ̂)σ̂_ee,    (2.2.7)
X_t − x̂_t = −(σ̂_ee + β̂'β̂)^{-1}(Y_t − X_tβ̂)β̂'.    (2.2.8)
If we substitute (2.2.7) into the second equation of (2.2.5) and set the derivative equal to zero, we obtain
σ̂_ee^{-1}(σ̂_ee + β̂'β̂) = (σ̂_ee + β̂'β̂)^{-1}[n^{-1} Σ_{t=1}^{n} (Y_t − X_tβ̂)²].    (2.2.9)
Equation (2.2.9) signals a problem. We know that the variance of
Y_t − X_tβ = e_t − u_tβ
is σ_ee + β'β. Therefore, the quantity on the right side of Equation (2.2.9) should be estimating one, but the quantity on the left side of the equation is clearly not estimating one. We conclude that the method of maximum likelihood has failed to yield consistent estimators for all parameters of the model. This is a common occurrence when the likelihood method is applied to a model in which the number of parameters increases with n. In our model the parameters are (σ_ee, β, x_1, x_2, . . . , x_n). We should not be surprised that we cannot obtain consistent estimators of nk + k + 1 parameters from n observations. Estimation in the presence of an increasing number of parameters has been discussed by Neyman and Scott (1948), Kiefer and Wolfowitz (1956), Kalbfleisch and Sprott (1970), and Morton (1981a). Anderson and Rubin (1956) showed that the likelihood function (2.2.4) is unbounded.
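Exercise 4 at the end of this section asks the reader to verify the unboundedness of (2.2.4). A small numerical illustration is sketched below, with simulated data and an arbitrarily chosen β: the x_t are placed on the plane Y_t − x_tβ = 0, and the profiled log likelihood then grows without limit as σ_ee → 0. The data-generating values are purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 2
x = rng.normal(size=(n, k))
beta = np.array([1.0, -0.5])
Y = x @ beta + rng.normal(scale=0.5, size=n)
X = x + rng.normal(size=(n, k))              # Sigma_uu = I, Sigma_ue = 0

# Choose xhat_t so that Y_t - xhat_t beta = 0, by moving X_t along beta.
resid = Y - X @ beta
xhat = X + np.outer(resid / (beta @ beta), beta)
penalty = np.sum((X - xhat) ** 2)            # finite sum of squared u-type deviations

def log_lik(sigma_ee):
    # log likelihood (2.2.4) at (beta, xhat, sigma_ee); the e-term vanishes.
    return -0.5 * n * ((k + 1) * np.log(2 * np.pi) + np.log(sigma_ee)) - 0.5 * penalty

for s in [1.0, 1e-2, 1e-4, 1e-8]:
    print(s, log_lik(s))                     # increases without bound as sigma_ee -> 0
```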
Let us now examine maximum likelihood estimation for the model with random x values. We shall see that the application of likelihood methods to the normal structural model produces estimators of the unknown parameters that are consistent and asymptotically normal under a wide range of assumptions. Let
Y_t = β₀ + x_tβ₁ + e_t,   X_t = x_t + u_t,    (2.2.10)
where the vectors (x_t, e_t, u_t) are normally and independently distributed with mean (μ_x, 0, 0) and with (e_t, u_t) independent of x_t. Assume that Σ_uu and Σ_ue are known. To be conformable with the dimensions of (2.2.1), we let X_t be a (k − 1)-dimensional vector. Let a sample of n vectors Z_t = (Y_t, X_t) be available. Then twice the logarithm of the likelihood adjusted for degrees of freedom is
2 log L_n(θ) = −kn log 2π − (n − 1) log|Σ_ZZ| − (n − 1) tr{m_ZZΣ_ZZ^{-1}} − n(Z̄ − μ_Z)Σ_ZZ^{-1}(Z̄ − μ_Z)',    (2.2.11)
where
θ' = [μ_x, β₀, β₁', σ_ee, (vech Σ_xx)']
and
Σ_ZZ = (β₁, I)'Σ_xx(β₁, I) + Σ_εε.
(See Appendix 4.A for the definition of vech Σ_xx.) The number of unknown parameters in θ is equal to the number of elements in the vector of statistics [Z̄, (vech m_ZZ)']. It is well known that μ̂_Z = Z̄ and Σ̂_ZZ = m_ZZ maximize the likelihood (2.2.11) with respect to μ_Z and Σ_ZZ when there are no restrictions on μ_Z and Σ_ZZ. Therefore, due to the functional invariance property of the method of maximum likelihood, the maximum likelihood estimators adjusted for degrees of freedom are
(μ̂_x, β̂₀) = (X̄, Ȳ − X̄β̂₁),
β̂₁ = (m_XX − Σ_uu)^{-1}(m_XY − Σ_ue),    (2.2.12)
σ̂_ee = m_YY − 2m_YXβ̂₁ + β̂₁'m_XXβ̂₁ + 2Σ_euβ̂₁ − β̂₁'Σ_uuβ̂₁,
and Σ̂_xx = m_XX − Σ_uu, provided Σ̂_xx is positive definite and σ̂_ee ≥ Σ_euΣ_uu^{+}Σ_ue, where Σ_uu^{+} is the Moore-Penrose generalized inverse of Σ_uu. If either of these conditions is violated, the estimators fall on the boundary of the parameter space.
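The estimators in (2.2.12) are simple functions of the sample moments. A minimal sketch of the computation, in Python and under the assumption that Σ_uu and Σ_ue are known, follows; the data and error covariances in the usage lines are hypothetical.

```python
import numpy as np

def ml_structural(Y, X, Sigma_uu, Sigma_ue):
    """Estimators (2.2.12): beta1 = (m_XX - Sigma_uu)^{-1}(m_XY - Sigma_ue),
    beta0 = Ybar - Xbar beta1, and the corresponding estimator of sigma_ee."""
    Y = np.asarray(Y, float)
    X = np.asarray(X, float)
    mXX = np.atleast_2d(np.cov(X, rowvar=False, ddof=1))
    mXY = np.cov(np.column_stack([X, Y]), rowvar=False, ddof=1)[:-1, -1]
    mYY = np.var(Y, ddof=1)
    beta1 = np.linalg.solve(mXX - Sigma_uu, mXY - Sigma_ue)
    beta0 = Y.mean() - X.mean(axis=0) @ beta1
    sigma_ee = (mYY - 2 * mXY @ beta1 + beta1 @ mXX @ beta1
                + 2 * Sigma_ue @ beta1 - beta1 @ Sigma_uu @ beta1)
    return beta0, beta1, sigma_ee

# Hypothetical simulated data with two explanatory variables.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 2))
Y = 1.0 + x @ np.array([2.0, -1.0]) + rng.normal(scale=0.7, size=n)
X = x + rng.normal(scale=0.5, size=(n, 2))
print(ml_structural(Y, X, Sigma_uu=0.25 * np.eye(2), Sigma_ue=np.zeros(2)))
```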
Let λ̂₁^{-1} < λ̂₂^{-1} < · · · < λ̂_k^{-1} be the positive values of λ^{-1} that satisfy
|m_ZZ − λ^{-1}Σ̃_εε| = 0,    (2.2.13)
where σ̃_aa11 = Σ_euΣ_uu^{+}Σ_ue and
Σ̃_εε = [σ̃_aa11, Σ_eu; Σ_ue, Σ_uu].    (2.2.14)
If λ̂_k < 1 and λ̂_{k−1} > 1, the maximum likelihood estimators are
β̂₁ = (m_XX − λ̂_kΣ_uu)^{-1}(m_XY − λ̂_kΣ_ue),    (2.2.15)
Σ̂_xx = m_XX − λ̂_kΣ_uu,
and σ̂_ee = Σ_euΣ_uu^{+}Σ_ue. If λ̂_{k−1} < 1, the maximum likelihood estimator of Σ_xx is singular and the estimator of β₁ is indeterminate. Estimator (2.2.15) is derived in the proof of Theorem 4.C.1.
The estimator of β₁ given in (2.2.12) is the vector analogue of the estimator given in (1.2.3). On the basis of degrees-of-freedom arguments, we suggest that the estimator of σ_ee in (2.2.12) be replaced by
σ̃_ee = s_vv − β̂₁'Σ_uuβ̂₁ + 2Σ_euβ̂₁,    (2.2.16)
where s_vv = (n − k)^{-1} Σ_{t=1}^{n} [Y_t − Ȳ − (X_t − X̄)β̂₁]².
In many practical situations the matrices Σ_uu and Σ_ue are not known but are estimated. For example, it is common for laboratories to make repeated determinations on the same material to establish the magnitude of the measurement error. If the covariance matrices for measurement error are estimated, a more detailed specification of the model than that given in (2.2.1) is required. This is because the random variable e_t entering Equation (2.2.1) may be composed of two parts. The true values y_t and x_t will not be perfectly related if factors other than x_t are responsible for variation in y_t. Thus, one might specify
y_t = x_tβ + q_t,    (2.2.17)
where the q_t are independent (0, σ_qq) random variables, and q_t is independent of x_j for all t and j. The random variable q_t is called the error in the equation. We observe
(Y_t, X_t) = (y_t, x_t) + (w_t, u_t),    (2.2.18)
where (w_t, u_t) = a_t is a vector of measurement errors,
a_t' ~ NI(0, Σ_aa),    (2.2.19)
and a_t is independent of (q_j, x_j) for all t and j. In terms of the original model (2.2.1), e_t = w_t + q_t is the sum of an error in the equation and an error made in measuring y_t. Typically, the variance of q_t is unknown, but it is possible to conduct experiments to estimate the covariance matrix of a_t = (w_t, u_t). Because
w_t and q_t are assumed to be independent, the covariance between u_t and w_t is equal to the covariance between u_t and e_t. Let S_aa denote an unbiased estimator of Σ_aa. Then the estimator of β analogous to (2.2.12) is
β̂ = (M_XX − S_uu)^{-1}(M_XY − S_uw).    (2.2.20)
We use matrices of uncorrected sums of squares and products to define the estimator so that the estimator is conformable with model (2.2.17). The definition of the estimators permits the matrix S_uu to be singular. For example, if the model contains an intercept term, one of the elements of X_t is always one and the corresponding row and column of S_uu are zero vectors. Given the estimated error covariance matrix S_aa, a consistent estimator of σ_qq is
σ̂_qq = s_vv − (S_ww − 2β̂'S_uw + β̂'S_uuβ̂),    (2.2.21)
where s_vv = (n − k)^{-1} Σ_{t=1}^{n} (Y_t − X_tβ̂)².
In Theorem 2.2.1 we give the limiting distribution of the standardized estimators of β and σ_qq under less restrictive conditions than those used to obtain the maximum likelihood estimator. The x_t are assumed to be distributed with a mean μ_xt and covariance matrix Σ_xx, where the covariance matrix Σ_xx can be singular. This specification contains both the functional and structural models as special cases. Under the fixed model, x_t = μ_xt are fixed vectors. Under the random model μ_xt ≡ (1, μ_x), where μ_x is fixed over t, and the lower right (k − 1) × (k − 1) portion of Σ_xx is positive definite. Dolby (1976) suggested that the model with random true values whose means are a function of t be called the ultrastructural model.
To obtain a limiting normal distribution, some conditions must be imposed on the sequence of means of the true values. We assume that
lim_{n→∞} n^{-1} Σ_{t=1}^{n} μ_xt'μ_xt = M̄_μμ    (2.2.22)
and let M̄_xx = M̄_μμ + Σ_xx, where M̄_xx is positive definite. Commonly, the error covariance matrix will be estimated from a source independent of the M_ZZ matrix. To assure that the error made in estimating the error covariance matrix enters the covariance matrix of the limiting distribution, we assume that the error in the estimator of Σ_aa is of the same order as the error in M_ZZ. This is accomplished by assuming that the degrees of freedom for S_aa is nearly proportional to n. Thus, we assume that
lim_{n→∞} d_f^{-1}n = ν,    (2.2.23)
where d_f is the degrees of freedom for S_aa and ν is a fixed number. If the covariance matrix of the measurement error is known, then ν = 0.
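A minimal sketch of the estimators (2.2.20) and (2.2.21) with an estimated error covariance matrix follows. As in the text, uncorrected moment matrices are used so that an intercept can be carried as a column of ones in X, with the corresponding rows and columns of S_uu equal to zero; all inputs are hypothetical.

```python
import numpy as np

def eiv_estimators(Y, X, S_aa):
    """beta-hat of (2.2.20) and sigma_qq-hat of (2.2.21).

    Y    : n-vector of observations on Y_t.
    X    : n x k matrix of observations on X_t (may include a column of ones).
    S_aa : (k+1) x (k+1) unbiased estimate of the covariance matrix of
           a_t = (w_t, u_t), with the error in Y ordered first.
    """
    Y = np.asarray(Y, float)
    X = np.asarray(X, float)
    n, k = X.shape
    MXX = X.T @ X / n                       # uncorrected moments
    MXY = X.T @ Y / n
    S_ww = S_aa[0, 0]
    S_uw = S_aa[1:, 0]
    S_uu = S_aa[1:, 1:]
    beta = np.linalg.solve(MXX - S_uu, MXY - S_uw)
    v = Y - X @ beta                        # residuals v_t-hat
    s_vv = (v @ v) / (n - k)
    sigma_qq = s_vv - (S_ww - 2 * beta @ S_uw + beta @ S_uu @ beta)
    return beta, sigma_qq
```

The sketch does not reproduce the small-sample modification used in Example 2.2.1, in which a multiple slightly less than one of S_uu is subtracted; that adjustment is discussed in Section 2.5.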
Theorem 2.2.1. Let model (2.2.17)-(2.2.19) hold and let (x_t − μ_xt, q_t) be independently and identically distributed with zero mean vector and covariance matrix
E{(x_t − μ_xt, q_t)'(x_t − μ_xt, q_t)} = block diag(Σ_xx, σ_qq),
where {μ_xt} is a sequence of fixed k-dimensional vectors, q_t has finite fourth moments, and q_t is independent of x_t. Let S_aa be an unbiased estimator of Σ_aa that is distributed as a multiple of a Wishart matrix with d_f degrees of freedom, independent of (Y_t, X_t) for all t. Let (2.2.22) and (2.2.23) hold. Let θ = (β', σ_qq)' and let θ̂ = (β̂', σ̂_qq)', where β̂ is defined in (2.2.20) and σ̂_qq is defined in (2.2.21). Then
n^{1/2}(θ̂ − θ) →^L N(0, Γ),
where the submatrices of Γ are
Γ_ββ = M̄_xx^{-1}σ_vv + M̄_xx^{-1}[Σ_uuσ_vv + Σ_uvΣ_vu]M̄_xx^{-1} + νM̄_xx^{-1}[Σ_uuσ_rr + Σ_urΣ_ru]M̄_xx^{-1},
Γ_qq = V{v_t²} + 2νσ_rr²,
Γ_βq = 2M̄_xx^{-1}Σ_uv(σ_vv + νσ_rr),
and σ_rr = σ_ww − 2Σ_wuβ + β'Σ_uuβ.
Proof. Using the fact that X,= x,
+ u,, we may write
y; = X,B + 4, where u, = e,
- u,B.
It follows that
B = (Mxx - S,J=B
W x x S + Mxu - s,, + S U U B - S,,B) + (M,x - Suu)-'C(Mxu - Cur) - (Sur - Zur)],
c:=
where Mxu = n-' X,u,, Zuw- Z,,~ = Zuu.NOW M,, = n -
I,
= w,
- ur/?, Sur= S,, - S,,p, and Cur=
(xix,
+ xIu, + uix, + u)~,)
n
r= 1
and, by the weak law of large numbers,
Consider the linear combination n1'*S'(MxU - Xu,,) = n-'12
C C S,[x,,u, + uriu, - ouuil] 1=1 i = l n
k
(2.2.24) where ouuilis the covariance between tiir and u,, S = (dl, S,, . . . ,dk)' is an arbitrary vector such that 6'6 # 0, and the random variables, k
gt =
i=1
+ utiur - ousill,
Si[xtiut
are linear functions of independent and identically distributed random vectors. The mean of gr is zero and the variance is
Eb:}
= a"(P:IPx*
+L+U
o u u
+ &"Lula*
Therefore, by Lemma 1.C.1, n-1/2
f gr 5 N { O ,s ' M ~ +~ a'(~,,,,o,,~ s ~ ~ ~+ ~
r=1
~ ~ ~ ~ ~ ) t j } .
Because the nonzero S was arbitrary,
+
n1'2(Mxu- XU")5 "0, MxPua ~
+
u u ~ u u~
" U ~ " , ) .
conThe limiting distribution for n l i 2 ( B - p) follows because nl/'(S,,, - Zu,) verges in distribution to a multivariate normal random vector with mean zero and covariance matrix
+L
~ F u u o r r
r L )
and because S,, is independent of Mxv. We have shown that - j = O,(n-'/'), and it follows that
d,, = ( n - k)-'
I=I
u:
- (1, -b')Soo(l, -/?')'
Because S,, is independent of M,,,and
incl,,the
+ O,(n-').
conclusion follows.
0
The variance of the approximate distribution of β̂ can be estimated by
V̂{β̂} = n^{-1}[M̂_xx^{-1}s_vv + M̂_xx^{-1}(S_uus_vv + S_uvS_vu)M̂_xx^{-1}] + d_f^{-1}M̂_xx^{-1}[S_uus_rr + S_uvS_vu]M̂_xx^{-1},    (2.2.25)
where M̂_xx = M_XX − S_uu, S_uv = S_uw − S_uuβ̂, s_rr = (1, −β̂')S_aa(1, −β̂')', and s_vv is defined in (2.2.21). If Σ_uu and Σ_ue are known, the estimated covariance matrix is given by (2.2.25) with d_f^{-1} = 0.
The normality of u_t and the assumption that S_aa is a Wishart matrix were used in Theorem 2.2.1 to obtain an explicit expression for the covariance matrix of the limiting distribution. If M̂_xx^{-1}S_uu is not large, the variance estimator (2.2.25) will be satisfactory for modest departures from normality.
Occasionally, some of the variances and (or) covariances in the matrix Σ_aa are known to be zero, while others must be estimated. A common model in sociology and psychology postulates a diagonal covariance matrix for the measurement error. In most such cases the estimators of the individual error variances will be independent. An estimator of β for the model with diagonal covariance matrix is
β̈ = (M_XX − S_uu)^{-1}M_XY,    (2.2.26)
and the elements of (Suull,Suu22,. . . , where S,, = diag(S,,, Suu22,. . . , Suukk) Suukk)are independent unbiased estimators of (uuull, o , , , ~ .~, ,. , uUukk) distributed as multiples of chi-square random variables with dr!, d,,, . . .,d f k degrees of freedom, independent of (yt, X,) for all t . The estimator of the covariance matrix of the approximate distribution of 8 is
where M,,
a{@}= M;~[n-'(MxxsuU .. .. - + s,,6,,)+ 2&,]fi;J,
= M,,
- S,,, S,, I
= -S,,fl,
k,= diag{d;?~%,ll, d;21&S:uz2,. . .
(2.2.27)
dYk1Bk - 2 s 2uu~d,
and s,, is defined in (2.2.21). Because the knowledge that some covariances are zero is used in constructing the estimator, the variance estimated by (2.2.27) is smaller than the variance estimated by (2.2.25), for comparable degrees of freedom. Example 2.2.1. In this example we consider some data studied by Warren, White, and Fuller (1974). In the original study the responses of 98 managers of Iowa farmer cooperatives were analyzed. We use a subsample of the original data containing 55 observations. The data are given in Table
TABLE 2.2.1. Data from role performance study Observation
Knowledge
Value Orientation
Role Satisfaction
Past Training
Role Performance
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1.193 1.654 1.193 1.077 f ,539 1.385 1.462 1.385 1.539 1.654 1.462 1.154 1.424 1.116 1.270 1.347 1.116 1.077 1.423 0.923 1.385 1.270 1.116 1.346 0.846 1.077 1.231 0.962 1.500 1.577 1.885 1.231 1.808 1.039 1.385 1.846 1.731 1.500 1.231 1.346 1.347
2.656 3.300 2.489 2.478 2.822 3.000 3.111 2.545 2.556 2.945 2.778 2.545 3.61 1 2.956 2.856 2.956 2.545 3.356 3.21 1 2.5f 6 2.589 2.900 2.167 2.922 1.711 2.556 3.567 2.689 2.978 2.945 3.256 2.956 2.81 1 2.733 2.400 2.944 3.200 2.91 1 3.167 3.322 2.833
2.333 2.320 2.737 2.203 2.840 2.373 2.497 2.61 7 2.997 2.150 2.227 2.017 2.303 2.5 17 1.770 2.430 2.043 2.410 2.150 2.180 2.490 1.920 2.663 2.520 3,150 2.297 2.307 2.830 2.731 3.1 17 2.647 2.217 2.321 2.447 2.347 2.410 2.277 2.577 2.507 2.653 2.587
2.000 2.000 2.000 2.000 2.000 2.000 2.667 2.167 2.000 2.167 1.833 2.167 2.333 2.333 2.333 2.000 2.000 2.000 2.000 2.000 2.000 2.333 1.333 2.000 1.500 2.000 2.167 1.333 2.000 2.167 2.667 1.667 2.000 2.000 2.167 2.000 2.000 2.000 2.333 2.500 2.667
- 0.054
19
20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
0.376 0.072 -0.150 0.171 0.042 0.188 - 0.052 0.3 10 0.147 0.005 - 0.088 0.044 - 0.073 0.224 0.103 -0.108 -0.019 - 0.062 -0.239 -0.159 0.069 -0.1 18 0.083 -0.255 -0.159 0.014 0.102 0.109 0.006 0.334 - 0.076 - 0.043 -0.126 - 0.056 0.203 0.023 0.047 0.01 1 0.153 0.100
TABLE 2.2.1.
(Continued)
Observation
Knowledge
Value Orientation
Satisfaction
Role
Past Training
42 43 44 45 46 47 48 49 50 51 52 53 54 55
1.154 0.923 1.731 1A08 1.193 1.308 1.424 1.385 1.385 1.347 1.539 1.385 1.654 1.308
2.967 2.700 3.033 2.91 1 3.311 2.245 2.422 2.144 2.956 2.933 3.41 1 1.856 3.089 2.967
3.140 2.557 2.423 2.793 2.283 2.210 2.350 2.330 2.130 2.837 2.600 2.790 2.500 2.8 13
2.000 1.833 2.000 2.000 2.333 2.000 2.000 2.000 2.000 2.167 2.167 2.000 2.000 2.667
Role
Performance - 0.089
0.007 0.089 0.182 0.259 0.007 - 0.015 - 0.023 -0.150 0.152 0.377 0.043 0.184 0.127
2.2.1. The postulated model is
Y_t = β₀ + Σ_{i=1}^{4} x_tiβ_i + q_t,    (2.2.28)
where y, is the role performance of the tth manager, x1 is knowledge of the economic phases of management, x 2 is value orientation, x 3 is role satisfaction, and x4 is past training. The random variable q, is assumed to be a normal (0,oq4)random variable. Value orientation is the tendency to rationally evaluate means to an economic end, role satisfaction is the gratification obtained from the managerial role, and past training is the amount of formal education. The amount of past training is the total years of formal schooling divided by six and is assumed to be measured without error. Role performance, knowledge, value orientation, and role satisfaction were measured on the basis of responses to several questions for each item. Using replicated determinations on the same individuals, measurement error variances were estimated to be 0.0037, 0.0203, 0.0438, and 0.0180 for role performance, knowledge, value orientation, and role satisfaction, respectively. Each error variance estimate is based on 97 degrees of freedom and it is assumed that Z,,is,,diagonal. This is the model for which estimator (2.2.26)is appropriate. The matrix of mean squares and products corrected for the mean is
i
0.0598 0.0345 mxx = 0.0026 0.0188
0.0345 0.0026 0.1414 -0.0186 -0.0186 0.0887 0.0474 - 0.0099
1
0.0188 0.0474 - 0.0099 0.0728
and
m_XY' = (0.0210, 0.0271, 0.0063, 0.0155).
We compute the estimate of β using the program SUPER CARP. For reasons that will be explained in Section 2.5, the estimator is computed by subtracting a multiple, slightly less than one, of S_uu from the moment matrix. The estimator of β expressed in terms of uncorrected moments is
β̂ = [M_XX − n^{-1}(n − 6)S_uu]^{-1}M_XY
  = [−1.24, 0.36, 0.149, 0.117, 0.040]',
     (0.27) (0.13) (0.094) (0.069) (0.075)    (2.2.29)
where the first coefficient is the intercept and the numbers in parentheses are the estimated standard errors of the coefficients.The sample size is n = 55 and 5 is the number of parameters estimated. The standard errors are obtained from the covariance matrix of expression (2.2.27). In this example the increase in the variance associated with the estimation of the error variances is small. For example, the estimated variance of the coefficient for knowledge, assuming the error variance to be known, is 0.0157. The estimated variance of the knowledge coefficient, under the assumption that the estimated error variance is based on 97 degrees of freedom, is 0.0173. By Equation (2.2.21), an estimator of oqqis Cqq= 0.0129 - 0.0075 = 0.0054,
x:=,
where s,, = 0.0129 and S,, + in the proof of Theorem 2.2.1,
Suuiiflf = 0.0075. By the arguments used
(2.2.30) and an estimator of the variance of the approximate distribution of Gqq is
= (0.00266)2.
(2.2.31)
The estimated standard error for Cqq is about one-half of the estimate. An alternative test of the hypothesis that nqq= 0 is developed in Section 2.4 and illustrated in Example 2.4.2. 00
2.2.2. Estimation of True Values
In Section 1.2.3 we obtained, for fixed x_t, the best estimators of the x_t using the model information. We now extend those estimators to the model of this section. We assume model (2.2.17)-(2.2.19) and, as usual, we let v_t = e_t − u_tβ and a_t = (w_t, u_t). In the model (2.2.17) the true y_t differs from x_tβ by q_t. Hence, it is of interest to estimate both y_t and x_t. To estimate (y_t, x_t), we predict the measurement error vector (w_t, u_t) and subtract the predictor of (w_t, u_t) from (Y_t, X_t). Under normality, the best predictor of (w_t, u_t), given v_t, is
(w̃_t, ũ_t) = v_tδ',    (2.2.32)
where δ' = σ_vv^{-1}Σ_va. It follows that the best estimator of (y_t, x_t), treating x_t as fixed, is
(ỹ_t, x̃_t) = (Y_t, X_t) − v_tδ'.    (2.2.33)
For the estimator of z_t constructed with estimated model parameters, we let
ẑ_t = (ŷ_t, x̂_t) = (Y_t, X_t) − v̂_tδ̂',    (2.2.34)
where δ̂' = σ̂_vv^{-1}Σ̂_va, Σ̂_va = (1, −β̂')Σ_aa, and σ̂_vv = (1, −β̂')M_ZZ(1, −β̂')'.
An estimator of the covariance matrix of the error in the approximate distribution of ẑ_t is
V̂{ẑ_t | x_t} = Σ_aa − σ̂_vvδ̂δ̂'.    (2.2.35)
The estimator (2.2.35) ignores the contribution of estimation error in β̂ and δ̂ to the variance. An estimator of the variance of (ŷ_t, x̂_t) that contains order n^{-1} terms associated with parameter estimation is given in (2.2.41) of Section 2.2.3.
The matrix I
-
0.712 -0.356 0.103 0.043 0.034
0.0 0.564 0.507 0.164 0.627 0.204 1.0 0.698 0.0 0.799 -0.181 -0.059 0.925 -0.024 0.0 -0.084 0.981 0.0 -0.066 -0.059 - 0.011 0.0 -0.022 -0.020 -0.007
0.0 0.0 0.0 0.0 0.0 1.0
TABLE 2.2.2. Estimated true values for role performance data
i Observation
1 2 3 4 5 6
7
8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
B
Knowledge
Value Orientation
Role Satisfaction
Role Performance
0.010 0.182 0.1 14 - 0.003 0.028 -0.018 0.043 - 0.080 0.188 0.0 19 - 0.026 0.037 -0.126 -0.061 0.284 0.056 0.034 - 0.027 -0.141 - 0.046 -0.172 0.105 0.034 0.03 1 - 0.003 - 0.034 - 0.074 0.212 - 0.032 - 0.208
1.198 1.756 1.257 1.075 1.554 1.374 1.486 1.340 1.644 1.664 1.447 1.175 1.353 1.081 1.430 1.378 1.135 1.062 1.344 0.897 1.288 1.329 1.135 1.363 0.845 1.058 1.189 1.081 1.482 1.460
2.661 3.392 2.546 2.476 2.836 2.991 3.133 2.504 2.65 I 2.954 2.765 2.563 3.547 2.925 2.999 2.984 2.561 3.342 3.140 2.533 2.502 2.953 2.184 2.938 1.710 2.538 3.529 2.796 2.962 2.839
2.335 2.350 2.755 2.203 2.845 2.370 2.504 2.604 3.028 2.153 2.222 2.023 2.283 2.507 1.817 2.439 2.049 2.406 2.127 2.173 2.462 1.937 2.669 2.525 3.150 2.291 2.295 2.865 2.732 3.082
- 0.057
0.324 0.039 -0.149 0.163 0.047 0.176 - 0.029 0.256 0.142 0.012 - 0.099 0.080 - 0.056 0.142 0.087 -0.118 -0.011 - 0.021 -0.226 -0.110 0.039 -0.128 0.074 - 0.254 -0.149 0.035 0.04 1 0.1 I8 0.066
where s_vv = 0.0129,
(1, −β̂') = (1, 1.24, −0.36, −0.149, −0.117, −0.040),
(σ̂_wv, Σ̂_uv') = (0.0037, 0.0, −0.00731, −0.00653, −0.00211, 0.0).
The estimated y and x values for the first 30 observations are given in Table 2.2.2. Because xo = X , = 1 and because past training is assumed to be measured without error, estimates for xo and x4 are not given in Table 2.2.2. The estimated covariance matrix of the estimator of z, defined
in (2.2.35)is
-
0.0026 0.0 0.0
0.0
0.0021 0.0019 0.0006 0.0
0.0 0.0 0.0 0.0
0.0021 0.0019 0.0 0.0 0.0162 -0.0037 -0.0012 0.0 -0.0037 0.0405 -0.0012 -0.0011 0.0177 0.0 0.0 0.0 0.0 0.0
The quantity 17, = pixfiis also given in Table 2.2.2. Under the model, u, and 1, of (2.2.33)are uncorrelated. Therefore, as with the simple
FIGURE 2.2.1. Plot of residual against estimated true value for value orientation.
model of Section 1.2, plots of 6,against the elements of fit should be constructed as model checks. Figure 2.2.1 contains a plot of 6,against the estimated true value for value orientation, ?,. This plot gives no reason to reject the assumptions of linearity and normality. The plots of 6,against the other estimated true values also gave no reason to reject the original model. Figure 2.2.2 contains a plot of the ordered G, against the expected value of the normal order statistics for a sample of size 55. The KolmogorovSmirnov statistic for the test of normality has a significance level of 0.084 when computed under the assumption that the 6, are a simple random sample of size 55. Given that the 6,are based on estimated parameters it
seems that one would be willing to use calculations based on the normal model.
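The normal-probability plot and the Kolmogorov-Smirnov check described in the example can be carried out with standard tools. A minimal sketch follows; the residual vector `v_hat` is a hypothetical input, and, as noted above, the nominal significance level is only approximate because the v̂_t depend on estimated parameters.

```python
import numpy as np
from scipy import stats

def normal_qq_and_ks(v_hat):
    """Normal order-statistic coordinates (Blom approximation) for a normal
    probability plot, and an approximate Kolmogorov-Smirnov test of
    normality for the residuals v_hat."""
    v_hat = np.sort(np.asarray(v_hat, float))
    n = len(v_hat)
    q = stats.norm.ppf((np.arange(1, n + 1) - 0.375) / (n + 0.25))
    ks = stats.kstest(v_hat, 'norm', args=(v_hat.mean(), v_hat.std(ddof=1)))
    return q, ks.statistic, ks.pvalue
```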
2.2.3. Higher-Order Approximations for Residuals and True Values The estimator of the variance for the estimated x, given in (2.2.35) ignored the increase in variance of the estimator that comes from replacing parameters with estimators. In this section, we develop an estimator for the variance of the approximate distribution of 2, that contains an estimator of the contribution to the variance arising from the estimation of the parameters. We retain model (2.2.17)-(2.2.19) with the assumption of normal qr and the assumption that the covariance matrix Z,, is known. Assume that we have completed estimation for a sample of size n so that we haveTstimators of /?,oq4,and o,, and an estimator of the covariance matrix of fl. We wish to estimate the vector of true values (y,, x,) and the covariance matrix of the estimation error, treating x, as fixed. Let
-b') 8' = &;leva
a^' = (1,
and a' = (1, -/?'), and 6' = (T;'Z,,, and u, = Z,a = eta,
i?, = Z,d
' c:=
utl, 8, = (w, + q,, u,) = (el, u,),
where Z, = (X,X,),a, = (w,, a*'&,,, a, = n i?;, and The estimator of z; is
2; = z; - 60, = Z; - 6u, - (,$ - S)U, where
S^ - s = a,['(~,
- 26E,,)(oi
Therefore,
&a
= a'z,,,
e,, =
fl is defined by (2.2.20) with Z,, replacing So,.
+
- SZ,(&- a)
+ oP(n-'),
- a) - ~(m,,, - o,,)]
(2.2.36) (2.2.37)
+ op(n-').
2; - z; = Z; - - [Sz, utou;'(Z,, - 6Z,,)](& - a) v,o,'6(mu, - o,,) O,(n- 1) = 2; - 2; [SZ, u,(J;'(E,, - SZ,,)][O, (M,X - Zuu)M,']' (2.2.38) v , ~ , ' ~ ( m ,, o,,) Op(n-
+
+
+
+
+
+
'x
where z, = Z, - ~ ~ 6 ' . We drop the term of O,(n-') in (2.2.38)and evaluate the covariance matrix of the leading terms. The random vector 2, - 2, = E, - U,s' (2.2.39)
is independent of v, and it follows that the covariance between e, - 0,s' and the remaining 0,(n-''2) terms of (2.2.38)is zero. The only cross product term
Therefore, the covariance matrix of the approximate distribution of 2, - z, is Xaa -
+ 6d'E{~[v,,z;}+ c;uyz,,
- SzU,)v,,(Xa0 - dXuO) - SC,,)(O, ZuuMi:)'Xu,
+ 2(n - k)-'66'0,, + 2n-'0;~'[(&, + 2=uuM,')(C,, - G,)l,
(2.2.40)
%io9
where V,, is the covariance matrix of the approximate distribution of ai and M,, is the matrix of mean squares and products of the true x,. Because the first row and column of V,, contain only zeros, zv z = g v 21 t
aa I
f
pp
17
b.
where V,, is the covariance matrix of the approximate distribution of All terms in expression (2.2.40)have sample analogues. Therefore, an estimator of the variance of the approximate distribution of 2, - z, is
-
V(2, - z,} = c,, - 6Zu,
+ d,yC,,
+ 2n- l[(X,,
va,
+ 8S11[%,Vp,%;+ 2(n - k)- 'auu]
- &,)vUa(C,, - &),
- s^2,,)(0,ZuuM;;y&
+ &o, 2uuM;:)(L,
v,,)
e,,
-
&,)I,
(2.2.41)
where = block diag(0, and is >he estimator of the covariance matrix of the approximate distribution of p. The last three terms of (2.2.41) are products of three matrices, where two of the matrices are matrices of error variances and the third is a function of the inverse of an estimator of Mxx.Therefore, the last three terms of (2.2.41) will be small relative to the other terms when the error variance is small relative to the variation in x,. An approximation to the first three terms of (2.2.41) is ${il - zl} = c,, - n-'(n - k - 2)SC,,. * A
(2.2.42)
To develop higher-order approximations for the predictor constructed , x, = (1, x , ~ and ) xllis under the structural model, we let ztl = (y,, x , ~ ) where the (k - 1)-dimensional vector that is the random portion of x,. Then a predictor of z , ~is Z,, = Z1 + (Zt1- %)(I - C m i A I L l l ) ,
(2.2.43)
where c = (n - l)-'(n - k - 2), inzz1 is the sample moment matrix of ZI1, CIIoI1is the covariance matrix of aI1,and a,, is the error associated with ztl. The predictor is an estimator of the conditional expectation of z,, given ZI1. Under the normal distribution assumption, an estimator of the variance of Z,, - zI1is W I l
- Z I l > = CPclll
- C2~oa11~i;lI~.P11.
(2.2.44)
The multiplier c appearing in expressions (2.2.43) and (2.2.44) is based on the work of Fuller and Harter ( 1987). One can also use Taylor series to derive the covariance matrix of the approximate distribution of the residuals fi,. Let 9 denote the n-dimensional column vector $ = (i?,, 8,, . . . , 8J = Y - xg, where Y = (Yl, Y 2 , .. . , Y")', and X' = (Xl, Xi,. . . , Xk). By our usual expansions,
X(/ - /I) = v - XM,'(M,, -q+ o&+).
0 =v
-
(2.2.45)
Now, for fixed x,, tr{[~;~tcvu +( where aj, is Kronecker's delta. It follows that
- %J} =
E{ujXW,'(M,u
E { [ v - n-'XM,'(M,,
- Cuu)][V
= InCcuu
- n-'XM,'(M,,
- 2n-I t r { ( L e u u
L c u u
+ ~:uFuuPjr]Mxxl},
- ZuU)]']
+ L&w)ML'} + tr{LVpp}]
- ~ ~ - ' x M , L x ' o ,+, n-'xVpp~' + O(n-2).
The quantity of most interest for model checking is the vector of standardized residuals s~;"~$. Using - 112 = A - 1 / 2 - LA - 3/2 uu
SUU
= A,"2[ 1
2 uu [- 2 ~ u u ~ , 1 ~ u u-( Bn1 + O,(n- e,'C,,(j - /?)I + O,(n- I),
we have [v - X(B - /91[1 - G 1 & w ( B - n1 + O p ( n - 7 Aiu1'2[I,- n-'SM,'f']v Op(K1), (2.2.46)
- 1/29 = A-
suu
uv
=
1/2
+
where x = X - vc;,,lCUu, and A,, = ( n - k ) - ' ~ ' ( 1 ~n- 1X'ML:X')2~.
In ordinary least squares estimation, with x observed, it is common practice to estimate the covariance matrix of the residuals with [I, - x(x'x)-'x']s2,
where I_n is the n-dimensional identity matrix and s² is the residual mean square. By (2.2.46), the analogous estimator for the measurement error problem is
[I_n + x̂(s_vv^{-1}V̂_ββ − 2n^{-1}M̂_xx^{-1})x̂']s_vv,    (2.2.47)
where x̂ is the n × k matrix of estimated true values and nM̂_xx is the estimator of Σ_{t=1}^{n} x_t'x_t. Expression (2.2.47) for the approximate covariance matrix of the v̂_t contains terms that are not present in the covariance matrix of ordinary least squares residuals. Nonetheless, the covariance matrix
[I_n − x̂(x̂'x̂)^{-1}x̂']s_vv
furnishes a useful approximation in many applications. For example, if one has available a program for diagnostic checking for ordinary least squares, the v̂_t and x̂_t can be read into the program as the "dependent variable" and the vector of "independent variables," respectively. The resulting diagnostics can be treated, approximately, in the same manner as one would treat the statistics for ordinary least squares.
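A sketch of this approximation is given below: the estimated true values x̂_t play the role of the regressors of an ordinary least squares problem, and the residual covariance matrix is approximated by [I_n − x̂(x̂'x̂)^{-1}x̂']s_vv. The inputs are hypothetical, and the standardized residuals returned are the quantities suggested in the text for model checking.

```python
import numpy as np

def residual_diagnostics(v_hat, x_hat, s_vv):
    """Approximate covariance matrix [I - xhat(xhat'xhat)^{-1}xhat'] s_vv for
    the residuals, and each residual divided by its estimated standard error."""
    v_hat = np.asarray(v_hat, float)
    x_hat = np.asarray(x_hat, float)
    n = len(v_hat)
    H = x_hat @ np.linalg.solve(x_hat.T @ x_hat, x_hat.T)   # hat matrix of xhat
    cov = (np.eye(n) - H) * s_vv
    std = np.sqrt(np.diag(cov))
    return cov, v_hat / std
```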
Example 2.2.3. We use the data and model of Example 1.2.1 to compute estimates of the variances of 2,and 9,. From Examples 1.2.1 and 1.2.2, we have n = 1 1 , b, = 0.4232, uuu= 57,8,, = 53.4996,8, = 247.8545, and 8,, = - 24.1224. We assume that the measurement error made in determining yield for a sample of plots within the field is uww= 25 and we also assume that uuw= 0. It follows that A-18’ = oliu Xua = 8 u i 1 ( ~ u0,w8,J , = (0.4673,0, -0.4509),
1.
where the zero element in the vector is for the intercept. The first-order approximation to the covariance matrix of (5, - y,, I, Zr - x,) is
c,, -sc,, *-
=
[
13.3177 0 11.2722 0 0 0 11.2722 0 46.1235
For a model with an intercept, the computations are somewhat simplified and are less subject to rounding error if all independent variables, except the variable that is identically equal to one, are coded as deviations from the mean. If we let
Z, = ( x , 1, X , - 70.6364),
then d’ = ( 1 , -97.4545, -0.4232), nMxx= diag( 1 1 , 2478.545), and
quu= diag(0, 4.8636,0.0304).
TABLE 2.2.3. Order approximations to the variance of ( i f$,, $) for corn yield and soil nitrogen
Site
i ,- R
1 2 3 4 5 6 7 8 9
- 5.68
10
11
29.24 - 17.64 - 10.54 25.37 - 8.28 - 16.00 - 1.17 19.60 1.62 - 16.55
Wf - x,)
W f- Y J
P{0,)
51.5 56.6 53.2 52.0 55.3 51.7
16.8 22.2 18.6 17.3 20.9 17.0 18.3 16.6 19.1 16.6 18.4
54.2 43.7 50.7 53.2 46.4 53.8 51.4 54.6 49.7 55.2 51.7
52.9
51.3 53.7 51.3 53.0
[P{O f } ] - "20,
- 1.52
0.97 0.00 - 1.19 0.33 - 0.50 1.43 -0.16 - 1.18 0.97 0.95
From (2.2.41), the sum of the terms in the covariance matrix of the approximate distribution of if- zf that are constant over t is 15.5213 0
I
8.9926
(2.2.48)
[ O8.9926 0O 50.3172 0
The estimator (2.2.41) is the matrix (2.2.48) plus &i,?,,,ii.Because the row and column of V { i , - z f } associated with the intercept will always be zero, one can easily restrict the computations to 2 x 2 matrices. Of course, i,q,,ii must include the terms for the intercept. We have carried out the computations for the full 3 x 3 matrix for illustrative purposes. The estimated variances of 22, - x, and j , - y, are given in Table 2.2.3. The average increase in the estimated variance of 2, - x, due to estimating the parameters is 6.8, an increase of about 15%. The average increase in the estimated variance of 9,- y, due to estimating the parameters is 5.0, an increase of about 38%. The largest percentage increase in the estimated variance of 9, - yf is about 67% and occurs for the second observation, the observation with the largest absolute value of 2f - s. One should remember that this is a small sample and, hence, the estimation effect is relatively large. Expression (2.2.47)for the estimated variance of 9 can be written as a function of the original X values or as a function of deviations from the mean. Using deviations from the mean, we have
q { 9 } = (59.4440)11 + i(e,, - 2n- 'M;~6,,)9
where the tth row of 2 is (1, if- 70.6364) and s,, = 59.4440. The 2, -2 and the diagonal elements of q{$}are given in Table 2.2.3. The last column
of Table 2.2.3 contains v̂_t divided by the estimated standard error of v̂_t. These quantities can be used in place of v̂_t for model checking.
REFERENCES Anderson and Rubin (1956), Fuller (1975, 1980), Kalbfleisch and Sprott (1970), Kiefer and Wolfowitz (1956), Morton (1981a).
EXERCISES 3. (Sections 2.2.1, 2.2.2) Using the data of Table 2.2.1 and the associated error covariance matrix, estimate the parameters of the model
Estimate x, = ( x I Ixf2,xI3) , for t = 1, 2, . . . , 20 treating xI1 as fixed. Give the estimated covariance matrix for your estimates of x f I . Predict the true values treating x,, as random. Give the estimated covariance matrix of the prediction error defined in (2.2.44). 4. (Section 2.2.1) Show that the likelihood in (2.2.4) increases without bound as uee-+ 0. (Hint: Fix ueeand maximize (2.2.4).) 5. (Section 2.2.3) Using expression (2.2.45), show that the covariance matrix of the approximate distribution of (I?,, uIj) is Using this expression for cov{u*,, ulj} and expression (2.2.46), construct an alternative estimator for x,. 6. (Section 2.2.1) A random sample of 40 of the plots used in the soil moisture experiment described in Example 2.3.2 gives the sample moments
(my,,mXY,m x x ) = (0.5087,0.1838,0.6006). Assume that these data satisfy the model
Estimate ooe,uxx.and PI. Give the estimated covariance matrix for the approximate distribution of the estimators. 7. (Section 2.2.1) (a) Prove (2.2.7) from formulas (2.2.5) and (2.2.6). (b) Verify the formula for E { g : } that follows (2.2.24). (Hint: First show that x, and u, are uncorrelated under the conditions of, Theprem 2.2.1.) 8. (Section 2.2.3) Verify the formula for 6 - S that follows (2.2.37). (Hint: First show that o,, = a'Z,,a = a'T,,a under the assumptions of the model.) 9. (Section 2.2.1) The estimator of@given in (2.2.12) was obtained by maximizing the likelihood for normal x, and is a method of moments estimator for any random x,. Let Zeu= 0 for model (2.2.I).
(a) Show that minimizes
of (2.2.12)(or $ of (2.2.20) with S,, = EuM and S,, = 0) is the value of /I that
Q(B) =
n
1 (E: - X,/I)’ - n S ’ L b .
1=1
Does the function have a minimum for all values of X, and Zuu?Explain. (b) Show that the minimum over 6 of E {,=, [(
- X,6)’
- 6’Luu6]]
occurs with 6 = /Io,where /?’ is the population parameter, and that the minimum of the expectation is nu,,.
2.3. THE MODEL WITH NO ERROR IN THE EQUATION In this section we consider maximum likelihood estimation of the errors-invariables model when the entire error covariance structure, including u,,, is known or is known up to a scalar multiple. The model is the extension of the model of Section 1.3 to a model with a vector of explanatory variables. 2.3.1. The Functional Model We first derive the maximum likelihood estimator treating the vector of true values x, as fixed in repeated sampling. The model is
(2.3.1) for t = 1,2, . . . , n, where {x,} is a sequence of fixed k-dimensional row vectors and 8, = (e,, u,)’ is the vector of measurement errors. We can also write the defining equations (2.3.1) as Z, = z,
z,a = 0,
where z, = (y,, x,), Z, = (y, XI), and a’ = (al, a*, . .
+ a,,
(2.3.2)
. ,a k + l )= (1, -p).
-
Theorem 2.3.1. Let model (2.3.1) and the associated assumptions hold. Let 8, NI(0, C,,), where ZRE= Y,,02 and Ye, is known. Then the maximum likelihood estimators of fl and u2 are
S = ( ~ x -x IYJ- -
6 :
= (k
+1)4,
IYue),
2.3. THE MODEL WITH NO ERROR
IN THE EQUATION
where M,, = n-' E;= ZiZ, and
2 is the smallest root of
125 (2.3.3)
IM,, - AY,,J = 0.
The maximum likelihood estimator of z,, t = 1, 2,. . . , n, is
2, = Zr
-
(Y, - X,B)[(l, -&Y,,(l,
-/?)Ye,.
-)')']-'(l,
(2.3.4)
Proof. We derive the estimator under the assumption that Y E ,is nonsingular. The density for e, is
127~r,,0~1 -
exp{ - ( 2 0 ~ ) '(&,re; I&;)}.
(2.3.5)
Because the z, are fixed, the logarithm of the likelihood for a sample of n observations is log L = - 2-
1,
iogj2nr,,021 - (2a2)- 1
c ( z , - z,)rE; yzt- z,y.
r= 1
(2.3.6)
We first maximize (2.3.5) with respect to z,, t = 1,2,. . . , n, for a given / I This reduces to the least squares problem of minimizing
( X - XrS, Xr - xr)yL1(T - xrB x, - Xr)' 9
(2.3.7)
with respect to x,. It follows that the least squares estimator of z; is 2;
Ik)yi '(b?Id'] = Z; - Y,,a(a'r,,a)- 'a'z;. = (B, I k ) ' [ ( B ?
'(fly
lz;
(2.3.8) (2.3.9)
Substituting expression (2.3.8) into (2.3.5), we have
1% L = - T ' n log)2nr,,aZI - (2a2)-'
n
1 Z,a(a'Y,,a)-'a'ZI.
I =1
(2.3.10)
Therefore, the a that maximizes (2.3.6) is the vector that minimizes
n-
' ,C= I (a'Y,,a)-'a'ZiZra= (a'Y,,a)n
'a'M,,a,
(2.3.11)
'
where we have introduced the factor n - for notational convenience. By Corollary 4.A.10 of Appendix 4.A, the a minimizing (2.3.1 1) is given by
( M -~Ire,)$ ~ = 0,
(2.3.12)
where 2 is the smallest root of (2.3.3). The smallest root 2 is also the minimum value for the ratio (2.3.1 1)
I = (oilr,,a*)loil~,,a*.
(2.3.13)
126
VECTOR EXPLANATORY VARIABLES
With probability one, the rank of M,, - ITe& is k and a* is determined up to a multiple. Using a‘ = (1, -/3’), the estimator of /? is
b = W x x- ~ r u u ) - l ( M -x yATue),
(2.3.14)
4, = Z,[I - k($rEeOi)idjlreEl.
(2.3.15)
where Tee, re,,and Y,, are the submatrices of re,,and M,, - AT,, is nonsingular with probability one. See the discussion in the text below for singular Y E & . By (2.3.9), the maximum likelihood estimator of z, is Differentiating (2.3.5) with respect to a2, we obtain
and the maximum likelihood estimator of
= (k
+1 ) 4 ,
.
where 2 is defined in (2.3.13) and (2.3.3).
0’
is
(2.3.16)
0
Observe that the maximum likelihood estimators of a and z, given in Theorem 2.3.1 would be the same if u2 were known. That is, the estimators of a and z, are the same whether ZeEor Teeis known. In deriving the maximum likelihood estimators we assumed ZeEto ke nonsingular. This was an assumption of convenience. The definition of /I given by Equation (2.3.14) does not require Teeto be nonsingular, provided m,, is nonsingular. Likewise, the estimator of z, defined in (2.3.15) is the maximum likelihood estimator for singular TEE. If Oi is a consistent estimator for a, then the maximum likelihood estimator of u2 will be a consistexit estimator for (k + 1)-’a2. As in Section 2.2, maximum likelihood fails to produce consistent estimators of all parameters for the functional model. The problem is essentially a “degrees of freedom” problem. We estimate (nk k ) parameters, but there is no adjustment in the maximum likelihood estimator for this fact. The estimator (2.3.14) of /3 has considerable appeal and we shall demonstrate that it possesses desirable large sample properties. In line with our earlier results, a reasonable consistent estimator of a’ is 8’ = ( n - k)-’nI. (2.3.17)
+
2.3.
127
THE MODEL WITH NO ERROR I N THE EQUATION
In Theorem 2.3.2 we demonstrate that the estimator of /l is approximately normally distributed if the error variances are small relative to the variation in x, or (and) if the sample size is large. To accommodate the two types of limits, the sequence of estimators is indexed by v , where v = n,T,, n, is the sample size, and the error covariance matrix for a particular v is a fixed matrix multiplied by T,-'. It is assumed that the sequences {n,} and {T,} are nondecreasing in v . Henceforth, we shall suppress the subscript v on n and T. In practice, we often have an estimator of XEe,rather than knowing the matrix YE,.We give the limiting distribution of the estimators for such a situation, and the estimation of the error covariance matrix contributes a term to the covariance matrix of the approximate distribution of
b.
Theorem 2.3.2.
Let
+
k; = x,B + e,, (e,, u,)'
-
X, = x, u,, NI(0, T - 'a),
for t = 1, 2, , . . , n, where R is a fixed positive semidefinite matrix and { x t } is a fixed sequence of k-dimensional vectors. Let n 2 k 1, T 2 1, v = nT, and lM,,l > 0 and let lim Z = p,, lim M,, = M,,,
+
v+ w
v+ m
where M,, is a positive definite symmetric matrix. Let S,, be an unbiased estimator of R, where S,, is the lower right k x k portion of S,,. Let S,, be distributed as a multiple of a Wishart matrix with d, degrees of freedom, independent of XI) for all t. Let d, 2 n t , , where tl is a fixed positive number. Let ) = (Mxx - AT-'SJ1(MXy - jT1 Sue),
(x,
where 1is the smallest root of Then, for n > k
+ 1,
IM,, n(n - k)-
(2.3.18)
- I.T-'S,J = 0.
'1= F + O,(n-'l2v-
'I2
1 3
(2.3.19)
where F is a random variable distributed as Snedecor's F with n - k and d, degrees of freedom. Furthermore, as v
+
co,where
r; 1 / 2 ( p "
- p) 5 N(O, I)
and, for example, a",,= T - '( 1, - /?')a( 1, - /?')'.
128
VECTOR EXPLANATORY VARIABLES
+
+
Proof. Now, for example, M x x i j= MXxij + Mxuij Mxuji MUuij, and V{MXUij} = n-~MxxiiOuujj = (nT)-’Mxxiimuuij = O(v-’), V{MUuij}= n - ’ ~ - 2 ( m u u i i q , u j j miuij) = O(v-’T-’),
+
where is the ijth element of R,, and R,, is the lower right k x k portion of R. I t follows that Mzz = M,, ZEt+ O,(V- ‘I2),
+
where Zee= T-’R. Because 1is the minimum root of (2.3.18), we have
X = T[(1, -&te(l,
-F)’]-’(l, -F)MZz(l,
6 TM”,[U, -P)See(l,
-P)’]-’.
-b)’
(2.3.20)
The ratio on the right side of the inequality in (2.3.20) is distributed as an F random variable with n and d, degrees of freedom. Hence, X is bounde: in probability, The root is a continuous function of the elements of 8 = [(MZz,See)- (Mzz ZfC,R)] with continuous first derivatives in a region about B = 0. Because B = OP(n-’l2),it follows that
x
+
1- 1 = O,(n-’j2), (2.3.21) where the result holds for fixed n because 1is bounded in probability. From
the definition of
S,
S - /I= (Mxx - XT-’S,,)-’[Mxu - -IT-’(S,, - S,,p)] = M,’[M,, + Mu, - T-’(Se, - S,,/?) - (X - l)Zuu]+ O,(V-’) (2.3.22)
= O,(v - ‘/2),
where we have used
(X - I ) T - ’ S ~=~O , ( n - l / Z T - l Mxx - XT-’S,, = M,, + O , ( V - ” ~ ) , Mx, - (1- l)T-’(Sue- S,,P) = M,, - (X - l)& + OP(v-’). By the definitions of x and S, 19
I - 1 = TW,, - 2 m x r + B’MXXS) - 1 see- 2B’s,,
+ B’s,,B”
T[M,, - s;, - 2(S - B)’(Mx, + (S - B)’(Mxx - S,,)(S T[%”- 2(S - B)’SUU+ (S - nrLcS - n1 = S;;’[M,,- SVu- 2(8 - /I)‘(Mx, - S,,,,) (S - /I)’(Mxx - 8,,)(8 - j3)] O , ( V - ” ~ ~ - ” ~ ) ,
kJ)
-
where
s,,
+
= T-’S,,,
+
s,, = T-’(S,, - SJ),
and
.?, = T-’(l, -fl’)SEe(l,-/?’)’,
n1
2.3.
129
THE MODEL WITH NO ERROR IN THE EQUATION
By the second expression of (2.3.22)and the properties of the sample moments,
j- p = M,'M,, + o,(v-1 / 2 ~ M,, - S,, = M,, O,(V-"~T-'/~), (Mxx - I$,,)-'(Mxx - S,,,) = I + O,(V-"~T-~~~).
-
19
+
It follows that and
(B - /?)'(M,, - s,,,)($- B) = M,,M~,'M,, + O,(V-'T-"~ 1 1= CL1(A4,,,- M,,M,'M,,)
+ O,(V-'/*~-'/~ 1.
(2.3.23)
By assumption, the numerator and denominator of (2.3.23) are independent, and d p i ' & , is distributed as a chi-square random variable with d , degrees of freedom. The limiting distribution of n(n - k)-'1 follows because (n - l)cru~l(Muu - M,,M~'M,,) is distributed as a chi-square random variable with n - k degrees of freedom. If T increases without bound, then
) - fl = M,'Mx,
+ o,,(v-~/').
Because x, is fixed, [V{Mxu)]-'~2Mx,,is a N ( 0 , I) random vector for all v. Now VIM,,} = T-'n-'MxX(l, -/3')f2(1, -p)'
and r, = V{M~,'Mx,,}+ O,,(v-'T-'). Therefore, the limiting distribution of r; ' l 2 ( f i - 8) is established for increasing T. If n - l = O(v-'),we write n'/2(j -
fl) = n-'/2M-1 f g, - n ' / 2 Mxx- l ( S u v xx I =1
L U ) +q n -
~ V V S V V
1/2)7
where g, = X;u, - C,, - gT,t(u: - o,,JC,,~ and g, is independent of S,,. The - p) then follows by the arguments of Theorem limiting normality of r; "'(/I 2.2.1. If C2 is known or known up to a multiple, Theorem 2.3.2 holds with d; = 0. If C Z is known up to a multiple, it follows from Theorem 2.3.2 that the estimator of g 2 given in (2.3.17)is approximately distributed as a multiple of a chi-square random variable with n - k degrees of freedom. If the error variances become small as v -+ CO, then one could normalize - /? with V(M;x'M,,.) to obtain the limiting distribution. From (2.3.23),
X
-
1 = a,;;,'[M
-
M,jxMLl'My,, - S,,,,]
+ O,(n-')
130
VECTOR EXPLANATORY VARIABLES
and substituting this expression into (2.3.22),we obtain
j-
= M;.[M,,
- S,,, - C , , , O ; ~ ( M , ,-~ SJ
+ OP(n-').
It is because r, is the variance of the leading term on the right side of the equality that we prefer r, to V{M,'M,,} as the variance of the approximate distribution of j,even for small error variances. On the basis of Theorem 2.3.2 we may use the variance expression of the approximate distribution for samples in which the variances of the elements of the difference
Mxx - (Mxx + Cuu) are small relative to the diagonal elements of M,,. This will be true if n is large so that the variances of the elements of M,, and Mu, are small. Likewise, the variances of the elements of M,,, and Mu, will be small if &,, is small (7'large). Theorem 2.3.2 gives the distribution of only for the null model. From expressions (2.3.10) and the fact that the sample moments converge in probability, we see that a test based on a" will have power against models in which the relationship between y, and x, is not linear. An estimator of the covariance matrix of the approximate distribution of j is
x
q{$}= n-'M;:buu where
+ (n-' + d;')M;J(~,JUu
e,, = ( n - k + df)-'[n(Mz,
- ~ U u ~ u u ) M , (2.3.24) ',
- Mzz) + dfS,,],
Mzz= (j, IYMxx(/t I), A,, = &(Mzz - See)H2, H, = (0, Ik)' - (1, -B"')'[(l, -jr)S,,(l, -B"'Y]-'(l,
-jF)S,,,
e,, e,, e,,fl.
and = The theoretical basis for the estimator v { j } is given in Theorem 4.1.5. The estimator of C,, is derived in Theorem 4.1.2. Example 2.3.1. The data in Table 2.3.1 are the means of six trees for log apple crop, log growth in wood, and log growth in girth of Cox's Orange Pippin apple trees for 5 years. Different trees were observed in each year. The data were collected at East Malling Research Station and are taken from Sprent (1969, p. 121ff). Sprent (1966, 1969) suggested that log crop (y), log extension wood growth ( x ~ )and , log girth increment (xz) would satisfy
2.3.
THE MODEL WITH NO ERROR IN THE EQUATION
131
TABLE 2.3.1. Crop, wood growth, and girth increment of apple trees Log Crop (Y)
Log Wood (XI)
Log Girth
Year 1954 1955 1956 1957 1958
1.015 1.120 1.937 1.743 2.173
3.442 3.180 3.943 3.982 4.068
0.477 0.610 0.505 0.415 0.610
(X,)
Source: Sprent (1969, p. 122).
the model Yr = So
+ S l X r l + P2xt2.
(2.3.25)
The model was put forward on the basis that reproductive and vegetative growth tend to balance each other. We assume that
(Y,
1, Xt1, Xr2) = (Yr, xzo, xr1, xr2) + (er, uto, url, u t A e; = (er, uro, u,1, u,J' "h&,I.
-
The trees are relatively young and the crop is increasing. It seems most reasonable to assume the xri to be fixed. The estimator of ZEEis the matrix that is one-sixth of the sample covariance matrix obtained by pooling the five covariance matrices for the variation among trees within years. This estimator is
r
see
I
0.5546
o
0 0 0
= -:.I079
L-0.0691
-0.1079
-0.06911
0 0.2756 0.1247
O 110-2. 0.1247 0.08781
There is evidence that the within-year covariance matrices are not equal, but for purposes of this example we assume that they are equal and that S,, is distributed as a multiple of a Wishart matrix. The moment matrix for the yearly means of Table 2.3.1 is 2.75958 1.59767 6.09486 0.83885
1.59767 6.09486 0.83885' 1.00000 3.72293 0.52333 3.72293 13.98190 1.94112 0.52333 1.94112 0.27973
The smallest root of the determinantal equation IMZZ
- ns,el
=0
132
VECTOR EXPLANATORY VARIABLES
is = 0.2089 and the F statistic of (2.3.19) is F = 2.51 = 0.52. Under the null, the approximate distribution of F is that of Snedecor's F with 2 and 25 degrees of freedom. One easily accepts the hypothesis that the matrix of mean squares and products for the true vector z, is singular. The test of the hypothesis that the matrix of mean squares and products for x, is singular is given by
F = 3 '(4)Ix = 8.08, where 1,= 6.0605 is the smallest root of
I[
[
15.2168 -0.8967 0.2756 0.124711 = 0. 0.73341 - A x 0.1247 0.0878
- 0.8967
I;=
Therefore, one is reasonably comfortable with the assumption that (x, %,)'(x, - 3) is nonsingular. We calculated the test for singularity using the Xr2),but the test is equivalent matrix of corrected mean squares for (X,,, to that computed using the 3 x 3 matrix M x x . The maximum likelihood estimator of is = ( M x x - ISuu)-'(Mxy- ISue) = (-
4.65 18, 1.3560, 2.2951)'.
Using (2.3.24), the estimated covariance matrix for
is
1
1.3417 -0,2322 -0.9009 0.0517 0.0757 , -0.2322 - 0.9009 0.0757 1.1828 where 8,, = 0.02803,
e,,
r
= (0,
0.5773 -0.0574
H2 = A
-0.7406, -0.4241)10-2,
o
0
0 0
I
-0.0860 -0.05741 0 O 0,2768 0.1217 0.08791 0.1217
1.0000
3.7229 0.5233
0.5233
1.9408 0.2794
[
0
0.2639 1.2276 0.6422 -0.6057
xi:].
- 0.2049
0.6532
2.3. THE
133
MODEL WITH NO ERROR IN THE EQUATION
This example provides a simple illustration of the computations because of the small number of observations. Also, the large sample theory is adequate because the error variances are small relative to the variation in x,. In situations where the error variance is small relative to the variation in x,, the estimated variance of the estimator will be dominated by the term ~-'M;:C?~~. The fact that C,, is estimated results in a very modest contribution to the estimated error variance in such situations. In this sample the contribution is A
+
.
(25)-tM~~(~uxc? xZ,,,,E,,)M,' ,0
=
i
-0.103 0.014 0.097
0.882
-0.103 -0.949
1
-0.949 0.097 lo-'. 1.121
The estimated values for (xtl, x f 2 )are given in Table 2.3.2. These values were computed by the formula $, = R
+ (Z, - Z)H,,
which is equivalent to (2.3.15). The first entry in the vector gf is always one in this example and is not included in the table. The estimated covariance matrix for the error in the estimated true values (&, ff2)is the lower right 2 x 2 portion of H&H2 and is
[
0.08 1 1 0.00971 0.0097 0.0237 '
00
Example 2.3.1 illustrates an experimental situation in which measurement error played a modest role. Because the variance of the measurement error is small relative to the variation in the true values, the maximum likelihood estimate of the structural equation is not greatly different from the ordinary least squares estimate. The following example presents the analysis of a large experiment in which measurement error was very important. TABLE 2.3.2. Estimated true values for apple trees
Year
Log Wood Growth i*
Log Girth Increment
1954 1955 1956 1957 1958
3.417 3.196 3.965 3.993 4.044
0.462 0.619 0.517 0.421 0.596
i 2
- 0.094
0.060 0.082 0.043 - 0.09 1
134
VECTOR EXPLANATORY VARIABLES
Example 2.3.2. We discuss an experiment conducted by the Iowa Agriculture Experiment station at the Doon Experimental Farm in northwest Iowa. The experiment is described by Mowers (1981) and in Mowers, Fuller, and Shrader (1981). The experiment consisted of growing crops in the sequence ccirn-oats-meadow-meadow with meadow-kill treatments applied to the second-year meadow at various times of the growing season. In the control treatment, the second-year meadow was harvested two or three times. Treatment two was a “short-fallow” treatment, in which second-year meadow was killed with herbicides in the early fall after the second cutting of hay. The third treatment was a longer fallow treatment, with meadow killed in midsummer after the first cutting of hay. All plots were plowed in the spring before corn was planted. Killing the meadow crop increased the amount of soil moisture available to the following corn crop. The longer the fallow period, the more soil moisture was increased relative to treatment one. Table 2.3.3 contains a portion of the data on corn yield and soil moisture collected in the experiment. The response to moisture is not linear over the full range of observations, The data in Table 2.3.3 are a subset of the data for which linearity is a reasonable approximation. The yield is the yield of corn grain in tens of bushels per acre. The soil moisture is the inches of available moisture in the soil at corn planting time. Table 2.3.4 contains the TABLE 2.3.3. Treatment means for yield-soil moisture experiment
Treatment One
Treatment Two
Treatment Three
Year
Yield
Soil Moisture
Yield
Soil Moisture
Yield
Soil Moisture
1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1969 1971 1975 1977
4.83 4.49 10.55 7.24 10.50 3.17 9.71 10.84 2.20 0.85 14.23 3.39 5.80 7.46
4.58 2.04 7.67 3.31 4.40 1.16 2.94 3.64 4.42 2.36 6.01 1.31 3.63 0.87 ’
6.23 6.07 9.89 9.38 9.05 5.66 9.42 11.42 5.83 2.73 14.88 4.60 6.61 7.33
6.14 2.24 8.69 4.11 3.73 4.06 2.97 4.03 5.88 4.09 5.81 1.97 3.39 0.64
7.48 5.13 10.02 9.95 10.08 6.00 9.94 10.80 7.69 2.37 15.46 5.07 6.25 8.55
7.50 2.12 9.47 5.67 4.63 4.30 3.66 3.32 6.83 4.20 6.32 1.83 3.61 1.44
Mean
6.80
3.46
7.79
4.12
8.20
4.64
Source: Mowers, Fuller, and Shrader (1981).
2.3.
135
THE MODEL WITH NO ERROR IN THE EQUATION
TABLE 2.3.4. Analysis of variance and covariance for yield (Y)and soil moisture (X)
Mean Squares and Products
Degrees of Freedom
Source
Total
125 13 2 26 84
Years
Treatments
Years x treatments Error
YY
XY
xx
11.7907
3.2278 23.8933 17.6327 1.2677 0.2932
4.8560 36.3441 14.7826 1.3609 0.8284
10 I .2508
21.5797 , 2.5389 0.5762
mean squares and products for yield and soil moisture. Clearly, the treatment of killing the meadow had an effect on soil moisture the following spring and on the yield of the corn crop planted that spring. Two questions are of interest. First, did the treatments have any effect on yield other than the indirect effect through soil moisture? Second, how much is yield increased by an additional inch of soil moisture? We analyze the data recognizing that soil moisture is measured with error. We postulate the model
-
y
= Po
+ BlX,
( Y , X ) = ( Y , x)
+ (e, 4,
where E' NI(0, E,,)and E = (e, u) is independent of x. We have purposely omitted subscripts. We treat the components of the error line in the analysis of variance as the elements of the matrix S,,, where S,, is an unbiased estimator of Z,,. The means (or multiples of the means) associated with different lines of the analysis of variance will serve as the Y and X observations in the analysis. In this example we analyze deviations from the mean. All results presented in the text for MZz are applicable for m,, with proper degrees-of-freedom modifications. Under the model, a multiple of the smallest root of the determinantal equation lmzz - AS&&/ =0
is approximately distributed as an F random variable. If we let the y and x values be the true values of year-by-treatment effects, we compute the smallest root of
I1 [
2.5389 1.2677
1.26771 - r . 5 7 6 2 0,293211 = 0. 1.3609 0.2932 0.8284 ~
This root is = 1.072 and F i i = (26)(25)-'i = 1.11. We conclude that the observed year-by-treatment effects are consistent with the model.
136
VECTOR EXPLANATORY VARIABLES
If we let the y and x values be the true values of treatment means, we compute the smallest root of
I[
21.5797 17.6327 17.63271 14.7826
-
1[0.5762 0.2932 0.8284 0.293211
= 0.
The smallest root is = 0.510 and Fi4 = 22 = 1.02. We conclude that the treatment means are consistent with the model. If we let the y and x values be the true values of the year means, we compute the smallest root of 101.2508 23.89331 - 0.5762 0.293211 = 0. 23.8933 36.3441 0.2932 0.8284
[
I[
x
The smallest root is = 41.749, F i t = 13(12)-’x = 45.23, and the model is rejected for year means. This is not surprising because there are many factors other than soil moisture that change the yearly environment for corn. To summarize, one can conclude that treatment effects and treatment-byyear effects are due to soil moisture effects created by the treatments. We can accept the hypothesis that treatments had no effect on yield beyond that due to moisture. However, there is variation in the year means that is not associated with variation in soil moisture. On the basis of these conclusions, we pool the treatment and treatmentby-years sums of squares to estimate the slope of the response of corn yield to moisture. The determinantal equation for the pooled data is 1[2.4366 3.8990 2.3196 2.43661 - I [,.,,62 0.2932 0.293211 0.8284 = 0. The smallest root is = 1.1573 and the associated F statistic is 1.2002. The estimator of p1 is
Fil =
= (2.3196- 0.9587)-’(2.4366 - 0.3393) =
1.5411 and the estimated variance of the approximate distribution of fll, constructed with moments corrected for the mean, is f{PI} = 0.06083, where n - 1
= 28,
tee = (n + d,
d , = 84, A,, = H;(mzz - &)H2 = 1.4385, 8,,
- 2)-’[d,S,, -,
=
1.7027,
0.27531 + (n - l)(mzz - mzz)]= r.5534 0.2753 0.8412 ’ and Hi = (0.5997,
Ci, = B,, - plBUu= - 1.021 1, 0.0758). The error line in the analysis of variance table is not for “pure” measurement error. Because of variations in the topography and in the soil, there
mzz= (pl, l)’fix,(j?l,I),
2.3. THE MODEL WITH
137
NO ERROR IN THE EQUATION
are differences in the true available soil moisture of different plots. It follows that the three estimates in the error line of the analysis of variance are the sums of two components, one that arises from variation in true values and one that is due to measurement errors. This fact does not impair the validity of our original measurement error model. For this experiment it seems reasonable to assume the measurement error associated with the moisture determination to be independent of the plotto-plot variability and of any measurement error made in determining yield. Under this assumption, the mean cross product on the error line is an estimator of fl,c,,, where o,,& the plot variability in true soil moisture. Therefore, using the estimate from Eaa,we have
d,,
= (1.5411)-'0.2753 = 0.1786.
The error mean square for soil moisture is estimating up, is the variance of pure measurement error. Hence,
6,,
+ c,
where c ,
= 0.8412- 0.1786 = 0.6626.
This estimate of measurement error variance is similar to direct estimates obtained in studies such as that of Shaw, Nielsen, and Runkles (1959).(See Exercise 4.6 of Chapter 4.) For the Doon experiment the ordinary least squares estimate of the effect of an inch of soil moisture computed from the error line of the analysis of variance is 0.354with a standard error of 0.083.The analysis that recognized the measurement error in soil moisture produced an estimate of 1.541 with a standard error of 0.25.The analysis recognizing measurement error produced an estimate more than four times that of the (incorrect) ordinary least squares procedure. At a price of $3.00 per bushel this is a difference of $35.61 in the estimated marginal value of one acre inch of soil moisture. 00 When the error variances are estimated, the knowledge that some error covariances are zero changes the covariance matrix of the limiting distribution. One model that arises frequently in practice is the model of (2.3.1)with the additional specification that the error covariance matrix is diagonal. Let the diagonal covariance matrix be estimated by S,, = diag(S,,, Suull, Suuzz, . . . Suukk),
where E{S,,} = &, and the elements of S,, are independently distributed as multiples of chi-square rando? variables with d f i degrees of freedom, i = 0, 1,. . . ,k. Let the estimator j? be defined by (2.3.14)with Sc, replacing YE, and let be the smallest root of(2.3.12)with S,, replacing Ye,. Given regularity conditions, the estimator of j? constructed with S,, is approximately normally
x
138
VECTOR EXPLANATORY VARIABLES
distributed with covariance matrix n- '[M,la,,
+ M;~(Z,,~O,,- ZJ,,,)M;J]
+ au;'(R/3Z,, + Z,,j?'R)
i
(2.3.26)
M;,
where R = diag(d;~fitdvll,ds;'P:&z2,. . d/k I P k2a u 2u k k ) In Theorem 2.3.2 the approximate distribution of the smallest root of (2.3.18) was shown to be that of an F random variable with numerator degrees of freedom equal to n - k and denominator degrees of freedom equal to d,. Therefore, a reasonable approximation for the distribution of based on S,, is the F distribution with numerator degrees of freedom equal to n - k and denominator degrees of freedom determined by the variance introduced by the estimation of C,, with S,,. Recall that the variance of the F distribution is +
9
a
[v1(v2 - 2)2(v2 - 4)]-'2v:(v, + v 2 - 2), v2 > 4, where v1 is the numerator degrees of freedom and v2 is the denominator degrees of freedom. See Kendall and Stuart (1977, Vol. I, p. 406). For large v2 the variance is approximately 2v; + 2v; It follows that an approximation for the distribution of n(n - k)-'X based on S, is that of the F distribution with n - k and v2 degrees of freedom where v2 is estimated by O2 = ( d ; d S z e
k
+ 11 d;il~S~ujl)-l(See + i=
k i= 1
2
j'"",,) .
(2.3.27)
Example 2.3.3. We study further the data of Example 2.2.1. We are now in a position to construct a second test of the hypothesis that a,, = 0. Let 2 be the smallest root of JMZz-
= 0,
(2.3.28)
where
Sf,= diag(0.0037,0.0,0.0203,0.0438,0.0180,0.0) and the elements of Z, are the observations on role performance, X,l = 1, knowledge,Avalueorientation, role satisfaction, and past training, respectively. If o,, = 0, A is approximately distributed as an F random variable with 50 and v2 degrees of freedom, where 0 , is given in (2.3.27). For the data of
2.3.
THE MODEL WITH NO ERROR IN THE EQUATION
x
139
Example 2.2.1, the smallest root of (2.3.28) is = 1.46 and F = 1.61, where the numerator degrees of freedom is n - k = 50. Using (2.3.27) we have C2 = 243, where /?=(-1.396,0.506,0.157,0.126, -0.0028), is the vector of estimates computed under the assumption q, = 0. Because 1.61 is close to the 1% tabular point of the F distribution with 50 and 243 degrees of freedom, we reject the hypothesis that cr, = 0 and accept 8,, = 0.0054 as an estimate of cqq. Because we reject the hypothesis that the covariance matrix of (y,, x,) is singular, it is clear that we also reject the hypothesis that the covariance matrix of x, is singular. Nevertheless, we illustrate the computations by computing a test of the hypothesis that the covariance matrix of x, is singular. The smallest root of JMxx - AS,,]
is
=
=0
1.75 and the F statistic is 55(5l)-'x *..
=
1.89. The vector
6 satisfying
( M -~ns,,)e ~ =o 1
is 6 = (-0.371, - 1.0, 1.935, 0.595, -0.920), where the first entry in X, is always one and the other entries are for knowledge, value orientation, role satisfaction, and past training. From (2.3.27),the estimated degrees of freedom for the denominator of the F statistic is C2 = 129. The 1% tabular value of F with 50 and 129 degrees of freedom is 1.68 and the hypothesis that the covariance matrix of x, is singular is rejected at that level. 00
2.3.2. The Structural Model The majority of the results developed for the estimator of fl for the model with fixed x, are also appropriate for the model with random x,. If ZCC = Y&P2,
where Tseis known, the maximum likelihood estimator of fl for the structural model is that given in Theorem 2.3.1 for the functional model. See Theorem 4.1.1 and Exercise 4.1 1. The maximum likelihood estimator for the structural model with XEEestimated by SEEis given in Theorem 4.1.2. The sample moment matrix mxx will converge to C,, almost surely as n + GO, for any x distribution with finite second moments. Therefore, as n + co,the limiting distribution of r; - fl) given in Theorem 2.3.2 holds for any such x distribution. Also, the variance of the approximate distribution of s" can be estimated by the expression given in (2.3.24) for such models.
"'(s
140
VECTOR EXPLANATORY VARIABLES
2.3.3. Higher-Order Approximations for Residuals and True Values In this section we present an estimator for the variance of the estimated true that recognizes the fact that an estimator of fl is used to construct values, %,, 2,. We assume
-
z, = z, + e,,
y, = XIS, E;
(x,
(2.3.29)
NI(0, &,),
where Z, = XI), C,, is known, and the x,, t = 1,2,. . . , are fixed. The estimator of the vector of true values is
2, = z, - o*,S',
A
,
.
1
-
e,,
(2.3.30)
where 6,= Z,a*, ai' = (1, - /Y), S' = cr, Xu&,8,, = a*'Z,,a*, = &'Zet,and is defined by (2.3.14) with EE&replacing Yet.Using (2.3.30), a* - a = Op(n-"2), and f - S = Op(n-"2), we have
i; - z;
= 2; - z; -
1-
(f - S)u, - (fit - u,)S + Op(n?),
where f, = Z, - u,S', 6' = aU;'Cua,
1- 6 = ( c r , ; ' ~ & ~ - 2~c~)(ai - a) + ~ ~ ( n - ' ) ,
and 8, - u, = Z,(k - a). Using the definition of zt we obtain
It follows that the covariance matrix of the approximate distribution of 2, - z, is
V{%,- z,} = Z&& - &XU&+ SS'E{z,V,,f;}
+ nu;'&
- GJVaaGe&- SCue),
(2.3.3 1)
where V,, is the covariance matrix of the approximate distribution of 2 and we used the independence of u, and 2, in deriving the result. Replacing parameters in (2.3.31) with their sample analogues, an estimator of the variance of the approximate distribution of 2, - z, is
3{2, - 2,)
= 72&e-
ie,, + 8s^,%,?ppa;
+ 8,'(X&&- i2uz)?aa(Cce- i.,,).
(2.3.32)
To develop an approximation for the covariance matrix of the limiting distribution of the estimated u,, let 0 = (I&, G 2 , .
. . ,On)'
and Z = (Z,, Z2,. . . , Z;)',
2.3.
141
THE MODEL WITH NO ERROR IN THE EQUATION
Then 8 =v
-
X ( j - p)
- n-'(X
=v
+ )%'.
Consequently, the standardized residuals [diag f { 9 } ] - ' / * 9 , where
+ %(O,, - ~ ~ - ' M ; > S ~ J S ~
6 { 0= ) SJ,
(2.3.35)
and diag 9{9}is the diagonal matrix composed of the diagonal elements of q{O}, will have a distribution approximated by that of the standardized
ordinary least squares residuals. See Miller (1986). As noted in Section 2.2.3, the diagnostic statistics computed by replacing the dependent variable and vector of independent variables by C1 and fit, respectively, in the ordinary least squares forms will behave, approximately, as the ordinary least squares statistics. The smaller the measurement variance relative to the variation in x,, the better the approximation.
Example 2.3.4. We use the results of this section to compute the approximate variances for the itvalues of Example 1.3.2. If the number of cells forming rosettes is expressed as a deviation from the mean, the estimated model becomes
j 1= 3.5748 + 0.6079(~,- 12.7374).
In this parameterization Z, =
(x,1, X I
oi' = (1, -3.5748, -0.6079), Qpp =
-
12.7374),
8' = (0.3424)-'(0.25, 0, -0.1520),
6{(V,jJ}= diag(0.06848, 0.00742},
nMxx = diag{5, 46.8768}, and EEE= diag(0.25, 0, 0.25). Then the first-order approximation to the covariance matrix of it is 0.0675 0 0.1110
0.1110 0 0.1825 The ( j t ,a,) portion of this matrix is singular, because, to the first-order of approximation, j , is a linear function of i t .The second-order approximation to the variance of i, is
0"
0.0677 0 0.1 114 b.1114 :.l83j+ where the first matrix is
X&&-
&&
0"
0.5330 0 [&Xl
-0.3241 :.197,]
+ 6,'(E,& - &)Vaa(X&&
- &),
@,&,
2.3.
143
THE MODEL WITH NO ERROR IN THE EQUATION
TABLE 2.3.5. Estimated variances of (PI, i, and ) GI for the cell data
x
t
2, -
1 2 3
- 1.1296
4
5
5.7189
0.4886
- 2.0303 - 3.0445
WlI
WJ
0.2446 0.1986 0.1971 0.2028 0.2103
0.2336 0.1093 0.1052 0.1205 0.1409
W
l
}
0.0388 0.2647 0.2721 0.2442 0.2072
[P{4}]- l'*Cr 1.1 I
- 1.17
- 0.34
- 0.29
1.54
the second matrix is %&,and 2, = (1, i t- 12.7374). The estimated variances are given in Table 2.3.5. In this example, the variance of u , is 0.25 and the estimated variance of i 1is close to that value. Figure 1.3.2 provides the explanation. The first observation is separated from the other observations by a considerable distance. Therefore, the least squares procedure will always produce a line that is close to the observation at t = 1. Because the X , and 1, values will be close together, the variance of 2, will be close to that of X I . Observations two through five are closer to the sample mean than observation one, and the first-order and second-order approximations to the variance of 2, are in better agreement for these observations than are the two variance approximations for i l . The estimator of the covariance matrix of the approximate distribution of v is 0.34241 + ?@, where 8,"
-
2n-'M-'6 xx
uu)n',
= 0.3424,
CI,
- 2 i l - ' M -*x' 8,,, = diag( - 0.06848, - 0.007 18),
and the i values are expressed as deviations from the mean. The estimated variances of the 0, are given in Table 2.3.5. The estimated variance for I?, is small for the same reason that the estimated variance of 2 , is large. The sum of squares of the standardized residuals of the last column of Table 2.3.5 is approximately equal to n, as it should be. 00
REFERENCES Anderson (195lb), Cox (1976), Fuller (1977, 1980, 1985), Koopmans (1937), Miller (1986), Nussbaurn (1977, 1978), Sprent (1966, 1969), Takernura, Momma, and Takeuchi (1985), Tintner (1945, 1952), Tukey (1951).
144
VECTOR EXPLANATORY VARIABLES
EXERCISES 10. (Section 2.3.1) (a) Verify that the estimator (2.3.15) is the least squares estimator of z, obtained from the observation Z, under the assumptions that z,a^ = 0,
Z,
=
z,
+ el,
where E(e,} = 0 and E{e;e,} = r,,u2. First, assume that r,, is nonsingular. Next, extend the estimator to singular Ter. (b) Show that the estimator Q{$}given in (2.3.24) for d;' = 0 and S,, = X,, can be written
Q{&
= M;;~,,M;;s,,,
e,
where_ 2, = x,t - Clkx, s^: = [(l, -$')fcc(l - /?)']-'(l, -/?$,,, = n - ' Z:=l 2$,, and M,, and E,,are defined in (2.3.24). I I . (Section 2.3.1) Show that I - (&r,,a^)-'a^&r,, of (2.3.15) is idempotent. What is the rank of I - (crzck)12. (Section 2.3.1) The data in the table were generated by the model
wr,,?
Y, = 80
(yl,
+ PIX11 + 82x121
X,,,X I 2 )= b1,x l l ,x12)+ ( e l , ulI. u12).
Observation
I:
X,,
4
1
0.6 4.0 14.5 15.6 13.8 6.4 3.7 5.9 7.2 16.1 9.6 1.5 13.0 8.3 11.1 8.3 13.3 11.3 2.0 12.3
7.8 8.4 8.3 12.1 8.4 8.9 7.3 9.1 8.4 9.7 11.5 8.7 11.8 10.9 12.6 8.0 10.8 14.5 8.2 10.7
6.5 7.3 12.9 9.8 10.9 10.4 9.8 8.6 9.6 11.2 9.6 7.1 11.7 9.0 12.9 10.1 8.7 16.0 8.6 10.9
2 3 4 5 6 7 8 9 10 11 12 13 14 15 I6 17 18 19 20
-
2
where ( e l , uII, u,J NI(0, ICY*). Estimate the parameters of the model. Test the hypothesis that the matrix of sums of squares and produc!s of the true values is of full rank. Test the hypothesis that o2 = I . Estimate the variance of /I1 &. Plot 6, against P,, and against &. 13. (Sections 2.3.1, 1.3.4)Extend the method of Section 1.3.4 to vector x, by using the fact that u, is independent of a, = XI - U,U,~Z"".
+
2.3.
145
THE MODEL WITH NO ERROR IN THE EQUATION
Hence, the F statistic calculated for the null hypothesis that the coefficient vector is zero in the regression of u, on x, is distributed as Snedecor’s F. If the covariance matrix Zeeis known, the test statistic can be calculated as a chi-square. Use this result to test the hypothesis that (p,, PZ) = (2, 3) against the alternative that both coefficients are positive for the model of Example 2.3.1. 14. (Section 2.3.1) Sprent (1969) suggested that the vector (PI, Pz) for the model of Example 2.3.1 might be equal to (1, 2). (a) Using the estimated covariance matrix of the approximate distribution of (PI, Pz) of Example 2.3.1, test the hypothesis (PI,&) = (1, 2). (b) Construct the likelihood ratio test of the hypothesis (PI. P2) = ( I , 2). (c) Assume that f i t = 2pl and estimate the equation subjectito this restriction. (Hint: Transform the X, vector.) (d) Give the estimated covariance matrix of the approximate distribution for your restricted estimator of part (c). 15. (Section 2.3.1) Assume the model I
T = Po + Pix, + e,, (el,u,)‘
-
*
XI = x, + u,,
NUO, Inuu).
Assume that for a sample of 100 observations we observe (my”, inXY, m x x ) = (5.00,0.01,0.98)
What do you conclude about the unknown parameters if it is known that u,, = l? Does your conclusion change if uuuis unknown? 16. (Section 2.3) Construct an analysis of variance table for the Reilly-Pantino-Leal data of Exercise 1.14. Assume that the data satisfy the model
-
Y j = Po + Plxi
+ eij,
Xi, = xi + u i j ,
where (eij, uij)’ NI(0, Z,,)(x!, , x 2 , x,) is fixed, the covariance matrix of (e;,. ujj) is unknown, i denotes condition, and j denotes replicate within condition. Estimate (/lo, PI,,,u uiu,uuu).Compute the estimate of the covariance matrix of the approximate distribution of (Po, PI)’. Compute d,,, 3””).Do you feel the estimated covariance matrix of the approximate distribution of (tee, the large sample approximation will perform well? Estimate (x,, x2. x3) treating these quantities as fixed. Hint: Note that the means satisfy the model
(x,,xi,)
-
r,”’(F, , U; )’ NI(0, ZJ, where r, is the number of replicates. 17. (Section 2.3.1) (a) Using the data of Table 2.3.3, construct a classical analysis of covariance using soil mois-
ture as the covariate and ignoring measurement error. Test for the effect of treatments after adjusting for soil moisture. Compare the conclusions reached on the basis of such a naive analysis with the conclusions of Example 2.3.2. (b) Let the observations of Table 2.3.3 be denoted by Y,,. Let the model in soil moisture be expressed as
146
VECTOR EXPLANATORY VARIABLES
where w,, are year indicator variables with w,, = 1 for i = f and zero otherwise, and the y i are the year effects. Using the general formulas for the estimators, estimate all param-
eters of the model and estimate the covariance matrix of the estimators. Plot the residuals C,, against the estimated true valuesA& What do you conclude? 18. (Section 2.3.3) Verify the formula for 6 - 6 given after (2.3.30). 19. (Sections 2.3.1, 1.6.2)Assume the existence of a sequence of experiments indexed by n. At the nth experiment, we observe XnI)satisfying
(x,,
u,, = P o + P I X , , + e,,, (em,,u d '
-
X",= X"I + un,,
NI(0, diag[u,,, a;
luw,vl).
for t = I, 2,. . . , b,, where {b,};= is a sequence of even integers and the x,, are fixed. Let d. be the distance between the (bn/2)th smallest xnI and the (bJ2 1)st smallest x",. Assume that dn-' = O(b,) and a,.-'/' = o(b,'). Let P, be the estimator (l.6.4), where the first group is comThat is, the groups are formed on the basis of the observed Xnl. posed of the b,j2 smallest Xn,. Assume
+
lim
"-m
(x,121 - ;r,,,,)= C ,
where C > 0, Trill, is the mean of the b,/2 smallest rmr, and .fn12) is the mean of the b,/2 largest
x",. Show that, as n
-, 00, /I, PI, j,- ( -~ ( 1 -) -x(z))-'(F(11- F,,J = Op(~n1/2bi1'2), b,!"(j, - PI) 1;N ( 0 , C-'u,,).
20. (Sections 2.3.1, 2.2.1) Assume the model
T = Po + Plx,
,.,
+ el,
XI = x,
+ u,,
for f = I , 2,. . . , n, where (x,, el, u,)' NI[(px, 0,O)'.diag (u,,, uee,uuu)]. (a) Assume uoeknown and uuuunknown. (i) Derive the maximum likelihood estimator adjusted for degrees of freedom of (u,,, u,,, Po. PI) for samples with (myy - u,,)mxx - m i r > 0.
(ii) Derive the maximum likelihood estimator adjusted for degrees of freedom for all samples. (b) Assume u,, and uuuunknown and u,, known. (i) Derive the maximum likelihood estimator adjusted for degrees of freedom of (uee,u,,, Po, PI) for samples with m,, - uxx> 0 and myyb,, - m i y > 0. (ii) Derive the maximum likelihood estimator adjusted for degrees of freedom for all samples. 21. (Sections 2.3.1, 2.2.1) Prove the following. Theorem. Let
where {x,} is a fixed sequence satisfying lim ff = p,,
a ' "
lim mxx = mXx.
n-m
2.3. THE MODEL
147
WITH NO ERROR IN THE EQUATION
Let sZ2 be an unbiased estimator of oz2distributed as a multiple of a chi-square random XI) for all t, where d;’ = O(n-I). Let variable with d , degrees of freedom independent of
(x,
j, = (mxx- T-1sz2)-1mxy, j ,
=
F - j1X.
Then [ w l ~ l - l ’ ~ ( s- P l I)
as v
-+
co, where
=L
z
+ O,(U
v = nT,
P{jl} = (n - I)-’[A,Ls,,+ t t i ; ? ( ~ - ~ s ~ ~ s+, , ~
+ di;:d;1&T-2s:2,
~ T - ~ S ~ J I
t=1
Axx= mxx - T - ’ S ~and ~ ,t.-* is Student’s t with n - 2 degrees of freedom.
22. (Sections 2.3, 2.2) Beaton, Rubin, and Barone (1976) used the data of Longley (1967) in a discussion of the effect of measurement error on regression coefficients. Longley (1967) originally used the data to test the computational accuracy of regression programs. Therefore, one should not be overly concerned with the economic content of the model. The model of Beaton, Rubin, and Barone might be written
Longley Data Total Employment
r,
60,323 61,122 60,171 61,187 63,221 63,639 64,989 63,761 66,O 19 67,857 68,169 66,513 68,655 69,564 69.331 70,551
Size of Armed
G N P Price Deflator
GNP
Total Unemployed
X, I
xtz
XI,
XI4
XI5
XI6
83.0 88.5 88.2 89.5 96.2 98.1 99.0 100.0 101.2 104.6 108.4 110.8 112.6 114.2 115.7 116.9
234,289 259,426 258,054 284,599 328,975 346,999 365,385 363,112 397,469 419,180 442,769 444,546 482,704 502,601 518,173 554,894
2,356 2,325 3,682 3,35 1 2,099 1,932 1,870 3,578 2,904 2,822 2,936 4,68 1 3,813 3,93 1 4,806 4,007
1,590 1,456 1,616 1,650 3,099 3,594 3,547 3,350 3,048 2,857 2,798 2,637 2,552 2,514 2,572 2,827
107,608 108,632 109,773 110,929 112,075 113,270 115,094 116,219 117,388 118,734 120,445 121,950 123,366 125,368 127,852 130,081
1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962
Source: Longley (1967).
Forces
Population > 14 Years
Year
148
VECTOR EXPLANATORY VARIABLES
where a, = (w,, u,). Beaton, Rubin, and Barone argued that rounding error furnishes a lower bound for the measurement error. The covariance matrix
Taa= (12)-' diag{l,
1, 1, 1, 1, 1)
is a possible covariance matrix for rounding error. (a) Compute the smallest root of lmzz -
%al
=
0,
where Z, = ( y, XI). Do you feel that mrr is singular? Do you feel that the measurement error model is appropriate and supported by this test? (b) Assume that the measurement error for year is zero so that Zoo= (12)-' diag{l, lo-', 1, I, 1, l,O}.
Compute the smallest root of lmzz - Eaal= 0. What do you conclude? (c) If the reliability ratios for the variables of 2,= (q,X , , , X , 2 , X I 3 ,X,,,X , s ) are all the same, if the reliability ratio for x16 is one, and if the rank of m,, is five, what is your estimate of the common reliability ratio? (d) Using the error covariance matrix of part (b), estimate the parameters of the model and estimate the covariance matrix of the approximate distribution of your estimator. 23. (Sections 2.2, 2.3) The yearly means of the data of Table 2.3.3 of Example 2.3.2 are
v,, = (6.180,5.230,. . ., 7.780), X,,= (6.073, 2.133,. . . , 0.983).
Assume that these means satisfy the model
-
y,, = P o
+ P I X , . + b,,
x,, X,,+ E,., =
where (6,. ii,,)' NI[O, diag(u,,, 9- 'uuy)].An estimator of uuubased on 84 degrees of freedom is given in Table 2.3.4 as 0.8284. (a) Using the yearly means, estimate (Po, PI, ubb).Estimate the covariance matrix of the approximate distribution of the estimators. (b) Combine the estimators of (Po,PI) of part (a) with those of Example 2.3.2 to obtain improved estimators. Test the hypothesis that the two sets of estimators are estimating the same quantity. Give an estimated covariance matrix for the approximate distribution of your combined estimator of (Po, PI).
2.4. INSTRUMENTAL VARIABLE ESTIMATION
The method of instrumental variables introduced in Section 1.4 extends to equations with several explanatory variables, some of which are measured with error, provided the number of instrumental variables equals or exceeds the number of variables measured with error. Let
I; = x,/l
+ e,,
X, =x,
+ u,,
(2.4.1)
for t = 1,2,. , . , n, where x, is a k-dimensional row vector of explanatory X,) is observed, and u, is the vector of measurement errors. Let variables,
(x,
2.4.
149
INSTRUMENTAL VARIABLE ESTIMATION
a q-dimensional vector of instrumental variables, denoted by W,, be available and assume that n > q 2 k. Let n and W, be such that WiW, is nonsingular with probability one,
I:=,
(E=
JqW;(e,, u,,} = (0, O),
and the rank of WiW,)-’ WiX, is k with probability one. This specification permits some of the elements of X, to have zero error variance. An xti measured without error must be a linear function of W,. Thus, a variable measured without error can serve as the instrumental variable for itself. Assume that n observations are available and write the model in matrix notation as Y=xg+e,
X=x+u,
(2.4.2)
where Y is the n-dimensional column vector of observations on I;, and X = (X,, X;, , , . , Xn)’ is the n x k matrix of observations on X,. Following the approach of Section 1.4, we express X as a function of W and an error, denoted by a2, by using the population regression of X, on W,. Thus, we write X
= Wn,
+ a2,
(2.4.3)
where n2 = [E{W’W}]-’E{W’X} is a q x k matrix of population regression coefficients. Substituting the regression expression for X into the system (2.4.2) we obtain the system
Y = Wn, + alr X = Wn2 + a2,
+
(2.4.4) (2.4.5)
where n 1 = n28, a $ = v a,/?, v = ( u ~ u,, , . . . , un)’, and u, = e, - uJ. The set of equations (2.4.4) and (2.4.5)is sometimes called the reducedform. For the model to be identified, we assume
;.1
Mwwa2
I + 0.
(2.4.6)
To motivate the estimator we make two assumptions, which we will later relax. First, and with no loss of generality, we assume W’W = nI,.
(2.4.7)
Next, let a, = (a,,, ar2) be the t t h row of a = (a,, a2), and assume the a, to be independent (0, Xaa) random vectors. Then the least squares estimators ($1, $ 2 )
= (w’w)-’(w’Y,
W’X)
(2.4.8)
+ cz,
(2.4.9)
satisfy 31 = n2B
+ 51,
$2
= 11.2
150
VECTOR EXPLANATORY VARIABLES
where the q rows of (, = (Cl, (,,) are uncorrelated with zero mean and covariance matrix Zcc= n-'C,,.
The covariance result for
C follows from the fact that, for example,
where W W = I, Iz2i is the ith column of%*,G,,~ is the variance of a,,, CT,,,,~,,~ is the variance of the ith element ofa,,, and oaal, 2 i is the covariance between a,, and the ith element of a,,. Thus, the dth element of 6 , is correlated only with Furthermore, the covariance between the dth element of 6 , the dth row of and the t t h row of g, is the covariance between a,, and at, multiplied by n - ' . If q = k, one can construct an estimator for fi by setting A1 and A, equal to their expectations and solving. Thus,
r2.
*-I*
j=n,
121
(2.4.10)
when q = k. In the econometric literature a model with q = k is said to be
just identified. Note that (2.4.10)can be written
j= ( w x ) - ' w ' Y , which is the matrix generalization of expression (1.4.12). Under the assumption that the a, are random vectors with zero mean and common covariance matrix, an unbiased estimator of C,, is
s,, = (n - q)-"(Y,
X)'(Y, X) - (Y, x)'w(w'W)-'w'(Y, X)]
and an unbiased estimator of Z,, is n-'S,,. If any of the x-variables are measured without error, the row and column of S,, associated with that variable are vectors of zeros. The estimator ( I z , , a,) satisfies the system (2.4.9)in the unknown parameters A, and fi. We recognize the model (2.4.9)as the measurement error model (2.3.1) with 3, replacing (Yl, Y,, . . . , Y")', I z , replacing (Xi, X i , . . . , Xb), g replacing (&;, E;, . . . &), and the number of observations equal to q rather than n. It follows that the estimator of /lfor model (2.4.9)defined in Theorem 2.3.2 is
fi = [4-'li;fi2
- ,k1S,,22]-1[q-iIz>Izi- ,in-1S,,21],
(2.4.11)
,,
where Saal Saa12, and Saa2,are the submatrices of S,,, xis the smallest root of lq-'A'A - An-1s0,l = 0,
(2.4.12)
and = (Izl, 6,). If q = k, the smallest root of (2.4.12)is zero and estimator (2.4.11) reduces to estimator (2.4.10).
2.4.
151
INSTRUMENTAL VARIABLE ESTIMATION
The estimator of j? usually is calculated by noting that n f f ' f fis the sum of squares and products of the estimated values for (Y, X), nfi'fi
=
(P,%)f(P,%) = (Y, XYW(W'W)-'w'(Y,x),
where (*,%) = W(W'W)-'W'(Y, X). Also, the smallest root v" of
I(*, %),(P,iz) - VS,J
=0
(2.4,13)
is the smallest root of (2.4.12) multiplied by q. That is, v' = qx. Therefore, the estimator of /I given in (2.4.11) can be expressed as
j = (XtX - ?S,,22)-1(kr9
- v"Sna2J
(2.4.14)
or as
B = (X'X - Paa2z)-1(X'Y - ~ a a 2 1 ) r
(2.4.15)
where y" is the smallest root of
((Y, X W , X) - YS,,I = 0.
(2.4.16)
Expressions (2.4.14) and (2.4.15) do not require W'W to be a multiple of the identity matrix. In Theorem 2.4.1 we give the large sample properties of the instrumental variable estimator under much weaker assumptions than we used to motivate the estimator.
Theorem 2.4.1.
Let
X = x,B + el,
XI = x,
+ u,,
where x, are k-dimensional vectors. Let W, be a q-dimensional vector, where n > q 2 k, and let (E,, x, - p,,, W, - pw,) be independently and identically distributed with mean zero and finite fourth moments, where e, = (el, u,). Let p, = (p,,, pwt) be a sequence of fixed (k + q)-dimensional row vectors satisfying lim n - 1
n-m
I=
1
pip, = Mpp.
Assume E{4IWJ = 0, E{4EtIW,} = c,,, and let n: = (nl, az)= M&(M,,, M,,), where plim n - 1
t=1
(2.4.17)
152
VECTOR EXPLANATORY VARIABLES
and Mwwand 4 M w w n 2 are nonsingular. Let fi be defined by (2.4.14)and let 3 be the smallest root of (2.4.13).Then
+ “0,
nl/@ - )I/ ?
(s;MWWn2)-1~,,1,
5 &,
where IT”,, = (I, -p)Zee(1,--p’)’ and xt-k is a chi-square random variable with q - k degrees of freedom for q > k. Furthermore, n112(B- fl) and 3 are independent in the limit. Proof. If the assumptions of the original model specification (2.4.1)are retained, is defined with probability one. If only the weaker assumptions of the theorem statement hold, given E > 0 there is some N such that is defined with probability greater than 1 - E for all n > N. We complete the proof assuming 1is defined with probability one. Because is the value of /Ithat minimizes the ratio n q - ’ [ ( l , -F)Saa(l,- py]-l[(l, -B’)%’W’WS(l,
-8’y-j
and 1is the minimum value of the ratio, it follows that 1is less than the ratio evaluated at the true p. The ratio evaluated at the true /lis nq- ’[( 1, -p)saa(1, - n’]- I ( 1, - p ) z ’ w ( w ’ w ) - 1W’Z(1, - /?y = nq- ’[( 1, -p)Saa( I , - p)’]- ‘v‘W(WW)- W‘V, (2.4.18)
’
where Z = (Y,X), v = e - up, and the expressions hold for general W of rank q. Now
(1, -/l’)Saa(l, -p)’ = ( n - q)-’(l, -p)Z’RZ(l, -p)’ = (n - ~)-‘v’Rv, where R = I - W(W’W)- W‘. Under our assumptions (n
- q)-’v‘Rv
P -+
0,”
and (2.4.19) The normality result of (2.4.19)follows because the random variables
w;v,= [ P W I + (WI - P w J I ’ ~ , are independently distributed with finite second moments. See the proof of Theorem 2.2.1 and Theorem 1.C.2.The zero mean for Wlu, and the form of the covariance matrix in (2.4.19)follows from assumptions (2.4.17).By (2.4.18)
2.4.
153
INSTRUMENTAL VARIABLE ESTIMATION
and (2.4.19), qx = C is bounded by a random variable whose limiting distribution is that of a chi-square random variable with q degrees of freedom. Hence, 1= Op(l). From the definition of / we have
1- 4 = [q-%>MWwi?,- ~ n - ' S , , , , ] - ' [ q - ' i ? ~ M , , i ? , + o,(n- liZ),
- ~ I - ' S , , ~-~/?] (2.4.20)
= [n;Mwwn2]-'n;Mwww
where w = B l -B,fi = M&MWV.The vector w is the vector of regression coefficients obtained by regressing v, = r; - X,fi on W,, and by (2.4.19), n112w5 N ( 0 , M&a,,).
The covariance matrix of the limiting distribution of n"'(/ (2.4.20). Now
- 8)follows from
-b)'
v ' = .[(I,
--j?)S,,(I, -/?)']-'(l, -)')i?'MW&(1, = ~ [ ( l -pfS,,(l, , -j?')']-'~'Kw + O , ( H - " ~ ) = aVilnw'Kw+ O,(n-'''),
(2.4.21) (2.4.22)
where
K = M w w - M,,n,(n;Mwwn,)-'tl.;M,,. Because n1/2M$La;,'/2a is converging in distribution to a N(0, Ik) random vector, the leading term of (2.4.22) is converging in distribution to a chi-square random variable with q - k degrees-of freedom. The independence of o ' K o and n;MWVin the limit follows from standard least squares results. 0 The limiting distribution of i j is that of a chi-square random variable with q - k degrees of freedom. For normally distributed el, expression (2.4.21)demonstrates that (q - k)-'ij is equal to an F random variable plus a remainder that is 0,(n-"2). Therefore, it is suggested that the distribution of (q - k)-'v" be approximated by the central F distribution with q - k and n - q degrees of freedom. A test of model specification can be performed by comparing (q - k)-Iv' with the tabulated F distribution. The covariance matrix of the approximate distribution of can be estimated with
/
o@>= (2%- ijSaa2,)-'S,,,
(2.4.23)
where (2.4.24)
154
VECTOR EXPLANATORY VARIABLES
The matrix z;Mwwn2 must be nonsingular in order for the model to be identified. Therefore, one should check this assumption by computing the smallest root of
132Mwwfi2 - 8n-1S,,,,221 = 0.
(2.4.25)
+
If the rank of z;Mw,a2 is k - 1, the smallest root fi divided by (q - k 1) is approximately distributed as a central F random variable with q - k + 1 and n - q degrees of freedom. A large F is desired because a large F indicates that the model is identified. If F is small it may be possible to identify the model by adding instrumental variables. It is suggested that the modified estimator given in Theorem 2.5.3 of Section 2.5 be used in practice. The estimator (2.5.13)can be written
j = [jifiz - ( a - CL)saa22]-"x5j - ( a - cI)s,,21]
(2.4.26)
in the notation of the instrumental variable model of this section. Note that the n of Theorem 2.5.3is equal to the q of this section. The theory of Theorem 2.5.3 is not directly applicable to the instrumental variable problem, but an analogous theorem can be proved for the instrumental variable problem. Setting CL equal to one in (2.4.26) produces an estimator of /Ithat is nearly unbiased. For the model with (at, u,) NI(0, Z), Anderson and Rubin (1949) demonstrated that the estimator (2.4.11) is a type of maximum likelihood estimator for the simultaneous equation model. For the simultaneous equation model, estimator (2.4.11) is called the limited information maximum likelihood estimator. Sargan (1958) obtained the estimator in the general instrumental variable setting. Anderson (1976, 1984) discusses the relationships between the limited information estimator and the errors-in-variables estimator. Fuller (1977) used the analogy to errors in variables associated with (2.4.9).
-
Example 2.4.1. In November 1983, the Department of English at Iowa State University conducted a study in which members of the general university faculty were asked to evaluate two essays. The essays were presented to the faculty as essays prepared as part of a placement examination by two foreign graduate students who were nonnative speakers of English. Three pairs of essays were used in the study: a pair containing errors in the use of articles; a pair containing errors in spelling; and a pair containing errors in verb tense. The faculty members were asked to read the essays and to score them using a five point scale for 1 1 items. The study is described in Vann and Lorenz (1984). We analyze the responses for eight items. The eight items are divided into three groups; three items pertaining to the essay, three items pertaining to the language used, and two items pertaining to the writer. The low and high
2.4.
155
INSTRUMENTAL VARIABLE ESTIMATION
points of the scale for the eight items were described as follows:
A. The essay is Z, Z, Z3
poorly developed-well developed difficult to understand-easy to understand illogical-logical
B. The writer uses language that is Z4 Z, z 6
inappropriate-appropriate unacceptable-acceptable irritating-not irritating
C. The writer seems
careless-careful unintelligent-intelligent
Z, Z,
Table 2.A.1 of Appendix 2.A contains the data for 100 respondents. This is a subsample of the 219 faculty members who participated in the study. The score on each item is the sum of the scores on that item for the two essays scored by the faculty member. The sample covariance matrix is given in Table 2.4.1. A possible model for these data is the factor model, where Z,; = bio
+
Bi+,3
+ / j ; 6 ~ , 6+ /ji*z,* + & t i ,
+E~;,
Zti = z , ~
for i = 1 , 2 , 4 , 5,7, for i = 3, 6, 8,
and the e, are independent [0, diag(oEEll, o E E 2 2. ,.., oEE88)] random vectors. In this model each observation is expressed as a linear function of three of the unknown true values, where one of the unknown true values is chosen from each of the three groups of equations. We shall estimate the equation for Z,,. Because the measurement errors in different variables are assumed to be uncorrelated, the observed variables not entering the equation for Z , , can be used as instrumental variables for that equation. Thus, we estimate the equation for Zfl using Z I z ,Z,,, Z t 5 , and Z,7 as instrumental variables. In the notation of this section, (',I?
' f 3 , 't6,
't8,
't,,
'143
'159
't7)
= (YY
x,17
x,,,
'133
&I,
&Z,
w 3 5 &4)*
We have not included the constant function in either the set of X variables or in the set of W variables. This will permit us to use matrices of corrected sums of squares and products in the calculations. To be strictly conformable
Mean
x2
Y Xl
-
6.50
7.07
2.1364 2.4092 1.1483 1.1839 1.8181 1.4675 1.3895 1.1271
XI
Y
2.9798 2.1364 1.2929 1.3182 1.7727 1.5758 1.5960 1.3131
Logical
Developed
7.76
1.2929 1.1483 2.3055 1.0533 1.3871 1.6166 1.8392 1.1459
x2
Irritating
w,
1.7727 1.8181 1.3871 1.2151 2.6516 1.6133 1.5802 1.4992 7.43
7.97
Understand
1.3182 1.1839 1.0533 1.7264 1.2151 1.2210 1.2804 1.4766
x3
Intelligent
TABLE 24.1. !Sample moments for 100 observations on language evaluation
6.96
1.5758 1.4675 1.6166 1.2210 1.6133 2.968 1 2.1685 1.2550
w 2
Appropriate
1.3131 1.1271 1.1459 1.4766 1.4992 1.2550 1.3685 2.6428 7.06
6.92
w4
Careful 1.5960 1.3895 1.8392 1.2804 1.5802 2.1685 3.2663 1.3685
w 3
Acceptable
2.4.
157
INSTRUMENTALVARIABLE ESTIMATION
with (2.4.1),(2.4.4),and (2.4.5),the X vector and W vector would be Xf = ( 1 , Z , , ,
T 6 ,Z18) and
W,
= (1,
z,,, Zt4,Z15,G).
The ordinary least squares estimates of the reduced form are
+
+
+
+
+
+
+
+
0.436w1 o.151q2 0.1311/t;, O.IlOW,,, (0.105) (0.100) (0.112) (0.696) (0.109) 2,, = 1.447 + 0.557&, + 0.148&:, + 0.052&, + 0.013&,, (0.572) (0.090) (0.092) (0.086) (0.082) 2,,= 2.396 o.175w1 o.176w2 0.3281/t;, 0.081&,, (0.570) (0.089) (0.092) (0.086) (0.082) 8,,= 2.918 0.108&, + o.114y2 + o.100y3 o.392w4. (0.076) (0.071) (0.068) (0.474) (0.074) = 0.523
+
+
The matrix S,, that is the estimator of the covariance matrix of the reduced form errors is .1.6825 0.8494 0.0794 0.2849
0.8494 1.1379 0.0264 0.2503
0.0794 0.0264 1.1284 0.0903
0.2849' 0.2503 0.0903 0.7813
The smallest root of Equation (2.4.13)is v" = 0.010. Because q - k = 1, the F statistic is equal to 0.010. Under the null, the distribution of the F statistic is approximately that of a central F with 1 and 95 degrees of freedom. The small, but not overly small, value for F gives us no reason to question the model. To check the identification status of the model, we compute the smallest root of Equation (2.4.25).The smallest root is 8 = 10.71 and the associated F statistic is F = 5.35. If the rank of n;mWWn2is two, the distribution of the statistic is approximately that of a central F with 2 and 95 degrees of freedom. Because the F statistic is large, we are comfortable with is three and we proceed with estithe assumption that the rank of n'gnWWn2 mation of the equation for developed. The estimators we present were computed using SYSREG of SAS. The limited information maximum likelihood option is used in which Y , X , , X , , X , are called endogenous variables and W,, W,, W,, W, are called exogenous variables in the block statement. We set ALPHA of the program equal to one, which results in the calculation of estimator (2.4.26)with c( = 1. The estimated equation is
P#= - 1.566 + 0 . 6 7 7 ~+ ~0 ~. 2 0 2 ~+ ~0.215~,,, ~ (0.843) (0.193)
(0.235)
(0.225)
where the estimated standard errors are those output by the program and s, = 1.058. The program calculates the standard errors as the square roots
158
VECTOR EXPLANATORY VARIABLES
of the diagonal elements of [Xt% - ( 3 - a ) ~ , , ~ ~ ] - ’ s , , .
00
Example 2.4.2. This example is taken from Miller and Modigliani (1966). The authors developed a model for the value of a firm and applied it to a sample of 63 large electric utilities. We write their model as
x=
Po
+ P I X I + el,
where k; is the current market value of the firm multiplied by 10 and divided by book value of assets, x, is the market’s expectation of the long run, future tax adjusted earning power of the firm’s assets multiplied by 100 and divided by book value of assets, and e, is the error in the equation. The coefficient flI is the capitalization rate for the expected earning power for utility firms. Because x, is an expectation, it cannot be observed directly. Each year the firms report their earnings and we let X, be the measured tax adjusted earnings multiplied by 100 and divided by book value of assets, where X,= x, + u, and u, is the difference between current reported earnings and long run expected earnings. Miller and Modigliani suggest three instrumental variables: Wt2 Current dividends paid multiplied by 100 and divided by book value of assets Wt3 Market value of debt multiplied by 10 and divided by book value of assets Wt4 Market value of preferred stock multiplied by 10 and divided by book value of assets
Because the firms practice dividend stabilization, dividends paid should reflect management’s expectation of long terms profits. In turn, management’s expectations should be correlated with the market’s expectations. Miller and Modigliani explain the more subtle reasons for including the other two variables in the set of instrumental variables. Miller and Modigliani (1966, p. 355) discuss possible correlations between the W? and u, and state: “But while complete independence is not to be expected, we would doubt that such correlation as does exist is so large as to dash all hopes for substantially improving the estimates by the instrumental variable procedure.” Therefore, our operating assumption is that
-
(e,,4’ W O , &J independent of (W?,, WtJ, Wf4).
2.4.
159
INSTRUMENTAL VARIABLE ESTIMATION
TABLE 2.4.2. Matrix of corrected mean squares and products Y
X
w5
w:
w:
0.8475 0.4I27 0.2336 - 0.0005 0.0466
0.4127 0.3020 0.1305 0.0092 0.0 106
0.2336 0.1305 0.1743 -0.0065 -0.0786
- 0.0005
- 0.0466
0.0092 - 0.0065 0.1780 -0.0792
0.0106 -0.0786 - 0.0792 0.3900
Source: Miller and Modigliani (1966).
Miller and Modigliani studied data for 3 years. We use the data for 1954. The sample mean vector is
(F, X, Wi,Wi,Wh)= (8.9189,4.5232, 2.5925,4.9306, 1.0974) and the matrix of mean squares and products is given in Table 2.4.2. The matrix of mean squares and products has been adjusted for two variables, the reciprocal of assets and a measure of growth. Therefore, the degrees of freedom is 60 for the mean squares and products presented in the table. We define the standardized vector W, by
w, = [l, (Wf2 wi,w,t, - w:,Wf4 - Wh)T] -
= [ 1,
(2.4.27)
@I],
1.
where T is the lower triangular (Gram-Schmidt) transformation 0.3953 0
[
T = 0.0879 2.3719 0 0.8318 0.8206 O1.777 1
Then the matrix of sample raw mean squares and products of W, is the identity matrix. Because we shall deal with moments about the mean, we have chosen to define the subvector a,. The matrix of corrected mean squares and products for X , , at)is
(x,
I
0.8475 0.4127 0.5594 0.0193 0.2767
0.4127 0.3020 0.3126 0.0334 0.1350
0.5594 0.3126 1.0000 0 0
0.0193 0.0334 0 1.0000 0
The model in the reduced form (2.4.4),(2.4.5)is
1.
0.2767 0.1350 0 0 1 .oooo
I; = n l l + n 2 1 w 2 + n 3 1 w 3 + n 4 1 w 4 + a:l, + n 2 2 w 2 + n 3 2 w 3 + n42w4 +
160
VECTOR EXPLANATORY VARIABLES
where nil = ni2Pl for i = 2,3,4. The reduced form estimated equations are
+ o.5594K2 + 0.0193& (0.0896) (0.0896) (0.0896)
= 8.9189
+ 0.2767K4, (0.0896)
2,= 4.5232 + 0.3126w2 + 0.0334w3 + 0.1350W4, (0.0570) (0.0570)
(0.0570)
(0.0570)
where the numbers in parentheses are the estimated standard errors of the regression coefficients. The matrix (n - 4)(n - I)-%,,
= m,, - rnzem&mez =
[,.4576 0.19981 0.1998 0.1849
and S,, = (60)-'S,,, where n - 1 = 60. The determinantal equation (2.4.12) is
1[0.0710 0.1300 0.0390 0.0710] -
,[,.,,,
0.0035 0.0035]1 0.0032 = 0,
and the smallest root is 1= 0.153. The corresponding F statistic is F = lS(0.153) = 0.23, which is approximately distributed with 2 and 57 degrees of freedom for a correctly specified model. The F test supports the model specification. If this F test were large, it would call into question the model specification. The statistic is small because the ratios of the three pairs of regression coefficients are roughly the same. The coefficients of K3 are small relative to the standard errors so that a value of 1.8 is totally acceptable for the ratio of the two coefficients. In this example there is little doubt that the model is identified. The test 7th = 0 is statistic for the hypothesis that F
= (0.00324)-'(0.03902)=
12.04.
If Cf=znf2 = 0, the distribution of the test statistic is approximately that of Snedecor's F with 3 and 57 degrees of freedom. One easily rejects the nT2 = 0 and accepts the hypothesis that the model is hypothesis that identified. The estimator of /I1 is
c;'=2
j1= (0.0390 - 0.0005)-'(0.0710 - 0.0005) = 1.8281
and the estimator of Po is 8, = 0.6500. By (2.4.2:), the estimated covariance matrix for the approximate distribution of fll) is '{(joy
where s,, = 0.3535.
)I)'>
=
[
(a,
1.0488 -0.2306
- 0.23061
0.0510 '
2.4.
161
INSTRUMENTAL VARIABLE ESTIMATION
The tests we have performed on the model indicate that the model is identified, but the sample size and error variances are such that the approximations based on the normal distribution of Theorem 2.4.1 should be used with care. In this example we carried out all numerical calculations parallel to the theoretical development. As illustrated in Example 2.4.1, the calculations are performed most easily in practice by using the limited information single equation option of a statistical package such as SAS (Barr et al, 1979).
on
REFERENCES Anderson (1951b, 1976), Anderson and Rubin (1949, 1950), Fuller (1976, 1977), Miller and Modigliani (1966), Sargan (1958), Sargan and Mikhail (1971).
EXERCISES 24. (Section 2.4) The full model for the Miller-Modigliani study of Example 2.4.2 is
where zI2 is the reciprocal of book value of assets multiplied by lo’, zI3 is the average growth in assets for 5 years divided by assets multiplied by 10, and the remaining variables are defined in Example 2.4.2. It is assumed that zI2 and zr3are measured without error. The sample correlation matrix for 1956 is -
y,
w, w,
X, 212 z,3 0.179 0.432 0.0 14 0.833 -0.162 1.Ooo -0.040 -0.083 0.630 -0.017 0.833 -0.162 - 0.040 1.000 -0.106 0.104 0.249 1.000 -0.384 0.344 0.179 - 0.083 -0.106 0.104 -0.384 1.000 -0.199 0.432 0.630 0.249 0.0 14 -0.017 1.om 0.344 -0.199 -0.022 -0.077 - 0.078 0.170 -0.442 -0.288
.ooo
1
w3
-0.022 -0.077 - 0.078 -0.442 -0.288
The means of the variables are (8.69,4.84,0.739,0.799,2.49,4.88,1.09) and the standard deviations are (1.1,0.55,0.98,0.80,0.42, 0.49, 0.52). Estimate the /3 vector by the method of instrumental variables. Estimate the covariance matrix of the approximate distribution of the estimators. 25. (Section 2.4) Assume that the model of Example 2.4.2 and Exercise 2.24 is replaced by the model with Po 3 0. Miller and Modigliani argue that it is reasonable for the model to have a zero intercept. Noting that the column of ones is an instrumental variable, estimate (B,, B2, B3). Test the model specification. Estimate the variance of the approximate distribution of your estimator.
162
VECTOR EXPLANATORY VARIABLES
26. (Sections 2.4,2.3) Let the model and assumptions of Theorem 2.4.1 hold. Let / be defined by (2.4.20)or, equivalently, by
/ = (MZz - k C z 2 ) - ' ( M z i - JSSst~i), where
and 1i s the smallest root of IM - ASee\ = 0. If we use (2.3.24),the covariance matrix of the approximate distribution of b is estimated by
S,, = n- lS,,., li = M,$2M,,,
6{j}= q-'M;ici,, + [ q - 1 + ( n - q)-']M-'e 2 2 pp M -Z1Z U . "", where e,, = n(n - k)-'H2S,,fi,, Mzz= Hz(M - S,,)H2, Hz= (0, Ik)' - (1, -/?)'[(l, -/?)SJ, 8,, = (n - k)-'[q(l, -@)M(i, -/?)'
-/?)']-'(l,
-/?)S,,
+ (n - q)(l, -/?)Sce(l, -/?)'I.
Note that (n, n - k, d,, Z,) of (2.3.24) corresponds to (q, q variable model. Prove that
- k, n - q, k i ) of
the instrumental
4) 1; N(O,I ~ ) , as n -* co.
[i'{j}]-''2(/-
27. (Sections 2.4, 2.3, 1.3) Let
Y, = P i x r + e,,
X,= x, + u,,
(x,, e,, u,)' ., NI[(P,,0,Ol', diag(u,,, nee,nu.11,
where px # 0. (a) Find the maximum likelihood estimator of (PI, px, u,,, u,,, uuu) for a sample of n vectors ( K,XI). (b) Give the covariance matrix of the limiting distribution of the maximum likelihood estimator of part (a). (c) Find the covariance matrix of the limiting distribution of the maximum likelihood estimator of (PI,p,, u,,, uuJ under the assumption that it is known that u,, = uuu.Show that the variance of the limiting distributtion of the estimator of PI is that of the optimal linear combination of X - ' P and estimator (1.3.7). 28. (Section 2.4) (a) Estimate the equation z,l
= PI0
+ p I J Z , 3 f &13
for the data of Example 2.4.1,using ( Z , * ,Z,,, Z , , , ZI6.Z,,, Z,8)as instrumental variables. Is the model acceptable? (b) Estimate the equation z17
= p70
+ p 7 J Z l 3 + fi76216 + fl7sz!8 + ' t 7
for the data of Example 2.4.1, using (Zll, Z,*, Z,4,Z,s)as instrumental variables. What do you conclude? 29. (Section 2.4) Show that when q > k the estimated error variance s,, given in (2.4.24)is the weighted sum of two independent estimates, one that is a function of S,, and one that is a function of 4.
2.5.
MODIFICATIONS TO IMPROVE MOMENT PROPERTIES
163
30. (Section 2.4) Verify, through the following steps, that (2.4.14)and (2.4.15)give the same estimator of b: (i) Define P, = W(W'W)-'W. In linear model theory, P, is called the orthogonal projection onto the column space of w. Show that
s,, = ( n - q)-'(Y.
X)"I, - p , m ,XI.
+
(ii) Show that (Q,8)'(3,8) - vS,, = (Y,X)'(Y, X) - [v (n - q)]S,,, and, hence, that i defined in (2.4.13)and 9 defined in (2.4.16)satisfy the relationship
v'= (iii) Show that, for i and
[O
+ (n - q)].
7 as defined in (ii), X'X - tsSaa22 = X'X - paa22, - X Y - ;s,,,, = X Y - jjsjoo21. ^ ^ ^ ^
31. (Section 2.4) Verify formula (2.4.21).(Hint: First derive an expression for - j similar to (2.4.20),but with M, replaced by MWw.) 32. (Section 2.4) Let q instrumental variables (F,, e l , ... , F,)be available for the single-x model (1.4.1).Let p, be given by (2.4.14)and let
6,= y, - P - j l ( X , - 2).
Is it true that Z,; , K,6, = 0 fprf = 1,2,. . . ,q? 33. (Sections 2.4, 2.2) The (Y,X) used in the instrumental variable estimator (2.4.14)can be viewed as the best predictor of (y, x) given W, where y = xb. Recall that it =X
+ (XI- W)m~~(m,,
- Zuu)
is an estimator of the best predictor of x, given XIwhen L,, is known and j, =
B + (x,- X)m;$nxr
is an estimator of the best predictor of y, given XI. Let
S,, = (n - k)- '
1 ( Y - j , , X,- C,)'(
t=1
and assume Eue= 0. Show that the estimator (2.4.14)with the rows of j defined in (2.2.12).
2.5.
Y, - j , , X,- 2,)
(P,8)equal to ( j , ,i,)is the estimator of
MODIFICATIONS TO IMPROVE MOMENT PROPERTIES
The estimators that we have been studying have limiting distributions, but the estimators do not necessarily have finite means and variances in small samples. In this section we introduce modified estimators that possess finite moments and that have small sample properties that are superior to those of the maximum likelihood estimators.
164
VECTOR EXPLANATORY VARIABLES
2.5.1. An Error in the Equation The model of Section 1.2 is
I: = Po + (XI,el, ~ 1 ) '
XtPl
+ e,,
Xr = xt
+ uf,
NI[(px, O,O)l, diag(axx, Gee,
(2.5.1)
~uu)],
where o,,, is known and a,, > 0. The estimator
ii,= (mxx - o,,)-lmxy,
(2.5.2)
introduced in Section 1.2, is a ratio of two random variables. Such ratios are typically biased estimators of the ratio of the expectations. In fact, the expectation of the estimator (2.5.2) is not defined. See Exercise 1.13. We shall demonstrate that it is possible to modify the estimator (2.5.2) to produce an estimator that is nearly unbiased for PI. To this end we define the alternative estimator,
j,= [I?,,
+ a(n - ~ ) - ~ a , , , , ] - ~ r n ~ ~ ,
(2.5.3)
where u > 0 is a fixed number to be determined, Hxx
and
=
{ ~ zI: ~ 7 -
x is the root of
if;Z 3 1 + ( n - I ) - ' (n - l)-l-jcu,, i f 1 < 1 + (n - 11-1,
lmzz - A diag(0, CT,,,,)~= 0.
(2.5.4)
(2.5.5)
The estimator 8, is a modification of the maximum likelihood estimator. Note that the estimator of ox,, denoted by A,,, is never less than (n - l)-lo,,u. The additional modification associated with a produces a denominator in the estimator of p1 that is never less than (n - l)-'(l + u)~,,,.Because the denominator in the estimator of P1 is bounded below by a positive number, the estimator has moments. The approximate mean of the estimator is given in Theorem 2.5.1.
Theorem 2.5.1.
0. Then
Let model (2.5.1) hold with r~,, > 0, o,, > 0, and
E { B , - P1} = ( n - l)-'a,'{a
-2
>
gee
- 2a;~ouu}auv + O(n-*), (2.5.6)
p1
where u, = e, - utpl. Furthermore, the mean square error of through terms of O(n-') is smaller for 0: = 5 than for any smaller a, uniformly in the parameters. Proof. The root
is given by -1 2 1= ~ , l C m x x - mYY mYxl-
2.5.
MODIFICATIONS TO IMPROVE MOMENT PROPERTIES
165
The quantity in the square brackets is the residual sum of squares obtained in the regression of X on Y multiplied by (n - I)-'. The residual sum of ;o la ,, is distributed as a chi-square random squares divided by ouu uy variable with n - 2 degrees of freedom. Therefore,
+
E ( 2 ) = (n - u - y n - 2)[1 and, for n sufficiently large, < 1 + n-1) = P
ti
+ (ouuayy)-loeeo,,]
< 1 + n - ; - ~(1)) < P{JX- E(f)l > 11 + n - 1 - E(;i)l} -E(R)I-~E{[R -~(R)14) G [i + =
~ X- E(X)
o(n-2),
where we have used Chebyshev's inequality and the moment properties of the chi-square distribution. Therefore, it is sufficient to consider the distribution of the estimator when 2 > 1 + (n and we write
8, - Dl = [mxx - ( n - ~ ) - ' ( n- 1 - ~ ) o , , , ] - ~ x [m,, + (n - l)-'(n - 1 - a)o,,P1] + ~ , ( n - ~ ) . Expanding p , - p1 in a Taylor series we have pl pI = a ; ~ [ m x +u mu,- o,, + a(n - ~ ) - ~ o , , , , ] -
- a;,2[mX, + mu"- o,, + a(n - l)-'oUv] x [2rn,, + mxx -ox, + mu, - uuu+ a(n - l)-'aUU]+ O , ( ~ I - ~ / ' ) ,
I:=,
where. for example, mxu= (n - 1)-' (x, - X)(u, - V). It can be verified that the conditions of Theorems 5.4.3 and 5.4.4 of Fuller (1976) are satisfied. Therefore, we may take the expectations of the first two terms in the Taylor series to obtain the bias result. (Recall the moment properties of the normal distribution.) The terms in the square (6, - PJ2 that contain a and whose expectations are O ( n - 2 )are (an- '%A2
-
40;,3(Un- l ~ u u M ~+ x "ma, - ~,,)(2mx,+ mu, - Cuu) 'u,,)(rn,, mu, - o , , ) ~ .
- 2a;,3(an-
+
The expected value of this expression is t~-~a.,;'[a;,(a~
- 801) - 2aauuoUu - 2ao;~a,(a,o,
+ 30&)] + O ( K 3 ) .
Because r~,,cr,, > q?,, the expression is smaller for a = 5 than for a < 5.
0
Monte Carlo methods can be used to demonstrate that the theoretical superiority of the modified estimator is realized in small samples. Table 2.5.1
166
VECTOR EXPLANATORY VARIABLES
TABLE 2.5.1. The maximum likelihood estimator adjusted for degrees of freedom
(PI) and modified estimator (j,) for 25 samples of size 21 Sample Number 1 2 3 4
5
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
m,,
rn,,
myy
0.89 1.14 1.27 1.28 1.56 1.61 1.69 1.71 1.74 1.74 1.86 1.93 1.94 1.95 2.0 1 2.02 2.09 2.1 1 2.17 2.29 2.34 2.59 2.95 3.38 3.70
0.12 0.90 0.23 0.83 0.65 1.11 0.41 1.18 0.56 1.53 0.88 1.37 0.76 1.34 0.56 1.59 0.74 1.65 1.05 1.44 0.66 1.45 1.20 2.19 1.41
1.46 1.79 1.83 2.15 1.64 2.22 1.52 1.87 1.55 3.05 1.21 3.07 1.66 2.37 2.19 1.74 2.01 3.85 1.98 2.52 1.10 1.92 2.40 2.24 2.19
@, 0.88 0.69 1.24 0.96 1.31 1.06 1.58 0.97 1.53 0.97 1.23 1.31 1.59 1.19 1.87 0.58 1.82 1.41 1.62 1.47 1.95 1.50 2.34 1.24 2.80
11.67 1.99 0.84 2.58 1.15 1.81 0.60 1.58 0.76 1.99 1.02 1.48 0.81 1.41 0.56 1.10 0.68 1.48 0.89 1.11 0.49 0.91 0.62 0.92 0.52
s, a =2
0.78 1.49 0.61 1.76 0.98 1.55 0.52 1.32 0.67 1.67 0.91 1.34 0.73 1.28 0.51 0.99 0.62 1.36 0.82 1.03 0.46 0.86
0.59 0.88 0.50
a=4-
2cX,
0.45 1.30 0.5 1 1.51 0.89 1.43 0.48 1.24 0.63 1.57 0.86 1.27 0.70 1.22 0.49 0.96 0.59 1.31 0.80 1.oo 0.44 0.84 0.58 0.87 0.50
contains the sample moments and estimators for 25 samples of 21 observations generated by the model
Y; = x, + e,,
(x,, e,, u,Y
-
X,= x, + u,,
(2.5.7)
NJ(0,I).
The samples were constructed so that the mean of mxx over the samples is equal to the population c x x .Therefore, this set of samples contains more information about the population of samples than would a simple random sample of 25 samples.
2.5.
MODIFICATIONSTO IMPROVE MOMENT PROPERTIES
167
Three estimators of j1are given in the table. The first estimator is the maximum likelihood estimator adjusted for degrees of freedom and is defined by (2.5.8)
where is the root of (2.5.5). The root 1is given in the fifth column of the table. The second estimator is the modified estimator (2.5.3)with c1 = 2. The third estimator is the modified estimator with a=
2
+ 2 m ~ ~ c=~ 4, -, 2k.,
where k,, = rn&mxx - o,,,).The bias expression of Theorem 2.5.1 furnishes the motivation for the third estimator. The quantity m ~ ~ o , ,is, , a biased esbut it is preferred to other estimators of the ratio because timator of ,,ac;,r of its smaller variance. The samples of Table 2.5.1 are ordered on mxx. This makes it clear that the c1 modification produces large improvements in the estimator for small m x x , while producing modest losses for large mxx. Sample number one is an example of the type of sample that produces estimates of large absolute value. It is because of such samples that the maximum likelihood estimator does not have moments and it is for samples of this type that the modification produces large improvements. Figure 2.5.1 contains a histogram for the maximum likelihood estimator adjusted for degrees of freedom constructed for 2000 samples generated by model (2.5.7). Figure 2.5.2 is the corresponding histogram for the modified estimator with c1 = 2. The superiority of the modified estimator is clear in these figures. The empirical distribution of the maximum likelihood estimator has a very heavy tail with a few observations well beyond the range of the figure. The spike to the right of the break in the scale of the abscissa of Figure 2.5.1 indicates that about 3% of the maximum likelihood estimates are greater than 3.5. On the other hand, the largest value for fll in the 2000 samples is 2.36. About 1% of the samples gave negative values for both estimators. The percentiles of the empirical distributions ofb, and 8 , for 2000 samples are given in Table 2.5.2. Percentiles are given for the population with ox, = 1 and for the population with crxx = 4.The entire distribution of the modified estimator is shifted to the left relative to the distribution of the maximum likelihood estimator. The shifts of the upper percentiles toward one are much larger than the shifts in the lower percentiles away from one. The empirical mean for the modified estimator for the population with crxx = 1 is 0.963 and the empirical variance is 0.190. The variance of the approximate distribution
FIGURE 2.5.1. Histogram for 2000 maximum likelihood estimates.
FIGURE 2.5.2. Histogram for 2000 modified estimates.
168
2.5.
169
MODIFICATIONS TO IMPROVE MOMENT PROPERTIES
TABLE 2.5.2. Monte C?rlo percentiles for maximum likelihood estimator adjusted for degrees of freedom (PI) and modified estimator with a = 2), u,, = 1, a,,,, = 1, = 1, and n = 21
(sl
axx= 4
a, = 1
Percentile 0.0 1 0.05 0.10 0.25 0.50 0.75
0.90 0.95 0.99
- 0.07
- 0.04
0.35 0.5 1 0.74 1.03 1.46 2.16 2.82 5.94
of the estimator for the population with
wh)
=
V{Pl>
0.63 0.72 0.78 0.89 1.01 1.15 1.31 1.42 1.67
0.30 0.45 0.68 0.93 1.25 1.54 1.71 1.96
0.61 0.71 0.76 0.87 0.99 1.11
1.25 1.34 1.55
ex. = 1 is
= (n - 1)-10,;2(cxxo"u = (20)-'[2(2)
+ du)
+ 13 = 0.25,
which is larger than the empirical variance. There are at least two explanations for this. First, the distributional approximation makes no allowance for the 1adjustment in the estimator. For the parametric configuration of our example this adjustment takes place about 19% of the time. Second, the modification has a definite effect on the variance of the estimator for samples of this size. The variance effect is evident in Figure 2.5.1. The Monte Carlo mean and variance of for the population with ex, = 4 are LOO0 and 0.0382, respectively. The Monte Carlo variance is larger than 0.0344 which is the variance of the approximate distribution. For the population with n,, = 4, the modification has less effect on the variance than it does for the ox, = 1 population. Also, the 1adjustment to the estimator is relatively infrequent when ex, = 4. The empirical mean and variance for the estimator with a = 2 + 2 m i i o , , were 0.883 and 0.133, respectively, for the population with ox, = 1 and were 0.992 and 0.0357, respectively, for the population with ex, = 4. Therefore, the estimator with c1 = 2 + 2rni:a,, has a smaller mean square error than the estimator with c1 = 2, a result that agrees with the conclusion in Theorem 2.5.1. The Monte Carlo percentiles of three test statistics are compared with the percentiles of Student's t distribution and the standard normal distribution
p,
170
VECTOR EXPLANATORY VARIABLES
TABLE 2.5.3. Monte Carlo percentiles for alternative test statistics for the model with a,, = 1, u,, = 1, fll = 1, and n = 21 Parameter and Variable
Percentile 0.01
0.05
0.10
0.90
0.95
0.99
- 2.99
- 1.94
- 1.43
0.72 0.63 1.28
0.83 0.75 1.59
1.06 0.97 1.97
-2.56 -2.71 - 2.22
- 1.73 - 1.87
- 1.29
- 1.67
- 1.45 - 1.30
1.18 1.06 1.34
1.44 1.32 1.69
1.80 1.72 2.33
Student's t l g
- 2.54
- 1.73
- 1.33
1.33
1.73
2.54
N(O, 1)
- 2.33
- 1.64
- 1.28
1.28
I .64
2.33
a,, = 1 i, for 1, i, for 8, iR I
4
i, for lI i, for 1,
6,, =
fRl
- 3.20
-2.24
-2.17 - 1.57
- 1.68 - 1.27
in Table 2.5.3. The statistics are
where
+ t?;~(auu~v,+ j:oiu)], z,, = (n - 2)-'(n - l)(myy - 2BlmXy + p:mxx), is defined in (2.5.3), sgn@, - pl) is the sign of p, pl, and 2 is the P{pI} = (n - 1)-'[I?;;g,,
A,,
-
square root of the likelihood ratio statistic for the hypothesis that the true value of the parameter is PI. See Exercise 2.37. The percentiles are given for two parametric configurations. The first is that of (2.5.7). The second parameter set is the same as the first except that oxx= 4 instead of ox, = 1. The distributions of i, and 7, differ considerably from that of Student's t
2.5.
171
MODIFICATIONS TO IMPROVE MOMENT PROPERTIES
with 19 degrees of freedom. All tail percentiles of the statistics for the population with ox, = 1 are to the left of the tail percentiles of Student’s t. At first glance, it might be surprising that the distribution of the Studentized statistic is skewed to the left when the distribution of the estimators is skewed to the right. This occurs because large estimates of PI are associated with small values of m x x . Small values of m,, produce large estimates of V { ) , } because of the small estimates for cxx.For example, the sample with (m,,, m x y ,muy)= (0.812,0.058, 1.925) yielded the statistics
b1 = 33.41
and
i, = 0.005.
Conversely, the majority of the small estimates for p1 are associate! with large mxx, and large mxx produce small estimates of the variance of pl. For example, the sample with ( m X X ,mxy, myy) = (3.146,0.649,2.122) yielded the statistics
), = 0.302
a,
and
2,
= -2.57.
PI,
Because the distribution of is moved left relative to that of the distribution of S, is to the left of that of i,. The percentiles of the likelihood ratio statistics are in much better agreement with percentiles suggested by large sample theory than are the percentiles of the Studentized statistics. The Monte Carlo results suggest that the normal distribution, rather than Student’s t , be used to approximate the percentiles of the transformed likelihood ratio statistic. In fact, the observed percentiles for the tranformed likelihood ratio statistic for the model with a,, = 1 are closer to zero than are the percentiles of the normal distribution. We believe this is so for the same reasons that the observed variance of is less than that suggested by the large sample theory. We state the following generalization of Theorem 2.5.1 without proof. The proof is of the same form as that of Theorem 2.5.1. An analogous theorem holds for x, fixed.
p,
Theorem 2.5.2.
Let
rl = Po + x,Sl + e,,
X, = x,
+ u,,
where x, is a k-dimensional row vector, u, is a k-dimensional row vector, and is a k-dimensional column vector. Let (Xt,
4)’
N I [ ( P x ,o)’, block diag(C,,,
‘EE)],
-
where Ex, is a k x k positive definite matrix, e, = q, + w,,and q j NI(0, aqq) independent of (x,, w,, u,)for all t andj. Let S,, be an estimator of the covariance matrix of a, = (w,, u,) distributed as a multiple of a Wishart matrix with
172
VECTOR EXPLANATORY VARIABLES
degrees of freedom d, = v-'n, where v is a fixed number. Assume So, is independent of (x,, 4,,a,) for all t . Let fi1
= [H,,
+ a(n - l)-lSuu]-l[fi,y + a(n - l)-'S,,,,,],
(2.5.9)
where c1 > 0 is a fixed real number,
i f 2 2 1 + (n - I)-' mxx - s u u m,, - [ X - ( n - I)-~IS,,,,i f 2 < 1 + ( n - I)-',
- {
H,,
=
1
N,,
=
{r~rIFY-
ifi2 I ( n - l)-l-jsuw if2 < 1
+ (n - 11-1 + (n -
11-1,
2 is the smallest root of and Z, =
(x,X,).Then
Imzz - &,I
E { $ , - 4') = - ( n - l)-lC;:{(k CuuZ;: J>C,,
+
= 0,
+ 1 - a) + (1 + v)[I tr(Z,,C,')
+ O(n-2).
Furthermore, through terms of order n - 2 , the mean square error of smaller for ct = k + 4 + 2v than for any smaller a. Proof. Omitted.
B1 is 0
The modifications of the estimators introduced in Theorems 2.5.1 and 2.5.2 guarantee that the estimator of Ex,, denoted by H,,, is always positive definite and that the estimator of fl possesses finite moments. It seems clear that one should never use an a smaller than k 1. The estimators are constructed on the assumption that Zx, is positive definite. In practice one yill wish to investigate the hypothesis that C,, is positive definite. The root A provides information about the validity of the model assumptions. A small root, relative to the tabular value, suggests that m,, is singular or that the model is otherwise incorrectly specified. For the model of Theorem 2.5.2, mrr can be singular if uqq= 0 or if mxxis singular. Study of the sample covariance matrix of X helps one discriminate between the two possibilities. When the model contains several independent variables measured with error, the smallest root, f, of the determinantal equation
+
lmxx - YSU"1 = 0
(2.5.10)
can be used to test the hypothesis that lCx,\ = 0. By Theorem 2.3.2, F = ( n - k)-'(n - I)? is approximately distributed as Snedecor's F with n - k and d , degrees of freedom when the rank of Z, is k - 1. When F is small,
2.5.
MODIFICATIONS To IMPROVE MOMENT PROPERTIES
173
the data do not support the assumption that the model is identified. One must decide if one has enough confidence in the assumption that 1?12, > 0 to warrant constructing an estimator of b. Theorem 2.5.2 is for the sequence of estimators indexed by n, where n is the number of observations. The error variance, Z,,,,, is constant. An alternative sequence of estimators can be constructed by letting the variance of u, approach zero, holding n constant. As with Theorem 2.3.2, the results of Theorem 2.5.2 hold for such sequences. See Fuller (1977). 2.5.2. No Error in the Equation A theorem analogous to Theorem 2.5.2 can be proved for the modification of the estimator constructed for the model of Section 2.3.
Theorem 2.5.3.
Assume
T: = x,j? + e,, 8,
XI
= x,
+ u,,
(2.5.11)
"(0, E&&),
for t = 1,2, . . . , n, where {x,} is a fixed sequence of k-dimensional row vectors and 8, = (el, u,). Assume that {IxIl} is uniformly bounded and that lim
n-cc
n-1 f=1
x;x,= M,,,
(2.5.12)
where M,, is nonsingular. Let wherefi is the smallest root of IM,, - AS,,I = 0, S,, is an unbiased estimator of Xea distributed as a multiple of a Wishart matrix with d, degrees of freedom, d, is proportional to n, and ct > 0 is a fixed number. Then E { b - /I} = [n-'(l - ct)1 O(n-').
+
+ ( a - ' + d;')M;:{?2,,,,
-B ~ ~ C ~ ~ C ~ ~ } ] M ; ~ Z ,
Furthermore, through terms of order n - 2 , the mean square error of uniformly in the parameters, smaller for ct = 4 than for any smaller ct.
Proof. Omitted. See Fuller (1980).
is,
0
A result analogous to Theorem 2.5.3 holds for random x,. In practice one can distinguish two situations. In one the error covariance matrix is known or estimated, and in the other the covariance matrix is known only up to a multiple. If the covariance matrix is known or estimated,
174
VECTOR EXPLANATORY VARIABLES
the estimator (2.5.13) can be used. If the covariance is given by YEEo2, where re,is known and d is unknown, the suggested estimator is
) = [ M x x - l(1 - a-'a)YJ1[Mxy
-
&I - n-'a)Yue], (2.5.14)
where 1is the smallest root of IM,, - AreE[ = 0. Table 2.5.4 contains the sample moments and estimators for 25 samples of 21 observations generated by the model
TABLE 2.5.4. Maximum likelihood estimator and modified estimators for 25 samples of size 21 with a,, s 1, nee= a,, = 1, and = 1 Sample Number 1
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25
s
mxx
mXr
myy
i
j,
a=1
a=2-GIz,,
1.05 1.24 1.36 1.45 1.50 1.53 1.54 1.62 1.63 1.67 1.76 1.78 1.82 1.83 1.88 1.90 2.07 2.12 2.38 2.40 2.79 2.88 3.1 1 3.30 3.38
0.1 1 0.53 1.01 1.02 0.28 0.82 0.23 0.52 0.64 0.56 1.08 0.8 1 1.26 0.24 1.22 0.98 1.37 1.11 1.87 0.93 1.22 0.43 1.10 1.42 1.73
1.58 1.20 2.69 1.81 1.80 1.75 2.39 1.75 1.87 1.40 2.62 1.51 2.07 0.94 1.40 1.79
1.02 0.69 0.82 0.59 1.33 0.8 1 1.48 1.16 1.10 0.97 1.02 0.82 0.68 0.88 0.39 0.87 0.37 0.80 0.84 1.33 1.39 0.99 0.74 1.18 1.16
4.90 0.96 1.86 1.19 1.69 1.14 3.92 1.13 1.20 0.79 1.48 0.85 1.11 0.26 0.82 0.94
1.53 0.9 1 1.73
1.15 1.20 1.08 1.74 1.oo 1.09 0.74 1.38 0.81 1.07 0.25 0.8 1 0.90
0.84 1.22 0.87 0.87 0.23 0.85 0.67 0.78
0.82 1.19 0.82 0.83 0.22 0.83 0.65 0.76
0.92 0.87 1.64 1.13 1.01 1.04 1.28 0.94 1.03 0.7 1 1.33 0.80 1.06 0.24 0.81 0.89 0.79 0.80 1.17 0.80 0.81 0.22 0.83 0.65 0.76
1.48
1.74 3.13 2.14 2.44 1.09 1.88 2.13 2.51
0.8 1
0.80
2.5.
175
MODIFICATIONS TO IMPROVE MOMENT PROPERTIES
where Po = 0, P, = 1, and a2 = 1. The set of samples is such that the mean of m,, for the 25 samples is equal to 2, the population ox,. The samples are ordered on m X x .Estimators of PI were constructed for the samples under the assumption that 5’ is unknown. The first estimator is the maximum likelihood estimator where
S, = ( m x x - b-lmx,,,
(2.5.16)
is the smallest root of
(2.5.17)
= 0.
lmzz -
The second estimator is the modified estimator (2.5.14) with third estimator is the modified estimator (2.5.14) with ct =
1
ct
= 1 and the
+ m ~ ~ n =, , 2, - I?,,,
where R,, = m;i(m,, - CT,,,,).Although larger estimates tend to be associated with small values of m,,, the association is not as strong as that observed for the CT,,,known case of Section 2.5.1. Also, the effect of the modification on the estimator is not as marked as the effect for the CT,, known case. Samples 1 and 7 are examples of the kinds of samples that produce estimates for PI that are large in absolute value. It is for such samples that the modification produces large improvements. The percentiles of the Monte Carlo distributions for 2000 samples of size 21 are given in Table 2.5.5 for two sets of parameters. In the first set, oXx= 1 and in the second set, cXx= 4. The remaining parameters are the same in both sets with B,, = CT”,= 1 and PI = 1. The modification moves the distribution to the left with large shifts in the upper percentiles. As a result, the TABLE 2.5.5. Monte Carlo percentiles for maximum likelihood estimator (b)and with a = l), u,, = nu, = 1, /?,= 1, and n = 21 modified estimator
(B,
Cl,,
Percentile 0.10 0.25 0.50 0.75 0.90 0.95 0.99
IT,,
81
~
0.0 1 0.05
=1 a 1 ~
0.11 0.42 0.54 0.76 1.oo 1.30 1.73 2.15 4.69
=4
81
Pl
0.63 0.74 0.80 0.89 1.oo 1.13 1.26 1.35 1.60
0.63 0.73 0.79 0.88 0.99 1.11 1.23 1.32 1.54
~
0.1 1 0.39 0.52 0.73 0.95 1.21 1.54 1.78 2.56
176
VECTOR EXPLANATORY VARIABLES
TABLE 2.5.6. Monte Carlo percentiles for alternative test statistics for the model with u,, = a,, = 1, = 1, and n = 21
Parameter and Variable
1 t , for j, i, for
ax (P~)2u,,.
If u,, < u,, or u,, < (py)2u,, and if tnvv< ( j y ) 2 m x xshow . that the supremum of the likelihood over the parameter space is given by = (a, a,,,)',where
+ (p?)-'myy
dxx = uUu
and d,, = (/3y)2~.u.
If ux, < uuuor u,, < ( p ~ ) 2 u ,and u if m,, > (Py)'m,,, show that the supremum of the likelihood on the parameter space occurs for d,,
= u,,
and
d,, = (py)zu,.
+ mrr.
38. (Section 2.5.3) Let uuMbe unknown and unbiasedly estimated by s,, where d,u,'s,, is distributed as a chi-square random variable with d, degrees of freedom. Assume s,, is independent of XI),t = 1, 2 , . , , , n. Let the estimator 7 , be the estimator (2.5.23) with uuureplaced by swu,Show that the variance of the approximate distribution of f l is
(x,
(n - l ) ~ l m ~ ~ * ( t n+x u,u,, ~ u ~ u + u:")
+ 2d;
'tn;+,%&.
39. (Section 2.5.3) Expand (f, - yl), where the estimator is defined in (2.5.23), in a Taylor expansion through terms of O,(n- '). Show that the expectation of the sum of these terms is zero. 40. (Section 2.5.3) Show that if uuu= 0, the estimator (2.5.23) reduces to 91
=
b&t
i(n - 2Nn - l ) - ' i q 1 t } 1 - I 3
where j l t is the ordinary least squares regression coefficient for the regression of on x, with an intercept, and V { j , / ] is the ordinary least squares estimator of the variance of j l t .
21 22 23
16 17 18 19 20
10 11 12 13 14 15
9
1 2 3 4 5 6 7 8
Observation
6 6 6
5
7 7
6 5
6 5 5
7 6 6
4
5 6 3
4
2 2 5 3
Developed Y
6 6 6 6 6 7 6 6 7 7 7 7
5
5 4 5 5 5 5 6 6 5
4
x,
Logical
8 10 9 8 6 7 8 7 8 10 5
6 6
5
5
7 6
8
5 6 6 7 9
x2
Irritating
TABLE 2A.1. A sample of 100 evaluations of essays
APPENDIX 2.A.
8 8 10 6
7
8
7
6 7 6 9 8 7 6
6
7 7
8 8
5
6 6
7
x3
Intelligent
5 7 7 6 6 7 7
9 7
5
5 1 9
9 6
7 6 6 5 4 6 8
8 8
8
5
6
8 7 9
4
3 6 5 6
w2
Appropriate
8
7
6
6 7
6
5
6 6
4
6
w,
Understand
LANGUAGE EVALUATION DATA
6 7 6 5 5 8 10 6
9
10
9
7 5 5 5 2 6
7
6 6 6
4 6
w,
Acceptable
7
7 6 6 8
7
5
4
6 8 6 9 6 8
6
6 6 6 5
4
7 6 5
w4
Careful
E
L
7 7 7 7 7 7 8 7 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 10 10
6 6 7 6
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
45 46 47 48 49
44
XI
Y
Observation
8 6 8 6 5 7 7 6 8 8 7 7 8 8 8 7 8 9 10 9 10
7
Logical
Developed
TABLE ZA.1. (Continued)
10 7 10 9 8 9 8 10 10 10
10
8 7 7 9 7 9 7 5 7 6 8 9 6 8 8
x2
Irritating 8 8 7 8 8 10 6 8 8 7 9 8 8 5 9 9 10 8 8 9 10 10 10 10 10 10
x3
Intelligent
7 8 10 10 8 10 7 9 9 9 10 10 10
10
7 8 7 9 8 8 6 8 8 9 8 8
WI
Understand 6 6 6 7 6 9 6 3 6 8 9 9 6 8 8 9 8 5 7 8 6 7 9 10 10 10
w 2
Appropriate
6 5 7 4 8 8 8 6 8 9 8 5 9 8 6 8 9 9 10 10
7 7 9
6 7
6
w 3
Acceptable
8 8 7 8 7 8 4 5 7 6 7 8 8 2 9 8 8 7 8 8 10 9 10 8 10 10
w4
Careful
54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78
53
50 51 52
I
6 6 7 8 7 7 7 7 6 8 6 7
4
5 6 5 6 9 8
4 6
5 7
6
8
4
6 5 4 6 4 5 4 6 5 6 7 8 6 6 5 7 8 6 7 7 6 8 8 5 4 6
3
10 5
10 4
10 5 6 6 6 6 8 8 8 8 7 6 6 8 7 6 8 7 5 6 8 8 7 7 8 9 9 9 6
9 9 8 8 9
8 8 8 7 7 8 9 9 8 7 7
8 8
7 7
5
8 8
7
6
7
10 8 7
6 6 6 8 8 9 7 8 6 8 9 8 6
5
8 8
8 5 4 6 5 6 6
5
7 6
3
10 4
8 7 7 6 6 8 6 8
7
4 6 5 5 5 6 6 5 7 6 6 7 7 6
5
10 4 4 4 4
5 7 7 8 7 6 8 8
6 6 5 4 7 6 6 7 7 6
5 5 6
4
4 4 5 5 4
3
10
7 6 6 8 5 6 6 7 10 9 8 7 5 8 5 9 6 8
1
6
5
7 5
5
10 6 7 7 6
01 8 01 01 8 8 6
L 6 8
L
L
8 8 8 8 L 9
P 9
5 9
01 01 01 6 6 01 6 6 L L 6 8 8 8 L L 8 8 8 8 L
9
01 01 6 6 6 8 8 8 6 01 8 8 8 8 8 L 8 L 8 8 6 L
01 6 01 01 6 L 6 8
L
6 6 8 8 O[ L 9 01 L
6 6
L
01 01 01 6 6 L L 01 6 L L 6 8 8 8 8 8 01 L
01 L
L L
01 01 01 01 6 01 01 8 01
6 8 8 8 8 01 01 8 L L 8 L 6
01 01 6 8 6 9 L 8 8 L L 6 8 8 8 01 9 6 8 5 01 8
01 6 L 8 8 9 8 L 8 8
L
8 8
L
L 01 9 8 L 5 6 8
001 66 86 16 96 56 P6 €6 Z6 16 06 68 88 18 98 58 P8 €8 Z8 18
08 61
Measurement Error Models Edited by WAYNE A. FULLER Copyright 0 1987 by John Wiley & Sons, Inc
CHAPTER 3
Extensions of the Single Relation Model
The linear measurement error model with normally distributed errors has a considerable history. Consequently, the theory for the models of Sections 2.2-2.4 is relatively well developed. On the other hand, extensions of the model to nonnormal errors, nonlinear models, and heterogeneous error variances are currently areas of active research. In Section 3.1 we consider the problem of heterogeneous error variances and nonnormal errors. Estimation procedures for a number of specific situations are suggested. Section 3.2 treats estimation for nonlinear models with no error in the equation. Nonlinear models with an error in the equation are considered in Section 3.3. Section 3.4 is an introduction to estimation for measurement error models containing multinomial responses. 3.1. NONNORMAL ERRORS AND UNEQUAL ERROR VARIANCES
The models of Sections 2.2 and 2.3 assumed the errors to be normally distributed and the covariance matrix of the errors to be the same for all observations. In this section the error covariance matrix is permitted to be a function of t . Estimation procedures are suggested that represent a melding of the linear errors-in-variables techniques of Sections 2.2 and 2.3 with methods for ordinary linear models containing heteroskedastic errors. Relatively distribution-free estimators of the covariance matrix of the approximate distribution of the estimators are developed. The estimators and their large sample properties are given in Section 3.1.1. Applications to several specific problems are described in the sections following. 185
186
EXTENSIONS OF THE SINGLE RELATION MODEL
Introduction and Estimators
3.1.1.
To introduce the general problem, assume that we have conducted n experiments to measure the relationship between y, and x,. Each experiment provides an estimate of (y,, x,), denoted by (& XI),that contains measurement error. An estimate of the covariance matrix of the measurement error, denoted by t,,,,,, is also obtained in each experiment. The true covariance matrix of may vary from experiment to experime_nt. In the error, denoted by Eaatlr Example 3.1.7 the X I ) are estimated regression coefficients and &,,, is the estimated covariance matrix of the coefficients. In Example 3.1.4 the XI)are the averages of two determinations and is a one degree-offreedom estimator of X,,,,. Let
(x,
e,,,,
(x,
Z, = z, + a,, Ind(O, block diag{a,,, Caatl}),
Y , = x,C + q,,
(q,, a,)’
-
(3.1.1) (3.1.2)
where 2, = ( y , X,), a, = (w,,u,) is the vector of measurement errors, qr is the error in the equation, x, is a k-dimensional row vector of true values, and Ind(0, C,,,,) denotes a sequence of independent random variables with zero means and covariance matrices C,,,,. We assume that estimators of C,,,,, t = 1,2,. . . , n, are such that
(3.1.3)
plim n-1 fl+W
2 E,,,,
I=1
To arrange the unique elements of i?,, =
vech C,,,, 1
=
lim
n-tw
n-1
5 x,:,,,,.
t=1
(3.1.4)
e,,,, and C,,,, in columns, we let and c,, = vech Cap,,,
(3.1.5)
where vech A is the column vector created by listing the elements of the matrix A on and below the diagonal in a column. See Appendix 4.A. We have introduced several components of the estimation problem, including the specified relationship (3.1.1), the (I;,XI)observations, and estimates of the covariance matrices of the a,. A final component of the problem is a set of weights, 7c,, t = 1,2, . . . , n, estimated by the set of weights, f?,, t = 1 , 2 , . . . , n, for the observations. The weights will generally be related to the variances of the errors in the model. Often, as with ordinary generalized least squares, the estimation procedure will be an iterative one with K, = 1 for the first iteration and an estimator of the “optimal” weight used as the 72, for the second and higher iterations.
3.1.
NONNORMAL ERRORS A N D UNEQUAL ERROR VARIANCES
187
To construct an estimator for the parameters of model (3.1.1) assuming that oqq> 0 is unknown and that a set of estimated weights is available, let
(3.1.6) If, for example, A, E 1 and (3.1.3) holds, then M,,, Therefore, a natural estimator for /?is
s
= M,, is unbiased for MZz.
= M,ZA,,,,
(3.1.7)
where M,,, and Mxxg are submatrices of A,,,. In practice, one would modify estimator (3.1.7) by a procedure analogous to that of (2.5.9). We give the large sample properties of estimator (3.1.7) in Theorem 3.1.1. The specification introduced in Theorem 2.2.1, which covers both the functional and structural models, is used in the theorem. Persons interested in applications may prefer to proceed directly to Section 3.1.2.
Theorem 3.1.1. Let model (3.1.1) and (3.1.4) hold. Let [q,, at, (tar- cot)', (x, - p,,)], t = 1 , 2 , . . . , n, be independent with bounded 4 -f 6 moments (6 > 01, oqq> 0,E { ( q , , ar, qrar)lxr}= 0, E{x, - p x r }= 0, and (3.1.8) be a fixed sequence indexed by by fixed positive numbers. Let
1,
where {n,} is bounded
j , = [Z,, (vech ZiZ,)', (vech ziz,)', tb,]' and assume that (3.1.9)
for j = 1, 2. Let plim n- w
n-1
f n,x;x, = p ~ i mM,,,
,=I
lim G
n+cv
be positive definite, where G = n -
=
C
'I:=,n:E{did,},
d; = W r- L
u, = qr
= M,,,
fl+W
( t ,
+ w,- urj,~,,,,= (E,,,, - ~,,,,p),and euwft and e,,,, are submatrices
188
of
EXTENSIONS OF THE SINGLE RELATION MODEL
e,,,,.Assume plim n-1’2 n-rm
lim
n-m
n
C (fif - nJd, = 0,
1=1
n-1 f=1
(3.1.10)
nfp:fpx, = Mbff,,.
Then
? p ( j - fl) 5 N ( 0 , I),
where
(3.1.11)
qPP - n - l M xffx - l G M -xffx, l
(3.1.12)
Proof. We have ,1/2(j
- t)= n1/2M-1@ xffx -
n- 1 / 2 M xffx - 1
- M- n - 1 1 2 A
xny
-Mxnxfl)
9 - %“fJ 9 n K u 1 %”fJ + O,(l)l
1=1
fif(Xl0,
-
1=1
where we have used the distributional assumptions and assumptions (3.1 4,
(3.1.9), and (3.1.10). The random variables
n,di = n,(X$, - ~,,,,) are independent random vectors and as n
[
1=l
+ 03,
~ ~ E { d : d , } ] - ~ ” n,d; A N(0, I) 1=l
by Lemma 1.C.2. To complete the proof we show that
[
plim G - n - l n+ m
i n:E(d:d,)]
r=1
= 0.
Because the random portions of the elements of did, have bounded 1 moments and { n f }is bounded,
+
3.1.
NONNORMAL ERRORS A N D UNEQUAL ERROR VARIANCES
189
by the weak law of large numbers. Using d; = Xiur
- %tr
-
- %utt)O
- B)
and assumption (3.1.9), we have
Also, by assumption (3.1.9),
plim n - ’ n- w
n 1=1
(Afdid, - n:d;d,)
and we have the conclusion.
=0
0
Theorem 3.1.1 gives us a statistic that can be used, in large samples, to test hypotheses about fl without completely specifying the nature of the distribution of (x,,uJ. If (el, u,) NI(0, X,,),ind:pendent of x,, li, = n, = 1, and g,,,, = C,,, the estimator of the variance of /Idefined in (3.1.12) is estimating the same quantity as that estimated by (2.2.25) of Section 2.2. When the measurement errors are normally and identically distributed and independent of the true x values, then the estimator (2.2.25) is preferred because it has smaller variance. The estimator (3.1.12) is usually biased downward in small samples. If the sample is not large, it may be preferable to develop a model for the error structure rather than use the distribution-free estimator of Theorem 3.1.1. As with all problems of regression type, one should plot the data to check for outliers in either u*, or X,. In addition to the plot of 6,against the individual X t i , the plot of C, against [X,(n-’ X;Xj)-1X;]1’2 will help to identify vectors X, that are separated from the main cluster of observations. Distribution-free methods of the type under study will not perform well in a situation where one or two observations are widely separated from the remaining observations. Carroll and Gallo (1982) discuss some aspects of robustness for the errors-in-variables problem. Given the estimator of /?, one can estimate the variance of the error in the equation with
-
d,,
= (n
- k)-I
c (I: n
I =1
- Xj)’
-
(3.1.13)
Naturally, the estimator of oqqis taken to be the maximum of (3.1.13) and zero. If the estimator is zero, one may choose to estimate /3 by the methods of Theorem 3.1.2 below. In some situations it is known that the q1 of the model (3.1.1) are identically zero. To construct estimators for such models, we proceed by analogy
190
EXTENSIONS OF THE SINGLE RELATION MODEL
to the methods developed in Section 2.3. The suggested estimator is
a
= ~;,',fi,,,,
(3.1.14)
where * A
Mznz
and
2 is the smallest root
= MZnZ - A
L a . .,
of IMZnZ
- &ma.
.I = 0.
(3.1.15)
The limiting process used in Theorem 3.1.2 is that introduced in Theorem 2.3.2. The error variances are assumed to be proportional to T-' and the limiting behavior is obtained as v = Tn becomes large. The index v becomes large as the number of (y, XI)observations, denoted by n, becomes large and (or) as the error variances become small.
Theorem 3.1.2. Let model (3.1.1) hold with q, = 0 and %att
= T- 'finart
for t = 1, 2 , . . , , n. Let [T'/2a,, T& - ca,)),(x, - a,,)], t = I, 2 , . . . ,n, be independent with bounded 4 6 moments (6 > 0), Eta, Ix,} = 0, E { x , - a,,} = 0, and
+
where v = Tn. Assume that
n
1 (A{ - n!)(z;zI, Taia,, TE,,,,) = ~ , ( n - ' / ~ ) ,
n-1
I= 1
(3.1.16)
for j = 1, 2, where
t,= [T'/Z(vec z;a,)',
T(caI- vech a:at)', T(L,, - c,,)').
Let {(p,,, Tcb,, n,)} be a fixed bounded sequence, where {n,} is bounded above and below by positive numbers. Let plim n - 1 V'W
f
,=I
fi,x:x, = plim M,,, V-
Iim G = G
v+m
m
=
-
M,,,,
3. I .
191
NONNORMAL ERRORS AND UNEQUAL ERROR VARIANCES
be positive definite, where G = n - ' x f = ,n~E{d~d,},
f $k,,,,
plim Tn-
= plim \,-m
r=1
V-m
Then
where r v= n-'M;,;GM;A, (3.1.14).Furthermore, if n
f nrZaatt.
r= 1
cf=l
G = n-' x,2E{d;dr},and as v + 00, then
-+ 00
VpBI'?B
where
Tn-
L
- B)
W O , 11,
Q 88 - n - l M -xl n- -x G M 2 ,
i
= ( n - k)-'
-
&=
(cur".. I L r u .
XZ, -
.I = n -
L
c
r=1
is defined in
(3.1.1 7 )
*2-1nr d,d,, *
t
r
- 6kh. .($ - 5 v u t r ) L n u . . 3
n
,=I
L r ) r
%(&"l,?
Z",,,) = ( a " ' L r 4 a"'~aurr)t
(~uuttr
and a"' = (1,
-fit).
Proof. By (3.1.16) MznZ- M,,, - Zana,.= Op(n-1/2). Then, by the arguments used in the proof of Theorem 2.3.2, A - 1 = O,(n- 112) and
s
-B =
RA(Mxny- fix,,/?)
c k,[(X$, n
= t2-1M;A
r=1
- %),
-
(1- 1)?2uu,r]+ op(v- 1/2)*
192
EXTENSIONS OF THE SINGLE RELATION MODEL
Multiplying the equation
(sl
filz;zl- x
i fi12,,,,1
1=1
a" = 0
on the left side by a' and using zla = 0, we obtain
Also, n
The random variables T 1 h l d Iare independent and, if n have
-+ 03
as v
-+ 03,
we
r, 1'2(8- fl1 &+ N O , 1)
by Lemma 1.C.2 and Theorem l.C.3. By the assumptions, pIim TICI;,$ICI;~ V+oO
and, i f n
-+
03
TM;$~M;~:
V+rn
as v - + co,
plim (n - k)-'T v-m
= plim
i fi:ddl
I=
1
= plim n-'T v+ w
2 n:E{d;d,}.
1=1
17
The normal approximation of Theorem 3.1.2 will be satisfactory when the sample size, n, is large or when the errors are normally distributed with small variances.
3.1.
193
NONNORMAL ERRORS A N D UNEQUAL ERROR VARIANCES
Using the fact that the variance of a chi-square random variable is 2b, where b is the degrees of freedom, an approximation for the distribution of (n - k ) - ' n i is the F distribution with b and infinity degrees of freedom. The parameter b is unknown, but b can be estimated by
(d'Ea,a,,i)-2(n- k ) - 2 where oz' = ( I , -@) and o", = yt - X$. Maximum likelihood estimation for the normal distribution model with no error in the equation is discussed in Section 3.1.6.
3.1.2. Models with an Error in the Equation Assume that the error covariance matrices of model (3.1.1)are known. Then, a method of moments estimator for is (3.1.19)
j = M;:Mxy, where the matrices,
(A,,,
Mxx) = n-
i "XY,
1=1
- ~ " W t t f (XF, t -
~,,f,)l~
are unbiased for (Mxy,Mxx).In practice one would modify estimator (3.1.19) in the manner described in Section 2.5. The modified estimator is (3.1.20)
2 is the smallest root
of
1
1=1
(ZZ
- Azaatt)l=
(3.1.21)
0,
and z, = (y,, xJ. The limiting distribution of n1l2($ - /3) is the same as that of n'/2($, - 8). Given the estimator of B, an estimator of crqq is d,, =
f=l
[(n - k)-'(Y, - X$)' - n-'(l, -fi')Zaatf(l,
-pr)'].
(3.1.22)
The estimator in (3.1.22) will be positive for estimator (3.1.19) if the root A defined in (3.1.21) is greater than one. If 2 < 1, the estimator of crqq is zero.
194
EXTENSIONS OF THE SINGLE RELATION MODEL
By Theorem 3.1.1, the estimator (3.1.19) is normally distributed in the limit. An estimator of the variance of the approximate distribution of b constructed under the assumption of normal errors is
9{&= n-lfi;:GM;;,
(3.1.23)
where
a",,,, = gqq
+
- 2B'%VIt+ B ' L r r B ,
~ W W , ,
Ex,ff= Xuwrr- C,,,,B, and M,,
is defined in (3.1.19). The alternative estimator of the variance given in (3.1.12) is
T{)}
c:=l
= n-lM-l&(,j-l xx
xx
3
(3.1.24)
where 8 = (n - k)*-' did, and d, = XlC, - E,,,,. The estimator fl is a consistent estimator and is relatively easy to compute, but it may be possible to construct an asymptotically superior estimator using as a preliminary estimator. Recall that we can write 1
1
K = X,B + V,,
(3.1.25)
where u, = e, - u,fl and e, = qr + w,. The variance of u, in (3.1.25) is analogous to the variance of the error in the equation for the tth observation of a linear model. Therefore, it is reasonable to construct an estimated generalized least squares estimator by using an estimator of oUvIr to weight the observations. Such a weighted estimator is
where fiSvffis defined in (3.1.23). The estimator (3.1.26) can be modified by the method used to construct (3.1.20). Because the error in the preliminary estimator of fl is O,,(n- 'Iz), the error in d,,, of (3.1.23) is 0,(n-'12). It follows that fiu;itwill satisfy the conditions for i2, of Theorem 3.1.1. Unde! the assumption of normal errors, an estimator of the covariance matrix of /Iis
where
3.1.
NONNORMAL ERRORS A N D UNEQUAL ERROR VARIANCES
195
If one is unwilling to assume normal errors, one can estimate the covariance matrix with expression (3.1.12). The estimator (3.1.26) has intuitive appeal and the use of the 8";; as weights minimizes the first part of the covariance matrix of the limiting distribution-that associated with x$,. However, because of the contribution of the variance of uiu, to the covariance matrix, one is not guaranteed that the large sample covariance matrix of the estimator (3.1.26)is less than that of estimator (3.1.20).See Exercise 3.1. The weights that minimize the variance of the limiting distribution depend on the unknown x,. Without additional assumptions, we cannot construct a best weight, because we are unable to construct a consistent estimator of each x,. We expect estimator (3.1.26) to be superior to estimator (3.1.20) in almost all practical situations. One situation in which the measurement error variance will differ from observation to observation arises when the number of determinations differs from individual to individual. The different types of observations correspond to the different number of determinations. This situation occurs naturally in studies where the error variance in the explanatory variables is unknown. Then a part of the project resources must be used to estimate the error variance and this may be accomplished by making duplicate determinations on some of the study elements. Assume that it is desired to estimate the parameters of a model of the form (3.1.1) with V{(q,,at, x,)') = block diag(o,,,
x,,, xXJ.
(3.1.29)
To estimate all parameters, a sample of n elements is selected from the population and duplicate determinations are made on a subset of d , of the n elements. We assume that the two determinations are independent and identically distributed. Then we can write (Kj,
Xtj) = (
X J + (wtj, U t j L
~ t ,
(3.1.30)
where ( Xj, X,,), j = 1,2, is the j t h determination on the tth element, W a t 1 , a,2,'>
= block diag(C,,,
C,,),
(3.1.31)
and atj = (wtj, utj). An estimator of C,,, is (3.1.32) where Ztj = ( K j , Xtj). Given this estimator of the error covariance matrix, a consistent estimator of 1 is
196
EXTENSIONS OF THE SINGLE RELATION MODEL
where
[2
*
Xxx = n-'
I= 1
XiX,
(F,, XI)= -2 j C (xj,Xtj) =1 1
2
+
f=dj+l
fort
=
1
X;,Xt1 - ( n - $d,)Suu , 1 , 2 , . . . ,d,.
The estimator (3.1.33) satisfies the conditions of Theorem 3.1.1 with
.. %,it
=
'(zfl- zl2)'(zl1 - Zl2),
( n - td$d,)-
t = 1,2, . . . , d, t = d , + 1, . . . , n.
Condition (3.1.4) for the estimated covariances follows from our model assumptions and the assumption that d, is increasing. The estimator of /I given in (3.1.33) is a simple weighted average of the estimators constructed from the first d, observations and from the last n - d, observations. The estimator (3.1.33)is not efficient because the simple weights are not the optimum weights. If only the first d, observations are used to estimate /?,the variance of the approximate distribution of the estimated /I is v p p l l = d j'
+
+4 X u J u u ) G
~ ; . ~ ~ x x ~ u l ~~ l Ll P u u l l
+(4d,)- ' E L l ( & u g r r + L A J X L l ? (3.1.34) - /3')ZaR( 1 , - p)' and oUu1= gqq+ torr.See Theorem 2.2.1.
where or, = (1, From the expression for Vpp,I , the variance of the approximate distribution of M,,,, is I
vMMii =
v{Mx,iiI
=~
v
p
p
i
i
~
(3.1.35)
+
because Mxyl= E x r ~ l O,(n- '). If the last (n - d,) observations are used to estimate /3, the variance of the approximate distribution of the estimated /3 is vpp22
= ( n - d,)-
+
1 ~ : x x I ( ~ x x ~ u u 2 2cuuo""22
+ d j'zi.(LuOrr + zuuzuuFlx'
+L
2
X U U P 2 (3.1.36)
where a,,22 = oqq+ o,~,and the variance of the approximate distribution of Mixy22is (3.1.37) vMM 2 2 = xxxvp/72 2zxx. Because the same estirqator of the measurement error covariance matrix is used in the two estimators of Mxy,the covariance of the approximate joint is distribution of Mxyll and MxyZz VMM12 =
(2dJ)-'(XuuCrr
+
%J~UU)*
(3.1.38)
3.1.
197
NONNORMAL ERRORS AND UNEQUAL ERROR VARIANCES
Given the estimator (3.1.33), we can construct estimators of all quantities entering the covariance matrices (3.1.35), (3.1.37), and (3.1.38). Let
-
VMM
= [SMMII
!MMl2]
2TMM21
vMM22
(3.1.39) ’
Then an improved estimator of /? is
where
The estimated covariance matrix of the approximate distribution of
V}=
“XXll?
~ x , 2 2 ~ ~ , L ( ~ x ,Ml xlx ~2 2 ) 1 1 - l .
/ is (3.1.41)
Example 3.1.1. We illustrate a situation in which duplicate observations are used to estimate the measurement error variance. Assume that the 25 observations given in Table 3.1.1 have been selected from the same population of sites as the 11 sites studied in Example 1.2.1. The data of Table 3.1.1 differ from those of Table 1.2.1 in that two determinations were made on soil nitrogen at each site for the data of Table 3.1.1. In this example we assume that the error variance is not known and must be estimated from the 25 duplicate observations. This example differs slightly from the theory presented above because it is known that cruw= 0. Therefore, no attempt is made to estimate the entire covariance matrix Eaa. To facilitate the computations, we let
zij= (Zijl,zij2,Zij3)= ( X .- Y. ., 1, N i j - N,,), IJ
where N,, is the observed nitrogen for thejth field of the ith sample (i = 1,2), N,,= 68.5 is the grand mean of observed soil nitrogen, and y., = 97.4444 is the grand mean for yield. Let the data of Table 3.1.1 be sample one and let the data of Example 1.2.1 be sample two. For the data of Table 3.1.1 the estimated error variance is
8,” = 50-1 and the estimator (3.1.19) is
25
1 (Xtl -
X,2)2
I= 1
S’ = (0,0.4982),
= 54.8,
198
EXTENSIONS OF THE SINGLE RELATION MODEL
TABLE 3.1.1. Additional observations on corn yield and soil nitrogen with duplicate determinations on soil nitrogen
Observation Number 1
2 3 4 5
6 7 8 9 10 11
12 13 14 15 16 17 18 19 20 21 22 23 24 25
where
Corn Yield
Soil Nitrogen
Determination 1
Determination 2
Average
71 78 76 59 97 53 76 43 86 44 89 46 66 62 76 59 61 70 34 93 59 48 64 95
70 66 17 58 87 69 63 45 81 58 71 66 53 54 69 57 76 69 41 87 62 40 48 103 97
70.5 72.0 76.5 58.5 92.0 61.0 69.5 44.0 83.5 51.0 80.0 56.0 59.5 58.0 12.5 58.0 68.5 69.5 40.5 90.0 60.5 44.0 56.0 99.0 98.5
106 119 87
100
105 98 98 97 99 88 105 91 90 94 95 83 94 101 78 115 80 93 91 111 118
100
zxx= diag( 1, 228.7834). Because aeuis known to be zero, Cee = s,, - (36)- '(1 1 + 12.5)fi:aU, = 50.4903.
The estimators of expressions (3.1.35), (3.1.371, and (3.1.38)for diagonal error covariance matrices are VMM1
-
= diag(2.2916,609.4423),
PMMz,
= diag(5.8265, 1779.6960),
VMMl,
= diag(0,29.8146).
~ ~d,, where duull= age+ %fi:8,, = 57.291 and 1 5 , " =
+ ?j:B,,
= 64.0918. The
3.1.
NONNORMAL ERRORS AND UNEQUAL ERROR VARIANCES
estimator of
199
p defined in (3.1.40) is = (-0.0204, 0.4996)',
where Mxyl= (-0.0044, 122.1580)', MxYz2 = (0.0101, 95.3687)', vech Mxxll = (1, -0.9400; 229.6100)', vech Mxxzz = (1,2.1364; 226.9050)', and vech is the notation for the vector half of a matrix defined in Appendix 4.A. The vector half of the estimated covariance matrix of the approximate distribution of the estimator is vech
s{b}= (1.63407, 0.00191; 0.00879)'.
In this example, the degrees of freedom for the estimated error variance and the magnitude of error variance rekative to the variance of x are such that the contribution to the variance of /?from estimation of the error variance is small. no
3.1.3. Reliability Ratios Known The model in which one has knowledge about the magnitude of the error variance relative to the variance of the observations was introduced in Section 1.1.2. We now consider the vector form of that model. Let. Y, =Po
+ XtPl + 4 r 3
z, = z, + a,,
and let the vectors (x,, a,, q,), t = 1,2, . . . , be independently and identically distributed with mean (a,, 0, 0), finite fourth moments, and
V{Cxt,at, 4,)') = block diagP,,,,
&a,
oq4I7
where Exxis nonsingular. Let the matrix A,, be known, where A ~= , D~;~,,D~A
and Dgz = diag(a,,, axxll,oxx22, . . . , bXXkk). If A,, known, it is replaced with
for the Y variable is not
Lw= AWUALA!W> where A:" is the Moore-Penrose generalized inverse of Auu and Aww, Awu, and A,, are the submatrices of A,, defined by the partition a, = (wt, u,). The diagonal elements of A,,, denoted in abbreviated notation by Aii, are equal ~ , tcii is the reliability ratio for the ith variable. In many to 1 - K ~ where applications it is assumed that the off-diagonal elements of Aa, are zero. An estimator of fl, can be obtained by constructing consistent estimators of mxxand of inxy.The estimator incorporating the modifications of
200
EXTENSIONS OF THE SINGLE RELATION MODEL
Section 2.5 is
jjl where
= H;:Hxy,
(3.1.42)
- i
m,, - ( 1 - n- l)DzzAooDzz iff 2 1 H z z = m,, - (f- n-l)DzzAaaDzz iff < 1, @z = diag(m,,, mxx11, mxx22, *
and f is the smallest root of
. . rnXXkk), 3
[mzz - fDZZAaafi,ZI = 0.
(3.1.43)
While not obvious, the estimator (3.1.42) satisfies the conditions of Theorem 3.1.1. To apply that theorem, let I , be the 0th element of A,, and express the ijth element of DzzA,,,Dzz as
+ O,(n
(3.1.44)
- 1).
Therefore, to the order of approximation required, Theorem 3.1.1 is applicable with the ijth element of defined by the tth elementAofthe sum in (3.1.44). The 0th element of E,,,,used to construct the vectors d, for the distributionfree variance estimator (3.1.12) is
e,,,,,
~ I i j m ~ 6 i m ~ ~ . j [ m , i ( z-I iZi)* + m;ij,@Ij - Zj)’].
(3.1.45)
If one is willing to assume that the vector (xI, ar, 4,) is normally distributed and that Z,, is a, diagonal matrix, the covariance matrix of the approximate distribution of fll can be estimated by
O{j,}
= (n - l)-’H;:fH;:,
(3.1.46)
where the ijth element o f f is fij =
mxxij(s,,
- 2 1 $ f m ~ ~-~2 ~1 f ~ & m+~2~1 ~~ ~ ~ . ~ ~ / 3 , / 3 ~ m ~ ~ ~ ~ ) 0.1
+ I,iIjj~i~jmxxiimxxjj.
Because of the differ-nt form of the knowledge used in computing tke root is different than the distribution of the root 1 given in Theorem 2.3.2. If the entire matrix A,,, is known and if q1 = 0, it can be shown that [P{f*}]-’”(f - 1) % N(0, I), where ?{I} = (n - k ) - ‘n- 1(fb’DzZAaaDZzh)-2 (6: - f6’~,,a1th)2, (3.1.47)
f,the distribution off
r= 1
3.1.
201
NONNORMAL ERRORS AND UNEQUAL ERROR VARIANCES
_- (X, - X)fll, 8' = (1, -p,),$,,,,
is the (k G, = S: with ijth element given by (3.1.45),and 0 satisfies A
+ 1) x ( k + 1) matrix
(mzz - f*Dzz~,,Dzz)d= 0.
(3.1.48)
This result can be used to check the rank of Ex,. One constructs the root and vector defined by (3.1.48),replacing Z, by X,and A,,, by A"". If the rank of Zxxis k - 1, the normalized root will be approximately distributed as a N ( 0 , 1) random variable. Example 3.1.2. In this example we analyze data for a sample of Iowa farm operators. The data were collected in 1977 and are a subsample of the data discussed in Abd-Ella et al. (1981). Table 3.A.l of Appendix 3.A contains 176 observations from the sample, where the variables are the logarithm of acre size of the farm (size), the logarithm of number of years the operator has been a farm operator (experience), and the education of the operator (education). Education is the transformation of years of formal training suggested by Carter (1971). We treat the 176 observations as a simple random sample of Iowa farm operators. To protect the confidentiality of the respondents, a random error was added to each of the variables in Table 3.A.1. Thus, the data of Table 3.A.1 contain two types of measurement error: that associated with the original responses and that added to protect confidentiality. The study of Battese, Fuller, and Hickman (1976) contains information on the reliability ratio for the original responses on farm size and that of Siege1 and Hodge (1968) contains information on the reliability ratios of the original responses for education and experience. The ratio of the variance of the errors added to protect confidentiality to the sample variance is known. Combining the two sources of error, the reliability ratio for size is 0.891, the reliability ratio for experience is 0.800, and the reliability ratio for education is 0.826. The reliability ratios are treated as known in the analysis. It is assumed that the measurement errors for the three variables are uncorrelated. The model is
I: = Po + PIxIl+ P2x,2 + e,,
X, = x, + u,,
where is observed size, e, = w, + qt, xI1 is true experience, x f 2 is true education, X,, is observed experience, and X r 2 is observed education for the tth respondent. The sample means are 5.5108,2.7361,and 5.6392 for size, experience, and education, respectively. The vector half of the sample covariance matrix is vech m, = (0.91462,0.21281,0.07142;1.00647, -0.44892; 1.03908)'. The estimator of C,, is the matrix of mean squares and products of X cor-
202
EXTENSIONS OF THE SINGLE RELATION MODEL
rected for attenuation. Thus, vech gxxis vech(m,,
- DxxA,,Dxx) = (0.805 18,
- 0.44892; 0.85828)',
where D,, = diag(1.00323, 1.01935) and 14""= diag(0.200, 0.174). The estimated equation computed from Equation (3.1.42) using the program SUPER CARP is
Pt = 2.55 + 0 . 4 3 9 ~+~0.313xt,, ~ (0.79) (0.116)
(0.096)
where the numbers in parentheses are the estimated standard errors constructed with estimator (3.1.12). The vector half of the covariance matrix estimated from Equation (3.1.12) is vech
vpp= (0.6237, -0.0765, -0.0713; 0.0134,0.0068; 0.0092)'.
The vector half of the estimated covariance matrix (3.1.46) computed under the assumption of normality is vech 3{8}= (0.4768, -0.5025, -0.5931; 0.0087, 0.0047; 0.0082)', where s,, = 0.8652. All estimated variances are smaller under the normality assumption. It is clear that the variables cannot be exactly normally distributed because experience is reported in years and education is restricted to a few values. Also, a plot of the residuals suggests that u, is negatively skewed. Because of these facts and because the sample is not small, the distributionfree form (3.1.12) seems the preferable variance estimation method. The estimated squared multiple correlation between I: and x, is
k& = (0.91462)-'[0.4386(0.21281) + 0.3126(0.07142)] = 0.126.
Also see Exercise 3.6.
0 0
For other applications of the correction-for-attenuation model see Fuller and Hidiroglou (1978) and Hwang (1986). 3.1.4.
Error Variance Functionally Related to Observations
This section is devoted to the model wherein the covariance matrices of the measurement error are known functions of observable variables or are known functions of the expectation of observable random variables. In particular, we consider estimator matrices constructed as (3.1-49)
3.1.
203
NONNORMAL ERRORS AND UNEQUAL ERROR VARIANCES
where r
< k + 1 and
are observable vectors. We treat situations in which (3.1 SO)
or in which (3.1.51)
Because any covariance matrix can be written in the form (3.1.49), the case of known error covariance matrices is automatically in the class. The model with known reliability ratios is also in the class. For example, if the A,, of Section 3.1.3 is diagonal, then
in the ith where is a ( k + 1)-dimensional row vector with ,l,!i'2(Z,iposition and zeros elsewhere. The vectors + t i may themselves be expressed in terms of other vectors. For example, we may write
zi)
+ti
=
{COi+rOi,
Cli@rli,
*
.
* 9
Cki+tki),
(3.1.52)
where csi, s = 0, 1,. . . , k, i = 0, 1,. . . ,k, are known constants and +rji are observables that may be fixed or random. Because the # f j i are permitted to be random variables, they can be, and often are, functions of Z,. The class of estimator matrices (3.1.49) does not exhaust the class of possible models, but it includes many useful models and the form of the estimator (3.1.49) lends itself to computing. Also, the matrix of expression (3.1.49) is always positive semidefinite. A situation in which the estimator (3.1.49) is applicable arises when data are available for several groups, it is desired to estimate separate slopes by group, and the explanatory variable with separate slopes is measured with error. This is one of many cases in which a problem that is straightforward in the absence of measurement error becomes relatively complex in the presence of measurement error. We consider a simple model for two groups and assume that grouping is done without error. Let 1; denote the true value of the explanatory variable of interest and let F, = 1; + r, be the observed value, where r, is the measurement error and r, NI(0, CJ,~).Ordinarily there will be additional variables in the regression equation, but we restrict our attention to the vector used to estimate separate slopes for the two groups. Let
-
I:= x,P + e,,
X, = x,
+ u,,
204
EXTENSIONS OF THE SINGLE RELATION MODEL
where
Xt
= (Xi,, Xt2), “XI13 Xr2h
xt
= ( xt 19 x t d , ui = (Drlrt, Dt2rt),
(x,,,
xt2)I
and
= [(DtIFt, Dt2F0, (
4 1 L Df,f,)I,
(1,O) if element t is in group 1 (DllY Dt2) = {(o, 1) if element t is in group 2. Let the variance of the measurement error in Y,, denoted by w,, be independent of r,. Then the covariance matrix of a1= ( w r ,u,) is 2
‘.art
=
C +;i+ti, i= 1
(3.1.53)
where = (a:?, 0,O) and +,2 = (0, o,!/~D,,,u,!/~D,,).These concepts are developed further in Example 3.1.3. Example 3.1.3. We illustrate the estimation of separate slopes by group using a constructed data set based on a survey conducted by Winakor (1975) and discussed by Hidiroglou (1974). The survey was designed to study expenditures on household textiles such as drapes, towels, and sheets. The explanatory variables for the yearly textile expenditures were the income of the household, the size of the household, and whether or not the household moved during the year. The data are given in Table 3.A.2 of Appendix 3.A, where the variables are: Log (expenditure on textiles in dollars plus 5), X I Log (income in hundreds of dollars plus 5), X 2 Log of number of members in the household, Indicator that takes the value one if the household moved and is x., zero otherwise
Y
The variables in Table 3.A.2 were constructed so that the moment matrix m,, of the explanatory variables is similar to that observed in the study. In the actual study only about one-sixth of the households moved, while in our data set about 30% are in that category. The moment matrix of the explanatory variables is similar to that observed in the study but the correlation between expenditure and the explanatory variables is much higher in our data set. It would require about 3000 of the original observations to produce standard errors for estimated coefficients of the order we obtain from our constructed data set of 100 observations. Our model for these data is
3.1.
NONNORMAL ERRORS AND UNEQUAL ERROR VARIANCES
205
+
and (Xrl,Xf2)= ( x f l ,xr2) ( u t l ,uf2). It is assumed that (et, ufl, ur2) is uncorrelated with (xrl,xf2)and that the vectors (el,ull, u f 2 ,xrlrxrz)have finite fourth moments. Note that, for the purpose of this example, it is assumed that the indicator variable for moving is measured without error. Following Hidiroglou (1974) we assume (A,,, A 2 2 ) = (0.1172,0.0485). The estimator of fl given by expression (3.1.7) is
j = M,’M,,,
where
Xt = (1, Xrl, X i 2 7 M,, = Mxv,
xf1x14)3
A
4’t1
= [O, 0, (0.1 172)”2(Xr~ - X,)(1 - xt4), 0, 033
$t2
= [O, 0, 0, (0.0485)1’2(Xf2 -
$13
01, = [0, 0, (0.1I72)’”(X,1 - X 1 ) X f 4 , 0, (O.1172)’”(Xf1- Xl)Xr4]. x2),
This set of $-vectors differs somewhat from the set in (3.1.53). We assume we know the reliability ratios for the textile data, while the variables in (3.1.53) were constructed under the assumption that the covariance matrix of the measurement error is known. For the textile data the sample mean vector is
(7, X1,X2,T4) = (3.6509,4.6707,0.9478,0.28). The vector half of the sample covariance matrix of (Xrl,Xr2,X11xr4)is (0.1975,0.0584, - 0.1483;0.3255, -0.2825; 4.1985)’
and the corresponding portion of n -
c:= x:=
$(&ti
is
(0.02291,0,0.00749;0.01563, 0; 0.00749)’.
The estimated equation is
f
= -0.1634
+
+
+
0.6514~~1 0.3699~~2o.3336xt1x,4, (0.2501) (0.0538) (0.0342) (0.0123)
where the numbers in parentheses are the standard errors obtained from the covariance matrix (3.1.12). The estimates were computed using SUPER CARP.The estimator contains a modification similar to that used in estimator (3.1.42).An estimator of neeis 8,, = (n - k ) - ’
r=1
(k; - X f f i ) 2- j’e,,,,j = 0.02316.
206
EXTENSIONS OF THE SINGLE RELATION MODEL
Using the estimator of oee,we can construct an estimator of IT,, for the two types of observations. For the set of individuals that did not move (xt4= 0), the estimated coefficient for xtl is 0.6514, and for the set of individuals that moved (xt4= l), the estimated coefficient for xtl is 0.9850. Therefore, the two estimators of c,, are 6uu(o, = 0.03503 and
8,,(,, = 0.04753.
Given these two estimates of the error variances, we can construct an improved estimator of /Iby weighting the observations with the reciprocal of the appropriate 6u,,(i).To accomplish this we define the vectors
6,:d2(y, 1, xtl, xt2, Xtlxt4), with i = 0, 1 for the households that did not move and for the households that moved, respectively. The vectors for the transformed problem are 0, (0.1172)1/2(xt1- X)(1 - Xr4), 0,01, 0, 0, (0.0485)”2(X,2- X2)(1 - xJ, 01 6&:1/:[O, 0, 0, (0.0485)1/2(Xt2 - XJXt4,0], = S;,:[;[O, 0,(0.1172)”2(Xt1- X 1 ) X t 4 , 0, (0.1172)”2(X,l - X1)Xt4], = rf;:d;[o, $r2
$,3
= S:,d:[O,
+
and the estimated equation is
+
+
+ 0 . 6 5 4 8 ~ ~0~. 3 6 3 5 ~ , ~O.3334xt3, (0.0338) (0.0122) (0.2362) (0.0505)
= -0.1732
where the numbers in parentheses are the standard errors obtained from the estimator covariance matrix (3.1.12).The estimated standard errors of the intercept and coefficient of x,, are about 5% smaller than those constructed in the unweighted analysis. The estimated standard errors for xt2 and xt3 are only marginally smaller than those estimated from the unweighted analysis. The variance of the estimator of /l based on the data in one moving category is not strictly proportional to IT,,,,(^^ of that category because of the term in C,,Z:,, that enters the variance expression. In our example, this term is small relative to the term that is proportional to Therefore, an estimated generalized least squares estimator constructed by the methods of Example 3.1.1 would be very similar to the estimator of this example that was constructed using 6,,(o, and 6uu(l) as weights. on The following example illustrates the type of economic data often collected in surveys. The variance of measurement error increases as the true value increases. The fact that duplicate observations were collected on all elements used for analysis is unusual.
3.1.
207
NONNORMAL ERRORS AND UNEQUAL ERROR VARIANCES
Example 3.1.4. We analyze some data collected by the Statistical Laboratory of Iowa State University under contract to the Statistical Reporting Service, U.S. Department of Agriculture. The study is described in Battese, Fuller, and Hickman (1976). A sample of farmers was contacted in the first week of September 1970. The same farmers were contacted one month later and a portion of the data collected at the first interview was also obtained at the second interview. This permitted the estimation of the response variance (measurement error variance) for the repeated items. We consider two variables: x,
y,
Number of breeding hogs on hand September 1st Number of sows farrowing (giving birth to baby pigs) between June 1 and August 31
The data are given in Table 3.A.3 of Appendix 3.A. Our initial model is where X j is the number of sows farrowing as reported in the j t h interview by the tth individual, X I j is the number of breeding hogs reported in the j t h interview by the tth individual, and j = 1, 2. It is assumed that the two , are independent with zero mean vector observations (wI1, uI1) and ( w , ~uI2) and finite 8 + 6 moments. It is assumed that the q, are independent with zero means and finite 8 + 6 moments, and that 4, is independent of ( w j , , u i l , w i 2 ,u i 2 ) for all t and i. The analysis of variance constructed from the two responses of 184 farmers is given in Table 3.1.2. The 184 farmers are a subset of the original data. The entries on the “Error” line are the estimates of the error variances and covariances for a single response. The covariance matrix for (E,, U,) is one-half ) the estimator of the covariance matrix of the covariance matrix of (wIj, u , ~and of a, = (E,, ii,) has entries that are one-half of the error line of Table 3.1.2. XI) and Equation (2.2.20), we obtain the following Using the 184 means
(z,
TABLE 3.1.2. Analysis of covariance for farrowings (Y)and number of breeding hogs (XI
Source Individuals
Error Total
Degrees of Freedom 183 184 361
Mean Squares and Products
YY
XY
xx
304.98
444.16 8.72 226.15
1694.64 139.09 914.74
58.23
181.27
208
EXTENSIONS OF THE SINGLE RELATION MODEL
estimators of the parameters:
PI = (847.32 - 69.54)-’(222.38 - 4.36) = 0.2803, Po = 10.4321 - 0.2803(36.7745)= 0.1242,
d,, = 847.32 - 69.54 = 777.78, dqq= 152.49 - 29.12 - (0.2803)(218.02)= 62.26,
where (Y,X )= (10.4321,36.7745). If one assumes that the response errors are normally and independently distributed and uses (2.2.25),the estimator of the covariance matrix of the approximate distribution is vech
v(h)= (1.531, -2.76,0.099; 7.507, -0.270; 1.097)’,
(Po,
where @ = loo/?,, O.ldqq),s,, = 94.9147, d,,*= 8,, = - 15.1321, d,, = 32.1394, and d , = 184. The estimated variance of fll is about 3% larger than it would be if the covariance matrix of the measurement error were known. For these data we have the original duplicate observations on each individual. Therefore, we can use the duplicate observations to construct the $ vectors of (3.1.53). Let
(E?X r ) = 461
=
+ k;A(X11 + XfdIY
“1
N X ’ - XZ), (Xf’ - X d .
Under the assumptions of this section, the covariance matrix of (gl,if,qt) can be a function oft. Because w;lh} =
assumption (3.1.3) is satisfied for vech{n-’
f
1=1
*il@ll}
E{(G,, u,)’m>ur)},
$gl.
The elements of
= (29.12,0,4.36;0,Q 69.54)’
are the same as the elements of the error covariance matrix S,, constructed from the analysis of variance table. The estimated model obtained from expression (3.1.7) with the c1 adjustment of Section 2.5 is j , = 0.132
+ 0.2801~,.
The estimated covariance matrix calculated by expression (3.1.12)is vech
${(lo, 100fil)’}= (2.3864, -7.0196; 24.3443)’.
The calculations for this example were performed using the functionally related option of SUPER CARP. The estimated covariance matrix of Theorem 3.1.1 contains much larger estimated variances than does the estimated covariance matrix constructed under the assumptions of Theorem 2.2.1. The estimate of the variance of
3.1.
209
NONNORMAL ERRORS AND UNEQUAL ERROR VARIANCES
bo
the approximate distribution of is 50% larger than that based on the constant-variance-normal assumptions and the estimate of the variance of the approximate distribution of is more than three times that based on the constant-variance-normal assumptions. The farrowings data violate the assumptions of identically distributed normal errors in several respects. First, the observations are not normal because the observations are bounded below by zero and there are a number of zero observations on the farrowings variable. Second, plots of the data demonstrate that the error variances for both variables are larger for large values. In Figure 3.1.1 the mean of the two Y observations is plotted against the mean of the two X observations. It is the fact that the error variances are not constant that produces the largest increase in the estimated variance. The variance formula for the estimator of given in Theorem 2.2.1 contains a term associated with the variance of s,, where Y, = w, - u , p l . In Theorem 2.2.1 the estimated variance of s,, is constructed under the assumption that the (ut, r,) are independent identically distributed normal vectors. The failure of these assumptions can lead to large biases in the estimated
0,
m
0-
b 0-
0u)
5 3-
T
* 0-
. .. . . .. . . . . . -
a
0
X-BAR
100
-l1 + - l
120
140
180
FIGURE 3.1.1. Plot of mean of two Y observations against mean of two X observations for pig farrowing data.
210
EXTENSIONS OF THE SINGLE RELATION MODEL
variance of d,,. For the pig data, estimators of o,, = o,, and of nrrare
c,,
= ,s - ~,,,,pl= - 15.13, ifr, = s, - 2j1s,, j?~,,,, = 32.14.
+
Therefore, under the normality assumption the estimated variance of s,, is
V{S,,} = (184)-'[(69.54)(32.14) + (- 15.13)2] = 13.39,
where s, = 69.54. Because the data used to estimate the error variances are available, we can construct an alternative estimator of the variance of dUr,The estimator Cur can be written d,, = (184)-'
x2
C C,= C,
184 1=1
where C , = (Xrl - XI2)[XI- (Xrl- X,,)bl](0.25). Therefore, a direct estimator of the variance of d,,, is P{dur}= (184)-'(183)-'
184
1 (C, - C)' = 36.55.
r=1
This consistent estimator of the variance of d,, is nearly three times the , are estimator of variance constructed under the assumption that ( y j ut,) constant-variance-normal vectors. The contribution to the estimated variance arising from variance estimation is included in the estimation formula (3.1.12). In this example, the contribution to the total variance arising from the estimation of the error variances is modest. Because the range in estimated error variances is so large and because the error variance seems to be related to x,, we attempt to improve the estimator of pl.Given our initial estimator of (Po, PI), we can estimate x, with
2, = X , - U * , S ~ , ' ~ , , , = X , + 0.1596,,
where svu = 94.9147 and 6, = - 15.1321. Now f; = (0.25)[:1
- bixti
- ( X 2 - biXr2)l2
is an estimator of (1, -/31)Zaa,,(l, -PI)' and
6:
=
[
-
P - (8, - 8)bll2
is an estimator of oq4+ (1, -~l)Zaa,,(l,-ply. It is clear from a plot of 6, against 1,that the variance of 6,increases with 1,.This plot and the plot of f , against ifled us to postulate
E ( $ } = (ao + tllxJ2,
qu:}
= y2(ao
+ w,)*,
3.1.
NONNORMAL ERRORS AND UNEQUAL ERROR VARIANCES
21 1
where ao, a,, and y are parameters to be estimated. In this model both r, and q, are assumed to have variances that increase with the square of x,. We fit the model by iterative generalized nonlinear least squares replacing (r:, ,o: x,) with (;:,I?:, 2,). In the fitting it was assumed that the standard deviations of r: and u: were proportional to their expected values. The estimates are [Go, GI,f2]
= [0.68,0.133,
1.6431, (0.17,0.011,0.143)
where the numbers in parentheses are the estimated standard errors obtained at the last step of the generalized nonlinear least squares procedure. In this model, the variance of q, is assumed to be a multiple of ur,lland the multiple is estimated to be 0.643. Let Z, = I , XI), n,= y2(Go 612f)--2, and
(c,
+
+I1
= I"
-
L),
0 7
Then the estimator (3.1.7)computed with of Section 2.5 gives
9, =
Wfl
- XI2)J
k,,,,of (3.1.49) and the adjustment
-0.006 + O.287xf, (0.410) (0.026)
where the numbers in parentheses are the standard errors estimated with expression (3.1.12).The use of weights produces an estimated standard error for the intercept that is less than one-third of that for the unweighted analysis. The standard error for the weighted estimator of /II is about one-half of that for the unweighted estimator. Figure 3.1.2 contains a plot of fi:'*O^, against i,. The observations that originally fell on the X axis form the curved lower boundary of the deviations. The fact that this boundary curves away from zero and that the deviations above zero for small 2, seem somewhat smaller than the remaining positive deviations might lead one to consider alternative variance functions. However, the majority of variance inhomogeneity has been removed, and we do not pursue the issue further. The weights are functions of the 2,. Therefore, rather strong assumptions are required to apply Theorem 3.1.1. It can be demonstrated that the conditions of the theorem are satisfied if we assume the error variances decline as n increases. From the practical point of view, this means that the variance of u, must not be too large relative to the variance of x,. Hasabelnaby (1985) has conducted Monte Carlo studies that indicate that the approximations of Theorem 3.1.1 are satisfactory for data similar to that of this example.
on
212
EXTENSIONS OF THE SINGLE RELATION MODEL
x22-
a
e
a
@ a
.a a
a
a
a
a
D.0
3.1.5.
The Quadratic Model
In this section we study estimation of the parameters of the model wherein I:is a quadratic function of the true value of a variable observed with error. This model falls in the domain of the theory of Sections 3.1.1 and 3.1.4 because the variance of the measurement error of the square of the explanatory variable is functionally related to the true value of the explanatory variable. Let I; satisfy the quadratic model,
I:= P o + P I 4 + P A + 41,
(3.1.54) H , = h, + rt. (qr?rt)' diag(aqq,ow)] where (qj,rj) is independent of h, for all t and j and (I:,H I )is observed. The error in H:, as an estimator of h:, is 2h,rr + r:, where
Let (3.1.5 5)
3.1.
NONNORMAL ERRORS AND UNEQUAL ERROR VARIANCES
+
213
where x, = (1, h,, h:) and u, = (0, r,, 2h,r, r: - arr).With these definitions, we can write the model (3.1.54) in the familiar form XI = x, + u,, I; = X I B + qr, = (Po, pi, p2). Furthermore, the conditional mean of u, given h, is
where the zero vector, and the conditional variance is
r 0 I0
E(uk[hi) = 0 arr 2hrarr 0 2h,arr 4h:arr + 2u,2*
(3.1.56)
The matrix E{u~u,Ih,}plays the role of the matrix C,,,, of Equation (3.1.2) of Section 3.1.1. To construct an estimator of the measurement covariance matrix, we assume a,, is known and note that the expected value of ~u,,,l
= (0, a;/’, 2af/’H,)’(O, a:,!2, 2a;/*H,) - diag(0, 0, 2~7:~)
(3.1.57)
is equal to (3.1.56). Theorem 3.1.1 issatisfied forBof(3.1.7)withiif= l,~,,,,defined in(3.1.571, and x, satisfying the conditions of the theorem. Thus, the estimator (3.1.7) can be used in large samples with large error variances. The estimator matrix of (3.1.57) is not always positive semidefinite. It follows that there exists no set of vectors * l i of the type described in (3.1.49) such that E{*;i#,iIh,} is exactly equal to Cuurr. However, it is possible to construct vectors $ f lsuch that (3.1.58) and such that E{$:l$I1} is approximately equal to CUulr for each t. To this end, let #tl
=
(0, a:/’, 24,!’[H
+ Ci’’(Al - A)]},
where
The reader may verify that
I=1
where M,, = n -
l
0 Harr 0 Har, 4arrMHH - 20,~
I:= H:, and, hence, (3.1.58) is satisfied.
(3.1.59)
214
EXTENSIONS OF THE SINGLE RELATION MODEL
Example 3.1.5. A theory of the earth’s structure states that the earth’s surface is composed of a number of “plates” that float on the earth’s mantle. At the place where two plates meet, one plate may be forced beneath the other. This movement places strain on the bedrock and produces earthquakes. The Tonga trench is a trench in the Pacific Ocean near Fiji where the Pacific plate meets the Australian plate. The data in Table 3.A.4 are the depths and locations of 43 earthquakes occurring near the Tonga trench between January 1965 and January 1966. The data are a subset of that analyzed by Sykes, Isacks, and Oliver (1969). The variable X , is the perpendicular distance in hundreds of kilometers from a line that is approximately parallel to the Tonga trench. The variable X2 is the distance in hundreds of kilometers from an arbitrary line perpendicular to the Tonga trench. The variable Y is the depth of the earthquake in hundreds of kilometers. Under the plate model, the depths of the earthquakes will increase with distance from the trench and a plot of the data shows this to be the case. The location of the earthquakes is subject to error and Sykes, Isacks, and Oliver (1969) suggest that a variance of 100 kilometers squared is a reasonable approximation for the variance in location. These same authors also explain why it is reasonable for the depth of the earthquakes to occur in a pattern that curves away from the earth’s surface. Therefore, our working model is Yr = P O
+ P l x f l + P 2 x t 2 + P3xf1 + qr,
xt2, x?1) + (wt, rt1, rr2, arJ, (qf, w,, r,,, rt2)’ NI[O, diag(a,,, 0.01,0.01,0.01)],
(X,Xtl, Xrz, Xr3) = (Yr,
-
xt1,
where X,, = X:, - 0.01. Because the error variance ofx,, is 0.01, the expected The sample mean of the observed vector is value of X , , is
(F, X,, X2,X3)= (1.1349, 1.5070, 1.9214, 3.5020) and the matrix of mean squares and products is
[
1.2129 1.2059 0.2465 4.3714
1.2059 1.2706 0.1633 4.4813
1
0.2465 4.3714 0.1633 4.48 13 1.1227 0.6250 0.6250 16.6909
’
The three vectors required for the construction of the error covariance matrix o f ( T , Xrl, Xi23 XrJ are = (0,O. 1,0,0.2[1SO70 *f2
=
{O,O,0.1, O}, and
+ 0.99798(Xr1- 1.5070]},
$r,
= {O.L O,O,O},
where the ( of Equation (3.1.59) is 0.99597. It follows that the upper left
3.1.
215
NONNORMAL ERRORS AND UNEQUAL ERROR VARIANCES
3 x 3 portion of the average of the estimated error covariance matrices is 0.011 and the last row of the matrix is (0,0.0301, 0,O.1403).
The estimated equation is =
+
+
+
-0.199 0 . 4 8 1 ~ ~0 ~. 0 7 7 ~ ~ 20.132~:,, (0.079) (0.113) (0.031) (0.033)
where the numbers in parentheses are the estimated standard errors calculated from the estimated covariance matrix of Theorem 3.1.1. The computations were performed with SUPER CARP. In this example the error variance of X I Iis less than 1% of the total variation. Therefore, the coefficients are very near to those obtained by ordinary least squares. In this example there are three large deviations from the fitted model, associated with observations 4, 6, and 12. Because the X l l values and X:, values for these observations are reasonably close to their respective means, the estimated standard errors obtained from the covariance matrix (3.1.12)are smaller than those computed by the ordinary least squares formulas. 00 Example 3.1.6. In this example we illustrate the calculations for the quadratic model in which the error variances of both Y and X are known. This is an application of Theorem 3.1.2 to the quadratic model. Table 3.A.4 contains 120 observations generated to satisfy the model
Yf = Po
+ P l X I l + P,X:l, (el, ull))
- (K x
XII) = (YI?X l l ) NI(O,O.O91).
+ (e,,
%)t
The sample mean vector is Z = (F, 1, 8,)= (0.50,0.00,OSO),where X I 2 = X;l - 0.09. The matrix of mean squares and products corrected for the mean is
I
I
0.41425
m,
0.27227 0.17612 0.59496 -0.00132 . 0.17612 -0.00132 0.36530
= 0.27227
The $ vectors required for the construction of the estimated covariance matrix are +11
= (0.3,0,0) and
$r2
= (0, 0.3, 0.57666Xll),
where [ of Equation (3.1.59) is 0.92373. The estimated mean of the error covariance matrices is 120
2
k,...= (120)-' 1 i = l t=1
$@ti
= diag(0.09,0.09,0.1962)
216
EXTENSIONS OF THE SINGLE RELATION MODEL
and the smallest root of is 1= 0.9771. By the results of Section 3.1.1, the test statistic, F = (n - k)-’nX = 0.9938,is approximately distributed as an F with b and infinity degrees of freedom, where the estimated degrees of freedom defined in (3.1.18) is h = 156. One easily accepts the model because is close to one. The modification (2.5.14)with a = 1 of the estimator (3.1.14)is = [Mxx - X(1 - ~ I - ~ ) ~ , , , , ~ ] - ~-[ X(l M ~-, ,~I-’)E,,.,] = (-
O.Oo5,0.539, 1.009)’.
The estimated covariance matrix of the approximate distribution of the vector of estimators is 0.7153 -0.0605 - 1.0350
- 0.0605
0.8920 0.1704
I
- 1.0350
0.1704 10-2. 2.3254
This example illustrates the large effect that measurement error can have on the estimates of the parameters of the quadratic function. The ordinary least squares estimate of the quadratic equation obtained by regressing on XI1and X:, is
+ 0.459Xz1+ 0.484X:,, (0.058) (0.054) (0.069)
= 0.214
where the numbers in parentheses are the “standard errors” computed by the ordinary least squares formulas. The ordinary least squares coefficient for the linear effect is about 85% of the consistent estimator. This agrees with the fact that the variance of the measurement error in Xzl is about 15% of the total variation of X,,.The least squares coefficient for the quadratic effect is only about one-half of the consistent estimator. A 15% error in the original observations results in, approximately, a 50% error in the squares. on See Exercise 1.7. The procedures illustrated in Examples 3.1.5 and 3.1.6 are appropriate for large samples with large error variances. If the error variances are not overly large relative to the variation in x,, it may be possible to construct more efficient estimators. This is because the variance of the deviation from the fitted function varies as the slope of the function varies. See Section 3.2.
3.1.
NONNORMAL ERRORS AND UNEQUAL ERROR VARIANCES
217
3.1.6. Maximum Likelihood Estimation for Known Error Covariance Matrices In this section we extend the model of Section 2.3 to permit different error covariance matrices for different observations. Let (3.1.60) where {x,} is a fixed sequence of k-dimensional row vectors and e, = (e,,u,). It is assumed that the covariance matrices, Xicat,,t = 1,2,. . . , n, are known. The model can also be written z,a = 0,
-r).
Z, = z, + e,,
(3.1.61)
where a' = (1, For this model, the method of maximum likelihood produces reasonable estimators. For nonsingular C t = 1 , 2 , . . . ,n, the logarithm of the likelihood function for a sample of n observations is log L =
-5
1 "
1 [10gl(2n)~+~X~,,,l + (Z, - z,)C&,:(Z, - z,,'].
t=1
(3.1.62)
We can follow the approach used in the proof of Theorem 2.3.1 to show that the maximum likelihood estimator of /I is the /I that minimizes
If we differentiate (3.1.63) with respect to we obtain
iGx(X;x
t= 1
-
and set the result equal to zero,
X X B ) - Y,(Xue,t
- XuuttB)l = 0,
(3.1.64)
where u,,",,= a'&&,,a and y , = o&:,a'Z;Z,a. Because crU,,,, and y, are functions ofg, it seems necessary to use iterative methods to obtain the /?that minimizes (3.1.63). One iterative procedure is suggested by (3.1.63) and (3.1.64). If g,,,,,,and y, are replaced with trial values denoted by cf,,;,')and yji-'), then (3.1.64) can be solved for ). We suggest a slightly modified intermediate estimator of B, defined by )(i)
= f i -( 1i -
fi( i -
1)
1),
(3.1.65)
218
EXTENSIONS OF THE SINGLE RELATION MODEL
where
and
is the smallest root of
1f
f=l
(n$;,l))- '(ZiZ, - Ayji-
"r.&,Jl = 0.
(3.1.66)
The estimator (3.1.65) has the advantage that the matrix to be inverted, H;! is positive definite (with probability one). The iterative estimation procedure is defined by the following steps:
1),
1. To initiate the procedure set o$! = yio) = 1, t = 1, 2 , , . . , n. 2. Compute @') by (3.1.65) and go to step 3. 3. Set r7(i)"Utt - (1, - P')CEEtf( 1, -
B(i)')t,
y y = (&J-
( I: - x,p')2.
Increase i by one and return to step 2.
. The iterative procedure is a form of the Newton-Raphson method because Hti- 1) is an estimator of the matrix of second py-tial derivatives of (3.1.63) with respect to /3 and the vector R(i- 1) - H,i- 1,/3(i- is an estimator of the vector of first partial derivatives of (3.1.63) with respect to 8. It follows that the iterative procedure can be modified to guarantee convergence. It seems that the unmodified procedure described in steps 1,2, and 3 above will almost always converge in practice. The limiting distribution of the maximum likelihood estimator is given in Theorem 3.1.3. The theorem is proved under weaker assumptions about the distribution of e, than those used to derive the estimator. Theorem 3.1.3. Let model (3.1.60) hold. Let ( 2 , ) be a bounded sequence random ) vectors with of fixed vectors and let the e, be independent (0, Z,,,, finite 4 + v (v > 0) momepts. Let the true parameter /3 be in the interior of the parameter space B, where B is an open bounded subset of k-dimensional Euclidean space. For all vectors /.? in B, assume 0 0, there exists an Nd such that
with probability greater than 1 - 6 for n > N,. From (3.1.63),
The vectors d, are independently distributed with bounded 2 + $v moments. Therefore, n-''' X;= d, converges in distribution to a normal vector by the Liapounov central limit theorem. The variance of the approximate distribution of j can be estimated by
f(j) = m - l M - i xnx GM-I xnxi
(3.1.69)
where Mxnx
= n-
1
I =I
~iU:t(x;x, -
~
~
*
t
~
t
)
,
3.1.
NONNORMAL ERRORS AND UNEQUAL ERROR VARIANCES
221
Theorem 3.1.3 and the estimator (3.1.69) are given for quite general error distributions. If the er are normally distributed, the estimator of G of (3.1.69) can be replaced with (3.1.70)
where the estimators are those defined in (3.1.69). It is also possible to construct a test of the model. Under our assumptions, (3.1.71)
is approximately distributed as a chi-square random variable with n - k degrees of freedom.
Example 3.1.7. Rust, Leventhal, and McCall (1976) analyzed the light curves of 15 Type I supernova by estimating the parameters for the relationship between apparent magnitude (a function of luminosity) and time. They were interested in studying the connection betweeen certain parameters of the relationship. To place the problem in our notation, we let ( y , X i ) be the estimated values of ( y i , x i ) in the equation
+
L ~ ( S )= cli exp( - x; ' t ) 6i exp( - y ; It),
where Li(r)is the luminosity of the ith supernova at time z and (ai, di, y i , xi) is the vector of parameters for the ith supernova. Rust, Leventhal, and McCall constructed estimates of (ai,bi, y i , xi), i = 1, 2, . . . , 15, by nonlinear least squares. The estimated values Xi) and the estimated covariance matrices of the estimated vector are given in Table 3.1.3. For the purposes of this example, we assume that the errors in the estimates of the parameters are normally distributed and we treat the estimates of ZCeaii as if they are the true covariance matrices of the estimation (measurement) variances. The X i ) values are only approximately normally distributed because they are estimates constructed by nonlinear least squares. Rust, Leventhal, and McCall(l976) explain a theory for luminosity under which
(x,
(x,
Yi = P o
+ Plxi.
Minimizing (3.1.63) for the data of Table 3.1.3, we obtain the estimated relationship Ji
= 30.04
+
6.1 16xi, (16.33) (3.136)
where the numbers in parentheses are the estimated standard errors. The
222
EXTENSIONS OF THE SINGLE RELATION MODEL
TABLE 3.1.3. Observations and covariance matrices for luminosity of supernova
r,
Xi
aeeii
beuii
1 2 3 4
5
67.2 102.1 59.3 55.6 75.9
6.47 9.26 4.12 4.14 5.70
46 8550 1130 29 3020
0.60 71.20 28.70 0.88 47.70
0.024 1.190 1.780 0.105 1.350
6 7 8 9 10
87.7 47.4 112.3 114.8 68.1
4.59 4.55 10.30 7.48 5.21
1470 71 1640 6630 1200
16.20 1.99 19.80 80.60 22.40
0.483 0.188 0.488 1.790 0.590
11 12 13 14 15
61.2 120.4 112.3 85.4 47.9
4.96 5.96 7.11 8.38 8.17
420 524 503 2000 922
6.01 7.43 15.00 55.30 24.40
0.195 0.214 1.020 1.850 1.Ooo
Supernova
%uii
Source: Rust, Leventhal, and McCall (1976).
estimated covariance matrix computed using Equation (3.1.70) is
1
266.598 - 50.126 -50.126 9.836
*
The values of i i and Si are given in Table 3.1.4, where 2. = x. - 6.81.8 1 i uuir uuii and Buuii= oeuii- auuiiPl. An estimator of the variance of iiis
?(ai - Xi> = o”uii- B;”;B:”ji. Because there is a considerable range in the error variance, there is a considerable range in the variance of i i- x,. We have
Because this value is well below the 0.05 tabular value of the chi-square distribution with 13 degrees of freedom, we can accept the linear model as an adequate representation for the relationship between the true values x i and y i . We note that over one-third of the sum of squares is due to supernova 12. 0 0
3.1.
223
NONNORMAL ERRORS A N D UNEQUAL ERROR VARIANCES
TABLE 3.1.4. Statistics for supernova luminosity ii
P { i i- Xi}
1 2 3 4 5
6.50 9.13 4.03 4.14 5.53
0.019 0.661 1.405 0.102 0.124
6 7 8 9 10
4.29 4.7 1 10.07 7.00 5.09
0.347 0.175 0.288 0.941 0.218
11 12 13 14 15
4.95 5.21 6.16 8.25 9.06
0.129 0.129 0.2 15 0.46 1 0.494
Supernova
ouuii
BU;;iy
-2.41 15.43 4.06 0.24 11.00
40 7724 846 22 2487
0.15 0.03 0.02 0.00 0.05
29.59
19.26 39.01 6.19
1290 54 1416 5711 948
0.68 2.04 0.26 0.27 0.04
0.82 53.91 38.77 4.1 1 -32.1 1
354 44 1 358 1393 66 1
0.00 6.58 4.20 0.01 1.56
Si
- 10.47
REFERENCES Booth (1973), Chua (1983), Fuller (1980, 1984), Fuller and Hidiroglou (1978), Hasabelnaby (1985), Hidiroglou (1974), Wolter and Fuller (1982a, 1982b).
EXERCISES 1. (Section 3.1.1) Assume the model
+ el,
-
V, = x,p (el,
X I = x,
14'NI[O, diagb,,.
+ uI,
L)],
where the uuuff,I = 1 . 2 , . . . , n, are known and uoeis unknown. Consider the estimator
where the y, are weights. (a) Show that the weights that minimize the large sample variance of YI
=
fl are
(4+ U"",, + .,:Id",,, - .',:x:
(b) If we fix the error variances, it is not possible to construct con$stent estimators for each optimum yI. Possible weights are y,,, = 1, f f I = d,,,, = Ceee ~ z u ~ uand ll, 712
-
+
+
= (Mxx
,J""ff
+ &-"A~:"ff)-
':;":,kXx,
224
EXTENSIONS OF THE SINGLE RELATION MODEL I (X:- uuull), d,,,, -&urr. and fi is an estimator of /? satisfying fixx= n - I (p - P) = 0 , ( n - 1 / 2 ) . Ignoring the error in /?,construct examples to show that none of the
\here
three weights dominates the other two. (c) Consider the weights 7, = do + dIdu-",',,where@,, GI) are regression coefficients (restricted so that 7, > 0 for all t ) obtained by regressing
cx: + C:",G;,)-
l 6 A W
- U"",J
on C;",' with an intercept. Explain why an estimator of /? constructed using 7, should have smaller large sample variance than an estimator constructed with the weights of part (b). 2. (Sections 3.1.1, 3.1.4) Assume the model Y, = Po + P I X f l
where E k , w,, uJ1 = 0, E { ( q , , x,)'a,}
=
vech L,, =
+ P2x12+ 4,.
Z, = 2,
+ a,,
0,
2 2 (YOUY,
3030;IJ:,x:,,
Y12Xflxf2;A X ? 2 Y ,
I $iiSfi.Are there x, = ( x f l ,xf2),and a, = (w,, u,). Construct an estimator of C,,,, of the form any restrictions on y I 2 ? What is the form of the estimator if 1 + y l z = O? 3. (Sections 3.1.1, 3.1.4) Compute estimates of the parameters of the textile model of Example 3.1.3 under the assumption that the covariance matrix of (ufl,u f 2 )is diag(0.0231, 0.0158). Give the estimate of the covariance matrix of the approximate distribution. Why is the estimated , f Z ,XIlx,,) slightly different from that of Example 3.1.3? error covariance matrix for ( X f l X 4. (Section 3.1.6) What function of (/?, yIi^ll, u:;,'l) does of (3.1.65) minimize? What is the value of 1 if )?.!'( minimizes (3.1.63)? 5. (Sections 2.2.1, 3.1.3) Assume that the data of Table 2.2.1 satisfy the model
8"'
4
Y, = Po
+ i1 X,iPi + = I
xi = x, +
413
MI,
where the x, and u, are independently distributed normal vectors with E{u,) = 0 and E{u:u,} = diag{u,,,,, uuu22, uuu33,u.,,~}. Assume the ratios uuuiiu;~,iare known to be 0.3395, 0.3098,0.2029,and 0.0 for i = 1,2,3, and 4, respectively.Estimate the parameters of the equation under these assumptions. Compare the estimates and estimated standard errors of the estimates with those obtained in Example 2.2.1. 6. (Sections 3.1.3, 2.2) fa) Abd-Ella et al. (1981) assumed that the reliabilities for size, experience, and education in the original data of Example 3.1.2 (before adding error to protect confidentiality) were 0.984,0.930, and 0.953, respectively.How much were the standard errors of the estimated coefficients of experience and size increased by the errors added to protect confidentiality? (b) Assume the following model for the data of Example 3.1.2. Y,
= Bo
+ PIX11 + P 2 X I 2
+
( w f ,uti, u,zY
419 5
(Y,Xi) = (Y,,
XI)
+ (w,, U f ) ,
NI(O,
where Z,, = diag(0.0997,0.2013,0.1808)and (x,, 4,)is independent of ( w , , u,) for all t and j . Estimate 1 = (Do, fll, /I2)and uqq.Compare the estimate of /? and its estimated covariance matrix with that obtained in Example 3.1.2. Test the hypothesis that uW4= 0. Estimate (y,, x f l , x f 2 ) for t = 1, 2,. . . , 10, treating ( x f l , x , ~ as ) fixed
7. (Sections 3.1.1, 2.4) Prove the following theorem.
3.2.
NONLINEAR MODELS WITH NO ERROR IN THE EQUATION
225
Theorem. Let
Y = Po + x,BI
+ e,,
X , = x, + u,,
and let (e,,u,, I,, W;) be independently and identically distributed with finite fourth moments and
E{(et,u,)l = Wi,u , ) W = (0,O). Let the instrumental variable estimator of PI be
j,= m&m,,.
Then
t;@\’;(b, - B , ) f N(0, 1). where
GplL= ( n l)-’rn~~6rn& ’ c;=i:, and 2, = ( W, - w)fi,. -
6 = ( n - 2)
~
I
3.2. NONLINEAR MODELS WITH NO ERROR IN THE EQUATION 3.2.1.
Introduction
The regression model with fixed independent variables is called linear when the mean function is linear in the parameters. For example, the model
I; = Po
+ P l X t + P2X: + e,,
(3.2.1)
where the x, are fixed constants known without error, is linear in the paramModels such as eters (Do, PI,
a*).
Y, = P o
+ P l X l l + P O P l X t l + et,
I; = P O L 1
- exP(-Plxt)l + e ,
(3.2.2) (3.2.3)
are nonlinear in the parameters and require nonlinear methods for efficient estimation. Draper and Smith (1981) and Gallant (1975, 1986) contain discussions of nonlinear least squares procedures. By the definition for regression models, the functional measurement error models of Sections 2.2 and 2.3 are nonlinear models. This is because the true x,, as well as the betas, are unknown parameters. However, it is conventional to consider the measurement error model to be nonlinear only when the P parameters enter the mean function in a nonlinear manner or when the mean function is nonlinear in the explanatory variables measured with error. Let the model be Y , = s@,; P),
(I;,X,) = (Yt, x,) + (% 4 1 ,
(3.2.4)
226
EXTENSIONS OF THE SINGLE RELATION MODEL
where g(x; p) is a real valued continuous function, {x,} is a sequence of fixed p-dimensional row vectors, p is a k-dimensional column vector, and e, = (et,u,) is the vector of measurement errors. We assume e, is distributed with mean zero and covariance matrix ZEE, where at least one element of u, has positive variance. Definition 3.2.2. Model (3.2.4) is nonlinear if g(x; /l) is nonlinear in x when j3 is fixed or if g(x; p) is nonlinear in j3 when x is fixed.
We reverse our usual order of treatment and devote this section to the model in which EzE is known or known up to a multiple. Section 3.2.2 contains an example of the model which is linear in the explanatory variable, while Section 3.2.3 is devoted to models nonlinear in the explanatory variables. Section 3.2.4 is devoted to modifications of the maximum likelihood estimator that improve small sample properties. These modifications are recommended for applications, but the section is easily omitted on first reading. Section 3.3 contains a treatment of the nonlinear measurement error model with an error in the. equation.
3.2.2. Models Linear in x Models that are nonlinear in p, but linear in x, are relatively easy to handle. Such models represent an extension of the linear errors-in-variables model to the nonlinear case that is analogous to the extension of the linear fixed-x regression model to the nonlinear fixed-x regression model. The model can be written
where gi(/?) are nonlinear functions of /I, e, = (e,, u,), and Zeeis known. If {gi(fl); i = 1, 2, . . . ,k}, ignoring the restrictions imposed by /3, can be obtained by the usual linear measurement error methods. Using these estimators of gi(p), improved estimators of can be obtained by an adaptation of nonlinear procedures.
C:= xix, is nonsingular, estimators of
Example 3.2.1.
model
The data in Table 3.2.1 were generated to satisfy the Yr = P o
+ Pixri + ( P o + P h Z ,
-
with (I;,X,)= (y,, x,) + (e,, u,) and (e,, u,)’ NI(0, I). In this example the xli enter the g function linearly, but there is a nonlinear restriction on the
3.2.
227
NONLINEAR MODELS WITH NO ERROR IN THE EQUATION
TABLE 3.2.1. Data for Example 3.2.1 ~~
~~
Observation
K
1 2 3 4 5 6 7 8 9 10 11 12 13 14
- 8.05
- 1.77
0.70 - 2.22 - 1.57 - 1.73 2.14 - 1.60 -1.11 - 2.74 - 0.86 - 0.66 - 0.31 1.54 - 1.26 1.79 0.87 2.65 - 2.00 0.76 4.47
2.33 -9.35 - 6.03 4.47 5.58 3.80 - 3.56 - 6.30 -4.11 - 5.07 6.11 -0.43 - 8.52 3.04 - 1.67 1.13 - 1.68 3.3 1 1.32
15
16 17 18 19 20
1.45 0.9 1 - 3.49 - 2.74 2.9 1 -0.16 0.92 0.4 1 - 0.22 - 1.35 - 2.87 1.21 -0.41 - 2.86 1.69 - 1.03 2.9 1 0.12 - 1.34 5.99 -
coefficients of (1, xfl,xJ. Ignoring the fact that the coefficient of x t 2 is a function of Po and PI, we find that the coefficients of the equation Y, = B o
+ BlX,l + PZXtt
can be estimated using the methods of Sections 2.3 and 2.5.2 under the assumption that the covariance matrix of (ef,u,) is known to be I. The estimated equation obtained with the program package SUPER CARP is j , = 0.014 (0.6 19)
+
+
1 . 6 7 9 ~ ~ 1.922~,,, ~ (0.6 16) (0.482)
where the numbers in parentheses are the estimated standard errors. The vector half of the estimated covariance matrix of the estimators is vech qtra= (0.383,0.097,0.089; 0.379, -0.178; 0.232)'. Under the original model
b =( b o y
bl,
b,) =
(PO?
PI, Po + P 3 + (a09 a,, a217
(3.2.6)
228
EXTENSIONS OF THE SINGLE RELATION MODEL
TABLE 3.2.2. Observations for nonlinear least squares estimation
Dependent Variable
Independent Variables
s
Observation
0.014 1.679 1.922
1 2 3
@I
@Z
@3
1 0 0
0
1
0 0
0
1
where the covariance matrix of the errors of estimation (q,, al, u2)is estimated by QPb. Using the system of equations (3.2.6), we obtain estimates of Po and PI by general nonlinear least squares. The estimates are the values of (Po, PI) that minimize
(p,,p,)
[B’ -
( P O ? P1,
Po
+ P319,’[B
- (Po, 11, P o
+P : n
We outline one method of constructing the generalized least squares estimators. The nonlinear model may be written iii
= milPo
+ m i 2 P l + mi3(Po + P:) + ai,
i = 1,2,3,
where the aijare displayed in Table 3.2.2. Most nonlinear regression programs are designed for observations with uncorrelated constant variance errors. To use such a program it is necessary to transform the observations. We wish to construct a transformation matrix T such that
Tt,,T’ = I. One such transformation is that associated with Gram-Schmidt diagonalization of a covariance matrix. (See, for example, Rao (1965, p. 9)) Table 3.2.3 contains the transformed dependent variable and the transformed matrix of independent variables. Because the original matrix of independent variables is the identity matrix, the matrix of transformed independent variables is the transformation matrix T. The variables of Table TABLE 3.2.3. Transformed observations for nonlinear least squares estimation
Dependent Observation 1 2 3
Independent Variables
Varitble
T:B
T@1
T%
T@3
0.023 2.81 1 9.133
1.615 - 0.423 1.993
0.000 1.678 1.800
O.Oo0
0.000 3.188
3.2.
229
NONLINEAR MODELS WITH NO ERROR IN THE EQUATION
3.2.3 and the model (3.2.3) were entered into a standard nonlinear regression fi,) = program to obtain the estimates of Po and p,. These estimates are (-0.080, 1.452) and the vector half of the estimated covariance matrix is
(B,,
vech
?{p}
= (0.3203, -0.0546; 0.0173)’.
a
The estimated covariance matrix of is the usual Taylor approximation computed for nonlinear least squares. Most nonlinear regression programs compute the covariance matrix of the estimates by multiplying a residual mean square by the inverse of the matrix of sums of squares and products of the derivatives with respect to the parameters. In our problem the error mean square of the transformed variables is estimated to be one. Therefore, it is necessary to multiply the standard errors output by the program by the square root of the inverse of the residual mean square. In this example the residual mean square is 0.141 with one degree of freedom. The residual mean square is approximately distributed as an F random variable with 1 and 17 degrees of freedom when the p’s satisfy the postulated restrictions. Therefore, for our example, the model is easily accepted. 00
3.2.3. Models Nonlinear in x In this section we derive the maximum likelihood estimator of nonlinear model introduced in Section 3.2.1. We let Y, = dxi; P),
-
(I;,Xi) = (Yt,
for the
xt) -I- (er, ur),
and assume that E: NI(0, Z,,), where el = (e,, ur),and that C,, is nonsingular and known (or known up to a multiple). It is assumed that g(x; /I) is continuous and possesses continuous first and second derivatives with respect to both arguments for x E A and p E B, where A and B are subsets of pdimensional Euclidean space and k-dimensional Euclidean space, respectively. The unknown true values (xl, x 2 , . . . , x,) are assumed to be fixed. Under this model the density function for (7,X,)is proportional to
I&-
”’exp{ -+[K - dxr; PI, XI - xr]C;
‘[I; - Ax,; P), X, - xi]‘}.
The maximum likelihood estimator is the (/?’, xl, x 2 , . . . ,x,) in B x A that minimizes the sum of squares n
1=1
where
(3.2.7)
230
EXTENSIONS OF THE SINGLE RELATION MODEL
The likelihood equations obtained by setting the first derivatives of (3.2.7) equal to zero are
(3.2.8) where the first equation holds for i = 1, 2 , . . . , k, the second equation holds for r = 1, 2, . . . ,n, F X I = [gX(l,(x,; /?I, gxctix,; /?I, . . ’ gxcp,(xt; 9
/?)I,
gpci,(xf;8) is the partial derivative of g(x; p) with respect to pi evaluated at (xl; /?), and gxci,(xt;/?) is the partial derivative of g(x; fi) with respect to x , ~ evaluated at (x,; /?). The maximum likelihood estimators will satisfy these
equations if the solutions are in the parameter space. For the linear model, we were able to obtain explicit expressions for the estimator of the parameter vectorp and for the estimator of x,. See (2.3.14) and (2.3.15). In most situations it is not possible to obtain an explicit expression for the maximum likelihood estimator of the parameter vector for the nonlinear model. In Example 3.2.2 we demonstrate how a nonlinear regression program can be used to obtain the maximum likelihood estimator. Because this procedure directly estimates the unknown x,, it is not computationally efficient for very large samples. Example 3.2.2. The data in Table 3.2.4 were obtained in an experiment conducted by Frisillo and Stewart (1980b). The experiment was designed to
TABLE 3.2.4. Observations for experiment on ultrasonic absorption in Berea sandstone
Observation
1
9 10 11 12 Source:
Compressional Speed ( Y )
Gas-Brine Saturation ( X )
1265.0 263.6 258.0 254.0 253.0 249.8 237.0 218.0 1220.6 1213.8 1215.5 1212.0
0.0 0.0 5.0 7.0 7.5 10.0 16.0 26.0 30.0 34.0 34.5 100.0
Frisillo and Stewart (1980a).
3.2.
NONLINEAR MODELS WITH NO ERROR IN THE EQUATION
231
investigate the possible use of sonic logging to detect areas with potential for natural gas production. The data are the observed compressional wave velocity (Y) of ultrasonic signals propagated through cores of Berea sandstone and the percent nitrogen gas saturation (X) in a brine solution forced into the pores of the Berea sandstone. The method used to create partial gas saturation in the brine solution could only produce saturation levels less than 35%. This explains the large gap in saturation levels between 34.5% and 100%. We assume that the data satisfy the model Y,
(x,
= Po
+ Bt[exp{B2xJ
-
-
1 1 2 9
where X , ) = (y,, x , ) + (e, , u,) and (q,uJ’ NI(0, 10’). We shall use a nonlinear least squares program to estimate the parameters of the model. Let (
G
9
Zt2)
= (K, X , )
and
(&I19
&,2) =
u,)
for t = 1 , 2 , . . . , 12. The model for nonlinear estimation can then be written
+ Jtj2
12
1
i= 1
Dtjixi
+
(3.2.9)
where Jtj, = Sjl, Dfji = S,,, and dti is Kronecker’s delta. The model (3.2.9) is a nonlinear regression model with parameters (Po, p l , Sz, xl, x 2 , . . . , x I 2), exD,,. . . , D , , , J , , J 2 ) , and errors E , ~ .Table 3.2.5 conplanatory variables (D,, tains the variables used in the nonlinear least squares estimation of the model. X , ) have been listed in a single column with the 12 observaThe 12 pairs tions on k; appearing first. This column of 24 observations is called the 2 column. The estimates obtained by nonlinear least squares are given in Table 3.2.6. The estimated function is plotted in Figure 3.2.1. Nonlinear regression programs generally require the user to provide start values for the parameters. In the present problem start values for (Po, PI, P2) were estimated from a plot of the data. The observed X , values were used as start values for x , . The estimated standard errors in Table 3.2.6 are those output by the nonlinear regression program. The distributional properties of the estimators are investigated in Theorem 3.2.1 below. In this experiment it is reasonable to believe that the error variances for X = 0 and X = 100 are smaller than the error variances for the mixtures. Also, one can argue that the expected value of the error given that X = 0 is not zero. Estimation recognizing these facts would change the estimates of (Do, PI, fi2) very little because the derivative of the function is zero at x = 0 and the derivative is nearly zero at x = 100. This is reflected in the fact that
(x,
h,
XI2
XlO XI 1
x9
X8
x7
X6
x5
x4
x 3
x 2
x,
YI 0 y*1 YI2
ys
y, Y E
y6
y4 y5
y3
y2
YI
Original Observation
4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12
3
1 2
t
2
2 2 2 2 2 2 2 2 2 2 2
1 1
1 1
1 1 1
34.5 100.0
16.0 26.0 30.0 34.0
7.5 10.0
5.0 7.0
0.0 0.0
1265.0 1263.6 1258.0 1254.0 1253.0 1249.8 1237.0 1218.0 1220.6 1213.8 1215.5 1212.0
1
1 1 1 1
ZSj
Dependent Variable
j
Indices -
0
0 0 0 0 0
0 0 0 0 0
1 0 0 0 0 0
1
0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0
D2
1 0 0 0 0 0 0 0 0 0 0 0
Dl
0 0 0 0
0
0
0 1 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0 0
D3
0 0 0 0
0
0
0 0 1 0 0 0
0 0 0 1 0 0 0 0 0 0 0 0
D4
0 0 0 0
0
0
0 0 0 1 0 0
0 0 0 0 1 0 0 0 0 0 0 0
D5
0 0 0 0
0
0
0 0 0 0 1 0
0 0 0 0 0 1 0 0 0 0 0 0
D6
0 0 0 0
0
0
0 0 0 0 0 1
0 0 0 0 0 0 1 0 0 0 0 0
D7
0 0 0 0
1
0
0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0
D8
1 0 0 0
0
0
0 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 0 0
D9
0 1 0 0
0
0
0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 0
DIO
0
0 1
0
0
0
0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0
Dll
TABLE 3.25. Table of observations for nonlinear least squares estimation of parameters of wave velocity model
0 0 0 1
0
0
0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1
DIZ
0 0 0 0
0
0
0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1
J,
1 1 1 1
1
1
1 1 1 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0
J2
TABLE 3.2.6. Estimated parameters and estimated standard errors obtained by nonlinear least squares
Parameter
Estimated Standard Error
Estimate 1264.65 - 54.02 - 0.0879 0.00 0.00 4.93 6.73 7.17 8.69 14.67 27.46 28.8 1 34.77
1.03 1.58 0.0063 1.54 1.54 0.74 0.71 0.71 0.74 0.93 1.30 1.33
1.45 1.44
34.58 100.00
1.54
-
2.38
126W
W
& 1251
Q Z 124-
0 cn
123[r
a.
5 122V
121I
0
,
10
I
I
20
30
I
I
1
1
1
1
1
40
50
60
70
80
90
100
FIGURE 3.2.1. Estimated function and observed values for wave velocity.
233
234
EXTENSIONS OF THE SINGLE RELATION MODEL
the estimates of x, at these points are, to two decimals, equal to X,and the estimated standard errors for the estimated x, values are equal to 3. Also see Exercise 3.9 and Example 3.2.3. 00 The nonlinear least squares calculations of Example 3.2.2 are very general and can be used in small samples for a wide range of models. To illustrate the method for models with unequal error variances and for models where the observations satisfy more than one relationship, we continue the analysis of the Frisillo and Stewart (1980) data.
Example 3.2.3. In the experiment of Frisillo and Stewart (1980b) described in Example 3.2.2, a second response variable, called quality by Frisillo and Stewart, was also determined. We let yI2 denote the true value of quality and yrl denote the true value of compressional speed. The data are given in Table 3.2.7. For the purposes of this example, we assume the model (3.2.10)
TABLE 3.2.7. Observations for experiment on ultrasonic absorption in Berea sandstone
Observation 1
2 3 4 5 6 I 8 9 10 11 12
Compressional Speed ( X A
Quality (X2)
Gas-Brine Saturation (X,)
1265.0 1263.6 1258.0 1254.0 1253.0 1249.8 1237.0 1218.0 1220.6 1213.8 1215.5 1212.0
33.0 32.0 21.6 20.0 21.0 17.4 14.8 8.7 8.7 9.9 10.3 41.0
0.0 0.0 5.0 7.0 7.5 10.0 16.0 26.0 30.0 34.0 34.5 100.0
Source: Frisillo and Stewart (l980a).
3.2.
NONLINEAR MODELS WlTH NO ERROR IN THE EQUATION
235
We are assuming that the variance of e12 is one-fourth that of e,, for all observations. If the gas-brine saturation is greater than zero and less than loo%, we assume that the variance of e,, is equal to the variance of u,. For the zero and 100% saturation observations we assume that the gas-brine saturation is measured without error. The variables to be used in the nonlinear estimation are given in Table 3.2.8. The 2 column contains the values of compressional speed, followed by the values of gas-brine saturation measured with error, followed by the values of quality multiplied by two. The gas-brine saturation for observations 1, 2, and 12 are not included in the Z column because the true values for these observations are known and need not be estimated. The values of quality are multiplied by two so that the error variance is equal to that of speed. Also, the indicator variables associated with quality ( J 3 and H3) are multiplied by two. The variable F contains the values of saturation that are measured without error for the corresponding values of speed and quality. The values of F for observations 4-11 are arbitrary because they are annihilated by the indicator variables. The nonlinear model is written
1 if 2 observation is for ZIi J '. = { 0 otherwise,
1 ift=i 0 otherwise,
D . = D rji = I
K = K
= 'j
{
1 i f j # 0, 1, 12 0 otherwise,
Hi = J i K , j , Ftj = xIj(l - K,,), and we have omitted the subscripts identifying the observation when no confusion will result. The nonlinear least squares estimates of the parameters are given in Table 3.2.9. The estimates of Do, Br, and f12 are very similar to those of Table 3.2.6. The addition of data on quality had little effect on the estimates of (Po, PI, p2) or on the estimated standafd errors of these estimates. In fact, the estimated &) of Table 3.2.9 are slightly larger than those of standard errors of (&, Table 3.2.6. This is because d2 = 2.59 of Table 3.2.9 is larger than d2 = 2.38
PI,
N
o\
w
y11.1
y10.1
y9,1
YB,l
y7.1
y6,1
Y5.1
y4.1
y3.1
y2.1
K.1
1265.0 1263.6 1258.0 1254.0 1253.0 1249.8 1237.0 1218.0 1220.6 1213.8 1215.5 1212.0 5.0 7.0 7.5
0 0 0 1 0 0 0 0 0 0 0
0 0 1 0
0
0 1 0 0 0 0 0 0 0 0
0 1 0 0
0 0 0 1
0 0 0 1 0 0 0 0 0 0
0
0 0 0 0
0 0 0 0 1 0 0 0 0 0
0
0 0 0 0
0 0 0 0 0 1 0 0 0 0
0
0 0 0 0
0 0 0 0 0 0 1 0 0 0
0
0 0 0 0
0 0 0 0 0 0 0 1 0 0
0
0 0 0 0
0 0 0 0 0 0 0 0 1 0
0
~~~
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
1
~
0 0 0
1 1 1 1 1 1 1 1 1 1 1
TABLE 3.28 Data for nonlinear least squares estimation of multivariate wave velocity model
0
~
1 1 1
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 1 1 1 1 1 1 1 0 0 0 0
1
0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
~~
0
0 0 0
0 0 0 0
0 0 0 0 0 0 0
2y10.2
2y3,2
1 0 . 0 0 1 6 . 0 0 2 6 . 0 0 3 0 . 0 0 3 4 . 0 0 3 4 . 5 0 6 6 . 0 0 64.0 0 4 3 . 2 1 40.0 0 4 2 . 0 0 3 4 . 8 0 2 9 . 6 0 1 7 . 4 0 1 7 . 4 0 1 9 . 8 0 2 0 . 6 0 9 4 . 0 0
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1
0 0 0 0 0 0 0 0 0 0
0 0 0 0
0
0
0 0 0
0 0
0 1
0 0
0
0 0 0 0 0 0
0 1
1
0
0 0
0 0
0
0
0
0
1
0
0 0 0 0
1
1 0 0 0 0 0 0 0 0 0 0
0
0
0 0 0
1
0
0 0 0 1 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 1
0
1
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 2
0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 2 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
238
EXTENSIONS OF THE SINGLE RELATION MODEL
TABLE 3.2.9. Estimated parameters of multivariate wave velocity model Estimated Standard Error
Estimate
Parameter
1.05 1.58 0.0065 0.40 17.71 0.00 18 0.018 0.54 0.58 0.58 0.66 0.87 1.35 1.38 1.43 1.43
1265.14 - 54.50 - 0.0887 9.00 194.45 -0.0155 0.654 5.71 7.1 1 6.93 9.1 1 13.87 27.40 28.71 34.69 34.8 1 2.59
obtained in the analysis of Table 3.2.6. The conditions under which the standard errors of the estimators are appropriate are discussed in Theorem 3.2.1, following this example. While the addition of the quality data had little effect on the estimates of (Po, PI, P2), it did have an effect on the estimates of the x values. There is a marked reduction in the estimated standard error of the estimates of x3 through x,. The standard errors of the estimates of xg through xI1are affected less because the slope of the quality function with respect to x is nearly zero for this range of x values. no We now develop a computational algorithm for the nonlinear model that is appropriate for large samples. We consider the general model in which the true values are defined implicitly. Let the model be
m;b)
-
= 0,
Zf = 2,
+ e,,
(3.2.11) el NIP, U , where z, is a p-dimensiohal row vector and /3 is a k-dimensional column vector. We assume f(z,, /?) is a continuous function of z, and /? with continuous first and second derivatives. We assume that the z, are fixed unknown constants and that lZtel# 0.
3.2.
NONLINEAR MODELS WITH NO ERROR I N THE EQUATION
239
Given observations Z,, t = 1,2,. . . , n, the least squares (maximum likelihood) estimator is constructed by minimizing the Lagrangean n
n
(3.2.12)
with respect to /? and z,, t = 1,2, . . . , n, where the a, are Lagrange multipliers. Iterative methods are required to obtain the numerical solution. Britt and Luecke (1973) suggested a Gauss-Newton type of iteration that has been extended by Schnell(l983). Let Z,, Z,, . . . , Z,) be initial estimates. Expand f ( z I ; /Iin )a first-order Taylor expansion about the initial values to obtain
0,
Az,,B) f(%, S) + f&,
S>(/?- S) + f,(Z,, fi)(zt - Z,)',
(3.2.13) where f&, fi) is the k-dimensional row vector containing the partial deriva/?) with respect to the elements of /? evaluated at (z,, /?) = (Z,, tives of and f,(Z,, /?) is the p-dimensional row vector of partial derivatives of f(z,, /?) with respect to the elements of z, evaluated at (z,, j)= (Z,, j).Replacing f(z,, /?) of (3.2.12) with (3.2.13), we have
6)
f(z,, n
1 [(Z, - 2,) -
I =1
(2,
-
ZJIG
[(Zl - 2,) - (Zf - ZJI'
c n
+ I =1 ~ , [ f ( %s") + fp(& S,(S - j)+ f,(Z,,
P)(z, - 2,YJ
(3.2.14)
The objective function (3.2.14)is quadratic in (z, - 2,) and (/? - j).Differentiation produces the system of equations [(Z, - 2,) - (2, - Z,)]' - a,f;(Z,, j)= 0, (3.2.15)
i a,f@t, j)
a,
= 0,
1= 1
(3.2.16)
f@,, + fp(ir, j)(P - $1 + f,(Z,, b ( Z , - ZJ' = 0, (3.2.17) where (3.2.15)and (3.2.17) hold for t = 1, 2,. . . , n. If we multiply (3.2.15) by f,(Z,, &Zzzand use (3.2.17), we obtain
S) + f,@,>B)(P - S)l,
m, = Gxfi,+ f(% where fit = f,(Z,, fi)(Z, - f)' and
(3.2.18)
S).
(3.2.19) d"",,= fA& If we multiply (3.2.18) by fb(i,, sum, and use (3.2.16),the estimated change in /? is given by
S),
/
M
z
m
t
9
(3.2.20)
240
EXTENSIONS OF THE SINGLE RELATION MODEL
Therefore, the improved estimator of fl is obtained by adding the fi - fl of (3.2.20)to the initial estimator b. Multiplying (3.2.15)by C,, and using (3.2.19), we have 2, = 2, - 6”,[fz(Et, ji)(Z, - z,y
+ f ( z t , ii)+ f o ( n t , S)CS- B ) l f z ( z t , @ee.
(3.2.21)
The calculations are iterated using / and z, from (3.2.20) and (3.2.21)as initial values ($, 2,) for the next iteration. Modifications may be required to guarantee convergence. An alternative method of calculation is to use a nonlinear algorithm to choose i, so that f@,,fl) = 0 at each step. The limiting properties of the estimator are given in Theorem 3.2.1. It is important to note that the limits in the theorem are limits as the error variances become small. Therefore, the theorem is applicable in situations where the error variance is small relative to the curvature of the function. The adequacy of this type of approximation will be explored in the examples and in Section 3.2.4. Amemiya and Fuller (1985) have given a version of Theorem 3.2.1 for sequences in which both the sample size increases and the measurement variance becomes smaller.
Theorem 3.2.1.
Let
+ art, NI(0, r - ‘L),
f(z,, P) = 0, a,.,
-
Zrt
= zt
(3.2.22)
where ?&is a nonsingular fixed matrix and z,, t = 1 , 2 , . . . , n, is a set of n ( n fixed) p-dimensional fixed vectors. Let f(z,,8) be a continuous function defined on A x B, where A is a compact subset of p-dimensional Euclidean space and B is a compact subset of k-dimensional Euclidean space. Let f(z,, 8) have continuous first and second derivatives with respect to (z,, 8) on A x B. Let A, be the set of z, in A for which f(z,, P) = 0 and assume that for every C > 0 there exists a 6, > 0 such that
for all 8 in B satisfying I/? - j 0 [> 5. Let 8’ = (/?’, zl, z 2 , . . . ,z,) and let 8’ = zy, z:, . . . ,z,”)‘ be the true 8, where Po is an interior point of B and every zp is in the interior of A. Let d be the value of 8 that minimizes
(PO’,
(3.2.24) subject to I(&,, fi) = 0 for t = 1 , 2 , . , . , n. Then F($
- 80)
5 N(O, a),
3.2. as r
241
NONLINEAR MODELS WITH NO ERROR IN THE EQUATION
-, 03,
where (3.2.25)
R,, is an np
x np
matrix with p x p blocks of the form
Proof. Our proofs of consistency follow those of Amemiya and Fuller (1985). Let
Because
minimizes P,(/I), we have
say. For any z,, (zp - Z t ) E i '(ZP - z,)'
< 2[(Z,,
- Z,)C,
yzrt- ZJ' + E,&
lE:,]
and, therefore,
Q,(h< 2 [ P , ( i ) + R,] < 4R, = O P F 1 ) . By the identifiability condition (3.2.23),Q,(& converging to zero implies that
) is converging to jo, and / is a consistent estimator of
jo.
242
EXTENSIONS OF THE SINGLE RELATION MODEL
Because n
1 (2, - zPPC, '(gt -
t= 1
= nQ,(S),
2:)'
(kl, k2, . . . , 2.) is consistent for (zy, zi,,. . , 2):. If we expand f($,1)about (zp, Po), the last term of the Lagrangean n
C (Zrt - 2 t P i '(zr, -
t= 1
can be written
t .J":. P*)O -
t=
1
PO)
2tY
+
n
1 a,f(z,, P)
t= 1
+ f&:, B*)($
-
z31,
where a, ar! the Lagrange multiplilers and (z:,j*) is on the line segment Because 8 is a consistent estimatorpf 0' and 8' joining (it,P) and (z;, /lo). is in the interior of the parameter space, the probability that 8 satisfies the derivative equations associated with the Lagrangean approaches one as r -+ co. Following the derivation used to obtain (3.2.20)and (3.2.21),we have
21 -
4-Fpd) - f l 0 ) ] x V E4-t op(r- '"), where or, = F,,e:, and ZuEI= FZtCe,.We obtain the distributional result because the erI are normally distributed. 0 = art - Q,A[urr
The assumption that C,, is nonsingular is not required in the computation of the estimates. Also, the assumption that EELis nonsingular can be replaced
3.2.
NONLINEAR MODELS WITH N O ERROR IN THE EQUATION
243
in Theorem 3.2.1 by the assumption K,
-= fz(z,,B)C,&f:(Z,9B)
for all (z,, /I) in A x B. If CCEis singular, one can define the quadratic form in (3.2.24) for a vector of contrasts of the 2, that has a nonsingular covariance matrix. The linear functions of the Z, with zero error variance are arguments of f ( z , , /?) but not of the quadratic form. The covariance matrix of the limiting distribution of is estimated by the inverse matrix of (3.2.20) computed at the last iteration. The covariance matrix for the approximate distribution of I,can be estimated by substituting estimators for the parameters in S2,,rr of (3.2.25). In the derivation we treated C,, as known. However, CEEneed only be known up to a multiple. If
E&&= YEE02,
(3.2.26)
where YE,is known and nonsingular, and the r of Theorem 3.2.1 is absorbed into c2,the estimator of cr2 is d2 = ( n - k ) - '
n ,=I
= (n - k)-' I='
(Z, - Z,)T,~'(z,- 2)' [f,(f,, j)T,,f:(Z,, j)]-'[f,(Z,, B)(Z, - 2,)'12. (3.2.27)
The second expression in (3.2.27) is also valid for singular YeE. The approximate distribution of G2 is given in Theorem 3.2.2. Theorem 3.2.2. Let the assumptions of Theorem 3.2.1 hold with ECC defined by (3.2.26) where TEE is known and nonsingular. Let the estimator of (/I, z I , z2, . . . ,z,) be the vector that minimizes (3.2.4) with Y E ,replacing X,,. Let d2 be defined by (3.2.27). Then ( n - k)o-2d2
L +
xi-k,
where is distributed as a chi-square random variable with n - k degrees of freedom. Furthermore, the limiting distribution of G2 is independent of that of r1I2(d - eo)defined in Theorem 3.2.1. Proof. The estimation problem contains np unknown zIi values and k unknown elements of p. Because of the restrictions imposed by f ( z , , /?) = 0, there are a total of ( p - 1)n -t k independent unknown parameters. Let y be a (p - 1)n + k vector containing one set of independent parameters. Then we can write
z, = g,(yo) + E,,
t = 1, 2,
. . . , n,
244
EXTENSIONS OF THE SINGLE RELATION MODEL
where the form of g,(y) is determined by the parameters chosen to enter y and yo is the true value of y. Our assumptions are sufficient to guarantee the existence of functions g,(y) that are continuous and differentiable in the neighborhood of yo. By the proof of Theorem 3.2.1
+ +
(Z - 2)'
= F,(y*- yo) E' op(r- '"), f - yo = (FYT-'F,)-'F;r-'E' + oP(r-'I2),
where Z = ( Z l , . . . ,Z.), 2 = (S1,
. . . ,2.),
E
=(
. . . ,E,,),
E ~ ,
r = block diag(r,,, re,,. . . ,r,,),
and F, is the np x [(p - 1)n + k ] matrix of partial derivatives of Z with respect to y evaluated at y = yo. It follows that
8 = ( n - k ) - %(z - 2)r-I(z - 2)' = (n - k)-'rEHr-'He'
where
H
=I -
+ op(l),
F,(FbT-'F,)- ' F y r - '.
The distribution of the quadratic form in e follows by standard regression theory. [See, e.g., Johnston (1972, Chap. 5).] By the same theory, 1 3 ~and r1/2(b- Oo) are independent in the limit. An alternative proof can be constructed by substituting the expressions for - Po and 2, - zp from the proof of Theorem 3.2.1 into (3.2.27). 0
4
Example 3.2.4. To illustrate the computations for an implicit model we use an example from Reilly and Patino-Leal (1981). The data in Table 3.2.10 are 20 observations digitized from the x-ray image of a hip prosthesis. Reilly and Patino-Leal (1981) cite Oxland, McLeod, and McNeice (1979) for the original data. The image is assumed to be that of an ellipse P 3 ( ~ r- P i ) 2
+ 2 P d ~ -i Pi)(xi - P 2 ) iPs(x, - Pz)' - 1 = 0,
where yI is the true value of the vertical distance from the origin and x, is the true horizontal distance from the origin. We assume that the observations X,) of Table 3.2.7 are the sum of true values (y,, x,) and measurement error (et,u,), where (el, u,)' NI(0,102). The iterative fitting method associated with (3.2.20) and (3.2.21) gives the estimated parameters:
(x,
-
PI = -0.9994
b3 = S, =
(0.1114), 0.08757 (0.0041 I), 0.07975 (0.00350),
P2 = -2.9310
p4 =
82 =
0.01623 0.00588,
(0.1098), (0.00275),
3.2.
NONLINEAR MODELS WITH NO ERROR IN THE EQUATION
245
TABLE 3.2.10. Observations on x-ray image of a hip prosthesis Observation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
x 0.50 1.20 1.60 1.86 2.12 2.36 2.44 2.36 2.06 1.74 1.34 0.90 - 0.28 -0.78 - 1.36 - 1.90 - 2.50 - 2.88 -3.18 - 3.44
X,
9,
2,
v^,
-0.12 - 0.60 - 1.00 - 1.40 - 2.54 - 3.36 -4.00 -4.75 - 5.25 - 5.64 - 5.97 -6.32 - 6.44 - 6.44 - 6.4 1 - 6.25 - 5.88 - 5.50 - 5.24 - 4.86
0.534 1.174 1.535 1.800 2.274 2.435 2.421 2.268 2.030 1.738 1.358 0.882 -0.278 - 0.790 - 1.351 - 1.868 - 2.469 - 2.888 -3.174 - 3.476
- 0.072 - 0.626 - 1.049 - 1.437 - 2.494 - 3.354 - 3.999 -4.718 - 5.233 - 5.638 - 5.993 - 6.280 - 6.540 - 6.508 - 6.382 -6.181 - 5.835 - 5.509 - 5.234 -4.888
- 0.036 0.023 0.05 1 0.045 - 0.099 - 0.044
0.008 0.053 0.0 18 0.002 -0.015 - 0.023 - 0.055 - 0.039 0.0 17 0.046 - 0.034 - 0.007 0.005 - 0.029
Source: Reilly and Patino-Leal (1981).
where the numbers in parentheses are the estimated standard errors. The estimated covariance matrix of the estimators is the inverse matrix obtained at the last iteration of (3.2.20). The estimated values of (y,, x,) are given in Table 3.2.10. The estimated covariance matrices of (j,,a,) are given in Table 3.2.11. The covariances are estimates of the elements of the matrix R,,,, defined in (3.2.25) of Theorem 3.2.1. Note that the variance of 2, is smallest when the curve is most nearly parallel to the y axis. See Figure 3.2.2. The deviations
G, = f& /3)(Z, - 2J
= P,,(Z, - 2,)’
are given in Table 3.2.10. In the linear model u, = ~ ~ (-/?’)’ 1 , is a linear function of E, because the vector of weights (1, -/3’) is constant over observations. In the nonlinear model the vector of weights F,, changes from observation to observation. As a consequence, the variance of u, will change from observation to observation and the plot of 19, against ificould be misleading for this
246
EXTENSIONS OF THE SINGLE RELATION MODEL
TABLE 3.2.1 1. Covariance matrices of ( j , , i,) and standardized deviations
Observation
10’3 { j , }
103e{j,, a,}
lo3? {a,}
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
5.17 3.61 2.93 2.54 1.86 1.37 1.23 1.63 2.39 3.26 4.20 5.06 5.88 5.79 5.48 5.02 4.30 3.82 3.66 3.97
- 1.00
4.48 3.78 4.19 4.64 5.52 5.86 5.83 5.37 4.67 3.87 2.97 2.09 1.29 1.40 1.66 1.99 2.55 3.20 3.83 4.69
-2.18 - 2.24 - 2.04 - 1.21 - 0.34 0.49 1.48 2.06 2.30 2.22 1.77 0.08 -0.63 - 1.30 - 1.84 - 2.29 - 2.35 -2.13 - 1.51
10%;;; 4.71 4.85 4.86 4.84 4.70 4.52 4.36 4.18 4.06 3.99 3.97 4.00 4.24 4.37 4.5 1 4.63 4.76 4.82 4.85 4.86
6;:,%,
0.76 0.48 1.06 0.92 -2.10 -0.98 0.18 1.27 0.45 0.04 -0.38 0.57 - 1.31 -0.90 0.38 0.99 0.72 -0.15 0.10 -0.60
reason. It is suggested that 6;:,/20, be plotted against the elements of 2r. The quantities 13;~:,‘~0,are given in Table 3.2.11. In this example, Xet = Io2 and is proportional to the signed Euclidean distance from the quantity 6u~~,‘z0, Xr), to the closest point, ( j , ,$,), on the estimated functhe observation, tion. The function and the observations are plotted in Figure 3.2.2. The runs of positive and negative values of the 6, in Figure 3.2.2 suggest that the error made in digitizing the image may be correlated, but we do not pursue that possibility further in this example. In the linear case, every element in the vector of differences Z, - 2, is a multiple of 0,. Therefore, the plot of Z,, - it,against is the same, except for a scale factor, as the plot of 0, against iri.In the nonlinear case the multiple of 0, subtracted from Z , , to create i,,is not constant from observation to observation. However, the standardized quantities 6u;:,/2u^, are equal to the differences Zti - it[divided by the estimated standard deviation of Z,, - ili.Therefore, a plot of 6;uA’20, against the ith element of 2, is analogous to the plot of the standardized difference Z,, - iriagainst Z,,.
(x,
00
3.2.
NONLINEAR MODELS WITH NO ERROR IN THE EQUATION
241
X
FIGURE 3.2.2. Estimated ellipse and observed data for image of hip prosthesis.
3.2.4.
Modifications of the Maximum Likelihood Estimator
The conclusion of Theorem 3.2.1 requires the variance of the measurement error to be small relative to the curvature. The data analyzed in Example 3.2.2 and Example 3.2.4 are such that the approximation of Theorem 3.2.1 seems adequate. We now study maximum likelihood estimation for situations in which the variance of the measurement error is not small relative to the curvature.
Example 3.2.5.
This example uses created data to illustrate some aspects
of the estimation of nonlinear errors-in-variables models. Table 3.2.12 con-
tains 120 observations generated by the model. Y , = Bo
+ a,x:, (q,u,)’
-( T >
X I ) = (Yt9 x,)
NI(0, 0.091),
+
(et9
Ut),
TABLE 3.2.12. Data for quadratic model 1.24 0.59 1.53 0.84 1.64 1.47 0.84 0.78 0.82 1.40 1.30 1.02 0.65 0.75 1.14 1.12 0.64 0.85 0.78 0.67 0.99 1.19 0.72 1.01 0.45 0.09 0.42 0.39 0.32 -0.01 -0.32 0.65 0.47 0.43 0.72 0.69 0.4 1 0.06 -0.31 0.05
248
- 1.44 - 0.90
-0.81 - 1.14 - 1.36 -0.61 -1.11 - 1.46 - 0.77 - 1.31 - 0.98 - 0.93 - 1.03 - 1.59 - 0.45 - 1.07 -1.11 -0.57 - 0.89 - 1.25 -0.81 - 1.07 -0.51 - 0.86 -0.16 - 0.64 0.2 1 - 0.49 - 0.45 - 0.7 1 - 0.69 - 0.46 - 0.22 - 0.98 - 0.54 - 0.54 - 0.8 1 -0.14 - 1.03 -0.36
0.02 0.04 0.14 -0.14 0.29 0.6 1 -0.04 0.58 - 0.06 -0.35 -0.35 0.09 0.1 1 0.03 0.26 0.4 1 -0.01 -0.45 0.32 -0.14 - 0.06 0.32 - 0.06 - 0.22 - 0.30 - 0.26 - 0.59 - 0.31 0.23 - 0.25 0.59 0.34 -0.22 0.24 0.42 - 0.07 0.8 1 0.43 -0.10 0.12
-0.14 -0.37 -0.71 -0.21 - 0.52 - 1.01 -0.34 - 0.68 -0.41 - 0.06 0.13 -0.18 0.11 - 0.04 -0.54 -0.10 0.35 -0.32 -0.23 -0.12 - 0.05 - 0.3 1 - 0.45 0.65 0.57 0.17 -0.16 - 0.09 0.08 -0.17 0.09 0.43 0.83 0.70 0.35 0.64 0.52 0.67 0.50 - 0.26
0.41 0.60 0.52 - 0.22 0.16 -0.10 0.13 0.06 0.46 0.30 0.30 0.09 0.30 0.37 -0.00 0.98
.oo
1
0.99 0.78 0.94 0.49 0.57 0.79 0.89 1.22 0.62 1.42 1.51 1.26 1.09 0.38 1.32 0.96 1.50 1.31 0.80 1.06 1.02 1.06 1.03
0.56 0.44 0.99 0.76 0.55 0.53 0.89 0.35 0.33 0.39 -0.19 0.55 0.55 0.02 0.46 0.84 1.03 0.92 1.28 1.53 1.22 0.84 1.23 1.36 0.72 0.71 1.44 0.82 0.58 0.39 0.63 1.11 0.84 1.07 0.80 1.08 0.85 0.97 1.oo 1.60
3.2.
249
NONLINEAR MODELS WITH NO ERROR IN THE EQUATION
. ..
-0,3 -0.6
-1.5
-1.0
-0.5
I
X
0.5
I
1.0
I
1.5
FIGURE 3.2.3. True function (dashed line) and maximum likelihood estimated function (solid line) for 120 observations generated by quadratic model.
where (Po, PI) = (0, 1). Twenty-four observations were generated for each of the five sets of (y, x) pairs, (1, - l), (0.25, -0.50), (O,O),(0.25, OSO), (1, 1). The errors were random normal deviates restricted so that the first and second moments for each of the five sets of 24 pairs are equal to the population moments. The data are plotted in Figure 3.2.3. Also plotted in the figure are the true function (dashed line) and the function estimated by maximum likelihood (solid line). The maximum likelihood estimate of the line is
9, = -0.171 + 1.241xf, (0.059) (0.133)
where the numbers in parentheses are the standard errors estimated by the usual nonlinear least squares method treating u2 as unknown. The estimate of o2 is d2 = 0.0808. The estimated intercept differs by more than two estimated standard errors from the true value of zero. Also, the estimated coefficient for x: is biased upward. On the other hand, the value of the estimated function at x = 1 is 1.070, not greatly different from the true value of 1.000. These results illustrate the fact that choosing the function to minimize the squared distances produces a bias that is a function of the curvature. The bias moves the estimated function away from the curvature. Because the true quadratic
250
EXTENSIONS OF THE SINGLE RELATION MODEL
function is concave upward at x = 0, the bias at that point is negative. The curvature at x = 1 is less than at x = 0 and the estimated curve is displaced less at x = 1. The slope of the estimated curve at x = 1 is biased because of the displacement at x = 0. The estimation method introduced in Section 3.1.5 for the quadratic model is essentially a method of moments procedure. As such it will have a small bias for medium size samples. Using the functionally related algorithm in SUPER CARP to construct the estimator described in Example 3.1.6, we find that the estimated quadratic for the data of Table 3.2.12 is
9, = 0.016 + 1.019~:, (0.074) (0.134)
where the numbers in parentheses are the estimated standard errors computed from the estimated covariance matrix of Equation (3.1.12). Generally, the maximum likelihood method will have a smaller variance and a larger bias than the method of moments procedure, Properties of the estimated variances based on likelihood procedures are discussed below.
no
Example 3.2.5 demonstrates that the maximum likelihood estimators of the parameters of nonlinear functions can be seriously biased. The maximum likelihood method chooses as the estimated surface the surface that minimizes the sum of the squared distances between the observations and the surface. When the surface is curved, the population sum of squares is minimized by a function that has been shifted away from the curvature. The bias is a function of curvature and does not decline if more observations with the same error variance and similar x values are added to the sample. Therefore, we consider modifications of the estimation procedure for samples in which the number of observations is relatively large and for models in which the measurement error in each Z vector is also relatively large. Let f(Z,,
B) = 0, 8,
z, = z, + e,,
- W O , X,,),
(3.2.28)
for t = 1,2, . . . , n. The second-order Taylor expansion of f(z,; /?) about the point (z;; Po) gives
f(% B) = f(zP, Po) + f,(zP, P”(P - P O ) + f,(ZP, P 0 h - ZP)’ + 3(z, - ~P)F,,(ZP, P 0 ) ( Z , - 2:)’ + (2, - zP)Fz&P, BO)(B- Po) + 3(P - P0)’Fpp(ZP, PO)(B- P O ) ,
(3.2.29)
where fp(zp,/?O) is the row vector of derivatives of f(z,, P) with respect to P, f,(zP, Po) is the row vector of derivatives off(z,, 8) with respect to z,, F,,(zP, /lo)
3.2.
251
NONLINEAR MODELS WITH NO ERROR IN THE EQUATION
is the p x p matrix of second derivatives off(z,, /?)with respect to zt, FzP(zP,Po) is the p x k matrix of second derivatives with respect to z, and fi, and all derivatives are evaluated at (?,, p ) = (zp, Po). If we replace (z,, /?) by (2,, /?) in (3.2.29),where the (if, are the maximum likelihood estimators, the term in it - zp is large because 2, - zp is the same order as Z, - 2.; The expected value of the term in (3.2.29) that is quadratic in 2, - zp is
b)
E ( t ( % ,- zp)Fzz(zp,Po#% - 2;)’) = 3 tr{FZZ(zp,Bo)Qzzrf),(3.2.30) where Qzzrr is the covariance matrix of 2, - zp. Let the maximum likelihood estimators of z,, /?,and fizz,,be denoted by if,8, and fizztf, respectively. Then an estimator of /? with smaller bias than the maximum likelihood estimator is obtained by minimizing (3.2.31) subject to the restrictions J(z,, /?) - t tr{Pzsfrfizzfr) = 0,
t
= I,2,.
. . , n,
(3.2.32)
where ~,,,,= F,,(L,, j)and fizZff is Q,,,, of (3.2.25)evaluated at (z,, /?) = (k,,)). Table 3.2.13 has been constructed to illustrate the nature of the bias in the maximum likelihood estimator of fi and the effect of the adjustment. Sets of 200 observations of the vector (e,, u,) were generated, where (e,, u,)’ NI(0, IcT’). Each set of 200 observations was standardized so that the sample mean is zero and the sample covariance matrix is the identity. Then the parameters Po and 6’ of the model
-
(Kj X t ) = (Y,, xr) + (et, ur) were estimated for each sample. The observed (K, X,)were set equal to (e,, u,) Y , = Po
+ CX:,
so that the true value of Po and the true values of all x, are zero. In the estimation, c is treated as known and the (y,, x,) values are treated as unknown. The estimates of Po are given in Table 3.2.13. The fixed known parameter c is given in the first column of Table 3.2.13 and the maximum likelihood estimate of the intercept is given in the third column. Because the true value of Po is zero, the estimate of Po is an estimate of the bias. The fourth column contains the theoretical approximation to the bias given in (3.2.30). The empirical bias is somewhat larger than the theoretical approximation. Also, the percent difference tends to increase as the theoretical bias increases and as the curvature increases. The last column contains the modified estimator constructed to minimize (3.2.31) subject to the modified restrictions (3.2.32).The maximum likelihood estimator of 0’ is used to construct fizz,,.The modification removes essentially all of the bias
252
EXTENSIONS OF THE SINGLE RELATION MODEL
TABLE 3.2.13. Empirical bias of the maximum likelihood estimator of the intercept for the quadratic function with known second-degree coefficient ~
Approximate Bias
of P o
(3.2.30)
Modified Estimator
0.25 0.25 0.25 0.25
-0.541 -0.308 -0.153 - 0.068
- 0.5000 -0.2500 - 0.1250 - 0.0625
-0.428 -0.188 -0.060 -0.01 1
0.5 0.5 0.5 0.5
-0.311 -0.145 - 0.066 - 0.032
- 0.2500 - 0.1250
-0.0625 -0.03125
- 0.040
-0.616 -0.306 -0.136 - 0.064
-0.5Ooo -0.2500 -0.1250 - 0.0625
-0.376 -0.121 - 0.023 -0.003
- 1.0000 - 0.5OOO - 0.2500 - 0.1250
- 0.823
- 0.50OO - 0.2500 - 0.1250
- 0.242 - 0.046
Quadratic Parameter
True
M.L.E.
C
02
2.0000 1.0000 0.5000 0.2500 0,5000 0.2500 0.1250 0.0625
0.5000 0.2500 0.1250 0.0625
1.o 1.o 1.o 1.o
0.5000 0.2500 0.1250 0.0625
2.0 2.0 2.0 2.0
- 1.169 - 0.622
0.1250 0.0625 0.0312 0.0 156
4.0 4.0 4.0 4.0
-0.611 -0.273 -0.129 -0.063
- 0.290 -0.132
- 0.0625
(3.2.31)
-0.157
-0.006 -0.00 1
-0.314
- 0.080
-0.013
- 0.007 -0.001
in the estimator of Do when the theoretical bias is less than 0.07 and the modification removes most of the bias if the theoretical bias is less than 0.15. The variance of the maximum likelihood estimator of Po is approximately n-lo’. Therefore, for example, if o2 = 0.5 and c = 0.25, the squared bias of the maximum likelihood estimator of Do is greater than 10% of the variance whenever n is greater than 3. For a’ = 0.5 and c = 0.25, the squared bias of the modified estimator is less than 10% of the variance for n less than 59. The modification is less effective in removing the bias when a2 is small and c is large. This is because the estimated variance of 2 is more seriously biased downward in this case. The higher-order expansions can also be used to reduce the bias in the estimator of 0’. Expression (3.2.27) can be written
(3.2.33)
3.2.
NONLINEAR MODELS WITH NO ERROR IN THE EQUATION
where
L, = f&
253
bL
(3.2.34) b)rJx%, (3.2.35) 6,= fZ&, &(Z, - 2,)’. The quantity Y,;”:,U: is the squared distance from the point Z, to the point z, in the metric Yet.A source of bias in the estimator of cf2 is the bias in P,,,,as an estimator of Yuu,,.If we replace f,(& b) of (3.2.34) with a Taylor series about (zrrfl) and retain only first- and second-order terms in (2, - zt), we have ~ , , , f f = Yl,Uft+ 2f,(z,, m,&F,,(z,, - ZJ’ (3.2.36) + ( %- Zt)[Fz:(Zt. fl)~~&F*~(Z,> fl)I(%- Zf)’?
mf
fl). The expected value of the last term on the where YVv,,= fz(z,,fl)r&&f;(z,, right side of (3.2.36) is tr{~z2trCFrz(Z*3 flP-&EFZZ(Zf> nl). (3.2.37) An estimator of the expression in (3.2.37) is (3.2.38) fr{~iZZff~i-.,,rEEFi;fl where h,,,, and F,,,, are the maximum likelihood estimators defined for Equation (3.2.32). It follows that an estimator of a’ with reduced bias is 1 9
A
where 8’ is the estimator defined in (3.2.33). The maximum likelihood estimates of a’ adjusted for degrees of freedom and the modified estimator are given in Table 3.2.14 for the data sets of Table 3.2.13. It is clear that the maximum likelihood estimator of cf2 is biased, with the bias increasing as the curvature of the function increases and as the variance in the y direction increases. The modified estimator is less biased than the maximum likelihood estimator and displays little bias for ca’ < 0.25. There is a second consideration in the use of least squares (maximum likelihood) to estimate the parameters of a nonlinear functional relationship. The estimated covariance matrix of the estimators obtained by the procedure of Example 3.2.2 is a biased estimator of the covariance matrix of the estimators. This can be seen by considering the linear model. Let
I: = Po + P I X , + e, = d x , , P) + ef,
-
(3.2.40)
X , = x, + u,, and assume (e,, uf)’ NI(0, 10’). Then the estimator of p, given by the least squares method of Example 3.2.2 is equal to the estimator derived in Section 1.3.3. That is, iil
= (mxx - k ’ m x y ,
254
EXTENSIONS OF THE SINGLE RELATION MODEL
TABLE 3.2.14. Empirical bias of the maximum likelihood estimator of the error variance for the quadratic function with known second-degree coefficient
Quadratic Parameter
True a’
M.L.E. of 6 2
Modified Estimator (3.2.39)
c
2.0000 1.oooo 0,5000 0.2500
0.25 0.25 0.25 0.25
0.126 0.171 0.215 0.240
0.158 0.216 0.247 0.253
0,5000 0.2500 0.1250 0.0625
0.5 0.5 0.5 0.5
0.389 0.462 0.490 0.499
0.472 0.505 0.504 0.503
0.5000 0.2500 0.1250 0.0625
1.o 1.o 1.o 1.o
0.683 0.861 0.960 0.99 1
0.865 0.989 1.012 1.006
0.5000 0.2500 0.1250 0.0625
2.0 2.0 2.0 2.0
1.174 1.556 1.846 1.961
1.512 1.888 2.018 2.018
0.1250 0.0625 0.0312 0.0156
4.0 4.0 4.0 4.0
3.443 3.842 3.966 4.003
3.954 4.047 4.025 4.018
where is the smallest root of lmzz - 111 =,O. The consistent estimator of the variance of the limiting distribution of p1 is given in Theorem 1.3.2 of Section 1.3.3,
3{/j1}= (n - l)-’rfi;,2[mxxS,, where 6,,= -/j,6”,,
=
-b,n,
kixx= )[(my,
- m,yx)’
+ 4&,]”’
-6
3,
(3.2.41)
- ( m y y - mxx).
If one uses the method of Example 3.2.2 to construct an estimator of
PI of
(3.2.15), the matrix of partial derivatives at the last iteration are those dis-
played in Table 3.2.15. It follows that the estimated variance obtained from
3.2.
255
NONLINEAR MODELS WITH NO ERROR IN THE EQUATION
TABLE 3.2.15. Partial derivatives associated with nonlinear least squares estimation of the linear errors-in-variables model Original Observation YI
y2
Index i
8)
Dependent Variable
dd-k ___
ZI I
I 1
1
2
a&
8)
-.
&'(it;
ax2
a) . .-War;h ax,
PI
0
0
PI
0 0
1 0
0 1 0
0 0 1
0 0
0
0
1
2
z 2 2
0
X"
n
Zn 2
0
0
z 1 2
dx
i l
x2
n 1
rig(%
~
$2
2 21
Z,1
afl I
2" 0 0
X I
XI
/b
___
PI
the nonlinear least squares fit is (3.2.42) where s, = ( 1
+ j:)S', k2 = (2n - n - 2)-'
n
2
C 1 ( Z t j- z^tj)2,
t=1 j = l
and itjis the usual least squares estimated *value for z t j . Now, by Theorem 2.3.2, d2 is a consistent estimator of c,, and PI is a consistent estimator of PI. Also E{
t= 1
2:) =
i=
I
[x:
+ (1 + j;)-lcu"] + O(1).
(3.2.43)
Therefore, if we fix X E E= I, assume n
C (x, - x)'
lim
n-rm
i=1
= fix,> 0,
and take the limit as the number of observations, n, becomes large, we have
+ +
p ~ i m [ n P ~ { j ~=)[fix, ] (1 fi:)-l]-leuu. (3.2.44) Equation (3.2.44)illustrates the fact that is a biased estimtator of the variance of the approximate distribution of for two reasons. First, the estimator n(2, is biased for?,, and, second, no estimator of - 03,) appears in prn{Pl}. the term fi;~(cuucu,,
c:-
x)'
vrn{fll} b1
Example 3.2.6. We illustrate the difference between the least squares estimator of the covariance matrix and the estimator (3.2.41), using the
256
EXTENSIONS OF THE SINGLE RELATION MODEL
pheasant data of Example 1.3.1. The 2 values constructed with the estimates of Example 1.3.1 are (9.123,6.522, 11.724, 12.661, 11.531, 11.142, 10.493, 7.651,9.659, 10.161, 10.116,8.142, 11.736, 11.816, 8.222) and s,, = 0.38783. Therefore,
[,:I
Vs2{jl} = C
(2, - X)' -
I-'
s,, = 0.00855,
while the estimated variance calculated in Example 1.3.1 is
P{l,} = 0.00925. In this example the error variance is not large relative to variation in the X(2, is only about 4% larger than (n - l)-'iXx. Also the term (n - 1)-'8~~(cr,,,s,, - 8:,) is small. Example 3.2.8 demonstrates that the difference between V,{j1) and ?{b,} can be large. 00
x)2
x values. As a result,
To construct an alternative estimator of the covariance matrix for the nonlinear model, we proceed by analogy to the linear model. The linear model can be written in the form (3.2.11) with
f(z,; B) = Y , - x,B*
(3.2.45)
In the linear model with known error covariance m+x, the estimator of the covariance matrix of the limiting distribution of /? is given in (2.2.24)as
?{a)
+
= n - l [ M ~ x l ~ , , M~>(C,,,,S,,-
~u,2,u)M~~]. (3.2.46)
By (2.2.22),the estimated variance of 6, is
V { i , } = ZEE- s1,e2,,
(3.2.47)
and we can write
+ fipzV(ir}Pzp]M;:svv, (3.2.48) = Fp,(f,, b) is the k x (k + 1) m = n - lM;:[MXx
where $Pz = F;, with respect to /Iand z, evaluated at (z,, fl) = (2,, /?). In the linear case Po.= (0, I). A consistent estimator of M,. is
= A,, - n-l
where for= fp(it,b) = if,
I =1
PpzrV{it}Fzpr,
(3.2.49)
3.2.
NONLINEAR MODELS WITH NO ERROR IN THE EQUATION
257
and f{2,) is defined in (3.2.47). Therefore, an estimator of the variance of the approximate distribution of for the linear model is
Q{j} = n-lZ;iM,,E;jsoo.
(3.2.50)
Example 3.2.7. We verify formula (3.2.50) using the pheasant data and working with the corrected sum of squares. We have ,A
= (n -
I)-'
(& -
I=I
x)*= 3.241 1,
P{1,} = (/i:6 + 1)-'du, = 0.1 122,
where
/il = 0.7516 and d,,
Crr
= 0.4923. Then = A,
?{ir> = 3.1289,
-
which agrees with the estimator of 6,.given in ExaTple 1.3.1. The estimator of the variance of the approximate distribution of PI is
P{bl} = (14)-'6~,!A,-f6'J~~,~,= 0.00925
?{bl}
which agrees with of Example 1.3.1. In these calculations we omitted from the vector of estimates because the associated x value, being identically one, is measured without error. 00
b,
In the nonlinear case, the matrix
nMrns =
f
r=1
ib,a;:rtDr
(3.2.51)
obtained at the last step of the iteration is analogous to ~M,,S;~'of the linear case and n
nzynl=
nz,&'.
nfi/nf
-
1 @Dztf{ir)kzprdA
I =1
(3.2.52)
is analogous to Therefore, a reasonable estimator of the variance of the approximate distribution of the nonlinear estimator is
f{/Q = n-'2s;lfrAfnfE;,t,, where
is defined in (3.2.51) and
(3.2.53)
ZJn, is defined in (3.2.52).
Example 3.2.8. In this example we continue the analysis of the data of Example 3.2.5, incorporating the modifications to the maximum likelihood procedure developed in this section. The estimated equation obtained by maximum likelihood in Example 3.2.5 is
$, = -0.171
+ 1.241~:
258
EXTENSIONS OF THE SINGLE RELATION MODEL
and the nonlinear least squares estimated covariance matrix obtained at the last iteration is defined by vech{n- ‘My,$} = 10-2(0.3518, -0.5307, 1.7632)’, where 6‘ = 0.0808 is used to construct the estimated covariance matrix and vech is defined in Appendix 4.A. To place the quadratic model in the implicit form, we write
j?) = Y, - Po - P2X: Then F,,(z,, j?) = diag(0, 2P2) and f@r,
= 211 - P o
- P2.L
Minimizing (3.2.31) subject to the restrictions (3.3.32) gives the estimated equation
j , = -0.081
+ 1.133~,.
The vector half of the estimate of variance defined in (3.2.53) is vech v { j } = 10-2(0.4180, -0.8010, 2.6020)‘. The adjustment in the estimation procedure reduced the estimated bias in the estimated parameters by about one-half. The adjustment uses a local approximation and, as such, will not remove all the bias in situations where the variance is large relative to the curvature. The modified estimate of variance for the quadratic coefficient is nearly twice that of the least squares no estimator. REFERENCES Arnemiya (1982), Amemiya and Fuller ( 1 983), Britt and Luecke (1973), Chan (1963, Clutton-Brock (1967), Dolby and Lipton (1972), Hoschel (1978a, 1978b), Reilly and Patino-Leal (1981), Schnell (1983), Stefanski and Carroll (1985b), Villegas (1969), Wolter and Fuller (1982b).
EXERCISES 8. (Sections 3.1.5, 3.2) The data in the table are 10 observations on earthquake depth (Y,) and distance from the Tonga trench (XI)taken from Sykes, Isacks, and Oliver (1969). Assume that these data satisfy
T = Po + P,x:
+ el,
where el, u,, and x, are mutually independent.
XI = x, + u,,
3.2.
NONLINEAR MODELS WITH NO ERROR IN THE EQUATION
r,
Observation
57 1 199 572 553 506 247
1
2 3 4 5 6 7 8 9 10
0
50 622 58
259
XI
394 175 370 412 341 232 46 42 432 5
(a) Assume that o,, = 100 and use the method of Section 3.1.5 and Example 3.1.5 to estimate the parameters of the model. Estimate the variance of the approximate distribution of your estimators. (b) Assume that u,, = u,, = 100. Construct*estimatof (3.1.14). What do you conclude about the adequacy of the model? Using 100 1,where 1 is defined in (3.1.15), as a revised estimate of u,, = u,,, calculate estimates of the parameters. Calculate an estimate of the covafiance matrix of the approximate distribution of your estimator setting ouu= u,, = 100 1. Wolter and Fuller (1982a) have shown that this estimator of variance is satisfactory for the model in which the covariance matrix is known up to a multiple. 9. (Section 3.2) Use the general nonlinear procedure of Section 3.2.3 to estimate the parameters of the model of Example 3.2.1. Estimate the true values and the variances of the estimated true values. Plot c?;:,‘’ i?, against 9,. How does Ci,, vary from observation to observation? 10. (Section 3.2) (a) Use the nonlinear least squares method of Example 3.2.2 to fit the linear model y , = Po
+ Pixf, (e,, u,)’
(Y, XI) = h,x,) + (el, u,),
-
NI(O,O.251)
to the data of Example 1.3.2. Compare the nonlinear least squares estimated variances for the estimated x values to the estimated variances given in Example 1.3.2. Using the data and error specification of Example 1.3.2, estimate the parameters of the model
Y , = a.
+ a l exp(a,x,),
(Y, X , ) = (Y,, x,) + (e,, u,)
bv nonlinear least sauares. Are the data consistent with this model?
11. (Section 3.2) Estimate the parameters of the model of Example 3.2.2 assuming that there is zero measurement error in X for X = 0 and X = 100. Estimate the covariance matrix of
your estimators using the methods of Example 3.2.2. Note that the degrees of freedofn fqr t)e residual mean square is six and that, correspondingly, the estimated variances of (Po, pi, P2) are larger than those in Example 3.2.2. The estimated variances of Example 3.2.2 are biased if the error in X at X = 0 and X = 100 is zero. 12. (Section 3.2) Using the data of Example 3.2.2 estimate the parameters of the model Yl = Po
-
+ Pi exp{P*xJ + P3CexPIB2x,~l’
under the assumption (e,, u,)’ NI[O, o2diag(1.44, I)]. Estimate the covariance matrix of the approximate distribution of your estimators.
260
EXTENSIONS OF THE SINGLE RELATION MODEL
13. (Section 3.2) A purpose of the experiment described in Examples 3.2.2 and 3.2.3 was to develop a method of estimating the gas saturation of Berea sandstone. Assume that the experiment is conducted on a sample of unknown saturation and that ( V , , , X 2 ) = (1225.0,0.12) is obtained. Add these observations to the data of Example 3.2.3 to obtain improved estimates of the parameters and an estimate of the gas saturation for the new sample. 14. (Section 3.2) Assume that the model .x:
-
+ Y:
= y2.
(Y,, XI) = (Y,, x,)
+ (e,,u,)
holds, where (el, u,)’ NI(0, I d ) and y 2 is a parameter. Let (x,, y,) be uniformly distributed on the circle x z yz = 9. Assume that on the basis of 1000 observations generated by such a model, the parameter y is estimated by the maximum likelihood methods of Section 3.2, wherein the x, are treated as fixed. What would be the approximate value of the estimator
+
of y?
15. (Section 3.2) In Example 3.2.5 the parameters of the quadratic function were estimated by least squares. Given an observation (7, A’,), show that the x value of the point on the curve that is the minimum distance from (7,A’,)is the solution of a cubic equation. Give the equation. 16. (Section 3.2) An expression for the variance of the nonlinear estimator of b was given in Theorem 3.2.1. Show that this expression is the variance expression one obtains as the inverse of the information matrix associated with the model of Theorem 3.2.1. 17. (Section 3.2) Is the covariance matrix for r”2(il - 2): o! Theorem 3.2.1, denoted by n,,,,, singular? Is the (k + p ) x (k + p ) covariance matrix-of r ” 2 [ ( B - Po)’; (i,- 231 singular? 18. (Section 3.2) Compare expression (2.3.32) for V { i , ) with expression (3.2.25) for L,,. 19. (Section 3.2) The data below are for 27 stag-beetles. The data have been analyzed by Turner (1978) and Griffiths and Sandland (1982). The model postulated for the true values is
Assume (el. u,)’
-
a,y,
+ 8 , logy, + a 2 x , + j2log x, - I = 0.
NI(0, I d ) .
Stag-beetles (Cycltmtnatrrsfurandus) ~~
Observation 1
2 3 4 5 6 7 8 9 10 11 12 13 14
Mandibular Length (V,)
Body Length
(A’,)
Observation
3.88 5.3 I 6.33 7.32 8.17 9.73 10.71 11.49 12.08 12.73 14.11 14.70 15.84 17.39
16.50 18.70 20.05 20.44 21.48 22.47 22.40 23.52 24.05 24.59 24.33 24.56 25.50 25.83
15 16 17 18 19 20 21 22 23 24 25 26 27
~
Mandibular Length
Body Length
18.83 19.19 19.92 20.79 21.53 22.54 23.25 23.96 25.38 28.49 30.69
26.68 27.13 27.36 27.61 28.5 I 28.96 29.25 30.27 30.63 33.37 35.37 37.00 39.50
( u,)
32.00 34.50
(XI)
3.3.
THE NONLINEAR MODEL WITH AN ERROR IN THE EQUATION
26 1
(a) Estimate ( a l ,PI, a,, p,, a’) under the assumption that (Y,,
X,)= (y,, x,) + ( e f t 3.
(b) Estimate ( a l , p,, a z , pz,a’) under the assumption that
(Y:/’, x,“’.) = (y,“’, x:”)
(c) Estimate (a,, PI, a,,
+ (e,, u,)
PI,u2) under the assumption that (log U,, log X,)= (log Y,. log x,) + (e,, u3.
(d) Plot the standardized residuals against P,, P,’/’, and log P, for fits (a), (b), and (c), respectively. Choose a model for the data.
3.3. THE NONLINEAR MODEL WITH AN ERROR IN THE EQUATION This section is devoted to the nonlinear measurement error model containing an error in the equation. The model is (3.3.1) z, = 2, + a,, Y, = Q(X,, B) + 41,
-
LJ],
(q,, a,)’ NI[O, block diag(a,,, where 2, = ( y , X,),a, = (w,, u,) is the vector ofmeasurement errors, and (q,,a,) is independent of x,. It is assumed that Eaa is nonsingular and known and )continuous that the form of g(. , .) is known. It is also assumed that g(x, /Iis
and possesses continuous first and second derivatives with respect to both arguments for x E A and /? E B, where A and B are subsets of p-dimensional Euclidean space and k-dimensional Euclidean space, respectively. We shall simplify our discussion by assuming C,, = 0. We first study the structural model where the x, are a random sample from a distribution with a density and finite fourth moments. If we know the form of the x distribution, we might apply the method of maximum likelihood to estimate p, aqq, and the parameters of the x distribution. For most distributions this is not a simple computation. Therefore, we first consider the situation in which the parameters of the x distribution are known. 3.3.1. The Structural Model If we know the form and the parameters of the x distribution and of the a distribution, we can evaluate the conditional distribution of x given X. In particular, when x, and a, are normally distributed, the conditional distribution of x, given X, is normal and the conditional expectation and variance of x, given X, are E{xt I X,)= B x + (X,- P x ) ( L + L) - lxxx> (3.3.2) qxll
x,>=
L
X
-L ~ i ; L
262
EXTENSIONS OF THE SINGLE RELATION MODEL
respectively. Using the conditional distribution of x, given X,, we can evaluate the conditional moments of y, given X,.We have ElY 1x1 = Jg(x, B)dK,x(xI X),
V{YlX) = cqq+ J[cr(x,
(3.3.3)
8) - ~ { Y ~ x } ] z d ~ x , , ( x l x ) ,
where F,,,(x I X) is the conditional distribution of x given X. The conditional moments of Y; given X, are
E{YIXI= E{Y(Xj, VIYIXI
=
V{Y(X) + G W W
The integrals (3.3.3) will often be difficult to evaluate. If the elements of Z;$Zuu are not large, we may approximate the function g(x,fl) with the second-order Taylor expansion, S(X9 B) = g(W
7
B) + g
m , B)(x - wt' + 9(x - W)g,xlW, B)(x - WY,
where W, = W(X,) = E{x,IX,}, the row vector g,(W, fl) is the first derivative of g(x, B) with respect to x evaluated at x = W , and g,,(W, 8) is the matrix of second derivatives of y(x, B) with respect to x evaluated at x = W . Using this approximation for g(x, B), we can approximate the integrals of (3.3.3) by
m,
E{Y I XI = B) + (0.5) tr[g,,(W, B)V{X IXH, V Y XI = c g m , nlvb IX",(W, nl' + bqq.
We can write
I
(3.3.4) (3.3.5)
(3.3.6) Y; = NWr, B) + I , , where h(W,, /Iis )the conditional expected value given in (3.3.4) and the approximate variance of I; given X, is
+
Grrrt
=~
+ [gx(W,, B)]v{~tlX,}[gx(W,,PI]',
e e
with e, = q, w,. One can estimate fl by applying nonlinear least squares to expression (3.3.6). Ordinary nonlinear least squares will not be efficient for B because the variance of I , is not constant. Therefore, it is necessary to iterate nonlinear estimation and variance estimation in the calculations. The first step is the application of ordinary nonlinear least squares to (3.3.6). Using the ordinary nonlinear least squares estimator of /I, denoted by j,and the residual mean square S& = ( n
-Iq-1
c $, n
t= 1
(3.3.7)
3.3.
263
THE NONLINEAR MODEL WITH AN ERROR IN THE EQUATION
where F, = with
I: - g(W,, g) - (0.5) tr[gx,(W,, ~ ) V { x , ( X r } ]one , can estimate oee dee
= stt
- n-1
,=
1
gx(w,, j)v{xrlxr}g~(wr,ii).
(3.3.8)
A weighted estimator of o,, is zee
=
(
r= 1
8iri)-
r=1
[T:
-
gx(wr, ii)v{xr 1 xr}g:(wi,
a)],
(3.3.9)
where 8rrii
= Bee
+ gx(Wr, Is)V(xi(XtjgXWr,S)
and Gee and d,, are restricted to be nonnegative. Given preliminary estimates of /3 and g e e ,denoted by and Gee, a weighted nonlinear least squares estimator of /3 is the /3 that minimizes
a
n
C r= 1
zit:{
I: - g(Wr, P ) - 0.5 tr[gxx(Wr, /3)V{~tIXi}l}~,
(3.3.10)
where grrit
= Gee
+ gx(Wr, hV{xrlXr}gWr~ii).
The procedure could be iterated with a new estimator of orrrl, but this will generally produce little change in the estimate of p. If the parameters of the conditional distribution of x, given X, are known, the variance of the approximate distribution of the estimator of that minimizes (3.3.10) is given by the usual nonlinear least squares formulas. If the parameters of the x distribution are not known, it is necessary to estimate them to construct the conditional expected value. Estimation of the parameters of the x distribution introduces additional terms into the variance of the approximate distribution of b. While such terms can be estimated, they are not always simple and may depend heavily on the form of the x distribution. 3.3.2. General Explanatory Variables
In Section 3.3.1, (x,, X,) was a random vector and approximations based on a Taylor expansion of g(x,, /3) about g(W,, /3), where W, = E{x,lX,}, were employed. In this section we relax the assumption about the distribution of x, and outline an estimation procedure that relies on local quadratic approximations to the nonlinear function, where the expansions are about x,. As such, the procedure is appropriate for fixed or random x,. Because of the local nature of the approximations, the measurement error should not be overly large relative to the curvature of the function.
264
EXTENSIONS OF THE SINGLE RELATION MODEL
By expanding the function g(X,, /I) about x, we can write
I; = B(X,, B) + e, = 9(X,, B) + B, + or,
(3.3.1 1)
where
4 = E { - o w 4 - Xt)gxx(xf,B)(X, - X l Y L v, = I:- d X , , B) - B,
(3.3.12)
(3-3.13) tr{(u;u, - L)gXx(Xt, B)}, x, is on the line segment joining x, and X,, and the approximation of (3.3.13) arises from the approximation of the expectation in (3.3.12). Equation (3.3.1 1) is a form of the model studied in Section 2.2 with g(X,, /?) replacing X,/l and Y, - B, replacing Y,. In Exercise 2.9 of Section 2.2 it is demonstrated that the estimator of /3 for the model (3.3.1) with g(x,, /I) = x,B can be defined as the j that minimizes the estimator of nee.Therefore, we choose as our initial estimator of B, the /lthat minimizes = e, - gx(Xt,B)ul - 0.5
n
1 [X - dxt, B) -
t= 1
where
Br(%p
PI]'
a%, B) = -0.5
n
-
1 gx($r,
1=1
BPuu&(%?
Bh
(3.3.14)
n)
tr{~"ugxx(%,
and it is an estimator of x,. One choice for k, is the estimated conditional mean constructed as if x, were normally distributed,
t, = X + (X, - X)rnj&n,,
- Euu).
(3.3.15)
Note that g(X,, /?), B&, /Iand ),gx(t,,/lare ) all functions of /Iin (3.3.14). The quantity being minimized in (3.3.14) is an estimator of no,,, where o,, = bqq ow, of model (3.3.1). Given a trial value for /3, denoted by j?, we can approximate expression (3.3.11) by
+
s, + m r , s, + g,(X,, j)(B- s, + (3.3.16) where g,(X,, B', is the row vector of derivatives of /I) with respect to fl evaluated at (x, /3) (X,, s). It is worthwhile to compare the approximation used in (3.3.10) with that Y, = dX,,
4,
g(x,
=
of (3.3.16). The random model of Section 3.3.1 was used to generate the approximate expectation of Y, given X,. In that case we expanded the function about W,,where W,is the conditional expected value of x, given XI, to obtain
E { Y, Ixt,
A
g w , , B) + 0.5 tr[gxx(Wt, B)V{X,I XJl.
3.3.
265
THE NONLINEAR MODEL WITH AN ERROR IN THE EQUATION
In the resulting minimization the X, and, hence, the W, are treated as fixed. In the approximation (3.3.16) the expansion is about the point x,, which is considered fixed in the approximation. Because we can observe X,, an approximate expression for g(x,, /3) as a function of X, is desired, but X, is treated as random in the minimization. Because the expansions used in (3.3.10) and (3.3.16) are about different points, the estimated bias enters the two expressions with different signs. Using (3.3.16) and a Gauss-Newton algorithm on (3.3.14), an improved is defined by estimator of /I, denoted by j(l,, -1
a)
B)
where E, = I: - g(X,, - B,(K,, g") and g&, is the matrix of the second derivatives with respect to x and 3/ evaluated at (xf, /?) = (%,, Expression (3.3.17) is similar to the expression for the estimator of fl defined in (2.2.12). The second term within the first set of curly brackets can be viewed as an estimator of the bias in g;P(X,,$)gp(Xt,$) as an estimator of gb(x,, d)gp(x,,S). The second term within the second set of curly brackets on the right side of (3.3.17) can be viewed as the negative of the estimated covariance between u, and gb(X,,/I) The . procedure associated with (3.3.14) and (3.3.17) can be iterated to obtain the /? minimizing (3.3.14). An estimator of o,, is given by the minimum of (3.3.14) divided by n - k. Improved estimators of the x, can be constructed as -'-I'
9, = x,- ~ , ~ u u f t ~ u u t t ~
8).
(3.3.18)
where
and d,, is the estimator of o,, defined by the minimum of (3.3.14). Having obtained estimators from the unweighted analysis, we can then construct a weighted estimator of /I by minimizing the quantity
where d,,,, is the b,,,, defined in (3.3.18) with 9, replacing 2,. The theoretical justification for the estimator obtained by minimizing (3.3.19) requires that
266
EXTENSIONS OF THE SINGLE RELATION MODEL
the error variances become small in the limit because d,,,, is a function of 2,. Under the assumption of normal errors, a reasonable estimator of the variance of the approximate distribution of the estimator defined by (3.3.19) is n-
(3.3.20)
lM;,iGM;ni,
where Mlgng
= e-1
t= 1
gi:r[gb<xt, B)gp(xt, B) - gps(at,
B)xuugxp(grt
B)I,
and ) is the final estimator of /3. For nonnormal errors, an estimator of G analogous to that given in (3.1.12) is (3.3.21)
Example3.3.1. We apply the procedures of this section to the data of Table 3.A.5. The data were generated using the model
-
Yt = P o
+ P I 4 1 + PZx:,,
(k;?XrJ = (Yr, Xtl) + (er, 4 , )
and (et,utJ NI[O, diag(a,,, oUu)].We analyze the data under the assumption that auu= 0.09 is known and oeeis unknown. The estimator (3.1.6) computed with SUPER CARP is
[$,b1,b2J = [-0.005, 0.540, 1.0071, (0.108,0.098,0.230)
where #?, = (0,0.3, O.57666Xt), X, = (1, Xrl, X:, - 0.09), gUurt = #il#rl, and fi, = 1. The standard errors are computed from the estimated covariance matrix (3.1.12). The foundation for the # t l vector is given in Example 3.1.5. The computations for the quadratic model are relatively simple because the model is linear in the p parameters. Also, the second derivative, gxx(x,,/I),
3.3.
267
THE NONLINEAR MODEL WITH AN ERROR IN THE EQUATION
entering the bias expression, B,(x,, /?), is a function of gw,,
B) + m
t
,
P2 only. Hence,
B) = P o + P1Xrr + P2(x:1 - %u)
(Po, PI, P,)
and it follows that the estimates (3.3.14). Estimators of the xtl were constructed as
defined by (3.1.6) minimize
4 1 = x,1 - ~ , U t P " " t A , *-I*
where
(2,,,fj2,,,,) = [see+ O.O9(8i +
itI= XI + O.8489(xr1- XI),
ijt = k; -
-o.o9(p1
+ 282%1)],
- plirl - p2(X:1
- 0.09),
and m ~ ~ ( r n X x0.09) = 0.8489. The estimator of o,, obtained by evaluating p2) is a",, = 0.090. These estimates were used to construct (3.3.14) at (Po,
PI,
d,,,, = Cee
+ 0.09(f11+ 2/i2x*f1)2.
The values of f f l ,ij,, and d,,,, for selected observations are given in Table 3.3.1. Second round estimates of (Po, PI, P2) were obtained by minimizing r=1
G,XK - PO - P I X t l - P z ( X , ~-, O.09)l2- [(Po. PI, P 2 ) @ l 1 l 2 l .
TABLE 3.3.1. Selected observations and statistics for quadratic example Observation 1
2
I; 0.70
0.05
X ,1
6,
4 1
~ " " f f
- 1.44 -0.90
- 0.52 -0.18
- 1.23
- 1.14
-0.30 0.07
-0.81 - 1.23 - 1.00 - 1.39
0.43 0.20 0.43 0.29 0.55
- 0.05 0.31
0.06
0.15 0.07 - 0.90 - 0.84
- 0.03 0.38 - 0.47 0.25 0.18
0.11 0.25 0.10 0.19 0.16
0.28 0.1 1 0.43 0.40 0.87
0.75 1.15 1.01 1.21 1.16
0.47
3 4 5
0.99
- 0.8 1
1.10
- 1.36
61 62 63
- 0.06
64
65 111 112 113 114 115
0.30
0.32 - 0.06 - 0.22 - 0.30 0.92
1.86
1S O 2.04 1.85
-0.45
0.65 0.57
0.63
1.11 0.84 1.07 0.80
0.86
0.83 0.69
0.89 0.83
268
EXTENSIONS OF THE SINGLE RELATION MODEL
This expression differs from (3.3.19) only in that a function of X,,is used in the second term in place of a function of The second round estimates obtained by using weighted observations in SUPER CARP are
[bo,bl,b,] = [0.012,0.527, 0.9671, (0.070,0.101,0.194)
where the estimated standard errors are obtained from th? estimated covariance matrix (3.1.12). The estimated standard error for P1j s essenltially the same as that for but the estimated standard errors for Po and p2 are smaller than t h y e of Po and fi,. The estimated variance of the approximate distribution of Po is about one-half that of For the approximations to be adequate, the error variances should not be too large. Limited Monte Carlo simulation indicates that the approximations hold reasonably well for data similar to that of this example.
&,
Po.
00
Example 3.3.2. In this example we fit a moisture response model to some data adapted from the experiment discussed in Example 2.3.2. The yield and soil moisture data are given in the second and third columns of Table 3.3.2. We will treat the data as averages for a pair of plots. The data are not the original yields, but yields adjusted for weather conditions so that the deviations from the fitted function are smaller than actually observed. It is assumed that yield is given by
T = P I C 1 + exp(P2 + P3.,1l-
+ e,,
where X,= x, + u,, X, is observed soil moisture, and e, and u, are independent. The variance of u, is assumed to be known and equal to 0.3313. The first step in our estimation procedure is the fitting of the model by nonlinear least squares using a prediction of the true x, as the explanatory variable. The estimated linear predictor of x, given X,is
i f= 4.612 - (6.7736)-'(6.7736 - 0.3313)(X, - 4.612), where R = 4.612 and mxx = 6.7736. The nonlinear least squares estimates using 3, as the explanatory variable are given in the third column of Table 3.3.3 and the nonlinear least squares estimates constructed with X,as the explanatory variable are given in the second column. An estimate of the bias in the function that is due to the curvature is B&,
j)= -0.S(O.3313)gx,(~,, j),
where j is the vector of estimates in the third column of Table 3.3.3. The fourth column of Table 3.3.3 gives the estimates obtained by nonlinear least
TABLE 3.3.2. Soil moisture and corn yield statistics
Observation
Yield
Soil Moisture
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
2.1 2.0 3.1 4.4 6.9 5.2 3.6 4.8 6.2 4.7 5.3 8.4 11.2 6.2 6.0 8.7 12.4 9.8 11.2 10.8 10.9 13.5 10.7 13.9 11.6
0.6 1.3 1.8 2.0 2.2 2.5 2.9 3.0 3.4 3.6 3.7 4.0 4.1 4.1 4.1 4.1 5.8 5.9 6.1 6.5 7.0 7.9 8.7 9.8 10.2
B,(f,,
b)
- 0.09 -0.10 - 0.09 - 0.08 - 0.04 - 0.06 - 0.07 -0.04 - 0.00 - 0.03 -0.01 0.06 0.09 0.03 0.02 0.07 0.08 0.09 0.08 0.08 0.06 0.03 0.02 0.01 0.01
as(%
2,
ax
0.72 1.23 1.81 2.24 2.95 2.70 2.50 2.84 3.36 3.09 3.28 4.17 4.90 3.68 3.64 4.29 6.15 5.69 6.13 6.40 6.89 8.02 8.61 9.84 10.18
h
d"-":,'%,
1.03 1.31 1.65 1.88 2.18 2.08 2.00 2.12 2.19 2.17 2.19 2.04 1.68 2.17 2.18 1.99 0.97 1.22 0.98 0.85 0.65 0.32 0.22 0.10 0.08
0.45 -0.21 0.03 0.63 1.92 0.49 - 0.95 -0.38 - 0.09 - 1.23 - 1.01 0.4 1 1.96 - 1.02 - 1.14 0.47 1.22 -0.77 0.12 - 0.46 - 0.67 1.14 - 1.36 1.20 - 0.76
Bias Adjustment GLS(?)
Estimate
TABLE 3.3.3. Estimates for the moisture model
Estimate and (Standard Error) Parameter
Bias Adjustment LS(?)
(3.3.19)
LS(X)
LS(?)
12.860 (0.980)
12.859 (0.980)
12.846 (0.921)
12.741 (0.726)
12.606 (0.837)
Pz
2.138 (0.402)
2.285 (0.432)
2.426 (0.456)
2.455 (0.445)
2.363 (0.421)
B3
- 0.6 17 (0.140)
- 0.649 (0.147)
- 0.689 (0.152)
- 0.708 (0.144)
- 0.694 (0.152)
2.36
2.36
2.36
0.99
1.03
-
1.40
-
1.41
P I
RMS a&?,
~
269
270
EXTENSIONS OF THE SINGLE RELATION MODEL
b)
squares using ?, as the explanatory variable and Y, - B,(?,, as the dependent variable. The estimate of aeecalculated from the deviations associated with the fit of column four is d,, = (n - 3)-'
This estimate of
geewas
I =1
?; - n-'
,=1
0.3313gi(if,
s)
= 1.40.
used to construct the estimates
d,,,, = gee+ 0.3313&,
S,
of the variances of the u,. The weighted nonlinear least squares estimate of
/I constructed with ~ u v , las weight, I: - B,(i,, )) as dependent variable, and i ,as explanatory variable is given in the fifth column of Table 3.3.3.
For columns two to five of Table 3.3.3 the estimated standard errors in the parentheses are the standard errors output by a conventional nonlinear least squares program. All are biased estimates because they are constructed under the assumption that the explanatory variable is measured without error. The estimated conditional expectation k, contains an estimation error because some parameters of the distribution are estimated, and this estimation error is not reflected in the estimated standard errors. The estimated standard errors of columns two to four are also biased because the u, do not have constant variance. The smaller estimated standard error for the estimate of fll of the fifth column, relative to that of the fourth column, is due to the fact that the variance of u, is smaller for large x, and the large x, are influential observations for the estimation of the limiting yield. The last column of Table 3.3.3 contains the estimates obtained by approximately minimizing (3.3.19). Only two steps of the interative procedure were calculated starting with the estimates of column five. The estimator o",, was where /$) is the estimate used to weight the observations. Also B,(& from the ith step, was used in the (i + 1)st step of the iteration, rather than computing the bias estimate as a function of the current estimate of /I. The estimated bias is given in Table 3.3.2. The estimates of fi in the fifth and sixth columns of Table 3.3.3 are similar for this example. The estimated standard errors for the final column were computed with the of (3.3.21). The standardized residuals ~?;,~'~i?, are given in the last column of Table 3.3.2. No obvious anomalies are present in the plot of these residuals against 2,.
fi(iJ,
no
REFERENCES Section 3.3.1. Armstrong (1985),Stefanski (1989,Stefanski and Carroll (1985b).
3.4.
MEASUREMENT ERROR CORRELATED WITH TRUE VALUE
271
EXERCISES
+
20. (Section 3.3.1) Let Y, = x,/? e l , where el is independent of (XI, XI)and XIis observed. Let conditional on XI?What is the conditional variance of given X I , if the el are Ind(0, uee)random variables? 21. (Section 3.3.1) Assume that for the model and data for Example 1.2.1 it is known that
W,= W(X,) = E{x,lX,}.What is the expected value of
x
E{xlX}
=
x
70.64 + 0.813(X
-
70.64).
Using this information, estimate (Po, 8 , ) and estimate the covariance matrix of the estimator. If only u;ju,, = 0.813 were known, how would your estimates and estimated covariance matrix change? Compare your estimated covariance matrices to that of Example 1.2.1. 22. (Section 3.3.2) Using the last two columns of Table 3.3.2 of Example 3.3.2, estimate uee. Construct a test of the hypothesis that the variance of e, is constant against the alternative that the variance is a linear function of xI.
3.4. MEASUREMENT ERROR CORRELATED WITH TRUE VALUE

3.4.1. Introduction and Estimators

In certain situations the expected value of the measurement error may be a function of the true values of the variable. For example, consider the binomial random variable x that takes the values zero and one. The two possible values for the measurement error are one and zero when the true value is zero, and negative one and zero when the true value is one. Therefore, the expected value of nontrivial measurement error is a function of the true value. Typically, in situations where the expected value of the measurement error is a function of the true value, the variance of the measurement error is also related to the true value of the variable. If the functional relationship between the mean of the error variable and the mean of the true variable is known, the model can be transformed into the model of Section 3.1.1.
Assume that the vector of true values z_t satisfies the linear model

z_t α = q_t,   A_t = z_t + ε_t,   (3.4.1)
where α′ = (1, −β′), q_t is the error in the equation, A_t is the observed vector, and ε_t is the measurement error. Let

E{ε_t | z_t} = z_t C,   (3.4.2)

where C is known. It follows from (3.4.2) that E{A_t | z_t} = z_t(I + C). If we let

Z_t = A_t(I + C)^{-1}   and   e_t = A_t(I + C)^{-1} − z_t,   (3.4.3)

we have Z_t = z_t + e_t, where E{e_t | z_t} = 0. Therefore, the vectors Z_t, z_t, and e_t have the properties that have been associated with these symbols in earlier
sections. If estimators of the error covariance matrices, Σ_ee,tt, t = 1, 2, . . . , n, are available, the theory of Section 3.1.1 is applicable to the transformed variables of (3.4.3). An important example of measurement error whose distribution is related to the true values is the multinomial observed subject to error.

3.4.2. Measurement Error Models for Multinomial Random Variables
We consider measurement error models for populations where the observation process consists of assigning each member of a sample of n elements to one, and only one, of r categories. We adopt the convention of writing the observation as an r vector. If the tth sample element is placed in the first category of the A classification, we write

A_t = (1, 0, 0, . . . , 0);   (3.4.4)

if the tth element is placed in the second category, we write A_t = (0, 1, 0, . . . , 0), and so on.
The jth element of the A_t vector, denoted by A_tj, is a binomial random variable. There are alternative ways in which the measurement error process can be formalized for such populations. The latent structure model assumes there exists a population of responses for each element of the population. The mean of the responses for the A classification for the tth individual is denoted by z_At, where

z_At = E{A_t | t}   (3.4.5)

and the symbolism means that the average is for the population of possible responses for the tth individual. The jth element of the vector z_At, denoted by z_Atj, is the probability that the tth individual is placed in category j. The observation for the tth individual is

A_t = z_At + ε_At,   (3.4.6)

where ε_At is the measurement error. By construction, the mean of the error vector for the tth individual is the zero vector. The covariance matrix of the measurement error for the tth individual is

E{ε′_At ε_At} = Σ_AA,tt,   (3.4.7)

where Σ_AA,tt = diag z_At − z′_At z_At is the covariance matrix for the multinomial distribution with probability vector z_At and diag z_At is the diagonal matrix with the elements of z_At on the diagonal. The latent structure model was developed by Lazarsfeld (1950, 1954) and is discussed by Andersen (1982). The latent structure model is sometimes called the average-in-repeated-trials
model. When the individuals fall into a fixed number of distinct classes, with a common vector of response probabilities for the individuals in class j, the model is called the latent class model. The latent class model has been used when there are responses to a number of different items. In the latent class model the number of distinct classes is often much smaller than the number of possible item response combinations.
We call the second model for classification error the right-wrong model. The right-wrong model assumes that every element truly belongs to one of the r categories. The response error for the population is characterized by a set of response probabilities κ_Aij, where κ_Aij is the probability that an element whose true A category is j responds as category i. Because the categories are mutually exclusive and exhaustive, the sum of the κ_Aij is one for each column. That is, every element in true category j is placed in one of the available categories.
There are two submodels of the right-wrong model. In the first submodel every element in true category j has the same vector of response probabilities. This submodel reduces formally to the latent class model in which there are exactly r types of individuals. Only the set of parameters used to describe the two models differ. In the second submodel of the right-wrong model, different elements are permitted to have different response probabilities. For example, some elements might always be reported correctly under this model. The second submodel of the right-wrong model can also be parameterized as a latent structure model.
Under both submodels of the right-wrong model, the observed distribution of A_t obtained by making a single determination on each element of a random sample of elements is multinomial. Likewise, the distribution, over the population of elements, for the multinomial of dimension r² obtained by making two determinations on each element of the population is the same for the two submodels. More than two determinations per element are required to discriminate between the two submodels. The mean vector for the observed proportions is
μ′_A = κ_A π′_A,   (3.4.8)

where π_A = (π_A1, π_A2, . . . , π_Ar) is the vector of proportions for the true classification and κ_Aij is the ijth element of the matrix κ_A. The vector μ_A is the mean over the population of elements of the observed vector A_t. If the population mean of the observation A_t is equal to the vector of true proportions,

μ′_A = π′_A = κ_A π′_A,   (3.4.9)

then the measurement error is said to be unbiased. Unbiased measurement error for the multinomial is analogous to zero mean measurement error
for continuous random variables. Hence, it is an important model for applications.
For the binomial population, the matrix of response probabilities for the simple right-wrong model contains four elements. The elements of each column must sum to one and this imposes two restrictions on possible entries for the matrix. If the measurement error is unbiased, a third restriction is imposed on the elements of the matrix. It then follows that the elements of the matrix can be expressed as functions of the population proportions and one additional parameter. Under the unbiased measurement error model we can write

κ_Aij = 1 − α + α π_Ai,   i = j,
      = α π_Ai,           j ≠ i,   (3.4.10)

where π_Ai is the fraction of the population that is truly in class i and α is the parameter of the response error distribution. The two-by-two matrix of response probabilities is

κ_A = [ 1 − α + α π_A1     α π_A1
        α π_A2             1 − α + α π_A2 ].   (3.4.11)
Using (3.4.10), the expectation of the observed proportion in class one is

μ_A1 = (1 − α + α π_A1) π_A1 + α π_A1 π_A2 = π_A1,
which demonstrates that the response error is unbiased under the specified model.
One can view the postulated structure in the following way. There is a two-level response mechanism. At the first level, the probability is 1 − α that the correct response is obtained and the probability is α that one proceeds to the second level. At the second level the response is given with probabilities proportional to the population proportions.
These error models can be applied to the estimation of the entries in a two-way table. Assume that individuals are classified with respect to characteristic A and with respect to characteristic B, where characteristic A has r classes and characteristic B has p classes. Assume that response error in the two classifications is independent. Using the right-wrong model it is desired to estimate the true proportions in the cells of the two-way table for the AB classification. Under the model, the expected fraction in the ijth cell of the observed AB table is

μ_AB,ij = Σ_{k=1}^{r} Σ_{s=1}^{p} κ_Aik κ_Bjs π_AB,ks,   (3.4.12)

where κ_Aik is the probability that an element in true category k of characteristic A is observed in category i, κ_Bjs is the probability that an element in true category s of characteristic B is observed in category j, and π_AB,ks is the frac-
tion of the population that is in cell ks of the AB table. The set of equations (3.4.12) can be written in matrix format as

μ_AB = κ_A π_AB κ′_B,   (3.4.13)

where μ_AB is the r × p matrix of expected values of the observed proportions and π_AB is the r × p matrix of population proportions for the true values. One can also write Equation (3.4.13) as

vec μ_AB = (κ_B ⊗ κ_A) vec π_AB,   (3.4.14)

where vec μ_AB is the column vector obtained by listing the columns of μ_AB one below the other, and κ_B ⊗ κ_A is the Kronecker product of κ_B and κ_A. The vec notation and Kronecker products are discussed in Appendix 4.A. One can solve (3.4.14) for π_AB to obtain

vec π_AB = (κ_B^{-1} ⊗ κ_A^{-1}) vec μ_AB   (3.4.15)

or

π_AB = κ_A^{-1} μ_AB κ_B^{-1}′.

Let p̂_AB be the observed two-way table. Then, if κ_A and κ_B are known, an estimator of π_AB is

π̂_AB = κ_A^{-1} p̂_AB κ_B^{-1}′.   (3.4.16)
If the original observations were obtained by multinomial sampling,

V̂{vec p̂_AB} = n^{-1}[diag{vec p̂_AB} − (vec p̂_AB)(vec p̂_AB)′],

where n is the sample size. It follows that an estimator of the covariance matrix of vec π̂_AB is

V̂{vec π̂_AB} = (κ_B^{-1} ⊗ κ_A^{-1}) V̂{vec p̂_AB} (κ_B^{-1} ⊗ κ_A^{-1})′,   (3.4.17)

where κ_A and κ_B are assumed known.
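Computationally, (3.4.16) and (3.4.17) are a pair of matrix operations. The following sketch, in Python with NumPy, assumes κ_A and κ_B are known and nonsingular; the function name and argument layout are illustrative.

import numpy as np

def estimate_true_table(p_AB, kappa_A, kappa_B, n):
    # Estimator (3.4.16) and covariance estimator (3.4.17) for known kappa_A, kappa_B.
    kA_inv = np.linalg.inv(kappa_A)
    kB_inv = np.linalg.inv(kappa_B)
    pi_AB = kA_inv @ p_AB @ kB_inv.T                      # (3.4.16)

    p_vec = p_AB.flatten(order="F")                       # vec lists the columns
    V_p = (np.diag(p_vec) - np.outer(p_vec, p_vec)) / n   # multinomial covariance of vec p_AB
    K = np.kron(kB_inv, kA_inv)                           # kappa_B^{-1} (x) kappa_A^{-1}
    V_pi = K @ V_p @ K.T                                  # (3.4.17)
    return pi_AB, V_pi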
Example 3.4.1. In this example we study a model for measurement error, also called response error, in the reporting of employment status. Bershad (1967) identified the effect of response error on estimates of month-to-month change in employment status. Our model is the response error model for the change in employment status developed by Battese and Fuller (1973) and by Fuller and Chua (1983). We use constructed data that are generally consistent with the data discussed by Bershad. We assume that 40,000 individuals are interviewed in each of two months to obtain the data given in Table 3.4.1. It is assumed that the response errors in the two months are independent and unbiased. It is assumed that the α of model (3.4.10) is α = 0.166. This value of α is consistent with the interview-reinterview studies
TABLE 3.4.1. Constructed data for two interviews on employment

                           Second Month (B)
First Month (A)       Unemployed    Employed     Total
Unemployed              0.0325       0.0275      0.0600
Employed                0.0325       0.9075      0.9400
Total                   0.0650       0.9350      1.0000
of the U.S. Bureau of the Census cited by Bershad. With these assumptions and the data of Table 3.4.1, it is desired to estimate the fraction of persons that were unemployed in both month one and month two.
Under our model, the response probabilities for month A are

(κ_A11, κ_A12) = (1 − α + α π_A1·, α π_A1·),
(κ_A21, κ_A22) = (α π_A2·, 1 − α + α π_A2·),

where, for example, κ_A21 is the probability that a respondent in category one is reported in category two in month A and π_A1· is the fraction of the population in category one in month A. An analogous expression holds for month B. Let p̂_ABij denote the observed fraction in category i for the first month (month A) and in category j for the second month (month B). Let π_ABij denote the corresponding population parameter. The p̂_ABij are given in Table 3.4.1. Under the unbiased response model with α = 0.166, the estimated κ matrix for the first month is obtained from (3.4.11) by setting π_A1 = p̂_A1· and π_A2 = p̂_A2·. Thus,

κ̂_A = [ 0.844   0.010
         0.156   0.990 ]

and the corresponding estimated κ matrix for the second month is

κ̂_B = [ 0.845   0.011
         0.155   0.989 ].

Therefore, the estimated vector of π values is

vec π̂_AB = (κ̂_B^{-1} ⊗ κ̂_A^{-1}) vec p̂_AB = (0.0450, 0.0200, 0.0150, 0.9200)′.
In this example measurement error is very important. The estimated fraction of individuals shifting from unemployed to employed is only slightly more than one-half of the observation subject to measurement error. Because the sample proportions are consistent estimators of their expectations, the estimator of π_AB is consistent. The estimated covariance matrix
of (3.4.17) is not applicable in this example because we have not assumed the entire matrices κ_A and κ_B to be known. We have only assumed the parameter α to be known. To construct an estimator of the covariance matrix of the limiting distribution of the estimator, we express the estimator as a function of the sample proportions. Because the sum of the probabilities is one, the two-by-two tables are determined by three parameters. If we choose r′ = [p̂_A1·, p̂_B1·, p̂_AB11] as our vector of observations and θ = [π_A1·, π_B1·, π_AB11] as our vector of parameters, we can obtain explicit expressions for the π estimates in terms of the p estimates. From (3.4.11) and (3.4.13) we have

π̂_AB11 = (1 − α)^{-2}{p̂_AB11 − p̂_A1· p̂_B1·[1 − (1 − α)²]},   (3.4.18)

π̂_A1· = p̂_A1·, and π̂_B1· = p̂_B1·, where p̂_A1· is the observed unemployment rate in month A. If we assume simple random sampling, the estimated covariance matrix of r of Table 3.4.1 is
V̂{r} = (39,999)^{-1} [ 0.056400   0.028600   0.030550
                        0.028600   0.060775   0.030388
                        0.030550   0.030388   0.031444 ].

Let the vector of π estimates be denoted by θ̂. Using Corollary 1.A.1 and expression (3.4.18), we obtain the estimated covariance matrix of θ̂,

V̂{θ̂} = 10^{-6} [ 1.410   0.715   1.039
                   0.715   1.519   1.032
                   1.039   1.032   1.508 ].
A critical assumption in the model of this example is the assumption that the probability of a correct response depends only on the current status of the individual. For example, it is assumed that the probability an employed person reports correctly is the same whether or not the person was employed last month. The effect of previous employment status could be investigated by conducting a reinterview study in which the previously reported unemployment status is used to classify respondents.
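The point estimates of the example can be reproduced with a few lines of Python/NumPy; the observed table and α = 0.166 are taken from Table 3.4.1 and the text, while the helper name kappa_hat and the array layout are assumptions of this sketch.

import numpy as np

alpha = 0.166
p_AB = np.array([[0.0325, 0.0275],     # observed two-way table (Table 3.4.1), rows = first month
                 [0.0325, 0.9075]])

def kappa_hat(p_margin):
    # Unbiased response-error matrix (3.4.11), with the marginal proportions estimated by p_margin.
    p1, p2 = p_margin
    return np.array([[1 - alpha + alpha * p1, alpha * p1],
                     [alpha * p2, 1 - alpha + alpha * p2]])

kA = kappa_hat(p_AB.sum(axis=1))   # first-month margins (0.06, 0.94)
kB = kappa_hat(p_AB.sum(axis=0))   # second-month margins (0.065, 0.935)

pi_AB = np.linalg.inv(kA) @ p_AB @ np.linalg.inv(kB).T
# pi_AB is approximately [[0.0450, 0.0150], [0.0200, 0.9200]], matching the values in the text.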
Example 3.4.1 illustrates the use of response probabilities to construct improved estimators of two-way tables. Dummy variables based on multinomial data are also used as explanatory variables in regression equations. Let

Y_t = x_t β + e_t,   (3.4.19)

where x_t is an r-dimensional vector with a one in the jth position and zeros elsewhere when element t is in the jth category. Assume that e_t is independent of x_t and independent of the error made in determining x_t. Let A_t, as defined in (3.4.4), denote the observed classification.
Under the right-wrong model, the response error for the multinomial variable A_t, as an estimator of x_t, is correlated with the true value and has a variance that depends on the true value. If the matrix κ_A of response probabilities is known, methods introduced in Section 3.4.1 can be used to transform the observations to make the model conform to that of Section 3.1.1. Let κ_A denote the matrix of response probabilities and assume that a column of the matrix κ_A holds for every member of the population. Then, if we let

X′_t = κ_A^{-1} A′_t,   (3.4.20)

we can write X_t = x_t + u_t, where E{u_t} = 0 and E{X_t | x_t} = x_t for all t. Now

V{A′_t | x_t = ℓ_j} = Ω_jj = diag(κ_A·j) − κ_A·j κ′_A·j,   (3.4.21)

where κ_A·j is the jth column of κ_A, diag(κ_A·j) is the diagonal matrix with the elements of the vector κ_A·j on the diagonal, and ℓ_j is an r-dimensional vector with a one in the jth position and zeros elsewhere. It follows that

Σ_uu,tt = V{u′_t | x_t = ℓ_j} = κ_A^{-1} Ω_jj κ_A^{-1}′   (3.4.22)

and the unconditional variance of u′_t is

Σ_uu = Σ_{j=1}^{r} P{x_t = ℓ_j} κ_A^{-1} Ω_jj κ_A^{-1}′.   (3.4.24)

Assume that we estimate Σ_uu,tt with

Σ̂_uu,tt = [P{A_t = ℓ_j}]^{-1} P{x_t = ℓ_j} κ_A^{-1} Ω_jj κ_A^{-1}′   (3.4.25)

when A_t = ℓ_j. Then an estimator of β is given by expression (3.4.26), where X_t is defined in (3.4.20) and Σ̂_uu,tt is defined in (3.4.25). If κ_A and the population probabilities are known, the estimator (3.4.26) satisfies the conditions of Theorem 3.1.1 and the variance of the approximate distribution of β̂ can be estimated using that theorem. Because the A_t deter-
mine r categories, it is possible to estimate the variance of Y_t − X_t β̂ for each category and to construct a generalized least squares estimator.
In practice we may have an external estimate of κ_A, or of parameters such as the α of (3.4.11), but be unwilling to assume that the vector of population probabilities is known. Given κ_A, an estimator of the vector of probabilities p_x is given by

p̂′_x = κ_A^{-1} Ā′,

where Ā = n^{-1} Σ_{t=1}^{n} A_t, the jth element of p_x is P{x_t = ℓ_j}, and the jth element of Ā is the estimator of P{A_t = ℓ_j}. If the p_x must be estimated, the Σ̂_uu,tt do not satisfy the assumptions of Theorem 3.1.1 because the Σ̂_uu,tt based on estimated p_x are not independent. In practice one may be willing to ignore the error in Σ̂_uu,tt due to estimating p̂_x and to use (3.1.12) as an estimator of the variance of the estimator of β.
The operation of transforming the observed vector to obtain errors with zero mean furnishes another verification for the method of constructing estimates of the π_AB table given by Equation (3.4.16). Assume that elements are classified according to two characteristics, A and B. Assume that there are r classes for A and p classes for B. Let the response error for the two classifications be independent. If element t belongs to category i of A and to category j of B, then
x_t = ℓ_ri   and   y_t = ℓ_pj,

where ℓ_ri is an r-dimensional vector with a one in the ith position and zeros elsewhere and ℓ_pj is a p-dimensional vector with a one in the jth position and zeros elsewhere. The observed AB table can be written

p̂_AB = n^{-1} Σ_{t=1}^{n} A′_t B_t   (3.4.27)

and the population table of true proportions can be written

π_AB = E{x′_t y_t}.   (3.4.28)
The entries in the π_AB table are probabilities. If we divide the entries in the table by the marginal probabilities, we create a table of conditional probabilities. Thus,

π_B|A,ij = P{y_t = ℓ_pj | x_t = ℓ_ri} = π_AB,ij π_Ai·^{-1}.   (3.4.29)

The entries in the table of conditional probabilities for B given A can be thought of as a table of population regression coefficients obtained by regressing y on x. That is, the π_B|A table is

π_B|A = [E{x′_t x_t}]^{-1} E{x′_t y_t}.   (3.4.30)
Now the vector

(Y_t, X_t) = (B_t κ_B^{-1}′, A_t κ_A^{-1}′)   (3.4.31)

satisfies (Y_t, X_t) = (y_t, x_t) + (e_t, u_t), where the expected value of (e_t, u_t) is the zero vector. Under the assumption that the response errors in A and B are independent and that the matrices κ_A and κ_B are known, an estimator of the π_B|A table can be constructed with estimator (3.4.26), where (Y_t, X_t) is defined in (3.4.31).
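The following Python/NumPy sketch illustrates the transformation (3.4.20) together with a moment-type errors-in-variables estimator of β that uses the Σ̂_uu,tt of (3.4.25). The exact form of the estimator displayed as (3.4.26) should be checked against Section 3.1.1; the last line below is therefore an assumption of this sketch rather than a transcription, and the function and variable names are illustrative.

import numpy as np

def transform_and_estimate(A, Y, kappa_A):
    # A: n x r matrix of observed indicator vectors; Y: n-vector of responses.
    n, r = A.shape
    kA_inv = np.linalg.inv(kappa_A)
    X = A @ kA_inv.T                               # X_t = A_t kappa_A^{-1'}, as in (3.4.20)

    p_A = A.mean(axis=0)                           # estimates of P{A_t = l_j}
    p_x = kA_inv @ p_A                             # estimates of P{x_t = l_j}

    # Sum over t of Sigma_uu,tt of (3.4.25), grouped by the observed category j.
    Sigma_uu_sum = np.zeros((r, r))
    for j in range(r):
        Omega_jj = np.diag(kappa_A[:, j]) - np.outer(kappa_A[:, j], kappa_A[:, j])  # (3.4.21)
        S_j = kA_inv @ Omega_jj @ kA_inv.T                                          # (3.4.22)
        n_j = A[:, j].sum()                        # number of elements observed in category j
        Sigma_uu_sum += n_j * (p_x[j] / p_A[j]) * S_j

    # Moment-type estimator: subtract the estimated error contribution from X'X.
    beta_hat = np.linalg.solve(X.T @ X - Sigma_uu_sum, X.T @ Y)
    return beta_hat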
REFERENCES

Andersen (1982), Battese and Fuller (1973), Bershad (1967), Chua (1983), Fuller (1984), Fuller and Chua (1984), Goodman (1974), Haberman (1977), Hansen, Hurwitz, and Bershad (1961), Lazarsfeld and Henry (1968).
EXERCISES

23. (Section 3.4) Let B_t be an observed r-dimensional vector with a one in the ith position and zeros elsewhere when the tth element of a population is placed in the ith category. Let κ_B denote the r × r matrix with the ijth element giving the probability that an element truly in the jth category will be placed in the ith category. Then

E{B′_t | b′_t} = κ_B b′_t,

where b_t is the r-dimensional vector with a one in the ith position and zeros elsewhere when the tth element truly belongs in the ith category.
(a) Assume P = κ_B P, where P = (P₁, P₂, . . . , P_r)′. Show that E{b′_t | B′_t} = D κ′_B D^{-1} B′_t, where D = diag(P₁, P₂, . . . , P_r) and P_i is the population proportion in category i, and it is assumed that B_t is the value for a randomly chosen individual. Give the general expression for the conditional probability.
(b) Assume that the elements of κ_B are given by

κ_Bij = 1 − α + α P_i,   i = j,
      = α P_i,           j ≠ i.

Show that E{b′_t | B′_t} = κ_B B′_t.
(c) For the model of part (b) show that

E{(κ_B^{-1} B′_t − b′_t)(B_t κ_B^{-1}′ − b_t) | B_t} = (κ_B^{-1} − κ_B) B′_t B_t (κ_B^{-1} − κ_B)′ + diag(κ_B B′_t) − κ_B B′_t B_t κ′_B,

where diag(κ_B B′_t) is a diagonal matrix with κ_B B′_t as the diagonal.

24. (Section 3.4) Assume that it is possible to make repeated independent determinations on the elements of a population. Assume that there are two competing submodels of the right-wrong model of Section 3.4.2. In one submodel the probability of an incorrect response is the same
for all elements of the population. In the second submodel it is assumed that there are different types of elements with different probabilities of incorrect response. Is it possible to discriminate between the two submodels with two independent determinations per element on a sample of elements? Is it possible to discriminate with three independent determinations per element?

25. (Section 3.4) Let A_t be a 2 vector representing the observed classification of a binomial random variable. Let x_t be the 2 vector representing the true classification. Let

E{A′_t | x′_t} = κ_A x′_t,

where κ_A is the matrix of response probabilities. Assume that the response error is unbiased. Let d_t = A_t − x_t. Show that

26. (Section 3.4) In Example 3.4.1 the model assumed only the parameter α to be known. The population proportions were estimated under the unbiased response error constraint. Assume that the matrix of response probabilities

κ_A = [ 0.844   0.010
        0.156   0.990 ]

is known to hold for both months. Estimate the π_AB table and estimate the covariance matrix of your estimators.

27. (Section 3.4) Verify the expression for the variance V̂{θ̂} given in Example 3.4.1.
APPENDIX 3.A.
DATA FOR EXAMPLES
TABLE 3.A.1. Data on Iowa farm size Logarithm
Logarithm
Farm Size
Experience
Education
2.079 3.951 2.565 3.258 2.773 0.0 2.565 0.0 2.944 3.258 3.367 1.099 3.401 2.565
5.5 3.5 5.0 4.5 6.0 6.0 6.5 7.5 6.5 6.5 6.0 5.5 4.5 7.5
of
of
Logarithm
Logarithm
Farm Size
Experience
Education
1.386 3.178 3.091 2.944
6.0 5.5 6.5 5.0 7.0 6.0 5.5 6.5 5.0 4.5 8.5 5.5 4.5 4.0
of
of
~
5.613 3.807 6.436 6.509 4.745 5.037 5.283 5.981 5.252 6.815 5.421 5.460 4.263 6.426
5.352 5.464 5.580 5.268 5.537 5.283 5.481 5.288 5.656 5.273 6.667 5.999 6.550 5.869
1.099
3.638 3.555 1.386 3.555 3.091 2.079 3.434 3.045 2.485
TABLE 3.A.1. (Continued) ~
Logarithm of Farm Size
Logarithm of Experience
4.466 4.997 6.080 1.609 4.394 5.429 6.746 5.08 1 4.382 4.771 5.584 5.159 5.743 6.439 7.487 4.771 4.949 5.268 4.654 6.627 6.914 6.609 6.599 5.584 6.560 2.565 6.351 2.303 5.768 3.951 6.433 6.223 6.654 5.673 5.768 5.278 5.142 5.505 6.282 5.425 5.568
3.135 1.386 0.0 0.693 3.09 1 2.996 2.197 1.099 3.466 1.609 3.091 2.485 1.099 3.09 1 2.833 0.0 3.714 4.249 2.890 2.773 2.833 3.689 2.833 3.332 4.007 3.219 3.178 2.079 2.996 2.833 2.996 4.3 17 3.135 2.303 3.401 2.565 2.708 4.060 1.792 2.944 2.079
Education
Logarithm of Farm Size
Logarithm of Experience
Education
5.0 4.5 5.5 7.0 6.0 7.0 6.0 7.5 5.5 5.5 6.5 7.5 6.0 5.5 6.0 5.0 6.0 3.0 5.5 5.5 6.0 6.0 5.0 6.0 5.5 5.5 6.0 6.0 3.5 6.5 5.5 6.5 5.5 6.5 5.0 5.0 5.5 4.0 6.0 3.5 4.0
6.122 5.063 3.466 3.664 3.761 5.293 5.670 5.030 3.951 4.844 6.392 5.236 5.278 6.666 6.165 6.6 12 6.252 4.990 4.956 5.308 6.938 5.958 6.465 5.328 7.017 5.476 5.897 4.663 4.543 5.298 4.949 6.613 6.692 5.293 7.731 5.730 5.505 3.332 4.970 5.017 5.974
2.996 3.332 2.890 1.386 3.526 2.708 1.386 3.497 0.0 3.850 3.584 3.045 2.833 3.497 3.807 2.773 3.584 2.708 4.533 1.609 2.485 3.296 3.555 3.584 2.833 3.219 2.708 2.708 3.555 4.304 3.761 3.970 3.850 3.361 3.829 2.708 2.485 0.0 2.773 1.386 2.833
7.5 4.5 4.5 5.5 4.0 5.0 6.5 7.0 6.5 3.5 4.0 5.5 5.0 5.0 5.5 6.0 6.0 6.0 4.0 5.5 6.0 6.0 5.5 3.5 6.5 4.0 5.5 6.5 4.0 5.0 5.0 5.5 5.5 4.5 5.0 4.5 7.0 6.0 5.5 5.5 5.5
TABLE 3.A.1. (Continued) Logarithm
of Farm Size
Logarithm of Experience
4.898 6.852 6.784 6.829 6.201 5.595 5.971 5.505 6.006 5.537 3.555 5.187 5.407 6.054 4.820 7.299 5.293 5.136 5.501 5.900 6.236 6.380 5.724 5.598 5.48 1 5.165 5.01 1 4.787 5.308 6.764 4.836 6.686 5.338
0.693 3.638 2.485 3.829 3.045 1.609 3.091 1.609 4.007 2.773 2.996 3.638 3.045 1.099 4.220 3.045 0.693 3.219 2.565 1.792 3.258 1.792 3.401 3.258 3.784 4.043 1.792 4.159 3.178 2.398 1.792 3.135 2.639
Education
Logarithm of Farm Size
Logarithm of Experience
7.0 4.5 6.5 5.0 6.0 6.5 5.0 6.5 3.5 6.0 6.5 4.5 5.5 7.0 5.0 6.0 6.0 5.5 5.5 5.5 6.0 5.5 6.0 5.0 5.5 5.5 7.5 6.0 4.5 7.0 6.0 5.0 5.0
5.313 5.464 6.468 5.900 6.535 5.011 6.346 5.561 5.118 2.079 5.707 4.9 13 4.956 5.835 6.532 4.654 6.031 6.344 5.905 5.635 5.537 5.900 4.844 5.182 6.512 4.094 4.51 1 5.429 6.178 5.811 4.663 4.875 6.201
1.099 3.258 3.219 3.367 3.045 I .946 3.367 2.890 3.738 1.792 2.773 1.386 2.197 3.178 2.079 3.466 3.434 3.555 3.258 2.398 4.094 2.944 1.099 0.0 2.996 3.434 1.609 3.258 2.996 1.609 1.609 3.912 3.09 1
Education 8.0 5.5 6.0 6.5 8.0 7.0 5.5 5.5 5.0 5.0 4.5 7.0 1.5 6.5 6.0 3.5 5.0 5.0 5.5 6.5 5.5 6.0 6.0 7.5 4.5 5.0 5.5 6.5 7.0
7.0
6.0 4.0 6.0
TABLE 3.A.2. Data on textile expenditures Log Expenditure
Observation
1 2
3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
Log Income
Household Size
(Y)
(XI)
(X,)
2.8105 3.1605 3.4564 3.35 15 3.1427 4.9828 3.8726 3.5005 3.2051 3.1133 5.3818 3.3564 5.1192 4.2950 2.9487 4.4244 4.5062 3.3970 3.6608 4.7977 2.9293 3.44 I 5 3.6255 3.4816 2.9680 3.7426 3.3659 3.6916 2.6986 3.0485 3.9057 4.8882 3.1042 3.1020 2.8285 3.6743 3.2908 2.9464 3.1480 3.4629 3.3155
4.2487 4.7194 4.8075 4.4777 4.4355 5.5550 5.5558 4.8463 4.5762 4.2303 4.7218 5.0702 4.5776 4.0892 4.504 1 4.3690 4.3467 4.8097 3.8930 4.7120 4.2486 4.6945 5.0868 5.3184 4.1903 4.7963 4.3260 4.9496 4.0787 4.7928 5.6423 5.1 164 4.2790 4.3619 4.3596 5.4968 4.6981 4.5562 4.5094 4.85 16 5.2429
0.693 1 0.6931 1.0986 1.0986 1.0986 0.693 1 1.7918 1.0986 1.0986 1.0986 I .6094 1.3863 1.6094 0.6931 o.Ooo0 1.0986 1.0986 1.7918 0.0000 O.oo00
1.0986 1.7918 1.3863 0.693 1 1.3863 1.6094 1.3863 1.3863 0.6931 0.6931 1.0986 0.6931 0.693 1 0.693 1 0.693 1 0.6931 0.6931 1.0986 1.3863 1.0986 1.0986
Moved Indicator (x4)
0 0 0 0 0
1 0 0 0 0 1 0 1 1 0 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0
0 1 0
0 0 0 0 0 0 0 0
TABLE 3.A.2. (Continued)
Observation 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82
Log Expenditure (Y)
Income (XI)
Log
Household Size
2.9 139 4.5707 3.9754 3.4212 5.1222 3.7129 2.9538 3.5649 4.1 190 3.6449 2.9474 3.3074 2.9999 3.0784 3.9149 4.5 158 3.2703 5.7206 3.5917 3.2740 3.8237 4.9905 4.0648 3.4285 4.5406 4.0845 3.1397 2.8922 5.1510 3.9937 3.8553 2.5924 4.1913 3.3528 5.0442 3.3928 2.8359 2.6669 3.4381 4.6850 3.2266
4.5952 4.1656 5.1503 4.4454 4.6148 3.6932 4.6957 4.5040 5.6678 4.8324 4.0822 4.409 1 4.2238 4.2821 4.3624 4.5258 5.0018 5.1572 4.7557 4.3027 4.0133 4.6147 5.8566 3.6468 4.6024 5.3116 4.5930 4.2982 4.6137 4.8630 3.8508 3.8723 4.6414 4.5509 5.1408 5.0257 4.8217 4.7405 5.391 1 4.9156 4.3740
0.0000 1.6094 1.6094 1.6094 1.6094 0.6931 0.6931 1.0986 1.6094 1.7918 0.6931 1.0986 0.6931 1.7918
(X,)
0.0000 0.0000 0.6931 1.6094 1.3863 1.7918 0.0000 1.0986 1.6094 0.0000 0.6931 1.6094 0.693 1 0.6931 1.6094 1.9459 0.0000 1.3863 0.693 1 1.6094 0.0000 1.0986 0.0000 0.0000 0.0000 0.693 1 1.3863
Moved
Indicator (x4)
0 1 0 0 1 1
0 0 0 0
0 0
0 0
1 1
0
1 0 0 1 1
0 1
1 0 0 0 1
0 1
0 1
0 1 0 0 0 0 1 0
TABLE 3.A.2. (Continued)
Household Size (X,)
Moved Indicator
4.8804 4.7378 4.8238 4.8093 4.4191 4.6917 5.1155 4.8888 5.0968 4.4723 4.3229 4.2901 5.4455 5.1324 4.1855 4.9826 4.4554 4.9747
1.0986 0.693 1 1.0986 0.6931 0.0000 0.6931 1.6094 0.0000 1.9459 0.6931 0.0000 0.6931 o.oO0o 1.3863 0.6931 0.6931 0.693 1 1.3863
0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1
I Observation
~ I ; ] I;2
X,,
X,,
0
11 9 38 87 5 16 51 21 21 11 21 52 51 38 81 12
11
Observation 83 84 85 86 87 88 89 90 91 92 93 94 95 96 91 98 99 100
3.0413 3.4214 3.7571 3.4723 4.3134 3.3162 3.2280 3.1080 3.9145 4.3283 2.5766 2.6888 3.4757 3.4902 2.9934 4.7598 4.4542 3.5973
TABLE 3.A.3. Data on pig farrowing
Observation 1
2 3
4
5 6 7
8
9 10 11 12 13 14 15 16
I;, 29 0 27 29 0 0 0
24 20 34 12 0 0 0
22 29 8 17 11
22 15 0 20 12
24 0 0
24 0 0
0
0
X,,
X l
41 42 56 61 32 51 25 22 39 17 75 32 46 92 15 10
44 53 56 57 32 50 0
22 39 8 77 31 46 87 0 15
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
(x4)
1
0
~
0
8 0 20 0 0
47 20 0 0 22 25 0 0 0 12
~~
0 0 18 0 12 44 16 19 10 20 38 0
0 0 0
8 103 125 5 28 44 17 21 23 46 52 51 38 77 9
TABLE 3.A.3. (Continued) Observation
XI
x2
33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 13 14 75
30 15 0 0 17 20 0 5 0 0 5 11 32 0 0 20 0 0 0 0 13 24 16 13 8 0
31 16 0 0 17 20 0 6 0 0 3 25
10
29 16 15 9 0 0 5 11 20 0 0 4 18
60 0 0
5
34 0 20 0 15 0 0 13
65 15 0
0 0 15 26 15 29 10 I 0 10
11
20 0 2 3 18 50 0 0
X,,
X,,
64 64 18 16 12 12 18 18 43 43 36 31 46 41 21 21 5 2 0 18 20 6 6 36 37 82 75 63 59 6 3 83 42 84 75 21 26 13 13 9 9 23 19 113 108 31 31 21 0 23 24 56 46 36 36 51 51 17 17 50 44 24 33 20 20 0 1 1 15 25 22 22 61 57 25 55 0 2 5 4 51 42 152 127 19 24 21 38
Observation
1;,
II;,
76 77
0 0 12 10 15 0 0 25 0 0 10 9 0 0 0 0 0 12 0 15 12 0 43 12 11 5 4 21 20 20 0 0 60 16 16 0 31 20 32 0 18 12 40
0 10 10 0 0 4 13 38 29 10 26 20 0 38 28 0 32 29 0 2 5 7 0 27 27 0 150 150 0 1 5 0 0 20 30 10 45 78 4 18 14 0 35 35 15 15 45 0 36 51 12 22 17 19 12 19 10 11 10 15 15 I5 12 13 13 25 35 53 3 44 44 8 24 45 11 79 52 10 10 10 0 15 I1 24 45 49 20 37 42 20 42 61 0 30 40 0 64 60 28 185 71 18 25 27 9 17 16 0 1 1 13 50 47 3 36 47 32 104 129 0 44 54 13 14 14 9 21 21 35 130 103
18
19 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
112 113 114 115 116 117 118
Xr1
X,2
2813
EXTENSIONS OF THE SINGLE RELATION MODEL
TABLE 3.A.3. (Continued)
Observation
XI
119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 I36 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151
18 5 0 0 4 20 0 11 0 15 82 10
0 7 12 18 0
0 15 14 40 0 9 11 20 0 0 20 46 31 0 0 0
Kz X,1 10 4 0 0 4 0 0
12 0 0 69 4 0 2 14 0 5 0 16 12 40 0 8 11 0 0 0 17 30 0 0 0 40
32 5 19 52 8 26 10 27 20 25 140 13 27 36 43 23 32 21 37 25 82 30 14 42 22 31 66 50 65 31 49 72 125
XlZ
31 5
18 72 7 55 0 26 15 16 163 25 2 17 33 23 62 21 34 40 82 20 15 42 22 32 0 25 83 0 31 72 86
Observation
Kl
K2
Xl1
152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 183 183 184
20 0 0 0 0 42 0 0 11 0 9 0 9 0 8 4 0 9 0 25 30 0 5 13 0 0 0 0 30 0 0 0
10
0 50 0 0 18 0 42 0 0 10 0 10 0 9 40 8 2 3 9 0 40 35 35 5 12 0 10 12 0 20 0 0 0
30 159 44 23 19 0 43 22 40 23 9 10
44
28 62 32 17 21 28 54 43 30 71 26 45 24 39 40 8 82 21 21 38
XlZ -
40 129 29 23 19 2 43 12 51 21 0 11 42 28 62 9 17 20 23 35 54 35 73 27 48 24 36 27 22 72 21 21 38
TABLE 3.A.4. Depths of earthquakes occurring near the Tonga trench Depth Earthquake 1
2 3 4 5 6 7 8 9 10 I1 12 13 14
15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
I:
Distance Perpendicular to Trench
0.60 2.10 2.61 1.20 0.00 0.00 0.27 0.69 0.68 0.35 0.08 0.17 0.8 1 0.64 3.06 2.21 2.88 2.66 2.47 2.58 0.30 0.00 0.72 1.96 0.07 1.99 1.33 0.00 2.84 2.09 0.97 0.58 3.66 0.3 I 2.08 0.00 0.42 0.39 0.00 0.50 2.53 0.00 0.00
Source: Sykes, Isacks, and Oliver (1969).
x, I
1.38 2.72 2.76 1.03 0.49 1.05 0.52 1.14 0.86 0.73 0.68 1.26 1.51 0.88 3.58 2.8 1 3.22 2.89 2.97 2.94 0.35 0.19 0.74 2.42 0.76 2.40 1.88 0.31 3.24 2.35 1.53 0.70 3.76 0.75 2.56 0.13 0.53 0.36 0.26 1.07 2.94 0.12 0.05
Distance Parallel to Trench x12
1.19 1.42 1.74 0.99 0.67 0.9 1 0.82 3.1 1 3.03 0.78 0.40 0.53 2.14 1.41 1.86 1.62 3.10 3.94 1.24 1.17 0.46 2.71 3.06 3.73 0.63 2.7 1 1.88 0.01 2.08 3.84 1.96 1.22 2.57 3.12 1.40 3.16 2.51 2.41 2.20 3.34 0.55 2.82 2.18
TABLE 3.A.J. Observations for quadratic model Observation 1 2
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
I:
x,
0.70 0.05 0.99 0.30 1.10 0.93 0.30 0.24 0.28 0.86 0.76 0.48 0.11 0.2 1 0.60 0.58 0.10 0.31 0.24 0.13 0.45 0.66 0.18 0.47 0.18 -0.18 0.15 0.12 0.05 - 0.28 -0.59 0.38 0.20 0.16 0.45 0.42 0.14 -0.2 1 - 0.59 -0.22 -0.25 -0.23 -0.13
- 1.44
1
-0.90 - 0.8 1 - 1.14 - 1.36 -0.61 -1.11 - 1.46 -0.77 - 1.31 - 0.98 - 0.93 - 1.02 - 1.60 - 0.45 - 1.07 -1.11 -0.57 -0.90 - 1.25 - 0.8 1 - 1.07 - 0.5 1 -0.86 -0.16 -0.64 0.21 - 0.33 - 0.45 - 0.7 1 - 0.70 - 0.46 - 0.22 - 0.98 - 0.54 - 0.54 -0.81 -0.14 - 1.03 - 0.36 -0.14 - 0.37 -0.71
Observation
r;
x,I
44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
- 0.4 1 0.02 0.34 -0.31 0.3 1 - 0.06 0.35 -0.35
-0.21 -0.52 - 1.01 - 0.34 -0.68 - 0.40 - 0.06 0.13 -0.18 0.1 1 - 0.04 - 0.54 -0.10 0.35 -0.32 - 0.23 -0.12 - 0.05 0.31 - 0.45 0.65 0.57 0.17 -0.16 -0.02 0.08 -0.17 0.09 0.43 0.83 0.70 0.35 0.64 0.52 0.67 0.50 - 0.26 0.56 0.44 0.99 0.76 0.55 0.53
I1
78 79 80 81 82 83 84 85 86
0.09
0.11 0.03 0.26 0.4 1 -0.0 1 - 0.45 0.32 -0.14 -0.06 0.32 -0.06 -0.22 -0.30 -0.26 -0.59 -0.31 0.23 -0.25 0.59 0.34
0.05 0.51 0.69 0.20 1.08 0.70 0.17 0.39 0.68 0.87 0.79 0.05 0.43 0.17
TABLE 3.A.5. (Continued) Observation 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101
102 103
I; 0.40 0.33 0.73 0.57 0.57 0.36 0.57 0.64 0.27 1.25 1.54 1.53 1.32 1.48 1.03 1.1 1 1.33
X,, 0.89 0.35 0.33 0.39 -0.19 0.55 0.55 0.02 0.46 0.84 1.03 0.92 1.28 1.53 1.22 0.84 1.23
I
Observation
I;
104
1.43 1.76 1.16 1.96 2.05 1.80 1.63 0.92 1.86 1.50 2.04 1.85 1.34 I .60 1.56 1.60 1.57
106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
X, 1 1.36 0.72 0.7 1 1.44 0.82 0.58 0.39 0.63 1.11 0.84 1.07 0.80 1.08 0.85 0.97 1.oo 1.60
CHAPTER 4

Multivariate Models

The models studied in Section 1.2, Section 1.3, and Chapter 2 can be characterized by the fact that the true values of the observed variables satisfy a single linear equation. The models fell into two classes: those with an error in the equation and those with only measurement errors. In the models containing an error in the equation (e.g., the models of Sections 1.2 and 2.2), one variable could be identified as the "dependent" or y variable. In models with only measurement error (e.g., the models of Sections 1.3 and 2.3), the variables entered the model in a symmetric manner from a distributional point of view, but we generally chose to identify one variable as the y variable. In this chapter we extend our treatment of measurement error models with no error in the equation to the situation in which the true variables satisfy more than one linear relation. The extension of the model with an error in the equation to the multivariate case is relatively straightforward and is not discussed.

4.1. THE CLASSICAL MULTIVARIATE MODEL

The model introduced in Section 1.3 assumes that independent information on the covariance matrix of the measurement errors is available and that the only source of error is measurement error. This section is devoted to multivariate models of that type.

4.1.1. Maximum Likelihood Estimation
We derive the maximum likelihood estimators for two models: the model with fixed x_t and the model with random x_t. One representation for the fixed model is
y_t = x_t β,   Z_t = z_t + ε_t,   (4.1.1)

where {x_t} is a fixed sequence, ε_t = (e_t, u_t), Z_t = (Y_t, X_t) is observed, y_t is an r-dimensional row vector, x_t is a k-dimensional row vector, z_t = (y_t, x_t) is a p-dimensional row vector, β is a k × r matrix of unknown coefficients, and p = r + k. The r-dimensional vector v_t = ε_t(I_r, −β′)′ is the vector of population deviations, where

Y_t = X_t β + v_t.
The maximum likelihood estimators for the case in which

Σ_εε = Υ_εε σ²   (4.1.2)

and Υ_εε is known are derived in Theorem 4.1.1 and are direct extensions of the estimators of Theorem 2.3.1. The estimator of β is expressed as a function of the characteristic vectors of M_ZZ in the metric Υ_εε, where the vectors are defined in (4.A.21) and (4.A.22) of Appendix 4.A.

Theorem 4.1.1. Let model (4.1.1) hold. Assume that Σ_vv and M_xx are positive definite. Then the maximum likelihood estimators are
β̂ = −B̂_kr B̂_rr^{-1},

σ̂² = (p − ℓ)^{-1} Σ_{i=k+1}^{p} λ̂_i,   (4.1.3)

ẑ_t = Z_t(I − B̂B̂′Υ_εε) = Z_t − v̂_t Υ̂_vv^{-1} Υ̂_vε,

where M_ZZ = n^{-1} Σ_{t=1}^{n} Z′_t Z_t, p − ℓ is the rank of Υ_εε, v̂_t = Z_t(I_r, −β̂′)′,

(Υ̂_vv, Υ̂_vε) = (I_r, −β̂′)Υ_εε[(I_r, −β̂′)′, I_p],

B̂_.i are the characteristic vectors of M_ZZ in the metric Υ_εε, B̂_.i, i = 1, 2, . . . , r, are the columns of B̂ = (B̂′_rr, B̂′_kr)′, and λ̂_p ≤ λ̂_{p−1} ≤ · · · ≤ λ̂_{p−r+1} are the r smallest roots of

|M_ZZ − λΥ_εε| = 0.
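Numerically, the estimators of (4.1.3) can be computed from a generalized eigenvalue routine. The sketch below, in Python with NumPy and SciPy, assumes a nonsingular Υ_εε and the column ordering Z_t = (Y_t, X_t); the function name and data layout are illustrative assumptions, not part of the original text.

import numpy as np
from scipy.linalg import eigh

def mle_theorem_411(Z, Upsilon, r):
    # Z: n x p matrix with rows Z_t = (Y_t, X_t); r = number of Y variables.
    n, p = Z.shape
    M_ZZ = Z.T @ Z / n
    # Roots and vectors of |M_ZZ - lambda * Upsilon| = 0, in ascending order,
    # with vectors normalized so that B' Upsilon B = I.
    lam, B = eigh(M_ZZ, Upsilon)
    B_small = B[:, :r]                      # vectors for the r smallest roots
    B_rr = B_small[:r, :]                   # rows corresponding to the Y variables
    B_kr = B_small[r:, :]                   # rows corresponding to the X variables
    beta_hat = -B_kr @ np.linalg.inv(B_rr)  # beta_hat = -B_kr B_rr^{-1}
    sigma2_hat = lam[:r].sum() / p          # MLE of sigma^2 for nonsingular Upsilon
    return beta_hat, sigma2_hat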
Proof. The proof parallels the proof of Theorem 2.3.1. To simplify the proof we assume Υ_εε to be nonsingular, but the conclusion is true for singular Υ_εε. Twice the logarithm of the likelihood for a sample of n observations is

2 log L = −n log|2π Υ_εε σ²| − σ^{-2} Σ_{t=1}^{n} (Z_t − z_t) Υ_εε^{-1} (Z_t − z_t)′.   (4.1.4)
The zi that maximizes (4.1.4)for a given /? is 2; =
=
(B, Iky[(B,
IkIy;
‘(b, Iky]-
’@, IklYL
z; - rEec(c’r,c)-lcz;,
lz;
(4.1.5)
where C‘ = (I,, -p). Substituting this expression for z; into (4.1.4)we have 2 log L =
-
iog12nre,a2l - a-2
c z,c(c~,c)-~cz;. (4.1.6) n
,=1
Therefore, the /Ithat maximizes the likelihood is the /? that minimizes tr{C’MZzC(Cr,,C)’}.
(4.1.7)
By Corollary 4.A.10, the minimum is attained by choosing the columns of C to be linear combinations of the r characteristic vectors of M,, in the metric Ye, that are associated with the r smallest roots. The submatrix Brr is nonsingular with probability one and the estimator of fi is given by (4.1.3). Using the characteristic vectors in the metric To,the maximum likelihood estimators of z, defined by (4.1.5)are 2, = z, =
i: zlB.iBlrEE
i= 1
z, - itru;lru,. A
A
(4.1.8) (4.1.9)
Setting the derivative of (4.1.4)with respect to o2 equal to zero, we obtain the maximum likelihood estimator,
c (Z, - 2*)YGyz, - 2,y n
6: = (np)=p-’
1=1
tr{BM,,B} = p - l
f &.
i=k+l
0
Alternative derivations of the maximum likelihood estimators for fixed x, are given in Healy (1975) and Gleser (1981). Also see Exercises 4.1 1 and 4.12. As in the univariate model, the maximum likelihood estimator of a’ is not consistent for cr2, but the estimator (4.1.10)
is consistent for a2. The estimator (4,l.lO) can also be written (4.1.11)
where 0, = Z,(I, -fit)’ and
P,, = (I, -&TE,(I, -fit)’.
4.1.
295
THE CLASSICAL MULTIVARIATE MODEL
We now derive the maximum likelihood estimators for the model with estimated measurement error covariance matrix. We assume that the vector of true values is randomly chosen from a normal distribution and that the true values are known to satisfy r linear equations. We let
z, = z, + e,,
z,c = P o ,
-
(4.1.12)
NI[(O, pX)’,block diag(C,,, Wl, where z, = (y,, x,) is a p-dimensional vector, Z, = (Y,, X,),e, = (e,, u,), C’ = (Ir, -/Y’), x, is a k-dimensional row vector of true values, X,is a k-dimensional row vector of observed values, Y, is an r-dimensional row vector of observations, Po is an r-dimensional row vector of unknown coefficients, is a k x r matrix of unknown coefficients, and p = k r. The vectors Z, = (Y,,X,), t = 1, 2,. . . ,n, are observed. It is assumed that Zxxis nonsingular. Under the model, (4.1.1 3) Z; NI(plz, Ezzh (e,, x,Y
+
N
where = (P, I k ) ” X X ( b 9
‘ZZ
Ik)
+
& ,’
= ‘22
+
‘EZ
(4.1.14)
and F~ = (Po + pJ?, px). There are i k ( k + 1) unknown parameters in Exxand we can reparameterize the problem by defining a k x k nonsingular matrix G, with hk(k 1) parameters such that ZXxis equal to the product GzGz. One possible choice for G, is a lower triangular matrix, but we need not specify the nature of G2at this point. If we let f, denote the row vector defined by
+
f, = x,G;’, then it follows that C,, = I. Model (4.1.1) specifies y, = (z,,, z r 2 , .. . ,ztr) to be a linear function of x, and, hence, of ( J l , J z , .. . , Jk).We may write 2,
= (PO, 0)
+ frG,(P, 1) = (PO,0) + frA’,
where A‘ = G,(/?, 1) is a k x p matrix. It follows that
Ezz = ACffA’
+ C,, = AA‘ + E,,,
(4.1.15)
where there are +k(k + 1) + kr independent parameters in A. We shall use both the parameterization associated with (4.1.15) and the parameterization (4.1.12). Let a sample of n vectors Z, be available and let mZZ= (n -
I;=
n
1 (Z, - Z)’(Z, - Z),
r=1
where 2 = n-’ I Z,. We assume that an independent unbiased estimator of C,,, denoted by S,,, is available. It is assumed that dfS,, is distributed
296
MULTIVARIATE MODELS
as a Wishart matrix with df degrees of freedom and matrix parameter Eee. We first obtain the maximum likelihood estimators of the parameters of the model, Because Eeeis unknown, an estimator of this covariance matrix is also constructed. In Theorem 4.1.2 we define the estimator of C,, in terms of mzz. As in earlier chapters, we call the estimator constructed with m,, the maximum likelihood estimator adjusted for degrees of freedom. Matrix theory used in the development is summarized in Appendix 4.A. Theorem 4.1.2. Let model (4.1.12) hold. Let Ex, and Erebe positive definite. Let i1 2 2, 2 . * * 2 be the roots of
xp
Jmzz- &,,l = 0. Let the latent vectors of m, in the metric Seebe defined by
(4.1.16)
i = 1,2, . . . ,p , = 0, (mz,where TIiSeeT,j = Sij and Sij is Kronecker's delta. Let
(4.1.17)
be the matrix whose columns are the latent vectors of m,, in the metric
Seearranged so that the corresponding roots are ordered from largest to
smallest. Then, if 2,' > 1, the maximum likelihood estimators adjusted for degrees of freedom are (gee,
%?) = ( S e e e T e e T S e e , S e e f i T z z T S e e ) ,
/ = ex;'exy,
(So, A) = (P -
where
(4.1.18)
a,81,
%ee=diag(l, 1,. . 1, B T e e . k + l . k + l , B T e e , k + Z . k + 2 , * . BTeepp), i = k 1, k 2 , . . . ,p , BTeeii = (n - 1 + ds)-'[(n - 1)& df], ETzz= diag(l, - 1, I, - 1 , . . . ,Ak - 1,O, 0,. . . ,O). 9
A
I
I
+
+
3
+
ex,
If lk < 1, the maximum likelihood estimator of is singular and / is not defined. The supremum of the likelihood function is the function evaluated at and with z replacing k, where z is the number of & exceeding one.
eTce eTZz
Proof. A proof has been given by Amemiya and Fuller (1984), but our proof is closer to that given in Anderson, Anderson, and Olkin (1986). For a p-variable model in which k variables are X variables and Y variables are Y variables, the logarithm of the likelihood adjusted for degrees of freedom
4.1. THE
297
CLASSICAL MULTIVARIATE MODEL
is
+ d,) log 211- +(n - 1)iogjczzj
log Lc(e)= -$fin - 1
-+(n - 1)tr{mzzci;> - 9f logp&&( - td, tr{S,F, '}, where 8 is the vector of unknown parameters. Let T be the matrix such that TSe,T = I,
where
TmzzT = A,
(4.1.19)
cTZz = T'XzzT,
(4.1.20)
A = diag(X,, &,. . . ,lp).Define new parameters CTCE = T'Ce,T,
where C,,, - CTeeis a nonnegative definite matrix of rank k. Then, employing the transformation, log L(e) = -+(n - 1
+ d,)(p log 2n + iOgls,,I) - +j-(e),
where 8 contains the elements of vechX,,, vech CTZZand
and the free parameters in
f(@ = (n - l)[loglzTZZ[ + tr{AEFZ$}] + df[loglETee( + tr{z&:}]*
(4.1.21)
The likelihood will be maximized if f(8) of (4.1.21)is minimized. There exists a K such that CTce =
KG-IK',
where G = diag(g,, g2, . . . ,gp) and g; I&ee
XTZZ
6 g;
= KK',
-
6 . . < g;
are the roots of
- g-lzTZZl = O.
The roots g; are the population analogues of the statistics 1 ;' defined by (4.1.16). By the definition of&.&& and ETZZ, g; ' = 1 for i = k + 1, k + 2,. . . ,p and g;' < 1 for i = 1,2,,. . , k. The quantity to be minimized given in expression (4.1.21)cah be written f(8) = ( n - 1
+ d,) iogl&ZZl + d , log/G-'l + ( n - 1)tr{C;2zA}
+ d , tr{K-'K-"G} 2 ( n - 1 + d,) f: log yzi + d , i= 1
f log g;'
i= 1
(4.1.22)
298
MULTIVARIATE MODELS
where yzl 2 yz2 2 - * * 2 yzp are the roots of Z T Z Z we , have used the fact that the roots of KK' are equal to the roots of K'K, and the inequality follows by Corollary 4.A.14. The unknowns in (4.1.22) are yzi, i = 1,2,. . . , p , and gi, i = 1,2, . . . , k. If we first minimize (4.1.22) with respect to yzi, we have
+ +
+ +
i = 1 , 2 , . . . , k, f Z i = (n - 1 d f ) - ' [ ( n - l)Ai d f g i ] , (4.1.23) i = k + 1, . . . , p , f z i = (n - 1 d J ) - ' [ ( n - l)xi d J ] , where f z i = BTeEiifor i = k + 1, . . . p. Substituting the expressions of (4.1.23) for the yzi in expression (4.1.22),we obtain
where h(g; A) = (n
-
1
+ d,)
log{(n - 1
+ d,)-'[(n
- l)A + d,g]}
- d,
log g.
Because inf h(g; A) =
8'1
h(A; A) if A > 1 h(1;A) i f O < A < 1,
it follows from (4.1.24)that
where q = min{?, k} and the 8TEEijare defined in (4.1.23).The right side of (4.1.25)is the infimum of f(0) for 0 in the parameter space. This infimum is attained if we evaluate (4.1.21)at *
CTZZ &,
A
= d i a g ( L A,, = diag(1, 1, * .
* * * 3
x,,
. . 8TECPP),
~T&&,q+l.q+12
. , 1, ~ T e e , q + l , q + l , *
* 3
. 7
BTeepp).
(4.1.26)
The estimators in (4.1.18) are obtained by applying the inverse of the transform in (4.1.20) to the estimators in (4.1.26) and then using (4.1.19).
0
The estimator of /I derived in Theorem 4.1.2 can be obtained from that of Theorem 4.1.1 by replacing YE,with S,,,provided the estimator of Theorem 4.1.2does not fall on the boundary of the parameter space. As in the univariate case, the estimators of fl for the fixed and random models differ only in the treatment of boundary cases. In Theorem 4.1.2 it is assumed that Zecis positive definite. We shall find it worthwhile to have an expression for the maximum likelihood estimator that holds for singular See. This is accomplished by using the roots and vectors of S,, in the metric mzz. Let 3; Q 2; Q * ,< 2, be the values of
'
1
'
4.1.
299
THE CLASSICAL MULTIVARIATE MODEL
1-' that satisfy
IS,, - l-'mzzl = 0,
where the first t(t > 0) of the
2;'
(4.1.27)
are zero and p - t is the rank of See.Let
R = (R.1, R . 2 , . * .
9
Rp)
(4.1.28)
be the matrix of characteristic vectors of S,, in the metric mzz, where the first l vectors are associated with zero roots of (4.1.27).Let A;' c 1. Then the maximum likelihood estimators of and X,, are
e,, = AA',
j = (A*&A$y, where
=
[A:&,kkk]' is the p
(4.1.29)
x k matrix
A = mzz[(l - l;')1/2R,l,
(1 - 2;1)1/2R.2,. . . , ( 1 - 2;1)1/2R,k].
If S,, is positive definite, the matrix
A = S,,[(Rl - l)1/2T.l,(2,
A can be expressed as -
1)1/2T.2,. . . , (R, - 1)"2T,],
(4.1.30)
where T,i = 2!/2R,i and the T,i are defined in Theorem 4.1.2. See Result 4.A.11 of Appendix 4.A. That estimators of the form (4.1.29) maximize the likelihood for known singular X,,,is proved in Appendix 4.C. The estimator of C,, given in Theorem 4.1.2 can also be written
E,, = ( n - I + d,)-'[d,~,, + (n - l)(mzz - Ezz)].
(4.1.31)
In this form we see that the estimator of Z,, is obtained by "pooling" an estimator from the Z sample with the original estimator See.The Z sample can furnish information only on certain covariances, the covariances that define the covariance matrix of the u variables. In the transformed version of the problem defined in (4.1.20),the u variables correspond to the last p - k variables of e,T. In the model with a single equation, the estimated slope parameters are the elements of the vector a^ that minimize the ratio (a'S,,a)- 'a'mZza,
(4.1.32)
where a' = (1, - /Y). The ratio (4.1.32),evaluated at a*' = (1, - )'), can be interpreted as the ratio of the mean square of the residuals 6, divided by the estimator of the variance of 0,. In the multivariate case we have r equations and there is a corresponding r-dimensional vector of deviations, v,. A generalization of the ratio (4.1.32)for the r-dimensional case is ( n - l)-'
n
1 (Z, - z)c[c's,,c]-'c(z,- Zy,
,=I
(4.1.33)
300
MULTIVARIATE MODELS
-r).
where C = (I, It follows from the proof of Theorem 4.1.1 that the estimator obtained in Theorem 4.1.2 minimizes the ratio (4.1.33) and that the minimum value of the ratio is (4.1.34)
See Exercise 4.9. Another generalization of (4.1.32) is the ratio of determinants Ic'S,,C(- lJC'mzzCJ.
(4.1.35)
This ratio is also minimized by the estimator obtained in Theorem 4.1.2 and the minimum value of the ratio is the product of the r smallest Ai. If we know v, and Z,, the best estimator of x,, treating x, as fixed, is XI
=
x,- VIZ:,lC",.
(4.1.36)
See, for example, Equations (1.2.34) and (1.2.20). When the parameters are estimated, the estimator of x, is f, = x
+ (Z, - Z)H,,
(4.1.37)
where H, = (0, Ik)' - (I,., -jy2;'2uu,
A
(4.1.38)
g , = g, - C,,& For nonsingular e,,, H 2 = 2i'(b,IkY[(b, '(b, (4.1.39)
2,"= (I,., -j?')&,(I,., -/?)', A
and
A
A
Iky]
The estimator of the first-order approximation to the variance of Br is
e,, - ~ u u ~ ; " ' z u " . (4.1.40) We demonstrate that the first k columns of A of (4.1.29) define functions A
f ( 2 , - x,} =
-
A
of the Z, that are linear combinations of 2, and the last r columns define functions of the Z, that are linear combinations of 0,. The matrix R is the matrix of characteristic vectors of S,, in the metric m,, ordered so that the roots are in increasing order and we can write
x;'
= (Rk,
Rr)?
(4.1.41)
where R,. = (R;,., RiJ = (I, -P)'Rr,.. Therefore, the estimated contrasts
(Z, - Z)R, = i,R, are linear combinations of i, = Y,- - XI). Direct multiplication will verify that
b0
RiS,,H2 = 0.
(4.1.42)
(4.1.43)
4.1.
301
THE CLASSICAL MULTIVARIATE MODEL
where H, is defined in (4.1.38).Because R is of full rank and because R'S,,R
= diag(i; I ,
1; ',. . . ,I;
(4.I .44)
'),
it follows that the columns of R, are linear combinations of the columns of H, and that the elements of (Z, - Z)R, are linear functions of the elements of (a, - X). Therefore, the transformation R partitions (Z, - Z) into a part corresponding to gt and a part corresponding to it.Furthermore, the transformed variables are standardized so that the observed moment matrix of (Z, - Z)Ris the identity matrix. The model (4.1.12) contains p mean parameters, [(p - k)k $k(k l)] parameters for the covariance matrix of Z,, and the parameters for the covariance matrix of 8,. Because there are +p(p 1) unique elements in the sample covariance matrix of Z, and $p(p I) exceeds the number of parameters defining the covariance matrix of Z,, it is possible to construct a goodnessof-fit test of the model (4.1.12).The alternative model for Z, is taken to be the normal distribution with unconstrained mean and unconstrained positive definite covariance matrix. The likelihood ratio criterion for testing model (4.1.12)against the unconstrained alternative is given in Theorem 4.1.3. The test does not require a positive definite See.
+
+
+
+
Theorem 4.1.3. Let the null model be the model (4.1.12)and let the alternative model be the unconstrained model. Then
xi
that are greater than one, where q = min{k, z}, z is the number of L,(m,,,S,J is the logarithm of the likelihood of the sample adjusted for
and L,(b) is the logadegrees of freedom evaluated at (Zzz,Eee)= (mZZ,_See), rithm of the likelihood evaluated at pZz, Zee)= (C,,, gee) with gzzand gee defined in Theorem 4.1.2. Under the null model the test statistic converges in distribution to a chisquare random variable with +r(r + 1) degrees of freedom as n + co and d; = O(n- ').
'
Proof. Under the unconstrained model, the estimator of ZTZZis and the estimator of ZTtt is I. Therefore, the expression for the test statistic follows from (4.1.24).For the unconstrained parameter space, Zzz is any symmetric
302
MULTIVARIATE MODELS
positive definite matrix. We transform the parameters defining Ezz on Ihe constrained space into the set (@,H,,, HYY.,)by the one-to-one transformation, &z = H,, + X&,, H,, = (C,~YH,,(C,I) + (I,O)rHyy.x(k01,
where fi is a k x r matrix, H,, is a k x k symmetric matrix, and Hyy., is an r x r symmetric matrix such that H,, + C,, is positive definite. Under the null model, H , = C,, is positive definite and Hyy.,= 0. Since there are 2- lr(r + 1) distinct elements in Hyy.,, the limiting null distribution follows from the standard likelihood theory. 0 Because the test statistic depends only on Of, the limiting distribution of Theorem 4.1.3 holds for any reasonable x, if the e, are normally distributed. See Miller (1984) and Amemiya (1985b). The degrees of freedom for the likelihood ratio statistic is the difference between the number of unique elements in the covariance matrix of Z, and the number of parameters estimated. The difference is the number of unique elements in the covariance matrix of v,. Therefore, the likelihood ratio test is a measure of the difference between the sample covyiance matrix of 9, and the estimator of C,, constructed as (I, -~)Se,(I,-p)’. The fiti entering the likelihood ratio expression are analogous to the residuals from a regression in the sense that k coefficients are estimated in order to estimate the GI, for each i. Therefore, by analogy to regression, we suggest that the test statistic be constructed as
(4.1.45)
and that the distribution be approximated by that of the central F distribution with $r(r + 1) and d , degrees of freedom. If Z,, is known, the likelihood ratio statistic j 2 = (n - k - 1)[
f
j=q+1
]
(1, - 1% 1,) - (P - 4)
(4.1.46)
is approximately distributed as a chi-square random variable with ir(r + 1) degrees of freedom. Another function of the roots that is also a useful diagnostic tool is the sum
f. R, = ~~{c~~~c[c’s~.c~-’),
i=k+ 1
(4.1.47)
4.1.
303
THE CLASSICAL MULTIVARIATE MODEL
introduced in (4.1.34). The sum of the r smallest 1,is a monotone [unction of individual Ail while the likelihood ratio increases as an individual l imoves away from one in either direction. If the assumptions of Theorem 4.1.2 hold, P
C i=k+l where S,,
= (1,
+~~(n-'),
A
ili = tr{(m,, - m,,m,'m,,~S~'}
--/?')S6&(1,-8')'. It follows that
F = (n - l)[r(n - k
- 1)l-I
P
1
i=k+l
Ai
(4.1.48)
is approximately distributed as a central F with r(n - k - 1) and d , degrees of freedom.
4.1.2. Properties of Estimators We give the limiting distribution of the maximum likelihood estimators under weaker conditions than used in the derivation. We begin by demonstrating the strong consistency of the estimators. Theorem 4.1.4. Let model (4.1.12)hold without the normal distribution assumption. Assume that E , are independently and identically distributed with mean zero and positive definite covariance matrix CE,.Let S,, be an unbiased estimator of C,, distributed as a multiple of a Wishart matrix with d , degrees of freedom, where d;' = O(n-'). Assume that x, - p,, are independently and identically distributed with mean zero and covariance matrix Zxx,independent of E~ for all t andj. Assume the fixed sequence {p,,} satisfies
where m, = C,, where
(So,S, g,,)
+ m,,
is positive definite. Then, as n
(So,
S, gxx)
-+
(Po,
P 9
fixx)
.+ 00,
a%
is defined in Theorem 4.1.2.
Proof. Our proof follows arguments given by Gleser (1981) and by Amemiya and Fuller (1984). It cab be shown that, as n -+ 00,
mzz
-,&z,
a.s.,
304
MULTIVARIATE MODELS
+
where Czz = Zzz C,,, Zzz= (P, I)'m&?, I), and thus,
+
SE;1~2mzzSE~'/2 + Cc~1/2ZzzCE; '1' I,,
a.s.
If we let Ai be the eigenvalues of C; 1/2&zZ; 'I2,then Ii> 1 for i = 1,2, . . . , k and Ii= 1 for i = k + 1, k f 2, . . . ,p. A matrix of orthonormal eigenvectors corresponding to the r roots that equal one is
Q(,)= Z;/'(I, - /?')'CG'/'G, where G is an r x r orthogonal matrix. Let of (4.1.16) and let
A = diag(l,, A 2 , . . . ,A,)
i1 2 i22 , .. 2 1 , be the roots
= block
diag(&,,&,)),
xi
contains the r smallest roots. The eigenvalues are locally conwhere tinuous functions of the elements of s,; 1'2mzzS, '/'. Thus,
A(,)
+
I,,
a.s.
Let w be a point in the probability space of all sequences of observations. -, I,. The set of Fix an w such that mzz(w) --* C, SC@) -,F,,, and Acr)(o) such w has probability one. Let Q = (Q(,'),Q(,J be the yatrix*of characteristic vectors of Se; 1/2mzzSE; '/' associated with the roots 2, > I, >, * * * > where the partition corresponds to the k largest and r smallest roots. Since Q{,,(w)Q(&o) = I for all n, each element of Q,,(w) is bounded. Thus, for every subsequence of {Qc,(w))?= ', there exists a convergent subsubsequence. Taking the limit over such subsubsequences on both sides of the equation
&,,
[ S J ~ ) -] 1'2mz~(w) [ s e ~ w )-] l"Q(r)(w)= Q(r)(u)&r)(U), we find that the limit of a convergent subsubsequence of Q,,,(w)is of the form Q(,)for some orthogonal G(w),where G(o)depends on the subsequence and w. Hence, the limit of T,,,(w) = S; '/'Q(,)(w) for such a subsubsequence is (I, - /?')'Z,'/'C(w) and the limit of g(w) = -Tk,(U)T, ' ( 0 )is P, where the partition of T is as defined for R in (4.1.41). Since the limit is the same for every subsequence, g(w) converges to fi and 1converges to /I a.s. It follows that
'(j,I)'&,, + Z; '(P, IYZ,,, 2,, = [(j,1,')s;'(/.I, I,')'] - and c,, = [(P, H2= S;
where fore,
ex,= i3b(mzz - s,,)H2-, z,,,
Also go + Po a.s., because, X
+ ji,,
as., and $!
--t
a.s., ('/I,
IJ]
-
'.
There-
a.s.
Bo + Fxp,a.s.
U
4.1,
305
THE CLASSICAL MULTIVARIATE MODEL
Theorem 4.1.4 was stated and proved for positive definite Z,, but the result also holds for singular Z,,. The limiting distribution of the vector of estimators is derived in Theorem 4.1.5. The statement and proof are given in considerable detail for reference purposes. An alternative proof of asymptotic normality is given in Theorem 4.B.2 with a representation of the covariance matrix given in Corollary 4.B.2.
Theorem 4.1.5.
Let
-
u,
z,= z, + E t , e, N W , Yr = x,B, where Xu,,is positive definite. Let d,S,, be distributed as a Wishart matrix with matrix parameter LEE and d , degrees of freedom independent of Z,. Let c = lim dj'n,
e,,
O < c < 00.
n-. w
Let and be defined by (4.1.29)with R the matrix of characteristic vectors of S,, in the metric Mzz. Let
g,, = (n - 1 + d,)-'[d,S,,
+ (n - l)(Mzz- gzz)].
Assume that the x, - pxt are independently and identically distributed with mean zero and covariance matrix Zxx,independent of ej for all t and j . Assume that the fixed sequence (axl} satisfies lim n-'
n+m
lim n-
n+w
1 t=1
1 pxl = px,,
t=1
p
~ = ~M,,,p
~
~
where M x x = M + Zxxis positive definite. C' Let d = [(vec /I) (vech ', 2J]' and let 9 be the corresponding parameter vector. Then n'"(6 - 0) converges in distribution to a normal random vector with zero mean and covariance matrix (4.1.49)
where
306
-
If, in addition, x, = (1, xtl), where xrl then n”2(r* - 7)
MULTIVARIATE MODELS
NI(p,, Ex,) and p, is a fixed vector,
.5 N O , rrv),
e,,)’, ex,)’],
where r* = [(vec j)’, (vech (vech y is the corresponding parameter is defined in Theorem 4.1.2, the upper left portion of ryy is Tee, vector, and the remaining submatrices of ryy are
ex,
rpx= 2{& 6 (K:[MXx+ (1 + c F P P ] ) } # L
6 xeu)lK, - (1 + C)(Z,
(37)- 'gEE},
where gee = diag(18.53, 305.45, 18.53). If we compute one step of the Gauss-Newton iteration starting with our initial consistent estimators, we obtain the estimates of Table 4.2.5. The one-step estimates are the same for the two covariance matrices because d(Il = dcr)by Theorem 4.2.4. Also, as stated in Theorem 4.2.4, the estimated standard errors are the same for the estimates of b and XECfor the two models. It is only the standard errors for the estimates associated with x that differ. Under the random model the variance of the distribution of x is being estimated and the estimated variance of 8,, reflects the fact that the values of x change from sample to sample. Under the fixed model the set of x values is the same for the collection of samples for which the variance is being estimated. For example, the variance of the estimator of X is close to n-'B,, for the fixed model, and the variance of the estimator of p, is close to n - 18xx for the random model. TABLE 4.2.5. One-step estimates for random and fixed normal models
Parameter Random x
Standard Error Fixed x
1011, 1
10111,
Po2
802
1112
PI2
6""
0""
10- 'oee.22 10- 20,, 10- ' p x
10- ' g e e 2 2 10-2mxx 10-'z
Estimate
Random x
Fixed x
10.272 37.332 0.803 17.510 30.536 10.803 12.016
0.082 11.174 0.090 4.368 7.338 2.567 0.545
0.082 11.174 0.090 4.368 7.338 0.342 0.070
346
MULTIVARIATE MODELS
TABLE 4.2.6. Final estimates for random and fixed normal models
Parameter Random x
Standard Error Fixed x
Estimate
Random x
10.272 37.338 0.803 17.422 30.538 10.804 12.016
0.079 11.169 0.090 4.106 7.328 2.568 0.545
Fixed x ~
10Pll
1081,
Po 2
802
812
812
6,”
6,”
10-2a,, 10-lpx
1O-lF
10- 1 6 c e 2 2
10- 16,c22 10- 2mxx
0.079 11.169 0.090 4.106 7.328 0.331 0.067
Estimates for the model obtained after iteration are given in Table 4.2.6. Because the initial estimates were quite good, the final estimates differ little from the one-step estimates. The estimated standard errors changed more on iteration than did the estimates. The residual sum of squares from the weighted regression is 1.58 and is the same for the fixed and random regression models. Under the null model the residual sum of squares is distributed as a chi-square random variable with two degrees of freedom. The model of this example is easily accepted for these data, under the assumption of normal homoskedastic errors. 00 We have seldom discussed the efficiency of our estimators, except to note that maximum likelihood estimators of the normal structural model are efficient by the properties of maximum likelihood estimators. The least squares results provide us some additional information on efficiency. We have the following results: (i) The vector of sample covariances is converging in distribution to a normal vector under very weak conditions. (ii) The limiting distribution of the least squares estimator constructed with a consistent estimator for the covariance matrix of vech m,, is the same as the limiting distribution of the least squares estimator constructed with the true covariance matrix of vech m,,. (iii) The distribution-free estimator of the variance of vech m,, converges to the covariance matrix of the sample covariances under the fourth moment assumption. Because least squares is asymptotically efficient for normal random vectors, it follows from (i), (ii), and (iii) that the least squares estimator is asymptotically efficient in the class of estimators based only on mzz.
4.2.
347
LEAST SQUARES ESTIMATION OF THE PARAMETERS
In least squares estimation we are able to construct a consistent estimator of the covariance matrix of the distribution of the sample covariances without knowing the distribution of the x,. An extension of this result has been used by Bickel and Ritov (1985) to construct efficient estimators in a more general setting. For other results on efficiency in measurement error models, see Nussbaum (1977, 1984), Gleser (1983), and Stefanski and Carroll (1985a). Because the normal distribution maximum likelihood procedures and least squares procedures retain desirable asymptotic properties for a wide range of distributions, because the covariance matrix of the approximate distribution of the estimators is easily computed for nonnormal distributions, because the estimators are relatively easy to compute, because the estimators have been subjected t o considerable Monte Carlo study, and because many distributions encountered in practice are approximately normal, normal distribution maximum likelihood and least squares are the estimators most heavily used in practice.
REFERENCES Anderson (1969, 1973), Bentler (1983), Browne (1974, 1984), Dahm (1979),Dahm and Fuller (1986), Fuller and Pantula (1982), Ganse, Amemiya, and Fuller (1983), Joreskog and Goldberger (1972), Lee and Bentler (1980), Shapiro (1983).
EXERCISES 13. (Sections 4.2.1, 4.2.2) Let the sample covariance matrix for 1 1 observations be
vech m,, = (70.8182, 13.8545,48.5636)’. Assume that the data satisfy the model
Ti = Po + Blx, (.x,,e,, u,)
-
+ e,, XI = x, + u,, NICO, diag(u,,, u,,, 5711.
Let it be hypothesized that PI = 0.5. This implies that ox” = uuD= -28.5. (i) Construct the least squares estimator of ( u x x ,uvu)using each of the two matrices (a) vech = (rn,,, mxv,mxx)’ = (69.1046, - 10.4270,48.5636)’ (b) vech E = (mu”, uxv,mxx)’ = (69.1046, -28.500,48.5636)’ in the estimated covariance matrix, = 2(n @ (ii) Complete one additional iteration !sing the first step least squares estimates obtained with the matrix of (b) to construct Vaa. (iii) Compare the least squares chi-square test of the model computed under the procedures of part (ii) with the likelihood ratio statistic computed by the method of Exercise 2.37.
vda
el$;.
348
MULTIVARIATE MODELS
14. (Sections 4.2.1, 4.2.2) Use the least squares method of (4.2.20)to estimate the parameters of the model
-
i=l,2,3,
Z,i=Po,+Z,+El,r
(zl, E , ~ E, , ~ ,EJ
N O , diag{u,,, utzl1,% r 2 2 ,
~,,3~1h
for the Grubbs data of Exercise 1.41, where Zti,i = 1,2,3, are the fuse burning times reported by the three observers. (a) Use m,, for W.Is the model accepted by the data? (b) Repeat the least squares procedure replacing W by the estimator of obtained in part (a). (c) Estimate the parameters of the reduced model in which usrll= ucElz2 = uCrrJ3 using m,, for W. Compare the model with equal error variances to the model of part (a). 15. (Section 4.2.2) Assume that E(mzz) = aZ, where Z is a known p x p nonsingular matrix and a is a parameter to be estimated. Show that if the least squares method is used with W of (4.2.29)equal to mzz, the least squares estimator is
ezz
B = [tr{(Zm;;)'}]
- tr{Em;;
}.
Show that if the least squares method is used with W = E,the least squares estimator is
2 = p - l tr{mzzZ-'}. Show that the estimator constructed with W = E is the maximum likelihood estimator under the assumption of normality. 16. (Sections 4.2.1,4.2.2)The following data are for three measurements on 10 units as reported by Hahn and Nelson (1970).The first measurement is made by the standard device and the last two measurements are replicates of measurements made by a new device. For the purpose of this example we assume Z,l = x ~ + E , ~ , Z , i = y o + y , x l + ~ , i , f o r i = 2 , 3
and (x19
Unit 1 2 3 4 5 6
7 8 9 10 ~~
Source:
E
~
-
&,A ~ ,
NI[(P~,01, dia&,,,
ucLEIurtZ2,%r22)1.
Device 1
Determination 1
Determination 2
71 108 72 140 61 97 90 127 101 114
77 105 71 152 88 117 93 130 112 105
80 96 74 146 83 120 103 119 108 115
~~
Hahn and Nelson (1970).
4.2.
LEAST SQUARES ESTIMATION OF THE PARAMETERS
349
(a) Estimate the parameters of the model by least squares applied to the covariance matrix with
S,,
(b) Let ( X I , X,, exercise
=
2(n - ~ ) - ~ ~ i f m B, mzz)f. ,
x3) = [ Z f I ,f(Z,, + Z,,), +(Z12- Zf3)].Show that under the model of this
Also show that the covariances between (muyl3 , myuz3)and all other sample covariances and m y Y z 3need not be included in the least squares of Y, are zero. Therefore, fit. Use these facts to derive the least squares estimators (equivalent to the maximum likelihood estimators) of the parameters of the model. Compute the estimated covariance matrix of your estimates and compute the lack of fit statistic. Note that the model is a reparameterization of the model of Section 2.2. (c) Compute the covariance matrix for (&,, &, (irrl under the assumption that the x values are fixed. Use (4.2.51) and the estimates obtained in part (b). 17. (Section 4.2.2) The estimates of pxl and prz of Table 4.2.4 are nearly equal and the estiare very similar. Using the estimates of Table 4.2.4 and uxrlI =uXx2,= mates of uXx1and 1.31 to construct an estimate of the covariance matrix of s, compute least squares estimates of the model with p x l = p x 2 = px and u , , ~=~ uixzZ= uxx.Is the restricted model acceptable? 18. (Sections 4.2.3, 1.4) Assume that, for a sample of 25 observations on Z, = (XI, I&, X,), vech m,, = (3.45, 2.49, 1.80; 6.42, 3.90; 4.18)' and the vector of sample means is the model
xi
=
(x,x,,R)
Pot+
X I = x, and (x,, ell, efzI4'
+ u,,
= (1.51, 3.37,2.63). Assume that the data satisfy
+ efi,
for i
- W(A,W,
=
1, 2,
block diag(o,,,
L %JI,
(a) Assume that urn"= I . Estimate the remaining parameters of the model and estimate the covariance matrix of your estimates. (b) Assume that uuu= 1 and that oerI2 = 0. Apply least squares to the covariance matrix to estimate the remaining parameters of the model. Use 2(n - l)-1$3(mzz @ mzz)+V3 as the estimated covariance matrix for vech m,,. (c) Complete a second iteration of the least squares procedure of part (b) using
eZz)f3 where ezzis the estimated covariance
2(n - I ) - ~ + ~ ( % 8~
as the estimated covariance matrix for vech m,. matrix constructed from the estimates of part (b). (d) Beginning with the estimates of part (b), complete a second iteration of the least squares procedure using the fixed-x covariance matrix (4.2.51)for vech mZz. (e) Compare the estimated variance for your estimator of PI of part (c) with the estimated variance of the single equation estimator (1.2.3) of Section 1.2 that uses only the knowledge of uuuand with the variance of the instrumental variable estimator (1.4.12) of Section 1.4 that uses only the knowledge that ueeelz = 0. 19. (Sections 4.2.1, 4.2.2) Use the method of least squares to combine the two sets of corn yield-soil nitrogen data analyzed in Example 3.1.1. Assume the model of Example 3.1.1. Include
350
MULTIVARIATE MODELS
the sample means in your vector of observations. For the first iteration use the sample covariances and Lemma 4.A.1 to estimate the covariance matrices of the sample covariances. Then compute an additional iteration using estimated covariance matrices for the sample moments constructed from your first round parameter estimates. Does the model seem appropriate for these data? 20. (Sections 4.2.3, 1.5) (a) Use the nonparametric estimate of the covariance matrix of the sample moments of the corn hectares data constructed in Example 4.2.3 to estimate the covariance matrix of the estimated parameters of the factor model constructed in Example 1.5.1. Because the factor model of Example 1.5.1 is just identified, the least squares estimators are equal to the estimators presented in Example 1.5.1. (b) Compute an estimator of the covariance matrix of (PI, X,n i r YIl, rnYrlZ,n i y X l m y y 2 2 , n ~m x x )~of the~ form~of ?, ~ of Example , 4.2.3 for the corn hectares data. Use this covariance matrix to estimate the parameters of the model of Example 4.2.5 by nonlinear generalized least squares. Construct a test for the model. 21. (Section 4.2.2) Prove the following.
v2,
Theorem. Let the assumptions of Theorem 4.2.1 hold. Let bmbe the value of f3that minimizes
Q J ~ )= loglzzzvq + t r { m z z m w . and let bd be the value of 8 that minimizes
va,,)=
- e(f3)]'9i1[s- do)], where s = vech mzz, a(@) = vech Zzz(f3),and ?, is given in (4.2.11). Then ."2(f$ - dm)2 0. Qc(@
4.3.
[S
FACTOR ANALYSIS
Factor analysis has a rich history in psychology beginning with the work of Charles Spearman (1904). The statistical development of the subject began with the work in least squares of Adcock (1878) and Pearson (1901). Harman (1976) describes the development of factor analysis and cites a number of applications in the physical and social sciences. The application of the method of maximum likelihood to factor analysis is primarily due to Lawley (1940, 1941, 1943) and is well described in Lawley and Maxwell (1971). Anderson and Rubin (1955) contains a detailed investigation of the factor model. The applications of the method often involve large quantities of data and there has been a continuing interest in efficient computational methods. 4.3.1.
Introduction and Model
The factor model was introduced in Section 1.5. In the model of (1.5.1) the elements of an observation vector are specified to be the sum of a linear function of a single unobserved factor x, and an error vector. Specification
4.3.
35 1
FACTOR ANALYSIS
of the nature of the covariance matrix of the error vector and the independence of x, and the error vector permitted the estimation of the parameters of the model. In this chapter we extend the model to higher dimensions. In the simple model of Section 1.5, the number of parameters to be estimated was equal to the number of sample moments. As a result, we were able to obtain explicit expressions for the estimators. This is generally not possible for models of higher dimension. Let a p-dimensional vector of observations on the tth individual be denoted by (4.3.1) It is assumed that the vector can be expressed as the sum of a linear function of a k-dimensional factor vector f, and a p-dimensional error vector e,. It is often assumed that the unobserved factor vector is distributed as a normal random vector with mean zero and positive definite covariance matrix. Under the normal distribution assumption,
Z, = pz
+ f,A' +
(4.3.2)
81,
where
A is a k x p matrix, and pz is the mean of the Z vector. The matrix A is sometimes called the matrix of factor loadings. We also assume that Z C Eis a diagonal matrix. Because f, and e, are normal random vectors, the vector Z, is normal with mean pz and covariance matrix V{Zi} = AZ,,A'
+ ZEE.
(4.3.3)
In the terminology common in factor analysis that was introduced in Section 1.5, is called the unique factor for the variable Z,, and the elements off, are called the common factors. The fraction of the variance of Zti that is due to the variation in the common factors is called the communality. The communality for the ith observed variable is
Ai.ZffAi.(ozzii) where Ai, is the ith row of A and ozzii is the variance of ZIi.The variance oEEii is sometimes called the uniqueness of the variable Z r i .Uniqueness is also often expressed as a fraction of total variance. The model (4.3.2)contains p means and '9
pk
+ )k(k + 1) + p
+
(4.3.4)
other parameters, where pk is the number of elements in A, ik(k 1) is the number of unique elements in C,,, and p is the number of diagonal elements in Zee.
352
MULTIVARIATE MODELS
We can transform both f, and A without altering the content of the model. That is, if we let G be an arbitrary nonsingular k x k matrix, then
Z, = pz + f,GG-'A' + 8, (4.3.5) = pz + g,B + e,, where g, = f,C and B = G-lA'. Because of (4.33, the model is often parameterized by assuming Zff = I. While there are p k + $k(k + 1) + p parameters entering the covariance matrix of model (4.3.2),the existence of the arbitrary transformation matrix G means that there are only pk
+ i k ( k + 1) + p - k2 = k ( p - k ) + f k ( k + 1) + p
(4.3.6)
independent parameters associated with the model for Zzz. The sample covariance matrix m, contains +p(p + 1) unique elements and a necessary condition for the model to be identified is that the number of unique elements equal or exceed the number of parameters. By (4.3.6)the condition is ( p - k)' 2 p
+ k.
(4.3.7)
If the inequality (4.3.7)is not satisfied, it is possible to identify the model by imposing additional restrictions on A or (and) C f f or (and) Ze,. We can also consider the model expressed in a form consistent with the errors-in-variables development of earlier chapters. To this end the vector Z, is divided into two parts,
z, = w,, Xt),
(4.3.8)
where X, is a k-dimensional vector, Y, is an r-dimensional vector, and p - k. We assume
Y,= Po + x,P + e,,
or, equivalently,
X,= x, + u,,
Y
=
(4.3.9)
z, = ( P o , 0) + x,(P, 1) + e,,
where E, = (e,, u,). The assumptions that Zeeis diagonal, that Xxxis positive definite, and that x, is independent of e, are retained. The model contains p unknown means and ( p - k)k
+ p + $k(k + 1)
(4.3.10)
unknown parameters in the covariance matrix, where (p - k ) x k is the dimension of p, p is the number of diagonal elements in C,,, and t k ( k 1) is the number of unique elements in Zxx.Note that the number in (4.3.10) is the number of unique parameters obtained in (4.3.6) for the alternative parameterization.
+
4.3.
353
FACTOR ANALYSIS
The specification that Exxis positive definite together with (4.3.7) are necessary conditions for identification of /I and Ze,,but they are not sufficient. To understand this, recall the one-factor model in three variables. Identification of all parameters of that model required that the coefficients of the factor (Dl and /j21of (1.5.1)) be nonzero. In the present model, if a column of fl is composed of all zeros, it may not be possible to estimate all parameters of the model. In practice, care should also be taken in the specification of the X, vector, to guarantee that Exxis nonsingular. Before proceeding to other methods of estimation, we observe that the columns of P can often be estimated by the method of instrumental variables. Without loss of generality, consider the first equation of (4.3.9) (4.3.11) where fl,l is the first column of /?. The model specifies x, to be independent Y 3 , . . . , I;,p-kare indeof e, and E,, to be diagonal. This means that pendent of e, Therefore, if
x2,
p b 2k
+ 1,
(4.3.12)
it may be possible to use the method of instrumental variables to estimate
fl.l, The condition (4.3.12) is stronger than condition (4.3.7). However, for
many problems the instrumental variable estimators, which are easy to compute by two-stage least squares or limited information maximum likelihood, can furnish preliminary information about the parameters of the model. Section 2.4 contains a description of the instrumental variable estimators and Example 2.4.1 is an application of the method to the factor model. 4.3.2. Maximum Likelihood Estimation We now obtain the maximum likelihood estimators of the parameters for the normal model. Our development follows closely that of Lawley and Maxwell (1971). In Section 4.1 we obtained an explicit expression for the estimator of /? under the assumption that an independent estimator of &, is available. In the factor model, C,, is a matrix containing unknown parameters to be estimated from the Z, data. This complicates the estimation procedure and no explicit expression for the maximum likelihood estimator has been obtained. We develop some properties of the likelihood function and outline methods for numerical maximization of the likelihood function in this section. Material following Result'4.3.1 can be easily omitted by persons not interested in computational procedures.
354
MULTIVARIATE MODELS
Two parameterizations of the factor model have been developed. The first is that given in (4.3.2),which we write as
z, = pz + f,A + e,,
(4.3.13)
where the matrix of factors satisfies
E{f,} = 0, E{fifr} = I and there are $k(k + 1) + k(p - k) independent parameters in A. With this form of the model, the covariance matrix of Zr can be written CzZ = AA‘
+ Gee.
(4.3.14)
The second form of the model is that given in (4.3.9). In the parameterization of (4.3.9) xzz = (B, I ) ’ x x x ( B * 1) +
Under the normal model, the density of Z, is (271)-p’2pzz\-
1’2
exp{
-w,- p z ) C 2 ( Z r - /by},
(4.3.15)
where we assume that the elements of C:,,,Ex,, and fl are such that ICzzl > 0. The logarithm of the likelihood for a sample of n observations is log L = -+np log 271 - +n logJZ,,{
-in@
- p z ) ~ ; i ( Z-
-+(n - 1) tr(mzzc,-,l}
az)’.
(4.3.16)
It is clear that the maximum likelihood estimator of pz is Z, Pz =
z = (So + P,k PA,
(4.3.17)
because replacing pz by PZ reduces the last term on the right side of (4.3.16) to zero. As we did in Section 4.1, we define the logarithm of the reduced likelihood function adjusted for degrees of freedom by log L, =
-&I - I)(p log 271 + logJC,,I + tr{mzzZ;i}).
(4.3.18)
The maximum likelihood estimators of the parameters in Czz adjusted for degrees of freedom are described in Result 4.3.1. Result 4.3.1.
Let
Y, = + xtB + er, Xr = + u,, (el, x,)l NIW, ax)’,block diagF,,, Wl,
-
where C,, is a diagonal matrix with positive elements, Xzz and C,, are positive definite, and (p - k)2 3 p + k. Then the maximum likelihood estimators
4.3.
355
FACTOR ANALYSIS
adjusted for degrees of freedom satisfy the equations
j = E,*exy, 2;i(mzz - gZz)2iiA = 0, diag(m,, - e,,) = 0,
ezz e,, e,,
e,,
(4.3.19) (4.3.20) (4.3.21)
where = + and = AA',provided the maximum of the likelihood occurs in the interior of parameter space. ,
Proof. The derivatives of the likelihood function with respect to the elements of A are evaluated in Theorem 4.C.1 of Appendix 4.C and Equation (4.3.20)is the same as Equation (4.C.6).The derivative of (4.3.18)with respect to the ith diagonal element of & is
Because the partial derivative of Ezz with respect to crzeiiis a matrix with a one in the ii position and zeros elsewhere, we have the p equations, diag{&i(mZz - X z Z ) & ~ } = 0,
(4.3.22)
where diag m is the diagonal matrix composed of the diagonal elements of
m. If Xee is positive definite, = (AA'
+ X,,)-' = &..'- C,i'A(I + A'ZzL'A)-'A'Ze~'.
(4.3.23)
Multiplying Equation (4.3.23)on the right side by A, we find that Equation (4.3.20),as an expression in Zzz and A, can also be written
(mzz - Zzz)Ze;'A(1
+ A'Ce;'A)-'
= 0.
(4.3.24)
Using (4.3.23)and (4.3.24),we can write (4.3.22)as diag{C, '(mzz - Zzz)C, '} = 0
(4.3.25)
and, because E,, is diagonal, we obtain (4.3.21). We now give some results that suggest an iterative method of computing the estimators.
Result 4.3.2. Let gZ,be the maximum likelihood estimator of ZZeadis in the interior of parameter justed for degrees of freedom. Assume that space. Then the maximum likelihood estimator of adjusted for degrees of freedom is given by
eez
= ArkA&',
(4.3.26)
356
MULTIVARIATE MODELS
where A = (&k,
A~J, &A
= mz&l
- ~ ~ ' ) ' ' ' f i ~ 1 , ,(1 a . .
- A;')'''fi,k],
(4.3.27)
R,j are the characteristic vectors of tz, in the metric m, and 2;' < A;' < * * < 2; are the p roots of in the metric m,,. The maximum likelihood estimator of the covariance matrix of z, adjusted for degrees of freedom is
e,,
e,, = AAt. (4.3.28) It can be shown that if e,, satisfies the equations in Result 4.3.1,
Proof. then A;' < 1. Thus, the result follows from Theorem 4.C.1. The maximum likelihood estimator of A adjusted for degrees of freedom was obtained in that theorem for known Zee.If A maximizes the adjusted likelihood conditional on and if is associated with the global maximum, then ($ gee) give the global maximum of the adjusted likelihood.
e,,
e,,
Result 4.3.3. Let the model of Result 4.3.1 hold. Let the vector of parameters be
,,
Y = cy;,
r:. Y;Y,
where yp = vec j, 7; = (ueei r . ~ , , ~.~.,. , oeepp),and yx = vech Xxx. Consider an iterative Gauss-Newton nonlinear least squares procedure in which a step of the procedure is the application of generalized least squares to the approximating linear system g = P(y - 7)
+ a,
(4.3.29)
where 7 is the value ofy obtained at the preceding step, g = g(7) = vech[mZz Xzz(f)], B = F(f) is the matrix of partial derivatives of vech Zzz = vech Zzz(y) with respect to y' evaluated at y = f, and E{aa'} is approximated with
(n - 1)- 'fi = 2(n - 1)- l*p[Xz,(f) 0 Xzz(7)]*;. Let the estimate at the next step be i,where
i - f (pfi-lp)-lpfi-l= g.
(4.3.30)
Then the estimate of y, defined in (4.3.30)is
9,
=
pa,
(4.3.31)
where fi is the matrix whose 0th element is the square of the 0th element is a column vector whose ith element is the ith diagonal element of eE;' of
c',a
mu$~'Cr,
4.3.
357
FACTOR ANALYSIS
with Zuu= C’Z,,C, e’ = (I, -st),muu= c‘m,,C, and X&&= diag(Z&&ll, Z€&,,, . . C&&,,). I
I
-
*
9
Proof. Let F = F(y) = (Fp,F,, Fx), where 0 (BYlk)’xxx], Fx = Ik)l 0 (8,IkY]@k, F, = +,L, and L is the p 2 x p matrix defined in Result 4.A.3.2. Note that the partition of F conforms to the partition of y. To obtain an explicit expression for the estimate of ye at the next step in the least squares procedure, we shall apply a linear transformation to the system (4.3.30). The objectives of the transformation are to isolate the coefficients of y p and y x and to reduce the covariance matrix of the error vector to a block diagonal. Let
F/J= 2#p[(Ir,
Ork)’
$,[(fly
(Vf,
where H
= (C, H2) and
4) = Z,(C, H2) = ZfH,
H, = (Ok,, Ik)l - CE;ul&,u. Then,
V{(vl,q,)’) = H’&H
= block diag(Z,,, Z,J.
If the m,, of (4.3.29) is replaced by the moment matrix of Z,H, the transformed error vector is Q a = vech{H’m,,H} - vech{H’X,,H}, where Q‘ = +&H’@ H)Q,and the variance of (n - 1)’I2Q’ais W,[block diag{x,,,
x,,,})0 (block diag{Z:,,, x,,,,})I@b.
If we rearrange the elements of Q’a and write a, = [(vech mJ, (vec rn,J’, (vech m,,)’]’ - [(vech ZJ, (vec Ok,)’, (vech Cav)’]‘,
(4.3.32)
then the variance of (n - 1)’I2a, is block diag{V,, V2,V,}
=
a,,
(4.3.33)
where ‘1
= 2$r(zuu
0 &JIJ)$;~
v2
= Zuv @ Z:sq?
v.3
= 2$k(Zqq
The transformed derivatives are Q’Fp = 2$p[(Ir, - x i l L u ) ’ 0 (Okr, ZxxY]? Q’F, = +,(H‘ 0 H)L, Q F x = $p[(Okr? Ik)l 0 {Okr, Ik)l]-
8 Zqq)@ks
358
MULTIVARIATE MODELS
Arranging the transformed observations based on the estimated transformation in the same order as a*, we have the transformed version of (4.3.29): (4.3.34)
Because fi, is block diagonal, because (yb, y:) does not appear in the equations for g;*, and because the dimension of (yb, )7: is the same as that of (g2*, g3*), it follows that the generalized least squares estimator of (y - 7) obtained at the next step is
(4.3.35)
ye - 7, = [L(c~c)$:B;'$,(c'~c')L]-'L(c~c)$:B;'gl*. Using g,, = vech{m,, - e,,}, Lfe = vec gee, the expression for el given in
(4.3.33), and Result 4.A.2, we have fe
=
[L{(c~&'c') @ (CE&'C')}L]-'L
v e c ( ~ ~ , , m , , ~ ~ (4.3.36) '~>.
Expression (4.3.3 1) follows from (4.3.36) and Theorem 4.2.2.
0
The importance of Result 4.3.3 is that, at every step in an iterative procedure using the current estimate of the covariance matrix, the estimated generalized least squares estimator of ye can be computed from Equation (4.3.36),ignoring the remaining portion of the large system. The estimator of ye is given in (4.3.31). Results 4.3.2 and 4.3.3 suggest an iterative method for obtaining the maximum likelihood estimators. Given an initial estimator of Z,,, one uses Result 4.3.2 to obtain an estimator of A (or /3). Given the estimator of A, an estimator of Eeeis computed by the method of Result 4.3.3. One way to initiate the procedure is to use the residual mean square of the regression of each Zion all other Z,'s as the estimator of oEEii. This procedure of constructing start values requires n to be greater than p and the estimator of oeeiiwill be too large on the average. An efficient method of computing this start vector is to use the reciprocals of the diagonal elements of mi;. Because of the form of $& in Result 4.3.3, it is not necessary to compute A of Result 4.3.2 at each step in the application of the suggested iterative
4.3.
359
FACTOR ANALYSIS
c,,
method. Let be the current trial value for &. Let 1; be the p roots of
-
I&,
< 1, < . - < 1;' (4.3.37)
- A-'mzzl = 0,
and let R = (&., fi,,,) be the matrix of characteristic vectors of metric mzz. Then by (4.1.50),(4.1,51), and (4.1.52),
R( r ) ( r )fil cE;"'mu"%;e = R(,)A&R,,, A(,)= diag{&+,, , & + 2 , . . . ,5,. Thus, the next-step cy;1e1=
A(,.)
and $, -
-
(4.3.38) (4.3.39)
(r))
where defined in (4.3.31)requires only and &,). If E,, is the maximum likelihood estimator, then diag{&,)(A:)
E,, in the
&,))k;,))= diag{c,
estimate of 7,
'(mzz - E,,)E; '}
- 'L vec[&,)(A& - A(,))&,)] v, = B-
=0
= 0.
Hence, the maximum likelihood estimator is a stationary point for the method. Convergence of the iterative procedure is not guaranteed. However, the generalized sum of squares associated with (4.3.34)decreases in the direction from y' to f when 7 is not a stationary point. Therefore, by Lemma 4.2.1, it is possible to find a point between $and j such that the likelihood is increased and the iterative procedure can be modified so that it will converge to a local maximum of the likelihood function. Fuller and Pantula (1982) suggested a further modification of the method. They approximated the expressions (4.3.38) and (4.3.39) by &,)&,)-and ~,,,A,,,k;,,, respectively. Their modification is based on the fact that if E,, is the true Ce,, then A(,)estimates I,. The construction of the least squares estimator of Z,, is relatively simple if some of the estimates of aEEii are zero. Should B-'d produce a negative estimate for oeSii, the estimate for that element is set equal to zero. Then the associated row of the system of equations By^, = d
and the corresponding column of B are removed. The reduced system of equations is solved for the remaining elements of y,. From the derivation of the estimators it is clear that the maximum likelihood estimator of C,, is a function of m"u= (n - 1)-1
f=l
q,,
360
MULTIVARIATE MODELS
where i, = (2,- Z)(I, it seems that
-/?)’. On the basis of expansions such as that of (4.1.70) (n - k - l)-I(n - l&&
(4.3.40)
will be a less biased estimator of Cecthan the maximum likelihood estimator. If the estimator (4.3.40)is used, it is suggested that the covariance matrix of the estimators be estimated by (n - k - 1)-2(n- l)’fEE,
where re& is defined in Theorem 4.3.1 of the next section. The computational algorithm we have outlined is described by Fuller and Pantula (1982).It is closely related to an algorithm given by Joreskog (1977). Joreskog and Lawley (1968)(see also Joreskog (1967))and Jennrich and Robinson (1969)have given computational algorithms for the factor model. 4.3.3. Limiting Distribution of Factor Estimators In this section we derive an expression for the covariance matrix of the limiting distribution of the maximum likelihood estimators of the parameters of the factor model. Theorem 4.3.1. Let Zr = (Yt, Xr) = (Po, 0) + xr(P, 1) + er, (et,
NI[(O, PJ, block diag(&&,CXx)],
XJ‘
where ZEEis a positive definite diagonal matrix, C,, is positive definite, and the model is identified. Let
r*’ = (r*b, r*L9 jYJ
= [(vet
b)’,(ae&i 1,
9
aeepp), (vech %,)’I
be the maximum likelihood estimator adjusted for degrees of freedom and let y’ denote the corresponding vector of parameters. Let b0 = I? - @. Assume that A, the parameter space for y, is a convex, compact set. Assume that the true parameter value of y is an interior point of A, and that C,, evaluated at any value of y in A other than the true value differs from C,, evaluated at the true value. Then n1’2[(b,,
where roo
roll
rpo
rPR
rxo
rxp
F)’- (Bo, Y’YI 5 N O , r), roc
rox
rpe
rpx
rxe
rxx
4.3.
361
FACTOR ANALYSIS
Zcc= Xu&'Zuu, roe = (roo, T o DrOh, , rOx), Ik is the k-dimensional identity matrix, C' = (Ir, Okr is a k x r matrix of zeros, and L is the p 2 x p matrix defined in Result 4.A.3.2 of Appendix 4.A.
r),
Proof. By Corollary 4.B.2, n112(f - 7) 1;N [ O , ( F % - ~ F ) - ~ ] ,
where R = 29,(Z,, 0 Ezz)*b is the covariance matrix of (n and F is the matrix of partial derivatives
F=
1)'l2
vech m,,
a vech Z Z Z ( 7 ) 8Yl
Therefore, it suffices to obtain expressions for the elements of (F'R-'F)The matrix (F'R-'F)-l can be considered to be the covariance matrix of the generalized least squares estimator f of y computed from the system g = Fy
where g is a 2 - ' p ( p
+ a,
+ 1)-dimensional vector of observations,
(4.3.4 1)
a = vech[(Z, - az)(Zt - a,)' - Z Z Z J and, by the properties of the normal distribution, E{aa'} = R. In the proof of Result 4.3.3 we gave expressions for the matrix F and demonstrated how the transformation
Q = (Ir,(H 0 HI@, transformed the system (4.3.41) into a system g* = F*Y + a*
(4.3.42)
with a block diagonal error covariance matrix. Therefore, we evaluate the elements of
(F*R; IF*)- 1,
362
MULTIVARIATE MODELS
where F, is defined following (4.3.34) and (4.3.35)we have
fP- yP = (1, @ x;:)
is defined in (4.3.33). Using
+
vech m,, - Pl(f&- ye) op(n- l”), 6 Ik) vech m,, vech(m,, - z,,)
fX- yX = 2+k(&&i1
- PAf&- Ye)
+ op(n-1’2),
+
where
{2$k(&&u @1Ik)(C @
+ $k(HZ 6H 2 ) I L
= $k[(27&,zG1c’
+ H 2 ) 6H;]L
= P,.
Hence, using (4.3.33),(4.3.42),and the fact that vech mu”is uncorrelated with kZ*,
g3*L
rsa= (I, 6m v 2 ( ~6r + p,r,,p;, Tax= 2(1, o~ 9 v ~ (6z IM ; +~p,r,,P2, ~ ~ ~ ~ rxx
= 4$k(xuvzL1 = 4$&
@ lk)VZ(z~lZvu
0 z,,)&
6Ik)& + v 3 + pZre&pZ
+ 2$k(C,, 6 z,,)$; + pzr&&p29
rer= 2[L(c 6 c)$;@;(~;~l6E;~~)@,$,(C’6 C)L]= 2[L{(czu;’c’)
0 (cz;1c))L]-1,
where V, and V3 are defined in (4.3.33)and we have used
@,#,(C’Q C’)L = (C‘ 6 C‘)L. Algebraic substitution gives the expressions of the theorem statement. The covariance matrices for fi0 follow from the definition of 8,. 0 The expressions of Theorem 4.3.1 are rather complicated, but all expressions are straightforward matrix functions of the parameters. Because the parameters can be estimated consistently, estimators of the covariances can be constructed by substituting the corresponding estimators for the unknown parameters. The variance expressions for 1 and of Theorem 4.3.1 warrant comparison with the variance expressions of estimators of the same parameters given in Theorem 4.1.1. The expressions of Theorem 4.3.1 each contain an additional term that is introduced because the sample is used to estimate Ze&. In both cases it is possible to express this term as a function of the covariance matrix of the estimators of Xe,. The expressions for the covariance matrix of j?s and 9, are appropriate for a wide range of specifications on the x,. In particular, the expressions re-
ex,
4.3.
363
FACTOR ANALYSIS
main valid for fixed x, and for nonnormal, but random x,. The normality of the e, is required for all variance expressions, but modest deviations from normality should have modest effects on the covariance matrix of ga. Deviations from normality will have larger effects on the covariance matrices of the estimators of C,, and ZEE.It is clear that the estimators will converge in distribution to normal random variables for any distribution for which n1I2 vech(m,, - Zzz) converges to a normal vector. The likelihood ratio test of the adequacy of the factor model relative to the unconstrained model has the same form as that obtained in Theorem 4.1.3 of Section 4.1. If we let the null model be the identified normal factor model (4.3.9)and let the alternative model be the unconstrained normal distribution model, the negative of twice the logarithm of the likelihood ratio computed from the likelihood adjusted for degrees of freedom is
= ( n - 1)
P
1
j=k+l
log A;',
(4.3.43)
wherefi;' 0. Prove that the least squares estimator of (a,,, aZ2.,)based on the sample covariances is
@,,, b22.1)= (mil, mt2 - 2pmI2t p2m11), where m,, = mzzi, = ( n - l ) - ' Z:= ( Z , , - ZJz. (b) Show that the estimator of part (a) is the maximum likelihood estimator adjusted for degrees of freedom. 25. (Section 4.3) (a) Assume that z, = (Z,,,z , 2 , , . . , 2 1 6 ) = ( X , . Yzr X 3 , x 4 , X,I, X12) satisfies the factor model
-
where x, NI(0, ZJ. Is PI identified for ail positive definite Z,,? Explain. (b) Assume Z, NI(pz, Zzz),with 5
1.00 0.90 0.90 0.10 0.20 0.90 1.00 0.90 0.10 0.20 0.10 0.10 0.10 1.00 0.70 0.20 0.20 0.20 0.70 1.00
Does Z, satisfy the factor mode! in two factors? Are the parameters of the model identified? 26. (Section 4.3.3) The data in the table are observations on the vital capacity of the human lung as measured by two instruments, each operated by a skilled and an unskilled operator. The observation identified as X is the standard instrument operated by a skilled operator, Y,
380
MULTIVARIATE MODELS
is the standard instrument operated by an unskilled operator, Y2 is the experimental instrument operated by a skilled operator and Y3 is the experimental instrument operated by an unskilled operator. The data are described in Barnett (1969). Assume that the data satisfy the factor model
X j = Poj + x , P l j + eljr XI = X, Em
+ u,,
8;
-
i = I , 2, 3, NI(0, Z,,),
= diag(aee1lr ~ e e 2 2 am339 ,
0uu)r
Readings of lung vital capacity for 72 patients on four instrument-operator combinations 345 131 382 211 186 194 236 288 198 312 176 148 184 358 188 240 222 254 92 224 224 226 386 278 222 188 94 248 166 404 254 178 128 194 176 204
353 132 372 288 142 178 226 292 172 318 163 176 166 348 200 232 212 250 120 216 213 251 418 210 140 182 96 222 178 418 256 170 130 206 200 166
Source: Barnett (1969).
403 161 415 274 154 202 243 265 180 325 139 170 140 368 209 255 229 262 64 230 203 240 398 189 184 190 106 215 176 400 208 139 80 203 186 ' 147
312 160 370 252 169 180 235 286 166 304 120 164 165 396 207 248 227 196 103 230 214 245 368 200 136 184 100 215 180 377 225 120 113 188 186 116
106 200 228 194 258 140 126 232 200 240 288 342 100
140
188 128 312 377 342 274 284 380 210 182 140 220 194 326 196 132 284 206 220 126 304 214
100
180 228 180 270 144 110 242 194 190 298 315 113 140 171 126 300 334 322 288 292 374 168 140 132 168 190 320 194 126 306 184 197 115 284 218
85 127 238 167 285 168 100 236 198 147 324 320 65 135
160
116 31 1 390 312 285 27 1 344 I65 106 135 164 182 325 189 114 365 172 190 86 285 256
60 170 235 158 21 1 148 103 236 198 174 314 320 84 138 135 133 325 370 329 288 275 340 193 105 110 111 127 327 192 100 35 1 178 227 115 267 272
4.3.
381
FACTOR ANALYSIS
-
where x, NI(0, uxx)and x, is independent of ei for all i and t . (a) Estimate the parameters of the model and estimate the covariance matrix of your estimates. Plot I?,, against i ,for j = I, 2, 3 and plot X,- i ,= CI4 against i,. (b) Estimate the reduced model in which P I , = 1 and P12= /Il3. Are the data compatible with this model? (c) Estimate the reduced model with Pol = 0, P I , = I , Pf12= Po3, and PI2 = Are the data compatible with this model? 27. (Sections 4.1, 4.2, 4.3) The data in the table are observations made at three stages in a production process. (a) Fit the model of Example 4.2.3 to the data. That is, assume that the observations, denoted by satisfy
xj,
Yj = Poi + x, + eIj,
-
where eIj NI(0, ojj), el, is independent of eij for t # i, and e,j is independent of xi for all f and i. Do you feel the unit coefficient model is appropriate for these data? (b) Fit the unrestricted factor model to the data. Fit the factor model with uszll = ( T ~ ~ (c) Fit the model of Section 4.1,
Z,,
-
=
for i = 4 2 ,
Pol t PlIz13 +
zl3 = 213
+
E13r
under the assumption el NI(0, la’). (d) What model would you choose for these data?
Stage
Stage 1
2
3
Observation
1
2
3
34 34 26 29 24
48 58
44 56 53
63 63 53 62 62
21 22 23 24 25
42 41 35 44 40
54 57 52
69 68 68 69 65
46 45 41 45 40
58 59 53 55 59
26 21 28 29 30
40 41 37 32 42
55
10
29 34 27 29 40
46 47
67 69 70 66 63
I1 12 13 14 15
30 29 38 34 32
45 52 53 50 49
62 60 63 60 58
31 32 33 34 35
27 36 37 37 32
41 52 48 51 40
53 63 69 62 60
16 17 18 19 20
38 30 41 40 39
53 49 54 52 48
63 56 69 63 63
36 31 38 39 40
34 24 34 40 36
45 41 50 53 49
63
Observation 1
2 3 4 5
6 7 8 9
54
49
50 54
58
59 68 67
~
~
.
382
MULTIVARIATE MODELS
28. (Section 4.3.3) Estimate the factor model in two factors for the seven language items of Table 2.4.1 obtained by deleting careful. Use logical and irritating as the variables defining the factors. Compare the estimates to those of Example 4.3.3.
APPENDIX 4.A.
MATRIX-VECTOR OPERATIONS
In this appendix we present notation for performing various operations on the elements of matrices. When interested in functions of elements of a matrix, it is often more convenient to arrange the elements in a vector.
Definition 4.A.I. Let A = (ai,) be a p x q matrix and let A,j denote the j t h column of A. Then vec A = (al a,,, = (A:l, A:,,
. . . , apl,a12,a Z 2 , .. . ,a p 2 , .. . ,alq,a Z q , .. . , apqY . . . ,A:q)'.
Sometimes, to identify the elements of the original matrix more clearly, we will separate the elements of different columns with a semicolon and write vec A = (all, a z l , . . . ,apl;a I 2 , .. . , ap2;. . . ;a l q , .. . , apq)', Note that vec A, also written vec{A}, is the vector obtained by listing the columns of A one beneath the other beginning with the leftmost column. It follows that vec{A') = (all, a t 2 , . . . , alq,azl,a 2 , , . . . a z q , .. . , apl,up,,. . . apq)l = (AL.9 A2.7. . . , Ap.)l, where Ai, in the ith row of A, is a listing of the rows of A as column vectors one below the other commencing with the top row. If the matrix A is a symmetric p x p matrix, vec A will contain $p(p - 1) pairs of identical elements. In some situations one will wish to retain only one element of each pair. This can be accomplished by listing the elements in each column that are on and below the diagonal.
Definition 4.A.2. vech A
Let A = (ai,) be a p x p matrix. Then
= (all, a z l , .
. . ,apl,aZ2,a 3 2 . .. . , ap2,a 3 3 ,a 4 3 r . .. ,a p 3 , .. . ,aPa)l.
The vector vech A, also written vech{A}, is sometimes called the vector half of A. As with vec A, we will sometimes separate the elements of different columns with a semicolon and write . a22,a32,. . . ,a p 2 ; .. . ;aJ. vech A = (all, u ~ ~. . ,, apl;
For symmetric A, vech A contains the unique elements of A. Therefore, it is possible to recreate vec A from vech A.
APPENDIX 4.A.
383
MATRIX-VECTOR OPERATIONS
Definition 4.A.3. Let A = (ai,) be a p x p symmetric matrix. Let CD, be the p z x t p ( p + 1) matrix such that and define $, by
vec A = @, vech A,
(4.A. 1)
+, = (CDp,)
(4.A.2)
-
'CD'Z.
Note that CD, is unique and of full column rank and that vech A
= $,
vec A = (CDbCDp)- lCDbCDp vech A.
If no confusion will result, we may omit the dimension subscript p from linear transformations of vec A into vech A, but the transformation JI, which is the Moore-Penrose generalized inverse of a,is particularly useful. Single subscripts could be used to denote the elements of vec A and vech A, but it is more convenient to retain the original double subscript notation associated with the matrix A. In the double subscript notation for the elements of a matrix, the first subscript of vec A is nested within the second because the elements of vec A are listed by column. If the elements of @ are defined with double subscripts to match those of vec A and vech A, we have, for i = l , 2, . . . , p , j = 1,2, . . . , p , k > / = 1 , 2, . . . , p.
a,.There are many
I if (i,j ) = (k, d ) 1 if ( i , j ) = (e, k ) 0 otherwise. If follows that the elements of WCD are given by [@'@]ij,kd
i
1 if (i, j ) = (k,e), i = j = 2 if ( i , j ) = ( k , d), i # j 0 otherwise.
Therefore, if the elements of $ are defined with double subscripts to match those of vec A and vech A, we have [$]ij,ks
= t(dkjdsi
+
dkidsj)r
where dij is Kronecker's delta, 1 ifj=i
Also,
j ,< i,
(4.A.3)
384
MULTIVARIATE MODELS
An element of the product of two matrices identified in the double subscript notation can be written as a double sum. Let A = ('ij.ks)
be an mn x pq matrix and let B = (b,,,UU) be a pq x dv matrix. Then CAB]ij,uu
=
f
k = l s=l
aij,ksbks.uv.
(4.A.4)
Definition 4.A.4. The Kronecker product of a p x q matrix A = (aij)and an rn x n matrix B = (bij), denoted by A 63 B, is the pm x q n matrix
a11B a12B . . u ~ , B uZ~B UZ~B. . . u ~ , B
AQB=
aplB aP2B
...
a,,B
In the double subscript notation the (ij, ks)th element of A C3B is
LA
0 B]ij,ks = (Ijsbik*
(4.A.5)
Some elementary properties of the Kronecker product are given in Result 4.A. 1. Result 4.A.I. Then
Let the matrices A, B, C, and D be suitably conformable. (4.A.6)
(ii)
(A 0 B)(C 0 D) = (AC) C3 (BD), (A 0 B)-' = A - ' 6 B-',
(4.A.7)
(iii)
(A @I B)' = A' 0 B',
(4.A.8)
(1)
(iv)
(A
+ B) 0 (C + D) = (A 0 C) + (A 0 D) + (B 0 C ) + (B C3 D).
(4.A.9)
Proof. Reserved for the reader.
0
The next two results establish relationships involving Kronecker products and the matrices CD and @. by
Result 4.A.2
Let the p 2 x p 2 symmetric idempotent matrix K~ be defined K p = CDP@, = @,(@;CDp)-'Wp,
(4.A.10)
APPENDIX 4.A.
385
MATRIX-VECTOR OPERATIONS
where a,, is defined by (4.A.1) and @,, is defined in (4.A.2), and let A be a p x 4 matrix. Then %,,(A6 A) = (A 6 ANq.
(4.A.11)
Proof. Note that a typical element of K,, is given by C~p]ij,st
= +(SisSjt + SitSjs).
Using this definition we have
= [(A
The matrix
K,, is
0
6 A)~qlij.km.
a projection matrix that creates a p' x p 2 column whose
(ij, km)element is the average of the (ij, km) and (ji, km)elements of the original p 2 x p 2 vector. Result 4.A.2 follows from the fact that the average of aijakmand ajiakmcan be obtained as the average of the ij and j i elements of
column km of A 0 A or as the average of the km and mk elements of row ij of A 6 A. Result 4.A.3.1.
Let A be a p x p nonsingular matrix. Then [$,,(A @ A)$;]-'
= (Pb(A-'
0Ad')@,,.
(4.A.12)
Proof. We verify the result by multiplication, $,,(A 6 A)$bUJb(A-' 6 A-')@,, = $,,(A 6 A)K,,(A-' 0 A-')@,, = $,,(A @ A)(A 0 A)-'K,@, = I.
Resuff 4.A.3.2. defined by
0
Let L (sometimes denoted by Lp) be the p 2 x p matrix L = ~ I . , 6 r , l , ~ , * ~ ~ , 2 , . ~ ~ , ~ . ,(4.A. , 6 13) ~ . ~ ~ ~
386
MULTIVARIATE MODELS
where I.i is the ith column of the p x p identity matrix. Let A and B be p x p matrices. Then
-
(i) (A 6 B)L = (A.1 6 B.1, A.2 6 B.2, A.p 6 B.ph (ii) the jith element of L(A 6 B)L is ajibji, (iii) vec(diag d) = Ld, 4
9
where A,i is the ith column of A, B,i is the ith column of B, aji is thejith element of A, bji is the jith element of B, and diag d is a p x p diagonal matrix whose diagonal is composed of the p elements of the vector d.
Proof. By (4.A.6),
(A 0 W(1.i 0 1.i) = (A1.i) 6 (B1.i) = A.i 6 B,i and the result (i) follows. Again applying (4.A.6) we have
(IIj 6 1IjMA.i 6 B.0 = (IIjA.3 6 (1IjB.i) = aJ,I &. and result (ii) is established. Result (iii) follows from the fact that the column vector I,i 6 I,i has a one in the [(i - 1)p i] position and zeros elsewhere.
+
0
The $ matrix can be used to obtain a compact expression for the covariance matrix of the sample covariances of a normal distribution.
Lemma 4.A.I. Let Z, be normally distributed with mean pz and covariance matrix Ezz. Let
Then the covariance matrix of vech mzz is V{vech mZz} = V { ~ ,vec mZZ} = 2(n
- l)-Vp(&z 6 Xz,)Wp.
(4.A.14)
Furthermore, V-'{vech mZZ}= +(n - l)Wp(&'
6 E;;)@,,.
(4.A.15)
APPENDIX 4.A.
387
MATRIX-VECTOR OPERATIONS
Now the ij, kd element of $(Zzz 0 Zzz)$’ is, by (4.A.4) and (4.A.5),
=
ffff
u=1 u=11=1 s=l
d(6tj6siBudsuk
3 stjSsi6uk6ut
+6tiSsjSut6uk + 6ti6sj6uk6ul)oZZsuaZZtu = %‘ZZikaZZjt
+ aZZitoZZjk)
and the first result is established. The inverse result follows by an application 0 of Result 4.A.3. The next three results involve Kronecker products and the trace and vec operators. Result 4.A.4. Let A, B, and C be p x q, q x m, and m x n matrices, respectively. Then (4.A. 16) vec(ABC) = (C’0 A) vec B.
Proof. Let C,, be the j t h column of C. Then the j t h p-dimensional subvector of vec(ABC) is
’- f
(ABC), -
iZ1
C~~AB,~
= (Clj 0 A) vec
B
and the conclusion follows.
0
Result 4.A.5. Let A and B be p x q and q x p matrices, respectively, and let the trace of C, denoted by tr C or by tr{C}, be the sum of the diagonal elements of a square matrix C. Then
tr(AB) = (vec A)’ vec B = (vec A)’ vec B.
Proof. The trace is
= (vec A’)’ vec B.
The other result follows because tr{AB} = tr{BA}.
(4.A.17)
388
MULTIVARIATE MODELS
Result4.A.6. Let A, B, C, and D be p x q, q matrices, respectively. Then
x
m, p x n, and n x m
tr{ ABD‘C‘) = (vec A)‘(BQ C)(vec D).
(4.A. 18)
Proof. Now
tr{ABD’C’) = (vec A)’vec(CDB) = (vec A)’(B 6 C) vec D, where we have used Result 4.A.5 and Result 4.A.4. We next give some results on derivatives of matrix and vector functions. Definition 4.A.5. Let A = A(8) be a p x q matrix whose typical element aij = a,,@)is a function of the r-dimensional column vector 8 = 8,, . . . ,
(el,
6,)’. Then
The derivatives of products of matrices follow directly from Definition
4.A.5. We have, for example, *
aAB
-=-
aA
aB
. aei aei B + A - -aei
(4.A. 19)
Definition 4.A.6. Let a = a(8) be a p-dimensional column vector whose typical element a j = aj(8) is a function of the r-dimensional column vector 8. Then
APPENDIX 4.A.
389
MATRIX-VECTOR OPERATIONS
Definition 4.A.7.
The matrix =
[i?]
is the transpose of the matrix of Definition 4.A.6. Dejnition 4.A.8. Then
Let g(A) be a scalar function of the p x q matrix A.
I
&?(A) MA) aa11 342
~
...
-
The derivative of the determinant of a p x p nonsingular matrix has a simple form. Result 4.A.7.
Let A(0) be a nonsingular p x p matrix. Then
Proof. The expansion of the determinant using cofactors is j= 1
Because Cof(aij) does not depend on aij, dIAl = Cof(aij) = a'jlA(,
aaij
where uij is the ijth element of A-I. It follows by the chain rule that
Result 4.A.8.
Let A(0) be a p x p matrix with positive determinant. Then
390
MULTIVARIATE MODELS
Proof. Using Result 4.A.7 and the chain rule, we obtain
= tr{A-'E].
Note that if A is a symmetric matrix and arr= Bi, then
and if A is symmetric and Atj = di for t # j, then
where arj is the tjth element of A and a'j is the tjth element of A-'.
Result 4.A.9.
Proof.
Let A = A(@)be a p x p nonsingular matrix. Then
If B is a p x p matrix, we have aAB r3B -=A-+-B. adi adi
dA adi
Letting B = A-', O=A-
aA dA-' +-A adi doi
and the conclusion follows.
0
If A is symmetric, note that -fjais
- ariajs
- ariais
for i # j for i = j.
We next give some theorems about the roots of symmetric matrices. These lead to alternative representations for positive semidefinite symmetric matrices .
APPENDIX 4.A.
39 1
MATRIX-VECTOR OPERATIONS
Result 4.A.20. (Courant-Fischer min-max theorem) Let A be a p x p symmetric matrix. Let I , 2 1, 2 . . . 2 1, be the characteristic roots of A. Then I , = max (x’x)-’x’Ax, X
,I2 = min max (x‘x)-’x’Ax, y t 0 x’y=0
Ak = min max (x’x)-’x’Ax, y i f o x’yt=O
i = l , . . . .(k- 1)
I$, = min max (x’x)-’x’Ax yi’o
x,yi=o
i= 1 . . . . , ( p - 1)
= min (x’x)- ‘x‘Ax. XZO
Proof. See Bellman (1960, p. 115).
0
The Courant-Fischer min-max theorem is typically stated with the condition yiyi = I, on the vectors over which the minimum is evaluated. Such a condition does not change the minimum value of the ratio. Let m be a symmetric p x p matrix and let Z be a symmetric positive definite p x p matrix. Then the roots A, 2 I 2 2 * . . 2 1, of Im - AX/=0
(4.A.20)
are called the roots of m in the metric X. There exists a matrix T such that
TXT = I, T m T = diag(1,, I , ,
. . . ,&).
(4.A.21) (4.A.22)
We call the columns of T the characteristic vectors of m in the metric Z. Some authors call any vector T,i satisfying (m - liX)T.i= 0
a characteristic vector of m in the metric E, but we reserve the term for vectors that also satisfy (4.A.21). The matrix T can be constructed as
T = Qr- ‘”P, where the columns of Q are the characteristic vectors of Z,
Q’EQ = r = diagiy,, y 2 , . . . ,Y,),
392
MULTIVARIATE MODELS
Q Q = I, yi are the characteristic roots of C, the columns of P are the characteristic vectors of r-'/'Q'mQr-'/', and r-''*= diag(y;'/', y;"', .. ., y;
1'2).
The corollary of Result 4.A.10 for the roots of m in the metric C is used repeatedly in the text. Corollary 4.A.10. Let m and C be p x p symmetric positive definite matrices. Let A, b I, 2 . . * 2 ,$, be the roots of the determinantal equation Jm- AX[=0.
Let P be a p x r matrix of rank r, where r
+ k = p. Then
max IP'ZPI-'JP'mPJ = P
max tr((P'CP)-'(P'mP)) = P
n di, r
i= 1
r
C li, i= 1 "
The maxima are attained when the ith column of P is proportional to the ith characteristic vector of m in the metric C, i = 1,2, . . . , r, and the minima are attained when the ith column of P is proportional to the (k + i)th characteristic vector of m in the metric Z. Proof. Reserved for the reader.
0
Result 4.A.12. Let m and Z be p x p positive definite symmetric matrices. Let 2, > i2 > . . . > j, be the roots of m in the metric Z and let .j, < f 2 < * * * < p , be the roots of I:in the metric m. Let T.i be the vectors of m in the metric Z and let R.i be the vectors of I: in the metric m.Then
(a) 1, = ?,:I, (b) T.i = jf''R.i (or T,i =
[email protected]),
Proof. Reserved for the reader.
APPENDIX 4.A.
MATRIX-VECTOR OPERATIONS
393
Result 4.A.22. (Spectral decomposition) Let m be a real symmetric p x p matrix. Let A,, Az, . . . , A p be the, not necessarily distinct, roots of m. Let Q be a matrix such that Q‘mQ = A = diag(A,, I,, , . . , A,) and Q’Q = QQ’ = I. Then
where Q.i is the ith column of Q.
=
f.
i= 1
AiQ.iQIi.
If there are repeated roots, it is possible to group the products associated with the repeated roots to obtain A
m=
i= 1
liAi,
where d is the number of distinct roots and ri
Ai=
1
j= 1
Qj.QJ.
is the sum of the matrices associated with the ith distinct root. Note that Ai = A; and A: = A i . Corollary 4.A.12. Let m be a real symmetric p x p matrix and let 72 be a real positive definite symmetric matrix. Let 1, 2 AZ 2 - . 2 1, be the roots of m in the metric X. Let T be a matrix such that
TmT = diag(Al, A 2 , . . . ,1,) and TCT = I. Then
where T,i is the ith column of T. Proof. Reserved for the reader
0
394
MULTIVARIATE MODELS
Result 4.A.23. (Positive square root) Let m be a real symmetric positive semidefinite p x p matrix. Let Q be the matrix such that
Q'mQ = diag(A,, A2,. . . , Ap) = A and Q'Q = QQ' = I. Define the positive square root of m by n
where A'/' = diag(A;/',
A:/', . . . ,Ail'),
the positive square roots of Ai are chosen, and Q.i is the ith column of Q. Then m = [m1/']2 = m1/2m1/2. If m is positive definite, I = m- '/zmm-112and m - ' = [m-'/2]2, where m-1/2
and A -
= diag(A;
'I2,
A;
'/',
= [m1/2]-1
= Q,j-1/2Q!
. . . , A;
Proof. Reserved for the reader.
0
The following results on the traces of products of matrices appear several places in the literature. Our proofs are based on those of Theobald (1975a,
1975b).
Result 4.A.24. Let M and S be real symmetric p x p matrices. Let the characteristic roots of M be A M , 2 AM2 2 * * * 2 AM,, and let the characteristic roots of S be As, 2 A,, 2 * . * 3 Asp. Then
tr{MS} < tr{AMAs},
where 'M
= diag(AM1,
As = diag@,,, As2,
..
(4.A.23)
7
. . . ,Asp).
Furthermore, tr{MS} = tr{AMA,} if and only if there exists an orthogonal matrix Q such that Q'MQ = A M and Q'SQ = As.
(4.A.24)
Proof. Let Q M be an orthogonal matrix composed of characteristic vectors of M and let Q, be an orthogonal matrix composed of characteristic
APPENDIX 4.A.
395
MATRIX-VECTOR OPERATIONS
vectors of S. We have

    tr{MS} = tr{Q_M Λ_M Q'_M Q_S Λ_S Q'_S} = tr{Q'_S Q_M Λ_M Q'_M Q_S Λ_S}.

Let B = Q'_M Q_S and note that B is an orthogonal matrix. Then

    tr{MS} = tr{B'Λ_M B Λ_S} = ∑_{i=1}^{p} ∑_{k=1}^{p} b_{ik}^2 λ_{Mi} λ_{Sk},

where ∑_{i=1}^{p} b_{ik}^2 = ∑_{k=1}^{p} b_{ik}^2 = 1. Let C be the matrix with elements c_{ij} = b_{ij}^2. A matrix C with nonnegative elements that sum to one by both rows and columns is called a doubly stochastic matrix. Let c_{ij}, c_{jk} be two elements of C with i > j and k > j. Let a_{ik}(j) = min(c_{ij}, c_{jk}). If we replace c_{ij}, c_{jk}, c_{jj}, and c_{ik} of C by c_{ij} − a_{ik}(j), c_{jk} − a_{ik}(j), c_{jj} + a_{ik}(j), and c_{ik} + a_{ik}(j), respectively, the new matrix, denoted by C_1, has at least one zero element, is doubly stochastic, and

    λ'_M C_1 λ_S = λ'_M C λ_S + a_{ik}(j)(λ_{Mi} − λ_{Mj})(λ_{Sk} − λ_{Sj}),

where λ'_M = (λ_{M1}, λ_{M2}, ..., λ_{Mp}) and λ'_S = (λ_{S1}, λ_{S2}, ..., λ_{Sp}). The operation can be repeated until the matrix C is reduced to the p × p identity and the inequality result follows. If Q_M = Q_S, then C = I and the equality result is established. □
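A quick numerical check of inequality (4.A.23) can be made as follows. The sketch assumes NumPy; the symmetric matrices are random examples, not data from the text.

# Check that tr{MS} <= tr{Lambda_M Lambda_S} for random symmetric matrices.
import numpy as np

rng = np.random.default_rng(2)

def random_symmetric(p):
    A = rng.normal(size=(p, p))
    return (A + A.T) / 2.0

p = 5
M = random_symmetric(p)
S = random_symmetric(p)
lam_M = np.sort(np.linalg.eigvalsh(M))[::-1]   # descending roots of M
lam_S = np.sort(np.linalg.eigvalsh(S))[::-1]   # descending roots of S

print(np.trace(M @ S) <= np.sum(lam_M * lam_S) + 1e-10)   # True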
Corollary 4.A.14. Let M and S be real symmetric matrices with S nonsingular. Then

    tr{MS^{-1}} ≥ tr{Λ_M Λ_S^{-1}},    (4.A.25)

where Λ_M = diag(λ_{M1}, λ_{M2}, ..., λ_{Mp}), Λ_S = diag(λ_{S1}, λ_{S2}, ..., λ_{Sp}), λ_{M1} ≥ λ_{M2} ≥ ... ≥ λ_{Mp} are the roots of M, and λ_{S1} ≥ λ_{S2} ≥ ... ≥ λ_{Sp} are the roots of S. Expression (4.A.25) is an equality if and only if there exists a Q satisfying Equations (4.A.24).

Proof. Replace S by −S^{-1} in the proof of Result 4.A.14. □

Result 4.A.15. Let M, R, and S be real symmetric p × p matrices and let S be positive definite. Let γ_{M1} ≥ γ_{M2} ≥ ... ≥ γ_{Mp} be the roots of |M − γS| = 0,
and let γ_{R1} ≥ γ_{R2} ≥ ... ≥ γ_{Rp} be the roots of |R − γS| = 0. Then

    tr{M(R + S)^{-1}} ≥ tr{Γ_M(I + Γ_R)^{-1}},    (4.A.26)

where Γ_M = diag(γ_{M1}, γ_{M2}, ..., γ_{Mp}) and Γ_R = diag(γ_{R1}, γ_{R2}, ..., γ_{Rp}). Furthermore, expression (4.A.26) is an equality if and only if there exists a T such that

    T'MT = Γ_M,  T'RT = Γ_R,  and  T'ST = I.    (4.A.27)

Proof. Let P_M be an orthogonal matrix of characteristic vectors of the matrix S^{-1/2}MS^{-1/2} and let P_R be an orthogonal matrix of characteristic vectors of S^{-1/2}RS^{-1/2}. Then P_R is a matrix such that

    P'_R(S^{-1/2}RS^{-1/2} + I)P_R = (Γ_R + I).

It follows from Corollary 4.A.14 that

    tr{M(R + S)^{-1}} = tr{S^{-1/2}MS^{-1/2}(S^{-1/2}RS^{-1/2} + I)^{-1}} ≥ tr{Γ_M(Γ_R + I)^{-1}}
with equality holding if and only if there exists a T satisfying Equations (4.A.27). □

APPENDIX 4.B.  PROPERTIES OF LEAST SQUARES AND MAXIMUM LIKELIHOOD ESTIMATORS

In this appendix we present theorems giving conditions under which the least squares and maximum likelihood estimators of parameters of multivariate measurement error models are consistent and asymptotically normally distributed. A critical part of the proof is the fact that the estimator defined implicitly as the value giving the maximum of a continuous function is itself a continuous function of the remaining arguments of the original function. We begin with two lemmas that establish this result. Lemma 4.B.1 is an adaptation of Lemma 1 of Jennrich (1969).

Lemma 4.B.1. Let g(x, y) be a continuous real valued function defined on the Cartesian product A × B, where A is a subset of p-dimensional Euclidean space and B is a compact subset of q-dimensional Euclidean space.
Then the function

    h(x) = max_{y∈B} g(x, y)    (4.B.1)

is a continuous function of x on the interior of A.

Proof. Let x_0 be any interior point of A. Then there exists a δ_0 > 0 such that x is in A if |x − x_0| ≤ δ_0, where |x − x_0| is the Euclidean norm of x − x_0. Because S = {(x, y): |x − x_0| ≤ δ_0, y ∈ B} is compact, g(x, y) is uniformly continuous on S. Therefore, for any ε > 0, there is a δ_1 > 0 such that

    |g(x_1, y_1) − g(x_2, y_2)| < ε

for all (x_1, y_1) and (x_2, y_2) in S satisfying |x_1 − x_2| < δ_1 and |y_1 − y_2| < δ_1. Hence, if |x − x_0| < δ, where δ = min(δ_0, δ_1), then

    g(x_0, y) − ε < g(x, y) < g(x_0, y) + ε

for all y ∈ B. Therefore, for |x − x_0| < δ,

    max_{y∈B} g(x_0, y) − ε < max_{y∈B} g(x, y) < max_{y∈B} g(x_0, y) + ε.    (4.B.2)

Because ε is arbitrary, h(x) is continuous at x = x_0, where h(x) is defined in (4.B.1). Because x_0 is an arbitrary interior point of A, h(x) is continuous on the interior of A. □

Lemma 4.B.2. Let the assumptions of Lemma 4.B.1 hold. Let x_0 be an interior point of A. Assume that the point y_0 is the unique point for which max_{y∈B} g(x_0, y) is attained. Let y_M(x) be a point in B such that

    g(x, y_M(x)) = max_{y∈B} g(x, y).

Then y_M(x) is a continuous function of x at x = x_0.

Proof. By assumption, y_0 is unique and g(x_0, y) is continuous on the compact set B. Hence, for any ε > 0 there is an η(ε) > 0 such that

    g(x_0, y) < g(x_0, y_0) − η(ε)    (4.B.3)

for all y in B satisfying |y − y_0| ≥ ε. By Lemma 4.B.1, h(x) of (4.B.1) is a continuous function of x. Therefore, given η(ε) > 0, there exists a δ_2 > 0 such
that for all |x − x_0| < δ_2,

    h(x) > g(x_0, y_0) − 3^{-1}η(ε).    (4.B.4)

By Equation (4.B.2) there exists a δ_3 > 0 such that for all |x − x_0| < δ_3 and all y ∈ B,

    g(x, y) < g(x_0, y) + 3^{-1}η(ε).    (4.B.5)

Thus, for all y in B satisfying |y − y_0| ≥ ε and x such that |x − x_0| < δ, where δ = min(δ_2, δ_3),

    g(x, y) < g(x_0, y) + 3^{-1}η(ε) < g(x_0, y_0) − 3^{-1}2η(ε),

where we have used (4.B.3). Thus, by (4.B.4), for |x − x_0| < δ and all y in B satisfying |y − y_0| ≥ ε,

    g(x, y) < g(x, y_M(x)) − 3^{-1}η(ε).    (4.B.6)

Therefore, |x − x_0| < δ implies |y_M(x) − y_0| < ε and y_M(x) is continuous at x = x_0. □
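A simple numerical illustration of Lemma 4.B.2 is given below. The example, which is hypothetical and assumes only NumPy, takes g(x, y) = −(y − x)^2 on the compact set B = [0, 1]; the maximizer y_M(x) equals x clipped to [0, 1] and varies continuously with x.

# Illustration: the maximizer over a compact set is continuous in x.
import numpy as np

def y_M(x, grid=np.linspace(0.0, 1.0, 10001)):
    g = -(grid - x) ** 2          # g(x, y) evaluated on a grid over B = [0, 1]
    return grid[np.argmax(g)]     # approximate maximizer over B

xs = np.linspace(-0.5, 1.5, 9)
print([round(float(y_M(x)), 3) for x in xs])   # increases continuously, clipped to [0, 1]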
Theorem 4.B.1. Let {s_n} be a sequence of p-dimensional statistics. Assume s_n converges to the p-dimensional vector σ_0 almost surely as n → ∞. Let g(s, y) be a continuous function defined on A × B, where B is a compact set and σ_0 is in the interior of A. Assume max_{y∈B} g(σ_0, y) is uniquely attained at y = y_0. Let ŷ = ŷ(s_n) be the value of y such that

    g(s_n, ŷ) = max_{y∈B} g(s_n, y).

Then ŷ → y_0 almost surely as n → ∞.
Proof. By Lemma 4.B.2, ŷ(s_n) is a continuous function of s_n and the result follows. □

We now derive the limiting distribution of the estimator constructed by maximizing the normal likelihood for a covariance matrix.

Theorem 4.B.2. Let {Z_t} be a sequence of independent identically distributed random vectors with mean μ_Z, covariance matrix Σ_ZZ(γ), and finite fourth moments. Let Σ_ZZ(γ) be a continuous function of γ with continuous first and second derivatives. Let A, the parameter space for γ, be a convex compact subset of q-dimensional Euclidean space such that Σ_ZZ(γ) is positive definite for all γ in A. Assume that the true parameter value γ_0 is in the interior of A and that Σ_ZZ(γ) ≠ Σ_ZZ(γ_0) for any γ in A with γ ≠ γ_0. Let γ̂ be the value
of γ that maximizes

    f(γ, s) = −log|Σ_ZZ(γ)| − tr{m_ZZ Σ_ZZ^{-1}(γ)},    (4.B.7)

where s = vech m_ZZ. Let F be of full column rank, where

    F = ∂ vech Σ_ZZ(γ_0)/∂γ'

is the matrix of partial derivatives of vech Σ_ZZ(γ) with respect to γ evaluated at γ = γ_0. Then

    n^{1/2}(γ̂ − γ_0) →_L N(0, V_γγ),    (4.B.8)

where

    V_γγ = [F'Ω^{-1}F]^{-1} F'Ω^{-1} Σ_aa Ω^{-1} F [F'Ω^{-1}F]^{-1},    (4.B.9)
    Σ_aa = E{aa'},    (4.B.10)
    a = vech[(Z_t − μ_Z)'(Z_t − μ_Z) − Σ_ZZ(γ_0)],    (4.B.11)
    Ω = 2Ψ'_p[Σ_ZZ(γ_0) ⊗ Σ_ZZ(γ_0)]Ψ_p.

Proof. By assumption, Σ_ZZ(γ) is nonsingular for all γ in A. Also, all partial derivatives of the first two orders of Σ_ZZ(γ) exist and are continuous functions of γ on A. Hence, all partial derivatives of the first two orders of the function (4.B.7) with respect to γ exist and are continuous functions of γ and of vech m_ZZ = s on the Cartesian product A × B, where B is a subset of 2^{-1}p(p + 1)-dimensional Euclidean space such that s in B implies that m_ZZ is positive definite. Then, by Taylor's theorem, for γ in A,

    ∂f(γ, s)/∂γ = ∂f(γ_0, s)/∂γ + [∂²f(γ**, s)/∂γ ∂γ'](γ − γ_0),    (4.B.12)

where γ** is used to denote the fact that the elements of the matrix are evaluated at points on the line segment joining γ_0 and γ. By Theorem 4.B.1, γ̂ is consistent for γ_0. Therefore, with probability approaching one as n increases, f(γ, s) attains its maximum at an interior point γ̂ in A and

    ∂f(γ̂, s)/∂γ = 0    (4.B.13)

with probability approaching one. Using (4.B.13), we evaluate (4.B.12) at γ = γ̂ to obtain

    0 = ∂f(γ_0, s)/∂γ + [∂²f(γ*, s)/∂γ ∂γ'](γ̂ − γ_0)    (4.B.14)

with probability approaching one, where γ* is used to denote the fact that the elements of the matrix are evaluated at points on the line segment joining
γ_0 and γ̂. By Theorem 4.B.1,

    γ̂ → γ_0 a.s. as n → ∞.    (4.B.15)

Because m_ZZ is, essentially, composed of the sample means of independent identically distributed random variables with finite population means,

    m_ZZ → Σ_ZZ(γ_0) a.s. as n → ∞.    (4.B.16)

Thus,

    ∂²f(γ*, s)/∂γ ∂γ' → H_0 a.s. as n → ∞,    (4.B.17)

where

    H_0 = ∂²f(γ_0, σ_0)/∂γ ∂γ'    (4.B.18)

and σ_0 = vech Σ_ZZ(γ_0), because the second derivative is a continuous function of γ and s. The matrix H_0 is negative definite because the function f(γ, σ_0) has continuous first and second derivatives and attains its maximum uniquely at an interior point γ = γ_0. Therefore, the probability that the matrix of partial derivatives on the left side of (4.B.17) is nonsingular approaches one as n → ∞. Hence, by (4.B.14),

    γ̂ − γ_0 = −[∂²f(γ*, s)/∂γ ∂γ']^{-1} ∂f(γ_0, s)/∂γ    (4.B.19)

with probability approaching one as n
→ ∞. For an element γ_i of γ,

    ∂f(γ, s)/∂γ_i = −tr{Σ_ZZ^{-1}(γ)[∂Σ_ZZ(γ)/∂γ_i]} + tr{m_ZZ Σ_ZZ^{-1}(γ)[∂Σ_ZZ(γ)/∂γ_i]Σ_ZZ^{-1}(γ)}
                  = tr{[m_ZZ − Σ_ZZ(γ)]Σ_ZZ^{-1}(γ)[∂Σ_ZZ(γ)/∂γ_i]Σ_ZZ^{-1}(γ)},

where we have used Results 4.A.8, 4.A.9, and 4.A.6. It follows that

    ∂f(γ_0, s)/∂γ = 2F'Ω^{-1} vech[m_ZZ − Σ_ZZ(γ_0)],    (4.B.20)

where Ω^{-1} is defined in terms of Σ_ZZ^{-1}(γ_0) in Lemma 4.A.1. For elements γ_i and γ_j of γ,

    ∂²f(γ, s)/∂γ_j ∂γ_i = tr{Σ_ZZ^{-1}(γ)[∂Σ_ZZ(γ)/∂γ_j]Σ_ZZ^{-1}(γ)[∂Σ_ZZ(γ)/∂γ_i]}
        − tr{Σ_ZZ^{-1}(γ)[∂²Σ_ZZ(γ)/∂γ_j ∂γ_i]}
        + tr{m_ZZ Σ_ZZ^{-1}(γ)[∂²Σ_ZZ(γ)/∂γ_j ∂γ_i]Σ_ZZ^{-1}(γ)}
        − tr{m_ZZ Σ_ZZ^{-1}(γ)[∂Σ_ZZ(γ)/∂γ_j]Σ_ZZ^{-1}(γ)[∂Σ_ZZ(γ)/∂γ_i]Σ_ZZ^{-1}(γ)}
        − tr{m_ZZ Σ_ZZ^{-1}(γ)[∂Σ_ZZ(γ)/∂γ_i]Σ_ZZ^{-1}(γ)[∂Σ_ZZ(γ)/∂γ_j]Σ_ZZ^{-1}(γ)}.    (4.B.21)
Because the partial derivatives are continuous, we have, by (4.B.15), (4.B.17), and (4.B.21),

    ∂²f(γ*, s)/∂γ ∂γ' = −2F'Ω^{-1}F + o_p(1).    (4.B.22)

It follows from (4.B.19), (4.B.20), and (4.B.22) that

    γ̂ − γ_0 = [F'Ω^{-1}F]^{-1}F'Ω^{-1} vech[m_ZZ − Σ_ZZ(γ_0)] + o_p(n^{-1/2}).    (4.B.23)

The conclusion follows from (4.B.23) and the fact that n^{1/2} vech[m_ZZ − Σ_ZZ(γ_0)] converges in distribution to a normal vector. See Theorem 1.C.1. □
Corollary 4.B.2. Let the assumptions of Theorem 4.B.2 hold. Assume that the Z_t are normally distributed. Then

    n^{1/2}(γ̂ − γ_0) →_L N(0, [F'Ω^{-1}F]^{-1}),    (4.B.24)

where Ω and F are defined in Theorem 4.B.2.

Proof. The conclusion is immediate because the Σ_aa of (4.B.9) is equal to Ω for normally distributed Z_t. See Lemma 4.A.1. □
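To illustrate the criterion (4.B.7), the following sketch maximizes f(γ, s) for a simple hypothetical one-parameter structure Σ_ZZ(γ) = γΣ_0 with Σ_0 known. The structure, the matrix Σ_0, and the simulated data are illustrative assumptions, not taken from the text; for this structure the maximizer has the closed form γ̂ = tr{m_ZZ Σ_0^{-1}}/p, which the numerical search should reproduce. NumPy and SciPy are assumed.

# Maximize f(gamma, s) = -log|Sigma_ZZ(gamma)| - tr{ m_ZZ Sigma_ZZ(gamma)^{-1} }
# for the hypothetical structure Sigma_ZZ(gamma) = gamma * Sigma0.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
p, n, gamma0 = 3, 500, 2.5
Sigma0 = np.array([[2.0, 0.5, 0.0],
                   [0.5, 1.0, 0.3],
                   [0.0, 0.3, 1.5]])
Z = rng.multivariate_normal(np.zeros(p), gamma0 * Sigma0, size=n)
m_ZZ = np.cov(Z, rowvar=False)

def f(gamma):
    S = gamma * Sigma0
    _, logdet = np.linalg.slogdet(S)
    return -logdet - np.trace(m_ZZ @ np.linalg.inv(S))

res = minimize_scalar(lambda g: -f(g), bounds=(0.1, 10.0), method="bounded")
# Numerical maximizer and the closed-form value tr{m_ZZ Sigma0^{-1}}/p agree.
print(res.x, np.trace(m_ZZ @ np.linalg.inv(Sigma0)) / p)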
Theorem 4.B.3. Let

    Z_t = (β'_0, 0') + x_t(β_1, I) + e_t,    (4.B.25)

where the e_t are independent identically distributed random vectors with zero mean vector, covariance matrix Σ_ee, and finite fourth moments. Let {x_t} be a sequence of k-dimensional fixed vectors. Let γ_β be the vector containing the unknown portion of β and let γ_ε be the vector containing the unknown portion of Σ_ee. Assume that

    γ' = [γ'_β, γ'_ε, (vech m_xx)']

is an element of a convex, compact parameter space A. Assume that for any γ in A, m_xx and

    Σ_ZZ(γ) = (β_1, I)'m_xx(β_1, I) + Σ_ee

are positive definite. For n > k, let

    γ'_n = [γ'_β0, γ'_ε0, (vech m_xxn)']

be the true value of γ in A and let

    lim_{n→∞} m_xxn = m̄_xx,

where m_xxn and m̄_xx are positive definite. Assume that

    γ'_0 = [γ'_β0, γ'_ε0, (vech m̄_xx)']
is an interior point of A. Assume that for any γ in A with γ ≠ γ_0, Σ_ZZ(γ) ≠ Σ_ZZ(γ_0). Let

    Γ_aan = V{(n − 1)^{1/2} vech[m_ZZ − Σ_ZZ(γ_n)]},    (4.B.26)

    F = ∂ vech Σ_ZZ(γ_0)/∂γ',    (4.B.27)

where F is the matrix of partial derivatives of vech Σ_ZZ(γ) with respect to γ' evaluated at γ = γ_0 and F is of full column rank. Then

    γ̂ → γ_0  a.s.,
    n^{1/2}(γ̂ − γ_n) →_L N(0, Γ_γγ),

where γ̂ is the value of γ that maximizes (4.B.7),

    Γ_γγ = (F'Ω^{-1}F)^{-1}F'Ω^{-1}Γ_aan Ω^{-1}F(F'Ω^{-1}F)^{-1},    (4.B.28)
    Ω = 2Ψ'_p[Σ_ZZ(γ_0) ⊗ Σ_ZZ(γ_0)]Ψ_p.

Proof. Because f(γ, s) of (4.B.7) evaluated at s = vech Σ_ZZ(γ_0) has a unique maximum on A at γ = γ_0 and because m_ZZ → Σ_ZZ(γ_0) a.s., the consistency result follows by Theorem 4.B.1. The arguments used in the proof of Theorem 4.B.2 can be used to show that

    γ̂ − γ_n = (F'Ω^{-1}F)^{-1}F'Ω^{-1} vech[m_ZZ − Σ_ZZ(γ_n)] + o_p(n^{-1/2}).    (4.B.29)

The result then follows by Theorem 1.C.2 of Appendix 1.C. □
Corollary 4.B.3. Let the assumptions of Theorem 4.B.3 hold and, in addition, assume the e_t to be normally distributed. Then

    n^{1/2}[(γ̂'_β, γ̂'_ε)' − (γ'_β0, γ'_ε0)'] →_L N(0, G_11),    (4.B.30)

where G_11 is the upper left block of the matrix (F'Ω^{-1}F)^{-1} and F and Ω are defined in Theorem 4.B.3.

Proof. This result follows from Theorem 4.2.4 of Section 4.2.3. □
We now give the limiting distribution of the estimator γ̂, where γ̂ is obtained by applying generalized least squares to the elements of the sample covariance matrix.
Theorem 4.B.4. Let {Z_t} be a sequence of independent identically distributed random vectors with mean μ_Z, covariance matrix Σ_ZZ(γ), and finite
fourth moments. Let Σ_ZZ(γ) be a continuous function of γ with continuous first and second derivatives. Let A, the parameter space for γ, be a convex, compact subset of q-dimensional Euclidean space such that Σ_ZZ(γ) is positive definite. Let γ_0, the true parameter value, be an interior point of A. Assume that for any γ in A, with γ ≠ γ_0, Σ_ZZ(γ) ≠ Σ_ZZ(γ_0). Let γ̂ be the value of γ that minimizes

    h(γ, s; Ĝ_aan) = [s − σ(γ)]'Ĝ_aan[s − σ(γ)],    (4.B.31)

where s = vech m_ZZ, σ(γ) = vech Σ_ZZ(γ), and Ĝ_aan is a positive definite random matrix that converges in probability to a positive definite matrix G_aa as n approaches infinity. Let F be of full column rank, where F is the matrix of partial derivatives of σ(γ) with respect to γ' evaluated at γ = γ_0. Then

    n^{1/2}(γ̂ − γ_0) →_L N(0, V_γγ),

where

    V_γγ = (F'G_aa F)^{-1}F'G_aa Σ_aa G_aa F(F'G_aa F)^{-1},    (4.B.32)

Σ_aa = E{aa'}, and a = vech[(Z_t − μ_Z)'(Z_t − μ_Z) − Σ_ZZ(γ_0)].
Proof. Using the arguments associated with (4.B.12) and (4.B.13), we can write, with probability approaching one as n → ∞,

    0 = ∂h(γ_0, s; Ĝ_aan)/∂γ + [∂²h(γ*, s; Ĝ_aan)/∂γ ∂γ'](γ̂ − γ_0),    (4.B.33)

where γ* is on the line segment joining γ_0 and γ̂. Because (γ̂, s; Ĝ_aan) is converging in probability to (γ_0, σ(γ_0); G_aa), because the second derivatives are continuous, and because h(γ, σ(γ_0); G_aa) has a unique minimum at γ = γ_0,

    ∂²h(γ*, s; Ĝ_aan)/∂γ ∂γ' = 2F'G_aa F + o_p(1).    (4.B.34)

Also,

    ∂h(γ_0, s; Ĝ_aan)/∂γ = −2F'G_aa[s − σ(γ_0)] + o_p(n^{-1/2}).    (4.B.35)

The result follows because, by Theorem 1.C.1, n^{1/2}[s − σ(γ_0)] is converging in distribution to a multivariate normal random variable with mean zero and covariance matrix Σ_aa. □
Corollary 4.B.4. Let the assumptions of Theorem 4.B.4 hold. In addition, assume that the Z_t are normally distributed and that Ĝ_aan = 2^{-1}Ψ'_p(m_ZZ^{-1} ⊗ m_ZZ^{-1})Ψ_p. Then

    n^{1/2}(γ̂ − γ_0) →_L N(0, [F'Ω^{-1}F]^{-1}),

where Ω is defined in Theorem 4.B.2.

Proof. Under the assumptions, Σ_aa = Ω and Ĝ_aan^{-1} = 2Ψ'_p(m_ZZ ⊗ m_ZZ)Ψ_p →_P Ω. The result then follows from (4.B.32). □
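The generalized least squares criterion (4.B.31) can be illustrated with the same hypothetical one-parameter structure Σ_ZZ(γ) = γΣ_0 used above. The sketch below is illustrative only; the identity matrix is used as a simple positive definite weight, and Σ_0 and the simulated data are assumptions rather than material from the text. NumPy and SciPy are assumed.

# Minimize h(gamma, s; G) = [s - sigma(gamma)]' G [s - sigma(gamma)],
# with s = vech m_ZZ and sigma(gamma) = vech(gamma * Sigma0).
import numpy as np
from scipy.optimize import minimize_scalar

def vech(A):
    # Stack the lower-triangular (including diagonal) elements of A.
    return A[np.tril_indices_from(A)]

rng = np.random.default_rng(4)
p, n, gamma0 = 3, 500, 2.5
Sigma0 = np.array([[2.0, 0.5, 0.0],
                   [0.5, 1.0, 0.3],
                   [0.0, 0.3, 1.5]])
Z = rng.multivariate_normal(np.zeros(p), gamma0 * Sigma0, size=n)
s = vech(np.cov(Z, rowvar=False))
G = np.eye(s.size)                       # simple positive definite weight matrix

def h(gamma):
    r = s - vech(gamma * Sigma0)         # s - sigma(gamma)
    return r @ G @ r

res = minimize_scalar(h, bounds=(0.1, 10.0), method="bounded")
print(res.x)                             # near gamma0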
APPENDIX 4.C.  MAXIMUM LIKELIHOOD ESTIMATION FOR SINGULAR MEASUREMENT COVARIANCE

In this appendix we derive the maximum likelihood estimator for the multivariate model, permitting the known error covariance matrix to be singular. The formulation provides a unified treatment for models that contain both explanatory variables measured without error and explanatory variables measured with error.

Theorem 4.C.1. Let the normal model (4.1.12) hold with Σ_xx and Σ_ZZ nonsingular. Let λ̂_1^{-1} ≤ λ̂_2^{-1} ≤ ... be the values of λ̂_i^{-1} that satisfy
. R . Statist. SOC.Ser. B 23, 160-170. DorlT, M. and Gurland, J. (1961b), Small sample behavior of slope estimators in a linear functional relation. Biometrics 17, 283-298. Draper, N. R. and Beggs, W. J. (1971), Errors in the factor levels and experimental design. Ann Math. Statist. 41, 46-58. Draper, N. R. and Smith, H. (1981), Applied Regression Analysis, 2nd ed. Wiley, New York. Driel, 0. P. van (1978), On various causes of improper solutions in maximum likelihood factor analysis. Psychometrika 43, 225-243. Duncan, 0. D. (1973, Introduction to Structural Equation Models. Academic Press, New York. Durbin, J. (1954), Errors-in-variables, I n t . Statist. Reu. 22, 23-32. Dwyer, P. S. (1967), Some applications of matrix derivatives in multivariate analysis. J . Am. Statist. Assoc. 62, 607-625. Efron, B. (1982), The Jackknife, the Bootstrap and Other Resampling Plans. SIAM, Philadelphia. Farris, A. L., Klonglan, E. D., and Nomsen, R. C. (1977), The Ring-Necked Pheasant in Iowa. Iowa Conservation Commission, Des Moines, Iowa. Featherman, D. L. (19711, A research note: A social structural model for the socioeconomic career. Am. J . SOC.77, 293-304. Fedorov, V. V. (1974), Regression problems with controllable variables subject to error. Biometrika 61, 49-56. Feldstein, M. (1974), Errors in variables: A consistent estimator with smaller MSE in finite samples. J. Am. Statist. Assoc. 69, 990-996. Fieller, E. C. (1954), Some problems in interval estimation. J . R . Statist. SOC.Ser. B 16, 175-185. Fisher, R. A. (1938),The statistical utilization of multiple measurements. Ann. Eugenics 8, 376-386. Fisher, F. M. (1966), The Identijication Problem in Econometrics. McGraw-Hill, New York.
Fletcher, R. and Powell, M. J. D. (1963), A rapidly convergent descent method for minimization. Cornput. J. 6, 163-168. Florens, J. P., Mouchart, M., and Richard, F. (1974) Bayesian inference in errorin-variables models. J . Multiuariate Anal. 4, 4 19-452. Franklin, J. N. (1968), Matrix Theory. Prentice-Hall, Englewood Cliffs, New Jersey. Frisillo, A. L. and Stewart, T. J. (1980a), Effect of partial gas/brine saturation on ultrasonic absorption in sandstone. Amoco Production Company Research Center, Tulsa, Oklahoma. I Frisillo, A. L. and Stewart, T. J. (1980b), Effect of partial gas/brine saturation on ultrasonic absorption in sandstone. J . Geophys. Res. 85, 5209-521 1. Fuller, W. A. (1975), Regression analysis for sample survey. Sankhyii C 37, 117-132. Fuller, W. A. (1976), Introduction to Statistical Time Series. Wiley, New York. Fuller, W. A. (1977), Some properties of a modification of the limited information estimator. Econometrica 45, 939-953. Fuller, W. A. (1978), An affine linear model for the relation between two sets of frequency counts: Response to query. Biometrics 34, 517-521. Fuller, W. A. (1980), Properties of some estimators for the errors-in-variables model. Ann. Statist. 8, 407-422. Fuller, W. A. (1984), Measurement error models with heterogeneous error variances. In Topics in Applied Statistics, Y. P. Chaubey and T. D. Dwivedi (Eds.). Concordia University, Montreal. Fuller, W. A. ( 1 985), Properties of residuals from errors-in-variables analyses. (Abstract) inst. Math. Statist. Bull. 14, 203-204. Fuller, W. A. (1986), Estimators of the factor model for survey data. In Proceedings of the Symposia in Statistics and Festschrijl in Honour of V . M . Joshi, I. B. Mac Neil1 and G. J. Umphrey (Eds.). Reidel, Boston. Fuller, W. A. and Chua, T. C. (1983), A model for multinomial response error. Proceedings of the 44th Session of the international Statistical lnstitute, Contributed Papers, Vol. 1, pp. 406-409. Fuller, W. A. and Chua, T. C. (1984), Gross change estimation in the presence of response error. In Proceedings of the Conference on Gross Flows in Labor Force Statistics. Bureau of the Census and Bureau of Labor Statistics, Washington, D.C. Fuller, W. A. and Harter, R. M. (1987), The multivariate components of variance model for small area estimation. In Small Area Statistics: An International Symposium, R. Platek, J. N. K. Rao, C. E. Sarndal, and M. B. Singh (Eds.). Wiley, New York. Fuller, W. A. and Hidiroglou, M. A. (1978), Regression estimation after correcting for attenuation. J . Am. Statist. Assoc. 73, 99-104. Fuller, W. A. and Pantula, S. G. (1982), A computational algorithm for the factor model. Iowa State University, Ames, Iowa. Gallant, A. R. (1975), Nonlinear regression. Am. Statist. 29, 73-81. Gallant, A. R. (l986), Nonlinear Statistical Models. Wiley, New York.
Gallo, P. P. (1982), Consistency of regression estimates when some variables are subject to error. Commun. Statist. Part A 11, 973-983. Ganse, R. A., Amemiya, Y.,and Fuller, W. A. (1983), Prediction when both variables are subject to error, with application to earthquake magnitude. J . Am. Statist. ASSOC.78, 761 -765. Garber, S. and Klepper, S. (1980), Extending the classical normal errors-in-variables model. Econometrics 48, 1541 - 1546. Geary, R. C. (1942), Inherent relations between random variables. Proc. R. Irish Acad. Sect. A 47, 63-76. Geary, R. C. (1943),Relations between statistics: The general and the sampling problem when the samples are large. Proc. R. Irish Acad Sect. A 49, 177-196. Geary, R. C. (1948), Studies in relations between economic time series. J . R. Statist. SOC. Ser. B 10, 140-158. Geary, R. C. (1949), Determinations of linear relations between systematic parts of variables with errors of observation the variances of which are unknown. Econometricn 17, 30-58. Geraci, V. J. (1976), Identification of simultaneous equation models with measurement error. J . Econometrics 4, 263-282. Girshick, M. A. (1939), On the sampling theory of roots of determinantal equations. Ann. Math. Statist. 10, 203-224. Gleser, L. J. (1981), Estimation in a multivariate “errors-in-variables” regression model: Large sample results. Ann. Stntist. 9, 24-44. Gleser, L. 3. (1982), Confidence regions for the slope in a linear errors-in-variables regression model. Technical Report 82-23. Department of Statistics, Purdue University, Lafayette, Indiana. Gleser, L. J. (1983), Functional, structural and ultrastructural errors-in-variables models. Proc. Business Economic Statist. Sect. Am. Statist. Assoc., 57-66. Gleser, L. J. (1985), A note on G. R. Dolby’s ultrastructural model. Biometrikn 72, 117-124. Gleser, L. J. and Hwang, J. T. (1985), The nonexistence of lOO(1 - LZ)% confidence sets of finite expected diameter in errors-in-variables and related models. Technical Report 85- 15. Department of Statistics, Purdue University, Lafayette, Indian a. Gleser, L. J. and Watson, G. S. (1973), Estimation of a linear transformation. Biornetrika 60, 525-534. Goldberger, A. S. (1972), Maximum likelihood estimation of regressions containing unobservable independent variables. Int. Econ. Rev. 13, 1-15. Goldberger, A. S. (1972), Structural equation models in the social sciences. Econometrica 40, 979-1002. Goldberger, A. S. and Duncan, 0. D. (Eds.) (1973), Structural Equation Models in the Social Sciences. Seminar Press, New York. Goodman, L. A. (1974), Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika 61, 21 5-231.
Gregory, K. E., Swiger, L.A., Koch, R. M., Sumption L. J., Rosden, W. W., and Ingalls, J. E. (1965), Heterosis in pre-weaning traits of beef cattle. J . Animal Sci. 24, 21 -28. Gregory, K. E., Swiger, L. A., Sumption, L. J., Koch, R. M., Ingalls, J. E., Rowden, W. W., and Rothlisberger, J. A. (1966a), Heterosis effects on growth rate and feed efficiencyof beef steers. J . Animal Sci. 25, 299-310. Gregory, K. E., Swiger, L. A., Sumption L. J., Koch, R. M., Ingalls, J. E., Rowden, W. W., and Rothlisberger, J. A. (1966b), Heterosis effects on carcass of beef steers. J . Animal Sci. 25, 31 1-322. Griffiths, D. A. and Sandland, R. L. (1982), Allometry and multivariate growth revisited. Growth 46, 1 - 1 1 . Griliches, Z. (1974), Errors in variables and other unobservables. Econometrica 42, 971 -998. Griliches, Z. and Ringstad, V. (1970), Errors in the variables bias in nonlinear contexts. Econometrica 38, 368-370. Grubbs, F. E. (1948), On testing precision of measuring instruments and product variability. J . Am. Statist. Assoc. 43, 243-264. Grubbs, F. E. (1973), Errors of measurement, precision, accuracy and the statistical comparison of measuring instruments. Technometrics 15, 53-66. Haberman, S. J. (1977), Product models for frequency tables involving indirect observation. Ann. Statist. 5, 1124- 1147. Hahn, G. J. and Nelson, W. (1970), A problem in the statistical comparison of measuring devices. Technometrics 12, 95- 102. Haitovsky, Y. (1972), On errors of measurement in regression analysis in economics. Rev. Int. Statist. Inst. 30, 23-35. Halperin, M. (1961), Fitting of straight lines and prediction when both variables are subject to error. J . Am. Statist. Assoc. 56, 657-659. Halperin, M. ( I 9641, Interval estimation in linear regression when both variables are subject to error. J . Am. Statist. Assoc. 59, 1112-1120. Halperin, M. and Gurian, J. (1971), A note on estimation in straight line regression when both variables are subject to error. J . Am. Statist. Assoc. 66, 587-589. Hanamura, R. C. (1979, Estimating variances in simultaneous measurement procedures. Am. Statist. 29, 108- 109. Hansen, M. H., Hurwitz, W. N., and Bershad, M. A. (1961), Measurement errors in censuses and surveys. Bull. Int. Statist. Inst. 38, 359-374. Harman, H. H. (1976), Modern Factor Analysis, 3rd ed. University of Chicago Press, Chicago. Hasabelnaby, N. A. (1983, Functionally related analysis of an error-in-variables model. Creative component for the M.S. degree. Iowa State University, Ames, Iowa. Heady, E. O., Sonka, S. T., and Dahm, P. F. (1976), Estimation and application of gain isoquants in decision rules for swine producers. J . Agric Econ. 27, 235-242.
Healy, J. D. (1973, Estimation and tests for unknown restrictions in multivariate linear models. Department of Statistics, Purdue University, Mimeo Series 471. Healy, J. D. (1980),Maximum likelihood estimation of a multivariate linear functional relationship. J . Multivariate Anal. 10, 243-251. Henderson, H. V. and Searle, S. R. (1979), Vec and Vech operators for matrices, with some uses in Jacobian and multivariate statistics. Can. J . Statist. 7, 65-81. Henrici, P. (1964), Elements of Numerical Analysis. Wiley, New York. Hey, E. N. and Hey, M. H. (1960), The statistical estimation of a rectangular hyperbola. Biometrics 16, 606-6 17. Hidiroglou, M. A. (1974), Estimation of regression parameters for finite populations. Unpublished Ph.D. thesis. Iowa State University, Ames, Iowa. Hidiroglou, M. A., Fuller, W. A., and Hickman, R. D. (1980), SUPER CARP. Department of Statistics, Iowa State University, Ames, Iowa. Hinich, M. J. (1983), Estimating the gain of a linear filter from noisy data. In Handbook of Statistics, Vol. 3, D. R. Brillinger and P. R. Krishnaiah (Eds.). NorthHolland, Amsterdam. Hodges, S. D. and Moore, P. G. (1972), Data uncertainties and least squares regression. J . Appl. Statist. 21, 185-195. Hoschel, H. P. (1978a), Least squares and maximum likelihood estimation of functional relations. In Transactions of the Eighfh Prague Conference ow Information Theory, Statistical Decision Functions, Random Processes. Academia, Prague. Hoschel, H. P. (1978b), Generalized least squares estimators of linear functional relations with known error covariance. Math. Operationsforsch. Statist. Ser. Statist. 9, 9-26. Hotelling, H. (1933), Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24,417-441,498-520. Hotelling, H. (1957), The relation of the newer multivariate statistical methods to factor analysis. Br. J . Statist. Psychol. 10, 69-79. Hsu, P. L. (1941a), On the problem of rank and the limiting distribution of Fisher’s test function. Ann. Eugenics 11, 39-41. Hsu, P. L. (1941b), Canonical reduction of the general regression problem. Ann. Eugenics 11, 42-46. Hunter, J. S. (1980), The national system of scientific measurement. Science 210, 869-874. Hunter, W. G. and Lamboy, W. F. (1981), A Bayesian analysis of the linear calibration problem. Technometrics 23, 323-328. Hwang, J. T. (1986), Multiplicative errors-in-variables models with applications to the recent data released by U.S. Department of Energy. J. Am. Statist. Assoc. 81, 680-688. Isogawa, Y. (1983, Estimating a multivariate linear structural relationship with replication. J . R. Statist. SOC. Ser. B 47, 211-215. Jaech, J. L. (1971), Further tests of significance for Grubbs’ estimators. Biometrics 27, 1097-1101.
Jaech, J. L. (1976), Large sample tests for Grubbs’ estimators of instrument precision with more than two instruments. Technometrics 18, 1271132. James, A. T. (1954), Normal multivariate analysis and the orthogonal group. Ann. Math. Statist. 25, 40-75. Jennrich, R. I. (1969), Asymptotic properties of nonlinear least squares estimators. Ann. Math. Statist. 40, 633-643. Jennrich, R. 1. (1973), Standard errors for obliquely rotated factor loadings. Psychometrika 38, 593-604. Jennrich, R. I. (1974), Simplified formulae for standard errors in maximum-likelihood factor analysis. Br. J. Math. Statist. Psychol. 27, 122-131. Jennrich, R. I. and Robinson, S. M. (1969), A Newton-Raphson algorithm for maximum likelihood factor analysis. Psychometrika 34, 1 1 1-1 23. Jennrich, R. 1. and Thayer, D. T. (1973), A note on Lawley’s formulas for standard errors in maximum likelihood factor analysis. Psychometrika 38, 571 -580. Johnston, J. (l972), Econometric Methods, 2nd ed. McGraw-Hill, New York. Jones, T. A. (1979), Fitting straight lines when both variables are subject to error I. Maximum likelihood and least squares estimation. J. Znt. Assoc. Math. Geol. 11, 1-25. Joreskog, K. G . (1966), Testing a simple structure hypothesis in factor analysis. Psychometrika 31, 165-178. Joreskog, K. G. (1967), Some contributions to maximum likelihood factor analysis. Psychometrika 32, 443-482. Joreskog, K. G. (1969), A general approach to confirmatory maximum likelihood factor analysis. Psychometrika 34, 183-202. Joreskog, K. G. (1970a), Estimation and testing of simple models. Er. J. Math. Statist. Psychol. 23, 121-145. JGreskog, K. G. (1970b), A general method for analysis of covariance structures. Biometrika 57, 239-251. Joreskog, K. G. (1971), Statistical analysis of sets of congeneric tests. Psychometrika 36, 109-133. Joreskog, K. G. (1973), A general method for estimating a linear structural equation system. In Structural Equation Models in the Social Sciences, A. S . Goldberger and 0.D. Duncan (Eds.). Seminar Press, New York. Joreskog, K. G. (1977), Factor analysis by least squares and maximum likelihood methods. In Statistical Methods for Digital Computers, Vol. 13, K. Enslein, R. Ralston, and S. W. Wilf (Eds.). Wiley, New York. Joreskog, K. G. (1978), Structural analysis of covariance and correlation matrices. Psychometrika 43, 443-477. Joreskog, K. G. (1981), Analysis of covariance structures. Scand. J. Statist. 8, 65-92. Joreskog, K . G. and Goldberger, A. S. (1972), Factor analysis by generalized least squares. Psychometrika 37, 243-260. Joreskog, K. G. and Lawley, D. N. (1968), New methods in maximum likelihood factor analysis. Br. J. Math. Statist. Psychol. 21, 85-96.
Joreskog, K. G. and Sorbom, D. (1981), LISREL V: Analysis of linear structural relationships by maximum likelihood and least squares methods. University of Uppsala, Uppsala, Sweden. Kadane, J. B. (1970), Testing overidentifying restrictions when the disturbances are small. J . Am. Statist. Assoc. 65, 182-185. Kadane, J. B. (1971), Comparison of k-class estimators when the disturbances are small. Econometrica 39, 723-737. Kalbfleisch, J. D. and Sprott, D. A. (1970), Application of likelihood methods to models involving large numbers of parameters. J. R. Statist. SOC. Ser. B 32, 175-208.
Kang, Y. J. (1983, Estimation for the no-intercept errors-in-variables model. Creative component for the M.S.degree. Iowa State University, Ames, Iowa. Karni, E. and Weissman, I. (1974), A consistent estimator of the slope in a regression model with errors in the variables. J . Am. Statist. Assoc. 69,211-213, corrections 840.
Kelley, J. (1973), Causal chain models for the socioeconomic career. Am. SOC. Rev. 38,481-493.
Kelly, G. (1984), The influencefunction in the errors in variables problem. Ann. Statist. 12,87-100.
Kendall, M . G . (1951), Regression, structure, and functional relationship, I. Eiometrika 38, 11-25. Kendall, M. G . (1952), Regression, structure, and functional relationship, 11. Eiometrika 39, 96-108. Kendall, M. G. and Stuart, A. (1977), The Advanced Theory of Statistics, Vol. 1, 4th ed. Hafner, New York. Kendall, M. G. and Stuart, A. (1979), The Advanced Theory of Statistics, Vol. 2,4th ed. Hafner, New York. Kerrich, J. E. (1966), Fitting the line Y = GIXwhen errors of observation are present in both variables. Am. Statist. 20, 24. Ketellapper, R. H. (1982), Two-stage least squares estimation in the simultaneous equation model with errors in the variables. Rev. Econ. Statist. 64, 696-701. Ketellapper, R. H. and Ronner, A. E. (1984), Are robust estimation methods useful in the structural errors-in-variables model? Metrika 31, 33-41. Kiekr, J. and Wolfowitz, J. (1956), Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist. 27, 887-906.
Klepper, S . and Learner, E. E. (1984), Consistent sets of estimates for regression with errors in all variables. Econometrica 55, 163-184. Konijn, H. S. (1981), Maximum likelihood estimator and confidence intervals for a simple errors in variables model. Commun. Statist. Part A 10, 983-996. Koopmans, T. C. (1937), Linear Regression Analysis o j Economic Time Series. DeErven F. Bohn, Haarlem, The Netherlands. Koopmans, T. C. and Hood, W. C. (1953), The estimation of simultaneous linear economic relationships. In Studies in Econometric Method, W . C. Hood and T. C.
Koopmans (Eds.). Yale University Press, New Haven. Koopmans, T. C. and Reiersol, 0. (1950), The identification of structural characteristics. Ann. Math. Statist. 21, 165-181. Korn, E. 1. (1982), The asymptotic efficiency of tests using misclassified data in contingency tables. Eiometrics 38, 445-450. Kummell, C. H. (1879), Reduction of observed equations which contain more than one observed quantity. Analyst 6, 97-105. Lakshminarayanan, M. Y. and Gunst, R. F. (1984),Estimation of parameters in linear structural relationships: Sensitivity to the choice of the ratio of error variances. Biometrika 71, 569-573. Lawley, D. N. (1940), The estimation of factor loadings by the method of maximum likelihood. Proc. R. SOC.Edinburgh A 60, 64-82. Lawley, D. N. (1941), Further investigations in factor estimation. Proc. R. SOC.Edinburgh A 61, 176-185. Lawley, D. N. (1943), The application of the maximum likelihood method to factor analysis. Br. J . Psychol. 33, 172-175. Lawley, D. N. (1953), A modified method of estimation in factor analysis and some large sample results. Uppsala Symposium on Psychological Factor Analysis, Nordisk Psykologi’s Monograph, No. 3. Almqvist and Wiksell, Stockholm. Lawley, D. N. (1956), Tests of significance for the latent roots of covariance and correlation matrices. Biometrika 43, 128- 136. Lawley, D. N. (1967), Some new results in maximum likelihood factor analysis. Proc. R. SOC. Edinburgh A 67, 256-264. Lawley, D. N. (1976), The inversion of an augmented information matrix occurring in factor analysis. Proc. R. SOC.Edinburgh A 75, 171-178. Lawley, D. N. and Maxwell, A. E. (1971), Factor Analysis as a Statistical Method, 2nd ed. American Elsevier, New York. Lawley, D. N. and Maxwell, A. E. (1973), Regression and factor analysis. Biometrika 60, 331-338. Lazarsfeld, P. F. (1950), The logical and mathematical foundation of latent structure analysis. In Measurement and Prediction, S. A. Stouffer et al. (Eds.), Princeton University Press, Princeton, NJ. Lazarsfeld, P. F. (1954), A conceptual introduction to latent structure analysis. In Mathematical Thinking in the Social Sciences, P. F. Lazarsfeld (Ed.), The Free Press, Glencoe, IL. Lazarsfeld, P. F. and Henry, N. W. (1968), Latent Structure Analysis. Houghton Mifflin, Boston. Learner, E. E. (1 978), Least-squares versus instrumental variables estimation in a simple errors in variables model. Ecanometrica 46, 961-968. Lee, S. Y. (1989, On testing functional constraints in structural equation models. Eiometrika 72, 125-132. Lee, S. Y. and Bentler, P. M. (1980), Some asymptotic properties of constrained generalized least squares estimation in covariance structure models. S. Afr. Statist. J . 14, 121-136.
Lee, S.Y. and Jennrich, R. I. (1979), A study of algorithms for covariance structure analysis with specific comparisons using factor analysis. Psychometrika 44, 99-113. Levi, M. D. (1973), Errors in the variables bias in the presence of correctly measured variables. Econometrica 41, 985-986. Lindley, D. V. (1947), Regression lines and the linear functional relationship. J. R. Statist. SOC. Suppl. 9, 218-244. Lindley, D. V. (1953), Estimation of a functional relationship. Biometrika 40,47-49. Lindley, D. V. and El Sayyad, G. M. (1968), The Bayesian estimation of a linear functional relationship. J. R. Statist. SOC.Ser. B 30, 190-202. Linssen, H. N. (1980), Functional relationships and minimum sum estimation. Doctoral dissertation. Technische Hogeschool, Eindhoven, The Netherlands. Linssen, H. N. and Banens, P. J. A. (1983), Estimation of the radius of a circle when the coordinates of a number of points on its circumference are observed: An example of bootstrapping. Statist. Prob. Lett. 1, 307-31 1. Longley, J. W. (1967), An appraisal of least squares programs for the electronic computer from the point of view of the user. J. Am. Statist. Assoc. 62, 819-841. Lord, F. M. (1960), Large-sample covariance analysis when the control variable is fallible. J. Am. Statist. Assoc. 55, 307-321. Madansky, A. (1959), The fitting of straight lines when both variables are subject to error. J. Am. Statist. Assoc. 54, 173-205. Madansky, A. (1976), Foundations of Econometrics. North-Holland, Amsterdam. Magnus, J. R. (1984), On differentiating eigenvalues and eigenvectors. London School of Economics, London. Mak, T. K. (1981), Large sample results in the estimation of a linear transformation. Biometrika 68, 323-325. Mak, T. K. (1983), On Sprent’s generalized least-squares estimator. J . R. Statist. SOC. Ser. B. 45, 380-383. Malinvaud, E. (1970), Statistical Methods of Econometrics. North-Holland, Amsterdam. Maloney, C . J. and Rastogi, S. C. (1970), Significance test for Grubbs’ estimators. Biometrics 26, 671-676. Mandel, J. (1959), The measuring process. Technometrics 1, 251-267. Mandel, J. (1976), Models, transformations of scale and weighting. J. Quality Technol. 8, 86-97. Mandel, J. (1982), The linear functional relationships with correlated errors in the variables; a heuristic approach. Unpublished manuscript. National Bureau of Standards, Washington, D.C. Mandel, J. and Lashof, T. W. (1959), The interlaboratory evaluation of testing methods. ASTM Bull. NO. 239, 53-61. Mann, H. B. and Wald, A. (1943), On stochastic limit and order relationships. Ann. Math. Statist. 14, 217-226. Mariano, R. S.and Sawa, T. (1972), The exact finite-sampledistribution of the limited-
information maximum likelihood estimator in the case of two included endogenous variables. J. Am. Statist. Assoc. 67, 159-163. Martinez-Garza, A. (1970), Estimators for the errors in variables model. Unpublished Ph.D. dissertation. Iowa State University, Ames, Iowa. McCulloch, C. E. (1982), Symmetric matrix derivatives with applications. J. Am. Statist. Assoc. 77, 679-682. McDonald, R. P. (l978), A simple comprehensive model for the analysis of covariance structures. Br. J. Math. Statist. Psychol. 31, 59-72. McDonald, R. P. (1980), A simple comprehensive model for the analysis of covariance structures: Some remarks o n applications. Br. J. Math. Statist. Psychol. 33, 161- 183. McRae, E. C. (1974), Matrix derivatives with an application to an adaptive linear decision problem. Ann. Statist. 1, 763-765. Meyers, H. and von Hake, C. A. (1976), Earthquake Data File Summary. National Geophysical and Solar-Terrestrial Data Center, U.S. Department of Commerce, Boulder, Colorado. Miller, M. H. and Modigliani, F. (1966), Some estimates of the cost of capital to the electric utility industry, 1954-57. Am. Econ. Rev. 56, 333-391. Miller, R. C., Aurand, L. W., and Flack, W. R. (1950), Amino acids in high and low protein corn. Science 112, 57-58. Miller, S. M. (1984), Tests for the slope in the univariate linear errors-in-variables model. Creative component for the M.S. degree. Iowa State University, Ames, Iowa. Miller, S. M. (1986), The limiting behavior of residuals from measurement error regressions. Unpublished Ph.D. dissertation. Iowa State University, Ames, Iowa. Moberg, L. and Sundberg, R. (1978), Maximum likelihood estimator of a linear functional relationship when one of the departure variances is known. Scund. J . Statist. 5, 61-64. Moran, P. A. P. (1961), Path coefficients reconsidered. Aust. J. Statist. 3, 87-93. Moran, P. A. P. (1971), Estimating structural and functional relationships. J . Multivariate Anal. 1, 232-255. Morgan, W. A. (1939), A test for the significance of the difference between the two variances in a sample from a normal bivariate populatipn. Biometrika 31, 13-19. Morgenstern, 0.(1963), On the accuracy of economic observations. Princeton University Press, Princeton, New Jersey. Morimune, K. and Kunitomo, N. (1980), Improving the maximum likelihood estimate in linear functional relationships for alternative parameter sequences. J. Am. Statist. Assoc. 75, 230-237. Morton, R. (1981a), Efficiency of estimating equations and the use of pivots. Biometrika 68, 227-233. Morton, R. (1981b), Estimating equations for an ultrastructural relationship. Biometrika 68, 735-737.
Mowers, R. P. (1981), Effects of rotations and nitrogen fertilization on corn yields at the Northwest Iowa (Calva-Primghar) Research Center. Unpublished Ph.D. dissertation. Iowa State University, Ames, Iowa. Mowers, R. P., Fuller, W. A,, and Shrader, W. D. (1981), Effect of soil moisture on corn yields on Moody soils. Iowa Agriculture and Home Economics Experiment Station Research Bulletin 593, Ames, Iowa. Mulaik, S. A. (1972), The Foundations of Factor Analysis. McGraw-Hill, New York. Nagar, A. L. (1959), The bias and moment matrix of the general k-class estimators of the parameters in simultaneous equations. Econometrica 27, 575-595. Nair, K. R. and Banerjee, K. S. (1942-1943), A note on fitting straight lines if both variables are subject to error. Sankhyii 6, 331. Nair, K. R. and Shrivastava, M. P. (1942-1943), On a simple method of curve fitting. S~nkhyii6, 121-132. National Opinion Research Center (1947), Jobs and occupations: A population evaluation. Opinion News 9, 3-13. Nel, D. G. (1980), On matrix differentiation in statistics. S . Afr. Statist. J . 14, 137-193. Neudecker, H. (1969), Some theorems on matrix differentiation with special reference to Kronecker matrix products. J . Am. Statist. Assoc. 64, 953-963. Neyman, J. (1951), Existence of consistent estimates of the directional parameter in a linear structural relation between two variables. Ann. Math. Statist. 22, 497512. Neyman, J. and Scott, E. L. (1948), Consistent estimates based on partially consistent observations. Econometrica 16, 1-32. Neyman, J. and Scott, E. L. (1951), On certain methods of estimating the linear structural relation. Ann. Math. Statist. 22, 352-361. Nussbaum, M. (1976), Maximum likelihood and least squares estimation of linear functional relationships. Math. Operationsforsch. Statist. Ser. Statist. 7 , 23-49. Nussbaum, M. (1977), Asymptotic optimality of estimators of a linear functional relation if the ratio of the error variances is known. Math. Operationsjiorsch. Statist. Ser. Statist. 8, 173-198. Nussbaum, M. (1979), Asymptotic efficiency of estimators of a multivariate linear functional relation. Math. Operationsforsch. Statist. Ser. Statist. 10, 505-527. Nussbaum, M. (1984), An asymptotic minimax risk bound for estimation of a linear functional relationship. J . Multivariate Anal. 14, 300-3 14. Okamoto, M. (1973), Distinctness of the eigenvalues of a quadratic form in a multivariate sample. Ann. Statist. 1, 763-765. Okamoto, M. (1983), Asymptotic theory of Brown-Fereday’s method in a linear structural relationship. J . Japan Statist. Soc. 13. 53-56. Okamoto, M. and Isogawa, Y. (1981), Asymptotic confidence regions for a linear structural relationship. J . Japan Statist. Soc. 11, 119-126. Okamoto, M. and Isogawa, Y. (1983), Maximum likelihood method in the BrownFereday model of multivariate linear structural relationship. Math. Japanica 28, 173- 180.
Okamoto, M. and Masamori, I. (1983), A new algorithm for the least-squares solution in factor analysis. Psychometrika 48, 597-605. ONeill, M., Sinclair, L. G., and Smith, F. J. (1969), Polynomial curve fitting when abscissas and ordinates are both subject to error. Cornput. J . 12, 52-56. Oxland, P. R., McLeod, D. S., and McNeice, G. M. (1979),An investigation of a radiographic technique for evaluating prosthetic hip performance. Technical Report, University of Waterloo, Department of Systems Design. Pakes, A. (1982), On the asymptotic bias of the Wald-type estimators of a straight line when both variables are subject to error. Int. Econ. Rev. 23, 491-497. Pal, M. (1980), Consistent moment estimators of regression coefficients in the presence of errors in variables. J. Econometrics 14, 349-364. Pal, M. (198 l), Estimation in Errors-in-Variables Models. Ph.D. thesis. Indian Statistical Institute, Calcutta. Pantula, S.G . (1983), ISU FACTOR. Department of Statistics, North Carolina State University, Raleigh, North Caroline. Pantula, S. G . and Fuller, W. A. (1986), Computational algorithms for the factor model. Commun. Statist. Part B 15, 227-259. Patefield, W. M. (1976), O n the validity of approximate distributions arising in fitting a linear functional relationship. J. Statist. Cornput. Sirnul. 5, 43-60. Patefield, W. M. (1977), On the information matrix in the linear functional relationship problem. Appl. Statist. 26, 69-70. Patefield, W. M. (1978), The unreplicated ultrastructural relation: Large sample properties. Biometrika 65, 535-540. Patefield, W. M. (198 l), Multivariate linear relationships: Maximum likelihood estimation and regression bounds. J . R . Statist. SOC.Ser. B 43, 342-352. Patefield, W. M. (1989, Information from the maximized likelihood function. Biometrika, 72, 664-668. Pearson, K. (1901), On lines and planes of closest fit to systems of points in space. Philos. Mag. 2, 559-572. Pearson, K. (1902), On the mathematical theory of errors of judgment, with special reference to the personal equations. Philos. Trans. R. Soc. London A198,235-299. Pierce, D. A. (1981). Sources of error in economic time series. J . Econometrics 17, 305-322. Pitman, E. J. G . (1939), A note on normal correlation. Biornetrika 31, 9-12. Prentice, R. L. (1982), Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika 69, 33 1-342. Rao, C. R.(1955), Estimation and test of significance in factor analysis. Psychometrika 20,93-111. Rao, C. R. (1965), Linear Statistical Inference and Its Applications. Wiley, New York. Reilly, P. M. and Patino-Leal, H. (1981), A Bayesian study of the error-in-variables model. Technometrics 23, 22 1-23 I . Reilman, M. A., Gunst, R. F., and Lakshminarayanan, M. Y. (1985), Structural model estimation with correlated measurement errors. Biometrika 72, 669-672.
Reiersol, 0. (1950), Identifiability of a linear relation between variables which are subject to error. Econometrica 18, 375-389. Richardson, D. H. (1968), The exact distribution of a structural coefficient estimator. J. Am. Statist. Assoc. 63, 1214-1226. Riggs, D. S., Guarnieri, J. A., and Addelman, S. (1978), Fitting straight lines when both variables are subject to error. LiJe Sci. 22, 1305-1360. Robertson. C. A. (1974), Large sample theory for the linear structural relation. Biometrika 61, 353-359. Robinson, P. M. (1977), The estimation of a multivariate linear relation. J . Multivariate Anal. 7 , 409-423. Ronner, A. E. (1986),Moment estimators in a structural regression model with outliers in the explanatory variable; theorems and proofs. Metrika, to be published. Roos, C. F. (1937), A general invariant criterion of fit for lines and planes where all variates are subject to error. Metron 13, 3-20. Roth, W. E. (1934), On direct product matrices. Bull. Am. Math. SOC.40, 461-468. Russel, T. S. and Bradley, R. A. (1958),One-way variances in a two-way classification. Biometrika 45, 11 1-129. Rust, B. W., Leventhal, M., and McCall, S. L. (1976), Evidence for a radioactive decay hypothesis for supernova luminosity. Nature (London) 262, 118-120. Sampson, A. R. (1974), A tale of two regressions. J . Am. Statist. Assoc. 69, 682-689. Sargan, J. D. (1958), The estimation of economic relationships using instrumental variables. Econometrica 26, 393-415. Sargan, J. D. and Mikhail, W. M. (1971),A general approximation to the distribution of instrumental variable estimates. Econometrica 39, 131-169. SAS Institute Inc. (1985), SAS@User’s Guide: Statistics, Version 5 Edition. SAS Institute Inc., Cary, North Carolina. Sawa, T. (19691, The exact sampling distribution of ordinary least squares and twostage least squares estimators. J. Am. Statist. Assoc. 64, 923-937. Sawa, T. (1973), Almost unbiased estimator in simultaneous equations system. Int. Econ. Rev. 14, 97-106. Schafer, D. W. (1986), Combining information on measurement error in the errorsin-variables model. 1. Am. Statist. Assoc. 81, 181-185. Scheffe, H. (1958),Fitting straight lines when one variable is controlled. J. Am. Statist. ASSOC.53, 106-117. Schneeweiss, H. (1976), Consistent estimation of a regression with errors in the variables. Metrika 23, 101-1 15. Schneeweiss, H. (1980), An efficient linear combination of estimators in a regression with errors in the variables. In Mathematische Systeme in der Okonomie, M. J. Beckmann, W. Eichhorn, and W. Krelle (Eds.). Athenaum, Konigstein, Schneeweiss, H. (1982), Note on Creasy’s confidence limits for the gradient in the linear functional relationship. J. Multivariate Anal. 12, 155- 158. Schneeweiss, H. (1985), Estimating linear relations with errors in the variables; the
merging of two approaches. In Contributions to Econometrics and Statistics Today, H. Schneeweiss and H. Strecker (Eds.). Springer-Verlag, Berlin. Schneeweiss, H. and Mittag, H. J. (1986), Lineare Modelle mit fehlerbehafteten Daten. Physica-Verlag, Heidelberg. Schnell, D. (1983), Maximum likelihood estimation of parameters in the implicit nonlinear errors-in-variables model. Creative component for the M.S. degree. Iowa State University, Ames, Iowa. Schnell, D. (1987), Estimation of parameters in the nonlinear functional errors-invariables model. Unpublished Ph.D. thesis, Iowa State University, Ames, Iowa. Scott, E. L. (1950), Note on consistent estimates of the linear. structural relation between two variables. Ann. Math. Statist. 21, 284-288. Seares, F. H. (1944), Regression lines and the functional relation. Astrophys. J. 100, 255-263. Searle, S. R. and Quaas, R. L. (1978), A notebook on variance components: A detailed description of recent methods of estimating variance components, with applications in animal breeding. Biometrics Unit Mimeo Series Paper No. BU-640-M. Cornell University, Ithaca, New York. Selen, J. (1986), Adjusting for errors in classification and measurement in the analysis of partly and purely categorical data. J. Am. Statist. Assoc. 81, 75-81. Shapiro, A. (1983), Asymptotic distribution theory in the analysis of covariance structures. S. Afr. Statist. J . 17, 33-81. Shapiro, A. (1985), Asymptotic distribution of test statistics in the analysis of moment structures under inequality constraints. Biometrika 72, 133-144. Shaw, R. H., Nielsen, D. R., and Runkles, J. R. (1959), Evaluation of some soil moisture characteristics of Iowa soils. Iowa Agriculture and Home Economics Experiment Station Research Bulletin 465, Ames, Iowa. Shukla, G. K. (1973), Some exact tests of hypothesis about Grubbs’ estimators. Biometrics 29, 313-318. Siege], P. M. and Hodge, R. W. (1968),A causal approach to the study of measurement error. In Methodology in Social Research, H. M. Blalock and A. B. Blalock (Eds.). McGraw-Hill, New York. Smith, K. (1918), On the standard deviations of adjusted and interpolated values of an observed polynomial function and its constants and the guidance they give towards a proper choice of the distribution of observations. Biometrika 12, 1-85. Solari, M. E. (1969), The “maximum likelihood solution” to the problem of estimating a linear functional relationship. J. R . Statist. Soc. Ser. B 31, 372-315. Sonka, S. T., Heady, E. O., and Dahm, P. F. (1976), Estimation ofgain isoquants and a decision model application for swine production. Am. J . Agric. Econ. 58, 466-474. Spearman, C. ( 1 904), General intelligence, objectively determined and measured. Am. J . Psychol. 15, 201-293. Spiegelman, C. (1979), On estimating the slope of a straight line when both variables are subject to error. Ann. Statist. 7, 201-206.
Spiegelman, C. (1982), A note on the behavior of least squares regression estimates when both variables are subject to error. J. Res. Nat. Bur. Stand. 87, 67-70. Sprent, P. (1966), A generalized least-squares approach to linear functional relationships. J . R. Statist. SOC.Ser. B 28, 278-297. Sprent, P. (1968), Linear relationships in growth and size studies. Biometrics 24, 639-656. Sprent, P. (1969), Models in Regression and Related Topics. Methuen, London. Sprent, P. (1970), The saddle point of a likelihood surface for a linear functional relationship. J. R. Statist. SOC.Ser. B 32, 432-434. Sprent, P. (1976), Modified likelihood estimation of a linear relationship. In Studies in Probability and Statistics, E. J. Williams (Ed.). North-Holland, Amsterdam. Stefanski, L. A. (1989, The effects of measurement error on parameter estimation. Biometrika 74, 583-592. Stefanski, L. A. (1985), Unbiased estimation of a nonlinear function of a normal mean with application to measurement-error models. Unpublished manuscript. Cornell University, Ithaca, New York. Stefanski, L. A. and Carroll, R. J. (1985a), Conditional scores and optimal scores for generalized linear measurement-errors models. Unpublished manuscript. Cornell University, Ithaca, New York. Stefanski, L. A. and Carroll, R. J. (1985b), Covariate measurement error in logistic regression. Ann. Statist. 12, 1335-1351. Stroud, T. W. F. (1972), Comparing conditional means and variances in a regression model with measurement errors of known variances. J. Am. Statist. Assoc. 67, 407-412, correction (1973) 68, 251. Sykes, L. R., Isacks, B. L., and Oliver, J. (1969), Spatial distribution of deep and shallow earthquakes of small magnitudes in the Fiji-Tonga Region. Bull. Seismof. SOC. Am. 59, 1093-1113. Takemura, A., Momma, M., and Takeuchi, K. (1985), Prediction and outlier detection in errors-in-variables model. Unpublished manuscript. University of Tokyo. Tenenbein, A. (1970), A double sampling scheme for estimating from binomial data with misclassifications. J. Am. Statist. Assoc. 65, 1350-1361. Theil; H. (1958), Economic Forecasts and Policy. North-Holland, Amsterdam. Theil, H. (1971), Principles of Econometrics. Wiley, New York. Theobald, C. M. (1975a), An inequality for the trace of the product of two symmetric matrices. Math. Proc. Cambridge Philos. SOC. 77, 265-267. Theobald, C. M. (1975b), An inequality with application to multivariate analysis. Biometrika 62, 46 1-466. Theobald, C. M. and Mallinson, J. R. (1978), Comparative calibration, linear structural relationships and congeneric measurements. Biometrics 34, 39-45. Thompson, W. A., Jr. (1963), Precision of simultaneous measurement procedures. J. Am. Statist. Assoc. 58, 474-479. Thomson, G. H. (1951), The Factorial Analysis of Human Ability. London University Press, London.
Thurstone, L. L. (1974), Multiple Factor Analysis. University of Chicago Press, Chicago. Tintner, G. (1945), A note on rank, multicollinearity, and multiple regression. Ann. Math. Statist. 16, 304-308. Tintner, G. (1946), Multiple regression for systems of equations. Econometrica 14, 5-36. Tintner, G. (1952), Econometrics. Wiley, New York. Tracy, D. S.and Dwyer, P. S. (1969), Multivariate maximum and minimum with matrix derivatives. J. Am. Statist. Assoc. 64, 1576-1594. Tukey, J. W. (1951), Components in regression. Biometrics 5, 33-70. Turner, M. E. (1978), Allometry and multivariate growth. Growth 42, 434-450. US. Department of Commerce (1975), 1970 Census of Population and Housing. Accuracy of Data for Selected Population Characteristics as measured by the 1970 CPS-Census Match. PHE(E)-11 US. Government Printing Office, Washington, D.C. Vann, R. and Lorenz, F. (1984), Faculty response to writing of nonnative speakers of English. Unpublished manuscript. Department of English, Iowa State University, Ames, Iowa. Villegas, C. (1961),Maximum likelihood estimation of a linear functional relationship. Ann. Math. Statist. 32, 1048-1062. Villegas, C. (1964), Confidence region for a linear relation. Ann. Math. Statist. 35, 780-787. Villegas, C. (1966), On the asymptotic efficiency of least squares estimators. Ann. Math. Statist. 37, 1676- 1683. Villegas, C. (1969),On the least squares estimation of non-linear relations. Ann. Math. Statist. 40, 462-466. Villegas, C. (1972),Bayesian inference in linear relations. Ann. Math. Statist. 43, 17671791. Villegas, C. (1982), Maximum likelihood and least squares estimation in linear and affine functional models. Ann. Statist. 10, 256-265. Voss, R. E. (1969), Response by corn to NPK fertilization on,Marshall and Monona soils as influenced by management and meteorological factors. Unpublished Ph.D. thesis. Iowa State University, Ames, Iowa. Wald, A. (1940), Fitting of straight lines if both variables are subject to error. Ann. Math. Statist. 11, 284-300. Waid, A. (1943), Tests of statistical hypotheses concerning several parameters when the number of observations is large. Trans. Am. Math. SOC.54, 426-482. Walker, H. M. and Lev, J. (1953), Statistical Inference. Holt, Rinehart and Winston, New York. Ware, J. H. (1972),Fitting straight lines when both variables are subject to error and the ranks of the means are known. J . Am. Statist. Assoc. 67, 891-897. Warren, R. D., White, J. K., and Fuller, W. A. (1974),An errors-in-variables analysis of managerial role performance. J . Am. Statist. Assoc. 69, 886-893.
Wax, Y. (1976), The adjusted covariance regression estimate. Unpublished Ph.D. thesis. Yale University, New Haven, Connecticut.
Werts, C. E., Rock, D. A., Linn, R. L., and Joreskog, K. G. (1976), Testing the equality of partial correlations. Am. Statist. 30, 101-102.
Whittle, P. (1952), On principal components and least square methods of factor analysis. Skand. Aktuarietidskr. 35, 223-239.
Whitwell, J. C. (1951), Estimating precision of textile instruments. Biometrics 7, 101-112.
Wiley, D. E. (1973), The identification problem for structural equations models with unmeasured variables. In Structural Equation Models in the Social Sciences, A. S. Goldberger and O. D. Duncan (Eds.). Seminar Press, New York.
Wilks, S. S. (1963), Mathematical Statistics. Wiley, New York.
Willassen, Y. (1977), On identifiability of stochastic difference equations with errors-in-variables in relation to identifiability of the classical errors-in-variables (EIV) models. Scand. J. Statist. 4, 119-124.
Williams, E. J. (1955), Significance tests for discriminant functions and linear functional relationships. Biometrika 42, 360-381.
Williams, E. J. (1969), Regression methods in calibration problems. Bull. Int. Statist. Inst. 43, 17-27.
Williams, E. J. (1973), Tests of correlation in multivariate analysis. Bull. Int. Statist. Inst. Proc. 39th Session, Book 4, 218-234.
Williams, J. S. (1978), A definition for the common-factor analysis model and the elimination of problems of factor score undeterminacy. Psychometrika 43, 293-306.
Williams, J. S. (1979), A synthetic basis for a comprehensive factor-analysis theory. Biometrics 35, 719-733.
Winakor, G. (1975), Household textiles consumption by farm and city families: assortment owned, annual expenditures, and sources. Home Economics Research J. 4, 2-26.
Wolfowitz, J. (1957), The minimum distance method. Ann. Math. Statist. 28, 75-88.
Wolter, K. M. (1974), Estimators for a nonlinear functional relationship. Unpublished Ph.D. dissertation. Iowa State University, Ames, Iowa.
Wolter, K. M. and Fuller, W. A. (1975), Estimating a nonlinear errors-in-variables model with singular error covariance matrix. Proc. Business Econ. Statist. Sect. Am. Statist. Assoc., 624-629.
Wolter, K. M. and Fuller, W. A. (1982a), Estimation of the quadratic errors-in-variables model. Biometrika 69, 175-182.
Wolter, K. M. and Fuller, W. A. (1982b), Estimation of nonlinear errors-in-variables models. Ann. Statist. 10, 539-548.
Wu, D. (1973), Alternative tests of independence between stochastic regressors and disturbances. Econometrica 41, 733-750.
Zellner, A. (1970), Estimation of regression relationships containing unobservable variables. Int. Econ. Rev. 11, 441-454.
Author Index
Abd-Ella, M.M., 201, 224 Adcock, R.J., 30, 350 Amemiya, Y., 56, 79, 240, 296, 302, 303, 328 Andersen, E.B., 272 Anderson, B.M., 296 Anderson, T.W., 30, 44, 89, 104, 154, 296, 321, 350, 372 Barnett, V.D., 30, 380 Barone, J.L., 147 Bartlett, M.S., 363 Battese, G.E., 201, 207, 275 Beaton, A.E., 147 Bellman, R., 391 Bentler, P.M., 321 Berkson, J., 81, 82 Bershad, M.A., 274 Bickel, P.J., 73, 347 Box, G.E.P., 83 Britt, H.J., 239 Brown, R.L., 179 Browne, M.W., 321, 335 Carroll, R.J., 189, 347 Carter, L.F., 201 Chua, T.C., 8, 275 Cochran, W.G., 9, 44 Cohen, J.E., 40 Dahm, P.F., 313, 321 Dalenius, T., 9 DeGracie, J.S., 18, 44 D'Eustachio, P., 40
Dijkstra, T., 321 Dorff, M., 74 Draper, N.R., 25, 225 Edelman, G.M., 40 Frisillo, A.L., 230, 234 Gallant, A.R., 225 Gallo, P.P., 189 Ganse, R.A., 56, 79, 328 Geary, R.C., 72 Gleser, L.J., 294, 303, 347 Goldberger, A.S., 34 Gregory, K.E., 313 Griffiths, D.A., 260 Grubbs, F.E., 71, 348 Gurland, J., 74 Hahn, G.J., 348 Harman, H.H., 350 Harter, R.M., 120 Healy, J.D., 294 Hickman, R.D., 201, 207 Hidiroglou, M.A., 202, 204, 205 Hodge, R.W., 201 Hunter, J.S., 9 Hunter, W.G., 179 Hwang, J.T., 202 Isacks, B.L., 214 Jennrich, R.L., 321, 360, 396 Johnston, J., 244
Joreskog, K.G., 34, 360, 365 Kalbfleisch, J.D., 104 Kendall, M.G., 14, 30, 75, 90, 138 Kiefer, J., 104 Klepper, S., 102 Koopmans, T.C., 30 Kummell, C.H., 30 Lashof, T.W., 365 Lawley, D.N., 350, 353, 360 Lazarsfeld, P.F., 272 Leamer, E.E., 102 Lee, S.Y., 321 Leventhal, M., 221 Lindley, D.V., 30, 75 Lorenz, F., 154 Luecke, R.L., 239 McCall, S.L., 221 McLeod, D.S., 244 McNeice, G.M., 244 Madansky, A., 30 Mandel, J., 365 Mariano, R.S., 44 Martinez-Garza, A., 44 Maxwell, A.E., 353 Melton, B.E., 313 Meyers, H., 56 Miller, M.H., 158 Miller, S.M., 44, 142, 302, 312 Modigliani, F., 158 Moran, P.A.P., 30 Morgenstern, O., 9 Morton, R., 104 Mowers, R.P., 134 Nelson, W., 348 Neyman, J., 104 Nielsen, D.R., 137 Nussbaum, M., 347 Oliver, J., 214 Olkin, I., 296 Oxland, P.R., 244
Pantula, S.G., 311, 359, 360, 365 Patefield, W.M., 102
Patino-Leal, H., 28, 244, 308 Pearson, K., 30, 350 Pierce, D.A., 9 Rao, C.R., 228 Reiersol, O., 73 Reilly, P.M., 28, 244, 308 Ritov, Y., 73, 347 Robinson, P.M., 360 Rubin, D.E., 147 Rubin, H., 104, 154, 350, 372 Runkles, J.R., 137 Rust, B.W., 221 Sandland, R.L., 260 Sargan, J.D., 154 Sawa, T., 44 Schnell, D., 239 Scott, E.L., 104 Shaw, R.H., 137 Shrader, W.D., 134 Siegel, P.M., 201 Smith, H., 25, 225 Sorbom, D., 365 Spearman, C., 350 Spiegelman, C., 73 Sprent, P., 130 Sprott, D.A., 104 Stefanski, L.A., 347 Stewart, T.J., 230, 234 Stuart, A., 14, 30, 75, 90, 138 Sykes, L.R., 214 Theobald, C.M., 394 Tintner, G., 30 Turner, M.E., 260 Vann, R., 154
von Hake, C.A., 56
Voss, R.E., 18
Wald, A., 13 Ware, J.H., 74 Warren, R.D., 110 White, J.K., 110 Winakor, G., 204 Wolfowitz, J., 104 Wolter, K.M., 259
Subject Index
Alternative model, see Test Analysis of covariance, 135, 145 (Ex. 1.17), 207 Apple trees, 131 Attenuation: coefficient, 7, 8 correction for, 5, 202 correlation, 5 regression coefficient, 3 see also Reliability ratio Bartlett adjusted chi-square, 363 Bartlett factor scores, 364 Bekk smoothness, 364 Berea sandstone, 230 Berkson model, 81 Bias: adjustment for, 164, 250 of least squares, 3, 8 of ML estimators, 164 of ML for nonlinear model, 251 Binomial variables, 271 Bivariate normal, 3 with measurement error, 4, 14, 30 moments of, 88 Boundary solution, see Maximum likelihood Bounds: for coefficients, 11, 12, 100, 101 for error variances, 11, 101 Calibration, 177 Census Bureau, 8 Central limit theorem: with fixed components, 92, 94
for linear functions, 90
for sample moments, 89
see also Distribution, limiting
Characteristic roots, 391 Characteristic vectors, 391 Coefficient: correlation, 4 least squares, 3, 11, 100 of variation, 98 (Ex. 1.59) Common factor, 60, 351 Communality, 60, 63, 69, 351 Confidence set, slope, 47 with instrumental variable, 55 Consistency: least squares, 398 maximum likelihood, 303, 398 Constrained model, see Test Controlled X, 79 nonlinear response, 83 Convergence: in distribution, 32, 85 in law, 32, 85 Corn hectares, 63 Corn yield, 18, 134, 179, 197 Corrected for attenuation: coefficient, 5 correlation coefficient, 7 variance of estimator, 6 Correlation, 4, 20 attenuation, 5 corrected for attenuation, 7 Courant-Fischer min-max theorem, 391 Covariance matrix: of estimated true values, 260 (Ex. 3.17)
Covariance matrix (Continued) of estimators, 15, 32, 53, 108, 127, 151 measurement error, estimated, 295 of nonlinear estimates, 240, 243 sample, 14, 96 of sample moments, 89, 332, 386 singular, 404 Current Population Survey, 8 Degrees of freedom: chi-square, 17, 363 covariance, estimated, 308 F, 127 likelihood adjusted for, 14, 17, 31, 62, 105, 296, 363 mean squares, 46 Studentized statistic, 17, 34 Delta method, 85, 87 Derivative: of determinant, 389 of inverse matrix, 390 matrix, 388 Determinantal equation, 38, 40, 125 Determinations: duplicate, 195, 348 (Ex. 4.16) multiple, 195 Diag, 3 Diagnostics, 142 Distance: Euclidean, 37, 42 minimum, 37, 42 statistical, 36, 42 Distribution, limiting: adequacy of, 18, 44 of estimators: bivariate structural, 15, 32 factor model, 360 functions of covariance matrix, 323, 329, 339, 398 instrumental variable, 53, 151 multivariate model, 303, 305 nonlinear model, 240 unequal variances, 218 of moments, 89, 92, 94 nonlinear estimates, 240 residuals, 120, 141 true values, 119, 140 see also Maximum likelihood Distribution-free, 188, 191, 332 Dummy variables, 231 error in, 277 in regression, 277 Earthquakes, 56, 77, 214, 325 Efficiency: of least squares, 346 measurement error models, 347 weights, 223 (Ex. 3.1) Eigenvalues, 391 Eigenvectors, 391 Employment status, 275 Endogenous variable, 157 Equation error, 106, 186, 193, 261, 292 Error covariance matrix: diagonal, 131 diagonal estimated, 110 estimated, 106, 127 functionally related, 202 linear measurement model, 103, 124 Errors: equation, 106, 186, 193, 261 heteroskedastic, 185 measurement, 2, 12, 95 nonnormal, 185 unequal variances, 185, 217 Errors-in-variables, 30 Error variances: bounds for, 11, 101 functionally related, 202, 208 known, 14 measurement, 2, 13 ratio known, 30 unequal, 193 Estimation: bivariate structural model, 14, 31 factor model, 61 instrumental variable, 53, 148 least squares, 2 linear, 4 maximum likelihood, 103 nonlinear, 225 ordinary least squares, 2 preliminary, 193 structural model, 5 true x, 20, 38, 63 weighted, 194 Exogenous variable, 157
SUBJECT INDEX Explanatory variable, I , 4, I3 vector, 100 Factor: analysis, 59, 350 common, 60, 35 1 latent, 60 loadings, 68, 351 estimated, 69 scores, 364 unique, 60, 35 1 Factor model: covariance of estimates. 66, 360 estimated X. 63, 364 estimates, 35, 64, 68 estimators, 6 1 , 360 identification, 61, 372 maximum likelihood, 353 predicted x, 72 (Ex. 1.46). 364 Farm operators, 201 Firm value, 158 Fixed model, see Functional model Fixed observation, 80 Fixed x. 2, 20, 24, 103 Functionally related variance, 202, 208 Functional model: definition, 2 least squares estimation, 36, 339 maximum likelihood for, 103, 124 multivariate, 293 nonlinear, 229 unequal variances, 2 17 vector, 103, 124 Gauss-Newton, 265 iteration, 239, 356 Generalized least squares, 20, 322. See nlso Least squares estimation Genotype, 4, 3 13 Goodness of fit, see Lack of fit Gram-Schmidt, 228 Grouped estimator, 73 Heritability, 3 Heteroskedastic errors, 185, 190 Heywood case, see Maximum likelihood, boundary solutions Hip prosthesis, 244 Hogs, 207
Hypothesis test, see Test Identification: definition, 9 distribution knowledge, 73 factor model, 352, 372 instrumental variable model, 149 for model, 10 normal structural model, 10 parameter, 10 test for, 17, 132, 154, 172, 201 Identified, just, 150 Implicit model, 238, 244 Independent variable, 1, 13 vector, 100 Indicator variable, 2, 231, 277 Instrument, 5 Instrumental variable: constant function as, 52 definition, 51 estimator, 53, 150 properties of, 53, 151, 153 factor model, 61, 353 groups, 74 model, 50, 148 ranks, 74 test: of identification, 54, 154 of model, 153 ISU Factor, 311, 365, 375, 376 Just identified, 150 κxx, 4, 5. See also Reliability ratio Kronecker delta, 383 Kronecker product, 384
Lack of fit, 40, 130, 153, 221 Language, 154, 369, 374 Latent class model, 273 Latent factor, 60 Latent structure model, 272 Latent variable, 2 Least squares estimation: of covariance matrix, 321 of diagonal matrix, 329 for errors-in-variables, 36 for fixed x, 2 of functional model, 338
Least squares estimation (Continued) iterated, 337 limiting distribution, 402 measurement error bias in, 3 method of moments, 38 nonlinear model, 239 random vs. fixed model, 343 restricted model, 349 (Ex. 4.17) statistical distance, 37 structural model, 323 of true values, 20, 38, 113, 140, 300 variance of, 5 weighted, 187 Likelihood function, see Maximum likelihood Likelihood ratio: factor model, 363 multivariate model, 301, 331 for slope, 170, 176 Limited information estimator, 154 Limiting distribution, see Distribution, limiting; Maximum likelihood Linear estimator, 4 Linear model, 1 LISREL, 311, 365, 376
M, 96 m, 96
Managers, 110 Manifest variable, 2 Matrix: derivatives, 388 spectral decomposition, 393 square root, 394 trace, 387 Maximum likelihood: adjusted for degrees of freedom: with correlation known, 379 (Ex. 4.24) of covariance matrix, 308, 360 factor model, 363 linear model, 14, 17, 34, 106 multivariate, 296, 320 (Ex. 4.11) for test, 301 boundary solutions, 7, 15, 105, 146 (Ex. 2.20), 293, 375 different error variances, 217 error in equation, 103 estimator: bias of, 173 bias for nonlinear, 251 of covariances, 14
distribution of, 127 estimated variance of, 220 limiting distribution, 108, 127, 218, 240, 303, 320 (Ex. 4.12), 398 modification of, 163, 171, 173, 250 Monte Carlo properties, 166 sequences of, 127 of sigma, 126, 243, 294 singular error covariance, 298, 404 structural model, 139, 296 of true values, 126 factor model, 62, 353 fixed x model, 2, 103, 124 functional model, 2, 103, 124 implicit nonlinear model, 239 inconsistency of, 104, 126, 294 iterative computation, 336 limited information, 154 multivariate model, 292 nonlinear model, 229, 238, 240, 247, 253 for n parameters, 104 random x, 105 unbounded, 104 unequal error variances, 217 Mean squares: corrected, 96 raw, 96 Measurement error: correlated with true values, 271 effect on least squares, 7, 9 introduction, 2 notation, 95 power of t, 12 (Ex. 1.1) unbiased, 273 Measurement variance, 2, 13. See also Error covariance matrix; Error variances Method of moments, 14, 30, 250 Metric, roots in, 293, 356, 391. See also Distance Mixing fractions, 308 ML, see Maximum likelihood Model: checks, 25, 117, 367 definition of, 9 specification test, 40, 130, 153, 221. See also Likelihood ratio Modifications of ML, 164, 171, 173, 251 Moments: distribution of sample, 89 of estimators, 28, 163
method of, 14, 30 for quadratic model, 250 of normal distribution, 88 Monte Carlo, 44, 165, 211, 347 Moore-Penrose inverse, 383 Multinomial variables, 272 Multivariate model, 292 distribution of estimators, 305 Newton-Raphson method, 218 Noise to signal ratio, 49 (Ex. 1.23) Nonlinear estimator: iterative calculation, 239 properties, 240, 255 see also Nonlinear model Nonlinear model: approximation to, 264 controlled X, 79 covariance matrix of estimator, 256, 257 definition, 226 error in equation, 261, 263 functional model, 263 structural model, 263 estimated covariance, 266 estimators, properties of, 240 expansions of, 250 fixed X, 79 implicit, 238 least squares estimation, 238, 350 (Ex. 4.20) linear in x, 226 maximum likelihood, 229, 238 nonlinear in x, 229 structural, 261 variance estimator, 253 Nonlinear regression, 231 Nonnormal errors, 185 Normal distribution: moments of, 88 sample moments, 90 Normality, test for, 117 Notation, 95
OLS, 1, 100. See also Least squares estimation Order in probability, 87 Ordinary least squares, 1, 100. See also Least squares estimation Outliers, 189 Overidentified, 70
Pheasants, 34 Phenotype, 3, 313 Plot, residual, 116 Power, of t test, 12 (Ex. 1.1) Prediction: in a second population, 77 of true values, 22, 119 Y given X, 74 Probabilities: conditional, 279 response, 274 two-way table, 279 Quadratic form, minima and maxima, 391 Quadratic model, 212 bias of maximum likelihood, 247 controlled experiment, 83 example of, 214, 215, 266 modified estimators for, 257 Random model, see Structural model Random x, 22, 24 Ranks, as instrumental variables, 74 Rat spleens, 40 Reduced form, 149 Reliability, 3 Reliability ratio: correlation, 5, 7 definition, 3 estimated, 13 (Ex. 1.11) estimation with known, 5, 199 factor model, 61, 63 vector explanatory variables, 199, 203 Repeated observations, 195, 348 (Ex. 4.16) Residuals: covariance matrix, 120, 141 definition, 25 diagnostics, 121 multivariate, 312 plot, 116 test for normality, 117 variance, 120, 140, 143 Response error, 8 Restricted model, see Test Right-wrong model, 273 Robustness, 189 R-squared, 4, 20
s, 97
Scores, factor: Bartlett's, 364 Thomson's, 364 Soil moisture, 134, 268 Soil nitrogen, 18, 179, 197 Spectral decomposition, 393 Square root, of matrix, 394 Stag beetles, 260 (Ex. 3.19) Statistical differentials, 85, 87 Statistical distance, 36, 42 point to line, 37 Stochastic matrix, 395 Structural model: definition, 2 estimators for, 14, 30, 107, 139, 295 least squares estimation, 323 maximum likelihood for, 296 nonlinear, 261 normal, 3 Studentized statistic, 17, 34, 177 Student's t, 4. See also Distribution, limiting; Studentized statistic SUPER CARP, 113, 202, 205, 208, 215, 227, 250, 266 Supernova, 221 Table, two-way, 274 Taylor series, 16, 85, 87 Test: hypothesis: bias with instrumental variable, 55 lack of fit, 40, 130, 221 for slope, 4, 44 slope with instrumental variable, 54 identification: for instrumental variables, 54, 154 reliability ratio model, 201 singular mxx, 18, 132, 172 for instrumental variable model, 153 of model, 40, 130, 153, 221 power of, 4 of σxx = 0, 18
of zero slope, 4 Textiles, 204 Tonga trench, 214 Trace, 387 of matrix product, 395 Trait, 5 True values, 2, 13, 30, 50, 95 estimated, 20, 38, 113, 140, 300 factor model, 63, 69, 364 predicted, 22, 119 factor model, 72 (Ex. 1.46), 364 t-statistic, 4, 17, 34, 177 Ultrastructural model, 107 Unconstrained model, see Test Unique factor, 60, 351 Uniqueness, 61, 69, 351 estimator of, 63 Variable: controlled, 79 indicator, 2 instrumental, 50 latent, 2, 60 manifest, 2 observable, 2, 13, 95 true, 2, 13, 30, 59, 95 Variance: distribution free, 188, 189 error, 2, 13 functionally related, 202 of least squares, 5 of moments, 88 of normal, 2 sample, 14 see also Covariance matrix Vec, definition, 382 Vech, definition, 382 Wald estimator, 74, 84, 146 (Ex. 2.19) Weights, 186, 194, 195 Wishart matrix, 108, 127, 296