Institute of Mathematical Statistics
LECTURE NOTES-MONOGRAPH SERIES
Volume 32

Selected Proceedings of the Symposium on Estimating Functions

Ishwar V. Basawa, V.P. Godambe and Robert L. Taylor, Editors

Institute of Mathematical Statistics
Hayward, California
Institute of Mathematical Statistics Lecture Notes-Monograph Series Editorial Board Andrew A. Barbour, Joseph Newton, and David Ruppert (Editor)
The production of the IMS Lecture Notes-Monograph Series is managed by the IMS Business Office: Miriam Gasko Donoho, IMS Treasurer, and James H. Sanders, IMS Business Manager.
Library of Congress Catalog Card Number: 97-077203
International Standard Book Number: 0-940600-44-7

Copyright © 1997 Institute of Mathematical Statistics
All rights reserved
Printed in the United States of America
TABLE OF CONTENTS

An Overview of the Symposium . . . 1
I.V. Basawa, V.P. Godambe and R.L. Taylor

Estimating Functions: A Synthesis of Least Squares and Maximum Likelihood Methods . . . 5
V.P. Godambe

SECTION 1: LIKELIHOOD AND RELATED TOPICS . . . 17

Partial Likelihood and Estimating Equations . . . 19
P. Greenwood and W. Wefelmeyer

Avoiding the Likelihood . . . 35
C.C. Heyde

Likelihood and Pseudo-likelihood Estimation Based on Response-Biased Observation . . . 43
J.F. Lawless

Likelihood From Estimating Functions . . . 57
P.A. Mykland

SECTION 2: GENERAL THEORY . . . 63

Estimating Functions in Semiparametric Statistical Models . . . 65
S. Amari and M. Kawanabe

Estimating Functions, Partial Sufficiency and Q-Sufficiency in the Presence of Nuisance Parameters . . . 83
V.P. Bhapkar

Estimating Functions and Higher Order Significance . . . 105
D.A.S. Fraser, N. Reid and J. Wu

On Consistency of Generalized Estimating Equations . . . 115
B. Li

SECTION 3: QUASILIKELIHOOD . . . 137

Extended Quasilikelihood and Estimating Equations . . . 139
J.A. Nelder and Y. Lee

Quasilikelihood Regression Models for Markov Chains . . . 149
W. Wefelmeyer

SECTION 4: APPLICATIONS TO LINEAR MODELS AND ECONOMETRICS . . . 175

Optimal Instrumental Variable Estimation for Linear Models With Stochastic Regressors Using Estimating Functions . . . 177
A.C. Singh and R.P. Rao

On Estimating Function Approach in the Generalized Linear Mixed Model . . . 193
B.C. Sutradhar and V.P. Godambe

Using Godambe-Durbin Estimating Functions in Econometrics . . . 215
H.D. Vinod

Estimating Functions and Over-Identified Models . . . 239
T. Wirjanto

SECTION 5: APPLICATIONS TO TIME SERIES, BIOSTATISTICS AND STOCHASTIC PROCESSES . . . 257

On the Prediction for Some Nonlinear Time Series Models Using Estimating Functions . . . 259
B. Abraham, A. Thavaneswaran and S. Peiris

Estimating Function Methods of Inference for Queueing Parameters . . . 269
I.V. Basawa, R. Lund and U.N. Bhat

Optimal Estimating Equations for State Vectors in Non-Gaussian and Nonlinear State Space Time Series Models . . . 285
J. Durbin

Estimating Functions in Failure Time Data Analysis . . . 293
R.L. Prentice and L. Hsu

Estimating Functions for Discretely Observed Diffusions: A Review . . . 305
M. Sorensen

Fitting Diffusion Models in Finance . . . 327
D.L. McLeish and A.W. Kolkiewicz

SECTION 6: APPLICATIONS TO SPATIAL STATISTICS . . . 351

Prediction Functions and Geostatistics . . . 353
A.F. Desmond

Efficiency of the Pseudo-Likelihood Estimate in a One Dimensional Lattice Gas . . . 369
J.L. Jensen

Estimating Functions for Semivariogram Estimation . . . 381
S. Lele

SECTION 7: NONPARAMETRICS, ROBUST INFERENCE AND BOOTSTRAP . . . 397

Estimating Covariance Matrices Using Estimating Functions in Nonparametric and Semiparametric Regression . . . 399
R.J. Carroll, S.J. Iturria and R.G. Gutierrez

Estimating Equations and the Bootstrap . . . 405
F. Hu and J.D. Kalbfleisch

Estimating Functions: Nonparametrics and Robustness . . . 417
P.K. Sen

SECTION 8: FURTHER TOPICS . . . 437

Inference From Stable Distributions . . . 439
H. El Barmi and P.I. Nelson

Separate Optimum Estimating Function for the Ruled Exponential Family . . . 457
T. Yanagimoto and Y. Hiejima
AN OVERVIEW OF THE SYMPOSIUM ON ESTIMATING FUNCTIONS

I.V. Basawa, University of Georgia
V.P. Godambe, University of Waterloo
R.L. Taylor, University of Georgia

The Symposium on Estimating Functions was held at the University of Georgia from March 21 to March 23, 1996. The Symposium was cosponsored by the Institute of Mathematical Statistics and the Statistical Society of Canada and represented continuing efforts by the two professional societies to focus special attention on some of the more prominent directions in probability and statistics. Partial funding by the University of Georgia's "State-of-the-Art" Conference Program and a National Security Agency grant contributed to the success of the Symposium and is gratefully acknowledged.

The Symposium attracted 119 registered participants from several countries, including Australia, Canada, Denmark, England, Germany, Hong Kong, India, Japan, Kuwait, Sweden, and the United States of America. The program consisted of 13 sessions with 35 invited speakers and 14 contributed talks.

The main theme of the Symposium can be summarized as "Statistics at a Juncture of a Synthesis." The likelihood function has provided a basic methodology for parametric inference for decades, while semiparametric inference has for even longer been based primarily on the least-squares methodology. The unification and extension of these two methodologies, achieved in recent years through estimating functions, was the main theme of the opening talk of the Symposium by V.P. Godambe. The discussions and presentations which followed were energetic and covered a wide variety of topics: C.C. Heyde suggested avoiding the likelihood; J.A. Nelder presented extensions to quasilikelihood; N. Reid discussed higher order significance; J. Durbin discussed applications to nonlinear state space time series; J.D. Kalbfleisch presented the bootstrap using estimating functions. There appeared to be a consensus at the Symposium that Statistics was at
a juncture of a synthesis of two of its main methodologies, namely the likelihood and least squares, brought about by estimating functions.

Many of the major results which were presented at the Symposium are summarized in these selected proceedings. Specifically, the papers are organized into eight sections. A general historical paper by V.P. Godambe precedes the following sections:

1. 'Likelihood', with papers by P. Greenwood & W. Wefelmeyer, C.C. Heyde, J.F. Lawless, and P.A. Mykland. Topics central to this section include likelihood, partial likelihood, pseudo-likelihood and other alternatives to the likelihood.

2. 'General Theory', with papers by S. Amari & M. Kawanabe, V.P. Bhapkar, D.A.S. Fraser, N. Reid & J. Wu, and B. Li. Papers in this section deal with problems on estimating functions in semiparametric models, nuisance parameters, higher order significance and consistency.

3. 'Quasilikelihood', with papers by J.A. Nelder & Y. Lee, and W. Wefelmeyer. Extended quasilikelihood and regression models for Markov chains are discussed in this section.

4. 'Applications to Linear Models and Econometrics', with papers by A.C. Singh & R.P. Rao, B.C. Sutradhar & V.P. Godambe, H.D. Vinod, and T. Wirjanto. Papers in this section address problems in instrumental variable estimation, generalized linear mixed models, Godambe-Durbin estimating functions in econometrics and over-identified models.

5. 'Applications to Time Series, Biostatistics and Stochastic Processes', with papers by B. Abraham, A. Thavaneswaran & S. Peiris, I.V. Basawa, R.B. Lund & U.N. Bhat, J. Durbin, R.L. Prentice & L. Hsu, M. Sorensen, and D.L. McLeish & A.W. Kolkiewicz. Applications in this section are in nonlinear state space models, prediction, failure time data analysis, queueing parameter estimation, diffusion processes and models in finance.

6. 'Applications to Spatial Statistics', with papers by A.F. Desmond, J.L. Jensen, and S. Lele. Prediction in geostatistics, pseudo-likelihood estimation for lattice models and semivariogram estimation are covered in this section.

7. 'Nonparametrics, Robust Inference and Bootstrap', with papers by R.J. Carroll, S.J. Iturria & R.G. Gutierrez, F. Hu & J.D. Kalbfleisch, and P.K. Sen. These papers contain results for estimating covariance matrices, nonparametrics and robustness, and bootstrap techniques.

8. 'Further Topics', with papers by H. El Barmi & P.I. Nelson, and T. Yanagimoto & Y. Hiejima. The two papers in this final section are concerned with inference from stable distributions and inference for the ruled exponential family.

The editors of these selected proceedings of 29 papers are very grateful to the numerous referees who carefully and critically reviewed all papers which
were submitted for publication in the proceedings. Also, the willingness of individual authors to limit the number of pages of their articles helped produce this volume in the IMS Lecture Notes Series. Special thanks go to Connie Durden for the preparation of this volume. Ms. Durden worked tirelessly and patiently with the various authors, securing software files of their papers and providing uniformity of margins, spacing and similar editorial details, which greatly enhanced the general appearance of this volume.
ESTIMATING FUNCTIONS: A SYNTHESIS OF LEAST SQUARES AND MAXIMUM LIKELIHOOD METHODS

V.P. Godambe
University of Waterloo

ABSTRACT

The development of the modern theory of estimating functions is traced from its inception. It is shown that this development has brought about a synthesis of the two historically important methodologies of estimation, namely 'least squares' and 'maximum likelihood'.

Key Words: Estimating functions; likelihood; score function.
1 Introduction
In common with most historical investigations, it is difficult to trace the origin of the subject of this conference, 'Estimating Functions'. However, over the last two centuries there are clearly three important precursors of the modern theory of estimating functions (EF): in 1805, Legendre introduced the least squares (LS) method; at the turn of the last century, Pearson proposed the method of moments; and in 1925, Fisher put forward the maximum likelihood (ML) equations. Of these three, the method of moments faded out in time because it lacked any sound theoretical justification. The other two methods, the LS and the ML, still play an important role in statistical methodology, and these two methods are what will concern us in the following.

The LS method was justified by what today is called the Gauss-Markoff (GM) theorem: the estimates obtained from the LS equations are 'optimal' in the sense that they have minimum variance in the class of linear unbiased estimates. This was a finite sample justification. At about the same time, Laplace provided a different, 'asymptotic' justification for the method. Fisher justified ML estimation on the grounds that it produces estimates which are asymptotically unbiased with smallest variance. This left open the question: is there a finite sample justification for ML estimation corresponding to the GM theorem justification for LS estimation? The modern EF theory provided such a justification. According to the 'optimality criterion' of the EF theory, the score function (SF) is 'optimal'.
2 SF Optimality
To state the result just mentioned formally, we briefly introduce some notation. Let $X = \{x\}$ be the sample (observation) space and let a class of possible distributions (densities) on $X$ be given by $\{f(\cdot\,|\,\theta),\ \theta \in \Omega\}$, $\Omega$ being the parameter space, which we assume here to be the real line. If the function $f$ is completely specified up to the (unknown) parameter $\theta$, $f(\cdot\,|\,\theta)$ is called a parametric model. For this model the score function is $\mathrm{SF} = \partial \log f(\cdot\,|\,\theta)/\partial\theta$. Any real function of $x$ and $\theta$, say $g(x,\theta)$, is called an estimating function (EF). It is said to be unbiased if its mean value is zero for every $\theta \in \Omega$: $\mathcal{E}(g) = 0$. Further, for reasons which will become clear later, corresponding to every EF $g$ we define a standardized version $g/\mathcal{E}(\partial g/\partial\theta)$. Now in a class $\mathcal{G} = \{g\}$ of unbiased estimating functions, $g^*$ is said to be 'optimal' if the variance of the standardized EF is minimized for $g = g^*$:

$$\mathcal{E}(g^{*2})\big/\{\mathcal{E}(\partial g^*/\partial\theta)\}^2 \;\le\; \mathcal{E}(g^2)\big/\{\mathcal{E}(\partial g/\partial\theta)\}^2, \qquad \theta \in \Omega,\ g \in \mathcal{G}. \tag{2.1}$$
SF Theorem (Godambe, 1960). For a parametric model $f(\cdot\,|\,\theta)$, granting some regularity conditions, in the class of all unbiased EFs the optimal estimating function is given by the SF, i.e. $g^* = \partial\log f(\cdot\,|\,\theta)/\partial\theta$.

The optimality of the SF given by the above theorem should be distinguished from the optimality of the LS estimates based on the GM theorem. The SF optimality (which, with some additional assumptions, implies asymptotic optimality of the ML estimate) is essentially optimality of the 'estimating function', while the LS optimality is optimality of the 'estimate'. The concept underlying the optimality criterion of the EF theory became more vivid and compelling in relation to the problem of nuisance parameters.
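To see the SF theorem numerically, consider a minimal Monte Carlo sketch in Python, assuming the model $N(\theta, 1)$; the competing estimating function $g(x,\theta) = x^3 - (\theta^3 + 3\theta)$ is a hypothetical choice used only for illustration.

```python
import numpy as np

# Monte Carlo check of criterion (2.1) for the assumed model N(theta, 1).
rng = np.random.default_rng(0)
theta = 0.7
x = rng.normal(theta, 1.0, size=1_000_000)

# Score function: g*(x, theta) = x - theta, with dg*/dtheta = -1.
g_star = x - theta
crit_star = np.mean(g_star**2) / (-1.0)**2

# A competing unbiased EF: g(x, theta) = x^3 - (theta^3 + 3*theta);
# it is unbiased because E[X^3] = theta^3 + 3*theta when X ~ N(theta, 1),
# and dg/dtheta = -(3*theta^2 + 3) does not depend on x.
g = x**3 - (theta**3 + 3*theta)
crit_g = np.mean(g**2) / (3*theta**2 + 3)**2

print(crit_star, crit_g)  # approx. 1.00 vs approx. 1.74: the SF wins
```

Here the standardized variance of the score equals the reciprocal of the Fisher information (which is 1 for $N(\theta,1)$); any other unbiased EF in the class has a larger value of the criterion (2.1).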
3 Conditional SF Optimality
Now let the parameter $\theta$ consist of two components $\theta_1$ and $\theta_2$, $\theta = (\theta_1, \theta_2)$, and let the parametric model be $f(\cdot\,|\,\theta_1,\theta_2)$, where $\theta_1$ is real and $\theta_2$ is a vector; $\theta \in \Omega$, $\theta_1 \in \Omega_1$, $\theta_2 \in \Omega_2$ and $\Omega = \Omega_1 \times \Omega_2$. Further suppose we want to estimate only $\theta_1$ (the interest parameter), ignoring $\theta_2$ (the nuisance parameter). How should we proceed? To this question ML estimation provides no satisfactory answer: if $\hat\theta_1$ and $\hat\theta_2$ are the joint ML estimates of $\theta_1$ and $\theta_2$, then, as is well known, the estimate $\hat\theta_1$ can be inconsistent (unacceptable) when the dimensionality of the parameter $\theta_2$ grows with the number of observations (cf. Neyman and Scott, 1948). The EF theory, for the present situation, implies restricting attention to that part of the likelihood function which is governed by the interest parameter $\theta_1$ only. Formally, for the parametric model $f(\cdot\,|\,\theta_1,\theta_2)$, let $\mathcal{G}_1$ be the class of all unbiased EFs $g(x,\theta_1)$, that is, functions of $x$ and $\theta_1$ only:

$$\mathcal{G}_1 = \{g : g = g(x,\theta_1),\ \mathcal{E}(g) = 0,\ \theta \in \Omega\}.$$

Further, let $t$ be a complete sufficient statistic for the parameter $\theta_2$, for every fixed $\theta_1$. Assuming the statistic $t$ is independent of the parameter $\theta_1$, we have

Conditional SF Theorem (Godambe, 1976). Granting some regularity conditions, in the class of EFs $\mathcal{G}_1$ the 'optimal' EF $g^*$ is given by the conditional SF, i.e. $g^* = \partial\log f(\cdot\,|\,t;\theta_1)/\partial\theta_1$.

Note that in the above theorem the definition of optimality is obtained from (2.1) just by replacing $\mathcal{G}$ by $\mathcal{G}_1$ and consequently $\mathcal{E}(\partial g/\partial\theta)$ by $\mathcal{E}(\partial g/\partial\theta_1)$. That is, the criterion of optimality is unconditional. In the Neyman-Scott example, unlike the ML estimate $\hat\theta_1$, the equation 'conditional SF $= 0$' provides a consistent estimate of $\theta_1$.

Further, the EF optimality criterion suggests a definition of the 'conditional SF' in case the statistic $t$ depends on the parameter $\theta_1$. If $t(\theta_{10})$ is the value of $t$ at $\theta_1 = \theta_{10}$, then we define the conditional SF by $g^*$ where

$$g^* = \left\{\partial\log f(\cdot\,|\,t(\theta_{10});\theta_1,\theta_2)/\partial\theta_1\right\}_{\theta_{10}=\theta_1}. \tag{3.1}$$

This definition is motivated as follows. The EF $g^*$ in (3.1) belongs to $\mathcal{G}_1$, though it depends on $\theta_2$. It further is 'optimal' in $\mathcal{G}_1$, though only locally at $\theta_2$ (Lindsay, 1982). Unlike the previous situation, when the sufficient statistic $t$ was independent of $\theta_1$, now no universally optimal $g^*$ (i.e. for all $\theta_2 \in \Omega_2$) exists in $\mathcal{G}_1$. Further, though the EF $g^*$ in (3.1) depends on $\theta_2$, it is orthogonal to the marginal SF of the sufficient statistic $t$; hence substituting into it an estimate $\hat\theta_2$ derived from the latter would still leave it nearly optimal for large samples (Lindsay, 1982; Godambe, 1991; Small and McLeish, 1994; Liang and Zeger, 1995). The equation $\{g^*\}_{\theta_2 = \hat\theta_2} = 0$ would provide a (nearly optimal) consistent estimate of $\theta_1$.

Note that in the foregoing discussion, conditioning is used just as a 'technique' to obtain (unconditionally) 'optimal' EFs; it is not used as a principle of inference. In fact, without invoking any conditioning at all, Godambe and Thompson (1974) established, in the case of the normal distribution $N(\theta_1,\theta_2)$, the optimality of the EF $(s^2 - \theta_2)$ for the interest parameter $\theta_2$, ignoring the nuisance parameter $\theta_1$. How this (unconditional) optimality leads to a very 'flexible conditioning' will be discussed later.
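As a concrete instance of the Neyman-Scott phenomenon discussed above, the following simulation sketch assumes paired observations $x_{i1}, x_{i2} \sim N(\mu_i, \sigma^2)$ (the standard form of the example, chosen here for illustration), with $\sigma^2$ the interest parameter and the stratum means $\mu_i$ nuisance parameters whose number grows with the sample.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 20_000, 2.0                  # many strata; true variance 2.0
mu = rng.uniform(-5, 5, size=n)          # nuisance means, one per stratum
x = rng.normal(mu[:, None], np.sqrt(sigma2), size=(n, 2))

# Joint ML estimate of sigma^2: averages squared deviations from each
# stratum's own mean; it converges to sigma^2 / 2, hence is inconsistent.
ml = np.mean((x - x.mean(axis=1, keepdims=True))**2)

# Conditional-SF estimate, conditioning on the complete sufficient
# statistic t_i = (x_i1 + x_i2)/2 for mu_i: consistent for sigma^2,
# since E(x_i1 - x_i2)^2 = 2*sigma^2.
cond = np.mean((x[:, 0] - x[:, 1])**2) / 2

print(ml, cond)   # approx. 1.0 (= sigma2/2) versus approx. 2.0 (= sigma2)
```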
For a general perspective on the topic of conditioning and optimality we refer to Small and McLeish (1988), Lindsay and Waterman (1991) and Lindsay and Li (1995). Lloyd (1987) and Bhapkar (1991) have given results concerning optimality of the 'marginal SF' under 'conditional completeness'.

From the above discussion it is clear that the EF theory has corrected a major deficiency of ML estimation in the case of nuisance parameters. Some earlier references in respect of nuisance parameters are Bartlett (1936), Cox (1958), Barnard (1963), Kalbfleisch and Sprott (1970), Barndorff-Nielsen (1973) and others. Some of these authors tried to obtain conditions under which the marginal distribution of $t$ does not contain any information about $\theta_1$, the parameter of interest. As we have seen, the optimality criterion of the EF theory yields such a condition in terms of 'completeness of the statistic $t$'. Though not universally applicable (as no condition can be, I suppose), it has by now been commonly used for its mathematical manageability. It also carries greater conviction, for it is derived from an optimality criterion which has proved fruitful very generally. In the following we show that the EF theory, just as it corrected ML estimation, also corrects some major inadequacies of LS estimation and the GM theorem.
4 Quasi-Score Function
We now replace the abstract (observation) sample $x$ in the discussion by $n$ real variates $x_i$, $i = 1,\ldots,n$, which are assumed to be independently distributed with means $\mu_i(\theta)$ and variances $v_i(\theta)$ ($\mu_i$ and $v_i$ being some specified functions of $\theta$), $i = 1,\ldots,n$. For simplicity let $\theta$ be a scalar parameter. Initially we consider the special case where the $\mu_i$ are linear functions of $\theta$ and the $v_i$ are independent of $\theta$. Here the LS equation is given by

$$\sum_i (x_i - \mu_i)\,\frac{d\mu_i/d\theta}{v_i} = 0.$$

The solution of this equation, as said before, according to the GM theorem has smallest variance in the class of all linear unbiased estimates of $\theta$; hence it is 'optimal'. The estimating function $\sum_i (x_i - \mu_i)(d\mu_i/d\theta)/v_i$ is also 'optimal' according to criterion (2.1) in the class of all EFs of the form

$$g = \sum_{i=1}^{n} (x_i - \mu_i)\,a_i, \tag{4.1}$$

where the $a_i$ can be arbitrary functions of $\theta$. (Actually here we minimize $\mathcal{E}(g^2)$ subject to holding $\mathcal{E}(\partial g/\partial\theta)$ constant; this explains the standardization of the EF mentioned earlier.) Note that this EF optimality implies more than the GM optimality, for the solutions corresponding to all the equations $g = 0$ include not only all linear unbiased estimates of $\theta$ but many more.
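A small simulation sketch (with hypothetical $z_i$ and known $v_i$, assumed here purely for illustration) shows the optimality in the class (4.1): for the linear model $\mathcal{E}(x_i) = \theta z_i$, $\mathrm{Var}(x_i) = v_i$, the weights $a_i = z_i/v_i$ give the weighted LS estimate, which has smaller variance than the ordinary LS estimate obtained from the suboptimal weights $a_i = z_i$.

```python
import numpy as np

# Weighted vs. unweighted estimating functions for the linear model
# E[x_i] = theta * z_i, Var(x_i) = v_i (v_i known, free of theta).
rng = np.random.default_rng(2)
theta, n, reps = 1.5, 50, 20_000
z = rng.uniform(0.5, 2.0, size=n)
v = rng.uniform(0.1, 4.0, size=n)        # heteroscedastic variances

x = theta*z + rng.normal(0, np.sqrt(v), size=(reps, n))

# Optimal EF weights a_i = z_i / v_i: solving g = 0 gives weighted LS.
opt = (x * (z/v)).sum(axis=1) / (z**2/v).sum()
# Valid but suboptimal weights a_i = z_i: ordinary LS estimate.
ols = (x * z).sum(axis=1) / (z**2).sum()

print(opt.var(), ols.var())  # the optimal EF yields the smaller variance
```

Both estimates are unbiased; the gap between the two variances reflects exactly the difference between the optimal and an arbitrary choice of the $a_i$ in (4.1).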
ESTIMATING FUNCTIONS Now let the means μ* and variances V{ be arbitrarily specified functions of θ. Here the LS equation is given by 3 + B = 0 where g
=
, n ,„. 1
..jdμi/dθ) υ%
and
(4.2)
Clearly in (4.2), S{g) = 0 and £(B) = ΣΐdlogVi/dθ. Note for large n, (