PSEUDOSOLUTION OF LINEAR FUNCTIONAL EQUATIONS Parameters Estimation of Linear Functional Relationships
Mathematics and Its Applications
Managing Editor: M. Hazewinkel, Centre for Mathematics and Computer Science, Amsterdam, The Netherlands
PSEUDOSOLUTION OF LINEAR FUNCTIONAL EQUATIONS Parameters Estimation of Linear Functional Relationships
ALEXANDER S. MECHENOV Moscow State University, Russia
Springer
Library of Congress Cataloging-in-Publication Data A C.I.P. record for this book is available from the Library of Congress.
ISBN 0-387-24505-7
e-ISBN 0-387-24506-5
Printed on acid-free paper.
© 2005 Springer Science+Business Media, Inc.
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, Inc., 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed in the United States of America. 9 8 7 6 5 4 3 2 1
SPIN 11383666
Contents
General Preface ... vii
Labels and Abbreviations ... ix
1. SYSTEMS OF LINEAR ALGEBRAIC EQUATIONS: ANALYSIS OF PASSIVE EXPERIMENTS ... 1
  1.1 Multiple Linear Regression Analysis ... 2
  1.2 Linear Model Subject to the Linear Constraints ... 13
  1.3 Estimation of the Normal Parameter Vector ... 18
  1.4 Confluent Analysis of Passive Experiment ... 36
  1.5 Stable Parameter Estimation of the Degenerated Confluent Model ... 56
  1.6 Confluent-Regression Analysis of Passive Experiment ... 67
  1.7 Stable Estimation of Normal Parameters of the Degenerated Confluent-Regression Model ... 84
2. SYSTEMS OF LINEAR ALGEBRAIC EQUATIONS: ANALYSIS OF ACTIVE AND COMPLICATED EXPERIMENTS ... 93
  2.1 Analysis of Active Experiment ... 94
  2.2 Analysis of Active-Regression Experiment ... 111
  2.3 Analysis of Passive-Active Experiment ... 117
  2.4 Analysis of Passive-Active-Regression Experiment ... 127
3. LINEAR INTEGRAL EQUATIONS: ANALYSIS OF PASSIVE AND OF ACTIVE EXPERIMENTS ... 141
  3.1 Variational Problems for the Construction of Pseudosolutions of Linear Integral Equations ... 141
  3.2 About the Appurtenance of Gauss-Markov Processes to Probability Sobolev Spaces ... 166
  3.3 Fredholm Linear Integral Equations with the Random Right-Hand Side Errors ... 182
  3.4 Linear Integral Equation with the Measured or Realized Core ... 187
  3.5 Technique of Computations ... 201
References ... 213
Application ... 221
Index ... 229
Glossary of Symbols ... 233
GENERAL PREFACE
This book introduces models and methods for the construction of pseudosolutions of well-posed and ill-posed linear functional equations describing models of passive, active and complicated experiments. Two types of functional equations are considered: systems of linear algebraic equations and linear integral equations. Methods for the construction of pseudosolutions are developed in the presence of passive right-hand side errors and for two types of operator errors: passive measurement errors and active representation errors of the operator, and all their combinations. For the determined and stochastic models of passive experiments the method of least distances for the construction of pseudosolutions is created; the maximum likelihood method is applied for active experiments; and methods are then created for combinations of regression models with models of passive and of active experiments. We have constructed regularized variants of these methods for systems of linear algebraic equations with degenerate matrices and for linear integral equations of the first kind. In pure mathematics, solution techniques for functional equations with exact input data are most often studied. In applied mathematics, the problem consists in the construction of pseudosolutions, that is, in the solution of functional equations with perturbed input data. In many cases this problem is incomparably more complicated. The book is devoted to the problem of constructing a pseudosolution (the problem of parameter estimation) in the following fundamental sections of applied mathematics: confluent models of passive, active and all possible mixed experiments.
The models are described by systems of linear algebraic equations with an inaccurately measured, or assigned but inaccurately realized, matrix, and by models reducing to linear integral equations of the second and first kind with an inaccurately measured or prescribed kernel and with an inaccurately measured right-hand side. The necessity of this work is stipulated by the need for solution techniques for these problems, which arise in many applied (and sometimes theoretical) settings: for example, the problem of compensating rounding errors in the solution of systems of linear algebraic equations on computers, the carrying out of scientific research, and data processing. Problems of handling the results of scientific experiments, carrying errors of measurement or errors of implementation of the assigned data, permanently arise both in theoretical and in experimental research, but especially in the solution of practical applied problems where it is necessary to deal with experimental data.
The purpose of the book is the development of various types of experimental data models for linear functional equations, the setting of problems of pseudosolution construction, the creation of solution techniques for these problems, their refinement into numerical algorithms, and the writing of handlers for some experiments. Stochastic models of experimental data, and similar models for the representation of a priori information on a required solution (parameters), are considered within the statistical approach of the confluent-regression analysis. The main results are obtained by the method of least squares-distances and by the maximum likelihood method. In the book only one mode of deriving estimates is suggested: point estimation. Interval estimation and the testing of hypotheses have remained outside the framework of this work in view of the lack of research on the distributions of the necessary values. This will serve as a stimulus for further research, and in the following book the author will try to fill this gap. The authorship of the least-squares estimation method goes back to Gauss and Legendre; the authorship of the least-distances estimation method goes back to Pearson; and the authorship of active experiment estimation goes back to Fedorov. The author has proposed the least-squares-distances estimation method and completed the development of this area of point estimation. In it, minimization is first carried out over the auxiliary group of unknown regressors and the right-hand side; these are then eliminated from consideration, so that minimization over the required parameters becomes possible. This method has allowed the solution of a broad family of problems. The authorship of regularized methods goes back to Tikhonov and Ivanov.
The author was also able to obtain generalizing results in this area using the same method of preliminary minimization with respect to the unknown kernel and the unknown right-hand side. The majority of known problems of the regression analysis are in fact problems of the confluent-regression analysis of passive and (or) active experiments, whose solution techniques have been developed only for the most elementary cases and have therefore seen practically no circulation. The suggested estimation methods are especially important for tasks demanding a high exactitude of calculations. The author expresses gratitude to the scientists of the laboratory of statistical simulation of the faculty of computational mathematics and cybernetics of the Moscow State University named after M.V. Lomonosov for fruitful discussion of the obtained results.
LABELS AND ABBREVIATIONS
Uniform labels for the whole work: in this work (compare a sentence of [Lloyd and Lederman 1990], Vol. 2) semiboldface Greek letters designate vectorial and matrix nonrandom values: vectors by lower-case and matrices by capital letters. That is, Greek letters designate both unknown parameters and various exact values. Semiboldface Latin letters designate vectors and matrices composed of random variables: vectors by lower-case and matrices by capital letters. Italic Greek and Latin letters with indexes are the elements of the corresponding vectors and matrices. Italic Latin letters designate functions and the names of standard functions and operations of continuous arguments on intervals; Greek letters as before designate nonrandom arguments and Latin letters designate random arguments. Special capitals designate the diverse sets. A tilde above a Latin letter designates a realization of a random variable (a sample). Estimates are also considered as random variables and are consequently designated by Latin letters close in tracing to the corresponding Greek letters, supplemented by a cap or by a crescent to underline their uniformity with realizations.
Abbreviations:
SLAE is a system of linear algebraic equations;
SNAE is a system of nonlinear algebraic equations;
MLM is a maximum-likelihood method;
RMLM is a regularized maximum-likelihood method;
LSM is a least-squares method;
RLSM is a regularized least-squares method;
LDM is a least-distances method;
RLDM is a regularized least-distances method;
LSDM is a least-squares-distances method;
RLSDM is a regularized least-squares-distances method;
RSS is a residual sum of squares.
Chapter 1 SYSTEMS OF LINEAR ALGEBRAIC EQUATIONS
Abstract
In Chapter 1 the basic problem of the confluent, confluent-variance and confluent-regression analysis of passive experiment, namely the problem of estimation of unknown parameters, is solved algebraically. The problem of the robust estimation of normal parameters of the incomplete-rank confluent and confluent-regression models is solved as well.
1. ANALYSIS OF PASSIVE EXPERIMENTS
The statistical theory of the linear regression analysis [Borovkov 1984, 1984a, Cox & Hinkley 1974, Draper & Smith 1981, Demidenko 1981] offers the most widespread method of parameter estimation. Consequently it is natural to wish to compare the results of one's own research with the results obtained with the help of the classical theory. This causes us to devote the first paragraphs of this chapter to a summary of the basic part of that theory, to accent its merits and demerits, and moreover to have the possibility of applying some specially obtained results in the further account. The material of the chapter, for brevity, is explained in the language of matrix theory. First of all, we introduce the concept of a linear functional relationship [Kendall & Stuart 1968, Lindley 1947] as an identity of the form: a vector is identical to a matrix multiplied by a vector. The basic interest in the study of a linear functional relationship consists in the detection of the functional connection between a variable of the response φ and another variable or a group of variables ξ1,...,ξm, ϑ1,...,ϑp, η1,...,ηk known as explaining variables. We introduce, uniformly for the first two chapters of the work, a
linear functional relationship or linear functional (algebraic) equation of the first kind [Mathematical Encyclopedia 1985].

Assumption 1. Let there be a linear functional equation of the first kind

φ = Aβ,  (1.0.0)

where φ = (φ1,...,φn)^T is a response vector (right-hand side), A = [ξ1,...,ξm, ϑ1,...,ϑp, η1,...,ηk] is a matrix explaining the response, and β = (β1,...,βm, θ1,...,θp, δ1,...,δk)^T is a vector of parameters. All values entering Eq. (1.0.0) are not random. We assume that such a relation exists irrespective of our possibility of observing or creating it. The relationship of cause and effect goes from the matrix multiplied by the parameter vector to the response vector. Separate parts of Eq. (1.0.0), deformed by errors in various ways, are studied in the first two chapters of the work, and at the end of the second chapter the relationship (1.0.0) is considered as a whole.

Remark 1.0. In algebra the concept of a system of linear algebraic equations (SLAE) is usually applied, but this concept at once presupposes that the right-hand side and the matrix are known theoretically (precisely) and that it is only necessary to find a solution. For the linear algebraic equation of the first kind we also carry out statements of problems essentially different from this classical statement.
1.1 Multiple Linear Regression Analysis
We introduce the basic model.

Assumption 1.0. A linear functional equation of the first kind

Hδ = φ  (1.0.1)

is given, where φ = (φ1,...,φn)^T is the unknown response vector, H = [η1,...,ηk] is a known matrix, and δ = (δ1,...,δk)^T are unknown parameters (in Eq. (1.0.0) the matrices Ξ = 0 and Θ = 0). In this paragraph the linear functional equation can be considered as an overdetermined and necessarily consistent SLAE Hδ = φ. Inconsistency of the SLAE appears exclusively due to inaccuracies in the measurement of the right-hand side, since in this paragraph only the right-hand side is exposed to measurement errors. It is naturally improbable that in practice the matrix H would always be measured or prescribed without errors, but, nevertheless, all of regression analysis
(whose title goes back to Galton [Galton 1885]) is constructed on this assumption. The linear functional equation (1.0.1) is schematically represented in Figure 1.1-1. A simple square frame represents theoretically known values, a double square frame represents unknowns, and a frame with rounded corners represents what will be measured.
Figure 1.1-1. Scheme and description of a linear functional equation.
Suppose we had an opportunity to measure the values φi, i = 1,...,n, changed by an additive random error ei, i = 1,...,n (see a stricter representation in item 3.2), so that yi = φi + ei, i = 1,...,n, knowing that each value φi, i = 1,...,n, corresponds by definition to the values ηi1,...,ηik (not prescribed by us and not measured by us, but known theoretically, regardless of the measurements of φ), in n points i = 1,...,n. In practical applications this situation is, certainly, far-fetched, but it is realizable in the following cases:
1. In the assumption ηδ = φ, η = (1,...,1)^T, that is, for the linear functional equation from which the mathematical expectation is estimated;
2. In the assumption Iδ = φ (I is an identity matrix), that is, for the linear functional equation from which element-wise mathematical expectations are estimated;
3. In many cases it is consistent as well for relations characteristic of an analysis of variance. Therefore the set of all such matrices H looks as follows
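As a small numerical aside (not from the book), case 1 above, where H is a column of ones and the LSM-estimate reduces to the sample mean, can be sketched as follows; all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200
delta_true = 3.0                      # unknown scalar parameter (the mean)
eta = np.ones((n, 1))                 # exactly known "matrix" H = (1,...,1)^T
phi = eta @ np.array([delta_true])    # exact response vector: phi = H delta
y = phi + rng.normal(0.0, 0.5, n)     # measured response: y_i = phi_i + e_i

# the LSM-estimate for H = ones reduces to the sample mean
delta_hat = (eta.T @ y) / (eta.T @ eta)
assert np.isclose(delta_hat.item(), y.mean())
```

With this design matrix the normal equation is one scalar equation, which is why the estimate coincides with the arithmetic mean of the observations.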
Another area of applications consists of problems of the approximation of one function by others, where it is also admissible to count the values of H exact [Berezin & Zhidkov 1966, Bakhvalov 1973]. We study this situation in a little more detail [Metchenov 1981, Mechenov 1988].

Assumption 1.1. The vector y = (y1,...,yn)^T is a random vector of measurements over a Euclidean sample space R^n, and L[H] is a linear manifold of dimension k in the space R^n, n ≥ k. We assume that there exists a set of representations of the form

y = φ + e, φ ∈ L[H],  (1.1.0)

called a linear stochastic model, where φ = δ1η1 + ... + δkηk is a vector in the set L[H], and e = (e1,...,en)^T is an additive random vector of observational errors having mathematical expectation zero (Ee = 0) and a known positive-definite covariance matrix that is independent of δ:

cov(e,e) = E(e − Ee)(e − Ee)^T = Eee^T = Σ.

The vector set φ = δ1η1 + ... + δkηk forms a linear manifold of dimension k in the n-dimensional space R^n provided that the vectors η1,...,ηk are not collinear. Writing expression (1.1.0) component-wise, we have

yi = δ1ηi1 + ... + δkηik + ei,  i = 1,...,n,

or in the matrix form

y = Hδ + e.  (1.1.1)

The matrix H is frequently called a variance matrix or a design matrix, and the vector y is called a response. In the regression analysis, relationships are constructed with an arbitrary nonrandom matrix; however, ideas of the type of a two-dimensional distribution of a pair of random variables [Lloyd & Lederman 1989] are not meaningful in this case, as models with a random matrix demand other approaches.
The model of the variance and regression analysis is presented in Figure 1.1-2. In the left part of the figure (see [Vuchkov, etc. 1985]) pointers show the relationships of cause and effect. In the second part the simple square frame represents theoretically known values, the double square frame represents unknowns, and the frame with rounded corners and with a shadow represents random variables.
Figure 1.1-2. Scheme and description of measurements in regression and variance model.
1.1.1 Point Estimation of Required Parameters by the Least Squares Method

Let the errors obey the normal law. We write out the likelihood function:

L(δ) = (2π)^{−n/2} (det Σ)^{−1/2} exp(−(ỹ − Hδ)^T Σ^{−1} (ỹ − Hδ)/2).

Problem 1.1.1. Knowing one observation ỹ = Hδ + ẽ of the random vector y, its covariance matrix Σ, and the matrix H of full rank k, estimate the true values of the parameters δ of the model (1.1.0) so that the value of the likelihood function is maximized (maximum-likelihood method (MLM)).

Remark 1.1.1. As det(Eee^T) = det(Σ) does not depend on the required parameters, Problem 1.1.1 can be reformulated thus: estimate the true values of the parameters δ of the model (1.1.0) so that the weighted quadratic form

(ỹ − Hδ)^T Σ^{−1} (ỹ − Hδ)
is minimized (method of least squares (LSM)). So the MLM, which generally builds efficient, consistent and asymptotically unbiased estimates, in this case coincides with the LSM. The method of least squares was proposed by Legendre [Legendre 1806] and Gauss [Gauss 1809, 1855] and was generalized in [Aitken 1935, Mahalanobis 1936] to the case of an arbitrary covariance matrix. Markov [Markov 1924] in turn suggested building the best linear unbiased estimate, which again leads to the LSM (in the modern aspect explained in [Draper & Smith 1981] and by other authors). Malenvaud [Malenvaud 1970] and Pyt'ev [Pyt'ev 1983] in turn proposed entirely different approaches leading to the same results as the LSM.

Theorem 1.1.1. The estimate d̂ of the parameters δ can be computed from the SLAE
H^T Σ^{−1} H d̂ = H^T Σ^{−1} ỹ,  (1.1.2)

which is called a normal equation or a system of normal equations.

Proof. We calculate the derivatives of the minimized quadratic form with respect to the parameters and equate these derivatives to zero; that is a necessary condition of a minimum. As [Ermakov & Zhiglavsky 1987]

∂/∂δ (M^T Σ^{−1} M) = 2 (∂M/∂δ)^T Σ^{−1} M,

where M is any functional, then

∂/∂δ (ỹ − Hδ)^T Σ^{−1} (ỹ − Hδ) = −2 H^T Σ^{−1} (ỹ − Hδ) = 0,

whence Eq. (1.1.2) follows. Taking into account that det(H^T Σ^{−1} H) ≠ 0, we have for the estimate the formula

d̂ = (H^T Σ^{−1} H)^{−1} H^T Σ^{−1} ỹ.
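The normal equation can be illustrated numerically; the data below are made up, and numpy's `lstsq` on the whitened system serves as an independent check of the estimate:

```python
import numpy as np

rng = np.random.default_rng(1)

n, k = 50, 3
H = rng.normal(size=(n, k))                      # full-rank design matrix
delta = np.array([1.0, -2.0, 0.5])               # true parameters
Sigma = np.diag(rng.uniform(0.5, 2.0, n))        # known covariance of the errors
y = H @ delta + rng.multivariate_normal(np.zeros(n), Sigma)

Si = np.linalg.inv(Sigma)
# normal equation: (H^T Sigma^{-1} H) d = H^T Sigma^{-1} y
d = np.linalg.solve(H.T @ Si @ H, H.T @ Si @ y)

# independent check: ordinary least squares on the whitened system
W = np.linalg.cholesky(Si)                       # W W^T = Sigma^{-1}
d_check, *_ = np.linalg.lstsq(W.T @ H, W.T @ y, rcond=None)
assert np.allclose(d, d_check)
```

Whitening works because (W^T H)^T (W^T H) = H^T Σ^{−1} H, so both routes solve the same normal equations.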
1.1.2 Best Linear Unbiased Estimates. Gauss-Markov Theorem

The estimate d̂ of the parameter vector δ will be linear unbiased in that and only in that case when d = Γy, where Γ is a matrix such that Ed = ΓEy = ΓHδ = δ, that is, ΓH = I. For example, Γ = (H^T Σ^{−1} H)^{−1} H^T Σ^{−1}.

We consider the approach suggested by Markov [Markov 1924]. Namely:

Problem 1.1.2. Knowing one observation ỹ = Hδ + ẽ of the random vector y, its covariance matrix Σ and the matrix H of full rank k, estimate the true values of the parameters δ of the model (1.1.0) so as to construct the unbiased linear estimate of the unknown parameters with the least variance.

Lemma 1.1.2. Let Γ be some matrix with n columns. Then the mathematical expectation of the vector Γy is equal to

E(Γy) = ΓHδ,

and its covariance matrix is equal to

cov(Γy, Γy) = ΓΣΓ^T.

Theorem 1.1.2 (Gauss-Markov). The estimate d̂ of the parameter vector δ is linear unbiased and has a covariance smaller than or equal to the covariance of any other unbiased linear estimate of the parameter vector.

Proof. We consider an arbitrary linear estimate d = Γy of the vector δ, where Γ is a required matrix of dimension k×n. As

Ed = ΓHδ,

for unbiasedness of the estimate it is necessary that ΓH = I. However, this sets only k×k equations concerning the elements of Γ; therefore we consider the following problem: to determine Γ also so that the variance of the linear form z^T d would be minimal for any predetermined vector z. By virtue of Lemma 1.1.2
var(z^T d) = z^T ΓΣΓ^T z = tr(ΣΓ^T z z^T Γ).

For the solution of this problem of minimization of the variance of the linear form under linear constraints we apply the method of undetermined Lagrange multipliers, introducing k×k such multipliers as the matrix Λ. Having equated to zero the derivatives with respect to the matrix Γ of the expression

z^T ΓΣΓ^T z + 2 tr(Λ(ΓH − I)),

we obtain the equation

zz^T ΓΣ + Λ^T H^T = 0.

Right-multiplying it by Σ^{−1}H and taking into account that ΓH = I, we discover

Λ^T = −zz^T (H^T Σ^{−1} H)^{−1}.

Then

zz^T [Γ − (H^T Σ^{−1} H)^{−1} H^T Σ^{−1}] = 0.

As the matrix zz^T is of rank 1, there is a set of matrices Γ satisfying this relation, but only one member of this set does not depend on the vector z: the one which turns to zero the expression in square brackets, that is,

Γ = (H^T Σ^{−1} H)^{−1} H^T Σ^{−1}.

So, in this case the LSM-estimate d̂ from Problem 1.1.1 and the best (with the least variance) linear unbiased estimate of the parameter vector are calculated by the same formula (1.1.2). Because of these properties, the LSM-estimate has appeared extremely attractive.
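The Gauss-Markov property can be checked numerically on illustrative data: the matrix Γ = (H^T Σ^{−1} H)^{−1} H^T Σ^{−1} satisfies ΓH = I, and any other unbiased choice Γ + D (with DH = 0) gives no smaller variance of the linear form z^T d:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 40, 4
H = rng.normal(size=(n, k))
Sigma = np.diag(rng.uniform(0.5, 2.0, n))
Si = np.linalg.inv(Sigma)

Gamma = np.linalg.inv(H.T @ Si @ H) @ H.T @ Si   # BLUE matrix
assert np.allclose(Gamma @ H, np.eye(k))         # unbiasedness: Gamma H = I

# perturb Gamma inside the unbiasedness constraint: Gamma' = Gamma + D, D H = 0
D = rng.normal(size=(k, n)) * 0.1
D = D - (D @ H) @ Gamma                          # project so that D H = 0
assert np.allclose(D @ H, 0, atol=1e-10)

z = rng.normal(size=k)
var_blue = z @ Gamma @ Sigma @ Gamma.T @ z
var_other = z @ (Gamma + D) @ Sigma @ (Gamma + D).T @ z
assert var_blue <= var_other + 1e-12             # BLUE has the smallest variance
```

The cross term ΓΣD^T vanishes exactly because DH = 0, which is why the perturbed variance exceeds the BLUE variance by the nonnegative amount z^T DΣD^T z.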
Corollary 1.1.2. The estimate d̂ of the vector of parameters δ has the covariance matrix

cov(d̂, d̂) = (H^T Σ^{−1} H)^{−1}.

Proof. The covariance matrix is

cov(d̂, d̂) = ΓΣΓ^T = (H^T Σ^{−1} H)^{−1} H^T Σ^{−1} Σ Σ^{−1} H (H^T Σ^{−1} H)^{−1} = (H^T Σ^{−1} H)^{−1}.

Definition 1.1.2. We call the residual vector ê the value

ê = ỹ − H d̂ = ỹ − ŷ.
1.1.3 Mathematical Expectation of the Residual Sum of Squares

Theorem 1.1.3. The mathematical expectation of the weighted residual sum of squares is equal to n − k:

E(ê^T Σ^{−1} ê) = n − k.

Proof. Really,

E(ê^T Σ^{−1} ê) = E(ỹ − H d̂)^T Σ^{−1} (ỹ − H d̂).

We use the relation e^T Σ^{−1} e = tr(Σ^{−1} e e^T).

1.1.4 LSM-Estimation of the Homoscedastic Model

Assumption 1.1.4 (homoscedasticity). The random vector of measurement errors e of the linear stochastic model (1.1.0) has the same constant variance σ² in every component. From here it follows that the covariance matrix

Σ = cov(e,e) = E(e − Ee)(e − Ee)^T = Eee^T = σ² I

is a scalar matrix. The model (1.1.0) has the form

y = Hδ + e,  Eee^T = σ² I.
The quadratic form which needs to be minimized in Problem 1.1.1 is written as follows:

(ỹ − Hδ)^T (ỹ − Hδ).

Theorem 1.1.4. The normal equation will be

H^T H d̂ = H^T ỹ.

Taking into account that det(H^T H) ≠ 0, we have for the estimate the formula

d̂ = (H^T H)^{−1} H^T ỹ,

where ŷ = H d̂.
1.1.4.1 Graphic Interpretation

Corollary 1.1.4a.

H^T ê = 0.

Proof. As ê = ỹ − ŷ, then

H^T ê = H^T ỹ − H^T H d̂ = H^T ỹ − H^T ỹ = 0.

Thus, the residual vector is orthogonal to the linear manifold L[H]. In the case when among the vectors of the model there is a unit vector, the residual sum is equal to zero:

ê1 + ... + ên = 0.

We remark, following [Kolmogorov 1946], that the solution of Problem 1.1.1 under Assumption 1.1.4 is characterized by the vector ŷ = H d̂ from L[H], which is the projection of the vector of measurements onto this subspace, as shown in Figure 1.1-3.
Figure 1.1-3. Projection of measurements on the plane of regressors.
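The orthogonality of the residual vector to L[H], and the vanishing of the residual sum when the model contains the unit vector, are easy to verify on synthetic data (a sketch, not from the book):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 30, 3
H = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # model includes the unit vector
y = H @ np.array([2.0, -1.0, 0.5]) + rng.normal(0.0, 1.0, n)

d = np.linalg.solve(H.T @ H, H.T @ y)      # homoscedastic normal equation
e_hat = y - H @ d                          # residual vector

assert np.allclose(H.T @ e_hat, 0, atol=1e-8)  # residue orthogonal to L[H]
assert abs(e_hat.sum()) < 1e-8                 # unit vector in the model: residual sum is zero
```

Geometrically, H d̂ is the orthogonal projection of y onto the plane of the regressors, so the residue has no component along any column of H.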
1.1.4.2 Estimation of the Variance of Errors

Theorem 1.1.4b. An unbiased estimator of the error variance σ² is the experimental variance

s² = (ỹ − H d̂)^T (ỹ − H d̂) / (n − k).

Proof. This is an obvious corollary of Theorem 1.1.3.
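Theorem 1.1.4b can be illustrated by simulation: averaging the experimental variance s² = RSS/(n − k) over many replications recovers σ². The dimensions and σ² below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, sigma2 = 25, 4, 2.0
H = rng.normal(size=(n, k))
# projector onto the orthogonal complement of L[H]
P = np.eye(n) - H @ np.linalg.inv(H.T @ H) @ H.T

estimates = []
for _ in range(2000):
    e = rng.normal(0.0, np.sqrt(sigma2), n)
    rss = e @ P @ e                       # the RSS depends only on the errors
    estimates.append(rss / (n - k))       # experimental variance s^2

s2_mean = np.mean(estimates)
assert abs(s2_mean - sigma2) < 0.1        # unbiased: E s^2 = sigma^2
```

The identity E(e^T P e) = σ² tr(P) = σ²(n − k) is exactly the trace relation used in the proof of Theorem 1.1.3.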
1.1.4.3 Multivariate Multiple Model

The investigated model (1.1.0) can be expanded to a multivariate multiple one, that is, to a model of the form

Y = HΔ + E,

where the error matrix E is arranged by columns into a column vector. Malenvaud considered such models, and the passage to them is obvious enough [Malenvaud 1970].

Remark 1.1.4. Basically, these results should be put into practice with great caution, as only in theory can the matrix H really be known precisely. All of the above is put into practice, without special meditation, even for those problems of the regression analysis where the columns of the matrix H are measured or assigned values, or functions of measured or assigned values, that is, we underline, not free from errors of measurement or of the representation of the information. But, at present, these results are used most intensively in all computations and in software packages: LOTUS 123, Borland QUATTRO, Microsoft EXCEL, BMDP [BMDP 1979], SAS [Barr 1976], AMANCE [Bachacou, etc. 1981], SPAD [Lebarte & Morineau 1982] and many others (as methods for the solution of problems with measured and (or) prescribed matrices have been developed only to a very weak degree).
1.2 Linear Model Subject to the Linear Constraints
The outcomes of this paragraph will be essentially used further, as the confluent analysis of passive experiment enters into the scheme of the linear model with linear constraints.

Assumption 1.2. We consider the linear functional equation (1.1.1) subject to linear constraints of the form

Γδ = u,  (1.2.0)

where Γ is some known, precisely given, full-rank matrix of dimension l×k and u is a known exact vector of dimension l. That is, the unknown parameters δ satisfy both the model (1.1.0) and the system of linear algebraic equations Γδ = u.
1.2.1 LSM-Estimator Subject to the Linear Constraints

Following [Aitchison & Silvey 1958], we consider the following problem [Metchenov 1981, Mechenov 1988].

Problem 1.2.1. Knowing one observation ỹ = Hδ + ẽ of the random vector y, its covariance matrix Σ, the matrix H of full rank k, the matrix Γ of rank l and the vector u, estimate the true values of the parameters δ of the model (1.2.0) so that the quadratic form

(ỹ − Hδ)^T Σ^{−1} (ỹ − Hδ)  (1.2.1)

is minimized subject to the linear constraints Γδ = u.

Theorem 1.2.1. The estimator d̂ of the parameter vector δ is calculated from the system of k + l linear algebraic equations

H^T Σ^{−1} H d̂ + Γ^T λ = H^T Σ^{−1} ỹ,
Γ d̂ = u,  (1.2.2)

where λ ∈ R^l is a vector of Lagrange undetermined multipliers. The SLAE (1.2.2) is called an expanded normal equation or an expanded system of normal equations.

Proof. We use the Lagrange method. For the calculation of the result it is enough to take a vector 2λ^T = 2(λ1,...,λl) of Lagrange undetermined multipliers, to multiply the system of constraints by this vector,

2λ^T (Γδ − u) = 0,

and to add the product to the minimized quadratic form:

(ỹ − Hδ)^T Σ^{−1} (ỹ − Hδ) + 2λ^T (Γδ − u).

Differentiation with respect to the vectors δ and λ leads to the expanded SLAE (1.2.2). It remains only to show that the vector thus calculated is unique. We consider any vector δ̄ such that Γδ̄ = u. Then

(ỹ − Hδ̄)^T Σ^{−1} (ỹ − Hδ̄) = (ỹ − H d̂)^T Σ^{−1} (ỹ − H d̂) + (d̂ − δ̄)^T H^T Σ^{−1} H (d̂ − δ̄)

(the cross term vanishes since H^T Σ^{−1} (ỹ − H d̂) = Γ^T λ and Γ(d̂ − δ̄) = 0), and from the non-negativity of the quadratic form (d̂ − δ̄)^T H^T Σ^{−1} H (d̂ − δ̄) ≥ 0 it follows that only the vector d̂ minimizes the quadratic form (1.2.1) under the linear constraints Γδ = u. This estimator, obviously, is also the best linear unbiased estimator.
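The expanded system (1.2.2) can be assembled as one block matrix and solved directly; the constrained estimate then satisfies Γδ = u exactly (hypothetical data):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k, l = 40, 4, 2
H = rng.normal(size=(n, k))
Sigma = np.eye(n)
Gamma = rng.normal(size=(l, k))          # constraint matrix of rank l
u = rng.normal(size=l)
y = rng.normal(size=n)

Si = np.linalg.inv(Sigma)
A = H.T @ Si @ H
# expanded normal equations: block system over (delta, lambda)
KKT = np.block([[A, Gamma.T],
                [Gamma, np.zeros((l, l))]])
rhs = np.concatenate([H.T @ Si @ y, u])
sol = np.linalg.solve(KKT, rhs)
d_c, lam = sol[:k], sol[k:]

assert np.allclose(Gamma @ d_c, u)       # the constraints hold exactly
```

The block matrix is nonsingular whenever H has full rank k and Γ has full rank l, which is exactly the setting of Problem 1.2.1.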
1.2.2 Graphic Illustration
We construct also a graphic illustration for this case. We remark that the solution of Problem 1.2.1 is characterized by the vector ŷ = H d̂ from L[H], which is the projection of ỹ onto the intersection of L[H] with the affine manifold determined by the constraints Γδ = u, as it is shown in Figure 1.2-1.
Figure 1.2-1. Projection of measurements on a regressor plane at linear constraints.
1.2.3 Expectation of the Residual Sum of Squares

Theorem 1.2.2. For the model (1.2.0) the expectation of the weighted residual sum of squares is

E(ê^T Σ^{−1} ê) = n − k + l.

Proof. The weighted residual sum of squares is

ê^T Σ^{−1} ê = (ỹ − H d̂)^T Σ^{−1} (ỹ − H d̂).

The second member of its decomposition, the weighted sum of squares added by the constraints, has expectation equal to l. It is known from Theorem 1.1.3 that the expectation of the unconstrained weighted residual sum of squares equals n − k. As a result, we obtain an unbiased estimator of the variance by dividing the weighted residual sum of squares by n − k + l.
1.2.4 Relation Between the LSM-Estimate under Linear Constraints and the Standard LSM-Estimate

Further on, an expression of the LSM-estimate of the model (1.2.0) through the LSM-estimate of the model (1.1.0) will be useful:

d̂c = d̂ − K Γ^T (Γ K Γ^T)^{−1} (Γ d̂ − u),  K = (H^T Σ^{−1} H)^{−1},  (1.2.3)

where d̂ = K H^T Σ^{−1} ỹ is the standard LSM-estimate. For the proof it is enough to substitute the value of λ in the expanded normal equation.

Theorem 1.2.3. The estimate d̂c is unbiased and has the covariance matrix

cov(d̂c, d̂c) = K − K Γ^T (Γ K Γ^T)^{−1} Γ K.

Proof. Taking into account that E d̂ = δ and E(Γ d̂ − u) = 0, we at once have from Eq. (1.2.3) that E d̂c = δ. From the fact that d̂ = δ + K H^T Σ^{−1} e, using Eq. (1.2.3), we obtain

d̂c − δ = (I − K Γ^T (Γ K Γ^T)^{−1} Γ) K H^T Σ^{−1} e,

whence the stated covariance matrix follows.
Remark 1.2.3. Although this is rather seldom stated in textbooks, the model with linear constraints is the base for the construction of estimates of the confluent model of passive experiment investigated in item 1.4 and further. As in Chapter 3 the linear integral equations reducing to ill-posed problems are investigated, we consider separately the case of regression models with incomplete-rank matrices. The reader who is interested only in the well-posed problems of parameter estimation can proceed at once to item 1.4.
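The relation of item 1.2.4 can be sketched numerically in its standard closed form, d̂c = d̂ − KΓ^T(ΓKΓ^T)^{−1}(Γd̂ − u) with K = (H^TΣ^{−1}H)^{−1} (taken here, as an assumption, with Σ = I for brevity), and cross-checked against the expanded normal equations:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, l = 40, 4, 2
H = rng.normal(size=(n, k))
Gamma = rng.normal(size=(l, k))
u = rng.normal(size=l)
y = rng.normal(size=n)

K = np.linalg.inv(H.T @ H)               # Sigma = I for simplicity
d = K @ H.T @ y                          # ordinary LSM-estimate

# restricted estimate expressed through the unrestricted one
d_c = d - K @ Gamma.T @ np.linalg.solve(Gamma @ K @ Gamma.T, Gamma @ d - u)
assert np.allclose(Gamma @ d_c, u)

# cross-check against the expanded normal equations
KKT = np.block([[H.T @ H, Gamma.T], [Gamma, np.zeros((l, l))]])
sol = np.linalg.solve(KKT, np.concatenate([H.T @ y, u]))
assert np.allclose(d_c, sol[:k])
```

Substituting λ = (ΓKΓ^T)^{−1}(Γd̂ − u) back into the expanded system shows that both routes produce the same constrained estimate.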
1.3 Estimation of the Normal Parameter Vector

Assumption 1.3. We consider a variant of the linear regression model (1.1.0) in which the matrix H is singular, of incomplete rank r < k.

y = Ξβ + e,  y ∈ R^n,  Ee = 0,  Eee^T = Σ, Σ > 0,
X = Ξ + C,  X ∈ R^{n×m},  EC = 0,  Eec^T = T,  Ecc^T = M, M > 0,
b = Iβ + w,  b ∈ R^λ,  Ew = 0,  Eww^T = K, K > 0,  n + λ > m,  (1.5.4)
such that the supplement condition is satisfied, that is, the matrix composed of the matrices Ξ and I is not degenerate. We consider that the experiment is conducted passively, that is, the events occur at some unknown values of Ξ and of β. The researchers have at their disposal the random response vector, the random regressor matrix, the random a priori vector b for the vector β, and its covariance matrices. The structure of the model is shown in Figure 1.5-1.
Figure 1.5-1. The scheme of regularized experiment.
Problem 1.5.4. Knowing one realization y.%(rank%= m) and
6
of
random variables y, X , b and the corresponding covariance matrices Z, M, T, K, estimate the unknowns cp, E and the parameters p of linear conjluent stochastic model (1.5.4) by M M . Theorem 1.5.4. Estimates ofparameters f3 of the linear confluent model (1.5.4)from the Problem 1.5.4minimize the following quadraticform
The proof differs little from item 1.4.1.

Theorem 1.5.4.1. The expectation of the weighted RSS at the true values of the parameters is equal to

Proof. We take advantage of Theorem 1.2.2 of Chapter 1. The outcome is obvious, as there are nm + n + m equations and nm constraints present.

1.5.5 A Method of the Least Distances for the Homoscedastic Model
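The degrees-of-freedom bookkeeping behind such expectation statements can be checked numerically in its simplest special case: for ordinary least squares with n observations and m parameters, E[RSS/σ²] = n − m. A minimal Monte-Carlo sketch with synthetic data (the plain regression case, not the confluent model above):

```python
import numpy as np

# Monte-Carlo check of the simplest instance of the degrees-of-freedom
# counting used in these expectation theorems: for ordinary least squares
# with n observations and m parameters, E[RSS / sigma^2] = n - m.
# All data below are synthetic.

rng = np.random.default_rng(0)
n, m, sigma = 30, 4, 0.7
E = rng.normal(size=(n, m))        # fixed design matrix (not an expectation)
beta = rng.normal(size=m)

trials = 2000
rss = np.empty(trials)
for t in range(trials):
    y = E @ beta + sigma * rng.normal(size=n)
    b = np.linalg.lstsq(E, y, rcond=None)[0]
    r = y - E @ b
    rss[t] = r @ r / sigma**2

mean_df = rss.mean()               # close to n - m
```

The confluent statement above counts nm + n + m equations minus nm constraints in the same spirit.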
We consider the most popular case, when all errors of the regressor matrix are known with the same variance, as are the errors of the response vector and of the a priori information.

Assumption 1.5.5. We use the linear confluent stochastic model (1.4.1) of a passive experiment of incomplete rank with a priori information:

y = Ξβ + e, y ∈ R^n, Ξ ∈ R^{n×m}, Ee = 0, Eee^T = σ²I,
X = Ξ + C, X ∈ R^{n×m}, EC = 0, Ecc^T = μ²I,                         (1.5.5)
b = Λβ + w, b ∈ R^h, Ew = 0, Eww^T = ν²I, n + h ≥ m,

such that the supplementation condition is satisfied, that is, the matrix composed from the matrices Ξ and Λ is not degenerate. In the model (1.5.5) the observations have identical variances in y, separately in the columns of the matrix X, and in the a priori information. Then Problem 1.5.4 takes the following form.

Problem 1.5.5. Knowing one realization ȳ, X̄ (rank X̄ = m) and b̄ of the random variables y, X, b and the corresponding variances σ², μ² and ν², estimate the unknowns φ, Ξ and the parameters β of the linear confluent stochastic model (1.5.5) by the MLM.

Theorem 1.5.5. The estimates of the parameters β of the linear confluent model (1.5.5) for Problem 1.5.5 minimize the following quadratic form
The proof differs little from item 1.4.3.

Remark 1.5.5. As is apparent from the preceding, the problem in this form is easy enough to pose, but there is no longer a uniform computational process as in the regression model. The development of this approach is therefore less attractive in that it does not lead to the reuse of an already available program package, as happens in regression analysis, where the same standard program can be used to compute both the LSM- and the RLSM-estimates by changing only the input data. Here, to derive the RLDM-estimates, a specific program must be written. Nevertheless, this approach is interesting in that it allows one to count the degrees of freedom and, accordingly, to apply them to the calculation of the unknown variance of the a priori information. This allows the interval of solution (parameter) errors to be estimated.
1.6 Confluent-Regression Analysis of Passive Experiment

We have solved in item 1.4 the parameter estimation problem for the linear functional model of the first kind in the presence of random measurement errors in the matrix and in the response vector (with correlated observations), the so-called confluent model of the passive experiment. In the given paragraph, the constructive solution of the parameter estimation problem for the linear regression model and the linear confluent model of the passive experiment operating jointly in the experiment, otherwise the linear confluent-regression model of the passive experiment, is offered by the method of least squares-distances (LSDM).

Assumption 1.0.6. Given the linear functional relationship (an overdetermined simultaneous SLAE of full rank, not subject to our influence)

φ = Ξβ + Hδ,                                                        (1.0.6)

where φ = (φ₁, ..., φ_n)^T is an unknown response vector, Ξ = [ξ₁, ..., ξ_m] is an unknown measured matrix, H = [η₁, ..., η_k] is a known theoretical matrix, and β = (β₁, ..., β_m)^T, δ = (δ₁, ..., δ_k)^T are vectors of unknown parameters (the first and the third terms of the linear functional equation (1.0) are taken, with the remaining term set equal to 0).

Assumption 1.6. We consider the following well-posed linear confluent-regression stochastic model of a passive experiment constructed for the linear functional equation (1.0.6).
Figure 1.6-1. Scheme and description of a passive experiment.
We assume that the experiment is passive, that is, events occur when Ξ, H, β and δ take some nonrandom values, and the researcher observes the random response vector y = φ + e = Ξβ + Hδ + e with the covariance matrix Σ of the errors e and the random regressor matrix X = Ξ + C with the covariance matrix M of the errors C. We suppose that the matrix H is known precisely (as is supposed in the analysis of variance; for example, it can be a background vector). The structure of the confluent-regression model, or confluent-variance model, of the passive experiment is shown in Figure 1.6-1.

Remark 1.6.1. A linear confluent model (1.6.1) was first considered in [Lindley 1947] for the case of a linear functional relationship with only two parameters β and δ:
and the parameters β and δ were estimated by the MLM: the parameters were estimated by differentiation with respect to these two parameters and then by solution of the SNAE. A numerical example of the estimation is shown in [Lindley 1947]. However, generalizations did not follow, in view of the complexity of solving the resulting high-order SNAE. The general algebraic formulation of the model was solved in [Mechenov 1991] and the general stochastic formulation in [Mechenov 1997, 1998].

1.6.1 Least Squares-Distances Method of the Parameter Estimation of the Model of Passive Experiment

Problem 1.6.1. Given are the values of the matrix H and one realization ȳ and one realization X̄ (rank X̄ = m) of the random variables y and X:
We also have the covariance matrices Σ and M. It is required to estimate the unknowns φ, Ξ and the parameters β, δ of the linear confluent model (1.6.0) of the passive experiment by the MLM.

Theorem 1.6.1. The estimators β̂, δ̂ of the parameters β, δ of the linear confluent model (1.6.0) from Problem 1.6.1 minimize the weighted quadratic form
or they are calculated from the SNAE

This system of equations is called the confluent-regression normal equation.

Proof. Instead of the matrix X and the vector y, we consider the (mn+n)-dimensional vector z̄ = (x̄^T, −ȳ^T)^T. To this end, we stretch the matrix X by rows and augment it with the row −ȳ^T; the same transformation is applied to the vectors ζ = (ξ^T, −φ^T)^T, z = (x^T, −y^T)^T and ω = (c^T, −e^T)^T. The original linear confluent-regression model (1.6.0) is rewritten as a linear regression model with the linear constraints Γζ = −Hδ:
where the constraint matrix Γ is the same as in item 1.4.1. Using the MLM, we receive, similarly to 1.4.1, that Problem 1.6.1 reduces to the following two-stage minimization problem: estimate the true value of the vector ζ so that the quadratic form (*) is minimized subject to the constraints Γζ + Hδ = 0, and then find the minimum of this quadratic form over all possible values of the parameters β of the matrix Γ and of the parameters δ.

Consider the first stage. We use the method of undetermined Lagrange multipliers and receive, similarly to item 1.4.1, that the estimator of the vector ζ is calculated from the relationship
At any parameter β, the matrix Γ is a full-rank matrix because of the presence of the submatrix I. Then the minimization problem of the weighted quadratic form subject to these linear constraints always has a unique solution, and the variational problem can be rewritten in a form independent of Ξ and φ:

S² = (z̄ − z)^T Ω^{-1} (z̄ − z) = (Γz̄ + Hδ)^T (ΓΩΓ^T)^{-1} (Γz̄ + Hδ)
   = (X̄β + Hδ − ȳ)^T Ψ^{-1} (X̄β + Hδ − ȳ) = ê^T Ψ^{-1} ê,            (**)

where Ψ = ΓΩΓ^T and ê = ȳ − X̄β − Hδ. In the second stage, we have to find β̂, δ̂ that minimize (**). Differentiating (**) with respect to the parameters β, δ, we obtain a complicated SNAE, which is certainly insoluble in explicit form but is solvable in simple cases. The estimators of the parameters β, δ, provided that the matrix A = [Ξ, H] has full rank, supply a unique solution to Problem 1.6.1.

1.6.1.1 Estimate of the Mean of the Residual Sum of Squares

Now, using (*), we easily obtain the estimators ẑ = (x̂^T, −ŷ^T)^T of the regressor matrix Ξ and the response vector φ, respectively
Theorem 1.6.1.1. Let β, δ be the true values of the parameters. The mean of the residual sum of squares (*) is given by

The result follows from equation (**), allowing for equation (*), as in item 1.2.3.
Remark 1.6.1. The mean of the RSS Ŝ² is assumed (similarly to 1.4.1a) equal to n − m − k [Kendall & Stuart 1968].
The obtained result satisfies the correspondence principle completely. Indeed, when the matrix Ξ is absent in Eq. (1.6.1), this estimation method is transformed into the LSM [Gauss 1809, 1855, Aitken 1935], and when the matrix H is absent in Eq. (1.6.1), it is transformed into the LDM [Pearson 1901, Mechenov 1988]. Therefore we call this method the least squares-distances method (LSDM) [Mechenov 1991] or, when there is only one free parameter δ, the least distances method.

1.6.2 Identical Correlations in the Columns of the Matrix and in the Vector of the Right-Hand Side

Assumption 1.6.2. We assume the following linear confluent-regression stochastic model of passive experiment
in which [Ξ, H] is the full-rank matrix of the linear functional relation (1.0.6), cᵢ is the column i of the error matrix C, and δᵢⱼ is the Kronecker delta. Thus the normally distributed passive observations in model (1.6.2) have an identical covariance matrix Σ in y and in each column i of the matrix X, which appears often enough when both the response and the regressor matrix are measured by the same devices. Then the parameter estimation problem takes the following form.

Problem 1.6.2. Given a single realization ȳ and X̄ (rank X̄ = m) of the random variables y and X,
the exact theoretical matrix H, and the covariance matrices Σ, M = diag(Σ, ..., Σ), estimate the unknown elements φ, Ξ and the unknown parameters β, δ of the linear confluent-regression stochastic model (1.6.2) by the MLM, that is, so that the weighted sum of squares
would be a minimum under the constraint φ = Ξβ + Hδ.

Corollary 1.6.2. The estimates β̂, δ̂ of the parameters β, δ of the linear confluent-regression model (1.6.2) from Problem 1.6.2 minimize the quadratic form

or they are calculated from the SLAE

where Ŝ² is the minimum value of the RSS S². The proof is similar to item 1.4.2.

1.6.3 Equally Weighted Measurements of the Matrix and of the Vector
We consider the most popular case, when the elements in the matrix X and in the response vector y are measured with the same variance.

Assumption 1.6.3. We assume the following full-rank linear confluent-regression stochastic model of passive experiment, which uses the functional relation (1.0.6)

Thus the normally distributed passive observations in model (1.6.3) have an equal variance σ² in y and an equal variance μ² in X. Then the parameter estimation problem takes the following form.

Problem 1.6.3. Given a single realization ȳ and X̄ (rank X̄ = m) of the random variables y and X,
the exact theoretical matrix H and the variances σ², μ², estimate the unknown elements φ, Ξ and the unknown parameters β, δ of the linear confluent-regression stochastic model (1.6.3) by the MLM, that is, by minimizing the RSS

subject to the linear constraints φ = Ξβ + Hδ.

Corollary 1.6.3. The estimates β̂, δ̂ of the parameters β, δ of the linear confluent-regression model (1.6.3) from Problem 1.6.3 minimize the quadratic form

or they are calculated from the SLAE
where Ŝ² is the minimum value of the RSS S². The proof repeats the proof of Corollary 1.4.3.

1.6.3.1 Estimate of the Variance

Knowing the ratio of variances κ = σ²/μ², the approximate estimate of the variance σ² can be computed (similarly to 1.4.1.1 and 1.4.3.6) from the formula

σ̂² = Ŝ²/(n − m − k).
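The iterative solution required by the estimating SLAE of Corollary 1.6.3 can be sketched numerically. The code below assumes (as a reconstruction, not spelled out in the text) that the stationarity condition of the quadratic form S² takes the form (AᵀA − Ŝ²μ²J)θ = Aᵀȳ, with A = [X̄, H], θ = (βᵀ, δᵀ)ᵀ and J acting on the β-block only, and iterates it to a fixed point on synthetic data:

```python
import numpy as np

# Fixed-point iteration for the homoscedastic confluent-regression estimate.
# Assumed estimating SLAE (consistent with minimizing
#   S^2 = |y - X b - H d|^2 / (sigma^2 + mu^2 |b|^2) ):
#   (A^T A - S^2 * mu^2 * J) theta = A^T y,  A = [X, H],  J = diag(I_m, 0).
# sigma2, mu2 are treated as known; all data are synthetic.

rng = np.random.default_rng(0)
n, m, k = 50, 2, 1
sigma2, mu2 = 0.04, 0.04
Xi = rng.normal(size=(n, m))                     # exact regressor matrix
H = np.ones((n, k))                              # exactly known background column
beta_true, delta_true = np.array([1.0, -0.5]), np.array([0.3])
y = Xi @ beta_true + H @ delta_true + np.sqrt(sigma2) * rng.normal(size=n)
X = Xi + np.sqrt(mu2) * rng.normal(size=(n, m))  # regressors observed with error

A = np.hstack([X, H])
J = np.diag([1.0] * m + [0.0] * k)

def wrss(theta):
    r = y - A @ theta
    b = theta[:m]
    return float(r @ r / (sigma2 + mu2 * (b @ b)))

theta = np.linalg.lstsq(A, y, rcond=None)[0]     # LSM starting point
for _ in range(100):
    theta = np.linalg.solve(A.T @ A - wrss(theta) * mu2 * J, A.T @ y)

beta_hat, delta_hat = theta[:m], theta[m:]
```

At the fixed point the weighted RSS is close to its expected value n − m − k, which is how the degrees-of-freedom check of the previous items can be applied.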
1.6.3.2 Example of Construction of the Estimate

Let us process the results of the example using the confluent model with free background. As an example, in Figure 1.6-2 we construct the line of simple regression and the line of simple confluence φ = ξβ + δ. At the left, these four measurements are approximated by the LSM, leading as a result to the line ŷ = 1.5. On the right, these four measurements are approximated by the LDM,
leading to the line ŷ = x̂ with inclination 1 (the RSS is equal to 1 in both cases). In both cases, we have a symmetric estimation of the input data.

Figure 1.6-2. Comparison of the LSM-estimate (a) and the LDM-estimate (b).
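The attenuation effect behind this comparison is easy to reproduce in a few lines. The sketch below uses synthetic errors-in-variables data (not the four points of the figure): the LSM slope is biased toward zero, while the LDM slope, computed for equal error variances as the first principal axis of the centred data, stays near the true value:

```python
import numpy as np

# LSM vs LDM on errors-in-variables data with equal error variances.
# Synthetic illustration of the attenuation of the LSM slope.

rng = np.random.default_rng(1)
n = 200
xi = rng.uniform(0.0, 4.0, size=n)            # true abscissas
x = xi + rng.normal(0.0, 0.5, size=n)         # measured with error
y = xi + rng.normal(0.0, 0.5, size=n)         # true line: y = x

b_lsm = np.polyfit(x, y, 1)[0]                # ordinary regression slope

# Orthogonal (least-distances) fit: first principal axis of centred data.
Z = np.column_stack([x - x.mean(), y - y.mean()])
_, _, vt = np.linalg.svd(Z, full_matrices=False)
b_ldm = vt[0, 1] / vt[0, 0]
```

The LSM slope comes out noticeably below 1 because the errors in x are ignored, while the orthogonal fit treats both coordinates symmetrically, exactly the symmetry discussed for Figure 1.6-2.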
1.6.4 Progress Toward the Ideal in Inherited Stature
Consider the statistically based approach to solving the problem of estimating the linear dependence between the height of children and the average height of their parents. As we know from the mass media and see from our own observations, our children accelerate: they are taller and cleverer than us on the average. However, by the results of observations, Galton [Galton 1885, 1889] concluded that there is no acceleration, but there is a regression of the height of children on the height of parents. In particular, this denominated the entire area of statistics: regression analysis. The initial data are shown in Table 1.6.4 (Galton took these data from the Records of Family Faculties; the author has taken them from [Forrest 1974]).

Table 1.6.4. The relation between the height of children and the mean height of their parents. The left-hand column gives the mean height (designated according to the functional relation as x or ξ, in inches) of the parents; the rows give the number of children with the height shown at the top (y, in inches).
From the LSM we have the following estimate of the parameters of the linear regression model:
Sir Galton did not use the LSM but, from the average maximum values of the columns, concluded the regression of children's height on the mean height of parents. Since measurements of human height contain random errors, pure regression analysis is not applicable to these data. Accordingly, this problem is one of processing the results of a passive experiment. Here it is preferable to seek a functional relation between the heights of the form φ = ξβ + δ, and when describing the model one must take account of errors (eᵢ, cᵢ), i = 1, ..., n, that have different random causes but are of the same homoscedastic type for all pairs (yᵢ, xᵢ) = (ξᵢβ + δ, ξᵢ) + (eᵢ, cᵢ), i = 1, ..., n, leading to the following model:

in which we assume the variance of the height of people to be the same and equal to σ².

However, in the data it is supposed that the height of parents is taken as the half-sum of the heights of the father and mother, moreover rounded off to an inch, and hence contains measurement errors. Assuming the variances of the heights of father, mother and children identical and equal to σ², it is easy to calculate the variance μ² of the half-sum of the heights of the father and mother [Borovkov 1986]. It is equal to half σ²: μ² = σ²/2 (the estimate of the ratio of variances from the input data gives the value 2.2446). The excess over two can be explained by the increase of the height of children and of the scatter of their heights. Estimating by the LSDM, that is, applying the confluent-regression analysis of the passive experiment, we have the following result
The input data (on the intersection of row 69 and column 70) probably contain either a corrigendum or a "rough malfunction", i.e., instead of the value 2 one should read 12. This practically influences the estimation results of neither the LSM nor the LDM. The results of the calculations are presented in Figures 1.6-3 and 1.6-4.
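For a single regressor with a known variance ratio κ = σ²/μ², the errors-in-variables line has a closed-form (Deming-type) slope. A sketch with κ = 2, as in the height model above, on synthetic stand-in data (not Galton's table):

```python
import numpy as np

# Deming-type slope for known variance ratio kappa = sigma^2 / mu^2 = 2:
# response error variance sigma^2, regressor (mid-parent) error mu^2 = sigma^2/2.
# Synthetic data; the assumed true relation is y = 0.7*xi + 20.

rng = np.random.default_rng(2)
n, kappa, sigma = 300, 2.0, 2.4
xi = rng.normal(68.0, 1.8, size=n)                  # true mid-parent heights
x = xi + (sigma / np.sqrt(kappa)) * rng.normal(size=n)
y = 0.7 * xi + 20.0 + sigma * rng.normal(size=n)

sxx, syy = np.var(x), np.var(y)
sxy = np.cov(x, y, bias=True)[0, 1]

# Closed-form slope of the errors-in-variables line:
b = (syy - kappa * sxx
     + np.sqrt((syy - kappa * sxx) ** 2 + 4.0 * kappa * sxy ** 2)) / (2.0 * sxy)
a = y.mean() - b * x.mean()

b_lsm = sxy / sxx          # ordinary regression slope, attenuated by errors in x
```

The ordinary regression slope is pulled toward zero by the regressor errors, which is precisely the effect that makes pure regression analysis inapplicable to the Galton data.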
Figure 1.6-3. The graph of the residual sum of squares for confluent-regression model.
From observational results Galton concluded that there was no progression, but rather a regression of the height of children to the height of the parents, and this in particular gave a name to this entire area of statistics: regression analysis. Galton argued (see [Galton 1869, 1883, 1889, 1907]; [Galton 1885] is named "Regression towards mediocrity...") that tall people degenerate in the third generation with coefficient −1/3 (see the least-squares estimate).
Figure 1.6-4. The data of Table 1.6.4 (data), the LSM (reg) and the LDM (r*c) estimates of the heights of children as a function of the mean height of parents. The line y = x is shown for comparison.
In fact, where do these great ones (both in height and in mind) come from? There is also a bias, that is, tall people must come from somewhere, and by the minimum-distance estimate taller people occur 1.6 times as often among tall people; conversely, short people tend to get even shorter (the bias depends on that fact). Therefore, by analogy with and as opposed to regression, it is possible to name the given analysis of passive experiment the "progression" analysis.
1.6.5 Homoscedastic Measurements of the Matrix and of the Response Vector

Let us consider the main Assumption 1.6.1 as applied to the typical case where all the errors in the columns of the confluent matrix X and in the response vector y are homoscedastic.

Assumption 1.6.5. We assume the following linear stochastic model of passive experiment

in which [Ξ, H] is the full-rank matrix of the linear functional equation (1.0.6) and δᵢⱼ is the Kronecker delta. Thus the normally distributed passive observations have an equal variance σ² in y and equal variances μᵢ² in each column i of the matrix X. Then the parameter estimation Problem 1.6.1 takes the following form.

Problem 1.6.5. Given a single realization ȳ and X̄ (rank X̄ = m) of the random variables y and X,

the exact theoretical matrix H and the variances σ², μᵢ², i = 1, ..., m, estimate the unknown elements φ, Ξ and the unknown parameters β, δ of the linear stochastic model (1.6.5) by the MLM.
Given that the observation errors are independent of the parameters, the parameters should be estimated so as to minimize the squared distance of the observations from the sought values
subject to the linear constraints φ = Ξβ + Hδ.

Corollary 1.6.5. The estimates β̂, δ̂ of the parameters β, δ of the linear model (1.6.5) from Problem 1.6.5 minimize the quadratic form

or are calculated from the following SLAE

where Ŝ² is the minimum value of the RSS S². The corollary is the particular case of Proposition 1.6.1, proved there for a more general covariance matrix.

1.6.5.1 Example of Construction of the Estimate
As a corollary, we consider estimation for the two-parameter linear stochastic confluent model with variances that are equal for each variable and homoscedastic within each variable:

The model includes one free (regression) parameter. It follows from Corollary 1.6.5 that the estimates β̂₁, β̂₂, δ̂ of the parameters β₁, β₂, δ of the linear model (1.6.5a) from Problem 1.6.5 minimize the quadratic form
S² = Σᵢ₌₁ⁿ (ȳᵢ − β₁x̄ᵢ₁ − β₂x̄ᵢ₂ − δ)² / (σ² + β₁²μ₁² + β₂²μ₂²)

and are obtained as the solution of the following SLAE, where Ŝ² is the minimum value of the RSS S². This SLAE obviously has to be solved by an iterative process.

Consider a regression example from [Weiss 1995], where the price of a secondhand Nissan car is estimated as a function of the model year ("age") ξ₁ and the distance traveled ξ₂. The input data are given in Table 1.6.5.
First we run a simple regression on age [Mechenov 20001. For a simple linear regression model we obtain the LSM-estimate
To compute the LDM-estimate, we assume that the car age is uniformly distributed throughout the year. We thus obtain for its variance [Borovkov 19841
We further assume that for 11 observations the uniform distribution is adequately approximated by the normal distribution. We use the variance of the LSM price estimate as the starting value for the iterative calculation of the variance of the price y. Iterating until the expected number of degrees of freedom, 11 − 2 = 9, is reached, we easily choose the variance ratio for the input data. We thus obtain the LDM-estimate
The LDM produces a more accurate estimate of the value of a two-year-old car than the rough LSM estimation (whose deficiencies are described in [Weiss 1995]). The prices calculated using the LDM estimates perfectly match the observations. It follows from the model that the price of a new car is $20,000 and its useful life is about 10 years. Figure 1.6-5 plots the input data and both estimates.
Figure 1.6-5. One-parameter model. Squares show the input data, disks the LSM-estimates, diamonds the LDM-estimates.
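The device used above, tuning the unknown variance until the minimized weighted RSS equals the expected number of degrees of freedom n − 2 = 9, can be sketched as follows. The eleven points are synthetic, not the table from [Weiss 1995], and the fixed-point step assumes the same reconstructed form of the estimating SLAE as in the earlier items:

```python
import numpy as np

# Choose the unknown response variance sigma^2 so that the minimised weighted
# RSS equals the expected degrees of freedom n - 2.  The regressor ("age")
# error is uniform within a year, mu^2 = 1/12.  Synthetic data.

rng = np.random.default_rng(3)
n, mu2 = 11, 1.0 / 12.0
age_true = np.arange(1.0, n + 1.0)
age = age_true + rng.uniform(-0.5, 0.5, size=n)           # rounded-style error
price = 20.0 - 1.5 * age_true + 0.8 * rng.normal(size=n)  # thousands of $

A = np.column_stack([age, np.ones(n)])
J = np.diag([1.0, 0.0])

def min_wrss(sigma2, iters=200):
    """Minimise sum (y - b*x - d)^2 / (sigma2 + b^2 * mu2) by fixed point."""
    b, d = np.polyfit(age, price, 1)
    for _ in range(iters):
        S2 = np.sum((price - b * age - d) ** 2) / (sigma2 + b * b * mu2)
        b, d = np.linalg.solve(A.T @ A - S2 * mu2 * J, A.T @ price)
    return np.sum((price - b * age - d) ** 2) / (sigma2 + b * b * mu2), b, d

# Bisection on sigma^2: the weighted RSS decreases as sigma^2 grows.
lo, hi = 1e-6, 50.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    S2, b_hat, d_hat = min_wrss(mid)
    lo, hi = (lo, mid) if S2 < n - 2 else (mid, hi)

sigma2_hat = 0.5 * (lo + hi)
S2_hat, b_hat, d_hat = min_wrss(sigma2_hat)
```

The bisection exploits the monotone dependence of the weighted RSS on the assumed variance, which is what makes the variance ratio "easily chosen" from the degrees of freedom.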
Let us now consider the regression on the distance traveled. For a simple linear model we have the LSM-estimate
and the LDM-estimate (the observation error is also assumed uniformly distributed)
Here the LDM estimation procedure is virtually identical to the LSM calculations, because the theoretical variance of the distance traveled is very small, which is not entirely consistent with observations. Indeed, the odometer readings are directly related to the wheel diameter, and a change of 1 cm in the wheel radius due to tire wear leads on average to an error of 4%-5% in distance measurement. Hence on average μ₂² ≈ 5. Let us re-calculate the estimate with this variance. Here also the variance ratio of the observations is easily calculated from the degrees of freedom:

This change of variance does not have a strong impact on the estimate, because in fact the variance should be much smaller for large odometer readings, and this approach requires calculations with heteroscedastic variance, which fall outside our scope. Figure 1.6-6 plots the observations and both estimates.
Figure 1.6-6. One-parameter model. Dark squares are the input data, diamonds the LDM-estimates, disks the LSM-estimates.
Finally, let us consider regression on both variables. For the two-parameter linear model we obtain the LSM-estimate

and the LDM-estimate
Both the LSM and LDM estimates fit the observations. Figure 1.6-7 plots the observations and the two estimates. We see that the six middle points (which provide the most reliable description of the observations) are closer to the LDM estimates. Since age and mileage are strongly correlated (in principle, they cannot be efficiently used together in the same model), the rank of the matrix is sensitive to the specification of the variance. The LDM-estimate rapidly loses its stability when the variances of the independent variables become equal. The variance ratio of the observations is easily calculated from the degrees of freedom. The model is in fact nonlinear in these parameters (because the price does not drop to zero after 9-10 years) and a more detailed analysis is required.
Figure 1.6-7. Two-parameter linear model projected onto the price-distance plane. Squares show the input data, diamonds the LDM-estimates, disks the LSM-estimates.
1.6.5.2 Example of Extrapolation of the Input Data

Let us consider an example. In Figure 1.6-8, the black squares designate input data whose errors are identical in both variables. The cloud of input data, in outline, resembles the barrel of a "gun" shooting at an angle of 45 degrees. The estimates of the input data (symmetric about the line y = x) are computed by the LSM (the disks connected by a line) and by the LDM (the diamonds connected by a line). Above in Figure 1.6-8, the obtained estimates are written. Thus the LDM-estimate keeps the symmetry present in the input data, and the prediction goes along the axis of symmetry. In this case, the LSM estimate does not lead to a similar symmetric result. It follows that for the value x = 8 the LDM-extrapolation
("shots from a gun") is equal to 8, while the LSM-extrapolation is equal to 6.96 and falls outside the limits of the band y ∈ [x − 1, x + 1] (the fitted lines are the regression y = 0.4 + 0.82x and the confluence y = x).
Figure 1.6-8. The estimates of the input data. Squares show the input data, diamonds the LDM-estimates, disks the LSM-estimates.
Remark 1.6.5. Further, it is possible to consider the confluent-regression analysis of passive experiment subject to linear constraints, the multivariate confluent-regression analysis of passive experiment, and nonlinear models. Since in practice the estimates of nonlinear parameters are made with the use of a linearization, only technical difficulties arise. We now consider the confluent-regression incomplete-rank models of passive experiment.
1.7 STABLE ESTIMATION OF NORMAL PARAMETERS OF THE DEGENERATED CONFLUENT-REGRESSION MODEL

We consider the problem of estimating the normal vector, stably against input data errors, for the cases when the matrix [Ξ, H] has incomplete rank.

Assumption 1.0.7. We assume the following linear functional equation of the first kind of incomplete rank

Definition 1.7.0. The vector (β⁰, δ⁰) is called a normal solution of the SLAE (1.0.7) with the degenerate matrix [Ξ, H] when

where H, Ξ are the known exact matrices and φ is the known exact vector.

1.7.1 Regularized Least Squares-Distances Method
Assumption 1.7.1. We use the linear confluent-regression stochastic model of passive experiment

for the linear functional equation (1.0.7) of incomplete rank. Instead of the matrix X and the vector y, we consider the (mn+n)-dimensional vector z̄ = (x̄^T, −ȳ^T)^T. For this purpose we stretch the matrix X by rows and augment it with the row −ȳ^T. The same operation is applied to the vectors ζ = (ξ^T, −φ^T)^T, z = (x^T, −y^T)^T and ω = (c^T, −e^T)^T. Then the equation (1.7.0) is rewritten as a nondegenerate linear regression model with linear constraints:

where the constraint matrix Γ (of size n × (nm+n)) has the same form as in Eq.
(1.5.1a). In the given model the submatrix I is nondegenerate, so the constraint matrix has full rank regardless of the rank completeness of the initial model matrix. Therefore the first stage, the estimation of the vector ζ, is always well posed.

Problem 1.7.1a. Assume given the approximate vector z̄, its covariance matrix Ω, and the constraint matrix Γ. It is required to estimate the vector of parameters ζ of the model (1.7.1) by the MLM.

Theorem 1.7.1a. The solution of Problem 1.7.1a exists, is unique and minimizes the quadratic form
Ŝ² = min over ζ of (z̄ − ζ)^T Ω^{-1} (z̄ − ζ).

For any ε > 0 there is a value of the input data error η(ε) such that the approximation to the normal vector of the SLAE (1.0.7) deviates from the normal solution by no more than ε.
The proof is similar to [Tikhonov & Ufimtsev 19881.
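The notion of a normal solution from Definition 1.7.0 can be illustrated numerically: for a rank-deficient matrix, the minimum-norm least-squares solution is exactly what `numpy.linalg.pinv`/`lstsq` return, and the Tikhonov-regularized solution converges to it as the regularization parameter tends to zero. The matrix below is an arbitrary degenerate example, not data from the text:

```python
import numpy as np

# Normal (minimum-norm) solution of a degenerate SLAE, and its recovery as
# the limit of Tikhonov regularization.

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],     # row 2 = 2 * row 1, so rank(A) = 2
              [1.0, 0.0, 1.0]])
y = np.array([6.0, 12.0, 2.0])     # consistent right-hand side

theta_normal = np.linalg.pinv(A) @ y           # minimum-norm LS solution

def tikhonov(alpha):
    """argmin |A t - y|^2 + alpha |t|^2."""
    k = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(k), A.T @ y)

theta_alpha = tikhonov(1e-10)                  # close to the normal solution
```

Because the least-squares solutions of a degenerate system form an affine set, some extra selection principle (minimum norm, or the vanishing-regularization limit) is needed to make the estimate unique and stable; this is the role of the normal solution here.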
1.7.4 Quasisolutions
In the same way we generalize the quasisolution concept [Ivanov 1962].

Definition 1.7.4. The vector (β_M^T, δ_M^T)^T is called a quasisolution of the equation (1.0.7) on the set M if

(β_M^T, δ_M^T)^T = arg min over (β^T, δ^T)^T ∈ M of |Ξβ + Hδ − φ|².

We consider as the set M the full sphere

Problem 1.7.4. Given the approximate vector φ̄, the exact matrix H, and the value g > 0, calculate the quasisolution vector (β_M^T, δ_M^T)^T as the argument minimizing the RSS on the set M = {(β^T, δ^T)^T : |(β^T, δ^T)^T| ≤ g} of admissible quasisolutions.
The problem of computing the solution of this equation is ill posed in the sense of Hadamard. The main feature of the various approaches to the solution of this problem with a known error in the right-hand side is that the approximate solution in principle cannot be computed without information about the norm of the right-hand-side error [Tikhonov 1963] or the norm of the solution [Ivanov 1963]. We will describe these approaches for the case when the right-hand side is measured with an error and then compare them with the previous results for a core with observation or specification errors. We thus assume that only the right-hand side of the equation (3.1.6) contains nonrandom observation errors.
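The practical meaning of Hadamard ill-posedness is easy to exhibit after discretization: a smooth kernel gives a severely ill-conditioned matrix, and a naive solve amplifies a tiny right-hand-side error enormously. The kernel, grid and noise level below are illustrative assumptions, not the book's equation (3.1.6):

```python
import numpy as np

# Discretized Fredholm operator with a smooth kernel: the matrix is severely
# ill-conditioned, so a naive least-squares solve turns a tiny right-hand-side
# perturbation into a huge solution error.

n = 60
s = np.linspace(0.0, 1.0, n)
h = 1.0 / (n - 1)
K = np.exp(-(s[:, None] - s[None, :]) ** 2) * h   # kernel times quadrature weight

z_true = np.sin(np.pi * s)                        # smooth "exact solution"
y = K @ z_true

rng = np.random.default_rng(5)
y_noisy = y + 1e-6 * rng.normal(size=n)           # tiny observation error

z_naive = np.linalg.lstsq(K, y_noisy, rcond=None)[0]

cond = np.linalg.cond(K)
rel_err = np.linalg.norm(z_naive - z_true) / np.linalg.norm(z_true)
```

This is exactly why information about the error norm (or the solution norm) must enter the computation: without it, no stable selection among the wildly different near-solutions is possible.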
Assumption 3.1.5. We use the following linear model of passive observations for the equation (3.1.6) in the presence of a nonrandom passive measurement error in the right-hand side:
where ỹ(x) is a known L₂[c,d] function measured with an error. We start by reviewing the essentials of the regularization method [Tikhonov 1963] and the statement of the corresponding variational problems [Morozov 1967, 1987, Ivanov 1963].

3.1.2.1 Regularized Pseudosolution

Let us consider a method that utilizes the known norm of the right-hand-side errors to compute the pseudosolution (the residual principle [Morozov 1967]).

Assumption 3.1.6. The set of admissible pseudosolutions u of the equation (3.1.7) is defined by the inequality
where the error norm σ is known in advance. The problem of constructing the normal pseudosolution as the argument minimizing the functional

on the set of admissible pseudosolutions has the form
However, this pseudosolution has only weak convergence to the exact solution as σ → 0. The distance of the approximation z(s) from the unknown function ζ(s) is therefore measured in the space W₂⁽¹⁾[a,b]:

‖z‖²_{W₂⁽¹⁾} = ∫ₐᵇ (z²(s) + z′²(s)) ds

[Tikhonov 1963].
The selection of regularized pseudosolutions (that is, pseudosolutions that converge strongly to the exact solution as σ → 0) relies on the choice of a set of sufficiently smooth pseudosolutions [Tikhonov 1963] (a set in the space W₂⁽¹⁾[a,b] [Sobolev 1950]).

Problem 3.1.3. Given the approximate right-hand side ỹ, ‖ỹ − y‖ = σ, an exactly specified nondegenerate core K(x,s) of the equation (3.1.7), and the error norm σ, find the regularized pseudosolution as the argument that minimizes the stabilizing functional [Tikhonov 1963]
on the set of admissible pseudosolutions (3.1.8); it has the form

This approach has been systematically developed in [Tikhonov & Arsenin 1979] and [Morozov 1987]. It is proved in [Tikhonov & Arsenin 1979] that the regularized pseudosolution obtained in this way stably converges to the exact solution as σ → 0. We write out the solution method used in what follows and the required Euler equation. Since the infimum is attained on the set boundary [Tikhonov & Arsenin 1979], the inequality in Eq. (3.1.9) can be replaced with an equality. Applying the method of Lagrange indeterminate multipliers, we pass to the problem
A necessary condition for a minimum of the functional M^α[z] is that its first variation with respect to z and λ vanish: ΔM^α[z] = 0. This leads to the Euler equation

where L is the Sturm-Liouville operator with boundary conditions that specify the equality to zero of the sought solution and (or) its derivatives:
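Discretizing the problem makes the residual-principle construction concrete: solve the Euler-type system (KᵀK + αLᵀL)z = Kᵀỹ and choose α so that ‖Kz − ỹ‖ equals the known error norm. The kernel, the stabilizer discretization and the noise level below are illustrative assumptions:

```python
import numpy as np

# Tikhonov regularization with the residual (discrepancy) principle on a
# discretized Fredholm equation.  L^T L below discretizes the stabilizer
# |z|^2 + |z'|^2 (Sturm-Liouville operator z - z''); quadrature weights
# are omitted for brevity.

n = 60
s = np.linspace(0.0, 1.0, n)
h = 1.0 / (n - 1)
K = np.exp(-(s[:, None] - s[None, :]) ** 2) * h

z_true = np.sin(np.pi * s)
rng = np.random.default_rng(4)
noise = 1e-4 * rng.normal(size=n)
y = K @ z_true + noise
sigma = np.linalg.norm(noise)          # error norm, assumed known

D = (np.eye(n, k=1)[:-1] - np.eye(n)[:-1]) / h   # forward differences
LtL = np.eye(n) + D.T @ D

def z_alpha(alpha):
    return np.linalg.solve(K.T @ K + alpha * LtL, K.T @ y)

def discrepancy(alpha):
    return np.linalg.norm(K @ z_alpha(alpha) - y) - sigma

# |K z - y| grows monotonically with alpha: geometric bisection to the root.
lo, hi = 1e-16, 1.0
for _ in range(100):
    mid = np.sqrt(lo * hi)
    lo, hi = (lo, mid) if discrepancy(mid) > 0 else (mid, hi)

alpha_star = np.sqrt(lo * hi)
z_reg = z_alpha(alpha_star)
```

Unlike the naive solve, the regularized solution stays close to the smooth exact solution, with the residual pinned at the error norm as the residual principle prescribes.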
If inf ρ(λ) = ρ(∞) > 0, then the pseudosolution is computed for λ → ∞ [Tanana 1981].

Theorem 3.1.3a. Assume that the conditions of Problem 3.1.5 are satisfied. Then the regularized pseudosolution is a stable approximation to the exact solution of the equation (3.1.7).

Proof. Define the sequence δₖ = {σₖ, μₖ} such that
‖yₖ − ȳ‖ = σₖ,  ‖Kₖ − K‖ = μₖ

[Morozov & Grebennikov 1992]. Let ζ̄ be the exact solution. Using the definition of the element z_{δₖ}, we obtain

Using the relationship

we have
Hence we obtain that the sequence z_{δₖ}, k = 1, 2, ..., is bounded from above.

It follows that we can isolate a weakly convergent subsequence from the sequence z_{δₖ}. Without loss of generality, we assume that z_{δₖ} converges weakly to z̃. Passing to the limit k → ∞ in the inequality, we obtain that

lim (as δₖ → 0) ‖K z_{δₖ} − ȳ‖ = ‖K z̃ − ȳ‖ = 0,

and z̃ is the solution of the equation (3.1.7). Then it follows from Eq. (3.1.15) that lim (as k → ∞) ‖z_{δₖ}‖_{W₂⁽¹⁾} = ‖z̃‖_{W₂⁽¹⁾}. In the Hilbert space W₂⁽¹⁾, weak convergence and convergence of the norms imply strong convergence [Kolmogorov & Fomin 1972], so that lim (as k → ∞) z_{δₖ} = z̃ and, by uniqueness, z̃ = ζ̄.
Remark 3.1.2. 1. The Euler equation for this problem differs from the Euler equation (3.1.10) primarily in the following sense. With respect to the Sturm-Liouville operator Lz = z − z″ in the stabilizing operator of the equation (3.1.14),
the smoothing part becomes dominant, and the bias is suppressed. This is consistent with common sense: since the core contains an error, that is, is less smooth than the exact core, a better (smooth) solution is obtained by increasing the influence of the smoothing part. 2. The common approach to construction problems of the regularized pseudosolution bases on sets of the admissible pseudosolutions of the form
that is, the value of the functional $\Omega$ at a pseudosolution is less than or equal to its value at the exact solution. In principle, we can compute other regularized pseudosolutions on other sets of admissible pseudosolutions. For example, the set
which, after carrying out the same computations as in Theorem 3.1.3, allows one to calculate the best pseudosolution, but demands knowledge of $\|\bar K \bar z - \bar y\|_{L_2}^2$ and $\|\bar z\|_{L_2}$. If these values are available, we can take advantage of the formula (3.1.12″). If only $\|\bar z\|_{L_2}$ is available, we can take the formula
Finally, taking into account that the norm of the exact solution is very seldom known beforehand, we take advantage of the Cauchy-Bunyakovskii inequality
so that as a result we receive the formula (3.1.12) for the set of admissible pseudosolutions. The quality of the pseudosolution can be improved by the following iterative process: use the pseudosolution from Eq. (3.1.13) to compute the pseudosolution from Eq. (3.1.12′), and then the pseudosolution from Eq. (3.1.12′) to compute the pseudosolution from Eq. (3.1.12″).

3. The Euler equations (3.1.14), (3.1.12′) and (3.1.12″) for the regularization parameter (the Lagrange multiplier) also differ from previously proposed equations [Tikhonov, etc. 1971], [Tikhonov & Arsenin 1979], [Morozov 1987], [Morozov & Grebennikov 1992]. The previous procedures [Tikhonov, etc. 1971], [Tikhonov & Arsenin 1979], [Morozov 1987] and many other methods were applied to the problem with passive errors in the core and were strictly heuristic. They were not the result of solving the variational problem of minimizing a quadratic functional on compact sets. In particular, when discussing such a problem with A.G. Jagola back in 1969, the author proposed a formula, clearly similar to the inequality of [Morozov 1967], for computing the regularization parameter of the Euler equation (3.1.10) from the condition
Of course, this condition does not follow from any variational problem, but it has been tested in practice and, as has been shown subsequently [Tikhonov, etc. 1971], some modifications of this condition lead to the necessary asymptotic expression, which has been useful for solving a number of physical problems. At that point, however, the variational problem (producing the regularized pseudosolution) in the presence of observation errors in the core and in the right-hand side had not been developed. Its algebraic solution for SLAE was only obtained in [Mechenov 1991]. Let us compare the result with that proposed in the formula of the generalized residue principle [Tikhonov & Arsenin 1979], [Jagola 1979, 1979a, 1980, 1980a], which requires solving the following Euler equation
where c is the measure of inconsistency of the equation (3.1.11). Here the same goal (increasing the smoothness of the solution) is achieved by increasing the value of the regularization parameter $\alpha$: in addition to $\rho\|\bar z\|_{L_2}$, we also introduce the inconsistency measure into the equation (3.1.10) for the regularization parameter. The inconsistency measure is not needed for computing the regularized pseudosolution, because the following Cauchy-Bunyakovskii inequality holds:
and so

where equality is attained only for $\rho = \sigma\|\bar z\|_{L_2}$.
4. The error norms are the weakest link in regularization theory, and summing two squared error norms reduces the probability of an error in estimating the norm. We know [Tikhonov, etc. 1973] that the residue principle produces a parameter value that is somewhat greater than optimal.
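The residual-based parameter choice discussed above can be illustrated numerically. The sketch below is only a finite-dimensional analogue under simplifying assumptions: the equation is discretized on a uniform grid, the stabilizer is the plain $L_2$ norm of $z$ rather than the Sobolev norm used in the text, and the kernel, grid, and noise model are all invented for the illustration.

```python
import numpy as np

# Discretize a Fredholm integral equation of the first kind y(x) = ∫ K(x,s) z(s) ds
# and choose the Tikhonov parameter alpha so that the residual norm matches the
# right-hand-side error level sigma (a residual/discrepancy-type condition).

def tikhonov(K, y, alpha):
    """Minimizer of ||K z - y||^2 + alpha ||z||^2."""
    n = K.shape[1]
    return np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ y)

def alpha_by_residual(K, y, sigma, lo=1e-14, hi=1e2, iters=60):
    """Log-bisection on alpha: the residual ||K z_alpha - y|| grows with alpha."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if np.linalg.norm(K @ tikhonov(K, y, mid) - y) < sigma:
            lo = mid   # residual still below the error level: increase alpha
        else:
            hi = mid
    return np.sqrt(lo * hi)

n = 100
s = np.linspace(0.0, 1.0, n)
h = 1.0 / n
K = h / (1.0 + np.subtract.outer(s, s) ** 2)   # smooth kernel -> ill-conditioned matrix
z_true = np.exp(-((s - 0.5) ** 2) / 0.02)      # "exact solution" of the experiment
y_exact = K @ z_true

rng = np.random.default_rng(0)
sigma = 1e-3                                   # known error norm of the right-hand side
noise = rng.standard_normal(n)
y_noisy = y_exact + sigma * noise / np.linalg.norm(noise)

alpha = alpha_by_residual(K, y_noisy, sigma)
z_reg = tikhonov(K, y_noisy, alpha)            # regularized pseudosolution
```

With this choice the residual of the regularized solution sits at the error level $\sigma$, while a naive solve of the ill-conditioned system would be useless.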
5. In principle, we can easily consider the case when the equation (3.1.7) additionally contains a pure regression part (for instance, in the form of an unknown constant), as has been done in [Mechenov 1991] and in Tables 3 and 4 of the Application.

6. This approach is also applicable to nonlinear integral equations of the first kind and to other forms of operator equations.
3.1.3.2 Regularized Quasisolution

Let us construct the solution method that utilizes the known solution norm in the presence of nonrandom passive measurement errors in the right-hand side and in the core (a generalization of the regularized quasisolution method).

Problem 3.1.6. Given one realization of the approximate right-hand side $\tilde y$ and of the approximate core of the model (3.1.11), and given the value $\gamma$ bounding the set of admissible quasisolutions, compute the regularized quasisolution.

Theorem 3.1.4. Assume that the conditions of Problem 3.1.6 are satisfied. Then the variational problem to construct the regularized quasisolution takes the form

and the regularized quasisolution is a stable approximation of the exact solution of Eq. (3.1.7).

Proof. Again passing from the integral to Darboux sums, we repeat the calculations of Theorem 3.1.2. Passing from the discrete problem to the continuous problem, we obtain the equation (3.1.18). Applying the Lagrange method with the multiplier $\alpha$, we obtain the functional
Differentiating this functional, we obtain the corresponding Euler equation. The proof of stability of the regularized quasisolution is similar to that of Theorem 3.1.3a.

3.1.4 Allowing for Nonrandom Active Errors in the Core and Passive Errors in the Right-Hand Side
Let us construct methods that compute the solution in the presence of nonrandom active errors arising during the realization of the given core, and passive errors associated with the measurement of the right-hand side.

Assumption 3.1.10. We use the following linear model of active observations for a linear integral equation of the first kind (3.1.6)
where $\Xi(x,s)$ is the realization error of an exactly specified core in the experiment. To the best of our knowledge, exactly specified cores that are realized with an error (an active experiment [Mechenov 1997, 1988]) have not been considered in the literature. Let us formulate the construction of the residue-based regularized pseudosolution and the regularized quasisolution.

3.1.4.1 Regularized Pseudosolution

We construct a method to find the regularized pseudosolution given the norm of nonrandom active errors that arise during the realization of the specified core and the norm of passive measurement errors in the right-hand side (further on, the norm of the exact solution may also be required).

Problem 3.1.7. Given the approximate right-hand side $\tilde y$ with $\|\tilde y - \bar K\bar z - \Xi\bar z\| = \sigma$, the exactly specified nondegenerate core, the core realization error norm $\|\Xi\| = \rho$ for model (3.1.19), and the values $\sigma, \rho$, find the regularized pseudosolution.

Since the total right-hand side error $\tilde\varepsilon = \tilde y - \bar K\bar z = \tilde\nu + \Xi\bar z$ depends on the sought solution, the set of admissible pseudosolutions
will lead to a minimization problem with a nonquadratic functional, that is, to
$$z_1 = \arg\inf\Bigl\{\|z\|_{W_2^{(1)}}^2 : \|\tilde y - \tilde K z\|_{L_2} \le \sigma + \rho\|z\|_{L_2}\Bigr\}.$$
Even if we take advantage of another set of admissible pseudosolutions, then, in both cases, we construct a method similar to the method for passive errors, which contradicts the sense of the problem. Besides, these methods do not coincide with the statistical approach, which essentially uses the fact that the full error depends on the sought solution. To solve this problem we apply an equivalent of the statistical approach [Mechenov 1997]. Using the result of [Mechenov 1997], we construct a functional equivalent to the MLM [Kendall & Stuart 1969] and use it to find the set of admissible pseudosolutions:
where
This functional differs from the functional implied by the MLM by some constants. In particular, in the finite-dimensional case, after computing the logarithm of the determinant of the expectation of the correlation error matrix, the logarithm is preceded by a value n, which is greater than or equal to the expectation of the RSS. For the set of admissible pseudosolutions T this value is equal to 1. For the purpose of comparing the results with the previous ones, we write the variational problem for the regularized pseudosolution in the following form.

Problem 3.1.7′. Find the regularized pseudosolution as the argument of the functional infimum
where the value m is equal to
Theorem 3.1.6. Assume that the conditions of Problem 3.1.7 are satisfied. Then the variational problem for the construction of a regularized pseudosolution has the form
$$z_\rho = \arg\inf\Bigl\{\|z\|_{W_2^{(1)}}^2 : \frac{\|\tilde K z - \tilde y\|_{L_2}^2}{1+\|z\|_{L_2}^2} + (\sigma^2+\rho^2)\ln\bigl(1+\|z\|_{L_2}^2\bigr) = m^2\Bigr\}, \qquad (3.1.22)$$
the regularized pseudosolution of the equation (3.1.19) satisfies the Euler equation

and it is unique in the region
Proof. The proof is similar to Theorem 3.1.2.

Theorem 3.1.7. Assume that the conditions of Problem 3.1.7 are satisfied. Then the regularized pseudosolution is a stable approximation to the exact solution of equation (3.1.6).

Proof. Define the sequence $\delta_k = \{\sigma_k, \rho_k\}$ such that [Morozov & Grebennikov 1992]

Let $\bar z$ be the exact solution. Using the definition of the element $z_{\delta_k}$, we obtain

Hence we have

that is, the sequence $z_{\delta_k}$, $k = 1,2,\ldots$, is bounded from above.

It follows that we can isolate a weakly convergent subsequence from the sequence $z_{\delta_k}$. Without loss of generality, we assume that $z_{\delta_k} \rightharpoonup \hat z$. Passing to the limit $k \to \infty$ in the inequality
we obtain that $\lim_{\delta_k \to 0} \|\tilde y_k - \tilde K_k z_{\delta_k}\|_{L_2} = 0$ and $\hat z$ is the solution of the equation (3.1.7). Then it follows from Eq. (3.1.24) that $\lim_{k\to\infty} \|z_{\delta_k}\|_{W_2^{(1)}} = \|\hat z\|_{W_2^{(1)}}$.

In the Hilbert space $W_2^{(1)}$, weak convergence and convergence of the norms imply strong convergence [Kolmogorov & Fomin 1972], so that $\lim_{k\to\infty} z_{\delta_k} = \hat z$ and by uniqueness $\hat z = \bar z$.
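The stability asserted by this theorem can be observed numerically: as the error levels $\delta_k = \{\sigma_k, \rho_k\}$ tend to zero and the regularization parameter is chosen from the residual condition, the regularized solutions approach the exact one. The following sketch checks this in a simplified finite-dimensional setting (identity stabilizer, errors only in the right-hand side; the kernel, grid, and noise model are assumptions of the illustration):

```python
import numpy as np

def tikhonov(K, y, alpha):
    # minimizer of ||K z - y||^2 + alpha ||z||^2
    n = K.shape[1]
    return np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ y)

def alpha_by_residual(K, y, sigma, lo=1e-14, hi=1e4, iters=80):
    # choose alpha so that ||K z_alpha - y|| = sigma (residual is monotone in alpha)
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        if np.linalg.norm(K @ tikhonov(K, y, mid) - y) < sigma:
            lo = mid
        else:
            hi = mid
    return np.sqrt(lo * hi)

rng = np.random.default_rng(3)
n = 60
s = np.linspace(0.0, 1.0, n)
K = (1.0 / n) * np.exp(-np.subtract.outer(s, s) ** 2 / 0.1)  # smooth (ill-posed) kernel
z_bar = np.sin(np.pi * s)                                    # exact solution
y_bar = K @ z_bar

errors = []
for sigma in [1e-2, 1e-3, 1e-4, 1e-5]:       # shrinking error levels sigma_k
    noise = rng.standard_normal(n)
    y_k = y_bar + sigma * noise / np.linalg.norm(noise)
    z_k = tikhonov(K, y_k, alpha_by_residual(K, y_k, sigma))
    errors.append(np.linalg.norm(z_k - z_bar))
# errors shrink as sigma_k -> 0, mirroring z_{delta_k} -> z_bar
```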
Remark 3.1.3. 1. The Euler equation (3.1.23) for this problem differs from the Euler equation (3.1.10) in the following sense. With respect to the Sturm-Liouville operator $Lz = z - z''$ in the stabilizing operator of the equation (3.1.23)

the smoothing and biasing parts are multiplied by a constant, and the biasing part increases.
2. Since the value m is complicated to compute (it depends on the norm of the exact solution), we propose to compute the regularized pseudosolution from the Euler equation (see Eq. (3.1.19))

This modified mode of parameter choice can be preferable: in this case, the parameter choice does not depend on the norm of the exact solution.

3.1.4.2 Regularized Quasisolution

Let us construct a method that utilizes the known solution norm in the presence of nonrandom passive measurement errors in the right-hand side and of nonrandom active errors in the prescribed core (a generalization of the regularized quasisolution method).

Problem 3.1.8. Given the approximate right-hand side $\tilde y$ with $\|\tilde y - \bar K\bar z - \Xi\bar z\| = \sigma$,
the exactly specified nondegenerate core, the core realization error norm $\rho$ in model (3.1.19), and the values $\sigma, \gamma$, find the regularized quasisolution.

Since the total right-hand side error $\tilde\varepsilon = \tilde y - \bar K\bar z = \tilde\nu + \Xi\bar z$ depends on the sought solution, the $L_2$ residue norm inadequately represents the situation and the minimization problem cannot be solved by the standard method, that is, in the form
Therefore, a correct description of the error norm requires applying the equivalent of the statistical approach [Mechenov 1997]. Using Eq. (3.1.20), we construct the problem of computing the regularized quasisolution
Theorem 3.1.8. Assume that the conditions of Problem 3.1.8 are satisfied. Then the variational problem to construct the regularized quasisolution takes the form
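The quasisolution construction (minimizing the residual over the ball $\|z\| \le \gamma$, with the Lagrange multiplier chosen so that the constraint is active) can be sketched in a finite-dimensional setting. The identity stabilizer and the test matrices below are assumptions of the illustration, not the operator setting of the text:

```python
import numpy as np

def ridge(K, y, alpha):
    # minimizer of ||K z - y||^2 + alpha ||z||^2 (Lagrangian of the constrained problem)
    n = K.shape[1]
    return np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ y)

def quasisolution(K, y, gamma, lo=1e-14, hi=1e6, iters=80):
    """Minimize ||K z - y|| subject to ||z|| <= gamma."""
    if np.linalg.norm(ridge(K, y, lo)) <= gamma:
        return ridge(K, y, lo)          # constraint inactive: (near) least squares
    for _ in range(iters):              # ||z_alpha|| decreases monotonically in alpha
        mid = np.sqrt(lo * hi)
        if np.linalg.norm(ridge(K, y, mid)) > gamma:
            lo = mid
        else:
            hi = mid
    return ridge(K, y, np.sqrt(lo * hi))

rng = np.random.default_rng(7)
K = rng.standard_normal((20, 10))
y = rng.standard_normal(20)
gamma = 0.5 * np.linalg.norm(ridge(K, y, 1e-12))  # bound tighter than the LS solution
z_q = quasisolution(K, y, gamma)                  # lands on the sphere ||z|| = gamma
```

When the bound $\gamma$ is smaller than the norm of the least-squares solution, the minimizer lies on the boundary of the ball, which is exactly the case where the Lagrange multiplier is positive.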
and we consider the variety of twice continuously differentiable functions $z(s)$ such that $z(a)=z(b)=0$. It is obvious that the unique function which satisfies the equation $Lz=0$, with $p(s)=\mathrm{const}$, $q(s)=\mathrm{const}$, and these boundary conditions is the function identically equal to zero. It is known that the Sturm-Liouville operator is unbounded. We consider the following functional [Tikhonov 1963, Tikhonov & Arsenin 1979]: the norm in the Sobolev space
$W_2^{(1)}[a,b]$ [Sobolev 1950]
Integrating by parts with $z(a)=z(b)=0$, we have
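The computation referred to can be sketched as follows, assuming the weighted Sobolev norm $\|z\|_{W_2^{(1)}}^2 = \int_a^b \bigl(q(s)z^2 + p(s)z'^2\bigr)\,ds$ with $z(a)=z(b)=0$ (the weights p, q are those of the operator L above):

```latex
\|z\|_{W_2^{(1)}}^2
  = \int_a^b \bigl( q\,z^2 + p\,(z')^2 \bigr)\,ds
  = \int_a^b q\,z^2\,ds + \bigl[\, p\,z'\,z \,\bigr]_a^b - \int_a^b (p\,z')'\,z\,ds
  = \int_a^b z\,\bigl( q\,z - (p\,z')' \bigr)\,ds
  = (z, Lz)_{L_2},
\qquad
Lz \equiv q\,z - (p\,z')'.
```

The boundary term vanishes because $z(a)=z(b)=0$, so the Sobolev norm is expressed through the Sturm-Liouville operator.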
3.1.5.2 Sturm-Liouville Differential Operator of Order n

Similarly to the above, we consider the norm of a function $z(s)$ in the Sobolev space $W_2^{(n)}[a,b]$ with the conditions $q_i(s)>0$, $i=0,1,\ldots,n$, and functions such that $z^{(i)}(a)=z^{(i)}(b)=0$, $i=0,1,\ldots,n-1$. Performing the integration by parts, we have

where $L^{(n)}$ is the Sturm-Liouville differential operator of order n.
3.2 About the Appurtenance of Gauss-Markov Processes to Probability Sobolev Spaces

3.2.1 Random Processes
This subsection recalls some facts from the theory of random processes that will be used further. Let $\{\Omega, \mathcal F, P\}$ be a probability space, where $\Omega=\{\omega\}$ is a space of elementary events, $\mathcal F$ is the $\sigma$-algebra of its subsets, whose elements are called random events, and P is a probability (a measure on $\{\Omega, \mathcal F\}$ such that $P\{\Omega\}=1$). There is also a measurable space $(W, \mathcal B)$ in which all singletons are measurable and whose elements are called states [Ventsel 1975, Malenvaud 1970]. A random variable w is a measurable map of the space $\{\Omega, \mathcal F, P\}$ into $(W, \mathcal B)$. By the $\sigma$-algebra generated by a system of random variables $w_\alpha$ with values in $(W_\alpha, \mathcal B_\alpha)$ we understand the $\sigma$-algebra generated by events of the form $\{w_\alpha \in G\}$, $G \in \mathcal B_\alpha$: $\sigma\{w_\alpha\} = \sigma\{\{w_\alpha \in G\},\ G \in \mathcal B_\alpha\}$. The distribution of a random variable w is the measure $F_w$ on $(W, \mathcal B)$ defined by the relation
The joint distribution of random variables $x, y, \ldots, z$, taking values in the spaces $(X, \mathcal B_X), (Y, \mathcal B_Y), \ldots, (Z, \mathcal B_Z)$, is the distribution of the random vector $(x, y, \ldots, z)$
A random function defined on a set T is a function $w_t(\omega)$ which for every $t \in T$ is measurable in $\omega$. A random process with discrete time is a random function for which T is either the set of all integers, $t \in \mathbb Z$, or the set of positive integers, $t \in \mathbb N$. For a finite number of values of t, $M = \{t_1, t_2, \ldots, t_m\} \subset T$, the distribution F of the random variables $w_{t_1}(\omega), w_{t_2}(\omega), \ldots, w_{t_m}(\omega)$ (that is, of the vector $(w_1, w_2, \ldots, w_m)^T$) is by definition given by the probability $P\{w \in A\}$ for measurable sets A
These distributions, for all possible $t_1, t_2, \ldots, t_m \in T$, are called the finite-dimensional distributions of the random function. A random process $w_t(\omega)$ is called stationary if
identically in the arguments and for any $\tau$. A purely random process is by definition formed by a sequence of independent random variables with the same distribution. The distributions of this process are the following
It is obvious that this process is stationary.
The moments of the first and second order of a random process are
The expectation is calculated with the help of the cumulative distribution function $\Phi_t$; the covariance is calculated with the help of $\Phi_{t_1 t_2}$. If the process is stationary, the condition $\Phi_{t_1+\tau} = \Phi_{t_1}$ implies that $Ew(t)$ is equal to a constant not depending on t, and $K(t_1,t_2) = K(t_1-t_2) = K(u)$ depends only on the difference of the arguments and is called an autocovariance. We put $\kappa_0 = \sigma^2$, $\kappa_i = \sigma^2\rho_i$, $i = 1, \ldots, \infty$. Thus, the characteristics of the first and second order of a stationary random process are: the expectation, the variance $\sigma^2$, and the correlogram $\rho_i$, $i = 0, \ldots, \infty$. All autocorrelation coefficients of the correlogram of a purely random process are equal to zero except for the first. Let the random process $\varepsilon_t$
be a stationary random process without correlations. A process of the form $w_t = \gamma_0\varepsilon_t + \gamma_1\varepsilon_{t-1} + \ldots + \gamma_k\varepsilon_{t-k}$ is called a moving average process, where $\gamma_0, \gamma_1, \ldots, \gamma_k$ are fixed numbers. It is easy to calculate its correlogram
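The vanishing of the correlogram beyond lag k can be checked numerically; a minimal sketch with invented coefficients (the sample size and estimator are assumptions of the illustration):

```python
import numpy as np

# Moving average process w_t = g0*e_t + g1*e_{t-1} + ... + gk*e_{t-k} built from a
# purely random (i.i.d.) process e_t; its autocorrelations vanish for lags j > k.

rng = np.random.default_rng(1)
g = np.array([1.0, 0.6, -0.3])        # coefficients gamma_0..gamma_k, so k = 2
k = len(g) - 1
e = rng.standard_normal(200_000)      # purely random process, unit variance
w = np.convolve(e, g, mode="valid")   # w_t = sum_j g_j * e_{t-j}

def autocorr(x, lag):
    """Sample autocorrelation at the given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# Theoretical correlogram: rho_j = (sum_i g_i g_{i+j}) / (sum_i g_i^2) for j <= k
rho1_theory = np.dot(g[:-1], g[1:]) / np.dot(g, g)
rho1_sample = autocorr(w, 1)          # close to rho1_theory
rho_beyond = autocorr(w, k + 1)       # close to zero: lag k+1 exceeds the window
```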
Hence, the correlogram is equal to zero starting from the value j = k+1.

3.2.2 Norms of Autoregression Processes
We consider the equation
where $\beta_1, \beta_2, \ldots, \beta_k$ are fixed numbers and $w_t$ is a stationary random process. We study the solutions of this classical linear finite-difference equation with constant coefficients. We substitute $w_t = z^t$, where z is a complex number
where $z_j$, $j = 1, \ldots, k$, are the roots of this equation. Three types of roots are distinguished: 1. $|z_j| > 1$, $j = 1, \ldots, k$: the solutions damp out as t increases. 2. $|z_j|$