Lecture Notes in Mathematics Edited by A. Dold and B. Eckmann
690 William J. J. Rey
Robust Statistical Methods
Springer-Verlag Berlin Heidelberg New York 1978
Author: William J. J. Rey, MBLE Research Laboratory, 2, Avenue van Becelaere, B-1170 Brussels
Library of Congress Cataloging in Publication Data
Rey, William J. J., 1940- Robust statistical methods. (Lecture notes in mathematics; 690) Bibliography: p. Includes indexes. 1. Robust statistics. 2. Nonparametric statistics. 3. Estimation theory. I. Title. II. Series: Lecture notes in mathematics (Berlin); 690. QA3.L28 no. 690 [QA2763 510'.8s [519.53 78-24262
AMS Subject Classifications (1970): Primary: 62G35; Secondary: 62G25, 62J05
ISBN 3-540-09091-6 Springer-Verlag Berlin Heidelberg New York
ISBN 0-387-09091-6 Springer-Verlag New York Heidelberg Berlin
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine or similar means, and storage in data banks. Under § 54 of the German Copyright Law where copies are made for other than private use, a fee is payable to the publisher, the amount of the fee to be determined by agreement with the publisher. © by Springer-Verlag Berlin Heidelberg 1978. Printed in Germany. Printing and binding: Beltz Offsetdruck, Hemsbach/Bergstr. 2141/3140-543210
FOREWORD
During
the
processing These
last nine
of biomedical
problems
assumptions
had
were
from unknown
years,
in common
without
were
the
the
ordinary
had to be reliably
estimators
was
partly
presented
robust
estimators.
offset
of the
spurious
and variance estimators assess
in the
estimation.
is justified
their
without
properties,
resorting
The theoretical function.
Applied
and regression aspects.
The primary
estimates
data
in this
sample.
tools
even
for
Due
toward
the
has
concern
is bias
reserved
form which
sample
sizes
design
of
any significant
of an erroneous
emphasis
small
This
methods.
is preventing
analytical
model
or
reduction
to type M permits
(n=10
to
or 50),
arguments.
are mainly
derivations
analysis.
special
drawn
of the
results.
oriented
second
samples
sophisticated
scatter
of the
selection
The
by their
to involved
are
concern
due to the
The
and the
of robust text
quality
and frequently
nevertheless,
comparison
solved by application
The methods
to
lot;
statistical
by the author.
of the usual
Poor
nonnormal
estimated
needed to permit
in the
encountered
fact that most
usually
parameters
been
problems
been
any solid basis.
distributions,
nonstationary,
several
data have
are
the jackknife in the
attention
fields
and the
influence
of location
is devoted
estimation
to computational
TABLE OF CONTENTS
1. INTRODUCTION
1.1. History and main contributions
1.2. Why robust estimations ?
1.3. Summary
2. ON SAMPLING DISTRIBUTIONS
2.1. Scope of section
2.2. Metrics for probability distributions
2.3. Definition of robustness, breakdown-point
2.4. Estimators seen as functional of distributions
2.5. The influence function of Hampel
3. THE JACKKNIFE
3.1. Introduction
3.2. Jackknife theory
3.3. Case study
3.4. Comments
4. M-ESTIMATORS
4.1. Warning
4.2. MM-estimators
4.3. M-estimators in location and regression
4.3.1. Location and regression
4.3.2. Least powers
4.3.3. Can we expand in Taylor series ?
4.3.4. "Best" robust location estimator
4.4. MM-estimators in regression
4.4.1. Relaxation methods
4.4.2. Simultaneous solutions
4.4.3. Some proposals
4.4.4. Illustration
4.5. Solution of fixed point and nonlinear equations
5. OPEN AVENUES
5.1. Estimators seen as functional of distributions
5.2. Sample distribution of estimators
5.3. Adaptative estimators
5.4. Recursive estimators
5.5. Other views on robustness
6. REFERENCES
APPENDIX
Consistent estimator
Contaminated normal distribution
Distribution space
Influence function
Jackknife technique
Prokhorov metric
Robustness
von Mises derivative
AUTHOR INDEX
SUBJECT INDEX
1. INTRODUCTION
The term "robustness" does not lend itself to a clear-cut statistical definition. It seems to have been introduced by G.E.P. Box in 1953 to cover a rather vague concept described in the following way by Kendall and Buckland (1971). Their dictionary states:
"Robustness: Many test procedures involving probability levels depend for their exactitude on assumptions concerning the generating mechanism, e.g. that the parent variation is Normal (Gaussian). If the inferences are little affected by departure from those assumptions, e.g. if the significance points of a test vary little if the population departs quite substantially from the normality, the tests on the inferences are said to be robust. In a rather more general sense, a statistical procedure is described as robust if it is not very sensitive to departure from the assumptions on which it depends."
This
quotation
various with
can be
associates
procedures.
expressed
applicability robust
clearly
statistical
as f o l l o w s
of a g i v e n
against
some
robustness The
how
we d e s i g n
a statistical
procedure
to
safe
of p o s s i b l e
data
set
1.1.
History
Due
to t h e
increased
and main
is the
met
domain
equivalently, ?
Second,
robust
or,
uncertainty
of is it
how
in o t h e r
in the
available
contributions
appearance
For
mathematics
relates
:
field
the and
as
analytical
thirty
a mode it has
can
stands
some
be
line
seen
twenty
as
received
Mainly,
algorithms in the
possibly
been
has
years.
in r e c u r s i v e it
as w e l l
of r o b u s t n e s s
last
However,
instance,
of l o c a t i o n ,
Thucydides
the
during
developments.
studies.
of i n v o l v e d
facilities,
attention
in n o n  l i n e a r
estimate
to be
of the
questions
?
computational
the n e w
or,
assumptions
should
in s p i t e
applicability
large
procedure
f r o m the
terms,
remain
with
complementary
: first,
statistical
departure
two
as
four
have
an
progresses permitted
of m a n y
old
a robust centuries
ago.
"During the same winter (428 B.C.), the Plataeans ... and the Athenians who were besieged with them planned to leave the city and climb over the enemy's walls in the hope that they might be able to force a passage ... They made ladders equal in height to the enemy's wall, getting the measure by counting the layers of bricks at a point where the enemy's wall on the side facing Plataea happened not to have been whitewashed. Many counted the layers at the same time, and while some were sure to make a mistake, the majority were likely to hit the true count, especially since they counted time and again, and, besides, were at no great distance, and the part of the wall they wished to see was easily visible. The measurement of the ladders, then, they got at in this way, reckoning the measure from the thickness of the bricks."
Eisenhart
(1971)
sophisticated
has
also
compiled
procedures
to
background
can also
other
estimate
examples
a location
of m o r e
or less
parameter
for
a set
of
measurements. Historical of a f f a i r s that
the
around
adoption
to u n m a n a g e a b l e accepted; seen
that
was
done
well
as in the
this
history
papers
surveyed
by H a r t e r
in the
complementarily
important
works
clarified estimation framework
of t h e i r
of H u b e r We w i l l
relations
and
finite
where
Particularly, derivations.
the
they
theory provide
past
of von
which
size
in w h a t
exist
time,
in the
an
more
(1972-1973) later
join
best
were
important
procedures report
series
as
of
of
specifically
developments
is r o b u s t n e s s
to
are
and them.
we n o w
sketch
some
:
Mises
practical
dogma
A remarkable
regard
state
largely
the
rejection
and t h e o r e t i c a l
(1973).
recent
that
of t h e
it a p p e a r s
as s e c o n d
was
with
found
With
Lecture
of the
At
can be
1972 W a l d
insight
seen
of n o r m a l i t y
by
Hampel
was
soundness.
time
account
Briefly,
of v a r i o u s
(1974-1976).
investigations the
dogma
discarded.
motivations
some m o r e
 The
techniques
investigation
up to the p r e s e n t
by the
(1973).
in d i s a g r e e m e n t
and h a d to be
history,
To g a i n
The
observations
in the
be g a i n e d
in S t i g l e r
squares
approaches.
assessment
written
robustness
given
of l e a s t
is,
as e r r o n e o u s
effort
1900
(1947)
between
and P r o k h o r o v
asymptotic
situations.
of r o b u s t n e s s justifications
has
been
for the
(1956)
theory
of
They
provide
able
to grow.
analytical
have
a neat
 A mathematical by
Tukey
and
(1958),
estimate
the
distribution released
 A review
on h o w
of t h e  The
answers He
famous
to t h e
the
principles has
into
question
on h o w to
design
asymptotic sample)
have
Further, family
underlying we
here
augmented
the bias
regard
for the
is thus
rejection and
of
experimental
appearing
in the
places,
for o t h e r
the
normal
estimators
estimation
pioneer
pvalues.
work
squares
seems
to be
and c h a r a c t e r i z e s location,
normal.
In this
minimization estimations
of G e n t l e m a n
The m a x i m u m
procedures.
set of u n d e r l y i n g
of the
through
least
of
F (the
suitable
Mestimators
obtained
origin
: "A c o n v e n i e n t
is a c o n t a m i n a t e d
classical
at the
statistical
excerpt
some
for the
estimators the
has b e e n
(n + ~) w h e n
over
introduces
them
distribution
find the
their to
variance
ranges
he
among
p of r e s i d u a l s ;
referred
to t h e
robust
we
for a s y m p t o t i c a l l y
of the
framework,
and
assumptions.
theoretical
1964,
of d i s t r i b u t i o n s ;
of the
power
any
distribution
observations
in
of r o b u s t n e s s
the
without
pertaining
account
(1956)
to reduce
statistician
stimulated
of H u b e r ,
supremum
when
The
paper
distributions"
permits
distributions.
domains
robust
set.
questionable
distribution
a most
Quenouille
estimators
data
(1960)
to t a k e
sample
considers
measure
of the
by
technique,
of m o s t
the
frequently
by A n s c o m b e
researches tails
variance
some
proposed
jackknife
underlying
of
outliers
trick
the
(1965)
likelihood
of some (p = 2)
is
estimators
also
are M  e s t i m a t o r s . A thesis

introduces
the
an e s t i m a t o r estimation other
far
we r e f e r
Statistics,
worthy
Among
Berger 
(N.C.H.S.
the
(1971), (1976a,
Thus,
to d e p e n d
work,
the
sensitivity
it p e r m i t
of
to modify
on o u t l i e r s ,
or on any
observations. of t h e
interested with
list
that
question and,
not
to von Mises' exhibiting
the
period
prior to
r e a d e r to the U.S.
1970
annotated
National
Center
is bibliography
for H e a l t h
 1972).
the
to m e n t i o n
them,
Jaeckel
the
a tool
values.
literature
arrangement
To c o n c l u d e

be
as the
of the
related
as
in o r d e r
feature
under
(1968), curve
observation
procedures
concerned, prepared
Hampel
to the
specific  As
by
influence
of m a i n
many
of what
whether
theoretical
fundamental we
really
estimators
are
contributions,
questions estimate
remain
it m a y
open.
is a p p r o a c h e d
admissible
by
is a n a l y s e d
by
1976b).
Complementary
to the t h e o r e t i c a l
progress,
experience
has
been
gained
through
Princeton has
Monte
Study
displayed
Carlo
(Andrews
computer
et
in an o b v i o u s
al.
runs.
 1972)
w a y the
In this
specially
need
for
respect, deserves
robust
the mention;
it
statistical
procedures.
1.2. Why robust estimations ?
It seems robust
statistical
computing analysis
robust
with
methods In
computers
sets
then
as w e l l
contribute
the m a i n
lies
in the
short,
rather
softwares.
draws
to the
the
that
classical
elaboration
for making
overwhelming
frequently,
robustness
use
of
of our statistical
sets
are
results
produced
deficiencies
applied
by
of the
statistical
is e s s e n t i a l
statistics.
and to the
data
with
on p o s s i b l e of the
power
to p e r f o r m
Comparison
attention
it a p p e a r s
to
motivation
it is so easy
as on l i m i t a t i o n s
Thus,
complementary
that
that,
by u n s u i t a b l e
methods
procedures. is
author
facilities.
processed
data
to t h i s
?
One
and
validation
of
the
because
it
other must
statistical
conclusions. Many apply has
statisticians
when
been
the
see e n t r y
arrive
than at
normal
by t h e
the m o s t
illustrated
by T u k e y
(1960)
distributions.
deviation.
efficient
robustness
have
a long
normal that
been
layed
excerpt
:
scale
Some
paper down
large
fact m u s t
some
justifications (1973).
kept
study
 and
are r e q u i r e d
deviation
standard
This
by H a m p e l
be
this
observations
the
This
of
appendix
sizes
standard
that
study
extended
of the
thousand
0.95)
they
assumptions.
in his
sample
by the
eight
the m e t h o d s the
further
distribution"
(at l e v e l
estimator.
incentive
We h a v e
quite
of the
to g u a r a n t e e
In a r a t h e r
propose
satisfy
conclusion
mean
can be
strictly
the m e a s u r e m e n t
disposal
how poor
do not
"contaminated
at t h e
justify
do not k n o w
sets
brillantly
contaminated 
data
to
rather should
be
deviation
is
in m i n d .
to the use
of
Hereunder,
we
"What do those 'robust estimators' intend ? Should we give up our familiar and simple models, such as our beautiful analysis of variance, our powerful regression, or our high-reaching covariance matrices in multivariate statistics ? The answer is no; but it may well be advantageous to modify them slightly. In fact, good practical statisticians have done such modifications all along in an informal way; we now only start to have a theory about them. Some likely advantages of such a formalization are a better intuitive insight into these modifications, improved applied methods (even routine methods, for some aspects), and the chance of having pure mathematicians contribute something to the problem. Possible disadvantages may arise along the usual transformations of a theory when it is understood less and less by more and more people. Dogmatists, who insisted on the use of 'optimal' or 'admissible' procedures as long as mathematical theories contained no other criteria, may now be going to insist on 'optimal robust' or 'admissible robust' estimation or testing. Those who habitually try to lie with statistics, rather than seek for truth, may claim even more degrees of freedom for their wicked doings.

"Now what are the reasons for using robust procedures ? There are mainly two observations which combined give an answer.

Often in statistics one is using a parametric model implying a very limited set of probability distributions thought possible, such as the common model of normally distributed errors, or that of exponentially distributed observations. Classical (parametric) statistics derives results under the assumption that these models were strictly true. However, apart from some simple discrete models perhaps, such models are never exactly true. We may try to distinguish three main reasons for the deviations: (i) rounding and grouping and other 'local inaccuracies'; (ii) the occurrence of 'gross errors' such as blunders in measuring, wrong decimal points, errors in copying, inadvertent measurement of a member of a different population, or just 'something went wrong'; (iii) the model may have been conceived only as an approximation anyway, e.g. by virtue of the central limit theorem".
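Tukey (1960), cited in this section, studied how familiar estimators behave when the normal model is slightly contaminated. The sketch below is a hypothetical Monte Carlo illustration and is not taken from the text. It draws samples from a contaminated normal (1 - eps) N(0, 1) + eps N(0, k²) and compares the mean absolute deviation with the standard deviation as scale estimates. The comparison uses the ratio of their squared coefficients of variation. The values of eps, k, the sample size, the number of replications and the seed are all illustrative assumptions.

```python
import random
import statistics


def contaminated_normal(n, eps, k, rng):
    """Draw n values from (1 - eps) N(0, 1) + eps N(0, k^2)."""
    return [rng.gauss(0.0, k if rng.random() < eps else 1.0) for _ in range(n)]


def mean_deviation(xs):
    """Mean absolute deviation about the sample mean."""
    m = statistics.fmean(xs)
    return statistics.fmean(abs(x - m) for x in xs)


def squared_cv(values):
    """Squared coefficient of variation: a scale-free spread of an estimator."""
    return statistics.pvariance(values) / statistics.fmean(values) ** 2


def relative_efficiency(eps, k=3.0, n=100, reps=2000, seed=1):
    """Efficiency of the mean deviation relative to the standard deviation.

    Values above 1 mean the mean deviation is the more efficient
    scale estimate under this (illustrative) contamination.
    """
    rng = random.Random(seed)
    sds, mds = [], []
    for _ in range(reps):
        xs = contaminated_normal(n, eps, k, rng)
        sds.append(statistics.stdev(xs))
        mds.append(mean_deviation(xs))
    return squared_cv(sds) / squared_cv(mds)
```

Calling `relative_efficiency(0.0)` and `relative_efficiency(0.05)` compares the uncontaminated case with a lightly contaminated one. The squared coefficient of variation is used because the two estimators target different multiples of the scale, so only a scale-free measure of spread compares them fairly.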
1.3. Summary
It a p p e a r s fact
must
be
that
most
faced
difficulty.
When
and
concerning
at d i s p o s a l
as w e l l
factors thus
(e.g.,
parameter. to t h e i r
given
any
model
very at the
is the
robustness Section compare
level;
2 provides
that
difficulty
the to
its
but
the
some
and,
the
by r e f e r e n c e sizes.
estimators
we w i l l
under
encounter
parameter.
appears
may
However
what
be u n c l e a r .
at the
theory
of the
and p e r m i t
root
of all
h is the
on
leads
central
(or
part
and
covariance)
several
generalizations
in m a n y
respects.
5 reconsiders
a few
met,
on the
6 is a l i m i t e d
the
not
are
has
the
been
to form
the
possible
in
We m a y h o w e v e r
scarcely of r o b u s t n e s s
several
of this
side
so far,
through
jackknife are
results.
It is a d e t a i l l e d
Mestimators problems.
they
which were
(denoted
This
developments
questions
because
the
as w e l l
derivations
conjectural
text.
of p r e v i o u s
specific
in e s t i m a t o r s
presented
in r e g r e s s i o n
original
are,
permit
to v a l i d a t e
years.
bias
of s i m u l t a n e o u s
presents
questions
result
to a d e f i n i t i o n
of p o s s i b l e
substantiate
left
has
thirty
of v a l i d i t y
of our k n o w l e d g e ,
applications
but
last
which
function.
variance
of M  e s t i m a t o r s
The
is p r e s e n t e d
demonstration the
directly
reduction
to
derivations
value.
argument
conditions
influence
3 provides
emphasis
an
during
needed
original
Section
and it
asymptotic
a strict
attempts
To the b e s t
of t h e
therefrom,
various
estimate
with
sample
compare
few t h e o r e t i c a l
expansion;
This
as e s t i m a t i o n
Most
infinite to
to
analysed
is e r r o n e o u s
method.
Section
be
assumption
conceptual
as to the
previously
parameter
sample
issues
These
an e s t i m a t o r
will
for
of t h e
the
principles.
restrictive.
with
obtained
quality
function).
of the
it is p o s s i b l e
the
concerning
loss
estimators
some
the m o d e l
several
Section
optimal
this
conceptual
is due to
to
structure,
analysis
model,
uncertainty
some
basis;
it
consistent
spite
Section
to
a weak
are
obtained
as w e l l
partly
of c o m p a r i n g
asymptotic
an e s t i m a t o r
conjecture
all
have
required,
definition
values
which
when
of a T a y l o r  l i k e
of
of an
clear
often
estimators
methods
statistical
any p o s s i b i l i t y
asymptotic
is e s t i m a t e d This
the
are
as, p e r h a p s ,
In p r a c t i c e ,
Furthermore,
robust
methods
selection
prohibit
suppress
robust
can be r e l a t e d
those
uncertainty
involved
of the
and
have not
MM),
part is
been
timely.
open.
bibliography.
We h a v e
tried
to
favor
recent
papers
and
surveys
investigation. very
We
arbitrary.
in the
domains
are w e l l
aware
Furthermore,
we
which
are
of the feel
secondary
fact
that
limited
to
such
our m a i n selection
in r e a d a b i l i t y ,
time
is and
space. Throughout asymptotic finite methods
this
text,
properties
sample
we
have
and h a v e
estimators.
in a p p l i c a t i o n s .
This
not
devoted
placed option
the
much
accent
results
attention
to the
on the b e h a v i o u r
from
the
need
of
of r o b u s t
2. ON SAMPLING DISTRIBUTIONS
2.1. Scope of the section
Hereinafter,
we
relationships relative
intend
existing
to
the
to
sketch
between
relations
two
theories
distributions
between
relative
for
distributions
one and
of
to
the
them,
and
estimators
for
the
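other. To fix the ideas, consider the following standard set of equations:

$$m = \frac{\sum_i w_i x_i}{\sum_i w_i}, \qquad s^2 = \frac{\sum_i w_i (x_i - m)^2}{\sum_i w_i}, \qquad (i = 1, \dots, n),$$

$$\mu^* = \lim_{n \to \infty} m, \qquad \sigma^{*2} = \lim_{n \to \infty} s^2.$$

Under of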
appropriate
the
the
one
dimension
present
relative
conditions, sample
formulation
weight
wi,
it
be
(Xl,...,Xn)
each
which
can
used , as
to
estimate
well
as
the
its
location
scale
o.
observation
could
x. has b e e n a t t r i b u t e d I reflecting its i m p o r t a n c e or
be
In a its
accuracy. To
investigate
underlying such
the
that
The
xi,
interest
the
present the
This
will
case,
the
asymptotic
$s^2 = \sigma^{*2} + \cdots$
lie
in t h e
itself
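The weighted location and scale equations of this section translate directly into code. The sketch below assumes the normalisation by the sum of the weights shown in the equations, which reduces to the plain weighted sums when the weights already add up to one. The function name is hypothetical.

```python
def weighted_location_scale(xs, ws):
    """Weighted location m and scale s^2 (sketch of the section 2.1 equations).

    m  = sum(w_i x_i) / sum(w_i)
    s2 = sum(w_i (x_i - m)^2) / sum(w_i)
    """
    if len(xs) != len(ws) or not xs:
        raise ValueError("need one weight per observation")
    total = float(sum(ws))
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    m = sum(w * x for x, w in zip(xs, ws)) / total
    s2 = sum(w * (x - m) ** 2 for x, w in zip(xs, ws)) / total
    return m, s2
```

For example, `weighted_location_scale([1.0, 2.0, 3.0, 10.0], [1.0, 1.0, 1.0, 0.1])` down-weights a suspect observation. Multiplying all weights by a common factor leaves both m and s² unchanged.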
2.2. Metrics
for
indicated
to
easy
probability by
to
w i as are the
[ w i [(xiu)2
Munster
Baire
~ on t h e to
lead
sample with
well
to
distributions
"closeness"
distribution
respect
as t o
the
enlightened scale
 ~2I
distribution
compare
of
concern.
dependence
weight
and
a tool
distributions first
relations
lends
As
the
U
need
our
estimators;
which
distributions
also
in
relative
be
of
we
between
will
specifically, to
dependence
"closeness"
estimators.
more
the
observations,
by
parameter
This
will
of m
each
sample
also
be
our
n.
s or,
In t h e
in t e r m s
given
(x i  u)
and
observation
size
expansions is
 [ [ wiw j
analysis.
to
(xj
of
by
 u),
second
concern.
distributions (1974),
functions,
we nor
do the
not
have
to
application
restrict space
our to
Borel
sets;
although,
in p r a c t i c e ,
we
of f u n c t i o n s
and the
Euclidean
distribution
we m a k e
use
o v e r the the
ndimension
appendix
concerned one
at the
here
another;
be
reduced
only
assess
to the
with
the
are
the
in our
help
They
 They may
 They
context
simple
in
been
distributions
and,
thus,
been
they
proposed
are
revie~
to
generally
of K a n a l
following
not
(1974)
or
deficiencies
of c o n t i n u o u s
appropriate
to
compare
its u n d e r l y i n g and,
we n o w
latter
an e m p i r i c a l
parent.
thus,
are not
applicable
is,
need
accordingly conclude,
former
limit
respect,
the
I.~, in
a subset
The
achieved
over
Prokhorov
subsets
present
metric level
is  see
understanding
the
of
here
association
sample
space;
measure of the
of
the
of the
is d e r i v e d sample
considerations.
of p r o b a b i l i t y holds
an i l l u s t r a t i o n .
it
details
probability
distance
supremum
inequality
1956),
its
to
of a d i s c r e t e
of t h e
of the
of d i m e n s i o n a l i t y to
suitable
just m e n t i o n e d
one t h r o u g h
subset.
to the by
helpful
a continuous
measure
comparison
help
the
a very with
in the
points
the
triangular
ourselves
help
three
with
(sec.
involved
 To
the
is r e l a t i v e
incidentally
in this
rather
with
this
distance
Prokhorov
a continuous
measures
the
only
permits
independent
its d e f i n i t i o n accordingly, only
with
over
probability therefore,
by
consider
inequality, distribution
its p a r e n t . the
appendix
performed
distribution
from the
from
metric
of the
is t h e n
empirical
although
in the
Prokhorov
observation
We
later
the
closeness
triangular
an
proposed
distribution
comparison
the
knowledge
idea
metric"
The
empirical
and,
not ~ith
differing
of our
has
analytics,
above.
fact,
m a y be to
:
onedimension
compare
its m a i n
"Prokhorov
and
in
only
will
have
the
(1976),
of the
are
satisfy
to
distribution To the b e s t
each
but,
strictly
rarely
condition
its
be
Rn .
on
our
distances
distribution
be
given
 We w i l l
estimators
however
through
of Chen
a measure
distributions,
class
defined
are
distributions
between
definitions
Going
paper
for m o s t
function
space"
close
Baire
discontinuous
precisions
sampling
of d i s t r i b u t i o n s ,
of the
(discrete)
of t h e i r
the
of i n t e r e s t .
context.
provide
how
of c l o s e n e s s
with
fundamental Dirac
"distribution
of " d i s t a n c e "
closeness
observed 
are
at ease The
classical
of m e a s u r i n g
closeness
number
acceptable
on
assessment
distributions
A great
is the
feel R n.
R n  Complementary
section
by ways
the
of
space
only space
space In
measures
true. definition
and will
of i n f o r m a t i o n .
We
We
estimate
densities
Distribution
where
the
distance
of p r o b a b i l i t y
two
distributions
some
derived
from
f through
$$g(x) = (1-t)\, f(x) + t\, \delta(x - x_0),$$
centered
of the
on x0,
sample
space
integration
simplicity, that
variable
we a s s u m e
$\delta(x - x_0)$ is a Dirac function: $\delta(x - x_0) = 0$ if $x \neq x_0$ and $\int \delta(x - x_0)\, dx = 1$; moreover $0 < t < 1$.
In q u i t e
d(f,g) but
we m u s t
frequently
2.3.
warn
Definition
This Hampel
the
requires
of
paragraph (1971)
reader fairly
< t
0
,
observed
that
these
T rather
than
the
: A Taylorlike is v a l i d
with
two
definitions
respect
constraining They
of the
T defined
distributions
to the
are
f(x).
in terms
a functional both
R.
distribution
expansion
for
and g, when
Ixl
are
functional.
von
over
Mises
two
smooth
For the
yield
the
: derivatives
distributions
and d o m a i n real
f
limited
variable
t
satisfying $0 \le t \le 1$, we have

$$T[(1-t)\, f + t\, g] = T(f) + t \int \phi(x)\, g(x)\, dx + \frac{t^2}{2} \iint \phi(x, y)\, g(x)\, g(y)\, dx\, dy + \cdots$$

To gain
further
expansion Let
say we have
p-dimension
sample
function
f(x).
thus,
attribute
them
be noted
in term
at disposal
space
The
(p >
sample to
each
(w1,...,wn).
in this
of an e s t i m a t o r
us
we
insight
g(x)
theorem, of
illustrate
its a s y m p t o t i c
some
I) with
population observation The
we
sample
drawn
probability
possibly
a given
empirical
value.
(Xl,...,Xn)
continuous has
by the
been
positive
density
from
density
stratified weight;
functions
a
and,
let
is,
accordingly,

$$g(x) = \frac{1}{\sum_i w_i} \sum_i w_i\, \delta(x - x_i), \qquad (i = 1, \dots, n),$$

where $\delta(x - x_i)$ is the Dirac function concentrated on the observation $x_i$. We now introduce the parameter $\theta$ defined by the functional $T(f)$ and estimated by $T(g)$, i.e.

$$\theta = T(f), \qquad \hat\theta = T(g) = T(x_1, \dots, x_n;\ w_1, \dots, w_n);$$

we further assume that $T(g)$ is analytical with respect to the observations $x_i$ as well as to the weights $w_i$. With this restriction on T, the distributions f and g are smooth and domain
functional
limited. domain
Effectively,
limited
Distribution converges
per
g(x)
existence is
to T(g)
distribution
smooth
with
g*(x)
of
f(x)
e, the
with
to
smooth
parameter
respect
m tending
is
to
per
definition
t o be
estimated.
functional
infinity
T,
and
for T(g*)
in
hm(XXi)
= (II[ w i) [ w i
and lim hm(X) Any to the
sequence Dirac
theorem is
have
of c o n t i n u o u s
function
m a y be
applicability
inasmuch the
as
= ~(x),
functions
{h
considered.
is t h a t
it has b e e n
m + .
distribution
possible
to
(x)} u n i f o r m e l y c o n v e r g i n g m The last c o n d i t i o n f o r the g(x)
observe
be d o m a i n the
limited;
sample.
Thus
it
we
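expansion

$$\hat\theta = T(g) = T[(1-t)\, f + t\, g] \text{ for } t = 1, \qquad \hat\theta = \theta + \frac{1}{\sum_i w_i} \sum_i w_i\, \phi(x_i) + \frac{1}{2} \Big(\frac{1}{\sum_i w_i}\Big)^2 \sum_i \sum_j w_i w_j\, \phi(x_i, x_j) + \cdots$$

with the higher order terms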
sample
is r e p r e s e n t a t i v e
terms,
when
f(x)
and g(x)
It m a y
be
noticed
that
estimator
in
section
2.1
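The expansion of $\hat\theta$ can be checked numerically in one case where every term can be written down: the variance functional. The kernels below are derived here for this illustration and are not quoted from the text. For a parent with mean mu and variance sigma2, they are φ(x) = (x - μ)² - σ² and φ(x, y) = -2(x - μ)(y - μ). All higher-order terms vanish, so the weighted s² of section 2.1 should coincide with the second-order expansion for any weights and any choice of mu and sigma2. The function name is hypothetical.

```python
def von_mises_variance_check(xs, ws, mu, sigma2):
    """Return (s2, expansion) for the variance functional.

    s2 is the weighted variance about the weighted mean; expansion is
    sigma2 + (1/W) sum w_i phi(x_i) + (1/2)(1/W)^2 sum sum w_i w_j phi(x_i, x_j)
    with phi(x) = (x - mu)^2 - sigma2 and phi(x, y) = -2 (x - mu)(y - mu).
    """
    W = float(sum(ws))
    m = sum(w * x for x, w in zip(xs, ws)) / W
    s2 = sum(w * (x - m) ** 2 for x, w in zip(xs, ws)) / W
    first = sum(w * ((x - mu) ** 2 - sigma2) for x, w in zip(xs, ws)) / W
    second = 0.5 * sum(
        wi * wj * (-2.0) * (xi - mu) * (xj - mu)
        for xi, wi in zip(xs, ws)
        for xj, wj in zip(xs, ws)
    ) / W ** 2
    return s2, sigma2 + first + second
```

Algebraically, the second-order term equals -(m - μ)², which is exactly the correction that turns the spread about μ into the spread about m. This is why the two values agree to rounding error.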
2.5.
The
Under section, parameter
influence
the we
appropriate
8 by the
are
of
given the
for the
above
of
regularity
e is r e l a t e d
approximate
e +
f(x)
when or,
the in o t h e r
variance
structure.
Hampel
conditions
=
importance
distribution
expansion precisely
an e s t i m a t o r
simple
neglegible
parent
close.
the has
function
see that
having
of t h e
to
relation
$(1/\sum_i w_i) \sum_i w_i\, \phi(x_i)$
met
in
the
corresponding
the
above
on
where
the
on the
factor
result
function
~(xi)
8.
or,
rather,
samples.
robustness
one
(1975)
 as w e l l
indicative has
the
onedimension at
is
Hampel
at
of t h e function
"influence This
dimension as
named
is
curve"
a very
 see
several
influence
of the
~(x),
influence
for
he
powerful
Andrews
et
dimensions
the
al.
 for
was
tool
value
xi
considering to
only
appreciate
(1972)
and
instance,
the
Hampel
Rey
(1975a)
A
Basically The
it m e a s u r e s
influence
the
first
Let
us
von
first
the
function
Mises
sensitivity
of
can
defined,
recall
the
for
the
particular
have,
when
t
is
observation.
and
direction
easily
of
obtained,
a Dirac
as
function.
¢(x)
f(x)
dx = 0.
distribution
g(x) we
be
in t h e
each
tautology
: Then,
also
derivative
e to
small,
$$g(x) = (1-t)\, f(x) + t\, \delta(x - x_0),$$

that is, when f(x) and g(x) are close to one another,

$$T(g) \simeq T(f) + \int \phi(x)\, g(x)\, dx = T(f) + t \int \phi(x)\, \delta(x - x_0)\, dx = T(f) + t\, \phi(x_0).$$

Therefrom the
following

$$\phi(x_0) = \lim_{t \to 0} \frac{T(g) - T(f)}{t}.$$

Observe
that
applicability derivatives; distribution the
sample The
seen
than
in Rey
definition
the is
g(x)
which
of the
above
this
definition
Taylor-like
due
to the involves
influence
has
only
local
domain
in terms
particular
occurs
a larger
expansion
very
function
of
of t h e
selection
properties
von
Mises
of
at p o i n t
x 0 of
space.
concept
situations
the
dx
~ ( x  x 0) d x
of
where
influence several
(1975b).
function
parent
can
be
distributions
immediately are
generalized
concerned
as
can
to be
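The limit defining φ(x₀) can be approximated by taking a small but finite t. The sketch below makes a simplifying assumption for illustration: f is represented by a discrete, weighted set of points rather than the continuous parent used in the text. It mixes in a point mass at x₀ with weight t and compares two estimators. The influence of the mean grows without bound with x₀, while that of a (lower) weighted median stays bounded. All function names are hypothetical.

```python
def weighted_mean(points, weights):
    """T(f) = mean of a discrete weighted distribution."""
    return sum(w * x for x, w in zip(points, weights)) / sum(weights)


def weighted_median(points, weights):
    """Smallest point at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(points, weights))
    half = 0.5 * sum(weights)
    acc = 0.0
    for x, w in pairs:
        acc += w
        if acc >= half:
            return x
    return pairs[-1][0]


def contaminate(points, weights, x0, t):
    """Discrete analogue of g = (1 - t) f + t delta(x - x0)."""
    total = float(sum(weights))
    new_weights = [(1.0 - t) * w / total for w in weights] + [t]
    return list(points) + [x0], new_weights


def influence(T, points, weights, x0, t=1e-6):
    """Finite-t approximation of phi(x0) = lim_{t->0} [T(g) - T(f)] / t."""
    g_points, g_weights = contaminate(points, weights, x0, t)
    return (T(g_points, g_weights) - T(points, weights)) / t
```

With `pts = [1.0, 2.0, 3.0, 4.0, 5.0]` and equal weights, the mean's approximate influence at x₀ is close to x₀ - 3 however large x₀ is taken. The median's does not change between x₀ = 10³ and x₀ = 10⁶.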
3. THE JACKKNIFE
3.1. Introduction
The
so-called
reduce
possible
obtain
jackknife bias
estimation
of v a r i a n c e s .
power
to p r o d u c e
cheap
way,
cheap
in c o m p u t a t i o n
the
that
results
not v e r y
estimator is t o
say
are
defined
been
has
introduced
and t h e n
cheap
and
to be
impressively
circumstances,
but
the
by
cases;
are
to to
its
in a
necessarily
By all
in m o s t
results
extended
assessment
not
considered. good
Quenouille
interesting
estimator
in m e t h o d o l o g y
has
by
progressively
It is e s s e n t i a l l y
improvement
if t h i s
obtained
well
method
in e s t i m a t i o n
standards, but
either
in a few
poor
or
ridiculous. Unfortunately not
easy to
under
check As
And,
them
support in the
the
to d e v i s e
use of
relatively
few details
More
and w i l l
be
reserved
story
starts
possible
bias
Huber
involved
(1972)
more
specific
we d i s a g r e e
that
: "It
technique
to t h e
is h a r d l y under
work
be n e e d e d
might
and b e t t e r the
for
is
it is a p p l i c a b l e
respect
conditions
with
method
?
variance
above
which
the
to
estimate"
viewpoint
all e s t i m a t o r s
and w i t h
We n o w p r e s e n t a relatively
considerations
for the in
with
method
and
falling
2.4.
jackknife
by Miller.
The
more
jackknife
however,
regularity
jackknife
section
is the
to
properties,
sequel,
of t h e
seen,
of the
estimator
precise
useful
in the
frame
But w h a t with
these
than
indicated
down
be
of the
according
to w r i t e has
of a p p l i c a t i o n
It m a y
regularity
observations.
jackknife
scope
delineate.
suitable
worthwhile
the
1956,
of s t a t i s t i c a l
could
next
section.
when
Quenouille
estimators
the
light
obscure
proposes
through
what
technique
notation the m a i n
to r e d u c e appears
used ideas
the
to be
a "mathematical trick". He observes that an estimator $\hat\theta$ based on a sample of size $n$ can frequently have its bias expanded in terms of the sample size as follows:

$$E(\hat\theta - \theta) = a_1 n^{-1} + a_2 n^{-2} + \cdots$$
This based
form
is n o w
on the
observation.
compared
sample The
with
of size
new
the
nl,
expansion
corresponding
the is
same
sample
for the
estimator
without
the
ith
ei
18
E(; i when
the
observations
proposes
which
to
consider
have
=
are
a 1 (nl)
+
independent.
a2(n1)
2
+
To r e d u c e
...
the
bias,
Quenouille
bias
expansion,
except
that
the
first
order
term
It is
E(~ i Obviously, leading
1
the v a r i a t e s
a similar
cancels.
8)
the
term
efficiency
e) =  a2/[ n(n1)]
same m a t h e m a t i c a l
of the
in the
expansion,
estimation,
trick and
he
+
can be
so on.
suggests
...
applied
To a v o i d the
to the
a loss
definition
second
of
of an
average
estimator
8 = =
For
no c l e a r
has
~i ~i [ (n1)/n]
8
reasons
demonstrates appear
(l!n) n
fairly also

at t h a t good
named
A 8i .
[
time,
the
statistical
8i, the
jackknife
estimate
properties.
pseudoestimates,
Tukey
8 frequently
who w i l l
and 8i,
the
soon
jackknife
pseudovalues. With
regard
to b i a s
reduction
advantageous
to m o d i f y
observation,
say we d e l e t e
pseudoestimates constituting devoted
these
to the
observation
three
some
and the
serial
selection be m o r e
appear
be
of m o r e
I) and w o r k
There
of
(h >
(nnh)
are
scheme
each
of t h e
group
other.
out t h e ways
of
of
(h = I)
I) c o n s e c u t i v e third,
different
when
the
ways.
parameter
independent.
When
can be m o r e
reliable;
of h d e l e t e d The
one
has b e e n
deletion
possible
strictly
be
than
are m a n y
consideration
equivalent
second
such t h a t
independent
( h >
: first,
deletion
could possibly
of g = n/h p s e u d o  e s t i m a t e s ;
observations
of h m u s t
it
deletion
(nh).
special
in the
the
that
n by
schemes
t o be r a t h e r
correlation,
or less
and
second,
of h o b s e r v a t i o n s
moderate
of size
following
and c o m p u t a t i o n
schemes
seems
size
h observations
subsamples
three
it
sample
samples
at a time;
observations deletion
with
the
third
The h is
there
is
the
observations
scheme
appears
to
19
be v e r y
valuable
although
it has
experimentally Before mention the it
for theoretical limited
supported
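The leave-one-out construction of Quenouille, and its grouped variant with $g = n/h$ groups of $h$ consecutive observations, can be sketched as follows. This is a minimal illustration, not taken from the text. The names `jackknife` and `plugin_variance` are hypothetical, and the estimator is assumed to be any function of a list of observations. For the grouped case the sketch assumes the usual analogue $g\hat\theta - (g-1)\hat\theta_{(j)}$ of the pseudo-values, where $\hat\theta_{(j)}$ omits the $j$th group; it reduces to $n\hat\theta - (n-1)\hat\theta_i$ when $h = 1$.

```python
from typing import Callable, List, Sequence, Tuple

Estimator = Callable[[Sequence[float]], float]


def jackknife(x: Sequence[float], estimator: Estimator, h: int = 1) -> Tuple[float, List[float]]:
    """Grouped jackknife deleting h consecutive observations at a time.

    Returns (jackknife estimate, pseudo-values). With h = 1 the pseudo-values
    are n * theta_hat - (n - 1) * theta_hat_i, as in Quenouille's construction.
    """
    n = len(x)
    if h < 1 or n % h != 0:
        raise ValueError("h must be a positive divisor of n")
    g = n // h                       # number of groups, hence of pseudo-estimates
    if g < 2:
        raise ValueError("at least two groups are needed")
    theta_hat = estimator(x)         # estimate on the full sample
    pseudo_values = []
    for j in range(g):
        # pseudo-estimate on the subsample of size n - h without the j-th group
        subsample = list(x[: j * h]) + list(x[(j + 1) * h:])
        theta_j = estimator(subsample)
        pseudo_values.append(g * theta_hat - (g - 1) * theta_j)
    return sum(pseudo_values) / g, pseudo_values


def plugin_variance(x: Sequence[float]) -> float:
    """Variance with divisor n; its bias is -sigma^2 / n, a pure first-order term."""
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x) / n


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    estimate, _ = jackknife(data, plugin_variance)
    # plug-in value 1.25; jackknife value approximately 5/3 (divisor n - 1)
    print(plugin_variance(data), estimate)
```

The bias of the plug-in variance consists of the single term $a_1/n$ with $a_1 = -\sigma^2$, so the jackknife removes it entirely. This makes the plug-in variance a convenient check of the construction.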
The second significant progress takes place in 1958, when Tukey conjectures that the pseudo-values $\tilde\theta_i$ may be treated as approximately independent variates. Each of them is essentially representative of the incidence of one observation. From this conjecture he proposes the jackknife variance estimate

$$\mathrm{var}(\tilde\theta) = \frac{1}{n(n-1)}\sum_{i=1}^{n}(\tilde\theta_i - \tilde\theta)^2 .$$

This conjecture will be supported in the sequel, and it is the most significant feature of the jackknife, since it permits the evaluation of the precision of estimators on the basis of the sample alone. The subsequent history of the method is largely told in the book of Gray and Schucany (1972), which collects the generalizations of bias reduction. See also the review of Miller (1974a) and, for a discussion and for references, Sen (1977). The jackknife is now possibly the most practical of such methods.
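Tukey's variance estimate $\mathrm{var}(\tilde\theta) = \sum_i(\tilde\theta_i - \tilde\theta)^2/[n(n-1)]$ only needs the pseudo-values. The sketch below computes it for the leave-one-out case, using hypothetical function names and only the standard library. For the sample mean the pseudo-values are the observations themselves, so the estimate should reduce exactly to the classical $s^2/n$, which gives a convenient check.

```python
import statistics
from typing import Callable, List, Sequence, Tuple

Estimator = Callable[[Sequence[float]], float]


def pseudo_values(x: Sequence[float], estimator: Estimator) -> List[float]:
    """Leave-one-out pseudo-values n * theta_hat - (n - 1) * theta_hat_i."""
    n = len(x)
    theta_hat = estimator(x)
    return [n * theta_hat - (n - 1) * estimator(list(x[:i]) + list(x[i + 1:]))
            for i in range(n)]


def tukey_jackknife(x: Sequence[float], estimator: Estimator) -> Tuple[float, float]:
    """Return (jackknife estimate, Tukey's variance estimate)."""
    pv = pseudo_values(x, estimator)
    n = len(pv)
    theta_tilde = sum(pv) / n
    var = sum((p - theta_tilde) ** 2 for p in pv) / (n * (n - 1))
    return theta_tilde, var


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 10.0]
    est, var = tukey_jackknife(data, statistics.fmean)
    # the last two printed values agree: jackknife variance of the mean = s^2 / n
    print(est, var, statistics.variance(data) / len(data))
```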
Bias reduction as well as variance estimation can thus be achieved without detailed knowledge of the sample distribution or involved analysis of the estimation method: we only need a sample and an estimation definition.

The next section will demonstrate that, with large samples, these properties can also be established through analytical means. The idea is due to Dempster (1966). The deletion of one observation is replaced by the modification of its weight through a variable $\varepsilon$ ranging from zero (no deletion) to one (complete deletion). The deletion is thus, somewhat abusively, treated as a small perturbation, and the estimator is expanded in terms of $\varepsilon$, the first- and second-order terms eventually permitting analytical derivations. This approach is met again in Jaeckel (1972a), where $\varepsilon$ is kept infinitesimal, so that the differences of the jackknife become derivatives: this is the infinitesimal jackknife. Jaeckel's developments, scarcely published so far, are of great theoretical interest because, in terms of derivatives, one can manipulate formulas, whereas the finite deletion leads to analytical difficulties. But, due to the lack of a complete published presentation, we would like to mention three papers which could help the reader.
Mosteller (1971) gives an introduction to the jackknife at an elementary, high-school level, stressing the intuitive justification of the method. Bissell and Ferguson (1975) try to balance its advantages and drawbacks, and their paper is possibly the most important piece of warning to the unwary reader. The review of Miller (1974a) is sufficiently complete for the general reader. With different objectives, we mention only the leave-one-out method of Lachenbruch (1968) in discriminant analysis, the papers of Woodruff and Causey (1976) in the survey sampling context, and the work of Gray, Watkins and Schucany, related to the infinitesimal jackknife.
3.2. Jackknife theory

Hereinafter, we sketch the main points of the jackknife theory. The derivation being rather involved, we refer to the appendix of this section and to the original work for further details. We will be concerned solely with scalar estimators, although vector-valued estimators and functionals could be taken into account as well.

Assume we have at disposal a sample $(x_1,\dots,x_n)$ and an estimator $\hat\theta$ of a scalar parameter $\theta$, consistent, and based on the observations $x_i$ with non-negative weights $w_i$. Then, according to section 2.4, the estimator may be seen as a transformation of the observations and weights, and it can be written

$$\hat\theta = T(x_1,\dots,x_n;\,w_1,\dots,w_n).$$

It can frequently be expanded in the form

$$\hat\theta = \theta + \Big(\frac{1}{\sum_i w_i}\Big)\sum_i w_i\,\phi_i + \Big(\frac{1}{\sum_i w_i}\Big)^2\sum_i\sum_j w_i w_j\,\phi_{ij} + \dots,$$

where the coefficients $\phi_i$ and $\phi_{ij}$ are functions of the sample $(x_1,\dots,x_n)$, but independent of the weights.
This model will be assumed throughout. It constitutes a sufficient condition for the validity of the derivation, but we do not know whether it is strictly required, and we know it to fail in some situations. Moreover, only its first few terms will be considered.

In this section we limit ourselves to the modification of one weight at a time, which corresponds to the deletion of one observation at a time ($h = 1$, $g = n$) in the introduction. When the weight $w_i$ is modified by a factor $(1 + t)$, the pertaining pseudo-estimate is

$$\hat\theta_i = T[x_1,\dots,x_n;\,w_1,\dots,w_{i-1},(1+t)w_i,w_{i+1},\dots,w_n],$$

the deletion of the $i$th observation corresponding to $t = -1$. We now define the pseudo-values

$$\tilde\theta_i = \Big[\Big(t w_i + \sum_j w_j\Big)\hat\theta_i - \Big(\sum_j w_j\Big)\hat\theta\Big]\Big/ t$$

and the jackknife estimate

$$\tilde\theta = \Big(\frac{1}{\sum_i w_i}\Big)\sum_i\tilde\theta_i .$$

With unit weights and $t = -1$ they reduce to $\tilde\theta_i = n\hat\theta - (n-1)\hat\theta_i$ and $\tilde\theta = (1/n)\sum_i\tilde\theta_i$, as in the introduction.

From the above model, the jackknife estimate is derived to have approximately the form

$$\tilde\theta = \theta + \Big(\frac{1}{\sum_i w_i}\Big)\sum_i w_i\,\phi_i + \Big(\frac{1}{\sum_i w_i}\Big)^2\Big[\sum_{i\neq j} w_i w_j\,\phi_{ij} + (1+t)\sum_i w_i^2\,\phi_{ii}\Big] + \dots$$

The first-order term is that of $\hat\theta$, whereas in the second-order term the diagonal items are multiplied by the factor $(1 + t)$; with $t = -1$ they cancel. The non-diagonal items $\phi_{ij}$ ($i \neq j$) usually do not bring any bias under three conditions: the weights $w_i$ are independent of the observations, the $g$ groups have the same weights, and each group is independent of the others. Then, according to the conclusions of section 2.4, the ordinary jackknife ($t = -1$) reduces the bias for a class of estimators whose bias is of the order of the inverse sample size, whereas a small $t$ does not bring any bias reduction. The jackknife estimate $\tilde\theta$ has possibly the same asymptotic distribution as the original estimator $\hat\theta$, up to a translation resulting from the bias reduction.
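The weight-perturbation definitions translate directly into code. The estimator is assumed to be given as a function $T(x, w)$ of the observations and their weights, in the spirit of section 2.4. Each pseudo-estimate multiplies one weight by $(1 + t)$, and the pseudo-values are $[(t w_i + \sum_j w_j)\hat\theta_i - (\sum_j w_j)\hat\theta]/t$. With $t = -1$ the sketch reproduces the deletion jackknife. All function names are hypothetical.

```python
from typing import Callable, Sequence

WeightedEstimator = Callable[[Sequence[float], Sequence[float]], float]


def weighted_mean(x: Sequence[float], w: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def weighted_plugin_variance(x: Sequence[float], w: Sequence[float]) -> float:
    m = weighted_mean(x, w)
    return sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x)) / sum(w)


def perturbation_jackknife(x: Sequence[float], w: Sequence[float],
                           T: WeightedEstimator, t: float = -1.0) -> float:
    """Jackknife estimate (1 / sum w) * sum_i pseudo_i, where
    pseudo_i = [(t * w_i + sum w) * T_i - (sum w) * T] / t and T_i is T
    evaluated with the weight w_i replaced by (1 + t) * w_i."""
    if t == 0:
        raise ValueError("t must be non-zero (t -> 0 is the infinitesimal jackknife)")
    W = sum(w)
    theta_hat = T(x, w)
    total = 0.0
    for i in range(len(x)):
        w_pert = list(w)
        w_pert[i] = (1.0 + t) * w[i]       # modify one weight by the factor (1 + t)
        theta_i = T(x, w_pert)              # pseudo-estimate
        total += ((t * w[i] + W) * theta_i - W * theta_hat) / t
    return total / W


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    # unit weights and t = -1: ordinary deletion jackknife, approximately 5/3 here
    print(perturbation_jackknife(x, [1.0] * 4, weighted_plugin_variance))
```

For a linear estimator such as the weighted mean, the pseudo-values are exactly $w_i x_i$. The jackknife estimate therefore coincides with $\hat\theta$ for every non-zero $t$, consistent with a second-order term that vanishes.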
In order to obtain a variance estimate, we now consider the pseudo-values more attentively. They have the simple expansion

$$\tilde\theta_i = w_i\,\theta + w_i\,\phi_i + \dots,$$

which indicates that they are essentially proportional to the weight $w_i$ of the perturbed observation. Furthermore, the random components $w_i\phi_i$ do not depend on the other weights and, up to the omitted terms, the random components of the pseudo-values are independent. Thus the main random component of $\tilde\theta$ is precisely $(1/\sum_i w_i)\sum_i w_i\phi_i$, that is, the arithmetic mean of the main random components of the pseudo-values, which is also the main random component of $\hat\theta$. Asymptotically, the second and third central moments of both estimators are therefore

$$\mu_2(\tilde\theta) = \mu_2(\hat\theta) = \Big(\frac{1}{\sum_i w_i}\Big)^2\sum_i\mu_2(w_i\phi_i),\qquad \mu_3(\tilde\theta) = \mu_3(\hat\theta) = \Big(\frac{1}{\sum_i w_i}\Big)^3\sum_i\mu_3(w_i\phi_i).$$

The "not-quite large sample" situation is treated in the appendix. In agreement with Tukey (1958), we obtain, for unit weights and $t = -1$, the jackknife variance estimate

$$\mathrm{var}(\tilde\theta) = \frac{1}{n(n-1)}\sum_{i=1}^{n}(\tilde\theta_i - \tilde\theta)^2 ,$$

which, to the same order, is also an estimate of $\mathrm{var}(\hat\theta)$. This estimate is a good approximation for a class of estimators when the number $g$ of groups is fairly large. It is safe inasmuch as the correlation between the pseudo-values is low, that is, inasmuch as perturbing one observation is expected to leave the random components of the other pseudo-values essentially unaffected. This consideration is opposite to the viewpoint of Sharot (1976a), who considers only strict independency.
While computing the jackknife variance estimate, it is wise to check the variates $\tilde\theta_i - \tilde\theta$ in order to assess the possible presence of outliers, which would impair the variance estimate. When robust estimators are jackknifed, these variates are bounded, due to the limited (finite) incidence of any single observation on the estimator.

The complete deletion of an observation ($t = -1$) implies, for the pseudo-estimates $\hat\theta_i$, the use of the estimator on samples of size $n - 1$. The infinitesimal version of the jackknife, on the contrary, is obtained through a differential operator, by perturbing the weight of each observation with a small $t$. To the first order in $t$, the pseudo-estimates are

$$\hat\theta_i = \hat\theta + t\,w_i\,\frac{\partial}{\partial w_i}T(x_1,\dots,x_n;\,w_1,\dots,w_n) = \hat\theta + t\,w_i\,J_i .$$

The jackknife estimate $\tilde\theta = \hat\theta$ is then not bias-reduced, and the corresponding jackknife variance estimate is

$$\mathrm{var}(\tilde\theta) = \sum_i (w_i J_i)^2\Big/\Big[1 - \sum_i w_i^2\Big/\Big(\sum_i w_i\Big)^2\Big].$$

The partial derivatives $J_i$ are frequently easy to obtain through analytical derivations, whereas the difference operations of the finite jackknife may be difficult to treat analytically. In his 1972 memorandum, Jaeckel proposed this infinitesimal jackknife with the analytical partial derivatives of $T$, his treatment mimicking the ordinary jackknife as much as possible. In a parallel line, Sharot (1976b) compared several types of jackknife with respect to bias reduction and variance estimation, both with full deletion ($t = -1$) and in the infinitesimal version. His conclusion is noteworthy:
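The infinitesimal jackknife needs the partial derivatives $J_i = \partial T/\partial w_i$. Where no analytical derivative is at hand, this sketch substitutes a central finite difference; that substitution, the step size and the function names are assumptions of the example, not part of Jaeckel's proposal. For the weighted mean with unit weights, $J_i = (x_i - \bar x)/n$, and the variance estimate $\sum_i (w_i J_i)^2/[1 - \sum_i w_i^2/(\sum_i w_i)^2]$ should equal $s^2/n$.

```python
import statistics
from typing import Callable, List, Sequence

WeightedEstimator = Callable[[Sequence[float], Sequence[float]], float]


def weight_derivatives(T: WeightedEstimator, x: Sequence[float],
                       w: Sequence[float], step: float = 1e-6) -> List[float]:
    """Central finite-difference approximation of J_i = dT / dw_i."""
    J = []
    for i in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[i] += step
        w_minus[i] -= step
        J.append((T(x, w_plus) - T(x, w_minus)) / (2.0 * step))
    return J


def infinitesimal_jackknife_variance(T: WeightedEstimator, x: Sequence[float],
                                     w: Sequence[float]) -> float:
    """var = sum_i (w_i J_i)^2 / [1 - sum_i w_i^2 / (sum_i w_i)^2]."""
    J = weight_derivatives(T, x, w)
    W = sum(w)
    numerator = sum((wi * Ji) ** 2 for wi, Ji in zip(w, J))
    return numerator / (1.0 - sum(wi * wi for wi in w) / W ** 2)


def weighted_mean(x: Sequence[float], w: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 10.0]
    print(infinitesimal_jackknife_variance(weighted_mean, data, [1.0] * 5),
          statistics.variance(data) / len(data))   # both close to 2.5
```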
"On the basis of the Monte Carlo studies, [it] would appear that the desired gain in precision ... is often achieved. Alternative estimators designed for a particular application may, not surprisingly, do better still. The infinitesimal jackknife is seen to yield just such an estimator in many cases."
frankly
only
regret
appears
reference To
conclude
this
(1976)
transformation assumed
3.3. Case studies

The first application, already reported in Rey (1975), is met in survivorship and failure-time studies. Suppose we investigate a set of $n$ independent items, each of them running until failure or until it is stopped. We denote by $x_i$ ($1 \le i \le m \le n$) the durations observed for the $m$ failing items, and by $x_i$ ($m < i \le n$) the durations of the $n - m$ remaining items, still running when stopped. Then, under the assumption of a constant hazard rate, the mean life $\theta$ is classically estimated by the maximum likelihood estimator

$$\hat\theta = \frac{1}{m}\sum_{i=1}^{n} x_i ,$$

where the sum runs over all items, whereas the denominator is the number of failing ones. A question that we will not resolve here is whether this estimator is consistent for the real mean life; this is not the object of the present discourse.
The above estimator has been derived under the assumption of a constant hazard rate. We now relax this assumption: we no longer assume that the failure times are distributed according to a negative exponential distribution. We keep the estimator solely as a definition derived from the data set, free of preconceived ideas on the probability distribution. In this way we expect reliable results even when the data do not agree with the exponential assumption. This is a feature of robustness.

Before applying the jackknife, we note that the weighted form

$$\hat\theta = \sum_{i=1}^{n} w_i x_i \Big/ \sum_{j=1}^{m} w_j$$

fits in the frame of section 2.4. We are thus in a good position to proceed. With all the weights equal to one, we now develop the finite jackknife procedure. Modification of the weight $w_i$ by a factor $(1 + t)$ produces the pseudo-estimate

$$\hat\theta_i = \hat\theta + t\,\delta_i,\qquad\text{with}\quad \delta_i = \frac{x_i - \hat\theta}{m + t}\ \text{ if } i \le m,\qquad \delta_i = \frac{x_i}{m}\ \text{ if } i > m .$$

The corresponding pseudo-value is

$$\tilde\theta_i = [(n + t)\,\hat\theta_i - n\,\hat\theta]/t = \hat\theta + (n + t)\,\delta_i .$$
Therefrom, we obtain the jackknife estimate

$$\tilde\theta = \frac{1}{n}\sum_{i=1}^{n}\tilde\theta_i = \hat\theta + \frac{n + t}{n}\sum_{i=1}^{n}\delta_i = \hat\theta + t\,\frac{n + t}{n}\sum_{i>m}\frac{x_i}{m(m + t)} .$$

The bias-reduced estimate is obtained with $t = -1$, i.e.

$$\tilde\theta = \hat\theta - \frac{n - 1}{n\,m\,(m - 1)}\sum_{i>m} x_i .$$

We see that the bias correction is inversely proportional to the sample size $n$ when the ratio of stopped to failing items is given. If this ratio fluctuates with the sample size, then the parameter $\theta$ to be estimated also changes with $n$ and the bias correction is meaningless. What should then be estimated is a matter of opinion.
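The finite jackknife of the mean-life estimator can be checked numerically against the closed form $\tilde\theta = \hat\theta + t\,[(n+t)/n]\sum_{i>m} x_i/[m(m+t)]$. The sketch assumes the first $m$ durations are the failures and the rest are the stopped items; the names and data are hypothetical. The pseudo-estimates are recomputed from the weighted form rather than from $\delta_i$, so agreement is a genuine check of the algebra.

```python
from typing import Sequence


def mean_life(x: Sequence[float], w: Sequence[float], m: int) -> float:
    """Weighted estimator: sum_i w_i x_i over all items / sum_{j<=m} w_j over failures."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w[:m])


def jackknife_mean_life(x: Sequence[float], m: int, t: float = -1.0) -> float:
    """Average of the pseudo-values [(n + t) theta_i - n theta_hat] / t (unit weights).
    With t = -1 at least two failures (m >= 2) are needed."""
    n = len(x)
    w = [1.0] * n
    theta_hat = mean_life(x, w, m)
    pseudo = []
    for i in range(n):
        w_pert = list(w)
        w_pert[i] = 1.0 + t                  # weight w_i modified by the factor (1 + t)
        theta_i = mean_life(x, w_pert, m)    # pseudo-estimate
        pseudo.append(((n + t) * theta_i - n * theta_hat) / t)
    return sum(pseudo) / n


def closed_form(x: Sequence[float], m: int, t: float = -1.0) -> float:
    n = len(x)
    theta_hat = sum(x) / m
    return theta_hat + t * (n + t) / n * sum(x[m:]) / (m * (m + t))


if __name__ == "__main__":
    durations = [2.0, 3.0, 5.0, 4.0, 6.0]   # 3 failures, then 2 stopped items
    print(jackknife_mean_life(durations, 3), closed_form(durations, 3))  # both 16/3
```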
A jackknife variance estimate is readily derived for finite $t$-values, but it is simpler to use the infinitesimal definition $\hat\theta_i = \hat\theta + t\,w_i J_i$. In this case study, $J_i$ is the value of $\delta_i$ for $t = 0$, and the estimator follows immediately:

$$\mathrm{var}(\tilde\theta) = \frac{\sum_i J_i^2}{1 - n/n^2} = \frac{n}{m^2(n-1)}\Big[\sum_{j\le m}(x_j - \hat\theta)^2 + \sum_{i>m} x_i^2\Big].$$

The quality of the jackknife estimate and of the jackknife variance estimate has been confirmed by Monte Carlo experiments. This may be related to the fact that the $\delta_i$-values are scarcely dependent upon $t$, which is indicative of a rapid convergence of the expansion of section 2.4. It is hardly possible, however, to draw a more specific conclusion from this illustration.
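The infinitesimal variance estimate can likewise be obtained from numerical weight derivatives of the weighted mean-life estimator. It is then compared with the explicit expression $n[\sum_{j\le m}(x_j - \hat\theta)^2 + \sum_{i>m} x_i^2]/[m^2(n-1)]$. The finite-difference derivative, the step size and the names are assumptions of this sketch; the first $m$ durations are again taken to be the failures.

```python
from typing import Sequence


def mean_life(x: Sequence[float], w: Sequence[float], m: int) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w[:m])


def ij_variance_mean_life(x: Sequence[float], m: int, step: float = 1e-6) -> float:
    """sum_i J_i^2 / (1 - n / n^2), with J_i = d theta_hat / d w_i at unit weights."""
    n = len(x)
    w = [1.0] * n
    J = []
    for i in range(n):
        w_plus, w_minus = list(w), list(w)
        w_plus[i] += step
        w_minus[i] -= step
        J.append((mean_life(x, w_plus, m) - mean_life(x, w_minus, m)) / (2.0 * step))
    return sum(j * j for j in J) / (1.0 - n / n ** 2)


def explicit_variance(x: Sequence[float], m: int) -> float:
    n = len(x)
    theta_hat = sum(x) / m
    s = sum((xj - theta_hat) ** 2 for xj in x[:m]) + sum(xi ** 2 for xi in x[m:])
    return n * s / (m ** 2 * (n - 1))


if __name__ == "__main__":
    durations = [2.0, 3.0, 5.0, 4.0, 6.0]   # 3 failures, then 2 stopped items
    print(ij_variance_mean_life(durations, 3), explicit_variance(durations, 3))
```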
We would now like to draw attention to a simple illustration which is worthy of consideration: the estimation of a multidimensional mean and of its covariance matrix. Assume we want to estimate the mean vector of a sample $(\underline x_1,\dots,\underline x_n)$ of multidimensional observations, given that the observations $\underline x_i$ are possibly incompletely known. We denote by $\hat{\underline\theta}$ the estimator of the mean, its $j$th component being $\hat\theta_j$. In the absence of any further information, we want its components to be unbiased. The coefficients of presence $z_{ji}$ are, per definition,

$$z_{ji} = 1 \ \text{ if } x_{ji} \text{ is observed},\qquad z_{ji} = 0 \ \text{ if } x_{ji} \text{ is missing},$$

and the component estimators are given by

$$\hat\theta_j = \sum_i w_i z_{ji} x_{ji}\Big/\sum_i w_i z_{ji}.$$

Before applying the jackknife method, we note that the independency assumptions must be satisfied on the $z_{ji}$ as well as on the $\underline x_i$. Furthermore, the $\hat\theta_j$ are sufficiently differentiable with respect to the weights $w_i$, so that the conditions of section 2.4 are satisfied. Attempts to reduce the bias would only bring to light its absence; we will therefore only estimate the covariance matrix in the sequel. It results from the partial derivatives

$$J_{ji} = \partial\hat\theta_j/\partial w_i = z_{ji}(x_{ji} - \hat\theta_j)\Big/\sum_i w_i z_{ji}$$

and has the components

$$\{\mathrm{cov}(\hat{\underline\theta})\}_{kl} = \sum_i w_i^2 J_{ki} J_{li}\Big/\Big[1 - \sum_i w_i^2\Big/\Big(\sum_i w_i\Big)^2\Big].$$

We write it down for the uniformly weighted estimator:

$$\{\mathrm{cov}(\hat{\underline\theta})\}_{kl} = \frac{n}{n-1}\,\frac{\sum_i z_{ki} z_{li}(x_{ki} - \hat\theta_k)(x_{li} - \hat\theta_l)}{m_k\,m_l},\qquad m_k = \sum_i z_{ki}.$$

It concurs with the classical estimator when the observations are complete. Since it is a sum of terms $w_i^2\,\underline J_i\,\underline J_i'$, with $\underline J_i = (J_{1i}, J_{2i},\dots)'$, this matrix is always non-negative definite, a property which deserves to be stressed.

With regard to applications, they can be found in various domains; see, among others, the papers of Ferguson et al., Gray, Martin et al., Miller (1974b) and Rey (1974). However, it has been observed that the jackknife fails on correlated data and on order statistics, which is not surprising to us. It is informative to scan the review of Miller (1974a) and the subsequent papers.

3.4. Comments
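The uniformly weighted covariance estimate for a mean vector with missing components only involves the presence indicators $z_{ji}$. This sketch assumes missing components are coded as `None`, an encoding chosen here for illustration, and uses hypothetical function names. With complete data it should reproduce the classical covariance of the mean, the sample covariance divided by $n$.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]   # one multidimensional observation, None = missing


def component_means(data: Sequence[Row]) -> List[float]:
    """theta_j = sum_i z_ji x_ji / sum_i z_ji (unit weights); each component
    is assumed to be observed at least once."""
    p = len(data[0])
    means = []
    for j in range(p):
        present = [row[j] for row in data if row[j] is not None]
        means.append(sum(present) / len(present))
    return means


def jackknife_covariance(data: Sequence[Row]) -> List[List[float]]:
    """{cov}_kl = n/(n-1) * sum_i z_ki z_li (x_ki - th_k)(x_li - th_l) / (m_k m_l)."""
    n, p = len(data), len(data[0])
    theta = component_means(data)
    m = [sum(1 for row in data if row[j] is not None) for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for k in range(p):
        for l in range(p):
            s = 0.0
            for row in data:
                if row[k] is not None and row[l] is not None:   # z_ki * z_li = 1
                    s += (row[k] - theta[k]) * (row[l] - theta[l])
            cov[k][l] = n / (n - 1) * s / (m[k] * m[l])
    return cov


if __name__ == "__main__":
    incomplete = [(1.0, 2.0), (2.0, None), (3.0, 4.0), (None, 3.0), (5.0, 1.0)]
    for line in jackknife_covariance(incomplete):
        print(line)
```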
On tendency to normality. It is known that, under regularity conditions and with a large groupsize $h$ as well as a finite groupnumber $g$, the jackknife estimate tends to have a T-distribution with $g - 1$ degrees of freedom. To us, this appears an accidental and essentially mathematical consequence of the derivation. Expressing the estimator $\hat\theta$ in terms of a Taylor-like expansion and truncating it, the main random components are sums of many independent terms which tend to normality under the Lindeberg conditions. This happens frequently but, then, the original estimator also tends to normality, and such a state of affairs has rather been avoided in this analysis.

On independency. The jackknifed observations must be independent. When they are not, they must be assembled into independent groups before jackknifing; all other presentations of the jackknife also rely on independency.

On misclassification probabilities. Numerous applications are met in "discriminant analysis", where an observation is classified among several classes on the basis of a set of observations of known classes. The probability of misclassification is then estimated by jackknife (leave-one-out) methods. The application is justified inasmuch as the misclassification probability estimator depends smoothly on each observation, which is questionable for some classification techniques such as the "nearest neighbour" method.

On order statistics. Order statistics are not smooth functions of the observations, and there are many cases where the conditions of section 2.4 are not fulfilled; the jackknife is then not justified.

On time series. Time series can be jackknifed under stationarity conditions, with groups of relatively large size $h$ so that the groups are nearly independent; the conditions of section 2.4 must again be fulfilled.

On transformations. If the jackknife is applicable to $\hat\theta$, it is also applicable to $\hat\varphi = \varphi(\hat\theta)$ inasmuch as $\varphi(\cdot)$ is continuously differentiable in the vicinity of $\theta$. The transformation may be useful when the distribution of $\hat\varphi$ is more normal, or easier to manipulate, than the original distribution, for instance to set confidence domains.

On robustness. The pseudo-values are indicative of the incidences of each observation on the estimator $\hat\theta$ and happen to be a discrete version of Hampel's influence curve (Hampel, 1974). Their inspection may reveal an abusively high sensitivity of the estimator to some observations.
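The remark on robustness can be made concrete. For the sample mean the pseudo-values are the observations themselves, so an outlier appears unchanged among them, whereas for the median its pseudo-value stays bounded. The data below are invented for illustration, and `pseudo_values` is a hypothetical helper.

```python
import statistics
from typing import Callable, List, Sequence


def pseudo_values(x: Sequence[float], estimator: Callable[[Sequence[float]], float]) -> List[float]:
    """Leave-one-out pseudo-values n * theta_hat - (n - 1) * theta_hat_i."""
    n = len(x)
    theta_hat = estimator(x)
    return [n * theta_hat - (n - 1) * estimator(list(x[:i]) + list(x[i + 1:]))
            for i in range(n)]


data = [1.0, 2.0, 3.0, 4.0, 100.0]   # the last observation is an outlier

if __name__ == "__main__":
    print(pseudo_values(data, statistics.fmean))   # reproduces the data, outlier included
    print(pseudo_values(data, statistics.median))  # [1.0, 1.0, 3.0, 5.0, 5.0]
```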
4. M-ESTIMATORS

4.1. Warning

In these sections we adopt a well-known viewpoint which deserves a warning. For example, consider a variate $x$ distributed according to the law $f(x - \theta)$, where $\theta$ is a location parameter, and search for the "best" estimator of $\theta$ according to the classical statistical theory. Under certain aspects, this leads to an estimation without real meaning. Depending on $f$, we may end up with a median or with an arithmetic mean, both invariant with respect to translation, but they may differ from each other. The representativity of either as the real location of an unusual distribution is rather questionable. We will nevertheless be concerned with such estimators.

The M-estimators are generalizations of the usual maximum likelihood estimates. Classically, we investigate the value of the parameter maximizing the likelihood function, i.e., in obvious notation,

$$L = \prod_{i=1}^{n} f(x_i\mid\theta) = \max \ \text{ for } \theta,$$

or equivalently

$$-\ln L = -\sum_{i=1}^{n}\ln f(x_i\mid\theta) = \min \ \text{ for } \theta .$$

The estimators of type M are the solutions of

$$M = \sum_{i=1}^{n}\rho(x_i,\theta) = \min \ \text{ for } \theta,$$

where the function $\rho(\cdot)$ is rather arbitrary.
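As an illustration of an M-estimator of location, the sketch below takes $\rho$ to be Huber's function, quadratic in the centre and linear beyond a constant $k$. This is one choice among many, and an assumption here. The sketch solves $\sum_i \psi(x_i - \theta) = 0$ by bisection, which is valid because $\psi = \rho'$ is non-decreasing, so the sum is non-increasing in $\theta$. The constant $k = 1.5$ and the function names are illustrative. No scale estimate is involved, so the data are assumed to be on a unit scale.

```python
from typing import Sequence


def huber_psi(r: float, k: float) -> float:
    """psi = rho' for Huber's rho: identity on [-k, k], clipped to +-k outside."""
    return max(-k, min(k, r))


def m_location(x: Sequence[float], k: float = 1.5, iterations: int = 200) -> float:
    """Solve sum_i psi(x_i - theta) = 0 for theta by bisection."""
    lo, hi = min(x), max(x)          # the sum is >= 0 at lo and <= 0 at hi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if sum(huber_psi(xi - mid, k) for xi in x) > 0:
            lo = mid                 # the root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


data = [1.0, 2.0, 3.0, 4.0, 100.0]

if __name__ == "__main__":
    print(m_location(data))          # close to 3: the outlier's influence is bounded
    print(m_location(data, k=1e9))   # close to 22, the arithmetic mean (rho quadratic)
```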
Before proceeding, however, we note that the above structure assumes the independency of the observations and is rarely appropriate for processing correlated observations.

The estimators of type M have since been analysed in many respects, following the initial contribution of Huber (1964), which received great attention. Their properties are usually based on differentiability conditions, which do not hold in quite a few problems. We will scarcely consider the two other great classes of robust estimators, the L- and R-estimators, respectively obtained through linear combinations of order statistics and through rank tests. We refer to Huber (1972) and to Scholz (1974) for their properties and respective merits.

Developing the theory of M-estimators, we meet derivatives, operators, summations and integrals. As much as possible, we have tried to maintain a classical notation and to recall the meaning of each script with its definition. For ease of reading, we now present the main notation in use.

n : sample size.
p : dimensionality of the sample space $\mathcal X$, $\mathcal X \subset R^p$.
f(x) : probability density function of $x$, $x \in \mathcal X$.
$x_i$ : observations, $1 \le i \le n$.
$w_i$ : non-negative weight on $x_i$.
$\delta(x - x_i)$ : Dirac function concentrated on $x_i$.
$\theta_j$ : estimators, $1 \le j \le g$.
$M_j$ : functions simultaneously minimized, $1 \le j \le g$.
$\rho_j(x,\cdot)$ : contribution of observation $x$ to $M_j$.
$\psi_j(x,\cdot) = (\partial/\partial\theta_j)\,\rho_j(x,\cdot)$.
$\psi_{jk}(x,\cdot) = (\partial/\partial\theta_k)\,\psi_j(x,\cdot)$.
$\Omega_j(x)$ : Hampel's influence function of $\theta_j$, $x \in \mathcal X$.
t : perturbation parameter, $t \in R$, $0 \le t \le 1$.
$(\cdot)^*$ : perturbed entities, with a star superscript.
$(\cdot)'$ : transposed entities, with a prime superscript.
underlined entities : column vectors.
regression model : $\underline x = (u, \underline v')'$, $u = \underline v'\underline\theta_1 + \varepsilon$, with $m$ the dimensionality of $\underline v$ and $p = m + 1$; the location case corresponds to $m = 1$, $v = 1$, $\theta = \theta_1$.
$\psi(\varepsilon) = (\partial/\partial\varepsilon)\,\rho(\varepsilon)$, $\psi'(\varepsilon) = (\partial/\partial\varepsilon)\,\psi(\varepsilon)$.
$R_i$ : rigidity index of $x_i$.
V : asymptotic variance of $\theta$, see section 4.3.4.
s : scale of $\varepsilon$.
4.2. MM-estimators

Consider some distribution $f(x)$ on the sample space $\mathcal X \subset R^p$, unknown except for a sample $(x_1,\dots,x_n)$ on which a set of non-negative weights $(w_1,\dots,w_n)$ is defined. We then take into account the sample distribution

$$f(x) = \Big(\frac{1}{\sum_i w_i}\Big)\sum_i w_i\,\delta(x - x_i),$$

where $\delta(x - x_i)$ is a Dirac distribution concentrated on observation $x_i$. In this section we investigate the estimation of a set of parameters $(\theta_1,\dots,\theta_g)$ such that they minimize a set of functions $(M_1,\dots,M_g)$, that is

$$M_j = \min \ \text{ for } \theta_j,\qquad j = 1,\dots,g,\qquad\text{where}\quad M_j = \int \rho_j(x,\theta_1,\dots,\theta_g)\,f(x)\,dx .$$

With the sample distribution, this definition makes the parameters identical with their estimates.

The above framework is essentially a generalization of the classical M-estimator theory. It is motivated by the frequent interdependence of various estimators: a scale estimator, for example, often depends upon a location estimator. For instance, for $p = 1$, the variance $\sigma^2$ may be defined through the M-structure

$$\int [(x - \mu)^2 - \sigma^2]^2 f(x)\,dx = \min \ \text{ for } \sigma^2,$$

where $\mu$ is the mean, defined through another M-structure,

$$\int (x - \mu)^2 f(x)\,dx = \min \ \text{ for } \mu .$$

These estimators are in fact of type MM: multiple-M-estimators or, as we call them, MM-estimators.
In the sequel we assume:
- independence of the observations;
- the $M_j$ can be differentiated under the integral sign with respect to $\theta_1,\dots,\theta_g$ (existence of the derivatives, and conditions on $f(x)$ at the frontier of the sample space as needed).

Accordingly, the MM-estimators can as well be defined by the following set of $g$ equations:

$$\int \psi_j(x,\theta_1,\dots,\theta_g)\,f(x)\,dx = 0,\qquad j = 1,\dots,g,$$

where $\psi_j(x,\cdot) = (\partial/\partial\theta_j)\,\rho_j(x,\cdot)$. We will also need $\psi_{jk}(x,\cdot) = (\partial/\partial\theta_k)\,\psi_j(x,\cdot)$. For illustration, with an empirical distribution $f(x)$ the mean-variance example above yields $g = 2$ equations, that is (constant factors dropped),

$$\sum_i w_i\,[(x_i - \mu)^2 - \sigma^2] = 0,\qquad \sum_i w_i\,(x_i - \mu) = 0 .$$

Strictly speaking, the classical estimate $s^2$ computed with the divisor $\sum_i w_i$ is thus an MM-estimator.

We will now derive the influence functions of these estimators, then their asymptotic variances, and eventually give some adaptations to robustness. We will conclude this section with some domains of application.

In order to permit easy derivations, we will assume, without restricting generality, that the estimators $\underline\theta_j$ are column-vectors (possibly of dimension 1). The transposition will be denoted by a prime superscript.
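The set of $g$ equations $\sum_i w_i\,\psi_j(x_i,\theta_1,\dots,\theta_g) = 0$ can be solved generically. The sketch below uses Newton iterations with a forward-difference Jacobian, an algorithmic choice made for this example since the text prescribes none. It is applied to the $g = 2$ mean-variance illustration, whose solution should be the weighted mean and the weighted variance with divisor $\sum_i w_i$. All names are hypothetical.

```python
from typing import Callable, List, Sequence

Psi = Callable[[float, Sequence[float]], float]  # psi_j(x, (theta_1, ..., theta_g))


def empirical_equations(psis: Sequence[Psi], theta: Sequence[float],
                        x: Sequence[float], w: Sequence[float]) -> List[float]:
    """Left-hand sides (1 / sum w) * sum_i w_i psi_j(x_i, theta), j = 1..g."""
    W = sum(w)
    return [sum(wi * psi(xi, theta) for xi, wi in zip(x, w)) / W for psi in psis]


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    g = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(g):
        pivot = max(range(c, g), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, g):
            factor = M[r][c] / M[c][c]
            for k in range(c, g + 1):
                M[r][k] -= factor * M[c][k]
    sol = [0.0] * g
    for r in range(g - 1, -1, -1):
        sol[r] = (M[r][g] - sum(M[r][k] * sol[k] for k in range(r + 1, g))) / M[r][r]
    return sol


def solve_mm(psis: Sequence[Psi], theta0: Sequence[float], x: Sequence[float],
             w: Sequence[float], steps: int = 50, h: float = 1e-7) -> List[float]:
    """Newton iterations on the g estimating equations; the Jacobian entry
    d(equation j)/d theta_k is approximated by a forward difference."""
    theta = list(theta0)
    for _ in range(steps):
        F = empirical_equations(psis, theta, x, w)
        A = []
        for j in range(len(psis)):
            row = []
            for k in range(len(theta)):
                shifted = list(theta)
                shifted[k] += h
                row.append((empirical_equations([psis[j]], shifted, x, w)[0] - F[j]) / h)
            A.append(row)
        delta = solve_linear(A, [-f for f in F])
        theta = [th + d for th, d in zip(theta, delta)]
        if max(abs(d) for d in delta) < 1e-13:
            break
    return theta


# Mean-variance pair of the text, constant factors dropped:
# psi_1 = x - mu, psi_2 = (x - mu)^2 - sigma^2, theta = (mu, sigma^2)
psis = [lambda x, th: x - th[0],
        lambda x, th: (x - th[0]) ** 2 - th[1]]

if __name__ == "__main__":
    print(solve_mm(psis, [0.0, 1.0], [1.0, 2.0, 3.0, 4.0, 10.0], [1.0, 2.0, 1.0, 0.5, 1.0]))
```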
According to section 2.5, Hampel's influence function relative to $\underline\theta_j$ is given by

$$\underline\Omega_j(x_0) = \lim_{t\to 0}\,(\underline\theta_j^* - \underline\theta_j)/t,$$

where $\underline\theta_j^*$ is defined through the perturbed distribution

$$f^*(x) = (1 - t)\,f(x) + t\,\delta(x - x_0)$$

for a given coordinate set $x_0$. Observe that the script $\underline\Omega_j(x)$ describes a vector-valued function on the sample space itself. It strictly corresponds to the $IC_{\theta_j,F}(\cdot)$ of Hampel (1974), whose notation is becoming standard but is too cumbersome for our derivations.

The $g$ expressions $\int \psi_j(x,\theta_1^*,\dots,\theta_g^*)\,f^*(x)\,dx = 0$ can be expanded as follows. Taking into account that $\int\psi_j f\,dx = 0$ for the non-perturbed distribution, and retaining the first-order terms in the perturbation, we obtain, in contracted notation,

$$\sum_{k=1}^{g} A_{jk}\,(\underline\theta_k^* - \underline\theta_k) + t\,\psi_j(x_0) = 0,\qquad A_{jk} = \int \psi_{jk}(x)\,f(x)\,dx .$$

The coefficients $A_{jk}$ may be scalar or matricial, depending upon the nature of the estimators. The differences $(\underline\theta_j^* - \underline\theta_j)$ are thus the solution of a set of $g$ linear equations; in block form, $\underline\Omega(x_0) = -A^{-1}\underline\psi(x_0)$. When the elements $A_{jk}$ ($k \ne j$) are dominated by $A_{jj}$, that is when the estimators are nearly independent from one another, a solution is obtained under a norm condition on the ratios $A_{jj}^{-1}A_{jk}$ ($k \ne j$).
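For the mean-variance pair with constant factors dropped ($\psi_1 = x - \mu$, $\psi_2 = (x - \mu)^2 - \sigma^2$), one finds $A = -I$. The linear system then gives $\Omega(x_0) = -A^{-1}\psi(x_0) = (x_0 - \mu,\ (x_0 - \mu)^2 - \sigma^2)$. The sketch checks this against the defining limit by solving for $\theta^*$ under $(1 - t)f + t\,\delta(x - x_0)$ with a small $t$. The data, $x_0$, $t$ and the names are illustrative assumptions.

```python
from typing import Sequence, Tuple


def mm_mean_variance(x: Sequence[float], w: Sequence[float]) -> Tuple[float, float]:
    """Solution of sum w (x - mu) = 0 and sum w [(x - mu)^2 - s2] = 0."""
    W = sum(w)
    mu = sum(wi * xi for wi, xi in zip(w, x)) / W
    s2 = sum(wi * (xi - mu) ** 2 for wi, xi in zip(w, x)) / W
    return mu, s2


def numerical_influence(x: Sequence[float], x0: float, t: float = 1e-6) -> Tuple[float, float]:
    """(theta* - theta) / t for f* = (1 - t) f + t delta(x - x0), f empirical."""
    n = len(x)
    mu, s2 = mm_mean_variance(x, [1.0 / n] * n)
    # perturbed distribution: mass (1 - t)/n on each x_i and mass t on x0
    mu_p, s2_p = mm_mean_variance(list(x) + [x0], [(1.0 - t) / n] * n + [t])
    return (mu_p - mu) / t, (s2_p - s2) / t


def analytic_influence(x: Sequence[float], x0: float) -> Tuple[float, float]:
    """Omega(x0) = -A^{-1} psi(x0) with A = -I for this pair."""
    n = len(x)
    mu, s2 = mm_mean_variance(x, [1.0 / n] * n)
    return x0 - mu, (x0 - mu) ** 2 - s2


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0]
    print(numerical_influence(sample, 10.0), analytic_influence(sample, 10.0))
```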
v' 
e I"
distribution
 v'
t)
 v'
v_~(u
~(u
first
integral
Let
the
solution
e I)
f*Cx)
9_ I ) 
 v_' A t ) term,
consider
I
e_1 a c c o r d i n g
dx
the
be
0 ~ t ~
to
= 0.
to
obtain
have
the
~(u
 v'
9_i )]
g(x)
dx =
0
in fact, to
and
perturbation
in o r d e r
immediately
is equal
g(~)
+ t g(~),
equation
We
of the
and the
=
f(~)
and g(x).
+ t f v
u
perturbed
integral
/
if
on ~I'
= (I  t)
f v
We
some
perturbation
f*(~)
and
by
+
0,
9_I as a
implicit
f(x)
is n o n  z e r o
equation
dx
in a v e r y
limited
48
 2 [ Z
@(u
 Z'
~i ) f ( z )
dz
with =
"between"
~
= Z' ~I and
is thus
implicit far. later are
continuous
equation
only
assume
discuss
that
cancel
for
know
the
the
What the
the are
=
that
would
g(~),
Then But
set
The
sufficiently
we have both
obtained
~,
when
are
~;_ r e m a i n s
dependent function ~I'
interested
as we
(I
(t2/2)
upon
 8")[
~2'
"'"
vicinity
regularity
comes
I, v =

its
2 f
e.
f(y)
ol
I.
out
are
to
of ~I"
conditions
particularly
Then
dy
fCx) dx =
relation
(I  t)
did p r e v i o u s l y
 t)
the m e d i a n
easily
to
f(x)
o,
with
e
+ t g(x)]
in dx = O.
obtain
+ t [+~
~(x
*)
g(x)
dx
= 0
 e)
g(x)
dx
= 0
 e
approximately,
(I
and
 t)
2
(e
 e
Strict needed
e of the
e or,
can
theorem.
m =
by
sum
...
+
in the
the
terms
to t and we
satisfies
In the general case, we develop $\theta_1^*$ in a Taylor-like expansion with respect to t,

$$\theta_1^* = \theta_1 + a_1 t + a_2 \frac{t^2}{2} + \cdots,$$

where the coefficients $a_1, a_2, \ldots$ are vectorial when $\theta_1$ is. It is enough to substitute this expansion in the implicit equation and to cancel the terms of each order in t. The first-order term is the one obtained above, and the second-order term depends upon both f(x) and g(x). The validity of this demonstration requires strict regularity conditions. $\theta_1^*$ must remain a continuous function of t, with $\theta_1^* = \theta_1$ for t = 0. Moreover, $\psi$, f(x) and g(x) must be sufficiently regular in the vicinity of $\theta_1$.

Whether the above expansion converges for arbitrarily large t is an open question. We conjecture that it does when $\psi$ is a differentiable function and g(x) varies smoothly. On the other hand, the Taylor-like expansion may be a very poor approximation when g(x) is a discrete distribution, or when its support contains critical points of $\psi$. In that case the continuity of the second-order term is not ensured. The following remarks are therefore only a rough presentation.
Suppose now that $\psi$ is a discontinuous function. Then the integral

$$\int \psi(u - v'\theta_1)\, g(x)\, dx,$$

seen as a function of $\theta_1$, is discontinuous at each $\theta_1$ such that $u = v'\theta_1$ for some point $x = (u, v')'$ where g(x) has a positive mass. When g(x) is continuous, these discontinuities may be smoothed by the integral. When g(x) is discrete, they are intrinsically present, and they provide a partition of the parameter space by a set of hyperplanes. Consider now the variation of the parameter t, from t = 0 to t = 1. As t varies, $\theta_1^*$ may move through several connex parts of the partition, and thus have a fairly discontinuous variation. Such a variation can only be expressed by a slowly converging expansion. On the contrary, $\theta_1^*$ may remain in the neighbourhood of $\theta_1$, in the same part of the partition. The expansion then converges rapidly and can be truncated at the first-order term.

4.3.4. "Best" robust location M-estimator

We now design the location M-estimator of minimum asymptotic variance when the distribution f(u) is known. The derivation is made first without, then with, constraints of robustness (bounded influence curve). We assume that f(u) is given for $u \in [u_-, u_+]$ and is everywhere continuous and non-zero in this interval. We also assume that it vanishes on the frontier, $f(u_-) = f(u_+) = 0$. The M-estimator is defined by the function $\rho(\varepsilon)$, or $\psi(\varepsilon)$, with $\varepsilon = u - \theta$.
Only location estimation is involved here, although the same type of argument could be developed for regression. The asymptotic variance of the location M-estimator is

$$V = \frac{\int [\psi(u - \theta)]^2\, f(u)\, du}{\Big[\int \frac{\partial}{\partial u} \psi(u - \theta)\, f(u)\, du\Big]^2} .$$

We first note that V is unchanged when $\psi(\varepsilon)$ is multiplied by a constant factor. The function $\psi(\varepsilon)$ can therefore only be selected up to such a factor, and we retain the normalisation in which the denominator is unity. The minimization of V has to be performed under two kinds of constraints: convexity, so that $\theta$ is uniquely defined, and robustness. We will first omit the constraints of robustness. This derivation is close in many respects to Huber's (1964) minimax approach, although Huber investigates a family of distributions rather than a single known one.

To ease the analytical manipulations, we will use throughout this section a vectorial-matricial notation resulting from a discretisation of the sample space. Functions become vectors, whereas operators become matrices. Let the sample space be discrete, defined by the set of regularly spaced values

$$\{\ldots, u_{i-1}, u_i, u_{i+1}, \ldots\}, \qquad u_j = u_0 + j\eta,$$

with the step $\eta$ infinitesimal. Any function y(u) is then represented by the vector $y = (\ldots, y(u_{i-1}), y(u_i), y(u_{i+1}), \ldots)'$, of possibly infinite dimension.

Remark. This representation is not essential; other representations can be preferred, the present one being only an intermediate step.

To the differentiation $\partial/\partial u$ of a sufficiently differentiable function y(u) we associate the matrix D, so that Dy represents $\partial y / \partial u$. D has elements

$$d_{ij} = \begin{cases} 1/(2\eta), & \text{if } i = j - 1,\\ -1/(2\eta), & \text{if } i = j + 1,\\ 0, & \text{otherwise,} \end{cases}$$

and it is antisymmetric, $D' = -D$. When $\eta$ does not vanish, the similarity with differentiation is only approximate. We also need an integration with respect to the weight f(u). To the integral $\int y(u)\, z(u)\, f(u)\, du$ we associate $y'Fz = z'Fy$, where F is the diagonal matrix $\operatorname{diag}(\ldots, f(u_i), \ldots)$ and the infinitesimal step $\eta$ is absorbed in the integration. Finally, we associate the vector $\mathbf{1}$ with the unit constant function, so that $\mathbf{1}'F\mathbf{1} = \int f(u)\, du = 1$. For the moment, the difficulties at the frontier will be taken care of by ordinary calculus.
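A small numerical sketch may help to fix this notation. It relies on assumptions not made in the text:

- a finite grid on $[-8, 8]$ with $\eta = 0.02$, whereas the text takes $\eta$ infinitesimal and the range possibly infinite;
- the standard normal density as the weight f(u);
- $\eta$ folded into F, so that $\mathbf{1}'F\mathbf{1}$ approximates $\int f(u)\, du$.

The code checks the antisymmetry of D, the symmetry of $y'Fz$, and that Dy approximates the derivative away from the two end rows.

```python
import numpy as np

eta = 0.02                                       # grid step (assumed value)
u = np.arange(-8.0, 8.0 + eta / 2, eta)          # u_j = u_0 + j * eta, truncated range
n = u.size

# Central differences: (D y)_i = (y_{i+1} - y_{i-1}) / (2 eta); D is antisymmetric.
D = (np.eye(n, k=1) - np.eye(n, k=-1)) / (2.0 * eta)

f = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi) # example weight: N(0, 1) density
F = np.diag(eta * f)                             # step eta absorbed into F
one = np.ones(n)                                 # the unit constant function

y, z = np.sin(u), u ** 2
print(np.allclose(D.T, -D))                      # D' = -D
print(one @ F @ one)                             # ~ integral of f(u) du = 1
print(y @ F @ z, z @ F @ y)                      # y'Fz = z'Fy
print(np.max(np.abs((D @ y - np.cos(u))[1:-1]))) # interior rows: D y ~ dy/du
```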
Let y be the vector associated with $\psi(u - \theta)$. We minimize the asymptotic variance

$$V = y'Fy$$

under the equality constraints

$$c_1 = 2\, \mathbf{1}'Fy = 0, \qquad c_2 = 2\, (\mathbf{1}'FDy - 1) = 0 .$$

Constraint $c_1$ indicates that $\theta$ is an M-estimator, $\int \psi(u - \theta)\, f(u)\, du = 0$. Constraint $c_2$ stands for the unit denominator of the asymptotic variance. We solve this minimization by the method of Lagrange multipliers, denoted $\lambda_1$ and $\lambda_2$. Then $V + \lambda_1 c_1 + \lambda_2 c_2$ is also minimum. Differentiating with respect to y gives

$$Fy + \lambda_1 F\mathbf{1} + \lambda_2 D'F\mathbf{1} = 0 \qquad \text{and} \qquad y = -\lambda_1 \mathbf{1} - \lambda_2 F^{-1} D'F\mathbf{1} .$$

The last transformation can only be performed if the distribution does not vanish in its domain of definition. Introducing y in $c_1$ and $c_2$ provides the Lagrange multipliers. We use the antisymmetry of matrix D and the vanishing of the distribution on the frontier, i.e. $f(u_-) = f(u_+) = 0$, and obtain

$$\lambda_1 = \lambda_2\, \mathbf{1}'DF\mathbf{1}, \ \text{given } \mathbf{1}'F\mathbf{1} = 1, \qquad \lambda_2 = \frac{1}{\mathbf{1}'FDF^{-1}DF\mathbf{1}} .$$

The former cancels, since

$$\mathbf{1}'DF\mathbf{1} = \int \frac{\partial}{\partial u} f(u)\, du = f(u_+) - f(u_-) = 0 .$$

We collect the results and obtain

$$\psi(u - \theta) = \lambda_2\, \frac{1}{f(u)}\, \frac{\partial}{\partial u} f(u) = \lambda_2\, \frac{\partial}{\partial u} \ln f(u),$$

where, since $\mathbf{1}'FD = -(DF\mathbf{1})'$,

$$\lambda_2 = \frac{1}{\int \big[\frac{\partial^2}{\partial u^2} \ln f(u)\big]\, f(u)\, du} = -\frac{1}{\int \big[\frac{\partial}{\partial u} \ln f(u)\big]^2 f(u)\, du} .$$

The asymptotic variance of this M-estimator is given by $V = y'Fy = -\lambda_2$. Given f(u), the M-estimator of minimum asymptotic variance is thus the maximum likelihood estimator of the translation parameter (see Stein, 1955), provided the distribution vanishes on the boundary of the sample space.
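The closed-form result can be reproduced in the discretised notation. This sketch is illustrative only; the following are assumptions, not part of the text:

- the truncated grids;
- the standard normal and standard logistic test densities;
- the helper name `best_location_psi`.

The code forms $y = \lambda_2 F^{-1} D F \mathbf{1}$ with $\lambda_2 = 1/(\mathbf{1}'FDF^{-1}DF\mathbf{1})$ and checks the constraints $c_1$ and $c_2$. It then compares $V = y'Fy = -\lambda_2$ with the inverse Fisher information, which is 1 for the standard normal and 3 for the standard logistic.

```python
import numpy as np

def best_location_psi(u, f):
    """Discretised minimum-variance solution y = lambda_2 F^{-1} D F 1 (no robustness constraint).

    u: regularly spaced grid; f: density values on it, assumed > 0 inside and ~0 at the ends."""
    eta = u[1] - u[0]
    n = u.size
    D = (np.eye(n, k=1) - np.eye(n, k=-1)) / (2.0 * eta)   # antisymmetric difference matrix
    Fd = eta * f                                           # diagonal of F (step absorbed)
    g = (D @ Fd) / Fd                                      # F^{-1} D F 1 ~ d/du ln f(u)
    lam2 = 1.0 / (Fd @ (D @ g))                            # 1 / (1' F D F^{-1} D F 1)
    y = lam2 * g                                           # psi on the grid
    V = y @ (Fd * y)                                       # V = y' F y
    c1 = Fd @ y                                            # 1' F y    (should be 0)
    c2 = Fd @ (D @ y)                                      # 1' F D y  (should be 1)
    return y, lam2, V, c1, c2

u = np.arange(-8.0, 8.0 + 0.01, 0.02)
f = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
y, lam2, V, c1, c2 = best_location_psi(u, f)
print(V, -lam2, c1, c2)        # V = -lambda_2 close to 1; psi(u) close to u (the mean)

u2 = np.arange(-20.0, 20.0 + 0.02, 0.04)
f2 = np.exp(-u2) / (1.0 + np.exp(-u2)) ** 2
V2 = best_location_psi(u2, f2)[2]
print(V2)                      # close to 3 for the logistic distribution
```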
These relations are classical, but we illustrate the findings above by the maximum likelihood estimation of a location parameter. Let us estimate the location parameter a of a sample $u_1, \ldots, u_n$ drawn from the gamma distribution

$$f(u) = \frac{1}{\Gamma(\nu + 1)}\, (u - a)^{\nu} \exp[-(u - a)], \qquad u > a .$$

Note that f(u) is strictly positive for u > a. In the present case, we have to impose the restriction $\nu > 1$ on the parameter $\nu$; then f(u) vanishes, together with its derivative, on the boundary u = a. The maximum likelihood estimator of the location parameter is the solution of

$$\sum_i w_i \Big[\frac{\nu}{u_i - \theta} - 1\Big] = 0 .$$

Given an approximation $\theta_0$ of $\theta$, a Newton step yields the explicit result

$$\theta = \theta_0 + \frac{\sum_i w_i \big[(1/\nu) - 1/(u_i - \theta_0)\big]}{\sum_i w_i / (u_i - \theta_0)^2} .$$

This estimator is not robust. The weight of an observation is inversely proportional to its distance to the estimate, so an observation close to the boundary has an unbounded influence. Its asymptotic variance is minimum and equal to $V = \nu - 1$. Note that V vanishes for $\nu = 1$, which is indicative of difficulties. The analytical conditions of section 2.4 are then no longer satisfied. The higher order terms of the asymptotic expansion

$$n\, \operatorname{var}(\theta) = V + O(n^{-1})$$

dominate, and $n\, \operatorname{var}(\theta)$ tends to V too slowly.
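The Newton iteration can be transcribed directly, under the following assumptions, which are not in the text:

- unit weights by default;
- a starting value $\theta_0$ placed just below $\min(u_i)$, chosen so that the score $h(\theta) = \sum_i w_i[1/(u_i - \theta) - 1/\nu]$ is positive there;
- a simulated sample with a = 5 and $\nu = 3$.

Since h is increasing and convex for $\theta < \min(u_i)$, Newton steps started to the right of the root decrease monotonically onto it.

```python
import random

def gamma_location_ml(u, nu, w=None, tol=1e-12, max_iter=200):
    """Weighted ML estimate of a for f(u) = (u - a)^nu exp[-(u - a)] / Gamma(nu + 1), u > a.

    Repeats theta <- theta0 + sum w[(1/nu) - 1/(u - theta0)] / sum w/(u - theta0)^2."""
    if w is None:
        w = [1.0] * len(u)
    i_min = min(range(len(u)), key=lambda i: u[i])
    # Close enough to min(u) that h(theta) > 0, i.e. to the right of the root.
    theta = u[i_min] - 0.5 * nu * w[i_min] / sum(w)
    for _ in range(max_iter):
        num = sum(wi * (1.0 / nu - 1.0 / (ui - theta)) for ui, wi in zip(u, w))
        den = sum(wi / (ui - theta) ** 2 for ui, wi in zip(u, w))
        step = num / den
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = random.Random(1)
a, nu = 5.0, 3.0
sample = [a + rng.gammavariate(nu + 1.0, 1.0) for _ in range(2000)]  # shape nu + 1, scale 1
est = gamma_location_ml(sample, nu)
print(est)   # close to a = 5; asymptotic variance (nu - 1)/n
```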
We now proceed to the derivation of the robust M-estimator of minimum asymptotic variance, by imposing robustness constraints on the M-estimator. We search for y to minimize $V = y'Fy$ under the equality constraints

$$c_1 = 2\, \mathbf{1}'Fy = 0, \qquad c_2 = 2\, (\mathbf{1}'FDy - 1) = 0,$$

and, in addition, under inequality constraints. We further limit $\rho(\cdot)$ to be convex, that is, we restrict $\psi$ to be non-decreasing by

$$c_{3k} = -2\, e_k' D y \le 0, \qquad \text{for any basis vector } e_k = (0, \ldots, 0, 1, 0, \ldots, 0)' .$$

A second group of inequality constraints sets a superior bound to the influence curve. The bound is stated relative to the mean quadratic value:

$$c_{4l} = (e_l' y)^2 - \beta\, y'Fy \le 0 .$$

Inspection reveals that this set of equations has a non-trivial solution if, and only if, $\beta > 1$. The limit situation $\beta = 1$ is the median, seen in section 4.3.3 to be the most robust M-estimator. We investigate this minimization under inequality constraints by the method of Kuhn and Tucker (see Beveridge and Schechter, 1970, and Vajda, 1961). The solution is given by the minimization of

$$V + \lambda_1 c_1 + \lambda_2 c_2 + \sum_k \lambda_{3k} c_{3k} + \sum_l \lambda_{4l} c_{4l},$$

where the Lagrange multipliers of the inequality constraints satisfy

$$\lambda_{3k} \ge 0, \quad \lambda_{3k} c_{3k} = 0, \qquad \lambda_{4l} \ge 0, \quad \lambda_{4l} c_{4l} = 0, \qquad \text{for all } k \text{ and } l .$$
There is a further requirement: the minimum must be accessible. It must lie in the region delimited by the inequalities, not in the region exterior to it. Given the Kuhn-Tucker conditions, we necessarily have, for each k, either

$$\lambda_{3k} = 0 \ \text{and} \ c_{3k} < 0, \qquad \text{or} \qquad \lambda_{3k} > 0 \ \text{and} \ c_{3k} = 0,$$

and, for each l, either

$$\lambda_{4l} = 0 \ \text{and} \ c_{4l} < 0, \qquad \text{or} \qquad \lambda_{4l} > 0 \ \text{and} \ c_{4l} = 0 .$$

A constraint is binding when its multiplier is positive, the minimum then lying on the boundary of the accessible region. Otherwise the constraint does not contribute to the minimum. We differentiate the combined expression with respect to y, which yields

$$\Big(1 - \beta \sum_l \lambda_{4l}\Big) F y + \lambda_1 F\mathbf{1} + \lambda_2 D'F\mathbf{1} - \sum_k \lambda_{3k} D' e_k + \sum_l \lambda_{4l}\, (e_l' y)\, e_l = 0 .$$

The first term carries the factor $(1 - \beta \sum_l \lambda_{4l})$. When this factor is positive, the second-order conditions are met, which guarantees that the stationary point is a minimum.
Use is again made of the antisymmetry $D' = -D$ and of the vanishing of f on the frontier, $f(u_-) = f(u_+) = 0$. The factor $(1 - \beta \sum_l \lambda_{4l})$ is a scalar multiplicative factor, which can be absorbed into new Lagrange multipliers, denoted $\lambda'$. We assume that f(u) and $\psi(u - \theta)$ are continuous. Three types of subsets of the sample space are then to be distinguished, according to the constraints which are binding.

Type 1: neither the convexity nor the influence constraints are binding, $\lambda_{3k} = \lambda_{4l} = 0$. Then

$$y = -\lambda_1' \mathbf{1} - \lambda_2' F^{-1} D'F \mathbf{1}, \qquad \text{or} \qquad \psi(u - \theta) = \lambda_2' \frac{\partial}{\partial u} \ln f(u) - \lambda_1',$$

with $[\psi(u - \theta)]^2 < \beta V$ and $\frac{\partial}{\partial u} \psi(u - \theta) > 0$.

Type 2: the convexity criterion is binding, $c_{3k} = 0$, whereas $c_{4l} < 0$. Then $\psi(u - \theta)$ is constant, with $[\psi(u - \theta)]^2 < \beta V$.

Type 3: the influence bound is binding, $c_{4l} = 0$. Then $[\psi(u - \theta)]^2 = \beta V$.

For the gamma distribution of the previous example, $\frac{\partial}{\partial u} \ln f(u) = \nu/(u - a) - 1$. In a type 1 subset, therefore, $\psi(u - \theta) = b_1 [b_2 - 1/(u - \theta)]$, with unknown coefficients $b_1$ and $b_2$. In this type 1 subset $\psi$ is increasing, so the convexity criterion is never binding: there are no type 2 subsets. There can be one or two type 3 subsets, where $\psi$ reaches its bounds. We summarize as follows:

$$\psi(u - \theta) = \begin{cases} -b_1 b_3, & \text{if } u - \theta \le b_4,\\ b_1 \big[b_2 - 1/(u - \theta)\big], & \text{if } b_4 \le u - \theta \le b_5,\\ +b_1 b_3, & \text{if } u - \theta \ge b_5, \end{cases}$$

with $b_1 > 0$, $b_3 \ge 0$, and

$$b_4 = \frac{1}{b_2 + b_3}, \qquad b_5 = \begin{cases} 1/(b_2 - b_3), & \text{if } b_2 > b_3,\\ \infty, & \text{if } b_2 \le b_3 . \end{cases}$$
It may be noteworthy that, for the normal distribution, the same derivation leads to a function $\psi$ very nearly the one obtained by Huber (1964, section 6) for the contaminated normal distribution. The way we have derived it, however, differs.

In the present context, the parametrized function $\psi(u - \theta)$ permits an easy determination of the coefficients. To produce a minimum, the coefficients must satisfy three sets of constraints. The first is constraint $c_1$,

$$\int \psi(u - \theta)\, f(u)\, du = 0 .$$

The second is constraint $c_2$,

$$\int \Big[\frac{\partial}{\partial u} \psi(u - \theta)\Big] f(u)\, du = 1 .$$

The third is the set of constraints $c_{4l}$ of the type 3 subsets,

$$\max_u\, [\psi(u - \theta)]^2 = (b_1 b_3)^2 = \beta V, \qquad V = \int [\psi(u - \theta)]^2 f(u)\, du .$$

There are no explicit solutions, so this set of implicit equations is solved by numerical means for each $\beta$-value. We illustrate the results for a few particular values of $\beta$ and $\nu$ in Table 1. With $\beta = 4$, for instance, the robust estimator is more efficient than the non-robust arithmetic mean. Its relative efficiency is $(\nu - 1)/V$. The mean has asymptotic variance $\nu + 1$, and hence relative efficiency $(\nu - 1)/(\nu + 1)$.

To illustrate, we write down the implicit equation for $\nu = 3$ and $\beta = 4$. Accordingly, $\sum_i w_i\, \psi(u_i - \theta) = 0$ becomes

$$.2849 \sum_j w_j + \sum_k w_k \Big[\frac{1}{u_k - \theta} - .3076\Big] = 0,$$

where $u_i = u_j$ if $u_i \le 1.688 + \theta$, and $u_i = u_k$ if $u_i > 1.688 + \theta$. Some 9 percent of the observations lie in the type 3 subset. There, the observations are taken into account through their weights only, irrespectively of their exact values.
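The implicit equation above can be solved numerically. The sketch below is illustrative only. It takes $b_2$ and $b_3$ from Table 1 for $\nu = 3$, $\beta = 4$, and drops $b_1$, since that factor only rescales $\psi$. The upper breakpoint $b_5 = 1/(b_2 - b_3)$, about 44, is kept for completeness, although it is practically never reached. The following are assumptions, not taken from the text:

- the simulated sample, with a = 5 and unit weights;
- the use of bisection;
- the helper names.

```python
import random

# Table 1 coefficients for nu = 3, beta = 4 (b1 only scales psi and is omitted here).
B2, B3 = 0.3076, 0.2849
B4 = 1.0 / (B2 + B3)                                 # ~1.688
B5 = 1.0 / (B2 - B3) if B2 > B3 else float("inf")    # ~44.05 here

def psi_over_b1(x):
    """psi(u - theta) / b1 as a function of x = u - theta (non-decreasing)."""
    if x <= B4:
        return -B3
    if x >= B5:
        return B3
    return B2 - 1.0 / x

def robust_gamma_location(u, w=None, iters=100):
    """Solve sum_i w_i psi(u_i - theta) = 0 by bisection; the sum is non-increasing in theta."""
    if w is None:
        w = [1.0] * len(u)
    def h(theta):
        return sum(wi * psi_over_b1(ui - theta) for ui, wi in zip(u, w))
    span = B5 + 1.0 if B5 != float("inf") else 10.0 / B2
    lo, hi = min(u) - span, max(u)       # h(lo) > 0 > h(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = random.Random(7)
a = 5.0
sample = [a + rng.gammavariate(4.0, 1.0) for _ in range(4000)]   # nu = 3
theta = robust_gamma_location(sample)
share = sum(x - theta <= B4 for x in sample) / len(sample)
print(theta, share)   # theta near a; share near 1 - .9086
```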
Table 1

ν     β      b_1      b_2      b_3      V        ∫_{b_4}^{b_5} f(u) du
2     4      6.002    .4401    .4698    1.988    .9006
2     9      4.114    .4694    .9171    1.582    .9632
2     25     3.052    .4879    1.878    1.314    .9908
2     100    2.458    .4968    4.353    1.145    .9988
3     4      12.07    .3076    .2849    2.956    .9086
3     9      8.921    .3219    .5309    2.492    .9686
3     25     7.212    .3297    1.031    2.212    .9932
3     100    6.369    .3327    2.256    2.065    .9993
4     4      20.27    .2360    .1959    3.942    .9144
4     9      15.68    .2444    .3546    3.435    .9723
4     25     13.30    .2485    .6675    3.154    .9947
4     100    12.28    .2498    1.418    3.032    .9996
10    4      114.1    .0980    .0553    9.958    .9303
10    9      97.92    .0994    .0934    9.295    .9826
10    25     91.42    .0999    .1645    9.046    .9981
10    100    90.06    .1000    .3331    9.002    1.000

For $\beta$ large: $b_1 = \nu(\nu - 1)$, $b_2 = 1/\nu$, $b_3 = \beta^{1/2} [\nu^2 (\nu - 1)]^{-1/2}$, $V = \nu - 1$, and $\int_{b_4}^{b_5} f(u)\, du = 1$.

The estimator $\theta$ is consistent with respect to f(u). It may be applied to distributions other than f(u), but its assessment is then only approximate.
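The entries of Table 1 can be cross-checked against the relations of the preceding pages. The sketch below uses hypothetical helper names. It recomputes $b_4 = 1/(b_2 + b_3)$ and $b_5$, and checks $(b_1 b_3)^2 = \beta V$. It also evaluates $\int_{b_4}^{b_5} f(u)\, du$ with the closed-form gamma distribution function for integer shape $\nu + 1$.

```python
import math

# Table 1 rows: (nu, beta, b1, b2, b3, V, integral of f(u) from b4 to b5)
TABLE_1 = [
    (2, 4, 6.002, .4401, .4698, 1.988, .9006), (2, 9, 4.114, .4694, .9171, 1.582, .9632),
    (2, 25, 3.052, .4879, 1.878, 1.314, .9908), (2, 100, 2.458, .4968, 4.353, 1.145, .9988),
    (3, 4, 12.07, .3076, .2849, 2.956, .9086), (3, 9, 8.921, .3219, .5309, 2.492, .9686),
    (3, 25, 7.212, .3297, 1.031, 2.212, .9932), (3, 100, 6.369, .3327, 2.256, 2.065, .9993),
    (4, 4, 20.27, .2360, .1959, 3.942, .9144), (4, 9, 15.68, .2444, .3546, 3.435, .9723),
    (4, 25, 13.30, .2485, .6675, 3.154, .9947), (4, 100, 12.28, .2498, 1.418, 3.032, .9996),
    (10, 4, 114.1, .0980, .0553, 9.958, .9303), (10, 9, 97.92, .0994, .0934, 9.295, .9826),
    (10, 25, 91.42, .0999, .1645, 9.046, .9981), (10, 100, 90.06, .1000, .3331, 9.002, 1.000),
]

def gamma_tail(x, m):
    """P(X > x) for X ~ Gamma(integer shape m, scale 1): exp(-x) sum_{k<m} x^k / k!."""
    if math.isinf(x):
        return 0.0
    return math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(m))

def check_row(nu, beta, b1, b2, b3, V, mass):
    b4 = 1.0 / (b2 + b3)                                   # lower breakpoint of u - theta
    b5 = 1.0 / (b2 - b3) if b2 > b3 else math.inf          # upper breakpoint
    bound_ok = math.isclose(b1 * b3, math.sqrt(beta * V), rel_tol=2e-3)   # (b1 b3)^2 = beta V
    central = gamma_tail(b4, nu + 1) - gamma_tail(b5, nu + 1)             # P(b4 < u - a < b5)
    return b4, b5, bound_ok, central, abs(central - mass)

for row in TABLE_1:
    b4, b5, bound_ok, central, gap = check_row(*row)
    print(row[:2], round(b4, 3), round(b5, 2), bound_ok, round(central, 4), row[-1])
```

For $\nu \le 4$ the recomputed integrals agree with the table to about $10^{-4}$. For $\nu = 10$ only the bound relation is checked here.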
4.4. MM-estimators

We have designed robust M-estimators of location: for a sample distributed according to f(u), we could robustly estimate the location a. The same may be done for other distributions, but without difficulties only as long as the simple location problem is concerned. A scale parameter must be included whenever the residuals have to be standardised, as in regression. We are then in need of a simultaneous estimation of the regression parameter and of the "scale" of the residuals, which defines a set of MM-estimators. In this section we will be mainly concerned with regression. The same conceptual difficulties are present in the simultaneous estimation of multidimensional location and scatter, investigated by Maronna (1976). But what is the "scale" of the residuals? A partially disappointing answer has been given by Huber (1964). We excerpt:
"The theory of estimating a scale parameter is less satisfactory than that of estimating a location parameter. Perhaps the main source of trouble is that there is no natural "canonical" parameter to be estimated. In the case of a location parameter, it was convenient to restrict attention to symmetric distributions; then there is a natural location parameter, namely the location of the center of symmetry, and we could separate difficulties by optimizing the estimator for symmetric distributions (where we know what we are estimating) and then investigate the properties of this optimal estimator for nonstandard conditions, e.g., for nonsymmetric distributions. In the case of scale parameters, we meet, typically, highly asymmetric distributions, and the above device to ensure unicity of the parameter to be estimated fails. Moreover, it becomes questionable, whether one should minimize bias or variance of the estimator."

The same author has recently investigated various definitions of scale (Huber, 1977a). For our part, we will only assume at the moment the following. Given a set of residuals $\varepsilon_1, \ldots, \varepsilon_n$ corresponding to $\theta_1$,
$$\varepsilon_i = u_i - v_i'\theta_1,$$

we have a minimization rule which provides the scale s according to

$$s = \theta_2 \quad \text{and} \quad M_2 = \min \text{ for } \theta_2, \qquad M_2 = \int \rho_2(x, \theta_1, \theta_2)\, f(x)\, dx .$$

To provide a measure of the scale of the residuals, the function $\rho_2(\cdot)$ must be selected such that

$$\theta_2 = \operatorname{scale}(\varepsilon_1, \ldots, \varepsilon_n)$$

is consistent with

$$|\lambda|\, \theta_2 = \operatorname{scale}(\lambda \varepsilon_1, \ldots, \lambda \varepsilon_n), \qquad \lambda \in \mathbb{R} .$$

We now devote our attention to the MM-estimator $\theta_1$. It will be such that

$$M_1 = \min \text{ for } \theta_1, \qquad M_1 = \int \rho_1(x, \theta_1, \theta_2)\, f(x)\, dx .$$

Here $\rho_1$ is constrained to yield compatibility between the regression estimates obtained from $(\varepsilon_1, \ldots, \varepsilon_n)$ and from $(\lambda \varepsilon_1, \ldots, \lambda \varepsilon_n)$, $\lambda \in \mathbb{R}^+$. These two conditions are satisfied by the very natural form

$$\rho_1(x, \theta_1, \theta_2) = \rho_1(\varepsilon / \theta_2) = \rho_1\big[(u - v'\theta_1)/\theta_2\big] .$$

In the next two subsections, we present the main computational procedures in use to solve the set of implicit equations

$$\int \psi_1(x, \theta_1, \theta_2)\, f(x)\, dx = 0, \qquad \int \psi_2(x, \theta_1, \theta_2)\, f(x)\, dx = 0,$$

where $\psi_1$ and $\psi_2$ are the derivatives of $\rho_1$ and $\rho_2$ with respect to $\theta_1$ and $\theta_2$. The last subsection indicates the main proposals designed to provide robust estimators.
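The two equivariance requirements can be illustrated with a tiny example. The sketch assumes, for illustration only, two choices and a set of helper names, all hypothetical:

- the scale rule $s = \operatorname{median}(|\varepsilon_i|)$, one of the definitions discussed in section 4.4.3;
- a Huber-type $\rho_1$ with k = 1, applied to $\varepsilon/s$.

Rescaling the residuals by $\lambda$ rescales s by $|\lambda|$ and leaves $M_1$ unchanged.

```python
import math
import statistics

def scale_median_abs(residuals):
    """Scale of the residuals taken as median(|eps_1|, ..., |eps_n|) (an assumed choice)."""
    return statistics.median(abs(e) for e in residuals)

def rho_huber_unit(z, k=1.0):
    """Huber-type rho on standardised residuals: z^2 inside [-k, k], tangents outside."""
    return z * z if abs(z) <= k else k * (2.0 * abs(z) - k)

def m1(residuals, rho=rho_huber_unit):
    """M_1 = sum_i rho_1(eps_i / theta_2), unit weights, theta_2 from the residuals."""
    s = scale_median_abs(residuals)
    return sum(rho(e / s) for e in residuals)

eps = [0.3, -1.2, 0.05, 2.4, -0.7, 9.0]
for lam in (-3.0, 0.5, 10.0):
    scaled = [lam * e for e in eps]
    print(lam, scale_median_abs(scaled), abs(lam) * scale_median_abs(eps),
          m1(scaled), m1(eps))
```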
4.4.1. Relaxation methods

Many variations are in use, but the main algorithm is sufficient to indicate the principle. The relaxation algorithm consists of repeated steps, each step solving the two equations one after the other. In practice no great difficulty is usually encountered, and we will assume that convergence is achieved. The successive approximations of $\theta_1$ and $\theta_2$ are identified by an index k. Initially, $\theta_1^0$ and $\theta_2^0$ are obtained from the least squares solution, which any least squares algorithm provides. At step k, we start with the approximations $\theta_1^k$ and $\theta_2^k$, and obtain $(\theta_1^{k+1}, \theta_2^{k+1})$ by the scheme:

- Compute $\theta_1^{k+1}$, solution of $\int \psi_1(x, \theta_1^{k+1}, \theta_2^k)\, f(x)\, dx = 0$;
- Compute $\theta_2^{k+1} = S(\varepsilon_1^{k+1}, \ldots, \varepsilon_n^{k+1})$, the scale $s = S(\cdot)$ of the residuals $\varepsilon_i^{k+1} = u_i - v_i'\theta_1^{k+1}$.

The step is repeated until the solution is deemed sufficiently approximated. The variations of these methods lie in the way the first equation is solved, possibly by an iterative method, and in the use of a damping factor; we will assume no damping. The second equation is easily solved, as in section 4.3. Therefore, we will concentrate on the first equation. Let us define the scalar function $\psi(\varepsilon)$ such that
$$\psi_1(x, \theta_1, \theta_2) = \psi_1\big[(u - v'\theta_1)/\theta_2\big] = -\psi(\varepsilon)\, v, \qquad \varepsilon = (u - v'\theta_1)/\theta_2,$$

constant factors being omitted. Then, for the sample distribution

$$f(x) = \Big(1 \Big/ \sum_i w_i\Big) \sum_i w_i\, \delta(x - x_i),$$

the first equation can be written

$$\sum_i w_i\, \psi(\varepsilon_i)\, v_i = 0 .$$

To conclude, this equation is transformed into

$$\sum_i w_i\, \big[\psi(\varepsilon_i^k)/\varepsilon_i^k\big]\, (u_i - v_i'\theta_1^{k+1})\, v_i = 0 .$$

This is a linear system in $\theta_1^{k+1}$, that is, a weighted least squares problem with weights $w_i\, \psi(\varepsilon_i^k)/\varepsilon_i^k$. Repetition of this linear system produces an improved $\theta_1^{k+1}$. Whether it effectively converges depends upon the starting values and upon the nature of the function $\psi(\varepsilon)$, whose selection is of the utmost importance (see section 4.3.1). Apart from a few pathological situations, the method converges. In those situations the convergence may be erratic, for instance in the vicinity of saddle-points. The main drawback of the method is the computation time, since a linear system must be solved at each step (see Huber, 1973, and Dutter, 1974).
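A compact sketch of the relaxation scheme follows, assuming NumPy. Several choices are not fixed by the text and are assumptions here:

- Huber's $\psi$ with k = 1, so that $\psi(\varepsilon)/\varepsilon = \min(1, k/|\varepsilon|)$;
- the scale rule $\theta_2 = \operatorname{median}(|\varepsilon_i|)$;
- a single reweighted least squares solve per step, rather than repeating the linear system to convergence;
- a synthetic data set with ten gross outliers.

It is not Huber's or Dutter's published algorithm.

```python
import numpy as np

def huber_weight(eps, k=1.0):
    """psi(eps)/eps for psi(eps) = eps if |eps| <= k, k sign(eps) otherwise."""
    a = np.abs(eps)
    return np.where(a <= k, 1.0, k / np.maximum(a, k))

def relaxation_mm(Vmat, u, w=None, k=1.0, max_steps=200, tol=1e-10):
    """Alternate a reweighted least-squares step for theta_1 and a scale step for theta_2.

    Vmat: (n, p) matrix with rows v_i'; u: (n,) observations; w: prior weights w_i."""
    n = Vmat.shape[0]
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)

    def wls(weights):
        r = np.sqrt(weights)
        return np.linalg.lstsq(Vmat * r[:, None], u * r, rcond=None)[0]

    theta1 = wls(w)                                   # least-squares start
    theta2 = np.median(np.abs(u - Vmat @ theta1))     # scale = median |eps_i|
    for _ in range(max_steps):
        eps = (u - Vmat @ theta1) / theta2            # standardised residuals eps_i^k
        new1 = wls(w * huber_weight(eps, k))          # linear system in theta_1^{k+1}
        new2 = np.median(np.abs(u - Vmat @ new1))     # theta_2^{k+1} = S(residuals)
        converged = np.max(np.abs(new1 - theta1)) < tol and abs(new2 - theta2) < tol
        theta1, theta2 = new1, new2
        if converged:
            break
    return theta1, theta2

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 100)
u = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, 100)
u[:10] += 20.0                                        # ten gross outliers
Vmat = np.column_stack([np.ones_like(x), x])
theta1, theta2 = relaxation_mm(Vmat, u)
print(theta1, theta2)   # close to (1, 2) despite the outliers
```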
Investigation of the convergence has revealed that relaxation methods may also be trapped when several minima exist (see section 4.5). An improvement can be expected from the following result. Assume that the relaxation method converges to the solution $(\theta_1, \theta_2)$. In the vicinity of the solution, the error is reduced at each step according to

$$(\theta_1^{k+1} - \theta_1) \simeq A_{11}^{-1} A_{12} A_{22}^{-1} A_{21}\, (\theta_1^{k} - \theta_1),$$

where

$$A_{jk} = \int \psi_{jk}(x, \theta_1, \theta_2)\, f(x)\, dx .$$

We omit the derivation leading to this result, given its similarity with the argument presented in section 4.2. The convergence is thus linear, and it is slow when $\theta_1$ and $\theta_2$ are strongly dependent upon one another.
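The convergence factor can be observed on a toy problem. The sketch assumes scalar $\theta_1$, $\theta_2$ and exactly linear equations $A_{j1}(\theta_1 - s_1) + A_{j2}(\theta_2 - s_2) = 0$, with hypothetical coefficients. For such equations the linearisation is exact, so the error ratio per relaxation sweep equals $A_{11}^{-1} A_{12} A_{22}^{-1} A_{21}$, here 1/6.

```python
# Hypothetical linearised equations: A_j1 (theta_1 - s1) + A_j2 (theta_2 - s2) = 0, j = 1, 2.
A11, A12, A21, A22 = 4.0, 1.0, 2.0, 3.0
s1, s2 = 1.0, 2.0                                   # joint solution (theta_1, theta_2)

def relaxation_errors(theta1, theta2, steps):
    errors = []
    for _ in range(steps):
        theta1 = s1 - A12 / A11 * (theta2 - s2)     # equation 1 solved for theta_1
        theta2 = s2 - A21 / A22 * (theta1 - s1)     # equation 2 solved for theta_2
        errors.append(theta1 - s1)
    return errors

errors = relaxation_errors(0.0, 0.0, 6)
ratios = [b / a for a, b in zip(errors, errors[1:])]
factor = (1.0 / A11) * A12 * (1.0 / A22) * A21      # A_11^-1 A_12 A_22^-1 A_21
print(ratios)
print(factor)
```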
4.4.2. Simultaneous solution

The relaxation method has its hazards. Nevertheless, most computer centers have at disposal a software package for least squares regression, so it is understandable that experimenters have frequently adopted the "reweighted" least squares approach. Recent experience with various algorithms has been reported by Gross (1977), Coleman et al. (1977), and Ypelaar and Velleman (1977). The deficiencies of the relaxation method can be avoided by a method which estimates $\theta_1$ and $\theta_2$ simultaneously, based on a modification of the Newton-Raphson method. We describe a fairly general method leading to the simultaneous solution of the set of equations defining an MM-estimator. It presents many similarities with the influence derivation of section 4.2, and can be presented for any number g of estimators. Consider the set of g equations

$$\int \psi_j(x, \theta_1, \ldots, \theta_g)\, f(x)\, dx = 0, \qquad j = 1, \ldots, g,$$

and assume that we have at disposal a set of approximations $(\theta_1^*, \ldots, \theta_g^*)$ of the solution $(\theta_1, \ldots, \theta_g)$. We write

$$\int \psi_j^*\, f\, dx = \int \psi_j(x, \theta_1^*, \ldots, \theta_g^*)\, f(x)\, dx .$$
Under the usual conditions of differentiability we have, to the first order in the differences $(\theta_k^* - \theta_k)$,

$$\int \psi_j^*\, f\, dx = \int \Big[\psi_j + \sum_k \psi_{jk}\, (\theta_k^* - \theta_k)\Big] f\, dx = \int \psi_j\, f\, dx + \sum_k \Big[\int \psi_{jk}\, f\, dx\Big] (\theta_k^* - \theta_k) = \sum_k A_{jk}\, (\theta_k^* - \theta_k),$$

since $\int \psi_j\, f\, dx = 0$ at the solution. We see that the differences $(\theta_j^* - \theta_j)$ are the solution of a set of g linear equations. The coefficients $A_{jk}$ may be scalar, but depending upon the nature of the estimators they may also be vectorial or matricial. Suppose the coefficients $A_{jk}$ $(k \neq j)$ are dominated by $A_{jj}$, that is, the estimators are relatively independent from one another. A solution can then be derived. Under

$$\Big\| \sum_{k \neq j} A_{jj}^{-1} A_{jk} A_{kk}^{-1} A_{kj} \Big\| \ll 1,$$

we obtain

$$\theta_j = \theta_j^* - \Big\{2 A_{jj}^{-1} \int \psi_j^*\, f\, dx - \sum_k A_{jj}^{-1} A_{jk} A_{kk}^{-1} \int \psi_k^*\, f\, dx\Big\} = \theta_j^* - \int \eta_j(x, \theta_1^*, \ldots, \theta_g^*)\, f(x)\, dx,$$

where $\eta_j$ collects the terms in braces. For g = 1 this reduces to the Newton step $\theta = \theta^* - A^{-1} \int \psi^*\, f\, dx$. Inspection of the mathematical expression reveals that this treatment is correct only if the estimators are not strongly dependent upon one another. We conclude that robust MM-estimators are obtained through finite increments $\theta_j - \theta_j^*$. This is important in view of the next section, where $\psi_j$ becomes a continuous function of some parameter independent from $(\theta_1, \ldots, \theta_g)$. Further, suppose we observe large increments, even with independent estimators. This must lead us to suspect a lack of robustness of the concerned MM-estimator.
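The first-order formula can be compared with the exact solution of the linear system in a scalar toy case. The coefficient matrices and right-hand side below are hypothetical, and $b_j$ stands for $\int \psi_j^*\, f\, dx$. The approximation error is small when the coupling $A_{jk}$ $(k \neq j)$ is weak, and it grows with the coupling.

```python
import numpy as np

def first_order_increment(A, b):
    """Approximate theta - theta* from sum_k A_jk (theta*_k - theta_k) = b_j, scalar A_jk.

    theta_j - theta*_j ~ sum_k A_jj^-1 A_jk A_kk^-1 b_k - 2 A_jj^-1 b_j,
    which is accurate when the A_jk (k != j) are small compared with A_jj."""
    d_inv = 1.0 / np.diag(A)
    return d_inv * (A @ (d_inv * b)) - 2.0 * d_inv * b

b = np.array([1.0, 1.0])                        # b_j = integral of psi_j* f dx (hypothetical)
A_weak = np.array([[2.0, 0.1], [0.2, 3.0]])     # weakly coupled estimators
A_strong = np.array([[2.0, 1.5], [1.5, 3.0]])   # strongly coupled estimators
for A in (A_weak, A_strong):
    exact = -np.linalg.solve(A, b)              # exact theta - theta* of the linear system
    approx = first_order_increment(A, b)
    print(exact, approx, np.max(np.abs(exact - approx)))
```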
4.4.3. Some proposals

Basically, very few proposals have been advanced, and their robustness is frequently difficult to appreciate. The question is which functions $\psi_1$ and $\psi_2$ we should select in

$$\int \psi_1(x, \theta_1, \theta_2)\, f(x)\, dx = 0 \qquad \text{and} \qquad \int \psi_2(x, \theta_1, \theta_2)\, f(x)\, dx = 0 .$$

The aim is to obtain, simultaneously, a robust estimate of the regression $u = v'\theta_1$ and a robust estimate of the scale $s = \theta_2$ of the residuals $\varepsilon$. Possibly the most exhaustive set of proposals, robust or non-robust, has been conceived in the Princeton Robustness Study (see Andrews et al., 1972) and thoroughly described by Gross and Tukey (1973). That study has been largely concerned with location problems, but many of its results are well adapted to the regression field. In this section, we will retain the one-step estimators. These can be drawn from the relaxation method seen in section 4.4.1 when a single iteration is performed (see Bickel, 1975). There is no obvious reason to associate a given scale estimator with a given location estimator. Thus, although the two are obtained simultaneously, we dissociate the selection of $\psi_1(\cdot)$ from that of $\psi_2(\cdot)$, and disregard the computational details.

Assume that, for a known scale factor s, we estimate $\theta_1$ through

$$M_1 = \sum_i w_i\, \rho_1(\varepsilon_i) = \min \text{ for } \theta_1, \qquad \varepsilon_i = u_i - v_i'\theta_1;$$
then the two main classes of functions $\rho_1$ proposed are as follows. First, the family due to Huber (1964):

$$\rho_1(\varepsilon) = \begin{cases} \varepsilon^2, & \text{if } |\varepsilon| \le ks,\\ ks\, (2|\varepsilon| - ks), & \text{if } |\varepsilon| > ks; \end{cases}$$

it is a parabola prolonged by two tangents. His proposal produces the minimax estimator of location for contaminated normal distributions. Second, the family due to Andrews (1974; see also Andrews et al., 1972):

$$\rho_1(\varepsilon) = \begin{cases} 1 - \cos[\varepsilon/(cs)], & \text{if } |\varepsilon| \le \pi cs,\\ 2, & \text{if } |\varepsilon| > \pi cs . \end{cases}$$

Both families are equivalent to least squares for small residuals. For large residuals, Huber's function grows only linearly, whereas Andrews' function is constant, so that outliers have no incidence on the estimate. The derivative $\psi(\varepsilon)$ of Andrews' function is continuous in the $\varepsilon$-domain, but it is not monotone: $\rho_1$ is not convex. Such functions may therefore produce unforeseen results, possibly several admissible solutions, and this may happen unnoticed. Other proposals (proposals 12A to 25A in Andrews et al., 1972), considered by Hampel (1977) and by Gross (1977), use piecewise defined functions $\psi(\varepsilon)$ with similar properties. This will be illustrated at the end of this section.
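Both families are easy to write down. The sketch below merely transcribes the two formulas. Here s, k and c are the scale factor and tuning constants of the text, while the numerical values used in the checks are arbitrary. The code checks the properties stated above:

- Huber's function is continuous at $|\varepsilon| = ks$, with a continuous slope $2ks$;
- Andrews' function reaches the constant 2 at $|\varepsilon| = \pi cs$;
- near zero, Andrews' function behaves like $\varepsilon^2/(2c^2s^2)$.

```python
import math

def rho_huber(eps, s, k):
    """Huber (1964): eps^2 for |eps| <= k s, prolonged by tangents k s (2|eps| - k s)."""
    a = abs(eps)
    return eps * eps if a <= k * s else k * s * (2.0 * a - k * s)

def rho_andrews(eps, s, c):
    """Andrews: 1 - cos[eps / (c s)] for |eps| <= pi c s, constant 2 beyond."""
    return 1.0 - math.cos(eps / (c * s)) if abs(eps) <= math.pi * c * s else 2.0

s, k, c = 1.0, 1.5, 1.0
h = 1e-6
print(rho_huber(k * s - h, s, k), rho_huber(k * s + h, s, k))      # continuous at k s
print((rho_huber(k * s + h, s, k) - rho_huber(k * s, s, k)) / h)   # slope 2 k s
print(rho_andrews(math.pi * c * s, s, c), rho_andrews(50.0, s, c)) # both 2
print(rho_andrews(0.01, s, c) / 0.01 ** 2)                         # ~ 1 / (2 c^2 s^2)
```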
We now turn our attention to the robust definition of the scale factor estimate. A seemingly robust definition is the median of the residuals in absolute values, i.e.

$$s = \theta_2 = \operatorname{median}(|\varepsilon_1|, \ldots, |\varepsilon_n|),$$

or $\theta_2 = \lim \theta_{2\nu}$ for $\nu \to 1$, $\nu > 1$, where $\theta_{2\nu}$ is obtained from

$$M_2 = \sum_i w_i\, \rho_2(\varepsilon_i) = \min \text{ for } \theta_{2\nu} .$$

This definition has gained favour in the frame of robustness. Several other definitions based on order statistics, e.g. the interquartile range, have been proposed. They do not necessarily fit the needs of regression, however, and their computation may sometimes involve trouble. Huber (1974) and Dutter (1977b) propose to minimize simultaneously, with respect to $\theta_1$ and s, an expression such as

$$\sum_i w_i\, \rho_1\big[(u_i - v_i'\theta_1)/s\big]\, s + A s = \min \text{ for } \theta_1, \ s > 0,$$

with A a constant; tables are given by Dutter (1977b). Such estimates have favorable properties, but we are wary of the level of arbitrariness in the selection of $\rho_1$ and A.
Quite a different approach has been proposed by Jaeckel (1972b). It consists in directly minimizing a scale estimate of the residuals instead of $M_1$. Selecting the estimate so that the scale factor is optimal appears dubious to us. The scale may, for instance, be defined as the median of the absolute residuals,

$$s = \operatorname{median}(|\varepsilon_1|, \ldots, |\varepsilon_n|),$$

which Hampel (1975), among others, has recommended for its high breakdown point. The procedure then amounts to minimizing s. However, the function to be minimized is not convex and its minimum is not unique. Further, it does not seem possible to design an algorithm able to escape its combinatorial complexity (see Cover and van Campenhout, 1977).
4.4.4. Illustration

We briefly report on a classical example of regression, which has been used by many authors in comparing several methods. The regression problem is relative to the operation of a plant for the oxidation of ammonia into nitric acid. It can be found in Brownlee (1965, section 13.12), Draper and Smith (1966, chapter 6), Daniel and Wood (1971, chapter 5), Andrews (1974), and Denby and Mallows (1977). The data set has size n = 21 and the regression is of dimension p = 4. The stack loss $u_i$ is regressed against the following variables:

- $v_{1i} = 1$, a constant term which introduces the intercept;
- $v_{2i}$, the air flow;
- $v_{3i}$, the cooling water inlet temperature;
- $v_{4i}$, the acid concentration.

The model is

$$u_i = v_i'\theta_1 + \varepsilon_i .$$

In this example, various techniques have clearly revealed that four observations (i = 1, 3, 4, 21) are outlying with respect to the seventeen others, which form a neat cluster.
This structure can be ascertained in Figure 1. There, the 21 observations of the 4-dimension space are mapped onto a 2-dimension plane by a method of non-linear multidimensional scaling described in Rey (1976). This method produces a map preserving the distance relationships between observations: close points remain close, while distant points remain distant. To be independent of the coordinates, the distance $d_{ij}$ between two observations $x_i$ and $x_j$ has been defined according to

$$d_{ij}^2 = \sum_k \big[(u_i - v_i'\theta_{1k}) - (u_j - v_j'\theta_{1k})\big]^2,$$

where $\theta_{1k}$ is a member of a set $\{\theta_{1,1}, \theta_{1,2}, \ldots\}$ of robust regression solutions. The exact composition of this set has scarcely any incidence on the metric, as long as the solutions are independent.
different
selections
of t h e
function
~I = X w i p1(~i ) to be m i n i m i z e d
will
results
reported
scale
will
be
factor
s,
n o w be
considered
in T a b l e
table
is c o r r e c t
Weighted least squares. Selecting $\rho_1(\varepsilon) = \varepsilon^2$ with each weight $w_i = 1$, we obtain the ordinary least squares fit of size 21, reported as Fit 1. With the weights of the four outlying observations set to zero, we obtain the fit of size 17, reported as Fit 2. Fit 2 thus corresponds to an identification of the outliers prior to least squares.

Least vth power. With $\rho_1(\varepsilon) = |\varepsilon|^\nu$ (Rey, 1975a), the incidence of the outlying observations is gradually decreased as $\nu$ is decreased from 2 toward 1. A selection of $\nu$ between 1 and 2 therefore permits going smoothly from least squares toward least absolute deviations. The result for $\nu = 1.2$ is reported as Fit 3. It is obtained, at little expense, by the algorithm of the next section.
FIG. 1. Map of the 21 observations (points labelled by observation number); figure not reproduced.
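Huber's method. As proposed by Huber, we select

$$\rho_1(\varepsilon) = \begin{cases} \varepsilon^2 & \text{if } |\varepsilon| \le ks, \\ ks\,(2|\varepsilon| - ks) & \text{if } |\varepsilon| > ks, \end{cases}$$

with the scale factor $s = \mathrm{median}(|\varepsilon_1|, \ldots, |\varepsilon_n|)$, as seen previously. The result for $k = 1$ is reported as Fit 4. Note that Huber's method is the only admissible method among the four we are considering.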
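A standard way to minimize Huber's criterion is iteratively reweighted least squares. The sketch below assumes that technique; the text does not state that Fit 4 was computed this way. The IRLS weight $\psi(\varepsilon)/(2\varepsilon)$ equals 1 inside $[-ks, ks]$ and $ks/|\varepsilon|$ outside. Following the text, the scale factor is re-estimated at every step as $s = \mathrm{median}(|\varepsilon_1|, \ldots, |\varepsilon_n|)$. The function names are hypothetical.

```python
import statistics


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def huber_irls(v, u, k=1.0, max_iter=500, tol=1e-10):
    """Huber regression by iteratively reweighted least squares (a sketch).

    v: list of regressor tuples (including the constant 1), u: observations.
    Returns (theta, s) where s = median |residual| at the last step.
    """
    p = len(v[0])
    w = [1.0] * len(u)               # first pass is ordinary least squares
    theta = [float("inf")] * p
    s = float("nan")
    for _ in range(max_iter):
        xtx = [[sum(wi * vi[j] * vi[l] for wi, vi in zip(w, v)) for l in range(p)]
               for j in range(p)]
        xty = [sum(wi * vi[j] * ui for wi, vi, ui in zip(w, v, u)) for j in range(p)]
        new = solve(xtx, xty)
        resid = [ui - sum(t * x for t, x in zip(new, vi)) for vi, ui in zip(v, u)]
        s = max(statistics.median([abs(e) for e in resid]), 1e-12)  # guard s = 0
        ks = k * s
        w = [1.0 if abs(e) <= ks else ks / abs(e) for e in resid]
        converged = max(abs(a - b) for a, b in zip(new, theta)) < tol
        theta = new
        if converged:
            break
    return theta, s
```

The sketch can be applied to the stack loss rows with $k = 1$ and the result compared with Fit 4. We have not verified that the two agree to the printed digits, because the text does not give the exact stopping rule used.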
Method of Andrews. Selecting

$$\rho_1(\varepsilon) = \begin{cases} 1 - \cos[\varepsilon/(cs)] & \text{if } |\varepsilon| \le \pi c s, \\ 2 & \text{if } |\varepsilon| > \pi c s, \end{cases}$$

with the same scale factor $s$, a result obtained with $c = 1.5$ is given as Fit 5. For this value, the four outlying observations have residuals larger than $\pi c s$ in absolute value and do not contribute to the estimate. The size 21 result is then the same as the size 17 result obtained in this way. For $c = 1.8$, the function $\Phi_1$ has several minima; Fit 6, Fit 7 and Fit 8 are three solutions for this value of $c$. We do not claim that some of these solutions are nonsense; possibly several of them are defensible. The important weakness of the method is its discontinuities. A small change of the parameter $c$, or inaccuracies in the computation of $\Phi_1$, may select a significantly different solution without this being noticed.
Table 2

Fit   θ1,1      θ2,1     θ3,1     θ4,1      s        Method
1     -39.920   .71564   1.2953   -.15212   1.9175   L.sq., size=21
2     -37.652   .79769    .57734  -.06706   1.0579   L.sq., size=17
3     -38.805   .82643    .64760  -.08577   1.2194   ℓν, ν = 1.2
4     -38.158   .83800    .66290  -.10631   1.1330   Huber, k = 1
5     -37.132   .81829    .51952  -.07255    .96533  Andrews, c = 1.5
6     -37.334   .81018    .54199  -.07037    .99926  Andrews, c = 1.8
7     -41.551   .93911    .58026  -.11295   1.4385   Andrews, c = 1.8
8     -41.990   .93352    .61946  -.11278   1.5710   Andrews, c = 1.8

Each method produces a different estimate, but there are trends which become apparent when the respective fit residuals $\varepsilon_i$ are compared.
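The four selections of $\rho_1$ used in Table 2 can be written down directly. The sketch below transcribes the definitions given above, together with the weighted criterion $\Phi_1$. The function names are hypothetical, and nothing beyond the formulas in the text is assumed.

```python
import math


def rho_ls(e):
    """Least squares: rho(e) = e^2."""
    return e * e


def rho_lv(e, nu):
    """Least vth power: rho(e) = |e|^nu, with 1 <= nu <= 2."""
    return abs(e) ** nu


def rho_huber(e, k, s):
    """Huber: e^2 if |e| <= ks, ks(2|e| - ks) otherwise (continuous at |e| = ks)."""
    ks = k * s
    return e * e if abs(e) <= ks else ks * (2.0 * abs(e) - ks)


def rho_andrews(e, c, s):
    """Andrews: 1 - cos(e/(cs)) if |e| <= pi*c*s, the constant 2 otherwise."""
    if abs(e) <= math.pi * c * s:
        return 1.0 - math.cos(e / (c * s))
    return 2.0


def phi1(resid, rho, weights=None):
    """Criterion Phi_1 = sum_i w_i rho(e_i); weights default to 1."""
    if weights is None:
        weights = [1.0] * len(resid)
    return sum(w * rho(e) for w, e in zip(weights, resid))
```

For example, `phi1(resid, lambda e: rho_huber(e, 1.0, s))` evaluates the Huber criterion for given residuals and scale. Andrews' $\rho$ is constant beyond $\pi c s$, so $\Phi_1$ is not convex and can have several minima, as Fits 6 to 8 illustrate.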
All methods estimate the scale factor differently (see the last column of Table 2). The largest value is obtained by least squares on the 21 observations. The computation of MM-estimators is, however, relatively time-consuming.

4.5. Fixed point solution

In the context of MM-estimation, the estimates are solutions of simultaneous nonlinear equations. The iterative methods seen so far, and more generally the methods of relaxation, can be regarded as fixed point methods. Given some approximate solution $(\theta_1, \ldots, \theta_g)$, an "improved" solution is obtained through the arithmetic process

$$(\theta_1, \ldots, \theta_g) := R'(\theta_1, \ldots, \theta_g),$$

which is repeated until stationarity is obtained, that is, until $(\theta_1, \ldots, \theta_g) = R'(\theta_1, \ldots, \theta_g)$. Such methods are relatively inexpensive and frequently used, but they can be deceiving. The convergence may be slow, and the solution reached depends on the starting point. An inaccurate stationarity point could also conceal difficulties, for instance the existence of several solutions when outlying observations are present (see section 4.3.2).
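The relaxation scheme $\theta := R'(\theta)$ can be sketched in a few lines. As a concrete, hypothetical choice of $R'$, we take Huber's location problem with a fixed scale $s$: $R'(\theta) = \theta + (s/n)\sum_i \psi((x_i - \theta)/s)$ with $\psi(z) = \max[-k, \min(z, k)]$. Its fixed points are exactly the solutions of $\sum_i \psi = 0$. This map is our illustration, not one taken from the text.

```python
def fixed_point(r, theta0, tol=1e-12, max_iter=1000):
    """Iterate theta := R'(theta) until stationarity; return (theta, iterations)."""
    theta = list(theta0)
    for it in range(1, max_iter + 1):
        new = r(theta)
        if max(abs(a - b) for a, b in zip(new, theta)) < tol:
            return new, it
        theta = new
    raise RuntimeError("stationarity not reached")


def huber_location_map(x, k=1.0, s=1.0):
    """Hypothetical R' for Huber location with fixed scale s:
    theta := theta + (s/n) * sum_i psi((x_i - theta)/s), psi(z) = max(-k, min(z, k))."""
    def r(theta):
        t = theta[0]
        step = s * sum(max(-k, min((xi - t) / s, k)) for xi in x) / len(x)
        return [t + step]
    return r


theta, iterations = fixed_point(huber_location_map([1.0, 2.0, 3.0, 4.0, 100.0], k=1.5), [22.0])
print(theta, iterations)
```

Starting from the mean (22), the iteration first moves by constant steps while every residual is clipped. Once several residuals fall inside $[-ks, ks]$ it contracts geometrically. This shows both the low cost and the possibly slow start mentioned above.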
The general fixed point problem still presents many difficulties. We will first consider the situation where we know for sure that there is a single solution; then we will take care of multiple solutions. In fairly general situations, the existence of a fixed point follows from classical theorems. Examples are Brouwer's theorem for a continuous mapping of a compact convex set into itself, and the contracting mapping theorem for a mapping with a Lipschitz constant smaller than one (see Henrici, 1974, section 6.12). These theorems are, however, rather difficult to apply in practice, except in trivial situations. We will therefore devote our attention to the computational and practical aspects. Refined algorithms are due to Scarf (1973), further constructive algorithms are investigated in Kellogg et al. (1976), and a few are reported by Todd (1976). The algorithm we want to present is based on the "continuation method", which consists in following the solution as some parameter varies.
Precisely, assume we want "the" solution $\theta$ of $\theta = R(\theta)$, with $\theta = (\theta_1', \ldots, \theta_g')'$. Assume also that we know the solution $\theta_0$ of another fixed point problem $\theta = R_0(\theta)$, that is, $\theta_0 = R_0(\theta_0)$. The continuation method consists in following the solution $\theta(\lambda)$ of the "variable" fixed point problem

$$\theta = \lambda R(\theta) + (1 - \lambda) R_0(\theta)$$

from $\lambda = 0$, where $\theta = \theta_0$, up to $\lambda = 1$, where $\theta$ is the desired solution. By the implicit function theorem, following the path from $\lambda = 0$ up to $\lambda = 1$ presents no serious difficulty as long as the (matricial) expression

$$B = I - \lambda \frac{\partial}{\partial \theta} R(\theta) - (1 - \lambda) \frac{\partial}{\partial \theta} R_0(\theta)$$

remains positive definite. Predictor-corrector algorithms with good numerical stability can be proposed. We favor them, and they have been used in the illustration of section 4.4.4.
When the above expression becomes singular for some $\lambda$ in $[0, 1]$, the path may bifurcate or turn back; in other words, several solutions may be involved. What is happening is described by the Leray-Schauder degree theory of compact vector fields in a Banach space. A good account of this theory, including the continuation property, can be found in Amann (1976, chapter 3). As far as we are concerned, the property is roughly the following. If the solution $\theta_0$ of the starting problem is unique, the set of solutions $(\theta, \lambda)$ contains a continuum which starts from $(\theta_0, 0)$ and either reaches $\lambda = 1$ or leaves the closure of the neighborhood considered. By following such paths, going in and going out, it is further possible to count, and possibly to retain, several solutions at $\lambda = 1$.

We now specialize the discussion to MM-estimators. Splitting the fixed point expression $\theta = R(\theta)$ into its components, we want a fixed point $\theta = (\theta_1', \ldots, \theta_g')'$ of $(R_1(\theta)', \ldots, R_g(\theta)')'$. According to robust estimation theory, we select

$$R_i(\theta) = \theta_i - \int \psi_i(x, \theta_1, \ldots, \theta_g)\, f(x)\, dx.$$

The choice of the rule $R_0(\cdot)$ may be fairly trivial: we take $R_{0i}(\theta) = \theta_{0i}$, where $\theta_{0i}$ is a constant estimate of $\theta_i$ of the right magnitude. With these definitions, $\partial R/\partial \theta = I - A$ and $\partial R_0/\partial \theta = 0$, so that $B$ takes the form

$$B = (1 - \lambda) I + \lambda A,$$

where $A$ is the matrix of block-components

$$A_{ij} = \int \psi_{ij}(x, \theta_1, \ldots, \theta_g)\, f(x)\, dx, \qquad \psi_{ij} = \frac{\partial \psi_i}{\partial \theta_j},$$

already encountered earlier.
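The continuation method can be sketched on a one-parameter (scalar) location problem with Andrews' $\rho$, with $c = 1.5$ and $s = 1$ held fixed. The sketch follows the root of $(1 - \lambda)(\theta - \theta_0) + \lambda G(\theta) = 0$, where $G(\theta)$ plays the role of $\int \psi f\,dx$ for the empirical distribution. The derivative $G'(\theta)$ is the scalar $A$, so Newton's denominator is exactly $B = (1 - \lambda) + \lambda A$. Several choices are ours, not the text's: the starting value $\theta_0$ is taken as the sample median, $\lambda$ advances in 20 equal steps, the predictor is the linear extrapolation $2\theta(\lambda_{j-1}) - \theta(\lambda_{j-2})$, and the corrector is Newton-Raphson. All names are hypothetical.

```python
import math


def andrews_terms(x, theta, c=1.5, s=1.0):
    """Return G(theta) and A = G'(theta), where G(theta) = (1/n) sum_i d/dtheta rho(x_i - theta)
    with Andrews' rho(e) = 1 - cos(e/(cs)) if |e| <= pi*c*s, constant 2 otherwise."""
    g = a = 0.0
    cs = c * s
    for xi in x:
        e = xi - theta
        if abs(e) <= math.pi * cs:      # observations beyond pi*c*s contribute nothing
            g -= math.sin(e / cs) / cs
            a += math.cos(e / cs) / cs ** 2
    return g / len(x), a / len(x)


def continuation(x, theta0, steps=20, c=1.5, s=1.0, newton_iter=30):
    """Follow theta(lam), the root of (1 - lam)(theta - theta0) + lam * G(theta) = 0,
    from lam = 0 (theta = theta0) to lam = 1 (the Andrews estimate)."""
    path = [theta0]
    for j in range(1, steps + 1):
        lam = j / steps
        # Predictor: linear extrapolation from the two previous points of the path.
        theta = 2 * path[-1] - path[-2] if len(path) > 1 else path[-1]
        for _ in range(newton_iter):    # Newton-Raphson corrector
            g, a = andrews_terms(x, theta, c, s)
            h = (1 - lam) * (theta - theta0) + lam * g
            b = (1 - lam) + lam * a     # scalar form of B = (1 - lam) I + lam A
            if b <= 0:
                raise RuntimeError(f"B is not positive at lam = {lam}")
            step = h / b
            theta -= step
            if abs(step) < 1e-15:
                break
        path.append(theta)
    return path


data = [-1.0, -0.5, 0.0, 0.5, 1.0, 10.0]
print(continuation(data, theta0=0.25)[-1])   # the observation 10.0 is rejected
```

On these data the path stays in a region where $B > 0$, so no bifurcation is met. The guard on `b` stops the sketch at the point where the text warns that the degree-theoretic analysis becomes necessary.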
In practice, the path $\theta(\lambda)$ is followed by steps. At each step, an approximation of the next solution is predicted from the previous ones, and an improved approximation is then obtained by a Newton-Raphson correction.

To illustrate, consider the regression of section 4.4.4 with the estimator of Andrews, taking the parameter $c$ itself as the continuation parameter. With $\theta_0(2.25)$ as starting set, a fast computation produced the fixed point $\theta(2.25)$ for $c = 2.25$, and the solution $\theta(2.3)$ for $c = 2.3$ was then obtained rapidly. Difficulties were met for $c = 2.35$: with $\theta(2.3)$ as starting set, the fixed point procedure converged only very slowly. A suitable initialization is given by the linear predictor

$$\theta_0(2.35) = 2\,\theta(2.3) - \theta(2.25),$$

from which the solution $\theta(2.35)$ was obtained rapidly. In the same way, $\theta_0(2.4) = 2\,\theta(2.35) - \theta(2.3)$ serves for $c = 2.4$. Serious difficulties may arise from time to time, but when $B$ is positive definite the computations have generally proved problem-free.

A second way to realize the continuation, equivalent to the formula $\theta = \lambda R(\theta) + (1 - \lambda) R_0(\theta)$ with the rule $R_0$ above, is to substitute for the function $\rho_i(x, \theta_1, \ldots, \theta_g)$ the function

$$\lambda\, \rho_i(x, \theta_1, \ldots, \theta_g) + (1 - \lambda)\, \rho_{0i}(x, \theta_1, \ldots, \theta_g), \qquad \rho_{0i}(x, \theta_1, \ldots, \theta_g) = \tfrac{1}{2} (\theta_i - \theta_{0i})'(\theta_i - \theta_{0i}).$$

Possibly the greatest advantage of the continuation method is the larger insight it provides into the multiplicity of solutions, while it retains the algorithmic features of the fixed point approach. The generality of the fixed point concept, in mathematical theory as well as in its algorithmic aspects, is well illustrated by the series of papers edited by Swaminathan (1976) and by Karamardian (1977).
5. OPEN AVENUES
We present in this chapter some topics which appear to us worthy of further work.

5.1. Estimators

In section 2.2, an estimator has been regarded as a functional $T(\cdot)$ on distributions. Its influence function appears as the first-order term of the expansion of $T(g)$ around the model distribution $f$:

$$T(g) = T(f) + \int \Omega(x)\,[g(x) - f(x)]\,dx + \frac{1}{2} \iint \Omega(x, y)\,[g(x) - f(x)]\,[g(y) - f(y)]\,dx\,dy + \cdots$$

This expansion is attributed to von Mises (see Filippova, 1962). A second-order influence function $\Omega(x, y)$ has been proposed by Mallows. However, it is not clear under which precise conditions the expansion is valid and can be truncated after a few terms. The required conditions involve topologies on distributions, and attempts to establish them have so far largely failed. Most derivations therefore remain heuristic, the influence function being defined as a derivative which need not exist everywhere. Were the expansion justified, it would be possible to assess the bias and the variance of estimators for moderate sample sizes. It would also be possible to relate the jackknife, which can be seen as a heuristic expansion method, to the influence function. In our view this state of affairs is at the root of many difficulties, and it calls for further research.
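To see the expansion at work in a case where every term is known exactly, take the variance functional $T(F) = \int (x - \mu)^2 \, dF$. This is our illustrative choice, not one discussed in the text. For the contamination $g = (1 - t) f + t\,\delta_z$, the first-order term is $t\,\Omega(z)$ with $\Omega(z) = (z - \mu)^2 - \sigma^2$. The remaining error is exactly $-t^2 (z - \mu)^2$, so it shrinks like $t^2$. The sketch below checks this numerically on a small discrete distribution chosen arbitrarily.

```python
def variance_functional(points, probs):
    """T(F) = variance of the discrete distribution F = sum_j p_j delta_{x_j}."""
    mu = sum(p * x for x, p in zip(points, probs))
    return sum(p * (x - mu) ** 2 for x, p in zip(points, probs))


def contaminate(points, probs, z, t):
    """G = (1 - t) F + t delta_z, returned as (points, probs)."""
    return points + [z], [(1 - t) * p for p in probs] + [t]


points, probs = [0.0, 1.0, 2.0, 5.0], [0.4, 0.3, 0.2, 0.1]
mu = sum(p * x for x, p in zip(points, probs))
sigma2 = variance_functional(points, probs)
z = 10.0
influence = (z - mu) ** 2 - sigma2          # Omega(z) for the variance functional

for t in (0.1, 0.01, 0.001):
    exact = variance_functional(*contaminate(points, probs, z, t))
    first_order = sigma2 + t * influence
    print(t, exact - first_order)           # equals -t**2 * (z - mu)**2
```

For a robust estimator such as Huber's, the same comparison can only be made numerically, and the validity of truncating after a few terms is exactly the open question raised above.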
5.2. Sample distribution

Relatively little information is available regarding the sample distributions of robust estimators. We have already given possible ways of assessing their variance and bias (by the jackknife method) and the symmetry of their distributions (by the third moment). Furthermore, we may safely conjecture that the tendency toward normality is rapid whenever the influence function is bounded. This state of affairs appears to us very unsatisfactory, given the practical importance of the sample distribution: it limits the inference that can be drawn from a sample.

Huber (1968) has set confidence limits for robust estimates of location, and some attempts at studentization have been made since. Although they appear reasonable, they provide little insight. In this line, the paper by Hampel (1973b) on the tendency to normality is noteworthy:

"... the third and perhaps most important point seems to be entirely new. It concerns not the question where to expand, but what to expand. Most papers consider the cumulative distribution $F_n$, some the density $f_n$, but neither approach leads to very simple expressions. It shall be argued that the most natural and simple quantity to study is the derivative of the logarithm of the density, $f_n'/f_n$, and this for several reasons: ..."

His approach is practical, but whether it is of use for robust estimators remains to be demonstrated. We feel that the question is entangled in a vicious circle. We use robust estimators because we do not know the distribution of the sample at hand, yet we would have to assume a model in order to derive the distribution of the estimator. We are afraid that trying to know precisely how the sample distribution departs from a given model is fighting for the wrong issue, and that it is not wise to assume any strict model. The difficulty concerns small sample sizes only; for moderate or large sample sizes, models are not needed, given the tendency to normality. For small sample sizes, we may ask whether the jackknife method allows us to estimate the scatter and to apply studentization. The answer is positive, as we now illustrate.
We drew 5000 replicates of samples of size $n = 11$, $(u_1, \ldots, u_{11})$, from the negative exponential distribution

$$f(u) = \exp[-(u - a)] \quad \text{if } u > a, \qquad f(u) = 0 \text{ otherwise},$$

with $a = 0$. For each replicate, we estimated the location by the value $\hat\theta$ such that

$$\sum_i \psi(u_i - \hat\theta) = 0, \qquad \psi(\varepsilon) = \max[-k, \min(\varepsilon, k)].$$

The estimator $\hat\theta$ is a consistent estimator of the value $\theta_0$ which satisfies

$$\int \psi(u - \theta_0)\, f(u)\, du = 0,$$

and its asymptotic variance is $\sigma_0^2 = n^{-1} V$, with

$$V = \frac{\int \psi^2(u - \theta_0)\, f(u)\, du}{\left[ \int \psi'(u - \theta_0)\, f(u)\, du \right]^2}.$$

Through analytical derivations we have obtained $\theta_0$ and $\sigma_0^2$ as functions of the parameter $k$. In Table 3 we report, for $k = .1$ and $k = 2$, the value $\theta_0$ and, from the 5000 replicates, the mean $\bar\theta$ of the estimates, their standard deviation $\sigma_\theta$ ($\sigma_\theta^2 = \mathrm{var}(\hat\theta)$) and the jackknife standard deviation $s_\theta$. Apart from a bias which is small with respect to the standard deviation, theory and simulation agree very closely, notwithstanding the small sample size, $n = 11$.
Table 3

                           k = .1    k = 2
θ0                         .6948     .9475
θ̄                          .7370     .9530
σθ                         .3055     .2768
sθ                         .2993     .2632
θ̄ - θ0                     .0422     .0055
(θ̄ - θ0)/(σθ/√5000)        9.768     1.405
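The $\theta_0$ row of Table 3 can be recomputed from the closed form of $\int \psi(u - \theta) e^{-u} du$, and the simulation can be sketched in the same few lines. We assume that the estimator is Huber's M-estimator of location with unit scale and parameter $k$. This is our reading of the garbled text, supported by the fact that it reproduces the $\theta_0$ row. The root finding by bisection, the seeds and the function names are hypothetical choices.

```python
import math
import random
import statistics


def psi(e, k):
    """Huber's psi with unit scale: max(-k, min(e, k))."""
    return max(-k, min(e, k))


def expected_psi(theta, k):
    """E[psi(U - theta)] for U with density exp(-u), u > 0, in closed form (theta >= 0)."""
    lo, hi = max(0.0, theta - k), theta + k
    lower = -k * (1.0 - math.exp(-lo))                      # part where U < theta - k
    middle = (lo - theta + 1.0) * math.exp(-lo) - (k + 1.0) * math.exp(-hi)
    upper = k * math.exp(-hi)                               # part where U > theta + k
    return lower + middle + upper


def bisect(f, a, b, iters=200):
    """Root of a nonincreasing function f on [a, b] with f(a) >= 0 >= f(b)."""
    for _ in range(iters):
        m = 0.5 * (a + b)
        if f(m) > 0:
            a = m
        else:
            b = m
    return 0.5 * (a + b)


def theta0(k):
    """Consistent value: the root of E[psi(U - theta0)] = 0."""
    return bisect(lambda t: expected_psi(t, k), 0.0, 50.0)


def m_estimate(sample, k):
    """Sample M-estimate: the root of sum_i psi(u_i - theta) = 0."""
    return bisect(lambda t: sum(psi(u - t, k) for u in sample),
                  min(sample), max(sample), iters=60)


def simulate(k, n=11, replicates=5000, seed=1):
    """Mean and standard deviation of the M-estimates over the replicates."""
    rng = random.Random(seed)
    est = [m_estimate([rng.expovariate(1.0) for _ in range(n)], k)
           for _ in range(replicates)]
    return statistics.mean(est), statistics.stdev(est)


print(theta0(0.1), theta0(2.0))
```

With small $k$ the estimator approaches the median ($\ln 2 \approx .693$), and with large $k$ the mean (1). The two values printed should agree with the $\theta_0$ row of Table 3 to about three decimals. A run of `simulate` gives a Monte Carlo mean and standard deviation comparable with the $\bar\theta$ and $\sigma_\theta$ rows, subject to sampling error.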
We may also compare the distributions obtained for the two values of $k$. Figure 2 displays the probability density functions (pdf) of $(\hat\theta - \bar\theta)/\sigma_\theta$ derived from the 5000 replicates. The positive tails are similar, while the negative tails differ from one another, as could be expected. We refrain from further comment in this regard.

5.3. Adaptive estimators
Seeing the inadequacy of many estimators for broad classes of distributions, many authors have tried to design adaptive estimators. A test, or a set of tests, is first applied to the sample. By means of the test statistics, an estimator is then selected within a specific class, or the weights of a combination of estimators are set. To illustrate, consider the location problem for a sample $(x_1, \ldots, x_n)$ and the alternative of a heavy-tailed distribution. Each member of the class

$$\hat\theta = w\,(\text{mean}) + (1 - w)\,(\text{median})$$

is decided on the sample. The weight $w$ may be set to zero when the test favors the heavy-tail alternative and to one otherwise. Alternatively, $w$ may be a monotonously increasing function of the likelihood of the first hypothesis. A good account of these methods, with their advantages and deficiencies, is presented by Hogg (1974).
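The zero-one weighting rule can be sketched as follows. The test used here, comparing the sample kurtosis $m_4/m_2^2$ with a threshold of 4, is a hypothetical choice made for illustration. It is not one of the tests reviewed by Hogg (1974), and the threshold is arbitrary.

```python
import statistics


def sample_kurtosis(x):
    """Moment ratio m4 / m2^2 (equal to 3 for a normal population)."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((xi - mu) ** 2 for xi in x) / n
    m4 = sum((xi - mu) ** 4 for xi in x) / n
    return m4 / (m2 * m2)


def adaptive_location(x, threshold=4.0):
    """theta = w * mean + (1 - w) * median, with w = 0 when the (hypothetical)
    kurtosis test favors the heavy-tail alternative and w = 1 otherwise."""
    w = 0.0 if sample_kurtosis(x) > threshold else 1.0
    return w * statistics.mean(x) + (1.0 - w) * statistics.median(x)


print(adaptive_location([1.0, 2.0, 4.0, 7.0, 11.0]))           # light tails: the mean
print(adaptive_location([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))     # heavy tail: the median
```

Replacing the zero-one rule by a smooth, increasing function of the test statistic gives the graded weighting mentioned in the text.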
FIG. 2. Probability density functions (pdf) for k = .1 and k = 2; figure not reproduced.
More generally, a class of three estimators may be considered, for instance

$$\hat\theta = w_1\,(\text{midrange}) + w_2\,(\text{mean}) + (1 - w_1 - w_2)\,(\text{median}),$$

the midrange being appropriate for light-tailed distributions. Yohai (1974) introduces a family of Huber's estimators indexed by $k$ and proves asymptotic properties of the estimator in which $k$ is selected on the sample. The selection rules may be supported by optimality arguments drawn from various contexts, but they are man-made and involve some arbitrariness. Possibly the soundest treatment is the bayesian one, with prior distributions on the class of distributions (see Miké, 1973); the choice of the prior then becomes the source of arbitrariness. For our part, we would prefer to refrain from such a choice, or to follow the line of Dempster (1968) and infer only upper and lower bounds, but this would take us too far. The paper of Relles and Rogers (1977) is of interest in the robustness field, and its conclusion is comforting: statisticians are fairly robust estimators of location.

5.4. Recursive estimators
Mainly in time-series analysis and forecasting, it is important to have recursive estimators. In such an estimator, the estimate on a sample of size $(n + 1)$ is derived from the estimate on a sample of size $n$ and the new observation, without reprocessing the past data. The paper of Makhoul (1975) surveys the various methods of linear prediction and gives the relevant recursive expressions; these methods appear to be non-robust.

Robust recursive estimators can be designed from the Robbins-Monro stochastic approximation algorithm. Nevel'son (1975) proposes recursive estimators of type M. Evans, Kersten and Kurz (1976) develop a robust recursive estimation strategy in which the influence of gross errors is bounded. In filtering, a robust counterpart of the Kalman-Bucy filter has been proposed by Masreliez and Martin (1977). In general, however, the analysis of convergence and of stationarity is hardly accessible, and the sensitivity of the methods to the assumptions of independence and absence of serial correlation must be coped with. The robust estimation of correlation coefficients is considered by Devlin et al. (1975). In point processes, the Poisson hypothesis is frequently assumed; a related jackknife investigation is due to Gaver and Hoel (1970). How to select the scale factor so as to bound the influence of occasional outliers remains an open question.
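The simplest recursive robust estimator of location is a Robbins-Monro scheme with gain $1/n$ and Huber's $\psi$: $\theta_n = \theta_{n-1} + \psi(x_n - \theta_{n-1})/n$. The sketch below assumes unit scale and this particular gain. It is not the procedure of Nevel'son (1975) or of Evans, Kersten and Kurz (1976), and the function name is hypothetical.

```python
def recursive_huber(stream, k=1.0):
    """Recursive location estimate theta_n = theta_{n-1} + psi(x_n - theta_{n-1}) / n,
    with Huber's psi(e) = max(-k, min(e, k)) and unit scale; theta_1 = x_1."""
    it = iter(stream)
    theta = float(next(it))
    for n, x in enumerate(it, start=2):
        theta += max(-k, min(x - theta, k)) / n   # each update uses only the new observation
    return theta


pattern = [4.9, 5.1, 5.0, 4.8, 5.2, 100.0]    # one gross error in every six observations
print(recursive_huber(pattern * 200, k=1.0))
```

Each update needs only the current estimate and the new observation. The gross error moves the estimate by at most $k/n$, so its influence is bounded. With one outlier in six, the estimate settles near 5.2, the Huber location of the repeated pattern, rather than near the mean (about 20.8).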
5.5. Other views

In section 4, we have considered regression with "linear" structures, and the robust methods have helped to identify outlying observations. Deleting the outliers before a classical analysis, such as least squares, is frequently advocated. This attitude can hardly be disregarded when the identification of the outliers is clear, but robust procedures are of help when it is not. Most classical methods of multivariate analysis have a robust counterpart, in which the identification of outliers is particularly emphasized; see Gnanadesikan and Wilk (1969), Gnanadesikan and Kettenring (1972), Devlin et al. (1975) and Rohlf (1975).

Several other problems are closely connected with robust regression. The selection of variables is reviewed by Hocking (1976). Branch and bound methods have been implemented, but the problem is of a formidable combinatorial nature, as shown by Cover and van Campenhout (1977). Errors in the variables are treated, for instance, by Florens et al. (1974), and ridge regression (see Hoerl and Kennard) is also concerned with the validity of the model. Non-polynomial nonlinearities are frequently hidden in a linear model (see Mead and Pike, 1975). The robustness of designs with respect to such nonlinearities has been addressed by Box and Draper (1975), by Dyke (1974), and by Huber (1975) with a minimax approach.

Which of the two regression results described in section 4.4.3 is the better one is a matter of opinion (see Figures 3 and 4), but the difference between them is worthy of further attention.
FIG. 3 (figure not reproduced).

FIG. 4 (figure not reproduced).
6. REFERENCES
Amann,
H., Fixed point equations
ordered Banach spaces, Andrews,
D.F., A robust method
Technometrics,
and nonlinear
SIAM Roy., 18 (1976)
16 (1974)
for multiple
eigenvalue
problems
linear regression,
523531.
Andrews,
D.F., Alternative
variance
problems,
Andrews,
D.F., Bickel,
calculations
In Gupta
for regression
and analysis
of
(1975) 27.
P.J., Hampel, F.R., Huber, P.J., Rogers,
and Tukey, J.W., Robust
in
620709.
estimates
of location,
Princeton
Univ.
W.H. Press
(1972). Anscombe,
F.J., Rejection
of outliers,
Barra, J.R., Brodeau, F., Romier, developments
in statistics
Technometrics,
~ (1960)
123147.
G. and Van Cutsem, B., Recent
(edited by), NorthHolland,
Amsterdam
(1977). Beran,
R., Robust location
estimates,
Beran, R., Minimum Hellinger Ann.
Statist.,
Berger, J.O.,
Beveridge, practice, Bickel, Statist.
estimates
results
of a location vector,
G.S.G.
results
and Schechter,
P.J., Onestep Ass.,
for parametric
for generalized Ann. Statist.,
for generalized
of a location vector, Ann.
Mc GrawHill
~ (1977a)
431444. models,
445463.
J.0., Admissibility
coordinates
distance
Inadmissibility
of coordinates Berger,
~ (1977b)
Ann. Statist.,
Statist.,
estimators 302333.
Bayes estimators
~ (1976b),
R.S., Optimization
Cy., New York
Bayes
~ (1976a),
334356.
: theory and
(1970).
Huber estimates
in the linear model,
J. Am.
7__O0(1975) h2843~.
Bissel, A.F.,
and Ferguson,
R.A., The jackknife
edged weapon
? Statistician,
24 (1975) 79100.
Box, G.E.P., Nonnormality (1953) 318335.
: Toy, tool or two
and tests on variances,
Biometrika,
40
of
89 Box, G.E.P. 3~7352.
and Draper, N.R., Robust designs,
Biometrika,
6_~2 (1975)
Boyd, D.W., The power method for Lp norms, Linear Algebra Appl., (197~) 95101. Brownlee, K.A., Statistical theory and methodology engineering,
Wiley, New York, 2nd ed.
in science and
(1965).
Cargo, G.T. and Shisha, 0., Least pth powers of deviations, Approximation Thy,
J.
15 (1975) 335355.
Chert, C.H., On information and distance measures, feature selections,
error bounds and
Information Sciences, 10 (1976)
15917h.
Coleman, D., Holland, P., Kaden, N., Klema, V. and Peters, system of subroutines
for iteratively reweighted
computations, manuscript,
Mass.
Inst. Techn.,
S.C., A
least squares
(1977).
Collins, J.R., Robust estimation of a location parameter in the presence of asymmetry, Ann.
Statist., k (1976) 6885.
Cover, T.M.
and van Campenhout,
measurement
selection problem,
J.M., On the possible orderings IEEE Trans.
Syst. Man Cyb., SMC7
in the (1977)
657661. Daniel,
C. and Wood, F.S., Fitting
equations to data, Wiley, New York
(1971). Dempster, A.P., Estimation (1966) 315332.
in multivariate
Dempster, A.P., A generalization Statist.
Soc., B30
analysis,
of bayesian
In Krishnaiah
inference,
J. Roy.
(1968) 205232.
Dempster, A.P., A subjectivist
look at robustness,
In I.S.I., ! (1975)
3hg37h. Dempster, A.P., Examples relevant to the robustness inferences, Denby,
In Gupta and Moore
(1977)
L. and Mallows, C.L., Two diagnostic
regression
analysis.
of applied
121138. displays
Technometrics, 19 (1977)
113.
for robust
90 Devlin, S.J., Gnanadesikan, R. and Kettenring, J.R., Robust estimation and outlier detection with correlation coefficients, Biometrika, 6_~2 (1975)
5315~5.
Draper, N.R. and Smith, H., Applied regression York (1966).
analysis,
Dutter, R., Algorithms for the Huber estimator in multiple regression, Computing, 18 (1977) 167-176.
Dyke, G.V., Designs to minimize loss of information in polynomial regression, Appl. Statist., 23 (1974) 295-299.
Eisenhart, C., The development of the concept of the best mean of a set of measurements from antiquity to the present day, 1971 Am. Statist. Ass. Presidential Address, Audience notes (1971).
Ekblom, H., Calculation of linear best Lp approximations, BIT (Nordisk Tidskr. Informationsbehandling), 13 (1973) 292-300.
Ekblom, H., Lp-methods for robust regression, BIT (Nordisk Tidskr. Informationsbehandling), 14 (1974) 22-32.
Evans, J.G., Kersten, P. and Kurz, L., Robust recursive estimation with applications, Information Sciences, 11 (1976) 69-92.
Feller, W., An introduction to probability theory and its applications, Wiley, New York, Vol. 2 (1966).
Ferguson, R.A., Fryer, J.G. and Mc Whinney, I.A., On the estimation of a truncated normal distribution, In I.S.I. (1975) 259-263.
Filippova, A.A., Mises' theorem on the asymptotic behavior of functionals of empirical distribution functions and its statistical applications, Theor. Probab. Appl., 7 (1962) 24-57.
Fine, T.L., Theories of probability: an examination of foundations, Academic Press, New York (1973).
Fletcher, R., Grant, J.A. and Hebden, M.D., The continuity and differentiability of the parameters of best linear Lp approximations, J. Approximation Thy, 10 (1974) 69-73.
Fletcher, R. and Powell, M.J.D., A rapidly convergent descent method for minimization, Computer J., 6 (1963) 163-168.
Florens, J.P., Mouchart, M. and Richard, J.F., Bayesian inference in error-in-variables models, J. Multivariate Anal., 4 (1974) 419-452.
Forsythe, A.B., Robust estimation of straight line regression coefficients by minimizing pth power deviations, Technometrics, 14 (1972) 159-166.
Garel, B., Détection des valeurs aberrantes dans un échantillon gaussien multidimensionnel, Thèse, Univ. Grenoble (1976).
Gaver, D.P. and Hoel, D.G., Comparison of certain small-sample Poisson probability estimates, Technometrics, 12 (1970) 835-850.
Gentleman, W.M., Robust estimation of multivariate location by minimizing pth power deviations, Ph.D. Dissertation, Princeton University, and Memorandum MM 65-1215-16, Bell Tel. Labs. (1965).
Gnanadesikan, R. and Kettenring, J.R., Robust estimates, residuals, and outlier detection with multiresponse data, Biometrics, 28 (1972) 81-124.
Gnanadesikan, R. and Wilk, M.B., Data analytic methods in multivariate statistical analysis, In Krishnaiah (1969) 593-638.
Gray, H.L. and Schucany, W.R., The generalized jackknife statistic, Marcel Dekker, New York (1972).
Gray, H.L., Schucany, W.R. and Watkins, T.A., On the generalized jackknife and its relation to statistical differentials, Biometrika, 62 (1975) 637-642.
Gray, H.L., Schucany, W.R. and Woodward, W.A., Best estimates of functions of the parameters of the gaussian and the gamma distributions, IEEE Trans. Reliability, R-25 (1976) 95-99.
Gross, A.M., Confidence intervals for bisquare regression estimates, J. Am. Statist. Ass., 72 (1977) 341-354.
Gross, A.M. and Tukey, J.W., The estimators of the Princeton Robustness Study, Princeton Univ., Techn. Rept. 382 (1973).
Gupta, R.P., Applied Statistics (edited by), North Holland, Amsterdam (1975).
Gupta, S.S. and Moore, D.S., Statistical decision theory and related topics (edited by), Academic Press (1977).
Hampel, F.R., Contributions to the theory of robust estimation, Ph.D. Dissertation, Univ. California, Berkeley (1968).
Hampel, F.R., A general qualitative definition of robustness, Ann. Math. Statist., 42 (1971) 1887-1896.
Hampel, F.R., Robust estimation: A condensed partial survey, Z. Wahrscheinlichkeitstheorie verw. Geb., 27 (1973a) 87-104.
Hampel, F.R., Some small sample asymptotics, Proceed. Prague Symp. Asymptotic Statist., (1973b) 109-126.
Hampel, F.R., The influence curve and its role in robust estimation, J. Am. Statist. Ass., 69 (1974) 383-393.
Hampel, F.R., Beyond location parameters: Robust concepts and methods, In I.S.I., 1 (1975) 375-382.
Hampel, F.R., On the breakdown points of some rejection rules with mean, E.T.H. Zurich, Res. Rept 11 (1976).
Harter, H.L., The method of least squares and some alternatives, Int. Statist. Rev., 42 (1974) 147-174, 235-264, 282; 43 (1975) 1-44, 125-190, 269-272, 273-278; 44 (1976) 113-159.
Henrici, P., Applied and computational complex analysis, Vol. I, Wiley, New York (1974).
Hill, R.W., Robust regression when there are outliers in the carriers, Ph.D. dissertation, Harvard University (1977).
Hille, E., Analytic function theory, Blaisdell, New York, Vol. I (1959).
Hocking, R.R., The analysis and selection of variables in linear regression, Biometrics, 32 (1976) 1-49.
Hoerl, A.E. and Kennard, R.W., Ridge regression: iterative estimation of the biasing parameter, Commun. Statist. - Theor. Meth., A5 (1976) 77-88.
Hogg, R.V., Adaptive robust procedures: A partial review and some suggestions for future applications and theory, J. Am. Statist. Ass., 69 (1974) 909-923.
Huber, P.J., Robust estimation of a location parameter, Ann. Math. Statist., 35 (1964) 73-101.
Huber, P.J., Robust confidence limits, Z. Wahrscheinlichkeitstheorie verw. Geb., 10 (1968) 269-270.
Huber, P.J., Théorie de l'inférence statistique robuste, Presses Univ. Montreal (1969).
Huber, P.J., Studentizing robust estimates, In Puri (1970) 453-463.
Huber, P.J., Robust statistics: A review, Ann. Math. Statist., 43 (1972) 1041-1067.
Huber, P.J., Robust regression: Asymptotics, conjectures and Monte Carlo, Ann. Statist., 1 (1973) 799-821.
Huber, P.J., Robustness and designs, In Srivastava (1975) 287-301.
Huber, P.J., Robust covariances, In Gupta and Moore (1977a) 165-191.
Huber, P.J., Robust methods of estimation of regression coefficients, Math. Operationsforsch. Statist., Ser. Statistics, 8 (1977b) 41-53.
Huber, P.J. and Dutter, R., Numerical solution of robust regression problems, COMPSTAT 1974, 165-172.
Hwang, S.Y., On monotonicity of Lp and lp norms, IEEE Trans. Electroacoustic Speech Signal Proc., ASSP-23 (1975) 593-596.
I.S.I., Proceedings of the 40th Session, Warsaw 1975, 4 books, Bull. Int. Stat. Inst., 46 (1975).
Jackson, D., Note on the median of a set of numbers, Bull. Am. Math. Soc., 27 (1921) 160-164.
Jaeckel, L.A., Robust estimates of location: Symmetry and asymmetric contamination, Ann. Math. Statist., 42 (1971) 1020-1034.
Jaeckel, L.A., The infinitesimal jackknife, Bell Lab. Memorandum, MM1215 (1972a).
Jaeckel, L.A., Estimating regression coefficients by minimizing the dispersion of the residuals, Ann. Math. Statist., 43 (1972b) 1449-1458.
Kanal, L., Patterns in pattern recognition, IEEE Trans. Information Theory, IT-20 (1974) 697-722.
Karamardian, S., Fixed points: algorithms and applications (edited by), Academic Press (1977).
Kellogg, R.B., Li, T.Y. and Yorke, J., A constructive proof of the Brouwer fixed point theorem and computational results, SIAM J. Numer. Anal., 13 (1976) 473-483.
Kendall, M.G. and Buckland, W.R., A dictionary of statistical terms, 3rd ed. (prepared for the International Statistical Institute), Oliver and Boyd, Edinburgh (1971).
Kersten, P. and Kurz, L., Robustized vector Robbins-Monro algorithm with applications to M-interval detection, Information Sciences, 11 (1976) 121-140.
Komlós, J., Major, P. and Tusnády, G., Weak convergence and embedding, In Revesz (1975) 149-165.
Krishnaiah, P.R., Multivariate analysis (edited by), Academic Press (1966).
Krishnaiah, P.R., Multivariate analysis II (edited by), Academic Press (1969).
Lachenbruch, P.A., On expected probabilities of misclassification in discriminant analysis, necessary sample size, and a relation with the multiple correlation coefficient, Biometrics, 24 (1968) 823-834.
Lewis, J.T. and Shisha, O., Lp-convergence of monotone functions and their uniform convergence, J. Approximation Thy, 14 (1975) 281-284.
Makhoul, J., Linear prediction: a tutorial review, Proc. IEEE, 63 (1975) 693-708.
Mallows, C.L., On some topics in robustness, I.M.S. meeting, Rochester, May (1975).
Marcus, M.B. and Sacks, J., Robust designs for regression problems, In Gupta and Moore (1977) 245-268.
Maronna, R.A., Robust M-estimators of multivariate location and scatter, Ann. Statist., 4 (1976) 51-67.
Masreliez, C.J. and Martin, R.D., Robust bayesian estimation for the linear model and robustifying the Kalman filter, IEEE Trans. Aut. Contr., AC-22 (1977) 361-371.
Mead, R. and Pike, D.J., A review of response surface methodology from a biometric viewpoint, Biometrics, 31 (1975) 803-851.
Merle, G. and Späth, H., Computational experiences with discrete Lp approximation, Computing, 12 (1974) 315-321.
Miké, V., Robust Pitman-type estimators of location, Ann. Inst. Statist. Math., 25 (1973) 65-86.
Miller, R.G., The jackknife - a review, Biometrika, 61 (1974a) 1-15.
Miller, R.G., An unbalanced jackknife, Ann. Statist., 2 (1974b) 880-891.
Miller, R.G., Jackknifing censored data, Stanford University, Biostatistics Div., Techn. Rept., 14 (1975).
Mosteller, F., The jackknife, Rev. Int. Statist. Inst., 39 (1971) 363-368.
Munster, M., Théorie générale de la mesure et de l'intégration, Bull. Soc. Roy. Sc. Liège, 43 (1974) 526-567.
N.C.H.S., Annotated bibliography on robustness studies of statistical procedures, U.S. Dept. Health Educ. Welf., Publication (HSM) 72-1051 (1972).
Nevel'son, M.B., On the properties of the recursive estimates for a functional of an unknown distribution function, In Revesz (1975) 227-251.
Olkin, I., Contributions to probability and statistics (edited by), Stanford Univ. Press (1960).
Pearsall, E.S., Best subset regression by branch and bound, Internal report, Wayne State University (1977).
Prokhorov, Y.V., Convergence of random processes and limit theorems in probability theory, Theor. Probab. Appl., 1 (1956) 157-214.
Puri, M.L., Nonparametric techniques in statistical inference (edited by), Cambridge Univ. Press (1970).
Quenouille, M.H., Notes on bias in estimation, Biometrika, 43 (1956) 353-360.
Rao, C.R., Linear statistical inference and its applications, second edition, Wiley, New York (1973).
Reeds, J.A., On the definition of von Mises functionals, Ph.D. Dissertation and Res. Rept. S44, Harvard University (1976).
Relles, D.A. and Rogers, W.H., Statisticians are fairly robust estimators of location, J. Am. Statist. Ass., 72 (1977) 107-111.
Revesz, P., Limit theorems of probability theory (edited by), North-Holland, Amsterdam (1975).
Rey, W., Robust estimates of quantiles, location and scale in time series, Philips Res. Repts, 29 (1974) 67-92.
Rey, W., On least pth power methods in multiple regressions and location estimations, BIT (Nordisk Tidskr. Informationsbehandling), 15 (1975a) 174-184.
Rey, W., Mean life estimation from censored samples, Biom. Praxim., 15 (1975b) 145-159.
Rey, W., M-estimators in robust regression, MBLE Research Laboratory, Brussels, R329 (1976).
Rey, W.J.J., M-estimators in robust regression - a case study, In Barra et al. (1977) 591-594.
Rey, W.J.J. and Martin, L.J., Estimation of hazard rate from samples, In I.S.I., 4 (1975) 238-240.
Rohlf, F.J., Generalization of the gap test for the detection of multivariate outliers, Biometrics, 31 (1975) 93-101.
Ronner, A.E., P-norm estimators in a linear regression model, Dissertation, Groningen University, The Netherlands (1977).
Scarf, H., The computation of economic equilibria, Yale Univ. Press, New Haven and London (1973).
Scholz, F.W., A comparison of efficient location estimators, Ann. Statist., 2 (1974) 1323-1326.
Schucany, W.R., Gray, H.L. and Owen, D.B., On bias reduction in estimation, J. Am. Statist. Ass., 66 (1971) 524-533.
Schwartz, L., Analyse numérique: topologie générale et analyse fonctionnelle, Hermann, Paris (1970).
Sen, P.K., Some invariance principles relating to jackknifing and their role in sequential analysis, Ann. Statist., 5 (1977) 316-329.
Sharot, T., The generalized jackknife: finite samples and subsample sizes, J. Am. Statist. Ass., 71 (1976a) 451-454.
Sharot, T., Sharpening the jackknife, Biometrika, 63 (1976b) 315-321.
Smith, V.K., A simulation analysis of the power of several tests for detecting heavy-tailed distributions, J. Am. Statist. Ass., 70 (1975) 662-665.
Srivastava, J.N., A survey of statistical design and linear models (edited by), North-Holland, Amsterdam (1975).
Stein, C., A necessary and sufficient condition for admissibility, Ann. Math. Statist., 26 (1955) 518-522.
Stigler, S.M., Simon Newcomb, Percy Daniell and the history of robust estimation: 1885-1920, J. Am. Statist. Ass., 68 (1973) 872-879.
Swaminathan, S., Fixed point theory and its applications (edited by), Academic Press (1976).
Takeuchi, K., A survey of robust procedures, especially for the linear model with emphasis toward occasional outliers, In I.S.I., 1 (1975) 336-348.
Thorburn, D., Some asymptotic properties of jackknife statistics, Biometrika, 63 (1976) 305-313.
Thucydides, History of the Peloponnesian war, 3 (428 B.C.) para 20, In Eisenhart (1971).
Todd, M.J., The computation of fixed points and applications, Lecture Notes in Econ. and Math. Syst., 124, Springer Verlag, Berlin (1976).
Tollet, I.H., Robust forecasting, IEEE Int. Conf. on Cybernetics & Society, Washington (1976) 600-605.
Tukey, J.W., Bias and confidence in not-quite large samples (abstract), Ann. Math. Statist., 29 (1958) 614.
Tukey, J.W., A survey of sampling from contaminated distributions, In Olkin (1960) 448-485.
Vajda, S., Mathematical programming, Addison-Wesley Inc., Reading, Mass. (1961).
Von Mises, R., On the asymptotic distribution of differentiable statistical functions, Ann. Math. Statist., 18 (1947) 309-348.
Woodruff, R.S. and Causey, B.D., Computerized method for approximating the variance of a complicated estimate, J. Am. Statist. Ass., 71 (1976) 315-321.
Yale, C. and Forsythe, A.B., Winsorized regression, Technometrics, 18 (1976) 291-300.
Yohai, V.J., Robust estimation in the linear model, Ann. Statist., 2 (1974) 562-567.
Youden, W.J., Enduring values, Technometrics, 14 (1972) 1-11.
Ypelaar, A. and Velleman, P.F., The performance of robust regression procedures, Economic and social statistics technical reprint series, 877/007, Cornell University (1977).
APPENDIX

The following pages are devoted to specific developments which would have unduly loaded the main text. They are essentially the derivations involved in the research. The various sections are ranked in alphabetical order to ease consultation of the appendix. Many cross-references are made between the sections, and wherever they may be required, the entries concerned are noted in italic type.
Consistent estimator

An estimate $t_n$ of a vector parameter $\theta$, based on a sample of size n, is said to be consistent when it converges in probability to $\theta$ as the sample size n increases, i.e.

$$\operatorname{plim}_{n \to \infty} \rho(t_n, \theta) = 0,$$

where $\rho(\cdot,\cdot)$ is the metric of the parameter space $\Theta$ (see Kendall and Buckland, 1971). In other terms, $t_n$ is a consistent estimator if, and only if,

$$\lim_{n \to \infty} d(f_n, f) = 0,$$

where $d(\cdot,\cdot)$ is the Prokhorov metric, $f_n$ is the distribution of $t_n$ and $f$ is the Dirac distribution centered on $\theta$.

It is useful to bound $d(f_n, f)$ in terms of the moment of positive order e of $\rho(t_n, \theta)$,

$$m_e = E\{[\rho(t_n, \theta)]^e\}.$$

Since $f$ is a Dirac distribution, $d(f_n, f) = \inf\{\varepsilon > 0 : \operatorname{Prob}\{\rho(t_n, \theta) \ge \varepsilon\} \le \varepsilon\}$. The Chebyshev-like inequality

$$\operatorname{Prob}\{\rho(t_n, \theta) < \varepsilon\} \ge 1 - m_e/\varepsilon^e$$

then gives $\operatorname{Prob}\{\rho(t_n, \theta) \ge \varepsilon\} \le \varepsilon$ as soon as $m_e/\varepsilon^e \le \varepsilon$, i.e. as soon as $\varepsilon \ge m_e^{1/(e+1)}$. This leads to

$$d(f_n, f) \le m_e^{1/(e+1)}.$$
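The moment bound can be checked by simulation. The Python sketch below is an illustration added here, not the author's computation. It takes $t_n$ to be the mean of n standard normal observations, so that $\theta = 0$, $\rho(t_n, \theta) = |t_n|$ and $m_2 = 1/n$. It also uses the fact that, f being a Dirac distribution, $d(f_n, f) = \inf\{\varepsilon > 0 : \operatorname{Prob}\{\rho(t_n,\theta) \ge \varepsilon\} \le \varepsilon\}$. The helper names, sample sizes and number of replicates are arbitrary, hypothetical choices.

```python
import random


def prokhorov_to_dirac(distances):
    """Empirical Prokhorov distance between the distribution of an estimator
    and the Dirac distribution centered on theta.

    `distances` holds replicated values of rho(t_n, theta).  For a Dirac
    target, d(f_n, f) = inf{eps > 0 : Prob{rho >= eps} <= eps}; the infimum
    is located by bisection on [0, 1], since the distance never exceeds 1.
    """
    r = len(distances)
    lo, hi = 0.0, 1.0
    for _ in range(60):
        eps = 0.5 * (lo + hi)
        tail = sum(1 for x in distances if x >= eps) / r
        if tail <= eps:
            hi = eps  # condition satisfied: the infimum is at most eps
        else:
            lo = eps
    return hi


def simulate_mean_errors(n, replicates, seed=1):
    """rho(t_n, 0) = |sample mean| for replicated N(0, 1) samples of size n."""
    rng = random.Random(seed)
    return [abs(sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n)
            for _ in range(replicates)]


for n in (10, 100, 1000):
    d = prokhorov_to_dirac(simulate_mean_errors(n, 500))
    bound = (1.0 / n) ** (1.0 / 3.0)  # m_e^(1/(e+1)) with e = 2 and m_2 = 1/n
    print(f"n = {n:5d}   empirical d = {d:.4f}   bound = {bound:.4f}")
```

With e = 2 the bound decreases as $n^{-1/3}$, and the empirical distances should lie below it, as the inequality requires. The bound is conservative because it only uses one moment of $\rho(t_n, \theta)$.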
Contaminated normal

Wishing to assess the robustness of several estimators, Tukey (1960) compares their performances on the normal distribution and on contaminated normal distributions. We recall his findings and augment them with our own observations.

He performed the estimations from a sample drawn from the normal distribution $N(\mu, \sigma^2)$ and from a sample drawn from a normal distribution contaminated by extraneous observations. His model of contaminated normal distribution has the form

$$(1 - \varepsilon)\, N(\mu, \sigma^2) + \varepsilon\, N(\mu_1, \sigma_1^2).$$

For $\mu_1 = \mu$ the symmetry is maintained. The distribution then has longer tails than the normal when $\sigma_1 > \sigma$, and shorter tails when $\sigma_1 < \sigma$. For $\mu_1 \ne \mu$ the distribution is asymmetric.

The known findings are that a slightly contaminated normal distribution, which would still appear normally distributed to most readers, may lead to unexpected performances of the estimators. Under normality, estimating location by the median instead of the mean leads to an efficiency loss of 36 %, but a contamination of about 10 % is sufficient to balance the mean and the median. Concerning scale, the mean deviation has an efficiency loss of 12 % with respect to the standard deviation under normality, but a contamination of one or two thousandths is sufficient for the mean deviation to perform better than the standard deviation.

More precisely, the following tables give the results obtained for some specific values of $\varepsilon$. Table 4 is relative to the symmetric model $(1 - \varepsilon) N(0, 1) + \varepsilon N(0, 3^2)$, whereas Table 5 is relative to the asymmetric model $(1 - \varepsilon) N(0, 1) + \varepsilon N(2, 3^2)$.

Location: the mean and the median have been compared on the basis of their asymptotic variances (var). The tables also give the respective values of the parameters estimated.

Scale: the three estimators, namely the standard deviation, the mean deviation to the mean and the median deviation to the median, constitute a natural basis for comparison. Their performances have been assessed by their asymptotic coefficients of variation (c.v.), i.e. the asymptotic standard deviation of the estimator, scaled by $\sqrt{n}$, relative to its expected value. The semi-interquartile range has also been computed but is not reported, as its performances appeared similar to (or slightly poorer than) those of the median deviation to the median.

For each $\varepsilon$, the tables also give a "discriminatory sample size". We suppose that the observations are drawn from $H_1 = (1 - \varepsilon) N(0, 1) + \varepsilon N(0 \text{ or } 2, 3^2)$, and we test $H_0 = N(\mu_\varepsilon, \sigma_\varepsilon^2)$ against $H_1$, where $\mu_\varepsilon$ and $\sigma_\varepsilon$ are the mean and the standard deviation of $H_1$. The discriminatory sample size is the minimum sample size such that the likelihood ratio test discriminates $H_1$ from $H_0$ with probability 0.95. It has been obtained by numerical integrations. This approach is rather simple-minded: realistically, the parameters of $H_0$ would have to be estimated from the sample, and the discriminatory sample sizes would then be larger.

Inspection of the tables reveals that the efficiencies of these estimators depend strongly on the shape of the distribution. In particular, a slight long-tail contamination is sufficient to significantly impair the optimality of the mean and of the standard deviation. Moreover, large samples must be available in order to test whether such a contamination is present.
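Symmetric model: (1 - ε) N(0, 1) + ε N(0, 3²)

Location: value and asymptotic variance (var) of the mean and of the median.
Scale: value and coefficient of variation (c.v.) of the standard deviation (SD), of the mean deviation to the mean (MD) and of the median deviation to the median (MedD).
N(0.95): discriminatory sample size at probability 0.95.

ε      | Mean   | var    | Median | var    | SD     | c.v.   | MD     | c.v.   | MedD   | c.v.   | N(0.95)
0.0000 | 0.0000 | 1.0000 | 0.0000 | 1.5708 | 1.0000 | 0.7071 | 0.7979 | 0.7555 | 0.6745 | 1.1664 | ∞
0.0018 | 0.0000 | 1.0140 | 0.0000 | 1.5745 | 1.0070 | 0.7627 | 0.8007 | 0.7627 | 0.6754 | 1.1668 | 7880
0.0282 | 0.0000 | 1.2252 | 0.0000 | 1.6315 | 1.1069 | 1.1725 | 0.8428 | 0.8514 | 0.6891 | 1.1725 | 497
0.1006 | 0.0000 | 1.8047 | 0.0000 | 1.8042 | 1.3434 | 1.3540 | 0.9584 | 0.9822 | 0.7297 | 1.1899 | 137
0.2436 | 0.0000 | 2.9488 | 0.0000 | 2.2389 | 1.7172 | 1.2317 | 1.1866 | 1.0461 | 0.8259 | 1.2317 | 83
0.5235 | 0.0000 | 5.1882 | 0.0000 | 3.7066 | 2.2778 | 0.9720 | 1.6333 | 0.9720 | 1.1089 | 1.3435 | 123
0.8141 | 0.0000 | 7.5130 | 0.0000 | 7.5130 | 2.7410 | 0.7929 | 2.0970 | 0.8417 | 1.6167 | 1.3600 | 638

Table 4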
Asymmetric model: (1 - ε) N(0, 1) + ε N(2, 3²)

Columns as in Table 4.

ε      | Mean   | var    | Median | var    | SD     | c.v.   | MD     | c.v.   | MedD   | c.v.   | N(0.95)
0.0000 | 0.0000 | 1.0000 | 0.0000 | 1.5708 | 1.0000 | 0.7071 | 0.7979 | 0.7555 | 0.6745 | 1.1664 | ∞
0.0008 | 0.0016 | 1.0097 | 0.0005 | 1.5727 | 1.0049 | 0.7611 | 0.7996 | 0.7611 | 0.6749 | 1.1666 | 8760
0.0115 | 0.0230 | 1.1377 | 0.0072 | 1.5977 | 1.0666 | 1.1694 | 0.8222 | 0.8263 | 0.6810 | 1.1694 | 769
0.0617 | 0.1234 | 1.7254 | 0.0401 | 1.7254 | 1.3136 | 1.5176 | 0.9301 | 0.9973 | 0.7115 | 1.1838 | 121
0.2228 | 0.4456 | 3.4753 | 0.1657 | 2.2901 | 1.8682 | 1.2520 | 1.2841 | 1.0525 | 0.8368 | 1.2520 | 51
0.6246 | 1.2492 | 6.9348 | 0.7409 | 6.9348 | 2.6334 | 0.8172 | 2.0487 | 0.8077 | 1.4726 | 1.3731 | 119
0.6878 | 1.3756 | 7.3613 | 0.8986 | 8.7865 | 2.7132 | 0.7837 | 2.1355 | 0.7837 | 1.6093 | 1.3080 | 164

Table 5
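The location entries and the c.v. of the standard deviation in both tables follow from closed-form moments of the normal mixture, which makes them easy to check. The Python sketch below is a hypothetical re-computation, not the author's program. It takes the mixture $(1 - \varepsilon) N(0, 1) + \varepsilon N(\mu_1, 3^2)$ and uses three asymptotic formulas: $n\,\mathrm{var}(\text{mean}) = \sigma^2$, $n\,\mathrm{var}(\text{median}) = 1/[4 f(\text{median})^2]$, and c.v.(standard deviation) $= \sqrt{\mu_4/\sigma^4 - 1}/2$. The median itself is located by bisection. The mean and median deviations and the discriminatory sample sizes are not reproduced.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mixture(eps, mu1, sigma1=3.0):
    """(weight, mean, sd) of each component of (1 - eps) N(0, 1) + eps N(mu1, sigma1^2)."""
    return [(1.0 - eps, 0.0, 1.0), (eps, mu1, sigma1)]


def mean_and_var(eps, mu1):
    """Mean of the mixture and asymptotic n * var of the sample mean (= population variance)."""
    comps = mixture(eps, mu1)
    m = sum(w * mu for w, mu, _ in comps)
    v = sum(w * (s * s + (mu - m) ** 2) for w, mu, s in comps)
    return m, v


def median_and_var(eps, mu1):
    """Median of the mixture (bisection) and asymptotic n * var of the sample median."""
    comps = mixture(eps, mu1)

    def cdf(x):
        return sum(w * Phi((x - mu) / s) for w, mu, s in comps)

    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    med = 0.5 * (lo + hi)
    density = sum(w * phi((med - mu) / s) / s for w, mu, s in comps)
    return med, 1.0 / (4.0 * density ** 2)


def sd_cv(eps, mu1):
    """Asymptotic c.v. of the standard deviation: sqrt(mu4 / sigma^4 - 1) / 2."""
    m, v = mean_and_var(eps, mu1)
    mu4 = sum(w * ((mu - m) ** 4 + 6.0 * (mu - m) ** 2 * s * s + 3.0 * s ** 4)
              for w, mu, s in mixture(eps, mu1))
    return math.sqrt(mu4 / v ** 2 - 1.0) / 2.0


for mu1, eps_values in ((0.0, (0.0, 0.0018, 0.0282, 0.1006, 0.2436, 0.5235, 0.8141)),
                        (2.0, (0.0, 0.0008, 0.0115, 0.0617, 0.2228, 0.6246, 0.6878))):
    print(f"contamination N({mu1:g}, 3^2)")
    for eps in eps_values:
        m, mv = mean_and_var(eps, mu1)
        med, dv = median_and_var(eps, mu1)
        print(f"  eps={eps:.4f}  mean={m:.4f} var={mv:.4f}  "
              f"median={med:.4f} var={dv:.4f}  sd c.v.={sd_cv(eps, mu1):.4f}")
```

In the symmetric case the median variance has the closed form $(\pi/2)/(1 - 2\varepsilon/3)^2$. Equating it with $1 + 8\varepsilon$ gives the two crossovers listed in Table 4, near $\varepsilon = 0.10$ and $\varepsilon = 0.81$.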
Distribution space

All the distributions we are concerned with are included in a probability space E, which is a complete convex metric space defined in a Banach space. See Schwartz (1970, Ch. 11) and Hille (1959, Sec. 4.7).

For any three functions f, g and h belonging to the space E, we define a distance function $d(\cdot,\cdot)$ having the ordinary properties of a metric:

$$d(f,g) \ge 0, \qquad d(f,g) = 0 \Leftrightarrow f = g, \qquad d(f,g) = d(g,f), \qquad d(f,g) \le d(f,h) + d(h,g).$$

The space is complete, i.e. every Cauchy sequence $\{f_n\}$ of E has its limit in E. In other terms,

$$\lim \{f_n\} = f \Rightarrow f \in E.$$

The space is convex:

$$f, g \in E \Rightarrow h = t f + (1 - t)\, g \in E, \qquad 0 \le t \le 1, \ t \in \mathbb{R}.$$

With the usual terminology of probability theory, we associate a probability measure with each distribution. Let $\Omega$ be the whole sample space and $\omega \subset \Omega$ any subset. The measure associated with f is

$$F(\omega) = \int_\omega f(x)\, dx, \qquad F(\Omega) = 1.$$

The distribution f need not be a frequency function in the ordinary sense. It can be a probability density function as well as a discrete distribution defined as a sum of Dirac functions on the sample space, with dx then understood as a unit measure in the appropriate sense. For our purposes, we also require the sample space $\Omega$ to be a metric space. The axiomatic construction of probability theory is due to Kolmogorov; see Feller (1966) and, for a critical appraisal, Fine (1973, Chap. 3).

The choice of metric is critical. We have found it essential to consider both types of distributions, discrete and continuous, simultaneously. Estimations are performed on observed data, i.e. on discrete distributions, whereas the underlying theory involves continuous distributions. Apart from the Prokhorov metric, the metrics we have found reported appeared either too constraining or too difficult to manipulate. For instance, Beran (1977a, 1977b) has introduced estimation by minimizing the Hellinger distance. That distance relates to probability density functions (pdf) and thus requires a continuous substitute for the observed discrete distribution. The Kolmogorov and Lévy metrics relate to cumulative distribution functions (cdf). They are satisfactory when $\Omega$ is the real axis, but their generalization to a p-dimensional space $\Omega = \mathbb{R}^p$ is difficult. The Prokhorov metric escapes these constraints and handles discrete and continuous distributions simultaneously, in a sample space of any dimension. This justifies our use of the Prokhorov metric in most arguments of this work.
Influence function

For a functional $T(\cdot)$ which is regular in the vicinity of the distribution f, the von Mises expansion in terms of derivatives is possible. Let g be a distribution in this vicinity; then

$$T(g) = T(f) + \int \lambda(\xi)\, g(\xi)\, d\xi + \cdots$$

Truncation at the first-order term becomes more valid as the Prokhorov distance $d(f, g)$ decreases. Hampel named the function $\lambda(\xi)$, which depends on the distribution f, the "influence function". It is easily evaluated through

$$\lambda(\xi_0) = \int \lambda(\xi)\, \delta(\xi - \xi_0)\, d\xi = \lim_{t \to 0} \{T[(1 - t) f + t\, \delta(\xi - \xi_0)] - T(f)\}/t, \qquad t \in \mathbb{R},$$

where $\delta(\xi - \xi_0)$ is the Dirac distribution centered on $\xi = \xi_0$. The second equality uses $\int \lambda(\xi) f(\xi)\, d\xi = 0$, which follows from the expansion taken at $g = f$. The influence function plays an important role in the assessment of robustness.
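The limit defining the influence function can be approximated by a finite difference when f is represented by a discrete distribution, i.e. a weighted sum of Dirac masses. The Python sketch below is only an illustration; its helper names are hypothetical. It forms the mixture $(1 - t) f + t\,\delta(\xi - \xi_0)$ explicitly for a small t. For the mean, the exact influence function is $\xi_0 - T(f)$. For the variance it is $(\xi_0 - \mu)^2 - \sigma^2$, and the finite differences approach both.

```python
def mean_functional(points, weights):
    """T(f): mean of a discrete distribution given as weighted Dirac masses."""
    return sum(w * x for x, w in zip(points, weights)) / sum(weights)


def variance_functional(points, weights):
    """T(f): variance of the same discrete distribution."""
    m = mean_functional(points, weights)
    return sum(w * (x - m) ** 2 for x, w in zip(points, weights)) / sum(weights)


def influence(T, points, weights, xi0, t=1e-6):
    """Finite-difference value of {T[(1 - t) f + t delta(xi - xi0)] - T(f)} / t."""
    total = sum(weights)
    mixed_points = list(points) + [xi0]
    mixed_weights = [(1.0 - t) * w / total for w in weights] + [t]
    return (T(mixed_points, mixed_weights) - T(points, weights)) / t


f_points, f_weights = [-1.0, 0.0, 1.0], [1.0, 1.0, 1.0]   # mean 0, variance 2/3
for xi0 in (0.0, 1.0, 3.0):
    print(xi0,
          influence(mean_functional, f_points, f_weights, xi0),       # about xi0 - 0
          influence(variance_functional, f_points, f_weights, xi0))   # about xi0**2 - 2/3
```

The influence of the variance grows quadratically with $\xi_0$. This is the analytic counterpart of the sensitivity of the standard deviation to long-tail contamination.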
Jackknife

The so-called jackknife technique provides a recipe for estimating the bias and the covariance of estimators. This section derives the method, as well as the infinitesimal jackknife, and justifies them for suitably regular functionals. For convenience of notation, we suppose a vector-valued functional.

Let f be the underlying distribution, which is unknown, and g an empirical distribution in the vicinity of f. For a regular functional $T(\cdot)$, the von Mises expansion of $T(g)$ in terms of derivatives is possible:

$$T(g) = T(f) + \int \lambda(\xi)\, g(\xi)\, d\xi + \frac{1}{2} \iint \Lambda(\xi, \zeta)\, g(\xi)\, g(\zeta)\, d\xi\, d\zeta + \cdots$$

In the estimation context, $\theta = T(f)$ is some vector parameter and $\hat\theta = T(g)$ is its estimate, with

$$g = \sum_i w_i\, \delta(\xi - x_i) \Big/ \sum_i w_i, \qquad X = (x_1, \ldots, x_n), \qquad w_i > 0.$$

The script $\delta(\xi - x_i)$ stands for the Dirac distribution centered on the observation $x_i$, and each observation $x_i$ is given a nonnegative weight $w_i$. Introducing this expression of g into the expansion, we obtain

$$\hat\theta = \theta + \sum_i \lambda(x_i)\, w_i \Big/ \sum_i w_i + \frac{1}{2} \sum_i \sum_j w_i\, \Lambda(x_i, x_j)\, w_j \Big/ \Big(\sum_i w_i\Big)^2 + \cdots$$

which can also be written in matrix notation as

$$\hat\theta = \theta + V w/(1'w) + \tfrac{1}{2}\, w' \otimes \Phi \otimes w/(1'w)^2 + \cdots$$

Here $w = (w_1, \ldots, w_n)'$ and $1 = (1, \ldots, 1)'$. V is the matrix of columns $\lambda(x_i)$, and $\Phi$ is the array of elements $\Lambda(x_i, x_j)$. The symbol $\otimes$ denotes a bilinear form which would reduce to an ordinary matrix product if the vector $\theta$ were a scalar. By definition,

$$[w' \otimes \Phi \otimes w]_k = w'\, [\Phi]_k\, w,$$

where $[\cdot]_k$ denotes the kth component and $[\Phi]_k$ is a square matrix.

Searching for the origin of a possible bias, we find that it is due to the quadratic term. Indeed, $\hat\theta$ is consistent with respect to $\theta$ and w is independent of the $x_i$, so the first-order term has zero expectation. Terms of order higher than two are neglected, being negligible with respect to the lower-order terms. For the covariance we only consider the first-order term; thus the covariance of $\hat\theta$ is approximately

$$\operatorname{cov}(\hat\theta) = E\{V w w' V'\}/(1'w)^2.$$

With these preliminary results, we can now derive the jackknife method. It compares estimators based on different sets of weights. We distribute the n observations in g groups of size h (n = gh) and multiply the weights of the observations in the ith group by the factor $(1 + t)$. The weight-vector w then becomes

$$w_i = (I + t E_i)\, w,$$

where I is the identity matrix and $E_i$ is diagonal, with ones for the ith group and zeros otherwise. The usual presentation is relative to a scalar estimator. Note that t = -1 produces the ordinary jackknife and small t the infinitesimal jackknife.

We denote by $\hat\theta_i = T(X, w_i)$ the pseudo-estimate based on the weight-vector $w_i$. The corresponding pseudovalue is given by

$$\tilde\theta_i = [(1'w_i)\, \hat\theta_i - (1'w)\, \hat\theta]/t$$

and has the expansion

$$\tilde\theta_i = (1'E_i w)\, \theta + V E_i w + \frac{1}{2}\, \frac{2\, w' \otimes \Phi \otimes E_i w + t\, w' E_i \otimes \Phi \otimes E_i w - (1'E_i w)\, w' \otimes \Phi \otimes w/(1'w)}{1'w_i} + \cdots$$

Averaging these pseudovalues leads to the jackknife estimate $\hat\theta^* = \sum_i \tilde\theta_i/(1'w)$, which has a relatively simple expansion when all groups are equally weighted. Suppose $\Phi$ is block-diagonal and all groups have the same weight, i.e. the conditions

$$w' E_j \otimes \Phi \otimes E_i w = 0 \ \text{ for } i \ne j, \qquad g\, 1'E_i w = 1'w \ \text{ for all } i$$

hold. We then derive

$$\hat\theta^* = \theta + V w/(1'w) + \frac{1}{2}\, \frac{1 + t}{1 + t/g}\, w' \otimes \Phi \otimes w/(1'w)^2 + \cdots,$$

where the factor $(1 + t)/(1 + t/g)$ reduces to $1 + t$ for small $t/g$.

Comparing this with the expansion of $\hat\theta$ shows that the infinitesimal version of the jackknife ($t \to 0$) reproduces the original estimator. Bias reduction, on the other hand, occurs with the ordinary jackknife (t = -1), which cancels the second-order term. Exact cancellation of this bias term is given by the solution in t of the equation

$$\sum_i w_i' \otimes \Phi \otimes w_i/(1'w_i) = g\, w' \otimes \Phi \otimes w/(1'w)$$

or, under the same conditions, by precisely t = -1, whatever the value of g.
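The weighting scheme $w_i = (I + t E_i) w$ translates directly into code. The Python sketch below is illustrative and its function names are hypothetical; it handles a scalar functional T(X, w). It computes the pseudo-estimates, the pseudovalues and the jackknife estimate $\sum_i \tilde\theta_i/(1'w)$. It also returns the covariance estimate $\sum_i \eta_i^2/[(1'w)^2 - \sum_i (1'E_i w)^2]$, with $\eta_i = \tilde\theta_i - (1'E_i w)\hat\theta$.

```python
import statistics


def weighted_mean(x, w):
    """T(X, w): mean of g = sum_i w_i delta(xi - x_i) / sum_i w_i."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def weighted_variance(x, w):
    """T(X, w): plug-in variance of g (homogeneous of degree 0 in w)."""
    m = weighted_mean(x, w)
    return sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x)) / sum(w)


def jackknife(T, x, w, groups, t=-1.0):
    """Weighted jackknife of a scalar functional T(X, w).

    groups lists the indices of each of the g groups; the weights of group i
    are multiplied by (1 + t), i.e. w_i = (I + t E_i) w.  t = -1 gives the
    ordinary jackknife and a small t the infinitesimal jackknife.
    Returns the jackknife estimate and the covariance estimate.
    """
    theta_hat = T(x, w)
    total = sum(w)                                      # 1'w
    pseudovalues, group_weights = [], []
    for idx in groups:
        w_i = list(w)
        for k in idx:
            w_i[k] = (1.0 + t) * w[k]
        theta_i = T(x, w_i)                             # pseudo-estimate
        pseudovalues.append((sum(w_i) * theta_i - total * theta_hat) / t)
        group_weights.append(sum(w[k] for k in idx))    # 1'E_i w
    estimate = sum(pseudovalues) / total
    eta = [p - gw * theta_hat for p, gw in zip(pseudovalues, group_weights)]
    denominator = total ** 2 - sum(gw ** 2 for gw in group_weights)
    covariance = sum(e * e for e in eta) / denominator
    return estimate, covariance


x = [1.0, 2.0, 4.0, 7.0]
w = [1.0] * len(x)
singletons = [[k] for k in range(len(x))]
print(weighted_variance(x, w))                                     # plug-in value 5.25
print(jackknife(weighted_variance, x, w, singletons)[0])           # ordinary jackknife
print(statistics.variance(x))                                      # unbiased s^2, 7.0
print(jackknife(weighted_variance, x, w, singletons, t=1e-6)[0])   # infinitesimal
```

For the plug-in variance, whose bias is exactly $-\sigma^2/n$, the ordinary jackknife (t = -1) returns the unbiased sample variance. A small t returns the plug-in value, as the expansion predicts. For the weighted mean with singleton groups, the covariance estimate reduces to the familiar $s^2/n$.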
The covariance of $\hat\theta$ is estimated by first investigating the covariance of the variables

$$\eta_i = \tilde\theta_i - (1'E_i w)\, \hat\theta.$$

The underlying idea is that the pseudovalue $\tilde\theta_i$ is essentially representative of the ith group, so that its covariance can lead to the covariance of $\hat\theta$. We follow this route under two conditions:

$$E\{(V E_i w)(V E_j w)'\} = 0 \ \text{ for } i \ne j, \qquad (1'w) \sum_i (1'E_i w)\, \operatorname{cov}(V E_i w) = \sum_i (1'E_i w)^2\, \operatorname{cov}(V w).$$

They imply, first, that the g groups are drawn independently from a population and, second, that each group contributes to the total covariance in proportion to its weight. Under these conditions, we have

$$\sum_i \operatorname{cov}(\eta_i) = \Big[1 - \sum_i (1'E_i w/1'w)^2\Big]\, \operatorname{cov}(V w)$$

and

$$\operatorname{cov}(\hat\theta) = \sum_i \operatorname{cov}(\eta_i) \Big/ \Big[(1'w)^2 - \sum_i (1'E_i w)^2\Big].$$

It is convenient to state this last result in terms of the derivatives of $T(X, w)$ with respect to w. Let us define the Jacobian J such that, for small t,

$$\hat\theta_i = T(X, w_i) = \hat\theta + t\, J E_i w.$$

We then evaluate the pseudovalue

$$\tilde\theta_i = (1'E_i w)\, \hat\theta + (1'w)\, J E_i w$$

and, finally, the jackknife estimate

$$\hat\theta^* = \hat\theta + J w = \hat\theta,$$

where the last equality holds because $T(X, w)$ is homogeneous in w. The variate $\eta_i$ becomes $(1'w)\, J E_i w$ and, therefrom,

$$\operatorname{cov}(\hat\theta) = \sum_i \operatorname{cov}(J E_i w) \Big/ \Big[1 - \sum_i (1'E_i w/1'w)^2\Big].$$
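The Jacobian form of the covariance can also be sketched in a few lines. In this Python illustration (hypothetical helper names, scalar T), J is approximated by central differences of T(X, w) with respect to each weight. The sketch prints Jw, which homogeneity makes vanish. For the weighted mean with unit weights and singleton groups, the formula reduces to the familiar $s^2/n$, which is printed for comparison.

```python
import statistics


def weighted_mean(x, w):
    """T(X, w): mean of g = sum w_i delta(xi - x_i) / sum w_i (homogeneous of degree 0 in w)."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def weight_gradient(T, x, w, h=1e-6):
    """Central-difference approximation of the Jacobian J = dT(X, w)/dw."""
    grad = []
    for k in range(len(w)):
        up, down = list(w), list(w)
        up[k] += h
        down[k] -= h
        grad.append((T(x, up) - T(x, down)) / (2.0 * h))
    return grad


def jacobian_variance(T, x, w, groups):
    """var(theta_hat) = sum_i (J E_i w)^2 / [1 - sum_i (1'E_i w / 1'w)^2] for a scalar T."""
    J = weight_gradient(T, x, w)
    total = sum(w)
    numerator, shares = 0.0, 0.0
    for idx in groups:
        jew = sum(J[k] * w[k] for k in idx)            # J E_i w
        numerator += jew * jew
        shares += (sum(w[k] for k in idx) / total) ** 2
    return numerator / (1.0 - shares)


x = [1.0, 2.0, 4.0, 7.0]
w = [1.0] * len(x)
groups = [[k] for k in range(len(x))]
J = weight_gradient(weighted_mean, x, w)
print(sum(j * wk for j, wk in zip(J, w)))              # J w, close to 0 by homogeneity
print(jacobian_variance(weighted_mean, x, w, groups))
print(statistics.variance(x) / len(x))                 # usual s^2 / n for comparison
```

Only derivatives of T with respect to the weights are needed, so no estimator has to be recomputed on reduced samples. This is the practical appeal of the infinitesimal jackknife.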
Prokhorov metric

This metric was introduced by Prokhorov (1956) to investigate limit theorems in probability theory (see also Komlós et al., 1975). It induces over the distribution space a topology of pointwise convergence, i.e. a vague topology. It has progressively been adopted in robustness theory to assess robustness, e.g. by Hampel (1971).

We first define the Prokhorov metric in a metric sample space $\Omega$. Let $\rho(\cdot,\cdot)$ be the metric associated with $\Omega$. With any subset $\omega$ of $\Omega$, we associate the $\eta$-neighborhood of all points in $\Omega$ at a distance less than $\eta$ from $\omega$:

$$\omega^\eta = \{y \in \Omega : \rho(x, y) < \eta \ \text{ for some } x \in \omega\}.$$

Formally, let F and G be the measures associated with the distribution functions f and g. The Prokhorov distance between f and g is then given by

$$d(f,g) = \max\{H(F,G),\, H(G,F)\}, \qquad H(F,G) = \inf\{\varepsilon > 0 : G(\omega) \le F(\omega^\varepsilon) + \varepsilon \ \text{ for all } \omega \subset \Omega\},$$

or, equivalently,

$$d(f,g) = \inf\{\varepsilon > 0 : F(\omega) \le G(\omega^\varepsilon) + \varepsilon \ \text{ and } \ G(\omega) \le F(\omega^\varepsilon) + \varepsilon \ \text{ for all } \omega \subset \Omega\}.$$

A further simplification is possible, since $H(F,G) = H(G,F)$. The distance is bounded:

$$0 \le d(f,g) \le 1.$$

The definition involves some arbitrariness, since $\varepsilon$ plays the part of both a distance in $\Omega$ and a probability. The $\varepsilon$-neighborhood may be replaced by an $\eta(\varepsilon)$-neighborhood, where $\eta(\cdot)$ is a monotonically increasing function with $\eta(0) = 0$; the ordinary definition corresponds to $\eta(\varepsilon) = \varepsilon$.

To provide some insight, consider distribution functions f in $[0, a]$, with $a > 0$, and g in $[0, b]$, with $b > a$, and their linear sum $h = t f + (1 - t)\, g$.
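For distributions with a small finite support on the real line, with $\rho(x, y) = |x - y|$, the definition can be evaluated by brute force. Each one-sided condition is checked over all subsets of a support, and the smallest admissible $\varepsilon$ is located by bisection, since the conditions are monotone in $\varepsilon$. The Python sketch below is such an illustration (the function names are hypothetical). Its cost grows as $2^k$ with the support size k, so it is only meant for toy cases.

```python
from itertools import combinations


def one_sided_holds(F, G, eps):
    """True if G(omega) <= F(omega^eps) + eps for every subset omega of the support of G.

    F, G: finite distributions on the real line, as dicts {point: probability};
    rho(x, y) = |x - y| and omega^eps is the open eps-neighborhood of omega.
    Subsets of the support of G suffice: removing points outside that support
    leaves G(omega) unchanged and can only decrease F(omega^eps).
    """
    support = list(G)
    for size in range(1, len(support) + 1):
        for omega in combinations(support, size):
            g_mass = sum(G[x] for x in omega)
            f_mass = sum(p for y, p in F.items()
                         if any(abs(x - y) < eps for x in omega))
            if g_mass > f_mass + eps + 1e-12:   # small slack for rounding
                return False
    return True


def prokhorov(F, G, tol=1e-9):
    """d(f, g) = max{H(F, G), H(G, F)}, located by bisection on eps in [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        eps = 0.5 * (lo + hi)
        if one_sided_holds(F, G, eps) and one_sided_holds(G, F, eps):
            hi = eps
        else:
            lo = eps
    return hi


dirac_0 = {0.0: 1.0}
print(prokhorov(dirac_0, {0.3: 1.0}))            # Dirac mass shifted by 0.3
print(prokhorov(dirac_0, {0.0: 0.9, 5.0: 0.1}))  # 10 % of the mass moved far away
```

The two examples give d = 0.3 for a Dirac mass shifted by 0.3, and d = 0.1 when a fraction 0.1 of the mass is moved far away. The Prokhorov distance thus accounts both for small displacements of all observations and for large displacements of a few, which is why it suits contamination models.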
How $f_n$ converges to $f$ as the sample size n increases is of great interest, and we now investigate it. We assume n sufficiently large and the variate to have a continuous pdf. Under this single condition, the construction leads to a law of the iterated logarithm, and the theoretical approximation of $d(f, f_n)$ is the solution $\varepsilon$ of an implicit equation. This explains why convergence in the Prokhorov metric is so much faster than in the Kolmogorov metric. Experimental results, obtained by averaging $d(f, f_n)$ over replicated samples, agree with the theoretical derivation:

Sample size n                  | 1000   | 10000
Theoretical d(f, f_n)          | .00292 | .000392
Average d(f, f_n)              | .00289 | .000377
Stand. dev. of d(f, f_n)       | .00032 | .000021
Number of replicates           | 50     | 20
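Robustness

Let f and g be distributions, $d(\cdot,\cdot)$ the Prokhorov distance between distributions, and $\mathcal{L}(t_n, g)$ the distribution of the estimator $t_n$ based on a sample of size n drawn from g. Following Hampel (1971), the estimator $t_n$ is robust at f if, and only if, for every $\varepsilon > 0$ there exists $\delta > 0$ such that

$$d(f, g) < \delta \ \Rightarrow \ d[\mathcal{L}(t_n, f), \mathcal{L}(t_n, g)] < \varepsilon \quad \text{for all } n.$$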