Preface
The area of Order Statistics received tremendous attention from numerous researchers during the past century. During this period, several major theoretical advances were made in this area of research. As a matter of fact, two of those have been adjudged as breakthroughs in the field of Statistics [see: Kotz, S. and Johnson, N. L. (1991). Breakthroughs in Statistics, Vols. 1 and 2, Springer-Verlag, New York]. In the course of these developments, order statistics have also found important applications in many diverse areas including life-testing and reliability, robustness studies, statistical quality control, filtering theory, signal processing, image processing, and radar target detection. Based on this immense activity, we decided to prepare this Handbook on Order Statistics and Their Applications. We feel that we have successfully brought together theoretical researchers working on theoretical and methodological advancements on order statistics and applied statisticians and engineers developing new and innovative applications of order statistics. Altogether, there are 44 articles covering most of the important theoretical and applied aspects of order statistics. For the convenience of readers, the subject matter has been divided into two volumes, the first one (Handbook - 16) focusing on Theory and Methods and the second one (Handbook - 17) dealing primarily with Applications. Each volume has also been organized according to different parts, with each part specializing in one aspect of Order Statistics. The articles in this volume have been classified into six parts as follows:

Part I - Results for Specific Distributions
Part II - Linear Estimation
Part III - Inferential Methods
Part IV - Prediction
Part V - Goodness-of-fit Tests
Part VI - Applications
We have also presented an elaborate Author Index as well as a Subject Index in order to facilitate easy access to all the material included in the volume.

Part I contains five articles -- the first one by A. P. Basu and B. Singh discusses some important properties of order statistics from the exponential distribution, the second one by N. Balakrishnan and S. S. Gupta studies higher order moments of order statistics from exponential and right-truncated exponential distributions and uses them to develop Edgeworth approximate inference in life-testing problems, the third one by N. Balakrishnan and P. S. Chan discusses order statistics from the log-gamma distribution and develops linear estimation of parameters of this distribution, the fourth one by N. Balakrishnan and R. Aggarwala establishes several recurrence relations for single and product moments of order statistics from a generalized logistic distribution and illustrates their application to inference and also extends these results to the case of the doubly truncated generalized logistic distribution, and the last article by N. Balakrishnan and S. K. Lee similarly discusses order statistics from the Type III generalized logistic distribution and their applications to the estimation of parameters of this distribution.

Part II contains four articles -- the first one by S. K. Sarkar and W. Wang discusses some properties of estimators for the scale parameter of a distribution based on a fixed set of order statistics, the second one by M. M. Ali and D. Umbach provides a review of various results on the optimal linear estimation of parameters in location-scale families, the third one by J. R. M. Hosking presents an overview of the L-estimation method, and the last article by S. Alimoradi and A. K. Md. E. Saleh discusses the L-estimation method in the context of linear regression models.

Part III contains five articles -- the first one by A. C. Cohen illustrates the role of order statistics in estimating threshold parameters for a variety of life-span models, the second one by F. Kong discusses the estimation of location and scale parameters in the case when the available sample is multiply Type-II censored, the third one by N. Ni Chuiv and B. K. Sinha provides an elaborate review of results available on ranked set sampling and estimation methods based on such a sampling scheme, the fourth one by S. Geisser highlights some uses of order statistics in Bayesian analysis with particular stress on prediction and discordancy problems, and the last article by S. Panchapakesan, A. Childs, B. H. Humphrey and N. Balakrishnan develops some inverse sampling procedures based on order statistics for the purpose of testing for homogeneity in a multinomial population.

Part IV contains an expository review article by K. S. Kaminsky and P. I. Nelson on the subject of prediction of order statistics, covering both one-sample and two-sample situations.

Part V contains two articles -- the first one by R. A. Lockhart and M. A. Stephens provides a review of the probability plot and the related goodness-of-fit test based on the correlation coefficient, and the second article by S. Shapiro illustrates the role of order statistics in distributional assessment problems.

Part VI contains six articles -- the first one by H. Schneider and F. Barbera presents applications of order statistics in the development of sampling plans for inspection by variables, the second one by M. Viana uses linear combinations of ordered symmetric observations to analyze visual acuity, the third (by G. R. Arce, Y.-T. Kim and K. E. Barner) and fourth articles (by K. E. Barner and G. R. Arce) overview filtering methods based on order statistics and their role in the smoothing of time-series data, the fifth one by S. T. Acton and A. C. Bovik elaborates the role of order statistics in image processing, and the last article by R. Viswanathan displays the application of order statistics to CFAR radar target detection.
It needs to be mentioned here that the companion volume (Handbook - 16), focusing on theory and methods of order statistics, has been divided similarly into nine parts. While preparing this volume as well as the companion volume (Handbook - 16), we have made a very clear distinction between order statistics and rank order statistics, the latter being an integral part of the area of Nonparametric Statistics. Even though there is an overlap between the two, and order statistics do play a role in Nonparametric Statistics, one of the most important uses of order statistics is in the development of parametric inferential methods, as is clearly evident from this volume. Unfortunately, some researchers still view Order Statistics as part of Nonparametric Statistics. Strangely enough, this view is also present in Mathematical Reviews. We express our sincere thanks to Mr. Gerard Wanrooy (North-Holland, Amsterdam) for his interest in this project and for providing constant support and encouragement during its course. We also thank Mrs. Debbie Iscoe for helping us with the typesetting of some parts of this volume. Thanks are also due to the Natural Sciences and Engineering Research Council of Canada and the U.S. Army Research Office for providing individual research grants to the editors, which facilitated the editorial work of this volume. Our special thanks go to all the authors for showing interest in this project and for preparing fine expository articles in their respective topics of expertise. Our final thanks go to Miss Ann Balasubramaniam for helping us with the preparation of the Author Index for this volume. We sincerely hope that theoretical researchers, applied scientists and engineers, and graduate students involved in the area of Order Statistics will all find this Handbook to be a useful and valuable reference in their work.

N. Balakrishnan
C. R. Rao
Contributors
S. T. Acton, School of Electrical & Computer Eng., 202 Engineering South, Oklahoma State University, Stillwater, OK 74078-0321, USA (Ch. 22)
R. A. Aggarwala, Department of Mathematics and Statistics, University of Calgary, 2500 University Drive NW, Calgary, Alberta, Canada T2N 1N4 (Ch. 4)
M. M. Ali, Department of Mathematical Sciences, Ball State University, Muncie, IN 47306-0490, USA (Ch. 7)
S. Alimoradi, Isfahan University of Technology, POB 69-34, Isfahan, Iran (Ch. 9)
G. R. Arce, Department of Electrical Engineering, University of Delaware, Newark, DE 19716, USA (Chs. 20, 21)
N. Balakrishnan, Department of Mathematics and Statistics, McMaster University, 1280 Main Street West, Hamilton, Ontario, Canada L8S 4K1 (Chs. 2, 3, 4, 5, 14)
F. Barbera, Management Department, Southeastern Louisiana University, Hammond, LA 70402, USA (Ch. 18)
K. E. Barner, Department of Electrical Engineering, University of Delaware, Newark, DE 19716, USA (Chs. 20, 21)
A. P. Basu, University of Missouri, Dept. of Statistics, 222 Math. Sciences Bldg., Columbia, MO 65211, USA (Ch. 1)
A. C. Bovik, Center for Vision and Image Sciences, Department of Electrical & Computer Eng., The University of Texas at Austin, Austin, TX 78712-1084, USA (Ch. 22)
P. S. Chan, Department of Statistics, The Chinese University of Hong Kong, Shatin, Hong Kong (Ch. 3)
A. Childs, Department of Mathematics and Statistics, McMaster University, 1280 Main Street West, Hamilton, Ontario, Canada L8S 4K1 (Ch. 14)
A. C. Cohen, 4218 Westfield Court, Columbus, GA 31907-1837, USA (Ch. 10)
S. Geisser, School of Statistics, University of Minnesota-Twin Cities, 206 Church Street SE, 270 Vincent Hall, Minneapolis, MN 55455, USA (Ch. 13)
S. S. Gupta, Department of Statistics, Purdue University, West Lafayette, IN 47907, USA (Ch. 2)
J. R. M. Hosking, IBM Research Division, Thomas J. Watson Research Ctr., P.O. Box 218, Yorktown Heights, NY 10598, USA (Ch. 8)
B. H. Humphrey, Department of Mathematics, Southern Illinois University at Carbondale, Carbondale, IL 62901-4408, USA (Ch. 14)
K. S. Kaminsky, Department of Math. Stat., School of Mathematics & Computer Sci., Chalmers University of Technology, S-41290 Gothenburg, Sweden (Ch. 15)
Y.-T. Kim, Signal Processing R&D Center, Samsung Electronics Co., Suwon, Korea (Ch. 20)
F. Kong, Department of Mathematics and Statistics, University of Maryland at Baltimore County, Baltimore, MD 21228, USA (Ch. 11)
S. K. Lee, Department of Math. & Stat., McMaster University, Hamilton, Ontario, Canada L8S 4K1 (Ch. 5)
R. A. Lockhart, Department of Mathematics and Statistics, Simon Fraser University, Burnaby, BC, Canada V5A 1S6 (Ch. 16)
P. I. Nelson, Department of Statistics, Kansas State University, Manhattan, KS 66506, USA (Ch. 15)
N. Ni Chuiv, Department of Mathematics and Statistics, University of New Brunswick, Fredericton, NB, Canada E3B 5A3 (Ch. 12)
S. Panchapakesan, Department of Mathematics, Southern Illinois University at Carbondale, Carbondale, IL 62901-4408, USA (Ch. 14)
A. K. Md. E. Saleh, Department of Mathematics and Statistics, Carleton University, 4302 Herzberg Building, 1125 Colonel By Drive, Ottawa, Ontario, Canada K1S 5B6 (Ch. 9)
S. K. Sarkar, Department of Statistics, Temple University, Philadelphia, PA 19122, USA (Ch. 6)
H. Schneider, Department of Quantitative Business Analysis, Louisiana State University, Baton Rouge, LA 70803, USA (Ch. 18)
S. Shapiro, Department of Statistics, Florida International University, University Park Campus, Miami, FL 33199, USA (Ch. 17)
B. Singh, University of Missouri, Dept. of Statistics, 222 Math. Sciences Bldg., Columbia, MO 65211, USA (Ch. 1)
B. K. Sinha, Department of Mathematics and Statistics, University of Maryland at Baltimore County, Baltimore, MD 21228-5398, USA (Ch. 12)
M. A. Stephens, Department of Mathematics and Statistics, Simon Fraser University, Burnaby, BC, Canada V5A 1S6 (Ch. 16)
D. Umbach, Department of Mathematical Sciences, Ball State University, Muncie, IN 47306-0490, USA (Ch. 7)
M. Viana, Department of Ophthalmology and Visual Sciences (M/C 648), Eye and Ear Infirmary, Lions of Illinois Eye Research Inst., 1855 West Taylor Street, The University of Illinois at Chicago, Chicago, IL 60612-7243, USA (Ch. 19)
R. Viswanathan, Department of Electrical Engineering, College of Engineering, Mailcode 6603, Southern Illinois University at Carbondale, Carbondale, IL 62901-6603, USA (Ch. 23)
W. Wang, Clinical Biostatistics Dept., Wyeth-Ayerst Research, 145-B2 King of Prussia Road, Radnor, PA 19087, USA (Ch. 6)
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
1
Order Statistics in Exponential Distribution
Asit P. Basu and Bahadur Singh
1. Introduction
The exponential distribution is a basic physical model in reliability theory and survival analysis. The properties of the exponential model have been studied widely in the statistical literature. For a survey of exponential and other models in reliability see Balakrishnan and Basu (1995), Lawless (1982) and Sinha and Kale (1980). Order statistics occur naturally in life-testing and related areas, as the weakest unit among a set of n units fails first. The second weakest unit fails next, and so on, giving rise to order statistics. The properties of order statistics have been studied extensively in a number of monographs, such as Balakrishnan and Cohen (1991), David (1981), and Sarhan and Greenberg (1962). In this chapter we consider the properties of order statistics and use these results for estimating the parameters of the one- and two-parameter exponential distributions. In Section 2 we give a brief summary of some important properties of order statistics from the exponential distribution. In Section 3 various types of censoring are described. The estimates of the scale parameter θ of the exponential distribution for Type I, Type II and randomly censored data are derived. Inferences concerning the two-parameter exponential distribution are also considered. In Section 4 these results are extended to two or more independent Type II censored samples. Order restricted inference for the scale parameters θ_1, θ_2, ..., θ_k (k ≥ 2) of k exponential distributions is considered in Section 5. Bayesian inference is considered in Section 6. Bayesian estimates of θ, for Type I and Type II censored samples, are obtained. Also, Bayesian estimators of μ and θ for the two-parameter exponential family for a Type II censored sample are obtained.
2. Order statistics and their properties
In this section we state some well known results of order statistics for the exponential distribution. Let X be a random variable from the one parameter exponential distribution with density function f(x|θ) = θ^{-1} exp(−x/θ), distribution function F(x|θ) = 1 − exp(−x/θ), and survival function S(x|θ) = 1 − F(x|θ) = exp(−x/θ). We shall denote this one parameter exponential distribution by e(θ). Let X_{(1)} ≤ X_{(2)} ≤ ··· ≤ X_{(n)} denote the order statistics in a random sample of size n from a population with the standard exponential distribution e(1), with probability density function (pdf) and distribution function, respectively,

f(x) = e^{−x} ,   F(x) = 1 − e^{−x} .

Then, the marginal pdf of the r th order statistic X_{(r)} is

f_r(x) = [n! / ((r − 1)!(n − r)!)] (1 − e^{−x})^{r−1} e^{−(n−r+1)x} ,   0 ≤ x < ∞ .
with

T_r = Σ_{i=1}^{r} t_{(i)} + (n − r) t_{(r)} .   (3.4)
Then, it follows easily that the maximum likelihood estimator of θ, say θ̂, is T_r/r. Making the transformation

W_1 = n t_{(1)} ,   W_i = (n − i + 1) (t_{(i)} − t_{(i−1)}) ,   i = 2, ..., r ,

it is easy to verify that T_r = Σ_{i=1}^{r} t_{(i)} + (n − r) t_{(r)} = Σ_{i=1}^{r} W_i , and the Jacobian is

∂(w_1, ..., w_r) / ∂(t_{(1)}, ..., t_{(r)}) = n! / (n − r)! ,
and therefore the joint pdf of (W_1, ..., W_r) is given by

f(w_1, ..., w_r) = (1/θ^r) exp( −(1/θ) Σ_{i=1}^{r} w_i ) ,   w_i > 0 .

Moreover,

2 T_r / θ ~ χ²_{2r} .
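As a small numerical illustration, the Type II censored estimator just described can be sketched as follows (a minimal sketch in Python; the sample values and the function name are hypothetical, not from the text):

```python
def type2_mle(t_ordered, n):
    """MLE of the exponential scale theta under Type II censoring.

    t_ordered: the r smallest ordered failure times t_(1) <= ... <= t_(r).
    n: total number of units on test.
    Returns (T_r, theta_hat), where T_r is the total time on test of
    Eq. (3.4), T_r = sum of the t_(i) plus (n - r) t_(r), and
    theta_hat = T_r / r.
    """
    r = len(t_ordered)
    T_r = sum(t_ordered) + (n - r) * t_ordered[-1]
    return T_r, T_r / r

# Hypothetical data: first 3 failures out of n = 10 units on test.
T_r, theta_hat = type2_mle([0.2, 0.5, 0.9], n=10)
```

Since 2T_r/θ ~ χ²_{2r}, an equal-tailed confidence interval for θ follows as (2T_r/χ²_{2r, 1−α/2}, 2T_r/χ²_{2r, α/2}), with the quantiles taken from a chi-square table or, for instance, from scipy.stats.chi2 if that library is available.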
EXAMPLE 3. Consider now a random sample of size n with Type II censoring from the two-parameter exponential family with the pdf

f(t | μ, θ) = (1/θ) e^{−(t−μ)/θ} ,   t ≥ μ ,   (3.5)

where μ ≥ 0 is a threshold or guarantee time parameter and θ is a scale parameter. The joint pdf of the r smallest observations, say t_{(1)} < t_{(2)} < ··· < t_{(r)}, based upon the above random sample of size n from (3.5), is given by

[n!/(n − r)!] (1/θ^r) exp{ −(1/θ) [ Σ_{i=1}^{r} (t_{(i)} − μ) + (n − r)(t_{(r)} − μ) ] } ,   (3.6)

where μ < t_{(1)} < t_{(2)} < ··· < t_{(r)}. Obviously, the MLE of μ is given by μ̂ = t_{(1)}, and with μ̂ = t_{(1)} the MLE of θ is given by

θ̂ = [ Σ_{i=1}^{r} t_{(i)} + (n − r) t_{(r)} − n t_{(1)} ] / r = T̃_r / r , say .   (3.7)

Then, μ̂ and θ̂ are independently distributed, and 2n(μ̂ − μ)/θ and 2rθ̂/θ are independently distributed as χ²_2 and χ²_{2r−2}, respectively. It should be noted here that with Type II censoring the number, r, of exact failure times is fixed (non-stochastic), whereas it is a random variable with singly Type I censored data.
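The two-parameter estimates of Example 3 can be sketched the same way (a minimal illustration; the data and the function name are hypothetical):

```python
def type2_mle_two_param(t_ordered, n):
    """MLEs of (mu, theta) for the two-parameter exponential (3.5),
    from the r smallest order statistics of a sample of size n:
    mu_hat = t_(1), and theta_hat as in Eq. (3.7).  Requires r >= 2."""
    r = len(t_ordered)
    mu_hat = t_ordered[0]
    T_tilde = sum(t_ordered) + (n - r) * t_ordered[-1] - n * mu_hat
    return mu_hat, T_tilde / r

# Hypothetical data: first 3 failures out of n = 10 units on test.
mu_hat, theta_hat = type2_mle_two_param([0.2, 0.5, 0.9], n=10)
```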
3.3. Random censoring

Intuitively, it is reasonable to think of censoring times as random variables. To illustrate the ideas, we generalize the example described in the first paragraph at the beginning of Section 3.1 for a clinical study in cancer research involving Type I censoring. Assume that a clinical study begins at time t = t_0, the enrollment period terminates at time t = T > t_0, and the patients are then followed up for an additional time τ beyond T. Patients may enter the study at various points in time during the enrollment period [t_0, T], more or less in a random fashion, according to their time of diagnosis. A patient may die, be alive, or leave the study (i.e., be censored) during the course of the study (i.e., enrollment
or follow-up periods). Then, it is reasonable to assume that the censoring times are independent random variables. Assume now that there are n patients enrolled in the study, and each patient has a lifetime or failure time T and a censoring time L, where T and L are independent continuous random variables with survival functions S(t) and G(t), respectively. Let (T_i, L_i), i = 1, 2, ..., n, be independent, and let t_i = min(T_i, L_i) and δ_i = 1 if T_i ≤ L_i and δ_i = 0 if T_i > L_i. The data on n patients consist of the pairs (t_i, δ_i), i = 1, 2, ..., n. Let f(t) and g(t) be the pdf's of T_i and L_i, respectively. Then [Lawless (1982, pp. 37-38)], the sampling distribution of {(t_i, δ_i); i = 1, 2, ..., n} is given by

∏_{i=1}^{n} [f(t_i) G(t_i)]^{δ_i} [g(t_i) S(t_i)]^{1−δ_i} .   (3.8)

If g(t) and G(t) do not involve any parameters of interest, then the factors G(t_i) and g(t_i) in (3.8) can be dropped, and the likelihood function is then given by

L = ∏_{i=1}^{n} f(t_i)^{δ_i} S(t_i)^{1−δ_i} .   (3.9)
EXAMPLE 4. If T_1, T_2, ..., T_n are a random sample from an exponential distribution with unknown mean θ, i.e., pdf f(t) = (1/θ) exp(−t/θ) and survival function S(t) = exp(−t/θ), then the likelihood function (3.9) simplifies to [Lawless (1982, p. 105)]

L(θ) = (1/θ^r) exp( −(1/θ) Σ_{i=1}^{n} t_i ) ,   (3.10)

where r = Σ_{i=1}^{n} δ_i is the observed number of failures, and r is a random variable. Note that

T_r' = Σ_{i=1}^{n} t_i = Σ_{i∈D} T_i + Σ_{i∈C} L_i   (3.11)

is the total observed lifetime for the n patients, where D and C denote the sets of individuals for whom lifetimes are observed and censored, respectively. Moreover, (r, T_r') is minimal sufficient for θ, and if r > 0, then θ̂ = T_r'/r is the MLE of θ. In case r = 0, the likelihood does not have a finite maximum. However, if n is large the probability of r being zero is negligible and, therefore, in the following it is assumed that r > 0. The first and second derivatives of the log-likelihood function of (3.10) are given by
d ln L / dθ = −r/θ + (1/θ²) Σ_{i=1}^{n} t_i ,

d² ln L / dθ² = r/θ² − (2/θ³) Σ_{i=1}^{n} t_i .
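The random-censoring MLE of Example 4 can be sketched as follows (a minimal sketch; the data pairs below are hypothetical):

```python
def random_censoring_mle(pairs):
    """MLE of theta from randomly censored data (t_i, delta_i), where
    delta_i = 1 marks an observed failure and delta_i = 0 a censored time.
    Returns (r, T, theta_hat), with T the total observed lifetime as in
    (3.11) and theta_hat = T / r; requires at least one observed failure."""
    r = sum(d for _, d in pairs)
    T = sum(t for t, _ in pairs)
    if r == 0:
        raise ValueError("no observed failures: the MLE does not exist")
    return r, T, T / r

# Hypothetical data: 3 observed failures and 2 censored observations.
r, T, theta_hat = random_censoring_mle(
    [(1.2, 1), (0.7, 1), (2.0, 0), (0.4, 1), (1.5, 0)])
```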
Now, we obtain the expected Fisher information in two steps. First, we assume that the censoring values L_1, L_2, ..., L_n are fixed and known; later, in the second step, we shall relax this assumption. So, assume that L_1, L_2, ..., L_n are fixed and known. The probability distribution of δ_i is given by

Pr(δ_i = 0) = exp(−L_i/θ)   and   Pr(δ_i = 1) = 1 − exp(−L_i/θ) .
Moreover, E(t_i | δ_i = 0) = L_i, and, using integration by parts,

E(t_i | δ_i = 1) = E(T_i | T_i ≤ L_i) = θ − L_i exp(−L_i/θ) / [1 − exp(−L_i/θ)] .

Testing the equality of the means of k (k ≥ 2) exponential distributions is next considered using the likelihood ratio approach when the data are Type II censored. The two cases of known and unknown threshold parameters are dealt with separately.
Case (i): Type II censored samples with known threshold parameters. Assume that there are k independent Type II censored random samples of sizes n_1, n_2, ..., n_k with pre-specified numbers of failures r_1, r_2, ..., r_k, respectively, and that

t_{i(1)} < t_{i(2)} < ··· < t_{i(r_i)} ,   i = 1, 2, ..., k ,

are the r_i smallest ordered observations from population i with mean θ_i (μ_i is assumed to be zero) with pdf (3.5). The main interest is in testing the null hypothesis

H_0 : θ_1 = θ_2 = ··· = θ_k   versus   H_1 : θ_i ≠ θ_j for some i ≠ j .
The likelihood function given the above Type II censored data is

L(θ_1, ..., θ_k) = ∏_{i=1}^{k} [n_i! / (n_i − r_i)!] θ_i^{−r_i} exp(−T_i/θ_i) ,

where T_i = Σ_{j=1}^{r_i} t_{i(j)} + (n_i − r_i) t_{i(r_i)}. Under H_0, the MLE of the common value of θ, say θ̂, is given by θ̂ = Σ_{i=1}^{k} T_i / Σ_{i=1}^{k} r_i, and under the full parameter space the MLE of θ_i, say θ̂_i, is θ̂_i = T_i/r_i. Therefore, the likelihood ratio test of H_0 versus H_1 rejects H_0 for large values of

Λ = −2 ln[ L(θ̂, ..., θ̂) / L(θ̂_1, ..., θ̂_k) ] = 2 Σ_{i=1}^{k} r_i ln θ̂ − 2 Σ_{i=1}^{k} r_i ln θ̂_i .   (4.3)
Asymptotically, for large values of r_i (i = 1, 2, ..., k), under H_0, Λ is distributed as χ²_{k−1}. For small to moderate values of the r_i's, a Bartlett-type correction may be applied, in which one approximates the random variable Λ/C by a chi-square distribution with k − 1 degrees of freedom, χ²_{k−1}. The constant C is given by

C = 1 + [1 / (6(k − 1))] ( Σ_{i=1}^{k} 1/r_i − 1/Σ_{i=1}^{k} r_i ) .
Case (ii): Type II censored samples with unknown threshold parameters. Assume that there are k independent Type II censored random samples of sizes n_1, n_2, ..., n_k with pre-specified numbers of failures r_1, r_2, ..., r_k, respectively, and that

t_{i(1)} < t_{i(2)} < ··· < t_{i(r_i)} ,   i = 1, 2, ..., k ,

are the r_i smallest ordered observations from population i, from a two-parameter family with parameters (μ_i, θ_i) with pdf (3.5). Then, the likelihood function given the above Type II censored data is

L(μ_1, θ_1, ..., μ_k, θ_k) = ∏_{i=1}^{k} [n_i! / (n_i − r_i)!] θ_i^{−r_i} exp{ −(1/θ_i) [ Σ_{j=1}^{r_i} (t_{i(j)} − μ_i) + (n_i − r_i)(t_{i(r_i)} − μ_i) ] } ,

where μ_i < t_{i(1)} < t_{i(2)} < ··· < t_{i(r_i)}, i = 1, 2, ..., k. Since μ_i < t_{i(1)}, the MLE of μ_i under the full parameter space, say μ̂_i, is μ̂_i = t_{i(1)}. With μ̂_i = t_{i(1)}, the MLE of θ_i under the full parameter space, say θ̂_i, is given by

θ̂_i = T_i / r_i   where   T_i = Σ_{j=1}^{r_i} t_{i(j)} + (n_i − r_i) t_{i(r_i)} − n_i t_{i(1)} .
Under H_0, the MLE of μ_i is exactly the same, μ̂_i = t_{i(1)}, as above for the full parameter space. However, the MLE of the common value θ_i = θ under H_0, say θ̂, is given by

θ̂ = Σ_{i=1}^{k} T_i / Σ_{i=1}^{k} r_i .

The likelihood ratio test of H_0 vs. H_1 rejects H_0 for large values of

Λ = −2 ln[ L(μ̂_1, θ̂, ..., μ̂_k, θ̂) / L(μ̂_1, θ̂_1, ..., μ̂_k, θ̂_k) ] = 2 Σ_{i=1}^{k} r_i ln θ̂ − 2 Σ_{i=1}^{k} r_i ln θ̂_i ,   (4.4)
and the likelihood ratio test statistic (4.4) is of exactly the same form as (4.3), except that the MLE's are slightly different from those in case (i) above. Asymptotically, for large values of r_i (i = 1, 2, ..., k), under H_0, Λ is distributed as χ²_{k−1}. For small to moderate values of the r_i's, a Bartlett-type correction may be applied, in which one approximates the random variable Λ/C' by a chi-square distribution with k − 1 degrees of freedom, χ²_{k−1}. The constant C' is given by

C' = 1 + [1 / (6(k − 1))] ( Σ_{i=1}^{k} 1/(r_i − 1) − 1/Σ_{i=1}^{k} (r_i − 1) ) .
Similar results for randomly censored data can be derived, and these will be given elsewhere.
5. Order restricted inference
In this section a very brief summary of inference concerning exponential means is presented when the means are subject to an order restriction. The two cases of complete and censored samples are considered. It turns out that the solutions to a wide class of restricted optimization problems can be found by restricted least squares methods. To put this in perspective, some notation, terminology and background information is introduced in the next subsection. For more information the reader is referred to Chapter 1 of Robertson, Wright and Dykstra (1988).
5.1. Some notation, terminology and background

Letting X = {x_1, x_2, ..., x_k}, suppose that w is a positive weight function defined on X and ℱ is a restricted family of functions on X. For an arbitrary function g defined on X, let g* be the least squares projection of g onto ℱ; that is, g* is the solution to the restricted least squares problem if g* minimizes

Σ_{x∈X} [g(x) − f(x)]² w(x)   subject to   f ∈ ℱ .
If the family ℱ satisfies certain conditions, then g* provides the solution for a variety of other optimization problems for which the objective functions are quite different from that of least squares. First, the concept of isotonic regression for the simply ordered case is introduced. Assume that the k elements of X are simply ordered, i.e., x_1 ≾ x_2 ≾ ··· ≾ x_k. A function f defined on X is isotonic (with respect to the simple order) if and only if f(x_1) ≤ f(x_2) ≤ ··· ≤ f(x_k). The regularity conditions assumed on the exponential family densities (5.2) require, in particular, that (ii) p_1'(θ) > 0 for all θ ∈ (θ̲, θ̄) and p_2(τ) > 0 for all τ ∈ T, and (iii) ∂q(θ, τ)/∂θ = −θ p_1'(θ) p_2(τ) for all θ ∈ (θ̲, θ̄) and τ ∈ T. (5.5) Obviously, ∫_0^∞ f(t; θ, τ) dt = 1. Now, differentiating twice with respect to θ under the integral sign, one may show that E[K(t; τ)] = θ and Var[K(t; τ)] = [p_1'(θ) p_2(τ)]^{−1}. Assume now that we have k (k ≥ 2) populations characterized by densities in the exponential form f(·; θ(x_j), τ_j) for j = 1, ..., k, and that one has k independent random samples {t_{ji}; i = 1, ..., n_j; j = 1, ..., k} from the k populations. Then, using the regularity conditions (5.3), (5.4) and (5.5), and differentiating the log-likelihood function, the maximum likelihood estimate of θ(x_j) is
θ̂(x_j) = Σ_{i=1}^{n_j} K(t_{ji}; τ_j) / n_j .   (5.6)
Now, assume that ≾ is a quasi-order on X = {x_1, x_2, ..., x_k}; assume that one has prior information that θ is isotonic with respect to this quasi-order; and assume that an estimate of θ is required with this property. Then, the following theorem can be used to obtain a restricted maximum likelihood estimate of θ which is also isotonic with respect to the quasi-order ≾ on X = {x_1, x_2, ..., x_k}.

THEOREM 5.1. Assume that one has k independent random samples as described above from the k populations characterized by the exponential densities of the form (5.2) and satisfying the regularity conditions (5.3), (5.4) and (5.5). Then, the restricted maximum likelihood estimate of θ(x) = (θ(x_1), θ(x_2), ..., θ(x_k)), subject to the condition that the estimate be isotonic with respect to the quasi-order ≾ on X, is given by the isotonic regression of θ̂(x) = (θ̂(x_1), θ̂(x_2), ..., θ̂(x_k)) with weight vector w(x) = (w(x_1), w(x_2), ..., w(x_k)), where w(x_j) = n_j p_2(τ_j) and θ̂(x_j) is given by (5.6).

For the proof of Theorem 5.1 the reader is referred to Robertson, Wright and Dykstra (1988, p. 34). Theorem 5.1 is also true for Type II censored data, and the corresponding proof is straightforward. Assume now that the τ_j's are known and let
H_0 : θ_1 = θ_2 = ··· = θ_k ;   H_1 : θ is isotonic with respect to ≾ ;   and   H_2 : θ unrestricted .

The LRT of H_0 versus H_1 − H_0 rejects H_0 for large values of T_01 = −2 ln Λ_01, where

T_01 = 2 Σ_{j=1}^{k} w_j θ̂_j [p_1(θ_j*) − p_1(θ̂_0)] + 2 Σ_{j=1}^{k} n_j [q(θ_j*, τ_j) − q(θ̂_0, τ_j)] ,   (5.9)

with θ̂_0 the MLE of the common value of θ under H_0 and θ* = (θ_1*, ..., θ_k*) the restricted MLE of Theorem 5.1. Similarly, if Λ_12 denotes the LRT statistic for testing H_1 versus H_2 − H_1, then the LRT rejects H_1 for large values of T_12 = −2 ln Λ_12, where
T_12 = 2 Σ_{j=1}^{k} w_j θ̂_j [p_1(θ̂_j) − p_1(θ_j*)] + 2 Σ_{j=1}^{k} n_j [q(θ̂_j, τ_j) − q(θ_j*, τ_j)] .   (5.10)
The LRT of H_0 versus H_2 − H_0 rejects H_0 for large values of T_02 = −2 ln Λ_02, where

T_02 = 2 Σ_{j=1}^{k} w_j θ̂_j [p_1(θ̂_j) − p_1(θ̂_0)] + 2 Σ_{j=1}^{k} n_j [q(θ̂_j, τ_j) − q(θ̂_0, τ_j)] .   (5.11)
Now, assume that n_j → ∞ and N → ∞ (with Σ n_j = N) in such a way that n_j/N → a_j ∈ (0, 1) for each j. Let P(ℓ, k; w) denote the probability that, under H_0, there are ℓ distinct coordinates in the vector θ* of the restricted MLE of θ. Then, we have the following theorem, which can be used to compute the tail probabilities of the appropriate test statistics.

THEOREM 5.2. If H_0 is true, for any real number C_1,

lim_{N→∞} P[T_01 ≥ C_1] = Σ_{ℓ=1}^{k} P(ℓ, k; w) P[χ²_{ℓ−1} ≥ C_1] .   (5.12)

Moreover, H_0 is asymptotically least favorable within H_1, and under H_0, for any real number C_2,

lim_{N→∞} P[T_12 ≥ C_2] = Σ_{ℓ=1}^{k} P(ℓ, k; w) P[χ²_{k−ℓ} ≥ C_2] .   (5.13)

Furthermore, under H_0 and for any real number C_3,

lim_{N→∞} P[T_02 ≥ C_3] = P[χ²_{k−1} ≥ C_3] .   (5.14)
The coefficients P(l,k; w) are the level probabilities associated with the usual normal means problem which are discussed extensively in Chapters 2 and 3 in Robertson, Wright and Dykstra (1988). For a proof of Theorem 5.2 the reader is referred to Robertson et al. (1988, pp. 164-166).
5.3. Application to one parameter exponential distributions

In this section we give a very brief account of the results for the case of complete and Type II censored samples from one parameter exponential distributions. The case of the one parameter exponential distribution is included in Section 5.2 as a special member of the exponential family (5.2) with the following substitutions:

p_1(θ) = −1/θ ,   p_2(τ) = 1 ,   K(t; τ) = t ,   S(t; τ) = 0   and   q(θ, τ) = −ln θ .

The case of complete samples is discussed next.
Case I. Complete samples

Assume k independent random samples {t_{ji}; i = 1, ..., n_j; j = 1, ..., k} from k one parameter exponential distributions e(θ_j). In this case the unrestricted MLE of θ_j, θ̂_j in (5.6), simplifies and is given by

θ̂_j = Σ_{i=1}^{n_j} t_{ji} / n_j ,   (5.15)

and the weight vector is w = (w_1, ..., w_k) with w_j = n_j. The MLE of the common value of θ_j in (5.7) is given by

θ̂_0 = Σ_{j=1}^{k} n_j θ̂_j / Σ_{j=1}^{k} n_j ,   (5.16)

and the MLE of θ = (θ_1, θ_2, ..., θ_k) subject to θ ∈ H_1 is given by θ* = P_w(θ̂ | ℐ), the isotonic regression of θ̂ with weights w, where θ̂_j is now given by (5.15). Similarly, the LRT statistics (5.9), (5.10) and (5.11) simplify. The level probabilities P(ℓ, k; w) now depend on the sample sizes (n_1, n_2, ..., n_k), and the null distributions of T_01, T_12 and T_02 are given by Theorem 5.2 with these level probabilities.
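The isotonic regression with weights w_j used above can be computed with the pool-adjacent-violators algorithm (PAVA); the following is a minimal weighted sketch for a simple order (the input values are hypothetical):

```python
def pava(y, w):
    """Weighted isotonic regression of y (nondecreasing) with weights w:
    adjacent violating blocks are pooled and replaced by their weighted
    mean, in the spirit of Robertson, Wright and Dykstra (1988, Ch. 1)."""
    blocks = []  # each block: [weighted mean, total weight, count]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    # expand each pooled block back to the original coordinates
    return [m for m, _, c in blocks for _ in range(c)]

# Hypothetical unrestricted MLEs theta_hat_j with weights w_j = n_j:
theta_star = pava([2.0, 1.0, 3.0, 2.5], [10, 20, 5, 5])
```

The returned vector is nondecreasing and, within each pooled block, equal to the weighted average of the pooled θ̂_j's, which is exactly the projection P_w(· | ℐ) for a simple order.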
Case II. Type II censored samples

Assume now that there are k independent random samples of size n_j from one parameter exponential distributions e(θ_j), j = 1, ..., k. Let the r_j smallest failure times (order statistics) for the jth sample be t_{j(1)} < t_{j(2)} < ··· < t_{j(r_j)}.
The posterior density (6.5) is again an inverted gamma density with parameters α + T_r and ν + r; that is, θ | T_r ~ IG(α + T_r, ν + r). In this case the Bayes estimator of θ, assuming the squared error loss function, is given by the posterior mean,

θ̂_{B2} = E(θ | T_r) = (α + T_r) / (ν + r − 1) ,   (6.6)

provided ν + r > 1. The posterior variance of θ is given by

E[ (θ − θ̂_{B2})² | T_r ] = (α + T_r)² / [ (ν + r − 1)² (ν + r − 2) ] .   (6.7)
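The posterior mean and variance just given are direct to compute; a minimal sketch (the prior parameters and data below are hypothetical):

```python
def bayes_exponential_scale(alpha, nu, T_r, r):
    """Posterior mean (6.6) and posterior variance (6.7) of theta under
    an inverted gamma prior, given the censored total T_r and r failures.
    The posterior is IG(alpha + T_r, nu + r); the variance needs nu + r > 2."""
    a, b = alpha + T_r, nu + r  # posterior inverted gamma parameters
    mean = a / (b - 1)
    var = a ** 2 / ((b - 1) ** 2 * (b - 2))
    return mean, var

mean, var = bayes_exponential_scale(alpha=1.0, nu=3, T_r=10.0, r=5)
```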
Similarly, the Bayes estimator of θ, say θ̂_{B1}, assuming the squared error loss function, for the Type I censored sample is given by (6.6) with T_r replaced by T_r'. The corresponding posterior variance of θ for the Type I censored sample is similarly given by (6.7) with T_r replaced by T_r'. The Bayes estimator and posterior variance of θ using the noninformative prior g_1(θ) can be obtained in an analogous manner from (6.6) and (6.7) with the substitution α = ν = 0.

We next consider the two parameter exponential distribution with density (3.5). Here the conditional density of t given μ and θ is

f(t | μ, θ) = (1/θ) e^{−(t−μ)/θ} ,   t ≥ μ .
As in Example 3 in Section 3, consider a Type II censored sample t = (t_{(1)} < t_{(2)} < ··· < t_{(r)}) of size r based upon a random sample of size n. The likelihood function

L(μ, θ) = f(t | μ, θ)   (6.8)

is given by (3.6). Now, consider the class of noninformative prior distributions

g_1(μ, θ) ∝ 1/θ ,   −∞ < μ < ∞ ,   0 < θ < ∞ .   (6.10)

The Bayes estimator of θ is then

θ̂_B = T̃_r / (r − 2) ,   r > 2 .   (6.11)
Similarly, the marginal posterior density of μ is given by

h_2(μ | t) = ∫_0^∞ g((μ, θ) | t) dθ = (r − 1) n T̃_r^{r−1} / { T̃_r + n(t_{(1)} − μ) }^r ,   −∞ < μ < t_{(1)} .

THEOREM 1. For n ≥ 1 and a = 1, 2, ...,

μ_{1:n}^{(a)} = (a/n) μ_{1:n}^{(a−1)}   (2.3)
with μ_{1:n}^{(0)} ≡ 1.

PROOF. From Eqs. (2.1) and (2.2), for n ≥ 1 and a ≥ 1 let us consider

μ_{1:n}^{(a−1)} = n ∫_0^∞ w^{a−1} [1 − F(w)]^{n−1} f(w) dw = (n/a) ∫_0^∞ [1 − F(w)]^n d(w^a)   (2.4)

upon using the relation in (1.3). The recurrence relation in Eq. (2.3) follows upon integrating the right hand side of (2.4) by parts and simplifying the resulting expression.

THEOREM 2. For n ≥ 2, 2 ≤ r ≤ n and a = 1, 2, ...,

μ_{r:n}^{(a)} = μ_{r−1:n}^{(a)} + a μ_{r:n}^{(a−1)} / (n − r + 1)   (2.5)

with μ_{r:n}^{(0)} ≡ 1.

PROOF. From Eqs. (2.1) and (2.2), for n ≥ 2, 2 ≤ r ≤ n and a ≥ 1 let us consider

μ_{r:n}^{(a−1)} = [n! / ((r − 1)!(n − r)!)] ∫_0^∞ w^{a−1} [F(w)]^{r−1} [1 − F(w)]^{n−r} f(w) dw = [n! / ((r − 1)!(n − r)! a)] I ,   (2.6)

where I is the integral

I = ∫_0^∞ [F(w)]^{r−1} [1 − F(w)]^{n−r+1} d(w^a)

upon using the relation in (1.3). Integration by parts now yields

I = (n − r + 1) ∫_0^∞ w^a [F(w)]^{r−1} [1 − F(w)]^{n−r} f(w) dw − (r − 1) ∫_0^∞ w^a [F(w)]^{r−2} [1 − F(w)]^{n−r+1} f(w) dw

  = [(r − 1)!(n − r + 1)! / n!] { μ_{r:n}^{(a)} − μ_{r−1:n}^{(a)} } .
N. Balakrishnan and S. S. Gupta
Upon substituting this expression of I in (2.6) and simplifying the resulting equation, we derive the recurrence relation in Eq. (2.5). It should be mentioned here that Theorems 1 and 2 have been proved by Joshi (1978) and have been presented here for the sake of completeness and a better understanding of the results derived in subsequent sections. The recurrence relations presented in Theorems 1 and 2 will enable one to compute all the single moments of order statistics for all sample sizes in a simple recursive way.
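The recursive scheme is straightforward to program. The sketch below is our code, written for the standard exponential distribution (for which the relation (1.3) takes the form f(x) = 1 − F(x)); it computes the single moments from Theorems 1 and 2.

```python
# Recursive computation of the single moments mu_{r:n}^{(a)} of standard-
# exponential order statistics via Theorems 1 and 2:
#   mu_{1:n}^{(a)} = (a/n) mu_{1:n}^{(a-1)},                           (2.3)
#   mu_{r:n}^{(a)} = mu_{r-1:n}^{(a)} + a mu_{r:n}^{(a-1)} / (n-r+1),  (2.5)
# starting from mu_{r:n}^{(0)} = 1.

def single_moments(n, max_a):
    mu = {(r, 0): 1.0 for r in range(1, n + 1)}
    for a in range(1, max_a + 1):
        mu[(1, a)] = a * mu[(1, a - 1)] / n                                 # Theorem 1
        for r in range(2, n + 1):
            mu[(r, a)] = mu[(r - 1, a)] + a * mu[(r, a - 1)] / (n - r + 1)  # Theorem 2
    return mu

mu = single_moments(n=5, max_a=2)
```

For a = 1 this reproduces the classical exponential result μ_{r:n} = Σ_{i=1}^{r} 1/(n − i + 1).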
3. Relations for double moments
The joint density function of X_{r:n} and X_{s:n} is (see David, 1981, p. 10) given by

f_{r,s:n}(w, x) = [n! / ((r − 1)!(s − r − 1)!(n − s)!)] [F(w)]^{r−1} [F(x) − F(w)]^{s−r−1} [1 − F(x)]^{n−s} f(w) f(x) ,   0 ≤ w < x < ∞ , 1 ≤ r < s ≤ n .

Integrating by parts, we obtain for s − r ≥ 2 that

I(w) = (n − s + 1) ∫_w^∞ x^b [F(x) − F(w)]^{s−r−1} [1 − F(x)]^{n−s} f(x) dx − (s − r − 1) ∫_w^∞ x^b [F(x) − F(w)]^{s−r−2} [1 − F(x)]^{n−s+1} f(x) dx .
Upon substituting the above expressions of I(w) in Eq. (3.5) and simplifying the resulting equations, we derive the recurrence relations in Eqs. (3.3) and (3.4).

THEOREM 4. For n ≥ 2 and a, b = 1, 2, …,

μ_{1,2:n}^{(a,b)} = μ_{2:n}^{(a+b)} + a μ_{1,2:n}^{(a−1,b)} − n μ_{1:n−1}^{(a+b)} ;   (3.6)

the companion relations (3.7)-(3.9) correspond to the cases r ≥ 2 with s = r + 1, r = 1 with s ≥ 3, and r ≥ 2 with s − r ≥ 2 treated in the proof below.

PROOF. Integrating by parts yields, for r ≥ 2 and s = r + 1,

J(x) = x^a [F(x)]^{r−1} − (r − 1) ∫₀^x w^a [F(w)]^{r−2} f(w) dw − x^a [F(x)]^r + r ∫₀^x w^a [F(w)]^{r−1} f(w) dw ;

for r = 1 and s ≥ 3,

J(x) = (s − 2) ∫₀^x w^a [F(x) − F(w)]^{s−3} f(w) dw + ∫₀^x w^a [F(x) − F(w)]^{s−2} f(w) dw − (s − 2) ∫₀^x w^a F(w) [F(x) − F(w)]^{s−3} f(w) dw ;

and for r ≥ 2 and s − r ≥ 2,

J(x) = (s − r − 1) ∫₀^x w^a [F(w)]^{r−1} [F(x) − F(w)]^{s−r−2} f(w) dw − (r − 1) ∫₀^x w^a [F(w)]^{r−2} [F(x) − F(w)]^{s−r−1} f(w) dw + r ∫₀^x w^a [F(w)]^{r−1} [F(x) − F(w)]^{s−r−1} f(w) dw − (s − r − 1) ∫₀^x w^a [F(w)]^{r} [F(x) − F(w)]^{s−r−2} f(w) dw .
Upon substituting the above expressions of J(x) in Eq. (3.10) and simplifying the resulting equations, we derive the recurrence relations given in Eqs. (3.6)-(3.9). Either the recurrence relations presented in Theorem 3 or those presented in Theorem 4 may be used in a simple systematic recursive way to compute the double moments of order statistics (of all order) for all sample sizes. The relations in Theorem 3 have been proved by Joshi (1982) for the special case when a = b = 1.
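For the standard exponential, values produced by these double-moment recurrences can be cross-checked against the classical independent-spacings representation; the following sketch (ours, not from the chapter) computes E[X_{r:n} X_{s:n}] directly from it.

```python
# Double moments mu_{r,s:n} = E[X_{r:n} X_{s:n}] for the standard exponential
# from the independent-spacings representation
#   X_{r:n} = sum_{i=1}^{r} Z_i / (n - i + 1),  Z_i i.i.d. standard exponential,
# which gives Cov(X_{r:n}, X_{s:n}) = Var(X_{r:n}) for r <= s.

def double_moment(r, s, n):
    assert 1 <= r <= s <= n
    mean = lambda k: sum(1.0 / (n - i + 1) for i in range(1, k + 1))       # E X_{k:n}
    var = lambda k: sum(1.0 / (n - i + 1) ** 2 for i in range(1, k + 1))   # Var X_{k:n}
    return var(r) + mean(r) * mean(s)
```

For example, double_moment(1, 1, n) reduces to E[X_{1:n}²] = 2/n², the second moment of an Exp(n) minimum.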
Higher order moments of order statistics
4. Relations for triple moments

The joint density function of X_{r:n}, X_{s:n} and X_{t:n} is given by

f_{r,s,t:n}(w, x, y) = [n! / ((r − 1)!(s − r − 1)!(t − s − 1)!(n − t)!)] [F(w)]^{r−1} [F(x) − F(w)]^{s−r−1} [F(y) − F(x)]^{t−s−1} [1 − F(y)]^{n−t} f(w) f(x) f(y) ,   0 ≤ w < x < y < ∞ .

Integrating by parts, we obtain for t − s ≥ 2 that

I(x) = (n − t + 1) ∫_x^∞ y^c [F(y) − F(x)]^{t−s−1} [1 − F(y)]^{n−t} f(y) dy − (t − s − 1) ∫_x^∞ y^c [F(y) − F(x)]^{t−s−2} [1 − F(y)]^{n−t+1} f(y) dy .
Upon substituting the above expressions of I(x) in Eq. (4.5) and simplifying the resulting equations, we derive the recurrence relations in Eqs. (4.3) and (4.4).

THEOREM 6. For n ≥ 3, 3 ≤ t ≤ n and a, b, c = 1, 2, …,

μ_{1,2,t:n}^{(a,b,c)} = μ_{2,t:n}^{(a+b,c)} + a μ_{1,2,t:n}^{(a−1,b,c)} − n μ_{1,t−1:n−1}^{(a+b,c)} ;

and for n ≥ 4,
2 5,
]A(a'b'c)
a (a-l,b,c) t'l (a,b,c) ]Al,s,t:n -- [Al,s-l,t-l:n
2,s,t:n ÷
2 2 a n d t - s K ( w , y ) = (t - s)
- (s-r-
+ (t-s
_> 2 that xb[F(x) - F(w)] s-~-I [F(y) - F(x)] t-s-1 f ( x ) dx
1)
- 1)
xb[F(x) - F ( w ) ] ~ r 2{F(y)
2
× {1 - F ( y ) ] f ( x )
x~{F(x) _ F ( w ) ] ~
dx - (~ - r - l)
× IF(y) - F(~)] t-~-I [1 - F ( y ) ] f ( x )
-F(x)]t-'f(x)d~
~-IEF(y ) _ F ( x ) ] t - ,
£
2
x~{F(x) - F ( w ) ] ' - ~ - 2
d~.
Upon substituting the above expressions of K(w, y) in Eq. (4.15) and simplifying the resulting equations, we derive the recurrence relations given in Eqs. (4.11)-(4.14). The recurrence relations presented in any one of Theorems 5, 6 and 7 may be employed in a simple systematic recursive manner to compute the triple moments of order statistics (of all order) for all sample sizes.
5. Relations for quadruple moments

The joint density function of X_{r:n}, X_{s:n}, X_{t:n} and X_{u:n} is given by

f_{r,s,t,u:n}(w, x, y, z) = [n! / ((r − 1)!(s − r − 1)!(t − s − 1)!(u − t − 1)!(n − u)!)] [F(w)]^{r−1} [F(x) − F(w)]^{s−r−1} [F(y) − F(x)]^{t−s−1} [F(z) − F(y)]^{u−t−1} [1 − F(z)]^{n−u} f(w) f(x) f(y) f(z) ,   0 ≤ w < x < y < z < ∞ .

THEOREM 8. For n ≥ 4, 3 ≤ t < u ≤ n and a, b, c, d = 1, 2, …,
μ_{1,2,t,u:n}^{(a,b,c,d)} = μ_{2,t,u:n}^{(a+b,c,d)} + a μ_{1,2,t,u:n}^{(a−1,b,c,d)} − n μ_{1,t−1,u−1:n−1}^{(a+b,c,d)} ;
for n_> 5, 2 6,
2_ 3 that J ( x ) = (s - 2)
/o ~
w a[F(x) - F ( w ) ] s 3 f ( w ) d w
+ f o x wa[F(x) - F ( w ) ] S - Z f ( w )
- (s - 2) a n d for r > 2 a n d s -
/o xw ~ F ( w ) [ F ( x )
dw
- F(w)]S-3f(w)
dw ,
r_> 2 that
J ( x ) = (s - r - 1)
waiF(w)] ~-1 IF(x) - F ( w ) ] s ~ - 2 f ( w ) dw
- (r -
/0 ~ 1) j0 x wa[F(w)]r-2[F(x)
4- r
w a [F(w)] r-1 IF(x) - F(w)] * - r - ' f ( w ) d w
/o ~
- (s - r - 1)
- F(w)] s ~ - l f ( w )
wa[g(w)]~[F(x)
/o ~
_ F(w)]s
r
dw
2 f ( w ) dw ,
Upon substituting the above expressions of J(x) in Eq. (5.10) and simplifying the resulting equations, we derive the recurrence relations given in Eqs. (5.6)-(5.9).
THEOREM 10. F o r n _ > 4 ,
1 _3anda, 1 t-r-1
li (a:b,c,d) (a+b,c,d) r,r+ l,t,u:n = lir,t,u:n
(u
r,r+l,u-l:,} ;
b,c,d=l,2,3,...,
[. (a,b-l.c,d) [eti,.,.+l/.u:, ' ' t, U
I
_ li(a+b,c,d)
~li(a,b,c,d) r,r+l,t
1....
_ li(a+b,c,d)\
(5.12)
r,t-l,u:n J
[ li(a,b,c,d) __ + 1~1.) r,~+l,t-l,,-1 . . . .
li(a+b,c,d)
~]
t-l,~-l:,JJ
.
,
and a,b,c,d = 1,2,3,...,
for 1 < r < s < u < _ n , s - r > 2
li(a,b,c,d) = li(a,b,cfl) .. (a,b 1,c,d) r,s,s+l,u:n r,s-l,s+l,u:n ~- Olir,s,s+l,u:n
(u - s __ 1){lit,;,.:.' (a b+c d) --/o,b,c,d/ r,s-1,s,u:n 1 J
-
• (a.b+c,d)
--
(n -- u -t- 1) { tXr,.s.u_l:n
a n d for 1 _ 2 and a, b , c , d = l , 2 , 3 , . . . ,
__1
[ Dlirs't'u:n["(a b l.c,d)
(u
"f (~ b,~ a)
r,s-l,t,u:n + t -- S --
--
-- l ) l l i r , s , t : l , u : n
--
p(a,b,c,d)
r , s - l , t - I ....
,, f (a b.c,d) -- ( n - - U + (a,O,c,d)
(5.13)
}
p(a,b,c,a)
l)~lir,s',t2i,u-l:n--
r,s 14 l,u
(5.14)
l:n}] ,
(a c d)
w h e r e li,.,s,t.... - lir,t',~,':n PROOF. F r o m Eqs. (5.1) and (5.2), for 1 _< r < s < t < u _< n a n d a , b , c , d _> 1 we can write (a,b l,c,d) lir,s,t,u:n
n! (r-
1)!(s- r-
x [~[7
1)!(t- s-
1)!(u- t-
1 ) ! ( n - u)!b
y wayCS[F(w)] r-1 IF(z) - F(y)] u-` 111 - F(z)] '~-u
JO dOdO
x K(w,y)f(w)f~/)f(z) where
dwdydz ,
(5.15)
K(w,y) =
[F(x) - F(w)] s-r-1 IF(y) - F ( x ) ] t - s - l f ( x ) d(x b)
= fwY[F(x) - F(w)] s-r-1 [F(y) - F(x)] t-* d(x b) +
yw[F(x)
+
- F(w)] s-r-1 IF(y) - F(x)] t-*-I [F(z) - F ( y ) ] d(x a)
IF(x) - F(w)] s-r-1 [F(y) - F(x)] t-s-1 [1 - F(z)] d(x b)
upon using the relation in (1.3) and then writing 1 − F(x) as [F(y) − F(x)] + [F(z) − F(y)] + [1 − F(z)]. Integrating by parts now, we obtain for s = r + 1 and t = r + 2 that
K ( w , y ) = - w~[FO,) - F(w)] +
//
- w~[F(z) - F C v ) ] + y ~ [ 1 fors=r+l
xb f ( x ) d~ + y ~ [ F ( z ) - F 0 , ) I - F(z)] - w~[1 - F ( z ) ]
,
andt-r>3that K ( w , y ) = - wb[F(y) -- F(w)] t r-1 + (t - r - 1)
/w
xb[F(y)-- F(x)] t-r-2
x f ( x ) d x - w a [ F ( y ) - F(w)] t-~ 2IF(z) - F ( y ) ] + (t - r - 2)
x b [ f ( y ) -- f ( x ) ] t-r 3If(z) - f ( y ) ] f ( x )
dx
- w ~[F(y) - F(w)] t-r-2[1 - F(z)] + (t - r - 2)
fors-r_>2andt=s+l
/w"
~b[F(y)
- F(~)]'
~-3 [1 - F ( z ) ] f ( ~ ) d ~ ,
~[F(x)
- F(w)] .... 2IF(y) - F(x)]f(~)
that
X ( w , y ) = - ( s - r - 1)
£
d~
÷ fwYxb[F(x) -- F(w)] s-r-1 f ( x ) dx + yb[F(y) -- F(w)] s-r-1 IF(z) - F ( y ) ] - (s - r - 1)
/J
xb[F(x) -- F ( w ) ] . . . . 2[F(z) - F(y)] f ( x ) dx
+ y b [ F ( y ) - F(w)] s-r l[1 - F(z)] - (s - r - 1)
xb[F(x) -- F ( w ) ] . . . . 211 - F ( z ) ] f ( x ) dx ,
and for s-
r >_ 2 a n d t - s
>_ 2 t h a t
X ( w , y ) = - ( s - r - 1)
2
z~[F(x) - F(w/] '-~ 2IF(y) - F(x)]' ' f ( x ) cU
+ (t - ~) . / ~ x ~IF(x) - F(w)I'-r-~[FO,)
- (s-r-
- F(x)l'-"
i f ( x ) d~
x b [ F ( x ) - F ( w ) ] ' ~ 2[F(y) - F ( x ) ] t ,-I
1)
x IF(z) - F(y)]f(x) dx + (t - s - 1)
xb[F(x) -- F ( w ) ] . . . . 1
x IF(Y) - F(x)lt-s-zIF(z ) - F(y)]f(x) dx
(s - r - 1)
-
x b f F ( x ) - F(w)] s ~ - 2 [ F ( y ) - F(x)] '-'-~ [1 - F ( z ) ] f ( x )
+(t-s-1
xb[F(x)-F(w)l s-r l[F(y)-F(x)]'-"
dx
2[1-F(z)lf(x)dx.
Upon substituting the above expressions of K(w, y) in Eq. (5.15) and simplifying the resulting equations, we derive the recurrence relations given in Eqs. (5.11)-(5.14).

THEOREM 11. For n ≥ 4, 1 ≤ r < s ≤ n − 2 and a, b, c, d = 1, 2, …,
#(~,b,c,4
#(~,b+~,d)
r,s,s+l,s+2:n =
r,s,s+2n
(a,b.~-1,a)
@ C~r,s,s+l,s+2:n
-- ( n -- S -- 1 ) f # (a'b'c+d) -- # (a'b+c'd) l J ( r,s,s+l:n r,s,s+l:n J
forn
> 5, l < _ r < s < u < n , ]A(a'b'c'd)
u-s>_3anda, 1
= ]l(a'b+c'd) Jr-
..... +1,,:,
r~s....
5, l < _ r < s < t < _ n - 1 ,
b,c,d=l~2,...,
V (a 6,c-l,d)
LC#r,s,s+l .....
u - s - 1 ., f
' (a b c d)
(5.17)
#(a,b+c,d) ~ - l ) l ] 2 r , s ' s ' + ' l , u _ l : n -- r .... l:n }]
--(gl--b!
forn>_
(5.16)
" ,
t-s>_2anda,
;
b,c,d=l,2,...,
]A(a,b,c,d) #(a,b,c,d) (a,b,c 1 .d) ..... +l,u:n = r,s,t-l,t+l:n ~- C]Ar,s~tlt+l:'n (g/ " [" (a bc+d) #(a,b,c,d) I --- t ) l llr,s,tln -- .... t - l ,t:n J i
andforn
> 6, 1 < _ r < s < t < u < _ n , (a,b,c,a) _ #(,,b,c,a) ,tlr.s,t.u:n
--
t-s>_2, 1
F
(5.18) • '
u-t>_2anda,b,c,d=l,2,...,
(, b c-i,a)
r,s,t-l,u:n -w b/ -- t
. , f (a.b.~d) . (~,b,c,d)}] - (n - u + 1) J ,u~,s,tf_h, , -- #~,s,t-l,,-l:n
(~,b,O,a)
w h e r e ~lr,s, t ....
(~ b a)
~ #r4,u:n •
(5.19) ,
PROOF. From Eqs. (5.1) and (5.2), for 1 ≤ r < s < t < u ≤ n and a, b, c, d ≥ 1 we can write
(a,b,c- l,d) = #r's'tlu:n (r -×
n! 1)!(S
-- r --
1)!(t
-- S --
1)!(H
-- t -- 1)!(n
I7oYo xwaxbzd[F(w)] r-1 [F(x) -
-- b/)!C
F(w)] s-r-1
x [1 - F(z)]n-"L (x, z ) f ( w ) f
( x ) f ( z ) d w dx dz , (5.20)
where
L ( x , z ) = ~x z IF(y) =
F(x)] t-"
/xzIF(y) +
I[F(z)-F(y)jUtlf(y)d(y c)
F(x)] t-s-1 IF(z) - F(y)] u-t d ( y c)
/x z[F(y) -
F(x)] t ,-1 [F(z) - F ( y ) ] "-t-1 [1 - F(z)] d ( f f )
upon using the relation in (1.3) and then writing 1 − F(y) as [F(z) − F(y)] + [1 − F(z)]. Integrating by parts now, we obtain for t = s + 1 and u = s + 2 that
L ( x , z ) = -xC[F(z) - F ( x ) ] fort=s+l
+
/x yOf(y)
dy+S[1
-F(z)]-x~[1
- F(z)] ,
andu-s_>3that
L(x, z) = - x~[F(z) - F ( x ) ] " - s - I + ( u - s -
1
y°[F(z)-F(y)]"-'-2f(y)
dy
- x°[F(z) - F(x)] u-s 211 - F(z)] + (u - s - 2) fort-s_>2andu=t+l
that
L ( x , z ) = - ( t - s - 1) -+-
_> 2 a n d u -
/x zyC[F(y) -
F(x)]t-s-2[F(z) - F ( y ) ] f ( y ) dy
yC[F(y)-F(x)]t-s-lf(y)dy+zC[F(z)-F(x)]
- (t - s - 1) a n d for t - s
yC[F(z) - F(y)] u-s-3 [1 - F ( z ) ] f ( y ) dy ,
/z yC[F(y) -
t_> 2 t h a t
t-s l [ 1 - F ( z ) ]
F ( x ) ] t - ' - 2 [ 1 - F ( z ) ] f ( y ) dy ,
L(x, z) = - ( t - s - 1)
~XZ
yCIF(y ) - F ( x ) ] t - s - 2 [ F ( z )
- F(y)]"-tf(y)
yC[F(y ) _ F(x)]t ,-1 IF(z) - F ( y ) ] " - t - l f ( y )
+ (u - t)
-- (t -- S --
1)
//
yC[f(y ) _ f(x)]t-s-2[f(z)
x [1 - F ( z ) ] f ( y ) d y + (u - t - 1)
dy dy
_ f(y)]U t 1
yC[F(y ) _ F(x)]t , 1
× [F(z) − F(y)]^{u−t−2} [1 − F(z)] f(y) dy .

Upon substituting the above expressions of L(x, z) in Eq. (5.20) and simplifying the resulting equations, we derive the recurrence relations given in Eqs. (5.16)-(5.19). The recurrence relations presented in any one of Theorems 8, 9, 10 and 11 may be employed in a simple systematic recursive way to compute the quadruple moments of order statistics (of all order) for all sample sizes.
6. Applications to inference for the one-parameter exponential distribution

By using the results presented in Sections 2-5, we computed the single, double, triple and quadruple moments of order statistics (of order up to 4) for sample sizes up to 12. As will be displayed in this section, these quantities may be successfully used in order to develop a chi-square approximation for the distribution of the best linear unbiased estimator of the scale parameter of an exponential distribution based on doubly Type-II censored samples. In this section we assume that Y_{r+1:n} … For n ≥ 3, 1 ≤ r ≤ n − 2 and a,
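For context, in the simpler case of Type II *right* censoring the BLUE of the exponential scale has the closed form σ̂ = [Σ_{i≤r} X_{i:n} + (n − r)X_{r:n}]/r, and 2rσ̂/θ is exactly chi-square with 2r degrees of freedom; the doubly censored case treated in this section replaces this exact law with an approximation. A quick simulation sketch (our code, hypothetical parameter values) checks the right-censored estimator's unbiasedness:

```python
import random

# BLUE of the scale theta from a Type II right-censored exponential sample:
#   sigma_hat = [sum_{i=1}^{r} X_{i:n} + (n - r) X_{r:n}] / r.
# Its mean should be theta (2 r sigma_hat / theta ~ chi-square with 2r d.f.).

def blue_scale_right_censored(sample, n, r):
    x = sorted(sample)[:r]                  # r smallest observations
    return (sum(x) + (n - r) * x[-1]) / r

random.seed(1)
n, r, theta = 10, 6, 2.0
estimates = [blue_scale_right_censored([random.expovariate(1.0 / theta) for _ in range(n)], n, r)
             for _ in range(20000)]
avg = sum(estimates) / len(estimates)       # close to theta = 2.0
```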
b= 1,2,...,
#~o~,), . : #(a,b)r,n_l:~+ b#~,a,;:b~-1) - n
~(ri #r:n_ 1 - /~.~,~b)l.~ } ,. 1
,
(7.15)
where ,I,lr,s:n THEOREM 15. For n _> 2 and a,b = 1 , 2 , . . . , (a,b)
. (a+b)
#l,2:n = ,U2:n
(a l,b)
+ a#l,2:n
n . (a+b)
-- ePtl:n-I
;
(7.16)
for n > 3,2 < r < n ,
1 and a , b = 1 , 2 , . . . , r kapr'r+hn
r+l:.
~ ~.Ur:n 1 --
,.
'
for n _> 3,3 < s < n and a , b = 1 , 2 , . . . , (a-l,b) #I;~2 = #~s~2 -~ a"#l,s:n and for n >_ 4,2 3, 1 < r < n - 2 ]A(a'b'c) r'r+l'r+2:n =
- n
[
r,r+hn-1
1,2,..., " \ ((a.b+c) ]2(a+b,c)'~ -- l)el.12r,r+l:n -- r,r+l:n J
r,r+l:n-I j
,
(7.30)
forn>4,1_ 5, 1 2 and a , b , c = 1 , 2 , . . . , (ab
it)
[o~,r,s,,:° '
r,s,t-l:n
(7.32)
1-
- (n -t+ r,s-l,t
l:n-1
.\((abc) .(a,b,c) ,)t~,~;,:,:, . . . - . ~,_~,
}]
"1
~:'l'
,7.33,
'
(~,c)
where μ_{r,s,t:n}^{(a,0,c)} ≡ μ_{r,t:n}^{(a,c)}. The recurrence relations presented in any one of Theorems 16, 17 and 18 may be used in a simple systematic recursive fashion in order to compute the triple moments (of all order) of order statistics for all sample sizes.

(iv) Relations for quadruple moments

The joint density function of X_{r:n}, X_{s:n}, X_{t:n} and X_{u:n} is given by

f_{r,s,t,u:n}(w, x, y, z) = [n! / ((r − 1)!(s − r − 1)!(t − s − 1)!(u − t − 1)!(n − u)!)] [F(w)]^{r−1} [F(x) − F(w)]^{s−r−1} [F(y) − F(x)]^{t−s−1} [F(z) − F(y)]^{u−t−1} [1 − F(z)]^{n−u} f(w) f(x) f(y) f(z) .

THEOREM 19. For 1 ≤ r < s < t < u ≤ n and a, b, c, d = 1, 2, …,
p(a,b,c,d) 1 F . (a.b c d-l) r,s,t,u-l:n -~ n -- u + l La/~r'&iu':n
(7.38) --n
a n d for n _> 5, 1 _< r < s < t_< n -
~ r,s,t,u:n-1 -- t~r,s,t,u-l:n-I
;
2 and a , b , c , d = 1 , 2 , . . . ,
(a,b,c,d) (a,b,c,d) ,. (a,b,c,d 1) [1-P]f,,4 fir.s,t,n:n, = #r,s,t,n_l:n ~- a#r,s,t,n:n -- l"t
(a,b,c)
k-7-JUi r,s,,:.-1
#(a,b,c,d) "~ r,s,t,n l:n 1J '
(7.39) (a,b,c,O) (a,b,c) w h e r e #r,s,t,u:n = #r,s,t:n "
THEOREM 20. F o r n _> 4, 3 _< t < u _< n a n d a, b, c, d = 1 , 2 , . . . , (a,b,c,d) = #(a+b,c,d) (a-l,b,c,d) 24.... q- aPl,2,t,u:n
/~l,2,t ....
n (a+b,e,d) ~Pl,t-l,u-l:n-I
;
(7.40)
for n > 5,2 5,3 _< s < t < u _< n a n d a , b , c , d = 1 , 2 , . . . , #(_ 6 , 2 _5,1
_ 2, u -
--
s_> 2 a n d a, b , c , d
= 1,2,...,
1){#r,s,u:n'(ab+cd)
f fi(a.b+c,d)
-- (n -- u + 1 ) t
(7.45)
i}1 ;
r,t-l,u-l:n
#(a,b,c,d) = #(a,b,c,d) _ _ . (a,b-l,c,d), -- (hi -- S r,s,s+l,u:n r,s i,s+i,u:n "~ Oflr,s,s+l.u:n _ #(a,b,c,d) r,s- 1,s,u:n
#(a+b,c,d)
-- r,r+l,u-l:n}
#(a+b,~,d) r,r+l,u-l:n-1}
for n > 5, 1 < r < t < u < n, t - r _> 3 a n d a , b , c , d l~(a,b, 6, 1 < _ r < s < t < u
2, t - s
_> 2 a n d a , b , c , d
(a.b,c,d) (~.b#,d) 1 [ . (a,b-1 c d) , = #r,s l,t,u:n ~- t - s [O#r,s,t,u:n' -- ( U #r,s,t.u:n -
#(a,b,c,a)
-
1" f
., f
(a,o#,d) (a c d) w h e r e #r,s,t .... = #r,t',~,':,, •
r,s,t 1,u-l:n-1-
= 1,2,...
(~ b c.d)
' ~j 5, l < _ r < s < u < _ n ,
u-s>_3anda,
~(a,b,c,d) ,u(a,b..c,d) + , , , = r,s,u:n rss+l.u:n IA
f ]A(a,b,c+d ) _ #(a,b+c,d) 1. . I. ..... , , l : n - I ..... +l:n-1 j ' b,c,d=l,2,...,
1 [ (~ b,c l.d) s - - 1 [C]gr,s,s+l,mn - - (H
_#(a,b--c,d) I -- [~-} r,s,u-l:n j n
(7.48)
--
U
-~-
… and μ_{r,s:n} to denote the product moments E(X_{r:n} X_{s:n}) for 1 ≤ r < s ≤ n. Let us further denote Var(X_{r:n}) by σ_{r,r:n}
N. Balakrishnan and R. Aggarwala
Fig. 1.1. Pdf's of generalized logistic distribution (curves shown for k = 0.1, 0.2, 0.3, 0.4).
and Cov(X_{r:n}, X_{s:n}) by σ_{r,s:n}. For simplicity, we shall also use the notation μ_{r:n} for μ_{r:n}^{(1)}.
In Section I of this paper, we establish several recurrence relations satisfied by the single moments μ_{r:n}^{(i)} and the product moments μ_{r,s:n}. These recurrence relations will enable one to compute all the single and product moments of order statistics for all sample sizes in a simple recursive manner. If we let the shape parameter k → 0, the recurrence relations reduce to the corresponding results for the logistic distribution established by Shah (1966, 1970); see also Balakrishnan (1992). By starting with the values of E(X) = μ_{1:1}, E(X²) = μ_{1:1}^{(2)}, and μ_{1,2:2} = μ_{1:1}², for example, we have determined the means, variances and covariances of order statistics and have tabulated them for samples of size up to 8 for k = 0.1(0.1)0.4. These quantities have then been used to determine the best linear unbiased estimators (BLUE's) of the location and scale parameters of the generalized logistic distribution, and the necessary tables of coefficients and the variances and covariance of the BLUE's have been tabulated for sample sizes 5(5)20 for k = 0.1(0.1)0.4. Next, we discuss maximum likelihood estimation for the two-parameter and the three-parameter models based on Type-II right censored samples. Finally, for the generalized logistic distribution, we have presented an example to illustrate these methods of inference.
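A rough way to reproduce such moment computations without specialized quadrature libraries is to integrate the quantile function against the Beta(r, n − r + 1) density of the corresponding uniform order statistic. The sketch below is ours and assumes the quantile form Q(u) = {1 − ((1 − u)/u)^k}/k implied by a distribution function F(x) = [1 + (1 − kx)^{1/k}]^{−1}, which should be checked against the chapter's parameterization in (1.1)-(1.2).

```python
import math

# Means of order statistics for the generalized logistic via the assumed
# quantile function Q(u) = (1 - ((1-u)/u)**k)/k, with Q(u) = log(u/(1-u))
# in the k -> 0 (logistic) limit.  E[X_{r:n}] is computed by weighting Q
# with the Beta(r, n-r+1) density of U_{r:n}, using a midpoint rule.

def Q(u, k):
    if k == 0.0:
        return math.log(u / (1.0 - u))
    return (1.0 - ((1.0 - u) / u) ** k) / k

def mean_order_stat(r, n, k, N=200000):
    c = math.factorial(n) // (math.factorial(r - 1) * math.factorial(n - r))
    total = 0.0
    for i in range(N):                        # midpoint rule on (0, 1)
        u = (i + 0.5) / N
        total += Q(u, k) * c * u ** (r - 1) * (1.0 - u) ** (n - r)
    return total / N
```

For k = 0 this recovers the logistic values, e.g. E[X_{1:2}] = ψ(1) − ψ(2) = −1, and for any k the identity Σ_r μ_{r:n} = n μ_{1:1} provides an internal check.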
Recurrence relations for single and product moments
Zelterman and Balakrishnan (1992) discussed four types of generalizations of the logistic distribution. The generalized logistic distribution in (1.1), however, is not one of those types; for this reason, the distribution in (1.1) may be referred to as "Type V Generalized Logistic Distribution". This distribution has been proposed, discussed and applied by Hosking (1986, 1990). From (1.1) and (1.2), we observe easily that the hazard function for this family of distributions is given by
h(x) = f(x) / {1 − F(x)} = 1 / {(1 − kx)[1 + (1 − kx)^{1/k}]} ,   −∞ < x < 1/k when k > 0 , 1/k < x < ∞ when k < 0 ,

and

h(x) = 1 / (1 + e^{−x}) ,   −∞ < x < ∞ when k = 0 .
•
n-s+
n - s + 2
-#~,s ~:,
1
(3.5)
}
n-s+l
#~:~
PROOF. For 1 ≤ r < s ≤ n and s − r ≥ 2, let us consider from (3.1)

μ_{r:n} − k μ_{r,s:n} = E[X_{r:n} X_{s:n}^{0} − k X_{r:n} X_{s:n}]
_
n,
- (r - 1)!(s - r -
1)!(n - s)l
(x - kxy)[F(x)] r l x 2 .
(3.11)
THEOREM 3.4. F o r l _ < r < s _ < n a n d s - r _ > 2 , #r+l,s+l:n-kl = [Ar+2,s+l:n+l A v ~
T
7 ~s:n ~-
1 --
].l..... -- ['lrWl,s:n
"
(3.12)
PROOF. For 1 ≤ r < s ≤ n and s − r ≥ 2, let us consider from (3.1)

μ_{s:n} − k μ_{r,s:n} = E[X_{r:n}^{0} X_{s:n} − k X_{r:n} X_{s:n}]
"
= (r - 1)!(s - r - 1)!(n - s)!
ff
(Y - kxy)[F(x)]r-'
x_ 2 and i = 0, 1 , 2 , . . . ,
#(i÷1) (1 . . . . (i+l) - - - , (g+l) - r" + ~d)#1.,,-1 - (1 - 2P + 2~)#l:n l:n+l =
i + 1 {. (i) n
\#1:~-k~'i:.
(9.5)
. (i+1)~ . ) ,
and for i = 0, 1 , 2 , . . . , #(g+l)l:2 = (1
-- P + Q)p(g+l)
--(i+
_
,..,.-.~\ (,+1)
(1 - 2P + z~d)#H
1 ) ( , u l i { - k • (i+l)'~ 'Ul:l
) '
(9.6)
PROOF. For n _> 1 and i = 0, 1,2,..., let us consider from (9.1) and (8.4)
l:n--g#l:n /~(i/ , (i+1) : r / < { ( l - P - I - Q )
fo P' xi[f(x)][1-F(x)] n lax i
Integrating by parts treating x i for integration and the rest of the integrands for differentiation, and then writing F(x) as 1 - [1 - F(x)], we obtain k~(i)
.(i+1)_
l:n -- kt~l:n
n
i; 1
{(l_p+Q)[,(i+l
)
.(i+1)1
Lt~l:n-I -/Xl:n
]
The relation in (9.5) follows simply by rewriting (9.7). Relation (9.6) is obtained by setting n = 1 in the above proof and simplifying. Q.E.D. REMARK 9.1. Letting the shape parameter k ~ 0 in (9.2)-(9.5), we deduce the recurrence relations established by Balakrishnan and Kocherlakota (1986) for the single moments of order statistics from the doubly truncated logistic distribution. Furthermore, letting P -+ 1 and Q ~ 0, we deduce the recurrence relations for the generalized logistic distribution, established in Section I. REMARK 9.2. The recurrence relations established in (9.2)-(9.5) will enable one to compute all the single moments of all order statistics for all sample sizes in a simple recursive manner. This is explained in detail in Section 11. 10. Recurrence relations for product moments
The joint density function of X_{i:n} and X_{j:n} (1 ≤ i < j ≤ n) is given by [David (1981, p. 10), Arnold, Balakrishnan and Nagaraja (1992, p. 16)]

f_{i,j:n}(x, y) = [n! / ((i − 1)!(j − i − 1)!(n − j)!)] [F(x)]^{i−1} [F(y) − F(x)]^{j−i−1} [1 − F(y)]^{n−j} f(x) f(y) ,
Q1 _ 2,
I'ln-l.n:n+l -- 2(p÷-l-Q) I(l -- g ÷ Q)(l'lPl#n-l:n-, -- ( n
1]# (2,~/n:nl
-- (1 -- 2P + 2Q)#,,_l,n: n - (#,, l:n -- k#n-, .... )] n - l , (2) -
PROOF.For
t~n:n+l
•
1 < i < n - 2, let us consider from (10.1) and (8.4)
μ_{i:n} − k μ_{i,i+1:n} = E(X_{i:n} X_{i+1:n}^{0} − k X_{i:n} X_{i+1:n})
n!
= (i - 1)!(n -- i - 1)!
£P~x{F(x)]i-lf(x)J2(x)
dx,
(10.10)
I
where
J2(x)
= (1 - P + Q) fx PI [1 - F(y)]" i-iF(y) dy + (P - Q)
f"
[1 -
dy.
x
Writing F(y) as 1 - [1 - F(y)] in the above integrals and then integrating by parts treating dy for integration and the rest of the integrands for differentiation, we obtain an expression for J~ (x) which, when substituted into (10.10) and simplified by combining integrands containing ( 1 - P + Q)x 2 and then combining integrands containing (P - Q)x 2, gives 1" . (2) + n#i,i+l:n_ 1 -- (n -- i)#i,i+l:n] #i:n -- k#i,i+l:,, = (1 - P + Q) [-t#i+l:~ I-i #(2) + (P - O)(n - i) kn + 1 i+l:n+l ÷
n-i+l n+ 1
#i,i+l:n
] #i,i+l:n+l
"
The recurrence relation in (10.8) is derived simply by rewriting the above equation. Relation (10.9) is derived by setting i = n - 1 in the above proof and simplifying. Q.E.D. THEOREM 10.4. For 1 < i < j < n " 1 and j - i > 2, i /zi,jwl:n+l z /zi~j:n+l ~ - ~ / . (/zi+l,j:n+l - /zi+l,jWl:n+l ) F/ -]- 1
(1 -- P ~- Q)
~ ( n - j + 1)(P- Q)
1/ +7--_
+
(/zi+l,j:n -- /zi+ld+l:n)
- k/z/j:,)}
(10.11)
and f o r l < i < n - 2 ,
_ i (/zi+l:n+l t` (2) -- /zi+l,i+2:n+l) /zi,i+2:n+l = /zi,i+l:n+l ~n+l -~ ( l ' l - i ) ( P - a )
{(1 - P +Q)[i(/zl2+)l:n- /zi+l'i+2:n)
+ (/zi,i+l:n-/zi,i+2:n)]-{-(/zi:n-k/zi,i+l:n)}
,
(10.12)
PROOF. Following the proof of Theorem 10.2, writing F(y) as F(x) + [F(y) - F(x)l in Ji (X) and then integrating by parts treating dy for integration and the rest of the integrands for differentiation, we obtain an expression for Jl(x) which, when substituted into (10.7) and simplified, gives, for 1 l . \J,/
(3.10) Next, for n > 2 we have
a_j(m, n) = Coeff. of u^j (1 − u)^{mn−j} in { Σ_r C(m, r) u^r (1 − u)^{m−r} }^n
          = Σ_r C(m, r) [ Coeff. of u^{j−r} (1 − u)^{m(n−1)−(j−r)} in { Σ_r C(m, r) u^r (1 − u)^{m−r} }^{n−1} ]
          = Σ_r C(m, r) a_{j−r}(m, n − 1) .   (3.11)
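In outline, the recursion (3.11) can be implemented directly; in the sketch below (our code) the lower summation limit `alpha` is left as a parameter, since it is illegible in the source and is therefore an assumption.

```python
from math import comb

# Recursive computation of the coefficients a_j(m, n) via Eq. (3.11):
#   a_j(m, n) = sum_r C(m, r) * a_{j-r}(m, n-1),
# starting from a_j(m, 1) = C(m, j).  The lower limit `alpha` of the sum is
# an assumption (illegible in the source); alpha = 0 gives the full binomial sum.

def a_coeffs(m, n, alpha=0):
    a = {j: comb(m, j) for j in range(alpha, m + 1)}          # a_j(m, 1)
    for _ in range(n - 1):                                    # fold in one more factor
        new = {}
        for r in range(alpha, m + 1):
            for j, v in a.items():
                new[j + r] = new.get(j + r, 0) + comb(m, r) * v
        a = new
    return a
```

With alpha = 0 the Vandermonde identity gives a_j(m, n) = C(mn, j), which provides a handy correctness check.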
Thus, by starting with the values of a_j(m, 1) given in (3.10), we can compute the coefficients a_j(m, n) for any value of n by repeated application of the recurrence relation in (3.11). Means, variances and covariances of all order statistics have been computed for sample size n = 20 and for choices of α = 0.5(0.5)3(1)6(2)12 by using Eqs. (3.3) and (3.4). Since the distribution is symmetric around zero, we have
X_{i:n} =_d −X_{n−i+1:n}   (3.12)
Order statistics from Type III generalized logistic
and

(X_{i:n}, X_{j:n}) =_d (−X_{n−j+1:n}, −X_{n−i+1:n}) .   (3.13)

These results reduce the amount of computation involved in the evaluation of moments of order statistics. From (3.12) and (3.13), we readily get

μ_{i:n} = −μ_{n−i+1:n}   (3.14)

and

μ_{i,j:n} = μ_{n−j+1,n−i+1:n} .   (3.15)
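The symmetry relations (3.14) and (3.15) are easy to confirm empirically; the sketch below (our code) uses the standard logistic — the α = 1 member of this family — as a convenient symmetric special case.

```python
import math
import random

# Monte Carlo check of the symmetry relation (3.14), mu_{i:n} = -mu_{n-i+1:n},
# for the standard logistic (alpha = 1 in the Type III family).  Order
# statistics are generated by sorting uniforms and applying the logistic
# quantile transform Q(u) = log(u/(1-u)).

def mc_means(n, reps, rng):
    sums = [0.0] * n
    for _ in range(reps):
        us = sorted(rng.random() for _ in range(n))
        for i, u in enumerate(us):
            sums[i] += math.log(u / (1.0 - u))
    return [s / reps for s in sums]

means = mc_means(5, 200000, rng=random.Random(7))
# means[0] ~ -means[4], means[1] ~ -means[3], means[2] ~ 0
```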
The IMSL routine DQDAG was employed to perform the single integration in the computation of the single moments of order statistics, while the double integration involved in the computation of the product moments was performed by IMSL routine DTWODQ. Values of means, and variances and covariances of all order statistics are presented in Table 2 and Table 3, respectively; all these values were computed in double precision and rounded off to five decimal places. In order to check the correctness of these values, we used the following known identities:

Table 2
Means of order statistics (μ_{i:n}) for sample size n = 20
i
~=0.5
~--1.0
~=1.5
7=2.0
~=2.5
~=3,0
20
1l 12 13 14 15 16 17 18 19 20
0.07943 0.24027 0.40727 0.58547 0.78165 1.00595 1.27552 1.62408 2.13524 3.14255
0.07071 0.21356 0.36087 0.51628 0.68464 0.87320 1.09417 1.37147 1.76431 2.50863
0.06777 0.20456 0.34529 0.49316 0.65239 0.82929 1.03435 1.28797 1.64012 2.28745
0.06630 0.20008 0.33755 0.48170 0.63647 0.80768 1.00500 1.24709 1.57928 2.17765
0.06542 0.19741 0.33294 0.47488 0.62701 0.79487 0.98765 1.22298 1.54345 2.11270
0.06484 0.19564 0.32989 0.47037 0.62075 0.78641 0.97621 1.20712 1.51991 2.06999
n   i   α = 4.0   α = 5.0   α = 6.0   α = 8.0   α = 10.0   α = 12.0
20
11 12 13 14 15 16 17 18 19 20
0.06412 0.19344 0.32609 0.46477 0.61300 0.77594 0.96208 1.18755 1.49093 2.01744
0,06369 0,19213 0,32383 0.46144 0.60839 0.76972 0.95369 1.17597 1.47381 1.98645
0.06341 0.19126 0.32233 0.45923 0.60533 0.76560 0.94814 1.16831 1.46251 1.96604
0.06305 0.19017 0.32047 0.45648 0.60153 0.76049 0.94126 1.15883 1.44853 1.94083
0.06284 0.18952 0.31935 0.45483 0.59927 0.75744 0.93716 1.15318 1.44022 1.92588
0.06270 0.18909 0.31861 0.45374 0.59776 0.75541 0.93444 1.14943 1.43471 1.91599
† Missing values can be found by the symmetry relation μ_{i:n} = −μ_{n−i+1:n}.
N. Balakrishnan and S. K. Lee
Table 3
Variances and covariances of order statistics (σ_{i,j:n}) for n = 20
i
j
α = 0.5   α = 1.0   α = 1.5   α = 2.0   α = 2.5   α = 3.0
20
1 1 1 1 1 1 1 1 1 1 1 1 l 1 1 1 1 1 1 1 2 2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 2 3
1.60641 0.60709 0.35907 0.25018 0.18992 0.15212 0.12648 0.10815 0.09454 0.08416 0.07608 0.06971 0.06463 0.06058 0.05734 0.05477 0.05276 0.05124 0.05015 0.04945 0.61272 0.36290
0.84810 0.33539 0.20612 0.14842 0.11588 0.09501 0.08051 0.06984 0.06166 0.05520 0.04996 0.04563 0.04200 0.03889 0.03622 0.03389 0.03184 0.03002 0.02840 0.02695 0.34949 0.21560
0.62091 0.25628 0.16160 0.11845 0.09371 0.07760 0.06624 0.05777 0.05119 0.04592 0.04158 0.03794 0.03483 0.03212 0.02973 0.02759 0.02564 0.02383 0.02210 0.02035 0,27459 0,17396
0.51786 0,22057 0.14134 0.10467 0.08340 0.06943 0.05949 0,05203 0~04619 0,04148 0,03757 0.03426 0.03141 0.02891 0.02667 0.02465 0.02277 0.02100 0.01924 0.01737 0.24103 0.15522
0.46061 0.20067 0.12995 0.09688 0.07754 0.06475 0.05562 0.04872 0.04330 0.03890 0.03524 0.03213 0.02943 0.02705 0.02492 0.02297 0.02115 0.01940 0.01765 0.01570 0.22234 0.14472
0.42466 0.18810 0.12272 0.09190 0.07378 0.06175 0.05312 0.04658 0.04143 0.03724 0.03374 0.03075 0.02816 0.02586 0.02379 0.02189 0.02011 0.01838 0.01663 0.01465 0.21053 0.13805
n
i
j
α = 4.0   α = 5.0   α = 6.0   α = 8.0   α = 10.0   α = 12.0
20
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2
l 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 2 3
0.38243 0.17322 0.11411 0.08594 0.06926 0.05812 0.05009 0.04399 0.03916 0.03522 0.03191 0.02908 0.02661 0.02441 0.02243 0.02059 0.01886 0.01717 0.01543 0.01341 0.19651 0,13009
0.35864 0.16475 0.10918 0.08251 0.06665 0.05602 0.04834 0,04249 0,03784 0.03404 0.03085 0.02811 0.02571 0.02358 0.02164 0.01984 0.01814 0.01647 0.01474 0.01271 0.18850 0.12552
0.34344 0.15931 0.10599 0.08029 0.06495 0.05466 0.04720 0.04151 0.03698 0.03327 0.03015 0.02747 0.02513 0.02303 0.02112 0.01935 0.01767 0.01602 0.01429 0.01226 0.18334 0.12256
0.32521 0.15272 0.10213 0.07759 0.06289 0.05299 0.04581 0.04031 0.03593 0.03233 0.02931 0.02670 0.02441 0.02236 0.02049 0.01876 0.01710 0,01547 0.01376 0.01171 0.17707 0.11896
0.31469 0.14890 0.09987 0.07601 0.06168 0.05201 0.04499 0.03960 0.03531 0.03178 0.02881 0.02624 0.02399 0.02197 0.02012 0.01841 0.01677 0.01515 0.01344 0.01139 0.17342 0.11685
0.30784 0.14639 0.09839 0.07497 0.06088 0.05137 0.04445 0.03914 0.03490 0.03142 0.02848 0.02594 0.02371 0.02171 0.01988 0.01818 0.01655 0.01493 0.01324 0.01119 0.17103 0.11547
† Missing values can be found by the symmetry relation σ_{i,j:n} = σ_{n−j+1,n−i+1:n}.
Table 3 (Contd.) n
i
j
α = 0.5   α = 1.0   α = 1.5   α = 2.0   α = 2.5   α = 3.0
20
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 3 4 5 6 7 8
0.25305 0.19221 0.15402 0.12810 0.10956 0.09580 0.08529 0.07712 0.07067 0.06553 0.06142 0,05814 0,05554 0,05351 0.05197 0.05086 0.36880 0.25750 0.19576 0.15696 0.13061 0.11175
0.15557 0.12162 0.09981 0.08462 0.07345 0.06488 0.05810 0.05260 0.04805 0.04423 0.04097 0.03815 0.03570 0.03355 0.03164 0.02993 0.22603 0.16345 0.12796 0.10512 0.08919 0.07745
0.12784 0.10128 0.08396 0.07172 0.06259 0.05549 0.04979 0.04510 0.04116 0.03779 0.03485 0.03227 0.02995 0.02783 0.02587 0.02400 0.18600 0.13700 0.10870 0.09020 0.07711 0.06732
0.11525 0.09197 0.07665 0.06573 0.05751 0.05108 0.04588 0.04157 0.03792 0.03477 0.03201 0.02954 0.02730 0.02523 0.02326 0.02132 0.16796 0.12500 0.09990 0.08333 0.07151 0.06261
0.10816 0.08670 0.07249 0.06230 0.05461 0.04855 0.04364 0.03955 0.03606 0.03304 0.03038 0.02798 0.02580 0.02375 0.02179 0.01983 0.15782 0.11822 0.09490 0,07942 0.06831 0.05990
0.10363 0.08333 0.06981 0.06010 0.05273 0.04692 0.04219 0.03823 0.03486 0.03193 0.02933 0.02698 0.02483 0.02281 0,02086 0.01887 0,15137 0.11389 0.09170 0.07690 0.06624 0.05815
n
i
j
ct = 4 . 0
~ =
ct =
c~ =
~ =
~ =
20
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 3 4 5 6 7 8
0.09821 0.07927 0.06659 0.05744 0.05047 0.04494 0.04043 0.03665 0.03340 0.03057 0.02805 0.02577 0.02367 0.02168 0.01974 0.01774 0.14364 0.10868 0.08784 0.07385 0.06374 0.05603
0.09509 0.07692 0.06473 0.05589 0.04915 0.04379 0.03941 0.03572 0.03256 0.02979 0.02732 0.02507 0.02300 0.02102 0.01909 0.01709 0.13920 0.10567 0.08560 0.07209 0.06229 0.05480
5.0
6.0
0.09306 0.07540 0.06351 0.05488 0.04829 0.04304 0.03874 0.03512 0.03200 0.02927 0.02683 0.02462 0.02256 0.02060 0.01867 0.01667 0.13631 0.10372 0.08414 0.07093 0.06133 0.05399
8.0
0.09059 0.07353 0.06202 0.05365 0.04723 0.04212 0.03792 0.03437 0.03132 0.02864 0.02624 0.02406 0.02202 0.02008 0.01816 0.01615 0.13279 0.10132 0.08235 0.06951 0.06017 0.05299
10.0
0.08914 0.07244 0.06114 0.05292 0.04661 0.04157 0.03743 0.03394 0.03092 0.02827 0.02589 0.02373 0.02170 0.01977 0.01786 0.01585 0.13072 0.09992 0.08130 0.06868 0.05948 0.05241
† Missing values can be found by the symmetry relation σ_{i,j:n} = σ_{n−j+1,n−i+1:n}.
12,0
0.08818 0.07172 0.06057 0.05244 0.04620 0.04121 0.03711 0.03365 0.03066 0.02802 0.02566 0.02351 0.02150 0.01957 0.01766 0.01566 0.12937 0.09899 0.08060 0.06813 0.05902 0.05202
Table 3 (Contd.) n
i
j
α = 0.5   α = 1.0   α = 1.5   α = 2.0   α = 2.5   α = 3.0
20
3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4
9 10 I1 12 13 14 15 16 17 18 4 5 6 7 8 9 10 1l 12 13 14 15
0.09774 0.08705 0.07872 0.07215 0.06692 0.06273 0.05938 0.05673 0.05466 0.05309 0.26366 0.20067 0.16104 0.13410 0.11480 0.10045 0.08949 0.08096 0.07422 0.06885 0.06455 0.06112
0.06844 0.06131 0.05552 0.05074 0.04671 0.04327 0.04031 0.03772 0.03545 0.03343 0.17221 0.13502 0.11103 0.09428 0.08192 0.07242 0.06490 0.05880 0.05374 0.04948 0.04585 0.04272
0.05971 0.05360 0.04856 0.04433 0.04071 0.03756 0.03477 0.03228 0.03000 0.02789 0.14648 0.11640 0.09668 0.08271 0.07226 0.06411 0.05757 0.05218 0.04764 0.04376 0.04038 0.03739
0.05563 0.04998 0.04530 0.04133 0.03790 0.03490 0.03221 0.02977 0.02752 0.02538 0.13476 0.10785 0.09005 0.07733 0.06774 0.06021 0.05412 0.04906 0.04478 0.04107 0.03782 0.03492
0.05328 0.04790 0.04342 0.03960 0.03629 0.03337 0.03075 0.02835 0.02611 0.02396 0.12812 0.10299 0.08627 0.07425 0.06515 0.05797 0.05213 0.04727 0.04312 0.03953 0.03635 0.03350
0.05176 0.04656 0.04220 0.03849 0.03525 0.03239 0.02980 0.02743 0.02520 0.02305 0.12387 0.09987 0,08382 0,07226 0,06346 0.05651 0.05084 0.04610 0.04205 0.03853 0.03540 0.03258
n
i
j
α = 4.0   α = 5.0   α = 6.0   α = 8.0   α = 10.0   α = 12.0
20
3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4
9 10 11 12 13 14 15 16 17 18 4 5 6 7 8 9 10 11 12 13 14 15
0.04992 0.04492 0.04073 0.03713 0.03399 0.03120 0.02866 0.02633 0.02411 0.02196 0.11874 0.09609 0.08086 0.06984 0.06142 0.05474 0.04927 0.04468 0,04075 0.03731 0.03424 0.03147
0.04884 0.04396 0.03986 0.03634 0.03325 0.03050 0.02800 0.02568 0.02348 0.02133 0.11577 0.09389 0.07914 0.06842 0.06022 0.05370 0.04835 0.04385 0.03998 0.03659 0.03357 0.03082
0.04814 0.04334 0.03930 0.03582 0.03277 0.03004 0.02756 0.02526 0.02307 0.02092 0.11383 0.09246 0.07801 0.06750 0.05944 0.05302 0.04775 0.04330 0.03948 0.03612 0.03313 0.03040
0.04727 0.04257 0.03860 0.03518 0.03218 0.02949 0.02703 0.02475 0.02257 0.02041 0.11146 0.09070 0.07662 0.06636 0.05848 0.05218 0.04700 0.04263 0.03886 0.03555 0.03258 0.02987
0.04676 0.04211 0.03819 0.03480 0.03183 0.02916 0.02672 0.02444 0.02227 0.02012 0.11006 0.08966 0.07581 0.06569 0.05791 0.05169 0.04656 0.04223 0.03850 0.03521 0.03226 0.02956
0.04642 0.04181 0.03792 0.03456 0.03159 0.02894 0.02651 0.02424 0.02208 0.01993 0.10914 0.08898 0.07527 0.06524 0.05753 0.05136 0.04627 0.04197 0.03825 0.03498 0.03205 0.02936
† Missing values can be found by the symmetry relation σ_{i,j:n} = σ_{n−j+1,n−i+1:n}.
Order statistics from Type III generalized logistic
141
Table 3 (Contd.) n
i
j
α = 0.5
α = 1.0
α = 1.5
α = 2.0
α = 2.5
α = 3.0
20
4 4 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6
16 17 5 6 7 8 9 10 11 12 13 14 15 16 6 7 8 9 10 11 12 13
0.05840 0.05627 0.20713 0.16641 0.13869 0.11881 0.10402 0.09271 0.08390 0.07695 0.07140 0.06696 0.06341 0.06059 0.17324 0.14453 0.12392 0.10857 0.09683 0.08767 0.08043 0.07466
0.03999 0.03758 0.14291 0.11766 0.09999 0.08694 0.07690 0.06894 0.06248 0.05712 0.05261 0.04876 0.04544 0.04254 0.12513 0.10644 0.09261 0.08197 0.07352 0.06665 0.06096 0.05616
0.03471 0.03227 0.12460 0.10361 0.08870 0.07754 0.06883 0.06183 0.05606 0.05120 0.04703 0.04341 0.04020 0.03733 0.11115 0.09524 0.08331 0.07399 0.06649 0.06030 0.05509 0.05062
0.03228 0.02984 0.11616 0.09709 0.08343 0.07313 0.06503 0.05847 0.05302 0.04840 0.04441 0.04090 0.03776 0.03491 0.10463 0.08998 0.07891 0.07021 0.06315 0.05729 0.05231 0.04800
0.03089 0.02845 0.11134 0.09335 0.08040 0.07058 0.06283 0.05653 0.05127 0.04678 0.04289 0.03945 0.03636 0.03353 0.10087 0.08695 0.07637 0.06802 0.06121 0.05553 0.05069 0.04648
0.02999 0.02756 0.10823 0.09093 0.07843 0.06893 0.06140 0.05526 0.05012 0.04573 0.04190 0.03851 0.03545 0.03263 0.09844 0.08498 0.07472 0.06659 0.05995 0.05439 0.04964 0.04549
n
i
j
α = 4.0
α = 5.0
α = 6.0
α = 8.0
α = 10.0
α = 12.0
20
4 4 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6
16 17 5 6 7 8 9 10 11 12 13 14 15 16 6 7 8 9 10 11 12 13
0.02891 0.02648 0.10447 0.08799 0.07604 0.06691 0.05966 0.05372 0.04873 0.04444 0.04070 0.03736 0.03434 0.03155 0.09548 0.08257 0.07269 0.06484 0.05840 0.05299 0.04835 0.04428
0.02827 0.02586 0.10227 0.08628 0.07464 0.06573 0.05863 0.05281 0.04791 0.04369 0.03999 0.03669 0.03369 0.03091 0.09375 0.08116 0.07151 0.06381 0.05749 0.05217 0.04759 0.04357
0.02786 0.02545 0.10084 0.08515 0.07372 0.06496 0.05796 0.05221 0.04737 0.04319 0.03953 0.03625 0.03327 0.03050 0.09261 0.08023 0.07072 0.06314 0.05689 0.05162 0.04709 0.04310
0.02735 0.02495 0.09907 0.08377 0.07259 0.06400 0.05713 0.05148 0.04670 0.04258 0.03896 0.03571 0.03275 0.02999 0.09121 0.07909 0.06976 0.06230 0.05615 0.05095 0.04647 0.04252
0.02705 0.02465 0.09803 0.08295 0.07192 0.06343 0.05664 0.05104 0.04631 0.04222 0.03862 0.03539 0.03244 0.02969 0.09038 0.07841 0.06919 0.06180 0.05571 0.05056 0.04610 0.04218
0.02685 0.02446 0.09735 0.08242 0.07148 0.06306 0.05632 0.05075 0.04605 0.04198 0.03839 0.03518 0.03223 0.02949 0.08983 0.07797 0.06881 0.06148 0.05542 0.05029 0.04586 0.04195
† Missing values can be found by the symmetry relation σ_{i,j:n} = σ_{n−j+1,n−i+1:n}.
Table 3 (Contd.) n
i
j
α = 0.5
α = 1.0
α = 1.5
α = 2.0
α = 2.5
α = 3.0
20
6 6 7 7 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9 9 10 10
14 15 7 8 9 10 11 12 13 14 8 9 10 11 12 13 9 10 11 12 10 11
0.07004 0.06634 0.15186 0.13034 0.11429 0.10200 0.09241 0.08482 0.07877 0.07392 0.13833 0.12141 0.10845 0.09832 0.09030 0.08390 0.13027 0.11648 0.10569 0.09714 0.12650 0.11490
0.05207 0.04853 0.11379 0.09909 0.08776 0.07875 0.07143 0.06535 0.06023 0.05585 0.10655 0.09443 0.08479 0.07694 0.07043 0.06493 0.10221 0.09184 0.08338 0.07636 0.10017 0.09100
0.04673 0.04329 0.10249 0.08971 0.07972 0.07167 0.06503 0.05943 0.05462 0.05044 0.09691 0.08618 0.07752 0.07036 0.06432 0.05914 0.09356 0.08420 0.07647 0.06993 0.09197 0.08357
0.04422 0.04084 0.09715 0.08526 0.07589 0.06829 0.06197 0.05660 0.05196 0.04787 0.09232 0.08223 0.07403 0.06720 0.06140 0.05638 0.08940 0.08053 0.07314 0.06685 0.08802 0.07998
0.04276 0.03942 0.09406 0.08267 0.07366 0.06632 0.06019 0.05496 0.05041 0.04638 0.08965 0.07992 0.07199 0.06536 0.05969 0.05477 0.08697 0.07838 0.07119 0.06504 0.08571 0.07788
0.04182 0.03850 0.09205 0.08098 0.07221 0.06504 0.05902 0.05388 0.04939 0.04541 0.08790 0.07842 0.07066 0.06415 0.05857 0.05371 0.08538 0.07697 0.06991 0.06386 0.08419 0.07651
n
i
j
α = 4.0
α = 5.0
α = 6.0
α = 8.0
α = 10.0
α = 12.0
20
6 6 7 7 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9 9 10 10
14 15 7 8 9 10 11 12 13 14 8 9 10 1l 12 13 9 10 11 12 10 11
0.04066 0.03738 0.08959 0.07892 0.07043 0.06346 0.05760 0.05256 0.04815 0.04422 0.08576 0.07656 0.06902 0.06266 0.05720 0.05242 0.08343 0.07524 0.06834 0.06240 0.08233 0.07481
0.03998 0.03672 0.08815 0.07771 0.06938 0.06253 0.05675 0.05178 0.04742 0.04353 0.08449 0.07547 0.06805 0.06179 0.05639 0.05166 0.08227 0.07422 0.06741 0.06155 0.08122 0.07381
0.03954 0.03629 0.08720 0.07691 0.06868 0.06191 0.05620 0.05127 0.04694 0.04307 0.08366 0.07475 0.06741 0.06121 0.05586 0.05115 0.08151 0.07354 0.06680 0.06098 0.08050 0.07315
0.03899 0.03576 0.08603 0.07592 0.06783 0.06116 0.05551 0.05064 0.04635 0.04250 0.08264 0.07386 0.06662 0.06049 0.05520 0.05053 0.08057 0.07270 0.06604 0.06028 0.07959 0.07233
0.03866 0.03544 0.08533 0.07533 0.06732 0.06070 0.05510 0.05026 0.04599 0.04216 0.08202 0.07334 0.06615 0.06007 0.05481 0.05016 0.08001 0.07221 0.06559 0.05986 0.07906 0.07184
0.03844 0.03523 0.08487 0.07495 0.06698 0.06041 0.05483 0.05001 0.04576 0.04194 0.08162 0.07298 0.06584 0.05979 0.05455 0.04992 0.07964 0.07188 0.06529 0.05959 0.07870 0.07152
† Missing values can be found by the symmetry relation σ_{i,j:n} = σ_{n−j+1,n−i+1:n}.
Σ_{i=1}^{n} μ_{i:n} = nE(X) = nμ_{1:1}

and

Σ_{i=1}^{n} Σ_{j=1}^{n} σ_{i,j:n} = n Var(X) = nσ_{1,1:1} .
4. BLUEs of location and scale parameters

Let Y be a random variable with p.d.f.

g(y; μ, σ) = √(2+α) e^{−α√(2+α)(y−μ)/σ} / [σ B(α, α) (1 + e^{−√(2+α)(y−μ)/σ})^{2α}] ,
−∞ < y, μ < ∞,  α, σ > 0 ,   (4.1)
where μ, σ and α are the location, scale and shape parameters, respectively. Then, Y is the location-scale form of the reparametrized model presented in Section 2. The corresponding c.d.f. is found as

G(y; μ, σ) = [1/B(α, α)] ∫₀^{p((y−μ)/σ)} u^{α−1}(1 − u)^{α−1} du ,   (4.2)
where p(·) is as defined earlier.

Now, let Y₁, Y₂, ..., Y_n be a random sample of size n from the distribution with p.d.f. and c.d.f. as given in (4.1) and (4.2), respectively, where the shape parameter α is assumed to be known. Further, let Y_{1:n} ≤ Y_{2:n} ≤ ⋯ ≤ Y_{n−r:n} be a Type-II right-censored sample obtained from the above random sample. The best linear unbiased estimators (BLUEs) of μ and σ are then given by [David (1981, pp. 128–132); Arnold, Balakrishnan and Nagaraja (1992, p. 172)]

μ* = −μ′ΓY = a′Y = Σ_{i=1}^{n−r} a_i Y_{i:n}   (4.3)

and

σ* = 1′ΓY = b′Y = Σ_{i=1}^{n−r} b_i Y_{i:n} ,   (4.4)
where a and b are the vectors of coefficients for the BLUEs of the location and scale parameters, respectively,

Γ = Σ⁻¹(1μ′ − μ1′)Σ⁻¹/Δ ,
1 = (1, 1, ..., 1)′_{(n−r)×1} ,
μ = (μ_{1:n}, μ_{2:n}, ..., μ_{n−r:n})′_{(n−r)×1} ,
Δ = (μ′Σ⁻¹μ)(1′Σ⁻¹1) − (μ′Σ⁻¹1)² ,

and

Σ = ( σ_{1,1:n}     σ_{1,2:n}     ⋯   σ_{1,n−r:n}   )
    ( σ_{1,2:n}     σ_{2,2:n}     ⋯   σ_{2,n−r:n}   )
    (    ⋮             ⋮                 ⋮          )
    ( σ_{1,n−r:n}   σ_{2,n−r:n}   ⋯   σ_{n−r,n−r:n} )

of order (n−r) × (n−r).
Further, we have the variances and covariance of the BLUEs μ* and σ* as

Var(μ*) = σ² μ′Σ⁻¹μ/Δ ,   (4.5)

Var(σ*) = σ² 1′Σ⁻¹1/Δ ,   (4.6)

and

Cov(μ*, σ*) = −σ² μ′Σ⁻¹1/Δ .   (4.7)
After computing the means, variances and covariances of order statistics for sample size n = 20 from the standard reparametrized distribution as described in Section 3, the coefficients of the BLUEs of μ and σ were computed from (4.3) and (4.4). All the computations were carried out in double precision, and the IMSL routine DLINDS was used for finding the inverse of the variance-covariance matrix. In Tables 4 and 5, we have presented the coefficients a_i and b_i of the best linear unbiased estimators μ* and σ*, respectively, for sample size n = 20, r = 0, 2, 4 and α = 1.0, 2.0(2)10.0. Values of these coefficients have been rounded off to five decimal places. In order to check the accuracy of the tabulated coefficients, we verified the conditions

Σ_{i=1}^{n−r} a_i = 1 ,  Σ_{i=1}^{n−r} a_i μ_{i:n} = 0   (4.8)

and

Σ_{i=1}^{n−r} b_i = 0 ,  Σ_{i=1}^{n−r} b_i μ_{i:n} = 1 ,   (4.9)
based on the fact that the estimators μ* and σ* are unbiased for μ and σ, respectively. The values of Var(μ*)/σ², Var(σ*)/σ² and Cov(μ*, σ*)/σ² have been computed and are presented in Table 6 for sample size n = 20, r = 0, 2, 4 and α = 1.0, 2.0(2)10.0. These values have also been rounded off to five decimal places.
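As a quick numerical illustration of (4.3)–(4.9), the sketch below computes Lloyd-type BLUE coefficient vectors a and b from a mean vector μ and covariance matrix Σ of order statistics and then checks the unbiasedness conditions (4.8)–(4.9). The μ and Σ used here are toy stand-ins (uniform(0,1) order statistics), not the Type III generalized logistic tables, so only the algebra, not the numbers, matches the text.

```python
import numpy as np

def blue_coefficients(mu, Sigma):
    """Lloyd's BLUE coefficients: a' = -mu' Gamma and b' = 1' Gamma, with
    Gamma = Sigma^{-1}(1 mu' - mu 1')Sigma^{-1}/Delta, as in (4.3)-(4.4)."""
    mu = np.asarray(mu, float)
    one = np.ones_like(mu)
    Si = np.linalg.inv(Sigma)
    Delta = (mu @ Si @ mu) * (one @ Si @ one) - (mu @ Si @ one) ** 2
    Gamma = Si @ (np.outer(one, mu) - np.outer(mu, one)) @ Si / Delta
    return -mu @ Gamma, one @ Gamma  # coefficient vectors a and b

# Toy inputs: means/covariances of uniform(0,1) order statistics, n = 5.
n = 5
mu = np.arange(1, n + 1) / (n + 1.0)
Sigma = np.minimum.outer(mu, mu) * (1 - np.maximum.outer(mu, mu)) / (n + 2.0)
a, b = blue_coefficients(mu, Sigma)
# Conditions (4.8)-(4.9) hold by construction for any valid (mu, Sigma).
```

The conditions (4.8)–(4.9) hold algebraically for any positive-definite Σ, which makes them a convenient self-test for tabulated coefficients, exactly as the authors describe.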
Table 4 Coefficients for the BLUE of μ for n = 20
r
i
α = 1.0
α = 2.0
α = 4.0
α = 6.0
α = 8.0
α = 10.0
0
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 1 2 3 4 5 6 7 8 9 10
0.00685 0.02012 0.03221 0.04291 0.05214 0.05986 0.06605 0.07070 0.07381 0.07536 0.07536 0.07381 0.07070 0.06605 0.05986 0.05214 0.04291 0.03221 0.02012 0.00685 0.00574 0.01894 0.03108 0.04187 0.05123 0.05911 0.06548 0.07032 0.07363 0.07539 0.07559 0.07423 0.07131 0.06683 0.06078 0.05317 0.04401 0.06129 0.00231 0.01540 0.02771 0.03885 0.04866 0.05708 0.06404 0.06951 0.07345 0.07584
0.01838 0.03445 0.04297 0.04893 0.05337 0.05673 0.05923 0.06102 0.06217 0.06274 0.06274 0.06217 0.06102 0.05923 0.05673 0.05337 0.04893 0.04297 0.03445 0.01838 0.01549 0.03178 0.04067 0.04701 0.05181 0.05553 0.05840 0.06054 0.06205 0.06296 0.06330 0.06307 0.06224 0.06077 0.05857 0.05549 0.05127 0.09904 0.00907 0.02602 0.03580 0.04302 0.04870 0.05327 0.05697 0.05993 0.06224 0.06393
0.03070 0.04276 0.04731 0.05023 0.05228 0.05378 0.05486 0.05562 0.05611 0.05635 0.05635 0.05611 0.05562 0.05486 0.05378 0.05228 0.05023 0.04731 0.04276 0.03070 0.02573 0.03884 0.04415 0.04770 0.05033 0.05235 0.05393 0.05517 0.05611 0.05680 0.05724 0.05744 0.05739 0.05707 0.05641 0.05534 0.05368 0.12433 0.01660 0.03181 0.03860 0.04338 0.04710 0.05012 0.05265 0.05479 0.05660 0.05813
0.03630 0.04537 0.04841 0.05032 0.05165 0.05260 0.05329 0.05377 0.05407 0.05422 0.05422 0.05407 0.05377 0.05329 0.05260 0.05165 0.05032 0.04841 0.04537 0.03630 0.03033 0.04097 0.04494 0.04759 0.04956 0.05110 0.05234 0.05333 0.05414 0.05478 0.05526 0.05559 0.05576 0.05576 0.05556 0.05510 0.05425 0.13364 0.02003 0.03354 0.03920 0.04319 0.04633 0.04892 0.05113 0.05305 0.05474 0.05624
0.03941 0.04661 0.04889 0.05031 0.05128 0.05198 0.05248 0.05283 0.05305 0.05316 0.05316 0.05305 0.05283 0.05248 0.05198 0.05128 0.05031 0.04889 0.04661 0.03941 0.03287 0.04196 0.04526 0.04747 0.04913 0.05045 0.05152 0.05242 0.05316 0.05377 0.05428 0.05467 0.05495 0.05511 0.05512 0.05494 0.05448 0.13844 0.02194 0.03435 0.03943 0.04305 0.04590 0.04829 0.05036 0.05219 0.05382 0.05531
0.04138 0.04733 0.04915 0.05028 0.05105 0.05160 0.05199 0.05227 0.05244 0.05252 0.05252 0.05244 0.05227 0.05199 0.05160 0.05105 0.05028 0.04915 0.04733 0.04138 0.03448 0.04252 0.04542 0.04738 0.04886 0.05005 0.05103 0.05187 0.05257 0.05318 0.05369 0.05412 0.05446 0.05471 0.05485 0.05484 0.05461 0.14136 0.02314 0.03480 0.03955 0.04294 0.04564 0.04791 0.04989 0.05167 0.05328 0.05475
2
4
Table 4 (Contd.) r
i
α = 1.0
α = 2.0
α = 4.0
α = 6.0
α = 8.0
α = 10.0
11 12 13 14 15 16
0.07667 0.07590 0.07352 0.06950 0.06383 0.16775
0.06503 0.06553 0.06541 0.06461 0.06301 0.21744
0.05939 0.06040 0.06115 0.06160 0.06169 0.24601
0.05756 0.05873 0.05972 0.06054 0.06113 0.25594
0.05666 0.05789 0.05901 0.06000 0.06083 0.26097
0.05612 0.05740 0.05858 0.05967 0.06065 0.26399
Table 5 Coefficients for the BLUE of σ for n = 20
r
i
α = 1.0
α = 2.0
α = 4.0
α = 6.0
α = 8.0
α = 10.0
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
-0.05686 -0.06110 -0.06034 -0.05660 -0.05076 -0.04338 -0.03483 -0.02545 -0.01549 -0.00520 0.00520 0.01549 0.02545 0.03483 0.04338 0.05076 0.05660 0.06034 0.06110 0.05686 -0.06308 -0.06766 -0.06671 -0.06245 -0.05588 -0.04761 -0.03807 -0.02762 -0.01654 -0.00511 0.00643 0.01782 0.02882 0.03915 0.04851 0.05653
-0.07491 -0.07176 -0.06409 -0.05580 -0.04731 -0.03875 -0.03014 -0.02153 -0.01292 -0.00431 0.00431 0.01292 0.02153 0.03014 0.03875 0.04731 0.05580 0.06409 0.07176 0.07491 -0.08433 -0.08044 -0.07159 -0.06210 -0.05241 -0.04266 -0.03289 -0.02312 -0.01337 -0.00363 0.00609 0.01580 0.02548 0.03512 0.04471 0.05417
-0.09085 -0.07550 -0.06356 -0.05334 -0.04409 -0.03545 -0.02722 -0.01927 -0.01149 -0.00382 0.00382 0.01149 0.01927 0.02722 0.03545 0.04409 0.05334 0.06356 0.07550 0.09085 -0.10350 -0.08548 -0.07161 -0.05978 -0.04909 -0.03911 -0.02962 -0.02046 -0.01152 -0.00270 0.00606 0.01485 0.02373 0.03280 0.04213 0.05185
-0.09751 -0.07616 -0.06294 -0.05223 -0.04284 -0.03426 -0.02620 -0.01849 -0.01101 -0.00366 0.00366 0.01101 0.01849 0.02620 0.03426 0.04284 0.05223 0.06294 0.07616 0.09751 -0.11159 -0.08656 -0.07114 -0.05868 -0.04777 -0.03781 -0.02846 -0.01953 -0.01087 -0.00237 0.00608 0.01456 0.02317 0.03201 0.04122 0.05096
-0.10109 -0.07637 -0.06255 -0.05162 -0.04218 -0.03364 -0.02568 -0.01810 -0.01077 -0.00358 0.00358 0.01077 0.01810 0.02568 0.03364 0.04218 0.05162 0.06255 0.07637 0.10109 -0.11596 -0.08696 -0.07082 -0.05808 -0.04708 -0.03714 -0.02787 -0.01906 -0.01055 -0.00220 0.00609 0.01442 0.02289 0.03162 0.04076 0.05049
-0.10331 -0.07645 -0.06229 -0.05125 -0.04178 -0.03327 -0.02537 -0.01787 -0.01063 -0.00353 0.00353 0.01063 0.01787 0.02537 0.03327 0.04178 0.05125 0.06229 0.07645 0.10331 -0.11868 -0.08716 -0.07060 -0.05770 -0.04666 -0.03673 -0.02752 -0.01878 -0.01035 -0.00209 0.00610 0.01434 0.02273 0.03139 0.04049 0.05021
Table 5 (Contd.) r
i
α = 1.0
α = 2.0
α = 4.0
α = 6.0
α = 8.0
α = 10.0
17 18 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
0.06274 0.19071 -0.07233 -0.07723 -0.07580 -0.07061 -0.06281 -0.05310 -0.04198 -0.02985 -0.01705 -0.00391 0.00930 0.02228 0.03474 0.04634 0.05671 0.33532
0.06339 0.22178 -0.09747 -0.09224 -0.08157 -0.07026 -0.05879 -0.04729 -0.03582 -0.02438 -0.01300 -0.00166 0.00961 0.02082 0.03195 0.04295 0.05377 0.36336
0.06212 0.23933 -0.11992 -0.09812 -0.08160 -0.06756 -0.05490 -0.04312 -0.03193 -0.02115 -0.01065 -0.00032 0.00992 0.02016 0.03047 0.04093 0.05160 0.37619
0.06147 0.24532 -0.12935 -0.09937 -0.08105 -0.06628 -0.05336 -0.04158 -0.03055 -0.02003 -0.00984 0.00015 0.01005 0.01996 0.02999 0.04023 0.05081 0.38021
0.06111 0.24833 -0.13444 -0.09983 -0.08067 -0.06557 -0.05255 -0.04079 -0.02985 -0.01946 -0.00943 0.00039 0.01012 0.01987 0.02975 0.03989 0.05041 0.38219
0.06089 0.25014 -0.13761 -0.10006 -0.08041 -0.06513 -0.05205 -0.04031 -0.02943 -0.01912 -0.00919 0.00053 0.01016 0.01981 0.02961 0.03968 0.05017 0.38336
Table 6 Values of (1) Var(μ*)/σ², (2) Var(σ*)/σ² and (3) Cov(μ*, σ*)/σ² for n = 20
α = 1.0
α = 2.0
(1) (2) (3)
0.07595 0.03660 0.00000
0.06282 0,03224 0,00000
(1) (2) (3)
0.07609 0.04065 0.00072
0,06323 0,03645 0.00130
(1) (2) (3)
0.07695 0.04673 0.00298
0.06468 0.04247 0.00424
α = 4.0
r = 0
0.05635 0.02962 0.00000
r = 2
0.05705 0.03405 0.00174
r = 4
0.05889 0.03999 0.00505
α = 6.0
α = 8.0
α = 10.0
0.05421 0.02870 0.00000
0.05315 0.02824 0.00000
0.05252 0.02796 0.00000
0.05504 0.03322 0.00192
0.05405 0.03280 0.00201
0.05346 0.03255 0.00207
0.05703 0.03912 0.00534
0.05612 0.03869 0.00549
0.05557 0.03842 0.00558
5. MLEs of location and scale parameters

Consider a random sample of size n from the reparametrized distribution with p.d.f. g(y) and c.d.f. G(y) as given in (4.1) and (4.2), respectively, where the shape parameter α is assumed to be known. Let Y_{1:n} ≤ Y_{2:n} ≤ ⋯
Given a random sample of size n from some population with distribution function G, an estimate of G is the empirical distribution function, G_n(x), given by

G_n(x) = { 0          for x < X_{1:n} ,
         { (i−1)/n    for X_{i−1:n} ≤ x < X_{i:n} ,
         { 1          for X_{n:n} ≤ x .   (2.7)

Since G_n(x) → G(x) in probability as n → ∞, a natural estimate of Q(p) is given by the sample quantile function, Q_n(p), given by

Q_n(p) = G_n^{-1}(p) := inf{x | G_n(x) ≥ p} = X_{i:n}   for (i−1)/n < p ≤ i/n .   (2.8)
Parzen (1979) modified this estimator slightly through interpolation. He suggests using

Q̃_n(p) = n(i/n − p) X_{i−1:n} + n(p − (i−1)/n) X_{i:n}   for (i−1)/n < p ≤ i/n .   (2.9)
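A minimal NumPy sketch of the sample quantile (2.8) and Parzen's interpolated version (2.9). The handling below the smallest order statistic is a convention assumed here, since (2.9) is stated only for interior indices.

```python
import numpy as np

def sample_quantile(x, p):
    """Q_n(p) = X_{i:n} for (i-1)/n < p <= i/n, as in (2.8)."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    i = min(max(int(np.ceil(n * p)), 1), n)  # smallest i with i/n >= p
    return xs[i - 1]

def parzen_quantile(x, p):
    """Parzen's interpolated quantile (2.9): linear between X_{i-1:n} and X_{i:n}.
    Below X_{1:n} we reuse X_{1:n} (an assumed edge convention)."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    i = min(max(int(np.ceil(n * p)), 1), n)
    lo = xs[i - 2] if i >= 2 else xs[0]      # X_{i-1:n}
    return n * (i / n - p) * lo + n * (p - (i - 1) / n) * xs[i - 1]
```

Note that Q̃_n is continuous in p and agrees with Q_n at the grid points p = i/n, which is the point of Parzen's modification.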
Under the assumption that the form of G is known up to some unknown parameters, efficiencies of the preceding estimators can be greatly improved upon by using a few selected order statistics. In particular, when sampling from a
186
M. M. Ali and D. Umbach
location-scale model, estimation by the use of selected order statistics is often employed. This is typically accomplished by first estimating the location and scale parameters and then using these estimators to form an estimate of Q(p). Thus, estimation of the parameters will be considered at this point.

Let X₁, X₂, ..., X_n be a random sample of size n from a distribution with distribution function F_{λ,δ} and density function f_{λ,δ}, where F_{λ,δ}(x) = F((x − λ)/δ) and f_{λ,δ}(x) = f((x − λ)/δ)/δ, with location λ ∈ ℝ and scale δ ∈ ℝ⁺. Let X_{1:n} ≤ X_{2:n} ≤ ⋯ ≤ X_{n:n} be the corresponding order statistics and let Y_{i:n} = (X_{i:n} − λ)/δ for i = 1, 2, ..., n. Then Y_{1:n} ≤ Y_{2:n} ≤ ⋯ ≤ Y_{n:n} are the order statistics corresponding to a random sample from F. Hence, with X_{i:n} = λ + δY_{i:n} for i = 1, 2, ..., n, and E(Y_{i:n}) = αᵢ and Cov(Y_{i:n}, Y_{j:n}) = v_{ij},

E(X_{i:n}) = λ + δαᵢ   (2.10)

and

Cov(X_{i:n}, X_{j:n}) = δ²v_{ij} .   (2.11)
Using these results, one can find the best linear unbiased estimators (BLUEs) of λ and/or δ based on selected order statistics as follows. Denote the ranks of the k selected order statistics by n₁ < n₂ < ⋯ < n_k. Express X_{nᵢ:n} as

X_{nᵢ:n} = λ + δα_{nᵢ} + ε_{nᵢ}   for i = 1, 2, ..., k,   (2.12)

where E(ε_{nᵢ}) = 0 and Cov(ε_{nᵢ}, ε_{nⱼ}) = δ²v_{nᵢnⱼ}. In matrix notation, this is

X = Aθ + ε = λ1 + δα + ε ,   (2.13)

where

X = (X_{n₁:n}, X_{n₂:n}, ..., X_{n_k:n})′ ,  A = (1, α) ,  θ = (λ, δ)′ ,
α = (α_{n₁}, α_{n₂}, ..., α_{n_k})′ ,  1 = (1, 1, ..., 1)′ ,  ε = (ε_{n₁}, ε_{n₂}, ..., ε_{n_k})′ .   (2.14)
By the Gauss-Markov theorem, the BLUE for O based on Xn,:,, X,2:~,... , Xn~:~ is 6 = (A'V-1A)-IA'V i x ,
(2.15)
where V is the matrix whose elements are vninj. The variance-covariance matrix for is 62(A'V-IA) 1. If 6 is known, then the B L U E of 2 is
λ̂ = (1′V⁻¹1)⁻¹1′V⁻¹(X − δα)   (2.16)
Optimal linear inference using selected order statistics in location-scale models
187
with

Var(λ̂) = δ²(1′V⁻¹1)⁻¹ .   (2.17)
If λ is known, then the BLUE of δ is

δ̂ = (α′V⁻¹α)⁻¹α′V⁻¹(X − λ1)   (2.18)

with

Var(δ̂) = δ²(α′V⁻¹α)⁻¹ .   (2.19)
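A sketch of how (2.13)–(2.15) are used in practice. The moments α and V below are taken from Mosteller's asymptotic formulas (quoted later in this section) for a standard normal parent at an arbitrarily chosen spacing p, rather than from exact tables, so the whole computation is an approximation; the spacing and the normal parent are illustrative assumptions.

```python
import numpy as np
from statistics import NormalDist

def blue_theta(x_sel, alpha, V):
    """theta_hat = (A'V^{-1}A)^{-1} A'V^{-1} X, as in (2.15)."""
    A = np.column_stack([np.ones(len(x_sel)), alpha])
    Vi = np.linalg.inv(V)
    return np.linalg.solve(A.T @ Vi @ A, A.T @ Vi @ np.asarray(x_sel, float))

# Approximate moments of the standardized selected order statistics
# (standard normal parent) via the asymptotic formulas.
nd = NormalDist()
p = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
alpha = np.array([nd.inv_cdf(t) for t in p])   # ~ E(Y_{n_i:n})
f = np.array([nd.pdf(a) for a in alpha])
V = np.minimum.outer(p, p) * (1 - np.maximum.outer(p, p)) / np.outer(f, f)

# Structural check: if X = lambda + delta*alpha exactly (no noise),
# the BLUE recovers (lambda, delta) exactly, since X = A theta.
lam, delta = 10.0, 2.0
theta = blue_theta(lam + delta * alpha, alpha, V)
```

The noise-free check at the end is just the algebraic identity θ̂ = (A′V⁻¹A)⁻¹A′V⁻¹Aθ = θ; with real data the selected order statistics replace λ + δα.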
These estimates can all be computed once the first two moments of the order statistics are known. This demonstrates the importance of tables of moments of order statistics.

Depending on one's focus and which parameters may be known, one may use a variety of criteria to choose the optimal set of k order statistics at this point. This will be the focus of the next section. Suffice it to say here that analytic results on optimal selection of order statistics are quite rare. The result is the generation of lengthy tables for each possible sample size. To simplify things, asymptotic results are often used.

For the asymptotic case, (p₁, p₂, ..., p_k) is called a spacing of the k selected order statistics if 0 < p₁ < p₂ < ⋯ < p_k < 1. Here, the ranks of the order statistics are obtained from nᵢ = [npᵢ] + 1, for i = 1, 2, ..., k, where [·] represents the greatest integer function. For a fixed spacing, the asymptotic distribution of selected order statistics was obtained by Mosteller (1946).

THEOREM (Mosteller). Let 0 < p₁ < p₂ < ⋯ < p_k < 1 be fixed. Suppose that X = (X_{n₁:n}, X_{n₂:n}, ..., X_{n_k:n})′ is a vector of k order statistics from a random sample of size n from distribution function G with associated density g, where nᵢ = [npᵢ] + 1, for i = 1, 2, ..., k. Suppose that g(G⁻¹(pᵢ)) > 0 and that g and g′ are continuous in a neighborhood of G⁻¹(pᵢ) for i = 1, 2, ..., k. Then X has, asymptotically, a k-variate normal distribution with
E(X_{nᵢ:n}) = G⁻¹(pᵢ)   for i = 1, 2, ..., k   (2.20)

and

Cov(X_{nᵢ:n}, X_{nⱼ:n}) = (1/n) · pᵢ(1 − pⱼ) / [g(G⁻¹(pᵢ)) g(G⁻¹(pⱼ))]   for i ≤ j = 1, 2, ..., k .
Under the location-scale model, with G = F_{λ,δ}, note that E(X) = λ1 + δu, where

u = (u₁, u₂, ..., u_k)′ = (F⁻¹(p₁), F⁻¹(p₂), ..., F⁻¹(p_k))′ .   (2.21)

Also, the variance-covariance matrix of X is given by (δ²/n)W, where the (i,j)th element of W is given by (pᵢ/fᵢ)((1 − pⱼ)/fⱼ) for i ≤ j, with fᵢ = f(F⁻¹(pᵢ)).

The λ-th regression quantile is defined as a minimizer, over b, of

Σ_{t ∈ {t: y_t ≥ d_t′b}} λ|y_t − d_t′b| + Σ_{t ∈ {t: y_t < d_t′b}} (1 − λ)|y_t − d_t′b| ,   (2.4)

where d_j is the jth row of D_n, the design matrix. For p = 0 and d_j = 1 the minimization problem yields b₀ equal to the sample quantile of order λ, while λ = 1/2 yields b₀ as the sample median. Thus, the median is the least absolute error (LAE) estimator of β₀.
Let B̂(λ) be the solution set consisting of the (β̂₀(λ), β̂′_n(λ))′ of the regression quantiles obtained by minimising (2.4). Also, let ℋ denote the collection of (p + 1)-element subsets of J = {1, 2, ..., n}. There are (n choose p+1) possible subsets in ℋ. Elements h ∈ ℋ have complements h̄ = J − h, and h and h̄ partition the vector Y as well as the design matrix D_n. Thus, the notation Y(h) stands for the (p + 1)-element vector of y's from Y, i.e., {y_j, j ∈ h}, while D_n(h̄) stands for an (n − p − 1) × (p + 1) matrix with rows {(1, x_j′) | j ∈ h̄}. Finally, let
On some L-estimation in linear regression models

H = {h ∈ ℋ | rank D_n(h) = p + 1} .   (2.5)
Then, the following theorem gives the form of the solution of (2.4).

THEOREM 2.1. If the rank of D_n is p + 1, then the set of regression quantiles B̂(λ) has at least one element of the form

β̂*_n(λ) = D_n(h)⁻¹ Y(h)   (2.6)

for some h ∈ H. Moreover, B̂(λ) is the convex hull of all solutions of the form (2.6).

The proof of Theorem 2.1 follows from the linear programming formulation of the minimization problem as given below:

min [λ1′_n r⁺ + (1 − λ)1′_n r⁻] ,  1_n = (1, ..., 1)′ ,
subject to  Y = D_n b + r⁺ − r⁻ ,  (b, r⁺, r⁻) ∈ ℝ^{p+1} × ℝ₊ⁿ × ℝ₊ⁿ .   (2.7)
For example, see Abdelmalek (1974). Sample quantiles in the location model are identified with a single order statistic in a sample. Theorem 2.1 generalizes this feature to regression quantiles, where normals to hyperplanes defined by subsets of (p + 1) observations play the role of order statistics. The following theorems describe various other properties of regression quantiles (RQs) given in Koenker and Bassett (1978).
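Theorem 2.1 suggests a direct (if brute-force) way to compute a regression quantile without a linear-programming solver: enumerate the (p+1)-element subsets h, solve D_n(h)b = Y(h), and keep the basic solution with the smallest check-function objective. The sketch below does exactly that; it is exact but O(n^{p+1}), so only usable for small problems, and is not the Koenker–d'Orey algorithm used later in the chapter.

```python
import numpy as np
from itertools import combinations

def rho(u, lam):
    """Check function: lam*u for u >= 0 and (lam-1)*u for u < 0."""
    return np.where(u >= 0, lam * u, (lam - 1) * u)

def regression_quantile(x, y, lam):
    """Brute-force lambda-th regression quantile via Theorem 2.1: some
    optimal solution has the basic form D_n(h)^{-1} Y(h) for h in H."""
    D = np.column_stack([np.ones(len(y)), x])   # design matrix with intercept
    n, q = D.shape
    best, best_obj = None, np.inf
    for h in combinations(range(n), q):
        Dh = D[list(h)]
        if abs(np.linalg.det(Dh)) < 1e-12:      # h not in H: singular basis
            continue
        b = np.linalg.solve(Dh, y[list(h)])
        obj = rho(y - D @ b, lam).sum()
        if obj < best_obj:
            best, best_obj = b, obj
    return best
```

With lam = 0.5 this returns the LAE (median-regression) fit, and the returned solution interpolates p + 1 of the observations, exactly as Theorem 2.1 asserts.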
THEOREM 2.2. If β̂*_n(λ, Y, D_n) ∈ B̂(λ, Y, D_n), then the following are elements of the specified transformed problems:

(i)   β̂*_n(λ, cY, D_n) = c β̂*_n(λ, Y, D_n) ,  c ∈ [0, ∞);
(ii)  β̂*_n(1 − λ, dY, D_n) = d β̂*_n(λ, Y, D_n) ,  d ∈ (−∞, 0];
(iii) β̂*_n(λ, Y + D_nγ, D_n) = β̂*_n(λ, Y, D_n) + γ ,  γ ∈ ℝ^{p+1};
(iv)  β̂*_n(λ, Y, D_nC) = C⁻¹ β̂*_n(λ, Y, D_n) ,  C a (p+1) × (p+1) non-singular matrix.   (2.8)
The following theorem gives the conditions when RQ's are unique. THEOREM 2.3. If the error distribution Fo(z) is continuous, then with probability ^. one: 1.,(2) = D,(h) -1 Y(h) is a unique solution if and only if (2'--l)lp+ 1 < Z[1/2
-- 1/2 sgn(yj - d j f '^* l.)-
2']djD.(h) I < 2.1;+1
jCh (2.9) where lp+l is a vector of ones.
240
S. Alimoradi and A. K. Md. E Saleh
The next theorem states the relationship of the numbers of residual errors û_j = y_j − d_j′β̂_n(λ), j = 1, ..., n, that are positive, negative and zero with that of nλ.

THEOREM 2.4. Let û = (û₁, ..., û_n)′ and let P(û), N(û) and Z(û) be the numbers of positive, negative and zero residuals. Then, N(û) ≤ nλ ≤ N(û) + Z(û).

This suggests the course of estimation using (3.25) and (3.26), where there is no such anomaly. In order to obtain the ABLUE of (β*′, σ)′ optimally, we maximise K₁^p Δ with respect to λ₁, λ₂, ..., λ_k satisfying 0 < λ₁ < λ₂ < ⋯ < λ_k < 1. Once we have obtained the optimum spacing vector λ⁰ = (λ₁⁰, ..., λ_k⁰), we can determine the optimal coefficients a⁰ = (a₁⁰, ..., a_k⁰)′ and b⁰ = (b₁⁰, ..., b_k⁰)′. To illustrate the methodology, we consider the Cauchy distribution. Here, we note that
( π²(λᵢ ∧ λⱼ − λᵢλⱼ) / [sin²(πλᵢ) sin²(πλⱼ)] )_{i,j=1,...,k}   (3.32)
and

Q₀(λⱼ) = tan[π(λⱼ − 1/2)] ,  j = 1, ..., k .   (3.33)
The explicit forms of K₁, K₂ and K₃ are

K₁ = (1/π²) Σ_{i=1}^{k+1} [sin²(πλᵢ) − sin²(πλᵢ₋₁)]² / (λᵢ − λᵢ₋₁) ,

K₂ = (1/4π²) Σ_{i=1}^{k+1} [sin(2πλᵢ) − sin(2πλᵢ₋₁)]² / (λᵢ − λᵢ₋₁) ,   (3.34)

K₃ = −(1/2π²) Σ_{i=1}^{k+1} [sin²(πλᵢ) − sin²(πλᵢ₋₁)][sin(2πλᵢ) − sin(2πλᵢ₋₁)] / (λᵢ − λᵢ₋₁) ,

with λ₀ = 0 and λ_{k+1} = 1.
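The K quantities are easy to evaluate numerically. The sketch below uses the facts that, for the Cauchy, f(Q₀(t)) = sin²(πt)/π and f(Q₀(t))Q₀(t) = −sin(2πt)/(2π); the resulting 1/π²-type normalizing constants are an assumption, but one that reproduces the values tabulated in Table 3.1 (e.g., K₁ ≈ 0.4774 and the ARE 2K₁ ≈ 0.9548 for k = 7).

```python
import numpy as np

def cauchy_K(lams):
    """K1, K2, K3 of (3.34) for Cauchy errors, with endpoints 0 and 1."""
    t = np.concatenate([[0.0], np.asarray(lams, float), [1.0]])
    s2 = np.sin(np.pi * t) ** 2    # pi * f(Q0(t))
    s = np.sin(2 * np.pi * t)      # -2*pi * f(Q0(t)) * Q0(t)
    dt = np.diff(t)
    K1 = np.sum(np.diff(s2) ** 2 / dt) / np.pi ** 2
    K2 = np.sum(np.diff(s) ** 2 / dt) / (4 * np.pi ** 2)
    K3 = -np.sum(np.diff(s2) * np.diff(s) / dt) / (2 * np.pi ** 2)
    return K1, K2, K3

# Spacings from Table 3.1 for k = 7; K3 = 0 by symmetry of the spacings.
lams = [0.1147, 0.2500, 0.3853, 0.5000, 0.6147, 0.7500, 0.8853]
K1, K2, K3 = cauchy_K(lams)
```

For symmetric spacings the cross term K₃ vanishes exactly, which is why the K₃ row of Table 3.1 is identically 0.0000.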
The optimum spacings for the estimation of (β*′, σ)′ may be obtained by maximising 2^{p+2} K₁^p Δ in this case, since I₁₁ = I₂₂ = 1/2 and I₁₂ = 0. We know that the optimum spacings for the location-scale parameters are given by

λ⁰ = (1/(k+1), 2/(k+1), ..., k/(k+1))   (3.35)

[see, for example, Balmer et al. (1974) and Cane (1974)]. Using these spacings we obtain the optimum coefficients as

aᵢ⁰ = −(4/(k+1)) sin⁴(πi/(k+1)) cos(2πi/(k+1)) ,  i = 1, ..., k,

and

bᵢ⁰ = −(4/(k+1)) sin²(πi/(k+1)) sin(2πi/(k+1)) ,  i = 1, ..., k.   (3.36)

These coefficients may be used to obtain the ABLUE of β* and σ, respectively. The JARE of the estimator for this case will be ((k+1)/π · sin(π/(k+1)))^{2p+4}. Now, we present some numerical examples, using the computational algorithms of Koenker and d'Orey (1987, 1994) for computing the RQs, for two sets of data.
J A R E o f the estimator for this case will be (k+l s i n . ' . ) Now, we present some numerical examplesku~mg tkheljcomputatlonal algorithms o f K o e n k e r and d ' O r e y (1987, 1994) for c o m p u t i n g the R Q s ' for two sets o f data. EXAMPLE 3.1. Consider the simple regression model y = flo + fll x + z , with the following data: X:
Y:
78 75
78 73
78 71
78 70
78 68
77 68
76 66
76 66
76 65
76 63
76 62
75 60
77 60
76
17.3 17.6 15.0 18.1 18.7 17.9 18.4 18.1 16.3 19.4 17.6 19.5 12.7 17.0 16.1 17.3 18.4 17.3 16.1 15.9 14.6 16.5 15.8 19.5 15.3 17.4 17.6
Then, the different fitted lines are as follows:

LSE:                          14.72224 + 0.0329x
Regression Median (RQ(.5)):   9.9999 + 0.1x
Regression 1st Quantile (RQ(.25)): 9.1 + 0.099999x
Regression 3rd Quantile (RQ(.75)): 15.72499 + 0.03125x
Now, we present the solutions for the k = 7 optimum regression quantiles for the normal distribution. They are

i    λᵢ       β̂₀(λᵢ)         β̂₁(λᵢ)
1    0.0171   26.0000019     -0.1727273
2    0.0857   16.46249771    -0.01874998
3    0.2411   12.499999       0.0499999
4    0.50      9.9999294      0.1000009
5    0.7589   14.599999       0.049999
6    0.9143   19.984619141   -0.00769
7    0.9829   19.49999        0.000000089

Based on these regression quantiles, we obtain the estimates of β₀ and β₁ as (14.0088, 0.04139). The following figure shows the scatter plot of the data with the different fitted lines.
[Scatter plot 3.1: scatter plot of the data with the LSE, FSRQ and RQ(.25, .5, .75) fitted lines.]
EXAMPLE 3.2. In this example we consider the regression model Y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + Z with the following data:

X1: 4.3 4.5 4.3 6.1 5.6 5.6 6.1 5.5 5.0 5.6 5.2 4.8 3.8 3.4 3.6 3.9 5.1 5.9 4.9 4.6 4.8 4.9 5.1 5.4 6.5 6.8 6.2
X2: 62 68 74 71 78 85 69 76 83 70 77 84 63 70 77 63 70 77 63 70 77 56 63 70 49 56 63
X3: 78 78 78 78 78 77 76 76 76 76 76 75 77 76 75 73 71 70 68 68 66 66 65 63 62 60 60
Y:  17.3 17.6 15.0 18.1 18.7 17.9 18.4 18.1 16.3 19.4 17.6 19.5 12.7 17.0 16.1 17.3 18.4 17.3 16.1 15.9 14.6 16.5 15.8 19.5 15.3 17.4 17.6
Then, the LSE fit and the Regression Median fit are given by

LSE: 5.344783 + 0.9234785x₁ + 0.04712587x₂ + 0.05217684x₃
RM:  5.48703623 + 0.84914058x₁ − 0.01677862x₂ + 0.11797349x₃

In this case we obtain the solutions for the k = 7 optimum regression quantiles for the normal distribution. The results are:

i    λᵢ       β̂₀             β̂₁           β̂₂            β̂₃
1    0.0211   -5.79232454    1.84005880   0.07403532    0.08877762
2    0.1009   -7.45990467    1.93643999   0.07792763    0.10249341
3    0.2642    3.67410445    0.86474842   0.02345317    0.09597122
4    0.5000    5.48703623    0.84914058  -0.01677862    0.11797349
5    0.7358    9.57583427    0.45074245   0.05087731    0.03373820
6    0.8991   12.98698711    0.79262042   0.07539532   -0.04833032
7    0.9789   11.04361057    1.01503742   0.06348020   -0.02330833
The estimates of β₀, β₁, β₂, and β₃ based on these regression quantiles are given respectively by (5.065928, 0.941837, 0.0356317, 0.0665379). We provide here a sample table of optimum spacings, coefficients and related quantities for ARE computation where the error distribution of the linear model (p = 1) is Cauchy. In this case, as an example, if we choose k = 7 the ARE is

ARE = K₁I₁₁⁻¹ = (0.4774) × 2 = 0.9548 .

Table 3.1 Optimum spacings, coefficients and ARE quantities for the L-estimation of a regression line with Cauchy errors [k = 3(1)10]
k
3
4
5
6
7
8
9
10
λ₁ a₁ λ₂ a₂ λ₃ a₃ λ₄ a₄ λ₅ a₅ λ₆ a₆ λ₇ a₇ λ₈ a₈ λ₉ a₉ λ₁₀ a₁₀ K₁ K₂ K₃ Δ
0.2500 0.0000 0.5000 1.0000 0.7500 0.0000
0.1837 -0.0769 0.4099 0.5769 0.5901 0.5769 0.8163 -0.0769
0.1535 -0.0730 0.3465 0.2664 0.5000 0.6132 0.6535 0.2664 0.8465 -0.0730
0.1321 -0.0593 0.2941 0.0892 0.4364 0.4701 0.5636 0.4701 0.7059 0.0892 0.8679 -0.0593
0.1147 -0.0446 0.2500 0.0000 0.3853 0.3147 0.5000 0.4599 0.6147 0.3147 0.7500 0.0000 0.8853 -0.0446
0.1013 -0.0333 0.2170 -0.0350 0.3431 0.1860 0.4501 0.3823 0.5499 0.3823 0.6569 0.1860 0.7830 -0.0350 0.8987 -0.0333
0.0911 -0.0254 0.1926 -0.0453 0.3074 0.0948 0.4089 0.2934 0.5000 0.3650 0.5911 0.2934 0.6926 0.0948 0.8074 -0.0453 0.9089 -0.0254
0.4053 0.4053 0.0000 0.0666
0.4469 0.4244 0.0000 0.0848
0.4631 0.4457 0.0000 0.0956
0.4712 0.4619 0.0000 0.1025
0.4774 0.4713 0.0000 0.1074
0.4822 0.4770 0.0000 0.1109
0.4855 0.4813 0.0000 0.1135
0.0827 -0.0197 0.1732 -0.0454 0.2765 0.0354 0.3736 0.2117 0.4591 0.3180 0.5409 0.3180 0.6264 0.2117 0.7235 0.0354 0.8268 -0.0454 0.9173 -0.0197 0.4880 0.4846 0.0000 0.1154
3.2. Estimation of conditional quantile function

In this subsection, we consider the estimation of the conditional quantile function defined by

Q(λ) = t₀′β* + σQ₀(λ) ,  0 < λ < 1 ,   (3.37)

based on a few, say k (p + 2 ≤ k < n), selected RQs, where t₀ is a vector prefixed in advance. To achieve this, we simply substitute the ABLUE of (β*′, σ)′ in (3.37) from (3.6) and (3.7) and obtain

Q̂_n(λ) = t₀′β̂*_n + σ̂_n Q₀(λ) ,  for each λ ∈ (0, 1) .   (3.38)
Then the asymptotic variance is given by

Var(Q̂_n(λ)) = (σ²/(nΔ)) {K₂ t₀′Σ⁻¹t₀ − 2K₃ t₀′Σ⁻¹m Q₀(λ) + K₁Q₀²(λ)} .   (3.39)

Similarly, the MLE of Q(λ), say Q̃_n(λ), has the asymptotic variance given by

Var(Q̃_n(λ)) = (σ²/(n(I₁₁I₂₂ − I₁₂²))) {I₂₂ t₀′Σ⁻¹t₀ − 2I₁₂ t₀′Σ⁻¹m Q₀(λ) + I₁₁Q₀²(λ)} .   (3.40)

The ARE of Q̂_n(λ) relative to Q̃_n(λ) is given by (3.40) divided by (3.39). The ARE expression lies between

Ch_min(K*I*⁻¹) ≤ ARE[Q̂_n(λ) : Q̃_n(λ)] ≤ Ch_max(K*I*⁻¹) ,   (3.41)

where

K* = ( K₁Σ    K₃m )
     ( K₃m′   K₂  )   (3.42)

and

I* = ( I₁₁Σ    I₁₂m )
     ( I₁₂m′   I₂₂  ) ,
and Ch_min(A) and Ch_max(A) are the minimum and maximum characteristic roots of A. Now, consider two quantities, namely (i) tr[K*I*⁻¹] and (ii) |K*I*⁻¹|^{1/(p+2)}. Then (1/(p+2)) tr[K*I*⁻¹] will provide us with the average ARE of Q̂_n(λ) relative to Q̃_n(λ), which lies between Ch_min(K*I*⁻¹) and Ch_max(K*I*⁻¹), while |K*I*⁻¹|^{1/(p+2)} gives the geometric mean of the characteristic roots of K*I*⁻¹. Hence, we determine the average ARE as

(1/(p+2)) tr[K*I*⁻¹] = [(p+1)K₁I₂₂ − 2K₃I₁₂ + K₂I₁₁] / [(p+2)(I₁₁I₂₂ − I₁₂²)] .   (3.43)

On the other hand,

|K*I*⁻¹| = K₁^p Δ / [I₁₁^p (I₁₁I₂₂ − I₁₂²)] .   (3.44)
The optimization of (3.43) or (3.44) will provide the required optimum spacings for the determination of the regression quantiles required to estimate (β*′, σ)′ and
thereby Q(λ). It is clear that (3.43) and (3.44) do not depend on the design matrix D_n. Alternatively, if f₀(z) is symmetric, then the (p+2) characteristic roots of K*I*⁻¹ are

K₁I₁₁⁻¹ (p times)  and  (K₁I₁₁⁻¹ + K₂I₂₂⁻¹)/2 ± (1/2)[(K₁I₁₁⁻¹ − K₂I₂₂⁻¹)² + 4K₃²I₁₁⁻¹I₂₂⁻¹]^{1/2} .   (3.45)

Thus, we maximise (K₁I₁₁⁻¹ + K₂I₂₂⁻¹)/2 + (1/2)[(K₁I₁₁⁻¹ − K₂I₂₂⁻¹)² + 4K₃²I₁₁⁻¹I₂₂⁻¹]^{1/2} with respect to Q₀(λ₁), ..., Q₀(λ_k) to obtain the optimum spacing vector, which is independent of D_n.
3.3. Estimating functions of regression and scale parameters

Let g(β*, σ) be a function of the regression and scale parameters of the linear model. We consider the estimation of g(β*, σ) based on k (p + 2 ≤ k < n) regression quantiles β̂_{jn}(λ) = (β̂_{jn}(λ₁), ..., β̂_{jn}(λ_k))′, j = 0, 1, ..., p, as in (3.4), and then estimate g(β*, σ) by the substitution method as g(β̂*, σ̂). To study the properties of this estimator, we use the local linearization theorem as in Rao (1973). This allows us to obtain the asymptotic variance of g(β̂*, σ̂). Now, assume that g(β*, σ) is defined on ℝ^{p+1} × ℝ⁺, possessing continuous partial derivatives of order at least two at each point of ℝ^{p+1} × ℝ⁺. Then, using Rao (1973), the asymptotic variance of g(β̂*, σ̂) may be written as

Var(g(β̂*, σ̂)) = (σ²/n) ω′K*⁻¹ω ,  ω = (ω₀, ω₁, ..., ω_{p+1})′ ,   (3.46)

where

ωᵢ = ∂g(β*, σ)/∂βᵢ ,  i = 0, 1, ..., p;  ω_{p+1} = ∂g(β*, σ)/∂σ ,   (3.47)

and K*⁻¹ is the inverse of K* given by (3.42). Similarly, let g(β̃*, σ̃) be the MLE of g(β*, σ), with asymptotic variance

Var(g(β̃*, σ̃)) = (σ²/n) ω′I*⁻¹ω .   (3.48)

The ARE of g(β̂*, σ̂) relative to g(β̃*, σ̃) is given by

ARE[g(β̂*, σ̂) : g(β̃*, σ̃)] = (ω′I*⁻¹ω)/(ω′K*⁻¹ω) ,   (3.49)

where I* is given by (3.42).
On some L-estimation in linear regression models
To obtain the optimum spacing vector for the estimation of g(β*, σ), one has to maximize (3.49) with respect to Q₀(λ₁), ..., Q₀(λₖ) for a fixed vector ω. However, the function g(β*, σ) is nonlinear in general and the vector ω depends on the unknown parameters (β*′, σ)′; therefore, we may apply the Courant-Fischer theorem (Rao, 1973) and find that

Chmin(K*I*⁻¹) ≤ ARE ≤ Chmax(K*I*⁻¹) . (3.50)

Now, we follow Section 3.2 for the optimization problem to determine the optimum spacings. In this way one is able to estimate functions of (β*′, σ)′ such as (i) the conditional survival function and (ii) the conditional hazard function.
4. Trimmed least-squares estimation of regression parameters and its asymptotic distribution

In the previous section we considered the L-estimation of regression and scale parameters based on a few selected regression quantiles. These estimators provide quick and robust estimators of β and σ against outliers for each distribution under consideration, while sacrificing very little asymptotic relative efficiency. The derivation was based on the generalized least squares principle. In this section we consider another L-estimator, namely the trimmed least squares estimator (TLSE) of β. Consider the regression model

Y = Dₙβ + Z , β = (β₀, β₁, ..., β_p)′ ∈ ℝ^{p+1} , (4.1)
where Z = (Z₁, ..., Zₙ)′ has components which are i.i.d. r.v.'s from a continuous cdf F₀(z) (with pdf f₀(z)). The least squares estimator (LSE) of β is given by

β̂ₙ = (Dₙ′Dₙ)⁻¹Dₙ′Y , (4.2)

which is obtained by minimizing

(Y − Dₙβ)′(Y − Dₙβ) (4.3)

with respect to β. For the trimmed least squares estimation (TLSE) of β, we first define an n × n diagonal matrix A whose elements are

aᵢᵢ = 0 if Yᵢ < dᵢ′β̂ₙ(λ₁) or Yᵢ ≥ dᵢ′β̂ₙ(λ₂) ; aᵢᵢ = 1 otherwise , (4.4)

where β̂ₙ(λ₁) and β̂ₙ(λ₂) are the regression quantiles of orders λ₁ and λ₂ respectively, for 0 < λ₁ < 1/2 < λ₂ < 1. We obtain the TLSE of β by minimising

(Y − Dₙβ)′A(Y − Dₙβ) (4.5)
with respect to β, obtaining

β̂ₙ(λ₁, λ₂) = (Dₙ′ADₙ)⁻¹Dₙ′AY . (4.6)
Ruppert and Carroll (1980) studied the TLSE under the following conditions:

1. F₀ has a continuous density f₀ that is positive on the support of F₀.
2. The first column of the design matrix is 1, i.e., dᵢ₁ = 1, i = 1, ..., n, and Σᵢ₌₁ⁿ dᵢⱼ = 0 for j = 2, ..., p + 1.
3. limₙ→∞ max_{i,j} (n^{−1/2}|dᵢⱼ|) = 0.
4. There exists a positive definite Σ such that limₙ→∞ n⁻¹(Dₙ′Dₙ) = Σ.
Jurečková (1984) considered the Bahadur representation and asymptotic properties of β̂ₙ(λ₁, λ₂) under the following conditions:

(i) F₀ is absolutely continuous with density f₀ such that f₀(·) > 0 and f₀′(·) is bounded in a neighborhood of F₀⁻¹(λ).
(ii) max_{i,j} n^{−1/4}|dᵢⱼ| = O(1), as n → ∞.
(iii) limₙ→∞ n⁻¹(Dₙ′Dₙ) = Σ, where Σ is a positive definite (p + 1) × (p + 1) matrix.
(iv) n⁻¹Σᵢ₌₁ⁿ dᵢⱼ⁴ = O(1), as n → ∞, j = 1, ..., p + 1.
(v) limₙ→∞ d̄ₙⱼ = d̄ⱼ, j = 1, ..., p + 1.
We obtain the same results as in Ruppert and Carroll (1980) and Jurečková (1984) under weaker conditions on the design matrix Dₙ and the distribution function F₀. We use the weighted empirical approach of Koul (1992), providing new proofs for the theorems and lemmas that follow. Before we state the main theorem for the TLSE, we need an asymptotic linearity result for β̂ₙ(λ). First, we consider the conditions A0, A1 and A2 from Section 2 (after Theorem 2.5) and define

β(λ) = β + Q₀(λ)e₁ , e₁ = (1, 0, ..., 0)′ . (4.7)

Now, consider the weighted empirical process

Tₙ(t; λ) = n^{−1/2} Σᵢ₌₁ⁿ dᵢ{I(Yᵢ − dᵢ′t ≤ 0) − λ} .

Then, for every C > 0,

sup_{‖t‖≤C} ‖Tₙ(β(λ) + n^{−1/2}t; λ) − Tₙ(β(λ); λ) − q₀(λ)Σₙt‖ = o_p(1) ,

and for every ε > 0, there is a δ > 0 such that

lim sup P( sup_{‖t−s‖<δ} ‖Tₙ(β(λ) + n^{−1/2}t; λ) − Tₙ(β(λ) + n^{−1/2}s; λ)‖ > 2ε ) < ε . (4.23)
We need the following relations to prove (4.23):

dᵢ′t = dᵢ′s + (dᵢ′t − dᵢ′s) . (4.24)

So that dᵢ′t − dᵢ′s = dᵢ′(t − s), and

|dᵢ′t − dᵢ′s| = |dᵢ′(t − s)| ≤ ‖dᵢ‖ ‖t − s‖ . (4.25)

Thus,

dᵢ′s − ‖dᵢ‖δ ≤ dᵢ′t ≤ dᵢ′s + ‖dᵢ‖δ . (4.26)
Therefore, we have
T°(t;2)-T°(s;2) n
= n-l~2 Z dij{[l(Zi u)>l-e
(4.34)
PROOF of (i). We prove this step via Koul and Saleh (1995). By assumption A0, F₀ is continuous; hence we obtain β̂ₙ(λ) as the unique solution of the minimization problem. Now, use of Theorem 3.3 of Koenker and Bassett (1978) yields, after writing sgn(x) ≡ 1 − 2I(x ≤ 0),

inf_{|r|=b, ‖θ‖=1} Tₙ*′(rθ; λ)·θ . (4.52)

Now, let

T̄ₙ*(r, θ; λ) = Tₙ*′(0; λ)·θ + q₀(λ) r θ′Σₙθ (4.53)

and

kₙ = inf{θ′Σₙθ : ‖θ‖ = 1} , k = lim kₙ = inf{θ′Σθ : ‖θ‖ = 1} , (4.54)

with Aₙ = {|kₙ − k| < εk}; note that k > 0. This, with the result of step (ii), implies that for every ε > 0 there exist Mε, N₁ε and N₂ε such that

P(Aₙ) = P(|kₙ − k| < εk) ≥ 1 − ε/3 , ∀ n ≥ N₁ε , (4.55)
P( sup_{‖θ‖=1} |Tₙ*′(0; λ)·θ| ≤ Mε ) ≥ 1 − ε/3 , ∀ n ≥ N₂ε . (4.56)
Also, by Lemma 4.1, ∃ an N₃ε such that

P( inf_{|r|=b, ‖θ‖=1} {Tₙ*′(rθ; λ)·θ} > u ) ≥ P( inf_{|r|=b, ‖θ‖=1} {T̄ₙ*(rθ; λ)} > u ) − ε/3 , n ≥ N₃ε . (4.57)

Now, we use the fact that |d| − |c| ≤ |d + c| for d, c ∈ ℝ. Then, we have

P( inf_{|r|=b, ‖θ‖=1} {T̄ₙ*(rθ; λ)} > u )
≥ P( ||Tₙ*′(0; λ)·θ| − q₀(λ)bθ′Σₙθ| > u^{1/2} , ∀ ‖θ‖ = 1 )
≥ P( sup_{‖θ‖=1} |Tₙ*′(0; λ)·θ| ≤ −u^{1/2} + bk(1 − ε)q₀(λ) ; Aₙ )
≥ 1 − (2ε/3) , ∀ n ≥ nε = N₁ε ∨ N₂ε ∨ N₃ε , (4.58)

by (4.55) and (4.56), as long as b ≥ (Mε + u^{1/2})/(k(1 − ε)q₀(λ)), which completes the proof of step (iii).

Now, we use the results (4.32) and (4.34) to prove the lemma, similar to Jurečková (1984), by defining

t = n^{1/2}(β̂ₙ(λ) − β(λ)) .
(4.59)

Thus, for all ε > 0 and 0 < u < ∞, ∃ b(= bε) > 0 and Nε so that

P(‖t‖ > b) ≤ P(‖Tₙ(β(λ) + n^{−1/2}t; λ)‖ < u) + P(‖Tₙ(β̂ₙ(λ); λ)‖ ≥ u) = ε/2 + ε/2 = ε . (4.60)

Therefore, the proof of Lemma 4.2 is complete.
PROOF OF THEOREM 4.1. It follows from Lemma 4.1 that

Tₙ(β(λ) + n^{−1/2}t; λ) − Tₙ(β(λ); λ) − q₀(λ)Σₙt = o_p(1) (4.61)

for every sequence of random vectors t such that ‖t‖ = O_p(1). Hence, substituting t = n^{1/2}(β̂ₙ(λ) − β(λ)), we have

Tₙ(β(λ) + n^{−1/2}[n^{1/2}(β̂ₙ(λ) − β(λ))]; λ) − Tₙ(β(λ); λ) − q₀(λ)Σₙ[n^{1/2}(β̂ₙ(λ) − β(λ))] = o_p(1) . (4.62)

On the other hand,

Tₙ(β̂ₙ(λ); λ) − Tₙ(β(λ); λ) − q₀(λ)Σₙ[n^{1/2}(β̂ₙ(λ) − β(λ))] = o_p(1) . (4.63)

But, by (4.33), we have Tₙ(β̂ₙ(λ); λ) = o_p(1). Therefore, the proof of Theorem 4.1 is complete. □

The following corollaries give the asymptotic distribution of RQ's.

COROLLARY 4.1. Assume the conditions of Theorem 4.1 are satisfied. Then,

n^{1/2}(β̂ₙ(λ) − β(λ)) → N_{p+1}(0, λ(1 − λ)q₀⁻²(λ)Σ⁻¹) . (4.64)
COROLLARY 4.2. Let the conditions A0-A2 hold. Then, for each λ₁, ..., λₖ such that 0 < λ₁ < ... < λₖ < 1, the asymptotic joint distribution of the regression quantiles is given by

[n^{1/2}(β̂ₙ(λ₁) − β(λ₁)), ..., n^{1/2}(β̂ₙ(λₖ) − β(λₖ))] → N(0, Ω ⊗ Σ⁻¹) , (4.65)

where Ω = [ω_{ij}] and ω_{ij} = λᵢ(1 − λⱼ)/(q₀(λᵢ)q₀(λⱼ)) , (i ≤ j).

PROOF. By Theorem 4.1 we write (4.66) and then, by Corollary 4.1 and the Cramér-Wold device, we get the result. □

Now, we state the main theorem for the TLSE, similar to Jurečková (1984).
THEOREM 4.2. Assume that the conditions A0, A1 and A2 hold. Then, for each λ₁ and λ₂ such that 0 < λ₁ < 1/2 < λ₂ < 1,

(i) n^{1/2}(β̂ₙ(λ₁, λ₂) − β) = n^{−1/2}{(λ₂ − λ₁)Σₙ}⁻¹ Σᵢ₌₁ⁿ dᵢ(φ(Zᵢ) − γ) + o_p(1) , (4.67)

(ii) n^{1/2}(β̂ₙ(λ₁, λ₂) − β) → N_{p+1}(0, σ²(λ₁, λ₂)Σ⁻¹) , (4.68)

where φ(Zᵢ), γ and σ²(λ₁, λ₂) are defined as follows:

φ(z) = Q₀(λ₁) if z < Q₀(λ₁) ; φ(z) = z if Q₀(λ₁) ≤ z ≤ Q₀(λ₂) ; φ(z) = Q₀(λ₂) if z > Q₀(λ₂) ; (4.69)

γ = λ₁Q₀(λ₁) + (1 − λ₂)Q₀(λ₂) ;

and

σ²(λ₁, λ₂) = (λ₂ − λ₁)⁻²{ ∫_{λ₁}^{λ₂} (Q₀(u) − δ₀)² du + λ₁(Q₀(λ₁) − δ₀)² + (1 − λ₂)(Q₀(λ₂) − δ₀)² − [λ₁(Q₀(λ₁) − δ₀) + (1 − λ₂)(Q₀(λ₂) − δ₀)]² } , (4.70)

with

δ₀ = (λ₂ − λ₁)⁻¹ ∫_{λ₁}^{λ₂} Q₀(u) du .
PROOF. This Theorem will be proved also with the aid of two Lemmas. First, we define the following processes Tn(t; 2 ) = n -1/2 ~ diZiI(Zi 0 sup Iltll c} × 2; IR'(R1;-1R')-'(RU+6)
.
Assume that, for an estimator βₙ*(λ₁, λ₂) of β,

G*(x) = limₙ P{√n(βₙ*(λ₁, λ₂) − β) ≤ x} . (5.18)
Then, we define the asymptotic distributional quadratic risk (ADQR) of βₙ*(λ₁, λ₂) by

R(βₙ*(λ₁, λ₂); W) = ∫_{ℝᵖ} x′Wx dG*(x) = tr(Σ*W) , (5.19)

where

Σ* = ∫ xx′ dG*(x) (5.20)

and W is a positive definite matrix associated with the quadratic loss function

L(βₙ*(λ₁, λ₂); β) = n(βₙ*(λ₁, λ₂) − β)′W(βₙ*(λ₁, λ₂) − β) . (5.21)
THEOREM 5.2. Under the assumptions of Theorem 5.1, the ADQR's of the estimators are given by

(i) R(β̂ₙ(λ₁, λ₂); W) = σ²(λ₁, λ₂) tr(Σ⁻¹W) ,

(ii) R(β̃ₙ(λ₁, λ₂); W) = σ²(λ₁, λ₂)[tr(Σ⁻¹W) − tr(B)] + δ′(RΣ⁻¹R′)⁻¹Bδ ,

(iii) R(β̂ₙᴾᵀ(λ₁, λ₂); W) = σ²(λ₁, λ₂)tr(Σ⁻¹W) − σ²(λ₁, λ₂)tr(B)H_{q+2}(χ²_{q,α}; Δ) + δ′(RΣ⁻¹R′)⁻¹Bδ{2H_{q+2}(χ²_{q,α}; Δ) − H_{q+4}(χ²_{q,α}; Δ)} ,

(iv) R(β̂ₙˢ(λ₁, λ₂); W) = σ²(λ₁, λ₂)tr(Σ⁻¹W) − σ²(λ₁, λ₂)c tr(B){2E[χ⁻²_{q+2}(Δ)] − cE[χ⁻⁴_{q+2}(Δ)]} + c(c + 4){δ′(RΣ⁻¹R′)⁻¹Bδ E[χ⁻⁴_{q+4}(Δ)]} ,

(v) R(β̂ₙˢ⁺(λ₁, λ₂); W) = R(β̂ₙˢ; W) − σ²(λ₁, λ₂)tr(B)E[(1 − cχ⁻²_{q+2}(Δ))²I(χ²_{q+2}(Δ) ≤ c)] + δ′(RΣ⁻¹R′)⁻¹Bδ{2E[(1 − cχ⁻²_{q+2}(Δ))I(χ²_{q+2}(Δ) ≤ c)] − E[(1 − cχ⁻²_{q+4}(Δ))I(χ²_{q+4}(Δ) ≤ c)]} ,

where

B = RΣ⁻¹WΣ⁻¹R′(RΣ⁻¹R′)⁻¹ . (5.22)
PROOF. (i) and (ii) follow by straightforward computations using Theorem 5.1 (ii) and (iii). Using Theorem 5.1 (iv)-(v), results (iii), (iv) and (v) follow by the same argument as in Judge and Bock (1978). □

5.3. Comparison of ADQR
First, we compare β̃ₙ(λ₁, λ₂) with β̂ₙ(λ₁, λ₂) by the ratio of the ADQR of β̂ₙ(λ₁, λ₂) to that of β̃ₙ(λ₁, λ₂):

ARE[β̃ₙ(λ₁, λ₂); β̂ₙ(λ₁, λ₂)] = [1 − tr(B)/tr(Σ⁻¹W) + δ′(RΣ⁻¹R′)⁻¹Bδ/(σ²(λ₁, λ₂)tr(Σ⁻¹W))]⁻¹ . (5.23)
Hence, ARE[β̃ₙ(λ₁, λ₂); β̂ₙ(λ₁, λ₂)] ⋛ 1 according as δ′(RΣ⁻¹R′)⁻¹Bδ ⋚ σ²(λ₁, λ₂)tr(B). Let chmin(B) and chmax(B) be the smallest and largest characteristic roots of B; then, using the relation chmin(B)Δ ≤ δ′(RΣ⁻¹R′)⁻¹Bδ ≤ chmax(B)Δ, the ARE of the PTTLE relative to the UTLE

ARE[β̂ₙᴾᵀ(λ₁, λ₂); β̂ₙ(λ₁, λ₂)] = [1 − tr(B)H_{q+2}(χ²_{q,α}; Δ)/tr(Σ⁻¹W) + δ′(RΣ⁻¹R′)⁻¹Bδ{2H_{q+2}(χ²_{q,α}; Δ) − H_{q+4}(χ²_{q,α}; Δ)}/(σ²(λ₁, λ₂)tr(Σ⁻¹W))]⁻¹ (5.29)

can be bounded. Also, let us define

h_max(Δ) = chmax(B)Δ{2H_{q+2}(χ²_{q,α}; Δ) − H_{q+4}(χ²_{q,α}; Δ)} / {tr(Σ⁻¹W) − tr(B)H_{q+2}(χ²_{q,α}; Δ)} , (5.30)

h_min(Δ) = chmin(B)Δ{2H_{q+2}(χ²_{q,α}; Δ) − H_{q+4}(χ²_{q,α}; Δ)} / {tr(Σ⁻¹W) − tr(B)H_{q+2}(χ²_{q,α}; Δ)} . (5.31)

Then we obtain the inequality

[1 + h_max(Δ)]⁻¹ ≤ ARE(β̂ₙᴾᵀ(λ₁, λ₂); β̂ₙ(λ₁, λ₂)) ≤ [1 + h_min(Δ)]⁻¹ . (5.32)

Next we compare β̂ₙˢ(λ₁, λ₂) with β̂ₙ(λ₁, λ₂). The difference of the risks is given by

R(β̂ₙ(λ₁, λ₂); W) − R(β̂ₙˢ(λ₁, λ₂); W)
= σ²(λ₁, λ₂)c tr(B){2E[χ⁻²_{q+2}(Δ)] − cE[χ⁻⁴_{q+2}(Δ)]} − c(c + 4){δ′(RΣ⁻¹R′)⁻¹Bδ E[χ⁻⁴_{q+4}(Δ)]} . (5.33)

Using the relation (5.28), the R.H.S. of (5.33) is bounded from below by

σ²(λ₁, λ₂)c{ tr(B)(2E[χ⁻²_{q+2}(Δ)] − cE[χ⁻⁴_{q+2}(Δ)]) − (c + 4)chmax(B)Δ E[χ⁻⁴_{q+4}(Δ)] } . (5.34)
We also have the relation

E[χ⁻²_{q+2}(Δ)] − (q − 2)E[χ⁻⁴_{q+2}(Δ)] = ΔE[χ⁻⁴_{q+4}(Δ)] . (5.35)

We may therefore rewrite (5.34) as

σ²(λ₁, λ₂)c tr(B){(2 − g)E[χ⁻²_{q+2}(Δ)] + ((q − 2)g − c)E[χ⁻⁴_{q+2}(Δ)]} , (5.36)

where g = (c + 4)h and h = chmax(B)/tr(B). Thus, for (5.33) to be non-negative for all δ, it suffices to choose c such that

0 < g ≤ 2 and (q − 2)g − c ≥ 0 . (5.37)

Note that g ≤ 2 is equivalent to c ≤ 2(h⁻¹ − 2). This implies that tr(B) > 2chmax(B) is the minimum requirement for (5.33) to be non-negative. Also, from the non-negativity of (5.36) at δ = 0, it follows that c has to be less than or equal to 2(q − 2) (so q ≥ 3). Hence, we get

THEOREM 5.3. A sufficient condition for the asymptotic dominance of the STLE over the UTLE [i.e., R(β̂ₙ(λ₁, λ₂); W) ≥ R(β̂ₙˢ(λ₁, λ₂); W) for δ ∈ ℝ^q and W (p.d.)] is that the shrinkage factor c is positive and satisfies the inequality

2E[χ⁻²_{q+2}(Δ)] − cE[χ⁻⁴_{q+2}(Δ)] − (c + 4)hΔE[χ⁻⁴_{q+4}(Δ)] ≥ 0 ∀ Δ > 0 , (5.38)

which, in turn, requires that q ≥ 3, 0 < c ≤ 2(q − 2) and tr(B) > 2chmax(B).

By the same argument as in the proof of Theorem 4.3 of Sen and Saleh (1987), we get

THEOREM 5.4. Under the sufficient condition of Theorem 5.3, the PTTLE fails to dominate the STLE. Also, if for α, the level of significance of the preliminary test, we have

H_{q+2}(χ²_{q,α}; 0) > r[2(q − 2) − r]/[q(q − 2)] with r = (q − 2) ∧ [2(h⁻¹ − 4)] ,

then the STLE fails to dominate the PTTLE.

Finally, we compare β̂ₙˢ⁺(λ₁, λ₂) with β̂ₙˢ(λ₁, λ₂); the difference of the risks is given by
R(β̂ₙˢ⁺(λ₁, λ₂); W) − R(β̂ₙˢ(λ₁, λ₂); W)
= −{ σ²(λ₁, λ₂)tr(B)E[(1 − cχ⁻²_{q+2}(Δ))²I(χ²_{q+2}(Δ) < c)]
+ δ′(RΣ⁻¹R′)⁻¹Bδ( E[(1 − cχ⁻²_{q+4}(Δ))I(χ²_{q+4}(Δ) ≤ c)]
− 2E[(1 − cχ⁻²_{q+2}(Δ))I(χ²_{q+2}(Δ) ≤ c)] ) } ≤ 0 for q > 2.

© 1996 American Society for Quality Control. Reprinted with Permission. Reprinted from Cohen and Whitten (1986).
8. Illustrative examples

Example 1. This example is from McCool (1974). The data consist of fatigue life in hours of ten bearings of a certain type. The sample observations, listed in increasing order of magnitude, are

152.7, 172.0, 172.5, 173.3, 193.0, 204.7, 216.5, 234.9, 262.6, 422.6.

In summary, n = 10, x̄ = 220.4804, s² = 6147.6733, s = 78.407, a₃ = 1.8636, x₁:₁₀ = 152.7, and s²/(x̄ − x₁:₁₀)² = 1.33814. If we consider this sample to be from a Weibull distribution, we find δ̂ = 0.95 (approximately 1) as the estimate for δ. This suggests that perhaps the exponential distribution is the appropriate model. Furthermore, with a₃ = 1.8636 (approximately 2), it seems reasonable to conclude that this sample is from a two-parameter exponential distribution, and we employ equations (2.8) to calculate estimates as
The role of order statistics in estimating threshold parameters
Table 9. Variance-covariance factors for maximum likelihood estimates of inverse Gaussian parameters
α₃    θ₁₁
0.50 0.55 0.60 0.65 0.70
777.60000 520.18733 359.19540 254.68604 184.68582
0.60000 0.62007 0.64172 0.66491 0.68956
5.40000 4.80812 4.31034 3.88573 3.51929
5.15000 4.53312 4.01034 3.56073 3.16929
0.75 0.80 0.85 0.90 0.95
136.53333 102.64044 78.30306 60.51803 47.31804
0.71563 0.74304 0.77177 0.80176 0.83298
3.20000 2.91955 2.67155 2.45098 2.25385
2.82500 2.51955 2.24655 2.00098 1.77885
1.00 1.05 1.10 1.15 1.20
37.38462 29.81606 23.98443 19.44521 15.87907
0.86538 0.89895 0.93364 0.96945 1.00634
2.07692 1.91755 1.77352 1.64299 1.52439
1.57692 1.39255 1.22352 1.06799 0.92439
1.25 1.30 1.35 1.40 1.45
13.05348 10.79709 8.98215 7.51246 6.31489
1.04431 1.08335 1.12344 1.16458 1.20677
1.41639 1.31784 1.22775 1.14523 1.06954
0.79139 0.66784 0.55275 0.44523 0.34454
1.50 1.55 1.60 1.65 1.70
5.33333 4.52442 3.85435 3.29660 2.83020
1.25000 1.29427 1.33958 1.38594 1.43335
1.00000 0.93602 0.87708 0.82271 0.77249
0.25000 0.16102 0.07708 -0.00229 -0.07751
1.75 1.80 1.85 1.90 1.95
2.43851 2.10821 1.82858 1.59098 1.38836
1.48180 1.53131 1.58188 1.63352 1.68622
0.72605 0.68306 0.64322 0.60625 0.57192
-0.14895 -0.21694 -0.28178 -0.34375 -0.40308
2.00 2.25 2.50 2.75 3.00
1.21500 0.64831 0.36593 0.21650 0.13333
1.74000 2.02524 2.33824 2.67964 3.05000
0.54000 0.41026 0.31765 0.25014 0.20000
-0.46000 -0.71474 -0.93235 -1.12486 -1.30000
3.25 3.50 3.75 4.00 4.25
0.08500 0.05584 0.03766 0.02601 0.01833
3.44977 3.87931 4.33890 4.82877 5.34909
0.16210 0.13300 0.11034 0.09247 0.07819
-1.46290 -1.61700 -1.76466 -1.90753 -2.04681
4.50 5.00 6.00 7.00 8.00
0.01317 0.00713 0.00245 0.00099 0.00045
5.90000 7.09404 9.85294 13.10854 16.86226
0.06667 0.04954 0.02941 0.01882 0.01274
-2.18333 -2.45046 -2.97059 -3.48118 -3.98726
θ₂₂ = θ.. + 1    θ₃₃    θ₁₃    θ₂₃    θ₁₂ = −θ..
Reprinted from Cohen and Whitten (1988), p. 82 by courtesy of Marcel Dekker, Inc.
A. C. Cohen
γ̂ = [10(152.7) − 220.4804]/9 = 145.17 , β̂ = [10(220.4804 − 152.7)]/9 = 75.31 .

From (2.9), we calculate estimate variances as

V(γ̂) = 63.02 and V(β̂) = 630.2 ,

and approximate 95% confidence intervals become 129.60 ...
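The arithmetic above implies the estimator forms γ̂ = (n x₁:ₙ − x̄)/(n − 1) and β̂ = n(x̄ − x₁:ₙ)/(n − 1); these forms are inferred here from the numbers shown, since equations (2.8) themselves are not reproduced in this excerpt. A quick numeric check:

```python
# Two-parameter exponential estimates for the McCool (1974) bearing data.
data = [152.7, 172.0, 172.5, 173.3, 193.0, 204.7, 216.5, 234.9, 262.6, 422.6]
n = len(data)
xbar = sum(data) / n
x1 = min(data)                           # smallest order statistic x_{1:10}

gamma_hat = (n * x1 - xbar) / (n - 1)    # threshold estimate
beta_hat = n * (xbar - x1) / (n - 1)     # mean residual life estimate

assert round(gamma_hat, 2) == 145.17
assert round(beta_hat, 2) == 75.31
```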
(5)

for θ > 0 and β > 0, where β is the shape parameter and θ is the scale parameter. Suppose T_{r₁,n}, ..., T_{r_k,n} is a multiply Type II censored sample from the Weibull distribution Wei(θ, β); then X_{rᵢ,n} = ln T_{rᵢ,n}, i = 1, ..., k, is a multiply Type II censored sample from the extreme-value distribution EV(μ, σ), with the density function

f(x; μ, σ) = (1/σ) e^{(x−μ)/σ} exp{−e^{(x−μ)/σ}} , −∞ < x < ∞ ,

where σ = 1/β and μ = ln θ. Notice this is a location and scale family in μ and σ, so their BLUE and BLIE can be derived, and these estimators can immediately be used to derive estimators for θ and β in the Weibull distribution. In fact, the corresponding estimators are

β* = 1/σ* , θ* = e^{μ*} , (7)

β̂ = 1/σ̂ , θ̂ = e^{μ̂} . (8)
Being nonlinear functions of σ* and σ̂, respectively, the estimators β* and β̂ do not enjoy the nice property of being unbiased. In fact, unbiased estimators for β and θ do not exist. In this subsection, through the asymptotic distribution of σ*, we provide an approximately unbiased estimator for β with mean-squared error smaller than that of both β* and β̂. Denote σ* = Σᵢ c_{rᵢ,n}X_{rᵢ,n} as in (4), and

b₁ = −c_{r₁,n} , b₂ = −(c_{r₁,n} + c_{r₂,n}) , ..., b_{k−1} = −Σᵢ₌₁^{k−1} c_{rᵢ,n} = c_{r_k,n} ;

then σ* can be expressed as

σ* = Σᵢ₌₁^{k−1} bᵢ(X_{r_{i+1},n} − X_{rᵢ,n}) = Σᵢ₌₁^{k−1} bᵢ Σⱼ₌₁^{r_{i+1}−rᵢ} (X_{rᵢ+j,n} − X_{rᵢ+j−1,n}) .
F. Kong
Here

H_{ij} = (X_{rᵢ+j,n} − X_{rᵢ+j−1,n}) / (σ E(Z_{rᵢ+j,n} − Z_{rᵢ+j−1,n})) , i = 1, ..., k − 1 , j = 1, ..., r_{i+1} − rᵢ ,

and E(Z_{rᵢ+j,n}) is the expectation of the (rᵢ + j)th order statistic of a sample of size n from the standard extreme-value distribution EV(0, 1). According to David (1981), each H_{ij} approximately has an exponential distribution with mean 1 and variance 1. For j ≠ m, H_{ij} and H_{im} have approximately zero covariance, and 2H_{ij} has approximately a chi-square distribution with 2 degrees of freedom. Therefore, σ*/σ is approximately a weighted sum of independent chi-square random variables with μ = E(σ*/σ) = 1 and v = Var(σ*/σ) = L_{k,n}. According to Patnaik (1949), as n → ∞, 2σ*/(σL_{k,n}) has a chi-square distribution with 2μ²/v = 2/L_{k,n} degrees of freedom, that is

2σ*/(σL_{k,n}) ~ χ²(2/L_{k,n}) .
EdaLk,n~
1
and
Var
4
i,)
(Lk,"
2
1
Therefore an approximately unbiased estimator for fl is fl** _ 1 - Lk,. = (1 - Ll,,.)fl* 0"*
(9)
1
with Var(β**) = β²/(L_{k,n}⁻¹ − 2). It is also easy to calculate the MSE's of β* and β̂:

MSE(β*) = (1 + 2L_{k,n})β² / [(1 − L_{k,n})(L_{k,n}⁻¹ − 2)]

and

MSE(β̂) = (1 + 7L_{k,n})β² / [(1 − L_{k,n})(L_{k,n}⁻¹ − 2)] .

So,

MSE(β**) < MSE(β*) < MSE(β̂) . (10)
Parameter estimation under multiply Type-H censoring
319
3. Maximum likelihood estimation

3.1. MLE in multiply Type II censoring

Let us consider the maximum likelihood estimate in a general population. Suppose X_{r₁,n}, ..., X_{r_k,n}, with 0 < r₁ < ... < r_k ≤ n, ... α > 1 and β > 1 − 1/μ. Note that when μ < 1, β could be negative. Furthermore, if α, β also satisfy α < 2 and β < 2 − 1/μ, one can find a small positive ε such that conditions (18)-(21) are satisfied. For (22)-(25) to be satisfied, one needs to choose α, β such that 1 < α < 3/2 and 1 − 1/μ < β < 3/2 − 1/μ. By taking derivatives with respect to θ, we can easily determine γ₁ = 3μ − 1, and γ₂ = 1 − μ if μ < 1 and γ₂ = 0 otherwise. For μ < 1, choose 1 < α < 2 and 1 − 1/μ ≤ β < ... Consider the exponential density

f(x; θ, η) = (1/θ) exp{−(x − η)/θ} , x ≥ η , θ > 0 , (35)

and cumulative distribution function

F(x; θ, η) = 1 − exp{−(x − η)/θ} , x ≥ η , θ > 0 , (36)
where η ≥ 0 is the warranty time or threshold and θ is the residual mean survival time. If we make the transformation Y = X − η, then Y ~ exp(θ, 0). For order statistics X₁,ₙ, ..., Xₙ,ₙ of a sample of size n from the population exp(θ, η), Y₁,ₙ, ..., Yₙ,ₙ are the corresponding order statistics of a sample from the population exp(θ, 0). First we have the following lemma.

LEMMA 1. Assume X ~ exp(θ, 0) and X₁,ₙ, ..., Xₙ,ₙ are the order statistics of a sample of size n. Define W₁ = X₁,ₙ, W₂ = X₂,ₙ − X₁,ₙ, ..., Wₙ = Xₙ,ₙ − Xₙ₋₁,ₙ. Then:

1. W₁, ..., Wₙ are mutually independent, and Wᵢ ~ exp(θ/(n − i + 1), 0), i = 1, ..., n.
2. Xᵢ,ₙ has the expression

Xᵢ,ₙ = Σⱼ₌₁ⁱ Zⱼ/(n − j + 1) , i = 1, ..., n , (37)

where Z₁, ..., Zₙ are i.i.d. from exp(θ, 0).

The proof can be found in Balakrishnan and Cohen (1991) or Arnold, Balakrishnan and Nagaraja (1992).
5.1. Confidence intervals for the one-parameter exponential distribution

In this subsection we assume the survival time has an exponential distribution exp(θ, 0) and consider the confidence interval for θ. ... (49)

Kamps (1990) gives a proof. Instead of using Q_k defined in (38) and (39), we introduce
Q*_k = Q_k/2 , (50)

where Zⱼ, j = 1, ..., r_k, are i.i.d. with exp(θ, 0) distribution. Define

aⱼ = (n − j + 1)/k ,        j = 1, ..., r₁ ,
aⱼ = (n − j + 1)/(k − 1) ,  j = r₁ + 1, ..., r₂ ,
  ⋮
aⱼ = n − j + 1 ,            j = r_{k−1} + 1, ..., r_k , (51)

then according to Lemma 3, the cumulative distribution function of Q*_k is
G_{r_k}(t) = 1 − (−1)^{r_k−1} Πᵢ₌₁^{r_k} aᵢ Σ_{h=1}^{r_k} a_h⁻¹ Π_{j=1, j≠h}^{r_k} (a_h − aⱼ)⁻¹ e^{−a_h t} , t > 0 . (52)

As an example, if t_{1−α} is the 100(1 − α)th quantile of G_{r_k}(t), we have the exact 1 − α lower confidence bound for θ:

θ_L = θQ*_k / t_{1−α} . (53)
The exact 1 − α confidence interval for θ can be derived similarly. Again, complicated calculations are needed to derive the quantiles. Besides, for (52) to be valid, the numbers a₁, ..., a_{r_k} defined in (51) must all be different, so one cannot always use Lemma 3 to derive the exact confidence intervals.

5.2. Confidence intervals for the two-parameter exponential distribution
Consider a two-parameter exponential distribution exp(θ, η) with η > 0, in which one believes that death or failure cannot occur before a certain time η. As earlier, suppose X_{r₁,n}, ..., X_{r_k,n} is a multiply Type II censored sample obtained from a sample of size n from exp(θ, η). We have the following simple lemma, which is proved in Balakrishnan and Cohen (1991) and Arnold, Balakrishnan and Nagaraja (1992).

LEMMA 4. Assume X ~ exp(θ, η) and X₁,ₙ, ..., X_{r,n} are the first r order statistics of a sample of size n. Then for 1 ≤ s < r ≤ n, V_{s1} = X_{s+1,n} − X_{s,n}, V_{s2} = X_{s+2,n} − X_{s,n}, ..., V_{s,r−s} = X_{r,n} − X_{s,n} are the first r − s order statistics of a sample of size n − s from the population exp(θ, 0).

According to this lemma, X_{r₂,n} − X_{r₁,n}, ..., X_{r_k,n} − X_{r₁,n} are the (r₂ − r₁)th, ..., (r_k − r₁)th order statistics of a sample of size n − r₁ from the one-parameter exponential distribution exp(θ, 0). Therefore the results in Section 5.1 can be directly used to derive confidence intervals for θ. Define

Q₁ = 2(X_{r₁,n} − η)/θ (54)

and

Q_{k,1} = 2[Σᵢ₌₁^k X_{rᵢ,n} − kX_{r₁,n}]/θ . (55)
Pivotal Q_{k,1} contains only the parameter θ, so confidence intervals for θ can be constructed through Q_{k,1} alone. The three methods provided in the previous section can be adopted here to construct the confidence intervals for θ. To derive the confidence intervals for η, consider the pivotal quantity

Z = Q₁/Q_{k,1} = (X_{r₁,n} − η) / (Σᵢ₌₁^k X_{rᵢ,n} − kX_{r₁,n}) , (56)

which is a ratio of two independent random variables, each being a weighted sum of exponential or χ² random variables. The approximate and exact distributions of Q₁ and Q_{k,1} have been given earlier; therefore those of Z are easy to derive. Since it is of particular interest to test the hypothesis H₀: η = 0 versus H₁: η > 0, besides constructing the confidence intervals for η we also provide three different tests for H₀.
Method I. By Lemma 1 and (42), approximately, 2M₁Q₁ ~ χ²(f₁) and 2M_{k,1}Q_{k,1} ~ χ²(f_{k,1}). To test H₀: η = 0 versus H₁: η > 0, one can reject H₀ if η_L > 0.

Method II. In Section 5.1, Lemma 2 gives approximate distributions of both 2M₁Q₁ and 2M_{k,1}Q_{k,1}, where M_{k,1} = μ_{k,1}/2. One can use these approximations to derive the approximate distribution of Z. But if r₁ is too small, Lemma 2 may not give a good approximation to the distribution of Q₁, and therefore the approximate distribution derived for Z may not be accurate. However, in this particular situation one can always use Lemma 3 to derive the exact distribution of Q₁ and therefore give a more accurate approximation to the distribution of Z. To do that, define

d_{jh} = (k + 1 − j)/[2M_{k,1}(n − h + 1)] , h = r_{j−1} + 1, ..., r_j , j = 2, ..., k . (60)

An approximate density f_Z(z), z > 0, of Z = M_{k,1}Q₁/(M₁Q_{k,1}) then follows. Denote ξ_{1−α} as its 100(1 − α)th quantile; then the 1 − α lower confidence bound for η is

η_L = ... . (61)
Again, if η_L is negative, we replace it with zero. The null hypothesis H₀: η = 0 is rejected if η_L > 0. From Section 5.1, (60) is quite accurate.

Method III. We can also derive the exact distribution of Z through Lemma 3. Define

Q₁* = Q₁/2 , Q*_{k,1} = Q_{k,1}/2 ;

then Q₁* and Q*_{k,1} have the forms

Q₁* = Σᵢ₌₁^{n₁} Z₁ᵢ/(θaᵢ) , Q*_{k,1} = Σᵢ₌₁^{n₂} Z₂ᵢ/(θbᵢ) ,

where Z₁₁, ..., Z₁ₙ₁, Z₂₁, ..., Z₂ₙ₂ are i.i.d. exp(θ, 0) random variables. If a₁, ..., aₙ₁ and b₁, ..., bₙ₂ are all different positive numbers, we can use Lemma 3 to derive the exact distribution of Z. In fact, n₁ = r₁, aᵢ = n − i + 1 for
i=1
where Z l l , . . . , Z I , t , Z21,...,Z2~2 are i.i.d exp(0,0) r a n d o m variables. If a l , . . . , an~ and b l , . . . , b,2 are all different positive numbers, we can use L e m m a 3 to derive the exact distribution o f Z. In fact, nl = rl, ai = n - i + 1, for i= 1,...,rl, and n2=rg-rl, bj=(n-j+l)/(k-i+l) for ri-1 n) experimental units but measuring at most m(< n) of them. The object is to efficiently use these m measurements to get an estimator of # which is Table 2.4 Values of m, n satisfying var(/~in(m)) < vat(X) n
24
5-8
9-12
13-17
18-20
m
2
3
4
5
6
N. N. Chuiv and B. K. Sinha
unbiased and has less variance than the sample mean based on n measurements. As an example, for n = 2, to dominate X̄ (with variance σ²/2) we can record one more observation (i.e., 3 observations in all) but use only the median (i.e., only one measurement) to yield var(μ̂median:3) = 0.45σ² < σ²/2! It turns out that this is a common phenomenon, and can be done for every sample size n. The following table provides the minimum m and corresponding optimum estimators of μ for n ..., which in turn does better than ... for all n ≥ 6. Moreover, a comparison of var(θ̃blue) with the RCLB reveals another striking feature: θ̃blue performs better than any unbiased estimator of θ based on the SRS for all n ≥ 8! This latter fact can be proved theoretically, as the following result shows. Its proof appears in Ni Chuiv et al. (1994).

THEOREM 5.1. For n ≥ 8, var(θ̃blue) < 2σ²/n.

REMARK 5.1. For n = 5, var(θ̂blue), var(θ̂rss) and var(θ̃blue) are all the same because all the estimators involved coincide with X₃:₅, the sample median.

REMARK 5.2. Unlike Lloyd's (1952) BLUE θ̂blue, the estimators θ̂rss and θ̃blue do not involve the covariances of the order statistics, and so avoid the computational difficulties associated with θ̂blue.
On some aspects of ranked set sampling
Table 5.1 Comparison of variances

n     var(θ̂blue)   var(θ̂rss)   var(θ̃blue)   RCLB
5     1.2213       1.2213      1.2213       0.4000
6     0.8607       0.5454      0.5452       0.3333
7     0.6073       0.3519      0.3126       0.2857
8     0.4733       0.2651      0.2041       0.2500
9     0.3865       0.2170      0.1441       0.2222
10    0.3263       0.1870      0.1074       0.2000
11    0.2820       0.1667      0.0832       0.1818
12    0.2481       0.1521      0.0664       0.1667
13    0.2214       0.1411      0.0542       0.1538
14    0.1998       0.1327      0.0451       0.1429
15    0.1820       0.1260      0.0381       0.1333
16    0.1671       0.1205      0.0327       0.1250
18    0.1435       0.1122      0.0247       0.1111
20    0.1257       0.1062      0.0194       0.1000
Incidentally, we can also derive the BLUE of θ based on a partial RSS, namely X₍₃₃₎, ..., X₍(l+2)(l+2)₎, for l ≤ n − 4. Starting with Σ₃^{(l+2)} cᵢX₍ᵢᵢ₎ and minimizing var(Σ₃^{(l+2)} cᵢX₍ᵢᵢ₎) subject to the unbiasedness conditions Σ₃^{(l+2)} cᵢ = 1 and Σ₃^{(l+2)} cᵢcᵢ:ₙ = 0, leads to

θ̃blue(prss, l) = [ (Σ₃^{(l+2)} X₍ᵢᵢ₎/dᵢᵢ:ₙ)(Σ₃^{(l+2)} c²ᵢ:ₙ/dᵢᵢ:ₙ) − (Σ₃^{(l+2)} cᵢ:ₙ/dᵢᵢ:ₙ)(Σ₃^{(l+2)} cᵢ:ₙX₍ᵢᵢ₎/dᵢᵢ:ₙ) ] / [ (Σ₃^{(l+2)} 1/dᵢᵢ:ₙ)(Σ₃^{(l+2)} c²ᵢ:ₙ/dᵢᵢ:ₙ) − (Σ₃^{(l+2)} cᵢ:ₙ/dᵢᵢ:ₙ)² ] (5.16)

with

var(θ̃blue(prss, l)) = σ² (Σ₃^{(l+2)} c²ᵢ:ₙ/dᵢᵢ:ₙ) / [ (Σ₃^{(l+2)} 1/dᵢᵢ:ₙ)(Σ₃^{(l+2)} c²ᵢ:ₙ/dᵢᵢ:ₙ) − (Σ₃^{(l+2)} cᵢ:ₙ/dᵢᵢ:ₙ)² ] . (5.17)
The following table provides, for n = 5(1)16(2)20, minimum values of l for which (i) var(θ̃blue(prss, l)) < var(θ̂blue), and (ii) var(θ̃blue(prss, l)) < 2σ²/n (RCLB). Again, we have taken σ² = 1 without any loss of generality. Clearly, it follows from Table 5.2 that often a partial RSS, based on relatively very few actual measurements and combined with optimum weights, does better than Lloyd's θ̂blue and, more importantly, better than any unbiased estimator. Thus, for n = 10, θ̃blue(prss, 4), based on a partial RSS of size l = 4, is more than 50% more efficient than θ̂blue, as well as better than any unbiased estimator based on a SRS of the same size. We now discuss another variation of a partial RSS. Instead of working with (X₍₃₃₎, ..., X₍(l+2)(l+2)₎) for some l ≤ n − 4, we begin with the central diagonal
Table 5.2 Minimum values of l, indicating dominance of PRSS over SRS and RCLB

      var(θ̃blue(prss, l)) < RCLB        var(θ̃blue(prss, l)) < var(θ̂blue)
n     l    var(θ̃blue(prss, l))          l    var(θ̃blue(prss, l))
8     4    0.2041                       3    0.2614
9     4    0.1619                       3    0.3533
10    4    0.1724                       4    0.1724
11    5    0.1091                       4    0.2289
12    5    0.1251                       5    0.1251
13    6    0.0818                       5    0.1602
14    6    0.0954                       6    0.0954
15    6    0.1184                       6    0.1184
16    7    0.0753                       6    0.1497
18    8    0.0610                       7    0.1113
20    8    0.0860                       8    0.0860
element(s) in McIntyre's RSS and spread out along the diagonal in both directions to include a few more terms. Thus, for n odd = 2m + 1, we propose to use the BLUE based on the central diagonal (2l + 1) order statistics {X₍(m+1)(m+1)₎, (X₍mm₎, X₍(m+2)(m+2)₎), ..., (X₍(m+1−l)(m+1−l)₎, X₍(m+1+l)(m+1+l)₎)}, which, in view of symmetry and (5.14), is given by

θ̃blue(mprss, 2l + 1) = [Σ_{r=−l}^{l} X₍(m+1+r)(m+1+r)₎/d₍m+1+r₎₍m+1+r₎:ₙ] / [Σ_{r=−l}^{l} 1/d₍m+1+r₎₍m+1+r₎:ₙ] (5.18)

with

var(θ̃blue(mprss, 2l + 1)) = σ² / [Σ_{r=−l}^{l} 1/d₍m+1+r₎₍m+1+r₎:ₙ] . (5.19)

On the other hand, for n even = 2m, we intend to use the BLUE based on 2l central diagonal order statistics, namely {(X₍mm₎, X₍(m+1)(m+1)₎), ..., (X₍(m−l+1)(m−l+1)₎, X₍(m+l)(m+l)₎)}, which, again in view of symmetry and (5.14), is given by

θ̃blue(mprss, 2l) = [Σ_{r=−(l−1)}^{l} X₍(m+r)(m+r)₎/d₍m+r₎₍m+r₎:ₙ] / [Σ_{r=−(l−1)}^{l} 1/d₍m+r₎₍m+r₎:ₙ] (5.20)

with

var(θ̃blue(mprss, 2l)) = σ² / [Σ_{r=−(l−1)}^{l} 1/d₍m+r₎₍m+r₎:ₙ] . (5.21)
The following result, whose proof is given in Ni Chuiv et al. (1994), shows that this modification of a partial RSS often pays off in the sense that, whatever be n _> 8, the weighted average of only 4 or 5 selected middle order statistics from the
McIntyre's RSS, depending on whether n is even or odd, does better than any unbiased estimator based on a SRS of size n.
THEOREM 5.2. (i) For n even ≥ 8, var(θ̃blue(mprss, 4)) < 2σ²/n. (ii) For n odd ≥ 9, var(θ̃blue(mprss, 5)) < 2σ²/n.

5.3. Which order statistic?

In this section, as in Sections 2.2, 3.3, and 4.2, we address the issue of the right selection of order statistics in the context of RSS. The following variance inequality for order statistics of a Cauchy distribution is useful for our subsequent discussion. Its small-sample validity follows from Table 2 of Barnett (1966b), and its asymptotic validity is obvious (see also David and Groeneveld, 1982). Recall that a sample median is defined as X_median:n = X_{m+1:2m+1} when n = 2m + 1, and as [X_{m:2m} + X_{m+1:2m}]/2 when n = 2m.
LEMMA 5.1. var(X_median:n) ≤ var(X_{r:n}) for any r and n.

In view of the above result, we can recommend the use of the sample median from each row of n observations in Table 1.1, and the mean of all such medians as an estimator of θ, namely,

θ̂_median:n(n) = [X^(1)_median:n + ··· + X^(n)_median:n]/n    (5.22)

where X^(i)_median:n is the sample median from the i-th row of Table 1.1. Clearly, θ̂_median:n(n) is unbiased for θ, and, by Lemma 5.1, θ̂_median:n(n) is much better than the ordinary McIntyre's estimator θ̂_rss. Slightly more efficiently, we propose measuring only m medians from the first m rows of Table 1.1, where m ≤ n, and use

θ̂_median:n(m) = [X^(1)_median:n + ··· + X^(m)_median:n]/m    (5.23)
as an estimator of θ. Clearly,

E(θ̂_median:n(m)) = θ,   var(θ̂_median:n(m)) = var(X_median:n)/m .    (5.24)
The following result, whose proof again appears in Ni Chuiv et al. (1994), shows that it is enough to measure only 2 or 3 medians to achieve universal dominance over any unbiased estimator of θ based on a SRS, whatever be n. This result is very similar to those in the cases of normal and exponential distributions.
THEOREM 5.3. (i) For 8 ≤ n ≤ 21, m = 2 will do, i.e., var(θ̂_median:n(2)) < 2σ²/n. (ii) For n ≥ 22, m = 3 will do, i.e., var(θ̂_median:n(3)) < 2σ²/n.

We finally discuss another variation of the above concept. Since the merit of using a RSS depends on the ability to rank the experimental units without their actual measurements, it is obvious that the fewer the number of units we need to rank the better. This suggests the strategy of ranking exactly 5 units at a time (the
N. N. Chuiv and B. K. Sinha
364
minimum number for a Cauchy distribution), measuring only the median, repeating the process m* times, and eventually using the average θ̂_median:5(m*) of these medians as the resultant estimator of θ. The following table provides, for n = 5(1)16(2)20, minimum values of m* for which var(θ̂_median:5(m*)) is smaller than var(θ̂_blue) and also RCLB (= 2σ²/n), based on a SRS of size n. Thus, for example, the average of 4 sample medians, each based on 5 observations (but only one measurement), is better than Lloyd's BLUE based on 10 measurements. Similarly, the average of 7 such medians dominates any unbiased estimator of θ based on a sample of size 10.

REMARK 5.3. It is again possible to explore the notion of expansion in this problem. We omit the details.
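The median-of-5 strategy just described is easy to examine by simulation. The sketch below is our own illustration (not from Ni Chuiv et al.; all function names are ours): it draws rows of 5 Cauchy units, measures only each row's median, and averages m* of them.

```python
import math
import random
import statistics

def cauchy(theta=0.0, sigma=1.0, rng=random):
    """One Cauchy(theta, sigma) draw via the inverse CDF."""
    return theta + sigma * math.tan(math.pi * (rng.random() - 0.5))

def median_of_row(n, theta=0.0, rng=random):
    """Rank a row of n units and measure only the sample median (n odd)."""
    row = sorted(cauchy(theta, rng=rng) for _ in range(n))
    return row[n // 2]

def rss_median_estimator(m_star, n=5, theta=0.0, rng=random):
    """Average of m* row medians, one measurement per row of n ranked units."""
    return sum(median_of_row(n, theta, rng) for _ in range(m_star)) / m_star

if __name__ == "__main__":
    rng = random.Random(1)
    meds = [median_of_row(5, rng=rng) for _ in range(100_000)]
    print("mean of median-of-5:", statistics.fmean(meds))
    print("var of median-of-5: ", statistics.pvariance(meds))
```

With σ = 1 the variance of a single median-of-5 comes out at roughly 1.2 in such runs, so averaging a handful of medians quickly drops below the RCLB 2σ²/n, in line with the m* values tabulated above.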
6. Estimation of location and scale parameters of a logistic distribution

This section is based on Lam et al. (1995). Here we apply the concept of RSS and its suitable modifications to estimation of the location and scale parameters of a Logistic distribution. We note that the pdf of a Logistic distribution can be written as

f(x|θ, σ) = (1/σ) e^{-(x-θ)/σ} / [1 + e^{-(x-θ)/σ}]² ,   -∞ < x < ∞,  -∞ < θ < ∞,  σ > 0    (6.1)
where θ is the location parameter and σ is the scale parameter. First we give a discussion of some standard estimators of θ and σ based on a SRS of size n, namely, X₁, ..., X_n.
Table 5.3
Minimum values of m* for which var(θ̂_median:5(m*)) is smaller than var(θ̂_blue) and RCLB

n    m*(RCLB)   m*(θ̂_blue)
5    4          1
6    4          2
7    5          3
8    5          3
9    6          4
10   7          4
11   7          5
12   8          5
13   8          6
14   9          7
15   10         7
16   10         8
18   11         9
20   13         10
It is clear that the conventional maximum likelihood estimators of 0 and ~ are extremely difficult to get in the context of (6.1), and their small sample properties are completely unknown. However, based on the order statistics XI:,,...,X,:,, Lloyd's (1952) best linear estimators (BLUEs) are quite popular. Throughout this section, we have taken the BLUEs of 0 and a as our main standards for comparison against RSS estimators. Note that E(Xr:n) = 0 + Cr:,a,
var(X~:~) = drr:nff 2,
cov(Xr:n,Xs:n) = drs:n~2
(6.2) where cr:n and dry:, are respectively the mean and variance of Xr:,, and d~s:n is the covariance between X~:, and Xs:, from a standard Logistic distribution with 0 = 0 and a = 1. The values of c~:,, drr:, and dr,:, are given in Balakrishnan and Malik (1994). Using (6.2), the BLUEs of 0 and o- are given by (Lloyd, 1952, Balakrishnan and Cohen, 1991) 0blue = l ' ( Z * ) - l x * / l t ( Z * ) - l l
(6.3)
ablue = c ( ( Z * ) ' X * / ~ ' ( Z * ) l c ~ .
(6.4)
In the above, X* = (Xl:n,... ,X,:,)', ~ = (c1:,,-.. ,Cn:,) and £* = var(X*). Moreover, var(&blue) = cr2/l'(y~*) I1
(6.5)
var(0blue) = o2/~t(~*)-10{
(6.6)
We also note in passing that the Fisher information matrix in our problem is given by
I,(0, ~r) =
(o ;)
(6.7)
where
c = 2 feX(l/°°x2-'- eX)---~,~2dx = 2.43 J0
(6.8)
(1 + e x)
which provides the Rao-Cramer Lower Bound (RCLB) of the variance of any unbiased e s t i m a t o r d l 0 q- d2ff o f dlO q- d2cr as RCLB(& 0 + d26) = dZ3a2/n + d2 aZ /cn .
(6.9)
This immediately gives RCLB(0) = 3~2/n
(6.10)
RCLB(~r) = ff2/Cn
(6.11)
N. N. Chuiv and B. K. Sinha
366
6.1. Estimation o f O: ordinary R S S
Since E(X) = 0, following McIntyre's concept, our first estimator of 0 can be taken as n
0rs s =
~-~X(rr)/rl
(6.12)
r=l
with var(0rss) = cr2 ~
(6.13)
dr~:./n 2 •
r=l
6.2. Estimation o f O: blue based on R S S
We now address the issue of how best to use the RSS, namely, X0~),... ,X(,,), for estimation of 0. Recall that E(X(u)) = 0 + ci:,~r and var(X(ii)) = dii:,a 2. Starting with }-~ cgX(,) and minimizing var(}-]~7 cY(iO) subject to the unbiasedness conditions: ~-~1~ ci = 1, 2 1 eiCi:n = 0, leads to the BLUE of 0 as 0blu e =
( 2 ~ X(ii)/dii:n)(E7 ci2:n/dii:n) -- ( E l n Ci:n/ d i':n)(Eln ei:nX (i')/dii:n)
(27 lld, i:.)(E7
-
(27
2 (6.14)
with
var(Gue) = .2
(27 cL/d,:.) (271/4,:.)(27 eL/a,:.) - (27 "i:./dii:.) 2
(6.15)
Using the fact that, for a Logistic distribution,
£ ei:. = O,
£ ei:./dii:.=
i=1
i=1
0
,
(6.16)
we get E l nX, n (ii)/dii:n 0blue -- Y~'~11~el,:, '
~ __ ~72 var(0blue) 2 7 1/dii:, "
(6.17)
Table 6.1 provides a comparison of var(0blue), var(0rss), var(0blue) and the RCLB (0) for n = 3(1)10. Without loss of generality, we have taken O"2 = 1. It is clear from Table 6.1 that, as expected, 0blue performs better than McIntyre's estimator 0rss which in turn does better than 0blue for all n. Moreover, N a comparison of var(0rss) with the RCLB reveals another striking feature that var(0rss) is smaller than RCLB for all n, so that 0rss performs better than any unbiased estimator of 0 based on the SRS for all n. This latter fact can be proved theoretically as the following result shows. Its proof appears in Lam et al. (1995).
On some aspects of ranked set sampling
367
Table 6.1 Comparison of variances for estimation of 0 n
var(0blue)
var(0rss)
var(0blue)
RCLB(0)
3 4 5 6 7 8 9 10
1.0725 0.7929 0.6284 0.5201 0.4435 0.3866 0.3425 0.3073
0.5968 0.3711 0.2553 0.1872 0.1438 0.1142 0.0931 0.0776
0.5695 0.3379 0.2227 0.1576 0.1171 0.0905 0.0720 0.0586
1.0000 0.7500 0.6000 0.5000 0.4286 0.3750 0.3333 0.3000
Table 6.2 Minimum values of l, indicating dominance of PRSS over SRS and RCLB(0)
3 4 5 6 7 8 9 10
var(Ob~uo(prss, l)) < RCLB
Var(Oblue(prss, l)) < var(0blue)
l
var(Ouue(prss, l))
1
var(Oblue(prss, l))
3 3 4 4 4 5 5 6
0.5695 0.4913 0.2755 0.3058 0.3823 0.2113 0.2541 0.1552
3 3 4 4 4 5 5 5
0.5695 0.4913 0.2755 0.3058 0.3823 0.2113 0.2541 0.3055
THEOREM 6.1. var(0rss) < 3~r2/n (RCLB for 0) for any n. As in the previous sections, we can also derive the B L U E o f 0 based on a
partial RSS (PRSS), namely, X01),. • . ,X(1 i), for l < n. Starting • with . E l l ciX(io and
l l m i n i m i z i n g v a r ( ~ 1 ciX(ii) ) s u b j e c t to t e e u n b i a s e d n e s s c o n d i t i o n s : ~ 1 ci = 1, ~ t 1 cici:~ = 0, leads to the s a m e f o r m as (6.14) with n r e p l a c e d b y l as the u p p e r limit in all the s u m m a t i o n s . T h e e x p r e s s i o n for the v a r i a n c e o f the r e s u l t a n t e s t i m a t o r 0blue(prss, l) is exactly as in (6.15) w i t h the a b o v e c h a n g e . T h e f o l l o w i n g table p r o v i d e s , for n = 3 ( 1 ) 1 0 , m i n i m u m v a l u e s o f l for w h i c h (i) var(0blue(prss, l)) < var(0blue), and (ii) var(0blue(prss, l)) < 3~rZ/n (RCLB). Again, we have taken 0.2 = 1 w i t h o u t loss o f generality. It is clear f r o m this table that substantial savings in the number o f m e a s u r e m e n t s can occur in s o m e cases.
6.3. Estimation of O: which order statistic? Once again, as in the previous sections, we address the issue of the right selection o f order statistics in the context o f RSS. The following variance inequality for
N. N. Chuiv and B. K. Sinha
368
order statistics of a Logistic distribution is useful for our subsequent discussion. Its validity essentially follows from David and Groeneveld (1982). Recall that a sample median is defined as Xmedian:~ =Xm+l:2m+l when n = 2m + 1, and as [Ym:Zm @Ym+l:2ml/2 when n = 2m. LEMMA 6.1. var(Xmedian:n) _< var(Xr:,) for any r and n. In view of the above result, we can recommend the use of the sample median from each row of n observations in Table 1.1, and the mean of all such medians as an estimator of 0. Slightly more efficiently, we propose measuring only m medians from the first m rows of Table 1.1, where m _< n, and use 0median:n(m)
=
ILX median:n (1) + • " ' -t X
(m) ~" median:n]/m
(6.18)
as an estimator of 0. Here X2~dian:n is the sample median from the i th row of Table 1.1. The following result, whose p r o o f again appears in Lain et al. (1995), shows that it is enough to measure only two (2) experimental units to achieve universal dominance over any unbiased estimator of 0 based on a SRS, whatever be n. This result is similar to those in the previous sections. 6.2. (i) var(Xm+l:2m+l)/2 < 3a2/(2m + 1) (RCLB for 0) for n = 2m + 1. (ii) var(Xm:zm)/2 < 3~2/(2m) (RCLB for 0) for n = 2m.
THEOREM
6.4. Estimation of ~7: B L U E based on R S S In this section we discuss the problem of estimation of the scale parameter a in (6.1), and point out that the use of RSS and its suitable variations results in much improved estimators compared to the use of the B L U E (6-blue) under SRS. To derive the B L U E of a based on the entire McIntyre sample X(aI),... ,X(nn), we minimize the variance of ~ ciX(ii) subject to the unbiasedness conditions : ~ ci = O, ~ 1n cici:n = 1. This results in ffbJue = (~-]~nlX(ii)ci:n/dii:")(E~ 1/dii:n) - (~-]~ Ci:n/dii:n)(~nlX(ii)/dii:n) (}--~ 1/d.:.)(E~ c2./d.:.) - (E~ Ci:n/dii:n) 2 (6.19) with
1/dii:n)
var(c blue ) = ¢72
(6.20)
(E7 1/a.:.)(E7 G/a,,:.) - ( E l ° c , : ° / 4 , : . ) 2 Using (6.16), the above expression can be simplified as
_ O'blue =
with
n
x(. I Ci:n/4,:.
E",
(6.21)
On some aspects of ranked set sampling
369
Table 6.3 Comparison of variances for estimation of cr n
var (6"blue)
var(~blue)
RCLB(a)
3 4 5 6 7 8 9 10
0.3333 0.2254 0.1704 0.1370 0.1145 0.0984 0.0864 0.0768
0.4533 0.2521 0.1627 0.1143 0.0850 0.0658 0.0525 0.0428
0.1372 0.1029 0.0823 0.0686 0.0588 0.0514 0.0457 0.0412
Table 6.4 Minimum values of l, indicating dominance of PRSS over SRS var(6blue(prss, 1)) < var(0blue) var(ablu~(prss, l)) 8 9 10
var((}blue) = 0-2
7 8 8
0.0891 0.0681 0.0718
n 21 ~ 1 ei:n/dii:n
(6.22)
In Table 6.3 we have presented the values of var(6-blue), var((}blue) and the R C L B (a) for n = 3(1)10. The dominance of (}blue o v e r (}blue holds for n > 5, although there is no d o m i n a n c e over R C L B ( a ) . As in the case of estimation of 0, here also we can use a partial RSS, namely, X(ll),...,X(ll) for some l 0, fl > 0 are the scale and shape p a r a m e t e r s respectively. Let x = l n y ( the natural logarithm of y ), then x has a Type I asymptotic distribution of smallest (extreme) values given by -oc <x
-=7)
,
z<m*,
.... ~N*d(Y*--m)d*2 (N,+l){z_m,+N(y,_m,)}d,_l
m* ,
z>
Note that for the noninformative prior P(7, e) c~ e -l, m*---+m,
y*~y,
d*--+d,
N*~N
.
In what follows we shall remove the stars although it is clear that when the proper prior is available we can replace the unstarred values with starred values. Clearly, then, for (3.1) we would calculate a P-value for the largest, ( N - 1)c@(a4)-m) c-' PM =
(4.2)
m))
where c = d - 1 if Ya¢ is fully observed c= d
if ym is censored
and for the smallest, d-2
l
(4.3)
Pm = 1 - - ~ \Y(") - - m /
where m2 the second smallest is smaller than the censored observations when m is excluded. For combinations of the largest and smallest we first calculate (bereft of stars) the predictive probability of a pair of future observables Z1 and Z2
PrIZe_ t] = 1 - ( X - 1 ) ( X - 2)B.l\ t'
J
(5.3)
Some uses ~?[order statistics in Bayesian analysis
where
B(u, v) is the m2 -
T - - - -
387
beta function. Similarly for the smallest a statistic used is m
m-m
whose sampling distribution yields for a P-value, Pr[T < t] =
(N-2)BC+(n-2)t 1-t
-
,n - 2 ) .
(5.4)
When d < N, straightforward frequentist solutions are not available. In general then assume some diagnostic, say D, is used in ransacking to order the Yi. Then the transformation D(Y,.)=Di yields random variables D1, D2,..., DN. Hence we need to find the distribution of DM the transform which yields the observed y which is most discrepant, namely
FDM(dIO) , the conditional distribution of Then
DM associated
PM = 1 -- f FDM(dlO)p(O)d0
with the most removed Y~ given 0. (5.5)
where p(O) is a proper prior. Tests of this sort were termed Unconditional Predictive Discordancy (UPD), Geisser (1989). They allow the prior to play a substantial role in determining the outlier status of an observation. One can continue by finding the joint distribution of the ordered Di's given 0 and test whether jointly some most discrepant subset in terms of the diagnostic's discrepancy ordering is a discordant subset. For a simple normal case we assume ~, i = 1,... ,N are i.i.d. N(#, a 2) with 0-2 known and kt,-,~N(0,'c2). Now the unconditionally Y~, i = 1 , . . . , N are an exchangeable set of normal variates with mean 0, variance 0-2 + ~2 and covariance ~2. This might imply that V~= (Y~-- 0)2/(0-2 + "t"2) is the appropriate diagnostic with the maxi Vii = V0 being used to construct the significance test for the largest deviation, namely PM = Pr[V0 _> v] .
(5.6)
It is clear that V1,..., VN are all exchangeable and marginally distributed as Z2 with one degree of freedom. Although the distribution of ~ is not analytically explicit, I'M can be calculated by either numerical approximation or Monte Carlo techniques, see also tables by Pearson and Hartley (1966). However, this is not the critical issue. The question is whether V~ is an appropriate discrepancy measure because V~ only reflects distance from the prior mean and this could cause some discomfort as it need not reflect sufficient distance from the rest of the observations. The latter is often the implied definition of an outlier or a discordant observation. One could also use (Y/ - - Y ) Z N
max (0-2 + v2)(x + 1)
=
maxZi = Z0
(5.7)
388
S. Geisser
again a joint distribution of exchangeable normal random variables, each marginally Z2 with one degree of freedom, and though slightly more complex, it is still calculable. Again, this is essentially the frequentist approach for ~2 = 0 which in essence is independent of the prior assumptions. Perhaps this goes too far in the other direction, i.e. disregarding the prior information. Some compromise may be needed and the one that suggests itself is
( NT2y2 92.] 2 W/= Yi NT2 q- 0"2 ]
(5.8)
where the deviation is from the posterior mean an appropriate convex combination of sample mean and prior mean. Although unconditionally ~ , . . . , WN are still exchangeable the marginal distribution of Wi is essentially proportional to a non-central Z2, thus complicating matters still further for 14~ = max W~. However, deviations such as W~seem more sensible in that both prior and likelihood play a part in contrast to only either prior or likelihood. Further distributional complications ensue when the usual conjugate gamma prior is assumed for 0.-2. In addition, the two hyperparameters of the gamma prior also must be known. Extension to multiple linear regression with normally distributed errors, though clear for all 3 approaches, involves further unknown hyperparameters. For Poisson regression we also would require a discordancy ordering perhaps based on the least probable Y,. as the potential outlier. As this becomes quite complicated we shall merely illustrate for i.i.d. Poisson variates with a gamma prior, for 0 p(Olw,6) -
76 0z,- I e - ~O v(a)
If the maximum Y,. has the smallest probability we let Z = maxi Y/, assuming this is the potential outlier. Then [-z l e 00y] N
pr[z >
z] = 1 - [
dO ky=O y6 /.oo
=I-F-~]
p(O) dO
(
02
° e-(N+JO0a l l + 0 + ~ - + . . . - t
0z-i )N ( z T ~ ) .~
dO . (5.9)
Clearly one can write the multinomial expansion for the term raised to the N th power in the integrand and integrate termwise and obtain a complex but finite and explicit solution involving gamma functions. If rain Y/= W has the smallest probability, then
Some uses of order statistics in Bayesian analysis
Pr[W < wlO] = 1 -
389
w e_OOy]N 1 - y~=o-7-i]"
(5.10) w e O0yTN
Pr[Wa)
1-y_~07]
dO.
Again this is complex but explicitly computable in terms of a finite series involving gamma functions. Although simple analytic expressions, except when dealing with the exponential distribution, are rare, Monte Carlo methods are generally available to handle such situations. However, the major difficulty is of course the assignment of the proper prior distribution and the ensuing set of hyperparameters. Because of these difficulties we shall present another way of handling these situations which can be used with proper or improper priors.
6. Conditional predictive discordancy (CPD) tests We shall now present a method which (a) turns out to be much easier to calculate, (b) can be used for the usual improper priors, (c) depends on a proper prior and its hyperparameters when a proper prior is used, (d) is seamless in its transition from a proper prior to an improper prior and to censoring, and (e) in certain instances when an improper prior is used it will yield a result identical to a frequency significance test. The idea is relatively simple if D(Y/) represents the scalar diagnostic which characterizes the discrepancy of the observation from the model and orders the observables from most to least discrepant D1,D2,... ,DN, then a significance test P = er[D, > &lD1 > d2, d0) ]
(6.1)
where do) refers to d (N) with dl deleted. Here we assume only D1 is random and conditioning is on D(1), i.e. all but the largest discrepant value. Alternatively, we could consider conditioning D(1,2), i.e. all but the largest and second largest discrepant values which would result in P = Pr[D1 > dllD2 > d2,d0,2)] .
(6.2)
As an example we consider the exponential case of section 4. For testing the largest for discordancy using (6.1) we obtain
PM = Pr[Z > MIZ > M2,Y(M)] = rlVl:.(y-- m ) - ( M - M2)/c] -
PM = (1 - t ) c where
_
Y-6-7)
(6.3)
S. Geisser
390
M-M2 t
--
-
-
N ( y - m)
c = d - 1 if M were censored c = d - 2 if M were uncensored when the non-informative prior is used. F o r the conjugate prior we need only to affix stars to ~, m, d, and N, using the previous definitions of (4.1). Using (6.2) we obtain
PtM • e[Zl ~ MIX2 > M2,Y(M,M=)] P[ZI ~ M, Z2 ~ Mzly(M,M2)] P[Z2 e
M21Y(M,M2)]
_ N_;I .(N(Y -_N___@__~om) (M - m)) c
(6.4)
N-1 --
N
(1-t)~
where t
--
mmm N(~ - m) -
-
and c=
d-1 d-2 d-3
if M and M2 are censored if one of M or M2 is censored if M and M2 are uncensored
We know that if d = N, the uncensored case, the sampling distribution of the statistic
M-M2
T -- - N ( y - m)
(6.5)
which can be used to test for the largest being an outlier is such that Pr[T > t] = (1 - t) N-1 = PM i.e. the same value as (6.3). Hence we have a seamless transition from proper prior with censoring to the usual non-informative prior without censoring yielding the sampling distribution statistic. The second m e t h o d illustrated by (6.4) does not provide a frequentist analogue M-m and this cannot be reconciled with for the sampling distribution of T - N(y-m) (6.4). F o r the smallest observation we obtain, basically using (6.1), Pm = P r [ Z _< mlZ < m2,Y(m)]
Some uses of order statistics in Bayesian analysis
391
where
A(m)/A(m2), B(m)/A(m2), B(m)/B(m2),
Pm =
mo uclU~ > U~-l,Y(c)] where u~ and u~-i are the realized values of Uc and Uc 1. We suggest that the significance computation be made as follows: Pc -
Pr[U > Uc] Pr[U > uc i]
(6.6)
S. Geisser
392
where U is distributed as an F-variate with 1 and N - 1 - p degrees of freedom. Similarly for Poisson regression we can order (3.1) using
Pi : Pr [Y/= Yi Ixi, Y(i), x(i)]
: (t(i)-}-Yi--1)( Xi ~Yi(U(i) ~t(i) \ t(i) - 1 ku(i) "-[-Xi/I \hi(i) ~-Xi,I
(6.7)
where t(i) = ~]#iyj, u(i) = 2]#iXj and Pc and Pc-I are the smallest and second smallest probabilities corresponding to say yc and yc I. At this point one could use as significance level the simple computation Pc-
pc
Pc 1+Pc
if xc # xc 1- However, if xc = xc-1, alternatively one can use the tail area C P D approach, i.e. Pr[Yc
Pr [Y~ _> Yc[Y(c)]
(6.8)
>_y~lYc >_yc ,,y(c)] = Pr:YcL 2yc lly(c)]
7. Combinations of largest and smallest In order to derive C P D tests for two observations at a time, i.e. combinations of the smallest and largest, we need the joint predictive distribution of two future observations given y(N). We shall derive results for the translated exponential case of the previous section. Here Pr Zl _< zl, Z2 _< zzly (x) =
N(~- m)
N( + N
N+l
( \ N ( f - v) + z2 - v ]
) d-1 + 2)V N
{
N(y-m)
,~a-1
- - -X + 1 \)V(y--- ~)-+V1 - ] (7.1)
for v : min(zl,z2) and max(zl,z2) _< m,
N ( N~-m) )a-1 Pr S,[ z21Y(N)~]= (N + I)(N + 2) \N (y --~[)-~-~2 - z, for
Z 1 _ MIZ~ M2,Y(m,M)]
)(-F(M,m)_~ ~m I . = [((NN_-_?2)@(M,m)-m~ m +M2+M 2 c F o r (M - m)tM = M - M2 and (M - m)tm = m2 -- rn, Pm,M = {1 -- U(tM + (N -- 1)tM)} c .
(7.2)
F o r d = N, it can easily be shown that the unconditional-frequency calculation Pr[(TM + (N - 1)Tm)g > (tM -}- (N -- 1)tm)U] = Pm,M ,
(7.3)
M m tM and tm are the realized values of the r a n d o m variables U, TM and Tm u - N(-7--m)' respectively, Geisser (1989). F o r the two smallest (re, m2), where m3 is the third smallest, and assuming m3 _< m i n ( y d + l , . . . ,YN), it seems plausible to calculate Pm,m2
= Pr[Z1 < m , m
< Z2 ~
m2lZ1~ 22 ~ ma,Y(m,m2)]
(7.4) for N > d > 3. F o r the two largest (M, M2), similarly we m a y calculate PM,M2 = Pr[Z1 > M, M2 < Z2 Z2 >_ M3,Y(M,M2)] = 2Pr[Z1 > M, Z2 > M21 - Pr[Z1 > M, Z2 > M] Pr[Z1 > M3,Z2 > M3] where M3 is the third largest observation. Then for c= we have
d - 1 d-2 d - 3
if M and M2 are censored, if one o f M and M2 is censored, if M and M2 are uncensored ,
394
s. Geisser
m)
(7.5) It is of interest to point out that plausible alternative regions can be used for testing the two largest or the two smallest observations, which have frequentist analogues when d = N. It is not difficult to show that defining PrIZ1 > M , Zz > M2[y(M,M2)] P~t,M2 = Pr[Z1 > M3,Z2 > M3IY(M,M2)]
(7.6)
will result in P~t,M2 = (1 -- ur) c, where c is defined as before in the censored case and r ~
(M - M2) + 2(M2 - M3) M - rn
Further, for d = N, the unconditional-frequency calculation for the random variable UR observed as ur is Pr[UR > ur] : PM,a6 •
(7.7)
A similar calculation for the two smallest gives e' er[Zl < re, Z2 < m2lY(m,m2)] m,m2 = er[Z, < m3,Z2 < m3[Y(m,m2)] = (1 -- US) a-3 ,
(7.8)
where S ~
( N - 1)(m2 - m) + ( N - 2)(m3 - m2) M-m
Again for d = N the frequency calculation for US the random variable observed as us yields Pr[US > ur] = Ure,m2
(7.9)
All of the CPD tests can be given in terms of the original proper prior distribution by substituting m*, y*, d*, N* for m, y, d, N respectively. Advantages of the C P D tests given here over the usual frequency-discordancy tests are that they
Some uses of order statistics in Bayesian analysis
395
can include prior information and censoring. The comparative advantage of the C P D tests over the U P D tests is that the former can be used with certain useful improper priors and are always much easier to calculate. All of these tests are basically subjective assessments and, in general, are not frequency-based, even though under very particular circumstances some of them can be shown to have a frequency analogue. O f course, when alternatives can be specified, a full Bayesian treatment will supersede the methods suggested here for unknown alternatives.
8. Ordering future values In certain problems where there is data regarding an observable such as the yearly high water m a r k on the banks of a river or a dam there is an interest in calculating the probability that no flood will occur in the next M years, i.e. the m a x i m u m high water m a r k does not exceed a given value. Conversely, another situation is the use of possibly harmful drugs serving a limited number of patients given to alleviate very serious but rather rare diseases. Here the drug may be lethal or severely damage some important bodily function if some measured physiological variable falls below (or above) some defined value. A more mundane situation is where a buyer of say M light bulbs, whether connected in series or not, wants to calculate the probability of no failures in a given time based on previous information of bulb lifetimes. In the last two situations we require the chance that the minimum physiological value (or failure time) exceeds a certain threshold. In the first case we are interested in calculating the m a x i m u m Z of future values say YN+I,..., YN+Mnot exceeding a given value z, i.e. the distribution function of the m a x i m u m Z,
Pr[Z
w{y(N)I =
for w > m*
NZ-~-ML-~J
for w < m* ,
c.f. D u n s m o r e (1974). Sometimes the situation is such that we are interested more generally in the chance that at least the r th largest will not exceed a given value. We first obtain the probability that exactly r out of M will not exceed the threshold w, Geisser (1984). Let V/=I = 0,
if YN+i< w i= l,...,M otherwise
and set R = ~ N V/. Then after some algebra we obtain
if r > 0 ,
(y*-m*h
N*+M\y*-w]
=1
/:0
1+
w m* , (8.3)
roly(N)]
and 1 - Pr[R _< is also the distribution function of the r th order statistic of the future random variables Y~+i, i = 1,... ,M. For further ramifications on interval estimation of the r th order statistics, see Geisser (1985). Other sampling distributions cure conjugate priors are generally not amenable to explicit results but numerical approximations or Monte Carlo simulations are often capable of yielding appropriate numerical answers for the most complicated situations.
I11,..., YN, Y N + I , . . . , YN+M i.i.d. N(#, 0"2), Pr[Yx+i_ D1) 6, the approximation of Patnaik (1949) described in Young (1962) has been used. From Tables 12 and 13, we see that the power of the )~2-test is usually as good or better than those of the other two procedures. However, for slippage to the left (2 < 1), the powers of Procedure 1 are quite comparable to those of the )~2-test. But Procedure 1 has the distinct advantage that the expected sample size decreases as 2 decreases, whereas the sample size of the zZ-test is fixed. For example, we see from the seventh row o f k = 4, 2 = 1/5 in Table 3 and the sixth row of k = 4, 2 = 1/5 in Table 13 that the respective attained significance levels (.0605, .0708) and powers (.8168, .8342) are comparable, but the sample size for the 7~2-test is 42 whereas the expected sample size for Procedure 1 is seen from Table 5 to be 35. The same observations can be made regarding Procedure 2 when 2 > 1. This time, however, the advantage of decreased expected sample size is quite significant. For example, in the fifth row of k = 5 of Tables 9 and 13 we see that the attained significance levels of the two procedures are similar (.0728, .0692), and
S. Panchapakesan, A. Childs, B. H. Humphrey and N. Balakrishnan
410
Table 5 Exact expected sample size when using Procedure 1 k
M
2 1/5
1/4
1/3
1/2
2
3
4
5
1
3
8 9 10 11 12 13 14 15
14.14 16.13 18.12 20.13 22.14 24.17 26.20 28.23
14.46 16.49 18.53 20.59 22.65 24.72 26.79 28.87
14.98 17.09 19.21 21.34 23.48 25.63 27.78 29.94
15.94 18.21 20.50 22.79 25.10 27.40 29.72 32.04
15.23 17.32 19.39 21.46 23.52 25.57 27.62 29.67
13.21 14.91 16.60 18.28 19.96 21.64 23.31 24.99
11.97 13.48 14.99 16.49 18.00 19.50 21.00 22.50
11.19 12.60 14.00 15.40 16.80 18.20 19.60 21.00
17.20 19.76 22.34 24.95 27.57 30.21 32.86 35.52
4
8 9 10 11 12 13 14 15
18.35 21.08 23.83 26.61 29.41 32.22 35.05 37.89
18.64 21.41 24.21 27.03 29.87 32.72 35.60 38.48
19.11 21.95 24.82 27.72 30.63 33.56 36.51 39.47
19.98 22.98 26.01 29.06 32.13 35.21 38.31 41.42
18.69 21.32 23.94 26.55 29.16 31.75 34.33 36.91
15.79 17.84 19.88 21.91 23.94 25.95 27.97 29.97
13.95 15.72 17.48 19.24 20.99 22.75 24.50 26.25
12.79 14.39 16.00 17.60 19.20 20.80 22.40 24.00
21.26 24.54 27.87 31.22 34.61 38.02 41.46 44.91
5
8 9 10 11 12 13 14 15
22.32 25.77 29.26 32.78 36.34 39.92 43.53 47.16
22.58 26.08 29.61 33.18 36.77 40.40 44.05 47.72
23.02 26.58 30.19 33.82 37.49 41.19 44.91 48.66
23.85 27.56 31.31 35.09 38.91 42.75 46.62 50.51
22.06 25.24 28.42 31.58 34.73 37.86 40.98 44.09
18.35 20.76 23.16 25.53 27.90 30.26 32.61 34.96
15.93 17.96 19.97 21.98 23.99 25.99 28.00 30.00
14.38 16.19 17.99 19.80 21.60 23.40 25.20 27.00
25.11 29.10 33.15 37.24 41.38 45.55 49.75 53.98
6
8 9 10 11 12 13 14 15
26.12 30.27 34.47 38.73 43.03 47.37 51.74 56.14
26.37 30.56 34.81 39.10 43.45 47.83 52.24 56.68
26.78 31.04 35.36 39.72 44.14 48.58 53.07 57.58
27.57 31.97 36.43 40.94 45.49 50.09 54.71 59.37
25.36 29.10 32.82 36.54 40.24 43.92 47.58 51.23
20.89 23.67 26.42 29.15 31.86 34.56 37.26 39.94
17.91 20.19 22.46 24.73 26.99 29.24 31.49 33.75
15.98 17.99 19.99 22.00 24.00 26.00 28.00 30.00
28.82 33.50 38.25 43.07 47.94 52.85 57.81 62.81
7
8 9 10 11
29.78 34.62 39.53 44.50
30.02 34.90 39.85 44.86
30.42 35.36 40.38 45.46
31.18 36.26 41.41 46.64
28.60 32.89 37.17 41.44
23.42 26.56 29.67 32.75
19.88 22.42 24.95 27.47
17.57 19.78 21.99 24.19
32.41 37.77 43.22 48.74
the powers for 2 = 2, 3, 4, and 5 are all comparable (.4027, .8299, .9652, .9941 in Table 9 and .4322, .8562, .9764, .9967 in Table 13). However, for each value of 2, the sample size of the z2-test is fixed at 45, whereas the expected sample size of Procedure 2 can be seen from Table 11 to be 39.25 when 2 = 2, 29.61 when 2 = 3, 22.84 when 2 = 4, and 18.88 when 2 = 5.
411
Inverse sampling procedures to test for homogeneity in a multinomial distribution
Table 6 Simulated expected sample size when using Procedure 1 k
M
,t 1/5
1/4
1/3
1/2
2
8 9 10 ll 12 13 14 15
14.14 16.12 18.12 20.11 22.15 24.18 26.19 28.21
14.47 16.49 18.55 20.59 22.66 24.72 26.74 28.94
14.96 17.12 19.19 21.31 23.47 25.63 27.79 29.95
15.97 18.18 20.50 22.80 25.10 27.41 29.71 32.08
15.20 17.27 19.37 21.49 23.57 25.60 27.64 29.64
8 9 10 11 12 13 14 15
18.32 21.10 23.83 26.58 29.36 32.27 35.01 37.90
18.68 21.42 24.19 27.06 29.88 32.70 35.58 38.51
19.13 21.97 24.85 27.65 30.61 33.50 36.46 39.40
19.95 23.01 26.09 29.05 32.05 35.20 38.21 41.39
5
8 9 10 11 12 13 14 15
22.39 25.78 29.39 32.73 36.35 39.82 43.51 47.25
22.59 26.09 29.46 33.26 36.79 40.39 44.12 47.75
22.96 26.60 30.15 33.81 37.51 41.14 45.04 48.69
6
8 9 10 11 12 13 14 15
26.09 30.30 34.48 38.70 43.01 47.36 51.78 56.37
26.39 30.55 34.80 39.11 43.47 47.90 52.33 56.72
7
8 9 10 11 12 13 14 15
29.79 34.66 39.58 44.58 49.65 54.51 59.80 64.88
8
8 9 10 11 12 13 14 15
33.45 38.86 44.46 50.20 55.85 61.69 67.68 73.44
3
3
4
5
1
13.21 14.89 16.61 18.31 19.90 ,21.54 23.37 24.96
11.99 13.47 14.97 16.52 17.93 19.53 20.98 22.54
11.17 12.63 13.98 15.43 16.78 18.23 19.60 20.99
17.22 19.78 22.40 25.01 27.61 30.23 32.88 35.61
18.64 21.29 23.96 26.52 29.15 31.77 34.39 36.86
15.76 17.81 19.79 21.86 23.94 26.04 28.02 29.93
13.99 15.66 17.49 19.27 21.07 22.78 24.49 26.30
12.78 14.40 16.01 17.56 19.20 20.80 22.37 23.98
21.22 24.51 27.82 31.24 34.72 37.90 41.46 44.77
23.88 27.60 31.29 35.17 39.00 42.75 46.66 50.52
22.07 25.27 28.46 31.61 34.69 37.77 40.97 44.32
18.32 20.78 23.15 25.55 27.86 30.16 32.62 35.06
15.95 18.05 19.99 21.97 24.02 25.96 28.04 30.03
14.42 16.21 17.99 19.82 21.57 23.40 25.15 27.06
25.18 29.09 33.15 37.28 41.29 45.61 49.71 54.00
26.82 31.10 35.35 39.72 44.07 48.67 53.16 57.60
27.51 31.91 36.45 41.01 45.53 50.02 54.63 59.41
25.42 29.00 32.78 36.42 40.22 44.02 47.53 51.10
20.93 23.67 26.34 29.23 31.91 34.38 37.28 39.95
17.90 20.26 22.40 24.66 27.04 29.16 31.53 33.75
15.95 17.96 20.04 21.97 23.96 25.96 28.04 29.90
28.83 33.60 38.09 43.01 47.96 52.74 57.73 62.78
30.01 34.87 39.93 44.89 49.92 55.14 60.23 65.33
30.46 35.36 40.46 45.36 50.68 55.90 61.03 66.12
31.29 36.18 41.42 46.69 51.97 57.08 62.60 67.97
28.62 33.03 37.08 41.47 45,60 50.05 54.14 58.57
23.48 26.60 29.65 32.65 35.72 38.88 41.88 45.07
19.88 22.36 24.92 27.55 30.00 32.43 34.91 37.63
17.53 19.78 21.99 24.22 26.41 28.56 30.79 32.96
32.44 37.78 43.21 48.80 54.38 60.05 65.79 71.40
33.56 39.12 44.81 50.45 56.30 62.10 67.90 73.91
33.95 39.65 45.32 51.14 57.03 62.96 68.89 74.92
34.80 40.60 46.45 52.08 58.13 64.22 70.30 76.51
31.77 36.72 41.55 46.18 51.01 55.99 60.78 65.18
25.96 29.43 32.97 36.32 39.82 43.25 46.47 49.76
21.85 24.71 27.50 30.33 32.99 35.80 38.59 41.18
19.21 21.57 24.02 26.34 28.93 31.18 33.55 35.95
35.81 41.93 48.09 54.24 60.78 66.87 73.31 79.90
S. Panchapakesan, A. Childs, B. H. Humphrey and N. Balakrishnan
Table 7
Critical values D2 for Procedure 2

 k   No    α = 0.01   0.05   0.10   0.15
 3   18        9       7      6      6
 3   21       10       8      7      6
 3   24       10       8      7      7
 3   27       11       9      8      7
 3   30       12       9      8      7
 3   33       12      10      9      8
 3   36       12      10      9      8
 4   22        9       7      6      5
 4   26        9       7      6      6
 4   30       10       8      7      6
 4   34       10       8      7      6
 4   38       11       8      7      7
 4   42       11       9      8      7
 4   46       11       9      8      7
 5   25        8       6      6      5
 5   30        8       7      6      5
 5   35        9       7      6      6
 5   40       10       7      7      6
 5   45       10       8      7      6
 5   50       10       8      7      7
 5   55       11       9      8      7
 6   29        8       6      5      5
 6   35        8       6      6      5
 6   41        9       7      6      6
 6   47        9       7      6      6
 6   53       10       8      7      6
 6   59       10       8      7      6
 6   65       10       8      7      7
 7   33        7       6      5      5
 7   40        8       6      6      5
 7   47        8       7      6      5
 7   54        9       7      6      6
 7   61        9       8      7      6
 7   68       10       8      7      6
 7   75       10       8      7      7
 8   37        7       6      5      5
 8   45        8       6      5      5
 8   53        8       7      6      5
 8   61        9       7      6      6
 8   69        9       7      6      6
 8   77       10       8      7      6
 8   85       10       8      7      6
5. The combined procedure

We have seen in the last section that Procedure 1 is the better procedure when λ < 1, while Procedure 2 performs better when λ > 1. Since the form of the alternative hypothesis is not always known, ideally we would like to have a procedure that performs optimally for both forms of the alternative hypothesis.
Inverse sampling procedures to test for homogeneity in a multinomial distribution

Table 8
Powers of Procedure 2 at 5% significance level

k = 3, No = 18 21 24 27 30 33 36
  λ = 1/5:  0.1431 0.1225 0.1617 0.1257 0.1598 0.1352 0.1551
  λ = 1/4:  0.1315 0.1152 0.1468 0.1272 0.1496 0.1209 0.1458
  λ = 1/3:  0.1169 0.0907 0.1395 0.1081 0.1430 0.1034 0.1329
  λ = 1/2:  0.0870 0.0652 0.1045 0.0795 0.1006 0.0764 0.1030
  λ = 2:    0.1987 0.1915 0.2647 0.2567 0.3148 0.3003 0.3639
  λ = 3:    0.4713 0.4968 0.6217 0.6323 0.7171 0.7187 0.7915
  λ = 4:    0.6811 0.7371 0.8299 0.8418 0.9104 0.9081 0.9461
  λ = 5:    0.8162 0.8591 0.9244 0.9345 0.9674 0.9735 0.9878
  λ = 1:    0.0367 0.0253 0.0387 0.0280 0.0396 0.0255 0.0350
  D2:       7 8 8 9 9 10 10

k = 4, No = 22 26 30 34 38 42 46
  λ = 1/5:  0.0559 0.0869 0.0636 0.0868 0.1082 0.0780 0.0939
  λ = 1/4:  0.0551 0.0859 0.0587 0.0877 0.0994 0.0766 0.0938
  λ = 1/3:  0.0529 0.0794 0.0579 0.0764 0.1000 0.0678 0.0897
  λ = 1/2:  0.0404 0.0680 0.0460 0.0621 0.0828 0.0621 0.0781
  λ = 2:    0.1432 0.2215 0.2130 0.2847 0.3609 0.3259 0.3996
  λ = 3:    0.4330 0.5720 0.5950 0.7080 0.7780 0.7812 0.8432
  λ = 4:    0.6752 0.8107 0.8322 0.8994 0.9428 0.9520 0.9742
  λ = 5:    0.8318 0.9231 0.9392 0.9713 0.9874 0.9907 0.9964
  λ = 1:    0.0214 0.0384 0.0259 0.0316 0.0462 0.0312 0.0425
  D2:       7 7 8 8 8 9 9

k = 5, No = 25 30 35 40 45 50 55
  λ = 1/5:  0.0729 0.0512 0.0714 0.0949 0.0652 0.0854 0.0556
  λ = 1/4:  0.0691 0.0487 0.0653 0.0905 0.0613 0.0812 0.0531
  λ = 1/3:  0.0643 0.0424 0.0675 0.0860 0.0578 0.0695 0.0485
  λ = 1/2:  0.0561 0.0386 0.0572 0.0792 0.0532 0.0664 0.0435
  λ = 2:    0.1868 0.1727 0.2506 0.3232 0.3117 0.3702 0.3488
  λ = 3:    0.4928 0.5166 0.6513 0.7533 0.7537 0.8283 0.8337
  λ = 4:    0.7362 0.7830 0.8765 0.9371 0.9413 0.9696 0.9756
  λ = 5:    0.8823 0.9143 0.9641 0.9834 0.9892 0.9948 0.9952
  λ = 1:    0.0389 0.0256 0.0405 0.0486 0.0325 0.0435 0.0254
  D2:       6 7 7 7 8 8 9

k = 6, No = 29 35 41 47 53 59 65
  λ = 1/5:  0.0482 0.0815 0.0504 0.0673 0.0423 0.0580 0.0713
  λ = 1/4:  0.0461 0.0741 0.0477 0.0694 0.0416 0.0598 0.0723
  λ = 1/3:  0.0495 0.0715 0.0485 0.0701 0.0467 0.0553 0.0658
  λ = 1/2:  0.0465 0.0663 0.0475 0.0575 0.0410 0.0518 0.0635
  λ = 2:    0.1544 0.2360 0.2254 0.2934 0.2730 0.3475 0.4203
  λ = 3:    0.4723 0.6276 0.6502 0.7408 0.7541 0.8259 0.8767
  λ = 4:    0.7411 0.8597 0.8911 0.9406 0.9517 0.9743 0.9876
  λ = 5:    0.8809 0.9588 0.9635 0.9893 0.9917 0.9969 0.9989
  λ = 1:    0.0326 0.0489 0.0299 0.0414 0.0243 0.0337 0.0403
  D2:       6 6 7 7 8 8 8

k = 7, No = 33 40 47 54 61 68 75
  λ = 1/5:  0.0383 0.0698 0.0380 0.0619 0.0304 0.0459 0.0617
  λ = 1/4:  0.0419 0.0590 0.0389 0.0548 0.0366 0.0437 0.0596
  λ = 1/3:  0.0353 0.0614 0.0347 0.0495 0.0329 0.0451 0.0553
  λ = 1/2:  0.0315 0.0529 0.0328 0.0525 0.0287 0.0381 0.0550
  λ = 2:    0.1339 0.2151 0.2066 0.2730 0.2547 0.3253 0.3983
  λ = 3:    0.4530 0.6042 0.6317 0.7331 0.7472 0.8272 0.8795
  λ = 4:    0.7396 0.8616 0.8884 0.9400 0.9553 0.9768 0.9881
  λ = 5:    0.8848 0.9585 0.9741 0.9890 0.9925 0.9974 0.9986
  λ = 1:    0.0268 0.0462 0.0294 0.0374 0.0222 0.0300 0.0406
  D2:       6 6 7 7 8 8 8

k = 8, No = 37 45 53 61 69 77 85
  λ = 1/5:  0.0307 0.0544 0.0341 0.0453 0.0610 0.0406 0.0497
  λ = 1/4:  0.0326 0.0576 0.0316 0.0450 0.0594 0.0402 0.0490
  λ = 1/3:  0.0308 0.0434 0.0305 0.0415 0.0570 0.0405 0.0462
  λ = 1/2:  0.0312 0.0452 0.0259 0.0400 0.0555 0.0344 0.0468
  λ = 2:    0.1243 0.2029 0.1867 0.2673 0.3286 0.3058 0.3715
  λ = 3:    0.4434 0.5901 0.6197 0.7358 0.8143 0.8255 0.8810
  λ = 4:    0.7305 0.8612 0.8862 0.9430 0.9721 0.9754 0.9888
  λ = 5:    0.8905 0.9577 0.9751 0.9907 0.9972 0.9972 0.9991
  λ = 1:    0.0234 0.0387 0.0206 0.0331 0.0432 0.0241 0.0349
  D2:       6 6 7 7 7 8 8
Table 9
Powers of Procedure 2 at 10% significance level

k = 3, No = 18 21 24 27 30 33 36
  λ = 1/5:  0.2541 0.1991 0.2559 0.1973 0.2469 0.1917 0.2296
  λ = 1/4:  0.2389 0.1903 0.2301 0.1877 0.2273 0.1796 0.2156
  λ = 1/3:  0.2131 0.1664 0.2166 0.1700 0.2103 0.1640 0.1956
  λ = 1/2:  0.1704 0.1285 0.1777 0.1280 0.1696 0.1204 0.1579
  λ = 2:    0.3007 0.2777 0.3666 0.3506 0.4192 0.3950 0.4635
  λ = 3:    0.6086 0.6125 0.7100 0.7211 0.7982 0.7903 0.8436
  λ = 4:    0.7931 0.8097 0.8869 0.8979 0.9386 0.9385 0.9658
  λ = 5:    0.9012 0.9148 0.9521 0.9617 0.9832 0.9833 0.9916
  λ = 1:    0.0928 0.0542 0.0784 0.0532 0.0784 0.0541 0.0629
  D2:       6 7 7 8 8 9 9

k = 4, No = 22 26 30 34 38 42 46
  λ = 1/5:  0.1228 0.1642 0.1214 0.1520 0.1859 0.1388 0.1598
  λ = 1/4:  0.1167 0.1589 0.1100 0.1451 0.1815 0.1331 0.1518
  λ = 1/3:  0.1046 0.1542 0.1150 0.1365 0.1751 0.1218 0.1493
  λ = 1/2:  0.0897 0.1262 0.0928 0.1205 0.1486 0.1101 0.1272
  λ = 2:    0.2490 0.3305 0.3008 0.3784 0.4666 0.4336 0.4918
  λ = 3:    0.5621 0.6891 0.6855 0.7863 0.8510 0.8460 0.8893
  λ = 4:    0.7780 0.8795 0.8866 0.9398 0.9690 0.9695 0.9845
  λ = 5:    0.8909 0.9537 0.9657 0.9842 0.9930 0.9941 0.9983
  λ = 1:    0.0594 0.0856 0.0531 0.0724 0.0945 0.0643 0.0721
  D2:       6 6 7 7 7 8 8

k = 5, No = 25 30 35 40 45 50 55
  λ = 1/5:  0.0729 0.1030 0.1439 0.0949 0.1215 0.1434 0.0961
  λ = 1/4:  0.0691 0.1017 0.1450 0.0905 0.1138 0.1431 0.0992
  λ = 1/3:  0.0643 0.0934 0.1294 0.0860 0.1061 0.1387 0.0949
  λ = 1/2:  0.0561 0.0893 0.1238 0.0792 0.1037 0.1262 0.0788
  λ = 2:    0.1868 0.2681 0.3629 0.3232 0.4027 0.4698 0.4484
  λ = 3:    0.4928 0.6403 0.7467 0.7533 0.8299 0.8828 0.8821
  λ = 4:    0.7362 0.8588 0.9256 0.9371 0.9652 0.9847 0.9834
  λ = 5:    0.8823 0.9517 0.9832 0.9834 0.9941 0.9977 0.9987
  λ = 1:    0.0389 0.0601 0.0877 0.0486 0.0728 0.0839 0.0548
  D2:       6 6 6 7 7 7 8

k = 6, No = 29 35 41 47 53 59 65
  λ = 1/5:  0.1215 0.0815 0.1064 0.1404 0.0946 0.1149 0.1336
  λ = 1/4:  0.1197 0.0741 0.1007 0.1346 0.0891 0.1113 0.1352
  λ = 1/3:  0.1189 0.0715 0.1042 0.1351 0.0883 0.1065 0.1267
  λ = 1/2:  0.1094 0.0663 0.0935 0.1218 0.0751 0.1017 0.1199
  λ = 2:    0.2511 0.2360 0.3212 0.4073 0.3736 0.4406 0.5207
  λ = 3:    0.5948 0.6276 0.7382 0.8247 0.8233 0.8826 0.9228
  λ = 4:    0.8311 0.8597 0.9256 0.9636 0.9687 0.9853 0.9925
  λ = 5:    0.9351 0.9588 0.9840 0.9943 0.9955 0.9984 0.9992
  λ = 1:    0.0863 0.0489 0.0673 0.0928 0.0575 0.0716 0.0884
  D2:       5 6 6 6 7 7 7

k = 7, No = 33 40 47 54 61 68 75
  λ = 1/5:  0.1010 0.0698 0.0907 0.1188 0.0770 0.0901 0.1076
  λ = 1/4:  0.1014 0.0590 0.0851 0.1105 0.0687 0.0921 0.1112
  λ = 1/3:  0.0942 0.0614 0.0826 0.1144 0.0683 0.0887 0.1057
  λ = 1/2:  0.0931 0.0529 0.0805 0.1064 0.0631 0.0847 0.0986
  λ = 2:    0.2376 0.2151 0.3010 0.3761 0.3491 0.4170 0.4933
  λ = 3:    0.5870 0.6042 0.7245 0.8098 0.8219 0.8807 0.9142
  λ = 4:    0.8250 0.8616 0.9293 0.9674 0.9690 0.9868 0.9933
  λ = 5:    0.9372 0.9585 0.9870 0.9934 0.9965 0.9990 0.9996
  λ = 1:    0.0741 0.0462 0.0613 0.0848 0.0503 0.0639 0.0769
  D2:       5 6 6 6 7 7 7

k = 8, No = 37 45 53 61 69 77 85
  λ = 1/5:  0.0863 0.1322 0.0749 0.0997 0.1297 0.0836 0.0953
  λ = 1/4:  0.0858 0.1223 0.0732 0.1016 0.1310 0.0764 0.0965
  λ = 1/3:  0.0805 0.1182 0.0668 0.0876 0.1196 0.0703 0.0846
  λ = 1/2:  0.0863 0.1241 0.0743 0.0992 0.1149 0.0787 0.0961
  λ = 2:    0.2212 0.3129 0.2783 0.3611 0.4369 0.3938 0.4720
  λ = 3:    0.5748 0.7110 0.7173 0.8113 0.8803 0.8757 0.9217
  λ = 4:    0.8242 0.9202 0.9348 0.9668 0.9849 0.9858 0.9951
  λ = 5:    0.9372 0.9764 0.9861 0.9950 0.9988 0.9989 0.9999
  λ = 1:    0.0670 0.0995 0.0612 0.0796 0.0966 0.0588 0.0706
  D2:       5 5 6 6 6 7 7
Table 10
Expected sample size when using Procedure 2 at 5% significance level

k = 3, No = 18 21 24 27 30 33 36
  λ = 1/5:  17.28 20.34 22.89 26.09 28.62 31.78 34.39
  λ = 1/4:  17.37 20.37 23.05 26.10 28.76 31.94 34.55
  λ = 1/3:  17.44 20.53 23.09 26.27 28.83 32.12 34.73
  λ = 1/2:  17.60 20.69 23.35 26.48 29.25 32.42 35.07
  λ = 2:    17.07 20.06 22.35 25.28 27.47 30.46 32.46
  λ = 3:    15.54 18.10 19.30 21.88 22.87 25.49 26.19
  λ = 4:    14.07 16.16 16.76 19.03 19.19 21.55 21.72
  λ = 5:    12.83 14.69 14.99 16.99 16.93 18.90 18.97
  λ = 1:    17.83 20.88 23.78 26.82 29.70 32.82 35.69

k = 4, No = 22 26 30 34 38 42 46
  λ = 1/5:  21.67 25.36 29.49 33.18 36.83 41.08 44.81
  λ = 1/4:  21.68 25.38 29.55 33.19 36.90 41.15 44.78
  λ = 1/3:  21.70 25.44 29.54 33.29 36.93 41.29 44.86
  λ = 1/2:  21.79 25.54 29.68 33.45 37.19 41.36 45.09
  λ = 2:    21.21 24.43 28.34 31.33 34.04 38.28 40.58
  λ = 3:    19.32 21.28 24.51 25.87 27.00 30.35 31.04
  λ = 4:    17.24 18.20 20.81 21.31 21.73 24.17 24.43
  λ = 5:    15.35 15.91 18.11 18.27 18.35 20.49 20.38
  λ = 1:    21.88 25.74 29.81 33.73 37.57 41.69 45.53

k = 5, No = 25 30 35 40 45 50 55
  λ = 1/5:  24.50 29.61 34.30 38.89 44.18 48.79 54.20
  λ = 1/4:  24.51 29.60 34.35 38.92 44.21 48.88 54.25
  λ = 1/3:  24.57 29.67 34.37 38.98 44.30 49.05 54.32
  λ = 1/2:  24.63 29.73 34.47 39.12 44.41 49.12 54.41
  λ = 2:    23.72 28.65 32.50 36.18 41.08 44.49 49.67
  λ = 3:    21.09 25.27 27.28 28.68 32.57 33.67 37.57
  λ = 4:    18.22 21.42 22.29 22.56 25.88 25.73 28.90
  λ = 5:    15.78 18.47 18.79 18.95 21.36 21.40 23.97
  λ = 1:    24.73 29.81 34.63 39.43 44.62 49.43 54.67

k = 6, No = 29 35 41 47 53 59 65
  λ = 1/5:  28.62 34.17 40.45 46.13 52.40 58.03 63.69
  λ = 1/4:  28.64 34.22 40.47 46.05 52.44 58.05 63.69
  λ = 1/3:  28.61 34.26 40.45 46.09 52.37 58.12 63.79
  λ = 1/2:  28.64 34.34 40.48 46.26 52.41 58.19 63.93
  λ = 2:    27.79 32.54 38.45 43.00 49.09 53.00 56.66
  λ = 3:    24.67 27.16 32.08 33.90 38.64 39.82 40.65
  λ = 4:    21.03 22.19 25.75 26.30 29.86 29.96 30.08
  λ = 5:    18.23 18.54 21.51 21.59 24.29 24.52 24.43
  λ = 1:    28.76 34.51 40.68 46.49 52.67 58.51 64.28

k = 7, No = 33 40 47 54 61 68 75
  λ = 1/5:  32.65 39.20 46.51 53.09 60.53 67.15 73.77
  λ = 1/4:  32.62 39.34 46.52 53.16 60.46 67.23 73.78
  λ = 1/3:  32.67 39.29 46.59 53.29 60.49 67.15 73.87
  λ = 1/2:  32.72 39.41 46.62 53.23 60.57 67.32 73.90
  λ = 2:    31.80 37.51 44.43 49.81 56.71 61.58 66.12
  λ = 3:    28.31 31.52 36.89 39.28 44.48 45.94 47.05
  λ = 4:    23.97 25.33 29.25 30.02 33.86 33.95 34.27
  λ = 5:    20.50 20.88 24.08 24.33 27.43 27.39 27.41
  λ = 1:    32.76 39.51 46.66 53.46 60.66 67.48 74.21

k = 8, No = 37 45 53 61 69 77 85
  λ = 1/5:  36.70 44.34 52.51 60.23 67.78 76.21 83.85
  λ = 1/4:  36.67 44.26 52.53 60.26 67.85 76.19 83.93
  λ = 1/3:  36.70 44.43 52.60 60.33 67.91 76.16 83.92
  λ = 1/2:  36.68 44.43 52.64 60.36 67.98 76.33 83.88
  λ = 2:    35.79 42.38 50.39 56.33 62.14 70.40 75.75
  λ = 3:    31.95 35.71 41.88 44.32 46.20 52.39 53.29
  λ = 4:    27.08 28.57 33.17 33.70 34.01 38.42 38.51
  λ = 5:    22.78 23.36 26.98 26.95 27.06 30.43 30.43
  λ = 1:    36.75 44.50 52.73 60.49 68.19 76.55 84.27
Table 11
Expected sample size when using Procedure 2 at 10% significance level

k = 3, No = 18 21 24 27 30 33 36
  λ = 1/5:  16.49 19.74 21.98 25.40 27.58 31.06 33.34
  λ = 1/4:  16.63 19.83 22.25 25.54 27.82 31.24 33.61
  λ = 1/3:  16.81 19.95 22.38 25.69 28.07 31.49 33.89
  λ = 1/2:  17.09 20.27 22.75 26.06 28.52 31.96 34.42
  λ = 2:    16.34 19.34 21.29 24.21 26.19 29.26 30.97
  λ = 3:    14.27 16.81 17.82 20.31 20.98 23.70 24.13
  λ = 4:    12.53 14.64 15.07 17.22 17.50 19.64 19.73
  λ = 5:    11.18 13.06 13.40 15.14 15.28 17.11 17.11
  λ = 1:    17.54 20.62 23.45 26.61 29.43 32.64 35.40

k = 4, No = 22 26 30 34 38 42 46
  λ = 1/5:  21.20 24.62 28.92 32.39 35.74 40.28 43.71
  λ = 1/4:  21.23 24.67 29.00 32.49 35.77 40.30 43.87
  λ = 1/3:  21.33 24.74 29.01 32.59 35.86 40.54 43.90
  λ = 1/2:  21.43 25.04 29.20 32.81 36.28 40.73 44.26
  λ = 2:    20.41 23.31 27.37 30.02 31.98 36.31 38.56
  λ = 3:    17.90 19.39 22.76 23.44 24.30 27.69 28.13
  λ = 4:    15.43 16.02 18.76 19.01 19.07 21.78 21.78
  λ = 5:    13.72 13.80 16.03 16.11 16.27 18.39 18.37
  λ = 1:    21.66 25.32 29.53 33.31 36.94 41.30 45.08

k = 5, No = 25 30 35 40 45 50 55
  λ = 1/5:  24.50 29.10 33.36 38.89 43.35 47.65 53.51
  λ = 1/4:  24.51 29.06 33.36 38.92 43.46 47.73 53.41
  λ = 1/3:  24.57 29.19 33.58 38.98 43.60 47.87 53.47
  λ = 1/2:  24.63 29.25 33.64 39.12 43.65 48.06 53.79
  λ = 2:    23.72 27.61 30.86 36.18 39.25 41.98 47.31
  λ = 3:    21.09 23.11 24.65 28.68 29.61 30.45 34.12
  λ = 4:    18.22 19.15 19.42 22.56 22.84 22.88 25.95
  λ = 5:    15.78 16.08 16.31 18.95 18.88 19.03 21.36
  λ = 1:    24.73 29.50 34.03 39.43 44.09 48.75 54.26

k = 6, No = 29 35 41 47 53 59 65
  λ = 1/5:  27.87 34.17 39.66 44.87 51.47 56.91 62.34
  λ = 1/4:  27.83 34.22 39.72 45.00 51.60 57.02 62.16
  λ = 1/3:  27.92 34.26 39.71 44.97 51.62 57.09 62.39
  λ = 1/2:  28.00 34.34 39.85 45.24 51.87 57.23 62.67
  λ = 2:    26.66 32.54 36.89 40.56 46.87 50.47 53.36
  λ = 3:    22.50 27.16 29.00 30.40 35.13 35.80 36.74
  λ = 4:    18.52 22.19 22.85 23.09 26.65 26.94 26.76
  λ = 5:    15.55 18.54 18.82 18.95 21.67 21.80 21.61
  λ = 1:    28.20 34.51 40.14 45.52 52.14 57.71 63.37

k = 7, No = 33 40 47 54 61 68 75
  λ = 1/5:  31.93 39.20 45.70 51.96 59.66 66.16 72.52
  λ = 1/4:  31.90 39.34 45.80 52.09 59.79 66.08 72.44
  λ = 1/3:  32.02 39.29 45.85 52.05 59.75 66.21 72.59
  λ = 1/2:  32.02 39.41 45.83 52.27 59.86 66.34 72.73
  λ = 2:    30.54 37.51 42.65 47.20 54.49 58.96 62.36
  λ = 3:    25.84 31.52 33.74 35.50 40.64 41.72 42.40
  λ = 4:    21.15 25.33 25.99 26.45 30.37 30.27 30.44
  λ = 5:    17.66 20.88 21.16 21.18 24.34 24.33 24.44
  λ = 1:    32.22 39.51 46.12 52.75 60.22 66.79 73.26

k = 8, No = 37 45 53 61 69 77 85
  λ = 1/5:  36.02 42.99 51.75 59.09 66.04 75.14 82.53
  λ = 1/4:  36.01 43.19 51.80 59.18 66.12 75.30 82.52
  λ = 1/3:  36.00 43.16 51.83 59.14 66.45 75.23 82.55
  λ = 1/2:  36.07 43.29 51.94 59.38 66.38 75.42 82.79
  λ = 2:    34.39 40.24 48.52 53.80 58.59 67.43 71.53
  λ = 3:    29.21 31.95 38.37 40.08 41.33 47.29 48.18
  λ = 4:    23.81 24.77 29.38 29.61 29.90 34.24 34.23
  λ = 5:    19.79 19.92 23.42 23.82 23.58 27.00 27.14
  λ = 1:    36.31 43.49 52.13 59.64 66.99 75.89 83.10
Table 12
Powers of the chi-square test at 3% significance level

k = 3, N = 18 21 24 27 30 33 36
  λ = 1/5:  0.3145 0.5428 0.5362 0.6350 0.7740 0.7981 0.8465
  λ = 1/4:  0.2398 0.4334 0.4229 0.5064 0.6518 0.6782 0.7336
  λ = 1/3:  0.1555 0.2940 0.2788 0.3339 0.4601 0.4823 0.5330
  λ = 1/2:  0.0702 0.1361 0.1187 0.1382 0.2044 0.2107 0.2338
  λ = 2:    0.1375 0.2250 0.2066 0.2473 0.3275 0.3366 0.3683
  λ = 3:    0.4001 0.5583 0.5545 0.6365 0.7386 0.7594 0.8012
  λ = 4:    0.6308 0.7828 0.7904 0.8585 0.9197 0.9327 0.9533
  λ = 5:    0.7820 0.8970 0.9062 0.9479 0.9764 0.9821 0.9896
  λ = 1:    0.0168 0.0315 0.0213 0.0222 0.0331 0.0296 0.0294

k = 4, N = 22 26 30 34 38 42 46
  λ = 1/5:  0.2288 0.3434 0.3945 0.5111 0.5716 0.6532 0.7071
  λ = 1/4:  0.1874 0.2733 0.3122 0.4035 0.4595 0.5272 0.5810
  λ = 1/3:  0.1356 0.1895 0.2127 0.2704 0.3121 0.3546 0.3978
  λ = 1/2:  0.0740 0.0968 0.1028 0.1250 0.1408 0.1530 0.1712
  λ = 2:    0.1664 0.2183 0.2365 0.2899 0.3077 0.3478 0.3741
  λ = 3:    0.4736 0.5824 0.6336 0.7211 0.7537 0.8099 0.8396
  λ = 4:    0.7225 0.8215 0.8665 0.9208 0.9393 0.9637 0.9741
  λ = 5:    0.8642 0.9296 0.9560 0.9798 0.9867 0.9939 0.9963
  λ = 1:    0.0254 0.0297 0.0268 0.0298 0.0284 0.0277 0.0278

k = 5, N = 25 30 35 40 45 50 55
  λ = 1/5:  0.1648 0.2001 0.2708 0.3734 0.4232 0.5026 0.5858
  λ = 1/4:  0.1411 0.1658 0.2189 0.2991 0.3352 0.3985 0.4720
  λ = 1/3:  0.1097 0.1227 0.1556 0.2085 0.2276 0.2682 0.3213
  λ = 1/2:  0.0686 0.0706 0.0828 0.1059 0.1084 0.1232 0.1453
  λ = 2:    0.1719 0.1952 0.2299 0.2801 0.3009 0.3388 0.3777
  λ = 3:    0.4914 0.5655 0.6441 0.7247 0.7692 0.8184 0.8582
  λ = 4:    0.7475 0.8220 0.8831 0.9293 0.9522 0.9705 0.9820
  λ = 5:    0.8864 0.9354 0.9668 0.9845 0.9918 0.9961 0.9982
  λ = 1:    0.0301 0.0266 0.0269 0.0310 0.0273 0.0280 0.0297

k = 6, N = 29 35 41 47 53 59 65
  λ = 1/5:  0.1883 0.2315 0.2766 0.3229 0.3695 0.4159 0.4615
  λ = 1/4:  0.1609 0.1966 0.2341 0.2730 0.3128 0.3529 0.3931
  λ = 1/3:  0.1234 0.1484 0.1749 0.2026 0.2315 0.2611 0.2913
  λ = 1/2:  0.0731 0.0839 0.0952 0.1071 0.1195 0.1324 0.1457
  λ = 2:    0.1609 0.1966 0.2341 0.2730 0.3128 0.3529 0.3931
  λ = 3:    0.5458 0.6493 0.7365 0.8068 0.8614 0.9025 0.9327
  λ = 4:    0.8495 0.9199 0.9596 0.9806 0.9910 0.9960 0.9983
  λ = 5:    0.9631 0.9874 0.9960 0.9988 0.9997 0.9999 1.0000
  λ = 1:    0.0300 0.0300 0.0300 0.0300 0.0300 0.0300 0.0300

k = 7, N = 33 40 47 54 61 68 75
  λ = 1/5:  0.1648 0.2032 0.2438 0.2859 0.3289 0.3723 0.4156
  λ = 1/4:  0.1417 0.1733 0.2069 0.2420 0.2783 0.3154 0.3529
  λ = 1/3:  0.1100 0.1322 0.1557 0.1806 0.2066 0.2335 0.2611
  λ = 1/2:  0.0674 0.0770 0.0871 0.0977 0.1088 0.1203 0.1323
  λ = 2:    0.1541 0.1894 0.2268 0.2657 0.3058 0.3464 0.3871
  λ = 3:    0.5510 0.6590 0.7489 0.8202 0.8744 0.9142 0.9426
  λ = 4:    0.8686 0.9351 0.9700 0.9868 0.9945 0.9978 0.9991
  λ = 5:    0.9747 0.9927 0.9981 0.9995 0.9999 1.0000 1.0000
  λ = 1:    0.0300 0.0300 0.0300 0.0300 0.0300 0.0300 0.0300

k = 8, N = 37 45 53 61 69 77 85
  λ = 1/5:  0.1478 0.1823 0.2192 0.2578 0.2977 0.3383 0.3792
  λ = 1/4:  0.1278 0.1562 0.1866 0.2186 0.2520 0.2863 0.3214
  λ = 1/3:  0.1004 0.1202 0.1415 0.1640 0.1877 0.2123 0.2378
  λ = 1/2:  0.0632 0.0718 0.0810 0.0906 0.1006 0.1111 0.1221
  λ = 2:    0.1478 0.1823 0.2192 0.2578 0.2977 0.3383 0.3792
  λ = 3:    0.5515 0.6631 0.7554 0.8277 0.8819 0.9211 0.9484
  λ = 4:    0.8808 0.9446 0.9760 0.9903 0.9963 0.9986 0.9995
  λ = 5:    0.9813 0.9953 0.9990 0.9998 1.0000 1.0000 1.0000
  λ = 1:    0.0300 0.0300 0.0300 0.0300 0.0300 0.0300 0.0300
Table 13
Powers of the chi-square test at 7% significance level

k = 3, N = 18 21 24 27 30 33 36
  λ = 1/5:  0.6497 0.7594 0.7198 0.8376 0.8934 0.8955 0.9236
  λ = 1/4:  0.5493 0.6566 0.6055 0.7379 0.8066 0.8083 0.8442
  λ = 1/3:  0.4090 0.4993 0.4370 0.5659 0.6382 0.6362 0.6724
  λ = 1/2:  0.2246 0.2760 0.2145 0.2976 0.3451 0.3319 0.3476
  λ = 2:    0.3057 0.3718 0.3177 0.4084 0.4655 0.4539 0.4820
  λ = 3:    0.6214 0.7147 0.6871 0.7842 0.8395 0.8423 0.8730
  λ = 4:    0.8136 0.8857 0.8784 0.9341 0.9606 0.9637 0.9761
  λ = 5:    0.9091 0.9546 0.9540 0.9802 0.9904 0.9918 0.9956
  λ = 1:    0.0755 0.0872 0.0499 0.0708 0.0797 0.0636 0.0597

k = 4, N = 22 26 30 34 38 42 46
  λ = 1/5:  0.4670 0.5051 0.6427 0.7039 0.7555 0.8342 0.8519
  λ = 1/4:  0.3892 0.4198 0.5420 0.5984 0.6485 0.7327 0.7537
  λ = 1/3:  0.2906 0.3087 0.4020 0.4435 0.4839 0.5586 0.5790
  λ = 1/2:  0.1704 0.1712 0.2196 0.2345 0.2526 0.2915 0.2998
  λ = 2:    0.2929 0.3013 0.3724 0.4010 0.4318 0.4882 0.5010
  λ = 3:    0.6371 0.6708 0.7639 0.8054 0.8427 0.8892 0.9031
  λ = 4:    0.8435 0.8745 0.9311 0.9527 0.9690 0.9836 0.9876
  λ = 5:    0.9358 0.9552 0.9812 0.9893 0.9944 0.9978 0.9986
  λ = 1:    0.0705 0.0599 0.0717 0.0675 0.0657 0.0708 0.0654

k = 5, N = 25 30 35 40 45 50 55
  λ = 1/5:  0.3276 0.4155 0.4990 0.5797 0.6545 0.7078 0.7677
  λ = 1/4:  0.2824 0.3515 0.4167 0.4876 0.5531 0.6021 0.6619
  λ = 1/3:  0.2226 0.2679 0.3099 0.3629 0.4102 0.4463 0.4963
  λ = 1/2:  0.1439 0.1615 0.1775 0.2028 0.2220 0.2354 0.2596
  λ = 2:    0.2710 0.3068 0.3527 0.3923 0.4322 0.4611 0.4997
  λ = 3:    0.6164 0.6857 0.7618 0.8104 0.8562 0.8852 0.9138
  λ = 4:    0.8366 0.8900 0.9367 0.9592 0.9764 0.9850 0.9914
  λ = 5:    0.9359 0.9651 0.9850 0.9923 0.9967 0.9983 0.9993
  λ = 1:    0.0700 0.0686 0.0678 0.0702 0.0692 0.0662 0.0672

k = 6, N = 29 35 41 47 53 59 65
  λ = 1/5:  0.3036 0.3572 0.4102 0.4619 0.5117 0.5592 0.6040
  λ = 1/4:  0.2681 0.3141 0.3603 0.4060 0.4508 0.4943 0.5361
  λ = 1/3:  0.2169 0.2513 0.2864 0.3217 0.3570 0.3922 0.4268
  λ = 1/2:  0.1425 0.1591 0.1762 0.1937 0.2114 0.2295 0.2478
  λ = 2:    0.2681 0.3141 0.3603 0.4060 0.4508 0.4943 0.5361
  λ = 3:    0.6820 0.7703 0.8382 0.8886 0.9249 0.9502 0.9676
  λ = 4:    0.9173 0.9604 0.9819 0.9921 0.9967 0.9986 0.9995
  λ = 5:    0.9837 0.9952 0.9987 0.9996 0.9999 1.0000 1.0000
  λ = 1:    0.0700 0.0700 0.0700 0.0700 0.0700 0.0700 0.0700

k = 7, N = 33 40 47 54 61 68 75
  λ = 1/5:  0.2736 0.3228 0.3722 0.4211 0.4689 0.5151 0.5592
  λ = 1/4:  0.2427 0.2847 0.3274 0.3701 0.4125 0.4541 0.4946
  λ = 1/3:  0.1983 0.2295 0.2616 0.2942 0.3270 0.3599 0.3926
  λ = 1/2:  0.1337 0.1488 0.1643 0.1802 0.1964 0.2130 0.2298
  λ = 2:    0.2594 0.3054 0.3518 0.3980 0.4434 0.4877 0.5304
  λ = 3:    0.6869 0.7783 0.8476 0.8979 0.9331 0.9571 0.9730
  λ = 4:    0.9295 0.9689 0.9871 0.9949 0.9981 0.9993 0.9998
  λ = 5:    0.9894 0.9974 0.9994 0.9999 1.0000 1.0000 1.0000
  λ = 1:    0.0700 0.0700 0.0700 0.0700 0.0700 0.0700 0.0700

k = 8, N = 37 45 53 61 69 77 85
  λ = 1/5:  0.2512 0.2967 0.3429 0.3891 0.4347 0.4794 0.5225
  λ = 1/4:  0.2238 0.2625 0.3022 0.3422 0.3823 0.4220 0.4610
  λ = 1/3:  0.1844 0.2131 0.2427 0.2729 0.3036 0.3345 0.3655
  λ = 1/2:  0.1272 0.1410 0.1552 0.1698 0.1848 0.2001 0.2157
  λ = 2:    0.2512 0.2967 0.3429 0.3891 0.4347 0.4794 0.5225
  λ = 3:    0.6876 0.7818 0.8524 0.9030 0.9379 0.9611 0.9761
  λ = 4:    0.9371 0.9741 0.9900 0.9964 0.9987 0.9996 0.9999
  λ = 5:    0.9925 0.9984 0.9997 0.9999 1.0000 1.0000 1.0000
  λ = 1:    0.0700 0.0700 0.0700 0.0700 0.0700 0.0700 0.0700
To accomplish this, we propose a combined procedure that makes use of both of the previous procedures. It is performed as follows.

Procedure 3

For fixed values of k and α, we first choose values of M and No in Procedures 1 and 2, respectively, with corresponding critical values D1 and D2 given in Tables 1 and 7, respectively. We then take one observation at a time, and continue sampling until the first time t at which one of the following three events occurs:
(1) X(k)t − X(k−1)t ≥ D2
(2) X(k)t = M
(3) No observations have been taken.
If (1) occurs, we stop sampling and reject H0. If (2) occurs, we perform the test in Procedure 1 and draw the appropriate conclusion. If (3) occurs, we stop sampling and accept H0. If any two, or all three, of the above events occur simultaneously, we perform the action corresponding to the lowest-numbered event.

Ideally, once k and α have been fixed, we would like to choose M and No so that the procedure that is optimal for a particular value of λ is the one that determines the outcome of the test most of the time. That is, when λ < 1 and Procedure 1 is optimal, we would like (2) to occur before (1) and (3); and when λ > 1, we would like (1) or (3) to occur before (2).

In Tables 14 and 15, we present the powers of Procedure 3 when Procedures 1 and 2 are used at the 5% and 10% significance levels, respectively. The values of M and No were chosen according to the considerations described above. In addition to the powers, we also present (in brackets below the powers) the percentage of times that the outcome was determined by Procedure 1, that is, the percentage of times that (2) occurred before (1) and (3), or that (2) and (3) occurred simultaneously, both before (1). The ideal situation would be for the percentages given in brackets to be close to 1 when λ < 1, and close to 0 when λ > 1. Indeed, we see this occurring for some of the larger values of M and No for each value of k.
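The stopping rule of Procedure 3 is straightforward to simulate. The sketch below is an illustration, not the authors' code: the function name, the cell-probability parameterization (one cell λ times as likely as the rest), and the sample sizes used are our assumptions. The fallback test of Procedure 1, which uses D1, is defined earlier in the article, so the sketch only flags when that fallback would be invoked.

```python
import random

def procedure3(k, M, N0, D2, probs, rng):
    """Run the combined Procedure 3 once (illustrative sketch).

    Draw one multinomial observation at a time until the first of:
      (1) X_(k)t - X_(k-1)t >= D2  -> stop and reject H0
      (2) X_(k)t = M               -> fall back to Procedure 1's test
                                      (that test, which uses D1, is defined
                                      earlier in the article)
      (3) t = N0                   -> stop and accept H0
    Simultaneous events are resolved in favour of the lowest-numbered one,
    which the check order below enforces.
    """
    counts = [0] * k
    for t in range(1, N0 + 1):
        cell = rng.choices(range(k), weights=probs)[0]
        counts[cell] += 1
        ordered = sorted(counts)              # X_(1)t <= ... <= X_(k)t
        if ordered[-1] - ordered[-2] >= D2:   # event (1)
            return "reject", t
        if ordered[-1] == M:                  # event (2)
            return "procedure1", t
    return "accept", N0                       # event (3)

# Monte Carlo illustration under the slippage alternative in which one cell
# is lam times as likely as the others (our assumed parameterization).
# k = 5, No = 55, M = 15, D2 = 9 mirror the last k = 5 row of Table 14.
rng = random.Random(2024)
lam = 5.0
outcomes = [procedure3(5, 15, 55, 9, [lam, 1, 1, 1, 1], rng)[0]
            for _ in range(2000)]
frac_proc1 = outcomes.count("procedure1") / len(outcomes)
```

Estimating the bracketed percentages of Tables 14 and 15 amounts to counting how often event (2) resolves a run, as `frac_proc1` does above; estimating power additionally requires simulating Procedure 1's fallback test on the runs that end in event (2).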
For example, in the last line of k = 5 in Table 14, we see that Procedure 1 determines the outcome 91% of the time when λ = 1/5, while it determines the outcome only 15% of the time when λ = 5. Comparing the powers in this situation when λ < 1 to the optimal Procedure 1, we see that with a similar attained significance level (0.0252 for Procedure 3 compared with 0.0194 for Procedure 1), the powers of Procedure 3 are nearly as good as the optimal powers of Procedure 1. For example, when λ = 1/4, the power of Procedure 3 (0.4699) is quite close to that of Procedure 1 (0.4814). Furthermore, the power is clearly superior to that of Procedure 2 in the same situation (0.0531). When λ > 1, we compare the powers to those of the optimal Procedure 2. Both procedures have almost identical attained significance levels (0.0252 for Procedure 3 and 0.0254 for Procedure 2), and the powers of Procedure 3 are fairly close to the optimal powers of Procedure 2. For example, when λ = 5, the power of
Table 14
Powers of Procedure 3 at 5% significance level
(entries are power, with the proportion of outcomes determined by Procedure 1 in brackets)

k = 3, No = 18 21 24 27 30 33 36, M = 9 10 11 12 13 14 15
  D2 = 7 8 8 9 9 10 10; D1 = 9 9 10 10 11 11 12
  λ = 1/5:  .2911(.77) .5291(.88) .4976(.89) .6783(.94) .6388(.93) .7814(.95) .7437(.94)
  λ = 1/4:  .2267(.72) .4265(.83) .3796(.86) .5509(.92) .5208(.92) .6538(.95) .6110(.94)
  λ = 1/3:  .1507(.64) .2785(.74) .2460(.80) .3883(.87) .3382(.87) .4611(.92) .4100(.92)
  λ = 1/2:  .0732(.49) .1317(.60) .1128(.64) .1675(.71) .1467(.74) .2034(.80) .1751(.82)
  λ = 2:    .1249(.53) .1591(.65) .1667(.66) .2018(.74) .2024(.73) .2458(.79) .2425(.77)
  λ = 3:    .3121(.62) .3904(.72) .4334(.62) .4918(.71) .5156(.60) .5811(.66) .6051(.55)
  λ = 4:    .4970(.54) .5716(.63) .6494(.45) .6990(.54) .7449(.38) .7939(.44) .8235(.31)
  λ = 5:    .6309(.43) .7143(.51) .7909(.31) .8327(.38) .8732(.24) .9024(.28) .9206(.18)
  λ = 1:    .0278(.30) .0332(.38) .0293(.41) .0396(.45) .0269(.48) .0373(.52) .0264(.55)

k = 4, No = 22 26 30 34 38 42 46, M = 9 10 11 12 13 14 15
  D2 = 7 7 8 8 8 9 9; D1 = 9 10 10 11 11 12 12
  λ = 1/5:  .2527(.62) .2501(.73) .4936(.82) .4710(.88) .6740(.89) .6353(.94) .7826(.96)
  λ = 1/4:  .1985(.58) .1927(.68) .3924(.78) .3618(.84) .5642(.87) .5085(.92) .6693(.94)
  λ = 1/3:  .1287(.53) .1239(.62) .2630(.72) .2332(.78) .3880(.81) .3230(.87) .4731(.90)
  λ = 1/2:  .0694(.43) .0631(.51) .1220(.60) .0959(.65) .1823(.70) .1344(.76) .2131(.79)
  λ = 2:    .1023(.56) .1217(.60) .1710(.71) .1706(.71) .2703(.70) .2050(.79) .3137(.76)
  λ = 3:    .2721(.71) .3600(.64) .4260(.75) .4663(.64) .6383(.52) .5483(.62) .7070(.50)
  λ = 4:    .4693(.65) .5713(.49) .6606(.58) .7127(.42) .8490(.30) .8100(.35) .9087(.24)
  λ = 5:    .6164(.55) .7238(.35) .8074(.42) .8532(.26) .9442(.16) .9175(.20) .9709(.11)
  λ = 1:    .0255(.29) .0203(.34) .0327(.40) .0268(.45) .0514(.49) .0287(.53) .0443(.56)

k = 5, No = 25 30 35 40 45 50 55, M = 9 10 11 12 13 14 15
  D2 = 6 7 7 7 8 8 9; D1 = 9 10 10 11 12 12 13
  λ = 1/5:  .2186(.41) .2170(.57) .4475(.65) .4523(.73) .4245(.83) .6330(.86) .6037(.91)
  λ = 1/4:  .1835(.39) .1596(.53) .3725(.63) .3575(.70) .3126(.79) .5137(.83) .4699(.88)
  λ = 1/3:  .1311(.36) .1058(.49) .2649(.58) .2523(.65) .2086(.74) .3388(.78) .3016(.84)
  λ = 1/2:  .0852(.29) .0554(.42) .1330(.49) .1202(.54) .0856(.64) .1581(.68) .1202(.75)
  λ = 2:    .1609(.42) .1022(.58) .2286(.63) .2352(.63) .1739(.73) .2752(.74) .2006(.81)
  λ = 3:    .4058(.54) .3134(.70) .5478(.61) .5876(.50) .4946(.61) .6702(.49) .6125(.56)
  λ = 4:    .6290(.45) .5381(.58) .7768(.42) .8196(.29) .7559(.37) .8828(.25) .8440(.30)
  λ = 5:    .7650(.34) .6962(.44) .8950(.26) .9280(.15) .8925(.20) .9607(.12) .9473(.15)
  λ = 1:    .0390(.21) .0200(.29) .0547(.35) .0463(.41) .0237(.46) .0457(.50) .0252(.54)
Table 14 (Contd.)

k = 6, No = 29 35 41 47 53 59 65, M = 9 10 11 12 13 14 15
  D2 = 6 6 7 7 8 8 8; D1 = 9 10 11 11 12 12 13
  λ = 1/5:  .2108(.40) .2274(.51) .2078(.64) .4565(.73) .4219(.79) .6230(.84) .6166(.88)
  λ = 1/4:  .1787(.39) .1874(.49) .1563(.61) .3529(.69) .3374(.78) .5240(.82) .4906(.86)
  λ = 1/3:  .1330(.36) .1299(.45) .0987(.57) .2440(.65) .2060(.73) .3670(.78) .3264(.82)
  λ = 1/2:  .0870(.30) .0801(.39) .0513(.50) .1257(.58) .0903(.65) .1753(.70) .1436(.75)
  λ = 2:    .1580(.45) .1670(.52) .1086(.67) .2289(.68) .1653(.77) .2767(.78) .2633(.76)
  λ = 3:    .3869(.60) .4492(.55) .3683(.68) .5766(.58) .5009(.67) .6756(.57) .6968(.45)
  λ = 4:    .6094(.54) .6841(.39) .6097(.49) .8139(.34) .7592(.42) .8850(.30) .9071(.20)
  λ = 5:    .7692(.40) .8283(.25) .7814(.31) .9226(.19) .8968(.24) .9689(.14) .9736(.08)
  λ = 1:    .0440(.22) .0403(.30) .0212(.38) .0476(.44) .0268(.50) .0504(.55) .0406(.57)

k = 7, No = 33 40 47 54 61 68 75, M = 9 10 11 12 13 14 15
  D2 = 6 6 7 7 8 8 8; D1 = 10 10 11 12 12 13 13
  λ = 1/5:  .0132(.39) .2238(.51) .2068(.63) .1993(.71) .4274(.79) .4121(.84) .6210(.88)
  λ = 1/4:  .0154(.38) .1851(.50) .1667(.63) .1456(.70) .3393(.77) .3131(.82) .5020(.86)
  λ = 1/3:  .0156(.35) .1269(.46) .1025(.58) .0887(.65) .2235(.74) .1883(.79) .3375(.83)
  λ = 1/2:  .0121(.31) .0784(.41) .0503(.52) .0433(.59) .1002(.67) .0738(.72) .1511(.76)
  λ = 2:    .0457(.48) .1646(.56) .1025(.70) .1205(.72) .1665(.81) .1651(.81) .2718(.80)
  λ = 3:    .1858(.65) .4386(.61) .3457(.73) .4036(.64) .5022(.72) .5186(.62) .6954(.50)
  λ = 4:    .3782(.58) .6829(.44) .5956(.54) .6873(.38) .7602(.47) .8025(.32) .9140(.23)
  λ = 5:    .5275(.46) .8240(.29) .7743(.35) .8452(.21) .9035(.27) .9260(.16) .9792(.09)
  λ = 1:    .0117(.25) .0407(.32) .0210(.41) .0201(.47) .0304(.53) .0217(.58) .0460(.63)

k = 8, No = 37 45 53 61 69 77 85, M = 9 10 11 12 13 14 15
  D2 = 6 6 7 7 7 8 8; D1 = 10 10 11 12 12 13 13
  λ = 1/5:  .0089(.40) .2296(.50) .2224(.64) .2027(.72) .4461(.77) .4269(.84) .6269(.88)
  λ = 1/4:  .0092(.38) .1873(.50) .1670(.61) .1469(.70) .3543(.76) .3221(.83) .5090(.86)
  λ = 1/3:  .0101(.36) .1408(.47) .1047(.58) .0917(.66) .2383(.73) .2024(.80) .3659(.84)
  λ = 1/2:  .0112(.32) .0860(.42) .0554(.53) .0403(.62) .1205(.67) .0824(.74) .1670(.79)
  λ = 2:    .0394(.50) .1492(.59) .1047(.71) .1068(.76) .2273(.77) .1608(.85) .2754(.83)
  λ = 3:    .1568(.69) .4350(.64) .3400(.77) .3861(.67) .6074(.56) .5201(.64) .7113(.53)
  λ = 4:    .3316(.63) .6732(.49) .5989(.57) .6597(.43) .8386(.31) .7982(.37) .9205(.25)
  λ = 5:    .4948(.50) .8217(.32) .7618(.40) .8295(.25) .9488(.14) .9276(.19) .9811(.11)
  λ = 1:    .0088(.26) .0427(.34) .0222(.44) .0185(.51) .0431(.55) .0277(.63) .0519(.67)
Table 15
Powers of Procedure 3 at 10% significance level
(entries are power, with the proportion of outcomes determined by Procedure 1 in brackets)

k = 3, No = 18 21 24 27 30 33 36, M = 9 10 11 12 13 14 15
  D2 = 6 7 7 8 8 9 9; D1 = 8 8 9 9 10 10 11
  λ = 1/5:  .6292(.67) .7780(.81) .7455(.81) .8419(.88) .8244(.86) .8974(.91) .8793(.89)
  λ = 1/4:  .5264(.62) .6823(.77) .6355(.78) .7586(.87) .7196(.85) .8078(.91) .7774(.89)
  λ = 1/3:  .3978(.54) .5186(.68) .4729(.72) .5961(.82) .5474(.81) .6396(.87) .5974(.87)
  λ = 1/2:  .2349(.41) .3009(.55) .2641(.57) .3162(.67) .2834(.69) .3443(.77) .3097(.78)
  λ = 2:    .3142(.39) .3436(.55) .3439(.54) .3703(.64) .3679(.61) .4004(.69) .3952(.66)
  λ = 3:    .5914(.39) .6466(.52) .6703(.42) .7021(.51) .7166(.41) .7492(.48) .7725(.38)
  λ = 4:    .7697(.29) .8091(.38) .8460(.26) .8671(.33) .8894(.21) .9131(.26) .9218(.17)
  λ = 5:    .8733(.19) .9041(.26) .9325(.14) .9442(.19) .9604(.11) .9673(.14) .9721(.08)
  λ = 1:    .1004(.26) .1072(.35) .0931(.37) .0991(.43) .0771(.46) .0860(.51) .0671(.53)

k = 4, No = 22 26 30 34 38 42 46, M = 9 10 11 12 13 14 15
  D2 = 6 6 7 7 7 8 8; D1 = 8 9 10 10 10 11 11
  λ = 1/5:  .4997(.57) .5418(.65) .5073(.78) .7100(.83) .8380(.84) .8180(.91) .9041(.92)
  λ = 1/4:  .4401(.53) .4566(.62) .4084(.75) .6049(.79) .7538(.81) .7151(.89) .8246(.90)
  λ = 1/3:  .3340(.48) .3467(.56) .2821(.69) .4461(.73) .5941(.76) .5289(.84) .6554(.86)
  λ = 1/2:  .2194(.39) .2056(.45) .1421(.57) .2339(.61) .3471(.65) .2744(.73) .3618(.75)
  λ = 2:    .3064(.47) .3076(.48) .2322(.62) .3552(.60) .4689(.57) .3778(.68) .4909(.63)
  λ = 3:    .6034(.50) .6450(.42) .5407(.55) .7122(.44) .8223(.34) .7584(.43) .8523(.33)
  λ = 4:    .7873(.39) .8305(.27) .7705(.35) .8966(.23) .9505(.15) .9277(.19) .9674(.13)
  λ = 5:    .8918(.28) .9152(.16) .8908(.20) .9603(.12) .9877(.07) .9789(.09) .9928(.05)
  λ = 1:    .1055(.26) .0899(.30) .0499(.38) .0848(.42) .1300(.45) .0782(.51) .1086(.54)

k = 5, No = 25 30 35 40 45 50 55, M = 9 10 11 12 13 14 15
  D2 = 6 6 6 7 7 7 8; D1 = 9 9 10 10 11 11 12
  λ = 1/5:  .2186(.41) .4446(.53) .4636(.60) .6354(.73) .6542(.79) .7859(.81) .7748(.88)
  λ = 1/4:  .1835(.39) .3793(.50) .3960(.57) .5501(.70) .5494(.76) .6991(.79) .6682(.86)
  λ = 1/3:  .1311(.36) .2956(.46) .2948(.53) .4345(.65) .4075(.71) .5428(.74) .4943(.81)
  λ = 1/2:  .0852(.29) .1928(.38) .1678(.44) .2427(.54) .2152(.60) .3122(.63) .2475(.72)
  λ = 2:    .1609(.42) .3025(.49) .3239(.50) .3667(.63) .3600(.62) .4776(.61) .3773(.70)
  λ = 3:    .4058(.54) .6312(.48) .6632(.41) .7337(.50) .7279(.42) .8459(.31) .8017(.38)
  λ = 4:    .6290(.45) .8407(.33) .8652(.22) .9072(.29) .9168(.20) .9675(.12) .9470(.16)
  λ = 5:    .7650(.34) .9275(.21) .9485(.12) .9698(.15) .9758(.09) .9916(.05) .9876(.07)
  λ = 1:    .0390(.21) .0899(.26) .0847(.31) .1018(.41) .0788(.44) .1188(.47) .0733(.52)
Table 15 (Contd.) k
6
7
8
No
M
29
9
35
10
41
11
47
12
53
13
59
14
65
15
33
9
40
10
47
11
54
12
61
13
68
14
75
15
[Flattened residue of a power table (power values, with auxiliary quantities in parentheses, tabulated against the ratios 1/5, 1/4, 1/3, 1/2, 2, 3, 4, 5 and 1 for various (N0, M) combinations); the original row-column layout could not be recovered from the extraction.]
[Flattened residue of a table of critical values D1 and D2 (D1 ranging over 5-7, D2 over 9-13) for various (N0, M) combinations; the original layout could not be recovered from the extraction.]
S. Panchapakesan, A. Childs, B. H. Humphrey and N. Balakrishnan
Table 16
Expected sample size when using Procedure 3 at 5% significance level, k = 3

[The body of this table, giving expected sample sizes for each (N0, M) combination at the ratios 1/5, 1/4, 1/3, 1/2, 1, 2, 3, 4 and 5, was flattened during extraction and is not reproduced here.]
Inverse sampling procedures to test for homogeneity in a multinomial distribution
Table 17
Expected sample size when using Procedure 3 at 10% significance level, k = 3

[The body of this table, giving expected sample sizes for each (N0, M) combination at the ratios 1/5, 1/4, 1/3, 1/2, 1, 2, 3, 4 and 5, was flattened during extraction and is not reproduced here.]
Procedure 3 (0.9473) is quite close to the optimal power (0.9952) of Procedure 1. Furthermore, the power is clearly superior to that of Procedure 2 (0.8558) in the same situation. Similar results hold at the 10% significance level and are given in Table 15. Aside from being nearly optimal in certain situations under both forms of the alternative hypothesis, the combined procedure has the further advantage of always having an expected sample size that is smaller than those of both Procedure 1 and Procedure 2. This can be seen by comparing the expected sample sizes for Procedure 3 given in Tables 16 and 17 (for the 5% and 10% significance levels, respectively) with the expected sample sizes given in Tables 5, 10 and 11 for Procedures 1 and 2.
6. Conclusions

In this paper, we have presented three alternatives to the standard χ²-test of homogeneity in a multinomial distribution. In each of these procedures, the sample size is not fixed. We have seen that Procedure 1 has a distinct advantage over the χ²-test in terms of expected sample size (with the power being comparable) when there is slippage to the left, while Procedure 2 has a very significant advantage when there is slippage to the right. In addition, the test statistic for each procedure is much simpler to compute than the χ²-test statistic. Therefore, if the form of the alternative hypothesis is known, we recommend use of one of the first two tests (whichever one is optimal under the given alternative hypothesis). But if the form of the alternative is not known, we recommend the use of Procedure 3 for the large values of M and N0 given in Tables 14 and 15. This will provide a test that is nearly optimal under both forms of the alternative hypothesis.
Acknowledgements

The second and fourth authors would like to thank the Natural Sciences and Engineering Research Council of Canada for funding this research.
References

Alam, K. (1971). On selecting the most probable category. Technometrics 13, 843-850.
Cacoullos, T. and M. Sobel (1966). An inverse sampling procedure for selecting the most probable event in a multinomial distribution. In Multivariate Analysis (Ed., P. R. Krishnaiah), pp. 423-455. Academic Press, New York.
Gupta, S. S. and K. Nagel (1967). On selection and ranking procedures and order statistics from the multinomial distribution. Sankhyā B 29, 1-34.
Hogg, R. V. and A. T. Craig (1995). Introduction to Mathematical Statistics, Fifth edition. Macmillan, New York.
Johnson, N. L. and D. H. Young (1960). Some applications of two approximations to the multinomial distribution. Biometrika 47, 463-469.
Patnaik, P. B. (1949). The non-central χ²- and F-distributions and their applications. Biometrika 36, 202-232.
Ramey, J. T. and K. Alam (1979). A sequential procedure for selecting the most probable multinomial event. Biometrika 66, 171-173.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications, Second edition. John Wiley & Sons, New York.
Young, D. H. (1962). Two alternatives to the standard χ² test of the hypothesis of equal cell frequencies. Biometrika 49, 107-116.
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
Prediction of Order Statistics
Kenneth S. Kaminsky and Paul I. Nelson
1. Introduction
Let X = (X(1), X(2), ..., X(m))′ and Y = (Y(1), Y(2), ..., Y(n))′ be the order statistics of two independent random samples from the same family of continuous probability density functions {f(x|θ)}, where θ is an unknown parameter vector. Our main goal is to describe procedures where, having observed some of the components of X, say X_1 = (X(1), X(2), ..., X(r))′, it is desired to predict functions of the remaining components of X, namely X_2 = (X(r+1), X(r+2), ..., X(m))′, called the one sample problem, or of Y, called the two sample problem. Motivation for this type of prediction arises in life testing, where X represents the ordered lifetimes of m components simultaneously put on test. If the test is stopped after the r-th failure so that X_1 represents the only available data, we have a type II censored sample. In the one sample problem it may be of interest, for example, to predict: (i) X(m), the time at which all the components will have failed; (ii) a sample quantile X(s) of X_2, where s is the greatest integer in mλ, 0 < λ < 1, s > r; (iii) the mean failure time of the unobserved lifetimes in X_2. In the two sample problem, it may also be of interest to predict such functions of Y as: (i) the range, (ii) quartiles, (iii) the smallest lifetime. Prediction of order statistics can also be used to detect outliers or a change in the model generating the data; see for example Balasooriya (1989). We will describe interval and point prediction. Much of the past work on prediction intervals concerns approximations to complicated distributions of pivotals on a case by case basis. The availability of high speed computing has somewhat diminished the need for such approximations. Accordingly, we will focus on this computational approach to constructing prediction intervals. Patel (1989) gives an extensive survey of prediction intervals in a variety of settings, including order statistics.
While overlapping with his review to some extent, we will for the most part complement it. Throughout, we use boldface to denote vectors and matrices, and A′ to denote the transpose of the matrix or vector A.
2. Prediction preliminaries

Since prediction of random variables has received less attention in the statistical literature than parameter estimation, we begin by giving a brief general description of point and interval prediction. Let U and W be vectors of random variables whose joint distribution (possibly independence) depends on unknown parameters θ. Having observed U = u, it is desired to predict T = T(W), some function of W. Let T̂ = T̂(U) be a function of U used to denote a generic predictor of T. Good choices for T̂ may be defined relative to specification of either the loss L(T̂, T) incurred when T̂ is used to predict T or some inference principle such as maximum likelihood. When the former approach is used, L(T̂, T) is typically some measure of distance between T̂ and T. An optimal choice for T̂ is then made by finding (if possible) that function which minimizes

E{L[T̂(U), T(W)]} ,
(2.1)
where "E" denotes expectation over all joint realizations of U and W. The set A = A(u) computed from the observed value of U is called a 1 − 2γ prediction region for T = T(W) if for all θ in the parameter space:

P_θ(T ∈ A(U)) = 1 − 2γ .   (2.2)

This statement is to be interpreted in the following frequentist perspective. Let (U_i, W_i), i = 1, 2, ..., M, be independent copies of (U, W) and let N(M) denote the number of these for which T_i = T_i(W_i) ∈ A(U_i) holds. Then as M → ∞, N(M)/M → 1 − 2γ. Unlike the typical case of parameter estimation, where the true value of the parameter may never be known, an experimenter is often able to ascertain whether or not T lies in A(u). Thus, if an experimenter makes many forecasts where there is a real cost associated with making an incorrect prediction, it becomes important to control the ratio N(M)/M. To apply this general setting to the prediction of order statistics we associate the vector U with the elements of X_1 = (X(1), X(2), ..., X(r))′. In the one sample problem we associate W with X_2 = (X(r+1), X(r+2), ..., X(m))′, while in the two sample problem we associate W with Y. Specifically, we have:
U = X_1 = (X(1), X(2), ..., X(r))′ ,
W = X_2 in the one sample problem, W = Y in the two sample problem .   (2.3)
In all cases we consider the function being predicted, T = T(W), to be linear with generic form:

T = Σ_i κ_i W_i = κ′W ,   (2.4)

where {κ_i} are constants. Note that by taking all of the components of κ except one equal to zero, predictions can be made for individual components of X_2 and Y.
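The ratio N(M)/M in this frequentist reading is easy to check by simulation. A minimal sketch (not from this chapter; it uses the standard distribution-free fact that, for continuous data, a single future observation falls strictly between X(i) and X(j) with probability (j − i)/(n + 1)):

```python
import random

random.seed(1)

def coverage(n, i, j, trials=4000):
    """Empirical N(M)/M: the fraction of independent replications in which
    the predicted quantity T (here a single future observation) lands in
    the region (X(i), X(j)) built from the observed sample."""
    hits = 0
    for _ in range(trials):
        x = sorted(random.expovariate(1.0) for _ in range(n))
        t = random.expovariate(1.0)  # the quantity being predicted
        if x[i - 1] < t < x[j - 1]:
            hits += 1
    return hits / trials

# For continuous data P(X(i) < T < X(j)) = (j - i)/(n + 1), so with
# n = 9, i = 2, j = 8 the long-run hit rate approaches 6/10.
rate = coverage(9, 2, 8)
```

The parent distribution does not matter here, which is the point of the frequentist interpretation: the hit rate is controlled for all θ.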
3. Assumptions and notation

Most of the parametric work on prediction of order statistics assumes that the order statistics X and Y are based on random samples from a location-scale family of continuous probability density functions (pdf's) of the form:

f(x|θ) = (1/σ)g((x − μ)/σ) ,   (3.1)
where σ > 0 and the (known) pdf g generates a distribution with finite second moment. Some early papers assumed for convenience that one of these parameters was known. Except when μ is the left endpoint of the support of the distribution, as is the case when g is a standard exponential density, this assumption is unrealistic and we will assume for the most part that both location and scale parameters are unknown. We will also describe the construction of nonparametric prediction intervals, where it is only assumed that the pdf of the underlying observations is continuous. Recall that we partition X′ = (X_1′, X_2′), where X_1 represents the first r observed order statistics. Denote the vectors of standardized order statistics Z_{X1}, Z_{X2}, and Z_Y by:

Z_{Xi} = (X_i − μ1_i)/σ ,  i = 1, 2 ,
Z_Y = (Y − μ1_Y)/σ ,   (3.2)

where 1_1, 1_2 and 1_Y are columns of ones of appropriate dimension. The Z's have known distributions generated by the pdf g. We let α_{i,j} be the expected value of the i-th standardized order statistic of a sample of size j from (3.1) and express these as vectors:
α_1 = E(Z_{X1}) = (α_{1,m}, α_{2,m}, ..., α_{r,m})′ ,
α_2 = E(Z_{X2}) = (α_{r+1,m}, α_{r+2,m}, ..., α_{m,m})′ ,   (3.3)
α_Y = E(Z_Y) = (α_{1,n}, α_{2,n}, ..., α_{n,n})′ .

Partition the variance-covariance matrix of X as:

Var(X) = σ² Var(Z_X) = σ² [ V_{1,1}  V_{1,2} ; V_{2,1}  V_{2,2} ] ≡ σ² V_X ,   (3.4)
and represent the covariance matrix of Y by:

Var(Y) = σ² Var(Z_Y) ≡ σ² V_Y .   (3.5)
For s > r, let σ²ω_s denote the r-dimensional column vector of covariances between X_1 and X(s), so that:

Cov(X_1, X(s)) = σ²ω_s = σ²v_{1,s} ,   (3.6)

where v_{1,s} is the column of V_{1,2} corresponding to X(s). The covariances given in V_X and V_Y and the expected values in the α's specified above do not depend on unknown parameters. They can be obtained: (i) from tables in special cases; (ii) by averaging products of simulated standardized order statistics and appealing to the law of large numbers; (iii) by numerical integration. Note that X and Y are uncorrelated since they are independent. Prediction can also be carried out in a Bayesian setting. Let π(θ) denote a prior distribution on the parameter θ, π(θ|x_1) the posterior distribution of θ given x_1, and f(x_1, t|θ) the joint pdf of (X_1, T) conditional on θ. The predictive density of t is then given by:
f(t|x_1) = ∫ [f(x_1, t|θ)/f(x_1|θ)] π(dθ|x_1) .   (3.7)

In the two sample case T and X_1 are conditionally independent given θ and (3.7) simplifies accordingly.
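Route (ii) above, averaging simulated standardized order statistics, can be sketched as follows (an illustration of ours, assuming a standard exponential parent, for which α_{i,m} = Σ_{j=1}^{i} 1/(m − j + 1) exactly, so the estimates are easy to check):

```python
import random

random.seed(2)

def os_moments(m, r, reps=2000):
    """Estimate alpha_1 = E(Z_(1), ..., Z_(r)) and V_11 = Cov of the first r
    standardized order statistics from a sample of size m (standard
    exponential parent here), by averaging over simulated samples."""
    sums = [0.0] * r
    prods = [[0.0] * r for _ in range(r)]
    for _ in range(reps):
        z = sorted(random.expovariate(1.0) for _ in range(m))[:r]
        for i in range(r):
            sums[i] += z[i]
            for j in range(r):
                prods[i][j] += z[i] * z[j]
    alpha = [s / reps for s in sums]
    v = [[prods[i][j] / reps - alpha[i] * alpha[j] for j in range(r)]
         for i in range(r)]
    return alpha, v

alpha, v = os_moments(m=5, r=3)
# Exact values for checking: alpha_{1,5} = 1/5 = 0.2, alpha_{2,5} = 0.45,
# and Var(Z_(1)) = 1/25 = 0.04 for the standard exponential.
```

By the law of large numbers the estimates converge at rate 1/sqrt(reps); a few thousand replications already reproduce tabled values to two decimals.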
4. Point prediction

When predicting future outcomes from past outcomes, whether in the context of order statistics or otherwise, prediction intervals and regions probably play a more important role than point prediction. Yet, just as there are times when we desire a point estimator of a parameter, accompanied, if possible, by an estimate of its precision, there are times when a point predictor of a future outcome may be preferred to an interval predictor. And when a point predictor is accompanied by an estimate of its precision (such as mean square error), it could be argued that little is lost in comparison with having intervals or regions. Point predictors of order statistics, and their errors, are the topics of this section.
4.1. Linear prediction

Using the notation established in Section 3, we assume that X_1 has been observed, and that X_2 and/or Y have yet to be observed (or are missing). Let

W(s) = X(s) in the one sample problem, W(s) = Y(s) in the two sample problem ,

and

W = X_2 in the one sample problem, W = Y in the two sample problem .   (4.1)
We will concentrate on the prediction of a single component W(s) of W, although we could equally well predict all of W, or some linear combination of its components. Hence, for point prediction we will drop the subscript s and simply let Cov(X_1, X(s)) = ω_s = ω = (ω_1, ω_2, ..., ω_r)′. From Lloyd (1952), the BLUEs of μ and σ based on X_1, and their variances and covariance, are

μ̂ = −α_1′Γ_1 X_1 ,   σ̂ = 1_1′Γ_1 X_1 ,
Var(μ̂) = (α_1′Ω_11 α_1)σ²/Δ ,   Var(σ̂) = (1_1′Ω_11 1_1)σ²/Δ ,   (4.2)
Cov(μ̂, σ̂) = −(1_1′Ω_11 α_1)σ²/Δ ,

where

Γ_1 = Ω_11(1_1 α_1′ − α_1 1_1′)Ω_11/Δ ,
Δ = (1_1′Ω_11 1_1)(α_1′Ω_11 α_1) − (1_1′Ω_11 α_1)² ,

and Ω_11 ≡ V_{1,1}^{−1}, the inverse of the covariance matrix of Z_{X1}.
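Lloyd's estimators can be exercised numerically. The sketch below is our own illustration, assuming a two-parameter exponential parent (so that α_1 and V_{1,1} have closed forms); it writes the BLUEs in the equivalent expanded form μ̂ = {(α_1′Ω_11α_1)(1_1′Ω_11x) − (1_1′Ω_11α_1)(α_1′Ω_11x)}/Δ and σ̂ = {(1_1′Ω_11 1_1)(α_1′Ω_11x) − (1_1′Ω_11α_1)(1_1′Ω_11x)}/Δ, with a small Gauss-Jordan inverter in place of a linear algebra library:

```python
def inv(a):
    """Invert a small matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda rr: abs(m[rr][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for rr in range(n):
            if rr != col:
                f = m[rr][col]
                m[rr] = [x - f * y for x, y in zip(m[rr], m[col])]
    return [row[n:] for row in m]

def lloyd_blue(m_size, r):
    """Lloyd's BLUEs of (mu, sigma) from the first r of m_size exponential
    order statistics, using the exact moments of this family:
    alpha_i = sum_{j<=i} 1/(m-j+1),
    Cov(Z_i, Z_j) = sum_{l<=min(i,j)} 1/(m-l+1)^2."""
    alpha, s = [], 0.0
    for i in range(1, r + 1):
        s += 1.0 / (m_size - i + 1)
        alpha.append(s)
    v = [[sum(1.0 / (m_size - l) ** 2 for l in range(min(i, j) + 1))
          for j in range(r)] for i in range(r)]
    om = inv(v)                      # Omega_11 = V_{1,1}^{-1}
    one = [1.0] * r
    def quad(u, w):
        return sum(u[i] * om[i][j] * w[j] for i in range(r) for j in range(r))
    d = quad(one, one) * quad(alpha, alpha) - quad(one, alpha) ** 2
    def estimates(x1):
        mu = (quad(alpha, alpha) * quad(one, x1)
              - quad(one, alpha) * quad(alpha, x1)) / d
        sg = (quad(one, one) * quad(alpha, x1)
              - quad(one, alpha) * quad(one, x1)) / d
        return mu, sg
    return alpha, estimates

alpha1, estimates = lloyd_blue(5, 3)
mu_hat, sigma_hat = estimates([2.0 + 3.0 * a for a in alpha1])
# Applied to the exact mean vector mu*1 + sigma*alpha_1 (mu = 2, sigma = 3),
# the unbiasedness constraints force the estimators to return (2, 3).
```

The final check is the linear-unbiasedness identity in action: any linear estimator satisfying the two constraints must map the mean vector back to the true parameters.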
A linear predictor Ŵ(s) = c′X_1 is called the best linear unbiased predictor (BLUP) of W(s) if

E(Ŵ(s)) = E(W(s))   (4.3)

and E(Ŵ(s) − W(s))² is a minimum, where c is a vector of constants. Note that in the context of the model (3.2), two constraints are imposed on c by (4.3), namely that c′1_1 = 1 and c′α_1 = α_{s,k}, where k = m in the one sample case and k = n in the two sample case. Goldberger (1962) derived the best linear unbiased predictor of the regression parameters in the generalized linear model (i.e., the linear model with correlated errors). Kaminsky and Nelson (1975a) first applied Goldberger's result to the order statistics of model (3.2) for the prediction of the components of X_2 and Y, or for linear combinations of X_2 and Y. Combining the one and two sample problems, the BLUP of W(s) can be written

Ŵ(s) = X̂(s) = μ̂ + σ̂α_{s,m} + ω_s′Ω_11(X_1 − μ̂1_1 − σ̂α_1)  in the one sample problem,
Ŵ(s) = Ŷ(s) = μ̂ + σ̂α_{s,n}  in the two sample problem,   (4.4)

where 2 ≤ r < s ≤ m in the one sample problem, and 2 ≤ r ≤ m, 1 ≤ s ≤ n in the two sample problem. The simpler two sample predictor results from the fact that, by our independence assumption, Cov(X_1, Y(s)) = 0. The mean square error (mse) of Ŵ(s) is
mse(Ŵ(s)) = σ²{v_{ss} − ω′Ω_11ω + c_11}  in the one sample case,
mse(Ŵ(s)) = σ²{α_1′Ω_11α_1 + α²_{s,n} 1_1′Ω_11 1_1 − 2α_{s,n} 1_1′Ω_11α_1}/Δ  in the two sample case,   (4.5)

where c_11 = Var{(1 − ω′Ω_11 1_1)μ̂ + (α_{s,m} − ω′Ω_11α_1)σ̂}/σ². We point out that in the two sample case, the BLUP of Y(s) is just the BLUE of E(Y(s)), and we call Ŷ(s) the expected value predictor of Y(s). In the one sample case, similar simplifications occur to the BLUP of X(s) for a fairly wide class of underlying pdf's. To see what these simplifications are, we state some results from Kaminsky and Nelson (1975b).
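Before the formal statement, a concrete instance of such a simplification (our illustration, not the chapter's notation): for the two-parameter exponential, ω_s equals the r-th column of V_{1,1}, so ω_s′Ω_11 = (0, ..., 0, 1) and the one sample BLUP in (4.4) collapses to X̂(s) = X(r) + σ̂(α_{s,m} − α_{r,m}). The simulation below (with σ̂ the usual BLUE of σ from the first r order statistics) checks that this predictor is unbiased for E(X(s)):

```python
import random

random.seed(3)

def alpha(i, m):
    """Mean of the i-th standardized exponential order statistic."""
    return sum(1.0 / (m - j + 1) for j in range(1, i + 1))

def blup_exponential(x1, m, s):
    """One sample BLUP of X(s) for the two-parameter exponential:
    X(r) + sigma_hat*(alpha_{s,m} - alpha_{r,m}), with sigma_hat the
    BLUE of sigma from the first r = len(x1) order statistics."""
    r = len(x1)
    w = sum(x1[i] - x1[0] for i in range(r)) + (m - r) * (x1[-1] - x1[0])
    return x1[-1] + (w / (r - 1)) * (alpha(s, m) - alpha(r, m))

mu, sigma, m, r, s = 1.0, 2.0, 8, 4, 7
preds, actual = [], []
for _ in range(4000):
    x = sorted(mu + random.expovariate(1.0 / sigma) for _ in range(m))
    preds.append(blup_exponential(x[:r], m, s))
    actual.append(x[s - 1])
# Unbiased prediction: the average of preds tracks
# E(X(7)) = mu + sigma * alpha(7, 8).
```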
Regularity conditions. The underlying standardized pdf g is such that the limits

lim_{m→∞} α_{r,m} = p_1 ,  lim_{m→∞} α_{s,m} = p_2   (4.6)

exist, and

lim_{m→∞} (m · v_{rs}) = λ_1(1 − λ_2)/(g(p_1)g(p_2)) ,   (4.7)

where 0 < λ_1 < λ_2 < 1, r = ⟨(m + 1)λ_1⟩, s = ⟨(m + 1)λ_2⟩, p_i = G^{−1}(λ_i), i = 1, 2, and ⟨a⟩ denotes the closest positive integer to a. Conditions under which the above limits hold may be found in Blom (1958) and van Zwet (1964). We are now ready to state the theorem which characterizes the conditions under which the BLUP of X(s) simplifies to the BLUE of its expected value:
THEOREM 4.1. Consider the order statistics of a random sample of size m from (3.1) and suppose that regularity conditions (4.6) and (4.7) are satisfied. Then the following are equivalent: (i) The (standardized) distribution of the random sample is a member of the family 𝒢 = {G_1, G_2, G_3}, where G_1(x) = x^c, 0 < x

...

x > μ, and f(x|μ, σ) = 0 elsewhere. When μ is known to equal zero, these results are derived from the fact that the scaled spacings {2(m − i + 1)(X(i) − X(i−1))/σ} are distributed as independent chi-square variates. Let S_r = Σ_{i=1}^{r} X(i) + (m − r)X(r), r times the MLE of σ. For predicting X(s) from X_1 with μ known and taken as zero, Lawless (1971) derived the distribution of the pivotal

R(X_1, X(s)) = (X(s) − X(r))/S_r   (5.14)

as:

P(R(X_1, X(s)) ≥ t) = (1/B(s − r, m − s + 1)) Σ_{i=0}^{s−r−1} (s−r−1 choose i)(−1)^i [1 + (m − s + i + 1)t]^{−r} / [m − s + i + 1] ,   (5.15)

for all t > 0, where B(a, b) = (a − 1)!(b − 1)!/(a + b − 1)!. Percentage points {R_δ(X_1, X(s))}, 0 < δ < 1, of the distribution given in (5.15) can be approximated by a Newton-Raphson iteration, yielding a 1 − 2γ prediction interval for X(s) of the form:

[X(r) + R_γ(X_1, X(s))S_r , X(r) + R_{1−γ}(X_1, X(s))S_r] .   (5.16)
Kaminsky and Nelson (1974) show that the percentage points of the distribution given in (5.15) can be approximated by scaled percentage points of an appropriate F distribution. This F-approximation follows from using a Satterthwaite type approximation to the numerator and denominator of R(X_1, X(s)). They show that this approximation can also be used when the prediction is based on a subset of X_1. An approximate 1 − 2γ prediction interval for X(s) based on X_1 is given by:

[X(r) + A_1 F_γ(a, b)σ̂ , X(r) + A_1 F_{1−γ}(a, b)σ̂] ,   (5.17)

where a = 2A_1²/A_2, A_i = Σ_{j=r+1}^{s} (m − j + 1)^{−i}, i = 1, 2, b = 2r, F_δ(a, b) is the 100δ percentage point of an F-distribution with a and b degrees of freedom, and σ̂ is the BLUE of σ based on X_1. Likeš (1974) extended Lawless's (1971) result to an exponential family with unknown location parameter μ. However, the exact results are quite complicated and given implicitly. Instead, we recommend using simulation as outlined above. Lawless (1977) gives explicit formulas for prediction intervals for Y(s) in the two sample problem in the same setting. Abu-Salih et al. (1987) generalize prediction from the one parameter exponential to samples from a mixture of exponential pdf's of the form f(x|σ_1, σ_2, β) = β(1/σ_1) exp(−x/σ_1) + (1 − β)(1/σ_2) exp(−x/σ_2). The mixing proportion β is assumed known. They derive an exact prediction interval for X(s) when σ_1/σ_2 is known. Their formulas are fairly complicated. Their intervals can also be obtained through simulation.
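The exact route through (5.15) and (5.16) is straightforward to implement. The sketch below is ours (bisection stands in for the Newton-Raphson iteration mentioned above, and the one-parameter exponential model of this subsection is assumed); it also runs a Monte Carlo check of the nominal 1 − 2γ coverage:

```python
import random
from math import comb, factorial

def surv_R(t, m, r, s):
    """P(R >= t) for the pivotal R = (X(s) - X(r))/S_r, eq. (5.15)."""
    B = factorial(s - r - 1) * factorial(m - s) / factorial(m - r)
    total = 0.0
    for i in range(s - r):
        d = m - s + i + 1
        total += comb(s - r - 1, i) * (-1.0) ** i * (1.0 + d * t) ** (-r) / d
    return total / B

def R_quantile(delta, m, r, s):
    """R_delta with P(R <= R_delta) = delta, found by bisection."""
    lo, hi = 0.0, 1.0
    while surv_R(hi, m, r, s) > 1.0 - delta:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if surv_R(mid, m, r, s) > 1.0 - delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Monte Carlo check of the 1 - 2*gamma coverage of (5.16).
random.seed(4)
m, r, s, gam = 10, 5, 9, 0.05
r_lo, r_hi = R_quantile(gam, m, r, s), R_quantile(1.0 - gam, m, r, s)
hits, trials = 0, 2000
for _ in range(trials):
    x = sorted(random.expovariate(0.5) for _ in range(m))  # mu = 0, sigma = 2
    s_r = sum(x[:r]) + (m - r) * x[r - 1]
    if x[r - 1] + r_lo * s_r <= x[s - 1] <= x[r - 1] + r_hi * s_r:
        hits += 1
```

Because R is a pivotal, the same quantiles work for every value of the scale parameter, which the simulation exploits by fixing σ = 2 arbitrarily.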
5.4. Optimality

Takada (1985, 1993) obtains some optimality results in the one sample problem for the one sided version of the prediction interval given in (5.16), based on samples from an exponential distribution with known location parameter. Consider the class C(1 − γ) of lower 1 − γ prediction intervals for X(s) of the type given in (5.9) which are invariant in the sense that for any positive constant c, L_X(cX_1) = cL_X(X_1). Takada (1985) showed that the conditional mean length E((X(s) − L_X(X_1)) | X(s) ≥ L_X(X_1)) is minimized over C(1 − γ) for all σ by the lower bound of Lawless's (1971) interval. In the same setting, Takada (1993) also shows that δ(X_1) = X(r) + R_γ(X_1, X(s))S_r, the lower bound of Lawless's (1971) interval as given in (5.16), minimizes P(X(s) > δ(X_1) + a) for all positive constants a, for all values of the scale parameter σ, among all invariant lower 1 − γ prediction limits δ(X_1). Takada (1993) calls such optimal intervals uniformly most accurate equivariant. This type of optimality minimizes the probability that the value being predicted is more than any specified distance above the lower endpoint of the prediction interval.
5.5. Adaptive and distribution free intervals

Suppose that the exact form of the standardized density given in (3.1) is not known, but we are willing to assert that it lies in some specified finite collection G
of pdf's. In such a situation, Ogunyemi and Nelson (1996) proposed a two stage procedure. In the first stage X_1, the available data, is used to select a density g from G, and the same data are then used in the second stage to construct prediction intervals from (5.12) and (5.13) via simulation. This procedure can be used for both the one and two sample problems. Fully nonparametric (for any continuous pdf) prediction intervals in the two sample problem for individual components Y(s) were given by Fligner and Wolfe (1976). They showed, using the probability integral transform and basic properties of uniform order statistics, that for 1 ≤ i < j ≤ m,
P(X(i) < Y(s) < X(j)) = Σ_{k=i}^{j−1} (k+s−1 choose k)(m−k+n−s choose m−k) / (m+n choose m) ≡ A .   (5.18)
Thus, (X(i), X(j)) provides a 100A% prediction interval for Y(s). For samples from a discrete distribution the right hand side of (5.18) provides a lower bound on the coverage probability. For m and n large and m/(m + n) not close to 1, Fligner and Wolfe (1979) approximate the coverage probability A in (5.18) in terms of the standard normal CDF Φ by Φ(A_j) − Φ(A_i), where A_k = [(m(n + 2))^{0.5}(n + 1)/(s(n − s + 1)(n + m + 1))^{0.5}]((k − 0.5)/m − s/(n + 1)), k = i, j. These nonparametric intervals are easy to use and perform well in predicting middle sample quantiles. However, if m and n are very different, the possible levels of coverage are very limited. For example, if s is close to n and m/n is small, the coverage rate in (5.18) will be close to zero. See Patel (1989) for a further discussion of nonparametric prediction intervals.
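The coverage A in (5.18) is a finite sum that can be evaluated and verified directly. A sketch of ours, with hypothetical sample sizes, comparing the exact sum against a Monte Carlo estimate:

```python
import random
from math import comb

def fw_coverage(m, n, i, j, s):
    """Exact P(X(i) < Y(s) < X(j)) for independent samples of sizes m and n
    from the same continuous distribution, eq. (5.18)."""
    return sum(comb(k + s - 1, k) * comb(m - k + n - s, m - k)
               for k in range(i, j)) / comb(m + n, m)

random.seed(5)
m, n, i, j, s = 10, 7, 3, 9, 4
exact = fw_coverage(m, n, i, j, s)
hits, trials = 0, 3000
for _ in range(trials):
    x = sorted(random.random() for _ in range(m))
    y = sorted(random.random() for _ in range(n))
    if x[i - 1] < y[s - 1] < x[j - 1]:
        hits += 1
# exact is about 0.795 here; hits/trials agrees within Monte Carlo error.
```

Note the distribution-free character: any continuous parent would give the same coverage, so uniform samples suffice for the check.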
5.6. Bayesian prediction

Specification of a prior on the unknown parameter θ allows the derivation of a predictive distribution on the quantity being predicted, as given in (3.7). The frequentist interpretation of prediction regions given in Section 2 does not apply to probabilities obtained from this distribution, and inferences drawn from it can be highly dependent on the prior chosen. The use of what are called noninformative priors provides some level of objectivity. See Geisser (1993) for a full treatment of the application of the Bayesian paradigm to prediction. Dunsmore (1974) applied the predictive distribution to life testing and proposed constructing a 1 − 2γ highest posterior density region for T of the form:

A = {t : f(t|x_1) > b} ,   (5.19)

where the constant b is chosen by the requirement that ∫_A f(dt|x_1) = 1 − 2γ. Dunsmore derives these regions for samples from the one and two parameter exponential distribution with a variety of priors. From the Bayesian perspective the region given in (5.19) would be interpreted as a 1 − 2γ prediction region for T. In some cases, with noninformative priors, Dunsmore's regions turn out to be identical to frequentist prediction regions.
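A generic numerical version of (5.19) bisects on the cutoff b until the highest-density region accumulates the required mass. Purely for illustration, the "predictive" density below is taken to be standard normal (our assumption, chosen so the answer can be checked against the familiar ±1.96); for a real predictive density from (3.7) only mass_above would change:

```python
from math import log, pi, sqrt, erf

def hpd_normal(gamma):
    """1 - 2*gamma highest-density region {t : f(t) > b} for a standard
    normal 'predictive' density, locating the cutoff b by bisection
    (a generic numerical version of eq. (5.19))."""
    peak = 1.0 / sqrt(2.0 * pi)
    def mass_above(b):
        # f(t) > b exactly on (-t0, t0) with t0 = sqrt(-2*log(b*sqrt(2*pi)))
        t0 = sqrt(-2.0 * log(b / peak))
        return erf(t0 / sqrt(2.0))   # normal mass of (-t0, t0)
    lo, hi = 1e-12, peak - 1e-12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mass_above(mid) > 1.0 - 2.0 * gamma:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    t0 = sqrt(-2.0 * log(b / peak))
    return b, (-t0, t0)

b, (t1, t2) = hpd_normal(0.025)
# By symmetry this 95% region reproduces the familiar (-1.96, 1.96).
```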
Calabria and Pulcini (1994) consider the two sample problem where the data are generated by an inverse Weibull distribution. Specifically, 1/X has a Weibull distribution with parameters α and β. They derive the predictive distribution of Y(s) for a noninformative prior of the form π(α, β) = c/(αβ), α > 0, β > 0. They also use a log inverse gamma prior on P(X > t), for some specified t. This is converted into a conditional prior on α given β, and finally into a joint prior by placing a noninformative prior on β. Lingappaiah (1983) placed a normal prior on μ and an independent noninformative prior on σ² when observations are from a normal distribution with mean μ and variance σ². He derived the predictive distribution of Y(s). Lingappaiah (1978) obtained predictive distributions for future order statistics based on several random samples from a gamma family with pdf f(x|α, σ) = x^{α−1} exp(−x/σ)/(Γ(α)σ^α), x > 0, with the shape parameter α being a known integer, by placing a gamma prior on 1/σ ≡ θ. There is an initial sample from which all the data are used to construct a posterior for θ. He also supposes that p additional samples resulting in selected order statistics {X(i),k_i} are available, where X(i),k_i is the k_i-th order statistic from sample i, 1 ≤ i

... P(Y(s) > u|θ) in terms of the quantity S = 1 + Σ_{i=1}^{k} iS_i, where S_i is the sum of the observations in the i-th block, i = 1, 2, ..., k. For example, H(u, 1) = [S/(S + nku)]^{N+1}, so that a
1 − γ lower predictive interval for Y(1) is given by [u_0, ∞), where u_0 is the solution to H(u_0, 1) = 1 − γ. The cases where s > 1 are more complicated but can be handled similarly. Lingappaiah (1985) allowed a more general shift model and used the posterior from previous stages to serve as the prior for the next stage. The presence of outliers can cause serious distortions for both estimation and prediction. Dixit (1994) considered the possibility that the available data contain some outliers. Specifically, let X = (X(1), X(2), ..., X(m))′ and Y = (Y(1), Y(2), ..., Y(n))′ be collections of order statistics constructed by sorting independent random variables {X_i} and {Y_j} constructed as follows. Both {X_i} and {Y_j} have distributions in the Weibull family with pdf's of the form:
f(x|θ, δ, β) = βθδ x^{β−1} exp[−δθ x^β] ,  x > 0 .   (5.22)
The {X_i} are distributed independently with δ = 1. The {Y_j} are independent, with k components having pdf with δ ≠ 1. These k random variables represent outliers. The remaining n − k components have δ = 1. It is not known which components are outliers, but the values of δ and k are assumed known. Dixit (1994) places a gamma prior on θ of the form:
π(θ|a, h) = h(θh)^{a−1} exp(−θh)/Γ(a) ,  θ > 0 ,   (5.23)
where a and h are specified hyperparameters. The goal here is to construct a predictive interval for the future order statistic Y(s), which may or may not be an outlier. Since the estimator θ̂ = Σ_{i=1}^{r} X(i)^β + (m − r)X(r)^β, the total time on test of the β-th power of the available data, is sufficient for θ, the predictive interval may conveniently be based on θ̂. Dixit (1994) obtains an expression for H(u, s) ≡ P(Y(s) > u|θ), the predictive probability that Y(s) exceeds u. Things simplify when predicting Y(1). An upper 1 − γ predictive interval for Y(1) is given by (0, b), where b = [(h + θ̂)/(δk + n − k)][γ^{−1/(a+r)} − 1]. Dixit (1994) also extends these results to situations where several samples are available. In related work, Lingappaiah (1989a) allowed a single outlier with mean (1/σ) + δ in samples from a one parameter exponential distribution with mean σ. Using a gamma prior on 1/σ he obtained the predictive distribution for X(r) in terms of confluent hypergeometric functions. Also see Lingappaiah (1989c) for predictive distributions of maxima and minima in the one and multiple sample problems based on data from an exponential distribution with outliers. Lingappaiah (1989b) considered samples from a generalized logistic distribution with pdf of the form:
f(x|b, δ) = c e^{−x}/(1 + e^{−x})^{c+1} ,   (5.24)

where c = bδ or b + δ. In both cases c ≠ b corresponds to an outlier, with δ a known value. A gamma prior is placed on b and the predictive distribution for Y(s) based on an independent random sample from (5.24) with c = b is obtained. Lingappaiah (1991b) derived the distribution of X(s) − X(r) in samples from a gamma distribution with shape parameter b, both shape and scale parameters assumed known, in the presence of a single outlier with shape parameter shifted a
448
K. S. Kaminsky and P. L Nelson
k n o w n a m o u n t , b + 6. T h i s d i s t r i b u t i o n c a n be u s e d to c o n s t r u c t a p r e d i c t i o n i n t e r v a l f o r X(s) b a s e d o n X(r).
6. Concluding remarks

Much has been learned about the prediction of order statistics in the past 25 years. Linear point predictors and prediction intervals based on samples from location-scale families have been extensively studied. Relatively little is known about prediction based on samples from more general families of distributions, nonlinear prediction, and optimality. Future research in these and other areas will undoubtedly expand our knowledge of this interesting problem.
References

Abu-Salih, M. S., M. S. Ali Khan and K. Husein (1987). Prediction intervals of order statistics for the mixture of two exponential distributions. Aligarh J. Statist. 7, 11-22.
Adatia, A. and L. K. Chan (1982). Robust procedures for estimating the scale parameter and predicting future order statistics of the Weibull distribution. IEEE Trans. Reliability R-31(5), 491-499.
Balasooriya, Uditha (1987). A comparison of the prediction of future order statistics for the 2-parameter gamma distribution. IEEE Trans. Reliability R-36(5), 591-594.
Balasooriya, Uditha (1989). Detection of outliers in the exponential distribution based on prediction. Commun. Statist. - Theory Meth., 711-720.
Balasooriya, Uditha and K. Lai Chan (1983). The prediction of future order statistics in the two-parameter Weibull distributions - a robust study. Sankhya B 45(3), 320-329.
Blom, G. (1958). Statistical Estimates and Transformed Beta-Variables. Almqvist and Wiksell, Uppsala, Sweden; Wiley, New York.
Calabria, R. and D. Pulcini (1994). Bayes 2-sample prediction for the inverse Weibull distribution. Commun. Statist. - Theory Meth. 23(6), 1811-1824.
Chou, Youn-Min (1988). One-sided simultaneous prediction intervals for the order statistics of l future samples from an exponential distribution. Commun. Statist. - Theory Meth. 17(11), 3995-4003.
Chou, Youn-Min and D. B. Owen (1986a). One-sided distribution-free and simultaneous prediction limits for p future samples. J. Qual. Tech. 18, 96-98.
Chou, Youn-Min and D. B. Owen (1986b). One-sided simultaneous lower prediction intervals for l future samples from a normal distribution. Technometrics 28(3), 247-251.
Dixit, Ulhas J. (1994). Bayesian approach to prediction in the presence of outliers for a Weibull distribution. Metrika 41, 127-136.
Dunsmore, I. R. (1974). The Bayesian predictive distribution in life testing models. Technometrics 16(3), 455-460.
Fligner, M. A. and D. A. Wolfe (1976). Some applications of sample analogues to the probability integral transformation and a coverage probability. Amer. Statist. 30, 78-85.
Fligner, M. A. and D. A. Wolfe (1979). Methods for obtaining a distribution-free prediction interval for the median of a future sample. J. Qual. Tech. 11, 192-198.
Geisser, S. (1975). The predictive sample reuse method with application. JASA 70, 320-328.
Geisser, S. (1993). Predictive Inference: An Introduction. Chapman Hall, New York.
Goldberger, A. S. (1962). Best linear unbiased prediction in the generalized regression model. JASA 57, 369-375.
Kaminsky, K. S. and P. I. Nelson (1974). Prediction intervals for the exponential distribution using subsets of the data. Technometrics 16(1), 57-59.
Kaminsky, K. S. and P. I. Nelson (1975a). Best linear unbiased prediction of order statistics in location and scale families. JASA 70(349), 145-150.
Kaminsky, K. S. and P. I. Nelson (1975b). Characterization of distributions by the form of the predictors of order statistics. In: G. P. Patil et al., eds., Statistical Decisions in Scientific Work 3, 113-115.
Kaminsky, K. S., N. R. Mann and P. I. Nelson (1975). Best and simplified prediction of order statistics in location and scale families. Biometrika 62(2), 525-527.
Kaminsky, K. S. and L. S. Rhodin (1978). The prediction information in the latest failure. JASA 73, 863-866.
Kaminsky, K. S. and L. S. Rhodin (1985). Maximum likelihood prediction. Ann. Inst. Statist. Math. 37, 507-517.
Lawless, J. F. (1971). A prediction problem concerning samples from the exponential distribution with application in life testing. Technometrics 13(4), 725-729.
Lawless, J. F. (1977). Prediction intervals for the two parameter exponential distribution. Technometrics 19(4), 469-472.
Lawless, J. F. (1982). Statistical Models and Methods for Lifetime Data. Wiley, New York.
Likeš, J. (1974). Prediction of sth ordered observation for the two-parameter exponential distribution. Technometrics 16(2), 241-244.
Lingappaiah, G. S. (1978). Bayesian approach to the prediction problem in gamma population. Demonstratio Mathematica 11(4), 907-920.
Lingappaiah, G. S. (1983). Prediction in samples from a normal population. J. Statist. Res. 17(1,2), 43-50.
Lingappaiah, G. S. (1985). A study of shifting models in life tests via Bayesian approach using semi-or-used priors (Soups). Ann. Inst. Statist. Math. 37(A), 151-163.
Lingappaiah, G. S. (1986). Bayesian approach to prediction in censored samples from the power function population. J. Bihar Math. Soc. 10, 60-70.
Lingappaiah, G. S. (1989a). Prediction in life tests based on an exponential distribution when outliers are present. Statistica 49(4), 585-593.
Lingappaiah, G. S. (1989b). Prediction in samples from a generalized logistic population of first or second kind when an outlier is present. Rev. Mat. Estat., Sao Paulo, 7, 87-95.
Lingappaiah, G. S. (1989c). Bayes prediction of maxima and minima in exponential life tests in the presence of outliers. J. Indust. Math. Soc. 39(2), 169-182.
Lingappaiah, G. S. (1991a). Prediction in exponential life tests where average lives are successively increasing. Pak. J. Statist. 7(1), 33-39.
Lingappaiah, G. S. (1991b). Prediction in samples from a gamma population in the presence of an outlier. Bull. Malaysian Math. Soc. (Second Series) 14, 1-14.
Lloyd, E. H. (1952). Least-squares estimation of location and scale parameters using order statistics. Biometrika 39, 88-95.
Malik, H. J. (1966). Exact moments of order statistics from the Pareto distribution. Skandinavisk Aktuarietidskrift 49(3-4), 144-157.
Malik, H. J. (1967). Exact moments of order statistics from a power-function distribution. Skandinavisk Aktuarietidskrift 50(3-4), 64-69.
Mann, N. R. (1968). Optimum estimators for linear functions of location and scale parameters. Ann. Math. Stat. 40, 2149-2155.
Nagaraja, H. N. (1984). Asymptotic linear prediction of extreme order statistics. Ann. Inst. Statist. Math., 289-299.
Nagaraja, H. N. (1986). Comparison of estimators and predictors from two-parameter exponential distribution. Sankhya Ser. B 48(1), 10-18.
Ogunyemi, O. T. and P. I. Nelson (1996). Adaptive and exact prediction intervals for order statistics. Commun. Statist. B, 1057-1074.
Patel, J. K. (1989). Prediction intervals - a review. Commun. Statist. - Theory Meth. 18(7), 2393-2465.
Raqab, M. Z. and H. N. Nagaraja (1992). On some predictors of future order statistics. Tech. Report No. 488, Ohio State Univ.
Stone, M. (1974). Cross-validatory choice and assessment of statistical prediction (with discussion). J. Roy. Stat. Soc. B 36, 111-147.
Takada, Y. (1985). Prediction limit for observation from exponential distribution. Canad. J. Statist. 13(4), 325-330.
Takada, Y. (1991). Median unbiasedness in an invariant prediction problem. Stat. and Prob. Lett. 12, 281-283.
Takada, Y. (1993). Uniformly most accurate equivariant prediction limit. Metrika 40, 51-61.
Van Zwet, W. R. (1964). Convex Transformations of Random Variables. Mathematisch Centrum, Amsterdam.
Watson, G. S. (1972). Prediction and the efficiency of least squares. Biometrika 59, 91-98.
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
16

The Probability Plot: Tests of Fit Based on the Correlation Coefficient
R. A. Lockhart and M. A. Stephens
1. Introduction
1.1. The probability plot

Suppose a random sample X₁, X₂, ..., X_n comes from a distribution F₀(x) and let X(1), X(2), ..., X(n) be the order statistics. F₀(x) may be of the form F(w) with w = (x − α)/β; α is then the location parameter and β > 0 is the scale parameter of F₀(x). There may be other parameters in F(w), for example, a shape parameter; here we assume such parameters known, but α and β are unknown. We can suppose that the random sample of X-values has been constructed from a random sample w₁, w₂, ..., w_n from F(w), by the transformation
x_i = α + β w_i .   (1)
If the order statistics of the w-sample are w(1) < w(2) < ··· < w(n), we have also

x(i) = α + β w(i) .   (2)
Let m_i be E(w(i)) and let v_ij be E{(w(i) − m_i)(w(j) − m_j)}; let V be the n × n matrix with entries v_ij. V is the covariance matrix of the order statistics w(i). From (2) we have

E(X(i)) = α + β m_i   (3)
and a plot of X(i) against m_i should be approximately a straight line with intercept α on the vertical axis and slope β. The values m_i are the most natural numbers to plot along the horizontal axis to achieve a straight-line plot, but for most distributions they are difficult to calculate. Various authors have therefore proposed alternatives T_i which are convenient functions of i; then (2) can be replaced by the model
X(i) = α + β T_i + e_i   (4)

where e_i is an "error" which has mean zero only for T_i = m_i.
A common choice for T_i is H_i ≡ F⁻¹{i/(n + 1)} or similar expressions which approximate m_i. A plot of X(i) against T_i is called a probability plot and the T_i are plotting positions. Historically, the plot was often made with T_i on the vertical axis and X(i) on the horizontal axis, but we shall think of the plot with these axes reversed. Also, when H_i is used, the values i/(n + 1) were marked along the H_i axis at the actual value of H_i; this axis is thus distorted, but the resulting paper (called probability paper) is then much easier to use, since only values of i/(n + 1) are required and the actual values of H_i need not be calculated. When the plot is made, a test of

H₀: the sample comes from F₀(x) ,   (5)
can then be based on how well the data fit the line (3) or (4). As an example, suppose it is desired to test that the X-sample is normally distributed, with unknown mean μ and variance σ². Then F(w) = ∫_{−∞}^{w} (2π)^(−1/2) e^(−t²/2) dt and the w-sample is standard normal. Then (3) becomes

E(X(i)) = μ + σ m_i

where m_i are the expected values of standard normal order statistics. For this distribution, α = μ and β = σ.
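The construction above is straightforward to carry out numerically. The sketch below is our illustration, not part of the chapter: it simulates a normal sample, computes the plotting positions H_i = Φ⁻¹{i/(n + 1)}, and fits line (4) by ordinary least squares; the fitted intercept and slope should then be close to μ and σ respectively.

```python
import random
from statistics import NormalDist

def plotting_positions(n):
    """H_i = F^{-1}(i/(n+1)) for the standard normal F."""
    nd = NormalDist()
    return [nd.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

def ols_line(t, x):
    """Ordinary least squares fit of x on t; returns (intercept, slope)."""
    n = len(t)
    tbar, xbar = sum(t) / n, sum(x) / n
    sxt = sum((ti - tbar) * (xi - xbar) for ti, xi in zip(t, x))
    stt = sum((ti - tbar) ** 2 for ti in t)
    slope = sxt / stt
    return xbar - slope * tbar, slope

random.seed(1)
mu, sigma, n = 10.0, 2.0, 500
x = sorted(random.gauss(mu, sigma) for _ in range(n))  # order statistics X(i)
h = plotting_positions(n)
alpha_hat, beta_hat = ols_line(h, x)
print(alpha_hat, beta_hat)  # should be near mu = 10 and sigma = 2
```

Using the exact m_i instead of H_i would give a marginally better fit; the H_i are used here because they require only the inverse cdf.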
1.2. Measures of fit

The practice of plotting the X(i) against m_i (or against T_i) and looking to see if a straight line results is time-honoured as a quick technique for testing fit, particularly for testing normality. However, historically this appears to have been done by eye. An improvement is clearly to find a statistical measure of how well the data fit the line (4), and it is remarkable that this does not seem to have been done for many years after the introduction of the plot. Almost certainly this would have been because the tools were not then available to give tables for an appropriate test statistic. Three main approaches to measuring the fit can be identified. The first is simply to measure the correlation coefficient R(X, T) between the paired sets X(i) and T_i. A second method is to estimate the line α + β T_i, using generalized least squares to take into account the covariance of the order statistics, and then to base the test of fit on the sum of squares of residuals. A closely related procedure is to fit a higher-order regression equation for X(i) against T_i, and then to test that the coefficients of the higher-order terms are zero. Finally, a third technique is to estimate β from (2) using generalized least squares, and to compare this estimate with the estimate of scale given by the sample standard deviation. For all these methods an investigation of the null distribution of the resulting test statistic, for finite n, would require Monte Carlo methods, and high-speed computers were not available when the probability plot was first used; even asymptotic theory is greatly facilitated by modern probability methods which arrived only much later. In this article we give the asymptotic theory for the first of the methods above,
that based on the correlation coefficient. In a later section, the other techniques will be briefly surveyed, and connections made.
1.3. The correlation coefficient

The correlation coefficient R(X, T) is an attractive measure of straight-line fit, if only for the reason that the concept of correlation is well known to applied workers. In what follows we extend the usual meaning of correlation, which applies to two random variables, and also that of variance and covariance, to apply when one of the pair, T_i, is a constant. Thus let X refer to the vector X(1), X(2), ..., X(n), let m refer to the vector m₁, m₂, ..., m_n, and let T refer to the vector T₁, T₂, ..., T_n; let X̄ = Σ_{i=1}^n X(i)/n and T̄ = Σ_{i=1}^n T_i/n, and define the sums

S(X, T) = Σ_{i=1}^n (X(i) − X̄)(T_i − T̄) ,

S(X, X) = Σ_{i=1}^n (X(i) − X̄)² ,

S(T, T) = Σ_{i=1}^n (T_i − T̄)² .

S(X, X) will often be called S². The variance of X is then V(X, X) = S(X, X)/(n − 1), the variance of T is V(T, T) = S(T, T)/(n − 1), and the covariance of X and T is V(X, T) = S(X, T)/(n − 1). The correlation coefficient between X and T is

R(X, T) = V(X, T)/√{V(X, X)V(T, T)} = S(X, T)/√{S(X, X)S(T, T)} .
We now see another reason why the statistic R(X, m) (sometimes called simply R) is an attractive statistic for testing the fit of X to the model (2): if a "perfect" sample is given, that is, a sample whose ordered values fall exactly at their expected values, R(X, m) will be 1; and with a real data set, the value of R(X, m) can be interpreted as a measure of how closely the sample resembles a perfect sample. Then tests based on R(X, m), or equivalently on R²(X, m), will be one-tailed; rejection of H₀ occurs only for low values of R². Suppose X̂(i) = α̂ + β̂ T_i, where α̂ and β̂ are the usual regression estimators of α and β (ignoring the covariance between the X(i)). From the standard ANOVA table for straight-line regression:
Regression SS = S²(X, T)/S(T, T)

Error SS = S² − S²(X, T)/S(T, T) = Σ_{i=1}^n (X(i) − X̂(i))²

Total SS = S² = S(X, X)

it is clear that

Error SS / Total SS = 1 − R²(X, T) .
Define, for any T vector,

Z(X, T) = n{1 − R²(X, T)} .

Then Z(X, T) is a test statistic equivalent to R²(X, T), based on the sum of squares of the residuals after the line (4) has been fitted. Z(X, T) has, in common with many other goodness-of-fit statistics (e.g., chi-square and EDF statistics), the property that the larger Z(X, T) is, the worse the fit. Furthermore, in many practical situations (as in Case 1 of Section 2.1 below), Z has a limiting distribution, whereas R² tends in probability to one. Sarkadi (1975) and Gerlach (1979) have shown consistency for correlation tests based on R(X, m), or equivalently Z(X, m), for a wide class of distributions, including all the usual continuous distributions. This is to be expected, since for large n we expect a sample to become perfect in the sense above. We can expect the consistency property to extend to R(X, T) provided T approaches m sufficiently rapidly for large samples.
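As a concrete illustration (ours, not from the chapter), the sketch below computes Z(X, T) = n{1 − R²(X, T)} for a small sample using the normal plotting positions T_i = H_i, and checks the ANOVA identity Error SS / Total SS = 1 − R² numerically.

```python
from statistics import NormalDist

def corr(x, t):
    """Correlation coefficient R(X, T) between ordered data and plotting positions."""
    n = len(x)
    xb, tb = sum(x) / n, sum(t) / n
    sxt = sum((a - xb) * (b - tb) for a, b in zip(x, t))
    sxx = sum((a - xb) ** 2 for a in x)
    stt = sum((b - tb) ** 2 for b in t)
    return sxt / (sxx * stt) ** 0.5

def z_stat(x_ordered, t):
    """Z(X, T) = n{1 - R^2(X, T)}; larger values indicate a worse straight-line fit."""
    r = corr(x_ordered, t)
    return len(x_ordered) * (1.0 - r * r)

x = sorted([4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.8, 4.4, 5.0, 5.3])
n = len(x)
h = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
print(z_stat(x, h))

# Numerical check of the ANOVA identity: Error SS / Total SS = 1 - R^2.
xb, hb = sum(x) / n, sum(h) / n
beta = sum((a - xb) * (b - hb) for a, b in zip(x, h)) / sum((b - hb) ** 2 for b in h)
alpha = xb - beta * hb
err_ss = sum((a - (alpha + beta * b)) ** 2 for a, b in zip(x, h))
tot_ss = sum((a - xb) ** 2 for a in x)
assert abs(err_ss / tot_ss - (1.0 - corr(x, h) ** 2)) < 1e-9
```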
1.4. Censored data

R(X, T) can easily be calculated for censored data, provided the ranks of the available X(i) are known. These X(i) are paired with the appropriate T_i and R(X, T) is calculated using the same formula as above, with the sums running over the known i. For example, if the data were right-censored, so that only the r smallest values X(i) were available, the sums would run for i from 1 to r; if the data were left-censored, with the first s values missing, the i would run from s + 1 to n. Tables of Z(X, T) = n{1 − R²(X, T)} for T = m or T = H, for testing for the uniform, normal, exponential, logistic, or extreme-value distributions, and with various fractions of the data censored, have been published by Stephens (1986a). Note that the factor n, and not the number of observations available, is used in calculating Z(X, T) for censored data. The only exception to this is when the test is for the uniform distribution (see Section 4).
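A sketch of the censored-data computation (the helper names are ours): for a right-censored sample only ranks 1, ..., r enter the sums, but the factor n is retained in Z.

```python
from statistics import NormalDist

def corr_censored(x_avail, ranks, n):
    """R(X, T) from the available order statistics X(i), i in `ranks`,
    with plotting positions H_i = Phi^{-1}{i/(n+1)} for the normal test."""
    nd = NormalDist()
    t = [nd.inv_cdf(i / (n + 1)) for i in ranks]
    m = len(x_avail)
    xb, tb = sum(x_avail) / m, sum(t) / m
    sxt = sum((a - xb) * (b - tb) for a, b in zip(x_avail, t))
    sxx = sum((a - xb) ** 2 for a in x_avail)
    stt = sum((b - tb) ** 2 for b in t)
    return sxt / (sxx * stt) ** 0.5

# Right censoring: only the r = 7 smallest of n = 10 observations were seen.
n, r = 10, 7
x_avail = [3.8, 4.2, 4.4, 4.9, 5.0, 5.1, 5.3]  # already ordered
ranks = list(range(1, r + 1))
R = corr_censored(x_avail, ranks, n)
Z = n * (1 - R * R)  # note the factor n, not r
print(R, Z)
```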
2. Distribution theory for the correlation coefficient

2.1. The general case

We now discuss the asymptotic behaviour of Z(X, m) for the general test of H₀ given in (5). Suppose F(w) is a continuous distribution, and let f(w) be its density. We observe ordered values X(k) < X(k+1) < ··· < X(r) from a sample of size n from the distribution F₀(x). We can assume that the sample comes from F₀(x) with α = 0 and β = 1, that is, from F(w), although (3) is fitted without this knowledge. Note that the sample may be left- and/or right-censored; suppose the given number of observations is n* = r + 1 − k. We shall proceed heuristically, assuming firstly that the limit of Z(X, m) equals that of Z(X, H). We therefore study Z(X, H), which for convenience we abbreviate to Z_n. The expression X(i) − X̂(i) = X(i) − α̂ − β̂ H_i may be written X(i) − H_i − (X̄ − H̄) − (β̂ − 1)(H_i − H̄), where α̂ and β̂ are the estimates of α and β in (4) when T_i = H_i, X̄ = Σ_{i=k}^r X(i)/n*, and H̄ = Σ_{i=k}^r H_i/n*. Then
Z(X, H) = n{1 − R²(X, H)} = Σ_{i=k}^r (X(i) − X̂(i))² / {n⁻¹ Σ_{i=k}^r (X(i) − X̄)²} .
Let p = (k − 1)/n and q = r/n, and let W = 1/(q − p). Also let q* ≡ F⁻¹(q) and p* = F⁻¹(p). It will be convenient to define the function ψ(t) = (F⁻¹(t) − μ)/σ, where the parameters μ and σ are defined by

μ = ∫_p^q F⁻¹(s) W ds = ∫_{p*}^{q*} x f(x) W dx ,

and

σ² = ∫_p^q (F⁻¹(s) − μ)² W ds = ∫_{p*}^{q*} x² f(x) W dx − μ² .
Finally, with m = n + 1, define the quantile process

Q_n(t) = √n {X([mt]) − F⁻¹(t)} ;

note that when t = i/m, Q_n(t) = √n {X(i) − H_i}. Also define the process Y_n(t) derived from Q_n(t) as follows:

Y_n(t) = Q_n(t) − ∫_p^q Q_n(s) W ds − ψ(t) ∫_p^q ψ(s) Q_n(s) W ds .

We now consider the terms in X(i) − X̂(i) expressed in terms of these processes. Let K = ∫_p^q Q_n(s) W ds; then √n (X̄ − H̄) = K + o_p(1).
Also

√n (β̂ − 1) = (W/σ) ∫_p^q ψ(t){Q_n(t) − K} dt + o_p(1) .

Then insertion of these expressions into X(i) − X̂(i) = X(i) − H_i − (X̄ − H̄) − (β̂ − 1)(H_i − H̄) gives the numerator of Z_n equal to ∫_p^q Y_n²(t) dt + o_p(1). For the limiting behaviour of Z_n, suppose Z_∞ is a random variable with the limiting distribution of Z_n as n → ∞. The process Q_n(t) tends to Q(t), a Gaussian process with mean 0 and covariance

ρ₀(s, t) = {min(s, t) − st} / {f(F⁻¹(s)) f(F⁻¹(t))} .

Also, the process Y_n(t) tends to

Y(t) = Q(t) − ∫_p^q Q(s) W ds − ψ(t) ∫_p^q ψ(s) Q(s) W ds ;

this process is a Gaussian process with mean 0 and covariance

ρ(s, t) = ρ₀(s, t) − ψ(s) ∫_p^q ψ(u) ρ₀(u, t) W du − ψ(t) ∫_p^q ψ(u) ρ₀(s, u) W du
   − ∫_p^q ρ₀(u, t) W du − ∫_p^q ρ₀(s, u) W du + ∫_p^q ∫_p^q ρ₀(u, v) W² du dv
   + ψ(s)ψ(t) ∫_p^q ∫_p^q ρ₀(u, v) ψ(u) ψ(v) W² du dv
   + (ψ(s) + ψ(t)) ∫_p^q ∫_p^q ρ₀(u, v) ψ(u) W² du dv .
The numerator of Z_∞ is then T = ∫_p^q Y²(t) dt, and the denominator is σ². Thus the limiting behaviour of Z_n depends on the behaviour of T = ∫_p^q Y²(t) dt, which depends on ∫_p^q Q²(t) dt; the behaviour of this integral in turn depends ultimately on the covariance function ρ₀(s, t) through the following two integrals:

J₁ = ∫_p^q ρ₀(t, t) dt

and

J₂ = ∫_p^q ∫_p^q ρ₀²(s, t) ds dt .

The first integral decides whether or not Z_n has a finite limiting mean, and the second whether it has a finite limiting variance. There are then three possible cases
guiding the limiting behaviour of Z_n, depending on whether or not J₁ and J₂ are finite or infinite. These will be described and then illustrated with examples of tests on well-known distributions.

CASE 1. In this case both J₁ and J₂ are finite. This is a situation which occurs often with other goodness-of-fit statistics which are asymptotically functionals of a Gaussian process, for example, statistics based on the empirical distribution function (EDF). There is then a well-developed theory to obtain the limiting distribution of the functional T = ∫_p^q Y²(t) dt and hence of Z_n above; see, for example, Stephens (1976). The limiting distribution of Z_n takes the form
Z_∞ = (1/σ²) Σ_{i=1}^∞ v_i/λ_i   (6)

where the v_i are independent χ₁² variables and the λ_i are eigenvalues of the integral equation

f(s) = λ ∫_p^q ρ(s, t) f(t) dt .   (7)
The mean of Z_∞ is Σ_{i=1}^∞ λ_i⁻¹/σ² and the variance is Σ_{i=1}^∞ 2λ_i⁻²/σ⁴; these will be finite when both J₁ and J₂ are finite.

CASE 2. Suppose J₂ is finite but J₁ = ∞. In this case the limit of the mean of Z_n is Σ_{i=1}^∞ λ_i⁻¹ = ∞, and there exists a_n → ∞ such that

Z_n − a_n = n(1 − R²) − a_n ⇒ (1/σ²) Σ_{i=1}^∞ λ_i⁻¹(v_i − 1) ,   (8)

where the λ_i are again the eigenvalues of (7), and the v_i are again independent χ₁² variables.

CASE 3. In this case both integrals J₁ and J₂ are infinite. Then in regular cases there exist constants a_n and b_n such that

(Z_n − a_n)/b_n = {n(1 − R²) − a_n}/b_n ⇒ N(0, 1) .   (9)
2.2. Examples

1. Test for the uniform distribution - Case 1
Suppose the test is for the uniform distribution over the interval (a, b), with parameters a and b unknown. For any p or q Case 1 applies, and (r − k + 1)(1 − R²) has the same limiting distribution regardless of p or q. This test will be discussed in detail in Section 4.
2. Test for the exponential distribution - Cases 1 and 3
The test is for F(x) = 1 − e^(−x/θ), 0 ≤ x < ∞, with θ > 0 and unknown. This test has been extensively examined by Lockhart (1985), who gave the following results. For q < 1 (right-censored data) Case 1 applies and the distribution is a sum of weighted chi-squared variables. This case is important when the exponential distribution is used to model, for example, survival times. For q = 1 we have Case 3; a_n = log n and b_n = 2√(log n), so that

{n(1 − R²) − log n} / {2√(log n)} ⇒ N(0, 1) .

3. Test for the logistic distribution - Cases 1 and 3
This test is for F(x) = 1/(1 + e^(−x)), −∞ < x < ∞. For p > 0 and q < 1 we get Case 1. Thus the logistic test, when both tails are censored, is similar to the exponential test. For p = 0 or q = 1 or both we get Case 3. For complete samples, where p = 0 and q = 1, McLaren and Lockhart (1987) have shown that a_n = log n and b_n = 2^(3/2)√(log n).

4. Test for the extreme value distribution - Cases 1 and 3
Suppose the tested distribution is F(x) = exp(−e^(−x)), −∞ < x < ∞; we shall call this distribution EV1. When q < 1 Case 1 occurs, and Case 3 occurs when q = 1. McLaren and Lockhart have shown that for complete samples, a_n = log n and b_n = 2√(log n), as for the exponential test. When the test is for the extreme value distribution (EV2) in the form F(x) = 1 − exp(−e^x), −∞ < x < ∞, Case 1 occurs when p > 0 and for any value of q; for p = 0, Case 3 occurs. This is to be expected since EV2 is the distribution of −X when X has distribution EV1.

5. Test for the normal distribution - Cases 1 and 2
Suppose the test is for the normal distribution with mean and variance unknown. Then it may be shown that, for p > 0 and q < 1, that is, for data censored at both ends, we get Case 1 (both J₁ and J₂ are finite), while for p = 0 or q = 1 or both we get Case 2. In the next section this test is examined and compared with similar tests.
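The Case 3 behaviour of the complete-sample exponential test can be illustrated by simulation (our sketch; the plotting positions H_i = −log{1 − i/(n + 1)} are the unit-exponential quantiles). Since a_n = log n, the Monte Carlo mean of Z_n = n(1 − R²) should increase with n roughly like log n.

```python
import math
import random

def z_exp(n, rng):
    """Z_n = n(1 - R^2) for a complete unit-exponential sample,
    using plotting positions H_i = -log(1 - i/(n+1))."""
    x = sorted(rng.expovariate(1.0) for _ in range(n))
    h = [-math.log(1.0 - i / (n + 1)) for i in range(1, n + 1)]
    xb, hb = sum(x) / n, sum(h) / n
    sxh = sum((a - xb) * (b - hb) for a, b in zip(x, h))
    sxx = sum((a - xb) ** 2 for a in x)
    shh = sum((b - hb) ** 2 for b in h)
    return n * (1.0 - sxh * sxh / (sxx * shh))

rng = random.Random(7)
reps = 300
mean_small = sum(z_exp(50, rng) for _ in range(reps)) / reps
mean_large = sum(z_exp(400, rng) for _ in range(reps)) / reps
print(mean_small, mean_large)  # the mean grows with n, roughly like log n
```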
3. Tests for the normal distribution

3.1. The correlation test

The probability plot for testing normality has such a long history and has been used so often that it seems worthwhile to treat this example in greater detail. Historically, it was for testing normality that probability paper has been much used, especially for the various effects arising in the analysis of factorial experiments; see, for example, Davies (1956). It is also for the normal test that most of the more recent work has been done to bring some structure into tests based on such plots, starting with the well-known test of Shapiro and Wilk (1965). We
show in the next section that there are interesting connections between this test and the correlation test (the two are asymptotically equivalent), but only for the test for normality. Also in the next section we discuss briefly extensions of the probability-plot technique which are developed by fitting a polynomial to the plot rather than the simple linear equation (2). As was stated above, when the test is for the normal distribution with mean and variance unknown, we have that, for p > 0 and q < 1, that is, for data censored at both ends, Case 1 occurs (both J₁ and J₂ are finite), while for p = 0 or q = 1 or both, Case 2 occurs. The results for p = 0 and q = 1 (that is, for a complete sample) were shown by de Wet and Venter (1972), using an approach somewhat different from that given above. De Wet and Venter show
Z_n − a_n = n(1 − R²) − a_n ⇒ Σ_{i=1}^∞ λ_i⁻¹(v_i − 1) ,   (10)
that is, Case 2 of Section 2.1 above with σ = 1. These authors give a table of values of a_n for given n, and also tabulate the asymptotic distribution. They also considered the case where both μ and σ are given, so that the test reduces to a test for the standard normal distribution, and also the cases where one of the parameters is known and the other unknown. In all these cases the estimates of unknown parameters must be efficient; the obvious choices are the usual maximum likelihood estimates. A natural choice of a_n, in Cases 2 and 3, will be the mean of Z_n, and for the normal tests this can be found. Consider again the case where both parameters are unknown, and suppose the test statistic is Z(X, m). The statistic is scale-free, and we can assume the true σ = 1. Consider R²(X, m) = T/m′m, where T = N/D, N = S²(X, m) is the numerator of T and D = S(X, X) is the denominator, using the notation of Section 1. Because D is a completely sufficient statistic for σ, T is distributed independently of D. It follows that the mean of T is the ratio of the mean of N and the mean of D. The mean of D is n − 1, so it remains to find the mean of N; therefore we need E{S²(X, m)}. Let V be the covariance matrix of standard normal order statistics. We have S(X, m) = m′X, so S²(X, m) = m′XX′m, and its expectation is m′(V + mm′)m = m′Vm + (m′m)². Thus the mean of R²(X, m) is

E{R²(X, m)} = {(m′Vm/m′m) + m′m}/(n − 1) .   (11)

Using (11) in a_n = E(Z_n) we obtain

a_n = {n/(n − 1)}{(n − 1) − (m′Vm/m′m) − m′m} .

Asymptotically, using the result Vm → m/2 (Stephens, 1975; Leslie, 1987), we find

a_n = n − 1.5 − m′m = trace(V) − 1.5 .
Tables of m_i exist for a wide range of values of n; they can also be obtained in some computer packages. Balakrishnan (1984) gives an algorithm to calculate m′m directly, and Davis and Stephens (1977) give an algorithm for V.
When the parameters are known, it is best to substitute X′_i = (X(i) − μ)/σ. The test statistic corresponding to Z_n is, say, Z_{0,n} = Σ_{i=1}^n (X′_i − m_i)²; the mean of Z_{0,n} is a_{0n}, equal to trace(V) = n − m′m. Asymptotically, a_n = a_{0n} − 1.5. When both parameters are unknown and H is used instead of m, algebra similar to the above gives for the new a_n, say a_{nH},
a_{nH} = {n/(n − 1)}{(n − 1) − (H′VH/H′H) − (H′m)²/H′H} .

De Wet and Venter noted the relationship a_n = a_{0n} − 1.5 (approximately, for finite n) between the constants obtained in their consideration of the statistic Z(X, H). This is to be expected, since for large n the limiting distributions of Z(X, m) and Z(X, H) are the same (Leslie, Stephens and Fotopoulos, 1986). The expression for a_{0n} used by de Wet and Venter is

a_{0n} = {1/(n + 1)} Σ_{i=1}^n j(1 − j)[φ{Φ⁻¹(j)}]⁻² ,

where j = i/(n + 1), and where φ(·) and Φ(·) denote the standard normal density and distribution functions respectively. The term in the sum is the first term in a classic formula for approximating v_ii; see, for example, David (1981). An interesting feature of the various limiting distributions is that the weights in the infinite sum of weighted χ₁² variables are the harmonic series 1/j, j = 1, 2, .... The terms in the sum start at j = 1 when both parameters are given; they start at j = 1 but omit j = 2 when the mean is known but the variance is estimated from the sample; they start at j = 2 when the mean must be estimated but the variance is known; and they start at j = 3 when both parameters must be estimated.
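The two routes to a_{0n} can be compared numerically: the de Wet-Venter sum above, and trace(V) = n − m′m with m′m evaluated from an approximation to the expected normal order statistics (we use Blom's m_i ≈ Φ⁻¹{(i − 0.375)/(n + 0.25)}, an approximation introduced here purely for illustration). The sketch below, which is ours, shows that the two agree to within roughly ten percent even for small n.

```python
from statistics import NormalDist

nd = NormalDist()

def a0n_dewet_venter(n):
    """a_0n = {1/(n+1)} * sum_i j(1-j) / phi(Phi^{-1}(j))^2, with j = i/(n+1)."""
    total = 0.0
    for i in range(1, n + 1):
        j = i / (n + 1)
        total += j * (1.0 - j) / nd.pdf(nd.inv_cdf(j)) ** 2
    return total / (n + 1)

def a0n_trace(n):
    """trace(V) = n - m'm, with Blom's approximation to the scores m_i."""
    mm = sum(nd.inv_cdf((i - 0.375) / (n + 0.25)) ** 2 for i in range(1, n + 1))
    return n - mm

for n in (10, 50, 200):
    print(n, a0n_dewet_venter(n), a0n_trace(n))
```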
3.2. Other test procedures

(a) The Shapiro-Wilk and Shapiro-Francia tests
There is a fascinating connection between Z(X, m) and the well-known Shapiro-Wilk (1965) and Shapiro-Francia (1972) statistics for testing normality with complete samples and parameters unknown; these statistics derive from the third method of testing given in Section 1.2, namely to estimate β from (2) using generalized least squares, and to compare this estimate with the estimate of scale given by the sample standard deviation. In the test for normality, the estimate β̂, which is now an estimate of σ, becomes

β̂ = σ̂ = m′V⁻¹X/m′V⁻¹m ,   (12)
and the Shapiro-Wilk statistic is, to within a constant,

W = (m′V⁻¹X)²/ …

… 0. Table 2 gives Monte Carlo percentage points for Z_{3,n}, for a range of values of n, and also asymptotic points.
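The connection with the correlation statistic is exact for the Shapiro-Francia statistic: since the normal scores m sum to zero, W′ = (m′X)²/{S(X, X)·m′m} coincides with R²(X, m). The check below is ours, with Blom's approximation standing in for the exact scores m.

```python
import random
from statistics import NormalDist

nd = NormalDist()

def blom_scores(n):
    # Blom's approximation to the expected normal order statistics m_i.
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def shapiro_francia(x_ordered, m):
    """W' = (m'X)^2 / {S(X, X) * m'm}."""
    n = len(x_ordered)
    xb = sum(x_ordered) / n
    s2 = sum((a - xb) ** 2 for a in x_ordered)
    num = sum(a * b for a, b in zip(x_ordered, m)) ** 2
    return num / (s2 * sum(b * b for b in m))

def r_squared(x, t):
    n = len(x)
    xb, tb = sum(x) / n, sum(t) / n
    sxt = sum((a - xb) * (b - tb) for a, b in zip(x, t))
    return sxt * sxt / (sum((a - xb) ** 2 for a in x) * sum((b - tb) ** 2 for b in t))

random.seed(3)
x = sorted(random.gauss(0, 1) for _ in range(25))
m = blom_scores(25)
print(shapiro_francia(x, m), r_squared(x, m))  # equal up to rounding error
```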
Table 1
Critical points for Z_{2,n} and Z_{2A,n}

Z_{2,n}: upper tail significance level
n      0.50    0.25    0.15    0.10    0.05    0.025   0.01
4      0.690   1.240   1.94    3.47    8.67    20.3    47.0
6      0.763   1.323   1.89    2.59    4.74    8.49    17.0
8      0.806   1.364   1.85    2.37    3.78    6.29    11.4
10     0.832   1.388   1.88    2.34    3.40    5.30    8.9
12     0.848   1.407   1.89    2.33    3.27    4.80    7.8
18     0.877   1.438   1.91    2.32    3.12    4.26    6.3
20     0.881   1.444   1.92    2.32    3.10    4.18    6.0
40     0.907   1.470   1.93    2.32    3.03    3.82    5.1
60     0.916   1.480   1.93    2.32    3.00    3.73    4.9
80     0.920   1.485   1.94    2.32    2.99    3.71    4.9
100    0.922   1.488   1.94    2.32    2.98    3.70    4.8
∞      0.932   1.497   1.94    2.31    2.98    3.67    4.6

Z_{2A,n}: upper tail significance level
n      0.50    0.25    0.15    0.10    0.05    0.025   0.01
4      0.140   0.245   0.333   0.411   0.545   0.707   1.010
6      0.166   0.287   0.379   0.467   0.616   0.796   1.065
8      0.184   0.307   0.403   0.494   0.648   0.830   1.089
10     0.193   0.320   0.420   0.512   0.670   0.848   1.102
12     0.200   0.330   0.432   0.523   0.683   0.861   1.111
18     0.209   0.346   0.452   0.543   0.705   0.882   1.121
20     0.212   0.349   0.455   0.547   0.708   0.886   1.124
40     0.224   0.362   0.472   0.563   0.727   0.903   1.138
60     0.228   0.367   0.477   0.568   0.734   0.909   1.146
80     0.229   0.369   0.479   0.570   0.736   0.911   1.149
100    0.230   0.370   0.480   0.572   0.737   0.912   1.150
∞      0.233   0.374   0.485   0.578   0.744   0.917   1.155
Table 2
Critical points for Z_{3,n}

Upper tail significance level
n      0.50    0.25    0.15    0.10    0.05    0.025   0.01
4      0.344   0.559   0.734   0.888   1.089   1.238   1.388
6      0.441   0.703   0.901   1.053   1.325   1.590   1.918
8      0.495   0.792   1.000   1.163   1.474   1.739   2.100
10     0.535   0.833   1.068   1.245   1.532   1.846   2.294
12     0.560   0.864   1.093   1.280   1.608   1.918   2.360
18     0.605   0.940   1.147   1.348   1.672   2.008   2.503
20     0.610   0.960   1.200   1.370   1.680   2.025   2.520
40     0.640   0.980   1.215   1.396   1.732   2.076   2.580
60     0.648   0.988   1.227   1.410   1.750   2.092   2.590
80     0.658   0.997   1.228   1.418   1.760   2.104   2.610
∞      0.666   0.992   1.234   1.430   1.774   2.129   2.612
The derivation of the weights for Cases 2 and 3 is given in the Appendix.
4.2. Use of the tables with censored data

Suppose origin and scale are both unknown (Case 3), and the data are censored at both ends. Thus n* = r − k + 1 observations are available, consisting of all those between X(k) and X(r). R(X, T) may be calculated using the usual formula, but with sums for i from k to r, and with T_i = i/(n + 1) or T_i = i, or even T₁, T₂, ..., T_{n*} equal to 1, 2, ..., n*, these latter values for T being possibilities because R(X, m) is scale and location invariant. In effect, for this test, the sample can be treated as though it were complete. Then n*{1 − R²(X, T)} = Z(X, T) will be referred to Table 2, using the values for sample size n*.
4.3. Example

It is well known that if times Q(i), i = 1, 2, ..., n represent times of random events, occurring in order at the same rate, the Q(i) should be proportional to uniform order statistics U(i). Thus the Q(i) may be regressed against i/(n + 1), or equivalently against i as described above, to test that the events are random. Suppose Q(9), Q(10), ..., Q(20) represent a subset of such times, denoting times of breakdown of an industrial process. We wish to test that these are uniform; times Q(1) to Q(8) have been omitted because the process took time to stabilize and those events are not expected to have occurred at the same rate as the later times. The times Q(9), Q(10), ..., Q(20) are 82, 93, 120, 135, 137, 142, 162, 163, 210, 228, 233, 261. The value of Z(Q, T) = 12{1 − R²(Q, T)} = 0.464. Reference to Table 2 at line n = 12 shows that there is not significant evidence, even at the 50% level, to reject the hypothesis of uniformity.
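The computation in this example is easy to reproduce; the following sketch (ours, not from the chapter) regresses the twelve times on T_i = i and forms Z(Q, T) = n*{1 − R²}:

```python
import math

# Breakdown times Q(9), ..., Q(20) from the example, regressed on T_i = i
# (equivalent to i/(n* + 1), since R(X, T) is location and scale invariant).
q = [82, 93, 120, 135, 137, 142, 162, 163, 210, 228, 233, 261]
t = list(range(1, len(q) + 1))

def corr(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

z = len(q) * (1 - corr(q, t) ** 2)   # Z(Q, T) = n*{1 - R^2(Q, T)}
print(round(z, 3))                   # 0.464
```

Since 0.464 is below 0.560, the 50% point for n = 12 in Table 2, the hypothesis of uniformity is not rejected.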
5. Power of correlation tests
In this section we make some general remarks about the power of tests based on the correlation coefficient. Firstly, it is well known that the Shapiro-Wilk test gives very good power for testing normality; although not as superior as originally suggested, it is marginally better than the EDF statistics W² and A² (Stephens, 1974). Because of the connection noted in Section 3, we can expect the correlation test to have similarly good power for the normal test, where it has been mostly used. Also, the correlation test for the uniform distribution can be expected to be good, because of its connection with the EDF statistic W², which is powerful for this test. However, the uniform and normal distributions have relatively short tails, and power results are less impressive, at least for large samples, when we consider tests for heavy-tailed distributions. The EV and logistic distributions have tails similar to the exponential, and for testing these and similar distributions McLaren and Lockhart (1987) show that correlation tests can have asymptotic relative
efficiency zero when compared with EDF tests. If the tests are conducted at level α and the alternative distribution approaches the null at the rate 1/n^{1/2}, EDF tests will have power greater than α, whereas correlation-coefficient tests will have power equal only to α. It is clear that these results on relative efficiency apply because the values in the heavy tails are the important ones influencing efficiency of the tests; for the effect of heavy tails see Lockhart (1991). The effect may be seen from the probability plots themselves; it is a well-known feature of regression that widely spaced observations at one end of the set will be the influential values in determining the fit of the line. A direct comparison was made for the exponential test by Spinelli and Stephens (1987), using complete samples of sizes 10, 20 and 50. The results for these sample sizes are somewhat difficult to interpret, but they suggest that the EDF statistics W² and A² are overall better than the correlation-coefficient statistics, although the latter occasionally score well. In cases where the data are censored in the heavy tail (for this test, right-censored), and the asymptotic distribution reverts to Case 1, power results might well be somewhat different. More work is needed on comparisons for small, censored samples. It should be emphasised also that the rather negative theoretical results above are asymptotic results, and may only have a serious effect for very large samples. For small samples, the advantages of familiarity and graphical display will make correlation tests appealing in many circumstances; this will be especially so in the important situations where data are censored, since for Z tables exist (Stephens, 1986a), where they may not for EDF tests.
A. Appendix

A.1. Asymptotic theory for the uniform test - Case 3
In this section the asymptotic theory of Z(X, m) is given for Case 3 (the most difficult case, when both parameters of the uniform distribution are unknown), following the general theory developed in Section 2. It is convenient to write the fitted model as

X(i) = α + β(m_i − m̄) + ε_i .   (1)
As before, the processes are Q_n(t) = √n(X(i) − m_i) and Y_n(t) = √n(X(i) − X̂(i)), using the notation of Section 2; also

√n(X̄ − m̄) = ∫₀¹ Q_n(s) ds + O_p(n^{−½})

and
√n(β̂ − 1) = [ Σ_{i=1}^n (m_i − m̄)(X(i) − m_i)/n ] / [ Σ_{i=1}^n (m_i − m̄)²/n ]
          = 12 ∫₀¹ (t − ½){ Q_n(t) − ∫₀¹ Q_n(s) ds } dt + O_p(n^{−½}) ,

recalling that m̄ = 1/2 and Σ_{i=1}^n (m_i − m̄)²/n → 1/12. Then Y_n(t) becomes

Y_n(t) = √n(X(i) − X̂(i)) = Q_n(t) − ∫₀¹ Q_n(s) ds − 12(t − ½) ∫₀¹ (u − ½){ Q_n(u) − ∫₀¹ Q_n(s) ds } du + O_p(n^{−½}) .
As before, when n → ∞, let Q(t) and Y(t) be the limiting processes for Q_n(t) and Y_n(t) respectively. Q(t) is the well-known Brownian bridge with mean E{Q(t)} = 0 and covariance ρ₀(s, t) = min(s, t) − st. The process Q_n(t) − ∫₀¹ Q_n(s) ds has already been studied in connection with the Watson statistic U² (Watson, 1961; Stephens, 1976). For the asymptotic distribution of Z(X, m) we now need the distribution of

Z∞ = 12 ∫₀¹ Y²(t) dt .   (2)
The covariance function of Y(t) requires considerable algebra but the calculation is straightforward; the result is

ρ₃(s, t) = ρ₁(s, t) + v(s)v(t)/5 − v(s)w(t) − w(s)v(t)

with v(s) = s − 1/2, w(s) = s(1 − s)(2s − 1), and ρ₁(s, t) = ρ₀(s, t) − s(1 − s)/2 − t(1 − t)/2 + 1/12 the covariance of the process Q(t) − ∫₀¹ Q(s) ds. The distribution of Z∞ takes the form (Case 1 of Section 2)

Z∞ = 12 Σ_{i=1}^∞ v_i/λ_i ,   (3)

where the v_i are independent χ²₁ variables. The λ_i are eigenvalues of the integral equation

λ ∫₀¹ f(s) ρ₃(s, t) ds = f(t) .   (4)
Suppose the corresponding eigenfunctions are f_i(t). The solution of (4) is found as follows. The covariance ρ₃(s, t) can be written ρ₃(s, t) = min(s, t) + g(s, t), with

g(s, t) = 6st/5 − 11s/10 + 2s² − s³ − 11t/10 + 2t² − t³ + 2/15 − 3st² + 2st³ − 3s²t + 2s³t .

Differentiation of (4) twice with respect to t then gives
−f(t) − (6t − 4) ∫₀¹ f(s) ds + (12t − 6) ∫₀¹ s f(s) ds = (1/λ) f″(t) .   (5)
Differentiation again gives

−f′(t) − 6 ∫₀¹ f(s) ds + 12 ∫₀¹ s f(s) ds = (1/λ) f⁽³⁾(t)   (6)
and finally

−f″(t) = (1/λ) f⁽⁴⁾(t) .
Thus

f(t) = A cos(√λ t) + B sin(√λ t) + Ct + D .   (7)
Let K₀ = ∫₀¹ f(s) ds and K₁ = ∫₀¹ s f(s) ds. Set θ = √λ; then

K₀ = (A/θ) sin θ − (B/θ)(cos θ − 1) + C/2 + D   (8)
and

K₁ = ∫₀¹ s f(s) ds = A I₁ + B I₂ + C/3 + D/2 ,   (9)

where

I₁ = ∫₀¹ s cos θs ds = (θ sin θ + cos θ − 1)/θ²

and

I₂ = ∫₀¹ s sin θs ds = (sin θ − θ cos θ)/θ² .
Substituting f(t) into (5) gives −Ct − D + (4 − 6t)K₀ + (12t − 6)K₁ = 0 for all t; thus, equating coefficients, we have −C − 6K₀ + 12K₁ = 0 and −D + 4K₀ − 6K₁ = 0. Hence C/3 + D/2 = K₁ and C/2 + D = K₀. Thus from (8) we have A sin θ − B(cos θ − 1) = 0, and from (9) we have
A I₁ + B I₂ = 0 .   (10)
Hence θ must satisfy

sin θ/(cos θ − 1) = B/A = −I₁/I₂ = (1 − θ sin θ − cos θ)/(sin θ − θ cos θ) .   (11)
So θ satisfies 2 − θ sin θ − 2 cos θ = 0, by cross-multiplication of (11). Let φ = θ/2; then 2 − 4φ sin φ cos φ − 2[1 − 2 sin²φ] = 0, and hence sin φ = 0 or sin φ − φ cos φ = 0. Then φ_i = πi, i = 1, 2, ...; or alternatively φ_k is the solution of tan φ_k = φ_k, k = 1, 2, .... Finally, λ_i = 4φ_i² for the first λ-set, and λ_k = 4φ_k² for the second λ-set.
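The two λ-sets are easy to obtain numerically. The sketch below (ours, not part of the chapter) brackets the roots of tan φ = φ by bisection on sin φ − φ cos φ, and checks the trace identity Σ_i 1/λ_i = ∫₀¹ ρ₃(s, s) ds = 1/15 over both sets:

```python
import math

def phi_root(k, iters=80):
    """k-th root of tan(phi) = phi, i.e. sin(phi) - phi*cos(phi) = 0,
    bracketed in (k*pi, (k+1)*pi) where the sign of g alternates."""
    g = lambda p: math.sin(p) - p * math.cos(p)
    lo, hi = k * math.pi, (k + 1) * math.pi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

n = 4000
set1 = [4 * (math.pi * i) ** 2 for i in range(1, n + 1)]   # phi_i = pi * i
set2 = [4 * phi_root(k) ** 2 for k in range(1, n + 1)]     # tan(phi) = phi
trace = sum(1 / lam for lam in set1 + set2)
print(round(phi_root(1), 4))   # 4.4934
print(round(trace, 4))         # ~ 1/15 = 0.0667 (truncated sum)
```

The agreement of the truncated trace with 1/15 is a convenient check that both λ-sets have been computed correctly.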
A.2. Asymptotic theory - Case 2

For Case 2 the test statistic is Z_{2,n} = n Σ_{i=1}^n (X(i) − X̂(i))² / Σ_{i=1}^n (X(i) − X̄)². We can take α = 0 in the model E(X(i)) = α + βm_i, so that E(X(i)) = βm_i, and least squares gives β̂ = Σ_{i=1}^n X(i)m_i / Σ_{i=1}^n m_i². Hence β̂ − 1 = Σ_{i=1}^n (X(i) − m_i)m_i / Σ_{i=1}^n m_i². Similar reasoning to that for Case 3 gives the asymptotic distribution of Z_{2,n} to be that of 12 ∫₀¹ Y₂²(t) dt, where

Y₂(t) = Q(t) − 3t ∫₀¹ s Q(s) ds .   (12)
Q(t) is as defined in the previous section, and then Y₂(t) is a Gaussian process with mean 0; its covariance function (after some algebra) is

ρ₂(s, t) = min(s, t) − (9/5)st + st³/2 + s³t/2 .   (13)
Thus for the weights in the asymptotic distribution of Z_{2,n} we need the eigenvalues of λ ∫₀¹ ρ₂(s, t) f(s) ds = f(t). Similar steps to those for Case 3 give f(t) = A cos θt + B sin θt + Ct + D with θ = √λ, as before. Also, f(0) = 0, so D = −A, and

−f(t) + 3t ∫₀¹ s f(s) ds = (1/λ) f″(t) .   (14)

Thus f″(0) = 0, so D = A = 0. Then, from (14), we have

−B sin θt − Ct + 3t [ B ∫₀¹ s sin θs ds + ∫₀¹ C s² ds ] = −B sin θt .

Hence ∫₀¹ s sin θs ds = 0; thus θ_j is the solution of sin θ_j − θ_j cos θ_j = 0, that is, tan θ_j = θ_j, j = 1, 2, .... Finally, λ_j = θ_j². These are the weights given in Section 4.
A.3. Asymptotic percentage points

The final step is to calculate the percentage points of, say, Z₃∞ = 12 Σ_{i=1}^∞ v_i/λ_i, where the λ_i are the weights for Case 3. The mean μ₃ of Σ_{i=1}^∞ v_i/λ_i is ∫₀¹ ρ₃(s, s) ds = 1/15. The 80 largest weights λ_i⁻¹ were found, and Σ_{i=1}^∞ v_i/λ_i was approximated by S₁ = S* + T, where S* = Σ_{i=1}^{80} v_i/λ_i and T = μ₃ − Σ_{i=1}^{80} λ_i⁻¹. S₁ differs from Σ_{i=1}^∞ v_i/λ_i by Σ_{i=81}^∞ λ_i⁻¹(v_i − 1), which is a random variable with mean 0 and variance

2 Σ_{i=81}^∞ λ_i⁻² = 2 { ∫₀¹ ∫₀¹ ρ₃²(s, t) ds dt − Σ_{i=1}^{80} λ_i⁻² } ;

this value is negligibly small. Thus critical points of Z₃∞ are found by finding those of S*, using Imhof's (1961) method for a finite sum of weighted χ² variables, adding T, and rescaling by 12. These points are given in the last line of Table 2. Similar methods were used to give the asymptotic points for Z_{2,n} and Z_{2A,n} in Table 1.
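Imhof's (1961) method inverts the characteristic function of a finite sum of weighted χ² variables exactly. As a much rougher independent check (ours, Monte Carlo rather than Imhof), the ∞ line of Table 2 can be approximated by simulating a truncated version of Z∞ = 12 Σ v_i/λ_i, with the two λ-sets of A.1:

```python
import math, random

def tan_root(k, iters=60):
    # k-th positive root of tan(x) = x, bracketed in (k*pi, (k+1)*pi)
    g = lambda x: math.sin(x) - x * math.cos(x)
    lo, hi = k * math.pi, (k + 1) * math.pi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if g(lo) * g(mid) <= 0 else (mid, hi)
    return 0.5 * (lo + hi)

m = 200  # terms kept from each lambda-set
lams = [4 * (math.pi * i) ** 2 for i in range(1, m + 1)] + \
       [4 * tan_root(k) ** 2 for k in range(1, m + 1)]

random.seed(1)
reps = 5000
draws = sorted(12 * sum(random.gauss(0, 1) ** 2 / lam for lam in lams)
               for _ in range(reps))
median = draws[reps // 2]   # should sit near the tabulated 50% point, 0.666
```

The sample mean also lands near the exact value 12 Σ λ_i⁻¹ = 12/15 = 0.8, which gives a quick sanity check on the weights.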
References

Balakrishnan, N. (1984). Approximating the sum of squares of normal scores. Appl. Statist. 33, 242-245.
Coronel-Brizio, H. C. and M. A. Stephens (1996). Tests of fit based on probability plots. Research Report, Department of Mathematics and Statistics, Simon Fraser University, Burnaby, B.C., Canada, V5A 1S6.
David, H. A. (1981). Order Statistics. Wiley, New York.
Davies, O. L. (Ed.) (1956). The Design and Analysis of Industrial Experiments. Hafner, New York.
Davis, C. S. and M. A. Stephens (1977). The covariance matrix of normal order statistics. Comm. Statist. Simul. Comput. B6, 135-149.
De Wet, T. and J. H. Venter (1972). Asymptotic distributions of certain test criteria of normality. S. African Statist. J. 6, 135-149.
Durbin, J. (1973). Distribution Theory for Tests Based on the Sample Distribution Function. Regional Conference Series in Applied Mathematics, 9. SIAM, Philadelphia.
Durbin, J. and M. Knott (1972). Components of Cramér-von Mises statistics. J. Roy. Statist. Soc. B 34, 290-307.
Gerlach, B. (1979). A consistent correlation-type goodness-of-fit test; with application to the two-parameter Weibull distribution. Math. Operationsforsch. Statist. Ser. Statist. 10, 427-452.
Imhof, J. P. (1961). Computing the distribution of quadratic forms in normal variables. Biometrika 48, 419-426.
LaBrecque, J. (1977). Goodness-of-fit tests based on nonlinearity in probability plots. Technometrics 19, 293-306.
Lehman, H. Eugène (1973). On two modifications of the Cramér-von Mises statistic. J. Roy. Statist. Soc. B 35, 523.
Leslie, J. R. (1987). Asymptotic properties and new approximations for both the covariance matrix of normal order statistics and its inverse. In: Goodness-of-Fit (P. Révész, K. Sarkadi and P. K. Sen, eds.), 317-354. Elsevier, New York.
Leslie, J. R., M. A. Stephens and S. Fotopoulos (1986). Asymptotic distribution of the Shapiro-Wilk W for testing for normality. Ann. Statist. 14, 1497-1506.
Lockhart, R. A. (1985). The asymptotic distribution of the correlation coefficient in testing fit to the exponential distribution. Canad. J. Statist. 13, 253-256.
Lockhart, R. A. (1991). Overweight tails are inefficient. Ann. Statist. 19, 2254-2258.
Lockhart, R. A. and M. A. Stephens (1995). The Probability Plot: Consistency of Tests of Fit. Research Report, Department of Mathematics and Statistics, Simon Fraser University.
McLaren, C. G. and R. A. Lockhart (1987). On the asymptotic efficiency of certain correlation tests of fit. Canad. J. Statist. 15, 159-167.
Puri, M. L. and C. Radhakrishna Rao (1975). Augmenting Shapiro-Wilk test for normality. In: Contributions to Applied Statistics: Volume Dedicated to A. Linder, 129-139. Birkhäuser, New York.
Sarkadi, K. (1975). The consistency of the Shapiro-Francia test. Biometrika 62, 445-450.
Shapiro, S. S. and R. S. Francia (1972). Approximate analysis-of-variance test for normality. J. Amer. Statist. Assoc. 67, 215-216.
Shapiro, S. S. and M. B. Wilk (1965). An analysis-of-variance test for normality (complete samples). Biometrika 52, 591-611.
Spinelli, J. J. and M. A. Stephens (1987). Tests for exponentiality when origin and scale parameters are unknown. Technometrics 29, 471-476.
Stephens, M. A. (1974). EDF statistics for goodness-of-fit and some comparisons. J. Amer. Statist. Assoc. 69, 730-737.
Stephens, M. A. (1975). Asymptotic properties for covariance matrices of order statistics. Biometrika 62, 23-28.
Stephens, M. A. (1976). Asymptotic results for goodness-of-fit statistics with unknown parameters. Ann. Statist. 4, 357-369.
Stephens, M. A. (1986a). Tests based on regression and correlation. Chap. 5 in Goodness-of-Fit Techniques (R. B. D'Agostino and M. A. Stephens, eds.). Marcel Dekker, New York.
Stephens, M. A. (1986b). Tests for the uniform distribution. Chap. 8 in Goodness-of-Fit Techniques (R. B. D'Agostino and M. A. Stephens, eds.). Marcel Dekker, New York.
Watson, G. S. (1961). Goodness-of-fit tests on a circle. Biometrika 48, 109-114.
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
Distribution Assessment
Samuel Shapiro
1. Introduction
Many statistical analysis procedures require that the analyst assume some form for the distributional model which gave rise to the data. Early in the development of mathematical statistics the normal distribution was the model of choice; today there is a wide range of models to choose from. The accuracy of the analyses which require such assumptions depends on how close the chosen model is to the actual distribution. Thus it is not surprising that the history of "goodness of fit" goes back to the beginnings of modern statistics; the initial procedure in this area was developed by Pearson (1900) and is the well-known chi-squared goodness-of-fit test. Since Pearson's beginnings in this area there has been a plethora of procedures developed for a wide range of statistical models. Each of these procedures attempts to make use of some property of the model being tested and to use this property to differentiate between the model and other possible distributions. Since distributions can be described by their order statistics, it follows that properties of their order statistics can also be used to construct distributional tests. Some of these procedures use the sample order statistics directly, while others use the spacings between adjacent sample order statistics. In the following chapter we will limit the discussion to distributional tests based on order statistics which are composite (no assumptions about the values of the parameters are needed) and omnibus (having good power against a wide range of possible alternative models). Most of these procedures can only be used for location and scale parameter families or with distributions which can be transformed to a location-scale format. The procedures presented will be based on one of two rationales.
The first will use the regression relationship of the sample order statistics on the expected values of the order statistics from the standardized hypothesized model, i.e., the model stated in the null hypothesis with its location parameter equal to zero and its scale parameter equal to one. Letting Y(i, n) be the ith order statistic from a sample of size n from population f(y; μ, σ), we have the relationship

Y(i, n) = μ + σ m(i, n) + ε(i) ,   i = 1, 2, ..., n ,   (1)
where μ and σ are the location and scale parameters, m(i, n) is the expected value of the ith order statistic from a sample of size n from f(y; 0, 1), and ε(i) is the random error. The rationale behind these regression-type tests of fit is that if the data were sampled from the hypothesized model, the regression would be a straight line, or equivalently the correlation between Y(i, n) and the m(i, n) would be close to one. The second group of tests is based on some function of the weighted spacings of the sample order statistics, defined as

X(i, n) = K[Y(i, n) − Y(i − 1, n)] ,   i = 2, ..., n ,   (2)
where K is some weighting function. These ideas will be developed in the following sections of this chapter. An excellent reference for an extensive discussion of testing for distributional assumptions can be found in D'Agostino and Stephens (1986) and for a handbook description of such procedures see Shapiro (1990) or Chapter 6 of Wadsworth (1990).
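The regression rationale behind (1) can be sketched numerically. The following toy illustration (ours, not from the chapter) uses the normal model, approximating m(i, n) by Φ⁻¹ applied to Blom-type plotting positions (i − 3/8)/(n + 1/4); the least-squares slope and intercept then estimate σ and μ. The data are artificial, chosen to lie exactly on a line:

```python
from statistics import NormalDist

def blom_scores(n, alpha=0.375, beta=0.375):
    # m(i, n) ~ F^{-1}(pi_i) with pi_i = (i - alpha)/(n - alpha - beta + 1)
    inv = NormalDist().inv_cdf
    return [inv((i - alpha) / (n - alpha - beta + 1)) for i in range(1, n + 1)]

def fit_line(x, y):
    # least-squares intercept (estimates mu) and slope (estimates sigma)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / \
        sum((a - mx) ** 2 for a in x)
    return my - b * mx, b

m = blom_scores(20)
y = [10 + 2 * s for s in m]        # ideal ordered sample: mu = 10, sigma = 2
mu_hat, sigma_hat = fit_line(m, y)
print(round(mu_hat, 6), round(sigma_hat, 6))   # 10.0 2.0
```

With real data the points scatter about the line, and systematic curvature signals a wrong model, which is exactly what the probability plots of the next section display graphically.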
2. Probability plotting

2.1. Introduction

One of the earliest techniques using the order statistics in distributional assessment was a graphical procedure known as probability plotting. Probability plotting can be used with location and scale parameter families of distributions. This technique, while not an objective procedure, yields a graphical representation of the goodness of fit of the data to the hypothesized model; the extent and magnitude of the departures from the model are apparent. The underlying rationale is to use the regression equation expressed in (1) and plot the ordered observations against the expected values of the order statistics from the null distribution. If the selected model is correct, this plot will be approximately linear, up to perturbations due to random error, and the slope of the line will yield an estimate of the scale parameter, σ, and the intercept an estimate of the location parameter, μ. If the model is incorrect, then the plot will deviate from linearity, usually in a systematic pattern, and the analyst will be able to reject the hypothesized distribution. The procedure also highlights outlying observations. While a subjective assessment as to linearity must be made, it is possible to make informative decisions, and the ability to distinguish between the null and alternative models gets easier as the sample size increases. Some of the earliest work in the area of probability plotting was done by Mosteller and Tukey (1949) in connection with the use of binomial probability paper. Chernoff and Lieberman (1956) discussed the use of generalized probability paper, and Birnbaum (1959) and Daniel (1959) developed the concepts involved in using half-normal probability plots in connection with the analysis of 2ⁿ experimental designs. Wilk et al. (1962) discussed probability plotting for the
gamma distribution and Wilk and Gnanadesikan (1961) applied the technique in connection with graphical analysis of certain multivariate experiments. Elementary discussions on the construction of probability plots appear in many statistical texts such as Hahn and Shapiro (1967), Shapiro (1990), Nelson (1986) and D'Agostino and Stephens (1986).
2.2. Construction of plots

One of the major assets of this procedure is the ease of preparing the plots. It is not necessary to know the parameters of the distribution being hypothesized, nor the expected values of the order statistics from the null, standardized distribution. Special paper is available for a number of distributions where one of the scales has been transformed so that the user need only plot some function of the order number and sample size; the scaling of the paper transforms it to the corresponding value of m(i, n) in equation (1). The choice of the function depends on the null distribution and is based on the work of Blom (1958), who suggested that a good approximation to the mean of the ith order statistic is

m(i, n) ≈ F⁻¹(π_i) ,   where π_i = (i − α_i)/(n − α_i − β_i + 1) .
While various authors have made recommendations for specific values of α_i and β_i for a variety of distributions (for example, Blom (1958) suggested using 3/8 for both of these constants with the normal model), in many cases the plotting positions (i − 0.5)/n or i/(n + 1) are used as a general compromise. There is commercially available probability paper for the following distributions: normal, lognormal, exponential, extreme value, logistic, Weibull and chi squared (with degrees of freedom known, up to ten). The latter can be used for a gamma plot with known shape parameter if the shape parameter corresponds to the degrees of freedom for the chi squared paper. Wilk et al. (1962) describe how to construct probability paper for any gamma distribution when the shape parameter is known. Most statistical software packages have routines for the construction of probability plots, although for many of these the output is difficult to use. The following are the steps for constructing a probability plot if the computer is not used and the plotting paper is available.
1. Select the model to be tested and obtain a sheet of probability paper for the chosen model.
2. Let Y_i, i = 1, 2, ..., n be the unordered sample values. Obtain the sample order statistics by ordering the observations from smallest to largest and denote these as Y(i, n), where Y(1, n) ≤ Y(2, n) ≤ ... ≤ Y(n, n).

With an upper specification limit U, the lot is rejected if x̄ + ks > U; otherwise the lot is accepted. Again, using t′ = (U − x̄)/s, the batch will be rejected if t′ < k. Two critical quality levels are usually associated with acceptance sampling plans: an acceptable quality level (AQL) and a lot tolerance percent defective (LTPD), which is also called the rejectable quality level (RQL). AQL represents the percent defective considered acceptable as a process average. LTPD represents the level of quality that the consumer wants to have rejected.
The acceptance criteria and sample size are often chosen such that the probability of rejecting a lot coming from a process operating at the acceptable quality level (AQL) and the probability of accepting a lot coming from a process operating at the lot tolerance percent defective (LTPD) are preassigned values α and β, respectively (see, for example, Owen, 1963). Hence, the probability of accepting a lot coming from a process operating at the AQL is 1 − α, and the probability of rejecting a lot coming from a process operating at the LTPD is 1 − β. Alpha (α) is the producer's risk, and β is the consumer's risk. For normally distributed characteristics one uses the statistics μ̂ = x̄ and σ̂ = s. We will denote the latter plans as the sampling plans of Lieberman and Resnikoff (1955). An identical statement of the sampling-plan specification problem can be made in terms of hypothesis testing. Essentially we are seeking tests of the hypothesis concerning the fraction of defectives p:
Application of order statistics to sampling plans

H₀: p = AQL   and   Hₐ: p = LTPD .   (3)
Thus we state that

P{accept | H₀} = 1 − α   and   P{accept | Hₐ} = β ,   (4)

and we seek to determine n and k which satisfy this requirement.
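For normally distributed characteristics, (4) has a standard approximate closed-form solution for n and k (the textbook normal-theory formulas, not derived in this section). A sketch, with illustrative values AQL = 1%, LTPD = 6%, α = 0.05, β = 0.10:

```python
from math import ceil
from statistics import NormalDist

def variables_plan(aql, ltpd, alpha, beta):
    z = NormalDist().inv_cdf
    z1, z2 = z(1 - aql), z(1 - ltpd)        # standard normal quantiles
    za, zb = z(1 - alpha), z(1 - beta)
    k = (z1 * zb + z2 * za) / (za + zb)     # acceptability constant
    n = ((za + zb) / (z1 - z2)) ** 2 * (1 + k ** 2 / 2)
    return ceil(n), k

def p_accept(p, n, k):
    # normal approximation to the OC curve of the (n, k) plan
    zp = NormalDist().inv_cdf(1 - p)
    return NormalDist().cdf((zp - k) * (n / (1 + k ** 2 / 2)) ** 0.5)

n, k = variables_plan(0.01, 0.06, 0.05, 0.10)   # roughly n = 41, k = 1.89
```

By construction the approximate OC curve then returns about 1 − α at the AQL and about β at the LTPD, which is a useful self-check on the design.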
3. Robustness of variable sampling plans for normally distributed characteristics

Although the elegance of the variables method, and its efficiency when the assumption of normality is true and all observations are available, make this procedure superior to the attributes procedure, its sensitivity to the assumption of normality leads to the attributes procedure being used even when the variables method can be applied. Many authors have therefore studied the effect of non-normality on variables sampling plans. Owen (1969) gives a summary of the work in this area. More recent work on sampling plans for non-normal populations includes Masuda (1978), Rao et al. (1972), Schneider et al. (1981), Srivastava (1961) and Takagi (1972). In this article we will only discuss methods involving order statistics. Farlie (1983) has described some undesirable features of acceptance sampling by variables that have hindered its widespread use. He also gives an example to demonstrate the role outliers play in variable sampling plans. He considers an (n, k) sampling plan with n = 3 and k = 1.12. A lower limit L = 0 is specified. Hence, if x̄ − 1.12s ≥ 0, the batch is accepted; otherwise, it is rejected. From two batches, samples are taken where x₁ = 0.15, x₂ = 1.15, x₃ = 2.15 and y₁ = 0.15, y₂ = 1.15, y₃ = 3.05. The first sample leads to the acceptance of the associated batch because x̄ = 1.15, s_x = 1. The second sample leads to rejection of the associated batch since ȳ = 1.45 and s_y = 1.473. The result seems paradoxical since the y sample is intuitively better than the x sample, yet the better sample leads to the rejection of the batch and the poorer sample leads to acceptance of the batch. This paradox is caused by the large observation 3.05. The normality assumption translates the large value (far away from the lower specification limit) into evidence for large deviations in the other direction as well.
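Farlie's numerical paradox is easy to verify directly (a sketch of ours, not code from the article):

```python
from statistics import mean, stdev

def accept(sample, k=1.12, L=0.0):
    # (n, k) variables plan with a lower limit: accept iff xbar - k*s >= L
    return mean(sample) - k * stdev(sample) >= L

x = [0.15, 1.15, 2.15]
y = [0.15, 1.15, 3.05]          # dominates x observation by observation
print(accept(x), accept(y))     # True False: the better sample is rejected
```

The margin is small (x̄ − 1.12 s_x = 0.03 against ȳ − 1.12 s_y ≈ −0.20), but the single large observation 3.05 inflates s enough to flip the decision.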
Thus, due to the symmetry of the normal distribution, one expects small values in the batch close to the lower specification limit. Farlie (1983) states that sampling plans should have a property which relates sample quality to batch acceptance. Consider two samples, A = (x₁, x₂, ..., xₙ) and B = (y₁, y₂, ..., yₙ). Let x₍ᵢ₎ and y₍ᵢ₎ be the ith order statistics of samples A and B respectively. Sample A is preferred to sample B (with respect to a lower specification limit) if, and only if, x₍ᵢ₎ ≥ y₍ᵢ₎ for all i and x₍ᵢ₎ > y₍ᵢ₎ for at least one i. Intuitively there should be no sample leading to rejection of a batch which is preferred to a sample leading to acceptance of a batch. Farlie calls this "Property Q" and develops sampling plans based on order statistics which have this intuitive property. He considers the statistic
H. Schneider and F. Barbera
T(x₁, x₂, ..., xₙ) = Σ_{i=1}^n a_i x₍ᵢ₎ ,   (5)
where the weights a_i, i = 1, 2, ..., n are chosen to minimize the variance of the estimator T, with the restrictions that T is an unbiased estimator of μ + kσ and a_i ≥ 0 for i = 1, 2, ..., n. The latter requirement is needed to satisfy Property Q mentioned above. Farlie's sampling plans for lower specification limits result in plans which are censored from above, i.e., a_i = 0 for i = r + 1, r + 2, ..., n for some r < n. The sampling plans turn out to have increasing censoring (i.e., more sample items have weight zero) as the acceptability constant, k, increases. The relative efficiency, measured by the ratio of the asymptotic variances of the sampling plans of Lieberman and Resnikoff to the Property Q sampling plans, is very high. For instance, for a sample size of n = 10 and 50% censoring from above, the reported efficiency is still 95%. Symmetrical censoring was proposed by Tiku (1980) as a method of obtaining robust test procedures. He showed that symmetrical Type II censoring, where a fixed number of the sample items is truncated at both ends of the sample, is a powerful method of obtaining robust estimates of the location parameter of a population, and performs quite well relative to other well-known robust estimators. This is so because non-normality essentially comes from the tails of the distribution, and once the extreme observations (representing the tails) are censored, there is little difference between a non-normal sample and a normal sample. Subsequently, Kocherlakota and Balakrishnan (1984) used symmetrical censoring to obtain robust two-sided variable sampling plans. The authors use Tiku's modified maximum likelihood (MML) estimator to estimate the mean and standard deviation. A simulation study (Kocherlakota and Balakrishnan, 1984) suggests that these censored sampling plans are quite robust when applied to various non-normal distributions.
This means that while variable sampling plans by Lieberman and Resnikoff are very sensitive to deviations from normality, symmetrical censoring of the sample will result in probabilities of acceptance which are closer to the expected ones regardless of the distribution of the population.
4. Failure censored sampling plans

Consider a life test where the quality characteristic is time to failure, T, and the distribution function F(x; μ, σ) belongs to the location-scale family, i.e., the distribution of F(z), where z = (x − μ)/σ, is parameter free. Examples discussed later are the normal and the extreme value distributions. We note, however, that the sampling plans will apply to the Weibull and lognormal distributions because of the relationship between the two pairs of distributions: the logarithm of a Weibull distributed random variable is extreme value distributed, while the logarithm of a lognormal random variable is normally distributed.
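The quantile identity behind this log transformation can be checked in a few lines (our sketch; b and c are arbitrary illustrative Weibull parameters):

```python
import math

# If T ~ Weibull(shape c, scale b), then ln T has the (smallest) extreme-value
# distribution with location mu = ln(b) and scale sigma = 1/c.
b, c = 2.0, 1.5
mu, sigma = math.log(b), 1.0 / c

for u in (0.05, 0.25, 0.5, 0.75, 0.95):
    t_u = b * (-math.log(1 - u)) ** (1 / c)          # Weibull quantile
    ev_u = mu + sigma * math.log(-math.log(1 - u))   # extreme-value quantile
    assert abs(math.log(t_u) - ev_u) < 1e-12
```

This is why plans worked out for the extreme value (or normal) location-scale model carry over directly to Weibull (or lognormal) failure times after taking logarithms.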
Variable sampling plans such as those discussed earlier, and published for instance in MIL-STD-414 (1957), may be used. However, since it is time consuming to wait until the last item fails, these plans are not best suited for life tests. To save time, tests can be terminated before all test units have failed. The test can be discontinued after a prechosen time (time censored) or after a prechosen number of items have failed (failure censored). This paper is restricted to failure-censored sampling for location-scale family distributions, for two reasons. First, it is easier to draw inference from failure-censored samples than from time-censored samples. The reason is that the covariance matrix of estimators for the location, μ, and scale, σ, parameters of the distribution depends on the true values of μ and σ only through the pivotal quantity

u = (x(r) − μ)/σ ,   (6)
where x(r) is the censoring point in time, i.e., the largest observed failure. For failure-censored samples this quantity u is fixed, but for time censoring it has to be estimated. Consequently, the design and performance of failure-censored sampling plans do not depend on the unknown parameters of the distribution function, as do the design and performance of time-censored sampling plans. Second, for time censoring there might not be any failures, in which case it is impossible to estimate both parameters of the distribution. In practice, however, time-censored sampling may be preferred. This is partly because most life-test sampling plans have a constraint on the total amount of time spent on testing. Although the test time of failure-censored sampling plans is random, the distribution of test time can be estimated from historical data. If pooled parameter estimates were used from batches that were accepted in the past, the distribution of test time for good quality could be estimated very accurately. (Poor quality will obviously have a shorter test time.) This distribution can then be used as a guide for choosing a plan. The accuracy of these parameter estimates does not influence the performance of the failure-censored sampling plans, but it does influence the distribution of the time the experimenter has to wait until the failures have occurred. Failure-censored sampling plans for the doubly exponential distribution were discussed by Fertig and Mann (1980) and by Hosono, Okta and Kase (1981). Schneider (1989) presented failure-censored sampling plans for the lognormal and Weibull distributions. Bai, Kim and Chun (1993) extended these plans to accelerated life-test sampling plans. The main difference between the various published plans is the type of estimators used. We will first describe the general methodology for developing failure-censored sampling plans.
To prepare for this description, note that an analogy can be made between the sampling plans presented in this section and the variable-sampling plans by Lieberman and Resnikoff which control the fraction defective of a product, wherein the item is defective if its measured variate, X, is less than some limit, L. Since the reliability is just the fraction of items with failure times greater than the specified mission time, the time an item is required to perform its stated mission, we may equate unreliability at the mission time with fraction defective below L. Hence we
H. Schneider and F. Barbera
will use L as the notation for mission time. Analogously, one may define (Fertig and Mann 1980) the Acceptable Reliability Level (ARL) as the reliability level at which we want a 1 − α probability of acceptance, and the Lot Tolerance Reliability Level (LTRL) as that level at which we want a 1 − β probability of lot rejection. The statistic whose realization is used to decide whether or not a lot should be accepted in the normal case is (x̄ − L)/s, with x̄ the sample mean and s the adjusted sample standard deviation. This is the statistic which gives uniformly most accurate unbiased confidence bounds for the fraction defective p = Φ(L; μ, σ). Even though the Weibull case does not admit complete sufficient statistics as in the normal situation, there do exist statistics which can be used to obtain confidence intervals on the reliability R(L). Consider failure-censored data that can be generated by placing n items on life test and waiting until the r-th failure has occurred. Let the order statistics of the sample from the location-scale family be given by

x_(1),n , x_(2),n , x_(3),n , … , x_(r),n   (7)
where X_(i),n is the i-th order statistic of a sample of size n. For simplicity we will omit the index n and write X_(i). Note that for the lognormal and Weibull distributions we take the logarithm of the failure times to obtain a location-scale distribution. These order statistics can be used to test hypotheses concerning R(L). We may use t = μ̂ − kσ̂, where μ̂ and σ̂ are estimates of the location and scale parameters, respectively, and compare t with the mission time L. If t < L the batch is rejected; otherwise the batch is accepted. Equivalently, we may use t′ = (μ̂ − L)/σ̂ and reject the batch if t′ < k. The value k is called the acceptability constant and depends on the sample size, the censoring, the percentage of defectives and the covariance matrix of the estimators used. For the variables sampling plans of Lieberman and Resnikoff, specifying the combination of consumer's and producer's risk levels is sufficient to define both the acceptance criterion and the sample size. This is because censoring is not considered. When censoring is allowed, however, an added degree of freedom is introduced that requires the user to specify another criterion. In designing sampling plans one seeks the smallest sample size satisfying the specified levels of consumer's and producer's risk. The purpose of censoring is usually to obtain more failure information in a shorter period of time. Thus, placing three items on test and waiting until all three fail will take, on the average, a longer period of time than if one places 10 items on test and waits for three failures. Fertig and Mann (1980) therefore suggest that, with the introduction of censoring, some function of sample size and test time should be minimized subject to the consumer's and producer's risks in order to find a compromise between sample size and test time.
4.1. Plans based on the chi-square approximation for the statistic t

Fertig and Mann (1980b) consider the extreme value distribution and use the best linear invariant estimators (BLIE's) for its location and scale parameters μ and σ,
Application of order statistics to sampling plans
respectively, which are, for instance, described by Mann (1968). The distribution of t given in (1) (or t′ given in (2)) depends on μ and σ only through F, the standard extreme-value distribution. For the extreme value case the unreliability, or fraction defective, is

p = 1 − R(L) = 1 − exp{− exp[(L − μ)/σ]}   (8)

and thus

t_p = μ + σ ln(−ln[1 − p])   (9)
is the p·100th percentile of the reduced extreme-value distribution. The distribution of t has been tabulated by Mann and Fertig (1973) using Monte Carlo procedures. Engelhardt and Bain (1977) and Mann, Schafer, and Singpurwalla (1974) offer approximations that have been used to determine percentiles of the distribution of t for various ranges of p, n and r, the number of observed failures. The approximations were developed in order to construct confidence intervals on the reliability R(L) as well as tolerance bounds on x_(r) for specified p. Unfortunately, these approximations are not universally valid, as pointed out by Fertig and Mann (1980b). Lawless (1973) offers an exact procedure for obtaining confidence bounds on reliability and thus performing hypothesis testing. However, his method, which is based on an ancillary statistic, requires a numerical integration for each new sample taken, and is therefore not amenable to the construction of tables. Moreover, it is not clear how one could easily determine the risk under unacceptable alternatives (e.g., when the process is operating at the LTRL). Fertig and Mann (1980b) developed a chi-square approximation to define the sampling plans they presented.

4.2. Plans based on the normal approximation for the statistic t
Schneider (1989) used the maximum likelihood estimators of the location and scale parameters μ and σ and applied a large-sample approximation to the statistic t defined in (1). However, other estimators may be used as well. In what follows we shall use the best linear unbiased estimators (BLUE's) of Gupta (1952). The main difference between the plans described by Schneider (1989) and the plans described below is the covariance matrix used. Consider the BLUE's of μ and σ for a failure-censored sample of size n where only the first r failures are observed. The estimators are weighted sums of the order statistics x_(i), i = 1, 2, …, r, of a sample of size n:

μ̂ = Σ_{i=1}^{r} a_{i,n} x_(i)   (10)

σ̂ = Σ_{i=1}^{r} b_{i,n} x_(i)   (11)
where the a_{i,n} and b_{i,n} are coefficients depending on the sample size and the distribution of the measured characteristic. For the normal distribution they are tabulated in Sarhan and Greenberg (1962) for sample sizes up to n = 20. For the extreme value distribution the coefficients are tabulated in Nelson (1982) for sample sizes up to n = 12. Let μ̂ and σ̂ be the best linear unbiased estimators of μ and σ, respectively. We consider the statistic given in (1), i.e., t = μ̂ − kσ̂, which is an unbiased estimator of μ − kσ and is asymptotically normally distributed (Plackett, 1958). Let the covariance matrix of the estimators be given by

Var(μ̂, σ̂) = σ² [ γ11  γ12
                  γ12  γ22 ] .   (12)
For the normal distribution the factors γ_ij are tabulated in Sarhan and Greenberg (1962) for sample sizes up to n = 20. Nelson (1982) gives the factors γ_ij for the extreme value distribution for sample sizes up to n = 12. The variance of the statistic t is therefore

Var(t) = σ²{γ11 + k²γ22 − 2kγ12} .   (13)
In the following, large-sample theory is used to derive equations for the sample size and the acceptability constant for a given degree of censoring and given two points on the operating characteristic (OC) curve. The standardized variate

U = (t − (μ − kσ)) / √(σ²{γ11 + k²γ22 − 2kγ12})   (14)
is parameter-free and asymptotically standard normally distributed. Thus let z_p be the p·100th percentile of the log lifetime distribution (normal or extreme value) corresponding to the fraction nonconforming p; then the operating characteristic curve, which gives the probability of acceptance for various percent defectives, is approximately given by

P_a(p) ≈ Φ(−(z_p + k) / √(γ_{n,r}(k)))   (15)
where

γ_{n,r}(k) = γ11 + k²γ22 − 2kγ12   (16)
and Φ(x) is the cumulative standard normal distribution function. Suppose we would like to determine an (n, k) sampling plan for two given points on the OC curve, (p_α, 1 − α) and (p_β, β). It can be shown (Schneider, 1989) that the acceptability constant k is (asymptotically) dependent only on the percentiles of the log lifetime distribution and the standard normal distribution; with u_q denoting the q·100th percentile of the standard normal distribution,

k = (u_{1−α} z_{pβ} − u_β z_{pα}) / (u_β − u_{1−α}) .   (17)
Thus k can be determined independently of n and the degree of censoring. The sample size n satisfies the equation

n = ((u_{1−α} − u_β) / (z_{pβ} − z_{pα}))² n γ_{n,r}(k) .   (18)
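The two-point design can be sketched directly: k from (17), then a search for the smallest n whose variance factor meets (18). The function names below are ours, and the gamma_factor argument stands in for the tabulated factors γ_{n,r}(k); for illustration we pass a hypothetical uncensored-normal approximation, γ ≈ (1 + k²/2)/n, in place of a genuine censored-plan table.

```python
from statistics import NormalDist

def design_plan(p_alpha, alpha, p_beta, beta, gamma_factor, n_max=1000):
    """Sketch of a two-point (n, k) design: k from (17), then the
    smallest n whose factor gamma_{n,r}(k) satisfies (18)."""
    inv = NormalDist().inv_cdf
    z_pa, z_pb = inv(p_alpha), inv(p_beta)          # log-life percentiles
    u_1a, u_b = inv(1.0 - alpha), inv(beta)         # standard normal percentiles
    k = (u_1a * z_pb - u_b * z_pa) / (u_b - u_1a)   # acceptability constant (17)
    target = ((z_pb - z_pa) / (u_1a - u_b)) ** 2    # required value of gamma_{n,r}(k)
    for n in range(2, n_max + 1):
        if gamma_factor(n, k) <= target:            # smallest n satisfying (18)
            return n, k
    raise ValueError("no sample size found below n_max")

# Hypothetical factor for an uncensored normal sample (mu-hat = mean,
# sigma-hat = standard deviation), used here only as an illustration:
uncensored_normal = lambda n, k: (1.0 + k * k / 2.0) / n

n, k = design_plan(0.01, 0.05, 0.10, 0.10, uncensored_normal)
```

With these risks the sketch returns n = 20 and k ≈ 1.74, in line with classical uncensored variables plans; substituting tabulated γ_{n,r} values gives the corresponding failure-censored plans.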
Unfortunately, the right side of the last equation involves the sample size n. However, a solution can be found through a search procedure. Notice also that for any (n, k) sampling plan discussed here, the OC curve can be determined by Monte Carlo simulation. This is possible because the distribution of the standardized variate U is parameter-free, depending only on the sample size n, the acceptability constant k, and the number of failures r. Thus a simulated OC curve can be used to select a sampling plan. The same procedure is used when the maximum likelihood estimators (Schneider, 1989) are used instead of the BLUE's. However, for the MLE the equation for n is easier to solve, because the asymptotic covariance factors on the right-hand side of the equation depend only on the percent censoring and not on the sample size. It was shown (Schneider, 1989) that even for small sample sizes the asymptotic results for the MLE's are accurate enough for practical purposes. Since the small-sample covariance matrix is used for the BLUE's, the results can be expected to be accurate as well.

4.3. Distribution of the test length
The sampling plans derived in this article are failure censored. In practice, it is often desirable to know the length of the test in advance. Percentiles of the test time distribution can be obtained from the distribution of the order statistic x_(r) (David, 1981), which gives the time of the r-th failure. If the test time x is lognormally or Weibull distributed, then, after a logarithmic transformation, the p·100th percentile x_(r),p is computed by

x_(r),p = antilog{μ + z_(r),p σ}   (19)
where z_(r),p is the p·100th percentile of the r-th order statistic from a standard normal distribution or the smallest-extreme-value distribution. These percentiles may be obtained from Pearson and Hartley (1970). Note that the computation of the percentage points of the test time requires estimates of the parameters μ and σ. They should be estimated from historical data based on an acceptable quality level p_α. In this case, the estimated test times are conservative; that is, if the quality of the current lot is good (at p_α), then the estimated times are valid. If the quality of the lot is poor (p > p_α), however, then the true test times will be shorter. Therefore, for planning purposes the test time distribution is conservative. To protect against long testing of a very good product (p < p_α) one can introduce a time at which tests are terminated. A combination of failure-censored and time-censored sampling plans was discussed by Fertig and Mann (1980b). For these plans the tests are terminated if
a predetermined number of failures r_c occurs or the test time for the items on test exceeds a predetermined feasible test time x_t, whichever comes first. If the test is terminated because the test time exceeds x_t, the lot is accepted provided fewer than r_c failures have occurred. The actual test time is then the minimum of x_t and the test time x_{r_c} needed to obtain r_c failures. Fertig and Mann (1980) give the median test time for Weibull distributed data based on the test statistic t.
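The percentile z_(r),p needed in (19) can also be computed rather than read from tables, since P(X_(r) ≤ x) = Σ_{j≥r} C(n,j) F(x)^j (1 − F(x))^(n−j); inverting this binomial tail by bisection gives the percentile. A minimal sketch for the normal case (the function name is ours, not from the chapter):

```python
from math import comb
from statistics import NormalDist

def order_stat_percentile(r, n, p, mu=0.0, sigma=1.0):
    """p*100-th percentile of the r-th order statistic of a sample of
    size n from N(mu, sigma^2), by bisection on the exact CDF
    P(X_(r) <= x) = sum_{j >= r} C(n, j) F(x)^j (1 - F(x))^(n - j)."""
    F = NormalDist(mu, sigma).cdf
    def cdf_r(x):
        q = F(x)
        return sum(comb(n, j) * q**j * (1.0 - q)**(n - j) for j in range(r, n + 1))
    lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma
    for _ in range(100):              # bisection on the monotone CDF
        mid = 0.5 * (lo + hi)
        if cdf_r(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For r = n = 1 this reduces to the ordinary normal percentile, and by symmetry the median of the median of five observations (r = 3, n = 5) is μ.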
5. Reduction of test times for life-test sampling plans

A common problem in life-test sampling plans is the excessive length of test times. Lifetimes of good products are usually long, making life testing time consuming and expensive. Thus there is an interest in methods which help to reduce test times. The next two sections deal with such methods. One method to shorten the test time is to use accelerated tests, discussed in the next section. The last section discusses ways of reducing test time by testing groups of items and stopping the test when the first item in each group fails.
5.1. Accelerated life-test sampling plans

Bai, Kim and Chun (1993) extended the life-test sampling plans developed by Schneider (1989) to accelerated tests. Accelerated life tests make use of a functional relationship between stress level and failure time, which has to be known; in many cases a linear function is used. When accelerated tests are used, the test time can be reduced substantially. Test items are usually tested at stress levels much higher than experienced during actual applications, which we will refer to as the design stress. The failure times are then used to estimate the linear (or other) relationship, and the estimated function is then used to extrapolate to failure times at the design stress, which is usually not tested. Bai, Kim and Chun consider the following model. The location parameter μ is a linear function of the stress level:

μ(s) = γ0 + γ1 s   (20)
where s is the stress level and γ0 and γ1 are unknown constants. The scale parameter σ is constant and independent of stress. The life test uses two predetermined stress levels s1 and s2, where s1 < s2. A random sample of size n is taken from a lot and allocated to the two stress levels. The tests at each stress level are failure censored, and the respective numbers of failures at the two levels are r1 and r2. The test procedure is the same as for the sampling scheme discussed in Section 4.2. The test statistic used at the design stress s0 is

t = μ̂(s0) − kσ̂   (21)

where

μ̂(s0) = γ̂0 + γ̂1 s0 .   (22)
The statistic t is compared to a lower limit L. The lot is accepted if t ≥ L; otherwise the lot is rejected. The sample size n and acceptability constant k are to be determined so that the OC curve of the test plan passes through two points (p_α, 1 − α) and (p_β, β). Bai et al. (1993) use the maximum likelihood estimators for estimating the parameters in the model. To obtain the optimum proportions of the sample allocated to each stress level, the following reparametrization is convenient. Let

ξ = (s − s0) / (s2 − s0) ,   (23)
then the mean μ may be written in terms of ξ as

μ = β0 + β1 ξ   (24)

where

β0 = γ0 + γ1 s0   (25)

and

β1 = γ1 (s2 − s0) .   (26)
Bai et al. (1993) choose the proportion π of the sample which is allocated to the low stress level to minimize the determinant of the covariance matrix Cov(β̂0, β̂1, σ̂). Nelson and Kielpinski (1976), however, provide an argument that an optimum plan uses just two stresses, where the highest allowable test stress, s2, must be specified, while the low stress, s1, and the proportion, π, may be determined by minimizing the variance of the estimator under consideration. Hence the sampling plans of Bai et al. (1993) may not be optimal. This explains why the required sample sizes of the accelerated test plans can actually exceed the sample sizes of the failure-censored sampling plans suggested by Schneider (1989) for a given risk. Bai et al. (1993) also give an approximation for the expected log test time
E[x_(r_i),n_i] = β0 + β1 ξ_i + σ Φ⁻¹((r_i − 3/8)/(n_i + 1/4))   for lognormal ,
E[x_(r_i),n_i] = β0 + β1 ξ_i + σ Ψ⁻¹((r_i − 1/4)/(n_i + 1/2))   for Weibull ,   (27)

where Φ and Ψ are the standard normal and standard extreme-value distribution functions and the adjustments (3/8) and (1/4) are based on Kimball's (1960) plotting positions on probability paper.
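As a rough planning aid, the lognormal branch of the approximation in (27) can be evaluated directly; the function name and the parameter values below are illustrative assumptions of ours, not values from the chapter.

```python
from statistics import NormalDist

def expected_log_test_time(beta0, beta1, xi, sigma, r, n):
    """Approximate expected log time of the r-th failure among n items
    at standardized stress xi (lognormal case), using the plotting
    position (r - 3/8)/(n + 1/4) inside the standard normal quantile."""
    return beta0 + beta1 * xi + sigma * NormalDist().inv_cdf((r - 0.375) / (n + 0.25))

# Waiting for more failures lengthens the expected log test time:
t5 = expected_log_test_time(5.0, -2.0, 1.0, 0.5, r=5, n=20)
t10 = expected_log_test_time(5.0, -2.0, 1.0, 0.5, r=10, n=20)
```

The monotonicity in r makes the trade-off between the number of failures and the test length explicit when comparing candidate plans.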
5.2. Group-test sampling plans

Balasooriya (1993) presents failure-censored reliability sampling plans for the two-parameter exponential distribution where m sets of n units are tested. Each set of n
units is tested until the first failure occurs. Balasooriya considers the two-parameter exponential distribution with density

f(x; μ, σ) = (1/σ) exp[−(x − μ)/σ] ,   x > μ , σ > 0 .   (28)
Let

x_(1),i , x_(2),i , x_(3),i , … , x_(n−1),i , x_(n),i   (29)
be the order statistics of a random sample of size n from (28) of the i-th set, i = 1, 2, …, m. The first order statistic x_(1),i of sample i, i = 1, …, m, has the probability density function

f(x_(1); μ, σ) = (n/σ) exp[−n(x_(1) − μ)/σ] ,   x_(1) ≥ μ , σ > 0 .   (30)
Let

x_(1),(1) , x_(1),(2) , x_(1),(3) , … , x_(1),(m−1) , x_(1),(m)   (31)
be the order statistics of the smallest values from each sample of size n. Then the maximum likelihood estimator of μ in (30) is

μ̂ = x_(1),(1) ,   (32)
and the maximum likelihood estimator of σ is

σ̂ = (n/m) Σ_{i=1}^{m} (x_(1),(i) − x_(1),(1)) .   (33)
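A small sketch of these estimators; the factor n/m in the scale estimator is our reading of (33), since each first failure is exponential with mean μ + σ/n and the exceedances must therefore be scaled back up by n.

```python
def group_test_mles(first_failures, n):
    """MLEs for the two-parameter exponential from the first failure of
    each of m groups of n items (assumed form: mu-hat is the smallest
    first failure; sigma-hat rescales the mean exceedance by n)."""
    m = len(first_failures)
    mu_hat = min(first_failures)
    sigma_hat = (n / m) * sum(t - mu_hat for t in first_failures)
    return mu_hat, sigma_hat

# Toy data: three groups of n = 10 units, first failures at 1.0, 1.2, 1.5
mu_hat, sigma_hat = group_test_mles([1.0, 1.2, 1.5], n=10)
```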
Sampling plans are then constructed in the usual way. The statistic

t = μ̂ + kσ̂   (34)
is compared to a lower specification limit L, and the lot is rejected if t < L; otherwise the lot is accepted. The operating characteristic curve is based on a result of Guenther et al. (1976). For t = μ̂ − kσ̂ the probability of acceptance is easily obtained; however, when t = μ̂ + kσ̂ the operating characteristic curve is more complicated and the solutions for m and k have to be found iteratively (Balasooriya, 1993). The expected test times depend on the setup of the tests. When the m sets are tested consecutively, the total test time is

T_c = Σ_{i=1}^{m} x_(1),(i)   (35)

and, assuming μ = 0, the expected test time is

E[T_c] = mσ/n .   (36)

For simultaneous testing one obtains
T_s = x_(1),(m)   (37)

and thus

E[T_s] = (σ/n) Σ_{i=1}^{m} (1/i) .   (38)
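The expected lengths (36) and (38) can be checked by Monte Carlo simulation: the first failure of a group of n exponential units with scale σ is itself exponential with mean σ/n, so each group contributes a single draw. The parameter values m = 5, n = 10, σ = 1 below are illustrative choices of ours.

```python
import random

def mc_test_times(m=5, n=10, sigma=1.0, reps=20_000, seed=0):
    """Monte Carlo means of the consecutive total test time T_c (sum of
    the m first failures) and the simultaneous time T_s (their maximum),
    taking mu = 0."""
    rng = random.Random(seed)
    tc = ts = 0.0
    for _ in range(reps):
        firsts = [rng.expovariate(n / sigma) for _ in range(m)]  # x_(1),i, mean sigma/n
        tc += sum(firsts)
        ts += max(firsts)
    return tc / reps, ts / reps

tc_mean, ts_mean = mc_test_times()
# Compare with (36): m*sigma/n = 0.5, and (38): (sigma/n)*(1 + 1/2 + ... + 1/5) = 0.2283...
```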
Balasooriya (1993) provides tables for these sampling plans.
6. Conclusion

Order statistics are employed in many ways in acceptance sampling. First, order statistics are used to improve the robustness of sampling plans by variables. Second, in life testing one uses order statistics to shorten test times. Since life is an important quality characteristic and life tests are time consuming and expensive, recent work focuses on reducing the test times of sampling plans. Traditionally, only the sample size and the acceptability constant k were considered the design parameters for variables sampling plans. When test plans are censored, a new design parameter, the degree of censoring, is added, and a compromise between sample size and test length has to be found. Further research needs to be done to compare different sampling schemes and to determine sampling plans which are a best compromise between the various objectives.
References

Bai, D. S., J. G. Kim and Y. R. Chun (1993). Design of failure censored accelerated life test sampling plans for lognormal and Weibull distributions. Eng. Opt. 21, 197-212.
Balasooriya, U. (1993). Failure-censored reliability sampling plans for exponential distribution - A special case. J. Statist. Comput. Simul., to appear.
Cohen, A. C., Jr. (1961). Tables for maximum likelihood estimates: Singly truncated and singly censored samples. Technometrics 3, 535-541.
Das, N. G. and S. K. Mitra (1964). The effect of non-normality on sampling inspection. Sankhya 26, 169-176.
David, H. A. (1981). Order Statistics. John Wiley, New York.
Engelhardt, M. and L. J. Bain (1977). Simplified procedures for the Weibull or extreme-value distribution. Technometrics 19, 323-331.
Farlie, D. J. G. (1983). Sampling plans with property Q. In: Frontiers in Statistical Quality Control, H. J. Lenz et al., eds., Physica-Verlag, Wurzburg, West Germany.
Fertig, K. W. and N. R. Mann (1980a). An accurate approximation to the sampling distribution of the studentized extreme-value statistic. Technometrics 22, 83-97.
Fertig, K. W. and N. R. Mann (1980b). Life-test sampling plans for two-parameter Weibull populations. Technometrics 22, 165-177.
Guenther, W. C., S. A. Patil and V. R. Uppuluri (1976). One-sided β-content tolerance factors for the two parameter exponential distribution. Technometrics 18, 333-340.
Gupta, A. K. (1952). Estimation of the mean and standard deviation of a normal population from a censored sample. Biometrika 39, 260-273.
Harter, H. L. (1970). Order Statistics and Their Use in Testing and Estimation (Vol. 2). U.S. Government Printing Office, Washington, DC.
Hosono, Y., Okta and S. Kase (1981). Design of single sampling plans for doubly exponential characteristics. In: Frontiers in Statistical Quality Control, H. J. Lenz et al., eds., Physica-Verlag, Wurzburg, West Germany, 94-112.
Kimball, B. F. (1960). On the choice of plotting position on probability paper. J. Amer. Statist. Assoc. 55, 546-560.
Kocherlakota, S. and N. Balakrishnan (1985). Robust two-sided tolerance limits based on MML estimators. Commun. Statist. - Theory Meth. 14, 175-184.
Lawless, J. F. (1973). Conditional versus unconditional confidence intervals for the parameters of the Weibull distribution. J. Amer. Statist. Assoc. 68, 665-669.
Lawless, J. F. (1975). Construction of tolerance bounds for the extreme-value and Weibull distributions. Technometrics 17, 255-261.
Lieberman, G. J. and G. J. Resnikoff (1955). Sampling plans for inspection by variables. J. Amer. Statist. Assoc. 50, 457-516.
Mann, N. R. (1967a). Results on location and scale parameter estimation with application to the extreme-value distribution. ARL 67-0023, Aerospace Research Laboratories, Office of Aerospace Research, USAF, Wright-Patterson Air Force Base, Ohio.
Mann, N. R. (1967b). Tables for obtaining best linear invariant estimates of parameters of the Weibull distribution. Technometrics 9, 629-645.
Mann, N. R. (1968). Point and interval estimation procedures for the two-parameter Weibull and extreme-value distributions. Technometrics 10, 231-256.
Mann, N. R. and K. W. Fertig (1973). Tables for obtaining confidence bounds and tolerance bounds based on best linear invariant estimates of the extreme-value distribution. Technometrics 15, 86-100.
Mann, N. R., R. E. Schafer and N. D. Singpurwalla (1974). Methods for Statistical Analysis of Reliability and Life Data. John Wiley and Sons, New York.
Masuda, K. (1978). Effect of non-normality on sampling plans by Lieberman and Resnikoff. Proceedings of the International Conference on Quality Control, Tokyo, Japan, D3, 7-11.
MIL-STD-414 (1957). Sampling Procedures and Tables for Inspection by Variables for Percent Defectives. U.S. Government Printing Office, Washington, D.C.
Nelson, W. (1982). Applied Life Data Analysis. John Wiley and Sons, New York.
Nelson, W. and T. Kielpinski (1976). Theory for optimum accelerated life tests for normal and lognormal distributions. Technometrics 18, 105-114.
Nelson, W. and J. Schmee (1979). Inference for (log) normal life distributions from small singly censored samples and BLUE's. Technometrics 21, 43-54.
Owen, D. B. (1963). Factors for One-sided Tolerance Limits and for Variables Sampling Plans. SCR-607, Sandia Corporation monograph.
Owen, D. B. (1969). Summary of recent work on variables acceptance sampling with emphasis on non-normality. Technometrics 11, 631-637.
Pearson, E. S. and H. O. Hartley (1970). Biometrika Tables for Statisticians (Vol. 1, 3rd ed.). Cambridge University Press, Cambridge, U.K.
Rao, J. N. K., K. Subrahmaniam and D. B. Owen (1972). Effect of non-normality on tolerance limits which control percentages in both tails of normal distribution. Technometrics 14, 571-575.
Sarhan, A. E. and B. G. Greenberg (eds.) (1962). Contributions to Order Statistics. Wiley, New York.
Schneider, H. and P. Th. Wilrich (1981). The robustness of sampling plans for inspection by variables. In: Computational Statistics, H. Buning and P. Naeve, eds., Walter de Gruyter, Berlin-New York.
Schneider, H. (1985). The performance of variable sampling plans when the normal distribution is truncated. J. Qual. Tech. 17, 74-80.
Schneider, H. (1989). Failure-censored variable-sampling plans for lognormal and Weibull distributions. Technometrics 31, 199-206.
Srivastava, A. B. L. (1961). Variables sampling inspection for non-normal samples. J. Sci. Engg. Res. 5, 145-152.
Takagi, K. (1972). On designing unknown-sigma sampling plans based on a wide class of non-normal distributions. Technometrics 14, 669-678.
Thoman, D. R., L. J. Bain and C. E. Antle (1969). Inferences on the parameters of the Weibull distribution. Technometrics 11, 445-460.
Tiku, M. L. (1967). Estimating the mean and standard deviation from a censored sample. Biometrika 54, 155-165.
Tiku, M. L. (1980). Robustness of MML estimators based on censored samples and robust test statistics. J. Statist. Plann. Infer. 4, 123-143.
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17 © 1998 Elsevier Science B.V. All rights reserved.
Linear Combinations of Ordered Symmetric Observations with Applications to Visual Acuity
Marlos Viana
1. Introduction

In vision research, the Snellen chart is commonly used to assess visual acuity and is made up of letters of graduated sizes. By combining letter size and chart distance it is possible to determine the minimum visual angle of retinal resolution. A visual acuity of 20/30 means that at 20 feet the minimum angle of resolution is 30/20 times the resolution standard (about 5 minutes of arc). The vision of a normal eye is recorded as 20/20 and corresponds to zero in the scale determined by the logarithm of the minimum angle of resolution (Log MAR). Normally, a single measure of visual acuity is made in each eye, say Y1, Y2, together with one or more covariates X, such as the subject's age, reading performance, physical condition, etc. Because smaller values of Log MAR correspond to better visual acuity, the extremes of visual acuity are defined in terms of the "best" acuity Y_(1) = min{Y1, Y2} and the "worst" acuity Y_(2) = max{Y1, Y2}. Ordered acuity measurements are also required to determine the person's total vision impairment, defined as

Total Impairment = (3 Y_(1) + Y_(2)) / 4

[e.g., Rubin, Munoz, Fried and West (1984)]. Consequently, there is interest in making inferences on the covariance structure

Λ = Cov(X, w1 Y_(1) + w2 Y_(2)) ,

which includes the assessment of the correlation and linear predictors between X and linear combinations w1 Y_(1) + w2 Y_(2) of the extreme acuity measurements. In particular, the correlations among average vision (w1 = w2 = 0.5), best vision (w1 = 1, w2 = 0), worst vision (w1 = 0, w2 = 1), acuity range (w1 = −1, w2 = 1), vision impairment (w1 = 0.75, w2 = 0.25) and one or more of the patient's conditions can be assessed. Other applications of extreme bivariate measurements include the current criterion for an unrestricted driver's license, which in the majority of states is
based on the visual acuity of the best eye [see Fishman et al. (1993), Szlyk et al. (1993)]; the assessment of defective hearing in mentally retarded adults based on the ear with best hearing (Parving and Christensen 1990); the predictive value of the worst vision following surgery in the eyes of glaucoma patients (Frenkel and Shin 1986); sports injury data on the reduction of best vision in damaged eyes (Aburn 1990); and the analysis of worst vision among patients treated for macular edema (Rehak and Vymazal 1989).

2. Models and basic results
Because of the natural symmetry between responses from fellow eyes as well as between the additional measurement X and the response of either eye, it is assumed that the vector of means associated with (X, Y1, Y2) is given by μ' = (μ0, μ1, μ1) and the corresponding covariance matrix Σ by

Σ = [ σ²    γστ   γστ ]
    [ γστ   τ²    ρτ² ]
    [ γστ   ρτ²   τ²  ] ,   γ² ≤ (1 + ρ)/2 ,   ρ² ≤ 1 ,   (2.1)
where the range of the parameters is necessary and sufficient for Σ to be positive semi-definite. When there are p Y-values, the restriction is γ² ≤ [1 + (p − 1)ρ]/p. In general, the correlation ρ between Y1 and Y2 is in the interval [−1, 1]. However, in the present context, in which Y1 and Y2 represent measurements on each eye, the correlation may be assumed to be non-negative. A key result is that the covariance between X and Y_(i) is equal to the covariance between X and Y_i, which is perhaps surprising. In the bivariate case, because Y_(2) = ½|Y1 − Y2| + ½(Y1 + Y2), it obtains that

cov(X, Y_(2)) − cov(X, Y2) = ½ cov(X, |Y1 − Y2|)
= ∫_x ∫_{y2 ≤ y1} (x − μ0)(y1 − y2) dP = ∫_x (x − μ0) R(x) dP_X ,

where the next-to-last equality follows from the fact that the distribution is symmetric in y, and the last equality from defining
R(x) = ∫_{y2 ≤ y1} (y1 − y2) dP_{Y|x} .

PROPOSITION. For w ≥ 0 and ρ > 0,

max_w Corr(X, w'Y_( )) = γ / √((1 + ρ)/2) .

The maximum value is obtained when w' = (1/2, 1/2), in which case w'Y_( ) is the average of the components of Y.

PROOF. Write equation (3.1) as
Corr(X, w'Y_( )) = γ / √(ρ + (1 − ρ)f) ,   f = w'Cw / (w'e)² .
Because C is positive definite, a solution w to the constrained minimization problem for w'Cw, and equivalently for f, needs to satisfy Cw = λe, where λ is a Lagrange multiplier. The fact that C is stochastic shows that the unique constrained solution is w' = (1/2, 1/2). □

The correlation θ between the extreme values Y_(1) and Y_(2) is

θ = Corr(Y_(1), Y_(2))
= (ρ + (1 − ρ)c12) / (ρ + (1 − ρ)c22) .   (3.2)
For non-negative values of ρ, it holds that

0.4669 ≈ c12/c22 ≤ θ ≤ 1 ,
whereas the partial correlation of Y_(1) and Y_(2) given X is

θ_{12·0} = Corr(Y_(1), Y_(2) | X) = (ρ + (1 − ρ)c12 − γ²) / (ρ + (1 − ρ)c22 − γ²) ≤ θ .   (3.3)
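The key covariance identity cov(X, Y_(i)) = cov(X, Y_i) can be checked numerically. The shared-component construction below, with illustrative values ρ = 0.5 and γ = 0.6 (our choices, requiring γ² ≤ ρ in this particular construction), is ours and not from the chapter:

```python
import random

def simulate(n_samples=200_000, rho=0.5, gamma=0.6, seed=1):
    """Draw exchangeable (X, Y1, Y2) with Corr(Y1, Y2) = rho and
    Corr(X, Yi) = gamma, then compare cov(X, max(Y1, Y2)) with
    cov(X, Y2)."""
    rng = random.Random(seed)
    a = gamma / rho**0.5                 # loading of X on the shared term
    xs, y2s, ymaxs = [], [], []
    for _ in range(n_samples):
        w = rng.gauss(0, 1)              # component shared by Y1 and Y2
        y1 = rho**0.5 * w + (1 - rho)**0.5 * rng.gauss(0, 1)
        y2 = rho**0.5 * w + (1 - rho)**0.5 * rng.gauss(0, 1)
        x = a * w + (1 - a * a)**0.5 * rng.gauss(0, 1)
        xs.append(x); y2s.append(y2); ymaxs.append(max(y1, y2))
    def cov(u, v):
        mu, mv = sum(u) / len(u), sum(v) / len(v)
        return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / len(u)
    return cov(xs, ymaxs), cov(xs, y2s)

c_max, c_y2 = simulate()
```

Both covariances come out near γ = 0.6: ordering the Y's does not change their covariance with X.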
Linear combinations of ordered symmetric observations
Thus, the partial correlation is always a contraction of the product-moment correlation, regardless of the composition of the covariate. The minimum mean-squared error (m.s.e.) linear predictor of w'Y_( ) from X follows from the fact that

E[w'Y_( )] = μ1 w'e + τ√(1 − ρ) w'c ,
where

c' = (c1, c2) = (−0.56419, 0.56419)   (3.4)
is the expected value of the smallest and largest of two independent standard normal variables [e.g., Beyer (1990)], and from the fact that Λ10 Λ00⁻¹ = γτ w'e / σ. The resulting equation is

ŵ'Y_( ) = μ1 w'e + τ√(1 − ρ) w'c + (γτ/σ)(X − μ0) w'e .   (3.5)
.
The corresponding mean-squared error can be expressed as All[0
=
"C2(1 --
72)[P.lo(w'e) 2 + (1 - p.lo)W'cgw],
P.10 - - p1 ---~ 272 ,
(3.6)
whereas the multiple correlation coefficient is equal to 62, the squared correlation between w ' ~ and X. Similarly, the best (minimum m.s.e.) linear regression of X on w ' ~ is described by t
awe
x = #o ~ @ ( w , e ) 2 + (1 - p ) w ' ~ w ]
7w'(°lJ-Ule-zv/1-p
c) ,
(3.7)
with corresponding mean-squared error -Aooll = ~2 1
72(w'e)2
]
(3.8)
p(w'e) 2 + (1 - p)w'~wJ
Also of interest are best linear predictors of one extreme of Y based on X and the other extreme of Y. The appropriate partitioning of q' shows that the two linear regression equations I~1) =b112 + blY(2) -b b2X , I~2) =b2ll + bl Y(1) + b2X , are defined by parallel planes in which the coefficient bl is the partial correlation 01210 given by (3.3), the coefficient of X is b2 =
zy(1 - p ) ( c 2 2 - c12 ) o-lp -4- (1 - p ) c 2 2 - ~2]
'
whereas the intercept coefficients are, respectively,
(3.9)
518
M . Viana
bll2 = ktl(1 - bl) - b2//o -
(c2 -
blcl)zV/1 - p
b2ll = ktl( 1 - h i ) - b2kt0 + (c2 - b l C l ) ' C V / 1
- p
,
(3.1o)
•
Because cl = - c 2 , the vertical distance between the planes is 2c2zx/1 - p
(1 + b l )
•
In addition, the model m e a n - s q u a r e d error and corresponding multiple correlation coefficient R 2 can be expressed as m.s.e. =
"t'2(l -- p2)(C22 -- C12)
R2
p + (1 - p ) c 2 2 - y2 ,
= 1-
m.s.e. ,~2[p + (1 -- p)C22 ] '
4. Maximum likelihood and large-sample estimates Given a sample (Xc~,ylct,y2c~), O~ = 1 , . . . , N of size N with means x, Yi, crossp r o d u c t matrix A = (aij), i,j = 0, 1,2, the m a x i m u m likelihood estimates of ~ and 0 are given by
3=
,
9w'e
/
~/,b(w'e) 2 + (1 - D)w'Cgw
~ = / 5 + (1 - p)Cl2 , /) -}- (1 -- /0)C22
(4.1)
where
1
&2 - a 0 0
{2=~(all+a22) N
~
and a12 /3 - l(al I q- a22)'
21-(aol + a02) 7 = V/1
(all -/- a22) a v ~
,
(4.2)
are the m a x i m u m likelihood estimates of a 2, r2, p and ~ based on N a00 = ~ ( x 7 c~=l
N -- . ~ ) 2
a0j = Z ( x ~
- ~)(~> - yj)
c~=l
and N ~=1
The delta m e t h o d [e.g., A n d e r s o n (1985, p. 120)] shows that the asymptotic joint distribution of ~ (6 - 6, t) - 0) is n o r m a l with means zero, variances
Linear combinations of ordered symmetric observations
519
AVar(6) = [2p 2 + 6p2?2f + ~:2 _ 5p2)~2 _ 472p -4- 474p + 2p 3 + 4 f p - 4p3f + 2p3f 2 - 2p2f 2 + 2 f 2 - 2 p f 2
+ 474f - 6y2f - 4 7 4 p f ] / [ - 4 ( f + (1 _ / ) p ) 3 ]
[w'Cgw]/(w'e)2, w'e ¢
where f =
(4.3) ,
0;
aVar(0) = (Cl2 - c22)2(1 - p)2(1 + p)2 (e22 + (1 - c22)P) 4 '
(4.4)
and covariance ACov(3, O) = I ( - 2 p 2 + 3p2f + 272p - 2y2pf - p + 222f - 3 f + 1) X (C12 -- C22)~(1 -- p)]
(4.5)
/ [ 2 ( f + (1 --f)p)3/2(C22 + (1 --C22)P) 2] . In particular, note that
0 ACov(6,~),p=0,7=O)=
[0
(C22 GI2) 2
'
c~2 so that ~ and 0 are asymptotically independent when X, Y1, II2 are jointly independent.
5. An exact test for γ = 0

As indicated earlier in Section 3, Proposition 2.2 implies that the following conditions are equivalent under the exchangeable multivariate normal model with covariance structure given by (2.1):
1. $\gamma = \text{Corr}(X, Y_i) = 0$
2. $\delta = \text{Corr}(X, w'\mathcal{Y}) = 0$
3. $X$ and $Y$ are independent
4. $X$ and $\mathcal{Y}$ are independent
5. $X$ and $w'\mathcal{Y}$ are independent

The hypothesis $\gamma = 0$ can be assessed as follows. Let $A_{00} = a_{00}$, $A_{01} = (a_{0j})$, $j = 1, \ldots, p$, $A_{11} = (a_{ij})$, $i, j = 1, \ldots, p$, and $A_{10} = A_{01}'$. Further, let $r$ denote the sample intraclass correlation coefficient

$$r = \frac{\sum_{i<j} a_{ij} / [p(p-1)/2]}{\sum_{i=1}^{p} a_{ii} / p}$$

associated with the sample $p \times p$ matrix of cross-products $A_{11}$. The distribution of $A_{11}$ is Wishart $W_p(\tau^2[\rho\, ee' + (1-\rho)I], n)$, $n = N - 1$. Further, let $r_{\cdot 10}$ denote the sample intraclass correlation coefficient based on the conditional cross-product matrix

$$A_{11|0} = A_{11} - A_{10} A_{00}^{-1} A_{01} \sim W_p(\tau^2(1 - \gamma^2)[\rho_{\cdot 10}\, ee' + (1 - \rho_{\cdot 10})I], n) ,$$

where $\rho_{\cdot 10} = (\rho - \gamma^2)/(1 - \gamma^2)$. It follows [e.g., Wilks (1946)] that

$$U_1 = \frac{\operatorname{tr} n S_{11|0}}{p}\,(1 + (p-1) r_{\cdot 10}) \sim \tau^2 [1 + (p-1)\rho - p\gamma^2]\, \chi^2_{n-p} ,$$

$$U_2 = (p-1)\,\frac{\operatorname{tr} n S_{11|0}}{p}\,(1 - r_{\cdot 10}) \sim \tau^2 (1-\rho)\, \chi^2_{(p-1)(n-p)} ,$$

$$V_1 = \frac{\operatorname{tr} n S_{11}}{p}\,(1 + (p-1) r) \sim \tau^2 (1 + (p-1)\rho)\, \chi^2_n ,$$

$$V_2 = (p-1)\,\frac{\operatorname{tr} n S_{11}}{p}\,(1 - r) \sim \tau^2 (1-\rho)\, \chi^2_{(p-1)n} .$$

Furthermore, $U_1$ is independent of $U_2$, and $V_1$ is independent of $V_2$. In addition, when $\gamma = 0$, it follows from Anderson (1985), Corollary 4.3.2, that

$$V_1 - U_1 \sim \tau^2 [1 + (p-1)\rho]\, \chi^2_p ,$$

independent of $U_1$. Consequently, when $\gamma = 0$,

$$\frac{n}{p}\,\frac{V_1 - U_1}{V_1} = \frac{n}{p}\left[1 - \frac{(1 + (p-1) r_{\cdot 10})\operatorname{tr} S_{11|0}}{(1 + (p-1) r)\operatorname{tr} S_{11}}\right] \sim F_{p,n} . \tag{5.1}$$

Similarly, when $\rho = 0$, directly from the canonical representation of $A_{11}$,

$$(p-1)\,\frac{V_1}{V_2} = \frac{1 + (p-1) r}{1 - r} \sim F_{n,(p-1)n} , \tag{5.2}$$

so that (5.1) and (5.2) can be used to assess the corresponding hypotheses. Note that larger values of (5.1) are expected when $\gamma$ is different from zero, and larger values of (5.2) are expected when $\rho$ is positive. In the unrestricted case, smaller values are expected when $\rho$ is negative.
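The intraclass correlation $r$ and the ratio in (5.2) are straightforward to compute. The following Python sketch (the function names are ours, not from the chapter) evaluates $r$ and the statistic $(p-1)V_1/V_2 = (1 + (p-1)r)/(1-r)$ for a $p \times p$ cross-product matrix supplied as a list of lists:

```python
def intraclass_r(a11):
    """Sample intraclass correlation r of a p x p cross-product matrix:
    average off-diagonal entry divided by average diagonal entry."""
    p = len(a11)
    off = sum(a11[i][j] for i in range(p) for j in range(i + 1, p)) / (p * (p - 1) / 2)
    diag = sum(a11[i][i] for i in range(p)) / p
    return off / diag

def f_stat_rho(a11):
    """Statistic (5.2): (p-1) V1/V2 = (1 + (p-1) r)/(1 - r),
    distributed F_{n,(p-1)n} when rho = 0."""
    p = len(a11)
    r = intraclass_r(a11)
    return (1 + (p - 1) * r) / (1 - r)
```

For $p = 2$ the statistic reduces to $(1+r)/(1-r)$, so a large positive intraclass correlation inflates it, matching the one-sided behavior noted above.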
6. Numerical examples
The following statistics are based on N = 42 subjects participating in a larger experiment reported by Fishman et al. (1993), in which the evaluation of patients with Best's vitelliform macular dystrophy included the measurement of their bilateral visual acuity loss, denoted by $(Y_1, Y_2)$, and age, denoted by $X$. Because the visual acuities $Y_1$, $Y_2$ in (respectively left and right) fellow eyes are expected to be about the same, to have about the same variability, and to be equally correlated with age, the model defined in Section 1 is used to describe the data on $(X, Y_1, Y_2)$. The correlation structure between age and linear combinations of extreme visual acuities will be considered next. The starting point is the sample means

$$(\bar{x}, \bar{y}_1, \bar{y}_2) = (28.833, 0.412, 0.437) ,$$

covariance matrix

$$S = \frac{A}{N-1} = \begin{bmatrix} 367.996 & 4.419 & 4.200 \\ 4.419 & 0.135 & 0.074 \\ 4.200 & 0.074 & 0.163 \end{bmatrix}$$

based on the $(X, Y_1, Y_2)$ data, and corresponding correlation matrix

$$R = \begin{bmatrix} 1.000 & 0.627 & 0.542 \\ 0.627 & 1.000 & 0.499 \\ 0.542 & 0.499 & 1.000 \end{bmatrix} .$$
Age is expressed in years and visual acuity measurements are expressed in LogMAR units. The maximum likelihood estimate (4.2) of the correlation ρ between vision in fellow eyes is 0.496, whereas the estimated correlation γ̂ between the patient's age and vision in either eye is 0.581. In addition, the estimated standard deviation of vision in either eye is 0.386, the standard deviation for age is 19.182, and the estimated mean vision and age are μ̂1 = 0.424 and μ̂0 = 28.83, respectively. The maximum likelihood estimate (4.1) of the correlation θ between extreme acuities is 0.782. Table 1 summarizes the coefficients needed to estimate the correlation and linear regression parameters between X and a linear combination w'𝒴 of extreme

Table 1
Linear combinations of extreme vision acuity

w'            w'𝒴                 w'e    w'𝒞w                       w'c
(0.5, 0.5)    average vision      1      0.5                        0
(1, 0)        best vision         1      c11 = 0.6817               c1 = -0.5642
(0, 1)        worst vision        1      c22 = 0.6817               c2 = 0.5642
(-1, 1)       range               0      2(c11 - c12) = 0.7268      2c2 = 1.1284
(.75, .25)    visual impairment   1      0.5454                     -0.2821
Table 2
Linear combinations of extreme vision acuity and corresponding estimates

w'𝒴                 δ̂        AVar(δ̂)   ACov(δ̂, θ̂)   AVar(δ̂ | γ = 0)
average vision      0.671     0.105      0.104          1.000
best vision         0.634     0.155      0.236          0.733
worst vision        0.634     0.155      0.236          0.733
visual impairment   0.661     0.117      0.151          0.916
M. Viana
522
acuities. From (4.1), (4.3) and (4.5), the corresponding estimates of δ̂, AVar(δ̂), ACov(δ̂, θ̂) and AVar(δ̂ | γ = 0) are shown in Table 2. The estimated large-sample variance of θ̂, given by (4.4), is 0.6115. The value of the test statistic (5.1) for γ = 0 is F = 9.48, which supports the conclusion of a non-null correlation γ between age and vision. Consequently, there is evidence to support the hypothesis of association between the patient's age and non-null linear combinations of extreme vision measures, such as those indicated in Table 1. Note that the range of vision acuity is necessarily independent of the patient's age under the equicorrelated-exchangeable model described by (2.1). The test statistic (5.2) for ρ = 0 is F = 2.97, which also supports the claim of a positive correlation ρ between vision of fellow eyes. The estimates of the regression lines (3.5) predicting the linear combination of extreme visual acuity from the patient's age, and the corresponding standard errors s.e. derived from (3.6), are shown in Table 3. Similarly, the estimates of the regression lines (3.7) predicting the patient's age from the linear combination of extreme visual acuity, and the corresponding standard errors s.e. obtained from (3.8), are shown in Table 4. A more realistic application, in this case, is the prediction of the subject's reading performance from a linear combination of extreme acuities, such as the subject's total visual impairment (3/4)Y(1) + (1/4)Y(2), defined earlier in Section 1 [see also Rubin et al. (1984)]. Tables 5 and 6 show the corresponding minimum m.s.e. estimates for these models, obtained from sample means and cross-products of (X, Y(1), Y(2)). These estimates will be contrasted with those obtained from data on (X, Y1, Y2). The usual estimates obtained from (X, Y(1), Y(2)), although optimum in the m.s.e. sense, fail to carry over the multivariate normal assumption and properties.
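The point estimates quoted above can be reproduced from the sample covariance matrix together with the Table 1 constants. A minimal Python check follows (the variable names are ours; the common scale factor between cross-products over N and over N − 1 cancels in every ratio used, and c12 is recovered from the range entry of Table 1):

```python
import math

# Sample covariance entries reported in Section 6
s00, s01, s02 = 367.996, 4.419, 4.200   # age variance; age-vision covariances
s11, s22, s12 = 0.135, 0.163, 0.074     # vision variances; vision-vision covariance

# (4.2): correlations rho (between fellow eyes) and gamma (age vs. either eye)
rho = s12 / (0.5 * (s11 + s22))
gamma = 0.5 * (s01 + s02) / math.sqrt(0.5 * (s11 + s22) * s00)

# Order-statistic constants for p = 2 (Table 1); c12 from w'Cw = 2(c11 - c12) = 0.7268
c11 = c22 = 0.6817
c12 = c11 - 0.7268 / 2.0

# (4.1): theta (correlation between extremes) and delta for best vision, w = (1, 0)
theta = (rho + (1 - rho) * c12) / (rho + (1 - rho) * c22)
delta_best = gamma / math.sqrt(rho + (1 - rho) * c11)
```

Within the rounding of the published covariance entries this reproduces the reported values ρ̂ = 0.496, γ̂ = 0.581, θ̂ = 0.782 and δ̂ = 0.634 for the best vision (the recomputed figures are 0.497, 0.582, 0.782 and 0.635).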
Table 3
MLE linear regression estimates of w'𝒴 on age

w'𝒴                 constant   coefficient   X     s.e.    r²
average vision      0.0866     0.0117        age   0.247   0.450
best vision         -0.068     0.0117        age   0.273   0.401
worst vision        0.2412     0.0117        age   0.273   0.401
range of vision     0.3092     0             age   0.233   0
visual impairment   0.0093     0.0117        age   0.254   0.436
Table 4
MLE linear regression estimates of age on w'𝒴

X     constant   coefficient   w'𝒴                 s.e.     r²
age   12.464     38.599        average vision      14.222   0.450
age   8.932      34.389        best vision         14.834   0.401
age   19.565     34.389        worst vision        14.834   0.401
age   28.83      0             range of vision     19.181   0
age   15.845     37.4453       vision impairment   14.393   0.436
Table 5
(x, y(1), y(2))-based linear regression estimates of w'𝒴 on age

w'𝒴                 constant   coefficient   X     s.e.     r²
average vision      0.0876     0.0117        age   0.2506   0.4516
best vision         -0.037     0.0114        age   0.2431   0.4548
worst vision        0.2124     0.0119        age   0.3260   0.3376
range of vision     0.2494     0.0005        age   0.2823   0.0381
visual impairment   0.0253     0.0115        age   0.2365   0.4743
Table 6
(x, y(1), y(2))-based linear regression estimates of age on w'𝒴

X     constant   coefficient   w'𝒴                 s.e.     r²
age   12.427     38.568        average vision      14.381   0.4519
age   17.193     39.778        best vision         14.340   0.4548
age   13.114     28.165        worst vision        15.805   0.3376
age   28.134     2.621         range of vision     19.407   0.0381
age   14.118     40.990        vision impairment   14.080   0.4743
Under these data, the covariance matrix (2.3) would be estimated by

$$\hat{\Sigma}_0 = \begin{bmatrix} 367.996 & 4.207 & 4.411 \\ 4.207 & 0.105 & 0.092 \\ 4.411 & 0.092 & 0.156 \end{bmatrix} ,$$

with resulting correlation matrix

$$\text{Corr}_0(X, Y_{(1)}, Y_{(2)}) = \begin{bmatrix} 1.000 & 0.674 & 0.581 \\ 0.674 & 1.000 & 0.716 \\ 0.581 & 0.716 & 1.000 \end{bmatrix} .$$

In contrast, the corresponding maximum likelihood estimate obtained from Section 4 under the equicorrelated-exchangeable model is

$$\widehat{\text{Corr}}(X, Y_{(1)}, Y_{(2)}) = \begin{bmatrix} 1.000 & 0.634 & 0.634 \\ 0.634 & 1.000 & 0.782 \\ 0.634 & 0.782 & 1.000 \end{bmatrix} .$$
The differences can be remarkable: for example, from Table 3, the estimated range of vision is 0.3092, whereas the unrestricted estimated value from Table 5 is 0.2494. The difference is numerically nearly equivalent to the difference between normal vision (LogMAR = 0) and a reduced vision of 20/40 (LogMAR = 0.3). The unrestricted model overestimates δ² by about 12% for the best vision and underestimates it by about 21% for the worst vision. Tables 4 and 6 show that the expected ages corresponding to a normal best vision (the model's intercept) differ by about 8 years. Proposition 3.1 is particularly important to justify the choice of the average vision against other convex linear combinations when the purpose is to obtain the best m.s.e. linear model relating X and the convex combination w'𝒴 under the equicorrelated-exchangeable model. Table 3 shows that the correlation between the subject's age and the average vision dominates the correlation with best vision, worst vision or visual impairment. This is a mathematical fact and not sampling variation.
Acknowledgement

Research was supported in part by unrestricted departmental grants from Research to Prevent Blindness, Inc., New York, New York.
References

Aburn, N. (1990). Eye injuries in indoor cricket at Wellington Hospital: A survey January 1987 to June 1989. New Zealand Med. J. 103(898), 454-456.
Anderson, T. W. (1985). An Introduction to Multivariate Statistical Analysis, 2nd edn. John Wiley, New York.
Beyer, W. (1990). Standard Probability and Statistics - Tables and Formulae. CRC Press, Boca Raton.
David, H. A. (1996). A general representation of equally correlated variates. J. Amer. Statist. Assoc. 91(436), 1576.
Fishman, G. A., W. Baca, K. R. Alexander, D. J. Derlacki, A. M. Glenn and M. A. G. Viana (1993). Visual acuity in patients with best vitelliform macular dystrophy. Ophthalmology 100(11), 1665-1670.
Frenkel, R. and D. Shin (1986). Prevention and management of delayed suprachoroidal hemorrhage after filtration surgery. Arch. Ophthal. 104(10), 1459-1463.
Olkin, I. and M. A. G. Viana (1995). Correlation analysis of extreme observations from a multivariate normal distribution. J. Amer. Statist. Assoc., 1373-1379.
Parving, A. and B. Christensen (1990). Hearing of the mentally retarded living at home. Ugeskrift For Laeger 152(43), 3161-3164.
Rehak, J. and M. Vymazal (1989). Treatment of branch retinal vein occlusion with argon laser photocoagulation. Acta Universitatis Palackianae Olomucensis Facultatis Medicae 123, 231-236.
Rubin, G. S., B. Munoz, L. P. Fried and S. West (1984). Monocular vs binocular visual acuity as measures of vision impairment. Vision Science and Its Applications, OSA Technical Digest Series 1, 328-331.
Szlyk, J. P., G. A. Fishman, K. Sovering, K. R. Alexander and M. A. G. Viana (1993). Evaluation of driving performance in patients with juvenile macular dystrophies. Arch. Ophthal. 111, 207-212.
Viana, M. A. G. and I. Olkin (1997). Correlation analysis of ordered observations from a block-equicorrelated multivariate normal distribution. In: S. Panchapakesan and N. Balakrishnan, eds., Advances in Statistical Decision Theory and Applications. Birkhäuser, Boston, Chapter 21, 305-322.
Wilks, S. S. (1946). Sample criteria for testing equality of means, equality of variances and equality of covariances in a normal multivariate distribution. Ann. Math. Statist. 17, 309-326.
N. Balakrishnan and C. R. Rao, eds., Handbook of Statistics, Vol. 17
© 1998 Elsevier Science B.V. All rights reserved.

20
Order-Statistic Filtering and Smoothing of Time-Series: Part I
Gonzalo R. Arce, Y e o n g - T a e g K i m and Kenneth E. Barner
1. Introduction
The processing of time-series is of fundamental importance in economics, engineering, and the social sciences. Estimation methods based on structural time-orderings are extensively used in time-series smoothing and forecasting. Their designs vary from ad hoc to very sophisticated, where the dynamical nature of the underlying time-series is taken into account. Unfortunately, many time-series filtering problems have not been satisfactorily addressed through the use of linear filters. As we illustrate in this tutorial, nonlinear filters can outperform linear methods in applications where the underlying random processes are non-Gaussian or when system nonlinearities are present. Nonlinear and non-Gaussian processes are quite common in signal processing applications. Example waveforms include sea clutter in radar, speech waveforms, image and video signals, and many digital communication signals. For instance, image and video signals contain edges, details, scenes, and colors that can abruptly change from one sample to another. If linear filters are used to estimate these signals from their corresponding noisy observations, the resulting linear estimates will unavoidably yield blurred signals which, in many cases, are objectionable to the end user. Linear filters fail to preserve the fine features that are of great importance to visual perception. These facts agree with statistical principles which dictate that nonlinear estimation is advantageous for time series which are non-Gaussian in nature (Priestley 1988; Tong 1990). While second-order moments are sufficient to effectively process Gaussian processes, more powerful statistics must be exploited for the processing of non-Gaussian or nonlinear time series. In our case, we jointly exploit traditional temporal statistics and order statistics. Robustness is another issue that must be considered in the design of time-series filters.
During the past decades it has become increasingly accepted that statistical procedures optimized under the assumption of Gaussianity are excessively sensitive to minor deviations from the Gaussian assumption (Huber 1981). Thus, the need for "robust" estimation frameworks for non-Gaussian sequence processing has become highly apparent. Since order statistics provide the basis for
reliable inferences such as the estimation of location and scale, it is not surprising that the ordering information provided by the observation samples can significantly enhance the capability of time-series filters. This idea was first explored by Tukey (1974) when he introduced the running median for time-series analysis. The running median is a special case of the running L-filter whose output can be written as

$$y(n) = \sum_{i=1}^{N} w_i\, x_{(i)} , \tag{1}$$

where the $x_{(i)}$ are the sample order statistics at time $n$, and where the set of weights $\{w_i\}$ is individually designed for each particular application. If the weights are chosen uniformly as $w_i = 1/N$, the running L estimator reduces to the running mean. In fact, the mean is the only filter which is both a linear FIR filter and an L filter. If the weights are assigned as

$$w_i = \begin{cases} \dfrac{1}{N - 2\alpha} & \text{for } i = \alpha + 1, \ldots, N - \alpha \\[4pt] 0 & \text{otherwise,} \end{cases} \tag{2}$$

the obtained estimator is the symmetric trimmed mean, in which the bottom and top order statistics have been removed and the remaining samples are averaged to produce the output. As described by Bednar and Watt (1984), trimmed means provide a connection between average smoothing and median smoothing, as is illustrated here. Consider a segment of the voiced waveform "a", shown at the bottom of Fig. 1. This speech signal is placed at the input to several running trimmed-mean filters of size 9. The outputs of the trimmed means as the trimming parameter α is varied from zero to four are also shown in Fig. 1. The vertical index denotes the trimming: the top signal is the median-filtered output, the second signal from the top is the trimmed-mean output with α = 1, and the other trimmed means are displayed successively. The different characteristics of the filtered signals as the trimming varies can be immediately seen. Notice that while the running mean results in a smooth blurring of the signal, the running median smooths the signal while retaining sharp discontinuities. This is due to the fact that the running median restricts the output value to be identical to the value of one of the input samples in the observation window. Depending on the amount of trimming, the alpha-trimmed filter removes narrow impulses, but it also does some edge smoothing. The running L-filter has many desirable attributes which have been exploited in several applications (Bovik et al. 1983). However, L-filters fail to exploit the temporal structure of time series (Pitas and Venetsanopoulos 1989). Our goal is to define estimators that utilize both the temporal and ranking configurations of the permutation mapping $p: \mathbf{x}_\ell(n) \to \mathbf{x}_L(n)$, where $\mathbf{x}_\ell(n)$ and $\mathbf{x}_L(n)$ are the observation vector and its corresponding sorted order-statistic vector, respectively.
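The running trimmed mean of (1)-(2) is easy to prototype. The sketch below is our code, not the authors'; edge samples are replicated so that every window holds N samples, a boundary convention the chapter does not specify. It returns the running mean for alpha = 0 and the running median for alpha = (N - 1)/2:

```python
def trimmed_mean_filter(x, N=9, alpha=0):
    """Running symmetric alpha-trimmed mean, eqs. (1)-(2): sort each length-N
    window, drop the alpha smallest and alpha largest order statistics, and
    average the remaining N - 2*alpha samples."""
    K = (N - 1) // 2
    out = []
    for n in range(len(x)):
        # clamp indices at the boundaries (edge replication)
        window = sorted(x[min(max(n + i - K, 0), len(x) - 1)] for i in range(N))
        kept = window[alpha:N - alpha]
        out.append(sum(kept) / len(kept))
    return out
```

For example, with N = 3 and alpha = 1 (the running median) an isolated impulse is removed entirely, while the running mean (alpha = 0) smears it across the window.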
It will be shown that, for the analysis and processing of time series, one attains significant advantages by exploiting the information embedded in the mapping $p$, rather than using the marginal information contained in either $\mathbf{x}_\ell(n)$ or $\mathbf{x}_L(n)$. We denote the
Fig. 1. Trimmed mean filtering of a speech signal for various levels of trimming.
estimators which exploit the permutation $\mathbf{x}_\ell(n) \leftrightarrow \mathbf{x}_L(n)$ as permutation $L^j\ell$ filters, where $L$ refers to the use of order statistics, $\ell$ denotes the temporal ordering of the linear combiner forming the estimate, and $j$ refers to the amount of information extracted from the permutation mapping $p$. The structure and optimization methods used for the class of $L^j\ell$ filters parallel those of the linear finite-impulse-response filters widely used in the area of signal processing for engineering applications. The underlying concepts, however, may be extended to other filter structures that may be more amenable to other fields.
2. The estimators
2.1. $L\ell$ filters

Consider the real-valued sequence $\{x(n)\}$, and define the $N$-long observation vector at discrete time $n$ as $\mathbf{x}_\ell(n) = [x_1(n), x_2(n), \ldots, x_N(n)]^T$, where $x_i(n) = x(n + i - (K+1))$ with $N = 2K + 1$. Thus $\mathbf{x}_\ell(n)$ is a temporally ordered observation vector centered at $x(n)$. The observation samples can also be ordered by rank, which defines the vector $\mathbf{x}_L(n) = [x_{(1)}(n), x_{(2)}(n), \ldots, x_{(N)}(n)]^T$, where $x_{(1)}(n) \le x_{(2)}(n) \le \cdots \le x_{(N)}(n)$ are the sample order statistics. When there can be no confusion, for the sake of notational simplicity, the temporal index $n$ is dropped from the notation. The temporal-order and rank-order observations are
then expressed simply as $\mathbf{x}_\ell = [x_1, x_2, \ldots, x_N]^T$ and $\mathbf{x}_L = [x_{(1)}, x_{(2)}, \ldots, x_{(N)}]^T$. The subscripts $\ell$ and $L$ refer to the temporal and ranking orderings of the elements in $\mathbf{x}_\ell$ and $\mathbf{x}_L$, respectively. We define $r_i$ as the rank of $x_i$ among the elements of $\mathbf{x}_\ell$. Hence, the sample $x_i$ in $\mathbf{x}_\ell$ gets mapped to $x_{(r_i)}$ in $\mathbf{x}_L$. In the case of rank ties among a subset of input samples, stable sorting is performed, where a lower rank is assigned to the sample with the lower time index in the subset containing rank ties. $\mathbf{x}_\ell$ and $\mathbf{x}_L$ respectively contain local temporal and ranking information of the underlying time-series. It is useful to combine the marginal information of both vectors into one. To this end, the $N^2$-long vector $\mathbf{x}_{L\ell}$ is next defined as (Ghandi and Kassam 1991; Palmieri and Boncelet 1994)

$$\mathbf{x}_{L\ell}^T = [x_{1(1)}, x_{1(2)}, \ldots, x_{1(N)} \,|\, \cdots, x_{i(j)}, \cdots \,|\, x_{N(1)}, x_{N(2)}, \ldots, x_{N(N)}] , \tag{3}$$

where

$$x_{i(j)} = \begin{cases} x_i & \text{if } x_i \to x_{(j)} \\ 0 & \text{else,} \end{cases} \tag{4}$$

and where $x_i \to x_{(j)}$ denotes the event that the $i$th element in $\mathbf{x}_\ell$ is the $j$th smallest in the sample set. Thus, the $i$th input sample is mapped into the bin of samples $x_{i(1)}, x_{i(2)}, \ldots, x_{i(N)}$, of which $N - 1$ are zero and only one is non-zero, having the same value as $x_i$. The location of the nonzero sample, in turn, characterizes the ranking of $x_i$ among the $N$ input samples. The decomposition $\mathbf{x}_\ell \in \mathbb{R}^N \leftrightarrow \mathbf{x}_{L\ell} \in \mathbb{R}^{N^2}$ specified in (3) and (4) is a one-to-one nonlinear mapping, where $\mathbf{x}_\ell$ can be reconstructed from $\mathbf{x}_{L\ell}$ as

$$\mathbf{x}_\ell = [I_N \otimes \mathbf{e}_N^T]\,\mathbf{x}_{L\ell} , \tag{5}$$

where $I_N$ is an $N \times N$ identity matrix, $\mathbf{e}_N$ is an $N \times 1$ one-valued vector, and $\otimes$ is the matrix Kronecker product. Since the $\mathbf{x}_{L\ell}$ vector contains both time- and rank-ordering information, it is not surprising that we can also obtain $\mathbf{x}_L$ from $\mathbf{x}_{L\ell}$ as

$$\mathbf{x}_L = [\mathbf{e}_N^T \otimes I_N]\,\mathbf{x}_{L\ell} . \tag{6}$$
EXAMPLE 1. Consider a length-3 filter and let the observation vector be $\mathbf{x}_\ell = [3, 5, 2]^T$. The ranks of the elements in the observation vector are $r_1 = 2$, $r_2 = 3$, and $r_3 = 1$; thus,

$$\mathbf{x}_L = [2, 3, 5]^T , \qquad \mathbf{x}_{L\ell} = [0, 3, 0 \,|\, 0, 0, 5 \,|\, 2, 0, 0]^T . \tag{7}$$

The $\mathbf{x}_\ell$ vector can be reconstructed from $\mathbf{x}_{L\ell}$ as

$$\mathbf{x}_\ell = \begin{bmatrix} \mathbf{e}_3^T & \mathbf{0}^T & \mathbf{0}^T \\ \mathbf{0}^T & \mathbf{e}_3^T & \mathbf{0}^T \\ \mathbf{0}^T & \mathbf{0}^T & \mathbf{e}_3^T \end{bmatrix} [0, 3, 0 \,|\, 0, 0, 5 \,|\, 2, 0, 0]^T , \tag{8}$$

where $\mathbf{e}_3 = [1, 1, 1]^T$ and $\mathbf{0} = [0, 0, 0]^T$. Similarly, $\mathbf{x}_L$ is obtained from $\mathbf{x}_{L\ell}$ as

$$\mathbf{x}_L = [I_3 \,|\, I_3 \,|\, I_3]\,[0, 3, 0 \,|\, 0, 0, 5 \,|\, 2, 0, 0]^T , \tag{9}$$

where $I_3$ is the $3 \times 3$ identity matrix. □
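Example 1 can be checked mechanically. The following Python sketch (our function names) builds $\mathbf{x}_{L\ell}$ from (3)-(4) and applies the two selection matrices of (5) and (6), implemented as the equivalent bin sums:

```python
def decompose(x):
    """Map the time-ordered vector x into the N^2-long vector x_Ll of (3)-(4):
    sample x_i occupies slot j of its length-N bin when x_i is the j-th
    smallest (stable sorting breaks ties by time index)."""
    N = len(x)
    order = sorted(range(N), key=lambda i: (x[i], i))
    rank = {i: j for j, i in enumerate(order)}
    xLl = [0] * (N * N)
    for i in range(N):
        xLl[i * N + rank[i]] = x[i]
    return xLl

def reconstruct(xLl):
    """Eq. (5): x_l = [I_N kron e_N'] x_Ll is the sum over each bin.
    Eq. (6): x_L = [e_N' kron I_N] x_Ll is the element-wise sum across bins."""
    N = round(len(xLl) ** 0.5)
    x_time = [sum(xLl[i * N:(i + 1) * N]) for i in range(N)]
    x_rank = [sum(xLl[i * N + k] for i in range(N)) for k in range(N)]
    return x_time, x_rank
```

Running decompose([3, 5, 2]) yields [0, 3, 0, 0, 0, 5, 2, 0, 0], the vector of (7), and reconstruct recovers both xℓ = [3, 5, 2] and xL = [2, 3, 5].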
The decomposition $\mathbf{x}_\ell \leftrightarrow \mathbf{x}_{L\ell}$ thus maps a vector with time-ordered samples into a vector whose elements are both time- and rank-ordered. The manner in which each element $x_{i(j)}$ is given its value is particularly important. The rank of $x_i$, $r_i$, determines the value of all the elements $x_{i(1)}, x_{i(2)}, \ldots, x_{i(N)}$, regardless of how the other samples in the window are ranked. Given the value of $r_i$, there are $(N-1)!$ ways of assigning ranks to the remaining $N - 1$ samples such that the values of $x_{i(1)}, x_{i(2)}, \ldots, x_{i(N)}$ are not modified. This can be considered a coloring of the full permutation space described by the $N!$ ways in which the ranks of $N$ elements can be assigned. The class of $L\ell$ filters, introduced by Palmieri and Boncelet (1994) and independently by Ghandi and Kassam (1991), builds on this idea of combining temporal and rank orderings. Here, the output is a linear combination of the observation vector where the coefficient associated with the $i$th input sample also depends on its rank among all $N$ samples. The output of the filter at time $n$ can be expressed as

$$\hat{y}(n) = \mathbf{W}^T \mathbf{x}_{L\ell}(n) , \tag{10}$$

where the weight vector is $\mathbf{W} = [(\mathbf{w}_1)^T \,|\, (\mathbf{w}_2)^T \,|\, \cdots \,|\, (\mathbf{w}_N)^T]^T$, in which $\mathbf{w}_i$ is the $N$-long tap weight vector associated with the $i$th input sample, and where $\hat{y}(n)$ is the $L\ell$ estimate of a desired signal statistically related to the observation vector $\mathbf{x}_\ell(n)$. The values given to the weights in $\mathbf{W}$ must be designed according to an optimization criterion; this topic will be addressed shortly. It is useful at this point to present an example illustrating the advantages of $L\ell$ filters over traditional linear filters. Consider the information signal of Fig. 2(a), used in Ghandi and Kassam (1991), which is the superposition of two sinusoids at normalized frequencies 0.03 and 0.25 with amplitudes 10 and 20, respectively. The desired signal is transmitted through a channel that exhibits saturation. The saturation can be modeled by a sigmoid function followed by a linear time-invariant channel. The sigmoid function is given by $A(1 - e^{-\alpha d(n)})/(1 + e^{-\alpha d(n)})$, where $d(n)$ is the desired signal, and the FIR channel is a low-pass filter. The signal distorted by the sigmoid function and the FIR channel is depicted in Fig. 2(b), where $A = 20$ and $\alpha = 0.2$ are used. In addition, the channel also introduces additive contaminated Gaussian noise, whose probability density function is given by $(1 - \delta)G(0, \sigma_1^2) + \delta G(0, \sigma_2^2)$, where $G(0, \sigma^2)$ represents a Gaussian distribution function, $\delta$ is the density of outliers, and where $\sigma_1 < \sigma_2$. A contaminated Gaussian noise with $\sigma_1 = 3$, $\sigma_2 = 15$ and $\delta = 0.1$ is added to the signal. The corrupted observed signal is depicted in Fig. 2(b). Figure 3 shows segments of the linear filter output and of the $L\ell$ filter output for a window of size 9. The figures show that the output of the linear filter is severely affected whenever an outlier is present. On the other hand, single outliers
Fig. 2. Segment of (a) the original signal, and (b) the nonlinearly distorted signal and the observed signal, which is corrupted by an additive contaminated Gaussian noise.

have minor effects on the $L\ell$ filter output. These observations can be readily seen in the time interval (20-25) of Figs. 2-3, where an outlier was present in the observation sequence. The mean squared errors attained by the linear and $L\ell$ filters are 70.55 and 58.16, respectively.
Fig. 3. Segment of the estimated signals using (a) a linear filter, and (b) the $L\ell$ filter, where one trace represents the original signal and the other the estimate.
Before we conclude this section, we further elaborate on this example to motivate the introduction of the more general class of $L^j\ell$ filters. This can be done by noticing that neither the linear filter nor the $L\ell$ filter is effective in reversing the nonlinear saturation of the channel. The definition of the $L\ell$ filter incorporates
both rank and temporal information to some extent. It does not, however, fully exploit the information provided by the mapping $\mathbf{x}_\ell \leftrightarrow \mathbf{x}_L$. This follows from the fact that the weight given to the sample $x_i$ depends on the rank of $x_i$, but the rank distribution of the remaining samples does not affect the weight applied to $x_i$. In the following, these concepts are generalized into a more general filtering framework. These are denoted as permutation $L^j\ell$ filters, where $j$ is the parameter that determines the amount of time-rank information used by the estimator (Kim and Arce 1994). To illustrate the performance of these estimators as $j$ is increased, Fig. 4 shows segments of the $L^2\ell$ filter output and of the $L^3\ell$ filter output, whose mean square errors are 28.5 and 4.6, respectively. These outputs are clearly superior to those of the linear and $L^1\ell$ filters shown in Fig. 3. The higher-order $L^j\ell$ filters, in particular, are more effective in removing the saturation effects of the nonlinear channel.

2.2. $L^j\ell$ filters
Consider the observation vector $\mathbf{x}_\ell = [x_1, x_2, \ldots, x_N]^T$ and its corresponding sorted vector $\mathbf{x}_L = [x_{(1)}, x_{(2)}, \ldots, x_{(N)}]^T$. Define the rank indicator vector

$$\boldsymbol{\mathcal{R}}_i = [\mathcal{R}_{i1}, \mathcal{R}_{i2}, \ldots, \mathcal{R}_{iN}]^T , \tag{11}$$

where

$$\mathcal{R}_{ik} = \begin{cases} 1 & \text{if } x_i \to x_{(k)} \\ 0 & \text{else,} \end{cases} \tag{12}$$

and where $x_i \to x_{(k)}$ denotes that the $i$th temporal sample occupies the $k$th order statistic. The variable $r_i$ is then defined as the rank of $x_i$; hence, $\mathcal{R}_{i r_i} = 1$ by definition. Assuming the rank indicator vector $\boldsymbol{\mathcal{R}}_i$ is specified, if we would like to jointly characterize the ranking characteristics of $x_i$ and its adjacent sample $x_{i+1}$ contained in $\mathbf{x}_\ell$, then an additional indicator vector is needed which does not contain the information provided by $\boldsymbol{\mathcal{R}}_i$. Hence, we define the reduced rank indicator of $x_{i+1}$ as $\boldsymbol{\mathcal{R}}_{i+1}^1$, where we have removed the $r_i$th element from the rank indicator vector $\boldsymbol{\mathcal{R}}_{i+1}$. The two indicators $\boldsymbol{\mathcal{R}}_i$ and $\boldsymbol{\mathcal{R}}_{i+1}^1$ fully specify the rank permutation characteristics of $x_i$ and $x_{i+1}$. We can generalize this concept by characterizing the rank permutation characteristics of a set of $j$ samples. Here, a reduced rank indicator of $x_{i \oplus a}$, $\boldsymbol{\mathcal{R}}_{i \oplus a}^a$, is formed by removing the $r_i$th, $r_{i \oplus 1}$th, $\ldots$, $r_{i \oplus (a-1)}$th elements from the rank indicator vector $\boldsymbol{\mathcal{R}}_{i \oplus a}$, where $\oplus$ denotes the modulo-$N$ addition $i \oplus a = (i + a) \bmod N$.¹ The parameter $a$ specifies the sample $x_{i \oplus a}$ whose rank information is being considered in addition to the rank information of the samples $(x_i, x_{i \oplus 1}, \ldots, x_{i \oplus (a-1)})$; i.e., $a = 1$ considers the rank information of the sample $x_{i \oplus 1}$ when the rank information of the sample $x_i$ is known, and $a = 2$ considers the rank information of the sample $x_{i \oplus 2}$ when the rank information of the samples $(x_i, x_{i \oplus 1})$ is known. For
¹ The modulo-$N$ operation defined here is in the group $\{1, 2, \ldots, N\}$, such that $N \bmod N = N$ and $(N + 1) \bmod N = 1$.
Fig. 4. Segment of the estimated signals using (a) the $L^2\ell$ permutation filter, and (b) the $L^3\ell$ permutation filter, where one trace represents the original signal and the other the estimate.
example, if $\mathbf{x}_\ell = [6, 3, 10, 1]^T$ and $\mathbf{x}_L = [1, 3, 6, 10]^T$, then the rank indicator vectors and their respective rank parameters are

$$\begin{aligned}
\boldsymbol{\mathcal{R}}_1 &= [0, 0, 1, 0]^T , \quad r_1 = 3 , \qquad & \boldsymbol{\mathcal{R}}_2 &= [0, 1, 0, 0]^T , \quad r_2 = 2 , \\
\boldsymbol{\mathcal{R}}_3 &= [0, 0, 0, 1]^T , \quad r_3 = 4 , \qquad & \boldsymbol{\mathcal{R}}_4 &= [1, 0, 0, 0]^T , \quad r_4 = 1 .
\end{aligned} \tag{13}$$

The reduced rank indicator vectors $\boldsymbol{\mathcal{R}}_4^1$ and $\boldsymbol{\mathcal{R}}_1^2$ are, for instance,

$$\boldsymbol{\mathcal{R}}_4^1 = [1, 0, 0]^T , \qquad \boldsymbol{\mathcal{R}}_1^2 = [0, 1]^T , \tag{14}$$

where the $r_3$th element was removed from $\boldsymbol{\mathcal{R}}_{3 \oplus 1} = \boldsymbol{\mathcal{R}}_4$ to obtain $\boldsymbol{\mathcal{R}}_4^1$, and where the $r_3$th and $r_4$th elements were deleted from $\boldsymbol{\mathcal{R}}_{3 \oplus 2} = \boldsymbol{\mathcal{R}}_1$ to get $\boldsymbol{\mathcal{R}}_1^2$. The vector $\boldsymbol{\mathcal{R}}_4^1$ indicates that the sample $x_4$ is the first-ranked sample among $(x_1, x_2, x_4)$, and similarly $\boldsymbol{\mathcal{R}}_1^2$ indicates that $x_1$ is the second-ranked sample among $(x_1, x_2)$. The general idea behind the reduced rank indicator $\boldsymbol{\mathcal{R}}_{i \oplus a}^a$ is that it characterizes the rank information of the sample $x_{i \oplus a}$ under the situation in which the rank information of the samples $(x_i, x_{i \oplus 1}, \ldots, x_{i \oplus (a-1)})$ is known. The rank indicator vector and the reduced rank indicator vectors are next used to define the rank permutation indicator $\mathbf{P}_i^j$ as

$$\mathbf{P}_i^j = \boldsymbol{\mathcal{R}}_i \otimes \boldsymbol{\mathcal{R}}_{i \oplus 1}^1 \otimes \cdots \otimes \boldsymbol{\mathcal{R}}_{i \oplus (j-1)}^{j-1} \tag{15}$$
P]
= :
1@
I1
: :
:
:
:
:
[0, O, 1, O]7"0 [0, 1, O]r [0, 1,0,03 r @ [0,0, 1]r [0, O, O, 1]r ® [1,0,0] 7, [1,0,0,03 r ® [0, 1,0] r
(16)
Order-Statistic filtering and smoothing of time-series: Part I
To see how the P_i^j characterizes the rank permutation, let us carry out the matrix Kronecker product in the first equation in (16), that is,

P_1^2 = [(0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 0)]^T,   (17)
where parentheses are put for ease of reference. Note that the 1 located in the second position of the third parenthesis in P_1^2 implies that the rank of x_1 is 3 and that the rank of x_2 among (x_2, x_3, x_4) is 2. Thus, P_1^2 obtained in this example clearly specifies the rank permutation of x_1 and x_2 as (r_1, r_2) = (3, 2). Notice that the vectors P_i^2 can be found recursively from (16) as P_i^2 = P_i^1 ⊗ R̃_{i⊕1}^1. In general, it can be easily seen from (15) that this recursion is given by P_i^j = P_i^{j−1} ⊗ R̃_{i⊕(j−1)}^{j−1}. The rank permutation indicator forms the basis for the rank permutation vectors X_j, defined as the N·P_j^N long vector

X_j = [x_1 (P_1^j)^T | x_2 (P_2^j)^T | ... | x_N (P_N^j)^T]^T.   (18)
Note that X_j places each x_i based on the rank of j time-ordered samples (x_i, x_{i+1}, ..., x_{i+(j−1)}). Consequently, we refer to it as the L^j ℓ vector, where we have borrowed the notation from the order statistics terminology (L and ℓ refer to the rank and time variables, respectively). It should be mentioned here that there are other ways of defining rank permutation indicators. For instance, we could let P_i^j characterize the rank permutation of the samples (x_{i⊕1}, x_{i⊕3}, ..., x_{i⊕(2j+1)}), or it could characterize the rank permutation of (x_1, x_2, ..., x_j) regardless of the index i. Here, we use the definition of P_i^j in (15) since it provides a systematic approach to the design. Associated with X_j, Kim and Arce (1994) define the L^j ℓ estimate as

d̂_j = W_j^T X_j,   (19)
where the weight vector is

W_j = [(w_1^j)^T | (w_2^j)^T | ... | (w_N^j)^T]^T,   (20)

in which w_i^j is the P_j^N long tap weight vector, and where d̂_j is the L^j ℓ estimate of a desired signal statistically related to the observation vector x_ℓ. Notice that for j = 0 the permutation filter L^0 ℓ reduces to a linear FIR filter. For j = 1, the permutation filter is identical to the Lℓ filter introduced earlier.
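The construction above can be sketched numerically for the simplest nontrivial case j = 1 (the Lℓ filter). The function names and the toy weight vector below are our own illustration, not from the chapter; a full permutation filter would extend the Kronecker-product recursion to j > 1:

```python
import numpy as np

def rank_indicator(x):
    """Return the N rank indicator vectors R_i as rows of an N x N matrix:
    R[i, r-1] = 1 iff x_i is the r-th ranked (smallest-to-largest) sample."""
    ranks = np.argsort(np.argsort(x))          # 0-based rank of each sample
    N = len(x)
    R = np.zeros((N, N))
    R[np.arange(N), ranks] = 1.0
    return R

def ll_filter(x, w):
    """L-ell (j = 1) permutation filter sketch: y = w^T X_1 with
    X_1 = [x_1 R_1^T | x_2 R_2^T | ... | x_N R_N^T]^T (length N^2)."""
    x = np.asarray(x, dtype=float)
    R = rank_indicator(x)
    X1 = np.concatenate([xi * Ri for xi, Ri in zip(x, R)])
    return float(np.dot(w, X1))

# Example from the text: x_l = [6, 3, 10, 1] has ranks (3, 2, 4, 1)
R = rank_indicator(np.array([6.0, 3.0, 10.0, 1.0]))
# R[0] is [0, 0, 1, 0]: x_1 = 6 is the 3rd ranked sample
```

Choosing w to select, for every time slot, the entry associated with rank 2 makes the filter output the second order statistic, illustrating how rank and time weighting interact in one linear form.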
3. α-trimmed L^j ℓ filters
One of the advantages of order-statistic filters is that extreme observation samples always appear at the beginning and end of the sorted vector x_L (David 1982). Thus, L^j ℓ estimates can be made more robust to outliers simply by always ignoring these extreme observations (i.e., trimming). The α-trimmed L^j ℓ filters are easily defined by introducing the α-trimmed rank indicator R_i^α, which is formed by removing
G. R. Arce, Y. T. Kim and K. E. Barner
the 1st through the αth, and the (N − α + 1)th through the Nth elements from R_i, where α = 0, 1, ..., ⌊(N − 1)/2⌋. For instance, suppose we have R_i = [0, 1, 0, 0, 0]^T; then the α-trimmed rank indicator will be R_i^α = [1, 0, 0]^T for α = 1 and R_i^α = [0]^T for α = 2. The α-trimmed rank permutation indicator vector P_{i,α}^j is easily defined as

P_{i,α}^j = R_i^α ⊗ R̃_{i⊕1}^1 ⊗ ... ⊗ R̃_{i⊕(j−1)}^{j−1},   (21)
and the α-trimmed L^j ℓ vector is then defined as

X_j^α = [x_1 (P_{1,α}^j)^T | x_2 (P_{2,α}^j)^T | ... | x_N (P_{N,α}^j)^T]^T.   (22)

The α-trimmed L^j ℓ estimate immediately follows as d̂_j^α = (V_j)^T X_j^α, where V_j = [(v_1^j)^T | (v_2^j)^T | ... | (v_N^j)^T]^T, in which v_i^j is a tap weight vector of the same length as P_{i,α}^j.
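The trimming step itself is simple slicing; a minimal sketch of the α-trimmed rank indicator using the example from the text (the function name is our own):

```python
def trim_rank_indicator(R_i, alpha):
    """alpha-trimmed rank indicator: drop the first alpha and last alpha
    elements (the extreme ranks) of the length-N rank indicator R_i."""
    N = len(R_i)
    return list(R_i)[alpha:N - alpha] if alpha > 0 else list(R_i)

# Example from the text: R_i = [0, 1, 0, 0, 0]
R_i = [0, 1, 0, 0, 0]
assert trim_rank_indicator(R_i, 1) == [1, 0, 0]
assert trim_rank_indicator(R_i, 2) == [0]
```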
4. Optimization: the Wiener approach

Given the L^j ℓ filtering framework, the goal is to minimize the error e(n) between the desired signal d(n) and the permutation filter estimate. Under the MSE criterion, the optimization is straightforward, since the output of the L^j ℓ filter is linear with respect to the samples in X_j. Hence, it is simple to show that the optimal L^j ℓ filter is found as (Kim and Arce 1994)

W_j^opt = R_j^{-1} p_j,   (23)
where p_j = E{d(n) X_j} and R_j is the N·P_j^N × N·P_j^N moment matrix

R_j = E{X_j X_j^T}   (24)

    [ R_11^j  R_12^j  ...  R_1N^j ]
  = [   ...     ...   ...    ...  ]   (25)
    [ R_N1^j  R_N2^j  ...  R_NN^j ]

in which
PROPERTY 2.1. Given a length L sequence to be median filtered with a length N = 2N_1 + 1 window, a necessary and sufficient condition for the signal to be invariant (a root) under median filtering is that the extended (beginning and end appended) signal be LOMO(N_1 + 2). []

Thus, the set of signals that forms the passband or root set (invariant to filtering) of a size N median filter consists solely of those signals that are formed of constant neighborhoods and edges. Note that by the definition of LOMO(m), a change of trend implies that the sequence must stay constant for at least m − 1 points. It follows that for a median filter root signal to contain both increasing and decreasing regions, these regions must be separated by a constant neighborhood of at least N_1 + 1 identically valued samples. It is also clear from the definition of LOMO(·) that a LOMO(m_1) sequence is also LOMO(m_2) for any two
Order-Statistic filtering and smoothing of time-series: Part II
positive integers m_1 ≥ m_2. This implies that the roots for decreasing window size median filters are nested, i.e., every root of a window size M filter is also a root of a window size N median filter for all N ≤ M. This is formalized by:

PROPERTY 2.2. Let S denote a set of finite length sequences and R_{N_1} be the root set of the window size N = 2N_1 + 1 median filter operating on S. Then the root sets are nested such that ... ⊆ R_{N_1+1} ⊆ R_{N_1} ⊆ R_{N_1−1} ⊆ ... ⊆ R_1 ⊆ R_0 = S. []

In addition to the above description of the root signal set for a median filter, it can be shown that any signal of finite length is mapped to a root signal by repeated median filtering. In fact, it is simple to show that the first and last points to change value on a median filtering operation remain invariant upon additional filter passes, where repeated filter passes consist of using the output of the prior filter pass as the input of an identical filter on the current pass. This fact, in turn, indicates that any L long nonroot signal (oscillations and impulses) will become a root structure after a maximum of (L − 2)/2 successive filterings. This simple bound was improved in [17], where it was shown that at most

⌈ (L − 2) / (2(N_1 + 2)) ⌉   (15)

passes of the median filter are required to reach a root. This bound is conservative in practice since in most cases a root signal is obtained after ten or so filter passes.

The median filter root properties are illustrated through an example in Fig. 6. This figure shows an original signal and the resultant root signals after multiple passes of window size 3, 5, and 7 median filters. Note that while it takes only a single pass of the window size 3 median filter to obtain a root, it takes two passes for the window size 5 and 7 median filters. Clearly, the locally monotonic structure requirements of the root signals are satisfied in Fig. 6. For the window size 3 case, the input sequence becomes LOMO(3) after a single pass of the filter. Thus, this sequence is in the root set of the window size 3 median filter, but not a root of the window size N > 3 median filter since it is not LOMO(N) for N > 3.

The deterministic and statistical properties form a powerful set of tools for describing the median filtering operation and performance. Together, they show that the median filter is an optimal estimator of location for Laplacian noise and that common signal structures, e.g., constant neighborhoods and edges in images, are in the filter pass-band (root set). Moreover, impulses are removed by the filtering operation, and repeated passes of the median filter always result in the signal converging to a root, where the root consists of a well defined set of structures related to the filter window size.

2.4. Median filtering and threshold decomposition
A fundamental property of median filters is threshold decomposition (Fitch et al. 1984). This property was the key to deriving many of the median filter statistical and deterministic properties. Moreover, threshold decomposition is instrumental
K. E. Barner and G. R. Arce
[Figure: input signal x(n); root signal for a window of size 3 (1 filter pass); root signal for a window of size 5 (2 filter passes); root signal for a window of size 7 (2 filter passes).]

Fig. 6. Root signals obtained by median filters of size 3, 5, and 7. o: appended points.
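The root-convergence behavior illustrated in Fig. 6 can be sketched directly; the implementation below (our own helper names, with endpoint replication as the "appended" extension) filters repeatedly until the output no longer changes:

```python
import numpy as np

def median_filter(x, N):
    """One pass of a window-size-N (N odd) median filter, with the beginning
    and end of the signal appended (replicated) N1 = (N - 1)//2 times."""
    N1 = (N - 1) // 2
    x = np.asarray(x, dtype=float)
    ext = np.concatenate([np.full(N1, x[0]), x, np.full(N1, x[-1])])
    return np.array([np.median(ext[i:i + N]) for i in range(len(x))])

def filter_to_root(x, N, max_passes=100):
    """Apply the median filter repeatedly until the signal is a root
    (invariant under further filtering). Returns (root, passes_used)."""
    x = np.asarray(x, dtype=float)
    for p in range(1, max_passes + 1):
        y = median_filter(x, N)
        if np.array_equal(x, y):
            return y, p - 1
        x = y
    return x, max_passes

root, passes = filter_to_root([0, 4, 0, 1, 3, 1, 2, 2, 2, 0], 3)
assert np.array_equal(median_filter(root, 3), root)   # a true root
```

Note how an isolated impulse is removed on the very first pass, while edges and constant neighborhoods survive unchanged, exactly the root-set structure described above.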
in the optimization of the median filter generalizations discussed in the following sections. A review of this important property is therefore in order.

Threshold decomposition is simply a means of decomposing an M-level signal into an equivalent set of M − 1 binary sequences.² Let x(n) = [x_1, x_2, ..., x_N] be an N element observation vector where the signal is quantized to M levels such that x(n) ∈ Z_M^N, Z_M = {0, 1, ..., M − 1}. The threshold decomposition of x(n) results in the set of binary vectors X^1(n), X^2(n), ..., X^{M−1}(n), where X^i(n) ∈ {0, 1}^N is the observation vector thresholded at level i for i = 1, 2, ..., M − 1. As a function of the threshold operator T_i[·],

X^i(n) = T_i[x(n)]   (16)
       = [T_i[x_1], T_i[x_2], ..., T_i[x_N]]   (17)
       = [x_1^i, x_2^i, ..., x_N^i],   (18)

² For now we restrict the discussion to quantized signals. This restriction is lifted in Section 3.4.
where T_i[·] is defined as

x_j^i = T_i[x_j] = { 1  if x_j ≥ i,
                     0  otherwise,   (19)

for i = 1, 2, ..., M − 1 and j = 1, 2, ..., N. In terms of the time indexed samples, X^i(n) = T_i[x(n)]. Threshold decomposition can be reversed by simply adding the threshold decomposed signals,

x(n) = Σ_{i=1}^{M−1} X^i(n)   and   x_j = Σ_{i=1}^{M−1} x_j^i.   (20)
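The decomposition (19) and the reconstruction (20) are a few lines of code; a minimal sketch (our own function name):

```python
import numpy as np

def threshold_decompose(x, M):
    """Decompose an M-level signal x (values in {0, ..., M-1}) into the
    M-1 binary signals X^i(n) = 1 iff x(n) >= i, as in (19)."""
    x = np.asarray(x)
    return [(x >= i).astype(int) for i in range(1, M)]

# Reconstruction (20): summing the binary layers recovers the signal
x = np.array([1, 1, 0, 2, 0, 3, 3, 1, 2, 2])
layers = threshold_decompose(x, 4)
assert np.array_equal(sum(layers), x)
```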
Furthermore, it was shown by Fitch et al. that the median operation commutes with thresholding (see Fitch et al. 1985). Stated more formally, the median filtering of an M-level signal x(n), x(n) ∈ {0, 1, ..., M − 1}^N, is equivalent to filtering the M − 1 threshold signals X^1(n), X^2(n), ..., X^{M−1}(n) and summing the results,

Med[x(n)] = Σ_{i=1}^{M−1} Med[X^i(n)]   (21)
for all n. Thus, threshold decomposition is a weak superposition property. A related property is the partial ordering property known as the stacking property.

DEFINITION 2.6. Let X and Y be N element binary vectors. Then X stacks on Y, which is denoted as Y ≤ X, if and only if Y_i ≤ X_i for i = 1, 2, ..., N. A function f(·) possesses the stacking property if and only if

Y ≤ X ⟹ f(Y) ≤ f(X).   (22) []

The median filter was shown to possess the stacking property (Fitch et al. 1985), which can be stated as follows. In the threshold decomposition domain, the binary median filter output at threshold level i is always less than or equal to the binary median filter output at lower threshold levels:

Med[X^i(n)] ≤ Med[X^j(n)]   (23)

for all i, j such that 1 ≤ j ≤ i ≤ M − 1. The stacking property is a partial ordering property. It states that the results of applying the median filter to each of the binary sequences obtained by thresholding the original signal have a specific structure to them. Thus, in median filtering by threshold decomposition, the input sequence is first decomposed into M − 1 binary sequences, and each of these is then filtered by a binary median filter. Furthermore, the set of output sequences possesses the stacking property. As a simple example, consider the median filter of window size three (N = 3) being applied to a 4-level input signal as shown in Fig. 7. The outputs of the multi-level median filter and of the threshold decomposition median filter are identical because of the weak superposition property.
[Figure: a 4-level input signal is median filtered directly (top). In the bottom path the signal is thresholded at 1, 2, and 3, each binary signal is passed through a binary median filter, and the binary outputs are added.]
Fig. 7. Median Filtering by threshold decomposition. The 4-valued input signal is filtered by the running sorting method in the top part of the figure. In the bottom part of the figure, the signal is first decomposed into a set of binary signals and each of these is filtered by a binary median filter. The output is produced by adding together the outputs of the binary median filters.
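The equivalence in Fig. 7 is easy to verify numerically. The sketch below uses a hypothetical 4-level signal (the figure's exact values are not reproduced here) and checks both the weak superposition property (21) and the stacking property (23):

```python
import numpy as np

def med3(x):
    """Window-size-3 running median with endpoint replication."""
    x = np.asarray(x)
    ext = np.concatenate([[x[0]], x, [x[-1]]])
    return np.array([int(np.median(ext[i:i + 3])) for i in range(len(x))])

x = np.array([1, 1, 0, 2, 0, 3, 3, 1, 2, 2])     # hypothetical 4-level input
direct = med3(x)                                  # multi-level median filter

# Threshold at 1, 2, 3, filter each binary signal, and add the outputs (21)
layers = [(x >= i).astype(int) for i in (1, 2, 3)]
outs = [med3(l) for l in layers]
via_td = sum(outs)
assert np.array_equal(direct, via_td)             # weak superposition

# Stacking (23): higher-level binary outputs never exceed lower-level ones
assert np.all(outs[2] <= outs[1]) and np.all(outs[1] <= outs[0])
```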
3. Weighted median filters
Numerous generalizations of the median filtering operation have been introduced since Tukey first suggested the median filter as a smoother in 1974 (see Tukey 1974). While many different approaches have been taken in an attempt to improve the median filter performance, most have, in some way, attempted to include temporal information in the filtering process. For most signals, and certainly those of practical interest, it is clear that certain observation samples have a higher degree of correlation with the desired estimate than do others. In the linear filter case, this correlation is reflected in the weight given each sample. A similar weighting approach can be taken to generalize the median filter.

The sample weighting approach to generalizing the median filter is developed in this section. We begin by discussing the Center Weighted Median (CWM) filter, in which only one sample, the sample centrally located in the observation window, is weighted. This is then generalized to the Weighted Median (WM) filter case, in which all observation samples are weighted. In both the CWM and WM filter cases the output is the median value of the weighted set. A further generalization can be achieved by allowing the output to be an order statistic other than the median. This leads to the class of Weighted Order Statistic (WOS) filters. Following the development of these generalizations, we show that each possesses the threshold decomposition property. As noted earlier, threshold decomposition is an extremely powerful tool for both filter analysis and optimization, and is the final topic covered in this section.
3.1. Center weighted median filters
The median filter is strictly a rank order operator. Thus, all temporal locations within the observation window are considered equivalent. That is, given a window of observation samples, any permutation of the samples within the observation window results in an identical median filter output. As stated above, for most signals certain samples within the observation window are more correlated with the desired estimate than are others. Due to the symmetric nature of the observation window, the sample most correlated with the desired estimate is, in general, the center observation sample. The center observation sample can be weighted to reflect its importance, or correlation with the desired estimate. Since median filters select the output in a different fashion than do linear filters, i.e., ranking versus summing, the observation samples must also be weighted differently. In the median filtering case, weighting is accomplished through repetition. Thus, the output of the CWM filter is given by

y(n) = Med[x_1, ..., x_{c−1}, x_c ◊ w_c, x_{c+1}, ..., x_N],   (24)

where c = (N + 1)/2 is the index of the center sample, w_c is a positive odd integer, and x_c ◊ w_c denotes replication of x_c a total of w_c times. For w_c = 1 the CWM filter reduces to the standard median filter, while for w_c ≥ N the CWM reduces to an identity operation. On the right side of (24) the time index n has been dropped for notational simplicity and the observation samples indexed according to their location in the observation window. In terms of the time series, the samples in the observation window are x_i = x(n − (N_1 + 1) + i) for i = 1, 2, ..., N.

The effect of varying the center sample weight is perhaps best seen by way of an example. Consider a segment of recorded speech. The voiced waveform "a" is shown at the top of Fig. 8. This speech signal is taken as the input of a CWM filter of size 9. The outputs of the CWM, as the weight parameter w_c varies from 1 to 9, are also shown in Fig. 8. The vertical index denotes the value given to w_c. The signal at the top is the original signal, or the output signal of the CWM when w_c = N, or 9 in this example. The second signal from the top is the CWM filtered signal with w_c = N − 1. The weight w_c is successively decreased until w_c = 1, in which case the CWM filter reduces to the standard median.

The smoothing characteristics of the CWM filter, as a function of the center sample weight, are illustrated in the previous example and figure. Clearly, as w_c is increased less smoothing occurs. This response of the CWM filter is explained by the following property, which relates the weight w_c and the CWM filter output to select order statistics (OS). The N observation samples x_1, x_2, ..., x_N can be written as an OS vector,

x_L = [x_(1), x_(2), ..., x_(N)],   (25)
Fig. 8. Effects of increasing the center weight of a CWM filter of size N = 9 operating on the voiced speech "a". The CWM filter output is shown for w_c = 1, 3, 5, 7, 9. Note that for w_c = 1 the CWM reduces to the median filter, and for w_c = 9 it becomes the identity filter.
where x_(1) ≤ x_(2) ≤ ... ≤ x_(N). The following relation (Hardie and Boncelet 1993; Ko and Lee 1991) utilizes this notation.

PROPERTY 3.1. Let {y} be the output of a CWM filter operating on the sequence {x}. Then

y(n) = Med[x_1, ..., x_{c−1}, x_c ◊ w_c, x_{c+1}, ..., x_N]   (26)
     = Med[x_(k), x_c, x_(N+1−k)],   (27)

where k = (N + 2 − w_c)/2 for 1 ≤ w_c ≤ N, and y(n) = x_c for w_c > N. []

From this property we can write the CWM filter output y(n) as

y(n) = { x_c          if x_(k) ≤ x_c ≤ x_(N+1−k),
         x_(k)        if x_c < x_(k),
         x_(N+1−k)    if x_c > x_(N+1−k).   (28)

Since x(n) is the center sample in the observation window, i.e., x_c = x(n), equation (28) indicates that the output of the filter is identical to the input as long as x(n) lies in the interval [x_(k), x_(N+1−k)]. If the center input sample is greater than x_(N+1−k), the filter outputs x_(N+1−k), guarding against a high rank order (large)
aberrant data point being taken as the output. Similarly, the filter's output is x_(k) if the sample x(n) is smaller than this order statistic. This CWM filter performance characteristic is illustrated in Figs. 9 and 10. Figure 9 shows how the input sample is left unaltered if it is between the trimming statistics x_(k) and x_(N+1−k) and mapped to one of these statistics if it is outside this range. Figure 10 shows an example of the CWM filter operating on a Laplacian sequence. Along with the input and output, the trimming statistics are shown. It is easily seen how increasing k tightens the range in which the input is passed directly to the output.
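Both forms of the CWM filter, replication as in (24) and the order-statistic clamp of (26)-(28), can be sketched and cross-checked numerically (our own function names, odd w_c assumed):

```python
import numpy as np

def cwm(window, wc):
    """Center weighted median via replication (24): the center sample is
    repeated wc times before taking the median. wc is an odd integer."""
    window = list(window)
    c = len(window) // 2                      # center index, N odd
    expanded = window + [window[c]] * (wc - 1)
    return float(np.median(expanded))

def cwm_via_os(window, wc):
    """Equivalent form (26)-(28): clamp the center sample between the order
    statistics x_(k) and x_(N+1-k), with k = (N + 2 - wc)/2."""
    N = len(window)
    k = (N + 2 - wc) // 2
    xs = np.sort(window)
    lo, hi = xs[k - 1], xs[N - k]             # x_(k) and x_(N+1-k), 1-based
    return float(min(max(window[N // 2], lo), hi))

w = [12.0, 6.0, 4.0, 1.0, 9.0]
for wc in (1, 3, 5):
    assert cwm(w, wc) == cwm_via_os(w, wc)
```

For w_c = 1 both forms give the plain median (6 for this window), while for w_c = N = 5 the center sample 4 passes through unchanged, matching the identity behavior noted above.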
Fig. 9. The center weighted median filtering operation. The center observation sample is mapped to the order statistic x_(k) (x_(N+1−k)) if the center sample is less (greater) than x_(k) (x_(N+1−k)), and left unaltered otherwise.
[Figure: input signal, trimming order statistics, and filter output over 200 samples.]

Fig. 10. An example of the CWM filter operating on an i.i.d. Laplacian sequence with unit variance. Shown are the filter input and output sequences as well as the trimming statistics x_(k) and x_(N+1−k). The filter window size is 25 and k = 7.
3.2. Weighted median filters

The weighting scheme used by CWM filters can be naturally extended to include all input samples. To this end, let w = [w_1, w_2, ..., w_N] be an N long weight vector with positive integer elements that sum to an odd number, i.e., Σ_{i=1}^N w_i is odd. Given this vector of weights, the WM filter operation is defined as (Brownrigg 1984)

y(n) = Med[x(n) ◊ w]   (29)
     = Med[x_1 ◊ w_1, x_2 ◊ w_2, ..., x_N ◊ w_N].   (30)

Thus, WM filters incorporate temporal order information by weighting samples according to their temporal order prior to rank filtering. The filtering operation is illustrated through the following example.

EXAMPLE 3.1. Consider the window size 5 WM filter defined by the symmetric weight vector w = [1, 2, 3, 2, 1]. For the observation x(n) = [12, 6, 4, 1, 9], the filter output is found as

y(n) = Med[12 ◊ 1, 6 ◊ 2, 4 ◊ 3, 1 ◊ 2, 9 ◊ 1]
     = Med[12, 6, 6, 4, 4, 4, 1, 1, 9]
     = Med[1, 1, 4, 4, 4, 6, 6, 9, 12]
     = 4,   (31)
where the median value in (31) is the center sample of the ordered set. The large weighting on the center input sample results in this sample being taken as the output. As a comparison, the standard median output for the given input is y(n) = 6. []

The WM filtering operation can be schematically described as in Fig. 11. This figure illustrates that as the filter window slides over an input sequence, the observation samples are duplicated (weighted) according to their temporal order within the window. This replication forms an expanded observation set which is then ordered according to rank, and the median sample is selected as the output. In this fashion specific temporal order samples can be emphasized, and others deemphasized. The figure also illustrates that structurally, the WM filter is similar to the linear FIR filter. This relationship between linear and WM filters can be further explored through an alternative WM filter definition.

The constraint that the WM filter weights be integer valued can be relaxed through a second, equivalent, filter definition. Thus, let w be an N element weight vector with positive (possibly) non-integer elements. The output of the WM filter defined by w and operating on the observation x(n) can be defined as

y(n) = arg min_β D_w(β),   (32)

where D_w(·) is the weighted distance operator

D_w(β) = Σ_{i=1}^N w_i |x_i − β|.   (33)
[Figure 11: the observation window slides over the input sequence x_1, ..., x_N; the samples are replicated according to their weights, the expanded set is sorted, and its median is taken as the output.]
λ₀, but both parameters being unknown. Even though a UMP test of (1) exists and is of the form

Decide H₁ iff Z ≥ b,   (2)
the threshold b cannot be determined for a given probability of false alarm, P_F (same as the type I error probability), since λ₀ is not known. The P_F of the test (2) with a fixed b varies significantly even with small changes in λ₀. A method to obtain a CFAR test is then based on obtaining several reference samples X = (X_1, X_2, ..., X_n) as the output of the squared envelope detector corresponding to the radar returns in the cells adjacent to the test cell (typical values of n range from 10 to 30). The hope is that the noise plus clutter present in these reference cells is similar to the noise plus clutter in the test cell, and therefore a reasonable estimator of λ₀ using the reference samples can be obtained. Typically it is assumed that the samples X_1, X_2, ..., X_n are independent among themselves and are independent of Z. Correlation between the samples might occur when the samples are converted to digital signals using an A/D operation. A/D sampling frequencies higher than the radar bandwidth cause adjacent resolution cell voltages to be statistically correlated (Nitzberg 1992). As a first degree analysis, such correlation effects can be ignored. Denoting the estimator as S(X), a test inspired by (2) is given by

Decide H₁ iff Z ≥ tS,   (3)

whose false alarm probability is

P_F = P(Z ≥ tS | λ₀).   (4)
If λ₀ is the scale parameter of the density of S and if it is the only parameter, then P_F is independent of λ₀, and a constant t that achieves the required false alarm probability can be found. In the case of the exponential density with identically distributed {X_i}, the sample mean (1/n) Σ_{i=1}^n X_i is a UMVUE of λ₀, and the test (3) with S as the sample mean is called the cell average (CA-CFAR) test:

Decide H₁ iff Z ≥ tS, with S = (1/n) Σ_{i=1}^n X_i.   (5)
Fig. 1. A CFAR test based on adjacent resolution cells.

The CA-CFAR test is appealing because it uses a UMVUE. If the problem (1) is modified such that under H₀ i.i.d. exponential reference samples {X_i} with mean λ₀ are available, then it has been shown recently by Gandhi and Kassam that the CA-CFAR test is indeed UMP for a specified P_F (see Gandhi and Kassam 1994). The CA test is not totally satisfactory because the X_i's may not always be identically distributed. It is well known that the sample mean is not a robust estimator when outliers are present. Realistically, with a reasonably large number of adjacent cells, it is likely that 1) some of these samples are from other interfering targets and 2) two groups of samples may be from differing clutter power backgrounds when a clutter transition occurs within the range of resolution cells. There are several models for clutter transition, such as ramp, step, etc. (Nitzberg 1992), but we consider only the step clutter transition in the sequel. This seems to be predominant in many practical situations. Fig. 2 illustrates a step clutter, showing two values for clutter power, one to the left and the other to the right of the transition.

When the clutter transition occurs, ideally we want the estimate of the power level (which is the mean in the case of the exponential) of the clutter-plus-noise background that is present in the test cell under investigation. Since estimates are based on a finite number of samples, the ideal value cannot be realized. The tradeoff parameters are 1) good probability of detection performance in the homogeneous background, that is, no interfering targets or clutter power variations, 2) a good resolution of multiple targets that may be closely spaced within the resolution cells, and 3) low false alarm rate swings during clutter transitions.
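For the exponential model, the CA-CFAR threshold multiplier has a standard closed form, P_F = (1 + t/n)^(−n) under homogeneous noise, which the sketch below (our own helper names) uses and verifies by Monte Carlo. A design P_F of 10⁻³ is used here, rather than the text's 10⁻⁶, only to keep the simulation small:

```python
import numpy as np

def ca_cfar_threshold(pf, n):
    """Threshold multiplier t for CA-CFAR with n exponential reference cells.
    Under homogeneous noise PF = (1 + t/n)^(-n), so t = n (PF^(-1/n) - 1)."""
    return n * (pf ** (-1.0 / n) - 1.0)

def ca_cfar_decide(z, x, t):
    """Decide H1 iff the test cell Z exceeds t times the sample mean, as in (5)."""
    return z >= t * np.mean(x)

rng = np.random.default_rng(0)
n, pf = 24, 1e-3
t = ca_cfar_threshold(pf, n)

# Monte Carlo estimate of the achieved false alarm rate under H0
trials = 200_000
z = rng.exponential(1.0, trials)            # test cells, noise only
x = rng.exponential(1.0, (trials, n))       # homogeneous reference cells
pf_hat = np.mean(z >= t * x.mean(axis=1))   # should be close to pf
```

Repeating the experiment with one reference cell replaced by a strong interfering return shows the target-masking effect described above: the inflated mean raises the adaptive threshold and detections are lost.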
R. Viswanathan
[Figure: clutter power versus cell number (1 through n + 1); the power steps from low to high at a transition located within the reference window containing the test cell.]

Fig. 2. Step clutter.
Two practical problems associated with the performance of CA-CFAR can now be seen. The threshold t in the CA test of (5) is computed using a homogeneous background and a specified probability of false alarm, which is typically of the order of 10⁻⁶. If there is even a single interfering target causing one of the reference samples in the set {X_i, i = 1, ..., n} to have a larger value, then the estimate S will assume a value larger than what it should. The consequence of this is that if a target is present in the test cell, the probability that Z exceeds tS will be diminished, yielding a low probability of detection. This is called the target masking phenomenon. Another undesirable situation is when a clutter transition occurs within the reference cells such that the test cell is in the high clutter power region along with half of the reference cells, with the remaining cells in the low clutter. In this case the estimate S will be lower than what it should be, and if there is no target in the test cell, the probability that Z exceeds tS will be increased, yielding a large increase in the probability of false alarm (upper false alarm swing). It is to be understood that for any of the CFAR tests discussed here, the test will be designed, that is, the appropriate threshold constant and test statistic will be found, so that the test has the desired false alarm probability under a homogeneous noise background.

Historically, two variations of CA-CFAR, termed GO-CFAR and SO-CFAR, were proposed to remedy some of the problems associated with CA-CFAR. With reference to Fig. 1, the SO-CFAR, which stands for smallest-of-CFAR, takes as the estimate of λ₀ the minimum of the two arithmetic means formed using the samples to the right of the test cell (lagging window) and the samples to the left of the test cell (leading window), respectively (Weiss 1982). Therefore, if one or more interfering targets are present in only one of the lagging or leading windows, then the estimate S will not be large as in the CA case, and therefore target masking does not happen. However, target masking happens if interfering targets appear in both the windows.

Order statistics application to CFAR radar target detection

Also, it is unable to control the upper false alarm swing that happens with a clutter transition. The GO-CFAR, or the greatest-of-CFAR, computes as its estimate the maximum of the two arithmetic means from the two windows (Hansen 1973). This controls the upper false alarm swing during clutter transition, but target masking occurs when interfering targets are present. Thus, these two variations of CA-CFAR are able to address one, or the other, but not both of the problems encountered with the CA-CFAR. Estimators based on order statistics are known to be robust. Also, not using the samples from interfering targets in obtaining an estimate is essentially a problem of estimation with outliers, and therefore estimators such as the trimmed mean, linear combinations of order statistics, etc., should prove useful. The rest of the chapter is organized as follows. In Section 2 we discuss order statistics based CFAR tests for target detection in Rayleigh clutter. Section 3 presents order statistics based CFAR tests for Weibull, log-normal, and K-clutter. In Section 4 we conclude this chapter.
2. Order statistics based CFAR tests for Rayleigh clutter
When the clutter amplitude is Rayleigh distributed, the squared-envelope detector output is exponentially distributed. Since the noise also has a Rayleigh amplitude, the squared-envelope detector output when noise plus clutter is present at its input will also be exponentially distributed. When a clutter power transition occurs, for the sake of convenience, we call the high clutter plus noise simply clutter (or high clutter) and the low clutter plus noise simply noise (or low clutter). Reflections from an airplane due to the transmission of a single pulse can be modeled as a Gaussian process, with the corresponding amplitude distributed as Rayleigh. Such a model for the target is called the Rayleigh fluctuating target model or simply the Rayleigh target (Di Franco and Rubin 1980; Nathanson 1991). For multiple pulse transmission, the target returns can be classified into four categories, Swerling I through Swerling IV. Swerling I and II have Rayleigh distributed amplitude, whereas Swerling III and IV have Rayleigh-plus-one-dominant amplitude distribution (Di Franco and Rubin 1980; Nathanson 1991). For the most part we concern ourselves with the single pulse case and the Rayleigh target. With a Rayleigh target, under the target hypothesis H₁, the envelope W will be Rayleigh distributed (i.e., Z will be distributed as an exponential). Therefore, the test sample and the reference samples are all independent and exponentially distributed. The mean of each of these samples is determined according to the scenario. The mean of the squared-envelope detector output when only noise is present at its input is assumed to be λ₀. The mean when a target reflection is present is taken as λ₀(1 + SNR), where SNR stands for the signal-to-noise power ratio, which is the ratio of the means of the squared envelope corresponding to target and noise, respectively. Similarly, one can define INR, the interfering target-to-noise power ratio, and
Table 1
The mean values of different cells

  Cell                  Condition             Mean of the exponential distribution
  Test cell Z           Noise only            λ₀(1 + SNR) under H₁;  λ₀ under H₀
  Test cell Z           Clutter (high)        λ₀(1 + SNR + CNR) under H₁;  λ₀(1 + CNR) under H₀
  Reference cell X_i    Noise only            λ₀
  Reference cell X_i    Clutter (high)        λ₀(1 + CNR)
  Reference cell X_i    Interfering target    λ₀(1 + INR)
CNR, the clutter (high)-to-noise power ratio. The detection problem can be summarized as follows.

2.1. Fixed order statistics test (OS-CFAR)
The fixed order statistics test (OS-CFAR) is based on the following:

Decide H₁ iff Z ≥ t Y_r,   (6)

where Y_r is the r-th order statistic of the samples {X_i, i = 1, ..., n}. Since the sample size n is clear from the context, the r-th order statistic is denoted by Y_r instead of the customary Y_{r:n}. The first thing to observe is that (6) is a CFAR test, because under a homogeneous background λ₀ is also the scale parameter of the density of Y_r. The probability of false alarm and the probability of detection under a homogeneous background are given by (Gandhi and Kassam 1988; Rohling 1983)

P_F = P(Z ≥ t Y_r | H₀) = Π_{i=0}^{r−1} (n − i)/(n − i + t),   (7)

P_D = P(Z ≥ t Y_r | H₁) = Π_{i=0}^{r−1} (n − i)/(n − i + t/(1 + SNR)).   (8)
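Expressions (7) and (8) are products over a handful of terms and are straightforward to evaluate; a minimal sketch with our own function names:

```python
import numpy as np

def os_cfar_pf(t, n, r):
    """False alarm probability of OS-CFAR, per (7):
    PF = prod_{i=0}^{r-1} (n - i) / (n - i + t)."""
    i = np.arange(r)
    return float(np.prod((n - i) / (n - i + t)))

def os_cfar_pd(t, n, r, snr):
    """Detection probability, per (8): replace t by t / (1 + SNR)."""
    return os_cfar_pf(t / (1.0 + snr), n, r)

# Sanity checks: t = 0 gives PF = 1, and PF falls as t grows
assert os_cfar_pf(0.0, 24, 18) == 1.0
assert os_cfar_pf(10.0, 24, 18) < os_cfar_pf(5.0, 24, 18)
```

In practice one solves (7) numerically for the t that achieves a design P_F (e.g., 10⁻⁶ with n = 24), then reads P_D from (8) as a function of SNR.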
Notice that P_D can be found from the P_F expression by replacing t with t/(1 + SNR). Because of the assumed Rayleigh target and Rayleigh clutter models, this relation between P_F and P_D is valid for any CFAR test. Rohling first proposed an order statistic test for CFAR detection (Rohling 1983). It was later analyzed by Gandhi and Kassam (1988) and Blake (1988). Detailed analyses of CA-CFAR and OS-CFAR are presented in (Gandhi and Kassam 1988). As this paper shows, in a homogeneous background, for n = 24, P_F = 10⁻⁶, the probability of detection of OS-CFAR, when r is greater
Order statistics application to CFAR radar target detection
[Figure: n = 24, PF = 1E-6]
d ≥ 1 and 1 < b < n. b has to be chosen smaller than n/2 so as to make possible an inference about the presence of a clutter transition near the middle of the reference window. d has to be chosen as a compromise between the detection performances under homogeneous and interfering-target situations. For a Rayleigh target in Rayleigh clutter, closed form expressions for PF corresponding to the homogeneous and interfering-target situations are obtained in (Viswanathan and Eftekhari 1992). In the homogeneous case, the probability of selecting the subset of size r is given by
    Ps(r) = b ∑_{i=0}^{b−1} C(b−1, i) (−1)^i beta(v − b + 1, ·)    (19)
where beta(·, ·) is the standard beta function. Based on (19) and another expression for Ps(r) corresponding to the interfering-target situation, it is possible to reasonably optimize d for a given b. Ideally, the subset selection should meet the following.
1) All X_i's should be selected if they are from a homogeneous background.
2) If there are multiple targets, the samples due to these should not be selected, but the rest should be included.
3) If clutter backgrounds with differing power levels exist, the samples whose power levels are the same as the one present in the test cell must be selected, and the rest should not be selected.
In practice, all
these requirements cannot be met. The design of (b, d) can be completed by considering a few b values below n/2 and then obtaining the best (b, d) value over all the choices considered. A smaller d is better when interfering targets exist, whereas a larger d is preferable in the homogeneous case. As shown in (Viswanathan and Eftekhari 1992), a compromise value for d can usually be chosen from a study of Ps(r). The design of SE is then completed by specifying β. This can be done by means of a "look-up" table that provides the proper choice of β for every r value. The proper choice, as explained in (Viswanathan and Eftekhari 1992), is based on logical reasoning. For example, if r is determined to be close to n/2, it implies a possible clutter transition situation, and therefore β needs to be kept close to n to control the upper false alarm swing. It is shown that for a given OS-CFAR, an SE test can be designed so that (i) it can tolerate an additional target over the OS, and (ii) its false alarm increase in clutter transition is much below the level of the OS, as well as of the VTM. The false alarm control during clutter transition gets better as CNR increases. This is to be anticipated, because as CNR increases, it is much easier to identify the "outliers" (the high clutter samples) from the composite group. The subset selection (18) is identical to the one used in the VTM (16), but by choosing b smaller than n/2, and by having a better estimation procedure, the SE test is able to provide better false alarm control during clutter transition.

Gandhi and Kassam considered another test called the adaptive order statistic detector (AOS) (Gandhi and Kassam 1989). The AOS uses a statistic similar to (17), where β takes one of two possible values, k1 or k0, with k1 ≥ k0. These two numbers are the design parameters. A hypothesis test on the reference samples yields a decision on whether a clutter transition is present within the window.
If the clutter-present decision is made, the order k1 is chosen; otherwise the order k0 is used. Like the SE test, the AOS can be designed to limit the false alarm increase during clutter transition.

Lops and Willett considered another order statistic based scheme called LI-CFAR (Lops and Willett 1994). It is based on a combination of linear and ordering operations. Denote the rank-ordered version of the reference vector X as Xr = (X(1), X(2), ..., X(n)). Here, for convenience, the rth order statistic Y_r is denoted as X(r). The test is again based on (3), with S given by
    S = c^T w    (20)

where

    w = (X(1)1, ..., X(1)n | X(2)1, ..., X(2)n | ... | X(n)1, ..., X(n)n)^T    (21)

    X(j)k = X_k if X(j) ↔ X_k, and 0 otherwise    (22)
and the notation X(j) ↔ X_k means that the kth element in X occupies the jth location in the ranked vector Xr. Both c and w are column vectors of length n². The design of the LI test (or filter) is controlled by the elements of c, c_{j,k}, j = 0, 1, ..., n − 1; k = 1, 2, ..., n. c is obtained as the solution to the optimization problem

    c = arg min E((2σ − S)²)    (23)

subject to the constraints

    c ≥ 0  and  E(S) = 2σ    (24)
This is a quadratic programming problem for which an efficient solution exists. The solution to (23) depends on two quantities, Rw = E(ww^T) and p = E(2σ w). If (23) is solved for a homogeneous background, the solution turns out to be CA-CFAR, because the sample mean is the minimum mean square error unbiased estimator. Since analytical expressions for Rw and p are not available, these have to be estimated from realistic realizations of the vector w. That is, the LI filter must be trained with realizations of X that best describe the different heterogeneous and homogeneous situations. A model for the generation of X is then based on the following. Each reference cell is affected independently by an interfering target return with probability p_i (the subscript i denotes interferer). A step clutter occurs (or does not occur) within the window with probability p_c (respectively, 1 − p_c). Whether the step is low to high or high to low is then decided on an equally likely basis. In order to generate the interferer and clutter samples, the parameters INR and CNR are also needed. The authors show that the LI filter is an average of a collection of linear combinations of order statistic filters (L-filters). They provide performance equations for a Rayleigh target in Rayleigh clutter, but simulation of the relevant random variables yields computationally faster error probability estimates than a direct evaluation of the analytical expressions, which is combinatorially explosive. Even though the design of the LI filter is elaborate and requires a training process, once the coefficients are obtained, the actual on-line implementation is simple (notice that only n terms in the vector w are nonzero). Based on the results of this study, it can be said that the LI filter provides better false alarm control than a simple OS-CFAR. It is not known whether an LI-type filter or the SE test performs better overall, as no comparative study has been done.
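The rank-scatter structure of w in (21)-(22), and the remark that the homogeneous-background solution of (23) reduces to cell averaging, can be illustrated with a small sketch (the function name and variable names are illustrative, not from the chapter):

```python
def build_w(x):
    """Build the length n^2 vector w of Eqs. (21)-(22).

    Block j (0-based) corresponds to rank j+1; within block j, position k
    holds x[k] if x[k] is the (j+1)-th smallest sample, and 0 otherwise,
    so exactly n of the n^2 entries are nonzero.
    """
    n = len(x)
    # order[j] = index of the (j+1)-th smallest sample
    order = sorted(range(n), key=lambda k: x[k])
    w = [0.0] * (n * n)
    for j, k in enumerate(order):
        w[j * n + k] = x[k]
    return w

# With uniform weights c = (1/n, ..., 1/n), S = c^T w reduces to the
# cell-averaging statistic, consistent with the remark above that the
# homogeneous-background solution of (23) is CA-CFAR.
x = [3.0, 1.0, 2.0]
w = build_w(x)
s = sum(w) / len(x)   # equals the sample mean of x
```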
Barkat, Himonas, and Varshney proposed a generalized censored mean level detector (GCMLD) (Barkat et al. 1989). The data dependent censoring scheme of the GCMLD determines the number of interfering targets present in the reference window, for a given probability of false censoring. This detector, however, is designed to operate satisfactorily in homogeneous backgrounds and multiple target environments only, and would exhibit a considerable increase in the false alarm rate in regions of clutter power transitions. Finn considered a multistage CFAR design procedure, which is not order statistic based but which uses maximum likelihood estimates (Finn 1986). The scheme tests the reference samples for homogeneity, for a possible clutter transition and, if a transition is suspected, for its position, and for samples from possible interferers. These tests are parametric and use the knowledge that the target and clutter are Rayleigh distributed. The author has quoted encouraging results based on simulation studies. The drawback of such a multistage procedure is that it introduces statistical dependencies, so it cannot be evaluated analytically.

2.3. Distributed CFAR tests based on order statistics
Distributed radar target detection refers to the situation where geographically separated radars look for targets in a search volume. The radars then communicate their information, including their decisions with regard to the presence of targets, to a central site called the fusion center. The fusion center combines the information from all the radars to form target tracks. In a very general situation, the separation would be so large that the same target will not be seen by all the radars at the same time. We consider a simpler situation where a search volume is simultaneously searched by multiple radars; a typical scenario would be a search with two or three radars. As before, only the target detection problem is addressed. Sending all the reference samples and the test samples would require more communication capacity in the links between the fusion center and the radars, and would also demand increased processor capability at the fusion center. To ease these requirements, two approaches are considered in the literature. 1) Individual radars send their decisions to the fusion center, and the fusion center makes a final decision based on the individual decisions. 2) The radars send condensed information in the form of a few statistics to the fusion center, and the fusion center makes the decision regarding the presence of a target in a test cell in the search volume. Uner and Varshney analyzed distributed CFAR detection performance in homogeneous and nonhomogeneous backgrounds (Uner and Varshney 1996). Each radar conducts a test based on its own reference samples and the test sample, and sends its decision to the fusion center. Let u_i denote the decision of the ith radar, such that

    u_i = 1 if the ith sensor decides H1, and u_i = 0 if the ith sensor decides H0    (25)
If the distributions of the Bernoulli variables {u_i} under the two hypotheses are known completely, then the fusion center can employ an optimal likelihood ratio test based on {u_i} (Chair and Varshney 1986; Thomopoulos et al. 1987). However, in a radar situation with nonhomogeneous reference cells and unknown target strength, it is not possible to construct such a test. Therefore, the fusion center employs a reasonable nonparametric test of the type of a counting rule. That is, the counting rule decides H1 if

    ∑_{i=1}^{n} u_i ≥ k    (26)

and H0 otherwise, where k is an integer. k = 1 is called the OR rule, because in that case (26) is nothing but the OR operation on the Boolean variables {u_i}. Similarly, k = n
corresponds to the AND rule, and k = (n + 1)/2 (for n odd) corresponds to the majority logic rule. The authors considered OS-CFAR and CA-CFAR for the individual radar tests, and considered AND and OR rules at the fusion site. The distributed OS test is more robust with respect to interfering targets and false alarm changes than the distributed CA test.

Amirmehrabi and Viswanathan evaluated a distributed CFAR test called signal-plus-order statistic CFAR (Amirmehrabi and Viswanathan 1997). The radar modeling assumes that the returns of the test cells of the different radars are all independent and identically distributed. In this scheme, each radar transmits its test sample, and a designated order statistic of its surrounding observations, to the fusion center. At the fusion center, the sum of the test cells' samples is compared to a constant multiplied by either (1) the minimum of the order statistics (called the mOS detector) or (2) the maximum of the order statistics (called the MOS detector). For detecting a Rayleigh target in Rayleigh clutter with two radars, closed form expressions for the false alarm probabilities, under homogeneous and nonhomogeneous conditions, are obtained. The results indicate that the MOS detector performs much better than the OS detector with the AND or the OR fusion rule. Of course, this is achieved at the price of sending two real numbers from each sensor instead of only a binary decision (bit), as in the case of a counting rule. The MOS detector performs better than the mOS detector, and performs nearly as well as a central order statistic detector that compares the sum of the test samples against a constant times an order statistic of the reference samples from all the sensors.
The general superiority of the MOS is no surprise in the light of the earlier result that the MX-OSD performs better than a fixed OS test, in terms of its ability to control the false alarm increase, without sacrificing detection performance in interfering target situations. A drawback of the MOS is the assumption that the returns in the test cells of the two radars are identical. In a more realistic situation, the noise powers of the test cells of the two radars would be different, and the actual probability of false alarm of the MOS test would change from the designated value, with the exact value being a function of the ratio of the noise powers.

Elias-Fusté, Broquetas-Ibars, Antequera, and Marin Yuste considered a k-out-of-n fusion rule with CA and OS tests for the individual radars (Elias-Fusté et al. 1992). A Rayleigh target and Rayleigh clutter were the models used in the analysis. Necessary conditions that the individual thresholds at the radars and the k value should satisfy, in order to maximize the probability of detection for a given probability of false alarm at the fusion center, are derived. Numerical results indicate that there exists no unique k that is optimum for a wide range of system parameters, such as the individual SNRs, the assumed order of the OS detector at each radar, the number of reference cells at each site, etc.

Distributed order statistic CFAR was investigated by Blum and Qiao for detecting a weak narrowband signal in Gaussian noise (Blum and Qiao 1996). A two-sensor system using either the AND or the OR rule was considered. The signal modeling allows for statistical dependency between the test cells of the two sensors. However, weak signal detection has more relevance to sonar than to radar targets.
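The counting rule (26) and its OR, AND, and majority special cases amount to a single threshold on the local decision bits; a minimal sketch (function names assumed, not from the chapter):

```python
def counting_rule(decisions, k):
    """Fusion rule (26): decide H1 (return 1) iff at least k of the
    n local binary decisions u_i equal 1."""
    return 1 if sum(decisions) >= k else 0

def or_rule(decisions):        # k = 1
    return counting_rule(decisions, 1)

def and_rule(decisions):       # k = n
    return counting_rule(decisions, len(decisions))

def majority_rule(decisions):  # k = (n + 1)/2 for n odd
    return counting_rule(decisions, (len(decisions) + 1) // 2)
```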
3. Order statistics based tests for non-Rayleigh clutter

In this section we consider order statistics based tests for the Weibull, log-normal, and K-clutter distributions. Unlike the Rayleigh case, these are distributions with two parameters. Therefore, the CFAR detectors designed for the case where both parameters are unknown are in general less powerful than the detectors designed for the case where one of the parameters is known.

3.1. Weibull clutter
Let W denote the output of the envelope detector corresponding to the test cell. W is distributed as Weibull if the corresponding CDF is given by

    F_W(w) = 1 − exp[−(w/B)^C],  w ≥ 0    (27)
Rayleigh clutter is a member of the Weibull family, since it can be obtained from (27) with C = 2. The parameter C controls the skewness of the distribution (the "shape" parameter), whereas B is the scale parameter. Smaller values of C result in heavier tailed distributions, viz. spiky clutter. Notice that Z = W² is also distributed as another Weibull. The moments of (27) are given by

    E(W^r) = B^r Γ(r/C + 1)    (28)
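The moment formula (28) can be checked against samples drawn from the CDF (27) by inversion. The sketch below uses only the standard library; the function names and parameter values are illustrative, not from the chapter.

```python
import math
import random

def weibull_sample(b, c, rng):
    """Draw W from the Weibull CDF (27), F(w) = 1 - exp(-(w/b)**c),
    by inverting the CDF: W = b * (-log(1 - U))**(1/c)."""
    u = rng.random()
    return b * (-math.log(1.0 - u)) ** (1.0 / c)

def weibull_moment(b, c, r):
    """E(W^r) = B^r * Gamma(r/C + 1), Eq. (28)."""
    return b ** r * math.gamma(r / c + 1.0)

rng = random.Random(0)
b_scale, c_shape = 1.5, 2.0  # C = 2 is the Rayleigh special case
samples = [weibull_sample(b_scale, c_shape, rng) for _ in range(200_000)]
mc_mean = sum(samples) / len(samples)
# mc_mean should sit close to B * Gamma(1/C + 1) from (28)
```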
Thus, for a fixed C, the clutter power (E(W²)) variation is due to a change of B, and a CFAR test maintains a constant false alarm rate irrespective of the value of B. Also, for a fixed C, the CDF is a stochastically ordered family with respect to B. Therefore, if radar clutter data fit a Weibull with a fixed C value reasonably well, then an order statistic can be used as an estimator for B, and a CFAR test can be formulated as in the Rayleigh case. Even though an order statistic estimator is not very efficient for small sample sizes, the OS-CFAR can easily tolerate interfering targets and provide some false alarm control during clutter power transitions. Notice that the squared-envelope detector output Z = W² is distributed as Weibull with parameters (C/2, B²). Therefore, for single pulse detection with an OS-CFAR, it does not matter whether the test is formulated with W or Z. An OS-CFAR test based on the squared envelope is of the form (6), with t determined for a given false alarm requirement. The order number r is a design parameter. The probability of false alarm is given by (Levanon and Shor 1990)

    PF = n! (t^{C/2} + n − r)! / [(n − r)! (t^{C/2} + n)!]    (29)
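As a consistency check, at C = 2 expression (29) should reduce to the Rayleigh product form (7), since the factorial ratio and the product telescope to the same quantity. A sketch, reading the factorials of non-integer arguments in (29) as gamma functions (function names are illustrative):

```python
import math

def pf_weibull_os(n, r, t, c):
    """Eq. (29): PF = n! (t^(C/2) + n - r)! / ((n - r)! (t^(C/2) + n)!),
    with non-integer factorials read as gamma functions, x! = Gamma(x + 1)."""
    s = t ** (c / 2.0)
    return (math.gamma(n + 1) * math.gamma(s + n - r + 1)
            / (math.gamma(n - r + 1) * math.gamma(s + n + 1)))

def pf_product(n, r, t):
    """Eq. (7), the Rayleigh-clutter (C = 2) special case."""
    pf = 1.0
    for i in range(r):
        pf *= (n - i) / (n - i + t)
    return pf
```

Note also that for a threshold designed at C = 2, a smaller (spikier) C lowers t^{C/2} when t > 1 and hence raises PF, which is the behavior shown in Fig. 6.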
Notice that the solution for t requires knowledge of C. We can call the test (6) the one-parameter OS test for Weibull clutter. The shape parameter significantly affects the probability of false alarm. Fig. 6 shows the variation of PF with C for different values of r.

[Fig. 6. False alarm changes with changes in the shape parameter of Weibull (test designed for PF = 1E-6 at C = 2); n = 24, log₁₀ PF versus C for several r up to r = 24.]

In this figure, for a given r and C = 2 (Rayleigh), t had been fixed at a value providing PF = 10⁻⁶. When the clutter is nonstationary and its envelope distribution changes, it may not be reasonable to assume that C is known. Also, when clutter variations take place within the adjacent resolution cells, both C and B might change. Therefore, it is desirable to have a CFAR test that is independent of both B and C. Such a test, called the two-parameter OS, was formulated by Weber and Haykin (1985):
    Z ≷ Y_i^{1−β} Y_j^{β}    (30)

deciding H1 if Z exceeds the right-hand side, and H0 otherwise.