The Stress-Strength Model and its Generalizations: Theory and Applications
Estimation of P(A'X + B'Y + C > 0) is studied in detail in Sections 3.4 and 3.5.

1.3 Stress-Strength Models: History and Geography

1.3.1 History
It may be of interest to point out that chronologically the stress-strength model originated not in a parametric but rather in a nonparametric set-up, in the path-breaking works of Wilcoxon (1945) and Mann and Whitney (1947). The main objective of these investigations was to compare two random variables X and Y which describe the results of two treatments. Wilcoxon, Mann and Whitney introduced the statistic which bears their names and is based on the ranks of the observations on X and Y in the joint sample. They also pointed out the connection between the hypothesis F_X = F_Y and P(X < Y) = 1/2. Their initial effort led to a series of papers studying point and interval estimation of P(X < Y) in the sixties of the last century. Here we should mention Birnbaum (1956), Birnbaum and McCarty (1958) (this was the first paper with P(X < Y) in its title), Govindarajulu (1967, 1968), Owen et al. (1964), Sen (1960, 1967), Van Dantzig (1951) and Zaremba (1965) among
others. Nonparametric methods were "safe" in the sense that they posed no assumptions on X and Y; however, they may have been too inefficient for practical purposes. In a way, this methodology was somewhat akin to the approach encountered by one of the authors of this book on his sojourn to China in the late seventies, where at that time dentures of only one size were available and the patients were compelled to adjust their mouths to these average specifications. The first attempt to study P(X < Y) under certain parametric assumptions on X and Y was undertaken by Owen et al. (1964), who constructed confidence limits for P(X < Y) when X and Y are dependent or independent normally distributed random variables. In the sixties very little was done to investigate a parametric version of the stress-strength model; however, in the seventies investigation of the topic gathered some steam. By the end of the seventies, estimation of P(X < Y) had been carried out for the major distributions such as the exponential (Kelley et al. (1976), Tong (1974)), normal (Church and Harris (1970), Downton (1973), Woodward and Kelley (1977)), Pareto (Beg and Singh (1979)) and exponential families (Tong (1977)). Also, significant advances in Bayes estimation of P(X < Y) for exponentially or normally distributed X and Y were made by Enis and Geisser (1971). The other milestones of the seventies are the introduction of nonparametric empirical Bayes estimation of P(X < Y) (Ferguson (1973), Hollander and Korwar (1976)) and the study of system reliability (Bhattacharyya and Johnson (1974)). By the late eighties, estimators of P(X < Y) had been obtained for the majority of common distribution families for the situations when X and Y are independent; see e.g. Awad and Gharraf (1986), Beg (1980a,b,c), Constantine et al. (1986), Ismail et al. (1986), Iwase (1987), Reiser and Guttman (1986), Voinov (1984). At the same time, efforts were also made to consider broader, more realistic models.
In view of the successful introduction of a variety of bivariate exponential distributions by Gumbel (1960), Freund (1961), Marshall and Olkin (1967), and Block and Basu (1974), it became possible to study dependent exponential random variables with various types of dependence. Estimators of P(X < Y) for a bivariate exponential random vector (X, Y) were derived by Abu-Salih and Shamseldin (1988), Awad et al. (1981), and Klein and Basu (1985), among others. Pensky (1982) constructed estimators of P(A'X + C > 0) for a normally distributed random vector X with a general variance-covariance matrix. Going further, Bilikam (1985), discussed above, suggested a time-dependent
The Stress-Strength Models. Mathematics, History, and Applications
model for X and Y, and Raghava Char et al. (1984) studied stress and strength Markov models for system reliability. Some other important advances of the eighties were investigations of extensions of "standard" stress-strength models, such as stress-strength models with categorized data (see e.g. Brownie (1988), Halperin et al. (1989), Simonoff et al. (1986)) or explanatory variables (Guttman et al. (1988)). Another major achievement was the application of the theory to a variety of real-world problems (see e.g. Guttman et al. (1988), Johnstone (1983), Halperin et al. (1987, 1989), Simonoff et al. (1986), Ury and Wiggins (1979)). Some of the above-mentioned works were summarized in the review paper by Johnson (1988). In the nineties and the years 2000-2002 we have witnessed further developments on stress-strength models. More diverse probabilities, such as P(X_1 < ... < X_k) and P(A_1'X_1 + ... + A_k'X_k + C > 0) where X_1, ..., X_k are independent normal vectors, were studied (see e.g. Ivshin (1998), Ivshin and Lumelskii (1994), Hayter and Liu (1996), Miwa et al. (2000)), and some new, less familiar distributions were considered, such as the Burr type X (Ahmad et al. (1997), Surles and Padgett (1998, 2000)), mixtures of inverse Gaussian (Akman et al. (1999)), skew-normal (Azzalini and Chiogna (2002), Gupta and Brown (2001)), Weinman multivariate exponential (Cramer and Kamps (1997a), Cramer (2001)), bivariate Pareto (Hanagal (1997a)), elliptical (Pensky (2002)), and generalized gamma (Pham and Almhana (1995), Pensky and Takashima (2002)). The field seems to have reached its maturity. It is virtually impossible to mention here every author who contributed to the development of the stress-strength models, and we apologize in advance for inadvertently omitting some names in this brief historical review.
1.3.2 Geography
In our age of globalization, the Internet and instantaneous world communications, geography rarely affects the development of scientific theories. This, however, is not entirely valid as far as the stress-strength models are concerned. The research in this area has been conducted all over the world, and the results have appeared in publications ranging from the Journal of the American Statistical Association to the Pakistan Journal of Statistics, from the Canadian Journal of Statistics to the Chinese Journal of Mathematics, from the Journal of the Korean Statistical Society to the Journal of Mathematical Sciences (which publishes translations of Russian collections), from the Journal of the Indian Association for Productivity, Quality Control and
Reliability to Prace Naukowe Instytutu Matematycznego Politechniki Wroclawskiej (Proceedings of the Institute of Mathematics of the Wroclaw Polytechnic). However, the bulk of the results was obtained by American, Russian, Canadian and Indian scientists (some of the latter residing in the USA). The peculiarity of the situation was that the Russian school, being mainly confined to the provincial city of Perm, where almost no foreign publications were available, developed and worked in complete isolation from their Western and Eastern (Indian) colleagues. For this reason, useful techniques in estimation theory developed in Perm from the late sixties to the early eighties (see e.g. Lumelskii (1969a), Lumelskii and Sapoznikov (1969), Lumelskii and Pensky (1982)) were unknown to the rest of the world. On the other hand, due to the lack of information, Russian scientists occasionally "reinvented the wheel", constructing estimators that were already available in the literature. It should also be noted that their derivations of the best unbiased estimators were based on a general technique (described in Section 2.2) and were supplemented by estimators of the corresponding variances. The main thrust of exploration of the Perm school was the maximum likelihood and the best unbiased estimation of P(X < Y) and of P(A_1'X_1 + ... + A_k'X_k + C > 0), where X_1, ..., X_k are independent normal vectors (k ≥ 1). Their work of some twenty-five years culminated in the monograph by Ivshin and Lumelskii (1995) already mentioned above.
1.4 Applications
As was already pointed out, the subject matter of this book - the stress-strength models - initially originated from a seemingly unrelated problem of classical nonparametric tests of the equality of two distribution functions. It then naturally led to expressions of the type P(X < Y), and next it was realized that these quantities can be fruitful for examining the probability of inequality-type relations between two or more random variables under a great variety of conditions and situations. This naturally resulted in applications to numerous engineering problems under the banner of "reliability", provided that the random variables under consideration admit an appropriate interpretation. These will be discussed in detail in Chapter 7, in which a number of specific engineering cases and other applications are described.
Next it became evident that practical applications are by no means confined to engineering or to military problems (only a small portion of the latter research constitutes public knowledge). In fact, the advances in medical statistics in the last twenty years have triggered numerous applications to medically oriented problems, of which clinical trials are one of the fastest growing areas. Next came applications in psychology, which required adjustment of the theory to accommodate categorical data. Further natural applications, especially in but not limited to medicine, involve comparison of two or more random variables representing the state of affairs in two or more situations at different time intervals. The new frontier of potential application is real-world problems where the model cannot be viewed as involving independent identically distributed random variables and is more appropriately represented by binary data, leading to the so-called "ROC approach" with a strong dose of logistic regression. One of the recent applications is the challenging problem of estimating the unknown strength characteristics from the observable distribution of stress, which leads to more interesting probabilistic and statistical-theoretical problems. Another possible application, still in its infancy, is the relation between the stress-strength models and quality control concepts, specifically the so-called process capability indices, which originated in the quality control literature some twenty years ago. It should be noted that, as sources of numerical data become more widely available and statistical calculations become more accessible due to the rapid advances in computer technology, more and more applications are to be expected. The stress-strength relation is a universal, flexible relation easily adaptable to various fields of human endeavor and natural phenomena.
It is a powerful tool for comparing and dissecting interrelated situations; the simplicity of the model can be both deceiving and rewarding.
Chapter 2
The Theory and Some Useful Approaches
2.1 The Maximum Likelihood Estimators

2.1.1 The Theory
Maximum likelihood estimation (MLE) is undoubtedly the most popular (at least until now) procedure for estimation of the reliability R = P(X < Y), due to its flexibility and generality. The technique can always be used if the joint distribution of the stress X and the strength Y is a known function with some unknown parameters. A detailed description of the MLE method is presented, for example, in Casella and Berger (1990) and Lehmann and Casella (1998). Here, we shall concentrate on a discussion of the MLE of the reliability R = P(X < Y). Assume that a random vector (X, Y) has the probability density function (pdf) f(x, y|θ) with an unknown scalar or vector-valued parameter θ ∈ Θ. The aim is to estimate R on the basis of observations (X_1, Y_1), ..., (X_n, Y_n). Note that if X and Y are independent with the pdf of the form (2.1), the number of observations for X and Y need not be the same. In general, the data are of the form (X, Y),  (2.2)

with n_1 = n_2 if X and Y are dependent.
Let f(X, Y|θ) denote the joint pdf of the data (2.3). Note that if X and Y are independent, (2.3) becomes

f(X, Y|θ) = ∏_{i=1}^{n_1} f_X(X_i|θ) ∏_{j=1}^{n_2} f_Y(Y_j|θ).  (2.4)
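As a concrete illustration of the factorized likelihood (2.4), the sketch below (our own toy example, not from the text) takes exponential marginals f_X(x|a) = a e^{-ax} and f_Y(y|b) = b e^{-by} with hypothetical rates a = 2 and b = 1, and maximizes the log-likelihood numerically; the optimum should agree with the closed-form MLEs 1/x̄ and 1/ȳ.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(theta, x, y):
    """Negative log of the independent-sample likelihood (2.4)
    for exponential marginals with rates theta = (a, b)."""
    a, b = theta
    if a <= 0 or b <= 0:
        return np.inf
    return -(len(x) * np.log(a) - a * x.sum()
             + len(y) * np.log(b) - b * y.sum())

rng = np.random.default_rng(5)
x = rng.exponential(1 / 2.0, 300)   # hypothetical true rate a = 2
y = rng.exponential(1 / 1.0, 300)   # hypothetical true rate b = 1
res = minimize(neg_log_lik, x0=[1.0, 1.0], args=(x, y), method="Nelder-Mead")
a_hat, b_hat = res.x
print(a_hat, b_hat)   # close to 1/mean(x) and 1/mean(y)
```

Of course, the exponential case has closed-form maximizers; the numerical route matters when the maximization in Definition 2.2 admits no explicit solution.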
Definition 2.1 Given that (X, Y) is observed, the function of θ defined by L(θ|X, Y) = f(X, Y|θ) is called the likelihood function.

Definition 2.2 The maximum likelihood estimator (MLE) θ̂ = θ̂(X, Y) of the parameter θ based on the sample (X, Y) is the parameter value at which the likelihood function L(θ|X, Y) attains its maximum as a function of θ.

Theorem 2.1 (Invariance property of the MLEs.) If θ̂ is the MLE of θ, then for any function τ(θ), the MLE of τ(θ) is τ(θ̂).

Ω_1 = {(x, y) : A'x < B'y}
(2.17)
or

Ω_2 = {(x, y) : A'x + B'y + C > 0},  (2.18)
where A and B are known vectors and C is a known scalar (see e.g. Pensky (1982), Gupta and Gupta (1990), Ivshin and Lumelskii (1993) and Reiser and Faraggi (1994)). In the case when the vectors X and Y have the same dimension (k_1 = k_2 = k), another important quantity is R = P((X, Y) ∈ Ω_3) where

Ω_3 = {(x, y) : x_i < y_i, i = 1, ..., k}
(2.19)
(see e.g. Singh (1981)). Similarly to the one-dimensional case, when constructing the MLE of R, the first step is the evaluation of

R(θ) = ∫∫_Ω f(x, y|θ) dx dy.

Then, the MLE of R is of the form R̂ = R(θ̂), where θ̂ is the MLE of θ.
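For instance, in the one-parameter exponential case treated later in this chapter (Examples 2.1 and 2.2), the MLEs of the rates are α̂ = 1/X̄ and β̂ = 1/Ȳ, and Theorem 2.1 gives R̂ = R(θ̂) = Ȳ/(X̄ + Ȳ) for R = α/(α + β). A minimal sketch (the sample sizes, seed and rates are arbitrary choices of ours):

```python
import numpy as np

def mle_reliability_exponential(x, y):
    """MLE of R = P(X < Y) for independent exponential samples:
    by the invariance property, R_hat = alpha_hat/(alpha_hat + beta_hat)
    with alpha_hat = 1/mean(x) and beta_hat = 1/mean(y), which
    simplifies to mean(y)/(mean(x) + mean(y))."""
    xbar, ybar = np.mean(x), np.mean(y)
    return ybar / (xbar + ybar)

rng = np.random.default_rng(0)
alpha, beta = 2.0, 1.0                   # hypothetical rates; true R = 2/3
x = rng.exponential(1 / alpha, 5000)     # numpy parameterizes by scale = 1/rate
y = rng.exponential(1 / beta, 5000)
r_hat = mle_reliability_exponential(x, y)
print(round(r_hat, 3))                   # close to 2/3 for large samples
```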
2.2 Unbiased Estimation

2.2.1 The Theory
The merit of the MLE approach is that it is universal and allows one to obtain an estimator of R for practically any distribution family. However, estimators derived by the MLE method may be biased, which is undesirable, especially if the sample size is small. In such a situation, a way out might be the construction of an unbiased estimator of R = P(X < Y). Below we briefly discuss unbiased estimation of R. A more detailed description of the methods of unbiased estimation and of the sufficiency principle on which this method is based can be found in any standard text on statistical inference, e.g. Casella and Berger (1990) or Lehmann and Casella (1998). In this section we shall assume the same parametric set-up as in Section 2.1, postponing the discussion of nonparametric unbiased estimation until
Chapter 5. To develop the procedure one needs to assume that the family of pdfs f(x, y|θ) has a sufficient statistic.

Definition 2.3 A statistic T = T(X, Y) is said to be a sufficient statistic for θ if the conditional pdf of the sample given the value of T does not depend on θ.

Intuitively, if T is a sufficient statistic for θ, then T captures all the information about the parameter θ that the sample (X, Y) contains. To retrieve a sufficient statistic for the family of pdfs f(x, y|θ), the theorem below can be used (see e.g. Casella and Berger (1990) or Lehmann and Casella (1998)).

Theorem 2.2 (Factorization Theorem.) A statistic T is a sufficient statistic for f(x, y|θ) if and only if there exist functions g(·|θ) and h(X, Y) (the latter does not depend on θ) such that for all sample points X, Y and all possible θ ∈ Θ the joint pdf of (X, Y) defined in (2.3) or (2.4) is of the form

f(X, Y|θ) = g(T(X, Y)|θ) h(X, Y).
(2.20)
There are infinitely many unbiased estimators of R = P(X < Y) based on the sample (X, Y), e.g. V(X, Y) = I(X_1 < Y_1), V(X, Y) = [min(n_1, n_2)]^{-1} Σ_{j=1}^{min(n_1,n_2)} I(X_j < Y_j), etc. Our objective, however, is to catch the one which has the smallest variance (and, consequently, the smallest MSE) for all values of θ.

Definition 2.4 Let φ(θ) be any parametric function. Then the unbiased estimator V*(X, Y) of φ(θ) is called the uniformly minimum variance unbiased estimator (UMVUE) of φ(θ) if, for any other unbiased estimator V(X, Y) of φ(θ), Var_θ V*(X, Y) ≤ Var_θ V(X, Y) for all θ ∈ Θ.
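A quick Monte Carlo contrast between the two naive unbiased estimators above (our own illustration, not from the text; exponential data with hypothetical rates so that R = 2/3): both average to R, but the estimator that uses only the first pair has a far larger variance.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 2.0, 1.0    # hypothetical rates: X ~ Exp(2), Y ~ Exp(1), R = 2/3
n, reps = 20, 4000

crude = np.empty(reps)    # V = I(X_1 < Y_1): unbiased, but ignores most data
pooled = np.empty(reps)   # V = average of I(X_j < Y_j): unbiased, lower variance
for i in range(reps):
    x = rng.exponential(1 / alpha, n)
    y = rng.exponential(1 / beta, n)
    crude[i] = float(x[0] < y[0])
    pooled[i] = np.mean(x < y)

print(round(crude.mean(), 2), round(pooled.mean(), 2))  # both near R = 2/3
print(bool(crude.var() > pooled.var()))
```

Neither estimator is the UMVUE of Definition 2.4, which requires conditioning on a sufficient statistic; the simulation merely shows why unbiasedness alone is not a sufficient criterion.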
The integral in (2.38) ought to be calculated numerically.
2.2.4 A Multivariate Case
So far, we have been discussing only the case when X and Y are scalar random variables. However, all of the results obtained above can easily be generalized to the case when X = (X^(1), ..., X^(k_1)) and Y = (Y^(1), ..., Y^(k_2)) are random vectors. For example, the UMVUE of f(x_1, ..., x_k, y_1, ..., y_k|θ) becomes

f̂(x_1, ..., x_k, y_1, ..., y_k) = ∏_{j=1}^{k} f(x_j, y_j|θ_0) × [g(T|θ_0)]^{-1} g_{θ_0}(T | X_1 = x_1, ..., X_k = x_k, Y_1 = y_1, ..., Y_k = y_k),  (2.39)

where g(T|θ_0) is the pdf of T(X, Y) and g_{θ_0}(T | X_1 = x_1, ..., X_k = x_k, Y_1 = y_1, ..., Y_k = y_k) is the conditional density of T given X_j = x_j, Y_j = y_j, j = 1, ..., k, when θ = θ_0. The UMVUE of R = P((X, Y) ∈ Ω) can then be derived as
R̂ = ∫∫ I{(x, y) ∈ Ω} f̂(x, y) dx dy,
(2.40)
where dx = ∏_{i=1}^{k_1} dx^(i), dy = ∏_{j=1}^{k_2} dy^(j), and Ω is a subset of the (k_1 + k_2)-dimensional Euclidean space described in Section 2.1. For example, Ω can be defined by (2.17), (2.18) or (2.19). The UMVUE of the variance of (2.40) can be determined as

R̂² − ∫∫∫∫_{W**} f̂(x_1, x_2, y_1, y_2) dx_1 dx_2 dy_1 dy_2.  (2.41)
Here, R̂ is defined by (2.40) and W** = {(x_1, x_2, y_1, y_2) : (x_1, y_1) ∈ Ω, (x_2, y_2) ∈ Ω}.

2.3 Bayes and Empirical Bayes Estimation of R

2.3.1 The Theory
Bayes estimators of R are constructed in the same set-up as the MLE or the UMVUE. Let (X, Y) ~ f(x, y|θ), where ~ indicates "distributed as", and let the sample (X, Y) (see (2.2)) be available for estimation of R = P(X < Y). The Bayesian approach treats the parameter θ (scalar or vector) not as a fixed unknown constant but as a random variable (vector) with the (joint) pdf π(θ), called the prior pdf. This pdf is based on some knowledge available
to the person carrying out the inference and should be formulated before the data have been obtained.

Definition 2.5 Let π(θ) be a prior pdf of θ. Then the posterior pdf of θ is

π(θ|X, Y) = f(X, Y|θ) π(θ) / p(X, Y).  (2.42)

Here, f(X, Y|θ) is defined by (2.3) or (2.4) and

p(X, Y) = ∫_Θ f(X, Y|θ) π(θ) dθ  (2.43)
is the joint unconditional marginal pdf of X and Y. The pdf π(θ|X, Y) in (2.42) is termed posterior since it is derived after X and Y have been observed, and can be interpreted as an update of the prior pdf based on the data. Note that, in view of (2.42), the posterior pdf remains invariant if π(θ) is multiplied by a constant. This fact is often used in Bayesian analysis and is expressed by the notation "π(θ) ∝", which means "π(θ) is proportional to". The Bayesian approach to statistical inference in general, and to the problem at hand in particular, is becoming more prominent. We shall therefore briefly present the necessary background. The Bayes estimator R̂ of R can be obtained as the expectation of R = R(θ) with respect to the posterior pdf π(θ|X, Y):

R̂ = ∫_Θ R(θ) π(θ|X, Y) dθ.
(2.44)
The value of R(θ) in (2.44) can be calculated using one of the expressions (2.5), (2.6) or (2.9). Another way of determining the estimator (2.44) is to derive (if possible) the posterior pdf of R first and then find the estimator R̂ as an expectation over this posterior pdf. The pdf π_R(R|X, Y) can be obtained using a transformation of the random variables. For this purpose, one needs to choose a one-to-one transformation F : θ → (R, θ_R) with the inverse Q = F^{-1}. Then, the joint posterior pdf of (R, θ_R) is given by π(Q(R, θ_R)|X, Y) |J_Q(R, θ_R)|, where |J_Q(R, θ_R)| is the Jacobian of the transformation Q, so that

π_R(R|X, Y) = ∫ π(Q(R, θ_R)|X, Y) |J_Q(R, θ_R)| dθ_R.  (2.45)
The Jacobian |J_Q(R, θ_R)| here is the absolute value of the determinant of the matrix of partial derivatives of θ with respect to R and the components of θ_R. For example, if θ = (θ_1, θ_2), then |J_Q(R, θ_R)| is the absolute value of

det | ∂θ_1/∂R   ∂θ_1/∂θ_R |
    | ∂θ_2/∂R   ∂θ_2/∂θ_R |.
As we have already mentioned above, the most common choice for the Bayes estimator of R is the expectation over (2.45):

R̂ = ∫_0^1 R π_R(R|X, Y) dR.  (2.46)
However, other Bayes estimators, such as the median of π_R(R|X, Y) or the value of R maximizing π_R(R|X, Y), can be utilized. Each of these variants is used depending on the specific problem at hand. The posterior pdf (2.42) can also be applied for the construction of interval estimators of R. We relegate the formulation and discussion of Bayes credible sets to Section 2.4.
2.3.2 The Choice of a Prior
From the discussion above it follows that the starting point of any Bayesian analysis is the choice of the prior. Volumes have been devoted to this problem by the most brilliant researchers of the last 20-30 years. How can one choose π(θ) if no specific information about the values of the parameters is available, or if the prior information is rather vague? There are several ways to obviate the dilemma. One of the most popular solutions is to take a conjugate prior for π(θ).

Definition 2.6 Let F denote the class of pdfs f(x, y|θ). A class P of prior distributions is said to be a conjugate family for F if the posterior distribution is in the class P for all f ∈ F and all priors in P.

The advantage of using conjugate priors is that the posterior belongs to the same class as the prior, so that updating the prior reduces to updating its parameters. As a rule, conjugate priors lead to straightforward mathematical calculations, and this may be one of the reasons that they have
been applied for estimation of R by a number of authors (see, e.g., Enis and Geisser (1971), Abu-Salih and Shamseldin (1988), among others). Another possibility is to use a noninformative prior. The convenience of choosing a noninformative prior lies in the fact that it can be constructed when no knowledge about the values of the parameters is available. However, its shortcoming is that the majority of noninformative prior pdfs turn out to be improper, i.e. while being nonnegative, they do not integrate to one. Historically, it would seem that Laplace (1812) was the first to introduce the uniform noninformative prior π(θ) = 1. However, since this prior alters under a one-to-one reparametrization, inference based on the resulting posteriors can often show significant variation. As a partial remedy, Jeffreys (1961) proposed the prior proportional to the positive square root of the determinant of the Fisher information matrix:

π(θ) = [det(I(θ))]^{1/2}.  (2.47)
If θ = (θ_1, ..., θ_k), then, under commonly satisfied assumptions (see e.g. Lehmann and Casella (1998)), I(θ) is the matrix with (i, j)-th element

I_{ij}(θ) = −E_θ [∂² ln f(X, Y|θ) / ∂θ_i ∂θ_j].
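As a small worked check (our own example, not from the text): for the one-parameter exponential density f(x|α) = α e^{−αx} one has ∂ ln f/∂α = 1/α − x, so I(α) = E(1/α − X)² = 1/α², and the Jeffreys prior (2.47) becomes π(α) ∝ 1/α. The quadrature below confirms the information calculation numerically.

```python
import numpy as np
from scipy.integrate import quad

def fisher_info(a):
    """Numerical I(a) = E[(d/da log f)^2] for f(x|a) = a*exp(-a*x)."""
    integrand = lambda x: (1 / a - x) ** 2 * a * np.exp(-a * x)
    val, _ = quad(integrand, 0, np.inf)
    return val

for a in (0.5, 1.0, 3.0):
    print(round(fisher_info(a), 6), round(1 / a ** 2, 6))  # the two columns agree
```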
Yet another possibility - in the absence of specific information about θ - is to match the Bayesian solution with the frequentist solution of the problem. A prior which satisfies this condition is called a matching prior. It is derived by requiring the classical frequentist coverage probability of the posterior region of a real-valued parametric function to match the nominal level with a remainder of the order O(n^{-j/2}). Here n is the sample size, and j = 1 for a first-order and j = 2 for a second-order matching prior (see, e.g., Datta (1996), Datta and Ghosh (1995), Ghosh and Mukerjee (1992), Mukerjee and Dey (1993)). However, unlike the case of Jeffreys's prior, derivation of the matching prior leads to much more extensive calculations. The advantage of Jeffreys's prior is that it remains invariant under any one-to-one reparametrization. But despite its success in the one-parameter case, Jeffreys's prior often runs into serious technical difficulties in the presence of nuisance parameters, that is, when some parameters that are present in the model are not of direct inferential interest. This situation often occurs in the case of the stress-strength model, since we are interested in the value of R and do not need to know the other component θ_R (see the discussion
preceding (2.45)). For this reason, it may be advantageous to use the so-called reference prior introduced by Bernardo (1979) and generalized in the articles by Berger and Bernardo (1989, 1992). This prior is specially devised for multiparameter situations and is derived by dividing the set of parameters into the parameters of interest and the nuisance ones. Another attractive feature of reference priors is that they usually satisfy the matching criterion described above. Readers interested in this topic are referred to Kass and Wasserman (1996). Since the derivation of noninformative priors is not always an easy task, the catalog of noninformative priors compiled by Yang and Berger (1997) may be helpful. Several authors have constructed noninformative priors specifically for the stress-strength model (see, e.g., Kim et al. (2000), Lee (1998), Thompson and Basu (1993) and Sun et al. (1998)).
2.3.3 One-parameter Exponential Distribution
Let, as in Example 2.1, X and Y be independent samples from distributions with pdfs f_X(x|α) = α exp(−αx) and f_Y(y|β) = β exp(−βy), respectively. The parameters α and β can reasonably be assumed to be independent a priori. Following Enis and Geisser (1971), we shall employ conjugate gamma prior distributions for α and β with parameters (μ, γ) and (ν, λ), respectively, so that

π(α, β) ∝ α^{μ−1} e^{−γα} β^{ν−1} e^{−λβ},   μ, γ, ν, λ > 0.  (2.48)
(Recall that ∝ means "proportional to".) Taking into account that f(X, Y|α, β) is of the form (2.13) and applying the Bayes formula (2.42), we obtain the posterior density of (α, β) of the form

π(α, β|X, Y) ∝ α^{n_1+μ−1} e^{−α(γ+n_1 X̄)} β^{n_2+ν−1} e^{−β(λ+n_2 Ȳ)}.  (2.49)

Evidently, the posterior is also a product of gamma pdfs with the updated parameters

μ* = n_1 + μ,   γ* = γ + n_1 X̄,   ν* = n_2 + ν,   λ* = λ + n_2 Ȳ.  (2.50)
Here X̄ and Ȳ are the sample means (see (2.14)). In order to derive the posterior pdf of R, we shall follow the recipe described in the beginning of this section and consider the one-to-one transformation F : R = α/(α + β), θ_R = α + β, with the inverse Q : α = R θ_R, β = (1 − R) θ_R. The Jacobian |J_Q(R, θ_R)| here is the absolute value of the determinant of the matrix with rows (∂α/∂R, ∂α/∂θ_R) = (θ_R, R) and (∂β/∂R, ∂β/∂θ_R) = (−θ_R, 1 − R), so that |J_Q(R, θ_R)| = θ_R(1 − R) + R θ_R = θ_R. Hence, the joint posterior density of R and θ_R becomes

π*(R, θ_R|X, Y) ∝ R^{μ*−1} (1 − R)^{ν*−1} θ_R^{μ*+ν*−1} exp{−θ_R (γ* R + λ* (1 − R))},

where 0 < R < 1 and θ_R > 0.
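The updated parameters (2.50) also make simulation-based Bayes estimation straightforward: draw α and β from their independent posterior gamma distributions and average R = α/(α + β). The sketch below is our own; the hyperparameters μ = γ = ν = λ = 1 and the data-generating rates are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(2)

def posterior_mean_R(x, y, mu=1.0, gam=1.0, nu=1.0, lam=1.0, draws=100_000):
    """Bayes estimate of R = alpha/(alpha + beta): update the gamma
    priors via (2.50), sample the independent posteriors, average R."""
    mu_s, gam_s = len(x) + mu, gam + x.sum()     # mu*, gamma* of (2.50)
    nu_s, lam_s = len(y) + nu, lam + y.sum()     # nu*, lambda* of (2.50)
    a = rng.gamma(shape=mu_s, scale=1 / gam_s, size=draws)
    b = rng.gamma(shape=nu_s, scale=1 / lam_s, size=draws)
    return float(np.mean(a / (a + b)))

x = rng.exponential(1 / 2.0, 200)   # hypothetical rate alpha = 2
y = rng.exponential(1 / 1.0, 200)   # hypothetical rate beta = 1; true R = 2/3
print(round(posterior_mean_R(x, y), 2))
```

With moderate samples the posterior mean lands close to the MLE Ȳ/(X̄ + Ȳ), the priors contributing only a mild shrinkage.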
W(X, Y) = {R : π_R(R|X, Y) > c_γ},  (2.63)

where c_γ is chosen so that

∫_{W(X,Y)} π_R(R|X, Y) dR = 1 − γ.  (2.64)
2.4.5 Hypothesis Testing: Theory and Methods

Suppose one is required to test a hypothesis about R of the type

H_0 : R ∈ Ξ versus H_1 : R ∈ Ξ^c,  (2.65)
where Ξ is a subset of the interval [0,1] and Ξ^c is its complement in [0,1], i.e. [0,1] \ Ξ. Here H_0 and H_1 are called the null hypothesis and the alternative hypothesis, respectively. Since R = R(θ), θ ∈ Θ, where Θ is the set of all possible values of the parameters, the hypotheses H_0 and H_1 about R can be re-formulated in terms of θ, that is, H_0 : θ ∈ Θ_0 versus H_1 : θ ∈ Θ_0^c. Here Θ_0^c is the complement of Θ_0 in Θ, and R ∈ Ξ whenever θ ∈ Θ_0. A hypothesis test is a rule that specifies for which sample values H_0 is accepted as true and for which sample values it is rejected and, consequently, H_1 is accepted as being true. This description is somewhat simplified but should be adequate for our purposes.

Definition 2.9 The subset of the sample space for which H_0 is rejected is called the rejection region (or the critical region). The complement of the rejection region is called the acceptance region.

A hypothesis test is subject to two kinds of errors: rejecting H_0 when it is true (this error is called Type I Error) or accepting H_0 when it is false (Type II Error). For a fixed sample size, it is usually impossible to make the probabilities of both types of errors arbitrarily small. In searching for a good test, it is common to consider only those tests that control the Type I Error at a specified level. Within this class of tests we then search for tests that have the Type II Error probability as small as possible.

Definition 2.10 We say that the test with the rejection region ℛ is a level γ (or size γ) test if

sup_{θ ∈ Θ_0} P_θ((X, Y) ∈ ℛ) = γ.

Typically, the hypothesis test is specified in terms of a test statistic V = V(X, Y) (a computable function of the observations (X, Y)); that is, we accept H_0 if the values of V are in some subset W_0 of (−∞, ∞). The size of the test carries important information about our decision: if γ is small, the decision to reject H_0 is quite convincing, while, if γ is large, this is not the case, since the test has a large probability of false rejection of H_0.
Another way of reporting the results of a hypothesis test is to provide the p-value. The p-value is especially useful when large (or small) values of the test statistic indicate that H_0 is false.
Definition 2.11 Let V(X, Y) be a test statistic such that large values of V give evidence that H_0 is false. A p-value p(X, Y) ∈ [0,1] is a statistic satisfying P(p(X, Y) ≤ γ) ≤ γ for every θ ∈ Θ_0 and every γ ∈ [0,1], and such that for any sample point (x, y)

p(x, y) = sup_{θ ∈ Θ_0} P_θ(V(X, Y) ≥ V(x, y)).  (2.66)
The p-value can be interpreted as the probability of obtaining the data that have been observed in our particular experiment if the hypothesis H_0 is true. Consequently, small values of p(X, Y) can be treated as evidence against H_0. A practical construction of hypothesis tests is closely associated with interval estimation. For example, if (L(X, Y), U(X, Y)) is a two-sided (1 − γ)-confidence interval for R, then the test which accepts H_0 : R = R_0 whenever R_0 ∈ (L(X, Y), U(X, Y)) and rejects it otherwise is a size γ test, since P(L(X, Y) ≤ R_0 ≤ U(X, Y) | R = R_0) = 1 − γ. For a one-sided hypothesis, say H_0 : R ≥ R_0 versus H_1 : R < R_0, a one-sided confidence interval (0, U(X, Y)) will do the job. Indeed, a test that rejects H_0 if R_0 > U(X, Y) is of size γ, since

P(R_0 > U(X, Y) | R = R_0) = 1 − P(R_0 ≤ U(X, Y) | R = R_0) = γ.
Similarly, a test which rejects H_0 : R ≤ R_0 whenever R_0 < L(X, Y) is a size γ test. Another approach to testing a hypothesis about R is to construct a Bayesian test. Consider the general problem (2.65). Here the Bayesian test rejects H_0 if

P(R ∈ Ξ^c | X, Y) / P(R ∈ Ξ | X, Y) = (1 − P(R ∈ Ξ | X, Y)) / P(R ∈ Ξ | X, Y) > λ,  (2.67)
where the numerator and the denominator in (2.67) are the posterior probabilities of Ξ^c and Ξ, respectively, and λ is a threshold value chosen in advance. For example, if λ = 1, H_0 is rejected if the evidence in support of H_1 is stronger than the evidence in support of H_0. However, if one wants to
be more conservative in the rejection of H_0, one should increase the value of λ. Observe that P(R ∈ Ξ | X, Y) in (2.67) can be evaluated as

P(R ∈ Ξ | X, Y) = ∫ I(R ∈ Ξ) π_R(R|X, Y) dR = ∫_Θ I(R(θ) ∈ Ξ) π(θ|X, Y) dθ,  (2.68)
where R(θ), π(θ|X, Y) and π_R(R|X, Y) are defined in (2.5) or (2.6), (2.42) and (2.45), respectively. It is appropriate to use the first half of (2.68) when the expression for π_R(R|X, Y) is available, since it leads to a one-dimensional integration. However, knowledge of π_R(R|X, Y) is not absolutely necessary: one can always numerically calculate the second (multivariate) integral in (2.68).
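A sketch of the Bayesian test (2.67), with the first (one-dimensional) form of (2.68) evaluated by posterior sampling, for the conjugate exponential-gamma model of Section 2.3.3. This is our own illustration: the prior hyperparameters (all equal to one) and the data-generating rates are assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(3)

# Data: exponential stress/strength with hypothetical rates (true R = 2/3).
x = rng.exponential(1 / 2.0, 100)
y = rng.exponential(1 / 1.0, 100)

# Posterior draws of R under the conjugate gamma posteriors of (2.50),
# with all prior hyperparameters set to one (an illustrative choice).
a = rng.gamma(len(x) + 1, 1 / (1 + x.sum()), 200_000)
b = rng.gamma(len(y) + 1, 1 / (1 + y.sum()), 200_000)
R_draws = a / (a + b)

# Test H0 : R >= 0.5 versus H1 : R < 0.5 via the Bayesian rule (2.67).
p_H0 = np.mean(R_draws >= 0.5)        # posterior probability of H0
odds_against = (1 - p_H0) / p_H0      # left-hand side of (2.67)
reject = odds_against > 1.0           # threshold lambda = 1
print(round(p_H0, 2), bool(reject))
```

With the posterior concentrated well above 0.5, the odds against H_0 stay below any reasonable λ and H_0 is retained.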
2.4.6 One-parameter Exponential Distribution
We are back to the set-up of Examples 2.1 and 2.2, but now we are looking for an interval estimator of R. The one-parameter exponential distribution is one of the exceptional cases where an exact confidence interval can be constructed. Surprisingly, this derivation was carried out in the purely Bayesian paper of Enis and Geisser (1971), which shows the interconnection between seemingly different approaches. Recall that the MLE of R = α/(α + β) is of the form R̂ = Ȳ/(X̄ + Ȳ) (see (2.12) and (2.15)), and note that n_1 X̄ and n_2 Ȳ have gamma distributions with parameters (α, n_1) and (β, n_2), respectively. In order to obtain the exact confidence interval for R we shall derive the exact distribution of the variable

ζ = α n_1 X̄ / (α n_1 X̄ + β n_2 Ȳ).

Denote ξ = α n_1 X̄, η = β n_2 Ȳ and observe that ξ and η have gamma distributions with the parameters (1, n_1) and (1, n_2), respectively. Introduce now a new set of variables ζ = ξ/(ξ + η), τ = η. Expressing the old variables in terms of the new ones, ξ = ζτ/(1 − ζ), η = τ, and obtaining the Jacobian of the transformation, J = (1 − ζ)^{-2} τ, we arrive at the following joint pdf of ζ and τ:
p(ζ, τ) = [Γ(n_1) Γ(n_2)]^{-1} ζ^{n_1−1} (1 − ζ)^{−n_1−1} τ^{n_1+n_2−1} exp{−τ/(1 − ζ)}.
Integrating out τ, we obtain the marginal distribution of ζ:

p(ζ) = [B(n₁, n₂)]⁻¹ ζ^(n₁−1) (1 − ζ)^(n₂−1),   0 < ζ < 1,

namely, ζ has a beta distribution with the known parameters n₁ and n₂. Therefore, for any 0 < a < b,

P(a < ζ < b) = I_b(n₁, n₂) − I_a(n₁, n₂),   (2.69)
where

I_x(n₁, n₂) = [B(n₁, n₂)]⁻¹ ∫₀^x t^(n₁−1) (1 − t)^(n₂−1) dt   (2.70)

is the incomplete beta function (see Abramowitz and Stegun (1992), page 263). To connect ζ and R, it is easy to check by direct calculation that
ζ = Rn₁X̄ / (Rn₁X̄ + (1 − R)n₂Ȳ),   (2.71)

hence the right-hand side of (2.71) is a pivotal quantity. Consequently, if a and b in (2.69) are chosen so that, for a given γ,

I_b(n₁, n₂) − I_a(n₁, n₂) = 1 − γ,   (2.72)

then

P( a < Rn₁X̄ / (Rn₁X̄ + (1 − R)n₂Ȳ) < b ) = 1 − γ.   (2.73)

Solving the inequality in (2.73) for R yields

n₂R̂a / [n₁(1 − R̂)(1 − a) + n₂R̂a] ≤ R ≤ n₂R̂b / [n₁(1 − R̂)(1 − b) + n₂R̂b].   (2.74)

The exact confidence interval (2.74) has the advantage of being valid for any values of n₁ and n₂, large or small. However, for its construction one needs to solve equation (2.72) for a and b. Although the equation has infinitely many solutions, the objective is to find those that are closest to each other, namely, those for which b − a = min. Since the solution of this optimization problem is often far from trivial, we may wish to replace the exact confidence interval by an asymptotic one, provided n₁ and n₂ are
relatively large. We can base our asymptotic confidence interval on either the UMVUE R̃ or the MLE R̂ of R. In both instances we shall use the fact that, as n₁ → ∞ and n₂ → ∞, the estimators R̃ and R̂ are asymptotically normal with mean R. In Example 2.2 we derived the unbiased estimator σ̂² of the variance Var R̃, of the form given by (2.37) and (2.38). Then the asymptotic confidence interval for R with confidence coefficient 1 − γ is given approximately by (R̃ − z_(γ/2)σ̂, R̃ + z_(γ/2)σ̂), where, as above, z_(γ/2) is the 1 − γ/2 percentile of the standard normal distribution. We warn our readers that even with modern computer facilities, an application of (2.37) and (2.38) may require substantial numerical effort.

If the sample sizes are equal, n₁ = n₂ = n, we can obtain an asymptotic confidence interval for R based on the MLE using the asymptotic expression (2.16) for MSE(R̂). Choose

σ̂² = 2r̂²(1 + r̂)⁻⁴n⁻¹ + 4r̂²(2r̂ − 1)(r̂ − 2)(1 + r̂)⁻⁶n⁻²

with r̂ = Ȳ/X̄. Then the asymptotic confidence interval based on the MLE is of the form (R̂ − z_(γ/2)σ̂, R̂ + z_(γ/2)σ̂).

Turning to the Bayesian interval estimator, note that for u* > 1 and v* > 1 the pdf (2.53) is unimodal, so that the HPD region luckily reduces to the interval W(X, Y) = (L(X, Y), U(X, Y)). Algebraically, in view of (2.63) and (2.64), L and U are such that

π_R(L | X, Y) = π_R(U | X, Y)   (2.75)

and

∫_L^U π_R(R | X, Y) dR = 1 − γ.   (2.76)
The system of equations (2.75) and (2.76) has to be solved numerically. We do not carry this out here, but an example of solving the system of equations (2.63) and (2.64) is given in Section 4.3.2.
2.5 Transformation Methods

2.5.1 The Theory
In Sections 2.1-2.4 we have dealt with the construction of both point and interval estimators of R = P(X < Y) when the random vector (X, Y) has the pdf f(x, y|θ), where θ is a scalar or a vector-valued parameter. In this section we shall describe how point or interval estimators can be derived using the so-called transformation methods. These methods seem to have been overlooked by statisticians and, to the best of our knowledge, have never been applied to the stress-strength problem before. Let, as above, (X, Y) be a random vector with the pdf f(x, y|θ). We shall assume that there exist random variables ξ and η and a monotone function u(·) with the inverse v = u⁻¹ such that
X = u(ξ) ⟺ ξ = v(X),   Y = u(η) ⟺ η = v(Y).   (2.77)
Assume also, without loss of generality, that the functions u and v are strictly increasing, so that (ξ, η) is the random vector with the pdf

g*(ξ, η | θ) = f(u(ξ), u(η) | θ) u′(ξ) u′(η).   (2.78)
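As a quick numerical sanity check of (2.78), one can take independent exponential components and the strictly increasing map u(t) = eᵗ, so that ξ = log X and η = log Y. The rates below are hypothetical, and independence is assumed purely for illustration (the theory itself does not require it).

```python
import numpy as np
from scipy.integrate import dblquad

# f(x, y | theta): independent exponentials with hypothetical rates
a_rate, b_rate = 0.8, 0.5

def f(x, y):
    return a_rate * np.exp(-a_rate * x) * b_rate * np.exp(-b_rate * y)

# u(t) = exp(t) is strictly increasing, xi = v(X) = log X, eta = log Y,
# and (2.78) gives g*(xi, eta) = f(exp(xi), exp(eta)) * exp(xi) * exp(eta)
def g_star(xi, eta):
    return f(np.exp(xi), np.exp(eta)) * np.exp(xi) * np.exp(eta)

# A valid pdf must integrate to 1 over the (xi, eta)-plane.
total, _ = dblquad(lambda eta, xi: g_star(xi, eta),
                   -15, 15, lambda xi: -15.0, lambda xi: 15.0)

# R = P(X < Y) is invariant under the common monotone map u:
# x < y holds if and only if log x < log y.
rng = np.random.default_rng(1)
x = rng.exponential(1.0 / a_rate, 100_000)
y = rng.exponential(1.0 / b_rate, 100_000)
same = np.mean(x < y) == np.mean(np.log(x) < np.log(y))
print(round(total, 4), same)
```

The final check illustrates why transformation methods suit the stress-strength problem: R = P(X < Y) is unchanged by a common strictly increasing transformation of X and Y.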
Parameterization of the pdf (2.78) can also be carried out in a different manner. Namely, let (ξ, η) be the random vector with the pdf g(ξ, η | τ), where the scalar or vector-valued parameter τ is connected to θ by a one-to-one transformation ϱ with the inverse ν:

θ = ν(τ) ⟺ τ = ϱ(θ).   (2.79)
Thus, there exists the following correspondence between f(x, y | θ) and