> 0; hence φ(t) → +∞ as t → +∞. For any t ∈ I let G^(t) be the left-continuous probability distribution function defined by dG^(t)(y) = [φ(t)]⁻¹ exp(ty) dF(y). The distributions {G^(t)} associated with a given F appear in many statistical and probabilistic contexts. For example, the G^(t) with t ≠ 0 such that φ(t) = 1 is of importance in sequential analysis [W1]; cf. also [B3], [F2], [F3], [S3]. We are interested in the case when F satisfies the standard conditions and t = τ. LEMMA 2.4. Suppose F satisfies the standard conditions (2.10)–(2.11), and let
where ρ and τ are given by (2.3), (2.12). Then the m.g.f. of G is finite in a neighborhood of the origin, and
Proof. The m.g.f. of G, ψ(t) say, is φ(t + τ)/φ(τ). Since 0 < τ < β, it is plain that ψ(t) < ∞ for all sufficiently small |t|. Next, ψ′(0) = 0. … P(Y > 0) > 0. To treat this case, let k be a positive integer and for each
n let Y_n(k) be defined by (3.1) with Y replaced by Y_n. Then Y_1(k), Y_2(k), ⋯ is a sequence of independent and identically distributed variables, and hence
Let ρ(k) be defined as in the paragraph following (3.1) and let m_k = E(Y(k)), −∞ ≤ m_k < ∞. Now consider the following subcases of the present Case 2: (i) m_k < 0 for all k, (ii) m_k > 0 for some k, and (iii) m_k = 0 for all sufficiently large k. In view of (3.3) these three cases are exhaustive. Suppose Case 2(i) obtains. It then follows from Lemma 2.3 that, for each k, the d.f. of Y(k) satisfies the standard conditions; hence n⁻¹ log P_n(k) → log ρ(k) by the first paragraph of this proof with Y replaced by Y(k). It now follows from (3.8) that the left-hand side of (3.7) is not less than log ρ(k) for any k; hence (3.7) holds, by Lemma 3.2. Suppose now that Case 2(ii) obtains, and let k be such that m_k > 0. Then P_n(k) → 1 as n → ∞ by the law of large numbers; hence P_n → 1 by (3.8); hence n⁻¹ log P_n → 0, so (3.7) holds since ρ ≤ 1. Suppose finally that Case 2(iii) obtains. In this case E(Y) exists and equals 0. It is thus seen that the theorem is established in all cases except possibly in Case 3: P(Y > 0) > 0, E(Y) exists and equals 0. To treat this case let u be a positive constant, and let Y*_n = Y_n − u for each n. Then P_n ≥ P(Y*_1 + ⋯ + Y*_n ≥ 0) = P*_n, say, for each n. Since E(Y*) = −u < 0, Y* does not belong in Case 3. Hence n⁻¹ log P*_n → −f(u), where f is given by (3.6). Thus the left-hand side of (3.7) is not less than −f(u). Since u is arbitrary, it follows from Lemma 3.3 that (3.7) holds.

Notes. Theorem 3.1 is due to Chernoff [C1]. The present proof is a rearrangement and continuation of the proof in [B3] for the case when the standard conditions are satisfied. A different proof, again under the standard conditions, is given in [B11].

Concluding remark. The following partial generalization of Theorem 3.1 is required in certain applications. Let Y be an extended real-valued random variable such that P(−∞ ≤ Y < ∞) = 1, and let … ≥ n·u_n) for all n, and 0 < a ≤ 1 by (3.10). Choose ε > 0. Then u − ε < u_n < u + ε for all sufficiently large n.
Hence, for all sufficiently large n, P(Y*_1 + ⋯ + Y*_n ≥ n·u_n) and P(Y*_1 + ⋯ + Y*_n > n·u_n) are both bounded by P(Y*_1 + ⋯ + Y*_n ≥ n(u + ε)) and P(Y*_1 + ⋯ + Y*_n ≥ n(u − ε)). By first applying Theorem 3.1 to these bounds and then letting ε → 0, it follows from (3.10) and Lemma 3.3 that the limits in (3.11) and (3.12) exist and that both limits are equal to log a − f*(u), where f* is defined by (3.6) with φ replaced by φ*, the m.g.f. of Y*.

Since …, it follows from (4.4), under appropriate regularity conditions, that K(p_θ, p_θ₀) ∼ E_θ₀(r(x|θ, θ₀) − 1)²/2 as θ → θ₀. Similarly, under appropriate regularity conditions, K(p_θ₀, p_θ) ∼ E_θ₀(r(x|θ, θ₀) − 1)²/2 as θ → θ₀. E_θ₀(r(x|θ, θ₀) − 1)² is a familiar quantity in the theory of estimation. In particular, if f_θ(x) is continuously differentiable in θ for each x and the partial derivatives are square integrable, then, under appropriate regularity conditions, E_θ₀(r(x|θ, θ₀) − 1)² ∼ (θ − θ₀)I(θ₀)(θ − θ₀)′ as θ → θ₀, where I(θ₀) is Fisher's information matrix when (X, 𝓑) is the sample space and θ₀ obtains.

(v) In the same context as that of the preceding paragraph consider fixed θ₁ and θ₂, and let T(x) = log r(x|θ₁, θ₂). According to the Neyman–Pearson theory, T is the best statistic for discriminating between p_θ₁ and p_θ₂. Assume that x consists of a large number of independent components. Then T is approximately normally distributed under each θ. Let m_i be the mean and σ_i² the variance of T under p_θᵢ, i = 1, 2. If σ₁ = σ₂ = σ, say, the separation between p_θ₁ and p_θ₂ afforded by the optimal statistic T is approximately the distance between the N(0, 1) and the N(d, 1) distributions, where d = (m₁ − m₂)/σ. (Here m₂ < 0 < m₁.) In the general case, i.e., σ₁ not necessarily equal to σ₂, we may take d₁ = (m₁ − m₂)/σ₁ or d₂ = (m₁ − m₂)/σ₂ or some mixture of the two as the effective distance, in the standard normal scale, between p_θ₁ and p_θ₂. It can be shown, under appropriate
SOME LIMIT THEOREMS IN STATISTICS
regularity conditions, that d₁² ∼ 2K(p_θ₁, p_θ₂) and d₂² ∼ 2K(p_θ₂, p_θ₁) (cf. [B7]). Consequently, √(K(p_θ₁, p_θ₂) + K(p_θ₂, p_θ₁)) is the approximate distance between p_θ₁ and p_θ₂ in the standard normal scale if θ₂ is close to θ₁. So much for heuristics. Suppose now that Y = Y(x) is a real-valued 𝓑-measurable function on X, and that p is a given probability measure on 𝓑. Let … t > 0 be such that φ(t) < ∞. Then
by (4.1) and (4.3). Hence … Since … > 0 by (5.10), F_n satisfies the standard conditions, by Lemma 2.3. It is readily seen that
It follows from (5.15) that p_n =
→ the integral in (5.13) as n → ∞, by (5.14). It follows hence that … and that 0 < τ < ∞. It follows from (5.15) and (5.17), by another application of (5.14), that n⁻¹ log p_n → log ρ, where ρ is given by (5.12) and (5.13). It will now suffice to show that conditions (2.23) and (2.24) are satisfied. Let C_n(t) be the cumulant generating function of G_n, the distribution obtained from F_n by exponential centering. Then … where t_n is given by (5.15). Hence the variance of G_n is σ_n² = C_n^(2)(0) = n ∫_X sech²(t_n y) dW_n. Hence
by an application of (5.14). Another application of (5.14) shows that
Since t_n is bounded, it follows from (5.18) that (2.23) holds. It follows from (5.18) and (5.19) that the fourth cumulant of H_n → 0 as n → ∞. Hence the fourth moment of H_n → 3, so (2.24) holds with c = 2. It can be shown that in the present case H_n → Φ, even if y⁴ is replaced by y³ in (5.8).

Example 5.3. Let x₁, x₂, ⋯ be a sequence of independent random variables, with each x_i uniformly distributed over [0, 1]. For each n let F_n(t) be the empirical d.f. based on x₁, ⋯, x_n, i.e., F_n(t) = (the number of indices j with 1 ≤ j ≤ n and x_j ≤ t)/n, and let
and … Let a be a constant, 0 < a < 1, and let P_n⁺ = P(D_n⁺ ≥ a), P_n⁻ = P(D_n⁻ ≥ a), and P_n = P(D_n ≥ a). We shall show that
R. R. BAHADUR
as n → ∞, where g is defined as follows. Let
Then
It follows from (5.22) that P_n⁺ ≤ P_n ≤ P_n⁺ + P_n⁻ ≤ 2 max{P_n⁺, P_n⁻}; consequently, (5.23) implies that
as n → ∞. Some properties of g defined by (5.24), (5.25) are described in the following lemma. LEMMA 5.1. g(a) is a strictly increasing and continuously differentiable function of a for 0 < a < 1; g(a) = 2a² + O(a³) as a → 0; g(a) → ∞ as a → 1. The proof of this lemma is omitted. Now consider P_n⁺ for given a. It is plain from (5.20) that P_n⁺ ≥ P(F_n(t) − (a + t) ≥ 0) for each t ∈ [0, 1]. For fixed t, F_n(t) − (a + t) is the mean of n independent and identically distributed random variables, say Y₁, ⋯, Y_n, and with f defined by (5.24) we have
It follows hence from Theorem 3.1 that lim_{n→∞} n⁻¹ log P(F_n(t) − (a + t) ≥ 0) = −f(a, t); hence lim inf_{n→∞} n⁻¹ log P_n⁺ ≥ −f(a, t). Since t is arbitrary,
Now let k be a positive integer so large that 0 < a − k⁻¹ < 1. Then
Hence
thus
It follows from Theorem 2.1 that the i-th term of the series does not exceed exp[−n·f(a − k⁻¹, i/k)]; hence P_n⁺ ≤ k·exp[−n·g(a − k⁻¹)], by (5.25). Since n is arbitrary,
Since g is continuous, it follows from (5.27), by letting k → ∞ in (5.28), that the first part of (5.23) holds. The second part of (5.23) follows from the first since D_n⁺ and D_n⁻ have the same distribution for each n. Now let
and let Q_n = P(T_n ≥ a). Then, for Q_n also, the analogous conclusion (5.30) holds. Since T_n ≥ D_n⁺, Q_n ≥ P_n⁺ for all n. In view of (5.23) it will therefore suffice to show that
To this end, note first that
Let k be a positive integer so large that 0 < a − 2k⁻¹ < 1, and let i and j be integers, 1 ≤ i, j ≤ k. If (i − 1)/k ≤ t ≤ i/k and (j − 1)/k ≤ u ≤ j/k, then F_n(t) − t + u − F_n(u) ≤ F_n(i/k) − (i − 1)/k + j/k − F_n((j − 1)/k) = G_n(i, j), say. Now, i ≥ j − 1 implies P(G_n(i, j) ≥ a) ≤ exp[−n·f(a − 2k⁻¹, (i − j + 1)/k)], and i ≤ j − 1 implies P(G_n(i, j) ≥ a) ≤ exp[−n·f(a − 2k⁻¹, 1 − (j − 1 − i)/k)], by applications of Theorem 2.1. Hence P(G_n(i, j) ≥ a) ≤ exp[−n·g(a − 2k⁻¹)] for all i, j. Since T_n ≥ a implies G_n(i, j) ≥ a for some i and j, it follows that Q_n ≤ k²·exp[−n·g(a − 2k⁻¹)]. Since this holds for each n, the left-hand side of (5.31) does not exceed −g(a − 2k⁻¹). By letting k → ∞ it now follows that (5.31) holds.

Remark 1. All four conclusions (5.23), (5.26), and (5.30) are special cases of (4.12).

Remark 2. It is known that P(n^{1/2} D_n⁺ ≥ t) → exp[−2t²] for each t > 0. This suggests that if a > 0 is very small the limit of n⁻¹ log P_n⁺ is nearly −2a²; verification of this suggestion is provided by (5.23) and Lemma 5.1. The parallel remark applies to P_n, and perhaps also to Q_n.

Example 5.4 (Sanov's theorem in the multinomial case). Let X be a finite set, say X = {a₁, ⋯, a_k}, k ≥ 2. Let Λ denote the set of all v = (v₁, ⋯, v_k) with v_i ≥ 0 and Σ_{i=1}^k v_i = 1. Regard Λ as the set of all probability measures on X, i.e., if v = (v₁, ⋯, v_k) obtains, then P(x = a_i) = v_i for i = 1, ⋯, k. For any v = (v₁, ⋯, v_k) and p = (p₁, ⋯, p_k) in Λ let
with 0/0 = 1 (say) and 0 log 0 = 0. Then K as just defined is the specialization to the present case of K as defined in § 4. Now let p = (p₁, ⋯, p_k) be a given point in Λ with p_i > 0 for each i, and let A be a subset of Λ. Let K(A, p) = inf{K(v, p): v ∈ A} if A is nonempty, and let K(A, p) = ∞ otherwise. Let x₁, x₂, ⋯ denote a sequence of independent and identically distributed observations on x. For each n let f_{in} = the number of indices j with 1 ≤ j ≤ n and x_j = a_i, i = 1, ⋯, k, and let V_n = n⁻¹(f_{1n}, ⋯, f_{kn}). We are interested in P(V_n ∈ A | p). The values of V_n are restricted, of course, to Λ_n = the set of all v of the form (i₁/n, ⋯, i_k/n), where i₁, ⋯, i_k are nonnegative integers with i₁ + ⋯ + i_k = n. Let A_n = A ∩ Λ_n. We shall show that there exists a positive constant γ(k), depending only on k, such that
for all n, A ⊂ Λ, and p in the interior of Λ. Let us say that A is p-regular if
In this case (5.33) implies that
as n → ∞. The following lemma gives a useful sufficient condition for p-regularity. Let A⁰ denote the interior of A, and let Ā⁰ denote the closure of A⁰. LEMMA 5.2. If A ⊂ Ā⁰ (e.g., if A is open), then A is p-regular for any p ∈ Λ⁰. Proof. If A is empty so is A_n, and K(A_n, p) = ∞ = K(A, p) for all n, so (5.34) holds trivially. Suppose then that A is nonempty. Choose ε > 0 and let v be a point in A such that K(v, p) ≤ K(A, p) + ε. Now, p ∈ Λ⁰ implies that K(v, p) is finite-valued and continuous in its first argument. It follows hence from v ∈ A and A ⊂ Ā⁰ that there exists w ∈ A⁰ such that K(w, p) ≤ K(v, p) + ε. Hence K(w, p) ≤ K(A, p) + 2ε. Suppose w = (w₁, ⋯, w_k). For each n let r_{in} be the greatest integer ≤ n·w_i for i = 1, ⋯, k − 1, let r_{kn} = n − Σ_{i=1}^{k−1} r_{in}, and let w_n = n⁻¹(r_{1n}, ⋯, r_{kn}). Then w_n ∈ Λ_n for each n, and w_n → w as n → ∞. Hence w_n ∈ A⁰ for all sufficiently large n, say for n > m. Since A_n = A ∩ Λ_n ⊃ A⁰ ∩ Λ_n, it follows that K(A_n, p) ≤ K(w_n, p) for all n > m. Hence lim sup_{n→∞} K(A_n, p) ≤ K(w, p) ≤ K(A, p) + 2ε. Since ε is arbitrary, and since A_n ⊂ A, (5.34) holds.

Consider next a point v of Λ_n with all coordinates positive, say v = n⁻¹(r₁, ⋯, r_k) with each r_i ≥ 1. Since r! = √(2πr)·r^r·e^{−r+s}, where (12r + 1)⁻¹ < s < (12r)⁻¹, it follows by an easy calculation that P(V_n = v|v) ≥ β(k)·n^{1/2}·(r₁ ⋯ r_k)^{−1/2} ≥ β(k)·n^{−(k−1)/2}, where β(k) = (2π)^{−(k−1)/2}·e^{−k/12}. Suppose now that v is a point in Λ_n with exactly k₁ positive coordinates, where 1 ≤ k₁ < k. The preceding argument shows that then P(V_n = v|v) ≥ β(k₁)·n^{−(k₁−1)/2} ≥ β(k₁)·n^{−(k−1)/2}. Letting γ(k) = min{β(1), ⋯, β(k)}, it now follows that (5.37) holds for all v in Λ_n. If A_n is empty the upper and lower bounds in (5.33) are zero and so is P(V_n ∈ A|p). Suppose then that A_n is nonempty. Since A_n is a finite set, there exists v_n ∈ A_n such that K(A_n, p) = K(v_n, p). Then P(V_n ∈ A|p) = P(V_n ∈ A_n|p) ≥ P(V_n = v_n|p) ≥ the lower bound in (5.33), by (5.36) and (5.37). Notes.
The sources of Example 5.1 are [A1] and [B4], where a different method is used. Example 5.2 is from [K3]. Example 5.3 is based partly on [S2] and partly on [A2]. Example 5.4 is based on the treatment in [H5].

6. Stein's lemma. Asymptotic effective variances. The statistical framework and notation of this section and all subsequent ones is the following. X is a set of points x, and … P_θ₁(r(x) ≥ l) > 0 for all l > 0. Choose and fix an l > 1 and define z = r(x) if 0 ≤ r(x) < l and z = l if r(x) ≥ l. Let Z = [0, l] be the sample space of z, and let P^z_θ denote the probability measure on Z when θ obtains. We then have dP^z_θ₂ = ρ(z) dP^z_θ₁, where ρ(z) = z for 0 ≤ z < l and ρ(l) = ∫_{r(x)≥l} r(x) dP_θ₁/P_θ₁(r(x) ≥ l). It is plain that ρ is a bounded function on Z; hence K* = E^z_θ₂(log ρ(z)) is finite. Now let z_i = z(x_i) for i = 1, ⋯, n and let α*_n be the minimum available size when power is fixed at 1 − β and the sample point is (z₁, ⋯, z_n). Then α_n ≤ α*_n for all n; hence
by the first two parts of the present proof with x replaced by z. Now, since K* = ∫_Z ρ(z) log ρ(z) dP^z_θ₁ and since z = l implies ρ(z) ≥ l > 1,
Since K = ∫_X r(x) log r(x) dP_θ₁ = ∞, it follows from (6.12) that K* → ∞ as l → ∞. By letting l → ∞ in (6.11) it follows, as desired, that n⁻¹ log α_n → −∞. It remains now to consider the case when K = ∞ and P_θ₂ is not dominated by P_θ₁ on 𝓑. Then there exists a set B ⊂ X, B ∈ 𝓑, such that P_θ₂(B) > 0 and P_θ₁(B) = 0. Consider …

In the remainder of this section we consider a theory of estimation in terms of the framework (X^(n), 𝓑^(n)), {P^(n)_θ: θ ∈ Θ}, n = 1, 2, ⋯, introduced at the outset of this section. Let g be a real-valued functional defined on Θ. For each n, let T_n = T_n(x^(n)) be a 𝓑^(n)-measurable function on X^(n), to be thought of as a point estimate of g. For any θ and ε > 0, let τ_n = τ_n(ε, θ) be defined by (6.13), 0 ≤ τ_n ≤ ∞. Since the right-hand side of (6.13) can be found exactly by entering a standard normal table with ε/τ_n, let us call τ_n(ε, θ) the effective standard deviation of T_n when θ obtains and it is required, for some theoretical or practical reason, to compute the left-hand side of (6.13). Note that if T_n is exactly normally distributed
with mean g(θ) when θ obtains, then τ_n(ε, θ) equals the actual standard deviation of T_n for each ε. The sequence {T_n} is said to be a consistent estimate of g if, for each fixed ε and θ, the left-hand side of (6.13) → 0 as n → ∞. It is plain that we have consistency if and only if
for all ε and θ. A special case of consistency occurs when T_n is asymptotically normal with mean g(θ) and variance v(θ)/n, i.e., there exists a positive function v on Θ such that, for each θ, n^{1/2}(T_n − g(θ))/[v(θ)]^{1/2} tends in distribution to an N(0, 1) variable when θ obtains. It is readily seen that in this case (6.15) holds for each θ and h > 0. In (6.15), ε → 0 as n → ∞. We now consider the case when ε remains fixed as n → ∞. It will be shown that for consistent estimates there is an asymptotic lower bound for n·τ_n²(ε, θ) for all sufficiently small ε. This conclusion (Theorem 6.1) is an analogue of Fisher's bound for the asymptotic variance of asymptotically normal estimates (for a fuller description and discussion see §§ 1–3 of [B9]). Assumption 6.1. Θ is an open set in R^k, and g(θ) is a continuously differentiable function of θ = (θ₁, ⋯, θ_k). Let
Assumption 6.2. For each θ⁰ in Θ there exists a positive definite symmetric k × k matrix I(θ⁰) = {I_{ij}(θ⁰)} such that
As noted in § 4, under additional assumptions the matrix I(θ) coincides with the information matrix when (X, 𝓑) is the sample space and θ obtains; these additional assumptions are, however, not required here. Write I⁻¹(θ) = {I^{ij}(θ)} and let
Note that v depends only on the framework (X, 𝓑), {P_θ: θ ∈ Θ} and on the function g to be estimated. THEOREM 6.1. Suppose that Assumptions 6.1 and 6.2 hold and that {T_n} is a consistent estimate of g. Then
for every θ.
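Taking (6.18) in its usual Fisher-bound form v(θ) = Σ_{i,j} h_i(θ) I^{ij}(θ) h_j(θ), with h_i the partial derivatives of g from (6.16) (the displays themselves are not reproduced in this excerpt), the quantity v(θ) appearing in the theorem is a routine computation once I(θ) and the gradient of g are in hand. A minimal numerical sketch; the information matrix and gradient below are invented for illustration, not taken from the text:

```python
import numpy as np

# Hypothetical two-parameter model (k = 2); neither the matrix nor the
# gradient comes from the text -- both are invented for illustration.
I = np.array([[2.0, 0.5],
              [0.5, 1.0]])       # assumed I(theta): symmetric, positive definite
grad_g = np.array([1.0, -1.0])   # assumed (h_1(theta), h_2(theta)) of (6.16)

# v(theta) = sum_{i,j} h_i(theta) I^{ij}(theta) h_j(theta), i.e. h I^{-1} h'
v = grad_g @ np.linalg.inv(I) @ grad_g
print(v)   # 16/7 for these invented inputs
```

By Theorem 6.1, no consistent estimate of g can have asymptotic effective variance smaller than this v(θ) at θ.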
Proof. Choose and fix θ⁰ ∈ Θ. We shall show that (6.19) holds at θ⁰. Write v = v(θ⁰). If v = 0, then (6.19) holds trivially. Suppose then that v > 0. Write I = I(θ⁰) and h = (h₁(θ⁰), ⋯, h_k(θ⁰)), where the h_i are given by (6.16). Then h ≠ 0, so hI⁻¹ is a nonzero vector. Choose and fix A, 0 < A < 1. For ε > 0 let
It follows from Assumption 6.1 that θ* ∈ Θ for all sufficiently small ε, and that δ = g(θ*) − g(θ⁰) = εv + o(ε) as ε → 0. Consequently, by (6.17), (6.21) holds for all sufficiently small ε. Choose and fix ε so small that (6.21) holds, and consider testing θ⁰ against θ* by means of tests which have power ≥ β (say) against θ*, and let α*_n be the minimum available size when the sample point is x^(n). It is known (cf. the proof of Lemma 6.1 in case 0 < K < ∞) that if φ_n is a 𝓑^(n)-measurable test such that E_θ*(φ_n) ≥ β, then E_θ⁰(φ_n) ≥ α*_n. Let φ_n = 1 if |T_n − g(θ⁰)| ≥ Aδ and φ_n = 0 otherwise. Since {T_n} is a consistent sequence, it follows from (6.21) that E_θ*(φ_n) → 1 as n → ∞. Hence E_θ*(φ_n) ≥ β for all sufficiently large n, say for n > m; hence E_θ⁰(φ_n) ≥ α*_n for n > m. It now follows from the definition of φ_n by Lemma 6.1 that
Let us write α_n(ε) = P_θ⁰(|T_n − g(θ⁰)| ≥ Aδ). Letting ε → 0 we obtain
It follows from (6.17), (6.18) and (6.20) that K(θ*, θ⁰) = ε²v/2 + o(ε²) as ε → 0. It now follows from (6.20) that the right-hand side of (6.23) equals −(2A²v)⁻¹. Since A is arbitrary, we conclude that
Since 0 < v < ∞ it follows from (6.24) that there exists ε₁ > 0 such that if 0 < ε < ε₁, then α_n(ε) > 0 for all sufficiently large n, say for n > m(ε). Since α_n(ε) equals the left-hand side of (6.13) with θ = θ⁰, it follows that 0 < ε < ε₁ and n > m(ε) imply that 0 < τ_n(ε, θ⁰) ≤ ∞. Since {T_n} is consistent, τ_n(ε, θ⁰) → 0 as n → ∞. It follows hence from (6.13) by Theorem 1.1 that if 0 < ε < ε₁, then
as n → ∞. It follows from (6.24) and (6.25) that (6.19) holds at θ⁰. In view of Theorem 6.1 let us say that {T_n} is an efficient estimate of g, in the sense of asymptotic effective variances, if, for each fixed θ, lim_{n→∞}{n·τ_n²(ε, θ)} exists for all sufficiently small ε, say w(ε, θ), and lim_{ε→0} w(ε, θ) = v(θ). At present it is an
open problem to find estimates which are efficient in this sense. For partial results concerning efficiency of the maximum likelihood estimate in the present sense see [B6] and [B9]. We conclude this section with an example where the regularity assumptions of classical estimation theories are not satisfied but Assumptions 6.1 and 6.2 do hold, and the maximum likelihood estimate is efficient in the sense of asymptotic effective variances.

Example 6.1. Suppose that X is the real line and x is distributed in X according to the double exponential distribution with mean θ, i.e., dP_θ = exp(−|x − θ|) dx/2, and Θ = (a, b), where a and b are constants, −∞ ≤ a < b ≤ ∞. Let g(θ) = θ. A straightforward calculation shows that, for any θ₁ and θ₂, (6.26) holds. It follows from (6.26) that (6.17) holds with I(θ) as given by (6.27), for all θ. It follows from (6.27) that, for the present g, v(θ) = 1 for all θ. Now for each n let k_n be the integer such that n/2 < k_n ≤ n/2 + 1, let y_n(1) ≤ ⋯ ≤ y_n(n) be the ordered sample values {x₁, ⋯, x_n}, and let T_n = y_n(k_n). Then, for each θ and ε, (6.28) holds, where p < ½ is given by (6.29). Denote the left-hand side of (6.28) by α_n(ε). It follows easily from (6.28) and the definition of k_n, by Example 1.2, that (6.30) holds, where τ_n is given by (6.31). It is plain that τ_n does not depend on θ, that 0 < τ_n(ε) < ∞, and that τ_n(ε) → 0 as n → ∞. It therefore follows from (6.25) and (6.30) that, for each ε > 0,
It is readily seen from (6.29) and (6.31) that
so {T_n} is asymptotically efficient. Suppose now that g(θ) is a continuously differentiable function over (a, b) and g′(θ) ≠ 0 for a < θ < b. Let U_n = g(T_n). It can be shown by an elaboration of the preceding argument that {U_n} is an asymptotically efficient estimate of g.
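Example 6.1 lends itself to a quick Monte Carlo check: T_n is the sample median, and v(θ) = 1 suggests that n·Var(T_n) should be near 1 for large n. A rough simulation sketch (sample size, replication count, and seed are arbitrary choices, not from the text):

```python
import numpy as np

# Double exponential (Laplace) law with mean theta = 0: dP = exp(-|x|) dx / 2.
# For odd n, y_n(k_n) with n/2 < k_n <= n/2 + 1 is the ordinary sample median.
rng = np.random.default_rng(0)
n, reps = 2001, 4000
samples = rng.laplace(loc=0.0, scale=1.0, size=(reps, n))
medians = np.median(samples, axis=1)   # T_n for each replication
print(n * medians.var())   # should be close to v(theta) = 1
```

The agreement reflects the familiar fact that the asymptotic variance of the sample median is 1/(4f(θ)²n), which equals 1/n here since the density at the median is ½.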
Notes. Lemma 6.1 is contained in unpublished work of Stein. The first published statement of the lemma seems to be in [C2]. The present proof of Lemma 6.1 is based on the proof in [B9]. Theorem 6.1 is due to Bahadur [B6].

7. Exact slopes of test statistics. Let (S, 𝓐) be the sample space of infinitely many independent and identically distributed observations s = (x₁, x₂, ⋯ ad inf) on an abstract random variable x, the distribution of x being determined by an abstract parameter θ taking values in a set Θ. Let Θ₀ be a given subset of Θ, and consider testing the null hypothesis that some θ in Θ₀ obtains. For each n = 1, 2, ⋯, let T_n(s) be an extended real-valued function such that T_n is 𝓐-measurable and depends on s only through (x₁, ⋯, x_n); T_n is to be thought of as a test statistic, large values of T_n being significant. Assume for simplicity that T_n has a null distribution, i.e., there exists an F_n(t) such that
and all t, −∞ ≤ t ≤ ∞. Then the level attained by T_n is defined to be
If in a given case the data consists of (x₁, ⋯, x_n), then L_n(x₁, ⋯, x_n) is the probability of obtaining as large or larger a value of T_n as the observed value T_n(x₁, ⋯, x_n) if the null hypothesis is true. In typical cases L_n is asymptotically uniformly distributed over (0, 1) in the null case, and L_n → 0 exponentially fast (with probability one) in the non-null case. We shall say that the sequence {T_n} has exact slope c(θ) when θ obtains if
This definition is motivated in part by the following considerations. Consider the
Fisherian transformation V_n(s) = −2 log L_n(s). Then, in typical cases, V_n → χ²₂ in
distribution in the null case. Suppose now that a non-null θ obtains and that (7.3) holds, with 0 < c(θ) < ∞. Suppose we plot, for a given s, the sequence of points {(n, V_n(s)): n = 1, 2, ⋯} in the uv-plane. It then follows from (7.3) that, for almost all s, this sequence of points moves out to infinity in the direction of a ray from the origin, the angle between the ray and the u-axis, on which axis the sample size n is being plotted, being tan⁻¹ c(θ). The term "exact" in the above definition serves to distinguish c from another quantity, called the approximate slope of {T_n}, which is defined as follows. Suppose that T_n has an asymptotic null distribution F, i.e., lim_{n→∞} F_n(t) = F(t) for each t. For each n and s let L_n^(a) = 1 − F(T_n(s)). Suppose (7.3) holds when L_n is replaced by L_n^(a) and c is replaced by c^(a)(θ). Then c^(a)(θ) is the approximate slope of {T_n} when θ obtains. (For a discussion of approximate slopes c^(a), and of the rather tenuous relations between c and c^(a), see [B9].) In the remainder of this section, and in subsequent sections, we consider only exact slopes. In particular, the assumption that T_n has an asymptotic null distribution is henceforth not in force.
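A concrete instance of (7.2) and (7.3): under the N(θ, 1) model with null hypothesis θ = 0 and T_n = n^{1/2} x̄_n (the statistic that reappears as T_n^(1) in Example 8.1 below), the level attained is L_n = 1 − Φ(T_n), and the exact slope works out to c(θ) = θ². Freezing x̄_n at its almost-sure limit θ, the following sketch (an editorial illustration, not part of the text) shows −2n⁻¹ log L_n approaching c(θ):

```python
import math

def minus_two_log_level_over_n(theta, n):
    # Level attained when xbar is frozen at theta: L_n = 1 - Phi(sqrt(n)*theta),
    # with the normal tail computed via erfc.
    t = math.sqrt(n) * theta
    log_level = math.log(0.5 * math.erfc(t / math.sqrt(2.0)))   # log L_n
    return -2.0 * log_level / n

theta = 0.7
for n in (10, 100, 1000):
    print(n, minus_two_log_level_over_n(theta, n))   # approaches theta**2 = 0.49
```

The O(n⁻¹ log n) correction coming from the polynomial factor in the normal tail is visible at moderate n, which is one reason exact slopes are defined through the limit (7.3) rather than any fixed-n quantity.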
Now for given ε, 0 < ε < 1, and given s, let N = N(ε, s) be the smallest integer m such that L_n(s) < ε for all n ≥ m, and let N = ∞ if no such m exists. Then N is the sample size required in order that the sequence {T_n} become significant (and remain significant) at the level ε. The following theorem shows that, for small ε, N is approximately inversely proportional to the exact slope. THEOREM 7.1. If (7.3) holds and 0 < c(θ) < ∞, then
Proof. Choose and fix θ such that 0 < c(θ) < ∞ and choose and fix s such that n⁻¹ log L_n(s) → −c(θ)/2. Then L_n > 0 for all sufficiently large n and L_n → 0 as n → ∞. It follows that N < ∞ for every ε > 0 and that N → ∞ through a subsequence of the integers as ε → 0. Hence 2 ≤ N < ∞ for all sufficiently small ε, say
for ε < ε₁. For ε < ε₁ we have L_N < ε ≤ L_{N−1}. Hence
N⁻¹ log ε ≥ (N − 1)·N⁻¹·(N − 1)⁻¹ log L_{N−1}. It now follows from the present choice of s that N⁻¹ log ε → −c(θ)/2 as ε → 0. Suppose that {T_n^(1)} and {T_n^(2)} are two sequences of test statistics such that T_n^(i) has exact slope c_i(θ), and suppose a non-null θ with 0 < c_i(θ) < ∞ obtains. It then follows from Theorem 7.1 that, with N_i(ε, s) the sample size required to make T_n^(i) significant at level ε, N₂(ε, s)/N₁(ε, s) → c₁(θ)/c₂(θ) [P_θ]. Consequently c₁(θ)/c₂(θ) is a measure of the asymptotic efficiency of T_n^(1) relative to T_n^(2) when θ obtains. The following theorem describes a useful method of finding the exact slope of a given sequence {T_n} for which (7.1) holds. Let Θ₁ = Θ − Θ₀ denote the non-null set of points θ. THEOREM 7.2. Suppose that
for each θ ∈ Θ₁, where −∞ < b(θ) < ∞, and that
for each t in an open interval I, where f is a continuous function on I, and {b(θ): θ ∈ Θ₁} ⊂ I. Then (7.3) holds with c(θ) = 2f(b(θ)) for each θ ∈ Θ₁. Proof. Choose and fix θ ∈ Θ₁, and choose and fix an s such that n^{−1/2} T_n(s) → b as n → ∞. Let ε > 0 be so small that b + ε and b − ε are in I. Since F_n(t) is nondecreasing in t, it follows from (7.2) that n^{1/2}(b − ε) < T_n < n^{1/2}(b + ε) implies 1 − F_n(n^{1/2}(b − ε)) ≥ L_n ≥ 1 − F_n(n^{1/2}(b + ε)); consequently the latter inequality holds for all sufficiently large n. It now follows from (7.6) that lim sup_{n→∞} n⁻¹ log L_n ≤ −f(b − ε) and lim inf_{n→∞} n⁻¹ log L_n ≥ −f(b + ε). Since f is continuous and ε is arbitrary we conclude that lim_{n→∞} n⁻¹ log L_n = −f(b).

Remark 1. Suppose θ is a point in Θ₀. Then, for any {T_n}, (7.3) holds with c(θ) = 0. This is an immediate consequence of Theorem 7.5 below and L_n ≤ 1.

Remark 2. If a given {T_n} does not satisfy the two conditions of Theorem 7.2, it may well be the case that {T*_n} does, where, for each n, T*_n is equivalent to T_n in the sense that T*_n = φ_n(T_n), where φ_n … Choose a θ with b(θ) > t; by hypothesis, there are such points θ. Now choose and fix an s such that n⁻¹ log L_n(s) → −c(θ)/2 and n^{−1/2} T_n(s) → b(θ); (7.3) and (7.5) imply that there are such sequences s. Since T_n > n^{1/2} t implies L_n ≤ 1 − F_n(n^{1/2} t), it follows that this last inequality holds for all sufficiently large n. Hence lim inf_{n→∞} n⁻¹ log[1 − F_n(n^{1/2} t)] ≥ −c(θ)/2. Since θ with b(θ) > t is arbitrary, it follows from (7.7) that the first inequality in (7.8) holds. The last inequality in (7.8) is established similarly.

The following theorem describes an interesting and useful nonasymptotic property of L_n in the null case. THEOREM 7.4. For each θ ∈ Θ₀ and each n, P_θ(L_n(s) ≤ u) ≤ u for 0 ≤ u ≤ 1. Proof. Suppose that a particular θ ∈ Θ₀ obtains, and consider a particular statistic T_n. Since θ and n are fixed, they are omitted from the notation.
We assume that T is real-valued; this involves no loss of generality since any extended real-valued statistic T⁰ is equivalent to the bounded statistic tan⁻¹ T⁰. If F, the d.f. of T, is continuous, then L is uniformly distributed over [0, 1] and P(L ≤ u) = u for 0 ≤ u ≤ 1. To treat the general case, let U be a random variable distributed uniformly over [0, 1], independent of s, and let T* = T*(s, U) = T(s) + aU, where a > 0 is a constant. Then F*, the d.f. of T*, is continuous; hence F*(T*) is uniformly distributed over [0, 1]. Now, for any t, F*(t) = P(T + aU < t) ≥ P(T < t − a) = F(t − a); hence F*(T*) ≥ F(T* − a) ≥ F(T − a), since T* ≥ T and F is nondecreasing. It follows that P(1 − F(T − a) < t) ≤ t for t ≥ 0. Now let α₁, α₂, ⋯ be a decreasing sequence of positive constants such that α_k → 0. For t ≥ 0 and k = 1, 2, ⋯, let A_k(t) be the event that 1 − F(T − α_k) < t. Then P(A_k(t)) ≤ t for each k. Since F is nondecreasing and left-continuous (cf. (7.1)), A_k(t) ⊂ A_{k+1}(t) for each k, and ∪_k A_k(t) is the event that 1 − F(T) (= L) < t. Consequently,
P(L < t) = lim_{k→∞} P(A_k(t)) ≤ t. Since t ≥ 0 is arbitrary, it now follows easily that P(L ≤ u) ≤ u for 0 ≤ u ≤ 1. It is worthwhile to note that the preceding Theorems 7.1–7.4 are valid for any sample space (S, 𝓐), any set {P_θ: θ ∈ Θ} of probability measures on 𝓐, and any sequence {T_n: n = 1, 2, ⋯} of extended real-valued 𝓐-measurable functions. We conclude this section with a theorem which depends heavily on the present assumptions that s is a sequence of independent and identically distributed observations on x, and that T_n depends on s only through the first n observations. For θ and θ₀ in Θ, let K(θ, θ₀) be defined as in § 6, and let J(θ) = inf{K(θ, θ₀): θ₀ ∈ Θ₀}. Then 0 ≤ J(θ) ≤ ∞ for all θ, and J(θ) = 0 for θ ∈ Θ₀. In typical cases 0 < J(θ) ≤ ∞ on Θ₁. The following theorem implies that the exact slope of any sequence {T_n} cannot exceed 2J(θ) when θ obtains. THEOREM 7.5. For each θ ∈ Θ,
Proof. Since (7.10) holds trivially if J = ∞, it will suffice to consider points θ for which J(θ) < ∞. Choose and fix such a θ. Let ε > 0 be a constant. It follows from (7.9) that there exists a θ₀ ∈ Θ₀ such that (7.11) holds. With θ and θ₀ fixed, abbreviate K(θ, θ₀) and J(θ) to K and J respectively. Since K < ∞, P_θ₀ dominates P_θ on (X, 𝓑), say dP_θ = r(x) dP_θ₀. Then, with r_n(s) = ∏_{i=1}^n r(x_i), dP^(n)_θ = r_n dP^(n)_θ₀ on (X^(n), 𝓑^(n)). For each n let A_n be the event that L_n < exp(−n[K + 2ε]) and B_n the event that r_n < exp(n[K + ε]). Then
by Theorem 7.4. It follows from (7.13) that Σ_n P_θ(A_n ∩ B_n) < ∞. It follows hence from (7.12) and the definitions of A_n and B_n that, if θ obtains, then, with probability
one, L_n(s) ≥ exp[−n(K + 2ε)] for all sufficiently large n. Hence the left-hand side of (7.10) is not less than −K − 2ε [P_θ]. It now follows from (7.11) that
Since ε in (7.14) is arbitrary, it follows that (7.10) holds.

Remark 4. If a statistic T_n does not have an exact null distribution (cf. (7.1)), the level attained by it is defined to be L_n(s) = 1 − F_n(T_n(s)), where F_n(t) = inf{P_θ(T_n(s) < t): θ ∈ Θ₀}. It is readily seen that, with F_n and L_n as defined here, Theorems 7.1 through 7.5 are valid for any sequence {T_n}.

Notes. This section is based mainly on [B5], [B8], [B9]. Various versions of Theorem 7.5 are given under various regularity assumptions in [B6], [B8] and [B10]; that no assumption whatsoever is required was shown by Raghavachari [R1]. The present proof of Theorem 7.5 is a simplification suggested by R. Berk and others of the proof in [R1]. Certain generalizations and refinements of the content of this section are given in [B10] and [B11]. Certain nonasymptotic treatments of the level attained are given in [D1], [J1].

8. Some examples of exact slopes. Example 8.1. Suppose that X is the real line, and that x is normally distributed with mean θ and variance 1 when θ obtains. The parameter space Θ is [0, ∞) and the null hypothesis is that θ = 0. Consider T_n^(1) = n^{−1/2} Σ_{i=1}^n x_i, T_n^(2)(s) = n^{−1/2} (the number of indices j with 1 ≤ j ≤ n and x_j > 0), and, for n ≥ 2, T_n^(3)(s) = T_n^(1)/v_n^{1/2}, where v_n = Σ_1^n (x_i − x̄_n)²/(n − 1). T_n^(3) might be used by a forgetful statistician who fails to remember that the underlying variance is one. Then T_n^(i) satisfies (7.5) with b = b_i, where
where … → 1 as θ → 0. Short tables of c₂/c₁ and c₂/c₃ are given in [B4].

Example 8.2. Let X be the real line, let Θ be the set of all continuous probability distribution functions θ(x) on X, and let P_θ(B) denote the probability measure on X determined by the d.f. θ. The null hypothesis is that θ = θ₀, where θ₀ is a given continuous p.d.f. For each n let F_n(t) = F_n(t; x₁, ⋯, x_n) be the empirical d.f. based on {x₁, ⋯, x_n}, and let T_n^(1) be the Kolmogorov statistic, i.e., T_n^(1)(s) = n^{1/2} sup{|F_n(t) − θ₀(t)|: −∞ < t < ∞}. It follows from the Glivenko–Cantelli theorem that (7.5) holds for T_n^(1), with b(θ) = δ(θ) = sup{|θ(t) − θ₀(t)|: −∞ < t < ∞}; 0 < δ(θ) … where g is defined by (5.24) and (5.25). Since g is continuous, the exact slope of T_n^(1) is c₁(θ) = 2g(δ(θ)). Now consider Kuiper's statistic T_n^(2)(s) = n^{1/2}[sup_t{F_n(t) − θ₀(t)} + sup_t{θ₀(t) − F_n(t)}]. It follows from Example 5.3, exactly as in the preceding paragraph, that T_n^(2) has exact slope c₂(θ) = 2g(δ⁺(θ) + δ⁻(θ)), where …

… say 𝓟 = {P_θ: θ ∈ Θ}, and that g(θ) is a real-valued functional on Θ. For each n let θ̂_n denote the maximum likelihood (m.l.) estimate of θ when the data is (x₁, ⋯, x_n). Then the m.l. estimate of g is g(θ̂_n). In particular, for any B ∈ 𝓑, the m.l. estimate of P_θ(B) is P_θ̂ₙ(B) = Q_n(B), say; Q_n is, of course, a probability measure in the set 𝓟. It is thus seen that the m.l. method always estimates the entire underlying distribution from given data. Since successful estimation of the entire underlying distribution is the maximum of objectives attainable by any statistical method, it is of interest to enquire whether the m.l. estimated distribution is consistent, i.e., if some P in 𝓟 obtains, then Q_n → P with probability one.
According to this viewpoint, the consistency of $g(\theta_n)$ for a given $g$ is a subsidiary issue governed almost entirely by such questions as whether $g$ is identifiable, i.e., a functional on $\mathcal{P}$, and if so whether this functional is continuous on $\mathcal{P}$. It seems reasonable not to confound such nonstochastic questions with the consistency problem, so we dispense with parametrization; more precisely, we take $P$ itself to be the unknown parameter and $\mathcal{P}$ to be the parameter space.
SOME LIMIT THEOREMS IN STATISTICS
It is assumed that $\mathcal{P}$ is a dominated set, i.e., there exists a $\sigma$-finite measure $\mu$ and a family $\{f_P: P \in \mathcal{P}\}$ of $\mathcal{B}$-measurable functions $f_P$, $0 \le f_P < \infty$, such that, for each $P \in \mathcal{P}$,
$$P(B) = \int_B f_P \, d\mu \quad \text{for all } B \in \mathcal{B}. \tag{9.1}$$
Let there be given a $\mu$ and a family $\{f_P: P \in \mathcal{P}\}$ such that (9.1) holds; $\mu$ and $\{f_P\}$ remain fixed throughout this section and the following one. For each $n$ and $s$ let
$$l_n(P \mid s) = \prod_{i=1}^n f_P(x_i). \tag{9.2}$$
Suppose for the moment that $\mathcal{P}$ is a finite set, say $\{P_1, \ldots, P_m\}$, $1 < m < \infty$. For each $n$ and $s$ let $Q_n$ be a measure in $\mathcal{P}$ such that $l_n(Q_n \mid s) = \max\{l_n(P_i \mid s): 1 \le i \le m\}$. Suppose that a particular $P_i$ obtains, $1 \le i \le m$. Then $0 < l_n(P_i \mid s) < \infty$ for all $n$ $[P_i]$. It follows hence that $n^{-1}\log[l_n(P_j \mid s)/l_n(P_i \mid s)] = r_n(i, j; s)$, say, is well-defined for each $j$ and $n$ $[P_i]$, and that
$$r_n(i, j; s) \to -K(P_i, P_j) \quad \text{as } n \to \infty \ [P_i]. \tag{9.3}$$
Since $K(P_i, P_j) > 0$ for $i \ne j$, it follows from (9.3) that $l_n(P_i \mid s) > \max\{l_n(P_j \mid s): 1 \le j \le m, j \ne i\}$ for all sufficiently large $n$ $[P_i]$; hence
$$Q_n = P_i \quad \text{for all sufficiently large } n \ [P_i]. \tag{9.4}$$
It is thus seen that m.l. estimates always exist and are consistent in the finite case. The basic idea of Wald's famous proof of consistency [W2] is to reduce the general case to the finite case by some compactification device. The following is a somewhat hyperbolic description of Wald's beautiful argument. A compact space is essentially a finite space. If $\mathcal{P}$ is compact, or can be compactified in a suitable way, and certain integrability conditions are satisfied, $\mathcal{P}$ is nearly finite; hence $Q_n$ is nearly consistent. But $Q_n$ is either consistent or inconsistent; so $Q_n$ is consistent.

We proceed to formulate some sufficient conditions for the existence and consistency of m.l. estimates. Let $\mathcal{M}$ denote the set of all measures $M$ on the Borel field $\mathcal{B}$ of sets of $X$ such that $M(X) \le 1$. For any sequence $\{M_j: j = 0, 1, 2, \ldots\}$ in $\mathcal{M}$, let us say that $M_j \to M_0$ as $j \to \infty$ if and only if, for each real-valued continuous function $h$ on $X$ with compact support, $\int_X h(x)\, dM_j \to \int_X h(x)\, dM_0$. It can be shown by standard methods (cf. [B14]) that, with this definition of convergence, $\mathcal{M}$ becomes a metrizable and compact topological space. Let $d$ be a distance function on $\mathcal{M} \times \mathcal{M}$ such that, for any sequence $\{M_j: j = 0, 1, \ldots\}$ in $\mathcal{M}$, $M_j \to M_0$ if and only if $d(M_j, M_0) \to 0$. It should be noted that if $M_0, M_1, \ldots$ are all probability measures, then $d(M_j, M_0) \to 0$ if and only if $M_j \to M_0$ weakly. It is not necessary to specify $d$; indeed, having a specific $d$ on hand is often a handicap in examples. Now let $\bar{\mathcal{P}}$ be the closure in $\mathcal{M}$ of the given set $\mathcal{P}$ of probability measures $P$. $\bar{\mathcal{P}}$ is a compact set. For any $M \in \bar{\mathcal{P}}$ and any constant $r$, $0 < r < \infty$, let
$$g_M(x, r) = \sup\{f_P(x): P \in \mathcal{P}, \ d(P, M) < r\}. \tag{9.5}$$
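The finite-case argument is easy to see numerically. The following sketch (ours; the particular family of p.m.f.'s is an arbitrary assumption) draws a large sample from one member of a finite family and checks that the likelihood is maximized at the true member, as (9.4) predicts.

```python
import math
import random

def log_lik(p, xs):
    # log of l_n(P|s): the sum of log p.m.f. values over the sample
    return sum(math.log(p[x]) for x in xs)

def mle_index(family, xs):
    # index of the member of the finite family maximizing the likelihood
    return max(range(len(family)), key=lambda i: log_lik(family[i], xs))

# A finite family of p.m.f.'s on {0, 1, 2}; the numbers are arbitrary.
family = [
    {0: 0.6, 1: 0.3, 2: 0.1},
    {0: 0.2, 1: 0.5, 2: 0.3},
    {0: 0.1, 1: 0.2, 2: 0.7},
]

random.seed(0)
true_index = 1
sample = random.choices([0, 1, 2],
                        weights=[family[true_index][v] for v in (0, 1, 2)],
                        k=2000)
# Since K(P_i, P_j) > 0 for j != i, the normalized log-likelihood ratios
# drift to negative values and the m.l. estimate settles on true_index.
print(mle_index(family, sample))
```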
R. R. BAHADUR
Then $g_M(x, r)$ is nondecreasing in $r$ for each fixed $x$. Let
$$\gamma_M(x) = \lim_{r \to 0+} g_M(x, r). \tag{9.6}$$
We shall say that $\bar{\mathcal{P}}$ is a suitable compactification of $\mathcal{P}$ if, for each $M \in \bar{\mathcal{P}}$, $g_M(x, r)$ defined by (9.5) is $\mathcal{B}$-measurable for all $r > 0$, and $\gamma_M$ defined by (9.6) satisfies
$$\int_X \gamma_M(x)\, d\mu \le 1. \tag{9.7}$$
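To make (9.5) concrete in a simple case: for a normal location family the supremum over a ball of parameters is explicit. In the sketch below (ours), Euclidean distance on the parameter stands in for the metric $d$, which is only a heuristic substitute.

```python
import math

def normal_pdf(x, theta):
    # density of N(theta, 1) at x
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)

def g_sup(x, theta_m, r):
    # sup of f_theta(x) over the ball |theta - theta_m| < r: the density
    # is unimodal in theta with its peak at theta = x, so the supremum is
    # approached at the point of the (closure of the) ball closest to x.
    theta_star = min(max(x, theta_m - r), theta_m + r)
    return normal_pdf(x, theta_star)
```

As the text notes, $g_M(x, r)$ is nondecreasing in $r$, and as $r \to 0$ it decreases to the density at $\theta_m$ itself, illustrating (9.6).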
Condition A. $\bar{\mathcal{P}}$ is a suitable compactification of $\mathcal{P}$.

It should be noted that this is a condition not only on the set $\mathcal{P}$ but also on the particular version $f_P$ of $dP/d\mu$ which is in force for each $P$ in $\mathcal{P}$. Some of the additional conditions stated below also involve the given family $\{f_P: P \in \mathcal{P}\}$ of density functions. This is not inappropriate since the very definition of m.l. estimates presupposes that a family of density functions is given. It is readily seen that Condition A is independent of the metric $d$, i.e., it holds with some choice of $d$ if and only if it holds for every choice, and the same is true of the other conditions of this section. If $M$ is in $\mathcal{P}$, it follows from (9.5) and (9.6) that $\gamma_M(x) \ge f_M(x)$; hence $\gamma_M(x) = f_M(x)$ $[\mu]$ by (9.7), so $\gamma_M$ is necessarily a version of $dM/d\mu$. However, if $M$ is in $\bar{\mathcal{P}} - \mathcal{P}$, $\gamma_M$ is not necessarily a version of $dM/d\mu$; in fact there are simple examples in which Condition A holds but $\bar{\mathcal{P}}$ is not even dominated by $\mu$ or any other $\sigma$-finite measure. Let $l(\mathcal{P} \mid x) = \sup\{f_P(x): P \in \mathcal{P}\}$. … The set $\mathcal{P}_n^*$ depends on $s$ and $c$; so write $\mathcal{P}_n^* = \mathcal{P}_n^*(s; c)$. $Q \in \mathcal{P}$ is an m.l. estimate based on
$(x_1, \ldots, x_n)$ if $l_n(Q \mid s) = l_n(\mathcal{P} \mid s)$. Let $\hat{\mathcal{P}}_n(s)$ denote the (possibly empty) set of all m.l. estimates based on $(x_1, \ldots, x_n)$. It is plain that
$$\hat{\mathcal{P}}_n(s) \subset \mathcal{P}_n^*(s; c) \tag{9.11}$$
for all $n$ and $s$. Suppose now that a given $P \in \mathcal{P}$ obtains. In the following theorems (and in the following section) the phrase "with probability one" preceding a statement means that there exists an $\mathcal{S}$-measurable set $S_P$ of sequences $s$ with $P_\infty(S_P) = 1$ such that the statement is true for each $s$ in $S_P$. If $\{\mathcal{L}_n: n = 1, 2, \ldots\}$ is a sequence of subsets of $\bar{\mathcal{P}}$, $\mathcal{L}_n \to P$ means that $\sup\{d(Q, P): Q \in \mathcal{L}_n\} \to 0$.

THEOREM 9.1. If Conditions A, B, and C are satisfied, then with probability one $\mathcal{P}_n^*(s; c)$ is nonempty for every $n$ and $\mathcal{P}_n^*(s; c) \to P$ as $n \to \infty$.

It follows from (9.11) that $\mathcal{P}_n^* \to P$ implies $\hat{\mathcal{P}}_n \to P$ provided that $\hat{\mathcal{P}}_n$ is nonempty for all sufficiently large $n$.

THEOREM 9.2. If Conditions A–E are satisfied, then with probability one $\hat{\mathcal{P}}_n(s)$ is nonempty for all sufficiently large $n$ and $\hat{\mathcal{P}}_n(s) \to P$ as $n \to \infty$.

The proofs of Theorems 9.1 and 9.2 are along the lines of the proof on pp. 320–321 of [B9], with $\Theta$ of the latter proof identified with $\bar{\mathcal{P}}$. The above theorems assert consistency in the sense of weak convergence. However, in many examples, if $\{Q_n: n = 0, 1, 2, \ldots\}$ is a sequence in $\mathcal{P}$ such that $d(Q_n, \ldots$

… $\to -J(Q)$ as $n \to \infty$ $[Q]$. According to Theorem 7.5, $2J(Q)$ is the maximum available exact slope when $Q$ obtains. The idea underlying these conditions is the idea used in the preceding section; specifically, if $\mathcal{P}_0$ and $\mathcal{P}_1$ are both finite sets, then $\{T_n\}$ does have exact slope $2J(Q)$ against each $Q \in \mathcal{P}_1$ (cf. [B9], pp. 315–316); the general case is reduced to the finite case by a compactification device. Let $\mathcal{M}$ be the set of measures $M$ on $X$ with $M(X) \le 1$, let $\mathcal{M}$ be topologized as in § 9, and choose and fix a distance function $d$. Let $\bar{\mathcal{P}}_0$ be the closure of $\mathcal{P}_0$ in $\mathcal{M}$.

Assumption 10.1. $\bar{\mathcal{P}}_0$ is a suitable compactification of $\mathcal{P}_0$.

Under this assumption, if $M$ is a measure in $\bar{\mathcal{P}}_0$, then
$$g_M^0(x, r) = \sup\{f_P(x): P \in \mathcal{P}_0, \ d(P, M) < r\} \tag{10.6}$$
is $\mathcal{B}$-measurable for each $r > 0$, $0 \le g_M^0 \le \infty$, and with
$$\gamma_M^0(x) = \lim_{r \to 0+} g_M^0(x, r) \tag{10.7}$$
we have
$$\int_X \gamma_M^0(x)\, d\mu \le 1. \tag{10.8}$$
For $Q \in \mathcal{P}_1$ and $M \in \bar{\mathcal{P}}_0$ let
$$K^*(Q, M) = \lim_{r \to 0+} E_Q\bigl(\log[f_Q(x)/g_M^0(x, r)]\bigr). \tag{10.9}$$
It follows from (10.8) that $K^*$ is well-defined and $0 \le K^* \le \infty$. Since $P \in \mathcal{P}_0$ implies $\gamma_P^0(x) = f_P(x)$ $[\mu]$, it follows from (10.4) and (10.9) that $K^*$ is an extension of the function $K$ on $\mathcal{P}_1 \times \mathcal{P}_0$ to a function on $\mathcal{P}_1 \times \bar{\mathcal{P}}_0$.

Assumption 10.2. For each $Q \in \mathcal{P}_1$, $J(Q) = \inf\{K^*(Q, M): M \in \bar{\mathcal{P}}_0\}$.

Since $K^*$ is an extension of $K$, it is plain from (10.5) that this assumption is automatically satisfied if $\bar{\mathcal{P}}_0 - \mathcal{P}_0$ is empty, or if $K^*(Q, M) = \infty$ for $Q \in \mathcal{P}_1$ and $M \in \bar{\mathcal{P}}_0 - \mathcal{P}_0$. Let $l(\mathcal{P}_0 \mid x)$ be the supremum of $f_P(x)$ over $\mathcal{P}_0$. Assumption 10.1 implies that this supremum is $\mathcal{B}$-measurable.

Assumption 10.3. For each $Q \in \mathcal{P}_1$, $E_Q(\log^+[l(\mathcal{P}_0 \mid x)/f_Q(x)]) < \infty$.

Now let $\bar{\mathcal{P}}_1$ be the closure of $\mathcal{P}_1$ in $\mathcal{M}$. Let $M$ be a measure in $\bar{\mathcal{P}}_1$ and for $r > 0$ let
$$g_M^1(x, r) = \sup\{f_P(x): P \in \mathcal{P}_1, \ d(P, M) < r\}. \tag{10.10}$$
Assumption 10.4. Given $\varepsilon$ and $\tau$, $\varepsilon > 0$ and $0 < \tau < 1$, and $M \in \bar{\mathcal{P}}_1$, there exists an $r = r(\varepsilon, \tau, M) > 0$ such that $g_M^1(x, r)$ is $\mathcal{B}$-measurable and
$$E_P\bigl(\bigl[g_M^1(x, r)/f_P(x)\bigr]^\tau\bigr) \le 1 + \varepsilon \quad \text{for each } P \in \mathcal{P}_0. \tag{10.11}$$
This assumption is perhaps the most troublesome one to verify in examples. A simple sufficient condition for the validity of the assumption is that $\bar{\mathcal{P}}_1$ be a suitable compactification of $\mathcal{P}_1$ and that for each $M \in \bar{\mathcal{P}}_1$ there exist an $r = r(M) > 0$ such that …
(Cf. [B8], pp. 22–23.)

THEOREM 10.1. Suppose that Assumptions 10.1–10.4 hold. Then (i) $T_n(s) \to J(Q)$ as $n \to \infty$ $[Q]$ for each $Q \in \mathcal{P}_1$, (ii) $n^{-1}\log L_n(s) \to -J(Q)$ as $n \to \infty$ $[Q]$ for each $Q \in \mathcal{P}_1$, and (iii) for each $t$ in the interior of the set $\{J(Q): Q \in \mathcal{P}_1\}$, $n^{-1}\log[1 - F_n(t)] \to -t$ as $n \to \infty$.

Proof. Choose a $Q \in \mathcal{P}_1$ and suppose that $Q$ obtains. We first show that, whether $J(Q)$ is finite or not,
$$\liminf_{n \to \infty} T_n(s) \ge J(Q) \quad \text{with probability one.} \tag{10.13}$$
Let $a$ and $b$ be positive constants, and let
$$H(Q) = b - \min\{J(Q), a\}. \tag{10.14}$$
According to (10.1), $T_n(s) \ge n^{-1}\log[l_n(Q \mid s)/l_n(\mathcal{P}_0 \mid s)]$. It will therefore suffice to show that, with probability one,
$$\limsup_{n \to \infty} n^{-1}\log[l_n(\mathcal{P}_0 \mid s)/l_n(Q \mid s)] \le H(Q). \tag{10.15}$$
Let $M$ be a point in $\bar{\mathcal{P}}_0$, let $g_M^0$ be defined by (10.6), and let $Y(x) = \log[g_M^0(x, r)/f_Q(x)]$. It follows from Assumption 10.1 that $Y$ is a well-defined $[Q]$ extended real-valued random variable. It follows from Assumption 10.3 that $m = E_Q(Y)$ is well-defined, $-\infty \le m < \infty$, and that $m \to -K^*(Q, M)$ as $r \to 0$, where $K^*$ is given by (10.9). Since $-K^*(Q, M) \le -J(Q) < H(Q)$ by Assumption 10.2 and (10.14), $m < H(Q)$ for all sufficiently small $r$. Now choose $r = r(M, Q, a, b) > 0$ so that $m < H(Q)$ and let $\mathcal{N} = \{N: N \in \bar{\mathcal{P}}_0, \ d(M, N) < r\}$. Then $l_n(\mathcal{N} \cap \mathcal{P}_0 \mid s) \le \prod_{i=1}^n g_M^0(x_i, r)$. Since $0 < l_n(Q \mid s) < \infty$ $[Q]$, it follows that $n^{-1}\log[l_n(\mathcal{N} \cap \mathcal{P}_0 \mid s)/l_n(Q \mid s)] \le n^{-1}\sum_{i=1}^n Y(x_i)$ with probability one. Hence, with probability one,
$$\limsup_{n \to \infty} n^{-1}\log[l_n(\mathcal{N} \cap \mathcal{P}_0 \mid s)/l_n(Q \mid s)] \le m < H(Q). \tag{10.16}$$
Thus corresponding to each $M \in \bar{\mathcal{P}}_0$ there exists a spherical neighborhood of $M$ in the space $\mathcal{M}$, say $\mathcal{N}(M)$, such that (10.16) holds with probability one. Since $\bar{\mathcal{P}}_0$ is compact, there exist open sets $\mathcal{N}^1, \ldots, \mathcal{N}^k$ in $\mathcal{M}$ such that $\bar{\mathcal{P}}_0 \subset \bigcup_{j=1}^k \mathcal{N}^j$ and such that, with probability one, (10.16) holds with $\mathcal{N} = \mathcal{N}^j$ for each $j$. Since $\mathcal{P}_0 = \bigcup_{j=1}^k (\mathcal{N}^j \cap \mathcal{P}_0)$, it follows that $l_n(\mathcal{P}_0 \mid s) = \max\{l_n(\mathcal{N}^j \cap \mathcal{P}_0 \mid s): 1 \le j \le k\}$. It now follows that (10.15) holds with probability one. Thus (10.13) is established.

Now choose $\varepsilon > 0$ and $\tau$, $0 < \tau < 1$. We shall show that there exists a positive integer $k = k(\varepsilon, \tau)$ such that, with $F_n$ defined by (10.3),
$$1 - F_n(t) \le k[h(t)]^n, \quad \text{where } h(t) = (1 + \varepsilon)e^{-\tau t}, \tag{10.17}$$
for all $n = 1, 2, \ldots$ and all $t$, $-\infty \le t \le \infty$. It follows from the compactness of $\bar{\mathcal{P}}_1$ and Assumption 10.4 that there exists a finite set, $M_1, \ldots, M_k$ say, of points in $\bar{\mathcal{P}}_1$, and spherical neighborhoods of these points, say $\mathcal{N}^j = \{N: N \in \mathcal{M}, \ d(M_j, N) < r_j\}$ for $j = 1, \ldots, k$, such that $\bar{\mathcal{P}}_1 \subset \bigcup_{j=1}^k \mathcal{N}^j$ and such that (10.11) holds with $M = M_j$ and $r = r_j > 0$ for each $j = 1, \ldots, k$. Consider a $P$ in $\mathcal{P}_0$ and a $t$, $-\infty < t < \infty$, and for each $j$ let $Y^{(j)}(x) = \log[g_{M_j}^1(x, r_j)/f_P(x)] - t$. Then $Y^{(j)}$ is well-defined $[P]$ and $P(-\infty \le Y^{(j)} < \infty) = 1$. Let $\varphi^{(j)}(u) = E_P(\exp(u Y^{(j)}))$. For any $n$ write $Z_n^{(j)} = \sum_{i=1}^n Y^{(j)}(x_i)$. It follows from an extension of Theorem 2.1 to extended random variables that $P(Z_n^{(j)} \ge 0) \le [\varphi^{(j)}(\tau)]^n \le [h(t)]^n$, by (10.11) and the definition of $h$ in (10.17). Hence $P(\max\{Z_n^{(j)}: 1 \le j \le k\} \ge 0) \le k[h(t)]^n$. However,
$$T_n(s) \le t + n^{-1}\max\{Z_n^{(j)}: 1 \le j \le k\}$$
by (10.1) and (10.10). Hence $P(T_n(s) \ge t) \le k[h(t)]^n$. Since $P \in \mathcal{P}_0$ is arbitrary, it follows from (10.3) that (10.17) holds for all $n$ and all finite $t$. That (10.17) holds for $t = -\infty$ is trivially true. Since $1 - F_n(\infty) \le 1 - F_n(t)$ for all finite $t$, and since $h(\infty) = 0$, it follows by letting $t \to \infty$ through finite values that (10.17) holds also for $t = \infty$. Thus (10.17) is established for all $n$ and $t$.
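The inequality $P(Z_n \ge 0) \le [\varphi(\tau)]^n$ used above is the elementary m.g.f. (Chernoff-type) bound of Theorem 2.1, i.e., Markov's inequality applied to $\exp(\tau Z_n)$. A toy numerical check (ours; the two-point distribution of $Y$ and the values of $n$ and $\tau$ are arbitrary assumptions):

```python
import math

# Y takes the value -1 with probability 0.7 and +1 with probability 0.3,
# so E(Y) = -0.4 < 0; Z_n = Y(x_1) + ... + Y(x_n) with n = 200.
p_plus, n, tau = 0.3, 200, 0.5

# phi(tau) = E(exp(tau * Y)), the m.g.f. of Y at tau
phi = (1 - p_plus) * math.exp(-tau) + p_plus * math.exp(tau)
bound = phi ** n  # P(Z_n >= 0) <= phi(tau)^n for any 0 < tau

# Exact tail probability: Z_n >= 0 iff at least n/2 summands equal +1.
exact = sum(math.comb(n, j) * p_plus**j * (1 - p_plus)**(n - j)
            for j in range(n // 2, n + 1))

print(exact, bound)  # both are exponentially small, and exact <= bound
```

Since $\varphi(\tau) < 1$ whenever $E(Y) < 0$ and $\tau$ is suitably small, the bound decays geometrically in $n$, which is exactly what drives (10.17).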
It follows from (10.2) and (10.17) that $L_n(s) \le k(1 + \varepsilon)^n \exp[-n\tau T_n(s)]$ for all $n$ and $s$. Hence
$$n^{-1}\log L_n(s) \le n^{-1}\log k + \log(1 + \varepsilon) - \tau T_n(s) \tag{10.18}$$
for all $n$ and $s$, and
$$\tau T_n(s) \le n^{-1}\log k + \log(1 + \varepsilon) - n^{-1}\log L_n(s) \tag{10.19}$$
for every $s$. It follows from (10.13) and (10.18) that
$$\limsup_{n \to \infty} n^{-1}\log L_n(s) \le \log(1 + \varepsilon) - \tau J(Q) \quad [Q]. \tag{10.20}$$
Since $\varepsilon$ and $\tau$ are arbitrary, it follows from (10.20) that, whether $J(Q)$ is finite or not, the left-hand side of (10.20) does not exceed $-J(Q)$ $[Q]$. It now follows from Theorem 7.5 applied to $\{T_n\}$ that $n^{-1}\log L_n(s) \to -J(Q)$ $[Q]$. This conclusion and (10.19) imply that, for each $\varepsilon$ and $\tau$, $\limsup_{n \to \infty} T_n(s) \le [\log(1 + \varepsilon) + J(Q)]/\tau$ $[Q]$. Hence $\limsup_{n \to \infty} T_n(s) \le J(Q)$ $[Q]$. We now see from (10.13) that $T_n(s) \to J(Q)$ as $n \to \infty$ $[Q]$.