Introduction to Nonparametric Detection with Applications
This is Volume 119 in MATHEMATICS IN SCIENCE AND ENGINEERING
A Series of Monographs and Textbooks
Edited by RICHARD BELLMAN, University of Southern California
The complete listing of books in this series is available from the Publisher upon request.
Introduction to Nonparametric Detection with Applications

Jerry D. Gibson
Department of Electrical Engineering University of Nebraska Lincoln, Nebraska
James L. Melsa
Department of Electrical Engineering University of Notre Dame Notre Dame, Indiana
ACADEMIC PRESS
New York San Francisco London
A Subsidiary of Harcourt Brace Jovanovich, Publishers
1975
COPYRIGHT © 1975, BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.
ACADEMIC PRESS, INC.
111 Fifth Avenue, New York, New York 10003
United Kingdom Edition published by ACADEMIC PRESS, INC. (LONDON) LTD. 24/28 Oval Road, London NW1
Library of Congress Cataloging in Publication Data
Gibson, Jerry D
Introduction to nonparametric detection with applications.
(Mathematics in science and engineering)
Bibliography: p.
Includes index.
1. Signal processing. 2. Demodulation (Electronics) 3. Statistical hypothesis testing. I. Melsa, James L., joint author. II. Title. III. Series.
TK5102.5.G48  621.38'0436  75-3519
ISBN 0-12-282150-5
PRINTED IN THE UNITED STATES OF AMERICA
To Elaine
Contents
Preface  xi

Chapter 1  Introduction to Nonparametric Detection Theory
1.1 Introduction  1
1.2 Nonparametric versus Parametric Detection  2
1.3 Historical and Mathematical Background  6
1.4 Outline of the Book  7

Chapter 2  Basic Detection Theory
2.1 Introduction  9
2.2 Basic Concepts  10
2.3 Bayes Decision Criterion  12
2.4 Neyman-Pearson Lemma  17
2.5 Receiver Operating Characteristics  21
2.6 Composite Hypotheses  24
2.7 Detector Comparison Techniques  31
2.8 Summary  36

Chapter 3  One-Input Detectors
3.1 Introduction  38
3.2 Parametric Detectors  39
3.3 Sign Detector  47
3.4 Wilcoxon Detector  51
3.5 Fisher-Yates, Normal Scores, or c_1 Test  57
3.6 Van der Waerden's Test  61
3.7 Spearman Rho Detector  65
3.8 Kendall Tau Detector  68
3.9 Summary  71
Problems  72

Chapter 4  One-Input Detector Performance
4.1 Introduction  74
4.2 Sign Detector  75
4.3 Wilcoxon Detector  81
4.4 One-Input Detector AREs  87
4.5 Small Sample Performance  90
4.6 Summary  92
Problems  93

Chapter 5  Two-Input Detectors
5.1 Introduction  95
5.2 One-Input Detectors with Reference Noise Samples  97
5.3 Two Simultaneous Input Detectors  122
5.4 Summary  152
Problems  152

Chapter 6  Two-Input Detector Performance
6.1 Introduction  154
6.2 Asymptotic Results for Parametric Detectors  155
6.3 ARE of the PCC Detector  158
6.4 ARE of the Mann-Whitney Detector  162
6.5 ARE of Other Two-Input Detectors  164
6.6 Small Sample Results  167
6.7 Summary  170
Problems  170

Chapter 7  Tied Observations
7.1 Introduction  172
7.2 General Procedures  173
7.3 Specific Studies  176
7.4 Summary  178
Problems  178

Chapter 8  Dependent Sample Performance
8.1 Introduction  181
8.2 Dependence and the Constant False Alarm Rate Property  182
8.3 Nonparametric and Parametric Detector Performance for Correlated Inputs  185
8.4 Summary  188
Problems  189

Chapter 9  Engineering Applications
9.1 Introduction  191
9.2 Radar System Applications  192
9.3 Other Engineering Applications  204
9.4 Summary  209

Appendix A  Probability Density Functions  210
Appendix B  Mathematical Tables  214
Answers to Selected Problems  226
References  232
Index  237
Preface
Statisticians have long been interested in the development and application of nonparametric statistical tests. Although engineers have unknowingly implemented nonparametric detectors in the past, only recently have engineers become aware of the advantages of nonparametric tests, and hence have only recently become interested in the theoretical analysis and application of these detectors. Because of these two facts, almost the entire development and analysis of nonparametric detectors has taken place in the statistical literature.

Engineers are presently highly interested in the application and analysis of nonparametric detectors; however, engineering understanding is proceeding slowly in this area. This is because the basic results of nonparametric detection theory are developed in papers scattered throughout statistical, psychological, and engineering journals and consist almost wholly of mathematically rigorous proofs and statements of theorems or specific applications to nonengineering problems. Consequently, the nonmathematically oriented engineer and the engineer who is unfamiliar with these journals find it extremely difficult, if not impossible, to become educated in this area.

It is the purpose of this book to alleviate this problem by presenting a tutorial development of nonparametric detection theory written from the engineer's viewpoint, using engineering terminology and stressing engineering applications. Only those nonparametric hypothesis testing procedures that have already been applied to or seem well suited to the type of statistical detection problems that arise in communications, radar, sonar, acoustics, and geophysics are discussed in this book. The Neyman-Pearson approach to hypothesis testing is employed almost exclusively.
The book is written at the senior/first-year graduate level and requires only a basic understanding of probability and random variables. Some previous exposure to decision theory would be useful but is not required. The book should find its principal application as a reference monograph for students and practicing engineers. However, the unusually thorough consideration of Neyman-Pearson detection theory in both the parametric and nonparametric cases makes it attractive for use as a secondary text for a standard engineering detection theory course or as a primary text for a more in-depth study of detection theory.

The book draws heavily on the statistical and engineering literature on nonparametric detection theory, and an attempt has been made to assign complete and accurate credit for previous work and ideas in all instances. Three excellent survey papers on nonparametric detectors by Carlyle and Thomas [1964], Carlyle [1968], and Thomas [1970] have been published, and these papers probably served as the genesis for this book. The authors gratefully acknowledge the contribution of these and other researchers in the field.

The authors would like to express their appreciation to their colleagues and students who have assisted in the preparation of this book in various ways. The authors also wish to thank Kathryn Stefl, Linda Schulte, Sharon Caswell, Marge Alles, and Elaine Gibson, who patiently typed, retyped, and corrected the manuscript.
CHAPTER 1
Introduction to Nonparametric Detection Theory
1.1 Introduction
The received waveform at the input of a communication or a radar system normally consists of a randomly fluctuating component called noise plus any desired signal components that may be present. The system designer desires to extract information (the signal) from these received data. Since the noise may almost completely obscure the signal, the received waveform must be processed to enhance the extraction of information from the input. When the problem requires the determination of the presence or absence of a signal, an area of statistical inference called hypothesis testing, or decision theory, is useful. In engineering terminology, decision theory is often referred to as detection theory because of the early applications to radar systems. The latter terminology is employed throughout this book.

The bulk of the engineering literature on detection theory is concerned with the determination of optimum detectors utilizing the Neyman-Pearson¹ fundamental lemma of hypothesis testing or the Bayesian¹ approach to hypothesis testing. Both methods require knowledge of a statistical description of the interfering noise process. This statistical description may be obtained by actual measurement or it may be assumed. Once a statistical description of the noise has been determined, it can be utilized with one of the previously mentioned procedures to yield a detector which is optimum in the chosen sense. Unfortunately, the resulting

¹ See Chapter 2.
optimum detectors are often difficult to implement. Another class of detectors, called adaptive or learning detectors, operates in a near optimum fashion without a complete statistical description of the background noise. This is achieved by allowing the system parameters or structure to change as a function of the input. Due to their continually varying structure, adaptive detectors are difficult to analyze mathematically and must be simulated on a computer. Additionally, since the detector structure is a function of the input, it is also dependent on the operating environment; as a result, adaptive detectors are generally complex to implement.

When a statistical description of the input noise process is not available or when the optimum or adaptive detector is too complex to implement, one often seeks a nonoptimum detector whose performance is satisfactory. A class of nonoptimum detectors called nonparametric or distribution-free detectors exists which is simple to implement and requires little knowledge of the underlying noise distribution. Throughout this book and the literature, nonparametric and distribution-free are generally used as synonyms. Strictly speaking, the two terms are not completely equivalent. Nonparametric refers to a class of detectors whose input distribution has a specified shape or form but still cannot be classified by a finite number of real parameters, while distribution-free refers to the class of detectors which makes no assumptions at all concerning the form of the input distribution [Bradley, 1968].

The purpose of the introductory material in this chapter is to give the reader an intuitive feeling for nonparametric detection theory and to outline the approach used in this book for the systematic study of this subject.
1.2 Nonparametric versus Parametric Detection

Before beginning the development of nonparametric detectors, it is instructive to discuss further the differences between parametric and nonparametric detection and to give some specific examples of the types of problems to which nonparametric detectors can be effectively applied. The detection problem considered throughout this book is of the type where a signal is observed, and one of two different decisions must be made. Problems of this type are not unique to the engineering field and can be found in agriculture, business, and psychology as well as many other fields. In agriculture a farmer must analyze the yield of a plot of land and decide whether or not to plant wheat on the land; a businessman must study the sales and growth possibilities of a company and decide whether or not the company would be a good investment; a psychologist must observe the effect of a certain environment on individual behavior and decide whether
or not the environment is detrimental to human development. In each of these cases, the decision is either yes or no, accept or reject. There are therefore two different possibilities. In this book one of the possibilities will be called the null hypothesis, or simply hypothesis, and the second possibility will be called the alternative hypothesis, or alternative. The hypothesis will be denoted by H and the alternative by K.

Consider now the general detection problem where a random signal x is observed, and a decision must be made as to which one of the two possibilities, the hypothesis H or the alternative K, is true. If H and K could be observed directly, there would be no uncertainty in the detection problem, and a correct decision could be made every time. In truth, however, the observations of H and K are usually corrupted by noise. A detector must be devised which attempts to compensate for the distortion introduced by the noise and chooses either H or K as the correct source of the observed data. A generalized detector is shown in Fig. 1.2-1. The input to the detector is a random signal x and the output is a random variable, D(x), which depends on the input x and takes on the values 0 or 1. If the output of the detector is 0, the hypothesis H is accepted, and if the output of the detector is 1, the alternative K is accepted. The input x is assumed to be a stochastic process. At any instant of time t_1, x is a random variable, and x will have a different density function depending on whether the hypothesis H is true or the alternative K is true. The probability density function of x is written as f(x|H) if H is true and as f(x|K) if K is true.

Fig. 1.2-1. Generalized detector. [Block diagram: the input x enters the detector, which outputs the decision D(x).]
Parametric detectors assume that f(x|H) and f(x|K) are known and utilize the known form of these probability density functions to arrive at a form for the detector D. If the actual probability density functions of the input x are the same as those assumed in determining D, the performance of the detector in terms of probability of detection and false alarm probability is good. If, however, the densities of the input x are considerably different from the probability density functions assumed, the performance of the parametric detector may be poor.

On the other hand, nonparametric detectors do not assume that the input probability density functions are completely known, but only make general assumptions about the input, such as symmetry of the probability density function and continuity of the cumulative distribution function. Since there are a large number of density functions which satisfy these
assumptions, the density functions of the input x may vary over a wide range without altering the performance of the nonparametric detector. Since parametric detectors are based on specified forms of the input probability density functions, their performance may vary widely depending on the actual density of the input x. Nonparametric detectors, however, maintain a fairly constant level of performance because they are based on general assumptions on the input probability density. How do the performances of parametric and nonparametric detectors compare when applied to the same detection problem? The answer to this question pivots on how well the assumptions upon which the two detectors are based are satisfied. For instance, assume that the parametric detector is based on the assumption that the input density is Gaussian, and the nonparametric detector is based on the assumption that the input density is symmetric and the cumulative distribution function is continuous. If the input probability densities are actually Gaussian, the performance of the parametric detector will generally be significantly better than the performance of the nonparametric detector. If, however, the input densities are not Gaussian but are still symmetric, the nonparametric detector performance may be much better than the performance of the parametric detector.

Another property of nonparametric detectors is their relative ease of implementation when compared to parametric detectors. The following example will illustrate this property. Consider the problem of detecting a known constant signal of positive amplitude in additive white noise on the basis of a set of observations x = {x_1, x_2, …, x_n}. If the noise has a zero mean Gaussian density with a known variance σ², the optimum parametric detector has the form (see Section 3.2)

    D(x) = 0   if  Σ_{i=1}^n x_i < T_0
    D(x) = 1   if  Σ_{i=1}^n x_i > T_0        (1.2-1)

Here Σ_{i=1}^n x_i is known as the test statistic and T_0 is the threshold of the test. The optimum parametric detector for this case sums the input observations, compares the resulting sum to a threshold T_0, and decides H (D(x) = 0) if the sum is less than T_0 or decides K (D(x) = 1) if the sum is greater than T_0.
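As an illustrative sketch (not from the text), the detector of Eq. (1.2-1) can be written in a few lines of Python; the observations and the threshold T_0 are assumed to be given:

```python
def sum_detector(x, T0):
    """Parametric (linear) detector of Eq. (1.2-1): sum the observations
    and compare to the threshold T0.  Returns 1 (accept K, signal present)
    if the sum exceeds T0, else 0 (accept H, noise only)."""
    test_statistic = sum(x)  # the test statistic: sum_{i=1}^{n} x_i
    return 1 if test_statistic > T0 else 0
```

For example, `sum_detector([0.4, 1.3, -0.2], 1.0)` decides K, since the sum 1.5 exceeds the threshold.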
sign detector has the form (see Section 3.3)

    D(x) = 0   if  Σ_{i=1}^n u(x_i) < T_1
    D(x) = 1   if  Σ_{i=1}^n u(x_i) > T_1        (1.2-2)

where u(x_i) is the unit step function,² and T_1 is again referred to as the threshold of the test. The nonparametric detector of Eq. (1.2-2) must therefore simply determine the polarity of each input observation, count the number of positive observations, compare the total to the threshold T_1, and decide or accept H if the total is less than T_1 or decide K if the total is greater than T_1. Since the parametric detector requires that the input observations be summed, whereas the nonparametric detector requires only the total number of positive observations to be found, the nonparametric detector is considerably simpler to implement.
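A sketch of the sign detector of Eq. (1.2-2) in Python (illustrative; equality with the threshold is assigned to the alternative K, the convention the text adopts):

```python
def unit_step(x):
    """Unit step u(x): 1 for a positive observation, 0 otherwise.
    (Zero-valued observations -- "ties" -- are treated in Chapter 7.)"""
    return 1 if x > 0 else 0

def sign_detector(x, T1):
    """Sign detector of Eq. (1.2-2): count the positive observations
    and compare the count to the threshold T1."""
    count = sum(unit_step(xi) for xi in x)
    return 1 if count >= T1 else 0
```

Only the polarities of the samples are used; no additions of sample values are needed, which is the implementation simplicity the text emphasizes.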
A shorthand notation will often be used to represent the detector rule. For example, Eq. (1.2-2) will be written in the form

    Σ_{i=1}^n u(x_i)  ≷_H^K  T_1        (1.2-3)

By this notation, we imply that we decide K if Σ_{i=1}^n u(x_i) > T_1 and decide H if Σ_{i=1}^n u(x_i) < T_1. The probability of Σ_{i=1}^n u(x_i) = T_1 is zero when the test statistic is continuous; however, in this discrete case, exact equality has finite probability. Although not indicated notationally, we will always assign this event to the alternative. As another illustration, suppose that the noise has a Cauchy density; then the optimum parametric detector has the form

    ∏_{i=1}^n (A_1 x_i² + A_2 x_i)  ≷_H^K  T_2        (1.2-4)
where A_1 and A_2 are known constants and T_2 is the threshold. The parametric detector given by Eq. (1.2-4) requires each observation to be squared, multiplied by a constant, and then added to the unsquared observation multiplied by another constant. This calculation has to be performed for each observation, the product of the individual results taken, and the product compared to the threshold T_2. If the product is less than T_2, H is accepted, and if the product is greater than T_2, K is accepted. The basic form for the nonparametric detector for this case is still given by Eq. (1.2-2). The nonparametric detector is therefore much simpler to implement than the parametric detector when the input noise has a Cauchy density. Many other examples could be given which illustrate the simple implementation schemes required by nonparametric detectors. The most important property of nonparametric detectors is, however, the maintenance of a constant level of performance even for wide variations in the input noise density. This property, coupled with the usually simple form of the test statistics, makes nonparametric detection techniques quite attractive to the engineer.

² For this book, the unit step function is defined by u(x) = 1 for x > 0 and u(x) = 0 otherwise.

1.3 Historical and Mathematical Background

Since the simplest nonparametric detectors are quite intuitive in nature and are based on fairly elementary mathematical concepts, it is not surprising that papers concerned with nonparametric tests of hypotheses appeared as early as the eighteenth century. In particular, in 1710, John Arbuthnott applied the sign test to a hypothesis concerning Divine Providence [Bradley, 1968; Arbuthnott, 1710]. In spite of these few early papers, however, the true beginning of nonparametric detection theory is fixed by Savage [1953] as 1936, when Hotelling and Pabst [1936] published a paper on rank correlation. The development and application of nonparametric detectors was considerably delayed by the overwhelming acceptance by mathematicians and scientists of normal distribution theory.
Since it could be shown that the sum of a large number of independent and identically distributed random variables approached the normal distribution, practitioners in almost all fields viewed and used the normal distribution as the tool for whatever purpose they had in mind. Such was the reputation of the normal distribution that many times, sets of experimental data were forced to fit this mathematical and physical phenomenon called the normal distribution. When sets of data appeared which could not even be forced to fit the normal distribution, it was assumed that all that was needed to obtain a normal distribution was to take more samples. As larger data sets became available, however, it became evident that not all situations could be described by normal distribution theory. When nonparametric tests began to appear, people continued to consider normal distribution theory as the
most desirable approach, and nonparametric tests were used only as a last resort. Since the beginning in 1936, however, the number of papers in the statistical literature concerning nonparametric techniques has swelled, until today the literature on nonparametric statistical inference is quite large. Although nonparametric or distribution-free methods have been studied and applied by statisticians for at least 25 years, only recently have engineers recognized their importance and begun to investigate their applications. Only elementary probability theory is necessary to apply and analyze nonparametric detectors. Therefore, unlike many engineering techniques, nonparametric detection was not slowed in development by a lack of mathematical methods. Most of the definitions and concepts of detection theory required to understand the mathematics contained in this book, excluding basic probability theory, are covered in Chapter 2.

1.4 Outline of the Book
This book consists of essentially three sections: basic theory, detector description, and detector performance. Chapter 2, entitled Basic Detection Theory, comprises the entire basic theory section, wherein simple and composite hypotheses and alternatives, type 1 and type 2 errors, false alarm and miss probabilities, and receiver operating characteristics are discussed. In addition, the Neyman-Pearson fundamental lemma is treated in detail, and the important concepts of consistency, robustness, efficacy, and asymptotic relative efficiency are defined.

The detector description section is composed of two chapters: Chapter 3, which covers one-input detectors, and Chapter 5, which treats two-input detectors. Some clarification of terminology is necessary here. The terms one- and two-input detectors indicate the number of sets of observations available at the detector input, not the total number of available input samples. For example, a one-input detector has only one set of observations (x_1, x_2, …, x_n) at its input to use in making a decision, whereas a two-input detector has two sets of samples (y_1, y_2, …, y_k) and (z_1, z_2, …, z_m). In these two chapters, the basic rank and nonrank nonparametric detectors are described in detail and examples given of the test statistic calculations.

In order to compare a nonparametric detector with another nonparametric or a parametric detector, some kind of evaluation technique must be established. In the performance section of the book, which consists of the remaining chapters, the calculation of certain performance parameters is illustrated, the effects of experimental methods are demonstrated, and
the performance of the nonparametric detectors for various applications is given. In Chapters 4 and 6, the calculation of the asymptotic relative efficiency for both rank and nonrank detectors is illustrated, and the small sample performance of several of the nonparametric detectors is given. Chapter 4 treats the performance of one-input detectors, while Chapter 6 is concerned with two-input detector performance.

When working with signed-rank detectors, a method for treating zero value observations and for assigning ranks to observations with the same absolute value must be adopted. Both zero value observations and equal magnitude observations are generally lumped under the common title of tied observations. Chapter 7 discusses some of the more common techniques for treating tied observations and demonstrates the effects of the various techniques on nonparametric detector performance.

All of the nonparametric detectors in this book are based on the assumption that the input observations are independent and identically distributed. Although the assumption of independent input observations is quite restrictive, it should not be surprising that the nonparametric detectors are based on this assumption, since the nonparametric tests were developed primarily by statisticians, who generally have more control over experimental techniques, such as sampling frequency, than do engineers. Since there are many physical situations in engineering which may give rise to dependent input observations, the performance of nonparametric detectors when the input samples are correlated is of prime importance. A treatment of the dependent sample performance of both one- and two-input detectors is contained in Chapter 8.

In order to apply nonparametric detectors to radar and communication systems, slight modifications of the basic detector test statistics are sometimes required.
Chapter 9 presents examples of the engineering applications of the nonparametric detectors covered in this book and discusses the implementation and performance of these detectors for the applications considered. Two appendices complete the book. Appendix A gives a brief description of several standard probability densities that are used in the book, while Appendix B contains mathematical tables which are needed for the work in the book. Answers to selected problems are given after the appendices.
CHAPTER 2

Basic Detection Theory

2.1 Introduction
This chapter is concerned with defining, developing, and discussing the basic detection theory concepts required throughout the remainder of this book. The reader is assumed to have a basic knowledge of probability theory and random variables. While an attempt has been made to make the book self-contained, prior introduction to the basic concepts of detection theory will be helpful, since only a limited class of detection problems is treated here. The reader interested in a more comprehensive discussion of detection theory is referred to the extensive available literature [Van Trees, 1968; Melsa and Cohn, 1976].

The first section defines the basic concepts of simple and composite hypotheses and alternatives, type 1 and type 2 errors, and false alarm and detection probabilities. Sections 2.3 and 2.4 discuss three of the most common approaches to the hypothesis testing problem: the Bayes, ideal observer, and Neyman-Pearson methods. The Neyman-Pearson technique is developed in detail since it is the test criterion that will be utilized in the rest of this book. Receiver operating characteristics are discussed in Section 2.5. Section 2.6 presents methods for handling composite hypotheses, including such concepts as uniformly most powerful, optimum unbiased, and locally most powerful detectors, as well as the detector properties of consistency and robustness. The last section of the chapter presents several techniques for comparing the performance of parametric and nonparametric detectors. The principal technique presented is the concept of asymptotic relative
efficiency (ARE). The relationship of the ARE to the finite sample relative efficiency and the efficacy is given, and the advantages and disadvantages of each technique are discussed.

2.2 Basic Concepts
Let the input samples be denoted by x = {x_i; i = 1, …, n} and let F be the probability distribution of the input data. The form of the probability distribution F depends on the values of some number of parameters which will be denoted by θ = {θ_1, θ_2, …, θ_m}, m not necessarily finite. If a hypothesis or an alternative uniquely specifies each of the θ_i, i = 1, 2, …, m, then the hypothesis or alternative will be called simple. The hypothesis or alternative will be called composite if one or more of the θ_i are unspecified. To illustrate this concept, consider a Gaussian density which has mean μ and variance σ². For this particular density function, there are two parameters, μ and σ², which play the part of the θ_i, i = 1, 2, and specify the form of the probability density. Three different statements or hypotheses concerning μ and σ² are given below:

    H_1:  μ = 5,   σ² = 1
    H_2:  μ = 2,   σ² > 2
    H_3:  μ > 0,   σ² = 1

In hypothesis H_1 both of the parameters are uniquely specified, and therefore H_1 is called a simple hypothesis. The mean is explicitly given as 2 in hypothesis H_2, but the variance is only given as being greater than 2. Since the variance can take on a range of values greater than 2, hypothesis H_2 is called composite. In hypothesis H_3 the variance is fixed at 1, but the mean is only stated as being positive. Since μ can equal any positive number, hypothesis H_3 is called composite.

Many physical problems give rise to composite hypotheses or alternatives. One such example is a pulsed radar system. The received waveform at the input of a pulsed radar system can be written as

    y(t) = A cos(ωt + φ) + n(t),    0 ≤ t ≤ T        (2.2-1)

where A cos(ωt + φ) is the radar return, or signal, and n(t) is the noise. The amplitude A of the target return is not known exactly at the radar receiver due to variations in radar cross section and signal fading. A decision is required as to whether "noise only" (A = 0) is present or "target and noise" (A > 0) is present. Since a target is declared for any
A > 0, the decision problem consists of testing a simple hypothesis (A = 0) versus a composite alternative (A > 0). Composite hypothesis testing problems may also arise in radar systems due to other parameters such as phase angle, frequency, and time-of-arrival being unknown. In radar systems the time-of-arrival of a radar return is rarely accurately known, and therefore radar systems must be capable of detecting targets over a range of possible arrival times. Accurate phase angle information is usually not available due to phase shifts contributed by the target and propagation effects, and if the target is moving with respect to the radar, a substantial frequency shift may occur due to the Doppler effect. In communication systems, composite hypothesis testing problems may occur as a result of signal fading (range of signal amplitudes), multipath propagation (range of arrival times), and phase angle changes due to oscillator drift.

Returning now to the general detection problem, it is clear that when testing a hypothesis H versus an alternative K, four possible situations can occur. The detector can

(1) accept the hypothesis H when the hypothesis H is true;
(2) accept the alternative K when the hypothesis H is true;
(3) accept the alternative K when the alternative K is true;
(4) accept the hypothesis H when the alternative K is true.

Situations (1) and (3) are correct decisions, and situations (2) and (4) are incorrect decisions. The incorrect decision stated in (2) is called a type 1 error, and the incorrect decision given by (4) is called a type 2 error. Suppose that Γ is the set of all possible observations; then let us divide the observation space Γ into two disjoint subsets Γ_H and Γ_K which cover Γ such that

    Γ_H ∪ Γ_K = Γ    and    Γ_H ∩ Γ_K = ∅

If x ∈ Γ_H, we accept the hypothesis H, and if x ∈ Γ_K, we accept the alternative K. Since Γ_K and Γ_H are disjoint and cover Γ, the detector is completely defined once either Γ_K or Γ_H is selected. Because Γ_K is the region for which we decide for the alternative K, it is sometimes referred to as the critical region of the test. If F ∈ H,¹ the conditional probability density of x may be written as f(x|H) and, similarly, the conditional probability density of x is f(x|K) for F ∈ K. The probability of a type 1 error, also called a false alarm, is denoted by α

¹ F ∈ H denotes that the probability distribution of the input data x is a subset of the probability distributions described in the hypothesis H, or more simply, that the hypothesis H is true. Similarly, F ∈ K denotes that the alternative K is true.
and may be expressed as

α = P{x ∈ Γ_K | H} = ∫_{Γ_K} f(x|H) dx   (2.2-2)

The false alarm probability is the probability of deciding K when H is true. By making use of the fact that Γ_K and Γ_H are disjoint and cover Γ, we can also write α as

α = ∫_{Γ_K} f(x|H) dx = ∫_Γ f(x|H) dx − ∫_{Γ_H} f(x|H) dx
  = 1 − ∫_{Γ_H} f(x|H) dx = 1 − P{x ∈ Γ_H | H}   (2.2-3)
Equation (2.2-3) simply expresses the fact that α is one minus the probability of correctly identifying H. The probability of a type 2 error, also called a miss, is denoted by β and may be written as

β = P{x ∈ Γ_H | K} = ∫_{Γ_H} f(x|K) dx   (2.2-4)

The miss probability is the probability of deciding H when K is true. It is also possible to express β in terms of Γ_K as

β = 1 − ∫_{Γ_K} f(x|K) dx = 1 − P{x ∈ Γ_K | K}   (2.2-5)
Since β is the probability of a miss, 1 − β is the probability of detection. The probability of detection is also referred to as the power of the test and can be written as

1 − β = 1 − ∫_{Γ_H} f(x|K) dx = ∫_{Γ_K} f(x|K) dx = P{x ∈ Γ_K | K}   (2.2-6)

In other words, 1 − β is the probability of accepting K when K is true.
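As a concrete numerical illustration (not taken from the text), the quantities above can be evaluated for a one-observation detector that accepts K when x > T, with H: x ~ N(0, 1) and K: x ~ N(1, 1); the threshold T = 1 is an arbitrary choice. A minimal Python sketch:

```python
from statistics import NormalDist

# Hypothetical example: accept K when x > T, where
# H: x ~ N(0, 1) and K: x ~ N(1, 1). T = 1 is arbitrary.
H = NormalDist(mu=0.0, sigma=1.0)
K = NormalDist(mu=1.0, sigma=1.0)
T = 1.0

alpha = 1.0 - H.cdf(T)   # false alarm: P{x in Gamma_K | H}, Eq. (2.2-2)
beta = K.cdf(T)          # miss:        P{x in Gamma_H | K}, Eq. (2.2-4)
power = 1.0 - beta       # detection:   P{x in Gamma_K | K}, Eq. (2.2-6)

print(alpha, beta, power)
```

Here α ≈ 0.159 and, because T sits exactly at the mean of K, β = 0.5; raising T trades a smaller α for a larger β, which is the basic tension the criteria of the following sections resolve.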
2.3 Bayes Decision Criterion

In the theory of hypothesis testing, there are several different approaches to choosing the form of a test or detector. Two of the more well-known methods are the Bayes criterion and the Neyman-Pearson criterion. Each of these criteria is a method for choosing the two subsets, Γ_H and Γ_K, of the observation space Γ to achieve a specific goal. In this section, we consider the Bayes decision criterion, which makes use of a systematic
procedure of assigning costs to each decision and then minimizing the total average cost. Let the cost corresponding to each of the four situations given in Section 2.2 be denoted by

C_HH = cost of accepting H when H is true;
C_KH = cost of accepting K when H is true;
C_KK = cost of accepting K when K is true;
C_HK = cost of accepting H when K is true.

It is assumed that these costs are uniquely known. The Bayes criterion is a method for selecting the decision regions Γ_H and Γ_K such that the expected value of cost ℛ = E{cost} is minimized. The expected cost can be written as

ℛ = E{cost} = E{cost|H} P{H} + E{cost|K} P{K}   (2.3-1)

Here P{H} is the probability, sometimes called the a priori probability, that H is true, while P{K} = 1 − P{H} is the probability that K is true. The conditional expected cost E{cost|H} can be expressed as

ℛ_H = E{cost|H} = C_HH P{x ∈ Γ_H | H} + C_KH P{x ∈ Γ_K | H}
    = C_HH (1 − α) + α C_KH   (2.3-2)

In a similar manner, the expected cost given K is

ℛ_K = E{cost|K} = C_KK P{x ∈ Γ_K | K} + C_HK P{x ∈ Γ_H | K}
    = C_KK (1 − β) + β C_HK   (2.3-3)

The total expected cost is, therefore,

ℛ = ℛ_H P{H} + ℛ_K P{K}
  = P{H}[C_HH + α(C_KH − C_HH)] + P{K}[C_KK + β(C_HK − C_KK)]   (2.3-4)

Now let us substitute the integral expressions of Eqs. (2.2-2) and (2.2-5) for α and β to write ℛ as
ℛ = C_HH P{H} + C_HK P{K} + ∫_{Γ_K} P{H}(C_KH − C_HH) f(x|H) dx
  − P{K}(C_HK − C_KK) ∫_{Γ_K} f(x|K) dx   (2.3-5)

Combining the two integrals yields

ℛ = C_HH P{H} + C_HK P{K}
  + ∫_{Γ_K} [P{H}(C_KH − C_HH) f(x|H) − P{K}(C_HK − C_KK) f(x|K)] dx   (2.3-6)
Under the Bayes criterion, the subsets Γ_H and Γ_K are now to be selected such that ℛ, given by Eq. (2.3-6) and called the average risk, is a minimum. The minimum value of the average risk is called the Bayes risk, and choosing Γ_K to minimize ℛ results in what is called a Bayes solution to the statistical hypothesis testing problem. No other strategy can yield a smaller average risk than the Bayes criterion. The problem is to select Γ_K such that ℛ is minimized. The first two terms in Eq. (2.3-6) are not a function of Γ_K; to minimize the integral we assign to Γ_K all of the values of x for which

P{H}(C_KH − C_HH) f(x|H) − P{K}(C_HK − C_KK) f(x|K) < 0   (2.3-7)

Therefore the Bayes detector can be written as
P{K}(C_HK − C_KK) f(x|K) ≷_H^K P{H}(C_KH − C_HH) f(x|H)   (2.3-8)

where ≷_H^K means that K is accepted if the left-hand side exceeds the right-hand side, and H is accepted otherwise. Assuming that the cost of an error exceeds the cost of the corresponding correct decision,² Eq. (2.3-8) can be rearranged as

f(x|K) / f(x|H) ≷_H^K P{H}(C_KH − C_HH) / [P{K}(C_HK − C_KK)]   (2.3-9)

The quantity on the left-hand side of Eq. (2.3-9) is referred to as the likelihood ratio, defined as

L(x) = f(x|K) / f(x|H)   (2.3-10)

The right-hand side is the threshold of the test

T_B = P{H}(C_KH − C_HH) / [P{K}(C_HK − C_KK)]   (2.3-11)

Hence Eq. (2.3-9) can be written as

L(x) ≷_H^K T_B   (2.3-12)

Tests which have the form of Eq. (2.3-12) are known as likelihood ratio tests. The likelihood ratio is also referred to as a sufficient statistic. In general, a test statistic is any function (normally scalar) of the observation x. A test statistic is sufficient if it can be used to replace the complete observation as far as a detector is concerned.

² This assumption is reasonable since we are simply assuming that the cost for an error is greater than the cost for a correct decision. If this assumption is not valid, the decision rule may still be written in the form of Eq. (2.3-9) but the inequalities must be reversed.
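The threshold of Eq. (2.3-11) is simple to encode; the sketch below uses illustrative priors and costs (the argument names are our own, not the text's notation):

```python
def bayes_threshold(p_H, c_HH, c_KH, c_KK, c_HK):
    """Bayes test threshold T_B of Eq. (2.3-11).

    p_H is the a priori probability P{H}; the c_* arguments are decision
    costs, with c_KH the cost of accepting K when H is true, and so on.
    """
    p_K = 1.0 - p_H
    return (p_H * (c_KH - c_HH)) / (p_K * (c_HK - c_KK))

# Equal priors and unit error costs give T_B = 1.
print(bayes_threshold(0.5, 0.0, 1.0, 0.0, 1.0))  # -> 1.0
```

With P{H} = 0.75, C_KH = 2, and C_HK = 1 this function reproduces the threshold T_B = 6 used in Example 2.3-2 below.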
Example 2.3-1 In order to illustrate the use of the Bayes criterion, let us consider a simple problem. Suppose that we have a single observation whose density function under H and K is given by

H: f(x|H) = (1/√(2π)) exp(−x²/2)
K: f(x|K) = (1/(2√(2π))) exp(−x²/8)

If H is true, x has a variance of one, while if K is true x has a variance of 4. We assume that H and K are equally likely so that P{H} = P{K} = 0.5 and let the costs be

C_HK = C_KH = 1;   C_HH = C_KK = 0

Then the likelihood ratio is given by

L(x) = (1/2) exp(−x²/8 + x²/2) = (1/2) exp(3x²/8)

while the threshold is

T_B = 0.5(1 − 0) / [0.5(1 − 0)] = 1

Therefore the Bayes optimal detector is given by

(1/2) exp(3x²/8) ≷_H^K 1

Let us take logarithms on both sides to obtain

3x²/8 ≷_H^K ln 2

or

|x| ≷_H^K √((8/3) ln 2) = 1.36
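The detector of Example 2.3-1 is easy to check numerically; the sketch below evaluates the likelihood ratio of Eq. (2.3-10) directly and confirms that its decision agrees with the simplified rule |x| ≷ 1.36 at a few sample points:

```python
import math

def L(x):
    # Likelihood ratio of Example 2.3-1: f(x|K)/f(x|H) = (1/2) exp(3x^2/8)
    f_H = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    f_K = math.exp(-x * x / 8.0) / (2.0 * math.sqrt(2.0 * math.pi))
    return f_K / f_H

T_B = 1.0                                      # equal priors, unit error costs
x_star = math.sqrt(8.0 / 3.0 * math.log(2.0))  # simplified threshold, ~1.36

for x in [-3.0, -1.0, 0.0, 0.5, 2.0]:
    assert (L(x) > T_B) == (abs(x) > x_star)

print(round(x_star, 2))  # -> 1.36
```

Large |x| is more plausible under the higher-variance alternative, which is why this test declares K on the tails rather than above a one-sided threshold.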
Example 2.3-2 As another example, suppose that the observation densities are given by

H: f(x|H) = (2πσ²)^(−n/2) exp[−(1/2σ²) Σ_{i=1}^n x_i²]
K: f(x|K) = (2πσ²)^(−n/2) exp[−(1/2σ²) Σ_{i=1}^n (x_i − μ)²]

If H is true, the observations x_i, i = 1, 2, …, n, are uncorrelated, zero mean Gaussian variables, while if K is true, the observations are uncorrelated Gaussian variables with mean μ. This problem corresponds to detecting the presence of a known constant signal in additive white noise. We assume that P{H} = 0.75 and P{K} = 1 − P{H} = 0.25 and the costs are given by

C_HH = C_KK = 0;   C_HK = 1;   C_KH = 2

The likelihood ratio for this problem is given by

L(x) = exp{−(1/2σ²)[−2μ Σ_{i=1}^n x_i + nμ²]}

while the threshold is

T_B = 0.75(2 − 0) / [0.25(1 − 0)] = 6

Therefore the Bayes optimal detector for this problem is

exp{−(1/2σ²)[−2μ Σ_{i=1}^n x_i + nμ²]} ≷_H^K 6

Once again let us take logarithms on both sides to obtain

(μ/σ²) Σ_{i=1}^n x_i ≷_H^K nμ²/(2σ²) + ln 6

If we assume that μ > 0, then we have

Σ_{i=1}^n x_i ≷_H^K nμ/2 + (σ²/μ) ln 6 = T_B′

Hence the detector sums the observations and then compares the sum to the threshold T_B′. If in the expression for the average risk, Eq. (2.3-4), the costs are such that C_HH = C_KK = 0 and C_KH = C_HK = 1, the following expression results

ℛ = P_e = P{H} α + P{K} β   (2.3-13)
which is called the total average probability of error. Substituting integral expressions for α and β in Eq. (2.3-13) yields

P_e = P{H} ∫_{Γ_K} f(x|H) dx + P{K} ∫_{Γ_H} f(x|K) dx   (2.3-14)

The ideal observer criterion consists of choosing Γ_H and Γ_K such that Eq. (2.3-14), the total average probability of error, is minimized. The ideal observer criterion can therefore be used when the cost of correct decisions is assumed or known to be zero, and the cost of incorrect decisions is taken
to be one. The ideal observer criterion still requires the a priori probabilities of the hypothesis and the alternative to be known.
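The sum detector of Example 2.3-2 can also be checked by Monte Carlo simulation; the sketch below estimates α, β, and the average risk of Eq. (2.3-4). The values n = 16, μ = 1, σ = 2 are illustrative choices of ours, not values from the text:

```python
import math
import random

random.seed(1)

n, mu, sigma = 16, 1.0, 2.0          # illustrative values, not from the text
p_H, C_KH, C_HK = 0.75, 2.0, 1.0     # priors and costs of Example 2.3-2
T = n * mu / 2.0 + (sigma**2 / mu) * math.log(6.0)   # threshold T_B'

trials = 20000
false_alarms = misses = 0
for _ in range(trials):
    s_H = sum(random.gauss(0.0, sigma) for _ in range(n))   # sum under H
    s_K = sum(random.gauss(mu, sigma) for _ in range(n))    # sum under K
    false_alarms += s_H > T
    misses += s_K <= T

alpha = false_alarms / trials
beta = misses / trials
risk = p_H * alpha * C_KH + (1 - p_H) * beta * C_HK   # Eq. (2.3-4), C_HH = C_KK = 0
print(alpha, beta, risk)
```

With these values the estimates cluster around α ≈ 0.03 and β ≈ 0.46; the unequal priors and the higher cost on false alarms push the threshold up, trading detection probability for a low false alarm rate.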
2.4 Neyman-Pearson Lemma

If neither the cost function nor the probabilities of H and K are known, the Neyman-Pearson criterion can be used. The main advantage of the Neyman-Pearson over the Bayes and ideal observer criteria is that it yields a detector which keeps the false alarm probability less than or equal to some prechosen value and maximizes the probability of detection for this value of α. The Neyman-Pearson criterion is based on the Neyman-Pearson fundamental lemma of hypothesis testing, which can be stated as follows.
Neyman-Pearson Lemma Necessary and sufficient conditions for a test to be most powerful of level α₀ for testing H: f(x|H) against the alternative K: f(x|K) are that the test satisfy

P{x ∈ Γ_K | H} = ∫_{Γ_K} f(x|H) dx = α₀   (2.4-1)

and

f(x|K) ≷_H^K λ f(x|H)   (2.4-2)

for some λ ≥ 0. The form of the test given by Eq. (2.4-2) can be derived by using the method of Lagrange's undetermined multipliers to maximize the probability of detection or power of the test subject to the constraint given by Eq. (2.4-1). Since maximizing 1 − β is equivalent to minimizing β, the following equation must be minimized by choosing Γ_H and Γ_K:

Λ = β + λ(α − α₀)

where λ is an undetermined Lagrange multiplier. Using Eqs. (2.2-2) and (2.2-5) for α and β, we obtain the following expression

Λ = 1 − ∫_{Γ_K} f(x|K) dx + λ[∫_{Γ_K} f(x|H) dx − α₀]
  = (1 − λα₀) + ∫_{Γ_K} [λ f(x|H) − f(x|K)] dx   (2.4-3)

For λ > 0, the expression for Λ in Eq. (2.4-3) will be minimized if all of the
x’s for which f(x|K) < λf(x|H) are placed in Γ_H, and all of the x’s for which f(x|K) > λf(x|H) are placed in Γ_K. Since in the continuous case the situation where f(x|K) = λf(x|H) has probability zero, this situation can be included in either Γ_H or Γ_K. The allocation of the points where f(x|K) = λf(x|H), however, can affect the false alarm probability. Maximization of the power of a test subject to the constraint of a fixed false alarm probability therefore results in a test of the form

L(x) = f(x|K)/f(x|H) ≷_H^K λ   (2.4-4)

and once again we have obtained a likelihood ratio test. The threshold λ is obtained by satisfying the constraint on the false alarm probability

∫_{Γ_K} f(x|H) dx = α₀   (2.4-5)

It is sometimes convenient to think of the likelihood ratio L(x) as simply a random variable which is a function of the observation x. In this case, one can think of L(x) as possessing a density function which will depend on whether H or K is true. We will write these densities as f[L(x)|H] and f[L(x)|K]. Since L(x) is a ratio of probability densities, L(x) ≥ 0, so that f[L|H] = f[L|K] = 0 for L < 0. The false alarm and miss probabilities can now be written as

α = P{L(x) > λ | H} = ∫_λ^∞ f[L(x)|H] dL   (2.4-6)

β = P{L(x) < λ | K} = ∫_0^λ f[L(x)|K] dL   (2.4-7)
We can use Eq. (2.4-6) to obtain λ by setting α = α₀ and solving for λ. We note that λ must be nonnegative. The sufficiency of Eqs. (2.4-1) and (2.4-2) can be shown in the following manner. Let Γ denote the observation space and let Γ₁(λ) be the critical region where f(x|K) > λf(x|H), which gives a test of maximum power among all tests that have a false alarm probability of α₀. In other words, Γ₁(λ) is the Neyman-Pearson critical region as defined by Eqs. (2.4-1) and (2.4-2). Now let Γ₀ be any region in Γ which is the critical region of another test that also has a false alarm probability of α₀. If Eqs. (2.4-1) and (2.4-2) are sufficient conditions for a most powerful test, then the power of the test which has critical region Γ₁ will be greater than or equal to the power of
the test which has critical region Γ₀. The regions Γ₁ and Γ₀ may have a common part which will be denoted by Γ₀₁. A diagram of the relationship among Γ₁, Γ₀, and Γ₀₁ is shown in Fig. 2.4-1.
Fig. 2.4-1. Critical regions of two tests.
Let Γ₁ − Γ₀₁ represent the set of samples in Γ₁ not contained in Γ₀₁. Then, for every observation in Γ₁, f(x|K) > λf(x|H), so that we have

P{x ∈ (Γ₁ − Γ₀₁) | K} > λ P{x ∈ (Γ₁ − Γ₀₁) | H}   (2.4-8)

Adding P{x ∈ Γ₀₁ | K} to both sides of Eq. (2.4-8) yields

P{x ∈ (Γ₁ − Γ₀₁) | K} + P{x ∈ Γ₀₁ | K} > λ P{x ∈ (Γ₁ − Γ₀₁) | H} + P{x ∈ Γ₀₁ | K}   (2.4-9)

The left-hand side of Eq. (2.4-9) is simply P{x ∈ Γ₁ | K}, the power of the test, and therefore,

P{x ∈ Γ₁ | K} > λ P{x ∈ (Γ₁ − Γ₀₁) | H} + P{x ∈ Γ₀₁ | K}   (2.4-10)

Working with the right-hand side of Eq. (2.4-10), we have

λ P{x ∈ (Γ₁ − Γ₀₁) | H} + P{x ∈ Γ₀₁ | K} = λ P{x ∈ Γ₁ | H} − λ P{x ∈ Γ₀₁ | H} + P{x ∈ Γ₀₁ | K}   (2.4-11)

Since both tests have a false alarm probability of α₀, we know that P{x ∈ Γ₁ | H} = α₀ = P{x ∈ Γ₀ | H}, so that Eq. (2.4-11) becomes

λ P{x ∈ (Γ₁ − Γ₀₁) | H} + P{x ∈ Γ₀₁ | K} = λ P{x ∈ Γ₀ | H} − λ P{x ∈ Γ₀₁ | H} + P{x ∈ Γ₀₁ | K}
  = λ P{x ∈ (Γ₀ − Γ₀₁) | H} + P{x ∈ Γ₀₁ | K}   (2.4-12)

Substituting this result into Eq. (2.4-10) yields

P{x ∈ Γ₁ | K} > λ P{x ∈ (Γ₀ − Γ₀₁) | H} + P{x ∈ Γ₀₁ | K}   (2.4-13)

The points in Γ₀ − Γ₀₁ are not contained in the region Γ₁, and therefore, as
shown in Fig. 2.4-1, f(x|K) < λf(x|H) for all observations in Γ₀ − Γ₀₁. Using this result in Eq. (2.4-13) gives

P{x ∈ Γ₁ | K} > P{x ∈ (Γ₀ − Γ₀₁) | K} + P{x ∈ Γ₀₁ | K} = P{x ∈ Γ₀ | K}   (2.4-14)

Since P{x ∈ Γ₁ | K} represents the power of the test corresponding to Γ₁ while P{x ∈ Γ₀ | K} is the power of the test corresponding to Γ₀, Eq. (2.4-14) shows that the Neyman-Pearson test has the greatest power of any test of level α₀. Hence, Eqs. (2.4-1) and (2.4-2) are sufficient conditions for the test to be most powerful.
Example 2.4-1 To illustrate the use of the Neyman-Pearson lemma, let us consider the following hypothesis and alternative:

H: f(x|H) = (1/√(2π)) exp(−x²/2)
K: f(x|K) = (1/√(2π)) exp[−(x − 1)²/2]

We assume that α₀ = 0.2. The likelihood ratio is given by

L(x) = f(x|K)/f(x|H) = exp[(2x − 1)/2]

and the Neyman-Pearson test is therefore

exp[(2x − 1)/2] ≷_H^K λ

By taking logarithms on both sides, we reduce the test to

x ≷_H^K ln λ + 1/2

Now we must select λ to satisfy the constraint on the false alarm probability. The false alarm probability is given by

α = ∫_{ln λ + 1/2}^∞ (1/√(2π)) exp(−x²/2) dx = 1 − Φ[ln λ + 1/2]

where Φ[·] is the cumulative distribution function for a zero mean, unit variance Gaussian random variable (see Appendix B). Hence, we wish to find λ such that

1 − Φ[ln λ + 1/2] = α₀ = 0.2

From Appendix B, we find that ln λ + 1/2 = 0.84, so that λ = 1.4 and the Neyman-Pearson test becomes

x ≷_H^K 0.84
The corresponding power for the test is

1 − β = ∫_{0.84}^∞ (1/√(2π)) exp[−(x − 1)²/2] dx = 1 − Φ[−0.16] = 0.5636
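Example 2.4-1 can be reproduced numerically; in the sketch below, `statistics.NormalDist` supplies Φ and its inverse, so no table lookup is needed (small differences from the text's numbers come from table rounding):

```python
import math
from statistics import NormalDist

std = NormalDist()          # zero mean, unit variance
alpha0 = 0.2

# Threshold on x from the constraint 1 - Phi[ln(lam) + 1/2] = alpha0:
x_threshold = std.inv_cdf(1.0 - alpha0)      # equals ln(lam) + 1/2
lam = math.exp(x_threshold - 0.5)            # likelihood-ratio threshold

# Power: P{x > x_threshold | K}, with K: x ~ N(1, 1)
power = 1.0 - NormalDist(mu=1.0, sigma=1.0).cdf(x_threshold)

print(round(x_threshold, 4), round(lam, 3), round(power, 4))
```

This yields a threshold of about 0.842, λ ≈ 1.41, and power ≈ 0.563, in agreement with the values obtained from the tables.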
2.5 Receiver Operating Characteristics

A plot of the probability of detection 1 − β versus the false alarm probability α is sometimes useful in describing the performance of a detector. Plots of this type are called receiver operating characteristics (ROC). An expression for the receiver operating characteristic may be obtained by treating the output of the detector as a random variable D(x) that depends on x and takes on the values 0 or 1; if D(x) = 0, we accept H, and if D(x) = 1, we accept K. Consider the expected value of the random variable D(x). If F ∈ H, the expected value of D(x) equals

E{D(x)|H} = ∫_Γ D(x) f(x|H) dx   (2.5-1)

Since D(x) = 0 for x ∈ Γ_H, Eq. (2.5-1) can be written as

E{D(x)|H} = ∫_{Γ_K} f(x|H) dx = α   (2.5-2)

which is another expression for the false alarm probability given by Eq. (2.2-2). If F ∈ K, the expected value of D(x) becomes

E{D(x)|K} = ∫_{Γ_K} f(x|K) dx = 1 − ∫_{Γ_H} f(x|K) dx = 1 − β   (2.5-3)

which is the probability of detection. The expected value of the random variable D(x) therefore defines the receiver operating characteristic curves and will be denoted by Q_D(F),

Q_D(F) = E{D(x)}   (2.5-4)

It is clear that the operating characteristic of the detector Q_D(F) depends both on the probability distribution of the input data and the form of the detector. Since the random variable D(x) takes on only the values 0 and 1, the operating characteristic is bounded by

0 ≤ Q_D(F) ≤ 1   (2.5-5)

In terms of the operating characteristic of the detector, we can express the
false alarm and detection probabilities as

Q_D(F ∈ H) = α   (2.5-6)

Q_D(F ∈ K) = 1 − β   (2.5-7)

A typical receiver operating characteristic is illustrated in Fig. 2.5-1 for the detection problem of Example 2.3-2, where

H: F is a zero-mean, unit-variance Gaussian distribution
K: F is a mean μ > 0, unit-variance Gaussian distribution

From Fig. 2.5-1 it can be seen that as μ increases, the probability of detection for a specified false alarm probability increases, and as the threshold T increases, the probability of accepting K becomes small. Although the receiver operating characteristic presented in Fig. 2.5-1 is for a specific problem, it illustrates that operating characteristics depend only on the detection and false alarm probabilities and are not a function of a priori probabilities or costs. The receiver operating characteristic is therefore quite general and can be very useful in analyzing detector performance.
[Figure: curves of probability of detection versus probability of false alarm α, for false alarm probabilities from 0 to 1.0.]

Fig. 2.5-1. Typical receiver operating characteristic.
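An ROC of this type can be generated numerically. The sketch below is illustrative only: it sweeps the threshold for the single-observation Gaussian shift problem of Example 2.4-1 and records the resulting (P_F, P_D) pairs:

```python
from statistics import NormalDist

# Illustrative ROC for one observation: H: x ~ N(0,1), K: x ~ N(1,1).
# The detector accepts K when x > T; sweeping T traces out the ROC.
H = NormalDist(0.0, 1.0)
K = NormalDist(1.0, 1.0)

roc = []
for i in range(-40, 41):
    T = i / 10.0
    p_F = 1.0 - H.cdf(T)     # false alarm probability, alpha
    p_D = 1.0 - K.cdf(T)     # detection probability, 1 - beta
    roc.append((p_F, p_D))

# The ROC of a likelihood ratio test lies above the chance line p_D = p_F.
assert all(p_D >= p_F for p_F, p_D in roc)
```

Plotting the `roc` list of points reproduces the bowed-upward shape of Fig. 2.5-1; the curve runs from (1, 1) at very low thresholds down to (0, 0) at very high ones.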
Another interesting property of the operating characteristic is that for any likelihood ratio test, the slope of the operating characteristic at a particular point is equal to the value of the threshold T required to achieve the false alarm and detection probabilities which define that point. This fact can be proven by first considering a likelihood ratio detector for testing H versus K which has the form

L(x) ≷_H^K T   (2.5-8)
The false alarm and detection probabilities of this test can be written by the use of Eqs. (2.4-6) and (2.4-7) as

α = P_F = ∫_T^∞ f(L(x)|H) dL   (2.5-9)

and

1 − β = P_D = ∫_T^∞ f(L(x)|K) dL   (2.5-10)

Differentiating both Eq. (2.5-9) and Eq. (2.5-10) with respect to T yields³

dP_F/dT = −f_{L|H}(T|H)   (2.5-11)

dP_D/dT = −f_{L|K}(T|K)   (2.5-12)

and dividing Eq. (2.5-12) by Eq. (2.5-11) gives

dP_D/dP_F = f_{L|K}(T|K) / f_{L|H}(T|H)   (2.5-13)

It must be shown that

f_{L|K}(T|K) = T f_{L|H}(T|H)

Working with the probability densities of the observations, the probability of detection can be written as

P_D = P[L(x) > T | K] = ∫_{Γ_K} f_{x|K}(x|K) dx   (2.5-14)

where Γ_K = {x: L(x) > T}. Rewriting Eq. (2.5-14),

P_D = ∫_{Γ_K} L(x) f_{x|H}(x|H) dx   (2.5-15)

which becomes, upon making the change of variables y = L(x),

P_D = ∫_T^∞ y f_{L|H}(y|H) dy   (2.5-16)

Differentiating Eq. (2.5-16) with respect to T yields

dP_D/dT = −T f_{L|H}(T|H)   (2.5-17)

³ We will employ subscripts on density functions whenever there is a possibility of confusion.
which can be equated to Eq. (2.5-12) to get

−T f_{L|H}(T|H) = −f_{L|K}(T|K)

or

f_{L|K}(T|K) = T f_{L|H}(T|H)   (2.5-18)

Using Eq. (2.5-18) with Eq. (2.5-13) gives the result

dP_D/dP_F = T   (2.5-19)

which shows that the slope of any one of the operating characteristic curves at a particular point equals the value of the threshold required to obtain the P_D and P_F values which define that point. The conditions of the Neyman-Pearson lemma may be stated in terms of the receiver operating characteristic as follows: If 𝒞 is the class of all detectors satisfying

Q_D(F ∈ H) ≤ α   (2.5-20)

then the optimum detector D̂ in this class for hypothesis H, alternative K, and false alarm probability α satisfies the conditions

Q_D̂(F ∈ K) ≥ Q_D(F ∈ K)   for all D ∈ 𝒞   (2.5-21)

These statements simply say that the detector D̂ is the most powerful test with level less than or equal to α which belongs to the class 𝒞.

2.6 Composite Hypotheses
In developing the Neyman-Pearson criterion, no limit was put on the number of parameters which are necessary to define the probability densities under H and K. Each of these parameters, however, must be exactly specified in H and K; that is, the hypothesis and alternative must be simple. Since composite hypotheses and alternatives occur often in detection problems, it is important that a procedure be established to treat these cases. Consider the problem of testing the composite hypothesis H versus the simple alternative K:

H: f(x|H) = f(x, θ)   with θ ∈ Ω
K: f(x|K) = f(x, θ₁)   (2.6-1)

where Ω is a set of possible values for θ and θ₁ ∉ Ω is a specific value of θ. The general approach to the problem is to reduce the composite hypothesis to a simple hypothesis, and then test the simple hypothesis versus the
simple alternative. That is, instead of determining the most powerful test which satisfies

∫_{−∞}^∞ D(x) f(x|H) dx = ∫_{−∞}^∞ D(x) f(x, θ) dx ≤ α   (2.6-2)

for every θ ∈ Ω, find the most powerful test which satisfies

∫_{−∞}^∞ D(x) f(x, θ₀) dx ≤ α   (2.6-3)

for a specific θ = θ₀. The detection problem now becomes that of testing a simple hypothesis versus a simple alternative as given by

H′: f(x|H) = f(x, θ₀)
K: f(x|K) = f(x, θ₁)   (2.6-4)

The most powerful detector for testing H′ versus K as specified by Eq. (2.6-4) can be found by utilizing the Neyman-Pearson fundamental lemma. If the detector for testing H′ versus K, which is a member of the larger class of tests which satisfy Eq. (2.6-3), also belongs to the smaller class of tests which satisfy Eq. (2.6-2), then the detector for testing H′ versus K is also most powerful for testing H versus K as given by Eq. (2.6-1). In many detection problems, the composite hypothesis may be reduced to a simple one by taking a weighted average of the density under the hypothesis f(x|H) = f(x, θ), utilizing an assumed probability density h(θ) for θ. That is, a test is found among the larger class of tests which satisfy

∫_Ω [∫_{−∞}^∞ D(x) f(x, θ) dx] h(θ) dθ ≤ α   (2.6-5)

instead of among the smaller class which satisfies Eq. (2.6-2). This results in a new hypothesis of the form

f(x|H) = ∫_Ω f(x, θ) h(θ) dθ   (2.6-6)

If the density for θ is properly chosen, the detector for testing the new simple hypothesis and alternative will be most powerful for the original problem of testing a composite hypothesis versus a simple alternative. The question arises as to what criterion to use to select the proper value of θ₀ or correct density h(θ). In the case of a finite number of θ parameters, the following procedure is quite useful. Inspect each θ ∈ Ω and choose for θ₀ that θ which most resembles the alternative parameter θ₁; that is, select for θ₀ that θ ∈ Ω which would seem to be most difficult to test against θ₁. The resulting most powerful detector must then be checked to see if it satisfies Eq. (2.6-2) for every θ ∈ Ω. If it does, the value for θ, θ = θ₀, is called the least favorable choice of all θ ∈ Ω.
For the case where θ takes on a range of values, the density h(θ) should be chosen such that it gives as little information about θ as possible, and such that the weighted average over all θ ∈ Ω yields a simple hypothesis statement which is most difficult to test against the alternative. A most powerful test is found by the fundamental lemma, and it is checked to see if it satisfies Eq. (2.6-2) for every θ ∈ Ω. The distribution of θ, h(θ), which yields a test that satisfies Eq. (2.6-2) for every θ ∈ Ω is called the least favorable distribution of θ. The choice of a least favorable distribution of θ usually requires the system designer to have some intuitive feeling for the θ variation, such as symmetry or a specific order. When such intuitive guidelines are not available, the selection of a least favorable distribution is extremely difficult. A common example of the assignment of a least favorable distribution occurs in the problem of detecting narrowband, noncoherent pulses. The signal to be detected for this case has the form

s(t) = A(t) cos[ωt + φ]

where φ, the phase angle, is unknown. Since no information about φ is available, φ is usually assumed to have the density function

h(φ) = 1/(2π),   0 ≤ φ ≤ 2π
h(φ) = 0,        otherwise

which demonstrates maximum uncertainty in the value of φ, and hence gives little information concerning the unknown parameter. Helstrom [1968] proves that this distribution of φ is truly least favorable. In certain cases, one detector may maximize the power of the test for all F ∈ K even when K is composite. Detectors with this property are called uniformly most powerful (UMP). The concept of a uniformly most powerful test is best illustrated by means of two simple examples.
Example 2.6-1 Consider the problem of testing the simple hypothesis H versus the composite alternative K where H and K are given by

H: x = v
K: x = m + v

and x is the observation, v is zero mean Gaussian noise with variance σ², and m is a fixed but unknown parameter. The probability densities under H and K can be written as

f(x|H) = (1/(√(2π) σ)) exp(−x²/2σ²)

and

f(x|K) = (1/(√(2π) σ)) exp[−(x − m)²/2σ²]

Using these two densities and forming the likelihood ratio yields

exp{−[(x − m)² − x²]/2σ²} ≷_H^K T

which can be further simplified to yield

mx/σ² ≷_H^K ln T + m²/2σ² = T′

If m is assumed to be greater than zero, this result can be expressed by

x ≷_H^K σ²T′/m = T″   (2.6-7)

The threshold T″ must be chosen to fix the false alarm probability α at the desired level α₀. Writing the expression for α and equating it to the desired false alarm probability α₀ gives

α = ∫_{T″}^∞ (1/(√(2π) σ)) exp(−x²/2σ²) dx = α₀   (2.6-8)

Since the parameter m does not appear in the above expression for α₀, the threshold T″ is independent of m, and consequently, the test is independent of m. The test given by Eq. (2.6-7) with T″ determined from Eq. (2.6-8) is therefore the most powerful test of level α₀ independent of the value of m, and therefore, Eq. (2.6-7) defines a UMP test for testing H versus K. A second example will illustrate a simple detection problem where a UMP test does not exist.
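The UMP property of Example 2.6-1 can be seen numerically: the threshold fixing α₀ in Eq. (2.6-8) involves only σ, so one detector is simultaneously optimum for every m > 0. A sketch (σ = 1 and α₀ = 0.1 are illustrative choices of ours, not values from the text):

```python
from statistics import NormalDist

sigma, alpha0 = 1.0, 0.1          # illustrative values, not from the text
noise = NormalDist(0.0, sigma)

# Eq. (2.6-8): the threshold that fixes the false alarm rate involves
# only sigma, not the unknown m.
T = noise.inv_cdf(1.0 - alpha0)

# Power of the single detector x > T for several values of the unknown m:
powers = {m: 1.0 - NormalDist(m, sigma).cdf(T) for m in (0.5, 1.0, 2.0)}
assert all(p > alpha0 for p in powers.values())   # power exceeds the level

print(round(T, 3), {m: round(p, 3) for m, p in powers.items()})
```

The power grows with m, but the detector itself never changes, which is exactly the UMP property; contrast this with Example 2.6-2, where the threshold cannot be set without knowing m.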
Example 2.6-2 Consider testing the composite hypothesis H versus the composite alternative K shown below:

H: x = −m + v
K: x = m + v

where x is the observation, v is zero mean Gaussian noise with variance σ², and m is a fixed but unknown parameter. The probability densities under
H and K can be written as

f(x|H) = (1/(√(2π) σ)) exp[−(x + m)²/2σ²]

and

f(x|K) = (1/(√(2π) σ)) exp[−(x − m)²/2σ²]

Forming the likelihood ratio and simplifying yields

2mx/σ² ≷_H^K T′

Assuming m > 0, the test can be written in the form

x ≷_H^K σ²T′/2m = T″   (2.6-9)

Writing the expression for the false alarm probability and equating this expression to the desired level α₀ gives

α = ∫_{T″}^∞ f(x|H) dx = ∫_{T″}^∞ (1/(√(2π) σ)) exp[−(x + m)²/2σ²] dx = α₀

The above expression for α₀ clearly depends on the parameter m, and hence if T″ is chosen to fix α₀ utilizing this expression, T″ will also be a function of m. Since the threshold T″ and therefore the test depends on m, the test defined by Eq. (2.6-9) is not the most powerful test of level α₀ for all values of m, and thus is not a UMP test for testing H versus K. When a UMP test does not exist for a composite hypothesis or alternative, a detector which is most powerful in some suitably restricted sense may prove useful. Unbiased statistical tests are of great importance because there is a large class of problems for which a UMP unbiased test may be defined, even though a UMP test does not exist. Unbiasedness is a simple and reasonable condition to impose on a detector.

Definition: Unbiased Detector A detector D(x) is unbiased at level α if

Q_D(F ∈ H) ≤ α   (2.6-10a)

and

Q_D(F ∈ K) ≥ α   (2.6-10b)

In other words, a detector is unbiased if the probability of detection for all
F ∈ K is never less than the level of the test. The reasonableness of requiring a test to be unbiased can be seen by noting that if the conditions given in Eq. (2.6-10) are not satisfied, the probability of accepting H will be greater under some distributions contained in the alternative K than in some cases when the hypothesis H is true.

Definition: Optimum Unbiased Detector If 𝒜 is the class of unbiased detectors of level α, a detector D̂ is optimum unbiased, or UMP unbiased, for hypothesis H and alternative K at level of significance α if

Q_D̂(F ∈ K) ≥ Q_D(F ∈ K)   for all F ∈ K and D ∈ 𝒜   (2.6-11)

Although Eq. (2.6-10) states the concept of unbiasedness quite clearly, it is difficult to employ this definition to find most powerful unbiased tests because of the presence of the inequalities. For this reason, when it is desired to restrict the class of tests being considered to only unbiased tests, the concept of similar regions or similar tests is often useful.

Definition: Similar Tests A detector or test D(x) is said to be similar of size α for testing the hypothesis H if

Q_D(F) = ∫_{−∞}^∞ D(x) f(x|H) dx = ∫_{Γ_K} f(x|H) dx = α   (2.6-12)

for all possible F ∈ H [Kendall and Stuart, 1967; Fraser, 1957a; Lehmann, 1959]. From Eq. (2.6-12) it can be seen that for a test to be similar, the test must make incorrect decisions at the maximum allowable rate α for all densities that belong to the composite hypothesis. This does not seem to be an intuitively desirable property for a detector to have, and indeed, it is not. The importance of similar tests is that they provide a route by which an optimal unbiased test can be found, since unbiasedness is a much more reasonable property for a detector to possess. If an unrestricted UMP test for a composite hypothesis cannot be found and it is desired to find an optimal unbiased test for this hypothesis, the procedure is as follows. Find a detector that is most powerful similar for testing a boundary hypothesis (i.e., a hypothesis on the boundary between H and K) versus the original alternative. Since all unbiased tests of the original hypothesis are similar tests of the boundary hypothesis, if the test of the boundary hypothesis is most powerful similar, and it can be shown that this test is unbiased, then the test is optimal unbiased for the original composite hypothesis. Although this procedure may seem circuitous and unlikely to be very
useful, the fact is that similarity and unbiasedness are closely related, and unbiased tests can frequently be found by analyzing the similar regions of a particular hypothesis testing problem. In this book, a rigorous proof that certain tests are UMP similar will not be given, since the demonstration of such a property generally requires mathematics beyond the intended scope of the book. The concept of similarity will be used frequently, however, and in these cases, the required proofs will be referenced for the more mathematically advanced reader. A general example of the type of problem that can be solved by the above technique is that of testing the mean of a Gaussian distribution with unknown variance:

H: μ ≤ 0,   σ² is unknown

versus

K: μ > 0,   σ² is unknown

The hypothesis and alternative being tested are composite since it is desired to detect any positive mean value as opposed to any negative mean value, and not specific values of the mean. In addition, the variance is unknown. Since the hypothesis and alternative are composite and no fixed form can be written for the probability densities under H and K, the Neyman-Pearson lemma cannot be immediately employed to solve the problem. Suppose, however, that it is desired to limit the detectors considered to only unbiased tests, and a UMP detector within this limited class is required. The first step in solving this problem is to replace the unknown σ² with some estimate σ̂² and find a detector for testing the boundary hypothesis

H: μ = 0,   f Gaussian with σ² = σ̂²
K: μ = μ₁,   f Gaussian with σ² = σ̂²

by using the Neyman-Pearson lemma. If the resulting test statistic is independent of σ², then the detector is UMP similar for testing the original H and K. If the detector can then be shown to be unbiased, the required UMP unbiased detector for testing H versus K has been found. Locally most powerful (LMP) tests are useful when certain weak signal assumptions are valid. Locally optimum or LMP detectors maximize the power of the test for a subset of the probability distributions specified by the alternative K which are close in some sense to the hypothesis H. Since detectors are most useful under weak signal conditions, LMP detectors are very important. Two other properties of interest when considering nonparametric detectors are consistency and robustness.
Definition: Consistency  Considering the input data set x and a fixed alternative K, a detector D is said to be consistent if the probability of detection approaches one for any F ∈ K as the number of samples becomes large,

lim_{n→∞} Q_{D(x₁, …, xₙ)}(F) = 1    for all F ∈ K    (2.6-13)

The limiting process must be taken in such a manner that the detector has the same false alarm rate α under the hypothesis H for each n.

Some detectors, although not nonparametric in the sense already defined, are robust or asymptotically nonparametric.

Definition: Asymptotically Nonparametric  A detector D(x₁, …, xₙ) is robust or asymptotically nonparametric if

lim_{n→∞} Q_{D(x₁, …, xₙ)}(F) = α    for all F ∈ H    (2.6-14)
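Both asymptotic properties lend themselves to a quick simulation check. The sketch below (Python; the sum detector, parameter values, and function name are illustrative choices, not from the text) fixes the false alarm rate at α for every n and shows the detection probability under a fixed alternative climbing toward one, as consistency requires:

```python
import random
from math import sqrt
from statistics import NormalDist

def sum_detector_power(n, mu, sigma, alpha, trials=5000):
    """Monte Carlo detection probability of the detector sum(x) > T.
    T is set from the Gaussian quantile so that the false alarm rate
    under zero-mean Gaussian noise is alpha for every n."""
    T = sqrt(n) * sigma * NormalDist().inv_cdf(1.0 - alpha)
    hits = sum(
        sum(random.gauss(mu, sigma) for _ in range(n)) > T
        for _ in range(trials)
    )
    return hits / trials

random.seed(0)
# Fixed alternative mu = 0.5, fixed level alpha = 0.05: the power
# increases with n and approaches one.
powers = [sum_detector_power(n, 0.5, 1.0, 0.05) for n in (5, 20, 100)]
print(powers)
```

Because the threshold is recomputed for each n to hold the level at α, the increase of the detection probability toward one is exactly the limiting process of Eq. (2.6-13).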
2.7 Detector Comparison Techniques

In general, there are many possible solutions to a given detection problem, each of which may produce different results. In order to evaluate the performance of these detectors, some meaningful criterion must be established. Two types of comparisons are usually made: (1) a comparison of a parametric and a nonparametric detector under some subclass of the hypothesis and alternative, and (2) a comparison of two nonparametric detectors under the same hypothesis and alternative.

One method for comparing two detectors under the same hypothesis and alternative is to compute the relative efficiency of one detector with respect to the other. Many different definitions of relative efficiency exist in the statistical literature, but the definition most commonly used is due to Pitman [1948] and may be stated as follows:

Definition: Relative Efficiency  If two tests have the same hypothesis and the same significance level, and if for the same power with respect to the same alternative one test requires a sample of size N₁ and the other a sample of size N₂, then the relative efficiency of the first test with respect to the second test is given by the ratio e₁,₂ = N₂/N₁.

It is clear from the above definition that the relative efficiency is a function of the significance level, alternative K, and sample size of the two detectors and is therefore highly dependent on the experimental procedures used when taking the data. In order to completely compare two detectors using the definition given above, the relative efficiency must be computed for all possible values of α, K, N₁, and N₂. Besides the fact that this computation may be quite difficult, there does not usually exist for small N₂ a sample size N₁ such that the powers of the two detectors are exactly equal. To circumvent this problem, some kind of interpolation of the power of the second test as a function of N must be performed. Hodges and Lehmann [1956] have cited a specific example in which they compared two detectors utilizing linear interpolation to achieve the desired power and then compared their results with those of Dixon [1953, 1954], who utilized polynomial interpolation to obtain equal powers. The values obtained for the relative efficiency utilizing the two interpolation methods were markedly different, and the two methods did not even yield the same trend (increasing or decreasing in value) for the relative efficiency as the alternative was varied.

The finite sample relative efficiency is thus difficult to compute, highly dependent on experimental methods, and even peculiar to the mathematical techniques employed for its computation. For these reasons, a simpler expression somewhat equivalent to the finite sample relative efficiency is usually employed. The simpler expression for the relative efficiency is obtained by holding the significance level and power constant while the two sample sizes approach infinity and the alternative approaches the hypothesis. This technique yields the concept of asymptotic relative efficiency (ARE).

Definition: Asymptotic Relative Efficiency  The ARE of a detector D₁ with respect to detector D₂ can be written as

E₁,₂ = lim_{n₁→∞, n₂→∞, K→H} n₂/n₁    (2.7-1)

where n₁ and n₂ are the smallest number of samples necessary for the two detectors to achieve a power of 1 − β for the same hypothesis, alternative, and significance level. The alternative K must be allowed to approach the hypothesis because, for the case of consistent detectors against fixed alternatives for large sample sizes, the power of the detectors approaches 1. If K is allowed to approach H as n₁ and n₂ → ∞, the power of the detectors with respect to the changing alternative converges to a value between the significance level α and 1. In many cases the ARE defined by Eq. (2.7-1) is just a single, easy-to-use number, independent of α. The ARE of two detectors can also be written in terms of the ratio of the efficacies of the two detectors if certain regularity conditions are satisfied
[Capon, 1959a]. Given a test statistic Sₙ for deciding between the hypothesis H: θ = 0 (signal absent) and the alternative K: θ > 0 (signal present), the regularity conditions can be stated as:

(1) Sₙ is asymptotically Gaussian when H is true with mean and variance E{Sₙ|H} and var{Sₙ|H}, respectively;
(2) Sₙ is asymptotically Gaussian when K is true with mean and variance E{Sₙ|K} and var{Sₙ|K}, respectively;
(3) lim_{K→H} var{Sₙ|K}/var{Sₙ|H} = 1;
(4) for weak signals, E{Sₙ|K} can be expanded about θ = 0 as E{Sₙ|K} ≈ E{Sₙ|H} + θ [∂E{Sₙ|K}/∂θ]_{θ=0};
(5) the limit

ξ_S = lim_{n→∞} [∂E{Sₙ|K}/∂θ]²_{θ=0} / (n var{Sₙ|H})

exists and is a positive constant independent of n and dependent only on the probability densities under H and K;
(6) [∂E{Sₙ|K}/∂θ]_{θ=0} exists and is positive for all n sufficiently large;
(7) lim_{n→∞} var{Sₙ|H} = 0.

The regularity conditions are essentially equivalent to requiring that the detector test statistic be consistent as defined by Eq. (2.6-13) [Capon, 1959b]. The expression in regularity condition (5) is called the efficacy of the detector. It can be shown that if the test statistics Sₙ and Sₙ* for two detectors satisfy the regularity conditions, then the ARE of the Sₙ detector with respect to the Sₙ* detector is equal to the ratio of the efficacies of Sₙ and Sₙ*, that is [Pitman, 1948; Capon, 1960],

E_{S,S*} = ξ_S / ξ_{S*}    (2.7-2)

The definitions of the asymptotic relative efficiency in Eqs. (2.7-1) and (2.7-2) and the regularity conditions listed above are valid only for detectors which have one input. Since both one- and two-input detectors are treated in the sequel, it is necessary to state the analogous definitions for two-input detectors.
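As a concrete illustration of Eq. (2.7-2), consider zero-mean Gaussian noise of variance σ² and a shift alternative θ. For the linear detector Sₙ = Σxₖ, E{Sₙ|K} = nθ and var{Sₙ|H} = nσ²; for a detector that merely counts positive samples, E{Sₙ|K} = nΦ(θ/σ) and var{Sₙ|H} = n/4. The ratio of the two efficacies recovers the classical ARE of 2/π for the sign-type detector relative to the linear detector. A numerical sketch (Python; the function names are ours, and the derivative in the efficacy is taken by central difference):

```python
import math

def efficacy(mean_fn, var_h, n, eps=1e-6):
    """Efficacy: [d/dtheta E{S_n|K} at theta=0]^2 / (n * var{S_n|H}),
    with the derivative estimated by a central difference."""
    d = (mean_fn(eps) - mean_fn(-eps)) / (2.0 * eps)
    return d * d / (n * var_h)

n, sigma = 1000, 1.0
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # Gaussian cdf

# Linear detector S_n = sum(x_k): mean n*theta, var{S_n|H} = n*sigma^2.
xi_linear = efficacy(lambda th: n * th, n * sigma**2, n)

# Sign-type detector S_n = #{x_k > 0}: mean n*Phi(theta/sigma), var n/4.
xi_sign = efficacy(lambda th: n * Phi(th / sigma), n / 4.0, n)

are = xi_sign / xi_linear
print(xi_linear, xi_sign, are)  # are ≈ 2/pi ≈ 0.6366
```

Note that both efficacies are independent of n, as regularity condition (5) requires; only the noise densities under H and K enter.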
Definition: ARE for Two-Input Detectors  The asymptotic relative efficiency of detector D₃ with respect to detector D₄ is given by

E₃,₄ = lim_{m₃+n₃→∞, m₄+n₄→∞, K→H} (m₄ + n₄)/(m₃ + n₃)    (2.7-3)
where m₃, n₃ and m₄, n₄ are the sample sizes of the two sets of observations for detectors D₃ and D₄, respectively. Of course, the ARE in Eq. (2.7-3) can also be written in terms of the efficacies of the two detectors if the following regularity conditions are satisfied. Given a test statistic S_{mn} for testing the hypothesis H: θ = 0 versus the alternative K: θ > 0, θ a signal-to-noise ratio parameter, the regularity conditions can be stated as:

(1′) S_{mn} is asymptotically Gaussian when H is true with mean and variance E{S_{mn}|H} and var{S_{mn}|H}, respectively;
(2′) S_{mn} is asymptotically Gaussian when K is true with mean and variance E{S_{mn}|K} and var{S_{mn}|K}, respectively;
(3′) lim_{K→H} var{S_{mn}|K}/var{S_{mn}|H} = 1;
(4′) for weak signals, E{S_{mn}|K} can be expanded about θ = 0 as E{S_{mn}|K} ≈ E{S_{mn}|H} + θ [∂E{S_{mn}|K}/∂θ]_{θ=0};
(5′) the limit

ξ_S = lim_{m,n→∞} [∂E{S_{mn}|K}/∂θ]²_{θ=0} / ((m + n) var{S_{mn}|H})

exists and is a positive constant independent of m and n, and depends only on the probability densities under H and K;
(6′) [∂E{S_{mn}|K}/∂θ]_{θ=0} exists and is positive for all m, n sufficiently large;
(7′) lim_{m,n→∞} var{S_{mn}|H} = 0.
When two test statistics S_{mn} and S*_{mn} satisfy the regularity conditions, the ARE of the detector based on S_{mn} with respect to the S*_{mn} detector is equal to the ratio of the efficacies of S_{mn} and S*_{mn}, which is

E_{S,S*} = ξ_S / ξ_{S*}    (2.7-4)

Perhaps not surprisingly, the efficacy can be related to the distinguishability of the test statistic probability densities under the hypothesis and alternative [Hancock and Lainiotis, 1965]. In order to demonstrate this fact, consider a one-input detector with test statistic Sₙ and define as a measure of separability the quantity

SNR_out = [E{Sₙ|K} − E{Sₙ|H}]² / var{Sₙ|K}    (2.7-5)

which is designated the output signal-to-noise ratio (SNR_out) for reasons that will become evident momentarily. That Eq. (2.7-5) is a valid measure of the distinguishability of the test statistic probability densities when H and K are true can be seen by noting that, as the difference in the locations
or mean values becomes larger or the variance of the test statistic under K becomes smaller, SNR_out in Eq. (2.7-5) increases. Define further the input signal-to-noise ratio as

SNR_in = θ    (2.7-6)

where θ = 0 when H is true, and θ > 0 when K is true. The interpretation of Eqs. (2.7-5) and (2.7-6) as signal-to-noise ratios is now clearer, since θ is obviously a measure of the SNR at the detector input and Eq. (2.7-5) is the SNR after the input has been operated on by the detector.

Returning to the goal of relating SNR_out to the efficacy, consider the regularity condition (4), which becomes for weak signals (θ² ≪ 1)

E{Sₙ|K} − E{Sₙ|H} ≈ θ [∂E{Sₙ|K}/∂θ]_{θ=0}    (2.7-7)

Substituting Eq. (2.7-7) into Eq. (2.7-5) and using the regularity condition (3) to replace var{Sₙ|K} by var{Sₙ|H} gives

SNR_out ≈ θ² [∂E{Sₙ|K}/∂θ]²_{θ=0} / var{Sₙ|H}    (2.7-8)

so that

SNR_out / (n SNR²_in) ≈ [∂E{Sₙ|K}/∂θ]²_{θ=0} / (n var{Sₙ|H})    (2.7-9)

and, from the regularity condition (5),

ξ_S = lim_{n→∞} lim_{θ→0} SNR_out / (n SNR²_in)    (2.7-10)
The efficacy of a detector test statistic is thus a measure of the improvement in SNR afforded by the detector for θ small and n large. The efficacy is extremely useful for calculating the ARE of various nonparametric detectors, and its use is demonstrated in later chapters.

The ARE is in general independent of α, and hence the ARE is not dependent on experimental methods as the finite sample relative efficiency is. The ARE has two obvious deficiencies, however. Since no detector is able to base its decision on an infinite number of observations, the ARE does not give the engineer or statistician the relative efficiency of the two detectors for the finite number of samples used in the detection procedure. The requirement that K → H is also a severe limitation, since few practical problems require the testing of a hypothesis and alternative that differ only infinitesimally from each other. How useful then is the ARE as a criterion for comparing detectors when the number of observations is not large or
when the alternative is not close to the hypothesis? Calculations of small sample relative efficiencies using a common test statistic and similar test conditions indicate that the ARE and the small sample relative efficiency in general rank the tests in the same order; however, the actual values of the ARE should be used only as an approximation of the small sample relative efficiency of two detectors.

When the ARE is used to compare two detectors, the detector used as the basis for the comparison is usually chosen to be the optimum, or most powerful, detector, assuming one exists, for the hypothesis and alternative being tested. When this is the case, the ARE of the nonoptimum detector with respect to the optimum detector is some number less than one. If, however, deviations from the assumptions on which the optimum detector is based occur, the ARE of the nonoptimum detector with respect to the optimum detector may then become greater than one. This is because the optimum detector may be highly susceptible to changes in design conditions, whereas the suboptimum detector may be based on more general assumptions and thus be less sensitive to environmental changes. It should also be noted when comparing a suboptimum nonparametric detector to an optimum parametric detector that the ARE is, in general, a lower bound for the relative efficiency. The relative efficiency tends to be largest when the sample size is small and then decreases to the lower bound given by the ARE as the number of samples becomes large. This statement is only true, however, for vanishingly small signals; that is, K close to H.

2.8 Summary
In this chapter the basic detection theory concepts have been introduced, several approaches to the detection problem have been discussed, and various detector evaluation and comparison techniques have been described. The initial discussion explained the difference between simple and composite hypotheses and alternatives and gave examples of physical situations where simple and composite hypotheses arise. The false alarm and detection probabilities were then defined and their use in the receiver operating characteristic was explained. The importance and generality of the receiver operating characteristic for evaluating the performance of detectors was described in detail and several properties of the ROC were pointed out. The Bayes criterion, the ideal observer criterion, and the Neyman-Pearson criterion were introduced and developed. The applicability of the Bayes criterion, which minimizes the average risk, and of the ideal observer
criterion, which minimizes the total average probability of error, was discussed. The use of the Neyman-Pearson criterion when costs and a priori probabilities are unknown or when the false alarm probability must be maintained at or below some predetermined level was described. The Neyman-Pearson fundamental lemma was derived and the sufficiency of the Neyman-Pearson lemma for defining a most powerful test was proven. A modification to the Neyman-Pearson lemma for determining a most powerful test when the hypothesis or alternative is composite was presented and discussed, thus introducing the concept of a least favorable distribution. The important concept of a UMP test was introduced and two examples were given to further clarify the subject. Since UMP tests cannot always be found, optimum unbiased and locally most powerful tests were defined and their applicability discussed. The two asymptotic properties of consistency and robustness were also defined. The important problem of comparing two detectors was then considered with the concepts of finite sample relative efficiency, asymptotic relative efficiency (ARE), and efficacy being introduced. The difficulty of computing the finite sample relative efficiency due to its dependence on the significance level, alternative, and sample size was pointed out. Two expressions for the ARE were given, one in terms of efficacies, and the usefulness of the ARE described. The regularity conditions necessary for the relation between the ARE and efficacies to hold were given.
CHAPTER 3

One-Input Detectors

3.1 Introduction
This chapter is concerned with introducing and developing several one-input parametric and nonparametric detectors that can be utilized in communication and radar systems. The term one-input as used here is intended to delineate the number of sets of input data samples, not the number of data samples themselves. In this chapter parametric detectors for testing for a positive signal embedded in Gaussian noise for various simple and composite hypotheses and alternatives are derived utilizing the Neyman-Pearson fundamental lemma. The parametric detectors are developed primarily to be used as a baseline for the evaluation of the performance of the nonparametric detectors in Chapter 4 and to illustrate the application of the Neyman-Pearson lemma. The Student’s t test, the uniformly most powerful unbiased detector for testing for a positive shift of the mean when the noise is Gaussian with an unknown variance, is derived and shown to be asymptotically nonparametric and consistent. Six nonparametric detectors, namely the sign detector, the Wilcoxon detector, the normal scores test, Van der Waerden’s test, the Spearman rho detector, and the Kendall tau detector, are presented and discussed. The basic form of each of the detector test statistics is given and the computation of each test statistic is illustrated. The important properties and the advantages and disadvantages of each detector with respect to implementation and performance are also discussed.
3.2 Parametric Detectors

One of the classic problems in detection theory, and one that has received much attention in the literature, is the detection of a deterministic signal in additive, white Gaussian noise. The noise, which will be denoted by v, is usually assumed to be zero mean with known variance σ₀². The input process to the detector is sampled at intervals such that the observations obtained are independent. The input to the detector is denoted by x = {x₁, x₂, …, xₙ}, assuming n samples of the input to have been taken, and x will consist of either noise alone, xᵢ = vᵢ, or signal plus noise, xᵢ = sᵢ + vᵢ. The noise alone possibility is called the null hypothesis, or simply hypothesis, and is denoted by H₀. The signal plus noise possibility is called the alternative hypothesis, or alternative, and is denoted by K₀. If F(x) is the cumulative distribution function, μ is the mean, and σ² is the variance of the input, the detection problem may be written as testing

H₀: F(xᵢ) is Gaussian with μᵢ = 0, σᵢ² = σ₀²

versus

K₀: F(xᵢ) is Gaussian with μᵢ = sᵢ, σᵢ² = σ₀²

for i = 1, 2, …, n. The problem is therefore simply one of testing the mean of a normal distribution. Since each of the xᵢ are independent and identically distributed, the joint density of the xᵢ under the hypothesis H₀, which will be denoted by f(x|H₀), equals the product of the n identical densities; thus

f(x|H₀) = (2πσ₀²)^{−n/2} exp{ −Σ_{k=1}^n xₖ² / (2σ₀²) }    (3.2-1)

Similarly, the density under the alternative K₀, denoted by f(x|K₀), can be written

f(x|K₀) = (2πσ₀²)^{−n/2} exp{ −Σ_{k=1}^n (xₖ − sₖ)² / (2σ₀²) }    (3.2-2)

Utilizing the Neyman-Pearson fundamental lemma, the likelihood ratio can now be formed,

L(x) = f(x|K₀) / f(x|H₀) = exp{ Σ_{k=1}^n (2sₖxₖ − sₖ²) / (2σ₀²) }    (3.2-3)
which is then compared to a threshold T,

L(x) ≷ T,  deciding K₀ if L(x) > T and H₀ if L(x) < T    (3.2-4)

To simplify Eq. (3.2-4), take the natural logarithm of both sides,

Σ_{k=1}^n (2sₖxₖ − sₖ²) / (2σ₀²) ≷ ln T = T′

and rearrange the terms,

2 Σ_{k=1}^n sₖxₖ ≷ 2σ₀²T′ + Σ_{k=1}^n sₖ²

Since each sₖ is a known signal, Σ_{k=1}^n sₖ² is known and is the energy in the transmitted signal. The above expression can now be rewritten as

Σ_{k=1}^n sₖxₖ ≷ T₀

The optimum detector under the Neyman-Pearson lemma is therefore

Σ_{k=1}^n sₖxₖ < T₀ ⟹ D₀(x) = 0, accept H₀
Σ_{k=1}^n sₖxₖ > T₀ ⟹ D₀(x) = 1, accept K₀    (3.2-5)
where the threshold T₀ is determined by setting the false alarm probability α.

A somewhat simpler detection problem results when all of the sₖ are assumed to be equal to some unknown positive constant C. The detection problem then becomes that of testing

H₁: F(xᵢ) is Gaussian with μᵢ = 0, σᵢ² = σ₀²

versus

K₁: F(xᵢ) is Gaussian with μᵢ = C > 0, σᵢ² = σ₀²

where i = 1, 2, …, n. The detector given by Eq. (3.2-5) becomes
C Σ_{k=1}^n xₖ ≷ T₀

or, since C > 0, equivalently with T₁ = T₀/C,

Σ_{k=1}^n xₖ < T₁ ⟹ D₁(x) = 0, accept H₁
Σ_{k=1}^n xₖ > T₁ ⟹ D₁(x) = 1, accept K₁    (3.2-6)
which is sometimes called the linear detector. As before, the threshold T₁ may be determined under the Neyman-Pearson criterion by fixing the false alarm probability α. To determine T₁, the density of the test statistic under the hypothesis H₁ must be obtained. Since each of the xᵢ have a Gaussian distribution with zero mean and variance σ₀², the statistic S = Σ_{k=1}^n xₖ has a Gaussian distribution with zero mean and variance nσ₀²,

f(S|H₁) = (2πnσ₀²)^{−1/2} exp{ −S² / (2nσ₀²) }

The expression for the false alarm probability may then be written

α = P{S > T₁|H₁} = ∫_{T₁}^{∞} (2πnσ₀²)^{−1/2} exp{ −S² / (2nσ₀²) } dS    (3.2-7)

By setting α = α₀, the threshold T₁ may be determined from Eq. (3.2-7). Making a change of variables with y = S/(√n σ₀),

α₀ = ∫_{T₁/(√n σ₀)}^{∞} (2π)^{−1/2} exp(−y²/2) dy = 1 − Φ[T₁/(√n σ₀)]    (3.2-8)

where Φ[·] is the cumulative distribution function of a zero mean Gaussian random variable with unit variance. Solving Eq. (3.2-8) for T₁ yields

T₁ = √n σ₀ Φ⁻¹[1 − α₀]    (3.2-9)
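The threshold formula of Eq. (3.2-9) is easy to verify numerically. The sketch below (Python standard library; n, σ₀, and α₀ are arbitrary illustrative values, and the function name is ours) computes T₁ and confirms by Monte Carlo that the false alarm rate under H₁ is close to α₀:

```python
import random
from math import sqrt
from statistics import NormalDist

def linear_detector_threshold(n, sigma0, alpha0):
    """T1 = sqrt(n) * sigma0 * Phi^{-1}(1 - alpha0), Eq. (3.2-9)."""
    return sqrt(n) * sigma0 * NormalDist().inv_cdf(1.0 - alpha0)

n, sigma0, alpha0 = 25, 2.0, 0.05
T1 = linear_detector_threshold(n, sigma0, alpha0)

# Monte Carlo estimate of the false alarm rate under H1 (noise alone).
random.seed(1)
trials = 20000
false_alarms = sum(
    sum(random.gauss(0.0, sigma0) for _ in range(n)) > T1
    for _ in range(trials)
)
print(T1, false_alarms / trials)  # rate close to alpha0 = 0.05
```

The same skeleton applies to the correlation detector of Eq. (3.2-5): replacing the sum by Σsₖxₖ changes only the variance nσ₀² to σ₀²Σsₖ² in the threshold computation.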
A slightly more general detection problem is obtained if the noise power σ₀² is assumed to be unknown. The hypothesis and alternative to be tested are then

H₂: F(x) is Gaussian with μ = 0 and unknown variance σ²

K₂: F(x) is Gaussian with μ > 0 and unknown variance σ²

Lehmann and Stein [1948] have shown that for α < ½ an optimum detector for testing H₂ versus K₂ does not exist in the unrestricted sense. In the same paper they show that for α > ½ a detector which implements the
Student’s t test is uniformly most powerful for testing H₂ versus K₂. Since false alarm levels greater than one-half are of little practical interest, a detector which is optimum in a restricted sense will now be considered.

If attention is limited to the class of unbiased rules, the optimum detector for testing H₂ versus K₂ may be derived as follows. The input to the detector is assumed to be n independent and identically distributed random variables xᵢ, i = 1, 2, …, n. The density function of the xᵢ under the hypothesis H₂ is therefore given by

f(x|H₂) = (2πσ²)^{−n/2} exp{ −Σ_{j=1}^n xⱼ² / (2σ²) }    (3.2-10)

and similarly the joint density of the xᵢ under K₂ may be written as

f(x|K₂) = (2πσ²)^{−n/2} exp{ −Σ_{j=1}^n (xⱼ − μ)² / (2σ²) }    (3.2-11)
Since the densities of μ and σ² are unknown, these parameters will be estimated by maximizing the two density functions in Eqs. (3.2-10) and (3.2-11) with respect to the unknown parameters. Under the hypothesis H₂, σ² can be estimated by utilizing

(∂/∂σ²) ln f(x|H₂) = 0    (3.2-12)

Taking the natural logarithm of Eq. (3.2-10) and substituting into Eq. (3.2-12),

(∂/∂σ²) ln f(x|H₂) = −n/(2σ²) + Σ_{j=1}^n xⱼ² / (2σ⁴)

Equating this to zero and solving for σ² yields

σ̂²_H = (1/n) Σ_{j=1}^n xⱼ²    (3.2-13)

as the estimate of the variance under H₂. The estimates of μ and σ² under K₂ are found by simultaneously solving the equations

∂ ln f(x|K₂)/∂μ = 0  and  ∂ ln f(x|K₂)/∂σ² = 0

Let us first consider the partial derivative with respect to μ:

∂ ln f(x|K₂)/∂μ = Σ_{j=1}^n (xⱼ − μ) / σ²
Equating this result to zero and solving for μ gives

μ̂_K = (1/n) Σ_{j=1}^n xⱼ    (3.2-14)

which is independent of the variance. Similarly for the estimate of σ² under K₂,

∂ ln f(x|K₂)/∂σ² = −n/(2σ²) + Σ_{j=1}^n (xⱼ − μ)² / (2σ⁴)

Equating this result to zero and rearranging yields

σ̂²_K = (1/n) Σ_{j=1}^n (xⱼ − μ̂_K)²    (3.2-15)

where μ̂_K is given by Eq. (3.2-14). Forming the likelihood ratio

L(x) = f(x|K₂) / f(x|H₂)

and substituting the estimates of μ and σ² into this expression,

L(x) = [ Σ_{j=1}^n xⱼ² / Σ_{j=1}^n (xⱼ − μ̂_K)² ]^{n/2}    (3.2-16)
Equation (3.2-16) can be rewritten by using

Σ_{i=1}^n xᵢ² = Σ_{i=1}^n (xᵢ − μ̂_K)² + n μ̂_K²

Using this expression, the likelihood ratio becomes

L(x) = [ 1 + n μ̂_K² / Σ_{i=1}^n (xᵢ − μ̂_K)² ]^{n/2}    (3.2-17)

The quantity in the brackets is 1 + [t²/(n − 1)], where t has the t distribution with n − 1 degrees of freedom when H₂ is true. Since L(x) in Eq. (3.2-17) is a monotonic function of t², the test based on t² is equivalent to the likelihood ratio test defined by Eq. (3.2-17). Since t² = 0 when L(x) = 1, and t² becomes infinite when L(x) approaches infinity, a critical region of the form L(x) > A > 1 is equivalent to a critical region t² > B. The critical region is therefore defined by the extreme values of t, both positive and negative. Since the alternative K₂ is concerned only with positive signals and since t² is a monotonic function of t, the test statistic for the hypothesis H₂ versus the alternative K₂ becomes

t = √n μ̂_K / [ Σ_{i=1}^n (xᵢ − μ̂_K)² / (n − 1) ]^{1/2} ≷ T    (3.2-18)
Since under H₂ the test statistic t has a limiting Gaussian distribution with zero mean and unit variance, the threshold T may be approximately determined from

Φ(T) = 1 − α₀    (3.2-19)

for a large number of observations. The probability of detection or power of the test can then be found from the expression 1 − β = 1 − Φ[T − μ], since the test statistic has a Gaussian distribution with mean μ and unit variance under K₂. For small n, T can be exactly determined by utilizing the equation

α₀ = P{t > T|H₂}

where α₀ is the desired false alarm rate, by using tables of the t distribution. Knowing T, the power of the test can be found from

1 − β = P{t > T|K₂} = 1 − P{t < T|K₂}

using tables of the noncentral t distribution (see Hogg and Craig [1970] for a discussion of the noncentral t distribution). Since the threshold T is chosen according to the Neyman-Pearson criterion utilizing the distribution of the Student’s t statistic given by the left-hand side of Eq. (3.2-18), the false alarm probability clearly does not depend on the parameters μ or σ². The Student’s t test given by Eq. (3.2-18) is therefore invariant for any parameter values under H₂ and K₂. Even though the Student’s t test has been derived utilizing a generalized form of the Neyman-Pearson lemma, the use of maximum likelihood estimates in the probability densities prevents the Student’s t test from being UMP for testing H₂ versus K₂. Indeed, all that the calculation which led to Eq. (3.2-18) has done is to illustrate how the Student’s t test might be derived; no optimum properties can be inferred from the derivation, even for a restricted class of tests. Fraser [1957a] has shown that the Student’s t test given by Eq. (3.2-18) is uniformly most powerful similar for testing H₂ versus K₂. By the discussion of similarity and unbiasedness in Section 2.6, the Student’s t test will be the UMP unbiased test of H₂ versus K₂ if it can be shown that the Student’s t test statistic in Eq. (3.2-18) is unbiased. As stated in Section 2.6, a detector is said to be unbiased at level α if

Q_D(F) ≤ α  for all F ∈ H

and if

Q_D(F) ≥ α  for all F ∈ K
To check to see if the Student’s t test is unbiased, let t_α be chosen such that

P{S > t_α} = α    (3.2-20)

when μ = 0, and where S is defined by

S = √n x̄ / [ Σ_{i=1}^n (xᵢ − x̄)² / (n − 1) ]^{1/2}

Consider a value of μ > 0; then the distribution of S is shifted toward larger values, so that P{S > t_α} > α.
Therefore,

Q_D(F) > α  for all F ∈ K₂    (3.2-21)

and from Eq. (3.2-20),

E{D(x)|H₂} = Q_D(F) = α  for all F ∈ H₂    (3.2-22)
The Student’s t test is therefore the UMP unbiased test of level α for testing the hypothesis H₂ versus the alternative K₂.

To evaluate the performance of the Student’s t test for a nonparametric problem, let the form of the underlying noise distribution be unknown, so that the hypothesis and alternative become

H₃: μ = 0, σ² finite, F(x) otherwise unknown

K₃: μ > 0, σ² finite, F(x) otherwise unknown
Since the detector defined by Eq. (3.2-18) is the optimum unbiased detector under the assumption of Gaussian noise, the false alarm rate will not in general remain constant for arbitrary input noise distributions. As the number of input observations becomes large, however, the Student’s t test exhibits an interesting property. The bracketed quantity in the denominator of Eq. (3.2-18) can be shown to converge in probability to σ² as the number of observations becomes infinite. By the central limit theorem, which states that the distribution of the sum of n independent and identically distributed random variables is asymptotically Gaussian [Hogg and Craig, 1970], Σxᵢ/√n has a limiting Gaussian distribution with zero mean and variance σ² under H₃. The test statistic on the left-hand side of Eq. (3.2-18) thus has a limiting Gaussian distribution with zero mean and unit variance under H₃. Since the Student’s t test statistic is asymptotically Gaussian, if the threshold T is chosen according to Eq. (3.2-19), the false alarm probability will remain constant for arbitrary input noise distributions if the number of samples is large. The Student’s t test is therefore asymptotically nonparametric. Since Σxᵢ/√n has a limiting Gaussian distribution with mean √n μ and variance σ² under the alternative K₃, the mean of the test statistic in Eq. (3.2-18) approaches infinity as n → ∞, and the probability of detection goes to one. The Student’s t test is thus consistent for the alternative K₃ by Eq. (2.6-13). The Student’s t test is therefore the optimum unbiased detector for the hypothesis H₂ versus the alternative K₂ and an asymptotically nonparametric detector for testing the hypothesis H₃ versus the alternative K₃. Since the detector is only asymptotically nonparametric for H₃ versus K₃, the false alarm rate will not remain constant, and may be significantly greater than the desired level, for small n and non-Gaussian input noise.
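This asymptotic behavior can be sketched in a few lines (Python; the zero-mean uniform noise, sample size, and trial count are arbitrary illustrative choices): the statistic of Eq. (3.2-18) is thresholded at the Gaussian quantile of Eq. (3.2-19), and its false alarm rate under non-Gaussian noise is estimated by Monte Carlo.

```python
import random
from math import sqrt
from statistics import NormalDist

def t_statistic(x):
    """Left-hand side of Eq. (3.2-18): sqrt(n)*xbar / s, where s^2 is
    the sample variance with divisor n - 1."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    return sqrt(n) * xbar / sqrt(s2)

alpha0 = 0.05
T = NormalDist().inv_cdf(1.0 - alpha0)  # large-n threshold, Eq. (3.2-19)

def false_alarm_rate(n, trials=5000):
    """False alarm rate under zero-mean uniform (non-Gaussian) noise."""
    return sum(
        t_statistic([random.uniform(-1.0, 1.0) for _ in range(n)]) > T
        for _ in range(trials)
    ) / trials

random.seed(2)
rate = false_alarm_rate(100)
print(rate)  # close to 0.05: the test is asymptotically nonparametric
```

Rerunning with small n (and heavier-tailed noise) shows the rate drifting from α₀, in line with the caveat above about small samples.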
Bradley [1963] has noted and studied an interesting phenomenon that occurs when the Student’s t statistic is being used to test for a location shift of a probability density function. When testing for a location shift, it seems intuitive that the hypothesis should be accepted or rejected according to whether the sample mean, x̄ = Σ_{i=1}^n xᵢ/n, essentially the numerator of Eq. (3.2-18), is small or large, respectively. In H₂ and K₂, however, the variance is unknown and thus must be estimated from the set of observations. The test statistic in Eq. (3.2-18) therefore depends not only on the sample mean but also on the sample variance. The statistic may thus exceed the threshold not due to a positive location shift, but due to a small value of the sample variance in the denominator. Large values of the test statistic so obtained are called irrelevant, since the hypothesis is rejected not due to a change in the characteristic being tested but due to a chance value of a parameter which has no relationship to the hypothesis being tested. Such a parameter is sometimes called a nuisance parameter. Bradley’s results show that irrelevant values of the test statistic adversely affect the power of the test for small sample sizes, with the effects of irrelevance virtually negligible for large sample sizes. This result is intuitive, since one would expect the sample variance to approach the true variance at large sample sizes.
3.3 Sign Detector

For a detector to be classified as nonparametric or distribution-free, its false alarm rate must remain constant for a broad class of input noise distributions. A detector whose test statistic uses the actual amplitudes of the input observations is not likely to have a constant false alarm rate for a small number of observations. This is clear from intuition and from the previous discussion of parametric detectors and the Student’s t test. The use of the input observation amplitudes may be avoided by discarding all amplitude information and using only polarity information, or by ranking the input data and using the relative ranks and polarities in the detection process.

The simplest nonparametric detector would logically seem to be one that utilizes only the polarities of the input data observations. A detector which bases its decision on the signs of the input data may be derived as follows. Let the input data consist of n independent and identically distributed samples of the input process denoted by x = {x₁, x₂, …, xₙ}. The detection problem consists of deciding whether noise alone is present or noise plus a positive signal is present. If the probability p is defined by

p = P{xᵢ > 0} = 1 − F_x(0)    (3.3-1)
where F_x(xᵢ) is the cumulative probability distribution of the xᵢ, the hypothesis and alternative to be tested may be stated as

H₄: p = 1/2, F otherwise unknown
K₄: p > 1/2, F otherwise unknown    (3.3-2)

Since the hypothesis and alternative described above are composite, an optimum test cannot be found by a straightforward application of the Neyman-Pearson lemma. If, however, an optimum test of level α is found for testing the simple hypothesis F(x|H₄) versus the simple alternative F(x|K₄), and it can be shown that this test maintains a level ≤ α for all densities under H₄, then the test is also most powerful for testing H₄ versus K₄. This is simply an application of the modification to the Neyman-Pearson lemma described in Section 2.6, where the least favorable distribution of the unknown parameter is found.

To find the most powerful test of H₄ versus K₄, let f⁺ be the conditional density of xᵢ given that the input is positive and f⁻ the conditional density given that the input is negative,

f⁺ = f(xᵢ | xᵢ > 0);  f⁻ = f(xᵢ | xᵢ < 0)    (3.3-3)

The f⁺ and f⁻ densities are illustrated in Figs. 3.3-1a and 3.3-1b, respectively.

Fig. 3.3-1. Sketches of f⁺ and f⁻.

A typical density under the alternative, f(xᵢ|K₄), can be written as

f(xᵢ|K₄) = p f⁺ + (1 − p) f⁻    (3.3-4)

To choose the density f(xᵢ|H₄) which yields a most powerful test for all densities in H₄, the procedure described in Section 2.6 is sometimes useful. Since H₄ provides little information concerning f(xᵢ|H₄), the density f(xᵢ|H₄) should provide as little help as possible for testing against f(xᵢ|K₄); f(xᵢ|H₄) should therefore be as close to f(xᵢ|K₄) as possible. Let

f(xᵢ|H₄) = ½ (f⁺ + f⁻)    (3.3-5)

The density f(xᵢ|H₄) has zero median, and hence belongs to H₄; and f(xᵢ|H₄) closely resembles f(xᵢ|K₄), since f(xᵢ|H₄) has the same shape as f(xᵢ|K₄) on the positive and negative axes.
Since the xᵢ are independent and identically distributed, the likelihood ratio may be written as

L(x) = Π_{i=1}^n [ f(xᵢ|K₄) / f(xᵢ|H₄) ]    (3.3-6)

The ratio of the signal present density to the signal absent density in Eq. (3.3-6) has different values when the input is positive and when it is negative:

f(xᵢ|K₄) / f(xᵢ|H₄) = 2p  for xᵢ > 0    (3.3-7)

f(xᵢ|K₄) / f(xᵢ|H₄) = 2(1 − p)  for xᵢ < 0
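Since each positive observation multiplies the likelihood ratio by 2p and each negative one by 2(1 − p), and 2p > 1 under K₄, the ratio is monotonically increasing in the number of positive observations; the test therefore reduces to comparing a count of signs against a threshold set from the Binomial(n, 1/2) distribution that holds under H₄. A sketch (Python; the routine and parameter values are ours — note that, the binomial being discrete, only a discrete set of false alarm levels is exactly achievable without randomization):

```python
import random
from math import comb

def sign_detector_threshold(n, alpha0):
    """Smallest integer T with P{#positives >= T | H4} <= alpha0, where
    the count of positive samples is Binomial(n, 1/2) under H4."""
    tail = 0.0
    for t in range(n, -1, -1):
        tail += comb(n, t) * 0.5 ** n
        if tail > alpha0:
            return t + 1
    return 0

n, alpha0 = 20, 0.05
T = sign_detector_threshold(n, alpha0)  # T = 15 for n = 20, alpha0 = 0.05

# Apply the detector to one signal-plus-noise record (illustrative values).
random.seed(3)
x = [random.gauss(0.3, 1.0) for _ in range(n)]
n_pos = sum(xi > 0 for xi in x)
print(T, n_pos, n_pos >= T)
```

Because only the signs of the xᵢ enter, the threshold T depends on n and α₀ alone, not on the noise density: this is the distribution-free property claimed for the sign detector.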
sgn t = 1 for t > 0, and sgn t = −1 for t < 0. The Kendall τ coefficient in Eq. (3.8-2) is a linear function of the statistic

T′ = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} sgn(xⱼ − xᵢ) sgn(yⱼ − yᵢ)    (3.8-3)

and therefore T′ can be used as the test statistic when testing hypotheses. The Kendall τ test statistics considered thus far require two different sets of observations and are therefore two-sample detectors. To obtain a one-input detector that is directly related to Kendall’s basic τ correlation coefficient, let n observations of an input process be obtained and be denoted by x = {x₁, x₂, …, xₙ}. Similar to the basic Kendall τ coefficient, the rank of each observation is compared to the rank of every succeeding observation and the number of times that Rⱼ > Rᵢ is totaled. The test statistic may be written as follows

K_t = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} u(Rⱼ − Rᵢ)    (3.8-4)
where u(·) is the unit step function defined earlier, and R_j is the rank of the jth observation. Another form of K_t does not utilize ranks explicitly, but compares each observation with every following observation and sums the number of times that x_j > x_i. This procedure may be written as the statistic

K_t′ = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} u(x_j − x_i)    (3.8-5)
This statistic K_t′ is clearly amenable to a recursive form. For instance, if after n samples have been received and K_t′ has been calculated, another sample x_{n+1} is taken, the statistic based on all n + 1 observations may be calculated using

K_t′(n + 1) = K_t′(n) + Σ_{i=1}^{n} u(x_{n+1} − x_i)    (3.8-6)
where K_t′(n) is given by Eq. (3.8-5). In order for the test statistic K_t or K_t′ to be of any utility, a specific rank order of the input samples must be expected; that is, the detector that uses the test statistic K_t or K_t′ decides if the rank order of the input observations is the same, within reasonable limits, as a specific rank order that the system designer wishes to detect. This requirement may seem to limit the practical utility of the K_t (K_t′) detector, but there are cases where a specific amplitude modulated sequence may be received. A particular
pulsed radar application is discussed in Chapter 9. The Kendall τ coefficient as well as the statistics K_t and K_t′ are nonlinear, and therefore special consideration is necessary when calculating their statistical properties. The Spearman ρ coefficient discussed previously is the projection of the Kendall τ coefficient into the family of linear rank statistics, and this fact may be used to circumvent some of the computational problems [Hajek and Sidak, 1967]. The detector based on the Kendall τ coefficient is very efficient when compared to the Student's t test under the assumption of Gaussian noise. This fact, along with the properties that the statistic may be computed recursively and easily implemented digitally, makes the Kendall τ detector of great practical importance.
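The double sum of Eq. (3.8-5) and the recursive update of Eq. (3.8-6) translate directly into code; a brief sketch (function names are illustrative, not from the text):

```python
def kendall_stat(x):
    """K_t' of Eq. (3.8-5): the number of pairs i < j with x_j > x_i."""
    n = len(x)
    return sum(1 for i in range(n - 1)
                 for j in range(i + 1, n) if x[j] > x[i])

def kendall_update(k_prev, x, x_new):
    """Recursive form of Eq. (3.8-6): add the number of earlier
    samples that the new sample exceeds."""
    return k_prev + sum(1 for xi in x if x_new > xi)
```

For example, kendall_stat([1, 3, 2]) counts the pairs (1, 3) and (1, 2) and gives 2; appending a new sample 4 adds three more concordant pairs, and kendall_update(2, [1, 3, 2], 4) agrees with kendall_stat([1, 3, 2, 4]).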
3.9 Summary

In this chapter the most powerful (parametric) detectors for testing for the presence of a known signal and an unknown positive signal in zero mean, Gaussian noise with known variance were derived. The variance or noise power was then assumed to be unknown, and the UMP unbiased detector for testing for a positive signal in zero mean, Gaussian noise was derived by finding maximum likelihood estimates for the mean and variance. It was then shown that the resulting detector, the Student's t test, is both asymptotically nonparametric and consistent. The simplest nonparametric detector, the sign detector, was derived and shown to be the UMP detector for testing for the positive shift of the median of a distribution. The invariance of the sign detector for certain nonlinear operations on the input data was explained. The Wilcoxon detector was then presented and its mean and variance derived. An alternative form for the Wilcoxon detector test statistic was given and its equivalence to the basic form of the test statistic illustrated by an example. The invariance of the Wilcoxon detector as well as any symmetric detector to any nonlinear transformation of the input data which is continuous, odd, and strictly increasing was then proven. The normal scores test was next presented and proven to be the locally most powerful detector for testing for the mean of a Gaussian distribution with known variance. A closely related detector, Van der Waerden's test, was then discussed, and the equivalence of the normal scores and Van der Waerden's tests for an infinite number of observations was shown. The fact that both detectors may be cumbersome to apply due to the transformations required was pointed out.
The last two detectors to be discussed were the nonparametric Spearman ρ and Kendall τ detectors, which are both related to the common product-moment correlation coefficient. Several different forms for the test statistic of each detector were given, and an example was utilized to illustrate the calculation of the Kendall τ coefficient. It was also pointed out that the Spearman ρ coefficient is the projection of the nonlinear Kendall τ coefficient into the family of linear rank statistics.

Problems

1. Given the 20 observations listed below, it is desired to test the hypothesis H4 against the alternative K4 using the one-input Wilcoxon detector. The observations are:

1.4  6.2  −4.4  6.5  7.7  4.1  8.1  2.4  −0.2  1.1
3.0  7.1  4.9  4.3  −0.4  0.6  −0.1  5.4  5.2  3.9

The desired false alarm probability is α = 0.1. Hint: Use the Gaussian approximation to the distribution of the Wilcoxon detector test statistic.

2. Given a set of 10 independent, identically distributed observations of a random variable x with cumulative distribution function F(x), it is desired to test the hypothesis
H: F(0) = 0.5

against the alternative

K: F(0) < 0.5
using the sign detector. Write an expression for the power function of the test and find the false alarm probability α if the threshold T = 8. Assume that F(0|K) = P[x < 0|K] = 0.1 and find the probability of detection.

3. Use the sign detector to test the hypothesis of a zero median against the alternative of a positive shift of the median for the three sets of observations given. Let α = 0.1 and do not randomize to obtain the exact α.

(i) (ii) (iii)
1.7  0.8  −0.2  1.3  −1.8  2.2  9.3  1.4  −1.2  3.3  6.4  2.1  −1.2  1.1  0.0  3.1  2.6  −0.6
1.4  −0.6  −1.1  1.9  0.6  0.4  8.8  6.1  −0.9  −2.1  2.4  1.4

4. Use the Wilcoxon detector to test H4 versus K4 given the 10 observations

0.4  2.0  1.2  −0.2  −0.8  0.9  2.2  0.6  0.1  1.3

Let α = 0.1 and use the exact distribution of the test statistic.
5. Repeat Problem 4 using the Gaussian approximation.

6. For n = 12 observations from a Gaussian distribution with mean μ = 32.3 and variance σ² = 1, it is desired to compare the performance of the sign detector and the optimum Neyman-Pearson parametric detector for testing the hypothesis
H: F(31) = 0.5

against the alternative

K: F(31) < 0.5
The desired significance level α is 0.08. Find the threshold T to insure the desired false alarm probability, the exact value of α, and the probability of detection 1 − β for both the sign detector and the optimum Neyman-Pearson detector for this problem.

7. For the 10 observations in Problem 4, compute the value of Van der Waerden's test statistic.

8. Calculate the value of the normal scores test statistic for the 10 observations given in Problem 4. (Use the table of expected values of normal order statistics in Appendix B.)

9. Calculate the value of the Spearman ρ detector test statistic for the 10 observations in Problem 4. Discuss how the threshold of the detector can be selected to obtain a significance level α = 0.05 when testing H4 against K4.
10. Investigate the accuracy of the Gaussian approximation when determining the threshold for (i) the sign test, and (ii) the Wilcoxon detector. Let α = 0.05 and consider the cases where n = 8, 10, and 12 observations are available. Assume the hypothesis and alternative to be tested are H4 and K4.
CHAPTER 4

One-Input Detector Performance

4.1 Introduction
In this chapter the various one-input detectors discussed in Chapter 3 are compared using the concepts of ARE and finite sample size relative efficiency, and sample calculations of the ARE are given. Since ARE calculations for all detectors and various background noise densities clearly cannot be given here, calculations of the ARE of the sign detector and of the Wilcoxon detector with respect to the optimum parametric detector for Gaussian noise are presented as illustrations of ARE calculations for the two types of nonparametric detectors discussed in this book, those utilizing only polarity information and those utilizing both polarities and ranks. The ARE of the sign detector with respect to the linear detector, given by Eq. (3.2-6), when detecting a positive dc signal for various background noise densities is calculated in Section 4.2 using the basic definition of the ARE given in Section 2.7. In Section 4.3, the ARE of the Wilcoxon detector with respect to the linear detector when detecting a positive dc signal for various background noise densities is calculated utilizing the expression for the ARE in terms of the efficacies of the two detectors. It is also proven in Section 4.3 that the Wilcoxon detector is never less than 86.4% as efficient as the linear detector for large samples and for testing for a location shift regardless of the background noise. The asymptotic relative efficiencies of the six one-input nonparametric detectors developed in Chapter 3 with respect to the linear detector, when testing for a location shift for various background noise densities, are given in Section 4.4. In
addition, the results of a study using a different asymptotic efficiency definition are also included in this section. Finite sample size performance is discussed in Section 4.5 and the results compared with that indicated by the asymptotic performance measures.

4.2 Sign Detector
The general detection problem that will be considered is the detection of a dc signal in background noise which has a symmetric but otherwise unknown probability density. The hypothesis and alternative can therefore be written as

H8: p = 1/2 or μ = 0, f symmetric but otherwise unknown
K8: p > 1/2 or μ > 0, f symmetric but otherwise unknown    (4.2-1)

where μ is the mean of f and p = P{x > 0} = 1 − F(0). To compute the ARE utilizing Eq. (2.7-1), the asymptotic false alarm and detection probabilities of the two detectors to be compared must be calculated, set equal to each other, and solved for the ratio of n₁ to n₂. Since the optimum detector for the present problem when the noise is Gaussian distributed is the linear detector given by Eq. (3.2-6), this detector will be used as the basis of comparison for calculating the ARE. In order to determine the asymptotic probability of detection for the linear detector, the distribution of the test statistic under the alternative K8 must be ascertained. Letting n₁ be the number of samples of the input process, the linear detector may be written as

Σ_{k=1}^{n₁} x_k < T₁  ⟹  D(x) = 0, accept H8
Σ_{k=1}^{n₁} x_k > T₁  ⟹  D(x) = 1, accept K8    (4.2-2)

where the x_i, i = 1, 2, ..., n₁, are the input samples. At present, the only assumption on the background noise density is that it is symmetric. If the input observations x_i are assumed to be elements of a random sample, the distribution of the test statistic Σ_{k=1}^{n₁} x_k as n₁ becomes large is asymptotically Gaussian with mean n₁μ and variance n₁σ² by virtue of the central limit theorem. Utilizing the expression for the probability of detection given by Eq. (2.2-6) and the asymptotic probability density of the detector test statistic, the asymptotic probability of detection may be written as

1 − β_{D₁} = ∫_{T₁}^{∞} [1/√(2πn₁σ²)] exp[−(x − n₁μ)²/(2n₁σ²)] dx
Making a change of variables with y = (x − n₁μ)/(√n₁ σ), the asymptotic
probability of detection becomes

1 − β_{D₁} = ∫_{(T₁−n₁μ)/(√n₁σ)}^{∞} (1/√2π) e^{−y²/2} dy = ∫_{−∞}^{(n₁μ−T₁)/(√n₁σ)} (1/√2π) e^{−y²/2} dy    (4.2-3)

where the last step follows since the integrand is symmetric with respect to the origin. Equation (4.2-3) may clearly be written as

1 − β_{D₁} = Φ[(n₁μ − T₁)/(√n₁ σ)]    (4.2-4)

which is the asymptotic probability of detection for the linear detector. The asymptotic false alarm probability may be determined similarly by using the fact that the noise has zero mean under H8 and by noting that if the x_i are the elements of a random sample, then the test statistic has a limiting Gaussian distribution with zero mean and variance n₁σ². Using Eq. (2.2-2), the asymptotic false alarm probability may be written as

α_{D₁} = ∫_{T₁}^{∞} [1/√(2πn₁σ²)] exp[−x²/(2n₁σ²)] dx    (4.2-5)
Making the change of variables y = x/(√n₁ σ), Eq. (4.2-5) becomes

α_{D₁} = ∫_{T₁/(√n₁σ)}^{∞} (1/√2π) e^{−y²/2} dy = 1 − Φ[T₁/(√n₁ σ)]    (4.2-6)

which is the asymptotic false alarm probability for the linear detector. An expression for the threshold may be obtained by solving for T₁ in Eq. (4.2-6), which yields

T₁ = √n₁ σ Φ⁻¹[1 − α_{D₁}]    (4.2-7)
Substituting Eq. (4.2-7) for T₁ in Eq. (4.2-4) yields

1 − β_{D₁} = Φ[ (n₁μ − √n₁ σ Φ⁻¹[1 − α_{D₁}]) / (√n₁ σ) ]    (4.2-8)

where Φ⁻¹(·) is the inverse of Φ[·].
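Equations (4.2-7) and (4.2-8) can be evaluated directly with a standard normal quantile routine; a sketch, with illustrative parameter values (the function name is not from the text):

```python
import math
from statistics import NormalDist

def linear_detector_design(n, mu, sigma, alpha):
    """Threshold from Eq. (4.2-7) and asymptotic detection
    probability from Eq. (4.2-8) for the linear detector."""
    nd = NormalDist()
    t1 = math.sqrt(n) * sigma * nd.inv_cdf(1.0 - alpha)        # Eq. (4.2-7)
    power = nd.cdf((n * mu - t1) / (math.sqrt(n) * sigma))     # Eq. (4.2-8)
    return t1, power

t1, power = linear_detector_design(n=25, mu=0.5, sigma=1.0, alpha=0.05)
```

For n₁ = 25, μ = 0.5, σ = 1, and α = 0.05 this gives a detection probability of roughly 0.80.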
The asymptotic false alarm and detection probabilities must now be calculated for the sign detector defined by Eq. (3.3-13). Let n₂ denote the number of samples taken, and thus for the present detection problem, the sign detector may be written as

Σ_{i=1}^{n₂} u(x_i) < T₂  ⟹  D(x) = 0, accept H8
Σ_{i=1}^{n₂} u(x_i) > T₂  ⟹  D(x) = 1, accept K8    (4.2-9)
Following the development in Section 3.3, let k equal Σ_{i=1}^{n₂} u(x_i), the number of positive observations in the sample. Under the alternative K8, u(x_i) can take on the value 1 with probability p, and therefore k = Σ_{i=1}^{n₂} u(x_i) is binomially distributed with parameters n₂ and p. The probability density of k under K8 is therefore

f(k|K8) = [n₂!/(k!(n₂ − k)!)] p^k (1 − p)^{n₂−k}    (4.2-10)

which has a mean n₂p and a variance n₂p(1 − p). As n₂ becomes large, the binomial density in Eq. (4.2-10) is asymptotically Gaussian with mean n₂p and variance n₂p(1 − p). The asymptotic probability of detection can now be found using the asymptotic probability density and Eq. (2.2-6) to get

1 − β_{D₂} = ∫_{T₂}^{∞} [1/√(2πn₂p(1−p))] exp[−(x − n₂p)²/(2n₂p(1−p))] dx

which upon making the change of variables y = (x − n₂p)/√(n₂p(1−p)) becomes

1 − β_{D₂} = ∫_{(T₂−n₂p)/√(n₂p(1−p))}^{∞} (1/√2π) e^{−y²/2} dy

The asymptotic probability of detection for the sign detector under the alternative K8 is therefore

1 − β_{D₂} = Φ[ (n₂p − T₂) / √(n₂p(1−p)) ]    (4.2-11)
Under the hypothesis H8, p = 1 − p = 1/2 and the binomial density of
k becomes

f(k|H8) = [n₂!/(k!(n₂ − k)!)] (1/2)^{n₂}    (4.2-12)

As n₂ becomes large, the density in Eq. (4.2-12) is asymptotically Gaussian with mean n₂/2 and variance n₂/4. Using this result in Eq. (2.2-2), the asymptotic false alarm rate may be written as

α_{D₂} = ∫_{T₂}^{∞} [1/√(2π n₂/4)] exp[−(x − n₂/2)²/(n₂/2)] dx    (4.2-13)

which by making the change of variables z = (x − n₂/2)/(√n₂ /2) becomes

α_{D₂} = 1 − Φ[(2T₂ − n₂)/√n₂]    (4.2-14)

which is the asymptotic false alarm rate of the sign detector. Solving for the threshold in Eq. (4.2-14),

1 − α_{D₂} = Φ[(2T₂ − n₂)/√n₂]

or

Φ⁻¹[1 − α_{D₂}] = (2T₂ − n₂)/√n₂

which yields

T₂ = ½(n₂ + √n₂ Φ⁻¹[1 − α_{D₂}])    (4.2-15)
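The Gaussian-approximation threshold of Eq. (4.2-15) can be compared with the exact binomial false alarm probability, since k is binomial(n₂, 1/2) under the hypothesis; a sketch with illustrative values (the function names are not from the text):

```python
import math
from statistics import NormalDist

def sign_threshold(n, alpha):
    """Eq. (4.2-15): T2 = (n + sqrt(n) Phi^{-1}(1 - alpha)) / 2."""
    return 0.5 * (n + math.sqrt(n) * NormalDist().inv_cdf(1.0 - alpha))

def exact_false_alarm(n, t):
    """P{k > t | H} with k ~ Binomial(n, 1/2), summed exactly."""
    lo = math.floor(t) + 1
    return sum(math.comb(n, k) for k in range(lo, n + 1)) / 2.0 ** n

t2 = sign_threshold(100, 0.05)   # ~58.22
print(t2, exact_false_alarm(100, t2))
```

For n₂ = 100 and α = 0.05 the threshold is about 58.2, and the exact false alarm probability of the test k > T₂ is about 0.044, close to (and slightly below) the design value.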
Substituting Eq. (4.2-15) into the expression for the asymptotic probability of detection, Eq. (4.2-11), yields

1 − β_{D₂} = Φ[ (n₂p − ½(n₂ + √n₂ Φ⁻¹[1 − α_{D₂}])) / √(n₂p(1−p)) ]

which becomes after factoring

1 − β_{D₂} = Φ[ (√n₂(2p − 1) − Φ⁻¹[1 − α_{D₂}]) / (2√(p(1−p))) ]    (4.2-16)
The asymptotic relative efficiency of the sign detector with respect to the linear detector can now be determined by equating Eqs. (4.2-8) and (4.2-16) and Eqs. (4.2-6) and (4.2-14) and solving for the ratio of n₁ to n₂. Setting β_{D₁} = β_{D₂} and α_{D₁} = α_{D₂} = α, the arguments of Φ[·] in Eqs. (4.2-8) and (4.2-16) must be equal,

√n₁ μ/σ − Φ⁻¹(1 − α) = [√n₂(2p − 1) − Φ⁻¹(1 − α)] / (2√(p(1−p)))

Solving for √n₂,

√n₂(2p − 1) = 2√(p(1−p)) [√n₁ μ/σ − Φ⁻¹(1 − α)] + Φ⁻¹(1 − α)

and dividing, the ratio of sample sizes is

n₁/n₂ = n₁(2p − 1)² σ² / { 2√(p(1−p)) [√n₁ μ − σΦ⁻¹(1 − α)] + σΦ⁻¹(1 − α) }²    (4.2-17)

Letting n₁ and n₂ approach infinity, the terms involving Φ⁻¹(1 − α) become negligible, and using the definition of ARE given by Eq. (2.7-1), the ARE of the sign detector with respect to the linear detector is

E₂,₁ = lim n₁/n₂ = σ²(2p − 1)² / [4p(1−p)μ²]    (4.2-18)
as the alternative K8 approaches the hypothesis H8. The expression for the ARE in Eq. (4.2-18) is not very useful; however, by using the basic assumptions of this detection problem, namely that f(x) is symmetric, p > 1/2 (under the alternative K8), and μ is small (since K8 → H8), a more informative expression may be obtained. From the definition of p,

p = P{x_i > 0} = 1 − F(0) = 1 − ∫_{−∞}^{0} f(x) dx

where f(x) is a symmetric density function with mean μ and variance σ². Making a change of variables y = x − μ,

p = 1 − ∫_{−∞}^{−μ} f(y) dy = 1/2 + ∫_{−μ}^{0} f(y) dy    (4.2-19)

since f(y) is a zero mean symmetric density function with variance σ². Since μ is small, p may be approximated by

p = 1/2 + μ f(0)    (4.2-20)

Substituting this expression for p into Eq. (4.2-18),
E₂,₁ = σ² f²(0) / [p(1 − p)]    (4.2-21)

As K8 approaches H8, μ → 0 and the ARE of the sign detector with respect to the linear detector is

E₂,₁ = 4σ² f²(0)    (4.2-22)

for testing the hypothesis H8 versus the alternative K8. It should be noted that in this case the ARE is independent of the false alarm probability α. To obtain numerical results, a specific form for the noise density f must be assumed. Assuming that the noise is Gaussian with zero mean and
variance σ²,
f(0) = 1/(√(2π) σ)

Substituting this result into Eq. (4.2-22),

E₂,₁ = 4σ² [1/(√(2π) σ)]² = 4σ²/(2πσ²) = 2/π ≈ 0.637    (4.2-23)
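Equation (4.2-23) is easy to confirm numerically; a quick sketch (names are illustrative):

```python
import math

def sign_are(f0, sigma):
    """Eq. (4.2-22): ARE of the sign detector, 4 sigma^2 f(0)^2."""
    return 4.0 * sigma ** 2 * f0 ** 2

sigma = 1.0
f0 = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)  # Gaussian density at the origin
print(sign_are(f0, sigma))  # 2/pi ~ 0.6366
```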
The sign detector is therefore asymptotically only 63.7% as efficient as the linear detector when the background noise is Gaussian with zero mean and variance σ². This result is not surprising since the linear detector was shown to be optimum for detecting a positive dc signal when the noise is Gaussian. Assume now that the background noise has a double exponential or Laplace distribution,

f(x) = [1/(√2 σ)] exp(−√2 |x|/σ)    (4.2-24)

which when evaluated at the origin becomes

f(0) = 1/(√2 σ)

Substituting f(0) into Eq. (4.2-22),

E₂,₁ = 4σ² [1/(√2 σ)]² = 4σ²/(2σ²) = 2    (4.2-25)
which is the ARE of the sign detector with respect to the linear detector when the background noise has a density given by Eq. (4.2-24) [Thomas, 1970]. It is clear then that although the sign detector is only 63.7% as efficient as the linear detector when the noise is Gaussian, the sign detector is twice as efficient as the linear detector when the noise has a double exponential density. This illustrates the point made previously that although a detector may be optimum for one set of design conditions, the "optimum" detector may be less efficient than a suboptimum detector when deviations from the design conditions occur.

4.3 Wilcoxon Detector
As an example of the calculation of the ARE for detectors which use both polarity and rank information, the ARE of the Wilcoxon detector
with respect to the linear detector will now be calculated. In order to perform these calculations, a slightly different approach must be taken. As discussed in Section 2.7, the basic definition of ARE is under certain regularity conditions equal to the ratio of the efficacies of the two detectors being compared. The efficacy of a detector is defined as

E = lim_{n→∞} [∂E{S}/∂θ]² / (n var{S}) |_{θ=θ₀}    (4.3-1)

where S is the detector test statistic and θ = θ₀ is the hypothesis being tested [Kendall and Stuart, 1967]. For the present detection problem, θ = μ, which equals zero under the hypothesis. In order to derive the ARE, the expected value and variance of both test statistics must be calculated and then substituted into Eq. (4.3-1) to find the efficacy of each detector. The test statistic of the linear detector defined by Eq. (3.2-6) will be denoted by S₁, and its mean and variance will be computed first. The mean of S₁ is

E{S₁} = E{ Σ_{k=1}^{n₁} x_k } = Σ_{k=1}^{n₁} E{x_k} = n₁μ    (4.3-2)

and the variance is given by

var{S₁} = var{ Σ_{k=1}^{n₁} x_k } = Σ_{k=1}^{n₁} var{x_k} = n₁σ²    (4.3-3)
Substituting these values into Eq. (4.3-1), the efficacy of the linear detector becomes

E₁ = lim_{n₁→∞} (n₁)² / (n₁ · n₁σ²) = 1/σ²    (4.3-4)

To compute the mean of the Wilcoxon test statistic, a different form of the test statistic must be used. Define U_{ij} as

U_{ij} = u(x_i − x_j)

The Wilcoxon test statistic will be denoted by S₂ and can now be written using U_{ij} as

S₂ = Σ_{i=1}^{n₂} Σ_{j=1}^{n₂} U_{ij}
Taking the expected value of S₂ yields

E{S₂} = Σ_{i=1}^{n₂} Σ_{j=1}^{n₂} E{U_{ij}}    (4.3-5)

If x_i and x_j are elements of a random sample, then the expected value in Eq. (4.3-5) becomes

E{S₂} = Σ_{i=1}^{n₂} Σ_{j=1}^{n₂} p = n₂² p    (4.3-6)
where p is the probability that x_i is greater than x_j. To determine p, note that p is just the probability that x_j − x_i < 0. The density function of the sum of two independent random variables simply equals the convolution of their respective densities; for instance, if z = w + y,

f_z(z) = ∫_{−∞}^{∞} f_w(z − v) f_y(v) dv

The cumulative distribution function of z can now be found from f_z(z) as follows

F_z(z) = ∫_{−∞}^{z} f_z(t) dt = ∫_{−∞}^{∞} F_w(z − v) f_y(v) dv    (4.3-7)

The density of x_j − x_i can be obtained from Eq. (4.3-7) by letting z = x_j − x_i, −y in F_w(·) equal x − μ, and y in f_y(·) equal x, to yield

F(x_j − x_i) = ∫_{−∞}^{∞} F(x_j − x_i + x − μ) f(x) dx    (4.3-8)

Since p is the probability that x_j − x_i < 0, p can be found from Eq. (4.3-8) as

p = F(0) = ∫_{−∞}^{∞} F(x − μ) f(x) dx    (4.3-9)
Using this result in Eq. (4.3-6) and taking the partial derivative yields

∂/∂μ {E{S₂}} = ∂/∂μ [n₂² p]    (4.3-10)
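As a numerical sanity check on Eqs. (4.3-9) and (4.3-10): for a unit-variance Gaussian f, the derivative of p with respect to μ at μ = 0 should equal −∫f²(x)dx = −1/(2√π) ≈ −0.282. A sketch using trapezoidal quadrature and a central difference (the Gaussian choice of f and the function names are assumptions for illustration only):

```python
import math
from statistics import NormalDist

def p_of_mu(mu, lo=-10.0, hi=10.0, steps=4000):
    """Eq. (4.3-9): p = integral of F(x - mu) f(x) dx, with f taken
    as the N(mu, 1) density and F its distribution function."""
    nd = NormalDist(mu, 1.0)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * nd.cdf(x - mu) * nd.pdf(x)
    return total * h

# Central-difference estimate of the derivative in Eq. (4.3-10) at mu = 0:
eps = 1e-4
deriv = (p_of_mu(eps) - p_of_mu(-eps)) / (2.0 * eps)
print(deriv)  # ~ -0.282 = -1/(2 sqrt(pi))
```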
The variance of the Wilcoxon test statistic is given by

var{S₂} = (1/24) n₂(n₂ + 1)(2n₂ + 1)    (3.4-15)

which was calculated previously. The efficacy of the Wilcoxon test statistic can now be found from Eqs. (4.3-1), (4.3-10), and (3.4-15), noting from Eq. (4.3-9) that ∂p/∂μ evaluated at μ = 0 is −∫f²(x) dx, to be
E₂ = lim_{n₂→∞} [n₂² ∫f²(x) dx]² / [n₂ · (1/24)n₂(n₂+1)(2n₂+1)] = 12 [∫_{−∞}^{∞} f²(x) dx]²    (4.3-11)

The ARE of the Wilcoxon detector with respect to the linear detector can now be found by taking the ratio of Eq. (4.3-11) to Eq. (4.3-4),

E₂,₁ = E₂/E₁ = 12σ² [∫_{−∞}^{∞} f²(x) dx]²    (4.3-12)

Equation (4.3-12) is the ARE of the Wilcoxon detector with respect to the linear detector for a background noise density f(x). If the background noise density is Gaussian, then
f(x) = [1/√(2πσ²)] exp[−x²/(2σ²)]

and

∫_{−∞}^{∞} f²(x) dx = ∫_{−∞}^{∞} [1/(2πσ²)] exp(−x²/σ²) dx = 1/(2σ√π)    (4.3-13)

Substituting Eq. (4.3-13) into the expression for the ARE in Eq. (4.3-12) yields

E₂,₁ = 12σ² [1/(2σ√π)]² = 12σ²/(4πσ²) = 3/π ≈ 0.955    (4.3-14)
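Equation (4.3-12) can be evaluated numerically for any noise density; the following sketch verifies the Gaussian value 3/π by trapezoidal quadrature (the helper names are illustrative, not from the text):

```python
import math

def wilcoxon_are(f, sigma, lo=-40.0, hi=40.0, steps=8000):
    """Eq. (4.3-12): 12 sigma^2 (integral of f^2)^2."""
    h = (hi - lo) / steps
    s = sum((0.5 if i in (0, steps) else 1.0) * f(lo + i * h) ** 2
            for i in range(steps + 1)) * h
    return 12.0 * sigma ** 2 * s ** 2

gauss = lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
print(wilcoxon_are(gauss, 1.0))  # 3/pi ~ 0.9549
```

Replacing gauss with the unit-variance Laplace density reproduces the value 3/2 obtained below for Laplace noise.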
The Wilcoxon detector is therefore 95.5% as efficient as the linear detector for the problem of detecting a positive dc signal in Gaussian noise. This result is quite amazing since the linear detector is the optimum detector for this detection problem. If the noise has a Laplace density

f(x) = [1/(√2 σ)] exp(−√2 |x|/σ)

then,

∫_{−∞}^{∞} f²(x) dx = ∫_{−∞}^{∞} [1/(2σ²)] exp(−2√2 |x|/σ) dx = 1/(2√2 σ)
Substituting this result into Eq. (4.3-12) yields

E₂,₁ = 12σ² [1/(2√2 σ)]² = 12σ²/(8σ²) = 3/2    (4.3-15)

The Wilcoxon detector is thus 1.5 times as efficient as the linear detector when the noise has a Laplace distribution. The ARE of the Wilcoxon detector with respect to the linear detector for the two densities considered thus far is quite high. Because of this fact, the question arises as to whether a nonzero lower bound for the ARE exists for this case, and if so, what is the minimum value? Hodges and Lehmann [1956] have proven a theorem which states that a lower bound for the ARE of the Wilcoxon detector does exist, and that the minimum value is 0.864 for the detection of a positive shift. Following their development, the derivation of the lower bound requires the minimization of the ARE given by Eq. (4.3-12), or equivalently, the minimization of

∫_{−∞}^{∞} f²(x) dx
for a fixed σ². Since Eq. (4.3-12) is invariant to changes of location or scale, σ² and E{x} may be chosen to have the values σ² = 1 and E{x} = 0. The problem thus becomes the minimization of

∫_{−∞}^{∞} f²(x) dx    (4.3-16)

subject to the conditions

∫_{−∞}^{∞} f(x) dx = 1    and    ∫_{−∞}^{∞} x² f(x) dx = 1    (4.3-17)
Using the method of Lagrange undetermined multipliers, the above conditions can be satisfied by minimizing

∫_{−∞}^{∞} { f²(x) − 2λ(θ² − x²) f(x) } dx    (4.3-18)

Equation (4.3-18) may be minimized by taking the derivative with respect to f(x) and equating the result to zero,

∫_{−∞}^{∞} { 2f(x) − 2λ(θ² − x²) } dx = 0    (4.3-19)

Solving Eq. (4.3-19) for f(x),

2 ∫_{−∞}^{∞} f(x) dx = 2 ∫_{−∞}^{∞} λ(θ² − x²) dx

yields

f(x) = λ(θ² − x²)    for x² ≤ θ²
f(x) = 0             for x² > θ²    (4.3-20)

The parameters λ and θ may be determined from the conditions in Eq. (4.3-17) as follows

∫_{−∞}^{∞} f(x) dx = ∫_{−θ}^{θ} λ(θ² − x²) dx = (4/3)λθ³ = 1    (4.3-21)

and

∫_{−∞}^{∞} x² f(x) dx = ∫_{−θ}^{θ} λx²(θ² − x²) dx = (4/15)λθ⁵ = 1    (4.3-22)

Solving Eq. (4.3-21) for λ and substituting into Eq. (4.3-22) yields

θ²/5 = 1    or    θ² = 5    (4.3-23)

Using the value of θ in Eq. (4.3-21) and solving for λ,

(4/3)λθ³ = (4/3)λ[5√5] = 1

or

λ = 3/(20√5)    (4.3-24)
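The constants just obtained can be checked numerically: the density of Eq. (4.3-20) with θ² = 5 and λ = 3/(20√5) should have unit area, unit variance, and yield the 0.864 bound through Eq. (4.3-12). A verification sketch (helper names are illustrative):

```python
import math

LAM = 3.0 / (20.0 * math.sqrt(5.0))   # lambda, Eq. (4.3-24)
THETA = math.sqrt(5.0)                # theta, Eq. (4.3-23)

def f(x):
    """Least favorable density, Eq. (4.3-20)."""
    return LAM * (THETA ** 2 - x * x) if x * x <= THETA ** 2 else 0.0

def trap(g, lo, hi, steps=20000):
    """Trapezoidal quadrature of g over [lo, hi]."""
    h = (hi - lo) / steps
    return sum((0.5 if i in (0, steps) else 1.0) * g(lo + i * h)
               for i in range(steps + 1)) * h

print(trap(f, -THETA, THETA))                                 # ~ 1 (unit area)
print(trap(lambda x: x * x * f(x), -THETA, THETA))            # ~ 1 (unit variance)
print(12.0 * trap(lambda x: f(x) ** 2, -THETA, THETA) ** 2)   # ~ 0.864
```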
To determine the minimum value of the ARE, the density in Eq. (4.3-20) may be substituted into Eq. (4.3-16) to yield

∫_{−∞}^{∞} f²(x) dx = ∫_{−θ}^{θ} λ²(θ² − x²)² dx = (16/15)λ²θ⁵    (4.3-25)

Using the values for θ and λ given in Eqs. (4.3-23) and (4.3-24) to evaluate Eq. (4.3-25),

∫_{−∞}^{∞} f²(x) dx = (16/15)(√5)⁵ [3/(20√5)]² = 3√5/25    (4.3-26)

Remembering that σ² = 1, the lower bound on the ARE for the Wilcoxon detector may be found by utilizing Eqs. (4.3-12) and (4.3-26),

E₂,₁ = 12 [3√5/25]² = 108/125 = 0.864    (4.3-27)

The lower bound on the ARE given by Eq. (4.3-27) means that for large samples and for testing for a location shift, the Wilcoxon detector will never be less than 86.4% as efficient as the linear detector regardless of the background noise density.

4.4 One-Input Detector AREs
Table 4.4-1 gives the asymptotic relative efficiencies of the nonparametric detectors discussed in this book with respect to the Student's t test or the linear detector for various background noise densities when applied to the problem of detecting a positive location shift. All of the detectors have a surprisingly high value of ARE with respect to the linear detector when the background noise is Gaussian, considering the fact that the linear detector is the optimum (UMP) Neyman-Pearson detector for the Gaussian noise case. The sign detector has the lowest ARE for all cases shown except for the Laplace noise density case. The lower efficiency of the sign detector for most of the cases is not surprising since the sign detector uses only polarity information. The high ARE value of the sign detector for the Laplace density is an unexpected result; however, it can be shown that the sign detector is weak signal optimum for this case [Carlyle, 1968].
Table 4.4-1  Asymptotic relative efficiencies of certain one-input nonparametric detectors with respect to the optimum detector for Gaussian noise

                              Noise density                          Bounds for ARE
Nonparametric detector    Gaussian  Uniform  Laplace  Exponential    Lower   Upper
Sign                      0.637     0.333    2.000    ...            0       ...
Wilcoxon                  0.955     1.000    1.500    ...            0.864   ...
Normal scores (c₁)        1.000     ∞        ...      ...            1.0     ...
Van der Waerden           1.000     ...      ...      ...            1.0     ...
Kendall tau (τ)           0.912     ...      ...      0.312          ...     ...
Spearman rho (ρ)          0.912     ...      ...      0.312          ...     ...
The high ARE values of the Wilcoxon detector have already been discussed quite fully. The very high lower bound of the Wilcoxon detector coupled with the relative ease with which it may be implemented make the Wilcoxon detector one of the most useful detectors for a wide variety of applications. The normal scores and Van der Waerden's tests are as efficient as the linear detector for Gaussian noise, and it has been shown by Chernoff and Savage [1958] that these two detectors always have an ARE ≥ 1 when compared to the linear detector. This is because both the normal scores and Van der Waerden's detectors utilize Gaussian transformations to compute their statistics, and the linear detector is based on the assumption of Gaussian background noise. The result is that the normal scores and Van der Waerden's detectors are as efficient as the linear detector when the noise is Gaussian and more efficient when the noise is non-Gaussian. The Kendall τ and the Spearman ρ detectors both have a high ARE relative to the linear detector when the noise is Gaussian, but they are much less efficient when the noise has an exponential distribution. The Kendall τ and Spearman ρ detectors have the same ARE since the two tests are equivalent for large sample sizes.

The utility of any asymptotic relative efficiency computation or asymptotic performance indicator depends on how well the asymptotic results predict finite sample performance. The ARE definition used thus far in the text, besides requiring the assumption of a large number of observations, is based on the additional assumption that the alternative is arbitrarily close to the hypothesis. More specifically, the ARE is only a weak signal performance indicator. This may sometimes prove to be a fatal restriction if the prediction of detector performance for a finite number of samples and large signals is desired. In order to eliminate the limitation to weak signal detector performance evaluation, Klotz [1965] developed asymptotic
efficiency curves by examining the exponential rate of convergence to zero of the false alarm probability while maintaining a constant probability of detection. Since α is allowed to vary, it is no longer necessary for the alternative to approach the hypothesis to prevent the detection probability from becoming one when considering consistent tests. The end result is that the asymptotic efficiency values obtained by Klotz have a better agreement with finite sample detector performance than the ARE. Using the above approach, Klotz computed the relative efficiencies of the one-input Wilcoxon, normal scores, and sign detectors with respect to the Student's t test or linear detector when the input noise density is Gaussian. As the distance μ between the hypothesis and alternative probability densities increases from zero, the efficiency of the normal scores detector decreases from an initial value of 1.0, while the efficiency of the Wilcoxon detector increases from an initial value of 0.955, until they cross between μ = 1.000 and μ = 1.125. The efficiencies of both detectors then decrease together as μ increases. The asymptotic efficiency of the sign test peaks at an efficiency value of approximately 0.76 for μ = 1.5 and then approaches the relative efficiency of the normal scores and Wilcoxon detectors as μ becomes larger. For μ > 3.0 the sign detector has essentially the same relative efficiency as the rank detectors. The three nonparametric detectors are also compared by Klotz for logistic and double exponential noise distributions. For the case of a logistic distribution, the Wilcoxon detector has the highest relative efficiency value for small signals and maintains this superiority as μ approaches 3.0. The sign test is the most efficient detector for μ = 0 and a double exponential distribution, but the Wilcoxon detector becomes more efficient than the sign test for μ > 0.55, and the normal scores detector becomes more efficient than the sign test for μ > 1.15.
The Wilcoxon detector outperforms the other two detectors for μ = 3.0. Of the conclusions that can be drawn from these results, perhaps the most important is that the Wilcoxon detector maintains a reasonably high performance level for the three probability densities and the various location shifts considered. Another important conclusion is that the ARE in Eq. (2.7-1) is generally valid only for the case of vanishingly small signals, and relative detector performance may be opposite to that predicted by the ARE for moderate to large signal inputs (location shifts). It should also be noted that the sign test performs much better for moderate to large location shifts than is indicated by its ARE values. It is clear from the preceding discussion that although nonparametric detectors utilize only polarity and rank information, the loss in efficiency is many times quite small when compared to the optimum linear detector; and if the background noise varies from the Gaussian assumption of the
linear detector, the nonparametric detectors may be infinitely more efficient than the linear detector. It can also be concluded that asymptotic results, in general, and the ARE, in particular, may not accurately predict detector performance for finite sample sizes. For this reason, detector performance studies when only a small number of observations are available for the detection process are extremely important. The following section discusses some of these investigations that have been reported in the literature.

4.5 Small Sample Performance
Small sample detector performance studies have generally been shunned by researchers since such studies are limited in general interpretation by the sample size, false alarm and detection probabilities, and the hypothesis and alternative being tested. However, investigations of this type are important since conditions under which the detector will be applied are more accurately duplicated than when using asymptotic results, and hence the performance evaluation is more accurate. In this section, three interesting studies of one-input nonparametric detector performance are discussed and the results compared to those obtained using asymptotic procedures. The finite sample performance of the one-input Wilcoxon and normal scores detectors for a Gaussian noise density and location shift alternatives has been investigated by Klotz [1963]. Klotz calculates both the power of the Wilcoxon detector and its relative efficiency with respect to the Student's t test for false alarm probabilities in the range 0.001-0.1 and for sample sizes from 5 to 10. The location shift values considered vary from μ = 0.25 to 3.0, with results also obtained for μ = 0 and μ → ∞. One general result is that the relative efficiency tends to decrease slightly for all significance levels α and sample sizes n considered as μ varies from 0 to 1.5. Furthermore, for all sample sizes and all false alarm probabilities except α = 0.001, the relative efficiency is always greater than or equal to the ARE value of 0.955. The relative efficiency values are larger for smaller sample sizes and tend to be concave downward as a function of increasing α for fixed n, with the maximum occurring for α ≥ 0.05. The power and relative efficiency of the normal scores test with respect to the Student's t test was also computed by Klotz [1963]. This part of the investigation considered significance levels in the interval 0.02 < α < 0.1 and n in the same range used for the Wilcoxon detector.
The decision regions for the normal scores and Wilcoxon detectors are the same for the combinations of α and n given by (n < 7, α < 0.10), (n = 8, α < 0.054), (n = 9, α < 0.027), and (n = 10, α < 0.0186). The relative efficiency values computed for the normal scores detector with respect to the Student's t test are all less than 1.0 for μ < 1.5. This is an important result since the normal scores detector has an ARE with respect to the t test which is always greater than or equal to 1.0. Additionally, the normal scores detector has a higher relative efficiency than the Wilcoxon detector only for μ < 1.25, with the relative efficiencies becoming approximately the same for μ = 1.25, and the Wilcoxon detector showing an advantage in relative efficiency for some α and n combinations. Notice that this result is in good agreement with the asymptotic calculations performed by Klotz [1965] and reported in Section 4.4.

Arnold [1965] has expanded on the work of Klotz and has computed the power of the Wilcoxon, sign, and Student's t tests for nonnormal shift alternatives and a small number of input observations. In this investigation, the hypothesis of zero median is tested against various positive shifts of the median for the t distribution with 1, 2, and 4 degrees of freedom. Sample sizes of n = 5 through 10 observations and false alarm probabilities in the 0.01 to 0.1 range are used in the calculations. The results are that both the Wilcoxon and Student's t detectors exhibit a significant loss in power for the long-tailed t distributions (degrees of freedom = 4, 1), especially when compared to their detection probabilities for Gaussian alternatives. In fact, Arnold's results show that the sign test maintains a higher detection probability than either the Wilcoxon or Student's t test for a Cauchy distribution (degrees of freedom = 1). The Wilcoxon detector does have a higher power than the t test for this case, but in general, its performance is closer to that of the t test than to that of the sign detector.
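Small-sample power calculations of this type can be approximated by simulation. The sketch below is our own illustration, not the exact computations of Klotz or Arnold: the critical regions come from scipy p-values, which for the discrete sign and Wilcoxon statistics are conservative rather than exact-size, so the estimates only roughly track the tabulated studies.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def power(sampler, n=10, shift=1.0, alpha=0.05, trials=500):
    """Monte Carlo detection probability of three one-input tests of
    H: median = 0 versus K: median > 0, each run at (approximate) level alpha."""
    hits = {"sign": 0, "wilcoxon": 0, "t": 0}
    for _ in range(trials):
        obs = sampler(n) + shift
        # Sign detector: count positive observations, exact binomial tail.
        k = int(np.sum(obs > 0))
        if stats.binom.sf(k - 1, n, 0.5) <= alpha:
            hits["sign"] += 1
        # One-input Wilcoxon (signed-rank) detector.
        if stats.wilcoxon(obs, alternative="greater").pvalue <= alpha:
            hits["wilcoxon"] += 1
        # Student's t detector.
        if stats.ttest_1samp(obs, 0.0, alternative="greater").pvalue <= alpha:
            hits["t"] += 1
    return {name: count / trials for name, count in hits.items()}

gauss = power(rng.standard_normal)    # Gaussian noise, as studied by Klotz [1963]
cauchy = power(rng.standard_cauchy)   # Cauchy noise, as studied by Arnold [1965]
```

Comparing the two result dictionaries shows the qualitative behavior described above: all three detectors lose power in going from Gaussian to Cauchy noise, with the t detector degrading the most.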
The surprisingly poor performance of the Wilcoxon detector for the Cauchy distribution can be traced to the fact that reverse orderings of positive and negative observations are equally likely for the Cauchy distribution even though the rank sums of these reverse orderings may be radically different [Arnold, 1965].

The finite sample relative efficiency of the sign test with respect to the Student's t test is calculated by Blyth [1958] by fixing the sample sizes and α, and defining loss functions which depend on the detection probabilities of the two tests. The loss functions are then investigated as the ratio of the mean to the standard deviation of the underlying noise distribution, which is Gaussian, is varied. For α = 0.05 and n = 11, the Student's t test has a 0.1 to 0.2 advantage in power over the sign test for values of the mean to standard deviation ratio from about 0.3 to 1.3. For ratios greater than or equal to 1.5, the powers are essentially the same.

Other than the three investigations described in this section, seemingly few additional studies of nonparametric detectors have been conducted. Some engineering applications which consider only a finite number of input observations are discussed in Chapter 9. However, considering the very interesting, useful, and sometimes surprising results obtained in the studies described in this section, additional work in this area is clearly needed.

4.6 Summary
The purpose of this chapter has been to give examples of the calculation of the ARE of one-input rank and nonrank nonparametric detectors as well as to compare the performance of the six one-input nonparametric detectors discussed in Chapter 3 using the concepts of ARE and finite sample size relative efficiency. In reviewing the results of the comparison, the words of caution on applying the ARE expressed in Section 2.7 should be kept in mind. If the limitations of the ARE as a performance indicator are realized, the ARE can be an extremely useful performance comparison technique, since it allows many detectors to be compared both generally and simply. The results in Section 4.5 indicate, however, that whenever possible, a finite sample study should be performed which duplicates as nearly as possible the detector application environment. The calculation of the ARE of the sign detector with respect to the linear detector utilized the basic definition of the ARE given by Eq. (2.7-1). The asymptotic false alarm and detection probabilities of the two detectors were calculated and the ratio of the two sample sizes determined by equating the false alarm probabilities of the two detectors to each other and the detection probabilities of the two detectors to each other. This approach is, of course, very general and can be used whenever the asymptotic false alarm and detection probabilities can be found. The ARE of the Wilcoxon detector with respect to the linear detector was determined by using the expression for the ARE in terms of the efficacies of the two detectors. This technique is preferred to the approach described in the previous paragraph whenever it is simpler to find the mean and variance of the two detectors rather than the asymptotic false alarm and detection probabilities. 
Since the mean and variance of the linear detector test statistic are easily calculated and the variance of the Wilcoxon test statistic was determined in Chapter 3, the approach utilizing the efficacies of the two detectors was employed in this case.

Following the calculation of the ARE of the Wilcoxon detector with respect to the linear detector in Section 4.3 is the derivation of an extremely powerful result. This result, originally proven by Hodges and Lehmann [1956], shows that the ARE of the Wilcoxon detector with respect to the linear detector will never be less than 86.4% for detecting a positive location shift, regardless of the background noise density. Since many communication and radar systems require the detection of a positive shift of the noise density, and since the linear detector is commonly employed in these systems, the high lower bound on the ARE of the Wilcoxon detector with respect to the linear detector indicates that the Wilcoxon detector might be gainfully employed in these systems.

In Section 4.4, the AREs of the six nonparametric detectors discussed in Chapter 3 with respect to the linear detector are tabulated and discussed for various background noise densities. The AREs shown are surprisingly high, even for the case of Gaussian noise, for which the linear detector is the most powerful detector. This section also contains the results of a study by Klotz [1965] using a different definition of asymptotic efficiency. This definition is important since it does not require the alternative to approach the hypothesis, which seems to be the primary weakness of the ARE in predicting finite sample size performance.

A few detector performance evaluations based on a small number of input observations are discussed in Section 4.5. Using these results, some idea of the utility of the asymptotic performance measures can be obtained. From the power and relative efficiency values presented in this chapter, it can be concluded that the performance of nonparametric detectors compares very favorably with that of parametric detectors. Further, nonparametric detector performance tends to remain high for a variety of input noise distributions. These last two facts are the primary motivating factors behind the consideration of nonparametric detection procedures for engineering applications.
Problems

1. Compute the ARE of the sign detector with respect to the linear detector for a uniform noise distribution given by

$$f(x) = \begin{cases} 1, & -1/2 < x < 1/2 \\ 0, & \text{otherwise} \end{cases}$$
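As a numerical check on Problem 1, the sketch below evaluates the closed-form ARE of the sign detector relative to the linear detector for a location shift, ARE = 4σ²f²(0); we are assuming this standard efficacy-ratio expression (derived for zero-median noise earlier in the chapter) applies here. For the uniform density above, σ² = 1/12 and f(0) = 1, giving ARE = 1/3.

```python
import numpy as np

# Assumed closed form: ARE of sign vs. linear detector = 4 * sigma^2 * f(0)^2,
# where sigma^2 is the noise variance and f is the noise density.
sigma2 = 1.0 / 12.0      # variance of the uniform density on (-1/2, 1/2)
f0 = 1.0                 # density height at the origin
are = 4.0 * sigma2 * f0 ** 2      # = 1/3

# Monte Carlo cross-check of the variance used above.
rng = np.random.default_rng(1)
sigma2_mc = rng.uniform(-0.5, 0.5, size=1_000_000).var()
```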
where θ is the amount of the location shift of the distribution function and is the received signal-to-noise ratio (SNR). Of course, a form for the most powerful detector for testing the very general hypothesis H9 versus the general alternative K9 cannot be found utilizing the likelihood ratio technique. If, however, the distribution functions F and G are assumed to be Gaussian, a detector can be derived which is the most powerful unbiased detector for this particular problem. If the probability densities f(x_i) and g(y_j) are assumed to be Gaussian with means θ1 and θ2, respectively, and with equal variance, say σ², the detection problem can be written as testing

H10: θ1 = θ2, f and g Gaussian with unknown variance σ²

versus

K10: θ1 ≠ θ2, θ2 > 0, f and g Gaussian with unknown variance σ²

One method for testing H10 versus K10 would be to find the joint densities of x and y under H10 and K10, and then form the likelihood ratio of the joint density given K10 is true to the joint density given H10 is true. This approach transforms the problem into the familiar framework used in Section 3.2. Since f and g are Gaussian and independent, the joint density of x and y given H10 is true can be written as

$$h(x, y \mid H_{10}) = f(x \mid H_{10})\, g(y \mid H_{10}) \tag{5.2-1}$$
5.2
One-Input Detectors with Reference Noise Samples
99
and the joint density given that K10 is true can be written as

$$h(x, y \mid K_{10}) = f(x \mid H_{10})\, g(y \mid K_{10}) \tag{5.2-2}$$

The problem of testing H10 versus K10 has been transformed into the equivalent problem of testing the density in Eq. (5.2-1) versus the density given by Eq. (5.2-2). Since the parameters θ1 and σ² in Eq. (5.2-1) and the parameters θ1, θ2, and σ² in Eq. (5.2-2) are unknown, these parameters must be eliminated from the expressions before the likelihood ratio is formed. Following the procedure utilized in Section 3.2, these parameters will be estimated by the method of maximum likelihood. The parameters θ1 and σ² in Eq. (5.2-1) will be estimated by utilizing the expressions

$$(\partial / \partial \theta_1) \ln h(x, y \mid H_{10}) = 0 \tag{5.2-3}$$

and

$$(\partial / \partial \sigma^2) \ln h(x, y \mid H_{10}) = 0 \tag{5.2-4}$$

Taking the natural logarithm of Eq. (5.2-1) and substituting into Eq. (5.2-3) yields

$$\sum_{i=1}^{m} x_i + \sum_{j=1}^{n} y_j = (m + n)\hat{\theta}_1$$
5
100
Two-Input Detectors
or

$$\hat{\theta}_1 = \frac{1}{m+n}\Big(\sum_{i=1}^{m} x_i + \sum_{j=1}^{n} y_j\Big) = \frac{m\bar{x} + n\bar{y}}{m+n} \tag{5.2-5}$$

Taking the natural logarithm of Eq. (5.2-1), substituting θ̂1 for θ1, and using Eq. (5.2-4), then solving for σ², yields

$$\hat{\sigma}^2 = \frac{1}{m+n}\Big[\sum_{i=1}^{m}(x_i - \hat{\theta}_1)^2 + \sum_{j=1}^{n}(y_j - \hat{\theta}_1)^2\Big] \tag{5.2-6}$$

Substituting θ̂1 and σ̂² into Eq. (5.2-1) yields

$$h(x, y \mid H_{10})\big|_{\max} = \big(2\pi\hat{\sigma}^2\big)^{-(m+n)/2} e^{-(m+n)/2} \tag{5.2-7}$$

To find the estimates of θ1, θ2, and σ² given that K10 is true, the following expressions must be solved simultaneously:

$$(\partial / \partial \theta_1) \ln h(x, y \mid K_{10}) = 0 \tag{5.2-8}$$

$$(\partial / \partial \theta_2) \ln h(x, y \mid K_{10}) = 0 \tag{5.2-9}$$

$$(\partial / \partial \sigma^2) \ln h(x, y \mid K_{10}) = 0 \tag{5.2-10}$$

Taking the natural logarithm of Eq. (5.2-2) and substituting into Eq. (5.2-8),

$$\frac{1}{\sigma^2}\sum_{i=1}^{m}(x_i - \hat{\theta}_1)(-1) = 0$$
or

$$\hat{\theta}_1 = \frac{1}{m}\sum_{i=1}^{m} x_i = \bar{x} \tag{5.2-11}$$

Similarly for θ̂2,

$$\hat{\theta}_2 = \frac{1}{n}\sum_{j=1}^{n} y_j = \bar{y} \tag{5.2-12}$$

Substituting θ̂1 and θ̂2 for θ1 and θ2 in h(x, y | K10) and utilizing Eq. (5.2-10),

$$-\frac{m+n}{2\sigma^2} + \frac{1}{2\sigma^4}\Big[\sum_{i=1}^{m}(x_i - \hat{\theta}_1)^2 + \sum_{j=1}^{n}(y_j - \hat{\theta}_2)^2\Big] = 0$$

which when solved for σ² yields

$$\hat{\sigma}^2 = \frac{1}{m+n}\Big[\sum_{i=1}^{m}(x_i - \bar{x})^2 + \sum_{j=1}^{n}(y_j - \bar{y})^2\Big] \tag{5.2-13}$$

Substituting Eqs. (5.2-11), (5.2-12), and (5.2-13) for θ1, θ2, and σ², respectively, in Eq. (5.2-2) yields the maximum value of the density

$$h(x, y \mid K_{10})\big|_{\max} = \big(2\pi\hat{\sigma}^2\big)^{-(m+n)/2} e^{-(m+n)/2} \tag{5.2-14}$$

The generalized likelihood ratio can now be formed by taking the ratio of Eq. (5.2-14) to Eq. (5.2-7):
$$L(x, y) = \frac{\Big[\displaystyle\sum_{i=1}^{m}(x_i - \bar{x})^2 + \sum_{j=1}^{n}(y_j - \bar{y})^2\Big]^{-(m+n)/2}}{\Big[\displaystyle\sum_{i=1}^{m}\Big(x_i - \frac{m\bar{x}+n\bar{y}}{m+n}\Big)^2 + \sum_{j=1}^{n}\Big(y_j - \frac{m\bar{x}+n\bar{y}}{m+n}\Big)^2\Big]^{-(m+n)/2}} \tag{5.2-15}$$
Working with the first term in the denominator of Eq. (5.2-15),

$$\sum_{i=1}^{m}\Big[x_i - \frac{m\bar{x}+n\bar{y}}{m+n}\Big]^2 = \sum_{i=1}^{m}(x_i - \bar{x})^2 + m\Big[\bar{x} - \frac{m\bar{x}+n\bar{y}}{m+n}\Big]^2 \tag{5.2-16}$$

and similarly with the second term in the denominator,

$$\sum_{j=1}^{n}\Big[y_j - \frac{m\bar{x}+n\bar{y}}{m+n}\Big]^2 = \sum_{j=1}^{n}(y_j - \bar{y})^2 + n\Big[\bar{y} - \frac{m\bar{x}+n\bar{y}}{m+n}\Big]^2 \tag{5.2-17}$$

Expanding the last term in Eq. (5.2-16),

$$m\Big[\bar{x} - \frac{m\bar{x}+n\bar{y}}{m+n}\Big]^2 = m\,\frac{(m+n)^2\bar{x}^2 - 2\bar{x}(m+n)(m\bar{x}+n\bar{y}) + (m\bar{x}+n\bar{y})^2}{(m+n)^2} = \frac{mn^2(\bar{x}-\bar{y})^2}{(m+n)^2} \tag{5.2-18}$$

and similarly for the last term in Eq. (5.2-17),

$$n\Big[\bar{y} - \frac{m\bar{x}+n\bar{y}}{m+n}\Big]^2 = \frac{nm^2(\bar{x}-\bar{y})^2}{(m+n)^2} \tag{5.2-19}$$

Using the results of Eqs. (5.2-18) and (5.2-19) and substituting Eqs. (5.2-16)
and (5.2-17) back into Eq. (5.2-15) yields

$$L(x, y) = \left[\frac{\displaystyle\sum_{i=1}^{m}(x_i - \bar{x})^2 + \sum_{j=1}^{n}(y_j - \bar{y})^2 + \frac{mn(\bar{x}-\bar{y})^2}{m+n}}{\displaystyle\sum_{i=1}^{m}(x_i - \bar{x})^2 + \sum_{j=1}^{n}(y_j - \bar{y})^2}\right]^{(m+n)/2} \tag{5.2-20}$$
Since (x̄ − ȳ)² = (ȳ − x̄)², the quantity (ȳ − x̄)² will be substituted for (x̄ − ȳ)² in Eq. (5.2-20). Utilizing this substitution, taking the square root of the second term under the brackets, and multiplying and dividing the denominator by √(n + m − 2)/√(n + m − 2) yields

$$t = \frac{\sqrt{\dfrac{mn}{m+n}}\,(\bar{y}-\bar{x})}{\sqrt{\dfrac{1}{n+m-2}\Big[\displaystyle\sum_{i=1}^{m}(x_i - \bar{x})^2 + \sum_{j=1}^{n}(y_j - \bar{y})^2\Big]}}$$

where t has the Student's t distribution with n + m − 2 degrees of freedom when H10 is true. Equation (5.2-20) can therefore be written as

$$L(x, y) = \Big(1 + \frac{t^2}{n+m-2}\Big)^{(m+n)/2} \tag{5.2-21}$$

L(x, y) in Eq. (5.2-21) is a monotonic function of t², and therefore a test based on t² will be equivalent to a test utilizing L(x, y) as the test statistic.
Since t² = 0 when L(x, y) = 1 and t² → ∞ as L(x, y) → ∞, an alternative acceptance region of the form L(x, y) > A > 1 is equivalent to an alternative acceptance region of the form t² > B; that is, the alternative K10 is accepted for positive and negative values of t with magnitudes greater than B. Since the signal to be detected, θ2, is assumed to be positive and t² is a monotonic function of t, the test statistic for testing the hypothesis H10 versus the alternative K10 can be written as

$$\sqrt{\frac{mn}{m+n}}\left(\frac{1}{n}\sum_{j=1}^{n} y_j - \frac{1}{m}\sum_{i=1}^{m} x_i\right)\Bigg/\sqrt{\frac{1}{n+m-2}\left[\sum_{i=1}^{m}\Big(x_i - \frac{1}{m}\sum_{k=1}^{m} x_k\Big)^2 + \sum_{j=1}^{n}\Big(y_j - \frac{1}{n}\sum_{k=1}^{n} y_k\Big)^2\right]}\ \mathop{\gtrless}_{H_{10}}^{K_{10}}\ T \tag{5.2-22}$$

which is the two-input version of the Student's t test. For small m and n, the threshold T can be determined by fixing the false alarm probability at its desired value, say α0, and using tables of the t distribution to find the threshold; that is, let S denote the left-hand side of Eq. (5.2-22); then T can be determined from

$$\alpha = \alpha_0 = P\{S > T \mid H_{10}\} \tag{5.2-23}$$

using tables of the t distribution (see Appendix B). After finding T using the above equation, the probability of detection can be found by using the expression 1 − β = 1 − P{S < T | K10} and tables of the noncentral t distribution. [See Hogg and Craig, 1970, for a discussion of the noncentral t distribution.]

For m and n large, the statistic S under H10 is asymptotically Gaussian with zero mean and unit variance. This can be seen if the definitions
$$\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i \tag{5.2-24a}$$

$$\bar{y} = \frac{1}{n}\sum_{j=1}^{n} y_j \tag{5.2-24b}$$

and

$$s_x^2 = \sum_{i=1}^{m}\Big(x_i - \frac{1}{m}\sum_{k=1}^{m} x_k\Big)^2 \tag{5.2-24c}$$

$$s_y^2 = \sum_{j=1}^{n}\Big(y_j - \frac{1}{n}\sum_{k=1}^{n} y_k\Big)^2 \tag{5.2-24d}$$
are substituted into the left-hand side of Eq. (5.2-22) to yield

$$S = \frac{\bar{y} - \bar{x}}{\sqrt{\Big(\dfrac{1}{m} + \dfrac{1}{n}\Big)\Big[\dfrac{s_x^2 + s_y^2}{n+m-2}\Big]}} \tag{5.2-25}$$

In Eq. (5.2-25) the second quantity in the denominator in brackets is just the estimate of the variance σ², which will be denoted by σ̂². Substituting into Eq. (5.2-25),

$$S = \frac{\bar{y} - \bar{x}}{\hat{\sigma}\sqrt{(1/m) + (1/n)}} \tag{5.2-26}$$

The quantity ȳ − x̄ in Eq. (5.2-26) is Gaussian distributed, since x and y are independent and Gaussian distributed, and for m and n large and H10 true, ȳ − x̄ has zero mean and variance σ²[(1/m) + (1/n)]. Since for m and n large the denominator of Eq. (5.2-26) approaches σ√((1/m) + (1/n)), S is asymptotically Gaussian with zero mean and unit variance by the central limit theorem. The threshold can thus be found as was done in Chapter 3 by specifying α and utilizing

$$\alpha = 1 - \Phi(T) \tag{3.2-19}$$

and with this value of T, the power of the test can be determined from

$$1 - \beta = 1 - \Phi[T - \mu]$$

since S is asymptotically Gaussian with mean μ = (θ2 − θ1)/(σ√((1/m) + (1/n))) and unit variance when K10 is true.

It was stated earlier that the two-input version of the Student's t test given by Eq. (5.2-22) is UMP unbiased for testing H10 versus K10. That this statement is true can be seen by employing an argument directly analogous to that used when discussing the one-sample Student's t test in Section 3.2. Kendall and Stuart [1967] have shown that the two-sample Student's t test statistic is the UMP similar test for the present hypothesis and alternative. By the discussion of similarity and unbiasedness in Section 2.6, the Student's t test will be the UMP unbiased test of H10 versus K10 if it can be shown that the test statistic on the left-hand side of Eq. (5.2-22) is unbiased. This result can be demonstrated in a manner analogous to that employed for the one-sample Student's t test in Section 3.2. Let t_α be
chosen such that

$$P\Big\{\frac{\bar{y} - \bar{x}}{S_D} > t_\alpha\Big\} = \alpha \tag{5.2-27}$$

when θ2 = θ1, and where S_D is defined by

$$S_D = \hat{\sigma}\sqrt{(1/m) + (1/n)}$$

Consider now a case such that θ2 > θ1; then

$$P\Big\{\frac{\bar{y} - \bar{x}}{S_D} > t_\alpha \,\Big|\, K_{10}\Big\} = P\Big\{\frac{(\bar{y} - \bar{x}) - (\theta_2 - \theta_1)}{S_D} > t_\alpha - \frac{\theta_2 - \theta_1}{S_D} \,\Big|\, K_{10}\Big\} > \alpha$$

since [(ȳ − x̄) − (θ2 − θ1)]/S_D has the same distribution under K10 that (ȳ − x̄)/S_D has under H10, while the threshold on the right is reduced. Therefore,

$$Q_D(F) > \alpha \quad \text{for all } F \in K_{10} \tag{5.2-28}$$

and from Eq. (5.2-27),

$$E\{D(x, y) \mid H_{10}\} = \alpha \quad \text{for all } F \in H_{10} \tag{5.2-29}$$

The two-sample Student's t test is thus unbiased of level α for testing H10 versus K10. Since the Student's t test is also the UMP similar test for this same problem, it follows that the test defined by Eq. (5.2-22) is the UMP unbiased detector for testing H10 versus K10.

The next logical generalization of the present detection problem is to let the density functions under the hypothesis and alternative be continuous but otherwise unknown. The resulting detection problem can
be stated as testing

H11: θ1 = θ2, σ² finite, f and g otherwise unknown

versus

K11: θ1 < θ2, σ² finite, f and g otherwise unknown

Since no specific form is assumed for either f or g, the problem can be easily classified as being nonparametric or distribution-free. Since the Student's t test is the UMP unbiased detector under the assumption that f and g are Gaussian, its false alarm rate will not remain constant for arbitrary input probability densities; however, for m and n large in Eq. (5.2-22), the Student's t test is asymptotically nonparametric. This can be seen by again considering Eqs. (5.2-24a-d)-(5.2-26). For a large number of x and y observations, both ȳ and x̄ are asymptotically Gaussian by the central limit theorem, and their difference is thus also asymptotically Gaussian with zero mean and variance (σ²/n) + (σ²/m) under H11. The discussion of the denominator in Eq. (5.2-26) still holds for H11, namely that for m and n large, σ̂√((1/m) + (1/n)) approaches σ√((1/m) + (1/n)). The Student's t test statistic on the left-hand side of Eq. (5.2-22) is thus asymptotically Gaussian with zero mean and unit variance for H11 true when the number of x and y observations is large. If the threshold is chosen according to Eq. (3.2-19), then the Student's t test is asymptotically nonparametric when H11 is true.

The power of the detector can be found as before. When K11 is true, the test statistic has a limiting Gaussian distribution with mean proportional to (θ2 − θ1) and variance (σ²/n) + (σ²/m); thus as m and n approach infinity, the mean of the test statistic approaches infinity and the probability of detection goes to one. By Eq. (2.6-13), the Student's t test is thus consistent for the alternative K11.

The two-input Student's t test given by Eq. (5.2-22) is therefore the UMP unbiased detector for testing H10 versus K10, and is asymptotically nonparametric under H11 and consistent for K11. The two-input version suffers from the same problem of irrelevance in its rejection region as does the one-input t test discussed in Section 3.2.
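As a usage sketch of the two-input t detector of Eq. (5.2-22), the following code (our variable names; the threshold is read from scipy's t table rather than from Appendix B) computes the statistic and confirms that it matches scipy's pooled-variance two-sample t statistic:

```python
import numpy as np
from scipy import stats

def t_detector(x, y, alpha=0.05):
    """Two-input Student's t detector of Eq. (5.2-22): accept K10 (positive
    shift of the y observations) when the statistic exceeds the threshold."""
    m, n = len(x), len(y)
    pooled = (np.sum((x - x.mean()) ** 2) + np.sum((y - y.mean()) ** 2)) / (m + n - 2)
    s = np.sqrt(m * n / (m + n)) * (y.mean() - x.mean()) / np.sqrt(pooled)
    threshold = stats.t.ppf(1 - alpha, df=m + n - 2)  # small-sample threshold
    return s, s > threshold

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=12)       # reference noise samples
y = rng.normal(1.5, 1.0, size=12)       # observations with a positive shift
s, accept_k = t_detector(x, y)

# Cross-check against scipy's pooled-variance two-sample t statistic.
t_scipy = stats.ttest_ind(y, x, equal_var=True).statistic
```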
5.2.2 Wilcoxon and Mann-Whitney Detectors

In this section the problem to be considered is again that of detecting a positive location shift of a probability density function. Two sets of
observations are assumed to be available; one is a set of reference noise samples denoted by x = {x1, x2, ..., xn} with distribution function F(x), and the other is the set of observations to be tested, denoted by y = {y1, y2, ..., ym} with distribution function G(y). The hypothesis and alternative of interest can be written as

H12: F(x) = G(y)  for all x and y

K12: F(x) ≥ G(y)  for all x and y, and F(x) > G(y)  for some x and y

This is simply a statement of the desire to detect a positive shift of the probability density or distribution function of the y observations, and is another way of writing the hypothesis and alternative H11 and K11, respectively. For the purpose of this discussion, F and G are assumed to be continuous in order to eliminate the possibility of tied observations.

The detector originally proposed by Wilcoxon in 1945 consists of algebraically ranking the m + n available observations and then summing the ranks of the x_i and comparing the statistic to a threshold,

$$S = \sum_{i=1}^{n} R_{x_i}\ \mathop{\gtrless}_{K_{12}}^{H_{12}}\ T \tag{5.2-30}$$

From Eq. (5.2-30), it can be seen that the hypothesis is accepted when the value of the test statistic is greater than a fixed threshold. Since normally in engineering applications the alternative is accepted for larger values of the test statistic, the test will be written in terms of the ranks of the y observations,

$$S' = \sum_{j=1}^{m} R_{y_j}\ \mathop{\gtrless}_{H_{12}}^{K_{12}}\ T_1 \tag{5.2-31}$$

which is entirely equivalent to the detector in Eq. (5.2-30). A slightly modified version of the detectors in Eqs. (5.2-30) and (5.2-31), which is ideally suited for testing H12 versus K12, was proposed by Mann and Whitney [1947]. The original test proposed by Mann and Whitney has the form
$$U = \sum_{j=1}^{m}\sum_{i=1}^{n} u(x_i - y_j)\ \mathop{\gtrless}_{K_{12}}^{H_{12}}\ T_2 \tag{5.2-32}$$
where u(x) is the unit step function. This test or its equivalent was proposed by a number of authors over the years, including Wilcoxon, and Kruskal [1957] summarizes the proposals of some of the lesser known authors. Strictly speaking, the detector described by Eqs. (5.2-30) and (5.2-31) is a Wilcoxon detector, and that described by Eq. (5.2-32) is a Mann-Whitney detector. However, in spite of a lack of consensus in the literature, the name Mann-Whitney will be used to denote the detectors
represented by Eqs. (5.2-30) through (5.2-32), which include the two-input Wilcoxon test. The U detector in Eq. (5.2-32) counts the number of times that an x observation exceeds a y observation and accepts the hypothesis when the value of the test statistic is greater than T2. This detector, like the detector in Eq. (5.2-30), is not in the usual form for engineering applications, but it can be rewritten as

$$U' = \sum_{j=1}^{m}\sum_{i=1}^{n} u(y_j - x_i)\ \mathop{\gtrless}_{H_{12}}^{K_{12}}\ T_3 \tag{5.2-33}$$
which is exactly equivalent to the detector in Eq. (5.2-32). The S, S', U, and U' statistics are all related by straightforward expressions which greatly facilitate the computation of probabilities for the detectors that employ these statistics, since only one table is required to supply the necessary probability information. The relationship, stated without proof by Mann and Whitney [1947] in their original paper, is

$$U = mn + [m(m+1)/2] - S' \tag{5.2-34}$$

where S' is based on the ranks of the y observations and is given by Eq. (5.2-31). The proof of Eq. (5.2-34) will be approached by first deriving a relation between U and S and a similar relation between U' and S'. To simplify the derivation, let the x and y observations be jointly represented by a single set of observations denoted by w, where w is given by

$$w = \{w_1, w_2, \ldots, w_n, w_{n+1}, \ldots, w_{n+m}\} = \{x_1, x_2, \ldots, x_n, y_1, \ldots, y_m\}$$

Utilizing this definition, S in Eq. (5.2-30) can be written as

$$S = \sum_{i=1}^{n} R_{x_i} = \sum_{i=1}^{n}\sum_{j=1}^{n+m} u(w_i - w_j) \tag{5.2-35}$$
which is simply another method of computing and summing the ranks of the x observations. This can be seen by noting that the statistic on the right-hand side of Eq. (5.2-35) compares each x_i with each observation in the total set of observations w, records a 1 each time x_i ≥ w_j, and sums all of the 1's generated in this manner, thus computing the rank of each x_i. Continuing with the proof, Eq. (5.2-35) can further be written as
$$S = \sum_{i=1}^{n}\sum_{j=1}^{m+n} u(w_i - w_j) = \sum_{i=1}^{n}\sum_{j=n+1}^{n+m} u(w_i - w_j) + \sum_{i=1}^{n}\sum_{j=1}^{n} u(w_i - w_j) \tag{5.2-36}$$

Working now with the last term in Eq. (5.2-36), note that this statistic is simply comparing the x observations with themselves. Using this fact, the
value of this term can be found as follows. For each value of i there will be a value of j = i such that w_i = w_j, and thus u(w_i − w_j) = u(0). Therefore, using the definition of the unit step function in Chapter 1, there will be a total of n ones generated by the above process for i = j. Since the maximum value of the last term in Eq. (5.2-36) is n², the maximum possible score excluding the cases for i = j is n² − n = n(n − 1). Of these n(n − 1) terms, only n(n − 1)/2 will equal one, since each observation is being compared to every other observation twice, and in one of these cases x_i > x_j and in the other x_i < x_j. Summing the contributions of the i = j and i ≠ j cases yields a value of the last term in Eq. (5.2-36) of n + [n(n − 1)/2] = n(n + 1)/2. Using this fact and noting that the first term on the right-hand side of Eq. (5.2-36) is U as given in Eq. (5.2-32), S can now be expressed as [Hajek, 1969]

$$S = U + [n(n+1)/2] \tag{5.2-37}$$

or

$$U = S - [n(n+1)/2] \tag{5.2-38}$$
In an exactly analogous manner, an expression relating S' and U' can be found:

$$S' = \sum_{j=1}^{m} R_{y_j} = \sum_{j=1}^{m}\sum_{i=1}^{n} u(y_j - x_i) + \sum_{j=1}^{m}\sum_{i=1}^{m} u(y_j - y_i) = U' + [m(m+1)/2] \tag{5.2-39}$$

or

$$U' = S' - [m(m+1)/2] \tag{5.2-40}$$
The original result of Mann and Whitney [1947] in Eq. (5.2-34) can now be obtained from Eq. (5.2-40) and the fact that U + U' = mn; thus

$$U' = S' - [m(m+1)/2] = mn - U$$

or, rearranging,

$$U = mn + [m(m+1)/2] - S' \tag{5.2-34}$$

which is the desired result, Eq. (5.2-34). Using Eqs. (5.2-37)-(5.2-40) and U + U' = mn, any necessary relationship between U, S, U', and S' can be found. As noted earlier, these relationships are particularly useful when one is working with the U' statistic but only has a table of probabilities for the S statistic available.
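The relationships among S, S', U, and U' are easy to verify numerically. The following sketch (our code) draws tie-free Gaussian samples and checks Eqs. (5.2-34), (5.2-37), and (5.2-40) directly:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 6, 8                         # n reference samples x, m test samples y
x = rng.normal(0.0, 1.0, size=n)
y = rng.normal(0.7, 1.0, size=m)

w = np.concatenate([x, y])
ranks = w.argsort().argsort() + 1   # ranks 1..n+m (ties occur with probability zero)

S = int(ranks[:n].sum())            # rank sum of the x's, Eq. (5.2-30)
S_prime = int(ranks[n:].sum())      # rank sum of the y's, Eq. (5.2-31)
U = int(np.sum(x[:, None] > y[None, :]))        # x's exceeding y's, Eq. (5.2-32)
U_prime = int(np.sum(y[:, None] > x[None, :]))  # y's exceeding x's, Eq. (5.2-33)

assert U + U_prime == m * n
assert S == U + n * (n + 1) // 2                  # Eq. (5.2-37)
assert U_prime == S_prime - m * (m + 1) // 2      # Eq. (5.2-40)
assert U == m * n + m * (m + 1) // 2 - S_prime    # Eq. (5.2-34)
```

The assertions hold for any tie-free data set, not just this draw, since they are algebraic identities among the four statistics.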
Capon [1959b] has presented and analyzed a normalized version of the U' statistic given by

$$V = \frac{1}{mn} U' \tag{5.2-41}$$

All calculations performed for the U' statistic are applicable to the V statistic with the effects of normalization incorporated, and therefore only the U and U' statistics will be treated in the ensuing discussion.

Since the mean and variance of detector test statistics are important when approximations are used to compute false alarm and detection probabilities, the mean and variance of the basic test statistic treated in this section, the S statistic, will be derived for the case when the hypothesis H12 is true. The use of the relationships in Eqs. (5.2-37)-(5.2-40) to obtain the means and variances of the S', U, and U' test statistics will then be illustrated. From Eq. (5.2-30), S = Σ_{i=1}^{n} R_{x_i}, and therefore the expected value of S is

$$E\{S\} = \sum_{i=1}^{n} E\{R_{x_i}\} = \sum_{i=1}^{n}\sum_{k=1}^{n+m} k\,P\{R_{x_i} = k\} \tag{5.2-42}$$

where the last step follows from the definition of expectation. To compute the probability that the rank of the ith observation is k, note that when H12 is true, the total number of possible permutations of the ranks of the first n observations in the set of N = n + m observations is just the number of permutations of N things taken n at a time, or (n + m)!/(m)!. Similarly, the number of permutations of the ranks of the first n observations given that the rank of the ith observation is k is (n + m − 1)!/(m)!. Using these results, the desired probability is found to be

$$P\{R_{x_i} = k\} = \frac{(n+m-1)!/m!}{(n+m)!/m!} = \frac{1}{N} \tag{5.2-43}$$

Substituting this result into Eq. (5.2-42) yields

$$E\{S\} = \sum_{i=1}^{n}\sum_{k=1}^{N} \frac{k}{N}$$

Since Σ_{k=1}^{N} k = N(N + 1)/2 [Selby and Girling, 1965],

$$E\{S\} = \frac{n}{N}\cdot\frac{N(N+1)}{2} = \frac{n(N+1)}{2} \tag{5.2-44}$$

which is the mean of the S statistic when H12 is true. The mean of the U statistic can be obtained from Eq. (5.2-32) by noting
that

$$E\{U\} = \sum_{j=1}^{m}\sum_{i=1}^{n} E\{u(x_i - y_j)\} = \sum_{j=1}^{m}\sum_{i=1}^{n} P\{x_i > y_j\} = \frac{mn}{2} \tag{5.2-45}$$

This result can also be obtained by taking the expected value of Eq. (5.2-38),

$$E\{U\} = E\{S\} - E\{n(n+1)/2\}$$

which upon substituting for E{S} from Eq. (5.2-44) becomes

$$E\{U\} = \tfrac{1}{2}n(N+1) - \tfrac{1}{2}n(n+1) = \tfrac{1}{2}n(n+m+1) - \tfrac{1}{2}n(n+1) = \tfrac{1}{2}(n^2 + mn + n - n^2 - n) = \tfrac{1}{2}mn$$

which is the same as Eq. (5.2-45). Since U + U' = mn,

$$E\{U'\} = mn - E\{U\} = mn - (mn/2) = mn/2 \tag{5.2-46}$$

and from Eq. (5.2-39) it follows directly that

$$E\{S'\} = E\{U'\} + \tfrac{1}{2}m(m+1) = \tfrac{1}{2}mn + \tfrac{1}{2}m(m+1) = \tfrac{1}{2}m(n+m+1) = \tfrac{1}{2}m(N+1) \tag{5.2-47}$$
The variance of the S statistic is somewhat more difficult to compute than the mean value. The derivation here parallels that of Hajek [1969], but is less general than his result since only the variance of the S statistic is derived here. The derivation utilizes the fact that the variance of S can be written as

$$\operatorname{var}\{S\} = \sum_{i=1}^{n} \operatorname{var}\{R_{x_i}\} + \sum_{j=1}^{n}\sum_{\substack{k=1 \\ k \neq j}}^{n} \operatorname{cov}\{R_{x_j}, R_{x_k}\} \tag{5.2-48}$$

The first term on the right-hand side of Eq. (5.2-48) can be expanded as

$$\sum_{i=1}^{n} \operatorname{var}\{R_{x_i}\} = \sum_{i=1}^{n}\sum_{l=1}^{N} \big[l - E\{R_{x_i}\}\big]^2 P\{R_{x_i} = l\} \tag{5.2-49}$$

When H12 is true, P{R_{x_i} = l} = 1/N, using the same reasoning that was employed in the one-input case in Section 3.4, Eq. (3.4-8). Substituting
back into Eq. (5.2-49) yields

$$\sum_{i=1}^{n} \operatorname{var}\{R_{x_i}\} = \frac{1}{N}\sum_{i=1}^{n}\sum_{l=1}^{N} \big[l - E\{R_{x_i}\}\big]^2 \tag{5.2-50}$$

To compute the contribution of the second term of Eq. (5.2-48), consider only the expression for the covariance,

$$\operatorname{cov}\{R_{x_j}, R_{x_k}\} = E\big\{[R_{x_j} - E\{R_{x_j}\}][R_{x_k} - E\{R_{x_k}\}]\big\} = \sum_{\substack{g, h = 1 \\ g \neq h}}^{N} \big[g - E\{R_{x_j}\}\big]\big[h - E\{R_{x_k}\}\big] P\{R_{x_j} = g, R_{x_k} = h\} \tag{5.2-51}$$

Using the law of conditional probability, the joint probability of the ranks in Eq. (5.2-51) can be written as

$$P\{R_{x_j} = g, R_{x_k} = h\} = P\{R_{x_j} = g\}\,P\{R_{x_k} = h \mid R_{x_j} = g\} = \frac{(N-1)!}{N!}\cdot\frac{(N-2)!}{(N-1)!} = \frac{1}{N(N-1)}$$

Using this result in Eq. (5.2-51) and substituting for the double summation yields

$$\operatorname{cov}\{R_{x_j}, R_{x_k}\} = \frac{1}{N(N-1)}\left\{\sum_{g=1}^{N}\big[g - E\{R_{x_j}\}\big]\sum_{h=1}^{N}\big[h - E\{R_{x_k}\}\big] - \sum_{g=1}^{N}\big[g - E\{R_{x_j}\}\big]^2\right\} \tag{5.2-52}$$

Since E{R_{x_j}} = Σ_{l=1}^{N} l P{R_{x_j} = l} = Σ_{l=1}^{N} l/N, the first term on the right-hand side of Eq. (5.2-52) becomes

$$\sum_{g=1}^{N}\big[g - E\{R_{x_j}\}\big]\sum_{h=1}^{N}\big[h - E\{R_{x_k}\}\big] = 0$$

The expression for the covariance thus reduces to

$$\operatorname{cov}\{R_{x_j}, R_{x_k}\} = -\frac{1}{N(N-1)}\sum_{g=1}^{N}\big[g - E\{R_{x_j}\}\big]^2 \tag{5.2-53}$$
Substituting Eqs. (5.2-50) and (5.2-53) into Eq. (5.2-48) yields

$$\operatorname{var}\{S\} = \frac{(N-1)n - n(n-1)}{N(N-1)}\sum_{l=1}^{N}\big[l - E\{R_{x_i}\}\big]^2 \tag{5.2-54}$$

Since E{R_{x_i}} = Σ_{k=1}^{N} k P{R_{x_i} = k} = Σ_{k=1}^{N} k/N = (N + 1)/2, and from Selby and Girling [1965]

$$\sum_{l=1}^{N} l^2 = \frac{N(N+1)(2N+1)}{6} \qquad \text{and} \qquad \sum_{k=1}^{N} k = \frac{N(N+1)}{2}$$

Eq. (5.2-54) becomes

$$\operatorname{var}\{S\} = \frac{mn}{N(N-1)}\left[\frac{N(N+1)(2N+1)}{6} - \frac{N(N+1)^2}{4}\right] = \frac{mn}{N(N-1)}\cdot\frac{N(N+1)(N-1)}{12} = \frac{mn(N+1)}{12} \tag{5.2-55}$$

which is the variance of S when H12 is true.
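Both moments can be confirmed by brute-force enumeration for small n and m, since under H12 every assignment of n of the N ranks to the x observations is equally likely (a quick sketch; variable names are ours):

```python
from itertools import combinations
from statistics import mean, pvariance

n, m = 3, 4          # small example: n x-ranks out of N = n + m total ranks
N = n + m

# Under H12 every n-subset of the ranks {1, ..., N} is equally likely to
# belong to the x observations, so S ranges uniformly over all subset sums.
sums = [sum(c) for c in combinations(range(1, N + 1), n)]

assert abs(mean(sums) - n * (N + 1) / 2) < 1e-9            # Eq. (5.2-44)
assert abs(pvariance(sums) - m * n * (N + 1) / 12) < 1e-9  # Eq. (5.2-55)
```

For n = 3 and m = 4 this enumerates all C(7, 3) = 35 equally likely rank subsets and recovers E{S} = 12 and var{S} = 8 exactly.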
The variance of S' can be found in an exactly analogous manner by letting the upper limit on the summations in Eq. (5.2-48) equal m. The variance of S' thus computed is the same as the variance of S, that is,

$$\operatorname{var}\{S'\} = \tfrac{1}{12}mn(N+1) \tag{5.2-56}$$

Using Eq. (5.2-38) to find the variance of U yields

$$\operatorname{var}\{U\} = E\Big\{\Big[\big(S - \tfrac{1}{2}n(n+1)\big) - E\big\{S - \tfrac{1}{2}n(n+1)\big\}\Big]^2\Big\}$$

but

$$E\big\{S - \tfrac{1}{2}n(n+1)\big\} = E\{S\} - \tfrac{1}{2}n(n+1)$$

Therefore, expanding the square and cancelling the n(n + 1)/2 terms,

$$\operatorname{var}\{U\} = E\{S^2\} - E^2\{S\} = \operatorname{var}\{S\} = \tfrac{1}{12}mn(N+1) \tag{5.2-57}$$

Similarly,

$$\operatorname{var}\{U'\} = \tfrac{1}{12}mn(N+1) \tag{5.2-58}$$

Since Capon's [1959b] V test is simply V = (1/mn)U', then from Eq. (5.2-46),

$$E\{V\} = \frac{1}{mn}E\{U'\} = \frac{1}{2} \tag{5.2-59}$$

and

$$\operatorname{var}\{V\} = \frac{1}{m^2 n^2}\operatorname{var}\{U'\} = \frac{N+1}{12mn} \tag{5.2-60}$$

Capon obtained these same results, Eqs. (5.2-59) and (5.2-60), by utilizing the approach employed by Mann and Whitney [1947] in their original paper.

Mann and Whitney [1947] have shown in their original paper on the U statistic that the U statistic is asymptotically normally distributed when H12 is true, if F(x) is continuous and if the limit of the ratio m/n exists as n and m tend to infinity in any manner. In the same paper, Mann and Whitney prove that the U test is consistent under K12. Lehmann [1951] has shown that U is asymptotically normally distributed when K12 is true if F(x) and G(y) are continuous and if the limit of m/n exists as m and n approach infinity. Capon [1959b] has proven that the V statistic is asymptotically normally distributed when either H12 or K12 is true, in a manner that differs from the proofs of either Mann and Whitney [1947] or Lehmann [1951].
The two-input version of the Wilcoxon detector or the Mann-Whitney detector is just as efficient as its one-input form. The S, S', U, U', or V detectors are all extremely efficient for testing a positive location shift of the probability density function. This is not surprising, since the Mann-Whitney U detector was specifically designed for the problem of testing whether one of two random variables is stochastically larger than the other. Engineering problems in this category are, using Capon's [1959b] nomenclature, the dc detection problem and the noncoherent detection problem. These and other applications of nonparametric detectors are discussed in Chapter 9.
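As an illustration of how the U' detector might be applied with the moments just derived, the sketch below (our function name; the threshold uses the large-sample Gaussian approximation under H12 with E{U'} = mn/2 and var{U'} = mn(N + 1)/12, rather than exact tables) tests for a positive shift:

```python
import numpy as np
from scipy import stats

def mann_whitney_detector(x, y, alpha=0.05):
    """Accept K12 (y stochastically larger than x) when U' of Eq. (5.2-33)
    exceeds the threshold from the asymptotic Gaussian approximation."""
    n, m = len(x), len(y)
    u_prime = int(np.sum(y[:, None] > x[None, :]))
    mean_h = m * n / 2.0                       # E{U'} under H12, Eq. (5.2-46)
    var_h = m * n * (m + n + 1) / 12.0         # var{U'} under H12, Eq. (5.2-58)
    threshold = mean_h + stats.norm.ppf(1 - alpha) * np.sqrt(var_h)
    return u_prime, u_prime > threshold

rng = np.random.default_rng(4)
x = rng.standard_normal(40)           # reference noise samples
y = rng.standard_normal(40) + 0.8     # noise plus a positive shift
u_prime, accept_k = mann_whitney_detector(x, y)
```

For small m and n, exact tables of the S or U distribution (related through Eqs. (5.2-34) and (5.2-37)-(5.2-40)) would replace the Gaussian threshold.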
5.2.3 Normal Scores Detector

The two-input version of the normal scores detector can be obtained in a manner directly analogous to that used to derive the one-input form of the detector. A slightly abbreviated derivation is given here, but no important steps are excluded. Assume that two sets of independent, identically distributed observations, x = {x₁, . . . , xₙ} and y = {y₁, . . . , yₘ}, which have probability densities f(x) and g(y), respectively, are available at the input. The problem is to find the locally most powerful detector for testing

H₁₃: f(x) = g(y); f(x) is Gaussian with mean μ₁ and variance σ²

versus

K₁₃: f(x) = g(y − θ); g(y) is Gaussian with mean μ₂ and variance σ², and θ = μ₂ − μ₁
To simplify notation, let the combined data set be represented by w = {w₁, . . . , wₙ, wₙ₊₁, . . . , wₙ₊ₘ} = {x₁, . . . , xₙ, y₁, . . . , yₘ} with joint probability density h(w). The hypothesis and alternative to be tested can now be stated as

H₁₄: h(w) = ∏ᵢ₌₁ᴺ f(wᵢ); f(wᵢ) is Gaussian with mean μ₁ and variance σ²

K₁₄: h(w) = ∏ᵢ₌₁ⁿ f(wᵢ) ∏ᵢ₌ₙ₊₁ᴺ g(wᵢ); f(wᵢ) as given in H₁₄ and g(wᵢ) Gaussian with mean μ₂ and variance σ²
Let R = {r₁, r₂, . . . , r_N} be the ranks of w = {w₁, w₂, . . . , w_N}, respectively. In order to find the locally most powerful test of H₁₄ versus K₁₄, the likelihood ratio of the probability densities of the ranks of the observations, wᵢ, i = 1, 2, . . . , n + m = N, under K₁₄ and H₁₄ must be formed. Since the density of the observations is symmetric under H₁₄ and there are N! possible permutations of rank assignments, the probability density of the ranks when H₁₄ is true is 1/N!.

5.2 One-Input Detectors with Reference Noise Samples

Since when K₁₄ is true, h(w) is no longer symmetric, another approach must be taken to find the probability densities. The desired probability density f_R(R|K₁₄) can be expressed as

f_R(R|K₁₄) = ∫ f(R, z_N|K₁₄) dz_N
K₁₅: r = σₛ²/(σ₀² + σₛ²) > 0
To determine the most powerful detector for this problem, the probability densities when H₁₅ and K₁₅ are true must be obtained. When H₁₅ is true, the joint probability density is simply the product of the individual x and y densities,

h(x, y|H₁₅) = ∏ᵢ₌₁ⁿ [1/(2πσ₀²)] exp{−(xᵢ² + yᵢ²)/(2σ₀²)}   (5.3-3)

where it has been assumed that n observations are available. When K₁₅ is true, the joint probability density becomes
h(x, y|K₁₅) = ∏ᵢ₌₁ⁿ {1/[2π(σ₀² + σₛ²)(1 − r²)^{1/2}]}
  × exp{ −[(xᵢ − μₛ)² − 2r(xᵢ − μₛ)(yᵢ − μₛ) + (yᵢ − μₛ)²] / [2(1 − r²)(σ₀² + σₛ²)] }   (5.3-4)
Since the mean value of the signal μₛ is assumed to be known, μₛ can be set equal to zero without loss of generality. Equation (5.3-4) thus becomes

h(x, y|K₁₅) = ∏ᵢ₌₁ⁿ {1/[2π(σ₀² + σₛ²)(1 − r²)^{1/2}]} exp{ −[xᵢ² − 2rxᵢyᵢ + yᵢ²] / [2(1 − r²)(σ₀² + σₛ²)] }   (5.3-5)

Equation (5.3-5) can be further simplified by noting that σ₀² + σₛ² = σ₀²/(1 − r), and therefore

h(x, y|K₁₅) = ∏ᵢ₌₁ⁿ {(1 − r)/[2πσ₀²(1 − r²)^{1/2}]} exp{ −[xᵢ² − 2rxᵢyᵢ + yᵢ²] / [2σ₀²(1 + r)] }   (5.3-6)
5.3 Two Simultaneous Input Detectors
Using Eqs. (5.3-3) and (5.3-6) to form the likelihood ratio,

L(x, y) = h(x, y|K₁₅)/h(x, y|H₁₅)
  = ∏ᵢ₌₁ⁿ {σ₀²/[(σ₀² + σₛ²)(1 − r²)^{1/2}]} exp{ r(xᵢ + yᵢ)² / [2σ₀²(1 + r)] }   (5.3-7)
Taking the natural logarithm of Eq. (5.3-7) and comparing the result to a threshold yields

n ln{σ₀²/[(σ₀² + σₛ²)(1 − r²)^{1/2}]} + [r/(2σ₀²(1 + r))] Σᵢ₌₁ⁿ (xᵢ + yᵢ)²  ≷  ln T   (5.3-8)

Since the first term on the left-hand side of Eq. (5.3-8) and the multiplier of the summation in the second term are constants, both can be included in the threshold setting. This modification results in a test of the form

Σᵢ₌₁ⁿ (xᵢ + yᵢ)²  ≷  T′   (5.3-9)
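A minimal sketch of the decision rule in Eq. (5.3-9) follows (illustrative only; in practice the threshold T′ would be set from the desired false alarm rate):

```python
def sum_energy_detector(x, y, threshold):
    """Eq. (5.3-9): decide the alternative (signal present) when the
    energy of the summed channels exceeds the threshold T'."""
    statistic = sum((xi + yi) ** 2 for xi, yi in zip(x, y))
    return statistic, statistic > threshold
```

For example, x = (1, 2) and y = (3, 4) give Σ(xᵢ + yᵢ)² = 4² + 6² = 52.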
The test of Eq. (5.3-9) is the most powerful detector in the Neyman-Pearson sense for testing H₁₅ versus K₁₅ when all processes have zero mean. If much less information is known about the signal and noise processes, a most powerful detector can usually still be derived. For instance, assume in the previous problem that it is still known that the probability densities are Gaussian, but the parameters μₓ, μᵧ, σₓ², σᵧ², and r are all unknown. Following a procedure already used twice in earlier sections, the parameters can be estimated using the method of maximum likelihood, the
resulting estimates substituted for the appropriate parameters in the density functions, and a most powerful detector determined from the likelihood ratio of the two probability densities. The detection problem now being considered is that of testing the hypothesis
H₁₆: h(x, y) is Gaussian with r = 0 and μₓ, μᵧ, σₓ², σᵧ² unknown

versus

K₁₆: h(x, y) is Gaussian with r > 0 and μₓ, μᵧ, σₓ², σᵧ² unknown

A most powerful detector for testing H₁₆ versus K₁₆ will now be found using the method outlined above. Writing the probability density under H₁₆ and taking its natural logarithm yields

ln h(x, y|H₁₆) = −(n/2) ln[(2π)²σₓ²σᵧ²] − Σᵢ₌₁ⁿ [ (xᵢ − μₓ)²/(2σₓ²) + (yᵢ − μᵧ)²/(2σᵧ²) ]   (5.3-10)
In this equation, μₓ, μᵧ, σₓ², and σᵧ² are all unknown and therefore must be estimated using the method of maximum likelihood. Taking the derivative of Eq. (5.3-10) with respect to each of the parameters μₓ, μᵧ, σₓ², and σᵧ² and equating all four of the results to zero produces

(1/σₓ²) Σᵢ₌₁ⁿ (xᵢ − μₓ) = 0   (5.3-11)

(1/σᵧ²) Σᵢ₌₁ⁿ (yᵢ − μᵧ) = 0   (5.3-12)

−n/(2σₓ²) + [1/(2σₓ⁴)] Σᵢ₌₁ⁿ (xᵢ − μₓ)² = 0   (5.3-13)

−n/(2σᵧ²) + [1/(2σᵧ⁴)] Σᵢ₌₁ⁿ (yᵢ − μᵧ)² = 0   (5.3-14)

all of which must be solved simultaneously to obtain the desired parameter estimates. Equations (5.3-11) and (5.3-12) can be solved immediately for μ̂ₓ and
μ̂ᵧ, since they are essentially independent of σₓ² and σᵧ², to yield

μ̂ₓ = (1/n) Σᵢ₌₁ⁿ xᵢ   and   μ̂ᵧ = (1/n) Σᵢ₌₁ⁿ yᵢ   (5.3-15)
Equations (5.3-13) and (5.3-14) can then be solved just as simply to yield

σ̂ₓ² = (1/n) Σᵢ₌₁ⁿ (xᵢ − μ̂ₓ)²   (5.3-16)

and

σ̂ᵧ² = (1/n) Σᵢ₌₁ⁿ (yᵢ − μ̂ᵧ)²   (5.3-17)
where μ̂ₓ and μ̂ᵧ are given by Eqs. (5.3-15). Using these estimates, the probability density under H₁₆ now becomes

h(x, y|H₁₆) = [1/(2πσ̂ₓσ̂ᵧ)]ⁿ exp(−n)   (5.3-18)

Now writing the probability density when the alternative K₁₆ is true and taking the natural logarithm of the density yields

ln h(x, y|K₁₆) = −(n/2) ln[(2π)²σₓ²σᵧ²(1 − r²)]
  − [1/(2(1 − r²))] Σᵢ₌₁ⁿ [ (xᵢ − μₓ)²/σₓ² − 2r(xᵢ − μₓ)(yᵢ − μᵧ)/(σₓσᵧ) + (yᵢ − μᵧ)²/σᵧ² ]   (5.3-20)

Taking the derivative of Eq. (5.3-20) with respect to the parameters μₓ, μᵧ, σₓ², σᵧ², and r individually and equating to zero,
∂ ln h(x, y|K₁₆)/∂μₓ = [1/((1 − r²)σₓ)] Σᵢ₌₁ⁿ [ (xᵢ − μₓ)/σₓ − r(yᵢ − μᵧ)/σᵧ ] = 0   (5.3-21)

∂ ln h(x, y|K₁₆)/∂μᵧ = [1/((1 − r²)σᵧ)] Σᵢ₌₁ⁿ [ (yᵢ − μᵧ)/σᵧ − r(xᵢ − μₓ)/σₓ ] = 0   (5.3-22)

∂ ln h(x, y|K₁₆)/∂σₓ² = −n/(2σₓ²) + [1/(2(1 − r²))] Σᵢ₌₁ⁿ [ (xᵢ − μₓ)²/σₓ⁴ − r(xᵢ − μₓ)(yᵢ − μᵧ)/(σₓ³σᵧ) ] = 0   (5.3-23)

∂ ln h(x, y|K₁₆)/∂σᵧ² = −n/(2σᵧ²) + [1/(2(1 − r²))] Σᵢ₌₁ⁿ [ (yᵢ − μᵧ)²/σᵧ⁴ − r(xᵢ − μₓ)(yᵢ − μᵧ)/(σₓσᵧ³) ] = 0   (5.3-24)

∂ ln h(x, y|K₁₆)/∂r = nr/(1 − r²) − [r/(1 − r²)²] Σᵢ₌₁ⁿ [ (xᵢ − μₓ)²/σₓ² − 2r(xᵢ − μₓ)(yᵢ − μᵧ)/(σₓσᵧ) + (yᵢ − μᵧ)²/σᵧ² ]
  + [1/(1 − r²)] Σᵢ₌₁ⁿ (xᵢ − μₓ)(yᵢ − μᵧ)/(σₓσᵧ) = 0   (5.3-25)
Equations (5.3-21) and (5.3-22) reduce to

Σᵢ₌₁ⁿ (xᵢ − μₓ)/σₓ − r Σᵢ₌₁ⁿ (yᵢ − μᵧ)/σᵧ = 0   (5.3-26)

and

Σᵢ₌₁ⁿ (yᵢ − μᵧ)/σᵧ − r Σᵢ₌₁ⁿ (xᵢ − μₓ)/σₓ = 0   (5.3-27)
These two equations can be satisfied if

Σᵢ₌₁ⁿ xᵢ − nμₓ = 0   and   Σᵢ₌₁ⁿ yᵢ − nμᵧ = 0

or equivalently,

μ̂ₓ = (1/n) Σᵢ₌₁ⁿ xᵢ   and   μ̂ᵧ = (1/n) Σᵢ₌₁ⁿ yᵢ   (5.3-28)

Equations (5.3-23) and (5.3-24) can be factored as
∂ ln h(x, y|K₁₆)/∂σₓ² = −[1/(2σₓ²(1 − r²))] { n(1 − r²) − Σᵢ₌₁ⁿ (xᵢ − μₓ)²/σₓ² + r Σᵢ₌₁ⁿ (xᵢ − μₓ)(yᵢ − μᵧ)/(σₓσᵧ) } = 0   (5.3-29)

∂ ln h(x, y|K₁₆)/∂σᵧ² = −[1/(2σᵧ²(1 − r²))] { n(1 − r²) − Σᵢ₌₁ⁿ (yᵢ − μᵧ)²/σᵧ² + r Σᵢ₌₁ⁿ (xᵢ − μₓ)(yᵢ − μᵧ)/(σₓσᵧ) } = 0   (5.3-30)

and also Eq. (5.3-25) as

[1/(1 − r²)] { nr − [r/(1 − r²)] Σᵢ₌₁ⁿ [ (xᵢ − μₓ)²/σₓ² + (yᵢ − μᵧ)²/σᵧ² ] + [(1 + r²)/(1 − r²)] Σᵢ₌₁ⁿ (xᵢ − μₓ)(yᵢ − μᵧ)/(σₓσᵧ) } = 0   (5.3-31)
Eliminating the factors outside the brackets and adding Eqs. (5.3-29) and (5.3-30) produces

2n(1 − r²) − Σᵢ₌₁ⁿ (xᵢ − μₓ)²/σₓ² − Σᵢ₌₁ⁿ (yᵢ − μᵧ)²/σᵧ² + 2r Σᵢ₌₁ⁿ (xᵢ − μₓ)(yᵢ − μᵧ)/(σₓσᵧ) = 0   (5.3-32)
After eliminating the factor outside the brackets and multiplying by (1 − r²)/r, Eq. (5.3-31) is subtracted from Eq. (5.3-32) to yield

n(1 − r²) − [(1 − r²)/r] Σᵢ₌₁ⁿ (xᵢ − μₓ)(yᵢ − μᵧ)/(σₓσᵧ) = 0

or

n(1 − r²) = [(1 − r²)/r] Σᵢ₌₁ⁿ (xᵢ − μₓ)(yᵢ − μᵧ)/(σₓσᵧ)

which can be solved for r to obtain

r̂ = (1/n) Σᵢ₌₁ⁿ (xᵢ − μ̂ₓ)(yᵢ − μ̂ᵧ)/(σ̂ₓσ̂ᵧ)   (5.3-33)
The estimates of σₓ² and σᵧ² can be found by substituting Eq. (5.3-33) for r in Eqs. (5.3-29) and (5.3-30),

n(1 − r̂²) − Σᵢ₌₁ⁿ (xᵢ − μ̂ₓ)²/σ̂ₓ² + nr̂² = 0

and

n(1 − r̂²) − Σᵢ₌₁ⁿ (yᵢ − μ̂ᵧ)²/σ̂ᵧ² + nr̂² = 0

and thus,

σ̂ₓ² = (1/n) Σᵢ₌₁ⁿ (xᵢ − μ̂ₓ)²   (5.3-34)

σ̂ᵧ² = (1/n) Σᵢ₌₁ⁿ (yᵢ − μ̂ᵧ)²   (5.3-35)

With Eqs. (5.3-28), (5.3-33), (5.3-34), and (5.3-35) specifying the maximum likelihood estimates for μ̂ₓ, μ̂ᵧ, r̂, σ̂ₓ², and σ̂ᵧ², respectively, the probability density under K₁₆ becomes

h(x, y|K₁₆) = [1/(2πσ̂ₓσ̂ᵧ(1 − r̂²)^{1/2})]ⁿ exp(−n)   (5.3-36)

The likelihood ratio can be formed by dividing Eq. (5.3-36) by Eq. (5.3-18). Since Eqs. (5.3-15) are identical to Eqs. (5.3-28), and Eqs. (5.3-16) and (5.3-17) are the same as Eqs. (5.3-34) and (5.3-35), respectively, the likelihood ratio immediately simplifies to

L(x, y) = (1 − r̂²)^{−n/2}   (5.3-37)

If L(x, y) in Eq. (5.3-37) is compared to a threshold,

(1 − r̂²)^{−n/2} ≷ T

but this is the same as

r̂² ≷ 1 − T^{−2/n}

or equivalently,

r̂ ≷ T′

The detector for testing the hypothesis H₁₆ versus the alternative K₁₆ is thus

r̂ = (1/n) Σᵢ₌₁ⁿ (xᵢ − μ̂ₓ)(yᵢ − μ̂ᵧ)/(σ̂ₓσ̂ᵧ)  ≷  T′   (5.3-38)
which simply consists of comparing the sample correlation coefficient defined by r̂ in Eq. (5.3-33) to a predetermined threshold. Throughout the derivation of the detector described by Eq. (5.3-38), it has only been claimed that a most powerful detector has been sought. This is because, since the maximum likelihood estimates of the parameters were used in the probability densities, the Neyman-Pearson lemma does not necessarily yield the most powerful detector for testing H₁₆ versus K₁₆, but only a most powerful detector in some unspecified sense. It can be shown that the detector defined by Eq. (5.3-38) is the UMP unbiased detector for testing H₁₆ versus K₁₆. The proof of this fact is outlined below. It can be shown that the detector given by Eq. (5.3-38) is UMP similar for testing H₁₆ versus K₁₆ by following step-by-step Lehmann's proof [Lehmann, 1947] that the circular serial correlation coefficient is UMP similar (Lehmann also demonstrates that the circular serial correlation coefficient is UMP unbiased). Accepting this fact and remembering the discussion in Section 2.6, it only remains to be shown that the test statistic, the sample correlation coefficient, is unbiased. First of all, note that r̂ can be expanded and written in the form
r̂ = (X̄Y − X̄ Ȳ)/(SₓSᵧ)   (5.3-39)

where

X̄Y = (1/n) Σᵢ₌₁ⁿ xᵢyᵢ,   X̄ = μ̂ₓ,   Ȳ = μ̂ᵧ

and Sₓ = σ̂ₓ and Sᵧ = σ̂ᵧ. Now let the value ρₐ be chosen such that

P{ r̂ > ρₐ | r = 0 } = α   (5.3-40)

when r = 0. Consider now a case when K₁₆ is true, that is, r > 0; then E{XY} − E{X}E{Y} > 0, and

P{ r̂ > ρₐ | K₁₆ } = P{ X̄Y − X̄Ȳ > SₓSᵧρₐ | K₁₆ }
  > P{ X̄Y − X̄Ȳ − [E{XY} − E{X}E{Y}] > SₓSᵧρₐ | K₁₆ } = α

Therefore, denoting the power of the detector D(x, y) of Eq. (5.3-38) by Q_D(F),

Q_D(F) > α   for all F ∈ K₁₆   (5.3-41)

and from Eq. (5.3-40),

E{ D(x, y) | H₁₆ } = α   for all F ∈ H₁₆   (5.3-42)
The sample correlation coefficient is thus unbiased at level α for testing H₁₆ versus K₁₆. Since the detector defined by Eq. (5.3-38) is also UMP similar for this problem, Eq. (5.3-38) specifies the UMP unbiased detector for testing H₁₆ versus K₁₆. It should be noted that this detector is not UMP unbiased for testing the hypothesis r = r₁ versus r = r₂, where r₁ ≠ 0 and r₂ ≠ 0 [Kendall and Stuart, 1967]. To determine the required threshold setting to achieve a chosen false alarm rate, it is necessary to know the probability density of the sample correlation coefficient when H₁₆ is true. Kendall and Stuart [1969] have
noted that the quantity

t = r̂ [ (n − 2)/(1 − r̂²) ]^{1/2}   (5.3-43)

has a Student's t distribution with n − 2 degrees of freedom under H₁₆. The threshold, say Tₜ, that achieves the desired false alarm rate α₀ can thus be found from tables of the Student's t distribution and then transformed according to

T′ = Tₜ / [n − 2 + Tₜ²]^{1/2}   (5.3-44)
to obtain the threshold for the sample correlation coefficient. It is not too difficult to derive the large-sample mean and variance of r̂; however, these quantities are of limited usefulness, and thus only the results are stated here. As the number of samples, n, approaches infinity, the distribution of r̂ very slowly approaches a Gaussian distribution with mean r and variance (1 − r²)²/n [Kendall and Stuart, 1967]. In fact, the distribution of r̂ tends to normality so slowly that such an assumption is not valid for n less than 500; hence the qualification concerning the asymptotic mean and variance made at the beginning of the paragraph. Kendall and Stuart [1969] have derived the large-sample variance of r̂, and they give an expression for the mean of r̂ from which the asymptotic value is easily obtained [Kendall and Stuart, 1967]. The slow convergence of the distribution of r̂ to normality should serve to emphasize to the reader that great care must be exercised when using asymptotic distributions of test statistics. The next logical step in the further relaxation of the structure of the hypothesis and alternative is to assume that the joint distribution of x and y is completely unknown. The hypothesis and alternative to be tested thus become

H₁₇: F_{xy}(x, y) = F(x)G(y)

versus

K₁₇: r > 0, F_{xy}(x, y) unknown

The reader may have noticed that in the hypothesis H₁₇, no specific statement is made concerning the value of the correlation coefficient r, whereas the correlation coefficient is specified to be positive in K₁₇. The alternative K₁₇ thus implies that some decision is to be made concerning the value of r, but H₁₇ seemingly says nothing at all about r. Actually, H₁₇ does make a statement about the value of r. This can be seen by the following discussion.
Consider the hypothesis H₁₇, which states F_{xy}(x, y) = F(x)G(y). This simply says that the joint distribution function of x and y factors into the product of the marginal distributions of x and y, which means that x and y are independent. If x and y were Gaussian distributed, then independence would imply r = 0. One would thus be tempted to conclude that, in general, zero correlation implies independence. This is not true, however. In fact, Kendall and Stuart [1967] have cited the example of a bivariate distribution with uniform probability over a unit circle centered at the means of x and y. For this example, x and y are shown to be dependent and also to have zero correlation. In spite of this last discussion, and realizing that there is some inconsistency in its specification, we will let r = 0 imply H₁₇ for our purposes. When H₁₇ is true, the sample correlation coefficient r̂ is nonparametric for sample sizes as small as 11. The sample correlation coefficient is also consistent when K₁₇ is true for distributions under K₁₇ for which dependence of x and y is equivalent to r > 0, which are essentially those cases being considered here. When dependence between x and y is accompanied by r = 0, r̂ is not consistent under K₁₇, nor does r̂ remain nonparametric for distributions under H₁₇. As should be clear from the preceding discussion, there are several difficulties with employing r̂ as a detector test statistic, particularly when one desires a detector that is nonparametric or distribution-free. Perhaps the major difficulty is that r̂, and hence its distribution, depends on the actual observation values x and y. This produces two problems. First, for a small number of observations, say 5 to 10, the distribution of r̂ is fairly difficult to compute. Second, the dependence of the distribution of r̂ on the sample values means that r̂ is only asymptotically nonparametric, although r̂ tends to normality rapidly when H₁₇ is true.
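Despite these limitations, the statistic itself is simple to compute. A sketch of r̂ per Eq. (5.3-33) and the threshold mapping of Eq. (5.3-44) follows (function names are illustrative; the Student's t quantile would be read from tables):

```python
import math

def sample_corr(x, y):
    """Sample correlation coefficient r-hat of Eq. (5.3-33)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / n)
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / n)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n * sx * sy)

def corr_threshold(t_quantile, n):
    """Eq. (5.3-44): map a Student's t threshold (n - 2 degrees of
    freedom) into a threshold on r-hat."""
    return t_quantile / math.sqrt(n - 2 + t_quantile ** 2)
```

For example, with n = 10 and the t quantile 1.860 (8 degrees of freedom, α₀ = 0.05), the threshold on r̂ is 1.860/√(8 + 1.860²) ≈ 0.549.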
Because of these problems, the possibility of finding a detector that is nonparametric and whose distribution is relatively simple to compute, or can be easily tabulated, is attractive. The following sections discuss three detectors that satisfy both of these properties.
5.3.2 Polarity Coincidence Correlator

The polarity coincidence correlator (PCC) is a nonparametric detection device that, as its name suggests, uses only the polarities of the two sets of observations to arrive at a decision. The PCC has been applied to problems in geophysics [Melton and Karr, 1957] and acoustics [Faran and Hills, 1952a, b], and has been studied extensively by various authors [Ekre, 1963; Kanefsky and Thomas, 1965; Wolff et al., 1962]. Actually, the PCC detector was first investigated by Blomqvist [1950] under the name of
medial correlation test. The PCC has been shown to be inferior to the familiar correlation detector given by

Σᵢ₌₁ⁿ xᵢyᵢ  ≷  T

when the noises and signal are stationary random processes, but clearly superior when the noise is nonstationary. The test statistic for the PCC detector is

S_pcc = Σᵢ₌₁ⁿ sgn(xᵢ) sgn(yᵢ)   (5.3-45)
where the xᵢ and yᵢ, i = 1, 2, . . . , n, are the two sets of observations, which consist of either noise alone, xᵢ = v₁ᵢ and yᵢ = v₂ᵢ, or signal plus noise, xᵢ = sᵢ + v₁ᵢ and yᵢ = sᵢ + v₂ᵢ. The sᵢ, v₁ᵢ, and v₂ᵢ sequences are assumed to be stationary, independent random variables with the same probability distributions of all orders. The probability distributions are also assumed to have zero median, to be symmetric with respect to the origin, to have finite mean square value, and to be absolutely continuous. The hypothesis and alternative to be tested can be stated as

H₁₈: xᵢ = v₁ᵢ;  yᵢ = v₂ᵢ
K₁₈: xᵢ = sᵢ + v₁ᵢ;  yᵢ = sᵢ + v₂ᵢ,   i = 1, 2, . . . , n   (5.3-46)

The probability distribution of S_pcc when H₁₈ and K₁₈ are true can be determined as follows. When H₁₈ is true, that is, when noise alone is present, the probability of the term under the summation sign being +1 is

P{sgn(xᵢ) sgn(yᵢ) = +1|H₁₈} = P{xᵢ and yᵢ > 0|H₁₈} + P{xᵢ and yᵢ < 0|H₁₈}
  = P{xᵢ > 0|H₁₈}P{yᵢ > 0|H₁₈} + P{xᵢ < 0|H₁₈}P{yᵢ < 0|H₁₈}
  = (½)² + (½)² = ½

and similarly,

P{sgn(xᵢ) sgn(yᵢ) = −1|H₁₈} = ½

When the alternative K₁₈ is true, the probability of the value +1 is

P{sgn(xᵢ) sgn(yᵢ) = +1|K₁₈} = P{xᵢ and yᵢ > 0|K₁₈} + P{xᵢ and yᵢ < 0|K₁₈}
  = ∫₋∞^∞ P{v₁ᵢ > −u and v₂ᵢ > −u} f_s(u) du + ∫₋∞^∞ P{v₁ᵢ < −u and v₂ᵢ < −u} f_s(u) du   (5.3-47)
which is simply the probability that both arguments of the sgn(·) terms are greater than −u or both are less than −u, integrated over all possible values of u, the signal to be detected. Since the two channels are independent and the noise distributions are symmetric with respect to the origin, the probabilities under the integral signs can be written as

P{v₁ᵢ > −u and v₂ᵢ > −u} = P{v₁ᵢ > −u}P{v₂ᵢ > −u} = [1 − F_v(−u)]²

and

P{v₁ᵢ < −u and v₂ᵢ < −u} = [F_v(−u)]²
Using these results and the normal approximation to the distribution of S_pcc, the detection probability can be written as

P_D ≈ 1 − Φ( [T − n(2p₊ − 1)] / {2[np₊(1 − p₊)]^{1/2}} )   (5.3-57)

where p₊ = P{sgn(xᵢ) sgn(yᵢ) = +1|K₁₈} and Φ(·) is the standard Gaussian distribution function.
Taking the limit of Eq. (5.3-57) as n → ∞, P_D → 1, and hence from Eq. (2.6-13), the PCC detector is consistent for the present hypothesis and alternative. The PCC detector is at least asymptotically unbiased, as can be seen by a straightforward manipulation of Eq. (5.3-57). That the PCC detector is unbiased for small sample sizes can be seen by noting that the PCC is a bivariate generalization of the sign test in Section 3.3, and as such, it is uniformly most powerful for testing the hypothesis and alternative

H₁₉: p = ½;   K₁₉: p > ½

where p is the probability of the summand of S_pcc being +1 [Carlyle and Thomas, 1964]. Since a detector that is UMP is necessarily unbiased, the PCC detector is unbiased for testing H₁₉ versus K₁₉. Note that, given the assumptions concerning the probability distributions of the signal and noises at the beginning of this section, the problem of testing H₁₈ versus K₁₈ is equivalent to testing H₁₉ versus K₁₉. The PCC detector thus possesses the desirable properties of being nonparametric for H₁₈, consistent for K₁₈, and unbiased for testing H₁₈ versus K₁₈. In addition, the PCC detector is simple to implement and its test statistic has a probability distribution that is extensively tabulated. All of these facts combine to make the polarity coincidence correlator a highly attractive detector for testing H₁₈ versus K₁₈. The performance of the PCC detector relative to other detectors for testing other hypotheses and alternatives still must be evaluated, and this evaluation is accomplished in Chapter 6.

5.3.3 Spearman Rho Detector
It is clear from the previous section that a large amount of the information available in the input observations is being discarded by the PCC detector. At the same time, however, the development of the sample correlation coefficient r̂ in Section 5.3.1 indicates that the utilization of the actual values of the observations in the test statistic causes the false alarm rate to be dependent on the probability density of the observations. The Spearman rho (ρ) detector developed in this section utilizes only the ranks of the input observations, and therefore employs more of the information available in the observations than the PCC detector while retaining the desirable property of being nonparametric. The test statistic for the Spearman ρ detector is given by [Spearman, 1904]

ρ = { Σᵢ₌₁ⁿ [R(xᵢ) − ½(n + 1)][R(yᵢ) − ½(n + 1)] } / [ (1/12)n(n + 1)(n − 1) ]   (5.3-58)
where R(xᵢ) and R(yᵢ) are the ranks of the xᵢ and yᵢ, i = 1, 2, . . . , n, observations. Specifically, R(xᵢ) = 1 if xᵢ is the smallest in magnitude of the xᵢ, i = 1, 2, . . . , n, R(xᵢ) = 2 if xᵢ is the second smallest in magnitude, and so on. The observation with the largest magnitude has rank n. For the case when there are no tied observations, the ρ test statistic in Eq. (5.3-58) is actually just the sample correlation coefficient in Eq. (5.3-33) with the xᵢ and yᵢ replaced by their corresponding ranks. To see this, consider the sample correlation coefficient with the xᵢ and yᵢ replaced by their ranks, which is given by

ρ = { (1/n) Σᵢ₌₁ⁿ [R(xᵢ) − R̄(x)][R(yᵢ) − R̄(y)] } / (σ̂_{R(x)} σ̂_{R(y)})   (5.3-59)

The sample mean of the ranks is easily computed to be

R̄(x) = (1/n) Σᵢ₌₁ⁿ R(xᵢ) = (1/n) Σᵢ₌₁ⁿ i = (n + 1)/2   (5.3-60a)

and

R̄(y) = (n + 1)/2   (5.3-60b)

where use has been made of the fact that Σᵢ₌₁ⁿ i = n(n + 1)/2 [Selby and Girling, 1965]. The sample variance can be found similarly,

σ̂²_{R(x)} = (1/n) Σᵢ₌₁ⁿ [R(xᵢ) − R̄(x)]² = (n + 1)(n − 1)/12   (5.3-61a)

and

σ̂²_{R(y)} = (1/n) Σᵢ₌₁ⁿ [R(yᵢ) − R̄(y)]² = (n + 1)(n − 1)/12   (5.3-61b)

since

Σᵢ₌₁ⁿ i² = n(n + 1)(2n + 1)/6
[Selby and Girling, 1965]. Substituting the above results into Eq. (5.3-59) yields Eq. (5.3-58) as desired. Another form of the Spearman ρ test statistic can be obtained by defining

ρ_T = Σᵢ₌₁ⁿ [R(xᵢ) − R(yᵢ)]²   (5.3-62)

Expanding Eq. (5.3-62) and substituting the results in Eqs. (5.3-61a, b) into Eq. (5.3-62),

ρ_T = (1/12)n(n² − 1) + (1/12)n(n² − 1) − 2 Σᵢ₌₁ⁿ [R(xᵢ) − R̄(x)][R(yᵢ) − R̄(y)]

and rearranging, to obtain

Σᵢ₌₁ⁿ [R(xᵢ) − R̄(x)][R(yᵢ) − R̄(y)] = (1/12)n(n² − 1) − ½ρ_T   (5.3-63)

Using Eq. (5.3-63) and Eqs. (5.3-61a, b) in Eq. (5.3-59) yields an equivalent form for the Spearman ρ test statistic,

ρ = 1 − 6ρ_T/[n(n² − 1)]   (5.3-64)
where ρ_T is given in Eq. (5.3-62) [Gibbons, 1971]. A detector that uses the test statistic ρ_T instead of ρ is called the Hotelling-Pabst test in recognition of their work concerning the nonparametric property of the Spearman ρ detector [Hotelling and Pabst, 1936]. The advantage of using the Hotelling-Pabst test is that it requires
fewer arithmetic operations than the Spearman ρ detector. Note that the Hotelling-Pabst test statistic varies in an inverse fashion relative to the ρ statistic; that is, ρ_T is large when ρ is small, and ρ_T is small when ρ is large [Conover, 1971]. The Spearman ρ detector is useful in engineering applications for testing the hypothesis and alternative stated earlier in Section 5.3.1,

H₁₇: F_{xy}(x, y) = F(x)G(y)
K₁₇: r > 0, F_{xy}(x, y) unknown

where r is the correlation coefficient. Since H₁₇ specifies that the x and y observations are independent, each of the possible rankings of the xᵢ and yᵢ, i = 1, 2, . . . , n, is equally probable and the Spearman ρ detector is nonparametric for testing the hypothesis H₁₇. The mean and variance of the Spearman ρ detector under the hypothesis of independence, H₁₇, can be calculated by first expanding Eq. (5.3-58) to obtain

ρ = [12 Σᵢ₌₁ⁿ R(xᵢ)R(yᵢ)] / [n(n² − 1)] − 3(n + 1)/(n − 1)   (5.3-65)
To determine E{ρ|H₁₇} and var{ρ|H₁₇}, then, it suffices to compute E{Σᵢ₌₁ⁿ R(xᵢ)R(yᵢ)|H₁₇} and var{Σᵢ₌₁ⁿ R(xᵢ)R(yᵢ)|H₁₇} and use these results along with Eq. (5.3-65) to obtain the desired mean and variance. Since, when H₁₇ is true, the observations xᵢ, i = 1, 2, . . . , n, and yᵢ, i = 1, 2, . . . , n, are independent, their ranks R(xᵢ) and R(yᵢ), i = 1, 2, . . . , n, will also be independent; hence

E{Σᵢ₌₁ⁿ R(xᵢ)R(yᵢ)|H₁₇} = Σᵢ₌₁ⁿ E{R(xᵢ)|H₁₇}E{R(yᵢ)|H₁₇}   (5.3-66)

and

var{Σᵢ₌₁ⁿ R(xᵢ)R(yᵢ)|H₁₇} = n var{R(xᵢ)|H₁₇} var{R(yᵢ)|H₁₇}
  + n(n − 1) cov{R(xᵢ)R(xⱼ)|H₁₇} cov{R(yᵢ)R(yⱼ)|H₁₇}   (5.3-67)

It is thus necessary to compute the mean, variance, and covariance of the ranks. When H₁₇ is true, each of the ranks is equiprobable; hence

E{R(xᵢ)|H₁₇} = E{R(yᵢ)|H₁₇} = (1/n) Σᵢ₌₁ⁿ i = (n + 1)/2   (5.3-68)

and

E{R²(xᵢ)|H₁₇} = (1/n) Σᵢ₌₁ⁿ i² = (n + 1)(2n + 1)/6   (5.3-69)

The variance is thus

var{R(xᵢ)|H₁₇} = var{R(yᵢ)|H₁₇} = (1/12)(n² − 1)   (5.3-70)

The covariance can be determined from the expression

cov{R(xᵢ)R(xⱼ)|H₁₇} = E{R(xᵢ)R(xⱼ)|H₁₇} − E{R(xᵢ)|H₁₇}E{R(xⱼ)|H₁₇}
  = [1/(n(n − 1))] Σᵢ₌₁ⁿ Σⱼ₌₁,ⱼ≠ᵢⁿ ij − [(n + 1)/2]²   (5.3-71)

since there are n(n − 1) equiprobable combinations of the ranks. Expanding the first term in Eq. (5.3-71) as

Σᵢ₌₁ⁿ Σⱼ₌₁,ⱼ≠ᵢⁿ ij = (Σᵢ₌₁ⁿ i)² − Σᵢ₌₁ⁿ i²

and factoring 1/[n²(n − 1)] out of all terms yields

cov{R(xᵢ)R(xⱼ)|H₁₇} = −(n + 1)/12   (5.3-72)
Finally, from Eq. (5.3-65),

E{ρ|H₁₇} = [12n E{R(xᵢ)|H₁₇}E{R(yᵢ)|H₁₇}] / [n(n² − 1)] − 3(n + 1)/(n − 1) = 0   (5.3-73)

and

var{ρ|H₁₇} = {12/[n(n² − 1)]}² var{Σᵢ₌₁ⁿ R(xᵢ)R(yᵢ)|H₁₇} = 1/(n − 1)   (5.3-74)

since for a, b constants,

var{x + a} = var{x}   and   var{bx} = b² var{x}
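The moments in Eqs. (5.3-73) and (5.3-74) can be checked by simulation (a sketch, not from the text, under the assumption that under H₁₇ one rank vector may be held fixed while the other is a uniformly random permutation):

```python
import random

def rho_from_ranks(rx, ry):
    """Spearman rho from two rank lists, per Eq. (5.3-65)."""
    n = len(rx)
    s = sum(a * b for a, b in zip(rx, ry))
    return 12.0 * s / (n * (n * n - 1)) - 3.0 * (n + 1) / (n - 1)

def rho_moments_mc(n, trials=20000, seed=1):
    """Monte Carlo estimates of E{rho|H17} and var{rho|H17}."""
    rng = random.Random(seed)
    base = list(range(1, n + 1))
    vals = []
    for _ in range(trials):
        ry = base[:]
        rng.shuffle(ry)
        vals.append(rho_from_ranks(base, ry))
    mean = sum(vals) / trials
    var = sum((v - mean) ** 2 for v in vals) / trials
    return mean, var
```

For n = 8 the estimates should be near 0 and 1/(8 − 1) ≈ 0.143, in agreement with Eqs. (5.3-73) and (5.3-74).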
The exact probability density of ρ when H₁₇ is true can only be obtained by enumeration and has been tabulated by Kendall [1962]. For applications of ρ where n > 10, the asymptotic distribution of the random variable Z = ρ√(n − 1) is very useful, since Z is asymptotically Gaussian distributed with zero mean and unit variance [Gibbons, 1971]. The following three examples illustrate the selection of the detector threshold T and the calculation of the detector test statistic.

Example 5.3-3 Suppose it is desired to test H₁₇ versus K₁₇ given two sets of n = 10 observations and a maximum allowable false alarm rate of α = 0.1. From the table of the Spearman ρ test statistic quantiles in Appendix B, the value of the threshold is seen to be T₁ = 0.4424. If, instead of the exact distribution, the normal approximation is used, the value of the threshold (from tables of the standard Gaussian distribution in Appendix B) is T₂ = 1.282. The two thresholds are quite different in value, and at first glance this suggests that the normal approximation may not be too accurate for n = 10. However, recall that the random variable Z = ρ√(n − 1) is the statistic that is being compared to T₂. In order to compare the thresholds for ρ, T₂ must be divided by √(n − 1) = √9 = 3 to obtain T₂/√(n − 1) = 0.4273, which is much closer to T₁.
Example 5.3-4 Let the two sets of observations for testing H₁₇ versus K₁₇ in Example 5.3-3 be given by the tabulation shown, with the ranks indicated directly below each observation.

i:      1    2    3    4    5    6    7    8    9    10
xᵢ:    5.2  11.  9.1  7.3  3.2  6.1  1.2  8.6  7.   4.1
R(xᵢ):  4   10    9    7    2    5    1    8    6    3
yᵢ:    3.2  7.   8.   6.4  6.1  10.  4.2  7.2  5.1  3.3
R(yᵢ):  1    7    9    6    5   10    3    8    4    2

The value of ρ is

ρ = 12[4 + 70 + 81 + 42 + 10 + 50 + 3 + 64 + 24 + 6] / {10[(10)² − 1]} − 3[10 + 1]/[10 − 1]
  = 4.29 − 3.67 = 0.62

Since 0.62 > T₁ and 0.62√9 = 1.86 > T₂, both the exact test and the normal approximation cause K₁₇ to be accepted.
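Example 5.3-4 can be checked with a short sketch (illustrative code only; ranking is by magnitude, as the text prescribes, and ties are assumed absent):

```python
def magnitude_ranks(values):
    """Assign rank 1 to the observation smallest in magnitude, and so on."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman_rho(x, y):
    """Spearman rho of Eq. (5.3-65), computed from magnitude ranks."""
    n = len(x)
    rx, ry = magnitude_ranks(x), magnitude_ranks(y)
    s = sum(a * b for a, b in zip(rx, ry))
    return 12.0 * s / (n * (n * n - 1)) - 3.0 * (n + 1) / (n - 1)
```

With the Example 5.3-4 data this gives ρ ≈ 0.624, which exceeds T₁ = 0.4424, so K₁₇ is accepted.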
Example 5.3-5 Suppose that for the same hypothesis and alternative in Example 5.3-3, two sets of observations are taken whose ranks are as shown in the tabulation. Note that in this case, since negative observation values are present, it is necessary to take the absolute values of the observations before ranking. For example, the rank of y₃ = −2.1 is R(y₃) = 7, and the rank of y₉ = 2.3 is R(y₉) = 8. Only magnitudes are used in assigning values for the ranks.

i:      1   2   3   4   5   6   7   8   9   10
R(xᵢ):  7   4   8   3   6  10   1   5   2    9
R(yᵢ):  9   2   7  10   4   5   6   1   8    3

The value of the Spearman ρ test statistic is thus

ρ = 12[63 + 8 + 56 + 30 + 24 + 50 + 6 + 5 + 16 + 27]/990 − 3.67
  = 3.45 − 3.67 = −0.22

Since −0.22 < T₁ and −0.22√9 = −0.66 < T₂, the hypothesis H₁₇ is accepted.
The Spearman p detector has a test statistic that is reasonably easy to compute, an exact distribution that has been tabulated, an asymptotic distribution that is easy to use, and it is nonparametric for testing the hypothesis of independence between the two sets of observations. The performance of the Spearman p detector relative to other detectors is considered in Chapter 6.
5.3.4 Kendall Tau Detector

The Kendall tau (τ) detector is another rank correlation statistic that is closely related to the Spearman ρ detector. The Kendall τ detector is generally considered to be less efficient computationally than the Spearman ρ detector, but the Kendall τ detector possesses the advantage that its test statistic has an asymptotic distribution that tends to normality very rapidly [Conover, 1971; Kendall and Stuart, 1967]. Perhaps the best way to introduce the Kendall τ test statistic is to treat a specific example, as done by Kendall [1962]. Consider the two sets of observations in Example 5.3-4, whose ranks are given by the data shown in the tabulation, where the letter column headings are included to allow an easy comparison of ranks. Of the xᵢ ranks, consider the first pair of ranks AB. These ranks are 4 and 10, respectively, and clearly occur in the natural counting order. A score of +1 is therefore assigned to this pair. The pair of xᵢ ranks BC is 10, 9. These ranks occur in inverse order, and hence a score of −1 is assigned to this pair. The pair of yᵢ ranks AB is 1, 7, and the score assigned for this pair is +1. Now the two scores for the pair AB are multiplied together, which yields a +1. The products of the two scores for each pair of ranks are then summed to obtain a total score, and the Kendall τ statistic is calculated by dividing this result by the maximum total score.

        A   B   C   D   E   F   G   H   I   J
i:      1   2   3   4   5   6   7   8   9  10
R(xᵢ):  4  10   9   7   2   5   1   8   6   3
R(yᵢ):  1   7   9   6   5  10   3   8   4   2
In order to further illustrate this procedure, the following example enumerates the steps in the calculation of the test statistic for the above set of ranks.
Example 5.3-6 Each pair of ranks from the preceding tabulation is listed (see table) with the score assigned to that pair of ranks. The total number of positive scores, usually denoted by P, is 32, and the total number of negative scores, denoted by Q, is 13. Letting S = P − Q, the total score is thus S = 32 − 13 = 19. The maximum value of S is 45 and its minimum value is −45. The Kendall τ statistic is therefore given by

τ = actual score / maximum possible score = (P − Q)/45 = 19/45 = +0.42

which indicates a significant amount of positive correlation between the x and y rankings. This result, of course, is as anticipated, since the Spearman ρ detector has already indicated significant positive correlation between the two rankings.

Pair  Score   Pair  Score   Pair  Score   Pair  Score
AB    +1      BF    −1      DE    +1      FH    −1
AC    +1      BG    +1      DF    −1      FI    −1
AD    +1      BH    −1      DG    +1      FJ    +1
AE    −1      BI    +1      DH    +1      GH    +1
AF    +1      BJ    +1      DI    +1      GI    +1
AG    −1      CD    +1      DJ    +1      GJ    −1
AH    +1      CE    +1      EF    +1      HI    +1
AI    +1      CF    −1      EG    +1      HJ    +1
AJ    −1      CG    +1      EH    +1      IJ    +1
BC    −1      CH    +1      EI    −1
BD    +1      CI    +1      EJ    −1
BE    +1      CJ    +1      FG    +1
The expression for the test statistic can be generalized by noting that each pair of the possible rank combinations is equally probable and that the total number of possible pairs is given by

C(n, 2) = ½n(n − 1)

Using S to indicate the actual score, the Kendall τ test statistic can be defined as

τ = S / [½n(n − 1)]   (5.3-75)
+ Q = f n(n - I), Eq. (5.3-75) can also be expressed
150
5 Two-Input Detectors
as 7 =
P-Q in(. - 1)
= I -
$.(.
2Q
- 1)
(5.3-76) (5.3-77) (5.3-78)
There are several simpler ways of calculating τ than the approach used in Example 5.3-6 [Kendall, 1962]. One of the better methods consists of arranging one set of ranks in the natural order with the rank for the corresponding observation listed under it. When this is done, all scores resulting from the naturally ordered set of ranks are positive, and only the scores due to the second set of ranks need be considered to obtain P. This method is illustrated in the following example.

Example 5.3-7 Consider again the two sets of ranks from Examples 5.3-4 and 5.3-6, which are listed below with the xᵢ ranks written in natural order. Note that the ranks of the yᵢ observations are also reordered.

        G   E   J   A   F   I   D   H   C   B
i:      7   5  10   1   6   9   4   8   3   2
R(xᵢ):  1   2   3   4   5   6   7   8   9  10
R(yᵢ):  3   5   2   1  10   4   6   8   9   7

In the above y rankings, the number of ranks to the right of R(y₇) = 3 that are greater than 3 is 7, and hence P = +7. Similarly, for R(y₅) = 5, P = +5. Continuing this procedure,

P = 7 + 5 + 6 + 6 + 0 + 4 + 3 + 1 + 0 = 32

From Eq. (5.3-78), then,

τ = 2(32)/45 − 1 = 1.42 − 1 = +0.42

as before. Of course, the number of negative scores Q also could have been counted using the above procedure and Eqs. (5.3-76) or (5.3-77) utilized to calculate τ.
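The pairwise scoring just described can be sketched directly (illustrative only; ties are assumed absent):

```python
from itertools import combinations

def kendall_tau(rx, ry):
    """Kendall tau, Eq. (5.3-75): the pair score S = P - Q divided by
    the n(n - 1)/2 possible pairs of ranks."""
    n = len(rx)
    s = 0
    for i, j in combinations(range(n), 2):
        s += 1 if (rx[j] - rx[i]) * (ry[j] - ry[i]) > 0 else -1
    return s / (n * (n - 1) / 2)
```

For the ranks of Examples 5.3-6 and 5.3-7 this returns 19/45 ≈ +0.42.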
The mean and variance of the Kendall τ test statistic are not difficult to calculate; however, readable derivations are available from Kendall [1962],
and hence this work is not repeated here. The mean of the test statistic when H₁₇ is true is zero, as might be expected, since all possible scores are equally likely and range in magnitude between +1 and −1. The variance of the τ statistic under H₁₇ is given by

var{τ|H₁₇} = 2(2n + 5)/[9n(n − 1)]   (5.3-79)
The utility of knowing the mean and variance of τ under H₁₇ is that the Kendall τ test statistic tends to normality very rapidly, and therefore the asymptotic distribution can prove important in hypothesis testing. The approach usually employed to prove asymptotic normality is to show that the moments of S for n large tend to the moments of the Gaussian distribution. The second limit theorem is then invoked to establish asymptotic normality [Kendall, 1962; Kendall and Stuart, 1969]. This approach can also be used to demonstrate the asymptotic normality of the Spearman ρ test statistic. The following example illustrates the application of the Kendall τ statistic to the detection problem of Example 5.3-3.
Example 5.3-8 Again it is desired to test H₁₇ versus K₁₇ at a false alarm rate of α = 0.1 given two sets of n = 10 observations. From the table in Appendix B of quantiles of the Kendall τ statistic, the value of the threshold is T₃ = 0.33. For n = 10, the variance of the statistic is

var{τ} = 2(20 + 5)/[90(9)] ≈ 0.06

For this case, the random variable τ/√0.06 has zero mean and unit variance under H₁₇. From tables of the standard Gaussian distribution in Appendix B, the required quantile is 1.282. The threshold for the Kendall τ statistic is therefore T₄ = 1.282√0.06 ≈ 0.313. Since from Examples 5.3-6 and 5.3-7 τ = 0.42, which is greater than both T₃ and T₄, both the exact test and the normal approximation accept K₁₇.
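The normal-approximation threshold of Example 5.3-8 follows directly from Eq. (5.3-79). A small Python sketch (carrying the unrounded variance, which gives T₄ ≈ 0.318 rather than the rounded value 0.313):

```python
import math
from statistics import NormalDist

def kendall_tau_threshold(n, alpha):
    """Normal-approximation threshold for the Kendall tau detector:
    accept the alternative when tau exceeds z_{1-alpha} * sqrt(var),
    with var{tau|H17} = 2(2n + 5) / (9 n (n - 1)) from Eq. (5.3-79)."""
    var = 2 * (2 * n + 5) / (9 * n * (n - 1))
    return NormalDist().inv_cdf(1 - alpha) * math.sqrt(var)

T4 = kendall_tau_threshold(n=10, alpha=0.1)  # about 0.32
```

The observed τ = 0.42 of the example exceeds this threshold, so the normal approximation again accepts K₁₇.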
Note that in the previous example and Example 5.3-4, the Kendall τ statistic is smaller than the Spearman ρ statistic for the same problem, but both produced the same decision when testing H₁₇ versus K₁₇ at a level α = 0.1. Both of these observations remain true in general. Although the Spearman ρ and Kendall τ statistics are computed in seemingly very different ways, the two statistics are highly correlated when H₁₇ is true. The correlation between ρ and τ varies from 1 when n = 2 to a minimum value of 0.98 when n = 5, and then approaches 1 as n → ∞. The
Spearman ρ and Kendall τ detectors are thus asymptotically equivalent under H₁₇. Whether τ and ρ remain highly correlated when the probability densities of the observations are correlated depends on the form of these distributions. Indeed, the correlation between ρ and τ may be as small as zero or as large as 0.984 when the observations have a bivariate Gaussian distribution with correlation less than or equal to 0.8 [Kendall and Stuart, 1967].

5.4 Summary
Two different classes of two-input detectors have been developed in this chapter. Of the detectors in the first class, those that have one input plus a set of reference noise samples, the Student's t test was derived and shown to be asymptotically nonparametric and consistent as well as unbiased. Two nonparametric detectors in this class, the Mann-Whitney and normal scores detectors, were also developed and their use demonstrated. Two simultaneous input detectors, which include the sample correlation coefficient, the polarity coincidence correlator, the Spearman ρ detector, and the Kendall τ detector, were developed next. These detectors are mainly useful for detecting a random signal, and each performs some form of correlation between the two inputs or transformed versions of the two inputs. It was shown that the sample correlation coefficient is asymptotically nonparametric and is unbiased, while the PCC was demonstrated to be nonparametric, unbiased, and consistent. The nonparametric Spearman ρ and Kendall τ detectors possess the important property of a rapid tendency to normality, and hence for moderate n have a very accessible distribution function. The following chapter develops the relative performance of the detectors described here and therefore allows useful comparisons between the detectors to be made. Applications of some of the detectors discussed are described in Chapter 9.

Problems

1. In using the S′ form of the Mann-Whitney detector given by Eq. (5.2-31), it is desired to test for a positive location shift in the yᵢ observations. There are n = 30 xᵢ (reference noise) samples and m = 20 yᵢ samples. If the value of the Mann-Whitney test statistic is given by 817, at what minimum significance level α will the hypothesis be rejected? Use the Gaussian approximation.
2. Demonstrate that Eq. (5.2-74) is true by selecting a few cases from the table of expected normal scores.

3. Show that the test statistic in Eq. (5.3-9) has a χ² distribution with n degrees of freedom when the hypothesis is true.

4. Suppose it is desired to test H₁₇ against K₁₇ using the Spearman ρ detector given the two sets of n = 5 observations

i:    1     2     3     4     5
xᵢ:   3.1   2.6   2.4   0.8   1.1
yᵢ:   2.3   2.0   4.2   0.4   1.9
For a significance level α = 0.1, determine the threshold from both the exact distribution and the normal approximation. Compare the decisions reached using the two separate thresholds.

5. Test H₁₇ versus K₁₇ at a level α = 0.1 using the two sets of observations in Problem 4 and the Kendall τ test. Compute the threshold using the exact distribution only.

6. Given the n = 10 reference noise observations, it is desired to test the m = 10 input observations for a positive location shift using the Mann-Whitney detector. Let α = 0.25 and use the exact distribution.

n = 10:  0.82   0.11   0.21  -0.32  -0.60  -0.27  -0.41   0.63   0.54  -0.73
m = 10:  1.11   0.65   0.10   1.42   1.08  -0.08   0.40   0.95   0.33   0.55

7. Use the PCC detector to test the two sets of n = 12 observations given below for the presence or absence of a stationary random signal at a significance level α = 0.10. Assume that all probability distributions have zero median.

i:    1      2      3      4      5      6      7      8      9      10     11     12
xᵢ:   0.81   1.63  -0.42   0.08  -0.11  -0.55  -0.03   0.90   0.78  -0.04   2.21   1.43
yᵢ:   0.76   0.89  -0.66   0.53   0.21  -0.30  -0.45   1.07   0.95  -0.01   1.77   1.39
8. Repeat Problem 7 using the Spearman ρ detector with α = 0.1.

9. Repeat Problem 8 for the sample correlation coefficient.

10. Compare the effort required in Problems 7-9 to calculate the values of the test statistics and to arrive at a decision. Which test seems preferable, the PCC detector, the Spearman ρ detector, or the sample correlation coefficient?
CHAPTER 6

Two-Input Detector Performance
6.1 Introduction
In Chapter 5 several parametric and nonparametric detectors which have one input plus a reference noise source or two simultaneous inputs were developed. These detectors provide the capability for tackling a wide variety of detection problems. However, as soon as one considers a specific application, a decision must be made as to which of the detectors to use. One is thus immediately led to the necessity of evaluating the performance of the two-input detectors presented in the preceding chapter. The tools available for the performance evaluation of two-input nonparametric detectors are the same as those which were utilized in Chapter 4 for the performance evaluation of the one-input detectors, namely the small sample relative efficiency and the ARE. In this chapter asymptotic results for the two-input detectors are derived first, followed by a presentation of some of the small sample performance comparisons given in the literature. Since ARE derivations for all of the detectors developed in Chapter 5 clearly cannot be given here, only examples of ARE computations are presented. In particular, the AREs of the PCC and the Mann-Whitney detectors with respect to optimum parametric detectors are derived in Sections 6.3 and 6.4, respectively, preceded by the development of some of the necessary asymptotic results for parametric detectors in Section 6.2. The AREs of other nonparametric detectors are then presented and discussed in Section 6.5. Section 6.6 details small sample comparisons of
the Mann-Whitney and normal scores detectors with respect to the parametric Student's t test, and the implications are discussed. A summary and conclusions concerning the utility and validity of the asymptotic and small sample performance indicators are given in Section 6.7.

6.2 Asymptotic Results for Parametric Detectors
In this section asymptotic results for parametric detectors with two simultaneous inputs are derived for use in evaluating the ARE of the PCC detector in Section 6.3. In particular, expressions for the probabilities of detection and false alarm for the sample correlation coefficient and the optimum Neyman-Pearson detectors derived in Section 5.3.1 are obtained when the number of samples available for the detection process is large. Actually, a nonnormalized form of the sample correlation coefficient called the correlation detector is used for the development here. The presentation in this section and Section 6.3 follows Wolff et al. [1962]. The correlation detector test statistic is given by

S_c = Σ_{i=1}^{n₁} xᵢyᵢ    (6.2-1)

where the xᵢ, yᵢ, i = 1, 2, . . . , n₁, denote the two sets of observations obtained from the two simultaneous inputs. The test statistic in Eq. (6.2-1) is a sum of n₁ independent and identically distributed random variables, since the xᵢ and yᵢ sequences are independent, identically distributed random variables. For the case where the number of observations n₁ becomes large, the central limit theorem [Hogg and Craig, 1970] can be invoked to demonstrate that the statistic [S_c − E{S_c}]/[var{S_c}]^½ has a Gaussian distribution with zero mean and unit variance. Assuming that the signal and noise are both Gaussian processes, the hypothesis and alternative being considered here can be explicitly stated as testing

H₂₀: x and y are independent, zero mean Gaussian processes, each with variance σ₀²

versus

K₂₀: x and y are zero mean Gaussian processes and consist of signal plus noise with the signal variance = σs² and noise variance = σ₀²

The expected value and variance of the correlation detector test statistic
when H₂₀ and K₂₀ are true can be found by inspection to be

E{S_c | H₂₀} = 0    (6.2-2)

E{S_c | K₂₀} = n₁σs²    (6.2-3)

var{S_c | H₂₀} = n₁σ₀⁴    (6.2-4)

var{S_c | K₂₀} = n₁(σ₀⁴ + 2σs²σ₀²)    (6.2-5)
Now, the probability of a false alarm for the correlation detector is given by

α = P{S_c > T₁ | H₂₀} = 1 − Φ{T₁ / (σ₀²√n₁)}    (6.2-6)

where, as before, Φ{·} is the unit normal cumulative distribution function. Similarly, the probability of detection can be written as

1 − β = 1 − Φ{ (T₁ − n₁σs²) / [n₁(σ₀⁴ + 2σs²σ₀²)]^½ }

      = 1 − Φ{ [Φ⁻¹[1 − α] − (σs²/σ₀²)√n₁] / [1 + 2σs²/σ₀²]^½ }    (6.2-7)

where this last expression results by solving Eq. (6.2-6) for T₁ and substituting into the first expression for 1 − β and then factoring σ₀²n₁^½ out of the radical in the denominator.

The optimum Neyman-Pearson detector for testing H₂₀ versus K₂₀ was derived in Section 5.3.1 and has a test statistic given by

S_opt = Σ_{i=1}^{n₂} (xᵢ + yᵢ)²    (6.2-8)
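Equations (6.2-6) and (6.2-7) are convenient for numerical work. A short Python sketch of the asymptotic detection probability of the correlation detector (the single argument snr stands for the ratio σs²/σ₀²):

```python
import math
from statistics import NormalDist

def correlation_detector_pd(alpha, n1, snr):
    """Asymptotic detection probability of the correlation detector,
    Eq. (6.2-7); snr denotes sigma_s^2 / sigma_0^2."""
    z = NormalDist().inv_cdf(1 - alpha)                  # Phi^{-1}[1 - alpha]
    arg = (z - snr * math.sqrt(n1)) / math.sqrt(1 + 2 * snr)
    return 1 - NormalDist().cdf(arg)

pd = correlation_detector_pd(alpha=0.01, n1=1000, snr=0.1)
```

As a sanity check, zero SNR reduces the detection probability to the false alarm rate α, and the detection probability grows with the number of samples n₁.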
The mean and variance of S_opt when H₂₀ is true can be calculated as follows:

E{S_opt | H₂₀} = Σ_{i=1}^{n₂} [E{xᵢ² | H₂₀} + 2E{xᵢyᵢ | H₂₀} + E{yᵢ² | H₂₀}] = 2n₂σ₀²    (6.2-9)

var{S_opt | H₂₀} = Σ_{i=1}^{n₂} Σ_{j=1}^{n₂} E{(xᵢ + yᵢ)²(xⱼ + yⱼ)² | H₂₀} − 4n₂²σ₀⁴

                = 12n₂σ₀⁴ + 4n₂(n₂ − 1)σ₀⁴ − 4n₂²σ₀⁴    (6.2-10)

which yields

var{S_opt | H₂₀} = 8n₂σ₀⁴    (6.2-11)

The middle term in Eq. (6.2-9) vanished since xᵢ and yᵢ are independent and zero mean when H₂₀ is true. For the same reasons, terms which have xᵢ or yᵢ to the first power drop out of the variance calculation. The final result in Eq. (6.2-11) is obtained from Eq. (6.2-10) since

E{xᵢ⁴ | H₂₀} = E{yᵢ⁴ | H₂₀} = 3σ₀⁴

for Gaussian random variables [Papoulis, 1965; Melsa and Sage, 1973] and

E{xᵢ²yᵢ² | H₂₀} = E{xᵢ² | H₂₀} E{yᵢ² | H₂₀} = σ₀⁴
for xᵢ and yᵢ independent under H₂₀.

Performing similar calculations when K₂₀ is true yields

E{S_opt | K₂₀} = Σ_{i=1}^{n₂} [(σ₀² + σs²) + 2σs² + (σ₀² + σs²)] = 2n₂(σ₀² + 2σs²)    (6.2-12)

var{S_opt | K₂₀} = 3n₂[2(σ₀² + 2σs²)]² + n₂(n₂ − 1)[2(σ₀² + 2σs²)]² − 4n₂²(σ₀² + 2σs²)²

                = 8n₂(σ₀² + 2σs²)²    (6.2-13)
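The moment results in Eqs. (6.2-9) and (6.2-11)-(6.2-13) can be spot-checked by simulation. A minimal Monte Carlo sketch (the sample sizes, seed, and tolerances are illustrative choices, not from the text):

```python
import random

def sopt_moments(n2, sigma0, sigma_s, trials=4000, seed=7):
    """Monte Carlo estimate of the mean and variance of
    S_opt = sum_{i=1}^{n2} (x_i + y_i)^2, where x_i and y_i share a
    common zero mean Gaussian signal (sigma_s = 0 corresponds to H20)."""
    rng = random.Random(seed)
    vals = []
    for _ in range(trials):
        total = 0.0
        for _ in range(n2):
            s = rng.gauss(0.0, sigma_s)                      # common signal sample
            x = s + rng.gauss(0.0, sigma0)                   # input 1: signal plus noise
            y = s + rng.gauss(0.0, sigma0)                   # input 2: signal plus noise
            total += (x + y) ** 2
        vals.append(total)
    mean = sum(vals) / trials
    var = sum((v - mean) ** 2 for v in vals) / trials
    return mean, var

# Under H20 with n2 = 50, sigma0 = 1: mean near 2*n2*sigma0^2 = 100,
# variance near 8*n2*sigma0^4 = 400.
mean_h, var_h = sopt_moments(n2=50, sigma0=1.0, sigma_s=0.0)
```

With a nonzero signal variance the same routine should track Eqs. (6.2-12) and (6.2-13).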
Since S_opt is a sum of independent and identically distributed random variables, the central limit theorem can again be employed as n₂ becomes large. The false alarm probability is therefore given by

α = 1 − Φ{ [T₂ − 2n₂σ₀²] / [σ₀²(8n₂)^½] }    (6.2-14)

Similarly, the probability of detection is

1 − β = 1 − Φ{ [T₂ − 2n₂(σ₀² + 2σs²)] / [(σ₀² + 2σs²)(8n₂)^½] }

      = 1 − Φ{ [Φ⁻¹(1 − α) − (σs²/σ₀²)(2n₂)^½] / [1 + 2σs²/σ₀²] }    (6.2-15)
where this last expression is obtained by solving Eq. (6.2-14) for T₂ and substituting. Equations (6.2-6), (6.2-7), (6.2-14), and (6.2-15) are the desired asymptotic expressions for the parametric detector false alarm and detection probabilities. These equations are used with similar results for the PCC detector in the following section to calculate the ARE of the PCC with respect to the correlation and Neyman-Pearson detectors.

6.3 ARE of the PCC Detector

Expressions for the false alarm and detection probabilities of the PCC detector when the sample size is large have been derived in Section 5.3.2
and are given by Eqs. (5.3-55) and (5.3-57), which are repeated here for convenience with n = n₃:

α = 1 − Φ{T / √n₃}    (6.3-1)

and

1 − β = 1 − Φ{ [Φ⁻¹(1 − α) − √n₃ (2p₊ − 1)] / [2[p₊(1 − p₊)]^½] }    (6.3-2)
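Equation (6.3-2) depends on the agreement probability p₊. For zero mean jointly Gaussian inputs sharing a common signal, p₊ can be checked against the exact Gaussian orthant probability ½ + (1/π) arcsin ρ, with ρ = σs²/(σs² + σ₀²) the correlation induced by the common signal; the small-signal result of Eq. (6.3-4) below is its first-order expansion. A Python sketch of the comparison (the arcsine orthant formula is a standard Gaussian result, not derived in the text):

```python
import math

def p_plus_exact(snr):
    """Exact agreement probability P{sign(x) = sign(y)} for zero mean
    jointly Gaussian inputs with a common signal: the orthant probability
    1/2 + (1/pi) * arcsin(rho), where rho = snr / (1 + snr)."""
    rho = snr / (1.0 + snr)
    return 0.5 + math.asin(rho) / math.pi

def p_plus_small_signal(snr):
    """Eq. (6.3-4): p+ = 1/2 + (1/pi) * (sigma_s^2 / sigma_0^2)."""
    return 0.5 + snr / math.pi
```

For a weak signal, say σs²/σ₀² = 0.05, the two values agree to better than one part in a thousand, which supports the small-signal approximation used below.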
In order to evaluate Eq. (6.3-2), it is necessary to calculate p₊ given by Eq. (5.3-49) when K₂₀ is true. It is desirable also to evaluate Eq. (6.3-2) for other non-Gaussian probability distributions, and hence a general expression for p₊ is required. A general expression for p₊ can be obtained if interest is limited to the case of vanishingly small signal-to-noise ratios and series expansions of the distribution functions appearing in Eq. (5.3-49) are employed. This development is somewhat tedious, however, and would contribute little to the present derivation; hence only the result is given here. The final expression for p₊ is [Wolff et al., 1962; Kanefsky and Thomas, 1965]

p₊ = ½ + 2[F′ᵤ(0)]² σs²    (6.3-3)

where F′ᵤ(0) = [(d/dλ)Fᵤ(λ)]|λ=0 and Fᵤ(λ) is the noise probability distribution function. Since for Fᵤ(λ) zero mean and Gaussian,

F′ᵤ(0) = 1/(√(2π) σ₀)
Eq. (6.3-3) becomes

p₊ = ½ + (1/π)(σs²/σ₀²)    (6.3-4)

Substituting Eq. (6.3-4) into the probability of detection expression yields
1 − β = 1 − Φ{ [Φ⁻¹(1 − α) − √n₃ (2/π)(σs²/σ₀²)] / [2[(1/4) − (1/π²)(σs⁴/σ₀⁴)]^½] }

      = 1 − Φ{ Φ⁻¹(1 − α) − (2√n₃/π)(σs²/σ₀²) }    (6.3-5)
where the second-order term in σs⁴/σ₀⁴ has been neglected since it is assumed that σs²/σ₀² ≪ 1.

Since the test statistic is binomially distributed,

P{S = r} = C(n, r) p_K^r (1 − p_K)^{n−r}    (9.2-12)

the probability of detection can be written as

P_D = 1 − β = Σ_{r=T}^{n} C(n, r) p_K^r (1 − p_K)^{n−r}    (9.2-13)

where T is found from Eq. (9.2-11). To obtain analytical results, the expression for p_K given by Eq. (9.2-14) must be utilized,

p_K = ∫_{−∞}^{∞} F(w) dG(w)    (9.2-14)

where F(w) is the cumulative distribution function of the range bins containing noise only, and G(w) is the CDF of the range bin containing a target return. Dillard and Antoniak have calculated the performance of their nonparametric radar detector for the three target plus noise distributions given in Table 9.2-2. The noise only distribution is assumed to be the same for all three cases, namely

F(w) = 1 − exp(−w²/2)    (9.2-15)
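Equations (9.2-13)-(9.2-15) are easy to evaluate numerically. In the Python sketch below, p_K is obtained by midpoint integration of Eq. (9.2-14); the target-bin density g used for G is an assumed illustrative Rayleigh form with average SNR ψ, since Table 9.2-2 itself is not reproduced here:

```python
from math import comb, exp

def p_K(F, g, w_max=40.0, steps=40000):
    """Eq. (9.2-14): p_K = integral of F(w) dG(w), with g = G' the target pdf."""
    h = w_max / steps
    return sum(F((i + 0.5) * h) * g((i + 0.5) * h) * h for i in range(steps))

def detection_probability(n, T, p):
    """Eq. (9.2-13): P_D = sum_{r=T}^{n} C(n, r) p^r (1 - p)^(n - r)."""
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(T, n + 1))

F = lambda w: 1.0 - exp(-w * w / 2.0)      # noise-only CDF, Eq. (9.2-15)
psi = 3.0                                  # illustrative average SNR (assumption)
g = lambda w: (w / (1.0 + psi)) * exp(-w * w / (2.0 * (1.0 + psi)))  # assumed target pdf

p = p_K(F, g)                              # closed form for this pair: (1+psi)/(2+psi)
pd = detection_probability(16, 12, p)
```

For this particular Rayleigh pair, p_K has the closed form (1 + ψ)/(2 + ψ), which the numerical integral reproduces; the values n = 16 and T = 12 are illustrative only.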
Table 9.2-2. "Target present" probability density functions for Cases 1-3 (see text).
Case 1 is called the Rice distribution and is the density of the envelope of a constant amplitude sine wave plus additive narrow-band Gaussian noise. The symbol ψ denotes the input SNR to the envelope detector. Case 2, called the Rayleigh density, and Case 3 are the densities of the output of a linear envelope detector when the target models are those known as Swerling's cases 2 and 4 [Swerling, 1960]. Since the input fluctuates from pulse to pulse for both cases, ψ is the average SNR over all fluctuations at the input to the envelope detector. Figures 9.2-4 through 9.2-6 are reproduced from Dillard and Antoniak [1970] to illustrate the performance of their radar detector, called the modified sign test, when compared to other detectors for the three target present densities in Table 9.2-2 (N = the number of observations). A discussion of the video integration and binary integration detection schemes is given in Dillard and Antoniak [1970]. In all three cases the loss in SNR is seen to be approximately 2 dB with respect to video integration, and 1 dB with respect to binary integration. This loss is, of course, almost negligible, especially when it is considered that the video and binary integration procedures would be greatly affected by changes in the background noise density, whereas the modified sign test will maintain a constant false alarm rate for a wide class of noise densities. The modified sign test has several important disadvantages. One is the fact that weak targets at longer ranges than strong targets at the same relative antenna position may not be detected. This situation can be improved by comparing different sets of range bins from scan to scan, by comparing range bins in varying orders, or by applying some other similar technique. The advantages of such techniques, however, will in general be paid for in system complexity. The modified sign test also has the disadvantage that highly correlated clutter signals may cause a target to be declared present.
The utilization of some filtering or clutter rejection technique could reduce this problem somewhat, but again, only with increased system complexity.

Figs. 9.2-4-9.2-6. Comparison of the modified sign test with nondistribution-free procedures [Dillard and Antoniak, 1970]; the curves plot probability of detection versus SNR (dB). The curves in Figs. 9.2-4 and 9.2-5 are as follows: A, video integration; B, binary integration; and C, modified sign test. For Fig. 9.2-6 the curves are: A, binary integration; B, modified sign test.

The detection performance of the nonparametric Dicke-fix radar detector, also called the vector sign test, and the asymptotically nonparametric Siebert detector has been computed by Hansen and Zottl [1971]. This comparison is particularly interesting since the two detectors have actually been used in radar systems in the past and since the Siebert detector is a version of the one-input Student's t test. The results obtained indicate a 1-dB disadvantage for the Dicke-fix receiver for both finite and asymptotically large sample sizes. Again, however, these results must be interpreted with caution since the detection probability used was 0.5.

Zeoli and Fong [1971] investigated the performance of the Mann-Whitney detector with respect to the optimum detector for testing a positive location shift in Gaussian noise, the linear detector, over a wide variety of noise probability densities. The results were, as could have been predicted, that the Mann-Whitney detector maintained a more constant false alarm rate, generally performed well in comparison with the linear detector, and for certain input noise statistics, was clearly superior in performance. One important factor in practical applications that must be emphasized, however, is that the Mann-Whitney detector is considerably more complex than the linear detector.

All of the nonparametric detectors considered in this section appear to be quite useful when applied to the radar detection problem. The two most attractive properties of these detectors are their simplicity of implementation (sometimes) and the maintenance of a constant false alarm rate (always) for a wide range of noise densities. The Wilcoxon and Mann-Whitney detectors are the most efficient of the detectors considered, but the requirement that large amounts of data be rapidly ranked may prohibit their use in real-time applications. The nonparametric detectors based on the sign test are relatively efficient detectors for the detection problems considered and have the advantage of not requiring the ranking of observations, and thus are quite simple to implement. In fact, the practical usefulness of the vector sign detector has already been proven since its continuous analog, the Dicke-fix receiver, has been applied successfully for many years. A detection system already in use by the Federal Aviation Administration, designated the Common Digitizer [1970], is strikingly similar to the modified sign test formulated by Dillard and Antoniak. Nonparametric detectors have thus already proven their practical usefulness, and their application to the search radar detection problem is seen to be well justified by the results discussed in this section.

9.3 Other Engineering Applications

Less energy has been expended by engineers in trying to apply nonparametric detectors to engineering problems other than various radar detection problems. This does not mean that such applications are not possible. It simply emphasizes the fact that engineers in general have only recently become aware of nonparametric detectors, and hence much work remains to be done in applying these detectors to realistic problems. A performance relation which is very useful for system design has been derived and used by Capon [1959b, 1960], and later used by Hancock and Lainiotis [1965] for detector performance evaluation.
Although the proof is not included here, it can be shown [Capon, 1959b] that for any two-input detector test statistic which satisfies the regularity conditions given in Chapter 2 that are necessary for the efficacy to exist, the performance relation

ε θ₁² mn/(m + n) = 2{erf⁻¹[1 − 2α_{m,n}] + erf⁻¹[1 − 2β_{m,n}]}²    (9.3-1)

holds for m, n large and θ₁ small (the weak signal case). In Eq. (9.3-1), α_{m,n} and β_{m,n} denote the false alarm and false dismissal probabilities, respectively, ε is the efficacy, and θ₁ is the signal strength parameter.
A similar relation can be obtained for one-input detectors as [Capon, 1959b, 1960]

ε θ₁² n = 2{erf⁻¹[1 − 2αₙ] + erf⁻¹[1 − 2βₙ]}²    (9.3-2)
where the parameters are as defined for Eq. (9.3-1). The performance relations in Eqs. (9.3-1) and (9.3-2) are extremely important for detector performance comparisons and system design since they provide a relatively simple algebraic relation between the detector efficacy, the SNR, the number of observations required, and the false alarm and false dismissal probabilities. It is thus possible, for example, to determine the number of observations required by a detector to achieve a specified α and β for a given input signal magnitude if the detector efficacy is known. Of course, the efficacies of most of the more familiar nonparametric detectors have been derived and are available in the literature. A specific example of the utilization of these equations for system design is given in the following application.

In many radio communication systems, the magnitude of the input signal to the receiver varies with time even though the transmitted signal power remains constant. This phenomenon is called signal fading and is caused by a wide variety of physical channel effects [Schwartz et al., 1966]. Capon [1959b] described the application of the Mann-Whitney nonparametric detector to such a communication system problem, which he called the scatter propagation problem. In this problem the received signal power is not constant due to transmission channel effects, but the background noise and reference noise samples are assumed to be stationary and Gaussian with zero mean and unit variance. The noise is also assumed to be additive. The assumption that the noise is stationary even though the signal is fluctuating is valid, since the primary noise sources are present in the receiver itself. A common example of such a noise is thermal noise generated in the receiver front end. The resulting detection problem is therefore one in which it is desired to detect a time-varying signal in a background of stationary noise. If it is assumed that a set of m reference noise observations is available, the hypothesis and alternative to be tested consist of

H: CDF of the xᵢ is F(x), i = 1, 2, . . . , m + n (signal absent)

K: CDF of the xᵢ is G_θᵢ(x), i = 1, 2, . . . , n, and the CDF of the xₙ₊ᵢ is F(x), i = 1, 2, . . . , m (signal present)

where the xᵢ, i = 1, 2, . . . , n, are the input observations to be tested and the xₙ₊ᵢ, i = 1, 2, . . . , m, are the reference noise samples. The probability distribution function under the alternative is assumed to differ from
sample to sample only in terms of the SNR parameter θᵢ. In order to use the performance relation in Eq. (9.3-1), it is necessary to let m and n be large and calculate the asymptotic distributions under the hypothesis and alternative. If the observations are independent when H is true, the joint distribution function is just a product of the marginal distributions, and hence is Gaussian. Capon proves, under the restrictions that the limit of m/n exists as m and n → ∞ and that F(x) and G_θᵢ(x) are continuous, that the Mann-Whitney test statistic (actually Capon's V test, which is a slightly modified version of the Mann-Whitney detector and which has been previously considered in Section 5.2) is asymptotically Gaussian when K is true. Further, the means and variances under the hypothesis and alternative are given by

E{V | H} = 1/2

E{V | K} = 1/2 + θ̄/6    (9.3-3)

var{V | H} = (m + n + 1)/12mn ≅ var{V | K}

where θ̄ = (1/n) Σ_{i=1}^{n} θᵢ, and it has been assumed that when K is true, F(x) and G_θᵢ(x) are related by

G_θᵢ(x) = (1 − θᵢ)F(x) + θᵢF²(x)    (9.3-4)

Using these results, the efficacy becomes ε = 1/3. Letting θ₁ in Eq. (9.3-1) become θ̄ and substituting for the efficacy, the performance relation becomes

θ̄² mn/(m + n) = 6{erf⁻¹(1 − 2α) + erf⁻¹(1 − 2β)}²    (9.3-5)

Since the reference noise observations are obtained during system "dead time," it is reasonable to assume that m ≫ n, and therefore, Eq. (9.3-5) can be written as
ne’ = 6{erf-’(l - 2a)
+ erf-’(1
- Zp)}’
(9.3-6)
To arrive at a specific system design, assume that it is necessary for satisfactory operation that a = p = 0.01, and that the average value of the peak signal to rms noise ratio is known to be 0.1. Substituting these values into Eq. (9.3-6) and solving for n yields
n = 600(erf-’(0.98)
+ erf-’(0.98)}’
= 6500 observations for m >> n To satisfy the assumption that m > n, let m = 10n = 65,000 samples. The
threshold setting necessary to achieve the false alarm probability of α = 0.01 can be found from

α = 0.01 = ∫_T^∞ [1/(√(2π) σ₀)] exp[−(v − μ₀)²/(2σ₀²)] dv    (9.3-7)

by letting

σ₀² = var{V | H}    and    μ₀ = E{V | H}

Substituting and solving for the threshold T produces T = 0.5. The probability of an error in this communication system is thus

P_e = αP{H} + βP{K} = 0.01[P{H} + P{K}] = 0.01
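The sample-size calculation in this design example is easily reproduced. In the Python sketch below, erf⁻¹ is obtained from the standard normal quantile (the Python standard library has no inverse error function), and the factor 2/(εθ²) follows from Eq. (9.3-2), which reduces to Eq. (9.3-6) for the Mann-Whitney (V test) efficacy of 1/3:

```python
import math
from statistics import NormalDist

def erfinv(y):
    """Inverse error function via the normal quantile:
    erf^{-1}(y) = Phi^{-1}((y + 1)/2) / sqrt(2)."""
    return NormalDist().inv_cdf((y + 1) / 2) / math.sqrt(2)

def samples_required(efficacy, theta, alpha, beta):
    """Solve Eq. (9.3-2) for n:
    n = (2 / (efficacy * theta^2)) * (erfinv(1-2a) + erfinv(1-2b))^2."""
    return (2 / (efficacy * theta ** 2)) * (erfinv(1 - 2 * alpha) + erfinv(1 - 2 * beta)) ** 2

# Capon's scatter-propagation design: efficacy 1/3, alpha = beta = 0.01,
# average SNR parameter 0.1 -> about 6500 observations, as in the text.
n = samples_required(efficacy=1 / 3, theta=0.1, alpha=0.01, beta=0.01)
```

The computed n agrees with the value of approximately 6500 quoted above.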
For this detection problem, then, the number of observations required to attain the desired performance level for the weak signal received is fairly large. Additionally, the storage of such a large number of reference observations (65,000) seems prohibitive. This application, however, which is due to Capon [1959b], has illustrated how Eqs. (9.3-1) and (9.3-2) can be used for system design.

One possible application of the polarity coincidence correlator is the detection of a weak noise source or random signal in additive background noise. This problem arises in radio astronomy or acoustics, as in an underwater sound detection system. Ekre [1963] has compared the performance of the PCC with that of an analog cross-correlator and an analog version of the PCC for three different input signal spectra. The signal and the noises in the two different inputs are all assumed to be independent, zero mean, Gaussian processes with identical normalized power spectra. The SNR is used as the measure of performance. The three different input signals are obtained by passing the white noise signal through an RC low-pass filter, a single-tuned RLC filter, and a rectangular band-pass filter, which have normalized autocorrelation functions

ρ_RC(τ) = exp(−Δω|τ|)    (9.3-8)

ρ_RLC(τ) = exp(−Δω|τ|) cos ω₀τ    (9.3-9)

ρ_BP(τ) = [(sin Δωτ)/(Δωτ)] cos ω₀τ    (9.3-10)
respectively. The results indicate a loss for broad-band signals of 1-10 dB for the PCC with respect to the analog cross-correlator, depending on the sampling frequency. The loss decreases to the 1-4 dB range for narrow-band input signals. The loss of the PCC with respect to the analog PCC is 0.5 dB or less. For relatively high sampling frequencies, the losses fall in the lower end of the given ranges, and therefore, the losses experienced may be negligible in light of the simple implementation requirements of the digital PCC discussed in this text.

The following narrow-band version of the Wilcoxon detector was formulated by Helstrom [1968] for detecting a narrow-band phase-incoherent signal, and is equivalent to the narrow-band Wilcoxon detector discussed in Section 9.2 and investigated by Hansen [1970a] for application to radar. A similar test statistic has also been suggested by Carlyle [1968]. The input signal is assumed to be

s(t) = A Re{f(t) exp(jωt + jφ)} + n(t)    (9.3-11)

where A is constant, ω is the known carrier frequency, φ is the unknown phase, and n(t) is additive white, Gaussian noise. The input signal is mixed with cos ωt and sin ωt to generate the in-phase and quadrature components of s(t). From these components two sets of independent observations are obtained via sampling. These two sets of observations, given by {x_c1, x_c2, . . . , x_cn} and {y_s1, y_s2, . . . , y_sn}, are ranked separately by their absolute values as

|x_ci₁| < |x_ci₂| < · · · < |x_ciₙ|    and    |y_sj₁| < |y_sj₂| < · · · < |y_sjₙ|

where {i₁, i₂, . . . , iₙ} and {j₁, j₂, . . . , jₙ} are some permutations of the first n integers. Using these ranked values, the test statistics
S_c = Σ_{k=1}^{n} k u(x_ciₖ)    (9.3-12)

and

S_s = Σ_{k=1}^{n} k u(y_sjₖ)    (9.3-13)

where u(·) is the unit step function, are computed along with

S_c′ = ½n(n + 1) − S_c    (9.3-14)

S_s′ = ½n(n + 1) − S_s    (9.3-15)
The statistics S_c* = max{S_c, S_c′} and S_s* = max{S_s, S_s′} are squared and summed, and the result compared to a threshold. Whenever the test statistic is greater than the threshold, a signal is decided to be present. The resulting detector is a two-sided test of the input signal magnitude. For the no-signal-present condition, the phase should be random and the test statistics in Eqs. (9.3-12) and (9.3-13) will be close to their mean value, ¼n(n + 1). When a signal is present, however, the phase should remain relatively constant and one or both of the mixer outputs will have a large number of either positive or negative values, thus producing a large value
of the final test statistic (S_c*)² + (S_s*)². Since this statistic is closely related to the narrow-band Wilcoxon version proposed by Hansen, the performance should be the same as that of the narrow-band Wilcoxon detector discussed in Section 9.2.

A few applications of nonparametric detectors to some important communication system configurations have been presented in this section. Other applications of nonparametric detectors to communication system problems are given by Capon [1959b, 1960] and Hancock and Lainiotis [1965]. The Mann-Whitney and Wilcoxon detector applications could easily have been considered in Section 9.2 rather than here, and reciprocally, the narrow-band forms of the normal scores and sign detectors could have been developed in this section. In any event, it is evident that forms of nonparametric detectors suitable for application to realistic communication systems can be devised, and therefore, nonparametric or distribution-free detectors provide a viable engineering alternative to parametric detectors for communication system applications.

9.4 Summary
Since nonparametric detectors were originally developed primarily by statisticians for solving nonengineering problems, the test statistics discussed in Chapters 3 and 5 many times are not in a form which allows them to be used for engineering applications. In this chapter, modifications to detector test statistics necessary to provide a form suitable for radar and communication system applications have been presented, and performance results of these detectors for their specific applications have been given. The general conclusion that can be reached is that radar and communication system applications of nonparametric detectors are possible, and therefore nonparametric detectors can be used profitably in those systems which require the maintenance of a constant false alarm rate for varying input probability densities. The modified nonparametric detector test statistics seem to retain the encouraging performance characteristics of the original detectors. As engineers become more familiar, and hence more comfortable, with nonparametric detectors, the applications of these detectors to realistic engineering problems are certain to increase.
APPENDIX A
Probability Density Functions
Frequently in the text, a random variable is said to have a certain type of probability density function, such as Gaussian, Cauchy, Laplace, uniform, or others. Since the exact form of the probability density function indicated by these labels is not perfectly standardized and since the reader may be unfamiliar with some of these probability densities, those probability density functions which appear often in the text are listed here for easy reference. Obviously, no attempt has been made at completeness, but rather the goal is to minimize confusion and thus to allow efficient utilization of the textual presentation.
Gaussian or Normal (Fig. A-1)

Parameters: μ, σ

f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad -\infty < x < \infty

Fig. A-1. Example of a Gaussian density function.
Uniform or Rectangular (Fig. A-2)

Parameters: a, b

f_X(x) = \begin{cases} \dfrac{1}{b-a}, & \text{for } a < x < b \\ 0, & \text{otherwise} \end{cases}

Fig. A-2. Example of a density function for a uniform distribution.
Cauchy (Fig. A-3)

Parameters: λ, θ

f_X(x) = \frac{\lambda}{\pi\left[\lambda^2 + (x-\theta)^2\right]}, \qquad -\infty < x < \infty

Fig. A-3. Example of a Cauchy density function.
Laplace or Double Exponential (Fig. A-4)

Parameter: λ

f_X(x) = \frac{\lambda}{2}\, e^{-\lambda |x|}, \qquad -\infty < x < \infty

Exponential (Fig. A-5)

Parameter: λ

f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{for } x > 0 \\ 0, & \text{otherwise} \end{cases}
Fig. A-4. Example of a Laplace density function.

Fig. A-5. Example of a density function for an exponential distribution.
Binomial (Fig. A-6)

Parameters: n, p

f_X(k) = P\{X = k\} = b(k; n, p) = \binom{n}{k} p^k q^{n-k}, \qquad \text{for } k = 0, 1, 2, \ldots, n

where q = 1 − p.

Student's t (Fig. A-7)

Parameter: n

f_X(x) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\!\left(\frac{n}{2}\right)} \left(1 + \frac{x^2}{n}\right)^{-(n+1)/2}, \qquad -\infty < x < \infty

Fig. A-7. Example of a density function for a Student's t distribution.
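For numerical work these densities are simple to code directly. The sketch below is ours, not from the text (Python, standard library only); it evaluates each density and spot-checks normalization by crude numerical integration:

```python
import math

def gaussian(x, mu=0.0, sigma=1.0):
    # (1 / (sqrt(2 pi) sigma)) exp(-(x - mu)^2 / (2 sigma^2))
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

def uniform(x, a=0.0, b=1.0):
    return 1.0 / (b - a) if a < x < b else 0.0

def cauchy(x, lam=1.0, theta=0.0):
    return lam / (math.pi * (lam ** 2 + (x - theta) ** 2))

def laplace(x, lam=1.0):
    return 0.5 * lam * math.exp(-lam * abs(x))

def exponential(x, lam=1.0):
    return lam * math.exp(-lam * x) if x > 0 else 0.0

def integrate(f, lo, hi, n=100000):
    # crude trapezoidal rule, adequate for a sanity check
    h = (hi - lo) / n
    return h * (0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n)))

# each density should integrate to (approximately) 1
for f in (gaussian, uniform, laplace, exponential):
    assert abs(integrate(f, -20.0, 20.0) - 1.0) < 1e-3
assert abs(integrate(cauchy, -2000.0, 2000.0) - 1.0) < 1e-3  # heavy tails need a wide interval
```

The Cauchy check uses a much wider interval because its tails decay only like 1/x².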
APPENDIX B
Mathematical Tables
Table Number and Title:
B-1  Cumulative binomial distribution
B-2  Cumulative Gaussian distribution
B-3  Cumulative Student's t distribution
B-4  Wilcoxon (one-input) test: Upper tail probabilities for the null distribution
B-5  Expected normal scores
B-6  Mann-Whitney (two-input) test: Upper tail probabilities for the null distribution
B-7  Quantiles of the Kendall τ test statistic
B-8  Quantiles of the Spearman ρ test statistic
Table B-1 Cumulative binomial distribution
(Entries give the upper tail probability P[S ≥ r] of the binomial distribution for n = 2 through 12 and p = .05 through .50; the numerical entries are omitted here.)

From CRC Standard Mathematical Tables, S. M. Selby and B. Girling, editors, Copyright © The Chemical Rubber Co., 1965. Used by permission of the Chemical Rubber Company.
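The entries of Table B-1 can be regenerated exactly from the binomial upper tail sum; a sketch in Python (the helper name is ours):

```python
from math import comb

def binom_upper_tail(n, r, p):
    # P[S >= r] for S ~ Binomial(n, p), the quantity tabulated in Table B-1
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(r, n + 1))

# regenerate the n = 2, r = 1 row for a few values of p
row = [round(binom_upper_tail(2, 1, p), 4) for p in (0.05, 0.10, 0.25, 0.50)]
print(row)  # [0.0975, 0.19, 0.4375, 0.75]
```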
Table B-2 Cumulative Gaussian distribution
(Entries give F(x) for x = 0.00 to 3.49 in steps of 0.01; the numerical entries are omitted here. For negative arguments, F(−z) = 1 − F(z). Selected quantiles: F(1.282) = .90, F(1.645) = .95, F(1.960) = .975, F(2.326) = .99, F(2.576) = .995, F(3.090) = .999, F(3.291) = .9995, F(3.891) = .99995, F(4.417) = .999995.)

F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt

From Introduction to the Theory of Statistics by A. M. Mood and F. A. Graybill, 2nd ed., Copyright © 1963, the McGraw-Hill Book Company, Inc. Used with permission of McGraw-Hill Book Company.
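Values of F(x) can be reproduced from the error function; a Python sketch (the function name is ours):

```python
import math

def gauss_cdf(x):
    # F(x) = 0.5 * (1 + erf(x / sqrt(2))), equivalent to the integral above
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(round(gauss_cdf(1.0), 4))   # 0.8413
print(round(gauss_cdf(1.96), 3))  # 0.975
```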
Table B-3 Cumulative Student's t distribution
(Entries give the quantiles t for which F(t) = .75, .90, .95, .975, .99, .995, and .9995, for n = 1 through 30, 40, 60, 120, and ∞ degrees of freedom; the numerical entries are omitted here.)

F(t) = \int_{-\infty}^{t} \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\!\left(\frac{n}{2}\right)} \left(1 + \frac{x^2}{n}\right)^{-(n+1)/2} dx

Adapted from Table III of Statistical Tables for Biological, Agricultural, and Medical Research, by R. A. Fisher and F. Yates, 6th ed., Longman, 1974. Published by permission.
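The tabulated quantiles can be checked by integrating the density in the formula above numerically; a Python sketch (the function names and the Simpson step count are ours):

```python
import math

def t_pdf(x, n):
    # Student's t density with n degrees of freedom
    c = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return c * (1.0 + x * x / n) ** (-(n + 1) / 2)

def t_cdf(t, n, steps=4000):
    # F(t) = 0.5 + integral of the density from 0 to t (Simpson's rule), for t >= 0
    h = t / steps
    s = t_pdf(0.0, n) + t_pdf(t, n)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, n)
    return 0.5 + s * h / 3.0

# the .975 quantile for n = 4 degrees of freedom is 2.776 in standard tables
print(round(t_cdf(2.776, 4), 3))  # 0.975
```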
Table B-4 Wilcoxon (one-input) test: Upper tail probabilities for the null distribution
(Entries give P[S_w ≥ x] under the null hypothesis for n = 3 through 15; the numerical entries are omitted here.)

Adapted from A Nonparametric Introduction to Statistics, by C. H. Kraft and C. van Eeden, Copyright © 1968, The Macmillan Company. Published by permission.
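The null probabilities of Table B-4 follow from counting, for each possible value of the signed-rank sum, the subsets of ranks that produce it; a Python sketch (names are ours):

```python
def wilcoxon_upper_tail(n, x):
    # Under H, each subset of the ranks {1,...,n} is equally likely to be the
    # set of positive observations; count subsets whose rank sum is >= x.
    max_sum = n * (n + 1) // 2
    ways = [0] * (max_sum + 1)
    ways[0] = 1
    for r in range(1, n + 1):              # include/exclude each rank r
        for s in range(max_sum, r - 1, -1):
            ways[s] += ways[s - r]
    return sum(ways[x:]) / 2 ** n

print(round(wilcoxon_upper_tail(3, 3), 3))  # 0.625
print(round(wilcoxon_upper_tail(4, 5), 3))  # 0.562
```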
Table B-5 Expected normal scores
(Entries give the expected values of the standard normal order statistics for sample sizes n = 2 through 29; the numerical entries are omitted here.)

Adapted from Table 9 of Biometrika Tables for Statisticians, Vol. 2, E. S. Pearson and H. O. Hartley, editors, Cambridge: The University Press, 1972. Published by permission of the Biometrika Trustees and the author, H. L. Harter.
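An expected normal score is the mean of a standard normal order statistic; for the largest of n observations it can be computed by one-dimensional quadrature, which reproduces table entries such as 0.56419 for n = 2. A Python sketch (names and integration limits are ours):

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def expected_largest(n, lo=-8.0, hi=8.0, steps=4000):
    # E[Z_(n)] = n * integral of z * Phi(z)^(n-1) * phi(z) dz (Simpson's rule)
    h = (hi - lo) / steps
    def g(z):
        return z * normal_cdf(z) ** (n - 1) * normal_pdf(z)
    s = g(lo) + g(hi)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * g(lo + i * h)
    return n * s * h / 3.0

print(round(expected_largest(2), 5))  # 0.56419
print(round(expected_largest(5), 5))  # 1.16296
```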
Table B-6 Mann-Whitney (two-input) test: Upper tail probabilities for the null distribution
(Entries give P[S ≥ x] under the null hypothesis for m = 9 with n = 9 and 10, and for m = 10 with n = 10; the numerical entries are omitted here.)

Adapted from A Nonparametric Introduction to Statistics, by C. H. Kraft and C. van Eeden, Copyright © 1968, The Macmillan Company. Published by permission.
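The Mann-Whitney null probabilities count, over all equally likely assignments of the pooled ranks, how often the rank sum of one sample reaches x; a Python sketch (names are ours):

```python
from math import comb

def ranksum_upper_tail(m, n, x):
    # P[S >= x] where S is the sum of the ranks of one sample of size n drawn
    # without replacement from the pooled ranks {1, ..., m+n} (the null case)
    N = m + n
    max_sum = n * N  # loose upper bound on attainable sums
    # ways[k][s] = number of k-subsets of {1..N} with sum s
    ways = [[0] * (max_sum + 1) for _ in range(n + 1)]
    ways[0][0] = 1
    for r in range(1, N + 1):
        for k in range(min(r, n), 0, -1):
            row, prev = ways[k], ways[k - 1]
            for s in range(max_sum, r - 1, -1):
                row[s] += prev[s - r]
    return sum(ways[n][x:]) / comb(N, n)

# for m = n = 9 the null distribution is symmetric about 85.5, so P[S >= 86] = 1/2
print(ranksum_upper_tail(9, 9, 86))
print(ranksum_upper_tail(10, 10, 105))
```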
Table B-7 Quantiles of the Kendall τ test statistic
(Entries give the quantiles of the statistic S at p = .900, .950, .975, .990, and .995 for n = 4 through 35; the numerical entries are omitted here.)

The values in the table are for the statistic S. The value of τ can be found from Eq. (5.3-7),

\tau = \frac{S}{\tfrac{1}{2}n(n-1)}

Adapted from "Tables for Use in Rank Correlation," by Kaarsemaker and van Wijngaarden, Statistica Neerlandica, Vol. 7, pp. 41-54, 1953. Published by permission.
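The statistic S and the corresponding τ of the formula above can be computed directly from paired observations; a Python sketch (names are ours):

```python
def kendall_S_and_tau(x, y):
    # S = (number of concordant pairs) - (number of discordant pairs);
    # tau = S / (n(n-1)/2), as in the footnote to Table B-7
    n = len(x)
    S = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[j] - x[i]) * (y[j] - y[i])
            S += (prod > 0) - (prod < 0)
    return S, 2 * S / (n * (n - 1))

S, tau = kendall_S_and_tau([1, 2, 3, 4, 5], [1, 3, 2, 4, 5])
print(S, tau)  # 8 0.8
```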
Table B-8 Quantiles of the Spearman ρ test statistic
(Entries give the quantiles of ρ at p = .900, .950, .975, .990, .995, and .999 for n = 4 through 30; the numerical entries are omitted here.)

Adapted from "Critical Values of the Coefficient of Rank Correlation for Testing the Hypothesis of Independence," by Glasser and Winter, Biometrika, Vol. 48, pp. 444-448, 1961. Published by permission of the Biometrika Trustees.
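For untied observations the Spearman statistic reduces to the familiar rank-difference formula; a Python sketch (names are ours):

```python
def spearman_rho(rx, ry):
    # rho = 1 - 6 * sum(d_i^2) / (n(n^2 - 1)) for untied ranks rx, ry
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

print(spearman_rho([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1.0
print(spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
```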
Answers to Selected Problems
CHAPTER 3
1. For n = 20, the mean of the Wilcoxon test statistic is 105 and its variance is 717.5. The value of the test statistic is S_w = 192. For α = 0.1, the threshold is T = 139.4. Thus, S_w = 192 > T = 139.4 ⇒ accept K.
2. The power function is
1 − β = Σ_{k=8}^{10} C(10, k) p^k (1 − p)^{10−k}
where p = 1 − F(0). The false alarm probability is α = 0.0547, and for F(0|K) = 0.1, 1 − β = 0.93.
3. The largest value of α such that α ≤ 0.1 is 0.0547, and hence the threshold T = 8.
(i) Σ_{i=1}^{10} u(x_i) = 6 < 8 ⇒ accept H
(ii) Σ_{i=1}^{10} u(x_i) = 9 > 8 ⇒ accept K
(iii) Cannot use the sign test until Chapter 7 because of the zero-valued observation.
4. From Table B-4, the threshold T = 41. The value of the test statistic is S_w = 48. Thus, S_w = 48 > T = 41 ⇒ accept K.
5. The mean and variance of the test statistic for n = 10 are 27.5 and 96.25, respectively. The threshold T = 40. Thus, S_w = 48 > T = 40 ⇒ accept K.
6. For the sign detector, T is found from Table B-1 and
α = Σ_{k=T}^{12} C(12, k) (1/2)^{12} ≤ 0.08
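The values α = 0.0547 and 1 − β = 0.93 quoted in answers 2 and 3 follow from the binomial sums; a quick Python check (the helper name is ours):

```python
from math import comb

def upper_tail(n, r, p):
    # P[at least r successes in n Bernoulli(p) trials]
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(r, n + 1))

alpha = upper_tail(10, 8, 0.5)       # threshold at 8 of 10 sign statistics
power = upper_tail(10, 8, 0.9)       # p = 1 - F(0|K) = 0.9
print(round(alpha, 4), round(power, 2))  # 0.0547 0.93
```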
CHAPTER 4
2. p = P[X > 0|K] = 1 − F_X(0). Therefore,
E[S_s|K] = np = nP[X > 0|K]
and
(d/dθ)E[S_s|K] = nf(0)
From Section 2.7 the efficacy is ξ_s = 4f²(0).
3. For n large, the Student's t test in Eq. (3.2-18) has a Gaussian distribution with mean μ and variance σ² for a general K. Therefore, E[S_st|K] = √n μ/σ and var[S_st|H or K] = 1. Using the expression for the efficacy yields ξ_st = 1/σ².
5. From Eq. (4.3-12), since σ² = 1, the efficacy is ξ = 1.
6. For the sign detector with n = 12 observations, T = 9, α = 0.073, and 1 − β = 0.974 from Problem 3-6. The parameters n and T must be selected for the linear detector to achieve α = 0.073 and 1 − β = 0.974. Now,
E[S_opt|H] = 31,  var[S_opt|H] = 1/n
α = 0.073 = P[(S_opt − 31)/(1/√n) > (T − 31)/(1/√n) | H]
and similarly for 1 − β under K. From Table B-2,
(T − 31)/(1/√n) = 1.45
and
(T − 32.3)/(1/√n) = −2.34
Solving simultaneously, n = 8.5. Thus, the relative efficiency is e_{sgn,opt} = 8.5/12 = 0.71.
CHAPTER 5
1. The mean and variance are given by Eqs. (5.2-47) and (5.2-56),
E[S'|H] = 765 and var[S'|H] = 2550
so that Std. Dev.[S'|H] = 50.5. From the Gaussian approximation,
α = P[(S_MW − 765)/50.5 > T' | H]
Since S_MW = 817, T' = 1.03, and from Table B-2, α = 0.1515.
4. From Table B-8, T_exact = 0.700 and from Table B-2, T_approx = 1.282 for Z = ρ√(n − 1). The test statistic is found to be ρ = 0.7. Therefore, ρ = T_exact = 0.7, and hence the exact test barely accepts K, but the approximation Z = 0.7√(n − 1) = 1.4 > 1.282
clearly accepts K.
5. From Table B-7, T_exact = 0.6 since ½n(n − 1) = 10. The value of the test statistic is τ = 0.6, exactly on the threshold, as obtained with the Spearman ρ test.
6. From Table B-6, the threshold is T = 115. Using the statistic S in Eq. (5.2-30) yields
S = Σ_{i=1}^{10} R_i = 46 < 115 ⇒ accept K
or using Eq. (5.2-31) for S' produces
S' = Σ_{i=1}^{10} R_i' = 116 > 115 ⇒ accept K
7. The threshold can be found from
α = Σ_{k=T}^{12} C(12, k) (1/2)^{12} ≤ 0.10
as T = 9 for α = 0.073. The value of the test statistic is S_pc = 10, so decide "signal present."
8. The value of the test statistic is ρ = 0.84 and from Table B-8, T = 0.3986. Thus, decide "signal present."
9. The value of the test statistic is r = 0.917. From Table B-3, T_t = 1.372; thus from Eq. (5.3-44), T_r = 0.398, and r > T_r ⇒ signal present.
CHAPTER 6 3. Neither are desirable since both are designed for problems where the scale parameters are identical, and here u: # uz. Both detectors may yield an a more than twice the desired value if applied to this problem. 6. From Problem 6-5, ( a / a r ) E { ~ l K= } 2
/
r
m
so that the efficacy is &, = 18/r2 using the two-input definition of the efficacy. 7. Again using the two-input definition of the efficacy G; = 2. 8. The ARE of the Kendall 7 with respect to the sample correlation coefficient is thus E7.; = 9 / r 2 CHAPTER 7 1. In Eq. (7.2-1), N = 9 and M = 12 so that the midrank = 11. From Problem 3-1 the threshold is T = 139.4, and using the given observations the value of the test statistic is S, = 193. Thus, S , = 193 > T = 139.4, therefore accept 4. 6. The mean and variance of the Wilcoxon test statistic are 76.5 and 446.3, respectively. For a = 0.1, T = 103.6 and the value of the test statistic is found to be S, = 147. Hence S, > T, therefore accept K,.The margin of S, over T is reduced here. 7. The rankings most likely to cause H , to be accepted are R, = 12 and R,, = 11, R,, = 10 or R,, = 10, R,, = 11. This yields a test statistic value
231
Answers to Selected Problems
of s,H = 192. The rankings most likely to cause K, to be accepted are R, = 10, R,, = 12, R,, = 11 or R , , = 11, R,, = 12. For these rankings, the value of the test statistic is S; = 194. From Table B-2,
P[St = 1921H] = 0.0006
and
P [ S ; = 1941H] = 0.0004
and since both are less than a = 0.1, K, is accepted. 10. (1) S!') = 6 and T = 8. Since S!') < T, accept H,. (2) S;*) = 8 = T, thus the test statistic and threshold are equal, hence accept K,. 11. New variance = 717. No change from before due to small number of tied groups. CHAPTER 8 1.
Under dependent sample conditions [Lee, 1960].
2. [Armstrong, 1965] R(kτ) = (1/4) + (1/2π) sin⁻¹ ρ(kτ)
7. ARE = 0.993!
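The arcsine relation in answer 2 rests on the orthant probability P[X > 0, Y > 0] = 1/4 + (1/2π) sin⁻¹ ρ for jointly Gaussian X and Y; it can be checked by simulation (a sketch; the sample size and seed are arbitrary choices of ours):

```python
import math
import random

def orthant_prob(rho, trials=200000, seed=1):
    # estimate P[X > 0, Y > 0] for zero-mean, unit-variance Gaussian X, Y
    # with correlation rho
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(trials):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        if z1 > 0 and rho * z1 + c * z2 > 0:
            hits += 1
    return hits / trials

rho = 0.6
exact = 0.25 + math.asin(rho) / (2.0 * math.pi)
assert abs(orthant_prob(rho) - exact) < 0.01
```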
References
Arbuthnott, J. (1710). An argument for Divine Providence, taken from the constant regularity observed in the births of both sexes. Philos. Trans. 27, 186-190.
Armstrong, G. L. (1965). The effect of dependent sampling on the performance of nonparametric coincidence detectors. Ph.D. dissertation, Oklahoma State University, Stillwater.
Arnold, H. J. (1965). Small sample power of the one-sample Wilcoxon test for nonnormal shift alternatives. Ann. Math. Statist. 36, 1767-1778.
Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proc. Roy. Soc. Ser. A (London) 160, 268.
Blomqvist, N. (1950). On a measure of dependence between two random variables. Ann. Math. Statist. 21, 593.
Blyth, C. R. (1958). Note on relative efficiency of tests. Ann. Math. Statist. 29, 898-903.
Bradley, J. V. (1963). Studies in research methodology V, irrelevance in the t test's rejection region. United States Air Force, No. AMRL-TDR-63-109, Wright-Patterson Air Force Base, Ohio.
Bradley, J. V. (1968). "Distribution-Free Statistical Tests." Prentice-Hall, Englewood Cliffs, New Jersey.
Capon, J. (1959a). A nonparametric technique for the detection of a constant signal in additive noise. IRE WESCON Conv. Record, Part 4, 92-103.
Capon, J. (1959b). Nonparametric methods for the detection of signals in noise. Dept. Electrical Engr. Tech. Report No. T-1/N, Columbia Univ., New York.
Capon, J. (1960). Optimum coincidence procedures for detecting weak signals in noise. IRE Internat. Conv. Record, Part 4, 156-166.
Carlyle, J. W. (1968). Nonparametric methods in detection theory. In "Communication Theory." (A. V. Balakrishnan, ed.), pp. 293-319. McGraw-Hill, New York.
Carlyle, J. W., and Thomas, J. B. (1964). On nonparametric signal detectors. IEEE Trans. Information Theory IT-10, No. 2, 146-152.
Chanda, K. C. (1963). On the efficiency of two-sample Mann-Whitney test for discrete populations. Ann. Math. Statist. 34, 612-617.
Conover, W. J. (1971). "Practical Nonparametric Statistics." Wiley, New York.
Davisson, L. D., and Thomas, J. B. (1969). Short-time sample robust detection study. United States Air Force, No. RADC-TR-69-24, Griffiss Air Force Base, New York.
Dillard, G. M. (1967). A moving-window detector for binary integration. IEEE Trans. Information Theory IT-13, 2-6.
Dillard, G. M., and Antoniak, C. E. (1970). A practical distribution-free detection procedure for multiple-range-bin radars. IEEE Trans. Aerospace and Electron. Systems AES-6, 629-635.
Dixon, W. J. (1953). Power functions of the sign test and power efficiency for normal alternatives. Ann. Math. Statist. 24, 467-473.
Dixon, W. J. (1954). Power under normality of several nonparametric tests. Ann. Math. Statist. 25, 610-614.
Ekre, H. (1963). Polarity coincidence correlation detection of a weak noise source. IEEE Trans. Information Theory IT-9, 18-23.
Faran, J. J., Jr., and Hills, R., Jr. (1952a). Correlators for signal reception. Acoustics Res. Lab., Tech. Memo. No. 27, Harvard Univ., Cambridge, Massachusetts.
Faran, J. J., Jr., and Hills, R., Jr. (1952b). The application of correlation techniques to acoustic receiving systems. Acoustics Res. Lab., Tech. Memo. No. 28, Harvard Univ., Cambridge, Massachusetts.
Federal Aviation Administration. (1970). "Introduction to the Common Digitizer," Vols. 1-3. Directed Study Manual FRD-410, Oklahoma City, Oklahoma.
Feller, W. (1960). "An Introduction to Probability Theory and Its Applications," Vol. I, 2nd ed. Wiley, New York.
Fisher, R. A., and Yates, F. (1949). "Statistical Tables for Biological, Agricultural, and Medical Research," 3rd ed. Hafner, New York.
Fraser, D. A. S. (1957a). "Nonparametric Methods in Statistics." Wiley, New York.
Fraser, D. A. S. (1957b). Most powerful rank-type tests. Ann. Math. Statist. 28, 1040-1043.
Gibbons, J. D. (1964). On the power of two-sample rank tests on the equality of two distribution functions. J. Royal Statist. Soc., Ser. B, 26, 293-304.
Gibbons, J. D. (1971). "Nonparametric Statistical Inference." McGraw-Hill, New York.
Hajek, J. (1969). "A Course in Nonparametric Statistics." Holden-Day, San Francisco.
Hajek, J., and Sidak, Z. (1967). "Theory of Rank Tests." Academic Press, New York.
Hancock, J. C., and Lainiotis, D. G. (1965). Distribution-free detection procedures. Tech. Report AFAL-TR-65-12, Vol. IV, AD-613-069, Wright-Patterson Air Force Base, Ohio.
Hansen, V. G. (1970a). Nonparametric detection procedures applied to radar. Proc. Univ. of Missouri-Rolla-Mervin J. Kelly Comm. Conf., pp. 24-3-1 through 24-3-6.
Hansen, V. G. (1970b). Detection performance of some nonparametric rank tests and an application to radar. IEEE Trans. Information Theory IT-16, 309-318.
Hansen, V. G., and Olsen, B. A. (1971). Nonparametric radar extraction using a generalized sign test. IEEE Trans. Aerospace and Electron. Systems AES-7, 942-950.
Hansen, V. G., and Zottl, A. J. (1971). The detection performance of the Siebert and Dicke-fix CFAR radar detectors. IEEE Trans. Aerospace and Electron. Systems AES-7, 706-709.
Helstrom, C. W. (1968). "Statistical Theory of Signal Detection." Pergamon, New York.
Hodges, J. L., and Lehmann, E. L. (1956). The efficiency of some nonparametric competitors of the t-test. Ann. Math. Statist. 27, 326-335.
Hodges, J. L., and Lehmann, E. L. (1961). Comparison of the normal scores and Wilcoxon tests. Proc. 4th Berkeley Symposium on Math. Statist. and Probability. (Jerzy Neyman, ed.), Vol. I, pp. 307-317. Univ. of California Press, Berkeley.
Hoeffding, W. (1951). Optimum nonparametric tests. Proc. 2nd Berkeley Symposium on Math. Statist. and Probability. (Jerzy Neyman, ed.), pp. 83-92. Univ. of California Press, Berkeley.
Hogg, R. V., and Craig, A. T. (1970). "Introduction to Mathematical Statistics," 3rd ed. MacMillan, London.
Hollander, M., and Wolfe, D. A. (1973). "Nonparametric Statistical Methods." Wiley, New York.
Hotelling, H., and Pabst, M. R. (1936). Rank correlation and tests of significance involving no assumption of normality. Ann. Math. Statist. 7, 29-43.
Kanefsky, M., and Thomas, J. B. (1965). On polarity detection schemes with non-Gaussian inputs. J. Franklin Inst. 280, No. 2, 120-138.
Kendall, M. G. (1962). "Rank Correlation Methods," 3rd ed. Griffin, London.
Kendall, M. G., and Stuart, A. (1967). "The Advanced Theory of Statistics," Vol. II, 2nd ed. Hafner, New York.
Kendall, M. G., and Stuart, A. (1969). "The Advanced Theory of Statistics," Vol. I. Charles Griffin, London.
Klotz, J. (1963). Small sample power and efficiency for the one-sample Wilcoxon and normal scores tests. Ann. Math. Statist. 34, 624-632.
Klotz, J. (1964). On the normal scores two-sample rank test. J. Amer. Statist. Assoc. 59, 652-664.
Klotz, J. (1965). Alternative efficiencies for signed rank tests. Ann. Math. Statist. 36, 1759-1766.
Kruskal, W. H. (1957). Historical notes on the Wilcoxon unpaired two-sample test. J. Amer. Statist. Assoc. 52, 356-360.
Leaverton, P., and Birch, J. J. (1969). Small sample power curves for the two-sample location problem. Technometrics 11, No. 2, 299-307.
Lee, Y. W. (1960). "Statistical Theory of Communication." Wiley, New York.
Lehmann, E. L. (1947). On optimum tests of composite hypotheses with one constraint. Ann. Math. Statist. 18, 473-494.
Lehmann, E. L. (1951). Consistency and unbiasedness of certain nonparametric tests. Ann. Math. Statist. 22, 165.
Lehmann, E. L. (1953). The power of rank tests. Ann. Math. Statist. 24, 23-43.
Lehmann, E. L. (1959). "Testing Statistical Hypotheses." Wiley, New York.
Lehmann, E. L., and Stein, C. (1948). Most powerful tests of composite hypotheses. Ann. Math. Statist. 19, 495-516.
Mann, H. B., and Whitney, D. R. (1947). On a test whether one of two random variables is stochastically larger than the other. Ann. Math. Statist. 18, 50-60.
Melsa, J. L., and Cohn, D. L. (1976). "Decision and Estimation Theory." McGraw-Hill, New York.
Melsa, J. L., and Sage, A. P. (1973). "Introduction to Probability and Stochastic Processes." Prentice-Hall, Englewood Cliffs, New Jersey.
Melton, B. S., and Karr, P. H. (1957). Polarity coincidence scheme for revealing signal coherence. Geophysics 22, 553-564.
Mood, A. M., and Graybill, F. A. (1963). "Introduction to the Theory of Statistics," 2nd ed. McGraw-Hill, New York.
National Bureau of Standards. (1950). Tables of the binomial probability distribution. Appl. Math. Ser. 6.
Neave, H. R., and Granger, C. W. J. (1968). A Monte Carlo study comparing various two-sample tests for differences in mean. Technometrics 10, No. 3, 509-522.
Neyman, J. (1937). Outline of a theory of statistical estimation based on the classical theory of probability. Philos. Trans. Roy. Soc. London Ser. A, 236, 333.
Noether, G. E. (1967). "Elements of Nonparametric Statistics." Wiley, New York.
Papoulis, A. (1965). "Probability, Random Variables, and Stochastic Processes." McGraw-Hill, New York.
Pitman, E. J. G. (1948). Lecture notes on nonparametric statistical inference. Columbia Univ., New York.
Pratt, J. W. (1959). Remarks on zeros and ties in the Wilcoxon signed rank procedures. J. Amer. Statist. Assoc. 54, 655-667.
Putter, J. (1955). The treatment of ties in some nonparametric tests. Ann. Math. Statist. 26, 368-386.
Romig, H. C. (1953). "50-100 Binomial Tables." Wiley, New York.
Rosenblatt, M. (1961). Independence and dependence. Proc. 4th Berkeley Symposium on Math. Statist. and Probability (Jerzy Neyman, ed.), pp. 431-433. Univ. of California Press, Berkeley.
Sage, A. P., and Melsa, J. L. (1971). "Estimation Theory with Applications to Communications and Control." McGraw-Hill, New York.
Savage, I. R. (1953). Bibliography of nonparametric statistics and related topics. J. Amer. Statist. Assoc. 48, No. 264, 844-906.
Savage, I. R., and Chernoff, H. (1958). Asymptotic normality and efficiency of certain nonparametric test statistics. Ann. Math. Statist. 24, 972-974.
Schwartz, M., Bennett, W. R., and Stein, S. (1966). "Communication Systems and Techniques." McGraw-Hill, New York.
Selby, S. M., and Girling, B. (eds.) (1965). "CRC Standard Mathematical Tables," 14th ed. The Chemical Rubber Co., Cleveland, Ohio.
Spearman, C. (1904). The proof and measurement of association between two things. Amer. J. Psychol. 15, 72-101.
Swerling, P. (1960). Probability of detection for fluctuating targets. IRE Trans. Information Theory IT-6, 269-308.
Terry, M. E. (1952). Some rank order tests which are most powerful against specific parametric alternatives. Ann. Math. Statist. 23, 346-366.
Thomas, J. B. (1970). Nonparametric detection. Proc. IEEE 58, No. 5, 623-631.
Van der Waerden, B. L. (1952). Order tests for the two-sample problem and their power. Proc. Koninklijke Nederlandse Akad. Wetenschappen 55, 453-458.
Van der Waerden, B. L. (1953). Order tests for the two-sample problem and their power. Proc. Koninklijke Nederlandse Akad. Wetenschappen 56, 80, 303-316.
Van Trees, H. L. (1968). "Detection, Estimation, and Modulation Theory." Wiley, New York.
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics 1, 80-83.
Witting, H. (1960). A generalized Pitman efficiency for nonparametric tests. Ann. Math. Statist. 31, 405-414.
Wolff, S. S., Gastwirth, J. L., and Rubin, H. (1967). The effect of autoregressive dependence on a nonparametric test. IEEE Trans. Information Theory IT-13, 311-313.
Wolff, S. S., Thomas, J. B., and Williams, T. R. (1962). The polarity-coincidence correlator: A nonparametric detection device. IRE Trans. Information Theory IT-8, 5-9.
Zeoli, G. W., and Fong, T. S. (1971). Performance of a two-sample Mann-Whitney nonparametric detector in a radar application. IEEE Trans. Aerospace and Electron. Systems AES-7, 951-959.
Index
A
Alternative hypothesis, 3
Antoniak, C. E., 200, 201, 202, 204
A priori probability, 13
Arbuthnott, J., 6
ARE, see Asymptotic relative efficiency
Armstrong, G. L., 184, 185, 189
Arnold, H. J., 91
Asymptotically nonparametric, 31
Asymptotic loss, 195
Asymptotic relative efficiency, 32-36, see also Specific detectors
Average probability of error criterion, see Ideal observer criterion
B
Bayes decision criterion, 12-17
Bayes risk, 14
Binomial distribution, 212-213
  table of, 215-216
Birch, J. J., 168
Blomqvist, N., 135
Blyth, C. R., 91
Bradley, J. V., 2, 6, 47, 58, 65, 165, 175, 176, 180

C
c1 test, see Normal scores test
Capon, J., 33, 94, 111, 115, 116, 204, 205, 206, 207, 209
Carlyle, J. W., 87, 141, 190, 193, 208
Cauchy density function, 211
Chanda, K. C., 178
Chernoff, H., 88
Circular serial correlation coefficient, 132
Cohn, D. L., 9
Common digitizer, 204
Communication systems applications, 204-209
Composite hypothesis, 10, 24-31
Conover, W. J., 144, 148
Consistency, 31
Correlation coefficient, 123, 134
Correlation detector, 136, 155-156
Craig, A. T., 45, 46, 50, 104, 155
Critical region, 11, 18-20
D
Davisson, L. D., 184, 186, 187, 189, 190
Detector comparison techniques, 31-36
Dicke-fix radar detector, see Vector sign test
Dillard, G. M., 200, 201, 202, 204
Distribution-free detectors, definition of, 2
Dixon, W. J., 32, 167
E
Efficacy, 33-36, see also Specific detectors
Ekre, H., 135, 207
Expected cost, 13
Exponential distribution, 211-212
F
False alarm, 11
False alarm probability, see Probability of false alarm
Faran, J. J., Jr., 135
Feller, W., 139, 140
Fisher, R. A., 121
Fisher-Yates test, see Normal scores test
Fong, T. S., 203
Fraser, D. A. S., 29, 45, 49, 60

G
Gaussian distribution, 210
  table of, 217
Generalized detector, 3
Generalized sign detector, 199-200
Gibbons, J. D., 143, 146, 169
Girling, B., 55, 114, 142, 143
Granger, C. W. J., 168
Graybill, F. A., 61, 62
H
Hajek, J., 53, 71, 110, 112, 178
Hancock, J. C., 34, 204, 209
Hansen, V. G., 192, 194, 195, 196, 198, 199, 203, 208
Helstrom, C. W., 26, 208
Hills, R., Jr., 135
Hodges, J. L., 32, 85, 92, 166
Hoeffding, W., 60
Hogg, R. V., 45, 46, 50, 104, 155
Hollander, M., 180
Hotelling, H., 6, 143
Hotelling-Pabst test, 143-144
I
Ideal observer criterion, 16-17
Inverse normal transformation, 63
Irrelevant, 47
K
Kanefsky, M., 135, 159
Karr, P. H., 135
Kendall, M. G., 29, 65, 68, 82, 105, 119, 133, 134, 135, 146, 148, 150, 151, 152, 171, 178, 188
Kendall tau coefficient, 68
Kendall tau detector, 197, 198
  one-input, 68-71, 196-198
    ARE, 87-90
    recursive form of test statistic, 70
    table of quantiles, 224
  two-input, 68-70, 148-152, 187
    ARE, 165
    asymptotic distribution, 151
    dependent sample performance, 187-188
    mean, 150-151
    relation to Spearman rho, 148, 151-152
    variance, 150-151
Klotz, J., 88, 89, 90, 91, 93, 167
Kruskal, W. H., 108
L
Lainiotis, D. G., 34, 204, 209
Laplace density function, 211-212
Least favorable choice, 25
Least favorable distribution, 26
Leaverton, P., 168
Lee, Y. W., 189
Lehmann, E. L., 29, 32, 41, 49, 55, 57, 58, 85, 92, 115, 132, 166, 169
Likelihood ratio, 14
Likelihood ratio test, 14, 18
Linear detector, 41
  asymptotic false alarm probability, 76
  asymptotic probability of detection, 75-76
  efficacy, 82
LMP, see Locally most powerful
Locally most powerful, 30
M
Mann, H. B., 108, 109, 110, 115
Mann-Whitney detector, 107-116, 199-200, 203, 205-207
  ARE, 165-166
  ARE calculation, 162-164
  ARE with respect to the normal scores test, 166
  asymptotic distribution, 115
  dependent sample performance, 187
  efficacy, 163-164
  equivalent statistics, 108-111
  mean, 111-112, 163
  small sample performance, 167-170
  table for null distribution, 223
  variance, 112-115
Medial correlation test, see Polarity coincidence correlator
Median detector, 184
  performance for correlated input samples, 185-186
Melsa, J. L., 9, 157
Melton, B. S., 135
Miss, 11, 12
Modified sign test, 200-202
Mood, A. M., 61, 62
Most powerful test, 17-21
Moving window detector, 200
N
Neave, H. R., 168
Neyman-Pearson lemma, 17-21, 24
Nonparametric detectors, definition of, 2
Nonparametric versus parametric detection, 2
Normal distribution, see Gaussian distribution
Normal scores test
  one-input, 57-61, 64-65
    ARE, 87-90
    asymptotic loss, 195
    locally most powerful, 58-60
    narrow-band, 194
    small sample performance, 90-91
  two-input, 116-122
    ARE, 165-166
    ARE with respect to the Mann-Whitney detector, 166
    asymptotic distribution, 121
    locally most powerful, 118
    mean, 119
    small sample performance, 167-170
    variance, 120-121
Nuisance parameter, 47
Null hypothesis, 3

O
Olsen, B. A., 198, 199
P
Pabst, M. R., 6, 143
Papoulis, A., 157
Parametric detectors
  one-input, 39-47
  two-input, 97-107, 122-135
  asymptotic performance, 155-158
PCC, see Polarity coincidence correlator
Pitman, E. J. G., 31, 33
Polarity coincidence correlator, 135-141, 207-208
  ARE, 165
  ARE calculation, 158-162
  asymptotically unbiased, 141
  asymptotic distribution, 139-140
  consistent, 140-141
  nonparametric, 140-141
  uniformly most powerful, 141
Power of the test, see Probability of detection
Pratt, J. W., 176, 177, 178
Probability of detection, 12
Probability of false alarm, 11-12
Putter, J., 173, 176
R
Radar system applications, 192-204
Receiver operating characteristics, 21-24
Relative efficiency, 31-32
Robust, see Asymptotically nonparametric
ROC, see Receiver operating characteristics
Romig, H. C., 138
Rosenblatt, M., 190
S
Sage, A. P., 157
Sample correlation coefficient, 126-135
  asymptotic distribution, 134-135
  UMP unbiased, 132-133
Savage, I. R., 6, 88
Schwartz, M., 205
Selby, S. M., 55, 114, 142, 143
Sidak, Z., 71
Siebert detector, 203
Sign detector, 47-51
  ARE, 87-90
  ARE calculation, 75-81
  asymptotic false alarm rate, 78
  asymptotic loss, 195
  asymptotic probability of detection, 77
  dependent sample performance, 182-183
  false alarm rate, 50
  invariance properties, 51
  narrow-band, 194
  probability of detection, 50
  small sample performance, 91
Significance level, see Probability of false alarm
Similar tests, 29
Simple hypothesis, 10
Small sample performance
  one-input detectors, 90-92
  two-input detectors, 167-170
Spearman, C., 141
Spearman rho detector
  one-input, 65-67
    ARE, 87-90
    invariance properties, 67
    table of quantiles, 225
  two-input, 141-148
    ARE, 165
    asymptotic distribution, 146-147
    mean, 144-146
    variance, 144-146
Stein, C., 41
Stuart, A., 29, 82, 105, 119, 133, 134, 135, 148, 151, 152, 171, 188
Student's t distribution, 212-213
  table of, 218
Student's t test
  one-input, 42-47
    asymptotically nonparametric, 46
    consistent, 46
    irrelevance, 47
    UMP unbiased, 45-46
  two-input, 98-107, 162
    asymptotically nonparametric, 107
    consistent, 107
    efficacy, 162
    irrelevance, 107
    UMP unbiased, 105-106
Sufficient statistic, 14
Swerling, P., 202
T
Terry, M. E., 118, 121, 122, 167
Test statistic, 4, 14
Thomas, J. B., 81, 135, 141, 159, 184, 186, 187, 189, 190
Threshold of the test, 4
Tied observations, 172-180
  dropping of zeros and ties, 175
  general procedures, 173-176
    continuous noise density case, 173-175
    discrete noise density case, 175-176
  midrank, 173-174
  randomization, 173
  specific studies, 176-178
Type 1 error, see False alarm
Type 2 error, see Miss
U
UMP, see Uniformly most powerful
Unbiased detector, 28-30
Uniform distribution, 211
Uniformly most powerful, 26-28
  unbiased, 29
  similar, 29
V
Van der Waerden, B. L., 61
Van der Waerden's test, 61-65
  one-input, ARE, 87-90
Van Trees, H. L., 9
Vector sign test, 195-198

W
Whitney, D. R., 108, 109, 110, 115
Wilcoxon detector
  invariance properties, 55-57
  lower bound on ARE, 85-87
  one-input, 51-57, 180, 184, 186-187
    ARE, 87-90
    ARE calculation, 81-87
    asymptotic distribution, 53
    asymptotic loss, 195
    dependent sample performance, 184, 186-187
    efficacy, 84
    mean, 53-54
    narrow-band, 196-198, 208-209
    small sample performance, 90-91
    table for null distribution, 219-221
    variance, 54-55
    variance in the presence of ties, 180
  tied observations, 177-178
  two-input, see Mann-Whitney detector
Wilcoxon, F., 52, 108
Witting, H., 168
Wolfe, D. A., 180
Wolff, S. S., 135, 155, 159, 161, 182
Y
Yates, F., 121
Z
Zeoli, G. W., 203
Zero-valued observations, see Tied observations
Zottl, A. J., 203