The Foundations of Statistics
LEONARD J. SAVAGE
Late Eugene Higgins Professor of Statistics, Yale University
SECOND REVISED EDITION
DOVER PUBLICATIONS, INC.
NEW YORK

Copyright © 1972 by Dover Publications, Inc. Copyright © 1954 by I. Richard Savage. All rights reserved under Pan American and International Copyright Conventions.

This Dover edition, first published in 1972, is a revised and enlarged version of the work originally published by John Wiley & Sons in 1954.

International Standard Book Number: 0-486-62349-1
Library of Congress Catalog Card Number: 79-188245

Manufactured in the United States of America
Dover Publications, Inc.
180 Varick Street
New York, N.Y. 10014
TO MY FATHER
Preface to the Dover Edition

CONTINUING INTEREST HAS ENCOURAGED PUBLICATION OF A SECOND edition of this book. Because revising it to fit my present thinking and the new climate of opinion about the foundations of statistics would obliterate rather than restore, I have limited myself in the preparation of this edition much as though dealing with the work of another. The objective errors that have come to my attention, mainly through the generosity of readers, of whom Peter Fishburn has my special thanks, have been corrected, of course. Minor and mechanical ones, such as a name misspelled or an inequality that had persisted in pointing in the wrong direction, have been silently eliminated. Other changes are conspicuous as additions. They consist mainly of this Preface, Appendix 4: Bibliographic Supplement, and several footnotes identified as new by the symbol +.

To enable you to pursue the many new developments since 1954 according to the intensity and direction of your own interests, a number of new references leading to many more are listed in the Bibliographic Supplement, and the principal advances known to me are pointed out in new footnotes or in comments on the new references. Citations to the bibliography in the original Appendix 3 are made by a compact, but otherwise ill-advised, letter and number code; those to the new Appendix 4 are made by a now popular system, which is effective, informative, and flexible. Example: The historic papers (Borel 1924) and [D2] have been translated by Kyburg and Smokler (1964).

The following paragraphs are intended to help you approach this book with a more current perspective. To some extent, they will be intelligible and useful even to a novice in the foundations of statistics, but they are necessarily somewhat technical and will therefore take on new meaning if you return to them as your reading in this book and elsewhere progresses.

The book falls into two parts. The first, ending with Chapter 7, is a general introduction to the personalistic tradition in probability and utility. Were this part to be done over, radical revision would not be required, though I would now supplement the line of argument centering around a system of postulates by other less formal approaches, each convincing in its own way, that converge to the general conclusion that personal (or subjective) probability is a good key, and the best yet
known, to all our valid ideas about the applications of probability. There would also be many new works to report on and analyze more thoroughly than can be done in footnotes.

The original aim of the second part of the book, beginning with Chapter 8, is all too plainly stated in the second complete paragraph on page 4. There, a personalistic justification is promised for the popular body of devices developed by the enthusiastically frequentistic schools that then occupied almost the whole statistical scene and still dominate it, though less completely. The second part of the book is indeed devoted to personalistic discussion of frequentistic devices, but for one after another it reluctantly admits that justification has not been found. Freud alone could explain how the rash and unfulfilled promise on page 4 went unamended through so many revisions of the manuscript.

Today, as I see it, the theory of personal probability applied to statistics shows that many of the prominent frequentistic devices can at best lead to accidental and approximate, not systematic and cogent, success, as is expanded upon, perhaps more optimistically, by Pratt (1965). Among the ill-founded frequentistic devices are minimax rules, almost all tail-area tests, tolerance intervals, and, in a sort of class by itself, fiducial probability.

If I have lost faith in the devices of the frequentistic schools, I have learned new respect for some of their general theoretical ideas. Let me amplify first in connection with the Neyman-Pearson school. While insisting on long-run frequency as the basis of probability, that school wisely emphasizes the ultimate subjectivity of statistical inference or behavior within the objective constraint of "admissibility," as in (Lehmann 1958; Wolfowitz 1962). But careful study of admissibility leads almost inexorably to the recognition of personal probabilities and their central role in statistics (Savage 1961, Section 4; 1962, pp. 170-175), so personalistic statistics appears as a natural late development of the Neyman-Pearson ideas.

One consequence of this sort of analysis of admissibility is the extremely important likelihood principle, a corollary of Bayes' theorem, of which I was not even aware when writing the first edition of this book. This principle, inferable from, though nominally at variance with, Neyman-Pearson ideas (Birnbaum 1962), was first put forward by Barnard (1947) and by Fisher (1955), members of what might be called the Fisher school of frequentists. See also (Barnard 1965; Barnard et al. 1962; Cornfield 1966).

The views just expressed are evidently controversial, and if I have permitted myself such expressions as "show" and "inexorably," they are not meant with mathematical finality. Yet, controversial though
they may be, they are today shared by a number of statisticians, who may be called personalistic Bayesians, or simply personalists.

This book has played, and continues to play, a role in the personalistic movement, but the movement itself has other sources apart from those from which this book itself was drawn. One with great impact on practical statistics and scientific management is a book by Robert Schlaifer (1959). This is a welcome opportunity to say that his ideas were developed wholly independently of the present book, and indeed of other personalistic literature. They are in full harmony with the ideas in this book but are more down to earth and less spellbound by tradition.

L. J. SAVAGE
Yale University
June, 1971
Preface to the First Edition

A BOOK ABOUT SO CONTROVERSIAL A SUBJECT AS THE FOUNDATIONS of statistics may have some value in the classroom, as I hope this one will; but it cannot be a textbook, or manual of instruction, stating the accepted facts about its subject, for there scarcely are any. Openly, or coyly screened behind the polite conventions of what we call a disinterested approach, it must, even more than other books, be an airing of its author's current opinions.

One who so airs his opinions has serious misgivings that (as may be judged from other prefaces) he often tries to communicate along with his book. First, he longs to know, for reasons that are not altogether noble, whether he is really making a valuable contribution. His own conceit, the encouragement of friends, and the confidence of his publisher have given him hope, but he knows that the hopes of others in his position have seldom been fully realized. Again, what he has written is far from perfect, even to his biased eye. He has stopped revising and called the book finished, because one must sooner or later. Finally, he fears that he himself, and still more such public as he has, will forget that the book is tentative, that an author's most recent word need not be his last word.

The application of statistics interests some workers in almost every field of empirical investigation, not only in science, but also in commerce and industry. Moreover, the foundations of statistics are connected conceptually with many disciplines outside of statistics itself, particularly mathematics, philosophy, economics, and psychology, a situation that, incidentally, must augment the natural misgivings of an author in this field about his own competence. Those who read in this book may, therefore, be diverse in background and interests. With this consideration in mind, I have endeavored to keep the book as free from technical prerequisites as its subject matter and its restriction to a reasonable size permit.

Technical knowledge of statistics is nowhere assumed, but the reader who has some general knowledge of statistics will be much better prepared to understand and appraise this book. The books Statistics, by L. H. C. Tippett, and On the Principles of Statistical Inference, by
A. Wald, listed in the Bibliography at the end of Appendix 3, are short authoritative introductions to statistics, either of which would provide some statistical background for this book. The books of Tippett and Wald are so different in tone and emphasis that it would by no means be wasteful to read them both, in that order.

Any but the most casual reader should have some formal preparation in the theory of mathematical probability. Those acquainted with moderately advanced theoretical statistics will automatically have this preparation; others may acquire it, for example, by reading Theory of Probability, by M. E. Munroe, or selected parts of An Introduction to Probability Theory and Its Applications, by W. Feller, according to their taste. In Feller's book, a thorough reading of the Introduction and Chapter 1, and a casual reading of Chapters 5, 7, and 8 would be sufficient.

The explicit mathematical prerequisites are not great; a year of calculus would in principle be more than enough. But, in practice, readers without some training in formal logic or one of the abstract branches of mathematics usually taught only after calculus will, I fear, find some of the long though elementary mathematical deductions quite forbidding. For the sake of such readers, I therefore take the liberty of giving some pedagogical advice here and elsewhere that mathematically more mature readers will find superfluous and possibly irritating.

In the first place, it cannot be too strongly emphasized that a long mathematical argument can be fully understood on first reading only when it is very elementary indeed, relative to the reader's mathematical knowledge. If one wants only the gist of it, he may read such material once only; but otherwise he must expect to read it at least once again. Serious reading of mathematics is best done sitting bolt upright on a hard chair at a desk. Pencil and paper are nearly indispensable; for there are always figures to be sketched and steps in the argument to be verified by calculation.

In this book, as in many mathematical books, when exercises are indicated, it is absolutely essential that they be read and nearly essential that they be worked, because they constitute part of the exposition, the exercise form being adopted where it seems to the author best for conveying the particular information at hand.

To some mathematicians, and even more to logicians, I must say a word of apology for what they may consider lapses of rigor, such as using the same symbol with more than one meaning and failing to distinguish uniformly between the use and the mention of a symbol; but they will understand that these lapses are sacrifices to what I take to be general intelligibility and will have, I hope, no real difficulty in repairing them.
Few will wish to read the whole book; therefore introductions to the chapters and sections have been so written as not only to provide orientation but also to facilitate skipping. In particular, safe detours are indicated around mathematically advanced topics and other digressions.

A few words in explanation of the conventions, such as those by which internal and external references are made in this book, may be useful. The abbreviation § 3.4 means Section 4 of Chapter 3; within Chapter 3 itself, this would be abbreviated still further to § 4. The abbreviation (3.4.1) means the first numbered and displayed equation or other expression in § 3.4; within Chapter 3, this would be abbreviated still further to (4.1) and within § 3.4 simply to (1). Theorems, lemmas, exercises, corollaries, figures, and tables are named by a similar system, e.g., Theorem 3.4.1, Theorem 4.1, Theorem 1. Incidentally, the proofs of theorems are terminated with the special punctuation mark ◆, a device borrowed from Halmos's Measure Theory.

Seven postulates, P1, P2, etc., are introduced over the course of several chapters. For ready reference these are, with some explanatory material, reproduced on the end papers.

Entries in the Bibliography at the end of Appendix 3 are designated by a self-explanatory notation in square brackets. For example, the works of Tippett, Wald, Munroe, Feller, and Halmos, already referred to, are [T3], [W1], [M6], [F1], and [H2], respectively.

I often allude to a set of key references to a given topic. This means a set of external references intended to lead the reader that wishes to pursue that particular topic to the fullest and most recent bibliographies; it has nothing to do with the merit or importance of the works referred to.

Technical terms (except for non-verbal symbols) that are defined in this book are printed in bold face or italics (depending on the importance of the term for this book or for established usage) in the context where the term is defined. These special fonts are occasionally used for other purposes as well. Terms are sometimes used informally, even in unofficial definitions, before being officially defined. Even the official definitions are sometimes of necessity very loose, corresponding to the well-known principle that, in a formal theory, some terms must in strict logic be left undefined.

L. J. SAVAGE
Acknowledgement

I HAVE MANY FRIENDS, FEW OF WHOM SHARE MY PRESENT OPINIONS, to thank for criticism and encouragement. Though the list seems long, I cannot refrain from explicitly mentioning: I. Bross, A. Burks, R. Carnap, B. de Finetti, M. Flood, I. J. Good, P. R. Halmos, O. Helmer, C. Hildreth, T. Koopmans, W. Kruskal, C. F. Mosteller, I. R. Savage, W. A. Wallis, and M. A. Woodbury. Wallis as chairman of my department and close friend has particularly encouraged me to write the book and facilitated my doing so in many ways.

Mrs. Janet Lowrey and Miss Louise Forsyth typed and retyped and did so many other painstaking tasks so well that it would be inadequate to call their help secretarial.

My work on the book was made possible by four organizations to which I herewith express thanks. During the years l~ through 1954 I worked on it at the University of Chicago, where the work was supported by the Office of Naval Research and the University itself, which also supported it during the summer of 1952. During the academic year 1951-52 I worked on it as a research scholar in France under the Fulbright Act (Public Law 584, 79th Congress), and during the whole of that year as a fellow of the John Simon Guggenheim Memorial Foundation.

L. J. S.
Contents

Postulates of a personalistic theory of decision (End papers)

1. INTRODUCTION
   1. The role of foundations
   2. Historical background
   3. General outline of this book

2. PRELIMINARY CONSIDERATIONS ON DECISION IN THE FACE OF UNCERTAINTY
   1. Introduction
   2. The person
   3. The world, and states of the world
   4. Events
   5. Consequences, acts, and decisions
   6. The simple ordering of acts with respect to preference
   7. The sure-thing principle

3. PERSONAL PROBABILITY
   1. Introduction
   2. Qualitative personal probability
   3. Quantitative personal probability
   4. Some mathematical details
   5. Conditional probability, qualitative and quantitative
   6. The approach to certainty through experience
   7. Symmetric sequences of events

4. CRITICAL COMMENTS ON PERSONAL PROBABILITY
   1. Introduction
   2. Some shortcomings of the personalistic view
   3. Connection with other views
   4. Criticism of other views
   5. The role of symmetry in probability
   6. How can science use a personalistic view of probability?

5. UTILITY
   1. Introduction
   2. Gambles
   3. Utility, and preference among gambles
   4. The extension of utility to more general acts
   5. Small worlds
   6. Historical and critical comments on utility

6. OBSERVATION
   1. Introduction
   2. What an observation is
   3. Multiple observations, and extensions of observations and of sets of acts
s ∈ A            s is an element of A, i.e., a state in A.†
A ⊂ B (B ⊃ A)    A is contained in B, i.e., every element of A is an element of B.
A = B            A equals B, i.e., A is the same set as B, i.e., A and B have exactly the same elements.
∼A               the complement of A with respect to S, i.e., those elements of S that are not in A.
∪ᵢ Aᵢ            the union of the Aᵢ's, i.e., those elements of S that are elements of at least one of the sets A₁, A₂, etc.
A ∪ B            the union of A and B, i.e., those elements of S that are elements of A or B (possibly of both).
∩ᵢ Aᵢ            the intersection of the Aᵢ's, i.e., those elements of S that are elements of each of the sets A₁, A₂, etc.
A ∩ B            the intersection of A and B, i.e., those elements of S that are elements of both A and B.
† Typographical note: The Porson font of the Greek alphabet (α, β, γ, δ, ε, ζ, ...) is the one almost always printed, at least in America, when mathematical constants and variables are denoted by Greek letters. The symbol used in this and some other publications to denote "element of" is, however, the epsilon of the Vertical font. Some publications use the special symbol ∈; and some, the Porson epsilon, presumably because of its resemblance to ∈. The latter usage entails either using ε for two different purposes or else changing fonts in mid alphabet when constants and variables are denoted by Greek letters.
C may be. Mathematicians would for the most part verify them by translating them into English and appealing to common sense, though in complicated cases explicit use might be made of Exercise 9. Diagrams, called Venn diagrams, in which sets are symbolized by areas, as illustrated by Figure 1, are often suggestive.

Figure 1

It is a remarkable and useful fact that any universally valid statement about sets remains so if, throughout, ∪ is interchanged with ∩, 0 with S, and ⊂ with ⊃. The dual in this sense of each exercise should be studied along with the exercise itself. For example, the dual of Exercise 7 is: A ⊃ B, if and only if A = A ∪ B. Note that the first parts of Exercises 1 through 6 are dual to the second parts.

It may be remarked that, if Exercises 1-6 are taken as axioms and 7 as a definition, Exercises 8-21 and also the duality principle follow formally from them. For example, 10 can be proved thus: By 7, if A ∩ B is A, then A ⊂ B; but, by 1, A ∩ A is A; therefore A ⊂ A. Again, 8 can be proved, using 6, 3, 2, 1, 3, and 6 in that order, thus:
$$0 \cap A = (A \cap {\sim}A) \cap A = {\sim}A \cap (A \cap A) = \cdots \tag{1}$$
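The duality principle and small identities of this kind can also be checked by brute force on a tiny universe. The sketch below is only an illustration, not part of the original exposition: the three-element S and the function names are assumptions, and Exercise 8 is assumed to assert 0 ∩ A = 0, which is what the chain in (1) appears to establish.

```python
from itertools import combinations

# Hypothetical three-state universe, chosen only for illustration.
S = frozenset(range(3))
EMPTY = frozenset()


def subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = sorted(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def complement(a):
    """Complement of `a` with respect to S."""
    return S - a


ALL = subsets(S)


def exercise_7_and_dual_hold():
    # Exercise 7: A is contained in B iff A = A ∩ B.
    # Its dual:   A contains B     iff A = A ∪ B.
    return all(((a <= b) == (a == (a & b))) and
               ((a >= b) == (a == (a | b)))
               for a in ALL for b in ALL)


def exercise_8_and_dual_hold():
    # Assumed statement of Exercise 8: 0 ∩ A = 0; its dual is S ∪ A = S.
    return all((EMPTY & a) == EMPTY and (S | a) == S for a in ALL)


def visible_steps_of_chain_hold():
    # The legible steps of (1): 0 ∩ A = (A ∩ ~A) ∩ A = ~A ∩ (A ∩ A).
    return all((EMPTY & a)
               == ((a & complement(a)) & a)
               == (complement(a) & (a & a))
               for a in ALL)
```

An exhaustive check on one small S is of course no proof of universal validity; it only confirms that the statements and their duals agree on every pair of subsets tried.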
... given B" are to be defined mutatis mutandis. It is noteworthy though obvious that, if f(s) = g(s) for all s ∈ B, then f ≐ g given B.

It is now possible and instructive to give an atemporal analysis of the following temporally described decision situation: The person must decide between f and g after he finds out, that is, observes, whether B obtains; what will his decision be if he finds out that B does in fact obtain? Atemporally, the person can submit himself to the consequences of f or else of g for all s ∈ B, and, independently, he can submit himself to the consequences of f or else of g for all s ∈ ∼B; which alternative will he decide upon for the s's in B? Finally, describing the situation not only atemporally but also quite formally, the person must decide among four acts defined thus:

h₀₀ agrees with f on B and with f on ∼B,
h₀₁ agrees with f on B and with g on ∼B,
h₁₀ agrees with g on B and with f on ∼B,
h₁₁ agrees with g on B and with g on ∼B.
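The four acts can be built mechanically from f, g, and B. The following sketch is illustrative only. It assumes a numerical representation that the text has not introduced at this point: consequences are plain numbers, preference is by expected value under a hypothetical probability table `PROB`, and "f ≤ g given B" is read as a comparison of the contributions from states in B. All names and numbers are invented for the example.

```python
def splice(on_b, off_b, B, states):
    """Act agreeing with `on_b` on B and with `off_b` on the complement of B."""
    return {s: (on_b[s] if s in B else off_b[s]) for s in states}


def four_acts(f, g, B, states):
    """Return {(i, j): h_ij}; index 0 selects f and index 1 selects g."""
    acts = (f, g)
    return {(i, j): splice(acts[i], acts[j], B, states)
            for i in (0, 1) for j in (0, 1)}


def expected(act, prob):
    """Expected value of an act (assumed stand-in for the preference ordering)."""
    return sum(prob[s] * act[s] for s in prob)


def optimal_indices(f, g, B, prob):
    """Index pairs (i, j) such that no other h is preferred to h_ij."""
    h = four_acts(f, g, B, list(prob))
    best = max(expected(a, prob) for a in h.values())
    return {ij for ij, a in h.items() if abs(expected(a, prob) - best) < 1e-12}


def not_preferred_given(f, g, B, prob):
    """f <= g given B, under the expected-value assumption."""
    on_b_f = sum(prob[s] * f[s] for s in B)
    on_b_g = sum(prob[s] * g[s] for s in B)
    return on_b_f <= on_b_g + 1e-12


# Hypothetical example: three states, B = {"a", "b"}.
PROB = {"a": 0.2, "b": 0.3, "c": 0.5}
B = {"a", "b"}
F = {"a": 1.0, "b": 0.0, "c": 5.0}
G = {"a": 0.0, "b": 2.0, "c": 1.0}
```

With these numbers g does better than f on B while f does better off B, so the only unbeaten act is the one agreeing with g on B and with f on ∼B, that is, the pair (1, 0), and `not_preferred_given(F, G, B, PROB)` holds.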
The question at issue now takes this form. Supposing that none of the four functions is preferred to the particular one hᵢⱼ, is i = 0, or is i = 1; that is, does hᵢⱼ agree with f on B or with g on B? It is not hard to see that i can be 1, if and only if f ≤ g given B. Indeed, if i = 1, h₀ⱼ ≤ h₁ⱼ, which means that f ≤ g given B. Arguing in the opposite direction, if f ≤ g given B, then h₀₀ ... C; then C is null. If ∼B is null; f ≤ g given B, if and only if f ≤ g. f ≤ g given S, if and only if f ≤ g. If S is null, f ≐ g for every f and g.
Component 6 of Theorem 1 requires comment, because it corresponds to a pathological situation. In case S is null, it is not really intuitive to say that S (and therefore every event) is virtually impossible. The interpretation is rather that the person simply doesn't care what happens to him. This is imaginable, especially under a suitably restricted interpretation of F, but it is uninteresting and will accordingly be ruled out by a later postulate, P5.

A finite set of events Bᵢ is a partition of B if Bᵢ ∩ Bⱼ = 0, for i ≠ j, and ∪ᵢ Bᵢ = B. With this definition, it is easily proved by arithmetic induction that

THEOREM 2  If Bᵢ is a partition of B, and f ≤ g given Bᵢ for each i, then f ≤ g given B. If, in addition, f < g given Bⱼ for at least one j, then f < g given B.

The union of any finite number of null events is null.

There are still other interesting consequences of Theorem 2, which may be most conveniently mentioned informally. If, in Theorem 2, B = S (or, more generally, if ∼B is null), it is superfluous to say "given
B" in the conclusions of the theorem. If f ≐ g given Bᵢ for each i, then f ≐ g given B. So much for the consequences of P2.

Acts that are constant, that is, acts whose consequences are independent of the state of the world, are of special interest. In particular, they lead to a natural definition of preference among consequences in terms of preference among acts. Following ordinary mathematical usage, f ≡ g will mean that f is identically g, that is, for every s, f(s) = g. A formal definition of preference among consequences can now conveniently be expressed thus. For any consequences g and g', g ≤ g'; if and only if, when f ≡ g and f' ≡ g', f ≤ f'.

In the same spirit, meaning can be assigned to such expressions as f ≤ g, g ≤ f given B, etc., and I will freely use such expressions without defining them explicitly. In particular, f ≤ g given B has a natural meaning, but one that is rendered superfluous by the next postulate, P3. Incidentally, it is now evident how awkward for us it would be to use f(s) for f; because f(s) ≤ g(s) is a statement about the consequences f(s) and g(s), whereas f ≤ g is a statement about acts, and we will have frequent need for both sorts of statements.

Suppose that f ≡ g, and f' ≡ g', and that g ≤ g'; is it reasonable to admit that, for some B, f > f' given B? That depends largely on the interpretation we choose to make of our technical terms, as an example helps to bring out.+

Before going on a picnic with friends, a person decides to buy a bathing suit or a tennis racket, not having at the moment enough money for both. If we call possession of the tennis racket and possession of the bathing suit consequences, then we must say that the consequences of his decision will be independent of where the picnic is actually held. If the person prefers the bathing suit, this decision would presumably be reversed, if he learned that the picnic were not going to be held near water. Thus the question whether it can happen that f > f' given B would be answered in the affirmative. But, under the interpretation of "act" and "conse-
... for r = 1, ..., n − 1, then

$$\sum_{i=1}^{r} p_i \ge (r-1)/n, \quad \text{and} \quad \sum_{i=n-r+1}^{n} p_i \le (r+1)/n. \tag{2}$$
2c. The sum of any r of the pᵢ's lies between (r − 1)/n and (r + 1)/n.

2d. If P almost agrees with ≤·, and C(r, n) denotes here and later in this proof any union of r elements of any n-fold almost uniform partition (not necessarily the same from one context to another), then

$$(r-1)/n \le \cdots \tag{3}$$

... >· 0, there is a partition of S, no element of
which is as probable as B; ≤· is fine ... B and C are almost equivalent, written B ≐· C, if and only if for all non-null G and H such that B ∩ G = C ∩ H = 0, B ∪ G ≥· C and C ∪ H ≥· B. It is obvious that equivalent events are also almost equivalent. Finally, if and only if every pair of almost equivalent events are equivalent, ≤· is tight. ... 0, and C >· 0; there exists D ⊂ C such that 0 <· D ... m. If Gₙ were more probable than H and therefore more probable than each element of the partition, it would follow that the union of all elements of the partition, namely S, is less probable than G₁, which would be absurd. The two events B₁ = ∪ₙ Gₙ, B₂ = ... partition B in the required fashion.
B and A ≠ B, then A is necessarily more probable than B, though the numerical probability of A may well be the same as that of B. Thus, if a marksman shoots at a wall, it is logically contradictory that his bullet should fall nowhere at all, but it is logically consistent that a prescribed mathematically ideal point on the bullet should strike a prescribed mathematically ideal line on the wall. Since the event of the prescribed point hitting a prescribed line is logically possible, Koopman would insist that the event is more probable than the vacuous event, namely that the bullet goes nowhere, though the numerical probability of both events is zero.

I do not take direct issue with Koopman, because he is presumably talking about a somewhat different concept of probability from the particular relation ≤·; but I do not think it appropriate to suppose that the person would distinctly rather stake a gain on the line than on the null set. The issue is not really either an empirical or a normative one, because the point and line in question are mathematical idealizations. If the point and line are replaced by a dot and a band, respectively, then, of course, no matter how small the dot and band may be, the probability of the one hitting the other is greater than that of the vacuous event. But it seems to me entirely a matter of taste, conditioned by mathematical experience, to decide what idealization to make if the dot and band are replaced by their idealized limits.

So much for hair splitting. As far as the theory of probability per se is concerned, postulate P6' is all that need be assumed, but in Chapter 5 a slightly stronger assumption will be needed that bears on acts generally, not only on those very special acts by which probability is defined. Therefore, I am about to propose a postulate, P6, that obviously implies P6' and will therefore supersede it. This stronger postulate seems to me acceptable for the same reason that P6' itself does.
P6  If g < h, and f is any consequence; then there exists a partition of S such that, if g or h is so modified on any one element of the partition as to take the value f at every s there, other values being undisturbed; then the modified g remains less than h, or g remains less than the modified h, as the case may require.

4  Some mathematical details
Are there qualitative probabilities that are both fine and tight, that are fine but not tight, that are tight but not fine, that are neither fine nor tight but do have one and only one almost agreeing probability measure? Examples answering all these questions in the affirmative will be exhibited in this section.

To indicate a different topic that will also be treated here, those of you who have had more than elementary experience with mathematical treatments of probability know that it is not usual to suppose, as has been done here, that all sets have a numerical probability, but rather that a sufficiently rich class of sets do so, the remainder being considered unmeasurable. Again, it is usual to suppose that, if each of an infinite sequence of disjoint sets is measurable, the probability of their union is the sum of their probabilities, that is, probability measures are generally assumed to be countably additive. But the theory being developed here does assume that probability is defined for all events, that is, for all sets of states, and it does not imply countable additivity, but only finite additivity.

The present section not only answers the questions raised in the preceding paragraph, but also discusses the relation of the notions of limited domain of definition and of countable additivity to the theory of probability developed here. The general conclusions of this discussion are: First, there is no technical obstacle to working with a limited domain of definition, and, except for expository complications, it might have been mildly preferable to have done so throughout. Second, it is a little better not to assume countable additivity as a postulate, but rather as a special hypothesis in certain contexts. A different and much more extensive treatment of these questions has been given by de Finetti [D4].

Finally, before entering upon the main technical work of this section, one easy question about the relation between qualitative and quantitative probability will be answered and several as yet unanswered ones will be raised. Are there qualitative probabilities without any strictly agreeing measure? Yes, because any qualitative probability that is fine but not tight is easily shown to provide an example. It is, however, an open question, stressed by de Finetti [D5], whether a qualitative probability on a finite S always has a strictly agreeing measure. It would also be technically interesting to know about the existence of almost agreeing measures in the same context.+

+ Even this has since been answered in the negative by Kraft, Pratt, and Seidenberg (1959). See (Fishburn 1970, pp. 210-211).

The matters to be treated in the rest of this section are rather technical mathematically, and, though I would not delete them altogether, it does not seem justifiable to lay the necessary groundwork for presenting them in an elementary fashion. Some may, therefore, find it necessary to skip the rest of this section altogether, or to skim it rather lightly.

It is well known that there does not exist a countably additive probability measure defined for every subset of the unit interval, agreeing with Lebesgue measure on those sets where Lebesgue measure is defined, and assigning the same measure to each pair of congruent sets+ (Problem (b), p. 276 of [H2]). On the other hand, there do exist finitely additive probability measures agreeing with Lebesgue measure on those sets for which Lebesgue measure is defined, and assigning the same measure to each of any pairs of congruent sets; cf. p. 32 of [B4]. The existence of such measures shows, among other things, that a finitely additive measure need not be countably additive. Again, calling such a finitely additive extension of Lebesgue measure P and defining B ≤· C to mean P(B) ≤ P(C), we see an example of a qualitative probability that is both fine and tight.

An example of a qualitative probability that is tight but not fine may be constructed by taking for S two unit intervals, S₁ and S₂, in each of which finitely additive extensions of Lebesgue measure, P₁ and P₂, are defined. The generic set B in this example is therefore partitioned into B₁ = B ∩ S₁ and B₂ = B ∩ S₂, respectively. For this example, let B ...
1 for $0 \le \rho < \infty$.
The problem is quite simple when account is taken of the fact that $R(x)$ is the product of n random variables, $R'(x_r)$, that are independent given $B_1$. In attacking the problem, two cases are to be distinguished, according as there are or are not values of $x_r$ that have positive probability given $B_1$ but zero probability given $B_2$. It is in practice rather fortunate to find instances of the first case, for then (7) applies with a vengeance. Indeed, suppose that

$$P(R'(x_r) < \infty \mid B_1) = \psi. \tag{8}$$

Then

$$P(R = \infty \mid B_1) = 1 - \psi^n, \tag{9}$$

which obviously approaches 1 with increasing n.
3.6] APPROACH TO CERTAINTY THROUGH EXPERIENCE
The second case, namely $\psi = 1$, is more interesting. Since much is known about sums of identically distributed independent random variables, it is natural to investigate

$$\log R(x) = \sum_r \log R'(x_r), \tag{10}$$

thereby replacing a product by a sum. It is easily seen from the definition of $R'(x_r)$ that $P(R'(x_r) > 0 \mid B_1) = 1$, so, in the case now at hand, the functions $\log R'(x_r)$ are independent real bounded random variables. Letting

$$I = E(\log R'(x_r) \mid B_1), \tag{11}$$

the weak law of large numbers† implies that, for any $\epsilon > 0$,

$$\lim_{n \to \infty} P(\log R(x) \ge n(I - \epsilon) \mid B_1) = 1, \tag{12}$$

or, equivalently,

$$\lim_{n \to \infty} P(R(x) \ge e^{n(I - \epsilon)} \mid B_1) = 1. \tag{13}$$

The objective will therefore be achieved, if it is demonstrated that $I > 0$ unless (6) holds. But the relation

$$I = E(\log R'(x_r) \mid B_1) \ge -\log E(R'^{-1}(x_r) \mid B_1) = -\log 1 = 0 \tag{14}$$

may be argued thus: The inequality in the above calculation is assigned as Exercise 8 in Appendix 2, together with the fact that equality can hold in (14) if and only if $R'^{-1}(x_r)$ is constant with probability one given $B_1$. But the expected value of $R'^{-1}(x_r)$ given $B_1$ is equal to 1, as (14) asserts and as may be easily verified from the definition of $R'^{-1}(x_r)$. So, barring the exceptions provided for, $I > 0$, and the demonstration of (7) is complete.

Before the observation, the probability that the probability given $x$ of whichever element of the partition actually obtains will be greater than $\alpha$ is
$$\sum_i \beta(i)\, P\big(P(B_i \mid x) > \alpha \mid B_i\big), \tag{15}$$

where summation is confined to those i's for which $\beta(i) \ne 0$. Application of (14) (extended to arbitrary pairs of i's) shows that the coefficients of each $\beta(i)$ in the quantity (15), and therefore the quantity itself, approaches 1 as n increases, provided only that no two functions $\xi(x_r \mid i)$ and $\xi(x_r \mid i')$ are the same, if $\beta(i)$ and $\beta(i')$ are both different from zero.

† For the definition of this law, see, if necessary, Feller's book [F1].

To summarize informally, it has now been shown that, with the observation of an abundance of relevant data, the person is almost certain to become highly convinced of the truth, and it has also been shown that he himself knows this to be the case. It may be remarked, for those familiar with certain theorems, that many refinements of (7) and its consequences could be worked out by application of the strong law of large numbers, the central limit theorem, and the law of the iterated logarithm to $R'(x_r)$.

The quantity $I$ is coming to be called the information of the distribution of $x_r$ given $B_1$ with respect to the distribution of $x_r$ given $B_2$. More generally, if P and Q are probability measures, confined (for simplicity) to a finite set X with elements x, the information of P with respect to Q is defined by

$$\sum_x P(x) \log \frac{P(x)}{Q(x)}. \tag{16}$$
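The following is a minimal numerical sketch of definition (16) and of the limit asserted in (12), not part of the original argument. It assumes, purely for illustration, that each observation takes only the values 0 and 1, and that the hypothetical distributions `P` and `Q` stand for the distributions of a single observation given $B_1$ and given $B_2$; the function names are likewise invented here.

```python
import math

def information(P, Q):
    """Information of P with respect to Q, as in (16): sum of P(x) log(P(x)/Q(x))."""
    total = 0.0
    for x, px in P.items():
        if px == 0.0:
            continue  # terms with P(x) = 0 contribute nothing
        qx = Q.get(x, 0.0)
        if qx == 0.0:
            return math.inf  # P gives positive weight where Q gives none
        total += px * math.log(px / qx)
    return total

def prob_log_ratio_at_least(P, Q, n, eps):
    """Exact P(log R >= n(I - eps)) under P for n independent 0/1 observations."""
    I = information(P, Q)
    log_ratio_one = math.log(P[1] / Q[1])   # log R' when the observation is 1
    log_ratio_zero = math.log(P[0] / Q[0])  # log R' when the observation is 0
    threshold = n * (I - eps)
    prob = 0.0
    for k in range(n + 1):  # k = number of 1's among the n observations
        if k * log_ratio_one + (n - k) * log_ratio_zero >= threshold:
            prob += math.comb(n, k) * P[1] ** k * P[0] ** (n - k)
    return prob

# Hypothetical single-observation distributions (assumptions for illustration).
P = {0: 0.4, 1: 0.6}
Q = {0: 0.7, 1: 0.3}
I = information(P, Q)
print("information:", I)
for n in (10, 100, 1000):
    print(n, prob_log_ratio_at_least(P, Q, n, eps=0.05))
```

Under these assumptions the information is positive because `P` and `Q` differ, and the printed probabilities should move toward 1 as n grows, which is what (12) leads one to expect.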
This usage stems from work of Claude Shannon in communication engineering, a good account of which is given in [S11]; and also from independent work of Norbert Wiener in a related context [W10]. The ideas of Shannon and of Wiener, though concerned with probability, seem rather far from statistics. It is, therefore, something of an accident that the term "information" coined by them should be not altogether inappropriate in statistics. The situation is still further confused, because, as long ago as 1925, R. A. Fisher emphasized an important notion, which he called "information," in connection with the theory of estimation (Paper 11, Theory of statistical estimation, in [F6]). At first glance, Fisher's notion seems quite different from that of Shannon and Wiener, but, as a matter of fact, his is a limiting form of theirs. A useful but rather technical exposition relating the several senses of "information" is given by Kullback and Leibler [K15], and I return to the topic in § 15.6.+
+ See also (Kullback 1959).

7 Symmetric sequences of events

A problem often posed by statisticians is to estimate from a sequence of observations the unknown probability p that repeated trials of some sort are successful. On an objectivistic view, this problem is natural and important, for on such a view the probability that a coin falls heads, for example, is a property of the coin that can be determined by experimentation with the coin and in no other way. But on a personalistic view of probability, strictly interpreted, no probability is unknown to the person concerned, or, at any rate, he can determine a probability only by interrogating himself, not by reference to the external world. This situation has been interpreted to imply that the personalistic view is wrong, or at any rate inadequate, because it apparently cannot even express one of the most natural and typical problems of statistics.

Thus far in this book, I have not argued against the possibility of defining some useful notion of objective probability, but have contented myself with presenting a particular notion of personal probability. Therefore, at this point it might be tempting to seek a dualistic theory admitting both objective and personal probabilities in some kind of articulation with one another. De Finetti [D3] has shown, however, that it is not necessary to do so, that the notion of a coin with unknown probability p can be reinterpreted in terms of personal probability alone. The present section is devoted to outlining this development due to de Finetti. In the organization of the book as a whole, it plays no logically essential part; it is, rather, a digression intended to give a clearer understanding of the notion of personal probability, especially in relation to objectivistic views. The ideas presented here are but a fragment of those on the same subject in [D2].

Let $x_r$ be a sequence of random variables taking only the values 0 and 1. The $x_r$'s are, to all intents and purposes, a sequence of events, the rth of which is the event that $x_r(s) = 1$. To say that these events are independent, each occurring with probability p, is to say that the probability of any finite pattern, $x_1, \dots, x_n$, initiating the sequence $x_r(s)$ is given by the formula

$$P(x_r(s) = x_r;\ r = 1, \dots, n \mid p) = p^y (1 - p)^{n - y}, \tag{1}$$
where y is the number of 1's among the $x_r$'s for $r = 1, \dots, n$.

Mixtures, in a certain sense, of sequences of random variables are often of interest, as they already have been in the preceding section. Suppose, to be explicit, that the world is partitioned by $B_i$ and that, given $B_i$, the $x_r$'s are independent with $P(x_r(s) = 1 \mid B_i)$ having some fixed value $p(i)$. Then the unconditional probability of a particular initial sequence is a mixture of the probabilities given by (1) thus:

$$P(x_r(s) = x_r;\ r = 1, \dots, n) = \sum_i p(i)^y (1 - p(i))^{n - y} P(B_i). \tag{2}$$

It is natural to generalize (2) formally thus:

$$P(x_r(s) = x_r;\ r = 1, \dots, n) = \int p^y (1 - p)^{n - y}\, dM(p), \tag{3}$$

where M is a probability measure on the real numbers in the interval [0, 1]. It is noteworthy that equation (3), understood to apply for every n, is equivalent to the condition that the probability that every one of each prescribed set of n of the $x_r$'s takes the value 1 is

$$\int p^n\, dM(p). \tag{4}$$

This follows by arithmetic induction from the obvious formula

$$P(x_r(s) = x_r;\ r = 1, \dots, n) = P(x_r(s) = x_r;\ r = 1, \dots, n;\ x_{n+1}(s) = 0) + P(x_r(s) = x_r;\ r = 1, \dots, n;\ x_{n+1}(s) = 1), \tag{5}$$
which applies to any sequence of random variables taking on only the values 0 and 1.

Equation (3) can very well have an interpretation in such terms that the measure M is not merely an abstract probability measure, but is actually a personal probability. Thus, if p is a random variable that is (for a given person) distributed according to M, and, if for each p the conditional distribution of the $x_r$'s given p is independent, with $P(x_r(s) = 1) = p$; then (3) obtains. Strictly speaking, the notion of conditional probability as it occurs in the preceding sentence is used in a somewhat wider sense than has been defined in this book, for the probability of any particular p will typically be zero. At least for countably additive measures, the necessary extension of conditional probability and conditional expectation is presented by Kolmogoroff in [K7]; it is a concept of the greatest value in advanced mathematical statistics and in probability generally.

However, in most contexts where objectivists speak of an unknown probability p, there is, so far as an exclusively personalistic view of probability is concerned, no unknown parameter that can play the role of p in (3). Examination of situations in which "unknown" probability is appealed to, whether justifiably or not, shows that, from the personalistic standpoint, they always refer to symmetric sequences of events in the sense of the following definition.

The sequence of random variables $x_r$, taking only the values 0 and 1, is a symmetric† sequence, if and only if the probability that any b of the $x_r(s)$'s equal 1 and any c other $x_r(s)$'s equal 0 depends only on the integers b and c.
† De Finetti uses the French word for "equivalent."+

+ He and others now prefer "exchangeable." The concept seems to have been first introduced by Jules Haag (1928).
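As a small check on formulas (3), (4), and (5), the sketch below evaluates them exactly for a mixing measure concentrated on two values of p. That two-point measure, and the function names, are assumptions made only so that the integral becomes a finite sum; nothing here is taken from the original text beyond the formulas themselves.

```python
from fractions import Fraction
from itertools import product

# Hypothetical mixing measure M: each value of p mapped to its weight.
M = {Fraction(1, 4): Fraction(1, 2), Fraction(3, 4): Fraction(1, 2)}

def pattern_probability(pattern, M):
    """Probability of an initial 0/1 pattern under the mixture (3)."""
    n, y = len(pattern), sum(pattern)
    return sum(w * p**y * (1 - p)**(n - y) for p, w in M.items())

def all_ones_probability(n, M):
    """Probability that n prescribed terms all equal 1, as in (4)."""
    return sum(w * p**n for p, w in M.items())

# The probabilities of all patterns of a given length add up to 1.
total = sum(pattern_probability(x, M) for x in product((0, 1), repeat=4))

# Formula (5): a pattern is continued either by a 0 or by a 1.
x = (1, 0, 1)
lhs = pattern_probability(x, M)
rhs = pattern_probability(x + (0,), M) + pattern_probability(x + (1,), M)

# The mixture is symmetric without being independent:
p_one = all_ones_probability(1, M)  # probability that one given term is 1
p_two = all_ones_probability(2, M)  # probability that two given terms are 1
print(total, lhs == rhs, p_one, p_two, p_one**2)
```

Because `pattern_probability` depends on the pattern only through its length and its number of 1's, the sequence it describes is symmetric in the sense just defined; comparing `p_two` with `p_one**2` shows that the terms are nevertheless not independent for this choice of M.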
It is easy to verify that any mixture of independent sequences in the sense of (3) is a symmetric sequence. De Finetti has discovered that the converse is also true. These conclusions can be formally summarized thus:

THEOREM 1 A sequence of random variables $x_r$, taking only the values 0 and 1, is symmetric, if and only if there exists a probability measure M on the interval [0, 1] such that the probability that any prescribed n of the $x_r(s)$'s equal 1 is given by (4). Two such measures, M and M', must be essentially the same,† in the sense that, if B is a subinterval of [0, 1], then $M(B) = M'(B)$.

Considering that de Finetti has published a proof of Theorem 1 in [D2] based on the Fourier integral, that any proof of it must be rather technical, and that the theorem is not the basis of any formal inference later in this book, it seems best not to prove it here.‡

It is Theorem 1 that makes it possible to express propositions referring to unknown probabilities in purely personalistic terms. If, for example, a statistician were to say, "I do not know the p of this coin, but I am sure it is at most one half," that would mean in personalistic terms, "I regard the sequence of tosses of this coin as a symmetric sequence, the measure M of which assigns unit measure to the interval [0, 1/2]." This condition on M means in turn that for every n the (personal) probability of n consecutive heads is at most $2^{-n}$, as is easily verified. I do not insist that propositions couched in terms of a fictitious unknown probability are bad, if understood as suggestive abbreviations, but only that the meaningfulness of such propositions does not constitute an inadequacy of the personalistic view of probability.

The mathematical concept of probability measure or, a trifle more generally, bounded measure is fundamental to mathematics generally. Probability measures, often under other names, are, therefore, employed in many parts of pure and applied mathematics completely unrelated to probability proper. For example, the distribution of mass in a not necessarily rigid body is expressed by a bounded measure that tells how much of the body is in each region of space. We must, therefore, not be surprised if, even in studying probability itself, we come across some probability measures used not to measure probability
† Technical note: If "probability measure" were here understood to mean a countably additive probability measure on the Borel sets of [0, 1], the theorem would remain true, and the essential uniqueness of M would become true uniqueness.

‡ Technical note: Theorem 1 can be proved very quickly and naturally by applying the theory of the Hausdorff moment problem (pp. 8-9 of [S13]), but that method does not seem to generalize readily.+

+ New and general methods are in Hewitt and Savage (1955) and Ryll-Nardzewski (1961). For related work see Bühlmann (1960), Freedman (1962, 1_), Milicer-Gruzewska (1949, 1950), and Rényi and Révész (1963).
proper but only for auxiliary purposes. In the event that p is not actually an unknown parameter, the measure M presented by Theorem 1 seems at first sight to be such a purely auxiliary measure, but, as a matter of fact, M does measure certain interesting probabilities, at least approximately. For example, letting

$$\bar{x}_n = \frac{1}{n} \sum_{r=1}^{n} x_r, \tag{6}$$

it can be shown that

$$\lim_{n \to \infty} P(\bar{x}_n(s) \le \rho) = M(p \le \rho). \tag{7}$$

In words, the person considers the average of any large number of future observations to be distributed approximately the way p is distributed by M. This is an extension of the ordinary weak law of large numbers, proved in [D2] along with a corresponding extension of the strong law.
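The statement that the average of many observations is distributed approximately as p is distributed by M can be illustrated numerically. The sketch below is only an illustration under assumptions of my own: M is taken to be a hypothetical measure concentrated on two values of p, and the threshold is chosen away from those two values, so that no question arises at a point carrying positive measure.

```python
import math

def binomial_cdf(k, n, p):
    """P(at most k ones in n independent trials, each 1 with probability p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def prob_average_at_most(threshold, n, M):
    """P(average of the first n terms <= threshold) for a mixture with measure M."""
    k = math.floor(n * threshold)  # largest count of 1's still within the threshold
    return sum(w * binomial_cdf(k, n, p) for p, w in M.items())

# Hypothetical mixing measure: p = 0.2 with weight 0.3, p = 0.7 with weight 0.7.
M = {0.2: 0.3, 0.7: 0.7}
threshold = 0.5
limit = sum(w for p, w in M.items() if p <= threshold)  # M(p <= threshold)

for n in (10, 50, 400):
    print(n, prob_average_at_most(threshold, n, M), limit)
```

As n increases, the computed probability should settle near the measure that M assigns to values of p not exceeding the threshold.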
If the first n terms of a symmetric sequence are observed, how does the rest of the sequence appear to the person in the light of this observation? In the first place, it also is a symmetric sequence but generally of a structure different from that of the original sequence, as may be shown thus: Let

$$\pi(y, n - y) =_{Df} P(x_r(s) = x_r;\ r = 1, \dots, n), \tag{8}$$

as one may for a symmetric sequence. Then

$$P(x_q(s) = x_q;\ q = n + 1, \dots, n + m \mid x_r(s) = x_r;\ r = 1, \dots, n) = \frac{P(x_p(s) = x_p;\ p = 1, \dots, n + m)}{P(x_r(s) = x_r;\ r = 1, \dots, n)} = \frac{\pi(y + z, (n - y) + (m - z))}{\pi(y, n - y)}, \tag{9}$$

where z is the number of 1's among the $x_q$'s, $q = n + 1, \dots, n + m$. Equation (9) shows that the sequence $x_q$, $q > n$, given that $x_r(s) = x_r$, $r = 1, \dots, n$, is a new symmetric sequence characterized by

$$\pi'(z, m - z) = \frac{\pi(y + z, (n - y) + (m - z))}{\pi(y, n - y)}. \tag{10}$$
The measure M' associated with the new sequence is, according to Theorem 1, essentially determined by the condition that

$$\int p^m\, dM'(p) = \pi'(m, 0) = \frac{\pi(m + y, n - y)}{\pi(y, n - y)} = \frac{\int p^{m+y} (1 - p)^{n-y}\, dM(p)}{\pi(y, n - y)} = \frac{\int p^m \cdot p^y (1 - p)^{n-y}\, dM(p)}{\pi(y, n - y)}. \tag{11}$$
Equation (11) makes it plausible that, except for the slight ambiguity permitted by Theorem 1, M' is defined (for Borel sets B) by

$$M'(B) = \pi^{-1}(y, n - y) \int_B p^y (1 - p)^{n-y}\, dM(p), \tag{12}$$

and this can in fact be demonstrated with some appeal to slightly advanced methods pertaining to the Hausdorff moment problem (pp. 8-9 of [S13]).

It is noteworthy that, if $M(B) = 0$, then $M'(B) = 0$ also. In the event that p really is an unknown parameter, this means that, if the person is virtually certain that the true p is not in B, no amount of evidence can alter that opinion.

Equation (12) shows that M' is generally different from M. Indeed, for fixed $n \ge 1$, M' is clearly the same as M for every y for which $\pi(y, n - y) > 0$, if and only if M assigns the measure 1 to some one value of p. That is, the person regards evidence drawn from a symmetric sequence as irrelevant to the future behavior of the sequence, if and only if at the outset he regards the sequence not merely as symmetric but also as independent.

It can be shown that the person regards it as highly probable that, if he observes a sufficiently long segment of a symmetric sequence, the continuation of the sequence will then be one for which the conditional variance of p,

$$\int p^2\, dM'(p) - \left( \int p\, dM'(p) \right)^2, \tag{13}$$

will be small. In the event that p is really an unknown parameter, this implies that the person is very sure that after a long sequence of observations he will assign nearly unit probability to the immediate neighborhood of the value of p that actually obtains, a parallel to the approach to certainty discussed in § 6.
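The passage from M to M' in (12), and its agreement with (10) and (11), can be checked exactly when M is concentrated on finitely many values of p. The following sketch assumes such a hypothetical three-point measure and invents the function names `pi`, `update`, and `variance`; it is meant only to make the formulas concrete.

```python
from fractions import Fraction

def pi(y, n_minus_y, M):
    """pi(y, n - y): probability of one specific pattern with y ones, as in (3) and (8)."""
    return sum(w * p**y * (1 - p)**n_minus_y for p, w in M.items())

def update(M, y, n):
    """The measure M' of (12), for M concentrated on finitely many values of p."""
    norm = pi(y, n - y, M)
    if norm == 0:
        raise ValueError("the observed pattern has probability zero under M")
    return {p: w * p**y * (1 - p)**(n - y) / norm for p, w in M.items()}

def variance(M):
    """Variance of p under M."""
    mean = sum(w * p for p, w in M.items())
    return sum(w * p * p for p, w in M.items()) - mean**2

# Hypothetical initial opinion: three values of p, equally weighted.
third = Fraction(1, 3)
M = {Fraction(1, 4): third, Fraction(1, 2): third, Fraction(3, 4): third}

y, n = 15, 20  # suppose 15 ones are observed among the first 20 terms
M_post = update(M, y, n)

m = 3
moment = sum(w * p**m for p, w in M_post.items())    # left side of (11)
predictive = pi(y + m, n - y, M) / pi(y, n - y, M)   # pi'(m, 0) from (10)

print(moment == predictive, float(variance(M)), float(variance(M_post)))
```

Exact rational arithmetic is used so that the equality of the two sides of (11) can be tested without rounding. In this example the variance of p under M' is much smaller than under M, and any value of p given zero weight by M would keep zero weight under M'.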
CHAPTER 4

Critical Comments on Personal Probability

1 Introduction

It is my tentative view that the concept of personal probability introduced and illustrated in the preceding chapter is, except possibly for slight modifications, the only probability concept essential to science and other activities that call upon probability. I propose in this chapter to discuss the shortcomings I see in that particular personalistic view of probability, which, for brevity, shall here be called simply "the personalistic view"; to point out briefly the relationships between it and other views; to criticize other views in the light of it; and to discuss the criticisms holders of other views have raised, or may be expected to raise, against it.

From the standpoint of strict logical organization such critical remarks are somewhat premature, because the personalistic view itself insists that probability is concerned with consistent action in the face of uncertainty. Consequently, until the theory of such action has been completely outlined in later chapters, the view to be criticized cannot even be considered to have been wholly presented. Practically, however, it seems wise not to confine critical comment to the one part of the text that logic may suggest as appropriate, but rather to touch on criticism from time to time, even at the cost of some repetition. Thus, some of what is to be said here has already been said in the introductory chapter and elsewhere, and some of it will be said again.

Views other than the personalistic view are to be discussed here, but it cannot be too distinctly emphasized that the account given of them will be very superficial.† One function of discussing other views is to provide the reader with at least some orientation in the large and diversified body of ideas pertaining to the foundation of statistics that
have been accumulated. A less obvious, but I think no less important and legitimate, function is to cast new light on the personalistic view, especially for those who already hold, or tend to hold, other views.

† Much more extensive comparative material is given by Keynes [K4], by Nagel [N1], and by Carnap [C1]. Koopman [K12] should also be mentioned in this connection.

2 Shortcomings of the personalistic view
I can answer, to my own satisfaction, some criticisms of the personalistic view that have been brought to my attention. These points are discussed later in the chapter, but in this section I state and discuss as clearly as I can those that I find more difficult and confusing to answer.

According to the personalistic view, the role of the mathematical theory of probability is to enable the person using it to detect inconsistencies in his own real or envisaged behavior. It is also understood that, having detected an inconsistency, he will remove it. An inconsistency is typically removable in many different ways, among which the theory gives no guidance for choosing. Silence on this point does not seem altogether appropriate, so there may be room to improve the theory here.

Consider an example: The person finds on interrogating himself about the possible outcome of tossing a particular coin five times that he considers each of the thirty-two possibilities equally probable, so each has for him the numerical probability 1/32. He also finds that he considers it more probable that there will be four or five heads in the five tosses than that the first two tosses will both be heads. Now, reference to the mathematical theory of probability soon shows the person that, if the probability of each of the thirty-two possibilities is 1/32, then the probability of four or five heads out of five is 6/32, and the probability that the first two tosses will be heads is 8/32, so the person has caught himself in an inconsistency. The theory does not tell him how to resolve the inconsistency; there are literally an infinite number of possibilities among which he must choose.
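The arithmetic of the example can be confirmed by listing the thirty-two possibilities. The short sketch below does only that; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 2**5 = 32 sequences of heads (H) and tails (T), each given probability 1/32.
outcomes = list(product("HT", repeat=5))
prob = Fraction(1, len(outcomes))

# Four or five heads in the five tosses.
p_four_or_five = sum(prob for o in outcomes if o.count("H") >= 4)

# Heads on both of the first two tosses.
p_first_two = sum(prob for o in outcomes if o[0] == "H" and o[1] == "H")

print(len(outcomes), p_four_or_five, p_first_two)
```

The fractions are reduced to lowest terms when printed, so 6/32 and 8/32 appear as 3/16 and 1/4; the second event is the more probable, contrary to the person's initial comparison.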
In this particular example, the choice that first comes to my mind, and I imagine to yours, is to hold fast to the position that all thirty-two possibilities are equally likely and to accept the implications of that position, including the implication that four or five heads out of five is less probable than two heads out of two. I do not think that there is any justification for that choice implicit in the example as formally stated, but rather that in the sort of actual situation of which the example is a crude schematization there generally are considerations not incorporated in the example that do justify, or at any rate elicit, the choice.

To approach the matter in a somewhat different way, there seem to be some probability relations about which we feel relatively "sure" as
compared with others. When our opinions, as reflected in real or envisaged action, are inconsistent, we sacrifice the unsure opinions to the sure ones. The notion of "sure" and "unsure" introduced here is vague, and my complaint is precisely that neither the theory of personal probability, as it is developed in this book, nor any other device known to me renders the notion less vague.+

There is some temptation to introduce probabilities of a second order so that the person would find himself saying such things as "the probability that B is more probable than C is greater than the probability that F is more probable than G." But such a program seems to meet insurmountable difficulties. The first of these, pointed out to me by Max Woodbury, is this. If the primary probability of an event B were a random variable b with respect to secondary probability, then B would have a "composite" probability, by which I mean the (secondary) expectation of b. Composite probability would then play the allegedly villainous role that secondary probability was intended to obviate, and nothing would have been accomplished.

Again, once second order probabilities are introduced, the introduction of an endless hierarchy seems inescapable. Such a hierarchy seems very difficult to interpret, and it seems at best to make the theory less realistic, not more. Finally, the objection concerning composite probability would seem to apply, even if an endless hierarchy of higher order probabilities were introduced. The composite probability of B would here be the limit of a sequence of numbers, $E_n(E_{n-1}(\cdots E_2(P_1(B)) \cdots))$, a limit that could scarcely be postulated not to exist in any interpretable theory of this sort.

The reader may wish to evaluate for himself the arguments in favor of such a hierarchy put forward by Reichenbach (Chapter 8, [R2]), taking proper account of the differences between Reichenbach's overall view, and his mathematical theory, of probability on one hand and, on the other, the personalistic view and measure-theoretic mathematical theory that are the basis of my critique of higher order probabilities.

The interplay between the "sure" and "unsure" is interestingly expressed by de Finetti (p. 60, [D2]) thus: "The fact that a direct estimate of a probability is not always possible is just the reason that the logical rules of probability are useful. The practical object of these rules is simply to reduce an evaluation, scarcely accessible directly, to others by means of which the determination is rendered easier and more precise."

+ One tempting representation of the unsure is to replace the person's single probability measure P by a set of such measures, especially a convex set. Some explorations of this are Dempster (1968), Good (1962), and Smith (1961).

It may be clarifying, especially for some readers under the sway of the objectivistic tradition, to mention that, if a person is "sure" that
the probability of heads on the first toss of a certain penny is 1/2, it does not at all follow that he considers the coin fair. He might, to take an extreme example, be convinced that the penny is a trick one that always falls heads or always falls tails.

Logic, to which the theory of personal probability can be closely paralleled, is similarly incomplete. Thus, if my beliefs are inconsistent with each other, logic insists that I amend them, without telling me how to do so. This is not a derogatory criticism of logic but simply a part of the truism that logic alone is not a complete guide to life. Since the theory of personal probability is more complete than logic in some respects, it may be somewhat disappointing to find that it represents no improvement in the particular direction now in question.

A second difficulty, perhaps closely associated with the first one, stems from the vagueness associated with judgments of the magnitude of personal probability. The postulates of personal probability imply that I can determine, to any degree of accuracy whatsoever, the probability (for me) that the next president will be a Democrat. Now, it is manifest that I cannot really determine that number with great accuracy, but only roughly. Since, as is widely recognized, all the interesting and useful theories of modern science, for example, geometry, relativity, quantum mechanics, Mendelism, and the theory of perfect competition, are inexact, it may not at first sight seem disquieting that the theory of personal probability should also be somewhat inexact. As will immediately be explained, however, the theory of personal probability cannot safely be compared with ordinary scientific theories in this respect.

I am not familiar with any serious analysis of the notion that a theory is only slightly inexact or is almost true, though philosophers of science have perhaps presented some. Even if valid analyses of the notion have been made, or are made in the future, for the ordinary theories of science, it is not to be expected that those analyses will be immediately applicable to the theory of personal probability, normatively interpreted; because that theory is a code of consistency for the person applying it, not a system of predictions about the world around him.
The difficulty experienced in § 2.6 with defining indifference seems closely associated with the difficulty about vagueness raised here.

Another difficulty with the theory of personal probability (or, more properly, with that larger theory of the behavior of a person in the face of uncertainty, of which the theory of personal probability is a part) is that the statement of the theory is not yet necessarily complete. Thus we shall in the next chapter come upon another proposition that demands acceptance as a postulate, and, since even this leaves the person a great deal of freedom, there is no telling when someone will come upon still another postulate that clamors to be adjoined to the others. Strictly speaking, this is not so much an objection to the theory as a warning about what to expect of its future development.
3 Connection with other views

All views of probability are rather intimately connected with one another. For example, any necessary view can be regarded as an extreme personalistic view in which so many criteria of consistency have been invoked that there is no role left for the person's individual judgment. Again, objectivistic views can be regarded as personalistic views according to which comparisons of probability can be made only for very special pairs of events, and then only according to such criteria that all (right-minded) people agree in their comparisons. From a different standpoint, personalistic views lie not between, but beside, necessary and objectivistic views; for both necessary and objectivistic views may, in contrast to personalistic views, be called objective in that they do not concern individual judgment.

4 Criticism of other views
It will throw some light on the personalistic view to say briefly how some other views seem to compare unfavorably with it.

It is one of my fundamental tenets that any satisfactory account of probability must deal with the problem of action in the face of uncertainty. Indeed, almost everyone who seriously considers probability, especially if he has practical experience with statistics, does sooner or later deal with that problem, though often only tacitly. Even some personalistic views seem to me too remote from the problem of action, or decision. For example, de Finetti in [D2] gives two approaches to personal probability. Of these, one is almost exactly like the view sponsored here, except only that the notion "more probable than" is supposed to be intuitively evident to the person, without reference to any problem of decision. The other is more satisfactory in this respect, being couched in terms of betting behavior, but it seems to me a somewhat less satisfactory approach than the one sponsored here, because it must assume either that the bets are for infinitesimal sums or (anticipating the language of the next chapter) that the utility of money is linear. The theory expressed by Koopman in [K9], [K10], and [K11] and that expressed by Good in [G2] are both personalistic views that tend to ignore decision, or at any rate keep it out of the foreground; but the personalistic view expressed by Ramsey in [R1], like the one sponsored here, takes decision as fundamental. If any necessary view
can be formulated at all, it might well be possible to formulate it in terms of decision, but, so far as I know, the notion of decision has not appeared fundamental to the holders of any necessary view. It seems fair to say that objectivistic views, by their very nature, must in principle regard decision as secondary to probability, if relevant at all. Yet, the objectivist A. Wald has done more than anyone else to popularize the notion of decision.

As has already been indicated, from the position of the personalistic view, there is no fundamental objection to the possibility of constructing a necessary view, but it is my impression that that possibility has not yet been realized, and, though unable to verbalize reasons, I conjecture that the possibility is not real. Two of the most prominent enthusiasts of necessary views are Keynes, represented by [K4], and Carnap, who has begun in [C1] to state what he hopes will prove a satisfactory necessary (or nearly necessary) view of probability. Keynes indicated in the closing pages of [K4] that he was not fully satisfied that he had solved his problem and even suggested that some element of objectivistic views might have to be accepted to achieve a satisfactory theory, and Carnap regards [C1] as only a step toward the establishment of a satisfactory necessary view, in the existence of which he declares confidence. That these men express any doubt at all about the possibility of narrowing a personalistic view to the point where it becomes a necessary one, after such extensive and careful labor directed toward proving this possibility, speaks loudly for their integrity; at the same time it indicates that the task they have set themselves, if possible at all, is not a light one.

Keynes, writing in 1921 of what are here called objectivistic views, complained, "The absence of a recent exposition of the logical basis of the frequency theory by any of its adherents has been a great disadvantage to me in criticizing it." (Chap. VIII, Sec. 17, of [K4]). I believe that his complaint applies as aptly to my position today as to his then, though I cannot pretend to have combed the intervening literature with anything like the thoroughness Keynes himself would have employed. Reichenbach, to be sure, presents in great detail an interesting view that must be classified as objectivistic [R2], but it seems far removed from those that dominate modern statistical theory and form the main subject of the following discussion.

Whatever objectivistic views may be, they seem, to holders of necessary and personalistic views alike, subject to two major lines of criticism. In the first place, objectivistic views typically attach probability only to very special events. Thus, on no ordinary objectivistic view would it be meaningful, let alone true, to say that on the basis of the available evidence it
is very improbable, though not impossible, that France will become a monarchy within the next decade. Many who hold objectivistic views admit that such everyday statements may have a meaning, but they insist, depending on the extremity of their positions, that that meaning is not relevant to mathematical concepts of probability or even to science generally. The personalistic view claims, however, to analyze such statements in terms of mathematical probability, and it considers them important in science and other human activities. Secondly, objectivistic views are, and I think fairly, charged with circularity. They are generally predicated on the existence in nature of processes that may, to a sufficient degree of approximation, be represented by a purely mathematical object, namely an infinite sequence of independent events. This idealization is said, by the objectivists who rely on it, to be analogous to the treatment of the vague and extended mark of a carpenter's pencil as a geometrical point, which is so fruitful in certain contexts. When it is pointed out to the objectivist that he uses the very theory of probability in determining the quality of the approximation to which he refers, he retorts that the applied geometer (a fictitious character whose reputation for solidity in science is unquestioned) likewise uses geometry in determining the quality of his approximations. Let the geometer then be challenged, and he replies with a threefold reference to experience, saying, "It is a common experience that with sufficient experience one develops good judgment in the use of geometry and thenceforth generally experiences success in the predictions he bases on it." "Now," says the objectivist, "the geometer's answer is my answer."
But it seems to critics of objectivistic views that, though the geometer may be entitled to make as many allusions to experience as he pleases, the probabilist is not free to do so, precisely because it is the business of the probabilist to analyze the concept of experience. He, therefore, cannot properly support his position by alluding to experience until he has analyzed that concept, though he can, of course, allude to as many experiences as he wishes. Two sorts of mixed views call for special comment here. First, some (among them Carnap [C1]; Koopman [K9], [K10], and [K11]; and Nagel [N1]) hold that two probability concepts play a role in inference, an objectivistic one and a personalistic or a necessary one. This dualism is typically justified as necessary to the analysis of such a concept as that of a coin with unknown probability of falling heads. But, as § 3.7 explains, de Finetti has provided a satisfactory analysis on the basis of personal probability alone. Second, others, for example, van Dantzig [V1] and Feraud [F2], finding the conventional objectivistic views circular for the reasons I
have cited, try to break the circle by relatively isolated use of subjective ideas. Very crudely, it seems to be their position that in any one context it is allowable for a person to act as though some one event of sufficiently small (objective) probability, chosen at his discretion, were impossible. Quite apart from the relatively technical question of whether any consistent mixed view of this kind can be constructed, holders of personalistic and necessary views alike criticize them as unnecessarily timid, for they embrace subjective ideas, but only gingerly.
5 The role of symmetry in probability
An important and highly controversial question in the foundations of probability is whether and, if so, how symmetry considerations can determine the probabilities of at least some events. Symmetry considerations have always been important in the study of probability. Indeed, early work in probability was dominated by the notion of symmetry, for it was usually either concerned with, or directly inspired by, symmetrical gambling apparatus such as dice or cards. To illustrate those classical problems, suppose that a gambler is offered several bets concerning the possible outcome of rolling three dice, where it is to be understood that refraining from any bets at all may be among the available "bets." Which of the available bets should the gambler choose? Perhaps I distort history somewhat in insisting that early problems were framed in terms of choice among bets, for many, if not most, of them were framed in terms of equity, that is, they asked which of two players, if either, would have the advantage in a hypothetical bet. But, especially from the point of view of the earlier probabilists, such a question of equity is tantamount to a question of choice among bets, for to ask which of two "equal" betters has the advantage is to ask which of them has the preferable alternative, as was pointed out quite explicitly by D. Bernoulli in [B10]. In effect, the classical workers recommended the following solution to the problem of three dice, with corresponding solutions to other gambling problems:
1. Attach equal mathematical probabilities to each of the 216 (= 6^3) possible outcomes of rolling the three dice. (There are 6^3 possibilities, because the first, second, and third dice can each show any of six scores, all combinations being possible.)
2. Under the mathematical probability established in Step 1, compute the expected winnings (possibly negative) of the gambler for each available bet.
3. Choose a bet that has the largest expected winnings among those available.
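The three steps of the classical recommendation can be sketched in code. What follows is only an illustration under stated assumptions: the bets listed in `BETS`, and the names `expected_winnings` and `best_bet`, are hypothetical and do not come from the classical sources.

```python
from fractions import Fraction
from itertools import product

# Step 1: attach equal probability to each of the 6**3 = 216 ordered outcomes.
OUTCOMES = list(product(range(1, 7), repeat=3))
PROBABILITY = Fraction(1, len(OUTCOMES))

# Hypothetical bets (assumptions for illustration): each maps an outcome,
# a triple of scores, to the gambler's winnings in dollars.
BETS = {
    "no bet": lambda scores: 0,
    "even money on total of 11 or more": lambda scores: 1 if sum(scores) >= 11 else -1,
    "even money on at least one six": lambda scores: 1 if 6 in scores else -1,
    "two to one on at least one six": lambda scores: 2 if 6 in scores else -1,
}


def expected_winnings(bet):
    """Step 2: expected winnings of a bet under the uniform probability."""
    return sum(PROBABILITY * bet(scores) for scores in OUTCOMES)


def best_bet(bets):
    """Step 3: name of a bet with the largest expected winnings."""
    return max(bets, key=lambda name: expected_winnings(bets[name]))


if __name__ == "__main__":
    for name, bet in BETS.items():
        print(f"{name}: {expected_winnings(bet)}")
    print("choose:", best_bet(BETS))
```

Exact fractions are used so that ties, such as the one between refraining and the even-money bet on a total of 11 or more, are not obscured by rounding.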
At present it is appropriate to refrain from criticisms of the use made of expected winnings until the next chapter and to concentrate discussion on the notion that the 216 possibilities should be considered equally probable, which can conveniently be done by drastically reducing the class of bets considered to be available. Say, for definiteness, that the only bets to be considered are simply even-money bets of one dollar, that the triple of scores falls in a preassigned subset of the 216 possibilities. When attention is focused on this restricted class of bets, the total recommendation is seen to imply that the probability measure defined in the first step of the recommendation be adopted as the personal probability of the gambler. To put it differently, a gambler who adopts the recommendation will hold the 216 possible outcomes equally probable not only in some abstract sense, but also in the sense of personal probability as defined in § 3.2. The notion that the 216 possibilities should be regarded as equally probable is familiar to everyone; for it is taken for granted wherever gentlemen gamble as well as in the standard high-school algebra courses, where it serves to illustrate the theory of combinations and permutations. Traditionally, the equality of the probabilities was supposed to be established by what was called the principle of insufficient reason,† thus: Suppose that there is an argument leading to the conclusion that one of the possible combinations of ordered scores, say {1, 2, 3}, is more probable than some other, say {6, 3, 4}. Then the information on which that hypothetical argument is based has such symmetry as to permit a completely parallel, and therefore equally valid, argument leading to the conclusion that {6, 3, 4} is more probable than {1, 2, 3}. Therefore, it was asserted, the probabilities of all combinations must be equal.
The principle of insufficient reason has been and, I think, will continue to be a most fertile idea in the theory of probability; but it is not so simple as it may appear at first sight, and criticism has frequently and justly been brought against it. Holders of necessary views typically attempt to put the principle on a rigorous basis by modifying it in such a way as to take account of such criticism. Holders of personalistic and objectivistic views typically regard the criticism as not altogether refutable, so they do not attempt to establish a formal postulate corresponding to the principle but content themselves, as I shall here, with exhibiting an element of truth in it.
† Perhaps what I here call the principle of insufficient reason should be called the principle of cogent reason. See Section 3 of [B11] for the distinction involved.
One of the first criticisms is that the principle is not strictly applicable for a person who has had any experience with the apparatus in question, or even with similar apparatus. Thus, attempts to use the principle, as I have stated it, to prove that there is no such thing as a run of luck at dice, as actually played, are invalid. The person may have had relevant experience, directly or vicariously, not only with gambling apparatus itself, but also with people who make and handle it, including cheaters. It is not always obvious what the symmetry of the information is in a situation in which one wishes to invoke the principle of insufficient reason. For example, d'Alembert, an otherwise great eighteenth-century mathematician, is supposed to have argued seriously that the probability of obtaining at least one head in two tosses of a fair coin is 2/3 rather than 3/4. (Cf. [T3], Art. 464.) Heads, as he said, might appear on the first toss, or, failing that, it might appear on the second, or, finally, might not appear on either. D'Alembert considered the three possibilities equally likely. It seems reasonable to suppose that, if the principle of insufficient reason were formulated and applied with sufficient care, the conclusion of d'Alembert would appear simply as a mistake. There are, however, more serious examples. Suppose, to take a famous one, that it is known of an urn only that it contains either two white balls, two black balls, or a white ball and a black ball. The principle of insufficient reason has been invoked to conclude that the three possibilities are equally probable, so that in particular the probability of one white and one black ball is concluded to be 1/3. But the principle has also been applied to conclude that there are four equally probable possibilities, namely, that the first ball is white and the second also, that the first is white and the second black, etc. On that basis, the probability of one white and one black ball is, of course, 1/2.
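Both the coin example and the urn example turn on which list of cases is treated as uniform. The sketch below simply counts cases under each competing partition; the helper names (`uniform_probability`, `has_head`, `is_mixed`) and the string encodings of the cases are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def uniform_probability(cases, event):
    """Probability of `event` when every listed case is held equally probable."""
    favorable = sum(1 for case in cases if event(case))
    return Fraction(favorable, len(cases))


def has_head(case):
    """True when at least one toss in the case shows a head."""
    return "H" in case


def is_mixed(case):
    """True when the urn holds one white and one black ball."""
    return set(case) == {"W", "B"}


# Two tosses of a coin, partitioned in two ways.
FOUR_TOSS_CASES = list(product("HT", repeat=2))  # HH, HT, TH, TT
THREE_TOSS_CASES = ["H", "TH", "TT"]  # head at once; tail then head; no head

# The urn, partitioned by composition or by ordered pair of balls.
URN_BY_COMPOSITION = ["WW", "BB", "WB"]
URN_BY_ORDER = list(product("WB", repeat=2))  # WW, WB, BW, BB

if __name__ == "__main__":
    print(uniform_probability(FOUR_TOSS_CASES, has_head))
    print(uniform_probability(THREE_TOSS_CASES, has_head))
    print(uniform_probability(URN_BY_COMPOSITION, is_mixed))
    print(uniform_probability(URN_BY_ORDER, is_mixed))
```

The arithmetic is the same in every call; only the choice of partition differs, which is exactly where the principle gives no guidance.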
Personally, I do not try to arbitrate between the two conclusions but consider that the existence of the pair of them reflects doubt on the notion that a person's knowledge relevant to any matter admits any full and precise description in terms of propositions he knows to be true and others about which he knows nothing. Most holders of personalistic views do not find the principle of insufficient reason compelling, because they envisage the possibility that a person may consider one event more probable than another without having any compelling argument for his attitude. Viewed practically, this position is closely associated with the first criticism of the principle of insufficient reason, for the holder of a personalistic view typically supposes that the person is under the influence of experience, and possibly even biologically determined inheritance, that expresses itself in his opinions, though not necessarily through compelling argument.
Holders of personalistic views do see some truth in the principle of insufficient reason, because they recognize that there are frequently partitions of the world, associated with symmetrical-looking gambling apparatus and the like, that many and diverse people all consider (very nearly) uniform partitions. As was illustrated in the preceding section, we often feel more "sure" about probabilities derived from the judgment that such partitions are uniform than we do about others. Such partitions are, moreover, very important in that they provide some events the probability of which to diverse people is in agreement. Though the events concerned are often of no importance in themselves, agreement about them can, through the statistical invention of randomization, contribute to agreement about all sorts of issues open to empirical investigation. Widespread though the agreement about the near uniformity of some partitions is, holders of personalistic views typically do not find the contexts in which such agreement obtains sufficiently definable to admit of expression in a postulate. Holders of purely objectivistic views see no sense at all in the original formulation of the principle of insufficient reason, for it uses "probability" in a manner they consider meaningless. But they too see an element of truth in the principle, which they consider to be established as a part of empirical physics. Thus, for example, they regard it as an experimental fact, admitting some explanation in terms of theoretical physics, that three dice manufactured with reasonable symmetry will exhibit each of the 216 possible patterns with nearly equal frequency, if repeatedly rolled with sufficient violence on a suitable surface. Holders of personalistic views agree that experiments or, more generally, experiences determine to a large extent when people employ the idea of insufficient reason. Thus, though experiments with gambling apparatus, quite apart from gambling itself, have a fascination that perhaps exceeds their real interest, such experiments are not altogether worthless.
On the one hand, they provide strong evidence that a person cannot expect to maintain a symmetrical attitude toward any piece of apparatus with which he has had long experience, unless he is virtually convinced at the outset that the possible states of the apparatus are equally probable and independent from trial to trial. To say it in the more familiar and sometimes more congenial language of objective probability, long experiments with coins, dice, cards, and the like have always shown some bias, and often some dependence from trial to trial. On the other hand (and this has the utmost practical importance), it has been shown that, with skill and experience, gambling apparatus, or its statistical equivalent, can be manufactured in which the bias and the dependence from trial to trial are extremely small. This implies
that groups of very diverse people can be brought to agree that repeated trials with certain apparatus are nearly uniform and nearly independent. Thus certain methods of obtaining random numbers and other outcomes of uniform and independent trials, which are vital to many sorts of experimentation, have justifiably found acceptance with the scientific public. A stimulating account of practical methods of obtaining random numbers, and random samples generally, is given by Kendall in Chapter 8 (Vol. I) of [K2].
6 How can science use a personalistic view of probability?
It is often argued by holders of necessary and objectivistic views alike that that ill-defined activity known as science or scientific method consists largely, if not exclusively, in finding out what is probably true, by criteria on which all reasonable men agree. The theory of probability relevant to science, they therefore argue, ought to be a codification of universally acceptable criteria. Holders of necessary views say that, just as there is no room for dispute as to whether one proposition is logically implied by others, there can be no dispute as to the extent to which one proposition is partially implied by others that are thought of as evidence bearing on it, for the exponents of necessary views regard probability as a generalization of implication. Holders of objectivistic views say that, after appropriate observations, two reasonable people can no more disagree about the probability with which trials in a sequence of coin tosses are heads than they can disagree about the length of a stick after measuring it by suitable methods, for they consider probability an objective property of certain physical systems in the same sense that length is generally considered an objective property of other physical systems, small errors of measurement being contemplated in both contexts. Neither the necessary nor the objectivistic outlook leaves any room for personal differences; both, therefore, look on any personalistic view of probability as, at best, an attempt to predict some of the behavior of abnormal, or at any rate unscientific,
people. I would reply that the personalistic view incorporates all the universally acceptable criteria for reasonableness in judgment known to me and that, when any criteria that may have been overlooked are brought forward, they will be welcomed into the personalistic view. The criteria incorporated in the personalistic view do not guarantee agreement on all questions among all honest and freely communicating people, even in principle. That incompleteness, if one will call it such, does not distress me, for I think that at least some of the disagreement we see around us is due neither to dishonesty, to errors in reasoning, nor to
friction in communication, though the harmful effects of the latter are almost incapable of exaggeration. As was mentioned in connection with symmetry, there are partitions that diverse people all consider nearly uniform, though not compelled to that agreement by any postulate of the theory of personal probability. As has also been mentioned and as will be explained later (especially in § 14.8), through the statistical invention of randomization, agreement about partitions pertaining to gambling apparatus of no importance in itself can be made to contribute to agreement in every part of empirical science. Another mechanism that brings people having some, but not all, opinions in common into more complete agreement was illustrated in §§ 3.6-7. Indeed, it was there shown that in certain contexts any two opinions, provided that neither is extreme in a technical sense, are almost sure to be brought very close to one another by a sufficiently large body of evidence. It has been countered,+ I believe, that, if experience systematically leads people with opinions originally different to hold a common opinion, then that common opinion, and it only, is the proper subject of scientific probability theory. There are two inaccuracies in this argument. In the first place, the conclusion of the personalistic view is not that evidence brings holders of different opinions to the same opinions, but rather to similar opinions. In the second place, it is typically true of any observational program, however extensive but prescribed in advance, that there exist pairs of opinions, neither of which can be called extreme in any precisely defined sense, but which cannot be expected, either by their holders or any other person, to be brought into close agreement after the observational program.
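The way a common body of evidence draws opinions together, without making them identical, can be illustrated numerically. The sketch below rests on assumptions not made in the text: each person's opinion about a coin is represented by a beta distribution (a convenient conjugate form), both people see the same hypothetical record in which 60 per cent of the tosses are heads, and the names `posterior_mean` and `gap` are illustrative.

```python
from fractions import Fraction


def posterior_mean(prior_heads, prior_tails, heads, tails):
    """Mean of a Beta(prior_heads, prior_tails) opinion after the observed tosses."""
    return Fraction(prior_heads + heads, prior_heads + prior_tails + heads + tails)


def gap(n_tosses, opinion_a=(1, 4), opinion_b=(6, 2)):
    """Distance between two posterior means after n_tosses, 60% of them heads.

    The two prior opinions and the 60% record are hypothetical inputs.
    """
    heads = (6 * n_tosses) // 10
    tails = n_tosses - heads
    mean_a = posterior_mean(*opinion_a, heads, tails)
    mean_b = posterior_mean(*opinion_b, heads, tails)
    return abs(mean_a - mean_b)


if __name__ == "__main__":
    for n in (0, 10, 100, 1000):
        print(n, float(gap(n)))
```

In this toy case the gap shrinks steadily as the record lengthens but never reaches zero, which matches the distinction drawn above between similar opinions and the same opinion.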
I have, at least once, heard it objected against the personalistic view of probability that, according to that view, two people might be of different opinions, according as one is pessimistic and the other optimistic. I am not sure what position I would take in abstract discussion of whether that alleged property of personalistic views would be objectionable, but I think it is clear from the formal definition of qualitative probability that the particular personalistic view sponsored here does not leave room for optimism and pessimism, however these traits be interpreted, to play any role in the person's judgment of probabilities.
+ See (Fisher 1934), p. 287.
CHAPTER 5
Utility
1 Introduction
The postulates P4-6, introduced in Chapter 3, have already led to simplification of the relation ≤ in so far as it applies to acts of a special but important form. Indeed, through the introduction of numerical probability, those special comparisons have been reduced to ordinary arithmetic comparison of numbers in such a way that many relations among acts are deducible by simple and systematic arithmetic calculation. In this chapter it will be shown that the arithmetization of comparison among acts can, with the introduction of one mild new postulate, be extended to virtually all pairs of acts. This far-reaching arithmetization of comparison among acts is achieved by attaching a number U(f) to each consequence f in such a way that f ≤ g if and only if the expected value of U(f) is less than or equal to that of U(g).
> Po if (/ < Po.
jf 11
According to (5), no number, except possibly $\rho_0$, can satisfy the equivalence demanded by the theorem. Finally, using (5) and P6 (much as it was used in the proof of Theorem 2), it follows that $\rho_0$ does indeed satisfy the equivalence. ◆
3 Utility, and preference among gambles
The idea of utility can most conveniently be introduced in connection with gambles or, equivalently, acts that with probability one are confined to a finite number of consequences, thus: A utility is a function U associating real numbers with consequences in such a way that, if $f = \sum_i \rho_i f_i$ and $g = \sum_j \sigma_j g_j$; then f ≤ g, if and only if $\sum_i \rho_i U(f_i) \le \sum_j \sigma_j U(g_j)$. Writing U[f] for $\sum_i \rho_i U(f_i)$, the condition takes the form
(1) $U[f] \le U[g].$
Similarly, it is convenient to understand that, for an act f,
$U[f] = E(U(f)).$
In this notation the following obvious theorem gives a slightly different characterization of utility.
THEOREM 1 A real-valued function of consequences, U, is a utility, if and only if f ≤ g is equivalent to U[f] ≤ U[g], provided f and g are both with probability one confined to a finite set of consequences.
Do the postulates thus far assumed guarantee that any utilities exist at all? Can Theorem 1 be extended to an even wider class of acts? Does a great diversity of utilities exist, or does the relation ≤ practically determine the function U? These questions, here mentioned in the order in which they most naturally arise, are manifestly of great importance in understanding utility. For technical reasons, they will
† Cf., if necessary, any introduction to the theory of the real numbers for explanation of this principle, e.g., Chapter II of [G3].
be answered in a different order: the third followed by the first in this section, and the second in the next section.
If there is a utility at all, there is surely more than one, because a utility plus a constant and a utility times a positive constant are also obviously utilities; thus:
THEOREM 2 If U is a utility, and ρ, σ are real numbers with ρ > 0; then U′ = ρU + σ is also a utility.
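Theorem 2 can be checked on a small case: replacing U by ρU + σ with ρ > 0 changes the expected utilities of gambles but not their order. The consequences, utilities, and gambles below are hypothetical values chosen for illustration, as are the function names.

```python
from fractions import Fraction


def expected_utility(gamble, utility):
    """U[f]: probability-weighted sum of the utilities of a gamble's consequences."""
    return sum(p * utility[consequence] for consequence, p in gamble.items())


def transform(utility, rho, sigma):
    """Return U' = rho * U + sigma, which requires rho > 0."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    return {c: rho * u + sigma for c, u in utility.items()}


# Hypothetical consequences, utilities, and gambles (illustrative assumptions).
UTILITY = {"a": Fraction(0), "b": Fraction(3), "c": Fraction(10)}
GAMBLES = {
    "f": {"a": Fraction(1, 2), "c": Fraction(1, 2)},
    "g": {"b": Fraction(1)},
    "h": {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(1, 4)},
}


def ranking(utility):
    """Gamble names ordered from least to most preferred under `utility`."""
    return sorted(GAMBLES, key=lambda name: expected_utility(GAMBLES[name], utility))


if __name__ == "__main__":
    print(ranking(UTILITY))
    print(ranking(transform(UTILITY, Fraction(7, 2), -20)))
```

The order is preserved because the expected utility of every gamble is itself transformed by the same increasing map, ρx + σ.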
COROLLARY 1 If there exists a utility, and if f < g; then there exists a utility U for which U(f) and U(g) are any preassigned pair of numbers, provided U(f) < U(g).
I : J, U(f) - 8(1), and
s
or
(1) $V[f] = \lim_{\epsilon \to 0} P\{f(s) \ge 1 - \epsilon\}.$
Since the probability in (1) decreases with ε, there is no question about the existence of the limit. Now let W[f] = U[f] + V[f], and define f ≤ g to mean that W[f] ≤ W[g]. Checking postulates P1-6, it will be found that the ≤ thus defined satisfies them all, and that what has here been called U(f) is indeed a utility for ≤. But if, for example, there is an f such that U[f] = V[f] = 1/2, P7 is violated, as can be seen by comparing f to the act that, for each s, takes as value the maximum of 1/2 and f(s). Whether there can be such an f may, so far as I know, depend on the choice of S and P. But, if the positive integers are taken as S, and P is so chosen that though the probability of any one integer is 0 the probability of the set of even integers is 1/2, a possibility assured by the note to Section 3 of Chapter II on p. 231 of [B4], the function equal to 0 at the odd integers and equal to (1 − 1/n) at each even n is such an f. Finite, as opposed to countable, additivity seems to be essential to this example; perhaps, if the theory were worked out in a countably additive spirit from the start, little or no counterpart of P7 would be necessary.+
+ Fishburn (1970, Exercise 21, p. 2lS) has suggested an appropriate weakening of P7.
5.4] THE EXTENSION OF UTILITY TO MORE GENERAL ACTS
Several lemmas depending on P7 are now to be proved preparatory to proving that U[f] governs preference for a very large class of acts. It is to be understood throughout the section that U is any fixed utility. The truth of each lemma is intuitively clear, in the sense that each could justifiably be accepted as a postulate if need be. Since they are also easy to prove and of secondary interest, condensed proofs will suffice.
LEMMA 1 If, for every consequence h, f ≤ h, and g ≤ h; then f ≐ g.
PROOF. Consider in the light of P7 that f ≤ g(s) and g ≤ f(s) for every s. ◆
LEMMA 2 If there exists a consequence $f_0$ such that f ≤ $f_0$, and if $U(f(s)) \le U_0$ for every s, then there exists a gamble g such that f ≤ g and $U[g] \le U_0$.
PROOF. If $U(f_0) \le U_0$, then g can be taken to consist of $f_0$ alone. Otherwise, let $f_1$ be any consequence such that $U(f_1) \le U_0$ and let g be the unique mixture of $f_0$ and $f_1$ such that $U[g] = U_0$. ◆
LEMMA 3
HYP. 1. The $B_i$'s, $i = 1, \ldots, n$, are a partition, and the $U_i$'s are corresponding numbers. 2. f is an act such that $U(f(s)) \le U_i$ for $s \in B_i$. 3. g is a gamble such that g ≤ f.
CONCL. $U[g] \le \sum_i U_i P(B_i)$.
PROOF. If the lemma were false, it would be false even for some g < f. Then it may be assumed, modifying f if need be by means of P6 and Lemma 1, that there exists for each i an $f_i$ such that f ≤ $f_i$ given $B_i$. Now, in view of Lemma 2, there exists for each i a $g_i$ such that f ≤ $g_i$ given $B_i$ and $U[g_i] \le U_i$. Let $h = \sum_i P(B_i) g_i$, and observe that g < f ≤ h. Therefore, $U[g] \le U[h] = \sum_i P(B_i) U[g_i] \le \sum_i P(B_i) U_i$. ◆
Similarly, according to Exercise 3 of Appendix 1,
(4) $\sum_i V_{i-1} P(B_i) \le U[f] \le \sum_i V_i P(B_i).$
Therefore
(5) $|U[f] - U[g]| \le \sum_i (V_i - V_{i-1}) P(B_i) = (U_1 - U_0)/n,$
whence U[f] = U[g]. To consider the remaining case, suppose that the bounded act f exceeds (is exceeded by) every consequence; call it for the moment big (little). According to Lemma 1, all big (and, dually, all little) acts are equivalent to one another. Furthermore, it is, for example, easily seen that, if an act is big, then for ε > 0,
(6) $P\{U(f(s)) \ge \sup U(f) - \epsilon\} = 1.$
(Some may be more familiar with the notation "LUB" and "GLB," read "least upper bound" and "greatest lower bound," than with the corresponding "sup" and "inf," read "supremum" and "infimum." If even these older terms are not familiar, see Exercise 4 of Appendix 2.) Therefore, if there are big (little) acts, they all have the same expected utility, namely sup U(f) (inf U(f)). Suppose now that f ≤ g. It is possible that f and g are both little; that f is little, and g is equivalent to some gamble; that f is little and g big; that f and g are each equivalent to some gamble; that f is equivalent to some gamble, and g is big; or, finally, that they are both big. In each of these cases, a simple argument shows that U[f] ≤ U[g]. The converse arguments are similar. ◆
COROLLARY 1 If f and g are bounded, and P(B) > 0, then f ≤ g given B, if and only if $E(U(f) - U(g) \mid B) \le 0$.
(2) $U(\$500{,}000) > 0.1\,U(\$2{,}500{,}000) + 0.89\,U(\$500{,}000) + 0.01\,U(\$0),$
(3) $0.1\,U(\$2{,}500{,}000) + 0.9\,U(\$0) > 0.11\,U(\$500{,}000) + 0.89\,U(\$0);$
and these are obviously incompatible.
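The incompatibility can be verified directly. Subtracting the right side from the left in each of inequalities (2) and (3) gives two margins that are negatives of one another, so both cannot be positive. The sketch below checks this for randomly chosen utility values; the function names and the random search are illustrative assumptions, and the coefficients are those of (2) and (3) for prizes of $0, $500,000, and $2,500,000.

```python
import random
from fractions import Fraction


def margins(u_0, u_500k, u_2500k):
    """Left side minus right side of inequalities (2) and (3), in that order."""
    margin_2 = u_500k - (
        Fraction(10, 100) * u_2500k
        + Fraction(89, 100) * u_500k
        + Fraction(1, 100) * u_0
    )
    margin_3 = (Fraction(10, 100) * u_2500k + Fraction(90, 100) * u_0) - (
        Fraction(11, 100) * u_500k + Fraction(89, 100) * u_0
    )
    return margin_2, margin_3


def search_counterexample(trials=1000, seed=0):
    """Look for utilities satisfying both (2) and (3); return None if none is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        candidate = tuple(rng.randint(-1000, 1000) for _ in range(3))
        margin_2, margin_3 = margins(*candidate)
        if margin_2 > 0 and margin_3 > 0:
            return candidate
    return None


if __name__ == "__main__":
    print(margins(0, 10, 20))
    print(search_counterexample())
```

The random search is only a sanity check; the algebraic identity margin_2 + margin_3 = 0 is what rules out every assignment of utilities.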
Examples† like the one cited do have a strong intuitive appeal; even if you do not personally feel a tendency to prefer Gamble 1 to Gamble 2 and simultaneously Gamble 4 to Gamble 3, I think that a few trials with other prizes and probabilities will provide you with an example appropriate to yourself. If, after thorough deliberation, anyone maintains a pair of distinct preferences that are in conflict with the sure-thing principle, he must abandon, or modify, the principle; for that kind of discrepancy seems intolerable in a normative theory. Analogous circumstances forced D. Bernoulli to abandon the theory of mathematical expectation for that of utility [B10]. In general, a person who has tentatively accepted a normative theory must conscientiously study situations in which the theory seems to lead him astray; he must decide for each by reflection (deduction will typically be of little relevance) whether to retain his initial impression of the situation or to accept the implications of the theory for it. To illustrate, let me record my own reactions to the example with
† Allais has announced (but not yet published) an empirical investigation of the responses of prudent, educated people to such examples (At].
5.6] HISTORICAL AND CRITICAL COMMENTS ON UTILITY
which this heading was introduced. When the two situations were first presented, I immediately expressed preference for Gamble 1 as opposed to Gamble 2 and for Gamble 4 as opposed to Gamble 3, and I still feel an intuitive attraction to those preferences. But I have since accepted the following way of looking at the two situations, which amounts to repeated use of the sure-thing principle. One way in which Gambles 1-4 could be realized is by a lottery with a hundred numbered tickets and with prizes according to the schedule shown in Table 1.

TABLE 1. PRIZES IN UNITS OF $100,000 IN A LOTTERY REALIZING GAMBLES 1-4

                           Ticket number
                           1       2-11    12-100
Situation 1   Gamble 1     5       5       5
              Gamble 2     0       25      5
Situation 2   Gamble 3     5       5       0
              Gamble 4     0       25      0

Now, if one of the tickets numbered from 12 through 100 is drawn, it will not matter, in either situation, which gamble I choose. I therefore focus on the possibility that one of the tickets numbered from 1 through 11 will be drawn, in which case Situations 1 and 2 are exactly parallel. The subsidiary decision depends in both situations on whether I would sell an outright gift of $500,000 for a 10-to-1 chance to win $2,500,000, a conclusion that I think has a claim to universality, or objectivity. Finally, consulting my purely personal taste, I find that I would prefer the gift of $500,000 and, accordingly, that I prefer Gamble 1 to Gamble 2 and (contrary to my initial reaction) Gamble 3 to Gamble 4. It seems to me that in reversing my preference between Gambles 3 and 4 I have corrected an error. There is, of course, an important sense in which preferences, being entirely subjective, cannot be in error; but in a different, more subtle sense they can be. Let me illustrate by a simple example containing no reference to uncertainty. A man buying a car for $2,134.56 is tempted to order it with a radio installed, which will bring the total price to $2,228.41, feeling that the difference is trifling. But, when he reflects that, if he already had the car, he certainly would not spend $93.85 for a radio for it, he realizes that he has made an error.
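Returning to the lottery of Table 1, the structure that the sure-thing argument relies on can be checked mechanically. The sketch below assumes that each of the hundred tickets is equally likely to be drawn; the names `prize`, `distribution`, and `agree` are illustrative. It confirms that the schedule reproduces the probabilities appearing in inequalities (2) and (3), that within each situation the two gambles coincide on tickets 12 through 100, and that on tickets 1 through 11 the two situations are exactly parallel.

```python
from collections import Counter
from fractions import Fraction

# Ticket blocks and prizes (in units of $100,000) as in Table 1.
BLOCKS = (range(1, 2), range(2, 12), range(12, 101))
PRIZES = {1: (5, 5, 5), 2: (0, 25, 5), 3: (5, 5, 0), 4: (0, 25, 0)}


def prize(gamble, ticket):
    """Prize paid by `gamble` when `ticket` (1 to 100) is drawn."""
    for block, value in zip(BLOCKS, PRIZES[gamble]):
        if ticket in block:
            return value
    raise ValueError("ticket must lie between 1 and 100")


def distribution(gamble):
    """Probability of each prize when all 100 tickets are equally likely."""
    counts = Counter(prize(gamble, ticket) for ticket in range(1, 101))
    return {value: Fraction(count, 100) for value, count in counts.items()}


def agree(gamble_a, gamble_b, tickets):
    """True when two gambles pay the same prize on every listed ticket."""
    return all(prize(gamble_a, t) == prize(gamble_b, t) for t in tickets)


if __name__ == "__main__":
    for gamble in PRIZES:
        print(gamble, distribution(gamble))
    print(agree(1, 2, range(12, 101)), agree(3, 4, range(12, 101)))
    print(agree(1, 3, range(1, 12)), agree(2, 4, range(1, 12)))
```

Since the choice within each situation matters only on tickets 1 through 11, and there Gamble 1 matches Gamble 3 while Gamble 2 matches Gamble 4, the same subsidiary decision settles both situations.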
One thing that should be mentioned before this chapter is closed is that the law of diminishing marginal utility plays no fundamental role
in the von Neumann-Morgenstern theory of utility, viewed either empirically or normatively. Therefore the possibility is left open that utility as a function of wealth may not be concave, at least in some intervals of wealth. Some economic-theoretical consequences of recognition of the possibility of non-concave segments of the utility function have been worked out by Friedman and myself [F12], and by Friedman alone [F11]. The work of Friedman and myself on this point is criticized by Markowitz [M1].+
+ See also Archibald (1959) and Hakansson (1970).
CHAPTER 6
Observation
1 Introduction
With the construction of utility, the theory of decision in the face of uncertainty is, in a sense, complete. I have no further postulates to propose, and those I have proposed have been shown to be equivalent to the assumption that the person always decides in favor of an act the expected utility of which is as large as possible, supposing for simplicity that only a finite number of acts are open to him. At the level of generality that has led to this conclusion there seems to be little or nothing left to say. To go further now means to go into more detail, to investigate special types of decision problems. One type of decision problem of central importance is that in which the person is called upon to make an observation and then to choose some act in the light of the outcome of the observation. The consideration of such observational decision problems is a step toward those problems of great interest for statistics in which the person must decide what observation to make, that is, of course, what to look at, not what to see. They are the problems of designing experiments and other observational programs. Some remarks on observation were made in Chapter 3, but only now that the theory of utility is established is it possible to give a relatively complete analysis of the concept. Observation is a concept essential to the study of statistics proper, most of what has been said thus far being preliminary to, but not really part of, statistics; even after this chapter and the next one, on observation, there will still remain a major transition. One important feature of much of what is ordinarily called statistics is, according to my analysis, concerned with the behavior not of an isolated person, but of a group of persons acting, for example, in concert.
In later chapters I will deal, so far as I am able, with the problem of group action, but preliminary considerations bearing on it will be made and pointed out from time to time in this chapter and the next. Though the details of these two chapters may seem mathematically forbidding, drastic simplifying assumptions are made in them to keep extraneous difficulties to a minimum. These typically take the form of assuming that certain sets of acts, events, and values of random variables are finite. Even in elementary applications of the theory, these simplifying assumptions seldom actually hold. In some contexts, it is quite elementary to relax them sufficiently; in others, serious mathematical effort has been required; and some are still at the frontier of research. Relaxations of the assumptions will be touched on from time to time, sometimes explicitly but sometimes only implicitly in the choice of suggestive notation and nomenclature.

Beyond this introduction, the present chapter is divided into four sections: § 2 analyzes informally and then formally the notion of a cost-free observation; §§ 3 and 4 discuss certain obvious but important conditions under which one observation, and similarly one set of acts, is more valuable than another; § 5 abstractly discusses problems of designing experiments or, perhaps more generally, observational programs.
2 What an observation is

To begin with an informal survey of observation, consider a decision problem, that is, a person faced with a decision among several acts. Calling it the basic decision problem and the acts associated with it the basic acts, a new decision problem would arise, if the person were informed before he made his decision that a particular event, say B, obtained. The new decision problem is related to the basic decision problem in a simple way; for the acts associated with it are also the basic acts, and the decision is to be made by computing the expected utility given B of the basic acts and deciding on one that maximizes the conditional expected utility.

The basic problem may be modified in still another, though closely related, way. Let the person say in advance, for each possible B_i, which of the basic acts he will decide on when he is informed, as he is to be, which element B_i of a given partition obtains. This will be called the derived decision problem arising from the basic decision problem and the observation of i, and its acts will be called derived acts. Technically speaking, the derived acts are determined by arbitrarily assigning one basic act to each element of the partition. For any state s, the consequence of a derived act is the consequence for s of the basic act associated with the particular B_i in which s lies. The terms informally introduced in this paragraph are defined formally later in the section. A derived decision problem is not necessarily different in kind from the basic problem; indeed it is quite possible that the basic problem can itself be viewed as derived from some other basic problem and observation.
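The construction of derived acts can be sketched in a few lines of Python. The states, the partition, the act names, and the utilities below are hypothetical and serve only to illustrate the definition; they are not taken from the text.

```python
from itertools import product

# Hypothetical toy problem: four states, a twofold partition, two basic acts.
partition = {"B1": {"s1", "s2"}, "B2": {"s3", "s4"}}
basic_acts = {
    "stay": {"s1": 0, "s2": 0, "s3": 0, "s4": 0},
    "go": {"s1": -2, "s2": -1, "s3": 1, "s4": 3},
}

def derived_acts(basic_acts, partition):
    """Every way of assigning one basic act to each element of the partition."""
    cells = list(partition)
    return [dict(zip(cells, choice))
            for choice in product(basic_acts, repeat=len(cells))]

def consequence(derived, state):
    """Consequence of a derived act at a state: the consequence, at that state,
    of the basic act assigned to the partition element containing the state."""
    cell = next(name for name, members in partition.items() if state in members)
    return basic_acts[derived[cell]][state]

all_derived = derived_acts(basic_acts, partition)
```

With two basic acts and a twofold partition there are 2^2 = 4 derived acts; the two that assign the same basic act to both elements are the basic acts themselves, regarded as derived acts.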
Formidable though the description of a derived problem may seem at first reading, its solution is, in a sense, easy and has already almost been given; for it is clear that, if P(B_i) > 0, the person will decide to associate with B_i a basic act the expected utility of which given B_i is as high as possible, and, if P(B_i) = 0, it is immaterial to the person which basic act is associated with B_i.

It is almost obvious that the value of a derived problem cannot be less, and typically is greater, than the value of the basic problem from which it is derived. After all, any basic act is among the derived acts, so that any expected utility that can be attained by deciding on a basic act can be attained by deciding on the same basic act considered as a derived act. In short, the person is free to ignore the observation. That obvious fact is the theory's expression of the commonplace that knowledge is not disadvantageous.

It sometimes happens that a real person avoids finding something out or that his friends feel duty bound to keep something from him, saying that what he doesn't know can't hurt him; the jealous spouse and the hypochondriac are familiar tragic examples. Such apparent exceptions to the principle that forewarned is forearmed call for analysis. At first sight, one might be inclined to say that the person who refuses freely proffered information is behaving irrationally and in violation of the postulates. But perhaps it is better to admit that information that seems free may prove expensive by doing psychological harm to its recipient. Consider, for example, a sick person who is certain that he has the best of medical care and is in a position to find out whether his sickness is mortal. He may decide that his own personality is such that, though he can continue with some cheer to live in the fear that he may possibly die soon, what is left of his life would be agony, if he knew that death were imminent.
Under such circumstances, far from calling him irrational, we might extol the person's rationality, if he abstained from the information. On the other hand, such an interpretation may seem forced. (Cf. Criticism (f) of § 5.6.)

Examples of decisions based on observation are on every hand, but it will be worth while to examine one in some detail before undertaking an abstract mathematical analysis of such decisions. Any example would have to be highly idealized for simplicity, because the complexity of any real decision problem defies complete explicit description, but particular simplicity is in order here. The person in the example is considering whether to buy some of the grapes he sees in a grocery store and, if so, in what quantity. To his taste, the grapes may be of any of three qualities, poor, fair, and excellent. Call the qualities Q generically and 1, 2, and 3 individually. From
what the person knows at the moment, including of course the appearance of the grapes, he cannot be certain of their quality, but he attaches personal probability to each of the three possibilities according to Table 1.

TABLE 1. P(Q)

Q(uality)         1      2      3
P(robability)    1/4    1/2    1/4
The person can decide to buy 0, 1, 2, or 3 pounds of grapes; these are the basic acts of the example. Taking one consideration with another, he finds the consequences of each act, measured in utiles, in each of the three possible events to be those given in the body of Table 2. The expected utilities in the right margin of Table 2 follow, of course, from Table 1 and the body of Table 2.

TABLE 2. UTILITY f(Q) FOR EACH f AND EACH Q

             Q
f        1      2      3     E(f)
0        0      0      0      0
1       -1      1      3      1
2       -3      0      5     1/2
3       -6     -2      6     -1
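The right margin of Table 2 can be checked mechanically. The following is a minimal sketch, using exact fractions, of the computation E(f) = sum over Q of P(Q) f(Q); the variable and function names are illustrative choices, not notation from the text.

```python
from fractions import Fraction

# Table 1: personal probabilities of the three qualities Q = 1, 2, 3.
p_quality = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}

# Body of Table 2: utility of buying f pounds when the quality is Q.
utility = {
    0: {1: 0, 2: 0, 3: 0},
    1: {1: -1, 2: 1, 3: 3},
    2: {1: -3, 2: 0, 3: 5},
    3: {1: -6, 2: -2, 3: 6},
}

def expected_utility(f):
    """Expected utility of the basic act 'buy f pounds'."""
    return sum(p_quality[q] * utility[f][q] for q in p_quality)

# Right margin of Table 2, and the act that maximizes it.
margin = {f: expected_utility(f) for f in utility}
best_act = max(margin, key=margin.get)
```

Here `margin` reproduces the column E(f) as 0, 1, 1/2, -1, and `best_act` is the act of buying 1 pound.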
The entries in Table 2 have not been chosen haphazardly, but with an attempt at verisimilitude. Thus it is supposed that if the person buys grapes of poor quality his dissatisfaction with the bargain will accelerate rapidly with the amount bought, which seems reasonable, especially if the keeping quality of poor grapes is low. He is, of course, unaffected by the quality if he buys none. Again, buying a few fair grapes may be mildly desirable, but overbuying is not. Finally, excellent grapes are worth buying, even in large quantities, but the utility of the purchase increases less than proportionally to the amount bought. The correct solution of the basic decision problem is to buy 1 pound of grapes; for that act has, according to the right margin of Table 2, an expected utility of 1, which is the largest that can be attained.

Now, suppose the person is free to make an observation, that is, a new observation in addition to those that may have contributed to the determination of the probabilities in the basic problem. It may be, for example, that the grocer invites him to eat a few of the grapes or that the person is going to ask the woman beside him how they look to her. Let there be five possible outcomes of his observation; call them x
generically and 1, 2, 3, 4, and 5 individually. I assume, though this feature is rather incidental to the example, that low values of x tend to be suggestive of low quality. The joint distribution of x and Q, that is, the probability that x and Q simultaneously have any given pair of values, is of central technical importance. Those probabilities, each multiplied by 128 for simplicity of presentation, are given in the body of Table 3. The right-hand and bottom margins of the table give, also multiplied by 128, the probability of each value of x and of each value of Q. The marginal entries are, of course, obtained by adding rows and columns. As indicated in the lower right-hand corner of the table, the probabilities assumed do indeed add up to 1, and the bottom margin recapitulates Table 1.

TABLE 3. 128 P(x ∩ Q)

                  Q
x            1      2      3     128 P(x)
1           15      5      1        21
2           10     15      2        27
3            4     24      4        32
4            2     15     10        27
5            1      5     15        21
128 P(Q)    32     64     32       128

Conditional probabilities can easily be read from Table 3. Thus, for example, the conditional probability that x is 2, given that Q is 3, is 2/32, and the conditional probability that Q is 2, given that x is 4, is 15/27. It will be seen in later sections that the distribution of x given Q is, in a sense, even more fundamental than the joint distribution of x and Q.

There are 4^5 = 1,024 derived acts, since one of the four basic acts can be assigned arbitrarily to each of the five possible outcomes of the observation. It is an easy exercise, using Tables 2 and 3, to verify Table 4, which shows the conditional expectation of the utility of each [...]

[...] then x is an extension of y, and y is a contraction of x. If x is an extension of y, and y is an extension of x, then x and y are equivalent.

Strictly speaking, one should say not that x and y are equivalent, but rather that they are equivalent regarded as observations, for this would not be a good concept of equivalence to apply to random variables regarded as such. For example, a pair of equivalent observations can obviously be a pair of real random variables with different expected values.

Some properties of the relations of extension, contraction, and equivalence between observations are given by the following easy but important exercises. Throughout this set of exercises it is unnecessary to suppose the observations confined to a finite set of values; in the case of Exercise 3b, it is impossible to do so.

Exercises

1. x and y are equivalent, if and only if x is both an extension and a contraction of y.
2a. If P{x(s) = y(s)} = 1, x and y are equivalent.
2b. Any observation x is equivalent to itself.
3a. If there is a value y_0 such that P{y(s) = y_0} = 1, then every x is an extension of y, and any two such observations are equivalent. Such an observation, of course, amounts to observing nothing at all and will therefore be called a null observation.
3b. If x(s) = s for almost all s ∈ S, then x extends every y.
4. If x is an extension of y, and y is an extension of z, then x is an extension of z. State and verify the analogous fact about equivalence.
5a. If y is a function associating an element of Y with each element of X, and x is an observation, then the observation y such that y = y(x) is a contraction of x.
MULTIPLE OBSERVATIONS, AND EXTENSIONS
5b. If y is a contraction of x, then there is a function y such that P{y(s) = y(x(s))} = 1. What freedom is there in the choice of the function y?
5c. What are the implications of Exercises 5a and 5b for equivalence between observations?
6. If x and y are observations and z = {x, y} is the corresponding double observation, then z is an extension of x and of y. (This exercise seems to call for a converse saying that every extension can be regarded as a double observation, but no really neat one suggests itself to me. None the less, in thinking about extensions and contractions, the sort brought out by the exercise is a typical and stimulating example.)
7. {x, y} is equivalent to x, if and only if x extends y.

The relations of extension, contraction, and equivalence have parallels for sets of acts, defined thus: If F and G are (non-vacuous) sets of acts such that, for some B of probability one, there is for each g ∈ G an f ∈ F with f(s) = g(s) for all s ∈ B; then F is an extension of G, and G is a contraction of F. If F is an extension of G, and G is an extension of F, then F and G are equivalent.

More exercises

8. If F is an extension of (equivalent to) G, then v(F) ≥ (=) v(G).
9. Discuss the analogues of Exercises 1, 2b, and 4 for sets of acts.
10. If F ⊃ G, then F extends G.
11. If F(x) is derived from F on observation of x, then F(x) extends F.
12. HYP.
   F(x) is derived from F on observation of x;
   F(y) is derived from F on observation of y;
   F(x, y) is derived from F on observation of {x, y};
   F(x; y) is derived from F(x) on observation of y.
   CONCL.
   1. F(x, y) is equivalent to F(x; y).
   2. F(x, y) extends F(x) and F(y).
   3. If x is equivalent to y, then F(x) is equivalent to F(y).
   4. If y extends x, then F(x, y) is equivalent to F(y), F(y) is equivalent to F(x; y), and F(y) extends F(x).
13a. Under the hypothesis of 12, the equivalences and relations of extension among the sets of acts arising out of two observations can, with evident conventions, be diagrammed thus:
[Diagram lost in extraction. Recoverable node labels: "x, y", "x; y", "y; x", "x", "y", "0".]
13b. If y extends x, the diagram becomes
[Diagram lost in extraction. Recoverable node labels: "x, y", "x; y", "y; x", "y", "x", "0".]
13c. If x and y are equivalent, the diagram becomes
[Diagram lost in extraction. Recoverable node labels: "x, y", "x; y", "x", "y; x", "y", "0".]
14. If F(x) and G(x) are derived from F and G, respectively, and if F extends G, then F(x) extends G(x).
15. v(F(x)) = E[v(F | x)] = ∫ v(F | x(s)) dP(s) ≥ v(F).
4 Dominance and admissibility
According to Exercise 3.14, if one set of acts, regarded as basic, extends another, the first is at least as valuable as the second in the light of any observation whatever. This section explores a relation, dominance, which has the same property but is not so strict as extension.

Dominance is of some importance for the theory of personal probability as it has been developed thus far. But its importance will be even greater in the study of statistics proper, where interpersonal agreement is of particular interest; for, as the definition shortly to be given will make clear, two people having different personal probabilities will agree as to whether one of two sets of acts dominates another, if only they agree which events have probability zero, a condition generally met in practice, and one that could if desired be dispensed with by a slight change in the definition of dominance.

It will be seen that dominance and notions related to it are intimately associated with the sure-thing principle. Indeed, probability being taken for granted, the basic facts about dominance seem to give a complete expression of the sure-thing principle. Dominance and related concepts were much stressed by Wald, in [W3] for example.
Two or three notions, the logical connections among them, and those between them and extension, are to be treated. The logical connections being many but simple, I think that the material lends itself better to formal than to expository treatment, for in such a context the reader who looks for the motivating ideas sees them himself more easily than he comprehends someone else's verbalization of them. This section will therefore consist primarily of a group of formal definitions and several exercises.
If and only if P{f(s) ≥ g(s)} = 1, f dominates g.
If and only if some (every) element of F dominates (is dominated by) g, F dominates (is dominated by) g.
If and only if F dominates every element of G, F dominates G.
If and only if f dominates g, but g does not dominate f, f strictly dominates g.
If and only if f ∈ F, and f is not strictly dominated by any element of F, f is admissible (with respect to F).
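For a finite set of states, the definitions above can be sketched as follows. On a finite state space, P{f(s) ≥ g(s)} = 1 holds exactly when f(s) ≥ g(s) at every state of positive probability, and the code relies on that. The states, probabilities, and acts are hypothetical and chosen only for illustration.

```python
def dominates(f, g, prob):
    """f dominates g: f(s) >= g(s) except on a set of states of probability zero."""
    return all(f[s] >= g[s] for s in prob if prob[s] > 0)

def strictly_dominates(f, g, prob):
    """f dominates g, but g does not dominate f."""
    return dominates(f, g, prob) and not dominates(g, f, prob)

def admissible(acts, prob):
    """Names of the acts not strictly dominated by any element of the set."""
    return [name for name, f in acts.items()
            if not any(strictly_dominates(g, f, prob) for g in acts.values())]

# Hypothetical example: state s3 has probability zero and is therefore ignored.
prob = {"s1": 0.5, "s2": 0.5, "s3": 0.0}
acts = {
    "a": {"s1": 1, "s2": 2, "s3": 0},
    "b": {"s1": 1, "s2": 1, "s3": 5},
    "c": {"s1": 0, "s2": 3, "s3": 0},
}
admissible_names = admissible(acts, prob)
```

In this example act a strictly dominates act b, even though b is better at s3, because s3 has probability zero; acts a and c do not dominate each other, so both are admissible. Only the question of which states have probability zero enters the computation, which is the point made in the text about interpersonal agreement.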
Involving as they do acts as well as sets of acts, the definitions, strictly speaking, introduce four different kinds of dominance. However, this complexity can be alleviated, with a slight lapse of logic, by identifying each act f with the set of acts of which f is the only element, for it is easily seen that this identification is in such harmony with the definition that, once it is made, the four kinds of dominance collapse into one.

Exercises

1a. Consider analogues of Exercises 3.2b and 3.4.
1b. When can two acts dominate each other?
2a. If F extends G, then F dominates G. Discuss the converse.
2b. F(x) dominates F.
2c. If F ⊃ G, then F dominates G.
3a. If F ⊂ G, and F dominates G, then each admissible element of G dominates and is dominated by an element of F.
3b. After any finite number of non-admissible elements is deleted from F, what remains of any subset of F that dominated F continues to dominate F.
3c. Though the set of admissible elements of F may in some instances dominate F, no proper subset of the set of admissible elements can ever do so; but, if any other subset dominates F, some proper subset of it also does so.
3d. If F is finite, the set of admissible elements of F dominates F.
3e. Discuss the role of "finite" in 3b and 3d.
4a. If the set of admissible elements of F dominates G, and G dominates F, then the set of admissible elements of F is equivalent to the set of admissible elements of G.
4b. If F and G dominate each other, and either is finite, then the sets of admissible elements of F and G, respectively, are equivalent to each other, and each dominates both F and G.
5. If F dominates G, then v(F) ≥ v(G).
6. If F dominates G, then, for any observation x, F(x) dominates G(x).
5 Outline of the design of experiments
Often, especially in statistics, a decision problem can be seen as the problem of deciding which of several experiments (or which of several observational programs, if that is really a more general term) to undertake. In this section the notion of the decision problem derived from a basic decision problem and an observation must be elaborated a little, because, as derived acts have been treated thus far, they correspond to the possibility of making an observation free of charge. Though observations are sometimes free, there is typically a cost associated with making them; information must typically be bought either from other people or, more often, from nature, so to speak. The cost of information may be money, trouble, one's own life, that of another, or any of innumerable possibilities, but all can in principle be measured in terms of utility. The cost of an observation in utility may be negative as well as zero or positive; witness the cook that tastes the broth.

In principle, if a number of experiments are available to a person, he has but to choose one whose set of derived acts has the greatest value to him, due account being taken of the cost of observation. That simple formulation, like some others in this book, is, in a sense, oversimple; it abstracts from the enormous variety of considerations that enter into the careful design of any experiment. The possibility of so abstracting from variety does not remove the ultimate necessity of studying some aspects of that variety in detail. R. A. Fisher's The Design of Experiments [F4], for example, is concerned almost exclusively with experiments based on a special technique called the analysis of variance, and it is but an introduction to even that important facet of statistics. Again, there is a growing literature (in which the work of A. Wald is outstanding) on sequential analysis, which is concerned in principle with all experiments in which later parts of the experiment are conducted in the light of what happens in earlier parts; but this literature has, by necessity, been confined to a relatively tiny part of that domain.
Before turning to a more formal recapitulation of the outline of the design of experiments, this may be a good place for a few speculative words about the difference, if any, between experiment and observation.

Some sciences are commonly called experimental as opposed to others that are called observational. Aerodynamics, the psychology of rote learning, and the genetics of fruit flies would typically be called experimental sciences; and, to take parallel examples, meteorology, the psychology of dreams, and human genetics would be called observational. But it is widely agreed, and the most casual consideration makes it clear, that any basic difference that may really be present resides not in the sciences themselves but in the methods typical of each. To illustrate the role of observation in sciences ordinarily considered experimental and vice versa, observations of wild populations of fruit flies have been useful in the study of the genetics of fruit flies; the effects of fatigue, for example, on dream content may well be the subject of an experiment; and, except for the atom, no topic in science is more popular today than experimental rain making. The illustrations could be extended indefinitely, and there is also a less direct sort exemplified by the discipline called experimental medicine, which typically studies experiments on animals with the hope, often justified, that the findings thus obtained can be extrapolated to humans.

The problem, then, is to distinguish an experiment from an observation. Except for brevity, it might be better to say mere observation, for, in general usage, an experiment would be considered a special sort of observation. The first apparent contrast that comes to mind is that experimentation is generally thought of as active and observation as passive. But, upon examination, it is seen that observation is also active, for observations are typically made by going somewhere to observe, or waiting attentively till something happens. Often it is not only the observer himself who must be transported and put in readiness to make an observation, but also a considerable body of apparatus. What demands more activity than the modern observation of a solar eclipse?

Another apparent contrast is that the experimenter acts on the thing he observes, whereas the observer acts only on himself and on instruments of observation that may be regarded as extensions of his own sense organs. If this criterion were accepted altogether naively, there would be no such thing as a physiological experiment on one's self; even sophisticated interpretations might find it difficult to embrace psychological experiments on one's self.

Finally, experiments as opposed to observations are commonly supposed to be characterized by reproducibility and repeatability. But
the observation of the angle between two stars is easily repeatable and with highly reproducible results in double contrast to an experiment to determine the effect of exploding an atomic bomb near a battleship. All in all, however useful the distinction between observation and experiment may be in ordinary practice, I do not yet see that it admits of any solid analysis. At any rate, no formal use of the distinction will be attempted in this book.

Return now to the notion of observation subject to cost. It may be that the value of the random variable x is observable but only at a cost c, a real-valued random variable measured in utiles. If, as heretofore, F(x) denotes the set of acts derived from F on cost-free observation of x, let F(x) - c denote the set of derived acts subject to the random cost c. This notation is interpreted to mean that, if f is the generic element of F(x), then f - c (which, being a utility-valued function of s, is an act) is the generic act of the set F(x) - c. Very often the cost of an observation is independent of s, but not, for example, for him that tests the sharpness of a thorn with his finger.

Since observations are typically paid for before, or simultaneously with, making the observation, the cost is typically observed along with the observation proper. Put differently, the cost c is typically a contraction of the observation x. Thus, if in some special context any advantage were to be gained by so doing, it would not be drastic to assume the cost of observing x to be a function of the form γ(x); but, as a matter of fact, no such advantage has come to my attention. It is not difficult to think of experiments to which the assumption does not apply. For example, in the present state of uncertainty about the long-term effects of x-rays, anyone conducting a short-term experiment in which young human beings were subjected to large doses of x-radiation would risk costs that might not overtly manifest themselves for half a century, or even for generations.
Much that would ordinarily be called observation cannot be described by saying that the random cost is simply to be subtracted from each derived act of the corresponding observation thought of as free of cost. Allowing that it may be legendary, the form of trial by ordeal in which the guilty floated safely to be hanged and the innocent drowned to be exonerated epitomizes such a situation; except in point of absurdity, ordinary industrial destructive testing of electric fuses and other products is much the same. Strictly speaking, discrepancy occurs even in the ordinary context in which the cost of observation is a fixed sum of money; for the utility of money is not strictly linear, so the cost of observation typically affects different derived acts somewhat differently. This sort of situation is indeed so common as to introduce at least a
slight error into almost every application of the notion of cost as a subtractive term. It would therefore be desirable to extend considerably the notion of cost of observation, but, thus far, I see no way to do so that does not destroy the mathematical advantage of singling problems of observation out of the class of decision problems generally.

It is convenient now to analyze the appropriateness of regarding the number v(F) as a measure of the value of F. As must already be clear to the reader, if a person is to make a preliminary decision limiting his next decision to one or another of several sets of acts, say, F, G, and H, then his preliminary decision will select a set that has the highest value of v, and the preliminary and secondary decisions, regarded as a single grand decision, amount to the problem of deciding on an act from F ∪ G ∪ H. So far as this use of v is concerned, any increasing monotonic function of v such as v^3 or 3^v would be equally satisfactory, but v has an advantage in arithmetic simplicity when costs of observation are involved.

Consider, for example, the problem of whether to make a particular observation at the random cost c or to make no observation at all. The two sets of acts involved may then be symbolized by (F(x) - c) and F, respectively. The peculiar simplicity of v as a measure of the value of a set of acts, in this context, is exhibited by the almost obvious fact that v(F(x) - c) = v(F(x)) - E(c). It may be remarked in passing that v is a particularly good measure in any problem where F, G, or H is, so to speak, made available by lot, a possibility realized in (7.3.2), for example.

Finally, if one among several observations is to be chosen, each with its own random cost (possibly including the null observation), the person will choose an observation for which v(F(x)) - E(c) is as large as possible. If the number of observations among which decision is to be made is infinite, that function may not attain a maximum value, but the value of the situation to the person can reasonably be regarded as the supremum of the function; there are, of course, observations among those available for which the supremum is arbitrarily nearly attained.
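The comparison of v(F(x)) - E(c) with v(F) can be illustrated with the grape example of § 2. The sketch below takes the utilities from the body of Table 2 and the joint probabilities from the body of Table 3; the expected costs tried at the end are hypothetical, and the function names are illustrative.

```python
from fractions import Fraction

# Body of Table 2: utility of buying f pounds, indexed by quality Q = 1, 2, 3.
utility = {0: (0, 0, 0), 1: (-1, 1, 3), 2: (-3, 0, 5), 3: (-6, -2, 6)}

# Body of Table 3: 128 * P(x and Q) for x = 1..5 (rows) and Q = 1, 2, 3 (columns).
joint_128 = {1: (15, 5, 1), 2: (10, 15, 2), 3: (4, 24, 4),
             4: (2, 15, 10), 5: (1, 5, 15)}

def value_basic():
    """v(F): the largest expected utility attainable without observing x."""
    p_q = [Fraction(sum(row[q] for row in joint_128.values()), 128)
           for q in range(3)]
    return max(sum(p * u for p, u in zip(p_q, utility[f])) for f in utility)

def value_derived():
    """v(F(x)): for each outcome x choose the best basic act, then sum over x."""
    return sum(
        max(sum(Fraction(n, 128) * u for n, u in zip(row, utility[f]))
            for f in utility)
        for row in joint_128.values()
    )

def worth_observing(expected_cost):
    """True when v(F(x)) - E(c) exceeds v(F)."""
    return value_derived() - expected_cost > value_basic()
```

Under these tables `value_basic()` is 1 and `value_derived()` is 161/128, so the cost-free observation is worth 33/128 of a utile. With a hypothetical expected cost of 1/4 the observation is still worth making; with a hypothetical expected cost of 1/2 it is not.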
CHAPTER 7
Partition Problems

1 Introduction

In the introduction of the preceding chapter it was explained that the treatment of decision problems in general had been carried to a logical conclusion, and that to study decision problems further it had become necessary to specialize. The notion of observation was accordingly chosen as the subject of specialization. The situation now repeats itself at a new level, for I have now covered the main points that occur to me about observation in general, though I see considerably more to say about a certain type of observation.

The type of observation problem to which the present chapter is devoted, though relatively special, is still very general. Indeed, its generality is suggested by the fact that no other type of problem is systematically treated in modern statistics. In objectivistic terms, it would be described as the type of decision problem in which the consequence of each basic act depends only on which of several (possibly infinitely many) probability distributions does in fact apply to the random variable to be observed. Modern statistics has no name for this type of problem, because it recognizes no other type; and no particularly suggestive name occurs to me. I am therefore tentatively adopting the noncommittal name "partition problem." Such motivation as there is for that name will be apparent when the concept is defined.

In non-objectivistic terms, a partition problem has the following structure. There are, of course, basic acts F and an observation x. The peculiar feature is a random variable b, which is typically not subject to observation, with the property that every f in F is constant given that b has any particular value b. In many practical problems b takes on an infinity, even a non-denumerable infinity, of values, but systematic consideration of such problems would involve those advanced mathematical techniques that are explicitly being avoided in this book.

Glossing over such questions of technique for the moment, the state of the world, which is itself a
random variable, might play the role of b; with respect to this b, any observational decision problem would presumably be a partition problem. It may, therefore, be inaccurate to call partition problems special, but they are special whenever b is not equivalent to the state of the world.

As has just been mentioned, the general policy of this book with respect to mathematical technique restricts formal treatment of partition problems here to those in which b assumes only a finite number of different values, that is to say, those in which b is to all intents and purposes a partition B_i, whence the name "partition problem." For the reader who is not familiar with the elements of the geometry of n-dimensional convex bodies, there will be a distinct expository advantage in confining the formal treatment still further to twofold partitions. At the same time, by explicit statements and by the use of suggestive notation, all readers will be given at least some idea of the extension of the theory to n-fold partitions; indeed, a reader familiar, for example, with Sections 16.1-2 of [V4], or with [B20] will find the extension as plain as if it had been made explicitly. Thus the restriction to twofold as opposed to n-fold partitions will be to the advantage of some and to the disadvantage of none.

Partition problems are even closer than are observational problems generally to the subject matter of statistics proper. In particular, in the course of this chapter, multipersonal considerations will from time to time be pointed out in connection with partition problems.

2 Structure of (twofold) partition problems
A central feature of a twofold partition problem is, of course, a twofold partition, or dichotomy, $B_i$, $i = 1, 2$. By way of abbreviation let $\beta(i) = P(B_i)$, and $\beta = \{\beta(1), \beta(2)\}$. The $\beta(i)$'s can be any two numbers such that $\beta(i) > 0$ and $\sum_i \beta(i) = \beta(1) + \beta(2) = 1$. Since $\beta(2) = 1 - \beta(1)$, it might seem superfluous to have a special notation for $\beta(2)$; but this redundancy more than pays for itself in symmetry, especially in the extension of the theory to $n$-fold partitions. The possibility that one of the $\beta(i)$'s vanishes has been ruled out, for it is neither typical nor interesting, and its retention would mar the exposition of the theory. Each basic act $f \in F$ is characterized by a pair of numbers $f_i$ such that
(1)
for each $i$. The technical assumption will be made that as $f$ ranges over $F$ the numbers $f_i$ are bounded from above for each $i$, which is a little more stringent than the now familiar assumption that $v(F) < \infty$.
The assumption expressed by (1) is made for definiteness and simplicity, though its full force will seldom be used. The possibility of relaxing (1) in certain contexts will be mentioned from time to time, especially since this possibility is of some interest even in the exploitation of (1) itself. In particular, for several pages now it will scarcely ever be necessary to assume anything about the structure of $F$ relative to $B_i$, except that $E(f \mid B_i)$ is bounded from above for each $i$; for making the abbreviation $f_i = E(f \mid B_i)$, almost everything from here through Exercise 1 applies verbatim. The expected utility of any $f \in F$ can be computed in several forms thus:
$$
\begin{aligned}
E(f) &= E(f \mid B_1)P(B_1) + E(f \mid B_2)P(B_2) \\
&= f_1\beta(1) + f_2\beta(2) \\
&= \sum_i f_i\,\beta(i) \\
&= f_2 + (f_1 - f_2)\,\beta(1).
\end{aligned}
\tag{2}
$$
The first of these forms expresses the expected value in general terms; the second utilizes abbreviations; the third is an obvious mathematical transcription of the second, particularly suggestive of extension to the $n$-fold situation; the fourth sacrifices the symmetry exhibited by the preceding three in order to take advantage of the relation between $\beta(1)$ and $\beta(2)$.

From the fourth form of (2), it is clear that, for fixed $f$, $E(f)$ is a linear function of $\beta(1)$. Henceforth that fact, for example, would be expressed in symmetric form by saying that $E(f)$ is linear in $\beta$, and the dependence of $E(f)$ on $\beta$ might be explicitly indicated by writing $E(f \mid \beta)$. Since in any one decision problem $\beta$ is constant, it might seem pointless to emphasize that $E(f \mid \beta)$ is linear in $\beta$. But there are, in fact, two different reasons for being interested in variation of $\beta$. In the first place, once the observation $\mathbf{x}$ has been observed to have the value $x$, the basic, or a priori, decision problem is replaced by an a posteriori problem in which $P(B_i \mid x)$ plays the role originally played by $P(B_i) = \beta(i)$. Second, interest in comparing different people is becoming increasingly more explicit as the book proceeds. In particular, it is of interest to compare people who have available the same set of basic acts and who, at least so far as the distribution of $\mathbf{x}$ and the acts in $F$ are concerned, have the same conditional personal probability given $B_i$, but who attach different probabilities $\beta(i)$ to the elements of the partition.
To emphasize its dependence on $\beta$, $v(F)$ will sometimes be written $v(F \mid \beta)$; its computation in the following fashion is fundamental to the theory of partition problems.
$$
v(F \mid \beta) = \sup_{f \in F} E(f \mid \beta) = \sup_{f \in F}\,[\,f_1\beta(1) + f_2\beta(2)\,] = k(\beta),
\tag{3}
$$
where $k(\beta)$ is defined by the equation in which it occurs. According to Exercise 4 of Appendix 2, the function $k$ is convex in $\beta$, that is, $k$ is convex when recognized as a function of $\beta(1)$ alone. Interpreted as a pair of a priori probabilities, $\beta$ is confined to the open interval defined by $\sum_i \beta(i) = 1$, $\beta(i) > 0$, but it is valuable to recognize that $k$ is defined, convex, and continuous on the closed interval $\sum_i \beta(i) = 1$, $\beta(i) \geq 0$.

Many typical features of the relationship between $F$ and $k$ are illustrated graphically by Figure 1. The abscissa of that graph represents both $\beta(1)$ and $\beta(2)$, as indicated, and the ordinate is measured in utiles. The straight lines, the left ends of which are marked $a$, $b$, $c$, $d$, and $e$, graph as functions of $\beta$ the expected values of the five basic acts of the particular problem represented. The ordinates at their right and left ends, respectively, are the corresponding values of the $f_1$'s and $f_2$'s. The graph of $k$ is marked by heavy line segments. It is seen that the lines $a$, $c$, and $e$, and they alone, touch the graph of $k$, for they represent the only acts that are optimal for some value of $\beta$. The act represented by $d$ is inadmissible (if (1) is taken literally), being in fact strictly dominated by every other act except $e$, and it is therefore superfluous to the person, no matter what the value of $\beta$; $b$ is obviously equally superfluous, but for a different reason. In many typical problems in which $F$ has an infinity of elements, $k$ is, unlike the $k$ in Figure 1, strictly convex; that is, its only intervals of linearity are point intervals.

Figure 1

Exercise 1. Compute and graph $k$ for the set $F$ of dichotomous acts of the form
12(.> ..
1 - (1 - .)'; AnatHr.
i(ft) - lI'(l) - ~(2)f'
c:
(2,8(1) - Ir.
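The computation in (3) and the reading of Figure 1 can be imitated numerically. The following Python sketch is an illustration only: the five pairs $(f_1, f_2)$ are hypothetical numbers chosen to reproduce the qualitative picture described above (they are not read off Figure 1), and the supremum is taken over a finite set of acts on a grid of values of $\beta(1)$.

```python
# Hypothetical basic acts, each given as the pair (f1, f2) of conditional
# expected utilities; the numbers are illustrative, not taken from Figure 1.
ACTS = {
    "a": (0.0, 1.0),
    "b": (0.3, 0.8),    # never optimal, though no single act dominates it
    "c": (0.7, 0.7),
    "d": (-0.2, 0.1),   # strictly dominated by a, b and c, but not by e
    "e": (1.0, 0.0),
}


def expected_utility(act, beta1):
    """E(f | beta) in the fourth form of (2): f2 + (f1 - f2) * beta(1)."""
    f1, f2 = act
    return f2 + (f1 - f2) * beta1


def k(beta1, acts=ACTS):
    """k(beta) of (3): the upper envelope of the lines E(f | beta)."""
    return max(expected_utility(act, beta1) for act in acts.values())


def optimal_somewhere(acts=ACTS, grid=1000):
    """Names of the acts whose line touches the graph of k on the grid."""
    winners = set()
    for j in range(grid + 1):
        beta1 = j / grid
        best = k(beta1, acts)
        for name, act in acts.items():
            if abs(expected_utility(act, beta1) - best) < 1e-12:
                winners.add(name)
    return winners


if __name__ == "__main__":
    print(sorted(optimal_somewhere()))
    print(k(0.5))
```

With these assumed numbers only the lines $a$, $c$, and $e$ touch the envelope, and the envelope is convex because it is a maximum of linear functions of $\beta(1)$.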
Turn now to the relations between an observation $\mathbf{x}$ and the dichotomy $B_i$. As before, it will be assumed for mathematical simplicity that the values of $\mathbf{x}$ are confined to a finite set $X$. The probability that $\mathbf{x}$ attains the value $x$ given $B_i$, written $P(x \mid B_i)$, is fundamental in connection with partition problems. For one thing, as has already been indicated, there is interest in considering people who, though differing with respect to $\beta$, agree with respect to $P(x \mid B_i)$. The probability $P(x, B_i)$ that $\mathbf{x}$ attains the value $x$ and that $B_i$ simultaneously obtains, the probability $P(x)$ that $\mathbf{x}$ attains the value $x$, and the probability $\beta(i \mid x)$ of $B_i$ given that $\mathbf{x}(s) = x$ are derived from $P(x \mid B_i)$ and $\beta$ by means of Bayes' rule (3.5.4) and the partition rule (3.5.3) thus:

$$P(x, B_i) = P(x \mid B_i)\,\beta(i), \tag{4}$$

$$P(x) = \sum_i P(x, B_i), \tag{5}$$

$$\beta(i \mid x) = \frac{P(x, B_i)}{P(x)}, \tag{6}$$
if $P(x) \neq 0$; and $\beta(i \mid x)$ is meaningless otherwise. It must be remembered that $P(x, B_i)$, $P(x)$, and $\beta(i \mid x)$ depend on the value of $\beta$ and that a really complete notation would show that dependence. On the other hand, the condition that $P(x) \neq 0$ is independent of the value of $\beta$. When a second observation $\mathbf{y}$ is to be discussed, $\beta(i \mid y)$ is, in defiance of strict logic, to be understood as the analogue of $\beta(i \mid x)$; that is, as the conditional probability of $B_i$ given that $\mathbf{y}(s) = y$, not as the same function as $\beta(i \mid x)$ with $y$ substituted for $x$. Corresponding conventions apply to $P(y)$, $P(y \mid B_i)$, and $P(y, B_i)$. Finally, free use will be made of such contractions as $\beta(x)$ for $\{\beta(1 \mid x), \beta(2 \mid x)\}$.

Equation (1) implies that

$$E(f \mid B_i, x) = E(f \mid B_i) \tag{7}$$
for all $f \in F$ and for all $x$ such that $P(x \mid B_i) > 0$. Equation (7) is the mathematical essence of the concept of a partition problem, and virtually all that is to be said about partition problems applies verbatim if (7), even without (1), applies to such observations as may be under discussion. In view of (7), for all $f \in F$,

$$
E(f \mid \beta, x) = \sum_i E(f \mid B_i, x)\,P(B_i \mid x) = \sum_i f_i\,\beta(i \mid x),
\tag{8}
$$

if $P(x) > 0$.
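Equations (4) through (6) and (8) amount to a short calculation. The Python sketch below is offered only as an illustration; the function names, the two-valued observation, and the numerical likelihoods are assumptions introduced here and do not come from the text.

```python
# Hypothetical likelihoods: LIKELIHOOD[i][x] plays the role of P(x | B_i).
LIKELIHOOD = [
    {"heads": 0.8, "tails": 0.2},   # given B_1
    {"heads": 0.3, "tails": 0.7},   # given B_2
]


def posterior(beta, likelihood, x):
    """Return the list of beta(i | x), following (4), (5) and (6)."""
    joint = [likelihood[i][x] * beta[i] for i in range(len(beta))]  # (4)
    p_x = sum(joint)                                                # (5)
    if p_x == 0:
        raise ValueError("beta(i | x) is meaningless when P(x) = 0")
    return [p / p_x for p in joint]                                 # (6)


def posterior_expected_utility(act, beta, likelihood, x):
    """E(f | beta, x) as in (8), where act is the tuple of the f_i."""
    post = posterior(beta, likelihood, x)
    return sum(f_i * b_i for f_i, b_i in zip(act, post))


if __name__ == "__main__":
    beta = [0.5, 0.5]
    print(posterior(beta, LIKELIHOOD, "heads"))
    print(posterior_expected_utility((1.0, 0.0), beta, LIKELIHOOD, "heads"))
```

Two people who agree on the assumed likelihoods but hold different values of `beta` would call the same functions with different first arguments, which is the comparison the text has in view.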
3 The value of observation

If the observation $\mathbf{x}$ is made, and it is found that $\mathbf{x}(s) = x$, then the a posteriori value of the set of basic acts, written $v(F \mid x)$, or more fully $v(F \mid \beta, x)$, will typically be different from the a priori value $v(F \mid \beta)$. Indeed, in view of (2.8),

$$
v(F \mid \beta, x) = \sup_{f \in F} E(f \mid \beta, x) = v(F \mid \beta(x)) = k(\beta(x)).
\tag{1}
$$
This is the first illustration of the technical convenience of the function $k$. It is known on general principles that $v(F(\mathbf{x})) \geq v(F)$, but there is some interest in reverifying the inequality in the present context; in particular, it is P [...] $> 0$; or, if and only if

$$\frac{P(x \mid B_i)}{P(y \mid B_i)} = \frac{P(x)}{P(y)} \tag{8}$$

when $P(x, B_i) > 0$; or, again, if and only if

$$P(x \mid B_i, y) = P(x \mid y), \tag{9}$$

when $P(y \mid B_i) > 0$; or finally if and only if $P(x \mid B_i, y)$ is independent
of $i$ for those values of $i$ for which it is defined. In this form, and yet another to be derived in connection with (10), the condition is widely studied in modern statistical theory and a statistic satisfying the condition is there called a sufficient statistic. The name is well justified; for, as has just been shown, it is sufficient, for any purpose to which $\mathbf{x}$ might be put, to know $\mathbf{y}$ if and only if $\mathbf{y}$ is a sufficient statistic for $\mathbf{x}$.

A different, and perhaps more congenial, approach to sufficient statistics is the following. If the person observes the particular value $y$ of $\mathbf{y}$, his original basic decision problem is replaced by a new one with the same basic acts, but with $\beta$ replaced by $\beta(y)$. Strictly speaking, this will fail to be a partition problem, in case $\beta(y)$ is $(0, 1)$ or $(1, 0)$, or, for brevity, if $\beta(y)$ is extreme. To see whether $v(F(\mathbf{x}) \mid \beta)$ is really greater than $v(F(\mathbf{y}) \mid \beta)$, it is enough to investigate whether, for some $y$ of positive probability for which $\beta(y)$ is not extreme, $\mathbf{x}$ is relevant to the partition problem based on $\beta(y)$, for if $\beta(y)$ is extreme there can be no value in following the observation that $y$ has occurred by the observation of $\mathbf{x}$. Therefore, $\mathbf{x}$ will be a worthless addition to $\mathbf{y}$, if, for every $y$ for which $\beta(y)$ is not extreme, $\mathbf{x}$ is utterly irrelevant, that is, if $\mathbf{y}$ is sufficient for $\mathbf{x}$. If $k$ is strictly convex, the condition is also necessary.

The recognition of sufficient statistics in explicit problems is often facilitated by the following factorability criterion. A statistic $\mathbf{y}$ is sufficient for $\mathbf{x}$ if and only if there exists at least one pair of functions $R$ and $S$ such that

$$P(x \mid B_i) = R(y(x); i)\,S(x). \tag{10}$$
The necessity of the condition follows from the exhibition of a particular $R$ and $S$ for a sufficient statistic thus:

$$
\begin{aligned}
P(x \mid B_i) &= \sum_y P(x \mid B_i, y)\,P(y \mid B_i) \\
&= \sum_y P(x \mid y)\,P(y \mid B_i) \\
&= P(y(x) \mid B_i)\,P(x \mid y(x)).
\end{aligned}
\tag{11}
$$
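On the other hand, if $P(x \mid B_i)$ can be expressed in the form (10), $\mathbf{y}$ can be seen to be sufficient for $\mathbf{x}$ thus: If $P(x \mid B_i, y)$ is meaningful, it is given by

$$
P(x \mid B_i, y) = \frac{P(x, y \mid B_i)}{P(y \mid B_i)} =
\begin{cases}
0, & \text{if } y(x) \neq y, \\[4pt]
\dfrac{P(x \mid B_i)}{P(y \mid B_i)} = \dfrac{S(x)}{\sum_{y(x') = y} S(x')}, & \text{if } y(x) = y,
\end{cases}
\tag{12}
$$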
which is independent of $i$. The reader may be interested in asking himself, as an exercise, what freedom there is in choosing $R$ and $S$ when at least one such pair of factors exists.

Interest in sufficient statistics is not confined, of course, to twofold, or even finite, partitions. With that in mind, the various criteria for sufficient statistics have been given in such terms as to be valid for any finite partition and the usual infinite ones. They require some modification if the observations are not confined to a finite, or at any rate denumerable, set of values, but formal details of that important extension will not be given here. Elementary treatments are given in most textbooks of mathematical statistics; more advanced and general treatments are given in [B2], [L6], and [H3]. There are several examples of sufficient statistics in the exercises below, others are given in almost any fairly advanced textbook on statistics (in particular, in [C9]), and one other general example of extraordinary importance is treated in the next section.
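The factorability criterion (10) and the conditional probability in (12) can be checked by brute force on a small case. The Python sketch below assumes, purely for illustration, a sample of independent zero-one observations with $P(x_r = 1 \mid B_i) = p_i$ and takes the sum of the observations as the statistic; the function names are hypothetical. Under that assumption $P(x \mid B_i, y(x))$ should come out the same for every $p_i$.

```python
from itertools import product
from math import comb


def p_x_given_b(x, p):
    """P(x | B_i) for independent 0-1 observations with P(x_r = 1 | B_i) = p."""
    y = sum(x)
    return p ** y * (1 - p) ** (len(x) - y)


def p_x_given_b_and_y(x, p):
    """P(x | B_i, y(x)), computed directly as P(x | B_i) / P(y(x) | B_i)."""
    n, y = len(x), sum(x)
    # P(y | B_i) by summing P(z | B_i) over all samples z with the same sum.
    p_y = sum(p_x_given_b(z, p)
              for z in product((0, 1), repeat=n) if sum(z) == y)
    return p_x_given_b(x, p) / p_y


if __name__ == "__main__":
    x = (1, 0, 1, 0)
    for p in (0.3, 0.8):
        print(p, p_x_given_b_and_y(x, p), 1 / comb(len(x), sum(x)))
```

In the notation of (10), this example corresponds to the choice $R(y; i) = p_i^{y}(1 - p_i)^{n - y}$ and $S(x) = 1$, which is one admissible pair among others.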
Exercises

In these exercises, let $\mathbf{x}$ denote a multiple observation $\mathbf{x} = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$, where, given $B_i$, the $\mathbf{x}_r$'s are independent and identically distributed. There will be no real advantage here in thinking of the partition as twofold, or even finite, and for some of the exercises it will be impractical to do so.

1. Let

$$
P(x_r \mid B_i) =
\begin{cases}
p_i, & \text{if } x_r = 1, \\
q_i, & \text{if } x_r = 0, \\
0, & \text{otherwise},
\end{cases}
$$

where $p_i + q_i = 1$; and let $y(x) = \sum_r x_r$. Show that:

(a) $P(x \mid B_i) = p_i^{y} q_i^{n-y}$;

(b) $\mathbf{y}$ is sufficient for $\mathbf{x}$, using the factorability criterion;

(c) $P(y \mid B_i) = \binom{n}{y} p_i^{y} q_i^{n-y}$, where, as always, $\binom{n}{y} = n!/[y!\,(n - y)!]$;

(d) $P(x \mid y(x)) = \binom{n}{y(x)}^{-1}$.
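2. For each positive integer $i$, let

$$
P(x_r \mid B_i) =
\begin{cases}
i^{-1}, & \text{if } x_r \leq i, \\
0, & \text{otherwise},
\end{cases}
$$

where the values of $\mathbf{x}_r$ are confined to the positive integers; and let $y(x) = \max_r x_r$. Show that:

(a) $P(x \mid B_i) = i^{-n}$, if $y \leq i$, and $P(x \mid B_i) = 0$ otherwise;

(b) $\mathbf{y}$ is sufficient for $\mathbf{x}$.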
3. In the two exercises above it has been possible to choose the factor $S$ identically equal to 1. To exhibit a more typical example, let $i$, $x_r$, and $y$ be confined to the positive integers with $y(x) = \max_r x_r$, as in the preceding exercise, and let
2:cr I Bo) .. , i f "'- .s '1* S P(Tl S '1-' B 1) > '1*1 Bt) S It* .s Perl ~ 1"1· J B,,).
What about the typical C888 that Per .. ,.) .. 01 Ca) Show that, if i is at least 88 ,ood all ,- in the 8eDI8 that e. S ..for both ,"s, then i is a likelihood-ratio teat ad i is virtually 1* in that 6i a 6i· for both i'a. Hint: CoDaider an If and a ~ for which r-(fl, flo) 1::1 ahowiDg that theee exist, and note that, for this decision problem,
r·,
B(f••
I,,) - Ie, c:a
(21)
I
E(f. fJ) -
1t(1 - 2el-) 16(1)
+ (II -
'1(1 - 2It·)16(2)
I
.(P(z) (J)
I fl
-
3,(1 - ~) 1,,(1)
+ t -2 -
'1(1 - ~J) ltl(2)
I
~ ,(!lex) fJ),
with equality if and only if $f$ is a likelihood-ratio test. This important conclusion about likelihood-ratio tests has been much emphasized, especially by the Neyman-Pearson school.
The concept of likelihood ratio, sometimes simply called likelihood, is now one of the most pervasive concepts of statistical theory. It seems to have been introduced in 1922 by R. A. Fisher (cf. index of [F6]), who emphasized it in connection with the important method of estimation named by him "the method of maximum likelihood." Its use in testing hypotheses was apparently first emphasized by J. Neyman and E. S. Pearson (see Vol. II, p. 303 of [K2]). In connection with likelihood ratios as necessary and sufficient statistics, mathematically advanced readers will be interested in Section 6 of [L6], [B2], and [M5]. One of the earliest contributions in this direction was made by C. A. B. Smith [S14].
6 Repeated observations

If $\mathbf{x}(n) = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$, where, given $B_i$, the $\mathbf{x}_r$'s are independent identically distributed random variables, then $v(F(\mathbf{x}(n)))$ is a non-decreasing function of $n$, for the $(n + 1)$-tuple is an extension of the $n$-tuple. If $k(\beta)$ is strictly convex (a condition that you now recognize as interesting), $v(F(\mathbf{x}(n)))$ is easily seen to be strictly increasing in $n$, unless the individual $\mathbf{x}_r$'s are either utterly irrelevant or definitive. It is to be expected, especially in the light of the approach to certainty discussed in § 3.6, that, as $n$ becomes very large, $\mathbf{x}(n)$ will become practically definitive. Indeed, § 3.6 makes it possible to state and prove a formal theorem to that effect.

THEOREM 1.

HYP. 1. $\mathbf{x}(n) = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$, where, given $B_i$, the $\mathbf{x}_r$'s are independent and identically distributed random variables.
2. The $\mathbf{x}_r$'s are not utterly irrelevant to $B_i$.
3. $v(F \mid \beta) = k(\beta)$.

CONCL.

$$\lim_{n \to \infty} v(F(\mathbf{x}(n)) \mid \beta) = l(\beta) =_{Df} \beta(1)\,k(1, 0) + \beta(2)\,k(0, 1)$$

uniformly in $\beta$.
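PROOF. Writing $\mathbf{x}$ as short for $\mathbf{x}(n)$,

$$v(F(\mathbf{x}) \mid \beta) = E[k(\beta(\mathbf{x}))]. \tag{1}$$

For an arbitrary $\epsilon > 0$, let the closed interval $I$ on which $k$ is defined be partitioned into two subsets $J$ and $K$, where $J$ is the set of those $\beta$'s such that

$$k(\beta) \geq l(\beta) - \epsilon, \tag{2}$$

and $K$ is the complement of $J$ relative to $I$. It follows from the continuity of the functions on each side of (2) that $\beta \in J$, if either component of $\beta$ is sufficiently large. The computation initiated in (1) can now be carried forward thus:

$$
\begin{aligned}
E[k(\beta(\mathbf{x}))] &= E[k(\beta(\mathbf{x})) \mid \beta(\mathbf{x}(s)) \in J]\,P(\beta(\mathbf{x}(s)) \in J) \\
&\quad + E[k(\beta(\mathbf{x})) \mid \beta(\mathbf{x}(s)) \in K]\,P(\beta(\mathbf{x}(s)) \in K) \\
&\geq E[l(\beta(\mathbf{x})) \mid \beta(\mathbf{x}(s)) \in J]\,P(\beta(\mathbf{x}(s)) \in J) + \min_{\beta'} k(\beta')\,P(\beta(\mathbf{x}(s)) \in K) - \epsilon \\
&= E[l(\beta(\mathbf{x}))] - \{E[l(\beta(\mathbf{x})) \mid \beta(\mathbf{x}(s)) \in K] - \min_{\beta'} k(\beta')\}\,P(\beta(\mathbf{x}(s)) \in K) - \epsilon \\
&\geq l(\beta) - 2 \max_{\beta'} |k(\beta')|\,P(\beta(\mathbf{x}(s)) \in K) - \epsilon.
\end{aligned}
\tag{3}
$$

Now, in view of the paragraph in which (3.6.15) occurs and the fact that, if either component of $\beta$ is close to 1, $\beta \in J$, the probability $P(\beta(\mathbf{x}(s)) \in K)$ becomes arbitrarily small for sufficiently large $n$. ◆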
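Theorem 1 can be watched at work in a small numerical case. The Python sketch below is an illustration under stated assumptions, not part of the proof: the two acts, the prior, and the zero-one observations with $P(x_r = 1 \mid B_i) = p_i$ are all hypothetical. It uses the fact that the sum of the observations is sufficient, so that $E[k(\beta(\mathbf{x}(n)))]$ reduces to a sum over the $n + 1$ possible values of that sum, each term being $\max_f \sum_i f_i\,\beta(i)\,P(y \mid B_i)$.

```python
from math import comb

# Hypothetical basic acts (f1, f2): each is appropriate to one hypothesis.
ACTS = [(1.0, 0.0), (0.0, 1.0)]


def k(beta):
    """k(beta) = max over acts of f1 * beta(1) + f2 * beta(2)."""
    return max(f1 * beta[0] + f2 * beta[1] for f1, f2 in ACTS)


def l(beta):
    """l(beta) = beta(1) k(1, 0) + beta(2) k(0, 1), as in Theorem 1."""
    return beta[0] * k((1.0, 0.0)) + beta[1] * k((0.0, 1.0))


def value_after_n(beta, p, n):
    """v(F(x(n)) | beta) = E[k(beta(x(n)))] for n zero-one observations.

    P(y) k(beta(y)) equals max_f sum_i f_i beta(i) P(y | B_i), so no
    division by P(y) is needed.
    """
    total = 0.0
    for y in range(n + 1):
        joint = [beta[i] * comb(n, y) * p[i] ** y * (1 - p[i]) ** (n - y)
                 for i in range(2)]
        total += max(f1 * joint[0] + f2 * joint[1] for f1, f2 in ACTS)
    return total


if __name__ == "__main__":
    beta, p = (0.5, 0.5), (0.7, 0.4)
    for n in (0, 1, 5, 20, 200):
        print(n, value_after_n(beta, p, n), l(beta))
```

With these assumed numbers the printed values start at $k(\beta)$ for $n = 0$, never decrease, and approach $l(\beta)$ from below.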
7 Sequential probability ratio procedures

The present section digresses to discuss an interesting application of the ideas presented in this chapter to what is called sequential analysis. Sequential analysis refers in principle to the theory of observational programs in which the selection of what observations to make in later phases of the program depends on what has been observed in earlier phases. Such behavior is commonplace in everyday life; for example, you look for something until you find it, but not longer. Statistics itself has always used sequential procedures. For example, it is not rare to conduct a preliminary experiment to determine how a main experiment should be carried out. Thus, if one were required to estimate with a roughly preassigned precision the mean of a normal distribution of unknown mean and unknown variance, one might reasonably begin by taking ten or twenty observations, which would give some idea of the variance and would therefore determine about how many observations are necessary for achieving the requisite precision. Commonplace though problems with sequential features are, A. Wald was the first to develop (1943) a systematic theory of a considerable body of problems of this sort. For early history see the Introduction of [W2] and the Foreword of Section I of [S17].

Some later ideas on sequential analysis, due mainly to Wald and Wolfowitz, are the subject of this section. It will not be practical to proceed with full rigor, primarily because random variables capable of assuming an infinite number of values are necessarily involved. Full details are given in [W3] and more compactly in [A7], but not in Wald's book on sequential analysis [W2].

Let $\mathbf{x} = \{\mathbf{x}(1), \ldots, \mathbf{x}(\nu), \ldots\}$, where the $\mathbf{x}(\nu)$'s are conditionally an infinite sequence of independent, relevant, identically distributed random variables. Rather informally, a sequential observational program with respect to $\mathbf{x}$ is a rule telling whether to observe $\mathbf{x}(1)$ or whether to make no observation at all; if the particular value $x(1)$ is observed, whether to observe $\mathbf{x}(2)$ or to discontinue observation; if the values $x(1)$ and $x(2)$ are observed, whether to observe $\mathbf{x}(3)$ or to discontinue observation, etc. More formally, let $N$ be a function of the infinite sequence of values $x = \{x(1), \ldots, x(\nu), \ldots\}$ such that, if the sequence $x'$ agrees with $x$ in every component from the first through the $N(x)$th, then $N(x') = N(x)$.
Such a function $N$ determines a sequential observational program, which is a contraction of $\mathbf{x}$, call it $\mathbf{y}(\mathbf{x}; N)$, defined thus:

$$\mathbf{y}(\mathbf{x}; N) =_{Df} \{\mathbf{x}(1), \ldots, \mathbf{x}(N(\mathbf{x}))\}. \tag{1}$$
It is to be understood that, if $N(x)$ is zero for some $x$, it is identically zero, and that $\mathbf{y}(\mathbf{x}; 0)$ is a null observation.

It will be assumed that the random cost associated with a sequential observational program is proportional to the number of random variables observed, that is, $c = N(x)\gamma$, $\gamma > 0$. No categorical defense of this assumption is suggested, but clearly there are interesting problems in which it is met at least approximately. The domain of applicability of the theory can actually be considerably extended by modifying the assumption to include a fixed overhead cost that applies except in case $N$ is identically zero; this does not greatly complicate the analysis, as the interested reader will be able to see for himself. The theory would even remain virtually unchanged, if $c$ were only assumed to be of the form

$$
c =
\begin{cases}
h + \sum_{\nu = 1}^{N(x)} c(\nu), & \text{if } N > 0, \\
0, & \text{if } N = 0,
\end{cases}
\tag{2}
$$

where $h, c(1), c(2), \ldots$ are independent with finite expected values $E(h) \geq 0$, $E(c(\nu)) > 0$, and the $c(\nu)$'s are identically distributed.

For any $F$ there are some values of $\beta$ for which it would be unwise to adopt any sequential observational program other than the null observation. Suppose, for example, that $\beta$ is so close to an extreme value that $l(\beta) - k(\beta) < \gamma$; under this circumstance the most that could be gained by observing even $\mathbf{x}$ itself would be less than $\gamma$, but the cost of making so much as one observation is at least $\gamma$. Let the set of values of $\beta$ for which it is not justified to make any but the null observation be denoted for a while by $J(F; \gamma)$, or simply $J$, for short.

Now, if $\beta \in J$, the person's utility can, by the definition of $J$, be maximized by refraining from any observation but the null observation and accepting the utility $k(\beta)$; otherwise there will be some advantage to him in observing $\mathbf{x}(1)$. If the person does observe the particular value $x(1)$ of $\mathbf{x}(1)$, he finds himself with a posteriori probabilities $\beta(x(1))$ in place of the a priori $\beta$, he has paid (or at any rate entailed) a cost $\gamma$, and he must now decide whether to make any further observations. His new problem is simply the problem he would have faced at the outset had his a priori probabilities been $\beta(x(1))$ instead of $\beta$, except that all utilities are now reduced by $\gamma$. He justifiably accepts the utility $k(\beta(x(1))) - \gamma$, if $\beta(x(1)) \in J$; otherwise he will observe $\mathbf{x}(2)$. Continuing this line of argument step after step, it follows that optimal action consists in observing successive $\mathbf{x}(\nu)$'s until an a posteriori probability in $J$ occurs, and then adopting a basic act consistent with the a posteriori probability.
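The step-after-step argument just given suggests a crude numerical approximation to $J(F; \gamma)$. The Python sketch below is an illustration under several assumptions that are not in the text: there are two hypothetical acts with $(f_1, f_2) = (1, 0)$ and $(0, 1)$, the observations are zero-one with $P(x = 1 \mid B_i) = p_i$ strictly between 0 and 1, the number of further observations is capped at a finite horizon, and values between grid points are obtained by linear interpolation. It is a sketch of the recursion, not the method of the references cited in this section.

```python
def k(beta1):
    """k(beta) for two hypothetical acts with (f1, f2) = (1, 0) and (0, 1)."""
    return max(beta1, 1.0 - beta1)


def approximate_j(p1, p2, gamma, grid=200, horizon=50):
    """Flag grid values of beta(1) at which stopping is at least as good as
    paying gamma for one more observation (truncated-horizon approximation).

    Assumes 0 < p1 < 1 and 0 < p2 < 1, so every outcome has positive probability.
    """
    def interp(values, beta1):
        # Linear interpolation of tabulated values at an arbitrary beta(1).
        pos = min(max(beta1, 0.0), 1.0) * grid
        j = min(int(pos), grid - 1)
        w = pos - j
        return (1.0 - w) * values[j] + w * values[j + 1]

    points = [j / grid for j in range(grid + 1)]
    values = [k(b) for b in points]          # no observations left: stop now
    in_j = [True] * (grid + 1)
    for _ in range(horizon):
        new_values = []
        for idx, b in enumerate(points):
            prob_one = b * p1 + (1.0 - b) * p2           # partition rule
            post_one = b * p1 / prob_one                 # Bayes' rule, x = 1
            post_zero = b * (1.0 - p1) / (1.0 - prob_one)  # Bayes' rule, x = 0
            go_on = (-gamma
                     + prob_one * interp(values, post_one)
                     + (1.0 - prob_one) * interp(values, post_zero))
            stop = k(b)
            in_j[idx] = stop >= go_on
            new_values.append(max(stop, go_on))
        values = new_values
    return points, in_j


if __name__ == "__main__":
    points, in_j = approximate_j(0.8, 0.2, 0.05)
    outside = [b for b, flag in zip(points, in_j) if not flag]
    print(min(outside), max(outside))
```

In this assumed example the extreme values of $\beta(1)$ are flagged as belonging to the approximate $J$, while $\beta(1) = 0.5$ is not, since a single observation already raises the expected utility from 0.5 to 0.8 at a cost of 0.05.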
In actual practice, it is far from easy to determine whether a particular value of $\beta$ belongs to $J(F; \gamma)$, because in principle the whole enormous variety of sequential observational programs has to be expl [...] $> k(\beta_0)$,
for $h$ is supposed to be advantageous at $\beta_0$; and

$$\sum_i E(h \mid B_i)\,\beta_m(i) \leq k(\beta_m), \qquad m = 1, 2, \tag{5}$$

for no derived act is supposed to be advantageous at $\beta_m$, since $\beta_m \in J$. Since $\beta_0$ is a weighted average, say $\sum_m \lambda_m \beta_m$, of the $\beta_m$'s, and since $k(\beta)$ is linear in the interval between $\beta_1$ and $\beta_2$, it follows from (4) and (5) that

$$\sum_i E(h \mid B_i)\,\beta_0(i) \leq k(\beta_0), \tag{6}$$
contradicting (4). The supposition that $\beta_0 \in {\sim}J$ has thus been reduced to absurdity.

The demonstration just given extends directly to $n$-fold problems. The general conclusion is that the intersection of $J$ with any domain of linearity of $k$ is convex, so that, if $k$ is polyhedral, $J$ is the union of a finite number of closed convex sets, each lying wholly in a domain of linearity of $k$. The practical implications of the conclusion are enormously greater for twofold than for higher-fold problems, because twofold problems lead to one-dimensional bounded, closed, convex sets, which present no great variety, all of them being closed bounded intervals. But threefold problems, for example, lead to closed bounded two-dimensional convex sets, a restriction that leaves great room for variety.

If $k$ is polygonal, the variety of sets $J$ to be surveyed is enormously reduced, for $J$ must be the union of a known number of intervals, each of which is confined to a known interval. Suppose that this number is $m$; the class of sequential observational programs to be surveyed can be characterized by the two end points of each of the $m$ intervals, except that the possibility that some of the intervals are vacuous must be borne in mind. Since the extremes of $I$ are necessarily in $J$, and therefore necessarily appear as end points of intervals in $J$, the exploration has been reduced to a $2(m - 1)$ parameter family of possibilities.

The possibility that $m = 1$, which almost means that $F$ is dominated by a single element of itself, is trivial; for then all $\beta$'s are in $J$, and observation is never called for. This can be seen in many ways. In particular, it follows as an illustration of the machinery that has just been developed, thus: The end points, or extremes, of $I$ are both in $J$, as always, and, since $m = 1$, they are both in the same interval of linearity of $k$; therefore the interval between them, namely every value of $\beta$, lies in $J$.
The possibility that $m = 2$ (in ordinary statistical usage, the sequential testing of a simple dichotomy) is of particular importance.
It occurs typically when $F$ is dominated by two acts, neither of which dominates the other, as in Exercise 5.2. One of the two acts is appropriate to one "hypothesis" $B_1$, and the other is appropriate to $B_2$. In case $m = 2$, it is easily seen, by methods that have now been indicated more than once, that each of the two closed intervals that constitute $J$ has as one end point one of the extremes of $I$. Neither of the two intervals can be vacuous, nor can either consist only of a single point. It is relatively easy to find, at least approximately, the two values of $\beta$ that determine $J(F; \gamma)$, and the theory of this situation has correspondingly been brought to a relatively high degree of perfection; for details, see [S17], [W2], [W3], and [A7].

Following (or at least paraphrasing) Wald [W2], a sequential observational program characterized by making successive observations until the a posteriori probabilities fall into some set $J$, followed by adopting a basic act appropriate to the a posteriori probability, is called a sequential probability ratio procedure. The reason for this nomenclature is that to observe until the a posteriori probabilities fall into $J$ is to observe until the numbers

$$
\beta(i \mid x(1), \ldots, x(N)) = \frac{\beta(i)\,P(x(1), \ldots, x(N) \mid B_i)}{\sum_j \beta(j)\,P(x(1), \ldots, x(N) \mid B_j)}
\tag{7}
$$

lie in a certain set, or, what amounts to the same thing, satisfy certain conditions. But, the particular value of $\beta$ having been assigned, this is tantamount to requiring the ratios of probabilities

$$\frac{P(x(1), \ldots, x(N) \mid B_i)}{P(x(1), \ldots, x(N) \mid B_j)} \tag{8}$$

to satisfy certain conditions. Since (7) and (8) are ways of expressing the likelihood ratio, the observational program together with the act derived from it might also be referred to as a sequential likelihood-ratio procedure. Indeed, but for the precedent established by Wald, that would seem the better name.
As an actual example of a sequential probability ratio procedure, suppose that the distribution of $\mathbf{x}(\nu)$ given $B_i$ attaches the probabilities $p_i$ and $q_i = 1 - p_i$ to the values 1 and 0, respectively. The expression (8) can in any case be written in the factored form

$$\prod_{\nu = 1}^{N} \left\{ \frac{P(x(\nu) \mid B_i)}{P(x(\nu) \mid B_j)} \right\}, \tag{9}$$

and in the present example this takes the special form

$$\left( \frac{p_i}{p_j} \right)^{y(N)} \left( \frac{q_i}{q_j} \right)^{N - y(N)}, \tag{10}$$

where

$$y(N) = \sum_{\nu = 1}^{N} x(\nu). \tag{11}$$
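For the zero-one example, (7), (10), and (11) translate directly into a short routine. The Python sketch below is an illustration only: the function names are hypothetical, and the set $J$ is simply assumed to consist of the two end intervals $\beta(1 \mid \cdot) \leq$ `lower` and $\beta(1 \mid \cdot) \geq$ `upper`, with thresholds picked for the example rather than derived from any $F$ and $\gamma$.

```python
def posterior_after(beta1, p, y, n):
    """beta(1 | x(1), ..., x(n)) from (7), via the ratio (10) with i = 2, j = 1.

    p = (p_1, p_2) are the probabilities of the value 1 under B_1 and B_2,
    and y is the number of ones among the first n observations, as in (11).
    """
    ratio = (p[1] / p[0]) ** y * ((1 - p[1]) / (1 - p[0])) ** (n - y)
    return beta1 / (beta1 + (1.0 - beta1) * ratio)


def run_procedure(observations, beta1, p, lower, upper):
    """Observe until the a posteriori probability of B_1 falls into the
    assumed set J, or until the supplied observations are exhausted.

    Returns (N, y(N), a posteriori probability of B_1).
    """
    n = y = 0
    post = beta1
    for x in observations:
        if post <= lower or post >= upper:   # already in J: stop observing
            break
        n += 1
        y += x                               # running sum y(N) of (11)
        post = posterior_after(beta1, p, y, n)
    return n, y, post


if __name__ == "__main__":
    print(run_procedure([1, 1, 1, 1], 0.5, (0.8, 0.2), 0.05, 0.95))
    print(run_procedure([1, 0, 1, 0, 1, 0], 0.5, (0.8, 0.2), 0.05, 0.95))
```

The routine consults the observations only through the count `n` and the running sum `y`, which reflects the role of $y(N)$ and $N$ in (10).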
It is noteworthy, in connection with sufficient statistics, that the condition that the a posteriori probability be in $J$ is in this case expressible, according to (10), as a condition on $y(N)$ and $N$. Specializing the example further, suppose that $J$ is of the sort appropriate to testing a simple dichotomy. The condition that the a posteriori probability be in ${\sim}J$ is then expressed by each of the following equivalent pairs of inequalities, where $\alpha(1)$ and $\alpha(2)$ are positive numbers such that $\alpha(1) + \alpha(2) < 1$.

$$\beta(1 \mid x(1), \ldots, x(N)) < 1 - \alpha(1),$$
in terms of ...
v(P(x) 1{J) - B(l(}(z» , Il)
- B(k«(r,,(i)/L r~(i»)) I ~J I
- nB [k( (r,.,(,1/L rJlCJ1}) E rIJCJ) I mJ. I
I
Temporarily adopt the convention that, if $\alpha$ is any $n$-tuple of positive numbers and $h$ any function of $r$ (not necessarily convex), $T(\alpha)h$ is a function of $r$ defined thus: (8)
T(J& (~«) a.r + \;.,
.. AT(a)"(r)
p.a.r'
r')
- a a-(AI" + PT') 01" (AT + pr') a· r'
ph(«·rr' ,«) «.r'
+ pT(a)h(r').
It is amusing to establish once more that observation generally pays, this time by means of (10), (4), and Exercises 5 and 2. (11)
fI.8(T(JJ)k(r) I m) ~ ftT~)II;(B(r
1m»
== ftT(JJ)l~·)
- k~).
If $\mathbf{x}$ and $\mathbf{y}$ are observations and $m$ and $m'$ are the corresponding distributions, it is now easy to say in terms of $m$ and $m'$ when $\mathbf{x}$ is utterly irrelevant, when it is definitive, and when $\mathbf{x}$ is virtually an extension of $\mathbf{y}$.
lIore eurciIee 6. The observation z is utterly irrelevant if and only if P(r -
6*' tA)
- 1. 7. The obeervation x is definitive; if and only if PCri m) - lIn, or, equivalently, if and only if 0 1m) - (n - 1)/".
per, -
11
8a. The observation $\mathbf{x}$ is a virtual extension of $\mathbf{y}$, if and only if, for every convex function $h$ defined for $r$,

$$E(h(r) \mid m) \geq E(h(r) \mid m'). \tag{12}$$

8b. The two observations are virtually equivalent, if and only if, for every convex function+ $h$,

$$E(h(r) \mid m) = E(h(r) \mid m'). \tag{13}$$
The conclusion reached in Exercise 8b can be much improved. Indeed, it will be shown that the two observations are virtually equivalent, if and only if $m$ and $m'$ are the same probability measures. This will be achieved if, for example, it is shown that $m$ and $m'$ have the same moments, for it is well known that two different countably additive probability measures confined to a bounded set of $n$-tuples of numbers cannot have the same moments.† The moments in question are expected values of monomials of the form

$$g(r) = \prod_j r_j^{t_j}, \tag{14}$$
where the $t_j$'s are non-negative integers. In general, $g$ will not be convex, so it cannot be concluded immediately that $g$ has the same expected value with respect to $m$ and $m'$. If, however, a highly convex function is added to $g$, then the sum will be convex and its expected value will be the same with respect to $m$ and $m'$. Since, by hypothesis, this is also true of the convex term of the sum, it must also be true of the not necessarily convex term. Specifically, let

$$h(r) = g(r) + \lambda \sum_j r_j^2, \tag{15}$$

where $\lambda$ is a positive number to be determined later. To test $h$ for convexity, let $s$ be for the moment an arbitrary $n$-tuple of numbers and $\sigma$ a real variable, and compute the second derivative of $h(r + \sigma s)$ with respect to $\sigma$ at $\sigma = 0$.

$$
\left. \frac{d^2 h(r + \sigma s)}{d\sigma^2} \right|_{\sigma = 0} = \sum_{i, j} \frac{\partial^2 g(r)}{\partial r_i\,\partial r_j}\, s_i s_j + 2\lambda \sum_j s_j^2.
\tag{16}
$$
Considering that each $r_j$ is between 0 and 1, the absolute values of the derivatives of $g$ that appear in (16) have a common upper bound, say $\rho$; if $\lambda \geq \rho n / 2$, $h$ is convex in the region where each $r_j$ is between 0 and 1 and is a fortiori convex in the intersection of that region with the hyperplane $\sum_j r_j = 1$.

† See, for example, Corollary 1.1, p. 11, of [S11]. Under our usual simplifying assumption that $\mathbf{x}$ is confined to a finite number of values, $m$ is certainly countably additive. Actually, the whole theory can be developed mutatis mutandis assuming only that the distribution of $\mathbf{x}$ is countably additive on some suitable Borel field.

+ Morse and Sacksteder (1966) show, in effect, that the test can be confined to the very special convex functions $\max_j p_j r_j$, where the $p_j$ are arbitrary positive numbers.

Now that it has been established that $m$ and $m'$ represent virtually equivalent observations, if and only if $m$ and $m'$ are identical, it is apparent that $m$, or, more exactly, the set of conditional distributions $P(r \mid B_i) = n r_i m(r)$, is a unique standard form for all observations virtually equivalent to $\mathbf{x}$.

If $\mathbf{x}$ virtually extends $\mathbf{y}$, it is to be expected that, no matter what reasonable definition of "informative" may be suggested, $\mathbf{x}$ will be at least as informative as $\mathbf{y}$. In particular, it is to be expected that the information of $B_i$ with respect to $B_j$ (as defined in § 3.6) will be at least as large for $\mathbf{x}$ as for $\mathbf{y}$, which the following calculation verifies, supposing for simplicity that, for both observations, infinite information is impossible. The point in question depends on the convexity of the function $h$ defined by
$$h(r) = r_i(\log r_i - \log r_j), \tag{17}$$

because

$$J_{i,j} = E(\log r_i - \log r_j \mid B_i) = nE[\,r_i(\log r_i - \log r_j) \mid m\,]. \tag{18}$$

The required convexity can be demonstrated much as it was in (15),+ for a different function also momentarily called $h$:

$$
\left. \frac{d^2 h(r + \sigma s)}{d\sigma^2} \right|_{\sigma = 0} = \frac{1}{r_i r_j^2}\,(r_j s_i - r_i s_j)^2 \geq 0.
\tag{19}
$$
It would be interesting to know whether every virtual extension is realized by an actual extension, that is, whether whenever $\mathbf{x}$ is a virtual extension of $\mathbf{y}$ there exist random variables $\mathbf{x}'$ and $\mathbf{y}'$ such that $\mathbf{x}$ and $\mathbf{x}'$ are virtually equivalent, $\mathbf{y}$ and $\mathbf{y}'$ are virtually equivalent, and $\mathbf{x}'$ extends $\mathbf{y}'$. To the best of my knowledge that conclusion has thus far been established only in the case of twofold problems, the demonstration for that case being given by Blackwell in [B18].
+ Actually, the calculation depends only on the convexity of $(\log r_i - \log r_j)$ in $r_j / r_i$.
CHAPTER 8

Statistics Proper

1 Introduction

I think any professional statistician, whether or not he found himself in sympathy with the preceding chapters, would feel that, even allowing for the abstractness expected in a book on foundations, those chapters do not really discuss his profession. He would not, I hope, find the same shortcoming in this and the succeeding chapters, for they are concerned with what seems to me to be statistics proper. The purpose of the present short chapter is to explain this transition and to serve as a general introduction to its successors.

2 What is statistics proper?

So far as I can see, the feature peculiar to modern statistical activity is its effort to combat two inadequacies of the theory of decision, as I have thus far discussed it. In the first place, there are the vagueness difficulties associated with what in § 4.2 were called "unsure probabilities." Second, there are the special problems that arise from more than one person's participating in a decision. From the personalistic point of view, statistics proper can perhaps be defined as the art of dealing with vagueness and with interpersonal difference in decision situations. Whether this very tentative definition is justified, later sections and chapters will permit the statistical reader to judge. At any rate, vagueness and interpersonal difference are the concepts that, directly or indirectly, dominate the rest of this book. I will not try to discuss vagueness in this chapter, but something may profitably be said here about interpersonal differences.
3 Multipersonal problems
As I have already frequently said, it seems to me that multipersonal considerations constitute much of the essence of what is ordinarily called statistics, and that it is largely through such considerations that the achievements of the British-American School can be interpreted in terms of personal probability. This is a view that can best be defended by illustration, and the requisite illustrations will be scattered throughout later chapters; but some support is lent to it by those critics of personal probability who say that personal probability is inadequate because it applies only to individual people, whereas the methods of science are, more or less by definition, those methods that are acceptable to all rational people.
The sort of multipersonal problems I mean to call attention to are those arising out of differences of taste and judgment, as opposed to those, so familiar in economics, arising out of conflicting interests. As a matter of fact, the latter type of multipersonal situation can, if one chooses, be regarded as among the former; it may, for example, be said that you and I have different tastes for the process of taking a dollar from me and giving it to you.
Though modern statisticians do not at all deny the existence of different tastes in different people, only occasionally do they take that difference explicitly into account. In particular, the theory of utility has scarcely ever entered explicitly into the works of statisticians. Our intellectual ancestors who believed in the principles of mathematical expectation were less tolerant than modern statisticians in so far as they denied rationality in those whose tastes departed from that principle, and some of their bigotry is occasionally met with today.
In dealing with multipersonal situations, it is clearly valuable to recognize those in which the people involved may all reasonably be expected to have the same tastes, that is, utilities, with respect to the alternatives involved in the situation. Explicit attempts to discover general circumstances under which people's tastes will be identical are rare. The most important and fruitful attempt of this sort is represented by D.
Bernoulli's idea that utility functions will typically be approximately linear within sufficiently confined ranges of income. Consciously or unconsciously, that principle is repeatedly appealed to throughout statistics; it was, for example, brought out in § 6.0 that the very idea of an observation depends for its practical value on Bernoulli's principle of approximate linearity. Relatively inexplicit exploitations of similarity of taste are sometimes made in statistics. The idea is often expressed, for example, that the penalty for making an estimate discrepant from the number to be estimated will, for everyone concerned, be proportional (within a reasonable range) to the square of the discrepancy; an argument for this principle as a rule of thumb appropriate to many contexts will be given in § 15.5. Again, there are situations in which it is agreed that the penalty will depend only on the discrepancy and not on the true value of
the number to be estimated. Of course, there are problems in which both rules are invoked simultaneously, the penalty being supposed to be proportional to the square of the discrepancy and independent of the value to be estimated.
Turn now to differences in judgment, that is, to differences in the personal probability, for different people, of the same event. Though modern objectivistic statisticians may recognize the existence of differences of judgment, they argue in theoretical discussions that statistics must be pursued without reference to the existence of those differences, indeed without reference to judgment at all, in order that conclusions shall have scientific, or general, validity. To put the same idea in personalistic terms, I would say that statistics is largely devoted to exploiting similarities in the judgments of certain classes of people and in seeking devices, notably relevant observation, that tend to minimize their differences.
The tendency of observation to bring about agreement has been illustrated in § 3.6. Some of the other general circumstances in which different people may be expected to agree, or at least nearly agree, in some of their judgments have also been mentioned. For example, it may well happen that different people are faced with partition problems that are the same in that the same variable is to be observed by each person, but differ in that each person has his own a priori probabilities $\beta(i)$ and his own set of available acts $\mathbf{F}$. If, however, the conditional distribution of $x$ given $B_i$ is the same for each person, then the people will, for example, agree as to whether a contraction $y$ of $x$ is sufficient, which is often of great practical value. Again, there are circumstances under which each of these same people will agree that certain derived acts are nearly optimal.
4 The minimax theory
In recent years there has been developed a theory of decision, here with due precedent to be called the minimax theory, that embraces so much of current statistical theory that the remaining chapters can largely be built around it. The minimax theory was originated and much developed by A. Wald, whose work on it is almost completely summarized in his book [W3]. Wald's minimax theory, of course, derives from, and reflects, the body of statistical theory that had been developed by others, particularly the ideas associated with the names of J. Neyman and E. S. Pearson. It seems likely that, in the development of the minimax theory, Wald owed much to von Neumann's treatment of what von Neumann calls zero-sum two-person games, which, though conceptually remote from statistics, is mathematically all but identical
with study of the minimax rule, the characteristic feature of the minimax theory. Wald in his publications, and even in conversation, held himself aloof from extramathematical questions of the foundations of statistics; and therefore many of the opinions expressed in later chapters on such points in connection with the minimax theory were neither supported nor opposed by him. It may fairly be said, however, that he was an objectivist and that his work was strongly motivated by objectivistic ideas.
My policy here of holding difficulties of mathematical technique to a minimum by making stringent simplifying assumptions will be adhered to in connection with the minimax theory. A large part of Wald's book [W3] is concerned with overcoming the difficulties in technique that are here avoided by simplifying assumptions, but that must be faced in many practical problems. Despite Wald's able effort, important problems of analytic technique still remain in connection with the minimax theory. It should also be appreciated that the individual mathematical problems raised by applications of the minimax theory are often very awkward, even when stringent simplifying assumptions are complied with; consequently much work on specific applications of the theory is still in progress.
CHAPTER
9
Introduction to the Minimax Theory
1 Introduction
This chapter explains what the minimax theory is, almost without reference to the theory of personal probability. This course seems best, because the theory was originated from an objectivistic point of view and as the solution of an objectivistic problem. Moreover, a philosophically more neutral presentation seems to result, if the ideas of personal probability are here kept out of the foreground.
The minimax theory begins with some of the ideas with which the theory of personal probability, as developed in this book, also begins. In particular, the notions of person, world, states of the world, events, consequences, acts, and decisions presented in §§ 2.2-5 apply as well to the minimax theory (from which they were in fact derived) as to the theory of personal probability. The point at which the two theories depart from each other is § 2.6, which postulates that the person's preferences establish a simple order among all acts. That assumption is necessarily rejected by objectivists, for it, together with the sure-thing principle (which they presumably accept), implies the existence of personal probability.
For objectivists, of course, conditional probability does not apply to all ordered pairs of events. More specifically, it seems to be a tacit assumption of objectivistic statistics that the world envisaged in any one problem is partitioned into events with respect to each of which the conditional probabilities of all events (ignoring the mathematical technicality of measurability considerations) are defined, but that conditional probability with respect to sets other than unions of elements of the partition is not defined. That, incidentally, is why partition problems dominate objectivistic statistics. The partition in question is in general infinite, but, for mathematical simplicity, it will here be assumed to be a finite partition $B_i$.
The objectivistic position is not in principle opposed to the concept of utility. In particular, the minimax theory is predicated on the idea
that the consequences of those acts with which it deals are measured numerically by a quantity the expected value of which the person wishes to have as large as possible, whenever (from the objectivistic point of view) the concept of expected value applies. It will therefore be doing the minimax theory little or no injustice to postulate here, as elsewhere, that the consequences of acts are measured in utility.
These preliminaries disposed of, the general objectivistic decision problem is to decide on an act $\mathbf{f}$ in some given $\mathbf{F}$, by criteria depending only on the conditional expectations $E(\mathbf{f} \mid B_i)$, and therefore without reference to the "meaningless" $P(B_i)$. Taking any personalistic or necessary point of view literally, it is nonsensical to pose an objectivistic decision problem, that is, to ask which $\mathbf{f}$ of $\mathbf{F}$ is best for the person, without reference to the $P(B_i)$. On the other hand, many, if not all, holders of objectivistic views, like Wald, find themselves logically compelled by two widely held tenets to consider such problems meaningful. First, for reasons I have alluded to in Chapter 2 and will soon expand upon, many theoretical statisticians today agree, at least tacitly, that the object, or at any rate one object, of statistics is to recommend wise action in the face of uncertainty, a point of view that Wald was particularly active in bringing to the fore. Second, statisticians of the British-American School, of which Wald is to be considered a member, are objectivists and are therefore committed to the view that the probabilities $P(B_i)$ are meaningless, or, at any rate, that they cannot be legitimately used in solutions of statistical problems.
So far as I know, Wald is the only one who has proposed any solution to the general objectivistic decision problem, barring minor variations. His proposal, which is here called the minimax theory, is rather complicated to state.
In view of its complexity and the importance of this theory for the rest of this book, and for statistical theory generally, I hope the reader will have particular patience with the present chapter.
2 The behavioralistic outlook
Prior to Wald's formulation of what is here called the objectivistic decision problem, the problems of statistics were almost always thought of as problems of deciding what to say rather than what to do, though there had already been some interest in replacing the verbalistic by the behavioralistic outlook. The first emphasis of the behavioralistic outlook in statistics was apparently made by J. Neyman in 1938 in [N3], where he coined the term "inductive behavior" in opposition to "inductive inference." In the verbalistic outlook, which still dominates most everyday statistical thought, the basic acts are supposed to be
assertions; and schemes based on observation are sought that seldom lead to false, or at any rate grossly inaccurate, assertions. The verbalistic outlook in statistics seems to have its origin in the verbalistic outlook in probability criticized in § 2.1, which in turn is traceable to the ancient tradition in epistemology that deductive and inductive inference are closely analogous processes.
I, and I believe others sympathetic with Wald's work, would analyze the verbalistic outlook in statistics thus: Whatever an assertion may be, it is an act; and deciding what to assert is an instance of deciding how to act. Therefore decision problems formulated in terms of acts are no less general than those formulated in terms of assertions. If, on the other hand, a sufficiently broad interpretation is put on the notion of assertion, perhaps every decision to adopt an act can be regarded as an assertion to the effect that that act is the best available, in which case the difference between the verbalistic and the behavioralistic outlooks is only terminological; but I do think that, even under such an interpretation, the behavioralistic outlook with its tendency to emphasize consequences offers the better terminology.
Fallacious attempts to analyze away the difference between the verbalistic and behavioralistic viewpoints are also sometimes put forward, especially in informal discussion. For example, it is sometimes said that one should act as though his best estimate of a quantity were in fact the quantity itself. But on that basis few of us would buy life insurance for next year, for we do not typically estimate the year of our death to be so close. Other examples are discussed by Carnap in Section 50 of [C1].
If assertions are, indeed, to be interpreted as a special class of acts of particular importance to statistics, I have no clear idea what that class may be; but it would presumably exclude certain acts, such as the design of an experiment, that surely are of importance to statistics. Actually the verbalistic outlook has led to much confusion in the foundations of statistics, because the notion of assertion has been used in several different, but always ill-defined, senses, and because emphasis on assertion distracts from the indispensable concept of consequences.
I conclude that the behavioralistic outlook is clearer, fuller, and better unified than the verbalistic; and that such value as any verbalistic concept may have it owes to the possibility of one or more behavioralistic interpretations.
This analysis is really too brief and must be supplemented by certain remarks. To begin with, the reader may wonder whether the verbalistic outlook has adherents who defend it against the behavioralistic, and if so what their arguments may be. Actually, the statistical public seems
to greet the behavioralistic outlook as a relatively new idea (how old it may actually be is beside the point here) which as such must be regarded with some skepticism. To the best of my knowledge, however, only one objection against the behavioralistic outlook has been presented. It must be discussed next.
It has been seen as an objection to the behavioralistic outlook that the consequences of some assertions, particularly those of pure science, are extremely subtle and difficult to appraise. As a function of the true but unknown velocity of light, what, for example, will be the consequences of asserting that the velocity of light is between $2.99 \times 10^{10}$ and $3.01 \times 10^{10}$ centimeters per second? But, if some acts do have subtle consequences, that difficulty cannot properly be met by denying that they are acts or by ignoring their consequences. Certain practical solutions of the difficulty are known. For example, considerations of symmetry or continuity may, as is illustrated in Chapters 14 and 15, make a wise decision possible even in some cases where the explicit consequences of the available acts are beyond human reckoning. Again, analysis sketched in the next two paragraphs tends to show that assertions with extremely subtle consequences play a smaller role in science and other affairs than might at first be thought.
No worker would actually publish (indeed no journal would accept) as research the hypothetical assertion about the velocity of light mentioned in the paragraph above. The consequences might be subtle, if he did; but they would not be very important, for no one would take him seriously. An actual worker would do as much as was practical to say what observations relevant to the velocity of light he, and perhaps others, had performed and what had been observed.
To be sure, his statement of the observations would typically be much condensed; he would resort to sufficient statistics or other devices to put his reader rapidly in position to act as though the reader himself had made the observations.
Assertions about the velocity of light, and countless others of that sort, are of course published in textbooks and handbooks. These assertions do indeed have complicated consequences, so judgment is called for in the compilation of such books; but the seriousness of the consequences of their assertions is limited because of the possibility of referring to original research publications, a possibility serious textbooks and handbooks facilitate by the inclusion of bibliographies.
On the other hand, it is obvious that many problems described according to the verbalistic outlook as calling for decisions between assertions really call only for decisions between much more down-to-earth acts, such as whether to issue single- or double-edged razors to an army,
how much postage to put on a parcel, or whether to have a watch readjusted. It is time now to turn back to objectivistic decision problems.
3 Mixed acts
Speaking with pedantic strictness, it might be said that Wald does not propose a solution for the general objectivistic decision problem, because, before undertaking a solution, he insists that $\mathbf{F}$ be subject to a certain condition. On the other hand, he argues that the condition is typically met in practice; he might fairly have insisted that it is the very heart of much actual statistical practice. Before discussing the issue in detail, let me give a small but typical illustration of it.
Suppose that in a rental library I am confronted with the choice between two detective stories, each of which looks more horrifying than the other. At first sight it would seem that only two acts are open to me, namely, to rent one book or the other, but Wald points out that there are other possibilities, not ordinarily thought of as such. In particular, I can eliminate one of the books by flipping a coin. More accurately and more generally, I can let my choice depend on the outcome of a random variable that is utterly irrelevant to the fundamental partition; in this example, a random variable the outcome of which is independent of the relative merits of the two books. The random variable may as well be confined at the outset to two values corresponding to the rental of one or the other of the books, and random variables assigning the same probabilities to the books are equivalent for the purpose at hand. In practice, especially serious statistical practice, such random variables are, taking reasonable precautions, readily provided by coins, cards, dice, tables of random numbers, and other devices.
In terms of the general objectivistic decision problem, Wald's point can (except for mathematical technicalities) be formulated thus: If $\mathbf{f}_r$ represents a finite number of elements of $\mathbf{F}$, and $\phi(r)$ is a corresponding set of non-negative numbers such that $\sum_r \phi(r) = 1$, then the person can make the mixed act
$$\mathbf{f} = \sum_r \phi(r)\, \mathbf{f}_r \tag{1}$$
available to himself by observing at no appreciable cost a random variable taking the values $r$ with corresponding probabilities $\phi(r)$ irrespective of which $B_i$ obtains, so $\mathbf{F}$ may be assumed to include $\mathbf{f}$. Technically, the sum in (1) should, for full generality, be replaced by an integral with respect to a probability measure. But such integrals become superfluous under the simplifying assumption, which is herewith made,
that there are in $\mathbf{F}$ a finite set of acts $\mathbf{f}_r$, to be called primary acts, with respect to which every act in $\mathbf{F}$ can be represented in the form (1). In the rental-library example, the two acts corresponding to the two books can be regarded as primary.
Since mixed acts are also available from the personalistic point of view, it may well be asked whether it is advantageous to consider them in connection with that point of view, and, if not, how they can be of advantage from one point of view but not the other. The answer to the first part of the question is easy. Indeed, if $\mathbf{f}$ is defined by (1) then it is personalistically impossible that $\mathbf{f}$ should be definitely preferred to every $\mathbf{f}_r$, that is, that
$$E(\mathbf{f}) = \sum_r \phi(r)\, E(\mathbf{f}_r) > \max_r E(\mathbf{f}_r), \tag{2}$$
for a weighted mean cannot be greater than all its terms. Technical explanation of the efficacy of mixed acts from the objectivistic point of view can best be presented after the whole statement of the minimax rule, but those at all familiar with modern statistical practice will derive some insight from the remark that the usual preference of statisticians for random samples represents a preference for certain mixed acts.
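The impossibility of (2) can be checked numerically. The following is only a minimal sketch: the expectations and mixing weights are hypothetical numbers chosen for illustration, and the helper name `mixed_expectation` is not from the text.

```python
from fractions import Fraction

def mixed_expectation(weights, expectations):
    """Return E(f) = sum_r phi(r) * E(f_r) for the mixed act f of (1)."""
    if any(w < 0 for w in weights) or sum(weights) != 1:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * e for w, e in zip(weights, expectations))

# Hypothetical personal expectations E(f_r) of three primary acts.
expectations = [Fraction(3), Fraction(5), Fraction(4)]
# Hypothetical mixing probabilities phi(r).
weights = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

e_mixed = mixed_expectation(weights, expectations)
print(e_mixed, max(expectations))

# A weighted mean never exceeds its largest term, so (2) cannot hold.
assert e_mixed <= max(expectations)
```

Exact fractions are used so that the comparison is not blurred by rounding.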
4 Income and loss
It is sometimes suggestive, and in conformity with some statistical (though not quite with economic) usage, to refer to $E(\mathbf{f} \mid B_i)$ as the income of $\mathbf{f}$ when $B_i$ obtains, and, correspondingly, to use the notation $I(\mathbf{f}; i)$. An important concept associated with the income is that which I shall refer to as the loss (symbolized by $L(\mathbf{f}; i)$) incurred by the act $\mathbf{f}$ when $B_i$ obtains. By that I mean the difference between the income the person could attain if he were able to act with the certain knowledge that $B_i$ obtained and that which he will attain if he decides on $\mathbf{f}$ when $B_i$ does in fact obtain. Formally,
$$L(\mathbf{f}; i) =_{Df} \max_{\mathbf{f}'} I(\mathbf{f}'; i) - I(\mathbf{f}; i). \tag{1}$$
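Definition (1) can be illustrated with a small table. This is a sketch under stated assumptions: the income table is hypothetical, a finite list of acts stands in for $\mathbf{F}$, and the function name `loss_table` is introduced here only for illustration.

```python
def loss_table(income):
    """Given income[f][i] = I(f; i), return loss[f][i] = max_f' I(f'; i) - I(f; i)."""
    n_states = len(income[0])
    # Best attainable income in each state i, were B_i known in advance.
    best = [max(row[i] for row in income) for i in range(n_states)]
    return [[best[i] - row[i] for i in range(n_states)] for row in income]

# Hypothetical incomes of three acts (rows) under two states B_1, B_2 (columns).
income = [
    [4, 0],
    [1, 3],
    [2, 2],
]

loss = loss_table(income)
for row in loss:
    print(row)
```

By construction every loss is non-negative, and in each state at least one act has loss zero.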
If the person decides on $\mathbf{f}$ when $B_i$ obtains, $L(\mathbf{f}; i)$ measures in terms of income the error he has made. If he were himself informed of $B_i$ after $\mathbf{f}$ had been chosen, which is not typically the case, $L(\mathbf{f}; i)$ would, so to speak, measure his cause for regret. On that account, some have proposed to call loss "regret," but that term seems to me charged with emotion and liable to lead to such misinterpretation as that the loss necessarily becomes known to the person. On the other hand, the
term "loss" has been used by Wald in the sense of negative income, but in contexts where
COROLLARY 1 ... for every $f$ and $g$, then $f'$ and $g'$ are, respectively, minimax and maximin, and $L^* = L_* = L(f'; g')$.
3 Bilinear games
If one stumbles somehow onto a pair $f'$, $g'$ satisfying the hypothesis of Corollary 2.1, then he has discovered a minimax, a maximin, and the values (in this case equal to each other) of $L^*$ and $L_*$. But that possibility of discovery does not exist unless $L^* = L_*$, which at the level of generality of the last section is unusual. Almost all real interest, however, centers on a very special class of abstract games, here to be called bilinear games, for which it is demonstrable that $L^*$ is invariably equal to $L_*$.
The definition of bilinear games involves several steps. First, consider an abstract game, $L(r; i)$, based on a pair of variables, $r$ and $i$. The two variables are here assumed for simplicity to have only a finite number of possible values, an assumption that can, and for statistics must, be considerably relaxed. Next, let $f$ and $g$ be non-negative functions of $r$ and $i$, respectively, arbitrary except for the constraint that
$$\sum_r f(r) = \sum_i g(i) = 1, \tag{1}$$
in short, probability measures on the $r$'s and $i$'s, respectively. Finally, the bilinear game $L(f; g)$ is defined thus.
$$L(f; g) =_{Df} \sum_{r,i} L(r; i)\, f(r)\, g(i). \tag{2}$$
It is important to recognize that the duality principle continues to hold, that is, if $L(f; g)$ is a bilinear game, then $L(g; f) = -L(f; g)$ is also one.
In terms of the auxiliary functions
$$L(f; i) =_{Df} \sum_r L(r; i)\, f(r), \qquad L(r; g) =_{Df} \sum_i L(r; i)\, g(i), \tag{3}$$
the following equalities and inequalities can easily be verified by the reader.
$$\max_g L(f; g) = \max_i L(f; i), \qquad \min_f L(f; g) = \min_r L(r; g). \tag{4}$$
$$\min_r \max_i L(r; i) \ge \min_f \max_i L(f; i) = L^* \ge L_* = \max_g \min_r L(r; g) \ge \max_i \min_r L(r; i). \tag{5}$$
But more can be said in connection with (5), for it has been shown by von Neumann [V3] that for the special class of functions now under discussion $L^*$ is actually equal to $L_*$. This important equality cannot conveniently be proved here, but the interested reader can refer to the relatively simple proof given by von Neumann and Morgenstern in Section 17.6 of [V4] (reading first, if necessary, the introduction to the mathematics of convex sets that constitutes Chapter 16 of that book) or to the version of it presented in [B18]. In the light of the equality of $L^*$ and $L_*$, (5) becomes
$$\min_r \max_i L(r; i) \ge \min_f \max_i L(f; i) = L^* = L_* = \max_g \min_r L(r; g) \ge \max_i \min_r L(r; i). \tag{6}$$
In view of (4) and (6), Theorem 2.1 can be much improved upon for bilinear games:
THEOREM 1 For bilinear games, the following three conditions on $f'$, $g'$, and $C$ are equivalent:
1. $f'$ is minimax, $g'$ is maximin, and $L^* = C$.
2. $L(f'; g) \le C \le L(f; g')$ for every $f$ and $g$.
3. $L(f'; i) \le C \le L(r; g')$ for every $i$ and $r$.
PROOF. Condition 2 implies 1, by Theorem 2.1; 1 implies 3 by (6); and 3 implies 2 by (4). ∎
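Condition 3 of Theorem 1 involves only finitely many inequalities, so a guessed triple $f'$, $g'$, $C$ can be checked mechanically. The sketch below assumes a hypothetical game matrix with three values of $r$ (rows) and two of $i$ (columns); the matrix, the guessed mixtures, and the function names are all illustrative and not taken from the text.

```python
from fractions import Fraction as F

def loss_against_state(L, f, i):
    """L(f; i) = sum_r L(r; i) f(r), the first auxiliary function in (3)."""
    return sum(f[r] * L[r][i] for r in range(len(L)))

def loss_against_mixture(L, r, g):
    """L(r; g) = sum_i L(r; i) g(i), the second auxiliary function in (3)."""
    return sum(L[r][i] * g[i] for i in range(len(g)))

def satisfies_condition_3(L, f, g, c):
    """Check L(f; i) <= c <= L(r; g) for every i and r."""
    n_r, n_i = len(L), len(L[0])
    upper_ok = all(loss_against_state(L, f, i) <= c for i in range(n_i))
    lower_ok = all(c <= loss_against_mixture(L, r, g) for r in range(n_r))
    return upper_ok and lower_ok

# Hypothetical game L(r; i).
L = [
    [0, 4],
    [2, 0],
    [3, 3],
]
f_guess = [F(1, 3), F(2, 3), F(0)]  # guessed minimax mixture over r
g_guess = [F(2, 3), F(1, 3)]        # guessed maximin mixture over i
c_guess = F(4, 3)                   # guessed value of L*

print(satisfies_condition_3(L, f_guess, g_guess, c_guess))
```

If the check succeeds, Theorem 1 says the guesses are respectively minimax and maximin and that $C = L^*$; if it fails, nothing is learned except that at least one guess is wrong.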
COROLLARY 1 A necessary and sufficient condition that $f$ be minimax is that, for some $g$, $L(f; i) \le L(r; g)$ for every $r$ and $i$. Under that condition $L^* = L(f; g)$, and $g$ is maximin.
Corollary 1 is an especially appropriate expression of Theorem 1 in connection with the minimax decision theories, where the $g$'s are, after all, not really of interest in themselves.
Theorem 1, and equivalently Corollary 1, are of great practical value. To be sure, there are algorithms, or rules (given by Shapley and Snow in [S12]), by which $L^*$ and all minimax values of $f$ can in principle be computed, but these algorithms are so awkward to apply that in practice one generally guesses one or more minimax $f$'s, and also a maximin $g$, on the basis of some clues, verifying the guess and evaluating $L^*$ by Corollary 1. To finish the job, one then finds, if one can, an argument to show that the minimax $f$'s thus discovered are all there are. This rather imperfect procedure is especially important, since it can relatively easily be extended to many situations in which $r$ and $i$ are not confined to finite ranges, as does not seem to be true of the algorithms.
As was mentioned in § 10.3 and as the examples that have been given illustrate, if $f$ is minimax, then $L(f; i)$ is in practice often actually equal to $L^*$ for all, or at least many, values of $i$. Insight into that phenomenon is given by the following theorem.
THEOREM 2 If $i$ is such that there exists a maximin $g$ for which $g(i) > 0$, then $L(f; i) = L^*$ for every minimax $f$.
PROOF. $L(f; i) \le L^*$, because $f$ is minimax. Therefore $L(f; g)$, being a weighted average of the $L(f; i)$'s, is at most $L^*$; and it is actually less, if any term with positive weight is not equal to $L^*$. But $L(f; g) \ge L^*$, because $g$ is maximin. ∎
It can happen, and in statistical practice it often does happen, that every $i$ satisfies the hypothesis of Theorem 2, in which case $L(f; i) = L^*$ for every $i$ and every minimax $f$.
Theorem 2 often provides a basis for guessing a minimax $f$, a maximin $g$, and the value of $L^*$, which can then be checked by application of Corollary 1. To take a simple example, suppose that there are $n$ values of $r$, and $n$ of $i$. There may be some reason to conjecture that each $i$ is used by some maximin $g$, that is, that each $i$ satisfies the hypothesis of Theorem 2. If the conjecture is in fact true, then $f(r)$ and $L^*$ satisfy the system of equations
$$\sum_r 1 \cdot f(r) + 0 \cdot L^* = 1, \qquad \sum_r L(r; i)\, f(r) - 1 \cdot L^* = 0. \tag{7}$$
Typically, (7) as a system of $n + 1$ linear equations in $n + 1$ variables will have exactly one solution $(f(r), L^*)$. This solution, if the conjecture is valid, will actually consist of the components of a minimax $f$ (in this case the only one) and the value of $L^*$. But the conjecture is not yet confirmed. In particular, if any $f(r)$ in the solution of (7) is negative, it is contradicted; if not, the investigation can proceed. The candidates for maximin values of $g$ are now, by the dual of Theorem 2, among the solutions of the system
$$\sum_i 1 \cdot g(i) + 0 \cdot L^* = 1, \qquad \sum_i L(r; i)\, g(i) - 1 \cdot L^* = 0, \tag{8}$$
where $r$ is confined to the values for which $f(r) > 0$. To consider only the simplest and most typical case, suppose $f(r) > 0$ for every $r$. Regarding $L^*$ as known, (8) consists of $n + 1$ equations for $n$ variables, which at first sight might be expected generally to have no solution. To put the matter differently, if one forgets for the moment that $L^*$ has been determined by (7), it might seem possible that (8) could lead to a different value, say $L^{*\prime}$. But, using the latter part of (8) and then the first part of (7), it is seen that
$$\sum_{r,i} L(r; i)\, f(r)\, g(i) = \sum_r f(r)\, L^{*\prime} = L^{*\prime}, \tag{9}$$
and dually the double sum equals $L^*$; so discrepancy between $L^*$ and $L^{*\prime}$ is not among the real snags in the tentative program, irrespective of the number of $r$'s participating in (8). Finally, if (8) leads to even one set of positive $g(i)$'s, it follows from Corollary 1 that the $f$ and $L^*$ derived from (7) are the unique minimax and the true value of $L^*$, respectively.
The converse of Theorem 2 has been proved by Bohnenblust, Karlin, and Shapley in [B19], though their proof cannot be reproduced here. As is pointed out by these authors, the converse does not extend at all readily to situations involving infinite ranges of $r$ and $i$. Theorem 2 and its converse can be summarized thus:
THEOREM 3 There exists a maximin $g$ for which $g(i) > 0$, if and only if $L(f; i) = L^*$ for every minimax $f$.
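The guess-and-verify program built on systems (7) and (8) can be sketched for a small case. Everything below is illustrative: the 2 x 2 game is hypothetical, the helper names are invented here, and system (8) is solved with its own unknown $L^{*\prime}$ so that the agreement asserted by (9) can be seen directly. The solver assumes the system is non-singular and will raise `StopIteration` otherwise.

```python
from fractions import Fraction

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination over exact fractions."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and swap it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]

def system_7(L):
    """Unknowns f(0..n-1) and L*: sum_r f(r) = 1, sum_r L(r; i) f(r) - L* = 0."""
    n = len(L)
    a = [[1] * n + [0]]
    a += [[L[r][i] for r in range(n)] + [-1] for i in range(n)]
    return solve_linear(a, [1] + [0] * n)

def system_8(L):
    """Unknowns g(0..n-1) and L*': sum_i g(i) = 1, sum_i L(r; i) g(i) - L*' = 0."""
    n = len(L)
    a = [[1] * n + [0]]
    a += [[L[r][i] for i in range(n)] + [-1] for r in range(n)]
    return solve_linear(a, [1] + [0] * n)

# Hypothetical game L(r; i) with n = 2.
L = [
    [0, 4],
    [2, 0],
]

*f, value_from_7 = system_7(L)
*g, value_from_8 = system_8(L)
print(f, value_from_7)
print(g, value_from_8)
```

In this example both solutions are positive throughout, so by Corollary 1 the `f` found is minimax, the `g` found is maximin, and the two computed values coincide with $L^*$.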
4 An example of a bilinear game
It is now convenient to discuss a certain example, or rather a class of examples, of bilinear games, namely those in which $i$ takes only two values, say 1 and 2. Two preliminary remarks will help to orient the
discussion. First, bilinear games in which $i$ takes only one value are devoid of interest, for the minimax problem in that case is simply a problem of finding an ordinary minimum. Second, the discussion of bilinear games in which $i$ takes only two values includes, in effect, because of the duality principle, the discussion of those in which $r$ takes only two values.
If $i$ takes only the two values 1 and 2, the values $g = \{g(1), g(2)\}$ can be represented graphically by points on an interval, as illustrated at the foot of Figure 1. For every $r$, $L(r; g)$ is linear as a function of $g$, as is $L(f; g)$ for every $f$. It is, of course, just because the $L(f; g)$ of a bilinear game is linear in this sense and its dual that I use the term "bilinear."
Figure 1
In Figure 1 the five slanting solid lines represent the five linear functions $L(r; g)$ of a bilinear game in which $r$ (for illustration) takes five values and $i$ takes two. The dashed lines represent two values of $f$,
each of which has for simplicity been 80 choaen as to use, or mix, only two values of r. As may be verified by inspection, the particular bilinear game repreaented by Figure 1 baa the special property that minL(r; t1 - 0 for
eaeh i, which is the distinguishing property of those bilintsr pmea that arise in connection with the minimax decision theories descnDed in Chapters 9 and 10. Figure 1 bears a more than accidental resemblance to Figure 7.2.1. In particular, the concave function (1)
min 1,,(1; I) r
marked by heavy line segments in Figure 1 is closely analogous to the convex function so marked in Figure 7.2.1. The particular g emphasized by Figure 1 is that for which the function (1) attains its maximum value, which according to (3.6) is L*. This g is therefore the unique maximin. It has been shown quite generally in [B19] that bilinear games with more than one minimax or maximin are, in a sense, unusual; Figure 1 makes it graphically clear that the special bilinear games now under consideration do usually have a unique maximin, because there is more than one maximin only in case (1) happens to have a horizontal segment.

What are the minimax f's for the bilinear game represented by Figure 1? According to the dual of Theorem 3.2, an r cannot be used in the formation of a minimax f unless L(r; g) = L* for the (in this case unique) maximin g. That consideration eliminates all but two of the r's from consideration, and it is graphically clear that this will usually be the case for bilinear games in which i takes only two values. Theorem 3.2 itself, applied to the particular game under discussion, shows that the graph of L(f; g) as a function of g must be horizontal for any minimax f. The two preceding conditions together eliminate all values of f except the one corresponding to the horizontal dashed line in Figure 1; and that f is indeed minimax, because L(f; i) = L* for both values of i.

To specialize still further, suppose that r as well as i takes only two values. Such a game can, of course, be represented graphically in the spirit of Figure 1. Several qualitatively different situations can occur, which might, for example, be classified by the relation of the two linear functions L(r; g) to each other. The reader should graph and consider many or all of these possibilities for himself. The only one treated here will be that in which the two functions cross each other at an interior g, with one function sloping up and the other down. It is graphically clear that there will then be a unique minimax and a unique maximin, as will now be shown analytically. The condition postulated can be expressed without loss of generality thus:

$$L(1; 2) > L(1; 1), \quad L(2; 1) > L(2; 2), \quad L(2; 1) > L(1; 1), \quad L(1; 2) > L(2; 2). \tag{2}$$
Or, more mnemonically,

$$L(1; 2),\ L(2; 1) > L(1; 1),\ L(2; 2). \tag{3}$$
It is conjectured, in this case on graphical grounds, that the program outlined in connection with (3.7-8) applies, and the reader can indeed verify that that program leads to the conclusion

$$L^* = \{L(1; 2)L(2; 1) - L(1; 1)L(2; 2)\}/\Delta, \tag{4}$$

where

$$\Delta = L(1; 2) + L(2; 1) - L(1; 1) - L(2; 2); \tag{5}$$

and that the unique minimax f and maximin g are

$$f(1) = [L(2; 1) - L(2; 2)]/\Delta, \qquad f(2) = [L(1; 2) - L(1; 1)]/\Delta, \tag{6}$$

$$g(1) = [L(1; 2) - L(2; 2)]/\Delta, \qquad g(2) = [L(2; 1) - L(1; 1)]/\Delta. \tag{7}$$
If the game arises from an application of the minimax decision theory, (3) almost always applies. More precisely, in this case, except possibly for the order of numbering,

$$L(1; 1) = L(2; 2) = 0 \quad \text{and} \quad L(1; 2),\ L(2; 1) \ge 0; \tag{8}$$

so, if only the inequalities in (8) are both strict, (3) applies. Then (4-7) specialize to

$$L^* = L(1; 2)L(2; 1)/\Delta, \tag{9}$$

where

$$\Delta = L(1; 2) + L(2; 1); \tag{10}$$

$$f(1) = L(2; 1)/\Delta, \qquad f(2) = L(1; 2)/\Delta, \tag{11}$$

$$g(1) = L(1; 2)/\Delta, \qquad g(2) = L(2; 1)/\Delta. \tag{12}$$
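The closed-form solution (4)-(7) can be checked numerically. The following is only an illustrative sketch: the function name `solve_2x2`, the 0-based indexing, and the sample loss table are assumptions introduced here, not part of the text.

```python
from fractions import Fraction

def solve_2x2(L):
    """Solve a 2x2 bilinear game by formulas (4)-(7).

    L[r][i] is the loss when r and i are chosen (0-based here, 1-based in
    the text). Condition (3) is assumed: both off-diagonal entries exceed
    both diagonal entries.
    """
    a, b = Fraction(L[0][0]), Fraction(L[0][1])   # L(1;1), L(1;2)
    c, d = Fraction(L[1][0]), Fraction(L[1][1])   # L(2;1), L(2;2)
    if not min(b, c) > max(a, d):
        raise ValueError("condition (3) fails; formulas (4)-(7) do not apply")
    delta = b + c - a - d                         # (5)
    value = (b * c - a * d) / delta               # (4)
    f = ((c - d) / delta, (b - a) / delta)        # (6)
    g = ((b - d) / delta, (c - a) / delta)        # (7)
    return value, f, g

# Hypothetical loss table of the special form (8): zero diagonal.
L = [[0, 3], [1, 0]]
value, f, g = solve_2x2(L)

# The minimax f equalizes the loss over i, and the maximin g over r.
loss_against_i = [f[0] * L[0][i] + f[1] * L[1][i] for i in range(2)]
loss_against_r = [g[0] * L[r][0] + g[1] * L[r][1] for r in range(2)]
```

With this table the equalized losses all coincide with the computed value, which is what makes f minimax and g maximin in the crossing-lines situation described above.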
5 Bilinear games exhibiting symmetry
Mathematically the solution of a bilinear game is often simplified by considerations of symmetry. For statistical applications, the implications of symmetry for bilinear games are of fundamental importance in so far as they represent a counterpart in the minimax theory of the disreputable but irrepressible principle of insufficient reason. This section discusses these implications in an elementary, but formal, way. It can be skimmed over or skipped outright without much detriment to the understanding of later sections.

Any discussion of symmetry involves, at least implicitly, the branch of mathematics known as the theory of groups. Though what is to be said here about games exhibiting symmetry is intended to be clear without prior knowledge of the theory of groups, it may be mentioned that introductions to that subject are to be found in many places, for example in [B14].

It can, and in practice often does, happen that a bilinear game has some symmetry.† This means that there are permutations, here symbolized by T, T', etc., of the values of r among themselves and the values of i among themselves such that

$$L(Tr; Ti) = L(r; i) \tag{1}$$

for every r and i, where, of course, Tr and Ti are the values into which T carries r and i respectively. Permutations satisfying (1) are said to leave the game invariant, or to belong to the group (of symmetries) of the game. The permutation U that leaves every r and every i fixed must be counted among the permutations in the group of the game, but the game has no symmetry (worthy of the name) unless there are other permutations besides U in its group. An example of a game with high symmetry is the game implicit in the second example of § 9.6, for, to any permutation whatsoever of the six i's in that game among themselves, there is a corresponding permutation of the r's such that the two permutations taken together leave the game invariant. It was, of course, the exploitation of symmetry that made the treatment of that example relatively simple.

Returning to bilinear games in general, if T and T' are in the group of the game, then the product TT' defined by the condition that
"'8
(2)
(TT'),.
:II
DI
T(T'r),
(TT')i == Df T{T't)
is obviously also a permutation in the group of the game. This multiplication of permutations somewhat resembles the ordinary multiplication of numbers. In particular, (TT')T'' is evidently the same as T(T'T''), though it is not necessarily true that TT' = T'T. Relative to this multiplication the permutation U plays the role of the unit, or number 1, in arithmetic, for it is obvious that TU = UT = T for any permutation T. For every permutation T, there is evidently a permutation $T^{-1}$, and one only, that undoes T, that is, one such that $T^{-1}T = U$. It is easy to see also that $TT^{-1} = U$ and that, if T is in the group of the game, $T^{-1}$ is too. The notation is of course motivated by the consideration that, relative to the multiplication of permutations, $T^{-1}$ plays the role of the reciprocal of T.

† This concept must not be confused with that of "symmetrical games," which are symmetrical in the sense that the equation L(r; i) = -L(i; r) is meaningful and true for every r and i.

It will be adopted as a definition that Tf and Tg are the functions such that $Tf(r) = f(T^{-1}r)$ and $Tg(i) = g(T^{-1}i)$ for every permutation T and for every r and i. The intervention of $T^{-1}$ in this definition may at first seem arbitrary, but it is motivated by the following considerations. First, if f is, for example, the function such that $f(r_0) = 1$ and f(r) = 0 for $r \ne r_0$, then Tf should be such that $Tf(Tr_0) = 1$ and Tf(r) = 0 for $r \ne Tr_0$. Second, S(Tf) should be (ST)f rather than (TS)f. The definition having been adopted, L(Tf; Tg) can be calculated thus:
$$L(Tf; Tg) = \sum_{r,i} L(r; i)\, f(T^{-1}r)\, g(T^{-1}i) = \sum_{r,i} L(Tr; Ti)\, f(r)\, g(i), \tag{3}$$

where the basic fact is exploited that, if r, i runs once through all pairs of values, then Tr, Ti also does so. It follows from (1) and (3) that, if T is in the group of the game, then

$$L(Tf; Tg) = L(f; g). \tag{4}$$
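The definition $Tf(r) = f(T^{-1}r)$ and the invariance (4) can be illustrated on a small game. This is a sketch under stated assumptions: the 3-by-3 loss table, the cyclic permutation, the weights, and the helper names `bilinear` and `act` are all hypothetical choices made for illustration.

```python
from fractions import Fraction

def bilinear(L, f, g):
    """L(f; g): the sum over r and i of L(r; i) f(r) g(i)."""
    return sum(L[r][i] * f[r] * g[i]
               for r in range(len(f)) for i in range(len(g)))

def act(perm, h):
    """Return Th, where Th(x) = h(T^{-1} x) and perm[x] is Tx."""
    out = [None] * len(h)
    for x, tx in enumerate(perm):
        out[tx] = h[x]            # equivalently, (Th)(Tx) = h(x)
    return out

# Hypothetical game whose loss depends only on (i - r) mod 3, so that the
# cyclic shift applied to r and i together satisfies L(Tr; Ti) = L(r; i).
L = [[0, 1, 2],
     [2, 0, 1],
     [1, 2, 0]]
T_r = T_i = [1, 2, 0]             # T sends 0 -> 1, 1 -> 2, 2 -> 0

f = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
g = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]

lhs = bilinear(L, act(T_r, f), act(T_i, g))   # L(Tf; Tg)
rhs = bilinear(L, f, g)                       # L(f; g)
```

Exact rational arithmetic is used so that the two sides can be compared for equality rather than approximate agreement.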
An f (g) is called invariant under the group of the game, if and only if Tf = f (Tg = g) for every T in the group. There is a natural way to construct from any f an $\bar f$ invariant under the group, and dually for g. Namely, let

$$\bar f =_{\mathrm{Df}} \frac{1}{n} \sum_T Tf, \tag{5}$$

where (here and throughout this section) n is the number of elements in the group and the summation is over all elements of the group. The definition (5) accomplishes its objective, because

$$\sum_r \bar f(r) = \frac{1}{n} \sum_T \sum_r Tf(r) = \frac{n}{n} = 1, \tag{6}$$

and

$$T'\bar f(r) = \bar f(T'^{-1}r) = \frac{1}{n} \sum_T f(T^{-1}T'^{-1}r) = \frac{1}{n} \sum_T T'Tf(r) = \bar f(r) \tag{7}$$

for every r and for every T' in the group. In (7) use is made of the easily established facts that $T^{-1}T'^{-1} = (T'T)^{-1}$ and that as T runs once through the group so does T'T. The justification of $\bar g$ is, of course, dual to that of $\bar f$. It is noteworthy that $f = \bar f$, if and only if f is invariant under the group of the game.
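The averaging construction (5) is easy to carry out mechanically. The sketch below assumes a hypothetical group on four values of r generated by a single transposition; the helper names and the weights are illustrative only.

```python
from fractions import Fraction

def act(perm, h):
    """Return Th, where Th(x) = h(T^{-1} x) and perm[x] is Tx."""
    out = [None] * len(h)
    for x, tx in enumerate(perm):
        out[tx] = h[x]
    return out

def generate_group(generators):
    """Close a set of permutations under composition (finite, so a group)."""
    identity = tuple(range(len(generators[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        t = frontier.pop()
        for s in generators:
            st = tuple(s[t[x]] for x in range(len(t)))   # (ST)x = S(Tx)
            if st not in group:
                group.add(st)
                frontier.append(st)
    return sorted(group)

def average(group, f):
    """The invariant f-bar of (5): (1/n) times the sum of Tf over the group."""
    n = len(group)
    return [sum(act(T, f)[r] for T in group) / n for r in range(len(f))]

# Hypothetical symmetry: interchange the first two values of r.
group = generate_group([[1, 0, 2, 3]])
f = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
f_bar = average(group, f)
```

In this example the averaged weights are equal on the two interchanged values and unchanged elsewhere, they still sum to 1 as in (6), and applying any element of the group leaves them fixed as in (7).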
Suppose R (I) is a set of the r's (i's). Then, by definition, $r \in TR$ ($i \in TI$), if and only if $T^{-1}r \in R$ ($T^{-1}i \in I$); and the set R (I) is invariant under the group of the game, if and only if TR = R (TI = I) for every T in the group.
Exercises

1a. If R is invariant, so is ~R.
1b. If R and R' are invariant, so are $R \cap R'$ and $R \cup R'$.
1c. The vacuous set and the set of all r's are invariant.
2. For every R, let $\bar R =_{\mathrm{Df}} \bigcup_T TR$, where T is of course confined to the group; and, for every r, define the trajectory of r as $\overline{[r]}$, where [r] is, as is customary, the set whose only element is r.
(a) $\bar R$ is the smallest invariant set containing R.
(b) $\bar R$ is the intersection of all invariant sets containing R.
(c) $\bar R = \bigcup_{r \in R} \overline{[r]}$.
(d) $\overline{[r]}$ is the smallest invariant set of which r is an element.
3a. If R is invariant, and $R \cap \overline{[r]} \ne 0$, then $R \supset \overline{[r]}$.
3b. If R is invariant, and $r \in R$, then $R \supset \overline{[r]}$.
3c. If $\overline{[r]} \cap \overline{[r']} \ne 0$, then $\overline{[r]} = \overline{[r']}$.
4a. The following conditions are equivalent:
α. R is invariant.
β. $R = \bar R$.
γ. For every $r \in R$, $\overline{[r]} \subset R$.
δ. R is partitioned into sets each of which is a trajectory.
4b. The following conditions are equivalent:
α. f is invariant.
β. The set of r's for which f takes any given value is invariant.
γ. f is constant on every trajectory.
5a. If T'r = r, then $(TT'T^{-1})Tr = Tr$.
5b. If {r} denotes the number of elements of the group that leave r fixed, then {r} = {Tr}.
5c. If $\|r\|$ denotes the number of elements in $\overline{[r]}$, then $\|r\| \{r\} = n$.
5d. Both {r} and $\|r\|$ are divisors of n.
5e. The value of $\bar f$ everywhere on the trajectory of r is

$$\frac{1}{\|r\|} \sum_{r' \in \overline{[r]}} f(r'). \tag{8}$$

6. Note the dual of each of the preceding exercises.
In the establishment of all these preliminaries, the theory of bilinear games has been almost lost sight of, but it is now possible to say much about the significance of invariant functions and sets for bilinear games. I begin with a theorem valued for some of its corollaries rather than for any charm of its own.

THEOREM 1 If L(f'; Tg) ≤ L(f''; Tg) for every T, then $L(\bar f'; g) \le L(\bar f''; g)$. If in addition L(f'; g) < L(f''; g), then $L(\bar f'; g) < L(\bar f''; g)$.

PROOF.

$$L(T^{-1}f'; g) = L(f'; Tg) \le L(f''; Tg). \tag{9}$$

Therefore

$$L(\bar f'; g) = \frac{1}{n} \sum_T L(T^{-1}f'; g) \le \frac{1}{n} \sum_T L(f''; Tg) = L(\bar f''; g). \tag{10}$$

If L(f'; g) < L(f''; g), then (9) is strict for T = U, and therefore (10) is also strict. ◆

COROLLARY 1 If L(f'; Tg) = L(f''; Tg) for every T, then $L(\bar f'; g) = L(\bar f''; g)$.
COROLLARY 2 If L(f'; g) = L(f''; g) for every g, then $L(\bar f'; g) = L(\bar f''; g)$ for every g.

COROLLARY 3 $L(\bar f; \bar g) = L(f; \bar g) = L(\bar f; g)$ for every f and g.

COROLLARY 4 If f is invariant under the group of the game, $L(f; \bar g) = L(f; g)$ for every g.
Paraphrasing some of the nomenclature of § 6.4, if L(f'; g) ≤ L(f''; g) for every g, say that f' dominates f''; if f' dominates f'', but f'' does not dominate f', say that f' strictly dominates f''; if f' dominates f'', and f'' dominates f', say that f' and f'' are equivalent; if f' is not strictly dominated by any f, say that f' is admissible.

COROLLARY 5 If f' dominates, strictly dominates, or is equivalent to f'', then $\bar f'$ dominates, strictly dominates, or is equivalent to $\bar f''$, respectively.

COROLLARY 6 If $L(f; Tg) \le L(\bar f; Tg)$ for every T, then $L(f; g) = L(\bar f; g)$.

COROLLARY 7 If $L(f; i) \le L(\bar f; i)$ for every $i \in I$, where I is invariant under the group of the game, then $L(f; i) = L(\bar f; i)$ for $i \in I$.

COROLLARY 8 It is impossible that f strictly dominates $\bar f$.

THEOREM 2 $\max_g L(\bar f; g) \le \max_g L(f; g)$, equality holding, if and only if the right-hand maximum is attained for a g invariant under the group of the game.

PROOF.

$$\max_g L(\bar f; g) = \max_g L(f; \bar g) \le \max_g L(f; g). \tag{11}$$

The inequality in (11) follows from the fact that every $\bar g$ is a g; equality holds, if and only if the final maximum is attained for some $\bar g$, that is, for some invariant g. ◆

COROLLARY 9 If f is minimax, so is $\bar f$.

COROLLARY 10 There exists a minimax f invariant under the group of the game.
If a game has more than one minimax f, it is tempting to suppose that in statistical, if not in all, applications of the theory an invariant, or symmetrical, minimax f would recommend itself at least as highly as any other minimax f. This supposition, being vague, cannot be really proved, but certain facts tend to support it. In particular, the following theorem is a reassuring improvement of Corollary 10.

THEOREM 3 There is at least one admissible, invariant, minimax f.
PROOF. It is a direct consequence of a theorem (Theorem 2.22, p. 04, of [W3]) of Wald's, too technical for statement or proof here, that at least one invariant minimax f is strictly dominated by no invariant f'. If that f were strictly dominated by any f'' (invariant or not), it would also, according to Corollary 5, be strictly dominated by $\bar f''$, which is impossible. Therefore f is admissible. ◆

If the bilinear game has high symmetry or, more explicitly, if the number of trajectories into which the r's or the i's, or both, are partitioned is small, the search for invariant minimax f's and invariant maximin g's is relatively simple. An invariant minimax is characterized as an invariant f such that

$$\max_g L(f; g) = \min_f \max_g L(f; g) = L^*. \tag{12}$$

But, since at least one invariant minimax exists, the criterion (12) is not changed if the minimization on its right side is confined to invariant f's; with f so confined, the criterion remains unchanged, if both maximizations are confined to invariant g's (as Corollary 3 shows). Thus the search for invariant minimax f's and invariant maximin g's amounts to the solution of an abstract game that arises from the original bilinear game by ruling out certain values of f and g, namely the un-invariant ones. This new and smaller abstract game can be exhibited as a bilinear game thus: Let it be understood for the moment that r' ranges over such a set of the r's that there is exactly one r' in every trajectory, and dually for i'. For invariant f and g,

$$L(f; g) = \sum_r \sum_i L(r; i) f(r) g(i) = \sum_{r'} \sum_{i'} \sum_{r \in \overline{[r']}} \sum_{i \in \overline{[i']}} L(r; i) f(r) g(i) = \sum_{r'} \sum_{i'} L'(r'; i') f'(r') g'(i'), \tag{13}$$

where

$$L'(r'; i') =_{\mathrm{Df}} \frac{1}{\|r'\|\,\|i'\|} \sum_{r \in \overline{[r']}} \sum_{i \in \overline{[i']}} L(r; i) \tag{14}$$

and

$$f'(r') =_{\mathrm{Df}} \|r'\| f(r'); \qquad g'(i') =_{\mathrm{Df}} \|i'\| g(i'). \tag{15}$$
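The passage from the original game to the smaller game on trajectories, as in (13)-(15), can be sketched in code. Everything named here (the helpers `orbits` and `reduced_game`, the 3-by-2 loss table, and its symmetry) is a hypothetical illustration, and the reduced table is computed as the average of L over each pair of trajectories, which is what (13) and (15) together require.

```python
from fractions import Fraction

def orbits(perms, size):
    """Partition range(size) into trajectories under the given permutations."""
    seen, result = set(), []
    for start in range(size):
        if start in seen:
            continue
        orbit, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for p in perms:
                if p[x] not in orbit:
                    orbit.add(p[x])
                    stack.append(p[x])
        seen |= orbit
        result.append(sorted(orbit))
    return result

def reduced_game(L, r_perms, i_perms):
    """Average L over each pair of trajectories, giving the table L'."""
    r_orbits = orbits(r_perms, len(L))
    i_orbits = orbits(i_perms, len(L[0]))
    table = [[Fraction(sum(L[r][i] for r in ro for i in io),
                       len(ro) * len(io))
              for io in i_orbits] for ro in r_orbits]
    return table, r_orbits, i_orbits

# Hypothetical game: T interchanges r = 0, 1 and i = 0, 1, leaving r = 2 fixed.
L = [[0, 4],
     [4, 0],
     [1, 1]]
L_reduced, r_orbits, i_orbits = reduced_game(L, [[1, 0, 2]], [[1, 0]])

# Check (13) for one invariant f and the only invariant g.
f, g = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)], [Fraction(1, 2)] * 2
full = sum(L[r][i] * f[r] * g[i] for r in range(3) for i in range(2))
f_red = [len(o) * f[o[0]] for o in r_orbits]      # f'(r') = ||r'|| f(r')
g_red = [len(o) * g[o[0]] for o in i_orbits]      # g'(i') = ||i'|| g(i')
small = sum(L_reduced[a][b] * f_red[a] * g_red[b]
            for a in range(len(f_red)) for b in range(len(g_red)))
```

Here the i's form a single trajectory, so the reduced table has one column and its value is simply its smallest entry.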
Finally, it is easily verified that, except for the conditions f'(r') ≥ 0, g'(i') ≥ 0, and Σf'(r') = Σg'(i') = 1, the coefficients f'(r') and g'(i') are arbitrary. The new game is therefore to all intents and purposes a bilinear game with only as many r''s and i''s as there are r-trajectories and i-trajectories, respectively, in the original game. The new game, incidentally, may well have symmetry of its own.

If there is only one r- or one i-trajectory, the new game is so simple it scarcely deserves to be called a game. This occurs, for example, in the second example of § 9.6, where there is only one i-trajectory. In that situation there is only one invariant g, and it is equal at every i to the reciprocal of the total number of i's (which is here the value of $\|i\|$ for every i). That g must therefore be an admissible maximin. The value of L* is therefore given by

$$L^* = \min_r \frac{1}{\|i\|} \sum_i L(r; i). \tag{16}$$

The invariant minimax f's are those and only those invariant f's such that f(r) = 0 for every r that fails to minimize the sum in (16). Moreover, here the minimax f's (invariant or not) are all equivalent, as can be argued thus: Any invariant minimax $\bar f$ is such that

$$L(\bar f; g) = L(\bar f; \bar g) = L^* \tag{17}$$

for every g. If any minimax f whatsoever failed to satisfy (17), it would strictly dominate $\bar f$; but according to Corollary 8 that is impossible. Therefore in the very special situation at hand all minimax f's satisfy (17) and are accordingly equivalent.

It is, of course, important to extend consideration of symmetry to bilinear games with infinite sets of r's and i's, and infinite groups of symmetries, but the task has not yet proved straightforward. Two key references bearing on it are [U] and [B17].
CHAPTER 13

Objections to the Minimax Rules

1 Introduction

I have already expressed and supported my opinion that neither the objectivistic nor the personalistic minimax rule can be categorically defended (§ 9.7 and § 10.3). On the other hand, certain objections have been leveled against the objectivistic rule (that being the well-known one) that seem to me to call for reinterpretation, if not outright refutation.
2 A confusion between loss and negative income

Some objections valid against the minimax rule based on negative income are irrelevant to that based on loss. The notions that the minimax rule is ultrapessimistic and that it can lead to the ignoring of even extensive evidence have already been discussed as examples of such objections. Another example I would put in the same category has been suggested by Hodges and Lehmann [H6]. In this example a person who has observed n independent tosses of a coin for which the probability of heads has an unknown value p is required to predict the outcome of the (n + 1)th toss. Hodges and Lehmann here interpret prediction in the following somewhat sophisticated, but reasonable, sense. The person is, in the light of his observation, required to choose a number $\hat p$ between 0 and 1 and to pay a fine of $(1 - \hat p)^2$ or $\hat p^2$ according as the (n + 1)th toss is in fact heads or tails. Thus the (expected) income attached to the primary act $\hat p$ and event p is

$$I(\hat p; p) = -p(1 - \hat p)^2 - (1 - p)\hat p^2 = -(\hat p - p)^2 - p(1 - p). \tag{1}$$
As Hodges and Lehmann show, the only derived act (mixed or pure) that yields the minimax of the negative income is to set $\hat p = 1/2$ irrespective of the observation. But it is, in common sense, absurd thus to ignore the observation of the first n tosses. In view of this absurdity, almost everyone would agree that applying the minimax rule directly to the negative of (1) is a foolish act for the person to employ. The absurdity of minimizing the maximum of negative income in this example is of course no valid argument against minimizing the maximum loss. It is easy to see that the loss corresponding to (1) is

$$L(\hat p; p) = (\hat p - p)^2. \tag{2}$$
As Hodges and Lehmann happen to show in the same paper [H6] (though in a different context), and as will be discussed in some detail in § 4, the unique minimax derived act does use the observations to advantage, resulting in a loss of

$$\frac{1}{4(1 + \sqrt{n})^2} \tag{3}$$

irrespective of p. The absurd act of setting $\hat p = 1/2$ irrespective of the observation results in the loss $(p - 1/2)^2$, which in any ordinary context would be inferior to (3), especially for large n. Incidentally, the minimax derived from (2), though not nearly so bad as setting $\hat p$ identically equal to 1/2, is itself open to a serious objection, which will be explained in § 4.
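The contrast between the two acts can be made concrete numerically. The sketch below rests on an assumption not spelled out in this passage: it takes the minimax derived act for the squared-error loss (2) to be the familiar constant-risk estimate $(x + \sqrt{n}/2)/(n + \sqrt{n})$, where x is the number of heads in n tosses. The function name `risk` and the choice n = 16 are illustrative.

```python
import math

def risk(n, p, a, b):
    """Expected squared error of the estimate (x + a)/(n + b), x ~ Binomial(n, p)."""
    return sum(
        math.comb(n, x) * p**x * (1 - p) ** (n - x) * ((x + a) / (n + b) - p) ** 2
        for x in range(n + 1)
    )

n = 16
root = math.sqrt(n)
probabilities = (0.1, 0.3, 0.5, 0.9)

# Assumed minimax estimate: (x + sqrt(n)/2) / (n + sqrt(n)).
minimax_risks = [risk(n, p, root / 2, root) for p in probabilities]

# The act that ignores the observation and always answers 1/2.
ignoring_risks = [(p - 0.5) ** 2 for p in probabilities]

constant = 1 / (4 * (1 + root) ** 2)
```

Under this assumption the expected loss comes out the same for every p tried, while the act that ignores the observation does much worse when p is far from 1/2 and better only when p is near 1/2.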
3 Utility and the minimax rule
Some objections to the objectivistic, and mutatis mutandis to the group, minimax rule are in effect objections to the concept of utility, which underlies the minimax rules. Criticisms of the concept of utility have already been discussed in Chapter 5, particularly in § 5.6, but certain aspects of the discussion need to be continued here.

It is often said, and I think with justice, that, even granting the validity of the utility concept in principle, a person can seldom write down his income function I(r; i) with much accuracy. This idea is put forward sometimes with one interpretation and sometimes with another. Of these, only the first is strictly an objection to the utility concept. That one is a dilemma raised by the phenomenon of vagueness.

Vagueness may so blur a person's utility judgments that he cannot accurately write down his income function. I suppose that no one will seriously deny this; I would be particularly embarrassed to do so, for it is almost a recapitulation of the very argument that leads me, though in principle a personalist, to see some sense in the objectivistic decision problem. On the other horn, if all meaning is denied to utility (or some extension of that notion) no unification of statistics seems possible.
Three special circumstances are known to me under which escape from the dilemma is possible. First, there are problems in which some straightforward commodity, such as money, lives, man hours, hospital bed days, or submarines sighted, is obviously so nearly proportional to utility as to be substitutable for it. Second, there are problems in which exact or approximate minimax decisions can be calculated on the basis of only relatively little, and easily available, information about the income function, such as symmetry, monotoneity, or smoothness.

The possibility of cheap extensive observation, which (when it occurs) makes the minimax principle attractive, also tends to make many decision problems fall into both of the two types in which the difficulty of vagueness is alleviated. For example, in a monetary decision problem with cheap observation available, it often happens that the weak law of large numbers, and the like, can be invoked to justify regarding cash income as proportional to utility income. Third, there are many important problems, not necessarily lacking in richness of structure, in which there are exactly two consequences, typified by overall success or failure in a venture. In such a problem, as I have heard J. von Neumann stress, the utility can, without loss of generality, be set equal to 0 on the less desired and equal to 1 on the more desired of the two consequences.

The second sense in which it may, though not quite properly, be said to be impossible to write down the income function is typified by this example. A manufacturer of small short-lived objects, say paper napkins, is faced with the problem of deciding on a program of sampling to control the quality of his product. He complains that, though for this problem his utility is adequately measured by money, he cannot write down his income function because he does not know how the public will react to various levels of quality; that, in particular, the minimax rule does not tell him at all how much he ought to spend on the sampling program, though it may say how any given amount can best be employed. The manufacturer has a real difficulty, though he expresses it inaccurately. He forgets that the lack of knowledge that gives rise to the decision problem involves not only the state of his product, but also the state of the public; taking the state of the public into account, there is no real difficulty in writing down the income function.
But, if it is not practical for the manufacturer to make observations bearing on the state of the public as well as those bearing on the state of the product, the minimax rule is not a practical solution to his problem; for, rigorously applied, it would remove him from the paper-napkin business. I believe that in practice the personalistic method often is, and must be, used to deal with the unknown state of the public, while objectivistic methods, particularly the minimax principle, are now increasingly often used to deal with the state of the product, a sort of dualism having some parallel in almost all serious applications of statistics. This is not to deny that relatively objectivistic methods of market research can sometimes be used, nor that there are personalistic elements aside from those concerning the state of the public in much of even the most advanced quality control practice.

4 Almost sub-minimax acts
Another sort of objection to the objectivistic minimax rule is illustrated by the following example attributed to Herman Rubin and published by Hodges and Lehmann [H5]. An integer-valued random variable x subject to the binomial distribution

$$P(x \mid p) = \binom{n}{x} p^x (1 - p)^{n - x} \tag{1}$$

is observed by a person who knows n but not p. His decision problem is to decide on a function $\hat p$ of x subject to the loss function:
(2)
Let; p) - B«t - pr'l p) - ~ P(z I B.) fez)
z
S max E(f I Bu. f
In discussing application of the minimax rule to the basic and derived loss functions, it is doubly advantageous to introduce mixtures of the i's, for thereby the theory of bilinear games presented in Chapter 12 and that of partition problems (with some reinterpretation) can both be brought to bear. Letting β denote a generic system of weights β(i), β(i) ≥ 0 and Σβ(i) = 1, and using the notation of Chapter 7, the bilinear games associated with the primary and derived problems are, respectively,

$$L(f; \beta) = l(\beta) - E(f \mid \beta), \tag{8}$$

$$L(f(x); \beta) = l(\beta) - E(f(x) \mid \beta) = l(\beta) - \sum_i \sum_x E(f(x) \mid B_i) P(x \mid B_i) \beta(i) = l(\beta) - \sum_x E(f(x) \mid \beta, x) P(x \mid \beta). \tag{9}$$
If necessary, (9) can be interpreted and verified by comparison with (7.3.7) and (7.2.8), in that order. In Chapter 7, β(i) was generally required not only to be non-negative, but also strictly positive; on examination, this slight difference from the present context will be found innocuous. Again, in Chapter 7, the statement and derivation of conclusions were, for simplicity, nominally confined to twofold partition problems. Here the extension of those conclusions to n-fold problems will be freely used, though some readers may prefer here, as there, to focus on twofold problems.

Letting L* denote the minimax (and maximin) value of the basic, and L*(x) that of the derived problem, it is obvious, since $F(x) \supset F$, that L*(x) ≤ L*; but there is some interest in viewing this inequality as a consequence of (7.3.4):

$$L^*(x) = \max_\beta \min_{f(x)} L(f(x); \beta) = \max_\beta\, [l(\beta) - v(F(x) \mid \beta)] \le \max_\beta\, [l(\beta) - v(F \mid \beta)] = \max_\beta \min_f L(f; \beta) = L^*. \tag{10}$$

It is clear that the maximin β's for the basic and derived problems are the β's that maximize the concave functions

$$\lambda(\beta) =_{\mathrm{Df}} l(\beta) - v(F \mid \beta) = l(\beta) - k(\beta) \tag{11}$$

and

$$\lambda(\beta; x) =_{\mathrm{Df}} l(\beta) - v(F(x) \mid \beta) = l(\beta) - E(k(\beta(x)) \mid \beta), \tag{12}$$
respectively. The search for minimax f(x)'s, for example, is greatly narrowed by the consideration that, if f(x) is minimax, E(f(x) | β) = v(F(x) | β) for some β, indeed for every maximin β. According to § 7.3, equality obtains in (10), if and only if there is a maximin $\beta_0$ of the basic problem such that

$$\beta_0(x) =_{\mathrm{Df}} \left\{ \frac{P(x \mid B_i)\, \beta_0(i)}{\sum_j P(x \mid B_j)\, \beta_0(j)} \right\} \tag{13}$$

is also a maximin of the basic problem for every x such that $\sum_j P(x \mid B_j) \beta_0(j) > 0$.
The most typical possibility, and the only one to be explored here, is that the basic problem has a unique maximin $\beta_0$ with $\beta_0(i) > 0$ for all i. Under this assumption, L*(x) = L*, if and only if x is utterly irrelevant, as is easily shown. In the same spirit, as can easily be shown, L*(x) = 0, if x is definitive, but not typically otherwise; and, if x extends y, then L*(x) ≤ L*(y) with equality if, and typically only if, y is sufficient for x.

3 Sufficient statistics

Digressing from the minimax rule for a moment, something more fundamental can be said about a sufficient statistic y of x. Namely, for every $f(x) \in F(x)$, there exists an $f(y) \in F(y)$ such that I(f(y); i) = I(f(x); i) for every i. Indeed $f(y) = \sum_x f(x) P(x \mid y)$ defines such an act. Without appeal to so weak a step as the minimax rule, this remark demonstrates that even an objectivist loses nothing by exchanging knowledge of an observation for knowledge of a sufficient statistic of it. The remark might as well have been expressed in § 7.4, except that there it would have involved some circumlocution, mixed acts not yet having been introduced.

4 Simple dichotomy, an example

Much of what has been said thus far is well illustrated by the minimax counterpart of Exercise 7.5.2. The reader is accordingly asked to review that exercise and continue it thus:

Exercises

1. For the problem in question:
ACP) - 1,JJ(1)
(a>
(b) l(p; 1:)
¥(l) ).
+ 3.6(2) - ~ I ".rJlJ(2) - lar.6(l) I {~p(r I BJ)} 1t(2P(r. < rl·(fJ, *'0>1 B 1) + Per =- r·(fJ, 1Jo) I B 1)]#(I) + 'd2P(r. < rl·(JJ, flo) I B,) + P(r -= ,.~, ~) I B2)~(2).
- 1,JJ(1)
-
+ ' 1R(2) -1'16(2) -
2&. A ~ is maximin, if and only if r·(ft, (1)
~)
is such that
'sPe"l < "I-(ft, fJo) I B1) S ,IP(,., S "I-eft, ~o) I Bt )
and
(2)
'sP(r. S
"I -(fJ, 1Jo) I Bt )
~ &tP(rt
< 1', *(JJ, 60) 1B 2 ).
2b. There is typi~ly only one maximin, but there may be a chad intBval of them. 3. Tbough the acta of F and F(z) 88 defined by Exercise 7.5.2 do Dot provide for mixed acta, it willsuftice to consider mixtures of the f(x)'a. Each of theee will be determined by an 1, and nothing will be lost by requirin,i to be of the form i{r(z». 4a. Any minimax will be equivalent to a mixture of f(z)'8 each corre8pOJldiug to & likelihood-ratio teet 8880Ciated with r·~, /Jo) for every
muimin IJ. 4b. In view of Exercise 3, the only likelihood-ratio teats that need be cODlidered for a minimax ~ are: i(,.) - 1, jf and only if r, i(,.) - 1, if and only if'l
< rl*(lJt fJo}. < '1·(IJ, (0).
These are not Deceasarily different testa. 5&. If the maximin 8 is unique, the minimax act is unique (except J) Lettiog >.(A) be independent of A.
8Dd
r
therefore equal to ( :
I
for every A; that is, letting every sample of size m have the same probability of being chosen, or randomizing, as it is said. (b) Letting $f(x_1(A), \ldots, x_m(A))$ be symmetric in its m arguments and independent of A. It can in fact be shown, by the method illustrated in the second example of § 9.6 and discussed more generally in § 12.5, that there is at least one minimax satisfying (a) and (b), and even that there is an admissible one. Typically, if m is large, but small compared to n, $L_\lambda^*$ is much smaller than the common value of the L*(x(A))'s.

The importance of randomization in applied statistics can scarcely be exaggerated. From the personalistic viewpoint it is one of the most important ways to bring groups of people into virtual unanimity; from the objectivistic viewpoint it not only makes possible great reductions in maximum loss, but it is seen as an invention by which the theory of probability is brought to bear on situations to which probability on first (objectivistic) sight would seem irrelevant.+
+ I would express myself very differently today (Savage 1962, pp. 33-34).

9 Mixed acts in statistics

Many have commented that modern applied statistics makes one, but only one, important use of mixed acts, namely in deciding, through the process of randomization, what to observe. Thus, for example, once the observation has been made, the derived act is in practice almost always chosen, without mixing, from a set of basic acts natural to the problem. This might seem to imply a sharp conflict between the minimax rule and ordinary statistical practice; but actually it reflects agreement, for mixed acts greatly reduce the minimax loss in decision-problem interpretations of typical practical statistical situations, when and only when ordinary practice calls for mixed acts of the same sort, namely when randomization is called for.

There are certain mechanisms that systematically tend to make mixed acts have relatively little, or even absolutely no, advantage over unmixed acts. In the following discussion of these mechanisms, let L(r; i) be the abstract game on which a bilinear game L(f; g) is based. In the first place, supposing that L(r; i) is non-negative for every r and i (as is appropriate to the context now at hand), (12.3.6) can be completed, so to speak, thus:

$$L^* \min(R, I) \ge \min_r \max_i L(r; i), \tag{1}$$
where R and I denote for the moment the number of values of r and i. respectively, aDd min (R, 1) is of courae the minimum of the two integers R and 1. An inequality atronger thaD (I) will actua1ly be proved. Conaider a minimax f for which the smallest poEble Dumber R' of the /(1')'8 are actually positive: (2)
B,'L* == max B' I: , L(r; '1/(r)
,
~ max L(r' j i
'J
> min max L(r; aJ r
i
where r' is 80 ehoeen that R'I(,.') ~ 1, as can obvioualy be done. It is known (Bl9] that R' ~ min (R, 1). The important le880D of (1) is that, unI8!ll!l R and 1 are both larp, the introduction of mixed acts e&nnot reduce the minjmax Ie. to a very amaU fraction of the value it would otherwise have. To mention a different mechanism, Figure 12.4.1 augests that, if there are many r'I, the comers of the concave function emphaaised in that figure may well be very blunt, in which caae a minimax mixed act baa almost sa high a maximum 1088 as anyone of ita components. When the Dumber of r'a is infinite, the concave fUDction may well be diBeteDtiable, in which case mixed acta have absolutely no advantage. The remark appended to Exercise 4.6& is pertinent here. This mechanism can be related to a certain large e1asa of infinite abstract (i.e., not DecelBlrily bilinear) games, discovered by Kakutani {KI], for which L* - L.. Bilinear games are but & special case of these, and numerous others seem to arise frequently in applicatiODB.
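Inequality (1) can be checked numerically on a small game. The following is only a sketch under stated assumptions: the 2-by-2 loss table is a made-up example, the function names are illustrative, and the mixed minimax value is approximated by a grid over the mixing probability rather than solved exactly.

```python
from fractions import Fraction

def minimax_pure(loss):
    """min over acts r of the max over states i of L(r; i)."""
    return min(max(row) for row in loss)

def minimax_mixed_two_acts(loss, steps=1000):
    """Approximate L* for a game with exactly two acts.

    The mixed act gives probability f to act 0 and 1 - f to act 1;
    f runs over a grid of exact fractions.
    """
    best = None
    for k in range(steps + 1):
        f = Fraction(k, steps)
        worst = max(f * a + (1 - f) * b for a, b in zip(loss[0], loss[1]))
        if best is None or worst < best:
            best = worst
    return best

# Hypothetical non-negative loss table: rows are acts r, columns are states i.
loss = [[1, 0], [0, 1]]
pure_value = minimax_pure(loss)
mixed_value = minimax_mixed_two_acts(loss)
bound_holds = mixed_value * min(len(loss), len(loss[0])) >= pure_value
```

In this example mixing halves the minimax loss, which is the largest reduction that (1) allows when $\min(R, I) = 2$.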
If $L^* = L_*$ for an abstract game, nothing at all can be gained by adjoining mixed acts, as (12.3.5) shows.

Finally, it may be mentioned that in many cases where an observation $x$ might be followed by a mixed derived act, the same, or nearly the same, consequences can often be realized by a pure act. Speaking a little loosely, this occurs whenever $x$ has a continuous or nearly continuous contraction $y$ that is irrelevant, or nearly irrelevant, for then $y$ can play the role in selecting a basic derived act that would otherwise be assigned to a table of random numbers. If, for example, $x$ is continuous, $y(x)$ can be taken as the last few digits in the decimal expansion of $x$ to an extravagant number of places. Again if, conditionally, $x = \{x_1, \ldots, x_n\}$ is an $n$-tuple of continuously, identically, and independently distributed real random variables, $y(x)$ may be taken as the permutation that ranks the $x$'s in ascending order, provided that $n!$ is fairly large: $10!$ should satisfy almost any need. A recent technical reference on the superfluousness of mixed acts in the presence of continuous observations is [D13].

I have occasionally heard it conjectured that any mixed act made after the observation (in an observational decision problem) is inappropriate in principle. I would argue that the conjecture is mistaken thus: Any observational problem that calls for randomization can be simulated, so far as its loss function $L(r; i)$ is concerned, by a basic problem. A mixed act will be as appropriate to the basic problem as it was to the observational problem from which the basic one was derived. In this way a great variety of situations calling for mixed acts having nothing to do with choice of observation can be constructed, though they seem to be atypical in practice. Moreover, any basic problem can obviously occur as the decision problem remaining after some particular value $x$ of an observation has been observed, so the situations just constructed lead to closely related ones calling for mixed acts after observation. Less abstractly, consider a person choosing from a tray of assorted French pastries. Even after extensive visual observation and interrogation of the waiter, the person might justifiably introduce considerable mixture into his choice.

I think that the conjecture that mixed acts are necessarily inappropriate after observations stems partly from the mechanisms that do tend to make such acts inappropriate or unimportant in many typical cases and partly from justifiable dissatisfaction with specific mixed acts that have from time to time been suggested by statisticians. For example, the suggestion that ties in rank arising in non-parametric tests be removed by ranking the tied observations at random may in many, or perhaps all, cases fairly be regarded with suspicion.
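The device of letting the ranking permutation of continuous observations stand in for a table of random numbers can be sketched as follows. This is an illustration only: the function names and the rule of reducing the permutation index modulo the number of acts are assumptions made here, not constructions given in the text.

```python
import math

def ranking_permutation(xs):
    """Indices that put xs in ascending order: the contraction y(x)."""
    return tuple(sorted(range(len(xs)), key=lambda k: xs[k]))

def permutation_index(perm):
    """Position of a permutation of 0..n-1 in lexicographic order (0 .. n!-1)."""
    n = len(perm)
    index = 0
    for pos in range(n):
        smaller_later = sum(1 for later in perm[pos + 1:] if later < perm[pos])
        index += smaller_later * math.factorial(n - 1 - pos)
    return index

def choose_act(xs, acts):
    """Select a basic act from the ranking of xs instead of from random digits.

    Assumes the xs are exchangeable and free of ties, so that all n!
    rankings are equally probable whatever the state.
    """
    return acts[permutation_index(ranking_permutation(xs)) % len(acts)]

chosen = choose_act([0.3, 0.1, 0.7], ["a", "b"])
```

With $n$ observations the attainable mixing probabilities are multiples of $1/n!$, which is why a moderately large $n!$ suffices for practical purposes.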
CHAPTER 15

Point Estimation

1 Introduction

This chapter discusses point estimation, and the next two discuss the testing of hypotheses and interval estimation, respectively. Definitions of these processes must be sought in due course; but, for the moment, whatever notions about them you happen to have will afford sufficient background for certain introductory remarks applying equally well to both kinds of estimation and to testing.

Estimating and testing have been, and inertia alone would insure that they will long continue to be, cornerstones of practical statistics. Their development has until recently been almost exclusively in the verbalistic tradition, or outlook. For example, testing and interval estimation have often been expressed as problems of making assertions, on the basis of evidence, according to systems that lead, with high probability, to true assertions, and point estimation has even been decried as ill-conceived because it is not so expressible. Wald's minimax theory has, as was explained in § 9.2, stimulated interest in the interpretation of problems of estimation and testing in behavioralistic terms; to objectivists this has, of course, meant interpretation as objectivistic decision problems. For reasons discussed in § 9.2, it does seem to me that any verbalistic concept in statistics owes whatever value it may have to the possibility of one or more behavioralistic interpretations.

The task of any such interpretation from one framework of ideas to another is necessarily delicate. In the present instance, there is a particular temptation to force the interpretation, namely, so that criteria proposed by the verbalistic outlook are translated into applications of the minimax theory, that is, of the minimax rule and the sure-thing principle (as expressed by the criterion of admissibility), for these are the only general criteria thus far proposed and seriously maintained for the solution of objectivistic decision problems. Of course it is to be expected, and I hope later sections of this chapter and the next demonstrate, that unforced interpretations do often translate verbalistic
criteria into applications of the behavioralistic ones. In evaluating any such interpretations, it must be borne in mind that an analogy of great mathematical value may be valueless as an interpretation; correspondingly, what is put forward as mere analogy should not be taken to be an interpretation, much less branded as a forced one. For example, attention has already been called (in § 11.4) to the danger of regarding the analogy between the theory of two-person games and that of the minimax rule for objectivistic decision problems as an interpretation. In fact, minimax problems are of such mathematical generality that they arise, even within statistics, in contexts other than direct application of the minimax rule to objectivistic decision problems; a striking, though technical, example is Theorem 2.26 of Wald's book [W3].

The literature of estimation and testing is vast; indeed it has, I think, been seriously contended that statistics treats of no other subjects. This chapter and the next two cannot, therefore, pretend to present a complete digest of that literature, even so far as it pertains to the foundations of statistics. For further reading certain chapters of Kendall's treatise [K2] may be recommended as a key reference to the verbalistic tradition (Chapters 17 and 18 for point estimation; 19 and 20 for interval estimation; 21, 26, and 27 for testing). Many newer aspects are treated in Wald's book [W3]; and a recent review of testing by Lehmann [L4] is recommended.

2 The verbalistic concept of point estimation
Abstractly and very generally, but in verbalistic language (which is […]

[…] 1/2; but, according to Criterion 3, $l'$ is better than $l$, because 5/11 > 4/11 and 7/11 > 6/11. The example can easily be modified to suit any taste for symmetry and continuity. But, if $l$ and $l'$ are conditionally independent (which is not a natural assumption), and $l$ is better than $l'$ according to Criterion 7; then, as may easily be shown, $l'$ cannot be better than $l$ by Criterion 3.
The list of criteria is here interrupted by several paragraphs of explanation in preparation for two concluding criteria.

The approach to certainty treated in §§ 3.6 and 7.6 has its counterpart in the theory of estimation. In particular, if $x(n) = \{x_1, \ldots, x_n\}$ is an $n$-tuple of conditionally independent and identically distributed observations, there will typically exist sequences of estimates $l(n)$ based on $x(n)$, such that (7)
$\lim P$ […] $\lambda(i)$ for which $Q(l; i)$ […] $0$. With the abbreviations $L(k) =_{Df} L(l_k; i)$, $\Delta(k) =_{Df} L(k) - L(k - 1)$, and $Q(k)$ […], the sum to be investigated is (3)
$\sum L(k)Q(k)$ […] 1. The income attached to ordering $l$ feet of shelving, at the price of \$1.00 per foot, is clearly

$$I(l; i) = i(l) - l. \tag{7}$$

It is maximized at the one and only value $\lambda$ for which $di(\lambda)/d\lambda = 1$, so that

$$L(l; i) = [i(\lambda) - \lambda] - [i(l) - l], \tag{8}$$
which is of course twice differentiable in $l$.

The moral of these two possible economic analyses of one example is of wide applicability, as is well known among economists. Where a superficial analysis suggests a kink, or even a discontinuity, in an income function, deeper analysis will often show that the function is smoothed out by various economic phenomena such as the inhomogeneity and the mutual substitutability of commodities.

To return from the digression, if $L$ is twice differentiable in $l$ (at least when $l$ is close to $\lambda$), $L$ can be expanded in a Taylor series thus:

$$L(l; i) = L(\lambda; i) + (l - \lambda) \left. \frac{\partial}{\partial l} L(l; i) \right|_{l = \lambda(i)} + \frac{1}{2} (l - \lambda)^2 \left. \frac{\partial^2}{\partial l^2} L(l; i) \right|_{l = \lambda(i)} + o((l - \lambda)^2), \tag{9}$$
where, following standard usage, $o((l - \lambda)^2)$ is a function of $l$ and $i$, not necessarily the same from one context to another, such that $o((l - \lambda)^2) \div (l - \lambda)^2$ approaches zero as $l$ approaches $\lambda(i)$ for fixed $i$. The first term on the right side of (9) vanishes by the definition of estimation; the second must vanish also, for otherwise $L$ could be negative. Therefore,

$$L(l; i) = \frac{1}{2} (l - \lambda)^2 \left. \frac{\partial^2 L(l; i)}{\partial l^2} \right|_{l = \lambda} + o((l - \lambda)^2) = (l - \lambda(i))^2 \alpha(i) + o((l - \lambda)^2), \tag{10}$$

where $\alpha(i)$ is defined by the context. In view of (10), it is plausible that $L$ may, in many problems where estimates of great accuracy are possible, be supposed to be practically of the form

$$L(l; i) = (l - \lambda(i))^2 \alpha(i), \tag{11}$$

where $\alpha(i) > 0$ for every $i$. This does not exactly mean that a reasonable $L$ can be closely approximated by functions of the form (11) for all $l$. In particular, the absurd assumption that $L$ is unbounded (which such approximation would typically imply) is not to be made. It means, rather, that under favorable circumstances (11) may lead to a reasonably good evaluation of $L(l; i)$. In so far as the form (11) can be supposed adequately to represent $L$, Criterion 2 is obviously an application of the principle of admissibility. An interesting discussion and application of (11) is given by Yates [Y2].

6 A behavioralistic review, continued
Thus far, Criteria 1, 2, 3, 5, and 8 have been discussed in behavioralistic terms. In fact, under suitable hypotheses, each has been found to have considerable behavioralistic justification. Criteria 4 and 9 also have such justification, but my discussion of them is so bulky it had better be isolated in a special section. As for Criteria 6 and 7, the only ones remaining, they do not seem to me to have any serious justification at all, as will be discussed in still another section.

Criterion 4, the recommendation of maximum-likelihood estimates, is of extraordinary interest, for, of all the criteria of the verbalistic tradition, it is essentially the only one that selects a unique estimate in almost every estimation situation of practical importance. The present section demonstrates that, in the presence of extensive observation, maximum-likelihood estimates are often almost minimax estimates; it also gives some analysis of Criterion 9, which refers to efficiency. The way to these goals is roundabout; it begins with a study of information in the technical sense mentioned in § 3.6.

In this section it will be assumed for mathematical simplicity that each observation under discussion is confined to a finite number of values, each having positive probability for every element of whatever partition is under discussion. If $B_i$ and $B_j$ are elements of a partition, not necessarily finite, and $x$ is an observation, say, in the spirit of (3.6.11), that the information of $j$ relative to $i$ for the observation $x$ is
$$J(i, j; x) =_{Df} -E\left( \log \frac{P(x \mid B_j)}{P(x \mid B_i)} \,\Big|\, B_i \right) = -E\left( \log \frac{r_j}{r_i} \,\Big|\, B_i \right). \tag{1}$$
The expression of $J$ in terms of likelihood ratios is important, especially for the extension of the discussion to more general observations than those contemplated here. The reader should, therefore, try to bear in mind that the whole discussion could be carried on in terms of likelihood ratios; I refrain from so doing only for momentary reasons of notational convenience. The theory of $J$ can conveniently be presented in a series of exercises.

Exercises

1a. If $y$ is a contraction of $x$, then $J(i, j; x) \ge J(i, j; y)$. With equality when? Hint:

$$E\left( \log \frac{P(x \mid B_i)}{P(x \mid B_j)} \,\Big|\, y, B_i \right) \ge \log \frac{P(y \mid B_i)}{P(y \mid B_j)}. \tag{2}$$

1b. $J(i, j; x) \ge 0$. With equality when?

2a. If $x_1, \ldots, x_n$ are conditionally independent, then

$$J(i, j; x_1, \ldots, x_n) = \sum_r J(i, j; x_r). \tag{3}$$

2b. If in addition the $x_r$'s are conditionally identically distributed, then

$$J(i, j; x_1, \ldots, x_n) = n J(i, j; x_1). \tag{4}$$
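The exercises above can be tried out numerically for observations with finitely many values. The sketch below is illustrative only: the two three-point distributions, the choice of contraction, and all function names are assumptions introduced here.

```python
import math
from itertools import product

def information(p_i, p_j):
    """J(i, j; x) = -E(log[P(x|B_j)/P(x|B_i)] | B_i) for a finite observation."""
    return -sum(a * math.log(b / a) for a, b in zip(p_i, p_j))

def joint(p, n):
    """Distribution of n conditionally independent copies of x."""
    outcomes = product(range(len(p)), repeat=n)
    return [math.prod(p[k] for k in outcome) for outcome in outcomes]

def contract(p, groups):
    """Distribution of a contraction y that lumps the listed values of x."""
    return [sum(p[k] for k in group) for group in groups]

# Hypothetical conditional distributions of x given B_i and given B_j.
p_i = [0.5, 0.3, 0.2]
p_j = [0.2, 0.3, 0.5]
lumps = [(0, 1), (2,)]          # y does not distinguish values 0 and 1

J_x = information(p_i, p_j)
J_y = information(contract(p_i, lumps), contract(p_j, lumps))
J_3 = information(joint(p_i, 3), joint(p_j, 3))
```

Here `J_y` falls short of `J_x` because the lumped values have different likelihood ratios, and `J_3` equals three times `J_x` up to rounding, in line with Exercises 1a and 2b.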
It is interesting to evaluate the information $J(\lambda, \lambda + \Delta\lambda; x)$ where $\lambda$ and $\lambda + \Delta\lambda$ are two closely neighboring values of the parameter of an estimation problem, supposed, for simplicity, to be free of nuisance parameters. If $P(x \mid \lambda)$ is continuous in $\lambda$, it is almost obvious that $J(\lambda, \lambda + \Delta\lambda; x)$ approaches zero as $\Delta\lambda$ approaches zero. If $P(x \mid \lambda)$ is differentiable in $\lambda$, it is easy to show further (considering that $J$ is non-negative) that even $J(\lambda, \lambda + \Delta\lambda; x)/\Delta\lambda$ approaches zero as $\Delta\lambda$ approaches zero. But in this case much more can and will be shown, namely,

$$\lim_{\Delta\lambda \to 0} \frac{J(\lambda, \lambda + \Delta\lambda; x)}{\Delta\lambda^2} = \frac{1}{2} H(\lambda; x) =_{Df} \frac{1}{2} E\left[ \left( \frac{\partial \log P(x \mid \lambda)}{\partial \lambda} \right)^2 \,\Big|\, \lambda \right]. \tag{5}$$
The function $H$ is generally, following Fisher, called information, but here we had better call it differential information. Chronologically, as explained at the end of § 3.6, the concept of differential information is older than that here called simply information and of which it is, according to (5), a limiting case.

The demonstration of (5) begins with the consideration that

$$\log(1 + t) = t - \tfrac{1}{2} t^2 + o(t^2). \tag{6}$$
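Therefore,

$$\begin{aligned} \log \frac{P(x \mid \lambda + \Delta\lambda)}{P(x \mid \lambda)} &= \log\left( 1 + \frac{P(x \mid \lambda + \Delta\lambda) - P(x \mid \lambda)}{P(x \mid \lambda)} \right) \\ &= \frac{P(x \mid \lambda + \Delta\lambda) - P(x \mid \lambda)}{P(x \mid \lambda)} - \frac{1}{2} \left\{ \frac{P(x \mid \lambda + \Delta\lambda) - P(x \mid \lambda)}{P(x \mid \lambda)} \right\}^2 + o(\Delta\lambda^2). \end{aligned} \tag{7}$$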
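Since the expected value given $\lambda$ of the first term in the second line of (7) is easily seen to be exactly zero, it will be tactful to leave that term alone; but the second may be approximated thus:

$$\left\{ \frac{P(x \mid \lambda + \Delta\lambda) - P(x \mid \lambda)}{P(x \mid \lambda)} \right\}^2 = \left\{ \frac{\Delta\lambda}{P(x \mid \lambda)} \frac{\partial P(x \mid \lambda)}{\partial \lambda} + o(\Delta\lambda) \right\}^2 = \Delta\lambda^2 \left\{ \frac{\partial \log P(x \mid \lambda)}{\partial \lambda} \right\}^2 + o(\Delta\lambda^2). \tag{8}$$

Therefore,

$$J(\lambda, \lambda + \Delta\lambda; x) = \tfrac{1}{2} H(\lambda; x) \Delta\lambda^2 + o(\Delta\lambda^2), \tag{9}$$

which establishes (5).

More exercises

3. If the $k$th derivative ($k > 0$) with respect to $\lambda$ of $P(x \mid \lambda)$ exists for every $x$, then

$$E\left( \frac{1}{P(x \mid \lambda)} \frac{\partial^k}{\partial \lambda^k} P(x \mid \lambda) \,\Big|\, \lambda \right) = \frac{\partial^k}{\partial \lambda^k} \left( \sum_x P(x \mid \lambda) \right) = 0. \tag{10}$$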
BEHAVIORALISTIC REVIEW OF
ESTI~IATION
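4. If the requisite second derivative exists, then

$$H(\lambda; x) = -E\left( \frac{\partial^2}{\partial \lambda^2} \log P(x \mid \lambda) \,\Big|\, \lambda \right). \tag{11}$$

5. If $y$ is a contraction of $x$ (and $H(\lambda; x)$ is well defined), then $H(\lambda; y) \le H(\lambda; x)$. Remark: The inequality is obvious in the light of Exercise 1a and the first part of (5). But it can also be derived from the following application of Theorem 1 of Appendix 2, which is useful in the next exercise.

$$\left\{ \frac{1}{P(y \mid \lambda)} \frac{\partial P(y \mid \lambda)}{\partial \lambda} \right\}^2 = E^2\left( \frac{1}{P(x \mid \lambda)} \frac{\partial P(x \mid \lambda)}{\partial \lambda} \,\Big|\, y, \lambda \right) \le E\left( \left\{ \frac{1}{P(x \mid \lambda)} \frac{\partial P(x \mid \lambda)}{\partial \lambda} \right\}^2 \,\Big|\, y, \lambda \right), \tag{12}$$

with equality for every $y$ and $\lambda$, if and only if $\frac{\partial}{\partial \lambda} \log P(x \mid \lambda)$ can be expressed as a function of $y$ and $\lambda$ alone.

6a. If $y$ is a contraction of $x$, $H(\lambda; x) = H(\lambda; y)$ for every $\lambda$, if and only if $y$ is sufficient for $x$.

6b. $H(\lambda; x) = 0$ for every $\lambda$, if and only if $x$ is utterly irrelevant.

7a. If $x_1, \ldots, x_n$ are independent given $\lambda$, then

$$H(\lambda; x_1, \ldots, x_n) = \sum_r H(\lambda; x_r). \tag{13}$$

7b. If, in addition, the $x_r$'s are identically distributed given $\lambda$, then

$$H(\lambda; x_1, \ldots, x_n) = n H(\lambda; x_1). \tag{14}$$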
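8. If $l$ is a real-valued contraction of $x$, and $H(\lambda; x)$ is well defined, then

(a)

$$\frac{d}{d\lambda} E(l \mid \lambda) = E\left( l(x) \frac{\partial \log P(l(x) \mid \lambda)}{\partial \lambda} \,\Big|\, \lambda \right). \tag{15}$$

(b)

$$E([l - \lambda]^2 \mid \lambda) H(\lambda; l) \ge \left\{ \frac{d}{d\lambda} E(l \mid \lambda) \right\}^2, \tag{16}$$

with equality if and only if

$$\frac{\partial}{\partial \lambda} \log P(l \mid \lambda) = (l - \lambda) k \tag{17}$$

for some constant $k$. Hint: Use Exercise 3 and apply the Schwarz inequality to (15).

(c) If $H(\lambda; x) > 0$, then

$$E([l - \lambda]^2 \mid \lambda) \ge \left\{ \frac{d}{d\lambda} E(l \mid \lambda) \right\}^2 \Big/ H(\lambda; x). \tag{18}$$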
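The limit (5) and the inequality (18) can be illustrated for a binomial observation. This is a rough numerical sketch, not part of the original development: the binomial model, the step sizes, and the function names are assumptions, and the closed form $n/p(1-p)$ for the differential information of a binomial observation is the standard textbook value.

```python
import math

def binomial_pmf(n, p):
    """P(x = k | p) for k = 0..n."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def information(p_from, p_to):
    """J for two finite distributions, as in (1)."""
    return -sum(a * math.log(b / a) for a, b in zip(p_from, p_to))

def differential_information(n, p, h=1e-5):
    """H(p; x) = E[(d log P(x|p)/dp)^2 | p], by central differences."""
    lower, middle, upper = binomial_pmf(n, p - h), binomial_pmf(n, p), binomial_pmf(n, p + h)
    return sum(
        m * ((math.log(u) - math.log(lo)) / (2 * h)) ** 2
        for lo, m, u in zip(lower, middle, upper)
    )

n, p = 10, 0.3
H_numeric = differential_information(n, p)
H_closed = n / (p * (1 - p))

# Limit (5): J(p, p + d; x) / d^2 should be close to H / 2 for small d.
d = 1e-3
ratio = information(binomial_pmf(n, p), binomial_pmf(n, p + d)) / d**2

# Inequality (18) for the unbiased estimate l = x / n, where dE(l | p)/dp = 1.
mse_of_mean = p * (1 - p) / n
```

For this estimate the product `mse_of_mean * H_closed` is 1, so (18) holds with equality, as (17) suggests it should.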
Exercise 8c is an important, and now famous, inequality. It, together with its $n$-dimensional generalization, has been called the Cramér-Rao inequality because of its independent publication by Rao and Cramér in 1945 and 1946 respectively (see [H6]). But the name is not at all well justified historically. Fréchet presented the inequality in 1943 [FS], and Darmois extended Fréchet's inequality to $n$ dimensions, at least for unbiased estimates, in a publication [D1] not later than Rao's. The inequality has also, though I think erroneously, been attributed to an early paper by Aitken and Silverstone [A1], and to one by Doob [D10]. My point is, of course, not to give a definitive history of the inequality, but merely to suggest that for the time being an impersonal name would be better. I tentatively propose calling it the information inequality. Some recent references pertinent to the information inequality and other topics treated thus far in this section are [W16], [M5], [06], and [H6].

The techniques used in the remainder of this section, which revolve around the information inequality, were published posthumously by Wald [W5].

The information inequality has an important bearing on application of the minimax rule to estimation, of which the following theorem may, in view of (5.11), be taken as a first illustration.
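THEOREM 1

HYP. 1. For every $\lambda$ in a closed interval of length $\delta$, $H(\lambda; x) \le H$, where $H$ is a constant.
2. $l$ is a real-valued contraction of $x$.

CONCL. For some $\lambda$ in the interval, $E([l - \lambda]^2 \mid \lambda) \ge (H^{1/2} + 2/\delta)^{-2}$.

PROOF. Suppose that the theorem is false. Then according to Exercise 8c,

$$\left| \frac{d}{d\lambda} E(l \mid \lambda) \right| \le \left\{ H(\lambda; x) E([l - \lambda]^2 \mid \lambda) \right\}^{1/2} < H^{1/2} (H^{1/2} + 2/\delta)^{-1} \tag{19}$$

for every $\lambda$ in the interval. Therefore,

$$\frac{d}{d\lambda} [\lambda - E(l \mid \lambda)] > 1 - \frac{H^{1/2}}{H^{1/2} + 2/\delta} = \frac{2}{\delta H^{1/2} + 2} \tag{20}$$

for every $\lambda$ in the interval. Therefore, at one end of the interval or the other,

$$|\lambda - E(l \mid \lambda)| > \frac{\delta}{\delta H^{1/2} + 2} = (H^{1/2} + 2/\delta)^{-1}. \tag{21}$$

This leads to a contradiction through the well-known inequality

$$E([l - \lambda]^2 \mid \lambda) \ge |E(l - \lambda \mid \lambda)|^2 = |\lambda - E(l \mid \lambda)|^2, \tag{22}$$

which can be derived as a direct application of Theorem 1 of Appendix 2, or of the Schwarz inequality, or of the useful identity

$$E([l - \lambda]^2 \mid \lambda) = V(l \mid \lambda) + \{E(l - \lambda \mid \lambda)\}^2. \tag{23}$$

∎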
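Theorem 1 can be illustrated, though of course not proved, by computing mean square errors for a few estimates. The sketch below assumes a binomial observation with $n = 25$ and the interval $[0.4, 0.6]$, and uses the standard value $n/p(1-p)$ for the differential information; the three estimates and all names are hypothetical choices made for the illustration.

```python
import math

def mse(estimate, n, p):
    """E([l - p]^2 | p) when l = estimate(x) and x is binomial (n, p)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) * (estimate(k) - p) ** 2
        for k in range(n + 1)
    )

n, low, high = 25, 0.4, 0.6
delta = high - low
# n / (p(1 - p)) is largest at the ends of [0.4, 0.6], giving the constant H.
H = n / (low * (1 - low))
floor = (math.sqrt(H) + 2 / delta) ** -2

estimates = {
    "sample mean": lambda k: k / n,
    "constant 1/2": lambda k: 0.5,
    "shrunk toward 1/2": lambda k: (k + 5) / (n + 10),
}
grid = [low + delta * t / 200 for t in range(201)]
worst = {name: max(mse(l, n, p) for p in grid) for name, l in estimates.items()}
```

Each entry of `worst` exceeds `floor`, as the theorem requires; even the constant estimate, which has no variance at all, is caught by its bias at the ends of the interval.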
In the remaining portion of this section, let it be understood that:

1. The $x_r$'s are an infinite sequence of observations that are, given $\lambda$, identically distributed and independent.
2. $x(n) = \{x_1, \ldots, x_n\}$ for $n = 1, 2, \ldots$.
3. $l(n)$ is a real-valued contraction of $x(n)$. The contraction $l(n)$ is to be thought of as an estimate of $\lambda$ based on observation of $x(n)$. In the spirit of the minimax theory it is really mixed, rather than ordinary, estimates that should be treated here. But this entails no essential change in the following discussion once it is recognized that a mixed estimate is, in effect, an ordinary estimate based on observation of $y(n) =_{Df} \{l(n), x(n)\}$, where $x(n)$ is sufficient for $y(n)$, so that $H(\lambda; y(n)) = H(\lambda; x(n))$ for all $\lambda$.
4. $\epsilon$ and $\delta$ are positive numbers.
5. $\Lambda_0$ is a closed interval of length $\delta$ contained in the range of $\lambda$ and including a given value $\lambda_0$.
The next theorem shows that, if $L(l; \lambda)$ is of the form (5.11), $L(l(n); \lambda)$ cannot ordinarily be kept much smaller than $\alpha(\lambda_0)/nH(\lambda_0; x_1)$ for large $n$, even in a small interval about $\lambda_0$.

THEOREM 2

If $H(\lambda; x_1)$ is continuous and positive at $\lambda_0$, and if $\alpha(\lambda)$ is a non-negative function continuous at $\lambda_0$, then, for sufficiently large $n$, $E([l(n) - \lambda]^2 \alpha(\lambda) \mid \lambda) \ge (1 - \epsilon)\alpha(\lambda_0)/nH(\lambda_0; x_1)$ for some $\lambda \in \Lambda_0$.
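PROOF. There is no loss of generality in supposing that $\epsilon < 1$ and that $\Lambda_0$ is such that, for $\lambda \in \Lambda_0$, $\alpha(\lambda) \ge \alpha(\lambda_0)(1 - \epsilon)^{1/2}$ and $H(\lambda; x_1)^{1/2} \le H(\lambda_0; x_1)^{1/2} (1 + (1 - \epsilon)^{-1/4})/2$. Using Exercise 7b,

$$H(\lambda; x(n))^{1/2} = n^{1/2} H(\lambda; x_1)^{1/2} \le \frac{n^{1/2}}{2} H(\lambda_0; x_1)^{1/2} [1 + (1 - \epsilon)^{-1/4}] \tag{24}$$

for $\lambda \in \Lambda_0$. By Theorem 1, if $n \ge 16 / \{\delta^2 H(\lambda_0; x_1) [(1 - \epsilon)^{-1/4} - 1]^2\}$, then

$$E([l(n) - \lambda]^2 \mid \lambda) \ge \left\{ \frac{n^{1/2}}{2} H(\lambda_0; x_1)^{1/2} [1 + (1 - \epsilon)^{-1/4}] + \frac{2}{\delta} \right\}^{-2} \ge \frac{(1 - \epsilon)^{1/2}}{nH(\lambda_0; x_1)} \tag{25}$$

for some $\lambda \in \Lambda_0$. ∎

The next theorem extends Theorem 2 to practically any loss function that is twice differentiable in $l$ for $l$ and $\lambda$ close to $\lambda_0$.

THEOREM 3

HYP.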
1. $H(\lambda; x_1)$ is positive and continuous at $\lambda_0$.
2. $\alpha(\lambda) =_{Df} \frac{1}{2} \left. \frac{\partial^2 L(l; \lambda)}{\partial l^2} \right|_{l = \lambda}$ is continuous at $\lambda_0$.
3. Inequality (5.1) holds for $\lambda$ in $\Lambda_0$.

CONCL. For sufficiently large $n$, $L(l(n); \lambda) \ge (1 - \epsilon)\alpha(\lambda_0)/nH(\lambda_0; x_1)$ for some $\lambda \in \Lambda_0$.

PROOF. It may be supposed without loss of generality that $\epsilon < 1$; and that, for $l, \lambda \in \Lambda_0$, $L(l; \lambda) \ge (1 - \epsilon)^{1/2} \alpha(\lambda)(l - \lambda)^2$. It may also be supposed that $l(x; n) \in \Lambda_0$. This is so, because it would suffice to prove the theorem for a new estimate $l'(n)$, where $l'(x; n)$ is defined to be the number in $\Lambda_0$ closest to $l(x; n)$, which in turn follows from the fact that $L(l'(n); \lambda) \le L(l(n); \lambda)$ for $\lambda \in \Lambda_0$. These suppositions having been made, the theorem is a direct consequence of Theorem 2. ∎

COROLLARY 1

If $L(l; \lambda)$ satisfies (5.1) and has two derivatives with respect to $l$ continuous in $\lambda$ for every $\lambda$ and for every $l$ sufficiently close to $\lambda$, and if $H(\lambda; x_1)$ is continuous and positive, then, for sufficiently large $n$,

$$L^*(n) \ge (1 - \epsilon) \sup_\lambda \alpha(\lambda)/nH(\lambda; x_1), \tag{26}$$

where $L^*(n)$ is the minimax value of the estimation decision problem derived from $L(l; \lambda)$ and $x(n)$, unless the supremum in question is infinite, in which case $nL^*(n)$ approaches infinity.
Of course, it would be enough to assume only that $L(l; \lambda)$ and $H(\lambda; x_1)$ are well behaved at some sequence of values of $\lambda$ on which the supremum in question is approached. In particular, if the supremum is actually attained at some $\lambda$, they need only be well behaved there.

Now, turning to the sequence of maximum-likelihood estimates, let them be denoted for the moment by $\hat{l}(n)$. It is known that under rather general hypotheses $n^{1/2}(\hat{l}(n) - \lambda)$ is asymptotically normal about zero with asymptotic variance $1/H(\lambda; x_1)$.† This suggests, and examples tend to confirm, that, under some supplementary conditions,

$$\lim_{n \to \infty} nE([\hat{l}(n) - \lambda]^2 \mid \lambda) = \frac{1}{H(\lambda; x_1)}. \tag{27}$$
Indeed, one set of conditions implying (27) is stated in [W5], but one that seems difficult to apply. It can be shown that (27), together with the usual asymptotic behavior of $\hat{l}(n)$, implies

$$\lim_{n \to \infty} nL(\hat{l}(n); \lambda) = \frac{\alpha(\lambda)}{H(\lambda; x_1)}, \tag{28}$$

provided, for example, that $L(l; \lambda)$ is bounded for each $\lambda$ and that the second derivative of $L(l; \lambda)$ with respect to $l$ exists when $l = \lambda$. Easily applied rigorous theorems implying (28), much less (27), do not seem to have been formulated yet; but examples suggest that, under conditions general enough for many applications, (28) actually does hold uniformly, in the sense that, for $n$ sufficiently large,

$$\frac{(1 - \epsilon)\alpha(\lambda)}{nH(\lambda; x_1)} \le L(\hat{l}(n); \lambda) \le \frac{(1 + \epsilon)\alpha(\lambda)}{nH(\lambda; x_1)} \tag{29}$$

for all $\lambda$ simultaneously. If (29) holds, then, in view of Corollary 1, $\hat{l}(n)$ is nearly minimax for large $n$, in the sense that

$$L^*(n) \ge (1 - \epsilon) \sup_\lambda L(\hat{l}(n); \lambda). \tag{30}$$
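The sense of (30) can be seen concretely for binomial observations with squared-error loss. The following sketch is an illustration under assumptions not spelled out in this section: it takes the familiar minimax estimate $(x + \sqrt{n}/2)/(n + \sqrt{n})$ for that problem as given, and the function names are invented for the purpose.

```python
import math

def exact_risk(estimate, n, p):
    """E([l - p]^2 | p) when l = estimate(x, n) and x is binomial (n, p)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) * (estimate(k, n) - p) ** 2
        for k in range(n + 1)
    )

def ml_estimate(k, n):
    """Maximum-likelihood estimate of p."""
    return k / n

def minimax_estimate(k, n):
    """Standard minimax estimate of p under squared-error loss (assumed known)."""
    root = math.sqrt(n)
    return (k + root / 2) / (n + root)

def sup_ml_risk(n):
    """sup over p of p(1 - p)/n, attained at p = 1/2."""
    return 1 / (4 * n)

def minimax_value(n):
    """Constant risk of the minimax estimate, i.e. L*(n)."""
    return 1 / (4 * (1 + math.sqrt(n)) ** 2)

# Ratio L*(n) / sup L(ML; p); (30) says it exceeds 1 - epsilon for large n.
ratios = {n: minimax_value(n) / sup_ml_risk(n) for n in (25, 100, 10_000)}
```

The ratio equals $n/(1 + \sqrt{n})^2$, which climbs toward 1 only slowly: it is still below 0.83 at $n = 100$.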
† Some key references for the asymptotic behavior of $\hat{l}(n)$ are [K2], [C9], [L8], [W10], [N4]. The literature on this subject is extraordinarily complicated. There are acknowledged mathematical mistakes in some of its most sophisticated publications; others prove much less than any but the most attentive reader would be led to suppose; few give an adequate statement of their relations to their predecessors; and those that make serious pretensions to rigor involve complicated hypotheses. For documentation of this lament see [N4], [W4], and [L8].

Good examples can be based on (a) of Tables 3.1 and 4.1, letting $L(l; p)$ be any loss function having two continuous derivatives in $l$ throughout $0 \le l, p \le 1$. In particular, the example discussed in § 13.4 arises, if $L(l; p) = (l - p)^2$. It can be argued that the phenomenon discussed in connection with that example is probably not rare; because, for minimax $l(n)$, $L(l(n); \lambda)$ is, judging from examples, often constant and, therefore, nearly equal to $\sup_\lambda \alpha(\lambda)/nH(\lambda; x_1)$, but $L(\hat{l}(n); \lambda)$
closely follows the rise and fall of $\alpha(\lambda)/nH(\lambda; x_1)$.

Turn now to Criterion 9, efficiency. It seems difficult to defend the criterion as it has been defined in connection with (4.8); for what virtue is there in the asymptotic normality required by (4.8)? It is perhaps noteworthy that the sequence of minimax estimates, $l^*(n)$, arising in connection with § 13.4 does not satisfy (4.8). Indeed, (13.4.3) implies that $n^{1/2}(l^*(n) - p)$ is asymptotically normal not about zero, but about $(\frac{1}{2} - p)$. It is my impression that the essence of the efficiency concept resides not in asymptotic normality, but in the overall behavior of the mean square error of a sequence of estimates. I therefore propose tentatively to modify the definition and to call a sequence of estimates $l(n)$ efficient, if and only if its mean square error behaves at least as well as can typically be expected for a sequence of maximum-likelihood estimates. Formally, I propose to call $l(n)$ efficient, if and only if, for $n$ sufficiently large,

$$E([l(n) - \lambda]^2 \mid \lambda) \le \frac{(1 + \epsilon)}{nH(\lambda; x_1)} \tag{31}$$

for every $\lambda$ simultaneously.

I think the main objection that is likely to be raised to this proposed definition is associated with the possibility that in some problems of theoretical, and perhaps also of practical, importance (31) is not satisfied by any sequence of estimates whatsoever, though the maximum-likelihood sequence is efficient in the "official" sense. In such a problem, are the maximum-likelihood estimates not as good for all practical purposes for sufficiently large $n$ as though their variances were actually equal to those of the normal distributions to which they approximate? It is natural to think so by analogy with other contexts in the theory of probability, but approximate normality is actually no substitute for (31) in the present context. The next paragraph is devoted to an example illustrating the inadequacy of asymptotic variance as a measure of asymptotic loss. It can be skipped without loss by anyone not interested in such technicalities.

The best example I have been able to construct is derived from a sequence of observations that is not a standard sequence. Whether the interesting features that it exhibits can actually be realized by standard sequences, I do not know; but the example will do to illustrate the issue. Let $y(n)$ be any real random variable subject to the density
[…]

$$\begin{aligned} L(f_0; i) &= 0 \text{ and } L(f_1; i) > 0 && \text{for } i \in H_0, \\ L(f_0; i) &> 0 \text{ and } L(f_1; i) = 0 && \text{for } i \in H_1, \\ L(f_0; i) &= 0 \text{ and } L(f_1; i) = 0 && \text{for } i \in N. \end{aligned} \tag{1}$$

When it is recalled that the $i$'s correspond to a partition $B_i$ of $S$, the sets $H_0$, $H_1$, and $N$ may, with a slight clash of logical gears, be regarded as three events partitioning $S$. The traditional names of $H_0$ and $H_1$ are the null and the alternative hypothesis, respectively; $N$, being quite unimportant and often either ignored or made vacuous by some trick of definition, has no such name. Rejecting the null hypothesis when it does in fact obtain and accepting it when it does not obtain are called errors, more specifically errors of the first and second kind, respectively.

A test is a derived act of a testing problem. A test may conveniently be identified with the real-valued contraction $z$ of the observation $x$, such that $z(x)$ is the probability prescribed by the test for rejection of the null hypothesis in case $x$ is observed. An unmixed test (which was until recently the only kind contemplated) corresponds to a $z$ confined to the two values 0 and 1, which respectively imply outright acceptance and rejection of the null hypothesis.
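The loss associated with the test $z$ when $i$ obtains is clearly

$$\begin{aligned} L(z; i) &= L(f_0; i)E(1 - z \mid i) + L(f_1; i)E(z \mid i) \\ &= L(f_1; i)E(z \mid i) && \text{for } i \in H_0 \\ &= L(f_0; i)[1 - E(z \mid i)] && \text{for } i \in H_1 \\ &= 0 && \text{for } i \in N. \end{aligned} \tag{2}$$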
The functions $E(z \mid i)$ and $[1 - E(z \mid i)]$ are, respectively, the probability of rejecting and accepting the null hypothesis with the test $z$ when $i$ obtains. There is obviously nothing to choose between them in importance or convenience, each being equivalent to the other. They are commonly called the power function, and operating characteristic, respectively.

In view of (2), one test $z$ dominates another $z'$, if and only if

$$\begin{aligned} E(z \mid i) &\le E(z' \mid i) && \text{for } i \in H_0 \\ E(z \mid i) &\ge E(z' \mid i) && \text{for } i \in H_1; \end{aligned} \tag{3}$$
or, again, if and only if the probability of error with $z'$ is at least as great as with $z$ for every $i$. Thus, dominance, admissibility, and equivalence depend on the basic loss function, $L(f_r; i)$, only in so far as that function determines $H_0$ and $H_1$. This is not only remarkable but also useful; for $H_0$ and $H_1$ may well be clearly defined in contexts where the basic loss is vague, or otherwise ill determined.

If $z$ is admissible in the spirit of (3) relative to a pair of sets $H_0$ and $H_1$, then (if $\infty$ is for the moment admitted as a possible value for a loss) there exists a basic loss function leading to $H_0$ and $H_1$ and having $z$ as its essentially unique minimax. Indeed, let

$$\begin{aligned} L(f_0; i) &= [1 - E(z \mid i)]^{-1} && \text{for } i \in H_1 \\ &= 0 && \text{elsewhere;} \\ L(f_1; i) &= E(z \mid i)^{-1} && \text{for } i \in H_0 \\ &= 0 && \text{elsewhere.} \end{aligned} \tag{4}$$
With this loss and reckoning $0 \cdot \infty = 0$ (as is appropriate here), $L(z; i) = 1$ or $0$, according as there is or is not positive probability of making an error at $i$ with $z$. In view of (2) and (4), any minimax $z'$ not equivalent to $z$ would strictly dominate $z$, contrary to the assumption that $z$ is admissible.

The moral of that conclusion can be put thus: Without special assumptions about the basic loss, the principle of admissibility and the minimax rule lead to no criteria expressible solely in terms of $H_0$, $H_1$, and the conditional distributions of the observation $x$ other than that of admissibility itself. Whether some other objectivistic principle could justify such criteria may be considered an open question, but, as I have already said (in § 15.1), no other general objectivistic principles have been seriously maintained.

It is natural, for example, to demand that $z$ have the same symmetry as $P(x \mid i)$ and $H_0$ and $H_1$; but that criterion can surely not be justified at all, unless the basic loss is also assumed to have the same symmetry, the justifiability of which in turn depends on the case.

To take another important example, it is often proposed that a satisfactory test must be unbiased,† that is, its power function must never be higher in $H_0$ than in $H_1$. More formally, the test $z$ is unbiased, if and only if

$$E(z \mid i_0) \le E(z \mid i_1) \tag{5}$$
for every 10 I H0 and every il • HIA.umiDg that L(fo; f.j and L(fl; i) are conatant in HI and Ho, respectively, it will be shown that any minimax muat be unbiased. As a step toward that demonstration, coDSider a testing problem as a minimax problem, without any special aaaumption about the baaia 1088.. It is pcaible that L* .. 0, in which C888 the miDimax testa are all equivalent and all unbiased. Putting that poaibility aaide, I aaert, and will ahoW', that (under the usual mathematical simplmea.tiODs) max L(z; i) .. max L(z; i) == L*
(6)
,.R,
'cS,
for any minimax z. It is obvious that neither maximum exceeds L·, and also that one or the other must equal L*. But suppose, for example, that the aeeond maximum were actually lese tban L*, and consider r - cd with 0 < cr < 1. AccordiDg to (2), if z' is substituted for z, the first maximum in (6) will be dep1'e8led, and, for cr aufticiently cia. to 1, the aecond would remain actually 1MB thaD which contradicts the ,.lmptiOll that z is m;njmax~ eetabliahmg (6). Now make the apecialua1mption that
L·,
L(fo; I} - A
(7)
for i ,Ht
forieHo,
and suppose that $z$ could be minimax but biased. There would then exist $i_0 \in H_0$ and $i_1 \in H_1$ such that

$$L^* = L(z; i_0) = B\,E(z \mid i_0) = A - A\,E(z \mid i_1) = L(z; i_1), \tag{8}$$

and such that $E(z \mid i_0) > E(z \mid i_1)$. But consideration of the test that simply assigns to every $x$ the same number, midway between $E(z \mid i_0)$ and $E(z \mid i_1)$, shows that $z$ could not be minimax.

† A definition unifying the various concepts of unbiasedness in statistics is put forward in [L5].

The condition (7) is a reasonable assumption in some testing problems, and, where (7) is satisfied, the criterion of unbiasedness has such support as the minimax rule can give. In many other typical testing problems, however, there are borderline errors that hardly matter at all but can scarcely be prevented, and serious errors that can largely be prevented. The following example, which can be varied to suit diverse tastes, shows that it can be folly to insist on unbiasedness in such problems. Let $i$ take the three values 0, 1, 2, and let $x$ take the values 0 and 1 with conditional probabilities defined thus:

$$P(0 \mid 0) = 99/100, \qquad P(0 \mid 1) = 0, \qquad P(0 \mid 2) = 1. \tag{9}$$

Let the basic loss be defined by the condition that $i \in H_0$ or $i \in H_1$, according as $i = 0$ or not, and by

$$L(f_1; 0) = 1, \qquad L(f_0; 1) = 1, \qquad L(f_0; 2) = 1/101. \tag{10}$$

Then

$$L(z; 0) = [99\,z(0) + z(1)]/100, \qquad L(z; 1) = 1 - z(1), \qquad L(z; 2) = [1 - z(0)]/101. \tag{11}$$
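As a numerical aside, the losses in (11) are simple enough to tabulate exactly. The sketch below is only an illustration, not part of the original argument: it assumes a finite search grid of multiples of 1/101 for $z(0)$ and $z(1)$ (the grid and all function names are choices made here), finds the grid point with the smallest maximum loss, and lists the grid tests that satisfy the unbiasedness condition (5).

```python
from fractions import Fraction as F


def losses(z0, z1):
    """Expected losses L(z; i) for i = 0, 1, 2, as given by (11).

    z0 and z1 stand for z(0) and z(1), the probabilities of rejecting H0
    after observing x = 0 or x = 1.
    """
    return ((99 * z0 + z1) / 100, 1 - z1, (1 - z0) / 101)


def max_loss(z0, z1):
    """Largest of the three expected losses for the test (z0, z1)."""
    return max(losses(z0, z1))


def is_unbiased(z0, z1):
    """Condition (5): E(z | 0) <= E(z | i) for i = 1 and i = 2."""
    power_at_0 = (99 * z0 + z1) / 100  # E(z | 0), from (9)
    power_at_1 = z1                    # E(z | 1), since P(1 | 1) = 1
    power_at_2 = z0                    # E(z | 2), since P(0 | 2) = 1
    return power_at_0 <= power_at_1 and power_at_0 <= power_at_2


# Assumed search grid (not from the text): multiples of 1/101 in [0, 1].
GRID = [F(k, 101) for k in range(102)]

best = min((max_loss(a, b), a, b) for a in GRID for b in GRID)
unbiased = [(a, b) for a in GRID for b in GRID if is_unbiased(a, b)]
best_unbiased = min(max_loss(a, b) for a, b in unbiased)

print("smallest maximum loss on the grid, with z(0), z(1):", best)
print("every unbiased grid test has z(0) == z(1):",
      all(a == b for a, b in unbiased))
print("smallest maximum loss among unbiased grid tests:", best_unbiased)
```

Exact rational arithmetic is used so that the comparison of losses is not blurred by rounding; a finer or different grid would serve equally well for the illustration.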
It is easily verified that the only minimax $z^*$ is defined by $z^*(0) = 0$, $z^*(1) = 100/101$, and that $L(z^*; i) = L^* = 1/101$ for every $i$. But it is also easily verified that the only unbiased tests are absurd in that they ignore the observation $x$; they are in fact just those for which $z(0) = z(1)$.

It has until quite recently been said by many that attention should be confined to tests such that there is a fixed probability $\alpha$ (called the size of the test) of making an error of the first kind for every $i \in H_0$. Indeed, the criterion of size has often been taken so seriously as to be incorporated into the very definition of a test. Though many important tests happen to have a size, others equally important do not; so it now seems to be recognized [L4] that the possession of a size cannot be taken seriously as a criterion.† To take an everyday example, consider the binomial distributions

$$P(x \mid p) = \binom{101}{x} p^x (1 - p)^{101 - x}, \tag{12}$$

where the parameter $p$ confined to $[0, 1]$ plays the role of $i$ and $x = 0, \ldots, 101$; and suppose that $H_0$ is the hypothesis that $p < 1/2$. A test of size $\alpha$ is a test for which

$$\sum_x z(x) \binom{101}{x} p^x (1 - p)^{101 - x} = \alpha \tag{13}$$

for all $p < 1/2$. This obviously implies

$$\sum_x [z(x) - \alpha] \binom{101}{x} \left( \frac{p}{1 - p} \right)^x = 0 \tag{14}$$

for all $p < 1/2$, whence $z(x) = \alpha$ for every $x$. So only absurd tests have size, in this example, though there are clearly tests here that are quite satisfactory for many applications, for example, let $z(x)$ equal 0 or 1 according as $x \le 50$ or $x > 50$.

In view of the criticism just made, there is a tendency to redefine size so that any test has a size $\alpha$, namely,

$$\alpha =_{\mathrm{Df}} \max_{i \in H_0} E(z \mid i). \tag{15}$$
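The binomial example can be made concrete with a short computation. The following sketch is illustrative only: the function names and the particular values of $p$ are assumptions introduced here. It evaluates $E(z \mid p)$ for the cutoff test that rejects when $x > 50$ and for a constant test, showing that the former has no single error probability over $p < 1/2$ while its error probabilities stay below 1/2, the value reached at the boundary $p = 1/2$.

```python
from math import comb

N = 101  # number of binomial trials, as in (12)


def power(z, p):
    """E(z | p): probability of rejecting H0 when the parameter is p."""
    return sum(z(x) * comb(N, x) * p**x * (1 - p)**(N - x)
               for x in range(N + 1))


def cutoff_test(x):
    """The test mentioned in the text: z(x) = 1 if x > 50, else 0."""
    return 1.0 if x > 50 else 0.0


def constant_test(x, alpha=0.05):
    """A test that ignores x and rejects with fixed probability alpha."""
    return alpha


# Illustrative parameter values inside H0 (p < 1/2); chosen arbitrarily.
ps = [0.10, 0.30, 0.40, 0.45, 0.49, 0.499]

for p in ps:
    print(f"p = {p:5.3f}   cutoff test: {power(cutoff_test, p):.6f}"
          f"   constant test: {power(constant_test, p):.6f}")

print("cutoff test at the boundary p = 1/2:", power(cutoff_test, 0.5))
```

Only the constant test has the same rejection probability at every $p$ in $H_0$, in line with the conclusion drawn from (14); for the cutoff test the quantity in (15) is approached as $p$ tends to 1/2 from below.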
In terms of this definition of size, a concept of testing somewhat different from that proposed in this section has been defined and defended (Wald, p. 21 of [W3], and Lehmann, pp. 17-18 of [L4]); namely, it is postulated that a test is to be chosen not from among all possible tests, but only from among those having a size $\alpha$ (in the sense of (15)) given as part of the testing problem. This concept of testing is not defended to the exclusion of the one proposed here, but it is asserted by the authors cited to be more realistic for some problems. The arguments of both authors on this point are similar and, I think, quite weak in two crucial places, for the advantage is supposed to flow in some unspecified way from the undemonstrated impossibility of comparing preferences for consequences of qualitatively different kinds. It seems, if I may be allowed such a conjecture, that the concept of testing under a ...

† Statisticians interested in the Behrens-Fisher problem may be interested ...

16.3] TESTING IN PRACTICE

... 0, $i < 0$, $i = 0$. The analysis of this, and similar, problems has recently been explored in terms of the minimax rule, for example by Sprowls [S16] and a little more fully by Rudy [R4], and by Allen [A5]. It seems to me natural and promising for many fields of application, but it is not a traditional analysis. On the contrary, much
literature recommends, in effect, that the person pretend that only two values of $i$, $i_0 > 0$ and $i_1 < 0$, are possible and that the person then choose a test for the resulting simple dichotomy. The selection of the two values $i_0$ and $i_1$ is left to the person, though they are sometimes supposed to correspond to the person's judgment of what constitutes good quality and poor quality (terms really quite without definition). The emphasis on simple dichotomy is tempered in some acceptance-sampling literature, where it is recommended that the person choose among available tests by some largely unspecified overall consideration of operating characteristics and costs, and that he facilitate his survey of the available tests by focusing on a pair of points that happen to interest him and considering the test whose operating characteristic passes (economically, in the case of sequential testing) through the pair of points. These traditional analyses are certainly inferior in the theoretical framework of the present discussion, and I think they will be found inferior in practice.

To make a small digression, there is a complication in connection with testing whether to buy that is not ordinarily envisaged by statistical theory; namely, the economic reaction between the buyer and the supplier. If, for example, the supplier knows the test the buyer is going to apply, that knowledge will influence the quality of the lot supplied. There seems to be little, if any, successful work on the economic problem thus raised about the game-like behavior of the two people involved (cf. pp. 331, 340, and 346 of [W6]).

The problem whether to buy a lot obviously has many formal counterparts in other domains. In some of them it is particularly clear that purely objectivistic methods do not suffice.
To illustrate, imagine two experiments: one designed to determine whether it is advantageous to add a certain small amount of sodium fluoride to the drinking water of children, the other to determine whether the same amount of oil of peppermint is advantageous. Granting that each of the two additions can be made at the same cash cost for labor and material and that the designs of the two hypothetical experiments differ only in the interchange of the roles of sodium fluoride and oil of peppermint, the corresponding testing problems are objectivistically completely parallel, that is, the same with regard to loss function and conditional probability of the observations. But it must be acknowledged, I think, that the people actually charged with the decision in either of these two cases would and should take into account opinions they had before the observation. For example, they might originally have considered it nearly impossible that the oil of peppermint could result in any hygienic advantage large enough to compensate for even the small cost of its administration, but, in view of recent dental researches on the subject, they might not have considered it at all unlikely that the sodium fluoride should have an overall advantage. In that case, parallel observations in the two experiments would not always lead to parallel decisions. Objectivists typically admit such a possibility but go on to say that it is unreasonable to isolate the experiment and that it is the totality of information bearing on the subject that should be treated objectivistically. If objectivists could give a more detailed discussion of how to deal with such a totality of information, it might do much to clarify their position.

I turn now to a different and, at least for me, delicate topic in connection with applications of the theory of testing. Much attention is given in the literature of statistics to what purport to be tests of hypotheses, in which the null hypothesis is such that it would not really be accepted by anyone. The following three propositions, though playful in content, are typical in form of these extreme null hypotheses, as I shall call them for the moment.

A The mean noise output of the cereal Krakl is a linear function of the atmospheric pressure, in the range from 900 to 1,100 millibars.
B The basal metabolic consumption of sperm whales is normally distributed [W11].

C New York taxi drivers of Irish, Jewish, and Scandinavian extraction are equally proficient in abusive language.

Literally to test such hypotheses as these is preposterous. If, for example, the loss associated with $f_1$ is zero, except in case Hypothesis A is exactly satisfied, what possible experience with Krakl could dissuade you from adopting $f_1$?

The unacceptability of extreme null hypotheses is perfectly well known; it is closely related to the often heard maxim that science disproves, but never proves, hypotheses. The role of extreme hypotheses in science and other statistical activities seems to be important but obscure. In particular, though I, like everyone who practices statistics, have often "tested" extreme hypotheses, I cannot give a very satisfactory analysis of the process, nor say clearly how it is related to testing as defined in this chapter and other theoretical discussions. None the less, it seems worth while to explore the subject tentatively; I will do so largely in terms of two examples.

Consider first the problem of a cereal dynamicist who must estimate the noise output of Krakl at each of ten atmospheric pressures between 900 and 1,100 millibars. It may well be that he can properly regard the
problem as that of estimating the ten parameters in question, in which case there is no question of testing. But suppose, for example, that one or both of the following considerations apply. First, the engineer and his colleagues may attach considerable personal probability to the possibility that A is very nearly satisfied (very nearly, that is, in terms of the dispersion of his measurements). Second, the administrative, computational, and other incidental costs of using ten individual estimates might be considerably greater than that of using a linear formula.

It might be impractical to deal with either of these considerations very rigorously. One rough attack is for the engineer first to examine the observed data $x$ and then to proceed either as though he actually believed Hypothesis A or else in some other way. The other way might be to make the estimate according to the objectivistic formulae that would have been used had there been no complicating considerations, or it might take into account different but related complicating considerations not explicitly mentioned here, such as the advantage of using a quadratic approximation. It is artificial and inadequate to regard this decision between one class of basic acts or another as a test, but that is what in current practice we seem to do. The choice of which test to adopt in such a context is at least partly motivated by the vague idea that the test should readily accept, that is, result in acting as though the extreme null hypotheses were true, in the farfetched case that the null hypothesis is indeed true, and that the worse the approximation of the null hypotheses to the truth the less probable should be the acceptance.

The method just outlined is crude, to say the best. It is often modified in accordance with common sense, especially so far as the second consideration is concerned.
Thus, if the measurements are sufficiently precise, no ordinary test might accept the null hypotheses, for the experiment will lead to a clear and sure idea of just what the departures from the null hypotheses actually are. But, if the engineer considers those departures unimportant for the context at hand, he will justifiably decide to neglect them.

Rejection of an extreme null hypothesis, in the sense of the foregoing discussion, typically gives rise to a complicated subsidiary decision problem. Some aspects of this situation have recently been explored, for example by Paulson [P3], [P4]; Duncan [D11], [D12]; Tukey [T4], [T5]; Scheffé [S7]; and W. D. Fisher [F7].

To summarize abstractly, I would say that, in current practice, so-called tests of extreme hypotheses are resorted to when at least a little credence is attached to the possibility that the null hypothesis is very nearly true and when there is some special advantage to behaving as though it were true.

One other illustration will make it clear that point estimation is not essential to the situation and that belief in the approximate truth of the null hypothesis alone does not always justify testing. Consider the personnel manager of a great New York taxi company.
Wishing, of course, that his drivers should be as proficient as possible, he would, under simple circumstances, hire exclusively from the national-extraction group that had obtained the highest mean score in a standard proficiency examination; for why should he not be guided by a positive indication, however slight? A statistical test of the extreme Hypothesis C would not, therefore, be called for, as has been pointed out in general terms by Bahadur and Robbins [B3]. Even strong belief that ethnic differences are extremely small in the respect in question would not alone be any reason for departing from this simple policy, dictated by the principle of admissibility, quite in contrast to the example framed around Hypothesis A. If, however, public opinion, a shortage of labor, or administrative difficulty militates against any discrimination at all, the manager may resort to a test based on the examination scores.

In practice, tests of extreme hypotheses are typically chosen from a relatively small arsenal of standard types, or families, each family consisting of one unmixed test at every significance level (as size is always called in this context). In publications, it is standard practice not simply to report the result of a test, but rather to report that level of significance for which the corresponding test of the relevant family would be on the borderline between acceptance and rejection. The rationale usually given for this procedure is that it enables each user of the publication to make his own test at the significance level he deems appropriate to his particular problem. Thus the significance level is supposed to play much the same practical role as a sufficient statistic.

An interesting contribution to the theory of extreme hypotheses is given by Bahadur [B1] in the special context of the two-sided $t$-test.
CHAPTER 17

Interval Estimation and Related Topics

1 Estimates of the accuracy of estimates

The doctrine is often expressed that a point estimate is of little, or no, value unless accompanied by an estimate of its own accuracy. This doctrine, which for the moment I will call the doctrine of accuracy estimation, may be a little old-fashioned, but I think some critical discussion of it here is in order for two reasons. In the first place, the doctrine is still widely considered to contain more than a grain of truth. For example, many readers will think it strange, and even remiss, that I have written a long chapter (Chapter 15) on estimation without even suggesting that an estimate should be accompanied by an estimate of its accuracy. In the second place, it seems to me that the concept of interval estimation, which is the subject of the next section, has largely evolved from the doctrine of accuracy estimation and that discussion of the doctrine will, for some, pave the way for discussion of interval estimation.

The doctrine of accuracy estimation is vague, even by the standards of the verbalistic tradition, for it does not say what should be taken as a measure of accuracy, that is, what an estimate of accuracy should estimate. Any measure would be rather arbitrary; a typical one, here adopted for definiteness, is the root-mean-square error,

$$E^{1/2}(\{\mathbf{l} - \lambda(i)\}^2 \mid B_i) = \{V(\mathbf{l} \mid B_i) + (E(\mathbf{l} \mid B_i) - \lambda(i))^2\}^{1/2}, \tag{1}$$

using (15.6.23). The root-mean-square error reduces to the standard deviation, $V^{1/2}(\mathbf{l} \mid B_i)$, in case the estimate $\mathbf{l}$ is unbiased.

Taking the doctrine literally, it evidently leads to endless regression, for an estimate of the accuracy of an estimate should presumably be accompanied by an estimate of its own accuracy, and so on forever.

Even supposing that the doctrine were somehow purged of vagueness and endless regression, it would still be in clear conflict with the behavioralistic concept of estimation studied in Chapter 15. If a decision
problem consists in deciding on a number in the light of an observation, the person concerned wants to adopt an $\mathbf{l}$ that is, in some sense or other, as good as possible; but, since he must make some decision, it could at most satisfy idle curiosity to know how good the best is; idle, I say, because, his decision once made, there is no way to use knowledge of its accuracy.

Since it seems to me that the kind of problem envisaged in Chapter 15 is of frequent occurrence and may properly be called estimation, I am inclined to say that the doctrine of accuracy estimation is erroneous. However, it is possible that someone should point out a different class of problems, also properly called problems of estimation, with respect to which the doctrine has some validity; though, so far as I know, this has not yet occurred.

One sort of situation that might, through what I would consider faulty analysis, seem to support the doctrine of accuracy estimation is illustrated by the following, highly schematized example. A person has to estimate the number $n$ of replacement parts of a certain sort that should be carried by an expedition. He can conduct a trial the outcome of which will, let us say, be an observation $\mathbf{x}$ distributed in the Poisson distribution with mean equal to $acn$; that is,

$$P(x \mid n) = e^{-acn} (acn)^x / x!, \tag{2}$$

where $a$ is a known constant and $c$, which the person can choose, is the cost (beyond overhead) of the trial. Under reasonable hypotheses, once $c$ has been chosen and the value $x$ observed, $\hat{n}(x) = x/ac$ is a good estimate of $n$; and in so far as the problem is of the type envisaged in Chapter 15, that is the end of the matter. But there may be features of the problem that have not yet been stated, though in principle they should have been. In particular, it may be that the person is free to conduct a second trial, though there will typically be a high penalty for doing so. One rough, but sometimes natural and practical, step toward deciding whether a second trial is called for is to remark that $(\hat{n}/ac)^{1/2}$ is a good estimate of the root-mean-square error of $\hat{n}$ and may give a fairly good basis on which to judge whether the risk of misestimation warrants the expense of a second trial.

My own conviction is that we should frankly regard such a problem as has just been described as a special problem in sequential analysis and treat it as an organic whole. Viewed thus, $c$ is to be chosen in the light of the possibility of making a second trial. The decision to be based on $x$ is the complex one of whether to go to the expense of a second trial; if so, of what magnitude; and, if not, what estimate of $n$ to adopt.
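For readers who want to see the replacement-parts example in numbers, here is a minimal sketch. Everything specific in it is an assumption made for illustration: the values of $a$, $c$ and $n$, the function names, and the use of Knuth's multiplication method to draw Poisson variates. It computes the estimate $x/ac$ with the accompanying quantity $(\hat{n}/ac)^{1/2}$, and compares a simulated root-mean-square error with $(n/ac)^{1/2}$.

```python
import math
import random


def estimate_parts(x, a, c):
    """Return (n_hat, rmse_hat) for an observed Poisson count x.

    n_hat = x / (a c) is the estimate of n discussed in the text, and
    rmse_hat = sqrt(n_hat / (a c)) is the estimate of its
    root-mean-square error.
    """
    n_hat = x / (a * c)
    rmse_hat = math.sqrt(n_hat / (a * c))
    return n_hat, rmse_hat


def poisson_draw(mean, rng):
    """One Poisson variate by Knuth's method (adequate for moderate means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulated_rmse(n, a, c, trials=20000, seed=0):
    """Monte Carlo root-mean-square error of n_hat when the truth is n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        n_hat = estimate_parts(poisson_draw(a * c * n, rng), a, c)[0]
        total += (n_hat - n) ** 2
    return math.sqrt(total / trials)


# Hypothetical values chosen only for illustration.
a, c, n = 0.5, 4.0, 10
print("estimate and rmse estimate for x = 20:", estimate_parts(20, a, c))
print("theoretical rmse sqrt(n / (a c)):", math.sqrt(n / (a * c)))
print("simulated rmse:", simulated_rmse(n, a, c))
```

The sketch treats a single trial only; it does not attempt the sequential analysis, with its choice of $c$ and of a possible second trial, that the text recommends for the problem as a whole.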
Another sort of situation that seems to have stimulated the doctrine of accuracy estimation is the following. Suppose that a research worker has observed $x_1, \ldots, x_n$, which are independent and normally distributed about the mean $\mu$ with variance $\sigma^2$ given $\mu$ and $\sigma$. If he wishes to publish the results of his investigation for all concerned to use as their own needs and opinions may dictate, he should, ideally, publish a sufficient statistic of his observation, stating how it is distributed given $\mu$ and $\sigma$. Any other course may deprive some reader of some information he might be able to put to use. So far as the primary aim is concerned, all sufficient statistics are equivalent, but secondary considerations greatly narrow the research worker's choice. To illustrate, consider the five sufficient statistics the values of which for $\{x_1, \ldots, x_n\}$ are:

(a) $\{x_1, \ldots, x_n\}$.
(b) The $n$ order statistics of $\{x_1, \ldots, x_n\}$.
(c) $\sum x_i$ and $\sum x_i^2$.
(d) $\bar{x} =_{\mathrm{Df}} \sum x_i / n$ and $s^2 =_{\mathrm{Df}} (\sum x_i^2 - \bar{x} \sum x_i)/(n - 1)$.
(e) $\bar{x}$ and $s/n^{1/2}$.

If $n$ is at all large, (c), (d), and (e) are cheaper to publish than (a) and (b). Moreover, for almost any use to which a reader might wish to put the data, (c), (d), and (e) will save him a considerable amount of computation. In so far as it is true that almost any reader who has a use for the data at all will use $\bar{x}$, but not necessarily $\sum x_i$, statistics like (d) and (e) are slightly preferable to (c). There is something to be said both for (d) and for (e), in view of the ready availability of certain tables; but, at least when $n$ is very large, there is a slight advantage to (e) for those calculations a reader is most likely to perform. In particular, a reader using (e) can, when $n$ is large, often ignore the actual value of $n$. Even if the distributions of the $x_1, \ldots, x_n$ are not exactly normal, (c), (d), and (e) often can play almost the same role as sufficient statistics.

It is no wonder then that (e) is often chosen as a convenient way to present data. But, in my opinion, it is a mistake to lay great theoretical emphasis on the fact that (e) happens to consist of what is ordinarily a good estimate of $\mu$, namely $\bar{x}$, together with what is ordinarily a good estimate of the root-mean-square error of that estimate, namely $s/n^{1/2}$.
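The five ways of presenting a normal sample can be computed side by side. The sketch below is illustrative only: the function name, the dictionary layout and the sample values are assumptions introduced here. It follows the formulas in (c), (d) and (e), reporting $s$ (the square root of the quantity defined in (d)) alongside the mean.

```python
import math


def summaries(xs):
    """Compute the statistics (a) through (e) for a list of observations."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    mean = sum_x / n
    # s is the square root of (sum of squares - mean * sum) / (n - 1).
    s = math.sqrt((sum_x2 - mean * sum_x) / (n - 1))
    return {
        "a": list(xs),                    # the raw observations
        "b": sorted(xs),                  # the n order statistics
        "c": (sum_x, sum_x2),             # sum and sum of squares
        "d": (mean, s),                   # mean and s
        "e": (mean, s / math.sqrt(n)),    # mean and s / sqrt(n)
    }


# Hypothetical measurements, chosen only for illustration.
data = [4.9, 5.1, 5.0, 5.2, 4.8]
for label, value in summaries(data).items():
    print(label, value)
```

Forms (c), (d) and (e) each carry two numbers however large the sample; (c) and (d) need $n$ as well before the standard error in (e) can be recovered from them.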
2 Interval estimation and confidence intervals

The verbalistic tradition has suggested a procedure different from point estimation but somehow related to it. This other procedure, here called interval estimation, can be defined as follows, though the definition is necessarily vague. Where $\mathbf{x}$ is an observation subject to the conditional distributions $P(x \mid B_i)$ and $\lambda(i)$ is a function of $i$, guess that $\lambda(i)$ lies in some set $M(x)$ (to be called an interval estimate) determined for each value of $x$. It is almost a part of the definition to say that the function $M(x)$ is to be so chosen that $P(\lambda(i) \in M(\mathbf{x}) \mid B_i)$ shall be nearly 1 for every $i$ and that $M(x)$ should tend to be small and "close knit" in a geometrical sense, some compromise being effected between these two conflicting desiderata.

The parameter $\lambda(i)$ could in principle be a very general function, but it will here be enough to suppose for definiteness and simplicity that $\lambda(i)$ is real. Though more general possibilities are contemplated in principle, the set $M(x)$ is in practice typically a bounded interval, which corresponds with what I meant in saying that $M(x)$ is supposed to be "close knit."

The idea of interval estimation is complicated; an example is in order. Suppose that, for each $\lambda$, $\mathbf{x}$ is a real random variable normally distributed about $\lambda$ with unit variance; then, as is very easy to see with the aid of a table of the normal distribution, if $M(x)$ is taken to be the interval $[x - 1.9600,\ x + 1.9600]$, then

$$P(\lambda \in M(\mathbf{x}) \mid \lambda) = \alpha, \tag{1}$$

where $\alpha$ is constant and almost equal to 0.95. It is usually thought necessary to warn the novice that such an equation as (1) does not concern the probability that a random variable $\lambda$ lies in a fixed set $M(x)$. Of course, $\lambda$ is given and therefore not random in the context at hand; and, given $\lambda$, $\alpha$ is the probability that $M(\mathbf{x})$, which is a contraction of $\mathbf{x}$, has as its value an interval that contains $\lambda$.

Why seek an interval estimate? One sort of verbalistic answer runs like this: At first glance, the problem of estimation seems to require that a person guess, on observing that $\mathbf{x}$ takes the value $x$, that $\lambda(i)$ has some particular value $l(x)$; but, since it is virtually impossible that such a guess should be correct, it seems better to try something else. In particular, it is often possible to assert that $\lambda(i)$ is in a comparatively narrow interval $M(x)$, chosen according to such a system that it is very improbable for each $i$ that the assertion will be false. Less extreme verbalistic explanations tend to give the impression that point estimation need not be altogether rejected, but that interval estimation satisfies a parallel need.

The first part of the explanation just cited is specious, since no one really expects a point estimate to be correct, and since, when one really is obliged by circumstances to make a point estimate in the behavioralistic sense, there is no escaping it. None the less, that part of the explanation does seem to give some insight into the appeal of interval estimation. The second part of the explanation is a sort of fiction; for it
will be found that whenever its advocates talk of making assertions that have high probability, whether in connection with testing or estimation, they do not actually make such assertions themselves, but endlessly pass the buck, saying in effect, "This assertion has arisen according to a system that will seldom lead you to make false assertions, if you adopt it. As for myself, I assert nothing but the properties of the system."

From the behavioralistic point of view, I maintain that point estimation fulfills an important function. On the other hand, I can cite no important behavioralistic interpretation of interval estimation. Moreover, in such direct and indirect contact as I have had with actual statistical practice, I have (with but one extraordinary exception, which will soon be discussed) encountered no applications of interval estimation that seemed convincing to me as anything more than an informal device for exploring data or crudely summarizing it for others. In short, not being convinced myself, I am in no position to present convincing evidence for the usefulness of interval estimation as a direct step in decision. The reader should know, however, that few are as pessimistic as I am about interval estimation and that most leaders in statistical theory have a long-standing enthusiasm for the idea, which may have more solid grounds than I now know.

The following is a schematized example of one sort of decision problem that does call for something like interval estimation. An observation $\mathbf{x}$ bears on the position $\lambda$ of a lifeboat, the occupants of which will be saved or lost according as the boat is or is not sighted by a searching aircraft before nightfall. The decision problem is, therefore, to choose, from all the domains that the airplane could search in time, one domain $M(x)$; and the loss must, in effect, be reckoned as 0 or 1 according as $M(x)$ does or does not contain $\lambda$. This type of problem seems, however, too rare and too special to be taken as representative of those for which interval estimation is so widely advocated.

Many criteria have been put forward for interval estimation, but I am of course in no position to discuss them critically. J. Neyman has gone about the search for criteria systematically, setting up a parallelism between the theory of interval estimation and of testing. In particular, paralleling the criterion of fixed size for tests, he has emphasized interval estimates such that

$$P(\lambda(i) \in M(\mathbf{x}) \mid B_i) = \alpha \tag{2}$$

for a fixed $\alpha$ (typically close to 1) and for every $i$. Such interval estimates are called confidence intervals at the confidence level $\alpha$. The interval estimate mentioned in connection with (1) is obviously a confidence interval. Wald [W3] sought to include the theory of confidence intervals in the minimax theory, but in my opinion he did not succeed in giving interval estimation a behavioralistic interpretation.

Though I am in no position to criticize any criterion of interval estimation, I venture to ask whether (2) is not gratuitous, as I have more positively asserted of its analogue in the theory of testing.

Chapters 19 and 20 of [K2] will serve as key references for interval estimation.

3 Tolerance intervals

There has recently been considerable study of what are called tolerance intervals (or limits). They are related to the problem of guessing the actual value of a real random variable $\eta$ on the basis of an observation of $\mathbf{x}$. A tolerance interval for $\eta$ at tolerance level $\alpha$ and confidence level $\beta$ is an interval-valued function $Y(x)$ such that

$$P\{P(\eta \in Y(\mathbf{x}) \mid B_i, \mathbf{x}) > \alpha \mid B_i\} = \beta \tag{1}$$

for every $i$. The concept expressed by (1) is a slippery one; perhaps it will help to express it in words thus: For every $B_i$, there is probability $\beta$ that $\mathbf{x}$ is such that $\eta$ will fall in $Y(x)$ with probability at least $\alpha$, given $B_i$ and $x$. In typical applications $\eta$ is independent of $\mathbf{x}$; this permits a slight simplification of the definition.

The notion of tolerance interval seems to me at least as unamenable to behavioralistic interpretation as that of confidence interval, and I therefore venture no discussion of it here. Key references are [B22] and [W7].
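Since the definition of a tolerance interval is admittedly slippery, a small numerical sketch may help. It rests on assumptions that are not in the text: a single observation $\mathbf{x}$ that is normal with unit variance about an unknown mean, a future value $\eta$ with the same distribution and independent of $\mathbf{x}$, and the symmetric interval $Y(x) = [x - k, x + k]$. All function names are hypothetical. Under these assumptions the inner probability depends only on the distance of $x$ from the mean, so the confidence level can be computed from the normal c.d.f. and is the same for every mean. The same c.d.f. also checks the level quoted for the interval $[x - 1.9600, x + 1.9600]$ in the preceding section.

```python
import math


def normal_cdf(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def confidence_level(half_width):
    """P(lambda in [x - h, x + h] | lambda) when x is N(lambda, 1)."""
    return 2.0 * normal_cdf(half_width) - 1.0


def content(d, k):
    """P(eta in [x - k, x + k] | mean, x), where d = x - mean, eta ~ N(mean, 1)."""
    return normal_cdf(d + k) - normal_cdf(d - k)


def tolerance_confidence(k, alpha):
    """Probability that the content of [x - k, x + k] exceeds alpha.

    Assumes 0 < alpha < 1 and k > 0.  The content decreases as |d| grows,
    so it exceeds alpha exactly when |d| is below a cutoff, which is
    located here by bisection.
    """
    if content(0.0, k) <= alpha:
        return 0.0  # even a perfectly centered interval is too short
    lo, hi = 0.0, k + 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if content(mid, k) > alpha:
            lo = mid
        else:
            hi = mid
    return 2.0 * normal_cdf(lo) - 1.0


print("confidence level of [x - 1.96, x + 1.96]:", confidence_level(1.96))
print("confidence level of the tolerance interval with k = 3, alpha = 0.90:",
      tolerance_confidence(3.0, 0.90))
```

In this toy model the half-width $k$ fixes both levels jointly; choosing $k$ to meet a prescribed pair would mean inverting `tolerance_confidence`, which the sketch does not attempt.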
4 Fiducial probability

This is not really a section on fiducial probability, but rather an apology for not having such a section. The concept of fiducial probability put forward and stressed by R. A. Fisher is the most disputed technical concept of modern statistics, and, since the concept is largely concerned with interval estimation, I wanted to discuss it here. I have, however, been privileged to see certain as yet unpublished manuscripts of R. M. Williams [W12] and J. W. Tukey which convince me that such discussion by me now would be premature. Some key references to fiducial probability and to the Behrens-Fisher problem, which is the most disputed field of application of fiducial probability, are Fisher's own papers, especially [F5], and Papers 22, 25, 26, 27, and 35 of the collection [F6]; Kendall [K2], Chap. 20; Yates [Y1]; Owen [O1]; Segal [S9]; Bartlett [B6]; Scheffé [S5], [S6]; Walsh [W9]; and Chand [C5].+

+ And I can now add Barnard (1963), Dempster (1964), Fisher (1956, Sections III 3, IV 6, V 5, V 8, VI 8, VI 12), Linnik (1968, Chapters VIII-X), Patil (1965), Scheffé (1970), Tukey (1957), and Williams (1966).
APPENDIX 1

Expected Value

This appendix, a brief account of some relatively elementary aspects of the badly named mathematical concept, expected value, is presented for those who might otherwise be handicapped in reading this book. No proofs are given here, but the reader who needs this appendix will probably be willing and able to accept the facts cited without proof, especially if he acquires intuition for the subject by working the suggested exercises. The requisite proofs are, however, given implicitly in any standard work on integration or measure (e.g., Chapters I-V of [H2]).

Throughout this appendix, let $S$ be a set with elements $s$, and subsets $A, B, C, \ldots$ on which a (finitely additive) probability measure $P$ is defined. Bounded real random variables, that is, bounded real-valued functions, defined for each $s \in S$, will here be denoted by $\mathbf{x}, \mathbf{y}, \ldots$, and real numbers by $x$, $y$, and lower-case Greek letters.

The expected value of $\mathbf{x}$, generally written $E(\mathbf{x})$, is characterized as the one and only function attaching a real number to every bounded random variable $\mathbf{x}$, subject to the following three conditions for every $\mathbf{x}$, $\mathbf{y}$, $\rho$, $\sigma$, and $B$:

$$E(\rho \mathbf{x} + \sigma \mathbf{y}) = \rho E(\mathbf{x}) + \sigma E(\mathbf{y}). \tag{1}$$

$$E(\mathbf{x}) \ge 0 \quad \text{whenever } P(\mathbf{x}(s) < 0) = 0. \tag{2}$$

$$E(c(\cdot \mid B)) = P(B). \tag{3}$$

In (3), $c(\cdot \mid B)$ is the characteristic function of $B$, that is, $c(s \mid B) = 1$, if $s \in B$, and $c(s \mid B) = 0$, if $s \in {\sim}B$. In mathematical contexts remote from the topics in this book, the term "characteristic function" has at least two other meanings virtually unconnected with the one at hand, one in connection with linear operators on function spaces, and another in connection with the Fourier analysis of distributions.

Often the expected value of $\mathbf{x}$ is referred to as the integral of $\mathbf{x}$ over $S$, in which case it is generally written $\int \mathbf{x}(s)\, dP(s)$.
:188
APPENDIX 1
Buzdses %lJ ••• , ~.
1. If s takes only a finite number of values, Bet of probability lero; then
except on a
II
(4)
E(z) ..
L %iP (S(8)
- Xi),
i-I
that is, the average of the ~l8, each weighted by the probability of ita occurrence. 2. If P(x(,) < r(I» - 0, B(s) ~ BCy); and if, in additioD, P(~(,) >
+ .)
r(') > 0 for some f > Ot then H(x) > B(y). t 3. If x is a real random variable, B, a partition, Pi and., real numben such that Pi S %(.) S tli for, I Bit then
'%paP(B,) S :8(x)
(5)
~
1:,,;P(B,).
cd A n B) - e(I A)e 0,
the conditional probability, defined by p(e IB) - p(e n B)/P(B), is itself a probability meuuret the expectation of z with respect to a conditional probability is a meMinpul concept. This conditioDal upectatioD is written E(z I B) and read "the expeeted value of % given B." Jlore ezerd_
I
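5. $E(\mathbf{x} \mid B) = E(\mathbf{x}\, c(\cdot \mid B))/P(B)$. Hint: It suffices to verify that the expression on the right satisfies the three conditions parallel to (1-3) that define $E(\mathbf{x} \mid B)$.

6. If $B_i$ is a partition of $S$, then

(6) $\sum_i c(s \mid B_i) = 1$ for every $s$.

7. $E(\mathbf{x}) = \sum_i E(\mathbf{x} \mid B_i) P(B_i)$. Hint: Use $\mathbf{x} = \sum_i \mathbf{x}\, c(\cdot \mid B_i)$.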
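The finite formula (4), the expression for conditional expectation in Exercise 5, and the partition formula of Exercise 7 can be checked on a small example. The following is only a sketch under stated assumptions: the six-point world, the uniform measure, the random variable, and all function names are hypothetical choices made for illustration, and exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction

# Hypothetical finite world: the six outcomes of one roll of a fair die.
S = [1, 2, 3, 4, 5, 6]
P = {s: Fraction(1, 6) for s in S}  # probability attached to each element s


def prob(event):
    """P(B) for an event B given as a set of elements of S."""
    return sum(P[s] for s in event)


def expect(x):
    """E(x) as in (4): each value of x weighted by the probability of its occurrence."""
    values = {x(s) for s in S}
    return sum(v * prob({s for s in S if x(s) == v}) for v in values)


def indicator(event):
    """Characteristic function c(. | B) of the event B."""
    return lambda s: 1 if s in event else 0


def cond_expect(x, event):
    """E(x | B) = E(x c(. | B)) / P(B), as in Exercise 5; assumes P(B) > 0."""
    c = indicator(event)
    return expect(lambda s: x(s) * c(s)) / prob(event)


x = lambda s: s * s                   # an illustrative bounded real random variable
partition = [{1, 2}, {3, 4, 5}, {6}]  # an illustrative partition B_i of S

# Right-hand side of Exercise 7: sum of E(x | B_i) P(B_i) over the partition.
total = sum(cond_expect(x, B) * prob(B) for B in partition)

print(expect(x), total)
```

Condition (3) can be checked the same way, since `expect(indicator(B))` should agree with `prob(B)` for any event `B`.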
† Technical note: In the event that $P$ is countably additive, $P(\mathbf{x}(s) > \mathbf{y}(s)) > 0$ implies the existence of a suitable $\epsilon$, so then $\epsilon$ need not be mentioned at all.
Suppose $\mathbf{y}$ is a (not necessarily real) random variable that takes on only a finite number of values. It will be understood that $E(\mathbf{x} \mid y)$ is the expected value of $\mathbf{x}$ given that $\mathbf{y}(s) = y$, provided $y$ is such that this event has positive probability. Furthermore, it will be understood that $E(\mathbf{x} \mid \mathbf{y})$ is a bounded real random variable that for each $s$ takes the value $E(\mathbf{x} \mid \mathbf{y}(s))$. The definition leaves $E(\mathbf{x} \mid \mathbf{y})$ undefined on the null set of those points $s$ where $\mathbf{y}(s)$ is a value that $\mathbf{y}$ takes on with probability zero. It is immaterial how this blemish is removed; in particular $E(\mathbf{x} \mid \mathbf{y})$ may as well be set equal to 0, where it has not already been defined.
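Still more exercises

8. $E(E(\mathbf{x} \mid \mathbf{y})) = E(\mathbf{x})$.

9. If $f$ is a real-valued function defined on the values of $\mathbf{y}$; then $f(\mathbf{y})$ is a bounded real random variable, and

(7) $E(f(\mathbf{y})\mathbf{x}) = E(f(\mathbf{y}) E(\mathbf{x} \mid \mathbf{y}))$.

10. If $h(\mathbf{y})$ is such that, for all $f$,

(8) $E(f(\mathbf{y})\mathbf{x}) = E(f(\mathbf{y}) h(\mathbf{y}))$,

then $h(\mathbf{y}(s)) = E(\mathbf{x} \mid \mathbf{y}(s))$, except possibly on a set of $s$'s of probability zero.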
Exercise 9 and its corollary, 8, present the most frequently used properties of conditional expectation. Exercise 10 shows that the property presented in 9 characterizes conditional expectation. Through this characterization Kolmogoroff [K7] extends the ideas of conditional expectation and also of conditional probability (for countably additive measures) to random variables $\mathbf{y}$ not necessarily confined to a finite or even denumerable set of values; though the definition in terms of ordinary conditional probability then breaks down completely, the probability that $\mathbf{y}(s) = y$ often being 0 for every $y$.
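The properties stated in Exercises 8 and 9 can be illustrated for a random variable y with finitely many values. What follows is a sketch only: the six-point world, the particular x, y, and f, and the helper names are all assumptions introduced for the illustration, and every value of y is given positive probability so that the blemish discussed above does not arise.

```python
from fractions import Fraction

# Hypothetical six-point world with the uniform probability measure.
S = [(a, b) for a in (0, 1) for b in (0, 1, 2)]
P = {s: Fraction(1, 6) for s in S}


def expect(x):
    """E(x) for a real random variable x on the finite world S."""
    return sum(x(s) * P[s] for s in S)


def cond_expect_given(x, y):
    """Return the random variable E(x | y), which takes the value E(x | y(s)) at s."""
    table = {}
    for value in {y(s) for s in S}:
        event = [s for s in S if y(s) == value]
        p_event = sum(P[s] for s in event)  # positive for every value in this example
        table[value] = sum(x(s) * P[s] for s in event) / p_event
    return lambda s: table[y(s)]


x = lambda s: s[0] + s[1] * s[1]  # illustrative bounded real random variable
y = lambda s: s[1] % 2            # illustrative random variable with values 0 and 1
f = lambda v: 3 * v + 1           # illustrative real-valued function on the values of y

h = cond_expect_given(x, y)

# Exercise 8: E(E(x | y)) compared with E(x).
tower_lhs, tower_rhs = expect(h), expect(x)

# Equation (7): E(f(y) x) compared with E(f(y) E(x | y)).
lhs7 = expect(lambda s: f(y(s)) * x(s))
rhs7 = expect(lambda s: f(y(s)) * h(s))

print(tower_lhs, tower_rhs, lhs7, rhs7)
```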
APPENDIX 2

Convex Functions

This appendix gives a brief account of convex functions in the same spirit as the preceding one gives an account of expected value. Reasonable facsimiles of the proofs omitted here are scattered through [B,), where they may be found by anyone not content to skip them.

An interval is a set $I$ of real numbers such that, if $x, z \in I$ and $x \leq y \leq z$, then $y \in I$. It is not difficult to see that intervals can be classified according to Table 1, where it is to be understood that $\rho < \sigma$.

TABLE 1. THE VARIOUS TYPES OF INTERVALS
A real-valued function $t$ defined for $x$ in an interval $I$ is convex, if and only if the graph of the function never rises above any chord of itself. Analytically, if $\rho$ and $\sigma$ are positive, $\rho + \sigma = 1$, and $x, y \in I$; then

(1) $t(\rho x + \sigma y) \leq \rho t(x) + \sigma t(y)$.

If equality holds in (1) for some $\rho$; then, as is easily verified, it holds for every $\rho$, and $t$ is linear, i.e., of the form $\alpha x + \beta$, in the closed interval $[x, y]$. An interval in which $t$ is linear will here be called an interval of linearity. If and only if there are no intervals of linearity other than the one-point and vacuous intervals, $t$ is strictly convex.
Exercises

1. Verify, at least graphically, that the following functions are convex in the indicated intervals; discuss their intervals of linearity; and say which are strictly convex.

$I = (-\infty, +\infty)$: (a) ". for every $\rho$, (b) $x^2 + \rho x + \sigma$ for every $\rho$ and $\sigma$, (c) $|x|$, (d) $|x|^{\rho}$ for $\rho \geq 1$, (e) z.

$I = (0, \infty)$: (f) $-\log x$, (g) $x^{\rho}$ for $-\infty < \rho < 0$.

$I = (-1, +1)$: (h) $(1 - x^2)^{-1/2}$, (i) $1 - \cos(\pi x/2)$.
2. In an interval where $t$ is convex, if $d^2 t(x)/dx^2$ exists at $x$, then $d^2 t(x)/dx^2 \geq 0$; and if, for every $x$ in an interval $I$, $d^2 t(x)/dx^2$ exists and is non-negative, then $t$ is convex in $I$.

3. Re-explore Exercise 1 in the light of 2.

4. Let $T$ be a non-vacuous set of functions, $t, t', \ldots$, convex in $I$, and let

(2) $t^*(x) = \sup_{t \in T} t(x)$.

In (2), as always in mathematics, the sup, or supremum, of a set of numbers is the least number, possibly $\infty$, that is not less than any element of the set. If $t^*(x) < \infty$ for every $x \in I$, then $t^*$ is convex in $I$. Explore the proposition just stated, first graphically, especially for a finite set of linear $t$'s, and then analytically. What if the elements of $T$ are all strictly convex?

5. In an open interval where $t$ is convex, it is also continuous. What are the facts for closed and half-closed intervals?
6. If $t$ is convex in $I$, $x_i \in I$, $\rho_i > 0$, and $\sum \rho_i = 1$, where $i = 1, \ldots, r$; then

(3) $\sum \rho_i t(x_i) \geq t\left(\sum \rho_i x_i\right)$.

Equality obtains, if and only if all the $x_i$'s are in a single interval of linearity of $t$. (a) Interpret the propositions above in terms of probability. (b) Prove them by arithmetic induction on $r$. (c) What if $t$ is strictly convex?
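The graphical exploration suggested in Exercise 4 for a finite set of linear functions, and the inequality (3) of Exercise 6, can also be tried numerically. The sketch below is an illustration under assumptions, not a proof: the three lines, the grid, the weights, and the function names are hypothetical, and exact fractions are used so that the comparisons are free of rounding error.

```python
from fractions import Fraction


def t_star(x, lines):
    """Pointwise supremum (2) of a finite set of linear functions alpha*x + beta."""
    return max(alpha * x + beta for alpha, beta in lines)


def chord_gap(t, x, y, rho):
    """rho*t(x) + sigma*t(y) - t(rho*x + sigma*y); non-negative wherever (1) holds."""
    sigma = 1 - rho
    return rho * t(x) + sigma * t(y) - t(rho * x + sigma * y)


# Hypothetical set T of linear functions, each given as (alpha, beta).
lines = [(-1, 0), (1, 0), (2, -3)]
t = lambda x: t_star(x, lines)

# Check inequality (1) for t* on a grid of points and weights.
grid = [Fraction(i, 4) for i in range(-20, 21)]
weights = [Fraction(i, 8) for i in range(1, 8)]
min_gap = min(chord_gap(t, x, y, rho) for x in grid for y in grid for rho in weights)

# Check inequality (3) for one illustrative choice of points x_i and weights rho_i.
xs = [Fraction(-2), Fraction(1, 2), Fraction(4)]
rhos = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
lhs3 = sum(r * t(x) for r, x in zip(rhos, xs))
rhs3 = t(sum(r * x for r, x in zip(rhos, xs)))

print(min_gap, lhs3, rhs3)
```

A grid check of this kind can only fail to find a violation of (1); it does not establish convexity, which is the analytic part of the exercise.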
Exercise 6 suggests, and indeed proves a special case of, the following well-known and most useful theorem, which cannot be proved here in full generality.
THEOREM 1 If $t$ is convex and bounded in the interval $I$, and $\mathbf{x}(s) \in I$ for all $s \in S$, then

(4) $E(t(\mathbf{x})) \geq t(E(\mathbf{x}))$.

Equality obtains,* if and only if the values of $\mathbf{x}$ are with probability one contained in a single interval of linearity of $t$.

* Here and throughout this appendix, such conditions for equality are to be understood to apply only in the event that either $P$ is countably additive or the random variable is with probability one confined to a finite set of values; the general situation for finitely additive measures is a little more complicated.

More exercises

7. The variance of $\mathbf{x}$, often written $V(\mathbf{x})$, is defined thus:

(5) $V(\mathbf{x}) = E([\mathbf{x} - E(\mathbf{x})]^2)$.

Show that

(6) $V(\mathbf{x}) = E(\mathbf{x}^2) - E^2(\mathbf{x}) \geq 0$,

with equality if and only if $P(\mathbf{x}(s) = E(\mathbf{x})) = 1$.

8. Show that, if $\mathbf{x}$ is never smaller than some positive number,

(7) $\log E^{-1}(\mathbf{x}^{-1}) \leq E(\log \mathbf{x}) \leq \log E(\mathbf{x})$.

When can either equality obtain? Write the analogue of (7) suggested by (3), and show thereby that (7) is a generalization of the familiar fact that the arithmetic mean (of positive numbers) is at least as great as the geometric mean and the geometric mean is at least as great as the harmonic mean.
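The connection between (7) and the three classical means can be made concrete: exponentiating the three members of (7) for a random variable confined to finitely many positive values gives the harmonic, geometric, and arithmetic means in that order. The following sketch assumes hypothetical values and weights and a helper named `means`; it uses floating-point arithmetic, so the results are approximate.

```python
import math


def means(values, weights):
    """Weighted harmonic, geometric, and arithmetic means of positive numbers.

    With x taking values[i] with probability weights[i], these are
    1/E(1/x), exp(E(log x)), and E(x), the exponentials of the members of (7).
    """
    harmonic = 1.0 / sum(w / v for v, w in zip(values, weights))
    geometric = math.exp(sum(w * math.log(v) for v, w in zip(values, weights)))
    arithmetic = sum(w * v for v, w in zip(values, weights))
    return harmonic, geometric, arithmetic


# Illustrative data: three positive values, equally weighted.
values = [1.0, 2.0, 4.0]
weights = [1.0 / 3.0] * 3
harmonic, geometric, arithmetic = means(values, weights)

# A degenerate case in which x is constant, so both equalities in (7) obtain.
flat = means([3.0, 3.0], [0.5, 0.5])

print(harmonic, geometric, arithmetic, flat)
```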
One of the most famous of all inequalities is the Schwarz inequality, which can, though not quite obviously, be derived from Theorem 1, and which can be stated in terms of expected values thus:

(8) $E^2(\mathbf{x}\mathbf{y}) \leq E(\mathbf{x}^2) E(\mathbf{y}^2)$,

with equality obtaining if and only if for some numbers $\rho$ and $\sigma$ not both zero

(9) $P(\rho \mathbf{x}(s) = \sigma \mathbf{y}(s)) = 1$.
Note that (9) expresses (perhaps too compactly) that, except on some set of probability zero, either $\mathbf{x}$ or $\mathbf{y}$ vanishes identically or else each is a fixed multiple of the other. Statistically speaking, the Schwarz inequality expresses, in effect, the familiar fact that any correlation coefficient must lie between +1 and -1, one of the extremes occurring if and only if at least one of the two random variables involved is a linear function of the other.

The concept of convex functions and its implications can easily be extended to real-valued functions defined on vectors in an n-dimensional vector space, the role of intervals there being replaced by convex subsets of the vector space; but an understanding of this extension, though desirable, is not absolutely essential in reading this book. One good introduction to convex subsets of vector spaces is Sections 16.1-2 of [V4], and another especially adapted to statistical applications is incorporated in [B18]. The standard treatise on the topic is that of Bonnesen and Fenchel [B20].
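The statistical reading of the Schwarz inequality given above can be illustrated by computing both members of (8) and a correlation coefficient for random variables with finitely many values. The sketch assumes hypothetical data and helper names; the correlation coefficient is obtained by applying the quantities in (8) to the deviations of the two variables from their expected values, and the arithmetic is floating point.

```python
import math


def expect(values, probs):
    """Expected value of a random variable taking values[i] with probability probs[i]."""
    return sum(v * p for v, p in zip(values, probs))


def schwarz_sides(xs, ys, probs):
    """Return the two members of (8): E^2(xy) and E(x^2) E(y^2)."""
    lhs = expect([x * y for x, y in zip(xs, ys)], probs) ** 2
    rhs = expect([x * x for x in xs], probs) * expect([y * y for y in ys], probs)
    return lhs, rhs


def correlation(xs, ys, probs):
    """Correlation coefficient; assumes neither variable is constant."""
    mx, my = expect(xs, probs), expect(ys, probs)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    cov = expect([a * b for a, b in zip(dx, dy)], probs)
    var_x = expect([a * a for a in dx], probs)
    var_y = expect([b * b for b in dy], probs)
    return cov / math.sqrt(var_x * var_y)


# Illustrative joint values, each of the four points having probability 1/4.
probs = [0.25] * 4
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 1.0, 4.0, 3.0]

lhs8, rhs8 = schwarz_sides(xs, ys, probs)
r = correlation(xs, ys, probs)

# When y is a linear function of x with positive slope, the extreme +1 is attained.
r_linear = correlation(xs, [3.0 * x + 1.0 for x in xs], probs)

print(lhs8, rhs8, r, r_linear)
```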
APPENDIX 3

Bibliographic Material

The bibliography of about 170 items that terminates this appendix lists not only all works referred to in this book but also some others, for it is intended to serve not only as a mechanical aid to reference but also as a briefly and informally annotated list of suggested readings in the foundations of statistics. In addition to the notes incorporated into the bibliography, information about many of the works listed there is given in other parts of the book, where it can be found by reference to the author's name in the author index. References that have come to my attention since the first edition are in Appendix 4: Bibliographic Supplement. They are cited by the convention according to which the first of them is called (Aczel 1966).

Todhunter has abundant references scattered in chronological order through [T3], emphasizing the mathematical aspects of probability up through the period of Laplace. Keynes, in [K4], gives a formal bibliography which purposely does not overlap Todhunter's material very extensively, the emphasis being on more philosophical aspects of probability and on the period between Laplace and Keynes. Carnap in [C1] also gives a formal bibliography, which emphasizes publications since Keynes. Carnap promises an even fuller bibliography in the projected second volume of his work, and he recommends the bibliography of Georg Henrik von Wright in [V5].

Bibliographies of statistics proper are of some, though diluted, relevance. Of these, the most useful is that of M. G. Kendall in Vol. II of [K2]. Carnap at the beginning of his bibliography gives reference to some other statistical bibliographies. The enormous work of O. K. Buros in statistical bibliography, [B23], [B24], and [B25], should also be mentioned. His volumes bring together pointed excerpts from reviews of statistical books. Buros also directed a bibliographic department, entitled "Statistical Methodology," in the Journal of the American Statistical Association, from September 1945 to September 1948, listing current articles, books, theses, and chapters dealing with statistics. In Volume 20 (1949) of the Annals of Mathematical Statistics, an important journal of statistical theory, there are two cumulative indexes of Volumes 1-20, one arranged by author, the other by subject.
Bibliography

Aitken, A. C., and H. Silverstone
[A1] "On the estimation of statistical parameters," Proceedings of the Royal Society of Edinburgh, 61 (1941-43), 186-194 (issued separately April 2, 1942).
Allais, Maurice
[A2] "Le comportement de l'homme rationnel devant le risque: Critique des postulats et axiomes de l'école Americaine," Econometrica, 21 (1953), 503-546.
Allen, S. G., Jr.
[A3] "A class of minimax tests for one sided composite hypotheses," Annals of Mathematical Statistics, 24 (1953), 295-298.
Anscombe, F. J.
[A4] "Mr. Kneale on probability and induction," Mind, 60 (1951), 299-309. Says much of general interest on the foundations of statistics, in the course of comment on [K5].
Arrow, Kenneth J.
[A5] Social Choice and Individual Values, Cowles Commission Monograph No. 12, New York, John Wiley & Sons, 1951. (Second edition, 1963.)
[A6] "Alternative approaches to the theory of choice in risk-taking situations," Econometrica, 19 (1951), 404-437.
Arrow, K. J., David Blackwell, and M. A. Girshick
[A7] "Bayes and minimax solutions of sequential decision problems," Econometrica, 17 (1949), 213-244.
Bahadur, Raghu Raj
[B1] "A property of the t statistic," Sankhyā, 12 (1952), 79-88.
[B2] "Sufficiency and statistical decision functions," Annals of Mathematical Statistics, 25 (1954), (to appear).
Bahadur, Raghu Raj, and Herbert Robbins
[B3] "The problem of the greater mean," Annals of Mathematical Statistics, 21
(19&0),'" C. LAMb, 8. (1M.] "WorN dee ~ Ii.,..,.., Wanaw, Fuad. . Kultury Narodowei. 1932. B&u.t'lI, 8.• aad A. Tanki [B6] "Su.r Ia d6e0mpolition dee eDlembJea de pointe eo partie. nwpeetiftlDeDt coaaruentel," ~ M~, 6 0924), ~277. Bu1Iet;t" M. S. lB6) "Completely llmultaDeou1 &ducial distributioaa," Aft1'lGle oJ M~ 8t.tJIi1lia. 10 (1939), 128-138. ."moJ, WDljam J. (B7J '-rhe Nenmann.MorpDltenl utiJi\, index-u ordiDalilt -.iew," Jt1fIJMl of Ptlliliotll..." 59 (1911), 61-88. Be.J-. Tbomu IBBJ Foaifflila of T1Do Po,.. b, BOJa: i. Aft BfMJJ/ TotIIGI'd Solri"ll G Problem ifl , . Dodrirw tJ/ ~, Will ~MnI Prk~'. FtIrfVJtIrfl and ~; PAil. 2',.... Bo,aI Soc., JIP. ",~'18, 11.. Will G C~,. Bcfwor4 C. Jloliu.
ii. A L«ur Oft A.",.pIoIie s.n. J""" Sa". 10 Joim CIJIIImt,; 7P. 189-1'11 I/Me 801M VoluJU. Will G C~ 6J W. BdlDGl'dt Dttmi,." ed. W. Edwarda Demin" Wubiopon, D. C., The Graduate School, The Departmeat of Apt. eulture. UNO; ...pllbl....... al (Baya W8). The first of tbeIe two paperl, in which .. IpeCiaJ. cue of what ia DOW .ned Ba,..' rule iI Introduced. fiaurea prominently in controvendee about the fOWldat.iou of prot.biUty, for W. paper fint put e,enl of the major . . . in &be HmeliahtBell, E. T. [B9) Jlat. 0/ JI~, New York, Simon and Schuster, 1987.
Bemoum.DuW (BIO] u8pw:i1M't tbeoriae IlD\'U de ID8D1W'a 1IOrtia," C~i ~ IricatlGrum ~ P.~ (lor 1130 and 1731), 5 (1738), 116-192. ~
... moct.n. W..".... yerlUCl ........ TAeriJ . , W ~mtmf I'0Il 01.,.,"" (German traDIIation of (BIO] by Alfred PriDphelm, with iotJocluction by Ludwia Frick), LeiPSi& Dwlcker V. Bumblot, 1886. {B11aJ "Expoeitioo of a DeW theory on the meuuremeD\ of risk" (Enc1ieh tIuItlation of [BIO) by Low. Sommer), B~. 22 (19M), 28-28. BemouW, Jaoob (-James) (812] AI". ~ Buel, 1713. [BIB] 1V"'nlie~UIIf (German traDllation of [BIt) by R. BaaOetwald'. Kle-ker der Exakten \rJMeDJCbaften. NOI. 107 and 108, W. EnplmanD , 1899. Coataiaa, besides much of primary matbematlea1 inte..., what I rmderetaDd to be tbe tiNt euended di8cUleioD of the appli~tioD of probebility to the problem of iDfere.a.ce.. UafOJ'tUDate1y, the German translation w laid to be illoom" [Bll] DN
_>.
1Ai"'.
plde. BirkhoB, 0., and 8. MacLane (Bl") A Surw, oJ Modem Alge6ro, New York, The Maemil1aa Co., UNI. Bide,., M. T. L. (B161 "Some Dotes on probability," JOtII'fUJl of 1M 1,..,.,. oJ Ad&Mlria Stutltnla' &ci«", 10 (1951), 161-203. Blackwell, David [BIG) "Compariaon of experimenta," pp. 93-102 of PrOMeitli..,. 0/ ,.. s..d (1950) S.,.w., SrmJlOliu", ma M GlMrruJtU:ol SfGliltic. GIld Probabilily, ed. Jeny Seynwa, Berkeley, University of C.tifomia PNee, 1951. [B17) "On the translation parameter problem for ctiecrete n.riatMe," 0/ MGtMrntJl~ ~ 22 (1951), 393-399. BIacltwen, David, aDd M. A. Gil'8hiek (BI8) TIatJ TMorti 0/ GJttf &otialicol D«:Uiou, New York, Jolm W....y • SoDS, 1~ BoImenblUlt, H. F., S. Karlin, and L. S. Shapley [B191 "Solutione of diecnte two-persoD pmeI," pp. 61-72 of [KI3}. Bonn en. T., and W. Fenebel (820) 2'1YorN . , Uuam K6rpR, Erpbniale dar Mathematik wul ihrw Gnupbiete, Vol. m, Part I, Berlin, J. Spriqer, 193-&; reprinted, New York, CbdR.
A....,.
aa..
s
Publishi. Co., 1948. Borel, Emile {B2IJ ''The theory of play and intecraJ equationa with skew lIfII'IMbie '-DeJa; Oa pmel tha, involve ehaaoe and the aldll of U1e players; On.,.... 01 m-r
forml of Ike. q_.etrir 4etermiDant aDd the .elleral theory of play (tra..-
Iated by ~ J. Bev...,)," B~, 21 (1953), 97-124.. Bowker, A. H. {B22] ''Tolerance limits for normal dietributions," Chapter 2, pp. 95-110 of T«Ir ftique.t of Statillical AMl,m, by the Statiatical Reeearch Group. CAmJmbia University, New York, 1-{cOraw-Hill Book Co., UM7. Buroe, O. K. (eel) {B28] Baean:Ja orad SttrtiItictJl MetMdolow, Booka aM ~ (1938-88), New BruoIwick, New Jemey, Rutprw University Preas, 1938. lim) T_ 8ectmtl Y~ ift llAeGrcA cmd MdI&otlolDgJl, Booa cuacI ~. ffighland Park, New Jersey, The Griffin Preas, 1941. [B25) SItJIiIIit:al MetMdolow Rmew8 1941-1960, New York, John wUey & SoDI,
1961. Camap, Rudolf (Cl) Logial F~ oj p~, Chieago, Univenity of Chicap Prea. 19S0. This is the tint of a projected pair of volumes desiped to demoDBtrate metieuloualy the author. contention that a oertain almoet D:eee1lMl'Y view of pro~ ability is .-entia! tA> lCience-not cIenyiDc the meaninaflJ1Q818 of the objec-tiviltic concept. Reviewed by me in (84). (C2) '1'''' Nalurs oM A~ oJ Iftlludi", Logk, Cbica&O, Univeraity of Chieap Preas, 1961. A reprint of aelected eeetioos of [C1). [ea) TIw ConIiftuUJn of IfIIiudiN M~oU, Chiea&o, Uni"nity of Chieago Pre.,
1962. Faentially a chapter of the eecond volume of the projected pair referred to ander (el J.
Centre NatioD&1 de Beebercbe Scienti&que (04] F~ _ G~ • lG ~ du n.que Centre National de Ia Recherche Scientifique, 19M.
'"
b:tIrwm«rit, Paris,
Report of &11 international econometric colloquium on JiIk, in which there • • much disculBion of ut.ility, held in Paria, May 12-17, 1952. Chand, Uttam {(6) ''Distributioos related to compari8on of two meaM and two ~gresaion coeftieieDte," AfUIGlI 0/ M~41~, 21 (l9S0). 507-522. Chapman, Doqlu G. t aDd Herbert Robbins [eG] "M;nimum va.riaDce estimation without repl&rity &88UDlptwos, tJ AnMU oJ MaI~
Sl4litlia, 22 (1961), 581-686.
Chernoff, Herman (07) "Remal'D on a Rational Selection of a Decillion Funmon," Cowles Commj";ou Diacusion Paper, Statistics, No. 326 (January 10, 1949). Unpublished.
Chu.rcJnnan,
c. West
[ca] 7'Aetwy
oJ
Ezperim.al4l 1n/IJf'fJfta, New York, The Macnnillan Co., 1948.
A diacuation of curreDt atatiltica from tbe vie1r.pOint of teebnical philoeophy. Cram6r, Harald [09] JI~ M~ oJ 8tatiaIic., PrincetoD, Princeton University Press, 1946. By ,~ &be JD08t oomprehenlive rigorous book on mathematical methods of Itatiati" in Proee,• of 'i~ Iflurtttlti0ft41 01 .G'~.'ieia [Edinbargh.19M], Cambridp, CaDlbridp Uai1'enity Pita. Sa••• Leonard J. 1981 I"The foundations of stali.uN reeonaideftd," pp. 57S.588 in Vol. I of Pf'OHHtag. 01 'Ia~ Four'" [1980] B6rIt,I,'1 Sy.poIIi.... 011 JI,.,Ia#M,.tkfrl S,."-tic. ..ad ProbGbili'", ed. Jersy Neyman, Berkeley, Univntrity of California Preas. "' Sa. ., Leoaard J. 1982 "Bay_ian .tatiatieat" pp. 161-194 in Rf'e~..' D"'~lo".._" i. D.cmo. aH 1• .forfAatiott ProC'.tIB~t fIda. Robert E. Multol and Paul Gray, New York, Maemillan Co. ie Savage, Leonard 1. 1962 "Subjeetive probability and statiatieal pnetiee," pp. 9-36 ill (Sa.,., et ai, 1962). 217
.,lf
e",.,..,.
..
APPENDIX 4 Sa. ., Lecmani J. 1987 "DiIlealtiee in the theory of perlow probability,"
P"il~O'-lI
of 8~, 34, 306-310.
Savage, Lecmard J. 1961 "ImpHeatioDs of per80w probability for iaduetioa," J nrtIGl 01 PlIiloIJoplt" 84, 593-807. Savage, LeoDUd J. 1910 "R_diag augp8tiou for the fOlUldatiODB of a.dati.," TN .A~.. St.ti8ficitJ_, 24, No.4, 23-27. Som.what O\'erla~ but is maeh shorter thaa, the pneent Bibliocraphie Supplement. 188 Sayage, Leonard J. 1m "EHeitatiOD of pelBODal prohabilitiea and expeetatiOD8," J ow..., of ,It• ..4."";'.. Bfa""ieM .dMoeMIioIf, 68, 788-80L See (BtU' "011 BolateiIl1970). Savap. Leouard J., et a1. 1982 n. Fm."-'ioM o/S,.,."ietJll-t.rMU: A Sppoft••, New York, John WHey and 8oaa. Valuable tor the interehaDp of ideu amoor atadlU. . of di"erse ezperit'Dee and outlook. S• •', HeID'J 1970 "PraetiaJ 10111tiona of the .Behrens-Filher problem," J o"ntGl 0/ ,IN A...nco.. 8'.... .4.'0-';0,., 85, 1501.1598. III SeheUiDg, Thoma C. 1960 Tl" Seral.gy 01 CorajUcl, Cambridge, Harvard University
'i_'
PertiDent beeaue eoD1liet and group decision are _peeta of the same thiDg. Estramatbematical and partie.alarly.unl1llatblg. Sehlaifer, Robert 1959 Pf'oboW.,,, CItICI S,,.,.,,klt lor Buiruu D,tUiMu, New York,
McGraw-Hill Book Co. Sehmitt, Samuel A. 1989 Jl~tU."., UftC~rtQi.'y: Atl ElflflNtft'.r, IlllnHlwrw. to BG1J'Mil BIa'ulia, Reading, MUIM'huetts, Addition-Welley. SheUy, Mayurd W., II, and GleDD L. Bryan (eds.) 1964 N ....... JtuIg __l. Gttd O"ti.alily, New York, John Wiley and SoDa. An orpnizecJ eoUeetioD of _ , . by many alithoJ'L Intel eetiug in itselt and uef1l1 as an extensiye key reference. Smith, Cedric A. B. 1961 "Ccmaiateney ill atatiltieal infenmee anel decision," J OtInI4il 01 ,,.~ Boy.'8kJ'Y'ic41 Socil'l}, 8,M B,23, 1-25. Smith, ~e A. B. 1965 "Penonai probability and atatiatiea1 analyaia," J oul'ltGl of ,~ RoyGl81atil~41 BoeNly, B~• .A, 128, 469-489. Stall von Holstein, Carl-A.%eI S. 1970 .A"~UtIHfN au EHl...,iott oI8..bj'eli~~ ProbabiliIJI DUlrih'iou, Stockholm, The EaoDomie Reae&rch Institute at the Stock-
68
holm Sehool of Eeonomiee. 1'17 Exeellent monograph on how to elieit perlOw probabiliti. aad what to do with them. Reviews and enriehea a eoDiiderable literature. Related refermeee are (Sav.p 1971; WiDkler 1988).
2f11
BIBLIOGRAPHIC SUPPLEMENT Stone, Mervyn 1970 "The role of e%perimental randomizatioD in Bay.... datJati.: l'inite .tDpliog and two Bayeeiane," BiofII_,riktl, M, 881..s83. Strarift'llJlUl, WUliam E. 1911 "Proper lJay. minimax eatima~n of the multivariate Bornw 1IlMIl," ,A..z. of JlClCA.. . ,iat, 42, 3&')..388. Key retel'ftlee for a ehalleugiDg theoretieal develOPloeot initiated by Chari. Steia. Suppee, Patriek 1960 "Some Opt'b probleu. in &be foundationa of Bubjective prob&bility,"' pp. 182-189 in 1./fWflltlliH aN D~.. PrOC6I11f••, ed. Roben 11"01, New York, KeOraw-Hill Book Co. Ta\'&Dee, P. V., ed.
,iealS,",..
mo
Prob'. . .
0/".
Lo~ of ScNral'jk: KfIOtDl,dg_,
PI,_.
tra....tect
bJ
T. J. B1akely, New York, Bum.nitiee A rare opportunity to rMd in English the icHae of 101M n»Odena SoviR phiJOIIOph(lftJ about probability and iududioD. Tribe, U1U'ftlt'e H. 1m "Trial by lUathernati~: ~Oll and ritual in thfl legal prot..," HurNr" lAte R#ri#w:, 84, 1329-1393. A key ntere~ on the poeaible applifability of pro_bilistie idea in the eourta, whi~h the author dOftillot fiDd pro.llising.
Tribas, Xyron 1989 BG'ioJttJ' D.~"ip'iofllf, D,.ci8iOfU aM
D'~igfU,
Nf!W York, Per-
gamon Pras. A Beet. Irian approaeh. Tuk~,John
w.
1957 "Son.. e-Ulllplee with fiducial relevaot'E'," .A ..,../~ 01 JlfJt"~'"
'ktil 8'd'i8'~, 28, 687-895.
III
Tukey, Job. W.
8,.,..· '''''111..
1962 "The future of data anal,..is," .A ..... of 1I.'''ftla'kdl tiu, 33, 1-6'7. U1A1•• ,8tanialaw 1930 "Zv K_theori~ in der aUgemeineq )(eDgeniehre," 141 .,,~_, 18, 140-150. van Dantsig, David 1950-1 "Reriew of Camap'. Logi~tJl FOHttd4liotu 01 ProbGbilt"," 811f"1u.~, 8. 459-470. van Dutar. David
1957 "8tatiBtical prieatbood: Sav. OD penonal probabiliu.." SI...
,ut'". N6M'ltJfKiictJ, 11, 1..16.
'Wan
naDtzie, na.id
n: Sir Ronald on eeielltifle intereneet 8.lUfka N ~mfJttdktJ. 1.1, 185-200. The thn-e preeediDg retereDeaJ review three dUferent views of the foandatioD8 of pTObahility and etatiatia. fTOUl the .t&ndpoiDi
1951 "8t.&tistieal priMtbood
of a foa.rth. Vetter, KennanD 1987 WM"t'Aft.. 'ie'k~it aUld logittcltn Mohr (Paul Sieb.ek).
Spi~'m."',
TiibiDgea, J. C. B.
41
APPENDIX 4 ViiI. . ., C. ~OD
qualitative probability cralpbru," A ...I, 'ioGl B"fJ''''k~t 35, 1787-1796.
1984
von }Ii.., Riebard 1942 "On the _rreet ue of Bay.' foruaula," St4l1il'ieI, 13, 156-165.
A."".
01
MafA,...
43
o/lItJ,AneaIic81
DIOltratei u approaeh anaauallor a fnqueDtiaL YOD W richt, Geol'l Henrik 1982 "Remarb OIl epiiteruoiOl1 of I1lbjefti~e probability," pp. 330339 in (N..... Suppa, aDd Tanki 1982). Wald, Abraham (ed.) 19M S~IMJ'M PaJHl'" i. St.,istic. aN ProbGbDilg, New York, )f~ Graw...Bill Book Co. WalllJerm&ll, Paul, and Fred 8. BilaDder 1958 D.Nioft Jlan.,: .4. A.tIO'.'. Biblio,ntp", Itbaea. Comell UDivenity Prt.. 181 W . . .nnan, Pa1ll, ad Fred S. SilaDder 19M D.~Jlaa",: A. A.fIO'G,.d BiblitJ,,.p11l: SWpplflftdl, 1968-63, ltb.., Com.!) Univenity Pre.. ., Watta, Doaald O. (eeL) 1987 n., I'.'.r., 01 SI.fUIia, Proeeediup of a Conferenee OD the Future of 8tatistim held at the Uni'Yenrity of W. . . .in. JUDe 1987, New York aDd Londoa, A-.dende PreM.
Wetheri1l, G. B. 1961 "Ba7eeiaD sequential ualyaia," B~''''', 48, 281-2t2. WhittJt', Peter 1961 "Cu"e and periodogram 8nloodliDg," J otWUI 0/ Boyal SIntMfieGl BOcf.", sm.. B, 19, 38-47. Whitde, Peter 1958 ClOD the amoothing of probability deWlity tUlletiODB," JOII,.,.1 01 ". Boyal 8tGtU'ieal SocNIy.t S"w. B, 20, 334-343.. Theile two refereDete are sua_dye far penonaliatie teftllliqae. Williama••J. 8. 1988 "The role of probability in ftdatrial iofereDee," S •• 8ttri6~ A, 28, 21}'·296. 262 Winkler, Robert L. 1988 "The eoaR__ of ..bjeetive probability diatriblltioDtl," JIa. ag. . . .' StMrace, 15, 2. 881-B75. 1.71
I..
t".,
Wolfowit&, J. 1982 ''Bay.iao mlereaee &Dd UiOll18 of t'JO"tAmt deei8ioa," Reo. ",'riea, SO, 411-479. Woltowiiz, J. 1970 "Bdeetioaa on the tutare of matbematiea1 .tatiatiea," pp. 7397SO in Bu.". •• Probabilify Gad Bt.,..,ae., eds. R. C. Boee et Chapel RUI, UDivtaity of North CaroIiDa PI I••
.1..
ie