A PRIMER ON STATISTICAL DISTRIBUTIONS
N. BALAKRISHNAN
McMaster University, Hamilton, Canada

V. B. NEVZOROV
St. Petersburg State University, Russia
A JOHN WILEY & SONS, INC., PUBLICATION
Copyright © 2003 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4744, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, email: [email protected]

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representation or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002. Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format.

Library of Congress Cataloging-in-Publication Data:

Balakrishnan, N., 1956-
A primer on statistical distributions / N. Balakrishnan and V.B. Nevzorov.
p. cm.
Includes bibliographical references and index.
ISBN 0-471-42798-5 (acid-free paper)
1. Distribution (Probability theory) I. Nevzorov, Valery B., 1946- II. Title.
QA273.B25473 2003
519.2'4 dc21   2003041157

Printed in the United States of America.

10 9 8 7 6 5 4 3 2 1
To my lovely daughters, Sarah and Julia
(N.B.)
To my wife, Ludmila
(V.B.N.)
CONTENTS

PREFACE xv
1 PRELIMINARIES 1
1.1 Random Variables and Distributions 1
1.2 Type of Distribution 4
1.3 Moment Characteristics 4
1.4 Shape Characteristics 7
1.5 Entropy 8
1.6 Generating Function and Characteristic Function 10
1.7 Decomposition of Distributions 14
1.8 Stable Distributions 14
1.9 Random Vectors and Multivariate Distributions 15
1.10 Conditional Distributions 18
1.11 Moment Characteristics of Random Vectors 19
1.12 Conditional Expectations 20
1.13 Regressions 21
1.14 Generating Function of Random Vectors 22
1.15 Transformations of Variables 24

I DISCRETE DISTRIBUTIONS 27

2 DISCRETE UNIFORM DISTRIBUTION 29
2.1 Introduction 29
2.2 Notations 29
2.3 Moments 30
2.4 Generating Function and Characteristic Function 33
2.5 Convolutions 34
2.6 Decompositions 35
2.7 Entropy 36
2.8 Relationships with Other Distributions 36
3 DEGENERATE DISTRIBUTION 39
3.1 Introduction 39
3.2 Moments 39
3.3 Independence 40
3.4 Convolution 41
3.5 Decomposition 41
4 BERNOULLI DISTRIBUTION 43
4.1 Introduction 43
4.2 Notations 43
4.3 Moments 44
4.4 Convolutions 45
4.5 Maximal Values 46
4.6 Relationships with Other Distributions 47
5 BINOMIAL DISTRIBUTION 49
5.1 Introduction 49
5.2 Notations 49
5.3 Useful Representation 50
5.4 Generating Function and Characteristic Function 50
5.5 Moments 50
5.6 Maximum Probabilities 53
5.7 Convolutions 56
5.8 Decompositions 56
5.9 Mixtures 57
5.10 Conditional Probabilities 58
5.11 Tail Probabilities 59
5.12 Limiting Distributions 59
6 GEOMETRIC DISTRIBUTION 63
6.1 Introduction 63
6.2 Notations 63
6.3 Tail Probabilities 64
6.4 Generating Function and Characteristic Function 64
6.5 Moments 64
6.6 Convolutions 68
6.7 Decompositions 69
6.8 Entropy 70
6.9 Conditional Probabilities 71
6.10 Geometric Distribution of Order k 72
7 NEGATIVE BINOMIAL DISTRIBUTION 73
7.1 Introduction 73
7.2 Notations 74
7.3 Generating Function and Characteristic Function 74
7.4 Moments 74
7.5 Convolutions and Decompositions 76
7.6 Tail Probabilities 80
7.7 Limiting Distributions 81

8 HYPERGEOMETRIC DISTRIBUTION 83
8.1 Introduction 83
8.2 Notations 83
8.3 Generating Function 84
8.4 Characteristic Function 84
8.5 Moments 84
8.6 Limiting Distributions 88

9 POISSON DISTRIBUTION 89
9.1 Introduction 89
9.2 Notations 89
9.3 Generating Function and Characteristic Function 90
9.4 Moments 90
9.5 Tail Probabilities 91
9.6 Convolutions 92
9.7 Decompositions 92
9.8 Conditional Probabilities 94
9.9 Maximal Probability 95
9.10 Limiting Distribution 96
9.11 Mixtures 96
9.12 Rao-Rubin Characterization 99
9.13 Generalized Poisson Distribution 100
10 MISCELLANEA 101
10.1 Introduction 101
10.2 Pólya Distribution 101
10.3 Pascal Distribution 102
10.4 Negative Hypergeometric Distribution 103

II CONTINUOUS DISTRIBUTIONS 105

11 UNIFORM DISTRIBUTION 107
11.1 Introduction 107
11.2 Notations 107
11.3 Moments 108
11.4 Entropy 110
11.5 Characteristic Function 110
11.6 Convolutions 110
11.7 Decompositions 111
11.8 Probability Integral Transform 112
11.9 Distributions of Minima and Maxima 112
11.10 Order Statistics 114
11.11 Relationships with Other Distributions 117
12 CAUCHY DISTRIBUTION 119
12.1 Notations 119
12.2 Moments 120
12.3 Characteristic Function 120
12.4 Convolutions 120
12.5 Decompositions 121
12.6 Stable Distributions 121
12.7 Transformations 121
13 TRIANGULAR DISTRIBUTION 123
13.1 Introduction 123
13.2 Notations 123
13.3 Moments 124
13.4 Characteristic Function 125
14 POWER DISTRIBUTION 127
14.1 Introduction 127
14.2 Notations 127
14.3 Distributions of Maximal Values 128
14.4 Moments 129
14.5 Entropy 131
14.6 Characteristic Function 131
15 PARETO DISTRIBUTION 133
15.1 Introduction 133
15.2 Notations 133
15.3 Distributions of Minimal Values 134
15.4 Moments 136
15.5 Entropy 137

16 BETA DISTRIBUTION 139
16.1 Introduction 139
16.2 Notations 140
16.3 Mode 140
16.4 Some Transformations 141
16.5 Moments 141
16.6 Shape Characteristics 147
16.7 Characteristic Function 147
16.8 Decompositions 148
16.9 Relationships with Other Distributions 149
17 ARCSINE DISTRIBUTION 151
17.1 Introduction 151
17.2 Notations 151
17.3 Moments 153
17.4 Shape Characteristics 154
17.5 Characteristic Function 154
17.6 Relationships with Other Distributions 155
17.7 Characterizations 155
17.8 Decompositions 156
18 EXPONENTIAL DISTRIBUTION 157
18.1 Introduction 157
18.2 Notations 157
18.3 Laplace Transform and Characteristic Function 158
18.4 Moments 159
18.5 Shape Characteristics 160
18.6 Entropy 162
18.7 Distributions of Minima 162
18.8 Uniform and Exponential Order Statistics 163
18.9 Convolutions 164
18.10 Decompositions 165
18.11 Lack of Memory Property 167

19 LAPLACE DISTRIBUTION 169
19.1 Introduction 169
19.2 Notations 169
19.3 Characteristic Function 170
19.4 Moments 171
19.5 Shape Characteristics 172
19.6 Entropy 172
19.7 Convolutions 173
19.8 Decompositions 174
19.9 Order Statistics 174
20 GAMMA DISTRIBUTION 179
20.1 Introduction 179
20.2 Notations 180
20.3 Mode 180
20.4 Laplace Transform and Characteristic Function 181
20.5 Moments 181
20.6 Shape Characteristics 182
20.7 Convolutions and Decompositions 185
20.8 Conditional Distributions and Independence 185
20.9 Limiting Distributions 187
21 EXTREME VALUE DISTRIBUTIONS 189
21.1 Introduction 189
21.2 Limiting Distributions of Maximal Values 190
21.3 Limiting Distributions of Minimal Values 191
21.4 Relationships Between Extreme Value Distributions 191
21.5 Generalized Extreme Value Distributions 193
21.6 Moments 194

22 LOGISTIC DISTRIBUTION 197
22.1 Introduction 197
22.2 Notations 197
22.3 Moments 199
22.4 Shape Characteristics 201
22.5 Characteristic Function 201
22.6 Relationships with Other Distributions 203
22.7 Decompositions 204
22.8 Order Statistics 205
22.9 Generalized Logistic Distributions 205

23 NORMAL DISTRIBUTION 209
23.1 Introduction 209
23.2 Notations 210
23.3 Mode 211
23.4 Entropy 211
23.5 Tail Behavior 212
23.6 Characteristic Function 214
23.7 Moments 215
23.8 Shape Characteristics 217
23.9 Convolutions and Decompositions 217
23.10 Conditional Distributions 219
23.11 Independence of Linear Combinations 220
23.12 Bernstein's Theorem 221
23.13 Darmois-Skitovitch's Theorem 224
23.14 Helmert's Transformation 226
23.15 Identity of Distributions of Linear Combinations 227
23.16 Asymptotic Relations 228
23.17 Transformations 229

24 MISCELLANEA 235
24.1 Introduction 235
24.2 Linnik Distribution 235
24.3 Inverse Gaussian Distribution 237
24.4 Chi-Square Distribution 239
24.5 t Distribution 240
24.6 F Distribution 245
24.7 Noncentral Distributions 246
III MULTIVARIATE DISTRIBUTIONS 247

25 MULTINOMIAL DISTRIBUTION 249
25.1 Introduction 249
25.2 Notations 250
25.3 Compositions 250
25.4 Marginal Distributions 250
25.5 Conditional Distributions 251
25.6 Moments 252
25.7 Generating Function and Characteristic Function 254
25.8 Limit Theorems 256

26 MULTIVARIATE NORMAL DISTRIBUTION 259
26.1 Introduction 259
26.2 Notations 260
26.3 Marginal Distributions 262
26.4 Distributions of Sums 262
26.5 Linear Combinations of Components 262
26.6 Independence of Components 263
26.7 Linear Transformations 264
26.8 Bivariate Normal Distribution 265

27 DIRICHLET DISTRIBUTION 269
27.1 Introduction 269
27.2 Derivation of Dirichlet Formula 271
27.3 Notations 272
27.4 Marginal Distributions 272
27.5 Marginal Moments 274
27.6 Product Moments 274
27.7 Dirichlet Distribution of Second Kind 275
27.8 Liouville Distribution 276

APPENDIX: PIONEERS IN DISTRIBUTION THEORY 277
BIBLIOGRAPHY 289

AUTHOR INDEX 294

SUBJECT INDEX 297
PREFACE

Distributions and their properties and interrelationships assume a very important role in most upper-level undergraduate as well as graduate courses in the statistics program. For this reason, many introductory statistics textbooks discuss in a chapter or two a few basic statistical distributions, such as binomial, Poisson, exponential, and normal. Yet a good knowledge of some other distributions, such as geometric, negative binomial, Pareto, beta, gamma, chi-square, logistic, Laplace, extreme value, multinomial, multivariate normal, and Dirichlet, will be immensely useful to those students who go on to upper-level undergraduate or graduate courses in statistics. Students in applied programs such as psychology, sociology, biology, geography, geology, economics, business, and engineering will also benefit significantly from an exposure to different distributions and their properties, as statistical modelling of observed data is an integral part of their work.

It is for this reason we have prepared this textbook, which is tailor-made for a one-term course (of about 35 lectures) on statistical distributions. All the preliminary concepts and definitions are presented in Chapter 1. The rest of the material is divided into three parts, with Part I covering discrete distributions, Part II covering continuous distributions, and Part III covering multivariate distributions. In each chapter we have included a few pertinent exercises (at an appropriate level for students taking the course) which may be handed out as homework at the end of each chapter. A biographical sketch of some of the leading contributors to the area of statistical distribution theory is presented in the Appendix to give students a historical sense of developments in this important and fundamental area in the field of statistics.
From our experience, we would suggest the following lecture allocation for teaching a course on statistical distributions based on this book:

5 lectures on preliminaries (Chapter 1)
9 lectures on discrete distributions (Part I)
17 lectures on continuous distributions (Part II)
4 lectures on multivariate distributions (Part III)
We welcome comments and criticisms from all those who teach a course based on this book. Any suggestions for improvement or "necessary" addition (omission of which in this version should be regarded as a consequence of our ignorance, not of personal nonscientific antipathy) sent to us will be much appreciated and will be acted upon when the opportunity arises.

It is important to mention here that many authoritative and encyclopedic volumes on statistical distribution theory exist in the literature. For example,

- Johnson, Kotz, and Kemp (1992), describing discrete univariate distributions
- Stuart and Ord (1993), discussing general distribution theory
- Johnson, Kotz, and Balakrishnan (1994, 1995), describing continuous univariate distributions
- Johnson, Kotz, and Balakrishnan (1997), describing discrete multivariate distributions
- Wimmer and Altmann (1999), providing a thesaurus on discrete univariate distributions
- Evans, Peacock, and Hastings (2000), describing discrete and continuous distributions
- Kotz, Balakrishnan, and Johnson (2000), discussing continuous multivariate distributions

are some of the prominent ones. In addition, there are separate books dedicated to some specific distributions, such as Poisson, generalized Poisson, chi-square, Pareto, exponential, lognormal, logistic, normal, and Laplace (which have all been referred to in this book at appropriate places). These books may be consulted for any additional information.

We take this opportunity to express our sincere thanks to Mr. Steve Quigley (of John Wiley & Sons, New York) for his support and encouragement during the preparation of this book. Our special thanks go to Mrs. Debbie Iscoe (Mississauga, Ontario, Canada) for assisting us with the camera-ready production of the manuscript, and to Mr. Weiquan Liu for preparing all the figures. We also acknowledge with gratitude the financial support provided by the Natural Sciences and Engineering Research Council of Canada and the Russian Foundation of Basic Research (Grants 01-01-00031 and 00-15-96019) during the course of this project.
N. BALAKRISHNAN
Hamilton, Canada

V. B. NEVZOROV
St. Petersburg, Russia
April 2003
CHAPTER 1
PRELIMINARIES

In this chapter, we present some basic notations, notions, and definitions which a reader of this book must absolutely know in order to follow subsequent chapters.
1.1 Random Variables and Distributions
Let (Ω, F, P) be a probability space, where Ω = {ω} is a set of elementary events, F is a σ-algebra of events, and P is a probability measure defined on (Ω, F). Further, let B denote an element of the Borel σ-algebra of subsets of the real line R.
Definition 1.1 A finite single-valued function X = X(ω) which maps Ω into R is called a random variable if for any Borel set B in R, the inverse image of B, i.e., X⁻¹(B) = {ω : X(ω) ∈ B}, belongs to the σ-algebra F. It means that for all Borel sets B, one can define probabilities

P{X ∈ B} = P{X⁻¹(B)}.    (1.1)

In particular, for any x one can define the function

F(x) = P{X ≤ x}.    (1.2)

Definition 1.2 The function F(x) in (1.2) is called the cumulative distribution function (cdf) of the random variable X.

Definition 1.3 A random variable X has a discrete distribution if one can fix two sequences: a sequence of values x_1, x_2, ... and a sequence of probabilities p_k = P{X = x_k}, k = 1, 2, ..., with Σ_k p_k = 1. In this case, the cdf of X is given by

F(x) = Σ_{k: x_k ≤ x} p_k.    (1.3)

Definition 1.4 A random variable X with a cdf F is said to have an absolutely continuous distribution if there exists a nonnegative function p(x) such that

F(x) = ∫_{−∞}^{x} p(t) dt    (1.4)

for any real x.

Remark 1.3 The function p(x) then satisfies the condition

∫_{−∞}^{∞} p(t) dt = 1,    (1.5)

and it is called the probability density function (pdf) of X. Note that any nonnegative function p(x) satisfying (1.5) can be the pdf of some random variable X.

Remark 1.4 If a random variable X has an absolutely continuous distribution, then its cdf F(x) is continuous.
Definition 1.5 We say that random variables X and Y have the same distribution, and write

X =d Y,    (1.6)

if the cdf's of X and Y (i.e., F_X and F_Y) coincide; that is,

F_X(x) = P{X ≤ x} = P{Y ≤ x} = F_Y(x)  for all x.
Exercise 1.1 Construct an example of a probability space (Ω, F, P) and a finite single-valued function X = X(ω), ω ∈ Ω, which maps Ω into R, that is not a random variable.

Exercise 1.2 Let p(x) and q(x) be probability density functions of two random variables. Consider now the following functions:
(a) 2p(x) − q(x);  (b) p(x) + q(x);  (c) |p(x) − q(x)|;  (d) (p(x) + q(x))/2.
Which of these functions are probability density functions of some random variable for any choice of p(x) and q(x)? Which of them can be valid probability density functions under suitably chosen p(x) and q(x)? Is there a function that can never be a probability density function of a random variable?
Exercise 1.3 Suppose that p(x) and q(x) are probability density functions of X and Y, respectively, satisfying

p(x) = 2 − q(x) for 0 < x < 1.

Then, find P{X < 1} + P{Y < 2}.
The quantile function of a random variable X with cdf F(x) is defined by

Q(u) = inf{x : F(x) ≥ u},  0 < u < 1.

In the case when X has an absolutely continuous distribution, the quantile function Q(u) may simply be written as

Q(u) = F⁻¹(u),  0 < u < 1.

The corresponding quantile density function is given by

q(u) = dQ(u)/du = 1 / p(Q(u)),  0 < u < 1,

where p(x) is the pdf corresponding to the cdf F(x). It should be noted that just as forms of F(x) may be used to propose families of distributions, general forms of the quantile function Q(u) may also be used to propose families of statistical distributions. Interested readers may refer to the recent book by Gilchrist (2000) for a detailed discussion on statistical modelling with quantile functions.
1.2 Type of Distribution
Definition 1.6 Random variables X and Y are said to belong to the same type of distribution if there exist constants a and h > 0 such that

Y =d a + hX.    (1.7)

Note then that the cdf's F_X and F_Y of the random variables X and Y satisfy the relation

F_Y(x) = F_X((x − a)/h)  for all x.    (1.8)

One can, therefore, choose a certain cdf F as the standard distribution function of a certain distribution family. Then this family would consist of all cdf's of the form

F(x, a, h) = F((x − a)/h),    (1.9)

and F(x) = F(x, 0, 1). Thus, we have a two-parameter family of cdf's F(x, a, h), where a is called the location parameter and h is the scale parameter. For absolutely continuous distributions, one can introduce the corresponding two-parameter families of probability density functions:

p(x, a, h) = (1/h) p((x − a)/h),    (1.10)

where p(x) = p(x, 0, 1) corresponds to the random variable X with cdf F, and p(x, a, h) corresponds to the random variable Y = a + hX with cdf F(x, a, h).
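The location-scale construction of a two-parameter family from a standard cdf is short to express in code. The Python sketch below is our own illustration; the standard logistic cdf is an assumed example, not prescribed by the text.

```python
import math

# A two-parameter location-scale family built from a standard cdf,
# as in (1.7) and the density family (1.10).
# Assumption: the standard logistic cdf is an example of our choosing.

def F_std(x):
    """Standard cdf F(x) = F(x, 0, 1) of the family."""
    return 1.0 / (1.0 + math.exp(-x))

def F_family(x, a, h):
    """cdf F(x, a, h) of the member with location a and scale h > 0."""
    return F_std((x - a) / h)

a, h = 2.0, 3.0
# The location parameter shifts the distribution: the standard median 0
# moves to a, so F(a, a, h) = F_std(0) = 1/2 for any scale h > 0.
print(round(F_family(a, a, h), 3))  # 0.5
```

The same pattern with `p_family(x, a, h) = p_std((x - a) / h) / h` would give the density family of (1.10).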
1.3 Moment Characteristics
There are some classical numerical characteristics of random variables and their distributions. The most popular ones are expected values and variances. More general characteristics are the moments. Among them, we emphasize moments about zero (about the origin) and central moments.

Definition 1.7 For a discrete random variable X taking on values x_1, x_2, ... with probabilities

p_k = P{X = x_k},  k = 1, 2, ...,

we define the nth moment of X about zero as

α_n = EX^n = Σ_k x_k^n p_k.    (1.11)

We say that α_n exists if Σ_k |x_k|^n p_k < ∞. Note that the expected value EX is nothing but α_1. EX is also called the mean of X or the mathematical expectation of X.
Definition 1.8 The nth central moment of X is defined as

beta_n = E(X - EX)^n = Sum_k (x_k - EX)^n p_k,  (1.12)

given that

Sum_k |x_k - EX|^n p_k < infinity.
If a random variable X has an absolutely continuous distribution with a pdf p(x), then the moments about zero and the central moments have the following expressions:

alpha_n = EX^n = Integral_{-inf}^{inf} x^n p(x) dx  (1.13)

and

beta_n = E(X - EX)^n = Integral_{-inf}^{inf} (x - EX)^n p(x) dx.  (1.14)

We say that the moments (1.13) exist if

Integral_{-inf}^{inf} |x|^n p(x) dx < infinity.  (1.15)

The variance of X is simply the second central moment:

Var X = beta_2 = E(X - EX)^2.  (1.16)
Central moments are easily expressed in terms of moments about zero as follows:

beta_n = E(X - EX)^n = Sum_{k=0}^{n} (-1)^k C(n,k) alpha_1^k alpha_{n-k}.  (1.17)

In particular, we have

Var X = beta_2 = alpha_2 - alpha_1^2  and  beta_3 = alpha_3 - 3 alpha_2 alpha_1 + 2 alpha_1^3.  (1.18)

Note that the first central moment beta_1 = 0.
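The expressions in (1.17)-(1.18) can be verified numerically on a small discrete distribution (the values and probabilities below are illustrative only):

```python
# A small discrete distribution: values with probabilities (summing to 1)
xs = [0, 1, 3]
ps = [0.2, 0.5, 0.3]

def raw_moment(n):
    # alpha_n = sum of x^n p(x)
    return sum((x ** n) * p for x, p in zip(xs, ps))

def central_moment(n):
    # beta_n = E(X - EX)^n, computed directly from the definition
    m = raw_moment(1)
    return sum(((x - m) ** n) * p for x, p in zip(xs, ps))

a1, a2, a3 = raw_moment(1), raw_moment(2), raw_moment(3)
# beta_2 = alpha_2 - alpha_1^2 and beta_3 = alpha_3 - 3 alpha_2 alpha_1 + 2 alpha_1^3
assert abs(central_moment(2) - (a2 - a1 ** 2)) < 1e-12
assert abs(central_moment(3) - (a3 - 3 * a2 * a1 + 2 * a1 ** 3)) < 1e-12
```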
The inverse problem cannot be solved, however, because the central moments carry no information about EX; hence, the expected value cannot be expressed in terms of beta_n (n = 1, 2, ...). Nevertheless, the relation

alpha_n = EX^n = E[(X - EX) + EX]^n = Sum_{k=0}^{n} C(n,k) (EX)^k E(X - EX)^{n-k} = Sum_{k=0}^{n} C(n,k) alpha_1^k beta_{n-k}  (1.20)

will enable us to express alpha_n (n = 2, 3, ...) in terms of EX and the central moments beta_2, beta_3, .... In particular, we have

alpha_2 = beta_2 + alpha_1^2,  (1.21)

alpha_3 = beta_3 + 3 beta_2 alpha_1 + alpha_1^3,

and

alpha_4 = beta_4 + 4 beta_3 alpha_1 + 6 beta_2 alpha_1^2 + alpha_1^4.  (1.22)
Let X and Y belong to the same type of distribution [see (1.7)], meaning that

Y =d a + hX

for some constants a and h > 0. Then, the following equalities allow us to express moments of Y in terms of the corresponding moments of X:

EY^n = E(a + hX)^n = Sum_{k=0}^{n} C(n,k) a^{n-k} h^k EX^k  (1.23)

and

E(Y - EY)^n = E[h(X - EX)]^n = h^n E(X - EX)^n.  (1.24)

Note that the central moments of Y do not depend on the location parameter a. As particular cases of (1.23) and (1.24), we have

EY = a + hEX,  (1.25)
EY^2 = a^2 + 2ahEX + h^2 EX^2,  (1.26)
Var Y = h^2 Var X,  (1.27)
EY^3 = a^3 + 3a^2 hEX + 3ah^2 EX^2 + h^3 EX^3,  (1.28)
EY^4 = a^4 + 4a^3 hEX + 6a^2 h^2 EX^2 + 4ah^3 EX^3 + h^4 EX^4.
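The location-scale relations (1.25) and (1.27) hold exactly for sample moments as well, which gives a quick numerical check (a sketch; the values of a and h are arbitrary illustrations):

```python
import random

random.seed(0)
a, h = 2.0, 3.0
# sample check of EY = a + h EX and Var Y = h^2 Var X for Y = a + hX
xs = [random.random() for _ in range(100000)]   # X roughly uniform on (0, 1)
ys = [a + h * x for x in xs]

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((t - m) ** 2 for t in v) / len(v)

assert abs(mean(ys) - (a + h * mean(xs))) < 1e-8   # exact, by linearity
assert abs(var(ys) - h * h * var(xs)) < 1e-8       # exact for an affine transform
```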
Definition 1.9 For random variables taking on values 0, 1, 2, ..., the factorial moments of positive order are defined as

mu_r = EX(X - 1) ... (X - r + 1),  r = 1, 2, ...,  (1.29)

while the factorial moments of negative order are defined as

mu_{-r} = E[1 / {(X + 1)(X + 2) ... (X + r)}],  r = 1, 2, ....  (1.30)
While dealing with discrete distributions, it is quite often convenient to work with these factorial moments rather than regular moments. For this reason, it is useful to note that the factorial moments can be expressed in terms of the moments, and vice versa.
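For instance, the first few such relationships are mu_1 = alpha_1, mu_2 = alpha_2 - alpha_1, and mu_3 = alpha_3 - 3 alpha_2 + 2 alpha_1, which the following sketch verifies on an illustrative distribution:

```python
xs = [0, 1, 2, 3]
ps = [0.1, 0.4, 0.3, 0.2]

def raw(n):
    # alpha_n = EX^n
    return sum((x ** n) * p for x, p in zip(xs, ps))

def factorial_moment(r):
    # mu_r = E X(X-1)...(X-r+1)
    def falling(x, r):
        out = 1
        for i in range(r):
            out *= (x - i)
        return out
    return sum(falling(x, r) * p for x, p in zip(xs, ps))

assert abs(factorial_moment(1) - raw(1)) < 1e-12
assert abs(factorial_moment(2) - (raw(2) - raw(1))) < 1e-12
assert abs(factorial_moment(3) - (raw(3) - 3 * raw(2) + 2 * raw(1))) < 1e-12
```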
Exercise 1.4 Present two different random variables having the same expectations and the same variances.

Exercise 1.5 Let X be a random variable with expectation EX and variance Var X. What is the sign of r(X) = E(X - |X|) (Var X - Var |X|)? When does the quantity r(X) equal 0?

Exercise 1.6 Suppose that X is a random variable such that P{X > 0} = 1 and that both EX and E(1/X) exist. Then, show that EX + E(1/X) >= 2.

Exercise 1.7 Suppose that P{0 <= X <= 1} = 1. Then, prove that EX^2 <= EX <= EX^2 + 1/4. Also, find all distributions for which the left and right bounds are attained.

Exercise 1.8 Construct a variable X for which EX^3 = 5 and EX^6 = 24.
1.4 Shape Characteristics

For any distribution, we are often interested in some characteristics that are associated with the shape of the distribution. For example, we may be interested in finding out whether it is unimodal, or skewed, and so on. Two important measures in this respect are Pearson's measures of skewness and kurtosis.
Definition 1.10 Pearson's measures of skewness and kurtosis are given by

gamma_1 = beta_3 / beta_2^{3/2}

and

gamma_2 = beta_4 / beta_2^2.
Since these measures are functions of central moments, it is clear that they are free of the location. Similarly, due to the fractional form of the measures, it can readily be verified that they are free of scale as well. It can also be seen that the measure of skewness gamma_1 may take on positive or negative values depending on whether beta_3 is positive or negative, respectively. Obviously, when the distribution is symmetric about its mean, we may note that beta_3 is 0, in which case the measure of skewness gamma_1 is also 0. Hence, distributions with gamma_1 > 0 are said to be positively skewed distributions, while those with gamma_1 < 0 are said to be negatively skewed distributions. Now, without loss of generality, let us consider an arbitrary distribution with mean 0 and variance 1. Then, by writing

1 = (EX^2)^2 = {E(X^2 . 1)}^2

and applying the Cauchy-Schwarz inequality, we readily obtain the inequality

gamma_2 = EX^4 >= 1.

Later, we will observe the coefficient of kurtosis of a normal distribution to be 3. Based on this value, distributions with gamma_2 > 3 are called leptokurtic distributions, while those with gamma_2 < 3 are called platykurtic distributions. Incidentally, distributions for which gamma_2 = 3 (which clearly includes the normal) are called mesokurtic distributions.
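A small sketch (illustrative values only) computing gamma_1 and gamma_2 from the central moments; a symmetric distribution gives gamma_1 = 0, and gamma_2 >= 1 always holds:

```python
def moments(xs, ps):
    # mean and central moments beta_2, beta_3, beta_4 of a discrete distribution
    a1 = sum(x * p for x, p in zip(xs, ps))
    b = [sum(((x - a1) ** n) * p for x, p in zip(xs, ps)) for n in (2, 3, 4)]
    return a1, b[0], b[1], b[2]

def skewness_kurtosis(xs, ps):
    # gamma_1 = beta_3 / beta_2^{3/2},  gamma_2 = beta_4 / beta_2^2
    _, b2, b3, b4 = moments(xs, ps)
    return b3 / b2 ** 1.5, b4 / b2 ** 2

# a symmetric three-point distribution: gamma_1 must be 0
g1, g2 = skewness_kurtosis([-1, 0, 1], [0.25, 0.5, 0.25])
assert abs(g1) < 1e-12
assert g2 >= 1.0    # the bound obtained above via Cauchy-Schwarz
```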
Remark 1.5 Karl Pearson (1895) designed a system of continuous distributions wherein the pdf of every member satisfies a differential equation. By studying their moment properties and, in particular, their coefficients of skewness and kurtosis, he proposed seven families of distributions which all occupied different regions of the (gamma_1, gamma_2)-plane. Several prominent distributions (such as beta, gamma, normal, and t that we will see in subsequent chapters) belong to these families. This development was the first and historic attempt to propose a unified mechanism for developing different families of statistical distributions.
1.5 Entropy

One more useful characteristic of distributions (called entropy) was introduced by Shannon.

Definition 1.11 For a discrete random variable X taking on values x_1, x_2, ... with probabilities p_1, p_2, ..., the entropy H(X) is defined as

H(X) = - Sum_k p_k log p_k.  (1.38)

If X has an absolutely continuous distribution with pdf p(x), then the entropy is defined as

H(X) = - Integral_D p(x) log p(x) dx,  (1.39)

where D = {x : p(x) > 0}.
In the case of discrete distributions, the transformation

Y = a + hX,  -infinity < a < infinity,  h > 0,

does not change the probabilities p_n and, consequently, we have

H(Y) = H(X).

On the other hand, if X has a pdf p(x), then Y = a + hX has the pdf

q(x) = (1/h) p((x - a)/h),

and

H(Y) = - Integral_{D_Y} q(x) log q(x) dx,

where D_Y = {x : q(x) > 0}. It is then easy to verify that

H(Y) = log h + H(X).  (1.40)
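The relation (1.40) can be checked numerically; the sketch below assumes, for concreteness, X uniform on (0, 1), so that H(X) = 0 and Y = a + hX is uniform on an interval of length h with density 1/h:

```python
import math

# discrete case: H(Y) = H(X) for Y = a + hX, since the probabilities are unchanged
ps = [0.2, 0.3, 0.5]
H = -sum(p * math.log(p) for p in ps)

# continuous case, by numerical integration: for X uniform on (0, 1), H(X) = 0,
# and Y = a + hX has constant density q = 1/h on an interval of length h, so
# H(Y) = -integral of q log q = log h = log h + H(X), matching (1.40).
h = 3.0
n = 1000
dx = h / n
q = 1.0 / h
H_Y = -sum(q * math.log(q) * dx for _ in range(n))
assert abs(H_Y - math.log(h)) < 1e-9
```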
1.6 Generating Function and Characteristic Function

In this section we present some functions that are useful in generating the probabilities or the moments of the distribution in a simple and unified manner. In addition, they may also help in identifying the distribution of an underlying random variable of interest.

Definition 1.12 Let X take on values 0, 1, 2, ... with probabilities p_n = P{X = n}, n = 0, 1, .... All the information about this distribution is contained in the generating function, which is defined as

P(s) = Es^X = Sum_{n=0}^{infinity} p_n s^n,  (1.41)

with the right-hand side (RHS) of (1.41) converging at least for |s| <= 1. Some important properties of generating functions are as follows:
(a) P(1) = 1.

A characteristic function f(t) is said to be stable if, for any constants a_1 > 0 and a_2 > 0, there exist constants a > 0 and b such that

f(a_1 t) f(a_2 t) = e^{ibt} f(at).  (1.56)

A random variable is said to have a stable distribution if its characteristic function is stable.

Remark 1.9 It is of interest to note that any stable distribution is absolutely continuous, and is also infinitely divisible.
1.9 Random Vectors and Multivariate Distributions

Let (Omega, F, P) be a probability space, where Omega = {omega} is a set of elementary events, F is a sigma-algebra of events, and P is a probability measure defined on (Omega, F). Further, let B denote an element of the Borel sigma-algebra of subsets of the n-dimensional Euclidean space R^n.
Definition 1.17 An n-dimensional vector X = X(omega) = (X_1(omega), ..., X_n(omega)) which maps Omega into R^n is called a random vector (or an n-dimensional random variable) if, for any Borel set B in R^n, the inverse image of B given by

X^{-1}(B) = {omega : X(omega) in B} = {omega : (X_1(omega), ..., X_n(omega)) in B}

belongs to the sigma-algebra F. This means that, for any Borel set B, we can define the probability

P{X in B} = P{X^{-1}(B)}.

In particular, for any x = (x_1, ..., x_n), the function

F(x) = F(x_1, ..., x_n) = P{X_1 <= x_1, ..., X_n <= x_n},  -infinity < x_1, ..., x_n < infinity,  (1.57)

is defined for the random vector X = (X_1, ..., X_n).
Definition 1.18 The function F(x) = F(x_1, ..., x_n) is called the distribution function of the random vector X.

Remark 1.10 The elements X_1, ..., X_n of the random vector X can be considered as n univariate random variables having distribution functions

F_1(x) = F(x, infinity, ..., infinity) = P{X_1 <= x},
F_2(x) = F(infinity, x, infinity, ..., infinity) = P{X_2 <= x},
...,
F_n(x) = F(infinity, ..., infinity, x) = P{X_n <= x},

respectively. Moreover, any set of n random variables X_1, ..., X_n forms a random vector X = (X_1, ..., X_n). Hence,

F(x) = F(x_1, ..., x_n) = P{X_1 <= x_1, ..., X_n <= x_n}

is often called the joint distribution function of the variables X_1, ..., X_n.
If F(x_1, ..., x_n) is the joint distribution function of the random variables X_1, ..., X_n, one can obtain from it the joint distribution function of any subset of them rather easily. For example, we have

P{X_1 <= x_1, ..., X_m <= x_m} = F(x_1, ..., x_m, infinity, ..., infinity)  (1.58)

as the joint distribution function of (X_1, ..., X_m).
Definition 1.19 The random variables X_1, ..., X_n are said to be independent random variables if

P{X_1 <= x_1, ..., X_n <= x_n} = Product_{k=1}^{n} P{X_k <= x_k}  (1.59)

for any -infinity < x_k < infinity (k = 1, ..., n).

Definition 1.20 The vectors X_1 = (X_1, ..., X_n) and X_2 = (X_{n+1}, ..., X_{n+m}) are said to be independent if

P{X_1 <= x_1, ..., X_{n+m} <= x_{n+m}} = P{X_1 <= x_1, ..., X_n <= x_n} P{X_{n+1} <= x_{n+1}, ..., X_{n+m} <= x_{n+m}}  (1.60)

for any -infinity < x_k < infinity (k = 1, ..., n + m).
In the following discussion we restrict ourselves to the two-dimensional case. Let (X, Y) be a two-dimensional random vector and let F(x, y) = P{X <= x, Y <= y} be the joint distribution function of (X, Y). Then, F_X(x) = F(x, infinity) and F_Y(y) = F(infinity, y) are the marginal distribution functions. Now, as we did earlier in the univariate case, we shall discuss discrete and absolutely continuous cases separately.

Definition 1.21 A two-dimensional random vector (X, Y) is said to have a discrete bivariate distribution if there exists a countable set B = {(x_1, y_1), (x_2, y_2), ...} such that P{(X, Y) in B} = 1.

Remark 1.11 In order to determine a two-dimensional random vector (X, Y) having a bivariate discrete distribution, we need to fix two sequences: a sequence of two-dimensional points (x_1, y_1), (x_2, y_2), ... and a sequence of probabilities p_k = P{X = x_k, Y = y_k}, k = 1, 2, ..., such that Sum_k p_k = 1. In this case, the joint distribution function F(x, y) of (X, Y) is given by

F(x, y) = Sum_{k: x_k <= x, y_k <= y} p_k.  (1.61)

Also, the components of the vector (X, Y) are independent if

P{X = x_k, Y = y_k} = P{X = x_k} P{Y = y_k}  for any k.  (1.62)
Definition 1.22 A two-dimensional random vector (X, Y) with a joint distribution function F(x, y) is said to have an absolutely continuous bivariate distribution if there exists a nonnegative function p(u, v) such that

F(x, y) = Integral_{-inf}^{x} Integral_{-inf}^{y} p(u, v) dv du  (1.63)

for any real x and y.

Remark 1.12 The function p(u, v) satisfies the condition

Integral_{-inf}^{inf} Integral_{-inf}^{inf} p(u, v) du dv = 1,  (1.64)

and it is called the probability density function (pdf) of the bivariate random vector (X, Y) or the joint probability density function of the random variables X and Y. If p(u, v) is the pdf of the bivariate vector (X, Y), then the components X and Y have one-dimensional (marginal) densities

p_X(u) = Integral_{-inf}^{inf} p(u, v) dv  (1.65)

and

p_Y(v) = Integral_{-inf}^{inf} p(u, v) du,  (1.66)

respectively. Also, the components of the absolutely continuous bivariate vector (X, Y) are independent if

p(u, v) = p_X(u) p_Y(v),  (1.67)

where p_X(u) and p_Y(v) are the marginal densities as given in (1.65) and (1.66). Moreover, if the joint pdf p(u, v) of (X, Y) admits a factorization of the form

p(u, v) = q_1(u) q_2(v),  (1.68)

then the components X and Y are independent, and there exists a nonzero constant c such that q_1(u) = c p_X(u) and q_2(v) = p_Y(v)/c.

Exercise 1.20 Let F(x, y) denote the distribution function of the random vector (X, Y). Then, express P{X <= 0, Y > 1} in terms of the function F(x, y).

Exercise 1.21 Let F(x) denote the distribution function of a random variable X. Consider the vector X_n = (X, ..., X) with all its components coinciding with X. Express the distribution function of X_n in terms of F(x).
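Conditions (1.64)-(1.65) lend themselves to a direct numerical check. The sketch below (an illustrative factorized density, not from the book) integrates p(u, v) = 4uv on the unit square on a midpoint grid; here q_1(u) = 2u and q_2(v) = 2v, so X and Y are independent with these marginals:

```python
# joint pdf p(u, v) = 4uv on the unit square; it factorizes as (2u)(2v)
n = 400
du = dv = 1.0 / n
grid = [(i + 0.5) * du for i in range(n)]   # midpoint rule

def p(u, v):
    return 4.0 * u * v

total = sum(p(u, v) * du * dv for u in grid for v in grid)
pX = {u: sum(p(u, v) * dv for v in grid) for u in grid}   # marginal of X

assert abs(total - 1.0) < 1e-3          # (1.64): the density integrates to 1
u0 = grid[17]
assert abs(pX[u0] - 2.0 * u0) < 1e-3    # (1.65): marginal p_X(u) = 2u
```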
1.10 Conditional Distributions

Let (X, Y) be a random vector having a discrete bivariate distribution concentrated on some points (x_i, y_j), and let

p_ij = P{X = x_i, Y = y_j},  q_i = P{X = x_i} > 0,  and  r_j = P{Y = y_j} > 0,

for i, j = 1, 2, .... Then, for any y_j (j = 1, 2, ...), the conditional distribution of X, given Y = y_j, is defined as

P{X = x_i | Y = y_j} = P{X = x_i, Y = y_j} / P{Y = y_j} = p_ij / r_j.  (1.69)

Similarly, for any x_i (i = 1, 2, ...), the conditional distribution of Y, given X = x_i, is defined as

P{Y = y_j | X = x_i} = P{X = x_i, Y = y_j} / P{X = x_i} = p_ij / q_i.  (1.70)

Analogously, if (X, Y) has an absolutely continuous bivariate distribution with joint pdf p(x, y) and marginal densities p_X(x) > 0 and p_Y(y) > 0, the conditional pdf of X, given Y = y, is defined as

p_{X|Y}(x | y) = p(x, y) / p_Y(y),  (1.71)

and the conditional pdf of Y, given X = x, as

p_{Y|X}(y | x) = p(x, y) / p_X(x).  (1.72)

More generally, let us consider the case when the n-dimensional random vector X = (X_1, ..., X_n) has an absolutely continuous distribution with pdf p_X(x_1, ..., x_n). Let U = (X_1, ..., X_m) and V = (X_{m+1}, ..., X_n) (m < n) be the random vectors corresponding to the first m and the last n - m components of the random vector X. We can define the pdf's p_U(x_1, ..., x_m) and p_V(x_{m+1}, ..., x_n) in this case as in Eqs. (1.58) and (1.63). Then, the conditional pdf of the random vector V, given U = (x_1, ..., x_m), is defined as

p_{V|U}(x_{m+1}, ..., x_n | x_1, ..., x_m) = p_X(x_1, ..., x_n) / p_U(x_1, ..., x_m).  (1.73)
1.11 Moment Characteristics of Random Vectors
Let (X, Y) be a bivariate discrete random vector concentrated on points (x_i, y_j) with probabilities p_ij = P{X = x_i, Y = y_j} for i, j = 1, 2, .... For any measurable function g(x, y), we can find the expected value of Z = g(X, Y) as

EZ = Eg(X, Y) = Sum_{i,j} g(x_i, y_j) p_ij.  (1.74)

Similarly, if (X, Y) has an absolutely continuous bivariate distribution with the density function p(x, y), then we have the expected value of Z = g(X, Y) as

EZ = Eg(X, Y) = Integral_{-inf}^{inf} Integral_{-inf}^{inf} g(x, y) p(x, y) dx dy.  (1.75)

Of course, as in the univariate case, we say that EZ = Eg(X, Y) defined in Eqs. (1.74) and (1.75) exists if

Sum_{i,j} |g(x_i, y_j)| p_ij < infinity  and  Integral Integral |g(x, y)| p(x, y) dx dy < infinity,

respectively. In particular, if g(x, y) = x^k y^l, we obtain EZ = Eg(X, Y) = EX^k Y^l, which is said to be the product moment of order (k, l). Similarly, the moment E(X - EX)^k (Y - EY)^l is said to be the central product moment of order (k, l), and the special case of E(X - EX)(Y - EY) is called the covariance between X and Y and is denoted by Cov(X, Y). Based on the covariance, we can define another measure of association which is invariant with respect to both location and scale of the variables X and Y (meaning that it is not affected if the means and the variances of the variables are changed). Such a measure is the correlation coefficient between X and Y, defined as

rho = Cov(X, Y) / sqrt(Var X . Var Y).

It can easily be shown that |rho| <= 1. If we are dealing with a general n-dimensional random vector X = (X_1, ..., X_n), then the following moment characteristics of X will be of interest to us: the mean vector m = (m_1, ..., m_n), where m_i = EX_i (i = 1, ..., n); the covariance matrix C = ((sigma_ij)), where sigma_ij = sigma_ji = Cov(X_i, X_j) (for i != j) and sigma_ii = Var(X_i); and the correlation matrix rho = ((rho_ij)), where rho_ij = sigma_ij / sqrt(sigma_ii sigma_jj). Note that the diagonal elements of the correlation matrix are all 1.
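A small Monte Carlo sketch (the correlated pair below is an illustrative construction, not from the book) assembling the covariance matrix C and the correlation coefficient:

```python
import random

random.seed(1)
n = 50000
# a correlated pair: Y = X + independent noise, so Cov(X, Y) > 0
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [x + random.gauss(0, 1) for x in xs]

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

# covariance matrix C and correlation coefficient rho for (X, Y)
C = [[cov(xs, xs), cov(xs, ys)],
     [cov(ys, xs), cov(ys, ys)]]
rho = C[0][1] / (C[0][0] * C[1][1]) ** 0.5

assert abs(C[0][1] - C[1][0]) < 1e-12   # symmetry sigma_ij = sigma_ji
assert -1.0 <= rho <= 1.0               # |rho| <= 1
assert abs(rho - 1 / 2 ** 0.5) < 0.05   # theoretical value 1/sqrt(2) here
```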
Exercise 1.22 Find all distributions of the random variable X for which the correlation coefficient rho(X, X^2) = 1.

Exercise 1.23 Suppose that the variances of the random variables X and Y are 1 and 4, respectively. Then, find the exact upper and lower bounds for Var(X - Y).
1.12 Conditional Expectations

In Section 1.10 we introduced conditional distributions in the case of discrete as well as absolutely continuous multivariate distributions. Based on those conditional distributions, we describe in this section conditional expectations. For this purpose, let us first consider the case when (X, Y) has a discrete bivariate distribution concentrated on points (x_i, y_j) (for i, j = 1, 2, ...), and as before, let p_ij = P{X = x_i, Y = y_j} and r_j = P{Y = y_j} > 0. Suppose also that EX exists. Then, based on the definition of the conditional distribution of X, given Y = y_j, presented in Eq. (1.69), we readily have the conditional mean of X, given Y = y_j, as

E(X | Y = y_j) = Sum_i x_i P{X = x_i | Y = y_j} = Sum_i x_i p_ij / r_j.  (1.76)
More generally, for any measurable function h(.) for which Eh(X) exists, we have the conditional expectation of h(X), given Y = y_j, as

E{h(X) | Y = y_j} = Sum_i h(x_i) P{X = x_i | Y = y_j} = Sum_i h(x_i) p_ij / r_j.  (1.77)
Based on (1.76), we can introduce the conditional expectation of X, given Y, denoted by E(X|Y), as a new random variable which takes on the value E(X | Y = y_j) when Y takes on the value y_j (for j = 1, 2, ...). Hence, the conditional expectation of X, given Y, as a random variable takes on the values E(X | Y = y_j) with probabilities r_j (for j = 1, 2, ...). Consequently, we readily observe that

E{E(X|Y)} = Sum_j E(X | Y = y_j) r_j = EX.
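The identity E{E(X|Y)} = EX can be verified directly on a small probability table (the joint pmf below is illustrative only):

```python
# joint pmf p[i][j] = P{X = xs[i], Y = ys[j]} for a toy bivariate distribution
xs = [0, 1, 2]
ys = [10, 20]
p = [[0.10, 0.15],
     [0.20, 0.25],
     [0.05, 0.25]]

r = [sum(p[i][j] for i in range(3)) for j in range(2)]   # r_j = P{Y = y_j}
EX = sum(xs[i] * p[i][j] for i in range(3) for j in range(2))

# E(X | Y = y_j) = sum_i x_i p_ij / r_j, then average over the distribution of Y
E_X_given_Y = [sum(xs[i] * p[i][j] for i in range(3)) / r[j] for j in range(2)]
tower = sum(E_X_given_Y[j] * r[j] for j in range(2))

assert abs(sum(sum(row) for row in p) - 1.0) < 1e-12
assert abs(tower - EX) < 1e-12
```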
Similarly, regarding the conditional expectation E{h(X)|Y} as a random variable which takes on the values E{h(X) | Y = y_j} when Y takes on the values y_j (for j = 1, 2, ...) with probabilities r_j, we can show that

E[E{h(X)|Y}] = E{h(X)}.

Next, let us consider the case when the random vector (X, Y) has an absolutely continuous bivariate distribution with pdf p(x, y), and let p_Y(y) be the marginal density function of Y. Then, from Eq. (1.71), we have the conditional mean of X, given Y = y, as

E(X | Y = y) = Integral_{-inf}^{inf} x p(x, y) / p_Y(y) dx,  (1.78)
provided that EX exists. Similarly, if h(.) is a measurable function for which Eh(X) exists, we have the conditional expectation of h(X), given Y = y, as

E{h(X) | Y = y} = Integral_{-inf}^{inf} h(x) p(x, y) / p_Y(y) dx.  (1.79)

As in the discrete case, we can regard E(X|Y) and E{h(X)|Y} as random variables which take on the values E(X | Y = y) and E{h(X) | Y = y} when the random variable Y takes on the value y. In this case, too, it can be shown that

E{E(X|Y)} = EX  and  E[E{h(X)|Y}] = E{h(X)}.

1.13 Regressions
In Eqs. (1.76) and (1.78) we defined the conditional expectation E(X | Y = y), provided that EX exists. From this conditional expectation, we may consider the function

a(y) = E(X | Y = y),  (1.80)

which is called the regression function of X on Y. Similarly, when EY exists, the function

b(x) = E(Y | X = x)  (1.81)

is called the regression function of Y on X. Note that when the random variables X and Y are independent, then

a(y) = E(X | Y = y) = EX  and  b(x) = E(Y | X = x) = EY

are simply the unconditional means of X and Y, and do not depend on y and x, respectively.
1.14 Generating Function of Random Vectors

Let X = (X_1, ..., X_n) be a random vector, the elements of which take on values 0, 1, 2, .... In this case, the generating function P(s_1, ..., s_n) is defined as

P(s_1, ..., s_n) = E s_1^{X_1} ... s_n^{X_n}.  (1.82)

Although the following properties can be presented for this general case, we present them for notational simplicity only for the bivariate case (n = 2). Let P_{X,Y}(s, t), P_X(s), and P_Y(t) be the generating function of the bivariate random vector (X, Y), the marginal generating function of X, and the marginal generating function of Y, defined by

P_{X,Y}(s, t) = E s^X t^Y,
P_X(s) = Es^X = Sum_{j=0}^{infinity} P{X = j} s^j,  (1.84)

and

P_Y(t) = Et^Y = Sum_{k=0}^{infinity} P{Y = k} t^k,  (1.85)

respectively. Then, the following properties of P_{X,Y}(s, t) can be established easily:
+
PX.l( s , t )= Px(~)Py(t) if and only if X and Y are independent;
Next, for the random vector X f m c t i o n f ( t 1 , . . . , t,) a.s
=
(XI.. . . , X,,,),we define the ch,nro,cteristic
GENERATING FUNCTION OF R.ANDOM VECTORS
23
(1.86)
Similarly, in the case when the ra.ndom vector X = ( X I , .. . , X,) has an absolutely coritiriiious distribution with density fiinction p(x1,. . . , z,,), then its characteristic function f ( t 1 , . . . , tTL) is defined a s
f ( t l , .. ,t n ) ,
=
E
ei(~lX1+...+tnXn)
S,L 0

M
ez( t 1s1 +...+t ,,z, ) ~
( z I ., .., x , )d ~ l. .. dx,.
(1.87)
Once again, althoiigh t,he following properties ca.n be presented for this genera,] ndimensional ca.sc, we present, t,hem for notational simplicity only for tkic bivaria.te case (71 = 2 ) . Let f x , y ( s ,t ) , fx(s), arid f y ( t ) be the cha.racteristic fiiiiction of the bivariate raridoin vector ( X ,Y ) ,thc marginal charat ristic function of X, and the marginal cha.racteristic function of Y , tlefiried by
1,.I_, c
f X , Y (*%t )
fX(S)
= =
.I_,
u
m
=
&"Jl+tI/)
I.(
eiSZl)x
Lm 00
fY(t)
m
eitglpy
p x , y ( : c ; y )d z dy,
(1.89)
dz,
(1.90)
(y) d y ,
respectively. Then, the following properties of easily:
(13 8 )
f ~ , (s. y
t ) call be established
(a) f_{X,Y}(0, 0) = 1;

(b) f_{X,Y}(s, 0) = f_X(s) and f_{X,Y}(0, t) = f_Y(t);

(c) f_{X,Y}(s, s) = f_{X+Y}(s), where f_{X+Y}(s) = E e^{is(X+Y)} is the characteristic function of the variable X + Y;

(d) f_{X,Y}(s, t) = f_X(s) f_Y(t) if and only if X and Y are independent.
Exercise 1.24 Let P(s, t) be the generating function of the random vector (X, Y). Then, find the generating function Q(s, t, u) of the random vector (2X + 1, X + Y, 3X + 2Y).
1.15 Transformations of Variables

Let the random vector X = (X_1, ..., X_n) have an absolutely continuous distribution with pdf p_X(x_1, ..., x_n), and suppose Y = (Y_1, ..., Y_n) is obtained from X through a one-to-one transformation with inverse x_i = x_i(y_1, ..., y_n), i = 1, ..., n. The Jacobian of the inverse transformation is the determinant J of the n x n matrix of partial derivatives whose (i, j)th entry is dx_i/dy_j, and the pdf of Y is

p_Y(y_1, ..., y_n) = p_X(x_1(y_1, ..., y_n), ..., x_n(y_1, ..., y_n)) |J|,  (1.92)

where |J| is the absolute value of the Jacobian of the transformation. Once again, the marginal pdf of any subset of the new variables may be obtained from (1.92) by integrating out the other variables. Note that if the transformation is not one-to-one, but B is the union of a finite number of mutually disjoint spaces, say B_1, ..., B_l, then we can construct l sets of one-to-one transformations (one for each B_i) and their respective Jacobians, and then finally express the density function of the vector Y = (Y_1, ..., Y_n) as the sum of l terms of the form (1.92) corresponding to B_1, ..., B_l.
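As a concrete check of (1.92) (a sketch; the standard bivariate normal and polar coordinates are chosen only as a familiar example, not taken from the text), the transformed density p_X(r cos t, r sin t) . r should still integrate to 1:

```python
import math

# (X1, X2) standard bivariate normal, rewritten in polar coordinates
# x1 = r cos(t), x2 = r sin(t); the Jacobian determinant of the inverse map
# is J = r, so by (1.92): p_{R,Theta}(r, t) = p_X(r cos t, r sin t) * r.
def p_X(x1, x2):
    return math.exp(-(x1 * x1 + x2 * x2) / 2.0) / (2.0 * math.pi)

def p_polar(r, t):
    return p_X(r * math.cos(t), r * math.sin(t)) * r

# numerical check that the transformed density still integrates to about 1
nr, nt, rmax = 300, 300, 8.0
dr, dt = rmax / nr, 2.0 * math.pi / nt
total = sum(p_polar((i + 0.5) * dr, (j + 0.5) * dt) * dr * dt
            for i in range(nr) for j in range(nt))
assert abs(total - 1.0) < 1e-3
```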
Part I
DISCRETE DISTRIBUTIONS
CHAPTER 2

DISCRETE UNIFORM DISTRIBUTION

2.1 Introduction

The general discrete uniform distribution takes on k distinct values x_1, x_2, ..., x_k with equal probabilities 1/k, where k is a positive integer. We restrict our attention here to lattice distributions. In this case, x_j = a + jh, j = 0, 1, ..., k - 1, where a is any real value and h > 0 is the step of the distribution. Sometimes, such a distribution is called a discrete rectangular distribution. The linear transformations of random variables enable us to consider, without loss of generality, just the standard discrete uniform distribution taking on values 0, 1, ..., k - 1, which corresponds to a = 0 and h = 1. Note that the case when a = 0 and h = 1/k is also important, but it can be obtained from the standard discrete uniform distribution by means of a simple scale change.
2.2 Notations

We will use the notation

X ~ DU(k, a, h)

if

P{X = a + jh} = 1/k  for j = 0, 1, ..., k - 1,

and

X ~ DU(k)

for the corresponding standard discrete uniform distribution; i.e., DU(k) is simply DU(k, 0, 1).
Remark 2.1 Note that if

Y ~ DU(k, a, h)  and  X ~ DU(k),

then

X =d (Y - a)/h  and  Y =d a + hX,

where =d denotes "having the same distribution" (see Definition 1.5). More generally, if Y_1 ~ DU(k, a_1, h_1) and Y_2 ~ DU(k, a_2, h_2), then

Y_1 =d c Y_2 + d,

where c = h_1/h_2 and d = a_1 - a_2 h_1/h_2. This means that the random variables Y_1 and Y_2 belong to the same type of distribution, depending only on the shape parameter k, and do not depend on the location (a_1 and a_2) and scale (h_1 and h_2) parameters.

Discrete uniform distributions play a naturally important role in many classical problems of probability theory that deal with a random choice with equal probabilities from a finite set of k items. For example, a lottery machine contains k balls, numbered 1, 2, ..., k. On selecting one of these balls, we get a random number Y which has the DU(k, 1, 1) distribution. This principle, in fact, allows us to generate tables of random numbers used in different statistical simulations, by taking k sufficiently large (say, k = 10^6 or 2^32). For the rest of this chapter, we deal only with the standard discrete uniform DU(k) distribution.
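Such a random choice is straightforward to simulate; a minimal sketch (the function name and the die example are ours, not the book's):

```python
import random

def sample_du(k, a=0.0, h=1.0, rng=random):
    # one draw from DU(k, a, h): value a + j*h with j uniform on {0, ..., k-1}
    return a + h * rng.randrange(k)

random.seed(42)
draws = [sample_du(6, a=1, h=1) for _ in range(60000)]   # a fair die: DU(6, 1, 1)
counts = {v: draws.count(v) for v in range(1, 7)}

assert set(counts) == {1, 2, 3, 4, 5, 6}
assert all(abs(c / 60000 - 1 / 6) < 0.01 for c in counts.values())
```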
2.3 Moments

We will now determine the moments of X ~ DU(k), all of which exist since X has a finite support.

Moments about zero:

alpha_1 = EX = (1/k) Sum_{r=0}^{k-1} r = (k - 1)/2,  (2.1)

alpha_2 = EX^2 = (1/k) Sum_{r=0}^{k-1} r^2 = (k - 1)(2k - 1)/6,  (2.2)

alpha_3 = EX^3 = (1/k) Sum_{r=0}^{k-1} r^3 = k(k - 1)^2/4,  (2.3)

alpha_4 = EX^4 = (1/k) Sum_{r=0}^{k-1} r^4 = (k - 1)(2k - 1)(3k^2 - 3k - 1)/30.  (2.4)
To obtain the expressions in (2.1)-(2.4), we have used the following well-known identities for sums:

Sum_{r=0}^{k-1} r = (k - 1)k/2,

Sum_{r=0}^{k-1} r^2 = (k - 1)k(2k - 1)/6,

Sum_{r=0}^{k-1} r^3 = (k - 1)^2 k^2/4,

and

Sum_{r=0}^{k-1} r^4 = (k - 1)k(2k - 1)(3k^2 - 3k - 1)/30;

see, for example, Gradshteyn and Ryzhik (1994, p. 2). Note that

alpha_n ~ k^n/(n + 1)  as k -> infinity.  (2.5)
Central moments: The variance, or the second central moment, is obtained from (2.1) and (2.2) as

beta_2 = Var X = alpha_2 - alpha_1^2 = (k^2 - 1)/12.  (2.6)

The third central moment is obtained from (2.1)-(2.3) as

beta_3 = E(X - alpha_1)^3 = alpha_3 - 3 alpha_2 alpha_1 + 2 alpha_1^3 = 0.  (2.7)

At first, (2.7) may seem surprising; but once we realize that X is symmetric about its mean value alpha_1 = (k - 1)/2, (2.7) makes perfect sense. In fact, it is easy to see that (k - 1 - X) and X take on the same values 0, 1, ..., k - 1 with equal probabilities 1/k. Therefore, we have

X - alpha_1 =d alpha_1 - X,

and consequently,

E(X - alpha_1)^{2r+1} = 0,

which simply implies that

beta_{2r+1} = 0,  r = 1, 2, ....
Factorial moments of positive order:

mu_r = EX(X - 1) ... (X - r + 1).  (2.8)

It is easily seen that

mu_r = 0 for r >= k,

and

mu_r = (1/k) Sum_{m=r}^{k-1} m!/(m - r)! = (k - 1)(k - 2) ... (k - r)/(r + 1)  for r = 1, 2, ..., k - 1.

In deriving the last expression, we have used the well-known combinatorial identity

Sum_{m=r}^{k-1} C(m, r) = C(k, r + 1).

In particular, we have

mu_1 = alpha_1 = (k - 1)/2,  (2.9)

mu_2 = alpha_2 - alpha_1 = (k - 1)(k - 2)/3,  (2.10)

mu_3 = alpha_3 - 3 alpha_2 + 2 alpha_1 = (k - 1)(k - 2)(k - 3)/4,  (2.11)

and

mu_{k-2} = (k - 2)!,  (2.12)

mu_{k-1} = (k - 1)!/k.  (2.13)
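A quick exact check of the closed form mu_r = (k - 1)(k - 2)...(k - r)/(r + 1), and of mu_r = 0 for r >= k (a sketch; the helper names are ours):

```python
from fractions import Fraction
from math import prod

def du_factorial_moment(k, r):
    # mu_r = (1/k) * sum_m m(m-1)...(m-r+1) over m = 0, ..., k-1
    return Fraction(sum(prod(m - i for i in range(r)) for m in range(k)), k)

k = 9
for r in range(1, k):
    closed_form = Fraction(prod(k - i for i in range(1, r + 1)), r + 1)
    assert du_factorial_moment(k, r) == closed_form   # (k-1)...(k-r)/(r+1)
assert du_factorial_moment(k, k) == 0                 # mu_r = 0 for r >= k
```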
Factorial moments of negative order:

mu_{-r} = E[1 / {(X + 1)(X + 2) ... (X + r)}] = (1/k) Sum_{m=0}^{k-1} 1 / {(m + 1)(m + 2) ... (m + r)}.

In particular,

mu_{-1} = (1/k) Sum_{m=0}^{k-1} 1/(m + 1) = (1/k) Sum_{j=1}^{k} 1/j,  (2.14)

mu_{-2} = (1/k) Sum_{m=0}^{k-1} 1/{(m + 1)(m + 2)} = (1/k) Sum_{m=0}^{k-1} [1/(m + 1) - 1/(m + 2)] = 1/(k + 1),  (2.15)

and

mu_{-3} = (1/k) Sum_{m=0}^{k-1} 1/{(m + 1)(m + 2)(m + 3)} = (1/2k) Sum_{m=0}^{k-1} [1/(m + 1) - 2/(m + 2) + 1/(m + 3)]  (2.16)

= (k + 3) / {4(k + 1)(k + 2)}.  (2.17)
2.4 Generating Function and Characteristic Function

The generating function of the DU(k) distribution exists for any s and is given by

P_X(s) = Es^X = (1/k) Sum_{r=0}^{k-1} s^r.

For s != 1, it can be rewritten in the form

P_X(s) = (1 - s^k) / {k(1 - s)}.  (2.18)

For any k = 1, 2, ..., P_X(s) is a polynomial of (k - 1)th degree, and it is not difficult to see that its roots coincide with

s_j = exp(2 pi i j/k),  j = 1, 2, ..., k - 1,  for k > 1.

This readily gives us the following form of the generating function:

P_X(s) = (1/k) Product_{j=1}^{k-1} (s - s_j).  (2.19)

Another form of the generating function exploits the hypergeometric function, defined by

2F1[a, b; c; x] = 1 + (ab/c)(x/1!) + {a(a + 1)b(b + 1)/(c(c + 1))}(x^2/2!) + ....  (2.20)

The generating function, in terms of the hypergeometric function, is given by

P_X(s) = (1/k) 2F1[-k + 1, 1; -k + 1; s].  (2.21)

Since the characteristic function and the generating function for nonnegative integer-valued random variables satisfy the relation

f_X(t) = E exp(itX) = P_X(e^{it}),

if we change s to e^{it} in Eqs. (2.18), (2.19), and (2.21), we obtain the corresponding expressions for the characteristic function f_X(t). For example, from (2.18) we get

f_X(t) = (1 - e^{ikt}) / {k(1 - e^{it})}.  (2.22)
2.5 Convolutions

Let us take two independent random variables, both having discrete uniform distributions, say, X ~ DU(k) and Y ~ DU(r), with k >= r (without loss of any generality). Then, what can we say about the distribution of the sum Z = X + Y? The distribution of Z is called the convolution (or composition) of the two initial distributions.

Exercise 2.1 It is clear that 0 <= Z <= k + r - 2. Consider the three different situations, and prove that P{Z <= m} is given by

(a) (m + 1)(m + 2)/(2kr)  if 0 <= m <= r - 1;

(b) (2m - r + 3)/(2k)  if r - 1 < m <= k - 1;

(c) 1 - (r + k - 2 - m)(r + k - 1 - m)/(2kr)  if k <= m <= k + r - 2.  (2.23)
From (2.23), we readily obtain

P{Z = m} = (m + 1)/(kr) if 0 <= m <= r - 1;  1/k if r - 1 < m <= k - 1;  (k + r - 1 - m)/(kr) if k <= m <= k + r - 2;  and 0 otherwise.  (2.24)

One can see now that r = 1 is the only case when the convolution of two discrete uniform DU(k) and DU(r) distributions leads to the same distribution. Note that in this situation P{Y = 0} = 1, which means that Y has a degenerate distribution. Nevertheless, it turns out that convolutions of more general nondegenerate discrete uniform distributions may belong to the same set of distributions.
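The trapezoidal pmf (2.24) can be confirmed by exhaustive enumeration (a sketch with illustrative k and r):

```python
from fractions import Fraction

def conv_pmf(k, r):
    # exact pmf of Z = X + Y for independent X ~ DU(k), Y ~ DU(r), by enumeration
    pmf = {m: Fraction(0) for m in range(k + r - 1)}
    for x in range(k):
        for y in range(r):
            pmf[x + y] += Fraction(1, k * r)
    return pmf

k, r = 7, 4            # k >= r
pmf = conv_pmf(k, r)
for m in range(k + r - 1):
    if m <= r - 1:
        expected = Fraction(m + 1, k * r)           # rising part of (2.24)
    elif m <= k - 1:
        expected = Fraction(1, k)                   # flat middle part
    else:
        expected = Fraction(k + r - 1 - m, k * r)   # falling part
    assert pmf[m] == expected
assert sum(pmf.values()) == 1
```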
Exercise 2.2 Suppose that Z ~ DU(r, 0, s) and Y ~ DU(s), where r = 2, 3, ..., s = 2, 3, ..., and that Y and Z are independent random variables. Show then that U = Y + Z ~ DU(rs).

Remark 2.2 It is easy to see that Z =d sX, where X ~ DU(r). Hence, we get another equivalent form of the statement given in Exercise 2.2 as follows. If X ~ DU(r) and Y ~ DU(s), then the sum sX + Y has the discrete uniform DU(rs) distribution. Moreover, due to the symmetry argument, we can immediately obtain that the sum X + rY also has the same DU(rs) distribution.
2.6 Decompositions

Decomposition is an operation which is inverse to convolution. We want to know if a certain random variable can be represented as a sum of at least two independent random variables (see Section 1.7). Of course, any random variable X can be rewritten as a trivial sum of two terms a + (X - a), the first of which is a degenerate random variable, but we will solve a more interesting problem: Is it possible, for a certain random variable X, to find a pair of nondegenerate independent random variables Y and Z such that

X =d Y + Z?

Consider X ~ DU(k), where k is a composite number. Let k = rs, where r >= 2 and s >= 2 are integers. It follows from the statement of Exercise 2.2 that X is decomposable as a sum of two random variables, both having discrete uniform distributions. Moreover, we note from Remark 2.2 that we have at least two different options for decomposition of DU(rs) if r != s.

Let k be a prime integer now. The simplest case is k = 2, when X takes on two values. Of course, in this situation X is indecomposable, because any nondegenerate random variable takes on at least two values, and hence it is easy to see that any sum Y + Z of independent nondegenerate random variables takes on at least three values. Now one may suppose that the DU(3) distribution is decomposable. In fact, there are a lot of random variables taking on values 0, 1, and 2 with probabilities p_0, p_1, and p_2 that can be presented as a sum Y + Z, where both Y and Z take on values 0 and 1, possibly with different probabilities. However, it turns out that one cannot decompose a random variable taking three values if the corresponding probabilities are equal (p_0 = p_1 = p_2 = 1/3).
36
DISCRETE UNIFOR,M DISTRIBUTION
Exercise 2.3 Prove that D U ( 3 ) distribution is indecomposable.
In the general case when k is any prime integer, by considering the cmresponding generating function
we see that, the problem of decomposition in this c;ase is equivalent to the following problcm: Is it possible to present P x ( s ) as a product of two polynomials with positive coefficients if k is a prime? The nega.tive answer was given in Krasncr a.nd Ranu1a.c (1937),and independently by Raikov (19374. Sumnmrizing all these, we ha.ve the following result,.
Theorem 2.1 The discrete uniform DU(k) distribution (for k > 1) is indecomposable iff k is a prime integer.
It is also evident that X ~ DU(k) is not infinitely divisible when k is a prime number. Moreover, it is known that any nondegenerate distribution with a finite support cannot be infinitely divisible (see Remark 1.6). This means that no DU(k) distribution with k > 1 is infinitely divisible.
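The decomposition of DU(rs) discussed above can be checked by brute force. In this sketch (the helper name is ours, not the book's), Y is uniform on {0, ..., r−1} and Z is uniform on the multiples {0, r, ..., (s−1)r}; their independent sum hits each value 0, ..., rs−1 exactly once, so Y + Z ~ DU(rs):

```python
from itertools import product

# Numerical sketch of the decomposition of DU(rs): take Y uniform on
# {0, ..., r-1} and Z uniform on {0, r, ..., (s-1)r}; the independent sum
# takes each of the values 0, ..., rs-1 exactly once, so Y + Z ~ DU(rs).
def du_decomposition_check(r, s):
    sums = [y + z for y, z in product(range(r), [r * j for j in range(s)])]
    return sorted(sums) == list(range(r * s))

assert du_decomposition_check(2, 3)
assert du_decomposition_check(4, 5)
```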
2.7 Entropy
From the definition of the entropy H(X) in (1.38), it is clear that the entropy of any DU(k, a, h) distribution depends only on k. If X ~ DU(k), then

H(X) = log k.    (2.25)

It is of interest to mention here that, among all the random variables taking on at most k values, the maximum possible entropy, log k, is attained by a random variable taking on k distinct values x_1, x_2, ..., x_k with equal probabilities p_j = 1/k (j = 1, 2, ..., k).
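The extremal property above is easy to illustrate numerically. This sketch (helper names are ours) computes entropy with natural logarithms and checks that the uniform distribution on k points attains log k while randomly generated distributions on k points never exceed it:

```python
import math, random

# Sketch of (2.25) and the extremal property: the uniform distribution on k
# points has entropy log k, and no distribution on at most k points exceeds it.
def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

k = 5
assert abs(entropy([1.0 / k] * k) - math.log(k)) < 1e-12

random.seed(1)
for _ in range(100):
    w = [random.random() for _ in range(k)]
    total = sum(w)
    assert entropy([x / total for x in w]) <= math.log(k) + 1e-12
```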
2.8
Relationships with Other Distributions
The discrete uniform distribution forms the basis for the derivation of many distributions. Here, we present some key connections of the discrete uniform distribution to some other distributions:

(a) We have already mentioned that the DU(1) distribution takes on the value 0 with probability 1 and, in fact, coincides with the degenerate distribution, which is discussed in Chapter 3.
(b) One more special case of discrete uniform distributions is the DU(2) distribution. If X ~ DU(2), then it takes on values 0 and 1 with equal probability (of 1/2). This belongs to the Bernoulli type of distribution discussed in Chapter 4.

(c) Let us consider a sequence of random variables X_n ~ DU(n), n = 1, 2, .... Let Y_n = X_n/n. One can see that for any n = 1, 2, ...,

Y_n ~ DU(n, 0, 1/n),

and it takes on values 0, 1/n, 2/n, ..., (n − 1)/n with equal probabilities 1/n. Let us try to find the limiting distribution for the sequence Y_1, Y_2, .... The simplest way is to consider the characteristic function. The characteristic function of Y_n is given by

g_{Y_n}(t) = E exp(itY_n) = E exp{i(t/n)X_n} = f_n(t/n),

where f_n(t) is the characteristic function of the DU(n) distribution. Using (2.22), we readily find that

g_{Y_n}(t) = (e^{it} − 1)/(n(e^{it/n} − 1)).    (2.26)

It is not difficult to see now that for any fixed t,

g_{Y_n}(t) → g(t)  as  n → ∞,

where

g(t) = (e^{it} − 1)/(it).    (2.27)

The RHS of (2.27) shows that g(t) is the characteristic function of a continuous distribution with probability density function p(x), which is equal to 1 if 0 < x < 1 and equals 0 otherwise. The distribution with the given pdf is said to be uniform U(0, 1), which is discussed in detail in Chapter 11. From the convergence of the corresponding characteristic functions, we can immediately conclude that the uniform U(0, 1) distribution is the limit for the sequence of random variables Y_n = X_n/n, where X_n ~ DU(n). Thus, we have constructed an important "bridge" between continuous uniform and discrete uniform distributions.
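This "bridge" is easy to see in simulation. The following Monte Carlo sketch (parameter values are ours) compares the empirical CDF of Y_n = X_n/n with the U(0, 1) CDF F(x) = x:

```python
import random

# Monte Carlo sketch of the "bridge": Y_n = X_n / n with X_n ~ DU(n) has an
# empirical CDF close to F(x) = x, the CDF of U(0, 1), when n is large.
random.seed(0)
n, trials = 1000, 20000
samples = [random.randrange(n) / n for _ in range(trials)]
for x in (0.25, 0.5, 0.9):
    empirical = sum(s <= x for s in samples) / trials
    assert abs(empirical - x) < 0.02
```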
CHAPTER 3

DEGENERATE DISTRIBUTION

3.1 Introduction
Consider the DU(k) distributed random variable X when k = 1. One can see that X takes on only one value x_0 = 0 with probability 1. It gives an example of the degenerate distribution. In the general case, a degenerate random variable takes on only one value, say c, with probability 1. In the sequel, X ~ D(c) denotes a random variable having a degenerate distribution concentrated at the only point c, −∞ < c < ∞.

Degenerate distributions assume a special place in distribution theory. They can be included as a special case of many families of probability distributions, such as normal, geometric, Poisson, and binomial. For any sequence of random variables X_1, X_2, ... and any arbitrary c, we can always choose sequences of normalizing constants α_n and β_n such that the limiting distribution of the random variables

(X_n − α_n)/β_n

will become the degenerate D(c) distribution.

A very important role in queueing theory is played by degenerate distributions. Kendall (1953), in his classification of queueing systems, has even reserved a special letter to denote systems with constant interarrival times or constant service times of customers. For example, M/D/3 means that a queueing system has three servers, all interarrival times are exponentially distributed, and the service of each customer requires a fixed nonrandom time. It should be mentioned that practically only the degenerate (D), exponential (M), and Erlang (E) distributions have their own letters in Kendall's classification of queueing systems.
3.2
Moments
Degenerate distributions have all their moments finite. Let X ~ D(c) in the following discussion.
Moments about zero:

α_n = EX^n = c^n,  n = 1, 2, ....    (3.1)
In particular,

α_1 = EX = c    (3.2)

and

α_2 = EX² = c².    (3.3)

The variance is

β_2 = Var X = α_2 − α_1² = 0.    (3.4)
Note that (3.4) characterizes degenerate distributions, meaning that they are the only distributions having zero variance.
Characteristic function:

f_X(t) = Ee^{itX} = e^{itc}    (3.5)

and, in particular, f_X(t) = 1 if c = 0.
3.3 Independence
It turns out that any random variable X having a degenerate D(c) distribution is independent of any arbitrarily chosen random variable Y. For observing this, we must check that for any x and y,

P{X ≤ x, Y ≤ y} = P{X ≤ x} P{Y ≤ y}.    (3.6)

Equality (3.6) is evidently true if x < c, in which case

P{X ≤ x, Y ≤ y} = 0  and  P{X ≤ x} = 0.

If x ≥ c, then

P{X ≤ x} = 1  and  P{X ≤ x, Y ≤ y} = P{Y ≤ y},

and we see that (3.6) is once again true.
Exercise 3.1 Let Y = X. Show that if X and Y are independent, then X has a degenerate distribution.
3.4 Convolution
It is clear that the convolution of two degenerate distributions is degenerate; that is, if X ~ D(c_1) and Y ~ D(c_2), then X + Y ~ D(c_1 + c_2). Note also that if X ~ D(c) and Y is an arbitrary random variable, then X + Y belongs to the same type of distribution as Y.
Exercise 3.2 Let X_1 and X_2 be independent random variables having a common distribution. Prove that the equality

X_1 + X_2 =d X_1    (3.7)

implies that X_1 ~ D(0).

Remark 3.1 We see that (3.7) characterizes the degenerate distribution concentrated at zero. If we take X_1 + c instead of X_1 on the RHS of (3.7), we get a characterization of the D(c) distribution. Moreover, if X_1, X_2, ... are independent and identically distributed random variables, then the equality

X_1 + ··· + X_ℓ =d X_1 + ··· + X_k + c,  1 ≤ k < ℓ,

gives a characterization of the degenerate D(c/(ℓ − k)) distribution.

3.5 Decomposition
There is no doubt that any degenerate random variable can be presented only as a sum of degenerate random variables, but even this evident statement needs to be proved.
Exercise 3.3 Let X and Y be independent random variables, and let X + Y have a degenerate distribution. Show then that both X and Y are degenerate.
It is interesting to observe that even such a simple distribution as the degenerate distribution possesses its own special properties and also assumes an important role among a very large family of distributions.
CHAPTER 4
BERNOULLI DISTRIBUTION

4.1 Introduction

The next simplest case after the degenerate random variable is one that takes on two values, say x_1 < x_2, with nonzero probabilities p_1 and p_2, respectively. The discrete uniform DU(2) distribution is exactly double-valued. Let us recall that if X ~ DU(2), then

P{X = 0} = P{X = 1} = 1/2.    (4.1)
It is easy to give an example of such a random variable. Let X be the number of heads that have appeared after a single toss of a balanced coin. Of course, X can be 0 or 1 and it satisfies (4.1). Similarly, unbalanced coins result in distributions with

P{X = 1} = 1 − P{X = 0} = p,    (4.2)

where 0 < p < 1. Indeed, a false coin with two tails or two heads can serve as a model with p = 0 or p = 1 in (4.2), but in these situations X has the degenerate D(0) or D(1) distribution. The distribution defined in (4.2) is known as the Bernoulli distribution.
4.2 Notations

If X is defined by (4.2), we denote it by

X ~ Be(p).

Any random variable Y taking on two values x_0 < x_1 with probabilities

P{Y = x_1} = 1 − P{Y = x_0} = p

can clearly be represented as

Y = (x_1 − x_0)X + x_0,    (4.3)

where X ~ Be(p). This means that random variables defined by (4.2) and (4.3) have the same type of distribution. Hence, Y can be called a random variable of the Bernoulli Be(p) type. Moreover, any random variable taking on two values with positive probabilities belongs to one of the Bernoulli types. In what follows, we will deal with distributions satisfying (4.2).
4.3 Moments

Let X ~ Be(p), 0 < p < 1. Since X has a finite support, we can guarantee the existence of all its moments.

Moments about zero:

α_n = EX^n = p,  n = 1, 2, ....    (4.4)

Exercise 4.1 Let X have a nondegenerate distribution and

EX² = EX³ = EX⁴.

Show then that there exists a p, 0 < p < 1, such that X ~ Be(p).

Variance:

β_2 = Var X = α_2 − α_1² = p(1 − p).    (4.5)

It is clear that 0 < β_2 ≤ 1/4, and β_2 attains its maximum when p = 1/2.

Central moments:

From (4.4), we readily find that

μ_n = E(X − p)^n = p(1 − p){(1 − p)^{n−1} + (−1)^n p^{n−1}}  for n ≥ 2.    (4.6)

The expression of the variance in (4.5) follows from (4.6) if we set n = 2.

Measures of skewness and kurtosis:

From (4.6), we find Pearson's coefficient of skewness as

γ_1 = μ_3/β_2^{3/2} = (1 − 2p)/{p(1 − p)}^{1/2}.

This expression for γ_1 readily reveals that the distribution is negatively skewed when p > 1/2 and is positively skewed when p < 1/2. Also, γ_1 = 0 only when p = 1/2 (in this case, the distribution is also symmetric).
Similarly, we find Pearson's coefficient of kurtosis as

γ_2 = μ_4/β_2² = 1/(p(1 − p)) − 3.

This expression for γ_2 [together with the fact noted earlier that β_2 = p(1 − p) ≤ 1/4] readily implies that γ_2 ≥ 1 and that γ_2 = 1 only when p = 1/2. Thus, in the case of the Be(1/2) distribution, we do see that γ_2 = γ_1² + 1, which means that the inequality presented in Section 1.4 cannot be improved in general.
Entropy: It is easy to see that

H(X) = −p log p − (1 − p) log(1 − p).    (4.7)

Indeed, the maximal value of H(X) is attained when p = 1/2, in which case it equals log 2.

Characteristic function: For X ~ Be(p), 0 < p < 1, the characteristic function is of the form

f_X(t) = (1 − p) + pe^{it}.    (4.8)

As a special case, we can also find the characteristic function of a random variable Y taking on values −1 and 1 with equal probabilities, since Y can be expressed as Y = 2X − 1, where X ~ Be(1/2). It is easy to see that f_Y(t) = cos t.
4.4 Convolutions

Let X_1, X_2, ... be independent and identically distributed Be(p) random variables, and let Y_n = X_1 + ··· + X_n. Many methods are available to prove that

P{Y_n = m} = C(n, m) p^m (1 − p)^{n−m},  m = 0, 1, ..., n,    (4.9)

where C(n, m) = n!/(m!(n − m)!) is the binomial coefficient. We will use here the characteristic function for this purpose. Let g_n, f_1, ..., f_n be the characteristic functions of Y_n, X_1, ..., X_n, respectively. From (4.8), we have

f_k(t) = 1 − p + pe^{it},  k = 1, 2, ..., n.

Then, we get

g_n(t) = f_1(t)···f_n(t) = (1 − p + pe^{it})^n = Σ_{m=0}^{n} C(n, m) p^m (1 − p)^{n−m} e^{itm}.    (4.10)

One can see that the sum on the RHS of (4.10) coincides with the characteristic function of a discrete random variable taking on values m (m = 0, 1, ..., n) with probabilities C(n, m) p^m (1 − p)^{n−m}. Thus, the probability distribution of Y_n is given by (4.9). This distribution, called the binomial distribution, is discussed in detail in Chapter 5. Further, from Chapter 2, we already know that any X ~ Be(p) is indecomposable.
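Formula (4.9) can also be checked by brute force, without characteristic functions: enumerate all 2^n Bernoulli outcomes and accumulate the probability of each value of the sum (parameter values below are ours):

```python
import math
from itertools import product

# Check of (4.9) by enumeration: the distribution of X1 + ... + Xn over all
# 2^n Bernoulli outcomes matches the binomial formula C(n, m) p^m (1-p)^(n-m).
n, p = 6, 0.3
pmf = [0.0] * (n + 1)
for outcome in product((0, 1), repeat=n):
    prob = 1.0
    for x in outcome:
        prob *= p if x == 1 else 1 - p
    pmf[sum(outcome)] += prob
for m in range(n + 1):
    assert abs(pmf[m] - math.comb(n, m) * p**m * (1 - p)**(n - m)) < 1e-12
```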
4.5 Maximal Values

We have seen above that sums of Bernoulli random variables do not have the same type of distribution as their summands, but Bernoulli distributions are stable with respect to another operation.

Exercise 4.2 Let X_1, ..., X_n be independent Bernoulli random variables, and let

M_n = max{X_1, ..., X_n}.

Show that M_n also has a Bernoulli distribution.

Quite often, Bernoulli random variables appear as random indicators of different events.
Example 4.1 Let us consider a random sequence a_1, a_2, ..., a_n of length n, which consists of zeros and ones. We suppose that a_1, a_2, ..., a_n are independent random variables taking on values 0 and 1 with probabilities r and 1 − r, respectively. We say that a peak is present at point k (k = 2, 3, ..., n − 1) if a_{k−1} = a_{k+1} = 0 and a_k = 1. Let N_n be the total number of peaks present in the sequence a_1, a_2, ..., a_n. What is the expected value of N_n?

To find EN_n, let us introduce the events

A_k = {a_{k−1} = 0, a_k = 1, a_{k+1} = 0}

and the random indicators

X_k = 1{A_k},  k = 2, 3, ..., n − 1.

Note that X_k = 1 if A_k happens, and X_k = 0 otherwise. In this case,

P{X_k = 1} = 1 − P{X_k = 0} = P{A_k} = P{a_{k−1} = 0, a_k = 1, a_{k+1} = 0} = (1 − r)r²

and

X_k ~ Be(p),  k = 2, 3, ..., n − 1,

where p = (1 − r)r². Now, it is easy to see that

EX_k = (1 − r)r²,  k = 2, 3, ..., n − 1,

and

EN_n = E(X_2 + ··· + X_{n−1}) = (n − 2)(1 − r)r².
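The answer EN_n = (n − 2)(1 − r)r² can be verified exactly for a small n by enumerating every 0/1 sequence together with its probability (the values n = 6 and r = 0.6 are our choices for illustration):

```python
from itertools import product

# Exact check of Example 4.1 by enumeration: with P{a_k = 0} = r, the expected
# number of peaks in a 0/1 sequence of length n equals (n - 2)(1 - r)r^2.
n, r = 6, 0.6
expected_peaks = 0.0
for seq in product((0, 1), repeat=n):
    prob = 1.0
    for a in seq:
        prob *= r if a == 0 else 1 - r
    peaks = sum(seq[k - 1] == 0 and seq[k] == 1 and seq[k + 1] == 0
                for k in range(1, n - 1))
    expected_peaks += prob * peaks
assert abs(expected_peaks - (n - 2) * (1 - r) * r**2) < 1e-9
```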
In addition to the classical Bernoulli distributed random variables, there is one more class of Bernoulli distributions which is often encountered. These involve random variables Y_1, Y_2, ..., which take on values ±1. Based on these random variables, the sums S_n = Y_1 + ··· + Y_n (n = 1, 2, ...) form different discrete random walks on the integer-valued lattice and result in some interesting probability problems.

4.6 Relationships with Other Distributions
(a) We have shown that convolutions of Bernoulli distributions give rise to the binomial distribution, which is discussed in detail in Chapter 5.
(b) Let X_1, X_2, ... be independent Be(p), 0 < p < 1, random variables. Introduce a new random variable N as

N = min{j : X_{j+1} = 0};

that is, N is simply the number of 1's in the sequence X_1, X_2, ... that precede the first zero. It is easy to see that N can take on values 0, 1, 2, ..., and its probability distribution is

P{N = n} = P{X_1 = 1, X_2 = 1, ..., X_n = 1, X_{n+1} = 0}
         = P{X_1 = 1} P{X_2 = 1} ··· P{X_n = 1} P{X_{n+1} = 0}
         = (1 − p)p^n,  n = 0, 1, ....

This distribution, called a geometric distribution, is discussed in detail in Chapter 6.
CHAPTER 5

BINOMIAL DISTRIBUTION

5.1 Introduction

As shown in Chapter 4, convolutions of Bernoulli distributions give rise to a new distribution that takes on a fixed set of integer values 0, 1, ..., n with probabilities

p_m = P{X = m} = C(n, m) p^m (1 − p)^{n−m},  m = 0, 1, ..., n,    (5.1)

where 0 ≤ p ≤ 1 and n = 1, 2, .... The parameter p in (5.1) can be equal to 0 or 1, but in these situations X has the degenerate distributions D(0) and D(n), respectively, which are naturally special cases of these distributions. The probability distribution in (5.1) is called a binomial distribution.
5.2 Notations

Let X have the distribution in (5.1). Then, we say that X is a binomially distributed random variable, and denote it by

X ~ B(n, p).

We know that linear transformations

Y = a + hX,  −∞ < a < ∞, h > 0,

have the same type of distributions and, hence, we need to study only the standard distribution of the given type. If X satisfies (5.1) and Y = a + hX, then Y takes on values a, a + h, ..., a + nh and

P{Y = a + mh} = C(n, m) p^m (1 − p)^{n−m},  m = 0, 1, ..., n.    (5.2)

We say that Y belongs to the binomial type of distribution, and denote it by

Y ~ B(n, p, a, h),

a and h > 0 being location and scale parameters, respectively.
5.3 Useful Representation

As we know from Chapter 4, binomial random variables can be expressed as sums of independent Bernoulli random variables. Let Z_1, Z_2, ... be independent Bernoulli Be(p) random variables and X ~ B(n, p). Then the following equality holds for any n = 1, 2, ...:

X =d Z_1 + Z_2 + ··· + Z_n.    (5.3)

Due to (5.3), we can easily obtain the generating function and the characteristic function of binomial random variables from the corresponding expressions for Bernoulli random variables.
5.4 Generating Function and Characteristic Function

Let X ~ B(n, p). It follows from (5.3) and the independence of the Z_i's that the generating function of X is

P_X(s) = Es^X = Es^{Z_1+···+Z_n} = Es^{Z_1} Es^{Z_2} ··· Es^{Z_n} = (P_Z(s))^n = (1 − p + ps)^n,    (5.4)

where P_Z(s) = 1 − p + ps is the common generating function of the Bernoulli random variables Z_1, Z_2, ..., Z_n. Let f_X(t) be the characteristic function of X. From (5.4), we readily obtain

f_X(t) = (1 − p + pe^{it})^n.    (5.5)

As a consequence, if Y ~ B(n, p, a, h), then Y =d a + hX and

f_Y(t) = e^{iat}(1 − p + pe^{iht})^n.    (5.6)

5.5 Moments
Equality (5.3) immediately yields

EX = nEZ_1 = np,    (5.7)

as well as

Var X = n Var Z_1 = np(1 − p).    (5.8)

Other moments of X can be found by differentiating the generating function in (5.4).
Factorial moments of positive order:

μ_k = EX(X − 1)···(X − k + 1) = P_X^{(k)}(1) = n(n − 1)···(n − k + 1)p^k,  k = 1, 2, ....    (5.9)

In particular, we have

μ_1 = np,    (5.10)
μ_2 = n(n − 1)p²,    (5.11)
μ_3 = n(n − 1)(n − 2)p³,    (5.12)

and

μ_n = n! p^n.    (5.13)

Note that

μ_{n+1} = μ_{n+2} = ··· = 0.    (5.14)
Factorial moments of negative order: For any k = 1, 2, ...,

μ_{−k} = E{1/((X + 1)(X + 2)···(X + k))}.    (5.15)

Exercise 5.1 Show that we can use the equality

μ_{−k} = P_X^{(−k)}(1),  k = 1, 2, ...,

under the assumption that the derivative of negative order −k is understood as the k-fold repeated integral of P_X(s) over [0, 1].

Now we can use (5.15) to find μ_{−1} and μ_{−2} as follows:

μ_{−1} = E{1/(X + 1)} = ∫₀¹ (1 − p + ps)^n ds = (1 − (1 − p)^{n+1})/((n + 1)p),    (5.16)

μ_{−2} = E{1/((X + 1)(X + 2))} = ∫₀¹ ∫₀ᵗ (1 − p + ps)^n ds dt
       = (1 − (1 − p)^{n+1}{1 + (n + 1)p})/((n + 1)(n + 2)p²).    (5.17)
Moments about zero: Relations (5.10)–(5.13) readily imply that

α_1 = EX = np,    (5.18)
α_2 = EX² = n(n − 1)p² + np,    (5.19)
α_3 = EX³ = n(n − 1)(n − 2)p³ + 3n(n − 1)p² + np,    (5.20)

and

α_4 = EX⁴ = n(n − 1)(n − 2)(n − 3)p⁴ + 6n(n − 1)(n − 2)p³ + 7n(n − 1)p² + np.    (5.21)

Central moments: From (5.18) and (5.19), we obtain [see also (5.8)] the variance as

Var X = β_2 = np(1 − p).    (5.22)

Similarly, we find from (5.18)–(5.21) the third and fourth central moments as

μ_3 = α_3 − 3α_2α_1 + 2α_1³ = np(1 − p)(1 − 2p)    (5.23)

and

μ_4 = α_4 − 4α_3α_1 + 6α_2α_1² − 3α_1⁴ = np(1 − p){1 + 3(n − 2)p(1 − p)}.    (5.24)
Shape characteristics: From (5.22)–(5.24), we find Pearson's coefficients of skewness and kurtosis as

γ_1 = μ_3/β_2^{3/2} = (1 − 2p)/{np(1 − p)}^{1/2}    (5.25)

and

γ_2 = μ_4/β_2² = 3 + (1 − 6p(1 − p))/(np(1 − p)),    (5.26)

respectively. From (5.25), it is clear that the binomial distribution is negatively skewed when p > 1/2 and is positively skewed when p < 1/2. The coefficient of skewness is 0 when p = 1/2 (in this case, the distribution is also symmetric). Equation (5.25) also reveals that the skewness decreases as n increases. Furthermore, we note from (5.26) that the binomial distribution may be leptokurtic, mesokurtic, or platykurtic, depending on the value of p. However, it is clear that γ_1 and γ_2 tend to 0 and 3, respectively, as n tends to ∞ (which are, as mentioned in Section 1.4, the coefficients of skewness and kurtosis of a normal distribution). Plots of the binomial mass function presented in Figures 5.1 and 5.2 reflect these properties.
Exercise 5.2 Show that the binomial B(n, p) distribution is leptokurtic for p < ½(1 − 1/√3) or p > ½(1 + 1/√3), platykurtic for ½(1 − 1/√3) < p < ½(1 + 1/√3), and mesokurtic for p = ½(1 ± 1/√3).
5.6 Maximum Probabilities

Among all the binomial B(n, p) probabilities,

p_m = C(n, m) p^m (1 − p)^{n−m},  m = 0, 1, ..., n,

it will be of interest to find the maximum values. It appears that there are two different cases:

(a) If m* = (n + 1)p is an integer, then

p_{m*−1} = p_{m*} = max_{0≤m≤n} p_m.

One can also consider the general form, with location parameter a and scale parameter h > 0, of the geometric random variable. In this case we denote it by Y ~ G(p, a, h), which is a random variable concentrated at points a, a + h, a + 2h, ... with probabilities

P{Y = a + nh} = (1 − p)p^n,  n = 0, 1, ....    (6.2)
6.3 Tail Probabilities

If X ~ G(p), then the tail probabilities of X are very simple. In fact,

P{X ≥ n} = p^n,  n = 0, 1, ....    (6.3)
Exercise 6.1 Let X I , random variables, and
N
G ( p k ) ,k
=
1 , 2 , . . . , n, be a seyuencc of independent
mrL= mi11 &. lions,it will be showri that any geometric distribution is infinitely divisible.
6.8 Entropy

For geometric distributions, one can easily derive the entropy.

Exercise 6.3 Show that the entropy of X ~ G(p), 0 < p < 1, is given by

H(X) = −ln(1 − p) − (p/(1 − p)) ln p;    (6.37)

in particular, H(X) = 2 ln 2 if p = 1/2.
6.9 Conditional Probabilities

In Chapter 5 we derived the hypergeometric distribution by considering the conditional distribution of X_1, given that X_1 + X_2 is fixed. We will now try to find the corresponding conditional distribution in the case when we have geometric distributions. Let X_1 and X_2 be independent and identically distributed G(p) random variables, and Y = X_1 + X_2. In our case, Y = 0, 1, 2, ... and [see (6.30) when n = 2]

P{Y = r} = (r + 1)(1 − p)² p^r,  r = 0, 1, 2, ....

We can find the required conditional probabilities as

P{X_1 = t | X_1 + X_2 = r} = P{X_1 = t, X_2 = r − t}/P{X_1 + X_2 = r}
                           = (1 − p)p^t · (1 − p)p^{r−t}/((r + 1)(1 − p)² p^r)
                           = 1/(r + 1),  t = 0, 1, ..., r.    (6.38)

It follows from (6.38) that the conditional distribution of X_1, given that X_1 + X_2 = r, becomes the discrete uniform DU(r + 1) distribution. Geometric distributions possess one more interesting property concerning conditional probabilities.
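A two-line computation verifies (6.38) for concrete parameter values (our choices):

```python
# Check of (6.38): for iid geometric G(p), P{X1 = t | X1 + X2 = r} = 1/(r+1),
# so the conditional distribution is discrete uniform DU(r+1).
p, r = 0.4, 6
pmf = lambda n: (1 - p) * p**n            # P{X = n}, n = 0, 1, ...
p_sum = (r + 1) * (1 - p)**2 * p**r       # P{X1 + X2 = r}
for t in range(r + 1):
    assert abs(pmf(t) * pmf(r - t) / p_sum - 1 / (r + 1)) < 1e-12
```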
Exercise 6.4 Let X ~ G(p), 0 < p < 1. Show that

P{X ≥ n + m | X ≥ m} = P{X ≥ n}    (6.39)

holds for any n = 0, 1, ... and m = 0, 1, ....

Remark 6.1 Imagine that in a sequence of independent trials, we have m "successes" in a row and no "failures". It turns out that the additional number of "successes" until the first "failure" that we shall observe now has the same geometric G(p) distribution. Among all discrete distributions, the geometric distribution is the only one which possesses this lack of memory property in (6.39).

Remark 6.2 Instead of defining a geometric random variable as the number of "successes" until the appearance of the first "failure" (denoted by X), we could define it alternatively as the number of "trials" required for the first "failure" (denoted by Z). It is clear that Z = X + 1, so that we readily have the mass function of Z as [see (6.1)]

P{Z = n} = (1 − p)p^{n−1},  n = 1, 2, ....    (6.40)
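The lack-of-memory property (6.39) follows at once from the tail formula (6.3), as the following sketch shows for a sample value of p (our choice):

```python
# Check of the lack-of-memory property (6.39), using P{X >= n} = p^n from (6.3).
p = 0.55
tail = lambda n: p**n
# (6.3) agrees with summing the pmf (1-p)p^j over j >= n (truncated series):
assert abs(sum((1 - p) * p**j for j in range(4, 200)) - tail(4)) < 1e-12
for m in range(6):
    for n in range(6):
        # P{X >= n + m | X >= m} = p^(n+m)/p^m = p^n = P{X >= n}
        assert abs(tail(n + m) / tail(m) - tail(n)) < 1e-12
```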
From (6.4), we then have the generating function of Z as

P_Z(s) = Es^Z = Es^{X+1} = sEs^X = s(1 − p)/(1 − ps),  if |s| < 1/p.    (6.41)

Many authors, in fact, take these as the 'standard form' of the geometric distribution [instead of (6.1) and (6.4)].
6.10 Geometric Distribution of Order k

Consider a sequence of Bernoulli(p) trials. Let Z_k denote the number of "trials" required for "k consecutive failures" to appear for the first time. Then, it can be shown that the generating function of Z_k is given by

P_{Z_k}(s) = Es^{Z_k} = (1 − p)^k s^k (1 − s + ps)/(1 − s + p(1 − p)^k s^{k+1}).    (6.42)

The corresponding distribution has been named the geometric distribution of order k. Clearly, the generating function in (6.42) reduces to that of the geometric distribution in (6.41) when k = 1. For a review of this and many other "run-related" distributions, one may refer to the recent book by Balakrishnan and Koutras (2002).

Exercise 6.5 Show that (6.42) is indeed the generating function of Z_k.

Exercise 6.6 From the generating function of Z_k in (6.42), derive the mean and variance of Z_k.
CHAPTER 7

NEGATIVE BINOMIAL DISTRIBUTION

7.1 Introduction

In Chapter 6 we discussed the convolution of n geometric G(p) distributions and derived a new distribution in (6.30) whose generating function has the form

P_n(s) = (1 − p)^n (1 − ps)^{−n}.

As we noted there, for any positive integer n, the corresponding random variable has a suitable interpretation in the scheme of independent trials as the distribution of the total number of "successes" until the nth "failure". For this interpretation, of course, n has to be an integer, but it will be interesting to see what will happen if we take the binomial of arbitrary negative order as the generating function of some random variable. Specifically, let us consider the function

P_α(s) = (1 − p)^α (1 − ps)^{−α},  α > 0,    (7.1)

which is a generating function in the case when all the coefficients p_0, p_1, ... in the power expansion Σ_{k=0}^∞ p_k s^k of the RHS of (7.1) are nonnegative and their sum equals 1. The second condition holds since Σ_{k=0}^∞ p_k = P_α(1) = 1 for any α > 0. The function in (7.1) has derivatives of all orders, and simple calculations enable us to find the coefficients p_k in the expansion P_α(s) = Σ_{k=0}^∞ p_k s^k
as

p_k = P_α^{(k)}(0)/k! = (1 − p)^α p^k α(α + 1)···(α + k − 1)/k! = C(α + k − 1, k)(1 − p)^α p^k,    (7.2)

and p_k > 0 for any k = 0, 1, 2, .... Thus, we have obtained the following assertion: For any α > 0, the function in (7.1) is the generating function of the distribution concentrated at the points 0, 1, 2, ... with probabilities as in (7.2). It should be mentioned here that negative binomial distributions and some of their generalized forms have found important applications in actuarial analysis; see, for example, the book of Klugman, Panjer and Willmot (1998).
a.nd p ~ .> 0 for a.ny k = 0 , 1 , 2 , . . . . Thus, we have obtained the following assertion: For any CY > 0, the function in (7.1) is generating the distribution concentrat,ed at, points 0, 1 , 2 , . . . with probabilities as in (7.2). It should be meritioried here that negative binomial distributions and some of their genedized forms have found importa.nt applicat,ioris in actuarial w.na.lysis; see, for example, the book of Klugman, Panjer and Willmot (19%).
7.2
Notations
Let a random variable X take on values 0, 1, ... with probabilities

p_k = P{X = k} = C(α + k − 1, k)(1 − p)^α p^k,  k = 0, 1, ....

We say that X has the negative binomial distribution with parameters α and p (α > 0, 0 < p < 1), and denote it by

X ~ NB(α, p).

Note that NB(1, p) = G(p) and, hence, the geometric distribution is a particular case of the negative binomial distribution. Sometimes, the negative binomial distribution with an integer parameter α is called a Pascal distribution.
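For an integer parameter the pmf (7.2) can be verified against the n-fold convolution of geometric mass functions, matching the interpretation given in Section 7.1 (the parameter values and truncation length are our choices):

```python
import math

# Check: for an integer parameter alpha = n, the probabilities (7.2)
# coincide with the n-fold convolution of geometric G(p) mass functions.
p, n, kmax = 0.35, 3, 40
geo = [(1 - p) * p**k for k in range(kmax + 1)]
conv = geo[:]
for _ in range(n - 1):
    conv = [sum(conv[j] * geo[k - j] for j in range(k + 1))
            for k in range(kmax + 1)]
for k in range(kmax + 1):
    nb = math.comb(n + k - 1, k) * (1 - p)**n * p**k
    assert abs(conv[k] - nb) < 1e-12
```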
7.3 Generating Function and Characteristic Function

There is no necessity to calculate the generating function of this distribution: unlike the earlier situations, the generating function was the primary object in this case, which then yielded the probabilities. Recall that if X ~ NB(α, p), then

P_X(s) = Es^X = ((1 − p)/(1 − ps))^α.    (7.3)

From (7.3), we immediately have the characteristic function of X as

f_X(t) = ((1 − p)/(1 − pe^{it}))^α.    (7.4)

7.4 Moments

From the expression of the generating function in (7.3), we can readily find the factorial moments.
Factorial moments of positive order:

μ_k = EX(X − 1)···(X − k + 1) = P_X^{(k)}(1) = (Γ(α + k)/Γ(α)) p^k/(1 − p)^k,  k = 1, 2, ...,    (7.5)

where

Γ(x) = ∫₀^∞ e^{−z} z^{x−1} dz

is the complete gamma function. In particular, we have

μ_1 = αp/(1 − p),    (7.6)
μ_2 = α(α + 1)p²/(1 − p)²,    (7.7)
μ_3 = α(α + 1)(α + 2)p³/(1 − p)³,    (7.8)

and

μ_4 = α(α + 1)(α + 2)(α + 3)p⁴/(1 − p)⁴.    (7.9)

Note that if α = 1, then (7.5)–(7.9) reduce to the corresponding moments for the geometric distribution given in (6.6)–(6.10), respectively.
Factorial moments of negative order:

μ_{−k} = E{1/((X + 1)(X + 2)···(X + k))},    (7.10)

and in particular,

μ_{−1} = E{1/(X + 1)} = ((1 − p)^α − (1 − p))/((α − 1)p)    (7.11)

if α > 0 and α ≠ 1. The expression for μ_{−1} for the case α = 1 is as given in (6.12).

Moments about zero: From the factorial moments above, we obtain

α_1 = EX = αp/(1 − p),    (7.12)
α_2 = α(α + 1)p²/(1 − p)² + αp/(1 − p),    (7.13)
α_3 = α(α + 1)(α + 2)p³/(1 − p)³ + 3α(α + 1)p²/(1 − p)² + αp/(1 − p),    (7.14)

and

α_4 = α(α + 1)(α + 2)(α + 3)p⁴/(1 − p)⁴ + 6α(α + 1)(α + 2)p³/(1 − p)³ + 7α(α + 1)p²/(1 − p)² + αp/(1 − p).    (7.15)
Central moments: From (7.12) and (7.13), we readily find the variance of X as

β_2 = Var X = α_2 − α_1² = αp/(1 − p)².    (7.16)

Similarly, from (7.12)–(7.15), we find the third and fourth central moments of X as

μ_3 = α_3 − 3α_2α_1 + 2α_1³ = αp(1 + p)/(1 − p)³    (7.17)

and

μ_4 = α_4 − 4α_3α_1 + 6α_2α_1² − 3α_1⁴ = αp{1 + (3α + 4)p + p²}/(1 − p)⁴,    (7.18)

respectively.
Shape characteristics: From (7.16)–(7.18), we find Pearson's coefficients of skewness and kurtosis as

γ_1 = μ_3/β_2^{3/2} = (1 + p)/(αp)^{1/2}    (7.19)

and

γ_2 = μ_4/β_2² = {1 + (3α + 4)p + p²}/(αp) = 3 + (1 + 4p + p²)/(αp),    (7.20)

respectively. It is quite clear from (7.19) that the negative binomial distribution is positively skewed. Furthermore, we observe from (7.20) that the distribution is leptokurtic for all values of the parameters α and p. We also observe that, as α tends to ∞, γ_1 and γ_2 in (7.19) and (7.20) tend to 0 and 3 (the values corresponding to the normal distribution), respectively. Plots of the negative binomial mass function presented in Figures 7.1–7.3 reveal these properties.
7.5 Convolutions and Decompositions

Let X_1 ~ NB(α_1, p) and X_2 ~ NB(α_2, p) be two independent random variables, and Y = X_1 + X_2.

Exercise 7.1 Use (7.3) to establish that Y has a negative binomial NB(α_1 + α_2, p) distribution.
Figure 7.1. Plots of the negative binomial mass function when r = 2 (panels: p = 0.25, 0.5, 0.75; horizontal axis k).
Figure 7.2. Plots of the negative binomial mass function when r = 3 (panels: p = 0.25, 0.5, 0.75; horizontal axis k).
Figure 7.3. Plots of the negative binomial mass function when r = 4 (panels: p = 0.25, 0.5, 0.75; horizontal axis k).
Remark 7.1 Now we see that the sum of two or more independent random variables X_k ~ NB(α_k, p), k = 1, 2, ..., also has a negative binomial distribution. On the other hand, the assertion above enables us to conclude that any negative binomial distribution admits a decomposition with negative binomial components. In fact, for any n = 1, 2, ..., a random variable X ~ NB(α, p) can be represented as

X =d X_{1,n} + ··· + X_{n,n},

where X_{1,n}, ..., X_{n,n} are independent and identically distributed random variables having the NB(α/n, p) distribution. This means that any negative binomial distribution (including the geometric) is infinitely divisible.
7.6 Tail Probabilities

Let X ~ NB(n, p), where n is a positive integer. The interpretation of X based on independent trials (each resulting in "successes" and "failures") gives us a way to obtain a simple form for the tail probabilities as

P{X ≥ m} = Σ_{k=m}^{m+n−1} C(m + n − 1, k) p^k (1 − p)^{m+n−1−k}.    (7.21)

In fact, the event {X ≥ m} is equivalent to the following: if we fix the outcomes of the first m + n − 1 trials, then the number of "successful" trials must be at least m. Let Y be the number of "successes" in m + n − 1 independent trials. We know that Y has the binomial B(m + n − 1, p) distribution and

P{X ≥ m} = P{Y ≥ m}.    (7.22)

Moreover, (5.36) gives the following expression for the RHS of (7.22):

P{Y ≥ m} = ((m + n − 1)!/((m − 1)!(n − 1)!)) ∫₀^p x^{m−1}(1 − x)^{n−1} dx.    (7.23)

Since P{X ≥ m} = P{Y ≥ m}, upon combining (7.21)–(7.23), we obtain

P{X ≥ m} = ((m + n − 1)!/((m − 1)!(n − 1)!)) ∫₀^p x^{m−1}(1 − x)^{n−1} dx.    (7.24)

As a special case of (7.24), we get equality (6.3) for the geometric G(p) distribution when n = 1.
7.7 Limiting Distributions
For λ > 0, let us consider X_α ~ NB(α, λ/α), where λ/α < 1. The generating function of X_α in this case is

P_α(s) = ((1 − λ/α)/(1 − λs/α))^α.    (7.25)

We see immediately that

P_α(s) → e^{λ(s−1)}  as  α → ∞.    (7.26)

Note that

g(s) = e^{λ(s−1)}    (7.27)

is the generating function of a random variable Y taking on values 0, 1, 2, ... with probabilities

p_k = e^{−λ} λ^k/k!,  k = 0, 1, 2, ....    (7.28)

In Chapter 5 we mentioned that Y has the Poisson distribution. Now relation (7.26) implies that for any k = 0, 1, ...,

P{X_α = k} → e^{−λ} λ^k/k!  as  α → ∞;    (7.29)

that is, the Poisson distribution is the limit for the sequence of NB(α, λ/α) distributions as α → ∞. Next, let X_α ~ NB(α, p) and

W_α = (X_α − EX_α)/(Var X_α)^{1/2}.    (7.30)

Let f_α(t) be the characteristic function of W_α, which can be derived by standard methods from (7.4). It turns out that for any t,

f_α(t) → e^{−t²/2}  as  α → ∞.    (7.31)

Comparing this with (5.43), we note that the sequence W_α, as α → ∞, converges in distribution to the standard normal distribution, which we came across in Chapter 5 in a similar context.
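The Poisson limit (7.29) can be observed numerically by taking a large α (the values λ = 2 and α = 10⁶ are our choices; log-gamma is used to avoid overflow in the binomial coefficient):

```python
import math

# Numerical sketch of (7.29): the NB(alpha, lambda/alpha) probabilities
# approach the Poisson(lambda) probabilities as alpha grows.
def nb_pmf(alpha, p, k):
    log_coef = math.lgamma(alpha + k) - math.lgamma(alpha) - math.lgamma(k + 1)
    return math.exp(log_coef + alpha * math.log(1 - p) + k * math.log(p))

lam, alpha = 2.0, 1e6
for k in range(8):
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    assert abs(nb_pmf(alpha, lam / alpha, k) - poisson) < 1e-4
```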
CHAPTER 8

HYPERGEOMETRIC DISTRIBUTION

8.1 Introduction

In Chapter 5 we derived the hypergeometric distribution as the conditional distribution of X_1, given that X_1 + X_2 is fixed, where X_1 and X_2 were binomial random variables. A simpler situation wherein hypergeometric distributions arise is in connection with classical combinatorial problems. Suppose that an urn contains a red and b black balls, and suppose that n balls are drawn at random from the urn (without replacement). Let X be the number of red balls in the sample drawn. Then, it is clear that X takes on an integer value m such that

max(0, n − b) ≤ m ≤ min(n, a),    (8.1)

with probabilities

P{X = m} = C(a, m) C(b, n − m)/C(a + b, n).    (8.2)

8.2 Notations

In this case we say that X has a hypergeometric distribution with parameters n, a, and b, and denote it by

X ~ Hg(n, a, b).
Remark 8.1 Inequalities (8.1) give those integers m for which the RHS of (8.2) is nonzero. We have from (8.2) the following identity:

Σ_{m=max(0,n−b)}^{min(n,a)} C(a, m) C(b, n − m) = C(a + b, n).    (8.3)

To simplify our calculations, we will suppose in the sequel that n ≤ min(a, b), and hence the probabilities in (8.2) are positive for m = 0, 1, ..., n only. Identity (8.3) then becomes the following useful equality:

Σ_{m=0}^{n} C(a, m) C(b, n − m) = C(a + b, n).    (8.4)
8.3 Generating Function

If X ~ Hg(n, a, b), then we can write a formal expression for its generating function as

P_X(s) = (n!(a + b − n)!/(a + b)!) Σ_{m=0}^{n} (a! b!/(m!(a − m)!(n − m)!(b − n + m)!)) s^m.    (8.5)

It turns out that the RHS of (8.5) can be simplified if we use the Gaussian hypergeometric function ₂F₁[α, β; γ; s], which was introduced in Chapter 2. Then, the generating function in (8.5) becomes

P_X(s) = ₂F₁[−n, −a; b − n + 1; s]/₂F₁[−n, −a; b − n + 1; 1],    (8.6)
from which it becomes clear why the distribution has been given the name hypergeometric distribution.
8.4 Characteristic Function On applying the relation between generating function and characteristic function, we readily obtain the characteristic function from (8.6) t o be fx(t)
= 
8.5
E e Z t X= Px(e”) 2F1 [n, a;b  n + 1;eZL] * zF1[n, a; b  72 1;11
+
(8.7)
8.5 Moments

To begin with, we show how we can obtain the most important moments α₁ = EX and μ₂ = Var X using the "urn interpretation" of X. We have a red balls in the urn, numbered 1, 2, ..., a. Corresponding to each of the red balls, we introduce the random indicators Y₁, Y₂, ..., Y_a as follows:

Y_k = 1 if the kth red ball is drawn, and Y_k = 0 otherwise.

Note that

E Y_k = P{Y_k = 1} = n/(a+b),  k = 1, 2, ..., a,   (8.8)

and

Var Y_k = [n/(a+b)][1 − n/(a+b)],  k = 1, 2, ..., a.   (8.9)

It is not difficult to see that

X = Y₁ + Y₂ + ⋯ + Y_a.   (8.10)

It follows immediately from (8.10) that

EX = E(Y₁ + ⋯ + Y_a) = a E Y₁ = an/(a+b).   (8.11)

Now,

Var X = Σ_{k=1}^{a} Var Y_k + Σ_{k≠l} Cov(Y_k, Y_l).   (8.12)

Using the symmetry argument, we can rewrite (8.12) as

Var X = a Var Y₁ + a(a−1) Cov(Y₁, Y₂).   (8.13)

Now we only need to find the covariance on the RHS of (8.13). For this purpose, we have

Cov(Y₁, Y₂) = E(Y₁Y₂) − E Y₁ E Y₂ = P{Y₁Y₂ = 1} − [n/(a+b)]².   (8.14)

Note that

P{Y₁Y₂ = 1} = C(a+b−2, n−2) / C(a+b, n) = n(n−1) / [(a+b)(a+b−1)].   (8.15)

Finally, (8.9) and (8.13)-(8.15) readily yield

Var X = abn(a+b−n) / [(a+b)²(a+b−1)].   (8.16)
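A small Python check of (8.11) and (8.16) against the mass function (8.2); the parameter values are illustrative.

```python
from math import comb

def hg_pmf(m, n, a, b):
    # Hg(n, a, b) mass function (8.2)
    return comb(a, m) * comb(b, n - m) / comb(a + b, n)

n, a, b = 6, 10, 15   # illustrative parameters with n <= min(a, b)
mean = sum(m * hg_pmf(m, n, a, b) for m in range(n + 1))
var = sum(m * m * hg_pmf(m, n, a, b) for m in range(n + 1)) - mean ** 2

print(mean, a * n / (a + b))                                        # (8.11)
print(var, a * b * n * (a + b - n) / ((a + b) ** 2 * (a + b - 1)))  # (8.16)
```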
Factorial moments of positive order:

μ_(k) = E X(X−1)⋯(X−k+1)
 = [n! (a+b−n)! / (a+b)!] Σ_{m=k}^{n} a! b! / [(m−k)! (a−m)! (n−m)! (b−n+m)!]
 = [a! n! (a+b−n)! / ((a−k)! (n−k)! (a+b)!)] Σ_{m=0}^{n−k} C(a−k, m) C(b, n−k−m).   (8.17)

From (8.4), we know that

Σ_{m=0}^{n−k} C(a−k, m) C(b, n−k−m) = C(a+b−k, n−k),

using which we readily obtain

μ_(k) = a! n! (a+b−k)! / [(a−k)! (n−k)! (a+b)!]  for k ≤ n.   (8.18)

Note also that μ_(k) = 0 if k > n. In particular, the first four factorial moments are as follows:

μ_(1) = an / (a+b),   (8.19)

μ_(2) = a(a−1)n(n−1) / [(a+b)(a+b−1)],   (8.20)

μ_(3) = a(a−1)(a−2)n(n−1)(n−2) / [(a+b)(a+b−1)(a+b−2)],   (8.21)

μ_(4) = a(a−1)(a−2)(a−3)n(n−1)(n−2)(n−3) / [(a+b)(a+b−1)(a+b−2)(a+b−3)].   (8.22)
Factorial moments of negative order: Analogous to (8.17) and (8.18), we have

μ_(−k) = E[1 / ((X+1)(X+2)⋯(X+k))]
 = [a! n! (a+b−n)! / ((a+k)! (a+b)!)] Σ_{m=k}^{n+k} C(a+k, m) C(b, n+k−m)
 = [a! n! (a+b−n)! / ((a+k)! (a+b)!)] [C(a+b+k, n+k) − Σ_{m=0}^{k−1} C(a+k, m) C(b, n+k−m)].   (8.23)

In particular, we find

μ_(−1) = E[1/(X+1)] = (a+b+1) / [(a+1)(n+1)] − b! (a+b−n)! / [(a+1)(n+1)(b−n−1)! (a+b)!].   (8.24)
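Formula (8.24) can be checked directly against the mass function (8.2); a Python sketch with illustrative parameter values follows.

```python
from math import comb, factorial

def hg_pmf(m, n, a, b):
    return comb(a, m) * comb(b, n - m) / comb(a + b, n)

n, a, b = 6, 10, 15   # illustrative parameters
# direct computation of E[1/(X+1)] from the mass function
direct = sum(hg_pmf(m, n, a, b) / (m + 1) for m in range(n + 1))
# closed form (8.24)
closed = ((a + b + 1) / ((a + 1) * (n + 1))
          - factorial(b) * factorial(a + b - n)
          / ((a + 1) * (n + 1) * factorial(b - n - 1) * factorial(a + b)))
print(direct, closed)
```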
Moments about zero: Indeed, the RHSs of (8.11) and (8.19) coincide, and we have

α₁ = EX = μ_(1) = an/(a+b).

Furthermore, we can show that

α₂ = EX² = μ_(1) + μ_(2) = an/(a+b) + a(a−1)n(n−1)/[(a+b)(a+b−1)]   (8.25)

and

α₃ = EX³ = μ_(1) + 3μ_(2) + μ_(3)
 = an/(a+b) + 3a(a−1)n(n−1)/[(a+b)(a+b−1)] + a(a−1)(a−2)n(n−1)(n−2)/[(a+b)(a+b−1)(a+b−2)].   (8.26)

Remark 8.2 If n = 1, then the hypergeometric Hg(1, a, b) distribution coincides with the Bernoulli Be(a/(a+b)) distribution.

Exercise 8.1 Check that the expressions for moments are the same for the Hg(1, a, b) distribution as those for the Be(a/(a+b)) distribution.

Exercise 8.2 Derive the expressions of the second and third moments in (8.25) and (8.26), respectively.
8.6 Limiting Distributions

Let X_N ~ Hg(n, pN, (1−p)N), N = 1, 2, ..., where 0 < p < 1 and n is fixed. Then, we have

P{X_N = m} = [n! / (m!(n−m)!)] · pN(pN−1)⋯(pN−m+1) · (1−p)N((1−p)N−1)⋯((1−p)N−n+m+1) / [N(N−1)⋯(N−n+1)],

from which it is easy to see that for any fixed m = 0, 1, ..., n,

P{X_N = m} → C(n, m) p^m (1−p)^{n−m}  as N → ∞.   (8.27)

Thus, we get the binomial distribution as a limit of a special sequence of hypergeometric distributions.

Exercise 8.3 Let X_N ~ Hg(N, λN², N³), N = 1, 2, ... . Show then that for any fixed m = 0, 1, ...,

P{X_N = m} → λ^m e^{−λ} / m!  as N → ∞.   (8.28)

The Poisson distribution, which is the limit for the sequence of hypergeometric random variables presented in Exercise 8.3, is the subject of discussion in the next chapter.
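The convergence in (8.27) can also be watched numerically; a Python sketch (n = 5 and p = 0.4 are illustrative choices):

```python
from math import comb

def hg_pmf(m, n, a, b):
    return comb(a, m) * comb(b, n - m) / comb(a + b, n)

n, p = 5, 0.4
for N in (10, 100, 1000):
    a, b = int(p * N), int((1 - p) * N)   # pN red and (1-p)N black balls
    gap = max(abs(hg_pmf(m, n, a, b) - comb(n, m) * p**m * (1 - p)**(n - m))
              for m in range(n + 1))
    print(f"N={N:>5}: max |Hg - Binomial| = {gap:.2e}")
```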
CHAPTER 9

POISSON DISTRIBUTION

9.1 Introduction

The Poisson distribution arises naturally in many instances; for example, as we have already seen in the preceding chapters, it appears as a limiting distribution of some sequences of binomial, negative binomial, and hypergeometric random variables. In addition, due to its many interesting characteristic properties, it is also used as a probability model for the occurrence of rare events. A book-length account of Poisson distributions, discussing in great detail their various properties and applications, is available [Haight (1967)].

9.2 Notations

Let a random variable X take on values 0, 1, ... with probabilities

p_m = P{X = m} = e^{−λ} λ^m / m!,  m = 0, 1, ...,   (9.1)

where λ > 0. We say that X has a Poisson distribution with parameter λ, and denote it by

X ~ π(λ).

If Y = a + hX, −∞ < a < ∞, h > 0, then Y takes on values a, a+h, a+2h, ... with probabilities

P{Y = a + mh} = P{X = m} = e^{−λ} λ^m / m!,  m = 0, 1, ... .

This distribution also belongs to the Poisson type of distributions, and it will be denoted by π(λ, a, h). The standard Poisson π(λ) distribution is nothing but π(λ, 0, 1).
9.3 Generating Function and Characteristic Function

Let X ~ π(λ), λ > 0. From (9.1), we obtain the generating function of X as

P_X(s) = E s^X = Σ_{m=0}^{∞} s^m e^{−λ} λ^m / m! = e^{λ(s−1)}.   (9.2)

Then, the characteristic function of X has the form

f_X(t) = E e^{itX} = P_X(e^{it}) = exp{λ(e^{it} − 1)}.   (9.3)

9.4 Moments

The simple form of the generating function in (9.2) enables us to derive all the factorial moments easily.

Factorial moments of positive order:

μ_(k) = E X(X−1)⋯(X−k+1) = P_X^{(k)}(1) = λ^k,  k = 1, 2, ... .   (9.4)

In particular, we have:

μ_(1) = λ,   (9.5)

μ_(2) = λ²,   (9.6)

μ_(3) = λ³,   (9.7)

μ_(4) = λ⁴.   (9.8)

Factorial moments of negative order:

μ_(−k) = E[1 / ((X+1)(X+2)⋯(X+k))] = Σ_{m=0}^{∞} e^{−λ} λ^m / (m+k)! = λ^{−k} P{X ≥ k}.   (9.9)

In particular, we obtain

μ_(−1) = E[1/(X+1)] = (1 − e^{−λ}) / λ   (9.10)

and

μ_(−2) = E[1 / ((X+1)(X+2))] = (1 − e^{−λ} − λ e^{−λ}) / λ².   (9.11)
Moments about zero: From (9.5)-(9.8), we immediately obtain the first four moments about zero as follows:

α₁ = EX = λ,   (9.12)

α₂ = EX² = λ + λ²,   (9.13)

α₃ = EX³ = λ + 3λ² + λ³,   (9.14)

α₄ = EX⁴ = λ + 7λ² + 6λ³ + λ⁴.   (9.15)

Central moments: From (9.12) and (9.13), we readily find the variance of X as

μ₂ = Var X = α₂ − α₁² = λ.   (9.16)

Note that if X ~ π(λ), then EX = Var X = λ. Further, from (9.12)-(9.15), we also find the third and fourth central moments as

μ₃ = E(X − EX)³ = α₃ − 3α₂α₁ + 2α₁³ = λ   (9.17)

and

μ₄ = E(X − EX)⁴ = α₄ − 4α₃α₁ + 6α₂α₁² − 3α₁⁴ = λ + 3λ²,   (9.18)

respectively.

Shape characteristics: From (9.16)-(9.18), we obtain the coefficients of skewness and kurtosis as

γ₁ = μ₃ / μ₂^{3/2} = 1/√λ   (9.19)

and

γ₂ = μ₄ / μ₂² = 3 + 1/λ,   (9.20)

respectively. From (9.19), we see that the Poisson distribution is positively skewed for all values of λ. Similarly, we see from (9.20) that the distribution is also leptokurtic for all values of λ. Furthermore, we observe that as λ tends to ∞, the values of γ₁ and γ₂ tend to 0 and 3 (the values corresponding to the normal distribution), respectively. Plots of the Poisson mass function presented in Figure 9.1 reveal these properties.
9.5 Tail Probabilities

Let X ~ π(λ). Then

P{X ≥ m} = Σ_{k=m}^{∞} e^{−λ} λ^k / k!.   (9.21)

The RHS of (9.21) can be simplified and written in the form of an integral.

Exercise 9.1 Show that for any m = 1, 2, ...,

P{X ≥ m} = [1/(m−1)!] ∫₀^λ u^{m−1} e^{−u} du.   (9.22)

Remark 9.1 We will recall the expression on the RHS of (9.22) later when we discuss the gamma distribution in Chapter 20.
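Exercise 9.1 can be checked numerically by comparing the tail sum (9.21) with the integral in (9.22); in this Python sketch λ = 2.5 and m = 3 are illustrative, and the integral is approximated by a simple midpoint rule.

```python
import math

lam, m = 2.5, 3

# LHS of (9.22): the Poisson upper tail P{X >= m} from (9.21)
tail = 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(m))

# RHS of (9.22): (1/(m-1)!) * integral_0^lam u^(m-1) e^(-u) du, midpoint rule
N = 200_000
h = lam / N
integral = sum(((i + 0.5) * h) ** (m - 1) * math.exp(-(i + 0.5) * h)
               for i in range(N)) * h
rhs = integral / math.factorial(m - 1)

print(tail, rhs)
```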
9.6 Convolutions

Let X₁ ~ π(λ₁) and X₂ ~ π(λ₂) be independent random variables. Then, by making use of the generating functions P_{X₁}(s) and P_{X₂}(s), it is easy to show that Y = X₁ + X₂ has the generating function

P_Y(s) = P_{X₁}(s) P_{X₂}(s) = e^{(λ₁+λ₂)(s−1)},

and, hence, Y ~ π(λ₁ + λ₂). This simply means that convolutions of Poisson distributions are also distributed as Poisson.

9.7 Decompositions

Due to the result just stated, we know that for any X ~ π(λ) we can find a pair of independent Poisson random variables U ~ π(λ₁) and V ~ π(λ − λ₁), where 0 < λ₁ < λ, to obtain the decomposition

X =d U + V   (9.23)

(where =d denotes equality in distribution). Hence, any Poisson distribution is decomposable. Moreover, for any n = 1, 2, ..., the decomposition

X =d X₁ + X₂ + ⋯ + X_n   (9.24)

holds with X's being independent and identically distributed as π(λ/n). This simply implies that X is infinitely divisible for any λ. Raikov (1937b) established that if X admits decomposition (9.23), then both the independent nondegenerate components U and V necessarily have a Poisson type of distribution; that is, there exist constants −∞ < a < ∞ and λ₁ < λ such that

U ~ π(λ₁, a, 1) and V ~ π(λ − λ₁, −a, 1).

Thus, convolutions and decompositions of Poisson distributions always belong to the Poisson class of distributions.
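The convolution property can be confirmed numerically by convolving two Poisson mass functions; λ₁ = 1.5 and λ₂ = 2.0 in this Python sketch are illustrative.

```python
import math

def pois_pmf(m, lam):
    return math.exp(-lam) * lam**m / math.factorial(m)

lam1, lam2 = 1.5, 2.0
M = 40
# discrete convolution of the two mass functions
conv = [sum(pois_pmf(j, lam1) * pois_pmf(m - j, lam2) for j in range(m + 1))
        for m in range(M)]
gap = max(abs(conv[m] - pois_pmf(m, lam1 + lam2)) for m in range(M))
print(gap)   # the convolution matches the Poisson(lam1 + lam2) masses
```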
Figure 9.1. Plots of Poisson mass function (panels: Poisson(1), Poisson(4), Poisson(10))

9.8 Conditional Probabilities
Let X₁ ~ π(λ₁) and X₂ ~ π(λ₂) be independent random variables. Consider the conditional distribution of X₁ given that X₁ + X₂ is fixed. Since X₁ + X₂ ~ π(λ₁ + λ₂), we obtain for any n = 0, 1, 2, ... and m = 0, 1, ..., n that

P{X₁ = m | X₁ + X₂ = n} = P{X₁ = m, X₁ + X₂ = n} / P{X₁ + X₂ = n}
 = P{X₁ = m} P{X₂ = n−m} / P{X₁ + X₂ = n}
 = [e^{−λ₁} λ₁^m / m!] [e^{−λ₂} λ₂^{n−m} / (n−m)!] · n! / [e^{−(λ₁+λ₂)} (λ₁+λ₂)^n]
 = C(n, m) [λ₁/(λ₁+λ₂)]^m [λ₂/(λ₁+λ₂)]^{n−m}.   (9.25)

Thus, the conditional distribution of X₁, given that X₁ + X₂ = n, is simply the binomial B(n, λ₁/(λ₁+λ₂)) distribution.

Now, we will try to solve the inverse problem. Let X₁ and X₂ be independent random variables taking on values 0, 1, 2, ... with positive probabilities. Then, Y = X₁ + X₂ also takes on values 0, 1, 2, ... with positive probabilities. Suppose that for any n = 0, 1, ..., the conditional distribution of X₁, given that Y = n, is binomial B(n, p) for some parameter p, 0 < p < 1. Then, we are interested in determining the distributions of X₁ and X₂. It turns out that both distributions are Poisson. To see this, let

P{X₁ = m} = r_m > 0,  m = 0, 1, ...,

and

P{X₂ = l} = q_l > 0,  l = 0, 1, ... .

As seen above, the conditional probabilities P{X₁ = m | Y = n} result in the expression r_m q_{n−m} / P{Y = n}. In this situation, we get the equality

r_m q_{n−m} / P{Y = n} = [n! / (m!(n−m)!)] p^m (1−p)^{n−m},  m = 0, 1, ..., n.   (9.26)

Compare (9.26) with the same equality written for m + 1 in place of m, which has the form

r_{m+1} q_{n−m−1} / P{Y = n} = [n! / ((m+1)!(n−m−1)!)] p^{m+1} (1−p)^{n−m−1}.   (9.27)

It readily follows from (9.26) and (9.27) that

r_{m+1} q_{n−m−1} / (r_m q_{n−m}) = [(n−m)/(m+1)] · p/(1−p)   (9.28)

holds for any n = 1, 2, ... and m = 0, 1, ..., n−1. In particular, we can take m = 0 to obtain

q_n / q_{n−1} = r₁(1−p) / (r₀ p n).   (9.29)

Let us denote λ = r₁(1−p)/(r₀ p). Then, (9.29) implies that

q_n = (λ/n) q_{n−1} = [λ²/(n(n−1))] q_{n−2} = ⋯ = (λ^n/n!) q₀,  n = 1, 2, ... .   (9.30)

Since the probabilities q_n must sum to one, we immediately get q₀ = e^{−λ} and, consequently,

q_n = e^{−λ} λ^n / n!,  n = 0, 1, ...,   (9.31)

where λ is some positive constant. On substituting q_n into (9.28) and taking m = n − 1, we obtain

r_n / r_{n−1} = [p/(1−p)] · (λ/n).

Hence,

r_n = [λp/(1−p)]^n r₀ / n!,

and consequently,

r_n = e^{−λp/(1−p)} [λp/(1−p)]^n / n!,  n = 0, 1, ... .   (9.32)

Thus, only for X₁ and X₂ having Poisson distributions can the conditional distribution of X₁, given that X₁ + X₂ is fixed, be binomial.
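The binomial form of the conditional distribution in (9.25) can be verified directly from the Poisson mass functions; the parameter values in this Python sketch are illustrative.

```python
import math

lam1, lam2 = 2.0, 3.0
n = 7
p = lam1 / (lam1 + lam2)

def pois(m, lam):
    return math.exp(-lam) * lam**m / math.factorial(m)

for m in range(n + 1):
    # conditional probability P{X1 = m | X1 + X2 = n} from first principles
    cond = pois(m, lam1) * pois(n - m, lam2) / pois(n, lam1 + lam2)
    # binomial B(n, lam1/(lam1+lam2)) mass, per (9.25)
    binom = math.comb(n, m) * p**m * (1 - p)**(n - m)
    print(m, cond, binom)
```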
9.9 Maximal Probability

We may often be interested to know which of the probabilities p_m = e^{−λ} λ^m / m! (m = 0, 1, ...) are maximal.

Exercise 9.2 Show that there are the following two situations for maximal Poisson probabilities: If m₀ < λ < m₀ + 1, where m₀ is an integer, then p_{m₀} is maximal among all probabilities p_m (m = 0, 1, ...). If λ = m₀, then p_{m₀} = p_{m₀−1}, and in this case both these probabilities are maximal.
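A quick check of the two cases in Exercise 9.2; the values λ = 4.3 and λ = 4 in this Python sketch are illustrative.

```python
import math

def pois(m, lam):
    return math.exp(-lam) * lam**m / math.factorial(m)

# Non-integer lam: the unique maximum is attained at m0 = floor(lam)
lam = 4.3
probs = [pois(m, lam) for m in range(40)]
mode = probs.index(max(probs))
print(mode, math.floor(lam))

# Integer lam = m0: p_{m0} and p_{m0-1} tie for the maximum
lam = 4
print(pois(4, lam), pois(3, lam))
```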
9.10 Limiting Distribution

Let X_n ~ π(n), n = 1, 2, ..., and

W_n = (X_n − E X_n) / √(Var X_n) = (X_n − n) / √n.

Using characteristic functions of Poisson distributions, it is easy to find the limiting distribution of the random variables W_n.

Exercise 9.3 Let g_n(t) be the characteristic function of W_n. Then, prove that for any fixed t,

g_n(t) → e^{−t²/2}  as n → ∞.   (9.33)

Remark 9.2 As is already known, (9.33) means that the standard normal distribution is the limiting distribution for the sequence of Poisson random variables W₁, W₂, ... . At the same time, the Poisson distribution itself is a limiting form for the binomial and some other distributions, as noted in the preceding chapters.
9.11 Mixtures

In Chapter 5 we considered the distribution of binomial B(N, p) random variables in the case when N itself is a binomial random variable. Now we discuss the distribution of a B(N, p) random variable when N has the Poisson distribution. Later, we deal with more general Poisson mixtures of random variables.

Let X₁, X₂, ... be independent and identically distributed random variables having a common generating function P(s) = Σ_{k=0}^{∞} p_k s^k, where p_k = P{X_m = k}, m = 1, 2, ..., k = 0, 1, 2, ... . Let S₀ = 0 and S_n = X₁ + X₂ + ⋯ + X_n (n = 1, 2, ...) be the cumulative sums of X's. Then, the generating function of S_n has the form

P_n(s) = E s^{S_n} = E s^{X₁+⋯+X_n} = E s^{X₁} E s^{X₂} ⋯ E s^{X_n} = P^n(s),  n = 0, 1, ... .

Consider now an integer-valued random variable N taking on values 0, 1, 2, ... with probabilities

q_n = P{N = n},  n = 0, 1, ... .

Suppose that N is independent of X₁, X₂, ... . Further, suppose that Q(s) = Σ_{n=0}^{∞} q_n s^n is the generating function of N. Let us introduce now a new random variable, Y = S_N. The distribution of Y is clearly a mixture of distributions of S_n taken with probabilities q_n. Let us find the probabilities r_m = P{Y = m}, m = 0, 1, ... . Due to the theorem of total probability, we readily have

r_m = P{Y = m} = P{S_N = m} = Σ_{n=0}^{∞} P{S_N = m | N = n} P{N = n}.   (9.34)
Since the random variables S_n = X₁ + ⋯ + X_n and N are independent,

P{S_N = m | N = n} = P{S_n = m};

and hence we may write (9.34) as

r_m = Σ_{n=0}^{∞} P{S_n = m} q_n,  m = 0, 1, ... .   (9.35)

Then, the generating function of Y has the form

R(s) = E s^Y = Σ_{m=0}^{∞} r_m s^m = Σ_{m=0}^{∞} s^m Σ_{n=0}^{∞} P{S_n = m} q_n = Σ_{n=0}^{∞} q_n Σ_{m=0}^{∞} s^m P{S_n = m}.   (9.36)

We note that the sum

Σ_{m=0}^{∞} s^m P{S_n = m}

is simply the generating function of S_n and, as noted earlier, it is equal to P^n(s). This enables us to simplify the RHS of (9.36) and write

R(s) = Σ_{n=0}^{∞} q_n P^n(s) = Q(P(s)).   (9.37)

Relation (9.37) gives the generating function of Y, which provides a way to find the probabilities r₀, r₁, ... .
Suppose that we take V_n ~ B(n, p), and we want to find the distribution of Y = V_N, where N is the Poisson π(λ) random variable, which is independent of V₁, V₂, ... . We can then apply relation (9.37) to find the generating function of Y. In fact, for any n = 1, 2, ..., due to the properties of the binomial distribution, we have

V_n =d X₁ + X₂ + ⋯ + X_n,

where X₁, X₂, ... are independent Bernoulli Be(p) random variables. Therefore, we can consider Y as a sum of N independent Bernoulli random variables:

Y =d X₁ + X₂ + ⋯ + X_N.   (9.38)

In this case,

P(s) = E s^{X_k} = 1 − p + ps,  k = 1, 2, ...,

and therefore,

Q(s) = E s^N = e^{λ(s−1)}.

Hence, we readily obtain from (9.37) that

R(s) = E s^Y = Q(P(s)) = exp{λ(ps − p)} = exp{λp(s − 1)}.   (9.39)

Clearly, the RHS of (9.39) is the generating function of the Poisson π(λp) random variable, so we have

P{Y = m} = e^{−λp} (λp)^m / m!,  m = 0, 1, 2, ... .

Exercise 9.4 For any n = 1, 2, ..., let the random variable X_n take on values 0, 1, ..., n−1 with equal probabilities 1/n, and let N be a random variable independent of X's having a Poisson π(λ) distribution. Then, find the generating function of Y = X_N and hence P{Y = 0}.

More generally, when N is distributed as Poisson with parameter λ so that Q(s) = e^{λ(s−1)} [see (9.2)], (9.37) becomes

R(s) = e^{λ(P(s)−1)}.

This then is the generating function of the distribution of the sum of a Poisson number of i.i.d. random variables with generating function P(s). The corresponding distributions are called Poisson-stopped-sum distributions, a name introduced by Godambe and Patil (1975) and adopted since then by a number of authors including Douglas (1980) and Johnson, Kotz, and Kemp (1992). Some other names such as generalized Poisson [Feller (1943)], stuttering Poisson [Kemp (1967)], and compound Poisson [Feller (1968)] have also been used for these distributions.
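Relation (9.39) can be confirmed numerically by mixing binomial mass functions with Poisson weights, as in (9.35); in this Python sketch λ = 4 and p = 0.3 are illustrative, and the infinite sum over n is truncated where the Poisson weight is negligible.

```python
import math
from math import comb

lam, p = 4.0, 0.3

def pois(m, mu):
    return math.exp(-mu) * mu**m / math.factorial(m)

# Mix B(n, p) masses with Poisson(lam) weights, per (9.35)
M, NMAX = 25, 120
r = [sum(pois(n, lam) * comb(n, m) * p**m * (1 - p)**(n - m)
         for n in range(m, NMAX)) for m in range(M)]

gap = max(abs(r[m] - pois(m, lam * p)) for m in range(M))
print(gap)   # tiny: Y ~ Poisson(lam * p), as (9.39) asserts
```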
9.12 Rao-Rubin Characterization

In this section we present (without proof) the following celebrated Rao-Rubin characterization of the Poisson distribution [Rao and Rubin (1964)]. If X is a discrete random variable taking on only nonnegative integral values and the conditional distribution of Y, given X = x, is binomial B(x, p) (where p does not depend on x), then the distribution of X is Poisson if and only if

P(Y = y | Y = X) = P(Y = y | Y ≠ X).   (9.40)

An interesting physical interpretation of this result was given by Rao (1965) wherein X represents a naturally occurring quantity with some of its components not being counted (or destroyed) when it is observed, and Y represents the value remaining (that is, the components of X which are actually counted) after this destructive process. In other words, suppose that X is the original observation having a Poisson π(λ) distribution, and the probability that the original observation n gets reduced to x due to the destructive process is

C(n, x) p^x (1−p)^{n−x},  x = 0, 1, ..., n.   (9.41)

Now, if Y represents the resulting random variable, then

P(Y = y) = P(Y = y | destroyed) = P(Y = y | not destroyed) = e^{−pλ} (pλ)^y / y!;   (9.42)

furthermore, the condition in (9.42) also characterizes the Poisson distribution. Interestingly, Srivastava and Srivastava (1970) established that if the original observations follow a Poisson distribution and if the condition in (9.42) is satisfied, then the destructive process has to be binomial, as in (9.41). The Rao-Rubin characterization result above generated a lot of interest, which resulted in a number of different variations, extensions, and generalizations.
9.13 Generalized Poisson Distribution

Let X be a random variable defined over the nonnegative integers with its probability function as

p_m(θ, λ) = P{X = m} = θ(θ + mλ)^{m−1} e^{−θ−mλ} / m!,  m = 0, 1, 2, ...,   (9.43)

where θ > 0, max(−1, −θ/ℓ) < λ ≤ 1, and ℓ (≥ 4) is the largest positive integer for which θ + ℓλ > 0 when λ is negative. Then, X is said to have the generalized Poisson (GPD) distribution. A book-length account of generalized Poisson distributions, discussing in great detail their various properties and applications, is available and is due to Consul (1989). This distribution, also known as the Lagrangian-Poisson distribution, is a Poisson-stopped-sum distribution.

The special case when λ = αθ in (9.43) is called the restricted generalized Poisson distribution, and the probability function in this case becomes

p_m(θ, α) = P{X = m} = (1/m!) θ^m (1 + mα)^{m−1} e^{−θ(1+mα)},  m = 0, 1, ...,   (9.44)

where max(−1/θ, −1/4) < α < 1/θ.

Exercise 9.6 For the restricted generalized Poisson distribution in (9.44), show that the mean and variance are given by

θ / (1 − αθ)  and  θ / (1 − αθ)³,   (9.45)

respectively.
CHAPTER 10

MISCELLANEA

10.1 Introduction

In the last eight chapters we have seen the most popular and commonly encountered discrete distributions. There are a few natural generalizations of some of these distributions which are of interest not only for their mathematical niceties but also for their interesting probabilistic basis. These are described briefly in this chapter, and their basic properties are also presented.

10.2 Pólya Distribution

Pólya (1930) suggested this distribution in the context of the following combinatorial problem; see also Eggenberger and Pólya (1923). There is an urn with b black and r red balls. A ball is drawn at random, after which c + 1 (where c ≥ −1) new balls of the same color are added to the urn. We repeat this process successively n times. Let X_n be the total number of black balls observed in these n draws. It can easily be shown that X_n takes on values 0, 1, ..., n with the following probabilities:

p_k = C(n, k) · b(b+c)⋯(b+(k−1)c) · r(r+c)⋯(r+(n−k−1)c) / [(b+r)(b+r+c)(b+r+2c)⋯(b+r+(n−1)c)],   (10.1)

where k = 0, 1, ..., n and n = 1, 2, ... . Let us now denote

p = b/(b+r),  q = 1 − p = r/(b+r),  α = c/(b+r).   (10.2)

Then, the probability expression in (10.1) can be rewritten in the form

p_k = C(n, k) · p(p+α)⋯(p+(k−1)α) · q(q+α)⋯(q+(n−k−1)α) / [(1+α)(1+2α)⋯(1+(n−1)α)],   (10.3)

wherein we can forget that p, q, and α are quotients of integers and suppose only that 0 < p, q < 1 and α > −1/(n−1). Note that if α > 0, then (10.3), for 1 ≤ k ≤ n, can be expressed in terms of complete beta functions as

p_k = C(n, k) B(p/α + k, q/α + n − k) / B(p/α, q/α),   (10.4)

where the complete beta function is given by

B(x, y) = ∫₀¹ t^{x−1} (1−t)^{y−1} dt.
The distribution in (10.3) is called the Pólya distribution with parameters n, p, and α, where n = 1, 2, ..., 0 < p < 1, and α > −1/(n−1). If α = 0, then the probabilities in (10.3) simply become the binomial B(n, p) probabilities. As a matter of fact, in the Pólya urn scheme, we see that this case corresponds to the number of black balls in a sample of size n with replacement from the urn, in which the ratio of black balls equals p. Next, we know from Chapter 8 that a hypergeometric random variable can be interpreted as the number of black balls in a sample of size n without replacement from the urn. Clearly, this corresponds to the Pólya urn scheme with c = −1, that is, α = −1/(b+r). Thus, Pólya distributions include binomial as well as hypergeometric distributions as special cases, and hence may be considered as a "bridge" between these two types of distributions. Note also that if a random variable X has distribution (10.3), then

EX = np   (10.5)

and

Var X = np(1−p)(1 + αn)/(1 + α).   (10.6)
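The mass function (10.3) together with the moment formulas (10.5) and (10.6) can be checked numerically; the parameter values in this Python sketch are illustrative.

```python
from math import comb

def polya_pmf(k, n, p, alpha):
    # Pólya mass function (10.3)
    q = 1 - p
    num = 1.0
    for i in range(k):
        num *= p + i * alpha          # p(p+a)...(p+(k-1)a)
    for i in range(n - k):
        num *= q + i * alpha          # q(q+a)...(q+(n-k-1)a)
    den = 1.0
    for i in range(1, n):
        den *= 1 + i * alpha          # (1+a)(1+2a)...(1+(n-1)a)
    return comb(n, k) * num / den

n, p, alpha = 6, 0.3, 0.2
probs = [polya_pmf(k, n, p, alpha) for k in range(n + 1)]
mean = sum(k * pr for k, pr in enumerate(probs))
var = sum(k * k * pr for k, pr in enumerate(probs)) - mean ** 2
print(sum(probs))                                             # 1
print(mean, n * p)                                            # (10.5)
print(var, n * p * (1 - p) * (1 + alpha * n) / (1 + alpha))   # (10.6)
```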
10.3 Pascal Distribution

From Chapter 7 we know that the negative binomial NB(m, p) distribution with integer parameter m is the same as the distribution of the sum of m independent geometrically distributed G(p) random variables. If X ~ NB(m, p), then

P{X = k} = C(m+k−1, k) p^k (1−p)^m,  k = 0, 1, ... .   (10.7)

The appearance of this class of distributions in this manner was, therefore, quite natural. Later, the family of distributions (10.7) was enlarged to the family of negative binomial distributions NB(α, p) for any positive parameter α. Negative binomial distributions with integer parameter α are sometimes (especially if we want to emphasize that α is an integer) called the Pascal distributions.
10.4 Negative Hypergeometric Distribution

The Pascal distribution (i.e., the negative binomial distribution with integer parameter α) has the following "urn and balls" interpretation. In an urn that contains black and red balls, suppose that the proportion of black balls is p. Balls are drawn from the urn with replacement, which means that we have the Pólya urn scheme with c = 0. The sampling process is considered to be completed when we get the mth red ball (i.e., the mth "failure"). Then, the number of black balls (i.e., the number of "successes") in our random sample has the distribution given in (10.7).

Let us now consider an urn having r red and b black balls. Balls are drawn without replacement (i.e., the Pólya urn scheme with c = −1). The sampling process is once again considered to be completed when we get the mth red ball. Let X be the number of black balls in our sample. Then, it is not difficult to see that X takes on values 0, 1, ..., b with probabilities

P{X = k} = C(m+k−1, k) C(b+r−m−k, b−k) / C(b+r, b),  k = 0, 1, ..., b.   (10.8)

Such a random variable X is said to have the negative hypergeometric distribution with parameters b = 1, 2, ..., r = 1, 2, ..., and m = 1, 2, ..., r.

Exercise 10.1 Show that the mean and variance of the random variable above are given by

EX = mb / (r+1)   (10.9)

and

Var X = m(b+r+1)b(r−m+1) / [(r+1)²(r+2)],   (10.10)

respectively.
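A numerical check of (10.8)-(10.10); the values b = 7, r = 5, m = 3 in this Python sketch are illustrative (with m ≤ r).

```python
from math import comb

def nhg_pmf(k, b, r, m):
    # negative hypergeometric mass function (10.8)
    return comb(m + k - 1, k) * comb(b + r - m - k, b - k) / comb(b + r, b)

b, r, m = 7, 5, 3   # illustrative parameters
probs = [nhg_pmf(k, b, r, m) for k in range(b + 1)]
mean = sum(k * pr for k, pr in enumerate(probs))
var = sum(k * k * pr for k, pr in enumerate(probs)) - mean ** 2
print(sum(probs))                                                          # 1
print(mean, m * b / (r + 1))                                               # (10.9)
print(var, m * (b + r + 1) * b * (r - m + 1) / ((r + 1) ** 2 * (r + 2)))   # (10.10)
```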
Part II
CONTINUOUS DISTRIBUTIONS
CHAPTER 11

UNIFORM DISTRIBUTION

11.1 Introduction

The uniform distribution is the simplest of all continuous distributions, yet it is one of the most important distributions in continuous distribution theory. As shown in Chapter 2, the uniform distribution arises naturally as a limiting form of discrete uniform distributions.

11.2 Notations

We say that the random variable X has the standard uniform distribution if its pdf is of the form

p(x) = 1 if 0 < x < 1, and p(x) = 0 otherwise.
For the extreme order statistics m_n = min(X₁, ..., X_n) and M_n = max(X₁, ..., X_n) of a sample of independent standard uniform random variables, we have P{M_n ≤ x} = xⁿ for 0 < x < 1, and hence

E M_n = ∫₀¹ x d(xⁿ) = n/(n+1),   (11.32)

and, since m_n has the same distribution as 1 − M_n,

1 − E M_n = E m_n = 1/(n+1) → 0  as n → ∞.   (11.35)
This shows that for large n, 1 − M_n and m_n are close to zero. To examine the behavior of m_n and 1 − M_n for large n, we will now determine the asymptotic distributions of these random variables, after suitable normalization. It turns out that

G_n(x/n) = P{m_n < x/n} = P{n m_n < x} = 1 − (1 − x/n)ⁿ → 1 − e^{−x}  for x > 0,   (11.36)
as n → ∞. Thus, we have established that the asymptotic distribution function of n min(X₁, X₂, ..., X_n) is given by

G(x) = 0 if x ≤ 0, and G(x) = 1 − e^{−x} if x > 0.
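The convergence in (11.36) is easily observed numerically; the grid of x values and the values of n in this Python sketch are illustrative.

```python
import math

# Convergence in (11.36): P{n m_n < x} = 1 - (1 - x/n)**n -> 1 - exp(-x)
for n in (10, 100, 10000):
    gap = max(abs((1 - (1 - x / n) ** n) - (1 - math.exp(-x)))
              for x in [0.1 * i for i in range(1, 50)])   # grid of x in (0, 5)
    print(f"n={n:>6}: max gap = {gap:.2e}")
```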
CHAPTER 14

POWER DISTRIBUTION

The standard power distribution with shape parameter α has the cdf

F_α(x) = x^α,  0 < x < 1,  α > 0,   (14.2)

and the corresponding pdf is

p_α(x) = α x^{α−1},  0 < x < 1,  α > 0.   (14.3)
In the special case when α = 1, we have the standard uniform distribution. The linear transformation Y = a + hX gives us a general form of the power distribution. The corresponding cdf is

G_α(x) = ((x−a)/h)^α,  a < x < a+h,  α > 0,   (14.4)

and the pdf is

g_α(x) = (α/h^α)(x−a)^{α−1},  a < x < a+h,   (14.5)

where α > 0, −∞ < a < ∞, and h > 0 are the shape, location, and scale parameters of the power distribution, respectively. We will use the notation

X ~ Po(α, a, h)

to denote the random variable X having cdf (14.4) and pdf (14.5).
Remark 14.1 Let U have the standard uniform U(0,1) distribution, and let X ~ Po(α, a, h). Then,

X =d a + h U^{1/α}   (14.6)

(the equality being in distribution).

14.3 Distributions of Maximal Values
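Representation (14.6) can be illustrated by simulation; the parameters and the seed in this Python sketch are illustrative, and the empirical cdf is compared with (14.4).

```python
import random

random.seed(7)
alpha, a, h = 2.5, 1.0, 3.0

# Representation (14.6): X = a + h * U**(1/alpha) with U ~ U(0, 1)
sample = [a + h * random.random() ** (1 / alpha) for _ in range(200_000)]

# Empirical cdf at a few points vs. the Po(alpha, a, h) cdf (14.4)
for x in (1.5, 2.5, 3.5):
    emp = sum(v <= x for v in sample) / len(sample)
    cdf = ((x - a) / h) ** alpha
    print(x, round(emp, 3), round(cdf, 3))
```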
Let X₁, X₂, ..., X_n be independent random variables having the same power Po(α, a, h) distribution, and let M_n = max(X₁, X₂, ..., X_n). It is then easy to see that

P{M_n ≤ x} = ((x−a)/h)^{nα},  a < x < a+h,

so that M_n ~ Po(nα, a, h); that is, the power distributions are closed under maxima.

CHAPTER 15

PARETO DISTRIBUTION

15.1 Introduction

The standard Pareto distribution is given by the cdf

F_α(x) = P{X ≤ x} = 1 − x^{−α},  x ≥ 1,   (15.1)

and the pdf

p_α(x) = α x^{−α−1},  x ≥ 1.   (15.2)
A book-length account of Pareto distributions, discussing in great detail their various properties and applications, is available [Arnold (1983)].
15.2 Notations

A random variable X with cdf (15.1) and pdf (15.2) is said to have the standard Pareto distribution. A linear transformation Y = a + hX gives a general form of Pareto distributions with pdf

p_Y(x) = α h^α / (x−a)^{α+1} if x > a+h, and p_Y(x) = 0 if x < a+h,   (15.3)

and cdf

F_Y(x) = 1 − (h/(x−a))^α,  x ≥ a+h,  −∞ < a < ∞,  h > 0.   (15.4)

We will use the notation

Y ~ Pa(α, a, h),  α > 0,  −∞ < a < ∞,  h > 0,

to denote this distribution. In particular, Pa(α, 0, 1) corresponds to the standard Pareto distribution with pdf (15.2) and cdf (15.1). Note that if

V ~ Po(α, 0, 1),

then

X = 1/V ~ Pa(α, 0, 1).

It is easy to see that if Y ~ Pa(α, a, h), then the following representation of Y in terms of the standard uniform random variable U is valid:

Y =d a + h U^{−1/α}.   (15.5)

Exercise 15.1 Prove the relation in (15.5).
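Representation (15.5) can be illustrated by simulation; the parameters and the seed in this Python sketch are illustrative, and the empirical survival function is compared with 1 − F_Y(x) from (15.4).

```python
import random

random.seed(11)
alpha, a, h = 3.0, 0.0, 1.0

# Representation (15.5): Y = a + h * U**(-1/alpha) with U ~ U(0, 1)
sample = [a + h * random.random() ** (-1 / alpha) for _ in range(200_000)]

# Empirical survival function vs. P{Y > x} = (h/(x-a))**alpha from (15.4)
for x in (1.5, 2.0, 4.0):
    emp = sum(v > x for v in sample) / len(sample)
    print(x, round(emp, 4), round((h / (x - a)) ** alpha, 4))
```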
15.3 Distributions of Minimal Values

Just as the power distributions have a simple form for the distributions of maximal values M_n = max(X₁, X₂, ..., X_n), the Pareto distributions possess convenient expressions for minimal values. Let X₁, X₂, ..., X_n be independent random variables, and let

X_k ~ Pa(α_k, a, h),  k = 1, 2, ..., n;

i.e., these random variables have the same location and scale parameters but different values of the shape parameter α_k. Now, let

m_n = min(X₁, X₂, ..., X_n).

It is then easy to see that

P{m_n > x} = Π_{k=1}^{n} P{X_k > x} = Π_{k=1}^{n} (h/(x−a))^{α_k} = (h/(x−a))^{α(n)},  x ≥ a+h,   (15.6)

where

α(n) = α₁ + α₂ + ⋯ + α_n.

We simply note from (15.6) that

m_n ~ Pa(α(n), a, h).

Thus, the Pareto distributions are closed under minima.
Exercise 15.2 Let X₁, X₂, ..., X_n be a random sample from the standard Pareto Pa(α, 0, 1) distribution, and let X_{1,n} ≤ X_{2,n} ≤ ⋯ ≤ X_{n,n} be the corresponding order statistics. Then, from the pdf of X_{k,n} (k = 1, 2, ..., n) given by

p_{k,n}(x) = [n! / ((k−1)!(n−k)!)] {F_α(x)}^{k−1} {1 − F_α(x)}^{n−k} p_α(x),  x ≥ 1,

where F_α(x) and p_α(x) are as in (15.1) and (15.2), derive expressions for E X_{k,n} and E(X²_{k,n}).

Exercise 15.3 Let X₁, X₂, ..., X_n be a random sample from the standard Pareto Pa(α, 0, 1) distribution, and let X_{1,n} ≤ X_{2,n} ≤ ⋯ ≤ X_{n,n} be the corresponding order statistics. Further, let

W₁ = X_{1,n} and W_k = X_{k,n}/X_{k−1,n},  k = 2, ..., n.

Prove that W₁, W₂, ..., W_n are independent random variables with W_k distributed as Pa((n−k+1)α, 0, 1).

Exercise 15.4 By making use of the distributional result above, derive the expressions for E X_{k,n} and E(X²_{k,n}) presented in Exercise 15.2.

Exercise 15.5 Let V₁, V₂, ..., V_n be a random sample from the standard power Po(α, 0, 1) distribution, and let V_{1,n} ≤ V_{2,n} ≤ ⋯ ≤ V_{n,n} be the corresponding order statistics. Further, let X₁, X₂, ..., X_n be a random sample from the standard Pareto Pa(α, 0, 1) distribution, and let X_{1,n} ≤ X_{2,n} ≤ ⋯ ≤ X_{n,n} be the corresponding order statistics. Then, show that

X_{k,n} =d 1/V_{n−k+1,n}  for k = 1, 2, ..., n.

Exercise 15.6 By using the property in Exercise 15.5 along with the distributional result for order statistics from the power function distribution presented before in Exercise 14.2, establish the result in Exercise 15.3.
15.4 Moments

Let X ~ Pa(α, 0, 1) and Y = a + hX ~ Pa(α, a, h). Unlike the power distributions, which have bounded support and hence possess finite moments of all orders, the Pareto distribution Pa(α, a, h) takes on values in an infinite interval (a+h, ∞), and its moments EY^β and central moments E(Y − EY)^β are finite only when β < α.

Moments about zero: If X ~ Pa(α, 0, 1), then we know that X can be expressed in terms of the standard uniform random variable U as X =d U^{−1/α}. Hence,

α_n = EX^n = E U^{−n/α} = ∫₀¹ x^{−n/α} dx = α/(α−n)  if n < α,

and α_n = ∞ if n ≥ α.   (15.7)

In the general case when Y ~ Pa(α, a, h), the relation in (15.5) holds and, consequently, we obtain

EY^n = E(a + h U^{−1/α})^n   (15.8)

and α_n = ∞ for n ≥ α. In particular, we have

α₁ = EY = a + hα/(α−1)  if α > 1   (15.9)

and

α₂ = EY² = a² + 2ahα/(α−1) + h²α/(α−2)  if α > 2.   (15.10)

Central moments: The relation in (15.5) can also be used to find the central moments of a Pareto distribution, as follows:

β_n = E(Y − EY)^n = h^n E(U^{−1/α} − E U^{−1/α})^n   (15.11)

for n < α. From (15.11) we find the variance of Y to be

Var Y = β₂ = h²α / [(α−1)²(α−2)],  α > 2.   (15.12)
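The moment formulas (15.9) and (15.12) can be checked numerically via the representation (15.5), writing E g(Y) = ∫₀¹ g(a + h u^{−1/α}) du; in this Python sketch the parameter values are illustrative and the integral is approximated by a midpoint rule.

```python
# Moments (15.9) and (15.12) via the representation (15.5):
# E g(Y) = integral_0^1 g(a + h * u**(-1/alpha)) du  (midpoint rule)
alpha, a, h = 5.0, 1.0, 2.0
N = 200_000
hstep = 1.0 / N
vals = [a + h * ((i + 0.5) * hstep) ** (-1 / alpha) for i in range(N)]
mean = sum(vals) * hstep
var = sum(v * v for v in vals) * hstep - mean ** 2

print(mean, a + h * alpha / (alpha - 1))                       # (15.9)
print(var, h * h * alpha / ((alpha - 1) ** 2 * (alpha - 2)))   # (15.12)
```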
Plots of the Pareto density function are presented in Figure 15.1 for some choices of α.
15.5 Entropy

The entropy H(Y) of Y ~ Pa(α, a, h) is defined by

H(Y) = −∫ p(x) log p(x) dx,

where p(x) is as given in (15.3).

Exercise 15.7 If Y ~ Pa(α, a, h), show that

H(Y) = log h − log α + 1 + 1/α.   (15.13)
Figure 15.1. Plots of Pareto density function (panels: Pareto(0.5), Pareto(2), Pareto(4))
CHAPTER 16

BETA DISTRIBUTION

16.1 Introduction

Let a stick of length 1 be broken at random into (n+1) pieces. In other words, this means that n breaking points U₁, ..., U_n of the unit interval are taken independently from the uniform U(0,1) distribution. Arranging these random coordinates U₁, ..., U_n in increasing order of magnitude, we obtain the uniform order statistics

U_{1,n} ≤ U_{2,n} ≤ ⋯ ≤ U_{n,n}

[see (11.39)], where, for instance,

U_{1,n} = min(U₁, ..., U_n) and U_{n,n} = max(U₁, ..., U_n).

The cdf F_{k,n}(x) of U_{k,n} was obtained earlier in (11.41) as

F_{k,n}(x) = B_x(k, n−k+1) / B(k, n−k+1),

where

B_x(p, q) = ∫₀^x t^{p−1} (1−t)^{q−1} dt   (16.1)

denotes the incomplete beta function, and

B(p, q) = ∫₀¹ t^{p−1} (1−t)^{q−1} dt   (16.2)

is the complete beta function. The corresponding pdf f_{k,n}(x) of U_{k,n} has the form

f_{k,n}(x) = [n! / ((k−1)!(n−k)!)] x^{k−1} (1−x)^{n−k},  0 < x < 1.   (16.3)
BETA DISTRIBUTION
Notations
16.2
We say that the order statistic Uk,n has the beta distribution with shape parameters k and n  k 1. More generally, we say that a random variable X has the standard beta distribution with parameters p > 0 and q > 0 if its pdf is given by
+
A linear transformation Y=a+hX,
m 1 and q > 1, then the beta(p, q ) distribution is unimodal, and its mode (the point a t which the density function p x ( z ) takes its maximal value) is at 2 =
’ ~
p+q2
, a.nd the density function in this case is a bellshaped curve. If
p < 1 (or q < 1), then p_X(x) tends to infinity when x → 0 (or when x → 1). If p < 1 and q < 1, then p_X(x) has a U-shaped form and it tends to infinity when x → 0 as well as when x → 1. If p = 1 and q = 1, then p_X(x) = 1 for all 0 < x < 1 (which, as noted earlier, is the standard uniform density). Plots of the beta density function are presented in Figures 16.1-16.3 for different choices of p and q.
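The mode formula (p − 1)/(p + q − 2) is easy to verify by a brute-force grid search over the density; the parameter pairs below are arbitrary illustrative choices, and B(p, q) is computed from the gamma function:

```python
import math

def beta_pdf(x, p, q):
    # standard beta(p, q) density; B(p,q) via the gamma function
    B = math.gamma(p) * math.gamma(q) / math.gamma(p + q)
    return x ** (p - 1) * (1 - x) ** (q - 1) / B

for p, q in [(2.0, 3.0), (4.0, 2.0), (5.0, 5.0)]:
    xs = [i / 10_000 for i in range(1, 10_000)]
    x_hat = max(xs, key=lambda x: beta_pdf(x, p, q))   # grid argmax
    mode = (p - 1) / (p + q - 2)
    assert abs(x_hat - mode) < 1e-3
```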
16.4 Some Transformations

It is quite easy to see that if X ~ beta(p, q), then 1 − X ~ beta(q, p). Now, let us find the distribution of the random variable V = 1/X.

Exercise 16.1 Show that

p_V(x) = (x − 1)^{q−1} / (B(p, q) x^{p+q}),  x > 1.   (16.6)

Taking q = 1 in (16.6), we obtain the pdf of the Pareto Pa(p, 0, 1) distribution. The density function p_W(x) of the random variable

W = (1 − X)/X = 1/X − 1

takes on the form

p_W(x) = x^{q−1} / (B(p, q)(1 + x)^{p+q}),  x > 0.   (16.7)

The distribution with pdf (16.7) is sometimes called the beta distribution of the second kind.
16.5 Moments

Since the beta distribution has bounded support, all its moments exist. Consider the standard beta distributed random variable X ~ beta(p, q). Since 0 ≤ X ≤ 1, we can conclude that 0 ≤ X^a ≤ 1 and hence

E|X|^a = E X^a ≤ 1

for any a ≥ 0.
Figure 16.1. Plots of the beta density function when p = 0.5 (q = 0.5, 2, and 4)
Figure 16.2. Plots of the beta density function when p = 2.0
Figure 16.3. Plots of the beta density function when p = 4.0
Moments about zero: Let X ~ beta(p, q). Then

E X^n = B(p + n, q)/B(p, q) = ∏_{r=0}^{n−1} (p + r)/(p + q + r),   (16.8)

and, in particular,

E X = p/(p + q)   (16.9)

and

E X² = p(p + 1)/((p + q)(p + q + 1)).   (16.10)

For Y = a + hX, we then have

E Y = a + h E X = a + hp/(p + q)   (16.13)

and

E Y² = a² + 2ah E X + h² E X².   (16.14)

Central moments: If X ~ beta(p, q), then we readily find from (16.9) and (16.10) the variance of X to be

Var X = pq/((p + q)²(p + q + 1)).
CHAPTER 18

EXPONENTIAL DISTRIBUTION

We say that a random variable Y has the exponential E(a, λ) distribution if its cdf is given by

F_Y(x) = 1 − exp{−(x − a)/λ},  x ≥ a.   (18.3)
Note that F_Y(x) in (18.3) can be rewritten as

F_Y(x) = max{0, 1 − exp(−(x − a)/λ)},  −∞ < x < ∞,   (18.4)

where λ > 0. The corresponding pdf is

p_Y(x) = (1/λ) exp(−(x − a)/λ),  x > a.   (18.5)

In many situations, we deal with the nonshifted (a = 0) exponential distribution E(0, λ). For the sake of simplicity, we will denote it by Y ~ E(λ), and in this case
F_Y(x) = max{0, 1 − exp(−x/λ)}   (18.6)

and

p_Y(x) = (1/λ) exp(−x/λ),  x > 0.   (18.7)

Note that if X ~ E(1), then Y = λX ~ E(λ).
Exercise 18.1 Let X ~ E(1) and Y ~ E(λ) be independent random variables. Then, find the value of the parameter λ such that P{X ≥ Y} = 1/3.
Exercise 18.2 Let X and Y be independent standard exponential random variables. Find the distribution of X/Y.
18.3 Laplace Transform and Characteristic Function

If Y ~ E(λ), then its Laplace transform φ_Y(s) = E e^{−sY} has the following form:

φ_Y(s) = 1/(1 + λs).   (18.8)

If V = a + Y, then V ~ E(a, λ), and consequently,

φ_V(s) = e^{−as}/(1 + λs).   (18.9)
To obtain the characteristic function of an exponentially distributed random variable, we can use the following relation between Laplace transforms φ_X(s) = E e^{−sX} and characteristic functions f_X(t) = E e^{itX}:

f_X(t) = φ_X(−it).   (18.10)

Using (18.10) and the expression for the Laplace transform of the exponential distribution given above, we can write the characteristic functions of X ~ E(1), Y ~ E(λ), and V ~ E(a, λ) as

f_X(t) = 1/(1 − it),   (18.11)

f_Y(t) = 1/(1 − iλt),   (18.12)

f_V(t) = e^{iat}/(1 − iλt),   (18.13)

respectively.
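The transforms (18.8) and (18.12) are easy to spot-check by Monte Carlo, estimating E e^{−sY} and E e^{itY} from a simulated E(λ) sample. A sketch with arbitrary λ, s, t (note that Python's random.expovariate takes the rate 1/λ, since λ here is the mean):

```python
import cmath
import math
import random

random.seed(2023)
lam = 2.0            # arbitrary scale (mean) parameter
N = 200_000
sample = [random.expovariate(1.0 / lam) for _ in range(N)]   # Y ~ E(lam)

# Laplace transform at s = 0.7:  E e^{-sY} = 1/(1 + lam*s)
s = 0.7
lt = sum(math.exp(-s * y) for y in sample) / N
assert abs(lt - 1.0 / (1.0 + lam * s)) < 0.01

# characteristic function at t = 0.5:  E e^{itY} = 1/(1 - i*lam*t)
t = 0.5
cf = sum(cmath.exp(1j * t * y) for y in sample) / N
assert abs(cf - 1.0 / (1.0 - 1j * lam * t)) < 0.01
```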
18.4 Moments
The exponential decay of the pdf (18.5) ensures the existence of all the moments of the exponential distribution.

Moments about zero: If Y ~ E(λ), then

α_n = E Y^n = (1/λ) ∫₀^∞ x^n exp(−x/λ) dx = λ^n ∫₀^∞ z^n e^{−z} dz = λ^n Γ(n + 1) = λ^n n!,  n = 1, 2, ....   (18.14)

In particular, we have

E Y = λ,   (18.15)
E Y² = 2λ²,   (18.16)
E Y³ = 6λ³,   (18.17)
E Y⁴ = 24λ⁴.   (18.18)

In the general case when V ~ E(a, λ), we obtain

E V^n = E(a + Y)^n = Σ_{r=0}^{n} C(n, r) a^{n−r} E Y^r = Σ_{r=0}^{n} (n!/(n − r)!) λ^r a^{n−r},  n = 1, 2, ....   (18.19)
Central moments: The central moments coincide for the random variables V ~ E(a, λ) and Y ~ E(λ). Let X have the standard exponential E(1) distribution. Then V − EV, Y − EY, and λ(X − EX) all have the same distribution, and therefore

μ_n = E(V − EV)^n = E(Y − λ)^n = λ^n E(X − 1)^n = λ^n Σ_{r=0}^{n} C(n, r) (−1)^r E X^{n−r} = λ^n n! Σ_{r=0}^{n} (−1)^r / r!,  n = 1, 2, ....   (18.20)

We can show from (18.20) that the central moments μ_n satisfy the following recurrence relations:

μ₁ = 0,
μ_{n+1} = (n + 1) λ μ_n + (−1)^{n+1} λ^{n+1},  n = 1, 2, ....   (18.21)

In particular, we obtain from (18.20) the variance of Y to be

Var Y = μ₂ = λ².   (18.22)
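With λ = 1, the closed form (18.20) and the recurrence (18.21) can be confirmed by exact rational arithmetic; the values μ_n(1) = n! Σ (−1)^r/r! are in fact the derangement numbers 0, 1, 2, 9, 44, .... A sketch:

```python
from fractions import Fraction
from math import factorial

def mu_closed(n):
    # mu_n for lambda = 1:  n! * sum_{r=0}^{n} (-1)^r / r!
    return factorial(n) * sum(Fraction((-1) ** r, factorial(r))
                              for r in range(n + 1))

# recurrence with lambda = 1:  mu_{n+1} = (n+1) mu_n + (-1)^{n+1}
mu = 0                                  # mu_1 = 0
for n in range(1, 12):
    mu_next = (n + 1) * mu + (-1) ** (n + 1)
    assert mu_next == mu_closed(n + 1)
    mu = mu_next

# variance lambda^2, third moment 2 lambda^3, fourth moment 9 lambda^4
assert mu_closed(2) == 1 and mu_closed(3) == 2 and mu_closed(4) == 9
```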
18.5 Shape Characteristics
Let Y ~ E(λ). Then, from (18.7), we readily see that the distribution is unimodal with the mode at 0, and that the density has an exponential decay form. Also, from (18.20), we obtain the third and fourth central moments of Y as

μ₃ = 2λ³  and  μ₄ = 9λ⁴.

We then readily find Pearson's measures of skewness and kurtosis as

γ₁ = μ₃/μ₂^{3/2} = 2  and  γ₂ = μ₄/μ₂² = 9,

respectively. Thus, the exponential distribution is a positively skewed and leptokurtic distribution which is reverse J-shaped. Plots of the exponential density function are presented in Figure 18.1 for some choices of λ.
Figure 18.1. Plots of the exponential density function
18.6 Entropy

Since the random variable V ~ E(a, λ) has pdf as given in (18.5), its entropy H(V) is defined by

H(V) = −∫ p(x) log p(x) dx = ∫_a^∞ (1/λ) exp(−(x − a)/λ) {log λ + ((x − a)/λ) log e} dx = 1 + log λ.   (18.23)

Consider the set of all probability density functions p(x) satisfying the following restrictions:

(a) p(x) > 0 for x ≥ 0 and p(x) = 0 for x < 0;
(b) ∫₀^∞ x p(x) dx = C, where C is a positive constant.

It happens that the maximal value of entropy for this set of pdf's is attained for

p(x) = (1/C) exp(−x/C),  x ≥ 0,

that is, for an exponential distribution with mean C.
18.7 Distributions of Minima
Let Y_k ~ E(λ_k), k = 1, 2, ..., n, be independent random variables, and m_n = min{Y₁, ..., Y_n}.

Exercise 18.3 Prove that m_n (for any n = 1, 2, ...) also has the exponential E(λ) distribution, where

1/λ = 1/λ₁ + ⋯ + 1/λ_n.
The statement of Exercise 18.3 enables us to show that

min{Y₁, ..., Y_n} has the same distribution as Y₁/n,  n = 1, 2, ...,   (18.24)

when Y₁, Y₂, ... are independent and identically distributed as E(λ). It is of interest to mention that property (18.24) characterizes the exponential E(λ) (λ > 0) distribution.
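The reciprocal rule of Exercise 18.3 can be explored by simulation: the minimum of independent E(λ₁) and E(λ₂) variables should behave like an E(λ) variable with 1/λ = 1/λ₁ + 1/λ₂. A sketch with arbitrary parameter values:

```python
import random

random.seed(42)
lam1, lam2 = 1.0, 3.0   # arbitrary scale (mean) parameters
N = 200_000

# expovariate takes the rate 1/lambda (lambda here is the mean)
m = [min(random.expovariate(1.0 / lam1), random.expovariate(1.0 / lam2))
     for _ in range(N)]

lam = 1.0 / (1.0 / lam1 + 1.0 / lam2)   # = 0.75
mean = sum(m) / N
var = sum(x * x for x in m) / N - mean ** 2

assert abs(mean - lam) < 0.01        # exponential mean lam
assert abs(var - lam ** 2) < 0.03    # exponential variance lam^2
```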
18.8 Uniform and Exponential Order Statistics
In Chapter 11 [see (11.39)] we introduced the uniform order statistics

U_{1,n} ≤ U_{2,n} ≤ ⋯ ≤ U_{n,n}

arising from independent and identically distributed random variables U_k ~ U(0,1), k = 1, 2, ..., n. The analogous ordering of independent exponential E(1) random variables X₁, X₂, ..., X_n gives us the exponential order statistics

X_{1,n} ≤ X_{2,n} ≤ ⋯ ≤ X_{n,n}.

Note that

X_{1,n} = min{X₁, X₂, ..., X_n}  and  X_{n,n} = max{X₁, X₂, ..., X_n}.

There exist useful representations of uniform as well as exponential order statistics in terms of sums of independent exponential random variables. To be specific, let X₁, X₂, ... be independent exponential E(1) random variables, and

S_k = X₁ + X₂ + ⋯ + X_k,  k = 1, 2, ....

Then, the following relations are valid for any n = 1, 2, ...:

{U_{1,n}, ..., U_{n,n}} has the same joint distribution as {S₁/S_{n+1}, ..., S_n/S_{n+1}}   (18.25)

and

X_{k,n} has the same distribution as X₁/n + X₂/(n − 1) + ⋯ + X_k/(n − k + 1),  1 ≤ k ≤ n.   (18.26)
Note that relation (18.24) is a special case of (18.26). The distributional relations in (18.25) and (18.26) enable us to obtain some interesting corollaries. For example, it is easy to see that the exponential spacings

D_{1,n} = X_{1,n},  D_{2,n} = X_{2,n} − X_{1,n},  ...,  D_{n,n} = X_{n,n} − X_{n−1,n}

are independent, and that

(n − k + 1) D_{k,n} ~ E(1).

It also follows from (18.26) that

E X_{k,n} = 1/n + 1/(n − 1) + ⋯ + 1/(n − k + 1),  1 ≤ k ≤ n.   (18.27)
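Formula (18.27) is easy to check by simulating exponential order statistics directly. The following sketch (with arbitrary n and k) compares the sample mean of X_{3,5} with 1/5 + 1/4 + 1/3:

```python
import random

random.seed(1)
n, k = 5, 3          # arbitrary choices
N = 200_000

vals = []
for _ in range(N):
    x = sorted(random.expovariate(1.0) for _ in range(n))
    vals.append(x[k - 1])          # k-th order statistic X_{k,n}

mean = sum(vals) / N
# (18.27): E X_{k,n} = 1/n + 1/(n-1) + ... + 1/(n-k+1)
target = sum(1.0 / (n - r) for r in range(k))   # 1/5 + 1/4 + 1/3
assert abs(mean - target) < 0.01
```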
Upon setting k = n = 1 in (18.25), we obtain the following interesting result: If X₁ and X₂ are independent exponential E(1) random variables, then Z = X₁/(X₁ + X₂) has the uniform U(0,1) distribution.
Exercise 18.4 Using the representation in (18.26), show that the variance of X_{k,n} is given by

Var X_{k,n} = Σ_{r=1}^{k} 1/(n − r + 1)²,

and that the covariance of X_{k,n} and X_{ℓ,n} is simply

Cov(X_{k,n}, X_{ℓ,n}) = Var X_{k,n},  1 ≤ k < ℓ ≤ n.
18.9 Convolutions

Let Y₁ ~ E(λ₁) and Y₂ ~ E(λ₂) be independent random variables, and V = Y₁ + Y₂.
Exercise 18.5 Show that the pdf p_V(x) of V has the following form:

p_V(x) = (e^{−x/λ₁} − e^{−x/λ₂}) / (λ₁ − λ₂),  x > 0,   (18.28)

if λ₁ ≠ λ₂, and

p_V(x) = (x/λ²) e^{−x/λ},  x ≥ 0,   (18.29)

if λ₁ = λ₂ = λ.
Equation (18.29) gives the distribution of the sum of two independent random variables having the same exponential distribution. Now, let us consider the sum

V_n = X₁ + ⋯ + X_n

of n independent exponential random variables. For the sake of simplicity, we will suppose that X_k ~ E(1), k = 1, 2, ..., n. It follows from (18.8) that the Laplace transform φ(s) of any X_k has the form

φ(s) = 1/(1 + s).   (18.30)

Then the Laplace transform φ_n(s) of the sum V_n, which also takes on only positive values, is given by

φ_n(s) = ∫₀^∞ e^{−sx} p_n(x) dx = (1 + s)^{−n},   (18.31)
where p_n(x) denotes the pdf of V_n. Comparing (18.30) and (18.31), we can readily see that

φ_n(s) = (φ(s))^n,   (18.32)

and, hence, p_n(x) is the n-fold convolution of the E(1) density. It then follows from (18.32) that

p_n(x) = x^{n−1} e^{−x} / (n − 1)!,  x > 0.   (18.33)

The probability density function in (18.33) is a special case of gamma distributions, which are discussed in detail in Chapter 20.
Exercise 18.6 Let X₁ and X₂ be independent random variables having the standard E(1) distribution, and W = X₁ − X₂. Show that the pdf of W has the form

p_W(x) = (1/2) e^{−|x|},  −∞ < x < ∞.   (18.34)
The probability density function in (18.34) corresponds to the standard Laplace distribution, which is discussed in detail in Chapter 19.
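The Laplace shape of W = X₁ − X₂ can be probed numerically: under (18.34), W has mean 0 and variance 2, and |W| ~ E(1), so E|W| = 1. A simulation sketch:

```python
import random

random.seed(3)
N = 200_000
w = [random.expovariate(1.0) - random.expovariate(1.0) for _ in range(N)]

mean = sum(w) / N
var = sum(x * x for x in w) / N - mean ** 2
abs_mean = sum(abs(x) for x in w) / N

assert abs(mean) < 0.02              # symmetric about 0
assert abs(var - 2.0) < 0.05         # Var W = Var X1 + Var X2 = 2
assert abs(abs_mean - 1.0) < 0.015   # |W| ~ E(1), so E|W| = 1
```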
18.10 Decompositions

Let X ~ E(1). We will now present two different ways to express X as a sum of two independent nondegenerate random variables.

(1) Consider the random variables V = [X] and U = {X}, which are the integer and fractional parts of X, respectively; for example, V = n and U = x if X = n + x, where n = 0, 1, 2, ... and 0 ≤ x < 1. It is evident that

X = V + U.   (18.35)

Let us show that the terms on the RHS of (18.35) are independent. It is not difficult to check that V takes on values 0, 1, 2, ... with probabilities

p_n = P{V = n} = P{n ≤ X < n + 1} = (1 − q) q^n,   (18.36)
where q = 1/e. This simply means that V has the geometric G(1/e) distribution. We see that the fractional part of X takes values on the interval [0, 1) and

F_U(x) = P{U ≤ x} = Σ_{n=0}^{∞} P{n ≤ X ≤ n + x} = (1 − e^{−x})/(1 − e^{−1}),  0 ≤ x < 1.   (18.37)
To check the independence of the random variables V and U, we must show that for any n = 0, 1, 2, ... and 0 ≤ x ≤ 1, the following condition holds:

P{V = n, U ≤ x} = p_n F_U(x).   (18.38)

Relation (18.38) becomes evident once we note that

P{V = n, U ≤ x} = P{n ≤ X < n + x} = e^{−n}(1 − e^{−x}).
Thus, we have expressed X having the standard exponential distribution as a sum of two independent random variables. Representation (18.35) is also valid for Y ~ E(λ), λ > 0. In the general case, when Y ~ E(a, λ), the following general form of (18.35) holds:

Y = a + [Y − a] + {Y − a},

where the integer and fractional parts ([Y − a] and {Y − a}) of the random variable Y − a are independent.
(2) Along with the random variable X ~ E(1), let us consider two independent random variables Y₁ and Y₂ with a common pdf

g(x) = (1/√π) x^{−1/2} e^{−x},  x > 0.   (18.39)

It is easy to prove that the nonnegative function g(x) above is really a probability density function because of the fact that

∫₀^∞ x^{−1/2} e^{−x} dx = Γ(1/2) = √π.   (18.40)
Exercise 18.7 Show that the sum Y₁ + Y₂ has the exponential E(1) distribution.
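Exercise 18.7 can be explored by simulation, using the (assumed, chi-square-based) fact that if Z ~ N(0,1) then Z²/2 has precisely the density (18.39); the sum of two such variables should then have mean 1 and variance 1, as an E(1) variable does:

```python
import random

random.seed(8)
N = 200_000

def y():
    # assumption: Z^2/2 for Z ~ N(0,1) has density x^{-1/2} e^{-x} / sqrt(pi),
    # x > 0, i.e. the pdf in (18.39) (the chi-square connection)
    z = random.gauss(0.0, 1.0)
    return z * z / 2.0

s = [y() + y() for _ in range(N)]
mean = sum(x for x in s) / N
var = sum(x * x for x in s) / N - mean ** 2

assert abs(mean - 1.0) < 0.01   # E(1) mean
assert abs(var - 1.0) < 0.05    # E(1) variance
```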
Thus, we have established that the exponential distributions are decomposable. Moreover, in Chapter 20 we show that any exponential distribution is infinitely divisible.
18.11 Lack of Memory Property
Lack of Memory Property
In Chapter 6 [see (6.39)], we established that the geometric G ( p ) random variable X for any n = 0, 1,. . . and m = 0,1, . . . satisfies the following equality:
P { X 2 n + m I X 2 m} = P { X 2 n}. Furthermore, among all discrete distributions taking on integer values, geometric distribution is the only distribution that possesses this “lack of memory” property. Now, let us consider Y E ( X ) , X > 0. It is not difficult to see that the equality N
P{Y 2 z + y
1 Y 2 y}
= P{Y
2 z}
(18.41)
holds for any z 2 0 and y 2 0. This “lack of memory” property characterizes the exponential E(X) distribution; that is, if a nonnegative random variable Y with cdf F ( z ) [such that F ( z ) < 1 for any z > 01 satisfies (18.41), then
F ( z ) = 1  e+, for some positive constant A.
II:
> 0,
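The lack of memory property (18.41) is straightforward to test empirically by comparing the conditional tail frequency with the unconditional one. A sketch with arbitrary x, y, and λ:

```python
import random

random.seed(5)
lam = 2.0              # arbitrary scale (mean) parameter
N = 400_000
sample = [random.expovariate(1.0 / lam) for _ in range(N)]

x, y = 1.0, 1.5        # arbitrary thresholds
tail = sum(1 for v in sample if v >= x) / N                # P{Y >= x}
cond_num = sum(1 for v in sample if v >= x + y)
cond_den = sum(1 for v in sample if v >= y)
cond = cond_num / cond_den                                 # P{Y >= x+y | Y >= y}

assert abs(cond - tail) < 0.01
```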
CHAPTER 19

LAPLACE DISTRIBUTION

19.1 Introduction

In Chapter 18 (see Exercise 18.6) we considered the difference V = X₁ − X₂ of two independent random variables, with X₁ and X₂ having the standard exponential E(1) distribution. It was mentioned there that the pdf of V is of the form

p_V(x) = (1/2) e^{−|x|},  −∞ < x < ∞.

This distribution is one of the earliest in probability theory, and it was introduced by Laplace (1774). A book-length account of Laplace distributions, discussing in great detail their various properties and applications, is available and is due to Kotz et al. (2001).
19.2 Notations

We say that a random variable X has the Laplace distribution if its pdf p_X(x) is given by

p_X(x) = (1/2λ) exp(−|x − a|/λ),  −∞ < x < ∞.   (19.1)

We use X ~ L(a, λ) to indicate that X has the Laplace distribution in (19.1) with a location parameter a and a scale parameter λ (−∞ < a < ∞, λ > 0). In the special case when X has a symmetric distribution with the pdf

p_X(x) = (1/2λ) exp(−|x|/λ),  −∞ < x < ∞,   (19.2)

we denote it by X ~ L(λ) for the sake of simplicity. For instance, V ~ L(1) denotes that V has the standard Laplace distribution with pdf

p_V(x) = (1/2) e^{−|x|},  −∞ < x < ∞,   (19.3)
and its cdf F_V(x) has the form

F_V(x) = (1/2) e^{x}  if x ≤ 0,  and  F_V(x) = 1 − (1/2) e^{−x}  if x > 0.   (19.4)

Indeed, if V ~ L(1), then Y = λV ~ L(λ), and X = a + λV ~ L(a, λ). Laplace distributions are also known under different names in the literature: the first law of Laplace (the second law of Laplace is the standard normal distribution), double exponential (the name double exponential or, sometimes, doubly exponential is also applied to the distribution having the cdf

F(x) = exp(−e^{−x}),  −∞ < x < ∞,

which is one of the three types of extreme value distributions), two-tailed exponential, and bilateral exponential distributions.
19.3 Characteristic Function

Recall that if V ~ L(1), it can be represented as V = X₁ − X₂, where X₁ and X₂ are independent random variables having the standard exponential E(1) distribution. Consequently, the characteristic function of V is

f_V(t) = E e^{itV} = E e^{itX₁} E e^{−itX₂},

and, therefore, we obtain from (18.11) that

f_V(t) = 1/((1 − it)(1 + it)) = 1/(1 + t²).   (19.5)

Since the linear transformation X = a + λV leads to the Laplace L(a, λ) distribution, we readily get

f_X(t) = e^{iat}/(1 + λ²t²).   (19.6)

It is of interest to mention here that (19.5) gives a simple way to obtain the characteristic function of the Cauchy distribution [see also (12.6) and (12.7)]. In fact, (19.5) is equivalent to the following inverse Fourier transform, which relates the characteristic function f_V(t) with the probability density function p_V(x):

p_V(x) = (1/2π) ∫_{−∞}^{∞} e^{−itx} f_V(t) dt,   (19.7)

that is,

(1/2) e^{−|x|} = (1/2π) ∫_{−∞}^{∞} e^{−itx}/(1 + t²) dt.

It then follows from (19.7), upon interchanging the roles of the argument and the integration variable, that

e^{−|t|} = (1/π) ∫_{−∞}^{∞} e^{itx}/(1 + x²) dx.   (19.8)

Now we can see from (19.8) that the Cauchy C(0,1) distribution with pdf

p(x) = 1/(π(1 + x²)),  −∞ < x < ∞,

has characteristic function

f(t) = e^{−|t|}.
19.4 Moments
The exponential rate of decrease of the pdf in (19.1) entails the existence of all the moments of the Laplace distribution.

Moments about zero: Let Y ~ L(λ). Since Y has a symmetric distribution, we readily have

α_{2n−1} = E Y^{2n−1} = 0,  n = 1, 2, ....

For moments of even order, we have

α_{2n} = E Y^{2n} = (1/2λ) ∫_{−∞}^{∞} x^{2n} exp(−|x|/λ) dx = (1/λ) ∫₀^{∞} x^{2n} exp(−x/λ) dx = λ^{2n} ∫₀^{∞} z^{2n} e^{−z} dz = λ^{2n} Γ(2n + 1) = λ^{2n} (2n)!,  n = 1, 2, ....   (19.9)

In particular, we have

E Y² = 2λ²   (19.10)

and

E Y⁴ = 24λ⁴.   (19.11)

In the general case when X = a + Y ~ L(a, λ), we also have

E X^n = E(a + Y)^n = Σ_{r=0}^{n} C(n, r) a^{n−r} E Y^r,  n = 1, 2, ...,   (19.12)

where

E Y^r = 0  if r = 1, 3, 5, ...,  and  E Y^r = λ^r r!  if r = 0, 2, 4, ....

Central moments: Let Y ~ L(λ) and X = a + Y ~ L(a, λ). It is clear that the central moments of X and the moments about zero of Y are the same, since

μ_n = E(X − EX)^n = E(Y − EY)^n = E Y^n,  n = 1, 2, ....   (19.13)

From (19.13) and (19.10), we immediately find that

Var X = μ₂ = 2λ².   (19.14)
19.5 Shape Characteristics

Let Y ~ L(λ). Then, due to the symmetry of the distribution of Y, we readily have Pearson's coefficient of skewness as γ₁ = 0. Further, from (19.11), we find the fourth central moment of Y as

μ₄ = α₄ = 24λ⁴.   (19.15)

From Eqs. (19.15) and (19.14), we readily find Pearson's coefficient of kurtosis as

γ₂ = μ₄/μ₂² = 24λ⁴/(2λ²)² = 6.   (19.16)

Thus, we find the Laplace distribution to be a symmetric leptokurtic distribution.
19.6 Entropy

The entropy of the random variable X ~ L(a, λ) is given by

H(X) = −∫_{−∞}^{∞} p_X(x) log p_X(x) dx = ∫_{−∞}^{∞} (1/2λ) e^{−|x−a|/λ} {log(2λ) + (|x − a|/λ) log e} dx = 1 + log(2λ) = log(2λe).   (19.17)

Consider the set of all probability density functions p(x) satisfying the following restriction:

∫_{−∞}^{∞} |x − a| p(x) dx = C,   (19.18)

where C is a positive constant. Then, it has been proved that the maximal value of entropy for this set of densities is attained if

p(x) = (1/2C) exp(−|x − a|/C),

that is, for a Laplace distribution.
19.7 Convolutions

Let Y₁ ~ L(λ) and Y₂ ~ L(λ) be independent random variables, W = Y₁ + Y₂, and T = Y₁ − Y₂. Since the Laplace L(λ) distribution is symmetric about zero for any λ > 0, it is clear that the random variables W and T have the same distribution.

Exercise 19.1 Show that the densities p_W(x) and p_T(x) of the random variables W and T have the following form:

p_W(x) = p_T(x) = (1/4λ)(1 + |x|/λ) e^{−|x|/λ},  −∞ < x < ∞.   (19.19)

Next, let Y₁ ~ L(λ₁) and Y₂ ~ L(λ₂) be independent random variables, each having a Laplace distribution but with different scale parameters. In this case, too, the random variables W = Y₁ + Y₂ and T = Y₁ − Y₂ have a common distribution. To find this distribution, we must recall that the characteristic functions f₁(t) = E e^{itY₁} and f₂(t) = E e^{itY₂} have the form

f₁(t) = 1/(1 + λ₁²t²)  and  f₂(t) = 1/(1 + λ₂²t²),

and hence the characteristic function f_W(t) is given by

f_W(t) = 1/((1 + λ₁²t²)(1 + λ₂²t²)).   (19.20)

It follows from (19.20) that the probability density functions p_W(x), p₁(x), and p₂(x) of the random variables W, Y₁, and Y₂ satisfy the relationship

p_W(x) = ∫_{−∞}^{∞} p₁(x − y) p₂(y) dy.   (19.21)

Hence, we obtain

p_W(x) = (λ₁ e^{−|x|/λ₁} − λ₂ e^{−|x|/λ₂}) / (2(λ₁² − λ₂²)),  −∞ < x < ∞.   (19.22)
19.8 Decompositions

Coming back to the representation V = X₁ − X₂, where V ~ L(1) and the independent random variables X₁ and X₂ have the standard exponential E(1) distribution, we easily obtain that the distribution of V is decomposable. Recall from Chapter 18 that the random variables X₁ and X₂ are both decomposable [see, for example, (18.35) and Exercise 18.7]. This simply means that there exist independent random variables Y₁, Y₂, Y₃, and Y₄ such that

X₁ has the same distribution as Y₁ + Y₂,  and  X₂ has the same distribution as Y₃ + Y₄,

and hence

V has the same distribution as (Y₁ − Y₃) + (Y₂ − Y₄),   (19.23)

where the differences Y₁ − Y₃ and Y₂ − Y₄ are independent random variables and have nondegenerate distributions. Therefore, V ~ L(1) is a decomposable random variable. Furthermore, a linear transformation a + λV preserves this property, and so the Laplace L(a, λ) distribution is also decomposable. In addition, since exponential distributions are infinitely divisible, the representation V = X₁ − X₂ enables us to note that Laplace distributions are also infinitely divisible.
19.9 Order Statistics

Let V₁, V₂, ..., V_n be independent and identically distributed as the standard Laplace L(1) distribution. Further, let V_{1,n} < V_{2,n} < ⋯ < V_{n,n} be the corresponding order statistics. Then, using the expressions of p_V(x) and F_V(x) in (19.3) and (19.4), respectively, we can write the kth moment of V_{r,n} (for 1 ≤ r ≤ n) as

E(V_{r,n}^k) = n!/((r − 1)!(n − r)!) ∫_{−∞}^{∞} x^k {F_V(x)}^{r−1} {1 − F_V(x)}^{n−r} p_V(x) dx
= (n!/((r − 1)!(n − r)! 2^n)) ∫_{−∞}^{0} x^k e^{rx} (2 − e^{x})^{n−r} dx
+ (n!/((r − 1)!(n − r)! 2^n)) ∫_{0}^{∞} x^k (2 − e^{−x})^{r−1} e^{−(n−r+1)x} dx.   (19.24)
Now, upon writing 2 − e^{−x} (respectively, 2 − e^{x}) in the two integrands as 1 + (1 − e^{−x}) (respectively, 1 + (1 − e^{x})) and expanding the corresponding terms binomially, we readily obtain

E(V_{r,n}^k) = (n!/((r − 1)!(n − r)! 2^n)) Σ_{i=0}^{r−1} C(r − 1, i) ∫₀^{∞} x^k (1 − e^{−x})^{r−1−i} e^{−(n−r+1)x} dx
+ ((−1)^k n!/((r − 1)!(n − r)! 2^n)) Σ_{j=0}^{n−r} C(n − r, j) ∫₀^{∞} x^k (1 − e^{−x})^{n−r−j} e^{−rx} dx.   (19.25)

Noting now that if X₁, X₂, ..., X_m is a random sample from the standard exponential E(1) distribution and X_{1,m} < X_{2,m} < ⋯ < X_{m,m} are the corresponding order statistics, then

E(X_{ℓ,m}^k) = (m!/((ℓ − 1)!(m − ℓ)!)) ∫₀^{∞} x^k (1 − e^{−x})^{ℓ−1} e^{−(m−ℓ+1)x} dx.   (19.26)

Upon using this formula for the two integrals on the RHS of (19.25), we readily obtain the following relationship:

E(V_{r,n}^k) = (1/2^n) Σ_{i=0}^{r−1} C(n, i) E(X_{r−i,n−i}^k) + ((−1)^k/2^n) Σ_{i=r}^{n} C(n, i) E(X_{i−r+1,i}^k).   (19.27)

Thus, the results of Section 18.8 on the standard exponential E(1) order statistics can readily be used to obtain the moments of order statistics from the standard Laplace L(1) distribution by means of (19.27).
Similarly, by considering the product moment E(V_{r,n} V_{s,n}), splitting the double integral over the range (−∞ < x < y < ∞) into three integrals over the ranges (−∞ < x < y < 0), (−∞ < x < 0, 0 < y < ∞), and (0 < x < y < ∞), respectively, and proceeding as before, we can show that an analogous relationship (19.28) holds for 1 ≤ r < s ≤ n, where, as before, E(X_{k,m}^ℓ) and E(X_{k,m} X_{ℓ,m}) denote the single and product moments of order statistics from the standard exponential E(1) distribution.
Exercise 19.2 Prove the relation in (19.28).
Remark 19.1 As done by Govindarajulu (1963), the approach above can be generalized to any symmetric distribution. Specifically, if the V_{r,n}'s denote the order statistics from a distribution symmetric about 0 and the X_{ℓ,n}'s denote the order statistics from the corresponding folded distribution (folded about 0), then the two relationships above continue to hold.
Exercise 19.3 Let V₁, V₂, ..., V_n be a random sample from a distribution F(x) symmetric about 0, and let V_{1,n} < V_{2,n} < ⋯ < V_{n,n} be the corresponding order statistics. Further, let X_{ℓ,m} denote the ℓth order statistic from a random sample of size m from the corresponding folded distribution with cdf G(x) = 2F(x) − 1 (for x > 0). Then, prove the following two relationships between the moments of these two sets of order statistics: For 1 ≤ r ≤ n and k ≥ 0,

E(V_{r,n}^k) = (1/2^n) Σ_{i=0}^{r−1} C(n, i) E(X_{r−i,n−i}^k) + ((−1)^k/2^n) Σ_{i=r}^{n} C(n, i) E(X_{i−r+1,i}^k);   (19.29)
and for 1 ≤ r < s ≤ n,

E(V_{r,n} V_{s,n}) = (1/2^n) ⋯ .
The relationship in (19.27) can also be established by using simple probability arguments, as follows. First, for 1 ≤ r ≤ n, let us consider the event V_{r,n} ≥ 0. Given this event, let i (0 ≤ i ≤ r − 1) be the number of V's (among V₁, V₂, ..., V_n) which are negative. Then, since the remaining (n − i) V's (which are nonnegative) form a random sample from the standard exponential E(1) distribution, conditioned on the event V_{r,n} ≥ 0, we have

V_{r,n} distributed as X_{r−i,n−i},  with binomial probabilities C(n, i)/2^n for i = 0, 1, ..., r − 1.   (19.31)

Next, let us consider the event V_{r,n} < 0. Given this event, let i (r ≤ i ≤ n) be the number of V's (among V₁, V₂, ..., V_n) which are negative. Then, since the negatives of these i V's form a random sample from the standard exponential E(1) distribution, conditioned on the event V_{r,n} < 0, we also have

V_{r,n} distributed as −X_{i−r+1,i},  with binomial probabilities C(n, i)/2^n for i = r, r + 1, ..., n.   (19.32)

Combining (19.31) and (19.32), we readily obtain the relation in (19.27).
Exercise 19.4 Using a similar probability argument, prove the relation in (19.28).
CHAPTER 20

GAMMA DISTRIBUTION

20.1 Introduction

In Section 18.9 we discussed the distribution of the sum

W_n = X₁ + ⋯ + X_n

of independent random variables X_k (k = 1, 2, ..., n) having the standard exponential E(1) distribution. It was shown that the Laplace transform φ_n(s) of W_n is given by

φ_n(s) = (1 + s)^{−n},   (20.1)

and the corresponding pdf p_n(x) is given by

p_n(x) = x^{n−1} e^{−x}/(n − 1)!,  x > 0.   (20.2)

It was also shown in Chapter 18 (see Exercise 18.7) that the sum of two positive random variables Y₁ and Y₂ with the common pdf

p(x) = (1/√π) x^{−1/2} e^{−x},  x > 0,   (20.3)

has the standard exponential distribution. In addition, we may recall Eq. (9.22), in which cumulative probabilities of Poisson distributions have been expressed in terms of the pdf in (20.2). The probability density functions in (20.2) and (20.3) suggest that we consider the family of densities

p_α(x) = C(α) x^{α−1} e^{−x},  x > 0,   (20.4)

where C(α) depends on α only. It is easy to see that p_α(x) in (20.4) is a pdf if

C(α) = 1/Γ(α),

where Γ(α) is the complete gamma function.
20.2 Notations
We say that a random variable X has the standard gamma distribution with parameter α > 0 if its pdf has the form

p_X(x) = x^{α−1} e^{−x}/Γ(α),  x > 0.   (20.5)

The linear transformation Y = a + λX yields a random variable with pdf

p_Y(x) = (x − a)^{α−1} exp{−(x − a)/λ} / (Γ(α) λ^α),  x > a.   (20.6)

We will use the notation Y ~ Γ(α, a, λ) to denote a random variable with pdf (20.6). Hence, X ~ Γ(α, 0, 1) corresponds to the standard gamma distribution with pdf (20.5). Note that when α = 1, we get the exponential distributions as a subset of the gamma distributions. The special gamma distributions Γ(n/2, 0, 2), when n is a positive integer, are called chi-square distributions (χ²-distributions) with n degrees of freedom. These distributions play a very important role in statistical inferential problems.

From (20.5), upon integration by parts, we have the cdf of X ~ Γ(α, 0, 1) (when α > 1) as

F_X(x) = (1/Γ(α)) ∫₀^x t^{α−1} e^{−t} dt = −x^{α−1} e^{−x}/Γ(α) + F_{X'}(x),

where X' is a Γ(α − 1, 0, 1) random variable. Thus, the expression above for the cdf of X presents a recurrence relation in α. Furthermore, if α equals a positive integer n, then repeated integration by parts as above will yield

F_X(x) = 1 − e^{−x} Σ_{i=0}^{n−1} x^i/i!,

which is precisely the relationship between the cumulative Poisson probabilities and the gamma distribution noted in Eq. (9.22).
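For integer α = n, the Poisson-sum form of the cdf can be checked against direct numerical integration of the density (20.5). A sketch with n = 4 and x = 3 (arbitrary choices), using a midpoint rule fine enough that the two values agree to many digits:

```python
import math

n, x = 4, 3.0   # arbitrary choices

# Poisson-sum form of the gamma(n,0,1) cdf
poisson_form = 1.0 - math.exp(-x) * sum(x ** i / math.factorial(i)
                                        for i in range(n))

# midpoint-rule integral of t^{n-1} e^{-t} / (n-1)! over [0, x]
M = 100_000
h = x / M
integral = sum(((i + 0.5) * h) ** (n - 1) * math.exp(-(i + 0.5) * h)
               for i in range(M)) * h / math.factorial(n - 1)

assert abs(poisson_form - integral) < 1e-6
```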
20.3 Mode

Let X ~ Γ(α, 0, 1). If α = 1 (the case of the exponential distribution), the pdf p_X(x) is a monotonically decreasing function, as we have seen in Chapter 18. If α > 1, the gamma Γ(α, 0, 1) distribution is unimodal and its mode [the point where the density function (20.5) takes its maximal value] is at x = α − 1. On the other hand, if α < 1, then p_X(x) tends to infinity as x → 0.
20.4 Laplace Transform and Characteristic Function

Let X ~ Γ(α, 0, 1). Then, the Laplace transform φ_X(s) of X is given by

φ_X(s) = E e^{−sX} = (1 + s)^{−α}.   (20.7)

Then the characteristic function f_X(t) = E e^{itX} has the form

f_X(t) = φ_X(−it) = (1 − it)^{−α}.   (20.8)

As a result, we can write the following expression for the characteristic function of the random variable Y = a + λX, which has the general gamma Γ(α, a, λ) distribution:

f_Y(t) = e^{iat}/(1 − iλt)^α.   (20.9)
20.5 Moments
The exponentially decreasing nature of the pdf in (20.5) entails the existence of all the moments of the gamma distribution.

Moments about zero: Let X ~ Γ(α, 0, 1). Then,

α_n = E X^n = (1/Γ(α)) ∫₀^∞ x^{α+n−1} e^{−x} dx = Γ(α + n)/Γ(α) = α(α + 1) ⋯ (α + n − 1).   (20.10)

In particular, we have

E X = α,   (20.11)
E X² = α(α + 1),   (20.12)
E X³ = α(α + 1)(α + 2),   (20.13)
E X⁴ = α(α + 1)(α + 2)(α + 3).   (20.14)

Note that (20.10) is also valid for moments E X^n of negative order n > −α. Let us now consider Y ~ Γ(α, a, λ). Since Y can be expressed as Y = a + λX, where X ~ Γ(α, 0, 1), we can express the moments of Y as

E Y^n = E(a + λX)^n = Σ_{r=0}^{n} C(n, r) a^{n−r} λ^r α(α + 1) ⋯ (α + r − 1).   (20.15)
N
and hence
p"
Y
=
E(Y

EY)"
= E(V

EV)"
= XnE(X

EX)"
As a special case of (20.16), we obtain the variance of the random variable r(a, a , A) as
N
Var Y = L,ZI = ax2.
20.6
(20.17)
Shape Characteristics
From (20.16), we get the third central moment of Y t o be p3
= 2aX
3
using which we readily obtain Pearson's coefficient of skewness as (20.18)
This reveals that the gamma distribution is positively skewed for all values of the shape parameter a. Next, we get the fourth central moment of Y from (20.16) to be
p4 = ( 3 2 + 6 4 x 4 , using which we readily obtain Pearson's coefficient of kurtosis as 7 2
=
D4
Pz"

=3
+ .a6
(20.19)
This reveals tha.t the gamma distribution is leptokurtic for all values of the shape parameter a. Furthermore, we note from (20.18) and (20.19) that as Q m, y1 and 7 2 tend to 0 and 3, respectively (which are the values of skewness and kurtosis of the normal distribution). As we shall see in Section 20.9, the normal distribution is, in fact, the limiting distribution of gamma. distributions as the shape parameter a + 00. Plots of gamma density function presented in Figure 20.1 reveal these properties.
Figure 20.1. Plots of the gamma density function
Remark 20.1 From (20.18) and (20.19), we observe that γ₁ and γ₂ satisfy the relationship

γ₂ = 3 + 1.5 γ₁²,   (20.20)

which is the Type III line in the Pearson (γ₁, γ₂) plane (that corresponds to the gamma family of distributions).
Exercise 20.1 Generalizing the pdf in (20.5), we may consider the pdf of the generalized gamma distributions as

p_{X'}(x) = C(α, δ) x^{αδ−1} e^{−x^δ},  x > 0, α > 0, δ > 0.

Then, find the normalizing constant C(α, δ). Derive the moments and discuss the shape characteristics of this generalized gamma family of distributions.
Consider the transformation Z = γ + log X', where X' has a generalized gamma distribution with pdf p_{X'}(x) as given above. Then, we readily obtain the pdf of Z as

p_Z(z) = C(α, δ) e^{αδ(z−γ)} exp{−e^{δ(z−γ)}},  −∞ < z < ∞, α > 0, δ > 0.

Now, let X ~ Γ(α, 0, 1) and Y ~ Γ(β, 0, 1) be independent random variables, and consider U = X/(X + Y) and V = X + Y. The joint pdf of (U, V) factorizes as

p_{U,V}(u, v) = [u^{α−1}(1 − u)^{β−1}/B(α, β)] · [v^{α+β−1} e^{−v}/Γ(α + β)],  v > 0, 0 < u < 1.   (20.24)
From (20.24), we observe immediately that the random variables U and V are independent, with V having the gamma Γ(α + β, 0, 1) distribution (which is to be expected, as V is the sum of two independent gamma random variables), and U having the standard beta(α, β) distribution. We need to mention here two interesting special cases. If X ~ Γ(1/2, 0, 1) and Y ~ Γ(1/2, 0, 1), then U = X/(X + Y) has the standard arcsine distribution; and if X and Y have the standard exponential E(1) distribution, then U = X/(X + Y) is a standard uniform U(0,1) random variable. Of course, we will still have the independence of U and V, and the beta(α, β) distribution of U, if we take X ~ Γ(α, 0, λ) and Y ~ Γ(β, 0, λ).

Lukacs (1965) has proved the converse result: If X and Y are independent positive random variables and the random variables X/(X + Y) and X + Y are also independent, then there exist positive constants α, β, and λ such that X ~ Γ(α, 0, λ) and Y ~ Γ(β, 0, λ). It was also later shown by Marsaglia (1974) that the result of Lukacs stays true in a more general situation (i.e., without the restriction that X and Y are positive random variables).
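The independence of U = X/(X + Y) and V = X + Y, and the beta(α, β) law of U, can be illustrated with Python's random.gammavariate (shape, scale); the near-zero sample correlation below is, of course, only a necessary symptom of independence, not a proof:

```python
import random

random.seed(9)
a, b = 2.0, 3.0     # arbitrary shape parameters alpha, beta
N = 100_000

xs = [random.gammavariate(a, 1.0) for _ in range(N)]
ys = [random.gammavariate(b, 1.0) for _ in range(N)]
us = [x / (x + y) for x, y in zip(xs, ys)]
vs = [x + y for x, y in zip(xs, ys)]

mu = sum(us) / N
assert abs(mu - a / (a + b)) < 0.005    # beta(a,b) mean = a/(a+b) = 0.4

# sample correlation of U and V should be near 0 under independence
mv = sum(vs) / N
su = (sum((u - mu) ** 2 for u in us) / N) ** 0.5
sv = (sum((v - mv) ** 2 for v in vs) / N) ** 0.5
corr = sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / (N * su * sv)
assert abs(corr) < 0.02
```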
Exercise 20.3 Show that the independence of U = X/(X + Y) and V = X + Y also implies the independence of the following pairs of random variables:

(a) Y/X and X + Y;

(b) (X² − Y²)/(XY) and X + Y;

(c) (X − Y)²/(X + Y)² and (X + Y)².
Exercise 20.4 Exploiting the independence of U = X/(X + Y) and V = X + Y, the fact that X and Y are both gamma, and the moments of the gamma distribution, derive the moment formula of the beta distribution in (16.8).
Once again, let X ~ Γ(α, 0, λ) and Y ~ Γ(β, 0, λ) be independent random variables, and let V = X + Y. Consider now the conditional distribution of X given that V = v is fixed. The results presented above enable us to state that the conditional distributions of X/v and X/V are the same, given that V = v. Thus, the conditional distribution of X/v, given that X + Y = v, is beta(α, β). If α = β = 1 [which means that X and Y have the common exponential E(λ) distribution], the conditional distribution of X, given that X + Y = v, becomes uniform U(0, v).

The following more general result is also valid for independent gamma random variables. Let X_k ~ Γ(α_k, 0, λ), k = 1, 2, ..., n, be independent random variables. Then,

(vX₁/(X₁ + ⋯ + X_n), ..., vX_n/(X₁ + ⋯ + X_n)) has the same distribution as {X₁, ..., X_n | V = X₁ + ⋯ + X_n = v}.   (20.25)
Recalling now the representation (18.25) for the uniform order statistics U_{1,n}, ..., U_{n,n}, which identifies the joint distribution of {U_{1,n}, ..., U_{n,n}} with that of {S₁/S_{n+1}, ..., S_n/S_{n+1}}, where

S_k = X₁ + ⋯ + X_k,  k = 1, 2, ...,

and the X_k ~ E(1) are standard exponential random variables, we can use (20.25) to obtain another representation for the uniform order statistics:

{U_{1,n}, ..., U_{n,n}} has the same joint distribution as {S₁, ..., S_n | S_{n+1} = 1}.   (20.26)

20.9 Limiting Distributions
Limiting Distributions
Consider a sequence of random variables Yl,Y2, . . . , where
~ , ~ r ( n , o , i ) , n = 1 , 2 ,..., and with it, generate a new sequence
( 20.2 7)
with its characteristic function f n ( t )being
(20.28)
GAMMA DISTRIBUTION
188
Exercise 20.5 Show that, for any fixed t, as n → ∞,

f_n(t) → e^{−t²/2}.   (20.29)
Earlier, we encountered the characteristic function e"l2 [for example, in Eqs. (5.43), (7.31) and (9.33)] corresponding to the standard normal distribution. Wc have, therefore, just esta.blished that the normal distribution is t,he limiting distribution for the sequence of gamma random variables W, in (20.27).
Exercise 20.6 Let \Phi(x) denote the cumulative distribution function of a random variable with the characteristic function e^{-t^2/2}. Later, in Chapter 23, we will find that

\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt.

Making use of the limiting relation in (20.29), prove that

\int_{0}^{n+\sqrt{n}} \frac{z^{n-1}}{(n-1)!}\, e^{-z}\, dz \to \Phi(1) = 0.84134\ldots   (20.30)

as n \to \infty.
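The limit in (20.30) can be checked numerically. The sketch below is an illustration we add here (the function names are ours, not from the text): for integer n, the \Gamma(n, 0, 1) cdf admits the closed Poisson-sum form 1 - \sum_{k<n} e^{-x} x^k/k!, so P\{Y_n \le n + \sqrt{n}\} can be evaluated exactly and compared with \Phi(1).

```python
import math

def gamma_cdf_int(n, x):
    """P{Y_n <= x} for Y_n ~ Gamma(n, 0, 1) with integer n.
    Uses 1 - sum_{k<n} e^{-x} x^k / k!, summed on a normalized
    log scale so large n does not overflow."""
    logs = [-x + k * math.log(x) - math.lgamma(k + 1) for k in range(n)]
    m = max(logs)
    return 1.0 - math.exp(m) * sum(math.exp(v - m) for v in logs)

phi_1 = 0.5 * (1.0 + math.erf(1.0 / math.sqrt(2.0)))  # Phi(1) = 0.84134...

for n in (10, 100, 10_000):
    print(n, round(gamma_cdf_int(n, n + math.sqrt(n)), 5))
print(round(phi_1, 5))
```

As n grows, the printed probabilities approach \Phi(1), in line with (20.30).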
CHAPTER 21

EXTREME VALUE DISTRIBUTIONS

21.1 Introduction

In Chapter 11 we considered the minimal value m_n = \min\{U_1, \ldots, U_n\} of i.i.d. uniform U(0, 1) random variables U_1, U_2, \ldots, and determined [see Eq. (11.36)] that the asymptotic distribution of n m_n (as n \to \infty) becomes the standard exponential distribution. Instead, if we take independent exponential E(1) random variables X_1, X_2, \ldots, and generate a sequence of minimal values
z_n = \min\{X_1, X_2, \ldots, X_n\}, n = 1, 2, \ldots,
then, as seen in Chapter 18 [see Eq. (18.24)], the sequence n z_n converges in distribution (as n \to \infty) to the standard exponential distribution. Consider now the corresponding maximal values

M_n = \max\{U_1, \ldots, U_n\} and Z_n = \max\{X_1, X_2, \ldots, X_n\}, n = 1, 2, \ldots.

Then, as seen earlier in Eq. (11.38), as n \to \infty,

P\{n(M_n - 1) < z\} \to e^{z}, z < 0.

This simply means that the sequence n(1 - M_n) converges asymptotically to the same standard exponential distribution. This fact is not surprising to us since it is clear that the uniform U(0, 1) distribution is symmetric with respect to 1/2 and, consequently,

1 - M_n \stackrel{d}{=} m_n.

Let us now find the asymptotic distribution of a suitably normalized maximal value Z_n. We have in this case the following result:

P\{Z_n - \ln n < z\} = \left(1 - \exp\{-(z + \ln n)\}\right)^n \to \exp\{-e^{-z}\}   (21.1)
as n \to \infty. Thus, unlike in the previous cases, we now have a new (non-exponential) distribution for normalized extremes. The natural question that arises here is regarding the set of all possible limiting distributions for maximal and minimal values in a sequence of independent and identically distributed (i.i.d.) random variables. The evident relationship

\max\{Y_1, Y_2, \ldots, Y_n\} \stackrel{d}{=} -\min\{-Y_1, -Y_2, \ldots, -Y_n\}   (21.2)

requires us to find only one of these two sets of asymptotic distributions. Indeed, if some cdf H(x) is the limiting distribution for a sequence \max\{Y_1, Y_2, \ldots, Y_n\}, n = 1, 2, \ldots, then the cdf G(x) = 1 - H(-x) would be the limiting distribution for the sequence \min\{Y_1, Y_2, \ldots, Y_n\}, n = 1, 2, \ldots, and vice versa.
21.2 Limiting Distributions of Maximal Values
We are interested in all possible limiting cdf's H(x) for sequences

H_n(x) = \{F(a_n x + b_n)\}^n,   (21.3)

where F is the cdf of the underlying i.i.d. random variables Y_1, Y_2, \ldots, and a_n > 0 and b_n (n = 1, 2, \ldots) are some normalizing constants; hence, H_n(x) is the cdf of

V_n = \frac{\max\{Y_1, Y_2, \ldots, Y_n\} - b_n}{a_n}.

Of course, for any F, we can always find a sequence a_n (n = 1, 2, \ldots) which provides the convergence of the sequence V_n to a degenerate limiting distribution. Therefore, our aim here is to find all possible nondegenerate cdf's H(x).

Lemma 21.1 In order for a nondegenerate cdf H(x) to be the limit of sequence (21.3) for some cdf F and normalizing constants a_n > 0 and b_n (n = 1, 2, \ldots), it is necessary and sufficient that for any s > 0 and x,

H^s[A(s)x + B(s)] = H(x),   (21.4)

where A(s) > 0 and B(s) are some functions defined for s > 0.

Thus, our problem of finding the asymptotic distributions of maxima is reduced to finding all solutions of the functional equation in (21.4). It turns out that all solutions H(x) (up to location and scale parameters) are as follows:

H_{1,\alpha}(x) = \exp\{-x^{-\alpha}\}, x > 0, \alpha > 0,   (21.5)
H_{2,\alpha}(x) = \exp\{-(-x)^{\alpha}\}, x \le 0, \alpha > 0,   (21.6)
H_3(x) = e^{-e^{-x}}, -\infty < x < \infty.   (21.7)

Correspondingly, the cdf's

G_{1,\alpha}(x) = 1 - H_{1,\alpha}(-x) = 1 - \exp\{-(-x)^{-\alpha}\}, x < 0, \alpha > 0,   (21.8)
G_{2,\alpha}(x) = 1 - H_{2,\alpha}(-x) = 1 - \exp\{-x^{\alpha}\}, x \ge 0, \alpha > 0,   (21.9)
G_3(x) = 1 - H_3(-x) = 1 - e^{-e^{x}}, -\infty < x < \infty,   (21.10)

can also be limiting distributions of minimal values. G_2 is commonly known as the Weibull distribution.
21.4 Relationships Between Extreme Value Distributions
As seen in the preceding two sections, we have three types of extreme value distributions for maxima and three corresponding types of extreme value distributions for minima. The term extreme value distributions includes all distributions with cdf's

H_{1,\alpha}\left(\frac{x-a}{\lambda}\right), H_{2,\alpha}\left(\frac{x-a}{\lambda}\right), H_3\left(\frac{x-a}{\lambda}\right), G_{1,\alpha}\left(\frac{x-a}{\lambda}\right), G_{2,\alpha}\left(\frac{x-a}{\lambda}\right), G_3\left(\frac{x-a}{\lambda}\right),

with the standard members (when a = 0 and \lambda = 1) being as given in (21.5)-(21.10). Often, the name extreme value distribution has been used in the literature only for distributions with cdf's H_3((x-a)/\lambda). It is useful to remember that all six types of extreme value distributions given above are closely connected with exponential distributions.
Exercise 21.1 Let X have a standard exponential distribution. Then, show that the random variables

X^{-1/\alpha}, -X^{1/\alpha}, -\log X, -X^{-1/\alpha}, X^{1/\alpha}, \log X

have, respectively, the distributions

H_{1,\alpha}, H_{2,\alpha}, H_3, G_{1,\alpha}, G_{2,\alpha}, G_3.
Linear transformations of the random variables mentioned in Exercise 21.1 enable us to express the distribution of any random variable with the cdf's listed above via the standard exponential distribution. Note also that the exponential E(a, \lambda) distribution is a special case of the Weibull distribution because its cdf coincides with G_{2,1}((x-a)/\lambda). Furthermore, if we take Y = \sqrt{X}, where X ~ E(1), then it is easy to show that

F_Y(x) = P\{Y < x\} = 1 - e^{-x^2}, x > 0.   (21.11)

We see that the RHS of (21.11) coincides with the Weibull cdf G_{2,2}(x). This distribution of Y is called the standard Rayleigh distribution, while linear transformations a + \lambda Y yield the general two-parameter Rayleigh distribution with cdf

F(x) = 1 - \exp\left\{-\left(\frac{x-a}{\lambda}\right)^2\right\}, x > a.   (21.12)
Exercise 21.2 If X denotes a standard exponential random variable and Y = \sqrt{X}, show that the cdf of Y is as given in (21.11). Then, derive the mean and variance of Y.
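A quick simulation sketch (our own illustration, with arbitrary seed and sample size) previews the answer to Exercise 21.2: for the standard Rayleigh variable Y = \sqrt{X}, the mean is \Gamma(3/2) = \sqrt{\pi}/2 and the variance is 1 - \pi/4.

```python
import math
import random

random.seed(12345)
n = 200_000
# Y = sqrt(X) with X ~ E(1); by (21.11), Y is standard Rayleigh.
ys = [math.sqrt(random.expovariate(1.0)) for _ in range(n)]
mean = sum(ys) / n
var = sum((y - mean) ** 2 for y in ys) / n
print(round(mean, 3), round(var, 3))  # close to sqrt(pi)/2 and 1 - pi/4
```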
21.5 Generalized Extreme Value Distributions
It turns out that all the limiting distributions of maxima as well as all the limiting distributions of minima can be presented in a unified form. For this purpose, let us introduce the family of cdf's H(x, \beta) (-\infty < \beta < \infty) which are defined as

H(x, \beta) = \exp\{-(1 + x\beta)^{-1/\beta}\}   (21.13)

in the domain 1 + x\beta > 0, and we suppose that H(x, \beta) equals 0 or 1 (depending on the sign of \beta) if 1 + x\beta \le 0. For \beta = 0, H(x, 0) means the limit of H(x, \beta) as \beta \to 0. Let us first consider the case \beta = 1/\alpha, where \alpha > 0. Then,

H(x, 1/\alpha) = \exp\{-(1 + x/\alpha)^{-\alpha}\}.   (21.14)

It is easy to see that the cdf (21.14) coincides with H_{1,\alpha}(1 + x/\alpha). Next, if \beta = -1/\alpha, where \alpha > 0, then

H(x, -1/\alpha) = \exp\{-(1 - x/\alpha)^{\alpha}\}.   (21.15)

This cdf coincides with H_{2,\alpha}(x/\alpha - 1). Finally, for \beta = 0, we have

H(x, 0) = e^{-e^{-x}} = H_3(x).   (21.16)

Thus, the derivations above show that the three-parameter family of cdf's

H(x, \beta, a, \lambda) = H\left(\frac{x-a}{\lambda}, \beta\right), -\infty < a < \infty, \lambda > 0,   (21.17)

includes all the cdf's H_{1,\alpha}((x-a)/\lambda), H_{2,\alpha}((x-a)/\lambda), and H_3((x-a)/\lambda) as special cases. Equation (21.17) defines the generalized extreme value distributions for maxima, while H(x, \beta) in (21.13) corresponds to its standard form. Similarly,

G(x, \beta, a, \lambda) = 1 - H(-x, \beta, a, \lambda)   (21.19)
defines the generalized extreme value distributions for minima, and

G(x, \beta) = 1 - H(-x, \beta)   (21.20)

corresponds to its standard form.
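The reductions (21.14)-(21.16) are easy to confirm numerically. The following sketch is our own illustration (the helper names H, H1, H2 are ours): it evaluates the standard GEV cdf (21.13) at \beta = \pm 1/\alpha and at \beta near 0.

```python
import math

def H(x, beta):
    """Standard generalized extreme value cdf (21.13); beta = 0 gives H3."""
    if beta == 0.0:
        return math.exp(-math.exp(-x))
    t = 1.0 + x * beta
    if t <= 0.0:
        return 0.0 if beta > 0 else 1.0
    return math.exp(-t ** (-1.0 / beta))

def H1(x, alpha):
    """H_{1,alpha}(x) = exp(-x^(-alpha)), x > 0."""
    return math.exp(-x ** (-alpha)) if x > 0 else 0.0

def H2(x, alpha):
    """H_{2,alpha}(x) = exp(-(-x)^alpha), x <= 0."""
    return math.exp(-((-x) ** alpha)) if x <= 0 else 1.0

alpha, x = 2.0, 0.7
print(H(x, 1 / alpha), H1(1 + x / alpha, alpha))    # matches (21.14)
print(H(x, -1 / alpha), H2(x / alpha - 1, alpha))   # matches (21.15)
print(H(x, 1e-8), H(x, 0.0))                        # beta -> 0 gives (21.16)
```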
21.6 Moments
Making use of the representation in Exercise 21.1, we can express moments of the extreme value distributions in terms of moments of the standard exponential distribution. Let random variables Y, W, and V have cdf's H_{1,\alpha}(x), H_{2,\alpha}(x), and H_3(x), respectively. Then, from Exercise 21.1, we have the following relations:

Y \stackrel{d}{=} X^{-1/\alpha}, W \stackrel{d}{=} -X^{1/\alpha}, V \stackrel{d}{=} -\log X,   (21.21)

where X has the standard exponential distribution. Hence, we have

EY^k = EX^{-k/\alpha} = \int_0^{\infty} x^{-k/\alpha} e^{-x}\, dx,   (21.22)

EW^k = (-1)^k EX^{k/\alpha} = (-1)^k \int_0^{\infty} x^{k/\alpha} e^{-x}\, dx,   (21.23)

and

EV^k = E(-\log X)^k = \int_0^{\infty} (-\log x)^k e^{-x}\, dx.   (21.24)

It readily follows from (21.22) that moments EY^k exist if k < \alpha and that

EY^k = \Gamma(1 - k/\alpha).   (21.25)

Relation (21.23) reveals that moments EW^k exist for any \alpha > 0 and k = 1, 2, \ldots, and

EW^k = (-1)^k \Gamma(1 + k/\alpha).   (21.26)

From (21.25) and (21.26), we also obtain

Var Y = \Gamma(1 - 2/\alpha) - \{\Gamma(1 - 1/\alpha)\}^2, \alpha > 2,   (21.27)

and

Var W = \Gamma(1 + 2/\alpha) - \{\Gamma(1 + 1/\alpha)\}^2   (21.28)

for any \alpha > 0.
It is known that Euler's constant \gamma = 0.57722\ldots is defined as

\gamma = \lim_{n \to \infty} \left( \sum_{k=1}^{n} \frac{1}{k} - \log n \right)   (21.29)

and admits the integral representation

\gamma = -\int_0^{\infty} \log x \; e^{-x}\, dx.   (21.30)
Comparing (21.24) and (21.30), we immediately see that

EV = \gamma = 0.57722\ldots.   (21.31)
Another way to obtain (21.31) is through the characteristic function f_V(t) of the random variable V. We have

f_V(t) = Ee^{itV} = Ee^{-it\log X} = EX^{-it} = \int_0^{\infty} x^{-it} e^{-x}\, dx = \Gamma(1 - it).   (21.32)

From (21.32) and the relation f_V^{(k)}(0) = i^k EV^k, we readily find that

EV^k = (-1)^k \Gamma^{(k)}(1), k = 1, 2, \ldots.   (21.33)

The following useful identity, which is valid for positive z, helps us to find the necessary derivatives of the gamma function:

\psi(z) = \frac{\Gamma'(z)}{\Gamma(z)} = -\gamma + \sum_{k=0}^{\infty} \left( \frac{1}{k+1} - \frac{1}{k+z} \right).   (21.34)

Since

\psi(1) = -\gamma and \psi'(1) = \sum_{k=0}^{\infty} \frac{1}{(k+1)^2} = \frac{\pi^2}{6},   (21.35)

we obtain from (21.33) and (21.34) that

EV = -\Gamma'(1) = \gamma   (21.36)

and

EV^2 = \Gamma''(1) = \gamma^2 + \frac{\pi^2}{6}.   (21.37)

It follows now from (21.36) and (21.37) that

Var V = \frac{\pi^2}{6}.   (21.38)
Now, let the random variables Y_1, W_1, and V_1 have cdf's G_{1,\alpha}(x), G_{2,\alpha}(x), and G_3(x), respectively. Since

Y_1 \stackrel{d}{=} -Y, W_1 \stackrel{d}{=} -W, V_1 \stackrel{d}{=} -V,   (21.39)

we immediately obtain

EY_1^k = (-1)^k \Gamma(1 - k/\alpha), k < \alpha,   (21.40)

EW_1^k = \Gamma(1 + k/\alpha), k = 1, 2, \ldots,   (21.41)

Var Y_1 = Var Y = \Gamma(1 - 2/\alpha) - \{\Gamma(1 - 1/\alpha)\}^2, \alpha > 2,   (21.42)

Var W_1 = Var W = \Gamma(1 + 2/\alpha) - \{\Gamma(1 + 1/\alpha)\}^2,   (21.43)

EV_1 = -EV = -\gamma,   (21.44)

and

Var V_1 = Var V = \frac{\pi^2}{6}.   (21.45)
It is important to mention here that the extreme value distributions discussed in this chapter have assumed a very important role in life-testing and reliability problems, besides being used as probabilistic models in a variety of other problems.
CHAPTER 22

LOGISTIC DISTRIBUTION

22.1 Introduction

Let V_1 and V_2 be i.i.d. random variables having the extreme value distribution with cdf [see (21.7)]

H_3(x) = e^{-e^{-x}}, -\infty < x < \infty.

Let V = V_1 - V_2. Then, the cdf F_V(x) of V is obtained as

F_V(x) = \int_{-\infty}^{\infty} e^{-e^{-(x+u)}}\, e^{-u} e^{-e^{-u}}\, du = \frac{1}{1 + e^{-x}}, -\infty < x < \infty.

This distribution is a particular case of the logistic distribution, which has been known since the pioneering work of Verhulst (1838, 1845) on demography. A book-length account of logistic distributions, discussing in great detail their various properties and applications, is available [Balakrishnan (1992)].
22.2 Notations
A random variable X is said to have a logistic distribution if its pdf is given by

p_X(x) = \frac{\pi}{\sigma\sqrt{3}} \, \frac{e^{-\pi(x-\mu)/(\sigma\sqrt{3})}}{\left\{1 + e^{-\pi(x-\mu)/(\sigma\sqrt{3})}\right\}^2}, -\infty < x < \infty.   (22.2)

The corresponding cdf is given by

F_X(x) = \frac{1}{1 + e^{-\pi(x-\mu)/(\sigma\sqrt{3})}}, -\infty < x < \infty.   (22.3)

We will use X ~ Lo(\mu, \sigma^2) to denote the random variable X which has the logistic distribution with pdf and cdf as in (22.2) and (22.3), respectively. It is evident that \mu (-\infty < \mu < \infty) is the location parameter while \sigma (\sigma > 0) is the scale parameter. Shortly, we will show that \mu and \sigma^2 are, in fact, the mean and variance of this logistic distribution. The standard logistic random variable, denoted by Y ~ Lo(0, 1), has its pdf and cdf as

p_Y(x) = \frac{\pi}{\sqrt{3}} \, \frac{e^{-\pi x/\sqrt{3}}}{\left\{1 + e^{-\pi x/\sqrt{3}}\right\}^2} and F_Y(x) = \frac{1}{1 + e^{-\pi x/\sqrt{3}}}, -\infty < x < \infty.
24.3 Inverse Gaussian Distribution
The two-parameter inverse Gaussian distribution, denoted by IG(\mu, \lambda), has its pdf as

p_X(x) = \sqrt{\frac{\lambda}{2\pi x^3}} \exp\left\{ -\frac{\lambda(x - \mu)^2}{2\mu^2 x} \right\}, x > 0, \lambda, \mu > 0,   (24.13)

and the corresponding cdf as

F_X(x) = \Phi\left( \sqrt{\frac{\lambda}{x}} \left( \frac{x}{\mu} - 1 \right) \right) + e^{2\lambda/\mu}\, \Phi\left( -\sqrt{\frac{\lambda}{x}} \left( \frac{x}{\mu} + 1 \right) \right), x > 0.   (24.14)

The characteristic function of IG(\mu, \lambda) can be shown to be

f_X(t) = \exp\left\{ \frac{\lambda}{\mu} \left( 1 - \sqrt{1 - \frac{2i\mu^2 t}{\lambda}} \right) \right\}.   (24.15)
Exercise 24.2 From the characteristic function in (24.15), show that EX = \mu and Var X = \mu^3/\lambda.
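The moments asked for in Exercise 24.2 can be previewed by direct numerical integration of the density (24.13). The sketch below is our own illustration (parameter values \mu = 2, \lambda = 3 are arbitrary), using a simple trapezoidal rule:

```python
import math

def ig_pdf(x, mu, lam):
    """Inverse Gaussian density (24.13)."""
    return math.sqrt(lam / (2.0 * math.pi * x ** 3)) * \
        math.exp(-lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x))

mu, lam = 2.0, 3.0
a, b, m = 1e-9, 60.0, 120_000          # tail beyond 60 is negligible here
h = (b - a) / m
xs = [a + i * h for i in range(m + 1)]
w = [ig_pdf(x, mu, lam) for x in xs]
mass = h * (sum(w) - 0.5 * (w[0] + w[-1]))
mean = h * sum(x * wi for x, wi in zip(xs, w))
var = h * sum(x * x * wi for x, wi in zip(xs, w)) - mean ** 2
print(round(mass, 4), round(mean, 4), round(var, 4))  # ~1, mu, mu^3/lam
```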
Exercise 24.3 Show that the Pearson coefficients of skewness and kurtosis are given by

\sqrt{\beta_1} = 3\sqrt{\mu/\lambda} and \gamma_2 = 3 + 15\mu/\lambda,

respectively, thus revealing that IG(\mu, \lambda) distributions are positively skewed and leptokurtic. Note that these distributions are represented by the line \gamma_2 = 3 + (5/3)\beta_1 in the (\beta_1, \gamma_2)-plane.
By taking \lambda = \mu^2 in (24.13), we obtain the one-parameter inverse Gaussian distribution with pdf

p_X(x) = \frac{\mu}{\sqrt{2\pi x^3}} \exp\left\{ -\frac{(x - \mu)^2}{2x} \right\}, x > 0, \mu > 0,   (24.16)

denoted by IG(\mu, \mu^2). Another one-parameter inverse Gaussian distribution may be derived from (24.13) by letting \mu \to \infty. This results in the pdf

p_X(x) = \sqrt{\frac{\lambda}{2\pi x^3}} \, e^{-\lambda/(2x)}, x > 0, \lambda > 0.
Exercise 24.4 Suppose X_1, X_2, \ldots, X_n are independent inverse Gaussian random variables with X_i distributed as IG(\mu_i, \lambda_i). Then, using the characteristic function in (24.15), show that \sum_{i=1}^{n} \lambda_i X_i/\mu_i^2 is distributed as IG(\mu, \mu^2), where \mu = \sum_{i=1}^{n} \lambda_i/\mu_i. Show then, when \mu_i = \mu and \lambda_i = \lambda for all i = 1, 2, \ldots, n, that the sample mean \bar{X} is distributed as IG(\mu, n\lambda).

Inverse Gaussian distributions have many properties analogous to those of normal distributions. Hence, considerable attention has been paid in the literature to inferential procedures for inverse Gaussian distributions as well as their applications. For a detailed discussion of these developments, one may refer to the books by Chhikara and Folks (1989) and Seshadri (1993, 1998).
24.4 Chi-square Distribution
In Chapter 20 (see, for example, Section 20.2), we made a passing remark that the special case of the gamma \Gamma(n/2, 0, 2) distribution (where n is a positive integer) is called the chi-square (\chi^2) distribution with n degrees of freedom. We shall denote the corresponding random variable by \chi^2_n. Then, from (20.6), we have its density function as

p_{\chi^2_n}(x) = \frac{1}{2^{n/2} \Gamma(n/2)}\, e^{-x/2} x^{(n/2)-1}, 0 < x < \infty.   (24.17)

From (20.9), we also have the characteristic function of \chi^2_n as

f_{\chi^2_n}(t) = (1 - 2it)^{-n/2}.   (24.18)

From (20.15) and (20.17), we have the mean and variance of \chi^2_n as

E\chi^2_n = n and Var \chi^2_n = 2n.   (24.19)

Furthermore, from (20.18) and (20.19), we have the coefficients of skewness and kurtosis of \chi^2_n as

\sqrt{\beta_1} = 2\sqrt{2/n} and \gamma_2 = 3 + 12/n.   (24.20)

Also, as shown in Chapter 20, the limiting distribution of the sequence (\chi^2_n - n)/\sqrt{2n} is standard normal. Next, let \chi^2_n and \chi^2_m be two independent chi-square random variables, and let \chi^2 = \chi^2_n + \chi^2_m. Then, from (24.18), we obtain the characteristic function of \chi^2 as

f_{\chi^2}(t) = Ee^{it\chi^2} = Ee^{it\chi^2_n}\, Ee^{it\chi^2_m} = (1 - 2it)^{-(n+m)/2},   (24.21)

which readily implies that \chi^2 has a chi-square distribution with (n + m) degrees of freedom. On the other hand, if \chi^2_n and X are independent random variables with X having an arbitrary distribution, and if \chi^2 = \chi^2_n + X is distributed as chi-square with (n + m) degrees of freedom (where m is a positive integer), then the characteristic function of X is

f_X(t) = \frac{(1 - 2it)^{-(n+m)/2}}{(1 - 2it)^{-n/2}} = (1 - 2it)^{-m/2},   (24.22)

which implies that X is necessarily distributed as \chi^2_m. Let X_1, \ldots, X_n be independent standard normal N(0, 1) random variables. Then, as noted in Chapter 23 [see, for example, Eq. (23.81)],

\sum_{k=1}^{n} X_k^2

follows a chi-square distribution with n degrees of freedom. More generally, the following result can be established.
Exercise 24.5 Let Y_1, \ldots, Y_n be a random sample from the normal N(a, \sigma^2) distribution, and let \bar{Y} = \sum_{k=1}^{n} Y_k/n denote the sample mean. Then, show that [see Eq. (23.60)]

\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2} \sum_{k=1}^{n} (Y_k - \bar{Y})^2 \stackrel{d}{=} \chi^2_{n-1}.   (24.23)

It is because of this fact that chi-square distributions play a very important role in statistical inferential problems. A book-length account of chi-square distributions, discussing in great detail their various properties and applications, is available [Lancaster (1969)].
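The fact in (24.23) can be illustrated by simulation. The sketch below is our own illustration (sample size, seed, and parameters are arbitrary); it checks that (n-1)S^2/\sigma^2 has the \chi^2_{n-1} mean n-1 and variance 2(n-1):

```python
import random

random.seed(7)
n, reps, sigma = 6, 40_000, 2.0
stats = []
for _ in range(reps):
    ys = [random.gauss(10.0, sigma) for _ in range(n)]
    ybar = sum(ys) / n
    s2 = sum((y - ybar) ** 2 for y in ys) / (n - 1)
    stats.append((n - 1) * s2 / sigma ** 2)  # should behave like chi-square_{n-1}
m = sum(stats) / reps
v = sum((t - m) ** 2 for t in stats) / reps
print(round(m, 2), round(v, 2))  # theoretical values: n-1 = 5 and 2(n-1) = 10
```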
24.5 t Distribution
Let X, X_1, \ldots, X_n be i.i.d. random variables having the standard normal N(0, 1) distribution. Then, consider the random variable [see also Eq. (23.85)]

T_n = \frac{X}{\sqrt{S_n/n}}, where S_n = \sum_{k=1}^{n} X_k^2,   (24.24)

where the numerator and denominator are independent, with the numerator having a standard normal distribution and S_n having a chi-square distribution with n degrees of freedom. Then, as given in Exercise 23.8, the pdf of this random variable is given by [see Eq. (23.86)]

p_{t_n}(x) = \frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\, \Gamma(n/2)} \left( 1 + \frac{x^2}{n} \right)^{-(n+1)/2}, -\infty < x < \infty,   (24.25)

which is called Student's t distribution with n degrees of freedom. Let us denote this distribution by t_n. This is a special form of Karl Pearson's Type VII distribution. Since "Student" (1908) was the first to obtain this result, it is called Student's distribution. But sometimes this distribution is also called Fisher's distribution. More generally, the following result can be established.

Exercise 24.6 Let Y_1, \ldots, Y_n be a random sample from the normal N(a, \sigma^2) distribution, and let \bar{Y} = \sum_{k=1}^{n} Y_k/n denote the sample mean. Then, with S^2 as defined in (24.23), show that

\frac{\sqrt{n}(\bar{Y} - a)}{S} \stackrel{d}{=} t_{n-1}.   (24.26)
It is for this reason that t distributions play a very important role in statistical inferential problems. From the density function of X \stackrel{d}{=} t_n in (24.25), it can be shown that the r-th moment of X is finite only for r < n. Since the density function is symmetric about x = 0, all odd moments of X are zero. If r is even, it can be shown that the r-th moment of X (which are also central moments) is given by

EX^r = n^{r/2}\, \frac{1 \cdot 3 \cdots (r-1)}{(n-r)(n-r+2) \cdots (n-2)}.   (24.27)

Exercise 24.7 Derive the formula in (24.27).
From the expressions above, we readily obtain the mean, variance, and coefficients of skewness and kurtosis of X \stackrel{d}{=} t_n as

EX = 0 (n > 1), Var X = \frac{n}{n-2} (n > 2), \gamma_1(X) = 0 (n > 3), and \gamma_2(X) = \frac{3(n-2)}{n-4} (n > 4).   (24.28)

It is evident that the t distributions are symmetric, unimodal, bell-shaped and leptokurtic distributions. In addition, as mentioned earlier in Exercise 23.9, t_n distributions converge in limit (as n \to \infty) to the standard normal distribution. Plots of the t density function presented in Figures 24.1 and 24.2 reveal these properties.
Exercise 24.8 Let X and Y be i.i.d. random variables with t_n distribution. Then, show that

(24.29)

a result established by Cacoullos (1965).
In a recently published article, Jones (2002) observed that the t_2 distribution has simple forms for its distribution and quantile functions which lead to simple calculations for many properties and measures relating to this distribution. The t density in (24.25) reduces, for the case n = 2, simply to

p_2(t) = \frac{1}{(2 + t^2)^{3/2}}, -\infty < t < \infty,   (24.30)

from which we readily obtain the cdf (upon setting u = \sqrt{2} \tan \theta in the integral) as

F_2(t) = \int_{-\infty}^{t} \frac{du}{(2 + u^2)^{3/2}} = \frac{1}{2} \left\{ 1 + \frac{t}{\sqrt{2 + t^2}} \right\}, -\infty < t < \infty.   (24.31)
Exercise 24.9 From (24.31), show that the quantile function of the t_2 distribution is

F_2^{-1}(u) = \frac{2u - 1}{\sqrt{2u(1 - u)}}, 0 < u < 1.

24.6 F Distribution

For the F distribution with (m, n) degrees of freedom, denoted by F_{m,n}, the mean and variance are

EX = \frac{n}{n-2} (n > 2) and Var X = \frac{2n^2(m + n - 2)}{m(n-2)^2(n-4)} (n > 4).   (24.35)

Exercise 24.11 If X \stackrel{d}{=} F_{1,n}, then show that \sqrt{X} \stackrel{d}{=} |t_n|.
24.7 Noncentral Distributions
In Section 24.4 we noted that when X_1, X_2, \ldots, X_n are i.i.d. N(0, 1) random variables, the variable S_n = \sum_{k=1}^{n} X_k^2 has a chi-square distribution with n degrees of freedom. Now, consider the distribution of the variable

S'_n = \sum_{k=1}^{n} (X_k + a_k)^2.   (24.36)
The distribution of S'_n depends on a_1, a_2, \ldots, a_n only through \lambda = \sum_{k=1}^{n} a_k^2, and is called the noncentral chi-square distribution with n degrees of freedom and noncentrality parameter \lambda. When \lambda = 0 (i.e., when all a_1, \ldots, a_n are zero), this noncentral chi-square distribution becomes the (central) chi-square distribution in (24.17).

Exercise 24.12 Let Y_1, Y_2, \ldots, Y_n be independent random variables with Y_k distributed as N(a_k, \sigma^2), and let \bar{Y} = \sum_{k=1}^{n} Y_k/n denote the sample mean. Then, show that

\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2} \sum_{k=1}^{n} (Y_k - \bar{Y})^2

is distributed as noncentral chi-square with n - 1 degrees of freedom and noncentrality parameter \lambda = \sum_{k=1}^{n} (a_k - \bar{a})^2/\sigma^2, where \bar{a} = \sum_{k=1}^{n} a_k/n.

Exercise 24.13 From (24.36), derive the mean and variance of the noncentral chi-square distribution with n degrees of freedom and noncentrality parameter \lambda = \sum_{k=1}^{n} a_k^2.
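The answer to Exercise 24.13 (mean n + \lambda and variance 2(n + 2\lambda)) can be previewed by simulating (24.36) directly; the shift vector below is an arbitrary illustration of ours, not from the text.

```python
import random

random.seed(42)
a = [0.5, -1.0, 2.0, 0.0]            # illustrative shifts a_k
n, reps = len(a), 60_000
lam = sum(t * t for t in a)          # noncentrality lambda = 5.25
vals = [sum((random.gauss(0.0, 1.0) + ak) ** 2 for ak in a)
        for _ in range(reps)]
m = sum(vals) / reps
v = sum((x - m) ** 2 for x in vals) / reps
print(round(m, 2), round(v, 2))      # theory: n + lam and 2(n + 2*lam)
```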
In a similar manner, we can define noncentral t and noncentral F distributions which are useful in studying the power properties of t and F tests. For example, in Eq. (24.32), we defined the F distribution with (m, n) degrees of freedom as the distribution of the variable

X = \frac{V/m}{W/n},

where V and W are independent chi-square random variables with m and n degrees of freedom, respectively. Now, consider the distribution of the variable

X' = \frac{V'/m}{W'/n},   (24.37)

where V' and W' are independent noncentral chi-square random variables with m and n degrees of freedom and noncentrality parameters \lambda_1 and \lambda_2, respectively. The distribution of X' is called the doubly noncentral F distribution with (m, n) degrees of freedom and noncentrality parameters (\lambda_1, \lambda_2). In the special case when \lambda_2 = 0 (i.e., when there is a central chi-square in the denominator), the distribution of X' is called the (singly) noncentral F distribution with (m, n) degrees of freedom and noncentrality parameter \lambda_1.
Part III
MULTIVARIATE DISTRIBUTIONS
CHAPTER 25
MULTINOMIAL DISTRIBUTION

25.1 Introduction
The multinomial distribution, being the multivariate generalization of the binomial distribution discussed in Chapter 5, is one of the most important and interesting multivariate discrete distributions. Consider a sequence of independent and identical trials, each of which can result in one of k possible mutually exclusive and collectively exhaustive events, say, A_1, A_2, \ldots, A_k, with respective probabilities p_1, p_2, \ldots, p_k, where p_1 + p_2 + \cdots + p_k = 1. Such trials are termed multinomial trials. Let Y_\ell = (Y_{1,\ell}, Y_{2,\ell}, \ldots, Y_{k,\ell}), \ell = 1, 2, \ldots, be the indicator vector variables; that is, Y_{j,\ell} takes on the value 1 if the event A_j (j = 1, 2, \ldots, k) is the outcome of the \ell-th trial and the value 0 if the event A_j is not the outcome of the \ell-th trial. Note that the variables Y_{1,\ell}, Y_{2,\ell}, \ldots, Y_{k,\ell} (which are the components of the vector Y_\ell) are dependent, and that

Y_{1,\ell} + Y_{2,\ell} + \cdots + Y_{k,\ell} = 1, \ell = 1, 2, \ldots.   (25.1)

For any n = 1, 2, \ldots, let us now define the random vector X_n = (X_{1,n}, X_{2,n}, \ldots, X_{k,n}) as

X_n = Y_1 + Y_2 + \cdots + Y_n, n = 1, 2, \ldots,   (25.2)

where X_{j,n} = Y_{j,1} + Y_{j,2} + \cdots + Y_{j,n} (j = 1, 2, \ldots, k) is the number of occurrences of event A_j in the n multinomial trials. In other words, the random vector X_n is simply a counter which gives us the number of occurrences of the events A_1, A_2, \ldots, A_k in the n multinomial trials, and hence

X_{1,n} + X_{2,n} + \cdots + X_{k,n} = n, n = 1, 2, \ldots.   (25.3)
Then, simple probability arguments readily yield

P_n(m_1, m_2, \ldots, m_k) = \Pr\{X_{1,n} = m_1, X_{2,n} = m_2, \ldots, X_{k,n} = m_k\} = \frac{n!}{m_1!\, m_2! \cdots m_k!}\, p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k},

m_j = 0, \ldots, n, m_1 + \cdots + m_k = n.   (25.4)
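The pmf (25.4) translates directly into code. The short sketch below is our own illustration; it also confirms that the probabilities sum to 1 over all outcomes with m_1 + m_2 + m_3 = n.

```python
from itertools import product
from math import factorial

def multinomial_pmf(m, p):
    """P_n(m_1, ..., m_k) of (25.4), with n = m_1 + ... + m_k."""
    n = sum(m)
    coef = factorial(n)
    for mi in m:
        coef //= factorial(mi)
    prob = float(coef)
    for mi, pi in zip(m, p):
        prob *= pi ** mi
    return prob

p, n = (0.5, 0.3, 0.2), 4
total = sum(multinomial_pmf(m, p)
            for m in product(range(n + 1), repeat=3) if sum(m) == n)
print(multinomial_pmf((2, 1, 1), p))   # 12 * 0.5^2 * 0.3 * 0.2
print(round(total, 10))
```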
25.2 Notations
Notations
A random vector X, = ( X l , , , x,,,,. . . , X k , , ) having the joint probability mass function as in (25.4) is said t o have the multinomial M ( n ,p l , p2, . . . ,p k ) distribution. In the case when k = 3, the distribution is also referred to a.s the
trinomial distribution.

Remark 25.1 The random vector x, M ( n , p l , p z , . . . , p k ) is actually ( k 1)dimensional since its components XI,^, Xz,,, . . . , X k , , satisfy the relationship XI,, X 2 , n . . f x k , n = n
+
+
'
and, consequently, one of the components (say, x k , n ) can be expressed as xk,, =n

XI,,

X z , ,  . . .  xk 1 ,n.
Hence, the distribution of the random vector X, = (XI,,, X 2 , , , . . . , Xk,lL) is completely determined by the distribution of the ( k  1)dirnensional random vector XI,^, X2,,,. . . , Xkl,?L). For cxaniple, when k = 2, the probabilities P,(ml, m.2) in (25.4) simply become
which are the binomial probabilities.
25.3
Compositions
Due to the probability interpretation of multinomial distributions given above, it readily follows that if independent vectors Y1, Y2,. . . ,Y, all have multinomial h ' ( l , p 1 , p 2 , . . . , p k ) distribution, then t h e s u m x , = Y 1 + Y 2 + . . . + Y n has the iiiult,iriomial M ( n , p l , p 2 , . . . , p k ) distribution. In addition, if X M(n1,p1.p2.. . . , p k ) and Y M ( n 2 ,p l , p a , . . . , p k ) are independent multinomial random vectors, then X Y is distributed as the sum Y1 Y2 . . . Y,,+,, of i.i.d. multinomial M ( l , p l , p z , . . . ,p,+)random vectors, and hence, is distributed as multinomial M(nl n2,p1,pzr.. .,pk).

+
+
+ +
+
25.4
Marginal Distributions
The fact that the multinomial distribution is the joint distribution of the number of occurrences of the events Al, A2,. . . ,A,+ in n rnultinornial trials enables ub to derive easily any marginal distribution of interest. Suppose that we are interested in finding the marginal probabilities
P r ( X 1 , , = m ~ , X a=, m ~,...,X,,,=m,}
CONDITIONAL DISTRIBUTIONS for j
< k , when X,

25 1
M ( n , p l , p z , .. . , p k ) . We first note that
P r { X I , , = m l , . . . ,X j , ,
=mj}
= Pr {XI,, = m l , . . . , X j , , = mj, V = m } ,
(25.6)
where
evidently, V denotes the number of occurrences of the event A = Aj+l U Aj+2 U.. .U A k in the n multinomial trials with the corresponding probability of occurrence being
Pr{A} = pj+l
+ ... + p k
=
1 p l

. . .  p .3

P (say) .
Then, the random vector (XI,,,. . . , Xj,,, V) clearly has the multinomial M ( j l , p l , . . . , p j , p ) distribution; then, using (25.4) and (25.6), we have
+
Pr
= m l , . .., X j , , = m j }
=
P, ( m l ,. . . ,mj,m )
j
j
m + C m i = n and p + C p i = l . i=l i=l (25.7) In particular, for j of XI , , as Pr
=
1, we simply obtain from (25.7) the marginal distribution
= rnl} =
n!
p Y l ( 1  pl)nml , ml
ml! ( n ml)!
= 0 ,..., n,
(25.8)
which simply reveals that the marginal distribution of X_{1,n} is binomial B(n, p_1). Similarly, we have X_{r,n} ~ B(n, p_r) for any r = 1, 2, \ldots, k.
25.5 Conditional Distributions
M ( n , p l , p z , .. . , p k ) . Consider now the conditional distribution of (X,+i,,,. . . , X k , 7 L )given , ( X i , , = m i , . . . ,X,,, = m,) , defined by
Let X,,
Pr{X,+i,,
= m3+1,.
.. ,Xk,,
= mk
1 XI,,
= m l , .. .
,x,,,= m,} (25.9)
Substituting the expressions in (25.4) and (25.7) into (25.9), we obtain
MULTINOMIAL DISTRIBUTION
252
Pr {Xj+l,, = n ~ j +. .~. ,,X k , ,
=mk
1 X1,,
= m l , . . . ,Xj,,, = m j ]
(25.10)
+. +
+. +
for mj+l . . mk = n  (ml . . m j ) ,and 0 otherwise. From (25.10), we readily observe that the conditional distribution of ( X j + l , n ,. . . ,Xk,7L),given = m l , . . . ,Xj,, = m j , is multinomial M ( n  m , y j t l , . . . , y k ) , where m = 'tn1 . . . m j , yi = p i / p (for i = j 1 , .. . , k ) , and p = p j + l + . . . + p k . Since the dependence on ml, m2,. . . , mj in (25.10) is only through the sum 7n1 ni2 . . . mj, we readily note t1ia.t
+ + + + +
+
P r {Xj+l,, = mj+l,.. . , X k , , = m
k
I
= m l , .. . . X j , , = mj)
(25.11) hence, the conditional distribution of (x2+l,n, . . . , X k , l L ) , given Xl,+ . . . X 2 , n= r n , is also the same multinomial M ( n  7)2,y J + l , . . . , y k ) distribution.
+
25.6
+
Moments
Let x,, :( X I,,,, . . . , x k , n ) M ( n . p l . . . . , p k ) . Then, since the marginal distribution of X,,, ( r = 1 , 2 , . . . , k ) is binomial B(n,p,), we readily have N
EX,,,,
= 7lp, = e,
and Var Xr,n = np,(l

p,) = (T, 2 .
(25.12)
Next, in order t o derive the correlation between the variables X,,,, ( r 1.. . . , k ) , we shall first, find the covariance using the formula
where the last equality follows from the fact that
=
MOMENTS
j
i
i
j
=
253
Cj ~r { x ~ j ,> E~(Xr,nIXs,n =
j
=E
=j
) (25.14)
{Xs,n E (Xr,nlXs,n)}.
Now, we shall explain how we can find the regression E(X_{r,n} \mid X_{s,n}) required in (25.13). For the sake of simplicity, let us consider the case when r = 2 and s = 1. Using the fact that the conditional distribution of the vector (X_{2,n}, \ldots, X_{k,n}), given X_{1,n} = m, is multinomial M(n - m, q_2, \ldots, q_k), where q_i = p_i/(1 - p_1) (i = 2, \ldots, k), we readily have that the conditional distribution of X_{2,n}, given X_{1,n} = m, is binomial B(n - m, q_2); hence,

E(X_{2,n} \mid X_{1,n} = m) = (n - m)\, \frac{p_2}{1 - p_1},   (25.15)

from which we obtain the regression of X_{2,n} on X_{1,n} to be E(X_{2,n} \mid X_{1,n}) = (n - X_{1,n})\, p_2/(1 - p_1). Using this fact in (25.13), we obtain

\sigma_{12} = E(X_{1,n} X_{2,n}) - a_1 a_2 = E\{X_{1,n}\, E(X_{2,n} \mid X_{1,n})\} - a_1 a_2 = \frac{p_2}{1 - p_1}\, E\{X_{1,n}(n - X_{1,n})\} - n^2 p_1 p_2 = \frac{p_2}{1 - p_1}\, \{n^2 p_1 - np_1(1 - p_1) - n^2 p_1^2\} - n^2 p_1 p_2 = -np_1 p_2.   (25.16)

Similarly, we have

\sigma_{rs} = -np_r p_s, 1 \le r \ne s \le k.   (25.17)

From (25.12) and (25.17), we thus have the covariance matrix of X_n = (X_{1,n}, \ldots, X_{k,n}) to be

\sigma_{rs} = np_r(1 - p_r) if r = s and \sigma_{rs} = -np_r p_s if r \ne s,   (25.18)

for 1 \le r, s \le k. Furthermore, from (25.12) and (25.17), we also find the correlation coefficient between X_{r,n} and X_{s,n} (1 \le r < s \le k) to be

\rho_{rs} = \frac{\sigma_{rs}}{\sigma_r \sigma_s} = -\sqrt{\frac{p_r p_s}{(1 - p_r)(1 - p_s)}}.   (25.19)
It is important to note that all the correlation coefficients are negative in a multinomial distribution.
Exercise 25.1 Derive the multiple regression function of X_{r+1,n} on X_{1,n}, \ldots, X_{r,n}.
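The covariance result (25.17) can be verified exactly by enumerating the pmf (25.4) over a small support. The sketch below is our own illustrative check (parameters are arbitrary):

```python
from itertools import product
from math import factorial

def pmf(m, p):
    """Multinomial pmf (25.4)."""
    n = sum(m)
    c = factorial(n)
    for mi in m:
        c //= factorial(mi)
    out = float(c)
    for mi, pi in zip(m, p):
        out *= pi ** mi
    return out

p, n = (0.2, 0.5, 0.3), 5
support = [m for m in product(range(n + 1), repeat=3) if sum(m) == n]
e1 = sum(m[0] * pmf(m, p) for m in support)
e2 = sum(m[1] * pmf(m, p) for m in support)
e12 = sum(m[0] * m[1] * pmf(m, p) for m in support)
cov = e12 - e1 * e2
print(round(cov, 10), -n * p[0] * p[1])   # both equal -n p_1 p_2
```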
25.7 Generating Function and Characteristic Function
Consider a random vector Y = (Y_1, \ldots, Y_k) having the M(1, p_1, \ldots, p_k) distribution. As pointed out earlier, the distribution of this vector is determined by the nonzero probabilities (for r = 1, \ldots, k)

p_r = \Pr\{Y_1 = 0, \ldots, Y_{r-1} = 0, Y_r = 1, Y_{r+1} = 0, \ldots, Y_k = 0\}.   (25.20)

Then, it is evident that the generating function \psi(s_1, s_2, \ldots, s_k) of Y is

\psi(s_1, s_2, \ldots, s_k) = E\left(s_1^{Y_1} \cdots s_k^{Y_k}\right) = p_1 s_1 + p_2 s_2 + \cdots + p_k s_k.   (25.21)
Since X_n ~ M(n, p_1, \ldots, p_k) is distributed as the sum Y_1 + \cdots + Y_n [see Eq. (25.2)], where Y_1, \ldots, Y_n are i.i.d. multinomial M(1, p_1, \ldots, p_k) vectors with generating function as in (25.21), we readily obtain the generating function of X_n as

P_n(s_1, \ldots, s_k) = \{\psi(s_1, \ldots, s_k)\}^n = (p_1 s_1 + \cdots + p_k s_k)^n.   (25.22)
From (25.22), we deduce the generating function of (X_{1,n}, \ldots, X_{m,n}) (for m = 1, \ldots, k-1) as

R_n(s_1, \ldots, s_m) = P_n(s_1, \ldots, s_m, 1, \ldots, 1) = \left( \sum_{r=1}^{m} p_r s_r + 1 - \sum_{r=1}^{m} p_r \right)^n.   (25.23)

In particular, when m = 1, we obtain from (25.23)

R_n(s_1) = \{1 + (s_1 - 1)p_1\}^n,   (25.24)
which readily reveals that X_{1,n} is distributed as binomial B(n, p_1) (as noted earlier). Further, we obtain from (25.23) the generating function of the sum X_{1,n} + \cdots + X_{m,n} (for m = 1, \ldots, k-1) as

E s^{X_{1,n} + \cdots + X_{m,n}} = R_n(s, \ldots, s) = \left\{ 1 + (s - 1) \sum_{r=1}^{m} p_r \right\}^n,   (25.25)

which reveals that the sum X_{1,n} + \cdots + X_{m,n} is distributed as binomial B(n, \sum_{r=1}^{m} p_r). Note that when m = k, the sum X_{1,n} + \cdots + X_{k,n} has a degenerate distribution since X_{1,n} + \cdots + X_{k,n} = n.
Exercise 25.2 From the generating function of X_n ~ M(n, p_1, \ldots, p_k) in (25.22), establish the expressions of the means, variances, and covariances derived in (25.12) and (25.17).
Exercise 25.3 From the generating function of (X_{1,n}, \ldots, X_{m,n}) in (25.23), prove that if m > n and m \le k, then E(X_{1,n} \cdots X_{m,n}) = 0. Also, argue that this expression must be true due to the fact that at least one of the X_{r,n}'s must be 0 since X_{1,n} + \cdots + X_{k,n} = n.
From (25.22), we immediately obtain the characteristic function of X_n ~ M(n, p_1, \ldots, p_k) as

f_n(t_1, \ldots, t_k) = E\left\{ e^{i(t_1 X_{1,n} + \cdots + t_k X_{k,n})} \right\} = P_n(e^{it_1}, \ldots, e^{it_k}) = (p_1 e^{it_1} + \cdots + p_k e^{it_k})^n.   (25.26)

In addition, from (25.23), we readily obtain the characteristic function of (X_{1,n}, \ldots, X_{m,n}) (for m = 1, \ldots, k-1) as

g_n(t_1, \ldots, t_m) = R_n(e^{it_1}, \ldots, e^{it_m}) = \left\{ 1 + \sum_{r=1}^{m} p_r (e^{it_r} - 1) \right\}^n.   (25.27)
Exercise 25.4 From (25.27), deduce the characteristic function of the sum X_{1,n} + \cdots + X_{m,n} (for m = 1, 2, \ldots, k-1) and show that it corresponds to that of the binomial B(n, \sum_{r=1}^{m} p_r).
25.8 Limit Theorems
Let us now consider the sequence of random vectors

X_n = (X_{1,n}, \ldots, X_{k,n}) ~ M(n, p_1, \ldots, p_k),   (25.28)

where p_k = 1 - \sum_{r=1}^{k-1} p_r. Let p_r = \lambda_r/n for r = 1, \ldots, k-1. Then, for m = k-1, the characteristic function of (X_{1,n}, \ldots, X_{k-1,n}) in (25.27) becomes

g_n(t_1, \ldots, t_{k-1}) = \left\{ 1 + \frac{1}{n} \sum_{r=1}^{k-1} \lambda_r (e^{it_r} - 1) \right\}^n.   (25.29)

Letting n \to \infty in (25.29), we observe that

g_n(t_1, \ldots, t_{k-1}) \to \prod_{r=1}^{k-1} h_r(t_r),   (25.30)
where h_r(t) = \exp\{\lambda_r(e^{it} - 1)\} is the characteristic function of the Poisson \pi(\lambda_r) distribution (for r = 1, \ldots, k-1). Hence, we observe from (25.30) that, as n \to \infty, the components X_{1,n}, \ldots, X_{k-1,n} of the multinomial random vector X_n in (25.28) are asymptotically independent and that the marginal distribution of X_{r,n} converges to the Poisson \pi(\lambda_r) distribution for any r = 1, 2, \ldots, k-1.
Exercise 25.5 Using a similar argument, show that X_{1,n} + \cdots + X_{m,n} (for m = 1, \ldots, k-1) converges to the Poisson \pi(\sum_{r=1}^{m} \lambda_r) distribution.
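The Poisson limit can also be seen numerically: each marginal is binomial B(n, \lambda/n), whose pmf approaches the Poisson \pi(\lambda) pmf as n grows. The sketch below is our own illustrative check with \lambda = 2:

```python
import math

def binom_pmf(n, p, j):
    return math.comb(n, j) * p ** j * (1.0 - p) ** (n - j)

def pois_pmf(lam, j):
    return math.exp(-lam) * lam ** j / math.factorial(j)

lam = 2.0
errs = []
for n in (10, 100, 10_000):
    p = lam / n
    errs.append(max(abs(binom_pmf(n, p, j) - pois_pmf(lam, j))
                    for j in range(15)))
    print(n, errs[-1])   # maximum pmf discrepancy shrinks with n
```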
Next, let us consider the sequence of the (k-1)-dimensional random vectors

W_n = (W_{1,n}, \ldots, W_{k-1,n}), W_{r,n} = \frac{X_{r,n} - np_r}{\sqrt{np_r(1 - p_r)}}, r = 1, \ldots, k-1.   (25.31)

Let h_n(t_1, \ldots, t_{k-1}) be the characteristic function of W_n in (25.31). Then, it follows from (25.27) that

(25.32)
Exercise 25.6 As n \to \infty, show that h_n(t_1, \ldots, t_{k-1}) in (25.32) converges to

h(t_1, \ldots, t_{k-1}) = \exp\left\{ -\frac{1}{2} \left( \sum_{r=1}^{k-1} t_r^2 + 2 \sum_{1 \le r < s \le k-1} \rho_{rs} t_r t_s \right) \right\},   (25.33)

where

\rho_{rs} = -\sqrt{\frac{p_r p_s}{(1 - p_r)(1 - p_s)}}

is the correlation coefficient between X_{r,n} and X_{s,n} derived in (25.19).
From (25.33), we see that the limiting characteristic function of the random variable

W_{r,n} = \frac{X_{r,n} - np_r}{\sqrt{np_r(1 - p_r)}}

becomes \exp(-t_r^2/2), which readily implies that the limiting distribution of the random variable W_{r,n} is indeed standard normal (for r = 1, \ldots, k-1). Furthermore, in Chapter 26, we will see that the limiting characteristic function in (25.33) corresponds to that of a multivariate normal distribution with mean vector (0, \ldots, 0) and covariance matrix

\sigma_{ij} = 1 if i = j and \sigma_{ij} = \rho_{ij} if i \ne j,   (25.34)

for 1 \le i, j \le k-1. Hence, we have the asymptotic distribution of the random vector W_n in (25.31) to be multivariate normal.
CHAPTER 26
MULTIVARIATE NORMAL DISTRIBUTION

26.1 Introduction
The multivariate normal distribution is the most important and interesting multivariate distribution and based on it, a huge body of multivariate analysis has been developed. In this chapter we present a brief description of the multivariate normal distribution and some of its basic properties. For a detailed discussion on multivariate normal distribution and its properties, one may refer t o the book by Tong (1990). At the end of Chapter 25 (see Exercise 25.6), we found that the limiting distribution of a sequence of multinomial random variables has its characteristic function as [see Eq. (25.33)]
    h(t_1, …, t_{k−1}) = exp{ −(1/2) Q(t_1, …, t_{k−1}) },    (26.1)

where Q(t_1, …, t_{k−1}) is the quadratic form

    Q(t_1, …, t_{k−1}) = Σ_{r=1}^{k−1} t_r² + 2 Σ_{1≤r<s≤k−1} ρ_{rs} t_r t_s.    (26.2)
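Since (26.2) is simply the quadratic form t′Rt for the symmetric matrix R with unit diagonal and off-diagonal entries ρ_{rs}, the characteristic function (26.1) can be evaluated either way. A minimal sketch (the numerical values are arbitrary illustrations, not from the book):

```python
import math

def Q(t, R):
    """Quadratic form (26.2): sum of t_r^2 plus twice the cross terms."""
    k1 = len(t)
    return (sum(x * x for x in t)
            + 2 * sum(R[r][s] * t[r] * t[s]
                      for r in range(k1) for s in range(r + 1, k1)))

def quad_matrix(t, R):
    """The same quantity written as t' R t, with R having unit diagonal."""
    k1 = len(t)
    return sum(t[i] * R[i][j] * t[j] for i in range(k1) for j in range(k1))

# A symmetric correlation-type matrix with unit diagonal (k - 1 = 2 here).
R = [[1.0, -0.3], [-0.3, 1.0]]
t = (0.7, -1.2)
print(Q(t, R), quad_matrix(t, R))   # the two evaluations should agree
print(math.exp(-0.5 * Q(t, R)))     # h(t) from (26.1)
```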
> 0. Then, the joint density function of Y_1, …, Y_{n+1} is
With the transformation in (27.4), we obtain the joint density function of V_1, …, V_{n+1} as
Now, making the transformation in (27.7) (with the Jacobian as x_{n+1}^n), we obtain from (27.14) the joint density function of X_1, …, X_{n+1} as

    p_{X_1,…,X_{n+1}}(x_1, …, x_{n+1}) = {1 / Π_{k=1}^{n+1} Γ(a_k)} x_{n+1}^{a_1+⋯+a_{n+1}−1} e^{−x_{n+1}} { Π_{k=1}^{n} x_k^{a_k−1} } (1 − x_1 − ⋯ − x_n)^{a_{n+1}−1}.    (27.15)
Upon integrating out the variable x_{n+1} in (27.15), we then obtain the joint density function of X_1, …, X_n as

    p_{X_1,…,X_n}(x_1, …, x_n) = {Γ(a_1 + ⋯ + a_{n+1}) / Π_{k=1}^{n+1} Γ(a_k)} { Π_{k=1}^{n} x_k^{a_k−1} } (1 − x_1 − ⋯ − x_n)^{a_{n+1}−1},
        0 ≤ x_1, …, x_n ≤ 1,  0 ≤ Σ_{i=1}^{n} x_i ≤ 1.    (27.16)
DIRICHLET DISTRIBUTION
Indeed, from the fact that (27.16) represents a density function, we readily obtain

    ∫⋯∫_{x_1,…,x_n ≥ 0, x_1+⋯+x_n ≤ 1} { Π_{k=1}^{n} x_k^{a_k−1} } (1 − x_1 − ⋯ − x_n)^{a_{n+1}−1} dx_1 ⋯ dx_n = Π_{k=1}^{n+1} Γ(a_k) / Γ(a_1 + ⋯ + a_{n+1}),

which is exactly the Dirichlet integral formula presented in (27.12). We thus have a multivariate density function in (27.16) that is very closely related to the multidimensional integral in (27.12) evaluated by Dirichlet (1839).
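As a sanity check on the Dirichlet integral formula (27.12), the integral can be approximated numerically for n = 2. This sketch (the parameter values a = (2, 3, 4) are an arbitrary illustration) uses a midpoint Riemann sum over the simplex:

```python
import math

# Midpoint Riemann sum of x1^(a1-1) x2^(a2-1) (1 - x1 - x2)^(a3-1) over the
# simplex {x1, x2 >= 0, x1 + x2 <= 1}; by (27.12) this should equal
# Gamma(a1) Gamma(a2) Gamma(a3) / Gamma(a1 + a2 + a3).
a1, a2, a3 = 2.0, 3.0, 4.0
m = 400
h = 1.0 / m
total = 0.0
for i in range(m):
    x1 = (i + 0.5) * h
    for j in range(m):
        x2 = (j + 0.5) * h
        rest = 1.0 - x1 - x2
        if rest > 0.0:
            total += x1**(a1 - 1) * x2**(a2 - 1) * rest**(a3 - 1) * h * h

exact = (math.gamma(a1) * math.gamma(a2) * math.gamma(a3)
         / math.gamma(a1 + a2 + a3))
print(total, exact)
```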
27.3 Notations
A random vector X = (X_1, …, X_n) is said to have an n-dimensional standard Dirichlet distribution with positive parameters a_1, …, a_{n+1} if its density function is given by (27.16); this is denoted by X ~ D_n(a_1, …, a_{n+1}). Note that when n = 1, (27.16) reduces to

    p_X(x) = {Γ(a_1 + a_2) / Γ(a_1) Γ(a_2)} x^{a_1−1} (1 − x)^{a_2−1},  0 ≤ x ≤ 1,

which is nothing but the standard beta distribution discussed in Chapter 16. Indeed, the linear transformation of the random variables X_1, …, X_n will yield the random vector (b_1 + c_1 X_1, …, b_n + c_n X_n) having an n-dimensional general Dirichlet distribution with shape parameters (a_1, …, a_{n+1}), location parameters (b_1, …, b_n), and scale parameters (c_1, …, c_n). However, for the rest of this chapter we consider only the standard Dirichlet distribution in (27.16), due to its simplicity.
27.4 Marginal Distributions
Let X ~ D_n(a_1, …, a_{n+1}). Then, as seen in Section 27.2, the components X_1, …, X_n admit the representations

    (X_1, …, X_n) =_d (Y_1/S_{n+1}, …, Y_n/S_{n+1})    (27.17)

and

    X_k =_d Y_k / S_{n+1},  k = 1, …, n,    (27.18)

where Y_1, …, Y_{n+1} are independent standard gamma random variables with Y_k ~ Γ(a_k, 0, 1) and S_{n+1} = Y_1 + ⋯ + Y_{n+1}. From the properties of gamma distributions (see Section 20.7), it is then known that

    S_{n+1} ~ Γ(a, 0, 1) with a = a_1 + ⋯ + a_{n+1},

    S_{n+1} − Y_k ~ Γ(a − a_k, 0, 1) (independent of Y_k),

and

    X_k =_d Y_k / S_{n+1} = Y_k / {Y_k + (S_{n+1} − Y_k)} ~ Be(a_k, a − a_k);    (27.19)
that is, the marginal distribution of X_k is beta Be(a_k, a − a_k), k = 1, …, n (note that this is just a Dirichlet distribution with n = 1). Thus, the Dirichlet distribution forms a natural multivariate generalization of the beta distribution.

Exercise 27.1 From the density function of X in (27.16), show by means of direct integration that the marginal distribution of X_k is Be(a_k, a − a_k).
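The gamma representation (27.17) also gives a convenient way to simulate a Dirichlet vector. A small Monte Carlo sketch (seed, sample size, and parameter values are arbitrary choices): normalize independent standard gamma draws and compare the sample mean of X_1 with the Be(a_1, a − a_1) mean a_1/a:

```python
import math
import random

random.seed(12345)

# Representation (27.17): X_k = Y_k / (Y_1 + ... + Y_{n+1}) with independent
# standard gammas Y_k ~ Gamma(a_k).  The marginal of X_1 is then
# Be(a_1, a - a_1), whose mean is a_1 / a.
a = [2.0, 3.0, 4.0]          # a_1, a_2, a_3 (so n = 2)
a_sum = sum(a)
N = 20000

mean_x1 = 0.0
for _ in range(N):
    y = [random.gammavariate(ak, 1.0) for ak in a]
    mean_x1 += y[0] / sum(y)
mean_x1 /= N

print("sample mean of X1:", mean_x1, " theoretical:", a[0] / a_sum)
```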
For two-dimensional marginal distributions, it follows from (27.17) that for any 1 ≤ k < ℓ ≤ n,

    (X_k, X_ℓ) =_d (Y_k/S_{n+1}, Y_ℓ/S_{n+1}),    (27.20)

where Y_k ~ Γ(a_k, 0, 1), Y_ℓ ~ Γ(a_ℓ, 0, 1), Z = S_{n+1} − Y_k − Y_ℓ ~ Γ(a − a_k − a_ℓ, 0, 1), and Y_k, Y_ℓ, and Z are independent. Thus, (27.20) readily implies

    (X_k, X_ℓ) ~ D_2(a_k, a_ℓ, a − a_k − a_ℓ).    (27.21)

In a similar manner, we find that

    (X_{k(1)}, …, X_{k(r)}) ~ D_r(a_{k(1)}, …, a_{k(r)}, a − a_{k(1)} − ⋯ − a_{k(r)})

for any r = 1, …, n and 1 ≤ k(1) < k(2) < ⋯ < k(r) ≤ n.
Exercise 27.2 If (X_1, …, X_6) ~ D_6(a_1, …, a_7), show that

    (X_1, X_2 + X_3, X_4 + X_5 + X_6) ~ D_3(a_1, a_2 + a_3, a_4 + a_5 + a_6, a_7).

Exercise 27.3 Let (X_1, …, X_n) ~ D_n(a_1, …, a_{n+1}). Find the distribution of W_k = X_1 + ⋯ + X_k.
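In the spirit of Exercise 27.3, the representation (27.17) shows that W_2 = X_1 + X_2 = (Y_1 + Y_2)/(Y_1 + Y_2 + Y_3) is again a ratio of independent gammas, which suggests W_2 ~ Be(a_1 + a_2, a_3). A Monte Carlo sketch of its mean (seed and parameter values arbitrary):

```python
import random

random.seed(2023)

# Via (27.17), W_2 = X_1 + X_2 = (Y_1 + Y_2) / (Y_1 + Y_2 + Y_3), suggesting
# W_2 ~ Be(a_1 + a_2, a_3) with mean (a_1 + a_2) / a.
a = [2.0, 3.0, 4.0]
N = 20000

mean_w2 = 0.0
for _ in range(N):
    y = [random.gammavariate(ak, 1.0) for ak in a]
    mean_w2 += (y[0] + y[1]) / sum(y)
mean_w2 /= N

print("sample mean of W2:", mean_w2, " theoretical:", (a[0] + a[1]) / sum(a))
```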
Exercise 27.4 Let (X_1, X_2) ~ D_2(a_1, a_2, a_3). Obtain the conditional density function of X_1, given X_2 = x_2. Do you observe some connection to the beta distribution? Derive an expression for the conditional mean E(X_1 | X_2 = x_2) and comment.
27.5 Marginal Moments
Let X ~ D_n(a_1, …, a_{n+1}). In this case, as shown in Section 27.4, the marginal distribution of X_k is beta Be(a_k, a − a_k), where a = a_1 + ⋯ + a_{n+1}. Then, from the formulas for the moments of the beta distribution presented in Section 16.5, we immediately have

    E(X_k) = a_k / a   and   Var(X_k) = a_k (a − a_k) / {a² (a + 1)},  k = 1, …, n.
27.6 Product Moments
Let X ~ D_n(a_1, …, a_{n+1}). Then, from the density function in (27.16), we have the product moment of order (α_1, …, α_n) as

    E(X_1^{α_1} ⋯ X_n^{α_n}) = {Γ(a) / Π_{k=1}^{n+1} Γ(a_k)} ∫⋯∫ { Π_{k=1}^{n} x_k^{a_k+α_k−1} } (1 − x_1 − ⋯ − x_n)^{a_{n+1}−1} dx_1 ⋯ dx_n
                             = {Γ(a) / Γ(a + α_1 + ⋯ + α_n)} Π_{k=1}^{n} {Γ(a_k + α_k) / Γ(a_k)},    (27.23)

where the last equality follows by an application of the Dirichlet integral formula in (27.12). In particular, we obtain from (27.23) that

    E(X_k X_ℓ) = a_k a_ℓ / {a (a + 1)},  k ≠ ℓ,

and

    E(X_k²) = a_k (a_k + 1) / {a (a + 1)}.
Exercise 27.5 Let X ~ D_n(a_1, …, a_{n+1}). Derive the covariance and correlation coefficient between X_k and X_ℓ.
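The product-moment formula (27.23) gives, for instance, E(X_1 X_2) = a_1 a_2 / {a(a + 1)} in the D_2 case. A numerical sketch of this (arbitrary parameter values; midpoint rule over the simplex):

```python
import math

# E(X1 X2) for (X1, X2) ~ D_2(a1, a2, a3) by midpoint Riemann sum, compared
# with the product-moment value a1 a2 / (a (a + 1)) obtained from (27.23).
a1, a2, a3 = 2.0, 3.0, 4.0
a = a1 + a2 + a3
const = math.gamma(a) / (math.gamma(a1) * math.gamma(a2) * math.gamma(a3))

m = 400
h = 1.0 / m
moment = 0.0
for i in range(m):
    x1 = (i + 0.5) * h
    for j in range(m):
        x2 = (j + 0.5) * h
        rest = 1.0 - x1 - x2
        if rest > 0.0:
            dens = const * x1**(a1 - 1) * x2**(a2 - 1) * rest**(a3 - 1)
            moment += x1 * x2 * dens * h * h

print(moment, a1 * a2 / (a * (a + 1)))
```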
27.7 Dirichlet Distribution of Second Kind
In Chapter 16, when dealing with a beta random variable X ~ Be(a, b) with probability density function

    p_X(x) = {1 / B(a, b)} x^{a−1} (1 − x)^{b−1},  0 ≤ x ≤ 1,

by considering the transformation Y = X/(1 − X), or equivalently X = Y/(1 + Y), we introduced the beta distribution of the second kind with probability density function

    p_Y(y) = {1 / B(a, b)} y^{a−1} / (1 + y)^{a+b},  y > 0.
In a similar manner, we shall now introduce the Dirichlet distribution of the second kind. Specifically, let X ~ D_n(a_1, …, a_{n+1}), where a_k > 0 (k = 1, …, n + 1) are positive parameters. Now, consider the transformation

    Y_1 = X_1 / (1 − X_1 − ⋯ − X_n),  …,  Y_n = X_n / (1 − X_1 − ⋯ − X_n),    (27.24)

or equivalently,

    X_1 = Y_1 / (1 + Y_1 + ⋯ + Y_n),  …,  X_n = Y_n / (1 + Y_1 + ⋯ + Y_n).    (27.25)

Then, it can be shown that the Jacobian of this transformation is (1 + Y_1 + ⋯ + Y_n)^{−(n+1)}. We then readily obtain from (27.16) the density function of Y = (Y_1, …, Y_n) as

    p_{Y_1,…,Y_n}(y_1, …, y_n) = {Γ(a) / Π_{k=1}^{n+1} Γ(a_k)} { Π_{k=1}^{n} y_k^{a_k−1} } (1 + y_1 + ⋯ + y_n)^{−a},  y_1, …, y_n > 0,    (27.26)

where a = a_1 + ⋯ + a_{n+1}.
The density function (27.26) is the Dirichlet density of the second kind.
Exercise 27.6 Show that the Jacobian of the transformation in (27.25) is (1 + Y_1 + ⋯ + Y_n)^{−(n+1)} (use elementary row and column operations).

Exercise 27.7 Suppose that Y has a Dirichlet distribution of the second kind as in (27.26). Derive explicit expressions for E(Y_k), Var(Y_k), Cov(Y_k, Y_ℓ), and the correlation ρ(Y_k, Y_ℓ).
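Combining (27.24) with the gamma representation (27.17) gives Y_k = G_k / G_{n+1}, a ratio of independent standard gammas, and hence E(Y_k) = a_k / (a_{n+1} − 1) whenever a_{n+1} > 1 (since E(1/G) = 1/(b − 1) for G ~ Γ(b, 0, 1) with b > 1). A Monte Carlo sketch in the spirit of Exercise 27.7 (seed and parameter values arbitrary):

```python
import random

random.seed(777)

# By (27.24) and the gamma representation (27.17),
#   Y_k = X_k / (1 - X_1 - ... - X_n) = G_k / G_{n+1},
# a ratio of independent standard gammas.  For a_{n+1} > 1 this gives
#   E(Y_k) = a_k * E(1 / G_{n+1}) = a_k / (a_{n+1} - 1).
a = [2.0, 3.0, 4.0]          # a_3 plays the role of a_{n+1}
N = 50000

mean_y1 = 0.0
for _ in range(N):
    g = [random.gammavariate(ak, 1.0) for ak in a]
    mean_y1 += g[0] / g[2]
mean_y1 /= N

print("sample mean of Y1:", mean_y1, " theoretical:", a[0] / (a[2] - 1.0))
```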
27.8 Liouville Distribution
Liouville (1839) generalized the Dirichlet integral formula in (27.12) by establishing that

    ∫⋯∫_{x_1,…,x_n > 0, x_1+⋯+x_n ≤ h} f(x_1 + ⋯ + x_n) { Π_{k=1}^{n} x_k^{a_k−1} } dx_1 ⋯ dx_n = {Π_{k=1}^{n} Γ(a_k) / Γ(a_1 + ⋯ + a_n)} ∫_0^h f(t) t^{a_1+⋯+a_n−1} dt,    (27.27)

where a_1, …, a_n are positive parameters, x_1, …, x_n are positive, and f(·) is a suitably chosen function. It is clear that if we set h = 1 and choose f(t) = (1 − t)^{a_{n+1}−1}, (27.27) readily reduces to the Dirichlet integral formula in (27.12). Also, by letting h → ∞ in (27.27), we obtain the Liouville integral formula

    ∫⋯∫_{x_1,…,x_n > 0} f(x_1 + ⋯ + x_n) { Π_{k=1}^{n} x_k^{a_k−1} } dx_1 ⋯ dx_n = {Π_{k=1}^{n} Γ(a_k) / Γ(a_1 + ⋯ + a_n)} ∫_0^∞ f(t) t^{a_1+⋯+a_n−1} dt,    (27.28)
where a_1, …, a_n > 0 and t^{a_1+⋯+a_n−1} f(t) is integrable on (0, ∞). The Liouville integral formula in (27.28) readily yields the Liouville distribution with probability density function

    p_{X_1,…,X_n}(x_1, …, x_n) = C f(x_1 + ⋯ + x_n) x_1^{a_1−1} ⋯ x_n^{a_n−1},  x_1, …, x_n > 0,  a_1, …, a_n > 0,    (27.29)

where C is a normalizing constant and f(·) is a nonnegative function such that f(t) t^{a_1+⋯+a_n−1} is integrable on (0, ∞). For a historical view and details on the Liouville distribution, one may refer to Gupta and Richards (2001).
Exercise 27.8 Show that the Dirichlet distribution of the second kind in (27.26) is a Liouville distribution by choosing the function f(t) appropriately and then determining the constant C from the Liouville integral formula in (27.28).
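The Liouville integral formula (27.28) can also be checked numerically for n = 2 by choosing f(t) = e^{−t}, for which the left-hand side factors into Γ(a_1)Γ(a_2). A sketch (arbitrary parameter values; midpoint rule with a truncated tail):

```python
import math

# Check (27.28) for n = 2 with f(t) = exp(-t): the left side separates into
# Gamma(a1) * Gamma(a2), while the right side is
#   [Gamma(a1) Gamma(a2) / Gamma(a1 + a2)] * int_0^inf exp(-t) t^(a1+a2-1) dt.
a1, a2 = 2.0, 3.0

lhs = math.gamma(a1) * math.gamma(a2)

# Midpoint rule for the one-dimensional integral on the right (tail truncated
# at T = 40, where exp(-t) t^4 is negligible).
T, m = 40.0, 40000
h = T / m
integral = sum(math.exp(-((i + 0.5) * h)) * ((i + 0.5) * h)**(a1 + a2 - 1) * h
               for i in range(m))
rhs = math.gamma(a1) * math.gamma(a2) / math.gamma(a1 + a2) * integral

print(lhs, rhs)
```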
APPENDIX PIONEERS IN DISTRIBUTION THEORY
As is evident from the preceding chapters, several prominent mathematicians and statisticians have made pioneering contributions to the area of statistical distribution theory. To give students a historical sense of the developments in this important and fundamental area of statistics, we present here brief biographical sketches of these major contributors.

Bernoulli, Jakob
Born - January 6, 1655, in Basel, Switzerland
Died - August 16, 1705, in Basel, Switzerland

Jakob Bernoulli was the first of the Bernoulli family of Swiss mathematicians. His work Ars Conjectandi (The Art of Conjecturing), published posthumously in 1713 by his nephew N. Bernoulli, contained the Bernoulli law of large numbers for Bernoulli sequences of independent trials. Usually, a random variable taking values 1 and 0 with probabilities p and 1 − p, 0 < p < 1, is said to have the Bernoulli distribution. Sometimes, the binomial distributions, which are convolutions of Bernoulli distributions, are also called Bernoulli distributions.