A primer on statistical distributions


Author: N. Balakrishnan | Valery B. Nevzorov


A PRIMER ON STATISTICAL DISTRIBUTIONS

N. BALAKRISHNAN
McMaster University
Hamilton, Canada

V. B. NEVZOROV
St. Petersburg State University
Russia

A JOHN WILEY & SONS, INC., PUBLICATION



Copyright © 2003 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4744, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, e-mail: [email protected].

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representation or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002. Wiley also publishes its books in a variety of electronic formats.
Some content that appears in print, however, may not be available in electronic format. Library of Congress Cataloging-in-Publication Data:

Balakrishnan, N., 1956-
A primer on statistical distributions / N. Balakrishnan and V. B. Nevzorov.
p. cm.
Includes bibliographical references and index.
ISBN 0-471-42798-5 (acid-free paper)
1. Distribution (Probability theory) I. Nevzorov, Valery B., 1946- II. Title.
QA273.B25473 2003
519.2'4-dc21 2003041157

Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1

To my lovely daughters, Sarah and Julia (N.B.)

To my wife, Ludmila (V.B.N.)


CONTENTS

PREFACE

1 PRELIMINARIES
1.1 Random Variables and Distributions
1.2 Type of Distribution
1.3 Moment Characteristics
1.4 Shape Characteristics
1.5 Entropy
1.6 Generating Function and Characteristic Function
1.7 Decomposition of Distributions
1.8 Stable Distributions
1.9 Random Vectors and Multivariate Distributions
1.10 Conditional Distributions
1.11 Moment Characteristics of Random Vectors
1.12 Conditional Expectations
1.13 Regressions
1.14 Generating Function of Random Vectors
1.15 Transformations of Variables

I DISCRETE DISTRIBUTIONS

2 DISCRETE UNIFORM DISTRIBUTION
2.1 Introduction
2.2 Notations
2.3 Moments
2.4 Generating Function and Characteristic Function
2.5 Convolutions
2.6 Decompositions
2.7 Entropy
2.8 Relationships with Other Distributions

3 DEGENERATE DISTRIBUTION
3.1 Introduction
3.2 Moments
3.3 Independence
3.4 Convolution
3.5 Decomposition

4 BERNOULLI DISTRIBUTION
4.1 Introduction
4.2 Notations
4.3 Moments
4.4 Convolutions
4.5 Maximal Values
4.6 Relationships with Other Distributions

5 BINOMIAL DISTRIBUTION
5.1 Introduction
5.2 Notations
5.3 Useful Representation
5.4 Generating Function and Characteristic Function
5.5 Moments
5.6 Maximum Probabilities
5.7 Convolutions
5.8 Decompositions
5.9 Mixtures
5.10 Conditional Probabilities
5.11 Tail Probabilities
5.12 Limiting Distributions

6 GEOMETRIC DISTRIBUTION
6.1 Introduction
6.2 Notations
6.3 Tail Probabilities
6.4 Generating Function and Characteristic Function
6.5 Moments
6.6 Convolutions
6.7 Decompositions
6.8 Entropy
6.9 Conditional Probabilities
6.10 Geometric Distribution of Order k

7 NEGATIVE BINOMIAL DISTRIBUTION
7.1 Introduction
7.2 Notations
7.3 Generating Function and Characteristic Function
7.4 Moments
7.5 Convolutions and Decompositions
7.6 Tail Probabilities
7.7 Limiting Distributions

8 HYPERGEOMETRIC DISTRIBUTION
8.1 Introduction
8.2 Notations
8.3 Generating Function
8.4 Characteristic Function
8.5 Moments
8.6 Limiting Distributions

9 POISSON DISTRIBUTION
9.1 Introduction
9.2 Notations
9.3 Generating Function and Characteristic Function
9.4 Moments
9.5 Tail Probabilities
9.6 Convolutions
9.7 Decompositions
9.8 Conditional Probabilities
9.9 Maximal Probability
9.10 Limiting Distribution
9.11 Mixtures
9.12 Rao-Rubin Characterization
9.13 Generalized Poisson Distribution

10 MISCELLANEA
10.1 Introduction
10.2 Pólya Distribution
10.3 Pascal Distribution
10.4 Negative Hypergeometric Distribution

II CONTINUOUS DISTRIBUTIONS

11 UNIFORM DISTRIBUTION
11.1 Introduction
11.2 Notations
11.3 Moments
11.4 Entropy
11.5 Characteristic Function
11.6 Convolutions
11.7 Decompositions
11.8 Probability Integral Transform
11.9 Distributions of Minima and Maxima
11.10 Order Statistics
11.11 Relationships with Other Distributions

12 CAUCHY DISTRIBUTION
12.1 Notations
12.2 Moments
12.3 Characteristic Function
12.4 Convolutions
12.5 Decompositions
12.6 Stable Distributions
12.7 Transformations

13 TRIANGULAR DISTRIBUTION
13.1 Introduction
13.2 Notations
13.3 Moments
13.4 Characteristic Function

14 POWER DISTRIBUTION
14.1 Introduction
14.2 Notations
14.3 Distributions of Maximal Values
14.4 Moments
14.5 Entropy
14.6 Characteristic Function

15 PARETO DISTRIBUTION
15.1 Introduction
15.2 Notations
15.3 Distributions of Minimal Values
15.4 Moments
15.5 Entropy

16 BETA DISTRIBUTION
16.1 Introduction
16.2 Notations
16.3 Mode
16.4 Some Transformations
16.5 Moments
16.6 Shape Characteristics
16.7 Characteristic Function
16.8 Decompositions
16.9 Relationships with Other Distributions

17 ARCSINE DISTRIBUTION
17.1 Introduction
17.2 Notations
17.3 Moments
17.4 Shape Characteristics
17.5 Characteristic Function
17.6 Relationships with Other Distributions
17.7 Characterizations
17.8 Decompositions

18 EXPONENTIAL DISTRIBUTION
18.1 Introduction
18.2 Notations
18.3 Laplace Transform and Characteristic Function
18.4 Moments
18.5 Shape Characteristics
18.6 Entropy
18.7 Distributions of Minima
18.8 Uniform and Exponential Order Statistics
18.9 Convolutions
18.10 Decompositions
18.11 Lack of Memory Property

19 LAPLACE DISTRIBUTION
19.1 Introduction
19.2 Notations
19.3 Characteristic Function
19.4 Moments
19.5 Shape Characteristics
19.6 Entropy
19.7 Convolutions
19.8 Decompositions
19.9 Order Statistics

20 GAMMA DISTRIBUTION
20.1 Introduction
20.2 Notations
20.3 Mode
20.4 Laplace Transform and Characteristic Function
20.5 Moments
20.6 Shape Characteristics
20.7 Convolutions and Decompositions
20.8 Conditional Distributions and Independence
20.9 Limiting Distributions

21 EXTREME VALUE DISTRIBUTIONS
21.1 Introduction
21.2 Limiting Distributions of Maximal Values
21.3 Limiting Distributions of Minimal Values
21.4 Relationships Between Extreme Value Distributions
21.5 Generalized Extreme Value Distributions
21.6 Moments

22 LOGISTIC DISTRIBUTION
22.1 Introduction
22.2 Notations
22.3 Moments
22.4 Shape Characteristics
22.5 Characteristic Function
22.6 Relationships with Other Distributions
22.7 Decompositions
22.8 Order Statistics
22.9 Generalized Logistic Distributions

23 NORMAL DISTRIBUTION
23.1 Introduction
23.2 Notations
23.3 Mode
23.4 Entropy
23.5 Tail Behavior
23.6 Characteristic Function
23.7 Moments
23.8 Shape Characteristics
23.9 Convolutions and Decompositions
23.10 Conditional Distributions
23.11 Independence of Linear Combinations
23.12 Bernstein's Theorem
23.13 Darmois-Skitovitch's Theorem
23.14 Helmert's Transformation
23.15 Identity of Distributions of Linear Combinations
23.16 Asymptotic Relations
23.17 Transformations

24 MISCELLANEA
24.1 Introduction
24.2 Linnik Distribution
24.3 Inverse Gaussian Distribution
24.4 Chi-Square Distribution
24.5 t Distribution
24.6 F Distribution
24.7 Noncentral Distributions

III MULTIVARIATE DISTRIBUTIONS

25 MULTINOMIAL DISTRIBUTION
25.1 Introduction
25.2 Notations
25.3 Compositions
25.4 Marginal Distributions
25.5 Conditional Distributions
25.6 Moments
25.7 Generating Function and Characteristic Function
25.8 Limit Theorems

26 MULTIVARIATE NORMAL DISTRIBUTION
26.1 Introduction
26.2 Notations
26.3 Marginal Distributions
26.4 Distributions of Sums
26.5 Linear Combinations of Components
26.6 Independence of Components
26.7 Linear Transformations
26.8 Bivariate Normal Distribution

27 DIRICHLET DISTRIBUTION
27.1 Introduction
27.2 Derivation of Dirichlet Formula
27.3 Notations
27.4 Marginal Distributions
27.5 Marginal Moments
27.6 Product Moments
27.7 Dirichlet Distribution of Second Kind
27.8 Liouville Distribution

APPENDIX: PIONEERS IN DISTRIBUTION THEORY

BIBLIOGRAPHY

AUTHOR INDEX

SUBJECT INDEX


PREFACE

Distributions and their properties and interrelationships assume a very important role in most upper-level undergraduate as well as graduate courses in the statistics program. For this reason, many introductory statistics textbooks discuss in a chapter or two a few basic statistical distributions, such as binomial, Poisson, exponential, and normal. Yet a good knowledge of some other distributions, such as geometric, negative binomial, Pareto, beta, gamma, chi-square, logistic, Laplace, extreme value, multinomial, multivariate normal, and Dirichlet will be immensely useful to those students who go on to upper-level undergraduate or graduate courses in statistics. Students in applied programs such as psychology, sociology, biology, geography, geology, economics, business, and engineering will also benefit significantly from an exposure to different distributions and their properties, as statistical modelling of observed data is an integral part of these programs.

It is for this reason we have prepared this textbook, which is tailor-made for a one-term course (of about 35 lectures) on statistical distributions. All the preliminary concepts and definitions are presented in Chapter 1. The rest of the material is divided into three parts, with Part I covering discrete distributions, Part II covering continuous distributions, and Part III covering multivariate distributions. In each chapter we have included a few pertinent exercises (at an appropriate level for students taking the course) which may be handed out as homework at the end of each chapter. A biographical sketch of some of the leading contributors to the area of statistical distribution theory is presented in the Appendix to present students with a historical sense of developments in this important and fundamental area in the field of statistics.

From our experience, we would suggest the following lecture allocation for teaching a course on statistical distributions based on this book:

5 lectures on preliminaries (Chapter 1)
9 lectures on discrete distributions (Part I)
17 lectures on continuous distributions (Part II)
4 lectures on multivariate distributions (Part III)

We welcome comments and criticisms from all those who teach a course based on this book. Any suggestions for improvement or "necessary" addition (omission of which in this version should be regarded as a consequence of our ignorance, not of personal nonscientific antipathy) sent to us will be much appreciated and will be acted upon when the opportunity arises.

It is important to mention here that many authoritative and encyclopedic volumes on statistical distribution theory exist in the literature. For example:

• Johnson, Kotz, and Kemp (1992), describing discrete univariate distributions
• Stuart and Ord (1993), discussing general distribution theory
• Johnson, Kotz, and Balakrishnan (1994, 1995), describing continuous univariate distributions
• Johnson, Kotz, and Balakrishnan (1997), describing discrete multivariate distributions
• Wimmer and Altmann (1999), providing a thesaurus on discrete univariate distributions
• Evans, Peacock, and Hastings (2000), describing discrete and continuous distributions
• Kotz, Balakrishnan, and Johnson (2000), discussing continuous multivariate distributions

are some of the prominent ones. In addition, there are separate books dedicated to some specific distributions, such as Poisson, generalized Poisson, chi-square, Pareto, exponential, lognormal, logistic, normal, and Laplace (which have all been referred to in this book at appropriate places). These books may be consulted for any additional information.

We take this opportunity to express our sincere thanks to Mr. Steve Quigley (of John Wiley & Sons, New York) for his support and encouragement during the preparation of this book. Our special thanks go to Mrs. Debbie Iscoe (Mississauga, Ontario, Canada) for assisting us with the camera-ready production of the manuscript, and to Mr. Weiquan Liu for preparing all the figures. We also acknowledge with gratitude the financial support provided by the Natural Sciences and Engineering Research Council of Canada and the Russian Foundation of Basic Research (Grants 01-01-00031 and 00-15-96019) during the course of this project.

N. BALAKRISHNAN
Hamilton, Canada

V. B. NEVZOROV
St. Petersburg, Russia

April 2003

CHAPTER 1

PRELIMINARIES

In this chapter we present some basic notations, notions, and definitions which a reader of this book must absolutely know in order to follow subsequent chapters.

1.1 Random Variables and Distributions

Let $(\Omega, \mathcal{T}, P)$ be a probability space, where $\Omega = \{\omega\}$ is a set of elementary events, $\mathcal{T}$ is a $\sigma$-algebra of events, and $P$ is a probability measure defined on $(\Omega, \mathcal{T})$. Further, let $B$ denote an element of the Borel $\sigma$-algebra of subsets of the real line $\mathbf{R}$.

Definition 1.1 A finite single-valued function $X = X(\omega)$ which maps $\Omega$ into $\mathbf{R}$ is called a random variable if for any Borel set $B$ in $\mathbf{R}$, the inverse image of $B$, i.e., $X^{-1}(B) = \{\omega : X(\omega) \in B\}$, belongs to the $\sigma$-algebra $\mathcal{T}$.

It means that for all Borel sets $B$, one can define probabilities

$$P\{X \in B\} = P\{X^{-1}(B)\}.$$

In particular, if for any $x$ ($-\infty < x < \infty$) we take $B = (-\infty, x]$, we obtain the function $F(x) = P\{X \le x\}$, the cumulative distribution function (cdf) of $X$. [...]

[...] A random variable $X$ has a discrete distribution if one can fix two sequences: a sequence of values $x_1, x_2, \ldots$ and a sequence of probabilities $p_k = P\{X = x_k\}$, $k = 1, 2, \ldots$, such that $\sum_k p_k = 1$. In this case, the cdf of $X$ is given by

$$F(x) = \sum_{k:\, x_k \le x} p_k. \tag{1.3}$$

Definition 1.4 A random variable $X$ with a cdf $F$ is said to have an absolutely continuous distribution if there exists a nonnegative function $p(x)$ such that

$$F(x) = \int_{-\infty}^{x} p(t)\, dt \tag{1.4}$$

for any real $x$.

Remark 1.3 The function $p(x)$ then satisfies the condition

$$\int_{-\infty}^{\infty} p(t)\, dt = 1, \tag{1.5}$$

and it is called the probability density function (pdf) of $X$. Note that any nonnegative function $p(x)$ satisfying (1.5) can be the pdf of some random variable $X$.

Remark 1.4 If a random variable $X$ has an absolutely continuous distribution, then its cdf $F(x)$ is continuous.
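Relations (1.4) and (1.5) can be checked numerically. The sketch below is only an illustration: the choice of the standard exponential density $p(t) = e^{-t}$, $t \ge 0$, the helper names `exp_pdf` and `cdf_from_pdf`, and the midpoint rule are assumptions made for the example and are not taken from the text.

```python
import math


def exp_pdf(t: float) -> float:
    """Standard exponential density (illustrative choice): exp(-t) for t >= 0, else 0."""
    return math.exp(-t) if t >= 0 else 0.0


def cdf_from_pdf(pdf, x: float, lower: float = -1.0, steps: int = 100_000) -> float:
    """Approximate F(x) = integral of pdf over (-inf, x], as in (1.4), by the midpoint rule.

    `lower` stands in for -infinity and must lie below the support of the pdf.
    """
    if x <= lower:
        return 0.0
    width = (x - lower) / steps
    return sum(pdf(lower + (i + 0.5) * width) for i in range(steps)) * width


# (1.4): F(1) should be close to the exact value 1 - exp(-1).
f_at_1 = cdf_from_pdf(exp_pdf, 1.0)

# (1.5): the total mass should be close to 1 (x = 50 stands in for +infinity).
total_mass = cdf_from_pdf(exp_pdf, 50.0)

print(f_at_1, 1 - math.exp(-1))
print(total_mass)
```

Any other nonnegative function that integrates to 1 could be passed in place of `exp_pdf`, in line with the note following (1.5).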


Definition 1.5 We say that random variables $X$ and $Y$ have the same distribution, and write

$$X \stackrel{d}{=} Y, \tag{1.6}$$

if the cdf's of $X$ and $Y$ (i.e., $F_X$ and $F_Y$) coincide; that is,

$$F_X(x) = P\{X \le x\} = P\{Y \le x\} = F_Y(x) \quad \forall x.$$

Exercise 1.1 Construct an example of a probability space $(\Omega, \mathcal{T}, P)$ and a finite single-valued function $X = X(\omega)$, $\omega \in \Omega$, which maps $\Omega$ into $\mathbf{R}$, that is not a random variable.

Exercise 1.2 Let $p(x)$ and $q(x)$ be probability density functions of two random variables. Consider now the following functions:

(a) $2p(x) - q(x)$; (b) $p(x) + 2q(x)$; (c) $|p(x) - q(x)|$; (d) $\frac{1}{2}\,(p(x) + q(x))$.

Which of these functions are probability density functions of some random variable for any choice of $p(x)$ and $q(x)$? Which of them can be valid probability density functions under suitably chosen $p(x)$ and $q(x)$? Is there a function that can never be a probability density function of a random variable?

Exercise 1.3 Suppose that $p(x)$ and $q(x)$ are probability density functions of $X$ and $Y$, respectively, satisfying

$$p(x) = 2 - q(x) \quad \text{for } 0 < x < 1.$$

Then, find $P\{X < -1\} + P\{Y < 2\}$.

The quantile function of a random variable $X$ with cdf $F(x)$ is defined by

$$Q(u) = \inf\{x : F(x) \ge u\}, \quad 0 < u < 1.$$

In the case when $X$ has an absolutely continuous distribution, the quantile function $Q(u)$ may simply be written as

$$Q(u) = F^{-1}(u), \quad 0 < u < 1.$$

The corresponding quantile density function is given by

$$q(u) = \frac{dQ(u)}{du} = \frac{1}{p(Q(u))}, \quad 0 < u < 1,$$

where $p(x)$ is the pdf corresponding to the cdf $F(x)$. It should be noted that just as forms of $F(x)$ may be used to propose families of distributions, general forms of the quantile function $Q(u)$ may also be used to propose families of statistical distributions. Interested readers may refer to the recent book by Gilchrist (2000) for a detailed discussion on statistical modelling with quantile functions.
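The definition $Q(u) = \inf\{x : F(x) \ge u\}$ can be sketched in a few lines of code. The example below is a hedged illustration only: the function names, the three-point discrete distribution, and the use of the standard exponential cdf $F(x) = 1 - e^{-x}$ for the inverse-function case are assumptions chosen for the example.

```python
import math


def discrete_quantile(values, probs, u: float) -> float:
    """Q(u) = inf{x : F(x) >= u} for a discrete distribution, 0 < u < 1.

    `values` must be sorted increasingly; probs[k] = P{X = values[k]}.
    """
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    cumulative = 0.0
    for x, p in zip(values, probs):
        cumulative += p  # cumulative now equals F(x)
        if cumulative >= u:
            return x
    return values[-1]  # guards against rounding in the running sum


def exponential_quantile(u: float) -> float:
    """Q(u) = F^{-1}(u) for the (assumed) cdf F(x) = 1 - exp(-x), x >= 0."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    return -math.log(1.0 - u)


# Hypothetical three-point distribution on {0, 1, 2}.
values, probs = [0, 1, 2], [0.25, 0.5, 0.25]
print([discrete_quantile(values, probs, u) for u in (0.25, 0.26, 0.75, 0.9)])
print(exponential_quantile(0.5))  # the median, log 2
```

For the discrete case the infimum is attained at the first value whose cumulative probability reaches $u$, which is why a single pass over the sorted values suffices.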

1.2 Type of Distribution

Definition 1.6 Random variables $X$ and $Y$ are said to belong to the same type of distribution if there exist constants $a$ and $h > 0$ such that

$$Y \stackrel{d}{=} a + hX. \tag{1.7}$$

Note then that the cdf's $F_X$ and $F_Y$ of the random variables $X$ and $Y$ satisfy the relation

$$F_Y(x) = F_X\!\left(\frac{x - a}{h}\right) \quad \forall x.$$

One can, therefore, choose a certain cdf $F$ as the standard distribution function of a certain distribution family. Then this family would consist of all cdf's of the form

$$F(x, a, h) = F\!\left(\frac{x - a}{h}\right), \quad -\infty < a < \infty, \ h > 0,$$

and

$$F(x) = F(x, 0, 1).$$

Thus, we have a two-parameter family of cdf's $F(x, a, h)$, where $a$ is called the location parameter and $h$ is the scale parameter. For absolutely continuous distributions, one can introduce the corresponding two-parameter families of probability density functions:

$$p(x, a, h) = \frac{1}{h}\, p\!\left(\frac{x - a}{h}\right), \tag{1.10}$$

where $p(x) = p(x, 0, 1)$ corresponds to the random variable $X$ with cdf $F$, and $p(x, a, h)$ corresponds to the random variable $Y = a + hX$ with cdf $F(x, a, h)$.

1.3 Moment Characteristics

There are some classical numerical characteristics of random variables and their distributions. The most popular ones are expected values and variances. More general characteristics are the moments. Among them, we emphasize moments about zero (about the origin) and central moments.

Definition 1.7 For a discrete random variable $X$ taking on values $x_1, x_2, \ldots$ with probabilities

$$p_k = P\{X = x_k\}, \quad k = 1, 2, \ldots,$$

we define the $n$th moment of $X$ about zero as

$$\alpha_n = EX^n = \sum_k x_k^n p_k. \qquad (1.11)$$

We say that $\alpha_n$ exists if

$$\sum_k |x_k|^n p_k < \infty.$$

Note that the expected value $EX$ is nothing but $\alpha_1$. $EX$ is also called the mean of $X$ or the mathematical expectation of $X$.

Definition 1.8 The $n$th central moment of $X$ is defined as

$$\beta_n = E(X - EX)^n = \sum_k (x_k - EX)^n p_k, \qquad (1.12)$$

given that

$$\sum_k |x_k - EX|^n p_k < \infty.$$

If a random variable $X$ has an absolutely continuous distribution with a pdf $p(x)$, then the moments about zero and the central moments have the following expressions:

$$\alpha_n = EX^n = \int_{-\infty}^{\infty} x^n p(x)\, dx \qquad (1.13)$$

and

$$\beta_n = E(X - EX)^n = \int_{-\infty}^{\infty} (x - EX)^n p(x)\, dx. \qquad (1.14)$$

We say that moments (1.13) exist if

$$\int_{-\infty}^{\infty} |x|^n p(x)\, dx < \infty. \qquad (1.15)$$

The variance of $X$ is simply the second central moment:

$$\operatorname{Var} X = \beta_2 = E(X - EX)^2. \qquad (1.16)$$

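Central moments are easily expressed in terms of moments about zero as follows:

$$\beta_n = E(X - EX)^n = \sum_{k=0}^{n} (-1)^k \binom{n}{k} (EX)^k\, EX^{n-k} = \sum_{k=0}^{n} (-1)^k \binom{n}{k} \alpha_1^k \alpha_{n-k}. \qquad (1.17)$$

In particular, we have

$$\operatorname{Var} X = \beta_2 = \alpha_2 - \alpha_1^2 \qquad (1.18)$$

and

$$\beta_3 = \alpha_3 - 3\alpha_2\alpha_1 + 2\alpha_1^3. \qquad (1.19)$$

Note that the first central moment $\beta_1 = 0$.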

The inverse problem cannot be solved, however, because the central moments carry no information about $EX$; hence, the expected value cannot be expressed in terms of $\beta_n$ ($n = 1, 2, \ldots$). Nevertheless, the relation

$$\alpha_n = EX^n = E[(X - EX) + EX]^n = \sum_{k=0}^{n} \binom{n}{k} (EX)^k E(X - EX)^{n-k} = \sum_{k=0}^{n} \binom{n}{k} \alpha_1^k \beta_{n-k} \qquad (1.20)$$

will enable us to express $\alpha_n$ ($n = 2, 3, \ldots$) in terms of $\alpha_1 = EX$ and the central moments $\beta_2, \ldots, \beta_n$. In particular, we have

$$\alpha_2 = \beta_2 + \alpha_1^2,$$

$$\alpha_3 = \beta_3 + 3\beta_2\alpha_1 + \alpha_1^3, \qquad (1.21)$$

and

$$\alpha_4 = \beta_4 + 4\beta_3\alpha_1 + 6\beta_2\alpha_1^2 + \alpha_1^4. \qquad (1.22)$$
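The two conversions (1.17) and (1.20) can be sketched in a few lines. The function names below are illustrative, and the lists are assumed to be indexed by the order $n$, starting with the zeroth moment (which equals 1).

```python
from fractions import Fraction as F
from math import comb


def central_from_raw(alpha):
    """Given alpha[n] = E X^n for n = 0..N (alpha[0] = 1), return beta[0..N] via (1.17)."""
    a1 = alpha[1]
    return [
        sum((-1) ** k * comb(n, k) * a1 ** k * alpha[n - k] for k in range(n + 1))
        for n in range(len(alpha))
    ]


def raw_from_central(a1, beta):
    """Inverse direction via (1.20); alpha_1 = EX must be supplied separately."""
    return [
        sum(comb(n, k) * a1 ** k * beta[n - k] for k in range(n + 1))
        for n in range(len(beta))
    ]


# Illustrative data: X uniform on {0, 1, 2}, so EX = 1, EX^2 = 5/3, EX^3 = 3, EX^4 = 17/3.
alpha_example = [F(1), F(1), F(5, 3), F(3), F(17, 3)]
beta_example = central_from_raw(alpha_example)
```

Note that `raw_from_central` needs $\alpha_1$ as an extra argument, which mirrors the remark that central moments alone carry no information about $EX$.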

Let $X$ and $Y$ belong to the same type of distribution [see (1.7)], meaning that

$$Y \stackrel{d}{=} a + hX$$

for some constants $a$ and $h > 0$. Then, the following equalities allow us to express moments of $Y$ in terms of the corresponding moments of $X$:

$$EY^n = E(a + hX)^n = \sum_{k=0}^{n} \binom{n}{k} a^k h^{n-k}\, EX^{n-k} \qquad (1.23)$$

and

$$E(Y - EY)^n = E[h(X - EX)]^n = h^n E(X - EX)^n. \qquad (1.24)$$

Note that the central moments of $Y$ do not depend on the location parameter $a$. As particular cases of (1.23) and (1.24), we have

$$EY = a + hEX, \qquad (1.25)$$

$$EY^2 = a^2 + 2ahEX + h^2EX^2, \quad \operatorname{Var} Y = h^2 \operatorname{Var} X, \qquad (1.26)$$

$$EY^3 = a^3 + 3a^2hEX + 3ah^2EX^2 + h^3EX^3, \qquad (1.27)$$

$$EY^4 = a^4 + 4a^3hEX + 6a^2h^2EX^2 + 4ah^3EX^3 + h^4EX^4. \qquad (1.28)$$

Definition 1.9 For random variables taking on values $0, 1, 2, \ldots$, the factorial moments of positive order are defined as

$$\mu_r = EX(X - 1) \cdots (X - r + 1), \quad r = 1, 2, \ldots, \qquad (1.29)$$

while the factorial moments of negative order are defined as

$$\mu_{-r} = E\left[\frac{1}{(X + 1)(X + 2) \cdots (X + r)}\right], \quad r = 1, 2, \ldots. \qquad (1.30)$$

While dealing with discrete distributions, it is quite often convenient to work with these factorial moments rather than regular moments. For this reason, it is useful to note the following relationships between the factorial moments and the moments:
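As a worked sketch (not a transcription of the book's display), expanding the products in (1.29) and taking expectations term by term gives the first few relationships, with $\alpha_n$ as in (1.11):

```latex
\begin{align*}
\mu_1 &= E X = \alpha_1, \\
\mu_2 &= E X(X-1) = \alpha_2 - \alpha_1, \\
\mu_3 &= E X(X-1)(X-2) = \alpha_3 - 3\alpha_2 + 2\alpha_1, \\
\mu_4 &= E X(X-1)(X-2)(X-3) = \alpha_4 - 6\alpha_3 + 11\alpha_2 - 6\alpha_1 .
\end{align*}
```

Each line follows from the polynomial identity for the falling factorial, for example $x(x-1)(x-2) = x^3 - 3x^2 + 2x$.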

Exercise 1.4 Present two different random variables having the same expectations and the same variances.

Exercise 1.5 Let $X$ be a random variable with expectation $EX$ and variance $\operatorname{Var} X$. What is the sign of $r(X) = E(X - |X|)\,(\operatorname{Var} X - \operatorname{Var} |X|)$? When does the quantity $r(X)$ equal 0?

Exercise 1.6 Suppose that $X$ is a random variable such that $P\{X > 0\} = 1$ and that both $EX$ and $E(1/X)$ exist. Then, show that $EX + E(1/X) \ge 2$.

Exercise 1.7 Suppose that $P\{0 \le X \le 1\} = 1$. Then, prove that $EX^2 \le EX \le EX^2 + \frac{1}{4}$. Also, find all distributions for which the left and right bounds are attained.

Exercise 1.8 Construct a variable $X$ for which $EX^3 = -5$ and $EX^6 = 24$.

1.4 Shape Characteristics

For any distribution, we are often interested in some characteristics that are associated with the shape of the distribution. For example, we may be interested in finding out whether it is unimodal, or skewed, and so on. Two important measures in this respect are Pearson's measures of skewness and kurtosis.

Definition 1.10 Pearson's measures of skewness and kurtosis are given by

$$\gamma_1 = \frac{\beta_3}{\beta_2^{3/2}}$$

and

$$\gamma_2 = \frac{\beta_4}{\beta_2^2}.$$

Since these measures are functions of central moments, it is clear that they are free of the location. Similarly, due to the fractional form of the measures, it can readily be verified that they are free of scale as well. It can also be seen that the measure of skewness $\gamma_1$ may take on positive or negative values depending on whether $\beta_3$ is positive or negative, respectively. Obviously, when the distribution is symmetric about its mean, we may note that $\beta_3$ is 0, in which case the measure of skewness $\gamma_1$ is also 0. Hence, distributions with $\gamma_1 > 0$ are said to be positively skewed distributions, while those with $\gamma_1 < 0$ are said to be negatively skewed distributions. Now, without loss of generality, let us consider an arbitrary distribution with mean 0 and variance 1. Then, by writing

$$\beta_3^2 = (EX^3)^2 = \{E[X(X^2 - 1)]\}^2$$

and applying the Cauchy-Schwarz inequality, we readily obtain the inequality

$$\beta_3^2 \le EX^2 \cdot E(X^2 - 1)^2 = \beta_4 - 1, \quad \text{that is,} \quad \gamma_2 \ge \gamma_1^2 + 1.$$

Later, we will observe the coefficient of kurtosis of a normal distribution to be 3. Based on this value, distributions with $\gamma_2 > 3$ are called leptokurtic distributions, while those with $\gamma_2 < 3$ are called platykurtic distributions. Incidentally, distributions for which $\gamma_2 = 3$ (which clearly includes the normal) are called mesokurtic distributions.
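The two shape measures are straightforward to compute for a discrete distribution. The sketch below uses the standard forms $\gamma_1 = \beta_3/\beta_2^{3/2}$ and $\gamma_2 = \beta_4/\beta_2^2$; the function name `pearson_shape` and the sample distributions are illustrative assumptions.

```python
def pearson_shape(values, probs):
    """Return (gamma1, gamma2), Pearson's skewness and kurtosis, for a discrete law."""
    mean = sum(x * p for x, p in zip(values, probs))

    def beta(n):
        # n-th central moment, as in (1.12)
        return sum((x - mean) ** n * p for x, p in zip(values, probs))

    b2, b3, b4 = beta(2), beta(3), beta(4)
    return b3 / b2 ** 1.5, b4 / b2 ** 2


# A symmetric two-point law attains the lower bound gamma2 = gamma1**2 + 1.
two_point = pearson_shape([-1, 1], [0.5, 0.5])
# An asymmetric three-point law with a longer right tail (illustrative numbers).
right_tailed = pearson_shape([0, 1, 2], [0.5, 0.25, 0.25])
```

Both measures are unchanged if every value is replaced by $a + hx$ with $h > 0$, which is the location and scale invariance noted in the text.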

Remark 1.5 Karl Pearson (1895) designed a system of continuous distributions wherein the pdf of every member satisfies a differential equation. By studying their moment properties and, in particular, their coefficients of skewness and kurtosis, he proposed seven families of distributions which all occupied different regions of the $(\gamma_1, \gamma_2)$-plane. Several prominent distributions (such as beta, gamma, normal, and $t$ that we will see in subsequent chapters) belong to these families. This development was the first and historic attempt to propose a unified mechanism for developing different families of statistical distributions.

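1.5 Entropy

One more useful characteristic of distributions (called entropy) was introduced by Shannon.

Definition 1.11 For a discrete random variable $X$ taking on values $x_1, x_2, \ldots$ with probabilities $p_1, p_2, \ldots$, the entropy $H(X)$ is defined as

$$H(X) = -\sum_n p_n \log p_n. \qquad (1.38)$$

If $X$ has an absolutely continuous distribution with pdf $p(x)$, then the entropy is defined as

$$H(X) = -\int_D p(x) \log p(x)\, dx, \qquad (1.39)$$

where

$$D = \{x : p(x) > 0\}.$$

In the case of discrete distributions, the transformation

$$Y = a + hX, \quad -\infty < a < \infty, \ h > 0,$$

does not change the probabilities $p_n$ and, consequently, we have

$$H(Y) = H(X).$$

On the other hand, if $X$ has a pdf $p(x)$, then $Y = a + hX$ has the pdf

$$g(x) = \frac{1}{h}\, p\left(\frac{x - a}{h}\right)$$

and

$$H(Y) = -\int_{D_1} g(x) \log g(x)\, dx,$$

where

$$D_1 = \{x : g(x) > 0\}.$$

It is then easy to verify that

$$H(Y) = -\int_{D_1} \frac{1}{h}\, p\left(\frac{x - a}{h}\right) \log\left\{\frac{1}{h}\, p\left(\frac{x - a}{h}\right)\right\} dx = \log h \int_D p(x)\, dx - \int_D p(x) \log p(x)\, dx = \log h + H(X). \qquad (1.40)$$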
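Relation (1.40) can be checked numerically. The sketch below approximates the integral in (1.39) with a midpoint rule and compares the entropy of a standard exponential density with that of its location-scale transform; the helper names, the choice of density, and the truncation of the integration range are all assumptions made for illustration.

```python
from math import exp, log


def differential_entropy(pdf, lo, hi, steps=100_000):
    """Midpoint-rule approximation of -integral of p(x) log p(x) over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        p = pdf(lo + (i + 0.5) * dx)
        if p > 0:                      # integrate only over D = {x : p(x) > 0}
            total -= p * log(p) * dx
    return total


def location_scale(pdf, a, h):
    """pdf of Y = a + hX when X has density `pdf`, following (1.10)."""
    return lambda x: pdf((x - a) / h) / h


def std_exponential(x):
    return exp(-x) if x >= 0 else 0.0


# Illustrative check with a = 3 and h = 2; the upper limits truncate a negligible tail.
h_x = differential_entropy(std_exponential, 0.0, 50.0)
h_y = differential_entropy(location_scale(std_exponential, 3.0, 2.0), 3.0, 103.0)
```

Under these assumptions `h_x` is close to 1 and `h_y - h_x` is close to `log(2)`, so the shift `a` has no effect while the scale `h` adds `log h`.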

1.6 Generating Function and Characteristic Function

In this section we present some functions that are useful in generating the probabilities or the moments of the distribution in a simple and unified manner. In addition, they may also help in identifying the distribution of an underlying random variable of interest.

Definition 1.12 Let $X$ take on values $0, 1, 2, \ldots$ with probabilities $p_n = P\{X = n\}$, $n = 0, 1, \ldots$. All the information about this distribution is contained in the generating function, which is defined as

$$P(s) = Es^X = \sum_{n=0}^{\infty} p_n s^n, \qquad (1.41)$$

with the right-hand side (RHS) of (1.41) converging at least for $|s| \le 1$. Some important properties of generating functions are as follows:

(a) $P(1) = 1$;

(b) for $|s| <$ [...]

[...] $> 0$ and $b$ such that

$$f(a_1 t) f(a_2 t) = e^{ibt} f(at). \qquad (1.56)$$

A random variable is said to have a stable distribution if its characteristic function is stable.

Remark 1.9 It is of interest to note that any stable distribution is absolutely continuous, and is also infinitely divisible.

1.9 Random Vectors and Multivariate Distributions

Let $(\Omega, \mathcal{T}, P)$ be a probability space where $\Omega = \{\omega\}$ is a set of elementary events, $\mathcal{T}$ is a $\sigma$-algebra of events, and $P$ is a probability measure defined on $(\Omega, \mathcal{T})$. Further, let $B$ denote an element of the Borel $\sigma$-algebra of subsets of the $n$-dimensional Euclidean space $\mathbb{R}^n$.

Definition 1.17 An $n$-dimensional vector $\mathbf{X} = \mathbf{X}(\omega) = (X_1(\omega), \ldots, X_n(\omega))$ which maps $\Omega$ into $\mathbb{R}^n$ is called a random vector (or an $n$-dimensional random variable) if, for any Borel set $B$ in $\mathbb{R}^n$, the inverse image of $B$ given by

$$\mathbf{X}^{-1}(B) = \{\omega : \mathbf{X}(\omega) \in B\} = \{\omega : (X_1(\omega), \ldots, X_n(\omega)) \in B\}$$

belongs to the $\sigma$-algebra $\mathcal{T}$. This means that, for any Borel set $B$, we can define probability as

$$P\{\mathbf{X} \in B\} = P\{\mathbf{X}^{-1}(B)\}.$$

In particular, for any $\mathbf{x} = (x_1, \ldots, x_n)$, the function

$$F(\mathbf{x}) = F(x_1, \ldots, x_n) = P\{X_1 \le x_1, \ldots, X_n \le x_n\}, \quad -\infty < x_1, \ldots, x_n < \infty, \qquad (1.57)$$

is defined for the random vector $\mathbf{X} = (X_1, \ldots, X_n)$.

Definition 1.18 The function $F(\mathbf{x}) = F(x_1, \ldots, x_n)$ is called the distribution function of the random vector $\mathbf{X}$.

Remark 1.10 The elements $X_1, \ldots, X_n$ of the random vector $\mathbf{X}$ can be considered as $n$ univariate random variables having distribution functions

$$F_1(x) = F(x, \infty, \ldots, \infty) = P\{X_1 \le x\},$$

$$F_2(x) = F(\infty, x, \infty, \ldots, \infty) = P\{X_2 \le x\},$$

$$\ldots$$

$$F_n(x) = F(\infty, \ldots, \infty, x) = P\{X_n \le x\},$$

respectively. Moreover, any set of $n$ random variables $X_1, \ldots, X_n$ forms a random vector $\mathbf{X} = (X_1, \ldots, X_n)$. Hence,

$$F(\mathbf{x}) = F(x_1, \ldots, x_n) = P\{X_1 \le x_1, \ldots, X_n \le x_n\}$$

is often called the joint distribution function of the variables $X_1, \ldots, X_n$.

If $F(x_1, \ldots, x_n)$ is the joint distribution function of the random variables $X_1, \ldots, X_n$, we can obtain from it the joint distribution function of any subset of $(X_1, \ldots, X_n)$ rather easily. For example, we have

$$P\{X_1 \le x_1, \ldots, X_m \le x_m\} = F(x_1, \ldots, x_m, \infty, \ldots, \infty) \qquad (1.58)$$

as the joint distribution function of $(X_1, \ldots, X_m)$.

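Definition 1.19 The random variables $X_1, \ldots, X_n$ are said to be independent random variables if

$$P\{X_1 \le x_1, \ldots, X_n \le x_n\} = \prod_{k=1}^{n} P\{X_k \le x_k\} \qquad (1.59)$$

for any $-\infty < x_k < \infty$ ($k = 1, \ldots, n$).

Definition 1.20 The vectors $\mathbf{X}_1 = (X_1, \ldots, X_n)$ and $\mathbf{X}_2 = (X_{n+1}, \ldots, X_{n+m})$ are said to be independent if

$$P\{X_1 \le x_1, \ldots, X_{n+m} \le x_{n+m}\} = P\{X_1 \le x_1, \ldots, X_n \le x_n\}\, P\{X_{n+1} \le x_{n+1}, \ldots, X_{n+m} \le x_{n+m}\} \qquad (1.60)$$

for any $-\infty < x_k < \infty$ ($k = 1, \ldots, n + m$).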

In the following discussion we restrict ourselves to the two-dimensional case. Let $(X, Y)$ be a two-dimensional random vector and let $F(x, y) = P\{X \le x, Y \le y\}$ be the joint distribution function of $(X, Y)$. Then, $F_X(x) = F(x, \infty)$ and $F_Y(y) = F(\infty, y)$ are the marginal distribution functions. Now, as we did earlier in the univariate case, we shall discuss discrete and absolutely continuous cases separately.

Definition 1.21 A two-dimensional random vector $(X, Y)$ is said to have a discrete bivariate distribution if there exists a countable set $B = \{(x_1, y_1), (x_2, y_2), \ldots\}$ such that $P\{(X, Y) \in B\} = 1$.

Remark 1.11 In order to determine a two-dimensional random vector $(X, Y)$ having a bivariate discrete distribution, we need to fix two sequences: a sequence of two-dimensional points $(x_1, y_1), (x_2, y_2), \ldots$ and a sequence of probabilities $p_k = P\{X = x_k, Y = y_k\}$, $k = 1, 2, \ldots$, such that $\sum_k p_k = 1$. In this case, the joint distribution function $F(x, y)$ of $(X, Y)$ is given by

$$F(x, y) = \sum_{k:\, x_k \le x,\ y_k \le y} p_k. \qquad (1.61)$$

Also, the components of the vector $(X, Y)$ are independent if

$$P\{X = x_k, Y = y_k\} = P\{X = x_k\}\, P\{Y = y_k\} \quad \text{for any } k. \qquad (1.62)$$

Definition 1.22 A two-dimensional random vector $(X, Y)$ with a joint distribution function $F(x, y)$ is said to have an absolutely continuous bivariate distribution if there exists a nonnegative function $p(u, v)$ such that

$$F(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} p(u, v)\, dv\, du \qquad (1.63)$$

for any real $x$ and $y$.

Remark 1.12 The function $p(u, v)$ satisfies the condition

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(u, v)\, du\, dv = 1, \qquad (1.64)$$

and it is called the probability density function (pdf) of the bivariate random vector $(X, Y)$ or the joint probability density function of the random variables $X$ and $Y$. If $p(u, v)$ is the pdf of the bivariate vector $(X, Y)$, then the components $X$ and $Y$ have one-dimensional (marginal) densities

$$p_X(u) = \int_{-\infty}^{\infty} p(u, v)\, dv \qquad (1.65)$$

and

$$p_Y(v) = \int_{-\infty}^{\infty} p(u, v)\, du, \qquad (1.66)$$

respectively. Also, the components of the absolutely continuous bivariate vector $(X, Y)$ are independent if

$$p(u, v) = p_X(u)\, p_Y(v), \qquad (1.67)$$

where $p_X(u)$ and $p_Y(v)$ are the marginal densities as given in (1.65) and (1.66). Moreover, if the joint pdf $p(u, v)$ of $(X, Y)$ admits a factorization of the form

$$p(u, v) = q_1(u)\, q_2(v), \qquad (1.68)$$

then the components $X$ and $Y$ are independent, and there exists a nonzero constant $c$ such that

$$p_X(u) = c\, q_1(u) \quad \text{and} \quad p_Y(v) = \frac{1}{c}\, q_2(v).$$

Exercise 1.20 Let $F(x, y)$ denote the distribution function of the random vector $(X, Y)$. Then, express $P\{X \le 0, Y > 1\}$ in terms of the function $F(x, y)$.

Exercise 1.21 Let $F(x)$ denote the distribution function of a random variable $X$. Consider the vector $\mathbf{X}_n = (X, \ldots, X)$ with all its components coinciding with $X$. Express the distribution function of $\mathbf{X}_n$ in terms of $F(x)$.

1.10 Conditional Distributions

Let $(X, Y)$ be a random vector having a discrete bivariate distribution concentrated on some points $(x_i, y_j)$, and let $p_{ij} = P\{X = x_i, Y = y_j\}$, $q_i = P\{X = x_i\} > 0$, and $r_j = P\{Y = y_j\} > 0$, for $i, j = 1, 2, \ldots$. Then, for any $y_j$ ($j = 1, 2, \ldots$), the conditional distribution of $X$, given $Y = y_j$, is defined as

$$P\{X = x_i \mid Y = y_j\} = \frac{P\{X = x_i, Y = y_j\}}{P\{Y = y_j\}} = \frac{p_{ij}}{r_j}. \qquad (1.69)$$

Similarly, for any $x_i$ ($i = 1, 2, \ldots$), the conditional distribution of $Y$, given $X = x_i$, is defined as

$$P\{Y = y_j \mid X = x_i\} = \frac{P\{X = x_i, Y = y_j\}}{P\{X = x_i\}} = \frac{p_{ij}}{q_i}. \qquad (1.70)$$

[...] with pdf $p_{\mathbf{X}}(x_1, \ldots, x_n)$. For example, let us consider the case when the $n$-dimensional random vector $\mathbf{X} = (X_1, \ldots, X_n)$ has an absolutely continuous distribution. Let $\mathbf{U} = (X_1, \ldots, X_m)$ and $\mathbf{V} = (X_{m+1}, \ldots, X_n)$ ($m < n$) be the random vectors corresponding to the first $m$ and the last $n - m$ components of the random vector $\mathbf{X}$. We can define the pdf's $p_{\mathbf{U}}(x_1, \ldots, x_m)$ and $p_{\mathbf{V}}(x_{m+1}, \ldots, x_n)$ in this case as in Eqs. (1.58) and (1.63). Then, the conditional pdf of the random vector $\mathbf{V}$, given $\mathbf{U} = (x_1, \ldots, x_m)$, is defined as

$$p_{\mathbf{V} \mid \mathbf{U}}(x_{m+1}, \ldots, x_n \mid x_1, \ldots, x_m) = \frac{p_{\mathbf{X}}(x_1, \ldots, x_n)}{p_{\mathbf{U}}(x_1, \ldots, x_m)}. \qquad (1.73)$$

1.11 Moment Characteristics of Random Vectors

Let $(X, Y)$ be a bivariate discrete random vector concentrating on points $(x_i, y_j)$ with probabilities $p_{ij} = P\{X = x_i, Y = y_j\}$ for $i, j = 1, 2, \ldots$. For any measurable function $g(x, y)$, we can find the expected value of $Z = g(X, Y)$ as

$$EZ = Eg(X, Y) = \sum_i \sum_j g(x_i, y_j)\, p_{ij}. \qquad (1.74)$$

Similarly, if $(X, Y)$ has an absolutely continuous bivariate distribution with the density function $p(x, y)$, then we have the expected value of $Z = g(X, Y)$ as

$$EZ = Eg(X, Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, p(x, y)\, dx\, dy. \qquad (1.75)$$

Of course, as in the univariate case, we say that $EZ = Eg(X, Y)$ defined in Eqs. (1.74) and (1.75) exist if

$$\sum_i \sum_j |g(x_i, y_j)|\, p_{ij} < \infty \quad \text{and} \quad \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |g(x, y)|\, p(x, y)\, dx\, dy < \infty,$$

respectively. In particular, if $g(x, y) = x^k y^\ell$, we obtain $EZ = Eg(X, Y) = EX^k Y^\ell$, which is said to be the product moment of order $(k, \ell)$. Similarly, the moment $E(X - EX)^k (Y - EY)^\ell$ is said to be the central product moment of order $(k, \ell)$, and the special case of $E(X - EX)(Y - EY)$ is called the covariance between $X$ and $Y$ and is denoted by $\operatorname{Cov}(X, Y)$. Based on the covariance, we can define another measure of association which is invariant with respect to both location and scale of the variables $X$ and $Y$ (meaning that it is not affected if the means and the variances of the variables are changed). Such a measure is the correlation coefficient between $X$ and $Y$ and is defined as

$$\rho = \rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var} X \, \operatorname{Var} Y}}.$$

It can easily be shown that $|\rho| \le 1$. If we are dealing with a general $n$-dimensional random vector $\mathbf{X} = (X_1, \ldots, X_n)$, then the following moment characteristics of $\mathbf{X}$ will be of interest to us: the mean vector $\mathbf{m} = (m_1, \ldots, m_n)$, where $m_i = EX_i$ ($i = 1, \ldots, n$); the covariance matrix $\Sigma = ((\sigma_{ij}))_{i,j=1}^{n}$, where $\sigma_{ij} = \sigma_{ji} = \operatorname{Cov}(X_i, X_j)$ (for $i \ne j$) and $\sigma_{ii} = \operatorname{Var}(X_i)$; and the correlation matrix $\rho = ((\rho_{ij}))_{i,j=1}^{n}$, where $\rho_{ij} = \sigma_{ij} / \sqrt{\sigma_{ii}\sigma_{jj}}$. Note that the diagonal elements of the correlation matrix are all 1.

Exercise 1.22 Find all distributions of the random variable $X$ for which the correlation coefficient $\rho(X, X^2) = -1$.

Exercise 1.23 Suppose that the variances of the random variables $X$ and $Y$ are 1 and 4, respectively. Then, find the exact upper and lower bounds for $\operatorname{Var}(X - Y)$.

1.12 Conditional Expectations

In Section 1.10 we introduced conditional distributions in the case of discrete as well as absolutely continuous multivariate distributions. Based on those conditional distributions, we describe in this section conditional expectations. For this purpose, let us first consider the case when $(X, Y)$ has a discrete bivariate distribution concentrating on points $(x_i, y_j)$ (for $i, j = 1, 2, \ldots$), and as before, let $p_{ij} = P\{X = x_i, Y = y_j\}$ and $r_j = P\{Y = y_j\} > 0$. Suppose also that $EX$ exists. Then, based on the definition of the conditional distribution of $X$, given $Y = y_j$, presented in Eq. (1.69), we readily have the conditional mean of $X$, given $Y = y_j$, as

$$E(X \mid Y = y_j) = \sum_i x_i P\{X = x_i \mid Y = y_j\} = \sum_i x_i \frac{p_{ij}}{r_j}. \qquad (1.76)$$

More generally, for any measurable function $h(\cdot)$ for which $Eh(X)$ exists, we have the conditional expectation of $h(X)$, given $Y = y_j$, as

$$E\{h(X) \mid Y = y_j\} = \sum_i h(x_i) P\{X = x_i \mid Y = y_j\} = \sum_i h(x_i) \frac{p_{ij}}{r_j}. \qquad (1.77)$$

Based on (1.76), we can introduce the conditional expectation of $X$, given $Y$, denoted by $E(X \mid Y)$, as a new random variable which takes on the value $E(X \mid Y = y_j)$ when $Y$ takes on the value $y_j$ (for $j = 1, 2, \ldots$). Hence, the conditional expectation of $X$, given $Y$, as a random variable takes on values

$$E(X \mid Y = y_j) = \sum_i x_i \frac{p_{ij}}{r_j}$$

with probabilities $r_j$ (for $j = 1, 2, \ldots$). Consequently, we readily observe that

$$E\{E(X \mid Y)\} = \sum_j E(X \mid Y = y_j)\, r_j = \sum_j \sum_i x_i p_{ij} = \sum_i x_i P\{X = x_i\} = EX.$$
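The identity $E\{E(X \mid Y)\} = EX$ can be traced on a small joint distribution. The joint probabilities below are made-up illustrative numbers, and the function names are not from the text; exact fractions are used so that the two sides can be compared for equality.

```python
from fractions import Fraction as F

# Hypothetical joint pmf: joint[(x, y)] = P{X = x, Y = y}.
joint = {(0, 0): F(1, 8), (1, 0): F(3, 8), (0, 1): F(1, 4), (2, 1): F(1, 4)}


def marginal_y(pmf):
    """r_j = P{Y = y_j}, obtained by summing the joint pmf over x."""
    r = {}
    for (_, y), p in pmf.items():
        r[y] = r.get(y, F(0)) + p
    return r


def cond_mean_x(pmf, y):
    """E(X | Y = y) = sum_i x_i p_ij / r_j, as in (1.76)."""
    r = marginal_y(pmf)[y]
    return sum(x * p for (x, yy), p in pmf.items() if yy == y) / r


def mean_of_cond_mean(pmf):
    """E{E(X | Y)}: average the conditional means with weights r_j."""
    return sum(cond_mean_x(pmf, y) * r for y, r in marginal_y(pmf).items())


def mean_x(pmf):
    """EX computed directly from the joint pmf."""
    return sum(x * p for (x, _), p in pmf.items())
```

For this joint distribution the conditional means are 3/4 (given $Y = 0$) and 1 (given $Y = 1$), each with weight 1/2, which averages to $EX = 7/8$.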

Similarly, if the conditional expectation $E\{h(X) \mid Y\}$ is a random variable which takes on values $E\{h(X) \mid Y = y_j\}$ when $Y$ takes on values $y_j$ (for $j = 1, 2, \ldots$) with probabilities $r_j$, we can show that

$$E[E\{h(X) \mid Y\}] = E\{h(X)\}.$$

Next, let us consider the case when the random vector $(X, Y)$ has an absolutely continuous bivariate distribution with pdf $p(x, y)$, and let $p_Y(y)$ be the marginal density function of $Y$. Then, from Eq. (1.71), we have the conditional mean of $X$, given $Y = y$, as

$$E(X \mid Y = y) = \frac{1}{p_Y(y)} \int_{-\infty}^{\infty} x\, p(x, y)\, dx, \qquad (1.78)$$

provided that $EX$ exists. Similarly, if $h(\cdot)$ is a measurable function for which $Eh(X)$ exists, we have the conditional expectation of $h(X)$, given $Y = y$, as

$$E\{h(X) \mid Y = y\} = \frac{1}{p_Y(y)} \int_{-\infty}^{\infty} h(x)\, p(x, y)\, dx. \qquad (1.79)$$

As in the discrete case, we can regard $E(X \mid Y)$ and $E\{h(X) \mid Y\}$ as random variables which take on the values

$$E(X \mid Y = y) \quad \text{and} \quad E\{h(X) \mid Y = y\}$$

when the random variable $Y$ takes on the value $y$. In this case, too, it can be shown that

$$E\{E(X \mid Y)\} = EX \quad \text{and} \quad E[E\{h(X) \mid Y\}] = E\{h(X)\}.$$

1.13 Regressions

In Eqs. (1.76) and (1.78) we defined the conditional expectation $E(X \mid Y = y)$ provided that $EX$ exists. From this conditional expectation, we may consider the function

$$a(y) = E(X \mid Y = y), \qquad (1.80)$$

which is called the regression function of $X$ on $Y$. Similarly, when $EY$ exists, the function

$$b(x) = E(Y \mid X = x) \qquad (1.81)$$

is called the regression function of $Y$ on $X$. Note that when the random variables $X$ and $Y$ are independent, then

$$a(y) = E(X \mid Y = y) = EX \quad \text{and} \quad b(x) = E(Y \mid X = x) = EY$$

are simply the unconditional means of $X$ and $Y$, and do not depend on $y$ and $x$, respectively.

1.14 Generating Function of Random Vectors

Let $\mathbf{X} = (X_1, \ldots, X_n)$ be a random vector, elements of which take on values $0, 1, 2, \ldots$. In this case, the generating function $P(s_1, \ldots, s_n)$ is defined as

$$P(s_1, \ldots, s_n) = E s_1^{X_1} \cdots s_n^{X_n}. \qquad (1.82)$$

Although the following properties can be presented for this general case, we present them for notational simplicity only for the bivariate case ($n = 2$). Let $P_{X,Y}(s, t)$, $P_X(s)$, and $P_Y(t)$ be the generating function of the bivariate random vector $(X, Y)$, the marginal generating function of $X$, and the marginal generating function of $Y$, defined by

$$P_{X,Y}(s, t) = E s^X t^Y = \sum_{j=0}^{\infty} \sum_{k=0}^{\infty} P\{X = j, Y = k\}\, s^j t^k, \qquad (1.83)$$

$$P_X(s) = E s^X = \sum_{j=0}^{\infty} P\{X = j\}\, s^j, \qquad (1.84)$$

$$P_Y(t) = E t^Y = \sum_{k=0}^{\infty} P\{Y = k\}\, t^k, \qquad (1.85)$$

respectively. Then, the following properties of $P_{X,Y}(s, t)$ can be established easily:

(a) $P_{X,Y}(1, 1) = 1$;

(b) $P_{X,Y}(s, 1) = P_X(s)$ and $P_{X,Y}(1, t) = P_Y(t)$;

(c) $P_{X,Y}(s, s) = P_{X+Y}(s)$, where $P_{X+Y}(s) = E s^{X+Y}$ is the generating function of the variable $X + Y$;

(d) $P_{X,Y}(s, t) = P_X(s) P_Y(t)$ if and only if $X$ and $Y$ are independent.

Next, for the random vector $\mathbf{X} = (X_1, \ldots, X_n)$, we define the characteristic function $f(t_1, \ldots, t_n)$ as

$$f(t_1, \ldots, t_n) = E e^{i(t_1 X_1 + \cdots + t_n X_n)}. \qquad (1.86)$$

In the case when the random vector $\mathbf{X} = (X_1, \ldots, X_n)$ has an absolutely continuous distribution with density function $p(x_1, \ldots, x_n)$, its characteristic function $f(t_1, \ldots, t_n)$ is given by

$$f(t_1, \ldots, t_n) = E e^{i(t_1 X_1 + \cdots + t_n X_n)} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{i(t_1 x_1 + \cdots + t_n x_n)}\, p(x_1, \ldots, x_n)\, dx_1 \cdots dx_n. \qquad (1.87)$$

Once again, although the following properties can be presented for this general $n$-dimensional case, we present them for notational simplicity only for the bivariate case ($n = 2$). Let $f_{X,Y}(s, t)$, $f_X(s)$, and $f_Y(t)$ be the characteristic function of the bivariate random vector $(X, Y)$, the marginal characteristic function of $X$, and the marginal characteristic function of $Y$, defined by

$$f_{X,Y}(s, t) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{i(sx + ty)}\, p_{X,Y}(x, y)\, dx\, dy, \qquad (1.88)$$

$$f_X(s) = \int_{-\infty}^{\infty} e^{isx}\, p_X(x)\, dx, \qquad (1.89)$$

$$f_Y(t) = \int_{-\infty}^{\infty} e^{ity}\, p_Y(y)\, dy, \qquad (1.90)$$

respectively. Then, the following properties of $f_{X,Y}(s, t)$ can be established easily:

(a) $f_{X,Y}(0, 0) = 1$;

(b) $f_{X,Y}(s, 0) = f_X(s)$ and $f_{X,Y}(0, t) = f_Y(t)$;

(c) $f_{X,Y}(s, s) = f_{X+Y}(s)$, where $f_{X+Y}(s) = E e^{is(X+Y)}$ is the characteristic function of the variable $X + Y$;

(d) $f_{X,Y}(s, t) = f_X(s) f_Y(t)$ if and only if $X$ and $Y$ are independent.

Exercise 1.24 Let $P(s, t)$ be the generating function of the random vector $(X, Y)$. Then, find the generating function $Q(s, t, u)$ of the random vector $(2X + 1, X + Y, 3X + 2Y)$.

1.15 Transformations of Variables

$$J = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\ \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} & \cdots & \dfrac{\partial x_2}{\partial y_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial x_n}{\partial y_1} & \dfrac{\partial x_n}{\partial y_2} & \cdots & \dfrac{\partial x_n}{\partial y_n} \end{vmatrix},$$

where $|J|$ is the absolute value of the Jacobian of the transformation. Once again, the marginal pdf of any subset of the new variables may be obtained from (1.92) by integrating out the other variables. Note that if the transformation is not one-to-one, but $B$ is the union of a finite number of mutually disjoint spaces, say $B_1, \ldots, B_\ell$, then we can construct $\ell$ sets of one-to-one transformations (one for each $B_i$) and their respective Jacobians, and then finally express the density function of the vector $\mathbf{Y} = (Y_1, \ldots, Y_n)$ as the sum of $\ell$ terms of the form (1.92) corresponding to $B_1, \ldots, B_\ell$.


Part I

DISCRETE DISTRIBUTIONS


CHAPTER 2

DISCRETE UNIFORM DISTRIBUTION

2.1 Introduction

The general discrete uniform distribution takes on $k$ distinct values $x_1, x_2, \ldots, x_k$ with equal probabilities $1/k$, where $k$ is a positive integer. We restrict our attention here to lattice distributions. In this case, $x_j = a + jh$, $j = 0, 1, \ldots, k - 1$, where $a$ is any real value and $h > 0$ is the step of the distribution. Sometimes, such a distribution is called a discrete rectangular distribution. The linear transformations of random variables enable us to consider, without loss of generality, just the standard discrete uniform distribution taking on values $0, 1, \ldots, k - 1$, which correspond to $a = 0$ and $h = 1$. Note that the case when $a = 0$ and $h = 1/k$ is also important, but it can be obtained from the standard discrete uniform distribution by means of a simple scale change.

2.2 Notations

We will use the notation

$$X \sim DU(k, a, h)$$

if

$$P\{X = a + jh\} = \frac{1}{k} \quad \text{for } j = 0, 1, \ldots, k - 1,$$

and

$$X \sim DU(k)$$

for the corresponding standard discrete uniform distribution; i.e., $DU(k)$ is simply $DU(k, 0, 1)$.

Remark 2.1 Note that if $Y \sim DU(k, a, h)$ and $X \sim DU(k)$, then

$$X \stackrel{d}{=} \frac{Y - a}{h} \quad \text{and} \quad Y \stackrel{d}{=} a + hX,$$

where $\stackrel{d}{=}$ denotes "having the same distribution" (see Definition 1.5). More generally, if $Y_1 \sim DU(k, a_1, h_1)$ and $Y_2 \sim DU(k, a_2, h_2)$, then

$$Y_1 \stackrel{d}{=} cY_2 + d,$$

where $c = h_1/h_2$ and $d = a_1 - a_2 h_1/h_2$. This means that the random variables $Y_1$ and $Y_2$ belong to the same type of distribution, depending only on the shape parameter $k$, and do not depend on location ($a_1$ and $a_2$) and scale ($h_1$ and $h_2$) parameters.

Discrete uniform distributions play a naturally important role in many classical problems of probability theory that deal with a random choice with equal probabilities from a finite set of $k$ items. For example, a lottery machine contains $k$ balls, numbered $1, 2, \ldots, k$. On selecting one of these balls, we get a random number $Y$ which has the $DU(k, 1, 1)$ distribution. This principle, in fact, allows us to generate tables of random numbers used in different statistical simulations, by taking $k$ sufficiently large (say, $k = 10^6$ or $2^{32}$). For the rest of this chapter, we deal only with the standard discrete uniform $DU(k)$ distribution.

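2.3 Moments

We will now determine the moments of $X \sim DU(k)$, all of which exist since $X$ has a finite support.

Moments about zero:

$$\alpha_1 = EX = \frac{1}{k} \sum_{r=0}^{k-1} r = \frac{k - 1}{2}, \qquad (2.1)$$

$$\alpha_2 = EX^2 = \frac{1}{k} \sum_{r=0}^{k-1} r^2 = \frac{(k - 1)(2k - 1)}{6}, \qquad (2.2)$$

$$\alpha_3 = EX^3 = \frac{1}{k} \sum_{r=0}^{k-1} r^3 = \frac{k(k - 1)^2}{4}, \qquad (2.3)$$

$$\alpha_4 = EX^4 = \frac{1}{k} \sum_{r=0}^{k-1} r^4 = \frac{(k - 1)(2k - 1)(3k^2 - 3k - 1)}{30}. \qquad (2.4)$$

To obtain the expressions in (2.1)-(2.4), we have used the following well-known identities for sums:

$$\sum_{r=0}^{k-1} r = \frac{(k - 1)k}{2}, \qquad \sum_{r=0}^{k-1} r^2 = \frac{(k - 1)k(2k - 1)}{6},$$

$$\sum_{r=0}^{k-1} r^3 = \frac{(k - 1)^2 k^2}{4}, \qquad \text{and} \qquad \sum_{r=0}^{k-1} r^4 = \frac{(k - 1)k(2k - 1)(3k^2 - 3k - 1)}{30};$$

see, for example, Gradshteyn and Ryzhik (1994, p. 2). Note that

$$\alpha_n \sim \frac{k^n}{n + 1} \quad \text{as } k \to \infty. \qquad (2.5)$$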
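The closed forms (2.1)-(2.4) are easy to confirm by brute force for small $k$. The sketch below sums over the support directly and compares with the formulas, using exact rational arithmetic; the function names are illustrative.

```python
from fractions import Fraction as F


def du_raw_moment(k, n):
    """alpha_n = E X^n for X ~ DU(k), computed directly from the definition."""
    return F(sum(r ** n for r in range(k)), k)


def du_closed_forms(k):
    """Closed forms (2.1)-(2.4) for alpha_1, ..., alpha_4."""
    return [
        F(k - 1, 2),
        F((k - 1) * (2 * k - 1), 6),
        F(k * (k - 1) ** 2, 4),
        F((k - 1) * (2 * k - 1) * (3 * k * k - 3 * k - 1), 30),
    ]
```

For instance, with $k = 3$ the support is $\{0, 1, 2\}$ and both routes give $\alpha_1 = 1$, $\alpha_2 = 5/3$, $\alpha_3 = 3$, and $\alpha_4 = 17/3$.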

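Central moments: The variance or the second central moment is obtained from (2.1) and (2.2) as

$$\beta_2 = \operatorname{Var} X = \alpha_2 - \alpha_1^2 = \frac{k^2 - 1}{12}. \qquad (2.6)$$

The third central moment is obtained from (2.1)-(2.3) as

$$\beta_3 = E(X - \alpha_1)^3 = \alpha_3 - 3\alpha_2\alpha_1 + 2\alpha_1^3 = 0. \qquad (2.7)$$

At first, (2.7) may seem surprising; but once we realize that $X$ is symmetric about its mean value $\alpha_1 = (k - 1)/2$, (2.7) makes perfect sense. In fact, it is easy to see that $(k - 1 - X)$ and $X$ take on the same values $0, 1, \ldots, k - 1$ with equal probabilities $1/k$. Therefore, we have

$$X - \alpha_1 \stackrel{d}{=} \alpha_1 - X,$$

and consequently,

$$E(X - \alpha_1)^{2r+1} = E(\alpha_1 - X)^{2r+1} = -E(X - \alpha_1)^{2r+1},$$

which simply implies that

$$\beta_{2r+1} = 0, \quad r = 1, 2, \ldots.$$

Factorial moments of positive order:

$$\mu_r = EX(X - 1) \cdots (X - r + 1). \qquad (2.8)$$

It is easily seen that $\mu_r = 0$ for $r \ge k$, and

$$\mu_r = \frac{1}{k} \sum_{m=r}^{k-1} \frac{m!}{(m - r)!} = \frac{(k - 1)(k - 2) \cdots (k - r)}{r + 1} \quad \text{for } r = 1, 2, \ldots, k - 1.$$

In deriving the last expression, we have used the well-known combinatorial identity

$$\sum_{m=r}^{k-1} \binom{m}{r} = \binom{k}{r + 1}.$$

In particular, we have

$$\mu_1 = \alpha_1 = \frac{k - 1}{2}, \qquad (2.9)$$

$$\mu_2 = \alpha_2 - \alpha_1 = \frac{(k - 1)(k - 2)}{3}, \qquad (2.10)$$

$$\mu_3 = \alpha_3 - 3\alpha_2 + 2\alpha_1 = \frac{(k - 1)(k - 2)(k - 3)}{4}, \qquad (2.11)$$

and

$$\mu_{k-2} = (k - 2)!, \qquad (2.12)$$

$$\mu_{k-1} = \frac{(k - 1)!}{k}. \qquad (2.13)$$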
(2.12) (2.13)

Fuctorinl rriomerits of negatiue order:

p-,.

=

E

{(X +

1

1). . . ( X

+

T)

In pa.rt,icular, (2.14) and

(2.15)

GENERATING FUNCTION AND CHAR.ACTERISTIC FUNCTION

33

(2.16) -=

p-y

1 k-l k m=O ( m

-

1

+ l ) ( m + 2)(m + 3) 1

m+1

2k -

2.4

m +2

m+3

k+3 4(k+ 1)(k+2)'

(2.17)

Generating Function and Characteristic F'unction

The generating function of the $DU(k)$ distribution exists for any $s$ and is given by

$$P_X(s) = Es^X = \frac{1}{k} \sum_{r=0}^{k-1} s^r.$$

For $s \ne 1$, it can be rewritten in the form

$$P_X(s) = \frac{1 - s^k}{k(1 - s)}. \qquad (2.18)$$

For any $k = 1, 2, \ldots$, $P_X(s)$ is a polynomial of $(k - 1)$th degree, and it is not difficult to see that its roots coincide with

$$s_j = \exp(2\pi i j / k), \quad j = 1, 2, \ldots, k - 1, \quad \text{for } k > 1.$$

This readily gives us the following form of the generating function:

$$P_X(s) = \frac{1}{k} \prod_{j=1}^{k-1} (s - s_j). \qquad (2.19)$$

Another form of the generating function exploits the hypergeometric function, defined by

$${}_2F_1[a, b; c; x] = 1 + \frac{ab}{c \cdot 1!}\, x + \frac{a(a + 1)b(b + 1)}{c(c + 1)\, 2!}\, x^2 + \cdots. \qquad (2.20)$$

The generating function, in terms of the hypergeometric function, is given by

$$P_X(s) = \frac{1}{k}\, {}_2F_1[-k + 1, 1; -k + 1; s], \qquad (2.21)$$

where the series terminates after the term in $s^{k-1}$.

Since the characteristic function and the generating function for nonnegative integer-valued random variables satisfy the relation

$$f_X(t) = E \exp(itX) = P_X(e^{it}),$$

if we replace $s$ by $e^{it}$ in Eqs. (2.18), (2.19), and (2.21), we obtain the corresponding expressions for the characteristic function $f_X(t)$. For example, from (2.18) we get

$$f_X(t) = \frac{1 - e^{ikt}}{k(1 - e^{it})}. \qquad (2.22)$$
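A quick numerical check of the characteristic function obtained from (2.18) with $s = e^{it}$, namely $(1 - e^{ikt})/\{k(1 - e^{it})\}$, against the defining sum $E e^{itX}$ is sketched below. The function names are illustrative, and the closed form is only evaluated where $e^{it} \ne 1$.

```python
import cmath
from math import pi


def du_cf_direct(k, t):
    """f_X(t) = E exp(itX) for X ~ DU(k), summed over the support."""
    return sum(cmath.exp(1j * t * r) for r in range(k)) / k


def du_cf_closed(k, t):
    """Closed form from (2.18) with s = exp(it); assumes t is not a multiple of 2*pi."""
    return (1 - cmath.exp(1j * k * t)) / (k * (1 - cmath.exp(1j * t)))
```

Evaluating `du_cf_direct(k, 2 * pi * j / k)` for `j = 1, ..., k - 1` returns values that are zero up to rounding, which reflects the fact that the points $s_j = \exp(2\pi i j/k)$ are the roots of $P_X(s)$.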

2.5 Convolutions

Let us take two independent random variables, both having discrete uniform distributions, say, $X \sim DU(k)$ and $Y \sim DU(r)$, $k \ge r$ (without loss of any generality). Then, what can we say about the distribution of the sum $Z = X + Y$? The distribution of $Z$ is called the convolution (or composition) of the two initial distributions.

Exercise 2.1 It is clear that $0 \le Z \le k + r - 2$. Consider the three different situations, and prove that $P\{Z \le m\}$ is given by

(a) $\dfrac{(m + 1)(m + 2)}{2kr}$ if $0 \le m \le r - 1$,

(b) $\dfrac{2m - r + 3}{2k}$ if $r - 1 \le m \le k - 1$,

(c) $1 - \dfrac{(r + k - 2 - m)(r + k - 1 - m)}{2kr}$ if $k \le m \le k + r - 2$. $\qquad (2.23)$

From (2.23), we readily obtain

$$P\{Z = m\} = \begin{cases} \dfrac{m + 1}{kr} & \text{if } 0 \le m \le r - 1, \\[1ex] \dfrac{1}{k} & \text{if } r - 1 \le m \le k - 1, \\[1ex] \dfrac{k + r - 1 - m}{kr} & \text{if } k \le m \le k + r - 2, \\[1ex] 0 & \text{otherwise.} \end{cases}$$

One can see now that $r = 1$ is the only case when the convolution of two discrete uniform $DU(k)$ and $DU(r)$ distributions leads to the same distribution. Note that in this situation $P\{Y = 0\} = 1$, which means that $Y$ has a degenerate distribution. Nevertheless, it turns out that convolution of more general nondegenerate discrete uniform distributions may belong to the same set of distributions.
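The piecewise expression for $P\{Z \le m\}$ in (2.23) can be checked by enumerating all $kr$ equally likely pairs. The sketch below does this with exact fractions; the function names are illustrative, and the formula is applied under the stated assumption $k \ge r$.

```python
from fractions import Fraction as F


def du_convolution_pmf(k, r):
    """pmf of Z = X + Y for independent X ~ DU(k), Y ~ DU(r), by enumeration."""
    pmf = [F(0)] * (k + r - 1)
    for x in range(k):
        for y in range(r):
            pmf[x + y] += F(1, k * r)
    return pmf


def du_convolution_cdf_formula(k, r, m):
    """P{Z <= m} from (2.23), assuming k >= r and 0 <= m <= k + r - 2."""
    if m <= r - 1:
        return F((m + 1) * (m + 2), 2 * k * r)
    if m <= k - 1:
        return F(2 * m - r + 3, 2 * k)
    j = k + r - 2 - m
    return 1 - F(j * (j + 1), 2 * k * r)
```

The resulting pmf rises linearly, stays flat at $1/k$, and then falls linearly, which gives a trapezoid when $k > r$ and a triangle when $k = r$.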

Exercise 2.2 Suppose that $Z \sim DU(r, 0, s)$ and $Y \sim DU(s)$, where $r = 2, 3, \ldots$, $s = 2, 3, \ldots$, and that $Y$ and $Z$ are independent random variables. Show then that $U = Y + Z \sim DU(rs)$.

Remark 2.2 It is easy to see that $Z \stackrel{d}{=} sX$, where $X \sim DU(r)$. Hence, we get another equivalent form of the statement given in Exercise 2.2 as follows. If $X \sim DU(r)$ and $Y \sim DU(s)$, then the sum $sX + Y$ has the discrete uniform $DU(sr)$ distribution. Moreover, due to the symmetry argument, we can immediately obtain that the sum $X + rY$ also has the same $DU(sr)$ distribution.
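The statement of Remark 2.2 can be illustrated by enumeration. The sketch below builds the pmf of a linear combination of two independent standard discrete uniform variables; the function name and the particular choice $r = 3$, $s = 4$ are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction as F


def pmf_of_sum(r, s, scale_x, scale_y):
    """pmf of scale_x*X + scale_y*Y for independent X ~ DU(r), Y ~ DU(s)."""
    counts = Counter(scale_x * x + scale_y * y for x in range(r) for y in range(s))
    return {value: F(c, r * s) for value, c in counts.items()}


# With r = 3 and s = 4, both sX + Y and X + rY should be DU(12).
uniform_12 = {v: F(1, 12) for v in range(12)}
sx_plus_y = pmf_of_sum(3, 4, 4, 1)
x_plus_ry = pmf_of_sum(3, 4, 1, 3)
```

The plain sum $X + Y$ (both scales equal to 1) is not uniform, so the scaling by $s$ or $r$ is what makes the $rs$ combinations land on distinct values.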

2.6 Decompositions

Decomposition is an operation which is inverse to convolution. We want to know if a certain random variable can be represented as a sum of at least two independent random variables (see Section 1.7). Of course, any random variable $X$ can be rewritten as a trivial sum of two terms $a + (X - a)$, the first of which is a degenerate random variable, but we will solve a more interesting problem: Is it possible, for a certain random variable $X$, to find a pair of nondegenerate independent random variables $Y$ and $Z$ such that

$$X \stackrel{d}{=} Y + Z?$$

Consider $X \sim DU(k)$, where $k$ is a composite number. Let $k = rs$, where $r \ge 2$ and $s \ge 2$ are integers. It follows from the statement of Exercise 2.2 that $X$ is decomposable as a sum of two random variables, both having discrete uniform distributions. Moreover, we note from Remark 2.2 that we have at least two different options for decomposition of $DU(rs)$ if $r \ne s$. Let $k$ be a prime integer now. The simplest case is $k = 2$, when $X$ takes on two values. Of course, in this situation $X$ is indecomposable, because any nondegenerate random variable takes on at least two values and hence it is easy to see that any sum $Y + Z$ of independent nondegenerate random variables has at least three values. Now one might conjecture that the $DU(3)$ distribution is decomposable. In fact, there are a lot of random variables, taking on values 0, 1, and 2 with probabilities $p_0$, $p_1$, and $p_2$, that can be presented as a sum $Y + Z$, where both $Y$ and $Z$ take on values 0 and 1, probably with different probabilities. However, it turns out that one cannot decompose a random variable taking three values if the corresponding probabilities are equal ($p_0 = p_1 = p_2 = \frac{1}{3}$).


Exercise 2.3 Prove that the $DU(3)$ distribution is indecomposable.

In the general case when $k$ is any prime integer, by considering the corresponding generating function

$$P_X(s) = \frac{1}{k}\left(1 + s + \cdots + s^{k-1}\right) = \frac{1 - s^k}{k(1 - s)},$$

we see that the problem of decomposition in this case is equivalent to the following problem: Is it possible to present $P_X(s)$ as a product of two polynomials with positive coefficients if $k$ is a prime? The negative answer was given in Krasner and Ranulac (1937), and independently by Raikov (1937a). Summarizing all these, we have the following result.

Theorem 2.1 The discrete uniform $DU(k)$ distribution (for $k > 1$) is indecomposable iff $k$ is a prime integer.

It is also evident that $X \sim DU(k)$ is not infinitely divisible when $k$ is a prime number. Moreover, it is known that no nondegenerate distribution with a finite support can be infinitely divisible (see Remark 1.6). This means that no $DU(k)$ distribution with $k > 1$ is infinitely divisible.

2.7

Entropy

From the definition of the entropy $H(X)$ in (1.38), it is clear that the entropy of any $DU(k, a, h)$ distribution depends only on $k$. If $X \sim DU(k)$, then

$$H(X) = \log k. \tag{2.25}$$

It is of interest to mention here that among all the random variables taking on at most $k$ values, any random variable taking on distinct values $x_1, x_2, \ldots, x_k$ with probabilities $p_j = 1/k$ ($j = 1, 2, \ldots, k$) has $\log k$ as the maximum possible value of its entropy.
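A small numerical illustration of (2.25) and of the maximum-entropy property follows. It is only a sketch: the natural logarithm is assumed as the base (definition (1.38) is not reproduced here), and the non-uniform probability vector is an arbitrary example.

```python
import math

def entropy(probs):
    """Entropy -sum p*log(p) with natural logarithm; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

k = 6
uniform = [1 / k] * k
skewed = [0.5, 0.2, 0.1, 0.1, 0.05, 0.05]   # arbitrary non-uniform example on 6 points

print(entropy(uniform), math.log(k))   # the two values agree
print(entropy(skewed) < math.log(k))   # a non-uniform law has smaller entropy
```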

2.8

Relationships with Other Distributions

The discrete uniform distribution forms the basis for the derivation of many distributions. Here, we present some key connections of the discrete uniform distribution to some other distributions:

(a) We have already mentioned that the $DU(1)$ distribution takes on the value 0 with probability 1 and, in fact, coincides with the degenerate distribution, which is discussed in Chapter 3.


(b) One more special case of discrete uniform distributions is the $DU(2)$ distribution. If $X \sim DU(2)$, then it takes on values 0 and 1 with equal probability (of $\frac{1}{2}$). This belongs to the Bernoulli type of distribution discussed in Chapter 4.

(c) Let us consider a sequence of random variables $X_n \sim DU(n)$, $n = 1, 2, \ldots$. Let $Y_n = X_n/n$. One can see that for any $n = 1, 2, \ldots$,

$$Y_n \sim DU\left(n, 0, \tfrac{1}{n}\right)$$

and it takes on values $0, 1/n, 2/n, \ldots, (n-1)/n$ with equal probabilities $1/n$. Let us try to find the limiting distribution for the sequence $Y_1, Y_2, \ldots$. The simplest way is to consider the characteristic function. The characteristic function of $Y_n$ is given by

$$g_{Y_n}(t) = E\exp(itY_n) = E\exp\{i(t/n)X_n\} = f_n(t/n),$$

where $f_n(t)$ is the characteristic function of the $DU(n)$ distribution. Using (2.22), we readily find that

$$g_{Y_n}(t) = \frac{e^{it} - 1}{n\left(e^{it/n} - 1\right)}. \tag{2.26}$$

It is not difficult to see now that for any fixed $t$,

$$g_{Y_n}(t) \to g(t) \quad \text{as } n \to \infty,$$

where

$$g(t) = \frac{e^{it} - 1}{it} = \int_0^1 e^{itx}\,dx. \tag{2.27}$$

The RHS of (2.27) shows that $g(t)$ is the characteristic function of a continuous distribution with probability density function $p(x)$, which is equal to 1 if $0 < x < 1$ and equals 0 otherwise. The distribution with the given pdf is said to be uniform $U(0, 1)$, which is discussed in detail in Chapter 11. From the convergence of the corresponding characteristic functions, we can immediately conclude that the uniform $U(0, 1)$ distribution is the limit for the sequence of random variables $Y_n = X_n/n$, where $X_n \sim DU(n)$. Thus, we have constructed an important "bridge" between continuous uniform and discrete uniform distributions.
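The convergence of the characteristic functions of $Y_n = X_n/n$ to that of $U(0, 1)$ can be observed numerically. The sketch below computes the characteristic function of $Y_n$ by direct summation over its $n$ equally likely values; the function names and the test point $t = 2.5$ are illustrative choices.

```python
import cmath

def cf_scaled_du(n, t):
    """Characteristic function of Y_n = X_n / n with X_n ~ DU(n), by direct summation."""
    return sum(cmath.exp(1j * t * m / n) for m in range(n)) / n

def cf_uniform01(t):
    """Characteristic function of U(0, 1): (e^{it} - 1) / (it), equal to 1 at t = 0."""
    return 1.0 if t == 0 else (cmath.exp(1j * t) - 1) / (1j * t)

t = 2.5
for n in (1, 10, 100, 1000):
    # The absolute difference shrinks roughly like 1/n
    print(n, abs(cf_scaled_du(n, t) - cf_uniform01(t)))
```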


CHAPTER 3

DEGENERATE DISTRIBUTION

3.1

Introduction

Consider the $DU(k)$ distributed random variable $X$ when $k = 1$. One can see that $X$ takes on only one value $x_0 = 0$ with probability 1. It gives an example of the degenerate distribution. In the general case, the degenerate random variable takes on only one value, say $c$, with probability 1. In the sequel, $X \sim D(c)$ denotes a random variable having a degenerate distribution concentrated at the only point $c$, $-\infty < c < \infty$. Degenerate distributions assume a special place in distribution theory. They can be included as a special case of many families of probability distributions, such as normal, geometric, Poisson, and binomial. For any sequence of random variables $X_1, X_2, \ldots$ and any arbitrary $c$, we can always choose sequences of normalizing constants $\alpha_n$ and $\beta_n$ such that the limiting distribution of random variables

$$Y_n = \frac{X_n - \beta_n}{\alpha_n}$$

will become the degenerate $D(c)$ distribution.

Degenerate distributions play a very important role in queueing theory. Kendall (1953), in his classification of queueing systems, has even reserved a special letter to denote systems with constant interarrival times or constant service times of customers. For example, M/D/3 means that a queueing system has three servers, all interarrival times are exponentially distributed, and the service of each customer requires a fixed nonrandom time. It should be mentioned that practically only degenerate (D), exponential (M), and Erlang (E) have their own letters in Kendall's classification of queueing systems.

3.2

Moments

Degenerate distributions have all their moments finite. Let $X \sim D(c)$ in the following discussion.

Moments about zero:

$$\alpha_n = EX^n = c^n, \quad n = 1, 2, \ldots. \tag{3.1}$$

In particular,

$$\alpha_1 = EX = c \tag{3.2}$$

and

$$\alpha_2 = EX^2 = c^2. \tag{3.3}$$

The variance is

$$\beta_2 = \operatorname{Var} X = \alpha_2 - \alpha_1^2 = 0. \tag{3.4}$$

Note that (3.4) characterizes degenerate distributions, meaning that they are the only distributions having zero variance.

Characteristic function:

$$f_X(t) = Ee^{itX} = e^{itc}, \tag{3.5}$$

and, in particular, $f_X(t) = 1$ if $c = 0$.

3.3

Independence

It turns out that any random variable $X$ having a degenerate $D(c)$ distribution is independent of any arbitrarily chosen random variable $Y$. To see this, we must check that for any $x$ and $y$,

$$P\{X \le x, Y \le y\} = P\{X \le x\}\,P\{Y \le y\}. \tag{3.6}$$

Equality (3.6) is evidently true if $x < c$, in which case

$$P\{X \le x, Y \le y\} = 0 \quad \text{and} \quad P\{X \le x\} = 0.$$

If $x \ge c$, then

$$P\{X \le x\} = 1 \quad \text{and} \quad P\{X \le x, Y \le y\} = P\{Y \le y\},$$

and we see that (3.6) is once again true.

Exercise 3.1 Let $Y = X$. Show that if $X$ and $Y$ are independent, then $X$ has a degenerate distribution.

3.4

Convolution

It is clear that the convolution of two degenerate distributions is degenerate; that is, if $X \sim D(c_1)$ and $Y \sim D(c_2)$, then $X + Y \sim D(c_1 + c_2)$. Note also that if $X \sim D(c)$ and $Y$ is an arbitrary random variable, then $X + Y$ belongs to the same type of distribution as $Y$.

Exercise 3.2 Let $X_1$ and $X_2$ be independent random variables having a common distribution. Prove that the equality

$$X_1 + X_2 \stackrel{d}{=} X_1 \tag{3.7}$$

implies that $X_1 \sim D(0)$.

Remark 3.1 We see that (3.7) characterizes the degenerate distribution concentrated at zero. If we take $X_1 + c$ instead of $X_1$ on the RHS of (3.7), we get a characterization of the $D(c)$ distribution. Moreover, if $X_1, X_2, \ldots$ are independent and identically distributed random variables, then the equality

$$X_1 + \cdots + X_\ell \stackrel{d}{=} X_1 + \cdots + X_k + c, \quad 1 \le k < \ell,$$

gives a characterization of the degenerate $D\left(\frac{c}{\ell - k}\right)$ distribution.

3.5

Decomposition

There is no doubt that any degenerate random variable can be presented only as a sum of degenerate random variables, but even this evident statement needs to be proved.

Exercise 3.3 Let $X$ and $Y$ be independent random variables, and let $X + Y$ have a degenerate distribution. Show then that both $X$ and $Y$ are degenerate.

It is interesting to observe that even such a simple distribution as a degenerate distribution possesses its own special properties and also assumes an important role among a very large family of distributions.


CHAPTER 4

BERNOULLI DISTRIBUTION

4.1

Introduction

The next simplest case after the degenerate random variable is one that takes on two values, say $x_1 < x_2$, with nonzero probabilities $p_1$ and $p_2$, respectively. The discrete uniform $DU(2)$ distribution is exactly double-valued. Let us recall that if $X \sim DU(2)$, then

$$P\{X = 0\} = P\{X = 1\} = \frac{1}{2}. \tag{4.1}$$

It is easy to give an example of such a random variable. Let $X$ be the number of heads that have appeared after a single toss of a balanced coin. Of course, $X$ can be 0 or 1 and it satisfies (4.1). Similarly, unbalanced coins result in distributions with

$$P\{X = 1\} = 1 - P\{X = 0\} = p, \tag{4.2}$$

where $0 < p < 1$. Indeed, a false coin with two tails or two heads can serve as a model with $p = 0$ or $p = 1$ in (4.2), but in these situations $X$ has degenerate $D(0)$ or $D(1)$ distributions. The distribution defined in (4.2) is known as the Bernoulli distribution.

4.2

Notations

If $X$ is defined by (4.2), we denote it by

$$X \sim Be(p).$$

Any random variable $Y$ taking on two values $x_0 < x_1$ with probabilities

$$P\{Y = x_1\} = 1 - P\{Y = x_0\} = p$$

can clearly be represented as

$$Y = (x_1 - x_0)X + x_0, \tag{4.3}$$

where $X \sim Be(p)$. This means that random variables defined by (4.2) and (4.3) have the same type of distribution. Hence, $X$ can be called a random variable of the Bernoulli $Be(p)$ type. Moreover, any random variable taking on two values with positive probabilities belongs to one of the Bernoulli types. In what follows, we will deal with distributions satisfying (4.2).

4.3

Moments

Let $X \sim Be(p)$, $0 < p < 1$. Since $X$ has a finite support, we can guarantee the existence of all its moments.

Moments about zero:

$$\alpha_n = EX^n = p, \quad n = 1, 2, \ldots. \tag{4.4}$$

Exercise 4.1 Let $X$ have a nondegenerate distribution and

$$EX^2 = EX^3 = EX^4.$$

Show then that there exists a $p$, $0 < p < 1$, such that $X \sim Be(p)$.

Variance:

$$\beta_2 = \operatorname{Var} X = \alpha_2 - \alpha_1^2 = p(1 - p). \tag{4.5}$$

It is clear that $0 < \beta_2 \le \frac{1}{4}$, and $\beta_2$ attains its maximum when $p = \frac{1}{2}$.

Central moments: From (4.4), we readily find that

$$\beta_n = p(1 - p)\left\{(1 - p)^{n-1} + (-1)^n p^{n-1}\right\} \quad \text{for } n \ge 2. \tag{4.6}$$

The expression of the variance in (4.5) follows from (4.6) if we set $n = 2$.

Measures of skewness and kurtosis: From (4.6), we find Pearson's coefficient of skewness as

$$\gamma_1 = \frac{\beta_3}{\beta_2^{3/2}} = \frac{1 - 2p}{\{p(1 - p)\}^{1/2}}.$$

This expression for $\gamma_1$ readily reveals that the distribution is negatively skewed when $p > \frac{1}{2}$ and is positively skewed when $p < \frac{1}{2}$. Also, $\gamma_1 = 0$ only when $p = \frac{1}{2}$ (in this case, the distribution is also symmetric).

Similarly, we find Pearson's coefficient of kurtosis as

$$\gamma_2 = \frac{\beta_4}{\beta_2^2} = \frac{1}{p(1 - p)} - 3.$$

This expression for $\gamma_2$ [due to the fact noted earlier that $\beta_2 = p(1 - p) \le \frac{1}{4}$] readily implies that $\gamma_2 \ge 1$ and that $\gamma_2 = 1$ only when $p = \frac{1}{2}$. Thus, in the case of the $Be(\frac{1}{2})$ distribution, we do see that $\gamma_2 = \gamma_1^2 + 1$, which means that the inequality presented in Section 1.4 cannot be improved in general.

Entropy: It is easy to see that

$$H(X) = -p\log p - (1 - p)\log(1 - p). \tag{4.7}$$

Indeed, the maximal value of $H(X)$ is attained when $p = \frac{1}{2}$, in which case it equals $\log 2$ (that is, 1 when logarithms are taken to base 2).

Characteristic function: For $X \sim Be(p)$, $0 < p < 1$, the characteristic function is of the form

$$f_X(t) = (1 - p) + pe^{it}. \tag{4.8}$$

As a special case, we can also find the characteristic function of a random variable $Y$ taking on values $-1$ and $1$ with equal probabilities, since $Y$ can be expressed as $Y = 2X - 1$, where $X \sim Be(\frac{1}{2})$. It is easy to see that $f_Y(t) = \cos t$.
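The skewness and kurtosis formulas for $Be(p)$ can be checked directly from the two-point distribution. The following sketch computes $E(X - p)^n$ from the definition rather than from (4.6); the function names are illustrative. A short calculation shows that the printed difference $\gamma_2 - (\gamma_1^2 + 1)$ is zero for every $p$, not only for $p = \frac{1}{2}$.

```python
import math

def bernoulli_central_moment(p, n):
    """E(X - p)^n for X ~ Be(p), computed directly from the two-point distribution."""
    return p * (1 - p) ** n + (1 - p) * (-p) ** n

def skewness(p):
    return bernoulli_central_moment(p, 3) / bernoulli_central_moment(p, 2) ** 1.5

def kurtosis(p):
    return bernoulli_central_moment(p, 4) / bernoulli_central_moment(p, 2) ** 2

for p in (0.1, 0.5, 0.8):
    g1, g2 = skewness(p), kurtosis(p)
    # Columns: p, skewness, kurtosis, and g2 - (g1^2 + 1)
    print(p, g1, g2, g2 - (g1 ** 2 + 1))
```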

4.4

Convolutions

Let $X_1, X_2, \ldots$ be independent and identically distributed $Be(p)$ random variables, and let $Y_n = X_1 + \cdots + X_n$. Many methods are available to prove that

$$P\{Y_n = m\} = \binom{n}{m} p^m (1 - p)^{n-m}, \quad m = 0, 1, \ldots, n. \tag{4.9}$$

We will use here the characteristic function for this purpose. Let $g_n, f_1, \ldots, f_n$ be the characteristic functions of $Y_n, X_1, \ldots, X_n$, respectively. From (4.8), we have

$$f_k(t) = 1 - p + pe^{it}, \quad k = 1, 2, \ldots, n.$$

Then, we get

$$g_n(t) = Ee^{itY_n} = \prod_{k=1}^{n} f_k(t) = \left(1 - p + pe^{it}\right)^n = \sum_{m=0}^{n} \binom{n}{m} p^m (1 - p)^{n-m} e^{itm}. \tag{4.10}$$

One can see that the sum on the RHS of (4.10) coincides with the characteristic function of a discrete random variable taking on values $m$ ($m = 0, 1, \ldots, n$) with probabilities

$$\binom{n}{m} p^m (1 - p)^{n-m}.$$

Thus, the probability distribution of $Y_n$ is given by (4.9). This distribution, called the binomial distribution, is discussed in detail in Chapter 5. Further, from the argument in Section 2.6 (a sum of two independent nondegenerate random variables takes on at least three values), we already know that any $X \sim Be(p)$ is indecomposable.
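Relation (4.9) can also be confirmed by repeated discrete convolution of the Bernoulli mass function, which is an alternative to the characteristic function argument used above. This is a minimal sketch with exact rational arithmetic; the helper names are illustrative.

```python
from fractions import Fraction
from math import comb

def convolve(pmf_a, pmf_b):
    """Mass function of the sum of two independent variables on 0, 1, 2, ... (lists indexed by value)."""
    out = [Fraction(0)] * (len(pmf_a) + len(pmf_b) - 1)
    for i, a in enumerate(pmf_a):
        for j, b in enumerate(pmf_b):
            out[i + j] += a * b
    return out

def sum_of_bernoullis(n, p):
    pmf = [Fraction(1)]                  # empty sum: degenerate at 0
    for _ in range(n):
        pmf = convolve(pmf, [1 - p, p])  # add one more Be(p) term
    return pmf

def binomial_pmf(n, p):
    return [comb(n, m) * p**m * (1 - p)**(n - m) for m in range(n + 1)]

p = Fraction(1, 3)
print(sum_of_bernoullis(5, p) == binomial_pmf(5, p))
```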

4.5

Maximal Values

We have seen above that sums of Bernoulli random variables do not have the same type of distribution as their summands, but Bernoulli distributions are stable with respect to another operation.

Exercise 4.2 Let $X_k \sim Be(p_k)$, $k = 1, 2, \ldots, n$, be independent random variables, and let

$$M_n = \max\{X_1, \ldots, X_n\}.$$

Show that $M_n \sim Be(p)$, where $p = 1 - \prod_{k=1}^{n}(1 - p_k)$.

Quite often, Bernoulli random variables appear as random indicators of different events.


Example 4.1 Let us consider a random sequence $a_1, a_2, \ldots, a_n$ of length $n$, which consists of zeros and ones. We suppose that $a_1, a_2, \ldots, a_n$ are independent random variables taking on values 0 and 1 with probabilities $r$ and $1 - r$, respectively. We say that a peak is present at point $k$ ($k = 2, 3, \ldots, n - 1$) if $a_{k-1} = a_{k+1} = 0$ and $a_k = 1$. Let $N_n$ be the total number of peaks present in the sequence $a_1, a_2, \ldots, a_n$. What is the expected value of $N_n$? To find $EN_n$, let us introduce the events

$$A_k = \{a_{k-1} = 0,\ a_k = 1,\ a_{k+1} = 0\}$$

and random indicators

$$X_k = 1\{A_k\}, \quad k = 2, 3, \ldots, n - 1.$$

Note that $X_k = 1$ if $A_k$ happens, and $X_k = 0$ otherwise. In this case,

$$P\{X_k = 1\} = 1 - P\{X_k = 0\} = P\{A_k\} = P\{a_{k-1} = 0,\ a_k = 1,\ a_{k+1} = 0\} = (1 - r)r^2$$

and

$$X_k \sim Be(p), \quad k = 2, 3, \ldots, n - 1,$$

where $p = (1 - r)r^2$. Now, it is easy to see that

$$EX_k = (1 - r)r^2, \quad k = 2, 3, \ldots, n - 1,$$

and

$$EN_n = E(X_2 + \cdots + X_{n-1}) = (n - 2)(1 - r)r^2.$$
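The expected number of peaks in Example 4.1 can be verified by brute force for a small $n$. The sketch below enumerates all $2^n$ sequences with exact rational weights; the values $n = 8$ and $r = 2/5$ are arbitrary illustrative choices, and Python's zero-based indexing shifts the positions $k = 2, \ldots, n - 1$ to `1 .. n-2`.

```python
from fractions import Fraction
from itertools import product

def expected_peaks(n, r):
    """E N_n by enumeration of all 0/1 sequences, with P(a_i = 0) = r and P(a_i = 1) = 1 - r."""
    total = Fraction(0)
    for seq in product((0, 1), repeat=n):
        prob = Fraction(1)
        for a in seq:
            prob *= r if a == 0 else 1 - r
        # A peak at position k means the pattern 0, 1, 0 centred at k
        peaks = sum(1 for k in range(1, n - 1)
                    if seq[k - 1] == 0 and seq[k] == 1 and seq[k + 1] == 0)
        total += prob * peaks
    return total

n, r = 8, Fraction(2, 5)
print(expected_peaks(n, r), (n - 2) * (1 - r) * r**2)
```

Note that the indicators $X_k$ are not independent (neighbouring events $A_k$ overlap), yet the expectation is still additive, which is what the enumeration confirms.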

In addition to the classical Bernoulli distributed random variables, there is one more class of Bernoulli distributions which is often encountered. These involve random variables $Y_1, Y_2, \ldots$, which take on values $\pm 1$. Based on these random variables, the sums $S_n = Y_1 + \cdots + Y_n$ ($n = 1, 2, \ldots$) form different discrete random walks on the integer-valued lattice and result in some interesting probability problems.

4.6

Relationships with Other Distributions

(a) We have shown that convolutions of Bernoulli distributions give rise to the binomial distribution, which is discussed in detail in Chapter 5.

(b) Let $X_1, X_2, \ldots$ be independent $Be(p)$, $0 < p < 1$, random variables. Introduce a new random variable $N$ as

$$N = \min\{j : X_{j+1} = 0\};$$

that is, $N$ is simply the number of 1's in the sequence $X_1, X_2, \ldots$ that precede the first zero. It is easy to see that $N$ can take on values $0, 1, 2, \ldots$, and its probability distribution is

$$P\{N = n\} = P\{X_1 = 1, X_2 = 1, \ldots, X_n = 1, X_{n+1} = 0\} = P\{X_1 = 1\}P\{X_2 = 1\}\cdots P\{X_n = 1\}P\{X_{n+1} = 0\} = (1 - p)p^n, \quad n = 0, 1, \ldots.$$

This distribution, called a geometric distribution, is discussed in detail in Chapter 6.

CHAPTER 5

BINOMIAL DISTRIBUTION

5.1

Introduction

As shown in Chapter 4, convolutions of Bernoulli distributions give rise to a new distribution that takes on a fixed set of integer values $0, 1, \ldots, n$ with probabilities

$$p_m = P\{X = m\} = \binom{n}{m} p^m (1 - p)^{n-m}, \quad m = 0, 1, \ldots, n, \tag{5.1}$$

where $0 \le p \le 1$ and $n = 1, 2, \ldots$. The parameter $p$ in (5.1) can be equal to 0 or 1, but in these situations $X$ has degenerate distributions, $D(0)$ and $D(n)$, respectively, which are naturally special cases of these distributions. The probability distribution in (5.1) is called a binomial distribution.

5.2

Notations

Let $X$ have the distribution in (5.1). Then, we say that $X$ is a binomially distributed random variable, and denote it by

$$X \sim B(n, p).$$

We know that linear transformations

$$Y = a + hX, \quad -\infty < a < \infty, \quad h > 0,$$

have the same type of distributions and, hence, we need to study only the standard distribution of the given type. If $X$ satisfies (5.1) and $Y = a + hX$, then $Y$ takes on values $a, a + h, \ldots, a + nh$ and

$$P\{Y = a + mh\} = \binom{n}{m} p^m (1 - p)^{n-m}, \quad m = 0, 1, \ldots, n. \tag{5.2}$$

We say that $Y$ belongs to the binomial type of distribution, and denote it by

$$Y \sim B(n, p, a, h),$$

$a$ and $h > 0$ being location and scale parameters, respectively.

5.3

Useful Representation

As we know from Chapter 4, binomial random variables can be expressed as sums of independent Bernoulli random variables. Let $Z_1, Z_2, \ldots$ be independent Bernoulli $Be(p)$ random variables and $X \sim B(n, p)$. Then the following equality holds for any $n = 1, 2, \ldots$:

$$X \stackrel{d}{=} Z_1 + Z_2 + \cdots + Z_n. \tag{5.3}$$

Due to (5.3), we can easily obtain the generating function and characteristic function of binomial random variables from the corresponding expressions of Bernoulli random variables.

5.4

Generating Function and Characteristic Function

Let $X \sim B(n, p)$. It follows from (5.3) and the independence of the $Z_i$'s that the generating function of $X$ is

$$P_n(s) = Es^X = Es^{Z_1 + \cdots + Z_n} = Es^{Z_1}\,Es^{Z_2}\cdots Es^{Z_n} = (P_Z(s))^n = (1 - p + ps)^n, \tag{5.4}$$

where $P_Z(s) = 1 - p + ps$ is the common generating function of the Bernoulli random variables $Z_1, Z_2, \ldots, Z_n$. Let $f_X(t)$ be the characteristic function of $X$. From (5.4), we readily obtain

$$f_X(t) = P_n(e^{it}) = \left(1 - p + pe^{it}\right)^n. \tag{5.5}$$

As a consequence, if $Y \sim B(n, p, a, h)$, then $Y \stackrel{d}{=} a + hX$ and

$$f_Y(t) = e^{iat}\left(1 - p + pe^{iht}\right)^n. \tag{5.6}$$

5.5

Moments

Equality (5.3) immediately yields

$$EX = E(Z_1 + \cdots + Z_n) = np \tag{5.7}$$

as well as

$$\operatorname{Var} X = \operatorname{Var}(Z_1 + \cdots + Z_n) = np(1 - p). \tag{5.8}$$

Other moments of $X$ can be found by differentiating the generating function in (5.4).

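Factorial moments of positive order:

$$\mu_k = EX(X - 1)\cdots(X - k + 1) = P_n^{(k)}(1) = n(n - 1)\cdots(n - k + 1)\,p^k, \quad k = 1, 2, \ldots. \tag{5.9}$$

In particular, we have

$$\mu_1 = np, \tag{5.10}$$

$$\mu_2 = n(n - 1)p^2, \tag{5.11}$$

$$\mu_3 = n(n - 1)(n - 2)p^3, \tag{5.12}$$

and

$$\mu_n = n!\,p^n. \tag{5.13}$$

Note that

$$\mu_{n+1} = \mu_{n+2} = \cdots = 0. \tag{5.14}$$

Factorial moments of negative order: For any $k = 1, 2, \ldots$,

$$\mu_{-k} = E\left\{\frac{1}{(X + 1)(X + 2)\cdots(X + k)}\right\}.$$

Exercise 5.1 Show that we can use the equality

$$\mu_{-k} = P_n^{(-k)}(1), \quad k = 1, 2, \ldots,$$

under the assumption that we consider the derivative of a negative order $-k$ as the following integral:

$$P_n^{(-k)}(s) = \int_0^s \int_0^{s_k} \cdots \int_0^{s_2} P_n(s_1)\,ds_1 \cdots ds_{k-1}\,ds_k. \tag{5.15}$$

Now we can use (5.15) to find $\mu_{-1}$ and $\mu_{-2}$ as follows:

$$\mu_{-1} = E\left(\frac{1}{X + 1}\right) = \int_0^1 (1 - p + ps)^n\,ds = \frac{1 - (1 - p)^{n+1}}{(n + 1)p} \tag{5.16}$$

and

$$\mu_{-2} = E\left(\frac{1}{(X + 1)(X + 2)}\right) = \int_0^1 \int_0^t (1 - p + ps)^n\,ds\,dt = \frac{1 - (1 - p)^{n+1}\{1 + (n + 1)p\}}{(n + 1)(n + 2)p^2}. \tag{5.17}$$

Moments about zero: Relations (5.10)-(5.13) readily imply that

$$\alpha_1 = EX = \mu_1 = np, \tag{5.18}$$

$$\alpha_2 = EX^2 = \mu_2 + \mu_1 = n(n - 1)p^2 + np, \tag{5.19}$$

$$\alpha_3 = EX^3 = \mu_3 + 3\mu_2 + \mu_1 = n(n - 1)(n - 2)p^3 + 3n(n - 1)p^2 + np, \tag{5.20}$$

and

$$\alpha_4 = EX^4 = \mu_4 + 6\mu_3 + 7\mu_2 + \mu_1 = n(n - 1)(n - 2)(n - 3)p^4 + 6n(n - 1)(n - 2)p^3 + 7n(n - 1)p^2 + np. \tag{5.21}$$

Central moments: From (5.18) and (5.19), we obtain [see also (5.8)] the variance as

$$\operatorname{Var} X = \beta_2 = np(1 - p). \tag{5.22}$$

Similarly, we find from (5.20) and (5.21) the third and fourth central moments as

$$\beta_3 = \alpha_3 - 3\alpha_2\alpha_1 + 2\alpha_1^3 = np(1 - p)(1 - 2p) \tag{5.23}$$

and

$$\beta_4 = \alpha_4 - 4\alpha_3\alpha_1 + 6\alpha_2\alpha_1^2 - 3\alpha_1^4 = 3\{np(1 - p)\}^2 + np(1 - p)\{1 - 6p(1 - p)\}. \tag{5.24}$$

Shape characteristics: From (5.22)-(5.24), we find Pearson's coefficients of skewness and kurtosis as

$$\gamma_1 = \frac{\beta_3}{\beta_2^{3/2}} = \frac{1 - 2p}{\sqrt{np(1 - p)}} \tag{5.25}$$

and

$$\gamma_2 = \frac{\beta_4}{\beta_2^2} = 3 + \frac{1 - 6p(1 - p)}{np(1 - p)}, \tag{5.26}$$

respectively. From (5.25), it is clear that the binomial distribution is negatively skewed when $p > \frac{1}{2}$ and is positively skewed when $p < \frac{1}{2}$. The coefficient of skewness is 0 when $p = \frac{1}{2}$ (in this case, the distribution is also symmetric). Equation (5.25) also reveals that the absolute skewness decreases as $n$ increases. Furthermore, we note from (5.26) that the binomial distribution may be leptokurtic, mesokurtic, or platykurtic, depending on the value of $p$. However, it is clear that $\gamma_1$ and $\gamma_2$ tend to 0 and 3, respectively, as $n$ tends to $\infty$ (which are, as mentioned in Section 1.4, the coefficients of skewness and kurtosis of a normal distribution). Plots of the binomial mass function presented in Figures 5.1 and 5.2 reflect these properties.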
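The closed forms for the binomial skewness and kurtosis can be compared with values computed numerically from the mass function. This is a sketch only; the parameter values $n = 20$, $p = 0.3$ and the helper names are illustrative, and the comparison uses the standard formulas $\gamma_1 = (1 - 2p)/\sqrt{np(1-p)}$ and $\gamma_2 = 3 + \{1 - 6p(1-p)\}/\{np(1-p)\}$.

```python
from math import comb, sqrt

def binomial_pmf(n, p):
    return [comb(n, m) * p**m * (1 - p)**(n - m) for m in range(n + 1)]

def central_moment(pmf, order):
    mean = sum(m * q for m, q in enumerate(pmf))
    return sum((m - mean) ** order * q for m, q in enumerate(pmf))

def shape(n, p):
    """Skewness and kurtosis of B(n, p), computed numerically from the mass function."""
    pmf = binomial_pmf(n, p)
    b2, b3, b4 = (central_moment(pmf, k) for k in (2, 3, 4))
    return b3 / b2**1.5, b4 / b2**2

n, p = 20, 0.3
g1, g2 = shape(n, p)
print(g1, (1 - 2 * p) / sqrt(n * p * (1 - p)))
print(g2, 3 + (1 - 6 * p * (1 - p)) / (n * p * (1 - p)))
```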

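Exercise 5.2 Show that the binomial $B(n, p)$ distribution is leptokurtic for

$$p < \frac{1}{2}\left(1 - \frac{1}{\sqrt{3}}\right) \quad \text{or} \quad p > \frac{1}{2}\left(1 + \frac{1}{\sqrt{3}}\right),$$

platykurtic for

$$\frac{1}{2}\left(1 - \frac{1}{\sqrt{3}}\right) < p < \frac{1}{2}\left(1 + \frac{1}{\sqrt{3}}\right),$$

and mesokurtic for $p = \frac{1}{2}\left(1 \pm \frac{1}{\sqrt{3}}\right)$.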

5.6

Maximum Probabilities

Among all the binomial $B(n, p)$ probabilities,

$$p_m = \binom{n}{m} p^m (1 - p)^{n-m}, \quad m = 0, 1, \ldots, n,$$

it will be of interest to find the maximum values. It appears that there are two different cases:

(a) If $m^* = (n + 1)p$ is an integer, then

$$p_{m^*-1} = p_{m^*} = \max_{0 \le m \le n} p_m.$$

[...] $> 0$, of the geometric random variable. In this case we denote it by $Y \sim G(p, a, h)$, which is a random variable concentrated at points $a, a + h, a + 2h, \ldots$ with probabilities

$$P\{Y = a + nh\} = (1 - p)p^n, \quad n = 0, 1, \ldots. \tag{6.2}$$

6.3

Tail Probabilities

If $X \sim G(p)$, then the tail probabilities of $X$ are very simple. In fact,

$$P\{X \ge n\} = p^n, \quad n = 0, 1, \ldots. \tag{6.3}$$

Formula (6.3) can be used to solve the following problem.

Exercise 6.1 Let $X_k \sim G(p_k)$, $k = 1, 2, \ldots, n$, be a sequence of independent random variables, and

$$m_n = \min_{1 \le k \le n} X_k.$$

[...]tions, it will be shown that any geometric distribution is infinitely divisible.

6.8

Entropy

For geometric distributions, one can easily derive the entropy.

Exercise 6.3 Show that the entropy of $X \sim G(p)$, $0 < p < 1$, is given by

$$H(X) = -\ln(1 - p) - \frac{p}{1 - p}\ln p; \tag{6.37}$$

in particular, $H(X) = 2\ln 2$ if $p = \frac{1}{2}$.

6.9

Conditional Probabilities

In Chapter 5 we derived the hypergeometric distribution by considering the conditional distribution of $X_1$, given that $X_1 + X_2$ is fixed. We will now try to find the corresponding conditional distribution in the case when we have geometric distributions. Let $X_1$ and $X_2$ be independent and identically distributed $G(p)$ random variables, and $Y = X_1 + X_2$. In our case, $Y$ takes on values $0, 1, 2, \ldots$ and [see (6.30) when $n = 2$]

$$P\{Y = r\} = (r + 1)(1 - p)^2 p^r, \quad r = 0, 1, 2, \ldots.$$

We can find the required conditional probabilities as

$$P\{X_1 = t \mid X_1 + X_2 = r\} = \frac{P\{X_1 = t, X_2 = r - t\}}{P\{X_1 + X_2 = r\}} = \frac{(1 - p)p^t\,(1 - p)p^{r-t}}{(r + 1)(1 - p)^2 p^r} = \frac{1}{r + 1}, \quad t = 0, 1, \ldots, r. \tag{6.38}$$

It follows from (6.38) that the conditional distribution of $X_1$, given that $X_1 + X_2 = r$, becomes the discrete uniform $DU(r + 1)$ distribution. Geometric distributions possess one more interesting property concerning conditional probabilities.
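The conditional uniformity in (6.38) can be confirmed with exact arithmetic. The following is a minimal sketch, assuming the mass function $P\{X = n\} = (1 - p)p^n$, $n = 0, 1, \ldots$, for $G(p)$; the value $p = 3/10$ and the function names are illustrative.

```python
from fractions import Fraction

def geometric_pmf(p, n):
    """P{X = n} = (1 - p) p^n, n = 0, 1, ..."""
    return (1 - p) * p**n

def conditional_pmf(p, r):
    """P{X1 = t | X1 + X2 = r} for t = 0..r, with X1 and X2 independent G(p)."""
    joint = [geometric_pmf(p, t) * geometric_pmf(p, r - t) for t in range(r + 1)]
    total = sum(joint)
    return [j / total for j in joint]

p = Fraction(3, 10)
print(conditional_pmf(p, 4))   # five equal probabilities, each 1/5
```

The result does not depend on $p$, which is the point of (6.38): conditioning on the sum removes the parameter.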

Exercise 6.4 Let $X \sim G(p)$, $0 < p < 1$. Show that

$$P\{X \ge n + m \mid X \ge m\} = P\{X \ge n\} \tag{6.39}$$

holds for any $n = 0, 1, \ldots$ and $m = 0, 1, \ldots$.

Remark 6.1 Imagine that in a sequence of independent trials, we have $m$ "successes" in a row and no "failures". It turns out that the additional number of "successes" until the first "failure" that we shall observe now has the same geometric $G(p)$ distribution. Among all discrete distributions, the geometric distribution is the only one which possesses this lack of memory property in (6.39).

Remark 6.2 Instead of defining a geometric random variable as the number of "successes" until the appearance of the first "failure" (denoted by $X$), we could define it alternatively as the number of "trials" required for the first "failure" (denoted by $Z$). It is clear that $Z = X + 1$ so that we readily have the mass function of $Z$ as [see (6.1)]

$$P\{Z = n\} = (1 - p)p^{n-1}, \quad n = 1, 2, \ldots. \tag{6.40}$$

From (6.4), we then have the generating function of $Z$ as

$$P_Z(s) = Es^Z = Es^{X+1} = sEs^X = \frac{(1 - p)s}{1 - ps} \quad \text{if } |s| < \frac{1}{p}. \tag{6.41}$$

Many authors, in fact, take these as the "standard form" of the geometric distribution [instead of (6.1) and (6.4)].

6.10

Geometric Distribution of Order k

Consider a sequence of Bernoulli($p$) trials. Let $Z_k$ denote the number of "trials" required for "$k$ consecutive failures" to appear for the first time. Then, it can be shown that the generating function of $Z_k$ is given by

$$P_{Z_k}(s) = Es^{Z_k} = \frac{(1 - p)^k s^k (1 - s + ps)}{1 - s + p(1 - p)^k s^{k+1}}. \tag{6.42}$$

The corresponding distribution has been named the geometric distribution of order $k$. Clearly, the generating function in (6.42) reduces to that of the geometric distribution in (6.41) when $k = 1$. For a review of this and many other "run-related" distributions, one may refer to the recent book by Balakrishnan and Koutras (2002).

Exercise 6.5 Show that (6.42) is indeed the generating function of $Z_k$.

Exercise 6.6 From the generating function of $Z_k$ in (6.42), derive the mean and variance of $Z_k$.

CHAPTER 7

NEGATIVE BINOMIAL DISTRIBUTION

7.1

Introduction

In Chapter 6 we discussed the convolution of $n$ geometric $G(p)$ distributions and derived a new distribution in (6.30) whose generating function has the form

$$P_n(s) = (1 - p)^n (1 - ps)^{-n}.$$

As we noted there, for any positive integer $n$, the corresponding random variable has a suitable interpretation in the scheme of independent trials as the distribution of the total number of "successes" until the $n$th "failure". For this interpretation, of course, $n$ has to be an integer, but it will be interesting to see what will happen if we take the binomial of arbitrary negative order as the generating function of some random variable. Specifically, let us consider the function

$$P_\alpha(s) = (1 - p)^\alpha (1 - ps)^{-\alpha}, \quad \alpha > 0, \tag{7.1}$$

which is a generating function in the case when all the coefficients $p_0, p_1, \ldots$ in the power expansion $\sum_{k=0}^{\infty} p_k s^k$ of the RHS of (7.1) are nonnegative and their sum equals 1. The second condition holds since $\sum_{k=0}^{\infty} p_k = P_\alpha(1) = 1$ for any $\alpha > 0$. The function in (7.1) has derivatives of all orders, and simple calculations enable us to find the coefficients $p_k$ in the expansion

$$P_\alpha(s) = \sum_{k=0}^{\infty} p_k s^k$$

as

$$p_k = \frac{P_\alpha^{(k)}(0)}{k!} = \frac{(1 - p)^\alpha (-p)^k (-\alpha)(-\alpha - 1)\cdots(-\alpha - k + 1)}{k!} = \frac{\alpha(\alpha + 1)\cdots(\alpha + k - 1)}{k!}(1 - p)^\alpha p^k = \binom{\alpha + k - 1}{k}(1 - p)^\alpha p^k, \tag{7.2}$$

and $p_k > 0$ for any $k = 0, 1, 2, \ldots$. Thus, we have obtained the following assertion: For any $\alpha > 0$, the function in (7.1) generates the distribution concentrated at points $0, 1, 2, \ldots$ with probabilities as in (7.2). It should be mentioned here that negative binomial distributions and some of their generalized forms have found important applications in actuarial analysis; see, for example, the book of Klugman, Panjer and Willmot (1998).

7.2

Notations

Let a random variable $X$ take on values $0, 1, \ldots$ with probabilities

$$p_k = P\{X = k\} = \binom{\alpha + k - 1}{k}(1 - p)^\alpha p^k, \quad \alpha > 0.$$

We say that $X$ has the negative binomial distribution with parameters $\alpha$ and $p$ ($\alpha > 0$, $0 < p < 1$), and denote it by

$$X \sim NB(\alpha, p).$$

Note that $NB(1, p) = G(p)$ and, hence, the geometric distribution is a particular case of the negative binomial distribution. Sometimes, the negative binomial distribution with an integer parameter $\alpha$ is called a Pascal distribution.

7.3

There is no necessity to calculate the generating function of this distribution. Unlike the earlier situations, the generating function was the primary object in this case, which then yielded the probabilities. Recall that if $X \sim NB(\alpha, p)$, then

$$P_X(s) = Es^X = \left(\frac{1 - p}{1 - ps}\right)^\alpha. \tag{7.3}$$

From (7.3), we immediately have the characteristic function of $X$ as

$$f_X(t) = P_X(e^{it}) = \left(\frac{1 - p}{1 - pe^{it}}\right)^\alpha. \tag{7.4}$$

7.4

Moments

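From the expression of the generating function in (7.3), we can readily find the factorial moments.

Factorial moments of positive order:

$$\mu_k = EX(X - 1)\cdots(X - k + 1) = P_X^{(k)}(1) = \alpha(\alpha + 1)\cdots(\alpha + k - 1)\left(\frac{p}{1 - p}\right)^k = \frac{\Gamma(\alpha + k)}{\Gamma(\alpha)}\left(\frac{p}{1 - p}\right)^k, \tag{7.5}$$

where

$$\Gamma(s) = \int_0^\infty e^{-x} x^{s-1}\,dx$$

is the complete gamma function. In particular, we have

$$\mu_1 = \frac{\alpha p}{1 - p}, \tag{7.6}$$

$$\mu_2 = \frac{\alpha(\alpha + 1)p^2}{(1 - p)^2}, \tag{7.7}$$

$$\mu_3 = \frac{\alpha(\alpha + 1)(\alpha + 2)p^3}{(1 - p)^3}, \tag{7.8}$$

$$\mu_4 = \frac{\alpha(\alpha + 1)(\alpha + 2)(\alpha + 3)p^4}{(1 - p)^4}. \tag{7.9}$$

Note that if $\alpha = 1$, then (7.5)-(7.9) reduce to the corresponding moments for the geometric distribution given in (6.6)-(6.10), respectively.

Factorial moments of negative order:

$$\mu_{-k} = E\left\{\frac{1}{(X + 1)(X + 2)\cdots(X + k)}\right\} = \int_0^1 \int_0^{s_k} \cdots \int_0^{s_2} P_X(s_1)\,ds_1 \cdots ds_{k-1}\,ds_k, \tag{7.10}$$

and in particular,

$$\mu_{-1} = E\left(\frac{1}{X + 1}\right) = \frac{(1 - p) - (1 - p)^\alpha}{(\alpha - 1)p} \tag{7.11}$$

if $\alpha > 0$ and $\alpha \ne 1$. The expression for $\mu_{-1}$ for the case $\alpha = 1$ is as given in (6.12).

Moments about zero:

$$\alpha_1 = EX = \mu_1 = \frac{\alpha p}{1 - p}, \tag{7.12}$$

$$\alpha_2 = EX^2 = \mu_2 + \mu_1 = \frac{\alpha p(1 + \alpha p)}{(1 - p)^2}, \tag{7.13}$$

$$\alpha_3 = EX^3 = \mu_3 + 3\mu_2 + \mu_1, \tag{7.14}$$

$$\alpha_4 = EX^4 = \mu_4 + 6\mu_3 + 7\mu_2 + \mu_1. \tag{7.15}$$

Central moments: From (7.12) and (7.13), we readily find the variance of $X$ as

$$\beta_2 = \operatorname{Var} X = \alpha_2 - \alpha_1^2 = \frac{\alpha p}{(1 - p)^2}. \tag{7.16}$$

Similarly, from (7.12)-(7.15), we find the third and fourth central moments of $X$ as

$$\beta_3 = \alpha_3 - 3\alpha_2\alpha_1 + 2\alpha_1^3 = \frac{\alpha p(1 + p)}{(1 - p)^3} \tag{7.17}$$

and

$$\beta_4 = \alpha_4 - 4\alpha_3\alpha_1 + 6\alpha_2\alpha_1^2 - 3\alpha_1^4 = \frac{\alpha p}{(1 - p)^4}\left\{1 + (3\alpha + 4)p + p^2\right\}, \tag{7.18}$$

respectively.

Shape characteristics: From (7.16)-(7.18), we find Pearson's coefficients of skewness and kurtosis as

$$\gamma_1 = \frac{\beta_3}{\beta_2^{3/2}} = \frac{1 + p}{\sqrt{\alpha p}} \tag{7.19}$$

and

$$\gamma_2 = \frac{\beta_4}{\beta_2^2} = \frac{1 + (3\alpha + 4)p + p^2}{\alpha p} = 3 + \frac{1 + 4p + p^2}{\alpha p}, \tag{7.20}$$

respectively. It is quite clear from (7.19) that the negative binomial distribution is positively skewed. Furthermore, we observe from (7.20) that the distribution is leptokurtic for all values of the parameters $\alpha$ and $p$. We also observe that as $\alpha$ tends to $\infty$, $\gamma_1$ and $\gamma_2$ in (7.19) and (7.20) tend to 0 and 3 (the values corresponding to the normal distribution), respectively. Plots of the negative binomial mass function presented in Figures 7.1-7.3 reveal these properties.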

7.5

Convolutions and Decompositions

Let $X_1 \sim NB(\alpha_1, p)$ and $X_2 \sim NB(\alpha_2, p)$ be two independent random variables, and $Y = X_1 + X_2$.

Exercise 7.1 Use (7.3) to establish that $Y$ has a negative binomial $NB(\alpha_1 + \alpha_2, p)$ distribution.

[Figure 7.1. Plots of negative binomial mass function when r = 2. Panels: NegBin(2, 0.25), NegBin(2, 0.5), NegBin(2, 0.75); horizontal axis k from 0 to 20.]

[Figure 7.2. Plots of negative binomial mass function when r = 3. Panels: NegBin(3, 0.25), NegBin(3, 0.5), NegBin(3, 0.75); horizontal axis k from 0 to 20.]

[Figure 7.3. Plots of negative binomial mass function when r = 4. Panels: NegBin(4, 0.25), NegBin(4, 0.5), NegBin(4, 0.75); horizontal axis k from 0 to 20.]


Remark 7.1 Now we see that the sum of two or more independent random variables $X_k \sim NB(\alpha_k, p)$, $k = 1, 2, \ldots$, also has a negative binomial distribution. On the other hand, the assertion above enables us to conclude that any negative binomial distribution admits a decomposition with negative binomial components. In fact, for any $n = 1, 2, \ldots$, a random variable $X \sim NB(\alpha, p)$ can be represented as

$$X \stackrel{d}{=} X_{1,n} + \cdots + X_{n,n},$$

where $X_{1,n}, \ldots, X_{n,n}$ are independent and identically distributed random variables having the $NB(\alpha/n, p)$ distribution. This means that any negative binomial distribution (including geometric) is infinitely divisible.
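The decomposition in Remark 7.1 can be illustrated numerically for the geometric case $NB(1, p)$ split into two $NB(\frac{1}{2}, p)$ components. This sketch writes the probabilities in (7.2) through the gamma function, $p_k = \Gamma(\alpha + k)/\{\Gamma(\alpha)\,k!\}\,(1 - p)^\alpha p^k$; the value $p = 0.4$ and the helper names are illustrative.

```python
from math import gamma, factorial

def nb_pmf(alpha, p, k):
    """P{X = k} = Gamma(alpha + k) / (Gamma(alpha) k!) * (1 - p)^alpha * p^k."""
    return gamma(alpha + k) / (gamma(alpha) * factorial(k)) * (1 - p) ** alpha * p ** k

def convolve_at(pmf_a, pmf_b, k):
    """P{A + B = k} for independent A and B taking values 0, 1, 2, ..."""
    return sum(pmf_a(j) * pmf_b(k - j) for j in range(k + 1))

alpha, p, n = 1.0, 0.4, 2                      # split G(p) = NB(1, p) into n = 2 parts
half = lambda k: nb_pmf(alpha / n, p, k)       # mass function of NB(alpha / n, p)
for k in range(6):
    print(k, convolve_at(half, half, k), nb_pmf(alpha, p, k))
```

The two printed columns agree up to floating-point rounding, even though the components have a non-integer parameter and hence no "number of successes" interpretation.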

7.6

Tail Probabilities

Let $X \sim NB(n, p)$, where $n$ is a positive integer. The interpretation of $X$ based on independent trials (each resulting in a "success" or a "failure") gives us a way to obtain a simple form for the tail probabilities as (7.21).

In fact, the event $\{X \ge m\}$ is equivalent to the following: If we fix the outcomes of the first $m + n - 1$ trials, then the number of "successful" trials must be at least $m$. Let $Y$ be the number of "successes" in $m + n - 1$ independent trials. We know that $Y$ has the binomial $B(m + n - 1, p)$ distribution. Moreover, (5.36) gives an expression for the RHS of (7.22). Since

$$P\{X \ge m\} = P\{Y \ge m\},$$

upon combining (7.21)-(7.23), we obtain (7.24).

As a special case of (7.24), we get equality (6.3) for the geometric $G(p)$ distribution when $n = 1$.

7.7

Limiting Distributions

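For $\lambda > 0$, let us consider $X_\alpha \sim NB(\alpha, \lambda/\alpha)$, where $\lambda/\alpha < 1$. The generating function of $X_\alpha$ in this case is

$$P_\alpha(s) = \left(\frac{1 - \lambda/\alpha}{1 - \lambda s/\alpha}\right)^\alpha. \tag{7.25}$$

We see immediately that

$$P_\alpha(s) \to e^{\lambda(s - 1)} \quad \text{as } \alpha \to \infty. \tag{7.26}$$

Note that

$$P(s) = e^{\lambda(s - 1)} \tag{7.27}$$

is the generating function of a random variable $Y$ taking on values $0, 1, 2, \ldots$ with probabilities

$$p_k = \frac{e^{-\lambda}\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots. \tag{7.28}$$

In Chapter 5 we mentioned that $Y$ has the Poisson distribution. Now relation (7.26) implies that for any $k = 0, 1, \ldots$,

$$P\{X_\alpha = k\} \to \frac{e^{-\lambda}\lambda^k}{k!} \quad \text{as } \alpha \to \infty; \tag{7.29}$$

that is, the Poisson distribution is the limit for the sequence of $NB(\alpha, \lambda/\alpha)$ distributions as $\alpha \to \infty$.

Next, let $X_\alpha \sim NB(\alpha, p)$ and

$$W_\alpha = \frac{X_\alpha - EX_\alpha}{\sqrt{\operatorname{Var} X_\alpha}}. \tag{7.30}$$

Let $f_\alpha(t)$ be the characteristic function of $W_\alpha$, which can be derived by standard methods from (7.4). It turns out that for any $t$,

$$f_\alpha(t) \to e^{-t^2/2} \quad \text{as } \alpha \to \infty. \tag{7.31}$$

Comparing this with (5.43), we note that the sequence $W_\alpha$, as $\alpha \to \infty$, converges in distribution to the standard normal distribution, which we came across in Chapter 5 in a similar context.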
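The Poisson limit of $NB(\alpha, \lambda/\alpha)$ can be watched numerically. The sketch below evaluates the negative binomial probabilities on the log scale (an implementation choice made here for numerical stability with large $\alpha$, not something taken from the text) and reports the largest discrepancy from the Poisson probabilities over $k = 0, \ldots, 14$; $\lambda = 2$ is an arbitrary illustrative value.

```python
from math import exp, factorial, lgamma, log

def nb_pmf(alpha, p, k):
    """P{X = k} for NB(alpha, p), evaluated through logarithms of gamma functions."""
    log_pk = (lgamma(alpha + k) - lgamma(alpha) - lgamma(k + 1)
              + alpha * log(1 - p) + k * log(p))
    return exp(log_pk)

def poisson_pmf(lam, k):
    return exp(-lam) * lam**k / factorial(k)

lam = 2.0
for alpha in (5, 50, 500, 5000):
    gap = max(abs(nb_pmf(alpha, lam / alpha, k) - poisson_pmf(lam, k)) for k in range(15))
    print(alpha, gap)   # the gap shrinks as alpha grows
```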


CHAPTER 8

HYPERGEOMETRIC DISTRIBUTION

8.1

Introduction

In Chapter 5 we derived the hypergeometric distribution as the conditional distribution of $X_1$, given that $X_1 + X_2$ is fixed, where $X_1$ and $X_2$ were binomial random variables. A simpler situation wherein hypergeometric distributions arise is in connection with classical combinatorial problems. Suppose that an urn contains $a$ red and $b$ black balls. Suppose that $n$ balls are drawn at random from the urn (without replacement). Let $X$ be the number of red balls in the sample drawn. Then, it is clear that $X$ takes on an integer value $m$ such that

$$\max(0, n - b) \le m \le \min(n, a), \tag{8.1}$$

with probabilities

$$P\{X = m\} = \frac{\binom{a}{m}\binom{b}{n - m}}{\binom{a + b}{n}}. \tag{8.2}$$

8.2 Notations

In this case we say that $X$ has a hypergeometric distribution with parameters $n$, $a$, and $b$, and denote it by

$$X \sim Hg(n, a, b).$$

Remark 8.1 Inequalities (8.1) give those integers $m$ for which the RHS of (8.2) is nonzero. We have from (8.2) the following identity:

$$\sum_{m=\max(0,\,n-b)}^{\min(n,\,a)} \binom{a}{m}\binom{b}{n-m} = \binom{a+b}{n}. \qquad (8.3)$$

To simplify our calculations, we will suppose in the sequel that $n \le \min(a, b)$ and hence probabilities in (8.2) are positive for $m = 0, 1, \ldots, n$ only. Identity (8.3) then becomes the following useful equality:

$$\sum_{m=0}^{n} \binom{a}{m}\binom{b}{n-m} = \binom{a+b}{n}. \qquad (8.4)$$

8.3 Generating Function

If $X \sim Hg(n, a, b)$, then we can write a formal expression for its generating function as

$$P_X(s) = \frac{n!\,(a+b-n)!}{(a+b)!}\sum_{m=0}^{n}\frac{a!\,b!}{m!\,(a-m)!\,(n-m)!\,(b-n+m)!}\,s^m. \qquad (8.5)$$

It turns out that the RHS of (8.5) can be simplified if we use the Gaussian hypergeometric function ${}_2F_1$, which was introduced in Chapter 2. Then, the generating function in (8.5) becomes

$$P_X(s) = \frac{{}_2F_1[-n, -a;\, b-n+1;\, s]}{{}_2F_1[-n, -a;\, b-n+1;\, 1]}, \qquad (8.6)$$

from which it becomes clear why the distribution has been given the name hypergeometric distribution.

8.4 Characteristic Function

On applying the relation between generating function and characteristic function, we readily obtain the characteristic function from (8.6) to be

$$f_X(t) = E e^{itX} = P_X(e^{it}) = \frac{{}_2F_1[-n, -a;\, b-n+1;\, e^{it}]}{{}_2F_1[-n, -a;\, b-n+1;\, 1]}. \qquad (8.7)$$

8.5 Moments

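To begin with, we show how we can obtain the most important moments $\alpha_1 = EX$ and $\beta_2 = \operatorname{Var} X$ using the "urn interpretation" of $X$. We have $a$ red balls in the urn, numbered $1, 2, \ldots, a$. Corresponding to each of the red balls, we introduce the random indicators $Y_1, Y_2, \ldots, Y_a$ as follows:

$$Y_k = \begin{cases} 1 & \text{if the $k$th red ball is drawn,} \\ 0 & \text{otherwise.} \end{cases}$$

Note that

$$EY_k = P\{Y_k = 1\} = \frac{n}{a+b}, \qquad k = 1, 2, \ldots, a, \qquad (8.8)$$

and

$$\operatorname{Var} Y_k = \frac{n}{a+b}\left(1 - \frac{n}{a+b}\right), \qquad k = 1, 2, \ldots, a. \qquad (8.9)$$

It is not difficult to see that

$$X = Y_1 + Y_2 + \cdots + Y_a. \qquad (8.10)$$

It follows immediately from (8.10) that

$$EX = E(Y_1 + \cdots + Y_a) = aEY_1 = \frac{an}{a+b}. \qquad (8.11)$$

Now,

$$\operatorname{Var} X = \sum_{k=1}^{a}\operatorname{Var} Y_k + \sum_{i \ne j}\operatorname{Cov}(Y_i, Y_j). \qquad (8.12)$$

Using the symmetry argument, we can rewrite (8.12) as

$$\operatorname{Var} X = a\operatorname{Var} Y_1 + a(a-1)\operatorname{Cov}(Y_1, Y_2). \qquad (8.13)$$

Now we only need to find the covariance on the RHS of (8.13). For this purpose, we have

$$\operatorname{Cov}(Y_1, Y_2) = EY_1Y_2 - EY_1\,EY_2 = P\{Y_1Y_2 = 1\} - \left(\frac{n}{a+b}\right)^2. \qquad (8.14)$$

Note that

$$P\{Y_1Y_2 = 1\} = \binom{a+b-2}{n-2}\bigg/\binom{a+b}{n} = \frac{n(n-1)}{(a+b)(a+b-1)}. \qquad (8.15)$$

Finally, (8.9) and (8.13)-(8.15) readily yield

$$\operatorname{Var} X = \frac{abn(a+b-n)}{(a+b)^2(a+b-1)}. \qquad (8.16)$$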
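The mean (8.11) and variance (8.16) can be checked exactly on small urns. The following is a minimal sketch that assumes the urn mass function $\binom{a}{m}\binom{b}{n-m}/\binom{a+b}{n}$; the function names are illustrative, and exact rational arithmetic is used so that no rounding enters the comparison.

```python
from fractions import Fraction
from math import comb

def hg_pmf(m, n, a, b):
    """P{X = m} for X ~ Hg(n, a, b): C(a, m) C(b, n-m) / C(a+b, n)."""
    return Fraction(comb(a, m) * comb(b, n - m), comb(a + b, n))

def hg_mean_var(n, a, b):
    """Exact mean and variance obtained by summing over the support given in (8.1)."""
    support = range(max(0, n - b), min(n, a) + 1)
    mean = sum(m * hg_pmf(m, n, a, b) for m in support)
    second = sum(m * m * hg_pmf(m, n, a, b) for m in support)
    return mean, second - mean ** 2

def closed_form(n, a, b):
    """Closed forms (8.11) and (8.16)."""
    mean = Fraction(a * n, a + b)
    var = Fraction(a * b * n * (a + b - n), (a + b) ** 2 * (a + b - 1))
    return mean, var

if __name__ == "__main__":
    print(hg_mean_var(5, 7, 9), closed_form(5, 7, 9))
```

Both functions should return the same pair of fractions for any admissible $n$, $a$, $b$ with $a + b \ge 2$.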

Factorial moments of positive order:

$$\begin{aligned}
\mu_k &= EX(X-1)\cdots(X-k+1) \\
&= \frac{a!\,b!\,n!\,(a+b-n)!}{(a+b)!}\sum_{m=k}^{n}\frac{1}{(m-k)!\,(a-m)!\,(n-m)!\,(b-n+m)!} \\
&= \frac{a!\,b!\,n!\,(a+b-n)!}{(a+b)!}\sum_{m=0}^{n-k}\frac{1}{m!\,(a-k-m)!\,(n-k-m)!\,(b-n+k+m)!} \\
&= \frac{a!\,n!\,(a+b-n)!}{(a+b)!\,(a-k)!}\sum_{m=0}^{n-k}\binom{a-k}{m}\binom{b}{n-k-m}. \qquad (8.17)
\end{aligned}$$

From (8.4), we know that

$$\sum_{m=0}^{n-k}\binom{a-k}{m}\binom{b}{n-k-m} = \binom{a+b-k}{n-k} = \frac{(a+b-k)!}{(n-k)!\,(a+b-n)!},$$

using which we readily obtain

$$\mu_k = \frac{a!\,n!\,(a+b-k)!}{(a+b)!\,(a-k)!\,(n-k)!} \qquad \text{for } k \le n. \qquad (8.18)$$

Note also that $\mu_k = 0$ if $k > n$. In particular, the first four factorial moments are as follows:

$$\mu_1 = \frac{an}{a+b}, \qquad (8.19)$$

$$\mu_2 = \frac{a(a-1)n(n-1)}{(a+b)(a+b-1)}, \qquad (8.20)$$

$$\mu_3 = \frac{a(a-1)(a-2)n(n-1)(n-2)}{(a+b)(a+b-1)(a+b-2)}, \qquad (8.21)$$

$$\mu_4 = \frac{a(a-1)(a-2)(a-3)n(n-1)(n-2)(n-3)}{(a+b)(a+b-1)(a+b-2)(a+b-3)}. \qquad (8.22)$$

Factorial moments of negative order: Analogous to (8.17) and (8.18), we have

$$\begin{aligned}
\mu_{-k} &= E\left[\frac{1}{(X+1)(X+2)\cdots(X+k)}\right] \\
&= \frac{a!\,n!\,(a+b-n)!}{(a+k)!\,(a+b)!}\sum_{m=k}^{n+k}\binom{a+k}{m}\binom{b}{n+k-m} \\
&= \frac{a!\,n!\,(a+b-n)!}{(a+k)!\,(a+b)!}\left[\binom{a+b+k}{n+k} - \sum_{m=0}^{k-1}\binom{a+k}{m}\binom{b}{n+k-m}\right]. \qquad (8.23)
\end{aligned}$$

In particular, we find

$$\begin{aligned}
\mu_{-1} = E\left(\frac{1}{X+1}\right) &= \frac{a+b+1}{(a+1)(n+1)} - \frac{a!\,n!\,(a+b-n)!\,b!}{(a+1)!\,(a+b)!\,(n+1)!\,(b-n-1)!} \\
&= \frac{a+b+1}{(a+1)(n+1)} - \frac{b!\,(a+b-n)!}{(a+1)(n+1)(a+b)!\,(b-n-1)!}. \qquad (8.24)
\end{aligned}$$
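As a sanity check on (8.18) and (8.24), the factorial moments can be computed by direct summation and compared with the closed forms. This sketch assumes $n \le \min(a, b)$, as the text does, and $n < b$ for the negative-order case so that $(b-n-1)!$ is defined; all function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial, prod

def hg_pmf(m, n, a, b):
    """P{X = m} for X ~ Hg(n, a, b), assuming n <= min(a, b)."""
    return Fraction(comb(a, m) * comb(b, n - m), comb(a + b, n))

def falling(m, k):
    """m (m-1) ... (m-k+1); equals 0 when m < k and 1 when k = 0."""
    return prod(range(m - k + 1, m + 1))

def factorial_moment(k, n, a, b):
    """Direct evaluation of E X(X-1)...(X-k+1)."""
    return sum(falling(m, k) * hg_pmf(m, n, a, b) for m in range(n + 1))

def factorial_moment_closed(k, n, a, b):
    """Closed form (8.18), with the convention mu_k = 0 for k > n."""
    if k > n:
        return Fraction(0)
    return Fraction(factorial(a) * factorial(n) * factorial(a + b - k),
                    factorial(a + b) * factorial(a - k) * factorial(n - k))

def inverse_moment_1(n, a, b):
    """Direct evaluation of E[1 / (X + 1)]."""
    return sum(Fraction(1, m + 1) * hg_pmf(m, n, a, b) for m in range(n + 1))

def inverse_moment_1_closed(n, a, b):
    """Second line of (8.24); requires n < b."""
    lead = Fraction(a + b + 1, (a + 1) * (n + 1))
    corr = Fraction(factorial(b) * factorial(a + b - n),
                    (a + 1) * (n + 1) * factorial(a + b) * factorial(b - n - 1))
    return lead - corr

if __name__ == "__main__":
    print([factorial_moment(k, 4, 6, 7) for k in range(6)])
    print(inverse_moment_1(4, 6, 7), inverse_moment_1_closed(4, 6, 7))
```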

Moments about zero: Indeed, the RHSs of (8.11) and (8.19) coincide, and we have

$$\alpha_1 = EX = \mu_1 = \frac{an}{a+b}.$$

Furthermore, we can show that

$$\alpha_2 = EX^2 = \mu_1 + \mu_2 = \frac{an}{a+b} + \frac{a(a-1)n(n-1)}{(a+b)(a+b-1)} \qquad (8.25)$$

and

$$\begin{aligned}
\alpha_3 = EX^3 &= \mu_1 + 3\mu_2 + \mu_3 \\
&= \frac{an}{a+b} + \frac{3a(a-1)n(n-1)}{(a+b)(a+b-1)} + \frac{a(a-1)(a-2)n(n-1)(n-2)}{(a+b)(a+b-1)(a+b-2)}. \qquad (8.26)
\end{aligned}$$

Remark 8.2 If $n = 1$, then the hypergeometric $Hg(1, a, b)$ distribution coincides with the Bernoulli $Be\left(\frac{a}{a+b}\right)$ distribution.

Exercise 8.1 Check that expressions for moments are the same for $Hg(1, a, b)$ as those for $Be\left(\frac{a}{a+b}\right)$ distributions.

Exercise 8.2 Derive the expressions of the second and third moments in (8.25) and (8.26), respectively.


8.6

Let $X_N \sim Hg(n, pN, (1-p)N)$, $N = 1, 2, \ldots$, where $0 < p < 1$ and $n$ is fixed. Then, we have

$$P\{X_N = m\} = \frac{n!}{m!\,(n-m)!}\cdot\frac{pN(pN-1)\cdots(pN-m+1)\,((1-p)N)((1-p)N-1)\cdots((1-p)N-n+m+1)}{N(N-1)\cdots(N-n+1)},$$

from which it is easy to see that for any fixed $m = 0, 1, \ldots, n$,

$$P\{X_N = m\} \to \binom{n}{m}p^m(1-p)^{n-m} \quad \text{as } N \to \infty. \qquad (8.27)$$

Thus, we get the binomial distribution as a limit of a special sequence of hypergeometric distributions.
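A small numerical illustration of this binomial limit: the sketch below takes $p = 1/4$ and values of $N$ for which $pN$ is an integer (an assumption made here so that the urn composition is well defined), and reports the largest pointwise gap between the $Hg(n, pN, (1-p)N)$ and $B(n, p)$ probabilities. The function names are illustrative.

```python
from math import comb

def hg_pmf(m, n, a, b):
    """Hypergeometric probability C(a, m) C(b, n-m) / C(a+b, n)."""
    return comb(a, m) * comb(b, n - m) / comb(a + b, n)

def binom_pmf(m, n, p):
    """Binomial probability C(n, m) p^m (1-p)^(n-m)."""
    return comb(n, m) * p ** m * (1 - p) ** (n - m)

def max_gap(N, n=5, p=0.25):
    """Largest |Hg - binomial| over m = 0..n; assumes p*N is an integer and n <= min(a, b)."""
    a = round(p * N)
    b = N - a
    return max(abs(hg_pmf(m, n, a, b) - binom_pmf(m, n, p)) for m in range(n + 1))

if __name__ == "__main__":
    for N in (40, 400, 4000):
        print(N, max_gap(N))
```

The gap should decrease as $N$ grows, since drawing $n$ balls without replacement from a very large urn is almost the same as drawing with replacement.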

Exercise 8.3 Let $X_N \sim Hg(N, \lambda N^2, N^3)$, $N = 1, 2, \ldots$. Show then that for any fixed $m = 0, 1, \ldots$,

$$P\{X_N = m\} \to \frac{\lambda^m e^{-\lambda}}{m!} \quad \text{as } N \to \infty. \qquad (8.28)$$

The Poisson distribution, which is the limit for the sequence of hypergeometric random variables presented in Exercise 8.3, is the subject of discussion in the next chapter.

CHAPTER 9

POISSON DISTRIBUTION

9.1 Introduction

The Poisson distribution arises naturally in many instances; for example, as we have already seen in preceding chapters, it appears as a limiting distribution of some sequences of binomial, negative binomial, and hypergeometric random variables. In addition, due to its many interesting characteristic properties, it is also used as a probability model for the occurrence of rare events. A book-length account of Poisson distributions, discussing in great detail their various properties and applications, is available [Haight (1967)].

9.2 Notations

Let a random variable $X$ take on values $0, 1, \ldots$ with probabilities

$$p_m = P\{X = m\} = \frac{e^{-\lambda}\lambda^m}{m!}, \qquad m = 0, 1, \ldots, \qquad (9.1)$$

where $\lambda > 0$. We say that $X$ has a Poisson distribution with parameter $\lambda$, and denote it by

$$X \sim \pi(\lambda).$$

If $Y = a + hX$, $-\infty < a < \infty$, $h > 0$, then $Y$ takes on values $a, a+h, a+2h, \ldots$ with probabilities

$$P\{Y = a + mh\} = P\{X = m\} = \frac{e^{-\lambda}\lambda^m}{m!}, \qquad m = 0, 1, \ldots.$$

This distribution also belongs to the Poisson type of distribution, and it will be denoted by $\pi(\lambda, a, h)$. The standard Poisson $\pi(\lambda)$ distribution is nothing but $\pi(\lambda, 0, 1)$.

9.3 Generating Function and Characteristic Function

Let $X \sim \pi(\lambda)$, $\lambda > 0$. From (9.1), we obtain the generating function of $X$ as

$$P_X(s) = Es^X = \sum_{m=0}^{\infty}\frac{e^{-\lambda}\lambda^m}{m!}s^m = e^{\lambda(s-1)}. \qquad (9.2)$$

Then, the characteristic function of $X$ has the form

$$f_X(t) = Ee^{itX} = P_X(e^{it}) = \exp\{\lambda(e^{it} - 1)\}. \qquad (9.3)$$

9.4 Moments

The simple form of the generating function in (9.2) enables us to derive all the factorial moments easily.

Factorial moments of positive order:

$$\mu_k = EX(X-1)\cdots(X-k+1) = P_X^{(k)}(1) = \lambda^k, \qquad k = 1, 2, \ldots. \qquad (9.4)$$

In particular, we have:

$$\mu_1 = \lambda, \qquad (9.5)$$

$$\mu_2 = \lambda^2, \qquad (9.6)$$

$$\mu_3 = \lambda^3, \qquad (9.7)$$

$$\mu_4 = \lambda^4. \qquad (9.8)$$

Factorial moments of negative order:

In particular, we obtain (9.10) and

Moments about zero: From (9.5)-(9.8), we immediately obtain the first four moments about zero as follows:

Central moments: From (9.12) and (9.13), we readily find the variance of $X$ as

$$\beta_2 = \operatorname{Var} X = \alpha_2 - \alpha_1^2 = \lambda. \qquad (9.16)$$

Note that if $X \sim \pi(\lambda)$, then $EX = \operatorname{Var} X = \lambda$. Further, from (9.12)-(9.15), we also find the third and fourth central moments as

$$\beta_3 = E(X - EX)^3 = \alpha_3 - 3\alpha_2\alpha_1 + 2\alpha_1^3 = \lambda, \qquad (9.17)$$

$$\beta_4 = E(X - EX)^4 = \alpha_4 - 4\alpha_3\alpha_1 + 6\alpha_2\alpha_1^2 - 3\alpha_1^4 = \lambda + 3\lambda^2, \qquad (9.18)$$

respectively.

Shape characteristics: From (9.16)-(9.18), we obtain the coefficients of skewness and kurtosis as

$$\gamma_1 = \frac{\beta_3}{\beta_2^{3/2}} = \frac{1}{\sqrt{\lambda}}, \qquad (9.19)$$

$$\gamma_2 = \frac{\beta_4}{\beta_2^{2}} = 3 + \frac{1}{\lambda}, \qquad (9.20)$$

respectively. From (9.19), we see that the Poisson distribution is positively skewed for all values of $\lambda$. Similarly, we see from (9.20) that the distribution is also leptokurtic for all values of $\lambda$. Furthermore, we observe that as $\lambda$ tends to $\infty$, the values of $\gamma_1$ and $\gamma_2$ tend to 0 and 3 (the values corresponding to the normal distribution), respectively. Plots of Poisson mass function presented in Figure 9.1 reveal these properties.
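The central moments and shape coefficients above can be reproduced by summing the mass function (9.1) directly. The sketch below truncates the infinite sums at `kmax` (a choice made here for illustration, adequate for moderate $\lambda$; for very large $\lambda$ the starting value $e^{-\lambda}$ would underflow). The function names are illustrative.

```python
import math

def poisson_central_moments(lam, kmax=400):
    """Mean and central moments beta_2, beta_3, beta_4 from a truncated sum over (9.1)."""
    probs = []
    p = math.exp(-lam)            # p_0 = e^{-lam}
    for k in range(kmax + 1):
        probs.append(p)
        p *= lam / (k + 1)        # p_{k+1} = p_k * lam / (k + 1)
    mean = sum(k * pk for k, pk in enumerate(probs))

    def beta(r):
        return sum((k - mean) ** r * pk for k, pk in enumerate(probs))

    return mean, beta(2), beta(3), beta(4)

def shape(lam):
    """Skewness beta_3 / beta_2^{3/2} and kurtosis beta_4 / beta_2^2."""
    _, b2, b3, b4 = poisson_central_moments(lam)
    return b3 / b2 ** 1.5, b4 / b2 ** 2

if __name__ == "__main__":
    print(poisson_central_moments(4.0))
    print(shape(4.0))
```

For $\lambda = 4$ the values should agree, up to rounding, with $\lambda$, $\lambda$, $\lambda + 3\lambda^2$, $1/\sqrt{\lambda}$ and $3 + 1/\lambda$.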

9.5 Tail Probabilities

Let $X \sim \pi(\lambda)$. Then

$$P\{X \ge m\} = e^{-\lambda}\sum_{k=m}^{\infty}\frac{\lambda^k}{k!}. \qquad (9.21)$$

The RHS of (9.21) can be simplified and written in the form of an integral.

Exercise 9.1 Show that for any $m = 1, 2, \ldots$,

$$P\{X \ge m\} = \frac{1}{(m-1)!}\int_0^{\lambda} u^{m-1}e^{-u}\,du. \qquad (9.22)$$

Remark 9.1 We will recall the expression on the RHS of (9.22) later when we discuss the gamma distribution in Chapter 20.
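Before attempting the proof, the identity between the Poisson tail and the integral $\frac{1}{(m-1)!}\int_0^{\lambda}u^{m-1}e^{-u}\,du$ can be checked numerically. This sketch uses composite Simpson quadrature, which is a choice made here for illustration and is not taken from the text; the function names are illustrative.

```python
import math

def poisson_tail(m, lam):
    """P{X >= m} computed as 1 minus the first m Poisson probabilities."""
    head = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(m))
    return 1.0 - head

def gamma_integral(m, lam, steps=2000):
    """(1/(m-1)!) * integral_0^lam u^(m-1) e^(-u) du via composite Simpson (steps must be even)."""
    h = lam / steps

    def f(u):
        return u ** (m - 1) * math.exp(-u)

    total = f(0.0) + f(lam)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3 / math.factorial(m - 1)

if __name__ == "__main__":
    for m in (1, 3, 6):
        print(m, poisson_tail(m, 2.5), gamma_integral(m, 2.5))
```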

9.6 Convolutions

Let $X_1 \sim \pi(\lambda_1)$ and $X_2 \sim \pi(\lambda_2)$ be independent random variables. Then, by making use of the generating functions $P_{X_1}(s)$ and $P_{X_2}(s)$, it is easy to show that $Y = X_1 + X_2$ has the generating function

$$P_Y(s) = P_{X_1}(s)P_{X_2}(s) = e^{(\lambda_1 + \lambda_2)(s-1)}$$

and, hence, $Y \sim \pi(\lambda_1 + \lambda_2)$. This simply means that convolutions of Poisson distributions are also distributed as Poisson.
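The closure of the Poisson family under convolution can also be seen by convolving two mass functions term by term. In this sketch both mass functions are truncated at `K`; the first `K + 1` entries of the discrete convolution are still exact, because they only involve indices up to `K`. The function names are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability e^{-lam} lam^k / k!, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def convolve(p, q):
    """Discrete convolution: out[k] = sum over i + j = k of p[i] * q[j]."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def convolution_gap(lam1, lam2, K=40):
    """Largest gap between the convolution and Poisson(lam1 + lam2) on 0..K."""
    p = [poisson_pmf(k, lam1) for k in range(K + 1)]
    q = [poisson_pmf(k, lam2) for k in range(K + 1)]
    conv = convolve(p, q)
    return max(abs(conv[k] - poisson_pmf(k, lam1 + lam2)) for k in range(K + 1))

if __name__ == "__main__":
    print(convolution_gap(1.5, 2.5))
```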

9.7 Decompositions

Due to the result just stated, we know that for any $X \sim \pi(\lambda)$ we can find a pair of independent Poisson random variables $U \sim \pi(\lambda_1)$ and $V \sim \pi(\lambda - \lambda_1)$, where $0 < \lambda_1 < \lambda$, to obtain the decomposition

$$X \stackrel{d}{=} U + V. \qquad (9.23)$$

Hence, any Poisson distribution is decomposable. Moreover, for any $n = 1, 2, \ldots$, the decomposition

$$X \stackrel{d}{=} X_1 + X_2 + \cdots + X_n \qquad (9.24)$$

holds with $X$'s being independent and identically distributed as $\pi(\lambda/n)$. This simply implies that $X$ is infinitely divisible for any $\lambda$. Raikov (1937b) established that if $X$ admits decomposition (9.23), then both the independent nondegenerate components $U$ and $V$ have necessarily a Poisson type of distribution; that is, there exist constants $-\infty < a < \infty$ and $\lambda_1 < \lambda$ such that

$$U \sim \pi(\lambda_1, a, 1) \quad \text{and} \quad V \sim \pi(\lambda - \lambda_1, -a, 1).$$

Thus, convolutions and decompositions of Poisson distributions always belong to the Poisson class of distributions.

Figure 9.1. Plots of Poisson mass function (panels: Poisson(1), Poisson(4), Poisson(10); horizontal axis n)


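9.8 Conditional Probabilities

Let $X_1 \sim \pi(\lambda_1)$ and $X_2 \sim \pi(\lambda_2)$ be independent random variables. Consider the conditional distribution of $X_1$ given that $X_1 + X_2$ is fixed. Since $X_1 + X_2 \sim \pi(\lambda_1 + \lambda_2)$, we obtain for any $n = 0, 1, 2, \ldots$ and $m = 0, 1, \ldots, n$ that

$$\begin{aligned}
P\{X_1 = m \mid X_1 + X_2 = n\} &= \frac{P\{X_1 = m,\, X_1 + X_2 = n\}}{P\{X_1 + X_2 = n\}} = \frac{P\{X_1 = m,\, X_2 = n - m\}}{P\{X_1 + X_2 = n\}} \\
&= \frac{P\{X_1 = m\}\,P\{X_2 = n - m\}}{P\{X_1 + X_2 = n\}} \\
&= \frac{e^{-\lambda_1}\lambda_1^m}{m!}\cdot\frac{e^{-\lambda_2}\lambda_2^{n-m}}{(n-m)!}\cdot\frac{n!}{e^{-(\lambda_1+\lambda_2)}(\lambda_1+\lambda_2)^n} \\
&= \binom{n}{m}\left(\frac{\lambda_1}{\lambda_1+\lambda_2}\right)^m\left(\frac{\lambda_2}{\lambda_1+\lambda_2}\right)^{n-m}. \qquad (9.25)
\end{aligned}$$

Thus, the conditional distribution of $X_1$, given that $X_1 + X_2 = n$, is simply the binomial $B\left(n, \frac{\lambda_1}{\lambda_1+\lambda_2}\right)$ distribution.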
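The conditional law can be verified numerically by forming the ratio of Poisson probabilities and comparing it with binomial probabilities with success parameter $\lambda_1/(\lambda_1+\lambda_2)$. This is a minimal sketch with illustrative function names.

```python
import math
from math import comb

def poisson_pmf(k, lam):
    """Poisson probability e^{-lam} lam^k / k!, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def conditional_pmf(m, n, lam1, lam2):
    """P{X1 = m | X1 + X2 = n} for independent X1 ~ Poisson(lam1), X2 ~ Poisson(lam2)."""
    return poisson_pmf(m, lam1) * poisson_pmf(n - m, lam2) / poisson_pmf(n, lam1 + lam2)

def binom_pmf(m, n, p):
    """Binomial probability C(n, m) p^m (1-p)^(n-m)."""
    return comb(n, m) * p ** m * (1 - p) ** (n - m)

if __name__ == "__main__":
    lam1, lam2, n = 1.2, 3.4, 7
    for m in range(n + 1):
        print(m, conditional_pmf(m, n, lam1, lam2), binom_pmf(m, n, lam1 / (lam1 + lam2)))
```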

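Now, we will try to solve the inverse problem. Let $X_1$ and $X_2$ be independent random variables taking on values $0, 1, 2, \ldots$ with positive probabilities. Then, $Y = X_1 + X_2$ also takes on values $0, 1, 2, \ldots$ with positive probabilities. Suppose that for any $n = 0, 1, \ldots$, the conditional distribution of $X_1$, given that $Y = n$, is binomial $B(n, p)$ for some parameter $p$, $0 < p < 1$. Then, we are interested in determining the distributions of $X_1$ and $X_2$. It turns out that both distributions are Poisson. To see this, let

$$P\{X_1 = m\} = r_m > 0, \qquad m = 0, 1, \ldots,$$

and

$$P\{X_2 = l\} = q_l > 0, \qquad l = 0, 1, \ldots.$$

As seen above, the conditional probabilities $P\{X_1 = m \mid Y = n\}$ result in the expression $r_m q_{n-m}/P\{Y = n\}$. In this situation, we get the equality

$$\frac{r_m q_{n-m}}{P\{Y = n\}} = \frac{n!}{m!\,(n-m)!}\,p^m(1-p)^{n-m}, \qquad m = 0, 1, \ldots, n. \qquad (9.26)$$

Compare (9.26) with the same equality written for $m + 1$ in place of $m$, which has the form

$$\frac{r_{m+1} q_{n-m-1}}{P\{Y = n\}} = \frac{n!}{(m+1)!\,(n-m-1)!}\,p^{m+1}(1-p)^{n-m-1}. \qquad (9.27)$$

It readily follows from (9.26) and (9.27) that

$$\frac{r_{m+1}\,q_{n-m-1}}{r_m\,q_{n-m}} = \frac{(n-m)\,p}{(m+1)(1-p)} \qquad (9.28)$$

holds for any $n = 1, 2, \ldots$ and $m = 0, 1, \ldots, n-1$. In particular, we can take $m = 0$ to obtain

$$\frac{q_n}{q_{n-1}} = \frac{r_1(1-p)}{r_0\,p\,n}, \qquad n = 1, 2, \ldots. \qquad (9.29)$$

Let us denote $\lambda = r_1(1-p)/(r_0 p)$. Then, (9.29) implies that

$$q_n = \frac{\lambda}{n}\,q_{n-1} = \frac{\lambda^2}{n(n-1)}\,q_{n-2} = \cdots = \frac{\lambda^n}{n!}\,q_0, \qquad n = 1, 2, \ldots. \qquad (9.30)$$

Since the probabilities $q_n$ sum to one, we immediately get $q_0 = e^{-\lambda}$ and, consequently,

$$q_n = \frac{e^{-\lambda}\lambda^n}{n!}, \qquad n = 0, 1, \ldots, \qquad (9.31)$$

where $\lambda$ is some positive constant. On substituting $q_n$ into (9.28) and taking $m = n - 1$, we obtain

$$\frac{r_n}{r_{n-1}} = \frac{\lambda p}{(1-p)\,n}, \qquad n = 1, 2, \ldots.$$

Hence,

$$r_n = \frac{1}{n!}\left(\frac{\lambda p}{1-p}\right)^n r_0,$$

and consequently,

$$r_n = \exp\left\{-\frac{\lambda p}{1-p}\right\}\frac{1}{n!}\left(\frac{\lambda p}{1-p}\right)^n, \qquad n = 0, 1, \ldots. \qquad (9.32)$$

Thus, only for $X_1$ and $X_2$ having Poisson distributions, the conditional distribution of $X_1$, given that $X_1 + X_2$ is fixed, can be binomial.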

9.9 Maximal Probability

We may often be interested to know which of the probabilities $p_m = e^{-\lambda}\lambda^m/m!$ $(m = 0, 1, \ldots)$ are maximal.

Exercise 9.2 Show that there are the following two situations for maximal Poisson probabilities: If $m_0 < \lambda < m_0 + 1$, where $m_0$ is an integer, then $p_{m_0}$ is maximal among all probabilities $p_m$ $(m = 0, 1, \ldots)$. If $\lambda = m_0$, then $p_{m_0} = p_{m_0 - 1}$, and in this case both these probabilities are maximal.
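The two situations in Exercise 9.2 can be explored numerically before proving them. The sketch below locates the maximal probabilities by brute force on the log scale; the search range `int(lam) + 10` and the tie tolerance are choices made here for illustration, and the function name is hypothetical.

```python
import math

def poisson_modes(lam):
    """Indices m at which the Poisson(lam) probability is maximal (ties within 1e-12 on the log scale)."""
    kmax = int(lam) + 10
    logp = [-lam + k * math.log(lam) - math.lgamma(k + 1) for k in range(kmax + 1)]
    best = max(logp)
    return [k for k, v in enumerate(logp) if math.isclose(v, best, rel_tol=0.0, abs_tol=1e-12)]

if __name__ == "__main__":
    for lam in (0.5, 3.5, 4.0):
        print(lam, poisson_modes(lam))
```

A non-integer $\lambda$ should give a single maximal index equal to its integer part, whereas an integer $\lambda = m_0$ should give the pair $m_0 - 1$, $m_0$.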

9.10 Limiting Distribution

Let $X_n \sim \pi(n)$, $n = 1, 2, \ldots$, and

$$W_n = \frac{X_n - EX_n}{\sqrt{\operatorname{Var} X_n}} = \frac{X_n - n}{\sqrt{n}}.$$

Using characteristic functions of Poisson distributions, it is easy to find the limiting distributions of random variables $W_n$.

Exercise 9.3 Let $g_n(t)$ be the characteristic function of $W_n$. Then, prove that for any fixed $t$,

$$g_n(t) \to e^{-t^2/2} \quad \text{as } n \to \infty. \qquad (9.33)$$

Remark 9.2 As is already known, (9.33) means that the standard normal distribution is the limiting distribution for the sequence of Poisson random variables $W_1, W_2, \ldots$. At the same time, the Poisson distribution itself is a limiting form for the binomial and some other distributions, as noted in the preceding chapters.
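The convergence in (9.33) can be observed numerically. The sketch below evaluates the characteristic function of $W_n$ as $e^{-it\sqrt{n}}f_{X_n}(t/\sqrt{n})$, with $f_{X_n}$ taken from (9.3), and compares it with $e^{-t^2/2}$; the function names are illustrative.

```python
import cmath
import math

def cf_w(n, t):
    """Characteristic function of W_n = (X_n - n)/sqrt(n) with X_n ~ Poisson(n), via (9.3)."""
    s = t / math.sqrt(n)
    return cmath.exp(-1j * t * math.sqrt(n) + n * (cmath.exp(1j * s) - 1))

def gap(n, t):
    """Distance between the characteristic function of W_n and the standard normal one."""
    return abs(cf_w(n, t) - math.exp(-t * t / 2))

if __name__ == "__main__":
    for n in (1, 100, 10 ** 4, 10 ** 6):
        print(n, gap(n, 1.0))
```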

9.11 Mixtures

In Chapter 5 we considered the distribution of binomial $B(N, p)$ random variables in the case when $N$ itself is a binomial random variable. Now we discuss the distribution of a $B(N, p)$ random variable when $N$ has the Poisson distribution. Later, we deal with more general Poisson mixtures of random variables.

Let $X_1, X_2, \ldots$ be independent and identically distributed random variables having a common generating function $P(s) = \sum_{k=0}^{\infty} p_k s^k$, where $p_k = P\{X_m = k\}$, $m = 1, 2, \ldots$, $k = 0, 1, 2, \ldots$. Let $S_0 = 0$ and $S_n = X_1 + X_2 + \cdots + X_n$ $(n = 1, 2, \ldots)$ be the cumulative sums of $X$'s. Then, the generating function of $S_n$ has the form

$$P_n(s) = Es^{S_n} = Es^{X_1 + \cdots + X_n} = Es^{X_1}\,Es^{X_2}\cdots Es^{X_n} = P^n(s), \qquad n = 0, 1, \ldots.$$

Consider now an integer-valued random variable $N$ taking on values $0, 1, 2, \ldots$ with probabilities

$$q_n = P\{N = n\}, \qquad n = 0, 1, \ldots.$$

Suppose that $N$ is independent of $X_1, X_2, \ldots$. Further, suppose that $Q(s) = \sum_{n=0}^{\infty} q_n s^n$ is the generating function of $N$. Let us introduce now a new random variable, $Y = S_N$. The distribution of $Y$ is clearly a mixture of distributions of $S_n$ taken with probabilities $q_n$. Let us find probabilities $r_m = P\{Y = m\}$, $m = 0, 1, \ldots$. Due to the theorem of total probability, we readily have

$$r_m = P\{Y = m\} = P\{S_N = m\} = \sum_{n=0}^{\infty} P\{S_N = m \mid N = n\}\,P\{N = n\} = \sum_{n=0}^{\infty} P\{S_n = m \mid N = n\}\,q_n. \qquad (9.34)$$

Since random variables $S_n = X_1 + \cdots + X_n$ and $N$ are independent,

$$P\{S_n = m \mid N = n\} = P\{S_n = m\};$$

and hence we may write (9.34) as

$$r_m = \sum_{n=0}^{\infty} P\{S_n = m\}\,q_n, \qquad m = 0, 1, \ldots. \qquad (9.35)$$

Then, the generating function of $Y$ has the form

$$R(s) = \sum_{m=0}^{\infty} r_m s^m = \sum_{m=0}^{\infty}\sum_{n=0}^{\infty} P\{S_n = m\}\,q_n s^m = \sum_{n=0}^{\infty} q_n \sum_{m=0}^{\infty} P\{S_n = m\}\,s^m. \qquad (9.36)$$

We note that the sum

$$\sum_{m=0}^{\infty} P\{S_n = m\}\,s^m$$

is simply the generating function of $S_n$ and, as noted earlier, it is equal to $P^n(s)$. This enables us to simplify the RHS of (9.36) and write

$$R(s) = \sum_{n=0}^{\infty} q_n P^n(s) = Q(P(s)). \qquad (9.37)$$

Relation (9.37) gives the generating function of $Y$, which provides a way to find the probabilities $r_0, r_1, \ldots$.

Suppose that we take $V_n \sim B(n, p)$, and we want to find the distribution of $Y = V_N$, where $N$ is the Poisson $\pi(\lambda)$ random variable, which is independent of $V_1, V_2, \ldots$. We can then apply relation (9.37) to find the generating function of $Y$. In fact, for any $n = 1, 2, \ldots$, due to the properties of the binomial distribution, we have

$$V_n \stackrel{d}{=} X_1 + X_2 + \cdots + X_n,$$

where $X_1, X_2, \ldots$ are independent Bernoulli $Be(p)$ random variables. Therefore, we can consider $Y$ as a sum of $N$ independent Bernoulli random variables

$$Y \stackrel{d}{=} X_1 + X_2 + \cdots + X_N. \qquad (9.38)$$

In this case,

$$P(s) = Es^{X_k} = 1 - p + ps, \qquad k = 1, 2, \ldots,$$

and

$$Q(s) = Es^N = e^{\lambda(s-1)}.$$

Hence, we readily obtain from (9.37) that

$$R(s) = Es^Y = Q(P(s)) = \exp\{\lambda(ps - p)\} = \exp\{\lambda p(s - 1)\}. \qquad (9.39)$$

Clearly, the RHS of (9.39) is the generating function of the Poisson $\pi(\lambda p)$ random variable, so we have

$$P\{Y = m\} = \frac{e^{-\lambda p}(\lambda p)^m}{m!}, \qquad m = 0, 1, 2, \ldots.$$
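The same conclusion can be reached without generating functions by evaluating the mixture sum (9.35) directly, with $P\{S_n = m\}$ binomial and $q_n$ Poisson. The sketch below truncates the sum over $n$ at `nmax`, which is a choice made here for illustration; the function names are illustrative.

```python
import math
from math import comb

def poisson_pmf(k, lam):
    """Poisson probability e^{-lam} lam^k / k!, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def binom_pmf(m, n, p):
    """Binomial probability C(n, m) p^m (1-p)^(n-m)."""
    return comb(n, m) * p ** m * (1 - p) ** (n - m)

def mixture_pmf(m, lam, p, nmax=200):
    """r_m = sum over n of P{N = n} P{S_n = m}, with N ~ Poisson(lam), S_n ~ B(n, p)."""
    return sum(poisson_pmf(n, lam) * binom_pmf(m, n, p) for n in range(m, nmax + 1))

if __name__ == "__main__":
    lam, p = 3.0, 0.4
    for m in range(6):
        print(m, mixture_pmf(m, lam, p), poisson_pmf(m, lam * p))
```

The two printed columns should agree up to rounding, reflecting that a Poisson number of Bernoulli($p$) trials has a Poisson distribution with parameter $\lambda p$.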

Exercise 9.4 For any $n = 1, 2, \ldots$, let the random variable $X_n$ take on values $0, 1, \ldots, n-1$ with equal probabilities $1/n$, and let $N$ be a random variable independent of $X$'s having a Poisson $\pi(\lambda)$ distribution. Then, find the generating function of $Y = X_N$ and hence $P\{Y = 0\}$.

More generally, when $N$ is distributed as Poisson with parameter $\lambda$ so that $Q(s) = e^{\lambda(s-1)}$ [see (9.2)], (9.37) becomes

$$R(s) = e^{\lambda\{P(s) - 1\}}.$$

This then is the generating function of the distribution of the sum of a Poisson number of i.i.d. random variables with generating function $P(s)$. The corresponding distributions are called Poisson-stopped-sum distributions, a name introduced by Godambe and Patil (1975) and adopted since then by a number of authors including Douglas (1980) and Johnson, Kotz, and Kemp (1992). Some other names such as generalized Poisson [Feller (1943)], stuttering Poisson [Kemp (1967)] and compound Poisson [Feller (1968)] have also been used for these distributions.

Exercise 9.5 Show that the negative binomial distribution in (7.2) is a Poisson-stopped-sum distribution with $P(s)$ being the generating function of a logarithmic distribution with mass function

9.12 Rao-Rubin Characterization

In this section we present (without proof) the following celebrated Rao-Rubin characterization of the Poisson distribution [Rao and Rubin (1964)]. If $X$ is a discrete random variable taking on only nonnegative integral values and the conditional distribution of $Y$, given $X = x$, is binomial $B(x, p)$ (where $p$ does not depend on $x$), then the distribution of $X$ is Poisson if and only if

$$P(Y = y \mid Y = X) = P(Y = y \mid Y \ne X). \qquad (9.40)$$

An interesting physical interpretation of this result was given by Rao (1965) wherein $X$ represents a naturally occurring quantity with some of its components not being counted (or destroyed) when it is observed, and $Y$ represents the value remaining (that is, the components of $X$ which are actually counted) after this destructive process. In other words, suppose that $X$ is the original observation having a Poisson $\pi(\lambda)$ distribution, and the probability that the original observation $n$ gets reduced to $x$ due to the destructive process is

(9.41)

Now, if $Y$ represents the resulting random variable, then

$$P(Y = y) = P(Y = y \mid \text{destroyed}) = P(Y = y \mid \text{not destroyed}) = \frac{e^{-p\lambda}(p\lambda)^y}{y!}; \qquad (9.42)$$

furthermore, the condition in (9.42) also characterizes the Poisson distribution. Interestingly, Srivastava and Srivastava (1970) established that if the original observations follow a Poisson distribution and if the condition in (9.42) is satisfied, then the destructive process has to be binomial, as in (9.41). The Rao-Rubin characterization result above generated a lot of interest, which resulted in a number of different variations, extensions, and generalizations.

9.13 Generalized Poisson Distribution

Let $X$ be a random variable defined over nonnegative integers with its probability function as

$$p_m(\theta, \lambda) = P\{X = m\} = \frac{1}{m!}\,\theta(\theta + m\lambda)^{m-1}e^{-\theta - m\lambda}, \qquad m = 0, 1, \ldots, \qquad (9.43)$$

where $\theta > 0$, $\max(-1, -\theta/\ell) < \lambda \le 1$, and $\ell$ $(\ge 4)$ is the largest positive integer for which $\theta + \ell\lambda > 0$ when $\lambda$ is negative. Then, $X$ is said to have the generalized Poisson (GPD) distribution. A book-length account of generalized Poisson distributions, discussing in great detail their various properties and applications, is available and is due to Consul (1989). This distribution, also known as the Lagrangian-Poisson distribution, is a Poisson-stopped-sum distribution. The special case when $\lambda = \alpha\theta$ in (9.43) is called the restricted generalized Poisson distribution, and the probability function in this case becomes

$$p_m(\theta, \alpha) = P\{X = m\} = \frac{1}{m!}\,\theta^m(1 + m\alpha)^{m-1}e^{-\theta - m\alpha\theta}, \qquad m = 0, 1, \ldots, \qquad (9.44)$$

where $\max(-1/\theta, -1/\ell) < \alpha < 1/\theta$.

Exercise 9.6 For the restricted generalized Poisson distribution in (9.44), show that the mean and variance are given by

$$\frac{\theta}{1 - \alpha\theta} \quad \text{and} \quad \frac{\theta}{(1 - \alpha\theta)^3}, \qquad (9.45)$$

respectively.

CHAPTER 10

MISCELLANEA

10.1 Introduction

In the last eight chapters we have seen the most popular and commonly encountered discrete distributions. There are a few natural generalizations of some of these distributions which are of interest not only for their mathematical niceties but also for their interesting probabilistic basis. These are described briefly in this chapter, and their basic properties are also presented.

10.2 Pólya Distribution

Pólya (1930) suggested this distribution in the context of the following combinatorial problem; see also Eggenberger and Pólya (1923). There is an urn with $b$ black and $r$ red balls. A ball is drawn at random, after which $c + 1$ (where $c \ge -1$) new balls of the same color are added to the urn. We repeat this process successively $n$ times. Let $X_n$ be the total number of black balls observed in these $n$ draws. It can easily be shown that $X_n$ takes on values $0, 1, \ldots, n$ with the following probabilities:

$$p_k = \binom{n}{k}\frac{b(b+c)\cdots\{b+(k-1)c\}\;r(r+c)\cdots\{r+(n-k-1)c\}}{(b+r)(b+r+c)(b+r+2c)\cdots\{b+r+(n-1)c\}}, \qquad (10.1)$$

where $k = 0, 1, \ldots, n$ and $n = 1, 2, \ldots$. Let us now denote

$$p = \frac{b}{b+r}, \qquad q = 1 - p = \frac{r}{b+r}, \qquad \alpha = \frac{c}{b+r}. \qquad (10.2)$$

Then, the probability expression in (10.1) can be rewritten in the form

$$p_k = \binom{n}{k}\frac{p(p+\alpha)\cdots\{p+(k-1)\alpha\}\;q(q+\alpha)\cdots\{q+(n-k-1)\alpha\}}{(1+\alpha)(1+2\alpha)\cdots\{1+(n-1)\alpha\}}, \qquad (10.3)$$

wherein we can forget that $p$, $q$, and $\alpha$ are quotients of integers and suppose only that $0 < p, q < 1$, and $\alpha > -1/(n-1)$. Note that if $\alpha > 0$, then (10.3), for $1 < k < n$, can be expressed in terms of complete beta functions as

$$p_k = \binom{n}{k}\frac{B\left(\frac{p}{\alpha} + k,\; \frac{q}{\alpha} + n - k\right)}{B\left(\frac{p}{\alpha},\; \frac{q}{\alpha}\right)}, \qquad (10.4)$$

where the complete beta function is given by

$$B(u, v) = \int_0^1 x^{u-1}(1-x)^{v-1}\,dx.$$

Distribution in (10.3) is called the Pólya distribution with parameters $n$, $p$, and $\alpha$, where $n = 1, 2, \ldots$, $0 < p < 1$ and $\alpha > -1/(n-1)$. If $\alpha = 0$, then the probabilities in (10.3) simply become the binomial $B(n, p)$ probabilities. As a matter of fact, in the Pólya urn scheme, we see that this case corresponds to the number of black balls in a sample of size $n$ with replacement from the urn, in which the ratio of black balls equals $p$. Next, we know from Chapter 8 that a hypergeometric random variable can be interpreted as the number of black balls in a sample of size $n$ without replacement from the urn. Clearly, this corresponds to the Pólya urn scheme with $c = -1$, that is, $\alpha = -1/(b+r)$. Thus, Pólya distributions include binomial as well as hypergeometric distributions as special cases, and hence may be considered as a "bridge" between these two types of distributions. Note also that if a random variable $X$ has distribution (10.4), then

$$EX = np \qquad (10.5)$$

and

$$\operatorname{Var} X = np(1-p)\,\frac{1 + \alpha n}{1 + \alpha}. \qquad (10.6)$$
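The "bridge" role of the Pólya distribution can be made concrete with exact arithmetic. The sketch below implements the urn probabilities in the product form $\binom{n}{k}\prod_{i<k}(b+ic)\prod_{j<n-k}(r+jc)\big/\prod_{i<n}(b+r+ic)$, which is the reading of (10.1) assumed here, and checks the binomial case $c = 0$, the hypergeometric case $c = -1$, and the moments (10.5) and (10.6). The function names are illustrative.

```python
from fractions import Fraction
from math import comb

def polya_pmf(k, n, b, r, c):
    """P{X_n = k}: k black balls in n draws from an urn with b black and r red balls, net c added per draw."""
    num = 1
    for i in range(k):
        num *= b + i * c              # black factors b, b+c, ..., b+(k-1)c
    for j in range(n - k):
        num *= r + j * c              # red factors r, r+c, ..., r+(n-k-1)c
    den = 1
    for i in range(n):
        den *= b + r + i * c          # total factors (b+r), ..., b+r+(n-1)c
    return Fraction(comb(n, k) * num, den)

def polya_mean_var(n, b, r, c):
    """Exact mean and variance by direct summation."""
    probs = [polya_pmf(k, n, b, r, c) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    second = sum(k * k * pk for k, pk in enumerate(probs))
    return mean, second - mean ** 2

def closed_mean_var(n, b, r, c):
    """Closed forms (10.5) and (10.6) with p = b/(b+r) and alpha = c/(b+r)."""
    p = Fraction(b, b + r)
    alpha = Fraction(c, b + r)
    return n * p, n * p * (1 - p) * (1 + alpha * n) / (1 + alpha)

if __name__ == "__main__":
    print([polya_pmf(k, 4, 3, 5, 2) for k in range(5)])
    print(polya_mean_var(4, 3, 5, 2), closed_mean_var(4, 3, 5, 2))
```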

10.3 Pascal Distribution

From Chapter 7 we know that the negative binomial $NB(m, p)$ distribution with integer parameter $m$ is the same as the distribution of the sum of $m$ independent geometrically distributed $G(p)$ random variables. If $X \sim NB(m, p)$, then

(10.7)

The appearance of this class of distributions in this manner was, therefore, quite natural. Later, the family of distributions (10.7) was enlarged to the family of negative binomial distributions $NB(\alpha, p)$ for any positive parameter $\alpha$. Negative binomial distributions with integer parameter $\alpha$ are sometimes (especially if we want to emphasize that $\alpha$ is an integer) called the Pascal distributions.

10.4 Negative Hypergeometric Distribution

The Pascal distribution (i.e., the negative binomial distribution with integer parameter $\alpha$) has the following "urn and balls" interpretation. In an urn that contains black and red balls, suppose that the proportion of black balls is $p$. Balls are drawn from the urn with replacement, which means that we have the Pólya urn scheme with $c = 0$. The sampling process is considered to be completed when we get the $m$th red ball (i.e., the $m$th "failure"). Then, the number of black balls (i.e., the number of "successes") in our random sample has the distribution given in (10.7). Let us now consider an urn having $r$ red and $b$ black balls. Balls are drawn without replacement (i.e., the Pólya urn scheme with $c = -1$). The sampling process is once again considered to be completed when we get the $m$th red ball. Let $X$ be the number of black balls in our sample. Then, it is not difficult to see that $X$ takes on values $0, 1, \ldots, b$ with probabilities

Such a random variable $X$ is said to have the negative hypergeometric distribution with parameters $b = 1, 2, \ldots$, $r = 1, 2, \ldots$, and $m = 1, 2, \ldots, r$.

Exercise 10.1 Show that the mean and variance of the random variable above are given by

$$EX = \frac{mb}{r+1} \qquad (10.9)$$

and

$$\operatorname{Var} X = \frac{m(b+r+1)\,b\,(r-m+1)}{(r+1)^2(r+2)}, \qquad (10.10)$$

respectively.


Part II

CONTINUOUS DISTRIBUTIONS


CHAPTER 11

UNIFORM DISTRIBUTION

11.1 Introduction

The uniform distribution is the simplest of all continuous distributions, yet is one of the most important distributions in continuous distribution theory. As shown in Chapter 2, the uniform distribution arises naturally as a limiting form of discrete uniform distributions.

11.2

Notations

We say that the random variable X has the standard uniform distribution if its pdf is of the form 1 0

ifO a ) , we obtain the distribution with pdf ifa<x = zn.

n zd(z") = n+l '

(11.32)

(11.33) (11.34)

and

$$1 - EM_n = Em_n = \frac{1}{n+1} \to 0 \quad \text{as } n \to \infty. \qquad (11.35)$$

This shows that for large $n$, $1 - M_n$ and $m_n$ are close to zero. To examine the behavior of $m_n$ and $1 - M_n$ for large $n$, we will now determine the asymptotic distributions of these random variables, after suitable normalization. It turns out that

$$G_n\left(\frac{x}{n}\right) = P\left\{m_n < \frac{x}{n}\right\} = P\{n\,m_n < x\} = 1 - \left(1 - \frac{x}{n}\right)^n \to 1 - e^{-x} \quad \text{for } x > 0, \qquad (11.36)$$

as $n \to \infty$. Thus, we have established that the asymptotic distribution function of $n\min(X_1, X_2, \ldots, X_n)$ is given by

G ( x )=

1 - e-"

ifx 0,

(14.2)

and the corresponding pdf is

$$p_\alpha(x) = \alpha x^{\alpha-1}, \qquad 0 < x < 1, \quad \alpha > 0. \qquad (14.3)$$

In the special case when $\alpha = 1$, we have the standard uniform distribution. The linear transformation gives us a general form of the power distribution. The corresponding cdf is

$$G_\alpha(x) = \left(\frac{x-a}{h}\right)^{\alpha}, \qquad a < x < a + h, \quad \alpha > 0, \qquad (14.4)$$

and the pdf is

$$g_\alpha(x) = \alpha h^{-\alpha}(x - a)^{\alpha-1}, \qquad (14.5)$$

where $\alpha > 0$, $-\infty < a < \infty$ and $h > 0$ are the shape, location, and scale parameters of the power distribution, respectively. We will use the notation

$$X \sim Po(\alpha, a, h)$$

to denote the random variable $X$ having cdf (14.4) and pdf (14.5).

Remark 14.1 Let $U$ have the standard uniform $U(0, 1)$ distribution, and $X \sim Po(\alpha, a, h)$. Then,

$$X \stackrel{d}{=} a + hU^{1/\alpha}. \qquad (14.6)$$

14.3 Distributions of Maximal Values

Let X I , X 2 , . . . , X , be independent random variables having the same power Po(cu,a , h ) distribution, and M , = max(X1, X 2 , . . . , X,}. It is then easy to see that,

P { M , 5 x} m,

=

(xha)"'L, __

M,

N

a<x X-"}

=

1 - X-",

x >_ 1,

(15.1)

and (15.2)

A book-length account of Pareto distributions, discussing in great detail their various properties and applications, is available [Arnold (1983)].

15.2 Notations

A random variable $X$ with cdf (15.1) and pdf (15.2) is said to have the standard Pareto distribution. A linear transformation $Y = a + hX$ gives a general form of Pareto distributions with pdf

$$p_Y(x) = \begin{cases} \dfrac{\alpha h^{\alpha}}{(x-a)^{\alpha+1}} & \text{if } x \ge a + h, \\[1ex] 0 & \text{if } x < a + h, \end{cases} \qquad (15.3)$$

and cdf

$$F_Y(x) = 1 - \left(\frac{h}{x-a}\right)^{\alpha}, \qquad x \ge a + h, \quad -\infty < a < \infty, \quad h > 0. \qquad (15.4)$$

We will use the notation

a > o , -cw 0,

Pa(a,O,l ) ,

corresponds to the standard Pareto distribution with pdf (15.2) and cdf (15.1). Note that if $V \sim Po(\alpha, 0, 1)$, then

$$X = \frac{1}{V} \sim Pa(\alpha, 0, 1).$$

It is easy to see that if $Y \sim Pa(\alpha, a, h)$, then the following representation of $Y$ in terms of the standard uniform random variable $U$ is valid:

$$Y \stackrel{d}{=} a + hU^{-1/\alpha}. \qquad (15.5)$$

Exercise 15.1 Prove the relation in (15.5).
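Representation (15.5) is also a practical recipe for simulating Pareto variables by inverse transform sampling. The sketch below assumes the cdf $1 - (h/(x-a))^{\alpha}$ for $x \ge a + h$ and the mean $a + h\alpha/(\alpha - 1)$ for $\alpha > 1$; the seed, sample size and function names are illustrative choices and are not part of the text.

```python
import random

def sample_pareto(alpha, a, h, size, seed=0):
    """Draw from Pa(alpha, a, h) via Y = a + h * U^(-1/alpha) with U uniform on (0, 1]."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], which avoids raising 0 to a negative power.
    return [a + h * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(size)]

def pareto_cdf(x, alpha, a, h):
    """Assumed cdf of Pa(alpha, a, h): 1 - (h / (x - a))^alpha for x >= a + h, else 0."""
    return 0.0 if x < a + h else 1.0 - (h / (x - a)) ** alpha

if __name__ == "__main__":
    alpha, a, h = 3.0, 1.0, 2.0
    ys = sample_pareto(alpha, a, h, 200_000)
    print("sample mean:", sum(ys) / len(ys), "theoretical mean:", a + h * alpha / (alpha - 1))
    print("empirical cdf at 5:", sum(y <= 5.0 for y in ys) / len(ys),
          "theoretical:", pareto_cdf(5.0, alpha, a, h))
```

Comparing the empirical and theoretical values gives a quick plausibility check of the representation, although it is of course not a substitute for the proof requested in Exercise 15.1.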

15.3 Distributions of Minimal Values

Just as the power distributions have simple form for the distributions of maximal values $M_n = \max\{X_1, X_2, \ldots, X_n\}$, the Pareto distributions possess convenient expressions for minimal values. Let $X_1, X_2, \ldots, X_n$ be independent random variables, and let

$$X_k \sim Pa(\alpha_k, a, h), \qquad k = 1, 2, \ldots, n,$$

i.e., these random variables have the same location and scale parameters but have different values of the shape parameter $\alpha_k$. Now, let $m_n = \min\{X_1, X_2, \ldots, X_n\}$.

It is then easy to see that

$$P\{m_n > x\} = \prod_{k=1}^{n} P\{X_k > x\} = \left(\frac{h}{x-a}\right)^{\alpha(n)}, \qquad x \ge a + h, \qquad (15.6)$$

where

$$\alpha(n) = \alpha_1 + \alpha_2 + \cdots + \alpha_n.$$

We simply note from (15.6) that

$$m_n \sim Pa(\alpha(n), a, h).$$

Thus, the Pareto distributions are closed under minima.

DISTRIBUTIONS OF MINIMAL VALUES

135

Exercise 15.2 Let X I ,X2, . . . ,X , be a random sample from the standard Pareto Pa(a,O,1) distribution, and let X I , , < X Z , , < ... < X,,, be the corresponding order statistics. Then, from the pdf of x k , , ( k = 1 , 2 . . . , n) given by

pk,n(x) = ( I c

n! {F,(2)}k-1 (1 - F,(x)}n--kp,(2), 0 < 2 < 1; - l ) ! ( n - Ic)!

where F,(s) and pa(.) mulas:

are as in (15.1) and (15.2), derive the following for-

and

Exercise 15.3 Let X I ,X 2 , . . . ,X , be a random sample from the standard pareto Pa(a,O,1) distribution, and let X I , , < Xz,, < ... < X,,, be the corresponding order statistics. Further, let

Prove that W1,W2,.. . ,W, are independent random variables with tributed as P a ( ( n- k 1)a,0 , l ) .

+

w k

dis-

Exercise 15.4 By making use of the distributional result above, derive the expressions for $EX_{k,n}$ and $E(X_{k,n}^2)$ presented in Exercise 15.2.

Exercise 15.5 Let $V_1, V_2, \ldots, V_n$ be a random sample from the standard power $Po(\alpha, 0, 1)$ distribution, and let $V_{1,n} < V_{2,n} < \cdots < V_{n,n}$ be the corresponding order statistics. Further, let $X_1, X_2, \ldots, X_n$ be a random sample from the standard Pareto $Pa(\alpha, 0, 1)$ distribution, and let $X_{1,n} < X_{2,n} < \cdots < X_{n,n}$ be the corresponding order statistics. Then, show that

$$X_{k,n} \stackrel{d}{=} \frac{1}{V_{n-k+1,n}} \qquad \text{for } k = 1, 2, \ldots, n.$$

Exercise 15.6 By using the property in Exercise 15.5 along with the distributional result for order statistics from power function distribution presented before in Exercise 14.2, establish the result in Exercise 15.3.

136

PARETO DISTRIBUTION

15.4 Moments

+

Let X Pa(a,O,1) and Y = a hX P a ( a , a , h ) . Unlike the power dist,ributions which have bounded support and hence possess finite moments of all orders, the Pareto distribution Pa(cy,a , h ) takes on values in an infinite interval ( a h, m) and its moments &IP and central moments E ( Y - EY)O are finite only when ,B < a. N

N

+

-

Moments about zero: If X Pa(a,O,l ) ,then we know that X can be expressed in terms of the standard uniform random variable U as

Hence. an = E X " =EU-"I"

1

1

=

and

x-nla d x =

if n

a , = oc)

I n the general case when Y consequently, we obtain EYn

=

0,

a a-n

__

if n < a ,

(15.7)

2 a.

P a ( a , a , h ) ,the relation in (15.5) holds and =E(u

+ hU-'/O)"

(15.8)

a.nd a ,

= 00

for n

2 a. In particular, we have a1

EY

1

ha a-1

= a + __

if a > l

(15.9)

and (15.10)

ENTROPY

137

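Central moments:

The relation in (15.5) can also be used to find the central moments of a Pareto distribution, as follows:

$$\beta_n = E(Y - EY)^n = h^n E\left(U^{-1/\alpha} - EU^{-1/\alpha}\right)^n = h^n \sum_{r=0}^{n} (-1)^{n-r} \binom{n}{r} \left(\frac{\alpha}{\alpha - 1}\right)^{n-r} \frac{\alpha}{\alpha - r} \tag{15.11}$$

for $n < \alpha$. From (15.11) we find the variance of $Y$ to be

$$\mathrm{Var}\,Y = \beta_2 = \frac{h^2\alpha}{(\alpha - 1)^2(\alpha - 2)}. \tag{15.12}$$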

Plots of Pareto density function are presented in Figure 15.1 for some choices of $\alpha$.

15.5 Entropy

The entropy $H(Y)$ of $Y \sim \mathrm{Pa}(\alpha,a,h)$ is defined by

$$H(Y) = -\int_{a+h}^{\infty} p(x) \log p(x)\,dx,$$

where $p(x)$ is as given in (15.3).

Exercise 15.7 If $Y \sim \mathrm{Pa}(\alpha,a,h)$, show that

$$H(Y) = \log h - \log \alpha + 1 + \frac{1}{\alpha}. \tag{15.13}$$

Figure 15.1. Plots of Pareto density function (curves shown: Pareto(0.5), Pareto(2), Pareto(4); horizontal axis x)

CHAPTER 16

BETA DISTRIBUTION

16.1 Introduction

Let a stick of length 1 be broken at random into $(n+1)$ pieces. In other words, this means that $n$ breaking points $U_1, \ldots, U_n$ of the unit interval are taken independently from the uniform $U(0,1)$ distribution. Arranging these random coordinates $U_1, \ldots, U_n$ in increasing order of magnitude, we obtain the uniform order statistics

$$U_{1,n} \le U_{2,n} \le \cdots \le U_{n,n}$$

[see (11.39)], where, for instance,

$$U_{1,n} = \min\{U_1, \ldots, U_n\} \quad \text{and} \quad U_{n,n} = \max\{U_1, \ldots, U_n\}.$$

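The cdf $F_{k,n}(x)$ of $U_{k,n}$ was obtained earlier in (11.41) as

$$F_{k,n}(x) = I_x(k, n-k+1), \qquad 0 < x < 1,$$

where

$$I_x(a,b) = \frac{1}{B(a,b)} \int_0^x t^{a-1}(1-t)^{b-1}\,dt \tag{16.1}$$

denotes the incomplete beta function, and

$$B(a,b) = \int_0^1 t^{a-1}(1-t)^{b-1}\,dt \tag{16.2}$$

is the complete beta function. The corresponding pdf $f_{k,n}(x)$ of $U_{k,n}$ has the form

$$f_{k,n}(x) = \frac{n!}{(k-1)!\,(n-k)!}\,x^{k-1}(1-x)^{n-k}, \qquad 0 < x < 1. \tag{16.3}$$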
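As a quick numerical sanity check of the density in (16.3), the sketch below evaluates it and integrates it with a simple midpoint rule. It is only an illustration: the function names are hypothetical, and the reference value $k/(n+1)$ for the mean is the standard mean of a beta distribution with parameters $k$ and $n-k+1$.

```python
from math import factorial


def uniform_order_stat_pdf(x: float, k: int, n: int) -> float:
    """Density of the k-th uniform order statistic U_{k,n} on (0, 1)."""
    c = factorial(n) / (factorial(k - 1) * factorial(n - k))
    return c * x ** (k - 1) * (1.0 - x) ** (n - k)


def midpoint_integral(f, lo: float, hi: float, steps: int = 20000) -> float:
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    w = (hi - lo) / steps
    return w * sum(f(lo + (i + 0.5) * w) for i in range(steps))


k, n = 2, 5
total = midpoint_integral(lambda x: uniform_order_stat_pdf(x, k, n), 0.0, 1.0)
mean = midpoint_integral(lambda x: x * uniform_order_stat_pdf(x, k, n), 0.0, 1.0)
print(total, mean)  # total is close to 1, mean is close to k / (n + 1)
```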

16.2 Notations

We say that the order statistic $U_{k,n}$ has the beta distribution with shape parameters $k$ and $n-k+1$. More generally, we say that a random variable $X$ has the standard beta distribution with parameters $p > 0$ and $q > 0$ if its pdf is given by

$$p_X(x) = \frac{1}{B(p,q)}\,x^{p-1}(1-x)^{q-1}, \qquad 0 < x < 1.$$

A linear transformation $Y = a + hX$, $-\infty < a < \infty$, $h > 0$, yields a random variable with the general beta distribution on the interval $(a, a+h)$.

If $p > 1$ and $q > 1$, then the beta$(p,q)$ distribution is unimodal, and its mode (the point at which the density function $p_X(x)$ takes its maximal value) is at $x = (p-1)/(p+q-2)$, and the density function in this case is a bell-shaped curve. If $p < 1$ (or $q < 1$), then $p_X(x)$ tends to infinity when $x \to 0$ (or when $x \to 1$). If $p < 1$ and $q < 1$, then $p_X(x)$ has a U-shaped form and it tends to infinity when $x \to 0$ as well as when $x \to 1$. If $p = 1$ and $q = 1$, then $p_X(x) = 1$ for all $0 < x < 1$ (which, as noted earlier, is the standard uniform density). Plots of beta density function are presented in Figures 16.1-16.3 for different choices of $p$ and $q$.

16.4 Some Transformations

It is quite easy to see that if $X \sim$ beta$(p,q)$, then $1 - X \sim$ beta$(q,p)$. Now, let us find the distribution of the random variable $V = 1/X$.

Exercise 16.1 Show that

$$p_V(x) = \frac{1}{B(p,q)}\,\frac{(x-1)^{q-1}}{x^{p+q}}, \qquad x > 1. \tag{16.6}$$

Taking $q = 1$ in (16.6), we obtain the pdf of the Pareto $\mathrm{Pa}(p,0,1)$ distribution. The density function $p_W(x)$ of the random variable

$$W = V - 1 = \frac{1 - X}{X}$$

takes on the form

$$p_W(x) = \frac{1}{B(p,q)}\,\frac{x^{q-1}}{(1+x)^{p+q}}, \qquad x > 0. \tag{16.7}$$

The distribution with pdf (16.7) is sometimes called the beta distribution of the second kind.

16.5 Moments

Since the beta distribution has bounded support, all its moments exist. Consider the standard beta distributed random variable $X \sim$ beta$(p,q)$. Since $0 \le X \le 1$, we can conclude that

$$EX^{\alpha} \le 1 \quad \text{and} \quad E|X - EX|^{\alpha} \le 1$$

for any $\alpha \ge 0$.

Figure 16.1. Plots of beta density function when p = 0.5 (curves shown: Beta(0.5, 0.5), Beta(0.5, 2), Beta(0.5, 4); horizontal axis x)

Figure 16.2. Plots of beta density function when p = 2.0

Figure 16.3. Plots of beta density function when p = 4.0

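Moments about zero: Let $X \sim$ beta$(p,q)$. Then

$$\alpha_n = EX^n = \frac{B(p+n,q)}{B(p,q)} = \frac{\Gamma(p+n)\,\Gamma(p+q)}{\Gamma(p)\,\Gamma(p+q+n)} \tag{16.8}$$

and, in particular,

$$EX = \frac{p}{p+q} \tag{16.9}$$

and

$$EX^2 = \frac{p(p+1)}{(p+q)(p+q+1)}. \tag{16.10}$$

For $Y = a + hX$, we have

$$EY = a + hEX = a + \frac{hp}{p+q} \tag{16.13}$$

and

$$EY^2 = a^2 + 2ahEX + h^2EX^2. \tag{16.14}$$

Central moments: If $X \sim$ beta$(p,q)$, then we readily find from (16.9) and (16.10) the variance of $X$ to be

$$\mathrm{Var}\,X = \frac{pq}{(p+q)^2(p+q+1)}.$$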

0, enlarge the

(18.3)


EXPONENTIAL DISTRIBUTION

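Note that $F_Y(x)$ in (18.3) can be rewritten as

$$F_Y(x) = \max\left\{0,\ 1 - \exp\left(-\frac{x-a}{\lambda}\right)\right\}, \qquad -\infty < x < \infty,$$

and the corresponding pdf is

$$p_Y(x) = \frac{1}{\lambda}\exp\left(-\frac{x-a}{\lambda}\right) \ \text{ for } x \ge a, \qquad p_Y(x) = 0 \ \text{ for } x < a, \tag{18.5}$$

where $-\infty < a < \infty$ and $\lambda > 0$; we write $Y \sim E(a,\lambda)$. In many situations, we deal with the nonshifted ($a = 0$) exponential distribution $E(0,\lambda)$. For the sake of simplicity, we will denote it by $Y \sim E(\lambda)$, and in this case

$$F_Y(x) = \max\left\{0,\ 1 - \exp\left(-\frac{x}{\lambda}\right)\right\} \tag{18.6}$$

and

$$p_Y(x) = \frac{1}{\lambda}\exp\left(-\frac{x}{\lambda}\right) \ \text{ for } x \ge 0, \qquad p_Y(x) = 0 \ \text{ for } x < 0. \tag{18.7}$$

Note that if $X \sim E(1)$, then $Y = \lambda X \sim E(\lambda)$.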

Exercise 18.1 Let $X \sim E(1)$ and $Y \sim E(\lambda)$ be independent random variables. Then, find the value of the parameter $\lambda$ such that $P\{X \ge Y\} =$

i.

Exercise 18.2 Let $X$ and $Y$ be independent standard exponential random variables. Find the distribution of $X/Y$.

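18.3 Laplace Transform and Characteristic Function

If $Y \sim E(\lambda)$, then its Laplace transform $\varphi_Y(s) = Ee^{-sY}$ has the following form:

$$\varphi_Y(s) = \frac{1}{1 + \lambda s}. \tag{18.8}$$

If $V = a + Y$, then $V \sim E(a,\lambda)$, and consequently,

$$\varphi_V(s) = \frac{e^{-as}}{1 + \lambda s}. \tag{18.9}$$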

To obtain the characteristic function of an exponentially distributed random variable, we can use the following relation between Laplace transforms $\varphi_X(s) = Ee^{-sX}$ and characteristic functions $f_X(t) = Ee^{itX}$:

$$f_X(t) = \varphi_X(-it). \tag{18.10}$$

Using (18.10) and the expression for the Laplace transform of the exponential distribution given above, we can write the characteristic functions of $X \sim E(1)$, $Y \sim E(\lambda)$, and $V \sim E(a,\lambda)$ as

$$f_X(t) = \frac{1}{1 - it}, \tag{18.11}$$

$$f_Y(t) = \frac{1}{1 - i\lambda t}, \tag{18.12}$$

$$f_V(t) = \frac{e^{iat}}{1 - i\lambda t}, \tag{18.13}$$

respectively.

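18.4 Moments

The exponential decay of the pdf (18.5) ensures the existence of all the moments of the exponential distribution.

Moments about zero: If $Y \sim E(\lambda)$, then

$$\alpha_n = EY^n = \frac{1}{\lambda}\int_0^{\infty} x^n \exp\left(-\frac{x}{\lambda}\right) dx = \lambda^n \int_0^{\infty} x^n e^{-x}\,dx = \lambda^n\,\Gamma(n+1) = \lambda^n\,n!, \qquad n = 1, 2, \ldots. \tag{18.14}$$

In particular, we have

$$EY = \lambda, \tag{18.15}$$

$$EY^2 = 2\lambda^2, \tag{18.16}$$

$$EY^3 = 6\lambda^3, \tag{18.17}$$

$$EY^4 = 24\lambda^4. \tag{18.18}$$

In the general case when $V \sim E(a,\lambda)$, we obtain

$$EV^n = E(a + Y)^n = \sum_{r=0}^{n} \binom{n}{r} a^{n-r}\,EY^r = n! \sum_{r=0}^{n} \frac{\lambda^r a^{n-r}}{(n-r)!}, \qquad n = 1, 2, \ldots. \tag{18.19}$$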

Central moments:

The central moments coincide for random variables $V \sim E(a,\lambda)$ and $Y \sim E(\lambda)$. Let $X$ have the standard exponential $E(1)$ distribution. Then

$$V - EV \stackrel{d}{=} Y - EY \stackrel{d}{=} \lambda(X - EX),$$

and therefore

$$\beta_n = E(V - EV)^n = E(Y - \lambda)^n = \lambda^n E(X - 1)^n = \lambda^n \sum_{r=0}^{n} (-1)^r \binom{n}{r} EX^{n-r} = \lambda^n\,n! \sum_{r=0}^{n} \frac{(-1)^r}{r!}, \qquad n = 1, 2, \ldots. \tag{18.20}$$

We can show from (18.20) that the central moments $\beta_n$ satisfy the following recurrence relations:

$$\beta_1 = 0, \qquad \beta_{n+1} = (n+1)\lambda\beta_n + (-1)^{n+1}\lambda^{n+1}, \qquad n = 1, 2, \ldots. \tag{18.21}$$

In particular, we obtain from (18.20) the variance of $Y$ to be

$$\mathrm{Var}\,Y = \beta_2 = \lambda^2. \tag{18.22}$$

18.5

Let $Y \sim E(\lambda)$. Then, from (18.7), we see readily that the distribution is unimodal with the mode at 0. It has an exponential decay form. Also, from (18.20), we obtain the third and fourth central moments of $Y$ as

$$\beta_3 = 2\lambda^3 \quad \text{and} \quad \beta_4 = 9\lambda^4.$$

We then readily find Pearson's measures of skewness and kurtosis as

$$\gamma_1 = \frac{\beta_3}{\beta_2^{3/2}} = 2 \quad \text{and} \quad \gamma_2 = \frac{\beta_4}{\beta_2^{2}} = 9,$$

respectively. Thus, the exponential distribution is a positively skewed and leptokurtic distribution which is reverse J-shaped. Plots of exponential density function are presented in Figure 18.1 for some choices of $\lambda$.
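The closed form (18.20) and the recurrence (18.21) for the central moments of $E(\lambda)$ can be compared exactly with rational arithmetic. The sketch below is illustrative only; the function names are hypothetical, and $\lambda$ is passed as a `Fraction` so that no rounding is involved.

```python
from fractions import Fraction
from math import factorial


def central_moment_closed(n: int, lam: Fraction = Fraction(1)) -> Fraction:
    """beta_n = lam^n * n! * sum_{r=0}^{n} (-1)^r / r!  (closed form)."""
    s = sum((Fraction((-1) ** r, factorial(r)) for r in range(n + 1)), Fraction(0))
    return lam ** n * factorial(n) * s


def central_moments_recursive(n_max: int, lam: Fraction = Fraction(1)) -> dict:
    """beta_1 = 0, beta_{n+1} = (n+1)*lam*beta_n + (-1)^(n+1) * lam^(n+1)."""
    betas = {1: Fraction(0)}
    for n in range(1, n_max):
        betas[n + 1] = (n + 1) * lam * betas[n] + (-1) ** (n + 1) * lam ** (n + 1)
    return betas


lam = Fraction(3, 2)
betas = central_moments_recursive(6, lam)
print(betas[2] == lam ** 2, betas[3] == 2 * lam ** 3, betas[4] == 9 * lam ** 4)
```

Both routes give $\beta_2 = \lambda^2$, $\beta_3 = 2\lambda^3$ and $\beta_4 = 9\lambda^4$, which are the values used for the skewness and kurtosis.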


Figure 18.1. Plots of exponential density function


18.6 Entropy

Since the random variable $V \sim E(a,\lambda)$ has pdf as given in (18.5), its entropy $H(V)$ is defined by

$$H(V) = -\int_a^{\infty} \frac{1}{\lambda}\exp\left(-\frac{x-a}{\lambda}\right) \left(-\log\lambda - \frac{x-a}{\lambda}\right) dx = 1 + \log\lambda. \tag{18.23}$$

Consider the set of all probability density functions $p(x)$ satisfying the following restrictions:

(a) $p(x) > 0$ for $x \ge 0$ and $p(x) = 0$ for $x < 0$;

(b) $\int_0^{\infty} x\,p(x)\,dx = C$, where $C$ is a positive constant.

It happens that the maximal value of entropy for this set of pdf's is attained for

$$p(x) = \frac{1}{C}\exp\left(-\frac{x}{C}\right), \qquad x \ge 0,$$

that is, for an exponential distribution with mean $C$.

18.7 Distributions of Minima

Let $Y_k \sim E(\lambda_k)$, $k = 1, 2, \ldots, n$, be independent random variables, and $m_n = \min\{Y_1, \ldots, Y_n\}$.

Exercise 18.3 Prove that $m_n$ (for any $n = 1, 2, \ldots$) also has the exponential $E(\lambda)$ distribution, where

$$\frac{1}{\lambda} = \frac{1}{\lambda_1} + \cdots + \frac{1}{\lambda_n}.$$

The statement of Exercise 18.3 enables us to show that

$$\min\{Y_1, \ldots, Y_n\} \stackrel{d}{=} \frac{Y_1}{n}, \qquad n = 1, 2, \ldots, \tag{18.24}$$

when $Y_1, Y_2, \ldots$ are independent and identically distributed as $E(\lambda)$. It is of interest to mention that property (18.24) characterizes the exponential $E(\lambda)$ ($\lambda > 0$) distribution.
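The distribution of the minimum can be checked directly from survival functions, since $P\{\min_k Y_k > x\} = \prod_k P\{Y_k > x\}$ for independent variables. The following is a minimal sketch, assuming the scale parameterization of this chapter (so that $E(\lambda)$ has mean $\lambda$); the function names are hypothetical.

```python
from math import exp, isclose, prod


def survival(x: float, lam: float) -> float:
    """P{Y > x} for Y ~ E(lam), where lam is the scale (mean) parameter."""
    return exp(-x / lam) if x > 0 else 1.0


def min_survival(x: float, scales) -> float:
    """P{min(Y_1, ..., Y_n) > x} for independent Y_k ~ E(scales[k])."""
    return prod(survival(x, lam) for lam in scales)


def min_scale(scales) -> float:
    """Scale of the minimum: the reciprocals of the scales add up."""
    return 1.0 / sum(1.0 / lam for lam in scales)


scales = [1.0, 2.0, 4.0]
print(isclose(min_survival(0.7, scales), survival(0.7, min_scale(scales))))
```

With equal scales $\lambda_1 = \cdots = \lambda_n = \lambda$ the helper returns $\lambda/n$, in agreement with (18.24).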

18.8 Uniform and Exponential Order Statistics

In Chapter 11 [see (11.39)] we introduced the uniform order statistics

$$U_{1,n} \le U_{2,n} \le \cdots \le U_{n,n}$$

arising from independent and identically distributed random variables $U_k \sim U(0,1)$, $k = 1, 2, \ldots, n$. The analogous ordering of independent exponential $E(1)$ random variables $X_1, X_2, \ldots, X_n$ gives us the exponential order statistics

$$X_{1,n} \le X_{2,n} \le \cdots \le X_{n,n}.$$

Note that

$$X_{1,n} = \min\{X_1, X_2, \ldots, X_n\} \quad \text{and} \quad X_{n,n} = \max\{X_1, X_2, \ldots, X_n\}.$$

There exist useful representations of uniform as well as exponential order statistics in terms of sums of independent exponential random variables. To be specific, let $X_1, X_2, \ldots$ be independent exponential $E(1)$ random variables, and

$$S_k = X_1 + X_2 + \cdots + X_k, \qquad k = 1, 2, \ldots.$$

Then, the following relations are valid for any $n = 1, 2, \ldots$:

$$\{U_{1,n}, \ldots, U_{n,n}\} \stackrel{d}{=} \left\{\frac{S_1}{S_{n+1}}, \ldots, \frac{S_n}{S_{n+1}}\right\} \tag{18.25}$$

and

$$\{X_{1,n}, \ldots, X_{n,n}\} \stackrel{d}{=} \left\{\frac{X_1}{n},\ \frac{X_1}{n} + \frac{X_2}{n-1},\ \ldots,\ \frac{X_1}{n} + \frac{X_2}{n-1} + \cdots + X_n\right\}. \tag{18.26}$$

Note that relation (18.24) is a special case of (18.26). The distributional relations in (18.25) and (18.26) enable us to obtain some interesting corollaries. For example, it is easy to see that the exponential spacings

$$D_{1,n} = X_{1,n}, \quad D_{2,n} = X_{2,n} - X_{1,n}, \quad \ldots, \quad D_{n,n} = X_{n,n} - X_{n-1,n}$$

are independent, and that

$$(n-k+1)\,D_{k,n} \sim E(1).$$

It also follows from (18.26) that

$$EX_{k,n} = \frac{1}{n} + \frac{1}{n-1} + \cdots + \frac{1}{n-k+1}, \qquad 1 \le k \le n. \tag{18.27}$$

Upon setting $k = n = 1$ in (18.25), we obtain the following interesting result: If $X_1$ and $X_2$ are independent exponential $E(1)$ random variables, then $Z = X_1/(X_1 + X_2)$ has the uniform $U(0,1)$ distribution.
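The expected values in (18.27) are partial harmonic sums and are easy to tabulate exactly. The sketch below does this with rational arithmetic and, as an independent check for the maximum, integrates $1 - (1 - e^{-x})^n$ numerically (the tail formula for the mean of a nonnegative variable). The function names, the truncation point and the step count are assumptions made for illustration.

```python
from fractions import Fraction
from math import exp


def exp_order_stat_mean(k: int, n: int) -> Fraction:
    """E X_{k,n} = 1/n + 1/(n-1) + ... + 1/(n-k+1) for standard exponential samples."""
    return sum((Fraction(1, n - j) for j in range(k)), Fraction(0))


def expected_max_numeric(n: int, upper: float = 60.0, steps: int = 120000) -> float:
    """Midpoint-rule value of E X_{n,n} = integral over (0, upper) of 1 - (1 - e^{-x})^n."""
    w = upper / steps
    return w * sum(1.0 - (1.0 - exp(-(i + 0.5) * w)) ** n for i in range(steps))


print(exp_order_stat_mean(3, 3))   # 11/6
print(expected_max_numeric(3))     # close to 1.8333...
```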

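Exercise 18.4 Using the representation in (18.26), show that the variance of $X_{k,n}$ is given by

$$\mathrm{Var}\,X_{k,n} = \frac{1}{n^2} + \frac{1}{(n-1)^2} + \cdots + \frac{1}{(n-k+1)^2}, \qquad 1 \le k \le n,$$

and that the covariance of $X_{k,n}$ and $X_{\ell,n}$ is simply

$$\mathrm{Cov}(X_{k,n}, X_{\ell,n}) = \mathrm{Var}\,X_{k,n}, \qquad 1 \le k < \ell \le n.$$

18.9 Convolutions

Let $Y_1 \sim E(\lambda_1)$ and $Y_2 \sim E(\lambda_2)$ be independent random variables, and $V = Y_1 + Y_2$.

Exercise 18.5 Show that the pdf $p_V(x)$ of $V$ has the following form:

$$p_V(x) = \frac{e^{-x/\lambda_1} - e^{-x/\lambda_2}}{\lambda_1 - \lambda_2}, \qquad x > 0, \tag{18.28}$$

if $\lambda_1 \ne \lambda_2$, and

$$p_V(x) = \frac{x}{\lambda^2}\,e^{-x/\lambda}, \qquad x \ge 0, \tag{18.29}$$

if $\lambda_1 = \lambda_2 = \lambda$.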
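A short numerical check can make the two cases of the convolution density concrete. The sketch below is only an illustration under the scale parameterization of this chapter; the function names and the integration range are assumptions. It confirms that the density integrates to 1, that its mean is $\lambda_1 + \lambda_2$, and that the unequal-scale formula approaches the equal-scale one as $\lambda_2 \to \lambda_1$.

```python
from math import exp


def exp_sum_pdf(x: float, lam1: float, lam2: float) -> float:
    """Density of Y1 + Y2 for independent Y1 ~ E(lam1), Y2 ~ E(lam2) (scale parameters)."""
    if x <= 0.0:
        return 0.0
    if lam1 == lam2:
        return x / lam1 ** 2 * exp(-x / lam1)
    return (exp(-x / lam1) - exp(-x / lam2)) / (lam1 - lam2)


def midpoint_integral(f, lo: float, hi: float, steps: int = 100000) -> float:
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    w = (hi - lo) / steps
    return w * sum(f(lo + (i + 0.5) * w) for i in range(steps))


total = midpoint_integral(lambda x: exp_sum_pdf(x, 1.0, 2.0), 0.0, 80.0)
mean = midpoint_integral(lambda x: x * exp_sum_pdf(x, 1.0, 2.0), 0.0, 80.0)
print(total, mean)  # close to 1 and to lam1 + lam2 = 3
```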

Equation (18.29) gives the distribution of the sum of two independent random variables having the same exponential distribution. Now, let us consider the sum

$$V_n = X_1 + \cdots + X_n$$

of $n$ independent exponential random variables. For the sake of simplicity, we will suppose that $X_k \sim E(1)$, $k = 1, 2, \ldots, n$. It follows from (18.8) that the Laplace transform $\varphi(s)$ of any $X_k$ has the form

$$\varphi(s) = \frac{1}{1+s}. \tag{18.30}$$

Then the Laplace transform $\varphi_n(s)$ of the sum $V_n$, which also takes on only positive values, is given by

$$\varphi_n(s) = \int_0^{\infty} e^{-sx}\,p_n(x)\,dx = (1+s)^{-n}, \tag{18.31}$$

where $p_n(x)$ denotes the pdf of $V_n$. Comparing (18.30) and (18.31), we can readily see that

$$\varphi_n(s) = \frac{(-1)^{n-1}}{(n-1)!}\,\frac{d^{\,n-1}}{ds^{\,n-1}}\,\varphi(s)$$

and, hence,

$$\int_0^{\infty} e^{-sx}\,p_n(x)\,dx = \frac{1}{(n-1)!}\int_0^{\infty} x^{n-1} e^{-x} e^{-sx}\,dx. \tag{18.32}$$

It then follows from (18.32) that

$$p_n(x) = \frac{x^{n-1} e^{-x}}{(n-1)!}, \qquad x > 0. \tag{18.33}$$

The probability density function in (18.33) is a special case of gamma distributions, which are discussed in detail in Chapter 20.

Exercise 18.6 Let $X_1$ and $X_2$ be independent random variables having the standard $E(1)$ distribution, and $W = X_1 - X_2$. Show that the pdf of $W$ has the form

$$p_W(x) = \frac{1}{2}\,e^{-|x|}, \qquad -\infty < x < \infty. \tag{18.34}$$

The probability density function in (18.34) corresponds to the standard Laplace distribution, which is discussed in detail in Chapter 19.

18.10 Decompositions

Let $X \sim E(1)$. We will now present two different ways to express $X$ as a sum of two independent nondegenerate random variables.

(1) Consider random variables $V = [X]$ and $U = \{X\}$, which are the integer and fractional parts of $X$, respectively; for example, $V = n$ and $U = x$ if $X = n + x$, where $n = 0, 1, 2, \ldots$ and $0 \le x < 1$. It is evident that

$$X = V + U. \tag{18.35}$$

Let us show that the terms on the RHS of (18.35) are independent. It is not difficult to check that $V$ takes on values $0, 1, 2, \ldots$ with probabilities

$$p_n = P\{V = n\} = P\{n \le X < n+1\} = (1 - \lambda)\lambda^n, \tag{18.36}$$

where $\lambda = 1/e$. This simply means that $V$ has the geometric $G(1/e)$ distribution. We see that the fractional part of $X$ takes values on the interval $[0, 1)$ and

$$F_U(x) = P\{U \le x\} = \sum_{n=0}^{\infty} P\{n \le X \le n + x\} = \sum_{n=0}^{\infty} e^{-n}\left(1 - e^{-x}\right) = \frac{1 - e^{-x}}{1 - e^{-1}}, \qquad 0 \le x \le 1.$$

To check the independence of random variables $V$ and $U$, we must show that for any $n = 0, 1, 2, \ldots$ and $0 \le x \le 1$, the following condition holds:

$$P\{V = n,\ U \le x\} = p_n F_U(x). \tag{18.38}$$

Relation (18.38) becomes evident once we note that

$$P\{V = n,\ U \le x\} = P\{n \le X < n + x\} = e^{-n}\left(1 - e^{-x}\right).$$
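The factorization behind the independence of the integer and fractional parts of a standard exponential variable can be verified numerically. The sketch below is illustrative only; the helper names are hypothetical. It compares the joint probability $P\{n \le X < n + x\}$, computed from the exponential cdf, with the product of the geometric probability and the cdf of the fractional part.

```python
from math import exp, isclose


def p_int(n: int) -> float:
    """P{V = n} for V = [X], X ~ E(1): geometric with lambda = 1/e."""
    lam = exp(-1.0)
    return (1.0 - lam) * lam ** n


def cdf_frac(x: float) -> float:
    """F_U(x) for the fractional part U = {X}, 0 <= x <= 1."""
    return (1.0 - exp(-x)) / (1.0 - exp(-1.0))


def joint(n: int, x: float) -> float:
    """P{V = n, U <= x} = P{n <= X < n + x}, from the E(1) cdf."""
    return exp(-n) - exp(-(n + x))


ok = all(
    isclose(joint(n, x), p_int(n) * cdf_frac(x), rel_tol=1e-9)
    for n in range(6)
    for x in (0.1, 0.5, 0.9, 1.0)
)
print(ok)
```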

Thus, we have expressed $X$ having the standard exponential distribution as a sum of two independent random variables. Representation (18.35) is also valid for $Y \sim E(\lambda)$, $\lambda > 0$. In the general case, when $Y \sim E(a,\lambda)$, the following general form of (18.35) holds:

$$Y = a + [Y - a] + \{Y - a\},$$

where the integer and fractional parts ($[Y - a]$ and $\{Y - a\}$) of the random variable $Y - a$ are independent.

(2) Along with the random variable $X \sim E(1)$, let us consider two independent random variables $Y_1$ and $Y_2$ with a common pdf

$$g(x) = \frac{1}{\sqrt{\pi x}}\,e^{-x}, \qquad x > 0. \tag{18.39}$$

It is easy to prove that the nonnegative function $g(x)$ above is really a probability density function because of the fact that

$$\int_0^{\infty} x^{-1/2} e^{-x}\,dx = \Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}.$$

Exercise 18.7 Show that the sum $Y_1 + Y_2$ has the exponential $E(1)$ distribution.

Thus, we have established that the exponential distributions are decomposable. Moreover, in Chapter 20 we show that any exponential distribution is infinitely divisible.

18.11 Lack of Memory Property

In Chapter 6 [see (6.39)], we established that the geometric $G(p)$ random variable $X$ for any $n = 0, 1, \ldots$ and $m = 0, 1, \ldots$ satisfies the following equality:

$$P\{X \ge n + m \mid X \ge m\} = P\{X \ge n\}.$$

Furthermore, among all discrete distributions taking on integer values, the geometric distribution is the only distribution that possesses this "lack of memory" property. Now, let us consider $Y \sim E(\lambda)$, $\lambda > 0$. It is not difficult to see that the equality

$$P\{Y \ge x + y \mid Y \ge y\} = P\{Y \ge x\} \tag{18.41}$$

holds for any $x \ge 0$ and $y \ge 0$. This "lack of memory" property characterizes the exponential $E(\lambda)$ distribution; that is, if a nonnegative random variable $Y$ with cdf $F(x)$ [such that $F(x) < 1$ for any $x > 0$] satisfies (18.41), then

$$F(x) = 1 - e^{-x/\lambda}, \qquad x > 0,$$

for some positive constant $\lambda$.


CHAPTER 19

LAPLACE DISTRIBUTION

19.1 Introduction

In Chapter 18 (see Exercise 18.6) we considered the difference $V = X_1 - X_2$ of two independent random variables, with $X_1$ and $X_2$ having the standard exponential $E(1)$ distribution. It was mentioned there that the pdf of $V$ is of the form

$$p_V(x) = \frac{1}{2}\,e^{-|x|}, \qquad -\infty < x < \infty.$$

This distribution is one of the earliest in probability theory, and it was introduced by Laplace (1774). A book-length account of Laplace distributions, discussing in great detail their various properties and applications, is available and is due to Kotz et al. (2001).

19.2 Notations

We say that a random variable $X$ has the Laplace distribution if its pdf $p_X(x)$ is given by

$$p_X(x) = \frac{1}{2\lambda}\exp\left(-\frac{|x - a|}{\lambda}\right), \qquad -\infty < x < \infty. \tag{19.1}$$

We use $X \sim L(a,\lambda)$ to indicate that $X$ has the Laplace distribution in (19.1) with a location parameter $a$ and a scale parameter $\lambda$ ($-\infty < a < \infty$, $\lambda > 0$). In the special case when $a = 0$, so that $X$ has a distribution symmetric about zero with the pdf

$$p_X(x) = \frac{1}{2\lambda}\exp\left(-\frac{|x|}{\lambda}\right), \qquad -\infty < x < \infty, \tag{19.2}$$

we denote it by $X \sim L(\lambda)$ for the sake of simplicity. For instance, $V \sim L(1)$ denotes that $V$ has the standard Laplace distribution with pdf

$$p_V(x) = \frac{1}{2}\,e^{-|x|}, \qquad -\infty < x < \infty, \tag{19.3}$$

and its cdf $F_V(x)$ has the form

$$F_V(x) = \frac{1}{2}\,e^{x} \ \text{ if } x \le 0, \qquad F_V(x) = 1 - \frac{1}{2}\,e^{-x} \ \text{ if } x > 0. \tag{19.4}$$

Indeed, if $V \sim L(1)$, then $Y = \lambda V \sim L(\lambda)$, and $X = a + \lambda V \sim L(a,\lambda)$. Laplace distributions are also known under different names in the literature: the first law of Laplace (the second law of Laplace is the standard normal distribution), double exponential (the name double exponential or, sometimes, doubly exponential is also applied to the distribution having the cdf

$$F(x) = \exp\left(-e^{-x}\right), \qquad -\infty < x < \infty,$$

which is one of the three types of extreme value distributions), two-tailed exponential, and bilateral exponential distributions.

19.3 Characteristic Function

Recall that if $V \sim L(1)$, it can be represented as $V = X_1 - X_2$, where $X_1$ and $X_2$ are independent random variables and have the standard exponential $E(1)$ distribution. Consequently, the characteristic function of $V$ is

$$f_V(t) = Ee^{itV} = Ee^{itX_1}\,Ee^{-itX_2}$$

and, therefore, we obtain from (18.11) that

$$f_V(t) = \frac{1}{(1 - it)(1 + it)} = \frac{1}{1 + t^2}. \tag{19.5}$$

Since the linear transformation $X = a + \lambda V$ leads to the Laplace $L(a,\lambda)$ distribution, we readily get

$$f_X(t) = \frac{e^{iat}}{1 + \lambda^2 t^2}. \tag{19.6}$$

It is of interest to mention here that (19.5) gives a simple way to obtain the characteristic function of the Cauchy distribution [see also (12.6) and (12.7)]. In fact, (19.5) is equivalent to the following inverse Fourier transform, which relates the characteristic function $f_V(t)$ with the probability density function $p_V(x)$:

$$\frac{1}{2}\,e^{-|x|} = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx} f_V(t)\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{e^{-itx}}{1 + t^2}\,dt. \tag{19.7}$$

It then follows from (19.7) that

$$e^{-|t|} = \int_{-\infty}^{\infty} e^{itx}\,\frac{1}{\pi(1 + x^2)}\,dx. \tag{19.8}$$

Now we can see from (19.8) that the Cauchy $C(0,1)$ distribution with pdf

$$p(x) = \frac{1}{\pi(1 + x^2)}, \qquad -\infty < x < \infty,$$

has characteristic function

$$f(t) = e^{-|t|}.$$

19.4 Moments

The exponential rate of decrease of the pdf in (19.1) entails the existence of all the moments of the Laplace distribution.

Moments about zero: Let $Y \sim L(\lambda)$. Since $Y$ has a symmetric distribution, we readily have

$$\alpha_{2n-1} = EY^{2n-1} = 0, \qquad n = 1, 2, \ldots.$$

For moments of even order, we have

$$\alpha_{2n} = EY^{2n} = \frac{1}{2\lambda}\int_{-\infty}^{\infty} x^{2n}\exp\left(-\frac{|x|}{\lambda}\right) dx = \frac{1}{\lambda}\int_0^{\infty} x^{2n}\exp\left(-\frac{x}{\lambda}\right) dx = \lambda^{2n}\int_0^{\infty} x^{2n} e^{-x}\,dx = \lambda^{2n}\,\Gamma(2n+1) = \lambda^{2n}\,(2n)!, \qquad n = 1, 2, \ldots. \tag{19.9}$$

In particular, we have

$$EY^2 = 2\lambda^2 \tag{19.10}$$

and

$$EY^4 = 24\lambda^4. \tag{19.11}$$

In the general case when $X = a + Y \sim L(a,\lambda)$, we also have

$$EX^n = E(a + Y)^n = \sum_{r=0}^{n} \binom{n}{r} a^{n-r}\,EY^r, \qquad n = 1, 2, \ldots, \tag{19.12}$$

where

$$EY^r = 0 \ \text{ if } r = 1, 3, 5, \ldots, \qquad EY^r = \lambda^r\,r! \ \text{ if } r = 0, 2, 4, \ldots.$$

Central moments: Let $Y \sim L(\lambda)$ and $X = a + Y \sim L(a,\lambda)$. It is clear that the central moments of $X$ and the moments about zero of $Y$ are the same since

$$\beta_n = E(X - EX)^n = E(Y - EY)^n = EY^n, \qquad n = 1, 2, \ldots. \tag{19.13}$$

From (19.13) and (19.10), we immediately find that

$$\mathrm{Var}\,X = \beta_2 = 2\lambda^2. \tag{19.14}$$


19.5

Let $Y \sim L(\lambda)$. Then, due to the symmetry of the distribution of $Y$, we readily have Pearson's coefficient of skewness as $\gamma_1 = 0$. Further, from (19.11), we find the fourth central moment of $Y$ as

$$\beta_4 = \alpha_4 - 4\alpha_3\alpha_1 + 6\alpha_2\alpha_1^2 - 3\alpha_1^4 = \alpha_4 = 24\lambda^4. \tag{19.15}$$

From Eqs. (19.15) and (19.14), we readily find Pearson's coefficient of kurtosis as

$$\gamma_2 = \frac{\beta_4}{\beta_2^2} = \frac{24\lambda^4}{4\lambda^4} = 6. \tag{19.16}$$

Thus, we find the Laplace distribution to be a symmetric leptokurtic distribution.

19.6 Entropy

The entropy of the random variable $X \sim L(a,\lambda)$ is given by

$$H(X) = \frac{1}{2}\int_{-\infty}^{0} e^{x}\left\{\log(2\lambda) - x\log e\right\} dx + \frac{1}{2}\int_{0}^{\infty} e^{-x}\left\{\log(2\lambda) + x\log e\right\} dx = \log(2\lambda e) = 1 + \log(2\lambda). \tag{19.17}$$

Consider the set of all probability density functions $p(x)$ satisfying the following restriction:

$$\int_{-\infty}^{\infty} |x|\,p(x)\,dx = C, \tag{19.18}$$

where $C$ is a positive constant. Then, it has been proved that the maximal value of entropy for this set of densities is attained if

$$p(x) = \frac{1}{2C}\exp\left(-\frac{|x|}{C}\right),$$

that is, for a Laplace distribution.

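19.7 Convolutions

Let $Y_1 \sim L(\lambda)$ and $Y_2 \sim L(\lambda)$ be independent random variables, $W = Y_1 + Y_2$, and $T = Y_1 - Y_2$. Since the Laplace $L(\lambda)$ distribution is symmetric about zero for any $\lambda > 0$, it is clear that the random variables $W$ and $T$ have the same distribution.

Exercise 19.1 Show that the densities $p_W(x)$ and $p_T(x)$ of the random variables $W$ and $T$ have the following form:

$$p_W(x) = p_T(x) = \frac{1}{4\lambda}\left(1 + \frac{|x|}{\lambda}\right)\exp\left(-\frac{|x|}{\lambda}\right), \qquad -\infty < x < \infty.$$

Next, let $Y_1 \sim L(\lambda_1)$ and $Y_2 \sim L(\lambda_2)$ be independent random variables each having a Laplace distribution with different scale parameters. In this case, too, the random variables $W = Y_1 + Y_2$ and $T = Y_1 - Y_2$ have a common distribution. To find this distribution, we must recall that the characteristic functions $f_1(t) = Ee^{itY_1}$ and $f_2(t) = Ee^{itY_2}$ have the form

$$f_1(t) = \frac{1}{1 + \lambda_1^2 t^2} \quad \text{and} \quad f_2(t) = \frac{1}{1 + \lambda_2^2 t^2},$$

and hence the characteristic function $f_W(t)$ is given by

$$f_W(t) = f_1(t) f_2(t) = \frac{\lambda_1^2}{\lambda_1^2 - \lambda_2^2}\,f_1(t) - \frac{\lambda_2^2}{\lambda_1^2 - \lambda_2^2}\,f_2(t). \tag{19.20}$$

It follows from (19.20) that the probability density functions $p_W(x)$, $p_1(x)$, and $p_2(x)$ of the random variables $W$, $Y_1$, and $Y_2$ satisfy the relationship

$$p_W(x) = \frac{\lambda_1^2}{\lambda_1^2 - \lambda_2^2}\,p_1(x) - \frac{\lambda_2^2}{\lambda_1^2 - \lambda_2^2}\,p_2(x).$$

Hence, we obtain

$$p_W(x) = \frac{1}{2(\lambda_1^2 - \lambda_2^2)}\left\{\lambda_1 e^{-|x|/\lambda_1} - \lambda_2 e^{-|x|/\lambda_2}\right\}, \qquad -\infty < x < \infty. \tag{19.22}$$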
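The density of the sum of two independent Laplace variables with different scale parameters can be checked numerically. The sketch below uses the closed form obtained from the partial-fraction decomposition of the product of the two characteristic functions $1/(1 + \lambda_k^2 t^2)$; that closed form, the function names and the integration range are assumptions of this illustration and should be compared with (19.22). The second moment is compared with $2\lambda_1^2 + 2\lambda_2^2$, which follows from (19.10) and independence.

```python
from math import exp


def laplace_sum_pdf(x: float, lam1: float, lam2: float) -> float:
    """Density of Y1 + Y2 for independent Y1 ~ L(lam1), Y2 ~ L(lam2), lam1 != lam2."""
    if lam1 == lam2:
        raise ValueError("this sketch covers only lam1 != lam2")
    ax = abs(x)
    num = lam1 * exp(-ax / lam1) - lam2 * exp(-ax / lam2)
    return num / (2.0 * (lam1 ** 2 - lam2 ** 2))


def midpoint_integral(f, lo: float, hi: float, steps: int = 160000) -> float:
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    w = (hi - lo) / steps
    return w * sum(f(lo + (i + 0.5) * w) for i in range(steps))


total = midpoint_integral(lambda x: laplace_sum_pdf(x, 1.0, 2.0), -80.0, 80.0)
second = midpoint_integral(lambda x: x * x * laplace_sum_pdf(x, 1.0, 2.0), -80.0, 80.0)
print(total, second)  # close to 1 and to 2*1 + 2*4 = 10
```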



19.8 Decompositions

Coming back to the representation $V = X_1 - X_2$, where $V \sim L(1)$ and the independent random variables $X_1$ and $X_2$ have the standard exponential $E(1)$ distribution, we easily obtain that the distribution of $V$ is decomposable. Recall from Chapter 18 that the random variables $X_1$ and $X_2$ are both decomposable [see, for example, (18.35) and Exercise 18.7]. This simply means that there exist independent random variables $Y_1$, $Y_2$, $Y_3$, and $Y_4$ such that

$$X_1 \stackrel{d}{=} Y_1 + Y_2 \quad \text{and} \quad X_2 \stackrel{d}{=} Y_3 + Y_4,$$

and hence

$$V \stackrel{d}{=} (Y_1 - Y_3) + (Y_2 - Y_4), \tag{19.23}$$

where the differences $Y_1 - Y_3$ and $Y_2 - Y_4$ are independent random variables and have nondegenerate distributions. Therefore, $V \sim L(1)$ is a decomposable random variable. Furthermore, a linear transformation $a + \lambda V$ preserves this property and so the Laplace $L(a,\lambda)$ distribution is also decomposable. In addition, since exponential distributions are infinitely divisible, the representation $V = X_1 - X_2$ enables us to note that Laplace distributions are also infinitely divisible.

19.9 Order Statistics

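Let $V_1, V_2, \ldots, V_n$ be independent and identically distributed as the standard Laplace $L(1)$ distribution. Further, let $V_{1,n} < V_{2,n} < \cdots < V_{n,n}$ be the corresponding order statistics. Then, using the expressions of $p_V(x)$ and $F_V(x)$ in (19.3) and (19.4), respectively, we can write the $k$th moment of $V_{r,n}$ (for $1 \le r \le n$) as

$$E\left(V_{r,n}^k\right) = \frac{n!}{(r-1)!\,(n-r)!}\int_{-\infty}^{\infty} x^k \{F_V(x)\}^{r-1}\{1 - F_V(x)\}^{n-r}\,p_V(x)\,dx$$

$$= \frac{n!}{2^n (r-1)!\,(n-r)!}\left[(-1)^k\int_0^{\infty} x^k \{e^{-x}\}^{r-1}\{2 - e^{-x}\}^{n-r} e^{-x}\,dx + \int_0^{\infty} x^k \{2 - e^{-x}\}^{r-1}\{e^{-x}\}^{n-r} e^{-x}\,dx\right]. \tag{19.24}$$

Now, upon writing $2 - e^{-x}$ in the two integrands as $1 + (1 - e^{-x})$ and expanding the corresponding terms binomially, we readily obtain

$$E\left(V_{r,n}^k\right) = \frac{1}{2^n}\sum_{i=0}^{r-1}\binom{n}{i}\frac{(n-i)!}{(r-1-i)!\,(n-r)!}\int_0^{\infty} x^k \{1 - e^{-x}\}^{r-1-i}\{e^{-x}\}^{n-r} e^{-x}\,dx$$

$$\qquad + (-1)^k\,\frac{1}{2^n}\sum_{i=r}^{n}\binom{n}{i}\frac{i!}{(i-r)!\,(r-1)!}\int_0^{\infty} x^k \{1 - e^{-x}\}^{i-r}\{e^{-x}\}^{r-1} e^{-x}\,dx. \tag{19.25}$$

Note now that if $X_1, X_2, \ldots, X_m$ is a random sample from a standard exponential $E(1)$ distribution and $X_{1,m} < X_{2,m} < \cdots < X_{m,m}$ are the corresponding order statistics, then

$$E\left(X_{\ell,m}^k\right) = \frac{m!}{(\ell-1)!\,(m-\ell)!}\int_0^{\infty} x^k \{1 - e^{-x}\}^{\ell-1}\{e^{-x}\}^{m-\ell} e^{-x}\,dx. \tag{19.26}$$

Upon using this formula for the two integrals on the RHS of (19.25), we readily obtain the following relationship:

$$E\left(V_{r,n}^k\right) = \frac{1}{2^n}\left\{\sum_{i=0}^{r-1}\binom{n}{i} E\left(X_{r-i,n-i}^k\right) + (-1)^k\sum_{i=r}^{n}\binom{n}{i} E\left(X_{i-r+1,i}^k\right)\right\}. \tag{19.27}$$

Thus, the results of Section 18.8 on the standard exponential $E(1)$ order statistics can readily be used to obtain the moments of order statistics from the standard Laplace $L(1)$ distribution by means of (19.27).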
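To illustrate how (19.27) is used with $k = 1$, the sketch below computes the means of standard Laplace order statistics from the means of standard exponential order statistics, which are the harmonic sums in (18.27). It is a hedged illustration: the function names are hypothetical, and the form of relation (19.27) coded here (nonnegative part for $i < r$, negated part for $i \ge r$, binomial weights $\binom{n}{i}/2^n$) follows the probability argument given later in this section.

```python
from fractions import Fraction
from math import comb


def exp_os_mean(k: int, m: int) -> Fraction:
    """E X_{k,m} for standard exponential order statistics: 1/m + ... + 1/(m-k+1)."""
    return sum((Fraction(1, m - j) for j in range(k)), Fraction(0))


def laplace_os_mean(r: int, n: int) -> Fraction:
    """E V_{r,n} for standard Laplace order statistics via the relation with k = 1."""
    pos = sum(comb(n, i) * exp_os_mean(r - i, n - i) for i in range(r))
    neg = sum(comb(n, i) * exp_os_mean(i - r + 1, i) for i in range(r, n + 1))
    return (pos - neg) / 2 ** n


print(laplace_os_mean(2, 2))                          # 3/4
print([laplace_os_mean(r, 5) for r in range(1, 6)])   # antisymmetric in r
```

The output reflects the symmetry of the Laplace distribution: $EV_{r,n} = -EV_{n-r+1,n}$, so the means sum to zero.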

Similarly, by considering the product moment $E(V_{r,n}V_{s,n})$, splitting the double integral over the range $(-\infty < x < y < \infty)$ into three integrals over the ranges $(-\infty < x < y < 0)$, $(-\infty < x < 0,\ 0 < y < \infty)$, and $(0 < x < y < \infty)$, respectively, and proceeding similarly, we can show that the following relationship holds:

$$E(V_{r,n}V_{s,n}) = \frac{1}{2^n}\left\{\sum_{i=0}^{r-1}\binom{n}{i} E(X_{r-i,n-i}X_{s-i,n-i}) - \sum_{i=r}^{s-1}\binom{n}{i} E(X_{i-r+1,i})\,E(X_{s-i,n-i}) + \sum_{i=s}^{n}\binom{n}{i} E(X_{i-s+1,i}X_{i-r+1,i})\right\} \tag{19.28}$$

for $1 \le r < s \le n$, where, as before, $E(X_{k,m})$ and $E(X_{k,m}X_{\ell,m})$ denote the single and product moments of order statistics from the standard exponential $E(1)$ distribution.

Exercise 19.2 Prove the relation in (19.28).

Remark 19.1 As done by Govindarajulu (1963), the approach above can be generalized to any symmetric distribution. Specifically, if $V_{i,n}$'s denote the order statistics from a distribution symmetric about 0 and $X_{i,n}$'s denote the order statistics from the corresponding folded distribution (folded about 0), then the two relationships above continue to hold.

Exercise 19.3 Let $V_1, V_2, \ldots, V_n$ be a random sample from a distribution $F(x)$ symmetric about 0, and let $V_{1,n} < V_{2,n} < \cdots < V_{n,n}$ be the corresponding order statistics. Further, let $X_{\ell,m}$ denote the $\ell$th order statistic from a random sample of size $m$ from the corresponding folded distribution with cdf $G(x) = 2F(x) - 1$ (for $x > 0$). Then, prove the following two relationships between the moments of these two sets of order statistics: For $1 \le r \le n$ and $k \ge 0$,

$$E\left(V_{r,n}^k\right) = \frac{1}{2^n}\left\{\sum_{i=0}^{r-1}\binom{n}{i} E\left(X_{r-i,n-i}^k\right) + (-1)^k\sum_{i=r}^{n}\binom{n}{i} E\left(X_{i-r+1,i}^k\right)\right\}; \tag{19.29}$$

for $1 \le r < s \le n$,

$$E(V_{r,n}V_{s,n}) = \frac{1}{2^n}\left\{\sum_{i=0}^{r-1}\binom{n}{i} E(X_{r-i,n-i}X_{s-i,n-i}) - \sum_{i=r}^{s-1}\binom{n}{i} E(X_{i-r+1,i})\,E(X_{s-i,n-i}) + \sum_{i=s}^{n}\binom{n}{i} E(X_{i-s+1,i}X_{i-r+1,i})\right\}. \tag{19.30}$$

The relationship in (19.27) can also be established by using simple probability arguments as follows. First, for $1 \le r \le n$, let us consider the event $V_{r,n} \ge 0$. Given this event, let $i$ ($0 \le i \le r-1$) be the number of $V$'s (among $V_1, V_2, \ldots, V_n$) which are negative. Then, since the remaining $(n-i)$ $V$'s (which are nonnegative) form a random sample from the standard exponential $E(1)$ distribution, conditioned on the event $V_{r,n} \ge 0$, we have

$$V_{r,n} \stackrel{d}{=} X_{r-i,n-i} \quad \text{with binomial probabilities } \binom{n}{i}\Big/2^n \quad \text{for } i = 0, 1, \ldots, r-1. \tag{19.31}$$

Next, let us consider the event $V_{r,n} < 0$. Given this event, let $i$ ($r \le i \le n$) be the number of $V$'s (among $V_1, V_2, \ldots, V_n$) which are negative. Then, since the negatives of these $i$ $V$'s (which are negative) form a random sample from the standard exponential $E(1)$ distribution, conditioned on the event $V_{r,n} < 0$, we also have

$$V_{r,n} \stackrel{d}{=} -X_{i-r+1,i} \quad \text{with binomial probabilities } \binom{n}{i}\Big/2^n \quad \text{for } i = r, r+1, \ldots, n. \tag{19.32}$$

Combining (19.31) and (19.32), we readily obtain the relation in (19.27).

Exercise 19.4 Using a similar probability argument, prove the relation in (19.28).


CHAPTER 20

GAMMA DISTRIBUTION

20.1 Introduction

In Section 18.9 we discussed the distribution of the sum

$$W_n = X_1 + \cdots + X_n$$

of independent random variables $X_k$ ($k = 1, 2, \ldots, n$) having the standard exponential $E(1)$ distribution. It was shown that the Laplace transform $\varphi_n(s)$ of $W_n$ is given by

$$\varphi_n(s) = (1+s)^{-n}, \tag{20.1}$$

and the corresponding pdf $p_n(x)$ is given by

$$p_n(x) = \frac{1}{(n-1)!}\,x^{n-1} e^{-x} \qquad \text{if } x > 0. \tag{20.2}$$

It was also shown in Chapter 18 (see Exercise 18.7) that the sum of two positive random variables $Y_1$ and $Y_2$ with the common pdf

$$p_+(x) = \frac{1}{\sqrt{\pi}}\,x^{-1/2} e^{-x}, \qquad x > 0, \tag{20.3}$$

has the standard exponential distribution. In addition, we may recall Eq. (9.22) in which cumulative probabilities of Poisson distributions have been expressed in terms of the pdf in (20.2). Probability density functions in (20.2) and (20.3) suggest that we consider the family of densities

$$p_\alpha(x) = C(\alpha)\,x^{\alpha-1} e^{-x}, \qquad x > 0,$$

where $C(\alpha)$ depends on $\alpha$ only. It is easy to see that $p_\alpha(x)$ is a pdf if

$$C(\alpha) = \frac{1}{\Gamma(\alpha)}, \tag{20.4}$$

where $\Gamma(\alpha)$ is the complete gamma function.

20.2 Notations

We say that a random variable $X$ has the standard gamma distribution with parameter $\alpha > 0$ if its pdf has the form

$$p_X(x) = \frac{1}{\Gamma(\alpha)}\,x^{\alpha-1} e^{-x}, \qquad x > 0. \tag{20.5}$$

The linear transformation $Y = a + \lambda X$ yields a random variable with pdf

$$p_Y(x) = \frac{1}{\Gamma(\alpha)\,\lambda^{\alpha}}\,(x - a)^{\alpha-1}\exp\left\{-\frac{x - a}{\lambda}\right\} \qquad \text{if } x > a. \tag{20.6}$$

We will use the notation $Y \sim \Gamma(\alpha,a,\lambda)$ to denote a random variable with pdf (20.6). Hence, $X \sim \Gamma(\alpha,0,1)$ corresponds to the standard gamma distribution with pdf (20.5). Note that when $\alpha = 1$, we get the exponential distributions as a subset of the gamma distributions. The special gamma distributions $\Gamma(n/2, 0, 2)$, when $n$ is a positive integer, are called chi-square distributions ($\chi^2$ distribution) with $n$ degrees of freedom. These distributions play a very important role in statistical inferential problems. From (20.5), we have the cdf of $X \sim \Gamma(\alpha,0,1)$ (when $\alpha > 1$) as

$$F_X(x) = \frac{1}{\Gamma(\alpha)}\int_0^x t^{\alpha-1} e^{-t}\,dt = -\frac{x^{\alpha-1} e^{-x}}{\Gamma(\alpha)} + \frac{1}{\Gamma(\alpha-1)}\int_0^x t^{\alpha-2} e^{-t}\,dt = -p_X(x) + F_{X'}(x),$$

where $X'$ is a $\Gamma(\alpha-1, 0, 1)$ random variable. Thus, the expression above for the cdf of $X$ presents a recurrence relation in $\alpha$. Furthermore, if $\alpha$ equals a positive integer $n$, then repeated integration by parts as above will yield

$$F_X(x) = 1 - \sum_{i=0}^{n-1}\frac{e^{-x} x^i}{i!},$$

which is precisely the relationship between the cumulative Poisson probabilities and the gamma distribution noted in Eq. (9.22).
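The link between the gamma cdf with integer shape and cumulative Poisson probabilities is easy to confirm numerically. The following is a minimal sketch for the standard gamma distribution with integer shape $n$; the function names and the midpoint integrator are assumptions made for illustration only.

```python
from math import exp, factorial


def gamma_pdf_integer(x: float, n: int) -> float:
    """pdf of the standard gamma distribution with integer shape n."""
    return x ** (n - 1) * exp(-x) / factorial(n - 1) if x > 0 else 0.0


def gamma_cdf_integer(x: float, n: int) -> float:
    """cdf via the Poisson sum: 1 - sum_{i=0}^{n-1} e^{-x} x^i / i!."""
    if x <= 0:
        return 0.0
    return 1.0 - sum(exp(-x) * x ** i / factorial(i) for i in range(n))


def midpoint_integral(f, lo: float, hi: float, steps: int = 20000) -> float:
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    w = (hi - lo) / steps
    return w * sum(f(lo + (i + 0.5) * w) for i in range(steps))


direct = midpoint_integral(lambda t: gamma_pdf_integer(t, 4), 0.0, 3.0)
print(direct, gamma_cdf_integer(3.0, 4))  # the two values agree closely
```

The same helpers also reproduce the one-step recurrence: `gamma_cdf_integer(x, n)` equals `gamma_cdf_integer(x, n - 1) - gamma_pdf_integer(x, n)`.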

20.3 Mode

Let $X \sim \Gamma(\alpha,0,1)$. If $\alpha = 1$ (the case of the exponential distribution), the pdf $p_X(x)$ is a monotonically decreasing function, as we have seen in Chapter 18. If $\alpha > 1$, the gamma $\Gamma(\alpha,0,1)$ distribution is unimodal and its mode [the point where the density function (20.5) takes the maximal value] is at $x = \alpha - 1$. On the other hand, if $\alpha < 1$, then $p_X(x)$ tends to infinity as $x \to 0$.

20.4 Laplace Transform and Characteristic Function

Let $X \sim \Gamma(\alpha,0,1)$. Then, the Laplace transform $\varphi_X(s)$ of $X$ is given by

$$\varphi_X(s) = \frac{1}{\Gamma(\alpha)}\int_0^{\infty} e^{-sx} x^{\alpha-1} e^{-x}\,dx = \frac{1}{(1+s)^{\alpha}}. \tag{20.7}$$

Then the characteristic function $f_X(t) = Ee^{itX}$ has the form

$$f_X(t) = \varphi_X(-it) = \frac{1}{(1 - it)^{\alpha}}. \tag{20.8}$$

As a result, we can write the following expression for the characteristic function of the random variable $Y = a + \lambda X$, which has the general gamma $\Gamma(\alpha,a,\lambda)$ distribution:

$$f_Y(t) = \frac{e^{iat}}{(1 - i\lambda t)^{\alpha}}. \tag{20.9}$$

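20.5 Moments

The exponentially decreasing nature of the pdf in (20.5) entails the existence of all the moments of the gamma distribution.

Moments about zero: Let $X \sim \Gamma(\alpha,0,1)$. Then,

$$\alpha_n = EX^n = \frac{1}{\Gamma(\alpha)}\int_0^{\infty} x^{n+\alpha-1} e^{-x}\,dx = \frac{\Gamma(\alpha+n)}{\Gamma(\alpha)}. \tag{20.10}$$

In particular, we have

$$EX = \alpha, \tag{20.11}$$

$$EX^2 = \alpha(\alpha+1), \tag{20.12}$$

$$EX^3 = \alpha(\alpha+1)(\alpha+2), \tag{20.13}$$

$$EX^4 = \alpha(\alpha+1)(\alpha+2)(\alpha+3). \tag{20.14}$$

Note that (20.10) is also valid for moments $EX^n$ of negative order $n > -\alpha$. Let us now consider $Y \sim \Gamma(\alpha,a,\lambda)$. Since $Y$ can be expressed as $Y = a + \lambda X$, where $X \sim \Gamma(\alpha,0,1)$, we can express the moments of $Y$ as

$$EY^n = E(a + \lambda X)^n = \sum_{r=0}^{n}\binom{n}{r} a^{n-r}\lambda^{r}\,\frac{\Gamma(\alpha+r)}{\Gamma(\alpha)}.$$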

Central moments: Central moments of the random variables $Y \sim \Gamma(\alpha,a,\lambda)$ and $V \sim \Gamma(\alpha,0,\lambda)$ are the same. Now, let $X$ have the standard gamma $\Gamma(\alpha,0,1)$ distribution. Then,

$$Y - EY \stackrel{d}{=} V - EV \stackrel{d}{=} \lambda(X - EX)$$

and hence

$$\beta_n = E(Y - EY)^n = E(V - EV)^n = \lambda^n E(X - EX)^n = \lambda^n\sum_{r=0}^{n}(-1)^{n-r}\binom{n}{r}\alpha^{n-r}\,\frac{\Gamma(\alpha+r)}{\Gamma(\alpha)}. \tag{20.16}$$

As a special case of (20.16), we obtain the variance of the random variable $Y \sim \Gamma(\alpha,a,\lambda)$ as

$$\mathrm{Var}\,Y = \beta_2 = \alpha\lambda^2. \tag{20.17}$$

20.6

From (20.16), we get the third central moment of $Y$ to be

$$\beta_3 = 2\alpha\lambda^3,$$

using which we readily obtain Pearson's coefficient of skewness as

$$\gamma_1 = \frac{\beta_3}{\beta_2^{3/2}} = \frac{2}{\sqrt{\alpha}}. \tag{20.18}$$

This reveals that the gamma distribution is positively skewed for all values of the shape parameter $\alpha$. Next, we get the fourth central moment of $Y$ from (20.16) to be

$$\beta_4 = (3\alpha^2 + 6\alpha)\lambda^4,$$

using which we readily obtain Pearson's coefficient of kurtosis as

$$\gamma_2 = \frac{\beta_4}{\beta_2^2} = 3 + \frac{6}{\alpha}. \tag{20.19}$$

This reveals that the gamma distribution is leptokurtic for all values of the shape parameter $\alpha$. Furthermore, we note from (20.18) and (20.19) that as $\alpha \to \infty$, $\gamma_1$ and $\gamma_2$ tend to 0 and 3, respectively (which are the values of skewness and kurtosis of the normal distribution). As we shall see in Section 20.9, the normal distribution is, in fact, the limiting distribution of gamma distributions as the shape parameter $\alpha \to \infty$. Plots of gamma density function presented in Figure 20.1 reveal these properties.


Figure 20.1. Plots of gamma density function

Remark 20.1 From (20.18) and (20.19), we observe that $\gamma_1$ and $\gamma_2$ satisfy the relationship

$$\gamma_2 = 3 + 1.5\,\gamma_1^2, \tag{20.20}$$

which is the Type III line in the Pearson plane (that corresponds to the gamma family of distributions).
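The Type III line can be checked by computing skewness and kurtosis directly from the raw moments $EX^n = \alpha(\alpha+1)\cdots(\alpha+n-1)$ of the standard gamma distribution. The sketch below is illustrative; the function names are hypothetical, and the scale parameter is omitted because both coefficients are scale invariant.

```python
from math import isclose, sqrt


def gamma_raw_moment(alpha: float, n: int) -> float:
    """E X^n = alpha (alpha + 1) ... (alpha + n - 1) for the standard gamma distribution."""
    m = 1.0
    for j in range(n):
        m *= alpha + j
    return m


def gamma_skew_kurt(alpha: float):
    """Pearson's skewness and kurtosis computed from raw moments."""
    a1, a2, a3, a4 = (gamma_raw_moment(alpha, n) for n in (1, 2, 3, 4))
    b2 = a2 - a1 ** 2
    b3 = a3 - 3 * a2 * a1 + 2 * a1 ** 3
    b4 = a4 - 4 * a3 * a1 + 6 * a2 * a1 ** 2 - 3 * a1 ** 4
    return b3 / b2 ** 1.5, b4 / b2 ** 2


for alpha in (0.5, 2.0, 7.5):
    g1, g2 = gamma_skew_kurt(alpha)
    print(alpha, isclose(g2, 3 + 1.5 * g1 ** 2, rel_tol=1e-9))
```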

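Exercise 20.1 Generalizing the pdf in (20.5), we may consider the pdf of the generalized gamma distributions as

$$p_{X'}(x) = C(\alpha,\delta)\,x^{\alpha\delta-1} e^{-x^{\delta}}, \qquad x > 0,\ \alpha > 0,\ \delta > 0.$$

Then, find the normalizing constant $C(\alpha,\delta)$. Derive the moments and discuss the shape characteristics of this generalized gamma family of distributions.

Consider the transformation $Z = \gamma + \log X'$ when $X'$ has a generalized gamma distribution with pdf $p_{X'}(x)$ as given above. Then, we readily obtain the pdf of $Z$ as

$$p_Z(z) = C(\alpha,\delta)\,e^{\alpha\delta(z-\gamma)}\,e^{-e^{\delta(z-\gamma)}}, \qquad -\infty < z < \infty,\ \alpha > 0,\ \delta > 0,\ -\infty < \gamma < \infty.$$

Let $X \sim \Gamma(\alpha,0,1)$ and $Y \sim \Gamma(\beta,0,1)$ be independent random variables, and let $U = X/(X+Y)$ and $V = X + Y$. The joint pdf of $U$ and $V$ is

$$p_{U,V}(u,v) = \frac{1}{\Gamma(\alpha)\Gamma(\beta)}\,u^{\alpha-1}(1-u)^{\beta-1}\,v^{\alpha+\beta-1} e^{-v}, \qquad v > 0,\ 0 < u < 1. \tag{20.24}$$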

From (20.24), we observe immediately that the random variables $U$ and $V$ are independent, with $V$ having the gamma $\Gamma(\alpha+\beta, 0, 1)$ distribution (which is to be expected as $V$ is the sum of two independent gamma random variables), and $U$ having the standard beta$(\alpha,\beta)$ distribution. We need to mention here two interesting special cases. If $X \sim \Gamma(\tfrac{1}{2}, 0, 1)$ and $Y \sim \Gamma(\tfrac{1}{2}, 0, 1)$, then $U = X/(X+Y)$ has the standard arcsine distribution; and if $X$ and $Y$ have the standard exponential $E(1)$ distribution, then $U = X/(X+Y)$ is a standard uniform $U(0,1)$ random variable. Of course, we will still have the independence of $U$ and $V$, and the beta$(\alpha,\beta)$ distribution of $U$, if we take $X \sim \Gamma(\alpha,0,\lambda)$ and $Y \sim \Gamma(\beta,0,\lambda)$. Lukacs (1965) has proved the converse result: If $X$ and $Y$ are independent positive random variables and the random variables $X/(X+Y)$ and $X+Y$ are also independent, then there exist positive constants $\alpha$, $\beta$, and $\lambda$ such that $X \sim \Gamma(\alpha,0,\lambda)$ and $Y \sim \Gamma(\beta,0,\lambda)$. It was also later shown by Marsaglia (1974) that the result of Lukacs stays true in a more general situation (i.e., without the restriction that $X$ and $Y$ are positive random variables).

Exercise 20.3 Show that the independence of $U = X/(X+Y)$ and $V = X+Y$ also implies the independence of the following pairs of random variables: (a)

V

X

f_

a.nd x

+

+Y ;

X 2 Y2 and X XY (d)

(xX Y*I2 +

+Y ;

and ( X

+ Y)'.

Exercise 20.4 Exploiting the independence of $U = X/(X+Y)$ and $V = X+Y$, the fact that $X$ and $Y$ are both gamma, and the moments of the gamma distribution, derive the moments of the beta distribution in (16.8).

LIMITING DISTRIBUTIONS

Once again, let $X \sim \Gamma(\alpha, 0, \lambda)$ and $Y \sim \Gamma(\beta, 0, \lambda)$ be independent random variables, and let $V = X + Y$. Consider now the conditional distribution of $X$ given that $V = v$ is fixed. The results presented above enable us to state that the conditional distributions of $X/v$ and $X/V$ are the same, given that $V = v$. Since $X/V$ is independent of $V$, the conditional distribution of $X/v$, given that $X + Y = v$, is beta$(\alpha, \beta)$. If $\alpha = \beta = 1$ [which means that $X$ and $Y$ have the common exponential $E(\lambda)$ distribution], the conditional distribution of $X$, given that $X + Y = v$, becomes uniform $U(0, v)$.

The following more general result is also valid for independent gamma random variables. Let $X_k \sim \Gamma(\alpha_k, 0, \lambda)$, $k = 1, 2, \ldots, n$, be independent random variables. Then,

$$\left\{ \frac{vX_1}{X_1 + \cdots + X_n}, \ldots, \frac{vX_n}{X_1 + \cdots + X_n} \right\} \stackrel{d}{=} \{ X_1, \ldots, X_n \mid V = X_1 + \cdots + X_n = v \}. \tag{20.25}$$

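Recalling now the representation (18.25) for the uniform order statistics $U_{1,n}, \ldots, U_{n,n}$, which has the form

$$\{U_{1,n}, \ldots, U_{n,n}\} \stackrel{d}{=} \left\{ \frac{S_1}{S_{n+1}}, \ldots, \frac{S_n}{S_{n+1}} \right\},$$

where

$$S_k = X_1 + \cdots + X_k, \quad \text{and} \quad X_k \sim \Gamma(1, 0, 1), \quad k = 1, 2, \ldots$$

are the standard exponential $E(1)$ random variables, we can use (20.25) to obtain another representation for the uniform order statistics as

$$\{U_{1,n}, \ldots, U_{n,n}\} \stackrel{d}{=} \{S_1, \ldots, S_n \mid S_{n+1} = 1\}. \tag{20.26}$$

20.9

Limiting Distributions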


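Consider a sequence of random variables $Y_1, Y_2, \ldots$, where

$$Y_n \sim \Gamma(n, 0, 1), \qquad n = 1, 2, \ldots,$$

and with it, generate a new sequence

$$W_n = \frac{Y_n - n}{\sqrt{n}}, \qquad n = 1, 2, \ldots, \tag{20.27}$$

with its characteristic function $f_n(t)$ being

$$f_n(t) = e^{-it\sqrt{n}} \left(1 - \frac{it}{\sqrt{n}}\right)^{-n}. \tag{20.28}$$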


Exercise 20.5 Show that for any fixed $t$, as $n \to \infty$,

$$f_n(t) \to e^{-t^2/2}. \tag{20.29}$$

Earlier, we encountered the characteristic function $e^{-t^2/2}$ [for example, in Eqs. (5.43), (7.31) and (9.33)] corresponding to the standard normal distribution. We have, therefore, just established that the normal distribution is the limiting distribution for the sequence of gamma random variables $W_n$ in (20.27).

Exercise 20.6 Let $\Phi(x)$ denote the cumulative distribution function of a random variable with the characteristic function $e^{-t^2/2}$. Later, in Chapter 23, we will find that

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt.$$

Making use of the limiting relation in (20.29), prove that

$$\frac{1}{(n-1)!} \int_0^{n+\sqrt{n}} x^{n-1} e^{-x}\, dx \to \Phi(1) = 0.84134\ldots \tag{20.30}$$

as $n \to \infty$.
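The limit in (20.30) can be checked numerically before attempting the proof. The sketch below is only an illustration under stated assumptions: the helper names `gamma_int_cdf`, `std_normal_cdf` and `lhs_20_30` are hypothetical, and the left-hand side is evaluated through the standard identity $P\{Y_n \le x\} = 1 - e^{-x}\sum_{k=0}^{n-1} x^k/k!$ for integer $n$, which is a convenience and not a step taken in the text.

```python
import math


def gamma_int_cdf(n: int, x: float) -> float:
    """P(Y <= x) for Y ~ Gamma(n, 0, 1) with integer shape n.

    Uses P(Y <= x) = 1 - sum_{k=0}^{n-1} e^{-x} x^k / k!, with every term
    computed in log space so that large n does not overflow.
    """
    log_x = math.log(x)
    tail = sum(math.exp(-x + k * log_x - math.lgamma(k + 1)) for k in range(n))
    return 1.0 - tail


def std_normal_cdf(x: float) -> float:
    """Standard normal cdf Phi(x), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lhs_20_30(n: int) -> float:
    """Left-hand side of (20.30): the Gamma(n, 0, 1) cdf at n + sqrt(n)."""
    return gamma_int_cdf(n, n + math.sqrt(n))


if __name__ == "__main__":
    for n in (1, 10, 100, 400):
        print(n, lhs_20_30(n), std_normal_cdf(1.0))
```

The printed values should drift toward $\Phi(1)$ as $n$ grows, which is the content of (20.30).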

CHAPTER 21

EXTREME VALUE DISTRIBUTIONS

21.1

Introduction

In Chapter 11 we considered the minimal value $m_n = \min\{U_1, \ldots, U_n\}$ of i.i.d. uniform $U(0,1)$ random variables $U_1, U_2, \ldots$ and determined [see Eq. (11.36)] that the asymptotic distribution of $n\,m_n$ (as $n \to \infty$) becomes the standard exponential distribution. Instead, if we take independent exponential $E(1)$ random variables $X_1, X_2, \ldots$, and generate a sequence of minimal values

$$z_n = \min\{X_1, X_2, \ldots, X_n\}, \qquad n = 1, 2, \ldots,$$

then, as seen in Chapter 18 [see Eq. (18.24)], the sequence $n\,z_n$ converges in distribution (as $n \to \infty$) to the standard exponential distribution. Consider now the corresponding maximal values

$$M_n = \max\{U_1, \ldots, U_n\} \quad \text{and} \quad Z_n = \max\{X_1, X_2, \ldots, X_n\}, \qquad n = 1, 2, \ldots.$$

Then, as seen earlier in Eq. (11.38), as $n \to \infty$,

$$P\{n(M_n - 1) < x\} \to e^{x}, \qquad x < 0.$$

This simply means that the sequence $n(1 - M_n)$ converges asymptotically to the same standard exponential distribution. This fact is not surprising to us since it is clear that the uniform $U(0,1)$ distribution is symmetric with respect to $\tfrac12$ and, consequently,

$$1 - M_n \stackrel{d}{=} m_n.$$

Let us now find the asymptotic distribution of a suitably normalized maximal value $Z_n$. We have in this case the following result:

$$P\{Z_n - \ln n < x\} = \left(1 - \exp\{-(x + \ln n)\}\right)^n = \left(1 - \frac{e^{-x}}{n}\right)^n \to \exp\{-e^{-x}\} \tag{21.1}$$


as $n \to \infty$. Thus, unlike in the previous cases, we now have a new (nonexponential) distribution for normalized extremes. The natural question that arises here is regarding the set of all possible limiting distributions for maximal and minimal values in a sequence of independent and identically distributed (i.i.d.) random variables. The evident relationship

$$\max\{-Y_1, -Y_2, \ldots, -Y_n\} \stackrel{d}{=} -\min\{Y_1, Y_2, \ldots, Y_n\} \tag{21.2}$$

requires us to find only one of these two sets of asymptotic distributions. Indeed, if some cdf $H(x)$ is the limiting distribution for a sequence $\max\{Y_1, Y_2, \ldots, Y_n\}$, $n = 1, 2, \ldots$, then the cdf $G(x) = 1 - H(-x)$ would be the limiting distribution for the sequence $\min\{-Y_1, -Y_2, \ldots, -Y_n\}$, $n = 1, 2, \ldots$, and vice versa.
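The convergence stated in (21.1) is easy to observe numerically, because the cdf of $Z_n - \ln n$ is available in closed form for every $n$. The following sketch compares it with the limit $\exp\{-e^{-x}\}$; the function names are illustrative assumptions.

```python
import math


def normalized_max_cdf(x: float, n: int) -> float:
    """Exact P{Z_n - ln n < x} for the maximum Z_n of n i.i.d. E(1) variables."""
    if x <= -math.log(n):
        return 0.0  # Z_n is nonnegative, so Z_n - ln n cannot fall below -ln n
    return (1.0 - math.exp(-x) / n) ** n


def limit_cdf(x: float) -> float:
    """Limiting cdf exp(-exp(-x)) appearing in (21.1)."""
    return math.exp(-math.exp(-x))


if __name__ == "__main__":
    for n in (10, 1000, 10**6):
        gaps = [abs(normalized_max_cdf(x, n) - limit_cdf(x)) for x in (-1.0, 0.0, 2.0)]
        print(n, max(gaps))
```

The printed gaps shrink roughly like $1/n$, in line with the elementary limit $(1 - c/n)^n \to e^{-c}$.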

21.2

Limiting Distributions of Maximal Values

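We are interested in all possible limiting cdf's $H(x)$ for sequences

$$H_n(x) = \{F(a_n x + b_n)\}^n, \tag{21.3}$$

where $F$ is the cdf of the underlying i.i.d. random variables $Y_1, Y_2, \ldots$, and $a_n > 0$ and $b_n$ ($n = 1, 2, \ldots$) are some normalizing constants; hence, $H_n(x)$ is the cdf of

$$V_n = \frac{\max\{Y_1, Y_2, \ldots, Y_n\} - b_n}{a_n}.$$

Of course, for any $F$, we can always find a sequence $a_n$ ($n = 1, 2, \ldots$) which provides the convergence of the sequence $V_n$ to a degenerate limiting distribution. Therefore, our aim here is to find all possible nondegenerate cdf's $H(x)$.

Lemma 21.1 In order for a nondegenerate cdf $H(x)$ to be the limit of sequence (21.3) for some cdf $F$ and normalizing constants $a_n > 0$ and $b_n$ ($n = 1, 2, \ldots$), it is necessary and sufficient that for any $s > 0$ and $x$,

$$H^{s}[A(s)x + B(s)] = H(x), \tag{21.4}$$

where $A(s) > 0$ and $B(s)$ are some functions defined for $s > 0$.

Thus, our problem of finding the asymptotic distributions of maxima is reduced to finding all solutions of the functional equation in (21.4). It turns out that all solutions $H(x)$ (up to location and scale parameters) are as follows:

$$H_{1,\alpha}(x) = \begin{cases} 0, & x \le 0, \\ \exp\{-x^{-\alpha}\}, & x > 0, \end{cases} \qquad \alpha > 0; \tag{21.5}$$

$$H_{2,\alpha}(x) = \begin{cases} \exp\{-(-x)^{\alpha}\}, & x < 0, \\ 1, & x \ge 0, \end{cases} \qquad \alpha > 0; \tag{21.6}$$

$$H_3(x) = e^{-e^{-x}}, \qquad -\infty < x < \infty. \tag{21.7}$$

Correspondingly, by the relationship $G(x) = 1 - H(-x)$ noted in Section 21.1, the cdf's

$$G_{1,\alpha}(x) = \begin{cases} 1 - \exp\{-(-x)^{-\alpha}\}, & x < 0, \\ 1, & x \ge 0, \end{cases} \tag{21.8}$$

$$G_{2,\alpha}(x) = \begin{cases} 0, & x < 0, \\ 1 - \exp\{-x^{\alpha}\}, & x \ge 0, \end{cases} \tag{21.9}$$

$$G_3(x) = 1 - \exp\{-e^{x}\}, \qquad -\infty < x < \infty, \tag{21.10}$$

where $\alpha > 0$, are then all the possible limiting distributions of minimal values. $G_{2,\alpha}$ is commonly known as the Weibull distribution.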

21.4

Relationships Between Extreme Value Distributions

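As seen in the preceding two sections, we have three types of extreme value distributions for maxima and three corresponding types of extreme value distributions for minima. The term extreme value distributions includes all distributions with cdf's

$$H_{1,\alpha}\!\left(\frac{x-a}{\lambda}\right),\ H_{2,\alpha}\!\left(\frac{x-a}{\lambda}\right),\ H_3\!\left(\frac{x-a}{\lambda}\right),\ G_{1,\alpha}\!\left(\frac{x-a}{\lambda}\right),\ G_{2,\alpha}\!\left(\frac{x-a}{\lambda}\right),\ G_3\!\left(\frac{x-a}{\lambda}\right),$$

with the standard members (when $a = 0$ and $\lambda = 1$) being as given in (21.5)-(21.10). Often, the name extreme value distribution has been used in the literature only for distributions with cdf's $H_3\!\left(\frac{x-a}{\lambda}\right)$. It is useful to remember that all six types of extreme value distributions given above are closely connected with exponential distributions.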

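Exercise 21.1 Let $X$ have a standard exponential distribution. Then, show that the random variables

$$X^{-1/\alpha}, \quad -X^{1/\alpha}, \quad -\log X, \quad -X^{-1/\alpha}, \quad X^{1/\alpha}, \quad \log X$$

have, respectively, the distributions

$$H_{1,\alpha}, \quad H_{2,\alpha}, \quad H_3, \quad G_{1,\alpha}, \quad G_{2,\alpha}, \quad G_3.$$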

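Linear transformations of random variables mentioned in Exercise 21.1 enable us to express the distribution of any random variable with cdf's

$$H_{1,\alpha}\!\left(\frac{x-a}{\lambda}\right),\ H_{2,\alpha}\!\left(\frac{x-a}{\lambda}\right),\ H_3\!\left(\frac{x-a}{\lambda}\right),\ G_{1,\alpha}\!\left(\frac{x-a}{\lambda}\right),\ G_{2,\alpha}\!\left(\frac{x-a}{\lambda}\right),\ G_3\!\left(\frac{x-a}{\lambda}\right)$$

via the standard exponential distribution. Note also that the exponential $E(a, \lambda)$ distribution is a special case of the Weibull distribution because its cdf coincides with $G_{2,1}\!\left(\frac{x-a}{\lambda}\right)$. Furthermore, if we take $Y = \sqrt{X}$, where $X \sim E(1)$, then it is easy to show that

$$F_Y(x) = P\{Y < x\} = 1 - e^{-x^2}, \qquad x > 0. \tag{21.11}$$

We see that the RHS of (21.11) coincides with the Weibull cdf $G_{2,2}(x)$. This distribution of $Y$ is called the standard Rayleigh distribution, while linear transformations $a + \lambda Y$ (with $\lambda > 0$) yield the general two-parameter Rayleigh distribution with cdf

$$1 - \exp\left\{-\left(\frac{x-a}{\lambda}\right)^{2}\right\}, \qquad x > a. \tag{21.12}$$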
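The link between the standard exponential distribution and the extreme value family can also be seen by simulation. The sketch below is a rough Monte Carlo illustration: the seed, the sample size and all helper names are arbitrary assumptions. It draws $X \sim E(1)$ and compares the empirical cdf of $\sqrt{X}$ with (21.11), and that of $-\log X$ with $H_3(x) = e^{-e^{-x}}$.

```python
import math
import random


def empirical_cdf_at(samples, x):
    """Fraction of sample values strictly below x."""
    return sum(1 for s in samples if s < x) / len(samples)


rng = random.Random(2003)                                  # fixed seed, arbitrary choice
xs = [rng.expovariate(1.0) for _ in range(50_000)]         # X ~ E(1)

rayleigh = [math.sqrt(x) for x in xs]                      # Y = sqrt(X), cdf 1 - exp(-x^2)
gumbel_max = [-math.log(x) for x in xs if x > 0.0]         # V = -log X, cdf exp(-exp(-x))

if __name__ == "__main__":
    print(empirical_cdf_at(rayleigh, 1.0), 1.0 - math.exp(-1.0))
    print(empirical_cdf_at(gumbel_max, 0.5), math.exp(-math.exp(-0.5)))
```

With 50,000 draws the empirical and theoretical values should agree to about two decimal places; the agreement is statistical, not exact.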

Exercise 21.2 If $X$ denotes a standard exponential random variable and $Y = \sqrt{X}$, show that the cdf of $Y$ is as given in (21.11). Then, derive the mean and variance of $Y$.

21.5

Generalized Extreme Value Distributions

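It turns out that all the limiting distributions of maxima as well as all the limiting distributions of minima can be presented in a unified form. For this purpose, let us introduce the family of cdf's $H(x, \beta)$ ($-\infty < \beta < \infty$) which are defined as

$$H(x, \beta) = \exp\{-(1 + x\beta)^{-1/\beta}\} \tag{21.13}$$

in the domain $1 + x\beta > 0$, and we suppose that $H(x, \beta)$ equals 0 or 1 (depending on the sign of $\beta$) if $1 + x\beta < 0$. For $\beta = 0$, $H(x, 0)$ means the limit of $H(x, \beta)$ as $\beta \to 0$. Let us first consider the case $\beta = 1/\alpha$, where $\alpha > 0$. Then,

$$H(x, 1/\alpha) = \exp\left\{-\left(1 + \frac{x}{\alpha}\right)^{-\alpha}\right\}, \qquad x > -\alpha. \tag{21.14}$$

It is easy to see that the cdf (21.14) coincides with $H_{1,\alpha}\!\left(\frac{x+\alpha}{\alpha}\right)$. Next, if $\beta = -1/\alpha$, where $\alpha > 0$, then

$$H(x, -1/\alpha) = \exp\left\{-\left(1 - \frac{x}{\alpha}\right)^{\alpha}\right\}, \qquad x < \alpha. \tag{21.15}$$

This cdf coincides with $H_{2,\alpha}\!\left(\frac{x-\alpha}{\alpha}\right)$. Finally, for $\beta = 0$, we have

$$H(x, 0) = e^{-e^{-x}} = H_3(x). \tag{21.16}$$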
(21.16)

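Thus, the derivations above show that the three-parameter family of cdf's

$$H(x, \beta, a, \lambda) = H\!\left(\frac{x-a}{\lambda}, \beta\right), \qquad -\infty < \beta < \infty,\ -\infty < a < \infty,\ \lambda > 0, \tag{21.17}$$

includes all the cdf's

$$H_{1,\alpha}\!\left(\frac{x-a}{\lambda}\right), \quad H_{2,\alpha}\!\left(\frac{x-a}{\lambda}\right), \quad H_3\!\left(\frac{x-a}{\lambda}\right)$$

as special cases. Equation (21.17) defines the generalized extreme value distributions for maxima, while $H(x, \beta)$ in (21.18) corresponds to its standard form. Similarly,

$$G(x, \beta, a, \lambda) = 1 - H(-x, \beta, a, \lambda) \tag{21.19}$$

defines the generalized extreme value distributions for minima, and

$$G(x, \beta) = 1 - H(-x, \beta) \tag{21.20}$$

corresponds to its standard form.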
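The unified form (21.13) translates directly into a small function, which makes the three special cases easy to compare side by side. This is a minimal sketch assuming the standard form only (location 0, scale 1); the names `gev_cdf` and `gev_min_cdf` are hypothetical, and the point $1 + x\beta = 0$ is treated like the outside of the domain.

```python
import math


def gev_cdf(x: float, beta: float) -> float:
    """Standard generalized extreme value cdf H(x, beta) for maxima, as in (21.13)."""
    if beta == 0.0:
        return math.exp(-math.exp(-x))       # limiting case beta -> 0, i.e. H_3(x)
    base = 1.0 + x * beta
    if base <= 0.0:
        # outside the domain 1 + x*beta > 0 the cdf is 0 (beta > 0) or 1 (beta < 0)
        return 0.0 if beta > 0.0 else 1.0
    return math.exp(-(base ** (-1.0 / beta)))


def gev_min_cdf(x: float, beta: float) -> float:
    """Standard form for minima, G(x, beta) = 1 - H(-x, beta), as in (21.20)."""
    return 1.0 - gev_cdf(-x, beta)


if __name__ == "__main__":
    alpha, x = 3.0, 0.7
    print(gev_cdf(x, 1.0 / alpha), math.exp(-((1.0 + x / alpha) ** (-alpha))))
    print(gev_cdf(x, -1.0 / alpha), math.exp(-((1.0 - x / alpha) ** alpha)))
    print(gev_cdf(x, 1e-8), gev_cdf(x, 0.0))
```

Each printed pair should agree, matching the cases $\beta = 1/\alpha$, $\beta = -1/\alpha$ and $\beta \to 0$ discussed above.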

21.6

Moments

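Making use of the representation in Exercise 21.1, we can express moments of the extreme value distributions in terms of moments of the standard exponential distribution. Let random variables $Y$, $W$, and $V$ have cdf's $H_{1,\alpha}(x)$, $H_{2,\alpha}(x)$, and $H_3(x)$, respectively. Then, from Exercise 21.1, we have the following relations:

$$Y \stackrel{d}{=} X^{-1/\alpha}, \qquad W \stackrel{d}{=} -X^{1/\alpha}, \qquad V \stackrel{d}{=} -\log X, \tag{21.21}$$

where $X$ has the standard exponential distribution. Hence, we have

$$EY^k = EX^{-k/\alpha} = \int_0^\infty x^{-k/\alpha} e^{-x}\, dx, \tag{21.22}$$

$$EW^k = (-1)^k EX^{k/\alpha} = (-1)^k \int_0^\infty x^{k/\alpha} e^{-x}\, dx, \tag{21.23}$$

and

$$EV^k = (-1)^k E(\log X)^k = (-1)^k \int_0^\infty (\log x)^k e^{-x}\, dx. \tag{21.24}$$

It readily follows from (21.22) that moments $EY^k$ exist if $k < \alpha$ and that

$$EY^k = \Gamma\!\left(1 - \frac{k}{\alpha}\right). \tag{21.25}$$

Relation (21.23) reveals that moments $EW^k$ exist for any $\alpha > 0$ and $k = 1, 2, \ldots$, and

$$EW^k = (-1)^k\, \Gamma\!\left(1 + \frac{k}{\alpha}\right). \tag{21.26}$$

From (21.25) and (21.26), we also obtain

$$\operatorname{Var} Y = \Gamma\!\left(1 - \frac{2}{\alpha}\right) - \left\{\Gamma\!\left(1 - \frac{1}{\alpha}\right)\right\}^2, \qquad \alpha > 2, \tag{21.27}$$

and

$$\operatorname{Var} W = \Gamma\!\left(1 + \frac{2}{\alpha}\right) - \left\{\Gamma\!\left(1 + \frac{1}{\alpha}\right)\right\}^2 \tag{21.28}$$

for any $\alpha > 0$.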

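It is known that Euler's constant $\gamma = 0.57722\ldots$ is defined as

$$\gamma = \lim_{n \to \infty} \left( \sum_{k=1}^{n} \frac{1}{k} - \log n \right) \tag{21.29}$$

and that it also admits the integral representation

$$\gamma = -\int_0^\infty \log x\; e^{-x}\, dx. \tag{21.30}$$

Comparing (21.24) and (21.30), we immediately see that

$$EV = \gamma = 0.57722\ldots. \tag{21.31}$$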
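Both descriptions of Euler's constant, the limit in (21.29) and the expectation $EV = E(-\log X)$ in (21.31), can be checked numerically. The sketch below is an illustration only; the function names, the seed and the sample size are assumptions, and the Monte Carlo estimate is accurate only to about two decimal places.

```python
import math
import random


def euler_gamma_partial(n: int) -> float:
    """Partial expression sum_{k=1}^{n} 1/k - log(n) from (21.29)."""
    return sum(1.0 / k for k in range(1, n + 1)) - math.log(n)


def mean_minus_log_exponential(size: int, seed: int = 21) -> float:
    """Monte Carlo estimate of E(-log X) for X ~ E(1)."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(size):
        x = rng.expovariate(1.0)
        if x > 0.0:                 # guard against log(0) in the (very unlikely) case x == 0
            total += -math.log(x)
            count += 1
    return total / count


if __name__ == "__main__":
    print(euler_gamma_partial(10**6))
    print(mean_minus_log_exponential(100_000))
```

The first value approaches $\gamma$ with an error of order $1/(2n)$; the second fluctuates around $\gamma$ with a standard error of roughly $0.004$ at this sample size.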

Another way to obtain (21.31) is through the characteristic function $f_V(t)$ of the random variable $V$. We have

$$f_V(t) = Ee^{itV} = Ee^{-it\log X} = EX^{-it} = \int_0^\infty x^{-it} e^{-x}\, dx = \Gamma(1 - it). \tag{21.32}$$

From (21.32) and the relation

$$f_V^{(k)}(0) = i^k\, EV^k,$$

we readily find that

$$EV^k = (-1)^k\, \Gamma^{(k)}(1), \qquad k = 1, 2, \ldots. \tag{21.33}$$

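The following useful identity, which is valid for positive $z$, helps us to find the necessary derivatives of the gamma function:

$$\psi(z) = \frac{d}{dz}\log\Gamma(z) = \frac{\Gamma'(z)}{\Gamma(z)} = -\gamma + \sum_{k=0}^{\infty}\left(\frac{1}{k+1} - \frac{1}{k+z}\right). \tag{21.34}$$

Since

$$\psi'(z) = \sum_{k=0}^{\infty}\frac{1}{(k+z)^2} \quad \text{and hence} \quad \psi'(1) = \sum_{k=1}^{\infty}\frac{1}{k^2} = \frac{\pi^2}{6}, \tag{21.35}$$

we obtain from (21.33) and (21.34) that

$$EV = -\Gamma'(1) = -\psi(1) = \gamma, \tag{21.36}$$

$$EV^2 = \Gamma''(1) = \psi'(1) + \psi^2(1) = \frac{\pi^2}{6} + \gamma^2. \tag{21.37}$$

It follows now from (21.36) and (21.37) that

$$\operatorname{Var} V = \frac{\pi^2}{6}. \tag{21.38}$$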


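Now, let the random variables $Y_1$, $W_1$, and $V_1$ have cdf's $G_{1,\alpha}(x)$, $G_{2,\alpha}(x)$, and $G_3(x)$, respectively. Since

$$Y_1 \stackrel{d}{=} -Y, \qquad W_1 \stackrel{d}{=} -W, \qquad V_1 \stackrel{d}{=} -V, \tag{21.39}$$

we immediately obtain

$$EY_1^k = (-1)^k\, \Gamma\!\left(1 - \frac{k}{\alpha}\right), \qquad \alpha > k, \tag{21.40}$$

$$EW_1^k = \Gamma\!\left(1 + \frac{k}{\alpha}\right), \tag{21.41}$$

$$\operatorname{Var} Y_1 = \operatorname{Var} Y = \Gamma\!\left(1 - \frac{2}{\alpha}\right) - \left\{\Gamma\!\left(1 - \frac{1}{\alpha}\right)\right\}^2, \qquad \alpha > 2, \tag{21.42}$$

$$\operatorname{Var} W_1 = \operatorname{Var} W = \Gamma\!\left(1 + \frac{2}{\alpha}\right) - \left\{\Gamma\!\left(1 + \frac{1}{\alpha}\right)\right\}^2, \tag{21.43}$$

$$EV_1 = -\gamma = -0.57722\ldots, \tag{21.44}$$

and

$$\operatorname{Var} V_1 = \operatorname{Var} V = \frac{\pi^2}{6}. \tag{21.45}$$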

It is important to mention here that extreme value distributions discussed in this chapter have assumed a very important role in life-testing and reliability problems besides being used as probabilistic models in a variety of other problems.

CHAPTER 22

LOGISTIC DISTRIBUTION

22.1

Introduction

Let $V_1$ and $V_2$ be i.i.d. random variables having the extreme value distribution with cdf [see (21.7)]

$$H_3(x) = e^{-e^{-x}}, \qquad -\infty < x < \infty.$$

Let $V = V_1 - V_2$. Then, the cdf $F_V(x)$ of $V$ is obtained as

$$F_V(x) = \int_{-\infty}^{\infty} H_3(x + y)\, dH_3(y) = \int_{-\infty}^{\infty} e^{-e^{-(x+y)}}\, e^{-y}\, e^{-e^{-y}}\, dy = \frac{1}{1 + e^{-x}}, \qquad -\infty < x < \infty. \tag{22.1}$$

This distribution is a particular case of the logistic distribution, which has been known since the pioneering work of Verhulst (1838, 1845) on demography. A book-length account of logistic distributions, discussing in great detail their various properties and applications, is available [Balakrishnan (1992)].

22.2

Notations

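A random variable $X$ is said to have a logistic distribution if its pdf is given by

$$p_X(x) = \frac{\pi}{\sigma\sqrt{3}}\; \frac{e^{-\pi(x-\mu)/(\sigma\sqrt{3})}}{\left\{1 + e^{-\pi(x-\mu)/(\sigma\sqrt{3})}\right\}^2}, \qquad -\infty < x < \infty. \tag{22.2}$$

The corresponding cdf is given by

$$F_X(x) = \frac{1}{1 + e^{-\pi(x-\mu)/(\sigma\sqrt{3})}}, \qquad -\infty < x < \infty. \tag{22.3}$$

We will use $X \sim Lo(\mu, \sigma^2)$ to denote the random variable $X$ which has the logistic distribution with pdf and cdf as in (22.2) and (22.3), respectively. It is evident that $\mu$ ($-\infty < \mu < \infty$) is the location parameter while $\sigma$ ($\sigma > 0$) is the scale parameter. Shortly, we will show that $\mu$ and $\sigma^2$ are, in fact, the mean and variance of this logistic distribution. The standard logistic random variable, denoted by $Y \sim Lo(0, 1)$, has its pdf and cdf as

$$p_Y(x) = \frac{\pi}{\sqrt{3}}\; \frac{e^{-\pi x/\sqrt{3}}}{\left(1 + e^{-\pi x/\sqrt{3}}\right)^2} \quad \text{and} \quad F_Y(x) = \frac{1}{1 + e^{-\pi x/\sqrt{3}}}, \qquad -\infty < x < \infty.$$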

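24.3

Inverse Gaussian Distribution

The two-parameter inverse Gaussian distribution, denoted by $IG(\mu, \lambda)$, has its pdf as

$$p_X(x) = \sqrt{\frac{\lambda}{2\pi x^3}}\; \exp\left\{-\frac{\lambda}{2\mu^2}\,\frac{(x-\mu)^2}{x}\right\}, \qquad x > 0,\ \lambda, \mu > 0, \tag{24.13}$$

and the corresponding cdf as

$$F_X(x) = \Phi\!\left(\sqrt{\frac{\lambda}{x}}\left(\frac{x}{\mu} - 1\right)\right) + e^{2\lambda/\mu}\, \Phi\!\left(-\sqrt{\frac{\lambda}{x}}\left(\frac{x}{\mu} + 1\right)\right), \qquad x > 0. \tag{24.14}$$

The characteristic function of $IG(\mu, \lambda)$ can be shown to be

$$f_X(t) = \exp\left\{\frac{\lambda}{\mu}\left[1 - \left(1 - \frac{2i\mu^2 t}{\lambda}\right)^{1/2}\right]\right\}. \tag{24.15}$$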


MISCELLANEA

Exercise 24.2 From the characteristic function in (24.15), show that $EX = \mu$ and $\operatorname{Var} X = \mu^3/\lambda$.
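The values of the mean and variance stated in Exercise 24.2 can be cross-checked numerically from the density (24.13), independently of the characteristic function. The sketch below uses a plain midpoint rule; the function names, the truncation point `upper` and the number of steps are assumptions chosen for $\mu = 1$, $\lambda = 2$, and would need adjusting for other parameter values.

```python
import math


def ig_pdf(x: float, mu: float, lam: float) -> float:
    """Inverse Gaussian density (24.13) at x > 0."""
    return math.sqrt(lam / (2.0 * math.pi * x ** 3)) * math.exp(
        -lam * (x - mu) ** 2 / (2.0 * mu ** 2 * x)
    )


def ig_numeric_moments(mu: float, lam: float, upper: float = 80.0, steps: int = 400_000):
    """Midpoint-rule approximations of (total mass, mean, variance) on (0, upper)."""
    h = upper / steps
    m0 = m1 = m2 = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        w = ig_pdf(x, mu, lam) * h
        m0 += w
        m1 += x * w
        m2 += x * x * w
    return m0, m1, m2 - m1 ** 2


if __name__ == "__main__":
    mass, mean, var = ig_numeric_moments(1.0, 2.0)
    print(mass, mean, var)   # compare with 1, mu = 1 and mu**3 / lam = 0.5
```

A numerical agreement of this kind is only a sanity check; the exercise itself asks for the derivation from (24.15).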

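Exercise 24.3 Show that Pearson coefficients of skewness and kurtosis are given by

$$\gamma_1 = 3\sqrt{\frac{\mu}{\lambda}} \quad \text{and} \quad \gamma_2 = 3 + \frac{15\mu}{\lambda},$$

respectively, thus revealing that $IG(\mu, \lambda)$ distributions are positively skewed and leptokurtic. Note that these distributions are represented by the line $\gamma_2 = 3 + 5\gamma_1^2/3$ in the $(\gamma_1, \gamma_2)$-plane.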

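By taking $\lambda = \mu^2$ in (24.13), we obtain the one-parameter inverse Gaussian distribution with pdf

$$p_X(x) = \frac{\mu}{\sqrt{2\pi x^3}}\; \exp\left\{-\frac{(x-\mu)^2}{2x}\right\}, \qquad x > 0,\ \mu > 0, \tag{24.16}$$

denoted by $IG(\mu, \mu^2)$. Another one-parameter inverse Gaussian distribution may be derived from (24.13) by letting $\mu \to \infty$. This results in the pdf

$$p_X(x) = \sqrt{\frac{\lambda}{2\pi x^3}}\; \exp\left\{-\frac{\lambda}{2x}\right\}, \qquad x > 0,\ \lambda > 0.$$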

Exercise 24.4 Suppose $X_1, X_2, \ldots, X_n$ are independent inverse Gaussian random variables with $X_i$ distributed as $IG(\mu_i, \lambda_i)$. Then, using the characteristic function in (24.15), show that $\sum_{i=1}^{n} \lambda_i X_i/\mu_i^2$ is distributed as $IG(\mu, \mu^2)$, where $\mu = \sum_{i=1}^{n} \lambda_i/\mu_i$. Show then, when $\mu_i = \mu$ and $\lambda_i = \lambda$ for all $i = 1, 2, \ldots, n$, that the sample mean $\bar{X}$ is distributed as $IG(\mu, n\lambda)$.

Inverse Gaussian distributions have many properties analogous to those of normal distributions. Hence, considerable attention has been paid in the literature to inferential procedures for inverse Gaussian distributions as well as their applications. For a detailed discussion on these developments, one may refer to the books by Chhikara and Folks (1989) and Seshadri (1993, 1998).

24.4

Chi-square Distribution

In Chapter 20 (see, for example, Section 20.2), we made a passing remark that the special case of gamma $\Gamma(n/2, 0, 2)$ distribution (where $n$ is a positive integer) is called the chi-square ($\chi^2$) distribution with $n$ degrees of freedom. We shall denote the corresponding random variable by $\chi^2_n$. Then, from (20.6), we have its density function as

$$p_{\chi^2_n}(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\; e^{-x/2}\, x^{(n/2)-1}, \qquad 0 < x < \infty. \tag{24.17}$$

From (20.9), we also have the characteristic function of $\chi^2_n$ as

$$f_{\chi^2_n}(t) = (1 - 2it)^{-n/2}. \tag{24.18}$$

From (20.15) and (20.17), we have the mean and variance of $\chi^2_n$ as

$$E\chi^2_n = n \quad \text{and} \quad \operatorname{Var} \chi^2_n = 2n. \tag{24.19}$$

Furthermore, from (20.18) and (20.19), we have the coefficients of skewness and kurtosis of $\chi^2_n$ as

$$\gamma_1 = 2\sqrt{\frac{2}{n}} \quad \text{and} \quad \gamma_2 = 3 + \frac{12}{n}. \tag{24.20}$$

Also, as shown in Chapter 20, the limiting distribution of the sequence $(\chi^2_n - n)/\sqrt{2n}$ is standard normal.

Next, let $\chi^2_n$ and $\chi^2_m$ be two independent chi-square random variables, and let $\chi^2 = \chi^2_n + \chi^2_m$. Then, from (24.18), we obtain the characteristic function of $\chi^2$ as

$$f_{\chi^2}(t) = Ee^{it\chi^2} = Ee^{it\chi^2_n}\, Ee^{it\chi^2_m} = (1 - 2it)^{-(n+m)/2}, \tag{24.21}$$

which readily implies that $\chi^2$ has a chi-square distribution with $(n + m)$ degrees of freedom. On the other hand, if $\chi^2_n$ and $X$ are independent random variables with $X$ having an arbitrary distribution, and if $\chi^2 = \chi^2_n + X$ is distributed as chi-square with $(n + m)$ degrees of freedom (where $m$ is a positive integer), then the characteristic function of $X$ is

$$f_X(t) = \frac{f_{\chi^2}(t)}{f_{\chi^2_n}(t)} = (1 - 2it)^{-m/2}, \tag{24.22}$$

which implies that $X$ is necessarily distributed as $\chi^2_m$.

Let $X_1, \ldots, X_n$ be independent standard normal $N(0, 1)$ random variables. Then, as noted in Chapter 23 [see, for example, Eq. (23.81)],

$$S_n = \sum_{k=1}^{n} X_k^2$$

follows a chi-square distribution with $n$ degrees of freedom. More generally, the following result can be established.

Exercise 24.5 Let $Y_1, \ldots, Y_n$ be a random sample from the normal $N(a, \sigma^2)$ distribution, and $\bar{Y} = \frac{1}{n}\sum_{k=1}^{n} Y_k$ denote the sample mean. Then, show that [see Eq. (23.60)]

$$\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{k=1}^{n} (Y_k - \bar{Y})^2 \stackrel{d}{=} \chi^2_{n-1}. \tag{24.23}$$

It is because of this fact that chi-square distributions play a very important role in statistical inferential problems. A book-length account of chi-square distributions, discussing in great detail their various properties and applications, is available [Lancaster (1969)].
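The representation of $\chi^2_n$ as a sum of squares of $n$ independent standard normal variables gives a direct way to simulate it and to compare the sample mean and variance with (24.19). The sketch below is only a Monte Carlo illustration; the seed, the sample size and the helper names are arbitrary assumptions.

```python
import random


def chi_square_draw(n: int, rng: random.Random) -> float:
    """One draw of S_n = X_1^2 + ... + X_n^2 with X_k i.i.d. N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))


def sample_mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v


rng = random.Random(244)                      # fixed seed, arbitrary choice
draws = [chi_square_draw(5, rng) for _ in range(40_000)]
chi2_mean, chi2_var = sample_mean_and_variance(draws)

if __name__ == "__main__":
    print(chi2_mean, chi2_var)                # compare with n = 5 and 2n = 10
```

The estimates fluctuate from seed to seed, so only approximate agreement with $n$ and $2n$ should be expected.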

24.5

t Distribution

Let $X, X_1, \ldots, X_n$ be i.i.d. random variables having standard normal $N(0, 1)$ distribution. Then, consider the random variable [see also Eq. (23.85)]

$$T = \frac{X}{\sqrt{S_n/n}}, \qquad S_n = X_1^2 + \cdots + X_n^2, \tag{24.24}$$

where the numerator and denominator are independent with the numerator having a standard normal distribution and $S_n$ having a chi-square distribution with $n$ degrees of freedom. Then, as given in Exercise 23.8, the pdf of this random variable is given by [see Eq. (23.86)]

$$p_n(t) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{\pi n}\;\Gamma\!\left(\frac{n}{2}\right)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}, \qquad -\infty < t < \infty, \tag{24.25}$$

which is called Student's t distribution with $n$ degrees of freedom. Let us denote this distribution by $t_n$. This is a special form of Karl Pearson's Type VII distribution. Since "Student" (1908) was the first to obtain this result, it is called Student's distribution. But sometimes this distribution is called Fisher's distribution. More generally, the following result can be established.

Exercise 24.6 Let $Y_1, \ldots, Y_n$ be a random sample from the normal $N(a, \sigma^2)$ distribution, and $\bar{Y} = \sum_{k=1}^{n} Y_k/n$ denote the sample mean. Then, with $S^2$ as defined in (24.23), show that

$$\frac{\sqrt{n}\,(\bar{Y} - a)}{S} \stackrel{d}{=} t_{n-1}. \tag{24.26}$$

It is for this reason that t distributions play a very important role in statistical inferential problems.

From the density function of $X \stackrel{d}{=} t_n$ in (24.25), it can be shown that the $r$th moment of $X$ is finite only for $r < n$. Since the density function is symmetric about $x = 0$, all odd moments of $X$ (that exist) are zero. If $r$ is even, it can be shown that the $r$th moment of $X$ (which is also the central moment) is given by

$$EX^r = n^{r/2}\, \frac{\Gamma\!\left(\frac{r+1}{2}\right)\Gamma\!\left(\frac{n-r}{2}\right)}{\Gamma\!\left(\frac{1}{2}\right)\Gamma\!\left(\frac{n}{2}\right)} = n^{r/2}\, \frac{1 \cdot 3 \cdots (r-1)}{(n-r)(n-r+2)\cdots(n-2)}. \tag{24.27}$$

Exercise 24.7 Derive the formula in (24.27).

From the expressions above, we readily obtain the mean, variance, and coefficients of skewness and kurtosis of $X \stackrel{d}{=} t_n$ as

$$EX = 0, \qquad \operatorname{Var} X = \frac{n}{n-2}\ (n > 2), \qquad \gamma_1(X) = 0, \qquad \text{and} \qquad \gamma_2(X) = \frac{3(n-2)}{n-4}\ (n > 4). \tag{24.28}$$

It is evident that the t distributions are symmetric, unimodal, bell-shaped and leptokurtic distributions. In addition, as mentioned earlier in Exercise 23.9, $t_n$ distributions converge in limit (as $n \to \infty$) to the standard normal distribution. Plots of the t density function presented in Figures 24.1 and 24.2 reveal these properties.

Exercise 24.8 Let X and Y be i.i.d. random variables with t , distribution. Then, show that

(24.29) a result established by Cacoullos (1965).

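In a recently published article, Jones (2002) observed that the $t_2$ distribution has simple forms for its distribution and quantile functions which lead to simple calculations for many properties and measures relating to this distribution. The t density in (24.25) reduces, for the case $n = 2$, simply to

$$p_2(t) = \frac{1}{(2 + t^2)^{3/2}}, \qquad -\infty < t < \infty, \tag{24.30}$$

from which we readily obtain the cdf as

$$\begin{aligned}
F_2(t) &= \int_{-\infty}^{t} \frac{du}{(2 + u^2)^{3/2}} \\
&= \int_{-\pi/2}^{\tan^{-1}(t/\sqrt{2})} \frac{\sqrt{2}\,\sec^2\theta\; d\theta}{2^{3/2}\,(\sec^2\theta)^{3/2}} \qquad (\text{setting } u = \sqrt{2}\tan\theta) \\
&= \frac{1}{2}\int_{-\pi/2}^{\tan^{-1}(t/\sqrt{2})} \cos\theta\; d\theta \\
&= \frac{1}{2}\left\{1 + \sin\!\left(\tan^{-1}\frac{t}{\sqrt{2}}\right)\right\} \\
&= \frac{1}{2}\left\{1 + \frac{t}{\sqrt{2 + t^2}}\right\}, \qquad -\infty < t < \infty.
\end{aligned} \tag{24.31}$$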
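Because the $t_2$ cdf in (24.31) is available in closed form, it can be inverted explicitly, and the resulting quantile function $(2u-1)/\sqrt{2u(1-u)}$ makes quantile calculations immediate. The sketch below codes the density, cdf and quantile function and checks that they are mutually consistent; the function names are illustrative assumptions.

```python
import math


def t2_pdf(t: float) -> float:
    """Density (24.30) of the t distribution with 2 degrees of freedom."""
    return (2.0 + t * t) ** -1.5


def t2_cdf(t: float) -> float:
    """Closed-form cdf (24.31)."""
    return 0.5 * (1.0 + t / math.sqrt(2.0 + t * t))


def t2_quantile(u: float) -> float:
    """Inverse of t2_cdf on 0 < u < 1, obtained by solving u = F_2(t) for t."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    return (2.0 * u - 1.0) / math.sqrt(2.0 * u * (1.0 - u))


if __name__ == "__main__":
    for u in (0.1, 0.5, 0.9, 0.975):
        t = t2_quantile(u)
        print(u, t, t2_cdf(t))
```

For instance, `t2_quantile(0.975)` reproduces the familiar two-sided 5% critical value of about 4.303 for 2 degrees of freedom.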

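Exercise 24.9 From (24.31), show that the quantile function of the $t_2$ distribution is

$$F_2^{-1}(u) = \frac{2u - 1}{\sqrt{2u(1-u)}}, \qquad 0 < u < 1.$$

$\ldots$ $(n > 2)$ and

$$\operatorname{Var} X = \frac{2n^2(m + n - 2)}{m(n-2)^2(n-4)} \qquad (n > 4).$$

Exercise 24.11 If $X \stackrel{d}{=} F_{n,n}$, then show that

$$\frac{\sqrt{n}}{2}\left(\sqrt{X} - \frac{1}{\sqrt{X}}\right) \stackrel{d}{=} t_n. \tag{24.35}$$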

24.7

Noncentral Distributions

In Section 24.4 we noted that when $X_1, X_2, \ldots, X_n$ are i.i.d. $N(0, 1)$ random variables, the variable $S_n = \sum_{k=1}^{n} X_k^2$ has a chi-square distribution with $n$ degrees of freedom. Now, consider the distribution of the variable

$$S'_n = \sum_{k=1}^{n} (X_k + a_k)^2. \tag{24.36}$$

The distribution of $S'_n$ depends on $a_1, a_2, \ldots, a_n$ only through $\lambda = \sum_{k=1}^{n} a_k^2$, and is called the noncentral chi-square distribution with $n$ degrees of freedom and noncentrality parameter $\lambda = \sum_{k=1}^{n} a_k^2$. When $\lambda = 0$ (i.e., when all $a_1, \ldots, a_n$ are zero), this noncentral chi-square distribution becomes the (central) chi-square distribution in (24.17).

Exercise 24.12 Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables with $Y_k$ distributed as $N(a_k, \sigma^2)$, and $\bar{Y} = \sum_{k=1}^{n} Y_k/n$ denote the sample mean. Then, show that

$$\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{k=1}^{n} (Y_k - \bar{Y})^2$$

is distributed as noncentral chi-square with $n - 1$ degrees of freedom and noncentrality parameter $\lambda = \sum_{k=1}^{n} (a_k - \bar{a})^2/\sigma^2$, where $\bar{a} = \sum_{k=1}^{n} a_k/n$.

Exercise 24.13 From (24.36), derive the mean and variance of the noncentral chi-square distribution with $n$ degrees of freedom and noncentrality parameter $\lambda = \sum_{k=1}^{n} a_k^2$.

In a similar manner, we can define noncentral t and noncentral F distributions which are useful in studying the power properties of t and F tests. For example, in Eq. (24.32), we defined the F distribution with $(m, n)$ degrees of freedom as the distribution of the variable

$$X = \frac{V/m}{W/n},$$

where $V$ and $W$ are independent chi-square random variables with $m$ and $n$ degrees of freedom, respectively. Now, consider the distribution of the variable

$$X' = \frac{V'/m}{W'/n}, \tag{24.37}$$

where $V'$ and $W'$ are independent noncentral chi-square random variables with $m$ and $n$ degrees of freedom and noncentrality parameters $\lambda_1$ and $\lambda_2$, respectively. The distribution of $X'$ is called the doubly noncentral F distribution with $(m, n)$ degrees of freedom and noncentrality parameters $(\lambda_1, \lambda_2)$. In the special case when $\lambda_2 = 0$ (i.e., when there is a central chi-square in the denominator), the distribution of $X'$ is called the (singly) noncentral F distribution with $(m, n)$ degrees of freedom and noncentrality parameter $\lambda_1$.
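The statement that the distribution of $S'_n$ in (24.36) depends on the shifts $a_1, \ldots, a_n$ only through $\lambda = \sum a_k^2$ can be illustrated by simulation. The sketch below draws $S'_n$ for two different shift vectors with the same $\lambda = 4$ and compares the empirical probabilities $P\{S'_n \le 6\}$; the shift vectors, seeds, threshold and helper names are all arbitrary assumptions.

```python
import math
import random


def noncentral_chi2_draws(shifts, size: int, seed: int):
    """Draws of sum_k (X_k + a_k)^2 with X_k i.i.d. N(0, 1), as in (24.36)."""
    rng = random.Random(seed)
    return [sum((rng.gauss(0.0, 1.0) + a) ** 2 for a in shifts) for _ in range(size)]


def fraction_at_most(values, x: float) -> float:
    """Empirical probability that a draw is <= x."""
    return sum(1 for v in values if v <= x) / len(values)


shifts_a = [2.0, 0.0, 0.0]                    # lambda = 4, concentrated in one coordinate
shifts_b = [2.0 / math.sqrt(3.0)] * 3         # lambda = 4, spread evenly over three

draws_a = noncentral_chi2_draws(shifts_a, 40_000, seed=1)
draws_b = noncentral_chi2_draws(shifts_b, 40_000, seed=2)

if __name__ == "__main__":
    print(fraction_at_most(draws_a, 6.0), fraction_at_most(draws_b, 6.0))
```

The two printed fractions should differ only by simulation noise, here a few thousandths.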

Part III

MULTIVARIATE DISTRIBUTIONS


CHAPTER 25

MULTINOMIAL DISTRIBUTION

25.1

Introduction

The multinomial distribution, being the multivariate generalization of the binomial distribution discussed in Chapter 5, is one of the most important and interesting multivariate discrete distributions. Consider a sequence of independent and identical trials each of which can result in one of $k$ possible mutually exclusive and collectively exhaustive events, say, $A_1, A_2, \ldots, A_k$, with respective probabilities $p_1, p_2, \ldots, p_k$, where $p_1 + p_2 + \cdots + p_k = 1$. Such trials are termed multinomial trials. Let $Y_\ell = (Y_{1,\ell}, Y_{2,\ell}, \ldots, Y_{k,\ell})$, $\ell = 1, 2, \ldots$, be the indicator vector variables, that is, $Y_{j,\ell}$ takes on the value 1 if the event $A_j$ ($j = 1, 2, \ldots, k$) is the outcome of the $\ell$th trial and the value 0 if the event $A_j$ is not the outcome of the $\ell$th trial. Note that the variables $Y_{1,\ell}, Y_{2,\ell}, \ldots, Y_{k,\ell}$ (which are the components of the vector $Y_\ell$) are dependent, and that

$$Y_{1,\ell} + Y_{2,\ell} + \cdots + Y_{k,\ell} = 1, \qquad \ell = 1, 2, \ldots. \tag{25.1}$$

For any $n = 1, 2, \ldots$, let us now define the random vector $X_n = (X_{1,n}, X_{2,n}, \ldots, X_{k,n})$ as

$$X_n = Y_1 + Y_2 + \cdots + Y_n, \qquad n = 1, 2, \ldots, \tag{25.2}$$

where $X_{j,n} = Y_{j,1} + Y_{j,2} + \cdots + Y_{j,n}$ ($j = 1, 2, \ldots, k$) is the number of occurrences of event $A_j$ in the $n$ multinomial trials. In other words, the random vector $X_n$ is simply a counter which gives us the number of occurrences of the events $A_1, A_2, \ldots, A_k$ in the $n$ multinomial trials, and hence

$$X_{1,n} + X_{2,n} + \cdots + X_{k,n} = n, \qquad n = 1, 2, \ldots. \tag{25.3}$$

Then, simple probability arguments readily yield

$$P_n(m_1, m_2, \ldots, m_k) = \Pr\{X_{1,n} = m_1, X_{2,n} = m_2, \ldots, X_{k,n} = m_k\} = \frac{n!}{m_1!\, m_2! \cdots m_k!}\; p_1^{m_1} p_2^{m_2} \cdots p_k^{m_k},$$

$$m_j = 0, \ldots, n, \qquad m_1 + \cdots + m_k = n. \tag{25.4}$$
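The probabilities in (25.4) are straightforward to compute directly. The following sketch evaluates the multinomial probability mass function and enumerates all admissible count vectors to confirm that the probabilities add up to one; the function names `multinomial_pmf` and `compositions` are illustrative assumptions.

```python
import math


def multinomial_pmf(counts, probs) -> float:
    """P_n(m_1, ..., m_k) from (25.4), with n = m_1 + ... + m_k."""
    n = sum(counts)
    coef = math.factorial(n)
    for m in counts:
        coef //= math.factorial(m)        # exact integer division at every step
    value = float(coef)
    for m, p in zip(counts, probs):
        value *= p ** m
    return value


def compositions(n: int, k: int):
    """All vectors (m_1, ..., m_k) of nonnegative integers with sum n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


if __name__ == "__main__":
    probs = (0.2, 0.3, 0.5)
    print(sum(multinomial_pmf(m, probs) for m in compositions(6, 3)))
    print(multinomial_pmf((1, 2, 3), probs))
```

With $k = 2$ the same function returns the binomial probabilities, since one of the two counts is determined by the other.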

25.2

Notations

A random vector $X_n = (X_{1,n}, X_{2,n}, \ldots, X_{k,n})$ having the joint probability mass function as in (25.4) is said to have the multinomial $M(n, p_1, p_2, \ldots, p_k)$ distribution. In the case when $k = 3$, the distribution is also referred to as the trinomial distribution.

Remark 25.1 The random vector $X_n \sim M(n, p_1, p_2, \ldots, p_k)$ is actually $(k-1)$-dimensional since its components $X_{1,n}, X_{2,n}, \ldots, X_{k,n}$ satisfy the relationship

$$X_{1,n} + X_{2,n} + \cdots + X_{k,n} = n$$

and, consequently, one of the components (say, $X_{k,n}$) can be expressed as

$$X_{k,n} = n - X_{1,n} - X_{2,n} - \cdots - X_{k-1,n}.$$

Hence, the distribution of the random vector $X_n = (X_{1,n}, X_{2,n}, \ldots, X_{k,n})$ is completely determined by the distribution of the $(k-1)$-dimensional random vector $(X_{1,n}, X_{2,n}, \ldots, X_{k-1,n})$. For example, when $k = 2$, the probabilities $P_n(m_1, m_2)$ in (25.4) simply become

$$P_n(m_1, n - m_1) = \frac{n!}{m_1!\,(n - m_1)!}\; p_1^{m_1}(1 - p_1)^{n - m_1}, \qquad m_1 = 0, 1, \ldots, n, \tag{25.5}$$

which are the binomial probabilities.

25.3

Compositions

Due to the probability interpretation of multinomial distributions given above, it readily follows that if independent vectors $Y_1, Y_2, \ldots, Y_n$ all have multinomial $M(1, p_1, p_2, \ldots, p_k)$ distribution, then the sum $X_n = Y_1 + Y_2 + \cdots + Y_n$ has the multinomial $M(n, p_1, p_2, \ldots, p_k)$ distribution. In addition, if $X \sim M(n_1, p_1, p_2, \ldots, p_k)$ and $Y \sim M(n_2, p_1, p_2, \ldots, p_k)$ are independent multinomial random vectors, then $X + Y$ is distributed as the sum $Y_1 + Y_2 + \cdots + Y_{n_1 + n_2}$ of i.i.d. multinomial $M(1, p_1, p_2, \ldots, p_k)$ random vectors, and hence, is distributed as multinomial $M(n_1 + n_2, p_1, p_2, \ldots, p_k)$.

25.4

Marginal Distributions

The fact that the multinomial distribution is the joint distribution of the number of occurrences of the events $A_1, A_2, \ldots, A_k$ in $n$ multinomial trials enables us to derive easily any marginal distribution of interest. Suppose that we are interested in finding the marginal probabilities

$$\Pr\{X_{1,n} = m_1, X_{2,n} = m_2, \ldots, X_{j,n} = m_j\}$$

for $j < k$, when $X_n \sim M(n, p_1, p_2, \ldots, p_k)$. We first note that

$$\Pr\{X_{1,n} = m_1, \ldots, X_{j,n} = m_j\} = \Pr\{X_{1,n} = m_1, \ldots, X_{j,n} = m_j, V = m\}, \tag{25.6}$$

where

$$V = X_{j+1,n} + \cdots + X_{k,n} \quad \text{and} \quad m = n - m_1 - \cdots - m_j;$$

evidently, $V$ denotes the number of occurrences of the event $A = A_{j+1} \cup A_{j+2} \cup \cdots \cup A_k$ in the $n$ multinomial trials with the corresponding probability of occurrence being

$$\Pr\{A\} = p_{j+1} + \cdots + p_k = 1 - p_1 - \cdots - p_j = p \ \text{(say)}.$$

Then, the random vector $(X_{1,n}, \ldots, X_{j,n}, V)$, which has $j + 1$ components, clearly has the multinomial $M(n, p_1, \ldots, p_j, p)$ distribution; then, using (25.4) and (25.6), we have

$$\Pr\{X_{1,n} = m_1, \ldots, X_{j,n} = m_j\} = P_n(m_1, \ldots, m_j, m) = \frac{n!}{m_1! \cdots m_j!\, m!}\; p_1^{m_1} \cdots p_j^{m_j}\, p^{m}, \tag{25.7}$$

where

$$m + \sum_{i=1}^{j} m_i = n \quad \text{and} \quad p + \sum_{i=1}^{j} p_i = 1.$$

In particular, for $j = 1$, we simply obtain from (25.7) the marginal distribution of $X_{1,n}$ as

$$\Pr\{X_{1,n} = m_1\} = \frac{n!}{m_1!\,(n - m_1)!}\; p_1^{m_1}(1 - p_1)^{n - m_1}, \qquad m_1 = 0, \ldots, n, \tag{25.8}$$

which simply reveals that the marginal distribution of $X_{1,n}$ is binomial $B(n, p_1)$. Similarly, we have $X_{r,n} \sim B(n, p_r)$ for any $r = 1, 2, \ldots, k$.

25.5

Conditional Distributions

Let $X_n \sim M(n, p_1, p_2, \ldots, p_k)$. Consider now the conditional distribution of $(X_{j+1,n}, \ldots, X_{k,n})$, given $(X_{1,n} = m_1, \ldots, X_{j,n} = m_j)$, defined by

$$\Pr\{X_{j+1,n} = m_{j+1}, \ldots, X_{k,n} = m_k \mid X_{1,n} = m_1, \ldots, X_{j,n} = m_j\} = \frac{\Pr\{X_{1,n} = m_1, \ldots, X_{k,n} = m_k\}}{\Pr\{X_{1,n} = m_1, \ldots, X_{j,n} = m_j\}}. \tag{25.9}$$

Substituting the expressions in (25.4) and (25.7) into (25.9), and writing $m = m_1 + \cdots + m_j$ and $p = p_{j+1} + \cdots + p_k$, we obtain

$$\Pr\{X_{j+1,n} = m_{j+1}, \ldots, X_{k,n} = m_k \mid X_{1,n} = m_1, \ldots, X_{j,n} = m_j\} = \frac{(n - m)!}{m_{j+1}! \cdots m_k!} \prod_{i=j+1}^{k} \left(\frac{p_i}{p}\right)^{m_i} \tag{25.10}$$

for $m_{j+1} + \cdots + m_k = n - (m_1 + \cdots + m_j)$, and 0 otherwise. From (25.10), we readily observe that the conditional distribution of $(X_{j+1,n}, \ldots, X_{k,n})$, given $X_{1,n} = m_1, \ldots, X_{j,n} = m_j$, is multinomial $M(n - m, q_{j+1}, \ldots, q_k)$, where $m = m_1 + \cdots + m_j$, $q_i = p_i/p$ (for $i = j+1, \ldots, k$), and $p = p_{j+1} + \cdots + p_k$. Since the dependence on $m_1, m_2, \ldots, m_j$ in (25.10) is only through the sum $m_1 + m_2 + \cdots + m_j$, we readily note that

$$\Pr\{X_{j+1,n} = m_{j+1}, \ldots, X_{k,n} = m_k \mid X_{1,n} = m_1, \ldots, X_{j,n} = m_j\} = \Pr\{X_{j+1,n} = m_{j+1}, \ldots, X_{k,n} = m_k \mid X_{1,n} + \cdots + X_{j,n} = m\}; \tag{25.11}$$

hence, the conditional distribution of $(X_{j+1,n}, \ldots, X_{k,n})$, given $X_{1,n} + \cdots + X_{j,n} = m$, is also the same multinomial $M(n - m, q_{j+1}, \ldots, q_k)$ distribution.

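25.6

Moments

Let $X_n = (X_{1,n}, \ldots, X_{k,n}) \sim M(n, p_1, \ldots, p_k)$. Then, since the marginal distribution of $X_{r,n}$ ($r = 1, 2, \ldots, k$) is binomial $B(n, p_r)$, we readily have

$$EX_{r,n} = np_r = e_r \quad \text{and} \quad \operatorname{Var} X_{r,n} = np_r(1 - p_r) = \sigma_r^2. \tag{25.12}$$

Next, in order to derive the correlation between the variables $X_{r,n}$ and $X_{s,n}$ ($1 \le r < s \le k$), we shall first find the covariance using the formula

$$\sigma_{rs} = \operatorname{Cov}(X_{r,n}, X_{s,n}) = E(X_{r,n}X_{s,n}) - e_r e_s = E\{X_{s,n}\, E(X_{r,n} \mid X_{s,n})\} - e_r e_s, \tag{25.13}$$

where the last equality follows from the fact that

$$\begin{aligned}
E(X_{r,n}X_{s,n}) &= \sum_i \sum_j i\,j\, \Pr\{X_{r,n} = i, X_{s,n} = j\} \\
&= \sum_j j\, \Pr\{X_{s,n} = j\} \sum_i i\, \Pr\{X_{r,n} = i \mid X_{s,n} = j\} \\
&= \sum_j j\, \Pr\{X_{s,n} = j\}\, E(X_{r,n} \mid X_{s,n} = j) \\
&= E\{X_{s,n}\, E(X_{r,n} \mid X_{s,n})\}.
\end{aligned} \tag{25.14}$$

Now, we shall explain how we can find the regression $E(X_{r,n} \mid X_{s,n})$ required in (25.13). For the sake of simplicity, let us consider the case when $r = 2$ and $s = 1$. Using the fact that the conditional distribution of the vector $(X_{2,n}, \ldots, X_{k,n})$, given $X_{1,n} = m$, is multinomial $M(n - m, q_2, \ldots, q_k)$, where $q_i = p_i/(1 - p_1)$ ($i = 2, \ldots, k$), we readily have that the conditional distribution of $X_{2,n}$, given $X_{1,n} = m$, is binomial $B(n - m, q_2)$; hence,

$$E(X_{2,n} \mid X_{1,n} = m) = (n - m)\,q_2 = \frac{(n - m)\,p_2}{1 - p_1}, \tag{25.15}$$

from which we obtain the regression of $X_{2,n}$ on $X_{1,n}$ to be

$$E(X_{2,n} \mid X_{1,n}) = \frac{(n - X_{1,n})\,p_2}{1 - p_1}.$$

Using this fact in (25.13), we obtain

$$\begin{aligned}
\sigma_{12} &= E(X_{1,n}X_{2,n}) - e_1 e_2 \\
&= E\{X_{1,n}\, E(X_{2,n} \mid X_{1,n})\} - e_1 e_2 \\
&= \frac{p_2}{1 - p_1}\, E\{X_{1,n}(n - X_{1,n})\} - e_1 e_2 \\
&= \frac{p_2}{1 - p_1}\left\{n^2 p_1 - np_1(1 - p_1) - n^2 p_1^2\right\} - n^2 p_1 p_2 \\
&= -np_1p_2.
\end{aligned} \tag{25.16}$$

Similarly, we have

$$\sigma_{rs} = -np_rp_s, \qquad 1 \le r < s \le k. \tag{25.17}$$

From (25.12) and (25.17), we thus have the covariance matrix of $X_n = (X_{1,n}, \ldots, X_{k,n})$ to be

$$\sigma_{rs} = \begin{cases} np_r(1 - p_r), & r = s, \\ -np_rp_s, & r \ne s, \end{cases} \tag{25.18}$$

for $1 \le r, s \le k$. Furthermore, from (25.12) and (25.17), we also find the correlation coefficient between $X_{r,n}$ and $X_{s,n}$ ($1 \le r < s \le k$) to be

$$\rho_{rs} = \frac{\sigma_{rs}}{\sigma_r \sigma_s} = -\sqrt{\frac{p_r\,p_s}{(1 - p_r)(1 - p_s)}}. \tag{25.19}$$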

It is important to note that all the correlation coefficients are negative in a multinomial distribution.
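The covariance $-np_1p_2$ and the negative correlation coefficient can be confirmed by exact enumeration in a small trinomial example. The sketch below sums over all outcomes of a trinomial distribution; the function names and the parameter values $n = 8$, $p_1 = 0.2$, $p_2 = 0.3$ are illustrative assumptions.

```python
import math


def trinomial_pmf(m1: int, m2: int, n: int, p1: float, p2: float) -> float:
    """Multinomial probability (25.4) for k = 3, with m3 = n - m1 - m2."""
    m3 = n - m1 - m2
    coef = math.factorial(n) // (math.factorial(m1) * math.factorial(m2) * math.factorial(m3))
    return coef * p1 ** m1 * p2 ** m2 * (1.0 - p1 - p2) ** m3


def exact_cov_and_corr(n: int, p1: float, p2: float):
    """Covariance and correlation of (X_1, X_2) by summing over all outcomes."""
    e1 = e2 = e11 = e22 = e12 = 0.0
    for m1 in range(n + 1):
        for m2 in range(n - m1 + 1):
            w = trinomial_pmf(m1, m2, n, p1, p2)
            e1 += m1 * w
            e2 += m2 * w
            e11 += m1 * m1 * w
            e22 += m2 * m2 * w
            e12 += m1 * m2 * w
    cov = e12 - e1 * e2
    corr = cov / math.sqrt((e11 - e1 ** 2) * (e22 - e2 ** 2))
    return cov, corr


if __name__ == "__main__":
    print(exact_cov_and_corr(8, 0.2, 0.3))   # compare with -n*p1*p2 = -0.48
```

The negative sign reflects the constraint that the counts add up to $n$: a larger count for one event leaves fewer trials for the others.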

Exercise 25.1 Derive the multiple regression function of $X_{r+1,n}$ on $X_{1,n}, \ldots, X_{r,n}$.

25.7

Generating Function and Characteristic Function

Consider a random vector $Y = (Y_1, \ldots, Y_k)$ having $M(1, p_1, \ldots, p_k)$ distribution. As pointed out earlier, the distribution of this vector is determined by the nonzero probabilities (for $r = 1, \ldots, k$)

$$p_r = \Pr\{Y_1 = 0, \ldots, Y_{r-1} = 0, Y_r = 1, Y_{r+1} = 0, \ldots, Y_k = 0\}. \tag{25.20}$$

Then, it is evident that the generating function $Q(s_1, s_2, \ldots, s_k)$ of $Y$ is

$$Q(s_1, s_2, \ldots, s_k) = E\left(s_1^{Y_1} s_2^{Y_2} \cdots s_k^{Y_k}\right) = \sum_{r=1}^{k} p_r s_r. \tag{25.21}$$

Since $X_n \sim M(n, p_1, \ldots, p_k)$ is distributed as the sum $Y_1 + \cdots + Y_n$ [see Eq. (25.2)], where $Y_1, \ldots, Y_n$ are i.i.d. multinomial $M(1, p_1, \ldots, p_k)$ variables with generating function as in (25.21), we readily obtain the generating function of $X_n$ as

$$P_n(s_1, \ldots, s_k) = E\left(s_1^{X_{1,n}} \cdots s_k^{X_{k,n}}\right) = \{Q(s_1, \ldots, s_k)\}^n = \left(\sum_{r=1}^{k} p_r s_r\right)^{n}. \tag{25.22}$$

From (25.22), we deduce the generating function of $(X_{1,n}, \ldots, X_{m,n})$ (for $m = 1, \ldots, k-1$) as

$$R_n(s_1, \ldots, s_m) = P_n(s_1, \ldots, s_m, 1, \ldots, 1) = \left\{1 + \sum_{r=1}^{m} p_r(s_r - 1)\right\}^{n}. \tag{25.23}$$

In particular, when $m = 1$, we obtain from (25.23)

$$R_n(s_1) = \{1 + p_1(s_1 - 1)\}^{n}, \tag{25.24}$$

which readily reveals that $X_{1,n}$ is distributed as binomial $B(n, p_1)$ (as noted earlier). Further, we obtain from (25.23) the generating function of the sum $X_{1,n} + \cdots + X_{m,n}$ (for $m = 1, \ldots, k-1$) as

$$Es^{X_{1,n} + \cdots + X_{m,n}} = R_n(s, \ldots, s) = \left\{1 + (s - 1)\sum_{r=1}^{m} p_r\right\}^{n}, \tag{25.25}$$

which reveals that the sum $X_{1,n} + \cdots + X_{m,n}$ is distributed as binomial $B\left(n, \sum_{r=1}^{m} p_r\right)$. Note that when $m = k$, the sum $X_{1,n} + \cdots + X_{k,n}$ has a degenerate distribution since $X_{1,n} + \cdots + X_{k,n} = n$.

Exercise 25.2 From the generating function of $X_n \sim M(n, p_1, \ldots, p_k)$ in (25.22), establish the expressions of means, variances, and covariances derived in (25.12) and (25.17).

Exercise 25.3 From the generating function of $(X_{1,n}, \ldots, X_{m,n})$ in (25.23), prove that if $m > n$ and $m \le k$, then $E(X_{1,n} \cdots X_{m,n}) = 0$. Also, argue in this case that this expression must be true due to the fact that at least one of the $X_{r,n}$'s must be 0 since $X_{1,n} + \cdots + X_{k,n} = n$.

From (25.22), we immediately obtain the characteristic function of $X_n \sim M(n, p_1, \ldots, p_k)$ as

$$f_n(t_1, \ldots, t_k) = E\big\{e^{i(t_1 X_{1,n} + \cdots + t_k X_{k,n})}\big\} = P_n(e^{it_1}, \ldots, e^{it_k}) = \Big(\sum_{r=1}^{k} p_r e^{it_r}\Big)^{n}. \tag{25.26}$$

In addition, from (25.23), we readily obtain the characteristic function of $(X_{1,n}, \ldots, X_{m,n})$ (for $m = 1, \ldots, k-1$) as

$$g_n(t_1, \ldots, t_m) = R_n(e^{it_1}, \ldots, e^{it_m}) = \Big\{1 + \sum_{r=1}^{m} p_r (e^{it_r} - 1)\Big\}^{n}. \tag{25.27}$$

Exercise 25.4 From (25.27), deduce the characteristic function of the sum $X_{1,n} + \cdots + X_{m,n}$ (for $m = 1, 2, \ldots, k-1$) and show that it corresponds to that of the binomial $B\big(n, \sum_{r=1}^{m} p_r\big)$.

25.8 Limit Theorems

Let us now consider the sequence of random vectors

$$X_n = (X_{1,n}, \ldots, X_{k,n}) \sim M(n, p_1, \ldots, p_k), \tag{25.28}$$

where $p_k = 1 - \sum_{r=1}^{k-1} p_r$. Let $p_r = \lambda_r / n$ for $r = 1, \ldots, k-1$. Then, for $m = k-1$, the characteristic function of $(X_{1,n}, \ldots, X_{k-1,n})$ in (25.27) becomes

$$g_n(t_1, \ldots, t_{k-1}) = \Big\{1 + \frac{1}{n} \sum_{r=1}^{k-1} \lambda_r (e^{it_r} - 1)\Big\}^{n}. \tag{25.29}$$

Letting $n \to \infty$ in (25.29), we observe that

$$g_n(t_1, \ldots, t_{k-1}) \to \exp\Big\{\sum_{r=1}^{k-1} \lambda_r (e^{it_r} - 1)\Big\} = \prod_{r=1}^{k-1} h_r(t_r), \tag{25.30}$$

where $h_r(t) = \exp\{\lambda_r (e^{it} - 1)\}$ is the characteristic function of the Poisson $\pi(\lambda_r)$ distribution (for $r = 1, \ldots, k-1$). Hence, we observe from (25.30) that, as $n \to \infty$, the components $X_{1,n}, \ldots, X_{k-1,n}$ of the multinomial random vector $X_n$ in (25.28) are asymptotically independent and that the marginal distribution of $X_{r,n}$ converges to the Poisson $\pi(\lambda_r)$ distribution for any $r = 1, 2, \ldots, k-1$.
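The Poisson limit can be observed numerically by evaluating the characteristic function with $p_r = \lambda_r / n$ for growing $n$ and comparing it with the product of Poisson characteristic functions. This is a small sketch under those assumptions; the function names and the sample values of $\lambda_r$ and $t_r$ are arbitrary choices made for illustration.

```python
import cmath

def g_n(t, lam, n):
    """Characteristic function of (X_1, ..., X_{k-1}) when p_r = lam_r / n."""
    inner = sum(l * (cmath.exp(1j * tr) - 1) for l, tr in zip(lam, t))
    return (1 + inner / n) ** n

def poisson_product_cf(t, lam):
    """Product of the Poisson characteristic functions exp{lam_r (e^{i t_r} - 1)}."""
    out = 1 + 0j
    for l, tr in zip(lam, t):
        out *= cmath.exp(l * (cmath.exp(1j * tr) - 1))
    return out

def limit_error(t, lam, n):
    """Distance between the finite-n characteristic function and its limit."""
    return abs(g_n(t, lam, n) - poisson_product_cf(t, lam))
```

Calling `limit_error((0.4, -1.1), (1.5, 0.7), n)` for `n = 10, 1000, 100000` should give a decreasing sequence, consistent with an error of order $1/n$.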

Exercise 25.5 Using a similar argument, show that $X_{1,n} + \cdots + X_{m,n}$ (for $m = 1, \ldots, k-1$) converges to the Poisson $\pi\big(\sum_{r=1}^{m} \lambda_r\big)$ distribution.

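Next, let us consider the sequence of the $(k-1)$-dimensional random vectors

$$W_n = (W_{1,n}, \ldots, W_{k-1,n}), \qquad W_{r,n} = \frac{X_{r,n} - n p_r}{\sqrt{n p_r (1 - p_r)}}, \quad r = 1, \ldots, k-1. \tag{25.31}$$

Let $h_n(t_1, \ldots, t_{k-1})$ be the characteristic function of $W_n$ in (25.31). Then, it follows from (25.27) that

$$h_n(t_1, \ldots, t_{k-1}) = \exp\Big\{-i \sum_{r=1}^{k-1} t_r \sqrt{\frac{n p_r}{1 - p_r}}\Big\} \Big\{1 + \sum_{r=1}^{k-1} p_r \Big(e^{i t_r / \sqrt{n p_r (1 - p_r)}} - 1\Big)\Big\}^{n}. \tag{25.32}$$

Exercise 25.6 As $n \to \infty$, show that $h_n(t_1, \ldots, t_{k-1})$ in (25.32) converges to

$$h(t_1, \ldots, t_{k-1}) = \exp\Big\{-\frac{1}{2}\Big(\sum_{r=1}^{k-1} t_r^2 + 2 \sum_{1 \le r < s \le k-1} \rho_{rs} t_r t_s\Big)\Big\}, \tag{25.33}$$

where

$$\rho_{rs} = -\sqrt{\frac{p_r p_s}{(1 - p_r)(1 - p_s)}}$$

is the correlation coefficient between $X_{r,n}$ and $X_{s,n}$ derived in (25.19).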

From (25.33), we see that the limiting characteristic function of the random variable

$$W_{r,n} = \frac{X_{r,n} - n p_r}{\sqrt{n p_r (1 - p_r)}}$$

becomes $\exp\big(-\tfrac{1}{2} t^2\big)$, which readily implies that the limiting distribution of the random variable $W_{r,n}$ is indeed standard normal (for $r = 1, \ldots, k-1$). Furthermore, in Chapter 26, we will see that the limiting characteristic function of $W_n$ in (25.33) corresponds to that of a multivariate normal distribution with mean vector $(0, \ldots, 0)$ and covariance matrix $\Sigma = (\sigma_{ij})$, where

$$\sigma_{ij} = \begin{cases} 1 & \text{if } i = j, \\ \rho_{ij} & \text{if } i \ne j, \end{cases} \tag{25.34}$$

for $1 \le i, j \le k-1$. Hence, we have the asymptotic distribution of the random vector $W_n$ in (25.31) to be multivariate normal.
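The normal limit can also be checked numerically. The sketch below assumes the standardization $W_{r,n} = (X_{r,n} - n p_r)/\sqrt{n p_r (1 - p_r)}$, substitutes $t_r / \sqrt{n p_r (1 - p_r)}$ into the characteristic function (25.27), and compares the result with $\exp\{-\tfrac{1}{2} Q\}$, where $Q$ is built from the multinomial correlation coefficients $\rho_{rs} = -\sqrt{p_r p_s / ((1 - p_r)(1 - p_s))}$. The function names are illustrative, and `p` holds only the first $k-1$ cell probabilities.

```python
import cmath
from math import sqrt

def rho(p, r, s):
    """Correlation coefficient between the r-th and s-th multinomial counts."""
    return -sqrt(p[r] * p[s] / ((1 - p[r]) * (1 - p[s])))

def h_n(t, p, n):
    """Characteristic function of the standardized vector W_n."""
    shift = sum(tr * sqrt(n * pr / (1 - pr)) for tr, pr in zip(t, p))
    inner = sum(pr * (cmath.exp(1j * tr / sqrt(n * pr * (1 - pr))) - 1)
                for tr, pr in zip(t, p))
    return cmath.exp(-1j * shift) * (1 + inner) ** n

def h_limit(t, p):
    """exp(-Q/2) with Q = sum t_r^2 + 2 sum_{r<s} rho_rs t_r t_s."""
    m = len(t)
    q = sum(tr * tr for tr in t)
    q += 2 * sum(rho(p, r, s) * t[r] * t[s]
                 for r in range(m) for s in range(r + 1, m))
    return cmath.exp(-0.5 * q)
```

With `p = (0.2, 0.3)` (so that $k = 3$ and $p_3 = 0.5$) and `t = (0.5, -0.8)`, the value `abs(h_n(t, p, 10**6) - h_limit(t, p))` should be small, and a single nonzero coordinate of `t` reduces `h_limit` to the standard normal characteristic function.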


CHAPTER 26

MULTIVARIATE NORMAL DISTRIBUTION

26.1 Introduction

The multivariate normal distribution is the most important and interesting multivariate distribution, and a huge body of multivariate analysis has been developed based on it. In this chapter we present a brief description of the multivariate normal distribution and some of its basic properties. For a detailed discussion on the multivariate normal distribution and its properties, one may refer to the book by Tong (1990).

At the end of Chapter 25 (see Exercise 25.6), we found that the limiting distribution of a sequence of multinomial random variables has its characteristic function as [see Eq. (25.33)]

$$h(t_1, \ldots, t_{k-1}) = \exp\Big\{-\frac{1}{2}\, Q(t_1, \ldots, t_{k-1})\Big\}, \tag{26.1}$$

where $Q(t_1, \ldots, t_{k-1})$ is the quadratic form

$$Q(t_1, \ldots, t_{k-1}) = \sum_{r=1}^{k-1} t_r^2 + 2 \sum_{1 \le r < s \le k-1} \rho_{rs} t_r t_s. \tag{26.2}$$

... 0. Then, the joint density function of $Y_1, \ldots, Y_{n+1}$ is

With the transformation in (27.4), we obtain the joint density function of $V_1, \ldots, V_{n+1}$ as

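Now making the transformation in (27.7) (with the Jacobian as $x_{n+1}^{n}$), we obtain from (27.14) the joint density function $p_{X_1, \ldots, X_{n+1}}(x_1, \ldots, x_{n+1})$ of $X_1, \ldots, X_{n+1}$ in (27.15). Upon integrating out the variable $x_{n+1}$ in (27.15), we then obtain the joint density function of $X_1, \ldots, X_n$ as

$$p_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \frac{\Gamma(a_1 + \cdots + a_{n+1})}{\Gamma(a_1) \cdots \Gamma(a_{n+1})}\, x_1^{a_1 - 1} \cdots x_n^{a_n - 1} \Big(1 - \sum_{i=1}^{n} x_i\Big)^{a_{n+1} - 1},$$
$$0 \le x_1, \ldots, x_n \le 1, \quad 0 \le \sum_{i=1}^{n} x_i \le 1. \tag{27.16}$$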


DIRICHLET DISTRIBUTION

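Indeed, from the fact that (27.16) represents a density function, we readily obtain

$$\int \cdots \int_{x_i \ge 0,\ \sum_{i=1}^{n} x_i \le 1} x_1^{a_1 - 1} \cdots x_n^{a_n - 1} \Big(1 - \sum_{i=1}^{n} x_i\Big)^{a_{n+1} - 1} dx_1 \cdots dx_n = \frac{\Gamma(a_1) \cdots \Gamma(a_{n+1})}{\Gamma(a_1 + \cdots + a_{n+1})},$$

which is exactly the Dirichlet integral formula presented in (27.12). We thus have a multivariate density function in (27.16) which is very closely related to the multidimensional integral in (27.12) evaluated by Dirichlet (1839).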

27.3 Notations

A random vector $X = (X_1, \ldots, X_n)$ is said to have an $n$-dimensional standard Dirichlet distribution with positive parameters $a_1, \ldots, a_{n+1}$ if its density function is given by (27.16), and this is denoted by $X \sim D_n(a_1, \ldots, a_{n+1})$. Note that when $n = 1$, (27.16) reduces to

$$p_X(x) = \frac{\Gamma(a_1 + a_2)}{\Gamma(a_1)\Gamma(a_2)}\, x^{a_1 - 1} (1 - x)^{a_2 - 1}, \quad 0 \le x \le 1,$$

which is nothing but the standard beta distribution discussed in Chapter 16. Indeed, the linear transformation of the random variables $X_1, \ldots, X_n$ will yield the random vector $(b_1 + c_1 X_1, \ldots, b_n + c_n X_n)$ having an $n$-dimensional general Dirichlet distribution with shape parameters $(a_1, \ldots, a_{n+1})$, location parameters $(b_1, \ldots, b_n)$, and scale parameters $(c_1, \ldots, c_n)$. However, for the rest of this chapter we consider only the standard Dirichlet distribution in (27.16), due to its simplicity.

27.4 Marginal Distributions

Let $X \sim D_n(a_1, \ldots, a_{n+1})$. Then, as seen in Section 27.2, the components $X_1, \ldots, X_n$ admit the representations

$$(X_1, \ldots, X_n) \stackrel{d}{=} \Big(\frac{Y_1}{S_{n+1}}, \ldots, \frac{Y_n}{S_{n+1}}\Big) \tag{27.17}$$

and

$$X_k \stackrel{d}{=} \frac{Y_k}{S_{n+1}}, \quad k = 1, \ldots, n, \tag{27.18}$$

where $Y_1, \ldots, Y_{n+1}$ are independent standard gamma random variables with $Y_k \sim \Gamma(a_k, 0, 1)$ and $S_{n+1} = Y_1 + \cdots + Y_{n+1}$. From the properties of gamma distributions (see Section 20.7), it is then known that

$$S_{n+1} \sim \Gamma(a, 0, 1) \quad \text{with } a = a_1 + \cdots + a_{n+1},$$

$$S_{n+1} - Y_k \sim \Gamma(a - a_k, 0, 1) \quad (\text{independent of } Y_k),$$

and

$$X_k \stackrel{d}{=} \frac{Y_k}{S_{n+1}} = \frac{Y_k}{Y_k + (S_{n+1} - Y_k)} \sim Be(a_k, a - a_k); \tag{27.19}$$

that is, the marginal distribution of $X_k$ is beta $Be(a_k, a - a_k)$, $k = 1, \ldots, n$ (note that this is just a Dirichlet distribution with $n = 1$). Thus, the Dirichlet distribution forms a natural multivariate generalization of the beta distribution.

Exercise 27.1 From the density function of $X$ in (27.16), show by means of direct integration that the marginal distribution of $X_k$ is $Be(a_k, a - a_k)$.
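The gamma representation $X_k \stackrel{d}{=} Y_k / S_{n+1}$ translates directly into a sampler. The following is a minimal sketch using only the standard library (`random.gammavariate` with scale 1 plays the role of the standard gamma $\Gamma(a_k, 0, 1)$); the function names, seed, and sample size are illustrative choices. It compares a simulated marginal mean with the mean $a_k / a$ of the $Be(a_k, a - a_k)$ distribution.

```python
import random

def sample_dirichlet(a, rng):
    """One draw of (X_1, ..., X_n) ~ D_n(a_1, ..., a_{n+1}) via X_k = Y_k / S_{n+1}."""
    y = [rng.gammavariate(ak, 1.0) for ak in a]   # Y_k ~ Gamma(a_k, 0, 1)
    s = sum(y)                                    # S_{n+1} = Y_1 + ... + Y_{n+1}
    return [yk / s for yk in y[:-1]]              # drop the (n+1)-th coordinate

def simulated_marginal_mean(a, k, draws, seed=0):
    """Monte Carlo estimate of E X_k (k is a 0-based index)."""
    rng = random.Random(seed)
    return sum(sample_dirichlet(a, rng)[k] for _ in range(draws)) / draws

def beta_marginal_mean(a, k):
    """Mean of Be(a_k, a - a_k), namely a_k / a."""
    return a[k] / sum(a)
```

For `a = (2.0, 3.0, 5.0)`, `simulated_marginal_mean(a, 0, 20000)` should be close to `beta_marginal_mean(a, 0) = 0.2`, with the usual Monte Carlo error.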

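For two-dimensional marginal distributions, it follows from (27.17) that for any $1 \le k < \ell \le n$,

$$(X_k, X_\ell) \stackrel{d}{=} \Big(\frac{Y_k}{Y_k + Y_\ell + Z},\ \frac{Y_\ell}{Y_k + Y_\ell + Z}\Big), \tag{27.20}$$

where $Y_k \sim \Gamma(a_k, 0, 1)$, $Y_\ell \sim \Gamma(a_\ell, 0, 1)$, $Z = S_{n+1} - Y_k - Y_\ell \sim \Gamma(a - a_k - a_\ell, 0, 1)$, and $Y_k$, $Y_\ell$, and $Z$ are independent. Thus, (27.20) readily implies

$$(X_k, X_\ell) \sim D_2(a_k, a_\ell, a - a_k - a_\ell). \tag{27.21}$$

In a similar manner, we find that

$$(X_{k(1)}, \ldots, X_{k(r)}) \sim D_r\Big(a_{k(1)}, \ldots, a_{k(r)},\ a - \sum_{j=1}^{r} a_{k(j)}\Big)$$

for any $r = 1, \ldots, n$ and $1 \le k(1) < k(2) < \cdots < k(r) \le n$.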

Exercise 27.2 If $(X_1, \ldots, X_6) \sim D_6(a_1, \ldots, a_7)$, show that

$$(X_1,\ X_2 + X_3,\ X_4 + X_5 + X_6) \sim D_3(a_1,\ a_2 + a_3,\ a_4 + a_5 + a_6,\ a_7).$$

Exercise 27.3 Let $(X_1, \ldots, X_n) \sim D_n(a_1, \ldots, a_{n+1})$. Find the distribution of $W_k = X_1 + \cdots + X_k$.

Exercise 27.4 Let $(X_1, X_2) \sim D_2(a_1, a_2, a_3)$. Obtain the conditional density function of $X_1$, given $X_2 = x_2$. Do you observe some connection to the beta distribution? Derive an expression for the conditional mean $E(X_1 \mid X_2 = x_2)$ and comment.

27.5 Marginal Moments

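Let $X \sim D_n(a_1, \ldots, a_{n+1})$. In this case, as shown in Section 27.4, the marginal distribution of $X_k$ is beta $Be(a_k, a - a_k)$, where $a = a_1 + \cdots + a_{n+1}$. Then, from the formulas of moments of the beta distribution presented in Section 16.5, we immediately have

$$E X_k = \frac{a_k}{a}, \qquad \operatorname{Var} X_k = \frac{a_k (a - a_k)}{a^2 (a + 1)}, \quad k = 1, \ldots, n.$$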

27.6 Product Moments

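Let $X \sim D_n(a_1, \ldots, a_{n+1})$. Then, from the density function in (27.16), we have the product moment of order $(\alpha_1, \ldots, \alpha_n)$ as

$$E\big(X_1^{\alpha_1} \cdots X_n^{\alpha_n}\big) = \frac{\Gamma(a)}{\Gamma(a_1) \cdots \Gamma(a_{n+1})} \int \cdots \int x_1^{a_1 + \alpha_1 - 1} \cdots x_n^{a_n + \alpha_n - 1} \Big(1 - \sum_{i=1}^{n} x_i\Big)^{a_{n+1} - 1} dx_1 \cdots dx_n$$
$$= \frac{\Gamma(a)}{\Gamma(a + \alpha_1 + \cdots + \alpha_n)} \prod_{k=1}^{n} \frac{\Gamma(a_k + \alpha_k)}{\Gamma(a_k)}, \tag{27.23}$$

where $a = a_1 + \cdots + a_{n+1}$ and the last equality follows by an application of the Dirichlet integral formula in (27.12). In particular, we obtain from (27.23) that

$$E X_k = \frac{a_k}{a}, \qquad E X_k^2 = \frac{a_k (a_k + 1)}{a (a + 1)},$$

and

$$E(X_k X_\ell) = \frac{a_k a_\ell}{a (a + 1)}, \quad k \ne \ell.$$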
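A product moment of a Dirichlet vector can be evaluated numerically from the gamma-function ratio $\Gamma(a)/\Gamma(a + \sum \alpha_k) \prod_k \Gamma(a_k + \alpha_k)/\Gamma(a_k)$, with $a = a_1 + \cdots + a_{n+1}$. The sketch below works on the log scale with `math.lgamma` to avoid overflow for large parameters; the function name and argument layout are illustrative.

```python
from math import exp, lgamma

def product_moment(a, alpha):
    """E[X_1^{alpha_1} ... X_n^{alpha_n}] for X ~ D_n(a_1, ..., a_{n+1}).

    `a` has n + 1 entries and `alpha` has n entries.
    """
    if len(a) != len(alpha) + 1:
        raise ValueError("a must have exactly one more entry than alpha")
    total = sum(a)
    log_m = lgamma(total) - lgamma(total + sum(alpha))
    for ak, alk in zip(a, alpha):      # zip stops after the first n parameters
        log_m += lgamma(ak + alk) - lgamma(ak)
    return exp(log_m)
```

For `a = (2, 3, 5)`, `product_moment(a, (1, 0))` gives the mean $a_1/a = 0.2$, and `product_moment(a, (1, 1))` gives $a_1 a_2 / (a(a+1)) = 6/110$, up to rounding.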

Exercise 27.5 Let $X \sim D_n(a_1, \ldots, a_{n+1})$. Derive the covariance and correlation coefficient between $X_k$ and $X_\ell$.

27.7 Dirichlet Distribution of Second Kind

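In Chapter 16, when dealing with a beta random variable $X \sim Be(a, b)$ with probability density function

$$p_X(x) = \frac{1}{B(a, b)}\, x^{a - 1} (1 - x)^{b - 1}, \quad 0 < x < 1,$$

by considering the transformation $Y = X/(1 - X)$ or equivalently $X = Y/(1 + Y)$, we introduced the beta distribution of the second kind with probability density function

$$p_Y(y) = \frac{1}{B(a, b)}\, \frac{y^{a - 1}}{(1 + y)^{a + b}}, \quad y > 0.$$

In a similar manner, we shall now introduce the Dirichlet distribution of the second kind. Specifically, let $X \sim D_n(a_1, \ldots, a_{n+1})$, where $a_k > 0$ $(k = 1, \ldots, n+1)$ are positive parameters. Now, consider the transformation

$$Y_1 = \frac{X_1}{1 - X_1 - \cdots - X_n}, \quad \ldots, \quad Y_n = \frac{X_n}{1 - X_1 - \cdots - X_n} \tag{27.24}$$

or equivalently,

$$X_1 = \frac{Y_1}{1 + Y_1 + \cdots + Y_n}, \quad \ldots, \quad X_n = \frac{Y_n}{1 + Y_1 + \cdots + Y_n}. \tag{27.25}$$

Then, it can be shown that the Jacobian of this transformation is $(1 + Y_1 + \cdots + Y_n)^{-(n+1)}$. Then, we readily obtain from (27.16) the density function of $Y = (Y_1, \ldots, Y_n)$ as

$$p_Y(y_1, \ldots, y_n) = \frac{\Gamma(a_1 + \cdots + a_{n+1})}{\Gamma(a_1) \cdots \Gamma(a_{n+1})}\, \frac{y_1^{a_1 - 1} \cdots y_n^{a_n - 1}}{(1 + y_1 + \cdots + y_n)^{a_1 + \cdots + a_{n+1}}}, \quad y_1, \ldots, y_n > 0. \tag{27.26}$$

The density function (27.26) is the Dirichlet density of the second kind.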
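The two forms of the transformation and the stated Jacobian can be sanity-checked numerically. The sketch below implements (27.24) and (27.25) and, for $n = 2$ only, approximates the Jacobian determinant by central differences so that it can be compared with $(1 + y_1 + y_2)^{-3}$. The function names and the step size `h` are illustrative choices, and the finite-difference check is a numerical illustration rather than a proof.

```python
def to_second_kind(x):
    """(27.24): Y_k = X_k / (1 - X_1 - ... - X_n)."""
    rest = 1.0 - sum(x)
    return [xk / rest for xk in x]

def to_first_kind(y):
    """(27.25): X_k = Y_k / (1 + Y_1 + ... + Y_n)."""
    total = 1.0 + sum(y)
    return [yk / total for yk in y]

def jacobian_2d(y, h=1e-6):
    """Central-difference determinant of d(x_1, x_2)/d(y_1, y_2) for n = 2."""
    cols = []
    for j in range(2):
        up, dn = list(y), list(y)
        up[j] += h
        dn[j] -= h
        xu, xd = to_first_kind(up), to_first_kind(dn)
        # cols[j][i] approximates the partial derivative of x_i with respect to y_j
        cols.append([(u - d) / (2 * h) for u, d in zip(xu, xd)])
    return cols[0][0] * cols[1][1] - cols[0][1] * cols[1][0]
```

For instance, `to_second_kind([0.2, 0.3])` is approximately `[0.4, 0.6]`, and `jacobian_2d([0.4, 0.6])` should be close to $(1 + 0.4 + 0.6)^{-3} = 0.125$.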

Exercise 27.6 Show that the Jacobian of the transformation in (27.25) is $(1 + Y_1 + \cdots + Y_n)^{-(n+1)}$ (use elementary row and column operations).

Exercise 27.7 Suppose that $Y$ has a Dirichlet distribution of the second kind in (27.26). Derive explicit expressions for $E Y_k$, $\operatorname{Var} Y_k$, $\operatorname{Cov}(Y_k, Y_\ell)$, and the correlation $\rho(Y_k, Y_\ell)$.

27.8 Liouville Distribution

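Liouville (1839) generalized the Dirichlet integral formula in (27.12) by establishing that

$$\int \cdots \int_{x_i > 0,\ x_1 + \cdots + x_n \le h} x_1^{a_1 - 1} \cdots x_n^{a_n - 1} f(x_1 + \cdots + x_n)\, dx_1 \cdots dx_n = \frac{\Gamma(a_1) \cdots \Gamma(a_n)}{\Gamma(a_1 + \cdots + a_n)} \int_0^h t^{a_1 + \cdots + a_n - 1} f(t)\, dt, \tag{27.27}$$

where $a_1, \ldots, a_n$ are positive parameters, $x_1, \ldots, x_n$ are positive, and $f(\cdot)$ is a suitably chosen function. It is clear that if we set $h = 1$ and choose $f(t) = (1 - t)^{a_{n+1} - 1}$, (27.27) readily reduces to the Dirichlet integral formula in (27.12). Also, by letting $h \to \infty$ in (27.27), we obtain the Liouville integral formula

$$\int_0^\infty \cdots \int_0^\infty x_1^{a_1 - 1} \cdots x_n^{a_n - 1} f(x_1 + \cdots + x_n)\, dx_1 \cdots dx_n = \frac{\Gamma(a_1) \cdots \Gamma(a_n)}{\Gamma(a_1 + \cdots + a_n)} \int_0^\infty t^{a_1 + \cdots + a_n - 1} f(t)\, dt, \tag{27.28}$$

where $a_1, \ldots, a_n > 0$ and $t^{a_1 + \cdots + a_n - 1} f(t)$ is integrable on $(0, \infty)$. The Liouville integral formula in (27.28) readily yields the Liouville distribution with probability density function

$$p_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = C\, f(x_1 + \cdots + x_n)\, x_1^{a_1 - 1} \cdots x_n^{a_n - 1}, \quad x_1, \ldots, x_n > 0,\ a_1, \ldots, a_n > 0, \tag{27.29}$$

where $C$ is a normalizing constant and $f(\cdot)$ is a nonnegative function such that $f(t)\, t^{a_1 + \cdots + a_n - 1}$ is integrable on $(0, \infty)$. For a historical view and details on the Liouville distribution, one may refer to Gupta and Richards (2001).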
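Liouville's reduction of an $n$-fold integral over the simplex $\{x_i > 0,\ x_1 + \cdots + x_n \le h\}$ to a single integral in $t$ can be illustrated numerically for $n = 2$. The sketch below uses a plain midpoint rule on both sides, giving half weight to the grid cells cut by the line $x_1 + x_2 = h$; the function names, grid sizes, and the example $f(t) = e^{-t}$ are arbitrary choices for illustration, and the rule assumes $a_1, a_2 \ge 1$ so that the integrand is bounded.

```python
from math import exp, gamma

def liouville_lhs(a1, a2, f, h, grid=200):
    """Midpoint approximation of the double integral over x1, x2 > 0, x1 + x2 <= h."""
    step = h / grid
    total = 0.0
    for i in range(grid):
        x1 = (i + 0.5) * step
        for j in range(grid - i):
            x2 = (j + 0.5) * step
            # cells with i + j == grid - 1 are cut in half by the line x1 + x2 = h
            weight = 0.5 if i + j == grid - 1 else 1.0
            total += weight * x1 ** (a1 - 1) * x2 ** (a2 - 1) * f(x1 + x2)
    return total * step * step

def liouville_rhs(a1, a2, f, h, grid=2000):
    """Gamma-function factor times the midpoint approximation of the t-integral."""
    step = h / grid
    integral = sum(((i + 0.5) * step) ** (a1 + a2 - 1) * f((i + 0.5) * step)
                   for i in range(grid)) * step
    return gamma(a1) * gamma(a2) / gamma(a1 + a2) * integral
```

With `a1 = 2`, `a2 = 3`, `h = 1`, and `f = lambda t: exp(-t)`, the two functions should return nearly equal values, differing only by discretization error.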

Exercise 27.8 Show that the Dirichlet distribution of the second kind in (27.26) is a Liouville distribution by choosing the function $f(t)$ appropriately and then determining the constant $C$ from the Liouville integral formula in (27.28).

APPENDIX PIONEERS IN DISTRIBUTION THEORY

As is evident from the preceding chapters, several prominent mathematicians and statisticians have made pioneering contributions to the area of statistical distribution theory. To give students a historical sense of developments in this important and fundamental area of statistics, we present here a brief biographical sketch of these major contributors.

Bernoulli, Jakob
Born - January 6, 1655, in Basel, Switzerland
Died - August 16, 1705, in Basel, Switzerland

Jakob Bernoulli was the first of the Bernoulli family of Swiss mathematicians. His work Ars Conjectandi (The Art of Conjecturing), published posthumously in 1713 by his nephew N. Bernoulli, contained the Bernoulli law of large numbers for Bernoulli sequences of independent trials. Usually, a random variable taking values 1 and 0 with probabilities $p$ and $1 - p$, $0 < p < 1$, is said to have the Bernoulli distribution. Sometimes, the binomial distributions, which are convolutions of Bernoulli distributions, are also called the Bernoulli distributions.
