RUIN PROBABILITIES Second Edition
ADVANCED SERIES ON STATISTICAL SCIENCE & APPLIED PROBABILITY
Editor: Ole E. Barndorff-Nielsen

Published:
Vol. 1 Random Walks of Infinitely Many Particles by P. Révész
Vol. 2 Ruin Probabilities by S. Asmussen
Vol. 3 Essentials of Stochastic Finance: Facts, Models, Theory by Albert N. Shiryaev
Vol. 4 Principles of Statistical Inference from a Neo-Fisherian Perspective by L. Pace and A. Salvan
Vol. 5 Local Stereology by Eva B. Vedel Jensen
Vol. 6 Elementary Stochastic Calculus — With Finance in View by T. Mikosch
Vol. 7 Stochastic Methods in Hydrology: Rain, Landforms and Floods eds. O. E. Barndorff-Nielsen et al.
Vol. 8 Statistical Experiments and Decisions: Asymptotic Theory by A. N. Shiryaev and V. G. Spokoiny
Vol. 9 Non-Gaussian Merton–Black–Scholes Theory by S. I. Boyarchenko and S. Z. Levendorskiĭ
Vol. 10 Limit Theorems for Associated Random Fields and Related Systems by A. Bulinski and A. Shashkin
Vol. 11 Stochastic Modeling of Electricity and Related Markets by F. E. Benth, J. Šaltytė Benth and S. Koekebakker
Vol. 12 An Elementary Introduction to Stochastic Interest Rate Modeling by N. Privault
Vol. 13 Change of Time and Change of Measure by O. E. Barndorff-Nielsen and A. Shiryaev
Vol. 14 Ruin Probabilities (2nd Edition) by S. Asmussen and H. Albrecher
Advanced Series on Statistical Science & Applied Probability, Vol. 14

RUIN PROBABILITIES Second Edition

Søren Asmussen, Aarhus University, Denmark
Hansjörg Albrecher, University of Lausanne, Switzerland
World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI
Published by World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
Library of Congress Cataloging-in-Publication Data
Asmussen, Søren.
Ruin probabilities / by Søren Asmussen & Hansjörg Albrecher.
p. cm. -- (Advanced series on statistical science and applied probability ; v. 14)
Includes bibliographical references and index.
ISBN-13: 978-981-4282-52-9 (hardcover : alk. paper)
ISBN-10: 981-4282-52-9 (hardcover : alk. paper)
1. Insurance--Mathematics. 2. Risk. I. Albrecher, Hansjörg. II. Title.
HG8781.A83 2010
368'.01--dc22
2010023280
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
Copyright © 2010 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
Printed in Singapore.
Contents

Preface ix
Notation and conventions xiii

I Introduction 1
  1 The risk process 1
  2 Claim size distributions 6
  3 The arrival process 11
  4 A summary of main results and methods 13

II Martingales and simple ruin calculations 21
  1 Wald martingales 21
  2 Gambler's ruin. Two-sided ruin. Brownian motion 23
  3 Further simple martingale calculations 29
  4 More advanced martingales 30

III Further general tools and results 39
  1 Likelihood ratios and change of measure 39
  2 Duality with other applied probability models 45
  3 Random walks in discrete or continuous time 48
  4 Markov additive processes 54
  5 The ladder height distribution 62

IV The compound Poisson model 71
  1 Introduction 72
  2 The Pollaczeck-Khinchine formula 75
  3 Special cases of the Pollaczeck-Khinchine formula 77
  4 Change of measure via exponential families 82
  5 Lundberg conjugation 84
  6 Further topics related to the adjustment coefficient 91
  7 Various approximations for the ruin probability 95
  8 Comparing the risks of different claim size distributions 100
  9 Sensitivity estimates 103
  10 Estimation of the adjustment coefficient 110

V The probability of ruin within finite time 115
  1 Exponential claims 116
  2 The ruin probability with no initial reserve 121
  3 Laplace transforms 126
  4 When does ruin occur? 128
  5 Diffusion approximations 136
  6 Corrected diffusion approximations 139
  7 How does ruin occur? 146

VI Renewal arrivals 151
  1 Introduction 151
  2 Exponential claims. The compound Poisson model with negative claims 154
  3 Change of measure via exponential families 157
  4 The duality with queueing theory 161

VII Risk theory in a Markovian environment 165
  1 Model and examples 165
  2 The ladder height distribution 172
  3 Change of measure via exponential families 180
  4 Comparisons with the compound Poisson model 188
  5 The Markovian arrival process 194
  6 Risk theory in a periodic environment 196
  7 Dual queueing models 205

VIII Level-dependent risk processes 209
  1 Introduction 209
  2 The model with constant interest 222
  3 The local adjustment coefficient. Logarithmic asymptotics 227
  4 The model with tax 239
  5 Discrete-time ruin problems with stochastic investment 242
  6 Continuous-time ruin problems with stochastic investment 248

IX Matrix-analytic methods 253
  1 Definition and basic properties of phase-type distributions 253
  2 Renewal theory 260
  3 The compound Poisson model 264
  4 The renewal model 266
  5 Markov-modulated input 271
  6 Matrix-exponential distributions 277
  7 Reserve-dependent premiums 281
  8 Erlangization for the finite horizon case 287

X Ruin probabilities in the presence of heavy tails 293
  1 Subexponential distributions 293
  2 The compound Poisson model 302
  3 The renewal model 305
  4 Finite-horizon ruin probabilities 309
  5 Reserve-dependent premiums 318
  6 Tail estimation 320

XI Ruin probabilities for Lévy processes 329
  1 Preliminaries 329
  2 One-sided ruin theory 336
  3 The scale function and two-sided ruin problems 340
  4 Further topics 345
  5 The scale function for two-sided phase-type jumps 353

XII Gerber-Shiu functions 357
  1 Introduction 357
  2 The compound Poisson model 360
  3 The renewal model 374
  4 Lévy risk models 384

XIII Further models with dependence 397
  1 Large deviations 398
  2 Heavy-tailed risk models with dependent input 410
  3 Linear models 417
  4 Risk processes with shot-noise Cox intensities 419
  5 Causal dependency models 424
  6 Dependent Sparre Andersen models 427
  7 Gaussian models. Fractional Brownian motion 428
  8 Ordering of ruin probabilities 433
  9 Multi-dimensional risk processes 435

XIV Stochastic control 445
  1 Introduction 445
  2 Stochastic dynamic programming 447
  3 The Hamilton-Jacobi-Bellman equation 448

XV Simulation methodology 461
  1 Generalities 461
  2 Simulation via the Pollaczeck-Khinchine formula 465
  3 Static importance sampling via Lundberg conjugation 470
  4 Static importance sampling for the finite horizon case 474
  5 Dynamic importance sampling 475
  6 Regenerative simulation 482
  7 Sensitivity analysis 484

XVI Miscellaneous topics 487
  1 More on discrete-time risk models 487
  2 The distribution of the aggregate claims 493
  3 Principles for premium calculation 510
  4 Reinsurance 513

Appendix 517
  A1 Renewal theory 517
  A2 Wiener-Hopf factorization 522
  A3 Matrix-exponentials 526
  A4 Some linear algebra 530
  A5 Complements on phase-type distributions 536
  A6 Tauberian theorems 548

Bibliography 549
Index 597
Preface

This book is a second edition of the book of the same title by the first author, which was published in 2000. The subject of ruin probabilities and related topics has since then undergone a considerable development, not to say a boom. This much expanded and revised second edition aims at covering a substantial part of these developments as well as the classical topics. Risk theory in general and ruin probabilities in particular are traditionally considered as part of insurance mathematics, and have been an active area of research from the days of Lundberg all the way up to today. One reason for writing this book is a feeling that the area has in recent years achieved a considerable mathematical maturity, which has in particular removed one of the standard criticisms of the area, namely that it can only say something about very simple models and questions. Although in insurance practice usually simpler (and coarser) risk measures like Value-at-Risk are used, it is widely believed that the thinking advocated by ruin theory is still important for modern risk management. For instance, in times of market-consistent valuation principles, the role of the time diversification effect of insurance portfolios, which is one of the core elements of ruin theory, should not be forgotten. In addition, ruin theory has fruitful methodological links and applications to other fields of applied probability, like queueing theory and mathematical finance (pricing of barrier options, credit products etc.). Apart from these remarks, we have deliberately stayed away from discussing the practical relevance of the theory; if the formulations occasionally give a different impression, it is not by intention. Thus, the book is basically mathematical in its flavor. The present second edition is more than 50% longer than the first and has more than double the number of references.
The longer parts of the new material, reflecting subareas that have been particularly active in the last decade, are collected in Chapters XI–XIV, which treat Lévy processes, Gerber-Shiu functions, dependence and stochastic control, respectively. Shorter additions include
more about martingales and generators (II.4), various versions in Chapter VIII of models with level dependence, e.g. tax or stochastic investments, Erlangization (IX.8), statistical techniques for distinguishing between light and heavy tails (X.6), more material on discrete-time risk models (XVI.1) and recent advances in simulation techniques scattered in Chapter XV. In addition, there are amendments and updates at a large number of places.

A book like this can be organized in many ways. One is by model, another by method. The present book is somewhere between these two possibilities. Chapters IV–VIII introduce some of the main models and give a first derivation of some of their properties. Chapters IX–XV then go into more depth with some of the special approaches for analyzing specific models and add a number of results on the models in Chapters IV–VIII. Chapters II and III are essentially methodological in flavor.

Here is a suggestion on how to get started with the book. For a brief orientation, read first Chapter I, continue with II.1–3 to see some of the simplest ruin calculations, the first part of III.5 (to understand the Pollaczeck-Khinchine formula in IV.2 more properly), and then, to get acquainted with the classical theory of the Cramér-Lundberg model, IV.1–5, V.4a, VIII.1, IX.1–3 and X.1–2. For a second reading, incorporate II.4, III.1–3, IV.8–9, V.1–2, V.5, VII.1–3, VIII.2, X.3–4, XII.1–2, XIII.1–2 and XV.1–3. The rest is up to your specific interests. Enjoy!

The symbols used for the quantities appearing in the book differ among the disciplines. We chose to use those that are common in the queueing community, partly also to be in line with the first edition. We apologize for the confusion this may cause for readers who are used to other symbols. In a book project like this it is impossible to avoid conflicts of notation in the sense that the same symbol may be used for different quantities. We hope that it will always be clear from the context what the notation refers to. In addition, we have collected a number of conventions, abbreviations and symbols after this Preface.

We have tried to be fairly exhaustive in citing references close to the text, but it is obvious that such a system involves a number of inconsistencies and omissions, for which we apologize to the reader and to the authors of the many papers that ought to have been on the list. We intend to keep a list of misprints and remarks posted on the web page http://www.hec.unil.ch/halbrecher/rp2.html and we are grateful to get relevant material sent by email to
[email protected]
Finally, we would like to thank Corina Constantinescu, Hans Gerber, Peter Glynn, Dominik Kortschak, Ronnie Loeffen, Stefan Thonhauser and Hailiang Yang for discussions and proofreading parts of the manuscript, and Dominik Kortschak for help with some figures and general LaTeX issues. Most of all, we would like to thank our wives May Lise and Renate for their support and patience during the writing of this book.

Aarhus and Lausanne, May 2010
Søren Asmussen
Hansjörg Albrecher
Notation and conventions

Numbering and reference system
The chapter number is specified only when it is not the current one. Thus Proposition 4.2, formula (5.3) or Section 3 of Chapter VI are referred to as Proposition VI.4.2, formula VI.(5.3) and Section VI.3 (or just VI.3), respectively, in all other chapters, whereas in VI we just write Proposition 4.2, formula (5.3) or Section 3. References like Proposition A.4, (A.29) refer to the Appendix. Throughout the book, [APQ] refers to the first author's earlier book Applied Probability and Queues, reference [69].

Abbreviations
a.s. almost surely
c.d.f. cumulative distribution function P(X ≤ x)
c.g.f. cumulant generating function, i.e. log B̂[s] where B̂[s] is the m.g.f.
IDE integro-differential equation
i.i.d. independent identically distributed
i.o. infinitely often
l.h.s. left-hand side (of equation)
m.g.f. moment generating function, see under B̂[s] below
ODE ordinary differential equation
r.h.s. right-hand side (of equation)
r.v. random variable
s.c.v. squared coefficient of variation, EX²/(EX)²
w.r.t. with respect to
w.p. with probability
Mathematical notation
P probability. E expectation.
∼ Used in asymptotic relations to indicate that the ratio between two expressions is 1 in the limit. E.g. n! ∼ √(2π) n^{n+1/2} e^{−n}, n → ∞.
≈ A different type of asymptotics: less precise, say a heuristic approximation, or a more precise one like e^h ≈ 1 + h + h²/2, h → 0.
∼^log Used in asymptotic relations to indicate that the ratio between the logarithms of two expressions is 1.
≺st stochastic order
≺cx convex order
≺icx increasing convex order (i.e. stop-loss order)
≺sm supermodular order

I Introduction

1 The risk process

The time of ruin of the risk reserve process {Rt} with initial reserve u is τ(u) = inf{t ≥ 0 : Rt < 0} = inf{t ≥ 0 : St > u}, where St = u − Rt is the claim surplus process, and the infinite and finite horizon ruin probabilities are

    ψ(u) = P(τ(u) < ∞) = P(M > u),    (1.3)
    ψ(u, T) = P(τ(u) ≤ T),            (1.4)

where M = sup_{0≤t<∞} St denotes the maximal claim surplus.
So far we have not imposed any assumptions on the risk reserve process. However, the following setup will cover a main part of the book:

• There are only finitely many claims in finite time intervals. That is, the number Nt of arrivals in [0, t] is finite. We denote the interarrival times of claims by T2, T3, ..., and T1 is the time of the first claim. Thus, the time of arrival of the nth claim is σn = T1 + ··· + Tn, and Nt = min{n ≥ 0 : σ_{n+1} > t} = max{n ≥ 0 : σn ≤ t}.

• The size of the nth claim is denoted by Un.

• Premiums flow in at rate p, say, per unit time.

Putting things together, we see that

    Rt = u + pt − Σ_{k=1}^{Nt} Uk,    St = Σ_{k=1}^{Nt} Uk − pt.    (1.5)
The sample paths of {Rt } and {St } and the connection between the two processes are illustrated in Fig. I.1.
Figure I.1
Note that it is a matter of taste (or mathematical convenience) whether one allows {Rt} and/or {St} to continue its evolution after the time τ(u) of ruin. Thus, for example, one could well replace Rt by R_{t∧τ(u)} or R_{t∧τ(u)} ∨ 0. For the purpose of studying ruin probabilities this distinction is, of course, immaterial.

Some main examples of models not incorporated in the above setup are:

• Models which are non-homogeneous in space, for example with a premium depending on the reserve (i.e. on Fig. I.1 the slope of {Rt} should depend also on the level). We study this case in Chapter VIII.

• Brownian motion or more general diffusions. Traditionally, Brownian motion has mainly been used as an approximation to the risk process rather than as a model of intrinsic merit, and we look at this in Chapter V. However, since any modeling involves some approximative assumptions, it has (partly inspired by the modeling in mathematical finance) become more and more common to use Brownian motion as an intrinsically reasonable model.

• General Lévy processes (defined as continuous-time processes with stationary independent increments) where the jump component has infinite Lévy measure, allowing a countable infinity of jumps on Fig. I.1. We treat Lévy processes in Chapter XI.

The models we consider will often have the property that there exists a constant ρ such that

    (1/t) Σ_{k=1}^{Nt} Uk → ρ a.s., t → ∞.    (1.6)

The interpretation of ρ is as the average amount of claim per unit time. A further basic quantity is the safety loading (or the security loading) η, defined as the relative amount by which the premium rate p exceeds ρ,

    η = (p − ρ)/ρ.

It is sometimes stated in the theoretical literature that the typical values of the safety loading η are relatively small, say 10%–20%; we shall, however, not discuss whether this actually corresponds to practice. It would appear obvious, however, that the insurance company should try to ensure η > 0, and in fact:

Proposition 1.1 Assume that (1.6) holds. If η < 0, then M = ∞ a.s. and hence ψ(u) = 1 for all u. If η > 0, then M < ∞ a.s. and hence ψ(u) < 1 for all sufficiently large u.
Proof. It follows from (1.6) that

    St/t = (Σ_{k=1}^{Nt} Uk − pt)/t → ρ − p a.s., t → ∞.

If η < 0, then this limit is > 0, which implies St → ∞ a.s. and hence M = ∞ a.s. If η > 0, then similarly lim St/t < 0, so St → −∞ a.s. and M < ∞ a.s. □

In concrete models, we typically obtain a somewhat stronger conclusion, namely that M = ∞ a.s. and ψ(u) = 1 for all u hold also when η = 0, and that ψ(u) < 1 for all u > 0 when η > 0. However, this needs to be verified in each separate case. The simplest concrete example (to be studied in Chapter IV) is the Cramér-Lundberg or compound Poisson model, where {Nt} is a Poisson process with rate β (say) and U1, U2, ... are i.i.d. and independent of {Nt}. Here it is easy to see that ρ = βEU (on the average, β claims arrive per unit time and the mean of a single claim is EU) and that also

    lim_{t→∞} E[(1/t) Σ_{k=1}^{Nt} Uk] = ρ.    (1.7)
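These quantities are easy to explore by brute-force simulation. The following sketch (ours, not from the text; the function names are our own) estimates the finite-horizon ruin probability ψ(u, T) for the compound Poisson model with exponential claims and premium rate p, and compares it with the classical closed-form infinite-horizon value ψ(u) = (β/δ)e^{−(δ−β)u} for p = 1 and β < δ:

```python
import math
import random

def ruin_prob_mc(u, beta, delta, p=1.0, horizon=100.0, n_paths=5000, seed=1):
    """Crude Monte Carlo estimate of psi(u, T) for the compound Poisson
    model: Poisson(beta) claim arrivals, Exp(delta) claim sizes, premium
    rate p.  Since the reserve only decreases at claim instants, it
    suffices to inspect the reserve just after each claim."""
    rng = random.Random(seed)
    ruined = 0
    for _ in range(n_paths):
        t, r = 0.0, float(u)
        while True:
            dt = rng.expovariate(beta)            # time to the next claim
            t += dt
            if t > horizon:
                break                             # survived up to the horizon
            r += p * dt - rng.expovariate(delta)  # premium income minus claim
            if r < 0:
                ruined += 1
                break
    return ruined / n_paths

def psi_exponential(u, beta, delta):
    """Exact psi(u) for exponential claims and premium rate p = 1."""
    return (beta / delta) * math.exp(-(delta - beta) * u)
```

With β = 1, δ = 2 (so ρ = 1/2 and η = 1) and u = 3, the estimate of ψ(3, 100) is close to ψ(3) = 0.5 e^{−3} ≈ 0.025, since ruin, when it happens at all, typically happens early.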
Again, (1.7) is a property which we will typically encounter. However, not all models considered in the literature have this feature:

Example 1.2 (Cox processes) Here {Nt} is a Poisson process with random rate β(t) (say) at time t. If U1, U2, ... are i.i.d. and independent of {(β(t), Nt)}, it is not too difficult to show that ρ as defined by (1.6) is given by

    ρ = EU · lim_{t→∞} (1/t) ∫_0^t β(s) ds

(provided the limit exists). Thus ρ may well be random for such processes, namely, if {β(t)} is non-ergodic. The simplest example is β(t) ≡ V where V is a r.v. This case is referred to as the mixed Poisson process, with the most notable special case being V having a Gamma distribution, corresponding to the Pólya process. □

We shall only encounter a few instances of a Cox process, in connection with risk processes in a Markovian or periodic environment (Chapter VII), and here (1.6), (1.7) hold with ρ constant.
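The randomness of ρ in the mixed Poisson case is easy to see numerically. In the sketch below (our illustration, not from the text; names are our own), the intensity is frozen at β(t) ≡ V = v, and the empirical average claim amount per unit time settles at v·EU, i.e. at a limit that depends on the realization of V:

```python
import random

def long_run_claim_rate(v, horizon=2000.0, seed=0):
    """Empirical (1/t) * sum_{k<=N_t} U_k for a mixed Poisson model whose
    random intensity beta(t) == V has been frozen at the value v.
    Claim sizes are i.i.d. exponential with mean EU = 1."""
    rng = random.Random(seed)
    t, total = 0.0, 0.0
    while True:
        t += rng.expovariate(v)        # Poisson arrivals at rate v
        if t > horizon:
            break
        total += rng.expovariate(1.0)  # claim size, mean 1
    return total / horizon

# Conditionally on V = v the limit in (1.6) is rho = v * EU = v, so two
# realizations of V produce two different long-run slopes:
rates = {v: long_run_claim_rate(v) for v in (0.5, 2.0)}
```

Each frozen value of V yields its own long-run slope, which is exactly the non-ergodic behavior described above.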
Proposition 1.3 Assume p ≠ 1 and define R̃t = R_{t/p}. Then the connection between the ruin probabilities for the given risk process {Rt} and those ψ̃(u), ψ̃(u, T) for {R̃t} is given by

    ψ(u) = ψ̃(u),    ψ(u, T) = ψ̃(u, Tp).

The proof is trivial. Since {R̃t} has premium rate 1, the role of the result is to justify taking p = 1, which is feasible since in most cases the process {R̃t} has a similar structure as {Rt} (for example, the claim arrivals are Poisson or renewal at the same time). Note that when p = 1, the assumption η > 0 is equivalent to ρ < 1; in a number of models, we shall be able to identify ρ with the traffic intensity of an associated queue, and in fact ρ < 1 is the fundamental assumption of queueing theory ensuring steady-state behavior (existence of a limiting stationary distribution).

Notes and references The study of ruin probabilities, often referred to as collective risk theory or just risk theory, was largely initiated in Sweden in the first half of the century. Some of the main general ideas were laid down by Lundberg [614], while the first mathematically substantial results were given in Lundberg [615] and Cramér [265]; another important early Swedish work is Täcklind [826]. The Swedish school was pioneering not only in risk theory, but also in probability and applied probability as a whole; in particular, many results and methods in random walk theory originate from there, and the area was ahead of related ones like queueing theory. Some early surveys are given in Cramér [265], Segerdahl [792] and Philipson [699]. Some main later textbooks are (in alphabetical order) Bühlmann [208], Daykin, Pentikäinen & Pesonen [279], De Vylder [300], Dickson [309], Gerber [398], Grandell [429], Rolski, Schmidli, Schmidt & Teugels [746] and Seal [784, 788]. Besides in standard journals in probability and applied probability, the research literature is often published in journals like Astin Bulletin, Insurance: Mathematics and Economics, the North American Actuarial Journal, the Scandinavian Actuarial Journal and Mitteilungen der Schweizerischen Aktuarvereinigung. Note that the latter has recently been merged with Blätter der Deutschen Gesellschaft für Versicherungs- und Finanzmathematik and a number of further Actuarial Bulletins of European countries into the European Actuarial Journal.

The term risk theory is often interpreted in a broader sense than as just to comprise the study of ruin probabilities. An idea of the additional topics and problems one may incorporate under risk theory can be obtained from the survey paper [665] by Norberg; see also Chapter XVI. In the even more general area of non-life insurance mathematics, some main texts (typically incorporating some ruin theory but emphasizing the topic to a varying degree) are Bowers et al. [195], Bühlmann [208], Daykin et al. [279], Embrechts et al. [349], Heilmann [458], Hipp & Michel [468], Kaas et al. [515], Klugman, Panjer & Willmot [536], Mikosch [638], Schmidt [782], Straub [818], Sundt
[820] and Taylor [840]. Note that life insurance (e.g. Gerber [402]) has a rather different flavor, and we do not get near to the topic anywhere in this book. Cox processes are treated extensively in Grandell [429]. For mixed Poisson processes and Pólya processes, see e.g. the recent survey by Grandell [431] and references therein.
2 Claim size distributions

This section contains a brief survey of some of the most popular classes of distributions B that have been used to model the claims U1, U2, ... We roughly classify these into two groups: light-tailed distributions (sometimes the term 'Cramér-type conditions' is used) and heavy-tailed distributions. Here light-tailed means that the tail B̄(x) = 1 − B(x) satisfies B̄(x) = O(e^{−sx}) for some s > 0; equivalently, the m.g.f. B̂[s] is finite for some s > 0. In contrast, B is heavy-tailed if B̂[s] = ∞ for all s > 0, but different more restrictive definitions are often used: subexponential, regularly varying (see below) or even regularly varying with infinite variance. On the more heuristic side, one could also mention the folklore in actuarial practice to consider B heavy-tailed if '20% of the claims account for more than 80% of the total claims', i.e. if

    (1/µ_B) ∫_{b_{0.2}}^∞ x B(dx) ≥ 0.8,

where B̄(b_{0.2}) = 0.2 and µ_B is the mean of B.
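As a quick illustration of this rule of thumb (our computation, not from the text), take the US-Pareto distribution of Example 2.9 below, with tail (a/(a+x))^α and mean a/(α−1) for α > 1. Integrating x B(dx) over (b_q, ∞) and dividing by the mean gives the closed form q(αb_q + a)/a for the share of the expected total claim amount carried by the largest fraction q of claims:

```python
def us_pareto_top_share(alpha, a=1.0, q=0.2):
    """Fraction of the expected total claim amount due to the largest
    100*q per cent of claims for the US-Pareto distribution with tail
    (a/(a+x))**alpha, alpha > 1.  The quantile b_q solves
    (a/(a+b_q))**alpha = q, and a short calculus exercise gives
    (1/mu_B) * int_{b_q}^inf x B(dx) = q * (alpha*b_q + a) / a."""
    b_q = a * (q ** (-1.0 / alpha) - 1.0)
    return q * (alpha * b_q + a) / a
```

For α = 1.2 the share is about 0.88, so the 20/80 folklore applies, while for α = 3 it is only about 0.63; the heuristic thus singles out the very heavy-tailed part of the Pareto family.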
2a Light-tailed distributions

Example 2.1 (the exponential distribution) Here the density is

    b(x) = δ e^{−δx}.    (2.1)

The parameter δ is referred to as the rate or the intensity, and can also be interpreted as the (constant) failure rate b(x)/B̄(x). As in a number of other applied probability areas, the exponential distribution is by far the simplest to deal with in risk theory as well. In particular, for the compound Poisson model with exponential claim sizes the ruin probability ψ(u) can be found in closed form. The crucial feature is the lack of memory: if U is exponential with rate δ, then the conditional distribution of U − x given U > x is again exponential with rate δ (this is essentially equivalent to the failure rate being constant). For example, in the compound Poisson model, a simple stopping time argument shows that this implies that the conditional distribution
of the overshoot Sτ(u) − u at the time of ruin given τ(u) is again exponential with rate δ, a fact which turns out to contain considerable information. □

Example 2.2 (the gamma distribution) The gamma distribution with parameters p, δ has density

    b(x) = (δ^p / Γ(p)) x^{p−1} e^{−δx}    (2.2)

and m.g.f.

    B̂[s] = (δ/(δ−s))^p, s < δ.

The mean EU is p/δ and the variance Var U is p/δ². In particular, the squared coefficient of variation (s.c.v.)

    Var U / (EU)² = 1/p

is < 1 for p > 1, > 1 for p < 1 and = 1 for p = 1 (the exponential case). The exact form of the tail B̄(x) is given by the incomplete Gamma function Γ(x; p),

    B̄(x) = Γ(δx; p)/Γ(p), where Γ(x; p) = ∫_x^∞ t^{p−1} e^{−t} dt.

Asymptotically, one has

    B̄(x) ∼ (δ^{p−1}/Γ(p)) x^{p−1} e^{−δx}.

In the sense of the theory of infinitely divisible distributions, the Gamma density (2.2) can be considered as the pth power of the exponential density (2.1) (or the 1/pth root if p < 1). In particular, if p is integer and U has the gamma distribution with parameters p, δ, then U has the same distribution as X1 + ··· + Xp, where X1, X2, ... are i.i.d. and exponential with rate δ. This special case is referred to as the Erlang distribution with p stages, or just the Erlang(p) distribution. An appealing feature is its simple connection to the Poisson process: B̄(x) = P(X1 + ··· + Xp > x) is the probability of at most p − 1 Poisson events in [0, x], so that

    B̄(x) = Σ_{i=0}^{p−1} e^{−δx} (δx)^i / i!.

In the present text, we develop computationally tractable results mainly for the Erlang case (i.e. p ∈ ℕ). Ruin probabilities for the general case have been studied, among others, by Grandell & Segerdahl [433] and Thorin [847]. □
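The Poisson identity for the Erlang tail is easy to check numerically. The sketch below (ours; function names are our own) evaluates B̄(x) both by the sum above and by Simpson integration of the density (2.2):

```python
import math

def erlang_tail(p, delta, x):
    """Tail of Erlang(p) with rate delta via the Poisson-count identity:
    P(U > x) = sum_{i=0}^{p-1} e^{-delta*x} (delta*x)**i / i!."""
    return sum(math.exp(-delta * x) * (delta * x) ** i / math.factorial(i)
               for i in range(p))

def erlang_tail_by_quadrature(p, delta, x, n=20000):
    """The same tail by Simpson integration of the gamma density (2.2)
    over [x, x + 40/delta]; the remaining mass beyond that interval is
    of order e^{-40} and can be ignored."""
    def b(t):
        return delta ** p * t ** (p - 1) * math.exp(-delta * t) / math.gamma(p)
    hi = x + 40.0 / delta
    h = (hi - x) / n                    # n is even, as Simpson requires
    s = b(x) + b(hi)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * b(x + k * h)
    return s * h / 3.0
```

For instance, erlang_tail(3, 2.0, 1.5) returns e^{−3}(1 + 3 + 4.5) ≈ 0.4232, and the quadrature agrees to many digits.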
Example 2.3 (the hyperexponential distribution) This is defined as a finite mixture of exponential distributions,

    b(x) = Σ_{i=1}^p α_i δ_i e^{−δ_i x}    (2.3)

where Σ_{i=1}^p α_i = 1, 0 ≤ α_i ≤ 1, i = 1, ..., p. An important property of the hyperexponential distribution is that its s.c.v. is > 1. If α_i ∈ ℝ, then one speaks of the distribution as a combination of exponentials, and this class is dense in the set of all distributions on the positive halfline. □

Example 2.4 (phase-type distributions) A phase-type distribution is the distribution of the absorption time in a Markov process with finitely many states, of which one is absorbing and the rest transient. Important special cases are the exponential, the Erlang and the hyperexponential distributions. This class of distributions plays a major role in this book as one within which computationally tractable exact forms of the ruin probability ψ(u) can be obtained. The parameters of a phase-type distribution are the set E of transient states, the restriction T of the intensity matrix of the Markov process to E, and the row vector α = (α_i)_{i∈E} of initial probabilities. The density and c.d.f. are

    b(x) = α e^{Tx} t, resp. B(x) = 1 − α e^{Tx} e, x ≥ 0,

where t = −T e and e = (1 ... 1)^T is the column vector with 1 at all entries. The couple (α, T) or sometimes the triple (E, α, T) is called the representation. We give a more comprehensive treatment in IX.1 and defer further details to Chapter IX. □

Example 2.5 (distributions with rational transforms) A distribution B has a rational m.g.f. (or, equivalently, a rational Laplace transform) if B̂[r] = p(r)/q(r) with p(r) and q(r) polynomials of finite degree. An equivalent characterization is that the density b(x) is the solution of a homogeneous ordinary differential equation with constant coefficients

    b^{(q)}(x) + d_{q−1} b^{(q−1)}(x) + ··· + d_0 b(x) = 0;
d_j ∈ ℝ, d_0 ≠ 0, where one of the initial conditions is determined by ∫_0^∞ b(x) dx = 1. Consequently the density b(x) has one of the forms

    b(x) = Σ_{j=0}^q c_j x^j e^{η_j x},    (2.4)

    b(x) = Σ_{j=0}^{q_1} c_j x^j e^{η_j x} + Σ_{j=0}^{q_2} d_j x^j cos(a_j x) e^{δ_j x} + Σ_{j=0}^{q_3} e_j x^j sin(b_j x) e^{ε_j x},    (2.5)
where the parameters in (2.4) are possibly complex-valued but the parameters in (2.5) are real-valued. This class of distributions is popular in the literature on both risk theory and queues, but often the attention is restricted to the class of phase-type distributions, which is slightly smaller but more amenable to probabilistic reasoning. We give some theory for matrix-exponential distributions in IX.6. □

Example 2.6 (distributions with bounded support) This example (i.e. there exists an x_0 < ∞ such that B̄(x) = 0 for x ≥ x_0, B̄(x) > 0 for x < x_0) is of course a trivial instance of a light-tailed distribution. However, it is notable from a practical point of view because of reinsurance: if excess-of-loss reinsurance has been arranged with retention level x_0, then the claim size which is relevant from the point of view of the insurance company itself is U ∧ x_0 rather than U (the excess (U − x_0)^+ is covered by the reinsurer). See XVI.6. □
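To make the phase-type formulas of Example 2.4 concrete, here is a minimal numerical sketch (ours; the helper names and the truncated-Taylor matrix exponential are our own simplifications, adequate for tiny matrices only). For the Erlang(2) representation α = (1, 0), T = [[−δ, δ], [0, −δ]], the tail B̄(x) = α e^{Tx} e must agree with the Poisson-sum formula e^{−δx}(1 + δx) of Example 2.2:

```python
def mat_mul(A, B):
    """Product of two small dense matrices given as lists of lists."""
    return [[sum(A[i][r] * B[r][j] for r in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_exp(A, terms=25, squarings=10):
    """exp(A) by scaling-and-squaring with a truncated Taylor series --
    a toy implementation that is fine for the 2x2 examples here."""
    n = len(A)
    S = [[a / 2.0 ** squarings for a in row] for row in A]    # scale down
    E = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    P = [row[:] for row in E]
    for k in range(1, terms):                                  # Taylor sum
        P = [[p / k for p in row] for row in mat_mul(P, S)]
        E = [[E[i][j] + P[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):                                 # square back up
        E = mat_mul(E, E)
    return E

def phase_type_tail(alpha, T, x):
    """B-bar(x) = alpha * exp(T x) * e for a representation (alpha, T)."""
    n = len(T)
    Etx = mat_exp([[T[i][j] * x for j in range(n)] for i in range(n)])
    return sum(alpha[i] * Etx[i][j] for i in range(n) for j in range(n))
```

With δ = 2 and x = 1.3, phase_type_tail([1.0, 0.0], [[-2.0, 2.0], [0.0, -2.0]], 1.3) agrees with e^{−2.6}(1 + 2.6) to essentially machine precision; the one-state case reproduces the exponential tail of Example 2.1.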
2b
Heavytailed distributions
Example 2.7 (the Weibull distribution) This distribution originates from reliability theory. Here failure rates δ(x) = b(x)/B(x) play an important role, the exponential distribution representing the simplest example since here δ(x) is constant. However, in practice one may observe that δ(x) is either decreasing or increasing and may try to model smooth (increasing or decreasing) deviations from constancy by δ(x) = dxr−1 (0 < r < ∞). Writing c = d/r, we obtain the Weibull distribution r
B(x) = e−cx ,
r
b(x) = crxr−1 e−cx ,
(2.6)
which is heavytailed when 0 < r < 1. All moments are finite. Another interpretation is that it is the distribution of X 1/r , where X is exponential with parameter c. 2 Example 2.8 (the lognormal distribution) The lognormal distribution with parameters σ 2 , µ is defined as the distribution of eV where V ∼ N (µ, σ 2 ), or equivalently as the distribution of eσW +µ where W ∼ N (0, 1). It follows that the density is 1 ³ log x − µ ´ d ³ log x − µ ´ Φ = b(x) = ϕ dx σ σx σ n ³ 1 log x − µ ´2 o 1 √ exp − . (2.7) = 2 σ xσ 2π Asymptotically, the tail is B(x) ∼
n 1 ³ log x − µ ´2 o σ √ exp − , 2 σ log x 2π
(2.8)
10
CHAPTER I. INTRODUCTION
which is heavier than the one of the Weibull distribution. The lognormal dis2 tribution has moments of all orders. In particular, the mean is eµ+σ /2 and the 2 second moment is e2µ+2σ . 2 Example 2.9 (the Pareto distribution) Here the essence is that the tail B(x) decreases like a power of x. There are various variants of the definition around, the simplest one being B(x) = x−α ,
x ≥ 1,
(2.9)
which can be interpreted as the distribution of eX for an exponential r.v. X with parameter α. Another variant is often referred to as USPareto and defined by aα αaα , b(x) = , x ≥ 0, B(x) = (2.10) α (a + x) (a + x)α+1 for some a > 0. The pth moment is finite if and only if p < α − 1. The LaplaceStieltjes transform of the Pareto distribution defined in (2.9) can be expressed through the incomplete Gamma function by Z ∞ α b B[−s] = e−sx α+1 dx = α sα Γ(−α, s). x 1 b Similarly, the LaplaceStieltjes transform of the US Pareto distribution is B[−s] = α as α (as) e Γ(−α, as). These relatively simple expressions have not always been noted. Abate, Choudhury & Whitt [1] introduced a somewhat related class of random variables called Pareto mixture of exponentials, which are products of Pareto and exponential r.v.’s and lead to quite explicit LaplaceStieltjes transforms. 2 Example 2.10 (the loggamma distribution) The loggamma distribution with parameters p, δ is defined as the distribution of eV where V has the gamma density (2.2). The density is b(x) =
δ p (log x)p−1 . xδ+1 Γ(p)
(2.11)
The pth moment is finite if p < δ and infinite if p > δ. For p = 1, the loggamma distribution is a Pareto distribution. 2 Example 2.11 (distributions with regularly varying tails) The tail B(x) of a distribution B is said to be regularly varying with index α if B(x) ∼
L(x) , xα
x → ∞,
(2.12)
3. THE ARRIVAL PROCESS
11
where L(x) is slowly varying, i.e. satisfies L(xt)/L(x) → 1 as x → ∞ (any L having a limit in (0, ∞) is slowly varying; another standard example is (log x)η ). Thus, examples of distributions with regularly varying tails are the Pareto distribution (2.10) (here L(x) → 1), the loggamma distribution (with index δ) and a Pareto mixture of exponentials. 2 Example 2.12 (the subexponential class of distributions) We say that a distribution B is subexponential if B ∗2 (x) = 2. x→∞ B(x) lim
(2.13)
It can be proved (see X.1) that any distribution with a regularly varying tail is subexponential. Also, for example the lognormal distribution is subexponential (but not regularly varying), though the proof of this is nontrivial, and so is the Weibull distribution with 0 < r < 1. Thus, the subexponential class of distributions provide a convenient framework for studying large classes of heavytailed distributions. We return to a closer study in X.1. 2 When studying ruin probabilities, it will be seen that we obtain completely different results depending on whether the claim size distribution is exponentially bounded or heavytailed. From a practical point of view, this phenomenon represents one of the true controversies of the area. Namely, the knowledge of the claim size distribution will typically be based upon statistical data, and based upon such information it seems questionable to extrapolate to tail behavior. However, one may argue that this difficulty is not restricted to ruin probability theory alone. Similar discussion applies to the distribution of the accumulated claims (XVI.2) or even to completely different applied probability areas like extreme value theory: if we are using a Gaussian process to predict extreme value behavior, we may know that such a process (with a covariance function estimated from data) is a reasonable description of the behavior of the system under study in typical conditions, but can never be sure whether this is also so for atypical levels for which far less detailed statistical information is available. We give some discussion on standard methods to distinguish between light and heavy tails in Chapter X.
3
The arrival process
For the purpose of modeling a risk process, the claim size distribution represents of course only one aspect (though a major one). At least as important is the
12
CHAPTER I. INTRODUCTION
specification of the structure of the point process {Nt } of claim arrivals and its possible dependence with the claims. By far the most prominent case is the compound Poisson (Cram´erLundberg) model where {Nt } is Poisson and independent of the claim sizes U1 , U2 , . . . The reason is in part mathematical since this model is the easiest to analyze, but the model also admits a natural interpretation: a large portfolio of insurance holders, which each have a (timehomogeneous) small rate of experiencing a claim, gives rise to an arrival process which is very close to a Poisson process, in just the same way as the Poisson process arises in telephone traffic (a large number of subscribers each calling with a small rate), radioactive decay (a huge number of atoms each splitting with a tiny rate) and many other applications. The compound Poisson model is studied in detail in Chapters IV, V (and, with the extension to premiums depending on the reserve, in Chapter VIII). To the authors’ knowledge, not so many detailed studies of the goodnessoffit of the Poisson model in insurance are available. Some of them have concentrated on the marginal distribution of NT (say T = one year), found the Poisson distribution to be inadequate and suggested various other univariate distributions as alternatives, e.g. the negative binomial distribution. The difficulty in such an approach lies in that it may be difficult or even impossible to imbed such a distribution into the continuous setup of {Nt } evolving over time, and also that the ruin problem may be hard to analyze. Nevertheless, getting away from the simple Poisson process seems a crucial step in making the model more realistic, in particular to allow for certain inhomogeneities. Historically, the first extension to be studied in detail was {Nt } to be renewal (the interarrival times T1 , T2 , . . . are i.i.d. but with a general not necessarily exponential distribution). 
This model, to be studied in Chapter VI, has some mathematically appealing random walk features, which facilitate the analysis. However, it is more questionable whether it provides a model with a similar intuitive content as the Poisson model. One could possibly argue that renewal models are a compromise between choosing a tractable model and taking into account statistical information that may indicate that exponential interarrival time distributions do not calibrate given data well enough. Of course, one is then still left to believe in the independence assumption and – with the introduced memory between claims – one has to be aware that the resulting model is for most applications to be seen as an interpolation rather than a causal model. A more appealing way to allow for inhomogeneity is by means of an intensity β(t) fluctuating over time. An obvious example is β(t) depending on the time of the year (the season), so that β(t) is a periodic function of t; we study this case in VII.6. Another one is Cox processes, where {β(t)}t≥0 is an arbitrary stochastic process. In order to prove reasonably substantial and interesting results, Cox processes are, however, too general and one needs to specialize to
4. A SUMMARY OF MAIN RESULTS AND METHODS
13
more concrete assumptions. The one we focus on (Chapter VII) is a Markovian environment: the environmental conditions are described by a finite Markov process {Jt }t≥0 , such that β(t) = βi when Jt = i. I.e. with a common term {Nt } is a Markovmodulated Poisson process; its basic feature is to allow more variation (bursty arrivals) than inherent in the simple Poisson process. This model can be intuitively understood in some simple cases like {Jt } describing weather conditions in car insurance, epidemics in life insurance etc. In others, it may be used in a purely descriptive way when it is empirically observed that the claim arrivals are more bursty than allowed for by the simple Poisson process. Mathematically, the periodic and the Markovmodulated models also have attractive features. The point of view we take here is Markovdependent random walks in continuous time (Markov additive processes), see III.4. This applies also to the case where the claim size distribution depends on the time of the year or the environment (VII.6), and which seems well motivated from a practical point of view as well.
4
A summary of main results and methods
4a
Duality with other applied probability models
Risk theory may be viewed as one of many applied probability areas, others being branching processes, genetics models, queueing theory, dam/storage processes, reliability, interacting particle systems, stochastic differential equations, time series and Gaussian processes, extreme value theory, stochastic geometry, point processes and so on. Some of these have a certain resemblance in flavor and methodology, others are quite different. The ones which appear most related to risk theory are queueing theory and dam/storage processes. In fact, it is a recurrent theme of this book to stress this connection which was often neglected in the early specialized literature on risk theory. Mathematically, the classical result is that the ruin probabilities for the compound Poisson model are related to the workload (virtual waiting time) process {Vt }t≥0 of an initially empty M/G/1 queue by means of ψ(u, T ) = P(VT > u),
ψ(u) = P(V > u),
(4.1)
where V is the limit in distribution of Vt as t → ∞. The M/G/1 workload process {Vt } may also be seen as one of the simplest storage models, with Poisson arrivals and constant release rule p(x) ≡ 1. A general release rule p(x) means that {Vt } decreases according to the differential equation V˙ = −p(V ) in between jumps, and here (4.1) holds as well provided the risk process has a premium rule depending on the reserve, R˙ = p(R) in between jumps. Similarly, ruin
14
CHAPTER I. INTRODUCTION
probabilities for risk processes with an input process which is renewal, Markovmodulated or periodic can be related to queues with similar characteristics. Thus, it is desirable to have a set of formulas like (4.1) permitting to translate freely between risk theory and the queueing/storage setting. In Chapter VIII we will also see a direct and natural link between the maximum workload of an M/G/1 queue and the ruin probability in a compound Poisson risk model in terms of excursions. In general, methods or modeling ideas developed in one area often have relevance for the other one as well. A stochastic process {Vt } is said to be in the steady state if it is strictly stationary (in the Markov case, this amounts to V0 having the stationary distribution of {Vt }), and the limit t → ∞ is the steadystate limit. The study of the steady state is by far the most dominant topic of queueing and storage theory, and a lot of information on steadystate r.v.’s like V is available. It should be noted, however, that quite often the emphasis is Ron computing expected values ∞ like EV . In the setting of (4.1), this gives only 0 ψ(u)du which is of limited intrinsic interest. Thus, the two areas, though overlapping, have to some extent a different flavor. A prototype of the duality results in this book is Theorem III.2.1, which gives a sample path version of (4.1) in the setting of a general premium rule p(x): the events {VT > u} and {τ (u) ≤ T } coincide when the risk process and the storage process are coupled in a suitable way (via timereversion). The infinite horizon (steady state) case is covered by letting T → ∞. The fact that Theorem III.2.1 is a sample path relation should be stressed: in this way the approach also applies to models having supplementary r.v.’s like the environmental process {Jt } in a Markovmodulated setting.
4b
Exact solutions
Of course, the ideal is to be able to come up with closed form solutions for the ruin probabilities ψ(u), ψ(u, T ). The cases where this is possible are basically the following for the infinite horizon ruin probability ψ(u): • The compound Poisson model with constant premium rate p = 1 and exponential claim size distribution B, B(x) = e−δx . Here ψ(u) = ρe−γu where β is the arrival intensity, ρ = β/δ and γ = δ − β. • The compound Poisson model with constant premium rate p = 1 and B being phasetype with just a few phases. Here ψ(u) is given in terms of a matrixexponential function (Corollary IX.3.1), which can be expanded into a sum of exponential terms by diagonalization (see, e.g., Example IX.3.2). The qualifier ‘with just a few phases’ refers to the fact that the diagonalization has to be carried out numerically in higher dimensions.
4. A SUMMARY OF MAIN RESULTS AND METHODS
15
• The compound Poisson model with a claim size distribution degenerate at one point, see Corollary IV.3.7. • The compound Poisson model with some rather special heavytailed claim size distributions, see Boxma & Cohen [193] and Abate & Whitt [3]. • The compound Poisson model with premium rate p(x) depending on the reserve and exponential claim size distribution B. Here ψ(u) is explicit provided that, as is typically the case, the functions Z x Z ∞ 1 1 βω(x)−δx dy, e dx ω(x) = p(y) p(x) 0 0 can be written in closed form, see Corollary VIII.1.9. • The compound Poisson model with a piecewise constant premium rule p(x) and B being phasetype with just a few phases, see IX.7. • Renewal models with exponential claim sizes, see Theorem VI.2.2. • Renewal model variants of the above cases for which the interclaim time is phasetype with just a few phases. • Any L´evy model where the risk reserve process (not the claim surplus process!) is downward skipfree (Theorem XI.2.3). This includes Brownian motion. • Any L´evy model for which the scale function is explicitly available, see XI.3 (for an early example cf. Furrer [381]). A further notable fact (see again XVI.1) is the explicit form of the ruin probability when {Rt } is a diffusion with infinitesimal drift and variance µ(x), σ 2 (x): © Rx ª R∞ exp − 0 2µ(y)/σ 2 (y) dy dx S(u) u © ª R R (4.2) ψ(u) = ∞ = 1− x 2 (y) dy dx S(∞) exp − 2µ(y)/σ 0 0 where
Z
u
S(u) = 0
½ Z exp −
x
¾ 2
2µ(y)/σ (y) dy
dx
0
is the natural scale. The finite horizon ruin probability ψ(u, T ) is explicit for Brownian motion (III.1) and the compound Poisson model with exponential claim size distribution (V.1). Later in the book a number of further rather specific cases will be discussed for which explicit expressions exist.
16
4c
CHAPTER I. INTRODUCTION
Numerical methods
Next to a closedform solution, the second best alternative is a numerical procedure which allows to calculate the exact values of the ruin probabilities. Here are some of the main approaches: Laplace transform inversion Often, it is easier to find the Laplace transforms Z ∞ Z ∞Z ∞ −su b b ψ[−s] = e ψ(u) du , ψ[−s, −ω] = e−su−ωT ψ(u, T ) du dT 0
0
0
in closed form rather than the ruin probabilities ψ(u), ψ(u, T ) themselves. In that case ψ(u), ψ(u, T ) can be calculated numerically by some method for transform inversion, say the fast Fourier transform (FFT) as implemented in Gr¨ ubel [438] for infinite horizon ruin probabilities for the renewal model. We do not discuss Laplace transform inversion much; other relevant references are Abate & Whitt [2], Embrechts, Gr¨ ubel & Pitts [346] and Gr¨ ubel & Hermesmeier [439]; see also Albrecher, Avram & Kortschak [14] and the Bibliographical Notes in [746, p. 191]. Matrixanalytic methods This approach is relevant when the claim size distribution is of phasetype (or matrixexponential), and in quite a few cases (Chapter IX), ψ(u) is then given in terms of a matrixexponential function eU u (here U is some suitable matrix) which can be computed by diagonalization, as the solution of linear differential equations or by some P∞ series expansion (not necessarily the straightforward 0 U n u/n! one!). In the compound Poisson model with p = 1, U is explicit in terms of the model parameters, whereas for the renewal arrival model and the Markovian environment model U has to be calculated numerically, either as the iterative solution of a fixed point problem or by finding the diagonal form in terms of the complex roots to certain transcendental equations. Differential and integral equations The idea here is to express ψ(u) or ψ(u, T ) as the solution to a differential or integral equation, and carry out the solution by some standard numerical method. One example where this is feasible is the renewal equation for ψ(u) (Corollary IV.3.3) in the compound Poisson model which is an integral equation of Volterra type. The method is, however, restricted to models that have a certain degree of Markovian structure in which case conditioning (or applying the more formal tool of generators, see II.4a) leads to equations that often involve both differential and integral terms. 
We will discuss cases where this approach can even lead to explicit solutions (see e.g. IX.7 and XII.3c). In
4. A SUMMARY OF MAIN RESULTS AND METHODS
17
many more cases, numerical solution methods are applicable, although the initial or boundary conditions can be a challenge. If an integral equation is available, it is often possible to define a contractive integral operator and to identify the ruin probability as its fixed point, in which case the ruin probability can be approximated by iterated application of the integral operator to some starting function. The resulting highdimensional integral can then be calculated by standard Monte Carlo and QuasiMonte Carlo techniques (see e.g. Albrecher et al. [31, 24]). In comparison to the alternative of direct simulation of the risk process (as discussed in Section 4g), this technique often has significant computational advantages over the latter.
4d
Approximations
The Cram´ erLundberg approximation This is one of the most celebrated results of risk theory (and probability theory as a whole). For the compound Poisson model with p = 1 and claim size distribution B with mob ment generating function (m.g.f.) B[s], it states that ψ(u) ∼ Ce−γu ,
u → ∞,
(4.3)
b 0 [γ] − 1) and γ > 0 is the solution of the Lundberg where C = (1 − ρ)/(β B equation ¡ ¢ b − 1 − γ = 0, β B[γ] (4.4) which can equivalently be written as γ b B[γ] = 1+ . β
(4.5)
It is rather standard to call γ the adjustment coefficient but a variety of other terms are also frequently encountered (and often the notation R instead of γ is used in the literature). The Cram´erLundberg approximation is renowned not only for its mathematical beauty but also for being very precise, often for all u ≥ 0 and not just for large u. It has generalizations to other L´evy models, to the models with renewal arrivals, a Markovian environment or periodically varying parameters. However, in such cases the evaluation of C is more cumbersome. In fact, when the claim size distribution is of phasetype, the exact solution is as easy to compute as the Cram´erLundberg approximation in some of these models. The shape of the l.h.s. of equation (4.4) and its extensions to other models lie at the heart of ruin theory. Its level sets (not only the one at 0) reveal a
18
CHAPTER I. INTRODUCTION lot of (in particular asymptotic) properties of ruinrelated quantities and will play an important role in this book.
Diffusion approximations Here the idea is simply to approximate the risk process by a Brownian motion (or a more general diffusion) by fitting the first and second moments, and to use the fact that first passage probabilities are more readily calculated for diffusions than for the risk process itself. Diffusion approximations are easy to calculate, but typically not very precise in their first naive implementation. However, incorporating correction terms may change the picture dramatically. In particular, corrected diffusion approximations (see V.6) are by far the best one can do in terms of finite horizon ruin probabilities ψ(u, T ). Large claims approximations In order for the Cram´erLundberg approximation to be valid, the claim size distribution should have an exponentially decreasing tail B(x). In the case of heavytailed distributions, other approaches are thus required. Approximations for ψ(u) as well as for ψ(u, T ) for large u are available in most of the models we discuss. For example, for the compound Poisson model under certain assumptions on B Z ∞ ρ B(x) dx, u → ∞. ψ(u) ∼ (4.6) µB (1 − ρ) u In fact, in some cases the results are even more complete than for light tails. See Chapter X. This list of approximations does by no means exhaust the topic; some further possibilities are surveyed in IV.7 and V.2.
4e
Bounds and inequalities
The outstanding result in the area is Lundberg’s inequality ψ(u) ≤ e−γu .
(4.7)
Compared to the Cram´erLundberg approximation (4.3), it has the advantage of not involving approximations and also, as a general rule, of being somewhat easier to generalize beyond the compound Poisson setting. We return to various extensions and sharpenings of Lundberg’s inequality (finite horizon versions, lower bounds etc.) at various places and in various settings. When comparing different risk models, it is a general principle that adding random variation to a model increases the risk. For example, one expects a model with a deterministic claim size distribution B, say degenerate at m, to
4. A SUMMARY OF MAIN RESULTS AND METHODS
19
have smaller ruin probabilities than when B is nondegenerate with the same mean m. This is proved for the compound Poisson model in IV.8 (see also further ordering results for dependent risks in Section XIII.8). Empirical evidence shows that the general principle holds in a broad variety of settings, though precise mathematical results are not always available.
4f
Statistical methods
Any of the approaches and results above assume that the parameters of the model are completely known. In practice, they have however to be estimated from data, obtained say by observing the risk process in [0, T ]. This procedure in itself is fairly straightforward; e.g., in the compound Poisson model, it splits up into the estimation of the Poisson intensity (the estimator is βb = NT /T ) and of the parameter(s) of the claim size distribution, which is a standard statistical problem since the claim sizes U1 , . . . , UNT are i.i.d. given NT . However, the difficulty comes in when drawing inference about the ruin probabilities. How do we produce a confidence interval? And, more importantly, can we trust the confidence intervals for the large values of u which are of interest? In the present authors’ opinion, this is extrapolation from data due to the extreme sensitivity of the ruin probabilities to the tail of the claim size distribution in particular (in contrast, fitting a parametric model to U1 , . . . , UNT may be viewed as an interpolation or smoothing of the histogram). For example, one may question whether it is possible to distinguish between claim size distributions which are heavytailed and those that have an exponentially decaying tail. This issue will be further discussed in Section X.6.
4g
Simulation
The development of modern computers has made simulation a popular experimental tool in all branches of applied probability and statistics, and of course the method is relevant in risk theory as well. Simulation may be used just to get some vague insight in the process under study: simulate one or several sample paths, and look at them to see whether they exhibit the expected behavior or some surprises come up. However, the more typical situation is to perform a Monte Carlo experiment to estimate probabilities (or expectations or distributions) which are not analytically available. For example, this is a straightforward way to estimate finite horizon ruin probabilities. The infinite horizon case presents a difficulty, because it appears to require an infinitely long simulation. Truncation to a finite horizon (or above a certain surplus level) has been used, but is not very satisfying. Still, good methods exist in a number of models and are based upon representing the ruin probability ψ(u)
20
CHAPTER I. INTRODUCTION
as the expected value of a r.v. (or a functional of the expectation of a set of r.v’s) which can be generated by simulation. The problem is entirely analogous to estimating steadystate characteristics by simulation in queueing/storage theory, and in fact methods from that area can often be used in risk theory as well. We look at a variety of such methods in Chapter XV, and also discuss how to develop methods which are efficient in terms of producing a small variance for a fixed simulation budget. A main problem is that ruin is typically a rare event (i.e. having small probability) and that therefore it is expensive or even infeasible in terms of computer time to obtain reasonably precise estimates of the ruin probability through naive simulation. Variance reduction techniques to improve the situation are discussed in Chapter XV.
Chapter II
Martingales and simple ruin calculations 1
Wald martingales
A random walk in discrete time is defined as Rn = R0 + Y1 + · · · + Yn where the Yi are i.i.d., with common distribution F (say). Here F is a general probability distribution on R (the special case of F being concentrated on {−1, 1} is often referred to as simple random walk or Bernoulli random walk). Most often in the probability literature, R0 = 0, but since we are here thinking of the random walk as a model for the risk reserve, we often allow R0 = u > 0. Denote by Fb[α] = EeαY1 the m.g.f. (moment generating function) of F and let κ(α) = log Fb[α] be the c.g.f. (cumulant generating function). Theorem 1.1 Let Rn = R0 + Y1 + · · · + Yn be a random walk. Then for any α such that Fb[α] < ∞, the sequence eα(Y1 +···+Yn ) = eα(Y1 +···+Yn )−nκ(α) n b F [α]
(1.1)
is a martingale. Proof. Denote by Mn the expression (1.1). Then ¯ £ ¤ E Mn+1 ¯ Y1 , . . . , Yn = =
¤ eα(Y1 +···+Yn ) £ αYn+1 b ¯¯ E e /F [α] Y1 , . . . , Yn Fb[α]n Mn EeαYn+1 /Fb[α] = Mn . 21
2
22 CHAPTER II. MARTINGALES AND SIMPLE RUIN CALCULATIONS The martingale in Theorem 1.1 is denoted the Wald martingale. The main application is optional stopping, i.e. exploiting the identity Eeα(Y1 +···+Yτ )−τ κ(α) = 1
(1.2)
where τ < ∞ is a finite stopping time. A sufficient condition for (1.2) is that E sup eα(Y1 +···+Yn )−nκ(α) < ∞ . n≤τ
For a necessary and sufficient condition, see III.1.4. The Wald martingale generalizes to a L´evy process {Xt }, defined as a continuous time process with stationary and independent increments. The traditional formal definition is that {Xt } is Rvalued with the increments Xt1 − Xt0 , Xt2 − Xt1 , . . . , Xtn − Xtn−1 being independent whenever t0 < t1 < . . . < tn and with Xti − Xti−1 having distribution depending only on ti − ti−1 . An equivalent characterization is {Xt } being Markov with state space R and ¯ ¤ £ E f (Xt+s − Xt ) ¯ Ft = E0 f (Xs ) , (1.3) where Ex refers to the case X0 = x. Note that the structure of such a process admits a complete description: {Xt } can be written as the independent sum of a pure drift {µt}, a Brownian component {Bt } (scaled by a variance constant σ) and a pure jump process {Mt }, Xt = X0 + µt + σBt + Mt .
(1.4)
More precisely, the pure jump process is given by its L´evy measure ν(dx), a positive measure on R with the properties Z ² Z x2 ν(dx) < ∞, ν(dx) < ∞ (1.5) −²
{x:x>²}
for some (and then all) ² > 0.R Roughly, the interpretation is that the rate of ² a jump of size x is ν(dx) (if −² xν(dx) = ∞, this description needs some amendments). The simplest case is β = kνk < ∞, which corresponds to the compound Poisson case: here jumps of {Mt } occur at rate β and have distribution B = ν/β (in particular, the claim surplus process for the compound Poisson risk model, with premium rate p, corresponds to a L´evy process with µ = −p, σ 2 = 0 and ν = βB). A general jump process can be thought of as limit of compound Poisson processes with drift by considering a sequence ν (n)
2. GAMBLER’S RUIN. TWOSIDED RUIN. BROWNIAN MOTION
23
of bounded measures with ν (n) ↑ ν. These issues are discussed in more detail in XI.1. For the moment, it suffices to have Brownian motion (possibly with a nonzero drift µ and a general variance constant σ 2 ) in mind as a second main example besides the compound Poisson model. We have: Theorem 1.2 Let {Xt } be a L´evy process and α ∈ R. Then EeαXt is either finite for all t > 0 or infinite for all t > 0. In the first case, EeαXt = etκ(α) for some κ(α) ∈ R, and the process eαXt −tκ(α)
(1.6)
is a martingale. Proof. The first part is easily seen to hold with κ(α) = log Eeα(X1 −X0 ) . For the second, denote by Mt the expression (1.6) and let {Ft } be the natural filtration of {Xt }. Then ¯ ¤ £ E[Mt+s  Ft ] = Mt E eα(Xt+s −Xt )−sκ(α) ¯ Ft = Mt Eeα(Xt+s −Xt )−sκ(α) = Mt . 2 A sufficient condition for optional stopping, i.e. EMτ = 1, is E supt≤τ Mt < ∞. For a necessary and sufficient condition, see again III.1.4. For Brownian motion with drift µ and variance constant σ 2 , X1 is N (µ, σ 2 ). 2 2 Therefore EeαX1 = eαµ+α σ /2 , so that κ(α) = αµ + α2 σ 2 /2 .
(1.7)
For the Cram´erLundberg process Rt with premium rate p, Poisson parameter β and claim size distribution B, it is shown in IV.1 that ¡ ¢ b κ(α) = β B[−α] − 1 + αp . (1.8) For the claim surplus process St = u − Rt , ¡ ¢ b − 1 − αp . κ(α) = β B[α]
2
(1.9)
Gambler’s ruin. Twosided ruin. Brownian motion
The first solution of a ruin problem appears to be that of de Moivre (1711) for the gambler’s ruin problem, which is a twosided ruin problem. That is,
24 CHAPTER II. MARTINGALES AND SIMPLE RUIN CALCULATIONS starting from u ∈ [0, a] we define the twobarrier ruin probability ψa (u) as the probability of being ruined before the reserve reaches level a. I.e. ¡ ¢ ¡ ¢ ψa (u) = P τ (u, a) = τ (u) = 1 − P τ (u, a) = τ+ (a) , where1 τ (u) = inf {t ≥ 0 : Rt ≤ 0} ,
τ+ (a) = inf {t ≥ 0 : Rt ≥ a} ,
τ (u, a) = τ (u) ∧ τ+ (a) = inf {t ≥ 0 : Rt ≤ 0 or Rt ≥ a} . Note that P(τ (u, a) < ∞) = 1, because the interval [0, a] is finite. Besides its intrinsic interest, ψa (u) can also be a useful vehicle for computing ψ(u) by letting a → ∞. De Moivre considered a Bernoulli random walk, defined as R0 = u with u ∈ {0, 1, . . . , a}, Rn = u+Y1 +· · ·+Yn where Y1 , Y2 , . . . are i.i.d. and {−1, 1}valued with P(Yk = 1) = θ. His result was: Proposition 2.1 For a Bernoulli random walk with θ 6= 1/2, ³ 1 − θ ´a
³ 1 − θ ´u − θ θ , ³ 1 − θ ´a −1 θ
ψa (u) =
If θ = 1/2, then ψa (u) =
a = u, u + 1, . . . .
(2.1)
a−u . a
We give two proofs, one elementary but difficult to generalize to other models, and the other more advanced but applicable also in some other settings. Proof 1. Conditioning upon Y1 yields immediately the recursion ψa (1) ψa (2)
= 1 − θ + θψa (2), = (1 − θ)ψa (1) + θψa (3), .. . ψa (a − 2) = (1 − θ)ψa (a − 3) + θψa (a − 1), ψa (a − 1) = (1 − θ)ψa (a − 2), 1 Note that the definition of τ (u) differs from the rest of the book where we use τ (u) = inf {t ≥ 0 : Rt < 0} (sharp inequality); in most cases, either this makes no difference (P(Rτ (u) = 0) = 0) or it is trivial to translate from one setup to the other, as e.g. in the Bernoulli random walk example below.
and insertion shows that (2.1) is the solution satisfying the obvious boundary conditions ψa(0) = 1, ψa(a) = 0.² □

The second proof uses martingales. We remark here, as a matter of terminology, that whereas our general definition of the Lundberg coefficient γ in Chapter I is as the non-zero solution of log Ee^{γS₁} = 0, where St = R0 − Rt is the claim surplus, we work here directly with the reserve process Rt, so that for our Bernoulli random walk the Lundberg coefficient is the non-zero solution of log Ee^{−γ(R₁−R₀)} = 0, i.e. F̂[−γ] = 1 where F̂[s] = θe^s + (1 − θ)e^{−s}. In view of the discrete nature of a Bernoulli random walk, we write z = e^{−γ}. The Lundberg equation is then

    1 = F̂[−γ] = θz + (1 − θ)/z

with solution z = (1 − θ)/θ. Similar remarks apply when computing the Lundberg coefficient for Brownian motion below.

Proof 2. Wald's exponential martingale with α = −γ is just {e^{−γRₙ}} = {z^{Rₙ}}. By optional stopping,

    z^u = Ez^{R₀} = Ez^{R_{τ(u,a)}}
        = z⁰ P(R_{τ(u,a)} = 0) + z^a P(R_{τ(u,a)} = a)
        = z⁰ ψa(u) + z^a (1 − ψa(u)),    (2.2)

and solving for ψa(u) yields ψa(u) = (z^a − z^u)/(z^a − 1). If θ = 1/2, (2.2) is trivial (z = 1). However, {Rₙ} is then itself a martingale, and we get in a similar manner

    u = ER₀ = ER_{τ(u,a)} = 0 · ψa(u) + a(1 − ψa(u)),   ψa(u) = (a − u)/a. □
We note that if θ ≤ 1/2, then a Bernoulli random walk hits 0 w.p. 1, so ψ(u) = 1 for u ≥ 0. In contrast:

Corollary 2.2 For a Bernoulli random walk with θ > 1/2,

    ψ(u) = ((1 − θ)/θ)^u .    (2.3)

If θ ≤ 1/2, then ψ(u) = 1.

²Alternatively, a constructive solution of the difference equation ψa(x) = (1 − θ)ψa(x − 1) + θψa(x + 1) is to substitute r^x, leading to the two choices r = 1 and r = (1 − θ)/θ; the result then follows from the linear combination determined by the boundary conditions.
Proof. Let a → ∞ in (2.1). □
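The closed form (2.1) can be cross-checked against a direct solution of the recursion from Proof 1. The following Python sketch (an illustration, not part of the text) solves the boundary-value recursion exactly with rational arithmetic, using the fact that the solution is affine in the unknown value ψa(1):

```python
from fractions import Fraction

def psi_closed(u, a, theta):
    """Closed form (2.1); the theta = 1/2 case is (a - u)/a."""
    if theta == Fraction(1, 2):
        return Fraction(a - u, a)
    r = (1 - theta) / theta
    return (r**a - r**u) / (r**a - 1)

def psi_recursion(a, theta):
    """Solve psi(x) = (1-theta)*psi(x-1) + theta*psi(x+1), psi(0)=1, psi(a)=0,
    by a shooting method: each candidate solution is affine in psi(1)."""
    def shoot(t):
        vals = [Fraction(1), t]
        for _ in range(2, a + 1):
            vals.append((vals[-1] - (1 - theta) * vals[-2]) / theta)
        return vals
    v0, v1 = shoot(Fraction(0)), shoot(Fraction(1))
    t = -v0[a] / (v1[a] - v0[a])      # choose psi(1) so that psi(a) = 0
    return shoot(t)

theta, a = Fraction(2, 5), 10
rec = psi_recursion(a, theta)
assert all(rec[u] == psi_closed(u, a, theta) for u in range(a + 1))
```

The exact agreement for all u ∈ {0, . . . , a} confirms the insertion argument of Proof 1.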
Now let us turn to Brownian motion.

Proposition 2.3 Let {Rt} be Brownian motion starting from u ≥ 0 and with drift µ and variance constant σ². Then for µ ≠ 0,

    ψa(u) = (e^{−2µa/σ²} − e^{−2µu/σ²}) / (e^{−2µa/σ²} − 1).    (2.4)

If µ = 0, then ψa(u) = (a − u)/a.

Proof. By (1.7), the Lundberg equation κ(−γ) = 0 is γ²σ²/2 − γµ = 0 with solution γ = 2µ/σ². Applying optional stopping to the exponential martingale {e^{−γRt}} yields

    e^{−γu} = Ee^{−γR₀} = Ee^{−γR_{τ(u,a)}} = e⁰ ψa(u) + e^{−γa}(1 − ψa(u)),

and solving for ψa(u) yields ψa(u) = (e^{−γa} − e^{−γu})/(e^{−γa} − 1) for µ ≠ 0. If µ = 0, {Rt} is itself a martingale and just the same calculation as in the proof of Proposition 2.1 yields ψa(u) = (a − u)/a. □

Corollary 2.4 For Brownian motion starting in u ≥ 0 with drift µ > 0 and variance constant σ²,

    ψ(u) = e^{−2µu/σ²}.    (2.5)

If µ ≤ 0, then ψ(u) = 1.

Proof. Let a → ∞ in (2.4). □
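Formula (2.4) can also be sanity-checked by simulation. The sketch below (illustrative only; the Euler discretization misses barrier crossings between grid points, so a generous tolerance is used) estimates the two-barrier ruin probability of Brownian motion:

```python
import math, random

def ruin_two_barrier_mc(u, a, mu, sigma, dt=4e-3, n_paths=4000, seed=1):
    """Monte Carlo estimate of psi_a(u) for Brownian motion by Euler steps.
    Discretization bias is of order sqrt(dt), hence the loose tolerance below."""
    rng = random.Random(seed)
    sd = sigma * math.sqrt(dt)
    ruined = 0
    for _ in range(n_paths):
        r = u
        while 0.0 < r < a:
            r += mu * dt + sd * rng.gauss(0.0, 1.0)
        ruined += r <= 0.0
    return ruined / n_paths

def psi_a(u, a, mu, sigma):
    """Closed form (2.4)."""
    g = 2.0 * mu / sigma**2
    return (math.exp(-g * a) - math.exp(-g * u)) / (math.exp(-g * a) - 1.0)

u, a, mu, sigma = 1.0, 3.0, 0.5, 1.0
est = ruin_two_barrier_mc(u, a, mu, sigma)
assert abs(est - psi_a(u, a, mu, sigma)) < 0.05
```

The skip-free nature of the paths is what makes the closed form exact; the simulation only approaches it as dt ↓ 0.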
The reason that the calculations work out so smoothly for Bernoulli random walks and Brownian motion is the skip-free nature of the paths, implying R_{τ(u,a)} = a on {τ(u, a) = τ₊(a)} and similarly for the boundary 0. For most standard risk processes, the paths are upwards skip-free but not downwards, and thus one encounters the problem of controlling the undershoot under level 0. Here is one more case where this is feasible:

Example 2.5 Consider a compound Poisson model with exponential claims (with rate, say, δ). That is,

    Rt = u + t − Σ_{i=1}^{Nt} Ui ,

where N is a Poisson(β) process and P(Ui > x) = e^{−δx}. Now consider R_{τ(u)}, assume R_{τ(u)−} = x > 0, and let Z = −R_{τ(u)} + x be the size of the claim leading to ruin. The available information on Z is that its distribution is that of a claim size U given U > x, and thus by the memoryless property of the exponential distribution, the conditional distribution of the undershoot Z − x = −R_{τ(u)} is again just exponential with rate δ. Hence
    e^{−γu} = Ee^{−γR₀}
            = E[e^{−γR_{τ(u,a)}} | R_{τ(u,a)} ≤ 0] P(R_{τ(u,a)} ≤ 0) + e^{−γa} P(R_{τ(u,a)} = a)
            = (δ/(δ − γ)) P(R_{τ(u,a)} ≤ 0) + e^{−γa} P(R_{τ(u,a)} = a)
            = (δ/(δ − γ)) ψa(u) + e^{−γa}(1 − ψa(u)).

It follows from (1.8) that γ = δ − β, from which we obtain

    ψa(u) = (e^{−γu} − e^{−γa}) / (δ/β − e^{−γa}).    (2.6)
Again, letting a → ∞ yields the classical expression ρe^{−γu} for ψ(u), where ρ = β/δ (cf. I.4b), valid if ρ < 1 (otherwise, ψ(u) = 1). □

However, passing to even more general cases the method quickly becomes infeasible (see, however, IX.5a). It may then be easier to first compute the one-barrier ruin probability ψ(u):

Proposition 2.6 If the paths of {Rt} are upward skip-free and ψ(a) < 1, then

    ψa(u) = (ψ(u) − ψ(a)) / (1 − ψ(a)),   0 ≤ u ≤ a.    (2.7)
Proof. By the upward skip-free property,

    ψ(u) = ψa(u) + (1 − ψa(u)) ψ(a).

If ψ(a) < 1, this immediately yields (2.7). □
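As a consistency check (illustrative, not from the text), the exponential-claims formulas fit together: feeding the classical expression ψ(u) = ρe^{−γu} into (2.7) reproduces the two-barrier formula (2.6) exactly:

```python
import math

def psi(u, beta, delta):
    """One-barrier ruin probability rho * exp(-gamma*u) for exponential claims,
    with gamma = delta - beta and rho = beta/delta < 1."""
    return (beta / delta) * math.exp(-(delta - beta) * u)

def psi_a_direct(u, a, beta, delta):
    """Two-barrier formula (2.6)."""
    g = delta - beta
    return (math.exp(-g * u) - math.exp(-g * a)) / (delta / beta - math.exp(-g * a))

def psi_a_via_27(u, a, beta, delta):
    """Proposition 2.6, formula (2.7), fed with the one-barrier probabilities."""
    return (psi(u, beta, delta) - psi(a, beta, delta)) / (1 - psi(a, beta, delta))

beta, delta = 1.0, 2.0            # rho = 1/2 < 1
for u in (0.0, 0.5, 1.5):
    assert abs(psi_a_direct(u, 3.0, beta, delta) - psi_a_via_27(u, 3.0, beta, delta)) < 1e-12
```

Algebraically the agreement amounts to multiplying numerator and denominator of (2.7) by 1/ρ = δ/β.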
We will meet this argument again in VIII.1a for computing ruin probabilities for a two-step premium function. Let us now return to the Bernoulli random walk and Brownian motion and consider finite-horizon ruin probabilities. For the symmetric (drift 0) case these are easily computable by means of the reflection principle:
Proposition 2.7 For Brownian motion starting in u ≥ 0 with drift 0 and variance constant σ²,

    ψ(u, T) = P(τ(u) ≤ T) = 2Φ(−u/(σ√T)).    (2.8)

Proof. In terms of the claim surplus process {St} = {u − Rt}, we have ψ(u, T) = P(M_T ≥ u) where M_T = max_{0≤t≤T} St. Here {St} is Brownian motion with drift 0 (starting from 0), in particular symmetric, so that from time τ(u) (when the level is u) it is equally likely to go to levels < u and levels > u in time T − τ(u). Hence P(M_T ≥ u, S_T < u) = P(M_T ≥ u, S_T > u), and one gets

    P(M_T ≥ u) = P(S_T ≥ u) + P(S_T < u, M_T ≥ u)
               = P(S_T ≥ u) + P(S_T > u, M_T ≥ u)
               = P(S_T ≥ u) + P(S_T > u)
               = 2 P(S_T > u).    (2.9)

□
Small modifications also apply to Bernoulli random walks:

Proposition 2.8 For the Bernoulli random walk with θ = 1/2,

    ψ(u, T) = P(S_T = u) + 2 P(S_T > u),    (2.10)

whenever u, T are integer-valued and non-negative. Here

    P(S_T = v) = 2^{−T} (T choose (v+T)/2)   for v = −T, −T+2, . . . , T−2, T,

and P(S_T = v) = 0 otherwise.

Proof. The argument leading to (2.9) goes through unchanged, and (2.10) is the same as (2.9). The expression for P(S_T = v) is just a standard formula for the binomial distribution. □

Notes and references All material of the present section is standard. A classical reference for further aspects of Bernoulli random walks is Feller [361]. For generalizations of Proposition 2.6 to Markov-modulated models, see Asmussen & Perry [95]. Further early references on two-barrier ruin problems include Dickson & Gray [312, 313].
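Formula (2.10) can be verified exactly against a dynamic-programming computation of P(max_{t≤T} St ≥ u). The following sketch (an illustration, not part of the text) uses exact rational arithmetic:

```python
from fractions import Fraction
from math import comb

def psi_formula(u, T):
    """Formula (2.10): P(S_T = u) + 2 P(S_T > u), with the binomial point masses."""
    def p(v):
        if v < -T or v > T or (v + T) % 2:
            return Fraction(0)
        return Fraction(comb(T, (v + T) // 2), 2**T)
    return p(u) + 2 * sum(p(v) for v in range(u + 1, T + 1))

def psi_dp(u, T):
    """P(max_{t<=T} S_t >= u) by dynamic programming with an absorbing barrier."""
    if u <= 0:
        return Fraction(1)        # the maximum includes t = 0, S_0 = 0
    half = Fraction(1, 2)
    dist = {0: Fraction(1)}       # law of S_t restricted to 'not yet hit u'
    hit = Fraction(0)
    for _ in range(T):
        nxt = {}
        for s, p in dist.items():
            for s2 in (s - 1, s + 1):
                if s2 >= u:
                    hit += half * p
                else:
                    nxt[s2] = nxt.get(s2, Fraction(0)) + half * p
        dist = nxt
    return hit

for u in range(0, 5):
    for T in range(0, 12):
        assert psi_formula(u, T) == psi_dp(u, T)
```

The exact match over a grid of (u, T) pairs is a direct numerical confirmation of the reflection argument.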
3  Further simple martingale calculations
We consider the claim surplus process {St} of a general risk process. As usual, the time to ruin is τ(u) = inf{t ≥ 0 : St > u}, and the ruin probabilities are

    ψ(u) = P(τ(u) < ∞),   ψ(u, T) = P(τ(u) ≤ T).
Our first result is a representation formula for ψ(u), obtained by using the martingale optional stopping theorem. Let ξ(u) = S_{τ(u)} − u denote the overshoot.

Proposition 3.1 Assume that (a) for some γ > 0, {e^{γSt}}_{t≥0} is a martingale, and (b) St → −∞ a.s. on {τ(u) = ∞}. Then

    ψ(u) = e^{−γu} / E[e^{γξ(u)} | τ(u) < ∞],   u ≥ 0.    (3.1)

Proof. We shall use optional stopping at time τ(u) ∧ T.³ We get

    1 = Ee^{γS₀} = Ee^{γS_{τ(u)∧T}} = E[e^{γS_{τ(u)}}; τ(u) ≤ T] + E[e^{γS_T}; τ(u) > T].    (3.2)
As T → ∞, the second term converges to 0 by (b) and dominated convergence (e^{γS_T} ≤ e^{γu} on {τ(u) > T}), and in the limit (3.2) takes the form

    1 = E[e^{γS_{τ(u)}}; τ(u) < ∞] + 0
      = e^{γu} E[e^{γξ(u)}; τ(u) < ∞] = e^{γu} E[e^{γξ(u)} | τ(u) < ∞] ψ(u). □

Example 3.2 Consider the compound Poisson model with Poisson arrival rate β, claim size distribution B and ρ = βµ_B < 1. Thus

    St = Σ_{i=1}^{Nt} Ui − t,

where {Nt} is a Poisson process with rate β and the Ui are i.i.d. with common distribution B (and independent of {Nt}). Condition (a) of Proposition 3.1 is satisfied if a positive solution γ > 0 of the Lundberg equation (1.9) (i.e. an adjustment coefficient) exists. Property (b) follows from ρ < 1 and the law of large numbers (see Proposition IV.1.2(c)). □

³We cannot use the stopping time τ(u) directly, because P(τ(u) = ∞) > 0 and also because the conditions of the optional stopping theorem present a problem; however, using τ(u) ∧ T causes no problems, because τ(u) ∧ T is bounded by T.
Example 3.3 Assume that {Rt} is Brownian motion with variance constant σ² and drift µ > 0. Then {St} is Brownian motion with variance constant σ² and drift −µ < 0. Since the positive solution of the Lundberg equation (1.7) is γ = 2µ/σ², the conditions of Proposition 3.1 are satisfied. □

Corollary 3.4 (Lundberg's inequality) Under the conditions of Proposition 3.1, ψ(u) ≤ e^{−γu} for all u ≥ 0.

Proof. Just note that ξ(u) ≥ 0 a.s. □

We also retrieve again the exact expression of Section I.4b for exponential claims:

Corollary 3.5 For the compound Poisson model with B exponential, B̄(x) = e^{−δx}, and ρ = β/δ < 1, the ruin probability is ψ(u) = ρe^{−γu}, where γ = δ − β.

Proof. As before, from (1.9) it is immediately seen that γ = δ − β. Now at the time τ(u) of ruin, {St} upcrosses level u by making a jump. By the memoryless property of the exponential distribution, the conditional distribution of the overshoot ξ(u) is again just exponential with rate δ. Thus

    E[e^{γξ(u)} | τ(u) < ∞] = ∫₀^∞ e^{γx} δe^{−δx} dx = ∫₀^∞ δe^{−βx} dx = δ/β = 1/ρ. □

If {Rt} is Brownian motion with variance constant σ² and drift µ > 0, then ξ(u) = 0 by continuity of Brownian motion and ψ(u) = e^{−2µu/σ²}, which reconfirms Corollary 2.4.

Notes and references The first use of martingales in risk theory is due to Gerber [397], and is further exploited in his book [398]. More recent references are Dassios & Embrechts [273], Grandell [429, 430], Embrechts, Grandell & Schmidli [345], Delbaen & Haezendonck [287] and Schmidli [770, 780].
4  More advanced martingales

4a  Generators. The Dynkin martingale

Assume that a stochastic process {Rt} is a Markov process and write Pu and Eu for the case R0 = u. In loose terms, the generator A is then an operator in an appropriate function space such that

    (d/dt) Eu f(Rt) |_{t=0} = Af(u),
or equivalently

    lim_{h↓0} (Eu f(Rh) − f(u))/h = Af(u)    (4.1)
for a sufficiently rich class of functions f. Historically, there have, however, been several different ways to make this loose definition precise, and in particular one will find many definitions of the domain D(A) on which A is defined. For example, some older definitions require (4.1) to hold uniformly or locally uniformly in u. The most standard current definition is in terms of martingales: f ∈ D(A) with Af = g (g can be shown to be unique up to some null-set complications, cf. Davis [278, p. 32]) if and only if

    f(Rt) − ∫₀ᵗ g(Rs) ds    (4.2)
is a local martingale⁴, commonly referred to as the Dynkin martingale. The motivation relating to (4.1) is loosely the following. Denote by Mt the expression (4.2), assume it is a martingale (and not just a local martingale) and that the function s ↦ Eu Af(Rs) is sufficiently well-behaved at s = 0, say continuous and bounded. Then

    Eu f(Rh) − f(u) = Eu[Mh − M0] + Eu ∫₀ʰ Af(Rs) ds = 0 + hAf(u) + o(h),

so that (4.1) holds.

Example 4.1 Assume that {Rt} is Brownian motion with drift µ and variance constant σ². Let further f ∈ C³ have compact support. Then if V is a standard normal r.v., we have⁵

    Eu f(Rh) = Ef(u + µh + √h σV)
             = E[f(u) + f′(u)(µh + √h σV) + f″(u)(µh + √h σV)²/2] + O(h^{3/2})
             = f(u) + f′(u)µh + f″(u)hσ²EV²/2 + O(h^{3/2}).

Thus (4.1) holds with Af = µf′ + (σ²/2)f″.

⁴Strictly speaking, a local martingale w.r.t. Px for all x. For ease of exposition, we omit such specification here and in the following.

⁵This calculation is of course a heuristic derivation of Itô's formula. In its full generality, Itô's formula permits weakening the assumption on f to f ∈ C².
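The limit (4.1) can be checked by hand for a polynomial test function. The sketch below (an illustration, not part of the text; it uses f(x) = x³, which does not have compact support, but all Gaussian moments needed are finite, so Eu f(Rh) is available in closed form) verifies that the difference quotient approaches µf′ + (σ²/2)f″:

```python
def Ef_cubed(u, mu, sigma, h):
    """Exact E_u f(R_h) for f(x) = x^3 under Brownian motion:
    R_h ~ N(u + mu*h, sigma^2*h), and E(m + X)^3 = m^3 + 3*m*Var(X)."""
    m = u + mu * h
    return m**3 + 3.0 * m * sigma**2 * h

def generator_f_cubed(u, mu, sigma):
    """A f(u) = mu*f'(u) + (sigma^2/2)*f''(u) for f(x) = x^3."""
    return mu * 3.0 * u**2 + 0.5 * sigma**2 * 6.0 * u

u, mu, sigma = 1.3, 0.7, 1.1
for h in (1e-2, 1e-4, 1e-6):
    diff_quot = (Ef_cubed(u, mu, sigma, h) - u**3) / h
    assert abs(diff_quot - generator_f_cubed(u, mu, sigma)) < 10 * h   # error is O(h)
```

The error shrinks linearly in h, as the o(h) remainder in (4.1) predicts for this smooth f.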
It is less clear how much one can relax the assumptions on f to still get a local martingale, and we will not go into this issue here. Nevertheless, it is clear that for a suitable class of twice differentiable functions f, one should have f ∈ D(A) and that Af = µf′ + (σ²/2)f″. The operator sending a twice differentiable function f to µf′ + (σ²/2)f″ is often called the differential operator of the Brownian motion. Similarly, a diffusion process with local drift µ(u) and local variance σ²(u) has differential operator µ(u)f′(u) + (σ²(u)/2)f″(u). □

Example 4.2 Consider the Cramér–Lundberg model with parameters β, B, and let U be a generic claim. Then, conditioning on whether or not a claim occurs in (0, h), we have under suitable conditions on f

    Eu f(Rh) = (1 − βh)f(u + h) + βh Ef(u − U + O(h)) + o(h)
             = f(u) − βf(u)h + hf′(u) + βh ∫₀^∞ f(u − x) B(dx) + o(h).

Thus, it is clear that for a suitable class of differentiable functions f, one should have f ∈ D(A) and that

    Af(u) = −βf(u) + f′(u) + β ∫₀^∞ f(u − x) B(dx). □

A function f such that f(Rt) is a martingale is called harmonic. Obviously, in view of (4.1) a harmonic function will have the property f ∈ D(A) and Af ≡ 0.

Example 4.3 As a simple example of the relevance of harmonic functions for ruin theory, consider the Brownian setting of Example 4.1 and take f(u) = ψ(u) (the ruin probability), with the convention ψ(u) = 1 for u ≤ 0. For u > 0, P(τ(u) ≤ h) = o(h)⁶ and so, by boundedness of ψ(u) and the Markov property of Rt,

    ψ(u) = Eu[ψ(Rh); τ(u) > h] + Eu[I(τ(u) ≤ h)] = Eu ψ(Rh) + o(h).

The same holds for u < 0, and so (omitting the details for u = 0) we conclude that ψ(u) is harmonic. Combining this with the remarks of Example 4.1, we conclude that

    µψ′(u) + ψ″(u)σ²/2 = 0.

⁶This for instance follows from III.(1.8).
The general solution of this differential equation is C₊e^{λ₊u} + C₋e^{λ₋u}, where λ± are the solutions of the quadratic equation 0 = λµ + λ²σ²/2, i.e. λ₊ = 0 and λ₋ = −2µ/σ². If µ > 0, we have ψ(u) → 0 as u → ∞, and so C₊ = 0. Together with the boundary condition ψ(0) = 1 (due to the oscillation of Brownian motion) we arrive at ψ(u) = e^{−2µu/σ²}, as found in Corollary 2.4 by different means. The method employed may be seen as a continuous-time analogue of Proof 1 of Proposition 2.1. □

Notes and references Ethier & Kurtz [359] is a good reference for the modern approach to generators. Further references are Davis [278] and Rolski et al. [746].
4b  Diffusions and two-sided ruin

One of the major areas where generators play a main role is diffusion processes. Thus let {Rt} be a diffusion on [0, ∞) with drift µ(x) and variance σ²(x) at x. We assume that µ(x) and σ²(x) are continuous with σ²(x) > 0 for x > 0. Thus, close to x, {Rt} behaves as Brownian motion with drift µ = µ(x) and variance σ² = σ²(x). We can define a 'local' adjustment coefficient γ(x) = −2µ(x)/σ²(x) for the locally approximating Brownian motion. Let

    s(y) = e^{∫₀ʸ γ(x) dx},   S(x) = ∫₀ˣ s(y) dy,   S(∞) = ∫₀^∞ s(y) dy.    (4.3)
The following result gives a complete solution of the ruin problem for the diffusion, subject to the assumption that S(x), as defined in (4.3) with 0 as lower limit of integration, is finite for all x > 0. If this assumption fails, the behavior at the boundary 0 is more complicated and it may happen, e.g., that ψ(u), defined as above as the probability of actually hitting 0, is zero for all u > 0 but that nevertheless Rt → 0 a.s. (the problem leads into the complicated area of boundary classification of diffusions; see e.g. Breiman [199] or Karlin & Taylor [522, p. 226]).

Theorem 4.4 Consider a diffusion process {Rt} on [0, ∞), such that the drift µ(x) and the variance σ²(x) are continuous functions of x and σ²(x) > 0 for x > 0. Assume further that S(x) as defined in (4.3) is finite for all x > 0. If

    S(∞) < ∞,    (4.4)

then 0 < ψ(u) < 1 for all u > 0 and

    ψ(u) = 1 − S(u)/S(∞).

Conversely, if (4.4) fails, then ψ(u) = 1 for all u > 0.
Lemma 4.5 Let 0 < b ≤ u ≤ a and let ψ_{a,b}(u) be the probability that {Rt} hits b before a starting from u. Then

    ψ_{a,b}(u) = (S(a) − S(u)) / (S(a) − S(b)).    (4.5)

Proof. Recall that under mild conditions on q, Eu q(Rh) = q(u) + Aq(u)h + o(h), where

    Aq(u) = (σ²(u)/2) q″(u) + µ(u)q′(u).

If b < u < a, the probability of ruin or hitting the upper barrier a before h is of order o(h), so that ψ_{a,b}(u) = Eu ψ_{a,b}(Rh) + o(h) = ψ_{a,b}(u) + Aψ_{a,b}(u)h + o(h), i.e. Aψ_{a,b}(u) = 0. Using s′(x)/s(x) = −2µ(x)/σ²(x), elementary calculus shows that we can rewrite A as

    Aq(u) = (1/2) σ²(u)s(u) · (d/du)[ q′(u)/s(u) ].    (4.6)

Hence Aψ_{a,b}(u) = 0 implies that ψ′_{a,b}(u)/s(u) is constant, i.e. ψ_{a,b}(u) = α + βS(u). The obvious boundary conditions ψ_{a,b}(b) = 1, ψ_{a,b}(a) = 0 then yield the result. □
Proof of Theorem 4.4. Letting b ↓ 0 in (4.5) yields ψa(u) = 1 − S(u)/S(a). Letting a ↑ ∞ and considering the cases S(∞) = ∞ and S(∞) < ∞ separately completes the proof. □

Notes and references A good introduction to diffusions is in Karlin & Taylor [522]; see in particular pp. 191–195 for material related to Theorem 4.4. In view of (4.5), the function S(x) is referred to as the natural scale in the general theory of diffusions (in case of integrability problems at 0, one works instead with a lower limit ε > 0 of integration in (4.3)). Another basic quantity is the speed measure M, defined by the density 1/(σ²(u)s(u)) showing up in (4.6). For related arguments concerning the local adjustment coefficient, see VIII.3.
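Theorem 4.4 can be checked numerically in a case with a known answer (an illustration, not part of the text): for constant µ(x) ≡ µ > 0 and σ²(x) ≡ σ², the natural scale integrates to S(∞) = σ²/(2µ), and 1 − S(u)/S(∞) should reproduce e^{−2µu/σ²} from Corollary 2.4. Here S(x) is computed by quadrature:

```python
import math

def s(y, mu, sigma):
    """Scale density s(y) = exp(∫_0^y γ(x) dx) with γ(x) = -2*mu(x)/sigma^2(x);
    for constant coefficients the exponent is -2*mu*y/sigma^2."""
    return math.exp(-2.0 * mu * y / sigma**2)

def S(x, mu, sigma, n=20000):
    """Numerical S(x) = ∫_0^x s(y) dy by the trapezoidal rule."""
    h = x / n
    total = 0.5 * (s(0.0, mu, sigma) + s(x, mu, sigma))
    total += sum(s(i * h, mu, sigma) for i in range(1, n))
    return total * h

mu, sigma = 0.5, 1.0
S_inf = sigma**2 / (2.0 * mu)     # closed form of ∫_0^∞ e^{-2 mu y / sigma^2} dy
for u in (0.5, 1.0, 2.0):
    psi_u = 1.0 - S(u, mu, sigma) / S_inf
    assert abs(psi_u - math.exp(-2.0 * mu * u / sigma**2)) < 1e-6
```

For non-constant coefficients the same quadrature applies, with S(∞) approximated by a large upper limit.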
4c  The Kella–Whitt martingale

Example 4.6 As a motivating example, consider the surplus process {St} of a Cramér–Lundberg model with a claim size distribution B that is a mixture of two exponential distributions. To be definite, let the density be

    b(x) = ½ · 3e^{−3x} + ½ · 7e^{−7x}

and let the Poisson parameter be β = 3. Then

    κ(α) = 3 ( ½ · 3/(3 − α) + ½ · 7/(7 − α) − 1 ) − α.    (4.7)
Thus a solution of the Lundberg equation κ(α) = 0 must satisfy

    0 = (9/2)(7 − α) + (21/2)(3 − α) − 3(3 − α)(7 − α) − α(3 − α)(7 − α).

This is a cubic equation, and its roots are easily seen to be 0, γ = 1 and γ* = 6; that the Lundberg coefficient γ must be 1 and not 6 follows since Ee^{αS₁} is only finite if α < 3. Consider the problem of two-sided exit of {St} from [−a, b] with a, b > 0. Let p₀ be the probability of exit at the lower barrier, and p₃, p₇ the probabilities of exit at the upper barrier as the result of a jump which is exponential with rate 3 and 7, respectively. For brevity, write τ for the exit time. Now Sτ equals −a for lower exit and b + V₃ for upper exit as the result of an exponential(3) jump (where V₃ is again exponential(3) due to the lack-of-memory property of the exponential distribution). Defining V₇ similarly, we get by optional stopping of {e^{γSt}}

    1 = e^{γS₀} = Ee^{γSτ} = p₀e^{−γa} + p₃e^{γb}·3/(3 − γ) + p₇e^{γb}·7/(7 − γ),

i.e.

    6 = 6p₀e^{−a} + 9p₃e^b + 7p₇e^b.    (4.8)
Also obviously 1 = p₀ + p₃ + p₇. To get the missing third equation, it is tempting to formally proceed with γ* as with γ, which would give

    1 = e^{γ*S₀} = Ee^{γ*Sτ} = p₀e^{−γ*a} + p₃e^{γ*b}·3/(3 − γ*) + p₇e^{γ*b}·7/(7 − γ*),    (4.9)

i.e.

    1 = p₀e^{−6a} + e^{6b}(−p₃ + 7p₇).    (4.10)

The problem is, however, that Ee^{γ*S₁} = ∞, so {e^{γ*St}} is not a martingale (or for that sake a local martingale), so we are missing a rigorous justification for (4.9). □

We will see in Example 4.9 that the solution is nevertheless correct. For this and other purposes, we will exploit a martingale introduced by Kella & Whitt [527]. Let {Rt} be a Lévy process with Lévy exponent κ(α). The Wald martingale is then Mt = e^{αRt − tκ(α)}. The Kella–Whitt martingale is a stochastic integral w.r.t. {Mt} and has a somewhat different range of applications; in particular, it allows for a more direct study of aspects of reflected Lévy processes.
Theorem 4.7 Let {Rt} be a Lévy process with Lévy exponent κ(α), let

    Yt = ∫₀ᵗ dY_s^c + Σ_{0≤s≤t} ΔYs

be an adapted process of locally bounded variation with continuous part {Y_t^c}, D-paths and jumps ΔYs = Ys − Ys−, and define Zt = x + Rt + Yt. For each t, let Kt be the r.v.

    Kt = ∫₀ᵗ e^{αZs} dMs
       = κ(α) ∫₀ᵗ e^{αZs} ds + e^{αx} − e^{αZt} + α ∫₀ᵗ e^{αZs} dY_s^c + Σ_{0≤s≤t} e^{αZs}(1 − e^{−αΔYs}).

Then {Kt} is a local martingale whenever κ(α) is well-defined.

Proof. Let Bt = e^{αYt + tκ(α)}. Then, by the general theory of stochastic integration, K*_t = ∫₀ᵗ B_{s−} dMs is a local martingale. Using the formula for integration by parts (see e.g. Protter [718, p. 60] for a version sufficiently general to deal with the present case) yields
    MtBt − M0B0 = ∫₀ᵗ M_{s−} dBs + K*_t + Σ_{0≤s≤t} ΔMs ΔBs.

Inserting

    Σ_{0≤s≤t} ΔMs ΔBs = ∫₀ᵗ (Ms − M_{s−}) dBs,

it follows that

    −K*_t = ∫₀ᵗ Ms dBs + M0B0 − MtBt.    (4.11)

Using MsBs = e^{αZs} and dBs = Bs(α dY_s^c + κ(α) ds + 1 − e^{−αΔYs}) shows that the r.h.s. of (4.11) reduces to Kt. □

For practical purposes, the main application is optional stopping, which is often verified via the following standard result from martingale theory:

Theorem 4.8 If for a given t we have E sup_{s≤t} |Ks| < ∞, then {Kt} is a proper martingale. Further, let τ be a stopping time such that E sup_{t≤τ} |Kt| < ∞. Then EKτ = EK0 = 0.
Example 4.9 Consider again the mixed exponential setting of Example 4.6. Take Yt ≡ 0. Thus we have simply

    Kt = κ(α) ∫₀ᵗ e^{αSs} ds + 1 − e^{αSt}

whenever α < 3. Noting that −a ≤ Ss < b, we get

    |Ks| ≤ |κ(α)| s e^{α max(a,b)} + 1 + e^{α(b+V₃)} + e^{α(b+V₇)}.

Since Eτ < ∞, this verifies the conditions of Theorem 4.8 for α < 3 and gives, as in (4.9), that

    0 = κ(α)φ(α) + 1 − p₀e^{−αa} − p₃e^{αb}·3/(3 − α) − p₇e^{αb}·7/(7 − α),    (4.12)

where φ(α) = E∫₀^τ e^{αSs} ds. The same bound as used above easily gives that φ(α) is defined for all α ∈ ℝ, not only for α < 3, and is an analytic function of α. Since everything else in (4.12) is analytic in the region Ω = ℝ\{3, 7} (think of κ(s) as the r.h.s. of (4.7), i.e. the analytic continuation of log Ee^{sS₁}), we conclude that (4.12) holds in the whole of Ω. Taking α = γ* = 6, we get the desired rigorous proof of (4.9). □

The picture that emerges is that all roots of the analytic continuation of log Ee^{s(R₁−R₀)} may enter in ruin formulas, and we will see several examples of this, in particular when phase-type distributions are involved (see e.g. XI.5 for a much more elaborate version of Example 4.9). The problem with the Wald martingale is of course that one only gets (4.9) for 0 and γ, and this is not enough to do an analytic continuation. Other values of α require conditional expectations of e^{ατ} given the type of exit, and we are not aware of how to approach these via the Wald martingale. However, Example 4.9 shows how to solve the problem via the Kella–Whitt martingale.

Notes and references Optional stopping of the Kella–Whitt martingale is further discussed in Asmussen & Kella [85], and Markov-modulated versions of the martingale are in Asmussen & Kella [84]. A variety of different applications are surveyed in [APQ, Ch. IX.3].
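As a closing numerical illustration (not part of the text) of Examples 4.6 and 4.9: the exit probabilities p₀, p₃, p₇ are determined by (4.8), (4.10) and p₀ + p₃ + p₇ = 1, and the resulting values can be compared against an event-driven simulation of the two-sided exit problem for this specific model:

```python
import math, random

def exit_probs_mc(a, b, n_paths=30000, seed=7):
    """Simulate the claim surplus S_t = sum U_i - t of Example 4.6
    (beta = 3, claims ~ 1/2 Exp(3) + 1/2 Exp(7)) until exit from [-a, b].
    Returns (p0, p3, p7) as defined in the text."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    for _ in range(n_paths):
        s = 0.0
        while True:
            s -= rng.expovariate(3.0)          # premium inflow until next claim
            if s <= -a:                        # continuous (skip-free) lower exit
                counts[0] += 1
                break
            rate = 3.0 if rng.random() < 0.5 else 7.0
            s += rng.expovariate(rate)         # claim of the drawn component
            if s > b:                          # upper exit by a jump of that rate
                counts[1 if rate == 3.0 else 2] += 1
                break
    return [c / n_paths for c in counts]

def exit_probs_linear(a, b):
    """Solve p0+p3+p7 = 1 together with (4.8) and (4.10), by Gaussian
    elimination with partial pivoting on the 3x3 system."""
    rows = [
        [1.0, 1.0, 1.0, 1.0],
        [6.0 * math.exp(-a), 9.0 * math.exp(b), 7.0 * math.exp(b), 6.0],
        [math.exp(-6.0 * a), -math.exp(6.0 * b), 7.0 * math.exp(6.0 * b), 1.0],
    ]
    for i in range(3):
        piv = max(range(i, 3), key=lambda r: abs(rows[r][i]))
        rows[i], rows[piv] = rows[piv], rows[i]
        for r in range(3):
            if r != i:
                f = rows[r][i] / rows[i][i]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[i])]
    return [rows[i][3] / rows[i][i] for i in range(3)]

a = b = 1.0
mc, lin = exit_probs_mc(a, b), exit_probs_linear(a, b)
assert all(abs(m - l) < 0.02 for m, l in zip(mc, lin))
```

The agreement of the simulated frequencies with the linear-system solution reflects the fact, established rigorously in Example 4.9, that the formally obtained equation (4.10) is indeed valid.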
Chapter III

Further general tools and results

1  Likelihood ratios and change of measure
We consider stochastic processes {Xt} with a Polish state space E and paths in the Skorohod space D_E = D_E[0, ∞), which we equip with the natural filtration {Ft}_{t≥0} and the Borel σ-field F. Two such processes may be represented by probability measures P, P̃ on (D_E, F), and in analogy with the theory of measures on finite-dimensional spaces, one could study conditions for the Radon–Nikodym derivative dP̃/dP to exist. However, as shown by the following example, this setup is too restrictive: typically¹, the parameters of the two processes can be reconstructed from a single infinite path, and P, P̃ are then singular (concentrated on two disjoint measurable sets).

Example 1.1 Let P, P̃ correspond to the claim surplus processes of two compound Poisson risk processes with Poisson rates β, β̃ and claim size distributions B, B̃. The number N_t^{(ε)} of jumps > ε before time t is a (measurable) r.v. on (D_E, F), hence so is Nt = lim_{ε↓0} N_t^{(ε)}. Thus the sets

    S = { lim_{t→∞} Nt/t = β },   S̃ = { lim_{t→∞} Nt/t = β̃ }

are both in F. But if β ≠ β̃, then S and S̃ are disjoint, and by the law of

¹Though not always: it is not difficult to construct a counterexample, say in terms of transient Markov processes.
40
CHAPTER III. FURTHER GENERAL TOOLS AND RESULTS
e S) e = 1. A somewhat similar large numbers for the Poisson process, P(S) = P( e argument gives singularity when B 6= B. 2 The interesting concept is therefore to look for absolute continuity only on finite time intervals (possibly random, cf. Theorem 1.3 below). I.e. we look for a process {Lt } (the likelihood ratio process) such that e P(A) = E[Lt ; A], A ∈ Ft ,
(1.1)
e to (DE , Ft ) is absolutely continuous w.r.t. the restriction (i.e. the restriction of P of P to (DE , Ft )). The following result gives the connection to martingales. Proposition 1.2 Let {Ft }t≥0 be the natural filtration on DE , F the Borel σfield and P a given probability measure on (DE , F). ¡ ¢ (i) If {Lt }t≥0 is a nonnegative martingale w.r.t. {Ft } , P such that ELt = 1, e on F such that (1.1) holds. then there exists a unique probability measure P e (ii) Conversely, if for some probability measure P and some {Ft }adapted ¡ process¢ {Lt }t≥0 (1.1) holds, then {Lt } is a nonnegative martingale w.r.t. {Ft } , P such that ELt = 1. e by P et (A) = E[Lt ; A], A ∈ Ft . Proof. Under the assumptions of (i), define P e Then Lt ≥ 0 and ELt = 1 ensure that Pt is a probability measure on (DE , Ft ). Let s < t, A ∈ Fs . Then ¯ ¤ £ et (A) = E[Lt ; A] = E E Lt I(A) ¯ Fs = E I(A)E[Lt Fs ] P es (A), = E I(A)Ls = P © ª et using the martingale property in the fourth step. Hence the family P is t≥0 e consistent and hence extendable to a probability measure P on (DE , F) such e et (A), A ∈ Ft . This proves (i). that P(A) =P Conversely, under the assumptions of (ii) we have for A ∈ Fs and s < t that A ∈ Ft as well and hence E[Ls ; A] = E[Lt ; A]. The truth of this for all A ∈ Fs implies that E[Lt Fs ] = Ls and the martingale property. Finally, ELt = 1 follows by taking A = DE in (1.1) and nonnegativity by letting A = {Lt < 0}. e Then P(A) = E[Lt ; Lt < 0] can only be nonnegative if P(A) = 0. 2 The following likelihood ratio identity (typically with τ being the time τ (u) to ruin) is a fundamental tool throughout the book:
1. LIKELIHOOD RATIOS AND CHANGE OF MEASURE
41
e be as in Proposition 1.2(i). If τ is a stopping time Theorem 1.3 Let {Lt }, P and G ∈ Fτ , G ⊆ {τ < ∞}, then i h e 1 ;G . (1.2) P(G) = E Lτ Further, if Z is a r.v. which is Fτ measurable and 0 on the set {τ = ∞}, then e EZ = E[Z/L τ ].
(1.3)
Proof. Assume first G ⊆ {τ ≤ T } for some fixed deterministic T < ∞. By the martingale property, we have E[LT Fτ ] = Lτ on {τ ≤ T }. Hence i h i h i h e 1 ; G = E LT ; G = E 1 I(G)E[LT Fτ ] E Lτ Lτ Lτ i h 1 I(G)Lτ = P(G) . (1.4) = E Lτ In the general case, applying (1.4) to G ∩ {τ ≤ T } we get h i e 1 ; G ∩ {τ ≤ T } . P(G ∩ {τ ≤ T }) = E Lτ Since everything is nonnegative, both sides are increasing in T , and letting T → ∞, (1.2) follows by monotone convergence. For (1.3), just use standard measuretheoretic arguments to extend from indicators to r.v.’s. 2 A main example of change of measure is to take {Lt } as the Wald martingale eθXt −tκ(θ) in the case where {Xt } is a random walk or a L´evy process. We e and talk about exponential change of measure then write Pθ rather than P, or exponential tilting. We will see in Section 3 that {Xt } remains a random walk/L´evy process under Pθ , only with changed parameters. A first elegant application of the changeofmeasure technique is the following observation: Corollary 1.4 The necessary and sufficient condition for optional stopping of the Wald martingale, i.e. 1 = EeθXτ −τ κ(θ) for a given stopping time with τ < ∞, is that Pθ (τ < ∞) = 1. Proof. Let Z = Lτ = eθXτ −τ κ(θ) = eθXτ −τ κ(θ) I(τ < ∞) . e < ∞). Then the assertion means EZ = 1, whereas (1.3) gives EZ = P(τ
2
42
CHAPTER III. FURTHER GENERAL TOOLS AND RESULTS
From Theorem 1.3 we obtain a likelihood ratio representation of the ruin probability ψ(u) parallel to the martingale representation II.(3.1) in Proposition II.3.1: Corollary 1.5 Under condition (a) of Proposition II.3.1, ψ(u) = e−γu Eγ [e−γξ(u) ; τ (u) < ∞].
(1.5)
Proof. Letting G = {τ (u) < ∞}, we have P(G) = ψ(u). Now just rewrite the r.h.s. of (1.2) by noting that 1 = e−γSτ (u) = e−γu e−γξ(u) . Lτ (u)
2
The advantage of (1.5) compared to II.(3.1) is that it seems in general easier to deal with the (unconditional) expectation Eγ [e−γξ(u) ; τ (u) < ∞] occurring there than with the (conditional) expectation E[eγξ(u) τ (u) < ∞] in II.(3.1). The crucial step is to obtain information on the process evolving according to e and this problem will now be studied, first in the Markov case and next P, (Sections 3, 4) for processes with some randomwalklike structure. As another simple application of the changeofmeasure technique, we shall establish a formula for the finitetime ruin probability of Brownian motion: Corollary 1.6 Let {Rt } be Brownian motion with drift µ and variance constant 1. Then the density and c.d.f. of τ (u) are n (u − T µ)2 o u , (1.6) Pµ (τ (u) ∈ dT ) = √ T −3/2 exp − 2T 2π ³ u ³ u √ ´ √ ´ Pµ (τ (u) ≤ T ) = 1 − Φ √ − µ T + e2µu Φ − √ − µ T . (1.7) T T For a general variance constant σ 2 , one furthermore obtains P(τ (u) ≤ T ) = Φ
³ −u + µT ´ ³ −u − µ√T ´ 2 √ √ + e2µu/σ Φ . σ T σ T
(1.8)
Proof. Consider first the case σ 2 = 1. For µ = 0, (1.7) is the same as II.(2.8), and (1.6) follows then by straightforward differentiation. For µ 6= 0, the ratio 2 dPµ /dP0 of densities of St is eµSt −tµ /2 , since κ(θ) = −θµ + θ2 σ 2 /2. Hence £ ¤ 2 Pµ (τ (u) ∈ dT ) = E0 eµSτ (u) −τ (u)µ /2 ; τ (u) ∈ dT ¡ ¢ 2 = eµu−T µ /2 P0 τ (u) ∈ dT n 1 u2 o 2 u , = eµu−T µ /2 √ T −3/2 exp − · 2 T 2π
1. LIKELIHOOD RATIOS AND CHANGE OF MEASURE
43
which is the same as (1.6). (1.7) then follows by checking that the derivative of the r.h.s. is (1.6) and that the value at 0 is 0. For a general σ 2 , a completely analogous calculation can be done, but the analogue of (1.6) is more tedious to write down, so we omit the details. 2 The same argument as used for Corollary 1.6 also applies to Bernoulli random walk with θ 6= 1/2, but we again omit the details. Consider next a (timehomogeneous) Markov process {Xt } with state space E, say, in continuous time (the discrete time case is parallel but slightly simpler). In the context of ruin probabilities, one would typically have Xt = Rt , Xt = St , Xt = (Jt , Rt ) or Xt = (Jt , St ), where {Rt } is the risk reserve process, {St } = {u − Rt } the claim surplus process and {Jt } a process of supplementary variables possibly needed to make the process Markovian. A change of measure is performed by finding a process {Lt } which is a martingale w.r.t. each Px , is nonnegative and has Ex Lt = 1 for all x, t. The problem is thus to investigate which characteristics of {Xt } and {Lt } ensure a given set of properties of the changed probability measure. First we ask when the Markov property is preserved. To this end, we need the concept of a multiplicative functional. For the definition, we assume for simplicity that {Xt } has Dpaths, is Markov w.r.t. the natural filtration {Ft } on DE and define {Lt } to be a multiplicative functional if {Lt } is adapted to {Ft }, nonnegative and Lt+s = Lt · (Ls ◦ θt ) (1.9) Px a.s. for all x, s, t, where θt is the shift operator. The precise meaning of this is the following: being Ft measurable, Lt has the form ¡ ¢ Lt = ϕt {Xu }0≤u≤t for some mapping ϕt : DE [0, t] → [0, ∞), and then ¡ ¢ Ls ◦ θt = ϕs {Xt+u }0≤u≤s . Theorem 1.7 Let {Xt } be Markov w.r.t. the natural filtration {Ft } on DE , e be let {Lt } be a nonnegative martingale with Ex Lt = 1 for all x, t and let P © ªx e e the probability measure given by Px (A) = Ex [Lt ; A]. 
Then the family Px x∈E defines a timehomogeneous Markov process if and only if {Lt } is a multiplicative functional. Proof. Since both sides of (1.9) are Ft+s measurable, (1.9) is equivalent to £ ¤ Ex [Lt+s Vt+s ] = Ex Lt · (Ls ◦ θt )Vt+s (1.10)
CHAPTER III. FURTHER GENERAL TOOLS AND RESULTS
for any F_{t+s}-measurable r.v. V_{t+s}, which in turn is the same as

    E_x[L_{t+s} Z_t · (Y_s ∘ θ_t)] = E_x[L_t · (L_s ∘ θ_t) Z_t · (Y_s ∘ θ_t)]   (1.11)

for any F_t-measurable Z_t and any F_s-measurable Y_s. Indeed, since Z_t · (Y_s ∘ θ_t) is F_{t+s}-measurable, (1.10) implies (1.11). The converse follows since the class of r.v.'s of the form Z_t · (Y_s ∘ θ_t) comprises all r.v.'s of the form ∏_{i=1}^n f_i(X_{t(i)}) with all t(i) ≤ t + s. Similarly, the Markov property can be written

    Ẽ_x[Y_s ∘ θ_t | F_t] = Ẽ_{X_t} Y_s,   t < s,

for any F_s-measurable r.v. Y_s, which is the same as

    Ẽ_x[Z_t (Y_s ∘ θ_t)] = Ẽ_x[Z_t Ẽ_{X_t} Y_s]   (1.12)

for any F_t-measurable r.v. Z_t. By definition of P̃_x, this in turn means

    E_x[L_{t+s} Z_t (Y_s ∘ θ_t)] = E_x[L_t Z_t E_{X_t}[L_s Y_s]],

or, since E_{X_t}[L_s Y_s] = E[(Y_s ∘ θ_t)(L_s ∘ θ_t) | F_t],

    E_x[L_{t+s} Z_t (Y_s ∘ θ_t)] = E_x[L_t Z_t (Y_s ∘ θ_t)(L_s ∘ θ_t)],

which is the same as (1.11). □

Remark 1.8 For {P̃_x}_{x∈E} to define a time-homogeneous Markov process, it suffices to assume that {L_t} is a multiplicative functional with E_x L_t = 1 for all x, t. Indeed, then

    E[L_{t+s} | F_t] = L_t E[L_s ∘ θ_t | F_t] = L_t E_{X_t} L_s = L_t

(using the Markov property in the second step), so that the martingale property is automatic. □

Notes and references The results of the present section are essentially known in a very general Markov process formulation, see Dynkin [338] and Kunita [562]. A more elementary version along the lines of Theorem 1.7 can be found in Küchler & Sørensen [561], with a proof somewhat different from the present one. A further relevant reference is Barndorff-Nielsen & Shiryaev [139].
2  Duality with other applied probability models
In this section, we shall establish a general connection between ruin probabilities and certain stochastic processes which occur for example as models for queueing and storage. The formulation has applications to virtually all the risk models studied in this book. The result is a sample path relation, and thus for the moment no parametric assumptions (on say the structure of the arrival process) are needed.

We work on a finite time interval [0, T] in the following setup (which can be much generalized): the risk process {R_t}_{0≤t≤T} has arrivals at epochs σ_1, . . . , σ_N, 0 ≤ σ_1 ≤ . . . ≤ σ_N ≤ T. The corresponding claim sizes are U_1, . . . , U_N. In between jumps, the premium rate is p(r) > 0 when the reserve is r (i.e., Ṙ = p(R)). Thus

    R_t = R_0 + ∫_0^t p(R_s) ds − A_t,   where A_t = Σ_{k: σ_k ≤ t} U_k.   (2.1)
The initial condition is arbitrary, R_0 = u (say), and the time to ruin is τ(u) = inf{t ≥ 0 : R_t < 0}. The storage process {V_t}_{0≤t≤T} is essentially defined by time-reversion, reflection at zero and initial condition V_0 = 0. More precisely, the arrival epochs are σ*_1, . . . , σ*_N, where σ*_k = T − σ_{N−k+1}, and just after time σ*_k {V_t} makes an upwards jump of size U*_k = U_{N−k+1}. In between jumps, {V_t} decreases at rate p(r) when V_t = r (i.e., V̇ = −p(V)). That is, instead of (2.1) we have

    V_t = A*_t − ∫_0^t p(V_s) ds,   where A*_t = Σ_{k: σ*_k ≤ t} U*_k = A_T − A_{T−t}.   (2.2)

Theorem 2.1 The events {τ(u) ≤ T} and {V_T > u} coincide. In particular, ψ(u, T) = P(V_T > u).

Proof. Let r_t^{(u)} denote the solution of Ṙ = p(R) subject to r_0^{(u)} = u. Then r_t^{(u)} > r_t^{(v)} for all t when u > v. Suppose first V_T > u (this situation corresponds to the solid path of {R_t} in Fig. III.1 with R_0 = u = u_1). Then

    V_{σ*_N} = r_{σ_1}^{(V_T)} − U_1 > r_{σ_1}^{(u)} − U_1 = R_{σ_1}.

If V_{σ*_N} > 0, we can repeat the argument and get V_{σ*_{N−1}} > R_{σ_2} and so on. Hence if n satisfies V_{σ*_{N−n+1}} = 0 (such an n exists, if nothing else n = N), we have R_{σ_n} < 0, so that indeed τ(u) ≤ T.

Suppose next V_T ≤ u (this situation corresponds to the dotted path of {R_t} in Fig. III.1 with R_0 = u = u_2). Then similarly

    V_{σ*_N} = r_{σ_1}^{(V_T)} − U_1 ≤ r_{σ_1}^{(u)} − U_1 = R_{σ_1},   V_{σ*_{N−1}} ≤ R_{σ_2},

and so on. Hence R_{σ_n} ≥ 0 for all n ≤ N, and since ruin can only occur at the times of claims, we have τ(u) > T. □
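The proof is a pure sample-path argument, and for the special case p(r) ≡ 1 it can be checked mechanically: reversing time shows V_T = max(0, sup_{t≤T}(A_t − t)), which can be computed from the same Poisson sample as the ruin event. The following sketch (claim distribution and parameters are illustrative, not from the text) verifies the coincidence of the two events path by path:

```python
import random

def sample_path(beta, claim_sampler, T, rng):
    """Poisson(beta) arrival epochs on [0, T] with i.i.d. claim sizes."""
    epochs, t = [], rng.expovariate(beta)
    while t <= T:
        epochs.append(t)
        t += rng.expovariate(beta)
    return epochs, [claim_sampler(rng) for _ in epochs]

def ruined_before(u, epochs, claims):
    """Ruin check for R_t = u + t - A_t (premium rate 1): the reserve can
    only become negative at a claim epoch, i.e. when A_t - t exceeds u."""
    a = 0.0
    for s, c in zip(epochs, claims):
        a += c
        if a - s > u:        # R_s = u + s - A_s < 0
            return True
    return False

def storage_at_T(epochs, claims, T):
    """V_T for the dual storage process (release rate 1, reflection at 0).
    Time reversion as in the proof gives V_T = max(0, sup_{t<=T}(A_t - t)),
    and the supremum is attained at a claim epoch."""
    a, best = 0.0, 0.0
    for s, c in zip(epochs, claims):
        a += c
        best = max(best, a - s)
    return best

rng = random.Random(42)
beta, T, u = 0.7, 50.0, 5.0
for _ in range(2000):
    epochs, claims = sample_path(beta, lambda r: r.expovariate(1.0), T, rng)
    # Theorem 2.1: {tau(u) <= T} and {V_T > u} coincide on every sample path
    assert ruined_before(u, epochs, claims) == (storage_at_T(epochs, claims, T) > u)
```

Both indicators are computed from the identical running quantity A_t − t, so the events agree path by path, not merely in distribution.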
A basic example is when {R_t} is the risk reserve process corresponding to claims arriving at Poisson rate β and being i.i.d. with distribution B, and a general premium rule p(r) when the reserve is r. Then the time reversibility of the Poisson process ensures that {A_t} and {A*_t} have the same distribution (for finite-dimensional distributions, the distinction between right and left continuity is immaterial because the probability of a Poisson arrival at any fixed time t is zero). Thus we may think of {V_t} as having compound Poisson input and being defined for all t < ∞. Historically, this represents a model for storage, say of water in a dam, though other interpretations like the amount of goods stored are also possible. The arrival epochs correspond to rainfalls, and in between rainfalls water is released at rate p(r) when V_t (the content) is r. We get:

Corollary 2.2 Consider the compound Poisson risk model with a general premium rule p(r). Then the storage process {V_t} has a proper limit V in distribution as t → ∞ if and only if ψ(u) < 1 for all u, and then

    ψ(u) = P(V > u).   (2.3)

Proof. Let T → ∞ in (2.2). □
Consider now a compound Poisson risk model with constant premium rate 1 and claim arrival rate β. Then a direct relationship can be obtained between the survival probability in the risk model and the maximum workload V_max of a busy period in an M/G/1 queue with arrival rate β.

Theorem 2.3 Under the safety loading condition η > 0, we have for every u ≥ 0

    P(V_max > u) = (1/β) (d/du) log φ(u),   (2.4)

where φ(u) = 1 − ψ(u) is the survival probability.

Proof. The risk process R_t starting in u can only survive if, after each claim that occurs at some running maximum s > u, the level s will be reached again before ruin occurs. This is equivalent to the statement that the maximum workload V_max of a busy period in an M/G/1 queue (with traffic intensity ρ < 1) does not exceed s. As we are only concerned about eventual survival, we can cut out every such 'surviving' excursion away from the running maximum of the risk process and only consider those claims at the running maximum which lead to a downward excursion causing ruin before recovering to level s. The instantaneous probability of having a claim of the latter type is

    β dt P(V_max > s) = β ds P(V_max > s),

since at the running maximum we have ds = dt. Consequently, the survival probability φ(u) can simply be interpreted as the probability of having zero events during [u, ∞) of an inhomogeneous Poisson process with rate β(s) = β P(V_max > s) (which constitutes a thinning of the original Poisson process). This finally implies

    φ(u) = exp(−∫_u^∞ β(s) ds) = exp(−β ∫_u^∞ P(V_max > s) ds).   (2.5) □

The relation (2.4) also follows from a combination of (2.3) and the identity

    P(V_max > u) = (1/β) (d/du) log P(V < u)

for an M/G/1 queue (see for instance Cohen [249, p. 618]), but the above proof establishes a direct and self-contained link between V_max and φ(u) that will also be useful later on (cf. VIII.4).

Notes and references Two main references on storage processes are Harrison & Resnick [451] and Brockwell, Resnick & Tweedie [203]. Theorem 2.1 and its proof are from Asmussen & Schock Petersen [104], Corollary 2.2 from Harrison & Resnick [452]. The results can be viewed as special cases of Siegmund duality, see Siegmund [808]. Some further more general references are Asmussen [63] and Asmussen & Sigman [105]. Theorem 2.3 is due to Albrecher, Borst, Boxma & Resing [16]. Historically, the connection between risk theory and other applied probability areas appears first to have been noted by Prabhu [711] in a queueing context. It is a standard tool today, but one may feel that the interaction between the different areas was surprisingly limited in the first decades after the appearance of [711].
3  Random walks in discrete or continuous time
Consider a random walk X_n = X_0 + Y_1 + · · · + Y_n in discrete time, where the Y_i are i.i.d. with common distribution F. For discrete time random walks, there is an analogue of Theorem 2.1 in terms of Lindley processes. For a given i.i.d. R-valued sequence Z_1, Z_2, . . ., the Lindley process W_0, W_1, W_2, . . . generated by Z_1, Z_2, . . . is defined by assigning W_0 some arbitrary value ≥ 0 and letting

    W_{n+1} = (W_n + Z_{n+1})^+.   (3.1)

Thus {W_n}_{n=0,1,...} evolves as a random walk with increments Z_1, Z_2, . . . as long as the random walk only takes nonnegative values, and is reset to 0 once the random walk hits (−∞, 0). I.e., {W_n}_{n=0,1,...} can be viewed as the reflected version of the random walk with increments Z_1, Z_2, . . . In particular, if W_0 = 0, then

    W_N = Z_1 + · · · + Z_N − min_{n=0,1,...,N} (Z_1 + · · · + Z_n)   (3.2)

(for a rigorous proof, just verify that the r.h.s. of (3.2) satisfies the same recursion as in (3.1)).

Theorem 3.1 Let τ(u) = inf{n : u + Y_1 + · · · + Y_n < 0}. Let further N be fixed and let W_0, W_1, . . . , W_N be the Lindley process generated by Z_1 = −Y_N, Z_2 = −Y_{N−1}, . . . , Z_N = −Y_1 according to W_0 = 0. Then the events {τ(u) ≤ N} and {W_N > u} coincide.

Proof. By (3.2),

    W_N = −Y_N − · · · − Y_1 − min_{n=0,1,...,N} (−Y_N − · · · − Y_{N−n+1})
        = − min_{n=0,1,...,N} (Y_1 + · · · + Y_{N−n}) = − min_{n=0,1,...,N} (Y_1 + · · · + Y_n).

From this the result immediately follows. □
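Both (3.2) and the time-reversal relation of Theorem 3.1 are easy to verify numerically. The following self-contained sketch (with made-up uniform increments, purely for illustration) checks them path by path:

```python
import random

def lindley(zs, w0=0.0):
    """The Lindley recursion (3.1): W_{n+1} = (W_n + Z_{n+1})^+."""
    w, path = w0, [w0]
    for z in zs:
        w = max(w + z, 0.0)
        path.append(w)
    return path

rng = random.Random(0)

# (3.2): for W_0 = 0, W_N equals Z_1+...+Z_N minus the running minimum
for _ in range(200):
    zs = [rng.uniform(-1.0, 1.0) for _ in range(30)]
    partial = [0.0]
    for z in zs:
        partial.append(partial[-1] + z)
    assert abs(lindley(zs)[-1] - (partial[-1] - min(partial))) < 1e-9

# Theorem 3.1: with Z_k = -Y_{N-k+1}, the events {tau(u) <= N} and
# {W_N > u} coincide path by path
for _ in range(200):
    ys = [rng.uniform(-1.0, 1.0) for _ in range(30)]
    u = rng.uniform(0.1, 3.0)
    ruin = any(u + sum(ys[:n]) < 0 for n in range(1, len(ys) + 1))
    w_n = lindley([-yv for yv in reversed(ys)])[-1]
    assert ruin == (w_n > u)
```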
Corollary 3.2 The following assertions are equivalent:
(a) ψ(u) = P(τ(u) < ∞) < 1 for all u ≥ 0;
(b) ψ(u) = P(τ(u) < ∞) → 0 as u → ∞;
(c) the Lindley process {W_n} generated by Z_1 = −Y_1, Z_2 = −Y_2, . . . has a proper limit W in distribution as n → ∞;
(d) m = inf_{n=0,1,...} (Y_1 + · · · + Y_n) > −∞ a.s.;
(e) Y_1 + · · · + Y_n does not converge to −∞ a.s.
In that case, W has the same distribution as −m, and P(W > u) = P(−m > u) = ψ(u).

Proof. Since (Y_N, . . . , Y_1) has the same distribution as (Y_1, . . . , Y_N), the Lindley processes in Corollary 3.2 and Theorem 3.1 have the same distribution for n = 0, 1, . . . , N. Thus the assertion of Theorem 3.1 is equivalent to

    W_N =_D M_N = sup_{n=0,1,...,N} (Z_1 + · · · + Z_n),

so that W_N →_D M = sup_{n=0,1,...} (Z_1 + · · · + Z_n) =_D −m and P(W > u) = P(M > u) = ψ(u). By Kolmogorov's 0–1 law, either M = ∞ a.s. or M < ∞ a.s. Combining these facts easily gives the equivalence of (a)–(d). Clearly, (d) ⇒ (e). The converse follows from general random walk theory, since it is standard that lim sup (Y_1 + · · · + Y_n) = ∞ when Y_1 + · · · + Y_n does not converge to −∞. □
By the law of large numbers, a sufficient condition for (e) is that EY is well-defined and ≥ 0. In general, the condition

    Σ_{n=1}^∞ (1/n) P(Y_1 + · · · + Y_n < 0) < ∞

is known to be necessary and sufficient ([APQ, p. 231]) but appears to be rather intractable.

Remark 3.3 The i.i.d. assumption on the Z_1, . . . , Z_N (or, equivalently, on the Y_1, . . . , Y_N) in Theorem 3.1 is actually not necessary — the result is a sample path relation, as is Theorem 2.1. Similarly, there is a more general version of Corollary 3.2. One then assumes {Y_n} to be a stationary sequence, w.l.o.g. doubly infinite (n = 0, ±1, ±2, . . .), and defines Z_n = −Y_{−n}. □

Next consider change of measure via likelihood ratios. For a random walk, a Markovian change of measure as in Theorem 1.7 does not necessarily lead to a random walk: if, e.g., F has a strictly positive density and P̃_x corresponds to a Markov chain such that the density of X_1 given X_0 = x is also strictly positive, then the restrictions of P_x, P̃_x to F_n are equivalent (have the same null sets), so that the likelihood ratio L_n exists. The following result gives the necessary and sufficient condition for {L_n} to define a new random walk:

Proposition 3.4 Let {L_n} be a multiplicative functional of a random walk with E_x L_n = 1 for all n and x. Then the change of measure in Theorem 1.7 corresponds to a new random walk if and only if

    L_n = h(Y_1) · · · h(Y_n)   (3.3)

P_x-a.s. for some function h with Eh(Y) = 1. In that case, the changed increment distribution is F̃(x) = E[h(Y); Y ≤ x].

Proof. If (3.3) holds, then

    Ẽ_x ∏_{i=1}^n f_i(Y_i) = E_x ∏_{i=1}^n f_i(Y_i) h(Y_i) = ∏_{i=1}^n E f_i(Y_i) h(Y_i) = ∏_{i=1}^n ∫ f_i(y) F̃(dy),

from which the random walk property is immediate with the asserted form of F̃. Conversely, the random walk property implies Ẽ_x f(Y_1) = Ẽ_0 f(Y_1). Since L_1 has the form g(X_0, Y_1), this means E[g(x, Y)f(Y)] = E[g(0, Y)f(Y)] for all f and x, implying g(x, Y) = h(Y) a.s., where h(y) = g(0, y). In particular, (3.3) holds for n = 1. For n = 2, we get L_2 = L_1(L_1 ∘ θ_1) = h(Y_1)g(X_1, Y_2) = h(Y_1)h(Y_2), and so on for n = 3, 4, . . .. □
A particularly important example is exponential change of measure (h(y) = e^{αy − κ(α)}, where κ(α) = log F̂[α] is the c.g.f. of F). The corresponding likelihood ratio is

    L_n = exp{α(Y_1 + · · · + Y_n) − nκ(α)}.   (3.4)

Thus {L_n} is the Wald martingale, cf. II.1. We get:

Corollary 3.5 Consider a random walk and an α such that κ(α) = log F̂[α] = log E e^{αY} is finite, and define L_n by (3.4). Then the change of measure in Theorem 1.7 corresponds to a new random walk with changed increment distribution

    F̃(x) = e^{−κ(α)} ∫_{−∞}^x e^{αy} F(dy).
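As a small numerical illustration of Corollary 3.5 (with a made-up four-point increment distribution, chosen only for the example), one can check that the tilted masses again sum to one, that a positive α shifts the drift upwards, and that the tilted mean equals κ′(α):

```python
import math

# Exponential tilting of a finite increment distribution F; the support
# points and probabilities below are illustrative, not from the text.
ys = [-2.0, -1.0, 0.5, 1.5]
probs = [0.2, 0.3, 0.3, 0.2]
alpha = 0.4

def kap(a):
    """kappa(a) = log E e^{aY}, the c.g.f. of the increment distribution."""
    return math.log(sum(p * math.exp(a * yv) for yv, p in zip(ys, probs)))

kappa = kap(alpha)
# tilted point masses: f~(y) = e^{alpha y - kappa(alpha)} f(y)
tilted = [p * math.exp(alpha * yv - kappa) for yv, p in zip(ys, probs)]

assert abs(sum(tilted) - 1.0) < 1e-12       # F~ is again a probability distribution
mean = sum(yv * p for yv, p in zip(ys, probs))
tilted_mean = sum(yv * q for yv, q in zip(ys, tilted))
assert tilted_mean > mean                    # alpha > 0 tilts the drift upwards

# the tilted mean equals kappa'(alpha) (checked by a numerical derivative)
hstep = 1e-6
assert abs((kap(alpha + hstep) - kap(alpha - hstep)) / (2 * hstep) - tilted_mean) < 1e-6
```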
Discrete time random walks have classical applications in queueing theory via the Lindley process representation of the waiting time, see Chapter VI. In risk theory, they arise as models for the reserve or claim surplus at a discrete sequence of instants, say the beginning of each month or year, or imbedded into continuous time processes, say by recording the reserve or claim surplus just before or just after claims (see Chapter VI for some fundamental examples). However, the tradition in the area is to use continuous time models.

Now consider reflected versions of Lévy processes (cf. II.1). First assume, in the setting of Section 2, that {R_t} is the risk reserve process for the compound Poisson risk model with constant premium rate p(r) ≡ 1. Then the storage process {V_t} has constant release rate 1, i.e. has upwards jumps governed by B at the epochs of a Poisson process with rate β and decreases linearly at rate 1 in between jumps. A different interpretation is as the workload or virtual waiting time process in an M/G/1 queue, defined as a system with a single server working at a unit rate, having Poisson arrivals with rate β and distribution B of the service times of the arriving customers. Here 'workload' refers to the fact that we can interpret V_t as the amount of time the server will have to work until the system is empty provided no new customers arrive; virtual waiting time refers to V_t being the amount of time a customer would have to wait before starting service if he arrived at time t (this interpretation requires the FIFO = First In First Out queueing discipline: the customers are served in the order of arrival).

Corollary 3.6 In the compound Poisson risk model with constant premium rate p(r) ≡ 1, ψ(u, T) = P(V_T > u), where V_T is the virtual waiting time at time T in an initially empty M/G/1 queue with the same arrival rate β and the service times having the same distribution B as the claims in the risk process. Furthermore, V_T →_D V for some r.v. V ∈ [0, ∞], and ψ(u) = P(V > u). [The condition for V < ∞ a.s. is easily seen to be βµ_B < 1, cf. Chapter IV.]

Processes with a more complicated path structure like Brownian motion or jump processes with unbounded Lévy measure are not covered by Section 2, and the reflected version is then defined by means of the abstract reflection operator as in (3.2),

    W_T = X_T − min_{0≤t≤T} X_t

(assuming W_0 = X_0 = 0 for simplicity).

Proposition 3.7 If {X_t} is a Lévy process of the form X_t = X_0 + µt + σB_t + M_t as in II.(1.4), then

    E e^{α(X_t − X_0)} = E_0 e^{αX_t} = e^{tκ(α)},   (3.5)

where

    κ(α) = αµ + α²σ²/2 + ∫_{−∞}^∞ (e^{αx} − 1) ν(dx),   (3.6)

provided the Lévy measure ν of the jump part {M_t} satisfies ∫_{−ε}^{ε} |x| ν(dx) < ∞.

Proof. This is basically an easy application of formulas II.(1.7), II.(1.8). To repeat: by standard formulas for the normal distribution,

    E e^{α(µt + σB_t)} = e^{t(αµ + α²σ²/2)}.

By explicit calculation, we show in the compound Poisson case (‖ν‖ < ∞) in Proposition IV.1.1 that

    E e^{αM_t} = exp{∫_{−∞}^∞ (e^{αx} − 1) ν(dx)}.
In the general case, use the representation as a limit of compound Poisson processes. □

Note that (3.6) is the Lévy–Khinchine representation of the c.g.f. of an infinitely divisible distribution (see, e.g., Chung [246]). This is of course no coincidence, since the distribution of X_1 − X_0 is necessarily infinitely divisible when {X_t} has stationary independent increments.

Theorem 3.8 Assume that {X_t} is a Lévy process with ∫_{−ε}^{ε} |x| ν(dx) < ∞, and that {L_t} is a nonnegative multiplicative functional of the form L_t = g(t, X_t − X_0) with E_x L_t = 1 for all x, t. Then the Markov process given by Theorem 1.7 is again a Lévy process. In particular, if L_t = e^{θ(X_t − X_0) − tκ(θ)}, then the changed parameters in the representation (1.4) are

    µ̃ = µ + θσ²,   σ̃² = σ²,   ν̃(dx) = e^{θx} ν(dx).

Proof. For the first statement, we use the characterization (1.3) and get

    Ẽ[f(X_{t+s} − X_t) | F_t] = E[f(X_{t+s} − X_t) L_s ∘ θ_t | F_t]
        = E[f(X_{t+s} − X_t) g(s, X_{t+s} − X_t) | F_t]
        = E_0 f(X_s) g(s, X_s) = E_0 f(X_s) L_s = Ẽ_0 f(X_s).

For the second, let e^{κ̃(α)} = Ẽ_0 e^{αX_1}. Then

    e^{κ̃(α)} = E_0[L_1 e^{αX_1}] = e^{−κ(θ)} E_0 e^{(α+θ)X_1} = e^{κ(α+θ) − κ(θ)},

    κ̃(α) = κ(α + θ) − κ(θ)
          = αµ + ((α + θ)² − θ²)σ²/2 + ∫_{−∞}^∞ (e^{(α+θ)x} − e^{θx}) ν(dx)
          = α(µ + θσ²) + α²σ²/2 + ∫_{−∞}^∞ (e^{αx} − 1) e^{θx} ν(dx). □

Remark 3.9 If X_0 = 0, then the martingale {e^{θX_t − tκ(θ)}} is the continuous time analogue of the Wald martingale (3.4). □
Example 3.10 Let X_t be the claim surplus process of a compound Poisson risk process with Poisson rate β and claim size distribution B, corresponding to µ = −1, σ = 0, ν(dx) = βB(dx). Then we can write

    ν̃(dx) = βe^{θx} B(dx) = β̃ B̃(dx),   where β̃ = β B̂[θ],  B̃(dx) = (e^{θx}/B̂[θ]) B(dx).
Thus (since µ̃ = µ = −1, σ̃ = σ = 0) the changed process is the claim surplus process of another compound Poisson risk process with Poisson rate β̃ and claim size distribution B̃. □

Example 3.11 For an example of a likelihood ratio not covered by Theorem 3.8, let the given Markov process (specified by the P_x) be the claim surplus process of a compound Poisson risk process with Poisson rate β and claim size distribution B, and let the P̃_x refer to the claim surplus process of another compound Poisson risk process with Poisson rate β̃ = β and claim size distribution B̃ ≠ B. Recalling that σ_1, σ_2, . . . are the arrival times and U_1, U_2, . . . the corresponding claim sizes, it is then easily seen that

    L_t = ∏_{i: σ_i ≤ t} (dB̃/dB)(U_i)

whenever the Radon–Nikodym derivative dB̃/dB exists (e.g. dB̃/dB = b̃/b when B, B̃ have densities b, b̃ with b(x) > 0 for all x such that b̃(x) > 0). □
4  Markov additive processes

A Markov additive process, abbreviated as MAP in this section², is defined as a bivariate Markov process {X_t} = {(J_t, S_t)}, where {J_t} is a Markov process with state space E (say) and the increments of {S_t} are governed by {J_t} in the sense that

    E[f(S_{t+s} − S_t) g(J_{t+s}) | F_t] = E_{J_t,0}[f(S_s) g(J_s)].   (4.1)

For shorthand, we write P_i, E_i instead of P_{i,0}, E_{i,0} in the following. As for processes with stationary independent increments, the structure of MAP's is completely understood when E is finite. In discrete time, a MAP is specified by the measure-valued matrix (kernel) F(dx) whose ijth element is the defective probability distribution

    F_{ij}(dx) = P_i(J_1 = j, Y_1 ∈ dx),   where Y_n = S_n − S_{n−1}.

An alternative description is in terms of the transition matrix P = (p_{ij})_{i,j∈E} (here p_{ij} = P_i(J_1 = j)) and the probability measures

    H_{ij}(dx) = P(Y_1 ∈ dx | J_0 = i, J_1 = j) = F_{ij}(dx)/p_{ij}.

² And only there; one reason is that in parts of the applied probability literature, MAP stands for the Markovian arrival process discussed below.
In simulation language, this means that the MAP can be simulated by first simulating the Markov chain {J_n} and next the Y_1, Y_2, . . . by generating Y_n according to H_{ij} when J_{n−1} = i, J_n = j. If all F_{ij} are concentrated on (0, ∞), a MAP is the same as a semi-Markov or Markov renewal process, with the Y_n being interpreted as interarrival times.

In continuous time (assuming D-paths), {J_t} is specified by its intensity matrix Λ = (λ_{ij})_{i,j∈E}. On an interval [t, t + s) where J_t ≡ i, {S_t} evolves like a Lévy process with the parameters µ_i, σ_i², ν_i(dx) in II.(1.4) depending on i. In addition, a jump of {J_t} from i to j ≠ i has probability q_{ij} of giving rise to a jump of {S_t} at the same time, the distribution of which has some distribution B_{ij}. [That a process with this description is a MAP is obvious; the converse requires a proof, which we omit and refer to Neveu [663] or Çinlar [247].]

If E is infinite, a MAP may be much more complicated. As an example, let {J_t} be standard Brownian motion on the line. Then a Markov additive process can be defined by letting

    S_t = lim_{ε↓0} (1/2ε) ∫_0^t I(|J_s| ≤ ε) ds

be the local time at 0 up to time t.

As a generalization of the m.g.f., consider the matrix F̂_t[α] with ijth element E_i[e^{αS_t}; J_t = j].

Proposition 4.1 For a MAP in discrete time and with E finite, F̂_n[α] = F̂[α]^n, where

    F̂[α] = F̂_1[α] = (F̂_{ij}[α])_{i,j∈E} = (E_i[e^{αS_1}; J_1 = j])_{i,j∈E} = (p_{ij} Ĥ_{ij}[α])_{i,j∈E}.

Proof. Conditioning upon (J_n, S_n) yields

    E_i[e^{αS_{n+1}}; J_{n+1} = j] = Σ_{k∈E} E_i[e^{αS_n}; J_n = k] E_k[e^{αY_1}; J_1 = j],

which in matrix formulation is the same as F̂_{n+1}[α] = F̂_n[α] F̂[α]. □
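Proposition 4.1 can be verified by brute force on a toy example. The two-state chain and deterministic jump sizes below are hypothetical choices (so each H_ij is a point mass and Ĥ_ij[α] = e^{α y_ij}); the matrix power is compared against direct summation over all paths:

```python
import math
from itertools import product

# Hypothetical two-state MAP in discrete time: transition matrix P and a
# deterministic increment y[i][j] on a jump i -> j.
P = [[0.7, 0.3], [0.4, 0.6]]
y = [[1.0, -0.5], [0.25, 2.0]]
alpha = 0.3

Fhat = [[P[i][j] * math.exp(alpha * y[i][j]) for j in range(2)] for i in range(2)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def direct(i, j, n=3):
    """E_i[e^{alpha S_n}; J_n = j] by brute-force summation over all paths."""
    total = 0.0
    for path in product(range(2), repeat=n):
        prob, s, prev = 1.0, 0.0, i
        for k in path:
            prob *= P[prev][k]
            s += y[prev][k]
            prev = k
        if prev == j:
            total += prob * math.exp(alpha * s)
    return total

F3 = matmul(matmul(Fhat, Fhat), Fhat)   # Proposition 4.1: Fhat_3[alpha] = Fhat[alpha]^3
for i in range(2):
    for j in range(2):
        assert abs(F3[i][j] - direct(i, j)) < 1e-12
```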
Proposition 4.2 Let E be finite and consider a continuous time Markov additive process with parameters Λ, µ_i, σ_i², ν_i(dx) (i ∈ E), q_{ij}, B_{ij} (i, j ∈ E) and S_0 = 0. Then the matrix F̂_t[α] with ijth element E_i[e^{αS_t}; J_t = j] is given by e^{tK[α]}, where

    K[α] = Λ + (κ^{(i)}(α))_diag + (λ_{ij} q_{ij} (B̂_{ij}[α] − 1)),
    κ^{(i)}(α) = αµ_i + α²σ_i²/2 + ∫_{−∞}^∞ (e^{αx} − 1) ν_i(dx).

Proof. Let {S_t^{(i)}} be a Lévy process with parameters µ_i, σ_i², ν_i(dx). Then, up to o(h) terms,

    E_i[e^{αS_{t+h}}; J_{t+h} = j]
        = (1 + λ_{jj}h) E_i[e^{αS_t}; J_t = j] E_j e^{αS_h^{(j)}}
          + Σ_{k≠j} λ_{kj}h E_i[e^{αS_t}; J_t = k] {1 − q_{kj} + q_{kj} B̂_{kj}[α]}
        = E_i[e^{αS_t}; J_t = j] (1 + hκ^{(j)}(α))
          + h Σ_{k∈E} E_i[e^{αS_t}; J_t = k] {λ_{kj} + λ_{kj} q_{kj} (B̂_{kj}[α] − 1)}

(recall that q_{jj} = 0). In matrix formulation, this means that

    F̂_{t+h}[α] = F̂_t[α] (I + h(κ^{(i)}(α))_diag + hΛ + h(λ_{ij} q_{ij} (B̂_{ij}[α] − 1))),

i.e. F̂′_t[α] = F̂_t[α] K[α], which in conjunction with F̂_0[α] = I implies F̂_t[α] = e^{tK[α]} according to the standard solution formula for systems of linear differential equations. □

In the following, assume that the Markov chain/process {J_t} is ergodic. By Perron–Frobenius theory (see A.4c), we infer that in the discrete time case the matrix F̂[α] has a real eigenvalue e^{κ(α)} with maximal absolute value, and that in the continuous time case K[α] has a real eigenvalue κ(α) with maximal real part. The corresponding left and right eigenvectors ν^{(α)}, h^{(α)} may be chosen with strictly positive components. Since ν^{(α)}, h^{(α)} are only given up to constants, we are free to impose two normalizations, and we shall take

    ν^{(α)} h^{(α)} = 1,   π h^{(α)} = 1,

where π = ν^{(0)} is the stationary distribution. Then h^{(0)} = e.

The function κ(α) plays in many respects the same role as the cumulant g.f. of a random walk, as will be seen from the following results. In particular, its derivatives are 'asymptotic cumulants', cf. Corollary 4.7, and appropriate generalizations of the Wald martingale (and the associated change of measure) can be defined in terms of κ(α) (and h^{(α)}), cf. Proposition 4.4.

Corollary 4.3 E_i[e^{αS_t}; J_t = j] ∼ h_i^{(α)} ν_j^{(α)} e^{tκ(α)}.

Proof. By Perron–Frobenius theory (see A.4c). □
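A quick numerical illustration of Corollary 4.3 (the matrix below is a made-up positive matrix playing the role of F̂[α]): the Perron root λ = e^{κ(α)} of a 2×2 matrix can be read off the characteristic polynomial, and F̂[α]^n/λ^n approaches the rank-one matrix h ν:

```python
import math

# Made-up positive 2x2 matrix standing in for Fhat[alpha]
F = [[0.9449, 0.2582], [0.4312, 1.0933]]

tr = F[0][0] + F[1][1]
det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
lam = (tr + math.sqrt(tr * tr - 4 * det)) / 2    # Perron root, i.e. e^{kappa(alpha)}

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Fn = F
for _ in range(49):
    Fn = matmul(Fn, F)                           # Fhat[alpha]^50

ratio = [[Fn[i][j] / lam ** 50 for j in range(2)] for i in range(2)]
# rank-one limit h_i nu_j: entries stay bounded, 2x2 determinant vanishes
assert all(0.0 < ratio[i][j] < 10.0 for i in range(2) for j in range(2))
assert abs(ratio[0][0] * ratio[1][1] - ratio[0][1] * ratio[1][0]) < 1e-9
```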
We also get an analogue of the Wald martingale for random walks:

Proposition 4.4 E_i[e^{αS_t} h_{J_t}^{(α)}] = h_i^{(α)} e^{tκ(α)}. Furthermore,

    {e^{αS_t − tκ(α)} h_{J_t}^{(α)}}_{t≥0}

is a martingale.

Proof. For the first assertion, just note that

    E_i[e^{αS_t} h_{J_t}^{(α)}] = e_i^T F̂_t[α] h^{(α)} = e_i^T e^{tK[α]} h^{(α)} = e^{tκ(α)} e_i^T h^{(α)} = e^{tκ(α)} h_i^{(α)}.

It then follows that

    E[e^{αS_{t+v} − (t+v)κ(α)} h_{J_{t+v}}^{(α)} | F_t]
        = e^{αS_t − tκ(α)} E[e^{α(S_{t+v} − S_t) − vκ(α)} h_{J_{t+v}}^{(α)} | F_t]
        = e^{αS_t − tκ(α)} E_{J_t}[e^{αS_v − vκ(α)} h_{J_v}^{(α)}] = e^{αS_t − tκ(α)} h_{J_t}^{(α)}. □

Let k^{(α)} denote the derivative of h^{(α)} w.r.t. α, and write k = k^{(0)}.

Corollary 4.5 E_i S_t = tκ′(0) + k_i − E_i k_{J_t} = tκ′(0) + k_i − e_i^T e^{Λt} k.

Proof. By differentiation in Proposition 4.4,

    E_i[S_t e^{αS_t} h_{J_t}^{(α)} + e^{αS_t} k_{J_t}^{(α)}] = e^{tκ(α)} (k_i^{(α)} + tκ′(α) h_i^{(α)}).   (4.2)

Let α = 0 and recall that h^{(0)} = e, so that h_i^{(0)} = h_{J_t}^{(0)} = 1. □
The argument is slightly heuristic (e.g., the existence of exponential moments is assumed) but can be made rigorous by passing to characteristic functions. In the same way, one obtains a generalization of Wald's identity ES_τ = Eτ · ES_1 for a random walk:

Corollary 4.6 For any stopping time τ with finite mean, E_i S_τ = κ′(0) Eτ + k_i − E_i k_{J_τ}.

Corollary 4.7 No matter the initial distribution ν of J_0,

    lim_{t→∞} E_ν S_t / t = κ′(0),   lim_{t→∞} Var_ν S_t / t = κ″(0).

Proof. The first assertion is immediate by dividing by t in Corollary 4.5. For the second, we differentiate (4.2) to get

    E_i[S_t² e^{αS_t} h_{J_t}^{(α)} + 2S_t e^{αS_t} k_{J_t}^{(α)} + e^{αS_t} k_{J_t}^{(α)′}]
        = e^{tκ(α)} (k_i^{(α)′} + tκ′(α) k_i^{(α)} + t{κ″(α) h_i^{(α)} + tκ′(α)² h_i^{(α)} + κ′(α) k_i^{(α)}}).

Multiplying by ν_i, summing and letting α = 0 yields

    E_ν[S_t² + 2S_t k_{J_t}] = t²κ′(0)² + 2tκ′(0)νk + tκ″(0) + O(1).

Squaring in Corollary 4.5 yields

    [E_ν S_t]² = t²κ′(0)² + 2tκ′(0)νk − 2tκ′(0) E_ν k_{J_t} + O(1).

Since it is easily seen by an asymptotic independence argument that E_ν[S_t k_{J_t}] = tκ′(0) E_ν k_{J_t} + O(1), subtraction yields Var_ν S_t = tκ″(0) + O(1). □

Remark 4.8 Also for E infinite (possibly uncountable), Ee^{αS_t} typically grows asymptotically exponentially with a rate κ(α) independent of the initial condition (i.e., the distribution of J_0). More precisely, there is typically a function h = h^{(α)} on E and a κ(α) such that E_x e^{αS_t − tκ(α)} → h(x), t → ∞, for all x ∈ E. From (4.1) one then (at least heuristically) obtains

    h(x) = lim_{v→∞} E_x e^{αS_v − vκ(α)}
         = lim_{v→∞} E_x[e^{αS_t − tκ(α)} E_{J_t} e^{αS_{v−t} − (v−t)κ(α)}]
         = E_x[e^{αS_t − tκ(α)} h(J_t)].

It then follows as in the proof of Proposition 4.4 that

    {e^{αS_t − tκ(α)} h(J_t)/h(J_0)}_{t≥0}   (4.3)
is a martingale. In view of this discussion, we take the martingale property as our basic condition below (though this is automatic in the finite case). An example beyond the finite case occurs for periodic risk processes in VII.6, where {J_t} is deterministic period motion on E = [0, 1) (i.e., J_t = (s + t) mod 1 P_s-a.s. for s ∈ E). □

Remark 4.9 The condition that (4.3) is a martingale can be expressed via the generator A (cf. II.4a) of {X_t} = {(J_t, S_t)} as follows. Given a function h on E, let h_α(i, s) = e^{αs} h(i). We then want to determine h and κ(α) such that E_i[e^{αS_t} h(J_t)] = e^{tκ(α)} h(i). For t small, this leads to

    h(i) + tAh_α(i, 0) = h(i)(1 + tκ(α)),   i.e.   Ah_α(i, 0) = κ(α) h(i).

We shall not exploit this approach systematically; see, however, VI.3b and Remark VII.6.5. □

Proposition 4.10 Let {(J_t, S_t)} be a MAP and let θ be such that

    {L_t}_{t≥0} = {e^{θS_t − tκ(θ)} h(J_t)/h(J_0)}_{t≥0}

is a P_x-martingale for each x ∈ E. Then {L_t} is a multiplicative functional, and the family {P̃_x}_{x∈E} given by Theorem 1.7 defines a new MAP.

Proof. That {L_t} is a multiplicative functional follows from

    L_s ∘ θ_t = (h(J_{t+s})/h(J_t)) e^{θ(S_{t+s} − S_t) − sκ(θ)}.

The proof that we have a MAP is contained in the proof of Theorem 4.11 below in the finite case. In the infinite case, one can directly verify that (4.1) holds for the P̃_x. We omit the details. □

Theorem 4.11 Consider the irreducible case with E finite. Then the MAP in Proposition 4.10 is given by

    P̃ = e^{−κ(θ)} Δ⁻¹_{h^{(θ)}} F̂[θ] Δ_{h^{(θ)}},   H̃_{ij}(dx) = (e^{θx}/Ĥ_{ij}[θ]) H_{ij}(dx)
in the discrete time case, and by

    Λ̃ = Δ⁻¹_{h^{(θ)}} K[θ] Δ_{h^{(θ)}} − κ(θ)I,   µ̃_i = µ_i + θσ_i²,   σ̃_i² = σ_i²,   ν̃_i(dx) = e^{θx} ν_i(dx),

    q̃_{ij} = q_{ij} B̂_{ij}[θ] / (1 + q_{ij}(B̂_{ij}[θ] − 1)),   B̃_{ij}(dx) = (e^{θx}/B̂_{ij}[θ]) B_{ij}(dx)

in the continuous time case. Here Δ_{h^{(θ)}} is the diagonal matrix with the h_i^{(θ)} on the diagonal. In particular, if ν_i(dx) is compound Poisson, ν_i(dx) = β_i B_i(dx) with β_i < ∞ and B_i a probability measure, then also ν̃_i(dx) is compound Poisson with

    β̃_i = β_i B̂_i[θ],   B̃_i(dx) = (e^{θx}/B̂_i[θ]) B_i(dx).

Remark 4.12 The expression for Λ̃ means

    λ̃_{ij} = (h_j^{(θ)}/h_i^{(θ)}) λ_{ij} (1 + q_{ij}(B̂_{ij}[θ] − 1)),   i ≠ j.   (4.4)

In particular, this gives a direct verification that Λ̃ is an intensity matrix: the off-diagonal elements are nonnegative because λ_{ij} ≥ 0, 0 ≤ q_{ij} ≤ 1 and B̂_{ij}[θ] > 0. That the rows sum to 0 follows from

    Λ̃e = Δ⁻¹_{h^{(θ)}} K[θ] h^{(θ)} − κ(θ)e = κ(θ) Δ⁻¹_{h^{(θ)}} h^{(θ)} − κ(θ)e = κ(θ)e − κ(θ)e = 0.

That 0 ≤ q̃_{ij} ≤ 1 follows from the inequality

    qb/(1 + q(b − 1)) ≤ 1,   0 ≤ q ≤ 1, 0 < b < ∞. □
e i [eαSt ; Jt = j] = Ei [Lt eαSt ; Jt = j] = E
hj
(θ)
hi
e−tκ(θ) Ei [e(α+θ)St ; Jt = j].
In matrix notation, this means that b b t [α + θ]∆ (θ) . e t [α] = e−tκ(θ) ∆−1(θ) F F h h
(4.5)
4. MARKOV ADDITIVE PROCESSES
61
e follows Consider first the discrete time case. Here the stated formula for P immediately by letting t = 1, α = 0 in (4.5). Further Feij (dx)
ei (Y1 ∈ dx, J1 = j) = Ei [Lt ; Y1 ∈ dx, J1 = j] = P (θ)
hj
=
(θ)
hi
(θ)
e
θx−κ(θ)
Pi (Y1 ∈ dx, J1 = j) =
hj
(θ)
hi
eθx−κ(θ) Fij (dx).
This shows that Feij is absolutely continuous w.r.t. Fij with a density propore ij and Hij ; since H e ij , Hij are tional to eθx . Hence the same is true for H b ij [θ]. probability measures, it follows that indeed the normalizing constant is H Similarly, in continuous time (4.5) yields f
etK[α] = ∆−1 et(K[α+θ]−κ(θ)I) ∆h(θ) . h(θ) By a general formula (A.13) for matrixexponentials, this implies f K[α] = ∆−1 (K[α + θ] − κ(θ)I)∆h(θ) = ∆−1 K[α + θ]∆h(θ) − κ(θ)I. h(θ) h(θ) e Letting α = 0 yields the stated expression for Λ. Now we can write ¡ ¢ f e + ∆−1(θ) K[α + θ] − K[θ] ∆ (θ) K[α] = Λ h h ³h ¡ ¢´ ¡ ¢ j e + κ(i) (α + θ) − κ(i) (θ) bij [α + θ] − B bij [θ] . λ q B Λ + ij ij diag (θ) hi (θ)
=
That κ(i) (α + θ) − κ(i) (θ) corresponds to the stated parameters µei , σ ei2 , νei (dx) of a L´evy process follows from Theorem 3.8. Finally note that by (4.4), (θ)
hj
(θ) hi
λij qij
¡
bij [α + θ] − B bij [θ] B
¢
(θ)
=
hj
(θ) hi
¡b ¢ bij [θ] B e ij [α] − 1 λij qij B
¡b ¢ eij qeij B e ij [α] − 1 . = λ 2
Notes and references The earliest paper on treatment of MAP’s in the present spirit we know of is Nagaev [654]. Much of the pioneering was done in the sixties in papers like Keilson & Wishart [524, 525, 526] and Miller [642, 643, 644] in discrete time; the literature on the continuous time case tends more to deal with special cases. Though the literature on MAP’s is extensive, there is, however, hardly a single comprehensive treatment; an extensive bibliography on aspects of the theory can be found in Asmussen [58].
Conditions for analogues of Corollary 4.3 for an infinite E are given by Ney & Nummelin [657]. For the Wald identity in Corollary 4.6, see also Fuh & Lai [380] and Moustakides [651]. The closest reference on exponential families of random walks on a Markov chain that we know of within the more statistically oriented literature is Höglund [477], which, however, is slightly less general than the present setting.
5  The ladder height distribution

We consider the claim surplus process {S_t} of a risk process with jumps U_i, interclaim times T_i > 0 and premium rate 1 (but note that no independence or Poisson assumptions are made). As usual, τ(u) = inf{t > 0 : S_t > u} is the time to ruin. In the particular case u = 0, write τ_+ = τ(0) and define the associated ladder height S_{τ_+} and ladder height distribution by

    G_+(x) = P(S_{τ_+} ≤ x) = P(S_{τ_+} ≤ x, τ_+ < ∞).

Note that G_+ is concentrated on (0, ∞), i.e. has no mass on (−∞, 0], and is typically defective, ‖G_+‖ = G_+(∞) = P(τ_+ < ∞) = ψ(0) < 1 when η > 0 (there is positive probability that {S_t} will never come above level 0).

[Figure III.2: the process of relative maxima, with the ladder heights S_{τ_+(1)} = S_{τ_+}, S_{τ_+(2)} and the maximum M.]

The term ladder height is motivated by the shape of the process {M_t} of relative maxima, see Fig. III.2. The first ladder step is precisely S_{τ_+}, and the maximum M is the total height of the ladder, i.e. the sum of all the ladder steps (if η > 0, there are only finitely many). In Fig. III.2, the second ladder point is S_{τ_+(2)}, where τ_+(2) is the time of the next relative maximum after τ_+(1) = τ_+,
the second ladder height (step) is Sτ+(2) − Sτ+(1), and so on. In simple cases like the compound Poisson model, the ladder heights are i.i.d., a fact which turns out to be extremely useful. In other cases, like the Markovian environment model, they have a semi-Markov structure (but in complete generality, the dependence structure seems too complicated to be useful). In any case, at present we concentrate on the first ladder height. The main result of this section is Theorem 5.5 below, which gives an explicit expression for G+ in a very general setting, where basically only stationarity is assumed.

To illustrate the ideas, we shall first consider the compound Poisson model in the notation of Example II.3.2. Recall that B̄(x) = 1 − B(x) denotes the tail of B.

Theorem 5.1  For the compound Poisson model with ρ = βμB < 1, G+ is given by the defective density g+(x) = βB̄(x) = ρb₀(x) on (0, ∞). Here b₀(x) = B̄(x)/μB.

For the proof of Theorem 5.1, define the pre-τ+ occupation measure R+ by

\[ R_+(A) \;=\; E\int_0^\infty I(S_t\in A,\ \tau_+>t)\,dt \;=\; E\int_0^{\tau_+} I(S_t\in A)\,dt . \]

The interpretation of R+(A) is as the expected time {St} spends in the set A before τ+. Thus, R+ is concentrated on (−∞, 0], i.e., has no mass on (0, ∞). Also, by approximation with step functions, it follows that for g ≥ 0 measurable,

\[ \int_{-\infty}^0 g(y)\,R_+(dy) \;=\; E\int_0^{\tau_+} g(S_t)\,dt . \tag{5.1} \]

Lemma 5.2  R+ is the restriction of the Lebesgue measure to (−∞, 0].

Proof. Let T be fixed and define St* = S_T − S_{T−t}, 0 ≤ t ≤ T. That is, {St*}0≤t≤T is constructed from {St}0≤t≤T by time-reversion and hence, since the distribution of the Poisson process is invariant under time-reversion, has the same distribution as {St}0≤t≤T, see Fig. III.3. Thus,

\[
\begin{aligned}
P(S_T\in A,\ \tau_+>T) &= P(S_T\in A,\ S_t\le 0,\ 0\le t\le T) \\
&= P(S_T^*\in A,\ S_T^*\le S_{T-t}^*,\ 0\le t\le T) \\
&= P(S_T^*\in A,\ S_T^*\le S_t^*,\ 0\le t\le T) \\
&= P(S_T\in A,\ S_T\le S_t,\ 0\le t\le T). 
\end{aligned}
\tag{5.2}
\]
Figure III.3(a): τ+ > t
Figure III.3(b): τ+ ≤ t
Integrating w.r.t. dT, it follows that R+(A) is the expected time when S_T is in A and at a minimum at the same time. But since St → −∞ a.s., this is just the Lebesgue measure of A, cf. Fig. III.4, where the bold lines correspond to minimal values. □
Lemma 5.3  G+ is the restriction of β(R+ ∗ B) to (0, ∞). That is, for A ⊆ (0, ∞),

\[ G_+(A) \;=\; \beta\int_{-\infty}^0 B(A-y)\,R_+(dy) . \]
Figure III.4 (time intervals where {St} is at a relative minimum belonging to A, marked on the time axis)

Proof. A jump of {St} at time t and of size U contributes to the event {Sτ+ ∈ A} precisely when τ+ ≥ t and U + St− ∈ A. The probability of this given {Su}u<t is β dt · B(A − St−). Hence

\[ G_+(A) \;=\; E\int_0^\infty \beta\,B(A-S_{t-})\,I(\tau_+>t)\,dt \;=\; \beta E\int_0^{\tau_+} g(S_t)\,dt \;=\; \beta\int_{-\infty}^0 g(y)\,R_+(dy) \]

where g(y) = B(A − y) (here we used the fact that the probability of a jump at t is zero in the second step, and (5.1) in the last). □

Proof of Theorem 5.1. With r+(y) = I(y < 0) denoting the density of R+, Lemma 5.3 yields

\[ g_+(x) \;=\; \beta\int_0^\infty r_+(x-z)\,B(dz) \;=\; \beta\int_0^\infty I(x<z)\,B(dz) \;=\; \beta\bar B(x). \qquad\Box \]

Generalizing the setup, we consider the claim surplus process {St*}t≥0 of a risk reserve process in a very general setup, assuming basically stationarity in time and space,

\[ \bigl\{S_{t+s}^* - S_s^*\bigr\}_{t\ge 0} \;\stackrel{\mathscr D}{=}\; \{S_t^*\}_{t\ge 0} \tag{5.3} \]

for all s ≥ 0. The sample path structure is assumed to be as for the compound Poisson case: {St*} is generated from interclaim times Tk* and claim sizes Uk*
according to premium 1 per unit time, i.e.

\[ S_t^* \;=\; \sum_{k=1}^{N_t^*} U_k^* - t \qquad\text{where } N_t^* = \max\{k=0,1,\ldots : T_1^*+\cdots+T_k^* \le t\}. \]

The first ladder epoch τ+* is defined as inf{t > 0 : St* > 0} and the corresponding ladder height distribution is

\[ G_+^*(A) \;=\; P\bigl(S_{\tau_+^*}^*\in A\bigr) \;=\; P\bigl(S_{\tau_+^*}^*\in A,\ \tau_+^*<\infty\bigr). \]

[…] with arrival rate β > 0 of M and h > 0 an arbitrary constant (in the literature, most often one takes h = 1). As above, the r.h.s. of (5.4) does not depend on h; letting h ↓ 0, βh becomes the approximate probability P(σ₁* ≤ h) of an arrival in [0, h] and the sum approximately ϕ(M∗)I(σ₁ ≤ h). This more or less gives a proof that indeed (5.4) represents the conditional distribution of M∗ given σ₁* = 0. Note also that (again by stationarity) the Palm distribution also represents the conditional distribution of M∗ ◦ θt given an arrival at time t. See, e.g., Sigman [812] or [APQ, VII.6] for these and further aspects of Palm theory.

Example 5.4  Consider a finite Markov additive process (cf. Section 4) which has pure jump structure corresponding to μᵢ = σᵢ² = 0, νᵢ(dx) = βᵢBᵢ(dx). Assume {Jt} irreducible, so that a stationary distribution π = (πᵢ)i∈E exists. Interpreting jump times as arrival times and jump sizes as marks, we get a marked point process generated by Poisson arrivals at rate βᵢ and mark distribution Bᵢ when Jt = i, and by some additional arrivals which occur w.p. qᵢⱼ when {Jt} jumps from i to j and have mark distribution Bᵢⱼ. A stationary marked point process M∗ is obtained by assigning J₀ distribution π. If Jt− = i, an arrival for M∗ occurs before time t + dt w.p.

\[ dt\Bigl\{\beta_i + \sum_{j\ne i}\lambda_{ij}q_{ij}\Bigr\}. \]

Thus the arrival rate for M∗ is

\[ \beta \;=\; \sum_{i\in E}\pi_i\Bigl\{\beta_i + \sum_{j\ne i}\lambda_{ij}q_{ij}\Bigr\}. \]

Given that an arrival occurs at time t, the probability αᵢⱼ of Jt− = i, Jt = j is πᵢβᵢ/β for i = j and πᵢλᵢⱼqᵢⱼ/β for i ≠ j. It follows that we can describe the Palm version M as follows. First choose (J₀−, J₀) w.p. αᵢⱼ for (i, j) and let the initial mark U₁ have distribution Bᵢ when i = j and Bᵢⱼ otherwise. After that, let the arrivals and their marks be generated by {Jt} starting from J₀ = j.
Note in particular that the Palm distribution of the mark size (i.e., the distribution of U₁) is the mixture

\[ B \;=\; \sum_{i\in E}\alpha_{ii}B_i + \sum_{i\in E}\sum_{j\ne i}\alpha_{ij}B_{ij} \;=\; \sum_{i\in E}\frac{\pi_i}{\beta}\Bigl\{\beta_iB_i + \sum_{j\ne i}\lambda_{ij}q_{ij}B_{ij}\Bigr\}. \qquad\Box \]

Theorem 5.5  Consider a general stationary claim surplus process {St*}t≥0, let U₀ be a r.v. having the Palm distribution of the claim size and F(x) = P(U₀ ≤ x) its distribution. Assume that St* → −∞ a.s. and that ρ = βEU₀ < 1. Then the ladder height distribution G+* is given by the (defective) density g+*(x) = βF̄(x).

Before giving the proof, we note:

Corollary 5.6  Under the assumptions of Theorem 5.5, the ruin probability ψ*(0) with initial reserve u = 0 is ρ = βEU₀.

This follows by noting that

\[ \psi^*(0) \;=\; \|G_+^*\| \;=\; \int_0^\infty g_+^*(x)\,dx \;=\; \beta\int_0^\infty \bar F(x)\,dx \;=\; \beta\,EU_0 . \]

By (5.4),

\[ \psi^*(0) \;=\; E\!\!\sum_{k:\,\sigma_k^*\in[0,1]}\!\! U_k^* ; \]

here the r.h.s. has a very simple interpretation as the average amount of claims received per unit time. The result is notable by giving an explicit expression for ruin in great generality and by only depending on the parameters of the model through the arrival rate β and the average (in the Palm sense) claim size EU₀. The last property is referred to as insensitivity in the applied probability literature.

Proof of Theorem 5.5. A standard argument for stationary processes ([199, p. 105]) shows that one can assume w.l.o.g. that M∗ and M have doubly infinite time (i.e., are point processes on (−∞, ∞) × (0, ∞)). We then represent M by the mark (claim size) U₀ of the arrival at time 0, the arrival times 0 < σ₁ < σ₂ < · · · in (0, ∞) and the arrival times 0 > σ₋₁ > σ₋₂ > · · · in (−∞, 0); the mark at time σk is denoted by Uk. Let p(t) be the conditional probability that Sτ+* ∈ A, τ+ = t given the event At that an arrival at t occurs. Then clearly

\[ G_+^*(A) \;=\; P\bigl(S_{\tau_+}^*\in A\bigr) \;=\; \int_0^\infty p(t)\,\beta\,dt . \]
Consider a process {S̆t}t≥0 which makes an upwards jump at time −σ₋k (k = 1, 2, . . .), moves down linearly at a unit rate in between jumps and starts from S̆₀ = U₀. Now conditionally upon At, {Su*}0≤u≤t is distributed as a process {S̃u*}0≤u≤t where a claim arrives at time t and has size U₀, and the kth preceding claim arrives at time t − σ₋k and has size U₋k. The sample path relation between {S̃u*} and {S̆u} amounts to S̆u = S̃t* − S̃*₍t−u₎₋ (left limit) when 0 ≤ u ≤ t, and is illustrated on Fig. III.6. It follows that for A ⊆ (0, ∞),

\[
\begin{aligned}
p(t) &= P\bigl(S_t^*\in A,\ S_u^*\le 0,\ 0<u<t \,\big|\, A_t\bigr) \\
&= P\bigl(S_t^*\in A,\ S_{u-}^*\le 0,\ 0<u<t \,\big|\, A_t\bigr) \\
&= P\bigl(\tilde S_t^*\in A,\ \tilde S_{u-}^*\le 0,\ 0<u<t\bigr) \\
&= P\bigl(\breve S_t\in A,\ \breve S_t\le \breve S_{t-u},\ 0<u<t\bigr) \\
&= P\bigl(\breve S_t\in A,\ \breve S_t\le \breve S_u,\ 0<u<t\bigr) \\
&= P\bigl(\breve S_t\in A;\ \breve M_t\bigr),
\end{aligned}
\]

where M̆t = {S̆t ≤ S̆u, 0 < u < t} is the event that {S̆u} has a relative minimum at t. In Fig. III.6, time instants corresponding to such minimal values have been marked with bold lines in the path of {S̆t}, and we let L(dy) be the random measure L(A) = ∫₀^∞ I(S̆t ∈ A; M̆t) dt.

Since S̆₀ = U₀, the support of L has right endpoint U₀, and since by assumption St* → −∞ a.s., t → ∞, the left endpoint of the support is −∞. A sample path inspection just as in the proof of Lemma 5.2 therefore immediately shows that L(dy) is Lebesgue measure on (−∞, U₀], cf. Fig. III.6, where the boxes on the time axis correspond to time intervals where {S̆t} is at a minimum belonging to A and split A into pieces corresponding to segments where {S̆u} is at a relative minimum. Thus,

\[ G_+^*(A) \;=\; \beta\int_0^\infty P(\breve S_t\in A;\ \breve M_t)\,dt \;=\; \beta\,EL(A) \;=\; \beta E\int_{-\infty}^\infty I(U_0>y)\,I(y\in A)\,dy \;=\; \beta\int_A P(U_0>y)\,dy \;=\; \beta\int_A \bar F(y)\,dy . \qquad\Box \]
Figure III.6 (the conditional path {S̃u*}0≤u≤t given an arrival at t, and the reversed path {S̆u}u≥0 started from S̆₀ = U₀, with jumps U₀, U₋₁, . . .)

Notes and references  Theorem 5.5 is due to Schmidt & coworkers [102, 372, 648] (a special case of the result appears in Proposition VII.2.1). A further relevant reference related to Corollary 5.6 is Björk & Grandell [170]. Two alternative, somewhat simpler approaches to prove Theorem 5.1 will be given in Chapter IV (after Theorem IV.2.1 and in Remark IV.3.6).
Chapter IV
The compound Poisson model

We consider throughout this chapter a risk reserve process {Rt}t≥0 in the terminology and notation of Chapter I, and assume that

• {Nt}t≥0 is a Poisson process with rate β;
• the claim sizes U₁, U₂, . . . are i.i.d. with common distribution B, say, and independent of {Nt};
• the premium rate is p = 1.

Thus, {Rt} and the associated claim surplus process {St} are given by

\[ R_t \;=\; u + t - \sum_{i=1}^{N_t} U_i , \qquad S_t \;=\; u - R_t \;=\; \sum_{i=1}^{N_t} U_i - t . \]

An important omission of the discussion in this chapter is the numerical evaluation of the ruin probability. Some possibilities are numerical Laplace transform inversion via Corollary 3.4 below, exact matrix-exponential solutions under the assumption that B is phase-type (see further IX.3), Panjer's recursion (Corollary XVI.2.6) and simulation methods (Chapter XV). For finite horizon ruin probabilities, see Chapter V. It is worth mentioning that much of the analysis of this chapter can be carried over in a direct way to more general Lévy processes, see Chapter XI.
1

Introduction
For later reference, we shall start by giving the basic formulas for moments, cumulants, m.g.f.'s etc. of the claim surplus St = u − Rt. Write

\[ \mu_B^{(n)} = EU^n, \qquad \mu_B = \mu_B^{(1)} = EU, \qquad \rho = \beta\mu_B = 1/(1+\eta) . \]

Proposition 1.1
(a) ESt = t(βμB − 1) = t(ρ − 1);
(b) Var St = tβμB⁽²⁾;
(c) Ee^{rSt} = e^{tκ(r)} where κ(r) = β(B̂[r] − 1) − r;
(d) the kth cumulant of St is tβμB⁽ᵏ⁾ for k ≥ 2.

Proof. It was noted in Chapter I that ρ − 1 is the expected claim surplus per unit time, and this immediately yields (a). A more formal proof goes as follows:

\[ ES_t \;=\; E\sum_{k=1}^{N_t} U_k - t \;=\; E\,E\Bigl[\sum_{k=1}^{N_t} U_k \,\Big|\, N_t\Bigr] - t \;=\; E[N_t\mu_B] - t \;=\; \beta t\mu_B - t \;=\; t(\rho-1). \]

The same method also yields the variance as

\[ \operatorname{Var} S_t \;=\; \operatorname{Var}\sum_{k=1}^{N_t} U_k \;=\; \operatorname{Var} E\Bigl[\sum_{k=1}^{N_t} U_k\,\Big|\,N_t\Bigr] + E\operatorname{Var}\Bigl[\sum_{k=1}^{N_t} U_k\,\Big|\,N_t\Bigr] \;=\; \operatorname{Var}[N_t\mu_B] + E[N_t\operatorname{Var} U] \;=\; t\beta\mu_B^2 + t\beta\operatorname{Var} U \;=\; t\beta\mu_B^{(2)} . \]

For (c), we get

\[ Ee^{rS_t} \;=\; e^{-rt}\sum_{k=0}^\infty Ee^{r(U_1+\cdots+U_k)}\,P(N_t=k) \;=\; e^{-rt}\sum_{k=0}^\infty \widehat B[r]^k\,e^{-\beta t}\frac{(\beta t)^k}{k!} \;=\; \exp\bigl\{-rt-\beta t+\widehat B[r]\beta t\bigr\} \;=\; e^{t\kappa(r)} . \]

Finally, for (d) just note that the kth cumulant of St is tκ⁽ᵏ⁾(0), where κ⁽ᵏ⁾(0) is the kth derivative of κ at 0, and that B̂⁽ᵏ⁾[0] = μB⁽ᵏ⁾. □

The linear way the index t enters in the formulas in Proposition 1.1 is the same as if {St} was a random walk indexed by t = 0, 1, 2, . . . The connections to random walks are in fact fundamental, and there are at least two ways to exploit this:
Recalling that σk is the time of the kth claim, we have Sσk − Sσk−1 = Uk − Tk, where Tk is the time between the kth and the (k − 1)th claim. Obviously, the Uk − Tk are i.i.d., so that {Sσk} is a random walk with mean

\[ EU - ET \;=\; EU - \frac1\beta \;=\; \frac{\beta EU - 1}{\beta} \;=\; -\eta\mu_B \]

where η is the safety loading. In this way, we get a discrete time random walk imbedded in the claim surplus process {St}, which is often used in the literature for obtaining information about {St} and the ruin probabilities. For example, obviously ψ(u) = P(maxk Sσk > u). We return to this approach in Chapter VI.

The point of view in the present chapter is, however, rather to view {St} directly as a random walk in continuous time, meaning that the increments are stationary and independent, cf. III.3, so we have a Lévy process. Here is one immediate application:

Proposition 1.2 (drift and oscillation)
(a) No matter the value of η, St/t → ρ − 1 a.s. as t → ∞;
(b) if η < 0, then St → ∞ a.s.;
(c) if η > 0, then St → −∞ a.s.;
(d) if η = 0, then lim inf_{t→∞} St = −∞, lim sup_{t→∞} St = ∞.

For the proof, we need the following lemma:

Lemma 1.3  If nh ≤ t ≤ (n + 1)h, then Snh − h ≤ St ≤ S(n+1)h + h.

Proof. We first note that for u, v ≥ 0, Su+v ≥ Su − v. Indeed, Su+v − Su attains its minimal value when there are no arrivals in (u, u + v], and the value is then precisely −v. In particular, if t = nh + v with 0 ≤ v ≤ h, then St ≥ Snh − v ≥ Snh − h. The inequality on the right in the lemma is proved similarly. □
Proof of Proposition 1.2. For any fixed h, {Snh}n=0,1,... is a discrete time random walk, and hence by the strong law of large numbers, Snh/n → ESh = h(ρ − 1) a.s. Thus using Lemma 1.3, we get

\[ \liminf_{t\to\infty}\frac{S_t}{t} \;=\; \liminf_{n\to\infty}\ \inf_{nh\le t\le(n+1)h}\frac{S_t}{t} \;\ge\; \frac1h\,\liminf_{n\to\infty}\frac{S_{nh}-h}{n} \;=\; \frac1h\,ES_h \;=\; \rho-1 . \]

A similar argument for lim sup proves (a), and (b), (c) are immediate consequences of (a). Part (d) follows by a (slightly more intricate) general random walk result ([APQ, pp. 224–225]) stating that lim inf_{n→∞} Snh = −∞, lim sup_{n→∞} Snh = ∞ (Lemma 1.3 is not needed for (d)). □

Corollary 1.4  The ruin probability ψ(u) is 1 for all u when η ≤ 0, and < 1 for all u when η > 0.

Proof. The case of η ≤ 0 is immediate since then M = ∞ by Proposition 1.2. If η > 0, it suffices to prove ψ(0) = P(M > 0) < 1. However, if P(M > 0) = 1, then {St} upcrosses level 0 a.s. at least once. Considering the next downcrossing (which occurs w.p. 1 since St → −∞) and repeating the argument, it is seen that upcrossing occurs at least twice, hence by induction i.o. This contradicts St → −∞. □

There is also a central limit version of Proposition 1.2:

Proposition 1.5  The limiting distribution of (St − t(ρ − 1))/√t as t → ∞ is normal with mean zero and variance βμB⁽²⁾.

Proof. Since {St}t≥0 is a Lévy process (a random walk in continuous time), {Snh}n=0,1,... is a discrete-time random walk for any h > 0, and hence it follows from standard central limit theory and the expression Var(St) = tβμB⁽²⁾ (Proposition 1.1(b)) that the assertion holds as t → ∞ through values of the form t = 0, h, 2h, . . . The general case now follows either by another easy application of Lemma 1.3, or by a general result on discrete skeletons ([APQ, p. 415]). □

Remark 1.6  Often it is of interest to consider size fluctuations, where the size of the portfolio at time t is M(t). Assuming that each risk generates claims at Poisson intensity β and pays premium 1 per unit time, this case can be reduced to the compound Poisson model by an easy operational time transformation T⁻¹(t) where T(s) = β∫₀ˢ M(t) dt (this was already pointed out by Lundberg [614], see also [11] for an overview). □

Notes and references  All material of the present section is standard.
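The moment formulas of Proposition 1.1 are easy to check by simulation. The sketch below is illustrative and not from the book; the parameter values and the choice of Exp(δ) claims are assumptions made for the demo. It simulates the compound Poisson claim surplus and compares the empirical mean and variance of St with t(ρ − 1) and tβμB⁽²⁾:

```python
import random

def simulate_claim_surplus(beta, claim_sampler, t, rng):
    """One sample of S_t = sum_{i<=N_t} U_i - t, where N_t counts Poisson(beta)
    arrivals generated via i.i.d. Exp(beta) interarrival times."""
    s, clock = 0.0, rng.expovariate(beta)
    while clock <= t:
        s += claim_sampler(rng)
        clock += rng.expovariate(beta)
    return s - t

def moment_check(beta=1.0, delta=2.0, t=50.0, n=20000, seed=1):
    """Empirical mean/variance of S_t for Exp(delta) claims, together with
    the theoretical values t(rho - 1) and t*beta*mu_B^(2) of Proposition 1.1."""
    rng = random.Random(seed)
    xs = [simulate_claim_surplus(beta, lambda r: r.expovariate(delta), t, rng)
          for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    mu_b, mu_b2 = 1.0 / delta, 2.0 / delta ** 2   # EU and EU^2 for Exp(delta)
    return mean, var, t * (beta * mu_b - 1.0), t * beta * mu_b2
```

With β = 1, δ = 2 and t = 50 the theoretical values are ESt = −25 and Var St = 25; the empirical values should agree to within Monte Carlo error.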
2

The Pollaczeck-Khinchine formula
The time to ruin τ(u) was already defined in Chapter I as inf{t > 0 : St > u}, and we shall here exploit the decomposition of the maximum M as a sum of ladder heights, cf. Fig. III.2. We assume throughout η > 0 or, equivalently, ρ < 1.

It is crucial to note that for the compound Poisson model, the ladder heights are i.i.d. This follows simply by noting that the process repeats itself after reaching a relative maximum. The decomposition of M as a sum of ladder heights now yields:

Theorem 2.1  The distribution of M is (1 − ‖G₊‖) Σ_{n=0}^∞ G₊*ⁿ, where G₊ is given by the defective density g₊(x) = βB̄(x) = ρb₀(x) on (0, ∞). Here b₀(x) = B̄(x)/μB.

The formula for g₊ was already obtained in Theorem III.5.1, but before showing the rest of Theorem 2.1, we give an alternative argument which is short and intuitive, but also slightly heuristical:

Proof of g₊(x) = βB̄(x): Assume B has a density b. Note that if there is a claim arrival before dt, then Sτ₊ ∈ (u, u + du] occurs precisely when the claim has size u. Hence the contribution to g₊(u) from this event is b(u)β dt. If there are no claim arrivals before dt, consider the process {S̃t}t≥0 where S̃t = St+dt − Sdt = St+dt + dt. For Sτ₊ ∈ (u, u + du] to occur, S̃ must either have its first ladder point equal to u + dt or to some v ∈ (0, dt], and in the latter case the process starting from v must have its first ladder point equal to u + v, i.e. the probability is ∫₀^{dt} g₊(v)g₊(u + v) dv. Collecting all first order terms, it follows that

\[
\begin{aligned}
g_+(u) &= b(u)\beta\,dt + (1-\beta\,dt)\bigl(g_+(u+dt) + g_+(0)g_+(u)\,dt\bigr) + o(dt) \\
&= b(u)\beta\,dt + (1-\beta\,dt)\bigl(g_+(u) + g_+'(u)\,dt + g_+(0)g_+(u)\,dt\bigr) + o(dt) \\
&= g_+(u) + dt\bigl(-\beta g_+(u) + g_+'(u) + g_+(0)g_+(u) + \beta b(u)\bigr) + o(dt),
\end{aligned}
\]

\[ g_+'(u) \;=\; \bigl(\beta - g_+(0)\bigr)g_+(u) - \beta b(u). \tag{2.1} \]

Integrating from 0 to x gives

\[ g_+(x) \;=\; g_+(0) + \bigl(\beta-g_+(0)\bigr)P(S_{\tau_+}\le x,\ \tau_+<\infty) - \beta B(x) . \]

Letting x → ∞ and assuming (heuristical but reasonable!) that then g₊(x) → 0, we get

\[ 0 \;=\; g_+(0) + \bigl(\beta-g_+(0)\bigr)P(\tau_+<\infty) - \beta \;=\; -\bigl(\beta-g_+(0)\bigr)P(\tau_+=\infty) . \]
Since P(τ₊ = ∞) > 0 because of the assumption of a positive loading, we therefore have g₊(0) = β. Thus (2.1) simply means g₊′(u) = −βb(u), and the solution satisfying g₊(0) = β is g₊(u) = βB̄(u). □

Proof of Theorem 2.1. The probability that M is attained in precisely n ladder steps and does not exceed x is G₊*ⁿ(x)(1 − ‖G₊‖) (the parenthesis gives the probability that there are no further ladder steps after the nth). Summing over n, the formula for the distribution of M follows. □

Alternatively, we may view the ladder heights as a terminating renewal process, and M then becomes the lifetime. Combined with ψ(u) = P(M > u), Theorem 2.1 provides a representation formula for ψ(u), which we henceforth refer to as the Pollaczeck-Khinchine formula. Note that the integrated tail distribution B₀ with density b₀ is familiar from renewal theory as the limiting stationary distribution of the overshoot (forward recurrence time), see [APQ, V.3–4] or A.1e. Thus, we can rewrite the Pollaczeck-Khinchine formula as

\[ \psi(u) \;=\; P(M>u) \;=\; (1-\rho)\sum_{n=1}^\infty \rho^n\,\bar B_0^{*n}(u), \tag{2.2} \]
representing the distribution of M as a geometric compound. As a vehicle for computing ψ(u), (2.2) is not entirely satisfying because of the infinite sum of convolution powers, but we shall nevertheless be able to extract substantial information from the formula.

The following result generalizes the fact that the conditional distribution of the deficit Sτ(0) just after ruin, given that ruin occurs (i.e., that τ(0) < ∞), is B₀: taking y = 0 shows that the conditional distribution of the risk reserve immediately before ruin (i.e. −Sτ(0)−) is again B₀, and we further get information about the joint conditional distribution of this quantity and the deficit. Note that this distribution is the same as the limiting joint distribution of the age and excess life in a renewal process governed by B, cf. Theorem A1.5.

Theorem 2.2  The joint distribution of (−Sτ(0)−, Sτ(0)) is given by the following four equivalent statements:

(a) \( P\bigl(-S_{\tau(0)-}>x,\ S_{\tau(0)}>y;\ \tau(0)<\infty\bigr) \;=\; \beta\int_{x+y}^\infty \bar B(z)\,dz; \)

(b) the joint distribution of (−Sτ(0)−, Sτ(0)) given τ(0) < ∞ is the same as the distribution of (V W, (1 − V)W) where V, W are independent, V is uniform on (0, 1) and W has distribution F_W given by dF_W/dB(x) = x/μB;

(c) the marginal distribution of −Sτ(0)− is B₀, and the conditional distribution of Sτ(0) given −Sτ(0)− = y is the overshoot distribution B₀⁽ʸ⁾ given by B̄₀⁽ʸ⁾(z) = B̄₀(y + z)/B̄₀(y);

(d) the marginal distribution of Sτ(0) is B₀, and the conditional distribution of −Sτ(0)− given Sτ(0) = z is B₀⁽ᶻ⁾.

The proof is given in V.2 and it gives an alternative derivation of the distribution of the deficit Sτ(0).

Notes and references  The Pollaczeck-Khinchine formula is standard in queueing theory, see for example [APQ], Feller [362] or Wolff [894]. The proof of Theorem III.5.1 is traditionally carried out for the imbedded discrete time random walk, where it requires slightly more calculation. As shown in Theorem III.5.5, the form of G₊ is surprisingly insensitive to the form of {St} and holds in a certain general marked point process setup. However, in this setting there is no decomposition of M as a sum of i.i.d. ladder heights, so the results do not appear too useful for estimating ψ(u) for u > 0. Theorem 2.2(a) is from Dufresne & Gerber [333]. Again, there is a general marked point process version, cf. Asmussen & Schmidt [103]. For the study of the joint distribution of the surplus Sτ(u)− just before ruin and the deficit Sτ(u) at ruin, see Schmidli [773] and references therein. In Chapter XII these results will be generalized in various directions. In risk theory literature, the Pollaczeck-Khinchine formula is often referred to as Beekman's convolution formula, cf. Beekman [152, 153].
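The geometric-compound representation (2.2) translates directly into a simple Monte Carlo estimator of ψ(u): sample a geometric number of i.i.d. ladder heights from B₀ and check whether their running sum ever exceeds u. The sketch below is illustrative and not from the book; the demo uses Exp(δ) claims, for which B₀ = B by the memoryless property and ψ(u) = ρe^{−(δ−β)u} (Corollary 3.2 of the next section) provides a closed-form check.

```python
import random

def pk_ruin_mc(rho, b0_sampler, u, n=200000, seed=7):
    """Estimate psi(u) = P(M > u) via (2.2): M is a compound of a
    geometric(1 - rho) number of i.i.d. B0-distributed ladder heights."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        m = 0.0
        while rng.random() < rho:    # one more ladder step w.p. rho
            m += b0_sampler(rng)
            if m > u:                # ladder heights are positive, so M > u
                hits += 1
                break
    return hits / n
```

For β = 1, δ = 2 (so ρ = 1/2) and u = 1, the estimate should be close to 0.5e⁻¹ ≈ 0.184.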
3
Special cases of the Pollaczeck-Khinchine formula

The model and notation are the same as in the preceding sections. We assume η > 0 throughout.
3a
The ruin probability when the initial reserve is zero
The case u = 0 is remarkable by giving a formula for ψ(u) which depends on the claim size distribution only through its mean:

Corollary 3.1  ψ(0) = ρ = βμB = 1/(1 + η).

Proof. Recall that τ₊ = τ(0) and note that

\[ \psi(0) \;=\; P(\tau_+<\infty) \;=\; \|G_+\| \;=\; \beta\int_0^\infty \bar B(x)\,dx \;=\; \beta\mu_B . \qquad\Box \]
Notes and references  The fact that ψ(0) only depends on B through μB is often referred to as an insensitivity property. As shown in III.6, the formula for ψ(0) holds in a more general setting; a slightly modified version also holds for certain two-sided jumps, cf. Section XII.4. A further relevant reference is Björk & Grandell [170].
3b
Exponential claims
Corollary 3.2  If B is exponential with rate δ, then ψ(u) = ρ e^{−(δ−β)u}.

Proof. The distribution B₀ of the ascending ladder height (given that it is defined) is the distribution of the overshoot of {St} at time τ₊ over level 0. But claims are exponential, hence without memory, and so this overshoot has the same distribution as the claims themselves. I.e., B₀ is exponential with rate δ and the result can now be proved from the Pollaczeck-Khinchine formula by elementary calculations. Thus, B₀*ⁿ is the Erlang distribution with n phases and the density of M at x > 0 is

\[ (1-\rho)\sum_{n=1}^\infty \rho^n\,\frac{\delta^n x^{n-1}}{(n-1)!}\,e^{-\delta x} \;=\; (1-\rho)\rho\delta\,e^{-\delta(1-\rho)x} \;=\; \rho(\delta-\beta)\,e^{-(\delta-\beta)x} . \]

Integrating from u to ∞, the result follows. Alternatively, use Laplace transforms.

The result can, however, also be seen probabilistically without summing infinite series. Let λ(x) be the failure rate of M at x > 0. For a failure at x, the current ladder step must terminate, which occurs at rate δ, and there must be no further ones, which occurs w.p. 1 − ρ. Thus λ(x) = δ(1 − ρ) = δ − β, so that the conditional distribution of M given M > 0 is exponential with rate δ − β and

\[ \psi(u) \;=\; P(M>u) \;=\; P(M>0)\,P(M>u \mid M>0) \;=\; \rho e^{-(\delta-\beta)u} . \qquad\Box \]

In IX.3, we show that expressions for ψ(u) which are explicit (up to matrix exponentials) come out in a similar way also when B is phase-type. E.g. (Example IX.3.2), if β = 3 and B is a mixture of two exponential distributions with rates 3 and 7 and weights 1/2 each, then

\[ \psi(u) \;=\; \frac{24}{35}\,e^{-u} + \frac{1}{35}\,e^{-6u} . \tag{3.1} \]

For heavy-tailed B, we use the Pollaczeck-Khinchine formula in Chapter X to show that

\[ \psi(u) \;\sim\; \frac{\rho}{1-\rho}\,\bar B_0(u), \qquad u\to\infty. \]
Notes and references Corollary 3.2 is one of the main classical early results in the area. A variety of proofs are available. We mention in particular the following: (a) check that ψ(u) = ρ e−(δ−β)u is the solution of the renewal equation (3.2) below; (b) use stopped martingales, cf. II.3.
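As a quick numerical cross-check (illustrative, not from the book), the explicit expressions of this subsection can be coded directly. Note that (3.1) evaluated at u = 0 recovers ψ(0) = ρ = 5/7 of Corollary 3.1, since for the mixture μB = (1/2)(1/3) + (1/2)(1/7) = 5/21 and ρ = 3 · 5/21 = 5/7:

```python
import math

def psi_exponential(u, beta, delta):
    """Corollary 3.2: psi(u) = rho*exp(-(delta - beta)*u) for Exp(delta) claims,
    with rho = beta*mu_B = beta/delta."""
    rho = beta / delta
    return rho * math.exp(-(delta - beta) * u)

def psi_mixture(u):
    """Formula (3.1): beta = 3 and B a 50/50 mixture of Exp(3) and Exp(7)."""
    return 24.0 / 35.0 * math.exp(-u) + 1.0 / 35.0 * math.exp(-6.0 * u)
```

Both functions are decreasing in u, and psi_mixture(0) = 25/35 = 5/7 as it must.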
3c
Some classical analytical results
Recall the notation Ḡ₊(u) = ∫ᵤ^∞ G₊(dx).
Corollary 3.3  The ruin probability ψ(u) satisfies the defective renewal equation

\[ \psi(u) \;=\; \bar G_+(u) + G_+ * \psi(u) \;=\; \beta\int_u^\infty \bar B(y)\,dy + \int_0^u \psi(u-y)\,\beta\bar B(y)\,dy. \tag{3.2} \]

Equivalently, the survival probability φ(u) = 1 − ψ(u) satisfies the defective renewal equation

\[ \varphi(u) \;=\; 1-\rho + G_+ * \varphi(u) \;=\; 1-\rho + \int_0^u \varphi(u-y)\,\beta\bar B(y)\,dy. \tag{3.3} \]

Proof. Write ψ(u) as

\[ P(M>u) \;=\; P(S_{\tau_+}>u,\ \tau_+<\infty) + P(M>u,\ S_{\tau_+}\le u,\ \tau_+<\infty). \]

Then the first term on the r.h.s. is Ḡ₊(u), and conditioning upon Sτ₊ = y yields

\[ P(M>u,\ S_{\tau_+}\le u,\ \tau_+<\infty) \;=\; \int_0^u P(M>u-y)\,G_+(dy) \;=\; \int_0^u \psi(u-y)\,G_+(dy) . \]

For the last identity in (3.2), just insert the explicit form of G₊. The case of (3.3) is similar (equivalently, (3.3) can be derived by elementary algebra from (3.2)). □
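The defective renewal equation (3.3) also suggests a simple numerical scheme: discretize the convolution integral on a grid and solve for φ recursively. The sketch below is illustrative and not from the book (it uses a right-endpoint Riemann sum, which is explicit and O(h) accurate); the test case uses Exp(2) claims with β = 1, where φ(u) = 1 − 0.5e^{−u} is known from Corollary 3.2.

```python
import math

def solve_survival(beta, mu_b, claim_tail, u_max, h):
    """Numerically solve phi(u) = 1 - rho + beta * int_0^u phi(u-y)*Bbar(y) dy
    on the grid u_j = j*h, approximating the integral by a right-endpoint
    Riemann sum so that each phi(u_j) depends only on earlier grid values."""
    rho = beta * mu_b
    n = int(round(u_max / h))
    phi = [1.0 - rho]                       # phi(0) = 1 - rho
    for j in range(1, n + 1):
        conv = sum(phi[j - i] * claim_tail(i * h) for i in range(1, j + 1))
        phi.append(1.0 - rho + beta * h * conv)
    return phi
```

With h = 0.002 and u_max = 2, the last entry approximates φ(2) = 1 − 0.5e⁻² ≈ 0.9323 to within the O(h) discretization error.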
∞
e−su ψ(u)du =
0
b β − β B[−s] − ρs . b s(β − s − β B[−s])
(3.4)
c0 of B0 as Proof. We first find the m.g.f. B Z c0 [r] = B 0
∞
eru
B(u) du = µB
Z
∞ 0
b −1 eru − 1 B[r] B(du) = . rµB rµB
(3.5)
80
CHAPTER IV. THE COMPOUND POISSON MODEL
Hence Ee
rM
=
(1 − ρ)
∞ X
c0 [r]n = ρn B
n=0
Z
∞
Z e−su ψ(u)du
∞
(1 − ρ)r 1−ρ = , (3.6) b b 1 − ρB0 [r] r + β − β B[r]
=
e−su P(M > u)du =
=
´ 1³ (1 − ρ)s , 1+ b s β − s − β B[−s]
0
0
1 (1 − Ee−sM ) s
which is the same as (3.4).
2
Corollary 3.5  The first two moments of M are

\[ EM \;=\; \int_0^\infty \psi(u)\,du \;=\; \frac{\rho\,\mu_B^{(2)}}{2(1-\rho)\mu_B}, \qquad EM^2 \;=\; \frac{\rho\,\mu_B^{(3)}}{3(1-\rho)\mu_B} + \frac{\beta^2\bigl(\mu_B^{(2)}\bigr)^2}{2(1-\rho)^2} . \tag{3.7} \]

Proof. This can be shown, for example, by analytical manipulations (l'Hôpital's rule) from (3.6). We omit the details (see, e.g., [APQ, p. 237]). □

Remark 3.6  As mentioned in the Notes below, one can also derive (3.2) by analytical techniques. At the same time, (3.4) follows from (3.2) directly by taking Laplace transforms and noting that the Laplace transform of a convolution of two functions is the product of their Laplace transforms. The Laplace transform of the survival probability φ(u) correspondingly is

\[ \widehat\varphi[-s] \;=\; \int_0^\infty e^{-su}\,\varphi(u)\,du \;=\; \frac{1-\rho}{s-\beta\bigl(1-\widehat B[-s]\bigr)} . \]

This can now be used to provide yet another, more analytical proof of the ladder height density for a compound Poisson process. From φ(u) = P(M ≤ u) (or from (3.5)) one sees that

\[ Ee^{-sM} \;=\; \varphi(0) + \int_0^\infty e^{-su}\,\varphi'(u)\,du \;=\; \frac{(1-\rho)s}{s-\beta\bigl(1-\widehat B[-s]\bigr)} . \]

On the other hand, as a sum of i.i.d. ladder heights, M is a geometric compound with E(e^{−sM}) = E((Ĝ₊[−s]/ρ)^N) where N is geometric(1 − ρ), leading to Ee^{−sM} = (1 − ρ)/(1 − Ĝ₊[−s]). A comparison of those two representations for Ee^{−sM} now gives Ĝ₊[−s] = β(1 − B̂[−s])/s, so that g₊(x) = ρ b₀(x). □
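For exponential claims the moment formulas (3.7) can be cross-checked against Corollary 3.2, since then μB⁽ᵏ⁾ = k!/δᵏ and EM = ∫₀^∞ ψ(u) du = ρ/(δ − β). A small illustrative sketch (not from the book):

```python
def moments_M_exponential(beta, delta):
    """Evaluate (3.7) for Exp(delta) claims (mu_B^(k) = k!/delta^k), and also
    EM computed directly as int_0^inf rho*e^{-(delta-beta)u} du = rho/(delta-beta)."""
    mu1, mu2, mu3 = 1.0 / delta, 2.0 / delta ** 2, 6.0 / delta ** 3
    rho = beta * mu1
    em = rho * mu2 / (2.0 * (1.0 - rho) * mu1)
    em2 = (rho * mu3 / (3.0 * (1.0 - rho) * mu1)
           + beta ** 2 * mu2 ** 2 / (2.0 * (1.0 - rho) ** 2))
    em_direct = rho / (delta - beta)        # from Corollary 3.2
    return em, em2, em_direct
```

For β = 1, δ = 2 both routes give EM = 1/2, and EM² = 1, consistent with M being 0 w.p. 1/2 and Exp(1) otherwise.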
Notes and references  Corollary 3.3 is standard, see e.g. [APQ, pp. 144–145] or Feller [362]. The approach there is to condition upon the first claim occurring at time t and having size x, which yields the survival probability as

\[ \varphi(u) \;=\; \int_0^\infty \beta e^{-\beta t}\,dt \int_0^{u+t} \varphi(u+t-x)\,B(dx), \]

from which (3.3) can be derived by elementary but tedious manipulations (in Section XII.3 a formal procedure will be discussed that is applicable in much more general models). Of course, it is not surprising that such arguments are more cumbersome, since the ladder height representation is not used. Also (3.6) and Corollary 3.5 can be found in virtually any queueing book. In fact, either of these sets of formulas is what many authors call the Pollaczeck-Khinchine formula. In view of (3.4), numerical inversion of the Laplace transform is one of the classical approaches for computing ruin probabilities, see e.g. Abate & Whitt [2], Embrechts, Grübel & Pitts [346], Grübel [438], Thorin & Wikstad [848] and Albrecher, Avram & Kortschak [14] (see also the Bibliographical Notes in [746, p. 191]).
3d
Deterministic claims
Corollary 3.7  If B is degenerate at μ, then

\[ \psi(u) \;=\; 1 - (1-\rho)\sum_{k=0}^{\lfloor u/\mu\rfloor} e^{-\rho(k-u/\mu)}\,\frac{\bigl[\rho(k-u/\mu)\bigr]^k}{k!} . \]
Proof. By replacing {St} by {Stμ/μ} if necessary, we may assume μ = 1, so that the stated formula in terms of the survival probability φ(u) = 1 − ψ(u) takes the form

\[ \varphi(u) \;=\; (1-\beta)\sum_{k=0}^{\lfloor u\rfloor} e^{-\beta(k-u)}\,\frac{\bigl[\beta(k-u)\bigr]^k}{k!} . \tag{3.8} \]

The renewal equation (3.3) for φ(u) means

\[ \varphi(u) \;=\; 1-\beta + \beta\int_0^{1\wedge u} \varphi(u-y)\,I(0\le y\le 1)\,dy \;=\; 1-\beta + \beta\int_{u-1\wedge u}^{u} \varphi(y)\,I(0\le u-y\le 1)\,dy \;=\; \begin{cases} 1-\beta+\beta\displaystyle\int_0^u \varphi(y)\,dy, & 0\le u\le 1,\\[2mm] 1-\beta+\beta\displaystyle\int_{u-1}^u \varphi(y)\,dy, & 1\le u<\infty. \end{cases} \]
For 0 ≤ u ≤ 1, differentiation yields φ′(u) = βφ(u), which together with the boundary condition φ(0) = 1 − β yields φ(u) = (1 − β)e^{βu}, so that (3.8) follows for 0 ≤ u ≤ 1. Assume (3.8) shown for n − 1 ≤ u ≤ n and let φ̃(u) denote the r.h.s. of (3.8). For n ≤ u ≤ n + 1, differentiation yields φ′(u) = βφ(u) − βφ(u − 1), and

\[
\begin{aligned}
\tilde\varphi'(u) &= (1-\beta)\sum_{k=0}^n \frac{d}{du}\,e^{-\beta(k-u)}\frac{\bigl[\beta(k-u)\bigr]^k}{k!} \\
&= (1-\beta)\beta e^{\beta u} + (1-\beta)\sum_{k=1}^n \beta e^{-\beta(k-u)}\frac{\bigl[\beta(k-u)\bigr]^k}{k!} - (1-\beta)\sum_{k=1}^n \beta e^{-\beta(k-u)}\frac{\bigl[\beta(k-u)\bigr]^{k-1}}{(k-1)!} \\
&= \beta\tilde\varphi(u) - \beta(1-\beta)\sum_{k=0}^{n-1} e^{-\beta(k-u+1)}\frac{\bigl[\beta(k-u+1)\bigr]^k}{k!} \\
&= \beta\tilde\varphi(u) - \beta\tilde\varphi(u-1) .
\end{aligned}
\]

Since φ̃(n) = φ(n) by the induction hypothesis, it follows that φ̃(u) = φ(u) for n ≤ u ≤ n + 1. □

Notes and references  Corollary 3.7 is identical to the formula for the M/D/1 waiting time distribution derived by Erlang [356]. See also Iversen & Staalhagen [496] for a discussion of computational aspects and further references.
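Corollary 3.7 is straightforward to evaluate numerically. The sketch below is illustrative (not from the book); it codes the formula directly and checks ψ(0) = ρ (Corollary 3.1) and monotone decrease in u:

```python
import math

def psi_deterministic(u, beta, mu):
    """Corollary 3.7: ruin probability when B is degenerate at mu
    (premium rate 1; requires rho = beta*mu < 1). Note that the terms with
    k < u/mu have negative base, raised to the integer power k."""
    rho = beta * mu
    s = sum(math.exp(-rho * (k - u / mu))
            * (rho * (k - u / mu)) ** k / math.factorial(k)
            for k in range(int(u // mu) + 1))
    return 1.0 - (1.0 - rho) * s
```

For β = 1/2, μ = 1 (ρ = 1/2) one gets ψ(0) = 1/2 and, e.g., ψ(1) ≈ 0.176, ψ(2) ≈ 0.053.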
4
Change of measure via exponential families
If X is a random variable with c.d.f. F and c.g.f.

\[ \kappa(\alpha) \;=\; \log Ee^{\alpha X} \;=\; \log\int_{-\infty}^\infty e^{\alpha x}\,F(dx) \;=\; \log\widehat F[\alpha], \]

the standard definition of the exponential family {Fθ} generated by F is

\[ F_\theta(dx) \;=\; e^{\theta x-\kappa(\theta)}\,F(dx), \tag{4.1} \]

or equivalently, in terms of the c.g.f. of Fθ,

\[ \kappa_\theta(\alpha) \;=\; \kappa(\alpha+\theta) - \kappa(\theta). \tag{4.2} \]

(Here θ is any number such that κ(θ) is well-defined.)
The adaptation of this construction to L´evy processes (such as {St }) has been carried out in III.3, but will now be repeated for the sake of selfcontainedness. We could first tentatively consider the claim surplus X= ¡ ¢ St for a single t, say b − 1 − α, and define κθ by t = 1: recall from Proposition 1.1 that κ(α) = β B[α] (4.2). The question then naturally arises whether κθ is the c.g.f. corresponding to a compound Poisson risk process in the sense that for a suitable arrival intensity βθ and a suitable claim size distribution Bθ we have bθ [α] − 1) − α. κθ (α) = κ(α + θ) − κ(θ) = βθ (B
(4.3)
The answer is yes: inserting in (4.2) shows that the solution is b βθ = β B[θ],
Bθ (dx) =
b eθx bθ [α] = B[α + θ] . (4.4) B(dx), or equivalently B b b B[θ] B[θ]
Repeating for t 6= 1, we just have to multiply (4.3) by t, and thus (4.4) works as well. Formalizing this for the purpose of studying the whole process {St }, we set up Definition 4.1 Let P be the probability measure on D[0, ∞) governing a given compound Poisson risk process with arrival intensity β and claim size distribution B, and define βθ , Bθ by (4.4). Then Pθ denotes the probability measure governing the compound Poisson risk process with arrival intensity βθ and claim size distribution Bθ ; the corresponding expectation operator is Eθ . The following result (Proposition 4.2, with T taking the role of n) is the analogue of the expression © ª exp θ(x1 + · · · + xn ) − nκ(θ) (4.5) for the density of n i.i.d. replications from Fθ (replace x by xi in (4.1) and multiply from 1 to n). Let FT = σ(St : t ≤ T ) denote the σalgebra spanned by the St , t ≤ T , and (T ) Pθ the restriction of Pθ to FT . (T )
Proposition 4.2 For any fixed T, the P_θ^{(T)} are mutually equivalent on F_T, and

    dP_θ^{(T)} / dP^{(T)} = exp{θS_T − Tκ(θ)}.

That is, for G ∈ F_T,

    P(G) = P_0(G) = E_θ[exp{−θS_T + Tκ(θ)}; G].                           (4.6)
CHAPTER IV. THE COMPOUND POISSON MODEL
Proof. We must prove that if Z is F_T-measurable, then

    E_θ Z = E[Z e^{θS_T − Tκ(θ)}].                                        (4.7)
By standard measure theory, it suffices to consider the case where Z is measurable w.r.t. F_T^{(n)} = σ(S_{kT/n} : k = 0, 1, . . . , n) for a given n. Let X_k = S_{kT/n} − S_{(k−1)T/n}. Then the X_k are i.i.d. with common c.g.f. Tκ(α)/n, Z is measurable w.r.t. σ(X_1, . . . , X_n), and thus (4.7) follows by discrete exponential family theory, in particular the expression (4.5) for the density. The identity (4.6) now follows by taking Z = e^{−θS_T + Tκ(θ)} I(G).  □

Theorem 4.3 Let τ be any stopping time and let G ∈ F_τ, G ⊆ {τ < ∞}. Then

    P(G) = P_0(G) = E_θ[exp{−θS_τ + τκ(θ)}; G].                           (4.8)

Proof. We first note that for any fixed t,

    E_θ e^{−θS_t + tκ(θ)} = 1.                                            (4.9)
Now assume first that G ⊆ {τ ≤ T} for some deterministic T. Then G ∈ F_T, and hence (4.6) holds. Given F_τ, t = T − τ is deterministic. Thus by (4.9),

    E_θ[exp{−θ(S_T − S_τ) + (T − τ)κ(θ)} | F_τ] = 1,

so that P(G) equals

    E_θ[exp{−θS_T + Tκ(θ)} I(G)]
        = E_θ[exp{−θS_τ + τκ(θ)} I(G) · E_θ[exp{−θ(S_T − S_τ) + (T − τ)κ(θ)} | F_τ]]
        = E_θ[exp{−θS_τ + τκ(θ)} I(G)].

Now consider a general G. Then G_T = G ∩ {τ ≤ T} satisfies G_T ∈ F_τ, G_T ⊆ {τ ≤ T}. Thus, according to what has just been proved, (4.8) holds with G replaced by G_T. Letting T ↑ ∞ and using monotone convergence then shows that (4.8) holds for G as well.  □
5
Lundberg conjugation
Being a c.g.f., κ(α) is a convex function of α. The behavior at zero is given by the first-order Taylor expansion

    κ(α) ≈ κ(0) + κ′(0)α = 0 + ES_1 · α = α(ρ − 1) = −ηα/(1 + η).

Thus, subject to the basic assumption η > 0 of a positive safety loading, the typical shape of κ is as in Fig. IV.1(a).
Figure IV.1

When the tail of the claim size distribution is exponentially bounded, then typically a γ > 0 satisfying

    0 = κ(γ) = β(B̂[γ] − 1) − γ                                           (5.1)

exists. Equation (5.1) is known as the Lundberg equation and plays a fundamental role in risk theory; an equivalent version, illustrated in Fig. IV.2, is

    B̂[γ] = 1 + γ/β.                                                      (5.2)
Figure IV.2

As a mnemonic, we write P_L instead of P_γ, β_L instead of β_γ and so on in the following. Note that

    κ_L(α) = β_L(B̂_L[α] − 1) − α = κ(α + γ),

cf. Fig. IV.1(b). An established terminology is to call γ the adjustment coefficient, but there are various alternatives around, e.g. the Lundberg exponent.
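Outside a few explicit cases, (5.1) has to be solved numerically. A minimal Python sketch (function names are ours, not the book's) finds γ by bisection, exploiting that κ is convex with κ(0) = 0, negative on (0, γ) and positive above γ:

```python
def lundberg_gamma(beta, B_hat, hi, tol=1e-12):
    """Positive root gamma of the Lundberg equation (5.1),
        kappa(a) = beta*(B_hat(a) - 1) - a = 0,
    found by bisection.  By convexity, kappa < 0 on (0, gamma) and > 0 above,
    so `hi` must be any point with kappa(hi) > 0 inside the domain of B_hat."""
    kappa = lambda a: beta * (B_hat(a) - 1.0) - a
    lo = hi / 2.0
    while kappa(lo) >= 0.0:      # walk left until we are below the root
        lo /= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kappa(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Exponential claims with rate delta: B_hat[a] = delta/(delta - a), and
# (as found explicitly in Example 5.1 below) gamma = delta - beta.
delta, beta = 3.0, 1.0
gamma = lundberg_gamma(beta, lambda a: delta / (delta - a), hi=2.5)
```

The starting point `hi` must lie strictly inside the domain of convergence of B̂ (here hi < δ), since B̂[α] = ∞ beyond it.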
Example 5.1 Consider the case of exponential claims, B̂[r] = δ/(δ − r). It is then readily seen that the nonzero solution of (5.1) (or (5.2)) is γ = δ − β. Thus B̂[γ] = δ/β, and (4.4) yields β_L = δ and that B_L is again exponential with rate δ_L = β. Thus, Lundberg conjugation corresponds to interchanging the rates of the interarrival times and the claim sizes.  □

It is a crucial fact that when governed by P_L, the claim surplus process has positive drift

    E_L S_1 = κ′_L(0) = κ′(γ) > 0,                                        (5.3)

cf. Fig. IV.1(b). Taking τ = τ(u), G = {τ(u) < ∞} in Theorem 4.3, we further note that (5.1) is precisely what is needed for one of the terms in the exponent to vanish, so that Theorem 4.3 takes a particularly simple form,

    ψ(u) = P(τ(u) < ∞) = E_L[exp{−γS_{τ(u)}}; τ(u) < ∞].

Letting ξ(u) = S_{τ(u)} − u be the overshoot and noting that P_L(τ(u) < ∞) = 1 by (5.3), we can rewrite this as

    ψ(u) = e^{−γu} E_L e^{−γξ(u)},                                        (5.4)
see also III.(1.5).

Theorem 5.2 (Lundberg's inequality) For all u ≥ 0, ψ(u) ≤ e^{−γu}.

Proof. Just note that ξ(u) ≥ 0 in (5.4).  □
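Identity (5.4) is also useful computationally: under P_L ruin is certain, so ψ(u) can be estimated by ordinary Monte Carlo without any rare-event difficulty. A minimal sketch for exponential claims (all names ours; premium rate 1 assumed, and the conjugate parameters β_L = δ, δ_L = β from Example 5.1 are used):

```python
import math, random

def psi_importance_sampling(u, beta, delta, n_paths=20_000, seed=1):
    """Estimate psi(u) = e^{-gamma u} E_L e^{-gamma xi(u)}  (eq. (5.4))
    for exponential claims with rate delta, by simulating the claim surplus
    under the Lundberg-conjugate measure P_L (arrival rate delta, claims
    exponential with rate beta), where the drift is positive."""
    rng = random.Random(seed)
    gamma = delta - beta                     # adjustment coefficient, Example 5.1
    total = 0.0
    for _ in range(n_paths):
        s = 0.0
        while True:
            s -= rng.expovariate(delta)      # premium inflow between claims
            s += rng.expovariate(beta)       # claim size under P_L
            if s > u:                        # level u crossed: record overshoot
                total += math.exp(-gamma * (s - u))
                break
    return math.exp(-gamma * u) * total / n_paths
```

Since crossings of level u can only occur at claim instants, it suffices to inspect the surplus just after each claim; positive drift under P_L guarantees that every path terminates.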
Theorem 5.3 (the Cramér–Lundberg approximation) ψ(u) ∼ Ce^{−γu} as u → ∞, where

    C = (1 − ρ) / (γ ∫_0^∞ x e^{γx} βB̄(x) dx) = (1 − ρ)/(βB̂′[γ] − 1).    (5.5)
Proof. By renewal theory, see A.1e, ξ(u) has a limit ξ(∞) (in the sense of weak convergence w.r.t. P_L) with density

    (1 − G_+^{(L)}(x)) / μ_+^{(L)} = Ḡ_+^{(L)}(x) / μ_+^{(L)},

where G_+^{(L)} is the P_L-ascending ladder height distribution and μ_+^{(L)} its mean. Since e^{−γx} is continuous and bounded, we therefore have E_L e^{−γξ(u)} → C where

    C = E_L e^{−γξ(∞)} = (1/μ_+^{(L)}) ∫_0^∞ e^{−γx} (1 − G_+^{(L)}(x)) dx
      = (1/(γμ_+^{(L)})) ∫_0^∞ (1 − e^{−γx}) G_+^{(L)}(dx),               (5.6)
and all that is needed is to check that (5.6) is the same as (5.5). To that end, take first θ = γ, τ = τ_+, G = {S_{τ_+} ∈ A} in Theorem 4.3. Then

    P(S_{τ_+} ∈ A) = E_L[exp{−γS_{τ_+}}; S_{τ_+} ∈ A],

which shows that

    G_+^{(L)}(dx) = e^{γx} G_+(dx) = e^{γx} βB̄(x) dx.                    (5.7)

In principle, this solves the problem of evaluating (5.6), but some tedious (though elementary) calculations remain to bring the expressions on a final form. Noting that ‖G_+^{(L)}‖ = 1 because of (5.3), we get

    ∫_0^∞ (1 − e^{−γx}) G_+^{(L)}(dx) = 1 − ∫_0^∞ βB̄(x) dx = 1 − ρ.
Using (5.7) yields

    μ_+^{(L)} = β ∫_0^∞ x e^{γx} B̄(x) dx = βϕ′(γ),                       (5.8)

where

    ϕ(α) = ∫_0^∞ e^{αx} B̄(x) dx = (B̂[α] − 1)/α,                         (5.9)

so that

    ϕ′(γ) = (γB̂′[γ] − (B̂[γ] − 1))/γ² = (B̂′[γ] − 1/β)/γ

(using (5.1)), and

    γμ_+^{(L)} = γβ (B̂′[γ] − 1/β)/γ = βB̂′[γ] − 1.                       (5.10)  □
Example 5.4 Consider first the exponential case b(x) = δe^{−δx}. Then ψ(u) = ρe^{−(δ−β)u} where ρ = β/δ. From this it follows, of course, that γ = δ − β (this was already found in Example 5.1 above) and that C = ρ. A direct proof of C = ρ is of course easy:

    B̂′[γ] = (d/dγ) δ/(δ − γ) = δ/(δ − γ)² = δ/β²,

    C = (1 − ρ)/(βB̂′[γ] − 1) = (1 − ρ)/(βδ/β² − 1) = (1 − ρ)/(1/ρ − 1) = ρ.

The accuracy of Lundberg's inequality in the exponential case thus depends on how close ρ is to one, or equivalently on how close the safety loading η is to zero.  □
Remark 5.5 Noting that

    ρ_L − 1 = β_L μ_{B_L} − 1 = κ′_L(0) = κ′(γ) = βB̂′[γ] − 1,

we can rewrite the Cramér–Lundberg constant C in the nice symmetrical form

    C = −κ′(0)/κ′(γ) = (1 − ρ)/(ρ_L − 1).                                 (5.11)  □
Remark 5.6 Let ψ̂[−s] = ∫_0^∞ e^{−su} ψ(u) du denote the Laplace transform of the ruin probability. Obviously, the Laplace transform of ψ(u)e^{γu} is then ψ̂[−s + γ]. Since, by the damping property of Laplace transforms, lim_{u→∞} f(u) = lim_{s→0} s f̂[−s] for any function f(u) for which this limit exists, we can also determine the constant C in the Cramér–Lundberg approximation by

    C = lim_{s→0} s ψ̂[−s + γ],

which from (3.4) again gives (5.5). Although it looks tempting to use this procedure for determining C in more general models where γ exists and only the Laplace transform of ψ may be available explicitly, it is important to note that this procedure does not prove the Cramér–Lundberg approximation, but just gives the correct value of C in case the approximation holds (the approximation itself usually has to be established by other techniques and often only exists in a weaker logarithmic sense, cf. Chapter XIII). For a related method to obtain the asymptotic behavior of ψ(u) for regularly varying claims, see Chapter X.  □

In Chapter V, we shall need the following result, which follows by a variant of the calculations in the proof of Theorem 5.3:

Lemma 5.7 For α ≠ γ,

    E_L e^{−αξ(∞)} = (γ/(ακ′(γ))) (1 − β (B̂[γ − α] − 1)/(γ − α)).
Proof. Replacing γ by α in (5.6) and using (5.7), we obtain

    E_L e^{−αξ(∞)} = (1/(αμ_+^{(L)})) (1 − ∫_0^∞ e^{(γ−α)x} βB̄(x) dx)
                   = (1/(αμ_+^{(L)})) (1 − β (B̂[γ − α] − 1)/(γ − α)),
using integration by parts as in (3.5) in the last step. Inserting (5.10), the result follows.  □

Notes and references  The results of this section are classical, with Lundberg's inequality first given in Lundberg [615] and the Cramér–Lundberg approximation in Cramér [265]. Extensions and generalizations are main topics in the area of ruin probabilities, and numerous such results can be found later in this book; in particular, see Sections V.4, VI.3, VII.3, VII.6, XI.2, and XII.2–3. The mathematical approach we have taken is more recent in risk theory (some of the classical ones can be found in the next subsection). The techniques are basically standard ones from sequential analysis, see for example Wald [869] and Siegmund [810].
5a
Alternative proofs
For the sake of completeness, we shall here give some classical proofs, first one of Lundberg's inequality which is slightly longer but maybe also slightly more elementary:

Alternative proof of Lundberg's inequality. Let X be the value of {S_t} just after the first claim, F(x) = P(X ≤ x). Then, since X is the independent difference U − T between a claim U and an interarrival time T,

    F̂[γ] = E e^{γ(U−T)} = E e^{γU} · E e^{−γT} = B̂[γ] · β/(β + γ) = 1,

where the last equality follows from the Lundberg equation κ(γ) = 0. Let ψ^{(n)}(u) denote the probability of ruin after at most n claims. Conditioning upon the value x of X and considering the cases x > u and x ≤ u separately yields

    ψ^{(n+1)}(u) = F̄(u) + ∫_{−∞}^{u} ψ^{(n)}(u − x) F(dx).

We claim that this implies ψ^{(n)}(u) ≤ e^{−γu}, which completes the proof since ψ(u) = lim_{n→∞} ψ^{(n)}(u). Indeed, this is obvious for n = 0 since ψ^{(0)}(u) = 0. Assuming it proved for n, we get

    ψ^{(n+1)}(u) ≤ F̄(u) + ∫_{−∞}^{u} e^{−γ(u−x)} F(dx)
                ≤ e^{−γu} ∫_{u}^{∞} e^{γx} F(dx) + e^{−γu} ∫_{−∞}^{u} e^{γx} F(dx)
                = e^{−γu} F̂[γ] = e^{−γu}.  □
Of further proofs of Lundberg’s inequality, we mention in particular the martingale approach, see III.1. Next consider the Cram´erLundberg approximation. Here the most standard proof is via the renewal equation in Corollary 3.3 (however, as will be seen, the calculations needed to identify the constant C are precisely the same as above): Alternative proof of the Cram´erLundberg’s approximation. Recall from Corollary 3.3 that Z ∞ Z u B(x) dx + ψ(u) = β ψ(u − x)βB(x) dx. u
0
Multiplying by eγu and letting Z γu
γu
∞
Z(u) = e ψ(u), z(u) = e β
B(x) dx, F (dx) = eγx βB(x) dx,
u
we can rewrite this as Z Z(u)
u
= z(u) +
eγ(u−x) ψ(u − x) · eγx βB(x) dx,
Z0 u = z(u) +
Z(u − x)F (dx) , 0
i.e. Z = z + F ∗ Z. Note that by (5.9) and the Lundberg equation, γ is precisely the correct exponent which will ensure that F is a proper distribution (kF k = 1). It is then a matter of routine to verify the conditions of the key renewal theorem R∞ (Proposition A1.1) to conclude that Z(u) has the limit C = 0 z(x)dx/µF , so that it only remains to check that C reduces to the expression given above. (L) However, µF is immediately seen to be the same as µ+ calculated in (5.8), whereas Z ∞ Z ∞ Z x Z ∞ Z ∞ B(x) dx = B(x) dx z(u) du = βeγu du βeγu du 0 0 0 0 u Z ∞ i β βh1 b (B[γ] − 1) − µB B(x) (eγx − 1) dx = = γ γ γ 0 h i 1−ρ β 1 − µB = , = γ β γ using the Lundberg equation and the calculations in (5.9). Easy calculus now gives (5.5). 2
Notes and references  Another related, but slightly different proof of the Cramér–Lundberg approximation in the spirit of Feller [362], utilizing Blackwell's renewal theorem, can be found in Albrecher & Teugels [36]. The asymptotic behavior of the ruin probability for heavy-tailed claims will be discussed in X.2.
6
Further topics related to the adjustment coefficient
6a
On the existence of γ
In order that the adjustment coefficient γ exists, it is of course necessary that B is light-tailed in the sense of I.2a, i.e. that B̂[α] < ∞ for some α > 0. This excludes heavy-tailed distributions like the lognormal or Pareto, but may in many other cases not appear all that restrictive, and the following possibilities then occur:

1. B̂[α] < ∞ for all α < ∞.

2. There exists α* < ∞ such that B̂[α] < ∞ for all α < α* and B̂[α] = ∞ for all α ≥ α*.

3. There exists α* < ∞ such that B̂[α] < ∞ for all α ≤ α* and B̂[α] = ∞ for all α > α*.

In particular, monotone convergence yields B̂[α] ↑ ∞ as α ↑ ∞ in case 1, and B̂[α] ↑ ∞ as α ↑ α* in case 2 (in exponential family theory, this is often referred to as the steep case). Thus the existence of γ is automatic in cases 1, 2; standard examples are distributions with finite support or with tail satisfying B̄(x) = o(e^{−αx}) for all α in case 1, and phase-type or Gamma distributions in case 2. Case 3 may be felt to be rather atypical, but some non-pathological examples exist, for example the inverse Gaussian distribution (see Example 9.7 below for details). In case 3, γ exists provided B̂[α*] ≥ 1 + α*/β and not otherwise, that is, dependent on whether β is larger or smaller than the threshold value α*/(B̂[α*] − 1).

Notes and references  Ruin probabilities in case 3 with γ nonexistent are studied, e.g., by Borovkov [182, p. 132] and Embrechts & Veraverbeke [353]. To the present authors' mind, this is a somewhat special situation and therefore not treated in this book.
6b

Bounds and approximations for γ
Proposition 6.1 If the adjustment coefficient γ exists, then γ < 2ημ_B/μ_B^{(2)}.

Proof. Since the claims are positive, B̂[γ] = E e^{γU} > 1 + μ_B γ + μ_B^{(2)} γ²/2. Hence

    1 = β(B̂[γ] − 1)/γ > β(γμ_B + γ²μ_B^{(2)}/2)/γ = βμ_B + βγμ_B^{(2)}/2,     (6.1)

from which the result immediately follows, since 2(1 − ρ)/(βμ_B^{(2)}) = 2ημ_B/μ_B^{(2)}.  □
The upper bound in Proposition 6.1 is also an approximation for small safety loadings (heavy traffic, cf. Section 7c):

Proposition 6.2 Let B be fixed but assume that β = β(η) varies with the safety loading such that β = 1/(μ_B(1 + η)). Then as η ↓ 0,

    γ = γ(η) ∼ 2ημ_B/μ_B^{(2)}.                                           (6.2)

Further, the Cramér–Lundberg constant satisfies C = C(η) ∼ 1.

Proof. Since ψ(u) → 1 as η ↓ 0, it follows from Lundberg's inequality that γ → 0. Hence by Taylor expansion, the inequality in (6.1) is also an approximation, so that

    1 = β(B̂[γ] − 1)/γ ≈ β(γμ_B + γ²μ_B^{(2)}/2)/γ = ρ + βγμ_B^{(2)}/2,

    γ ≈ 2(1 − ρ)/(βμ_B^{(2)}) ∼ 2ημ_B/μ_B^{(2)}.

That C → 1 easily follows from γ → 0 and C = E_L e^{−γξ(∞)} (in the limit, ξ(∞) is distributed as the overshoot corresponding to η = 0). For an alternative analytic proof, note that

    C = (1 − ρ)/(βB̂′[γ] − 1) = ημ_B/(B̂′[γ] − 1/β)
      ≈ ημ_B/(μ_B + γμ_B^{(2)} − μ_B(1 + η)) = η/(γμ_B^{(2)}/μ_B − η)
      ≈ η/(2η − η) = 1.  □
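For exponential claims the bound and the approximation can be checked in closed form: there γ = δη/(1 + η) exactly, while 2ημ_B/μ_B^{(2)} = δη. A minimal sketch (names ours) comparing the two as η shrinks:

```python
def gamma_exact_and_approx(delta, eta):
    """Exponential claims with rate delta: mu_B = 1/delta, mu_B^{(2)} = 2/delta^2.
    Returns the exact adjustment coefficient gamma = delta - beta(eta)
    (= delta*eta/(1+eta)) and the Proposition 6.2 approximation 2*eta*mu_B/mu_B2
    (= delta*eta), which by Proposition 6.1 is also an upper bound."""
    mu1, mu2 = 1.0 / delta, 2.0 / delta**2
    beta = 1.0 / (mu1 * (1.0 + eta))      # beta(eta) as in Proposition 6.2
    exact = delta - beta
    approx = 2.0 * eta * mu1 / mu2
    return exact, approx

for eta in (0.5, 0.1, 0.01):
    exact, approx = gamma_exact_and_approx(3.0, eta)
    assert exact < approx                 # the Proposition 6.1 upper bound
```

The relative error of (6.2) here is exactly η/(1 + η), illustrating that the approximation is only reliable for small safety loadings.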
Obviously, the approximation (6.2) is easier to calculate than γ itself. However, it needs to be used with caution, say in Lundberg's inequality or the Cramér–Lundberg approximation, in particular when u is large.
6c
A refinement of Lundberg’s inequality
The following result gives a sharpening of Lundberg's inequality (because obviously C_+ ≤ 1) as well as a supplementary lower bound:

Theorem 6.3  C_− e^{−γu} ≤ ψ(u) ≤ C_+ e^{−γu}, where

    C_− = inf_{x≥0} B̄(x) / ∫_x^∞ e^{γ(y−x)} B(dy),   C_+ = sup_{x≥0} B̄(x) / ∫_x^∞ e^{γ(y−x)} B(dy).

Proof. Let H(dt, dx) be the P_L-distribution of the time τ(u) of ruin and the reserve u − S_{τ(u)−} just before ruin. Given τ(u) = t, u − S_{τ(u)−} = x, a claim occurs at time t and has distribution B_L(dy)/B̄_L(x), y > x. Hence

    E_L e^{−γξ(u)} = ∫_0^∞ ∫_0^∞ H(dt, dx) ∫_x^∞ e^{−γ(y−x)} B_L(dy)/B̄_L(x)
                   = ∫_0^∞ ∫_0^∞ H(dt, dx) ∫_x^∞ B(dy) / (e^{−γx} B̄_L(x) B̂[γ])
                   = ∫_0^∞ ∫_0^∞ H(dt, dx) · B̄(x) / ∫_x^∞ e^{γ(y−x)} B(dy)
                   ≤ C_+ ∫_0^∞ ∫_0^∞ H(dt, dx) = C_+.

The upper bound then follows from ψ(u) = e^{−γu} E_L e^{−γξ(u)}, and the proof of the lower bound is similar.  □

Example 6.4 If B̄(x) = e^{−δx}, then an explicit calculation easily shows that

    B̄(x) / ∫_x^∞ e^{γ(y−x)} B(dy) = e^{−δx} / ∫_x^∞ e^{(δ−β)(y−x)} δe^{−δy} dy = β/δ = ρ.

Hence C_− = C_+ = ρ, so that the bounds in Theorem 6.3 collapse and yield the exact expression ρe^{−γu} for ψ(u).  □

The following concluding example illustrates a variety of the topics discussed above (though from a general point of view the calculations are deceivingly simple: typically, γ and other quantities will have to be calculated numerically).
Example 6.5 Assume as for (3.1) that β = 3 and

    b(x) = (1/2)·3e^{−3x} + (1/2)·7e^{−7x},

and recall that the ruin probability is

    ψ(u) = (24/35)e^{−u} + (1/35)e^{−6u}.

Since the dominant term is (24/35)e^{−u}, it follows immediately that γ = 1 and C = 24/35 = 0.686 (also, bounding e^{−6u} by e^{−u} confirms Lundberg's inequality). For a direct verification, note that the Lundberg equation is

    γ = β(B̂[γ] − 1) = 3((1/2)·3/(3 − γ) + (1/2)·7/(7 − γ) − 1),

which after some elementary algebra leads to the cubic equation 2γ³ − 14γ² + 12γ = 0 with roots 0, 1, 6. Thus indeed γ = 1 (6 is not in the domain of convergence of B̂[γ] and is therefore excluded). Further,

    1 − ρ = 1 − βμ_B = 1 − 3((1/2)·(1/3) + (1/2)·(1/7)) = 2/7,

    B̂′[γ] = [(1/2)·3/(3 − α)² + (1/2)·7/(7 − α)²]_{α=γ=1} = 17/36,

    C = (1 − ρ)/(βB̂′[γ] − 1) = (2/7)/(3·(17/36) − 1) = 24/35.

For Theorem 6.3, note that the function

    B̄(u) / ∫_u^∞ e^{y−u} b(y) dy = (3 + 3e^{−4u}) / (9/2 + (7/2)e^{−4u})

attains its minimum C_− = 2/3 = 0.667 for u = ∞ and its maximum C_+ = 3/4 = 0.750 for u = 0, so that 0.667 ≤ C ≤ 0.750, in accordance with C = 0.686.  □

Notes and references  Theorem 6.3 is from Taylor [836]. Closely related results are given in a queueing setting in Kingman [535], Ross [748] and Rossberg & Siegel [749].
Some further references on variants and extensions of Lundberg's inequality are Kaas & Goovaerts [514], Willmot [886], Cai & Garrido [218], Dickson [306], Kalashnikov [516, 518] and Chadjiconstantinidis & Politis [227], all of which also go into aspects of the heavy-tailed case.
7
Various approximations for the ruin probability
7a
The Beekman–Bowers approximation

The idea is to write ψ(u) as P(M > u), fit a gamma distribution with parameters λ, δ to the distribution of M by matching the two first moments, and use the incomplete gamma function approximation

    ψ(u) ≈ (δ^λ/Γ(λ)) ∫_u^∞ x^{λ−1} e^{−δx} dx.

According to Corollary 3.5, this means that λ, δ are given by λ/δ = a_1, 2λ/δ² = a_2, where

    a_1 = ρμ_B^{(2)}/(2(1 − ρ)μ_B),   a_2 = ρμ_B^{(3)}/(3(1 − ρ)μ_B) + β²(μ_B^{(2)})²/(2(1 − ρ)²),

i.e. δ = 2a_1/a_2, λ = 2a_1²/a_2.

Notes and references  The approximation was introduced by Beekman [151], with the present version suggested by Bowers in the discussion of [151].
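A minimal implementation sketch (function names ours; the gamma tail is computed by straightforward Simpson quadrature rather than a library special function, and u > 0 is assumed since the integrand is singular at 0 when λ < 1):

```python
import math

def gamma_tail(lam, delt, u, n=20000, cutoff=50.0):
    """P(Gamma(lam, delt) > u) by Simpson's rule on [u, u + cutoff/delt]."""
    a, b = u, u + cutoff / delt
    h = (b - a) / n
    def f(x):
        return x ** (lam - 1.0) * math.exp(-delt * x)
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return delt ** lam / math.gamma(lam) * (s * h / 3.0)

def beekman_bowers(u, beta, mu1, mu2, mu3):
    """Beekman-Bowers approximation to psi(u): fit a gamma distribution to M
    through a1, a2 and return its tail probability at u (u > 0 assumed)."""
    rho = beta * mu1
    a1 = rho * mu2 / (2.0 * (1.0 - rho) * mu1)
    a2 = (rho * mu3 / (3.0 * (1.0 - rho) * mu1)
          + beta**2 * mu2**2 / (2.0 * (1.0 - rho)**2))
    delt, lam = 2.0 * a1 / a2, 2.0 * a1**2 / a2
    return gamma_tail(lam, delt, u)
```

For exponential claims with rate d one gets λ = ρ and δ = (1 − ρ)d, so the fitted tail decays at exactly the true rate, and the approximation is close to (but not identical with) the exact ψ(u) = ρe^{−(1−ρ)du}.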
7b
De Vylder’s approximation
Given a risk process with parameters β, B, p = 1, the idea is to approximate the ruin probability with the one for a different process with exponential claims, say with rate parameter δ̃, arrival intensity β̃ and premium rate p̃. In order to make the processes look as much as possible alike, we make the first three cumulants match, which according to Proposition 1.1 means

    β̃/δ̃ − p̃ = βμ_B − 1 = ρ − 1,   2β̃/δ̃² = βμ_B^{(2)},   6β̃/δ̃³ = βμ_B^{(3)}.

These three equations have solutions

    δ̃ = 3μ_B^{(2)}/μ_B^{(3)},   β̃ = 9β(μ_B^{(2)})³/(2(μ_B^{(3)})²),   p̃ = 3β(μ_B^{(2)})²/(2μ_B^{(3)}) − ρ + 1.     (7.1)
Letting β* = β̃/p̃, ρ* = β*/δ̃, the approximating risk process has ruin probability ψ̃(u) = ρ* e^{−(δ̃−β*)u}, cf. Proposition I.1.3 and Corollary 3.2, and hence the ruin probability approximation is

    ψ(u) ≈ (β̃/(p̃δ̃)) e^{−(δ̃ − β̃/p̃)u}.                                   (7.2)

Notes and references  The approximation (7.2) was suggested by De Vylder [299]. Though it is of course based upon purely empirical grounds, numerical evidence (e.g. Grandell [429, pp. 19–24], [432]) shows that it may produce surprisingly good results, in particular for light-tailed claim distributions. Extensions of this method to approximations with more general claim distributions are immediate, but there is a natural trade-off between complexity and accuracy of the approximation. For an investigation of the use of Coxian distributions of order two for the claim distribution of the approximating risk process, see Badescu & Stanford [120]. Due to its simplicity, the De Vylder approximation is also very popular for the study of effects of external mechanisms such as dividend payments and reinsurance on the probability of ruin (see for instance Beveridge, Dickson & Wu [161], Gerber, Shiu & Smith [413]). A related procedure is to approximate ψ(u) by a combination of two exponential terms, where one of them is the Cramér–Lundberg approximation (5.5) and the coefficient and exponent of the other are determined by matching E[M] and the mass of M in 0. This leads to the so-called Tijms approximation, see [852] and Lin & Willmot [892, Ch. 8].
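Formulas (7.1)–(7.2) translate directly into code. A minimal sketch (names ours); note that when B itself is exponential, (7.1) returns the original parameters, so the approximation is then exact:

```python
import math

def de_vylder_psi(u, beta, mu1, mu2, mu3):
    """De Vylder approximation (7.1)-(7.2): replace the claim distribution by
    an exponential one, matching the first three cumulants of the claim
    surplus process (mu1, mu2, mu3 are the first three claim moments)."""
    rho = beta * mu1
    delta_t = 3.0 * mu2 / mu3                             # exponential rate
    beta_t = 9.0 * beta * mu2**3 / (2.0 * mu3**2)         # arrival intensity
    p_t = 3.0 * beta * mu2**2 / (2.0 * mu3) - rho + 1.0   # premium rate
    beta_star = beta_t / p_t
    return beta_star / delta_t * math.exp(-(delta_t - beta_star) * u)
```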
7c
The heavy traffic approximation
The term heavy traffic comes from queueing theory, but has an obvious interpretation also in risk theory: on the average, the premiums exceed the expected claims only slightly. That is, heavy traffic conditions mean that the safety loading η is positive but small, or equivalently that β is only slightly smaller than β_max = 1/μ_B. Mathematically, we shall represent this situation by a limit where β ↑ β_max but B is fixed.

Proposition 7.1 As β ↑ β_max, (β_max − β)M converges in distribution to the exponential distribution with rate δ = 2μ_B²/μ_B^{(2)}.

Proof. Note first that 1 − ρ = (β_max − β)μ_B. Letting B_0 be the stationary excess life distribution, we have according to the Pollaczeck–Khinchine formula in the
form (3.6) that

    E e^{s(β_max−β)M} = (1 − ρ)/(1 − ρB̂_0[s(β_max − β)])
                      = (1 − ρ)/(1 − ρ + ρ(1 − B̂_0[s(β_max − β)]))
                      ≈ (1 − ρ)/(1 − ρ − ρs(β_max − β)μ_{B_0})
                      ≈ (1 − ρ)/(1 − ρ − s(β_max − β)μ_{B_0})
                      = μ_B/(μ_B − sμ_{B_0}) = δ/(δ − s),

where δ = μ_B/μ_{B_0} = 2μ_B²/μ_B^{(2)}.  □
Corollary 7.2 If β ↑ β_max, u → ∞ in such a way that (β_max − β)u → v, then ψ(u) → e^{−δv}.

Proof. Write ψ(u) as P((β_max − β)M > (β_max − β)u).  □
These results suggest the approximation

    ψ(u) ≈ e^{−δ(β_max−β)u}.                                              (7.3)
It is worth noting that this is essentially the same as the approximation

    ψ(u) ≈ Ce^{−γu} ≈ e^{−2uημ_B/μ_B^{(2)}}                               (7.4)

suggested by the Cramér–Lundberg approximation and Proposition 6.2. This follows since η = 1/ρ − 1 ≈ 1 − ρ, and hence

    δ(β_max − β) = (2μ_B²/μ_B^{(2)}) · (1 − ρ)/μ_B ≈ 2ημ_B/μ_B^{(2)}.
However, obviously Corollary 7.2 provides the better mathematical foundation.

Notes and references  Heavy traffic limit theory for queues goes back to Kingman [534]. The present situation of Poisson arrivals is somewhat more elementary to deal with than the renewal case (see e.g. [APQ, X.7]). We return to heavy traffic from a different point of view (diffusion approximations) in Chapter V and give further references there. In the setting of risk theory, the first results of heavy traffic type seem to be due to Hadwiger [445]. Numerical evidence shows that the fit of (7.3) is reasonable for η being say 10–20% and u being small or moderate, while the approximation may be far off for large u.
7d

The light traffic approximation
As for heavy traffic, the term light traffic comes from queueing theory, but has an obvious interpretation also in risk theory: on the average, the premiums are much larger than the expected claims. That is, light traffic conditions mean that the safety loading η is positive and large, or equivalently that ρ = βμ_B is small. Mathematically, we shall represent this situation by a limit where β ↓ 0 but B is fixed. Of course, in risk theory heavy traffic is most often argued to be the more typical case. However, light traffic is of some interest as a complement to heavy traffic, as well as being needed for the interpolation approximation to be studied in the next subsection.

Proposition 7.3 As β ↓ 0,

    ψ(u) ≈ β ∫_u^∞ B̄(x) dx = βE[U − u; U > u] = βE(U − u)^+.              (7.5)
Proof. According to the Pollaczeck–Khinchine formula,

    ψ(u) = (1 − ρ) Σ_{n=1}^∞ β^n μ_B^n B̄_0^{*n}(u) ≈ Σ_{n=1}^∞ β^n μ_B^n B̄_0^{*n}(u).

Asymptotically, Σ_{n=2}^∞ · · · = O(β²), so that only the first term matters, and hence

    ψ(u) ≈ βμ_B B̄_0(u) = β ∫_u^∞ B̄(x) dx.

The alternative expressions in (7.5) follow by integration by parts.  □
Note that heuristically the light traffic approximation in Proposition 7.3 is the same which comes out by saying that basically ruin can only occur at the time T of the first claim, i.e. ψ(u) ≈ P(U − T > u). Indeed, by monotone convergence,

    P(U − T > u) = ∫_0^∞ B̄(x + u) βe^{−βx} dx ≈ β ∫_u^∞ B̄(x) dx.
Notes and references Light traffic limit theory for queues was initiated by Bloomfield & Cox [178]. For a more comprehensive treatment, see Daley & Rolski [270, 271], Asmussen [61] and references therein. Again, the Poisson case is much easier than the renewal case. Another way to understand that the present analysis is much simpler than in these references is the fact that in the queueing setting light traffic theory is much easier for virtual waiting times (the probability of the conditioning event {M > 0} is explicit) than for actual waiting times, cf. Sigman [811]. Light traffic was first studied in risk theory in the first edition of this book.
7e

Interpolating between light and heavy traffic
We shall now outline an idea of how the heavy and light traffic approximations can be combined. The crude idea of interpolating between light and heavy traffic leads to

    ψ(u) ≈ (1 − β/β_max) lim_{β↓0} ψ(u) + (β/β_max) lim_{β↑β_max} ψ(u)
         = (1 − β/β_max)·0 + (β/β_max)·1 = β/β_max = ρ,

which is clearly useless. Instead, to get non-degenerate limits, we combine with our explicit knowledge of ψ(u) for the exponential claim size distribution E with the same mean μ_B as the given one B, that is, with rate 1/μ_B = β_max. Let ψ̃_LT^{(B)}(u) denote the light traffic approximation given by Proposition 7.3 and use similar notation for ψ^{(B)}(u) = ψ(u), ψ^{(E)}(u) = ρe^{−(β_max−β)u}, ψ̃_LT^{(E)}(u), ψ̃_HT^{(B)}(u), ψ̃_HT^{(E)}(u). Substituting v = u(β_max − β), we see that the following limits exist:

    lim_{β↑β_max} ψ̃_HT^{(B)}(v/(β_max − β)) / ψ̃_HT^{(E)}(v/(β_max − β))
        = e^{−δv} / e^{−(2μ_E²/μ_E^{(2)})v} = e^{(1−δ)v} = c_HT(v)  (say),

    lim_{β↓0} ψ̃_LT^{(B)}(v/(β_max − β)) / ψ̃_LT^{(E)}(v/(β_max − β))
        = ∫_{v/β_max}^∞ B̄(x) dx / ∫_{v/β_max}^∞ e^{−β_max x} dx
        = β_max e^{v} ∫_{v/β_max}^∞ B̄(x) dx = c_LT(v)  (say),

and the approximation we suggest is

    ψ(u) ≈ ψ^{(E)}(u) [(1 − β/β_max) c_LT(u(β_max − β)) + (β/β_max) c_HT(u(β_max − β))]
         = ρ(1 − ρ)β_max ∫_{u(1−ρ)}^∞ B̄(x) dx + ρ² e^{−δ(β_max−β)u}.     (7.6)

The particular features of this approximation are that it is exact for the exponential distribution and asymptotically correct both in light and heavy traffic. Thus, even if the safety loading is not very small, one may hope that some correction of the heavy traffic approximation has been obtained.

Notes and references  In the queueing setting, the idea of interpolating between light and heavy traffic is due to Burman & Smith [209, 210]. Another main queueing paper is Whitt [882], where further references can be found. The adaptation to risk theory is new; no empirical study of the fit of (7.6) is, however, available.
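The final form of (7.6) is easy to implement; a minimal sketch (names ours), where the integrated claim tail ∫_x^∞ B̄(y) dy must be supplied by the caller:

```python
import math

def interp_psi(u, beta, mu1, mu2, int_tail):
    """The light/heavy traffic interpolation (7.6).  int_tail(x) must return
    the integrated claim tail of B at x, i.e. the integral of B-bar over
    (x, infinity); mu1, mu2 are the first two claim moments."""
    beta_max = 1.0 / mu1
    rho = beta * mu1
    delta = 2.0 * mu1**2 / mu2       # heavy traffic rate from Proposition 7.1
    light = rho * (1.0 - rho) * beta_max * int_tail(u * (1.0 - rho))
    heavy = rho**2 * math.exp(-delta * (beta_max - beta) * u)
    return light + heavy

# Exponential claims with rate 3: (7.6) should be exact,
# psi(u) = (beta/3) * exp(-(3 - beta) * u).
beta, rate = 2.0, 3.0
approx = interp_psi(1.0, beta, 1.0 / rate, 2.0 / rate**2,
                    lambda x: math.exp(-rate * x) / rate)
```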
8

Comparing the risks of different claim size distributions
Given two claim size distributions B^{(1)}, B^{(2)}, we may ask which one carries the larger risk in the sense of larger values of the ruin probability ψ^{(i)}(u) for a fixed value of β. To this end, we shall need various ordering properties of distributions, for more detail and background on which we refer to Müller & Stoyan [653] or Shaked & Shanthikumar [795].

Recall that B^{(1)} is said to be stochastically smaller than B^{(2)} (in symbols, B^{(1)} ≺_st B^{(2)}) if B̄^{(1)}(x) ≤ B̄^{(2)}(x) for all x; equivalent characterizations are ∫f dB^{(1)} ≤ ∫f dB^{(2)} for any nondecreasing function f, or the existence of random variables U^{(1)}, U^{(2)} such that U^{(1)} has distribution B^{(1)}, U^{(2)} distribution B^{(2)} and U^{(1)} ≤ U^{(2)} a.s. A weaker concept is increasing convex ordering: B^{(1)} is said to be smaller than B^{(2)} in the increasing convex order (in symbols, B^{(1)} ≺_icx B^{(2)}) if

    ∫_x^∞ B̄^{(1)}(y) dy ≤ ∫_x^∞ B̄^{(2)}(y) dy                            (8.1)

for all x; an equivalent characterization is ∫f dB^{(1)} ≤ ∫f dB^{(2)} for any nondecreasing convex function f. In the literature on risk theory, the term stop-loss ordering is most often used instead of increasing convex ordering, because for a given distribution B one can interpret ∫_x^∞ B̄(y) dy as the net stop-loss premium in a stop-loss or excess-of-loss reinsurance arrangement with retention limit x, cf. XVI.4. Finally, we have the convex ordering: B^{(1)} is said to be convexly smaller than B^{(2)} (in symbols, B^{(1)} ≺_cx B^{(2)}) if ∫f dB^{(1)} ≤ ∫f dB^{(2)} for any convex function f. Rather than measuring difference in size, this ordering measures difference in variability. In particular (consider the convex functions x and −x), the definition implies that B^{(1)} and B^{(2)} must have the same mean, whereas (consider x²) B^{(2)} has the larger variance. One can show that if B^{(1)} and B^{(2)} have the same mean, then B^{(1)} ≺_icx B^{(2)} is equivalent to B^{(1)} ≺_cx B^{(2)}.

Proposition 8.1 If B^{(1)} ≺_st B^{(2)}, then ψ^{(1)}(u) ≤ ψ^{(2)}(u) for all u.

Proof. According to the above characterization of stochastic ordering, we can assume that S_t^{(1)} ≤ S_t^{(2)} for all t. In terms of the time to ruin, this implies τ^{(1)}(u) ≥ τ^{(2)}(u) for all u, so that {τ^{(1)}(u) < ∞} ⊆ {τ^{(2)}(u) < ∞}. Taking probabilities, the proof is complete.  □
Of course, Proposition 8.1 is quite weak, and a particular deficit is that we cannot compare the risks of claim size distributions with the same mean: if B^{(1)} ≺_st B^{(2)} and μ_{B^{(1)}} = μ_{B^{(2)}}, then B^{(1)} = B^{(2)}. Here convex ordering is useful:

Proposition 8.2 If B^{(1)} ≺_icx B^{(2)} and μ_{B^{(1)}} = μ_{B^{(2)}} (i.e. B^{(1)} ≺_cx B^{(2)}), then ψ^{(1)}(u) ≤ ψ^{(2)}(u) for all u.

Proof. Since the means are equal, say to μ, we have

    B̄_0^{(1)}(x) = (1/μ) ∫_x^∞ B̄^{(1)}(y) dy ≤ (1/μ) ∫_x^∞ B̄^{(2)}(y) dy = B̄_0^{(2)}(x).     (8.2)

I.e., B_0^{(1)} ≺_st B_0^{(2)}, which implies the same order relation for all convolution powers. Hence by the Pollaczeck–Khinchine formula,

    ψ^{(1)}(u) = (1 − ρ) Σ_{n=1}^∞ β^n μ^n B̄_0^{(1)*n}(u) ≤ (1 − ρ) Σ_{n=1}^∞ β^n μ^n B̄_0^{(2)*n}(u) = ψ^{(2)}(u).  □

Remark 8.3 From the proof above it is clear that ψ^{(1)}(u) ≤ ψ^{(2)}(u) still holds for all u if the assumption on the ordering of the claim size distributions is weakened to just asking for (8.2). Slightly more generally, the ordering defined by

    (1/μ_{B^{(1)}}) ∫_x^∞ B̄^{(1)}(y) dy ≤ (1/μ_{B^{(2)}}) ∫_x^∞ B̄^{(2)}(y) dy   for all x ≥ 0

is known as the harmonic mean residual life order and is sufficient for ψ^{(1)}(u) ≤ ψ^{(2)}(u) to hold as long as β_1 μ_{B^{(1)}} ≤ β_2 μ_{B^{(2)}}.  □
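Condition (8.1) is easy to verify numerically on a grid. A minimal spot-check (names ours) for the standard exponential distribution against the more variable hyperexponential distribution used in Example 8.6 below, both with (essentially) unit mean:

```python
import math

# B1 = Exp(1); B2 = hyperexponential 0.1*lam1*e^{-lam1 x} + 0.9*lam2*e^{-lam2 x}
LAM1, LAM2 = 0.1358, 3.4142

def int_tail_exp(x):
    """Integrated tail of B1 over (x, infinity)."""
    return math.exp(-x)

def int_tail_hyper(x):
    """Integrated tail of B2 over (x, infinity)."""
    return 0.1 * math.exp(-LAM1 * x) / LAM1 + 0.9 * math.exp(-LAM2 * x) / LAM2

# The rounded parameters make the two means agree only to about 1e-5,
# hence the small slack in the comparison.
stop_loss_ordered = all(
    int_tail_exp(x) <= int_tail_hyper(x) + 1e-4
    for x in [0.1 * k for k in range(200)]
)
```

Such a grid check is of course only indicative, not a proof, but it matches the expectation that the more variable B2 dominates in stop-loss order and hence, by Proposition 8.2, carries the larger ruin probabilities.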
102
CHAPTER IV. THE COMPOUND POISSON MODEL
Proof. If f is convex, we have by Jensen’s inequality that Ef (U ) ≥ f (EU ). This implies that D ≺cx B and we can apply Proposition 8.2. 2 A partial converse to Proposition 8.2 is the following: Proposition 8.5 If ψ (1) (u) ≤ ψ (2) (u) for all u and β, then B (1) ≺cx B (2) . Proof. Consider the light traffic approximation in Proposition 7.1.
2
We finally give a numerical example illustrating how differences in the claim size distribution B may lead to very different ruin probabilities even if we fix the mean µ = µB . Example 8.6 Fix β at 1/1.1 and µB at 1 so that the safety loading η is 10%, and consider the following claim size distributions: B1 : the standard exponential distribution with density e−x ; B2 : the hyperexponential distribution with density 0.1λ1 e−λ1 x + 0.9λ2 e−λ2 x where λ1 = 0.1358, λ2 = 3.4142; B3 : the Erlang distribution with density 4xe−2x ; B4 : the Pareto distribution with density 3/(1 + 2x)5/2 . Let uα denote the α fractile of the ruin function, i.e. ψ(uα ) = α, and consider α = 5%, 1%, 0.1%, 0.01%. One then obtains the following table: u0.05 u0.01 u0.001 u0.0001
B1 32 50 75 100
B2 181 282 425 568
B3 24 37 56 74
B4 35 70 245 1100
(the table was produced using simulation and the numbers are therefore subject to statistical uncertainty). Note to make the figures comparable, all distributions have mean 1. In terms of variances σk2 , we have 1 < σ12 = 1 < σ22 = 10 < σ42 = ∞ 2 so that in this sense B4 is the most variable. However, in comparison to B2 the effect on the uα does not show before α = 0.01%, which appears to be smaller than the range of interest in insurance risk (certainly not in queueing applications!), and this is presumably a consequence of a heavier tail rather than larger variance. For B1 , B2 , B3 the comparison is as expected from the intuition concerning the variability of these distributions, with the hyperexponential distribution being more variable than the exponential distribution and the Erlang distribution less. 2 σ32 =
Notes and references Further relevant references are Goovaerts et al. [425], van Heerwarden [454], Klüppelberg [539], Pellerey [689] and (for the convex ordering) Makowski [623]. For the harmonic mean residual life order, see Michel [636] and Trufin, Albrecher & Denuit [854]. For relations between higher-order stop-loss orderings of claim size distributions and ruin probabilities see Cheng & Pai [236]. Tsai [856] considers orderings in the presence of perturbations. We return to ordering of ruin probabilities in a special problem in VII.4 and also in XIII.8. For the situation that the claim size distribution and the Poisson parameter are unknown, but a sample of data points is available, Politis [709] considers the problem of semiparametric estimation of ruin probabilities.
9 Sensitivity estimates
In a broad setting, sensitivity analysis (or perturbation analysis) deals with the calculation of the derivative (the gradient in higher dimensions) of a performance measure s(θ) of a stochastic or deterministic system, the behavior of which is governed by a parameter θ. A standard example from queueing theory is a queueing network, with θ the vector of service rates at different nodes and routing probabilities, and s(θ) the expected sojourn time of a customer in the network. In the present setting, s(θ) is of course the ruin probability ψ = ψ(u) (with u fixed) and θ a set of parameters determining the arrival rate β, the premium rate p and the claim size distribution B. For example, we may be interested in ∂ψ/∂p for assessing the effects of a small change in the premium, or we may be interested in ∂ψ/∂β as a measure of the uncertainty on ψ if β is only approximately known, say estimated from data.

Example 9.1 Consider the case of claims which are exponential with rate δ (the premium rate is one). Then ψ = (β/δ)e^{−(δ−β)u}, and hence
∂ψ/∂β = (1/δ)e^{−(δ−β)u} + (βu/δ)e^{−(δ−β)u} = (1/β + u)ψ(u),
which is of the order of magnitude uψ(u) for large u. Assume for example that δ is known, while β = β̂ is an estimate, obtained say in the natural way as the empirical arrival rate Nt/t in [0, t]. Then if t is large, the distribution of β̂ − β is approximately normal N(0, β/t). Thus, if ψ̂ = (β̂/δ)e^{−(δ−β̂)u}, it follows that ψ̂ − ψ is approximately normal N(0, σ²/t), where
σ² = β(∂ψ/∂β)² ∼ βu²ψ².
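The computations in Example 9.1 are easy to script. The following sketch is our illustration (the parameter values are arbitrary): it checks the closed-form sensitivity against a central finite difference and evaluates the delta-method standard deviation √(β/t)·|∂ψ/∂β|.

```python
import math

def psi(u, beta, delta):
    """psi(u) = (beta/delta) * exp(-(delta - beta)*u) for Exp(delta) claims."""
    return (beta / delta) * math.exp(-(delta - beta) * u)

def dpsi_dbeta(u, beta, delta):
    """Closed form from Example 9.1: (1/beta + u) * psi(u)."""
    return (1.0 / beta + u) * psi(u, beta, delta)

def dpsi_fd(u, beta, delta, h=1e-6):
    """Central finite difference, as a sanity check on the formula."""
    return (psi(u, beta + h, delta) - psi(u, beta - h, delta)) / (2 * h)

u, beta, delta, t = 50.0, 1.0, 1.1, 10_000.0
# delta method: sd(psi_hat) ~ sqrt(beta/t) * |dpsi/dbeta|
sd = math.sqrt(beta / t) * dpsi_dbeta(u, beta, delta)
```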
CHAPTER IV. THE COMPOUND POISSON MODEL
In particular, the standard deviation of the normalized estimate ψ̂/ψ (the relative error) is approximately β^{1/2}u, i.e. increasing in u. Similar conclusions will be found below. 2

Proposition 9.2 Consider a risk process {Rt} with a general constant premium rate p. Then
∂ψ/∂p = −β ∂ψ/∂β,
where the partial derivatives are evaluated at p = 1.

Proof. This is an easy time transformation argument in a similar way as in Proposition I.1.3. Let Rt^{(p)} = R_{t/p}. Then the arrival rate β^{(p)} for {Rt^{(p)}} is β/p, and hence the effect of changing p from 1 to 1 + ∆p corresponds to changing β to β/(1 + ∆p) ≈ β(1 − ∆p). Thus at p = 1,
∂ψ/∂p = (∂β/∂p)(∂ψ/∂β) = −β ∂ψ/∂β. 2

As a consequence, it suffices to fix the premium at p = 1 and consider only the effects of changing β and/or B. In the case of the claim size distribution B, various parametric families of claim size distributions could be considered, but we shall concentrate on a special structure covering a number of important cases, namely that of a two-parameter exponential family of the form
B_{θ,ζ}(dx) = exp{θx + ζt(x) − ω(θ, ζ)} µ(dx),  x > 0,    (9.1)
(see Remark 9.8 below for some discussion of this assumption). Consider first the adjustment coefficient γ as a function of β, θ, ζ, and write γβ = ∂γ/∂β and so on. Similar notation for partial derivatives is used below, e.g. for the ruin probabilities ψ = ψ(u) and the Cramér–Lundberg constant C.

Proposition 9.3
γβ = γ/(β[1 − (β + γ)ωθ(θ + γ, ζ)]),    (9.2)
γθ = (β + γ)[ωθ(θ + γ, ζ) − ωθ(θ, ζ)]/[1 − (β + γ)ωθ(θ + γ, ζ)],    (9.3)
γζ = (β + γ)[ωζ(θ + γ, ζ) − ωζ(θ, ζ)]/[1 − (β + γ)ωθ(θ + γ, ζ)].    (9.4)
Proof. According to (9.8) below, we can rewrite the Lundberg equation as
ω(θ + γ, ζ) − ω(θ, ζ) = log(1 + γ/β).
Differentiating w.r.t. β yields
ωθ(θ + γ, ζ)γβ = (1/(1 + γ/β))(γβ/β − γ/β²).
From this (9.2) follows by straightforward algebra, and the proofs of (9.3), (9.4) are similar. 2

Now consider the ruin probability ψ = ψ(u) itself. Of course, we cannot expect in general to find explicit expressions like in Example 9.1 or Proposition 9.3, but must look for approximations for the sensitivities ψβ, ψθ, ψζ. The most intuitive approach is to rely on the accuracy of the Cramér–Lundberg approximation, so that heuristically we obtain
ψβ ≈ (∂/∂β)Ce^{−γu} = Cβe^{−γu} − uγβCe^{−γu} ≈ −uγβψ    (9.5)
as u → ∞. As will be seen below, this intuition is indeed correct. However, mathematically a proof is needed, basically to show that two limits (u → ∞ and the differentiation as limit of finite differences) are interchangeable. Consider first the case of ∂ψ/∂β:

Proposition 9.4 As u → ∞, it holds that
∂ψ/∂β ∼ ue^{−γu} γC²/(β(1 − ρ)).

Proof. We shall use the renewal equation (3.2) for ψ(u),
ψ(u) = β ∫_u^∞ B̄(x) dx + β ∫_0^u ψ(u − x)B̄(x) dx.    (9.6)
Letting ϕ = ∂ψ/∂β and differentiating (9.6), we get
ϕ(u) = ∫_u^∞ B̄(x) dx + ∫_0^u ψ(u − x)B̄(x) dx + β ∫_0^u ϕ(u − x)B̄(x) dx.
Proceeding in a similar way as in the proof of the Cramér–Lundberg approximation based upon (9.6) (Section 5), we multiply by e^{γu} and let Z(u) = e^{γu}ϕ(u), F(dx) = e^{γx}βB̄(x)dx and z = z1 + z2, where
z1(u) = e^{γu} ∫_u^∞ B̄(x) dx,   z2(u) = e^{γu} ∫_0^u ψ(u − x)B̄(x) dx.
Then Z = z + F ∗ Z and F is a proper probability distribution. By dominated convergence,
z2(u) = (1/β) ∫_0^u e^{γ(u−x)}ψ(u − x) F(dx) → (1/β) ∫_0^∞ C F(dx) = C/β
as u → ∞, and also z1(u) → 0 because of B̂[γ] < ∞. Hence by a variant of the key renewal theorem (Proposition A1.2 of the Appendix), Z(u)/u → C/(βµF) where µF is the mean of F. But from the proof of Theorem 5.3 (see in particular (5.10)), µF = (1 − ρ)/(Cγ). Combining these estimates, the proof is complete. 2

For the following, we note the formulas
E_{θ,ζ} t(U) = ωζ(θ, ζ),    (9.7)
E_{θ,ζ} e^{αU} = B̂_{θ,ζ}[α] = exp{ω(θ + α, ζ) − ω(θ, ζ)},    (9.8)
E_{θ,ζ} t(U)e^{αU} = ωζ(θ + α, ζ) exp{ω(θ + α, ζ) − ω(θ, ζ)},    (9.9)
which are well known and easy to show (see e.g. Barndorff-Nielsen [136]). Further write
dθ = [ωθ(θ + γ, ζ) − ωθ(θ, ζ)] exp{ω(θ + γ, ζ) − ω(θ, ζ)} = [ωθ(θ + γ, ζ) − ωθ(θ, ζ)](1 + γ/β),
dζ = [ωζ(θ + γ, ζ) − ωζ(θ, ζ)] exp{ω(θ + γ, ζ) − ω(θ, ζ)} = [ωζ(θ + γ, ζ) − ωζ(θ, ζ)](1 + γ/β).

Proposition 9.5 Assume that (9.1) holds. Then as u → ∞,
∂ψ/∂θ ∼ ue^{−γu} βC²dθ/(1 − ρ),   ∂ψ/∂ζ ∼ ue^{−γu} βC²dζ/(1 − ρ).    (9.10)

Proof. By straightforward differentiation,
∂B̄(x)/∂ζ = (∂/∂ζ) ∫_x^∞ exp{θy + ζt(y) − ω(θ, ζ)} µ(dy) = ∫_x^∞ [t(y) − ωζ(θ, ζ)] B(dy).
Letting ϕ = ∂ψ/∂ζ, it thus follows from (9.6) that
ϕ(u) = e^{−γu}z1(u) + e^{−γu}z2(u) + β ∫_0^u ϕ(u − x)B̄(x) dx,
where
z1(u) = βe^{γu} ∫_u^∞ ∫_x^∞ [t(y) − ωζ(θ, ζ)] B(dy) dx,
z2(u) = e^{γu} ∫_0^u ψ(u − x) β ∫_x^∞ [t(y) − ωζ(θ, ζ)] B(dy) dx.
Multiplying by e^{γu} and letting
Z(u) = e^{γu}ϕ(u),   z = z1 + z2,   F(dx) = e^{γx}βB̄(x)dx,
this implies Z = z + F ∗ Z. By dominated convergence and (9.7)–(9.9),
z2(u) → C ∫_0^∞ e^{γx} β ∫_x^∞ [t(y) − ωζ(θ, ζ)] B(dy) dx
= βC ∫_0^∞ [t(y) − ωζ(θ, ζ)](e^{γy} − 1)(1/γ) B(dy)
= (βC/γ) ∫_0^∞ [t(y) − ωζ(θ, ζ)] e^{γy} B(dy)
= βC dζ/γ
as u → ∞, and also z1(u) → 0 because of
∫_0^∞ e^{γy}[t(y) − ωζ(θ, ζ)] B(dy) < ∞.
Hence
Z(u)/u → βC dζ/(γµF),
from which the second assertion of (9.10) follows, and the proof of the first one is similar. 2

Example 9.6 Consider the gamma density
b(x) = (δ^α/Γ(α)) x^{α−1} e^{−δx} = exp{−δx + α log x − (log Γ(α) − α log δ)} · (1/x).
Here (9.1) holds with µ(dx) = x^{−1}dx, θ = −δ, ζ = α, t(x) = log x,
ω(θ, ζ) = log Γ(α) − α log δ = log Γ(ζ) − ζ log(−θ).
We get ωζ(θ, ζ) = Ψ(ζ) − log(−θ) = Ψ(α) − log δ, where Ψ = Γ′/Γ is the Digamma function, and ωθ(θ, ζ) = −ζ/θ = α/δ. It follows after some elementary calculus that ρ = αβ/δ and, by inserting in the above formulas, that
C = (1 − ρ)/(αβδ^α/(δ − γ)^{α+1} − 1),    (9.11)
dθ = αγδ^{α−1}/(δ − γ)^{α+1},    (9.12)
dζ = log(δ/(δ − γ)) · (δ/(δ − γ))^α,    (9.13)
γβ = (γ² − δγ)/(αβ² + αβγ + βγ − βδ),    (9.14)
γδ = −γθ = −(αβγ + αγ²)/(δ² − δγ − αβδ − αδγ),
γα = γζ = (βδ + δγ − βγ − γ²) log(δ/(δ − γ))/(δ − γ − αβ − αγ).
Finally, (9.10) takes the form
∂ψ/∂δ = −∂ψ/∂θ ∼ −ue^{−γu} βC²dθ/(1 − ρ),   ∂ψ/∂α = ∂ψ/∂ζ ∼ ue^{−γu} βC²dζ/(1 − ρ). 2
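A quick numerical check of these expressions is easy to set up. The sketch below is our illustration (function name and parameter values are ours): it solves the Lundberg equation for gamma claims by bisection and compares the sensitivity γβ from (9.14) with a central finite difference.

```python
import math

def adj_coeff(beta, delta, alpha):
    """Adjustment coefficient gamma for Gamma(alpha, delta) claims, premium
    rate 1: root in (0, delta) of kappa(s) = beta*((delta/(delta-s))**alpha - 1) - s."""
    kappa = lambda s: beta * ((delta / (delta - s)) ** alpha - 1.0) - s
    lo, hi = 0.0, delta - 1e-9          # kappa(lo) <= 0 < kappa(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if kappa(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

beta, delta, alpha = 0.5, 2.0, 1.5      # rho = alpha*beta/delta = 0.375 < 1
g = adj_coeff(beta, delta, alpha)
# sensitivity gamma_beta from (9.14) vs. a central finite difference
gb = (g * g - delta * g) / (alpha * beta**2 + alpha * beta * g + beta * g - beta * delta)
h = 1e-6
gb_fd = (adj_coeff(beta + h, delta, alpha) - adj_coeff(beta - h, delta, alpha)) / (2 * h)
```

As expected, γβ < 0: increasing the arrival rate erodes the safety loading and hence the adjustment coefficient.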
Example 9.7 Consider the inverse Gaussian density
b_{ξ,c}(x) = (c/√(2πx³)) exp{ξc − (1/2)(c²/x + ξ²x)}.
This has the form (9.1) with
µ(dx) = (1/√(2πx³)) dx,   θ = −ξ²/2,   ζ = −c²/2,   t(x) = 1/x,
ω(θ, ζ) = −ξc − log c = −2√((−θ)(−ζ)) − (1/2) log(−ζ) − (1/2) log 2.
In particular, for α ≤ α* = ξ²/2,
B̂_{θ,ζ}[α] = exp{ω(θ + α, ζ) − ω(θ, ζ)} = exp{c(ξ − √(ξ² − 2α))}.
Thus the condition B̂[α*] ≥ 1 + α*/β of Section 6a needed for the existence of γ becomes e^{ξc} ≥ 1 + ξ²/(2β). Straightforward but tedious calculations, which we omit in part, further yield
βB̂′_{θ,ζ}[γ] − 1 = βc exp{c(ξ − √(ξ² − 2γ))}/√(ξ² − 2γ) − 1 = (β + γ)c/√(ξ² − 2γ) − 1,
ωζ(θ, ζ) = √(θ/ζ) − 1/(2ζ) = ξ/c + 1/c²,   ωθ(θ, ζ) = √(ζ/θ) = c/ξ,
γβ = γ√(ξ² − 2γ)/(β√(ξ² − 2γ) − βc(β + γ)),
γξ = −ξγθ = −c(γ + β)(ξ − √(ξ² − 2γ))/(√(ξ² − 2γ) − c(γ + β)),
γc = −cγζ = −(γ + β)(ξ² − 2γ − ξ√(ξ² − 2γ))/(√(ξ² − 2γ) − c(γ + β)),
dθ = c[1/√(ξ² − 2γ) − 1/ξ](1 + γ/β),
dζ = (1/c)(√(ξ² − 2γ) − ξ)(1 + γ/β).
Finally, (9.10) takes the form
∂ψ/∂ξ = −ξ ∂ψ/∂θ ∼ −ξue^{−γu} βC²dθ/(1 − ρ),   ∂ψ/∂c = −c ∂ψ/∂ζ ∼ −cue^{−γu} βC²dζ/(1 − ρ). 2
Remark 9.8 The specific form of (9.1) is motivated as follows. In general, the exponent of the density in an exponential family has the form θ1 t1 (x) + · · · + θk tk (x). Thus, we have assumed k = 2 and t1 (x) = x. That it is no restriction to assume one of the ti (x) to be linear follows since the whole setup requires exponential moments to be finite (thus we can always extend the family if necessary by adding a term θx). That it is no restriction to assume k ≤ 2 follows since if k > 2, we can just fix k − 2 of the parameters. Finally if k = 1, the exponent is either θx, in which case we can just let t(x) = 0, or ζt(x), in which case the extension just described applies. 2
Notes and references The general area of sensitivity analysis (gradient estimation) is currently receiving considerable interest in queueing theory. However, the models there (e.g. queueing networks) are typically much more complicated than the one considered here, and hence explicit or asymptotic estimates are in general not possible. Thus, the main tool is simulation, for which we refer to XV.7 and references therein. Comparatively less work seems to have been done in risk theory; thus, to our knowledge, the results presented here are new. Van Wouve et al. [861] consider a special problem related to reinsurance. For the study of perturbation via perturbed renewal equations, see Gyllenberg & Silvestrov [444].
10 Estimation of the adjustment coefficient
We consider a nonparametric setup where β, B are assumed to be completely unknown, and we estimate γ by means of the empirical solution γT to the Lundberg equation. To this end, let
βT = NT/T,   B̂T[α] = (1/NT) Σ_{i=1}^{NT} e^{αUi},   κT(α) = βT(B̂T[α] − 1) − α,
and let γT be defined by κT(γT) = 0.
Note that if NT = 0, then B̂T and hence γT is undefined. Also, if
ρT = βT (U1 + ··· + U_{NT})/NT > 1,
then γT < 0. However, by the LLN both P(NT = 0) and P(ρT > 1) converge to 0 as T → ∞.

Theorem 10.1 As T → ∞, γT →a.s. γ. If furthermore B̂[2γ] < ∞, then
γT − γ ≈ N(0, σγ²/T),
where σγ² = κ(2γ)/κ′(γ)².

For the proof, we need a lemma.

Lemma 10.2 As T → ∞,
B̂T[γ] ≈ N(B̂[γ], (B̂[2γ] − B̂[γ]²)/(βT)),    (10.1)
κT(γ) ≈ N(0, κ(2γ)/T).    (10.2)
Proof. Since
Var(e^{γU}) = Ee^{2γU} − (Ee^{γU})² = B̂[2γ] − B̂[γ]²,
we have
(1/n) Σ_{i=1}^n e^{γUi} ≈ N(B̂[γ], (B̂[2γ] − B̂[γ]²)/n).
Hence (10.1) follows from NT/T →a.s. β and Anscombe's theorem. More generally, since NT/T ≈ N(β, β/T), it is easy to see that we can write
(βT, B̂T[γ]) ≈ (β, B̂[γ]) + (1/√T)(√β V1, √((B̂[2γ] − B̂[γ]²)/β) V2),
where V1, V2 are independent N(0, 1) r.v.'s. Hence
κT(γ) = (β + (βT − β))(B̂[γ] − 1 + (B̂T[γ] − B̂[γ])) − γ
≈ β(B̂[γ] − 1) − γ + (βT − β)(B̂[γ] − 1) + β(B̂T[γ] − B̂[γ])
≈ 0 + (1/√T){√β(B̂[γ] − 1)V1 + √(β(B̂[2γ] − B̂[γ]²)) V2}
=D N(0, (β/T){(B̂[γ] − 1)² + B̂[2γ] − B̂[γ]²})
= N(0, (β/T){B̂[2γ] − 2γ/β − 1}),
which is the same as (10.2). 2
Proof of Theorem 10.1. By the law of large numbers,
βT →a.s. β,   B̂T[α] →a.s. B̂[α],   κT(α) →a.s. κ(α).
Let 0 < ε < γ. Then κ(γ − ε) < 0 < κ(γ + ε), and hence κT(γ − ε) < 0 < κT(γ + ε) for all sufficiently large T. I.e., γT ∈ (γ − ε, γ + ε) eventually, and the truth of this for all ε > 0 implies γT →a.s. γ. Now write
κT(γT) − κT(γ) = κ′T(γ*T)(γT − γ),    (10.3)
where γ*T is some point between γT and γ. If γT ∈ (γ − ε, γ + ε), we have by convexity of κT that κ′T(γ − ε) < κ′T(γ*T) < κ′T(γ + ε). By the law of large numbers,
B̂′T[α] = (1/NT) Σ_{i=1}^{NT} Ui e^{αUi} →a.s. EUe^{αU} = B̂′[α].
Hence κ′T(α) →a.s. κ′(α) for all α, so that for all sufficiently large T
κ′(γ − ε) < κ′T(γ*T) < κ′(γ + ε),
which implies κ′T(γ*T) →a.s. κ′(γ). Combining (10.3) and Lemma 10.2 and using κT(γT) = 0, it follows that
γT − γ = −κT(γ)/κ′T(γ*T) ≈ −κT(γ)/κ′(γ) ≈ N(0, κ(2γ)/(Tκ′(γ)²)) = N(0, σγ²/T).
2

Theorem 10.1 can be used to obtain error bounds on the ruin probabilities when β and B are estimated from data. To this end, first note that
e^{−γT u} ≈ N(e^{−γu}, u²e^{−2γu}σγ²/T).
Thus an asymptotic upper α confidence bound for e^{−γu} (and hence, by Lundberg's inequality, for ψ(u)) is
e^{−γT u} + (fα/√T) u e^{−γT u} σ_{γ;T},
where σ²_{γ;T} = κT(2γT)/κ′T(γT)² is the empirical estimate of σγ² and fα satisfies Φ(−fα) = α (e.g., fα = 1.96 if α = 2.5%).
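The estimator is straightforward to implement. The sketch below is our illustration (helper name, seed and parameter choices are ours, not from the book): it simulates a compound Poisson risk process with Exp(1.5) claims and β = 1, for which γ = δ − β = 0.5, and solves the empirical Lundberg equation by bisection. Note that δ = 1.5 < 2β, so the condition B̂[2γ] < ∞ of Theorem 10.1 holds.

```python
import math, random

def estimate_gamma(claims, T, hi):
    """Empirical adjustment coefficient gamma_T: root of
    kappa_T(a) = beta_T*(B_T[a] - 1) - a, with beta_T = N_T/T and B_T the
    empirical m.g.f. of the observed claims; `hi` must satisfy kappa_T(hi) > 0."""
    n = len(claims)
    beta_t = n / T
    def kappa_t(a):
        return beta_t * (sum(math.exp(a * u) for u in claims) / n - 1.0) - a
    lo = 0.0                    # kappa_T(0) = 0 and kappa_T < 0 just above 0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if kappa_t(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

random.seed(7)
beta, delta, T = 1.0, 1.5, 20_000.0     # true gamma = delta - beta = 0.5
claims, t = [], random.expovariate(beta)
while t < T:
    claims.append(random.expovariate(delta))
    t += random.expovariate(beta)
gamma_T = estimate_gamma(claims, T, hi=1.4)
```

Here σγ² = κ(2γ)/κ′(γ)² works out to 4, so the standard deviation of γT is about 2/√T ≈ 0.014 for T = 20000.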
Notes and references Theorem 10.1 is from Grandell [428]. A major restriction of the approach is the condition B̂[2γ] < ∞, which may be quite restrictive. For example, if B is exponential with rate δ so that γ = δ − β, it means 2(δ − β) < δ, i.e. δ < 2β, or equivalently ρ > 1/2 or η < 100%. For this reason, various alternatives have been developed. One (see Schmidli [771]) is to let {Vt} be the workload process of an M/G/1 queue with the same arrival epochs as the risk process and service times U1, U2, ..., i.e. Vt = St − inf_{0≤v≤t} Sv. Letting
ω0 = 0,   ωn = inf{t > ωn−1 : Vt = 0, Vs > 0 for some s ∈ (ωn−1, t)},
the nth busy cycle is then [ωn−1, ωn), and the known fact that the
Yn = max_{t∈[ωn−1,ωn)} Vt
are i.i.d. with a tail of the form P(Y > y) ∼ C1e^{−γy} (see e.g. Asmussen [65]) can then be used to produce an estimate of γ. This approach in fact applies also to many models more general than the compound Poisson one. Further work on estimation of γ with different methods can be found in Csörgő & Steinebach [268], Csörgő & Teugels [269], Deheuvels & Steinebach [285], Embrechts & Mikosch [348], Herkenrath [459], Hipp [464, 465], Frees [371], Mammitzsch [628], Brito & Freitas [202], Conti [256] and Pitts, Grübel & Embrechts [707].
Chapter V
The probability of ruin within finite time

This chapter is concerned with the finite time ruin probabilities
ψ(u, T) = P(τ(u) ≤ T) = P(inf_{0≤t≤T} Rt < 0 | R0 = u) = P(sup_{0≤t≤T} St > u).
Only the compound Poisson case is treated; generalizations to other models are either discussed in the Notes and References or in relevant chapters.
The notation is essentially as in Chapter IV. In particular, the premium rate is 1, the Poisson intensity is β and the claim size distribution is B, with m.g.f. B̂[·] and mean µB. The safety loading is η = 1/ρ − 1 where ρ = βµB. Unless otherwise stated, it is assumed that η > 0 and that the adjustment coefficient (Lundberg exponent) γ, defined as the solution of κ(γ) = 0 where κ(α) = β(B̂[α] − 1) − α, exists. Further, let γm be the unique point in (0, γ) where κ(α) attains its minimum value, see Fig. V.1 (the role of γy will be explained in Section 4b). The claims surplus is {St}, the time of ruin is τ(u), and ξ(u) = Sτ(u) − u is the overshoot.
Figure V.1
1 Exponential claims
Proposition 1.1 In the compound Poisson model with exponential claims with rate ν and safety loading η > 0, the conditional mean and variance of the time to ruin are given by
E[τ(u) | τ(u) < ∞] = (βu + 1)/(ν − β),    (1.1)
Var[τ(u) | τ(u) < ∞] = (2βνu + β + ν)/(ν − β)³.    (1.2)

Proof. Let as in Example IV.5.1 PL, EL refer to the exponentially tilted process with arrival intensity ν and exponential claims with rate β (thus, ρL = ν/β = 1/ρ > 1). By the likelihood identity IV.(4.8), we have for k = 1, 2 that
E[τ(u)^k; τ(u) < ∞] = EL[τ(u)^k e^{−γSτ(u)}] = e^{−γu} EL e^{−γξ(u)} EL τ(u)^k = (β/ν) e^{−γu} EL τ(u)^k = ψ(u) EL τ(u)^k,
using that the overshoot ξ(u) is exponential with rate β w.r.t. PL and independent of τ(u). In particular,
E[τ(u) | τ(u) < ∞] = EL τ(u),   Var[τ(u) | τ(u) < ∞] = VarL τ(u).
For (1.1), we have by Wald's identity that (note that EL St = t(ρL − 1))
EL Sτ(u) = (ρL − 1) EL τ(u),
EL τ(u) = (u + EL ξ(u))/(ρL − 1) = (u + 1/β)/(ν/β − 1) = (βu + 1)/(ν − β).
For (1.2), Wald's second moment identity yields
EL(Sτ(u) − (ρL − 1)τ(u))² = σL² EL τ(u),
where σL² = κ″(γ) = 2ν/β². Since Sτ(u) and (ρL − 1)τ(u) are independent with the same mean, the l.h.s. is
VarL Sτ(u) + VarL((ρL − 1)τ(u)) = VarL ξ(u) + (ρL − 1)² VarL τ(u) = 1/β² + (ν/β − 1)² VarL τ(u).
Thus the l.h.s. of (1.2) is
(σL² EL τ(u) − 1/β²)/(ν/β − 1)² = (2ν(βu + 1)/(ν − β) − 1)/(ν − β)²,
which is the same as the r.h.s. 2
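Per the proof above, the PL-moments of τ(u) coincide with the conditional moments of τ(u) given ruin, so (1.1) can be checked by simulating the tilted process directly: arrival intensity ν, Exp(β) claims, premium rate 1, so that ruin is certain. The sketch below is our illustration (names, seed and parameters are ours):

```python
import random

def sim_tau(u, beta, nu):
    """One sample of tau(u) under P_L: arrivals at rate nu, Exp(beta) claims,
    premium rate 1; returns the first time the reserve goes negative."""
    t, r = 0.0, u              # time and current reserve
    while True:
        w = random.expovariate(nu)          # time until next claim
        t += w
        r += w - random.expovariate(beta)   # premium income minus claim
        if r < 0.0:
            return t

random.seed(3)
beta, nu, u, n = 1.0, 2.0, 5.0, 20_000
mean_tau = sum(sim_tau(u, beta, nu) for _ in range(n)) / n
# formula (1.1): E[tau | ruin] = (beta*u + 1)/(nu - beta) = 6 for these values
```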
Proposition 1.2 In the compound Poisson model with exponential claims with rate ν and safety loading η > 0, the Laplace transform of the time to ruin is given by
Ee^{−δτ(u)} = E[e^{−δτ(u)}; τ(u) < ∞] = e^{−ρδ u}(1 − ρδ/ν)    (1.3)
for δ ≥ κ(γm) = 2√(βν) − β − ν, where
ρδ = (ν − β − δ + √((ν − β − δ)² + 4δν))/2.

Proof. It is readily checked that γm = ν − √(βν) and hence that the value of κ(γm) is as asserted. Let ρδ > γm be determined by κ(ρδ) = δ. This means that β(ν/(ν − ρδ) − 1) − ρδ = δ, which leads to the quadratic ρδ² + (β − ν + δ)ρδ − νδ = 0 with solution ρδ (the sign of the square root is + because ρδ > 0). But by the fundamental likelihood ratio identity (Theorem IV.4.3) we have
E[e^{−δτ(u)}; τ(u) < ∞] = E_{ρδ}[exp{−δτ(u) − ρδSτ(u) + τ(u)κ(ρδ)}; τ(u) < ∞] = e^{−ρδ u} E_{ρδ} e^{−ρδ ξ(u)} = e^{−ρδ u} ν_{ρδ}/(ν_{ρδ} + ρδ),
where we used that P_{ρδ}(τ(u) < ∞) = 1 because ρδ > γm and hence E_{ρδ} S1 = κ′(ρδ) > 0. Using ν_{ρδ} = ν − ρδ, the result follows. 2
Note that it follows from Proposition 1.2 that we can write
Ee^{−δτ(u)} = e^{−ρδ u} Ee^{−δτ(0)}.    (1.4)
The interpretation of this is that τ(u) can be written as the independent sum of τ(0) plus a r.v. Y(u) belonging to a convolution semigroup. More precisely,
τ(u) = τ + Σ_{k=1}^{M(u)} τk    (1.5)
where τ = τ(0) is the length of the first ladder segment, τ1, τ2, ... are the lengths of the ladder segments 2, 3, ..., and M(u) + 1 is the index of the ladder segment corresponding to τ(u). Cf. Fig. V.2, where Y1, Y2, ... are the ladder heights, which form a terminating sequence of exponential r.v.'s with rate ν.

Figure V.2

For numerical purposes, the following formula is convenient by allowing ψ(u, T) to be evaluated by numerical integration:

Proposition 1.3 Assume that claims are exponential with rate ν = 1. Then
ψ(u, T) = βe^{−(1−β)u} − (1/π) ∫_0^π f1(θ)f2(θ)/f3(θ) dθ    (1.6)
where
f1(θ) = β exp{2√(βT) cos θ − (1 + β)T + u(√β cos θ − 1)},
f2(θ) = cos(u√β sin θ) − cos(u√β sin θ + 2θ),
f3(θ) = 1 + β − 2√β cos θ.
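Formula (1.6) is a one-dimensional integral of a smooth integrand and is easy to evaluate numerically. The sketch below is our illustration (function name, grid size and the simple trapezoidal rule are our choices); a basic sanity check is that ψ(u, T) increases to ψ(u) = βe^{−(1−β)u} as T grows.

```python
import math

def psi_finite(u, T, beta, n=4000):
    """Finite-time ruin probability psi(u, T) for Exp(1) claims via the
    integral representation (1.6), using a trapezoidal rule on [0, pi]."""
    sb = math.sqrt(beta)
    def f(th):
        f1 = beta * math.exp(2.0 * math.sqrt(beta * T) * math.cos(th)
                             - (1.0 + beta) * T + u * (sb * math.cos(th) - 1.0))
        f2 = math.cos(u * sb * math.sin(th)) - math.cos(u * sb * math.sin(th) + 2.0 * th)
        f3 = 1.0 + beta - 2.0 * sb * math.cos(th)
        return f1 * f2 / f3
    h = math.pi / n
    s = 0.5 * (f(0.0) + f(math.pi)) + sum(f(k * h) for k in range(1, n))
    return beta * math.exp(-(1.0 - beta) * u) - s * h / math.pi
```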
Note that the case ν ≠ 1 is easily reduced to the case ν = 1 via the formula ψβ,ν(u, T) = ψ_{β/ν,1}(νu, νT).

Proof. We use the formula ψ(u, T) = P(VT > u) where {Vt} is the workload process in an initially empty M/M/1 queue with arrival rate β and service rate ν = 1, cf. Corollary III.3.6. Let {Qt} be the queue length process of the queue (number in system, including the customer currently being served). If QT = N > 0, then VT = U_{1,T} + ··· + U_{N,T}, where U_{1,T} is the residual service time of the customer currently being served and U_{2,T}, ..., U_{N,T} the service times of the customers awaiting service. Since U_{1,T}, U_{2,T}, ..., U_{N,T} are conditionally i.i.d. and exponential with rate ν = 1, the conditional distribution of VT given QT = N is that of EN, where the r.v. EN has an Erlang distribution with parameters (N, 1), i.e. density x^{N−1}e^{−x}/(N − 1)!. Hence
ψ(u, T) = P(VT > u) = Σ_{N=1}^∞ P(QT = N)P(EN > u)
= Σ_{N=1}^∞ P(QT = N) Σ_{k=0}^{N−1} e^{−u}u^k/k!
= Σ_{k=0}^∞ e^{−u}(u^k/k!) P(QT ≥ k + 1).    (1.7)
For j = 0, 1, 2, ..., let (cf. [4])
Ij(x) = Σ_{n=0}^∞ (x/2)^{2n+j}/(n!(n + j)!) = (1/π) ∫_0^π e^{x cos θ} cos jθ dθ    (1.8)
denote the modified Bessel function of order j, let I_{−j}(x) = Ij(x), and define
ιj = e^{−(1+β)T} β^{j/2} Ij(2√(βT)).
Then (see Prabhu [712, pp. 9–12], in particular equations (1.38), (1.44); similar formulas are in [APQ, pp. 87–89])
Σ_{j=−∞}^∞ ιj = 1,
P(QT ≥ k + 1) = 1 − Σ_{j=−∞}^k ιj + β^{k+1} Σ_{j=−∞}^{−k−2} ιj = β^{k+1} + Σ_{j=k+1}^∞ ιj − β^{k+1} Σ_{j=−k−1}^∞ ιj.
By Euler's formulas,
Σ_{j=k+1}^∞ β^{j/2} cos(jθ) = ℜ[Σ_{j=k+1}^∞ β^{j/2}e^{ijθ}] = ℜ[β^{(k+1)/2}e^{i(k+1)θ}/(β^{1/2}e^{iθ} − 1)]
= ℜ[β^{(k+1)/2}e^{i(k+1)θ}(β^{1/2}e^{−iθ} − 1)]/|β^{1/2}e^{iθ} − 1|²
= β^{(k+1)/2}[β^{1/2} cos(kθ) − cos((k + 1)θ)]/f3(θ),
β^{k+1} Σ_{j=−k−1}^∞ β^{j/2} cos(jθ) = β^{k+1}
1 − ψ(0, T) = P(M(0, T)) = (1/T) ∫_0^T P(M(v, T)) dv = (1/T) E ∫_0^T I(M(v, T)) dv,
where the second equality follows from III.(5.2) with A = (0, ∞), and the third from the obvious fact (exchangeability properties of the Poisson process) that {St^{(v)}} has the same distribution as {St^{(0)}} = {St}, so that P(M(v, T)) does not depend on v.
Now consider the evaluation of ∫_0^T I(M(v, T)) dv. Obviously, this integral is 0 if ST ≡ ST^{(v)} > 0. If ST < 0, there exist v such that M(v, T) occurs. For example, letting ω = inf{t > 0 : St− = min_{0≤w≤T} Sw}, we can take v ∈ (ω − ε, ω) for some small ε. We claim that if M(0, T) occurs, then M(v, T) = M(0, v). Indeed, we can write M(v, T) as
{ST ≤ St+v − Sv, 0 ≤ t ≤ T − v} ∩ {ST ≤ ST − Sv + St−T+v, T − v ≤ t ≤ T}
= {ST ≤ St − Sv, v ≤ t ≤ T} ∩ {ST ≤ ST − Sv + St, 0 ≤ t ≤ v}
= {ST ≤ St − Sv, v ≤ t ≤ T} ∩ M(0, v) = M(0, v),
2. THE RUIN PROBABILITY WITH NO INITIAL RESERVE
where the last equality follows from ST ≤ St on M(0, T) and Sv ≤ 0 on M(0, v). It follows that if M(0, T) occurs, then
(1/T) ∫_0^T I(M(v, T)) dv = (1/T) ∫_0^T I(M(0, v)) dv = −ST/T
(note that the Lebesgue measure of the v for which {St} is at a minimum at v is exactly −ST on M(0, T)). It is then clear from the cyclical nature of the problem that this holds irrespective of whether M(0, T) occurs or not, as long as ST < 0. Hence
(1/T) E ∫_0^T I(M(v, T)) dv = (1/T) E ST^− = (1/T) ∫_0^∞ P(ST ≤ −x) dx = (1/T) ∫_0^T P(Σ_{i=1}^{NT} Ui ≤ T − x) dx.
2 Let f (·, t) denote the density of F (·, t). Z Theorem 2.2 1 − ψ(u, T ) = F (u + T, T ) −
T¡
¢ 1 − ψ(0, T − t) f (u + t, t) dt.
0
o nP NT can occur in two ways: either Proof. The event {ST ≤ u} = 1 Ui ≤ u + T ruin does not occur in [0, T ], or it occurs, in which case there is a last time σ where St downcrosses level u, cf. Fig V.4. Here σ ∈ [t, t + dt] occurs if and only if St ∈ [u, u + dt] and there is no upcrossing of level u after time t, which occurs w.p. ψ(T − t). Hence Z P(ST ≤ u) = 1 − ψ(u, T ) +
T¡
¢ ¡ ¢ 1 − ψ(0, T − t) P St ∈ [u, u + dt] ,
0
which is the same as the assertion of the theorem.
2
124
CHAPTER V. PROBABILITY OF RUIN IN FINITE TIME 6
u
σ
T
Figure V.4 The following representation of τ (0) will be used in the next section. The proof will be combined with the proof of Theorem IV.2.2. Proposition 2.3 Define τ− (z) = inf {t > 0 : St = −z}, z > 0. Let Z be a r.v. which is independent ¢of St and ¡ ¡ has the ¢stationary excess distribution B0 . Then P τ (0) ∈ ·  τ (0) < ∞ = P τ− (Z) ∈ · . Proof of Theorem IV.2.2. For a fixed T > 0, define St∗ = ST − ST −t− and let © ª A(z, T ) = St < 0, 0 < t < T, ST − = −z , © ª C(z, T ) = St > −z, 0 < t < T, ST − = −z , © ∗ ª C ∗ (z, T ) = St > −z, 0 < t < T, ST∗ − = −z . Then ¡ ¢ ¡ ¢ P τ (0) ∈ [T, T + dT ], −Sτ (0)− ∈ [z, z + dz] = P A(z, T ) βB(z) dz dT. (2.1) But by sample path inspection (cf. Fig. V.5), A(z, T ) = C ∗ (z, T ), ¡and since¢ {St }0≤t≤T , {St∗ }0≤t≤T have the same distribution, we therefore have P A(z, T ) ¡ ¢ = P C(z; T ) . Hence integrating (2.1) yields Z ∞ ¡ ¢ ¡ ¢ P −Sτ (0)− ∈ [z, z + dz], τ (0) < ∞ = βB(z) dz P C(z, T ) dT ¡ 0 ¢ = βB(z) dz P τ− (z) < ∞ = βB(z) dz.
2. THE RUIN PROBABILITY WITH NO INITIAL RESERVE
6
6
St
St∗
T = τ+
−z
125
T 
Figure V.5 Thus ¡ ¢ P −Sτ (0)− > x, Sτ (0) > y; τ (0) < ∞ Z ∞ ¯ ¡ ¢ ¡ ¢ = P U > y + z ¯ U > z P −Sτ (0)− ∈ [z, z + dz], τ (0) < ∞ x ∞
Z =
x
B(y + z) βB(z) dz = β B(z)
Z
Z
∞
∞
B(y + z) dz = β x
B(z) dz , x+y
which is the assertion of Theorem IV.2.2.
2
Proof of Proposition 2.3. It follows by division by ¡ ¢ P Sτ (0)− ∈ [z, z + dz], τ (0) < ∞ = βB(z) dz in (2.1) that ¡ ¢ ¡ ¢ P τ (0) ∈ [T, T + dT ]  Sτ (0)− ∈ [z, z + dz], τ (0) < ∞ = P C(z) dT. Hence ¡ ¢ P τ (0) ∈ [T, T + dT ]  τ (0) < ∞ Z ∞ ¡ ¢ ¡ ¢ = dT P C(z) P Sτ (0)− ∈ [z, z + dz], τ (0) < ∞ Z0 ∞ ¡ ¢ ¡ ¢ = dT P C(z) P Z ∈ [z, z + dz], τ (0) < ∞ ¡0 ¢ = dT P τ− (Z) ∈ [T, T + dT ] . 2
126
CHAPTER V. PROBABILITY OF RUIN IN FINITE TIME
Notes and references For Theorems 2.1, 2.2, see in addition to Prabhu [711] also Seal [784, 787]. Theorem 2.1 and the present proof is in the spirit of Ballot theorems, cf. Tak´ acs [827]; a martingale proof is in Delbaen & Haezendonck [287]. For related inequalities for positive u, see De Vylder & Goovaerts [304]. Proposition 2.3 was noted by Asmussen & Kl¨ uppelberg [86], who instead of the present direct proof gave two arguments, one based upon a result of Asmussen & Schmidt [103] generalizing Theorem III.5.5 and one upon excursion theory for Markov processes (see X.4a). For discrete claim size distributions, Picard & Lef`evre [701] used generalized Appell polynomials to develop recursion formulae for finite time ruin probabilities, see also Rulli`ere & Loisel [757]. This was later extended to more general setups including dependent claims, cf. for instance Ignatov & Kaishev [493, 494] and Lef`evre & Loisel [575]. Continuous versions of the discrete expressions of [701] are given in De Vylder & Goovaerts [303]. In the setting of general L´evy processes, some relevant references are Shtatland [798] and Gusak & Korolyuk [442].
3
Laplace transforms
As usual, −ρδ denotes the negative solution of the equation ¡ ¢ b − 1 − r = δ. κ(r) = β B[r]
(3.1)
Let τ− (y) be defined as Proposition 2.3. Note that τ− (y) < ∞ a.s. because of η > 0. Lemma 3.1 Ee−δτ− (y) = e−ρδ y . Proof. Optional stopping at τ− (y) ∧ T of the martingale © −ρ St −tκ(−ρ ) ª © −ρ St −δt ª δ e δ = e δ and letting T → ∞ using dominated convergence yields 1 = eρδ y Ee−δτ− (y) . 2 £ ¤ Let gδ (x) be the density of the measure E e−δτ (0) ; τ (0) < ∞, ξ(0) ∈ dx (recall that ξ(0) = Sτ (0) ). Z Lemma 3.2 gδ (x) = β eρδ x
∞
e−ρδ y B(dy).
x
Proof. Let Z be the surplus −Sτ (0)− just before ruin. Then by Proposition 2.3, ¯ £ ¤ E e−δτ (0) ¯ τ (0) < ∞, Z = y = Ee−δτ− (y) = e−ρδ y .
3. LAPLACE TRANSFORMS
127
Further by Theorem IV.2.2 ¡ ¢ P Z ∈ [y, y + dy], ξ(0) ∈ dx = βB(x + dy) dx and hence Z gδ (x) =
∞
Z e−ρδ y βB(x + dy) = β
0
∞
e−ρδ (y−x) B(dy) .
2
x
Lemma 3.3 For the Laplace transform gbδ [−s] = gbδ [−s] =
R∞ 0
e−sx gδ (x) dx we have
κ(−s) − s − δ + ρδ . ρδ − s
Proof. Z gbδ [−s] = = = =
∞
Z
∞
dx e−ρδ y B(dy) 0 x Z ∞ Z y −ρδ y β e B(dy) ex(−s+ρδ ) dx 0 0 Z ∞ β e−ρδ y B(dy)[ey(ρδ −s) − 1] ρδ − s 0 ¢ β ¡b b B[−s] − B[−ρ δ] . ρδ − s
β
x(−s+ρδ )
e
b The result follows by inserting β B[−s] = κ(−s) + β − s and κ(−ρδ ) = δ. Corollary 3.4 E[e−δτ (0) ; τ (0) < ∞) = 1 −
2
δ . ρδ
Proof. Let b = 0.
2
Here is a classical result: the double Laplace transform of the ruin time τ (u): Z ∞ £ ¤ κ(−s)/s − δ/ρδ . Corollary 3.5 e−su E e−δτ (u) ; τ (u) < ∞ du = κ(−s) − δ 0 £ ¤ Proof. Define Zδ (u) = E e−δτ (u) ; τ (u) < ∞ . It is then easily seen that Zδ (u) Ru is the solution of the renewal equation Zδ (u) = zδ (u) + 0 Zδ (u − x)gδ (x) dx R∞ where zδ (u) = u gδ (x)dx. Hence Z 0
∞
£ ¤ e−su du E e−δτ (u) ; τ (u) < ∞ = Zbδ [−s] =
gbδ [0] − gbδ [−s] zbδ [−s] = . 1 − gbδ [−s] s(1 − gbδ [−s])
128
CHAPTER V. PROBABILITY OF RUIN IN FINITE TIME
Using Lemma 3.3, the result follows after simple algebra.
2
Notes and references An explicit inversion of the double Laplace transform in Corollary 3.5 to obtain expressions for ψ(u, t) in terms of infinite series can be found for claim size distributions of mixed Erlang type in Garcia [389] and Dickson & Willmot [322], see also Willmot & Woo [893] and Dickson [310]. For a power series expansion, see e.g. Usabel [859, 860]. An alternative very accurate numerical method is to randomize the time horizon T and exploit the resulting additional smoothness of the problem, in particular in a matrixanalytic framework, cf. Section IX.8. In Chapter XII the results of this section will be extended in various directions in the context of GerberShiu functions.
4
When does ruin occur?
For the general compound Poisson model, the known results are even less explicit than for the exponential claims case, and take basically the form of approximations and inequalities. The first main result of the present section is that the value u mL , where mL =
1 1 C 1 = = = , 0 b κ (γ) βL EL U − 1 1−ρ β B [γ] − 1 0
is in some appropriate sense critical as the most ‘likely’ time of ruin (here C is the Cram´erLundberg constant). Later results then deal with more precise and refined versions of this statement. P
Theorem 4.1 Assume η > 0. Then given τ (u) < ∞, τ (u)/u → mL as u → ∞. That is, for any ² > 0 ¯ ¯ ³¯ τ (u) ´ ¯ ¯ ¯ − mL ¯ > ² ¯ τ (u) < ∞ → 0 . P ¯ (4.1) u Further, for any m ψ(u, mu) → ψ(u)
½
0 m < mL 1 m > mL .
(4.2)
For the proof, we need the following auxiliary result: Proposition 4.2 Assume η < 0, i.e. ρ = βµB > 1. Then as u → ∞, 1 τ (u) a.s. → m = , u ρ−1 τ (u) − mu D √ → N (0, ω 2 ) u
Eτ (u) 1 → , u ρ−1 (2)
where ω 2 = βµB m3 .
(4.3) (4.4)
4. WHEN DOES RUIN OCCUR?
129 a.s.
Proof. The assumption η < 0 ensures that P(τ (u) < ∞) = 1 and τ (u) → ∞. a.s. By Proposition IV.1.2, St /t → 1/m, and hence a.s. m = lim
t→∞
t τ (u) τ (u) τ (u) = lim = lim = lim , u→∞ Sτ (u) u→∞ u + ξτ (u) u→∞ u St
using ξτ (u) = o(u) a.s., cf. Proposition A1.6. This proves the first assertion of (4.3). For the second, note that by Wald’s identity u + Eξ(u) = ESτ (u) = Eτ (u) · ES1 = (ρ − 1)Eτ (u) and that Eξ(u)/u → 0, cf. again Proposition A1.6. For (4.4), note first that (Proposition IV.1.5) ³ ´ St − t/m D (2) √ → N 0, βµB . t According to Anscombe’s theorem (e.g. Theorem 7.3.2 of [246]) and (4.3), the same conclusion holds with t replaced by τ (u). If Z ∼ N (0, 1), this can be rewritten as q u + ξ(u) − τ (u)/m (2) p ≈ βµB Z, implying τ (u) q q τ (u) − mu D (2) (2) p ≈ −m βµB Z = m βµB Z, τ (u) q τ (u) − mu (2) 3/2 √ ≈ m βµB Z = ωZ. u 2 Proof of Theorem 4.1. The l.h.s. of (4.1) is ¯ ³¯ τ (u) ´ ¯ ¯ − mL ¯ > ², τ (u) < ∞ P ¯ u P(τ (u) < ∞) ¯ ¯ τ (u) h i ¯ ¯ − mL ¯ > ², τ (u) < ∞ e−γu EL e−γξ(u) ; ¯ u = ψ(u) ¯ ³¯ τ (u) ´ ¯ ¯ − mL ¯ > ² e−γu PL ¯ u ≤ . O(e−γu ) By Proposition 4.2, PL (·) → 0, proving (4.1), and (4.2) follows immediately from (4.1). 2
130
CHAPTER V. PROBABILITY OF RUIN IN FINITE TIME
Notes and references Theorem 4.1 is standard, though it is not easy to attribute priority to any particular author. Thus, the result comes out not only by the present direct proof but also from any of the results in the following subsections. For a study of the distribution of the number of claims until ruin, see Egidio dos Reis [340].
4a
Segerdahl’s normal approximation
We shall now prove a classical result due to Segerdahl, which may be viewed both as a refinement of Theorem 4.1 (by considering ψ(u, T) for T which are close to the critical value um_L), and as a time-dependent version of the Cramér-Lundberg approximation.

Corollary 4.3 (Segerdahl [791]) Let C be the Cramér-Lundberg constant and define ω_L² = β_L E_L U² m_L³ = βB̂''[γ] m_L³, where m_L = 1/(ρ_L − 1) = 1/(βB̂'[γ] − 1). Then for any y,

  e^{γu} ψ(u, um_L + yω_L√u) → CΦ(y).   (4.5)

For the proof, we need the following auxiliary result:

Proposition 4.4 (Stam's lemma) If η < 0, then ξ(u) and τ(u) are asymptotically independent in the sense that, letting Z be a N(0, ω²) r.v. with ω² as in (4.4), one has

  E f(ξ(u)) g((τ(u) − mu)/√u) → E f(ξ(∞)) · E g(Z)   (4.6)

whenever f, g are continuous and bounded on [0, ∞), resp. (−∞, ∞).

Proof. Define u' = u − u^{1/4}. The distribution of τ(u) − τ(u') given F_{τ(u')} is readily seen to be degenerate at zero if S_{τ(u')} > u and otherwise that of τ(v) with v = u − S_{τ(u')} = u^{1/4} − ξ(u'). Using (4.3), we get

  E[τ(u) − τ(u')] = E[τ(u^{1/4} − ξ(u')); ξ(u') ≤ u^{1/4}] ≤ Eτ(u^{1/4}) = O(u^{1/4}),

and thus in (4.6) we can replace τ(u) by τ(u'). Let h(u) = E f(ξ(u)). Then h(u) → h(∞) = E f(ξ(∞)), and similarly as above we get

  E[f(ξ(u)) | F_{τ(u')}] = h(u^{1/4} − ξ(u')) I(ξ(u') ≤ u^{1/4}) + f(ξ(u') − u^{1/4}) I(ξ(u') > u^{1/4}) →^P h(∞) + 0,
using that u^{1/4} − ξ(u') →^P ∞ because of ξ(u') →^D ξ(∞) (recall that η < 0). Hence

  E f(ξ(u)) g((τ(u') − mu)/√u) = E[ E[f(ξ(u)) | F_{τ(u')}] · g((τ(u') − mu)/√u) ]
   ∼ h(∞) E g((τ(u') − mu)/√u) ∼ h(∞) E g(Z).   □
Proof of Corollary 4.3.

  e^{γu} ψ(u, um_L + yω_L√u) = e^{γu} P(τ(u) ≤ um_L + yω_L√u)
   = E_L[e^{−γξ(u)}; τ(u) ≤ um_L + yω_L√u]
   ∼ E_L e^{−γξ(u)} · P_L(τ(u) ≤ um_L + yω_L√u)
   → CΦ(y),

where we used Stam's lemma in the third step and (4.4) in the last. □
For practical purposes, Segerdahl's result suggests the approximation

  ψ(u, T) ≈ C e^{−γu} Φ((T − um_L)/(ω_L√u)).   (4.7)

To arrive at this, just substitute T = um_L + yω_L√u in (4.5) and solve for y = y(T). The precise condition for (4.7) to be valid is that T varies with u in such a way that y(T) has a limit in (−∞, ∞) as u → ∞. Thus, in practice one would trust (4.7) whenever u is large and y(T) moderate or small (numerical evidence presented in [55] indicates, however, that for the fit of (4.7) to be good, u needs to be very large). A remarkably sharp and explicit asymptotic result in terms of the time horizon T is the following:

Theorem 4.5 For every fixed u ≥ 0, we have, as T → ∞,

  ψ(u) − ψ(u, T) ∼ C₁ e^{−γ_m u + κ(γ_m)T} T^{−3/2} (1 + H(u)),   (4.8)

where C₁ = (2πβB̂''[γ_m])^{−1/2} γ_m^{−2} and H(u) is a renewal function which satisfies H(u) ∼ 2(βB̂''[γ_m])^{−1} u as u → ∞ (note that κ(γ_m) < 0). The proof is quite involved and uses deep results from random walk theory; we refer to Teugels [841].
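For exponential claims, all ingredients of Segerdahl's approximation (4.7) are explicit: with claim rate ν and arrival rate β < ν one has γ = ν − β, C = β/ν, m_L = β/(ν − β) and ω_L² = (2ν/β²) m_L³. The following sketch (illustrative code with names of our own choosing, not from the text) evaluates (4.7) in this case:

```python
import math

def normal_cdf(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def segerdahl_psi(u, T, beta, nu):
    """Segerdahl's approximation (4.7) for the compound Poisson model with
    exponential claims (tail exp(-nu*x)) and arrival rate beta < nu:
    psi(u,T) ~ C * exp(-gamma*u) * Phi((T - u*m_L)/(omega_L*sqrt(u)))."""
    gamma = nu - beta                        # adjustment coefficient
    C = beta / nu                            # Cramer-Lundberg constant
    m_L = beta / (nu - beta)                 # m_L = 1/kappa'(gamma)
    omega2 = (2.0 * nu / beta**2) * m_L**3   # omega_L^2 = beta_L E_L U^2 m_L^3
    y = (T - u * m_L) / math.sqrt(omega2 * u)
    return C * math.exp(-gamma * u) * normal_cdf(y)
```

At T = um_L the approximation returns half of the Cramér-Lundberg value Ce^{−γu}, and as T → ∞ it tends to Ce^{−γu}, in line with (4.7).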
Notes and references Corollary 4.3 is due to Segerdahl [791]. The present proof is basically that of Siegmund [806]; see also von Bahr [121] and Gut [443]. For refinements of Corollary 4.3 in terms of Edgeworth expansions, see Asmussen [55] and Malinovskii [625]. Cf. also Höglund [478].
4b
Gerber's time-dependent version of Lundberg's inequality
For y > 0, define α_y, γ_y by

  κ'(α_y) = 1/y,  γ_y = α_y − yκ(α_y).   (4.9)

Note that α_y > γ_m and that γ_y > γ (unless y equals the critical value m_L = 1/κ'(γ)), cf. Fig. V.1.

Theorem 4.6

  ψ(u, yu) ≤ e^{−γ_y u},  y < 1/κ'(γ),   (4.10)

  ψ(u) − ψ(u, yu) ≤ e^{−γ_y u},  y > 1/κ'(γ).   (4.11)

Proof. Consider first the case y < 1/κ'(γ). Then κ(α_y) > 0 (see Fig. V.1), and hence

  ψ(u, yu) = e^{−α_y u} E_{α_y}[e^{−α_y ξ(u) + τ(u)κ(α_y)}; τ(u) ≤ yu]
   ≤ e^{−α_y u} E_{α_y}[e^{τ(u)κ(α_y)}; τ(u) ≤ yu] ≤ e^{−α_y u + yuκ(α_y)} = e^{−γ_y u}.

Similarly, if y > 1/κ'(γ), we have κ(α_y) < 0 and get

  ψ(u) − ψ(u, yu) = e^{−α_y u} E_{α_y}[e^{−α_y ξ(u) + τ(u)κ(α_y)}; yu < τ(u) < ∞]
   ≤ e^{−α_y u} E_{α_y}[e^{τ(u)κ(α_y)}; yu < τ(u) < ∞] ≤ e^{−α_y u + yuκ(α_y)} = e^{−γ_y u}.   □

Remark 4.7 It may appear that the proof uses considerably less information on α_y than is inherent in the definition (4.9). However, the point is that we want to select an α which produces the largest possible exponent in the inequalities. From the proof it is seen that this amounts to choosing α to maximize α − yκ(α). Differentiating w.r.t. α, we arrive at the expression in (4.9). □
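The defining equation (4.9) is easy to solve numerically. The sketch below (illustrative, assuming exponential claims with rate ν, where κ and κ' are explicit and κ' is increasing on (−∞, ν)) finds α_y by bisection and returns the time-dependent Lundberg exponent γ_y:

```python
import math

def kappa(alpha, beta, nu):
    """kappa(alpha) = beta*(nu/(nu - alpha) - 1) - alpha for Exp(nu) claims."""
    return beta * (nu / (nu - alpha) - 1.0) - alpha

def kappa_prime(alpha, beta, nu):
    return beta * nu / (nu - alpha) ** 2 - 1.0

def gamma_y(y, beta, nu):
    """Solve kappa'(alpha_y) = 1/y by bisection on (0, nu) and return
    (gamma_y, alpha_y) with gamma_y = alpha_y - y*kappa(alpha_y) as in (4.9)."""
    lo, hi = 0.0, nu - 1e-12   # kappa' increases from rho - 1 < 0 to +infinity
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if kappa_prime(mid, beta, nu) < 1.0 / y:
            lo = mid
        else:
            hi = mid
    alpha_y = 0.5 * (lo + hi)
    return alpha_y - y * kappa(alpha_y, beta, nu), alpha_y
```

For β = 1, ν = 2 and y = 1/7, the closed-form solution is α_y = ν − √(βν/(1 + 1/y)) = 1.5, and γ_y = 1.5 − (1/7)·κ(1.5) exceeds γ = ν − β = 1, as asserted after (4.9).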
In view of Theorem 4.6, γ_y is sometimes called the time-dependent Lundberg exponent. An easy combination with the proof of Theorem IV.6.3 yields the following sharpening of (4.10):

Proposition 4.8 ψ(u, yu) ≤ C₊(α_y) e^{−γ_y u}, where

  C₊(α_y) = sup_{x≥0} B̄(x) / ∫_x^∞ e^{α_y(y−x)} B(dy).
Notes and references Theorem 4.6 is due to Gerber [397], who used a martingale argument. For a different proof, see Martin-Löf [631]. Numerical comparisons are in Grandell [430]; the bound e^{−γ_y u} turns out to be rather crude, which may be understood from Theorem 4.9 below, which shows that the correct rate of decay of ψ(u, yu) is e^{−γ_y u}/√u. Some further discussion is given in XVI.2, and generalizations to more general models are given in Chapter VII. Höglund [477] treats the renewal case.
4c
Arfwedson’s saddlepoint approximation
Our next objective is to strengthen the time-dependent Lundberg inequalities to approximations. As a motivation, it is instructive to reinspect the choice of the change of measure in the proof, i.e. the choice of α_y. For any α > γ_m, Proposition 4.2 yields

  E_α τ(u) ∼ u/κ'_α(0) = u/κ'(α).

I.e., if we want E_α τ(u) ≈ T, then the relevant choice is precisely α = α_y where y = T/u. We thereby obtain that T is 'in the center' of the P_α distribution of τ(u). This idea is precisely what characterizes the saddlepoint method. The traditional application of the saddlepoint method is to derive approximations, not inequalities, and in the case of ruin probabilities the approach leads to the following result:

Theorem 4.9 If y < 1/κ'(γ), then the solution α̃_y < α_y of κ(α̃_y) = κ(α_y) satisfies α̃_y < 0, and

  ψ(u, yu) ∼ (α_y − α̃_y) / (α_y |α̃_y| √(2πyβB̂''[α_y])) · e^{−γ_y u}/√u,  u → ∞.   (4.12)
If y > 1/κ'(γ), then α̃_y > 0, and

  ψ(u) − ψ(u, yu) ∼ (α_y − α̃_y) / (α_y α̃_y √(2πyβB̂''[α_y])) · e^{−γ_y u}/√u,  u → ∞.   (4.13)
Proof. In view of Stam's lemma, the formula

  ψ(u, yu) = e^{−α_y u} E_{α_y}[e^{−α_y ξ(u) + τ(u)κ(α_y)}; τ(u) ≤ yu]

suggests heuristically that

  ψ(u, yu) ≈ e^{−α_y u} E_{α_y} e^{−α_y ξ(∞)} · E_{α_y}[e^{τ(u)κ(α_y)}; τ(u) ≤ yu].   (4.14)

Here the first expectation can be estimated similarly as in the proof of the Cramér-Lundberg approximation in Chapter IV. Using Lemma IV.5.7 with P replaced by P_{α̃_y} and P_L by P_{α_y}, we have γ_{α̃_y} = α_y − α̃_y and get

  E_{α_y} e^{−α_y ξ(∞)}
   = (γ_{α̃_y}/α_y) (1/κ'_{α̃_y}(γ_{α̃_y})) (1 − β_{α̃_y}(B̂_{α̃_y}[γ_{α̃_y} − α_y] − 1)/(γ_{α̃_y} − α_y))
   = ((α_y − α̃_y)/(α_y κ'(α_y))) (1 + β(1 − B̂[α̃_y])/α̃_y)
   = (y(α_y − α̃_y)/α_y) · (α̃_y + β(1 − B̂[α̃_y]))/α̃_y
   = −y(α_y − α̃_y)κ(α̃_y)/(α_y α̃_y) = y(α_y − α̃_y)κ(α_y)/(α_y |α̃_y|),

using κ'_{α̃_y}(γ_{α̃_y}) = κ'(α_y) = 1/y in the second step and κ(α̃_y) = κ(α_y), α̃_y < 0 in the last.

For the second term in (4.14), it seems tempting to apply the normal approximation (4.4). Writing τ(u) ≈ yu + u^{1/2}ωV, where V is standard normal under P_{α_y} and

  ω² = β_{α_y} μ_{B_{α_y}}^{(2)} / (ρ_{α_y} − 1)³ = βB̂''[α_y] / (ρ_{α_y} − 1)³ = y³βB̂''[α_y],
we get heuristically that

  E_{α_y}[e^{τ(u)κ(α_y)}; τ(u) ≤ yu]
   = e^{yuκ(α_y)} E_{α_y}[e^{κ(α_y)u^{1/2}ωV}; V ≤ 0]
   = e^{yuκ(α_y)} ∫₀^∞ e^{−κ(α_y)u^{1/2}ωx} φ(x) dx
   = e^{yuκ(α_y)} (1/(κ(α_y)u^{1/2}ω)) ∫₀^∞ e^{−z} φ(z/(κ(α_y)u^{1/2}ω)) dz
   ∼ e^{yuκ(α_y)} (1/(κ(α_y)u^{1/2}ω)) ∫₀^∞ e^{−z} (1/√(2π)) dz
   = e^{yuκ(α_y)} / (κ(α_y)√(2πu) ω).

Inserting these estimates in (4.14), (4.12) follows. The proof of (4.13) is completely similar. □

The difficulties in making the proof precise lie in part in showing (4.14) rigorously, and in part in that for the final calculation one needs a sharpened version of the CLT for τ(u) (basically a local CLT with remainder term).

Example 4.10 Assume that B̄(x) = e^{−νx}. Then κ(α) = β(ν/(ν−α) − 1) − α, κ'(α) = βν/(ν−α)² − 1, and the equation κ'(α_y) = 1/y is easily seen to have the solution

  α_y = ν − √(βν/(1 + 1/y))

(the negative sign of the square root is chosen because the c.g.f. is undefined for α > ν). Since κ(α) = κ(α_y) is a quadratic equation in α with product of roots −νκ(α_y), it follows that

  ν − α_y = √(βν/(1 + 1/y)),  α̃_y = −νκ(α_y)/α_y = ν − √(βν(1 + 1/y)),

  α_y − α̃_y = √(βν(1 + 1/y)) − √(βν/(1 + 1/y)),  B̂''[α_y] = 2ν/(ν − α_y)³ = 2(1 + 1/y)^{3/2} / (β^{3/2}ν^{1/2}),
and (4.12) gives the approximation

  ψ(u, yu) ≈ β^{5/4}ν^{1/4} / ( 2√(πy) (1+y) (1+1/y)^{3/4} (ν − √(βν/(1+1/y))) (β − √(βν/(1+1/y))) ) · e^{−γ_y u}/√u

when y < 1/κ'(γ) = ρ/(1 − ρ). □
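Arfwedson's approximation (4.12) can be evaluated directly from the closed forms of Example 4.10. The following sketch (illustrative code; names are ours) computes it for exponential claims, obtaining the second root α̃_y from the product-of-roots identity α̃_y = −νκ(α_y)/α_y:

```python
import math

def arfwedson_psi(u, y, beta, nu):
    """Saddlepoint approximation (4.12) to psi(u, y*u) for exponential claims
    with rate nu and arrival rate beta, valid for y < rho/(1 - rho) with
    rho = beta/nu (so that the second root tilde-alpha_y is negative)."""
    s = math.sqrt(beta * nu / (1.0 + 1.0 / y))
    alpha_y = nu - s                                      # solves kappa'(alpha) = 1/y
    kap = beta * (nu / (nu - alpha_y) - 1.0) - alpha_y    # kappa(alpha_y) > 0
    alpha_t = -nu * kap / alpha_y                         # second root of kappa(.) = kappa(alpha_y)
    gamma_y = alpha_y - y * kap                           # time-dependent Lundberg exponent
    bpp = 2.0 * nu / (nu - alpha_y) ** 3                  # B''[alpha_y]
    coef = (alpha_y - alpha_t) / (alpha_y * abs(alpha_t)
            * math.sqrt(2.0 * math.pi * y * beta * bpp))
    return coef * math.exp(-gamma_y * u) / math.sqrt(u)
```

For β = 1, ν = 2, y = 1/7 one gets α_y = 1.5, α̃_y = −2, B̂''[α_y] = 32 and a prefactor of about 0.2177, and the approximation decays at rate e^{−γ_y u}/√u as stated in Theorem 4.9.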
Notes and references Theorem 4.9 is from Arfwedson [51]. A related result appears in Barndorff-Nielsen & Schmidli [138].
5
Diffusion approximations
The idea behind the diffusion approximation is to first approximate the claim surplus process by a Brownian motion with drift by matching the two first moments, and next to note that such an approximation in particular implies that the first passage probabilities are close. The mathematical result behind this is Donsker's theorem for a simple random walk {S_n*}_{n=0,1,...} in discrete time: if μ = ES₁* is the drift and σ² = Var(S₁*) the variance, then

  { (1/(σ√c)) (S*_{⌊tc⌋} − tcμ) }_{t≥0} →^D {W₀(t)}_{t≥0},  c → ∞,   (5.1)

where {W_ζ(t)} is Brownian motion with drift ζ and variance (diffusion constant) 1 (here →^D refers to weak convergence in D = D[0, ∞)). It is fairly straightforward to translate Donsker's theorem into a parallel statement for continuous-time random walks (Lévy processes), of which a particular case is the claim surplus process (see the proof of Theorem 5.1 below). However, for the purpose of approximating ruin probabilities the centering around the mean (the tcμ term in (5.1)) is inconvenient. We want an approximation of the claim surplus process itself, and this can be obtained under the assumption that the safety loading η is small and positive. This is the regime of the diffusion approximation (note that this is just the same as for the heavy traffic approximation for infinite horizon ruin probabilities studied in IV.7c).

Mathematically, we shall represent this assumption on η by a family {S_t^{(p)}}_{t≥0} of claim surplus processes indexed by the premium rate p, such that the claim size distribution B and the Poisson rate β are the same for all p (i.e., S_t = Σ_{i=1}^{N_t} U_i − tp), and consider the limit p ↓ ρ, where ρ is the critical premium rate βμ_B.
Theorem 5.1 As p ↓ ρ, we have

  { (μ/σ²) S^{(p)}_{tσ²/μ²} }_{t≥0} →^D {W₋₁(t)}_{t≥0},   (5.2)

where μ = μ_p = p − ρ, σ² = βμ_B^{(2)}.

Proof. The first step is to note that

  { (1/(σ√c)) (S^{(p)}_{tc} + tcμ_p) } = { (1/(σ√c)) S^{(ρ)}_{tc} } →^D {W₀(t)}   (5.3)

whenever c = c_p ↑ ∞ as p ↓ ρ. Indeed, this is an easy consequence of (5.1) with S_n* = S_n^{(ρ)} and the inequalities

  S^{(ρ)}_{n/c} − ρ/c ≤ S^{(ρ)}_t ≤ S^{(ρ)}_{(n+1)/c} + ρ/c,  n/c ≤ t ≤ (n+1)/c,

cf. Lemma IV.1.3. Letting c = σ²/μ_p², (5.3) takes the form

  { (μ/σ²) S^{(p)}_{tσ²/μ²} + t } →^D {W₀(t)},
  { (μ/σ²) S^{(p)}_{tσ²/μ²} } →^D {W₀(t) − t} = {W₋₁(t)}.   □

Now let

  τ_p(u) = inf{t ≥ 0 : S_t^{(p)} > u},  τ_ζ(u) = inf{t ≥ 0 : W_ζ(t) > u}.

It is well-known (Corollary III.1.6 or [APQ, p. 263]) that the distribution IG(·; ζ; u) of τ_ζ(u) (often referred to as the inverse Gaussian distribution) is given by

  IG(x; ζ; u) = P(τ_ζ(u) ≤ x) = 1 − Φ(u/√x − ζ√x) + e^{2ζu} Φ(−u/√x − ζ√x).   (5.4)

Note that IG(·; ζ; u) is defective when ζ < 0.

Corollary 5.2 As p ↓ ρ,

  ψ_p(uσ²/μ, Tσ²/μ²) → IG(T; −1; u).
Proof. Since f → sup_{0≤t≤T} f(t) is continuous on D a.e. w.r.t. any probability measure concentrated on the continuous functions, the continuous mapping theorem yields

  sup_{0≤t≤T} (μ/σ²) S^{(p)}_{tσ²/μ²} →^D sup_{0≤t≤T} W₋₁(t).

Since the r.h.s. has a continuous distribution, this implies

  P( sup_{0≤t≤T} (μ/σ²) S^{(p)}_{tσ²/μ²} > u ) → P( sup_{0≤t≤T} W₋₁(t) > u ).

But the l.h.s. is ψ_p(uσ²/μ, Tσ²/μ²), and the r.h.s. is IG(T; −1; u). □
For practical purposes, Corollary 5.2 suggests the approximation

  ψ(u, T) ≈ IG(Tμ²/σ²; −1; uμ/σ²).   (5.5)
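The passage-time distribution (5.4) and the resulting approximation (5.5) are straightforward to evaluate; the following sketch (illustrative, standard library only) implements both:

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_gaussian_cdf(x, zeta, u):
    """IG(x; zeta; u) of (5.4): P(tau_zeta(u) <= x), the first passage time of
    Brownian motion with drift zeta and unit variance to level u > 0.
    Defective, with total mass exp(2*zeta*u), when zeta < 0."""
    if x <= 0.0:
        return 0.0
    r = math.sqrt(x)
    return (1.0 - normal_cdf(u / r - zeta * r)
            + math.exp(2.0 * zeta * u) * normal_cdf(-u / r - zeta * r))

def psi_diffusion(u, T, mu, sigma2):
    """Diffusion approximation (5.5): psi(u,T) ~ IG(T*mu^2/sigma^2; -1; u*mu/sigma^2),
    where mu = p - rho is the (small, positive) drift parameter of Theorem 5.1."""
    return inverse_gaussian_cdf(T * mu**2 / sigma2, -1.0, u * mu / sigma2)
```

Letting T → ∞ recovers the heavy-traffic value e^{−2uμ/σ²} of (5.6) below, since IG(∞; −1; w) = e^{−2w}.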
Note that letting T → ∞ in (5.5), we obtain formally the approximation

  ψ(u) ≈ IG(∞; −1; uμ/σ²) = e^{−2uμ/σ²}.   (5.6)

This is the same as the heavy-traffic approximation derived in IV.7c. However, since ψ(u) has infinite horizon, the continuity argument above does not generalize immediately, and in fact some additional arguments are needed to justify (5.6) from Theorem 5.1. Because of the direct argument in Chapter IV, we omit the details; see Grandell [426], [427] or [APQ, pp. 196, 199].

Checks of the numerical fits of (5.5) and (5.6) are presented, e.g., in Asmussen [55]. The picture which emerges is that the approximations are not terribly precise, in particular for large u. In view of the excellent fit of the Cramér-Lundberg approximation, (5.6) therefore does not appear to be of much practical relevance for the compound Poisson model. However, for more general models it may be easier to generalize the diffusion approximation than the Cramér-Lundberg approximation; as an example of such a generalization we mention the paper [342] by Emanuel et al. on the premium rule involving interest.

In contrast, the simplicity of (5.5) combined with the fact that finite horizon ruin probabilities are so hard to deal with even for the compound Poisson model makes this approximation more appealing. However, in the next section we shall derive a refinement of (5.5) for the compound Poisson model which does not require much more computation, and which is much more precise.

We conclude this section by giving a more general triangular array version of Theorem 5.1. The proof is a straightforward combination of the proof of Theorem 5.1 and Section VIII.6 of [APQ].
Theorem 5.3 Consider a family {S_t^{(θ)}} of claim surplus processes indexed by a parameter θ, such that the Poisson rate β_θ, the claim size distribution B_θ and the premium rate p_θ depend on θ. Assume further that β_θμ_{B_θ} < p_θ, that

  β_θ → β_{θ₀},  B_θ →^D B_{θ₀},  p_θ → p_{θ₀},  p_θ − β_θμ_{B_θ} → 0 as θ → θ₀,

and that the U² are uniformly integrable w.r.t. the B_θ. Then as θ → θ₀, we have

  { (μ/σ²) S^{(θ)}_{tσ²/μ²} }_{t≥0} →^D {W₋₁(t)}_{t≥0},   (5.7)

where μ = μ_θ = p_θ − β_θμ_{B_θ}, σ² = σ_θ² = β_θμ_{B_θ}^{(2)}.

Notes and references Diffusion approximations of random walks via Donsker's theorem are a classical topic of probability theory. See for example Billingsley [167]. The first application in risk theory is Iglehart [492], and two further standard references in the area are Grandell [426], [427]. All material of this section can be found in these references. For claims with infinite variance, Furrer, Michna & Weron [383] suggested an approximation by a stable Lévy process rather than a Brownian motion. Further relevant references in this direction are Furrer [382], Boxma & Cohen [194] and Whitt [883].
6
Corrected diffusion approximations
The idea behind the simple diffusion approximation is to replace the risk process by a Brownian motion (by fitting the two first moments) and use the Brownian first passage probabilities as an approximation for the ruin probabilities. Since Brownian motion is skip-free, this idea ignores (among other things) the presence of the overshoot ξ(u), which we have seen to play an important role for example for the Cramér-Lundberg approximation. The objective of the corrected diffusion approximation is to take this and other deficits into consideration.

The setup is the exponential family of compound risk processes with parameters β_θ, B_θ constructed in IV.4. However, whereas there we let the given risk process with safety loading η > 0 correspond to θ = 0, it is more convenient here to use some value θ₀ < 0 and let θ = 0 correspond to η = 0 (zero drift); this is because in the regime of the diffusion approximation, η is close to zero, and we want to consider the limit η ↓ 0 corresponding to θ₀ ↑ 0.

In terms of the given risk process with Poisson intensity β, claim size distribution B, κ(α) = β(B̂[α] − 1) − α and ρ = βμ_B < 1, η = 1/ρ − 1 > 0, this means the following:

1. Determine γ_m > 0 by κ'(γ_m) = 0 and let θ₀ = −γ_m.
2. Let P₀ refer to the risk process with parameters

  β₀ = βB̂[−θ₀],  B₀(dx) = (e^{−θ₀x}/B̂[−θ₀]) B(dx).

Then E₀U^k = B̂₀^{(k)}[0] = B̂^{(k)}[−θ₀]/B̂[−θ₀], and κ₀(s) = κ(s − θ₀) − κ(−θ₀), κ₀'(0) = 0.

3. For each θ, let P_θ refer to the risk process with parameters

  β_θ = β₀B̂₀[θ] = βB̂[θ − θ₀],  B_θ(dx) = (e^{θx}/B̂₀[θ]) B₀(dx) = (e^{(θ−θ₀)x}/B̂[θ − θ₀]) B(dx).

Then κ_θ(s) = κ₀(s + θ) − κ₀(θ) = κ(s + θ − θ₀) − κ(θ − θ₀), and the given risk process corresponds to P_{θ₀} where θ₀ = −γ_m.

In this setup, P_θ(τ(u) < ∞) = 1 for θ ≥ 0, P_θ(τ(u) < ∞) < 1 for θ < 0, and we are studying ψ(u, T) = P_{θ₀}(τ(u) ≤ T) for θ₀ < 0, θ₀ ↑ 0. Recall that IG(x; ζ; u) denotes the distribution function of the passage time of Brownian motion {W_ζ(t)} with unit variance and drift ζ from level 0 to level u > 0. One has

  IG(x; ζ; u) = IG(x/u²; ζu; 1).   (6.1)

The corrected diffusion approximation to be derived is

  ψ(u, T) ≈ IG( Tν₁/u² + ν₂/u; −γu/2; 1 + ν₂/u ),   (6.2)

where as usual γ > 0 is the adjustment coefficient for the given risk process, i.e. the solution of κ(γ) = 0, and

  ν₁ = β₀E₀U² = βB̂''[γ_m],  ν₂ = E₀U³/(3E₀U²) = B̂'''[γ_m]/(3B̂''[γ_m]).

Write the initial reserve u for the given risk process as u = ζ/θ₀ (note that ζ < 0) and, for brevity, write τ = τ(u), ξ = ξ(u) = S_τ − u. The first step in the derivation is to note that

  E_{θ₀}S₁ = κ₀'(θ₀) ∼ θ₀κ₀''(0) = θ₀ν₁ = ζν₁/u,
  Var_{θ₀}S₁ = κ₀''(θ₀) ∼ Var₀S₁ = β₀E₀U² = ν₁,  θ₀ ↑ 0.
Theorem 5.3 applies (with μ = −ζν₁/u, σ² = ν₁) and yields

  { (−ζ/u) S_{tu²/(ζ²ν₁)} }_{t≥0} →^D {W₋₁(t)}_{t≥0},

which easily leads to

  { (1/(u√ν₁)) S_{tu²} }_{t≥0} →^D {W_{ζ√ν₁}(t)}_{t≥0},
  ψ(u, tu²) → IG(t; ζ√ν₁; 1/√ν₁) = IG(tν₁; ζ; 1).

Since

  ∫₀^∞ e^{−λt} IG(dt; ζ; u) = e^{−u h(λ,ζ)},  where h(λ, ζ) = √(2λ + ζ²) − ζ,   (6.3)

this implies (take u = 1)

  E_{θ₀} exp{−λν₁τ(u)/u²} → e^{−h(λ,ζ)}.   (6.4)

The idea of the proof is to improve upon this by an O(u⁻¹) term (in the following, ≈ means up to o(u⁻¹) terms):

Proposition 6.1 As u → ∞, θ₀ ↑ 0 in such a way that ζ = θ₀u is fixed, it holds for any fixed λ > 0 that

  E_{θ₀} exp{−λν₁τ(u)/u²} ≈ exp{−h(λ, −γu/2)(1 + ν₂/u)} [1 + λν₂/u].   (6.5)

Once this is established, we get by formal Laplace transform inversion that

  ψ(u, tu²/ν₁) ≈ IG( t + ν₂/u; −γu/2; 1 + ν₂/u ).

Indeed, the r.h.s. is the c.d.f. of a (defective) r.v. distributed as Z − ν₂/u, where Z has distribution IG(·; −γu/2; 1 + ν₂/u). But the Laplace transform of such an r.v. is Ee^{−λZ} e^{λν₂/u} ≈ Ee^{−λZ}[1 + λν₂/u], where the last expression coincides with the r.h.s. of (6.5) according to (6.3). To arrive at (6.2), just replace t by Tν₁/u². Note, however, that whereas the proof of Proposition 6.1 below is exact, the formal Laplace transform inversion is heuristic: an additional argument would be required to infer that the remainder term in (6.2) is indeed o(u⁻¹). The
justification for the procedure is the wonderful numerical fit which has been found in numerical examples, and which for a small or moderate safety loading η is by far the best among the various available approximations [note, however, that the saddlepoint approximation of Barndorff-Nielsen & Schmidli [138] is a serious competitor and is in fact preferable if η is large]. A numerical illustration is given in Fig. V.5, which is based upon exponential claims with mean μ_B = 1. The solid line represents the exact value, calculated using numerical integration and Proposition 1.3, and the dotted line the corrected diffusion approximation (6.2). In (1) and (2), we have ρ = β = 0.7; in (3) and (4), ρ = 0.4. The initial reserve u has been selected such that the infinite horizon ruin probability ψ(u) is 10% in (1) and (3), and 1% in (2) and (4).
Figure V.5

It is seen that the numerical fit is extraordinary for ρ = 0.7. Note that the ordinary diffusion approximation requires ρ to be close to 1 and ψ(u) to be not too small, and all of the numerical studies the authors know of indicate that its fit at ρ = 0.7 or at values of ψ(u) like 1% is unsatisfactory. Similarly, the fit at ρ = 0.4 may not be outstanding but nevertheless, it gives the right order
of magnitude, and the ordinary diffusion approximation hopelessly fails for this value of ρ. For further numerical illustrations, see Asmussen [55], Barndorff-Nielsen & Schmidli [138] and Asmussen & Højgaard [81].

The proof of Proposition 6.1 proceeds in several steps.

Lemma 6.2 With θ̃ = (2λ + ζ²)^{1/2},

  e^{−h(λ,ζ)} ≈ E_{θ₀} exp{ h(λ,ζ)ξ/u − λν₁τ/u² − (ν₁ν₂τ/(2u³))(θ̃³ − ζ³) }.

Proof. For θ̃ ≥ 0,

  1 = P_{θ̃}(τ < ∞) = E_{θ₀} exp{ (θ̃ − θ₀)(u + ξ) − τ(κ₀(θ̃) − κ₀(θ₀)) }.

Replacing θ̃ by θ̃/u and θ₀ by ζ/u yields

  e^{−(θ̃−ζ)} = E_{θ₀} exp{ (θ̃ − ζ)ξ/u − τ(κ₀(θ̃/u) − κ₀(ζ/u)) }.

Let θ̃ = (2λ + ζ²)^{1/2} = h(λ, ζ) + ζ and note that

  κ₀(θ) = (θ²/2)β₀E₀U² + (θ³/6)β₀E₀U³ + ··· = ν₁θ²/2 + ν₁ν₂θ³/2 + ···.   (6.6)

Using θ̃² − ζ² = 2λ, the result follows. □

Lemma 6.3 lim_{u→∞} E₀ξ(u) = E₀ξ(∞) = ν₂ = E₀U³/(3E₀U²).

Proof. By partial integration, the formulas

  P₀(ξ(0) > x) = P₀(S_{τ(0)} > x) = (1/E₀U) ∫_x^∞ P₀(U > y) dy,
  P₀(ξ(∞) > x) = (1/E₀ξ(0)) ∫_x^∞ P₀(ξ(0) > y) dy

imply

  E₀ξ(0)^k = E₀U^{k+1}/((k+1)E₀U),  E₀ξ(∞)^k = E₀ξ(0)^{k+1}/((k+1)E₀ξ(0)).   □

Lemma 6.4

  E_{θ₀} exp{−λν₁τ/u²} ≈ exp{−h(λ,ζ)(1 + ν₂/u)} { 1 + (ν₂/(2u)) [ 2λ + ζ² − ζ³/(2λ+ζ²)^{1/2} ] }.
Proof. It follows by a suitable variant of Stam's lemma (Proposition 4.4) that the r.h.s. in Lemma 6.2 behaves like

  E_{θ₀} exp{−λν₁τ/u²} [ 1 + h(λ,ζ)ξ/u − (ν₁ν₂τ/(2u³))(θ̃³ − ζ³) ]
   ≈ E_{θ₀} exp{−λν₁τ/u²} + e^{−h(λ,ζ)} h(λ,ζ) ν₂/u
     − (ν₂(θ̃³ − ζ³)/(2u)) E_{θ₀}[ (ν₁τ/u²) exp{−λν₁τ/u²} ].   (6.7)

The last term is approximately

  (ν₂/(2u))(θ̃³ − ζ³) (d/dλ) e^{−h(λ,ζ)}
   = −(ν₂/(2u)) [ 2λ + ζ² − ζ³/(2λ+ζ²)^{1/2} ] e^{−h(λ,ζ)}
   ≈ −(ν₂/(2u)) [ 2λ + ζ² − ζ³/(2λ+ζ²)^{1/2} ] exp{−h(λ,ζ)(1 + ν₂/u)}.

The result follows by combining Lemma 6.2 and (6.7) and using

  e^{−h(λ,ζ)} − e^{−h(λ,ζ)} h(λ,ζ) ν₂/u ≈ exp{−h(λ,ζ)(1 + ν₂/u)}.   □

The last step is to replace h(λ, ζ) by h(λ, −γu/2). There are two reasons for this: in this way, we get the correct asymptotic exponential decay parameter γ in the approximation (6.2) for ψ(u) (indeed, letting formally T → ∞ yields ψ(u) ≈ C'e^{−γu} where C' = e^{−γν₂}); and the correction terms which need to be added cancel conveniently with some of the more complicated expressions in Lemma 6.4.

Lemma 6.5

  exp{−h(λ,ζ)(1 + ν₂/u)} ≈ exp{−h(λ,−γu/2)(1 + ν₂/u)} { 1 − (ν₂/(2u)) [ ζ² − ζ³/(2λ+ζ²)^{1/2} ] }.

Proof. Use first (6.6) and κ₀(θ₀) = κ₀(γ + θ₀) to get

  0 = (ν₁/2)(γ² + 2γθ₀) + (ν₁ν₂/2)(γ³ + 3γ²θ₀ + 3γθ₀²) + O(u⁻⁴),

i.e.

  γ/2 + θ₀ = −(ν₂/2)(γ² + 3γθ₀ + 3θ₀²) + O(u⁻³).
Thus γ = −2θ₀ + O(u⁻²), and inserting this and θ₀ = ζ/u on the r.h.s. yields

  γ/2 + θ₀ = −ν₂θ₀²/2 + O(u⁻³) = −ν₂ζ²/(2u²) + O(u⁻³).

Thus by Taylor expansion around ζ = θ₀u, we get

  h(λ, −γu/2) ≈ h(λ, ζ) − [ ζ/(2λ+ζ²)^{1/2} − 1 ] (γ/2 + θ₀)u,

and hence, since (γ/2 + θ₀)u = −ν₂ζ²/(2u) + O(u⁻²),

  exp{−h(λ,−γu/2)(1 + ν₂/u)}
   ≈ exp{−h(λ,ζ)(1 + ν₂/u)} { 1 + [ ζ/(2λ+ζ²)^{1/2} − 1 ] (γ/2 + θ₀)u }
   ≈ exp{−h(λ,ζ)(1 + ν₂/u)} { 1 − (ν₂/(2u)) [ ζ³/(2λ+ζ²)^{1/2} − ζ² ] }.

Solving for exp{−h(λ,ζ)(1 + ν₂/u)} yields the assertion. □

Proof of Proposition 6.1. Just insert Lemma 6.5 in Lemma 6.4. □
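For exponential claims all ingredients of the corrected diffusion approximation (6.2) are explicit: with claim rate ν and arrival rate β < ν one has γ = ν − β, γ_m = ν − √(βν), ν₁ = 2/√(βν) and ν₂ = 1/√(βν). The following sketch (illustrative code; the function names are ours) evaluates (6.2) via the inverse Gaussian c.d.f. (5.4):

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ig_cdf(x, zeta, u):
    """Inverse Gaussian first-passage c.d.f. IG(x; zeta; u) of (5.4)."""
    if x <= 0.0:
        return 0.0
    r = math.sqrt(x)
    return (1.0 - normal_cdf(u / r - zeta * r)
            + math.exp(2.0 * zeta * u) * normal_cdf(-u / r - zeta * r))

def psi_corrected_diffusion(u, T, beta, nu):
    """Corrected diffusion approximation (6.2) for exponential claims with
    rate nu and arrival rate beta < nu:
    psi(u,T) ~ IG(T*nu1/u^2 + nu2/u; -gamma*u/2; 1 + nu2/u)."""
    gamma = nu - beta
    nu1 = 2.0 / math.sqrt(beta * nu)
    nu2 = 1.0 / math.sqrt(beta * nu)
    return ig_cdf(T * nu1 / u**2 + nu2 / u, -gamma * u / 2.0, 1.0 + nu2 / u)
```

As T → ∞ the approximation tends to e^{−γu}·e^{−γν₂}, i.e. the constant C' = e^{−γν₂} noted before Lemma 6.5, illustrating that (6.2) has the correct exponential decay parameter γ.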
Notes and references Corrected diffusion approximations were introduced by Siegmund [809] in a discrete random walk setting, with the translation to risk processes being carried out by Asmussen [55]; this case is in part simpler than the general random walk case because the ladder height distribution G₊ can be found explicitly (as ρB₀), which avoids the numerical integration involving characteristic functions which was used in [809] to determine the constants. In Siegmund's book [810], the approach to the finite horizon case is in part different and uses local central limit theorems. The adaptation to risk theory has not been carried out. The corrected diffusion approximation was extended to the renewal model in Asmussen & Højgaard [81], and to the Markov-modulated model of Chapter VII in Asmussen [58]; Fuh [379] considers the closely related case of discrete time Markov additive processes. Hogan [473] considered a variant of the corrected diffusion approximation which does not require exponential moments. His ideas were adapted by Asmussen & Binswanger [72] to derive approximations for the infinite horizon ruin probability ψ(u) when claims are heavy-tailed; the analogous analysis of finite horizon ruin probabilities ψ(u, T) has not been carried out and seems nontrivial. For corrected diffusion approximations with higher-order terms, see Blanchet & Glynn [174]; their results also cover some heavy-tailed cases.
7
How does ruin occur?
We saw in Section 4 that given that ruin occurs, the 'typical' value of τ(u) (say in the sense of the conditional mean) was um_L, that is, the same as for the unconditional Lundberg process. We shall now generalize this question by asking what a sample path of the risk process looks like given that it leads to ruin. The answer is similar: the process behaves as if it changed its whole distribution to P_L, i.e. changed its arrival rate from β to β_L and its claim size distribution from B to B_L.

Recall that F_{τ(u)} is the stopping time σ-algebra carrying all relevant information about τ(u) and {S_t}_{0≤t≤τ(u)}. Define P^{(u)} = P( · | τ(u) < ∞) as the distribution of the risk process given ruin with initial reserve u. We are concerned with describing the P^{(u)} distribution of {S_t}_{0≤t≤τ(u)} (note that the behavior after τ(u) is trivial: by the strong Markov property, {S_{τ(u)+t} − S_{τ(u)}}_{t≥0} is just an independent copy of {S_t}_{t≥0}).

Theorem 7.1 Let {F(u)}_{u≥0} be any family of events with F(u) ∈ F_{τ(u)} and satisfying P_L F(u) → 1, u → ∞. Then also P^{(u)} F(u) → 1.

Proof.

  P^{(u)} F(u)^c = P(F(u)^c; τ(u) < ∞) / P(τ(u) < ∞) = E_L[e^{−γS_{τ(u)}}; F(u)^c] / ψ(u)
   = e^{−γu} E_L[e^{−γξ(u)}; F(u)^c] / ψ(u) ≤ e^{−γu} P_L F(u)^c / ψ(u) ∼ e^{−γu} P_L F(u)^c / (Ce^{−γu}) → 0.   □
Corollary 7.2 If B is exponential, then P^{(u)} and P_L coincide on

  F_{τ(u)−} = σ( τ(u), {S_t}_{0≤t<τ(u)} ).

Theorem 7.1 also yields laws of large numbers for the claims leading to ruin. E.g., with M(u) = N_{τ(u)} denoting the number of claims up to the time of ruin, one has for any ε > 0

  P^{(u)}( | (1/M(u)) Σ_{k=1}^{M(u)} T_k − 1/β_L | ≤ ε ) → 1,
  P^{(u)}( | (1/M(u)) Σ_{k=1}^{M(u)} U_k − μ_{B_L} | ≤ ε ) → 1.

The first statement follows by applying Theorem 7.1 to the events on the l.h.s., noting that M(u) → ∞ and that the LLN holds under P_L; the proof of the second is similar. □
We finally consider the limiting joint distribution of

  ζ(u) = u − S_{τ(u)−} = R_{τ(u)−}  and  ξ(u) = S_{τ(u)} − u = −R_{τ(u)}

(the surplus prior to ruin, resp. the deficit at ruin).

Proposition 7.4 Under the conditions of the Cramér-Lundberg approximation, (ζ(u), ξ(u)) has a proper limit (ζ(∞), ξ(∞)) as u → ∞ in P^{(u)} distribution. The limiting distribution is given by

  P(ζ(∞) ∈ dx, ξ(∞) ≥ y) = (β/(1−ρ)) B̄(x+y)(e^{γx} − 1) dx.

Proof. Define Z(u) = P(ζ(u) ∈ dx, ξ(u) ≥ y). Consider first the case where ζ(u) ∈ dx, ξ(u) ≥ y occurs in the first ladder step, illustrated in Fig. V.6.
[Figure V.6: ruin in the first ladder step; the claim at the ladder epoch takes the claim surplus from S_{τ(u)−} = u − x (which requires x > u) to a level above u + y.]

For occurrence in the first ladder step, we need x > u and ζ(0) = x − u. Also, by Theorem IV.2.2 the (defective) density of ζ(0) is βB̄(z), and given ζ(0) = z, the available information on ξ(0) is that the claim U at the ladder epoch is conditioned to exceed z. Thus, taking z = x − u, we get the contribution to Z(u) from the first ladder step as

  βB̄(x−u) · (B̄(x+y)/B̄(x−u)) · I(u < x) = βB̄(x+y) I(u < x).

If ξ(0) = z < u, everything repeats itself from level z, and thus, since ξ(0) also has density βB̄(z),

  Z(u) = βB̄(x+y) I(u < x) + ∫₀^u Z(u−z) βB̄(z) dz.

This is a defective renewal equation, and since κ(γ) = 0 implies ∫₀^∞ e^{γv} βB̄(v) dv = 1, the usual exponential technique gives that e^{γu}Z(u) has the limit

  βB̄(x+y) ∫₀^x e^{γu} du / ∫₀^∞ βB̄(v) v e^{γv} dv = βB̄(x+y) · ((e^{γx} − 1)/γ) · (γC/(1−ρ)) = (Cβ/(1−ρ)) B̄(x+y)(e^{γx} − 1),

where C is the Cramér-Lundberg constant (for the last equality, see the calculations around IV.(5.8) and IV.(5.11)). Thus Z(u)/(Ce^{−γu}) has the asserted limit, which is what was to be shown. That the limit is proper follows from

  ∫₀^∞ (β/(1−ρ)) B̄(x)(e^{γx} − 1) dx = (1/(1−ρ)) ( (β/γ)(B̂[γ] − 1) − ρ ) = (1/(1−ρ))(1 − ρ) = 1.   □
Notes and references Proposition 7.4 can be found in Schmidli [773]. It will be shown in X.4 that in the heavy-tailed case, ζ(u) and ξ(u) need to be scaled down before a conditional limit can be obtained. The remaining results of the present section are part of a more general study carried out by the first author [54]. A somewhat similar study was carried out in the queueing setting by Anantharam [48], who also treated the heavy-tailed case; however, the queueing results are of a somewhat different type because of the presence of reflection at 0. From a mathematical point of view, the subject treated in this section leads into the area of large deviations theory. This is currently a very active area of research, see further XIII.1. Convergence properties of empirical finite-time ruin probabilities are investigated in Loisel, Mazza & Rullière [608]. Schmidli [781] derives some exact expressions for the distribution of the risk process given that ruin occurs. These may be seen as special h-transform calculations for piecewise deterministic Markov processes (recall that the distribution of a Markov chain or Markov process conditioned to hit a set is always an h-transform, see e.g. Asmussen & Glynn [79, VI.7]).
Chapter VI
Renewal arrivals

1
Introduction
The basic assumption of this chapter states that the arrival epochs σ₁, σ₂, ... of the risk process form a renewal process: letting T_n = σ_n − σ_{n−1} (T₁ = σ₁), the T_n are independent, with the same distribution A (say) for T₂, T₃, .... In the so-called zero-delayed case, the distribution A₁ of T₁ is A as well. A different important possibility is for A₁ to be the stationary delay distribution A₀ with density Ā(x)/μ_A. Then the arrival process is stationary, which could be a reasonable assumption in many cases (for these and further basic facts from renewal theory, see A.1).

We use much of the same notation as in Chapter I. Thus the premium rate is 1, the claim sizes U₁, U₂, ... are i.i.d. with common distribution B, {S_t} is the claim surplus process given by I.(1.5), with N_t = #{n : σ_n ≤ t} the number of arrivals before t, M the maximum of {S_t}, and τ(u) the time to ruin. The ruin probability corresponding to the zero-delayed case is denoted by ψ(u), the one corresponding to the stationary case by ψ^{(s)}(u), and the one corresponding to T₁ = s by ψ_s(u).

Proposition 1.1 Define ρ = μ_B/μ_A. Then regardless of the distribution A₁ of T₁,

  lim_{t→∞} S_t/t = ρ − 1 a.s.,  lim_{t→∞} ES_t/t = ρ − 1,   (1.1)

  lim_{t→∞} Var(S_t)/t = (μ_B²σ_A² + μ_A²σ_B²)/μ_A³.   (1.2)
Furthermore, for any a > 0,

  lim_{t→∞} E[S_{t+a} − S_t] = a(ρ − 1).   (1.3)

Proof. Obviously,

  ES_t = E[ E( Σ_{i=1}^{N_t} U_i | N_t ) ] − t = EN_t · μ_B − t.

By the elementary renewal theorem (cf. A.1), EN_t/t → 1/μ_A. From this (1.1) follows, and (1.3) follows similarly by Blackwell's renewal theorem, stating that E[N_{t+a} − N_t] → a/μ_A. For (1.2), we get in the same way, by using known facts about EN_t and Var N_t, that

  Var(S_t) = Var( E[Σ_{i=1}^{N_t} U_i | N_t] ) + E( Var[Σ_{i=1}^{N_t} U_i | N_t] )
   = Var(μ_B N_t) + E(σ_B² N_t) = tμ_B²σ_A²/μ_A³ + tσ_B²/μ_A + o(t).   □

Of course, Proposition 1.1 gives the desired interpretation of the constant ρ as the expected claims per unit time. Thus, the definition η = 1/ρ − 1 of the safety loading appears reasonable here as well.

The renewal model is often referred to as the Sparre Andersen process, after E. Sparre Andersen, whose 1959 paper [816] was the first to treat renewal assumptions in risk theory in more depth. The simplest case is of course the Poisson case, where A and A₁ are both exponential with rate β. This has a direct physical interpretation (a large portfolio with claims arising with small rates and independently). Here are two special cases of the renewal model with a similar direct interpretation (see also the discussion in I.3):

Example 1.2 (Deterministic arrivals) If A is degenerate, say at a, one could imagine that the claims are recorded only at discrete epochs (say each week or month) and thus each U_n is really the accumulated claims over a period of length a. □

Example 1.3 (Switched Poisson arrivals) Assume that the process has a random environment with two states on, off, such that no arrivals occur in the off state, but the arrival rate in the on state is β > 0. If the environment
is Markovian with transition rate λ from on to off and µ from off to on, the interarrival times become i.i.d. (an arrival occurs necessarily in the on state, and then the whole process repeats itself). More precisely, A is phase-type (Example I.2.4) with phase space {on, off}, initial vector (1 0) and phase generator

( −β−λ    λ  )
(   µ    −µ  ).   □
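For readers who want to experiment, the phase-type representation above can be sanity-checked numerically. The sketch below uses hypothetical rates β = 2, λ = 0.5, µ = 1 (chosen only for illustration, not taken from the text) and compares the phase-type mean α(−T)^{-1}e with the mean obtained from a direct first-step analysis of the on/off mechanism:

```python
import numpy as np

# Hypothetical rates: arrival rate in "on", switch rates on->off and off->on.
beta, lam, mu = 2.0, 0.5, 1.0

# Phase generator from Example 1.3 and initial vector (1 0).
T = np.array([[-beta - lam, lam],
              [mu,          -mu]])
alpha = np.array([1.0, 0.0])
e = np.ones(2)

# Mean of a phase-type distribution: alpha (-T)^{-1} e.
mean_phasetype = alpha @ np.linalg.inv(-T) @ e

# First-step analysis: m_on = 1/(beta+lam) + lam/(beta+lam) * m_off and
# m_off = 1/mu + m_on, which solves to m_on = (1 + lam/mu)/beta.
mean_direct = (1.0 + lam / mu) / beta

print(mean_phasetype, mean_direct)  # both 0.75 for these rates
```

The agreement of the two numbers is exactly the statement that the interarrival time has the phase-type representation given above.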
However, in general the mechanism generating a renewal arrival process appears much harder to interpret in the risk theory context, and therefore the relevance of the model has been questioned repeatedly; see the discussion in I.3. Nevertheless, we will present at least some basic features of the renewal model, if only for the mathematical elegance of the subject, the fundamental connections to the theory of queues and random walks, and for historical reasons.

The following representation of the ruin probability (already discussed in Section IV.1) will be a basic vehicle for studying the ruin probabilities:

Proposition 1.4 The ruin probabilities for the zero-delayed case can be represented as ψ(u) = P(M^(d) > u), where M^(d) = max{S_n^(d) : n = 0, 1, …} with {S_n^(d)} a discrete time random walk with increments distributed as the independent difference U − T between a claim U and an interarrival time T.

Proof. The essence of the argument is that ruin can only occur at claim times. The values of the claim surplus process just after claims have the same distribution as {S_n^(d)}. Since the claim surplus process {S_t} decreases in between arrival times, we have

max_{0≤t<∞} S_t = max_{n=0,1,…} S_n^(d). □

Conditioning upon T_1 = s yields

ψ_s(u) = B̄(u+s) + ∫_0^{u+s} ψ(u+s−y) B(dy).   (1.4)

Here the first term represents the probability P(U_1 − s > u) of ruin at the time s of the first claim, whereas the second is P(τ(u) < ∞, U_1 − s ≤ u), as follows easily by noting that the evolution of the risk process after time s is that of a renewal risk model with initial reserve U_1 − s. For the stationary case, integrate (1.4) w.r.t. A_0. □
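The first-claim decomposition (1.4) can be verified numerically in a case where everything is explicit. The following sketch uses hypothetical parameters (claims exponential with rate δ = 2, interarrival times Erlang(2) with rate ν = 3, so that δµ_A > 1), computes γ and π_+ from the explicit solution for exponential claims derived in Section 2 below, and checks that integrating ψ_s(u) from (1.4) over T_1 = s ∼ A reproduces the zero-delayed ψ(u) = π_+ e^{−γu}:

```python
import numpy as np

delta, nu = 2.0, 3.0      # hypothetical: B = Exp(delta), A = Erlang(2, nu)

# Adjustment coefficient: 1 = E e^{g(U-T)} = (delta/(delta-g)) * (nu/(nu+g))^2,
# solved for the positive root by bisection.
f = lambda g: delta / (delta - g) * (nu / (nu + g)) ** 2 - 1.0
lo, hi = 0.1, 1.5         # f(lo) < 0 < f(hi) brackets the positive root
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
gamma = 0.5 * (lo + hi)
pi_plus = 1.0 - gamma / delta

# psi_s(u) from (1.4) with psi(v) = pi_plus * e^{-gamma v}; for exponential B
# the inner integral over the claim size is available in closed form.
def psi_s(u, s):
    x = u + s
    return np.exp(-delta * x) + pi_plus * delta * (
        np.exp(-delta * x) - np.exp(-gamma * x)) / (gamma - delta)

# Zero-delayed case: psi(u) = int_0^inf psi_s(u) A(ds), A(ds) = nu^2 s e^{-nu s} ds.
u = 1.0
s = np.linspace(0.0, 30.0, 300001)
g = psi_s(u, s) * nu ** 2 * s * np.exp(-nu * s)
psi_u = float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(s)))  # trapezoidal rule

print(psi_u, pi_plus * np.exp(-gamma * u))  # the two values agree
```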
2 Exponential claims. The compound Poisson model with negative claims

We first consider a variant of the compound Poisson model obtained essentially by sign-reversion. That is, the claims and the premium rate are negative, so that the risk reserve process, resp. the claim surplus process, are given by

R*_t = u + Σ_{i=1}^{N*_t} U*_i − t,    S*_t = t − Σ_{i=1}^{N*_t} U*_i,

where {N*_t} is a Poisson process with rate β* (say) and the U*_i are independent of {N*_t} and i.i.d. with common distribution B* (say) concentrated on (0, ∞). This model is sometimes referred to as the dual risk model in the literature¹. A typical sample path of {R*_t} is illustrated in Fig. VI.1.

[Figure VI.1: a sample path of {R*_t}, starting at level u and ending at the ruin time τ*(u).]

One interpretation of the model is to have continuous expenses and events according to a Poisson process (e.g. innovations) which increase the value of the portfolio or company. Another interpretation is of course the workload in an M/G/1 queue in its first busy period. Define the time of ruin τ*(u) = inf{t > 0 : R*_t < 0}. Using Lundberg conjugation, we shall be able to compute the ruin probability ψ*(u) = P(τ*(u) < ∞) for this model very quickly. A simple sample path comparison will then provide us with the ruin probabilities for the renewal model with exponential claim size distribution.

¹Although this terminology is not related to the other duality concepts used in this book.
Theorem 2.1 If β* µ_{B*} ≤ 1, then ψ*(u) = 1 for all u ≥ 0. If β* µ_{B*} > 1, then ψ*(u) = e^{−γu}, where γ > 0 is the unique solution of

0 = κ*(−γ) = β*(B̂*[−γ] − 1) + γ.   (2.1)

[Note that κ*(α) = log E e^{−αS*_1}.]

Proof. Define S*_t = u − R*_t, S̃_t = R*_t − u = −S*_t. Then {S̃_t} is the claim surplus process of a standard compound Poisson risk process with parameters β*, B*. If β* µ_{B*} ≤ 1, then by Proposition IV.1.2,

sup_{t≥0} S*_t = −inf_{t≥0} S̃_t = ∞,

and hence ψ*(u) = 1 follows.

[Figure VI.2: (a) the shape of κ*; (b) the shape of its Lundberg conjugate κ(α) = κ*(α − γ).]

Assume now β* µ_{B*} > 1. Then the function κ* is defined on the whole of (−∞, 0) and typically has the shape shown in Fig. VI.2(a). Hence γ exists and is unique. Let

β = β* B̂*[−γ],    B(dx) = e^{−γx} B*(dx) / B̂*[−γ],

and let {S_t} be a compound Poisson risk process with parameters β, B. Then the c.g.f. of {S_t} is κ(α) = κ*(α − γ), cf. Fig. VI.2(b), and the Lundberg conjugate of {S_t} is {S̃_t} and vice versa. Define

τ_−(u) = inf{t ≥ 0 : S_t = −u},    τ̃_−(u) = inf{t ≥ 0 : S̃_t = −u}.

Since κ′(0) < 0, the safety loading of {S_t} is > 0. Hence τ_−(u) < ∞ a.s., and thus

1 = P(τ_−(u) < ∞) = Ẽ[e^{−γS̃_{τ̃_−(u)}}; τ̃_−(u) < ∞] = e^{γu} P(τ̃_−(u) < ∞) = e^{γu} ψ*(u). □

Now return to the renewal model.
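Before doing so, note that for exponential B* the root of (2.1) is available in closed form, which gives a quick check of Theorem 2.1. The sketch below uses hypothetical parameters β* = 3 and B* = Exp(δ) with δ = 2 (so β*µ_{B*} = 3/2 > 1); here B̂*[−γ] = δ/(δ+γ) and (2.1) reduces to γ(1 − β*/(δ+γ)) = 0, with positive root γ = β* − δ:

```python
import math

beta_star, delta = 3.0, 2.0   # hypothetical; beta_star * mu_{B*} = 3/2 > 1

# kappa*(-g) = beta_star * (Bhat*[-g] - 1) + g, with Bhat*[-g] = delta/(delta+g).
kappa_neg = lambda g: beta_star * (delta / (delta + g) - 1.0) + g

lo, hi = 0.5, 2.0             # brackets the positive root (g = 0 is also a root)
for _ in range(200):          # bisection
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if kappa_neg(mid) < 0 else (lo, mid)
gamma = 0.5 * (lo + hi)

psi_star = lambda u: math.exp(-gamma * u)   # Theorem 2.1: psi*(u) = e^{-gamma u}

print(gamma, beta_star - delta)             # both equal 1 for these parameters
```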
Theorem 2.2 If B is exponential, with rate δ (say), and δµ_A > 1, then ψ(u) = π_+ e^{−γu}, where γ > 0 is the unique solution of

1 = E e^{γ(U−T)} = (δ/(δ−γ)) Â[−γ]   (2.2)

and π_+ = 1 − γ/δ.

Proof. We can couple the renewal model {S_t} and the compound Poisson model {S*_t} with negative claims in such a way that the interarrival times of {S*_t} are T*_0, T*_1 = U_1, T*_2 = U_2, … Then B* = A, β* = δ, and (2.1) means that δ(Â[−γ] − 1) + γ = 0, which is easily seen to be the same as (2.2). Now the value of {S*_t} just before the nth claim is T*_0 + T*_1 + ⋯ + T*_n − U*_1 − ⋯ − U*_n, and from Fig. VI.1 it is seen that ruin is equivalent to one of these values being > u. Hence

M* = max_{t≥0} S*_t = max_{n=0,1,…} {T*_0 + T*_1 + ⋯ + T*_n − U*_1 − ⋯ − U*_n}
   =_D T*_0 + max_{n=0,1,…} {U_1 + ⋯ + U_n − T_1 − ⋯ − T_n} =_D T*_0 + M^(d)

in the notation of Proposition 1.4. Taking m.g.f.'s and noting that ψ*(u) = P(M* > u), so that Theorem 2.1 means that M* is exponentially distributed with rate γ, we get

E e^{αM^(d)} = E e^{αM*} / E e^{αT*_0} = (γ/(γ−α)) / (δ/(δ−α)) = 1 − π_+ + π_+ · γ/(γ−α).

I.e. the distribution of M^(d) is a mixture of an atom at zero and an exponential distribution with rate parameter γ, with weights 1 − π_+ and π_+, respectively. Hence P(M^(d) > u) = π_+ e^{−γu}. □

Remark 2.3 A variant of the last part of the proof, which has the advantage of avoiding transforms and leading up to the basic ideas of the study of the phase-type case in IX.4, goes as follows: define π_+ = P(M^(d) > 0) and consider {S*_t} only when the process is at a maximum value. According to Theorem 2.1, the failure rate of this process is γ. However, alternatively termination occurs at a jump time (having rate δ), with the probability that a particular jump time
is not followed by any later maximum values being 1 − π_+, and hence the failure rate is δ(1 − π_+). Putting this equal to γ, we see that γ = δ(1 − π_+) and hence π_+ = 1 − γ/δ. However, consider instead the failure rate of M^(d) and decompose M^(d) into ladder steps as in III.6, IV.2. The probability that the first ladder step is finite is π_+. Furthermore, a ladder step is the overshoot of a claim size, hence exponential with rate δ. Thus a ladder step terminates at rate δ and is followed by one more with probability π_+. Hence the failure rate of M^(d) is δ(1 − π_+) = γ, and consequently P(M^(d) > u) = P(M^(d) > 0) e^{−γu} = π_+ e^{−γu}. □
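Theorem 2.2 together with Proposition 1.4 is easy to test by simulation. The sketch below (hypothetical parameters: claims Exp(δ = 2), interarrivals Erlang(2) with rate ν = 3) solves (2.2) for γ, simulates the imbedded random walk with increments U − T, and compares the empirical P(M^(d) > u) with π_+ e^{−γu}:

```python
import numpy as np

rng = np.random.default_rng(7)
delta, nu = 2.0, 3.0            # hypothetical: claims Exp(delta), interarrivals Erlang(2, nu)

# gamma solves 1 = E e^{g(U-T)} = (delta/(delta-g)) * (nu/(nu+g))^2.
f = lambda g: delta / (delta - g) * (nu / (nu + g)) ** 2 - 1.0
lo, hi = 0.1, 1.5               # brackets the positive root
for _ in range(100):            # bisection
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
gamma = 0.5 * (lo + hi)
pi_plus = 1.0 - gamma / delta

# Simulate M^(d) = max_n S_n^(d) for the random walk with increments U - T.
paths, steps, u = 10000, 300, 1.0
U = rng.exponential(1.0 / delta, size=(paths, steps))
T = rng.gamma(2.0, 1.0 / nu, size=(paths, steps))    # Erlang(2, nu)
S = np.cumsum(U - T, axis=1)
M = np.maximum(S.max(axis=1), 0.0)                   # include S_0^(d) = 0
est = float((M > u).mean())

print(est, pi_plus * np.exp(-gamma * u))  # Monte Carlo vs. pi_plus * e^{-gamma u}
```

The drift E(U − T) = 1/2 − 2/3 is negative, so truncating the walk after 300 steps changes the maximum only with negligible probability.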
3 Change of measure via exponential families

We shall discuss two points of view, the imbedded discrete time random walk and Markov additive processes.

3a The imbedded random walk

The key steps have already been carried out in Corollary III.3.5, which states that for a given α, the relevant exponential change of measure corresponds to changing the distribution F^(d) of Y = U − T to

F_α^(d)(x) = e^{−κ^(d)(α)} ∫_{−∞}^x e^{αy} F^(d)(dy),

where

κ^(d)(α) = log F̂^(d)[α] = log B̂[α] + log Â[−α].   (3.1)

It only remains to note that this change of measure can be achieved by changing the interarrival distribution A and the claim distribution B to A_α^(d), resp. B_α^(d), where

A_α^(d)(dt) = e^{−αt} A(dt) / Â[−α],    B_α^(d)(dx) = e^{αx} B(dx) / B̂[α].

This follows since, letting P_α^(d) refer to the renewal risk model with these changed parameters, we have

E_α^(d) e^{βY} = B̂_α^(d)[β] Â_α^(d)[−β] = (B̂[α+β]/B̂[α]) · (Â[−α−β]/Â[−α]) = F̂^(d)[α+β] / F̂^(d)[α] = F̂_α^(d)[β].
Let

M(u) = inf{n = 1, 2, … : S_n^(d) > u}

be the number of claims leading to ruin and

ξ(u) = S_{τ(u)} − u = S_{M(u)}^(d) − u

the overshoot; then we get:

Proposition 3.1 For any α such that κ^(d)′(α) ≥ 0,

ψ(u) = e^{−αu} E_α^(d) e^{−αξ(u) + M(u)κ^(d)(α)}.
Consider now the Lundberg case, i.e. let γ > 0 be the solution of κ^(d)(γ) = 0. We have the following versions of Lundberg's inequality and the Cramér-Lundberg approximation:

Theorem 3.2 In the zero-delayed case,
(a) ψ(u) ≤ e^{−γu};
(b) ψ(u) ∼ C e^{−γu}, where C = lim_{u→∞} E_γ^(d) e^{−γξ(u)}, provided the distribution F of U − T is non-lattice.

Proof. Proposition 3.1 implies

ψ(u) = e^{−γu} E_γ^(d) e^{−γξ(u)},

and claim (a) follows immediately from this and ξ(u) > 0 a.s. For claim (b), just note that F_γ^(d) is non-lattice when F is so. This is known to be sufficient for ξ(0) to be non-lattice w.r.t. P_γ^(d) ([APQ, p. 222]) and thereby for ξ(u) to converge in distribution, since P_γ^(d)(τ(0) < ∞) = 1 because of κ^(d)′(γ) > 0. □

It should be noted that the computation of the Cramér-Lundberg constant C is much more complicated for the renewal case than for the compound Poisson case, where C = (1 − ρ)/(β B̂′[γ] − 1) is explicit given γ. In fact, in the easiest non-exponential case, where B is phase-type, the evaluation of C is at the same level of difficulty as the evaluation of ψ(u) in matrix-exponential form, cf. IX.4.

Corollary 3.3 For the delayed case T_1 = s, ψ_s(u) ∼ C_s e^{−γu}, where C_s = C e^{−γs} B̂[γ]. For the stationary case, ψ^(s)(u) ∼ C^(s) e^{−γu}, where

C^(s) = C(B̂[γ] − 1) / (γµ_A).
Proof. Using (1.4), B̄(x) = o(e^{−γx}) and dominated convergence, we get

e^{γu} ψ_s(u) = e^{γu} B̄(u+s) + ∫_0^{u+s} e^{γ(y−s)} e^{γ(u+s−y)} ψ(u+s−y) B(dy)
            → 0 + ∫_0^∞ e^{γ(y−s)} C B(dy) = C_s.

For the stationary case, another use of dominated convergence combined with Â_0[s] = (Â[s] − 1)/(sµ_A) yields

e^{γu} ψ^(s)(u) = ∫_0^∞ e^{γu} ψ_s(u) A_0(ds) → ∫_0^∞ C e^{−γs} B̂[γ] A_0(ds) = C B̂[γ](Â[−γ] − 1)/(−γµ_A) = C^(s). □

Of course, a delayed version of Lundberg's inequality can be obtained in a similar manner. The expressions are slightly more complicated and we omit the details.
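The two expressions for C^(s) (the one in the statement of Corollary 3.3 and the one appearing at the end of its proof) agree precisely because B̂[γ]Â[−γ] = 1 at the adjustment coefficient. A small numerical sketch with hypothetical parameters (B = Exp(δ = 2), A = Erlang(2) with rate ν = 3; the constant C cancels and is omitted):

```python
delta, nu = 2.0, 3.0                     # hypothetical: B = Exp(delta), A = Erlang(2, nu)
Bhat = lambda a: delta / (delta - a)     # m.g.f. of B, defined for a < delta
Ahat = lambda a: (nu / (nu - a)) ** 2    # m.g.f. of A, defined for a < nu
mu_A = 2.0 / nu

# gamma solves kappa^(d)(gamma) = 0, i.e. Bhat[gamma] * Ahat[-gamma] = 1.
f = lambda g: Bhat(g) * Ahat(-g) - 1.0
lo, hi = 0.1, 1.5                        # brackets the positive root
for _ in range(200):                     # bisection
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
gamma = 0.5 * (lo + hi)

# C^(s)/C from the statement of Corollary 3.3 and from the last line of its proof:
stmt = (Bhat(gamma) - 1.0) / (gamma * mu_A)
proof = Bhat(gamma) * (Ahat(-gamma) - 1.0) / (-gamma * mu_A)
print(stmt, proof)                       # identical up to the bisection tolerance
```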
3b Markov additive representations

We take the Markov additive point of view of III.5. The underlying Markov process {J_t} for the Markov additive process {X_t} = {(J_t, S_t)} can be defined by taking J_t as the residual time until the next arrival. According to Remark III.4.9, we look for a function h(s) and a κ (both depending on α) such that Gh_α(s, 0) = κ(α)h(s), where G is the generator of {X_t} = {(J_t, S_t)} and h_α(s, y) = e^{αy} h(s). Let P_s, E_s refer to the case J_0 = s. For s > 0,

E_s h_α(J_dt, S_dt) = h(s − dt) e^{−α dt} = h(s) − dt(αh(s) + h′(s)),

so that Gh_α(s, 0) = −αh(s) − h′(s). Equating this to κh(s) and dividing by h(s) yields h′(s)/h(s) = −α − κ, i.e.

h(s) = e^{−(α+κ(α))s}   (3.2)

(normalizing by h(0) = 1). To determine κ, we invoke the behavior at the boundary 0. Here

1 = h_α(0, 0) = E_0 h_α(J_dt, S_dt) = E[e^{αU} h(T)]

means

1 = ∫_0^∞ e^{αy} B(dy) · ∫_0^∞ h(s) A(ds),

i.e.

B̂[α] Â[−α − κ(α)] = 1.   (3.3)

As in III.5, we can now for each α define a new probability measure P_{α;s} governing {(J_t, S_t)}_{t≥0} by letting the likelihood ratio L_t restricted to F_t = σ((J_v, S_v) : 0 ≤ v ≤ t) be

L_t = e^{αS_t − tκ(α)} h(J_t)/h(s) = e^{αS_t − tκ(α)} e^{−(α+κ(α))(J_t − s)},

where κ(α) is the solution of (3.3).

Proposition 3.4 The probability measure P_{α;s} is the probability measure governing a renewal risk process with J_0 = s and the interarrival distribution A and the service time distribution B changed to A_α, resp. B_α, where

A_α(dt) = e^{−(α+κ(α))t} A(dt) / Â[−α − κ(α)],    B_α(dx) = e^{αx} B(dx) / B̂[α].

Proof. P_{α;s}(J_0 = s) = 1 follows trivially from L_0 = 1. Further, since T_1 = s and J_{T_1} = T_2,

E_{α;s} e^{βU_1 + δT_2} = E_s[e^{βU_1 + δT_2} L_{T_1}]
= E_s[e^{βU_1 + δT_2} e^{α(U_1 − s) − sκ(α)} e^{−(α+κ(α))(T_2 − s)}]
= B̂[α+β] Â[δ − α − κ(α)]
= (B̂[α+β]/B̂[α]) · (Â[δ − α − κ(α)]/Â[−α − κ(α)])
= B̂_α[β] Â_α[δ],

using (3.3) in the second-to-last step. This shows that U_1, T_2 are independent with distributions B_α, resp. A_α, as asserted. An easy extension of the argument shows that U_1, …, U_n, T_2, …, T_{n+1} are independent with distribution A_α for the T_k and B_α for the U_k. □

Remark 3.5 For the compound Poisson case, where A is exponential with rate β, (3.3) means 1 = B̂[α] β/(β + α + κ(α)), i.e. κ(α) = β(B̂[α] − 1) − α, in agreement with Chapter IV. □

Note that the changed distributions of A and B are in general not the same for P_{α;s} and P_α^(d). An important exception is, however, the determination of the adjustment coefficient γ, where the defining equations κ^(d)(γ) = 0 and κ(γ) = 0 are the same.
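Remark 3.5 is easy to confirm numerically. The sketch below (hypothetical rates β = 3, δ = 7 and tilting parameter α = 1) solves (3.3) for κ(α) by bisection and compares with the compound Poisson c.g.f. κ(α) = β(B̂[α] − 1) − α:

```python
beta, delta, alpha = 3.0, 7.0, 1.0   # hypothetical: A = Exp(beta), B = Exp(delta)

Bhat = lambda a: delta / (delta - a)
Ahat = lambda a: beta / (beta - a)

# (3.3): Bhat[alpha] * Ahat[-alpha - kappa] = 1; the left side decreases in kappa.
g = lambda k: Bhat(alpha) * Ahat(-alpha - k) - 1.0
lo, hi = -3.9, 5.0                   # g(lo) > 0 > g(hi) brackets the root
for _ in range(200):                 # bisection for a decreasing function
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
kappa = 0.5 * (lo + hi)

kappa_cp = beta * (Bhat(alpha) - 1.0) - alpha   # compound Poisson c.g.f. at alpha
print(kappa, kappa_cp)                           # both equal -0.5 here
```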
The Markov additive point of view is relevant when studying problems which cannot be reduced to the imbedded random walk, say finite horizon ruin probabilities, where the approach via the imbedded random walk yields results on the probability of ruin after N claims, not after time T. Using the Markov additive approach yields for example the following analogue of Theorem V.4.6:

Proposition 3.6 Let y < 1/κ′(γ), let α_y > 0 be the solution of κ′(α_y) = 1/y, and define γ_y = α_y − yκ(α_y). Then

ψ_s(u, yu) ≤ (e^{−(α_y + κ(α_y))s} / Â[−α_y − κ(α_y)]) e^{−γ_y u} = e^{−(α_y + κ(α_y))s} B̂[α_y] e^{−γ_y u}.

In particular, for the zero-delayed case, ψ(u, yu) ≤ e^{−γ_y u}.

Proof. As in the proof of Theorem V.4.6, it is easily seen that κ(α_y) > 0. Let M(u) be the number of claims leading to ruin. Then J_{τ(u)} = T_{M(u)+1} and hence

ψ_s(u, yu) = E_{α_y;s}[e^{−α_y S_{τ(u)} + τ(u)κ(α_y)} h(s)/h(J_{τ(u)}); τ(u) ≤ yu]
≤ e^{−α_y u + yuκ(α_y)} E_{α_y;s}[e^{−(α_y + κ(α_y))s} / e^{−(α_y + κ(α_y))T_{M(u)+1}}]
= e^{−(α_y + κ(α_y))s} e^{−γ_y u} Â_{α_y}[α_y + κ(α_y)],

which is the same as the asserted inequality for ψ_s(u, yu) [by the definition of A_{α_y} in Proposition 3.4, Â_{α_y}[α_y + κ(α_y)] = 1/Â[−α_y − κ(α_y)]]. The claim for the zero-delayed case follows by integration w.r.t. A(ds). □

Notes and references The approach via the imbedded random walk is standard, see e.g. [APQ]. The random walk interpretation also allows one to translate general asymptotic results for finite time-horizon ruin probabilities of random walks to the corresponding renewal model, for instance the sharp time-horizon asymptotics of Veraverbeke & Teugels [864]. For the approach via Markov additive processes, see in particular Dassios & Embrechts [273] and Asmussen & Rubinstein [99].
4 The duality with queueing theory

We first review some basic facts about the GI/G/1 queue, defined as the single server queue with first in first out (FIFO; or FCFS = first come first served) queueing discipline and renewal interarrival times. Label the customers 1, 2, … and assume that T_n is the time between the arrivals of customers n − 1 and n, and U_n the service time of customer n. The actual waiting time W_n of customer
n is defined as his time spent in queue (excluding the service time), that is, the time from his arrival at the queue until he starts service. The virtual waiting time V_t at time t is the residual amount of work at time t, that is, the amount of time the server will have to work until the system is empty provided no new customers arrive (for this reason, often the term workload process is used) or, equivalently, the waiting time a customer would have if he arrived at time t. Thus, since customer n arrives at time σ_n, we have

W_n = V_{σ_n−}   (4.1)

(left limit). The traffic intensity of the queue is ρ = EU/ET. The following result shows that {W_n} is a Lindley process in the sense of III.4:

Proposition 4.1 W_{n+1} = (W_n + U_n − T_n)^+.

Proof. The amount of residual work just before customer n arrives is V_{σ_n−}. It then jumps to V_{σ_n−} + U_n, whereas in [σ_n, σ_{n+1}) = [σ_n, σ_n + T_n) the residual work decreases linearly until possibly zero is hit, in which case {V_t} remains at zero until time σ_{n+1}. Thus V_{σ_{n+1}−} = (W_n + U_n − T_n)^+, and combining with (4.1), the proposition follows. □

Applying Theorem III.3.1, we get:

Corollary 4.2 Let M_n^(d) = max_{k=0,…,n−1}(U_1 + ⋯ + U_k − T_1 − ⋯ − T_k). If W_1 = 0, then W_n =_D M_n^(d).

The next result summarizes the fundamental duality relations between the steady-state behavior of the queue and the ruin probabilities (part (a) was essentially derived already in III.4):

Proposition 4.3 Assume η > 0 or, equivalently, ρ < 1. Then:
(a) as n → ∞, W_n converges in distribution to a random variable W, and we have

P(W > u) = ψ(u);   (4.2)

(b) as t → ∞, V_t converges in distribution to a random variable V, and we have

P(V > u) = ψ^(s)(u).   (4.3)
Proof. Part (a) is contained in Theorem III.3.1 and Corollary III.3.2, but we shall present a slightly different proof via the duality result given in Theorem III.2.1. Let the T there be the random time σ_N. Then P(τ(u) ≤ T) is
the probability ψ^(N)(u) of ruin after at most N claims, and obviously ψ(u) = lim_{N→∞} ψ^(N)(u). Also, {Z_t}_{0≤t≤T} evolves like the left-continuous version of the virtual waiting time process up to just before the Nth arrival, but interchanging the set (T_1, …, T_N) with (T_N, …, T_1) and similarly for the U_n. However, by an obvious reversibility argument this does not affect the distribution, and hence in particular Z_T is distributed as the virtual waiting time just before the Nth arrival, i.e. as W_N. It follows that P(W_N > u) = ψ^(N)(u) has the limit ψ(u) for all u, which implies the convergence in distribution and (4.2).

For part (b), we let T be deterministic. Then the arrivals of {R_t} in [0, T] form a stationary renewal process with interarrival distribution A, hence (since the residual lifetime at 0 and the age at T have the same distribution, cf. A.1e) the same is true for the time-reversed point process, which is the interarrival process for {Z_t}_{0≤t≤T}. Thus as before, {Z_t}_{0≤t≤T} has the same distribution as the left-continuous version of the virtual waiting time process, so that

P^(s)(V_T > u) = P^(s)(τ(u) ≤ T),   (4.4)

lim_{T→∞} P^(s)(V_T > u) = lim_{T→∞} P^(s)(τ(u) ≤ T) = ψ^(s)(u). □

It should be noted that this argument only establishes the convergence in distribution subject to certain initial conditions, namely W_1 = 0 in (a) and V_0 = 0, T_1 ∼ A_0 in (b). In fact, convergence in distribution holds for arbitrary initial conditions, but this requires some additional arguments (involving regeneration at 0, but not difficult) that we omit.

Letting n → ∞ in Corollary 4.2, we obtain:

Corollary 4.4 The steady-state actual waiting time W has the same distribution as M^(d).

Corollary 4.5 (Lindley's integral equation) Let F(x) = P(U_1 − T_1 ≤ x), K(x) = P(W ≤ x). Then

K(x) = ∫_{−∞}^x K(x − y) F(dy),   x ≥ 0.   (4.5)
Proof. Letting n → ∞ in Proposition 4.1, we get W =_D (W + U* − T*)^+, where U*, T* are independent and distributed as U_1, resp. T_1. Hence for x ≥ 0, conditioning upon U* − T* = y yields

K(x) = P((W + U* − T*)^+ ≤ x) = P(W + U* − T* ≤ x) = ∫_{−∞}^x K(x − y) F(dy)

(x ≥ 0 is crucial for the second equality!). □
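The duality (4.2) invites a direct simulation test. The sketch below (a hypothetical M/M/1 example: arrival rate β = 0.5, service rate δ = 1, so ρ = 0.5) runs the Lindley recursion of Proposition 4.1 and compares the empirical P(W > u) with the compound Poisson ruin probability ψ(u) = ρ e^{−δ(1−ρ)u} known from Chapter IV:

```python
import numpy as np

rng = np.random.default_rng(11)
beta, delta = 0.5, 1.0          # hypothetical M/M/1: arrival rate beta, service rate delta
rho = beta / delta              # traffic intensity, < 1

# Lindley recursion W_{n+1} = (W_n + U_n - T_n)^+, started from W_1 = 0.
n = 400000
U = rng.exponential(1.0 / delta, n)   # service times
T = rng.exponential(1.0 / beta, n)    # interarrival times
W = np.empty(n)
w = 0.0
for k in range(n):
    W[k] = w
    w = max(w + U[k] - T[k], 0.0)

# Duality: P(W > u) = psi(u), which for M/M/1 equals rho * e^{-delta(1-rho)u}.
u = 1.0
est = float((W[n // 10:] > u).mean())   # discard a burn-in segment
exact = rho * np.exp(-delta * (1.0 - rho) * u)
print(est, exact)
```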
Now return to the Poisson case. Then the corresponding queue is M/G/1, and we get:

Corollary 4.6 For the M/G/1 queue with ρ < 1, the actual and the virtual waiting time have the same distribution in the steady state. That is, W =_D V.

Proof. For the Poisson case, the zero-delayed and the stationary renewal processes are identical. Hence ψ(u) = ψ^(s)(u), implying P(W > u) = P(V > u) for all u. □

Notes and references The GI/G/1 queue is a favorite of almost any queueing book (see e.g. Cohen [249] or [APQ, Ch. X]), despite the fact that the extension from M/G/1 is of equally doubtful relevance as we argued in Section 1 to be the case in risk theory. Some early classical papers are Smith [814] and Lindley [598]. Note that (4.5) looks like the convolution equation K = F ∗ K but is not the same (one would need (4.5) to hold for all x ∈ R and not just x ≥ 0). Equation (4.5) is in fact a homogeneous Wiener-Hopf equation, see e.g. Asmussen [66] and references therein. For some further explicit treatments beyond exponential claim sizes, see e.g. Malinovski [627] and Rongming & Haifeng [747]. The imbedded random walk approach also leads to a Pollaczeck-Khinchine type formula in the renewal case, which will be exploited for the asymptotic behavior of the ruin probability with heavy-tailed claims in Section X.3. A detailed exposition of the compound geometric approach to renewal models is Willmot & Lin [892]. Whenever the interclaim time is phase-type, renewal models are a special case of Markov additive processes, and hence we also refer to Chapter IX for related material. Exploiting some links between wave-governed random motions and the renewal risk process, Mazza & Rullière [632] give an algorithm for computing finite-time ruin probabilities for non-exponential interarrival times. A number of further results on ruin-related quantities in renewal models will be discussed in XII.3.
Chapter VII

Risk theory in a Markovian environment

1 Model and examples

We assume that the arrivals form an inhomogeneous Poisson process, more precisely one modulated by a Markov process {J_t}_{0≤t<∞} (the environment) with a finite state space E and intensity matrix Λ = (λ_{ij})_{i,j∈E}: when J_t = i, claims arrive at rate β_i and have claim size distribution B_i, and the premium rate is p_i. As usual, {S_t} denotes the claim surplus process, τ(u) = inf{t > 0 : S_t > u} the time to ruin and M = sup_{t≥0} S_t. The ruin probabilities with initial environment i are

ψ_i(u) = P_i(τ(u) < ∞) = P_i(M > u),    ψ_i(u, T) = P_i(τ(u) ≤ T),
where as usual P_i refers to the case J_0 = i. Unless otherwise stated, we shall assume that p_i = 1; this is no restriction when studying infinite horizon ruin probabilities, cf. the operational time argument given in Example 1.5 below. We let

ρ_i = β_i µ_{B_i},    ρ = Σ_{i∈E} π_i ρ_i,    η = (1 − ρ)/ρ,   (1.1)

where π = (π_i)_{i∈E} denotes the stationary distribution of {J_t}. Then ρ_i is the average amount of claims received per unit time when the environment is in state i, and ρ is the overall average amount of claims per unit time, cf. Proposition 1.11 below. An example of how such a mechanism could be relevant in risk theory follows.

Example 1.1 Consider car insurance, and assume that weather conditions play a major role for the occurrence of accidents. For example, we could distinguish between normal and icy road conditions, leading to E having two states n, i and corresponding arrival intensities β_n, β_i and claim size distributions B_n, B_i; one expects that β_i > β_n and presumably also that B_n ≠ B_i, meaning that accidents occurring during icy road conditions lead to claim amounts which are different from the normal ones. □

The versatility of the model in terms of incorporating (or at least approximating) many phenomena which look very different or more complicated at first sight goes in fact much further (note that for the following discussion a basic knowledge of phase-type distributions is needed, cf. IX.1):

Example 1.2 (Alternating renewal environment) The model of Example 1.1 implicitly assumes that the sojourn times of the environment in the normal and the icy states are exponential, with rates λ_{ni} and λ_{in}, respectively, which is clearly unrealistic. Thus, assume that the sojourn time in the icy state has a more general distribution A^(i). According to Theorem A5.14, we can approximate A^(i) with a phase-type distribution (cf. Example I.2.4) with representation (E^(i), α^(i), T^(i)), say. Assume similarly that the sojourn time in the normal state has distribution A^(n), which we approximate with a phase-type distribution with representation (E^(n), α^(n), T^(n)), say.
Then the state space for the environment is the disjoint union of E^(n) and E^(i), and we have β_j = β_i when j ∈ E^(i), β_j = β_n when j ∈ E^(n); in block-partitioned form, the intensity matrix is

Λ = ( T^(n)         t^(n) α^(i) )
    ( t^(i) α^(n)   T^(i)       ),

where t^(n) = −T^(n) e, t^(i) = −T^(i) e are the exit rates. □
Example 1.3 Consider again the alternating renewal model for car insurance in Example 1.2, but assume now that the arrival intensity changes during the icy period, say it is larger initially. One way to model this would be to take A^(i) to be Coxian (cf. Example IX.1.4) with states i_1, …, i_q (visited in that order) and let β_{i_1} > ⋯ > β_{i_q}. □

Example 1.4 (Semi-Markovian environment) Dependence between the length of an icy period and the following normal one (and vice versa) can be modelled by semi-Markov structure. This amounts to a family (A^(η))_{η∈H} of sojourn time distributions, such that a sojourn time of type η is followed by one of type ι w.p. w_{ηι}, where W = (w_{ηι})_{η,ι∈H} is a transition matrix. Approximating each A^(η) by a phase-type distribution with representation (E^(η), α^(η), T^(η)), say, the state space E for the environment is {(η, i) : η ∈ H, i ∈ E^(η)}, and

Λ = ( T^(1) + w_{11} t^(1) α^(1)   w_{12} t^(1) α^(2)          ⋯   w_{1q} t^(1) α^(q)          )
    ( w_{21} t^(2) α^(1)           T^(2) + w_{22} t^(2) α^(2)  ⋯   w_{2q} t^(2) α^(q)          )
    (          ⋮                            ⋮                  ⋱            ⋮                   )
    ( w_{q1} t^(q) α^(1)           w_{q2} t^(q) α^(2)          ⋯   T^(q) + w_{qq} t^(q) α^(q)  ),

where q = |H| and t^(η) = −T^(η) e. The simplest model for the arrival intensity amounts to β_{η,j} = β_η depending only on η. In the car insurance example, one could for example have H = {i_ℓ, i_s, n_ℓ, n_s}, such that the icy period is of two types (long and short), each with its own sojourn time distribution A^(i_ℓ), resp. A^(i_s), and similarly for the normal period. Then for example w_{i_ℓ n_s} is the probability that a long icy period is followed by a short normal one. □

Example 1.5 (Markov-modulated premiums) Returning for a short while to the case of general premium rates p_i depending on the environment i, let

θ(T) = ∫_0^T p_{J_t} dt,    J̃_t = J_{θ^{-1}(t)},    S̃_t = S_{θ^{-1}(t)}.
Then (by standard operational time arguments) {S̃_t} is a risk process in a Markovian environment with unit premium rate, and ψ̃_i(u) = ψ_i(u). Indeed, the parameters are λ̃_{ij} = λ_{ij}/p_i, β̃_i = β_i/p_i. □

From now on, we assume again p_i = 1, so that the claim surplus is

S_t = Σ_{i=1}^{N_t} U_i − t.
We turn to some more mathematically oriented basic discussion. The key property for much of the analysis presented below is the following immediate observation:

Proposition 1.6 The claim surplus process {S_t} of a risk process in a Markovian environment is a Markov additive process corresponding to the parameters µ_i = −p_i, σ_i^2 = 0, ν_i(dx) = β_i B_i(dx), q_{ij} = 0, in the notation of Chapter III.4.

In particular, the Markov additive structure will be used for exponential change of measure and thereby versions of Lundberg's inequality and the Cramér-Lundberg approximation. Next we note a semi-Markov structure of the arrival process:

Proposition 1.7 The P_i-distribution of T_1 is phase-type with representation (e_i′, Λ − (β_i)_diag). More precisely,

P_i(T_1 ∈ dx, J_{T_1} = j) = β_j · e_i′ e^{(Λ − (β_i)_diag)x} e_j dx.

Proof. The result immediately follows by noting that T_1 is obtained as the lifelength of {J_t} killed at the time of the first arrival and that the exit rate obviously is β_j in state j. □

A remark which is fundamental for much of the intuition on the model consists in noting that to each risk process in a Markovian environment, one can associate in a natural way a standard Poisson one by averaging over the environment. More precisely, we put

β* = Σ_{i∈E} π_i β_i,    B* = Σ_{i∈E} (π_i β_i / β*) B_i.

These parameters are the ones which the statistician would estimate if he ignored the presence of Markov-modulation:

Proposition 1.8 As t → ∞,

N_t/t → β* a.s.,    (1/N_t) Σ_{ℓ=1}^{N_t} I(U_ℓ ≤ x) → B*(x) a.s.

Note that the last statement of the proposition just means that in the limit, the empirical distribution of the claims is B*. Note also that (as the proof shows) π_i β_i/β* gives the proportion of the claims which are of type i (arrive in state i).

Proof. Let t_i = ∫_0^t I(J_s = i) ds be the time spent in state i up to time t and N_t^(i) the number of claim arrivals in state i. Then it is standard that t_i/t → π_i a.s. as
t → ∞. However, given {J_t}_{0≤t<∞}, the claim arrivals in state i form a Poisson process with intensity β_i, so that N_t^(i)/t_i → β_i a.s., and hence

N_t/t = Σ_{i∈E} (N_t^(i)/t_i)·(t_i/t) → Σ_{i∈E} β_i π_i = β* a.s.

Similarly, the proportion N_t^(i)/N_t of the claims arriving in state i converges to π_i β_i/β*, and given that a claim arrives in state i, its distribution is B_i; from this the second assertion follows. □

For a concrete numerical example, let E = {1, 2}, let the environment switch between the two states with the same intensity a in each direction (so that π = (1/2, 1/2)), where a > 0 is arbitrary, and take

β_1 = 9/2,  B_1 = (3/5)E_3 + (2/5)E_7,    β_2 = 3/2,  B_2 = (1/5)E_3 + (4/5)E_7,

where E_r denotes the exponential distribution with rate r. That is, we may imagine that we have two types of claims such that the claim size distributions are E_3 and E_7. Claims of type E_3 arrive with intensity (9/2)·(3/5) = 27/10 in state 1 and with intensity (3/2)·(1/5) = 3/10 in state 2, those of type E_7 with intensity (9/2)·(2/5) = 9/5 in state 1 and with intensity (3/2)·(4/5) = 6/5 in state 2. Thus, since E_3 is a more dangerous claim size distribution than E_7 (the mean is larger and the tail is heavier), state 1 appears as more dangerous than state 2, and in fact
ρ_1 = β_1 µ_{B_1} = (9/2)·((3/5)·(1/3) + (2/5)·(1/7)) = 81/70,
ρ_2 = β_2 µ_{B_2} = (3/2)·((1/5)·(1/3) + (4/5)·(1/7)) = 19/70.
Thus in state 1, where ρ_1 > 1, the company even suffers an average loss, and (at least when a is small, so that state changes of the environment are infrequent) the paths of the surplus process will exhibit the type of behavior in Fig. VII.1, with periods with positive drift alternating with periods with negative drift; the overall drift is negative since π = (1/2, 1/2), so that ρ = π_1 ρ_1 + π_2 ρ_2 = 5/7 < 1. On Fig. VII.1, there are p = 2 background states of {J_t}, marked by thin, resp. thick, lines in the path of {S_t}. Computing the parameters of the averaged compound Poisson model, we first get that

β* = (1/2)·(9/2) + (1/2)·(3/2) = 3.
[Figure VII.1: a sample path of {S_t} with p = 2 background states, marked by thin, resp. thick, lines.]

Thus, a fraction π_1 β_1/β* = 3/4 of the claims occurs in state 1 and the remaining fraction 1/4 in state 2. Hence

B* = (3/4)·((3/5)E_3 + (2/5)E_7) + (1/4)·((1/5)E_3 + (4/5)E_7) = (1/2)E_3 + (1/2)E_7.

That is, the averaged compound Poisson model is the same as in IV.(3.1). □
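The arithmetic of this example is easily verified with exact rational arithmetic. The following sketch recomputes ρ_1, ρ_2, ρ, β* and the mixture weights of B*, using β_1 = 9/2, β_2 = 3/2, π = (1/2, 1/2) and the E_3/E_7 mixture weights 3/5, 2/5 and 1/5, 4/5 from the example:

```python
from fractions import Fraction as F

pi = [F(1, 2), F(1, 2)]
beta = [F(9, 2), F(3, 2)]
# B_i as mixtures of E_3 and E_7; the mean of E_r is 1/r.
weights = [[F(3, 5), F(2, 5)], [F(1, 5), F(4, 5)]]
means = [F(1, 3), F(1, 7)]

mu_B = [w[0] * means[0] + w[1] * means[1] for w in weights]
rho_i = [beta[i] * mu_B[i] for i in range(2)]
rho = pi[0] * rho_i[0] + pi[1] * rho_i[1]

beta_star = pi[0] * beta[0] + pi[1] * beta[1]
# Claims arriving in state i make up pi_i beta_i / beta* of the total.
frac1 = pi[0] * beta[0] / beta_star
Bstar_w = [frac1 * weights[0][k] + (1 - frac1) * weights[1][k] for k in range(2)]

print(rho_i, rho, beta_star, frac1, Bstar_w)
# rho_i = [81/70, 19/70], rho = 5/7, beta* = 3, B* = (1/2)E_3 + (1/2)E_7
```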
The definition (1.1) of the safety loading η is (as for the renewal model in Chapter VI) based upon an asymptotic consideration given by the following result:

Proposition 1.11 (a) ES_t/t → ρ − 1, t → ∞; (b) S_t/t → ρ − 1 a.s., t → ∞.

Proof. In the notation of Proposition 1.8, we have

E[S_t + t | (t_i)_{i∈E}] = Σ_{i∈E} t_i β_i µ_{B_i} = Σ_{i∈E} t_i ρ_i.

Taking expectations and using the well-known fact Et_i/t → π_i yields (a). For (b), note first that (1/N) Σ_{k=1}^N U_k^(i) → µ_{B_i} a.s. Hence

(S_t + t)/t = Σ_{i∈E} (t_i/t) · (N_t^(i)/t_i) · (1/N_t^(i)) Σ_{k=1}^{N_t^(i)} U_k^(i) → Σ_{i∈E} π_i β_i µ_{B_i} = ρ  a.s. □
Corollary 1.12 If η ≤ 0, then M = ∞ a.s., and hence ψ_i(u) = 1 for all i and u. If η > 0, then M < ∞ a.s., and ψ_i(u) < 1 for all i and u.

Proof. The case η < 0 is trivial, since then the a.s. limit ρ − 1 of S_t/t is > 0, and hence M = ∞. The case η > 0 is similarly easy. Now let η = 0, let some state i be fixed and define

ω = ω_1 = inf{t > 0 : J_{t−} ≠ i, J_t = i},    ω_2 = inf{t > ω_1 : J_{t−} ≠ i, J_t = i},
X_1 = S_{ω_1},    X_2 = S_{ω_2} − S_{ω_1},

and so on. Then by standard Markov process formulas (e.g. [APQ, Th. II.4.2(i), p. 50]), E_i ω_1 = −1/(π_i λ_{ii}) and

E_i X_1 = E_i ∫_0^ω β_{J_t} µ_{B_{J_t}} dt − E_i ω = E_i ω · (Σ_{j∈E} π_j β_j µ_{B_j} − 1) = (ρ − 1) E_i ω = 0.

Now obviously the ω_n form a renewal process, and hence ω_n/n → E_i ω a.s. Since the X_n are independent, with X_2, X_3, … having the P_i-distribution of X_1, also

S_{ω_n}/n = (X_1 + ⋯ + X_n)/n → E_i X_1 = 0  a.s.

Thus {S_{ω_n}} is a discrete time random walk with mean zero, and hence oscillates between −∞ and ∞, so that also here M = ∞. □

Notes and references The Markov-modulated Poisson process has been very popular in queueing theory since the early 1980s, see the Notes to Section 7. In risk theory, some early studies are in Janssen & Reinhard [501, 730, 502], and a more comprehensive treatment is in Asmussen [58]. The mainstream of the present chapter follows [58], with some important improvements being obtained in Asmussen [59] in the queueing setting and being implemented numerically in Asmussen & Rolski [97]. Statistical aspects are not treated here; see Meier [634] and Rydén [760, 761]. There seems still to be more to be done in this area, in particular in order to treat more than low-dimensional state spaces E. Proposition 1.11 and the Corollary are standard. The proof of Proposition 1.11(b) is essentially the same as the proof of the strong law of large numbers for cumulative processes, see [APQ, p. 178] or A.1d.
2 The ladder height distribution

Our mathematical treatment of the ruin problem follows the model of Chapter IV for the simple compound Poisson model, and involves a version of the Pollaczeck-Khinchine formula (see Proposition 2.2(a) below) where the ladder height distribution is evaluated by a time reversion argument. Define the ladder epoch τ_+ by τ_+ = inf{t : S_t > 0} = τ(0), let

G_+(i, j; A) = P_i(S_{τ_+} ∈ A, J_{τ_+} = j, τ_+ < ∞),

and let G_+ be the measure-valued matrix with ijth element G_+(i, j; ·). The form of G_+ turns out to be explicit (or at least computable), but is substantially more involved than for the compound Poisson case. However, by specializing results for general stationary risk processes (Theorem III.5.5; see also Example III.5.4), we obtain the following result, which represents a nice simplified form of the ladder height distribution G_+ when taking certain averages: starting {J_t} stationary, we get the same ladder height distribution as for the averaged compound Poisson model, cf. the definition of β*, B* in Section 1.
Proposition 2.1 πG+ (dy)e = β ∗ B (y)dy. For measurevalued matrices, we define the convolution operation by the same rule as for multiplication of realvalued matrices, only with the product of real numbers replaced by convolution of measures. Thus, e.g., G∗2 + is the matrix whose ijth element is X G+ (i, k; ·) ∗ G+ (k, j; ·). k∈E
Also, kG+ k denotes the matrix with ijth element Z ∞ ° ° °G+ (i, j; ·)° = G+ (i, j; dx). 0
Let further R denote the pre-τ+ occupation kernel,

R(i, j; A) = Ei ∫_0^{τ+} I(St ∈ A, Jt = j) dt,

and S(dx) the measure-valued diagonal matrix with βi Bi(dx) as ith diagonal element.

Proposition 2.2 (a) The distribution of M is given by

1 − ψi(u) = Pi(M ≤ u) = e_i^T Σ_{n=0}^∞ G+^{∗n}(u) (I − ‖G+‖) e.  (2.1)

(b) G+((y, ∞)) = ∫_{−∞}^0 R(dx) S((y − x, ∞)). That is, for i, j ∈ E,

G+(i, j; (y, ∞)) = ∫_{−∞}^0 R(i, j; dx) βj B̄j(y − x).  (2.2)
CHAPTER VII. MARKOVIAN ENVIRONMENT
Proof. The probability that there are n proper ladder steps not exceeding x and that the environment is j at the nth when we start from i is e_i^T G+^{∗n}(x) e_j, and the probability that there are no further ladder steps starting from environment j is e_j^T (I − ‖G+‖) e. From this (2.1) follows by summing over n and j. The proof of (2.2) is just the same as the proof of Lemma III.5.3. □

To make Proposition 2.2 useful, we need, as in Chapters III, IV, to bring R and G+ into a more explicit form. To this end, we need to invoke the time-reversed version {Jt*} of {Jt}; the intensity matrix Λ* has ijth element

λ*_{ij} = πj λ_{ji} / πi,

and we have

Pi(J_T* = j) = (πj/πi) Pj(J_T = i).  (2.3)

We let {St*} be defined as {St}, only with {Jt} replaced by {Jt*} (the βi and Bi are the same), and let further {m_x} be the E-valued process obtained by observing {Jt*} only when {St*} is at a minimum value. That is, m_x = j when for some (necessarily unique) t we have St* = −x, Jt* = j, St* < Su* for u < t; see Figure VII.2 for an illustration in the case of p = 2 environmental states of {Jt}, marked by thin and thick lines, respectively, in the path of {St}.
Figure VII.2

The following observation is immediate:

Proposition 2.3 When η > 0, {m_x} is a non-terminating Markov process on E, hence uniquely specified by its intensity matrix Q (say).
Proposition 2.4 Q satisfies the non-linear matrix equation Q = ϕ(Q), where

ϕ(Q) = Λ* − (βi)diag + ∫_0^∞ S(dx) e^{Qx},

and S(dx) is the diagonal matrix with the βi Bi(dx) on the diagonal. Furthermore, the sequence {Q^(n)} defined by

Q^(0) = Λ* − (βi)diag,  Q^(n+1) = ϕ(Q^(n)),

converges monotonically to Q.

Note that the integral in the definition of ϕ(Q) is the matrix whose ith row is the ith row of

βi B̂i[Q] = βi ∫_0^∞ e^{Qx} Bi(dx).
Proof. The argument relies on an interpretation in terms of excursions. An excursion of {St*} above level −x starts at time t if S*_{t−} = −x, {Sv*} is at a minimum value at v = t− and a jump (claim arrival) occurs at time t; the excursion ends at time s = inf{v > t : Sv* = −x}. If there are no jumps in (t, s], we say that the excursion has depth 0. Otherwise each jump at a minimum level during the excursion starts a subexcursion, and the excursion is said to have depth 1 if each of these subexcursions has depth 0. In general, we recursively define the depth of an excursion as 1 plus the maximal depth of a subexcursion. The definitions are illustrated in Figure VII.3, where there are three excursions, of depths 1, 0 and 2. For example, the excursion of depth 2 has one subexcursion, which is of depth 1, corresponding to two subexcursions of depth 0.
Figure VII.3
Let p^(n)_{ij} be the probability that an excursion starting from Jt* = i has depth at most n and terminates at Js* = j, and p_{ij} the probability that an excursion starting from Jt* = i terminates at Js* = j. By considering minimum values within the excursion, it becomes clear that

p_{ij} = ∫_0^∞ [e^{Qy}]_{ij} Bi(dy).  (2.4)
To show Q = ϕ(Q), we first compute q_{ij} for i ≠ j. Suppose m_x = i. Then a jump to j (i.e. m_{x+dx} = j) occurs in two ways, either due to a jump of {Jt*}, which occurs with intensity λ*_{ij}, or through an arrival starting an excursion terminating with Js* = j. It follows that q_{ij} = λ*_{ij} + βi p_{ij}. Similarly,

Pi(m_h = i) = 1 + λ*_{ii} h − βi h + βi h p_{ii} + o(h)

implies q_{ii} = λ*_{ii} − βi + βi p_{ii}. Writing this out in matrix notation, Q = ϕ(Q) follows.

Now let {m_x^(n)} be {m_x} killed at the first time ηn (say) that a subexcursion of depth at least n occurs. It is clear that {m_x^(n)} is a terminating Markov process and that {m_x^(0)} has subintensity matrix Λ* − (βi)diag = Q^(0). The proof of Q = ϕ(Q) then immediately carries over to show that the subintensity matrix of {m_x^(1)} is ϕ(Q^(0)) = Q^(1). Similarly by induction, the subintensity matrix of {m_x^(n+1)} is ϕ(Q^(n)) = Q^(n+1), which implies that

q^(n+1)_{ij} = λ*_{ij} − βi δ_{ij} + βi p^(n)_{ij}.

Now just note that p^(n)_{ij} ↑ p_{ij} and insert (2.4). □
Define a further kernel U by

U(i, j; A) = ∫_{−A} Pi(m_x = j) dx = ∫_{−A} e_i^T e^{Qx} e_j dx  (2.5)

(note that we use −A = {x : −x ∈ A} on the r.h.s. of the definition to make U concentrated on (−∞, 0)).

Theorem 2.5 R(i, j; A) = (πj/πi) U(j, i; A).
Proof. We shall show that

Pi(Jt = j, St ∈ A, τ+ > t) = (πj/πi) Pj(Jt* = i, St* ∈ A, St* < Su*, u < t),  (2.6)
from which the result immediately follows by integrating from 0 to ∞ w.r.t. dt. To this end, consider stationary versions of {Jt}, {Jt*}. We may then assume Ju* = J_{t−u}, Su* = St − S_{t−u}, 0 ≤ u ≤ t, and get

πi Pi(Jt = j, St ∈ A, τ+ > t)
 = Pπ(Jt = j, J0 = i, St ∈ A, Su < 0, 0 < u < t)
 = Pπ(J0* = j, Jt* = i, St* ∈ A, St* < S*_{t−u}, 0 < u < t)
 = πj Pj(Jt* = i, St* ∈ A, St* < Su*, 0 < u < t),

and this immediately yields (2.6). □
It is convenient at this stage to rewrite the above results in terms of the matrix K = ∆^{−1} Q^T ∆, where ∆ is the diagonal matrix with π on the diagonal:

Corollary 2.6 (a) R(dx) = e^{−Kx} dx, x ≤ 0;
(b) for z ≥ 0, G+((z, ∞)) = ∫_0^∞ e^{Kx} S((x + z, ∞)) dx;
(c) the matrix K satisfies the non-linear matrix equation K = ϕ(K), where

ϕ(K) = Λ − (βi)diag + ∫_0^∞ e^{Kx} S(dx);

(d) the sequence {K^(n)} defined by K^(0) = Λ − (βi)diag, K^(n+1) = ϕ(K^(n)), converges monotonically to K.

[The ϕ(·) here is of course not the same as in Proposition 2.4.] From Qe = 0, it is readily checked that π is a left eigenvector of K corresponding to the eigenvalue 0 (when ρ < 1), and we let k be the corresponding right eigenvector normalized by πk = 1.

Remark 2.7 It is instructive to see how Proposition 2.1 can be rederived using the more detailed form of G+ in Corollary 2.6(b): from πK = 0 we get

πG+(dy)e = ∫_0^∞ π e^{Kx} (βi Bi(dy + x))diag dx · e
 = ∫_0^∞ Σ_{i∈E} πi βi Bi(dy + x) dx
 = Σ_{i∈E} πi βi B̄i(y) dy = β* B̄*(y) dy.
□

Though Corollary 2.6 is maybe not all that explicit in general, we shall see that we nevertheless have enough information to derive, e.g., the Cramér–Lundberg approximation (Section 3), and to obtain a simple solution in the
special case of phase-type claims (Chapter IX). As preparation, we shall give at this place some simple consequences of Corollary 2.6.

Lemma 2.8 (I − ‖G+‖) e = (1 − ρ) k.

Proof. Using Corollary 2.6(b) with z = 0, we get

‖G+‖ = ∫_0^∞ e^{Kx} S((x, ∞)) dx.  (2.7)

In particular, multiplying by K and integrating by parts yields

K‖G+‖ = ∫_0^∞ (e^{Kx} − I) S(dx) = K − Λ + (βi)diag − ∫_0^∞ S(dx) = K − Λ.  (2.8)

Let L = (kπ − K)^{−1}. Then (kπ − K)k = k implies Lk = k. Now using (2.7), (2.8) and πe^{Kx} = π, we get

kπ‖G+‖e = k ∫_0^∞ π S((x, ∞)) e dx = k (πi βi μ_{Bi})row e = ρ k,
K‖G+‖e = Ke,
(kπ − K)(I − ‖G+‖)e = k − Ke − ρk + Ke = (1 − ρ) k.

Multiplying by L to the left, the proof is complete. □
Here is an alternative algorithm to the iteration scheme in Corollary 2.6 for computing K. Let |A| denote the determinant of the matrix A and d the number of states in E.

Proposition 2.9 The following assertions are equivalent: (a) all d eigenvalues of K are distinct; (b) there exist d distinct solutions s1, . . . , sd ∈ {s ∈ C :

Lemma 2.10 Assume ρ > 1. Then ‖G+‖ is stochastic with invariant probability vector ζ+ (say) proportional to −πK, ζ+ = −πK/(−πKe). Furthermore, −πK M+ e = ρ − 1.
Proof. From ρ > 1 it follows that St → ∞ a.s., and hence ‖G+‖ is stochastic. That −πK = −(Qe)^T ∆ is nonzero and has nonnegative components follows since −Qe has the same property for ρ > 1. Thus the formula for ζ+ follows immediately by multiplying (2.8) by −π, which yields −πK‖G+‖ = −πK. Further

M+ = ∫_0^∞ dz ∫_0^∞ e^{Kx} S((x + z, ∞)) dx
 = ∫_0^∞ dy ∫_0^y e^{Kx} dx S((y, ∞))
 = ∫_0^∞ K^{−1}(e^{Ky} − I) S((y, ∞)) dy,

−πK M+ e = π ∫_0^∞ (I − e^{Ky}) S((y, ∞)) e dy
 = π (βi μ_{Bi})diag e − π‖G+‖e = ρ − 1
(since ‖G+‖ being stochastic implies ‖G+‖e = e). □
Notes and references The exposition follows Asmussen [59] closely (the proof of Proposition 2.4 is different). The problem of computing G+ may be viewed as a special case of Wiener–Hopf factorization for continuous-time random walks with Markov-dependent increments (Markov additive processes); the discrete-time case is surveyed in Asmussen [57] and references given therein.
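To make Corollary 2.6(d) concrete, here is a small numerical sketch (our own illustration, not from the text) of the fixed-point iteration for K. It assumes exponential claims Bi = exp(δi), for which the jth column of ∫_0^∞ e^{Kx} S(dx) reduces to βj δj (δj I − K)^{−1} e_j; the final print checks the left-eigenvector property πK = 0 noted after Corollary 2.6, valid here since ρ < 1.

```python
import numpy as np

# Two-state example (parameters are our own choice): Lam = environment
# intensity matrix, beta_i = Poisson rates, claims B_i exponential(delta_i).
Lam = np.array([[-1.0, 1.0], [1.0, -1.0]])
beta = np.array([1.0, 0.5])
delta = np.array([2.0, 2.0])
pi = np.array([0.5, 0.5])            # stationary distribution of Lam
rho = pi @ (beta / delta)            # = 0.375 < 1

I2 = np.eye(2)

def phi(K):
    # phi(K) = Lam - (beta_i)_diag + int_0^inf e^{Kx} S(dx); for exponential
    # claims the j-th column of the integral is beta_j delta_j (delta_j I - K)^{-1} e_j
    M = Lam - np.diag(beta)
    for j in range(2):
        M[:, j] += beta[j] * delta[j] * np.linalg.solve(delta[j] * I2 - K, I2[:, j])
    return M

K = Lam - np.diag(beta)              # K^(0) = Lam - (beta_i)_diag
for _ in range(2000):
    K = phi(K)                       # K^(n+1) = phi(K^(n)), monotone convergence

print(pi @ K)                        # ~ (0, 0): pi is a left 0-eigenvector of K
```

The same loop with Λ replaced by Λ* computes the matrix Q of Proposition 2.4.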
3 Change of measure via exponential families
We first recall some notation and some results which were given in Chapter III in a more general Markov additive process context. Define F_t as the measure-valued matrix with ijth entry F_t(i, j; x) = Pi[St ≤ x; Jt = j], and F̂_t[s] as the matrix with ijth entry F̂_t[i, j; s] = Ei[e^{sSt}; Jt = j] (thus, F̂_t[s] may be viewed as the matrix m.g.f. of F_t defined by entrywise integration). Define further

K[α] = Λ + (βi(B̂i[α] − 1))diag − αI

(the matrix function K[α] is of course not related to the matrix K of the preceding section). Then (Proposition III.4.2):

Proposition 3.1 F̂_t[α] = e^{tK[α]}.

It follows from III.4 that K[α] has a simple and unique eigenvalue κ(α) with maximal real part, such that the corresponding left and right eigenvectors ν^(α), h^(α) may be taken with strictly positive components. We shall use the normalization ν^(α)e = ν^(α)h^(α) = 1. Note that since K[0] = Λ, we have ν^(0) = π, h^(0) = e. The function κ(α) plays the role of an appropriate generalization of the c.g.f., see Theorem III.4.7.

Now consider some θ such that all B̂i[θ], and hence κ(θ), ν^(θ), h^(θ) etc., are well-defined. The aim is to define governing parameters β_{θ;i}, B_{θ;i}, Λ_θ = (λ^(θ)_{ij})_{i,j∈E} for a risk process, such that one can obtain suitable generalizations of the likelihood ratio identities of Chapter III and thereby of Lundberg's inequality, the Cramér–Lundberg approximation etc. According to Theorem III.4.11, the appropriate choice is

B_{θ;i}(dx) = e^{θx} Bi(dx) / B̂i[θ],  β_{θ;i} = βi B̂i[θ],
Λ_θ = ∆_θ^{−1} K[θ] ∆_θ − κ(θ)I = ∆_θ^{−1} Λ ∆_θ + (βi(B̂i[θ] − 1))diag − (κ(θ) + θ)I,
where ∆_θ is the diagonal matrix with h_i^(θ) as ith diagonal element. That is,

λ^(θ)_{ij} = (h_j^(θ)/h_i^(θ)) λ_{ij},  i ≠ j,
λ^(θ)_{ii} = λ_{ii} + βi(B̂i[θ] − 1) − κ(θ) − θ.

We recall that it was shown in III.4 that Λ_θ is an intensity matrix, that Ei e^{θSt} h^(θ)_{Jt} = e^{tκ(θ)} h_i^(θ), and that {e^{θSt − tκ(θ)} h^(θ)_{Jt}}_{t≥0} is a martingale. We let P_{θ;i} be the governing probability measure for a risk process with parameters β_{θ;i}, B_{θ;i}, Λ_θ and initial environment J0 = i. Recall that if P^(T)_{θ;i} is the restriction of P_{θ;i} to F_T = σ((St, Jt) : t ≤ T) and P_i^(T) = P^(T)_{0;i}, then P^(T)_{θ;i} and P_i^(T) are equivalent for T < ∞. More generally, allowing T to be a stopping time, Theorem III.1.3 takes the following form:

Proposition 3.2 Let τ be any stopping time and let G ∈ F_τ, G ⊆ {τ < ∞}. Then

Pi G = P_{0;i} G = h_i^(θ) E_{θ;i}[ (1/h^(θ)_{J_τ}) exp{−θS_τ + τκ(θ)}; G ].  (3.1)

Let F̂_{θ;t}[s], κ_θ(s) and ρ_θ be defined the same way as F̂_t[s], κ(s) and ρ, only with the original risk process replaced by the one with changed parameters.

Lemma 3.3 F̂_{θ;t}[s] = e^{−tκ(θ)} ∆_θ^{−1} F̂_t[s + θ] ∆_θ.

Proof. Use III.(4.5). □

Lemma 3.4 κ_θ(s) = κ(s + θ) − κ(θ). In particular, ρ_θ > 1 whenever κ'(θ) > 0.

Proof. The first formula follows by Lemma 3.3, and the second from the sign of κ'_θ(0) = κ'(θ). □

Notes and references The exposition here and in the next two subsections (on likelihood ratio identities and Lundberg conjugation) follows Asmussen [58] closely (but is somewhat more self-contained).
3a Lundberg conjugation

Since the definition of κ(s) is a direct extension of the definition for the Cramér–Lundberg model, the Lundberg equation is κ(γ) = 0. We assume that a solution γ > 0 exists and use notation like P_{L;i} instead of P_{γ;i}; also, for brevity we write h = h^(γ) and ν = ν^(γ).
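Numerically, γ can be found by bisection on the dominant eigenvalue κ(α) of K[α] = Λ + (βi(B̂i[α] − 1))diag − αI. The sketch below (our own two-state example with exponential claims, not from the text) brackets the positive root: κ(0) = 0 and κ'(0) = ρ − 1 < 0, so the Lundberg root is the second zero of the convex function κ.

```python
import numpy as np

Lam = np.array([[-1.0, 1.0], [1.0, -1.0]])
beta = np.array([1.0, 0.5])
delta = np.array([2.0, 2.0])     # B_i = exp(delta_i), so Bhat_i[a] = delta_i/(delta_i - a)

def kappa(a):
    # dominant eigenvalue of K[a] = Lam + (beta_i(Bhat_i[a]-1))_diag - a I
    Ka = Lam + np.diag(beta * (delta / (delta - a) - 1.0)) - a * np.eye(2)
    return max(np.linalg.eigvals(Ka).real)

lo, hi = 0.5, 1.99               # kappa(0.5) < 0 < kappa(1.99) for these parameters
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if kappa(mid) < 0 else (lo, mid)
gamma = 0.5 * (lo + hi)          # Lundberg exponent, roughly 1.2 here
print(gamma)
```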
Substituting θ = γ, τ = τ(u), G = {τ(u) < ∞} in Proposition 3.2, letting ξ(u) = S_{τ(u)} − u be the overshoot and noting that P_{L;i}(τ(u) < ∞) = 1 by Lemma 3.4, we obtain:

Corollary 3.5

ψi(u, T) = hi e^{−γu} E_{L;i}[ e^{−γξ(u)}/h_{J_{τ(u)}} ; τ(u) ≤ T ],  (3.2)
ψi(u) = hi e^{−γu} E_{L;i}[ e^{−γξ(u)}/h_{J_{τ(u)}} ].  (3.3)
Noting that ξ(u) ≥ 0, (3.3) yields

Corollary 3.6 (Lundberg's inequality) ψi(u) ≤ (hi / min_{j∈E} hj) e^{−γu}.

Assuming it has been shown that C = lim_{u→∞} E_{L;i}[e^{−γξ(u)}/h_{J_{τ(u)}}] exists and is independent of i (which is not too difficult, cf. the proof of Lemma 3.8 below), it also follows immediately that ψi(u) ∼ hi C e^{−γu}. However, the calculation of C is non-trivial. Recall the definition of G+, K, k from Section 2.

Theorem 3.7 (the Cramér–Lundberg approximation) In the light-tailed case, ψi(u) ∼ hi C e^{−γu}, where

C = ((1 − ρ)/(ρ_L − 1)) νk.  (3.4)
To calculate C, we need two lemmas. For the first, recall the definition of ζ+, M+ in Lemma 2.10.

Lemma 3.8 As u → ∞, (ξ(u), J_{τ(u)}) converges in distribution w.r.t. P_{L;i}, with the density gj(x) (say) of the limit (ξ(∞), J_{τ(∞)}) at ξ(∞) = x, J_{τ(∞)} = j being independent of i and given by

gj(x) = (1/(ζ+^L M+^L e)) Σ_{ℓ∈E} ζ+^{L;ℓ} G+^L(ℓ, j; (x, ∞)).

Proof. We shall need to invoke the concept of semi-regeneration, see A.1f. Interpreting the ladder points as semi-regeneration points (the types being the environmental states in which they occur), {(ξ(u), J_{τ(u)})} is semi-regenerative with the first semi-regeneration point being (ξ(0), J_{τ(0)}) = (S_{τ+}, J_{τ+}). The formula for gj(x) now follows immediately from Proposition A1.7, noting that the non-lattice property is obvious because all G+^L(ℓ, j; ·) have densities. □
Lemma 3.9 K_L = ∆^{−1}K∆ − γI, Ĝ+^L[−γ] = ∆^{−1}‖G+‖∆, Ĝ+[γ]h = h.

Proof. Appealing to the occupation measure interpretation of K, cf. Corollary 2.6, we get for x < 0 that

e_i^T e^{−Kx} e_j dx = ∫_0^∞ Pi(St ∈ dx, Jt = j, τ+ > t) dt
 = (hi/hj) e^{−γx} ∫_0^∞ P_{L;i}(St ∈ dx, Jt = j, τ+ > t) dt
 = (hi/hj) e^{−γx} e_i^T e^{−K_L x} e_j dx,

which is equivalent to the first statement of the lemma. The proof of the second is a similar but easier application of the basic likelihood ratio identity Proposition 3.2. In the same way we get Ĝ+[γ] = ∆‖G+^L‖∆^{−1}, and since ‖G+^L‖e = e, it follows that

Ĝ+[γ]h = ∆‖G+^L‖∆^{−1}h = ∆‖G+^L‖e = ∆e = h. □

Proof of Theorem 3.7. Using Lemma 3.8, we get

E_L[e^{−γξ(∞)}; J_{τ(∞)} = j] = ∫_0^∞ e^{−γx} gj(x) dx
 = (1/(ζ+^L M+^L e)) Σ_{ℓ∈E} ζ+^{L;ℓ} ∫_0^∞ e^{−γx} G+^L(ℓ, j; (x, ∞)) dx
 = (1/(γ ζ+^L M+^L e)) Σ_{ℓ∈E} ζ+^{L;ℓ} ∫_0^∞ (1 − e^{−γx}) G+^L(ℓ, j; dx)
 = (1/(γ ζ+^L M+^L e)) Σ_{ℓ∈E} ζ+^{L;ℓ} ( ‖G+^L(ℓ, j)‖ − Ĝ+^L[ℓ, j; −γ] ).

In matrix formulation, this means that

C = E_{L;i}[ e^{−γξ(∞)}/h_{J_{τ(∞)}} ]
 = (1/(γ ζ+^L M+^L e)) ζ+^L ( ‖G+^L‖ − Ĝ+^L[−γ] ) ∆^{−1} e
 = (1/(γ ζ+^L M+^L e)) ζ+^L ( I − Ĝ+^L[−γ] ) ∆^{−1} e
 = (1/(γ(ρ_L − 1))) (−π_L K_L) ( I − Ĝ+^L[−γ] ) ∆^{−1} e,
using Lemma 2.10 for the last two equalities. Inserting first Lemma 3.9 and next Lemma 2.8, this becomes

(1/(γ(ρ_L − 1))) π_L ∆^{−1} (γI − K)(I − ‖G+‖) e
 = ((1 − ρ)/(γ(ρ_L − 1))) π_L ∆^{−1} (γI − K) k = ((1 − ρ)/(ρ_L − 1)) π_L ∆^{−1} k.

Thus, to complete the proof it only remains to check that π_L = ν∆. The normalization νh = 1 ensures ν∆e = 1. Finally, ν∆Λ_L = ν∆∆^{−1}K[γ]∆ = νK[γ]∆ = 0 since by definition νK[γ] = κ(γ)ν = 0. □
3b Ramifications of Lundberg's inequality
We consider first the time-dependent version of Lundberg's inequality, cf. V.4. The idea is as there to substitute T = yu in ψi(u, T) and to replace the Lundberg exponent γ by γ_y = α_y − yκ(α_y), where α_y is the unique solution of

κ'(α_y) = 1/y.  (3.5)

Graphically, the situation is just as in Fig. V.1. Thus, one always has γ_y > γ, whereas α_y > γ, κ(α_y) > 0 when y < 1/κ'(γ), and α_y < γ, κ(α_y) < 0 when y > 1/κ'(γ).
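The pair (α_y, γ_y) is easy to compute: since κ is convex, κ' is increasing, so κ'(α) = 1/y can be solved by bisection (below with a central finite difference for κ'). The example reuses the two-state exponential-claim parameters of our earlier sketch; they are our own choice, not from the text.

```python
import numpy as np

Lam = np.array([[-1.0, 1.0], [1.0, -1.0]])
beta = np.array([1.0, 0.5])
delta = np.array([2.0, 2.0])

def kappa(a):
    Ka = Lam + np.diag(beta * (delta / (delta - a) - 1.0)) - a * np.eye(2)
    return max(np.linalg.eigvals(Ka).real)

def kappa_prime(a, h=1e-6):
    return (kappa(a + h) - kappa(a - h)) / (2 * h)

def bisect(f, lo, hi, n=80):     # assumes f increasing with a sign change on [lo, hi]
    for _ in range(n):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

gamma = bisect(kappa, 0.5, 1.99)                     # Lundberg exponent
y = 0.25                                             # horizon T = y*u; here y < 1/kappa'(gamma)
alpha_y = bisect(lambda a: kappa_prime(a) - 1.0 / y, 0.01, 1.99)
gamma_y = alpha_y - y * kappa(alpha_y)
print(gamma, alpha_y, gamma_y)                       # gamma_y > gamma, as stated above
```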
Theorem 3.10 Let C+^(0)(y) = 1 / min_{i∈E} h_i^{(α_y)}. Then

ψi(u, yu) ≤ C+^(0)(y) h_i^{(α_y)} e^{−γ_y u},  y < 1/κ'(γ),
ψi(u) − ψi(u, yu) ≤ C+^(0)(y) h_i^{(α_y)} e^{−γ_y u},  y > 1/κ'(γ).

Proof. Consider first the case y < 1/κ'(γ). Then κ(α_y) > 0, and (3.1) yields

ψi(u, yu) = h_i^{(α_y)} E_{α_y;i}[ (1/h^{(α_y)}_{J_{τ(u)}}) exp{−α_y S_{τ(u)} + τ(u)κ(α_y)}; τ(u) ≤ yu ]
 = h_i^{(α_y)} e^{−α_y u} E_{α_y;i}[ (1/h^{(α_y)}_{J_{τ(u)}}) exp{−α_y ξ(u) + τ(u)κ(α_y)}; τ(u) ≤ yu ]
 ≤ h_i^{(α_y)} C+^(0)(y) e^{−α_y u} E_{α_y;i}[ e^{τ(u)κ(α_y)}; τ(u) ≤ yu ]
 ≤ h_i^{(α_y)} C+^(0)(y) e^{−α_y u + yuκ(α_y)}.
Similarly, if y > 1/κ'(γ), we have κ(α_y) < 0 and get

ψi(u) − ψi(u, yu)
 = h_i^{(α_y)} e^{−α_y u} E_{α_y;i}[ (1/h^{(α_y)}_{J_{τ(u)}}) exp{−α_y ξ(u) + τ(u)κ(α_y)}; yu < τ(u) < ∞ ]
 ≤ h_i^{(α_y)} C+^(0)(y) e^{−α_y u} E_{α_y;i}[ e^{τ(u)κ(α_y)}; yu < τ(u) < ∞ ]
 ≤ h_i^{(α_y)} C+^(0)(y) e^{−α_y u + yuκ(α_y)}. □
Note that the proof appears to use less information than is inherent in the definition (3.5). However, as in the classical case, (3.5) will produce the maximal γ_y for which the argument works. Our next objective is to improve upon the constant in front of e^{−γu} in Lundberg's inequality, as well as to supplement it with a lower bound:

Theorem 3.11 Let

C− = min_{j∈E} (1/hj) · inf_{x≥0} B̄j(x) / ∫_x^∞ e^{γ(y−x)} Bj(dy),
C+ = max_{j∈E} (1/hj) · sup_{x≥0} B̄j(x) / ∫_x^∞ e^{γ(y−x)} Bj(dy).  (3.8)

Then for all i ∈ E and all u ≥ 0,

C− hi e^{−γu} ≤ ψi(u) ≤ C+ hi e^{−γu}.  (3.9)
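For exponential claims the inf and sup in (3.8) collapse: if Bj = exp(δj), then B̄j(x)/∫_x^∞ e^{γ(y−x)} Bj(dy) = (δj − γ)/δj for every x. The sketch below (our own example; h is obtained as the positive right eigenvector of K[γ]) computes C− and C+ and checks that C+ improves on the simple constant 1/min_j hj of Corollary 3.6.

```python
import numpy as np

Lam = np.array([[-1.0, 1.0], [1.0, -1.0]])
beta = np.array([1.0, 0.5])
delta = np.array([2.0, 2.0])

def Kmat(a):
    return Lam + np.diag(beta * (delta / (delta - a) - 1.0)) - a * np.eye(2)

def kappa(a):
    return max(np.linalg.eigvals(Kmat(a)).real)

lo, hi = 0.5, 1.99                       # brackets the Lundberg root for these parameters
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if kappa(mid) < 0 else (lo, mid)
gamma = 0.5 * (lo + hi)

w, V = np.linalg.eig(Kmat(gamma))        # h = right eigenvector for kappa(gamma) = 0
h = np.abs(V[:, np.argmax(w.real)].real) # componentwise positive (Perron-Frobenius)

ratios = (delta - gamma) / (delta * h)   # (1/h_j) * Bbar_j(x) / int_x^inf e^{gamma(y-x)} B_j(dy)
C_minus, C_plus = ratios.min(), ratios.max()
print(C_minus, C_plus, 1.0 / h.min())    # C_plus is smaller than the Corollary 3.6 constant
```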
For the proof, we shall need the matrices G+ and R of Section 2. We further write G(u) for the vector with ith component Gi(u) = Σ_{j∈E} G+(i, j; (u, ∞))
and, for a vector ϕ(u) = (ϕi(u))_{i∈E} of functions, we let G+ ∗ ϕ(u) be the vector with ith component

Σ_{j∈E} (G+(i, j) ∗ ϕj)(u) = Σ_{j∈E} ∫_0^u ϕj(u − y) G+(i, j; dy).
Lemma 3.12 Assume sup_{i,u} |ϕi^(0)(u)| < ∞, and define ϕ^(n+1)(u) = G(u) + (G+ ∗ ϕ^(n))(u). Then ϕi^(n)(u) → ψi(u) as n → ∞.

Proof. Write U_N = Σ_{n=0}^N G+^{∗n}, U = U_∞ = Σ_{n=0}^∞ G+^{∗n}. Then iterating the defining equation ϕ^(n+1) = G + G+ ∗ ϕ^(n), we get

ϕ^(N+1) = U_N ∗ G + G+^{∗(N+1)} ∗ ϕ^(0).

However, if τ+(n) is the nth ladder epoch, we have

[G+^{∗(N+1)} ∗ ϕ^(0)]_i(u) ≤ sup_{i,u} |ϕi^(0)(u)| · Pi(τ+(N + 1) < ∞) → 0.

Hence lim ϕ^(n) exists and equals U ∗ G. To see that the ith component of U ∗ G(u) equals ψi(u), just note that the recursion ϕ^(n+1) = G + G+ ∗ ϕ^(n) holds for the particular case where ϕi^(n)(u) is the probability of ruin after at most n ladder steps, and that then obviously ϕi^(n)(u) → ψi(u), n → ∞. □

Lemma 3.13 For all i and u,
X j∈E
Z
∞
hj
eγ(y−u) G+ (i, j; dy) ≤ Gi (u) ≤ C+
u
X j∈E
Z
∞
hj
eγ(y−u) G+ (i, j; dy).
u
Proof. According to (2.2), Z
0
G+ (i, j; dy) = βj
Bj (dy − x)R(i, j; dx). −∞
Thus

C+ Σ_{j∈E} hj ∫_u^∞ e^{γ(y−u)} G+(i, j; dy)
 = C+ Σ_{j∈E} βj hj ∫_{−∞}^0 R(i, j; dx) ∫_u^∞ e^{γ(y−u)} Bj(dy − x)
 = C+ Σ_{j∈E} βj hj ∫_{−∞}^0 R(i, j; dx) B̄j(u − x) · ( ∫_{u−x}^∞ e^{γ(y−u+x)} Bj(dy) / B̄j(u − x) )
 ≥ Σ_{j∈E} βj ∫_{−∞}^0 R(i, j; dx) B̄j(u − x) = Gi(u),

proving the upper inequality, and the proof of the lower one is similar. □
Proof of Theorem 3.11. Let first ϕi^(0)(u) = C− hi e^{−γu} in Lemma 3.12. We claim by induction that then ϕi^(n)(u) ≥ C− hi e^{−γu} for all n, from which the lower inequality follows by letting n → ∞. Indeed, this is obvious if n = 0, and assuming it shown for n, we get

ϕi^(n+1)(u) = Gi(u) + Σ_{j∈E} ∫_0^u ϕj^(n)(u − y) G+(i, j; dy)  (3.10)
 ≥ C− Σ_{j∈E} ∫_u^∞ hj e^{γ(y−u)} G+(i, j; dy) + C− Σ_{j∈E} ∫_0^u hj e^{γ(y−u)} G+(i, j; dy)
 = C− e^{−γu} Σ_{j∈E} Ĝ+[i, j; γ] hj = C− e^{−γu} hi,  (3.11)

estimating the first term in (3.10) by Lemma 3.13 and the second by the induction hypothesis, and using Lemma 3.9 for the last equality in (3.11). The proof of the upper inequality is similar, taking ϕi^(0)(u) = 0. □

Here is an estimate of the rate of convergence of the finite horizon ruin probabilities ψi(u, T) = Pi(τ(u) ≤ T) to ψi(u) which is different from Theorem 3.10:

Theorem 3.14 Let γ0 > 0 be the solution of κ'(γ0) = 0, let C+(γ0) be as in (3.8) with γ replaced by γ0 and hi by h_i^{(γ0)}, and let δ = e^{κ(γ0)}. Then

0 ≤ ψi(u) − ψi(u, T) ≤ C+(γ0) h_i^{(γ0)} e^{−γ0 u} δ^T.  (3.12)
CHAPTER VII. MARKOVIAN ENVIRONMENT
Proof. We first note that just as in the proof of Theorem 3.11, it follows that (γ0 ) −γ0 u
ψi (u) ≤ C− (γ0 )hi
e
.
(3.13)
Hence, letting MT = max0≤t≤T St , we have ψi (u) − ψi (u, T ) = = ≤
Pi (M > u) − Pi (MT > u) = Pi (MT ≤ u, M > u) ¡ ¢ Pi ST ≤ u, MT ≤ u, M > u £ ¤ Ei ψJT (u − ST ); MT ≤ u, ST ≤ u £ (γ ) ¤ C+ (γ0 )e−γ0 u Ei hJT0 eγ0 ST
=
C+ hi
=
(γ0 ) −γ0 u T
e
δ . 2
Notes and references The results and proofs are from Asmussen & Rolski [98]. Further related discussion is given in Grigelionis [435, 436]. Jasiulewicz [503] uses an integral equation approach to study the ruin probability in a Markovmodulated model with surplusdependent premium rates, for approaches involving systems of IDEs see Siegl & Tichy [805] and Lu & Li [610]. For moments of discounted aggregate claims, see Kim & Kim [532]. Yin, Liu & Yang [906] deal with effects of statespace reduction of Jt on Lundbergtype bounds for the ruin probability. Zhu & Yang [921] investigate general regularity issues for ruinrelated functions in a Markovian environment. For the stability of ruin probabilities w.r.t. parameter changes, see Enikeeva, Kalashnikov & Rusaityte [355]. Discretetime models with Markovian environment are e.g. studied in Reinhard & Snoussi [731] and Wagner [866].
4 Comparisons with the compound Poisson model

4a Ordering of the ruin functions
For two risk functions ψ', ψ'', we define the stochastic ordering by ψ' ≺_st ψ'' if

ψ'(u) ≤ ψ''(u),  u ≥ 0.  (4.1)

Obviously, this corresponds to the usual stochastic ordering of the maxima M', M'' of the corresponding two claim surplus processes (note that ψ'(u) = P(M' > u), ψ''(u) = P(M'' > u)).

Now consider the risk process in a Markovian environment and define ψπ(u) = Σ_{i∈E} πi ψi(u). It was long conjectured that ψ* ≺_st ψπ, where ψ*(u) is the ruin
probability for the averaged compound Poisson model defined in Section 1, and ψπ is the one for the Markov-modulated one in the stationary case (the distribution of J0 is π). The motivation that such a result should be true came in part from numerical studies, in part from the folklore principle that any added stochastic variation increases the risk, and finally in part from queueing theory, where it has been observed repeatedly that Markov-modulation increases waiting times and in fact some partial results had been obtained. The results to be presented show that quite often this is so, but that in general the picture is more diverse. The conditions which play a role in the following are:

β1 ≤ β2 ≤ . . . ≤ βp;  (4.2)
B1 ≺_st B2 ≺_st . . . ≺_st Bp;  (4.3)
the Markov process {Jt} is stochastically monotone.  (4.4)

To avoid trivialities, we also assume that there exist i ≠ j such that either βi < βj or Bi ≠ Bj. Occasionally we strengthen (4.3) to

B = Bi does not depend on i.  (4.5)
Note that whereas (4.2) alone just amounts to an ordering of the states, this is not the case for (4.3). For the notion of monotone Markov processes, we refer to Müller & Stoyan [653]; note that (4.4) is automatic in some simple examples like birth–death processes or p = 2. Conditions (4.2)–(4.4) say basically that if i < j, then j is the more risky state, and it is in fact easy to show that ψi(u) ≤ ψj(u) (this is used in the derivation of (4.9) below).

Theorem 4.1 Assume that conditions (4.2)–(4.4) hold. Then ψ* ≺_st ψπ.

For the proof, we need two lemmas. The first is a standard result going back to Chebycheff and appearing in a more general form in Esary, Proschan & Walkup [357]; the second follows from an extension of Theorem III.5.5 (cf. also Proposition 2.1) which with basically the same proof can be found in Asmussen & Schmidt [103].

Lemma 4.2 If a1 ≤ . . . ≤ ap, b1 ≤ . . . ≤ bp and πi > 0 (i = 1, . . . , p), Σ_{i=1}^p πi = 1, then

Σ_{i=1}^p πi ai bi ≥ Σ_{i=1}^p πi ai · Σ_{j=1}^p πj bj.

The equality holds if and only if a1 = . . . = ap or b1 = . . . = bp.
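Lemma 4.2 (a weighted Chebyshev sum inequality) is easy to confirm numerically; the sketch below, with randomly generated similarly ordered sequences (our own illustration), can also be used to check the equality case of a constant b.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 6
pi = rng.random(p); pi /= pi.sum()    # pi_i > 0, sum_i pi_i = 1
a = np.sort(rng.random(p))            # a_1 <= ... <= a_p
b = np.sort(rng.random(p))            # b_1 <= ... <= b_p

lhs = np.sum(pi * a * b)
rhs = np.sum(pi * a) * np.sum(pi * b)
print(lhs >= rhs)                     # True for similarly ordered sequences
```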
Lemma 4.3 (a) Pπ(J_{τ(0)} = i, τ(0) < ∞) = ρ πi^(+), where πi^(+) = βi μ_{Bi} πi / ρ;
(b) Pπ(S_{τ(0)} ∈ dx | J_{τ(0)} = i, τ(0) < ∞) = B̄i(x) dx / μ_{Bi}.

Proof of Theorem 4.1. Conditioning upon the first ladder epoch, we obtain (cf. Proposition 2.1 for the first term in (4.7) and Lemma 4.3 for the second)

ψ*(u) = β* ∫_u^∞ B̄*(x) dx + β* ∫_0^u ψ*(u − x) B̄*(x) dx,  (4.6)

ψπ(u) = β* ∫_u^∞ B̄*(x) dx + ρ Σ_{i=1}^p πi^(+) ∫_0^u ψi(u − x) B̄i(x)/μ_{Bi} dx  (4.7)
 = β* ∫_u^∞ B̄*(x) dx + ∫_0^u Σ_{i=1}^p πi βi B̄i(x) ψi(u − x) dx  (4.8)
 ≥ β* ∫_u^∞ B̄*(x) dx + ∫_0^u Σ_{i=1}^p πi βi B̄i(x) · Σ_{i=1}^p πi ψi(u − x) dx  (4.9)
 = β* ∫_u^∞ B̄*(x) dx + β* ∫_0^u B̄*(x) ψπ(u − x) dx.  (4.10)
Here (4.9) follows by considering the increasing functions βi B̄i(x) and ψi(u − x) of i and using Lemma 4.2. Comparing (4.10) and (4.6), it follows by a standard argument from renewal theory that ψπ dominates the solution ψ* of the renewal equation (4.6). □

Here is a counterexample showing that the inequality ψ*(u) ≤ ψπ(u) is not in general true:

Proposition 4.4 Assume that βi μ_{Bi} < 1 for all i, that

Σ_{i=1}^p πi βi² μ_{Bi} < Σ_{i=1}^p πi βi · Σ_{i=1}^p πi βi μ_{Bi},  (4.11)

and that the environmental intensity matrix is of the form εΛ0. Then ψπ(u) < ψ*(u) for all sufficiently small ε > 0 and all sufficiently small u > 0.
Proof. Since ψπ(0) = ψ*(0), it is sufficient to show that ψπ'(0) < ψ*'(0) for ε small enough. Using (4.6), (4.8) we get

ψ*'(0) = −β* + β* ψ*(0) = Σ_{i=1}^p πi βi · Σ_{i=1}^p πi βi μ_{Bi} − β*,
ψπ'(0) = Σ_{i=1}^p πi βi ψi(0) − β*.
But it is intuitively clear (see Theorem 3.2.1 of [370] for a formal proof) that ψi(u) converges to the ruin probability for the compound Poisson model with parameters βi, Bi as ε ↓ 0. For u = 0, this ruin probability is βi μ_{Bi}, and from this the claim follows. □

To see that Proposition 4.4 is not vacuous, let

π = (1/2, 1/2),  β1 = 10^{−3}, β2 = 1,  μ_{B1} = 10², μ_{B2} = 10^{−4}.

Then the l.h.s. of (4.11) is of order 10^{−4} and the r.h.s. of order 10^{−1}.

Notes and references The results are from Asmussen, Frey, Rolski & Schmidt [78]. As is seen, they are at present not quite complete. What is missing in relation to Theorem 4.1 and Proposition 4.4 is the understanding of whether the stochastic monotonicity condition (4.4) is essential (the present authors conjecture it is).
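The numbers above can be checked directly. Condition (4.11) is taken here in the form Σπiβi²μ_{Bi} < Σπiβi · Σπiβiμ_{Bi}, which is what the derivative comparison in the proof requires (a sketch, our own check):

```python
import numpy as np

pi = np.array([0.5, 0.5])
beta = np.array([1e-3, 1.0])
muB = np.array([1e2, 1e-4])              # mu_{B_1}, mu_{B_2}

lhs = np.sum(pi * beta**2 * muB)         # ~ 1e-4
rhs = np.sum(pi * beta) * np.sum(pi * beta * muB)
print(lhs, rhs)                          # the two sides differ by orders of magnitude
```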
4b Ordering of adjustment coefficients
Despite the fact that ψ*(u) ≤ ψπ(u) may fail for some u, it will hold for all sufficiently large u, except possibly in a very special situation. Recall that the adjustment coefficient for the Markov-modulated model is defined as the solution γ > 0 of κ(γ) = 0, where κ(α) is the eigenvalue with maximal real part of the matrix Λ + (κi(α))diag, with κi(α) = βi(B̂i[α] − 1) − α. The adjustment coefficient γ* for the averaged compound Poisson model is the solution γ* > 0 of κ*(γ*) = 0, where

κ*(α) = β*(B̂*[α] − 1) − α = Σ_{i∈E} πi κi(α).  (4.12)
Theorem 4.5 γ ≤ γ*, with equality only when κi(γ*) does not depend on i ∈ E.

Lemma 4.6 Let (δi)_{i∈E} be a given set of constants satisfying Σ_{i∈E} πi δi = 0, and define λ(α) as the eigenvalue with maximal real part of the matrix Λ + α(δi)diag. Then λ(α) ≥ 0, with strict inequality unless α = 0 or δi = 0 for all i ∈ E.

Proof. Define

Xt = ∫_0^t δ_{Js} ds.

Then {(Jt, Xt)} is a Markov additive process (a so-called Markovian fluid model, cf. e.g. Asmussen [62]) as discussed in III.5, and by Proposition III.4.2 we have

( Ei[e^{αXt}; Jt = j] )_{i,j∈E} = e^{t(Λ + α(δi)diag)}.
Further (see Corollary III.4.7) λ is convex with

λ'(0) = lim_{t→∞} EXt/t = Σ_{i∈E} πi δi = 0,  (4.13)
λ''(0) = lim_{t→∞} Var Xt/t.  (4.14)

By convexity, (4.13) implies λ(α) ≥ 0 for all α. Now we can view {Xt} as a cumulative process (see A.1d) with generic cycle

ω = inf{t > 0 : J_{t−} ≠ k, Jt = k | J0 = k}

(the return time of k), where k ∈ E is some arbitrary but fixed state. It is clear that the distribution of Xω is non-degenerate except when δi does not depend on i ∈ E, which in view of Σ_{i∈E} πi δi = 0 is only possible if δi = 0 for all i ∈ E. Hence if δi ≠ 0 for some i ∈ E, it follows by Proposition A1.4(b) that the limit in (4.14) is non-zero, so that λ''(0) > 0. This implies that λ is strictly convex, in particular λ(α) > 0 for all α ≠ 0. □

Proof of Theorem 4.5. Let δi = κi(γ*), α = 1 in Lemma 4.6. Then Σ πi δi = 0 because of (4.12) and κ*(γ*) = 0. Further λ(1) = κ(γ*) by definition of λ(·) and κ(·). Hence κ(γ*) ≥ 0. Since κ is convex with κ'(0) < 0, this implies that the solution γ > 0 of κ(γ) = 0 must satisfy γ ≤ γ*. If κi(γ*) is not a constant function of i ∈ E, we get κ(γ*) > 0, which in a similar manner implies that γ < γ*. □

Notes and references Theorem 4.5 is from Asmussen & O'Cinneide [93], improving upon more incomplete results from Asmussen, Frey, Rolski & Schmidt [78].
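Theorem 4.5 can be illustrated numerically by computing both adjustment coefficients for a two-state example with exponential claims (our own parameters, the same as in our earlier sketches). Here κi(γ*) genuinely depends on i, so the inequality is strict:

```python
import numpy as np

Lam = np.array([[-1.0, 1.0], [1.0, -1.0]])
pi = np.array([0.5, 0.5])
beta = np.array([1.0, 0.5])
delta = np.array([2.0, 2.0])

def kappa_i(a):                    # kappa_i(alpha) = beta_i(Bhat_i[alpha] - 1) - alpha
    return beta * (delta / (delta - a) - 1.0) - a

def kappa(a):                      # Markov-modulated exponent: dominant eigenvalue
    return max(np.linalg.eigvals(Lam + np.diag(kappa_i(a))).real)

def kappa_star(a):                 # averaged exponent, cf. (4.12)
    return pi @ kappa_i(a)

def root(f, lo=0.5, hi=1.99, n=100):   # both roots lie in this bracket here
    for _ in range(n):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

gamma, gamma_star = root(kappa), root(kappa_star)
print(gamma, gamma_star)           # gamma < gamma_star = 1.25
```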
4c Sensitivity estimates for the adjustment coefficient

Now assume that the intensity matrix for the environment is Λ_ε = Λ0/ε, whereas the βi and Bi are fixed. The corresponding adjustment coefficient is denoted by γ(ε). Thus γ(ε) → γ* as ε ↓ 0, and our aim is to compute the sensitivity

∂γ/∂ε |_{ε=0}.

A dual result deals with the limit ε → ∞. Here we put a = 1/ε, note that γ(a) → min_{i=1,...,p} γi, and compute

∂γ/∂a |_{a=0}.
In both cases, the basic equation is (Λ + (κi(γ))diag) h = 0, where Λ, γ, h depend on the parameter (ε or a). In the case of ε, multiply the basic equation by ε to obtain

0 = (Λ0 + ε(κi(γ))diag) h,
0 = ((κi(γ))diag + εγ'(κ'i(γ))diag) h + (Λ0 + ε(κi(γ))diag) h'.  (4.15)

Normalizing h by πh = 1, we have πh' = 0, h(0) = e. Hence letting ε = 0 in (4.15) yields

0 = (κi(γ*))diag e + Λ0 h'(0) = (κi(γ*))diag e + (Λ0 − eπ) h'(0),
h'(0) = −(Λ0 − eπ)^{−1} (κi(γ*))diag e.  (4.16)

Differentiating (4.15) once more and letting ε = 0, we get

0 = 2γ'(0)(κ'i(γ*))diag e + 2(κi(γ*))diag h'(0) + Λ0 h''(0),  (4.17)
0 = 2γ'(0)ρ + 2π(κi(γ*))diag h'(0),  (4.18)

multiplying (4.17) by π to the left to get (4.18). Inserting (4.16) yields:

Proposition 4.7 ∂γ/∂ε |_{ε=0} = (1/ρ) π (κi(γ*))diag (Λ0 − eπ)^{−1} (κi(γ*))diag e.

Now turn to the case of a. We assume that

0 < γ1 < γi,  i = 2, . . . , p.  (4.19)

Then γ → γ1 as a ↓ 0, and we may take h(0) = e1 (the first unit vector). We get

0 = (aΛ0 + (κi(γ))diag) h,
0 = (Λ0 + γ'(κ'i(γ))diag) h + (aΛ0 + (κi(γ))diag) h'.  (4.20)

Letting a = 0 in (4.20) and multiplying by e1^T to the left, we get

0 = λ11 + γ'(0) κ'1(γ1) + 0

(here we used κ1(γ(0)) = 0 to infer that the first component of (κi(γ(0)))diag h'(0) is 0), and we have proved:

Proposition 4.8 If (4.19) holds, then ∂γ/∂a |_{a=0} = −λ11/κ'1(γ1).

Notes and references The results are from Asmussen, Frey, Rolski & Schmidt [78]. The analogue of Proposition 4.8 when γi < 0 for some i is open.
5 The Markovian arrival process
We shall here briefly survey an extension of the model, which has recently received much attention in the queueing literature, and has some relevance in risk theory as well. The additional feature of the model is the following:

• Certain transitions of {Jt} from state i to state j are accompanied by a claim with distribution Bij; the intensity for such a transition (referred to as marked in the following) is denoted by λ^(2)_{ij}, and the remaining intensity for a transition i → j by λ^(1)_{ij} (thus λ_{ij} = λ^(1)_{ij} + λ^(2)_{ij}). For i = j, we use the convention that λ^(2)_{ii} = βi, where βi is the Poisson rate in state i, that Bii = Bi, and that the λ^(1)_{ii} are determined by Λ = Λ^(1) + Λ^(2), where Λ is the intensity matrix governing {Jt}.

Thus, the Markov-modulated compound Poisson model considered so far corresponds to Λ^(2) = (βi)diag, Λ^(1) = Λ − (βi)diag, Bii = Bi; the definition of Bij is redundant for i ≠ j. Note that the case 0 < qij < 1, where qij is the probability that a transition i → j is accompanied by a claim, is covered by letting Bij have an atom of size 1 − qij at 0.

Again, the claim surplus is a Markov additive process (cf. III.3). The extension of the model can also be motivated via Markov additive processes: if {Nt} is the counting process of a point process, then {Nt} is a Markov additive process if and only if it corresponds to an arrival mechanism of the type just considered. Here are some main examples:

Example 5.1 (phase-type renewal arrivals) Consider a risk process where the claim sizes are i.i.d. with common distribution B, but the point process of arrivals is not Poisson but renewal, with inter-claim times having common distribution A of phase-type with representation (ν, T). In the above setting, we may let {Jt} represent the phase processes of the individual inter-arrival times glued together (see further IX.2 for details), and the marked transitions are then the ones corresponding to arrivals. This is the only way in which arrivals can occur, and thus βi = 0, Λ^(1) = T, Λ^(2) = tν, Bij = B; the definition of Bi is redundant because of βi = 0.
2
5. THE MARKOVIAN ARRIVAL PROCESS
Example 5.2 (superpositions) A nice feature of the setup is that it is closed under superposition of independent arrival streams. Indeed, let {J_t^(1)}, {J_t^(2)} be two independent environmental processes and let E^(k), Λ^(1;k), Λ^(2;k), B_ij^(k) etc. refer to {J_t^(k)}. We then let (see the Appendix for the Kronecker notation)

E = E^(1) × E^(2),   J_t = (J_t^(1), J_t^(2)),
Λ^(1) = Λ^(1;1) ⊕ Λ^(1;2),   Λ^(2) = Λ^(2;1) ⊕ Λ^(2;2),
B_{ij,kj} = B_ik^(1),   B_{ij,ik} = B_jk^(2)

(the definition of the remaining B_{ij,kℓ} is redundant). In this way we can model, e.g., superpositions of renewal processes. □

Example 5.3 (an individual model) In contrast to the collective assumptions (which underlie most of the topics treated so far in this book and lead to Poisson arrivals), assume that there is a finite number N of policies. Assume further that the ith policy leads to a claim having distribution C_i after a time which is exponential, with rate α_i, say, and that the policy then expires. This means that the environmental states are of the form i_1 i_2 ··· i_N with i_1, i_2, ... ∈ {0, 1}, where i_k = 0 means that the kth policy has not yet expired and i_k = 1 that it has expired. Thus, claims occur only at state transitions of the environment, so that

λ_{0 i_2···i_N, 1 i_2···i_N} = α_1,   B_{0 i_2···i_N, 1 i_2···i_N} = C_1,
λ_{i_1 0···i_N, i_1 1···i_N} = α_2,   B_{i_1 0···i_N, i_1 1···i_N} = C_2,
...

All other off-diagonal elements of Λ are zero, so that all other B_ij are redundant. Similarly, all β_{i_1 i_2···i_N} are zero and all B_i are redundant. Easy modifications apply to allow for

• the time until expiration of the kth policy being general phase-type rather than exponential;
• upon a claim, the kth policy entering a recovering state, possibly with a general phase-type sojourn time, after which it starts afresh. □

Example 5.4 (a single life insurance policy) Consider the life insurance of a single policy holder who can be in one of several states, E = {working, retired, married, divorced, widowed, invalidized, dead, etc.}. The individual pays premium at rate p_i when in state i and receives an amount having distribution B_ij when his/her state changes from i to j. □
CHAPTER VII. MARKOVIAN ENVIRONMENT
Notes and references The point process of arrivals was studied in detail by Neuts [658] and is often referred to in the queueing literature as Neuts' versatile point process or, more recently, as the Markovian arrival process (MAP). However, the idea of arrivals at transition epochs can be found in Hermann [460] and Rudemo [755]. The versatility of the setup is even greater than for the Markov-modulated model. In fact, Hermann [460] and Asmussen & Koole [88] showed that in some appropriate sense any arrival stream to a risk process can be approximated by a model of the type studied in this section: any marked point process is the weak limit of a sequence of such models. For the Markov-modulated model, one limitation for approximation purposes is the inequality Var N_t ≥ E N_t, which need not hold for all arrival streams. Some main queueing references using the MAP are Ramaswami [722], Sengupta [794], Lucantoni [612], Lucantoni et al. [612], Neuts [662] and Asmussen & Perry [95]. For recent applications in risk theory, see e.g. Badescu, Drekic & Landriault [118] and Cheung & Landriault [239].
6 Risk theory in a periodic environment

6a The model
We assume as in the previous part of the chapter that the arrival mechanism has a certain time-inhomogeneity, but now exhibiting (deterministic) periodic fluctuations rather than (random) Markovian ones. Without loss of generality, let the period be 1; for s ∈ E = [0, 1), we talk of s as the 'time of the year'. The basic assumptions are as follows:

• The arrival intensity at time t of the year is β(t) for a certain function β(t), 0 ≤ t < 1;
• Claims arriving at time t of the year have distribution B^(t);
• The premium rate at time t of the year is p(t).

By periodic extension, we may assume that the functions β(t), p(t) and B^(t) are defined also for t ∉ [0, 1). Obviously, one needs to assume also (as a minimum) that they are measurable in t; from an application point of view, continuity would hold in presumably all reasonable examples. We denote throughout the initial season by s and by P^(s) the corresponding governing probability measure for the risk process. Thus at time t the premium rate is p(s+t), a claim arrives with rate β(s+t) and is distributed according to B^(s+t). Let

β* = ∫_0^1 β(t) dt,   B* = ∫_0^1 B^(t) (β(t)/β*) dt,   p* = ∫_0^1 p(t) dt.   (6.1)
Then the average arrival rate is β* and the safety loading η is η = (p* − ρ)/ρ, where

ρ = ∫_0^1 β(v) dv ∫_0^∞ x B^(v)(dx) = β* μ_B*.   (6.2)

Note that ρ is the average net claim amount per unit time and μ_B* = ρ/β* the average mean claim size. In a similar manner as in Proposition 1.8, one may think of the standard compound Poisson model with parameters β*, B*, p* as an averaged version of the periodic model, or, equivalently, of the periodic model as arising from the compound Poisson model by adding some extra variability. Many of the results given below indicate that the averaged and the periodic model share a number of main features. In particular, it turns out that they have the same adjustment coefficient. In contrast, for the Markov-modulated model the adjustment coefficient is typically larger than for the averaged model (cf. Section 4b), in agreement with the general principle that added variation increases the risk (cf. the discussion in IV.9). The behavior of the periodic model need not be seen as a violation of this principle, since the added variation is deterministic, not random.

Example 6.1 As an example to be used for numerical illustration throughout this section, let β(t) = 3λ(1 + sin 2πt), p(t) = λ, and let B^(t) be a mixture of two exponential distributions with intensities 3 and 7 and weights w(t) = (1 + cos 2πt)/2 and 1 − w(t), respectively. It is easily seen that β* = 3λ, p* = λ, whereas B* is a mixture of exponential distributions with intensities 3 and 7 and weights 1/2 each (1/2 = ∫_0^1 w(t) dt = ∫_0^1 (1 − w(t)) dt). Thus, the averaged compound Poisson model is the same as in IV.(3.1) and Example 1.10, and we recall from there that the ruin probability is

ψ*(u) = (24/35) e^{−u} + (1/35) e^{−6u}.   (6.3)

Note that λ enters just as a scaling factor of the time axis, and thus the averaged standard compound Poisson models have the same risk for all λ. In contrast, we shall see that for the periodic model increasing λ increases the effect of the periodic fluctuations. □

Remark 6.2 Define
θ(T) = ∫_0^T p(t) dt,   S̃_t = S_{θ^{−1}(t)}.

Then (by standard operational time arguments) {S̃_t} is a periodic risk process with unit premium rate and the same infinite horizon ruin probabilities. We assume in the rest of this section that p(t) ≡ 1. □
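The averaged quantities in Example 6.1 can be reproduced numerically. The following is a small illustrative sketch (stdlib Python, midpoint quadrature; λ = 1 is an assumption for the check): it recovers β* = 3λ, the 1/2–1/2 mixture weights of B*, and the consistency ψ*(0) = ρ/p* with (6.3).

```python
import math

# Numerical check of Example 6.1 (illustrative, lam = 1): averaged arrival rate
# beta*, the weight of the Exp(3) component in B*, and psi*(0) = rho/p*.
lam = 1.0
n = 20000
dt = 1.0 / n
beta = lambda t: 3 * lam * (1 + math.sin(2 * math.pi * t))
w = lambda t: (1 + math.cos(2 * math.pi * t)) / 2      # weight of the Exp(3) part

beta_star = sum(beta((k + 0.5) * dt) for k in range(n)) * dt
# weight of Exp(3) in B* = int_0^1 B^(t) * beta(t)/beta* dt
w_star = sum(w((k + 0.5) * dt) * beta((k + 0.5) * dt) for k in range(n)) * dt / beta_star

rho = beta_star * (w_star / 3 + (1 - w_star) / 7)      # rho = beta* * mu_B*
psi_star_0 = 24 / 35 + 1 / 35                          # (6.3) evaluated at u = 0
```

With p* = λ, the classical compound Poisson relation ψ*(0) = ρ/p* = 5/7 matches (6.3) at u = 0.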
The arrival process {N_t}_{t≥0} is a time-inhomogeneous Poisson process with intensity function {β(s+t)}_{t≥0}. The claim surplus process {S_t}_{t≥0} is defined in the obvious way as S_t = Σ_{i=1}^{N_t} U_i − t. Thus, the conditional distribution of U_i given that the ith claim occurs at time t is B^(s+t). As usual, τ(u) = inf{t > 0 : S_t > u} is the time to ruin, and the ruin probabilities are

ψ^(s)(u) = P^(s)(τ(u) < ∞),   ψ^(s)(u, T) = P^(s)(τ(u) ≤ T).

The claim surplus process {S_t} may be seen as a Markov additive process, with the underlying Markov process {J_t} being deterministic periodic motion on E = [0, 1), i.e.

J_t = (s + t) mod 1   P^(s)-a.s.   (6.4)

At first sight this point of view may appear quite artificial, but it turns out to have obvious benefits in terms of guiding the analysis of the model as a parallel of the analysis for the Markovian environment risk process.

Notes and references The model has been studied in risk theory by, e.g., Daykin et al. [279], Dassios & Embrechts [273] and Asmussen & Rolski [97], [98] (the literature in the mathematically equivalent setting of queueing theory is somewhat more extensive, see the Notes to Section 7). The exposition of the present chapter is basically an extract from [98], with some variants in the proofs. Recently, Kötter & Bäuerle [558] addressed the stochastic optimization problem of minimizing the ruin probability through investment in the framework of a periodic environment, see also Chapter XIV.
6b Lundberg conjugation
Motivated by the discussion in Chapter III.4 (see in particular Remark III.4.8), we start by deriving formulas giving the m.g.f. of the claim surplus process. To this end, let

κ*(α) = β*(B̂*[α] − 1) − α = ∫_s^{s+1} β(v)(B̂^(v)[α] − 1) dv − α

be the c.g.f. of the averaged compound Poisson model (the last expression is independent of s by periodicity), and define

h(s; α) = exp{ −∫_0^s [β(v)(B̂^(v)[α] − 1) − α − κ*(α)] dv };

then h(·; α) is periodic on ℝ.
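These integral definitions are easy to evaluate numerically. The sketch below (an illustration, not part of the text; it assumes the data of Example 6.1 with λ = 1) computes κ*(α) and h(s; α) by midpoint quadrature and checks the periodicity h(1; α) = h(0; α) = 1.

```python
import math

# kappa*(alpha) and h(s; alpha) for Example 6.1 (lam = 1), from their integral
# definitions; h(.; alpha) is periodic, so h(1; alpha) must equal h(0; alpha) = 1.
lam = 1.0
beta = lambda v: 3 * lam * (1 + math.sin(2 * math.pi * v))
w = lambda v: (1 + math.cos(2 * math.pi * v)) / 2
Bhat = lambda v, a: w(v) * 3 / (3 - a) + (1 - w(v)) * 7 / (7 - a)  # m.g.f. of B^(v), a < 3

def integrate(f, a, b, n=20000):                   # composite midpoint rule
    dt = (b - a) / n
    return sum(f(a + (k + 0.5) * dt) for k in range(n)) * dt

def kappa_star(a):
    return integrate(lambda v: beta(v) * (Bhat(v, a) - 1), 0.0, 1.0) - a

def h(s, a):
    ks = kappa_star(a)
    return math.exp(-integrate(lambda v: beta(v) * (Bhat(v, a) - 1) - a - ks, 0.0, s))
```

For this example κ*(1) = 0, i.e. the adjustment coefficient of the averaged model is γ = 1, consistent with the e^{−u} term in (6.3).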
Theorem 6.3 E^(s) e^{αS_t} = (h(s; α)/h(s+t; α)) e^{tκ*(α)}.

Proof. Conditioning upon whether a claim occurs in [t, t+dt] or not, we obtain

E^(s)[e^{αS_{t+dt}} | F_t]
  = (1 − β(s+t)dt) e^{αS_t − α dt} + β(s+t)dt · e^{αS_t} B̂^(s+t)[α]
  = e^{αS_t} · (1 − α dt + β(s+t)dt (B̂^(s+t)[α] − 1)),

E^(s) e^{αS_{t+dt}} = E^(s) e^{αS_t} (1 − α dt + β(s+t)dt (B̂^(s+t)[α] − 1)),

(d/dt) E^(s) e^{αS_t} = E^(s) e^{αS_t} (−α + β(s+t)(B̂^(s+t)[α] − 1)),

(d/dt) log E^(s) e^{αS_t} = −α + β(s+t)(B̂^(s+t)[α] − 1),

log E^(s) e^{αS_t} = −αt + ∫_0^t β(s+v)(B̂^(s+v)[α] − 1) dv = log h̃(s+t; α) − log h̃(s; α),

where

h̃(t; α) = exp{ ∫_0^t β(v)(B̂^(v)[α] − 1) dv − αt } = e^{tκ*(α)}/h(t; α).

Thus

E^(s) e^{αS_t} = h̃(s+t; α)/h̃(s; α) = (h(s; α)/h(s+t; α)) e^{tκ*(α)}.   (6.5) □
Corollary 6.4 For each θ such that the integrals in the definition of h(t; θ) exist and are finite,

{L_{θ,t}}_{t≥0} = { (h(s+t; θ)/h(s; θ)) e^{θS_t − tκ*(θ)} }_{t≥0}

is a P^(s)-martingale with mean one.

Proof. In the Markov additive sense of (6.4), we can write

L_{θ,t} = (h(J_t; θ)/h(J_0; θ)) e^{θS_t − tκ*(θ)}   P^(s)-a.s.,

so that obviously {L_{θ,t}} is a multiplicative functional for the Markov process {(J_t, S_t)}. According to Remark III.1.8, it then suffices to note that E^(s) L_{θ,t} = 1 by Theorem 6.3. □
Remark 6.5 The formula for h(s) = h(s; α), as well as the fact that κ = κ*(α) is the correct exponential growth rate of E e^{αS_t}, can be derived via Remark III.4.9 as follows. With G the generator of {X_t} = {(J_t, S_t)} and h_α(s, y) = e^{αy} h(s), the requirement is G h_α(s, 0) = κ h(s). However, as above,

E^(s) h_α(J_dt, S_dt) = h(s+dt) e^{−α dt}(1 − β(s)dt) + β(s)dt · B̂^(s)[α] h(s)
  = h(s) + dt{ −α h(s) − β(s)h(s) + h'(s) + β(s)B̂^(s)[α] h(s) },

G h_α(s, 0) = −α h(s) − β(s)h(s) + h'(s) + β(s)B̂^(s)[α] h(s).

Equating this to κ h(s) and dividing by h(s) yields

h'(s)/h(s) = α + β(s) − β(s)B̂^(s)[α] + κ,

h(s) = exp{ −∫_0^s [β(v)(B̂^(v)[α] − 1) − α − κ] dv }

(normalizing by h(0) = 1). That κ = κ*(α) then follows by noting that h(1) = h(0) by periodicity. □

For each θ satisfying the conditions of Corollary 6.4, it follows by Theorem III.1.7 that we can define a new Markov process {(J_t, S_t)} with governing probability measures P_θ^(s), say, such that for any s and T < ∞, the restrictions of P^(s) and P_θ^(s) to F_T are equivalent with likelihood ratio L_{θ,T}.

Proposition 6.6 The P_θ^(s), 0 ≤ s < 1, correspond to a new periodic risk model with parameters

β_θ(t) = β(t)B̂^(t)[θ],   B_θ^(t)(dx) = (e^{θx}/B̂^(t)[θ]) B^(t)(dx).

Proof. (i) Check that the m.g.f. of S_t is as for the asserted periodic risk model, cf. Theorem 6.3; (ii) use Markov-modulated approximations (Section 6c); (iii) use approximations with piecewise constant β(s), B^(s); (iv) finally, see [98] for a formal proof. □

Now define γ as the positive solution of the Lundberg equation for the averaged model; that is, γ solves κ*(γ) = 0. When α = γ, we put for short h(s) = h(s; γ). A further important constant is the value γ_0 (located in (0, γ)) at which κ*(α) attains its minimum. That is, γ_0 is determined by

0 = κ*'(γ_0) = β* B̂*'[γ_0] − 1.   (6.6)
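For the averaged model of Example 6.1 (λ = 1 assumed for concreteness), both γ and γ_0 can be found by elementary root-finding; a minimal sketch:

```python
# Lundberg constants for the averaged model of Example 6.1 (lam = 1):
# beta* = 3, B* a 50/50 mix of Exp(3) and Exp(7), so
# kappa*(a) = 3*(Bhat*[a] - 1) - a.  gamma solves kappa*(gamma) = 0 (here gamma = 1)
# and gamma0 in (0, gamma) solves kappa*'(gamma0) = 3*Bhat*'(gamma0) - 1 = 0.
Bhat_star = lambda a: 0.5 * 3 / (3 - a) + 0.5 * 7 / (7 - a)
dBhat_star = lambda a: 0.5 * 3 / (3 - a) ** 2 + 0.5 * 7 / (7 - a) ** 2
kappa_star = lambda a: 3 * (Bhat_star(a) - 1) - a

def bisect(f, lo, hi, tol=1e-12):
    # simple bisection; assumes f(lo) < 0 < f(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

gamma = bisect(kappa_star, 0.5, 2.0)                       # positive root of kappa*
gamma0 = bisect(lambda a: 3 * dBhat_star(a) - 1, 0.0, gamma)
```

Here γ = 1, matching the exponent of the leading term in (6.3), and γ_0 lies strictly inside (0, γ) with κ*(γ_0) < 0, as (6.6) requires.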
Lemma 6.7 When α ≥ γ_0, P_α^(s)(τ(u) < ∞) = 1 for all u ≥ 0.

Proof. According to (6.2), the average net claim amount per unit time is

ρ_α = ∫_0^1 β(v) dv ∫_0^∞ x e^{αx} B^(v)(dx) = β* ∫_0^∞ x e^{αx} B*(dx) = β* B̂*'[α] = κ*'(α) + 1,

which is ≥ 1 by convexity. □
The relevant likelihood ratio representation of the ruin probabilities now follows immediately from Corollary III.1.5. Here and in the following, ξ(u) = S_{τ(u)} − u is the overshoot and θ(u) = (τ(u) + s) mod 1 the season at the time of ruin.

Corollary 6.8 The ruin probabilities can be computed as

ψ^(s)(u, T) = h(s; α) e^{−αu} E_α^(s)[ e^{−αξ(u) + τ(u)κ*(α)} / h(θ(u); α) ; τ(u) ≤ T ],   (6.7)

ψ^(s)(u) = h(s; α) e^{−αu} E_α^(s)[ e^{−αξ(u) + τ(u)κ*(α)} / h(θ(u); α) ],   α ≥ γ_0,   (6.8)

ψ^(s)(u) = h(s) e^{−γu} E_γ^(s)[ e^{−γξ(u)} / h(θ(u)) ].   (6.9)
To obtain the Cramér–Lundberg approximation from Corollary 3.1, we need the following auxiliary result. The proof involves machinery from the ergodic theory of Markov chains on a general state space, which is not used elsewhere in the book, and we refer to [98].

Lemma 6.9 Assume that there exist open intervals I ⊆ [0, 1), J ⊆ ℝ_+ such that the B^(s), s ∈ I, have components with densities b^(s)(x) satisfying

inf_{s∈I, x∈J} β(s) b^(s)(x) > 0.   (6.10)

Then for each α, the Markov process {(ξ(u), θ(u))}_{u≥0}, considered with governing probability measures {P_α^(s)}_{s∈[0,1)}, has a unique stationary distribution, say the distribution of (ξ(∞), θ(∞)), and no matter what the initial season s is,

(ξ(u), θ(u)) →^D (ξ(∞), θ(∞)).

Letting u → ∞ in (6.9) and noting that weak convergence entails convergence of E f(ξ(u), θ(u)) for any bounded continuous function f (e.g. f(x, q) = e^{−γx}/h(q)), we get:
Theorem 6.10 Under the condition (6.10) of Lemma 6.9,

ψ^(s)(u) ∼ C h(s) e^{−γu},   u → ∞,   (6.11)

where C = E_γ[ e^{−γξ(∞)} / h(θ(∞)) ].

Note that (6.11) gives an interpretation of h(s) as a measure of how the risks of different initial seasons s vary. For our basic Example 6.1, elementary calculus yields

h(s) = exp{ λ( (1/2π) cos 2πs − (1/4π) sin 2πs + (1/16π) cos 4πs − 9/(16π) ) }.

Plots of h for different values of λ are given in Fig. VII.4, illustrating that the effect of seasonality increases with λ.
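The 'elementary calculus' behind the closed form of h(s) can be double-checked numerically: for λ = 1 we have κ*(γ) = 0 at γ = 1, so h(s) = h(s; 1) reduces to the exponential of −∫_0^s [β(v)(B̂^(v)[1] − 1) − 1] dv. A minimal sketch comparing the two (quadrature parameters are illustrative):

```python
import math

# Compare the closed form of h(s) (Example 6.1, lam = 1, gamma = 1) with its
# defining integral, using kappa*(1) = 0.
def integrand(v):
    # beta(v)*(Bhat^(v)[1] - 1) - 1, written out for this example
    b = 3 * (1 + math.sin(2 * math.pi * v))
    w = (1 + math.cos(2 * math.pi * v)) / 2
    return b * (w * 3 / 2 + (1 - w) * 7 / 6 - 1) - 1

def h_numeric(s, n=20000):
    dv = s / n
    return math.exp(-sum(integrand((k + 0.5) * dv) for k in range(n)) * dv)

def h_closed(s):
    p = math.pi
    return math.exp(math.cos(2 * p * s) / (2 * p) - math.sin(2 * p * s) / (4 * p)
                    + math.cos(4 * p * s) / (16 * p) - 9 / (16 * p))
```

The two agree to quadrature accuracy for any s, and both equal 1 at s = 0, consistent with the normalization h(0) = 1.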
[Figure VII.4: h(s) for different values of λ.]

In contrast to h, it does not seem within the range of our methods to compute C explicitly, which may provide one among many motivations for the Markov-modulated approximation procedure to be considered in Section 6c. Among other things, this provides an algorithm for computing C as a limit. At this stage, Theorem 6.10 shows that certainly γ is the correct Lundberg exponent.

Noting that ξ(u) ≥ 0 in (6.9), we obtain immediately the following version of Lundberg's inequality, which is a direct parallel of the result given in Corollary 3.6 for the Markov-modulated model:

Theorem 6.11 ψ^(s)(u) ≤ C_+^(0) h(s) e^{−γu}, where

C_+^(0) = 1 / inf_{0≤t≤1} h(t).
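As a numerical check of Theorem 6.11 for Example 6.1 with λ = 1 (grid size is an illustrative choice), the constant C_+^(0) = 1/inf_{0≤t≤1} h(t) can be evaluated directly from the closed form of h:

```python
import math

# C_+^(0) = 1 / inf_{0<=t<=1} h(t) for Example 6.1 with lam = 1; the text quotes
# the value 1.42.
def h(s):
    p = math.pi
    return math.exp(math.cos(2 * p * s) / (2 * p) - math.sin(2 * p * s) / (4 * p)
                    + math.cos(4 * p * s) / (16 * p) - 9 / (16 * p))

inf_h = min(h(k / 100000) for k in range(100000))
C_plus_0 = 1 / inf_h
```

A fine grid suffices here because h is smooth and 1-periodic; the result agrees with the value 1.42 quoted in the text to two decimals.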
Thus, e.g., in our basic example with λ = 1 we obtain C_+^(0) = 1.42, so that

ψ^(s)(u) ≤ 1.42 · exp{ (1/2π) cos 2πs − (1/4π) sin 2πs + (1/16π) cos 4πs − 9/(16π) } e^{−u}.   (6.12)

As for the Markovian environment model, Lundberg's inequality can be considerably sharpened and extended. We state the results below; the proofs are basically the same as in Section 3 and we refer to [98] for details. Consider first the time-dependent version of Lundberg's inequality. Just as in V.4, we substitute T = yu in ψ(u, T) and replace the Lundberg exponent γ by γ_y = α_y − yκ(α_y), where α_y is the unique solution of

κ'(α_y) = 1/y.   (6.13)

Elementary convexity arguments show that we always have γ_y > γ, and that α_y > γ, κ(α_y) > 0 when y < 1/κ'(γ), whereas α_y < γ, κ(α_y) < 0 when y > 1/κ'(γ).
Theorem 6.12 Let C_+^(0)(y) = 1 / inf_{0≤t≤1} h(t; α_y). Then

ψ^(s)(u, yu) ≤ C_+^(0)(y) h(s) e^{−γ_y u},   y < 1/κ'(γ),   (6.14)

ψ^(s)(u) − ψ^(s)(u, yu) ≤ C_+^(0)(y) h(s) e^{−γ_y u},   y > 1/κ'(γ).   (6.15)
The next result improves upon the constant C_+^(0) in front of e^{−γu} in Theorem 6.11, and also supplies a lower bound.

Theorem 6.13 Let

C_− = inf_{0≤t≤1} (1/h(t)) · inf_{x≥0} [ B̄^(t)(x) / ∫_x^∞ e^{γ(y−x)} B^(t)(dy) ],

C_+ = sup_{0≤t≤1} (1/h(t)) · sup_{x≥0} [ B̄^(t)(x) / ∫_x^∞ e^{γ(y−x)} B^(t)(dy) ].   (6.16)

Then for all s ∈ [0, 1) and all u ≥ 0,

C_− h(s) e^{−γu} ≤ ψ^(s)(u) ≤ C_+ h(s) e^{−γu}.   (6.17)
In order to apply Theorem 6.13 to our basic example, we first note that the function

∫_u^∞ {w·3e^{−3x} + (1−w)·7e^{−7x}} dx / ∫_u^∞ e^{x−u} {w·3e^{−3x} + (1−w)·7e^{−7x}} dx
  = (6w + 6(1−w)e^{−4u}) / (9w + 7(1−w)e^{−4u})

attains its minimum 2/3 for u = ∞ and its maximum 6/(7 + 2w) for u = 0. Thus

C_− = (2/3) inf_{0≤s≤1} exp{ −λ( (1/2π) cos 2πs − (1/4π) sin 2πs + (1/16π) cos 4πs − 9/(16π) ) } = (2/3) e^{−0.013λ},

C_+ = sup_{0≤s≤1} [ 6 exp{ −λ( (1/2π) cos 2πs − (1/4π) sin 2πs + (1/16π) cos 4πs − 9/(16π) ) } / (8 + cos 2πs) ].

Thus e.g. for λ = 1 (where (2/3) e^{−0.013λ} = 0.66 and C_+ = 1.20),

ψ^(s)(u) ≥ 0.66 · exp{ (1/2π) cos 2πs − (1/4π) sin 2πs + (1/16π) cos 4πs − 9/(16π) } e^{−u},

ψ^(s)(u) ≤ 1.20 · exp{ (1/2π) cos 2πs − (1/4π) sin 2πs + (1/16π) cos 4πs − 9/(16π) } e^{−u}.
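The constants 0.66 and 1.20 above can be reproduced by direct numerical optimization over the season s (grid size is an illustrative choice; f below denotes the exponent of h(s) with λ = 1):

```python
import math

# Numerical constants of Theorem 6.13 for Example 6.1, lam = 1:
# C_- = (2/3) * inf_s exp(-f(s)),  C_+ = sup_s 6*exp(-f(s)) / (8 + cos 2*pi*s),
# where f(s) is the exponent of h(s); the text quotes 0.66 and 1.20.
def f(s):
    p = math.pi
    return (math.cos(2 * p * s) / (2 * p) - math.sin(2 * p * s) / (4 * p)
            + math.cos(4 * p * s) / (16 * p) - 9 / (16 * p))

grid = [k / 100000 for k in range(100000)]
C_minus = (2 / 3) * min(math.exp(-f(s)) for s in grid)
C_plus = max(6 * math.exp(-f(s)) / (8 + math.cos(2 * math.pi * s)) for s in grid)
```

The denominator 8 + cos 2πs is 7 + 2w(s) with w(s) = (1 + cos 2πs)/2, coming from the maximum 6/(7 + 2w) of the ratio above at u = 0.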
Finally, we have the following result:

Theorem 6.14 Let C_+(γ_0) be as in (6.16) with γ replaced by γ_0 and h(t) by h(t; γ_0), and let δ = e^{κ*(γ_0)}. Then

0 ≤ ψ^(s)(u) − ψ^(s)(u, T) ≤ C_+(γ_0) h(s; γ_0) e^{−γ_0 u} δ^T.   (6.18)

Notes and references The material is from Asmussen & Rolski [98]. Some of the present proofs are more elementary in that they avoid the general point process machinery of [98], but they are thereby also slightly longer.
6c Markov-modulated approximations
A periodic risk model may be seen as a varying environment model, where the environment at time t is (s + t) mod 1 ∈ [0, 1), with s the initial season. Of course, such a deterministic periodic environment may be seen as a special case of a Markovian one (allowing a continuous state space E = [0, 1) for the environment), and in fact, much of the analysis of the preceding section is modelled after the techniques developed in the preceding sections for the case of a finite E. This observation motivates looking for a more formal connection between the periodic model and one evolving in a finite Markovian environment.

The idea is basically to approximate the (deterministic) continuous clock by a discrete (random) Markovian one with n 'months'. Thus, the nth Markovian environmental process {J_t} moves cyclically on {1, ..., n}, completing a cycle within one unit of time on average, so that the intensity matrix is Λ^(n) given by

Λ^(n) =
⎡ −n    n    0   ···   0 ⎤
⎢  0   −n    n   ···   0 ⎥
⎢  ·     ·    ·    ·    · ⎥
⎣  n    0    0   ···  −n ⎦   (6.19)

Arrivals occur at rate β_{ni} and their claim sizes are distributed according to B_{ni} if the governing Markov process is in state i. We want to choose the β_{ni} and B_{ni} in order to achieve good convergence to the periodic model. To this end, one simple choice is

β_{ni} = β((i−1)/n)   and   B_{ni} = B^((i−1)/n),   (6.20)
but others are also possible. We let {S_t^(n)}_{t≥0} be the claim surplus process of the nth approximating Markov-modulated model, M^(n) = sup_{t≥0} S_t^(n), and the ruin probability corresponding to initial state i of the environment is then

ψ_i^(n)(u) = P_i(M^(n) > u),   (6.21)

which serves as an approximation to ψ^(s)(u) whenever n is large and i/n ≈ s.

Notes and references See Rolski [745].
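The discretization (6.19)–(6.20) is straightforward to set up in code. A minimal sketch (using the periodic rate of Example 6.1 with λ = 1 as an illustrative choice):

```python
import math

# The cyclic intensity matrix Lambda^(n) of (6.19) and the discretized arrival
# rates beta_ni of (6.20), for beta(t) = 3*(1 + sin 2*pi*t) (Example 6.1, lam = 1).
def Lambda_n(n):
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = -float(n)
        L[i][(i + 1) % n] = float(n)      # jump to the next 'month' at rate n
    return L

def beta_n(n):
    # beta_ni = beta((i-1)/n), i = 1, ..., n
    return [3 * (1 + math.sin(2 * math.pi * (i - 1) / n)) for i in range(1, n + 1)]

L12 = Lambda_n(12)                        # a 12-'month' clock
```

Each row of Λ^(n) sums to zero, and the mean time to complete one cycle is n·(1/n) = 1, as required.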
7 Dual queueing models

The essence of the results of the present section is that the ruin probabilities ψ_i(u), ψ_i(u, T) can be expressed in a simple way in terms of the waiting time probabilities of a queueing system whose input is the time-reversed input of the risk process. This queue is commonly denoted the Markov-modulated M/G/1 queue and has received considerable attention in the last decades. Since the settings are thus equivalent from a mathematical point of view, it is desirable to have formulas permitting one to translate freely from one setting into the other.

Let β_i, B_i, Λ be the parameters defining the risk process in a random environment and consider a queueing system governed by a Markov process {J_t*} ('Markov-modulated') as follows:

• The intensity matrix for {J_t*} is the time-reversed intensity matrix Λ* = (λ*_ij)_{i,j∈E} of the risk process, λ*_ij = λ_ji π_j / π_i;
• The arrival intensity is β_i when J_t* = i;
• Customers arriving when J_t* = i have service time distribution B_i;
• The queueing discipline is FIFO.
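The time reversal λ*_ij = λ_ji π_j / π_i in the first bullet can be illustrated on a small hypothetical example (a three-state environment cycling 0 → 1 → 2 → 0 at rate 1, chosen for illustration — its stationary distribution is uniform, and reversal runs the cycle backwards):

```python
# Time-reversed intensity matrix lambda*_ij = lambda_ji * pi_j / pi_i for a
# hypothetical three-state cyclic environment 0 -> 1 -> 2 -> 0 at rate 1.
Lam = [[-1.0, 1.0, 0.0],
       [0.0, -1.0, 1.0],
       [1.0, 0.0, -1.0]]
pi = [1 / 3, 1 / 3, 1 / 3]            # stationary distribution: pi * Lam = 0
n = len(Lam)

Lam_star = [[Lam[j][i] * pi[j] / pi[i] for j in range(n)] for i in range(n)]
```

As expected, Λ* is again an intensity matrix (rows sum to zero), π is also stationary for Λ*, and here Λ* describes the reversed cycle 0 → 2 → 1 → 0.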
The actual waiting time process {W_n}_{n=1,2,...} and the virtual waiting time (workload) process {V_t}_{t≥0} are defined exactly as for the renewal model in Chapter VI.

Proposition 7.1 Assume V_0 = 0. Then

P_i(τ(u) ≤ T, J_T = j) = (π_j/π_i) P_j(V_T > u, J_T* = i).   (7.1)

In particular,

ψ_i(u, T) = (1/π_i) P_π(V_T > u, J_T* = i) = P_π(V_T > u | J_T* = i),   (7.2)

ψ_i(u) = (1/π_i) P(V > u, J* = i) = P_π(V > u | J* = i),   (7.3)

where (V, J*) is the steady-state limit of (V_t, J_t*).
Proof. Consider stationary versions of {J_t}_{0≤t≤T}, {J_t*}_{0≤t≤T}. Then we may assume that J_t* = J_{T−t}, 0 ≤ t ≤ T, and that the risk process {R_t}_{0≤t≤T} is coupled to the virtual waiting time process {V_t}_{0≤t≤T} as in the basic duality lemma (Theorem III.2.1). The first conclusion of that result then states that the events {τ(u) ≤ T, J_0 = i, J_T = j} and {V_T > u, J_0* = j, J_T* = i} coincide. Taking probabilities and using stationarity yields

π_i P_i(τ(u) ≤ T, J_T = j) = π_j P_j(V_T > u, J_T* = i),

and (7.1) follows. For (7.2), just sum (7.1) over j, and for (7.3), let T → ∞ in (7.2) and use that lim P_j(V_T > u, J_T* = i) = P(V > u, J* = i) for all j. □

Now let I_n* denote the environment when customer n arrives and I* its steady-state limit.

Proposition 7.2 The relation between the steady-state distributions of the actual and the virtual waiting time is given by

P(W > u, I* = i) = (β_i/β*) P(V > u, J* = i),   (7.4)

where β* = Σ_{j∈E} π_j β_j. In particular,

ψ_i(u) = (β*/(π_i β_i)) P(W > u, I* = i).   (7.5)
Proof. Identifying the distribution of (W, I*) with the time-average, we have

(1/N) Σ_{n=1}^N I(W_n > u, I_n* = i) → P(W > u, I* = i) a.s.,   N → ∞.

However, if T is large, on average β*T customers arrive in [0, T], and of these, on average β_i T P(V > u, J* = i) see W > u, I* = i. Taking the ratio yields (7.4), and (7.5) follows from (7.4) and (7.3). □

Notes and references One of the earliest papers drawing attention to the Markov-modulated M/G/1 queue is Burman & Smith [210]. The first comprehensive solution of the waiting time problem is Regterschot & de Smit [729], a paper relying heavily on classical complex plane methods. A more probabilistic treatment was given by Asmussen [59], and further references (to which we add Prabhu & Zhu [714]) can be found therein. Proposition 7.1 is from Asmussen [58], with (7.3) improving somewhat upon (2.7) of that paper. The relation (7.4) can be found in Regterschot & de Smit [729]; a general formalism allowing this type of conclusion is 'conditional PASTA', see Regterschot & van Doorn [327].

In the setting of the periodic model of Section 6, the dual queueing model is a periodic M/G/1 queue with arrival rate β(−t) and service time distribution B^(−t) at time t of the year (assuming w.l.o.g. that β(t), B^(t) have been periodically extended to negative t). With {V_t} denoting the workload process of the periodic queue, ρ < 1 then ensures that V^(s) = lim_{N→∞} V_{N+s} exists in distribution, and one has

P^(s)(τ(u) ≤ T) = P^(−s−T)(V_T > u),   (7.6)
P^(−s−T)(τ(u) ≤ T) = P^(s)(V_T > u),   (7.7)
P^(1−s)(τ(u) < ∞) = P^(s)(V^(0) > u).   (7.8)

For treatments of periodic M/G/1 queues, see in particular Harrison & Lemoine [450], Lemoine [579, 580], and Rolski [745].
Chapter VIII

Level-dependent risk processes

1 Introduction
We assume as in Chapter IV that the claim arrival process {N_t} is Poisson with rate β, and that the claim sizes U_1, U_2, ... are i.i.d. with common distribution B and independent of {N_t}. Thus, the aggregate claims in [0, t] are

A_t = Σ_{i=1}^{N_t} U_i   (1.1)

(other terms are accumulated claims or total claims). However, the increase of the surplus process R_t in between the claim payments now does not have to be linear with constant slope, but can depend on the current surplus level. This can always be interpreted as a modified premium rate p(r) charged at the current reserve R_t = r (but note that the actual reason for the level dependence of the increase may be quite different, see the examples below). Thus, in between jumps, {R_t} moves according to the differential equation Ṙ = p(R), and the evolution of the reserve may be described by the equation¹

R_t = u − A_t + ∫_0^t p(R_s) ds.   (1.2)

¹ Here it is assumed that p(r) is a deterministic function. Stochastic p(r) will be discussed in Sections 5 and 6.
As earlier,

ψ(u) = P( inf_{t≥0} R_t < 0 | R_0 = u ),   ψ(u, T) = P( inf_{0≤t≤T} R_t < 0 | R_0 = u )

denote the ruin probabilities with initial reserve u and infinite, resp. finite, horizon, and τ(u) = inf{t > 0 : R_t < 0} is the time to ruin starting from R_0 = u, so that ψ(u) = P(τ(u) < ∞), ψ(u, T) = P(τ(u) ≤ T). The following examples provide some main motivation for studying the model:

Example 1.1 Assume that the company reduces the premium rate from p_1 to p_2 when the reserve comes above some critical value v. That is, p_1 > p_2 and

p(r) = p_1 for r ≤ v,   p(r) = p_2 for r > v.   (1.3)

One reason could be competition, where one would try to attract new customers as soon as the business has become reasonably safe. Another could be the payout of dividends: here the premium paid by the policy holders is the same for all r, but when the reserve comes above v, dividends are paid out at rate p_1 − p_2. Possibilities for more general level-dependent premium (dividend payment) schemes than the two-step rule above are obvious. □

Example 1.2 (interest) If the company charges a constant premium rate p but invests its money at interest rate i, we get p(r) = p + ir. □

Example 1.3 (absolute ruin) Consider the same situation as in Example 1.2, but assume now that the company borrows the deficit in the bank when the reserve goes negative, say at interest rate i_0. Thus at deficit x > 0 (meaning R_t = −x), the payout rate of interest is i_0 x, and absolute ruin occurs when this exceeds the premium inflow p, i.e. when x > p/i_0, rather than when the reserve itself becomes negative. In this situation, we can put R̃_t = R_t + p/i_0,

p̃(r) = p + i(r − p/i_0) for r > p/i_0,   p̃(r) = p − i_0(p/i_0 − r) for 0 ≤ r ≤ p/i_0.

Then the ruin problem for {R̃_t} is of the type defined above, and the probability of absolute ruin with initial reserve u ∈ [−p/i_0, ∞) is given by ψ̃(u + p/i_0). □

Example 1.4 (tax) If the insurance company makes a profit, it will have to pay tax. One way to model this is to assume that whenever the risk process R_t is at a running maximum, a certain proportion ϑ of the premium income is paid
to the tax authority (such a model is related to the so-called loss-carried-forward scheme). The resulting premium rule is p(r) = ϑp at running maxima and p(r) = p otherwise. Due to the non-Markovian character, the analysis for this model is somewhat different from that of the above examples, see Section 4. □

Now let us return to the general Markovian model.

Proposition 1.5 Either ψ(u) = 1 for all u, or ψ(u) < 1 for all u.

Proof. Obviously ψ(u) ≤ ψ(v) when u ≥ v. Assume ψ(u) < 1 for some u. If R_0 = v < u, there is positive probability, say ε, that {R_t} will reach level u before the first claim arrives. Hence in terms of survival probabilities, 1 − ψ(v) ≥ ε(1 − ψ(u)) > 0, so that ψ(v) < 1. □

A basic question is thus which premium rules p(r) ensure that ψ(u) < 1. No tractable necessary and sufficient condition is known in complete generality of the model. However, it seems reasonable to assume monotonicity (p(r) is decreasing in Example 1.1 and increasing in Example 1.2) for r sufficiently large, so that p(∞) = lim_{r→∞} p(r) exists. This is basically covered by the following result (but note that the case p(r) ↓ βμ_B requires a more detailed analysis and that μ_B < ∞ is not always necessary for ψ(u) < 1 when p(r) → ∞, cf. [APQ, pp. 388–389]):

Theorem 1.6 (a) If p(r) ≤ βμ_B for all sufficiently large r, then ψ(u) = 1 for all u; (b) if p(r) > βμ_B + ε for all sufficiently large r and some ε > 0, then ψ(u) < 1 for all u, and P(R_t → ∞) > 0.

Proof. This follows by a simple comparison with the compound Poisson model. Let ψ_p(u) refer to the compound Poisson model with the same β, B and (constant) premium rate p. In case (a), choose u_0 such that p(r) ≤ p = βμ_B for r ≥ u_0. Starting from R_0 = u_0, the probability that R_t ≤ u_0 for some t is at least ψ_p(0) = 1 (cf. Proposition IV.1.2(d)), hence R_t ≤ u_0 also for a whole sequence of t's converging to ∞. However, obviously inf_{u≤u_0} ψ(u) > 0, and hence by a geometric trials argument ψ(u_0) = 1, so that ψ(u) = 1 for all u by Proposition 1.5.
In case (b), choose u_0 such that p(r) ≥ p = βμ_B + ε for r ≥ u_0. Then if u ≥ u_0, we have ψ(u) ≤ ψ_p(u − u_0) and, appealing to Proposition IV.1.2 once more, ψ_p(u − u_0) < 1. Hence ψ(u) < 1 for all u by Proposition 1.5. □

We next recall the following result, which was proved in III.3. Here {V_t}_{t≥0} is a storage process which has reflection at zero and initial condition V_0 = 0.
In between jumps, {V_t} decreases at rate p(v) when V_t = v (i.e., V̇ = −p(V)). That is, instead of (1.2) we have

V_t = A_t − ∫_0^t p(V_s) ds,   (1.4)

and we use the convention p(0) = 0 to make zero a reflecting barrier (when hitting 0, {V_t} remains at 0 until the next arrival).

Theorem 1.7 For any T < ∞, one can couple the risk process and the storage process on [0, T] in such a way that the events {τ(u) ≤ T} and {V_T > u} coincide. In particular,

ψ(u, T) = P(V_T > u),   (1.5)

and the process {V_t} has a proper limit in distribution, say V, if and only if ψ(u) < 1 for all u. Then

ψ(u) = P(V > u).   (1.6)

In order to make Theorem 1.7 applicable, we thus need to look more into the stationary distribution G, say, of the storage process {V_t}. It is intuitively obvious and not too hard to prove that G is a mixture of two components, one having an atom at 0 of size g_0, say, and the other being given by a density g(x) on (0, ∞). It follows in particular that

ψ(u) = ∫_u^∞ g(y) dy.   (1.7)
Proposition 1.8

p(x)g(x) = g_0 β B̄(x) + β ∫_0^x B̄(x − y) g(y) dy.   (1.8)

Proof. In stationarity, the flow of mass from [0, x] to (x, ∞) must be the same as the flow the other way. In view of the path structure of {V_t}, this means that the rate of upcrossings of level x must be the same as the rate of downcrossings. Now obviously, the l.h.s. of (1.8) is the rate of downcrossings (the event of an arrival in [t, t+dt] can be neglected, so that a path of {V_t} corresponds to a downcrossing in [t, t+dt] if and only if V_t ∈ [x, x + p(x)dt]). An attempt at an upcrossing occurs as a result of an arrival, say when {V_t} is in state y, and is successful if the jump size is larger than x − y. Considering the cases y = 0 and 0 < y ≤ x separately, we arrive at the desired interpretation of the r.h.s. of (1.8) as the rate of upcrossings. □
Define

ω(x) = ∫_0^x (1/p(t)) dt.

Then ω(x) is the time it takes for the reserve to reach level x provided it starts with R_0 = 0 and no claims arrive. Note that it may happen that ω(x) = ∞ for all x > 0, say if p(r) goes to 0 at rate r or faster as r ↓ 0.

Corollary 1.9 Assume that B is exponential with rate δ, B̄(x) = e^{−δx}, and that ω(x) < ∞ for all x > 0. Then the ruin probability is ψ(u) = ∫_u^∞ g(y) dy, where

g(x) = (g_0 β/p(x)) exp{βω(x) − δx}   and   1/g_0 = 1 + ∫_0^∞ (β/p(x)) exp{βω(x) − δx} dx.   (1.9)

Proof. We may rewrite (1.8) as

g(x) = (1/p(x)) { g_0 β e^{−δx} + β e^{−δx} ∫_0^x e^{δy} g(y) dy } = (β e^{−δx}/p(x)) κ(x),

where κ(x) = g_0 + ∫_0^x e^{δy} g(y) dy, so that

κ'(x) = e^{δx} g(x) = (β/p(x)) κ(x).

Thus

log κ(x) = log κ(0) + ∫_0^x (β/p(t)) dt = log κ(0) + βω(x),

κ(x) = κ(0) e^{βω(x)} = g_0 e^{βω(x)},

g(x) = e^{−δx} κ'(x) = e^{−δx} g_0 β ω'(x) e^{βω(x)},

which is the same as the expression in (1.9). That g_0 has the asserted value is a consequence of 1 = ‖G‖ = g_0 + ∫_0^∞ g(y) dy. □

Remark 1.10 The exponential case in Corollary 1.9 is the only one in which explicit formulas are known (or almost so; see further the notes to Section 2), and thus it becomes important to develop algorithms for computing the ruin probabilities. We next outline one possible approach, based upon the integral equation (1.8) (another is based upon numerical solution of a system of differential equations which can be derived under phase-type assumptions, see further IX.7).
CHAPTER VIII. LEVEL-DEPENDENT RISK PROCESSES

A Volterra integral equation has the general form
g(x) = h(x) + ∫_0^x K(x, y) g(y) dy,  (1.10)
where g(x) is an unknown function (x ≥ 0), h(x) is known and K(x, y) is a suitable kernel. Dividing (1.8) by p(x) and letting
K(x, y) = βB̄(x − y)/p(x),  h(x) = g_0 βB̄(x)/p(x),
we see that for fixed g_0, the function g(x) in (1.8) satisfies (1.10). For the purpose of explicit computation of g(x) (and thereby ψ(u)), the general theory of Volterra equations does not seem to lead beyond the exponential case already treated in Corollary 1.9. However, one might try instead a numerical solution. We consider the simplest possible approach, based upon the most basic numerical integration procedure, the trapezoidal rule
∫_{x_0}^{x_N} f(x) dx ≈ (h/2) [ f(x_0) + 2f(x_1) + 2f(x_2) + ··· + 2f(x_{N−1}) + f(x_N) ],
where x_k = x_0 + kh. Fixing h > 0, letting x_0 = 0 (i.e. x_k = kh) and writing g_k = g(x_k), K_{k,ℓ} = K(x_k, x_ℓ), this leads to (with h_N = h(x_N))
g_N = h_N + (h/2) { K_{N,0} g_0 + K_{N,N} g_N } + h { K_{N,1} g_1 + ··· + K_{N,N−1} g_{N−1} },
i.e.
g_N = [ h_N + (h/2) K_{N,0} g_0 + h { K_{N,1} g_1 + ··· + K_{N,N−1} g_{N−1} } ] / ( 1 − (h/2) K_{N,N} ).  (1.11)

In the case of (1.8), the unknown g_0 is involved. However, (1.11) is easily seen to be linear in g_0. One therefore first makes a trial solution g*(x) corresponding to g_0 = 1, i.e. h(x) = h*(x) = βB̄(x)/p(x), and computes ∫_0^∞ g*(x) dx numerically (by truncation and using the g_k*). Then g(x) = g_0 g*(x), and ‖G‖ = 1 then yields
1/g_0 = 1 + ∫_0^∞ g*(x) dx  (1.12)
from which g_0 and hence g(x) and ψ(u) can be computed. □
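The scheme (1.11)–(1.12) is straightforward to implement. The sketch below (all parameter values and function names are illustrative assumptions, not from the text) computes the trial solution g*(x) for exponential claims B̄(x) = e^{−δx} and the interest-type premium rule p(x) = p + ix of Section 2, and compares it with the closed-form solution of Corollary 1.9 taken with g_0 = 1:

```python
import math

# Illustrative (assumed) parameters: exponential claims with rate delta,
# premium rule p(x) = p0 + intr*x as in Section 2.
beta, delta, p0, intr = 1.0, 2.0, 1.0, 0.5
p = lambda x: p0 + intr * x
B_bar = lambda x: math.exp(-delta * x)           # claim size tail

def volterra_trial(h, N):
    """Trapezoidal scheme (1.11) for the trial solution g*(x) with g0 = 1."""
    K = lambda x, y: beta * B_bar(x - y) / p(x)  # kernel of (1.10)
    hf = lambda x: beta * B_bar(x) / p(x)        # inhomogeneity h*(x)
    g = [hf(0.0)]
    for n in range(1, N + 1):
        xn = n * h
        s = 0.5 * K(xn, 0.0) * g[0] + sum(K(xn, l * h) * g[l] for l in range(1, n))
        g.append((hf(xn) + h * s) / (1.0 - 0.5 * h * K(xn, xn)))
    return g

h, N = 0.02, 1000                                # grid 0, h, ..., 20
g_num = volterra_trial(h, N)

# Closed-form trial solution from Corollary 1.9 (g0 = 1), for comparison:
omega = lambda x: math.log(1.0 + intr * x / p0) / intr
g_ref = [beta / p(n * h) * math.exp(beta * omega(n * h) - delta * n * h)
         for n in range(N + 1)]

err = max(abs(a - b) for a, b in zip(g_num, g_ref))
print("max abs deviation from Corollary 1.9:", err)
```

Once g* is tabulated, g_0 follows from (1.12) by one further numerical integration, and ψ(u) = g_0 ∫_u^∞ g*(x) dx.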
Remark 1.11 Plugging (1.7) into (1.8), one obtains by partial integration and reordering
p(u)ψ′(u) − βψ(u) + β ∫_0^u ψ(u − y) dB(y) + βB̄(u) = 0,  (1.13)
where in the last term it was used that g_0 + ψ(0) = 1. It is also possible to derive (1.13) directly (without reference to storage processes) in the following way. For h > 0, consider the time interval (0, h) and condition on the time t and the amount y of the first claim in (0, h). Since the probability that there is no claim in (0, h) is e^{−βh} and the probability that the first claim occurs between time t and t + dt is e^{−βt}β dt, one obtains, using the Markov property of the process R_t,
ψ(u) = e^{−βh} ψ( u + ∫_0^h p(R_s) ds ) + ∫_0^h e^{−βt} β B̄( u + ∫_0^t p(R_s) ds ) dt
 + ∫_0^h e^{−βt} β ∫_0^{u + ∫_0^t p(R_s) ds} ψ( u + ∫_0^t p(R_s) ds − y ) dB(y) dt.
Since every other part of the above equation is differentiable w.r.t. h, ψ(u + ∫_0^h p(R_s) ds) has to be as well (by symmetry this also establishes the differentiability w.r.t. u). Taking the derivative w.r.t. h and subsequently setting h = 0 then gives (1.13). The formal framework for this approach is of course that of generators (cf. Chapter II).

For the particular case B̄(x) = e^{−δx}, one can multiply (1.13) by e^{δu} and differentiate the resulting equation w.r.t. u. In this way one obtains the second-order differential equation
p(u)ψ″(u) + (p′(u) + δp(u) − β)ψ′(u) = 0
with the boundary conditions p(0)ψ′(0) = β(ψ(0) − 1) and lim_{u→∞} ψ(u) = 0. This leads to
ψ(u) = [ β ∫_u^∞ (1/p(v)) e^{βω(v)−δv} dv ] / [ 1 + β ∫_0^∞ (1/p(v)) e^{βω(v)−δv} dv ],  (1.14)
which is again the result of Corollary 1.9.
□

1a Two-step premium functions
We now assume the premium function to be constant on two levels as in Example 1.1,
p(r) = p_1 for r ≤ v,  p_2 for r > v.  (1.15)
We may think of the risk reserve process R_t as pieced together from two risk reserve processes R_t^{(1)} and R_t^{(2)} with constant premiums p_1, p_2, such that R_t coincides with R_t^{(1)} below level v and with R_t^{(2)} above level v. If, as outlined
in Example 1.1, the reduced income above v is due to dividend payments, this model is usually referred to as the threshold dividend model.2 For an example of a sample path of such a refracted process, see Fig. VIII.1.
Figure VIII.1

Proposition 1.12 Let ψ^{(i)}(u) denote the ruin probability of {R_t^{(i)}}, define σ = inf{t ≥ 0 : R_t < v}, let π(u) be the probability of ruin between σ and the next upcrossing of v (including ruin possibly at σ), and let
φ_v(u) = (1 − ψ^{(1)}(u)) / (1 − ψ^{(1)}(v)),  0 ≤ u ≤ v.  (1.16)
Then
ψ(u) = 1 − φ_v(u) + φ_v(u)ψ(v),  0 ≤ u ≤ v,
ψ(u) = π(v) / (1 + π(v) − ψ^{(2)}(0)),  u = v,
ψ(u) = π(u) + ( ψ^{(2)}(u − v) − π(u) ) ψ(v),  v ≤ u < ∞.

Proof. Recall from Proposition II.2.6 that φ_v(u) = 1 − ψ_v(u) is the probability for {R_t^{(1)}} (and hence also for {R_t}) of upcrossing level v before ruin, given that the process starts at u ≤ v. Hence, for u < v the probability of ruin for {R_t} is the sum of the probability of being ruined before upcrossing v, 1 − φ_v(u), and the probability of ruin given that v is hit first, φ_v(u)ψ(v).

²The corresponding dividend payout scheme, namely to pay nothing when R_t < v and to pay out at rate p_1 − p_2 when R_t ≥ v, turns out to maximize the expected discounted sum of dividend payments until ruin under certain assumptions; see e.g. Gerber & Shiu [412].
Similarly, if u ≥ v then the probability of ruin is the sum of the probability of being ruined between σ and the next upcrossing of v, which is π(u), and the probability of ruin given that the process hits v again after σ before reaching (−∞, 0),
( P_u(σ < ∞) − π(u) ) ψ(v) = ( ψ^{(2)}(u − v) − π(u) ) ψ(v).
This yields the expression for u ≥ v, and the one for u = v then immediately follows. □

Example 1.13 Assume that B is exponential, B̄(x) = e^{−δx}. Then
ψ^{(1)}(u) = (β/(p_1 δ)) e^{−γ_1 u},  ψ^{(2)}(u) = (β/(p_2 δ)) e^{−γ_2 u},
where γ_i = δ − β/p_i, so that
φ_v(u) = ( 1 − (β/(p_1 δ)) e^{−γ_1 u} ) / ( 1 − (β/(p_1 δ)) e^{−γ_1 v} ).
Furthermore, for u ≥ v, P(σ < ∞) = ψ^{(2)}(u − v) and the conditional distribution of v − R_σ given σ < ∞ is exponential with rate δ. If v − R_σ < 0, ruin occurs at time σ. If v − R_σ = x ∈ [0, v], the probability of ruin before the next upcrossing of v is 1 − φ_v(v − x). Hence
π(u) = ψ^{(2)}(u − v) { e^{−δv} + ∫_0^v ( 1 − φ_v(v − x) ) δe^{−δx} dx }
 = (β/(p_2 δ)) e^{−γ_2(u−v)} { e^{−δv} + ∫_0^v [ 1 − ( 1 − (β/(p_1 δ)) e^{−γ_1(v−x)} ) / ( 1 − (β/(p_1 δ)) e^{−γ_1 v} ) ] δe^{−δx} dx }
 = (β/(p_2 δ)) e^{−γ_2(u−v)} ( 1 − β/(p_1 δ) ) e^{−γ_1 v} / ( 1 − (β/(p_1 δ)) e^{−γ_1 v} ). □

Also for general phase-type distributions, all quantities in Proposition 1.12 can be found explicitly; see IX.7.
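For numerical work, the quantities of Proposition 1.12 and Example 1.13 are easy to assemble. The following sketch (the parameter values and the function name `two_step_ruin` are my own illustrative choices) evaluates ψ(u) for the two-step rule (1.15) with exponential claims; when p_1 = p_2 it must collapse to the classical formula ψ(u) = (β/(pδ)) e^{−γu}, which serves as a consistency check:

```python
import math

def two_step_ruin(beta, delta, p1, p2, v):
    """Ruin probability for the two-step premium rule (1.15) with exponential
    claims, assembled from Proposition 1.12 and Example 1.13."""
    g1, g2 = delta - beta / p1, delta - beta / p2   # gamma_1, gamma_2
    a1, a2 = beta / (p1 * delta), beta / (p2 * delta)
    psi1 = lambda u: a1 * math.exp(-g1 * u)         # classical model, premium p1
    psi2 = lambda u: a2 * math.exp(-g2 * u)         # classical model, premium p2
    phi = lambda u: (1 - psi1(u)) / (1 - psi1(v))   # upcrossing prob. (1.16)
    pi = lambda u: (psi2(u - v) * (1 - a1) * math.exp(-g1 * v)
                    / (1 - a1 * math.exp(-g1 * v)))
    psi_v = pi(v) / (1 + pi(v) - psi2(0.0))         # psi(v) from Prop. 1.12
    def psi(u):
        if u <= v:
            return 1 - phi(u) + phi(u) * psi_v
        return pi(u) + (psi2(u - v) - pi(u)) * psi_v
    return psi

# Illustrative parameters; the net profit condition requires p_i > beta/delta.
psi = two_step_ruin(beta=1.0, delta=2.0, p1=1.0, p2=1.5, v=2.0)
print([round(psi(u), 6) for u in (0.0, 1.0, 2.0, 3.0)])
```

The continuity of ψ at u = v is automatic from the construction in Proposition 1.12 and can be verified numerically.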
1b Multi-step premium functions
In a similar manner one can investigate a more general model with a premium rule of the form
p(r) = p_1, 0 = v_0 ≤ r < v_1;  p_2, v_1 ≤ r < v_2;  …;  p_k, r ≥ v_{k−1}.  (1.17)
Assume that p_k > ρ. A similar approach as in Remark 1.11 then gives the piecewise integro-differential equation
p_i ψ′(u) − βψ(u) + β ∫_0^u ψ(u − z) dB(z) + βB̄(u) = 0  (1.18)
for v_{i−1} ≤ u < v_i and i = 1, …, k − 1. For the solution ψ(u) to be continuous, we require the boundary conditions (contact conditions)
lim_{u→v_i+} ψ(u) = lim_{u→v_i−} ψ(u),  (1.19)
and from η > 0 we have
lim_{u→∞} ψ(u) = 0.  (1.20)
Note that ψ(u) is not differentiable at the boundaries of the layers, as in view of (1.18) and the continuity of ψ the conditions (1.19) can be rewritten in the form
p_{i+1} (∂_+/∂u) ψ(u) |_{u=v_i} = p_i (∂_−/∂u) ψ(u) |_{u=v_i}.
For exponential claim distribution B̄(x) = e^{−δx}, the system (1.18) can be solved explicitly in the following way. Akin to the procedure in Remark 1.11, first transform (1.18) into
p_i ψ″(u) + (p_i δ − β) ψ′(u) = 0,  u ∈ [v_{i−1}, v_i).  (1.21)
Using the notation
ψ(u) = Σ_{i=1}^{k} I(v_{i−1} ≤ u < v_i) ψ^{[i]}(u),  (1.22)
where the function ψ^{[i]}(u) is the solution of (1.21) for u ∈ [v_{i−1}, v_i) for each i, we obtain
ψ^{[i]}(u) = A^{(i)} + C^{(i)} e^{−γ_i u},
where γ_i = δ − β/p_i is the (unique) positive solution of the Lundberg equation
β( B̂[α] − 1 ) − p_i α = 0.
It remains to establish the constants in the above representation: from (1.20), we have A^{(k)} = 0. From the continuity conditions (1.19) we immediately get
A^{(i+1)} + C^{(i+1)} e^{−γ_{i+1} v_i} − A^{(i)} − C^{(i)} e^{−γ_i v_i} = 0,  (i = 1, 2, …, k − 1).  (1.23)
Using (1.18) and comparing the coefficients of e^{−δu}, we further obtain after elementary algebra
−A^{(i+1)} − C^{(i+1)} δe^{−γ_{i+1} v_i}/(δ − γ_{i+1}) + A^{(i)} + C^{(i)} δe^{−γ_i v_i}/(δ − γ_i) = 0,  (i = 1, 2, …, k − 1),  (1.24)
together with A^{(1)} + C^{(1)} δ/(δ − γ_1) = 1. Adding (1.23) and (1.24) leads to
C^{(i+1)} = [ (δ − γ_{i+1})γ_i / ((δ − γ_i)γ_{i+1}) ] exp{ (γ_{i+1} − γ_i)v_i } C^{(i)}
 = [ (β − δp_i)/(β − δp_{i+1}) ] exp{ β(1/p_i − 1/p_{i+1})v_i } C^{(i)},
so that
C^{(i)} = [ (δ − γ_i)γ_1 / ((δ − γ_1)γ_i) ] exp{ Σ_{j=1}^{i−1} (γ_{j+1} − γ_j)v_j } C^{(1)}
 = [ (β − δp_1)/(β − δp_i) ] exp{ β Σ_{j=1}^{i−1} (1/p_j − 1/p_{j+1})v_j } C^{(1)}
for i = 1, …, k. Define
L_i = γ_1 e^{−γ_1 v_1} Σ_{j=1}^{i−1} (1/γ_j − 1/γ_{j+1}) exp{ Σ_{ℓ=2}^{j} (v_{ℓ−1} − v_ℓ)γ_ℓ }.
Then, again from (1.23), we have
A^{(i)} = A^{(1)} + C^{(1)} (δ/(δ − γ_1)) L_i = A^{(1)} + (1 − A^{(1)}) L_i,
and hence
A^{(1)} = −L_k/(1 − L_k)  and  C^{(1)} = ((δ − γ_1)/δ) · 1/(1 − L_k).  (1.25)
Altogether we thus arrive at the explicit formula (1.22) with
ψ^{[i]}(u) = (1/(1 − L_k)) [ L_i − L_k + ((δ − γ_i)γ_1/(δγ_i)) exp{ −γ_i u + Σ_{j=1}^{i−1} (γ_{j+1} − γ_j)v_j } ]
 = (1/(1 − L_k)) [ L_i − L_k + (γ_1/γ_i) exp{ Σ_{j=1}^{i−1} (γ_{j+1} − γ_j)v_j } ψ^{(i)}(u) ]
for i = 1, …, k, where ψ^{(i)}(u) again denotes the ruin probability in the classical model with constant premium intensity p_i(u) ≡ p_i. Note that for k = 2, the formula from Example 1.13 is retained.

Remark 1.14 The main tool in the above calculation was the reformulation of the integro-differential equations as ordinary differential equations, which allowed us to find the fundamental solution for each layer locally and separately, and subsequently to determine the coefficients through the continuity assumptions between the solutions in different layers (through a system of linear equations). This program can still be carried out for, say, Erlang(n) claim sizes, in which case the ODEs (with constant coefficients) that generalize (1.21) are of order n + 1. However, the solution of the resulting linear system of equations usually is highly involved and can only be evaluated numerically. □

Notes and references Some early references drawing attention to the model are Dawidson [277] and Segerdahl [790]. For the absolute ruin problem, see Gerber [396], Dassios & Embrechts [273] and, for a recent extension to finite-time horizons in a more general Lévy setup, Loeffen & Patie [605]. Equation (1.6) was derived by Harrison & Resnick [450] by a different approach, whereas (1.5) is from Asmussen & Schock Petersen [104]; see further the notes to III.3. The analytic derivation of (1.14) can be found in Tichy [851]. For some explicit solutions beyond Corollary 1.9, see the notes to Section 2. Remark 1.10 is based upon Schock Petersen [695]; for complexity and accuracy aspects, see the Notes to IX.7. Extensive discussion of the numerical solution of Volterra equations can be found in Baker [125]; see also Jagerman [499, 500].
An extension of Proposition 1.12, in which the switch from R_t^{(2)} to R_t^{(1)} only takes place if the risk process has gone below a threshold w < v first, but the switch from R_t^{(1)} to R_t^{(2)} still takes place at the upcrossing of v, is given in Bratiychuk & Derfla [196]; see also Frostig [377]. The special case p_2 = 0 in the premium rule (1.15) refers to the situation when all original premium income is paid out as dividends to shareholders whenever the surplus level is above v. If in addition it is specified that for initial capital u > v, the difference u − v is immediately paid out as a lump sum dividend payment, then the resulting strategy is known as the horizontal dividend barrier strategy. The ruin probability is
ψ(u) = 1 for the corresponding risk process (namely R_t reflected at v), and hence this case is not of interest for the main focus of this book. However, already back in 1957 de Finetti [283] suggested considering the expected discounted accumulated dividend payments until ruin in a portfolio as an (economically motivated) alternative to the ruin probability for measuring the value of a portfolio, and the identification of the optimal dividend strategy to maximize this quantity leads to challenging stochastic control problems. The horizontal barrier strategy often turns out to be optimal (see Gerber [394] for early results; the weakest currently known criteria on the risk process under which horizontal dividend barrier strategies are optimal among all admissible payout strategies are due to Loeffen [604], Kyprianou, Rivero & Song [568] and Loeffen & Renaud [606]). For duality considerations of the reflected risk processes with G/M/1 queues, see Löpker & Perry [609]. An analysis of exit problems for the threshold model (1.15) in a general Lévy setup (with particular emphasis on the spectrally negative case) is given in Kyprianou & Loeffen [565]. The threshold and multi-step premium rules are a somewhat popular alternative to horizontal dividend strategies that are still to some extent analytically tractable and lead to ruin probabilities smaller than 1. The multi-step rule was first studied by Kerekesha [530], who looked at (1.18) for arbitrary u > 0 using correction terms for the different premium intensities outside the respective layer, which he expressed through truncated Fourier transforms. The explicit solution for exponential claims given above is due to Albrecher & Hartinger [25]; see also Zhou [919] and Lin & Sendova [594], where (1.18) is derived by a renewal approach that directly implies the continuity conditions (1.19).
In [25, 594, 919] the analysis is also considerably extended to cover quantities like the time value of ruin, the deficit at ruin and the surplus prior to ruin. To improve upon the problem described in Remark 1.14, an alternative recursive approach that iteratively calculates the full solution for the same model with one layer less is developed in [25]. For extensions of these results to the renewal model see Yang & Zhang [904]. In the literature also other surplus-dependent risk processes have been discussed in connection with dividend payout schemes that lead to a positive probability of survival. Among them are time-dependent barrier strategies, for which the barrier itself is an increasing function of time; if the risk process touches the barrier, it stays at the barrier until the next claim occurs and the additional premium income is paid out as dividends. The ruin probability for the resulting risk process can then often be obtained as the solution of partial integro-differential equations, see Gerber [399], Siegl & Tichy [804, 805] and Albrecher & Tichy [41]. Albrecher, Hartinger & Thonhauser [26] analytically compare the performance of linear barrier strategies with the threshold strategies of (1.15). Albrecher & Kainhofer [30] and Albrecher, Kainhofer & Tichy [31] investigate risk processes with a nonlinear time-dependent barrier structure including constant interest on the surplus. See also Garrido [390] in a diffusion setting. For general surveys on dividend models in risk theory, see Avanzi [109] and Albrecher & Thonhauser [40].
2 The model with constant interest
In this section, we assume that p(x) = p + ix. This example is of particular relevance for applications because of the interpretation of i as an interest rate. However, it also turns out to have nice mathematical features. A basic tool is a representation of the ruin probability in terms of a discounted stochastic integral
Z = − ∫_0^∞ e^{−it} dS_t  (2.1)
w.r.t. the claim surplus process S_t = A_t − pt = Σ_{k=1}^{N_t} U_k − pt of the associated compound Poisson model without interest. Write R_t^{(u)} when R_0 = u. We first note that:

Proposition 2.1 R_t^{(u)} = e^{it}u + R_t^{(0)}.

Proof. The result is obvious if one thinks in economic terms and represents the reserve at time t as the initial reserve u with added interest plus the gains/deficit from the claims and incoming premiums. For a more formal mathematical proof, note that
dR_t^{(u)} = ( p + iR_t^{(u)} ) dt − dA_t,
d[ R_t^{(u)} − e^{it}u ] = ( p + i[ R_t^{(u)} − e^{it}u ] ) dt − dA_t.
Since R_0^{(u)} − e^{i·0}u = 0 for all u, R_t^{(u)} − e^{it}u must therefore be independent of u, which yields the result. □
Let
Z_t = e^{−it} R_t^{(0)} = e^{−it} ( ∫_0^t ( p + iR_s^{(0)} ) ds − A_t ).
Then
dZ_t = e^{−it} ( −i dt · ∫_0^t ( p + iR_s^{(0)} ) ds + ( p + iR_t^{(0)} ) dt + i dt · A_t − dA_t )
 = e^{−it} ( p dt − dA_t ) = −e^{−it} dS_t.
Thus
Z_v = − ∫_0^v e^{−it} dS_t,
where the last integral exists pathwise because {S_t} is of locally bounded variation.
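Proposition 2.1 is easy to check numerically in the claim-free case A_t ≡ 0, where the reserve solves r′ = p + ir with solution r_u(t) = e^{it}u + (p/i)(e^{it} − 1), i.e. r_u(t) = e^{it}u + r_0(t). A minimal sketch (step count and parameter values are illustrative assumptions):

```python
import math

p, intr = 1.0, 0.05          # premium rate and interest rate (assumed values)

def euler_reserve(u, T, steps=200_000):
    """Euler scheme for the claim-free reserve ODE r' = p + intr*r, r(0) = u."""
    r, dt = u, T / steps
    for _ in range(steps):
        r += (p + intr * r) * dt
    return r

u, T = 3.0, 10.0
closed = math.exp(intr * T) * u + (p / intr) * (math.exp(intr * T) - 1.0)
print(closed, euler_reserve(u, T))   # agree up to the discretization error
```

In particular, euler_reserve(u, T) − euler_reserve(0, T) ≈ e^{iT}u, which is exactly the decomposition R_t^{(u)} = e^{it}u + R_t^{(0)} of Proposition 2.1 (here without claims).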
Proposition 2.2 The r.v. Z in (2.1) is well-defined and finite, with distribution H(z) = P(Z ≤ z) given by the m.g.f.
Ĥ[α] = Ee^{αZ} = exp{ ∫_0^∞ κ( −αe^{−it} ) dt } = exp{ ∫_0^α κ(−y)/(iy) dy },
where κ(α) = β( B̂[α] − 1 ) − pα. Further Z_t → Z a.s. as t → ∞.

Proof. Let M_t = A_t − tβμ_B. Then S_t = M_t + t(βμ_B − p), and {M_v} is a martingale. From this it follows immediately that { ∫_0^v e^{−it} dM_t } is again a martingale. The mean is 0, and (since Var(dM_t) = βμ_B^{(2)} dt)
Var( ∫_0^v e^{−it} dM_t ) = ∫_0^v e^{−2it} βμ_B^{(2)} dt = ( βμ_B^{(2)}/(2i) ) ( 1 − e^{−2iv} ).
Hence the limit as v → ∞ exists by the convergence theorem for L_2-bounded martingales, and we have
Z_v = − ∫_0^v e^{−it} dS_t = − ∫_0^v e^{−it} ( dM_t + (βμ_B − p) dt )
 → − ∫_0^∞ e^{−it} ( dM_t + (βμ_B − p) dt ) = − ∫_0^∞ e^{−it} dS_t = Z  a.s.
Now if X_1, X_2, … are i.i.d. with c.g.f. φ and ρ < 1, we obtain the c.g.f. of Σ_{n=1}^∞ ρ^n X_n at α as
log E Π_{n=1}^∞ e^{αρ^n X_n} = log Π_{n=1}^∞ e^{φ(αρ^n)} = Σ_{n=1}^∞ φ(αρ^n).
Letting ρ = e^{−ih}, X_n = S_{nh} − S_{(n+1)h}, we have φ(α) = hκ(−α), and obtain the c.g.f. of Z = −∫_0^∞ e^{−it} dS_t as
lim_{h↓0} Σ_{n=1}^∞ φ(αρ^n) = lim_{h↓0} h Σ_{n=1}^∞ κ( −αe^{−inh} ) = ∫_0^∞ κ( −αe^{−it} ) dt;
the last expression for Ĥ[α] follows by the substitution y = αe^{−it}. □

Theorem 2.3
ψ(u) = H(−u) / E[ H(−R_{τ(u)}) | τ(u) < ∞ ].
Proof. Write τ = τ(u) for brevity. On {τ < ∞}, we have
u + Z = (u + Z_τ) + (Z − Z_τ) = e^{−iτ} [ e^{iτ}(u + Z_τ) − ∫_τ^∞ e^{−i(t−τ)} dS_t ] = e^{−iτ} [ R_τ^{(u)} + Z* ],
where Z* = −∫_τ^∞ e^{−i(t−τ)} dS_t is independent of F_τ and distributed as Z. The last equality followed from R_t^{(u)} = e^{it}(Z_t + u), cf. Proposition 2.1, which also yields τ < ∞ on {Z < −u}. Hence
H(−u) = P(u + Z < 0) = P( R_τ + Z* < 0; τ < ∞ )
 = ψ(u) E[ P( R_τ + Z* < 0 | F_τ ) | τ < ∞ ]
 = ψ(u) E[ H(−R_{τ(u)}) | τ(u) < ∞ ]. □
Corollary 2.4 Assume that B is exponential, B̄(x) = e^{−δx}, and that p(x) = p + ix with p > 0. Then
ψ(u) = βi^{β/i−1} Γ( δ(p + iu)/i; β/i ) / [ δ^{β/i} p^{β/i} e^{−δp/i} + βi^{β/i−1} Γ( δp/i; β/i ) ],  (2.2)
where Γ(x; η) = ∫_x^∞ t^{η−1} e^{−t} dt is the incomplete gamma function.

Proof 1. We use Corollary 1.9 and get
ω(x) = ∫_0^x 1/(p + it) dt = (1/i) log(p + ix) − (1/i) log p,
g(x) = ( g_0 β/(p + ix) ) exp{ (β/i) log(p + ix) − (β/i) log p − δx } = ( g_0 β/p^{β/i} ) (p + ix)^{β/i−1} e^{−δx},
1/g_0 = 1 + ∫_0^∞ ( β/p(x) ) exp{ βω(x) − δx } dx
 = 1 + ( β/p^{β/i} ) ∫_0^∞ (p + ix)^{β/i−1} e^{−δx} dx
 = 1 + ( β/(i p^{β/i}) ) ∫_p^∞ y^{β/i−1} e^{−δ(y−p)/i} dy
 = 1 + ( βi^{β/i−1} e^{δp/i} / ( δ^{β/i} p^{β/i} ) ) Γ( δp/i; β/i ),
ψ(u) = g_0 ∫_u^∞ ( β/p(x) ) exp{ βω(x) − δx } dx = g_0 ( βi^{β/i−1} e^{δp/i} / ( δ^{β/i} p^{β/i} ) ) Γ( δ(p + iu)/i; β/i ),
from which (2.2) follows by elementary algebra. □
Proof 2. We use Theorem 2.3. From κ(α) = βα/(δ − α) − pα, it follows that
log Ĥ[α] = ∫_0^α κ(−y)/(iy) dy = (1/i) ∫_0^α ( p − β/(δ + y) ) dy
 = (1/i) [ pα + β log δ − β log(δ + α) ] = log [ e^{pα/i} ( δ/(δ + α) )^{β/i} ],
which shows that Z is distributed as p/i − V, where V is Gamma(δ, β/i), i.e. with density
f_V(x) = x^{β/i−1} δ^{β/i} e^{−δx} / Γ(β/i),  x > 0.
In particular,
H(−u) = P(Z < −u) = P(V > u + p/i) = Γ( δ(p + iu)/i; β/i ) / Γ(β/i).
By the memoryless property of the exponential distribution, −R_{τ(u)} has an exponential distribution with rate δ, and hence
E[ H(−R_{τ(u)}) | τ(u) < ∞ ] = ∫_0^∞ δe^{−δx} P( p/i − V ≤ x ) dx
 = [ −e^{−δx} P( p/i − V ≤ x ) ]_0^∞ + ∫_0^{p/i} e^{−δx} f_V( p/i − x ) dx
 = P( V ≥ p/i ) + e^{−δp/i} ∫_0^{p/i} ( p/i − x )^{β/i−1} δ^{β/i} / Γ(β/i) dx
 = ( 1/Γ(β/i) ) { Γ( δp/i; β/i ) + ( p/i )^{β/i} δ^{β/i} e^{−δp/i} / (β/i) }.
From this (2.2) follows by elementary algebra. □
Proof 3. Just insert p(x) = p + ix into (1.14) and identify the resulting integral as the incomplete gamma function. □
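The three proofs can be cross-checked numerically: evaluating (2.2) via the incomplete gamma function must agree with direct quadrature of (1.14) for p(x) = p + ix. The sketch below (the parameter values and the simple trapezoidal quadratures are illustrative assumptions) does exactly this:

```python
import math

beta, delta, p0, intr = 1.0, 2.0, 1.0, 0.5   # assumed illustrative parameters

def upper_gamma(x, eta, n=20000, cut=40.0):
    """Γ(x; η) = ∫_x^∞ t^{η-1} e^{-t} dt by a truncated trapezoid rule."""
    h = cut / n
    f = lambda t: t ** (eta - 1.0) * math.exp(-t)
    return h * (0.5 * f(x) + sum(f(x + j * h) for j in range(1, n)) + 0.5 * f(x + cut))

def psi_gamma(u):
    """Ruin probability from formula (2.2)."""
    e = beta / intr
    c = beta * intr ** (e - 1.0)
    num = c * upper_gamma(delta * (p0 + intr * u) / intr, e)
    den = (delta ** e * p0 ** e * math.exp(-delta * p0 / intr)
           + c * upper_gamma(delta * p0 / intr, e))
    return num / den

def psi_ode(u, n=20000, cut=40.0):
    """Ruin probability from (1.14) with p(v) = p0 + intr*v."""
    omega = lambda v: math.log(1.0 + intr * v / p0) / intr
    f = lambda v: beta / (p0 + intr * v) * math.exp(beta * omega(v) - delta * v)
    h = cut / n
    quad = lambda a: h * (0.5 * f(a) + sum(f(a + j * h) for j in range(1, n))
                          + 0.5 * f(a + cut))
    return quad(u) / (1.0 + quad(0.0))

for u in (0.0, 1.0, 2.0):
    print(u, psi_gamma(u), psi_ode(u))
```

Both routines implement the same function, so any discrepancy beyond the quadrature error would signal a transcription mistake in (2.2).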
Example 2.5 The analysis leading to Theorem 2.3 is also valid if {R_t} is obtained by adding interest to a more general process with stationary independent increments. As an example, assume Brownian motion {B_t} with drift μ and variance σ²; then {R_t} is the diffusion with drift function μ + ix and constant variance σ². The process {S_t} corresponds to {−B_t}, so that κ(α) = σ²α²/2 − μα, and the c.g.f. of Z is
log Ĥ[α] = ∫_0^α κ(−y)/(iy) dy = (1/i) ∫_0^α ( σ²y/2 + μ ) dy = σ²α²/(4i) + μα/i.
I.e., Z is normal (μ/i, σ²/(2i)), and since R_τ = 0 by the continuity of Brownian motion, it follows that the ruin probability is
ψ(u) = H(−u)/H(0) = Φ( (−u − μ/i)/(σ/√(2i)) ) / Φ( (−μ/i)/(σ/√(2i)) ).  (2.3)
□

Notes and references Theorem 2.3 is from Harrison [449]; for a martingale proof, see e.g. Gerber [398, p. 134] (the time scale there is discrete, but the argument is easily adapted to the continuous case). Corollary 2.4 is classical. Formula (2.3) was derived by Emanuel et al. [342] and Harrison [449]; it is also used as a basis for a diffusion approximation by these authors. Paulsen & Gjessing [687] found some remarkable explicit formulas for ψ(u) beyond the exponential case in Corollary 1.9. The solution is in terms of Bessel functions for an Erlang(2) B and in terms of confluent hypergeometric functions for an H₂ B (a mixture of two exponentials). It must be noted, however, that the analysis does not seem to carry over to general phase-type distributions, not even Erlang(3) or H₃, nor to nonlinear premium rules p(·). Explicit formulas for the finite-time ruin probabilities ψ(u, T) for exponential claims in terms of finite gamma series were obtained in Albrecher, Teugels & Tichy [39] whenever β = ki for some integer k. See also Knessl & Peters [548] for a detailed asymptotic study of ψ(u, T) for i > 0 and exponential claims. Avram, Leonenko & Rabehasaina [110] extend the method of [39] to certain jump-diffusion models. A numerical algorithm for determining ψ(u, T) based on discrete-time Markov chains can be found in Cardoso & Waters [221, 222]. A r.v. of the form Σ_{n=1}^∞ ρ^n X_n with the X_n i.i.d. as in the proof of Proposition 2.2 is a special case of a perpetuity; see e.g. Goldie & Grübel [422] and Section 5.
Further studies of the model with interest can be found in Boogaert & Crijns [180], Gerber [396], Delbaen & Haezendonck [288], Emanuel et al. [342], Paulsen [680, 681, 682], Paulsen & Gjessing [687], Sundt & Teugels [822, 823], Yang [899], Cai & Dickson [215] and Rullière & Loisel [758]. Some of these references also treat stochastic interest rates.
3 The local adjustment coefficient. Logarithmic asymptotics

For the classical risk model with constant premium rule p(x) ≡ p*, write γ* for the solution of the Lundberg equation
β( B̂[γ*] − 1 ) − γ*p* = 0,  (3.1)
write ψ*(u) for the ruin probability etc., and recall Lundberg's inequality
ψ*(u) ≤ e^{−γ*u}  (3.2)
and the Cramér–Lundberg approximation
ψ*(u) ∼ C*e^{−γ*u}.  (3.3)
When trying to extend these results to the model of this chapter, where p(x) depends on x, a first step is the following:

Theorem 3.1 Assume that for some 0 < δ_0 ≤ ∞, it holds that B̂[s] ↑ ∞ as s ↑ δ_0, and that p(x) → ∞, x → ∞. Then lim sup_{u→∞} log ψ(u)/u ≤ −δ_0. If δ_0 < ∞ and e^{−εr}p(r) → 0, e^{(δ_0+ε)x}B̄(x) → ∞ for all ε > 0, then log ψ(u)/u → −δ_0, u → ∞.

In the proof as well as in the remaining part of the section, we will use the local adjustment coefficient γ(x), which for a fixed x is defined as the adjustment coefficient of the classical risk model with p* = p(x), i.e. as the solution of the equation
κ( x, γ(x) ) = 0,  where  κ(x, α) = β( B̂[α] − 1 ) − αp(x).  (3.4)
We assume existence of γ(x) for all x, as will hold under the steepness assumption of Theorem 3.1, and (for simplicity) that
inf_{x≥0} p(x) > βμ_B,  (3.5)
which implies inf_{x≥0} γ(x) > 0. The intuitive idea behind introducing local adjustment coefficients is that the classical risk model with premium rate p* = p(x) serves as a 'local approximation' at level x for the general model when the reserve is close to x.³

Proof of Theorem 3.1. The steepness assumption and p(x) → ∞ ensure γ(x) → δ_0. Let γ* < δ_0, let p* be as in (3.1) and, for a given ε > 0, choose u_0 such that p(x) ≥ p* when x ≥ u_0 ε. When u ≥ u_0, ψ(u) can obviously be bounded by the probability that the Cramér–Lundberg compound Poisson model with premium rate p* downcrosses level uε starting from u, which in turn by Lundberg's inequality can be bounded by e^{−γ*(1−ε)u}. Hence lim sup_{u→∞} log ψ(u)/u ≤ −γ*(1 − ε). Letting first ε → 0 and next γ* ↑ δ_0 yields the first statement of the theorem.

For the last assertion, choose c_ε^{(1)}, c_ε^{(2)} such that p(x) ≤ c_ε^{(1)}e^{εx} and B̄(x) ≥ c_ε^{(2)}e^{−(δ_0+ε)x} for all x. Then we have the following lower bound for the time for the reserve to go from level u to level u + v without a claim:
ω(u + v) − ω(u) = ∫_0^v 1/p(u + t) dt ≥ c_ε^{(3)} e^{−εu},
where c_ε^{(3)} = (1 − e^{−εv})/(εc_ε^{(1)}). Therefore the probability that a claim arrives before the reserve has reached level u + v is at least c_ε^{(4)}e^{−εu}. Given such an arrival, ruin will occur if the claim is at least u + v, and hence
ψ(u) ≥ c_ε^{(4)}e^{−εu} · c_ε^{(2)}e^{−(δ_0+ε)u}.
The truth of this for all ε > 0 implies lim inf_{u→∞} log ψ(u)/u ≥ −δ_0. □
Obviously, Theorem 3.1 only presents a first step, and in particular, the result is not very informative if δ_0 = ∞. The rest of this section deals with tail estimates involving the local adjustment coefficient. The first main result in this direction is the following version of Lundberg's inequality:

Theorem 3.2 Assume that p(x) is a nondecreasing function of x and let I(u) = ∫_0^u γ(x) dx. Then
ψ(u) ≤ e^{−I(u)}.  (3.6)

The second main result to be derived states that the bound in Theorem 3.2 is also an approximation under appropriate conditions. The form of the result is superficially similar to the Cramér–Lundberg approximation, noting that in many cases the constant C is close to 1. However, the limit is not u → ∞ but the slow Markov walk limit in large deviations theory (see e.g. Bucklew [207]).³ For ε > 0, let ψ_ε(u) be evaluated for the process {R_t^{(ε)}} defined as in (1.2), only with β replaced by β/ε and U_i by εU_i.

³Note that this was also the motivation behind the approach of Section 1b.

Theorem 3.3 Assume that either (a) p(r) is a nondecreasing function of r, or (b) Condition 3.13 below holds. Then
lim_{ε↓0} −ε log ψ_ε(u) = I(u).  (3.7)
Remarks:
1. Condition 3.13 is a technical condition on the claim size distribution B, which essentially says that the overshoot r.v. (U − x | U > x) cannot have a much heavier tail than the claim U itself.
2. If p(x) ≡ p is constant, then R_t^{(ε)} = εR_{t/ε} for all t, so that ψ_ε(u) = ψ(u/ε); i.e., the asymptotics u → ∞ and ε → 0 are the same.
3. The slow Markov walk limit is appropriate if p(x) does not vary too much compared to the given mean interarrival time 1/β and the size U of the claims; one can then assume that ε = 1 is small enough for Theorem 3.3 to be reasonably precise and use e^{−I(u)} as an approximation to ψ(u).
4. As typical in large deviations theory, the logarithmic form of (3.7) only captures 'the main term in the exponent', but is not precise enough to describe the asymptotic form of ψ(u) in terms of ratio limit theorems (the precise asymptotics could be log I(u)·e^{−I(u)} or I(u)^α e^{−I(u)}, say, rather than e^{−I(u)}).
3a
Examples
Before giving the proofs of Theorems 3.2, 3.3, we consider some simple examples. First, we show how to rewrite the explicit solution for ψ(u) in Corollary 1.9 in terms of I(u) when the claims are exponential: Example 3.4 Consider again the exponential case B(x) = e−δx as in Corollary 1.9. Then γ(x) = δ − β/p(x), and Z
Z
u
u
γ(x) dx = δu − β 0
0
p(x)−1 dx = δu − βω(u).
230
CHAPTER VIII. LEVELDEPENDENT RISK PROCESSES
Integrating by parts, we get Z ∞ © ª 1 β = 1+ exp βω(x) − δx dx γ0 p(x) Z0 ∞ © ª dβω(x) exp βω(x) − δx dx = 1+ dx 0 Z ∞ £ © ª¤∞ © ª exp βω(x) − δx dx = 1 + exp βω(x) − δx 0 + δ 0 Z ∞ −I(x) = 1+0−1 + δ e dx, 0
1 γ0
Z
Z
∞
g(x) dx
=
u
= =
∞
© ª β exp βω(x) − δx dx p(x) u Z ∞ £ © ª¤∞ © ª exp βω(x) − δx u + δ exp βω(x) − δx dx u Z ∞ © ª © ª δ exp βω(x) − δx dx − exp βω(u) − δu , u
and hence R∞ ψ(u) =
u
e−I(y) dy − e−I(u) /δ R∞ = e−I(u) e−I(y) dy 0
R∞ 0
Ry
e− 0 γ(x+u) dx dy − 1/δ . (3.8) R ∞ − R y γ(x)dx e 0 dy 0 2
We next give direct derivations of Theorems 3.2, 3.3 in the particularly simple case of diffusions: Example 3.5 Assume that {Rt } is a diffusion on [0, ∞) with drift µ(x) and variance σ 2 (x) > 0 at x. The appropriate definition of the local adjustment coefficient γ(x) is then as the one 2µ(x)/σ 2 (x) for the locally approximating Brownian motion. It is well known (see Theorem XIII.4.4 or Karlin & Taylor [522, pp. 191–195]) that R ∞ −I(y) R ∞ − R y γ(x+u)dx e dy e 0 dy . = e−I(u) 0R ∞ − R y γ(x)dx ψ(u) = Ru∞ −I(y) 0 e dy e dy 0 0
(3.9)
If γ(x) is increasing, applying the inequality γ(x + u) ≥ γ(x) yields immediately the conclusion of Theorem 3.2. For Theorem 3.3, note first that the appropriate
3. THE LOCAL ADJUSTMENT COEFFICIENT
231
slow Markov walk assumption amounts to µ² (x) = µ(x), σ²2 (x) = ²σ 2 (x) so that γ² (x) = γ(x)/², I² (u) = I(u)/², and (3.9) yields −² log ψ² (u) = I(u) + A² − B² , where
³Z
∞
A² = ² log
e
−
Ry 0
γ(x)dx/²
³Z
´ dy ,
B² = ² log
∞
e−
(3.10) Ry 0
γ(x+u)dx/²
´ dy .
0
0
The analogue of (3.5) is inf x≥0 γ(x) > 0 which implies that the integral in the definition of A² converges to 0. In particular, the integral is bounded by 1 eventually and hence lim sup A² ≤ lim sup ² log 1 = 0. Choosing y0 , γ0 > 0 such that γ(x) ≤ γ0 for y < y0 , we get Z ∞ R Z y0 y ² ² (1 − e−y0 γ0 /² ) ∼ . e− 0 γ(x)dx/² dy ≥ e−yγ0 /² dy = γ0 γ0 0 0 This implies lim inf A² ≥ lim ² log ² = 0 and A² → 0. Similarly, B² → 0, and (3.7) follows. 2 The analogue of Example 3.5 for risk processes with exponential claims is as follows: Example 3.6 Assume that B is exponential with rate δ. Then the solution of the Lundberg equation is γ ∗ = δ − β/p∗ so that Z u 1 dx. I(u) = δu − β p(x) 0 Note that this expression shows up also in the explicit formula for ψ(u) in the form given in Example 3.4. Ignoring 1/δ in the formula there, this leads to (3.6) exactly as in Example 3.5. Further, the slow Markov walk assumption means δ² = δ/², β² = β/². Thus γ² (x) = γ(x)/² and (3.10) holds if we redefine A² as ´ ³Z ∞ R y A² = ² log e− 0 γ(x)dx/² dy − ²/δ 0
and similarly for B² . As in Example 3.5, lim sup A² ≤ lim sup ² log(1 − 0) = 0 . ²→0
²→0
By (3.5) and γ ∗ = δ − β/p∗ , we have δ > γ0 and get ³ ³1 1 ´´ − ≥ 0. lim inf A² ≥ lim ² log ² γ0 δ Now (3.7) follows just as in Example 3.5.
2
232
CHAPTER VIII. LEVELDEPENDENT RISK PROCESSES
We next investigate what the upper bound (or approximation) e^{−I(u)} looks like in the case p(x) = p + ix (interest) subject to various forms of the tail B̄(x) of B. Of course, γ(x) is typically not explicit, so our approach is to determine standard functions G₁(u), …, G_q(u) representing the first few terms in the asymptotic expansion of I(u) as u → ∞. I.e.,

  G_i(u) → ∞,  G_{i+1}(u)/G_i(u) = o(1),  I(u) = G₁(u) + ··· + G_q(u) + o(G_q(u)).

It should be noted, however, that the interchange of the slow Markov walk limit ε → 0 and the limit u → ∞ is not formally justified; in fact, the slow Markov walk approximation deteriorates as x becomes large. Nevertheless, the results are suggestive in their form and much more explicit than anything else in the literature.

Example 3.7 Assume that

  B̄(x) ∼ c₁ x^{α−1} e^{−δx},  x → ∞,  (3.11)

with α > 0. This covers mixtures or convolutions of exponentials or, more generally, phase-type distributions (Example I.2.4) or gamma distributions; in the phase-type case, the typical case is α = 1, which holds, e.g., if the phase generator is irreducible (Proposition IX.1.8). It follows from (3.11) that B̂[s] → ∞ as s ↑ δ, and hence γ* ↑ δ as p* → ∞. More precisely,

  B̂[s] = 1 + s ∫₀^∞ e^{sx} B̄(x) dx = 1 + (c₁δΓ(α)/(δ − s)^α)(1 + o(1))

as s ↑ δ, and hence (3.1) leads to

  (δ − γ*)^α ≈ βc₁Γ(α)/p*,  γ* ≈ δ − c₂ p*^{−1/α},  c₂ = (βc₁Γ(α))^{1/α},

  I(u) ≈ δu − c₂ ∫₀^u dx/(p + ix)^{1/α} ≈ { δu − c₃ log u,  α = 1;  δu − c₄ u^{1−1/α},  α ≠ 1 },

where c₃ = c₂/i, c₄ = c₂ i^{−1/α}/(1 − 1/α). □
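The expansion γ* ≈ δ − c₂ p*^{−1/α} in Example 3.7 can be checked numerically. The sketch below uses assumed Gamma(α, δ) claims, for which B̂[s] = (δ/(δ − s))^α and B̄(x) ∼ (δ^{α−1}/Γ(α)) x^{α−1}e^{−δx}, i.e. c₁ = δ^{α−1}/Γ(α) and c₂ = (βδ^{α−1})^{1/α}, and solves the Lundberg equation (3.1) by bisection.

```python
import math

# Assumed illustration: Gamma(alpha, delta) claims with
# Bhat[s] = (delta/(delta - s))**alpha, so c2 = (beta*delta**(alpha-1))**(1/alpha).
beta, alpha, delta = 1.0, 2.0, 1.0

def lundberg_root(pstar, tol=1e-12):
    """Solve beta*(Bhat[g] - 1) = g*pstar for g in (0, delta) by bisection."""
    f = lambda g: beta * ((delta / (delta - g)) ** alpha - 1.0) - g * pstar
    lo, hi = 1e-12, delta - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c2 = (beta * delta ** (alpha - 1)) ** (1 / alpha)
pstar = 1e4
gap = delta - lundberg_root(pstar)      # delta - gamma*
approx = c2 * pstar ** (-1 / alpha)     # asymptotic expression c2 * pstar^(-1/alpha)
assert abs(gap / approx - 1.0) < 0.05   # agreement within 5% for large pstar
```

For p* = 10⁴ the exact gap δ − γ* and the asymptotic c₂p*^{−1/α} agree to within about half a percent, illustrating the p*^{−1/α} rate.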
3. THE LOCAL ADJUSTMENT COEFFICIENT

Example 3.8 Assume next that B has bounded support, say 1 is the upper limit, and

  B̄(x) ∼ c₅ (1 − x)^{η−1},  x ↑ 1,  (3.12)

with η ≥ 1. For example, η = 1 if B is degenerate at 1, η = 2 if B is uniform on (0, 1), and η = k + 1 if B is the convolution of k uniforms on (0, 1/k). Here B̂[s] is defined for all s, and

  B̂[s] − 1 = s ∫₀^1 e^{sx} B̄(x) dx = e^s ∫₀^s e^{−y} B̄(1 − y/s) dy
           ≈ (c₅ e^s/s^{η−1}) ∫₀^∞ e^{−y} y^{η−1} dy = c₅ e^s Γ(η)/s^{η−1}

as s ↑ ∞. Hence (3.1) leads to βc₅ e^{γ*} Γ(η) ∼ (γ*)^η p*, so that

  γ* ≈ log p* + η log log p*,  I(u) ≈ u(log u + η log log u). □
Example 3.9 As a case intermediate between (3.11) and (3.12), assume that

  B̄(x) ∼ c₆ e^{−x²/2c₇},  x → ∞.  (3.13)

We get

  B̂[s] − 1 ≈ c₆ s ∫₀^∞ e^{sx} e^{−x²/2c₇} dx = c₆ s e^{c₇s²/2} ∫₀^∞ e^{−(x−c₇s)²/2c₇} dx
           = c₆ s √(2πc₇) e^{c₇s²/2} Φ(s√c₇) ∼ c₆ s √(2πc₇) e^{c₇s²/2},

and (3.1) leads to

  (c₇/2)(γ*)² ∼ log p*,  γ* ∼ c₈ √(log p*),  I(u) ≈ c₈ u √(log u),

where c₈ = √(2/c₇). □

3b
Proof of Theorem 3.2
We first remark that the definition (3.4) of the local adjustment coefficient is not the only possible one: whereas the motivation for (3.4) is the formula

  (1/h) log E_u e^{s(R_h − u)} ∼ β(B̂[s] − 1) − s p(u),  h ↓ 0,  (3.14)

for the m.g.f. of the increment in a small time interval [0, h], one could also have considered the increment r_u(T₁) − u − U₁ up to the first claim (here r_u(·) denotes the solution of ṙ = p(r) starting from r_u(0) = u). This leads to an alternative local adjustment coefficient γ₀(u), defined as the solution of

  1 = E e^{γ₀(u)(U₁ + u − r_u(T₁))} = B̂[γ₀(u)] · ∫₀^∞ β e^{−βt} e^{γ₀(u)(u − r_u(t))} dt.  (3.15)
Proposition 3.10 Assume that p(x) is a nondecreasing function of x. Then:
(a) γ(x) and γ₀(x) are also nondecreasing functions of x;
(b) γ(x) ≤ γ₀(x).

Proof. That γ(x) is nondecreasing follows easily by inspection of (3.4). The assumption implies that r_u(t) − u is a nondecreasing function of u. Hence for u < v,

  1 = E e^{γ₀(u)(U₁+u−r_u(T₁))} ≥ E e^{γ₀(u)(U₁+v−r_v(T₁))}.

By convexity of the m.g.f. of U₁ + v − r_v(T₁), this is only possible if γ₀(v) ≥ γ₀(u). For (b), note that the assumption implies that r_u(t) − u ≥ t p(u). Hence

  1 = E e^{γ₀(u)(U₁+u−r_u(T₁))} ≤ E e^{γ₀(u)(U₁−p(u)T₁)} = B̂[γ₀(u)] · β/(β + γ₀(u)p(u)),

i.e. 0 ≤ β(B̂[γ₀(u)] − 1) − γ₀(u)p(u).
Since (3.4) considered as a function of γ is convex and equals 0 for γ = 0, this is only possible if γ₀(u) ≥ γ(u). □

We prove Theorem 3.2 in terms of γ₀; the case of γ then follows immediately by Proposition 3.10(b):

Theorem 3.11 Assume that p(x) is a nondecreasing function of x. Then

  ψ(u) ≤ e^{−∫₀^u γ₀(x) dx}.  (3.16)

Proof. Define ψ⁽ⁿ⁾(u) = P(τ(u) ≤ σ_n) as the ruin probability after at most n claims (σ_n = T₁ + ··· + T_n). We shall show by induction that

  ψ⁽ⁿ⁾(u) ≤ e^{−∫₀^u γ₀(x) dx},  (3.17)

from which the theorem follows by letting n → ∞. The case n = 0 is clear since here σ₀ = 0, so that ψ⁽⁰⁾(u) = 0. Assume (3.17) shown for n and let F_u(x) = P(U₁ + u − r_u(T₁) ≤ x). Separating according to whether ruin occurs at the first claim or not, we obtain

  ψ⁽ⁿ⁺¹⁾(u) ≤ 1 − F_u(u) + ∫_{−∞}^u ψ⁽ⁿ⁾(u−x) F_u(dx)
           ≤ ∫_u^∞ F_u(dx) + ∫_{−∞}^u e^{−∫₀^{u−x} γ₀(y)dy} F_u(dx)
           = e^{−∫₀^u γ₀(x)dx} { ∫_u^∞ e^{∫₀^u γ₀(y)dy} F_u(dx) + ∫_{−∞}^u e^{∫_{u−x}^u γ₀(y)dy} F_u(dx) }.
Considering the cases x ≥ 0 and x < 0 separately, it is easily seen that ∫_{u−x}^u γ₀(y)dy ≤ xγ₀(u). Also ∫₀^u γ₀(y)dy ≤ uγ₀(u) ≤ xγ₀(u) for x ≥ u. Hence

  ψ⁽ⁿ⁺¹⁾(u) ≤ e^{−∫₀^u γ₀(x)dx} { ∫_u^∞ e^{xγ₀(u)} F_u(dx) + ∫_{−∞}^u e^{xγ₀(u)} F_u(dx) }
           = e^{−∫₀^u γ₀(x)dx} F̂_u[γ₀(u)]
           = e^{−∫₀^u γ₀(x)dx},

where the last identity immediately follows from (3.15); we also used Proposition 3.10(a) for some of the inequalities. □

It follows from Proposition 3.10(b) that the bound provided by Theorem 3.11 is sharper than the one given by Theorem 3.2. However, γ₀(u) appears more difficult to evaluate than γ(u). Also, for either of Theorems 3.2, 3.11 to be reasonably tight, something like the slow Markov walk conditions in Theorem 3.3 is required, and then it is easily seen that γ₀(u) ≈ γ(u). For these reasons, we have chosen to work with γ(u) as the fundamental local adjustment coefficient.
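The upper bound of Theorems 3.2/3.11 can be sanity-checked by simulation. The sketch below (illustrative parameters; exponential claims and the nondecreasing premium rule p(x) = p + ix, for which I(u) = δu − (β/i) log(1 + iu/p) as in Example 3.6) estimates ψ(u) by Monte Carlo, counting paths that reach a high barrier as surviving (a truncation that only lowers the estimate), and verifies the bound e^{−I(u)}.

```python
import math, random

random.seed(1)
beta, delta, p, i = 1.0, 1.0, 2.0, 0.5   # assumed illustration parameters
u = 2.0

def ruined(u0, barrier=50.0):
    """One path of the risk process with p(x) = p + i*x and Exp(delta) claims.
    Paths reaching `barrier` are counted as surviving (truncation)."""
    r = u0
    while 0 <= r < barrier:
        t = random.expovariate(beta)                # time until the next claim
        r = (r + p / i) * math.exp(i * t) - p / i   # deterministic growth: r' = p + i*r
        r -= random.expovariate(delta)              # claim of size Exp(delta)
    return r < 0

n = 20000
psi_hat = sum(ruined(u) for _ in range(n)) / n
I_u = delta * u - (beta / i) * math.log(1 + i * u / p)
assert psi_hat <= math.exp(-I_u)   # upper bound of Theorems 3.2/3.11
```

The Monte Carlo estimate stays well below e^{−I(u)}, consistent with the bound being valid but not tight away from the slow Markov walk regime.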
3c

Proof of Theorem 3.3

The idea of the proof is to bound {R_t^{(ε)}} above and below in a small interval [x − x/n, x + x/n] by two classical risk processes with a constant p and to appeal to the classical results (3.2), (3.3). To this end, define

  u_{k,n} = (k/n)u,  p̄_{k,n} = sup_{u_{k−1,n} ≤ x ≤ u_{k+1,n}} p(x),  p_{k,n} = inf_{u_{k−1,n} ≤ x ≤ u_{k+1,n}} p(x),

and, in accordance with the notation ψ_ε(u), ψ*_{p*}(u), let ψ*_{p*;ε}(u) denote the ruin probability for the classical model with β replaced by β/ε and U_i by εU_i.

Lemma 3.12 lim sup_{ε↓0} −ε log ψ_ε(u) ≤ I(u).

Proof. For ruin to occur, {R_t^{(ε)}} (starting from u = u_{n,n}) must first downcross u_{n−1,n}. The probability of this is at least φ*_{p̄_{n,n};ε}(u/n), the probability that ruin occurs in the Cramér–Lundberg model with p* = p̄_{n,n} (starting from u/n) without 2u/n being upcrossed before ruin. Further, given downcrossing occurs, the value of {R_t^{(ε)}} at the time of downcrossing is < u_{n−1,n}, so that

  ψ_ε(u) ≥ φ*_{p̄_{n,n};ε}(u/n) ψ_ε(u_{n−1,n}) ≥ φ*_{p̄_{n,n};ε}(u/n) φ*_{p̄_{n−1,n};ε}(u/n) ψ_ε(u_{n−2,n}) ≥ ··· ≥ ∏_{k=1}^n φ*_{p̄_{k,n};ε}(u/n).
Now as ε ↓ 0,

  ψ*_{p*;ε}(u) = ψ*_{p*}(u/ε) ∼ C* e^{−γ* u/ε},

where the first equality follows by an easy scaling argument and the approximation by (3.3). Let C̄_{k,n}, γ̄_{k,n} be C*, resp. γ*, evaluated for p* = p̄_{k,n}; in particular, since γ* is an increasing function of p*, also

  γ̄_{k,n} = sup_{u_{k−1,n} ≤ x ≤ u_{k,n}} γ(x).

Clearly, ψ*_{p*;ε}(u/n) − φ*_{p*;ε}(u/n) ≤ ψ*_{p*;ε}(2u/n), so that

  φ*_{p̄_{k,n};ε}(u/n) ≥ ψ*_{p̄_{k,n};ε}(u/n) − ψ*_{p̄_{k,n};ε}(2u/n)
                     ∼ C̄_{k,n} e^{−γ̄_{k,n} u/εn} (1 − e^{−γ̄_{k,n} u/εn})
                     = C̄_{k,n} e^{−γ̄_{k,n} u/εn} (1 + o(1)),

where o(1) refers to the limit ε ↓ 0 with n and u fixed. It follows that

  −log ψ_ε(u) ≤ −∑_{k=1}^n log φ*_{p̄_{k,n};ε}(u/n) = (u/εn) ∑_{k=1}^n γ̄_{k,n} − ∑_{k=1}^n log C̄_{k,n} + o(1),

  lim sup_{ε↓0} −ε log ψ_ε(u) ≤ (u/n) ∑_{k=1}^n γ̄_{k,n}.

Letting n → ∞ and using a Riemann sum approximation completes the proof. □

Theorem 3.3 now follows easily in case (a). Indeed, in obvious notation one has γ_ε(x) = γ(x)/ε, so that Theorem 3.2 gives ψ_ε(u) ≤ e^{−I(u)/ε}, whence

  lim inf_{ε↓0} −ε log ψ_ε(u) ≥ I(u).

Combining with the upper bound of Lemma 3.12 completes the proof.

In case (b), we need the following condition:

Condition 3.13 There exists a r.v. V < ∞ such that
(i) for any u < ∞ there exist C_u < ∞ and δ(u) > sup_{x≤u} γ(x) such that

  P(V > x) ≤ C_u e^{−δ(u)x};  (3.18)
(ii) the family of claim overshoot distributions is stochastically dominated by V, i.e. for all x, y > 0 it holds that

  P(U > x + y | U > x) = B̄(x + y)/B̄(x) ≤ P(V > y).  (3.19)

To complete the proof, let v ≤ u and define

  τ^{(ε)}(u,v) = inf{ t > 0 : R_t^{(ε)} < v | R₀^{(ε)} = u },  ξ^{(ε)}(u,v) = v − R^{(ε)}_{τ^{(ε)}(u,v)}.

Then

  ψ_ε(u) = E[ ψ_ε(R_{τ^{(ε)}(u,u/n)}) ; τ^{(ε)}(u,u/n) < ∞ ]
         = E[ ψ_ε(u/n − ξ^{(ε)}(u,u/n)) ; τ^{(ε)}(u,u/n) < ∞ ]
         = E[ ψ_ε(u/n − ξ^{(ε)}(u,u/n)) | τ^{(ε)}(u,u/n) < ∞ ] · P(τ^{(ε)}(u,u/n) < ∞)
         ≤ E ψ_ε(u/n − εV) · P(τ^{(ε)}(u,u/n) < ∞).

Write E ψ_ε(u/n − εV) = E₁ + E₂, where E₁ is the contribution from the event that the process does not reach level 2u/n before ruin and E₂ is the rest. Then the standard Lundberg inequality yields

  E₁ ≤ E ψ*_{p_{1,n};ε}(u/n − εV) = E ψ*_{p_{1,n}}(u/εn − V)
     ≤ e^{−γ_{1,n} u/εn} E[ e^{γ_{1,n} V} ; V ≤ u/εn ] + P(V > u/εn)
     = e^{−γ_{1,n} u/εn} O(1)

(using (3.18) for the last equality; here γ_{1,n} denotes γ* evaluated for p* = p_{1,n}). For E₂, we first note that the number of downcrossings of 2u/n starting from R₀^{(ε)} = 2u/n is bounded by a geometric r.v. N with

  E N ≤ 1/(1 − ψ*_{inf_{x≥2u/n} p(x);ε}(0)) = inf_{x≥2u/n} p(x) / (inf_{x≥2u/n} p(x) − βEU) = O(1),

cf. (3.5) and the standard formula for ψ(0). The probability of ruin in between two downcrossings is bounded by

  E ψ*_{p_{1,n};ε}(2u/n − εV) = e^{−2γ_{1,n} u/εn} O(1),

so that

  E₂ ≤ e^{−2γ_{1,n} u/εn} O(1),  E₁ + E₂ ≤ e^{−γ_{1,n} u/εn} O(1).
Hence

  lim inf_{ε↓0} −ε log ψ_ε(u) ≥ lim inf_{ε↓0} { −ε log(E₁ + E₂) − ε log P(τ^{(ε)}(u,u/n) < ∞) }
      ≥ (u/n) γ_{1,n} + lim inf_{ε↓0} −ε log P(τ^{(ε)}(u,u/n) < ∞)
      ⋮
      ≥ (u/n) ∑_{i=1}^n γ_{i,n}.

Another Riemann sum approximation completes the proof. □
Notes and references With the exception of Theorem 3.1, the results are from Asmussen & Nielsen [92]; they also discuss simulation based upon 'local exponential change of measure', for which the likelihood ratio is

  L_t = exp{ −∫₀^t γ(R_{s−}) dR_s } = exp{ −∫₀^t γ(R_s) p(R_s) ds + ∑_{i=1}^{N_t} γ(R_{T_i−}) U_i }.

An approximation similar to (3.7) for ruin probabilities in the presence of an upper barrier b appears in Cottrell et al. [263], where the key mathematical tool is the deep Wentzell–Freidlin theory of slow Markov walks (see e.g. Bucklew [207]). Djehiche [324] gives an approximation for ψ(u,T) = P_u(inf_{0≤t≤T} R_t < 0) via related large deviations techniques. Comparing these references with the present work shows that in the slow Markov walk setup, the risk process itself is close to the solution of the differential equation

  ṙ(x) = −κ'(x, 0)  (= p(x) − βEU)  (3.20)

(with κ(x,s) as in (3.4) and the prime meaning differentiation w.r.t. s), whereas the most probable path leading to ruin is the solution of

  ṙ(x) = −κ'(x, γ(x))  (3.21)

(the initial condition is r(0) = u in both cases). Whereas the result of [324] is given in terms of an action integral which does not look very explicit, one can in fact arrive at the optimal path by showing that the approximation for ψ(u,T) is maximized over T by taking T as the time for (3.21) to pass from u to 0; the approximation (3.7) then comes out (at least heuristically) by analytical manipulations with the action integral. Similarly, it might be possible to show that the limits ε ↓ 0 and b ↑ ∞ are interchangeable in the setting of [263]. Typically, the rigorous implementation of these ideas via large deviations techniques would require slightly stronger smoothness conditions on p(x) than ours and conditions somewhat different from Condition 3.13, the simplest being to require B̂[s] to be defined for all s > 0 (thus excluding, e.g., the
exponential distribution). We would like, however, to point out the perhaps more important fact that the present approach is far more elementary and self-contained than the one using large deviations theory. For different types of applications of large deviations to ruin probabilities, see XIII.3. Asymptotic results for surplus-dependent premiums under heavy-tailed claims are given in Section X.5.
4

The model with tax

Consider now a compound Poisson risk process R_t^{(ϑ)} with constant premium income intensity (1 − ϑ)p (0 < ϑ ≤ 1) whenever the risk process is at its running maximum M_t^{(ϑ)} = max{R_s^{(ϑ)}, 0 ≤ s ≤ t}, and constant premium income intensity p otherwise. So the dynamics of the risk process are given by

  dR_t^{(ϑ)} = p dt − dA_t,          if R_t^{(ϑ)} < M_t^{(ϑ)},
  dR_t^{(ϑ)} = (1 − ϑ) p dt − dA_t,  if R_t^{(ϑ)} = M_t^{(ϑ)},    (4.1)

where A_t = ∑_{i=1}^{N_t} U_i are the aggregate claims up to time t. As outlined in Example 1.4, a natural interpretation for this model is that the insurance company needs to pay tax at rate ϑ whenever the risk process is at a new record height (which is considered to be profit), and does not need to pay tax if it is below the running maximum, as then the incoming premium is needed to amortize the previous claim payments until a new running maximum is reached. But one can also simply think of a profit participation in terms of dividend payments to shareholders according to the above scheme. Figure VIII.2 depicts a sample path of the resulting risk process.

[Figure VIII.2: a sample path of the risk process with tax, starting at R₀ = u]

The resulting ruin probability ψ_ϑ(u) has a strikingly simple relation to the ruin probability ψ(u) = ψ₀(u) of the original risk process.
Theorem 4.1 In the case of positive safety loading η > 0 and ϑ < 1, ψ_ϑ(u) < 1 holds for all u ≥ 0. In particular, in this case

  1 − ψ_ϑ(u) = (1 − ψ₀(u))^{1/(1−ϑ)}.  (4.2)
Proof. Recall from Theorem II.2.3 that the survival probability φ(u) = 1 − ψ(u) in the classical compound Poisson risk model can be interpreted as the probability of having zero events during [u, ∞) of an inhomogeneous Poisson process with rate β(s) = (β/p) P(V_max > s), where V_max denotes the maximum workload of an M/G/1 queue.⁴ But now one realizes that the survival probability with tax, φ_ϑ(u), can also be interpreted in a similar way: when we cut out the excursions away from the running maximum that do not lead to ruin (which are identical to those without tax), a straight line with slope p(1 − ϑ) remains. After rescaling to slope 1, the probability to survive is the probability of having no events during [u, ∞) of the inhomogeneous Poisson process with rate β(s) = β/(p(1 − ϑ)) · P(V_max > s), where V_max is again the maximum workload of the original M/G/1 queue. In view of (2.5) from Chapter II, this leads to

  φ_ϑ(u) = exp( −β/(p(1−ϑ)) ∫_u^∞ P(V_max > s) ds ) = (φ₀(u))^{1/(1−ϑ)}. □

Remark 4.2 If one defines φ_ϑ(u,v) for u < v as the probability that, starting from level u at time 0, the process reaches level v before ruin occurs (clearly φ_ϑ(u) = φ_ϑ(u,∞)), then, along the same line of arguments as above, we have

  φ_ϑ(u,v) = exp( −β/(p(1−ϑ)) ∫_u^v P(V_max > s) ds ) = (φ₀(u,v))^{1/(1−ϑ)}.  (4.3) □

It is now straightforward to extend identity (4.2) to a surplus-dependent tax rate.

Corollary 4.3 If the tax rate ϑ(r) (0 ≤ ϑ(r) < 1) depends on the current surplus level R_t = r, then the corresponding survival probability φ_Γ(u) is given by

  φ_Γ(u) = exp( −∫_u^∞ (1/(1 − ϑ(s))) · (d/ds log φ₀(s)) ds ).  (4.4)

⁴ In Theorem II.2.3 we had p = 1. Here we have a general p, so we first need to rescale time by the factor p, i.e. the Poisson process with intensity β/p and the new premium rate 1 leads to the same φ(u) as the original process, but can now be linked with the M/G/1 queue.
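The tax identity (4.2) is easy to check by simulation. The following sketch (assumed parameters; Exp(δ) claims, for which the classical ruin probability is ψ₀(u) = (β/(pδ)) e^{−(δ−β/p)u}) simulates the two-rate dynamics (4.1) and compares the Monte Carlo estimate of ψ_ϑ(u) with 1 − (1 − ψ₀(u))^{1/(1−ϑ)}.

```python
import math, random

random.seed(7)
beta, delta, p, theta = 1.0, 1.0, 2.0, 0.5   # assumed illustration parameters
u = 1.0

def ruined_with_tax(u0, barrier=40.0):
    """One path of (4.1): premium (1-theta)*p at the running maximum, p below it.
    Paths reaching `barrier` are counted as surviving (truncation)."""
    r = m = u0                          # start at the running maximum
    while 0 <= r < barrier:
        t = random.expovariate(beta)    # time until the next claim
        if r < m:                       # below the maximum: full premium rate p
            t_hit = (m - r) / p         # time needed to regain the maximum
            if t <= t_hit:
                r += p * t
                t = 0.0
            else:
                r, t = m, t - t_hit
        r += (1 - theta) * p * t        # at the maximum: taxed rate (1-theta)*p
        m = max(m, r)
        r -= random.expovariate(delta)  # claim of size Exp(delta)
    return r < 0

n = 30000
psi_theta_hat = sum(ruined_with_tax(u) for _ in range(n)) / n
psi0 = beta / (p * delta) * math.exp(-(delta - beta / p) * u)   # classical formula
psi_theta = 1 - (1 - psi0) ** (1 / (1 - theta))                 # identity (4.2)
assert abs(psi_theta_hat - psi_theta) < 0.02
```

With these parameters ψ₀(1) ≈ 0.303 and the identity predicts ψ_ϑ(1) ≈ 0.515; the Monte Carlo estimate agrees within sampling error.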
Proof. In this case, φ_Γ(u) is the probability of having zero events during [u, ∞) of an inhomogeneous Poisson process with rate β(s) = β/(p(1 − ϑ(s))) · P(V_max > s), where V_max is again the maximum workload of a busy period in a classical M/G/1 queue. Now just invoke Theorem II.2.3 again. □

Remark 4.4 From (4.4) one sees that even in the case ϑ(r) → 1 as r → ∞ there can be a positive probability of survival, as long as the convergence rate is sufficiently low. □

Consider finally a further generalization of the risk process given by the surplus-dependent dynamics

  dR_t^g = p₁(R_t^g) dt − dA_t,  if R_t^g < M_t^g,
  dR_t^g = p₂(R_t^g) dt − dA_t,  if R_t^g = M_t^g.   (4.5)

Denote by q(u) the probability that, if a claim occurs in the running maximum u, ruin occurs before R_t^g reaches level u again (note that q(u) depends on p₁(·), but not on p₂(·)). Then the same reasoning as above (where now q(u) takes the role of P(V_max > u)) shows that the survival probability is given by

  φ^g(u) = exp( −β ∫_u^∞ q(s)/p₂(s) ds ).  (4.6)

It immediately follows that if p₂(s) = (1 − ϑ)p₁(s), then the tax identity (4.2) holds again, where ψ₀(u) then refers to the ruin probability of the risk process with premium rule p₁(x) for all x ≥ 0. In particular, this shows that the tax identity also holds for the compound Poisson risk process with interest (where p₁(x) = p + ix) discussed in Section 2. As a byproduct, one obtains with p₁(x) = p₂(x) = p + ix that the ruin probability of the classical risk process with constant interest (without tax) can be expressed as

  ψ(u) = 1 − exp( −β ∫_u^∞ q(s)/(p + is) ds ),

which can be compared with (1.6). It remains, however, to identify an explicit expression for the quantity q(u).

Notes and references The tax model was introduced in Albrecher & Hipp [28], where the identity (4.2) was derived by a different approach.
The simpler proof given here is based on Albrecher, Borst, Boxma & Resing [16], where also the treatment of the surplus-dependent tax rate can be found. Albrecher, Renaud & Zhou [35] used excursion theory to extend the identity (4.2) to arbitrary spectrally negative Lévy processes and also generalized a formula from [28] for the moments of the accumulated discounted tax payments until ruin in terms of scale functions. Kyprianou & Zhou [569]
then further extended this approach to surplus-dependent tax rates and quantities like the time of ruin, the deficit at ruin and the surplus prior to ruin in the tax model; see also Renaud [736]. A more direct analysis of these latter quantities in the compound Poisson setting with tax can be found in Ming & Wang [645], and absolute ruin under the tax payment scheme is studied in Ming, Wang & Xao [646]. A treatment of the tax problem for the model (4.5) can be found in Wei [878], where (4.6) is derived by a direct differential argument, and in Wang et al. [868], where the accumulated discounted tax payments until ruin are considered. For an extension to a Markov-modulated model, see Wei, Yang & Wang [877]. A slightly different model is considered in Hao & Tang [448], where the authors give a fine asymptotic study of the ruin probabilities of a spectrally negative Lévy risk model that is subject to periodic taxation on its net gains during each period. The effect of tax payments on a non-Markovian risk process is investigated in Albrecher, Badescu & Landriault [15] in the context of the dual risk model, in which the signs of the premium income and the aggregate claims are reversed. The simple tax identity (4.2) then no longer holds, but a similar relationship holds for arbitrary interarrival times and exponential jump sizes.
5
Discrete-time ruin problems with stochastic investment

We consider a discrete-time risk reserve process R₀, R₁, … given by R₀ = u > 0 and the recursion

  R_n = A_n R_{n−1} − B_n,  (5.1)

where {A_n}, {B_n} are independent sequences, each consisting of i.i.d. r.v.'s, and A_n > 0. The interpretation is that the reserve is invested in risky assets yielding a stochastic interest rate of A_n − 1 in period n, whereas B_n is the claim surplus, that is, the difference between claims and premiums received. The reserve may decrease if either A_n < 1 or B_n > 0, so that financial risk enters via the A_n and traditional insurance risk via the B_n. As usual, the ruin time τ(u) corresponding to R₀ = u is the first n with R_n < 0, and the ruin probability ψ(u) is the probability that R_n < 0 for some n. To avoid trivialities, we assume P(B_n > 0) > 0, since otherwise ψ(u) = 0 for all u > 0, and also P(A_n < 1) > 0, since otherwise there is no investment risk.

A first question is when ψ(u) = 1 for all u and when not. One could expect the first possibility to occur when EA₁ < 1 but not when EA₁ > 1 (then one would expect R_n → ∞ with positive probability). However, the relevant criterion is in terms of E log A_n:

Proposition 5.1 Assume E log |B_n| < ∞. If E log A_n < 0, then ψ(u) = 1 for all u > 0. If E log A_n > 0, then ψ(u) < 1 for all large u > 0.
Proof for E log A_n < 0. First note that for R_{n−1} = x > 0, we have

  log R_n⁺ = log A_n + log x + log((1 − B_n/x)⁺) ≤ log A_n + log x + log(1 + |B_n|/x).

By assumption, there exists x₀ such that E log A_n + E log(1 + |B_n|/x) ≤ 0 for all x > x₀. I.e.,

  E[ log R_n⁺ − log x⁺ | R_{n−1} = x ] ≤ 0  for all x > x₀.

By standard recurrence criteria for Markov processes ([APQ, p. 21]), this implies that R_n cannot go to ∞, so that some interval of the form [0, x₁] is visited i.o. by {R_n⁺}. Since (except for trivial cases) inf_{0≤u≤x₁} ψ(u) > 0, a geometric trial argument therefore gives ψ(u) = 1. □

The proof for the case E log A_n > 0 will be given after Proposition 5.3. In view of Proposition 5.1, we henceforth assume E log |B_n| < ∞ and E log A_n > 0. This implies in particular P(A_n > 1) > 0 (recall that we also assumed P(A_n < 1) > 0). We next note some representations of R_n and the ruin time, which are similar to some from Section 2 in the constant interest rate case (with the −Y chain taking the role of the Z process there).

Proposition 5.2 Define D_n = A₁⁻¹ ··· A_n⁻¹, R_n* = D_n R_n and

  Y_n = D₁B₁ + D₂B₂ + ··· + D_nB_n = ∑_{k=1}^n D_k B_k.

Then

  R_n* = u − Y_n,  τ(u) = inf{ n ≥ 1 : Y_n > u }.  (5.2)

Note that A_k⁻¹ is simply the discounting factor for period k, and D_n the one for the totality of periods 1, …, n. Thus R_n* is simply the present value of the reserve, and the formula R_n* = u − Y_n tells that, as it should be, this is the initial reserve minus the present value of the claim surpluses from the different periods.

Proof. We can rewrite (5.1) (with n replaced by k) as R_k* − R_{k−1}* = −D_k B_k. Thus

  R_n* = u + ∑_{k=1}^n (R_k* − R_{k−1}*) = u − Y_n.

From this the claim on τ(u) follows by noting that R_n < 0 if and only if R_n* < 0, because A_k > 0. □

Proposition 5.3 Assume A₁⁻¹ > 0, μ* = E log A₁⁻¹ < 0 and E log |B₁| < ∞. Then the r.v. Y = ∑_{n=1}^∞ D_n B_n satisfies P(−∞ < Y < ∞) = 1, and Y_n → Y a.s. as n → ∞.
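The representation (5.2) is a purely pathwise identity and can be checked directly on simulated paths of the recursion (5.1); the distributions used below are illustrative assumptions only.

```python
import random

random.seed(3)
u = 5.0
N = 50
A = [random.lognormvariate(0.1, 0.3) for _ in range(N)]  # investment factors A_n > 0
B = [random.gauss(0.0, 1.0) for _ in range(N)]           # net claim surpluses B_n

# Forward recursion (5.1): R_n = A_n * R_{n-1} - B_n
R = [u]
for a, b in zip(A, B):
    R.append(a * R[-1] - b)

# Discounted representation (5.2): D_n * R_n = u - Y_n with Y_n = sum_k D_k B_k
D, Y = 1.0, 0.0
for n, (a, b) in enumerate(zip(A, B), start=1):
    D /= a                 # D_n = A_1^{-1} ... A_n^{-1}
    Y += D * b             # Y_n
    assert abs(D * R[n] - (u - Y)) < 1e-9
```

The assertion holds at every step n, confirming that the discounted reserve is the initial reserve minus the present value of the claim surpluses.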
Proof. We have a.s. that D_n = e^{μ* n(1+o(1))}. Furthermore, a standard application of the Borel–Cantelli lemma shows that E log |B_n| < ∞ ensures that for any c₁, one has log |B_n| > nc₁, i.e. |B_n| > e^{nc₁}, only for finitely many n. Choosing c₁ < −μ*, it follows that the terms in the series defining Y decay at least geometrically fast, which implies the assertions. □

As a corollary, we can give the

Proof of Proposition 5.1 when E log A_n > 0. Note that

  ψ(u) = P(Y_n > u for some n) ≤ P( ∑_{i=1}^∞ D_i |B_i| > u ).

By the above, the sum is a finite r.v., and therefore the r.h.s. goes to 0 as u → ∞. □

The ruin probability can be represented in terms of Y as follows:

Theorem 5.4 Define H(u) = P(Y > u). Then ψ(u) = H(u)/C₁(u), where C₁(u) = E[ H(R_{τ(u)}) | τ(u) < ∞ ].

Proof. For brevity, write τ = τ(u). On {τ < ∞} we have u − Y = u − Y_τ − (Y − Y_τ). Here

  Y − Y_τ = ∑_{i=τ+1}^∞ D_i B_i = D_τ ∑_{i=1}^∞ (D_{τ+i}/D_τ) B_{τ+i} = D_τ Ỹ,

where

  Ỹ = ∑_{i=1}^∞ A_{τ+1}⁻¹ ··· A_{τ+i}⁻¹ B_{τ+i}

is a copy of Y independent of τ and R_τ. Thus

  u − Y = D_τ( D_τ⁻¹(u − Y_τ) − Ỹ ) = D_τ( R_τ − Ỹ ).

Since Y_n → Y, we also have Y_n > u for some n (and hence τ < ∞) when Y > u. Thus

  H(u) = P(u − Y < 0) = P(R_τ − Ỹ < 0, τ < ∞) = ψ(u) P(R_τ − Ỹ < 0 | τ < ∞) = ψ(u) E[ H(R_τ) | τ < ∞ ]. □

Proposition 5.5 The r.v. Y satisfies Y =_D A₁⁻¹(B₁ + Y̆), where Y̆ is a copy of Y which is independent of A₁, B₁.
Proof. Take

  Y̆ = ∑_{i=2}^∞ A₂⁻¹ ··· A_i⁻¹ B_i. □

The representation in Theorem 5.4 is in general quite intractable, because usually H and C₁(u) cannot be calculated. An obvious and important question is therefore to ask for tail asymptotics. The question that immediately comes up is how strongly C₁(u) depends on u. The following result shows that under suitable conditions this dependence is weak, so that the important part of the tail asymptotics is H(u) itself.

Proposition 5.6

  0 < lim inf_{u→∞} ψ(u)/H(u) ≤ lim sup_{u→∞} ψ(u)/H(u) ≤ 1.

Proof. Since −∞ < R_τ ≤ 0, we have H(0) ≤ C₁(u) ≤ 1, so all that remains to show is H(0) > 0. Assume otherwise. Then the upper point a of the support of Y satisfies −∞ < a ≤ 0. Assume first a < 0. Recall that by assumption p₁ = P(A₁ ≥ 1) > 0 and that B₁ ≤ 0 a.s. is excluded. Choose ε > 0 with p₂ = P(B₁ > 2ε) > 0 and let p₃ = P(Y > a − ε). Then p₃ > 0 and hence

  P(Y > a + ε) = P( A₁⁻¹(B₁ + Y̆) > a + ε ) ≥ p₁ P( B₁ + Y̆ > a + ε ) ≥ p₁p₂p₃ > 0.

If instead a = 0, we have similarly that P(Y > a + ε) ≥ p₄p₂p₃, where p₄ = P(A₁ ≤ 1) > 0. In both cases we have reached a contradiction with the definition of a. □

It thus remains to get some hold on H(u). We shall here invoke a classical result due to Kesten [531] and Goldie [421] on perpetuities. By a perpetuity one understands a r.v. of the form

  Ỹ = B₁ + D₁B₂ + D₂B₃ + ··· = ∑_{i=1}^∞ D_{i−1} B_i  (5.3)

(with D₀ = 1), where {A_n}, {B_n} are independent sequences, each consisting of i.i.d. r.v.'s, and D_n = A₁⁻¹ ··· A_n⁻¹ (the use of reciprocals is unusual but made to conform with the above risk theoretic setting). The result of [421, 531] states that under suitable conditions, H(u) = P(Y > u) decays like an α-power of u. Before stating the result (the proof of which is outside the scope of this book), we give, as a help for intuition, some heuristic steps that motivate the heavy tail and allow one to identify α.
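Before turning to the tail, note that Proposition 5.5 pins down the mean of Y when E A₁⁻¹ < 1: taking expectations in Y =_D A₁⁻¹(B₁ + Y̆) gives E Y = E A₁⁻¹ · E B₁ / (1 − E A₁⁻¹). The sketch below (assumed lognormal A_n and constant B_n ≡ 1, so everything is explicit) checks this against a truncated simulation of the series defining Y.

```python
import math, random

random.seed(5)
mu, s = 0.1, 0.1                 # log A_n ~ N(mu, s^2); B_n = 1 (assumed illustration)
q = math.exp(-mu + s * s / 2)    # q = E[A^{-1}] ~ 0.909 < 1
target = q / (1.0 - q)           # fixed-point mean: E[Y] = q*E[B]/(1 - q)

def sample_Y(n_terms=300):
    """Truncated Y = sum_n D_n B_n with D_n = A_1^{-1}...A_n^{-1} and B_n = 1."""
    d, y = 1.0, 0.0
    for _ in range(n_terms):
        d /= random.lognormvariate(mu, s)
        y += d
    return y

m = 5000
emp = sum(sample_Y() for _ in range(m)) / m
assert abs(emp - target) < 0.3   # sampling error only; truncation bias is ~ q^300
```

The truncation level is chosen so that the neglected tail of the series has a negligible mean (of order q³⁰⁰), leaving only Monte Carlo error.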
Remark 5.7 Let T_n be the random walk T_n = −log A₁ − ··· − log A_n, so that Y = ∑_{n=1}^∞ e^{T_n} B_n. With light-tailed B_n, we expect that a single term e^{T_n}B_n can only be large if T_n is so. By assumption, T_n has negative drift, and we will assume that the conditions of the Cramér–Lundberg approximation are satisfied. Then the probability that T_n > x for some n is approximately C_T e^{−αx}, where α solves

  1 = E e^{−α log A} = E A^{−α}.  (5.4)

Choose b₀ > 0 with B̄(b₀) = P(B_n > b₀) > 0. For an n with T_n > x we then have P(e^{T_n}B_n > e^x b₀) > B̄(b₀). Taking x = log u − log b₀, it follows that

  P(e^{T_n}B_n > u for some n) ⪆ C_T e^{−αx} B̄(b₀) = C_T B̄(b₀) b₀^α / u^α.

This motivates that Y is heavy-tailed and that the tail decays at least as a power. The first trial solution for the tail asymptotics is therefore a power tail, P(Y > u) ∼ C/(1+u)^β. Assume as before that B₁ is light-tailed. Then Proposition 5.5 implies that Y and A₁⁻¹Y̆ have equivalent tails, i.e.

  C/(1+u)^β ∼ P(Y > u) ∼ P(A₁⁻¹Y̆ > u) = ∫₀^∞ P(Y̆ > u/a) P(A₁⁻¹ ∈ da)
            ∼ ∫₀^∞ C/(1 + u/a)^β P(A₁⁻¹ ∈ da) = ∫₀^∞ C a^β/(a + u)^β P(A₁⁻¹ ∈ da).

Multiplying by (1+u)^β and going to the limit under the integral, the r.h.s. becomes C·E A^{−β}, while the l.h.s. tends to C. This suggests E A^{−β} = 1, i.e. β = α, where α is as in (5.4). □

Here is the result of Kesten [531] and Goldie [421]:

Theorem 5.8 Assume that there exists α > 0 such that E A₁^{−α} = 1, together with E[A₁^{−α} log⁻ A₁] < ∞ and E|B₁|^α < ∞. Assume further that the distribution of A₁⁻¹ is nonlattice. Then for some C₂ > 0,

  P(Y > u) ∼ C₂/(1+u)^α,  u → ∞.  (5.5)

The (intractable) expression for C₂ is given in [531, 421] (it is shown in Nyrhinen [669] that C₂ > 0). The proof of Theorem 5.8 is much too technical to be given here. However, up to the intractable constant, we can now obtain the desired asymptotics of the ruin probability by combining with Theorem 5.4:
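For lognormal investment factors, the Cramér root in (5.4) is explicit: if log A ~ N(μ_A, s²) with μ_A > 0, then E A^{−α} = exp(−αμ_A + α²s²/2), so (5.4) gives α = 2μ_A/s². A quick numeric sanity check of this root, with assumed parameters:

```python
import math, random

mu, s = 0.05, 0.2           # log A ~ N(mu, s^2), assumed illustration
alpha = 2 * mu / (s * s)    # closed-form root of E[A^{-alpha}] = 1, here 2.5

# Closed-form check of the moment condition
assert abs(math.exp(-alpha * mu + alpha * alpha * s * s / 2) - 1.0) < 1e-12

# Monte Carlo check that E[A^{-alpha}] is indeed ~ 1
random.seed(11)
n = 200000
emp = sum(random.lognormvariate(mu, s) ** (-alpha) for _ in range(n)) / n
assert abs(emp - 1.0) < 0.02
```

Note that the smaller the volatility s relative to the drift μ_A, the larger α, i.e. the lighter the power tail of Y, in line with Remark 5.11 below.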
Corollary 5.9 Under the conditions of Theorem 5.8,

  ψ(u) ∼ (C₂/C₁(u)) · 1/u^α.

In particular,

  0 < lim inf_{u→∞} u^α ψ(u) ≤ lim sup_{u→∞} u^α ψ(u) < ∞.

The result follows immediately by combining Theorems 5.8, 5.4 and Proposition 5.6 with the following lemma:

Lemma 5.10 Under the conditions of Theorem 5.8, P(Ỹ > u) ∼ P(Y > u) ∼ C₂/u^α.

Proof. We have Y = A₁⁻¹ Ỹ*, where Ỹ* = ∑_{i=1}^∞ B_i ∏_{k=2}^i A_k⁻¹ is a copy of Ỹ independent of A₁. Rewrite Theorem 5.8 as P(Ỹ* > y) ∼ C₂/(1+y)^α. Then also P(Ỹ* > y) ≤ C₃/(1+y)^α for all y > 0, since clearly such an inequality holds on any finite interval. Thus

  u^α P(Y > u) = u^α ∫₀^∞ P(Ỹ* > au) P(A₁ ∈ da)
             = ∫₀^∞ (u^α/(1+au)^α) · P(Ỹ* > au)(1+au)^α P(A₁ ∈ da)
             → C₂ ∫₀^∞ a^{−α} P(A₁ ∈ da) = C₂ E A₁^{−α} = C₂

by dominated convergence, using E A₁^{−α} = 1. □
Remark 5.11 The reason that Corollary 5.9 gives a heavier tail asymptotics than in Section 2 is not so much that the interest is random, but that it is inherent in the setup that negative returns are possible. Namely, if A₁ ≥ 1 (and A₁ is not degenerate at 1), then E A₁^{−α} is always < 1, so the conditions of Corollary 5.9 cannot hold. □

Remark 5.12 Nyrhinen [669] gives conditions under which C₁(u) in Corollary 5.9 is not significant in terms of logarithmic asymptotics. □

Notes and references Rather than assuming that {A_n}, {B_n} are independent sequences, each consisting of i.i.d. r.v.'s, much of the literature relaxes this to the pairs (A_n, B_n) being i.i.d., i.e. some dependence between A_n and B_n is allowed. See, for example, Nyrhinen [669, 668]. For models where the independence among the A_j themselves or among the B_j is relaxed, see e.g. Cai [212], Cai & Dickson [214], Goovaerts et al. [424], Weng, Zhang & Tan [881], Shen, Lin & Zhang [796] and Collamore [252]. The recursion R_n = A_n(R_{n−1} − B_n) also has some relevance as a model for the risk reserve in the presence of investments. The results are much as for (5.1), but will not be given here; see again Nyrhinen [669, 668].
Tail asymptotics for finite horizon ruin probabilities ψ(u, n) in a setting where n goes to infinity with u and either A₁⁻¹ or B₁ (or both) are heavy-tailed are given in Tang & Tsitsiashvili [830, 831]. For extensions to other ruin-related quantities see Yang & Zhang [903]. An important early paper is Paulsen [680]. Paulsen [682] surveys the literature up to 1998 and Paulsen [684] the decade after that.
6
Continuous-time ruin problems with stochastic investment

Results for continuous-time models with stochastic investment can be obtained as a suitable limit of corresponding discrete-time setups (see the Notes). However, continuous-time models often also enable a direct analysis that can have a quite different flavor from its discrete counterpart. In this section, this will be illustrated on a heuristic level for the case of a risk reserve process of compound Poisson type (with Poisson intensity β), where all of the reserve is continuously invested in a financial market of Black–Scholes type, i.e. the risky asset is a geometric Brownian motion. More precisely, the resulting reserve is given by

  R_t = u + t − ∑_{i=1}^{N_t} U_i + a ∫₀^t R_{s−} ds + σ ∫₀^t R_{s−} dB_s,

where {B_t} is standard Brownian motion, σ is the volatility and a is the drift of the geometric Brownian motion. In view of the generators for the diffusion and the compound Poisson process derived in Examples II.4.1 and II.4.2, one observes that the generator of the resulting risk process is given by

  𝒜f(u) = (σ²u²/2) f″(u) + (au + 1) f′(u) + β ∫₀^∞ ( f(u−x) − f(u) ) B(dx).

Using Itô's Lemma, one can now show that if a twice continuously differentiable and bounded function f(u) with lim_{u→∞} f(u) = 0 satisfies 𝒜f(u) = 0 for u > 0 and f(u) = 1 for u < 0, then f(u) must be the ruin probability ψ(u) of the process. Hence ψ(u) satisfies

  (σ²u²/2) ψ″(u) + (au + 1) ψ′(u) − β ψ(u) + β ∫₀^u ψ(u−x) B(dx) + β B̄(u) = 0  (6.1)

with lim_{u→∞} ψ(u) = 0.⁵ In this way, the problem of studying the ruin probability with stochastic investment has been reduced to the purely analytical problem of solving an integro-differential equation.

⁵ Note that for σ = 0 the investment is riskless and we get back to the risk model with constant interest rate (cf. (1.11) with p(u) = p + au).
Example 6.1 Assume that the claim size distribution is exponential(ν). Then one can add the derivative of (6.1) to (6.1) multiplied by ν to get rid of the convolution term,⁶ which leads to

  (σ²u²/2) ψ‴(u) + (au + σ²u + 1 + νσ²u²/2) ψ″(u) + (a − β + aνu + ν) ψ′(u) = 0

with additional boundary condition ψ′(0) = β(ψ(0) − 1) (obtained by letting u ↓ 0 in (6.1)). After substituting ψ′(u) by another function, this is in fact a second-order ODE with polynomial coefficients which can be solved analytically in terms of special functions (of Heun type). Although the resulting formula is explicit, it is quite lengthy and we do not state it here. □

Looking at the drift of the geometric Brownian motion, one can show that for σ² ≥ 2a, ψ(u) = 1 holds for all u ≥ 0 (cf. e.g. Paulsen [681] or Pergamenshchikov & Zeitouny [691]), so it is enough to restrict to σ² < 2a. In general, it is impossible to obtain an explicit solution of the above IDE, but one can retrieve asymptotic results as u → ∞ for a large class of claim size distributions.

Theorem 6.2 Assume that the free reserve in the Cramér-Lundberg model is invested in a financial asset that is modeled by a geometric Brownian motion with drift a > 0 and volatility σ > 0 with 2a > σ². If the claim size distribution is exponentially bounded, then

  ψ(u) ∼ C u^{1−2a/σ²},  u → ∞,

for some constant C > 0. If the claim size distribution is regularly varying (B̄(x) ∼ L(x) x^{−α}, α > 0), then

  ψ(u) ∼ L₁(u) u^{max{1−2a/σ², −α}},  u → ∞,   (6.2)
where L(u), L₁(u) are slowly varying functions.

Proof. For the full proof, we refer to Paulsen [683]. Here we only give a sketch of an analytical proof for 1 < 2a/σ² < 2 to highlight the origin of the involved power terms. In view of the convolution term in (6.1), it is natural to take the Laplace transform of (6.1) and then try to apply Tauberian theorems, using the asymptotic behavior of ψ̂[−s] for s → 0 to infer information about ψ(u) as u → ∞. For simplicity of notation, call g(s) = ψ̂[−s] the Laplace transform of the ruin probability.

⁶ Cf. Section XII.3c for a more general procedure to eliminate convolution terms for a large class of claim size distributions.

CHAPTER VIII. LEVEL-DEPENDENT RISK PROCESSES

Since the Laplace transform of u ψ′(u) is −(s g(s))′ and the Laplace transform of u² ψ″(u) is (s² g(s))″, one obtains from (6.1) after some elementary calculations that

  s² g″(s) + p₀ s g′(s) + (q₀ + q₁(s)) g(s) = h(s)

with

  p₀ = 4 − 2a/σ²,  q₀ = 2 − 2a/σ²,  q₁(s) = 2(s − β + β B̂[−s])/σ²,
  h(s) = 2ψ(0)/σ² − (2βμ_B/σ²) B̂₀[−s],

where B̂₀[−s] = (1 − B̂[−s])/(μ_B s) is the Laplace transform of the integrated tail distribution. It follows that s = 0 is a regular singular point of the homogeneous equation

  s² g″(s) + p₀ s g′(s) + (q₀ + q₁(s)) g(s) = 0,   (6.3)
which by the usual Frobenius method has a solution of the form

  g(s) = s^r ∑_{k=0}^∞ c_k s^k.   (6.4)
Substituting this into (6.3) gives the condition r(r − 1) + p₀ r + q₀ = 0 for r, i.e.

  r₁ = −1  and  r₂ = −2 + 2a/σ².   (6.5)
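The indicial roots in (6.5) follow from the quadratic r(r−1) + p₀r + q₀ = 0 alone. The following sketch (pure Python with exact rational arithmetic; the sample values a = 3/4, σ² = 1 are illustrative only and not from the text) checks them directly.

```python
from fractions import Fraction

def indicial_roots(a, sigma2):
    """Verify that r1 = -1 and r2 = -2 + 2a/sigma^2 solve
    r(r - 1) + p0 r + q0 = 0 with p0 = 4 - k, q0 = 2 - k, k = 2a/sigma^2."""
    k = 2 * Fraction(a) / Fraction(sigma2)
    p0, q0 = 4 - k, 2 - k
    r1, r2 = Fraction(-1), -2 + k
    for r in (r1, r2):
        assert r * (r - 1) + p0 * r + q0 == 0  # exact check
    return r1, r2

r1, r2 = indicial_roots(Fraction(3, 4), 1)  # here 1 < 2a/sigma^2 = 3/2 < 2
```

Since 2a/σ² = 3/2 is not an integer plus r₁ − r₂, the two Frobenius solutions are independent, exactly the situation used in the proof.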
For 1 < 2a/σ² < 2, r₁ − r₂ is not an integer, so we obtain two independent solutions of the homogeneous ODE. The particular solution g_p(s) can now be obtained by the classical method of variation of constants. With considerable but purely analytical effort (the details are omitted here) one can show that for exponentially bounded claim size distribution B, g_p(s) tends to a constant as s → 0. We now look at the asymptotics for s → 0 of the full solution

  g(s) = C₁ s^{−1} η₁(s) + C₂ s^{−2+2a/σ²} η₂(s) + g_p(s)

(due to (6.4), η₁(s), η₂(s) also tend to a constant as s → 0). A priori the first term would dominate, but by Theorem A6.1 this would translate into ∫₀^u ψ(x) dx ∼ C₁ u, and by the Monotone Density Theorem ψ(u) → C₁, which contradicts lim_{u→∞} ψ(u) = 0. Hence we must have C₁ = 0, and the second term dominates the asymptotic behavior at s → 0. Theorem A6.1 now gives ∫₀^u ψ(x) dx ∼ C₂ u^{2−2a/σ²}/Γ(3 − 2a/σ²) and the Monotone Density Theorem
implies ψ(u) ∼ C · u^{1−2a/σ²} (one still needs to ensure that C₂ ≠ 0, which is omitted here). If, on the other hand, B is regularly varying with parameter α, then one can show that g_p(s) ∼ s^{α−1} L(1/s), so that now the dominating term (which is either the particular solution or the second term above) is of order s^{min{α−1, −2+2a/σ²}}. The same arguments as above then imply the claimed result. □

Remark 6.3 Theorem 6.2 states that (as in the discrete risk model of the previous section) full investment in geometric Brownian motion leads to Pareto-type asymptotic decay of the ruin probability even for light-tailed claim distributions. If the tail of the claim size distribution is heavy enough (and from (6.2) one sees that the tail needs to be very heavy), insurance risk can still dominate the financial risk. □

Remark 6.4 If only a constant fraction η (0 < η < 1) of the current wealth is invested in the risky asset, the analysis above remains exactly the same; one just needs to replace a by aη and σ by ση. In Chapter XIV we will see how to improve the asymptotic behavior of the ruin probability by dynamically changing the investment fraction η as a function of the current risk reserve level. □

Notes and references Early results on ruin probabilities with investment can be found in Frolova, Kabanov & Pergamenshchikov [374] for a Cramér-Lundberg model with exponential claim sizes and investment into geometric Brownian motion (the explicit solution of Example 6.1 can also be found there), and bounds are derived in Kalashnikov & Norberg [520] in a more general setup. For extensions to more general claim size distributions, see Constantinescu & Thomann [255]. Results for the continuous-time risk model can also be derived by using methods motivated by discrete-time models, see e.g. Nyrhinen [669] and, for a fairly general account, Paulsen [683]. Martingale techniques are exploited in Ma & Sun [618].
In order to assess ψ(u) for moderate u, Paulsen, Kasozi & Steigen [688] transform the IDE (6.1) into an ordinary Volterra integral equation of the second kind and design an effective numerical solution procedure for the latter. The proof technique of Theorem 6.2 outlined here is made rigorous in Albrecher, Constantinescu & Thomann [22], and it is shown there that the method in principle also extends to renewal risk models with interclaim times of rational Laplace transform (in particular, it turns out that the asymptotic result is insensitive to the choice of interclaim time distribution within that class). See also Wei [879] for a different method in the renewal setup. Pergamenshchikov & Zeitouny [691] deal with more general premium rate functions and Cai & Xu [220] add perturbation to the original risk process. For absolute ruin probabilities in this context, see Gerber & Yang [414]. Klüppelberg & Kostadinova [540], Brokate et al. [205], Tang, Wang & Yuen [833] and Heyde & Wang [462] study the investment into exponential Lévy models in more
detail. For further references in this active field of research we refer again to the recent survey by Paulsen [684].
Chapter IX

Matrix-analytic methods

1
Definition and basic properties of phase-type distributions
Phase-type distributions are the computational vehicle of much of modern applied probability. Typically, if a problem can be solved explicitly when the relevant distributions are exponentials, then the problem may admit an algorithmic solution involving a reasonable degree of computational effort if one allows for the more general assumption of phase-type structure, and typically not otherwise. A proper knowledge of phase-type distributions therefore seems a must for anyone working in an applied probability area like risk theory.

A distribution B on (0, ∞) is said to be of phase-type if B is the distribution of the lifetime of a terminating Markov process {J_t}_{t≥0} with finitely many states and time-homogeneous transition rates. More precisely, a terminating Markov process {J_t} with state space E and intensity matrix T² is defined as the restriction to E of a Markov process {J̄_t}_{0≤t<∞} on E_∆ = E ∪ {∆}, where ∆ is an extra absorbing state; the exit rate vector is then t = −Te, where e is the column vector with all components equal to one. The initial distribution is written as a row vector α = (α_i)_{i∈E}, and ζ = inf{t > 0 : J̄_t = ∆} (the absorption time), i.e. B(t) = P_α(ζ ≤ t). Equivalently, ζ is the lifetime sup{t ≥ 0 : J_t ∈ E} of {J_t}. A convenient graphical representation is the phase diagram in terms of the entrance probabilities α_i, the exit rates t_i and the transition rates (intensities) t_ij:
[Figure IX.1 — phase diagram with three phases i, j, k: arrows carry the entrance probabilities α_i, α_j, α_k, the transition intensities t_ij, t_ik, t_ji, t_jk, t_ki, t_kj, and the exit rates t_i, t_j, t_k.]

Figure IX.1: The phase diagram of a phase-type distribution with 3 phases, E = {i, j, k}. The initial vector α is written as a row vector.

Here are some important special cases:

Example 1.1 Suppose that p = 1 and write β = −t₁₁. Then α = α₁ = 1, t₁ = β, and the phase-type distribution is the lifetime of a particle with constant failure rate β, that is, an exponential distribution with rate parameter β. Thus the phase-type distributions with p = 1 are exactly the class of exponential distributions. □

² This means that t_ii ≤ 0, t_ij ≥ 0 for i ≠ j and ∑_{j∈E} t_ij ≤ 0.
Example 1.2 The Erlang distribution E_p with p phases is defined as the Gamma distribution with integer parameter p and density

  δ^p x^{p−1} e^{−δx} / (p−1)!.   (1.3)

Since this corresponds to a convolution of p exponential densities with the same rate δ, the E_p distribution may be represented by the phase diagram (p = 3)

[Figure IX.2 — a chain 1 → 2 → 3 → ∆, entered at phase 1 (α₁ = 1), with all transition and exit rates equal to δ.]

Figure IX.2

corresponding to E = {1, . . . , p}, α = (1 0 0 . . . 0 0),

  T = [ −δ  δ   0  ⋯   0   0
         0  −δ   δ  ⋯   0   0
         ⋮               ⋱
         0   0   0  ⋯  −δ   δ
         0   0   0  ⋯   0  −δ ],   t = (0 0 ⋯ 0 δ)′. □
Example 1.3 The hyperexponential distribution H_p with p parallel channels is defined as a mixture of p exponential distributions with rates δ₁, . . . , δ_p, so that the density is

  ∑_{i=1}^p α_i δ_i e^{−δ_i x}.   (1.4)

Thus E = {1, . . . , p},

  T = [ −δ₁   0   ⋯    0
          0  −δ₂  ⋯    0
          ⋮          ⋱
          0    0  ⋯  −δ_p ],   t = (δ₁ δ₂ ⋯ δ_p)′,

and the phase diagram is (p = 2)
[Figure IX.3 — two parallel phases: phase 1 is entered w.p. α₁ and exits to ∆ at rate δ₁, phase 2 is entered w.p. α₂ and exits to ∆ at rate δ₂.]

Figure IX.3 □

Example 1.4 (Coxian distributions) This class of distributions is popular in much of the applied literature, and is defined as the class of phase-type distributions with a phase diagram of the following form:
[Figure IX.4 — a chain of phases where each phase either moves to the next phase or exits directly to ∆.]

Figure IX.4

For example, the Erlang distribution is a special case of a Coxian distribution. □

The basic analytical properties of phase-type distributions are given by the following result. Recall that the matrix-exponential e^K is defined by the standard series expansion ∑_{n=0}^∞ K^n/n!.³

Theorem 1.5 Let B be phase-type with representation (E, α, T). Then:
(a) the c.d.f. is B(x) = 1 − α e^{Tx} e;
(b) the density is b(x) = B′(x) = α e^{Tx} t;
(c) the m.g.f. B̂[r] = ∫₀^∞ e^{rx} B(dx) is α(−rI − T)^{−1} t;
(d) the nth moment ∫₀^∞ x^n B(dx) is (−1)^n n! αT^{−n} e.

Proof. Let P̄^s = (p̄^s_{ij}) be the s-step E_∆ × E_∆ transition matrix for {J̄_t} and P^s the s-step E × E transition matrix for {J_t}, i.e. the restriction of P̄^s to E. Then for i, j ∈ E, the backwards equation for {J̄_t} (e.g. [APQ, p. 48]) yields

  dp^s_{ij}/ds = dp̄^s_{ij}/ds = t_i p̄^s_{∆j} + ∑_{k∈E} t_ik p̄^s_{kj} = ∑_{k∈E} t_ik p^s_{kj},

where the last equality holds since p̄^s_{∆j} = 0 for j ∈ E (∆ is absorbing).

³ For a number of additional important properties of matrix-exponentials and discussion of computational aspects, see Appendix A3.
That is, dP^s/ds = T P^s, and since obviously P⁰ = I, the solution is P^s = e^{Ts}. Since

  1 − B(x) = P_α(ζ > x) = P_α(J_x ∈ E) = ∑_{i,j∈E} α_i p^x_{ij} = α P^x e,
this proves (a), and (b) then follows from

  B′(x) = −α (d/dx) P^x e = −α e^{Tx} T e = α e^{Tx} t

(since T and e^{Tx} commute). For (c), the rule (A.12) for integrating matrix-exponentials yields

  B̂[r] = ∫₀^∞ e^{rx} α e^{Tx} t dx = α (∫₀^∞ e^{(rI+T)x} dx) t = α(−rI − T)^{−1} t.

Alternatively, define h_i = E_i e^{rζ}. Then

  h_i = (−t_ii)/(−t_ii − r) · { t_i/(−t_ii) + ∑_{j≠i} (t_ij/(−t_ii)) h_j }.   (1.5)

Indeed, −t_ii is the rate of the exponential holding time of state i and hence (−t_ii)/(−t_ii − r) is the m.g.f. of the initial sojourn in state i. After that, we either go to state j ≠ i w.p. t_ij/(−t_ii) and have an additional time to absorption with m.g.f. h_j, or w.p. t_i/(−t_ii) we go to ∆, in which case the time to absorption is 0 with m.g.f. 1. Rewriting (1.5) as

  h_i (t_ii + r) = −t_i − ∑_{j≠i} t_ij h_j,  i.e.  ∑_{j∈E} t_ij h_j + h_i r = −t_i,

this means in vector notation that (T + rI)h = −t, i.e. h = −(T + rI)^{−1} t, and since B̂[r] = αh, we arrive once more at the stated expression for B̂[r].

Part (d) follows by differentiating the m.g.f.,

  (dⁿ/drⁿ) α(−rI − T)^{−1} t = (−1)^{n+1} n! α(rI + T)^{−n−1} t,
  B̂^{(n)}[0] = (−1)^{n+1} n! αT^{−n−1} t = (−1)^n n! αT^{−n−1} T e = (−1)^n n! αT^{−n} e.

Alternatively, for n = 1 we may put k_i = E_i ζ and get as in (1.5)

  k_i = 1/(−t_ii) + ∑_{j≠i} (t_ij/(−t_ii)) k_j,

which is solved as above to get k = −T^{−1} e (so that the mean is αk = −αT^{−1}e). □
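Theorem 1.5 lends itself to a quick numerical sanity check. The sketch below (my own illustration, not from the text; a truncated Taylor series for e^{Tx} is adequate since the matrix is small) uses the Erlang distribution E₃ of Example 1.2, whose tail and density are known in closed form.

```python
import math

def mat_mul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

def mat_exp(A, terms=60):
    """e^A via its Taylor series; fine for small matrices of modest norm."""
    n = len(A)
    S = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    P = [row[:] for row in S]
    for k in range(1, terms):
        P = [[v / k for v in row] for row in mat_mul(P, A)]
        S = [[S[i][j] + P[i][j] for j in range(n)] for i in range(n)]
    return S

delta, x = 2.0, 1.5                       # illustrative rate and evaluation point
T = [[-delta, delta, 0.0], [0.0, -delta, delta], [0.0, 0.0, -delta]]
alpha = [1.0, 0.0, 0.0]
t_vec = [0.0, 0.0, delta]                 # exit rate vector t = -T e

E = mat_exp([[v * x for v in row] for row in T])
row = [sum(alpha[i] * E[i][j] for i in range(3)) for j in range(3)]  # alpha e^{Tx}
tail = sum(row)                                    # 1 - B(x) = alpha e^{Tx} e
dens = sum(row[j] * t_vec[j] for j in range(3))    # b(x) = alpha e^{Tx} t

dx = delta * x                                     # Erlang(3) closed forms
tail_err = abs(tail - math.exp(-dx) * (1 + dx + dx * dx / 2))
dens_err = abs(dens - delta ** 3 * x ** 2 * math.exp(-dx) / 2)
```

Both errors are at the level of floating-point round-off, confirming (a) and (b) for this representation.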
Example 1.6 Though typically the evaluation of matrix-exponentials is most conveniently carried out on a computer, there are some examples where it is appealing to write T in diagonal form, making the problem trivial. One obvious instance is the hyperexponential distribution, another the case p = 2, where explicit diagonalization formulas are always available, see the Appendix. Consider for example

  α = (1/2 1/2),  T = [ −3/2  9/14; 7/2  −11/2 ]  so that  t = (6/7  2)′.

Then (cf. Example A3.7) the diagonal form of T is

  T = −1 · [ 9/10  9/70; 7/10  1/10 ] − 6 · [ 1/10  −9/70; −7/10  9/10 ],

where the two matrices on the r.h.s. are idempotent. This implies that we can compute the nth moment as

  (−1)^n n! αT^{−n} e = 1^{−n} n! (1/2 1/2)[ 9/10 9/70; 7/10 1/10 ](1 1)′ + 6^{−n} n! (1/2 1/2)[ 1/10 −9/70; −7/10 9/10 ](1 1)′
                      = n! ( 32/35 + 3/(35 · 6ⁿ) ).

Similarly, we get the density as

  α e^{Tx} t = e^{−x} (1/2 1/2)[ 9/10 9/70; 7/10 1/10 ](6/7 2)′ + e^{−6x} (1/2 1/2)[ 1/10 −9/70; −7/10 9/10 ](6/7 2)′
             = (32/35) e^{−x} + (18/35) e^{−6x}. □
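The spectral decomposition of Example 1.6 can be replayed with exact rational arithmetic. The following sketch (Python, my own illustration) verifies idempotency of the two projections and the density and moment coefficients just stated.

```python
from fractions import Fraction as F

alpha = [F(1, 2), F(1, 2)]
A1 = [[F(9, 10), F(9, 70)], [F(7, 10), F(1, 10)]]    # projection for eigenvalue -1
A2 = [[F(1, 10), F(-9, 70)], [F(-7, 10), F(9, 10)]]  # projection for eigenvalue -6
t = [F(6, 7), F(2)]
e = [F(1), F(1)]

def form(a, M, v):  # the bilinear form a^T M v
    return sum(a[i] * M[i][j] * v[j] for i in range(2) for j in range(2))

for M in (A1, A2):  # both matrices are idempotent, M M = M
    assert [[sum(M[i][k] * M[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)] == M

c1, c2 = form(alpha, A1, t), form(alpha, A2, t)  # density = c1 e^{-x} + c2 e^{-6x}
m1, m2 = form(alpha, A1, e), form(alpha, A2, e)  # nth moment = n!(m1 + m2/6^n)
```

A consistency check worth noting: integrating x·(c₁e^{−x} + c₂e^{−6x}) gives c₁ + c₂/36 = 13/14, which agrees with the moment formula at n = 1, m₁ + m₂/6.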
The following result becomes basic in Sections 4, 5 and serves at this stage to introduce Kronecker notation and calculus (see Section 4b for definitions and basic rules):
Proposition 1.7 If B is phase-type with representation (ν, T), then the matrix m.g.f. B̂[Q] of B is

  B̂[Q] = ∫₀^∞ e^{Qx} B(dx) = (ν ⊗ I)(−T ⊕ Q)^{−1}(t ⊗ I).   (1.6)

Proof. According to (A.29) and Proposition A4.4,

  B̂[Q] = ∫₀^∞ ν e^{Tx} t e^{Qx} dx = (ν ⊗ I)(∫₀^∞ e^{Tx} ⊗ e^{Qx} dx)(t ⊗ I)
        = (ν ⊗ I)(∫₀^∞ e^{(T⊕Q)x} dx)(t ⊗ I) = (ν ⊗ I)(−T ⊕ Q)^{−1}(t ⊗ I). □
Sometimes it is relevant also to consider phase-type distributions where the initial vector α is substochastic, ‖α‖ = ∑_{i∈E} α_i < 1. There are two ways to interpret this:

• The phase-type distribution B is defective, i.e. ‖B‖ = ‖α‖ < 1; a random variable U having a defective phase-type distribution with representation (α, T) is then defined to be ∞ on a set of probability 1 − ‖α‖, or one just lets U be undefined on this additional set.

• The phase-type distribution B is zero-modified, i.e. a mixture of a phase-type distribution with representation (α/‖α‖, T) with weight ‖α‖ and an atom at zero with weight 1 − ‖α‖. This is the traditional choice in the literature, and in fact most often one there also allows α to have a component α_∆ at ∆.
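Formula (1.6) collapses nicely when B has a single phase: for B exponential with rate ν one has T = (−ν), t = (ν) and initial vector (1), so that t ⊗ I = νI, −T ⊕ Q = νI − Q and hence B̂[Q] = ν(νI − Q)^{−1}. The Python sketch below (the 2×2 subintensity matrix Q is an arbitrary choice of mine, purely for illustration) confirms this against direct numerical integration of ∫₀^∞ e^{Qx} ν e^{−νx} dx.

```python
import math

nu = 2.0                              # B is exponential(nu): T = (-nu), t = (nu)
Q = [[-1.0, 0.4], [0.3, -0.8]]        # an arbitrary 2x2 subintensity matrix

def mul(A, B):
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)]
            for i in range(2)]

def expm2(A, terms=40):               # Taylor series e^A for a small 2x2 matrix
    S = [[1.0, 0.0], [0.0, 1.0]]
    P = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        P = [[v / k for v in row] for row in mul(P, A)]
        S = [[S[i][j] + P[i][j] for j in range(2)] for i in range(2)]
    return S

# trapezoidal approximation of int_0^25 e^{Qx} nu e^{-nu x} dx,
# stepping e^{Qx} by repeated multiplication with e^{Qh}
h, N = 0.001, 25000
Eh = expm2([[q * h for q in row] for row in Q])
I_num = [[0.0, 0.0], [0.0, 0.0]]
E = [[1.0, 0.0], [0.0, 1.0]]
for k in range(N + 1):
    f = (0.5 if k in (0, N) else 1.0) * h * nu * math.exp(-nu * k * h)
    for i in range(2):
        for j in range(2):
            I_num[i][j] += f * E[i][j]
    E = mul(E, Eh)

# closed form nu (nu I - Q)^{-1} via the 2x2 inverse
M = [[nu - Q[0][0], -Q[0][1]], [-Q[1][0], nu - Q[1][1]]]
d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
I_cf = [[nu * M[1][1] / d, -nu * M[0][1] / d],
        [-nu * M[1][0] / d, nu * M[0][0] / d]]
err = max(abs(I_num[i][j] - I_cf[i][j]) for i in range(2) for j in range(2))
```

The truncation at x = 25 is harmless here because the integrand decays at least like e^{−νx}.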
1a
Asymptotic exponentiality
Writing T in Jordan canonical form, it is easily seen that the tail of a general phase-type distribution asymptotically has the form B̄(x) ∼ C x^k e^{−ηx}, where C, η > 0 and k = 0, 1, 2, . . . The Erlang distribution gives an example where k > 0 (in fact, then k = p − 1), but in many practical cases one has k = 0. Here is a sufficient condition:

Proposition 1.8 Let B be phase-type with representation (α, T), assume that T is irreducible, let −η be the eigenvalue of T with largest real part, let ν, h be the corresponding left and right eigenvectors normalized by νh = 1, and define C = αh · νe. Then the tail B̄(x) is asymptotically exponential,

  B̄(x) ∼ C e^{−ηx}.   (1.7)
Proof. By Perron-Frobenius theory (A.4c), η is real and positive, ν, h can be chosen with strictly positive components, and we have e^{Tx} ∼ hν e^{−ηx}, x → ∞. Using B̄(x) = α e^{Tx} e, the result follows (with C = (αh)(νe)). □
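For the two-phase T of Example 1.6 (which is irreducible), Proposition 1.8 can be verified directly. This Python sketch (my own illustration, using the explicit 2×2 eigenvalue formulas) recovers η = 1 and C = 32/35, matching the coefficient of e^{−x} in the tail implied by the decomposition of Example 1.6.

```python
import math

T = [[-1.5, 9.0 / 14.0], [3.5, -5.5]]   # T from Example 1.6
alpha = [0.5, 0.5]

tr = T[0][0] + T[1][1]
det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
lam = (tr + math.sqrt(tr * tr - 4 * det)) / 2   # eigenvalue of largest real part
eta = -lam

h = [T[0][1], lam - T[0][0]]            # right eigenvector: (T - lam I) h = 0
nu = [T[1][0], lam - T[0][0]]           # left eigenvector:  nu (T - lam I) = 0
scale = nu[0] * h[0] + nu[1] * h[1]
nu = [v / scale for v in nu]            # normalize so that nu h = 1

C = (alpha[0] * h[0] + alpha[1] * h[1]) * (nu[0] + nu[1])   # (alpha h)(nu e)
```

Here the eigenvalues are −1 and −6, so the tail behaves like C e^{−x} with C = (αh)(νe).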
Of course, the conditions of Proposition 1.8 are far from necessary (a mixture of phase-type distributions with the respective T^(i) irreducible obviously has an asymptotically exponential tail, but the relevant T is not irreducible, cf. Example A5.8). In Proposition A5.1 of the Appendix, we give a criterion for asymptotic exponentiality of a phase-type distribution B, not only in the tail but in the whole distribution.

Notes and references The idea behind using phase-type distributions goes back to Erlang, but today's interest in the topic was largely initiated by M. F. Neuts, see his book [660] (a historically important intermediate step is Jensen [505]). Other expositions of the basic theory of phase-type distributions can be found in [APQ], Lipsky [600], Rolski, Schmidli, Schmidt & Teugels [746] and Wolff [894]. All material of the present section is standard; the text is essentially identical to Section 2 of Asmussen [68].

In some of the literature, and also in Section XII.3, the slightly larger class of distributions with a rational m.g.f. (or Laplace transform) is used, which may seem less intuitive than phase-type distributions. See in particular the notes to Section 6. O'Cinneide [670] gave a necessary and sufficient criterion for a distribution B with a rational m.g.f. B̂[s] = p(s)/q(s) to be phase-type: the density b(x) should be strictly positive for x > 0 and the root of q(s) with the smallest real part should be unique (not necessarily simple, cf. the Erlang case). No satisfying algorithm for finding a phase representation of a distribution B (which is known to be phase-type and for which the m.g.f. or the density is available) is, however, known. A related important unsolved problem deals with minimal representations: given a phase-type distribution, what is the smallest possible dimension of the phase space E?
2
Renewal theory
A summary of renewal theory in general is given in A1 of the Appendix, but is in part repeated below. Let U₁, U₂, . . . be i.i.d. with common distribution B and define⁴

  U(A) = E#{n = 0, 1, . . . : U₁ + · · · + U_n ∈ A} = ∑_{n=0}^∞ E I(U₁ + · · · + U_n ∈ A).

⁴ Here the empty sum U₁ + · · · + U₀ is 0.
We may think of the U_i as the lifetimes of items (say electrical bulbs) which are replaced upon failure, and U(A) is then the expected number of replacements (renewals) in A. For this reason, we refer to U as the renewal measure; if U is absolutely continuous on (0, ∞) w.r.t. Lebesgue measure, we denote the density by u(x) and refer to u as the renewal density. If B is exponential with rate β, the renewals form a Poisson process and we have u(x) = β. The explicit calculation of the renewal density (or the renewal measure) is often thought of as infeasible for other distributions, but nevertheless the problem has an algorithmically tractable solution if B is phase-type:

Theorem 2.1 Consider a renewal process with interarrivals which are phase-type with representation (α, T). Then the renewal density exists and is given by

  u(x) = α e^{(T + tα)x} t.   (2.1)

Proof. Let {J_t^(k)} be the governing phase process for U_k and define {J̃_t} by piecing the {J_t^(k)} together,

  J̃_t = J_t^(1), 0 ≤ t < U₁;  J̃_t = J_{t−U₁}^(2), U₁ ≤ t < U₁ + U₂; . . . .

Then {J̃_t} is Markov and has two types of jumps, the jumps of the J_t^(k) and the jumps corresponding to a transition from one J_t^(k) to the next J_t^(k+1). A jump of the last type from i to j occurs at rate t_i α_j, and the jumps of the first type are governed by T. Hence the intensity matrix is T + tα, and the distribution of J̃_x is α e^{(T+tα)x}. The renewal density at x is now just the rate of jumps of the second type, which is t_i in state i. Hence (2.1) follows by the law of total probability. □
The argument goes through without change if the renewal process is terminating, i.e. B is defective, and hence (2.1) remains valid in that case. However, the phase-type assumptions also yield the distribution of a further quantity of fundamental importance in later parts of this chapter, the lifetime of the renewal process. This is defined as U₁ + · · · + U_{κ−1} where κ is the first k with U_k = ∞, that is, as the time of the last renewal; since U_k = ∞ with probability 1 − ‖B‖, which is > 0 in the defective case, this is well defined.

Corollary 2.2 Consider a terminating renewal process with interarrivals which are defective phase-type with representation (α, T), i.e. ‖α‖ < 1. Then the lifetime is zero-modified phase-type with representation (α, T + tα).

Proof. Just note that {J̃_t} is a governing phase process for the lifetime. □

Returning to non-terminating renewal processes, define the excess life ξ(t) at time t as the time until the next renewal following t, see Fig. IX.5.
[Figure IX.5 — the renewal epochs generated by U₁, U₂, U₃, U₄ and the excess life ξ(t), the time from t until the next renewal.]
Figure IX.5

Corollary 2.3 Consider a renewal process with interarrivals which are phase-type with representation (α, T), and let μ_B = −αT^{−1}e be the mean of B. Then:
(a) the excess life ξ(t) at time t is phase-type with representation (ν_t, T), where ν_t = α e^{(T+tα)t};
(b) ξ(t) has a limiting distribution as t → ∞, which is phase-type with representation (ν, T) where ν = −αT^{−1}/μ_B. Equivalently, the density is ν e^{Tx} t = B̄(x)/μ_B.

Proof. Consider again the process {J̃_t} in the proof of Theorem 2.1. The time of the next renewal after t is the time of the next jump of the second type, hence ξ(t) is phase-type with representation (ν_t, T) where ν_t is the distribution of J̃_t, which is obviously given by the expression in (a). Hence in (b) it is immediate that ν exists and is the stationary limiting distribution of J̃_t, i.e. the unique positive solution of

  νe = 1,  ν(T + tα) = 0.   (2.2)

Here are two different arguments that this yields the asserted expression:

(i) Just check that −αT^{−1}/μ_B satisfies (2.2): using t = −Te,

  −αT^{−1}e/μ_B = μ_B/μ_B = 1,
  −αT^{−1}(T + tα)/μ_B = (−α + αT^{−1}Teα)/μ_B = (−α + αeα)/μ_B = (−α + α)/μ_B = 0.
(ii) First check the asserted identity for the density: since T, T^{−1} and e^{Tx} commute,

  B̄(x)/μ_B = α e^{Tx} e/μ_B = αT^{−1} e^{Tx} T e/μ_B = ν e^{Tx} t.

Next appeal to the standard fact from renewal theory that the limiting distribution of ξ(x) has density B̄(x)/μ_B, cf. Section A.1e. □

Example 2.4 Consider a non-terminating renewal process with two phases. The formulas involve the matrix-exponential of the intensity matrix

  Q = T + tα = [ t₁₁ + t₁α₁  t₁₂ + t₁α₂; t₂₁ + t₂α₁  t₂₂ + t₂α₂ ] = [ −q₁  q₁; q₂  −q₂ ]  (say).

According to Example A3.6, we first compute the stationary distribution of Q,

  π = (π₁ π₂) = ( q₂/(q₁ + q₂),  q₁/(q₁ + q₂) ),

and the non-zero eigenvalue λ = −q₁ − q₂. The renewal density is then

  α e^{Qt} t = (α₁ α₂)[ π₁ π₂; π₁ π₂ ](t₁ t₂)′ + e^{λt} (α₁ α₂)[ π₂ −π₂; −π₁ π₁ ](t₁ t₂)′
             = π₁t₁ + π₂t₂ + e^{λt}(α₁π₂ − α₂π₁)(t₁ − t₂)
             = 1/μ_B + e^{λt}(α₁π₂ − α₂π₁)(t₁ − t₂). □
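The spectral form of e^{Qt} underlying Example 2.4 — the stationary projection eπ plus e^{λt} times the complementary projection — is easy to confirm numerically. The following Python sketch uses illustrative values q₁ = 1.3, q₂ = 0.7 of my own choosing and compares it with a Taylor-series matrix exponential.

```python
import math

def expm2(A, terms=60):
    """e^A for a 2x2 matrix via the Taylor series (small norm, so adequate)."""
    S = [[1.0, 0.0], [0.0, 1.0]]
    P = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        P = [[(P[i][0] * A[0][j] + P[i][1] * A[1][j]) / k for j in range(2)]
             for i in range(2)]
        S = [[S[i][j] + P[i][j] for j in range(2)] for i in range(2)]
    return S

q1, q2 = 1.3, 0.7
Q = [[-q1, q1], [q2, -q2]]
pi = [q2 / (q1 + q2), q1 / (q1 + q2)]   # stationary distribution of Q
lam = -(q1 + q2)                        # non-zero eigenvalue

u = 0.9
E = expm2([[Q[i][j] * u for j in range(2)] for i in range(2)])
# claim: e^{Qu} = e pi + e^{lam u}(I - e pi), where (e pi)[i][j] = pi[j]
err = max(abs(E[i][j] - (pi[j] + math.exp(lam * u) * ((1.0 if i == j else 0.0) - pi[j])))
          for i in range(2) for j in range(2))
```

At u = 0 both sides reduce to I, and as u → ∞ each row tends to π, which is exactly what drives the constant term 1/μ_B in the renewal density.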
Example 2.5 Let B be Erlang(2). Then

  Q = [ −δ  δ; 0  −δ ] + (0  δ)′ (1 0) = [ −δ  δ; δ  −δ ].

Hence π = (1/2 1/2), λ = −2δ, and Example 2.4 yields the renewal density as

  u(t) = (δ/2)(1 − e^{−2δt}). □
Example 2.6 Let B be hyperexponential. Then

  Q = [ −δ₁  0; 0  −δ₂ ] + (δ₁  δ₂)′ (α₁ α₂) = [ −δ₁α₂  δ₁α₂; δ₂α₁  −δ₂α₁ ].

Hence

  π = ( δ₂α₁/(δ₁α₂ + δ₂α₁),  δ₁α₂/(δ₁α₂ + δ₂α₁) ),

λ = −δ₁α₂ − δ₂α₁, and Example 2.4 yields the renewal density as

  u(t) = δ₁δ₂/(δ₁α₂ + δ₂α₁) + ((δ₁ − δ₂)² α₁α₂/(δ₁α₂ + δ₂α₁)) e^{−(δ₁α₂+δ₂α₁)t}. □
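A quick exact-arithmetic check of Example 2.6 (Python; the rates δ₁ = 2, δ₂ = 5 and weights α₁ = 1/3, α₂ = 2/3 are illustrative values of mine): the constant term of u(t) equals 1/μ_B, and u(0) = αt, as they must.

```python
from fractions import Fraction as F

d1, d2 = F(2), F(5)
a1, a2 = F(1, 3), F(2, 3)

q1, q2 = d1 * a2, d2 * a1        # off-diagonal rates of Q = T + t alpha
S = q1 + q2
pi1, pi2 = q2 / S, q1 / S        # stationary distribution of Q
t1, t2 = d1, d2                  # exit rates

level = pi1 * t1 + pi2 * t2                 # constant term of u(t)
coef = (a1 * pi2 - a2 * pi1) * (t1 - t2)    # coefficient of e^{lambda t}
mu_B = a1 / d1 + a2 / d2                    # mean of the hyperexponential B
```

Note that u(0) = level + coef recovers αt = α₁δ₁ + α₂δ₂, the initial failure rate.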
Notes and references Early expositions of renewal theory for phasetype distributions are Neuts [659] and Kao [521]. The present treatment, similar to that in [APQ], is somewhat more probabilistic.
3
The compound Poisson model
3a
Phasetype claims
Consider the compound Poisson (Cramér-Lundberg) model in the notation of Chapter I, with β denoting the Poisson intensity, B the claim size distribution, τ(u) the time of ruin with initial reserve u, {S_t} the claim surplus process, G₊(·) = P(S_{τ(0)} ∈ ·, τ(0) < ∞) the ladder height distribution and M = sup_{t≥0} S_t. We assume that B is phase-type with representation (α, T).

Corollary 3.1 Assume that the claim size distribution B is phase-type with representation (α, T). Then:
(a) G₊ is defective phase-type with representation (α₊, T), where α₊ is given by α₊ = −βαT^{−1}, and M is zero-modified phase-type with representation (α₊, T + tα₊);
(b) ψ(u) = α₊ e^{(T + tα₊)u} e.

Note in particular that ρ = ‖G₊‖ = α₊e.

Proof. The result follows immediately by combining the Pollaczeck-Khinchine formula with general results on phase-type distributions: for (a), use the phase-type representation of B₀, cf. Corollary 2.3. For (b), represent the maximum M as the lifetime of a terminating renewal process and use Corollary 2.2.
Since the result is quite fundamental, we shall, however, add a more self-contained explanation of why the phase-type structure is preserved. The essence is contained in Fig. IX.6. Here we have taken the terminating Markov process underlying B with two states, marked by thin and thick lines in the figure. Then each claim (jump) corresponds to one (finite) sample path of the Markov process. The stars represent the ladder points S_{τ₊(k)}. Considering the first, we see that the ladder height S_{τ₊} is just the residual lifetime of the Markov process corresponding to the claim causing upcrossing of level 0, i.e. itself phase-type with the same phase generator T and the initial vector α₊ being the distribution of the upcrossing Markov process at the time of upcrossing. Next, the Markov processes representing ladder steps can be pieced together to one process {m_x}. Within ladder steps, the transitions are governed by T, whereas termination of ladder steps may lead to some additional ones: a transition from i to j occurs if the ladder step terminates in state i, which occurs at rate t_i, and if there is a subsequent ladder step starting in j, which occurs w.p. α₊;j. Thus the total rate is t_ij + t_i α₊;j, and rewriting in matrix form yields the phase generator of {m_x} as T + tα₊. Now just observe that the initial vector of {m_x} is α₊ and that the lifelength is M.
[Figure IX.6 — the claim surplus process with the ladder points marked by stars and the two-state underlying Markov process marked by thin and thick line segments.]

Figure IX.6

This derivation is a complete proof except for the identification of α₊ with −βαT^{−1}. This is in fact a simple consequence of the form of the excess distribution B₀, see Corollary 2.3. □
Example 3.2 Assume that β = 3 and

  b(x) = (1/2) · 3e^{−3x} + (1/2) · 7e^{−7x}.

Thus b is hyperexponential (a mixture of exponential densities) with α = (1/2 1/2), T = [ −3  0; 0  −7 ], so that

  α₊ = −βαT^{−1} = −3 (1/2 1/2)[ −1/3  0; 0  −1/7 ] = (1/2  3/14),

  T + tα₊ = [ −3  0; 0  −7 ] + (3  7)′ (1/2  3/14) = [ −3/2  9/14; 7/2  −11/2 ].

This is the same matrix as in Example 1.6, so that, as there,

  e^{(T+tα₊)u} = e^{−u}[ 9/10  9/70; 7/10  1/10 ] + e^{−6u}[ 1/10  −9/70; −7/10  9/10 ].

Thus

  ψ(u) = α₊ e^{(T+tα₊)u} e = (24/35) e^{−u} + (1/35) e^{−6u}. □
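Corollary 3.1 and Example 3.2 can be replayed numerically. This Python sketch (mine, not from the book) recomputes α₊ = −βαT^{−1}, forms T + tα₊ and compares α₊e^{(T+tα₊)u}e with the closed form (24/35)e^{−u} + (1/35)e^{−6u}.

```python
import math

beta = 3.0
alpha = [0.5, 0.5]
T = [[-3.0, 0.0], [0.0, -7.0]]
t = [3.0, 7.0]

# alpha_+ = -beta alpha T^{-1}; T is diagonal, so this is componentwise
alpha_plus = [-beta * alpha[j] / T[j][j] for j in range(2)]   # (1/2, 3/14)
Q = [[T[i][j] + t[i] * alpha_plus[j] for j in range(2)] for i in range(2)]

def expm2(A, terms=80):               # Taylor-series e^A, fine for 2x2 here
    S = [[1.0, 0.0], [0.0, 1.0]]
    P = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        P = [[(P[i][0] * A[0][j] + P[i][1] * A[1][j]) / k for j in range(2)]
             for i in range(2)]
        S = [[S[i][j] + P[i][j] for j in range(2)] for i in range(2)]
    return S

def psi(u):                           # alpha_+ e^{Qu} e
    E = expm2([[Q[i][j] * u for j in range(2)] for i in range(2)])
    return sum(alpha_plus[i] * E[i][j] for i in range(2) for j in range(2))

def psi_closed(u):
    return (24.0 * math.exp(-u) + math.exp(-6.0 * u)) / 35.0

err = max(abs(psi(u) - psi_closed(u)) for u in (0.0, 0.5, 1.0, 2.0))
```

In particular ψ(0) = ‖α₊‖ = 1/2 + 3/14 = 5/7 = ρ, consistent with ρ = α₊e in Corollary 3.1.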
Notes and references Corollary 3.1 can be found in Neuts [660] (in the setting of M/G/1 queues, cf. the duality result given in Corollary III.3.6), but that such a simple and general solution existed does not appear to have been well known to the risk theory community until a rather late stage. The result carries over to B being matrix-exponential, see Section 6. In the next sections, we encounter similar expressions for the ruin probabilities in the renewal and Markov-modulated models, but there the vector α₊ is not explicit and needs to be calculated (typically by an iteration or a root-finding). The parameters of Example 3.2 are taken from Gerber [398]; his derivation of ψ(u) is different. For further more or less explicit computations of ruin probabilities, see Shiu [800]. It is notable that the phase-type assumption does not seem to simplify the computation of finite-horizon ruin probabilities substantially (but see Section 8). For an attempt, see Stanford & Stroiński [817].
4
The renewal model
We consider the renewal model in the notation of Chapter VI, with A denoting the interarrival distribution and B the service time distribution. We assume
ρ = μ_B/μ_A < 1 and that B is phase-type with representation (α, T). We shall derive phase-type representations of the ruin probabilities ψ(u), ψ^(s)(u) (recall that ψ(u) refers to the zero-delayed case and ψ^(s)(u) to the stationary case). For the compound Poisson model, this was obtained in Section 3, and the argument for the renewal case starts in just the same way (cf. the discussion around Fig. IX.6, which does not use that A is exponential) by noting that the distribution G₊ of the ascending ladder height S_{τ₊} is necessarily (defective) phase-type with representation (α₊, T) for some vector α₊ = (α₊;j). That is, if we define {m_x} just as for the Poisson case (cf. Fig. IX.6):

Proposition 4.1 In the zero-delayed case,
(a) G₊ is phase-type with representation (α₊, T), where α₊ is the (defective) distribution of m₀;
(b) the maximum claim surplus M is the lifetime of {m_x};
(c) {m_x} is a (terminating) Markov process on E, with intensity matrix Q given by Q = T + tα₊.

The key difference from the Poisson case is that it is more difficult to evaluate α₊. In fact, the form in which we derive α₊ for the renewal model is as the unique solution of a fixed point problem α₊ = ϕ(α₊), which for numerical purposes can be solved by iteration. Nevertheless, the calculation of the first ladder height is simple in the stationary case:
Proposition 4.2 The distribution G₊^(s) of the first ladder height of the claim surplus process {S_t^(s)} for the stationary case is phase-type with representation (α^(s), T), where α^(s) = −αT^{−1}/μ_A.

Proof. Obviously, the Palm distribution of the claim size is just B. Hence by Theorem III.5.5, G₊^(s) = ρB₀, where B₀ is the stationary excess life distribution corresponding to B. But by Corollary 2.3, B₀ is phase-type with representation (−αT^{−1}/μ_B, T). □

Proposition 4.3 α₊ satisfies α₊ = ϕ(α₊), where

  ϕ(α₊) = α Â[T + tα₊] = α ∫₀^∞ e^{(T + tα₊)y} A(dy).   (4.1)
Proof. We condition upon T₁ = y and define {m_x*} from {S_{t+y} − S_{y−}} in the same way as {m_x} is defined from {S_t}, cf. Fig. IX.7. Then {m_x*} is Markov with the same transition intensities as {m_x}, but with initial distribution α rather than α₊. Also, obviously m₀ = m_y*. Since the conditional distribution of m_y* given T₁ = y is α e^{Qy}, it follows by integrating y out that the distribution α₊ of m₀ is given by the final expression in (4.1). □
[Figure IX.7 — construction of {m_x*} from the claim surplus process shifted by the first interarrival time T₁ = y.]

Figure IX.7

We have now almost collected all pieces of the main result of this section:

Theorem 4.4 Consider the renewal model with interarrival distribution A and the claim size distribution B being phase-type with representation (α, T). Then

  ψ(u) = α₊ e^{(T + tα₊)u} e,  ψ^(s)(u) = α^(s) e^{(T + tα₊)u} e,   (4.2)

where α₊ satisfies (4.1) and α^(s) = −αT^{−1}/μ_A. Furthermore, α₊ can be computed by iteration of (4.1), i.e. by

  α₊ = lim_{n→∞} α₊^(n)  where  α₊^(0) = 0, α₊^(1) = ϕ(α₊^(0)), α₊^(2) = ϕ(α₊^(1)), . . .   (4.3)
Proof. The first expression in (4.2) follows from Proposition 4.1 by noting that the distribution of m₀ is α₊. The second follows in a similar way by noting that only the first ladder step has a different distribution in the stationary case, and that this is given by Proposition 4.2; thus, the maximum claim surplus for the stationary case has a similar representation as in Proposition 4.1(b), only with initial distribution α^(s) for m₀.

It remains to prove convergence of the iteration scheme (4.3). The term tβ in ϕ(β) represents feedback with rate vector t and feedback probability vector β. Hence ϕ(β) (defined on the domain of subprobability vectors β) is an increasing function of β. In particular, α₊^(1) ≥ 0 = α₊^(0) implies

  α₊^(2) = ϕ(α₊^(1)) ≥ ϕ(α₊^(0)) = α₊^(1),

and (by induction) {α₊^(n)} is an increasing sequence such that lim_{n→∞} α₊^(n) exists. Similarly, 0 = α₊^(0) ≤ α₊ yields

  α₊^(1) = ϕ(α₊^(0)) ≤ ϕ(α₊) = α₊,
and by induction α₊^(n) ≤ α₊ for all n. Thus, lim_{n→∞} α₊^(n) ≤ α₊.

To prove the converse inequality, we use an argument similar to the proof of Proposition VII.2.4. Let F_n = {T₁ + · · · + T_{n+1} > τ₊} be the event that {m_x*} has at most n arrivals in [T₁, τ₊], and let α̃₊;i^(n) = P(m_{T₁}* = i; F_n). Obviously, α̃₊^(n) ↑ α₊, so to complete the proof it suffices to show that α̃₊^(n) ≤ α₊^(n) for all n. For n = 0, both quantities are just 0. Assume the assertion shown for n − 1. Then each subexcursion of {S_{t+T₁} − S_{T₁−}} can contain at most n − 1 arrivals (n arrivals are excluded because of the initial arrival at time T₁). It follows that on F_n the feedback to {m_x*} after each ladder step cannot exceed α̃₊^(n−1), so that

  α̃₊^(n) ≤ ∫₀^∞ α e^{(T + tα̃₊^{(n−1)})y} A(dy) ≤ ∫₀^∞ α e^{(T + tα₊^{(n−1)})y} A(dy) = ϕ(α₊^(n−1)) = α₊^(n). □

We next give an alternative algorithm, which links together the phase-type setting and the classical complex plane approach to the renewal model (see further the Notes). To this end, let F be the distribution of U₁ − T₁. Then

  F̂[r] = α(−rI − T)^{−1} t · Â[−r]   (4.4)
whenever Ee^{rU₁} < ∞. [. . .] the ruin probability ψ(u) is the tail P(W > u) of the GI/PH/1 waiting time W; in turn, W = M in distribution in the notation of Chapter VI. In older literature, explicit expressions for the ruin/queueing probabilities are most often derived under the slightly more general assumption that B̂ is rational (say with degree d of the polynomial in the denominator), as discussed in Section 6. As in Corollary 4.6, the classical algorithm starts by looking for roots in the complex plane of the equation B̂[γ]Â[−γ] = 1, ℜγ < 0. The roots are counted and located by Rouché's theorem (a classical result from complex analysis
5. MARKOVMODULATED INPUT
271
giving a criterion for two complex functions to have the same number of zeros within a defined region). This gives d roots γ1 , . . . , γd satisfying 0, and the solution is then in transform terms , d Z ∞ d Y Y αu αW 1+α e ψ(u) du = Ee = (−γi ) (α − γi ) (4.7) 0
i=1
i=1
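For rational transforms the root-finding step of the classical algorithm reduces to a polynomial equation. A sketch with hypothetical parameters: B hyperexponential (so d = 2) with B̂[γ] = 1.5/(3−γ) + 3.5/(7−γ), and A = Exp(3) (with exponential A the model is in fact compound Poisson):

```python
import numpy as np

# B-hat[g] A-hat[-g] = 1  <=>  3*(1.5*(7-g) + 3.5*(3-g)) = (3-g)*(7-g)*(3+g)
lhs = 3.0 * (1.5 * np.poly1d([-1.0, 7.0]) + 3.5 * np.poly1d([-1.0, 3.0]))
rhs = np.poly1d([-1.0, 3.0]) * np.poly1d([-1.0, 7.0]) * np.poly1d([1.0, 3.0])
roots = (rhs - lhs).roots           # roots of the cleared equation: 0, 1, 6
pos = sorted(r.real for r in roots if r.real > 1e-9)
print(pos)   # the d = 2 roots with positive real part: [1.0, 6.0]
```

The two roots 1 and 6 reappear as the decay rates of the explicit ruin probability ψ^(1)(u) = (24/35)e^{−u} + (1/35)e^{−6u} quoted for the same parameters in Example 7.4 below.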
(see, e.g., Asmussen & O'Cinneide [94] for a short self-contained derivation). In risk theory, a pioneering paper in this direction is Täcklind [826], whereas the approach was introduced in queueing theory by Smith [814]; similar discussion appears in Kemperman [528] and much of the queueing literature like Cohen [249]; see also Chapters XII and XIII. This complex plane approach has met with substantial criticism for a number of reasons, such as lacking a probabilistic interpretation and giving only the transform of the waiting time distribution / ruin probability rather than the quantity itself. In queueing theory, an alternative approach (the matrix-geometric method) has been developed largely by M.F. Neuts and his students, starting around 1975. For surveys, see Neuts [660], [661] and Latouche & Ramaswami [574]. Here phase-type assumptions are basic, but the models solved are basically Markov chains and Markov processes with countably many states (for example queue length processes). The solutions are based upon iteration schemes like in Theorem 4.4; the fixed point problems look like R = A0 + RA1 + R²A2 + ···, where R is an unknown matrix, and appear already in some early work by Wallace [870]. The distribution of W comes out of the approach, but in a rather complicated form. The matrix-exponential form of the distribution was found by Sengupta [793] and the phase-type form by the first author [60]. The exposition here is based upon [60], which contains somewhat stronger results concerning the fixed point problem and the iteration scheme. Numerical examples appear in Asmussen & Rolski [97]. For further early explicit computations of ruin probabilities in the phase-type renewal case, see Dickson & Hipp [314, 315]; some recent extensions are discussed in Section XII.3. There is also much literature on the case where A is phase-type with a few phases.
5 Markov-modulated input
We consider a claim surplus process {St} in a Markovian environment in the notation of Chapter VII. That is, the background Markov process with p states is {Jt}, the intensity matrix is Λ and the stationary row vector is π. The arrival rate in background state i is βi and the distribution of an arriving claim is Bi. We assume that each Bi is phase-type, with representation say (α^(i), T^(i), E^(i)). The number of elements of E^(i) is denoted by qi.
CHAPTER IX. MATRIXANALYTIC METHODS
It turns out that subject to the phase-type assumption, the ruin probability can be found in matrix-exponential form just as for the renewal model, involving some parameters like the ones Q or α+ for the renewal model, which need to be determined by similar algorithms. We start in Section 5a with an algorithm involving roots in a similar manner as Corollary 4.6. However, the analysis involves new features like an equivalence with first passage problems for Markovian fluids and the use of martingales (these ideas also apply to phase-type renewal models, though we have not given the details). Section 5b then gives a representation along the lines of Theorem 4.4. The key unknown is the matrix K, for which the relevant fixed point problem and iteration scheme have already been studied in VII.2.
5a Calculations via fluid models. Diagonalization
Consider a process {(It, Vt)}_{t≥0} such that {It} is a Markov process with a finite state space F and {Vt} has piecewise linear paths, say with slope r(i) on intervals where It = i. The version of the process obtained by imposing reflection on the V component is denoted a Markovian fluid and is of considerable interest in telecommunications engineering as a model for an ATM (Asynchronous Transfer Mode) switch. The stationary distribution is obtained by finding the maximum of the V component of the version of {(It, Vt)} obtained by time reversing the I component. This calculation in a special case also gives the ruin probabilities for the Markov-modulated risk process with phase-type claims. The connection between the two models is a fluid representation of the Markov-modulated risk process given in Fig. IX.8.
[Figure IX.8: panel (a) shows the claims of the Markov-modulated risk process represented as phase processes; panel (b) shows the corresponding fluid model, in which the vertical jumps are replaced by segments of slope 1.]
On Fig. IX.8, p = q1 = q2 = 2. The two environmental states are denoted ◦, •, the phase space E^(◦) for B◦ has states ♦, ♥, and the one E^(•) for B• states ♣, ♠. A claim in state i can then be represented by an E^(i)-valued Markov process as on Fig. IX.8(a). The fluid model {(It, Vt)} on Fig. IX.8(b) is then obtained by changing the vertical jumps to segments with slope 1. Thus F = {◦, ♦, ♥, •, ♣, ♠}.

In the general formulation, F is the disjoint union of E and the E^(i),

\[ F \;=\; E \cup \bigl\{(i, \alpha) : i \in E,\ \alpha \in E^{(i)}\bigr\}, \qquad r(i) = -1,\ i \in E, \qquad r(i, \alpha) = 1 . \]

The intensity matrix for {It} is (taking p = 3 for simplicity)

\[
\Lambda_I \;=\;
\begin{pmatrix}
\Lambda - (\beta_i)_{\mathrm{diag}} &
\begin{matrix} \beta_1\alpha^{(1)} & 0 & 0 \\ 0 & \beta_2\alpha^{(2)} & 0 \\ 0 & 0 & \beta_3\alpha^{(3)} \end{matrix} \\[2mm]
\begin{matrix} t^{(1)} & 0 & 0 \\ 0 & t^{(2)} & 0 \\ 0 & 0 & t^{(3)} \end{matrix} &
\begin{matrix} T^{(1)} & 0 & 0 \\ 0 & T^{(2)} & 0 \\ 0 & 0 & T^{(3)} \end{matrix}
\end{pmatrix} .
\]
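The block structure of Λ_I is mechanical to assemble in code. A small sketch with hypothetical parameters (p = 2, two phases per claim distribution); the defining property of an intensity matrix, zero row sums, serves as a consistency check:

```python
import numpy as np

# Hypothetical 2-state environment with phase-type claims (2 phases each).
Lam  = np.array([[-1.0, 1.0], [2.0, -2.0]])            # environment intensities
beta = np.array([3.0, 2.0])                            # arrival rates
alph = [np.array([0.6, 0.4]), np.array([1.0, 0.0])]    # initial phase vectors
Ts   = [np.array([[-4.0, 1.0], [0.0, -6.0]]),
        np.array([[-2.0, 0.5], [0.5, -3.0]])]          # phase generators
ts   = [-Ti.sum(axis=1) for Ti in Ts]                  # exit vectors t = -T e

p, qs = len(beta), [len(a) for a in alph]
n = p + sum(qs)
LamI = np.zeros((n, n))
LamI[:p, :p] = Lam - np.diag(beta)                     # environment block
off = p
for i in range(p):
    LamI[i, off:off + qs[i]] = beta[i] * alph[i]       # a type-i claim starts
    LamI[off:off + qs[i], off:off + qs[i]] = Ts[i]     # phase changes within it
    LamI[off:off + qs[i], i] = ts[i]                   # claim ends, back to i
    off += qs[i]

print(np.abs(LamI.sum(axis=1)).max())  # rows of an intensity matrix sum to 0
```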
The reasons for using the fluid representation are twofold. First, the probability in the Markov-modulated model of upcrossing level u in state i of {Jt} and phase α ∈ E^(i) is the same as the probability that the fluid model upcrosses level u in state (i, α) of {It}. Second, in the fluid model Ee^{sVt} < ∞ for all s, t, whereas Ee^{sSt} = ∞ for all t and all s ≥ s0 where s0 < ∞. This implies that in the fluid context, we have more martingales at our disposal.

Recall that in the phase-type case B̂i[s] = −α^(i)(T^(i) + sI)^{-1}t^(i). Let Σ denote the matrix

\[
\Sigma \;=\; \Delta_r^{-1}\Lambda_I \;=\;
\begin{pmatrix}
(\beta_i)_{\mathrm{diag}} - \Lambda &
\begin{matrix} -\beta_1\alpha^{(1)} & 0 & 0 \\ 0 & -\beta_2\alpha^{(2)} & 0 \\ 0 & 0 & -\beta_3\alpha^{(3)} \end{matrix} \\[2mm]
\begin{matrix} t^{(1)} & 0 & 0 \\ 0 & t^{(2)} & 0 \\ 0 & 0 & t^{(3)} \end{matrix} &
\begin{matrix} T^{(1)} & 0 & 0 \\ 0 & T^{(2)} & 0 \\ 0 & 0 & T^{(3)} \end{matrix}
\end{pmatrix} ,
\]

with the four blocks denoted by Σij, i, j = 1, 2, corresponding to the partitioning of Σ into components indexed by E, resp. E^(1) + ··· + E^(p).

Proposition 5.1 A complex number s satisfies

\[ \Bigl|\,\Lambda + \bigl(\beta_i(\hat B_i[-s] - 1)\bigr)_{\mathrm{diag}} + sI\,\Bigr| \;=\; 0 \]   (5.1)
if and only if s is an eigenvalue of Σ. If s is such a number, consider the vector a satisfying (Λ + (βi(B̂i[−s] − 1))_diag)a = −sa and the eigenvector b = (c; d) of Δ_r^{-1}Λ_I, where c, d correspond to the partitioning of b into components indexed by E, resp. E^(1) + ··· + E^(p). Then (up to a constant)

\[ c \;=\; a, \qquad d \;=\; (sI - \Sigma_{22})^{-1}\Sigma_{21}a \;=\; \sum_{i\in E} a_i\,(sI - T^{(i)})^{-1} t^{(i)} . \]
Proof. Using the well-known determinant identity

\[ \begin{vmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{vmatrix} \;=\; |\Sigma_{22}|\cdot\bigl|\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\bigr| , \]

with Σii replaced by Σii − sI, it follows that if

\[
\begin{vmatrix}
(\beta_i)_{\mathrm{diag}} - \Lambda - sI &
\begin{matrix} -\beta_1\alpha^{(1)} & 0 & 0 \\ 0 & -\beta_2\alpha^{(2)} & 0 \\ 0 & 0 & -\beta_3\alpha^{(3)} \end{matrix} \\[2mm]
\begin{matrix} t^{(1)} & 0 & 0 \\ 0 & t^{(2)} & 0 \\ 0 & 0 & t^{(3)} \end{matrix} &
\begin{matrix} T^{(1)} - sI & 0 & 0 \\ 0 & T^{(2)} - sI & 0 \\ 0 & 0 & T^{(3)} - sI \end{matrix}
\end{vmatrix} \;=\; 0 ,
\]

then also

\[ \Bigl|\,(\beta_i)_{\mathrm{diag}} - \Lambda - sI + \bigl(\beta_i\alpha^{(i)}(T^{(i)} - sI)^{-1}t^{(i)}\bigr)_{\mathrm{diag}}\,\Bigr| \;=\; 0 , \]

which is the same as (5.1). For the assertions on the eigenvectors, assume that a is chosen as asserted, which means

\[ \bigl(\Sigma_{11} - sI + \Sigma_{12}(sI - \Sigma_{22})^{-1}\Sigma_{21}\bigr)a \;=\; 0 , \]

and let d = (sI − Σ22)^{-1}Σ21a, c = a. Then

\[ \Sigma_{21}c + \Sigma_{22}d \;=\; \Sigma_{21}a - (sI - \Sigma_{22} - sI)(sI - \Sigma_{22})^{-1}\Sigma_{21}a \;=\; \Sigma_{21}a - \Sigma_{21}a + sd \;=\; sd . \]

Noting that Σ11c + Σ12d = sc by definition, it follows that

\[ \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}\begin{pmatrix} c \\ d \end{pmatrix} \;=\; s\begin{pmatrix} c \\ d \end{pmatrix} . \qquad\Box \]
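Proposition 5.1 can also be checked numerically: the eigenvalues of Σ should be exactly the roots of (5.1). A sketch with hypothetical parameters (two environmental states, exponential claims, so qi = 1 and B̂i[−s] = δi/(δi + s)):

```python
import numpy as np

Lam  = np.array([[-1.0, 1.0], [1.0, -1.0]])     # environment intensity matrix
beta = np.array([2.0, 3.0])                     # arrival rates
dlt  = np.array([5.0, 6.0])                     # B_i = Exp(dlt[i])

# Sigma = Delta_r^{-1} Lambda_I in block form (q_1 = q_2 = 1):
Sigma = np.block([[np.diag(beta) - Lam, -np.diag(beta)],
                  [np.diag(dlt),        -np.diag(dlt)]])

# Every eigenvalue s of Sigma should make
#   Lam + (beta_i (B_i-hat[-s] - 1))_diag + s I   singular, cf. (5.1).
ratios = []
for s in np.linalg.eigvals(Sigma):
    M = Lam + np.diag(beta * (dlt / (dlt + s) - 1.0)) + s * np.eye(2)
    sv = np.linalg.svd(M, compute_uv=False)
    ratios.append(sv[-1] / sv[0])   # relative smallest singular value
print(max(ratios))   # ~ 0
```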
Theorem 5.2 Assume that Σ = Δ_r^{-1}Λ_I has q = q1 + ··· + qp distinct eigenvalues s1, ..., sq with ℜsν > 0 and corresponding eigenvectors (c^(ν); d^(ν)). For u, v > 0, define

\[ \omega(u, v) = \inf\{t > 0 : V_t = u \text{ or } V_t = -v\}, \qquad \omega(u) = \inf\{t > 0 : V_t = u\} , \]
\[ p_i(u, v; j, \alpha) = P_i\bigl(V_{\omega(u,v)} = u,\ I_{\omega(u,v)} = (j, \alpha)\bigr) , \]
\[ p_i(u, v; j) = P_i\bigl(V_{\omega(u,v)} = -v,\ I_{\omega(u,v)} = j\bigr) , \]
\[ p_i(u; j, \alpha) = P_i\bigl(\omega(u) < \infty,\ I_{\omega(u)} = (j, \alpha)\bigr) . \]
Optional stopping at time ω(u, v) yields

\[ c_i^{(\nu)} \;=\; e^{-s_\nu u}\sum_{j,\alpha} p_i(u, v; j, \alpha)\,d_{j,\alpha}^{(\nu)} \;+\; e^{s_\nu v}\sum_{j} p_i(u, v; j)\,c_j^{(\nu)} . \]

Letting v → ∞ and using [...] one arrives at a representation of the ruin probability of the form

\[ \psi_i(u) \;=\; \theta^{(i)} e^{Uu} e . \]   (5.3)
Proof. We decompose M in the familiar way as a sum of ladder steps. Associated with each ladder step is a phase process, with phase space E^(j) whenever the corresponding arrival occurs in environmental state j (the ladder step is of type j). Piecing together these phase processes yields a terminating Markov process with state space the disjoint union of the E^(i), i ∈ E, intensity matrix U, say, and lifelength M, and it just remains to check that U has the asserted form. Starting from J0 = i, the initial value of (i, α) is obviously chosen according to θ^(i). For a transition from (j, α) to (k, γ) to occur when j ≠ k, the current ladder step of type j must terminate, which occurs at rate t_α^(j), and a new ladder step of type k must start in phase γ, which occurs w.p. θ_{kγ}^(j). This yields the asserted form of u_{jα,kγ}. For j = k, we have the additional possibility of a phase change from α to γ within the ladder step, which occurs at rate t_{αγ}^(j). □

Notes and references Section 5a is based upon Asmussen [63] and Section 5b upon Asmussen [59]. Numerical illustrations are given in Asmussen & Rolski [97]. The connection to fluid models is further exploited in a series of papers by Ahn & Ramaswami, e.g. [9, 10]. They also involve the connection to quasi-birth-death processes, defined as birth-death processes in a Markovian environment and with some modification at the boundary 0. See also Badescu et al. [116]. First passage times for Markov additive processes with positive jumps of phase type are discussed in Breuer [201].
6 Matrix-exponential distributions
When deriving explicit or algorithmically tractable expressions for the ruin probability, we have so far concentrated on a claim size distribution B of phase-type. However, in many cases where such expressions are available there are classical results from the pre-phase-type era which give alternative solutions under the slightly more general assumption that B has a Laplace–Stieltjes transform (or, equivalently, an m.g.f.) which is rational, i.e. the ratio between two polynomials (for the form of the density, see Example I.2.5). An alternative characterization is that such a distribution is matrix-exponential, i.e. that the density b(x) can be written as αe^{Tx}t for some row vector α, some square matrix T and some column vector t (the triple (α, T, t) is the representation of the matrix-exponential distribution/density):

Proposition 6.1 Let b(x) be an integrable function on [0, ∞) and b*[θ] = ∫₀^∞ e^{−θx}b(x) dx its Laplace transform. Then b*[θ] is rational if and only if b(x) is matrix-exponential. Furthermore, if

\[ b^*[\theta] \;=\; \frac{b_1 + b_2\theta + b_3\theta^2 + \cdots + b_n\theta^{n-1}}{\theta^n + a_1\theta^{n-1} + \cdots + a_{n-1}\theta + a_n} , \]   (6.1)
then a matrix-exponential representation is given by b(x) = αe^{Tx}t, where

\[ \alpha \;=\; (b_1\; b_2\; \ldots\; b_{n-1}\; b_n), \qquad t \;=\; (0\; 0\; \ldots\; 0\; 1)^{\mathsf T}, \]   (6.2)

\[ T \;=\; \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
-a_n & -a_{n-1} & -a_{n-2} & \cdots & -a_1
\end{pmatrix} . \]   (6.3)
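The construction (6.2)–(6.3) is easy to implement. A sketch with a hypothetical rational transform; the check is that α(θI − T)^{-1}t reproduces the original ratio of polynomials:

```python
import numpy as np

def companion_rep(b, a):
    # (alpha, T, t) from b*[th] = (b1 + b2 th + ... + bn th^{n-1}) /
    # (th^n + a1 th^{n-1} + ... + an), following (6.2)-(6.3);
    # b is padded with zeros to length n = len(a).
    n = len(a)
    alpha = np.array(b, dtype=float)
    T = np.zeros((n, n))
    T[:-1, 1:] = np.eye(n - 1)                    # superdiagonal of ones
    T[-1, :] = -np.asarray(a, dtype=float)[::-1]  # last row: -an, ..., -a1
    t = np.zeros(n); t[-1] = 1.0
    return alpha, T, t

# Hypothetical transform b*[th] = (2 + th)/(th^3 + 3 th^2 + 4 th + 2):
alpha, T, t = companion_rep([2.0, 1.0, 0.0], [3.0, 4.0, 2.0])
errs = []
for th in (0.5, 1.0, 2.0):
    lhs = alpha @ np.linalg.solve(th * np.eye(3) - T, t)
    errs.append(abs(lhs - (2 + th) / (th**3 + 3*th**2 + 4*th + 2)))
print(max(errs))   # ~ 0
```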
Proof. If b(x) = αe^{Tx}t, then b*[θ] = α(θI − T)^{-1}t, which is rational since each element of (θI − T)^{-1} is so. Thus, matrix-exponentiality implies a rational transform. The converse follows from the last statement of the theorem. For a proof, see Asmussen & Bladt [74] (the representation (6.2), (6.3) was suggested by Colm O'Cinneide, personal communication). □

Remark 6.2 A remarkable feature of Proposition 6.1 is that it gives an explicit Laplace transform inversion which may appear more appealing than the first attempt to invert b*[θ] one would make, namely to assume the roots δ1, ..., δn of the denominator to be distinct and expand the r.h.s. of (6.1) as Σ_{i=1}^n ci/(θ + δi), giving b(x) = Σ_{i=1}^n ci e^{−δi x}. □

Example 6.3 A set of necessary and sufficient conditions for a distribution to be phase-type are given in O'Cinneide [670]. One of his elementary criteria, b(x) > 0 for x > 0, shows that the distribution B with density b(x) = c(1 − cos(2πx))e^{−x}, where c = 1 + 1/4π², cannot be phase-type. Writing

\[ b(x) \;=\; c\bigl(-e^{(2\pi i - 1)x}/2 - e^{(-2\pi i - 1)x}/2 + e^{-x}\bigr) , \]

it follows that a matrix-exponential representation (β, S, s) is given by

\[ \beta = (1\;1\;1), \qquad S = \begin{pmatrix} 2\pi i - 1 & 0 & 0 \\ 0 & -2\pi i - 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \qquad s = \begin{pmatrix} -c/2 \\ -c/2 \\ c \end{pmatrix} . \]   (6.4)

This representation is complex, but as follows from Proposition 6.1, we can always obtain a real one (α, T, t). Namely, since

\[ b^*[\theta] \;=\; \frac{1 + 4\pi^2}{\theta^3 + 3\theta^2 + (3 + 4\pi^2)\theta + 1 + 4\pi^2} , \]

it follows by (6.2), (6.3) that we can take

\[ \alpha = (1 + 4\pi^2 \;\; 0 \;\; 0), \qquad T = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 - 4\pi^2 & -3 - 4\pi^2 & -3 \end{pmatrix}, \qquad t = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} . \qquad\Box \]
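The real representation of Example 6.3 can be verified numerically against the density b(x) = c(1 − cos(2πx))e^{−x}; since T has distinct eigenvalues (−1 and −1 ± 2πi), e^{Tx} can be computed by diagonalization:

```python
import numpy as np

c = 1 + 1 / (4 * np.pi**2)
alpha = np.array([1 + 4 * np.pi**2, 0.0, 0.0])
T = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [-1 - 4 * np.pi**2, -3 - 4 * np.pi**2, -3.0]])
t = np.array([0.0, 0.0, 1.0])

w, V = np.linalg.eig(T)            # distinct eigenvalues -1, -1 +/- 2*pi*i
Vinv = np.linalg.inv(V)
def b(x):                          # b(x) = alpha exp(Tx) t via diagonalization
    return (alpha @ V @ np.diag(np.exp(w * x)) @ Vinv @ t).real

errs = [abs(b(x) - c * (1 - np.cos(2 * np.pi * x)) * np.exp(-x))
        for x in (0.3, 1.0, 2.5)]
print(max(errs))   # ~ 0
```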
Example 6.4 This example shows why it is sometimes useful to work with matrix-exponential distributions instead of phase-type distributions: for dimensionality reasons. Consider the distribution with density

\[ b(x) \;=\; \frac{15}{7 + 15\delta}\, e^{-x}\bigl((2e^{-2x} - 1)^2 + \delta\bigr) . \]
Then it is known from O'Cinneide [670] that b is phase-type when δ > 0, and that the minimal number of phases in a phase-type representation increases to ∞ as δ ↓ 0, leading to matrix calculus in high dimensions when δ is small. But since

\[ b^*[\theta] \;=\; \frac{15(1 + \delta)\theta^2 + 120\delta\theta + 225\delta + 105}{(7 + 15\delta)\theta^3 + (135\delta + 63)\theta^2 + (161 + 345\delta)\theta + 225\delta + 105} , \]

Proposition 6.1 shows that a matrix-exponential representation can always be obtained in dimension only 3, independently of δ. □

As for the role of matrix-exponential distributions in ruin probability calculations, we shall only consider the compound Poisson model with arrival rate β and a matrix-exponential claim size distribution B, and present two algorithms for calculating ψ(u) in that setting. For the first, we take as starting point a representation of b*[θ] as p(θ)/q(θ) where p, q are polynomials without common roots. Then (cf. Corollary IV.3.4) the Laplace transform of the ruin probability is

\[ \hat\psi[-\theta] \;=\; \int_0^\infty e^{-\theta u}\,\psi(u)\,du \;=\; \frac{\beta - \beta p(\theta)/q(\theta) - \rho\theta}{\theta\bigl(\beta - \theta - \beta p(\theta)/q(\theta)\bigr)} . \]   (6.5)

Thus, we have represented ψ̂[−θ] as a ratio of polynomials (note that θ must necessarily be a root of the numerator and cancels), and can use this to invert by the method of Proposition 6.1 to get ψ(u) = βe^{Su}s.

For the second algorithm, we use a representation (α, T, t) of b(x). We recall (see Section 3; recall that t = −Te) that if B is phase-type and (α, T, t) a phase-type representation with α the initial vector, T the phase generator and t = −Te, then

\[ \psi(u) \;=\; -\alpha_+ e^{(T + t\alpha_+)u}\, T^{-1}t \qquad\text{where}\qquad \alpha_+ = -\beta\alpha T^{-1} . \]   (6.6)

The remarkable fact is that, although the proof of (6.6) in Section 3 seems to use the probabilistic interpretation of phase-type distributions in an essential way, we have:

Proposition 6.5 (6.6) holds true also in the matrix-exponential case.
Proof. Write

\[ b^* = \alpha(\theta I - T)^{-1}t, \qquad b_+^* = \alpha_+(\theta I - T)^{-1}t, \qquad b_+^{**} = \alpha_+(\theta I - T)^{-1}T^{-1}t . \]

Then in Laplace transform formulation, the assertion is equivalent to

\[ -\alpha_+(\theta I - T - t\alpha_+)^{-1}T^{-1}t \;=\; \frac{\beta - \beta b^* - \rho\theta}{\theta(\beta - \theta - \beta b^*)} , \]   (6.7)

cf. (6.5), (6.6). Presumably, this can be verified by analytic continuation from the phase-type domain to the matrix-exponential domain, but we shall give an algebraic proof. From the general matrix identity ([789, p. 519])

\[ (A + UBV)^{-1} \;=\; A^{-1} - A^{-1}UB(B + BVA^{-1}UB)^{-1}BVA^{-1} , \]

with A = θI − T, U = −t, B = 1 and V = α+, we get

\[
(\theta I - T - t\alpha_+)^{-1}
\;=\; (\theta I - T)^{-1} + (\theta I - T)^{-1}t\bigl(1 - \alpha_+(\theta I - T)^{-1}t\bigr)^{-1}\alpha_+(\theta I - T)^{-1}
\;=\; (\theta I - T)^{-1} + \frac{1}{1 - b_+^*}\,(\theta I - T)^{-1}t\alpha_+(\theta I - T)^{-1} ,
\]

so that

\[ -\alpha_+(\theta I - T - t\alpha_+)^{-1}T^{-1}t \;=\; -b_+^{**} - \frac{b_+^*\,b_+^{**}}{1 - b_+^*} \;=\; \frac{b_+^{**}}{b_+^* - 1} . \]

Now, since

\[ (\theta I - T)^{-1}T^{-1} \;=\; \frac{1}{\theta}\bigl(T^{-1} + (\theta I - T)^{-1}\bigr), \qquad
(\theta I - T)^{-1}T^{-2} \;=\; \frac{1}{\theta}T^{-2} + \frac{1}{\theta^2}T^{-1} + \frac{1}{\theta^2}(\theta I - T)^{-1} \]

and

\[ 1 \;=\; \int_0^\infty b(x)\,dx \;=\; -\alpha T^{-1}t, \qquad \mu_B \;=\; \int_0^\infty x\,b(x)\,dx \;=\; \alpha T^{-2}t , \]

we get

\[
b_+^* \;=\; -\beta\alpha T^{-1}(\theta I - T)^{-1}t \;=\; -\beta\alpha(\theta I - T)^{-1}T^{-1}t
\;=\; -\frac{\beta}{\theta}\,\alpha\bigl(T^{-1} + (\theta I - T)^{-1}\bigr)t \;=\; \frac{\beta}{\theta}(1 - b^*) ,
\]

\[
b_+^{**} \;=\; -\beta\alpha T^{-1}(\theta I - T)^{-1}T^{-1}t \;=\; -\beta\alpha(\theta I - T)^{-1}T^{-2}t
\;=\; -\beta\alpha\Bigl(\frac{1}{\theta}T^{-2} + \frac{1}{\theta^2}T^{-1} + \frac{1}{\theta^2}(\theta I - T)^{-1}\Bigr)t
\;=\; -\frac{\rho}{\theta} + \frac{\beta}{\theta^2} - \frac{\beta}{\theta^2}b^* .
\]

From this it is straightforward to check that b+**/(b+* − 1) is the same as the r.h.s. of (6.7). □
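The identity (6.7) can be confirmed numerically before (or instead of) following the algebra. A sketch with a hypothetical hyperexponential example (which is phase-type, but only the triple (α, T, t) enters the computation):

```python
import numpy as np

beta = 3.0
alpha = np.array([0.5, 0.5])
T = np.array([[-3.0, 0.0], [0.0, -7.0]])
t = -T @ np.ones(2)
alpha_plus = -beta * alpha @ np.linalg.inv(T)
rho = beta * (alpha @ np.linalg.inv(T) @ np.linalg.inv(T) @ t)  # beta * mu_B

errs = []
for th in (0.7, 1.3, 2.9):
    I = np.eye(2)
    lhs = -alpha_plus @ np.linalg.inv(th * I - T - np.outer(t, alpha_plus)) \
          @ np.linalg.inv(T) @ t
    bstar = alpha @ np.linalg.solve(th * I - T, t)
    rhs = (beta - beta * bstar - rho * th) / (th * (beta - th - beta * bstar))
    errs.append(abs(lhs - rhs))
print(max(errs))   # ~ 0: both sides of (6.7) agree
```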
Notes and references As noted in the references to Section 4, some key early references using distributions with a rational transform for applied probability calculations are Täcklind [826] (ruin probabilities) and Smith [814] (queueing theory). A key tool is identifying poles and zeros of transforms via Wiener–Hopf factorization. Much of the flavor of this classical approach and many examples are in Cohen [249]; see also Dufresne [332] and Kuznetsov [563] for a recent discussion. For expositions on the general theory of matrix-exponential distributions, see Asmussen & Bladt [74], Lipsky [600] and Asmussen & O'Cinneide [94]; a key early paper is Cox [264] (from where the distribution in Example 6.3 is taken). The proof of Proposition 6.5 is similar to arguments used in [74] for formulas in renewal theory. Some relevant more recent references on matrix-exponential distributions are Bean, Fackrell & Taylor [150], Bladt & Neuts [173] and Fackrell [360].
7 Reserve-dependent premiums
We consider the model of Chapter VIII with Poisson arrivals at rate β, premium rate p(r) at level r of the reserve {Rt} and claim size distribution B, which we assume to be of phase-type with representation (E, α, T). In Corollary VIII.1.9, the ruin probability ψ(u) was found in explicit form for the case of B being exponential (for some remarkable explicit formulas due to Paulsen & Gjessing [687], see the Notes to VIII.1; the argument of [687], however, does not apply in any reasonable generality). We present here first a computational approach for the general phase-type case (Section 7a) and next (Section 7b) a set of formulas covering the case of a two-step premium rule, cf. VIII.1a.
7a Computing ψ(u) via differential equations
The representation we use is essentially the same as the one used in Sections 3 and 4: we piece together the phases at downcrossing times of {Rt} (upcrossing times of {St}) to a Markov process {mx} with state space E. See Fig. IX.9, which is self-explanatory given Fig. IX.6.
Figure IX.9

The difference from the case p(r) ≡ p is that {mx}, though still Markov, is no longer time-homogeneous. Let P(t1, t2) be the matrix with ijth element P(m_{t2} = j | m_{t1} = i), 0 ≤ t1 ≤ t2 ≤ u. Define further νi(u) as the probability that the risk process starting from R0 = u downcrosses level u for the first time in phase i. Note that in general Σ_{i∈E} νi(u) < 1. In fact, Σ_{i∈E} νi(u) is the ruin probability for a risk process with initial reserve 0 and premium function p(u + ·). Also, in contrast to Section 3, the definition of {mx} depends on the initial reserve u = R0.

Since ν(u) = (νi(u))_{i∈E} is the (defective) initial probability vector for {mx}, we obtain

\[ \psi(u) \;=\; P(m_u \in E) \;=\; \nu(u)P(0, u)e \;=\; \lambda(u)e \]   (7.1)

where λ(t) = ν(u)P(0, t) is the vector of state probabilities for mt, i.e. λi(t) = P(mt = i). Given the ν(t) have been computed, the λ(t) and hence ψ(u) are available by solving differential equations:

Proposition 7.1 λ(0) = ν(u) and λ′(t) = λ(t)(T + tν(u − t)), 0 ≤ t ≤ u.
Proof. The first statement is clear by definition. By general results on time-inhomogeneous Markov processes,

\[ P(t_1, t_2) \;=\; \exp\Bigl\{\int_{t_1}^{t_2} Q(v)\,dv\Bigr\} \]   (7.2)

where

\[ Q(t) \;=\; \frac{d}{ds}\bigl[P(t, t + s) - I\bigr]\Big|_{s=0} . \]   (7.3)

However, the interpretation of Q(t) as the intensity matrix of {mx} at time t shows that Q(t) is made up of two terms: obviously, {mx} has jumps of two types, those corresponding to state changes in the underlying phase process and those corresponding to the present jump of {Rt} being terminated at level u − t and being followed by a downcrossing. The intensity of a jump from i to j is t_{ij} for jumps of the first type and t_i ν_j(u − t) for the second. Hence

\[ Q(t) = T + t\nu(u - t), \qquad \lambda'(t) = \lambda(t)Q(t) = \lambda(t)\bigl(T + t\nu(u - t)\bigr) . \qquad\Box \]
Thus, from a computational point of view the remaining problem is to evaluate the ν(t), 0 ≤ t ≤ u.

Proposition 7.2 For i ∈ E,

\[ -\nu_i'(u)p(u) \;=\; \beta\alpha_i + \nu_i(u)\Bigl\{\sum_{j\in E}\nu_j(u)t_j\,p(u) - \beta\Bigr\} + \sum_{j\in E}\nu_j(u)t_{ji}\,p(u) . \]   (7.4)

Proof. Consider the event A that there are no arrivals in the interval [0, dt], the probability of which is 1 − βdt. Given A^c, the probability that level u is downcrossed for the first time in phase i is αi. Given A, the probability that level u + p(u)dt is downcrossed for the first time in phase j is νj(u + p(u)dt). Given this occurs, two things can happen: either the current jump continues from u + p(u)dt to u, or it stops between levels u + p(u)dt and u. In the first case, the probability of downcrossing level u in phase i is

\[ \delta_{ji}\bigl(1 + p(u)dt\cdot t_{ii}\bigr) + (1 - \delta_{ji})\,p(u)dt\cdot t_{ji} \;=\; \delta_{ji} + p(u)t_{ji}\,dt , \]

whereas in the second case the probability is p(u)dt · t_j ν_i(u). Thus, given A, the probability of downcrossing level u in phase i for the first time is

\[ \sum_{j\in E}\nu_j\bigl(u + p(u)dt\bigr)\bigl[\delta_{ji} + p(u)dt\cdot t_{ji} + p(u)dt\cdot t_j\nu_i(u)\bigr]
\;=\; \nu_i(u) + \nu_i'(u)p(u)\,dt + p(u)\,dt\sum_{j\in E}\nu_j(u)\bigl\{t_{ji} + t_j\nu_i(u)\bigr\} . \]

Collecting terms, we get

\[ \nu_i(u) \;=\; \beta\alpha_i\,dt + (1 - \beta dt)\nu_i(u) + \nu_i'(u)p(u)\,dt + p(u)\,dt\sum_{j\in E}\nu_j(u)\bigl\{t_{ji} + t_j\nu_i(u)\bigr\} . \]

Subtracting νi(u) on both sides and dividing by dt yields the asserted differential equation. □
Subtracting νi (u) on both sides and dividing by dt yields the asserted differential equation. 2 When solving the differential equation in Proposition 7.2, we face the difficulty that no boundary conditions is immediately available. To deal with this, consider a modification of the original process {Rt } by linearizing the process with some rate ρ, say, after a certain level v, say. Let pv (t), Rtv , Pv etc. refer to the modified process. Then ½ p(r) r < v v p (r) = , ρ r≥v and (no matter how ρ is chosen) we have: Lemma 7.3 For any fixed u ≥ 0, νi (u) = lim νiv (u). v→∞
Proof. Let A be the event that the process downcrosses level u in phase i given that it starts at u, and let Bv be the event

\[ B_v \;=\; \Bigl\{\sigma < \infty,\ \sup_{t\le\sigma} R_t > v\Bigr\} \]

where σ denotes the time of downcrossing level u. Then P(Bv) is the tail of a (defective) random variable, so that P(Bv) → 0 as v → ∞, and similarly P^v(Bv) → 0. Since the processes Rt and R_t^v coincide below level v, P(A ∩ Bv^c) = P^v(A ∩ Bv^c). Now since both P(A ∩ Bv) → 0 and P^v(A ∩ Bv) → 0 as v → ∞, we have

\[ P(A) - P^v(A) \;=\; P(A \cap B_v) + P(A \cap B_v^c) - P^v(A \cap B_v) - P^v(A \cap B_v^c)
\;=\; P(A \cap B_v) - P^v(A \cap B_v) \;\to\; 0 \]

as v → ∞. □
From Section 3, we have

\[ p(r) \equiv \rho \;\Rightarrow\; \nu_i(u) \equiv -\frac{\beta}{\rho}\,\alpha T^{-1}e_i , \]   (7.5)

which implies that νi^v(v) is given by the r.h.s. of (7.5). Thus, for a given v we can first solve (7.4) backwards for {νi^v(t)}_{v≥t≥0}, starting from ν^v(v) = −βαT^{-1}/ρ. This yields νi^v(u) for any values of u and v such that u ≤ v. Next consider a sequence of solutions obtained from a sequence of initial values {νi^v(u)}_v where, say, v = u, 2u, 3u etc. Thus we obtain a convergent sequence of solutions that converges to {νi(t)}_{u≥t≥0}.

Notes and references The exposition is based upon Asmussen & Bladt [75], which also contains numerical illustrations. The algorithm based upon the numerical solution of a Volterra integral equation (Remark VIII.1.10, numerically implemented in Schock Petersen [695]) and the present one based upon differential equations both require discretization along a discrete grid 0, 1/n, 2/n, .... However, typically the complexity in n is at best O(n²) for integral equations but O(n) for differential equations. The actual precision depends on the particular numerical scheme being employed. The trapezoidal rule used in [695] gives a precision of O(n^{−3}), while the fourth-order Runge–Kutta method implemented in [75] gives O(n^{−5}).
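A quick consistency check of (7.4): with a constant premium p(r) ≡ p, (7.5) says that ν(u) is constant, so the right-hand side of (7.4) should vanish when evaluated at ν = −(β/p)αT⁻¹. A sketch with hypothetical parameters:

```python
import numpy as np

beta, p = 3.0, 1.0
alpha = np.array([0.5, 0.5])
T = np.array([[-3.0, 0.0], [0.0, -7.0]])
t = -T @ np.ones(2)
nu = -(beta / p) * alpha @ np.linalg.inv(T)   # the constant solution (7.5)

# RHS of (7.4), componentwise:
#   beta*alpha_i + nu_i * (sum_j nu_j t_j p - beta) + sum_j nu_j T_ji p
rhs = beta * alpha + nu * (nu @ t * p - beta) + (nu @ T) * p
print(np.abs(rhs).max())   # ~ 0, i.e. nu'(u) = 0 as it should be
```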
7b Two-step premium rules
We now assume the premium function to be constant at two levels as in VIII.1a,

\[ p(r) \;=\; \begin{cases} p_1 & r \le v \\ p_2 & r > v . \end{cases} \]   (7.6)

We may think of the process Rt as pieced together of two standard risk processes R_t^1 and R_t^2 with constant premiums p1, p2, such that Rt coincides with R_t^1 below level v and with R_t^2 above level v. Let ψ^(i)(u) = α+^(i) e^{(T + tα+^(i))u} e denote the ruin probability for R_t^i, where α+^(i) = −βαT^{-1}/p_i, cf. Corollary 3.1. We recall from Proposition VIII.1.12 that in addition to the ψ^(i)(·), the evaluation of ψ(u) requires

\[ \varphi_v(u) \;=\; \bigl(1 - \psi^{(1)}(u)\bigr)\big/\bigl(1 - \psi^{(1)}(v)\bigr), \qquad 0 \le u \le v , \]

which is available since the ψ^(i)(·) are so, as well as π(u), the probability of ruin between σ and the next upcrossing of v, where σ = inf{t ≥ 0 : Rt ≤ v}.

To evaluate π(u), let ν(u) = α+^(2) e^{(T + tα+^(2))(u−v)}, assuming u ≥ v for the moment. Then ν(u) is the initial distribution of the undershoot when downcrossing level v given that the process starts at u, i.e. for u ≥ v the distribution of v − Rσ (defined for σ < ∞ only) is defective phase-type with representation (ν(u), T). Recall that φ_v(w) is the probability of upcrossing level v before ruin given that the process starts at w ≤ v. Therefore

\[ \pi(u) \;=\; \int_0^v \nu(u)e^{Tx}t\,\bigl(1 - \varphi_v(v - x)\bigr)\,dx \;+\; \nu(u)e^{Tv}e \]   (7.7)
(the integral is the contribution from {Rσ ≥ 0} and the last term the contribution from {Rσ < 0}). The integral in (7.7) equals

\[ \int_0^v \nu(u)e^{Tx}t\,dx \;-\; \int_0^v \nu(u)e^{Tx}t\,\frac{1 - \psi^{(1)}(v - x)}{1 - \psi^{(1)}(v)}\,dx
\;=\; 1 - \nu(u)e^{Tv}e \;-\; \frac{1}{1 - \psi^{(1)}(v)}\Bigl\{1 - \nu(u)e^{Tv}e - \int_0^v \nu(u)e^{Tx}t\,\psi^{(1)}(v - x)\,dx\Bigr\} , \]

from which we see that

\[ \pi(u) \;=\; 1 + \frac{1}{1 - \psi^{(1)}(v)}\int_0^v \nu(u)e^{Tx}t\,\psi^{(1)}(v - x)\,dx \;-\; \frac{1}{1 - \psi^{(1)}(v)}\bigl(1 - \nu(u)e^{Tv}e\bigr) . \]   (7.8)

The integral in (7.8) equals

\[ \int_0^v \nu(u)e^{Tx}t\;\alpha_+^{(2)}e^{(T + t\alpha_+^{(2)})(v - x)}e\,dx , \]

which, using Kronecker calculus (see A.4), can be written as

\[ \Bigl(\nu(u) \otimes \alpha_+^{(2)}e^{(T + t\alpha_+^{(2)})v}\Bigr)\Bigl(T \oplus \bigl(-T - t\alpha_+^{(2)}\bigr)\Bigr)^{-1}\Bigl(e^{\{T \oplus (-T - t\alpha_+^{(2)})\}v} - I\Bigr)(t \otimes e) . \]
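The Kronecker expression can be tested against brute-force quadrature. A sketch below, using for concreteness the same hypothetical vectors that appear in Example 7.4 below, and a small hand-rolled matrix exponential to stay self-contained:

```python
import numpy as np

def expm(A, terms=30):
    # Minimal scaling-and-squaring matrix exponential (numpy only).
    n = A.shape[0]
    s = max(0, int(np.ceil(np.log2(max(1.0, np.abs(A).max() * n)))))
    B = A / 2.0**s
    out, term = np.eye(n), np.eye(n)
    for k in range(1, terms):
        term = term @ B / k
        out = out + term
    for _ in range(s):
        out = out @ out
    return out

T = np.array([[-3.0, 0.0], [0.0, -7.0]])
t = -T @ np.ones(2); e = np.ones(2)
a2 = np.array([2/3, 2/7])              # alpha_+^{(2)} as in Example 7.4 below
M = T + np.outer(t, a2)                # T + t alpha_+^{(2)}
nu, v = a2, 1.0

K = np.kron(T, np.eye(2)) + np.kron(np.eye(2), -M)   # T (+) (-T - t alpha_+^{(2)})
closed = (np.kron(nu, a2 @ expm(M * v)) @ np.linalg.inv(K)
          @ (expm(K * v) - np.eye(4)) @ np.kron(t, e))

xs = np.linspace(0.0, v, 2001)
vals = np.array([(nu @ expm(T * x) @ t) * (a2 @ expm(M * (v - x)) @ e)
                 for x in xs])
quad = (xs[1] - xs[0]) * (vals.sum() - 0.5 * (vals[0] + vals[-1]))
print(closed - quad)   # ~ 0 (up to trapezoidal quadrature error)
```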
Thus, all quantities involved in the computation of ψ(u) have been found in matrix form.

Example 7.4 Let {R_t^1} be as in Example 3.2. I.e., B is hyperexponential corresponding to

\[ \alpha = \Bigl(\tfrac12 \;\; \tfrac12\Bigr), \qquad T = \begin{pmatrix} -3 & 0 \\ 0 & -7 \end{pmatrix}, \qquad t = \begin{pmatrix} 3 \\ 7 \end{pmatrix} . \]

The arrival rate is β = 3. Since μ_B = 5/21, any p2 ≤ 3 · 5/21 = 5/7 yields ψ(u) = 1, so we consider the nontrivial example p2 = 3/4 and p1 = 1. From Example 3.2,

\[ \psi^{(1)}(u) \;=\; \frac{24}{35}e^{-u} + \frac{1}{35}e^{-6u} \;\Rightarrow\; \varphi_v(u) \;=\; \frac{35 - 24e^{-u} - e^{-6u}}{35 - 24e^{-v} - e^{-6v}} . \]

Let λ1 = −3 + 2√2 and λ2 = −3 − 2√2 be the eigenvalues of T + tα+^(2). Then one gets

\[ \nu(u) \;=\; \Bigl(\frac{\sqrt2 + 1}{3}e^{\lambda_1(u-v)} + \frac{1 - \sqrt2}{3}e^{\lambda_2(u-v)},\;\; \frac{1}{7}e^{\lambda_1(u-v)} + \frac{1}{7}e^{\lambda_2(u-v)}\Bigr) , \]

\[ \psi^{(2)}(u - v) \;=\; \Bigl(\frac{10}{21} + \frac{\sqrt2}{3}\Bigr)e^{\lambda_1(u-v)} + \Bigl(\frac{10}{21} - \frac{\sqrt2}{3}\Bigr)e^{\lambda_2(u-v)}, \qquad \psi^{(2)}(0) \;=\; \frac{20}{21} . \]
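The numbers in Example 7.4 are easy to reproduce numerically:

```python
import numpy as np

beta, p2 = 3.0, 0.75
alpha = np.array([0.5, 0.5])
T = np.array([[-3.0, 0.0], [0.0, -7.0]])
t = -T @ np.ones(2)

a2 = -(beta / p2) * alpha @ np.linalg.inv(T)   # alpha_+^{(2)}
M = T + np.outer(t, a2)
lam = np.sort(np.linalg.eigvals(M).real)
print(a2)         # (2/3, 2/7)
print(lam)        # -3 - 2*sqrt(2), -3 + 2*sqrt(2)
print(a2.sum())   # psi^{(2)}(0) = 20/21
```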
From (7.7) we see that we can write π(u) = ν(u)V2 where V2 depends only on v, and one gets

\[ V_2 \;=\; \begin{pmatrix} \dfrac{12e^{5v} - 2}{35e^{6v} - 24e^{5v} - 1} \\[3mm] \dfrac{4e^{5v} + 6}{35e^{6v} - 24e^{5v} - 1} \end{pmatrix} . \]

Thus, π(u) = p12(u)/p11(u) where

\[ p_{11}(u) \;=\; 35e^{6v} - 24e^{5v} - 1 , \]
\[ p_{12}(u) \;=\; \Bigl(\frac{32}{7} + 4\sqrt2\Bigr)e^{5v}e^{\lambda_1(u-v)} + \Bigl(\frac{4}{21} - \frac{2\sqrt2}{3}\Bigr)e^{\lambda_1(u-v)}
+ \Bigl(\frac{32}{7} - 4\sqrt2\Bigr)e^{5v}e^{\lambda_2(u-v)} + \Bigl(\frac{4}{21} + \frac{2\sqrt2}{3}\Bigr)e^{\lambda_2(u-v)} . \]

In particular,

\[ \pi(v) \;=\; \frac{192e^{5v} + 8}{21\bigl(35e^{6v} - 24e^{5v} - 1\bigr)}, \qquad \psi(v) \;=\; \frac{192e^{5v} + 8}{35e^{6v} + 168e^{5v} + 7} . \]

Thus all terms involved in the formulae for the ruin probability have been explicitly derived. □

Notes and references The analysis and the example are from Asmussen & Bladt [75].
8 Erlangization for the finite horizon case
We consider the Cramér–Lundberg model with parameters β, B and recall from Corollary V.3.5 that an explicit formula for the Laplace transform w.r.t. u of E[e^{−δτ(u)}; τ(u) < ∞] can be found in terms of the root −ρ_δ < 0 of

\[ \kappa(r) \;=\; \beta\bigl(\hat B[r] - 1\bigr) - r \;=\; \delta . \]   (8.1)

Thus the finite horizon ruin probability ψ(u, T) can in principle be computed exactly via a double Laplace transform inversion. Now transform inversion is never entirely straightforward, and even less so when it is higher-dimensional. We present in this section a numerical scheme that basically only requires a root finding and the computation of a matrix-exponential, under the assumption that the claim size distribution is phase-type (E, α, T).

The basic idea is to replace the deterministic time horizon T by a r.v. Hk that has an Erlang distribution with k stages and mean T, that is, with density

\[ \frac{\delta^k t^{k-1}}{(k-1)!}\,e^{-\delta t} \qquad\text{where}\qquad \delta = \delta_k = k/T . \]

That is, we compute

\[ \psi_k(u) \;=\; \mathbb{E}\psi(u, H_k) \;=\; \int_0^\infty \psi(u, t)\,\frac{\delta^k t^{k-1}}{(k-1)!}\,e^{-\delta t}\,dt . \]   (8.2)

Since the s.c.v. of the Erlang distribution goes to 0 as k → ∞, of course also ψk(u) → ψ(u, T). The case k = 1 of an exponential time horizon then comes out fairly easily, whereas a simple recursion scheme exists for going from k to k + 1. Combining with an extrapolation idea yields a considerable improvement of the numerical scheme. The approximation ψ(u, T) ≈ ψk(u) could be called Erlang smoothing. Namely, (8.2) means that we approximate ψ(u, T) by the function ψ(u, t) of t smoothed by the kernel which is the Erlang density with mean T. Cf. Fig. IX.10.
[Figure IX.10: Erlang smoothing — Erlang kernels with n = 1, 4, 16, 100 stages, concentrating around the horizon T as the number of stages grows.]

Proceeding to the details, one may first note that the model is a special case of the Markovian environment model in Chapter VII. Namely, the state Jt of the environment at time t is the current exponential stage 1, ..., k of Hk. However, we have the difference that here the environmental process J is
terminating, whereas Chapter VII concentrates on J being ergodic. Indeed, J has terminated by time t if Hk < t. Nevertheless, we may proceed along similar ideas as in VII.2, only now we reverse the sign and not the time. More precisely, we define Yx ∈ Ek = {1, ..., k} × E to have the value (i, j) if the upcrossing of {St} of level x occurs in state j of the phase-type jump leading to the upcrossing and if Hk at that time is in stage i. Obviously, {Yx} is a Markov process, and it is terminating since Hk < ∞. Furthermore, a jump occurs in two ways: as a consequence of a jump in the Markov process underlying the current claim — this changes only j, not i, so that the matrix of corresponding rates is I ⊕ T. Or the current claim may terminate, in which case the new state will be (k, ℓ), with k ≥ i and ℓ ∈ E the phase at the next ladder point. Denoting by α^(k) the row vector of state probabilities when the first ladder point of {St} occurs in Erlang stage k, it follows that the intensity matrix U of Y is given by

\[ U \;=\; I \oplus T \;+\;
\begin{pmatrix}
t\alpha^{(1)} & t\alpha^{(2)} & t\alpha^{(3)} & \cdots & t\alpha^{(k)} \\
0 & t\alpha^{(1)} & t\alpha^{(2)} & \cdots & t\alpha^{(k-1)} \\
0 & 0 & t\alpha^{(1)} & \cdots & t\alpha^{(k-2)} \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & t\alpha^{(1)}
\end{pmatrix} \]

(recall that t = −Te denotes the exit vector of the phase-type distribution). We further get ψk(u) = α* e^{Uu} e where α* = (α^(1) α^(2) ... α^(k)). Thus, it only remains to compute the α^(ℓ). We first consider the exponential case k = 1.

Theorem 8.1 Define α_δ = α^(1). Then α_δ = βα(ρ_δ I − T)^{-1}, where −ρ_δ is the negative root of κ(r) = δ, i.e. of β(α(−rI − T)^{-1}t − 1) − r = δ.
Proof. We condition upon the time t of the first claim, where S_{t−} = −t. The exponential time horizon exceeds t w.p. e^{−δt}, and so, proceeding again along the lines of the second proof of Corollary 3.1, we conclude that

\[ \alpha_\delta \;=\; \int_0^\infty \beta e^{-\beta t}\,e^{-\delta t}\,\alpha e^{(T + t\alpha_\delta)t}\,dt \;=\; \beta\alpha\bigl((\beta + \delta)I - T - t\alpha_\delta\bigr)^{-1} . \]

Thus

\[ (\beta + \delta)\alpha_\delta - \alpha_\delta T - \alpha_\delta t\alpha_\delta \;=\; \beta\alpha . \]   (8.3)

For brevity, write ν = α(ρ_δ I − T)^{-1}. We will show that α_δ = βν satisfies (8.3). We first note that the definition of ρ_δ implies βνt = β + δ − ρ_δ and that

\[ \nu T \;=\; \alpha(\rho_\delta I - T)^{-1}(-\rho_\delta I + T) + \alpha(\rho_\delta I - T)^{-1}\rho_\delta I \;=\; -\alpha + \rho_\delta\nu . \]
Inserting α_δ = βν in the l.h.s. of (8.3), we therefore obtain

\[ (\beta^2 + \beta\delta)\nu + \beta\alpha - \beta\rho_\delta\nu - \beta(\beta + \delta - \rho_\delta)\nu , \]

which equals βα, as it should. We omit the proof that α(ρ_δ I − T)^{-1} is the correct one among the solutions of (8.3). □

For a general k, we have the following recursion:

Theorem 8.2

\[ \alpha_\delta^{(n+1)} \;=\; \Bigl(\delta\alpha_\delta^{(n)} + \sum_{j=2}^{n}\alpha_\delta^{(n+2-j)}\,t\,\alpha_\delta^{(j)}\Bigr)\bigl(\rho_\delta I - T - t\alpha_\delta^{(1)}\bigr)^{-1} . \]
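The exponential case of Theorem 8.1 is easy to check numerically: find −ρ_δ by bisection and verify that α_δ = βα(ρ_δI − T)⁻¹ solves (8.3). A sketch with hypothetical parameters (hyperexponential claims):

```python
import numpy as np

beta, delta = 3.0, 1.0
alpha = np.array([0.5, 0.5])
T = np.array([[-3.0, 0.0], [0.0, -7.0]])
t = -T @ np.ones(2)
I = np.eye(2)

def kappa(r):
    # kappa(r) = beta(B-hat[r] - 1) - r with B-hat[r] = alpha(-rI - T)^{-1} t
    return beta * (alpha @ np.linalg.solve(-r * I - T, t) - 1.0) - r

# bisection for the negative root -rho_delta of kappa(r) = delta
lo, hi = -20.0, -1e-6        # kappa is decreasing for r < 0 here
for _ in range(200):
    mid = (lo + hi) / 2
    if kappa(mid) > delta:
        lo = mid
    else:
        hi = mid
rho = -(lo + hi) / 2
a_delta = beta * alpha @ np.linalg.inv(rho * I - T)

# residual of (8.3): (beta + delta) a - a T - (a t) a - beta alpha
res = (beta + delta) * a_delta - a_delta @ T - (a_delta @ t) * a_delta \
      - beta * alpha
print(np.abs(res).max())   # ~ 0
```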
The proof is more complicated, and we refer to Asmussen, Avram & Usabel [71].

The algorithm can be improved by Richardson extrapolation. This is a general method (see e.g. Press et al. [715]) for computing a number w accurately using a sequence wk → w for which the convergence rate is known,

\[ w - w_k \;=\; \frac{c}{k} + \frac{d}{k^{1+\epsilon}} + \cdots . \]   (8.4)
Here c is typically unknown but can be eliminated. Indeed, letting w_k* = (k + 1)w_{k+1} − kw_k, it is clear that w_k* → w and that one obtains an improved approximation of convergence rate O(k^{−1−ε}). In the present setting, w = ψ(u, T), w_k = ψ_k(u), and (8.4) simply follows by the CLT for the underlying Erlang r.v. Hk:

\[ \psi_k(u) \;=\; \mathbb{E}\psi(u, H_k) \;=\; \mathbb{E}\bigl[\psi(u, T) + \psi_T(u, T)(H_k - T) + \psi_{TT}(u, T)(H_k - T)^2/2 + \cdots\bigr]
\;=\; \psi(u, T) + 0 + \psi_{TT}(u, T)\,\mathrm{Var}(H_k)/2 + \cdots \;=\; \psi(u, T) + \frac{c}{k} + \cdots , \]

where as usual ψ_T, ψ_{TT} are the first and second order partial derivatives of ψ w.r.t. T.

Example 8.3 For an illustration of the method, consider a highly skewed claim size distribution, namely a mixture of three exponential distributions with rates 0.015, 0.190, 5.51 and corresponding weights 0.004, 0.108 and 0.888. Choose the safety loading η = 0.1 and T = 1, u = 0. The exact value ψ(u, T) = 2.28% was calculated by transform inversion. Figure IX.11 shows the results of the Erlangization, with the circles corresponding to the simple method and the filled ones to the extrapolated values.
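The effect of the extrapolation w_k* = (k + 1)w_{k+1} − kw_k is easy to see on a toy sequence with the structure (8.4) (taking ε = 1 for concreteness; the numbers are hypothetical):

```python
import numpy as np

# Toy sequence w_k = w + c/k + d/k^2 with known limit w:
w, c, d = 0.0228, 0.4, -0.3
ks = np.arange(1, 9)
wk = w + c / ks + d / ks**2
wk_star = (ks[:-1] + 1) * wk[1:] - ks[:-1] * wk[:-1]   # Richardson step

print(abs(wk[-1] - w))        # plain error, dominated by c/k
print(abs(wk_star[-1] - w))   # extrapolated: the c/k term is eliminated
```

The extrapolated value has error exactly |d|/(k(k+1)) here: the leading c/k term cancels, which is what turns O(k⁻¹) into O(k⁻¹⁻ᵉ) convergence.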
8. ERLANGIZATION FOR THE FINITE HORIZON CASE
[Figure IX.11: Erlangization and extrapolation. ψ(u, T) plotted against k = 1, . . . , 8; open circles show the plain Erlangization values, filled circles the extrapolated values.]

It is seen that the simple method produces good results even in the range k = 5–7. This is perhaps somewhat surprising, since the Erlang(7) distribution is quite far from being degenerate. The precision of the extrapolation method is remarkable; even the value for k = 1 would suffice for all practical purposes! □

Notes and references The exposition is based upon Asmussen, Avram & Usabel [71] (who also consider general phase-type horizons). Note, however, that Theorem 8.1 appears already in Avram & Usabel [112]. See also Ramaswami, Woolford & Stanford [723]. Exponential/Erlangian time horizons have also been used in finance, where the idea is known as Canadization. An early classical reference for the exponential case is Carr [223]. Erlangian horizons occur, e.g., in Kyprianou & Pistorius [566].
Chapter X
Ruin probabilities in the presence of heavy tails

1 Subexponential distributions
We are concerned with distributions B with a heavy right tail B̄(x) = 1 − B(x). A rough distinction between light and heavy tails is that the m.g.f. B̂[r] = ∫ e^{rx} B(dx) is finite for some r > 0 in the light-tailed case and infinite for all r > 0 in the heavy-tailed case. For example, the exponential change of measure techniques discussed in III.3, IV.4–6 and at numerous later occasions require a light tail. Some main cases where this light-tail criterion is violated are:

(a) distributions with a regularly varying tail, B̄(x) = L(x)/x^α, where α > 0 and L(x) is slowly varying, i.e. L(tx)/L(x) → 1, x → ∞, for all t > 0;

(b) the lognormal distribution (the distribution of e^U where U ∼ N(µ, σ²)) with density

    (1/(x√(2πσ²))) e^{−(log x − µ)²/(2σ²)} ;

(c) the Weibull distribution with decreasing failure rate, B̄(x) = e^{−x^β} with 0 < β < 1.

For further examples, see I.2b.

The definition B̂[r] = ∞ for all r > 0 of heavy tails is too general to allow for general nontrivial results on ruin probabilities, and instead we shall work within the class S of subexponential distributions. For the definition, we require that B is concentrated on (0, ∞) and say then that B is subexponential (B ∈ S) if
    B̄^{*2}(x)/B̄(x) → 2 ,  x → ∞.    (1.1)
Here B^{*2} is the convolution square, that is, the distribution of the sum of independent r.v.'s X₁, X₂ with distribution B. In terms of r.v.'s, (1.1) then means P(X₁ + X₂ > x) ∼ 2P(X₁ > x). To capture the intuition behind this definition, note first the following fact:

Proposition 1.1 Let B be any distribution on (0, ∞). Then:
(a) P(max(X₁, X₂) > x) ∼ 2B̄(x), x → ∞;
(b) lim inf_{x→∞} B̄^{*2}(x)/B̄(x) ≥ 2.

Proof. By the inclusion-exclusion formula, P(max(X₁, X₂) > x) equals

    P(X₁ > x) + P(X₂ > x) − P(X₁ > x, X₂ > x) = 2B̄(x) − B̄(x)² ∼ 2B̄(x),

proving (a). Since B is concentrated on (0, ∞), we have {max(X₁, X₂) > x} ⊆ {X₁ + X₂ > x}, and thus the lim inf in (b) is at least lim inf P(max(X₁, X₂) > x)/B̄(x) = 2.¹ □

The proof shows that the condition for B ∈ S is that the probability of the set {X₁ + X₂ > x} is asymptotically the same as the probability of its subset {max(X₁, X₂) > x}. That is, in the subexponential case the only way X₁ + X₂ can get large is roughly by one of the Xᵢ becoming large. We later show:

Proposition 1.2 If B ∈ S, then

    P(X₁ > x  X₁ + X₂ > x) → 1/2 ,  P(X₁ ≤ y  X₁ + X₂ > x) → B(y)/2 .

That is, given X₁ + X₂ > x, the r.v. X₁ is w.p. 1/2 'typical' (with distribution B) and w.p. 1/2 it has the distribution of X₁ given X₁ > x. In contrast, the behavior in the light-tailed case is illustrated in the following example:

Example 1.3 Consider the standard exponential distribution, B̄(x) = e^{−x}. Then X₁ + X₂ has an Erlang(2) distribution with density ye^{−y}, so that B̄^{*2}(x) ∼ xe^{−x}. Thus the lim inf in Proposition 1.1(b) is ∞. In contrast to Proposition 1.2, one can check that

    ( (X₁/x, X₂/x)  X₁ + X₂ > x ) →_D (U, 1 − U)

¹Note that it can be shown that for any heavy-tailed distribution one has in fact the stronger result lim inf_{x→∞} B̄^{*2}(x)/B̄(x) = 2, see Foss & Korshunov [365].
where U is uniform on (0, 1). Thus, if X₁ + X₂ is large, then (with high probability) so are both of X₁, X₂, but neither of them exceeds x. □

Here is the simplest example of subexponentiality:

Proposition 1.4 Any B with a regularly varying tail is subexponential.

Proof. Assume B̄(x) = L(x)/x^α with L slowly varying and α > 0. Let 0 < δ < 1/2. If X₁ + X₂ > x, then either one of the Xᵢ exceeds (1 − δ)x, or they both exceed δx. Hence

    lim sup_{x→∞} B̄^{*2}(x)/B̄(x) ≤ lim sup_{x→∞} [ 2B̄((1 − δ)x) + B̄(δx)² ] / B̄(x)
      = lim sup_{x→∞} [ 2L((1 − δ)x)/((1 − δ)x)^α ] / [ L(x)/x^α ] + 0 = 2/(1 − δ)^α .

Letting δ ↓ 0, we get lim sup B̄^{*2}(x)/B̄(x) ≤ 2, and combining with Proposition 1.1(b) we get B̄^{*2}(x)/B̄(x) → 2. □

We now turn to the mathematical theory of subexponential distributions.

Proposition 1.5 If B ∈ S, then B̄(x − y)/B̄(x) → 1 uniformly in y ∈ [0, y₀] as x → ∞.
[In terms of r.v.'s: if X ∼ B ∈ S, then the overshoot X − x given X > x converges in distribution to ∞. This follows since the probability that the overshoot exceeds y is B̄(x + y)/B̄(x), which has limit 1.]

Proof. Consider first a fixed y. Using the identity

    B̄^{*(n+1)}(x)/B̄(x) = 1 + [B(x) − B^{*(n+1)}(x)]/B̄(x) = 1 + ∫_0^x [1 − B^{*n}(x − z)]/B̄(x) B(dz)    (1.2)

with n = 1 and splitting the integral into two parts corresponding to the intervals [0, y] and (y, x], we get

    B̄^{*2}(x)/B̄(x) ≥ 1 + B(y) + [B̄(x − y)/B̄(x)] (B(x) − B(y)) .

If lim sup B̄(x − y)/B̄(x) > 1, we therefore get lim sup B̄^{*2}(x)/B̄(x) > 1 + B(y) + 1 − B(y) = 2, a contradiction. Finally lim inf B̄(x − y)/B̄(x) ≥ 1 since y > 0.
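The single-big-jump picture behind (1.1) and Proposition 1.4 is easy to check numerically. The sketch below (Pareto parameters chosen by me for illustration, not taken from the book) computes P(X₁ + X₂ > x) for the tail B̄(x) = (1+x)^{−α} by splitting on whether one of the variables exceeds x/2, and watches the ratio to 2B̄(x) approach 1:

```python
# Numerical check that Bbar*2(x)/Bbar(x) -> 2 for a regularly varying tail.
# Decomposition (both summands cannot be <= x/2 when the sum exceeds x):
#   P(X1+X2 > x) = 2*int_0^{x/2} b(y)*Bbar(x-y) dy + Bbar(x/2)**2.
alpha = 2.5   # illustrative choice

def Bbar(x):
    return (1.0 + x) ** (-alpha)

def b(y):
    return alpha * (1.0 + y) ** (-alpha - 1.0)

def conv_tail(x, n=100_000):
    h = (x / 2.0) / n   # midpoint rule on [0, x/2]
    integral = sum(b((i + 0.5) * h) * Bbar(x - (i + 0.5) * h) for i in range(n)) * h
    return 2.0 * integral + Bbar(x / 2.0) ** 2

ratios = [conv_tail(x) / (2.0 * Bbar(x)) for x in (10.0, 100.0, 1000.0)]
print(ratios)   # decreases towards 1 as x grows
```

The slow decay of the excess over 1 is a first hint of the slow convergence rates discussed in the Notes to Section 2.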
The uniformity now follows from what has been shown for y = y₀ and the obvious inequality

    1 ≤ B̄(x − y)/B̄(x) ≤ B̄(x − y₀)/B̄(x) ,  y ∈ [0, y₀]. □
Corollary 1.6 If B ∈ S, then e^{εx} B̄(x) → ∞ and B̂[ε] = ∞ for all ε > 0.

Proof. For 0 < δ < ε, we have by Proposition 1.5 that B̄(n) ≥ e^{−δ} B̄(n − 1) for all large n, so that B̄(n) ≥ c₁e^{−δn} for all n. This implies B̄(x) ≥ c₂e^{−δx} for all x, which immediately yields the desired conclusions. □

Proof of Proposition 1.2. First,

    P(X₁ > x  X₁ + X₂ > x) = P(X₁ > x)/P(X₁ + X₂ > x) = B̄(x)/B̄^{*2}(x) → 1/2 .

Further,

    P(X₁ ≤ y  X₁ + X₂ > x) = [1/B̄^{*2}(x)] ∫_0^y B̄(x − z) B(dz)
      ∼ [1/(2B̄(x))] ∫_0^y B̄(x − z) B(dz) → (1/2) ∫_0^y B(dz) = B(y)/2 ,

using Proposition 1.5 and dominated convergence. □
The following result is extremely important and is often taken as the definition of the class S; its intuitive content is the same as discussed in the case n = 2 above.

Proposition 1.7 If B ∈ S, then for any n, B̄^{*n}(x)/B̄(x) → n as x → ∞.

Proof. We use induction. The case n = 2 is just the definition, so assume the proposition has been shown for n. Given ε > 0, choose y such that B̄^{*n}(x)/B̄(x) − n ≤ ε for x ≥ y. Then by (1.2),

    B̄^{*(n+1)}(x)/B̄(x) = 1 + ( ∫_0^{x−y} + ∫_{x−y}^x ) [B̄^{*n}(x − z)/B̄(x − z)] [B̄(x − z)/B̄(x)] B(dz) .

Here the second integral can be bounded by

    sup_{v≥0} [B̄^{*n}(v)/B̄(v)] · [B(x) − B(x − y)]/B̄(x) ,
which converges to 0 by Proposition 1.5 and the induction hypothesis. The first integral is

    (n + O(ε)) ∫_0^{x−y} [B̄(x − z)/B̄(x)] B(dz)
      = (n + O(ε)) { [B(x) − B^{*2}(x)]/B̄(x) − ∫_{x−y}^x [B̄(x − z)/B̄(x)] B(dz) } .

Here the first term in {·} converges to 1 (by the definition of B ∈ S) and the second to 0, since it is bounded by [B(x) − B(x − y)]/B̄(x). Combining these estimates and letting ε ↓ 0 completes the proof. □

Lemma 1.8 If B ∈ S and ε > 0, then there exists a constant K = K_ε such that B̄^{*n}(x) ≤ K(1 + ε)^n B̄(x) for all n and x.

Proof. Choose T such that [B(x) − B^{*2}(x)]/B̄(x) ≤ 1 + ε for x ≥ T, and let A = 1/B̄(T), α_n = sup_{x≥0} B̄^{*n}(x)/B̄(x). Then by (1.2), for all n,

    α_{n+1} ≤ 1 + sup_{x≤T} ∫_0^x [B̄^{*n}(x − z)/B̄(x)] B(dz) + sup_{x>T} ∫_0^x [B̄^{*n}(x − z)/B̄(x − z)] [B̄(x − z)/B̄(x)] B(dz)
      ≤ 1 + A + α_n sup_{x>T} ∫_0^x [B̄(x − z)/B̄(x)] B(dz) ≤ 1 + A + α_n(1 + ε) .

Iterating, we get with α₁ = 1

    α_{n+1} ≤ (1 + A) [(1 + ε)^n − 1]/ε + (1 + ε)^n .

Take K = (1 + (1 + A)/ε)/(1 + ε). □
Proposition 1.9 Let A₁, A₂ be distributions on (0, ∞) such that Āᵢ(x) ∼ aᵢB̄(x) for some B ∈ S and some constants a₁, a₂ with a₁ + a₂ > 0. Then 1 − (A₁ ∗ A₂)(x) ∼ (a₁ + a₂)B̄(x).

Proof. Let X₁, X₂ be independent r.v.'s such that Xᵢ has distribution Aᵢ. Then by definition 1 − (A₁ ∗ A₂)(x) = P(X₁ + X₂ > x). For any fixed v, Proposition 1.5 easily yields

    P(X₁ + X₂ > x, Xᵢ ≤ v) = ∫_0^v Āⱼ(x − y) Aᵢ(dy) ∼ aⱼB̄(x)Aᵢ(v) = aⱼB̄(x)(1 + o_v(1))
(j = 3 − i). Since

    P(X₁ + X₂ > x, X₁ > x − v, X₂ > x − v) ≤ Ā₁(x − v)Ā₂(x − v) ∼ a₁a₂B̄(x)² ,

which can be neglected, it follows that it is necessary and sufficient for the assertion to be true that

    ∫_v^{x−v} Āⱼ(x − y) Aᵢ(dy) = B̄(x)o_v(1) .    (1.3)

Using the necessity part in the case A₁ = A₂ = B yields

    ∫_v^{x−v} B̄(x − y) B(dy) = B̄(x)o_v(1) .    (1.4)

Now (1.3) follows if

    ∫_v^{x−v} B̄(x − y) Aᵢ(dy) = B̄(x)o_v(1) .    (1.5)

By a change of variables, the l.h.s. of (1.5) becomes

    B̄(x − v)Āᵢ(v) − Āᵢ(x − v)B̄(v) + ∫_v^{x−v} Āᵢ(x − y) B(dy) .

Here the last term is B̄(x)o_v(1) by (1.4), whereas the first two yield B̄(x)(Āᵢ(v) − aᵢB̄(v)) = B̄(x)o_v(1). □

Corollary 1.10 The class S is closed under tail-equivalence. That is, if Ā(x) ∼ aB̄(x) for some B ∈ S and some constant a > 0, then A ∈ S.

Proof. Taking A₁ = A₂ = A, a₁ = a₂ = a yields Ā^{*2}(x) ∼ 2aB̄(x) ∼ 2Ā(x). □

Corollary 1.11 Let B ∈ S and let A be any distribution with a lighter tail, Ā(x) = o(B̄(x)). Then A ∗ B ∈ S and 1 − (A ∗ B)(x) ∼ B̄(x).

Proof. Take A₁ = A, A₂ = B, so that a₁ = 0, a₂ = 1. □
It is tempting to conjecture that S is closed under convolution, i.e. that B₁ ∗ B₂ ∈ S and 1 − (B₁ ∗ B₂)(x) ∼ B̄₁(x) + B̄₂(x) when B₁, B₂ ∈ S. However, B₁ ∗ B₂ ∈ S does not hold in full generality (but once B₁ ∗ B₂ ∈ S has been shown, 1 − (B₁ ∗ B₂)(x) ∼ B̄₁(x) + B̄₂(x) follows precisely as in the proof of Proposition 1.9). In the regularly varying case, it is easy to see that if L₁, L₂ are slowly varying, then so is L = L₁ + L₂. Hence:
Corollary 1.12 Assume that B̄ᵢ(x) = Lᵢ(x)/x^α, i = 1, 2, with α > 0 and L₁, L₂ slowly varying. Then L = L₁ + L₂ is slowly varying and 1 − (B₁ ∗ B₂)(x) ∼ L(x)/x^α.

We next give a classical sufficient (and close to necessary) condition for subexponentiality, due to Pitman [705]. Recall that the failure rate λ(x) of a distribution B with density b is λ(x) = b(x)/B̄(x).

Proposition 1.13 Let B have density b and failure rate λ(x) such that λ(x) is decreasing for x ≥ x₀ with limit 0 at ∞. Then B ∈ S provided

    ∫_0^∞ e^{xλ(x)} b(x) dx < ∞ .

Proof. We may assume that λ(x) is everywhere decreasing (otherwise, replace B by a tail-equivalent distribution with a failure rate which is everywhere decreasing). Define Λ(x) = ∫_0^x λ(y) dy. Then B̄(x) = e^{−Λ(x)}. By (1.2),

    B̄^{*2}(x)/B̄(x) − 1 = ∫_0^x [B̄(x − y)/B̄(x)] b(y) dy = ∫_0^x e^{Λ(x)−Λ(x−y)−Λ(y)} λ(y) dy
      = ∫_0^{x/2} e^{Λ(x)−Λ(x−y)−Λ(y)} λ(y) dy + ∫_0^{x/2} e^{Λ(x)−Λ(x−y)−Λ(y)} λ(x − y) dy .

For y < x/2, Λ(x) − Λ(x − y) ≤ yλ(x − y) ≤ yλ(y). The rightmost bound shows that the integrand in the first integral is bounded by e^{yλ(y)−Λ(y)}λ(y) = e^{yλ(y)}b(y), an integrable function by assumption. The middle bound shows that it converges to b(y) for any fixed y, since λ(x − y) → 0. Thus by dominated convergence, the first integral has limit 1. Since λ(x − y) ≤ λ(y) for y < x/2, we can use the same domination for the second integral, but now the integrand has limit 0. Thus B̄^{*2}(x)/B̄(x) − 1 has limit 1 + 0, proving B ∈ S. □
Example 1.14 Consider the DFR Weibull case B̄(x) = e^{−x^β} with 0 < β < 1. Then b(x) = βx^{β−1}e^{−x^β}, λ(x) = βx^{β−1}. Thus λ(x) is everywhere decreasing, and e^{xλ(x)}b(x) = βx^{β−1}e^{−(1−β)x^β} is integrable. Thus, the DFR Weibull distribution is subexponential. □
Example 1.15 For the lognormal distribution,

    λ(x) = [ e^{−(log x − µ)²/(2σ²)} / (x√(2πσ²)) ] / Φ(−(log x − µ)/σ) ∼ (log x)/(σ²x) .

This easily yields that e^{xλ(x)}b(x) is integrable. Further, elementary but tedious calculations (which we omit) show that λ(x) is ultimately decreasing. Thus, the lognormal distribution is subexponential. □

In the regularly varying case, subexponentiality has already been proved in Corollary 1.12. To illustrate how Proposition 1.13 works in this setting, we first quote Karamata's theorem (Bingham, Goldie & Teugels [169]):

Proposition 1.16 For L(x) slowly varying and α > 1,

    ∫_x^∞ L(y)/y^α dy ∼ L(x)/((α − 1)x^{α−1}) .

From this we get:

Proposition 1.17 If B has a density of the form b(x) = αL(x)/x^{α+1} with L(x) slowly varying and α > 1, then B̄(x) ∼ L(x)/x^α and λ(x) ∼ α/x. Thus e^{xλ(x)}b(x) ∼ e^α b(x) is integrable.

However, the monotonicity condition in Proposition 1.13 may present a problem in some cases, so that the direct proof in Proposition 1.4 is necessary in full generality.

We conclude with a property of subexponential distributions which is often extremely important: under some mild smoothness assumptions, the overshoot, properly normalized, has a limit which is Pareto if B is regularly varying and exponential for distributions like the lognormal or Weibull. More precisely, let X^{(x)} = (X − x  X > x) and define the mean excess function e(x) = EX^{(x)} (in insurance mathematics, and in particular in reinsurance, the term stop-loss transform for the unconditional expectation E(X − x)⁺ = ∫_x^∞ B̄(y) dy is common). Then:

Proposition 1.18 (a) If B̄(x) = L(x)/x^α with L(x) slowly varying and α > 1, then e(x) ∼ x/(α − 1) and

    P(X^{(x)}/e(x) > y) → 1/(1 + y/(α − 1))^α ;    (1.6)

(b) Assume that for any y₀ the failure rate λ(·) satisfies

    λ(x + y/λ(x))/λ(x) → 1    (1.7)
uniformly for y ∈ (0, y₀]. Then e(x) ∼ 1/λ(x) and

    P(X^{(x)}/e(x) > y) → e^{−y} ;    (1.8)

(c) Under the assumptions of either (a) or (b), ∫_x^∞ B̄(y) dy ∼ e(x)B̄(x).

Proof. (a): Using Karamata's theorem, we get

    EX^{(x)} = E(X − x)⁺/P(X > x) = [1/P(X > x)] ∫_x^∞ P(X > y) dy
      ∼ [ L(x)/((α − 1)x^{α−1}) ] / [ L(x)/x^α ] = x/(α − 1) .

Further,

    P((α − 1)X^{(x)}/x > y) = P(X > x[1 + y/(α − 1)]  X > x)
      = L(x[1 + y/(α − 1)]) x^α / ( L(x) (x[1 + y/(α − 1)])^α )
      ∼ 1 · 1/(1 + y/(α − 1))^α .

We omit the proof of (c) and of EX^{(x)} ∼ 1/λ(x). The remaining statement (1.8) in (b) then follows from

    P(λ(x)X^{(x)} > y) = P(X > x + y/λ(x)  X > x) = exp{ Λ(x) − Λ(x + y/λ(x)) }
      = exp{ −∫_0^{y/λ(x)} λ(x + z) dz } = exp{ −∫_0^y [λ(x + u/λ(x))/λ(x)] du }
      = exp{ −y(1 + o(1)) } . □

The property (1.7) is referred to as 1/λ(x) being self-neglecting. It is trivially verified to hold for the Weibull and lognormal distributions, cf. Examples 1.14, 1.15. The mean excess function will play a main role later, in Section 4 in connection with finite-horizon ruin probabilities and in Section 6 in connection with tail estimation.
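For a concrete Pareto tail, the statements of Proposition 1.18(a) are in fact exact, which makes a quick numerical sanity check possible. The parameters below are mine, not the book's: with B̄(x) = (1+x)^{−α}, one has e(x) = (1+x)/(α−1) in closed form, and the overshoot ratio equals the limit in (1.6) exactly:

```python
# Mean excess and overshoot limit for the Pareto tail B(x) = (1+x)**(-alpha).
alpha = 2.5   # illustrative choice, alpha > 1

def tail(x):
    return (1.0 + x) ** (-alpha)

def mean_excess(x):
    # int_x^inf (1+y)^(-alpha) dy in closed form, divided by the tail
    stop_loss = (1.0 + x) ** (1.0 - alpha) / (alpha - 1.0)
    return stop_loss / tail(x)

x, y = 1000.0, 2.0
print(mean_excess(x) / (x / (alpha - 1.0)))          # close to 1: e(x) ~ x/(alpha-1)
overshoot = tail(x + y * mean_excess(x)) / tail(x)   # P(X^(x)/e(x) > y)
print(overshoot, (1.0 + y / (alpha - 1.0)) ** (-alpha))
```

For this family the two printed overshoot values agree to machine precision, because (1 + x + y(1+x)/(α−1))/(1+x) = 1 + y/(α−1) identically.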
Notes and references Good general references for subexponential distributions are Embrechts, Klüppelberg & Mikosch [349] and Rolski et al. [746]. In the last decade, there has been a considerable literature on the theory of subexponential distributions. One direction is local subexponentiality, which in its simplest form has estimates for the density of the form b^{*n}(x) ∼ nb(x), and more generally gives conditions for B^{*n}(x + y) − B^{*n}(x) ∼ n(B(x + y) − B(x)) for any fixed y. See, e.g., Asmussen, Foss & Korshunov [77]. Another direction is variants of the definition. Some of these are slight generalizations like intermediate regular variation, following up on Cline [248]; others are slightly less general classes designed typically for an ad hoc purpose of pursuing some specific line of applications. The perspective of such studies may be a matter of taste. We would like, however, to point out one specific class which has proved rather robust, the class S* ⊂ S originally introduced by Klüppelberg by the requirement

    ∫_0^x B̄(x − y)B̄(y) dy ∼ µ_B B̄(x) .    (1.9)

For the intuition, note that B̄(x − y)/B̄(x) → 1 for any y, so one expects the integral divided by B̄(x) to have the limit lim_{x→∞} ∫_0^x B̄(y) dy = µ_B. However, there is nothing like a dominated convergence argument to justify this, and in fact (1.9) may fail in some exceptional cases.
2 The compound Poisson model
Consider the compound Poisson model with arrival intensity β and claim size distribution B. Let S_t = Σ_{i=1}^{N_t} U_i − t be the claim surplus at time t, and let M = sup_{t≥0} S_t, τ(u) = inf{t > 0 : S_t > u}. We assume ρ = βµ_B < 1 and are interested in the ruin probability ψ(u) = P(M > u) = P(τ(u) < ∞). Recall that B₀ denotes the stationary excess distribution, B₀(x) = ∫_0^x B̄(y) dy / µ_B.

Theorem 2.1 If B₀ ∈ S, then

    ψ(u) ∼ [ρ/(1 − ρ)] B̄₀(u) .

The proof is based upon the following lemma (stated slightly more generally than needed at present).

Lemma 2.2 Let Y₁, Y₂, . . . be i.i.d. with common distribution G ∈ S and let K be an independent integer-valued r.v. with Ez^K < ∞ for some z > 1. Then P(Y₁ + · · · + Y_K > u) ∼ EK · Ḡ(u).

Proof. Recall from Section 1 that Ḡ^{*n}(u) ∼ nḠ(u), u → ∞, and that for each z > 1 there is a D < ∞ such that Ḡ^{*n}(u) ≤ Ḡ(u)Dz^n for all u. We get

    P(Y₁ + · · · + Y_K > u)/Ḡ(u) = Σ_{n=0}^∞ P(K = n) Ḡ^{*n}(u)/Ḡ(u) → Σ_{n=0}^∞ P(K = n) · n = EK ,
using dominated convergence with Σ_n P(K = n) Dz^n as majorant. □
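Lemma 2.2 (and, via the Pollaczeck–Khinchine formula, Theorem 2.1) is easy to probe by simulation. The sketch below uses illustrative parameters of my own choosing: K geometric with P(K = k) = (1 − ρ)ρ^k and Y_i Pareto with Ḡ(y) = (1 + y)^{−a}, so that EK · Ḡ(u) = [ρ/(1 − ρ)](1 + u)^{−a}:

```python
import math
import random

# Monte Carlo sketch of the geometric compound sum M = Y_1 + ... + Y_K.
# All numerical choices (rho, a, u, sample size) are illustrative.
random.seed(1)
rho, a, n, u = 0.5, 1.5, 200_000, 30.0

def sample_M():
    # geometric K via inversion: P(K >= k) = rho**k
    k = int(math.log(random.random()) / math.log(rho))
    # Pareto increments via inversion of Gbar(y) = (1+y)**(-a)
    return sum(random.random() ** (-1.0 / a) - 1.0 for _ in range(k))

ms = [sample_M() for _ in range(n)]
psi0 = sum(m > 0 for m in ms) / n     # equals P(K >= 1) = rho up to noise
psi_u = sum(m > u for m in ms) / n
approx = rho / (1.0 - rho) * (1.0 + u) ** (-a)
print(psi0, psi_u / approx)           # second ratio is above 1 at moderate u
```

That the simulated tail exceeds the asymptotic value at moderate u is exactly the slow-convergence phenomenon discussed in the Notes below.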
Proof of Theorem 2.1. The Pollaczeck–Khinchine formula states that (in the set-up of Lemma 2.2) M =_D Y₁ + · · · + Y_K, where the Yᵢ have distribution B₀ and K is geometric with parameter ρ, P(K = k) = (1 − ρ)ρ^k. Since EK = ρ/(1 − ρ) and Ez^K < ∞ whenever ρz < 1, the result follows immediately from Lemma 2.2. □

The condition B₀ ∈ S is for all practical purposes equivalent to B ∈ S. However, mathematically one must note that there exist (quite intricate) examples where B ∈ S, B₀ ∉ S, as well as examples where B ∉ S, B₀ ∈ S. The tail of B₀ is easily expressed in terms of the tail of B and the function e(x) in Proposition 1.18:

    B̄₀(x) = (1/µ_B) ∫_x^∞ B̄(y) dy = e(x)B̄(x)/µ_B = B̄(x)EX^{(x)}/µ_B .    (2.1)

In particular, in our three main examples (regular variation, lognormal, Weibull) one has:

    B̄(x) ∼ L(x)/x^α  ⇒  B̄₀(x) ∼ L(x)/(µ_B(α − 1)x^{α−1}) ;

    B̄(x) = Φ̄((log x − µ)/σ)  ⇒  µ_B = e^{µ+σ²/2} ,
      B̄₀(x) ∼ σ³x e^{−(log x−µ)²/(2σ²)} / ( e^{µ+σ²/2} (log x)² √(2π) ) ;

    B̄(x) = e^{−x^β}  ⇒  µ_B = Γ(1/β)/β ,  B̄₀(x) ∼ x^{1−β}e^{−x^β}/Γ(1/β) .
From this, B₀ ∈ S is immediate in the regularly varying case, and for the lognormal and Weibull cases it can be verified using Pitman's criterion (Proposition 1.13). In general it is known that B ∈ S* is sufficient for B₀ ∈ S. Note that in these examples, B₀ is more heavy-tailed than B. In general:

Proposition 2.3 If B ∈ S, then B̄₀(x)/B̄(x) → ∞, x → ∞.

Proof. Since B̄(x + y)/B̄(x) → 1 uniformly in y ∈ [0, a], we have

    lim inf_{x→∞} B̄₀(x)/B̄(x) ≥ lim inf_{x→∞} ∫_x^{x+a} B̄(y) dy / (µ_B B̄(x)) = a/µ_B .

Let a → ∞. □
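Both (2.1) and Proposition 2.3 can be illustrated with a small quadrature. For the Pareto tail B̄(x) = (1+x)^{−α} (my illustrative choice, not from the book), µ_B = 1/(α−1) and B̄₀(x) = (1+x)^{−(α−1)} exactly, so B̄₀(x)/B̄(x) = 1 + x blows up as predicted:

```python
# Stationary excess tail B0bar(x) = (1/mu_B) * int_x^inf Bbar(y) dy for a
# Pareto example; the closed form (1+x)**(-(alpha-1)) serves as a cross-check.
alpha = 2.5
mu_B = 1.0 / (alpha - 1.0)

def B_bar(x):
    return (1.0 + x) ** (-alpha)

def B0_bar(x, n=100_000, upper=10_000.0):
    # midpoint-rule quadrature; the tail beyond `upper` is negligible here
    h = (upper - x) / n
    s = sum(B_bar(x + (i + 0.5) * h) for i in range(n)) * h
    return s / mu_B

x = 50.0
print(B0_bar(x), (1.0 + x) ** (-(alpha - 1.0)))  # quadrature vs closed form
print(B0_bar(x) / B_bar(x))                      # roughly 1 + x = 51
```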
Remark 2.4 Note that for regularly varying claim size distributions, one can also use the Pollaczeck–Khinchine formula and the Tauberian Theorem A6.2 (given in the Appendix) to provide a somewhat alternative proof of Theorem 2.1. More precisely, combining IV.(3.4) and IV.(3.5) we have

    ψ̂[−s] = ρ(B̂₀[−s] − 1) / ( s(ρB̂₀[−s] − 1) ) = ρ [(1 − B̂₀[−s])/s] (1 + ρB̂₀[−s] + ρ²B̂₀[−s]² + · · ·) .

Assume that B̄(x) ∼ L(x)/x^α with α > 1, and write α = n + η with n = ⌊α⌋ and 0 < η < 1 (if η = 0 there are obvious amendments). Then from above,

    B̄₀(x) ∼ L(x)/(µ_B(α − 1)x^{α−1}) ,  x → ∞,

and with Theorem A6.2 this implies

    B̂₀[−s] = 1 + Σ_{j=1}^{n−1} (a_j/j!)(−s)^j + [(−1)^{n−1}/Γ(−α + 1)] s^{α−1} L(1/s)

for some constants a_j. Hence

    ψ̂[−s] = [ρ/(1 − ρ)] [(1 − B̂₀[−s])/s] (1 + b₁s + b₂s² + . . .)

for some constants b_j, and after subtracting the resulting first n − 2 terms with powers s^k (k = 1, . . . , n − 2), the r.h.s. (and correspondingly also the l.h.s.) is regularly varying at s = 0 with index η. Since ψ̂[−s] is the Laplace–Stieltjes transform of ∫_0^u ψ(y) dy, another application of Theorem A6.2 shows that

    ∫_u^∞ ψ(y) dy ∼ [ρ/(1 − ρ)] ∫_u^∞ B̄₀(y) dy ,

and by the Monotone Density Theorem

    ψ(u) ∼ [ρ/(1 − ρ)] B̄₀(u) . □
Notes and references Theorem 2.1 was derived by several different authors and under varying assumptions. We mention here Teugels & Veraverbeke [842], von Bahr [122], Borovkov [182], Thorin & Wikstad [849], Pakes [677] and Embrechts & Veraverbeke [353].

The approximation in Theorem 2.1 is notoriously not very accurate. The problem is a usually very slow rate of convergence as u → ∞. For some earlier numerical studies, see Abate, Choudhury & Whitt [1], Kalashnikov [517] and Asmussen & Binswanger [72]. E.g., in [517, p. 195] there are numerical examples where ψ(u) is of order 10⁻⁵ but Theorem 2.1 gives 10⁻¹⁰. This shows that although the approximation is asymptotically correct in the tail, one may have to go out to values of ψ(u) which are unrealistically small before the fit is reasonable. Second order terms were e.g. introduced in Abate et al. [1] and Baltrūnas [126, 127], but unfortunately the improvement is not very pronounced; see also Omey & Willekens [675, 676] for some related work. Based upon ideas of Hogan [473], Asmussen & Binswanger [72] suggested an approximation which is substantially better than Theorem 2.1 when u is small or moderately large. For US-Pareto and classical Pareto claim size distributions, the explicit Laplace transform in terms of an incomplete Gamma function can be used to obtain an integral representation (with non-oscillating integrand on the real line) for the ruin probability, which can be seen as an 'almost explicit' formula, see Ramsay [725, 726] and Albrecher & Kortschak [33]. In recent years, there has been a lot of research activity on higher-order asymptotic expansions of general compound distributions (under certain additional assumptions on the tail), see e.g. Geluk, Peng & de Vries [393], Borovkov & Borovkov [184], Barbe, McCormick & Zhang [131, 132], Mikosch & Nagaev [639], Kortschak & Albrecher [557] and Albrecher, Hipp & Kortschak [29]. In [29] it is also shown that a shift in the argument can substantially improve the accuracy of the first-order asymptotic approximation in Theorem 2.1, see also the Notes to Section XVI.2a. For higher-order approximations for absolute ruin probabilities, see Borovkov [183]. As any approximation valid as u → ∞, the one in Theorem 2.1 can of course not be precise for small u. Olvera-Cravioto, Blanchet & Glynn [674] discuss the alternative of using the heavy-traffic approximation for small and moderate u and identify the threshold where the subexponential approximation takes over. Upper bounds for ψ(u) in the heavy-tailed case can be found in Kalashnikov [517, 518] (see also Willmot & Lin [891, Sec. 6.2]), but are in general quite complicated.
3 The renewal model

We consider the renewal model with claim size distribution B and interarrival distribution A as in Chapter VI. Let Uᵢ be the ith claim, Tᵢ the ith interarrival time and Xᵢ = Uᵢ − Tᵢ,

    S_n^{(d)} = X₁ + · · · + X_n ,  M = sup_{n=0,1,...} S_n^{(d)} ,  ϑ(u) = inf{n : S_n^{(d)} > u} .
Then ψ(u) = P(M > u) = P(ϑ(u) < ∞). We assume positive safety loading, i.e. ρ = µ_B/µ_A < 1. The main result is:

Theorem 3.1 Assume that (a) the stationary excess distribution B₀ of B is subexponential, and that (b) B itself satisfies B̄(x − y)/B̄(x) → 1 uniformly on
compact y-intervals. Then

    ψ(u) ∼ [ρ/(1 − ρ)] B̄₀(u) ,  u → ∞.    (3.1)

[Note that (b) in particular holds if B ∈ S.]

The proof is based upon the observation that also in the renewal setting, there is a representation of M similar to the Pollaczeck–Khinchine formula. To this end, let ϑ₊ = ϑ(0) be the first ascending ladder epoch of {S_n^{(d)}},

    G₊(A) = P(S_{ϑ₊} ∈ A, ϑ₊ < ∞) = P(S_{τ₊} ∈ A, τ₊ < ∞) ,

where τ₊ = T₁ + · · · + T_{ϑ₊} as usual denotes the first ascending ladder epoch of the continuous-time claim surplus process {S_t}. Thus G₊ is the ascending ladder height distribution (which is defective because of µ_B < µ_A). Define further θ = ‖G₊‖ = P(ϑ₊ < ∞). Then

    M =_D Σ_{i=1}^K Yᵢ    (3.2)

where K is geometric with parameter θ, P(K = k) = (1 − θ)θ^k, and Y₁, Y₂, . . . are independent of K and i.i.d. with distribution G₊/θ (the distribution of S_{ϑ₊} given τ₊ < ∞). As for the compound Poisson model, this representation will be our basic vehicle to derive tail asymptotics of M, but we face the added difficulties that neither the constant θ nor the distribution of the Yᵢ are explicit. Let F denote the distribution of the Xᵢ and F_I the integrated tail, F_I(x) = ∫_x^∞ F̄(y) dy, x > 0.

Lemma 3.2 F̄(x) ∼ B̄(x), x → ∞, and hence F_I(x) ∼ µ_B B̄₀(x).

Proof. By dominated convergence and (b),

    F̄(x)/B̄(x) = ∫_0^∞ [B̄(x + y)/B̄(x)] A(dy) → ∫_0^∞ 1 · A(dy) = 1 . □

The lemma implies that (3.1) is equivalent to

    P(M > u) ∼ F_I(u)/µ_F ,  u → ∞,    (3.3)

and we will prove it in this form (in XIII.2, we will use the fact that the proof of (3.1) holds for a general random walk satisfying the analogues of (a), (b) and does not rely on the structure Xᵢ = Uᵢ − Tᵢ).
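Lemma 3.2 can be watched at work numerically. The concrete choices below are mine, not the book's: B Pareto with B̄(x) = (1+x)^{−α} and A exponential with rate δ, so that F̄(x) = P(U − T > x) = ∫_0^∞ B̄(x + y) δe^{−δy} dy, and F̄(x)/B̄(x) climbs towards 1:

```python
import math

# Quadrature check that Fbar(x)/Bbar(x) -> 1 when F is the distribution of
# U - T with U Pareto and T exponential (illustrative parameters).
alpha, delta = 2.5, 1.0

def B_tail(x):
    return (1.0 + x) ** (-alpha)

def F_tail(x, n=100_000, upper=30.0):
    # midpoint rule; exp(-delta*y) makes the truncated tail beyond `upper` negligible
    h = upper / n
    return sum(B_tail(x + (i + 0.5) * h) * delta * math.exp(-delta * (i + 0.5) * h)
               for i in range(n)) * h

rs = [F_tail(x) / B_tail(x) for x in (10.0, 100.0, 1000.0)]
print(rs)   # increasing towards 1, always below 1 since T > 0 a.s.
```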
Write Ḡ₊(x) = G₊(x, ∞) = P(S_{ϑ₊} > x, ϑ₊ < ∞). Let further ϑ₋ = inf{n > 0 : S_n^{(d)} ≤ 0} be the first descending ladder epoch, G₋(A) = P(S_{ϑ₋} ∈ A) the descending ladder height distribution (‖G₋‖ = 1 because of µ_B < µ_A), and let µ_{G₋} be the mean of G₋.

Lemma 3.3 Ḡ₊(x) ∼ F_I(x)/µ_{G₋}, x → ∞.

Proof. Let R₊(A) = E Σ_{n=0}^{ϑ₊−1} I(S_n^{(d)} ∈ A) denote the pre-ϑ₊ occupation measure, and let U₋ = Σ_{n=0}^∞ G₋^{*n} be the renewal measure corresponding to G₋. Then

    Ḡ₊(x) = ∫_{−∞}^0 F̄(x − y) R₊(dy) = ∫_{−∞}^0 F̄(x − y) U₋(dy)

(the first identity is obvious, and the second follows since an easy time reversion argument shows that R₊ = U₋, cf. A2). The heuristics is now that because of (b), the contribution from the interval (−N, 0] to the integral is O(F̄(x)) = o(F_I(x)), whereas for large y, U₋(dy) is close to Lebesgue measure on (−∞, 0] normalized by µ_{G₋}, so that we should have

    Ḡ₊(x) ∼ (1/µ_{G₋}) ∫_{−∞}^0 F̄(x − y) dy = F_I(x)/µ_{G₋} .

We now make this precise. If G₋ is non-lattice, then by Blackwell's renewal theorem U₋(−n − 1, −n] → 1/µ_{G₋}. In the lattice case, we can assume that the span is 1, and then the same conclusion holds since U₋(−n − 1, −n] is then just the probability of a renewal at n. Given ε, choose N such that F̄(n − 1)/F̄(n) ≤ 1 + ε for n ≥ N (this is possible by (b) and Lemma 3.2), and such that U₋(−n − 1, −n] ≤ (1 + ε)/µ_{G₋} for n ≥ N. We then get

    lim sup_{x→∞} Ḡ₊(x)/F_I(x)
      ≤ lim sup_{x→∞} ∫_{−N}^0 [F̄(x − y)/F_I(x)] U₋(dy) + lim sup_{x→∞} ∫_{−∞}^{−N} [F̄(x − y)/F_I(x)] U₋(dy)
      ≤ lim sup_{x→∞} [F̄(x)/F_I(x)] U₋(−N, 0] + lim sup_{x→∞} (1/F_I(x)) Σ_{n=N}^∞ F̄(x + n) U₋(−n − 1, −n]
      ≤ 0 + [(1 + ε)/µ_{G₋}] lim sup_{x→∞} (1/F_I(x)) Σ_{n=N}^∞ F̄(x + n)
      ≤ [(1 + ε)²/µ_{G₋}] lim sup_{x→∞} (1/F_I(x)) ∫_N^∞ F̄(x + y) dy
      = [(1 + ε)²/µ_{G₋}] lim sup_{x→∞} F_I(x + N)/F_I(x) = (1 + ε)²/µ_{G₋} .
Here in the third step we used that (b) implies B̄(x)/B̄₀(x) → 0 and hence F̄(x)/F_I(x) → 0, and in the last that F_I is asymptotically proportional to B̄₀ with B₀ ∈ S. Similarly,

    lim inf_{x→∞} Ḡ₊(x)/F_I(x) ≥ (1 − ε)²/µ_{G₋} .

Letting ε ↓ 0, the proof is complete. □

Proof of Theorem 3.1. By Lemma 3.3, P(Yᵢ > x) ∼ F_I(x)/(θµ_{G₋}). Hence, using dominated convergence precisely as for the compound Poisson model, (3.2) yields

    P(M > u) ∼ Σ_{k=1}^∞ (1 − θ)θ^k k F_I(u)/(θµ_{G₋}) = F_I(u)/((1 − θ)µ_{G₋}) .

Differentiating the Wiener–Hopf factorization identity (A.9)

    1 − F̂[s] = (1 − Ĝ₊[s])(1 − Ĝ₋[s])

and letting s = 0 yields

    −µ_F = −(1 − 1)Ĝ₊′[0] − (1 − ‖G₊‖)µ_{G₋} = −(1 − θ)µ_{G₋} .

Therefore, by Lemma 3.2,

    F_I(u)/((1 − θ)µ_{G₋}) ∼ µ_B B̄₀(u)/(µ_A − µ_B) = ρB̄₀(u)/(1 − ρ) . □

We conclude with a lemma needed in XIII.2:

Lemma 3.4 For any a < ∞, P(M > u, S_{ϑ(u)} − S_{ϑ(u)−1} ≤ a) = o(F_I(u)).

Proof. Let ω(u) = inf{n : S_n^{(d)} ∈ (u − a, u), M_n ≤ u}. Then

    P(M ∈ (u − a, u)) ≥ P(ω(u) < ∞)(1 − ψ(0)) .
On the other hand, on the set {M > u, S_{ϑ(u)} − S_{ϑ(u)−1} ≤ a} we have ω(u) < ∞, and {S_{ω(u)+n} − S_{ω(u)}}_{n=0,1,...} must attain a maximum > 0, so that

    P(M > u, S_{ϑ(u)} − S_{ϑ(u)−1} ≤ a) ≤ P(ω(u) < ∞)ψ(0) ≤ [ψ(0)/(1 − ψ(0))] P(M ∈ (u − a, u)) .

But since P(M > u − a) ∼ P(M > u), we have

    P(M ∈ (u − a, u)) = o(P(M > u)) = o(F_I(u)) . □

Notes and references Theorem 3.1 is due to Embrechts & Veraverbeke [353], with roots in von Bahr [122], Pakes [677] and Teugels & Veraverbeke [842]. Asymptotic results for maxima of random walks with heavy-tailed increments again carry over to corresponding statements for ψ(u, T) in the renewal model, cf. for instance Veraverbeke & Teugels [865] and more recently Baltrūnas [128]; see also Baltrūnas & Klüppelberg [129]. Further results on tails of the discounted aggregate claims in this model are given in Hao & Tang [447]. Wei & Yang [880] extend the integral representation for the ruin probability for US-Pareto claims to an Erlang renewal model. A recent reference containing much relevant information on heavy-tailed asymptotics for random walks is Borovkov & Borovkov [185].
4 Finite-horizon ruin probabilities

We consider the compound Poisson model with ρ = βµ_B < 1 and the stationary excess distribution B₀ subexponential. Then ψ(u) ∼ [ρ/(1 − ρ)]B̄₀(u), cf. Theorem 2.1. The asymptotic behavior of the finite-horizon ruin probability ψ(u, T) for fixed T is trivially given by

    ψ(u, T) ∼ βT B̄(u) ,    (4.1)

since the coarse inequality P(A_T − T > u) ≤ ψ(u, T) ≤ P(A_T > u) and Proposition 1.5 in this case already suffice to see that ψ(u, T) ∼ P(A_T > u), from which (4.1) follows by Lemma 2.2. It is therefore clear that one can expect deeper results only when the time horizon itself scales with u, and this will be the subject of this section.

As usual, τ(u) is the time of ruin, and as in V.7 we let P^{(u)} = P( ·  τ(u) < ∞). The main result of this section, Theorem 4.4, states that under mild additional conditions, there exist constants e(u) such that the P^{(u)}-distribution of τ(u)/e(u) has a limit which is either Pareto (when B is regularly varying) or exponential
(for B's such as the lognormal or DFR Weibull); this should be compared with the normal limit for the light-tailed case, cf. V.4. Combined with the approximation for ψ(u), this then easily yields approximations for the finite-horizon ruin probabilities ψ(u, e(u)) (Corollary 4.7).

We start by reviewing some general facts which are fundamental for the analysis. Essentially, the discussion provides an alternative point of view to some results in Chapter V, in particular Proposition V.2.3.
4a Excursion theory for Markov processes

Until further notice, let {S_t} be an arbitrary Markov process with state space E (we write P_x when S₀ = x) and let m be a stationary measure, i.e. a (σ-finite) measure on E such that

    ∫_E m(dx)P_x(S_t ∈ A) = m(A)    (4.2)

for all measurable A ⊆ E and all t > 0. Then there is a Markov process {R_t} on E such that

    ∫_E m(dx)h(x)E_x k(R_t) = ∫_E m(dy)k(y)E_y h(S_t)    (4.3)

for all bounded measurable functions h, k on E; in the terminology of general Markov process theory, {S_t} and {R_t} are in classical duality w.r.t. m. The simplest example is a discrete time, discrete state space chain, where we can take h, k as indicator functions of states i, j, say; then (4.3) with t = 1 means m_i r_ij = m_j s_ji, where r_ij, s_ij are the transition probabilities for {R_t}, resp. {S_t}. Thus, a familiar case is time reversion (here m is the stationary distribution); but the example of relevance for us is the following:

Proposition 4.1 A compound Poisson risk process {R_t} and its associated claim surplus process {S_t} are in classical duality w.r.t. Lebesgue measure.

Proof. Starting from R₀ = x, R_t is distributed as x + t − Σ_{i=1}^{N_t} Uᵢ, and starting from S₀ = y, S_t is distributed as y − t + Σ_{i=1}^{N_t} Uᵢ (note that we allow x, y to vary in the whole of R and not, as usual, impose the restrictions x ≥ 0, y = 0). Let G denote the distribution of Σ_{i=1}^{N_t} Uᵢ − t. Then (4.3) means

    ∫∫ h(x)k(x − z) dx G(dz) = ∫∫ h(y + z)k(y) dy G(dz) .

The equality of the l.h.s. to the r.h.s. follows by the substitution y = x − z. □
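The discrete-chain form of classical duality, m_i r_ij = m_j s_ji, can be verified mechanically. In the sketch below (a toy 3-state chain of my own making, not from the book), the dual transition probabilities are built as r_ij = m_j s_ji / m_i, and one checks that r is again a transition matrix with the same stationary measure:

```python
# Toy verification of classical duality for a finite-state chain.
s = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.2, 0.4]]   # transition probabilities s_ij for {S_t}

# stationary m: iterate m <- m s; the chain is irreducible and aperiodic
m = [1.0 / 3.0] * 3
for _ in range(500):
    m = [sum(m[i] * s[i][j] for i in range(3)) for j in range(3)]

# dual chain {R_t}: r_ij = m_j * s_ji / m_i, so that m_i r_ij = m_j s_ji
r = [[m[j] * s[j][i] / m[i] for j in range(3)] for i in range(3)]

print([sum(row) for row in r])  # each row sums to 1 (r is stochastic)
m_r = [sum(m[i] * r[i][j] for i in range(3)) for j in range(3)]
print(m_r, m)                   # m is stationary for r as well
```

Row-stochasticity of r is exactly stationarity of m for s, and vice versa, which is the finite-state content of (4.2)–(4.3).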
For F ⊂ E, an excursion in F starting from x ∈ F is the (typically finite) piece of sample path²

    {S_t}_{0 ≤ t < ω(F^c)} given S₀ = x ,  where ω(F^c) = inf{t > 0 : S_t ∉ F} .

We let Q_x^S be the corresponding distribution and

    Q_{x,y}^S = Q_x^S( ·  S_{ω(F^c)−} = y, ω(F^c) < ∞ ) ,  y ∈ F

(in discrete time, S_{ω(F^c)−} should be interpreted as S_{ω(F^c)−1}). Thus, Q_{x,y}^S is the distribution of an excursion of {S_t} conditioned to start in x ∈ F and terminate in y ∈ F. Q_x^R and Q_{x,y}^R are defined similarly, and we let Q̃_{x,y}^S refer to the time-reversed excursion. That is,

    Q̃_{x,y}^S(·) = P( {S_{(ω(F^c)−t)−}}_{0 ≤ t < ω(F^c)} ∈ ·  S₀ = x, S_{ω(F^c)−} = y ) .

Theorem 4.2 For any F ⊂ E and x, y ∈ F, Q_{y,x}^R = Q̃_{x,y}^S.

The sample path in (a) is that of {S_t} starting from S₀ = 0, with the first ladder epoch τ₊ > 0; the one in (b) is the time-reversed path. The theorem states that the path in (b) has the same distribution as an excursion of {R_t} conditioned to start in y < 0 and to end in x = 0. But in the risk theory example (corresponding to which the sample paths are drawn), this simply means the distribution of the path of {R_t} starting from y and stopped when 0 is hit. In particular:

²In general Markov process theory, a main difficulty is to make sense of such excursions also when P_x(ω(F^c) = 0) = 1. Say {S_t} is reflected Brownian motion on [0, ∞), x = 0+ and F = (0, ∞). For the present purposes it suffices, however, to consider only the case P_x(ω(F^c) = 0) = 0.
CHAPTER X. HEAVY TAILS
Corollary 4.3 The distribution of $\tau(0)$ given $\tau(0) < \infty$, $S_{\tau(0)-} = y < 0$ is the same as the distribution of $\omega(-y)$, where $\omega(z) = \inf\{t > 0 : R_t = z\}$, $z > 0$. [Note that $\omega(z) < \infty$ a.s. when $\rho = \beta\mu_B < 1$.]

Proof of Theorem 4.2. We consider the discrete time, discrete state space case only (well-behaved cases such as the risk process example can then easily be handled by discrete approximations). We can then view $Q^S_{x,y}$ as a measure on all strings of the form $i_0 i_1 \ldots i_n$ with $i_0, i_1, \ldots, i_n \in F$, $i_0 = x$, $i_n = y$,
$$Q^S_{x,y}(i_0 i_1 \ldots i_n) \;=\; \frac{\mathbb{P}_x\bigl(S_1 = i_1, \ldots, S_n = i_n = y;\ S_{n+1} \in F^c\bigr)}{\mathbb{P}_x\bigl(\omega(F^c) < \infty,\ S_{\omega(F^c)-1} = y\bigr)}\,;$$
note that
$$\mathbb{P}_x\bigl(\omega(F^c) < \infty,\ S_{\omega(F^c)-1} = y\bigr) \;=\; \sum_{n=1}^{\infty} \sum_{i_1,\ldots,i_{n-1}\in F} \mathbb{P}_x\bigl(S_1 = i_1, \ldots, S_n = i_n = y;\ S_{n+1} \in F^c\bigr).$$
Similarly, $\widetilde Q^S_{x,y}$ and $Q^R_{y,x}$ are measures on all strings of the form $i_0 i_1 \ldots i_n$ with $i_0, i_1, \ldots, i_n \in F$, $i_0 = y$, $i_n = x$,
$$Q^R_{y,x}(i_0 i_1 \ldots i_n) \;=\; \frac{\mathbb{P}_y\bigl(R_1 = i_1, \ldots, R_n = i_n = x;\ R_{n+1} \in F^c\bigr)}{\mathbb{P}_y\bigl(\omega(F^c) < \infty,\ R_{\omega(F^c)-1} = x\bigr)}$$
and $\widetilde Q^S_{x,y}(i_0 i_1 \ldots i_n) = Q^S_{x,y}(i_n i_{n-1} \ldots i_0)$.

To show $Q^R_{y,x}(i_0 i_1 \ldots i_n) = Q^S_{x,y}(i_n i_{n-1} \ldots i_0)$ when $i_0, i_1, \ldots, i_n \in F$, $i_0 = y$, $i_n = x$, note first that
$$\mathbb{P}_y\bigl(R_1 = i_1, \ldots, R_n = i_n = x;\ R_{n+1} \in F^c\bigr) \;=\; \sum_{j\in F^c} r_{i_0 i_1} r_{i_1 i_2} \cdots r_{i_{n-1} i_n} r_{x j}$$
$$=\; \sum_{j\in F^c} \frac{m_{i_1} s_{i_1 i_0}}{m_{i_0}} \cdot \frac{m_{i_2} s_{i_2 i_1}}{m_{i_1}} \cdots \frac{m_{i_n} s_{i_n i_{n-1}}}{m_{i_{n-1}}} \cdot \frac{m_j s_{jx}}{m_x} \;=\; \frac{1}{m_y}\, s_{i_n i_{n-1}} \cdots s_{i_1 i_0} \sum_{j\in F^c} m_j s_{jx}\,.$$
Thus
$$Q^R_{y,x}(i_0 i_1 \ldots i_n) \;=\; \frac{\displaystyle s_{x i_{n-1}} \cdots s_{i_1 y}}{\displaystyle \sum_{k=1}^{\infty} \sum_{i_1,\ldots,i_{k-1}\in F} s_{x i_{k-1}} \cdots s_{i_1 y}}\,.$$
Similarly but easier,
$$Q^S_{x,y}(i_n i_{n-1} \ldots i_0) \;=\; \frac{\displaystyle s_{x i_{n-1}} \cdots s_{i_1 y} \sum_{j\in F^c} s_{yj}}{\displaystyle \sum_{k=1}^{\infty} \sum_{i_1,\ldots,i_{k-1}\in F} s_{x i_{k-1}} \cdots s_{i_1 y} \sum_{j\in F^c} s_{yj}} \;=\; \frac{\displaystyle s_{x i_{n-1}} \cdots s_{i_1 y}}{\displaystyle \sum_{k=1}^{\infty} \sum_{i_1,\ldots,i_{k-1}\in F} s_{x i_{k-1}} \cdots s_{i_1 y}}\,. \qquad\Box$$
4b The time to ruin
Our approach to the study of the asymptotic distribution of the ruin time is to decompose the path of $\{S_t\}$ into ladder segments. To clarify the ideas, we first consider the case where ruin occurs already in the first ladder segment, that is, the case $\tau(0) < \infty$, $S_{\tau(0)} > u$. Let $Y = Y_1 = S_{\tau_+(1)}$ be the value of the claim surplus process just after the first ladder epoch and $Z = Z_1 = S_{\tau_+(1)-}$ the value just before the first ladder epoch (these r.v.'s are defined w.p. 1 w.r.t. $\mathbb{P}^{(0)}$); see Fig. X.2.
Figure X.2

The distribution of $(Y, Z)$ is described in Theorem IV.2.2. The formulation relevant for the present purposes states that $Y$ has distribution $B_0$ and that conditionally upon $Y = y$, $Z$ follows the excess distribution $B^{(y)}$ given by $\bar B^{(y)}(x) = \bar B(y + x)/\bar B(y)$. We are interested in the conditional distribution of $\tau(u) = \tau(0)$ given
$$\{\tau(0) < \infty,\ S_{\tau(0)} > u\} \;=\; \{\tau(0) < \infty,\ Y > u\},$$
that is, the distribution w.r.t. $\mathbb{P}^{(u,1)} = \mathbb{P}(\,\cdot \mid \tau(0) < \infty,\ Y > u)$. Now the $\mathbb{P}^{(u,1)}$ distribution of $Y - u$ is $B_0^{(u)}$. That is, the $\mathbb{P}^{(u,1)}$ density of $Y$ is $\bar B(y)/[\mu_B \bar B_0(u)]$, $y > u$. $B_0^{(u)}$ is also the $\mathbb{P}^{(u,1)}$ distribution of $Z$ since
$$\mathbb{P}(Z > a \mid Y > u) \;=\; \int_u^\infty \frac{\bar B(y)}{\mu_B \bar B_0(u)}\cdot\frac{\bar B(y + a)}{\bar B(y)}\,dy \;=\; \int_{u+a}^\infty \frac{\bar B(z)}{\mu_B \bar B_0(u)}\,dz \;=\; \bar B_0^{(u)}(a)\,.$$
Let $\{\omega(z)\}_{z\ge0}$ be defined by $\omega(z) = \inf\{t > 0 : R_t = z\}$, where $\{R_t\}$ is independent of $\{S_t\}$, in particular of $Z$. Then Corollary 4.3 implies that the $\mathbb{P}^{(u,1)}$ distribution of $\tau(u) = \tau(0)$ is that of $\omega(Z)$. Now $B_0 \in \mathcal{S}$ implies that $\bar B_0^{(u)}(a) \to 0$ for any fixed $a$, i.e. $\mathbb{P}(Z \le a \mid Y > u) \to 0$. Since $\omega(z)/z \to 1/(1-\rho)$ a.s. as $z \to \infty$, it therefore follows that $\tau(u)/Z$ converges in $\mathbb{P}^{(u,1)}$ probability to $1/(1-\rho)$.

Since the conditional distribution of $Z$ is known (viz. $B_0^{(u)}$), this in principle determines the asymptotic behavior of $\tau(u)$. However, a slight rewriting may be more appealing. Recall the definition of the auxiliary function $e(x)$ in Section 1. It is straightforward that under the conditions of Proposition 1.18(c)
$$\bar B_0^{(u)}\bigl(y\,e(u)\bigr) \;\to\; \mathbb{P}(W > y) \qquad (4.4)$$
where the distribution of $W$ is Pareto with mean one in case (a) and exponential with mean one in case (b). That is, $Z/e(u) \to W$ in $\mathbb{P}^{(u,1)}$ distribution. $\tau(u)/Z \to 1/(1-\rho)$ then yields the final result $\tau(u)/e(u) \to W/(1-\rho)$ in $\mathbb{P}^{(u,1)}$ distribution. We now turn to the general case and will see that this conclusion is also true in $\mathbb{P}^{(u)}$ distribution:

Theorem 4.4 Assume that $B_0 \in \mathcal{S}$ and that (4.4) holds. Then $\tau(u)/e(u) \to W/(1-\rho)$ in $\mathbb{P}^{(u)}$ distribution.

In the proof, let $\tau_+(1) = \tau(0), \tau_+(2), \ldots$ denote the ladder epochs and let $Y_k, Z_k$ be defined similarly as $Y = Y_1$, $Z = Z_1$ but relative to the $k$th ladder
segment, cf. Fig. X.3. Then, conditionally upon $\tau_+(n) < \infty$, the random vectors $(Y_1, Z_1), \ldots, (Y_n, Z_n)$ are i.i.d. and distributed as $(Y, Z)$. We let
$$K(u) \;=\; \inf\{n = 1, 2, \ldots : \tau_+(n) < \infty,\ Y_1 + \cdots + Y_n > u\}$$
denote the number of ladder steps leading to ruin and $\mathbb{P}^{(u,n)} = \mathbb{P}(\,\cdot \mid \tau(u) < \infty,\ K(u) = n)$. The idea is now to observe that if $K(u) = n$, then by the subexponential property $Y_n$ must be large, i.e. $> u$ with high probability, and $Y_1, \ldots, Y_{n-1}$'typical'. Hence $Z_n$ must be large and $Z_1, \ldots, Z_{n-1}$ 'typical', which implies that the first $n-1$ ladder segments must be short and the last long; more precisely, the duration $\tau_+(n) - \tau_+(n-1)$ of the last ladder segment can be estimated by the same approach as we used above when $n = 1$, and since it dominates the first $n-1$, we get the same asymptotics as when $n = 1$.

Figure X.3

In the following, $\|\cdot\|$ denotes the total variation norm between probability measures and $\otimes$ product measure.

Lemma 4.5 $\bigl\|\mathbb{P}^{(u,n)}\bigl((Y_1, \ldots, Y_{n-1}, Y_n - u) \in \cdot\bigr) - B_0^{\otimes(n-1)} \otimes B_0^{(u)}\bigr\| \to 0$.

Proof. We shall use the easily proved fact that if $A'(u), A''(u)$ are events such that $\mathbb{P}\bigl(A'(u)\,\Delta\,A''(u)\bigr) = o\bigl(\mathbb{P}(A'(u))\bigr)$ ($\Delta$ = symmetric difference of events), then
$$\bigl\|\mathbb{P}\bigl(\cdot \mid A'(u)\bigr) - \mathbb{P}\bigl(\cdot \mid A''(u)\bigr)\bigr\| \;\to\; 0\,.$$
Taking $A'(u) = \{Y_n > u\}$ and
$$A''(u) \;=\; \{K(u) = n\} \;=\; \bigl\{Y_1 + \cdots + Y_{n-1} \le u,\ Y_1 + \cdots + Y_n > u\bigr\},$$
the condition on $A'(u)\,\Delta\,A''(u)$ follows from $B_0$ being subexponential (Proposition 1.2, suitably adapted). Further, $\mathbb{P}(\cdot \mid A'(u)) = \mathbb{P}^{(u,n)}$ and
$$\mathbb{P}\bigl((Y_1, \ldots, Y_{n-1}, Y_n - u) \in \cdot \mid A'(u)\bigr) \;=\; B_0^{\otimes(n-1)} \otimes B_0^{(u)}. \qquad\Box$$

Lemma 4.6 $\bigl\|\mathbb{P}^{(u,n)}\bigl((Z_1, \ldots, Z_n) \in \cdot\bigr) - B_0^{\otimes(n-1)} \otimes B_0^{(u)}\bigr\| \to 0$.

Proof. Let $(Y_1', Z_1'), \ldots, (Y_n', Z_n')$ be independent random vectors such that the conditional distribution of $Z_k'$ given $Y_k' = y$ is $B^{(y)}$, $k = 1, \ldots, n$, and such that $Y_k'$ has marginal distribution $B_0$ for $k = 1, \ldots, n-1$ and $Y_n' - u$ has distribution $B_0^{(u)}$. That is, the density of $Y_n'$ is $\bar B(y)/[\mu_B \bar B_0(u)]$, $y > u$. The same calculation as given above when $n = 1$ then shows that the marginal distribution of $Z_n'$ is $B_0^{(u)}$. Similarly (replace $u$ by 0), the marginal distribution of $Z_k'$ is $B_0$ for $k < n$, and clearly $Z_1', \ldots, Z_n'$ are independent. Now use that if the conditional distribution of $Z'$ given $Y'$ is the same as the conditional distribution of $Z$ given $Y$ and $\|\mathbb{P}(Y \in \cdot) - \mathbb{P}(Y' \in \cdot)\| \to 0$, then $\|\mathbb{P}(Z \in \cdot) - \mathbb{P}(Z' \in \cdot)\| \to 0$ (here $Y, Y', Z, Z'$ are arbitrary random vectors; in our example $Y = (Y_1, \ldots, Y_n)$ etc.). □

Proof of Theorem 4.4. The first step is to observe that $K(u)$ has a proper limit distribution w.r.t. $\mathbb{P}^{(u)}$, since by Theorem 2.1,
$$\mathbb{P}^{(u)}(K(u) = n) \;=\; \frac{1}{\psi(u)}\,\rho^n\, \mathbb{P}\bigl(Y_1 + \cdots + Y_{n-1} \le u,\ Y_1 + \cdots + Y_n > u\bigr) \;\sim\; \frac{\rho^n\, \mathbb{P}(Y_n > u)}{\rho/(1-\rho)\,\bar B_0(u)} \;=\; (1-\rho)\rho^{n-1}$$
for $n = 1, 2, \ldots$. It therefore suffices to show that the $\mathbb{P}^{(u,n)}$ distribution of $\tau(u)$ has the asserted limit. Let $\{\omega_1(z)\}, \ldots, \{\omega_n(z)\}$ be i.i.d. copies of $\{\omega(z)\}$. Then according to Section 4a, the $\mathbb{P}^{(u,n)}$ distribution of $\tau(u)$ is the same as the $\mathbb{P}^{(u,n)}$ distribution of $\omega_1(Z_1) + \cdots + \omega_n(Z_n)$. By Lemma 4.6, $\omega_k(Z_k)$ has a proper limit distribution as $u \to \infty$ for $k < n$, whereas $\omega_n(Z_n)$ has the same limit behavior as when $n = 1$ (cf. the discussion just before the statement of Theorem 4.4). Thus
$$\mathbb{P}^{(u,n)}\bigl(\tau(u)/e(u) > y\bigr) \;=\; \mathbb{P}^{(u,n)}\bigl([\omega_1(Z_1) + \cdots + \omega_n(Z_n)]/e(u) > y\bigr) \;\sim\; \mathbb{P}^{(u,n)}\bigl(\omega_n(Z_n)/e(u) > y\bigr) \;\to\; \mathbb{P}\bigl(W/(1-\rho) > y\bigr). \qquad\Box$$
Corollary 4.7 $\displaystyle \psi\bigl(u, e(u)T\bigr) \;\sim\; \frac{\rho}{1-\rho}\,\bar B_0(u)\cdot \mathbb{P}\bigl(W/(1-\rho) \le T\bigr)$.
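The scaling in (4.4) and Corollary 4.7 can be made concrete for Pareto claims, where the rescaled overshoot tail $\bar B_0^{(u)}(y\,e(u))$ is computable in closed form and is in fact independent of $u$, so the Pareto limit $W$ appears exactly. A small numerical sketch (the claim-distribution parameters are our own illustrative choice):

```python
import numpy as np

alpha, kappa = 2.5, 1.0   # illustrative Pareto claim tail: B_bar(x) = (1 + x/kappa)^(-alpha)

def B0_bar(x):
    # tail of the stationary excess distribution B_0 of the Pareto claim distribution
    return (1.0 + x / kappa) ** (-(alpha - 1.0))

def overshoot_tail(a, u):
    # tail of B_0^(u), the overshoot of B_0 over level u
    return B0_bar(u + a) / B0_bar(u)

def e(u):
    # mean excess function of the Pareto claim distribution
    return (kappa + u) / (alpha - 1.0)

y = np.linspace(0.1, 5.0, 25)
curves = [overshoot_tail(y * e(u), u) for u in (10.0, 100.0, 1000.0)]
# every curve equals (1 + y/(alpha-1))^(-(alpha-1)), a Pareto tail with shape alpha-1
```

The three curves coincide for all $u$, illustrating why no further limit is needed in this particular case.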
For a growth rate $d(u)T$ of the time horizon with $d(u) = o\bigl(e(u)\bigr)$, Corollary 4.7 implies $\psi\bigl(u, d(u)T\bigr) = o\bigl(\psi(u)\bigr)$. The following theorem gives some more explicit information for this case and identifies the bridge between the asymptotic behavior of $\psi(u, T)$ of the order of $\bar B(u)$ (cf. (4.1)) and the one of $\psi\bigl(u, e(u)T\bigr)$ of the order of $\bar B_0(u)$ (which is already the order of $\psi(u)$).

Theorem 4.8 If $B \in \mathcal{S}$ and $B_0 \in \mathcal{S}$, then for $d(u) \uparrow \infty$ with $d(u) = o\bigl(e(u)\bigr)$,
$$\psi\bigl(u, d(u)T\bigr) \;\sim\; \beta\,\bar B(u)\,d(u)\,T.$$

Proof. Consider the random walk $A_n = \sum_{i=1}^n \xi_i$ with $\mathbb{E}\xi_i = 0$ and distribution function $F(x) = \mathbb{P}(\xi_i \le x) \in \mathcal{S}$, and also its stationary excess distribution $F_0 \in \mathcal{S}$. Let $M_\sigma^c := \max_{n\le\sigma}(A_n - cn)$ for some stopping time $\sigma$ of the random walk and some constant $c \ge 0$. According to a result of Foss, Palmowski & Zachary [366], one then has
$$\mathbb{P}(M_\sigma^c > u) \;\sim\; \sum_{n\ge1} \mathbb{P}(\sigma \ge n)\,\bar F(u + cn) \qquad (4.5)$$
as $u \to \infty$, uniformly over all stopping times $\sigma$. Let $h$ be fixed and choose $\xi_i = \sum_{j=1}^{N(h)} U_j - \beta\mu_B h$ (implying $F \in \mathcal{S}$ and $F_0 \in \mathcal{S}$), $c = (1 - \beta\mu_B)h$ and furthermore $\sigma = d(u)T/h$. Denote the ruin probability of the discrete-time process $R_n^{(h)}$ ($n \in \mathbb{N}$), i.e. the Cramér-Lundberg process viewed at the time points $nh$ only, by $\psi^{(h)}$. Relation (4.5) then translates into
$$\psi^{(h)}\bigl(u, d(u)T\bigr) \;\sim\; \sum_{n\ge1} \mathbb{P}\bigl(d(u)T \ge nh\bigr)\,\bar F\bigl(u + (1-\beta\mu_B)nh\bigr) \;=\; \sum_{1\le n\le d(u)T/h} \bar F\bigl(u + (1-\beta\mu_B)nh\bigr).$$
It follows that an asymptotic upper bound for $\psi^{(h)}\bigl(u, d(u)T\bigr)$ is
$$\frac{d(u)T}{h}\,\bar F\bigl(u + (1-\beta\mu_B)h\bigr) \;=\; \frac{d(u)T}{h}\,\mathbb{P}\Bigl(\sum_{j=1}^{N(h)} U_j > u + h\Bigr) \;\sim\; \beta\,d(u)T\,\bar B(u + h) \;\sim\; \beta\,d(u)T\,\bar B(u)$$
and similarly an asymptotic lower bound is
$$\frac{d(u)T}{h}\,\bar F\bigl(u + (1-\beta\mu_B)\,d(u)T\bigr) \;=\; \frac{d(u)T}{h}\,\mathbb{P}\Bigl(\sum_{j=1}^{N(h)} U_j > u + (1-\beta\mu_B)\,d(u)T + \beta\mu_B h\Bigr)$$
$$\sim\; \beta\,d(u)T\,\bar B\bigl(u + (1-\beta\mu_B)\,d(u)T + \beta\mu_B h\bigr) \;\sim\; \beta\,d(u)T\,\bar B(u),$$
where the last asymptotic relation uses $\bar B\bigl(u + d(u)\bigr) \sim \bar B(u)$, which holds for $d(u) = o\bigl(e(u)\bigr)$ under the stated assumptions on $B$ (this property in fact characterizes the special role of the mean excess function $e(u)$ in this context). Thus $\psi^{(h)}\bigl(u, d(u)T\bigr) \sim \beta\,d(u)T\,\bar B(u)$. From
$$\max_{t \le \lfloor d(u)T/h\rfloor h} (A_t - t) \;\ge\; \max_{n \le d(u)T/h} (A_{nh} - nh) \;\ge\; \max_{t \le \lfloor d(u)T/h\rfloor h} (A_t - t) - h$$
one finally observes that for $h \to 0$, $\psi^{(h)}\bigl(u, d(u)T\bigr)$ can be replaced by $\psi\bigl(u, d(u)T\bigr)$. □

Notes and references Excursion theory for general Markov processes is a fairly abstract and advanced topic. For Theorem 4.2, see Fitzsimmons [363], in particular his Proposition 2.1. Most of the results of Section 4b are from Asmussen & Klüppelberg [86], who also treated the renewal model and gave a sharp total variation limit result. Extensions to the Markov-modulated model of Chapter VII are in Asmussen & Højgaard [80] and to Lévy processes in Klüppelberg, Kyprianou & Maller [543]. Theorem 4.8 can be found in Albrecher & Asmussen [12]. For extensions of (4.1) that hold uniformly in $t$ in renewal models, see Tang [828] and Leipus & Šiaulys [578]. Asmussen & Teugels [107] studied approximations of $\psi(u, T)$ when $T \to \infty$ with $u$ fixed; the results only cover the regularly varying case.
5 Reserve-dependent premiums
We consider the model of Chapter VIII with Poisson arrivals at rate $\beta$, claim size distribution $B$, and premium rate $p(x)$ at level $x$ of the reserve.

Theorem 5.1 Assume that $B$ is subexponential and that $p(x) \to \infty$, $x \to \infty$. Then
$$\psi(u) \;\sim\; \beta \int_u^\infty \frac{\bar B(y)}{p(y)}\,dy\,. \qquad (5.1)$$
The key step in the proof is the following lemma on the cycle maximum of the associated storage process $\{V_t\}$, cf. Corollary III.2.2. Assume for simplicity that $\{V_t\}$ regenerates in state 0, i.e. that $\int_0^\epsilon p(x)^{-1}\,dx < \infty$, and define the cycle as
$$\sigma \;=\; \inf\Bigl\{t > 0 : V_t = 0,\ \max_{0\le s\le t} V_s > 0 \,\Big|\, V_0 = 0\Bigr\}.$$

Lemma 5.2 Define $M_\sigma = \sup_{0\le t<\sigma} V_t$. Then $\mathbb{P}(M_\sigma > u) \sim \beta\,\mathbb{E}\sigma\cdot \bar B(u)$.

The heuristic motivation is the usual one in the heavy-tailed area: $M_\sigma$ becomes large as a consequence of one big jump. The form of the result then follows by noting that the process has mean time $\mathbb{E}\sigma$ to make this big jump and that the jump then occurs with intensity $\beta\bar B(u)$. More precisely, one expects the level $y$ from which the big jump occurs to be $O(1)$; the probability that the jump exceeds $u$ is then $\bar B(u - y) \sim \bar B(u)$. The rigorous proof is, however, nontrivial, and we refer to Asmussen [64] (with a gap of that paper being filled in Asmussen et al. [83]).

Proof of Theorem 5.1. We will show that the stationary density $f(x)$ of $\{V_t\}$ satisfies
$$f(x) \;\sim\; \frac{\beta \bar B(x)}{p(x)}\,. \qquad (5.2)$$
We then get
$$\psi(u) \;=\; \mathbb{P}(V > u) \;=\; \int_u^\infty f(y)\,dy \;\sim\; \beta \int_u^\infty \frac{\bar B(y)}{p(y)}\,dy\,,$$
and the result follows.

Define $D(u)$ as the steady-state rate of downcrossings of level $u$ by $\{V_t\}$ and $D_\sigma(u)$ as the expected number of downcrossings of level $u$ during a cycle. Then $D(u) = f(u)p(u)$ and, by regenerative process theory, $D(u) = D_\sigma(u)/\mathbb{E}\sigma$. Further, the conditional distribution of the number of downcrossings of $u$ during a cycle given $M_\sigma > u$ is geometric with parameter $q(u) = \mathbb{P}(M_\sigma > u \mid V_0 = u)$. Hence
$$f(u)p(u) \;=\; D(u) \;=\; \frac{D_\sigma(u)}{\mathbb{E}\sigma} \;=\; \frac{\mathbb{P}(M_\sigma > u)}{\mathbb{E}\sigma\,(1 - q(u))} \;\sim\; \frac{\beta \bar B(u)}{1 - q(u)}\,.$$
Now just use that $p(x) \to \infty$ implies $q(x) \to 0$. □
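The asymptotic formula (5.1) is easy to evaluate numerically. A sketch for Pareto claims and a linear premium rule $p(x) = a + bx$ (all parameter values, the cutoff and the trapezoidal rule are our own illustrative choices, not from the text):

```python
import numpy as np

beta_rate, alpha, kappa = 0.5, 2.5, 1.0   # Poisson rate; Pareto tail B_bar(y) = (1+y/kappa)^(-alpha)
a, b = 1.0, 0.1                           # premium rule p(x) = a + b*x, so p(x) -> infinity

def psi_approx(u, cutoff=1e7, n=200_001):
    """Trapezoidal approximation of beta * int_u^cutoff B_bar(y)/p(y) dy on a
    log-spaced grid; the neglected tail beyond the cutoff is negligible here."""
    y = np.geomspace(u, cutoff, n)
    f = (1.0 + y / kappa) ** (-alpha) / (a + b * y)
    return beta_rate * np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(y))

values = [psi_approx(u) for u in (10.0, 20.0, 50.0)]
```

As expected, the approximation decreases in $u$ and decays faster than the claim tail alone, reflecting the extra $p(y)$ in the denominator.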
Notes and references The results are from Asmussen [64], where also the (easier) case of $p(x)$ having a finite limit is treated. It is also shown in that paper that typically, there exist constants $c(u) \to 0$ such that the limiting distribution of $\tau(u)/c(u)$, given $\tau(u) < \infty$, is exponential. An early reference for linear $p(x)$ is Klüppelberg & Stadtmüller [546]. Note that for linear $p(x)$ and regularly varying claim size distribution, the result is consistent with the limit $\sigma \to 0$ of X.(6.2). For extensions and variants see Foss, Palmowski & Zachary [366] and Robert [741]. There are a number of further papers of Tang and coauthors dealing with aspects of ruin with subexponential claims under interest force, e.g. [551, 829]. See also Kalashnikov & Konstantinides [519].
6 Tail estimation
The fact that the order of ruin probabilities usually depends crucially on the tail, and that the asymptotics are very different in the light- and heavy-tailed regimes, poses the problem of which distributional tail $\bar F$ to employ. Of course, this is a general statistical problem, but definitely something that needs to be taken seriously. We give here only a brief introduction and refer in the Notes to standard textbooks for more detailed and broader expositions. When computing ruin probabilities, assumptions on the tail only partially suffice: more precise estimates like the Cramér-Lundberg approximation with light tails or the subexponential approximation require the whole distribution. We will not, however, discuss here how to combine tail fitting with fitting on the whole support.

We will consider the problem of fitting (with particular emphasis on the tail) a distribution $F$ to a set of data $X_1, \ldots, X_n \ge 0$ assumed to be i.i.d. with common distribution $F$. As usual, $X_{(1)}, \ldots, X_{(n)}$ denote the order statistics. Inference on $\bar F(x)$ beyond $x = X_{(n)}$ is of course extrapolation of the data, and in a given situation it will far from always be obvious that this makes sense. However, some extrapolation seems inevitable: because the empirical distribution $F_n^*$ has the finite upper bound $X_{(n)}$, most methods are otherwise likely to underestimate the tail and in particular often postulate that it is light.
6a The mean excess plot
A first question is to decide whether to use a light- or a heavy-tailed model. The approach most widely used is based on the mean excess function
$$e(x) \;=\; \mathbb{E}\bigl[X - x \,\big|\, X > x\bigr] \;=\; \frac{1}{\bar F(x)} \int_x^\infty \bar F(y)\,dy$$
introduced in Section 1. The reason that the mean excess function $e(x)$ is useful is that it typically behaves quite differently asymptotically for light and heavy tails. Namely, for a subexponential heavy-tailed distribution one has $e(x) \to \infty$, whereas with light tails it will typically hold that $\limsup e(x) < \infty$; say a sufficient condition is
$$\bar F(x) \;\sim\; \ell(x)\,\mathrm{e}^{-\alpha x} \qquad (6.1)$$
for some $\alpha > 0$ and some $\ell(x)$ such that $\ell(\log x)$ is slowly varying (e.g., $\ell(x) = x^\gamma$ with $-\infty < \gamma < \infty$). The mean excess test proceeds by plotting the empirical version
$$e_n(x) \;=\; \frac{1}{\#\{j : X_j > x\}} \sum_{j:\,X_j > x} (X_j - x)$$
of $e(x)$, usually only at the (say) $K$ largest $X_j$. That is, the plot consists of the pairs formed by $X_{(n-k)}$ and
$$\frac{1}{k} \sum_{l=n-k+1}^{n} \bigl(X_{(l)} - X_{(n-k)}\bigr),$$
where $k = 1, \ldots, K$. If the plot shows a clear increase to $\infty$, except possibly at very small $k$, one takes this as an indication that $F$ is heavy-tailed; otherwise one settles for a light-tailed model.

Example 6.1 Figure X.4 contains the mean excesses of simulated data with $n = 1{,}000$ from six different distributions. Each row is generated from i.i.d. r.v.'s $Y_1, Y_2, \ldots$ such that $X = Y_1$ in the left column and $X = Y_1 + Y_2 + Y_3$ in the right. In row 1, $Y$ is Pareto with $\alpha = 3/2$; in row 2, $Y$ is Weibull with $\beta = 1/2$; and in row 3, $Y$ is exponential; the scale is chosen such that $\mathbb{E}Y = 3$ in all cases.
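The pairs above are straightforward to compute. A minimal sketch (function and variable names are ours):

```python
import numpy as np

def mean_excess_pairs(x, K):
    """Pairs (X_(n-k), e_n(X_(n-k))) for k = 1, ..., K, as in the mean excess plot."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    pairs = []
    for k in range(1, K + 1):
        threshold = xs[n - k - 1]          # the order statistic X_(n-k)
        excesses = xs[n - k:] - threshold  # the k largest observations minus the threshold
        pairs.append((threshold, excesses.mean()))
    return pairs
```

For a heavy-tailed sample the second coordinate should grow with the threshold; for a light-tailed one it should stay bounded.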
Figure X.4

To our mind, the story told by these pictures is not all that clear, so one conclusion is certainly that using the mean excess plot is not entirely straightforward. □
6b Extreme values and POT
Another method for making qualitative statements about the tail of the underlying distribution $F$ of the data is based on extreme value theory, more precisely on the results from that area that describe the asymptotics of the maximum $M_n = X_{(n)}$ and of large order statistics like $X_{(n-1)}, \ldots, X_{(n-k)}$. To describe the method, we first recall the Fisher-Tippett theorem, which states that when $M_n$ can be scaled and centered such that $(M_n - d_n)/c_n$ has a nondegenerate limit in distribution $H$ for suitable constants $c_n, d_n$, then $H$ must be of one of three types, the three classical extreme value distributions: Fréchet, Weibull or Gumbel. Here a Fréchet limit typically occurs for very heavy-tailed distributions (in fact, if and only if $\bar F$ is regularly varying), whereas a Gumbel limit occurs for light-tailed distributions and moderately heavy-tailed distributions like the lognormal and Weibull tails $\mathrm{e}^{-ax^b}$ with $b < 1$; Weibull limits³ need not concern us here, since they only occur for distributions with bounded support.

The Fréchet c.d.f. is $H(x) = \mathrm{e}^{-x^{-\alpha}}$, $x > 0$, $\alpha > 0$, and the Gumbel c.d.f. is $H(x) = \mathrm{e}^{-\mathrm{e}^{-x}}$, $x \in \mathbb{R}$. The qualifier 'type' above refers to the fact that obviously $H$ can only be given up to scaling and location constants. It is customary to work in the class of generalized extreme value distributions defined by
$$H_\xi(x) \;=\; \begin{cases} \exp\bigl\{-(1+\xi x)^{-1/\xi}\bigr\} & \xi \ne 0,\ 1 + \xi x > 0, \\ \exp\bigl\{-\mathrm{e}^{-x}\bigr\} & \xi = 0,\ -\infty < x < \infty. \end{cases}$$
The particular reason for the normalization of the Fréchet c.d.f. ($\xi > 0$) is to ensure continuity at $\xi = 0$, i.e. $H_\xi(x) \to H_0(x)$ for all $x$ as $\xi \to 0$. The class of all possible limits is obtained by adding a location parameter $\mu$ and a scale parameter $\sigma > 0$, i.e. by considering $H_{\xi,\mu,\sigma}(x) = H_\xi\bigl((x - \mu)/\sigma\bigr)$. A distribution $F$ such that $(M_n - d_n)/c_n$ has limit $H$ is said to be in the maximum domain of attraction of $H$.
For applications, one can safely assume a given distribution $F$ with infinite support to be in the maximum domain of attraction of either the Fréchet or the Gumbel law (but there are exceptions; in particular, discrete distributions like the geometric sometimes disturb the picture). This means that the distribution of $M_k$ with $k$ large is likely to be close to some $H_{\xi,\mu,\sigma}$. The statistical procedure based on this observation is to obtain $m$ approximately i.i.d. replicates $M_{k,1}, \ldots, M_{k,m}$ of $M_k$ and use these as data for maximum likelihood estimation of $\xi, \mu, \sigma$. Writing $n = km$, the $M_{k,i}$ can be obtained by splitting the $n$ observations into $m$ blocks of size $k$ and letting $M_{k,i}$ be the maximum over block $i$. The density $h_{\xi,\mu,\sigma}(x)$ of $H_{\xi,\mu,\sigma}$ is nonzero only when $1 + \xi(x-\mu)/\sigma > 0$ and is given by
$$h_{\xi,\mu,\sigma}(x) \;=\; \frac{1}{\sigma\bigl(1+\xi(x-\mu)/\sigma\bigr)^{1/\xi+1}}\, \exp\Bigl\{-\bigl(1+\xi(x-\mu)/\sigma\bigr)^{-1/\xi}\Bigr\}$$
(taking $\xi > 0$ for simplicity). The log likelihood is therefore
$$-m\log\sigma \;-\; (1/\xi+1)\sum_{i=1}^m \log\bigl(1+\xi(M_{k,i}-\mu)/\sigma\bigr) \;-\; \sum_{i=1}^m \bigl(1+\xi(M_{k,i}-\mu)/\sigma\bigr)^{-1/\xi}$$
and has to be maximized over the region
$$\Bigl\{\xi \ge 0,\ \sigma > 0,\ \mu \,:\, 0 < \min_{i=1,\ldots,m}\bigl(1+\xi(M_{k,i}-\mu)/\sigma\bigr)\Bigr\}.$$

³Note that the Weibull extreme value distribution is the negative analogue of the classical Weibull distribution.
Obviously, the maximization has to be done numerically. The most interesting parameter to estimate is $\xi$. Namely, an estimate $\hat\xi$ that is significantly larger than 0 indicates regular variation of $\bar F$, whereas one close to 0 indicates that most likely the tail is lighter than for regular variation. In practice the uncertainty of the estimates is usually high. One reason is that the block size $k$ needs to be taken large in order that the $M_{k,i}$ have a distribution reasonably close to the asymptotics predicted by extreme value theory. Thus, the sample size $m$ for fitting the parameters will be orders of magnitude smaller than the actual number $n$ of observations. Consequently, the resulting estimates should be interpreted with great care.

Due to this waste of data by blocking, another method is more popular in practice. It is based on the generalized Pareto distribution $G_{\xi,\beta}$ with tail
$$\bar G_{\xi,\beta}(x) \;=\; \begin{cases} (1 + \xi x/\beta)^{-1/\xi} & \xi \ne 0, \\ \mathrm{e}^{-x/\beta} & \xi = 0. \end{cases}$$
Thus for $\xi > 0$, $G_{\xi,\beta}$ is the distribution of $\beta X/\xi$ where $X$ has the standard Pareto tail $(1+x)^{-\alpha}$ with $\alpha = 1/\xi$, and $G_{0,\beta}$ is the exponential$(1/\beta)$ limit as $\xi \downarrow 0$. One has:

Lemma 6.2 If $X$ has distribution $G_{\xi,\beta}$, then the distribution $F_x$ of the overshoot $X - x \mid X > x$ is $G_{\xi,\beta(x)}$ where $\beta(x) = \beta + \xi x$.
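Lemma 6.2 can be checked directly from the formula for the tail. A short sketch (the parameter values are illustrative):

```python
def gpd_tail(x, xi, beta):
    # tail of the generalized Pareto distribution G_{xi,beta} (xi > 0 case)
    return (1.0 + xi * x / beta) ** (-1.0 / xi)

xi, beta = 0.4, 2.0
errors = []
for x in (0.5, 1.0, 5.0):
    for y in (0.1, 1.0, 3.0):
        overshoot = gpd_tail(x + y, xi, beta) / gpd_tail(x, xi, beta)  # tail of X - x | X > x
        shifted = gpd_tail(y, xi, beta + xi * x)                        # tail of G_{xi, beta(x)}
        errors.append(abs(overshoot - shifted))
```

The two quantities agree to machine precision, which is exactly the stability property of the lemma.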
The proof is elementary and omitted. Furthermore:

Theorem 6.3 A distribution $F$ is in the maximum domain of attraction of a generalized extreme value distribution $H_\xi$ with $\xi \ge 0$ if and only if there exist constants $\beta(x)$ such that
$$\lim_{x\to\infty}\, \sup_{y\ge0}\, \bigl|\bar F_x(y) - \bar G_{\xi,\beta(x)}(y)\bigr| \;=\; 0\,.$$
The proof is also omitted, but is not elementary! As noted above, one can almost always safely assume a given distribution $F$ with infinite support to satisfy the assumptions of Theorem 6.3. This motivates that for tail estimation one assumes that $F_x$ is a $G_{\xi,\beta}$ distribution for all large $x$ (where $\beta$ depends on $x$), selects some large but fixed threshold $x$ and estimates $\xi, \beta$ from the observations exceeding $x$. Then, letting $N(x) = \#\{i : X_i > x\}$, the final estimate of the tail of $F$ is
$$\widehat{\bar F}(y) \;=\; \frac{N(x)}{n}\,\bar G_{\hat\xi,\hat\beta}(y - x), \qquad y > x, \qquad (6.2)$$
where $\hat\xi, \hat\beta$ are the estimates of $\xi$, $\beta = \beta(x)$. To obtain $\hat\xi, \hat\beta$, one lets $Y_i = X_{j_i} - x$, $i = 1, \ldots, N(x)$, where $j_0 = 0$, $j_i = \inf\{j > j_{i-1} : X_j > x\}$, and maximizes the log likelihood
$$\sum_{i=1}^{N(x)} \log g_{\xi,\beta}(Y_i) \;=\; -N(x)\log\beta \;-\; (1/\xi + 1)\sum_{i=1}^{N(x)} \log(1 + \xi Y_i/\beta)\,,$$
where $g_{\xi,\beta}$ is the density of $G_{\xi,\beta}$. The $Y_i$ represent peaks over the threshold $x$, and for this reason, the method and its extensions go under the name POT.
6c The Hill estimator

We now assume that $F$ is either regularly varying, $\bar F(x) = L(x)/x^\alpha$, or light-tailed satisfying (6.1). The problem is to estimate $\alpha$. Even with $L$ or $\ell$ completely specified, the maximum likelihood estimator (MLE) is not adequate in this connection, because maximum likelihood will try to adjust $\alpha$ so that the fit is good in the center of the distribution, without caring too much about the tail, where there are fewer observations. The Hill estimator is the most commonly used (though not the only) estimator designed specifically to take this into account. To explain the idea, consider first the setting of (6.1). If we ignore fluctuations in $\ell(x)$ by replacing $\ell(x)$ by a constant, the $X_j - x$ with $X_j > x$ are i.i.d. exponential($\alpha$). Since the standard MLE of $\alpha$ in the (unshifted) exponential
distribution is $n/(X_1 + \cdots + X_n)$, the MLE of $\alpha$ based on these selected $X_j$ alone is
$$\frac{\#\{j : X_j > x\}}{\sum_{j:\,X_j > x} (X_j - x)}\,.$$
The Hill plot is this quantity plotted as a function of $x$ or of the number $\#\{j : X_j > x\}$ of observations used. As for the mean excess plot, one usually plots only at the (say) $k$ largest $j$ or the $k$ largest $X_j$. That is, one plots
$$\frac{k}{\sum_{\ell=n-k+1}^{n} \bigl(X_{(\ell)} - X_{(n-k)}\bigr)} \qquad (6.3)$$
as a function of either $k$ or $X_{(n-k)}$. The Hill estimator $\alpha^H_{n,k}$ is (6.3) evaluated at some specified $k$. However, most often one checks graphically whether the Hill plot looks reasonably constant in a suitable range and takes a typical value from there as the estimate of $\alpha$.

The regularly varying case can be treated by entirely the same method, or one may remark that it is in 1-to-1 correspondence with (6.1), because $X$ has tail $L(x)/x^\alpha$ if and only if $\log X$ has tail (6.1). Therefore, the Hill estimator in the regularly varying case is
$$\frac{k}{\sum_{\ell=n-k+1}^{n} \bigl(\log X_{(\ell)} - \log X_{(n-k)}\bigr)}\,. \qquad (6.4)$$
It can be proved that if $k = k(n) \to \infty$ but $k/n \to 0$, then the weak consistency $\alpha^H_{n,k} \to^{\mathbb{P}} \alpha$ holds. No conditions on $L$ are needed for this. One might think that the next step would be the estimation of the slowly varying function $L$, but this is in general considered impossible among statisticians. In fact, we will see below that there are already difficulties enough with $\alpha^H_{n,k}$ itself.
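The Hill estimator (6.4) takes only a few lines. A sketch (naming ours), together with an exact-Pareto check, the one case where the estimator is known to behave well:

```python
import numpy as np

def hill(x, k):
    """Hill estimator of alpha based on the k largest order statistics, eq. (6.4)."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    return k / np.sum(np.log(xs[n - k:]) - np.log(xs[n - k - 1]))

# exact Pareto(alpha = 3/2) sample: X = U^(-1/alpha) has tail x^(-alpha), x >= 1
rng = np.random.default_rng(42)
sample = rng.uniform(size=10_000) ** (-1.0 / 1.5)
estimate = hill(sample, 500)
```

For a slowly varying $L$ that is not constant, the plot can be far less stable, as Example 6.4 below illustrates.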
Example 6.4 Figure X.5 contains the Hill plots (6.4) of simulated data (now with $n = 10{,}000$) from the same six distributions as in Example 6.1, with the number $k = 10, \ldots, 2{,}000$ of order statistics used on the horizontal axis. Of course, only the first row, Pareto(3/2), is meaningful, since the distributions in the remaining rows are not regularly varying. Nevertheless, the appearance of the second row of plots, Weibull, is so close to the first that it is hard to assert from this alone that the distribution is not regularly varying (the evidence from the mean excess plot in Figure X.4 is not that conclusive either). The same holds, though maybe in a somewhat weaker form, for the exponential case in the third row. The first row also clearly demonstrates the difficulty in choosing $k$. Maybe one would settle for a value between 50 and 500 in the left panel, giving an estimate of $\alpha$ between 1.6 and 1.4.
Figure X.5

The high statistical uncertainty on an estimate $\hat\alpha$ of $\alpha$ is of course reflected in a high uncertainty on the estimates of the tail obtained by plugging $\hat\alpha$ instead of $\alpha$ into the parametric expression for the tail. To quantify this point, consider again the Pareto(3/2) example, where it was not easy to assess from our simulation studies whether one would use $\hat\alpha = 1.4$, 1.5 or 1.6, or values even further from 1.5. In the following table, we use these three $\alpha$-values and compute the tail probabilities $\bar F_\alpha(x)$ for the four $x$-values in the first row (chosen as the 99%, 99.9%, 99.99% and 99.999% quantiles of $F_{1.5}$):

  $x$              20.5     99.0      463        2153
  $\alpha = 1.6$   0.007    0.0006    0.00005    0.000005
  $\alpha = 1.5$   0.010    0.0010    0.00010    0.000010
  $\alpha = 1.4$   0.014    0.0016    0.00018    0.000022

There is also a CLT: $k^{1/2}\bigl(\alpha^H_{n,k} - \alpha\bigr) \to N(0, \alpha^2)$. For this, however, stronger conditions on $k = k(n)$ and $L$ are needed. In particular, the correct choice of $k$ requires delicate estimates of $L$. That $L$ can present a major complication has also been observed in the many "Hill horror plots" in the literature. □
Notes and references Some standard textbooks on the topic are Embrechts, Klüppelberg & Mikosch [349], McNeil et al. [633], Resnick [738], Beirlant et al. [154] and de Haan & Ferreira [284]. Another simple technique to estimate $\alpha \in (0, 2)$ from i.i.d. observations is given in Albrecher & Teugels [37]. One should note that the area is rapidly expanding and that much of the recent literature also deals with dependence contexts, which are not considered here.
Chapter XI

Ruin probabilities for Lévy processes

1 Preliminaries
An important family of stochastic processes arising in many areas of applied probability is the class of Lévy processes. A process $X = \{X_t\}_{t\ge0}$ is said to be a Lévy process if it has D-paths and stationary and independent increments. Often one requires also $X_0 = 0$, but at some instances we will also allow for starting values $X_0 = u \ne 0$ and then write $\mathbb{P}_u$ for the governing probability measure (if $u = 0$, we simply write $\mathbb{P}$). For the purposes of ruin theory we will usually think of $X_t$ as the claim surplus $S_t$ at time $t$, in which case indeed $S_0 = 0$; as usual, the ruin time is then $\tau(u) = \inf\{t \ge 0 : X_t > u\}$ and the infinite horizon ruin probability is $\psi(u) = \mathbb{P}_0\bigl(\tau(u) < \infty\bigr)$. At some points it will alternatively be convenient to think of $X_t$ as the reserve process $R_t$ with starting value $R_0 = u \ge 0$. One easily checks that under the D-path assumption the strong Markov property holds for Lévy processes (see e.g. [APQ, p. 35]).

Standard Brownian motion $B$ is a Lévy process, and so is a Brownian motion $\{\mu t + \sigma B_t\}$ with general drift and variance parameters. A further fundamental example is the counting process $N_\beta$ of a Poisson process, where $\beta$ is the rate. In fact, one of the central results in the foundations of Lévy processes is that any Lévy process can be represented as an independent sum of a Brownian motion and a 'compound Poisson'-like process. In particular, any Lévy process
exhibiting finitely many jumps per unit time can be represented as
$$X_t \;=\; \mu t + \sigma B_t + \sum_{i=1}^{N_\beta(t)} Y_i$$
for $t \ge 0$, where the $Y_i$ are i.i.d. and independent of $B$, $N_\beta$. This covers in particular the compound Poisson claim surplus process, where $\sigma^2 = 0$, $\mu = -1$ and the $Y_i$ are positive. However, there are Lévy processes for which the non-Brownian jump component $J = \{J(t)\}_{t\ge0}$ exhibits infinitely many jumps per unit time. Dealing with such processes is the main topic of this chapter.

The jump process $J$ is characterized by its Lévy measure $\nu(dx)$, which can be any nonnegative measure on $\mathbb{R}$ satisfying $\nu(\{0\}) = 0$ and
$$\int_{-\infty}^{\infty} (y^2 \wedge 1)\,\nu(dy) \;<\; \infty. \qquad (1.1)$$
Equivalently, $\int_{|y|>\epsilon} \nu(dy)$ and $\int_{-\epsilon}^{\epsilon} y^2\,\nu(dy)$ are finite for some (and then all) $\epsilon > 0$. A rough description of $J$ is that jumps of size $y$ occur at intensity $\nu(dy)$. In particular, if $\nu$ has finite mass $\lambda = \int_{-\infty}^{\infty} \nu(dy) < \infty$, then $J$ is a compound Poisson process with intensity $\lambda$ and jump size distribution $\nu(dy)/\lambda$. In general, for any bounded interval $K$ separated from 0, the sum of the jumps of size $\in K$ in the time interval $[s, s+t)$ is a compound Poisson r.v. with intensity $t\lambda_K = t\int_K \nu(dy)$ and jump-size distribution $\nu(dy)\,I(y \in K)/\lambda_K$. Jumps in disjoint intervals are independent, and so we can describe the totality of jumps by the points in a planar Poisson process $N(dy, dt)$ with intensity measure $\nu(dy) \otimes dt$. A point of $N$ at $(Y_i, T_i)$ then corresponds to a jump of size $Y_i$ at time $T_i$ for $J$. If in addition to (1.1) one has
$$\int_{-\infty}^{\infty} \bigl(|y| \wedge 1\bigr)\,\nu(dy) \;<\; \infty \qquad (1.2)$$
(this is equivalent to the paths of $J$ being of finite variation), one can simply write
$$J_t \;=\; \int_{\mathbb{R}\times[0,t]} y\,N(dy, ds). \qquad (1.3)$$
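Before turning to the infinite-activity case, note that a Lévy process with finitely many jumps per unit time, $X_t = \mu t + \sigma B_t + \sum_{i\le N_\beta(t)} Y_i$, is straightforward to simulate on a grid. A sketch (all parameter choices are our own illustrative ones):

```python
import numpy as np

rng = np.random.default_rng(3)

def levy_path(mu, sigma, beta, jump_sampler, T, n_steps):
    """Grid simulation of X_t = mu*t + sigma*B_t + compound Poisson(beta) jumps."""
    dt = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    incr = mu * dt + sigma * rng.normal(0.0, np.sqrt(dt), n_steps)  # drift + Brownian part
    counts = rng.poisson(beta * dt, n_steps)                        # jump counts per grid cell
    jumps = np.array([jump_sampler(c).sum() if c > 0 else 0.0 for c in counts])
    return t, np.concatenate(([0.0], np.cumsum(incr + jumps)))

# compound Poisson claim surplus process: sigma = 0, mu = -1, exponential claims
t, x = levy_path(mu=-1.0, sigma=0.0, beta=0.5,
                 jump_sampler=lambda c: rng.exponential(2.0, c),
                 T=100.0, n_steps=20_000)
```

With these parameters the drift of $X$ is zero, since the expected jump input per unit time, $\beta\,\mathbb{E}Y = 1$, exactly offsets $\mu = -1$.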
If (1.2) fails, this Poisson integral does not converge absolutely, and $J$ has to be defined by a compensation (centering) procedure. For example, letting
$$Y_0(t) = \int_{\{y:\,|y|>1\}\times[0,t]} y\,N(dy, ds), \qquad Y_n(t) = \int_{\{y:\,|y|\in(y_{n+1},\,y_n]\}\times[0,t]} y\,N(dy, ds)\,,$$
one can let
$$J(t) \;=\; Y_0(t) + \sum_{n=1}^{\infty} \bigl\{Y_n(t) - \mathbb{E}Y_n(t)\bigr\}, \qquad (1.4)$$
where $1 = y_1 > y_2 > \cdots \downarrow 0$ and
$$\mathbb{E}Y_n(t) \;=\; t \int_{\{|y|\in(y_{n+1},\,y_n]\}} y\,\nu(dy)\,.$$
The series converges a.s. since
$$\sum_{n=1}^{\infty} \mathrm{Var}\,Y_n(t) \;=\; t \sum_{n=1}^{\infty} \int_{\{|y|\in(y_{n+1},\,y_n]\}} y^2\,\nu(dy) \;=\; t \int_{-1}^{1} y^2\,\nu(dy) \;<\; \infty\,,$$
and the sum is easily seen to be independent of the particular partitioning $\{y_n\}$. But note that since the role of the interval $[-1, 1]$ is arbitrary, a compensated Lévy jump process is given canonically only up to a drift term.

If $J_t$ has nondecreasing paths, then $J$ is called a subordinator. The Lévy measure for a subordinator necessarily satisfies (1.2), and any Lévy jump process satisfying (1.2) can be written as the independent difference between two subordinators, defined in terms of the restriction of $\nu$ to $(0, \infty)$ and, respectively, the restriction of $\nu$ to $(-\infty, 0)$ reflected to $(0, \infty)$ (possibly a positive drift term has to be added).

The property of stationary independent increments implies that $\log \mathbb{E}\mathrm{e}^{rX_t}$ has the form $t\kappa(r)$. Here $\kappa(r)$ is called the Lévy exponent (also often referred to as Laplace exponent); its domain includes the imaginary axis $\Re r = 0$. […]

Example 1.2 Consider a Lévy jump process whose Lévy measure has density
$$n(x) = C_+\,\frac{\mathrm{e}^{-Mx}}{x^{1+Y}}\,,\ x > 0, \qquad n(x) = C_-\,\frac{\mathrm{e}^{Gx}}{|x|^{1+Y}}\,,\ x < 0,$$
where $C_+ \ge 0$, $C_- \ge 0$, $C_+ + C_- > 0$, $G, M > 0$, $0 \le Y < 2$. Such a Lévy process is called a tempered stable process. For $Y > 0$ and $C_+ = C_- = C$, the corresponding Lévy process is called the CGMY process²; for $Y = 0$ and $C_+ = C_- = C$, the process is called the Variance Gamma process. The Lévy exponent is
$$\kappa(r) \;=\; C_+\,\Gamma(-Y)\bigl[(M - r)^Y - M^Y\bigr] \;+\; C_-\,\Gamma(-Y)\bigl[(G + r)^Y - G^Y\bigr]. \qquad\Box$$

²CGMY = Carr-Geman-Madan-Yor; cf. the notation for the parameters!

Example 1.3 Since the Gamma distribution with a density proportional to $x^{\alpha-1}\mathrm{e}^{-\lambda x}$ is infinitely divisible, there is a Lévy process with this distribution of $X_1$. For obvious reasons, it is called the Gamma process. The Lévy measure can be shown to have density $n(x) = \alpha\mathrm{e}^{-\lambda x}/x$ for $x > 0$; note that $n(x) \sim \alpha x^{-1}$, $x \downarrow 0$, so the Lévy measure is infinite but at the borderline of being so. Hence
small jumps play a relatively small role for the Gamma process. By standard properties of the Gamma distribution,
$$\kappa(r) \;=\; \alpha\log\frac{\lambda}{\lambda-r}\,,\qquad f_t(x) \;=\; \frac{\lambda^{\alpha t}}{\Gamma(\alpha t)}\,x^{\alpha t-1}e^{-\lambda x}\,. \qquad\Box$$

The Variance Gamma process in Example 1.2 is the difference between two independent Gamma processes.

Example 1.4 The Normal Inverse Gaussian (NIG) Lévy process has four parameters $\alpha, \delta > 0$, $\beta \in (-\alpha,\alpha)$, $\mu \in \mathbb R$, and
$$\kappa(r) \;=\; \mu r - \delta\Bigl(\sqrt{\alpha^2 - (\beta+r)^2} - \sqrt{\alpha^2 - \beta^2}\Bigr)\,.$$
The Lévy measure has density
$$\frac{\alpha\delta}{\pi|x|}\,K_1\bigl(\alpha|x|\bigr)\,e^{\beta x}\,,\qquad x \in \mathbb R \tag{1.8}$$
(here as usual $K_1$ denotes the modified Bessel function of the third kind with index 1), and the density of $X_1$ is
$$f_1(x) \;=\; \frac{\alpha\delta}{\pi}\,\exp\Bigl\{\delta\sqrt{\alpha^2-\beta^2} - \beta\mu\Bigr\}\,\frac{K_1\bigl(\alpha\sqrt{\delta^2+(x-\mu)^2}\bigr)}{\sqrt{\delta^2+(x-\mu)^2}}\;e^{\beta x}\,,$$
which is called the NIG(α, β, µ, δ) density; clearly the density $f_t(x)$ of $X_t$ is NIG(α, β, tµ, tδ). $\Box$

Example 1.5 Let X be any Lévy process with nonnegative drift. Then $T(x) = \inf\{t : X(t) > x\}$ is finite a.s., and clearly $\{T(x)\}_{x\ge 0}$ has stationary independent increments, so it is a Lévy process (in fact a subordinator, since the sample paths are nondecreasing). The most notable example is the Inverse Gaussian Lévy process, which corresponds to X being Brownian motion with drift γ > 0 and variance 1. Here
$$f_x(t) \;=\; \frac{x}{\sqrt{2\pi}\,t^{3/2}}\,\exp\Bigl\{\gamma x - \frac{1}{2}\Bigl(\frac{x^2}{t} + \gamma^2 t\Bigr)\Bigr\}$$
(cf. also Corollary III.1.6) and the Lévy measure has density
$$n(x) \;=\; \frac{1}{\sqrt{2\pi}\,x^{3/2}}\,e^{-x\gamma^2/2}\,,\qquad x > 0\,. \qquad\Box$$
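Example 1.3 lends itself to a quick numerical sanity check: integrating $e^{rx}f_t(x)$ against the Gamma density should reproduce $e^{t\kappa(r)}$ with $\kappa(r) = \alpha\log(\lambda/(\lambda-r))$. A minimal sketch (the parameter values are arbitrary choices for illustration):

```python
import math

alpha, lam, t, r = 1.5, 2.0, 0.7, 0.5   # arbitrary parameters with r < lam

def kappa(r):
    # Lévy exponent of the Gamma process
    return alpha * math.log(lam / (lam - r))

def f_t(x):
    # density of X_t, i.e. the Gamma(alpha*t, lam) density
    return lam**(alpha*t) * x**(alpha*t - 1.0) * math.exp(-lam*x) / math.gamma(alpha*t)

# trapezoidal approximation of E e^{r X_t} = int_0^infty e^{rx} f_t(x) dx
n, upper = 400_000, 60.0
h = upper / n
mgf = sum(math.exp(r*i*h) * f_t(i*h) for i in range(1, n)) * h

print(mgf, math.exp(t * kappa(r)))   # the two numbers should agree closely
```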
1b  Exponential change of measure
As for the compound Poisson model and random walks, exponential change of measure also plays a main role for Lévy processes. It is also clear from the analogy with these classical models what should be the appropriate definition of an exponential θ-tilting for a θ satisfying κ(θ) < ∞: change the Lévy exponent κ(r) to $\kappa_\theta(r) = \kappa(r+\theta) - \kappa(\theta)$.

Proposition 1.6 Assume that X has characteristic triplet $(c, \sigma^2, \nu)$. Then $\kappa_\theta(r)$ is the Lévy exponent of the Lévy process with characteristic triplet $(c_\theta, \sigma_\theta^2, \nu_\theta)$, where $\sigma_\theta^2 = \sigma^2$, $\nu_\theta(dx) = e^{\theta x}\nu(dx)$, and
$$c_\theta \;=\; c + \sigma^2\theta + \int_{-1}^1 (e^{\theta x} - 1)\,x\,\nu(dx)\,.$$
Proof. In view of (1.5) we have
$$\kappa_\theta(r) \;=\; \kappa(r+\theta) - \kappa(\theta)
\;=\; (c + \sigma^2\theta)r + \sigma^2 r^2/2 + \int_{-\infty}^{\infty}\bigl(e^{(\theta+r)x} - e^{\theta x} - rxI(|x|\le 1)\bigr)\,\nu(dx)$$
$$=\; \Bigl(c + \sigma^2\theta + \int_{-1}^{1}(e^{\theta x}-1)\,x\,\nu(dx)\Bigr)r \;+\; \sigma^2 r^2/2 \;+\; \int_{-\infty}^{\infty}\bigl(e^{rx} - 1 - rxI(|x|\le 1)\bigr)\,e^{\theta x}\,\nu(dx)\,. \qquad\Box$$

Letting P be the governing probability measure for X and $P_\theta$ the one for the exponentially tilted Lévy process, the likelihood ratio on [0, T] takes the form
$$\frac{dP}{dP_\theta}\Bigr|_{\mathcal F_T} \;=\; e^{-\theta X(T) + T\kappa(\theta)}\,,$$
as may be seen, for example, by discrete random-walk approximations.

Often exponential change of measure and other calculations involve roots r of an equation of the form κ(r) = δ, where κ is the Lévy exponent and δ ≥ 0. By convexity and κ(0) = 0, κ(r) = δ has (depending on the domain of existence of κ) either zero, one or two real roots, and we denote by $-\rho_\delta$ the smallest one.

Notes and references Many of the examples mentioned above are frequently used for asset price modeling in finance. The NIG Lévy process and the Variance Gamma
process are particular examples of generalized hyperbolic Lévy processes. The generalized hyperbolic distribution was introduced by Barndorff-Nielsen [135] as a normal variance-mean mixture (i.e. both the mean and the variance of the normal distribution are randomized by an (appropriately scaled) mixing distribution W) with a generalized inverse Gaussian mixing distribution W. For the NIG distribution, W is the inverse Gaussian distribution; for the Variance Gamma distribution, W is a Gamma distribution, giving the corresponding processes their names. Another popular approach is to interpret the NIG and VG Lévy processes as Brownian motion subordinated by an inverse Gaussian and a Gamma process, respectively. For details see e.g. Bibby & Sørensen [164] or Schoutens [783].
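The subordination viewpoint is easy to simulate: a VG increment is a Brownian increment evaluated at an independent Gamma time. The sketch below (all parameter values hypothetical) checks the first two moments of $X_1 = \theta G_1 + \sigma B(G_1)$, where the subordinator is normalized to $EG_1 = 1$ and $\mathrm{Var}\,G_1 = \nu$, so that $EX_1 = \theta$ and $\mathrm{Var}\,X_1 = \sigma^2 + \theta^2\nu$:

```python
import math
import random
import statistics

random.seed(1)
theta, sigma, nu = 0.3, 1.0, 0.25   # hypothetical VG parameters (drift, volatility, variance rate)

def vg_increment():
    # Gamma subordinator with unit mean: G ~ Gamma(shape 1/nu, scale nu), so EG = 1, Var G = nu
    g = random.gammavariate(1.0/nu, nu)
    # Brownian motion with drift theta and volatility sigma, run for the random time g
    return theta*g + sigma*math.sqrt(g)*random.gauss(0.0, 1.0)

xs = [vg_increment() for _ in range(200_000)]
m, v = statistics.fmean(xs), statistics.pvariance(xs)
# theory: E X_1 = theta, Var X_1 = sigma^2 * EG + theta^2 * Var G = sigma^2 + theta^2 * nu
print(m, v)
```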
2  One-sided ruin theory
In this section, we give the results (both asymptotic and exact) for the infinite-horizon ruin probability ψ(u) that can be derived with reasonable effort. We assume throughout that the claim surplus process $\{S_t\}$ is a Lévy process with negative drift, i.e. κ′(0) < 0, and, to avoid trivialities, that it is not the negative of a subordinator (in which case ψ(u) = 0 for all u > 0).

Going beyond the compound Poisson model to general Lévy processes, heavy tails are a remarkably simple case, and we have the following analogue of results from Chapter X:

Theorem 2.1 Assume that $\{S_t\}$ is a Lévy process with $ES_1 = \kappa'(0) < 0$ and with Lévy measure ν satisfying $\overline\nu(x) = \int_x^\infty \nu(dy) \sim \overline B(x)$ as $x \to \infty$ for some distribution B such that the integrated tail $B_0$ of B is subexponential. Then
$$\psi(u) \;\sim\; \frac{1}{|ES_1|}\int_u^\infty \overline\nu(y)\,dy\,. \tag{2.1}$$
Lemma 2.2 $P(S_1 > x) \sim \overline\nu(x)$.

Proof. Write $S = S' + S'' + S'''$ where $S', S'', S'''$ have characteristic triplets $(c, \sigma^2, \nu')$, $(0, 0, \nu'')$ and $(0, 0, \nu''')$, resp., with $\nu', \nu'', \nu'''$ being the restrictions of ν to $[-1, 1]$, $(-\infty, -1)$ and $(1, \infty)$, respectively. With $\beta''' = \overline\nu(1)$, the r.v. $S_1'''$ is a compound Poisson sum of r.v.'s, with Poisson parameter $\beta'''$ and distribution $\nu'''/\beta'''$. Thus by X.2.1, we have
$$P(S_1''' > x) \;\sim\; \beta'''\,\frac{\overline{\nu}{}'''(x)}{\beta'''} \;=\; \overline\nu(x)\,,\qquad x > 1\,.$$
The independence of $S_1''$ and $S_1'''$ therefore implies
$$P(S_1'' + S_1''' > x) \;\sim\; \overline\nu(x)\,,$$
cf. the proof of X.3.2. From (1.5) it is immediate that the Lévy exponent of S′ is finite for all r. In particular, $S_1'$ is light-tailed, and the desired estimate for $S_1 = S_1' + S_1'' + S_1'''$ then follows by X.1.11. $\Box$

Proof of Theorem 2.1. Define
$$M^d \;=\; \sup_{n=0,1,2,\dots} S_n\,,\qquad M \;=\; \sup_{0\le t<\infty} S_t\,.$$
Then
$$P(M^d > u) \;\sim\; \frac{1}{|ES_1|}\int_u^\infty \overline\nu(y)\,dy$$
by Lemma 2.2 and the corresponding random walk result of Chapter X. Clearly $P(M^d > u) \le P(M > u) = \psi(u)$. Given ε > 0, choose a > 0 with $P\bigl(\inf_{0\le t\le 1} S_t > -a\bigr) \ge 1-\varepsilon$. Then $P(M^d > u-a) \ge (1-\varepsilon)P(M > u) = (1-\varepsilon)\psi(u)$. But by subexponentiality, $P(M^d > u-a) \sim P(M^d > u)$. Putting these estimates together completes the proof. $\Box$

Let us now move to general tail behavior of the Lévy measure, but restrict to one-sided jumps only. A Lévy process {S} is called spectrally negative if there are no positive jumps, i.e. $\nu(0, \infty) = 0$. Equivalently, the paths are skip-free upwards. In this case, ψ(u) is in fact of exact exponential form:

Theorem 2.3 Assume that the claim surplus process $\{S_t\}$ is spectrally negative with $ES_1 < 0$ (i.e. ruin can only be caused by diffusion). Then κ(r) as defined in (1.5) has a positive zero γ > 0 and $\psi(u) = e^{-\gamma u}$, $u > 0$.

Proof. Spectral negativity implies κ(r) < ∞ for all r > 0, and since κ(r) → ∞ as r → ∞ and κ′(0) < 0, the desired root γ exists by continuity. Under the change of measure with tilting factor γ, $E_\gamma S_1 = \kappa'(\gamma) > 0$ so that $P_\gamma(\tau(u) < \infty) = 1$. Due to the absence of upward jumps we have $S_{\tau(u)} = u$ and correspondingly
$$\psi(u) \;=\; P\bigl(\tau(u) < \infty\bigr) \;=\; E_\gamma\bigl[e^{-\gamma S_{\tau(u)}};\, \tau(u) < \infty\bigr] \;=\; e^{-\gamma u}\,. \qquad\Box$$

If $\{S_t\}$ is spectrally positive (i.e. $\nu(-\infty, 0) = 0$), ψ(u) is not explicit, but the Laplace transform can be found in closed form:

Theorem 2.4 Assume that the claim surplus process $\{S_t\}$ is spectrally positive with $ES_1 = \mu < 0$. Then (at least) for all s with $\Re s > 0$,
$$\hat\psi[-s] \;=\; \int_0^\infty e^{-su}\,\psi(u)\,du \;=\; \frac{1}{s} + \frac{\mu}{\kappa(-s)}\,. \tag{2.3}$$
Proof. Define $M = \max_{t\ge 0} S_t$ and recall that the distribution of M is also the stationary distribution of the Lévy process reflected at 0,
$$V_t \;=\; V_0 + S_t + L_t\,, \tag{2.4}$$
where $L_t = \bigl(-\inf_{s\le t} S_s - V_0\bigr)^+$ (cf. [APQ, p. 250]). Taking $V_0 = M^*$ where $M^*$ is an independent copy of M, $\{V_t\}$ becomes stationary. In particular, $EV_1 = EV_0$ and therefore (2.4) yields $EL_1 = -\mu$. Further, by spectral positivity $\{L_t\}$ is continuous, so that the Kella-Whitt martingale becomes
$$\kappa(r)\int_0^t e^{rV_u}\,du \;+\; e^{rV_0} - e^{rV_t} + rL_t\,.$$
Optional stopping at t = 1 gives
$$0 \;=\; \kappa(r)\,Ee^{rM} + Ee^{rM} - Ee^{rM} + rEL_1 \;=\; \kappa(r)\,Ee^{rM} - r\mu\,.$$
Hence
$$\int_0^\infty e^{-su}\,\psi(u)\,du \;=\; \int_0^\infty e^{-su}\,P(M > u)\,du \;=\; \frac{1 - Ee^{-sM}}{s} \;=\; \frac{1 + s\mu/\kappa(-s)}{s}\,. \qquad\Box$$
Remark 2.5 Note that for the Cramér-Lundberg model $\kappa(r) = \beta(\hat B[r] - 1) - r$ and $\mu = \beta\mu_B - 1$, so that (2.3) then simplifies to
$$\hat\psi[-s] \;=\; \frac{1}{s} \;-\; \frac{1-\beta\mu_B}{\beta\hat B[-s] - \beta + s}\,,$$
which indeed coincides with IV.(3.4). $\Box$
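Remark 2.5 can also be verified numerically: for exponential claims, $\hat B[r] = \delta/(\delta-r)$ and $\psi(u) = (\beta/\delta)e^{-(\delta-\beta)u}$ (cf. Chapter IV), so both sides of (2.3) reduce to $(\beta/\delta)/(s+\delta-\beta)$. A small sketch with arbitrary $\beta < \delta$:

```python
beta, delta = 0.7, 1.0   # Poisson rate and exponential claim-size rate; net profit requires beta < delta

def kappa(r):
    # claim surplus exponent: kappa(r) = beta*(Bhat[r] - 1) - r with Bhat[r] = delta/(delta - r)
    return beta * (delta/(delta - r) - 1.0) - r

mu = beta/delta - 1.0    # ES_1 = beta*mu_B - 1

pairs = [(1.0/s + mu/kappa(-s),                 # r.h.s. of (2.3)
          (beta/delta) / (s + delta - beta))    # transform of psi(u) = (beta/delta) e^{-(delta-beta)u}
         for s in (0.3, 1.0, 2.5)]
print(pairs)
```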
Next consider light tails for the upward jumps, meaning $\int_1^\infty e^{rx}\,\nu(dx) < \infty$ for some r > 0 (which is the same as κ(r) < ∞). If $ES_1 < 0$ and the adjustment coefficient γ > 0 (i.e. the positive root of κ(r) = 0) exists, one expects from the Cramér-Lundberg theory that ψ(u) decays asymptotically exponentially at rate γ. Indeed, this holds in the general case:

Theorem 2.6 Consider a general Lévy process $\{S_t\}$ with $ES_1 < 0$. Assume that γ > 0 exists and satisfies κ′(γ) < ∞, and further that $\{S_t\}$ is not a compound Poisson process with lattice support of ν. Then $\psi(u) \sim Ce^{-\gamma u}$, $u \to \infty$, for some constant 0 < C < ∞.
We will see that the proof is straightforward, given some small technicalities on the non-lattice property. However, the really difficult step in the Cramér-Lundberg theory for Lévy processes is identifying C (for this one needs the Wiener-Hopf factorization briefly discussed in Section 4). Here we note only the following special case, which comprises the compound Poisson case and where the expression for C is entirely analogous to the one there, cf. IV.5.5:

Corollary 2.7 If in the spectrally positive case γ > 0 exists and κ′(γ) < ∞, then $C = -\mu/\kappa'(\gamma) = -\kappa'(0)/\kappa'(\gamma)$.

Proof. Because of Theorem 2.6 and Remark IV.5.6 it suffices to calculate the constant $C = \lim_{u\to\infty} e^{\gamma u}\psi(u) = \lim_{s\to 0} s\,\hat\psi[-s+\gamma]$. In view of Theorem 2.4 the result then follows by a simple application of L'Hôpital's rule. $\Box$

Proof of Theorem 2.6. The spectrally negative case is covered by Theorem 2.3. In the (additional) presence of positive jumps, let $\xi(x) = S_{\tau(x)} - x$ denote the overshoot and
$$Y = Y_1 = \inf\bigl\{x > 1 : \xi(x-) = 0\bigr\}\,,\qquad Y_n = \inf\bigl\{x > 1 + Y_{n-1} : \xi(x-) = 0\bigr\}\,.$$
Then the $Y_n$ are finite $P_\gamma$-a.s., since $E_\gamma S_1 > 0$ implies both that τ(x) < ∞ for all x and that there exists an infinity of x with ξ(x) = 0 [note that we cannot use x > 0 instead of x > 1 in the definition, since it may then happen that $Y_1 = 0$ a.s.]. Thus $\{\xi(x)\}_{x\ge 0}$ is a regenerative process with regeneration points $Y_1, Y_2, \dots$. The assumption that S is not a compound Poisson process with lattice support of ν is easily seen to imply that the distribution of $Y_1$ is nonlattice (see Kyprianou [564] for details). Hence $\xi(x) \stackrel{D}{\to} \xi(\infty)$ for some ξ(∞) < ∞, and using exponential change of measure, we get
$$\psi(u) \;=\; E_\gamma\bigl[e^{-\gamma S_{\tau(u)}};\, \tau(u) < \infty\bigr] \;=\; E_\gamma e^{-\gamma S_{\tau(u)}} \;=\; e^{-\gamma u}\,E_\gamma e^{-\gamma\xi(u)} \;\sim\; C e^{-\gamma u}\,,$$
where $C = E_\gamma e^{-\gamma\xi(\infty)}$. $\Box$
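Corollary 2.7 can be illustrated numerically on the Cramér-Lundberg model with exponential claims, where γ = δ − β and $C = -\kappa'(0)/\kappa'(\gamma) = \beta/\delta$ are known in closed form. The sketch below finds γ by bisection and C by a numerical derivative (parameter values arbitrary):

```python
def kappa(r, beta=0.7, delta=1.0):
    # Cramér-Lundberg claim surplus exponent with exp(delta) claims
    return beta * (delta/(delta - r) - 1.0) - r

def bisect(f, lo, hi, tol=1e-12):
    # plain bisection; assumes f(lo) < 0 < f(hi)
    while hi - lo > tol:
        mid = 0.5*(lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5*(lo + hi)

gamma = bisect(kappa, 1e-6, 0.999)          # adjustment coefficient: positive root of kappa = 0
h = 1e-6
dk = lambda r: (kappa(r + h) - kappa(r - h)) / (2.0*h)   # central-difference derivative
C = -dk(0.0) / dk(gamma)                     # the constant of Corollary 2.7
print(gamma, C)                              # exact values here: gamma = delta - beta = 0.3, C = beta/delta = 0.7
```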
Notes and references Asymptotic results on ruin probabilities for Lévy insurance risk processes can be found in Klüppelberg, Kyprianou & Maller [543]. For Theorem 2.1, see also Klüppelberg & Kyprianou [542]. Corollary 2.7 goes back to Doney [325], where it is given as a consequence of a more involved argument. Huzak et al. [490] derive a ladder-height decomposition of the ruin probability and in that way generalize the Pollaczeck-Khinchine formula to certain Lévy setups; see also Schmidli [776]. At the same time, this implies that one can formulate a defective renewal equation for the ruin probability, see also Section XII.4. Bernyk, Dalang & Peskir [156] use fractional derivatives to derive accurate information on finite-time ruin probabilities for α-stable
Lévy processes (1 < α < 2) with only downward jumps. For extensions to more general Lévy processes with two-sided jumps, see for instance Bertoin & Doney [158] and Lewis & Mordecki [582]. For asymptotic results on finite-time ruin probabilities, see Palmowski & Pistorius [679].
3  The scale function and two-sided ruin problems
The concept of a scale function, as discussed in II.2 for diffusions to give two-sided exit probabilities, generalizes to Lévy processes. One can even go one step further and include information on the exit time as well. So, let $\{X_t\}_{t\ge 0}$ be a Lévy process with Lévy exponent κ(r), and for 0 < u < a define
$$\tau_0^- \;=\; \inf\{t > 0 : X_t \le 0 \mid X_0 = u\}\,,\qquad \tau_a^+ \;=\; \inf\{t > 0 : X_t \ge a \mid X_0 = u\}\,.$$
To avoid trivialities, we assume that {X} is neither a subordinator nor the negative of a subordinator. We will also need the assumption of spectral negativity (this ensures κ(s) < ∞ for all s > 0); so in this section $\{X_t\}$ refers to the reserve process $\{R_t\}$. Of course, results for the spectrally positive case (and thus for the claim surplus process $\{S_t\}$) follow immediately by sign reversion.

For δ > 0, the equation κ(s) = δ has a unique positive solution, which we denote by $\rho_\delta > 0$. If δ = 0 and $EX_1 = \kappa'(0) > 0$, then $\rho_\delta = \rho_0 = 0$. Note that since we now deal with $R_t$ (rather than $S_t$), the sign of the argument of κ is reversed,³ so the positive solution is now not the adjustment coefficient!

Lemma 3.1 $E_u\bigl[e^{-\delta\tau_a^+};\, \tau_a^+ < \infty\bigr] = e^{-\rho_\delta(a-u)}$.

Proof. As a simple adaptation of Lemma X.3.1, just note that $\{e^{\rho_\delta X_t - \delta t}\}$ is a martingale, apply optional stopping at $\tau_a^+ \wedge T$ and let T → ∞ with dominated convergence. $\Box$

Theorem 3.2 (a) For each δ ≥ 0, there exists a function $W^{(\delta)}(u)$ (the scale function) such that
$$E_u\bigl[e^{-\delta\tau_a^+};\, \tau_a^+ < \tau_0^-\bigr] \;=\; \frac{W^{(\delta)}(u)}{W^{(\delta)}(a)}\,. \tag{3.1}$$
(b) $W^{(\delta)}(u)$ is unique up to a multiplicative constant, which may be chosen such that $W^{(\delta)}(u)$ is given via its Laplace transform in u by
$$\int_0^\infty e^{-su}\,W^{(\delta)}(u)\,du \;=\; \frac{1}{\kappa(s)-\delta} \qquad\text{for } s > \rho_\delta\,. \tag{3.2}$$

³which we emphasize by using the argument s instead of r.
Note that taking δ = 0 we obtain the probability of exiting the interval (0, a) to the right as $W^{(0)}(u)/W^{(0)}(a)$. Note also that the one-sided survival probability can be computed by taking the limit a → ∞. In particular, if κ′(0) > 0, then $\lim_{a\to\infty} W^{(0)}(a) = \lim_{s\to 0} s/\kappa(s) = 1/\kappa'(0)$, so that $\psi(u) = 1 - \kappa'(0)\,W^{(0)}(u)$ (if κ′(0) ≤ 0, then ψ(u) = 1).

A further fundamental function for two-sided ruin problems is
$$Z^{(\delta)}(u) \;=\; 1 + \delta\int_0^u W^{(\delta)}(y)\,dy\,. \tag{3.3}$$
(3.4)
£ ¤ − W (δ) (u) . Eu e−δτ0 ; τ0− < τa+ = Z (δ) (u) − Z (δ) (a) W (δ) (a)
(3.5)
(b)
In the proofs, we will need the running minimum and maximum, X t = min Xs , 0≤s≤t
X t = max Xs . 0≤s≤t
Further eδ will denote an exponential r.v. with rate δ and which is independent of the L´evy process {X}. eδ will become useful via the following lemma: Lemma 3.4 Ee−sX eδ =
ρδ δ ρδ − s sX , Ee eδ = . ρδ + s ρδ δ − κ(s)
Proof. Using exponential change of measure, we get ¡ ¢ ¡ ¢ P X eδ > a = P τa+ < eδ ¤ £ −ρ X + +τ + κ(ρ ) = Eρδ e δ τa a δ ; τa+ < eδ £ + +¤ = Eρδ e−ρδ a+τa δ e−δτa = e−ρδ a . I.e., X eδ is exponentially distributed with parameter ρδ which is equivalent to the first statement of the lemma.
For the second statement, we use the Kella-Whitt martingale $M_t$ (say) with exponential parameter $-s$ on $Z_t = -X_t + L_t$, where $L_t = -\inf_{0\le s\le t}(-X_s) = \sup_{0\le s\le t} X_s$. I.e., Z is $-X$ reflected at 0, which by the continuous-time analogue of III.(3.2) implies that
$$Z_t \;=\; \overline X_t - X_t \;\stackrel{D}{=}\; -\underline X_t\,.$$
Note that spectral negativity implies that L can only increase when Z is at 0 and that L has no jumps. Therefore
$$M_t \;=\; \kappa(s)\int_0^t e^{-sZ_v}\,dv \;+\; 1 - e^{-sZ_t} \;-\; s\int_0^t L(dv)\,.$$
Since optional stopping at an independent random time is permissible for any martingale, we have
$$0 \;=\; M_0 \;=\; EM_{e_\delta}\,. \tag{3.6}$$
Here
$$E\int_0^{e_\delta} e^{-sZ_v}\,dv \;=\; \int_0^\infty e^{-v\delta}\,E e^{-sZ_v}\,dv \;=\; \frac{1}{\delta}\,E e^{s\underline X_{e_\delta}}\,.$$
Using the just established fact that $\overline X_{e_\delta}$ has an exponential($\rho_\delta$) distribution (so that $E\int_0^{e_\delta} L(dv) = E\overline X_{e_\delta} = 1/\rho_\delta$), (3.6) therefore becomes
$$0 \;=\; \frac{\kappa(s)}{\delta}\,E e^{s\underline X_{e_\delta}} \;+\; 1 \;-\; E e^{s\underline X_{e_\delta}} \;-\; \frac{s}{\rho_\delta}\,,$$
which gives the desired conclusion concerning $\underline X_{e_\delta}$. $\Box$
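Lemma 3.4 can be checked directly for Brownian motion with drift (which is spectrally negative), where both roots of κ(s) = δ are explicit: $\overline X_{e_\delta}$ is exponential($\rho_\delta$), while $-\underline X_{e_\delta}$ is exponential(η) with $-\eta$ the negative root, so $Ee^{s\underline X_{e_\delta}} = \eta/(\eta+s)$. A sketch with arbitrary parameter values:

```python
import math

mu, sigma2, delta = 0.4, 1.0, 0.6   # drift, variance, killing rate (arbitrary positive values)

def kappa(s):
    return mu*s + 0.5*sigma2*s*s

# roots of kappa(s) = delta: s = rho > 0 and s = -eta < 0
disc = math.sqrt(mu*mu + 2.0*sigma2*delta)
rho = (-mu + disc)/sigma2
eta = ( mu + disc)/sigma2

# second identity of Lemma 3.4 versus the transform of an exponential(eta) minimum
err = max(abs((delta/rho)*(rho - s)/(delta - kappa(s)) - eta/(eta + s))
          for s in (0.2, 0.7, 1.5))
print(err)
```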
Proof of Theorem 3.2(a) when κ′(0) > 0 and δ = 0. The assumption κ′(0) > 0 ensures that $-\underline X_\infty$ is finite a.s., and we can define $W^{(0)}(u) = P_u(\underline X_\infty \ge 0)$. Sample path arguments beyond the scope of this book show that either at $\tau_0^-$ or immediately after, X will attain strictly negative values (one needs to consider the cases of a Brownian component present or absent separately; see [564, pp. 216, 177-179]). Therefore, on the event $\{\tau_0^- < \tau_a^+\}$, $\underline X_\infty \ge 0$ is impossible, so that by the strong Markov property
$$P_u(\underline X_\infty \ge 0) \;=\; E_u\bigl[P_a(\underline X_\infty \ge 0);\, \tau_a^+ < \tau_0^-\bigr] \;=\; P_a(\underline X_\infty \ge 0)\,P_u\bigl(\tau_a^+ < \tau_0^-\bigr)\,,$$
which gives the desired conclusion in the form
$$P_u\bigl(\tau_a^+ < \tau_0^-\bigr) \;=\; \frac{W^{(0)}(u)}{W^{(0)}(a)}\,. \tag{3.7}$$
$\Box$
Proof of Theorem 3.2(a) when δ > 0, or κ′(0) < 0 and δ = 0. In this case $\rho_\delta > 0$, and we can use exponential tilting with factor $\rho_\delta$ and define
$$W^{(0)}_{\rho_\delta}(u) \;=\; P_{u,\rho_\delta}(\underline X_\infty \ge 0)\,,\qquad W^{(\delta)}(u) \;=\; e^{\rho_\delta u}\,W^{(0)}_{\rho_\delta}(u)\,. \tag{3.8}$$
Easy convexity arguments show that the $P_{\rho_\delta}$-drift is positive, so by (3.7),
$$P_{u,\rho_\delta}\bigl(\tau_a^+ < \tau_0^-\bigr) \;=\; \frac{W^{(0)}_{\rho_\delta}(u)}{W^{(0)}_{\rho_\delta}(a)}\,. \tag{3.9}$$
But using the exponential change of measure, the l.h.s. can also be written as
$$E_u\bigl[\exp\bigl\{\rho_\delta(X_{\tau_a^+} - u) - \delta\tau_a^+\bigr\};\, \tau_a^+ < \tau_0^-\bigr] \;=\; e^{\rho_\delta(a-u)}\,E_u\bigl[e^{-\delta\tau_a^+};\, \tau_a^+ < \tau_0^-\bigr]\,.$$
Combining these two expressions gives (3.1). $\Box$
Proof of Theorem 3.2(a) when δ = 0 and κ′(0) = 0. An easy continuity argument, letting δ ↓ 0. We omit the details. $\Box$
Proof of Theorem 3.2(b). Consider again first the case κ′(0) > 0. It is obvious that $W^{(\delta)}$ may be modified by a multiplicative constant, so we redefine $W^{(0)}(u)$ as $W^{(0)}(u) = P_u(\underline X_\infty \ge 0)/\kappa'(0)$. Using integration by parts and noting that $P(\underline X_\infty = 0) = 0$, we have
$$E e^{s\underline X_\infty} \;=\; E e^{-s(-\underline X_\infty)} \;=\; 1 - \int_0^\infty s e^{-su}\,P(-\underline X_\infty > u)\,du \;=\; \int_0^\infty s e^{-su}\,P(-\underline X_\infty \le u)\,du \;=\; \int_0^\infty s e^{-su}\,P_u(\underline X_\infty \ge 0)\,du\,.$$
On the other hand, $\rho_\delta \to 0$ as δ ↓ 0, more precisely $\rho_\delta \sim \delta/\kappa'(0)$. Letting δ ↓ 0 in the second part of Lemma 3.4 therefore yields $E e^{s\underline X_\infty} = \kappa'(0)\,s/\kappa(s)$. Comparing these two expressions gives
$$\int_0^\infty e^{-su}\,W^{(0)}(u)\,du \;=\; \frac{1}{\kappa(s)}\,,$$
which is the desired conclusion for the case κ′(0) > 0. The proofs for the remaining cases are then easy by invoking the connections between W, $W_{\rho_\delta}$ and $W^{(\delta)}$. $\Box$
Remark 3.5 There is a simple, but slightly heuristic, way to see Theorem 3.2 for arbitrary drift and δ ≥ 0: define $C(u,a) = E_u\bigl[e^{-\delta\tau_a^+};\, \tau_a^+ < \tau_0^-\bigr]$. By the absence of upward jumps, X with $X_0 = u$ can only reach an arbitrary level b (with b > a > u) without ruin in between if level a is passed before that. Consequently, by the strong Markov property of X one has C(u, b) = C(u, a)C(a, b), so that C(u, a) = C(u, b)/C(a, b) = h(u)/h(a), and one may identify h with the scale function. This argument shows that Theorem 3.2 is in fact valid beyond Lévy processes, namely for stationary Markov processes without upward jumps. $\Box$

We next turn to the proof of Theorem 3.3. The first step is to note:

Lemma 3.6 There exists a measure $W^{(\delta)}(du)$ on $[0,\infty)$ such that $W^{(\delta)}[0,u] = W^{(\delta)}(u)$. This measure has Laplace transform
$$\int_0^\infty e^{-su}\,W^{(\delta)}(du) \;=\; \frac{s}{\kappa(s)-\delta}\,. \tag{3.10}$$

Proof. The first statement is clear since $W^{(\delta)}(u)$ is strictly increasing.⁴ The l.h.s. of (3.10) then comes out as
$$\int_0^\infty W^{(\delta)}(du)\int_u^\infty s e^{-sy}\,dy \;=\; \int_0^\infty s e^{-sy}\,dy \int_0^y W^{(\delta)}(du) \;=\; s\int_0^\infty e^{-sy}\,W^{(\delta)}(y)\,dy\,,$$
which is the same as the r.h.s. in view of (3.2). $\Box$

Proof of Theorem 3.3(a). For δ > 0, we have from Lemma 3.4 that
$$\int_0^\infty e^{-su}\Bigl[\frac{\delta}{\rho_\delta}\,W^{(\delta)}(du) - \delta W^{(\delta)}(u)\,du\Bigr] \;=\; \frac{\delta}{\rho_\delta}\,\frac{s}{\kappa(s)-\delta} \;-\; \frac{\delta}{\kappa(s)-\delta} \;=\; E e^{s\underline X_{e_\delta}}\,.$$
I.e.
$$P(-\underline X_{e_\delta} \in du) \;=\; \frac{\delta}{\rho_\delta}\,W^{(\delta)}(du) - \delta W^{(\delta)}(u)\,du\,.$$
It follows that
$$E_u\bigl[e^{-\delta\tau_0^-};\, \tau_0^- < \infty\bigr] \;=\; P_u(e_\delta > \tau_0^-) \;=\; P_u(\underline X_{e_\delta} < 0) \;=\; 1 - P(-\underline X_{e_\delta} \le u)$$
$$=\; 1 + \delta\int_0^u W^{(\delta)}(y)\,dy - \frac{\delta}{\rho_\delta}\,W^{(\delta)}(u) \;=\; Z^{(\delta)}(u) - \frac{\delta}{\rho_\delta}\,W^{(\delta)}(u)\,.$$
The case δ = 0 is again easy by taking limits. $\Box$

⁴Strictly speaking, we would need a right-continuous version. However, this turns out to be inessential for the following, and in fact $W^{(\delta)}(u)$ can be shown to be continuous.
Proof of Theorem 3.3(b). We can write
$$E_u\bigl[e^{-\delta\tau_0^-};\, \tau_0^- < \tau_a^+\bigr] \;=\; E_u\bigl[e^{-\delta\tau_0^-};\, \tau_0^- < \infty\bigr] \;-\; E_u\bigl[e^{-\delta\tau_0^-};\, \tau_a^+ < \tau_0^-\bigr]\,.$$
Since $X_{\tau_a^+} = a$, the second $E_u$ is
$$E_u\bigl[e^{-\delta\tau_a^+};\, \tau_a^+ < \tau_0^-\bigr]\cdot E_a\bigl[e^{-\delta\tau_0^-};\, \tau_0^- < \infty\bigr]\,.$$
Thus $E_u\bigl[e^{-\delta\tau_0^-};\, \tau_0^- < \tau_a^+\bigr]$ becomes
$$Z^{(\delta)}(u) - \frac{\delta}{\rho_\delta}\,W^{(\delta)}(u) - \frac{W^{(\delta)}(u)}{W^{(\delta)}(a)}\Bigl[Z^{(\delta)}(a) - \frac{\delta}{\rho_\delta}\,W^{(\delta)}(a)\Bigr]\,,$$
which is the asserted expression. $\Box$
Notes and references Scale functions are a classical tool for spectrally one-sided Lévy processes, with roots in Zolotarev [922], Takács [827] and Korolyuk [554]. Parts of the exposition above are close to Kyprianou [564]. For a recent survey of available explicit forms of scale functions and methods for constructing them, see Hubalek & Kyprianou [482] and also Kyprianou & Rivero [567]. The argument of Remark 3.5 can be found in Gerber, Lin & Yang [407]. As for ruin probabilities themselves, one can naturally also use the generator approach to identify the scale function, leading to an integro-differential equation and subsequently to a Volterra integral equation. The connection between this approach and the more standard one pursued here is highlighted in Biffis & Kyprianou [165]. For an extension of the above analysis to one- and two-sided exit problems with non-constant boundaries, see Bertoin, Doney & Maller [159]. Extensions to Lévy processes that are reflected at the supremum or infimum are worked out by Zhou [920]. Loeffen & Patie [605] give a fine analysis of one- and two-sided exit problems with interest rates and absolute ruin for the case when the aggregate claim process is a subordinator. Lemma 3.4 can be exploited to design efficient numerical procedures for determining finite-time ruin probabilities ψ(u, t); for an application in credit risk see e.g. Madan & Schoutens [622].
4  Further topics
This section gives a brief overview of some topics which are basic in fluctuation theory for Lévy processes but more advanced than what we have looked at so far. The treatment should basically be seen as a heuristic introduction (to be
followed up by the interested reader with more detailed and rigorous treatments such as Kyprianou [564]). Thus the 'proofs' we present should mainly be considered heuristic motivations that the results are true (in fact, the theory is so advanced that even [564] has to skip certain steps). The topics under consideration are certainly relevant for ruin theory. However, one problem is that explicit results beyond what we have already presented are rarely available.
4a  Local time at the maximum
In the following, denote by $\overline X_t = \sup_{0\le s\le t} X_s$ the running maximum. A nondecreasing process $\{L_t\}$ with D-paths and $L_0 = 0$ is called a (version of the) local time at the maximum if
(i) the support of the measure $dL_t$ is the closure of the set $\{t : \overline X_t = X_t\}$;
(ii) for every stopping time τ such that $\overline X_\tau = X_\tau$ on {τ < ∞}, the shifted trivariate process
$$\bigl\{X_{\tau+t} - X_\tau\,,\; \overline X_{\tau+t} - X_{\tau+t}\,,\; L_{\tau+t} - L_\tau\bigr\}_{t\ge 0}$$
is independent of $\mathcal F_\tau$ on {τ < ∞} and has the same distribution as $\bigl\{X_t, \overline X_t - X_t, L_t\bigr\}_{t\ge 0}$.

Note that this definition identifies L only up to a multiplicative constant, and that existence is not a priori obvious. Note also that the term 'local time' occurs in various different, though often related, meanings in the probability literature.

For some Lévy processes an obvious candidate for L easily suggests itself, and the verification that it indeed is a local time is straightforward. In particular:
(a) If X is spectrally negative, one can take $L_t = \overline X_t$.
(b) For a compound Poisson process with positive drift and negative jumps, the set $\{t : \overline X_t = X_t\}$ is a union of disjoint intervals, and one may take
$$L_t \;=\; a\int_0^t I\bigl(\overline X_s = X_s\bigr)\,ds$$
with a > 0 arbitrary. In particular, this covers the reserve process in the Cramér-Lundberg model.
(c) If the set of times of maxima of X is discrete (as for the claim surplus process of the Cramér-Lundberg model), one may take $L_t = M_t$ where $M_t$ is the number of maxima before t.

The intuition behind the definition of local time at the maximum is to give an indication of how much time X has spent at its running maximum before
t. For this reason, none of the definitions in (a), (b), (c) is applicable in the whole class of Lévy processes. More precisely, the definition $L_t = \overline X_t$ would not have the required intuitive properties for (say) a compound Poisson process without drift, and the definitions in (b), (c) would not be appropriate for (say) Brownian motion, because $\{t : \overline X_t = X_t\}$ is a Lebesgue null set, excluding (b), and not discrete, excluding (c).

The general definition of L requires the notion of regularity. We say that B (say B = (0, ∞) or B = [0, ∞)) is regular if $P(\tau_B = 0) = 1$ where $\tau_B = \inf\{t > 0 : X_t \in B\}$ (note that by Blumenthal's 0-1 law, $P(\tau_B = 0)$ is either 0 or 1). For example, (0, ∞) is regular for Brownian motion, but (0, ∞) and [0, ∞) are both irregular for the Cramér-Lundberg claim surplus process. There are then the following three cases:
1) X has bounded variation and [0, ∞) is irregular. Then the set of maxima is discrete and we can take $L_t = \sum_{i=1}^{M_t} E_{\lambda,i}$ where $M_t$ is as in (c) and the $E_{\lambda,i}$ are i.i.d. exponential(λ).⁵
2) X has bounded variation and (−∞, 0) is irregular. One may define L as in (b) above. If X is spectrally negative, then L is proportional to $\overline X$.
3) X has unbounded variation (this can be shown to imply that [0, ∞) is regular). A local time exists, but there is no simple known expression in terms of the path of X. Again, if X is spectrally negative, then L is proportional to $\overline X$.
4b  The ladder height process
First note that $L_\infty = \lim_{t\to\infty} L_t$ may be finite (the main case is negative drift). Then define
$$L_t^{-1} \;=\; \begin{cases} \inf\{s > 0 : L_s > t\}\,, & t < L_\infty\,,\\ \infty\,, & t \ge L_\infty\,.\end{cases}$$
Further, let the ladder height process be
$$M_t \;=\; \begin{cases} X_{L_t^{-1}}\,, & t < L_\infty\,,\\ \infty\,, & t \ge L_\infty\,.\end{cases}$$

Theorem 4.1 The process $Y = \bigl\{(L_t^{-1}, M_t)\bigr\}_{t\ge 0}$ is a bivariate Lévy process, possibly terminating if $L_\infty < \infty$.

Proof. The definition of $(L_t^{-1}, M_t)$ immediately implies that X is at a maximum at time $L_t^{-1}$. Therefore
$$Y_{t+s} - Y_t \;=\; \bigl(L_{t+s}^{-1} - L_t^{-1}\,,\; M_{t+s} - M_t\bigr)$$

⁵The variation from (c) is motivated by the desire for $L_t^{-1}$ to have certain properties, cf. the following Section 4b.
has the same distribution as $Y_s = (L_s^{-1}, M_s)$ and is independent of $\{Y_v\}_{v\le t}$. This implies that Y has stationary independent increments, and the assertion follows. $\Box$

It follows that for some suitable function φ(·, ·) we can write
$$\log E\exp\bigl\{-aL_t^{-1} - bM_t\bigr\} \;=\; -\varphi(a,b)\,t\,. \tag{4.1}$$

4c  Excursions
By an excursion from the maximum we understand a segment $\{X_t\}_{u\le t\le v}$ of X such that $\overline X_u = X_u \le X_v$ and $X_t < X_u$ for u < t < v. The fundamental fact about excursions is that they roughly occur according to a Poisson process in the time scale given by $L^{-1}$. However, for example for X a Brownian motion and s a time where X is at a maximum, sample path properties of X imply that each interval [s, s+ε] contains infinitely many excursions. Of course, the sum of their lengths has to be finite, so for each δ > 0 there must be finitely many excursions of length > δ and infinitely many of length ≤ δ. The same phenomenon typically occurs for general Lévy processes, so a careful formulation of the Poisson property is needed, for example the following one:

Theorem 4.2 Let δ > 0 and let $\eta_1 < \eta_2 < \dots$ be the times > 0 where an excursion of length > δ starts.⁶ Then the points $L_{\eta_1}, L_{\eta_2}, \dots$ form a homogeneous Poisson process.

Proof. We only treat the case of $L_t$ being continuous in t. For brevity, denote the excursions of length > δ as δ-excursions. The counting process N of δ-excursions on the $L^{-1}$ scale is given by $N_t = \max\{i : \eta_i \le s\}$ where $L_s = t$. Let $t_1 < t_2 < t_3 < \cdots$. Then in the ordinary time scale for X, $L_{t_1}^{-1}, L_{t_2}^{-1}, L_{t_3}^{-1}$ correspond to times $s_1 < s_2 < s_3$ with $L_{s_i} = t_i$, and X is at a maximum at each $s_i$. It is clear that the number $N_{t_2} - N_{t_1}$ of $\eta_i \in [s_1, s_2]$ is independent of the number $N_{t_3} - N_{t_2}$ of $\eta_i \in [s_2, s_3]$, and similarly for further intervals of the same type. Further, it is not difficult to see by considering a sample path that the distribution of $N_{t_2} - N_{t_1}$ only depends on $t_2 - t_1$. Also, if a δ-excursion starts and ends at say u, v, then X is at a maximum at v, so the local time has to increase at v, which implies that the local time at the time w (say) where the next δ-excursion starts satisfies $L_w > L_v = L_u$. This implies that N cannot have multiple points, which together with the already noted fact of stationary and independent increments implies the Poisson property. $\Box$

⁶The characteristic 'length > δ' could be replaced by many others, for example that the maximal deviation from the maximum during the excursion exceeds δ.
The intensity parameter of N or similar point processes of excursions is in general not available. A notable exception is Itô's excursion law for Brownian motion (e.g. Rogers & Williams [744, Sec. VI.8]), where a complete description of the probability mechanism governing Brownian excursions is possible. Another one is the reserve process $R_t$ of the Cramér-Lundberg model, where excursions occur at intensity β and have the distribution of the busy period in the dual M/G/1 queue (see also Theorem III.2.3).
4d  The Wiener-Hopf factorization
The Wiener-Hopf factorization occurs in many alternative forms in the literature, but its currently most used version is the following one. Recall that $e_\delta$ is an independent exponential(δ) time. Further, define
$$\overline G_t \;=\; \sup\{s < t : X_s = \overline X_s\}\,,\qquad \underline G_t \;=\; \sup\{s < t : X_s = \underline X_s\}$$
(the times of the last maximum, resp. minimum, before t).

Theorem 4.3 (i) The pairs $\bigl(\overline G_{e_\delta}, \overline X_{e_\delta}\bigr)$ and $\bigl(e_\delta - \overline G_{e_\delta},\, \overline X_{e_\delta} - X_{e_\delta}\bigr)$ are independent. Therefore
$$\frac{\delta}{\delta - a - \kappa(b)} \;=\; \Psi_+(a,b)\,\Psi_-(a,b) \tag{4.2}$$
where
$$\Psi_+(a,b) \;=\; E e^{a\overline G_{e_\delta} + b\overline X_{e_\delta}}\,,\qquad \Psi_-(a,b) \;=\; E e^{a\underline G_{e_\delta} + b\underline X_{e_\delta}}\,. \tag{4.3}$$
(ii) The functions $\Psi_+, \Psi_-$ in (4.3) can be identified (involving analytic continuation, if needed) via the function φ in (4.1) and the corresponding one $\breve\varphi$ for the descending ladder process by means of
$$E e^{a\overline G_{e_\delta} + b\overline X_{e_\delta}} \;=\; \frac{\varphi(\delta,0)}{\varphi(\delta-a,-b)}\,,\qquad E e^{a\underline G_{e_\delta} + b\underline X_{e_\delta}} \;=\; \frac{\breve\varphi(\delta,0)}{\breve\varphi(\delta-a,-b)}\,. \tag{4.4}$$

The functions $\Psi_+, \Psi_-$ are called the Wiener-Hopf factors of X. They are obviously at best given up to a multiplicative constant, but can in fact be shown to be unique modulo this.

Proof. Given $e_\delta > t$, the distribution of $e_\delta - t$ is again exponential(δ). If t is the time of the last maximum before $e_\delta$, this changes the distribution to that of an exponential $e'_\delta$ (which is an independent copy of $e_\delta$) given that an excursion away from the maximum occurs at time 0 and lasts at least $e'_\delta$. However, independence pertains and obviously, $\overline X_{e_\delta} - X_{e_\delta}$ is conditionally independent of $(\overline G_t, \overline X_t)$ and independent of t, which implies the claimed independence.
For (4.2), first note that
$$\frac{\delta}{\delta - a - \kappa(b)} \;=\; \int_0^\infty \delta e^{-\delta t}\,e^{at}\,e^{\kappa(b)t}\,dt \;=\; E e^{a e_\delta + b X_{e_\delta}}\,.$$
Using the independence just established, this becomes
$$E e^{a\overline G_{e_\delta} + b\overline X_{e_\delta}}\;E e^{a(e_\delta - \overline G_{e_\delta}) + b(X_{e_\delta} - \overline X_{e_\delta})} \;=\; \Psi_+(a,b)\,E e^{a(e_\delta - \overline G_{e_\delta}) + b(X_{e_\delta} - \overline X_{e_\delta})}\,. \tag{4.5}$$
However, a sign reversion argument easily gives that
$$\bigl(e_\delta - \overline G_{e_\delta}\,,\; X_{e_\delta} - \overline X_{e_\delta}\bigr) \;\stackrel{D}{=}\; \bigl(\underline G_{e_\delta}\,,\; \underline X_{e_\delta}\bigr)\,.$$
Hence the final expectation in (4.5) reduces to $\Psi_-(a,b)$, completing the proof of (i). We do not give the proof of (ii); see for instance Kyprianou [564]. $\Box$

Example 4.4 For the spectrally negative case with positive drift, $L_t^{-1} = \tau_t^+$ and $M_t = t$, so that from (4.1) and Lemma 3.1 it follows that $\varphi(a,b) = \rho_a + b$ ($\tau_t^+ < \infty$ a.s. for positive drift). Hence $\Psi_+(a,b) = \rho_\delta/(\rho_{\delta-a} - b)$, which for the special case a = 0 was already established in Lemma 3.4. From (4.2) we then easily identify
$$\Psi_-(a,b) \;=\; \frac{\delta}{\rho_\delta}\,\frac{\rho_{\delta-a} - b}{\delta - a - \kappa(b)}\,.$$
From this one can, in view of (4.4), read off $\breve\varphi(a,b) = \bigl(a - \kappa(b)\bigr)/(\rho_a - b)$.
$\Box$

4e  A quintuple identity
Consider the quintuple $(V_1, V_2, V_3, V_4, V_5)$ given by the r.v.'s
$$\overline G_{\tau^+(x)-}\,,\quad \overline X_{\tau^+(x)-}\,,\quad \tau^+(x) - \overline G_{\tau^+(x)-}\,,\quad X_{\tau^+(x)-}\,,\quad X_{\tau^+(x)}$$
(the time of the last maximum before first passage, the value of that maximum, the time from that maximum to first passage, the value just before first passage, and the value just after; see Figure XI.1).
Figure XI.1

Further define the measures $U, \breve U$ (often called potential measures) by
$$U(ds, dx) \;=\; \int_0^\infty P\bigl(L_t^{-1} \in ds,\, M_t \in dx\bigr)\,dt\,,\qquad
\breve U(ds, dx) \;=\; \int_0^\infty P\bigl(\breve L_t^{-1} \in ds,\, \breve M_t \in dx\bigr)\,dt\,,$$
where as before $\breve L_t^{-1}$ and $\breve M_t$ refer to the corresponding quantities of the descending ladder height process. By Fubini's theorem and (4.1), the bivariate Laplace transform of U has the simple form
$$\int_{[0,\infty)^2} e^{-as-bx}\,U(ds, dx) \;=\; \int_0^\infty dt\;E\bigl(e^{-aL_t^{-1} - bM_t}\bigr) \;=\; \frac{1}{\varphi(a,b)}\,. \tag{4.6}$$

Remark 4.5 From (4.6), obviously $\int_{[0,\infty)^2} U(ds, dx) = 1/\varphi(0,0)$. With the definition $U(dx) = \int_{s=0}^\infty U(ds, dx)$ one then sees by normalization that
$$\psi(u) \;=\; P\bigl(\tau(u) < \infty\bigr) \;=\; \varphi(0,0)\,U(u, \infty)\,. \tag{4.7}$$
This representation of the ruin probability can be interpreted as the continuous-time extension of the Pollaczeck-Khinchine formula of Theorem IV.2.1, because $M_t$ is the (ascending) ladder height process. $\Box$

Theorem 4.6 The conditional distribution of $V_3, V_4$ given $V_1, V_2$ depends only on $V_2$, and the conditional distribution of $V_5$ given $V_1, V_2, V_3, V_4$ depends only on $V_4$. Further, there exists a normalization of the local time such that the density of $\bigl(V_1,\, x - V_2,\, V_3,\, x - V_4,\, V_5 - x\bigr)$ at (s, y, t, v, z) can be written as
$$U(ds,\, x - dy)\;\breve U(dt,\, dv - y)\;\nu(dz + v)\,.$$
Proof. The claims on the conditional distributions are clear from Figure XI.1 and the strong Markov property. This gives a factorization of the density of (V₁, x − V₂, V₃, x − V₄, V₅ − x) as h₁(ds, x − dy) h₂(dt, dv − y) h₃(dz + v). Here it is clear that h₃(dz + v) = ν(dz + v). The claim on h₁, h₂ will not be shown here (see e.g. again [564]). 2

The quintuple law usually does not lead to explicit formulas. For spectrally positive Lévy processes {X_t}, one can however obtain the following simpler expression for the joint density of the last maximum before first passage V₂, the value just before passage V₄ and the value after passage V₅:

Corollary 4.7 If {X_t} is a spectrally positive Lévy process drifting to −∞ with scale function W(u) = W⁽⁰⁾(u), then the density of (x − V₂, x − V₄, V₅ − x) at (y, v, z) is given by
\[
W'(x-y)\,\nu(dz+v)\,dv\,dy,\qquad 0\le y\le\min(x,v),\ z>0.
\tag{4.8}
\]
Proof. We can use the expressions obtained in Example 4.4, but have to reverse the roles of the ascending and descending ladder processes, because now we have a spectrally positive process. From (4.6) with a = 0 we see that
\[
\int_0^\infty e^{-bx}\,U(dx)=\frac{b}{\kappa(-b)}\,.
\tag{4.9}
\]
At the same time φ̆(0, b) = b, so we can choose Ŭ(dx) = dx. But in view of Theorem 4.6 this implies that the density in (4.8) is U(x − dy) ν(dz + v). However, comparing with the definition (3.2) of the scale function W⁽⁰⁾, it is clear that U can be identified with W⁽⁰⁾, because the latter is only unique up to a constant (which can be controlled by the normalization of the local time of X at its supremum). 2

Remark 4.8 If in addition {X_t} has bounded variation, then from (1.6)
\[
\kappa(-b)=-c_1 b+\int_0^\infty\bigl(e^{-by}-1\bigr)\,\nu(dy)
\]
with E(X₁) < 0, and one can infer from (4.9), by expanding the resulting geometric series, that
\[
U(dx)=\frac{1}{c_1}\sum_{n=0}^\infty \chi^{*n}(dx)\,,
\]
where χ(dx) = ν(x, ∞) dx/c₁. Correspondingly, (4.8) can in this case also be written as
\[
\frac{1}{c_1}\sum_{n=0}^\infty \chi^{*n}(x-dy)\,\nu(dz+v)\,dv,
\qquad 0\le y\le x,\ y\le v,\ z>0.
\]
2
In terms of the risk reserve process R_t, the above result gives a formula for the joint density of the surplus prior to ruin, the deficit at ruin and the size of the last minimum before ruin in terms of the scale function and the Lévy measure; see Chapter XII for a further discussion.

Notes and references A complete proof of Theorem 4.6 is given in Doney & Kyprianou [326], where also asymptotics of the quintuple law for x → ∞ are given. Kuznetsov [563] recently gave quite general criteria under which the Wiener–Hopf factors are of semi-explicit form and identified a set of tractable special cases.
5
The scale function for two-sided phase-type jumps
In this section, we assume that {Xt }t≥0 is the superposition of a Brownian motion with drift µ and variance constant σ 2 > 0 and two compound Poisson processes, one having upward jumps at rate λ+ and being phasetype with representation (E + , α+ , T + ) and the other having downward jumps at rate λ− and being phasetype with representation (E − , α− , T − ) (the cardinalities of E + , E − are denoted by p+ , resp. p− ). That is, the L´evy exponent κ(s) as defined by log EesXt /t equals ¡ ¢ ¡ ¢ sµ + s2 σ 2 /2 + λ+ α+ (−sI − T + )−1 t+ − 1 + λ− α− (sI − T − )−1 t− − 1 . (5.1) It is welldefined in the strip D = {s ∈ C : ρ− < <s < ρ+ } where ρ+ is the eigenvalue with largest real part of −T + and ρ− is the eigenvalue with smallest real part of T − . Theorem 5.1 Assume that there exist p = p+ +p− +2 distinct complex numbers − + T sk such that κ(sk ) = δ. Define c+ 0 (s) = c0 (s) = 1 and ci (s) = ei (−sI − + −1 + − −1 − − + + − T T ) t , ci (s) = ei (sI − T ) t , and denote by b1 , . . . , bp+ , b1 , . . . , b− p+ , + − b0 , b0 the solutions to the p linear equations +
e sk u
=
−
p X i=0
−
sa + c+ i (sk )e bi −
p X i=0
− c− i (sk )bi .
(5.2)
´ CHAPTER XI. LEVY PROCESSES
354 Then
£ ¤ + Eu e−δτa ; τa+ < τ0− =
+
p X
b+ i ,
(5.3)
b− i .
(5.4)
i=0
£ ¤ − Eu e−δτ0 ; τ0− < τa+ =
−
p X i=0
Remark 5.2 The r.h.s. of (5.1) is welldefined not just for τ0− , Bi− ] . i = Eu [e i = Eu [e
Optional stopping at τ now gives Z ¡ ¢ M0 = esu = Eu Mτ = κ(s) − δ
τ
esZv dv − Eu e−δτ +sXτ . (5.5)
0
Bi+ ,
Given the overshoot over a equals 0 for i = 0 and is phasetype with + − representation (E + , eT i , T ) for i > 0. A similar argument applies to Bi , and so expanding the r.h.s. of (5.5), we obtain su
e
=
¡ ¢ κ(s) − δ
Z
+
τ
e
sZv
dv −
0
q X
−
sa + c+ i (s)e bi
−
i=0
Now note that
Z
q X
− c− i (s)bi . (5.6)
i=0
τ
esZv dv ≤ Eu τ esa (5.7) R for any s ∈ C. This readily implies that Eu {·} is an analytic function defined for all s ∈ C. It therefore follows by analytic continuation that 0 ≤ Eu
0
e
su
=
¡ ¢ κ(s) − δ
Z
τ
+
sZv
e 0
dv −
q X i=0
−
sa + c+ i (s)e bi
−
q X
− c− i (s)bi
i=0
for all s 6∈ D. In particular, taking s = sk this becomes (5.2). Formulas (5.3) and (5.4) are then clear. 2 Notes and references Theorem 3.3 occurs in Asmussen, Avram & Pistorius [70], though with a somewhat different proof. Further references on fluctuation theory for L´evy processes under phasetype assumptions include Pistorius [704], Dieker [323] and a series of papers by Mordecki and coauthors, e.g. Lewis & Mordecki [582]. A phasetype approximation for CGMY L´evy processes with applications for the pricing of equity default swaps is given in Asmussen, Madan & Pistorius [90]. In practice, the roots sk will more or less always be found to be distinct. In the rare cases with one or more roots having multiplicity > 1, modifications are needed. For an example of how this can be done, see D’Auria et al. [276].
This page intentionally left blank
Chapter XII
GerberShiu functions 1
Introduction
At some places in previous chapters we have seen results on the time of ruin τ (u), the deficit at ruin ξ(u) = Rτ (u)  and the surplus prior to ruin Rτ (u)− . In this chapter we will study a combination of these quantities simultaneously, which leads to a tractable and elegant treatment. This combination is of the form of an expected discounted penalty at ruin h i ¡ ¢ m(u) = E e−δτ (u) w Rτ (u)− , ξ(u) ; τ (u) < ∞ , (1.1) where the penalty w is a nonnegative function of the surplus prior to ruin and the deficit at ruin. The expression m(u) is usually referred to as the GerberShiu function. Clearly, for w ≡ 1 and δ = 0, (1.1) reduces to the ruin probability ψ(u), and for w ≡ 1 and δ > 0 one arrives (with a slight abuse of terminology) at the Laplace transform of the time to ruin τ (u). Alternatively, if δ = 0 and w is the bivariate Diracdelta function, (1.1) represents the joint density of the surplus prior to ruin and the deficit at ruin. The parameter δ ≥ 0 can be interpreted both as a discount rate and the Laplace transform argument. The refinement to include timedependence in the analysis is a natural step towards a better understanding of the behavior of the risk process. If f (x, y, tu) denotes the (defective) joint density of surplus prior to ruin, deficit at ruin and time of ruin given that R0 = u, then the finitetime ruin probability can be expressed as ¶ Z t µZ ∞Z ∞ ψ(u, t) = f (x, y, su) dx dy ds, 0
0
0
357
358
CHAPTER XII. GERBERSHIU FUNCTIONS
from which an integration by parts and the representation Z
∞Z ∞Z ∞
m(u) = 0
0
e−δt w(x, y) f (x, y, tu) dt dx dy
0
yields that for w ≡ 1 £ ¤ E e−δτ (u) ; τ (u) < ∞ =
Z
∞
e
−δt
0
∂ ψ(u, t) dt = δ ∂t
Z
∞
e−δt ψ(u, t) dt . (1.2)
0
Thus the GerberShiu function also contains as a special case the Laplace transform (w.r.t. time) of the finitetime ruin probability (or, equivalently, the ruin probability up to a random exponential time horizon with parameter δ). Other choices of the penalty w lead to interpretations of m(u) as the expected present value of deferred continuous annuities during the first negative excursion of Rt or the price of a perpetual American put option on an asset with dynamics given by {Rt } as well as the price of reset guarantees for mutual funds (see e.g. Gerber & Shiu [410]). Define the discounted (defective) joint density of surplus prior to ruin and deficit at ruin as Z ∞ f (x, yu) = e−δt f (x, y, tu) dt 0
and the discounted density of the surplus prior to ruin as Z f (xu) =
∞
f (x, yu) dy. 0
Let us start with some general considerations for renewal risk models. Unless stated otherwise, we will always assume a positive safety loading η > 0. Although not always necessary, we usually assume R ∞that the claim size distribution B has a density b. With the notation ω(x) = x w(x, y − x) B(dy), an alternative representation of the GerberShiu function is Z
∞Z ∞
m(u) =
w(x, y)f (xu) 0
0
b(x + y) dx dy = 1 − B(x)
Z
∞
ω(x) 0
f (xu) dx. 1 − B(x)
It is now easy to derive a defective renewal equation for m(u) that holds for general (zerodelayed) renewal models. Conditioning on the first time that the
1. INTRODUCTION
359
surplus falls below the initial level u and the size of this jump, one has Z uZ ∞Z ∞ m(u) = e−δt m(u − y)f (x, y, t0) dt dx dy 0 Z 0∞ Z 0∞ Z ∞ e−δt w(x + u, y − u)f (x, y, t0) dt dx dy + u 0 0 Z uZ ∞ = m(u − y)f (x, y0) dx dy 0 Z 0∞ Z ∞ + w(x + u, y − u)f (x, y0) dx dy. u
(1.3)
0
Denoting with
Z gδ (y) =
∞
f (x, y0) dx
(1.4)
0
the defective discounted density of deficit at ruin when u = 0, the defective renewal equation can be written as m = m ∗ gδ + h for the function h specified in (1.3). This equation will be useful at a number of places later on. A first consequence is Z Z ∞
∞
m(0) =
w(x, y)f (x, y0) dx dy. 0
Throughout the chapter, we tacitly assume Z ∞Z ∞ w(x, y)b(x + y) dx dy < ∞, 0
(1.5)
0
(1.6)
0
which is a natural condition to ensure that m(u) is finite for all u ≥ 0. In view of η > 0, we will also assume the natural boundary condition lim m(u) = 0
u→∞
(1.7)
(which in many cases is automatically fulfilled under additional assumptions on the interplay between the penalty w and the claim size distribution B). Notes and references The investigation of extensions of ruin probabilities has a long history, see e.g. Segerdahl [791], Siegmund [806], Gerber, Goovaerts & Kaas [405] and Dickson [305]. The definition of m(u) and the derivation of many of its properties goes back to Gerber & Shiu [408, 409]. Since then, this topic has experienced an enormous interest and activity. In a diffusion setup, another function involving the time value of ruin was discussed in Powers [710] under the name expected discounted cost of insolvency. In the literature, often additional conditions on the penalty function w like boundedness and continuity are imposed, which for instance ensure absolute continuity of m.
360
CHAPTER XII. GERBERSHIU FUNCTIONS
However, with some effort (and sometimes at the expense of regularity properties of m) these assumptions can usually be relaxed to condition (1.6). To avoid a too technical exposition, we will therefore not always be precise on the conditions for w, with the implicit understanding that for the respective proof method w is chosen appropriately and subsequently this choice can be relaxed (for a detailed discussion see e.g. Schmidli [780]).
2
The compound Poisson model
If {Rt } is the classical Cram´erLundberg process, one can derive an IDE for m, for instance via the following direct argument: Let h > 0. By conditioning on the time and amount of the first jump before time h (if there is such a jump), we get Z m(u) =
h
−(β+δ)t
βe 0
Z
³Z dt ∞
+
u+t
m(u + t − x)B(dx) 0
´ w(u + t, x − u − t)B(dx) + e−(β+δ)h m(u + h).
u+t
We differentiate this equation with respect to h and set h = 0 in the resulting equation.1 This yields Z u Z ∞ β m(u−x)B(dx) +β w(u, x−u)B(dx)−(β +δ)m(u)+m0 (u) = 0. (2.1) 0
u
Under the boundary condition limu→∞ m(u) = 0, equation (2.1) has a unique solution. The equation discussed in the following lemma will turn out to be of crucial importance throughout the whole section. b exists for an r > 0 and is steep (cf. p. 91) and δ > 0, then, Lemma 2.1 If B[r] b within the domain of B[r], the Lundberg fundamental equation ¡ ¢ b −1 −r = δ κ(r) = β B[r] (2.2) has one positive root γδ > 0 and one negative root −ρδ < 0. b = 1+δ/β +r/β, Proof. The only difference to equation IV.(5.2) is that now B[r] b so the result is obvious by the convexity of B[r] (see Figure XII.1). Note that γδ > γ0 = γ. 2 1 Note that the differentiability is again guaranteed by the same argument as in Remark VIII.1.11. Other ways to derive equation (2.1) include the (essentially equivalent) generator approach (cf. Chapter II) and the method given in Section 3c.
2. THE COMPOUND POISSON MODEL
361
Figure XII.1
2a
A Laplace transform approach
In view of the convolution term in the integrodifferential equation (2.1), the analysis becomes particularly transparent with Laplace transforms. Let Z ∞ m[−s] b = e−su m(u) du 0
R∞
R −su ∞
and ω b [−s] = u=0 e forms in (2.1) leads to
u
w(u, x − u) B(dx) du. Then taking Laplace trans
m[−s] b =
m(0) − β ω b [−s] . κ(−s) − δ
By (1.7), m(u) is bounded in u, so its Laplace transform m[−s] b must be an analytic function for (at least) 0 and hence the positive zero s = ρδ of the denominator must also be a zero of the numerator. In this way, one obtains by purely analytic arguments the identity Z ∞ Z ∞ −ρδ x m(0) = β ω b [−ρδ ] = β e w(x, y) b(x + y) dy dx . (2.3) x=0
0
From this we arrive at ¡ ¢ β ω b [−ρδ ] − ω b [−s] . m[−s] b = b s − δ − β + β B[−s]
(2.4)
362
CHAPTER XII. GERBERSHIU FUNCTIONS
Remark 2.2 Equation (2.3) contains surprisingly explicit information: if one chooses for w(x, y) the Diracdelta function for the second argument, one obtains the discounted probability density function of the deficit at ruin for initial surplus zero Z ∞ gδ (y) = β e−ρδ x b(x + y) dx, y > 0, (2.5) 0
which provides an alternative proof of Lemma V.(3.2). b On the other hand, the choice w(x) ≡ 1 (and using ω b [−ρδ ] = (1−B[−ρ δ ])/ρδ and κ(−ρδ ) = δ) leads to £ ¤ δ E e−δτ (0) ; τ (0) < ∞ = 1 − , ρδ
(2.6)
which already appeared in Corollary V.3.4 with a related, but somewhat different proof. In view of (1.2), (2.6) implies that the Laplace transform of the finitetime survival probability φ(0, t) = 1 − ψ(0, t) w.r.t. t is simply given by Z ∞ e−δt φ(0, t) dt = 1/ρδ . 0
2 Remark 2.3 Note that for δ → 0, under the net profit condition η > 0, we have ρδ → 0. An application of L’Hˆopital’s rule in (2.2) then gives δ/ρδ → 1 − βµB , and so the formulas in Remark 2.2 correspondingly simplify to g0 (y) = βB(y) (which is the ladder height density of the compound Poisson process as derived in IV.2) and (2.6) reduces to ψ(0) = βµB . Similarly, for δ = 0 equation (2.4) simplifies to ¡ ¢ β ω b [0] − ω b [−s] . (2.7) m[−s] b = κ(−s) If further w ≡ 1, this gives ¡ ¢ b β µB − (1 − B[−s])/s 1 1 − βµB b = − , ψ[−s] = κ(−s) s κ(−s) which is another way to write the PollaczeckKhinchine formula IV.(2.2).
(2.8) 2
This link to the classical model can be used to obtain some further nice identities in a simple way:
2. THE COMPOUND POISSON MODEL
363
Proposition 2.4 The defective (nondiscounted) density of the surplus prior to ruin in the compound Poisson risk model with initial capital u is given by ³ ¡ ¢ ¡ ¢´ β B(x) 1 − ψ(u) − I(x < u)B(x) 1 − ψ(u − x) (2.9) f0 (xu) = 1 − β µB and the defective (nondiscounted) density of the deficit at ruin is Z u ³ ´ ¡ ¢ ¡ ¢ β B(y) 1 − ψ(u) − 1 − ψ(u − x) b(x + y) dx . (2.10) f0 (yu) = 1 − β µB 0 Proof. Replacing the denominator in (2.7) by (2.8) gives, after inverting the Laplace transform, Z u Z ´ ³ ¡ ¢ ∞ ¡ ¢ β 1−ψ(u−x) w(x, y−x) B(dy) dx . m(u) = ω b (0) 1−ψ(u) − 1 − β µB 0 x ¡ ¢ b If we now choose w(x, y) = e−ax , i.e. ω b [−s] = 1 − B[−a − s] /(a + s), then this leads to £ −a R− ¤ τ (u) ; τ (u) < ∞ E e µ ¶ Z u b ¡ ¢ ¡ ¢ −a x β 1 − B(−a) 1 − ψ(u) − 1 − ψ(u − x) e B(x) dx , = 1 − β µB a 0 and the inverse Laplace transform w.r.t. a is (2.9). On the other hand, the ¡ ¢ b b choice w(x, y) = e−ay , i.e. ω b [−s] = B[−a] − B[−s] /(s − a), gives £ −a ξ(u) ¤ E e ; τ (u) < ∞ = µ ¶ Z u Z b ¡ ¢ ¡ ¢ ∞ −a(y−x) β 1 − B(−a) 1 − ψ(u) − 1 − ψ(u − x) e B(dy) dx 1 − β µB a 0 x and its inverse Laplace transform is (2.10).2
2
Taking derivatives at a = 0 of the above Laplace transforms now leads to the following identities: ¯ £ ¤ Corollary 2.5 The moments E (Rτ (u)− )n ¯ τ (u) < ∞ of the surplus prior to ruin are ¶ µ (n+1) Z u ¡ ¢ µB (1 − ψ(u)) β n − x 1 − ψ(u − x) B(x) dx , (1 − β µB )ψ(u) n+1 0 2 Note that from these expressions one can again observe the somewhat curious fact that in the compound Poisson model for u = 0 the distributions of the surplus prior to ruin and of the deficit at ruin coincide, see also Theorem IV.2.2.
364
CHAPTER XII. GERBERSHIU FUNCTIONS
¯ £ ¤ and the moments E ξ(u)n ¯ τ (u) < ∞ of the deficit at ruin are given by ¢ Z u µ (n+1) ¡ ¶ Z ¡ ¢ ∞ µB 1 − ψ(u) β n − 1 − ψ(u − x) (y − x) B(dy) dx . (1 − β µB )ψ(u) n+1 0 x £ ¤ (2) Proposition 2.6 Let µB < ∞ and define ψn (u) = E τ (u)n ; τ (u) < ∞ for n ∈ N0 . Then for n ≥ 1, ψn (u) is given by Z ∞ Z ∞ ³Z u ´ n ψ(u − y)ψn−1 (y) dy + ψn−1 (y) dy − ψ(u) ψn−1 (y) dy . 1 − β µB 0 u 0 In particular, £ ¤ E τ (u) τ (u) < ∞ =
Ru 0
ψ(u − y)ψ(y) dy +
R∞ u
ψ(y) dy −
(2)
βµB 2(1−βµB )
(1 − βµB ) ψ(u)
ψ(u)
.
¯ n b δ [−s] ¯ Proof. For w = 1, vn (s) = ∂ m is the Laplace transform of (−1)n ψn (u) ¯ ∂δ n δ=0 w.r.t. u. Consequently, differentiation w.r.t. δ of ¡ ¢ κ(−s) − δ m b δ [−s] = m(0) − β ω b [−s] and choosing δ = 0 gives κ(−s) vn (s) =
∂ n m(0) ¯¯ + n vn−1 (s). ¯ ∂δ n δ=0
Since vn (s) is an analytic function for 0 and s = 0 is the only zero of κ(−s) in the right halfplane, it follows that Z ∞ ∂ n m(0) ¯¯ n = −n lim v (s) = (−1) n ψn−1 (y) dy. ¯ n−1 s→0 ∂δ n δ=0 0 Hence, together with (2.8), we obtain µ ¶ Z ∞ b 1/s − ψ[−s] n , vn (s) = n (−1) ψn−1 (y) dy + vn−1 (s) 1 − βµB 0 from which the desired formula for ψn (u) follows immediately.R Finally, the ∞ formula for n = 1 follows from ψ0 (x) = ψ(x) and the identity 0 ψ(x) dx = £ ¤ (2) β µB / 2(1 − βµB ) (which is itself a direct consequence of (2.8) for s → 0; see also IV.(3.7)). 2 Let us now use equation (2.4) to derive another representation of the defective renewal equation for m(u):
2. THE COMPOUND POISSON MODEL
365
Proposition 2.7 The GerberShiu function m(u) in the compound Poisson model satisfies the defective renewal equation Z u m(u) = (1 − δ/ρδ ) m(u − y) gp (y) dy + h(u), (2.11) 0
where the (proper) density gp (y) is given by Z ∞ β gp (y) = e−ρδ (x−y) B(dx), 1 − δ/ρδ y and
Z
∞
h(u) = β
Z e−ρδ (x−u)
u
y≥0
(2.12)
∞
w(x, y − x) B(dy) dx.
(2.13)
x
b Proof. Replacing δ + β in the denominator of (2.4) by ρδ + β B[−ρ δ ] (which holds because κ(−ρδ ) = δ), one gets ¡ ¢ ¡ ¢ β ω b [−ρδ ] − ω b [−s] /(s − ρδ ) β ω b [−ρδ ] − ω b [−s] , = m[−s] b = b b b b s − ρδ − β B[−ρ δ ] + β B[−s] 1 − β B[−ρδ ]−B[−s] s−ρδ
(2.14)
which is of the form (2.11) for ω b [−ρδ ] − ω b [−s] b h[−s] = β s − ρδ and gbp [−s] =
b b β B[−ρ δ ] − B[−s] . 1 − δ/ρδ s − ρδ
Taking the inverse Laplace transform of the latter two quantities then gives the assertion. 2 As a byproduct, in view of (1.3) and (1.4) this again leads to Z ∞ gδ (y) = (1 − δ/ρδ ) gp (y) = β e−ρδ x b(x + y) dx, y ≥ 0,
(2.15)
0
which is (2.5). At the same time, with w ≡ 1 it follows from Proposition 2.7 (or directly b from ω b [−s] = (1 − B[−s])/s in (2.4)) that Z ∞ £ ¤ κ(−s)/s − δ/ρδ , e−su E e−δτ (u) ; τ (u) < ∞ du = κ(−s) − δ 0
366
CHAPTER XII. GERBERSHIU FUNCTIONS
which already appeared in Corollary V.3.5. A related consequence of Proposition 2.7 is the asymptotic behavior of m(u) for subexponential claim sizes: Theorem 2.8 Assume that w ≡ 1 and B ∈ S with finite mean. Then, for δ>0 β B(u) as u → ∞. m(u) ∼ δ ¡ ¢ b Proof. For w ≡ 1 we have ω b [−s] = 1 − B[−s] /s, and hence (again exploiting b κ(−ρδ ) = δ) the expression h[−s] in the proof of Proposition 2.7 simplifies to 1 − gbp [−s] b . h[−s] = (1 − δ/ρδ ) s But this implies that for all s ≥ 0 m[−s] b =
∞ 1−c gp [−s] (1 − δ/ρδ ) δ X 1 − δ/ρδ s − = (1 − δ/ρδ )n gbp [−s]n , 1 − (1 − δ/ρδ ) gbp [−s] s s ρδ n=1
where the geometric series converges since both m(0) = 1 − δ/ρδ < 1 and gbp [−s] < 1 (gp is a probability density). Taking the inverse Laplace transform now gives a representation of m(u) as the geometric compound tail ¶n ∞ µ δ δ X 1− G∗n (2.16) m(u) = p (u), ρδ n=1 ρδ where Gp is the c.d.f. of the density gp in (2.12). Its tail is Z ∞ β Gp (y) = e−ρδ x B(x + y) dx 1 − δ/ρδ 0 Z ∞ ´ β ³ B(y) + = e−ρδ x B(dx + y) ρδ − δ 0 ¡ ¢ β B(y) 1 + o(1) , = ρδ − δ
(2.17)
where the last step follows from B ∈ S and Proposition X.1.5. Corollary X.1.10 now implies Gp ∈ S. As in Lemma X.2.2, one obtains from (2.16) by dominated convergence that 1 − δ/ρδ m(u) = lim u→∞ Gp (u) δ/ρδ and the result finally follows from (2.17).
2
2. THE COMPOUND POISSON MODEL
367
Remark 2.9 Since for δ = 0, m(u) = ψ(u) for w ≡ 1, a comparison of the above result with Theorem X.2.1 shows that for subexponential claim sizes the introduction of the discount rate δ > 0 moves the asymptotic behavior of m(u) away from the magnitude of the integrated tail B 0 to the one of the tail B (for δ → 0 one has 1 − δ/ρδ → βµB and the density g0 (y) in the defective renewal equation is correspondingly replaced by B(y)/µB , cf. IV.(3.2)). Since for general penalty functions the representation of m(u) as a compound geometric tail is usually not available, one needs slightly different methods to establish corresponding asymptotic results. We shall not pursue this further; the interested reader is referred to Tang & Wei [834] for details. 2
2b
Change of measure
© ª As in Section IV.1, consider the Wald martingale Lt = exp rSt − κ(r)t as the likelihood ratio process. Then we have by a change of measure that h i ¡ ¢ m(u) = Er e−rSτ (u) +κ(r)τ (u) e−δτ (u) w Rτ (u)− , ξ(u) ; τ (u) < ∞ . ¡ ¢ If the Lundberg coefficient γδ > 0 exists, then Pγδ τ (u) < ∞ = 1 and hence ¡ ¢¤ £ ¡ ¢¤ £ m(u) = Eγδ e−γδ Sτ (u) w Rτ (u)− , ξ(u) = Eγδ e−γδ ξ(u) w Rτ (u)− , ξ(u) e−γδ u . Note that under the new measure Pγδ , not only the event of ruin is certain, but also the timedependence of the penalty function has disappeared (or, rather, hides in the value of γδ ). If the penalty w is bounded, then a Lundbergtype inequality of the form m(u) ≤ supx,y w(x, y)e−γδ u immediately follows. The next result gives the asymptotic behavior for general continuous penalty functions: Proposition 2.10 Assume that the penalty function w is continuous. If γδ > 0 exists, then R∞R∞ β 0 z w(z, x − z)B(dx) (eγδ z − e−ρδ z ) dz γδ u . lim m(u)e = Cδ = u→∞ b 0 [γδ ] − 1 βB £ ¡ ¢¤ Proof. Define m(u) e = m(u)eγδ u = Eγδ e−γδ ξ(u) w Rτ (u)− , ξ(u) and denote by H(x, y) = Pγδ [Rτ (0)− ≤ x, ξ(0) ≤ y] the joint distribution of the surplus prior to ruin and the deficit at ruin under the tilted measure given that the risk process starts in u = 0. Then, just as in (1.3) but now under the new measure, it immediately follows that Z u Z ∞ Z ∞ m(u) e = m(u e − y)H(∞, dy) + w(x + u, y − u)e−γδ (y−u) H(dx, dy). 0
y=u
x=0
368
CHAPTER XII. GERBERSHIU FUNCTIONS
This is a (now proper) renewal equation for m(u) e and according to Proposition A1.1 we need to show that the second summand above is directly Riemann integrable. Since w is continuous, it is enough to show that there is a directly Riemann integrable upper bound. Since γδ > 0 exists, all moments of the claim size distribution (and consequently of the surplus prior to ruin and the deficit at ruin) exist and in view of (1.6) it is then enough to show that 1 − H(∞, y) is directly Riemann integrable, but the latter follows from the existence of all moments of the claim size distribution B. Applying now Proposition A1.1, it just remains to calculate the limiting constant R∞
R∞ R∞ R∞
w(x + u, y − u)e−γδ (y−u) H(dx, dy) du R∞ . (1 − H(∞, y)) dy 0 (2.18) Following the idea of Remark IV.5.6, the simplest way to identify its value is from Cδ = lims→0 s m[−s b + γδ ] and expression (2.4). However, we shall here directly evaluate (2.18): Recall from Theorem IV.2.2 that under the original measure Cδ =
0
z(u)du = µF
u=0 y=u x=0
£ ¤ P Rτ (0)− ≤ x, ξ(0) ≤ y, τ (0) < ∞ = β
Z
x
Z
z+y
B(dv) dz. z=0
(2.19)
v=z
b δ ] and Under Pγδ , the risk process is again compound Poisson with βγδ = β B[γ eγδ x B(dx), so the safety loading is negative. Hence we need a Bγδ (dx) = b δ] B[γ further exponential tilting by the factor −(ρδ + γδ ) to obtain a classical compound Poisson process Rt∗ with positive safety loading, claim distribution B ∗ and Poisson parameter β ∗ , for which we can apply (2.19). This leads to ¤ £ ∗ E e(γδ +ρδ )ξ (0) ; Rτ∗ ∗ (0)− ≤ x, ξ ∗ (0) ≤ y, τ ∗ (0) < ∞ Z y Z x ∗ (γδ +ρδ )v = β e B ∗ (z + dv)dz v=0 z=0 Z x Z z+y = β ∗ e−(γδ +ρδ ) z e(γδ +ρδ ) v B ∗ (dv) dz.
H(x, y) =
0
z
Since β ∗ and B ∗ are related to β and B through exponential tilting by −ρδ , we finally arrive at Z
x
H(x, y) = β 0
Z e−(γδ +ρδ )z z
z+y
eγδ v B(dv) dz.
2. THE COMPOUND POISSON MODEL
369
The denominator of (2.18) then is, by changing the order of integration, Z ∞ Z ∞Z ∞Z ∞ ¡ ¢ 1 − H(∞, y) dy = β eγδ v dB(v)e−(γδ +ρδ )z dz dy 0
0
=
0
y+z
´ βB b δ ] − B[−ρ b b 0 [γδ ] − 1 B[γ β ³ b0 δ] B [γδ ] − = . γ δ + ρδ γδ + ρδ γδ + ρδ
Similarly, the numerator of (2.18) simplifies to Z ∞Z ∞Z ∞ β w(x + u, y − u)e−ρδ x+γδ u B(x + dy) dx du u=0 y=u x=0
which finally leads to the assertion. Remark 2.11 Note that for w ≡ 1 the constant simplifies to µ ¶ 1 1 δ + . Cδ = 0 κ (γδ ) γδ ρδ
2
(2.20) 2
2c
Martingales
© ª It is easy to see that the stochastic process e−δt−r Rt t≥0 is a martingale w.r.t. its natural filtration F if and only if r = γδ > 0 or r = −ρδ ≤ 0. This can be exploited in various ways.
Proposition 2.12 If γδ > 0 exists, then £ ¤ E e−δτ (u)+γδ ξ(u) ; τ (u) < ∞ = e−γδ u , δ ≥ 0, u ≥ 0. © −δt−γ R ª t δ Proof. The martingale e is bounded by 1 for 0 ≤ t < τ (u). Hence t≥0 we can apply the optional sampling theorem for the stopping time τ (u) to obtain £ ¤ E e−δτ (u)−γδ Rτ (u) = e−γδ u . £ ¤ Due to limu→∞ Rt = ∞ a.s., one has E e−δτ (u)−γδ Rτ (u) ; τ (u) = ∞ = 0 for δ ≥ 0 and the result follows. 2 £ γξ(u) ¤ Note that for δ = 0, E e ; τ (u) < ∞ = e−γu , in line with II.(3.1). Exploiting the martingale for r = −ρδ leads to the following result (which is Lemma V.3.1, but we state it here again for completeness).
370
CHAPTER XII. GERBERSHIU FUNCTIONS
Proposition 2.13 Let τa+ = min {t > 0 : Rt ≥ a  R0 = u} for a > u. Then +
E[e−δτa ] = e−ρδ (a−u) , δ > 0, a > u. (2.21) © ª Proof. For fixed a > u, the martingale e−δt+ρδ Rt t≥0 is bounded by eρδ a for 0 ≤ t ≤ τa+ . Hence we can apply the optional sampling theorem for the stopping + time τa+ to obtain E[e−δτa +ρδ a ] = eρδ u . 2 £ −δτ (u)+ρ R (u) ¤ δ τ Define ψδ (u) = E e ; τ (u) < ∞ . Let T0 = min{t > τ (u)Rt = 0} be the time of recovery after ruin. Since (2.21) holds for arbitrary a, u ∈ R, + + one has for a < b that E[e−δ(τb −τa ) τa+ < τb+ ] = e−ρδ (b−a) and consequently ¯ £ ¤ E e−δ(T0 −τ (u)) ¯ τ (u) < ∞, Fτ (u) = e−ρδ ξ(u) . This leads to £ ¤ £ ¤ E e−δT0 ; τ (u) < ∞ = E e−δ(T0 −τ (u)) e−δτ (u) ; τ (u) < ∞ = ψδ (u) , which gives ψδ (u) the interpretation as the expected present value of a payment of 1 made at the time of recovery, if ruin occurs. Proposition 2.14 The discounted density of the surplus prior to ruin satisfies f (xu) = f (x0)
eρδ u I(x > u) + eρδ x ψδ (u − x) I(x ≤ u) − ψδ (u) , 1 − ψδ (0)
x > 0.
At x = u, f (xu) has a discontinuity of size f (x0) eρδ u = β B(u). Proof. We will use Laplace transforms. Since ψδ (u) is the GerberShiu function ¡ ¢ b b with w(x, y) = e−ρδ y (and correspondingly ω b [−s] = B[−ρ δ ] − B[−s] /(s − ρδ ) = gbδ [−s]/β), it follows from (2.14) and β ω b [−ρδ ] = m(0) that ψbδ [−s] =
ψδ (0) − gbδ [−s] ¡ ¢. (s − ρδ ) 1 − gbδ [−s]
(2.22)
Similarly to (1.3), by conditioning whether or not ruin occurs at the first time when the surplus falls below the initial value u, one can write down the renewal equation Z u f (x, yu) = f (x, yu − z)gδ (z) dz + f (x − u, y + u0) I(x > u) . (2.23) 0
By (2.15) f (x − u, y + u0) = βe−ρδ (x−u) b(x + y) = f (x, y0)eρδ u .
2. THE COMPOUND POISSON MODEL
371
Hence integrating the renewal equation w.r.t. y, we have Z
u
f (xu) =
f (xu − z)gδ (z) dz + f (x0) eρδ u I(x > u).
0
The function ζ(u) defined through f (xu) = f (x0) ζ(u) then fulfills the renewal equation Z u ζ(u) = ζ(u − z)gδ (z) dz + eρδ u I(x > u), 0
so that its Laplace transform is given by b ζ[−s] =
e(ρδ −s)x − 1 1 . 1 − gbδ [−s] ρδ − s
The statement of the proposition is that this expression is also the Laplace transform (w.r.t. u) of the function ζ2 (u) =
eρδ u I(x > u) + eρδ x ψδ (u − x) I(x ≤ u) − ψδ (u) . 1 − ψδ (0)
Standard calculations show that ζb2 [−s] =
e(ρδ −s)x −1 ρδ −s
+ e(ρδ −s)x ψbδ [−s]− ψbδ [−s] 1 − ψδ (0)
=
¡ (ρ −s)x ¢¡ 1 ¢ e δ −1 ρδ −s) + ψbδ [−s] 1 − ψδ (0)
Substituting (2.22) into the latter equation gives 1 e(ρδ −s)x − 1 , ζb2 [−s] = ρδ − s 1 − gbδ [−s] b which indeed coincides with ζ[−s].
2
Since ρδ = 0 for δ = 0, ψ0 (u) is the usual ruin probability ψ(u), and we obtain the (defective) nondiscounted density of the surplus prior to ruin f0 (xu) = f0 (x0)
I(x > u) + ψ(u − x) I(x ≤ u) − ψ(u) , 1 − ψ(0)
which is another way of writing (2.9).
x > 0,
.
372
2d
CHAPTER XII. GERBERSHIU FUNCTIONS
Further ruinrelated quantities
Somewhat surprisingly, it just turned out that the time of recovery T0 plays a crucial role for the surplus prior to ruin. A related natural question is to consider ¯the maximum severity of ruin prior to recovery, i.e. the r.v. M (u) = sup{Rt  ¯ τ (u) ≤ t ≤ T0 }. Its distribution function (given that ruin occurs) turns out to have a strikingly simple form in terms of the ruin probability ψ(u): Proposition 2.15 For positive safety loading η > 0, ¡ ¢ ψ(u) − ψ(u + z) ¡ ¢. P M (u) ≤ z  τ (u) < ∞ = ψ(u) 1 − ψ(z) Proof. Given ruin occurs, the event M (u) ≤ z happens if ruin occurs with some deficit y ≤ z and if the reserve process does not fall below level −z from there on before it is positive again. The latter is equivalent to the event that a risk reserve process ¡ starting in¢ z¡− y attains ¢ level z before ruin, which happens with probability 1 − ψ(z − y) / 1 − ψ(z) . This gives ¡ ¢ P M (u) ≤ zτ (u) < ∞ =
Z
z
0
f0 (yu) 1 − ψ(z − y) dy . ψ(u) 1 − ψ(z)
On the other hand, Z
Z
∞
ψ(u + z) =
f0 (yu) dy + z
z
f0 (yu)ψ(z − y) dy,
(2.24)
0
because for a risk process starting at level u + z, ruin can only occur if the reserve falls below level z and the first integral gives the probability that ruin occurs directly then, whereas the second integral gives the probability that ruin occurs later. Equation (2.24) can be rewritten as Z z Z ∞ ¡ ¢ f0 (yu) 1 − ψ(z − y) dy = f0 (yu) dy − ψ(u + z) = ψ(u) − ψ(u + z) 0
from which the result follows.
0
2
If the reserve process recovers after ruin, it may again first become negative before it reaches the previous maximum of the process, possibly leading to a larger ruin severity before reaching again this running maximum. The maximum severity of the ruin excursion out of the running maximum can be studied by very simple means thanks to the duality with queueing models, leading to another simple formula in terms of the survival probability φ(u) = 1 − ψ(u):
2. THE COMPOUND POISSON MODEL
373
Proposition 2.16 Define Mr (u) as the (absolute value of the) maximum severity during the excursion out of the running maximum of Rt that causes ruin. Then, for positive safety loading, Z ∞ 0 ¡ ¢ φ (w + z) φ(u) dw. P Mr (u) > z  τ (u) < ∞ = φ(w + z) φ(w) u Proof. Denote by Fk (u, z) the probability that for a reserve process starting in u, ruin occurs at the kth excursion out of the running maximum and the maximum severity is below level −z. Recall from Theorem III.2.3 the close connection between the maximum workload Vmax of an M/G/1 queue and the survival probability φ(u) of the compound Poisson risk process. If G(u) = P(Vmax < u), then Z ∞ ³Z t k−1 dv ´k−1 −βt k t e G(u + t + z) dt, Fk (u, z) = β G(u + v) (k − 1)! t t=0 v=0 because each excursion out of the running maximum occurs at an exponential(β) distributed time and whenever the excursion does not lead to ruin it can be excised from the process. Accordingly, the kth excursion occurs after an Erlang(k) distributed time and the previous ones are uniformly distributed over this interval and must not lead¢ to P ruin. Now the assertion follows by noting that ¡ ∞ P Mr (u) > z  τ (u) < ∞ = k=1 Fk (u, z), some simple algebra and application of Theorem III.2.3. 2 Notes and references There are several ways to derive the results of this section. The seminal paper of Gerber & Shiu [409] is a rich source of calculations in this context and much of the material presented in this section can be found there. The transparency of Laplace transforms in the analysis of GerberShiu functions is apparent, see also Dufresne & Gerber [333], Gerber & Shiu [408] and Dickson [305, 308]. Parts of the exposition of Section 2a follow from Albrecher & Boxma [19]; see also Willmot & Lin [892] and Schmidli [773]. In Albrecher, Gerber & Yang [23] one can find a transparent approach to some of the derived formulas by just using rational functions. 
Starting with the defective renewal equation for m, Lin & Willmot [596] derive some of the above and further results via compound geometric tails. Computational aspects of the calculation of ruin time moments are discussed in Drekic et al. [329, 330]; see also Dermitzakis et al. [296]. The duration of negative surplus T_0 − τ(u) was studied by other techniques in Dickson & Egidio dos Reis [311]. For the time and area spent at negative surplus levels up to a fixed time T, see Loisel [607]. Borovkov, Palmowski & Boxma [190] give a detailed related analysis of such quantities in a queueing context. Pitts & Politis [708] use a functional approach to approximate the Gerber-Shiu function with the one from a 'near' claim distribution for which more explicit results exist. An algorithmic procedure to obtain moments of the ruin time
CHAPTER XII. GERBER-SHIU FUNCTIONS
for discrete claim sizes in terms of generalized Appell polynomials was developed in Picard & Lefèvre [702]. The result of Theorem 2.8 is from Siaulys & Asanaviciute [803]. Generalizing a number of earlier results, Landriault & Willmot [573] use the Lagrange implicit function theorem to determine an explicit expression for the trivariate distribution of the time to ruin, the deficit at ruin and the surplus prior to ruin. This expression contains an infinite series of integrals of convolutions of the claim size density. Extending an idea of Frey & Schmidt [373], Usabel [860] develops a recursive computational technique to approximate the above trivariate distribution using its Taylor expansion in terms of the Poisson parameter β around β = 0. Tail bounds for this distribution obtained from the integral equation can be found in Psarrakos & Politis [720]. The Gerber-Shiu function in a compound Poisson model with interest is considered in Cai & Dickson [214]; see also Yang & Zhang [901] and Wu, Wang & Zhang [897]. Cai [213] and Yuen & Wang [911] deal with stochastic interest, whereas Badescu, Drekic & Landriault [118] study these ruin-related quantities with a multi-step premium rule under a MAP arrival process. For absolute ruin and the inclusion of tax payments see Ming, Wang & Xao [646]. Albrecher, Hartinger & Tichy [27] study the Gerber-Shiu function under a time-dependent threshold model for the premium income. Using the renewal measure of the defective renewal sequence of the zero points of R_t, calculations involving the maximum of the surplus process up to ruin, the last time the surplus process passes zero before ultimately going to infinity, and the minimum of the surplus process up to that time are provided in Wu, Wang & Wei [896]. Proposition 2.15 is from Picard [700], where also the quantity ∫_{τ(u)}^{T_0} R_t dt (i.e. the area below zero until the time of recovery) is studied.
For Proposition 2.16 and further related formulas see Albrecher, Borst, Boxma & Resing [17]. Baigger [124] studies general criteria under which ruin occurs only finitely often. The Kella-Whitt martingale and the martingale introduced in [84] are used in Frostig [376] to derive results about the time of ruin in the presence of a reflecting barrier. Extensions are of course possible in many directions; Cheung et al. [242], for instance, include information on the surplus after the second-last claim before ruin. Note that all techniques presented for the renewal model in the next section are by definition directly applicable to the compound Poisson model as well.
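The integral of Proposition 2.16 can be checked numerically in the compound Poisson model with exponential claims, where φ is explicit; letting z → 0 should recover the ruin probability ψ(u). The following sketch (Python is our choice here, and all parameter values are illustrative assumptions, not from the text) uses a plain trapezoidal rule:

```python
import math

# Compound Poisson model, premium rate 1, claim intensity beta and
# exponential(nu) claims, so phi(u) = 1 - rho*e^{-kappa*u} with rho = beta/nu
# and kappa = nu - beta.  Parameter values are illustrative.
beta, nu = 0.6, 1.0
rho, kappa = beta / nu, nu - beta

phi = lambda x: 1.0 - rho * math.exp(-kappa * x)        # survival probability
dphi = lambda x: rho * kappa * math.exp(-kappa * x)     # its derivative

def joint_tail(u, z, n=200000, width=200.0):
    """P(M_r(u) > z, tau(u) < infty) via Proposition 2.16 (trapezoidal rule)."""
    h = width / n
    s = 0.0
    for i in range(n + 1):
        w = u + i * h
        s += dphi(w + z) * phi(u) / (phi(w + z) * phi(w)) * (0.5 if i in (0, n) else 1.0)
    return s * h

u = 2.0
psi_u = rho * math.exp(-kappa * u)     # ruin probability psi(u)
at_zero = joint_tail(u, 0.0)           # should equal psi(u)
at_one = joint_tail(u, 1.0)            # strictly smaller for z > 0
```

The z = 0 value reproduces ψ(u) up to quadrature error, which is consistent with reading the left-hand side of Proposition 2.16 as the joint probability of a severity exceedance and ruin.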
3  The renewal model
In the following we consider the zero-delayed renewal model with interarrival time distribution A(t) and density a(t). If T_1 is the epoch of the first claim, the standard renewal argument gives m(u) = E[e^{−δT_1} m(u + T_1 − U_1)]. That is, m(u) is given by

m(u) = ∫_0^∞ e^{−δt} a(t) ( ∫_0^{u+t} m(u+t−y) B(dy) + ∫_{u+t}^∞ w(u+t, y−u−t) B(dy) ) dt.   (3.1)
3a  Change of measure
Recall from Section VI.3a the imbedded random walk structure of the renewal model: if we only consider the discrete time points at which a claim has just occurred, the resulting discrete-time process is a random walk and in particular Markovian. However, since in this chapter we want to keep information on the time to ruin and the surplus prior to ruin as well (both of which are lost in the imbedded random walk view), this type of markovization of the process is too crude for the present purpose. An alternative and elegant way to markovize the process is to consider the random variable V_t = T_{N_t+1} − t as an additional state variable, which is the time remaining until the next claim.³ Define κ(r) as the solution of

B̂[r] Â[−r − κ(r)] = 1,   (3.2)

for every r with B̂[r] < ∞. It is easy to show by properties of moment-generating functions that for r ≥ 0 this solution κ(r) exists, is unique, and that κ(r) is a strictly convex function on the set where it exists. Also, κ(0) = 0 and κ′(0) < 0 under the net profit condition. With some further effort, one can then show that

L_t = B̂[r] e^{−(r+κ(r))V_t} e^{rS_t − κ(r)t}

is a martingale with respect to the filtration generated by {(R_t, V_t)} (see e.g. [746, Th. 11.5.2]). L_t can now be used as a likelihood ratio process. Under the measure P_r[·] = E[L_t; ·], the risk process R_t remains a Sparre Andersen risk process with claim distribution B_r(dy) = Â[−r − κ(r)] e^{ry} B(dy) and the interclaim time distribution changed to A_r(dt) = B̂[r] e^{−(r+κ(r))t} A(dt). If r > argmin κ(r), then the drift under the new measure is negative (−κ′(r) < 0) and consequently P_r(τ(u) < ∞) = 1. Under the measure P_r, the Gerber-Shiu function m(u) can be expressed as

Â[−r − κ(r)] · E_r[ e^{(r+κ(r))V_{τ(u)} − rS_{τ(u)} + (κ(r)−δ)τ(u)} w(R_{τ(u)−}, ξ(u)); τ(u) < ∞ ].
Since V_{τ(u)} is the time to the next claim after ruin and hence independent of F_{τ(u)−} and R_{τ(u)}, this identity simplifies to

m(u) = E_r[ e^{−rξ(u)} e^{(κ(r)−δ)τ(u)} w(R_{τ(u)−}, ξ(u)); τ(u) < ∞ ] e^{−ru}.

As in the compound Poisson case, the time dependence disappears if κ(r) = δ.

³This is called forward markovization and leads to some subtleties concerning the interpretation of the appropriate filtration. Alternatively, one could also use the time since the last claim (backward markovization) with a more intuitive appropriate filtration, but then the resulting equations are usually more cumbersome; see the References at the end of the section.
Proposition 3.1 Assume that the equation κ(r) − δ = 0 with κ(r) defined in (3.2) has a positive solution γ_δ > argmin κ(r). Then

m(u) = E_{γ_δ}[ e^{−γ_δ ξ(u)} w(R_{τ(u)−}, ξ(u)) ] e^{−γ_δ u}   (3.3)

and for a continuous penalty function w,

lim_{u→∞} e^{γ_δ u} m(u) = C_δ

for some constant C_δ > 0.

Proof. Expression (3.3) follows from r = γ_δ and P_{γ_δ}(τ(u) < ∞) = 1. Now the same procedure as in the proof of Proposition 2.10 gives a renewal equation for e^{γ_δ u} m(u) and establishes the asymptotic result. □

The constant C_δ is now more difficult to evaluate. We will give its form for a large class of interclaim time distributions A in Corollary 3.9. Formula (3.3) can be helpful in a number of situations. We give one particular example:

Corollary 3.2 In the renewal model, for exponential(ν) claim sizes the Laplace transform of the time to ruin is given by

E[e^{−δτ(u)}; τ(u) < ∞] = ((ν − γ_δ)/ν) e^{−γ_δ u}.

Proof. In this case γ_δ > 0 clearly exists and is the solution of Â[−γ_δ − δ] = 1 − γ_δ/ν. The lack-of-memory property implies P(ξ(u) > x | τ(u) < ∞) = e^{−νx}, and correspondingly P_{γ_δ}(ξ(u) > x) = e^{−(ν−γ_δ)x}. The result then follows from (3.3) with w ≡ 1. □

Remark 3.3 Note that Corollary 3.2 extends Theorem VI.2.2 with a quite different proof. □
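Corollary 3.2 is easy to check numerically. The sketch below (Python; the Erlang(2, β) interarrival law and all parameter values are illustrative assumptions, not from the text) solves Â[−γ_δ − δ] = 1 − γ_δ/ν by bisection and compares the resulting Laplace transform with a crude Monte Carlo estimate:

```python
import math, random

# Sparre Andersen model with Erlang(2, beta) interarrival times,
# exponential(nu) claims and premium rate 1.  All parameter values are
# illustrative choices, not taken from the text.
beta, nu, delta, u = 2.0, 2.0, 0.1, 1.0

# A_hat[s] = (beta/(beta - s))^2 is the m.g.f. of the Erlang(2, beta) law;
# gamma_delta solves A_hat[-gamma - delta] = 1 - gamma/nu  (Corollary 3.2).
def f(g):
    return (beta / (beta + g + delta)) ** 2 - (1.0 - g / nu)

lo, hi = 0.0, nu                       # f(0) < 0 < f(nu): bisection applies
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if f(mid) < 0.0:
        lo = mid
    else:
        hi = mid
gamma = 0.5 * (lo + hi)
exact = (nu - gamma) / nu * math.exp(-gamma * u)     # Corollary 3.2

# Crude Monte Carlo estimate of E[exp(-delta*tau(u)); tau(u) < infty]:
# ruin can only occur at claim epochs, and contributions beyond the time
# horizon are bounded by e^{-delta*horizon}, hence negligible.
random.seed(1)
n_paths, horizon, acc = 10000, 100.0, 0.0
for _ in range(n_paths):
    t, claims = 0.0, 0.0
    while t < horizon:
        t += random.expovariate(beta) + random.expovariate(beta)  # Erlang(2)
        claims += random.expovariate(nu)
        if u + t - claims < 0.0:       # surplus after this claim is negative
            acc += math.exp(-delta * t)
            break
mc = acc / n_paths
```

The Monte Carlo estimate agrees with the closed form within sampling error.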
3b  A modified random walk
Another way to remove the discounting is to interpret a_δ(t) = e^{−δt} a(t) in (3.1) as a new (now defective) interclaim time density for a non-discounted risk process. This leads to a modified imbedded random walk S_{δ,k} = Σ_{i=1}^k (U_i − T_{δ,i}), k ≥ 1, where T_{δ,i} has defective density a_δ(t) and a point mass of size 1 − ∫_0^∞ a_δ(t) dt = 1 − Â[−δ] at infinity. Consequently, sup_k S_{δ,k} is finite with probability 1. By
definition, the ruin probability of this modified random walk is the Laplace transform of the ruin time of the original risk process R_t,

E[e^{−δτ(u)}; τ(u) < ∞] = P(sup_k S_{δ,k} > u),

and the discounted distribution of the deficit at ruin of R_t is

E[e^{−δτ(u)}; ξ(u) ≤ y, τ(u) < ∞] = P(N_{δ,τ} < ∞, S_{δ,N_{δ,τ}} ≤ u + y),

where N_{δ,τ} = inf{k : S_{δ,k} > u}. Hence for these penalty functions the calculations are reduced to random walk techniques with defective increment distribution (for which e.g. the Wiener-Hopf factorization can still be done in the same way). If the claim sizes are phase-type, this leads to the following generalization of Corollary 3.2 and also of Theorem IX.4.4:

Theorem 3.4 Consider the renewal model with arbitrary interarrival distribution A and phase-type claim size distribution B with representation (α, T). Denote with α_{δ+} the minimal nonnegative solution of

α_{δ+} = α ∫_0^∞ e^{(T + t α_{δ+})t} a_δ(t) dt.

Then the Laplace transform of the ruin time is

E[e^{−δτ(u)}; τ(u) < ∞] = α_{δ+} e^{(T + t α_{δ+})u} e,

and the discounted distribution of the deficit at ruin is given by

E[e^{−δτ(u)}; ξ(u) ≥ y, τ(u) < ∞] = α_{δ+} e^{(T + t α_{δ+})u} e^{Ty} e.

Proof. The proof is a straightforward extension of the one of Theorem IX.4.4; see also Ren [735]. □
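For exponential(ν) claims the phase-type representation has a single phase (α = (1), T = (−ν), t = (ν)), so α_{δ+} reduces to a scalar fixed point computable by simple iteration. The sketch below (illustrative parameters; the same Erlang(2, β) interarrival assumption as above) also cross-checks against Corollary 3.2, since ν(1 − α_{δ+}) should then equal γ_δ:

```python
import math

# Renewal model with Erlang(2, beta) interarrivals and exponential(nu)
# claims.  The phase-type representation of B has one phase, alpha = (1),
# T = (-nu), t = (nu), so alpha_{delta+} in Theorem 3.4 reduces to a scalar
# a solving  a = A_hat[-delta - nu(1 - a)],  with A_hat[s] = (beta/(beta-s))^2.
# Parameter values are illustrative.
beta, nu, delta, u = 2.0, 2.0, 0.1, 1.0
A_hat = lambda s: (beta / (beta - s)) ** 2     # m.g.f. of Erlang(2, beta), s < beta

a = 0.0
for _ in range(500):                 # fixed-point iteration (a contraction here)
    a = A_hat(-delta - nu * (1.0 - a))

# Laplace transform of the ruin time: alpha_{delta+} e^{(T + t alpha_{delta+})u} e
lt = a * math.exp(-nu * (1.0 - a) * u)

# Cross-check with Corollary 3.2: g = nu(1 - a) should solve
# A_hat[-g - delta] = 1 - g/nu, i.e. g = gamma_delta and a = (nu - gamma_delta)/nu.
g = nu * (1.0 - a)
residual = A_hat(-g - delta) - (1.0 - g / nu)
```

The fixed point and the root of the Lundberg-type equation coincide, as they must.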
3c  Integro-differential equations

If the interarrival time density a(t) has rational Laplace transform, the integral equation (3.1) can be transformed into an integro-differential equation (IDE). For that purpose assume that a(t) satisfies an nth order linear differential equation with constant coefficients, written in operator notation as

p_A(d/dt) a(t) = 0,   (3.4)

with the polynomial

p_A(x) = x^n + c_{n−1} x^{n−1} + · · · + c_0,   c_j ∈ ℝ, c_0 ≠ 0.
The first initial condition is determined by the fact that a(t) is a density. For ease of exposition, assume that the remaining n − 1 initial conditions of this ordinary differential equation (ODE) are homogeneous,⁴ i.e.

a^{(k)}(0) = 0   (k = 0, . . . , n − 2).   (3.5)

Integrating (3.4) from 0 to ∞ and using ∫_0^∞ a(t) dt = 1, with (3.5) the first initial condition then is

a^{(n−1)}(0) = c_0.   (3.6)

Proposition 3.5 Let the interarrival density a(t) fulfill (3.4) with initial conditions (3.5) and (3.6) and let m(u) be sufficiently smooth. Then m(u) is the solution of the IDE

p_A(δ − d/du) m(u) = c_0 ∫_0^u m(u − y) B(dy) + c_0 ω(u),   (3.7)

with boundary condition

lim_{u→∞} m(u) = 0.   (3.8)
Proof. Rewrite (3.1) as

m(u) = ∫_0^∞ e^{−δt} a(t) g(u + t) dt

with

g(u) = ∫_0^u m(u − y) B(dy) + ∫_u^∞ w(u, y − u) B(dy).

By dominated convergence and partial integration one has

(δ − d/du) m(u) = (δ − d/du) ∫_0^∞ e^{−δt} a(t) g(u + t) dt
  = ∫_0^∞ e^{−δt} a(t) [(δ − d/du) g(u + t)] dt
  = ∫_0^∞ e^{−δt} a(t) [(δ − d/dt) g(u + t)] dt
  = g(u) a(0) + ∫_0^∞ e^{−δt} g(u + t) (d/dt) a(t) dt.

⁴Inhomogeneous initial conditions can be dealt with analogously; one just gets additional terms in the calculations. Recall from Chapter I that any a(t) with rational Laplace transform can be represented as the solution of (3.4) with general initial conditions. But already the subclass with homogeneous conditions (3.5) is relevant. For instance, any density which is a convolution of n exponential densities with parameters β_i can be expressed through (3.4) and (3.5) with p_A(x) = Π_{i=1}^n (x + β_i).
Analogously we have under (3.5)

(δ − d/du)^k m(u) = g(u) a^{(k−1)}(0) + ∫_0^∞ e^{−δt} g(u+t) (d^k/dt^k) a(t) dt,   k = 1, . . . , n,

and combining these identities in such a way that (3.4) appears inside the integral on the right-hand side, we obtain (3.7) in view of the initial conditions. □
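For n = 1 (exponential interarrivals, i.e. the compound Poisson model) with exponential claims and w ≡ 1, m(u) is known in closed form, so Proposition 3.5 can be verified directly by evaluating both sides of (3.7). A sketch with illustrative parameters:

```python
import math

# Proposition 3.5 for n = 1: p_A(x) = x + beta, c_0 = beta, so (3.7) reads
#   (delta + beta - d/du) m(u) = beta*int_0^u m(u-y) B(dy) + beta*omega(u).
# With exponential(nu) claims and w = 1 one has omega(u) = e^{-nu u} and
# m(u) = C e^{-gamma u} in closed form (cf. Corollary 3.2).
# Parameter values are illustrative.
beta, nu, delta = 0.6, 1.0, 0.05

# gamma solves (delta + beta + gamma)(nu - gamma) = beta*nu, i.e. the
# quadratic gamma^2 + (beta + delta - nu)*gamma - delta*nu = 0.
p = beta + delta - nu
gamma = (-p + math.sqrt(p * p + 4.0 * delta * nu)) / 2.0
C = (nu - gamma) / nu
m = lambda x: C * math.exp(-gamma * x)

def lhs(x):                           # p_A(delta - d/du) m(u) = (delta+beta+gamma) m(u)
    return (delta + beta + gamma) * m(x)

def rhs(x, n=20000):                  # trapezoidal rule for the convolution term
    h = x / n
    s = sum(m(x - i * h) * nu * math.exp(-nu * i * h) * (0.5 if i in (0, n) else 1.0)
            for i in range(n + 1))
    return beta * s * h + beta * math.exp(-nu * x)

err = max(abs(lhs(x) - rhs(x)) for x in (0.5, 1.0, 3.0))
```

Both sides agree up to the quadrature error, confirming the IDE in this special case.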
Ordinary differential equations

Assume now that the claim size density b(y) is also the solution of an ODE of the form

p_B(d/dy) b(y) = 0,   (3.9)

with the polynomial

p_B(x) = x^ℓ + d_{ℓ−1} x^{ℓ−1} + · · · + d_0,   d_j ∈ ℝ, d_0 ≠ 0,

and some initial conditions b^{(k)}(0) (k = 0, . . . , ℓ − 1) (where one initial condition is again determined by the fact that b(x) is a density). Then the IDE for m(u) can further be reduced to a linear ODE:

Proposition 3.6 Assume that the claim size density b(y) satisfies (3.9). Then, under the assumptions of Proposition 3.5, m(u) satisfies the ODE

( p_B(d/du) p_A(δ − d/du) − p_I(d/du) ) m(u) = c_0 p_B(d/du) ω(u),   (3.10)

with the polynomial

p_I(x) = c_0 Σ_{j=0}^{ℓ−1} Σ_{k=j+1}^{ℓ} d_k b^{(k−j−1)}(0) x^j

and d_ℓ = 1. One boundary condition is (3.8) and ℓ more boundary conditions need to be specified.

Proof. For k = 1, . . . , ℓ,
µZ
¶
u
m(y) b(u − y) dy 0
=
k−1 X i=0
Z m(k−i−1) (u) b(i) (0)+
u
m(y) 0
dk b(u−y)dy. duk
The appropriate linear combination of derivatives of (3.7) according to (3.9) cancels the integral term on the r.h.s. of (3.7) and leaves instead

c_0 Σ_{k=1}^{ℓ} d_k Σ_{i=0}^{k−1} b^{(i)}(0) (d^{k−i−1}/du^{k−i−1}) m(u) = c_0 Σ_{j=0}^{ℓ−1} Σ_{k=j+1}^{ℓ} d_k b^{(k−j−1)}(0) (d^j/du^j) m(u)

with d_ℓ = 1. □
It immediately follows from the representation (3.10) that the Lundberg fundamental equation for this model is given by the polynomial equation

p_B(s) p_A(δ − s) − p_I(s) = 0.   (3.11)

From the definition of p_B and p_I, it becomes clear that p_I(s)/p_B(s) = c_0 B̂[−s], so that (3.11) can also be written as

p_A(δ − s) − c_0 B̂[−s] = 0.   (3.12)
Lemma 3.7 For δ > 0, the Lundberg fundamental equation (3.12) has exactly n roots with positive and ℓ roots with negative real part.

Proof. From (3.11) it is clear that the Lundberg fundamental equation is a polynomial equation of degree n + ℓ, so that it has n + ℓ complex roots. The location of the roots then follows by an application of Rouché's theorem. □

Example 3.8 Assume that the initial conditions of (3.9) are given by

b^{(k)}(0) = 0   (k = 0, . . . , ℓ − 2).   (3.13)

Since b(y) is a density, it automatically follows that b^{(ℓ−1)}(0) = d_0, and the inhomogeneity polynomial in (3.10) simplifies to p_I(x) = c_0 d_0. A particular case is when the interclaim time is Erlang(n, β) distributed and the claim size is Erlang(ℓ, ν) distributed, in which case we have p_A(x) = (x + β)^n with (3.5) and c_0 = β^n as well as p_B(x) = (x + ν)^ℓ with (3.13) and d_0 = ν^ℓ. Consequently, the ODE (3.10) then simplifies to

(d/du + ν)^ℓ (−d/du + δ + β)^n m(u) − β^n ν^ℓ m(u) = β^n (d/du + ν)^ℓ ω(u),

which is a popular choice in the literature. □

In order to use the ODE (3.10) for concrete calculations, one needs to determine the remaining ℓ − 1 boundary conditions, usually using the negative solutions of the Lundberg fundamental equation. This can be a quite cumbersome task in general, but is possible in particular cases (for instance by the method of so-called integrating factors, see the Notes). An alternative is to determine the fundamental solution of (3.10) and substitute it back in the original IDE (3.7), or to use Laplace transforms.
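The root configuration asserted in Lemma 3.7 is easy to inspect numerically in the Erlang/Erlang case of Example 3.8, where (3.11) becomes (s + ν)^ℓ (δ − s + β)^n − β^n ν^ℓ = 0. A sketch using NumPy (parameter values illustrative):

```python
import numpy as np

# Erlang(n, beta) interarrivals and Erlang(l, nu) claims (Example 3.8):
# the Lundberg fundamental equation (3.11) becomes
#     (s + nu)^l (delta - s + beta)^n - beta^n nu^l = 0,
# and Lemma 3.7 predicts n roots with positive and l roots with negative
# real part.  Parameter values are illustrative.
n, l, beta, nu, delta = 2, 3, 2.0, 1.5, 0.2

pA = np.array([1.0])                  # (delta + beta - s)^n, highest power first
for _ in range(n):
    pA = np.polymul(pA, [-1.0, delta + beta])
pB = np.array([1.0])                  # (s + nu)^l
for _ in range(l):
    pB = np.polymul(pB, [1.0, nu])

poly = np.polymul(pB, pA)
poly[-1] -= beta ** n * nu ** l       # subtract p_I = c_0 d_0 = beta^n nu^l

roots = np.roots(poly)
n_pos = int(np.sum(roots.real > 0))
n_neg = int(np.sum(roots.real < 0))
```

With δ > 0 no root lies on the imaginary axis, and the counts match the lemma.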
Laplace transforms

As in Section 2, the convolution term in the IDE (3.7) suggests Laplace transforms as a natural tool in this context. Since ∫_0^∞ e^{−su} m^{(k)}(u) du = s^k m̂[−s] − s^{k−1} m(0) − s^{k−2} m′(0) − · · · − m^{(k−1)}(0), one gets

p_A(δ − s) m̂[−s] + q(s) = c_0 m̂[−s] B̂[−s] + c_0 ω̂[−s]

for some polynomial q(s) of degree n − 1 and subsequently

m̂[−s] = (c_0 ω̂[−s] − q(s)) / (p_A(δ − s) − c_0 B̂[−s]),   s ≥ 0.   (3.14)
Note that the denominator in this expression is again the Lundberg fundamental equation, which from Lemma 3.7 for δ > 0 is known to have n roots s = ρ_1, . . . , ρ_n with positive real part. Since the Laplace transform is an analytic function for s ≥ 0, these n roots must also be zeros of the numerator, which determines the n coefficients of q(s). Assuming for simplicity that the roots are distinct, the usual Lagrange interpolation formula gives

q(s) = c_0 Σ_{j=1}^n ω̂[−ρ_j] Π_{k=1,k≠j}^n (s − ρ_k)/(ρ_j − ρ_k).   (3.15)
Using m(0) = lim_{s→∞} s m̂[−s], it then follows from (3.14) that

m(0) = [ −c_0 Σ_{j=1}^n ω̂[−ρ_j] Π_{k=1,k≠j}^n 1/(ρ_j − ρ_k) ] / (−1)^n = c_0 Σ_{j=1}^n ω̂[−ρ_j] Π_{k=1,k≠j}^n 1/(ρ_k − ρ_j).   (3.16)

Since ω̂[−s] = ∫_0^∞ ∫_0^∞ e^{−sx} w(x, y) b(x + y) dx dy, a comparison with (1.5) now yields the pleasant formula

f(x, y | 0) = c_0 b(x + y) Σ_{j=1}^n e^{−ρ_j x} Π_{k=1,k≠j}^n 1/(ρ_k − ρ_j)   (3.17)

for the discounted joint density of surplus prior to and at ruin (given zero initial capital), expressed in terms of the zeros of the Lundberg fundamental equation. For the compound Poisson case (n = 1 and c_0 = β), (3.17) simplifies to (2.15). Since (2.23) holds in the present renewal setting as well, one gets the representation

∫_0^u f(x, y | u − z) g_δ(z) dz + c_0 b(x + y) Σ_{j=1}^n e^{−ρ_j (x−u)} Π_{k=1,k≠j}^n 1/(ρ_k − ρ_j) · I(x > u)
of f(x, y | u). Integration with respect to y gives the expression

∫_0^u f(x | u − z) g_δ(z) dz + c_0 B̄(x) Σ_{j=1}^n e^{−ρ_j (x−u)} Π_{k=1,k≠j}^n 1/(ρ_k − ρ_j) · I(x > u)

for f(x | u). Correspondingly, as a function of x, at x = u the discounted density of the surplus prior to ruin has a discontinuity of size

c_0 B̄(u) Σ_{j=1}^n Π_{k=1,k≠j}^n 1/(ρ_k − ρ_j).
But for n ≥ 2 this sum equals zero, so that the discontinuity disappears!⁵ The explicit form of the Laplace transform also allows us to sharpen Proposition 3.1:

Corollary 3.9 If the interarrival density a(t) fulfills (3.4) with initial conditions (3.5), then under the assumptions of Proposition 3.1 with a simple positive zero γ_δ > 0 of κ(r) = δ and distinct roots −ρ_1, . . . , −ρ_n with negative real part, one has

lim_{u→∞} e^{γ_δ u} m(u) = [ ω̂[γ_δ] − Σ_{j=1}^n ω̂[−ρ_j] Π_{k=1,k≠j}^n (−γ_δ − ρ_k)/(ρ_j − ρ_k) ] / [ −p′_A(δ + γ_δ)/c_0 + B̂′[γ_δ] ].

Proof. In view of Proposition 3.1, it suffices to determine the constant C_δ. With the formula C_δ = lim_{s→0} s m̂[−s + γ_δ], we obtain the result through an application of L'Hôpital's rule in (3.14), using (3.15) and the fact that the solution γ_δ of the Lundberg fundamental equation has multiplicity 1. □

Finally, we illustrate how the simple formula (2.6) for the Laplace transform of the time to ruin with zero initial capital can be generalized to certain renewal models:

Example 3.10 Assume that the interarrival time is a generalized Erlang r.v. (that is, an independent sum of not necessarily identically distributed exponential r.v.'s) with p_A(x) = Π_{i=1}^n (x + β_i) and correspondingly c_0 = Π_{i=1}^n β_i. For w ≡ 1 one has ω̂[−s] = (1 − B̂[−s])/s, and formula (3.16) together with

⁵Note that here an underlying assumption was the homogeneity condition (3.5). The discontinuity does not necessarily disappear if the boundary conditions for a(t) are inhomogeneous, see Ren [734].
c_0 B̂[−ρ_j] = p_A(δ − ρ_j) for j = 1, . . . , n (note that the ρ_j are solutions of the Lundberg fundamental equation) then implies

E[e^{−δτ(0)}; τ(0) < ∞] = Σ_{j=1}^n ( β_1 · · · β_n − p_A(δ − ρ_j) ) (1/ρ_j) Π_{k=1,k≠j}^n 1/(ρ_k − ρ_j).

But by an induction argument this expression can be simplified to

E[e^{−δτ(0)}; τ(0) < ∞] = 1 − ( Π_{i=1}^n (δ + β_i) − β_1 · · · β_n ) / (ρ_1 · · · ρ_n). □

Notes and references
Early studies of penalty-related quantities in renewal models include Dickson & Hipp [316], Cheng & Tang [237], Tsai & Sun [857], Sun & Yang [819] and Drekic et al. [328]. Extensions to general discounted penalty functions in renewal models with Erlang interclaim times go back to Li & Garrido [589] and Gerber & Shiu [411]. For this model, Li [584] and Li & Garrido [589] give an alternative representation of the renewal equation (1.3) in terms of certain integral transforms T_r that can be interpreted as pseudo-resolvents of the differentiation operator (evaluated at r = ρ_1, . . . , ρ_n). These transforms turn out to be helpful in related models as well (originally studied by Redheffer [728], they are nowadays usually referred to as Dickson-Hipp operators); see [411] for a detailed comparison of methods. Since then there have been numerous further papers on the subject, and the following list is by no means exhaustive. More general interclaim times are treated in Li & Garrido [590], Schmidli [778] and Song et al. [815]. Ren [734] extends Proposition 2.14 and Li [585] extends Proposition 2.13 to phase-type interclaim times. In Li [586], the latter result is used to study the time to recovery T_0 and the maximum severity of ruin for phase-type interclaim times. Biard et al. [163] study the asymptotic behavior of the expected time-integrated negative part of the risk process. Li & Dickson [587] investigate the maximum surplus before ruin in general Sparre Andersen models. Willmot, Cai & Lin [888] derive general bounds for solutions of renewal equations and apply them to the present setup. The derivation of the integro-differential equation with operators is from Constantinescu [254], where the formulation is in terms of adjoint operators. Albrecher et al.
[21] start from (3.10) to factorize the differential operator and subsequently lift this factorization to the equation level, which leads to an iterative solution of first-order boundary value problems and allows one to obtain a number of explicit expressions for m(u) using Gröbner bases. Landriault & Willmot [572] give explicit expressions for the Laplace transform m̂[−s] for arbitrary interclaim times and Coxian claim sizes. However, its explicit inversion is in general difficult. Section 3a is based on Schmidli [780], who uses the same technique to work out Lundberg-type approximations also in more general models including certain Cox models. The trick to use forward markovization
(instead of backward markovization) needs some care w.r.t. the appropriate filtration, but can be quite powerful. For a detailed discussion see e.g. Rolski et al. [746]. The idea to get rid of the discounting by modifying the interclaim distribution can be found in Avram & Usabel [113] and Ren [735]. Using duality relations to a compound Poisson model with arbitrary claim size distribution, a closed-form formula for the density of the time to ruin for arbitrary interclaim times and exponential claim sizes (in terms of an infinite series of convolutions of A) is derived in Borovkov & Dickson [189]; see also Dickson & Li [317]. Necessary amendments of the above results for stationary renewal models are for instance discussed in Willmot et al. [889, 890] and Ng [664]; for other delayed renewal models see Willmot [887]. For bounds on the distribution of the deficit, see Chadjiconstantinidis & Politis [228] and Psarrakos [719]. The asymptotic behavior of m(u) for large u in the presence of heavy and semi-heavy tails depends in a subtle way on the shape of the penalty function w; see Tang & Wei [834] for a fine and complete analysis. For an extension to a model with constant interest rate, see Wu, Lu & Fang [895].
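Before leaving the renewal model, the closed-form expression of Example 3.10 can be verified numerically for n = 2: compute the two roots ρ_1, ρ_2 with positive real part from (3.12), evaluate (3.16) with ω̂[−s] = (1 − B̂[−s])/s, and compare with the simplified formula. The sketch below assumes exponential(ν) claims, so that B̂[−s] = ν/(ν + s); all parameter values are illustrative:

```python
import numpy as np

# Example 3.10 with n = 2: generalized Erlang interarrivals (independent
# exponentials with rates beta1, beta2) and, for concreteness,
# exponential(nu) claims, so that B_hat[-s] = nu/(nu + s).
# Parameter values are illustrative.
beta1, beta2, nu, delta = 2.0, 3.0, 2.0, 0.3
c0 = beta1 * beta2
pA = lambda x: (x + beta1) * (x + beta2)

# Lundberg fundamental equation (3.12), pA(delta-s) - c0*nu/(nu+s) = 0,
# cleared of its denominator: a cubic with two roots of positive real part.
coeffs = np.polymul([1.0, nu], np.polymul([-1.0, delta + beta1], [-1.0, delta + beta2]))
coeffs[-1] -= c0 * nu
rho = [z for z in np.roots(coeffs) if z.real > 0]

# Formula (3.16) with w = 1, using omega_hat[-rho_j] = (1 - B_hat[-rho_j])/rho_j
# and c0*(1 - B_hat[-rho_j]) = c0 - pA(delta - rho_j):
s316 = 0.0
for j, rj in enumerate(rho):
    prod = 1.0
    for k, rk in enumerate(rho):
        if k != j:
            prod *= 1.0 / (rk - rj)
    s316 += (c0 - pA(delta - rj)) / rj * prod

# Closed form from Example 3.10.
closed = 1.0 - ((delta + beta1) * (delta + beta2) - c0) / (rho[0] * rho[1])
```

The two expressions for E[e^{−δτ(0)}; τ(0) < ∞] agree to machine precision.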
4  Lévy risk models
As already discussed in Chapter XI, there may be certain reasons to consider more general Lévy processes in the risk reserve modeling procedure, let alone the appeal of generality on the mathematical level. Recall from the quintuple identity of Section XI.4e that the joint distribution of the surplus prior to ruin and the deficit at ruin of a general Lévy process can be expressed through potential measures. The resulting expression, however, usually does not lead to explicit formulas unless one adds further restrictions on the model. We will consider in the sequel two cases that admit a rather explicit treatment, namely the case with one-sided jumps and the compound Poisson process with two-sided jumps.
4a  Spectrally negative Lévy processes
If the risk reserve process is a Lévy process that can only have downward jumps, then it is possible to find an integral representation of the Gerber-Shiu function through the corresponding scale function. As in Section XI.3, ρ_δ denotes the positive solution of the Lundberg equation κ(s) = δ.

Theorem 4.1 Suppose that {R_t} is a spectrally negative Lévy process with positive drift. Then for a bounded measurable penalty function w(x, y) with
w(·, 0) = 0,

m(u) = ∫_0^∞ ∫_0^∞ w(x, y) ( e^{−ρ_δ x} W^{(δ)}(u) − W^{(δ)}(u − x) ) ν(dy + x) dx.
Remark 4.2 Note that ruin can occur either by a jump or through diffusion, and the assumption w(·, 0) = 0 simply restricts the discounted penalty function to the case when ruin happens through jumps. If ruin is caused by diffusion, then the problem is somewhat degenerate with R_{τ(u)−} = R_{τ(u)} = 0. In that case one knows from Pistorius [703] that

E[e^{−δτ(u)}; R_{τ(u)} = 0] = (σ²/2) ( W^{(δ)′}(u) − ρ_δ W^{(δ)}(u) ).

If {R_t} has bounded variation, the assumption w(·, 0) = 0 is not needed. Also, if {R_t} has unbounded variation and σ = 0, the assumption is not needed for u > 0. □

Proof of Theorem 4.1. Our model for R_t is equivalent to a spectrally positive Lévy process with negative drift and R_0 = 0, where the ruin event then refers to an overshoot of level u. Hence we can directly use Corollary XI.4.7 to write down the (defective) joint density of the surplus prior to ruin and the deficit at ruin. In particular, integrating XI.(4.8) w.r.t. y and translating into the present notation we obtain

P(R_{τ(u)−} ∈ dx, |R_{τ(u)}| ∈ dy) = ( W^{(0)}(u) − W^{(0)}(u − x) ) ν(x + dy) dx.

We can now use the fact that exponential tilting by ρ_δ leaves the drift of the process positive; by XI.(3.8) the zero-scale function under the tilted measure is given by W^{(0)}_{ρ_δ}(u) = e^{−ρ_δ u} W^{(δ)}(u) and the Lévy measure changes to ν_{ρ_δ}(dx) = e^{−ρ_δ x} ν(dx). Hence we can write

E[e^{−δτ(u)}; R_{τ(u)−} ∈ dx, |R_{τ(u)}| ∈ dy]
  = e^{ρ_δ (u+y)} P_{ρ_δ}(R_{τ(u)−} ∈ dx, |R_{τ(u)}| ∈ dy)
  = e^{ρ_δ (u+y)} ( W^{(0)}_{ρ_δ}(u) − W^{(0)}_{ρ_δ}(u − x) ) ν_{ρ_δ}(x + dy) dx
  = ( e^{−ρ_δ x} W^{(δ)}(u) − W^{(δ)}(u − x) ) ν(x + dy) dx.

From the latter the assertion follows. □
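Theorem 4.1 can be tested against a model where everything is explicit: the Cramér-Lundberg process with exponential(ν) jumps, for which the δ-scale function is a sum of two exponentials (obtained by partial fractions of 1/(κ(θ) − δ)) and E[e^{−δτ(u)}; τ(u) < ∞] = ((ν − γ)/ν)e^{−γu} is known (cf. Corollary 3.2). A sketch with illustrative parameters:

```python
import math

# Cramer-Lundberg process with premium rate 1, claim intensity beta and
# exponential(nu) claims: kappa(theta) = theta - beta*theta/(nu + theta),
# Levy measure nu(dx) = beta*nu*e^{-nu x}dx.  The delta-scale function is
# W(x) = sum_i (nu + th_i)/q'(th_i) e^{th_i x}, where th_+ > 0 > th_- are
# the roots of q(th) = th^2 + (nu - beta - delta)th - delta*nu, coming from
# partial fractions of 1/(kappa(theta) - delta); rho_delta = th_+.
# All parameter values are illustrative.
beta, nu, delta, u = 0.6, 1.0, 0.05, 2.0

b = nu - beta - delta
disc = math.sqrt(b * b + 4.0 * delta * nu)
th_pos, th_neg = (-b + disc) / 2.0, (-b - disc) / 2.0
rho = th_pos

def W(x):                             # delta-scale function (0 for x < 0)
    if x < 0.0:
        return 0.0
    return ((nu + th_pos) / (2.0 * th_pos + b) * math.exp(th_pos * x)
            + (nu + th_neg) / (2.0 * th_neg + b) * math.exp(th_neg * x))

# Theorem 4.1 with w = 1 (bounded variation, ruin by a jump):
#   m(u) = beta * int_0^infty e^{-nu x} (e^{-rho x} W(u) - W(u - x)) dx,
# split into a trapezoidal rule on [0, u] plus the analytic tail over (u, inf).
n, s, h = 20000, 0.0, u / 20000
for i in range(n + 1):
    x = i * h
    val = math.exp(-nu * x) * (math.exp(-rho * x) * W(u) - W(u - x))
    s += val * (0.5 if i in (0, n) else 1.0)
m_num = beta * (s * h + W(u) * math.exp(-(nu + rho) * u) / (nu + rho))

# Independent benchmark: gamma = -th_neg and
# E[e^{-delta tau(u)}; tau(u) < infty] = ((nu - gamma)/nu) e^{-gamma u}.
gamma = -th_neg
m_exact = (nu - gamma) / nu * math.exp(-gamma * u)
```

The scale-function integral reproduces the known Laplace transform of the ruin time.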
Remark 4.3 In the absence of a diffusion component (i.e. σ = 0), the jumps larger than a fixed ε > 0 form a compound Poisson process. As ε → 0, this
compound Poisson process converges weakly to the original spectrally negative Lévy process.⁶ One can now use this fact to observe that a number of results derived for the compound Poisson process still hold for more general pure-jump Lévy processes. The recipe is to just replace β(1 − B(x)) by ν(x). For instance, from (2.13) and (2.15) it follows that for a spectrally negative Lévy process with triplet (1, 0, ν) and some restrictions on the penalty function, m(u) satisfies the defective renewal equation m = m ∗ g + h with

g(y) = ∫_y^∞ e^{−ρ_δ (x−y)} ν(dx)

and

h(u) = ∫_u^∞ ∫_0^∞ e^{−ρ_δ (x−u)} w(x, y) ν(dy + x) dx. □
The compound Poisson risk model with perturbation

Consider the risk reserve process

R_t = u + t − Σ_{i=1}^{N_t} U_i + σ W_t,   t ≥ 0,   (4.1)

where N_t is again a homogeneous Poisson process and {W_t} is independent standard Brownian motion. The interpretation is that the diffusion part accounts for small perturbations of the risk process that can come from various sources (inaccuracies in the estimation or measurement, local deviations in the premium income or claim payouts etc.). As usual, the justification of using a diffusion to model such effects is that it can be thought of as the sum of many small independent effects, and in the absence of further knowledge it is natural to assume that its drift is zero. Mathematically, (4.1) is clearly a special case of a spectrally negative Lévy process and so the above fluctuation theory and its results apply, but due to its simplicity this model can also be treated by other self-contained techniques which can give additional insight. In the following we give an illustration of this. Recall that in the presence of the Brownian component, ruin can occur in two ways, either by a claim (which results in a nonzero deficit at ruin) or by oscillation (which is also often referred to as creeping). Assume that the penalty for the second is given by the constant

⁶In fact, it converges even almost surely uniformly on bounded time intervals, see Bertoin [157].
w(0) = w_0. With the generator technique of Chapter II it is clear that the discounted penalty function now satisfies the integro-differential equation

β ∫_0^u m(u−x) B(dx) + β ∫_u^∞ w(u, x−u) B(dx) − (β+δ) m(u) + m′(u) + (σ²/2) m″(u) = 0.   (4.2)

One can now again proceed with Laplace transform techniques as in Section 2a. Then

m̂[−s] = [ (σ²/2)(s m(0) + m′(0)) − β ω̂[−s] ] / (κ(−s) − δ),

where now κ(r) = β B̂[r] − r − β + σ² r²/2. Starting with zero initial capital leads to ruin immediately (due to the oscillation), so that m(0) = w_0. Since κ(−s) − δ has exactly one positive zero s = ρ_δ > 0, this must be a zero of the numerator as well (again m̂[−s] is analytic for s ≥ 0). Hence we arrive at

m̂[−s] = [ (σ²/2) w_0 (s − ρ_δ) + β ω̂[−ρ_δ] − β ω̂[−s] ] / (κ(−s) − δ),   (4.3)
which extends formula (2.4). By adapting the techniques of Section 2b one can show that for general penalty functions w(x, y) (under mild assumptions, see Sarkar & Sen [762] for details) the Cramér-Lundberg approximation

lim_{u→∞} m(u) e^{γ_δ u} = C_δ

holds for the perturbed model (4.1). The constant is again given by C_δ = lim_{s→0} s m̂[−s + γ_δ], i.e. with L'Hôpital's rule we obtain from (4.3)

C_δ = (1/κ′(γ_δ)) ( β ∫_0^∞ ∫_0^∞ w(x, y) (e^{γ_δ x} − e^{−ρ_δ x}) b(x + y) dx dy + σ² w_0 (γ_δ + ρ_δ)/2 ).   (4.4)

For w ≡ w_0 = 1, this further simplifies to

C_δ = (δ/κ′(γ_δ)) ( 1/γ_δ + 1/ρ_δ ),

which formally coincides with (2.20), but note that the underlying κ(r) is different. If furthermore δ = 0 (i.e. m(u) = ψ(u)), then C_0 = C = (1 − βμ_B)/κ′(γ), in accordance with Corollary XI.2.7. With δ = 0 in (4.4) and the previously established fact that δ/ρ_δ → 1 − βμ_B, the choice w(x, y) = 0 and w_0 = 1 finally gives the asymptotic probability of
ruin caused by oscillation to be

ψ_d(u) ∼ (σ² γ)/(2 κ′(γ)) e^{−γu}   as u → ∞,

and the probability of ruin caused by a claim (w(x, y) ≡ 1 and w_0 = 0) behaves as

ψ_s(u) ∼ (1 − βμ_B − σ²γ/2)/κ′(γ) e^{−γu}   as u → ∞.

Notes and references
In Biffis & Kyprianou [165], Theorem 4.1 is given in a more general form, where the Gerber-Shiu function also includes the size of the last minimum before ruin; see also Biffis & Morales [166] for a convolution-type approach. Chiu & Yin [245] give expressions for the duration of ruin and the time of the last visit of the ruin boundary for spectrally negative Lévy processes. The idea of Remark 4.3 goes back to Dufresne, Gerber & Shiu [335]; see also Garrido & Morales [391]. Klüppelberg, Kyprianou & Maller [543] derive explicit asymptotic results for ruin-related quantities like the deficit at ruin and the surplus prior to ruin for u → ∞; see also Doney & Kyprianou [326]. For a detailed study of the Gerber-Shiu function in a compound Poisson model with Brownian perturbation including defective renewal equations and asymptotics, we refer to Gerber & Landry [406] and Tsai & Willmot [858]. Lin & Wang [595] apply these results to the pricing of perpetual American catastrophe put options. The same argument as in Remark 4.3 applies to extend the resulting formulas to general spectrally negative Lévy processes (see Morales [649] for details). Some explicit calculations for this model under phase-type claims can be found in Ren [733]. For inclusion of interest rates see Wang & Wu [872]; a recent extension with more general investment is given in Avram & Usabel [114] and Wang, Xu & Yao [867]. Gerber-Shiu functions under Brownian perturbation in renewal models are for instance studied by Li & Garrido [591], and in Markov-modulated compound Poisson models by Lu & Tsai [611]. There have also been studies with more general perturbations than Brownian motion. Among them, Furrer [381] deals with α-stable motion for the perturbation and Chi, Jaimungal & Lin [244] use singular perturbation theory to deal with the Gerber-Shiu function under perturbation with stochastic volatility of Ornstein-Uhlenbeck type.
4b  The compound Poisson model with two-sided jumps
Another case that admits a direct treatment is a compound Poisson model where jumps can be both upward and downward. Complementing the first passage
expressions given in Section XI.5, let us briefly revisit risk models of the type

R_t = u + Σ_{i=1}^{N_t^u} P_i − Σ_{i=1}^{N_t} U_i.   (4.5)
Here the linear drift t from the Cramér–Lundberg process is replaced by a compound Poisson process with i.i.d. positive up-jumps P_i (with density p(x)) that occur according to a homogeneous Poisson process {N_t^u} with intensity \beta^u, independent of the claims process. For transparency of the exposition, we neither include an additional drift term nor a further Brownian perturbation component, although each is easily possible; see the Notes.

Let h > 0. By conditioning on the time and amount of the first jump before time h, one has

m(u) = \beta \int_0^h e^{-(\beta+\beta^u+\delta)t} \int_0^u m(u-x)\,b(x)\,dx\,dt
     + \beta \int_0^h e^{-(\beta+\beta^u+\delta)t} \int_u^\infty w(u,\,x-u)\,b(x)\,dx\,dt
     + \beta^u \int_0^h e^{-(\beta+\beta^u+\delta)t} \int_0^\infty m(u+x)\,p(x)\,dx\,dt
     + e^{-(\beta+\beta^u+\delta)h}\, m(u),
which after differentiation with respect to h and setting h = 0 gives^7

\beta \int_0^u m(u-x)\,b(x)\,dx + \beta \int_u^\infty w(u,\,x-u)\,b(x)\,dx + \beta^u \int_0^\infty m(u+x)\,p(x)\,dx - (\beta+\beta^u+\delta)\,m(u) = 0.    (4.6)
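Equation (4.6) also suggests a direct numerical scheme: iterate the integral operator on a grid of u-values (the contraction argument given next guarantees convergence). The following sketch is illustrative and not from the text; it assumes Exp(α) claims, Exp(η) up-jumps and w ≡ 1, a parameter choice for which the closed-form results later in this section give the exact value m(0) = 0.5.

```python
import math

# Illustrative parameters (assumptions, not from the text):
# Exp(alpha) claims, Exp(eta) up-jumps, penalty w(y) = 1, discount rate delta.
beta, beta_u, eta, alpha, delta = 1.0, 2.0, 3.0, 2.0, 0.5

du, u_max = 0.1, 15.0          # u-grid; m(u) is negligible beyond u_max here
n = int(u_max / du) + 1
kd = [alpha * math.exp(-alpha * j * du) for j in range(n)]  # claim-size density on the grid
ku = [eta * math.exp(-eta * j * du) for j in range(n)]      # up-jump density on the grid

def trapz(vals):
    """Trapezoidal rule with step du (0 for a single node)."""
    return du * (sum(vals) - 0.5 * (vals[0] + vals[-1])) if len(vals) > 1 else 0.0

m = [0.0] * n                  # start the fixed-point iteration at m = 0
denom = beta + beta_u + delta
for _ in range(80):            # contraction factor (beta + beta_u)/denom < 1
    m = [(beta * (trapz([m[i - j] * kd[j] for j in range(i + 1)])     # claim without ruin
                  + math.exp(-alpha * i * du))                        # claim causing ruin (w = 1)
          + beta_u * trapz([m[i + j] * ku[j] for j in range(n - i)])  # upward jump
          ) / denom
         for i in range(n)]

m0 = m[0]   # approximates m(0) = E[e^{-delta*tau(0)}; tau(0) < infinity]
```

Up to discretization and truncation error, m0 should be close to 0.5 here; refining the grid improves the agreement.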
The function m(u) is the unique solution of (4.6), since the mapping

m(u) \mapsto \frac{\beta}{\beta+\beta^u+\delta} \int_0^u m(u-x)\,b(x)\,dx + \frac{\beta}{\beta+\beta^u+\delta} \int_u^\infty w(u,\,x-u)\,b(x)\,dx + \frac{\beta^u}{\beta+\beta^u+\delta} \int_0^\infty m(u+x)\,p(x)\,dx

is a contraction and has a unique fixed point. Let us again impose the boundary condition \lim_{u\to\infty} m(u) = 0.

Instead of using Laplace transforms, we shall here proceed in a related, but slightly heuristic way and restrict the analysis to a penalty function that only

^7 As before, the formal background for this type of reasoning is the generator approach of Section II.4a.
390
CHAPTER XII. GERBER–SHIU FUNCTIONS
depends on the deficit, i.e. w(x, y) ≡ w(y). Assume first that the claim size distribution is a combination of n exponentials, i.e.

b(x) = \sum_{i=1}^n A_i \alpha_i e^{-\alpha_i x},  x > 0,    (4.7)
where 0 < \alpha_1 < \alpha_2 < ... < \alpha_n and A_1 + ... + A_n = 1. Some of the A_i's may be negative as long as b(x) ≥ 0. Then the discounted penalty function is of the form

m(u) = \sum_{k=1}^n C_k e^{-r_k u},  u ≥ 0.    (4.8)
To see this, one substitutes (4.8) into (4.6), and r_1, ..., r_n turn out to be the n solutions with positive real part of the generalized Lundberg equation

\beta \sum_{i=1}^n A_i \frac{\alpha_i}{\alpha_i - r} + \beta^u \int_0^\infty e^{-rx} p(x)\,dx - (\beta+\beta^u+\delta) = 0    (4.9)

(potential negative solutions would violate \lim_{u\to\infty} m(u) = 0). This equation has indeed exactly n solutions with positive real part (by the usual Rouché argument). The one with the smallest real part is real and is the adjustment coefficient \gamma_\delta < \alpha_1. The coefficients C_1, ..., C_n are the solutions of

\sum_{k=1}^n \frac{C_k}{\alpha_i - r_k} = \frac{\Pi_i}{\alpha_i},  i = 1, ..., n,    (4.10)

with the notation

\Pi_i = \alpha_i \int_0^\infty w(y)\, e^{-\alpha_i y}\,dy.
One way to solve this system of n linear equations for C_1, ..., C_n goes as follows. Define a rational function

Q(r) = \sum_{k=1}^n \frac{C_k}{r - r_k}

(note that \hat{m}[-s] = -Q(-s)). Obviously,

C_h = \lim_{r \to r_h} (r - r_h)\, Q(r),  h = 1, ..., n.    (4.11)

One can now find more tractable expressions for Q(r) and apply (4.11) to these expressions. Note that Q(r) is completely determined by the following three properties:
• It is a rational function of the type polynomial of degree at most n − 1 divided by polynomial of degree n.
• Its poles are r_1, ..., r_n.
• Q(\alpha_i) = \Pi_i/\alpha_i, i = 1, ..., n, according to (4.10).

The rational function

Q_1(r) = \frac{\displaystyle\sum_{j=1}^n \frac{\Pi_j}{\alpha_j} \prod_{k=1}^n (\alpha_j - r_k) \prod_{i=1,\, i \neq j}^n \frac{r - \alpha_i}{\alpha_j - \alpha_i}}{\displaystyle\prod_{k=1}^n (r - r_k)}
also fulfills these properties, and this together with (4.11) gives a full specification of (4.8).

If we now restrict to p(x) = \eta e^{-\eta x} (i.e. exponential up-jumps), then the Lundberg equation (4.9) specializes to

\beta \sum_{i=1}^n A_i \frac{\alpha_i}{\alpha_i - r} + \beta^u \frac{\eta}{\eta + r} - (\beta+\beta^u+\delta) = 0.
In addition to r_1, ..., r_n, this equation has one negative solution -\rho_\delta. We can hence represent Q(r) also as

Q_2(r) = \frac{\beta}{\eta + r}\; \frac{(\eta + r) \displaystyle\sum_{i=1}^n \frac{A_i \Pi_i}{\alpha_i - r} - (\eta - \rho_\delta) \displaystyle\sum_{i=1}^n \frac{A_i \Pi_i}{\alpha_i + \rho_\delta}}{\beta \displaystyle\sum_{i=1}^n A_i \frac{\alpha_i}{\alpha_i - r} + \beta^u \frac{\eta}{\eta + r} - (\beta+\beta^u+\delta)},

which immediately leads to

C_h = \frac{\beta}{\eta + r_h}\; \frac{(\eta + r_h) \displaystyle\sum_{i=1}^n \frac{A_i \Pi_i}{\alpha_i - r_h} - (\eta - \rho_\delta) \displaystyle\sum_{i=1}^n \frac{A_i \Pi_i}{\alpha_i + \rho_\delta}}{\beta \displaystyle\sum_{i=1}^n A_i \frac{\alpha_i}{(\alpha_i - r_h)^2} - \beta^u \frac{\eta}{(\eta + r_h)^2}}.    (4.12)

Furthermore, with m(0) = \lim_{r \to \pm\infty} r\, Q_2(r) we get

m(0) = \frac{\beta}{\beta+\beta^u+\delta} \sum_{i=1}^n A_i \Pi_i \Big[ 1 + \frac{\eta - \rho_\delta}{\alpha_i + \rho_\delta} \Big] = \frac{\beta}{\beta+\beta^u+\delta} \sum_{i=1}^n A_i \Pi_i\, \frac{\alpha_i + \eta}{\alpha_i + \rho_\delta}.
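These quantities are straightforward to evaluate numerically. The sketch below (illustrative parameters, not from the text; a single exponential claim component, so n = 1, A_1 = 1 and Π_1 = 1 for w ≡ 1) locates -ρ_δ and γ_δ by bisection and checks that the m(0) expression just derived agrees with the Laplace transform of the ruin time in Corollary 4.4 below; letting δ → 0, it also recovers ψ(0) of (4.16).

```python
import math

# Illustrative parameters (assumptions, not from the text).
beta, beta_u, eta, alpha = 1.0, 2.0, 3.0, 2.0
mu_B = 1.0 / alpha                       # mean claim size for Exp(alpha)

def lundberg(r, delta):
    """kappa(r) - delta for Exp(alpha) claims and Exp(eta) up-jumps."""
    return beta * alpha / (alpha - r) + beta_u * eta / (eta + r) - (beta + beta_u + delta)

def bisect(f, lo, hi):
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

delta = 0.5
rho = -bisect(lambda r: lundberg(r, delta), -eta + 1e-9, -1e-12)        # rho_delta > 0
gamma_delta = bisect(lambda r: lundberg(r, delta), 1e-12, alpha - 1e-9)

# m(0) from the combination-of-exponentials formula above (w = 1) ...
m0_formula = beta / (beta + beta_u + delta) * (alpha + eta) / (alpha + rho)
# ... and from the Laplace transform of the ruin time (Corollary 4.4 below)
m0_laplace = 1.0 - eta * delta / (rho * (beta + beta_u + delta))

# delta -> 0: delta/rho_delta -> beta_u/eta - beta*mu_B, recovering psi(0) of (4.16)
d_small = 1e-8
rho_small = -bisect(lambda r: lundberg(r, d_small), -eta + 1e-9, -1e-15)
psi0 = beta * (1.0 + eta * mu_B) / (beta + beta_u)
```

For these numbers one finds ρ_δ = 6/7 and γ_δ = 1, both m(0) expressions equal 0.5 exactly, and the ratio δ/ρ_δ tends to β^u/η − βμ_B = 1/6 as δ → 0.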
The class of distributions (4.7) is dense in the class of all positive distributions, so (heuristic, but intuitive!) one can deduce from the above that m(0) is given by

\frac{\beta}{\beta+\beta^u+\delta} \Big[ \int_0^\infty w(y)\,b(y)\,dy + (\eta - \rho_\delta) \int_0^\infty w(y) \int_0^\infty e^{-\rho_\delta x}\, b(x+y)\,dx\,dy \Big].

In this general case, -\rho_\delta is the negative solution of the equation

\kappa(r) = \beta \hat{B}[r] + \beta^u \frac{\eta}{\eta + r} - (\beta + \beta^u) = \delta,    (4.13)

which is defined for all r > -\eta with \hat{B}[r] < \infty (see Figure XII.2, which shows that with up-jumps the straight line from Figure XII.1 for the classical model is replaced by a more general curve).

Figure XII.2

Note that \beta/(\beta+\beta^u+\delta) is the discounted probability that the surplus process has a downward jump before the first upward jump (ruin occurs at that time), which explains the first summand in the formula for m(0) above. Let again g_\delta(x) denote the discounted probability density function of the deficit at ruin for initial surplus zero (i.e. the discounted descending ladder height density). Because the above formula for m(0) holds for arbitrary w(y), by choosing the Dirac delta function we get

g_\delta(y) = \frac{\beta}{\beta+\beta^u+\delta} \Big[ b(y) + (\eta - \rho_\delta) \int_0^\infty e^{-\rho_\delta x}\, b(x+y)\,dx \Big],  y > 0.    (4.14)

If the Lundberg approximation

m_\delta(u) \sim C_\delta\, e^{-\gamma_\delta u}  as u → ∞
holds,^8 then the corresponding extension of (4.12) with r_h = \gamma_\delta immediately gives

C_\delta = \frac{\beta}{\eta + \gamma_\delta}\; \frac{\displaystyle\int_0^\infty w(y) \int_0^\infty \big[ (\eta + \gamma_\delta)\, e^{\gamma_\delta x} - (\eta - \rho_\delta)\, e^{-\rho_\delta x} \big]\, b(x+y)\,dx\,dy}{\kappa'(\gamma_\delta)}.    (4.15)

Corollary 4.4 The Laplace transform of the time to ruin with zero initial capital in the two-sided compound Poisson model with exponential(η) up-jumps is given by

E\big[ e^{-\delta \tau(0)};\, \tau(0) < \infty \big] = 1 - \frac{\eta\,\delta}{\rho_\delta\,(\beta + \beta^u + \delta)}.

With positive safety loading, the ruin probability with zero initial capital is

\psi(0) = \frac{\beta\,(1 + \eta\,\mu_B)}{\beta + \beta^u}.    (4.16)

Furthermore, the constant C in the Cramér–Lundberg approximation \psi(u) \sim C e^{-\gamma_0 u} is given by

C = \frac{\beta^u - \eta\,\beta\,\mu_B}{(\eta + \gamma_0)\,\kappa'(\gamma_0)}.    (4.17)

Proof. Just choose w(x) ≡ 1 in (4.12) and use the identity

\int_0^\infty \int_0^\infty e^{ry}\, b(x+y)\,dy\,dx = \frac{1}{r}\big( \hat{B}[r] - 1 \big)    (4.18)

for r = -\rho_\delta together with \kappa(-\rho_\delta) = \delta. For the ruin probability, \rho_\delta \to 0 as \delta \to 0, so ψ(0) follows from the limit \delta/\rho_\delta \to \beta^u/\eta - \beta\mu_B as \delta \to 0. Alternatively, take w ≡ 1 and \delta \to 0 directly in (4.12). Finally, the constant C follows from (4.15) by another application of (4.18), this time for r = \gamma_0 with \kappa(\gamma_0) = 0.    □

Remark 4.5 Formula (4.16) can be reformulated as

\psi(0) = \frac{\beta}{\beta + \beta^u} + \frac{\beta^u}{\beta + \beta^u}\; \frac{E\big[ \sum_{i=1}^{N_1} U_i \big]}{E\big[ \sum_{i=1}^{N_1^u} P_i \big]},

^8 Conditions under which a defective renewal equation for m can be derived (and hence a Lundberg approximation for light-tailed B exists by the key renewal theorem) can e.g. be found in Labbé & Sendova [570].
which has the following interpretation: the first term is the probability that a down-jump occurs before an up-jump, in which case ruin occurs at that time. If not (the probability of which is \beta^u/(\beta + \beta^u)), the conditional probability of ruin is the ratio of expected claim payments per time unit over expected income per time unit, which is a natural extension of Corollary IV.3.1 from the one-sided jumps model.    □

Remark 4.6 Choosing δ = 0 in (4.14) gives the (non-discounted defective descending) ladder height density

g_0(y) = \frac{\beta}{\beta + \beta^u} \big[ b(y) + \eta\, \overline{B}(y) \big],  y > 0,

which is the extension of g_0(y) = \beta \overline{B}(y) for the compound Poisson process with one-sided jumps derived in Theorem III.5.1.    □

If we take the limit \beta^u \to \infty, \eta \to \infty such that \beta^u/\eta = 1, then \sum_{i=1}^{N_t^u} P_i \to t, so the up-jumps converge to a linear drift with slope 1 and we arrive at the classical Cramér–Lundberg model. Correspondingly, one can retrieve from each of the above results the corresponding Cramér–Lundberg analogue in Section 2 as a limit case.

Remark 4.7 If the up-jump distribution is extended to a combination of exponentials, analogous formulas hold in principle, but the Lundberg equation \kappa(r) = \delta then has additional zeros in the negative half-plane, so that the exposition gets more cumbersome and is omitted here (see the Notes below).    □

Notes and references In an obvious way, one can derive the analogous expressions for an added Brownian perturbation in (4.5); the polynomials in Q(r) then have their degree increased by 1. Also, the inclusion of a drift term is just a notational issue (it was left out here deliberately for transparency, and also in order to identify the Cramér–Lundberg model as a simple limit). A Pollaczeck–Khinchine-type formula for the ruin probability in the two-sided pure-jump model was derived by Boucherie, Boxma & Sigman [191] in a queueing context, exploiting the observation that up-jumps can equivalently be described as an increase of interarrival times in a renewal model with constant premium intensity, as long as the desired quantities are invariant to scaling of time; for early time-dependent considerations we refer to Perry, Stadje & Zacks [692], Kou & Wang [559] and Jacobsen [498]. Since then many papers have appeared on the subject. The slightly heuristic procedure given above is from Albrecher, Gerber & Yang [23] and can be formally backed up by the usual IDE and renewal techniques.
In more finance-oriented contexts, a compound Poisson model with two-sided jumps and perturbation is usually referred to as a jump-diffusion (see e.g. Kou & Wang [560]). This model is for instance investigated by IDE methods in Chen, Lee & Sheu [233] and by renewal techniques in Zhang, Yang & Li [915] under weaker assumptions on the up-jumps. For methodological links between ruin theory and credit rating assessments under this model assumption, see e.g. Chen & Panjer [232]. Related considerations of the Wiener–Hopf factorization for more general up-jumps with rational Laplace transform are given in Section XI.5; see also Lewis & Mordecki [582], Pistorius [704], Dieker [323], Levendorskiĭ [581] with finance applications and Chi [243] for formulations in terms of Gerber–Shiu functions. Roynette, Vallois & Volpi [750] identify the limit distribution as u → ∞ of the surplus prior to ruin and the deficit at ruin in such a model. For a Sparre Andersen model with two-sided jumps, see Zhang & Yang [914].

For most of the above results, the zeros of the generalized Lundberg function play a crucial role. Accordingly, for possible extensions of the results to more general Lévy models with two-sided jumps, a fine study of these zeros is essential; see e.g. Kuznetsov [563] for more general scenarios that are still somewhat tractable.

The Gerber–Shiu function has also been extensively studied for risk processes that are reflected at a horizontal barrier b (in which case we denote it by m(u; b)), with one possible interpretation that above the barrier all premium income is paid out to shareholders as dividends. A closely related question is then to determine the expected present value V(u; b) of the corresponding aggregate dividend payouts until ruin, a quantity that in certain economic approaches is interpreted as the 'value' of the insurance portfolio (see the Notes of Section VIII.1).

If the discount factor is the same δ as the one for m(u), it is clear that the dynamics of the processes for m(u; b) and V(u; b) between 0 and b are identical; differences occur only upon exit from this interval. Under particular model assumptions, this translates into integro-differential equations differing only in inhomogeneity terms and boundary conditions. For the compound Poisson model, Lin, Willmot & Drekic [597] identified in this way the so-called dividends-penalty identity

m(u; b) = m(u) - m'(b)\, V(u; b),  0 ≤ u ≤ b.

See also Yuen, Wang & Li [912] and Cai, Feng & Willmot [216] for the inclusion of interest rates. Gerber, Lin & Yang [407] established this identity for arbitrary stationary Markov risk processes with only downward jumps by a strikingly simple probabilistic argument. An even more direct argument can be used for the model (4.5) with exponential up-jumps, since then the dividends are paid out discretely and the compound Poisson process identity is again obtained as a limit; see Gerber & Yang [415]. Extensions to Markov-modulated processes (in which case one obtains a matrix identity) are investigated in Li & Lu [592]; see also Cheung & Landriault [240] for a Markovian arrival process setup. As discussed in the Notes to Section VIII.1, the literature on dividend processes is huge and is not treated in this book. A general formalism under which the Gerber–Shiu function, the expected discounted dividends, but also more general utilities of paths of the risk process can be accommodated is
proposed in Cai, Feng & Willmot [217]. Further results on discounted penalty functions for dependent risk models will be treated in Chapter XVI.
Chapter XIII
Further models with dependence

Many classical results in ruin theory rest on the assumption of independence among claim sizes, among interclaim times, and between claims and interclaim times. However, examples of risk processes with a certain degree of dependence have already appeared at several places in this book (in particular, the Markov-modulated and the general Markov additive processes discussed in Chapters VII and IX). In this chapter, a number of further risk models with dependence will be discussed, some of which allow a quite explicit analytic treatment. Naturally, for more involved model assumptions the possible calculations will be less explicit, and there is a trade-off between considering a flexible dependence model (that can be calibrated to practical portfolio situations) and its tractability. In any case, it is crucial to understand how (possibly neglected) dependence may influence the actual values of ruin probabilities and related measures of riskiness in the portfolio.

We will start with some general considerations on large deviations, which are of independent interest, but also provide a powerful tool to generalize asymptotic ruin results for light-tailed claim sizes to certain dependent scenarios. It will turn out that for weak forms of dependence the asymptotic behavior of the ruin probability is still exponential, but often with a modified adjustment coefficient (dependence situations where the adjustment coefficient remains unchanged include certain types of delay in claim settlement). For stronger (long-range) dependence it can happen that the ruin probability becomes heavy-tailed although the claim sizes are light-tailed. On the other hand, for heavy-tailed claim size distributions we will see in Section 2 that the asymptotic ruin probability is relatively insensitive to weak forms of dependence (which is consistent with the 'one large claim' heuristic). Sections 3–7 then deal with some more specific dependence models, and the chapter finishes with some results on ordering and multivariate risk processes.
1
Large deviations
The area of large deviations is a set of asymptotic results on rare event probabilities and a set of methods to derive such results. The last decades have seen a boom in the area and a considerable body of applications in queueing theory and insurance risk. The classical result in the area is Cramér's theorem. Cramér considered a random walk S_n = X_1 + ··· + X_n such that the cumulant generating function \kappa(\theta) = \log E e^{\theta X_1} is defined for sufficiently many θ, and gave sharp asymptotics for probabilities of the form P(S_n/n \in I) for intervals I ⊂ R. For example, if x > EX_1, then

P\Big( \frac{S_n}{n} > x \Big) \sim \frac{1}{\theta\sigma\sqrt{2\pi n}}\, e^{-\eta n},    (1.1)

where we return to the values of θ, η, σ² later. The limit result (1.1) is an example of sharp asymptotics: ∼ means (as at other places in the book) that the ratio is one in the limit (here n → ∞). However, large deviations results usually have a weaker form, logarithmic asymptotics, which in the setting of (1.1) amounts to the weaker statement

\lim_{n\to\infty} \frac{1}{n} \log P\Big( \frac{S_n}{n} > x \Big) = -\eta.    (1.2)

Note in particular that (1.2) does not capture the \sqrt{n} in (1.1) but only the dominant exponential term; the correct sharp asymptotics might as well have been, e.g., c_1 e^{-\eta n} or c_2 e^{-\eta n + c_3 n^{\alpha}} with α < 1. Thus, large deviations results typically only give the dominant term in an asymptotic expression. Accordingly, logarithmic asymptotics are usually much easier to derive than sharp asymptotics, but also less informative. The advantage of the large deviations approach is, however, its generality: it is capable of treating many models beyond simple random walks which are not easily handled by other methods, and a considerable body of theory has been developed.

For sequences f_n, g_n with f_n → 0, g_n → 0, we will write f_n \stackrel{\log}{\sim} g_n if

\lim_{n\to\infty} \frac{\log f_n}{\log g_n} = 1

(later in this section, the parameter will be u rather than n). Thus, (1.2) can
be rewritten as P(S_n/n > x) \stackrel{\log}{\sim} e^{-\eta n}.

Example 1.1 We will go into some more detail concerning (1.1), (1.2). Define κ* as the convex conjugate of κ,

\kappa^*(x) = \sup_{\theta} \big( \theta x - \kappa(\theta) \big)

(other names are the entropy, the Legendre–Fenchel transform or just the Legendre transform, or the large deviations rate function). Most often, the sup in the definition of κ* can be evaluated by differentiation: \kappa^*(x) = \theta x - \kappa(\theta), where θ = θ(x) is the solution of the saddlepoint equation x = \kappa'(\theta); that is, the mean \kappa'(\theta) of the distribution of X_1 exponentially tilted with θ, i.e. of

\tilde{P}(X_1 \in dx) = E\big[ e^{\theta X_1 - \kappa(\theta)};\, X_1 \in dx \big],    (1.3)

is put equal to x. In fact, exponential change of measure is a key tool in large deviations methods. Define \eta = \kappa^*(x). Since

P\Big( \frac{S_n}{n} > x \Big) = \tilde{E}\Big[ e^{-\theta S_n + n\kappa(\theta)};\, \frac{S_n}{n} > x \Big],

replacing S_n by nx in the exponent and ignoring the indicator yields the Chernoff bound

P\Big( \frac{S_n}{n} > x \Big) \le e^{-\eta n}.    (1.4)

Next, since S_n is asymptotically normal w.r.t. \tilde{P} with mean nx and variance nσ², where σ² = σ²(x) = \kappa''(\theta), we have

\tilde{P}\big( nx < S_n < nx + 1.96\,\sigma\sqrt{n} \big) \to 0.475,

and hence for large n

P(S_n/n > x) \ge \tilde{E}\big[ e^{-\theta S_n + n\kappa(\theta)};\, nx < S_n < nx + 1.96\,\sigma\sqrt{n} \big] \ge 0.4\, e^{-\eta n - 1.96\,\theta\sigma\sqrt{n}},

which in conjunction with (1.4) immediately yields (1.2).

More precisely, if we replace S_n by nx + \sigma\sqrt{n}\,V where V is N(0, 1), we get

P(S_n/n > x) \approx \tilde{E}\big[ e^{-\theta n x + n\kappa(\theta) - \theta\sigma\sqrt{n}\,V};\, V > 0 \big] = e^{-\eta n} \int_0^\infty e^{-\theta\sigma\sqrt{n}\,y}\, \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy \approx \frac{1}{\theta\sigma\sqrt{2\pi n}}\, e^{-\eta n},    (1.5)
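In the Gaussian special case everything in Example 1.1 is explicit, which allows a quick numerical sanity check of the Chernoff bound (1.4) and the sharp approximation (1.1)/(1.5) (a sketch; the choices x = 0.5 and n = 100 are arbitrary):

```python
import math

# Standard normal increments: kappa(theta) = theta^2/2, so for x > 0
# the saddlepoint is theta = x, eta = kappa*(x) = x^2/2 and sigma = 1.
x, n = 0.5, 100
theta, eta, sigma = x, x * x / 2.0, 1.0

# Exact tail: P(S_n/n > x) = P(N(0,1) > x*sqrt(n))
z = x * math.sqrt(n)
exact = 0.5 * math.erfc(z / math.sqrt(2.0))

chernoff = math.exp(-eta * n)                     # upper bound (1.4)
saddlepoint = math.exp(-eta * n) / (theta * sigma * math.sqrt(2.0 * math.pi * n))  # (1.1)
```

Already for n = 100 the saddlepoint value is within a few percent of the exact tail probability, while the Chernoff bound only captures the correct exponential order.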
The right-hand side of (1.5) is the same expression as in (1.1); this is commonly denoted the saddlepoint approximation. The substitution by V needs, however, to be made rigorous; see Jensen [506] or [APQ, pp. 355–356] for details.    □

Further main results in large deviations theory are the Gärtner–Ellis theorem, which is a version of Cramér's theorem where independence is weakened to the existence of \kappa(\theta) = \lim_{n\to\infty} \frac{1}{n}\log E e^{\theta S_n}; Sanov's theorem, which gives rare events asymptotics for empirical distributions; Mogul'skiĭ's theorem, which gives path asymptotics, that is, asymptotics for probabilities of the form

P\big( \{ S_{\lfloor nt \rfloor}/n \}_{0 \le t \le 1} \in \Gamma \big)

for a suitable set Γ of functions on [0, 1]; and the Wentzell–Freidlin theory of slow Markov walks, which is of similar spirit as the discussion in VIII.3. In the application of large deviations to ruin probabilities, we shall concentrate on a result which gives asymptotics under conditions similar to the Gärtner–Ellis theorem:

Theorem 1.2 (Glynn & Whitt [419]) Let X_1, X_2, ... be a sequence of r.v.'s, and write S_n = X_1 + ··· + X_n, \tau(u) = \inf\{n : S_n > u\} and \psi(u) = P(\tau(u) < \infty). Assume that there exist γ, ε > 0 such that

(i) \kappa_n(\theta) = \log E e^{\theta S_n} is well-defined and finite for \gamma - \varepsilon < \theta < \gamma + \varepsilon;
(ii) \limsup_{n\to\infty} E e^{\theta X_n} < \infty for -\varepsilon < \theta < \varepsilon;
(iii) \kappa(\theta) = \lim_{n\to\infty} \frac{1}{n}\kappa_n(\theta) exists and is finite for \gamma - \varepsilon < \theta < \gamma + \varepsilon;
(iv) \kappa(\gamma) = 0 and κ is differentiable at γ with 0 < \kappa'(\gamma) < \infty.

Then \psi(u) \stackrel{\log}{\sim} e^{-\gamma u} as u → ∞.

For the proof, we introduce a change of measure for X_1, ..., X_n given by

\tilde{F}_n(dx_1, ..., dx_n) = e^{\gamma s_n - \kappa_n(\gamma)}\, F_n(dx_1, ..., dx_n),

where F_n is the distribution of (X_1, ..., X_n) and s_n = x_1 + ··· + x_n (note that the r.h.s. integrates to 1 by the definition of κ_n). We further write \tilde{\mu} = \kappa'(\gamma). We shall need:

Lemma 1.3 For each η > 0, there exist z ∈ (0, 1) and n_0 such that

\tilde{P}_n\Big( \Big| \frac{S_n}{n} - \tilde{\mu} \Big| > \eta \Big) \le z^n,   \tilde{P}_n\Big( \Big| \frac{S_{n-1}}{n} - \tilde{\mu} \Big| > \eta \Big) \le z^n

for n ≥ n_0.
Proof. Let 0 < θ < ε where ε is as in Theorem 1.2. Clearly,

\tilde{P}_n(S_n/n > \tilde{\mu} + \eta) \le e^{-n\theta(\tilde{\mu}+\eta)}\, \tilde{E}_n e^{\theta S_n} = e^{-n\theta(\tilde{\mu}+\eta)}\, e^{\kappa_n(\theta+\gamma) - \kappa_n(\gamma)}.

Hence by (iii) and (iv),

\limsup_{n\to\infty} \frac{1}{n} \log \tilde{P}_n(S_n/n > \tilde{\mu} + \eta) \le \kappa(\theta+\gamma) - \theta\tilde{\mu} - \theta\eta,

and by Taylor expansion and (iv), the r.h.s. is of order -\theta\eta + o(\theta) as θ ↓ 0; in particular, the r.h.s. can be chosen strictly negative by taking θ small enough. This proves the existence of z < 1 and n_0 such that \tilde{P}_n(S_n/n > \tilde{\mu} + \eta) \le z^n for n ≥ n_0. The corresponding claim for \tilde{P}_n(S_n/n < \tilde{\mu} - \eta) follows by symmetry (note that the argument did not use \tilde{\mu} > 0). This establishes the first claim of the lemma, for S_n. For S_{n-1}, we have

\tilde{P}_n(S_{n-1}/n > \tilde{\mu} + \eta) \le e^{-n\theta(\tilde{\mu}+\eta)}\, \tilde{E}_n e^{\theta S_{n-1}} = e^{-n\theta(\tilde{\mu}+\eta)}\, \tilde{E}_n e^{\theta S_n - \theta X_n} = e^{-n\theta(\tilde{\mu}+\eta)}\, E e^{(\theta+\gamma)S_n - \theta X_n - \kappa_n(\gamma)}
 \le e^{-n\theta(\tilde{\mu}+\eta) - \kappa_n(\gamma)}\, \big[ E e^{p(\theta+\gamma)S_n} \big]^{1/p} \big[ E e^{-q\theta X_n} \big]^{1/q} = e^{-n\theta(\tilde{\mu}+\eta) - \kappa_n(\gamma)}\, e^{\kappa_n(p(\theta+\gamma))/p}\, \big[ E e^{-q\theta X_n} \big]^{1/q},

where we used Hölder's inequality with 1/p + 1/q = 1 and p chosen so close to 1 and θ so close to 0 that p(\theta+\gamma) - \gamma < \varepsilon and q\theta < \varepsilon. Since \big[ E e^{-q\theta X_n} \big]^{1/q} is bounded for large n by (ii), we get

\limsup_{n\to\infty} \frac{1}{n} \log \tilde{P}_n(S_{n-1}/n > \tilde{\mu} + \eta) \le -\theta(\tilde{\mu}+\eta) + \kappa\big( p(\theta+\gamma) \big)/p,

and by Taylor expansion it is easy to see that the r.h.s. can be chosen strictly negative by taking p close enough to 1 and θ close enough to 0. The rest of the argument is as before.    □

Proof of Theorem 1.2. We first show that \liminf_{u\to\infty} \log \psi(u)/u \ge -\gamma. Let
η > 0 be given and let m = m(\eta) = \lfloor u(1+\eta)/\tilde{\mu} \rfloor + 1. Then

\psi(u) \ge P(S_m > u) = \tilde{E}_m\big[ e^{-\gamma S_m + \kappa_m(\gamma)};\, S_m > u \big]
 \ge \tilde{E}_m\big[ e^{-\gamma S_m + \kappa_m(\gamma)};\, S_m > m\tilde{\mu}/(1+\eta) \big]
 = \tilde{E}_m\Big[ e^{-\gamma S_m + \kappa_m(\gamma)};\, \frac{S_m}{m} - \tilde{\mu} > -\frac{\tilde{\mu}\eta}{1+\eta} \Big]
 \ge \tilde{E}_m\Big[ e^{-\gamma S_m + \kappa_m(\gamma)};\, \Big| \frac{S_m}{m} - \tilde{\mu} \Big| < \frac{\tilde{\mu}\eta}{1+\eta} \Big]
 \ge \exp\Big\{ -\gamma\tilde{\mu}\,\frac{1+2\eta}{1+\eta}\, m + \kappa_m(\gamma) \Big\}\, \tilde{P}_m\Big( \Big| \frac{S_m}{m} - \tilde{\mu} \Big| < \frac{\tilde{\mu}\eta}{1+\eta} \Big).

Here \tilde{P}_m(\cdot) goes to 1 by Lemma 1.3, and since \kappa_m(\gamma)/u \to 0 and m/u \to (1+\eta)/\tilde{\mu}, we get

\liminf_{u\to\infty} \frac{\log \psi(u)}{u} \ge -\gamma\,(1+2\eta).

Letting η ↓ 0 yields \liminf_{u\to\infty} \log \psi(u)/u \ge -\gamma.

For \limsup_{u\to\infty} \log \psi(u)/u \le -\gamma, we write

\psi(u) = \sum_{n=1}^\infty P(\tau(u) = n) = I_1 + I_2 + I_3 + I_4,

where

I_1 = \sum_{n=1}^{n(\delta)} P(\tau(u) = n),   I_2 = \sum_{n=n(\delta)+1}^{\lfloor u(1-\delta)/\tilde{\mu} \rfloor} P(\tau(u) = n),
I_3 = \sum_{n=\lfloor u(1-\delta)/\tilde{\mu} \rfloor + 1}^{\lfloor u(1+\delta)/\tilde{\mu} \rfloor} P(\tau(u) = n),   I_4 = \sum_{n=\lfloor u(1+\delta)/\tilde{\mu} \rfloor + 1}^{\infty} P(\tau(u) = n),

and n(δ) is chosen such that \kappa_n(\gamma)/n < \min\{\delta,\, (-\log z)/2\} and

\tilde{P}_n\Big( \Big| \frac{S_n}{n} - \tilde{\mu} \Big| > \frac{\delta\tilde{\mu}}{1+\delta} \Big) \le z^n,   \tilde{P}_n\Big( \Big| \frac{S_{n-1}}{n} - \tilde{\mu} \Big| > \frac{\delta\tilde{\mu}}{1+\delta} \Big) \le z^n    (1.6)

for some z < 1 and all n ≥ n(δ); this is possible by (iii), (iv) and Lemma 1.3. Obviously,

P(\tau(u) = n) \le P(S_n > u) = \tilde{E}_n\big[ e^{-\gamma S_n + \kappa_n(\gamma)};\, S_n > u \big] \le e^{-\gamma u + \kappa_n(\gamma)}\, \tilde{P}_n(S_n > u),    (1.7)
so that

I_1 \le e^{-\gamma u} \sum_{n=1}^{n(\delta)} e^{\kappa_n(\gamma)},    (1.8)

I_2 \le e^{-\gamma u} \sum_{n=n(\delta)+1}^{\lfloor u(1-\delta)/\tilde{\mu} \rfloor} e^{\kappa_n(\gamma)}\, \tilde{P}_n(S_n > u)
 \le e^{-\gamma u} \sum_{n=n(\delta)+1}^{\lfloor u(1-\delta)/\tilde{\mu} \rfloor} e^{-n \log z / 2}\, \tilde{P}_n\Big( \Big| \frac{S_n}{n} - \tilde{\mu} \Big| > \frac{\delta\tilde{\mu}}{1+\delta} \Big)
 \le e^{-\gamma u} \sum_{n=n(\delta)+1}^{\lfloor u(1-\delta)/\tilde{\mu} \rfloor} \frac{1}{z^{n/2}}\, z^n \le e^{-\gamma u} \sum_{n=0}^{\infty} z^{n/2} = \frac{e^{-\gamma u}}{1 - z^{1/2}},    (1.9)

I_3 \le e^{-\gamma u} \sum_{n=\lfloor u(1-\delta)/\tilde{\mu} \rfloor + 1}^{\lfloor u(1+\delta)/\tilde{\mu} \rfloor} e^{\kappa_n(\gamma)} \le e^{-\gamma u} \sum_{n=\lfloor u(1-\delta)/\tilde{\mu} \rfloor + 1}^{\lfloor u(1+\delta)/\tilde{\mu} \rfloor} e^{n\delta} \le e^{-\gamma u} \Big( \frac{2\delta u}{\tilde{\mu}} + 1 \Big)\, e^{\delta u (1+\delta)/\tilde{\mu}}.    (1.10)

Finally,

I_4 \le \sum_{n=\lfloor u(1+\delta)/\tilde{\mu} \rfloor + 1}^{\infty} P(S_{n-1} \le u,\, S_n > u) = \sum_{n=\lfloor u(1+\delta)/\tilde{\mu} \rfloor + 1}^{\infty} \tilde{E}_n\big[ e^{-\gamma S_n + \kappa_n(\gamma)};\, S_{n-1} \le u,\, S_n > u \big]
 \le e^{-\gamma u} \sum_{n=\lfloor u(1+\delta)/\tilde{\mu} \rfloor + 1}^{\infty} e^{\kappa_n(\gamma)}\, \tilde{P}_n\Big( \Big| \frac{S_{n-1}}{n} - \tilde{\mu} \Big| > \frac{\delta\tilde{\mu}}{1+\delta} \Big)
 \le e^{-\gamma u} \sum_{n=\lfloor u(1+\delta)/\tilde{\mu} \rfloor + 1}^{\infty} \frac{1}{z^{n/2}}\, z^n \le \frac{e^{-\gamma u}}{1 - z^{1/2}}.

Thus an upper bound for ψ(u) is

e^{-\gamma u} \Big\{ \sum_{n=1}^{n(\delta)} e^{\kappa_n(\gamma)} + \frac{2}{1 - z^{1/2}} + \Big( \frac{2\delta u}{\tilde{\mu}} + 1 \Big)\, e^{\delta u (1+\delta)/\tilde{\mu}} \Big\},

and using (i), we get

\limsup_{u\to\infty} \frac{\log \psi(u)}{u} \le -\gamma + \frac{\delta(1+\delta)}{\tilde{\mu}}.    (1.11)
Let δ ↓ 0.    □
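The exponential change of measure used in the proof also yields an efficient simulation scheme for ψ(u). The sketch below is an illustration under convenient assumptions (not from the text): i.i.d. increments X_i = U_i − 2 with U_i ~ Exp(1), so that κ(θ) = −log(1−θ) − 2θ. Under the γ-tilted measure the claims become Exp(1−γ), the walk drifts upward, and ψ(u) = Ẽ[e^{−γ S_{τ(u)}}] gives an unbiased estimator from paths run until first passage. Since the crossing jump is exponential, the overshoot is again Exp(1−γ) by memorylessness, so ψ(u) = (1−γ)e^{−γu} holds exactly for this walk and serves as a cross-check.

```python
import math, random

c = 2.0                                   # deterministic interarrival term in X = U - c
def kappa(theta):
    return -math.log(1.0 - theta) - c * theta

# gamma is the positive zero of the convex function kappa
lo, hi = 1e-9, 1.0 - 1e-9
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if kappa(mid) < 0:                    # kappa < 0 between 0 and gamma
        lo = mid
    else:
        hi = mid
gamma = 0.5 * (lo + hi)

random.seed(7)
u, n_paths = 10.0, 4000
weights, taus = [], []
for _ in range(n_paths):
    s, steps = 0.0, 0
    while s <= u:                         # under the tilt, passage happens quickly
        s += random.expovariate(1.0 - gamma) - c
        steps += 1
    weights.append(math.exp(-gamma * s))  # e^{-gamma * S_tau}
    taus.append(steps)

psi_est = sum(weights) / n_paths
psi_exact = (1.0 - gamma) * math.exp(-gamma * u)
```

The recorded first-passage times also illustrate Corollary 1.4 below: under the tilted measure, τ(u) concentrates around u/κ′(γ).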
The following corollary shows that, given that ruin occurs, the typical time is u/\kappa'(\gamma), just as for the compound Poisson model, cf. V.4.

Corollary 1.4 Under the assumptions of Theorem 1.2, it holds for each δ > 0 that

\psi(u) \stackrel{\log}{\sim} P\big( \tau(u) \in \big( u(1-\delta)/\kappa'(\gamma),\, u(1+\delta)/\kappa'(\gamma) \big) \big).

Proof. Since

\psi(u) = I_1 + I_2 + I_3 + I_4 \stackrel{\log}{\sim} e^{-\gamma u},   I_3 = P\big( \tau(u) \in \big( u(1-\delta)/\kappa'(\gamma),\, u(1+\delta)/\kappa'(\gamma) \big) \big),

it suffices to show that for j = 1, 2, 4 there are \alpha_j > 0 and c_j < \infty such that I_j \le c_j e^{-\gamma u} e^{-\alpha_j u}. For I_4, this is straightforward since the last inequality in (1.11) can be sharpened to

I_4 \le e^{-\gamma u}\, \frac{z^{\lfloor u(1+\delta)/\tilde{\mu} \rfloor / 2}}{1 - z^{1/2}}.

For I_1, I_2, we need to redefine n(δ) as \lfloor \beta u \rfloor, where β is so small that \omega = 1 - 4\beta\kappa'(\gamma) > 0. For I_2, the last steps of (1.9) can then be sharpened to

I_2 \le e^{-\gamma u}\, \frac{z^{\lfloor \beta u \rfloor / 2}}{1 - z^{1/2}}

to give the desired conclusion. For I_1, we replace the bound \tilde{P}_n(S_n > u) \le 1 used in (1.8) by

\tilde{P}_n(S_n > u) \le e^{-\alpha u}\, \tilde{E} e^{\alpha S_n} = e^{-\alpha u}\, e^{\kappa_n(\alpha+\gamma) - \kappa_n(\gamma)},

where 0 < α < ε and α is so small that \kappa(\gamma+\alpha) \le 2\alpha\kappa'(\gamma). Then for n large, say n ≥ n_1, we have \kappa_n(\alpha+\gamma) \le 2n\kappa(\gamma+\alpha) \le 4n\alpha\kappa'(\gamma). Letting c_{11} = \max_{n \le n_1} e^{\kappa_n(\alpha+\gamma)}, we get

I_1 \le \sum_{n=1}^{\lfloor \beta u \rfloor} \exp\{ -(\gamma+\alpha)u + \kappa_n(\alpha+\gamma) \} \le \exp\{ -(\gamma+\alpha)u \} \Big\{ c_{11} n_1 + \sum_{n=1}^{\lfloor \beta u \rfloor} \exp\{ 4n\alpha\kappa'(\gamma) \} \Big\}
 \le \exp\{ -(\gamma+\alpha)u \}\, c_1 \exp\{ 4\beta u \alpha \kappa'(\gamma) \} = c_1 e^{-\gamma u} e^{-\alpha_1 u},
where \alpha_1 = \alpha\omega.    □
The criteria given in Theorem 1.2 are the natural extension of those for the renewal model discussed in Chapter VI: due to the independence of the increments X_i = U_i − T_i, condition (iv) in that case simplifies to

\kappa(\gamma) = \frac{1}{n} \log E\, e^{\gamma \sum_{i=1}^n X_i} = \log E\, e^{\gamma (U_i - T_i)} = 0

(cf. VI.(3.1)). In the renewal setup, of course, also the stronger result of the Cramér–Lundberg approximation holds (cf. Theorem VI.3.2).

Example 1.5 Assume the X_n form a stationary Gaussian sequence with mean μ < 0. It is then well known and easy to prove that S_n has a normal distribution with mean nμ and a variance \omega_n^2 satisfying

\lim_{n\to\infty} \frac{1}{n}\,\omega_n^2 = \omega^2 = \mathrm{Var}(X_1) + 2 \sum_{k=1}^{\infty} \mathrm{Cov}(X_1, X_{k+1}),

provided the sum converges absolutely. Hence

\frac{1}{n}\,\kappa_n(\theta) = \frac{1}{n}\Big( n\theta\mu + \frac{\theta^2 \omega_n^2}{2} \Big) \to \kappa(\theta) = \theta\mu + \frac{\theta^2 \omega^2}{2}

for all θ ∈ R, and we conclude that Theorem 1.2 is in force with \gamma = -2\mu/\omega^2.    □

Inspection of the proof of Theorem 1.2 shows that the discrete time structure is used in an essential way. Obviously many of the most interesting examples have a continuous time scale. If {S_t}_{t≥0} is the claims surplus process, the key condition similar to (iii), (iv) becomes the existence of a limit \kappa(\theta) of \kappa_t(\theta) = \log E e^{\theta S_t}/t and a γ > 0 with \kappa(\gamma) = 0, \kappa'(\gamma) > 0. Assuming that the further regularity conditions can be verified, Theorem 1.2 then immediately yields the estimate

P\Big( \sup_{k=0,1,...} S_{kh} > u \Big) \stackrel{\log}{\sim} e^{-\gamma u}    (1.12)
for the ruin probability \psi_h(u) of any discrete skeleton \{S_{kh}\}_{k=0,1,...}. The problem is whether this is also the correct logarithmic asymptotics for the (larger) ruin probability ψ(u) of the whole process, i.e. whether

P\Big( \sup_{0 \le t < \infty} S_t > u \Big) \stackrel{\log}{\sim} e^{-\gamma u}.    (1.13)

Example 1.6 Assume that there exists γ > 0 such that \kappa(\gamma) = 0 and that \kappa(\theta) < \infty for \theta < \gamma + \varepsilon. If the nth claim arrives at time \sigma_n = s, it contributes to S_t by the amount U_n(t - s). Thus by (1.14),

\kappa_t(\theta) = \beta \int_0^t \big( E e^{\theta U_n(t-s)} - 1 \big)\, ds - \theta t = \beta \int_0^t \big( E e^{\theta U_n(s)} - 1 \big)\, ds - \theta t,

and since E e^{\theta U_n(s)} \to E e^{\theta U_n(\infty)} as s → ∞, we have \kappa_t(\theta)/t \to \kappa(\theta). Since the remaining conditions of Theorem 1.2 are trivial to verify, we conclude that \psi(u) \stackrel{\log}{\sim} e^{-\gamma u} (cf. the above discussion of discrete skeletons).

^1 Another interpretation is to consider V(s) as a claim whose distribution depends on the time s of its occurrence and A_t as the aggregate sum of such claims.
It is interesting to note, and intuitively reasonable, that the adjustment coefficient γ for the shot-noise model is the same as the one for the Cramér–Lundberg model where a claim is immediately settled by the amount U_n(∞). Of course, the Cramér–Lundberg model has the larger ruin probability.    □

Example 1.7 Given the safety loading η, the Cramér–Lundberg model implicitly assumes that the Poisson intensity β and the claim size distribution B (or at least its mean \mu_B) are known. Of course, this is often not realistic. An apparent solution to this problem is to calculate the premium rate p = p(t) at time t based upon claims statistics. Most obviously, the best estimator of \beta\mu_B based upon \mathcal{F}_{t-}, where \mathcal{F}_t = \sigma(A_s : 0 \le s \le t) and A_t = \sum_{i=1}^{N_t} U_i, is A_{t-}/t. Thus, one would take p(t) = (1+\eta)A_{t-}/t, leading to

S_t = A_t - (1+\eta) \int_0^t \frac{A_s}{s}\, ds.    (1.15)

With the \sigma_i being the arrival times, we have

S_t = \sum_{i=1}^{N_t} U_i - (1+\eta) \int_0^t \frac{\sum_{i=1}^{N_s} U_i}{s}\, ds = \sum_{i=1}^{N_t} U_i \Big( 1 - (1+\eta) \log \frac{t}{\sigma_i} \Big).    (1.16)
Let \kappa_t(\alpha) = \log E e^{\alpha S_t}. It then follows from (1.14) that

\kappa_t(\alpha) = \beta \int_0^t \phi\Big( \alpha \Big[ 1 - (1+\eta) \log \frac{t}{s} \Big] \Big)\, ds - \beta t = t\,\kappa(\alpha),    (1.17)

where

\kappa(\alpha) = \beta \int_0^1 \phi\big( \alpha \big[ 1 + (1+\eta) \log u \big] \big)\, du - \beta.    (1.18)
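The zero γ of κ in (1.18) is easy to compute numerically. The sketch below is an illustration (not from the text): it takes U ~ Exp(1) and loading η = 0.5, evaluates κ(α) via the equivalent representation κ(α) = βE[e^{αU}/(1 + (1+η)αU)] − β derived in (1.21) below (with β = 1) by Simpson's rule, and compares γ with the classical adjustment coefficient γ*, which for Exp(1) claims equals η/(1+η).

```python
import math

eta_load = 0.5                               # safety loading (illustrative)
gamma_star = eta_load / (1.0 + eta_load)     # classical gamma* for Exp(1) claims

def kappa(a, steps=20000, u_cut=400.0):
    """kappa(a) = E[e^{aU}/(1+(1+eta)aU)] - 1 for U ~ Exp(1), beta = 1 (Simpson's rule)."""
    h = u_cut / steps
    total = 0.0
    for i in range(steps + 1):
        u = i * h
        w = 1.0 if i in (0, steps) else (4.0 if i % 2 else 2.0)
        total += w * math.exp((a - 1.0) * u) / (1.0 + (1.0 + eta_load) * a * u)
    return total * h / 3.0 - 1.0

# kappa is convex with kappa(gamma*) <= 0 and kappa(0.9) > 0; bisect for the zero
lo, hi = gamma_star, 0.9
for _ in range(40):
    mid = 0.5 * (lo + hi)
    if kappa(mid) > 0:
        hi = mid
    else:
        lo = mid
gamma_adaptive = 0.5 * (lo + hi)
```

The computed γ is markedly larger than γ* = 1/3, in accordance with (1.20) below: the adaptive premium rule leads to an asymptotically smaller ruin probability.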
Thus (iii) of Theorem 1.2 holds, and since the remaining conditions are trivial to verify, we conclude that \psi(u) \stackrel{\log}{\sim} e^{-\gamma u} (cf. again the above discussion of discrete skeletons), where γ solves \kappa(\gamma) = 0.

It is interesting to compare the adjustment coefficient γ with the one γ* of the Cramér–Lundberg model, i.e. the solution of

\beta\big( E e^{\gamma^* U} - 1 \big) - (1+\eta)\,\gamma^* \beta \mu_B = 0.    (1.19)

Indeed, one has

\gamma \ge \gamma^*    (1.20)
with equality if and only if U is degenerate. Thus, typically the adaptive premium rule leads to a ruin probability which is asymptotically smaller than for the Cramér–Lundberg model. To see this, rewrite first κ as

\kappa(\alpha) = \beta\, E\Big[ \frac{e^{\alpha U}}{1 + (1+\eta)\alpha U} \Big] - \beta.    (1.21)

This follows from the probabilistic interpretation S_1 \stackrel{D}{=} \sum_{i=1}^{N_1} Y_i, where Y_i = U_i\big(1 + (1+\eta)\log \Theta_i\big) = U_i\big(1 - (1+\eta)V_i\big), where the \Theta_i are i.i.d. uniform(0, 1) or, equivalently, the V_i = -\log \Theta_i are i.i.d. standard exponential, which yields

E e^{\alpha Y} = E\big[ \Theta^{(1+\eta)\alpha U}\, e^{\alpha U} \big] = E\Big[ e^{\alpha U} \int_0^1 t^{(1+\eta)\alpha U}\, dt \Big] = E\Big[ \frac{e^{\alpha U}}{1 + (1+\eta)\alpha U} \Big].

Next, the function k(x) = e^{\gamma^* x} - 1 - (1+\eta)\gamma^* x is convex with k(\infty) = \infty, k(0) = 0, k'(0) < 0, so there exists a unique zero x_0 = x_0(\eta) > 0 such that k(x) > 0 for x > x_0 and k(x) < 0 for 0 < x < x_0. Therefore

E\Big[ \frac{e^{\gamma^* U}}{1 + (1+\eta)\gamma^* U} \Big] - 1 = E\Big[ \frac{k(U)}{1 + (1+\eta)\gamma^* U} \Big]
 = \int_0^{x_0} \frac{k(y)}{1 + (1+\eta)\gamma^* y}\, B(dy) + \int_{x_0}^{\infty} \frac{k(y)}{1 + (1+\eta)\gamma^* y}\, B(dy)
 \le \frac{1}{1 + (1+\eta)\gamma^* x_0} \Big\{ \int_0^{x_0} k(y)\, B(dy) + \int_{x_0}^{\infty} k(y)\, B(dy) \Big\} = 0,

using that E k(U) = 0 because of (1.19). This implies \kappa(\gamma^*) \le 0, and since \kappa(s), \kappa^*(s) are convex with \kappa'(0) < 0, \kappa^{*\prime}(0) < 0, this in turn yields \gamma \ge \gamma^*. Further, \gamma = \gamma^* can only occur if U ≡ x_0.    □

Condition (iii) of Theorem 1.2 reflects that the ruin probability still decays exponentially if the involved dependence is weak enough that the logarithmic average of the moment generating functions of the (light-tailed) increment distributions converges. As Example 1.5 indicates, this will usually only be the case for short-range dependence in the risk process. If \kappa_n(\theta)/v_n does not converge for v_n = n, but converges for another rate function v_n, it is also sometimes possible to derive the limiting behavior of ψ(u) by large deviations techniques. The needed technical assumptions are then
more involved, and we just mention the type of result that one can expect in such situations (formulated for a continuous-time risk process): if

\kappa(\theta) = \lim_{t\to\infty} \frac{1}{v(t)} \log E\, e^{\theta\, v(t)\, S_t / a(t)}    (1.22)

exists for some scaling functions a(t): \mathbb{R}_+ \to \mathbb{R}_+ and v(t): \mathbb{R}_+ \to \mathbb{R}_+ with a(t), v(t) ↑ ∞, and there exists another increasing scaling function h(t) such that g(d) = \lim_{t\to\infty} v\big( a^{-1}(t/d) \big)/h(t) exists for all d > 0, then under some additional technical assumptions,

\lim_{u\to\infty} \frac{1}{h(u)} \log \psi(u) = - \inf_{d>0} \Big[ g(d) \sup_{\theta\in\mathbb{R}} \big( \theta d - \kappa(\theta) \big) \Big] = -\gamma.    (1.23)

In particular, \psi(u) \stackrel{\log}{\sim} e^{-\gamma\, v(a^{-1}(u))}.
Example 1.8 Consider a continuous-time stationary zero-mean Gaussian process {Z_t} with arbitrary covariance function \mathrm{Cov}(s, t) = E(Z_s Z_t) and let S_t = Z_t - \mu t for some μ > 0. Then (1.22) holds with a(t) = t and v(t) = t^2/\sigma_t^2, where \sigma_t^2 = E(Z_t^2). For the choice h(t) = v(t), the expression g(d) = \lim_{t\to\infty} \sigma_t^2 / \big( d^2\, \sigma_{t/d}^2 \big) has to be finite for all d > 0, which introduces a condition on \sigma_t. It turns out that one can in fact verify all the technical assumptions underlying the above result and obtains

\lim_{u\to\infty} \frac{\sigma_u^2}{u^2} \log \psi(u) = - \inf_{d>0} \big[ g(d)\, (d+\mu)^2 / 2 \big].

If \sigma_t^2/t \to \sigma^2 > 0, then this formula simplifies to \lim_{u\to\infty} \frac{1}{u} \log \psi(u) = -2\mu/\sigma^2, which is the continuous-time version of Example 1.5 (and corresponds to short-range dependence of S_t). On the other hand, for E(Z_s Z_t) = \frac{1}{2}\big( s^{2H} + t^{2H} - |s-t|^{2H} \big) we arrive at the case of fractional Brownian motion with Hurst exponent H ∈ (0, 1) (which is long-range dependent). From \sigma_t^2 = t^{2H}, one can easily derive that in this case

\lim_{u\to\infty} \frac{1}{u^{2-2H}} \log \psi(u) = - \frac{\big[ \mu(1/H - 1) \big]^{2H}}{2(1-H)^2},    (1.24)

which shows that the ruin probability has a Weibull-type tail. For H > 0.5 (positive dependence of the increments) this is an instance where, despite light-tailed increments, the involved dependence leads to a heavy-tailed ruin probability.    □
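The variational problem behind (1.24) is easy to check numerically: for σ_t² = t^{2H} one has g(d) = d^{2H−2}, and minimizing g(d)(d+μ)²/2 by calculus gives the minimizer d* = μ(1−H)/H and the constant in (1.24). A quick cross-check (μ and H are arbitrary illustrative values):

```python
# Numerical check of the constant in (1.24); mu and H are illustrative.
mu, H = 1.5, 0.7

def objective(d):
    # g(d) * (d + mu)^2 / 2 with g(d) = d^(2H-2)
    return d ** (2 * H - 2) * (d + mu) ** 2 / 2.0

# fine grid search over d > 0
numeric = min(objective(0.001 * i) for i in range(1, 20001))

d_star = mu * (1.0 - H) / H                                    # minimizer from calculus
closed = (mu * (1.0 / H - 1.0)) ** (2 * H) / (2.0 * (1.0 - H) ** 2)
```

Both values agree, confirming that the infimum in the general formula reduces to the closed-form constant of (1.24).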
CHAPTER XIII. FURTHER MODELS WITH DEPENDENCE
Notes and references Some standard textbooks on large deviations are Bucklew [207], Dembo & Zeitouni [290] and Shwartz & Weiss [799]. In addition to Glynn & Whitt [419], see also Nyrhinen [667] for Theorem 1.2. Müller & Pflug [652] give an elementary proof of this result in terms of exponential inequalities. Variants of the claims delay model of Example 1.6 can be found in Klüppelberg & Mikosch [545], Gao & Yan [388] and Ganesh, Macci & Torrisi [387]. For Example 1.7, see Nyrhinen [667] and Asmussen [67]; the proof of (1.20) is due to Tatyana Turova. A more general adaptive premium rule was considered in Müller & Pflug [652]. Result (1.23) is from Duffield & O'Connell [331], where details on the derivation can be found; see also Chang, Yao & Zajic [231]. Fractional Brownian motion will be discussed in more detail in the framework of Gaussian processes in Section 7. Further applications of large deviations ideas in risk theory occur e.g. in Djehiche [324], Lehtonen & Nyrhinen [576, 577], Martin-Löf [630, 631] and Nyrhinen [667].
2  Heavy-tailed risk models with dependent input
In the previous section we saw the effect of dependence on the adjustment coefficient in the case of light-tailed claims. We now turn to heavy-tailed claim size distributions. In view of the 'one large claim' heuristics from independent increments, it seems reasonable to expect a certain insensitivity of the asymptotic behavior w.r.t. dependence, as long as the dependence is not too strong. Various criteria (on dependence types of interclaim times, but also for possible dependence between the arrival process and the claim sizes) for this to be true were given by Asmussen, Schmidli & Schmidt [101]. We give here one of them, Theorem 2.1 based upon a regenerative assumption, and apply it to the Markov-modulated model of Chapter VII. For further approaches, examples and counterexamples, see [101].

Assume that the claim surplus process {S_t}_{t≥0} has a regenerative structure in the sense that there exists a renewal process χ₀ = 0 ≤ χ₁ ≤ χ₂ ≤ … such that the cycles {S_{χ_n + t} − S_{χ_n}}_{0 ≤ t < χ_{n+1} − χ_n} are i.i.d. Write S_n* = S_{χ_n} for the imbedded random walk, M* = sup_{n≥0} S_n*, M = sup_{t≥0} S_t, and

   M_n^{(χ)} = sup_{0 ≤ t < χ_{n+1} − χ_n} (S_{χ_n + t} − S_{χ_n}) ,

so that ψ₀(u) = P₀(M > u) in the zero-delayed case.

Theorem 2.1 Assume that {S_t} is regenerative as above and (together with regularity conditions on the imbedded random walk {S_n*}, for which we refer to [101]) that

   P₀(M₁^{(χ)} > x) ∼ P₀(S₁* > x) .

Then ψ₀(u) ∼ P₀(M* > u). Since trivially M ≥ M*, the essential assertion is

   lim inf_{u→∞} P₀(M* > u)/P₀(M > u) ≥ 1 .    (2.4)
Proof. Define

   ϑ*(u) = inf{n = 1, 2, …: S_n* > u} ,
   β(u) = inf{n = 1, 2, …: S_n* + M_{n+1}^{(χ)} > u}

(note that {M > u} = {β(u) < ∞}). Let a > 0 be fixed. We shall use the estimate

   P₀(M > u, M_{β(u)+1}^{(χ)} ≤ a) = o(P₀(M > u))    (2.5)

which follows since

   P₀(M > u, M_{β(u)+1}^{(χ)} ≤ a) ≤ P₀( ⋃_{n=1}^∞ {M_n* ∈ (u − a, u)} )
   ≤ P(M* ∈ (u − a, u))/P(M* = 0) = o(P₀(M* > u)) .
Given ε > 0, choose a such that P₀(S₁* > x) ≥ (1 − ε)P₀(M₁^{(χ)} > x), x ≥ a. Then by Lemma 3.4,

   P₀(M* > u) ∼ P₀(M* > u, S*_{ϑ*(u)} − S*_{ϑ*(u)−1} > a)
   = Σ_{n=1}^∞ P₀(M_n* ≤ u, S*_{n+1} − S_n* > a ∨ (u − S_n*))
   ≥ (1 − ε) Σ_{n=1}^∞ P₀(M_n* ≤ u, M_{n+1}^{(χ)} > a ∨ (u − S_n*))
   ≥ (1 − ε) Σ_{n=1}^∞ P₀( max_{0 ≤ k < n} (S_k* + M_{k+1}^{(χ)}) ≤ u, M_{n+1}^{(χ)} > a ∨ (u − S_n*) )
   = (1 − ε) P₀(M > u, M_{β(u)+1}^{(χ)} > a) ∼ (1 − ε) P₀(M > u) ,

using (2.5) in the last step. Letting first u → ∞ and next ε ↓ 0 yields (2.4). □
Under suitable conditions, Theorem 2.1 can be rewritten as

   ψ₀(u) ∼ (ρ/(1 − ρ)) B̄₀(u)    (2.6)
where B is the Palm distribution of claims and ρ − 1 = lim_{t→∞} S_t/t. To this end, assume the path structure

   S_t = Σ_{i=1}^{N_t} U_i − t + Z_t    (2.7)

with {Z_t} continuous, independent of {Σ_{i=1}^{N_t} U_i} and satisfying Z_t/t → 0 a.s. Then the Palm distribution of claims is

   B(x) = (1/E₀N_χ) E₀ Σ_{i=1}^{N_χ} I(U_i ≤ x) .    (2.8)
Write β = E₀N_χ/E₀χ.

Corollary 2.2 Assume that {S_t} is regenerative and satisfies (2.7). Assume further that
(i) both B and B₀ are subexponential;
(ii) E₀ z^{N_χ} < ∞ for some z > 1;
(iii) for some σ-field F, χ and N_χ are F-measurable and

   P₀( Σ_{i=1}^{N_χ} U_i > x | F ) ∼ N_χ · B̄(x) ;

(iv) P₀( sup_{0 ≤ t < χ} Z_t > x ) = o(B̄(x)) .
Then (2.6) holds with ρ = βµ_B.

As an application, consider the Markov-modulated risk model of Chapter VII: the environment process has p states with stationary distribution (π₁, …, π_p), and in state i claims arrive at Poisson rate β_i > 0 and have claim size distribution B_i. The average arrival rate β and the Palm distribution B of the claim sizes are given by

   β = Σ_{i=1}^p π_i β_i ,   B = (1/β) Σ_{i=1}^p π_i β_i B_i ,

and we assume ρ = βµ_B = Σ_{i=1}^p π_i β_i µ_{B_i} < 1.

Theorem 2.5 Consider the Markov-modulated risk model with claim size distributions satisfying (2.9). Then (2.6) holds.

The key step of the proof is the following lemma.
Lemma 2.6 Let (N₁, …, N_p) be a random vector in {0, 1, 2, …}^p, χ ≥ 0 a r.v. and F a σ-algebra such that (N₁, …, N_p) and χ are F-measurable. Let {F_i}_{i=1,…,p} be a family of distributions on [0, ∞) and define

   Y_χ = Σ_{i=1}^p Σ_{j=1}^{N_i} X_{ij} − χ ,

where conditionally upon F the X_{ij} are independent, with distribution F_i for X_{ij}. Assume E z^{N₁+⋯+N_p} < ∞ for some z > 1, and that for some distribution G on [0, ∞) with G ∈ S and some c₁, …, c_p with c₁ + ⋯ + c_p > 0 it holds that F̄_i(x) ∼ c_i Ḡ(x). Then

   P(Y_χ > x) ∼ c Ḡ(x) ,  where c = Σ_{i=1}^p c_i EN_i .
Proof. Consider first the case χ = 0. It follows by a slight extension of Section X.1 that

   P(Y₀ > x | F) ∼ Ḡ(x) Σ_{i=1}^p c_i N_i ,   P(Y₀ > x | F) ≤ C Ḡ(x) z^{N₁+⋯+N_p}

for some C = C(z) < ∞. Thus dominated convergence yields

   P(Y₀ > x)/Ḡ(x) = E[ P(Y₀ > x | F)/Ḡ(x) ] → E[ Σ_{i=1}^p c_i N_i ] = c .

In the general case, as x → ∞,

   P(Y_χ > x | F) = P(Y₀ > χ + x | F) ∼ Ḡ(χ + x) Σ_{i=1}^p c_i N_i ∼ Ḡ(x) Σ_{i=1}^p c_i N_i

and

   P(Y_χ > x | F) ≤ P(Y₀ > x | F) ≤ C Ḡ(x) z^{N₁+⋯+N_p} .

The same dominated convergence argument completes the proof. □
Proof of Theorem 2.5. If J₀ = i, we can define the regeneration points as the times of return to i, and the rest of the argument is then just as the proof of Corollary 2.2. An easy conditioning argument then yields the result when J₀ is random. □

For light-tailed distributions, Markov-modulation typically decreases the adjustment coefficient γ and thereby changes the order of magnitude of the ruin probabilities for large u, cf. VII.4. It follows from Theorem 2.5 that the effect of Markov-modulation is in some sense less dramatic for heavy-tailed distributions: the order of magnitude of the ruin probabilities remains ∫_u^∞ B̄(x) dx. Within the class of risk processes in a Markovian environment, Theorem 2.5 shows that basically only the tail-dominant claim size distributions (those with c_i > 0) matter for determining the order of magnitude of the ruin probabilities in the heavy-tailed case. In contrast, for light-tailed distributions the value of the adjustment coefficient γ is given by a delicate interaction between all the B_i.

Notes and references Theorem 2.5 was first proved by Asmussen, Fløe Henriksen & Klüppelberg [76] by a lengthy argument which did not provide the constant in front of B̄₀(u) in final form. An improvement was given in Asmussen & Højgaard [80], and the final reduction by Jelenkovic & Lazar [504]. The present approach via Theorem 2.1 is from Asmussen, Schmidli & Schmidt [101]. That paper also contains further criteria for regenerative input (in particular also a treatment of the delayed case, which we have omitted here), as well as a condition for (2.6) to hold in a situation where the interclaim times (T₁, T₂, …) form a general stationary sequence and the U_i are i.i.d. and independent of (T₁, T₂, …); this is applied for example to risk processes with Poisson cluster arrivals. See also Araman & Glynn [52]. For further studies of perturbations like in Corollary 2.2 and Example 2.4, see Schlegel [766] and Zwart, Borst & Dębicki [923].
In the latter reference, situations are identified under which perturbations by general Gaussian processes do change the asymptotic behavior of ψ(u).
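As a small numerical illustration of (2.6) (ours, not from the book; all parameters are hypothetical), the asymptotic approximation can be evaluated in closed form for a two-state Markov-modulated model in which one state produces Pareto claims (tail-dominant, c₁ > 0 in the sense of Theorem 2.5) and the other exponential claims (c₂ = 0):

```python
import math

# Hypothetical two-state Markov-modulated model, premium rate 1:
# state i has Poisson arrival rate arr[i] and claim size distribution B_i.
pi = (0.5, 0.5)        # stationary distribution of the environment
arr = (0.3, 0.5)       # arrival rates beta_1, beta_2
mu = (1 / 1.5, 1.0)    # means: B_1 Pareto with tail (1+x)^(-2.5), B_2 exp(1)

beta = sum(p * b for p, b in zip(pi, arr))            # average arrival rate
rho = sum(p * b * m for p, b, m in zip(pi, arr, mu))  # rho = beta * mu_B < 1

def tail_integral(u):
    """int_u^inf B-bar(x) dx for the Palm claim distribution B (closed form)."""
    pareto = pi[0] * arr[0] * (1 + u) ** (-1.5) / 1.5  # integral of (1+x)^(-2.5)
    expo = pi[1] * arr[1] * math.exp(-u)               # integral of e^(-x)
    return (pareto + expo) / beta

def psi_asymptotic(u):
    # (2.6): psi_0(u) ~ rho/(1-rho) * B0-bar(u), B0-bar(u) = tail_integral(u)/mu_B
    mu_B = rho / beta
    return rho / (1 - rho) * tail_integral(u) / mu_B
```

Since B₁ is the only tail-dominant component, for large u the Pareto term drives the asymptotics, exactly as Theorem 2.5 predicts.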
3  Linear models
Let us consider a discrete-time risk model, where R_n = u + Z₁ + ⋯ + Z_n denotes the surplus of the portfolio at the end of year n and Z_n correspondingly the gain incurred in year n (of course, any other time unit may be considered). Assume the autoregressive moving average (ARMA) structure

   Z_n = a₁Z_{n−1} + ⋯ + a_m Z_{n−m} + X_n + b₁X_{n−1} + ⋯ + b_k X_{n−k} ,    (3.1)

where X₁, X₂, … are i.i.d. r.v.'s with E[X₁] > 0 and a₁, …, a_m, b₁, …, b_k are constants. In compact notation one can write p(∆)Z_n = q(∆)X_n with the polynomials p(x) = 1 − a₁x − ⋯ − a_m x^m and q(x) = 1 + b₁x + ⋯ + b_k x^k and ∆ the backward shift operator. Assume that q(1) > 0, that p(x) and q(x) do not have any common factor and that all zeros of p(x) lie outside the unit disk of the complex plane (hence p(1) > 0).

Proposition 3.1 Assume that {R_n} follows an ARMA structure of the form (3.1) with the above assumptions and with given initial values z₀, …, z_{−m+1}, x₀, …, x_{−k+1}. Assume further that a positive solution r = γ of the adjustment equation E[e^{−rX₁}] = 1 exists. Then

   P(τ(u) < ∞ | z₀, …, z_{−m+1}, x₀, …, x_{−k+1})
   = exp{−γ(u + Σ_{ℓ=0}^∞ (Σ_{i=ℓ+1}^∞ b′_i) x_{−ℓ})/b′}
     / E[ exp{−γ(R_{τ(u)} + Σ_{ℓ=0}^∞ (Σ_{i=ℓ+1}^∞ b′_i) X_{τ(u)−ℓ})/b′} | τ(u) < ∞ ] ,    (3.2)

where the b′_ℓ and b′ are defined by (3.4) and (3.6) and x_{−k}, …, x_{−m−k+1} are determined by (3.5).

Proof. One can equivalently express the ARMA model through a moving average (MA) model

   Z_n = X_n + Σ_{ℓ=1}^∞ b′_ℓ X_{n−ℓ} ,    (3.3)

where (for instance) X_n = 0 for n ≤ −m − k and b′_ℓ is determined by

   q(x)/p(x) = 1 + Σ_{ℓ=1}^∞ b′_ℓ x^ℓ .    (3.4)
The needed m additional starting values x_{−k}, …, x_{−m−k+1} can then be determined in such a way that

   z_n = x_n + Σ_{ℓ=1}^{m+k+n−1} b′_ℓ x_{n−ℓ}   for n = 0, …, −m + 1    (3.5)

(which is a linear system of equations with a unique solution). From the location of the zeros of p(x), it follows that the b′_ℓ tend to zero exponentially fast and consequently Σ_{ℓ=1}^∞ ℓ|b′_ℓ| < ∞. Define

   b′ = 1 + Σ_{ℓ=1}^∞ b′_ℓ = q(1)/p(1) > 0 .    (3.6)
It is not difficult to check that exp{−γ(R_n + Σ_{ℓ=0}^∞ (Σ_{i=ℓ+1}^∞ b′_i) X_{n−ℓ})/b′} is a martingale, and the assertion then follows as usual by optional stopping at τ(u) ∧ T and letting T → ∞ (for bounded r.v.'s X_i this limit operation can be justified by dominated convergence; for the unbounded case more work is needed, see Promislow [716]). □

Remark 3.2 The appearance of the factor b′ in the above result is natural, since in view of (3.1) the overall contribution of X_n to the surplus over time is b′X_n. Hence a 'fair' comparison of the ARMA model (3.1) (or equivalently (3.3)) with a classical risk model with independent increments would be to consider for the latter R̃_n = ũ + Z̃₁ + ⋯ + Z̃_n, where Z̃_n = b′X_n and ũ = u + Σ_{ℓ=0}^∞ (Σ_{i=ℓ+1}^∞ b′_i) x_{−ℓ} is the sum of the contributions of all the deterministic starting values. The adjustment coefficient in this independence model is then γ̃ = γ/b′. Hence the adjustment coefficient of the ARMA model and the one of its independence counterpart are equivalent, i.e. the introduced dependence is weak enough to leave the adjustment coefficient unchanged. □

Notes and references Proposition 3.1 is (for bounded r.v.'s) given in Gerber [400, 401], who also showed that for a finite-order MA model the denominator in (3.2) converges to a constant as u → ∞, which establishes a Cramér-Lundberg approximation. Promislow [716] extended the proof to unbounded r.v.'s and slightly weaker conditions on the coefficients of the resulting MA model. Chan & Yang [229] include a force of interest and consider separate time series for the premium income and the annual claim payments. Particular cases of the ARMA model have immediate interpretations for a credibility model (see [401]) as well as for models including underwriting cycle effects on premiums and certain IBNR models for delay of claim payments (see e.g. Trufin, Albrecher & Denuit [853, 855]). Linear processes of the above type can
also be addressed by large deviation techniques, which leads to logarithmic asymptotics only, but asymptotic information about the time of ruin can then be achieved as well, see e.g. Nyrhinen [666]. Other types of short-range dependence structures are e.g. discussed in Albrecher & Kantor [32] and Afonso, Egidio dos Reis & Waters [7], where the size of an annually changing premium may depend on previous loss experience. An extension of Cramér-type estimates to certain non-Gaussian long-range dependent processes of fractional autoregressive integrated moving average (FARIMA) type is given in Barbe & McCormick [133]. Two-sided infinite-order MA processes with regularly varying tails were investigated by Mikosch & Samorodnitsky [640], and it was shown that under a tail-balance condition and some conditions on the coefficients (that imply short-range dependence!) the ruin probability has the same asymptotic order as in the case with independent increments given in Theorem X.3.1 (namely, the tail of the stationary excess distribution of the increments), but the constant in front changes (a similar conclusion is found for a first-order AR process with random coefficients in Konstantinides & Mikosch [550]; see also Hult & Samorodnitsky [486] for a recent extension to general two-sided linear processes). Barbe & McCormick [134] show that for non-stationary and long-range dependent FARIMA processes with regularly varying innovations this insensitivity no longer holds and the asymptotic order changes. Mikosch & Samorodnitsky [641] study the ruin probability of stationary ergodic symmetric α-stable processes for α ∈ (1, 2) and show that its asymptotic decay can become significantly slower than the one for independent increments; further refinements of these results are given in Alparslan & Samorodnitsky [44].
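The MA coefficients b′_ℓ in (3.4) are easy to generate by formal power-series division, and (3.6) provides a convenient numerical check. The following sketch (ours, not from the book) does this for a hypothetical ARMA(1,1) example:

```python
def ma_coefficients(a, b, n_terms=200):
    """Coefficients b'_l, l = 1..n_terms, of q(x)/p(x) = 1 + sum b'_l x^l,
    where p(x) = 1 - a[0] x - ... - a[m-1] x^m and q(x) = 1 + b[0] x + ... + b[k-1] x^k.
    From (1 + sum c_l x^l) p(x) = q(x) one gets c_l = q_l + sum_j a_j c_{l-j}."""
    c = [1.0]
    for l in range(1, n_terms + 1):
        q_l = b[l - 1] if l <= len(b) else 0.0
        c.append(q_l + sum(a[j - 1] * c[l - j] for j in range(1, min(l, len(a)) + 1)))
    return c[1:]

# Hypothetical example: p(x) = 1 - 0.5x, q(x) = 1 + 0.3x
coeffs = ma_coefficients([0.5], [0.3])
b_prime = 1.0 + sum(coeffs)
# Check against (3.6): b' = q(1)/p(1) = 1.3/0.5 = 2.6
```

The geometric decay of the coeffs list illustrates why Σℓ|b′_ℓ| < ∞ holds when the zeros of p lie outside the unit disk.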
4  Risk processes with shot-noise Cox intensities

Consider the surplus process R_t = u + t − Σ_{i=1}^{N_t} U_i with i.i.d. claim sizes U_i that are independent of N_t, but where now the claim number process N_t is a doubly stochastic Poisson process (Cox process) with a Poisson shot-noise intensity process of the form

   β_t = β + Σ_{n∈N} h(t − σ_n, Y_n) ,    (4.1)

where {σ_n}_{n∈N} is the sequence of arrival epochs of a homogeneous Poisson process of rate ζ, {Y_n}_{n∈N} is an i.i.d. sequence of positive r.v.'s (with distribution function F_Y) independent of the Poisson process, and the function h(t, x) is nonnegative with h(t, x) = 0 for t < 0 (here β > 0 is assumed to be constant).

An interpretation of model (4.1) is as follows: in addition to the occurrence of 'normal' claims described by a homogeneous Poisson process with constant rate β, there are also claims triggered by external events (such as natural catastrophes). These events occur at times {σ_n}_{n∈N} (according to a homogeneous Poisson process with rate ζ). Due to reporting lags of the claims that originate
from a given external event, the resulting increase in intensity will develop according to the function h(t − σ_n, Y_n). This model captures the effect that such events can lead to a dramatic increase of the number of claims, whereas the individual claim sizes still follow the same distribution B. Figure XIII.3 shows a sample path of the intensity β_t for h(t, x) = x e^{−t} (t > 0) (with Y_n being exponential(1), β = 0.5 and ζ = 0.7).

Figure XIII.3

Let H(t, y) = ∫₀ᵗ h(s, y) ds and

   Λ_t = ∫₀ᵗ β_s ds = βt + Σ_{n∈N} H(t − σ_n, Y_n) .

The limiting average claim amount arriving per unit time turns out to be µ = (β + ζ EH(∞, Y₁)) µ_B and the safety loading condition here is µ < 1. Similarly to (1.14), one can derive by a differential equation in t that

   log E e^{θA_t} = βt (B̂(θ) − 1) + ζ ∫₀ᵗ ( E_Y (e^{∫_s^t h(w−s,Y)(B̂(θ)−1) dw}) − 1 ) ds
                  = βt (B̂(θ) − 1) + ζ ∫₀ᵗ ( E_Y [ e^{(B̂(θ)−1) H(s,Y)} ] − 1 ) ds ,    (4.2)

where A_t = Σ_{i=1}^{N_t} U_i again denotes the aggregate claim amount at time t. For S_t = A_t − t and κ_t(θ) = log E e^{θS_t} we then have κ_t(θ)/t → κ(θ) with

   κ(θ) = β(B̂(θ) − 1) − θ + ζ( E_Y (e^{(B̂(θ)−1)H(∞,Y)}) − 1 ) .    (4.3)

Theorem 4.1 Let both the m.g.f. B̂(θ) and E exp{θH(∞, Y)} exist for all θ in a neighborhood of the origin and be steep (cf. p. 91). Then the risk process with claim occurrence according to the shot-noise intensity process (4.1) satisfies

   lim_{u→∞} (1/u) log ψ(u) = −γ ,

where γ is the positive solution of κ(γ) = 0 and κ is given by (4.3).
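Before turning to the proof, a numerical illustration (ours, not from the book): for B exponential(2), so that B̂(θ) = 2/(2 − θ), and h(t, x) = xe^{−t} with Y exponential(1), we have H(∞, Y) = Y and E_Y e^{(B̂(θ)−1)Y} = 1/(1 − (B̂(θ) − 1)) whenever B̂(θ) < 2, so (4.3) is elementary. With the parameters β = 0.5, ζ = 0.7 of Figure XIII.3 the safety loading condition µ = (β + ζ)µ_B = 0.6 < 1 holds, and γ can be found by bisection:

```python
def kappa(theta, beta=0.5, zeta=0.7):
    # (4.3) for B exponential(2), h(t,x) = x*exp(-t), Y exponential(1):
    # B-hat(theta) = 2/(2-theta), H(inf, Y) = Y, and
    # E_Y exp{(B-hat - 1) Y} = 1/(1 - (B-hat(theta) - 1)), valid for theta < 1.
    b_hat = 2.0 / (2.0 - theta)
    return beta * (b_hat - 1.0) - theta + zeta * (1.0 / (1.0 - (b_hat - 1.0)) - 1.0)

def adjustment_coefficient(lo=1e-9, hi=0.999, tol=1e-12):
    # kappa(0) = 0 and kappa'(0) = mu - 1 < 0, so kappa < 0 just right of 0
    # and kappa blows up as theta -> 1; bisect for the positive root.
    assert kappa(lo) < 0 < kappa(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kappa(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Here γ ≈ 0.4787, so by Theorem 4.1 the ruin probability decays like e^{−0.4787u} on a logarithmic scale.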
Proof. Considering a discrete skeleton {A_{nh}}_{n∈N}, (4.3) implies that κ_{nh}(θ)/n has a limit of the form κ^{(h)}(θ) = hκ(θ). Since an easy calculation shows that (κ^{(h)})″(θ) > 0 for every θ ≥ 0, κ^{(h)}(0) = 0 and (κ^{(h)})′(0) = h(µ − 1) < 0 by the net profit condition, it follows that (κ^{(h)})′(γ) > 0. Here the required steepness implies that κ(θ) is unbounded in a neighborhood of its abscissa of convergence and hence guarantees the existence of the solution γ > 0. Consequently, Theorem 1.2 applies and log P(max_n S_{nh} > u) ∼ −γu. Finally, since

   max_t S_t ≥ max_n S_{nh} ≥ max_t S_t − h ,

the maximum over nh can be replaced by the continuous-time maximum over t, and the theorem follows from ψ(u) = P(max_t S_t > u). □

We now intend to refine Theorem 4.1. For that purpose, consider the compound Poisson batch process R̃_t, which is obtained by moving all arrivals of claims that are caused by a catastrophic event at σ_n to σ_n. This risk process has intensity β̃ = β + ζ for arrivals of claims and a claim size distribution B̃ which is a mixture of B and the distribution of the random sum Z = Σ_{i=1}^{N(Y)} U_i, where N(Y) is Poisson with parameter H(∞, Y) given Y and independent of the U_i; the weights are β/β̃ resp. ζ/β̃, and the premium rate is 1 (Z can be interpreted as the total claim amount caused by the N(Y) claims triggered by a specific event). Let L be the time from the event until the last of the N(Y) claims occurs and ψ̃(u) the ruin probability of this compound Poisson batch process. Obviously, ψ(u) ≤ ψ̃(u).

Theorem 4.2 For some constant C₋ > 0, C₋e^{−γu} ≤ ψ(u) ≤ e^{−γu} for all u.

Proof. The upper inequality is clear from Lundberg's inequality for ψ̃(u). For the lower, it is well known that R̃_{τ̃(u)−} has a limit distribution given τ̃(u) < ∞ as u → ∞ (see Proposition V.7.4). Hence there exists an A such that

   P(τ̃(u) < ∞, R̃_{τ̃(u)−} ≤ A) ≥ (1 − ε)ψ̃(u)    (4.4)

for all large u. Define the pre-τ̃(u) occupation measure Q^{(u)} by

   Q^{(u)}(G) = E ∫₀^{τ̃(u)} I(S̃_t ∈ G) dt ,  G ⊆ (−∞, u) .

Then the l.h.s. of (4.4) is

   ∫_{u−A}^u β̃(1 − B̃(u − x)) Q^{(u)}(dx) ,
which is bounded above by β̃Q^{(u)}(u − A, u). Clearly, we can choose ℓ₁ with P(Z > A, L ≤ ℓ₁) > 0. Every ruin event for R̃_t will also cause ruin for R_t if the initial surplus u is lowered by ℓ₁, given that the variable L corresponding to the batch claim causing ruin does not exceed ℓ₁. Moreover, considering only the situation where the surplus prior to ruin is bounded above by A, we obtain a lower bound for the ruin probability of R_t:

   ψ(u − ℓ₁) ≥ β̃ ∫_{u−A}^u P(Z > u − x, L ≤ ℓ₁) Q^{(u)}(dx)
             ≥ β̃Q^{(u)}(u − A, u) P(Z > A, L ≤ ℓ₁)
             ≥ P(Z > A, L ≤ ℓ₁)(1 − ε)ψ̃(u) .
Appealing to the Cramér-Lundberg asymptotics for ψ̃(u), the proof is complete. □

Let us now turn to heavy-tailed claim size distributions B.

Theorem 4.3 Assume both B ∈ S and B₀ ∈ S, and E e^{θH(∞,Y)} < ∞ for some θ > 0. Then

   ψ(u) ∼ (µ/(1 − µ)) B̄₀(u) .    (4.5)

In the proof, we shall employ coupling with the batch process R̃_t defined above. Clearly S̃_t ≥ S_t in the sense of sample paths, and so it is trivial that ψ(u) ≤ ψ̃(u). The next lemma shows that ψ̃(u) has the claimed asymptotics, establishing the asymptotic upper bound in (4.5).

Lemma 4.4 Under the assumptions of Theorem 4.3, ψ̃(u) ∼ (µ/(1 − µ)) B̄₀(u).

Proof. Conditioning upon Y, we get

   E z^{N(Y)} = E exp{H(∞, Y)(z − 1)} ,

which, under the assumptions of Theorem 4.3, is finite for some z > 1 (implying that P(N(Y) = n) decreases geometrically fast in n). Hence Lemma X.2.2 implies

   P(Z > x) ∼ EH(∞, Y) B̄(x)    (4.6)

and subsequently

   1 − B̃(x) ∼ ((β + ζ EH(∞, Y))/β̃) B̄(x) ,
   1 − B̃₀(x) ∼ ((β + ζ EH(∞, Y))/(β̃µ_{B̃})) ∫ₓ^∞ B̄(z) dz = (µ/(β̃µ_{B̃})) B̄₀(x) = B̄₀(x) ,

using in the last step that β̃µ_{B̃} = βµ_B + ζ EH(∞, Y)µ_B = µ.
Finally, we have by Theorem X.2.1

   ψ̃(u) ∼ (µ/(1 − µ)) B̄₀(u) . □
Proof of Theorem 4.3. Consider the aggregate claim process Ă_t obtained from A_t by moving all claims triggered by a catastrophic event and occurring at most ℓ₀ time units later to the time precisely ℓ₀ time units after the catastrophic event, whereas claims occurring more than ℓ₀ time units later are deleted. Then ψ(u) ≥ ψ̆(u) for all u. Standard results on translation of Poisson processes imply that the restriction of Ă_t − t to t ∈ [ℓ₀, ∞) is an ordinary Cramér-Lundberg risk process, and by reasoning as in the proof of Lemma 4.4, we obtain

   P( sup_{t∈[ℓ₀,∞)} (Ă_t − Ă_{ℓ₀} − (t − ℓ₀)) > u ) ∼ (µ(ℓ₀)/(1 − µ(ℓ₀))) B̄₀(u) ,    (4.7)

where µ(ℓ₀) = µ_B(β + ζ EH(ℓ₀, Y)). Now

   sup_{t∈[0,∞)} (Ă_t − t) ≥ (Ă_{ℓ₀} − ℓ₀) + sup_{t∈[ℓ₀,∞)} (Ă_t − Ă_{ℓ₀} − (t − ℓ₀)) .    (4.8)

Here the two terms are independent. Since Ă_{ℓ₀} is the sum of a Poisson(βℓ₀) number of claims, P(Ă_{ℓ₀} − ℓ₀ > u) ∼ βℓ₀B̄(u), which is dominated by (4.7). Hence the tail of sup_{t∈[0,∞)}(Ă_t − t) is asymptotically given by (4.7), and we get

   lim inf_{u→∞} ψ(u)/B̄₀(u) ≥ lim inf_{u→∞} ψ̆(u)/B̄₀(u) = µ(ℓ₀)/(1 − µ(ℓ₀)) .

Letting ℓ₀ → ∞ and using µ(ℓ₀) ↑ µ, we obtain

   lim inf_{u→∞} ψ(u)/B̄₀(u) ≥ µ/(1 − µ) .

Combining this with the bound ψ(u) ≤ ψ̃(u) and Lemma 4.4 completes the proof. □

Remark 4.5 Note that for both light- and heavy-tailed claims, the asymptotic behavior of the ruin probability is the same as for the compound Poisson batch process, which is the process where all claims triggered by a particular event occur directly at that time as one 'batch claim'. In other words, on an asymptotic scale, the ruin probability turns out to be insensitive to the introduced dependence of claim arrivals (delay of claim arrivals, respectively) in this model. □
Notes and references The risk model with a Poisson shot-noise intensity was first proposed in Dassios & Jang [274] for the specific form h(t, x) = xe^{−t}, which makes R_t a piecewise deterministic Markov process and then in principle enables an analysis with tools developed in Embrechts, Grandell & Schmidli [345]. Palmowski [678] uses a generator approach to derive an upper bound for the ruin probability for a general class of Cox processes generated by a diffusion process. For the estimation of the intensity from claim data, see Dassios & Jang [275]. The results given above are from Albrecher & Asmussen [12], where some further results on the corresponding aggregate claim sizes, finite-horizon ruin probabilities and the inclusion of adaptive premium rules can be found. It is also possible to add a further stochastic process ν_t in (4.1) that represents some transient behavior. The particular choice ν_t = Σ_{n∈Z₋} h(t − σ_n, Y_n) then makes the resulting process stationary in time, in which case R_t is a Poisson cluster process and Theorem 4.3 is covered by Theorem 3.1 of Asmussen, Schmidli & Schmidt [101]. Albrecher & Macci [34] provide sample path large deviations for the ruin probability of such a model in a Bayesian framework, where there is some uncertainty about involved parameters (see also Macci & Petrella [619]). Another approach to model ruin probabilities in the presence of catastrophes can be found in Cossette, Duchesne & Marceau [258].
5  Causal dependency models
Most of the models discussed so far contain dependence between claim sizes and/or their occurrence times through some common environment conditions. However, sometimes a causal dependence model may be needed in practice, where for instance the size of a claim determines the distribution of the next interclaim time (think e.g. of insurance of earthquake damages, where a large claim coming from an earthquake event may be followed by one from an aftershock etc.). It turns out that an example of a dependency model of that kind, where each interclaim time depends on the size of the previous claim, can conveniently be embedded in a semi-Markovian framework and in that way even allows explicit formulas for the ruin probability and related quantities. To see this, consider the surplus process R_t = u + t − Σ_{i=1}^{N_t} U_i with i.i.d. claims U_i (and generic claim size distribution U), and assume that the time T_{i+1} between the ith claim U_i and the (i+1)th claim U_{i+1} is exponentially distributed with parameter β_j if U_i ∈ F_j, where (F_j)_{j=1,…,M} is a (possibly random) partition of the positive halfline.

Let on the other hand {Z_n}_{n≥0} be an irreducible discrete-time Markov chain with state space {1, …, M} and transition matrix P = (p_{ij})_{1≤i,j≤M}, and consider the semi-Markovian model

   P(T_{n+1} ≤ x, U_{n+1} ≤ y, Z_{n+1} = j | Z_n = i, (T_r, U_r, Z_r), 0 ≤ r ≤ n) = (1 − e^{−β_i x}) p_{ij} B_j(y) .

Then the choices p_{ij} = P(U ∈ F_j) and B_j the conditional distribution of U given U ∈ F_j exactly correspond to the above causal dependency model. The net profit condition in this model is Σ_{i=1}^M π_i µ_i < Σ_{i=1}^M π_i β_i⁻¹, where π = (π₁, …, π_M) is the stationary distribution of {Z_n} and µ_i is the mean of the distribution B_i.

Let m_i(u) denote the Gerber-Shiu function (cf. XII.(1.1)) given that Z₀ = i. By the usual conditioning technique on the time interval (0, dt), or (more formally) using the generator approach, one obtains the system of IDEs (i = 1, …, M)

   m_i′(u) − (β_i + δ)m_i(u) + β_i Σ_{j=1}^M p_{ij} ∫₀ᵘ m_j(u − y) B_j(dy) + β_i Σ_{j=1}^M p_{ij} ∫ᵤ^∞ w(u, y − u) B_j(dy) = 0 ,

and via Laplace transforms we arrive at the matrix equation

   ((s − δ)I − Λ + ΛPB̂[−s]) m̂[−s] = m(0) − ΛP ω̂[−s] ,    (5.1)

where m(u) = (m₁(u), …, m_M(u)), m̂[−s] = (m̂₁[−s], …, m̂_M[−s]), ω̂[−s] = (ω̂₁[−s], …, ω̂_M[−s]) with ω̂_i[−s] = ∫₀^∞ e^{−sx} ∫ₓ^∞ w(x, y − x) B_i(dy) dx, Λ = diag(β₁, …, β_M) and B̂[−s] = diag(B̂₁[−s], …, B̂_M[−s]). As usual, we assume the boundary condition lim_{u→∞} m_i(u) = 0 (i = 1, …, M).

First, the quantities m_i(0) have to be determined. For that purpose, denote A_δ(s) = (s − δ)I − Λ + ΛPB̂[−s]. The equation

   det A_δ(s) = 0    (5.2)

now generalizes the Lundberg fundamental equation XII.(2.2). By a combination of complex analysis and linear algebra, one can show that (5.2) has M zeros ρ₁, …, ρ_M with positive real part for δ > 0, and that det A₀(s) = 0 has one zero ρ₁ = 0 and M − 1 zeros ρ₂, …, ρ_M with positive real part (see [5, 19] for details). The m_i(u) are bounded functions due to the boundary conditions, so the m̂_i[−s] are analytic functions for ℜ(s) > 0 (for s = 0 we further need integrability of m_i(u)), and for each of the M zeros ρ₁, …, ρ_M we can now proceed in the following way: determine a non-trivial solution k_i of A_δᵀ(ρ_i)k_i = 0 for each i = 1, …, M. Since we then have

   0 = m̂[−ρ_i]ᵀ A_δᵀ(ρ_i) k_i = (m(0) − ΛP ω̂[−ρ_i])ᵀ k_i ,

this gives M linear equations for m₁(0), …, m_M(0).
Remark 5.1 For δ = 0, the zeros ρ₁, …, ρ_M can always be obtained numerically. Moreover, if the involved claim size distributions have a rational Laplace transform, then m(u) can be obtained explicitly by inversion of the Laplace transform of the solution of (5.1). □

Example 5.2 To see how this can be put into practice, consider a causal dependency model where the (n+1)th interclaim time T_{n+1} is exponential(β₁) if U_n > Θ_n for some random threshold Θ_n and T_{n+1} is exponential(β₂) if U_n ≤ Θ_n. This corresponds to M = 2 and

   dB₁(y) = P(Θ ≤ y) dB(y) / P(Θ < U) ,   dB₂(y) = P(Θ > y) dB(y) / P(Θ > U) ,

and p_{i1} = P(U > Θ) and p_{i2} = P(U ≤ Θ) for i = 1, 2. Let Θ be exponential(2) and B exponential(1), β₁ = 1.5, β₂ = 0.5. Then

   P = ( 2/3  1/3 ; 2/3  1/3 ) ,   Λ = ( 1.5  0 ; 0  0.5 ) ,
   B̂₁[−s] = ½( 3/(1+s) − 3/(3+s) ) ,   B̂₂[−s] = 3/(3+s) .

For δ = 0, we obtain the determinant

   det A₀(s) = 3 − 8s + 4s² + (6s − 3)/(1+s) − 4s/(3+s) ,

which has one zero ρ₁ = 0 and one positive zero ρ₂ = 1.226; the two remaining zeros s = −0.065 and s = −3.161 are negative. E.g., for the ruin probabilities one obtains

   ψ₁(u) = 0.007 e^{−3.161u} + 0.938 e^{−0.065u} ,   ψ₂(u) = 0.003 e^{−3.161u} + 0.867 e^{−0.065u} . □
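The zeros reported in Example 5.2 are easy to verify numerically; the sketch below (ours, not from the book) bisects the scalar expression for det A₀(s) given above on brackets chosen to avoid the poles at s = −1 and s = −3:

```python
def det_A0(s):
    # scalar form of det A_0(s) from Example 5.2 (poles at s = -1 and s = -3)
    return 3 - 8 * s + 4 * s ** 2 + (6 * s - 3) / (1 + s) - 4 * s / (3 + s)

def bisect(f, lo, hi, tol=1e-12):
    flo = f(lo)
    assert flo * f(hi) < 0, "bracket must change sign"
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if flo * f(mid) <= 0:
            hi = mid
        else:
            lo, flo = mid, f(mid)
    return (lo + hi) / 2

rho2 = bisect(det_A0, 1.0, 1.5)       # positive zero, approx 1.226
neg1 = bisect(det_A0, -0.2, -0.01)    # approx -0.065
neg2 = bisect(det_A0, -3.5, -3.05)    # approx -3.161
```

The two negative zeros are exactly the decay rates e^{−0.065u} and e^{−3.161u} appearing in ψ₁(u) and ψ₂(u).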
Notes and references The explicit treatment of causal dependency models of the above kind can be found in Albrecher & Boxma [18, 19]; see also Adan & Kulkarni [5] for related dependency models in a queueing context. An extension to MAPs is given in Cheung & Landriault [240]. Note that for time-independent quantities (δ = 0) the change of the Poisson intensity can also be reinterpreted as a change of the premium intensity for constant Poisson intensity, and so an equivalent interpretation of the above model is to have dependence of the premium intensity between two claims on the size of the previous claim. Extensions of the model to include diffusion perturbation are studied in Zhou & Cai [917]; for an investigation of the Gerber-Shiu function for more general Markovian arrival processes via a fluid flow approach that avoids determining
the roots of the Lundberg fundamental equation, see Ahn & Badescu [8] and the recent survey Badescu & Landriault [119]. Yang [900] studies ruin-related quantities for a risk process that is itself a Markov chain, which also has relevance in credit risk applications. Finite-time ruin probabilities for regularly varying claim sizes and dependence that varies according to a Markovian environment process are studied in Biard, Lefèvre & Loisel [162]. Portfolios of life insurance contracts contain certain dependencies that are different from the ones of non-life portfolios. For the calculation of ruin probabilities in such a situation we refer to Frostig & Denuit [378].
6  Dependent Sparre Andersen models
As discussed in Section VI.3a, in the Sparre Andersen model the representation

   R_n = u + Σ_{k=1}^n (T_k − U_k) ,   n ≥ 0,

reveals an imbedded random walk structure of the risk process with independent increments T_k − U_k (which is the difference of the interoccurrence time and the claim size). This random walk description enables the application of a number of classical random walk techniques to the study of ruin probabilities and related quantities. If one now assumes that T_k and U_k are not independent, but have some joint distribution, then the random walk structure is still preserved as long as (T_k, U_k), k ≥ 1, is an i.i.d. sequence of bivariate random vectors. In other words, one can allow the interoccurrence time and the following claim to be dependent (which will change the increment distribution of T_k − U_k) and still use the random walk framework. Recall that A and B are the distribution functions of the r.v.'s T_k and U_k, respectively. Let κ(s) denote the c.g.f. of the increment r.v. T_k − U_k, i.e. e^{κ(s)} = E e^{s(T_k − U_k)}. If the dependence between T_k and U_k is described by a copula function C(a, b), then a simple calculation gives that κ(s) (in its domain of convergence) is given by

   e^{κ(s)} = Â[s]B̂[−s] − s² ∫₀¹ ∫₀¹ e^{−sB⁻¹(a)} e^{sA⁻¹(b)} (C(a, b) − ab) dA⁻¹(b) dB⁻¹(a) .    (6.1)
This formula shows quite explicitly how the dependence structure (expressed through the copula) and the marginal distributions A and B influence the shape of κ(s). In particular, for independent interoccurrence times and claims we have C(a, b) = ab, so the second term in (6.1) represents the correction for the introduced dependence. Since a number of asymptotic random walk properties can
be read off from the shape of κ(s), one can now study the effect of dependence by investigating the resulting κ(s). For instance, it is clear from (6.1) that positive quadrant dependence between T and U (i.e. C(a, b) ≥ ab for all 0 ≤ a, b ≤ 1) implies that κ(s) is for all s smaller than the one for independence. In case an adjustment coefficient γ exists, it will be the solution of κ(s) = 0 and so γ will be larger for this kind of positive dependence. More generally, whenever there is concordance ordering for two copulas (i.e. C₁(a, b) ≥ C₂(a, b) for all 0 ≤ a, b ≤ 1), then γ₁ ≥ γ₂. Also, the minimum of κ(s) (which is modified through the dependence) reveals convergence rates of finite-time ruin probabilities (see the related Theorem V.4.5 and Veraverbeke & Teugels [864]). For particular choices of the copula and the marginal distributions, explicit expressions are possible.

Notes and references The model discussed in this section was introduced in Albrecher & Teugels [36], where asymptotics of finite- and infinite-time ruin probabilities and their orderings were investigated. Boudreault, Landriault & Marceau [192], Cossette, Marceau & Marri [262], Badescu, Cheung & Landriault [117] and Ambagaspitiya [47] establish explicit formulas for the ruin probability and Gerber-Shiu function for specific dependence structures within this model. An approach based on defective renewal equations is given in Cheung, Landriault, Willmot & Woo [241]. For a survey on dependence concepts and copulas in general, see e.g. Joe [508], Nelsen [656] and McNeil, Frey & Embrechts [633]. Models in which dependence is introduced through the aggregation of several lines of business are discussed in Section 9.
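As an illustration of how (6.1) can be used (our sketch, not from the book; the marginals and copula are chosen for convenience): for exponential marginals coupled by a Farlie-Gumbel-Morgenstern copula C(a, b) = ab + θab(1 − a)(1 − b), which is positively quadrant dependent for θ > 0, the copula correction integral can be evaluated in closed form. With T exponential(0.8) and U exponential(1), the independent case θ = 0 has adjustment coefficient exactly 0.2, and one checks that θ > 0 increases γ, in line with the concordance-ordering statement above:

```python
def gain_mgf(r, theta, lam_t=0.8, lam_u=1.0):
    """E exp{r(U - T)} (= exp{kappa(-r)} in the notation of the text) for
    U ~ exp(lam_u), T ~ exp(lam_t) coupled by an FGM copula with parameter
    theta, whose density is 1 + theta*(1-2a)*(1-2b); the correction term is
    the FGM case of the copula integral in (6.1), integrated in closed form."""
    indep = (lam_u / (lam_u - r)) * (lam_t / (lam_t + r))
    corr_u = 2 * lam_u / (2 * lam_u - r) - lam_u / (lam_u - r)  # < 0 for r > 0
    corr_t = 2 * lam_t / (2 * lam_t + r) - lam_t / (lam_t + r)  # > 0 for r > 0
    return indep + theta * corr_u * corr_t

def adjustment_coefficient(theta, lo=1e-9, hi=0.99, tol=1e-12):
    # the mgf equals 1 at r = 0, dips below 1 (net profit condition) and
    # blows up as r -> lam_u, so bisect for the positive root of mgf = 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gain_mgf(mid, theta) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Here γ(0) = 0.2 while γ(0.5) ≈ 0.225, so the assumed positive dependence indeed enlarges the adjustment coefficient; θ < 0 (negative quadrant dependence) shrinks it.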
7
Gaussian models. Fractional Brownian motion
When modeling the reserve process R or the claim surplus process S = u − R of an insurance company, individual claims may be more or less important to take into account compared to aggregation. That is, one may either choose to incorporate jumps, as in the Cramér–Lundberg model or Lévy processes, or to use a continuous approximation. As examples of continuous approximations, we have already seen Brownian motion and more general diffusions. However, in the overall class of stochastic processes the most obvious other model choice that comes to mind is Gaussian processes. In fact, this alternative has within the last decade become popular within the area of queueing theory, and in many cases the problems studied there have as their main ingredient a ruin problem, as explained in more detail below.

A process {X_t} (with t ≥ 0 or −∞ < t < ∞) is Gaussian if for all t_1 < t_2 < · · · < t_n the vector (X_{t_1}, ..., X_{t_n}) has a multivariate normal distribution. In the following, S is a Gaussian process with ES_t = −µt (where µ > 0) and variance function v(t) = Var S_t, and the ruin time is τ(u) = inf{t > 0 : S_t > u}. Thus, we are back to a ruin problem.

Ruin problems for Gaussian processes (or equivalently, to say something on their maxima over infinite or finite time horizons) are notoriously difficult. We shall here concentrate on one approximation method, that of the largest term, which consists in approximating ψ(u) by the tail

    P(S_t > u) = ∫_u^∞ (2πv(t))^{−1/2} exp{−(x + µt)²/2v(t)} dx

of S_t at u for that t = t* for which the density is maximal. One uses Mill's ratio to approximate the above tail by

    (2πv(t))^{−1/2} u^{−1} exp{−(u + µt)²/2v(t)}.
As a final approximation, one ignores the prefactor to the exponential, so that the largest term approximation becomes

    ψ(u) ≈ max_{t≥0} exp{−(u + µt)²/2v(t)} = exp{−min_{t≥0} (u + µt)²/2v(t)}
         = exp{−(u + µt*)²/2v(t*)}.                                        (7.3)

Example 7.2 Assume that S is standard Brownian motion, so that v(t) = t. The minimization problem is equivalent to minimizing 2 log(u + µt) − log t, which by differentiation gives

    0 = 2µ/(u + µt*) − 1/t*,   i.e.   t* = u/µ.

Insertion in (7.3) gives ψ(u) ≈ e^{−2µu}, which we recognize as the exact value (cf. II.(2.5) with σ = 1). □

²σ(K) is a constant that does not need to concern us here.
Example 7.3 Assume, more generally, that S = B_H is fBm, so that v(t) = t^{2H}. Proceeding in the same way, we get

    0 = 2µ/(u + µt*) − 2H/t*,   i.e.   t* = (u/µ) · H/(1 − H).

Insertion in (7.3) gives the approximation

    ψ(u) ≈ exp{ −(1/2) (u/(1−H))^{2−2H} (µ/H)^{2H} }.                      (7.4)  □
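The closed-form optimizers above are easy to confirm numerically. The following sketch (the parameter values u = 10, µ = 0.5, H = 0.75 are my own illustrative choices, not from the text) grid-minimizes (u + µt)²/(2v(t)) for v(t) = t and v(t) = t^{2H} and checks the result against the formulas of Examples 7.2 and 7.3:

```python
import math

def largest_term_exponent(u, mu, v, t_lo=1e-6, t_hi=None, n=200_000):
    """Grid-minimize (u + mu*t)^2 / (2*v(t)) over t > 0."""
    if t_hi is None:
        t_hi = 100 * u / mu  # generous upper bound for the optimizer
    best_t, best_val = None, float("inf")
    for k in range(1, n + 1):
        t = t_lo + (t_hi - t_lo) * k / n
        val = (u + mu * t) ** 2 / (2 * v(t))
        if val < best_val:
            best_t, best_val = t, val
    return best_t, best_val

u, mu, H = 10.0, 0.5, 0.75

# Brownian motion: v(t) = t, closed form t* = u/mu, exponent 2*mu*u
t_bm, e_bm = largest_term_exponent(u, mu, lambda t: t)
assert abs(t_bm - u / mu) < 0.05
assert abs(e_bm - 2 * mu * u) < 1e-3

# fBm: v(t) = t^(2H), closed form t* = (u/mu)*H/(1-H), exponent as in (7.4)
t_fbm, e_fbm = largest_term_exponent(u, mu, lambda t: t ** (2 * H))
t_star = (u / mu) * H / (1 - H)
e_star = 0.5 * (u / (1 - H)) ** (2 - 2 * H) * (mu / H) ** (2 * H)
assert abs(t_fbm - t_star) < 0.05
assert abs(e_fbm - e_star) < 1e-3
```

A crude grid suffices here because the exponent is smooth and convex near its minimizer; any one-dimensional optimizer could replace it.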
Remark 7.4 Approximation (7.4) shows that in the fBm case, the largest term approximation for ψ(u) has a 'Weibull-like' decay with exponent r = 2 − 2H (for BM, we of course recover the exponential form). That is, the decay is slower the smaller r is (see also Section 1). This phenomenon can be explained from covariance properties of fBm. Indeed, the covariances between increments can be shown to be negative when H < 1/2 and positive when H > 1/2. Thus a period of increase is typically followed by one of decrease when H < 1/2. In other words, the increments compete to keep S low when H < 1/2, whereas they collaborate when H > 1/2. A similar phenomenon exhibits itself in the path properties: fBm has rougher paths the smaller H is (cf. Figure XIII.4, which contains sample paths of fBm with H = 0.25, 0.5, 0.75 and 0.95).
[Figure XIII.4: sample paths of fBm with H = 0.25, 0.5, 0.75 and 0.95]

A third example is integrability properties of the covariance function: the sum

    Σ_{n=1}^∞ E[S_1 (S_{n+1} − S_n)]
converges for H < 1/2 but diverges for H > 1/2. Divergence of such sums or integrals is often referred to as long-range dependence, and the reason to focus on precisely this property is that certain CLTs hold if and only if there is convergence. □

The exact asymptotics of ψ(u) is in fact known for fBm (Piterbarg & Hüsler [487], Narayan [655]):

    ψ(u) ∼ C_H u^{2H−3+1/H} exp{ −(1/2) (u/(1−H))^{2−2H} (µ/H)^{2H} }      (7.5)
for some constant C_H (that is, there is a power prefactor to (7.4)). The constant C_H can, however, hardly be said to be explicit since it involves the so-called Pickands constant, a quantity that shows up also in other aspects of Gaussian process theory and is basically unknown.

Example 7.5 As a final example, we consider another popular model from queueing studies, an integrated Ornstein–Uhlenbeck process of the form X_t = ∫_0^t Y_v dv, where Y is a stationary version of the Ornstein–Uhlenbeck process defined as the solution to dY_t = −Y_t dt + dB_t. Here one can check that v(t) = t − 1 + e^{−t}. The optimizer t* is not readily computed, but one can use large-u asymptotics: if u is large, then so of course is t*, and one has v(t) ∼ t as t → ∞. Thus, for large u we expect t* to have the same asymptotic form as in Example 7.2 with Brownian motion, and we get the same approximation e^{−2µu} as for that case. □

A main justification for the largest term approximation is that it is simple to compute, as seen from the examples. Another one is that it provides the correct logarithmic asymptotics:

    log ψ(u) ∼ −(u + µt*)²/2v(t*),                                          (7.6)
see Dębicki [280]. Finally it should be mentioned that the largest term approach also suggests that τ(u) is of order t*, thereby giving some information on the time horizon where ruin is most likely. Such information could be valuable, e.g., in a simulation study where attacking an infinite horizon is in general infeasible and one could choose to simulate only up to time kt* for some suitably chosen k > 1.

Notes and references The literature on ruin problems for Gaussian processes is huge. An accessible recent survey is Mandjes [629]. Other main textbook references on aspects of Gaussian processes are Adler [6] and Rue & Held [756]. For convergence of stochastic processes, see Whitt [883] (motivating in many cases Gaussian models).
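The large-u heuristic of Example 7.5 can also be checked numerically. The sketch below (illustrative values of my own) uses v(t) = t − 1 + e^{−t}, the variance of ∫_0^t Y_v dv for the stationary OU process Y (which indeed satisfies v(t) ∼ t), grid-minimizes the exponent and watches its ratio to the Brownian value 2µu approach 1:

```python
import math

def min_exponent(u, mu, v, t_hi, n=200_000):
    """Grid-minimize (u + mu*t)^2 / (2*v(t)) over t in (0, t_hi]."""
    best = float("inf")
    for k in range(1, n + 1):
        t = t_hi * k / n
        best = min(best, (u + mu * t) ** 2 / (2 * v(t)))
    return best

mu = 0.5
v = lambda t: t - 1 + math.exp(-t)  # variance of the integrated stationary OU

ratios = {}
for u in (10.0, 100.0, 1000.0):
    ratios[u] = min_exponent(u, mu, v, t_hi=50 * u / mu) / (2 * mu * u)
print(ratios)  # decreases towards 1 as u grows
```

Since v(t) < t for all t > 0, the exponent is strictly larger than 2µu for every finite u, so the ratio approaches 1 from above.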
Apart from the largest term approach studied here, some trends in the literature on ruin problems for Gaussian processes are many-sources asymptotics (generalizing n → ∞ in the on-off model) and the so-called double sum approach, see again [629]. Ruin-type problems for fBm and other self-similar processes were also investigated in Michna [637], Dębicki, Michna & Rolski [282], Hüsler & Piterbarg [488] and Frangos, Vrontos & Yannacopoulos [369]. For asymptotic results on the time of ruin see e.g. Hüsler & Piterbarg [489].
8
Ordering of ruin probabilities
We have already seen some ordering results on ruin probabilities in Section IV.8 and Section VII.4. Such ordering results can be very helpful, especially in situations where quantitative results for ruin probabilities under dependence are difficult to obtain. In this section, we collect a few further ordering results in connection with discrete-time models with dependent increments.

Consider the discrete-time risk process R_n = u + Σ_{i=1}^n X_i, where X_i is the net income of year i. Assume that the r.v.s X_i are dependent and light-tailed and that the assumptions of Theorem 1.2 are fulfilled. Then we still have an exponential decay of the ruin probability with adjustment coefficient γ defined by κ(γ) = 0. In this setup one can now compare streams of net incomes w.r.t. their resulting adjustment coefficient (recall the notion of convex ordering of Section IV.8).

Proposition 8.1 Assume that X_1, X_2, ... and X̃_1, X̃_2, ... both fulfill the assumptions of Theorem 1.2. If Σ_{i=1}^n X_i ≺cx Σ_{i=1}^n X̃_i for all n ∈ ℕ, then γ ≥ γ̃.

Proof. The exponential function is convex, so we have E e^{θ Σ_{i=1}^n X_i} ≤ E e^{θ Σ_{i=1}^n X̃_i} and subsequently κ(θ) ≤ κ̃(θ) for all θ ∈ ℝ, from which the assertion follows. □

If we now want to compare streams of net incomes with the same marginal distributions, but different dependence structure, the so-called supermodular order is a helpful concept. A function f: ℝⁿ → ℝ is a supermodular function if for any x, y ∈ ℝⁿ

    f(x) + f(y) ≤ f(x ∧ y) + f(x ∨ y),

where the operators ∧ and ∨ denote the componentwise minimum and maximum, respectively (if f is twice differentiable, then supermodularity means that ∂²f/(∂x_i ∂x_j) ≥ 0 for all 1 ≤ i < j ≤ n). Two random vectors X, X̃ are in supermodular order (X ≺sm X̃) if Ef(X) ≤ Ef(X̃) for all supermodular functions f: ℝⁿ → ℝ. Since both functions y ↦ I(y > x) and y ↦ I(y ≤ x)
are supermodular for each fixed x, it is clear that if X ≺sm X̃, the marginal distributions of X and X̃ have to coincide.

Proposition 8.2 If X ≺sm X̃, then Σ_{i=1}^n X_i ≺cx Σ_{i=1}^n X̃_i.

Proof. Simply note that Ef(X) ≤ Ef(X̃) for all supermodular functions f: ℝⁿ → ℝ and in particular also for those supermodular functions that are nondecreasing and convex in each component. Let φ(x) = x_1 + · · · + x_n. Then for every nondecreasing convex function h clearly g(x) = h(φ(x)) is nondecreasing and convex. But the supermodularity implies Eφ(X) = Eφ(X̃), so that we obtain φ(X) ≺cx φ(X̃). □

From Proposition 8.1 we thus get the following criterion:

Corollary 8.3 Assume that X_1, X_2, ... and X̃_1, X̃_2, ... both fulfill the assumptions of Theorem 1.2. If (X_1, ..., X_n) ≺sm (X̃_1, ..., X̃_n) for all n ∈ ℕ, then γ ≥ γ̃.

A random vector X = (X_1, ..., X_n) is called associated if Cov(f(X), g(X)) ≥ 0 for all nondecreasing functions f, g: ℝⁿ → ℝ. The following result is another indication that positive dependence among the risks in the insurance portfolio is dangerous.

Proposition 8.4 Assume that (X_1, ..., X_n) is associated for all n ∈ ℕ and that X̃_1, ..., X̃_n is a sequence of independent random variables with the same marginals. Then γ ≤ γ̃.

Proof. By Proposition 8.2 it suffices to show that Σ_{i=1}^n X_i ≺cx Σ_{i=1}^n X̃_i, and since E(Σ_{i=1}^n X_i) = E(Σ_{i=1}^n X̃_i), it even suffices to show that

    Σ_{i=1}^n X_i ≺icx Σ_{i=1}^n X̃_i.                                      (8.1)

We proceed inductively. For n = 1, this statement is clearly fulfilled. Assume now that (8.1) holds. Since the ≺icx order is closed under convolution, we then have Σ_{i=1}^{n+1} X_i ≺icx Σ_{i=1}^n X̃_i + X_{n+1}. Choosing appropriate indicator functions in the definition of association, it is clear that

    P(Σ_{i=1}^n X̃_i ≤ x_1) P(X_{n+1} ≤ x_2) = P(Σ_{i=1}^n X̃_i ≤ x_1, X_{n+1} ≤ x_2)
                                             ≤ P(Σ_{i=1}^n X̃_i ≤ x_1, X̃_{n+1} ≤ x_2)

for all x_1, x_2 ≥ 0. But in view of the stop-loss order interpretation IV.(8.1) of the ≺icx order, it then follows from the general representation

    E(Z_1 + Z_2 − d)⁺ = E(Z_1) + E(Z_2) − d + ∫_0^d P(Z_1 ≤ x, Z_2 ≤ d − x) dx

that Σ_{i=1}^n X̃_i + X_{n+1} ≺icx Σ_{i=1}^{n+1} X̃_i. The assertion now follows from the transitivity of the ≺icx order. □

This result is often useful, as in many situations association can be shown by a combination of the following properties:
• If (X_1, ..., X_n) are independent, then the vector X = (X_1, ..., X_n) is associated.
• If X is associated and f_1, ..., f_n: ℝ → ℝ are nondecreasing functions, then (f_1(X_1), ..., f_n(X_n)) is associated.
• If X is associated and f_1, ..., f_k: ℝⁿ → ℝ are nondecreasing functions, then (f_1(X), ..., f_k(X)) is also associated.

Notes and references A general reference for stochastic orderings in the context of actuarial science is Denuit et al. [292]. Some ordering results of the adjustment coefficient under dependence can be found in Müller & Pflug [652]; see also Frostig [375]. Stochastic orderings for random sums have (in view of the Pollaczeck–Khinchine formula) also implications on ruin probabilities. Such results for a given ordering of the involved claim number r.v. are given by Denuit, Genest & Marceau [294]; for dependence between the number of claims and their individual distribution, see Belzunce et al. [155].
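To make Propositions 8.1–8.4 concrete, here is a toy Gaussian computation (my own illustrative example, not from the text). A stationary Gaussian AR(1) income stream with autocorrelation φ ≥ 0 has nonnegative correlations and hence is associated, and it shares its N(m, σ²) marginals with an i.i.d. stream; for Gaussian sums the limiting cumulant (1/n) log E e^{−θ Σ X_i} equals −θm + θ² s²_lim/2 with s²_lim = σ²(1+φ)/(1−φ), so both adjustment coefficients are explicit:

```python
# Adjustment coefficient gamma solving -gamma*m + gamma^2 * s2_lim / 2 = 0
# for a Gaussian income stream with mean m and limiting variance rate s2_lim.
def gamma_iid(m, sigma2):
    return 2 * m / sigma2                      # here s2_lim = sigma^2

def gamma_ar1(m, sigma2, phi):
    s2_lim = sigma2 * (1 + phi) / (1 - phi)    # lim Var(X_1+...+X_n) / n
    return 2 * m / s2_lim

m, sigma2 = 1.0, 4.0
for phi in (0.0, 0.3, 0.6):
    g_dep, g_ind = gamma_ar1(m, sigma2, phi), gamma_iid(m, sigma2)
    assert g_dep <= g_ind  # positive dependence lowers gamma (Prop. 8.4)
    print(phi, round(g_dep, 4), g_ind)
```

The stronger the positive dependence, the smaller the adjustment coefficient, in line with Corollary 8.3 and Proposition 8.4; the Gaussian marginals are of course only a stylized stand-in for net incomes.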
9
Multidimensional risk processes
Assume now that we have n possibly dependent portfolios (or lines of business) described through the vector R_t = (R_t^1, ..., R_t^n) of risk reserve processes with initial capital vector u = (u_1, ..., u_n) and one Poisson process N_t with intensity β that generates a claim in each of the components, represented through the claim vector U_i = (U_{i1}, ..., U_{in}). With a premium intensity vector p, the multivariate risk reserve process is given by

    R_t = u + t p − Σ_{i=1}^{N_t} U_i,    t ≥ 0.                            (9.1)
Here U_1, U_2, ... is a sequence of i.i.d. random vectors with joint distribution function B(x_1, ..., x_n), joint m.g.f. B̂[r_1, ..., r_n] = E exp(r_1 U_{11} + · · · + r_n U_{1n}) and marginal distributions B_1(x_1), ..., B_n(x_n) (so in general the components of the claim vector U_i may be dependent). It is easy to think of a number of situations where such a model applies, namely that one event or accident causes a claim in several lines of business or several portfolios.

For such a risk process, there are now several ways to define the event of ruin, and it will depend on the situation which one is appropriate. Let τ_max be the first time when all of the components are negative, i.e.

    τ_max(u) = inf{t > 0 : R_t < 0} = inf{t > 0 : max{R_t^1, ..., R_t^n} < 0},

where inequalities for vectors are meant componentwise. The corresponding finite-time ruin probability is ψ_max(u, T) = P(τ_max(u) ≤ T) and the infinite-time ruin probability is ψ_max(u) = P(τ_max(u) < ∞). Other types of ruin times are

    τ_min(u) = inf{t > 0 : min(R_t^1, ..., R_t^n) < 0}

and

    τ_sum(u) = inf{t > 0 : R_t^1 + · · · + R_t^n < 0}.
Remark 9.1 Obviously the ruin probability ψ_sum(u) = P(τ_sum(u) < ∞) reduces the problem again to a univariate problem with u = u_1 + · · · + u_n and i.i.d. claims U_i = U_{i1} + · · · + U_{in} (so each U_i is a sum of n dependent r.v.s). In this case the multivariate framework is then just the model setup to specify the dependence that determines the distribution of U_i and through that the ruin probability.³ In particular, one can now ask how dependence influences ψ_sum(u), either by quantifying the dependence structure or by studying stochastic ordering. In the Notes some references to corresponding work in the literature are given (in particular for more general multivariate point processes). In the remainder of this section we focus, however, on ruin definitions that leave the problem in a 'truly' multivariate setting. □

The following martingale is an extension of the Wald martingale of the univariate case.

³For the distribution of dependent sums, see Section XVI.2d.
Lemma 9.2 Let r_1, ..., r_n ∈ ℝ be such that B̂[r_1, ..., r_n] < ∞. Define

    κ(r_1, ..., r_n) = β B̂[r_1, ..., r_n] − β − p_1 r_1 − · · · − p_n r_n.

Then

    M_t = exp{−r_1 R_t^1 − · · · − r_n R_t^n − tκ(r_1, ..., r_n)},    t ≥ 0,

is a martingale w.r.t. the natural filtration F.

Proof. Since N_t is a homogeneous Poisson process, we get for all t, h ≥ 0

    E exp{ −Σ_{i=1}^n r_i (R_{t+h}^i − R_t^i) }
        = exp{ −Σ_{i=1}^n r_i p_i h + βh (B̂[r_1, ..., r_n] − 1) } = e^{hκ(r_1,...,r_n)}.

From this it follows that

    E(M_{t+h} | F_t) = E[ exp{−r_1 R_{t+h}^1 − · · · − r_n R_{t+h}^n − (t+h)κ(r_1, ..., r_n)} | F_t ]
                     = exp{−r_1 R_t^1 − · · · − r_n R_t^n − tκ(r_1, ..., r_n)} = M_t. □

Let us consider the situation of light-tailed marginal claim size distributions and define r_i^0 = sup{r_i : B̂[0, ..., 0, r_i, 0, ..., 0] < ∞} as the abscissa of convergence of the m.g.f. of the marginal r.v. U_{1i}. Define further the sets G = {(r_1, ..., r_n) ∈ ℝⁿ : B̂[r_1, ..., r_n] < ∞}, G⁰ = G ∩ (0, ∞)ⁿ and ∆ = {(r_1, ..., r_n) ∈ G : κ(r_1, ..., r_n) = 0}, ∆⁰ = ∆ ∩ (0, ∞)ⁿ. Let µ be the vector containing the expected values of the marginal claim size distributions.

Proposition 9.3 Assume that the componentwise net profit condition βµ < p holds. If r_i^0 > 0 for all i = 1, ..., n and sup_{(r_1,...,r_n)∈G⁰} κ(r_1, ..., r_n) > 0, then

    ψ_max(u) ≤ inf_{(r_1,...,r_n)∈∆⁰} e^{−r_1 u_1 − ... − r_n u_n}.
An example of the shape of the set ∆ for dimension n = 2 is illustrated in Fig. XIII.5 (the arrows are unimportant for the moment but will show up below in Remark 9.5).
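For a concrete evaluation of the bound in Proposition 9.3, take n = 2 with independent exponential claim components (an assumption of mine for illustration, with parameter values of my own), so that B̂[r_1, r_2] = (δ_1/(δ_1 − r_1))(δ_2/(δ_2 − r_2)). The sketch below traces ∆⁰ by solving κ(r_1, r_2) = 0 for r_2 by bisection and then minimizes e^{−r_1 u_1 − r_2 u_2} along the curve:

```python
import math

beta, d1, d2 = 1.0, 1.0, 1.0      # Poisson rate and exponential claim rates
p1, p2 = 1.5, 1.5                 # premium rates (beta/d_i < p_i holds)
u1, u2 = 5.0, 10.0                # initial capital in each line

def kappa(r1, r2):
    bhat = (d1 / (d1 - r1)) * (d2 / (d2 - r2))
    return beta * bhat - beta - p1 * r1 - p2 * r2

def r2_on_delta(r1, tol=1e-12):
    """Solve kappa(r1, r2) = 0 for the positive root r2 by bisection."""
    lo, hi = 1e-9, d2 - 1e-9
    if kappa(r1, lo) >= 0 or kappa(r1, hi) <= 0:
        return None  # no positive root for this r1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kappa(r1, mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# maximize r1*u1 + r2*u2 over Delta^0, i.e. minimize exp(-r1*u1 - r2*u2)
best = 0.0
r1 = 1e-3
while r1 < d1 - 1e-3:
    r2 = r2_on_delta(r1)
    if r2 is not None:
        best = max(best, r1 * u1 + r2 * u2)
    r1 += 1e-3
bound = math.exp(-best)
print("Lundberg-type bound for psi_max(u):", bound)
```

The optimum typically lies in the interior of ∆⁰ rather than at the axes, i.e. the two-dimensional bound is strictly sharper than what either marginal adjustment coefficient alone would give.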
[Figure XIII.5: The set ∆]
Proof of Proposition 9.3. Due to Lemma 9.2, M_t is a martingale and τ_max(u) is a stopping time. For every (r_1, ..., r_n) with B̂[r_1, ..., r_n] < ∞ we know from Lemma 9.2 that

    e^{−Σ_{i=1}^n r_i u_i} = E[M_t] ≥ E[M_t; τ_max(u) ≤ t]
                           = E[M_{τ_max(u)} | τ_max(u) ≤ t] P(τ_max(u) ≤ t).

For all (r_1, ..., r_n) ∈ G⁰,

    e^{−Σ_{i=1}^n r_i R^i_{τ_max(u)}} ≥ 1,

which thus leads to

    P(τ_max(u) ≤ t) ≤ e^{−Σ_{i=1}^n r_i u_i} sup_{0≤h≤t} e^{hκ(r_1,...,r_n)}.

It is easy to check by taking partial derivatives that along every ray from 0 into (0, ∞)ⁿ, κ(r_1, ..., r_n) is a continuous and convex function that (with the positive safety loading for each component) has negative derivative in 0, and by κ(0, ..., 0) = 0 and continuity it will hence satisfy κ(r_1, ..., r_n) = 0 for at least one (r_1, ..., r_n) ∈ G⁰, i.e. ∆⁰ is not empty. Hence we can write

    P(τ_max(u) ≤ t) ≤ inf_{(r_1,...,r_n)∈∆⁰} e^{−Σ_{i=1}^n r_i u_i}.

Letting t → ∞ then gives the result. □
Example 9.4 A particular boundary case of this multivariate risk model with practical relevance of its own is the two-dimensional model

    (R_t^1, R_t^2)ᵀ = (u_1, u_2)ᵀ + t (p_1, p_2)ᵀ − Σ_{i=1}^{N_t} (a, 1−a)ᵀ U_i,    t ≥ 0.

Here the dependence between the two claim components is the strongest possible, namely comonotonic dependence. The interpretation is that the (univariate) claims U_i are proportionally shared by two portfolios that may have different premium intensities p_1, p_2 (reflecting different safety loadings), and the question is for instance how to allocate initial capital u_1 and u_2 in a sensible way so as to minimize the ruin probability ψ_min. One immediately observes that τ_min(u) can also be represented as τ_min(u) = inf{t > 0 : Σ_{i=1}^{N_t} U_i > q(t)} with q(t) = min{(u_1 + p_1 t)/a, (u_2 + p_2 t)/(1 − a)}. So in fact this two-dimensional model can be treated as a one-dimensional crossing problem of a compound Poisson process over a piecewise linear barrier. If p_1/a > p_2/(1−a) and u_1/a > u_2/(1−a), then the barrier is linear and one is back to the classical risk model. Extensions of this capital allocation problem to higher dimensions and more general claim arrival processes are obvious. □
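The one-dimensional crossing representation makes a finite-horizon Monte Carlo estimate of ψ_min straightforward: the aggregate claim process jumps only at arrival epochs while q(t) increases between them, so a crossing can occur only at a claim instant. A minimal sketch with illustrative parameters of my own (exponential claims):

```python
import random

random.seed(1)
beta, claim_mean = 1.0, 1.0          # Poisson rate and exponential claim mean
a, p1, p2 = 0.6, 0.8, 0.6            # sharing fraction and premium rates
u1, u2, T = 4.0, 3.0, 50.0           # initial capitals and time horizon

def q(t):
    # the piecewise linear barrier of Example 9.4
    return min((u1 + p1 * t) / a, (u2 + p2 * t) / (1 - a))

def ruined_by_T():
    t, total = 0.0, 0.0
    while True:
        t += random.expovariate(beta)      # next claim epoch
        if t > T:
            return False
        total += random.expovariate(1 / claim_mean)
        if total > q(t):                   # crossing can only happen at a jump
            return True

n = 20_000
est = sum(ruined_by_T() for _ in range(n)) / n
print("MC estimate of psi_min(u1, u2, T):", est)
```

Both componentwise net profit conditions hold here (aβ·EU < p_1 and (1−a)β·EU < p_2), so the estimate stabilizes as T grows; varying the split of u_1 + u_2 then illustrates the capital allocation question.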
We let κ(θ) = κ(θ1 , θ2 ) = log Eeθ1 X1 +θ2 X2 where (X1 , X2 ) = S 1 . The exponentially tilted measure is a random walk with increment distribution satisfying £ ¤ Eθ1 ,θ2 h(X1 , X2 ) = E h(X1 , X2 ) exp {θ1 X1 + θ2 X2 − κ(θ1 , θ2 )} . (9.2)
[Figure XIII.6: The ruin set xA]
It easily follows that the changed drift under P_{θ_1,θ_2} is given by

    µ_{θ_1,θ_2} = E_{θ_1,θ_2}(X_1, X_2) = (κ_1, κ_2) = ∇κ(θ_1, θ_2),        (9.3)

where ∇ denotes the gradient and κ_i = ∂κ/∂θ_i. In Fig. XIII.5, the arrows pointing outward from ∆ are the gradients. The gradient is orthogonal to ∆ and at any given point, its length is twice the radius of curvature at the given point of ∆. Thus, we face the problem of which (row vector) γ ∈ ∆ to work with. A lower bound for z(x) is given by the probability of the path following the P_γ description, i.e. by

    z(x) = P(τ(x) < ∞) = E_γ e^{−γ·S_{τ(x)}} ≈ e^{−xγ·ξ(µ_γ)}.             (9.4)

This suggests taking γ = γ* where γ* = argmin_γ γ·ξ(µ_γ). It can be shown that (under appropriate conditions) indeed the correct logarithmic asymptotics corresponds to taking γ = γ* in (9.4) and that the corresponding exponential change of measure with θ = γ* leads to logarithmic efficiency; see Collamore [251]. □

If the claim components are heavy-tailed, the general picture is less complete. Here is a simple result on asymptotic finite-time ruin probabilities for a risk
process of the type (9.1) with independent components, but with a possibly more general claim number process:

Proposition 9.6 Assume that B(x_1, ..., x_n) = Π_{i=1}^n B_i(x_i) and B_i ∈ S. Assume further that E z^{N_T} < ∞ for some z > 1, where N_T is the number of claims up to time T. Then for fixed T > 0

    ψ_max(u, T) ∼ E[(N_T)ⁿ] Π_{i=1}^n B̄_i(u_i),    u → ∞.                  (9.5)

Proof. Since

    ψ_max(u, T) = P( Σ_{i=1}^{N_t} U_i − t p > u for some 0 < t ≤ T ),

we have the simple upper bound

    ψ_max(u, T) ≤ P( Σ_{i=1}^{N_T} U_i > u )
                = Σ_{m=0}^∞ P(N_T = m) Π_{j=1}^n P( Σ_{i=1}^m U_{ij} > u_j ).

By the subexponential property of the marginals, Lemma X.1.8, Lemma X.2.2 and dominated convergence, the latter is asymptotically equal to

    Σ_{m=0}^∞ P(N_T = m) mⁿ Π_{i=1}^n B̄_i(u_i) = E[(N_T)ⁿ] Π_{i=1}^n B̄_i(u_i),   (9.6)

so that we have the upper bound

    ψ_max(u, T) ≤ (1 + o(1)) E[(N_T)ⁿ] Π_{i=1}^n B̄_i(u_i).

Similarly, a lower bound for the finite-time ruin probability is

    ψ_max(u, T) ≥ P( Σ_{i=1}^{N_T} U_i − T p > u )
                = Σ_{m=0}^∞ P(N_T = m) Π_{j=1}^n P( Σ_{i=1}^m U_{ij} > u_j + p_j T ).

By the long-tailed property of the subexponential distribution, this is asymptotically also equal to (9.6), so that

    ψ_max(u, T) ≥ (1 + o(1)) E[(N_T)ⁿ] Π_{i=1}^n B̄_i(u_i)

and we have asymptotic equivalence. □
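For Poisson claim numbers the factor E[(N_T)ⁿ] in (9.5) is explicit, which makes the size of the joint effect easy to quantify. The sketch below (parameter choices and Pareto-type marginal tails are my own illustration) compares the approximation (9.5) for n = 2 with the product of the two marginal approximations:

```python
# n = 2 lines, Poisson(lam*T) claim number: E[N^2] = lam*T + (lam*T)^2.
# Approximation (9.5) with Pareto-type marginal tails Bbar_i(x) = (1+x)^(-alpha_i).
lam, T = 2.0, 1.0
alpha1, alpha2 = 1.5, 2.0
u1, u2 = 100.0, 200.0

m = lam * T
EN2 = m + m * m                     # second moment of a Poisson r.v.
bbar1 = (1 + u1) ** (-alpha1)
bbar2 = (1 + u2) ** (-alpha2)

psi_max_approx = EN2 * bbar1 * bbar2          # right-hand side of (9.5)
product_of_marginals = (m * bbar1) * (m * bbar2)

# Jensen: E[N^2] >= (E N)^2, so the joint approximation always dominates.
print(psi_max_approx / product_of_marginals)  # = EN2 / m**2 = 1.5
```

The ratio EN2/m² = 1 + 1/(λT) is exactly the inflation caused by the common claim number process; it disappears as λT → ∞.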
Remark 9.7 Note that by Jensen's inequality E[(N_T)ⁿ] ≥ E[N_T]ⁿ, so that ψ_max(u, T) is asymptotically larger than the product of the n marginal one-dimensional finite-time ruin probabilities, which can be explained by the common claim number process that governs the n components. □

Notes and references Although multivariate ruin theory is a very natural extension of classical ruin theory with a lot of potential applications also in fields outside of insurance (such as credit risk or barrier option pricing), this research field is not yet very far developed. As in Remark 9.5, the event of ruin can in general be defined as the first passage of R_t into an n-dimensional open set A that does not contain 0. An early paper in such a framework is Dembo, Karlin & Zeitouni [289] for multivariate Lévy processes. In the framework of Remark 9.5, Collamore [250, 251] also derived asymptotic results for the time of ruin; see in addition Borovkov & Mogul'skiĭ [186, 187, 188]. Huh & Kolkiewicz [483] deal with ruin probabilities for multivariate diffusions and applications to the pricing of credit risk products. The particular risk model (9.1) was investigated for two dimensions in Chan, Yang & Zhang [230]. The martingale approach was also implemented in Li, Liu & Tang [583], who worked out some concrete examples of dependence structures for two dimensions and illustrated that it is easily possible to extend Proposition 9.3 to the situation where a Brownian perturbation is added in each component of (9.1) with a joint correlation matrix (the form of κ is then correspondingly modified). Finite-time ruin probability approximations via a bivariate compound binomial model for two dimensions as well as some ordering results are given in Yuen, Guo & Wu [910] and Cai & Li [219]. In the latter paper also an explicit solution for ψ_sum(u) for multivariate phase-type claims is derived; see also Eisele [341] for a Panjer-type recursion and Sundt & Vernic [824] for a more general treatment.
Explicit results for ψ_max(u) and ψ_min(u) are usually out of reach (except for very simple situations, see e.g. Dang et al. [272]); however, it was shown in Cai & Li [219] that if the claim vectors are associated, then Π_{i=1}^n ψ_i(u_i) ≤ ψ_max(u) ≤ min_{1≤i≤n} ψ_i(u_i), where ψ_i(u_i) is the marginal ruin probability of the ith component. From this it is not hard to establish in two dimensions the bound

    max{ψ_1(u_1), ψ_2(u_2)} ≤ ψ_min(u_1, u_2) ≤ ψ_1(u_1) + ψ_2(u_2) − ψ_1(u_1)ψ_2(u_2).

The model of Example 9.4 with N_t being a renewal process is studied by Avram, Palmowski & Pistorius [111]. In particular, asymptotic results for light-tailed claim size distributions are derived for two dimensions. Related two-sided barrier crossing problems for compound Poisson processes and analogies to queueing problems are studied in Perry, Stadje & Zaks [693]. If U_i in (9.1) is multivariate regularly varying, quite explicit and intuitive asymptotic results can be obtained for renewal claim number processes, see Hult & Lindskog [485] with a slightly different definition of ruin. As mentioned in Remark 9.1, if the claim sizes in each business line and between business lines (i.e. components) are i.i.d. and the aim is to assess ψ_sum, then it is often
possible to transform more complicated multivariate point processes into simpler ones. One example is that N_t is a superposition of counting processes each of which causes claims only in a selection of components, in which case one can usually identify a one-dimensional reformulation of the model with a modified (mixed) claim distribution (see e.g. Yuen, Guo & Wu [909], Ambagaspitiya [46] and for stochastic orderings Frostig [375] and Lindskog & McNeil [599]). Pfeifer & Nešlehová [696] and Bäuerle & Grübel [143] use copulas and random time shifts to generate multivariate claim counting distributions with Poisson marginals; for another general flexible multivariate counting process, see Bäuerle & Grübel [144]. Ruin probabilities of the type ψ_sum and Lundberg bounds in a discrete multivariate autoregressive model are investigated in Zhang, Yuen & Li [916]. Since e^{r(x_1+···+x_n)} is a supermodular function, it is immediately clear that if U ≺sm Ũ in (9.1), then the respective adjustment coefficients for ψ_sum (if they exist) fulfill γ̃ ≤ γ. Extensions of this result to Cox models and finite-time Lundberg inequalities are given in Juri [513]; see also Macci, Stabile & Torrisi [620]. Bregman & Klüppelberg [198] show that if two compound Poisson processes are coupled by a Clayton Lévy copula, one can obtain quite explicit asymptotic results for ψ_sum. For stochastic ordering results of componentwise ruin times and ψ_sum in a general multivariate setup with Lévy copulas see Bäuerle, Blatter & Müller [142].

Another multivariate aspect is competing claim processes. That is, the reserve process is R_t = R_t^1 + · · · + R_t^n, and if ruin occurs, one may ask which of the n components caused it. In particular, if all of the R^i can only go downwards by a jump, it is a well-defined question which R^i actually performed the jump to make the reserve go negative. However, especially in the light-tailed case this does not tell the whole story: some other R^j may have taken R close to zero before ruin. For work in this direction, see Huzak et al. [491].
Chapter XIV
Stochastic control

1
Introduction
The purpose of stochastic control is to find strategies that are optimal in the sense of maximizing a suitably defined reward function. In the setting of this book, consider the risk reserve process. Time may be discrete or continuous, and the time horizon finite (deterministic or a stopping time) or infinite, and we will denote by T its upper limit. Assume given a set of possible actions. At each time t the controller then chooses one particular action u_t, and the function U = (u_t)_{t≤T} is denoted a strategy, the set of admissible strategies over which to maximize is 𝒰, and the reserve process governed by a particular strategy is {R_t^U} (for notational convenience {R_t} when it is clear what U is). We further assume that the reserve has a given initial value x = x_0.¹

In discrete time, the reward to be maximized over 𝒰 will have the form

    V_T^U(x) = E_x Σ_{t=0}^T r(R_t^U, u_t, t).                              (1.1)

Thus r(x, u, t) is the gain of using control u at reserve level x at time t. The function r may be negative, which corresponds to a loss, not a gain. It is common to assume that t only enters via a discounting factor δ, i.e. (with a slight abuse of notation) that r(x, u, t) = e^{−δt} r(x, u), but we will not always make this assumption. It is, however, convenient in continuous time problems, and in infinite-horizon problems one certainly needs r(x, u, t) to somehow decrease with t since otherwise the total V_T^U(x) may well be infinite.

¹Note that u is used in most of the rest of the book. We use a different symbol here to avoid confusion with the control.
The value V_T(x) of holding initial reserve x is obtained by maximizing over the set 𝒰 of strategies under consideration, i.e.

    V_T(x) = sup_{U∈𝒰} V_T^U(x).                                            (1.2)

The supremum may or may not be attained. When it is attained, the maximizer is denoted by U*. The function x ↦ V_T(x) is denoted the value function. In continuous time, the reward will have the form

    V_T^U(x) = r_T(R_T^U) + ∫_0^T r(R_t^U, u_t, t) dt.                      (1.3)

The added term r_T(R_T^U) corresponds to a terminal reward (or punishment). The value function is again defined by (1.2).

Example 1.1 Consider a risk process in discrete time, such that the amount of premiums received at each time instant t = 1, 2, ... is 1 and the claim amount for the time period (t−1, t] is a r.v. Y_t, such that the Y_t are i.i.d. and satisfy EY_t < 1. Thus, without any form of control we have

    R_0 = x,  R_1 = x + 1 − Y_1,  R_2 = R_1 + 1 − Y_2, ...                  (1.4)

and the time horizon T is the time τ of ruin when τ < ∞, and ∞ otherwise. As an example of a control problem, consider dynamic proportional reinsurance and minimization of the ruin probability. That is, at time t the company chooses to reinsure a proportion u_t ∈ [0, 1] of its portfolio so that Y_t is replaced by (1 − u_t)Y_t. If reinsurance is cheap, i.e. the premium income changes in the same proportion (replace 1 by 1 − u_t in (1.4)), this reduces variability and so potentially the ruin probability. However, in practice reinsurance is not cheap: the premium b(u_t) to pay for reinsurance will typically satisfy b(u) > u, so that the drift is reduced, which potentially increases the ruin probability, and there is a trade-off. Note that 𝒰 = [0, 1]^ℕ. The problem can be put into the framework (1.1) by taking T = ∞, [0, ∞) ∪ {∆} as state space, modifying R by letting R_t = ∆ when τ < t < ∞ (τ is the time of ruin) and taking the reward function as r(x, u, t) = −1 for x < 0 and 0 otherwise. Then the sum in (1.1) is −1 when τ < ∞ and 0 otherwise, and so V_T^U(x) is minus the probability of ruin under strategy U. □
and the time horizon T is the time τ of ruin when τ < ∞, ∞ otherwise. As an example of a control problem, consider dynamic proportional reinsurance and minimization of the ruin probability. That is, at time t the company chooses to reinsure a proportion ut ∈ [0, 1] of its portfolio so that Yt is replaced by (1 − ut )Yt . If reinsurance is cheap, i.e. the premium income changes in the same proportion (replace 1 by 1 − ut in (1.4)), this reduces variability and so potentially the ruin probability. However, in practice reinsurance is not cheap: the premium b(ut ) to pay for reinsurance will typically satisfy b(u) > u so that the drift is reduced which potentially increases the ruin probability and there is a tradeoff. Note that U = [0, 1]N . The problem can be put into the framework (1.1) by taking T = ∞, [0, ∞) ∪ {∆} as state space, modifying R by letting Rt = ∆ when τ < t < ∞ (τ is the time of ruin) and taking the reward function as r(x, u, t) = −1 for x < 0 and 0 otherwise. Then the sum in (1.1) is −1 when τ < ∞ and 0 otherwise, and so VTU (x) is minus the probability of ruin under strategy U . 2 In this chapter we will only work with feedback strategies. This means that we assume RU to be (possibly timeinhomogeneous) Markov with transition mechanism at time t depending on ut , and that ut is chosen as a function of RtU and t only.
Notes and references In the Markovian setting it may certainly seem counterintuitive that a u_t depending on some further characteristics of {R_s^U}_{s≤t} could be optimal. However, the problem is more complicated than it may look. For some discussion in discrete time, see Blackwell [171] and Bertsekas & Shreve [160]. In continuous time, one typically first finds a feedback strategy that is a candidate for being optimal and subsequently gives a proof of the optimality by a so-called verification theorem.
2
Stochastic dynamic programming
Stochastic dynamic programming is a method for solving the optimization problem (1.2) in discrete time. The model then means that R^U = (R_0^U, ..., R_T^U) is a Markov chain whose transition probabilities p(t, x, y, u) from t to t + 1 and from state x to state y depend on u = u_t. The idea is most readily explained by first assuming R^U to have a finite state space E, the set of all controls u to be finite, and T < ∞ to be deterministic. We then define the value function V(t, x) at time t ≤ T and in state x as

    V(t, x) = max_{U^t ∈ 𝒰^t} E_x Σ_{s=t}^{T} r(R_s^U, u_s, s),        (2.1)
where 𝒰^t is the set of all admissible U^t = (u_t, ..., u_T), and we denote by u*(t, x) a control which is optimal (it does not necessarily have to be unique). Clearly, the strategy U* given by u_0 = u*(0, x_0), u_1 = u*(1, R_1), ..., u_T = u*(T, R_T) is then optimal. To compute u*(t, x), one proceeds backward in time. At t = T, it is obvious that u*(T, x) = argmax_u r(x, u, T) and that V(T, x) = r(x, u*(T, x), T). Assume the values V(t + 1, y) have been computed for all y ∈ E. Then clearly

    V(t, x) = max_u [ r(x, u, t) + Σ_{y∈E} p(t, x, y, u) V(t + 1, y) ],        (2.2)

and u*(t, x) is a maximizer. Note that in this setting where everything is finite, we in principle have a finite maximization problem:

    V(0, x) = max_{u_0,...,u_T} E_x Σ_{t=0}^{T} r(R_t^U, u_t, t),        (2.3)
where U = (u_0, ..., u_T). The advantage of stochastic dynamic programming is to reduce complexity. Say we have p Markov states and q possible controls.
CHAPTER XIV. STOCHASTIC CONTROL
Using (2.3), the expectation is then a sum over all p^T possible Markov chain paths, where each sum contains T + 1 terms, and this has to be evaluated for all q^{T+1} possible control combinations. Thus the total complexity is (T + 1) p^T q^{T+1}. In contrast, each sum in (2.2) has p terms and has to be evaluated for q controls u and p Markov states x. Thus the complexity of each backward step is p²q, and the total complexity of the stochastic dynamic programming algorithm is T p²q + pq (the second term comes from the initial t = T step). This is typically an enormous saving over (T + 1) p^T q^{T+1}, not least in situations that are a discretization of a problem containing continuous components, so that p and/or q and/or T may be huge.

Beyond the finite setting just considered, one can in principle set up a rather similar scheme, but considerable difficulties will typically arise. If E and/or the set of controls is countable or continuous, it may be difficult just to compute E[R_{t+1} | R_t = x, u_t = u] and also to find the maximizer in closed form (these two steps are the analogues of (2.2)). Even more serious problems arise when there is no upper bound on T, because the initial step is then infeasible. Finally, the supremum in (1.2) may not be attained. A warning should be issued that the obvious idea of discretizing and truncating requires continuity properties that (maybe as a surprise!) need not always hold even in non-artificial settings. For example, if T = ∞ or T is a stopping time, replacing the time horizon by T ∧ n, using the above approach going back from T ∧ n and finally letting n → ∞ to get the T = ∞ optimal strategy as limit of the T ∧ n optimal strategies does not always work, see Schmidli [779, pp. 6–8].

Notes and references  For a list of standard texts in stochastic dynamic programming, see Schmidli [779, p. 8]. A ruin probability problem treated by stochastic dynamic programming is in Schäl [765].
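The backward recursion (2.2) is straightforward to implement. The following minimal sketch treats a hypothetical toy problem (two states, two controls, T = 2; the transition probabilities and rewards are invented for illustration) and checks the result against brute-force enumeration of all feedback policies:

```python
import itertools

E = [0, 1]      # toy state space (p = 2)
U = [0, 1]      # toy control set (q = 2)
T = 2           # horizon

def p(t, x, y, u):
    # hypothetical transition probabilities; control u changes the stickiness
    stay = 0.8 if u == 0 else 0.4
    return stay if y == x else 1.0 - stay

def r(x, u, t):
    # hypothetical one-step reward
    return x + 0.3 * u * (1 if x == 0 else -1)

def dp_value():
    # initial step t = T: V(T, x) = max_u r(x, u, T)   (complexity pq)
    V = {x: max(r(x, u, T) for u in U) for x in E}
    # backward steps, each of complexity p^2 q, cf. (2.2)
    for t in range(T - 1, -1, -1):
        V = {x: max(r(x, u, t) + sum(p(t, x, y, u) * V[y] for y in E)
                    for u in U)
             for x in E}
    return V

def policy_value(pi, t, x):
    # exact expected total reward of a feedback policy pi[(t, x)] -> u
    u = pi[(t, x)]
    if t == T:
        return r(x, u, T)
    return r(x, u, t) + sum(p(t, x, y, u) * policy_value(pi, t + 1, y)
                            for y in E)

slots = [(t, x) for t in range(T + 1) for x in E]
brute = {x: max(policy_value(dict(zip(slots, choice)), 0, x)
                for choice in itertools.product(U, repeat=len(slots)))
         for x in E}
```

For this toy instance both computations agree exactly; for realistic p, q and T, only the backward recursion remains feasible.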
3  The Hamilton–Jacobi–Bellman equation
We now turn to the continuous time setting. The approach will be similar to the dynamic programming one, but in continuous time we go back to t from t + dt rather than from t + 1. We will state two simplifying assumptions which, however, will be sufficient to cover the ruin probability applications we have in mind. One is r(x, u, t) = e^{−δt} r(x, u), the other that T is an exit time (for example, the time of ruin). This ensures that the value function V(t, x) as defined by

    V(t, x) = sup_{U^t ∈ 𝒰^t} E_{x,t}[ e^{−δ(T−t)} r_T(R_T^U) + ∫_t^T e^{−δ(s−t)} r(R_s^U, u_s) ds ]        (3.1)
(where E_{x,t} denotes E[ · | R_t = x, t < T ]) only depends on T − t, not on t itself. One is then eventually interested in V(x) = V(0, x). Generators (cf. Chapter II) turn out to be an essential tool. Denote by 𝒜^u the generator of the (time-homogeneous) Markov process according to which R^U evolves when the control u_t ≡ u is used. Then:

Theorem 3.1  Under certain assumptions, the value function V(·) is the solution of

    0 = sup_u [ 𝒜^u V(x) − δV(x) + r(x, u) ].        (3.2)
Remark 3.2  Equation (3.2) goes under the name Hamilton–Jacobi–Bellman (HJB) equation. Its derivation typically requires some assumptions that are difficult to verify directly. For instance, suitable regularity conditions are needed, in particular that V(·) is in the domain of 𝒜^u for all u (we do not specify the remaining ones, and the reader should not take the proof below for more than a heuristic justification). The argument suggests that the maximizer u* (when it exists) is the optimal control when R_t = x. However, to establish that the solution of (3.2) indeed solves the control problem, more work is needed (in that sense, the custom to use V in the formulation of the HJB equation is a slight abuse of notation). Another complication is that it is not a priori clear whether the HJB equation has a unique solution. If it has, then one usually needs to prove separately (in a so-called verification step) that the obtained solution is indeed the value function of the optimal control problem. This can be done either by justifying all steps of the derivation of the HJB equation rigorously, or by proving that the solution of the HJB equation dominates all other possible value functions (such a procedure often involves martingale arguments and (extensions of) Itô's formula). The second possibility is usually the more feasible one. If the solution of the HJB equation is not unique (which may for instance happen if the initial condition cannot be specified), then the stochastic control problem can become very difficult. This effect can for instance occur if the value function is not as regular as the HJB equation would ask for. In that case one can often still work with either weak solutions or so-called viscosity solutions, see the references at the end of the chapter.

Proof of Theorem 3.1. Let u be an arbitrary control and assume that u is used as control in [t, t + h) and the optimal control is used in [t + h, T).
Then V(x) has two parts, the contribution from [t, t + h) and the one from [t + h, T). This gives

    V(x) ≥ r(x, u)h + o(h) + e^{−δh} E_{x,t} V(R_{t+h}^u)
         = r(x, u)h − δh V(x) + 𝒜^u V(x) h + V(x) + o(h),
which shows that (3.2) holds with = replaced by ≥. To see that the sup is actually 0, choose a control u such that the above scheme gives an expected reward of at least V(x) − ε. The same calculation then gives

    V(x) − ε ≤ 𝒜^u V(x) + r(x, u) + V(x) − δV(x).

Let ε ↓ 0.
□
Remark 3.3  When the value function is determined, the next step is to identify the corresponding control strategy that realizes this value function (this is not always simple and it may even happen that such a strategy does not exist!). In any case, by definition at least ε-optimal strategies always exist, i.e. for each ε > 0 there is a strategy that leads to a value of V(x) − ε. □

As may become clear from the above remarks, giving a rigorous and systematic treatment of stochastic control theory in insurance is outside the scope of this book. In the sequel we shall rather consider a few particular examples to get the flavor of the topic.

Example 3.4 (Optimal investment for a diffusion)  As a first example, we consider the investment–ruin problem of Browne [206]. The model is given by two stochastic differential equations

    dR_t^0 = a_1 dt + a_2 dB_t^0,        dM_t = b_1 M_t dt + b_2 M_t dB_t^1,

where B^0, B^1 are independent standard Brownian motions and R_0^0 = x. Here R^0 describes the evolution of the reserve of the company without investment and M the price process of a risky asset. Thus, R^0 is Brownian motion with drift and M is geometric Brownian motion. It is now assumed that the company is free to invest an amount u_t in the risky asset² at any time t, so that in the presence of investment the reserve evolves according to

    dR_t = a_1 dt + a_2 dB_t^0 + u_t b_1 dt + u_t b_2 dB_t^1.

The purpose is to minimize the infinite horizon ruin probability or, equivalently, to maximize the survival probability φ(u) = 1 − ψ(u). Thus in the general formulation we may take T as the ruin time (which is a stopping time), r ≡ 0, δ = 0 and r_T ≡ −1. Adding the constant 1 to the value function, V indeed

²Here and in the sequel u_t can exceed the present surplus level x, i.e. it is possible to use additional sources (or borrow money) for the investment. See the Notes for references which deal with constraints on u_t such as u_t ≤ x.
corresponds to the survival probability of the controlled process. Consequently the HJB equation is simply 0 = sup_{u≥0} 𝒜^u V(x), i.e.

    0 = sup_{u≥0} [ (a_1 + u b_1) V′(x) + ½ (a_2² + u² b_2²) V″(x) ].        (3.3)

For the solution, we first note that (since V is increasing and hence V′(x) > 0) a maximizer u* can only exist when V″ < 0. We then simply compute u* by differentiating 𝒜^u V(x) w.r.t. u to get b_1 V′(x) + u* b_2² V″(x) = 0, i.e.

    u* = − b_1 V′(x) / (b_2² V″(x)).
Substituting back in the HJB equation gives the ODE

    0 = a_1 V′(x) − (b_1² V′(x)²)/(b_2² V″(x)) + ½ a_2² V″(x) + ½ (b_1² V′(x)²)/(b_2² V″(x)).

Dividing by V′(x) and collecting terms shows that z(x) = V′(x)/V″(x) must be the solution of

    0 = a_1 + ½ a_2² (1/z(x)) − ½ (b_1²/b_2²) z(x).

Multiplying by z(x) gives a quadratic equation, and since we assumed V″ < 0, z(x) must be the negative solution, say k. In particular z(x), and hence u*, does not depend on x, and we get our final solution u* = −k b_1/b_2². The (somewhat surprising) conclusion is that, no matter how large the current capital x, it is optimal to always invest the constant amount −k b_1/b_2² of money into the risky asset for minimizing the probability of ruin. The resulting minimal ruin probability can be calculated by substituting u* into (3.3) and using the boundary conditions V(0) = 0 (with diffusion, starting in zero leads to immediate ruin) and V(∞) = 1. This results in

    ψ_I(x) = 1 − V(x) = e^{−γ_I x}

with

    γ_I = 2(a_1 + b_1 u*)/(a_2² + b_2² (u*)²) = 2(a_1 − b_1² k/b_2²)/(a_2² + b_1² k²/b_2²).        □
The example shows a common feature of control problems in diffusion models: the HJB equation often takes the form of a nonstandard ODE. Here it was rather easily solvable, but in general much ingenuity may be required. It should also be stressed that the calculations do not provide a rigorous proof that the candidate for the optimal strategy that was found is indeed optimal. For this one needs a verification step, which for diffusion models is often done by checking that the solution to the HJB equation is twice differentiable (as here). Optimality then follows by a simple application of Itô's formula.
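The constants above are easy to compute numerically. The following minimal sketch (all parameter values invented for illustration) solves the quadratic for k, forms u* = −k b_1/b_2² and γ_I, and checks that for V(x) = 1 − e^{−γ_I x} the expression inside the sup in (3.3) vanishes at u* (the common positive factor e^{−γ_I x} is divided out):

```python
import math

a1, a2, b1, b2 = 1.0, 2.0, 0.5, 1.0   # hypothetical parameters, illustration only

# z(x) = const = k solves a1 + a2^2/(2z) - (b1^2/(2 b2^2)) z = 0; multiplying
# by z gives (b1^2/(2 b2^2)) z^2 - a1 z - a2^2/2 = 0, and k is the negative root.
A, B, C = b1**2 / (2 * b2**2), -a1, -a2**2 / 2
k = (-B - math.sqrt(B**2 - 4 * A * C)) / (2 * A)

u_star = -k * b1 / b2**2                                    # optimal constant amount
gamma_I = 2 * (a1 + b1 * u_star) / (a2**2 + b2**2 * u_star**2)

def hjb(u):
    # (a1 + u b1) V'(x) + (a2^2 + u^2 b2^2) V''(x)/2 for V(x) = 1 - exp(-gamma_I x),
    # divided by the common factor exp(-gamma_I x)
    return (a1 + u * b1) * gamma_I - 0.5 * (a2**2 + u**2 * b2**2) * gamma_I**2
```

Since z = V′/V″ = −1/γ_I for this exponential V, the computed k must equal −1/γ_I, and the quadratic in u must be maximal, with value 0, at u*.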
Example 3.5 (Optimal proportional reinsurance for a diffusion)  Similar techniques as in Example 3.4 can be used to treat optimal proportional reinsurance for a diffusion that evolves according to dR_t^0 = a_1 dt + a_2 dB_t^0. I.e., at each time t there is the possibility to pass on a fraction 1 − u_t ∈ [0, 1] of the risk (which in the diffusion approximation is represented by the second part above), at the expense of a reduced drift (due to the subtraction of the reinsurance premium drift a_θ). Correspondingly, in the presence of proportional reinsurance, the process follows the dynamics

    dR_t^U = (u_t a_1 + (1 − u_t)(a_1 − a_θ)) dt + a_2 u_t dB_t^0
           = (u_t a_θ + (a_1 − a_θ)) dt + a_2 u_t dB_t^0.
We can restrict to the case a_θ > a_1 (otherwise the ruin probability will trivially be minimized (namely be equal to 0) by u_t = 0 for all t). We want to maximize the probability of survival φ(u), so as in Example 3.4 we choose T as the ruin time, r ≡ 0, δ = 0 and r_T ≡ −1 (and add the constant 1 to the value function) to arrive at the HJB equation 0 = sup_{u∈[0,1]} 𝒜^u V(x), i.e.

    0 = sup_{u∈[0,1]} [ (a_1 − a_θ + u a_θ) V′(x) + ½ u² a_2² V″(x) ].
It can be solved in much the same way as in Example 3.4 (one just has to additionally take care of the bound u ∈ [0, 1]), and one obtains that the optimal strategy is to have a constant fraction of proportional reinsurance given by u* = min{2(1 − a_1/a_θ), 1}. If this value is now substituted in the HJB equation, its solution (for the boundary conditions V(0) = 0 and V(∞) = 1) gives the resulting minimal ruin probability ψ_R(x) = 1 − V(x) = e^{−γ_R x} with

    γ_R = a_θ²/(2a_2²(a_θ − a_1))   if a_1 < a_θ < 2a_1,
    γ_R = 2a_1/a_2²                 if a_θ ≥ 2a_1.

One finally needs a verification theorem showing that the obtained V(x) indeed dominates the value of all other admissible strategies, which again can be done by applying Itô's formula. □

Stochastic control problems for jump processes turn out to be more subtle:
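Both branches of γ_R can be verified numerically. The sketch below (parameter values invented) plugs V(x) = 1 − e^{−γ_R x} into the HJB expression (dividing out e^{−γ_R x}) and confirms that its supremum over u ∈ [0, 1] vanishes, both in the interior regime a_1 < a_θ < 2a_1 and in the boundary regime a_θ ≥ 2a_1 where u* = 1:

```python
def gamma_R(a1, a2, atheta):
    # piecewise formula from the text; assumes atheta > a1
    if atheta < 2 * a1:
        return atheta**2 / (2 * a2**2 * (atheta - a1))
    return 2 * a1 / a2**2

def hjb_sup(a1, a2, atheta, g, steps=1000):
    # sup over u in [0,1] of (a1 - atheta + u*atheta) V' + u^2 a2^2 V''/2
    # for V(x) = 1 - exp(-g x), with the factor exp(-g x) divided out
    vals = [(a1 - atheta + u * atheta) * g - 0.5 * (u * a2)**2 * g**2
            for u in (i / steps for i in range(steps + 1))]
    return max(vals)
```

For (a_1, a_2, a_θ) = (1, 1, 1.5) the maximizer is the interior point u* = 2/3, for (1, 1, 3) it is the boundary point u* = 1; in both cases the supremum is (numerically) zero.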
Example 3.6 (Optimal investment for the Cramér–Lundberg model)  Let us now consider a Cramér–Lundberg risk reserve process R_t^0 = x + t − A_t (where {A_t} is a compound Poisson process with rate β and individual claim distribution function B) and the possibility to dynamically invest an amount u_t into a financial asset that is modeled by geometric Brownian motion M_t with dM_t = b_1 M_t dt + b_2 M_t dB_t. The controlled process then satisfies

    dR_t^U = (1 + u_t b_1) dt + u_t b_2 dB_t − dA_t.

The goal is again to minimize the ruin probability of {R_t^U} over all admissible strategies u_t, which are assumed to be predictable (in particular, the value of an admissible strategy at time t may depend on the history of the process up to t, but not on the size of a claim that may occur at t). In the present model, the HJB equation 0 = sup_{u≥0} 𝒜^u V(x) translates into

    0 = sup_{u≥0} [ (1 + b_1 u) V′(x) + ½ u² b_2² V″(x) + β( ∫_0^x V(x − y) B(dy) − V(x) ) ].        (3.4)

Note that u*(x) → 0 as x → 0 (otherwise the investment would lead to φ(0) = V(0) = 0, which cannot be optimal), so we obtain a boundary condition V′(0) = βV(0). The second boundary condition is again lim_{x→∞} V(x) = 1. Since there is no solution to this equation with V″(x) ≥ 0, we assume V″(x) < 0. Then the supremum is attained for

    u*(x) = − b_1 V′(x) / (b_2² V″(x))
and plugging this into the HJB equation one gets

    V′(x) − (b_1² V′(x)²)/(2b_2² V″(x)) + β( ∫_0^x V(x − y) B(dy) − V(x) ) = 0.        (3.5)

It is now considerably more difficult than in the diffusion case to solve this equation and retrieve further information about the optimal strategy. In a first step one can show in a verification theorem using Itô's lemma that if (3.4) has an increasing twice continuously differentiable solution, then the feedback strategy u* is indeed optimal among all admissible investment strategies. Under further assumptions on B (like the existence of a bounded density b(x)), one can then show with considerable effort that indeed a unique increasing twice continuously differentiable solution of (3.4) exists; however, its form can only be determined
numerically. Remarkably, one can still retrieve substantial information about the asymptotic behavior of both u*(x) and ψ_I(x) = 1 − V(x) as x → ∞: for light-tailed claim size distribution B, if the adjustment coefficient γ_I exists as the positive solution of

    β( B̂[r] − 1 ) − r = b_1²/(2b_2²),        (3.6)

then ψ_I(x) ≤ e^{−γ_I x} and (under a mild additional condition) the Cramér–Lundberg approximation lim_{x→∞} e^{γ_I x} ψ_I(x) = C holds for some constant C. Without investment, the r.h.s. of (3.6) is zero, so clearly γ_I > γ, and hence optimal investment can substantially decrease the probability of ruin.³ Furthermore lim_{x→∞} u*(x) = b_1/(b_2² γ_I), so asymptotically the optimal strategy is to invest a constant amount into the risky asset, which is somewhat surprising at first sight.

On the other hand, for heavy-tailed B (i.e. B̂[r] = ∞ for all r > 0), one can show that u*(x) is unbounded. If the failure rate of B tends to zero, then the optimal strategy converges and lim_{x→∞} u*(x) = ∞. A quite pleasant result (whose proof is beyond the scope of this book) is that for B, B_0 ∈ 𝒮 the optimal investment strategy leads to

    ψ_I(x) ∼ (2β b_2²/b_1²) ∫_x^∞ ( ∫_0^y 1/(1 − B(z)) dz )^{−1} dy,   x → ∞.        (3.7)

This can be compared with Theorem VIII.2.1 to see the reduction of ψ(u) through investment. If further the failure rate of B tends to zero, then the rate at which u*(x) goes to infinity can be identified to be

    u*(x) ∼ (b_1/b_2²) ∫_0^x (1 − B(x))/(1 − B(z)) dz,   x → ∞.

In particular, if B(x) is regularly varying with index −α, then simple applications of Karamata's theorem translate (3.7) into

    ψ_I(x) ∼ (2β b_2² (α + 1)/(b_1² α)) (1 − B(x)),   x → ∞,

and

    u*(x) ∼ b_1 x/(b_2² (α + 1)),   x → ∞.
³This is in contrast to investment of a constant fraction of the reserve, which led to a Pareto-type tail even for light-tailed B, cf. Theorem VIII.6.2.
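In the light-tailed case, γ_I from (3.6) is easy to locate numerically. The sketch below (all parameter values invented) takes exponential claims B = Exp(δ), for which B̂[r] = δ/(δ − r) for r < δ, and finds γ_I by bisection, confirming that it exceeds the classical Lundberg exponent γ obtained when the right-hand side of (3.6) is zero:

```python
beta, delta = 1.0, 3.0    # hypothetical: Poisson(beta) arrivals, Exp(delta) claims
b1, b2 = 0.5, 1.0         # hypothetical asset drift and volatility

def kappa(r):
    # beta*(Bhat[r] - 1) - r with Bhat[r] = delta/(delta - r), r < delta
    return beta * (delta / (delta - r) - 1.0) - r

gamma = delta - beta      # Lundberg exponent without investment: kappa(gamma) = 0

def solve(target, lo, hi, tol=1e-12):
    # bisection; kappa is increasing on (gamma, delta)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kappa(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

gamma_I = solve(b1**2 / (2 * b2**2), gamma, delta - 1e-9)
```

Here γ = 2 while γ_I ≈ 2.057, and the asymptotically optimal investment amount is b_1/(b_2² γ_I) ≈ 0.243.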
Accordingly, for x → ∞ it is then optimal to invest the constant fraction b_1/(b_2²(α + 1)) of the surplus into the risky asset.⁴ For a derivation and detailed discussion of the above results the reader is referred to Schmidli [779, Ch. IV]. □

Example 3.7 (Optimal reinsurance for the Cramér–Lundberg model)  Assume again the Cramér–Lundberg risk reserve process R_t^0 = x + t − Σ_{i=1}^{N_t} U_i and the purchase of reinsurance on individual claims according to some reinsurance form u_t, under which the cedent reduces a possible claim payment U_i at time t to u_t(U_i) (with the implicit understanding that u_t(y) is a continuous function satisfying 0 ≤ u_t(y) ≤ y). The goal is to minimize the ruin probability through dynamically adapting the reinsurance form u_t (to avoid trivialities, the admissible strategies u_t are again assumed to be predictable, cf. Example 3.6). The premium intensity for such a reinsurance contract is p_R(u_t) for a continuous function p_R, with the understanding that more reinsurance is more expensive and that full reinsurance is more expensive than first insurance (i.e. if u_t(y) ≡ 0, then p_R(u_t) > 1), as otherwise it would be optimal to reinsure the entire insurance risk, leading to a ruin probability of zero. The controlled surplus process is now given by

    R_t^U = x + ∫_0^t (1 − p_R(u_s)) ds − Σ_{i=1}^{N_t} u_{T_i}(U_i),

where T_i denotes the arrival time of the i-th claim.
The HJB equation for this optimization problem then reads

    sup_{u∈𝒰} { (1 − p_R(u)) V′(x) + β( ∫_0^∞ V(x − u(y)) B(dy) − V(x) ) } = 0,        (3.8)

where at this stage u is a function (rather than a constant) representing the reinsurance form and 𝒰 is the (compact) set of all admissible reinsurance forms. Here V again corresponds to the survival probability of the controlled process. Since we are interested in strictly increasing solutions of (3.8), we can restrict 𝒰 to those admissible strategies for which p_R(u) < 1. If one specifies a boundary value, then (with quite some effort) this equation can be shown to have a unique, strictly increasing and continuously differentiable solution, and that this solution (after appropriate scaling) indeed minimizes the ruin probability of the controlled process.

⁴Theorem VIII.6.2 together with Remark VIII.6.4 show that adopting this constant fraction strategy for all x also leads to the same asymptotic behavior of ψ_I(x), but for finite x the performance can be quite different!
More explicit results can be obtained if the set 𝒰 is restricted to particular reinsurance forms. For instance, if one tries to find the optimal dynamic proportional reinsurance u(y) = u y with 0 ≤ u ≤ 1, then (3.8) simplifies to

    sup_{u∈[0,1]} { (1 − p_R(u)) V′(x) + β( ∫_0^{x/u} V(x − uy) B(dy) − V(x) ) } = 0.        (3.9)
One can then show that if lim inf_{u↑1} (1 − p_R(u))/(1 − u) > 0 (i.e. the reinsurer charges more than the net premium for u close to 1), then it is optimal to purchase no reinsurance for any initial capital x below some positive level. Equation (3.9) cannot be solved explicitly, so one has to approximate the solution numerically in practical examples. Asymptotic results for x → ∞ can however be obtained. In particular, if a strictly positive solution γ_R to the adjustment equation

    inf_{u∈[0,1]} { β B̂[ur] − β − (1 − p_R(u)) r } = 0        (3.10)

exists, then under some mild additional assumptions the Cramér–Lundberg approximation

    lim_{x→∞} e^{γ_R x} ψ_R(x) = C        (3.11)

holds for some constant C > 0. If moreover the value u* for which the infimum in (3.10) is attained is unique, then one can show that lim_{x→∞} u(x) = u*, i.e. for increasing initial capital x the optimal strategy converges to a constant reinsurance fraction. If on the other hand B is regularly varying with index −α, then

    ψ_R(x) ∼ inf_{u∈[0,1]} ( β u^α B̄_0(x) / (1 − p_R(u) − βμ_B u)^+ ),

and if the infimum is attained for a unique value u*, then again lim_{x→∞} u(x) = u*. For subexponential but lighter tails B, available results are not as explicit, but in that case the optimal strategy can be shown to satisfy lim sup_{x→∞} u(x) = inf{u : 1 − p_R(u) > βμ_B u}. I.e., for large x one tries to reinsure as much as is still possible without a resulting negative drift. The intuitive reason is that (in contrast to regularly varying tails) proportional reinsurance makes the tail of the distributions smaller, and so one tries to purchase as much reinsurance as possible. As another example one can consider the restricted class of excess-of-loss strategies u(y) = min(y, u), so in this case u is the retention level. This restriction leads to the HJB equation

    sup_{u∈[0,∞]} { (1 − p_R(u)) V′(x) + β ∫_0^{a(x,u)} V(x − min(y, u)) B(dy) − βV(x) } = 0,
where a(x, u) is x if u ≥ x and infinity otherwise. Under mild additional assumptions one can then show that if the adjustment coefficient γ_R, this time defined as the positive solution of

    inf_{u∈[0,∞]} { βr ∫_0^u (1 − B(z)) e^{rz} dz − (1 − p_R(u)) r } = 0,
exists, then the Cramér–Lundberg approximation (3.11) again holds, and if this infimum is attained for a unique value u*, then lim_{x→∞} u(x) = u*. Note that γ_R now also exists for heavy-tailed claim size distributions, and both the Cramér–Lundberg approximation and the limiting strategy result still apply. □

Notes and references  Schmidli [779] is a recent and rich source where one finds rigorous treatments of the above examples and numerous further stochastic control problems in insurance. A short survey of the topic is in Hipp [466]. Browne [206] also extends Example 3.4 in several further directions, including dependent Brownian motions, minimizing expected penalty at ruin and maximizing exponential utility in finite time. Optimal investment problems for the Cramér–Lundberg model were first studied by Hipp & Plum [469] and Gaier, Grandits & Schachermayer [386] and in a more general framework in [470]; since then many further results have been added. Among them, Gaier & Grandits [385] and Grandits [434] extend the optimal investment problem for regularly varying claims to the case when in addition a riskless asset is available, see also Liu & Yang [601] and Yang & Zhang [902]. For a periodic risk model, Kötter & Bäuerle [558] investigate the control problem of maximizing the adjustment coefficient through investment. The classical references for optimal reinsurance programs are Højgaard & Taksar [480], Schmidli [775] and Hipp & Vogt [472]. In a similar fashion, investment and reinsurance can be controlled simultaneously, usually without substantial additional complexity, see e.g. Schmidli [777] and Luo & Taksar [617] for minimizing the probability of absolute ruin. Whereas in most considered investment problems it is allowed to borrow money for the purchase of the risky asset if necessary, Promislow & Young [717] and Azcue & Muler [115] deal with the effects of borrowing constraints, in which case one has to work with weak solutions to the HJB equation.
Luo [616] and Bai & Guo [123] deal with several available risky assets. The continuous-time risk model leads to elegant and often very explicit solutions for the control problems. However, similar to the dynamic hedging problem in finance, it will be practically impossible to continuously adjust the investment/reinsurance fraction. It is still a challenge for future research to incorporate frictions such as transaction costs and/or limited possibilities for portfolio adjustment into the model; for a step in this direction, see Højgaard & Taksar [481]. Optimal control strategies of
reinsurance and investment to minimize the ruin probability in a multivariate discrete-time risk model can be found in Bäuerle & Blatter [141]. Other types of control problems with the objective to minimize the ruin probability include the possibility to accumulate new business (see Hipp & Taksar [471]) and to choose between proportional insurance and the issuing of catastrophe bonds which are correlated with the insurer's losses (see Bäuerle [149]). The related topic of maintaining solvency for pension plans is considered in Olivieri & Pitacco [673]. Optimal investment and reinsurance when, instead of minimizing the ruin probability, the objective is to maximize the utility of terminal wealth is investigated in Irgens & Paulsen [495]. See also Korn & Wiese [553] for the case where the size of the resulting ruin probability is a constraint, Zhang & Siu [913] for a game-theoretic approach that involves model uncertainty and Xia & Zhang [874] where a martingale approach is employed to identify mean-variance efficient investment strategies. For a general model setup, see Liu & Ma [603]. Another methodological bridge between problems of the above kind and more finance-oriented control problems can be found in Bayraktar & Young [145, 146, 147], who consider an individual who consumes at a certain (possibly surplus-dependent or random) rate and can invest in a riskless and a risky asset in such a way that the probability of ruin before a random time horizon is minimized; for the objective to maximize the expected utility of consumption with the size of the ruin probability being a constraint, see [148]. Another classical stochastic control problem in insurance (originally raised by de Finetti [283]) is how to pay out dividends from the risk reserves to shareholders in such a way that the expected (utility of the) discounted sum of dividend payments until ruin is maximized.
The resulting value function is a profitability-type measure for the value of an insurance portfolio and as such may be interpreted as an alternative to the ruin probability, which rather measures safety. There have also been studies of optimal strategies that balance profitability and safety, expressed through a penalty term on early ruin, see e.g. Thonhauser & Albrecher [843]. The corresponding control problems lead to intricate mathematical challenges and have developed into an active field of research, which cannot be covered in this book (see e.g. Albrecher & Thonhauser [40] for a recent survey and Schmidli [779] for a detailed treatment; cf. also the Notes of VIII.1). In addition to the analytic approach outlined in this chapter, sometimes a more probabilistic approach also works, in which the control problem is solved within a restricted smaller class of admissible strategies and then by comparison one can show that the optimal strategy is also optimal within the whole class of admissible strategies (see e.g. Loeffen [604]). In finance applications, a popular and workable alternative to the dynamic programming principle and the HJB equation is the so-called dual method. For insurance applications, however, it seems that due to the intervention of the control into the underlying surplus process, the resulting set of possible trajectories is too restricted to
make the dual method work here. Concerning general methods, some standard textbooks on continuous time stochastic control are Davis [278], Fleming & Soner [364], Øksendal & Sulem [672], Pham [698] and, for numerics, Kushner & Dupuis [541].
Chapter XV

Simulation methodology

1  Generalities
This section gives a summary of some basic issues in simulation and Monte Carlo methods. We shall be brief concerning general aspects and refer to standard textbooks like Asmussen & Glynn [79], Bratley, Fox & Schrage [197], Ripley [740] or Rubinstein & Kroese [751] for more detail (a treatment with a special view towards insurance is Korn, Korn & Kroisandt [552]); topics of direct relevance for the study of ruin probabilities are treated in more depth.
1a  The crude Monte Carlo method
Let Z be some random variable and assume that we want to evaluate z = EZ in a situation where z is not available analytically but Z can be simulated. The crude Monte Carlo (CMC) method then amounts to simulating i.i.d. replicates Z_1, ..., Z_N, estimating z by the empirical mean z̄ = (Z_1 + ··· + Z_N)/N and the variance of Z by the empirical variance

    s² = (1/(N−1)) Σ_{i=1}^N (Z_i − z̄)² = (1/(N−1)) ( Σ_{i=1}^N Z_i² − N z̄² ).        (1.1)

According to standard central limit theory, √N (z̄ − z) →_D N(0, σ_Z²), where σ_Z² = Var(Z). Hence

    z̄ ± 1.96 s/√N        (1.2)

is an asymptotic 95% confidence interval, and this is the form in which the result of the simulation experiment is commonly reported.
In the setting of ruin probabilities, it is straightforward to use the CMC method to simulate the finite horizon ruin probability z = ψ(u, T): just simulate the risk process {R_t} up to time T (or T ∧ τ(u)) and let Z be the indicator that ruin has occurred,

    Z = I( inf_{0≤t≤T} R_t < 0 ) = I( τ(u) ≤ T ).

The situation is more intricate for the infinite horizon ruin probability ψ(u). The difficulty in the naive choice Z = I(τ(u) < ∞) is that Z cannot be simulated in finite time: no finite segment of {S_t} can tell whether ruin will ultimately occur or not. Sections 2–5 deal with alternative representations of ψ(u) allowing to overcome this difficulty.
1b  Variance reduction techniques
The purpose of the techniques we study is to reduce the variance of a CMC estimator Z of z, typically by modifying Z to an alternative estimator Z′ with EZ′ = EZ = z and (hopefully) Var(Z′) < Var(Z). This is a classical area of the simulation literature, and many sophisticated ideas have been developed. Typically variance reduction involves both some theoretical idea (in some cases also a mathematical calculation), an added programming effort, and a longer CPU time to produce one replication. Therefore, one can argue that unless Var(Z′) is considerably smaller than Var(Z), variance reduction is hardly worthwhile. Consider for instance Var(Z′) = Var(Z)/2. Then replacing the number of replications N by 2N will give the same precision for the CMC method as simulating N′ = N replications of Z′, and in most cases this modest increase of N is totally unproblematic.

We survey two methods which will be used below to study ruin probabilities, conditional Monte Carlo and importance sampling. However, there are others which are widely used in other areas and potentially useful also for ruin probabilities. We mention in particular (regression adjusted) control variates, stratification and common random numbers.

Conditional Monte Carlo  Let Z be a CMC estimator and Y some other r.v. generated at the same time as Z. Letting Z′ = E[Z | Y], we then have EZ′ = EZ = z, so that Z′ is a candidate for a Monte Carlo estimator of z. Further, writing

    Var(Z) = Var( E[Z | Y] ) + E( Var[Z | Y] ) = Var(Z′) + E( Var[Z | Y] )

and ignoring the last term shows that Var(Z′) ≤ Var(Z), so that conditional Monte Carlo always leads to variance reduction.
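As a toy illustration of the principle (this example is ours, not from the text): estimate z = P(X + Y > c) for independent X, Y ∼ Exp(1). Conditioning on Y gives the closed-form Z′ = E[Z | Y] = P(X > c − Y | Y) = min(1, e^{−(c−Y)}), and the empirical variance of Z′ is visibly smaller than that of the raw indicator:

```python
import math
import random

def estimators(c, N, seed=0):
    """Return (mean, empirical variance) for the CMC indicator Z and for
    the conditional Monte Carlo estimator Z' = E[Z | Y]."""
    rng = random.Random(seed)
    z_cmc, z_cond = [], []
    for _ in range(N):
        x, y = rng.expovariate(1.0), rng.expovariate(1.0)
        z_cmc.append(1.0 if x + y > c else 0.0)       # Z = I(X + Y > c)
        z_cond.append(min(1.0, math.exp(-(c - y))))   # Z' = P(X > c - Y | Y)
    def mean_var(zs):
        m = sum(zs) / N
        return m, sum((z - m)**2 for z in zs) / (N - 1)
    return mean_var(z_cmc), mean_var(z_cond)
```

Both estimators are unbiased for z = (1 + c)e^{−c} (the Gamma(2, 1) tail), but the conditional one removes the Bernoulli noise of the indicator.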
Importance sampling  The idea is to compute z = EZ by simulating from a probability measure P̃ different from the given probability measure P and having the property that there exists a r.v. L such that

    z = EZ = Ẽ[LZ].        (1.3)

Thus, using the CMC method one generates (Z_1, L_1), ..., (Z_N, L_N) from P̃ and uses the estimator

    z̄_IS = (1/N) Σ_{i=1}^N L_i Z_i

and the confidence interval z̄_IS ± 1.96 s_IS/√N, where

    s²_IS = (1/(N−1)) Σ_{i=1}^N (L_i Z_i − z̄_IS)² = (1/(N−1)) ( Σ_{i=1}^N L_i² Z_i² − N z̄²_IS ).

In order to achieve (1.3), the obvious possibility is to take P and P̃ mutually equivalent and L = dP/dP̃ as the likelihood ratio. Variance reduction may or may not be obtained: it depends on the choice of the alternative measure P̃, and the problem is to make an efficient choice. To this end, a crucial observation is that there is an optimal choice of P̃: define P̃ by dP̃/dP = Z/EZ = Z/z, i.e. L = z/Z (the event {Z = 0} is not a concern because P̃(Z = 0) = 0). Then

    Ṽar(LZ) = Ẽ[(LZ)²] − (Ẽ[LZ])² = Ẽ[ z² Z²/Z² ] − z² = z² − z² = 0.

Thus, it appears that we have produced an estimator with variance zero. However, the argument cheats: we are simulating precisely because z is not available analytically, so we cannot compute L = z/Z (further, it may often be impossible to describe P̃ in such a way that it is straightforward to simulate from P̃). Nevertheless, even if the optimal change of measure is not practical, it gives a guidance: choose P̃ such that dP̃/dP is as proportional to Z as possible. This may also be difficult to assess, but tentatively, one would try to choose P̃ to make large values of Z more likely.
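A classical toy illustration (again ours, not from the text): estimate z = P(X > a) for X ∼ N(0, 1) and a = 4 by sampling X from the exponentially tilted measure P̃ = N(a, 1), under which the event is no longer rare; the likelihood ratio is L = dP/dP̃ = e^{−aX + a²/2}:

```python
import math
import random

def importance_sampling(a, N, seed=0):
    """Estimate P(X > a), X ~ N(0,1), by sampling X from N(a, 1)."""
    rng = random.Random(seed)
    vals = []
    for _ in range(N):
        x = rng.gauss(a, 1.0)                  # draw from the tilted measure
        L = math.exp(-a * x + a * a / 2.0)     # dP/dPtilde at x
        vals.append(L if x > a else 0.0)       # L * Z with Z = I(x > a)
    z = sum(vals) / N
    s2 = (sum(v * v for v in vals) - N * z**2) / (N - 1)
    return z, 1.96 * math.sqrt(s2 / N)
```

For a = 4 the true value is Φ̄(4) ≈ 3.17·10⁻⁵; with N = 10⁵ tilted samples the relative error is a fraction of a percent, whereas CMC with the same N would typically see the event only a few times.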
1c Rare events simulation
The problem is to estimate z = P(A) when z is small, say of the order 10⁻³ or less; i.e., Z = I(A) and A is a rare event. In ruin probability theory, A = {τ(u) ≤ T} or A = {τ(u) < ∞}, and the rare events assumption amounts to u being large, as is the case of typical interest.
The CMC method leads to a variance of σ_Z² = z(1 − z), which tends to zero as z ↓ 0. However, the issue is not so much that the precision is good as that the relative precision is bad:

σ_Z/z = √(z(1−z))/z ∼ 1/√z → ∞, z ↓ 0.

In other words, a confidence interval of width 10⁻⁴ may look small, but if the point estimate z is of the order 10⁻⁵, it does not help telling whether z is of the magnitude 10⁻⁴, 10⁻⁵ or even much smaller. Another way to illustrate the problem is in terms of the sample size N needed to acquire a given relative precision, say 10%, in terms of the half-width of the confidence interval. This leads to the equation 1.96σ_Z/(z√N) = 0.1, i.e.

N = 100 · 1.96² · z(1−z)/z² ∼ 100 · 1.96²/z,

which increases like z⁻¹ as z ↓ 0. Thus, if z is small, large sample sizes are required.
We shall focus on importance sampling as a potential (though not the only) way to overcome this problem. The optimal change of measure (as discussed above) is given by

P̃(B) = E[Z/z; B] = P(A ∩ B)/z = P(B | A).

I.e., the optimal P̃ is the conditional distribution given A. However, just the same problem as for importance sampling in general comes up: we do not know z, which is needed to compute the likelihood ratio and thereby the importance sampling estimator, and further it is usually not practicable to simulate from P(· | A). Again, we may try to make P̃ look as much like P(· | A) as possible. An example where this works out nicely is given in Section 3.
Two established efficiency criteria in rare events simulation are bounded relative error and logarithmic efficiency. To introduce these, assume that the rare event A = A(u) depends on a parameter u (say A = {τ(u) < ∞}). For each u, let z(u) = P(A(u)), assume that the A(u) are rare in the sense that z(u) → 0 as u → ∞, and let Z(u) be a Monte Carlo estimator of z(u). We then say that {Z(u)} has bounded relative error if Var(Z(u))/z(u)² remains bounded as u → ∞. According to the above discussion, this means that the sample size N = N_ε(u) required to obtain a given fixed relative precision (say ε = 10%) remains bounded. Logarithmic efficiency is defined by the slightly weaker requirement that one can get as close to the power 2 as desired: Var(Z(u)) should
go to 0 at least as fast as z(u)^{2−ε}, i.e.

lim sup_{u→∞} Var(Z(u))/z(u)^{2−ε} < ∞   (1.4)

for all ε > 0. This allows Var(Z(u)) to decrease slightly more slowly than z(u)², so that N_ε(u) may go to infinity. However, the mathematical definition puts certain restrictions on this growth rate, and in practice, logarithmic efficiency is almost as good as bounded relative error. The term logarithmic comes from the equivalent form

lim inf_{u→∞} (−log Var(Z(u)))/(−log z(u)) ≥ 2   (1.5)

of (1.4).

Notes and references A survey on rare events simulation is in Asmussen & Glynn [79, Ch. VI]; see also Juneja & Shahabuddin [512]. For details on random variate generation to implement CMC methods and its refinements we refer to the textbooks mentioned at the beginning of the section. The traditional approach is pseudo-random numbers generated by some recursion. In finance applications, quasi-random numbers ([79, IX.3]) have recently become popular and often lead to a substantial improvement of precision. However, it is folklore that quasi-random numbers perform less well when the time horizon is random (say a stopping time like the ruin time) rather than fixed; for an illustration, see [79, p. 274]. If, however, an algorithm can be designed which, instead of the risk process, needs the simulation of some other quantities of fixed dimension, quasi-random numbers can be competitive, see e.g. Coulibaly & Lefèvre [253].
2 Simulation via the Pollaczeck-Khinchine formula
Consider the compound Poisson model, let X_1, X_2, ... be i.i.d. with common density b_0(x) = B̄(x)/μ_B, let S_n = X_1 + ··· + X_n and let K be independent and geometric with parameter ρ, P(K = k) = (1 − ρ)ρ^k. The Pollaczeck-Khinchine formula IV.(2.2) may be written as ψ(u) = P(M > u), where M = S_K. Thus ψ(u) = z = z(u) = EZ, where Z = I(M > u) may be generated as follows:
1. Generate K as geometric, P(K = k) = (1 − ρ)ρ^k.
2. Generate X_1, ..., X_K from the density b_0(x). Let M ← S_K.
3. If M > u, let Z ← 1. Otherwise, let Z ← 0.
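A minimal sketch of the three steps, assuming (for the sake of a checkable example, not required by the algorithm) exponential claims: for B = Exp(1) the integrated-tail density b_0 is again Exp(1), and ψ(u) = ρe^{−(1−ρ)u} exactly.

```python
import random

def psi_cmc(u, rho, n_rep, rng=random.Random(1)):
    """Crude Monte Carlo for psi(u) via the Pollaczeck-Khinchine formula.
    Claims are assumed Exp(1) here, so b0 is again Exp(1) and
    psi(u) = rho * exp(-(1 - rho) * u) exactly (used only for checking)."""
    hits = 0
    for _ in range(n_rep):
        # Step 1: K geometric, P(K = k) = (1 - rho) * rho^k, k = 0, 1, ...
        k = 0
        while rng.random() < rho:
            k += 1
        # Step 2: M = X_1 + ... + X_K with X_i drawn from b0
        m = sum(rng.expovariate(1.0) for _ in range(k))
        # Step 3: indicator of M > u
        if m > u:
            hits += 1
    return hits / n_rep
```

As a CMC indicator estimator, this has the relative-error deficiency discussed in Section 1c, which is what the variance reduction methods below address.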
CHAPTER XV. SIMULATION METHODOLOGY
The algorithm gives a solution to the infinite horizon problem, but as a CMC method, it is not efficient for large u. Therefore, it is appealing to combine it with some variance reduction method.
2a Light tails: importance sampling
With light tails, there is a standard way to perform importance sampling for geometric sums. In the present ruin context (assuming the conditions of the Cramér-Lundberg approximation), it amounts to the following. As a setup, note that an easy argument using integration by parts shows that the Lundberg equation for the adjustment coefficient γ can alternatively be written as

1 = ρ ∫_0^∞ e^{γy} B_0(dy).   (2.1)
Let B_0* be the distribution defined by dB_0*/dB_0(x) = ρe^{γx}. To generate one replication of the estimator, generate X_1*, X_2*, ... from B_0*, let S_n* = X_1* + ··· + X_n*, stop the simulation at τ*(u) = inf{n : S_n* > u} and return the estimator

Z*(u) = e^{−γS*_{τ*(u)}}.

To understand the algorithm, note first that z(u) = P(τ(u) ≤ N), where N is the geometric variable (denoted K above) and τ(u) = inf{n : S_n > u}. Next let P* be the probability measure where the X_i* are i.i.d. with distribution B_0* and N remains independent and geometric(ρ). Then by the definition of B_0*,

P(X_1 ∈ dx) = (1/ρ) E*[e^{−γX_1*}; X_1* ∈ dx].

By a standard extension to stopping times (see, e.g., [79, pp. 131–132]), this implies

z(u) = E*[(1/ρ^{τ*(u)}) e^{−γS*_{τ*(u)}}; τ*(u) ≤ N] = E* e^{−γS*_{τ*(u)}},

where we used that N remains geometric and independent of the X_i* under P*. I.e., the estimator Z*(u) is unbiased. Further,

E* Z*(u)² = E* e^{−2γS*_{τ*(u)}} ≤ e^{−2γu} = O(z(u)²),

where the last step used the standard Cramér-Lundberg asymptotics z(u) ∼ Ce^{−γu}. This shows:

Theorem 2.1 The estimator Z*(u) has bounded relative error.
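The algorithm is easily sketched; exponential claims B_0 = Exp(1) are assumed purely for illustration, in which case (2.1) gives γ = 1 − ρ and the exponentially tilted B_0* is Exp(ρ):

```python
import random, math

def psi_is(u, rho, n_rep, rng=random.Random(2)):
    """Importance-sampling estimator Z*(u) = exp(-gamma * S*_{tau*(u)}).
    Illustration assumes B0 = Exp(1): the Lundberg equation (2.1) reads
    1 = rho / (1 - gamma), so gamma = 1 - rho, and the tilted distribution
    B0* is Exp(rho).  Exact value for checking: psi(u) = rho*exp(-(1-rho)*u)."""
    gamma = 1.0 - rho
    total = 0.0
    for _ in range(n_rep):
        s = 0.0
        while s <= u:                  # stop at tau*(u) = inf{n : S_n* > u}
            s += rng.expovariate(rho)  # X_i* ~ B0* = Exp(rho)
        total += math.exp(-gamma * s)  # unbiased replicate of psi(u)
    return total / n_rep
```

In contrast to the crude algorithm above, the relative error stays bounded as u grows, so the same sample size works for arbitrarily small ψ(u).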
2b Heavy tails: conditional Monte Carlo
With heavy tails, the first efficient algorithm seems to be that of Asmussen & Binswanger [72], which gives a logarithmically efficient estimator when the claim size distribution B (and hence B_0) has a regularly varying tail. So, assume in the following that B̄_0(x) ∼ L(x)/x^α with α > 0 and L(x) slowly varying. Then (cf. Theorem X.2.1) ψ(u) ∼ ρ/(1−ρ) B̄_0(u), and the problem is to produce an estimator Z(u) with a variance going to zero not more slowly (in the logarithmic sense) than B̄_0(u)².
A first obvious idea when using conditional Monte Carlo is to write

ψ(u) = P(X_1 + ··· + X_K > u) = E P(X_1 + ··· + X_K > u | X_1, ..., X_{K−1}) = E B̄_0(u − X_1 − ··· − X_{K−1}).
Thus, we generate only X_1, ..., X_{K−1}, compute Y = u − X_1 − ··· − X_{K−1} and let Z^{(1)}(u) = B̄_0(Y) (if K = 0, Z^{(1)}(u) is defined as 0). As a conditional Monte Carlo estimator, Z^{(1)}(u) has a smaller variance than the CMC estimator Z(u). However, asymptotically it presents no improvement: the variance is of the same order of magnitude B̄_0(u). To see this, just note that

E Z^{(1)}(u)² ≥ E[B̄_0(u − X_1 − ··· − X_{K−1})²; X_1 > u, K ≥ 2] = ρ² P(X_1 > u) = ρ² B̄_0(u)

(here we used that by positivity of the X_i, X_1 + ··· + X_{K−1} > u when X_1 > u and K ≥ 2, and that B̄_0(y) = 1 for y < 0). This calculation shows that the reason this algorithm does not work well is that the probability of one single X_i becoming large is too big. The idea of [72] is to avoid this problem by discarding the largest X_i and considering only the remaining ones. For the simulation, we thus generate K and X_1, ..., X_K, form the order statistics X_{(1)} < X_{(2)} < ··· < X_{(K)}, throw away the largest one X_{(K)}, and let

Z^{(2)}(u) = P(S_K > u | X_{(1)}, X_{(2)}, ..., X_{(K−1)}) = B̄_0((u − S_{(K−1)}) ∨ X_{(K−1)})/B̄_0(X_{(K−1)}),

where S_{(K−1)} = X_{(1)} + X_{(2)} + ··· + X_{(K−1)}. To check the formula for the
conditional probability, note first that

P(X_{(n)} > x | X_{(1)}, X_{(2)}, ..., X_{(n−1)}) = B̄_0(X_{(n−1)} ∨ x)/B̄_0(X_{(n−1)}).

We then get

P(S_n > x | X_{(1)}, X_{(2)}, ..., X_{(n−1)}) = P(X_{(n)} + S_{(n−1)} > x | X_{(1)}, ..., X_{(n−1)}) = P(X_{(n)} > x − S_{(n−1)} | X_{(1)}, ..., X_{(n−1)}) = B̄_0((x − S_{(n−1)}) ∨ X_{(n−1)})/B̄_0(X_{(n−1)}).

Theorem 2.2 Assume that B̄_0(x) = L(x)/x^α with L(x) slowly varying. Then the algorithm given by {Z^{(2)}(u)} is logarithmically efficient.

The proof of Theorem 2.2 is elementary but lengthy. We will omit it, since another equally simple conditional Monte Carlo estimator developed later by Asmussen & Kroese [89] performs better. The idea there is to partition according to which X_i is the largest, i.e. for which i one has M_n = X_{(n)} = X_i, and to condition on the X_j with j ≠ i. Since clearly by symmetry P(S_n > u) = nP(S_n > u, M_n = X_n), this gives the estimator

Z^{(3′)}(u) = n P(S_n > u, M_n = X_n | X_1, ..., X_{n−1}) = n B̄_0(M_{n−1} ∨ (u − S_{n−1}))   (2.2)

when N = n is deterministic (note that for M_n = X_n we need X_n > M_{n−1}, and for S_n > u we need X_n > u − S_{n−1}), and

Z^{(3″)}(u) = N P(S_N > u, M_N = X_N | N, X_1, ..., X_{N−1}) = N B̄_0(M_{N−1} ∨ (u − S_{N−1}))   (2.3)

when N is random.
Theorem 2.3 The estimator Z^{(3′)}(u) has bounded relative error in the regularly varying case, and is logarithmically efficient in the Weibull case provided the Weibull exponent satisfies β < log(3/2)/log 2 ≈ 0.585. The same holds for Z^{(3″)}(u) in the regularly varying case provided L(·) satisfies

lim sup_{u→∞} (1/L(u)²) E[L(u/N)² N^{2α+2}] < ∞.
2. SIMULATION VIA THE POLLACZECKKHINCHINE FORMULA
469
Proof. We consider only the regularly varying case and the case of a deterministic N = n. If M_{n−1} ≤ u/n, then S_{n−1} ≤ (n−1)u/n, and therefore always M_{n−1} ∨ (u − S_{n−1}) ≥ u/n. Therefore

E Z^{(3′)}(u)²/B̄_0(u)² ≤ n² B̄_0(u/n)²/B̄_0(u)² = n² (L(u/n)²/(u/n)^{2α})/(L(u)²/u^{2α}) = n^{2+2α} L(u/n)²/L(u)² ∼ n^{2+2α}.

Noting that z(u) ∼ n B̄_0(u) by subexponentiality completes the proof. □
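The estimator (2.2) is easy to implement; the sketch below assumes Pareto summands with tail B̄_0(x) = (1+x)^{−α} (a choice made here only to have a concrete distribution to sample from):

```python
import random

def bar_b0(x, alpha):
    """Pareto tail, bar(B0)(x) = (1 + x)^(-alpha) for x >= 0, and 1 for x < 0."""
    return 1.0 if x < 0 else (1.0 + x) ** (-alpha)

def ak_estimator(u, n, alpha, n_rep, rng=random.Random(3)):
    """Asmussen-Kroese estimator (2.2) of P(S_n > u) for fixed n, with
    Pareto(alpha) summands (an assumed example distribution)."""
    total = 0.0
    for _ in range(n_rep):
        xs = [rng.paretovariate(alpha) - 1.0 for _ in range(n - 1)]  # X_1..X_{n-1}
        m = max(xs, default=0.0)   # M_{n-1} (0 if n = 1)
        s = sum(xs)                # S_{n-1}
        total += n * bar_b0(max(m, u - s), alpha)
    return total / n_rep
```

For n = 1 the estimator is deterministic and equals B̄_0(u) exactly; for n ≥ 2 it can be checked against crude Monte Carlo.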
Notes and references In the case Z^{(3″)}(u) of a random N, it is suggested in [89] that either N be used as a control variate or that N be stratified; a substantial variance reduction was obtained. Theoretical support for the control-variate approach was provided by Hartinger & Kortschak [453], who showed that in fact, in this setting, the relative error goes to 0 as u → ∞.
2c Heavy tails: importance sampling
Asmussen, Binswanger & Højgaard [73] suggested an importance distribution B̃_0 that is much heavier than B_0. They showed, for example, that for the regularly varying case and the tail of B̃_0 being of order 1/log x, this gives bounded relative error. The practical experience with the algorithm is, however, discouraging, and a much better importance distribution was suggested by Juneja & Shahabuddin [511]. They suggested that the tail of B_0 be changed to c_1 B̄_0(x)^{θ(u)} on [x_0, ∞) and that the density c_2 b_0(x) be used on (0, x_0), where θ(u) → 0 and c_1, c_2 have to be chosen in a certain way. We will not give the details but only present a simplified version of the algorithm in the Pareto case b_0(x) = (α−1)/(1+x)^α, where we again choose b̃_0(x) = (α̃−1)/(1+x)^{α̃} as Pareto with α̃ = α̃(u) = αθ(u) → 0 (the regularly varying case is an easy extension). Thus the estimator is

Z^{(4)}(u) = I(S_N > u) ∏_{i=1}^N b_0(X_i)/b̃_0(X_i)   (2.4)

with the r.v.'s simulated as independent under the measure P̃, where N is geometric(ρ) and X_1, ..., X_N have density b̃_0(x).

Theorem 2.4 The estimator Z^{(4)}(u) is logarithmically efficient in the Pareto case provided log α̃/log u → 0.
Proof. Let Z_n^{(4)}(u) denote (2.4) with N replaced by some fixed n. Then

Ẽ Z_n^{(4)}(u)² = ∫···∫_{x_1+···+x_n>u} (b_0(x_1)²/b̃_0(x_1)²) ··· (b_0(x_n)²/b̃_0(x_n)²) b̃_0(x_1) ··· b̃_0(x_n) dx_1 ··· dx_n
= c_#^{−n} ∫···∫_{x_1+···+x_n>u} b_0^#(x_1) ··· b_0^#(x_n) dx_1 ··· dx_n = c_#^{−n} P^#(S_n > u),

where (c_#)^{−1} = ∫ b_0²/b̃_0 and b_0^# = c_# b_0²/b̃_0. Now

c_#^{−1} = ∫_0^∞ ((α−1)²/(1+x)^{2α}) / ((α̃−1)/(1+x)^{α̃}) dx = (α−1)²/(α̃(2α−α̃)) ∼ (α−1)²/(2αα̃).   (2.5)

Bounding P^{2α−α̃}(S_n > u) above and below by

P^{2α−ε}(S_n > u) ∼ n/u^{2α−ε}, respectively P^{2α+ε}(S_n > u) ∼ n/u^{2α+ε},

letting ε ↓ 0 and using log α̃/log u → 0 easily gives that Z_n^{(4)}(u) is logarithmically efficient for P(S_n > u). We omit the details needed to deal with a geometric N. □

Notes and references Asmussen, Binswanger & Højgaard [73] give a general survey of rare events simulation for heavy-tailed distributions. In many respects the findings of [73] are quite negative: the large deviations ideas which are the main approach to rare events simulation in the light-tailed case do not seem to work for heavy tails.
A main restriction of all algorithms considered in this section is that they are intimately tied to the compound Poisson model, because the explicit form of the Pollaczeck-Khinchine formula is crucial (say, in the renewal or Markov-modulated model P(τ_+ < ∞) and G_+ are not explicit). A further interesting and useful idea applicable in the Pollaczeck-Khinchine framework with heavy tails was given by Juneja [510]. He noted that

P(S_N > u, M_N > u | N) = P(M_N > u | N) = 1 − B_0(u)^N

is explicit, so that only P(S_N > u, M_N ≤ u) needs to be simulated.
3 Static importance sampling via Lundberg conjugation
We consider again the compound Poisson model and assume the conditions of the Cramér-Lundberg approximation, so that z(u) = ψ(u) ∼ Ce^{−γu}. We use the representation

ψ(u) = E_L e^{−γS_{τ(u)}} = e^{−γu} E_L e^{−γξ(u)},

where ξ(u) = S_{τ(u)} − u is the overshoot (cf. IV.5), and simulate from P_L, that is, using β_L, B_L instead of β, B, for the purpose of recording Z(u) = e^{−γS_{τ(u)}}. For practical purposes, the continuous-time process {S_t} is simulated by considering it at the discrete epochs {σ_k} corresponding to claim arrivals. Thus, the algorithm for generating Z = Z(u) is:
1. Compute γ > 0 as the solution of the Lundberg equation 0 = κ(γ) = β(B̂[γ] − 1) − γ, and define β_L, B_L by β_L = βB̂[γ], B_L(dx) = e^{γx}B(dx)/B̂[γ].
2. Let S ← 0.
3. Generate T as exponential with parameter β_L and U from B_L. Let S ← S + U − T.
4. If S > u, let Z ← e^{−γS}. Otherwise, return to 3.
There are various intuitive reasons that this should be a good algorithm. It resolves the infinite horizon problem since P_L(τ(u) < ∞) = 1. We may expect a small variance since we have used our knowledge of the form of ψ(u) to isolate what is really unknown, namely E_L e^{−γξ(u)}, and avoid simulating the known part e^{−γu}. More precisely, the results of V.7 show that P(· | τ(u) < ∞) and P_L (both measures restricted to F_{τ(u)}) asymptotically coincide on {τ(u) < ∞}, so that changing the measure to P_L is close to the optimal scheme for importance sampling, cf. the discussion at the end of Section 1b. In fact:

Theorem 3.1 The estimator Z(u) = e^{−γS_{τ(u)}} (simulated from P_L) has bounded relative error.

Proof. Just note that EZ(u)² ≤ e^{−2γu} ∼ z(u)²/C². □
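The four steps are straightforward to code. The sketch below assumes Exp(1) claims with arrival rate β < 1 and premium rate 1 (an assumption made only to keep β_L, B_L and ψ(u) = βe^{−(1−β)u} explicit): then γ = 1 − β, β_L = 1 and B_L = Exp(β).

```python
import random, math

def psi_siegmund(u, beta, n_rep, rng=random.Random(5)):
    """Lundberg-conjugated (Siegmund-type) estimator of psi(u) for the
    compound Poisson model with premium rate 1 and (assumed) Exp(1) claims,
    beta < 1.  Then gamma = 1 - beta, beta_L = 1 and B_L = Exp(beta), so the
    claim surplus drifts upward under P_L and ruin is certain."""
    gamma = 1.0 - beta
    total = 0.0
    for _ in range(n_rep):
        s = 0.0
        while s <= u:
            t = rng.expovariate(1.0)   # interarrival time, rate beta_L = 1
            x = rng.expovariate(beta)  # claim from B_L = Exp(beta)
            s += x - t                 # claim surplus at claim epochs
        total += math.exp(-gamma * s)  # Z(u) = exp(-gamma * S_tau(u))
    return total / n_rep
```

Note that the e^{−γu} factor comes out automatically via S_{τ(u)} = u + ξ(u); only the overshoot contributes randomness, which is why the relative error stays bounded.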
It is tempting to ask whether choosing importance sampling parameters β̃, B̃ different from β_L, B_L could improve the variance of the estimator. The answer is no. In detail, to deal with the infinite horizon problem, one must restrict attention to the case β̃μ_B̃ ≥ 1. The estimator is then

Z(u) = ∏_{i=1}^{M(u)} (βe^{−βT_i}/(β̃e^{−β̃T_i})) · (dB/dB̃)(U_i),   (3.1)

where M(u) is the number of claims leading to ruin, and we have:
Theorem 3.2 The estimator (3.1) (simulated with parameters β̃, B̃) is not logarithmically efficient when (β̃, B̃) ≠ (β_L, B_L).

The proof is given below as a corollary to Theorem 3.3.
The algorithm generalizes easily to the renewal model. We formulate this in a slightly more general random walk setting.¹ Let X_1, X_2, ... be i.i.d. with distribution F, let S_n = X_1 + ··· + X_n, M(u) = inf{n : S_n > u}, and assume that μ_F < 0 and that F̂[γ] = 1, F̂′[γ] < ∞ for some γ > 0. Let F_L(dx) = e^{γx}F(dx). The importance sampling estimator is then Z(u) = e^{−γS_{M(u)}}. More generally, let F̃ be an importance sampling distribution equivalent to F and

Z(u) = ∏_{i=1}^{M(u)} (dF/dF̃)(X_i).   (3.2)
Theorem 3.3 The estimator (3.2) (simulated with distribution F̃ of the X_i) has bounded relative error when F̃ = F_L. When F̃ ≠ F_L, it is not logarithmically efficient.

Proof. The first statement is proved exactly as Theorem 3.1. For the second, write

W(F | F̃) = (dF/dF̃)(X_1) ··· (dF/dF̃)(X_{M(u)}).

By the chain rule for Radon-Nikodym derivatives,

E_F̃ Z(u)² = E_F̃ W²(F | F̃) = E_F̃ [W²(F | F_L) W²(F_L | F̃)] = E_L [W²(F | F_L) W(F_L | F̃)] = E_L exp{K_1 + ··· + K_{M(u)}},

where

K_i = log[((dF/dF_L)(X_i))² (dF_L/dF̃)(X_i)] = −log (dF̃/dF_L)(X_i) − 2γX_i.

Here E_L K_i = ε_0 − 2γ E_L X_i, where

ε_0 = −E_L log (dF̃/dF_L)(X_i) > 0

by the information inequality. Since K_1, K_2, ... are i.i.d., Jensen's inequality and Wald's identity yield

E_F̃ Z(u)² ≥ exp{E_L(K_1 + ··· + K_{M(u)})} = exp{E_L M(u)(ε_0 − 2γ E_L X_i)}.

Since E_L M(u)/u → 1/E_L X_i, it thus follows that for 0 < ε < ε_0/E_L X_i,

lim sup_{u→∞} E_F̃ Z(u)²/(z(u)² e^{εu}) = lim sup_{u→∞} E_F̃ Z(u)²/(C² e^{−2γu+εu}) ≥ lim sup_{u→∞} e^{−2γu+εu}/(C² e^{−2γu+εu}) = 1/C² > 0,

which completes the proof. □

¹For the renewal model, X_i = U_i − T_i, and the change of measure F → F_L corresponds to B → B_L, A → A_L as in Chapter VI.
Proof of Theorem 3.2. Consider compound Poisson risk processes with intensities β′, β″, generic interarrival times T′, T″, claim size distributions B′, B″ and generic claim sizes U′, U″. According to Theorem 3.3, all that needs to be shown is that if U′ − T′ and U″ − T″ have the same distribution, then β′ = β″ and B′ = B″.
First, by the memoryless property of the exponential distribution, U′ − T′ has a left exponential tail with rate β′ and U″ − T″ has a left exponential tail with rate β″. This immediately yields β′ = β″. Next, from

P(U′ − T′ > x) = β′ ∫_0^∞ e^{−β′y} B̄′(x + y) dy = β′ e^{β′x} ∫_x^∞ e^{−β′z} B̄′(z) dz,
P(U″ − T″ > x) = β″ ∫_0^∞ e^{−β″y} B̄″(x + y) dy = β″ e^{β″x} ∫_x^∞ e^{−β″z} B̄″(z) dz

(x > 0), β′ = β″ and the equality in distribution of U′ − T′ and U″ − T″, we conclude by differentiation that B̄′(x) = B̄″(x) for all x > 0, i.e. B′ = B″. □

Notes and references The importance sampling method was suggested by Siegmund [807] for discrete-time random walks and further studied by Asmussen [56] in the setting of compound Poisson risk models. The optimality result Theorem 3.1 is from Lehtonen & Nyrhinen [576], with the present (shorter and more elementary) proof taken from Asmussen & Rubinstein [99]. In [56], optimality is discussed in a heavy traffic limit η ↓ 0 rather than when u → ∞. The extension to the Markovian environment model is straightforward and was suggested in Asmussen [58]. Further discussion is in Lehtonen & Nyrhinen [577]. The queueing literature on related algorithms is extensive; see e.g. the references in Asmussen & Rubinstein [99], Heidelberger [455] and Juneja & Shahabuddin [512].
4 Static importance sampling for the finite horizon case
The problem is to produce efficient simulation estimators for ψ(u, T) with T < ∞. As in V.4, we write T = yu. The results of V.4 indicate that we can expect a major difference according to whether y < 1/κ′(γ) or y > 1/κ′(γ).
The easy case is y > 1/κ′(γ), where ψ(u, yu) is close to ψ(u), so that one would expect the change of measure P → P_L to produce close to optimal results. In fact:

Proposition 4.1 If y > 1/κ′(γ), then the estimator Z(u) = e^{−γS_{τ(u)}} I(τ(u) ≤ yu) (simulated with parameters β_L, B_L) has bounded relative error.

Proof. The assumption y > 1/κ′(γ) ensures that ψ(u, yu)/ψ(u) → 1 (Theorem V.4.1), so that z(u) = ψ(u, yu) is of order of magnitude e^{−γu}. Bounding E_L Z(u)² above by e^{−2γu}, the result follows as in the proof of Theorem 3.1. □

We next consider the case y < 1/κ′(γ). We recall that α_y is defined as the solution of κ′(α) = 1/y, and that γ_y = α_y − yκ(α_y) determines the order of magnitude of ψ(u, yu) in the sense that

−log ψ(u, yu)/u → γ_y   (4.1)

(Theorem V.4.9), and that γ_y > γ. Further,

ψ(u, yu) = e^{−α_y u} E_{α_y}[e^{−α_y ξ(u) + τ(u)κ(α_y)}; τ(u) ≤ yu].   (4.2)

Since the definition of α_y is equivalent to E_{α_y} τ(u) ∼ yu, one would expect that the change of measure P → P_{α_y} is in some sense optimal. The corresponding estimator is

Z(u) = e^{−α_y S_{τ(u)} + τ(u)κ(α_y)} I(τ(u) ≤ yu),   (4.3)

and we have:

Theorem 4.2 The estimator (4.3) (simulated with parameters β_{α_y}, B_{α_y}) is logarithmically efficient.

Proof. Since γ_y > γ, we have κ(α_y) > 0 and get

E_{α_y} Z(u)² = E_{α_y}[e^{−2α_y S_{τ(u)} + 2τ(u)κ(α_y)}; τ(u) ≤ yu] ≤ e^{−2γ_y u} E_{α_y}[e^{−2α_y ξ(u)}; τ(u) ≤ yu] ≤ e^{−2γ_y u}.
Hence by (4.1),

lim inf_{u→∞} (−log Var(Z(u)))/(−log z(u)) = lim inf_{u→∞} (−log Var(Z(u)))/(γ_y u) ≥ 2,

so that (1.5) follows. □
Remark 4.3 Theorem V.4.9 has a stronger conclusion than (4.1), and in fact (4.1) (which is all that is needed here) can be shown more easily. Let σ_y² = lim_{u→∞} Var_{α_y}(τ(u))/u, so that (τ(u) − yu)/(σ_y u^{1/2}) → N(0, 1) in distribution (see Proposition V.4.2). Then

z(u) = E_{α_y} Z(u) ≥ E_{α_y}[e^{−α_y S_{τ(u)} + τ(u)κ(α_y)}; yu − σ_y u^{1/2} < τ(u) ≤ yu]
= e^{−α_y u + yuκ(α_y)} E_{α_y}[e^{−α_y ξ(u) + (τ(u)−yu)κ(α_y)}; yu − σ_y u^{1/2} < τ(u) ≤ yu]
≥ e^{−γ_y u − σ_y u^{1/2} κ(α_y)} E_{α_y}[e^{−α_y ξ(u)}; yu − σ_y u^{1/2} < τ(u) ≤ yu]
∼ e^{−γ_y u − σ_y u^{1/2} κ(α_y)} E_{α_y}[e^{−α_y ξ(∞)}] (Φ(1) − 1/2),

where the last step follows by Stam's lemma (Proposition V.4.4). Hence

lim inf_{u→∞} log z(u)/u ≥ lim inf_{u→∞} (−γ_y u − σ_y u^{1/2} κ(α_y))/u = −γ_y.

That lim sup log z(u)/u ≤ −γ_y follows similarly (but more easily), as when estimating E_{α_y} Z(u)² above. □

Notes and references The algorithms in the present section are the obvious ones, but seem to have been discussed for the first time in the first edition of this book. See also Nyrhinen [667]. In Asmussen [56], a related discussion is given in a heavy traffic limit η ↓ 0 rather than when u → ∞.
5 Dynamic importance sampling
The terms dynamic importance sampling and adaptive importance sampling are used in at least two different meanings. One meaning is algorithms that, during the execution, change the importance distribution or search for a good one; a good example is the cross-entropy algorithm, Rubinstein & Kroese [751]. The sense in which we will understand these terms is in describing algorithms that are level- and time-dependent: the importance distribution for (say) the Poisson rate
and the claim size distribution at time t in a compound Poisson claim surplus process {St } depends on the current value St as well as on t. Algorithms of this type have received considerable attention in recent years in areas such as queueing theory and have managed to provide efficient algorithms in situations where traditional (static) importance sampling got into difficulties. The basic idea in most of the papers in the area is to implement the principle of looking for a description of the conditional distribution given the rare event. We will exemplify this in two settings. Most steps in the variance calculations leading to asymptotic efficiency results are omitted since they are always very lengthy and technical in the dynamical setting.
5a An algorithm by Dupuis, Leder and Wang
We follow Dupuis, Leder & Wang [336]. The setting is again that of estimating P(S_n > u), where S_n = X_1 + ··· + X_n with X_1, ..., X_n nonnegative and i.i.d. with common subexponential distribution F with density f (F can in particular be B_0 as discussed earlier).
In dynamic importance sampling, the importance distribution P̃ generates X_k from a density f̃_{u,k,x} depending on u, k and S_{k−1} = x. Thus, the estimator is

Z(u) = I(S_n > u) ∏_{k=1}^n f(X_k)/f̃_{u,k,S_{k−1}}(X_k).   (5.1)

If x > u, obviously no importance sampling is needed. If x ≤ u, x will typically be much smaller than u. Basically, the event S_n > u then occurs by one of the X_ℓ, ℓ = k, ..., n, exceeding u − x, and the probability that k = ℓ is 1/(n − k + 1); otherwise X_k is 'typical'. This suggests taking

f̃_{u,k,x}(y) = ((n−k)/(n−k+1)) f(y) + (1/(n−k+1)) I(y > u − x) f(y)/F̄(u − x)   (5.2)

(note that I(y > u − x) f(y)/F̄(u − x) is the conditional density of X_k given X_k > u − x). Unfortunately, this idea is too naive to produce efficient estimators, see Remark 5.2 below. One needs to replace the conditioning X_k > u − x by X_k > a(u − x) for some a < 1. As a generalization, [336] also allows weights different from those in (5.2). Thus, instead of (5.2) one has

f̃_{u,k,x}(y) = p_k f(y) + q_k I(y > a(u − x)) f(y)/F̄(a(u − x)),   (5.3)

where p_k + q_k = 1.
Theorem 5.1 Assume that F is regularly varying with index α and that the importance distribution is given by (5.3). Then for any fixed n, the estimator (5.1) has bounded relative error. More precisely,

Ẽ Z(u)²/F̄(u)² → ∏_{k=1}^{n−1} 1/p_k + (1/a^α) Σ_{ℓ=1}^{n−1} (1/q_ℓ) ∏_{k=1}^{ℓ−1} 1/p_k, u → ∞.   (5.4)

As said above, the proof is too lengthy to be given here.

Remark 5.2 Relation (5.4) shows that the closer a is to 1, the more asymptotically efficient the estimator (5.3) is. It is therefore tempting to take a = 1. However, it turns out that there is a discontinuity at a = 1, and for a = 1 there is in fact not even logarithmic efficiency. The problem is that the first-order heavy-tailed asymptotics are more imprecise than with light tails: realizations with max X_k < u but S_n > u are asymptotically unimportant, but cannot be neglected for a finite u. This phenomenon is somewhat related to the slow rate of convergence of heavy-tailed approximations. □

Remark 5.3 An obvious question is to find the minimizers p_1*, ..., p_n* of the r.h.s. of (5.4). They are in fact not given by p_k = (n−k)/(n−k+1) but by

p_k* = ((n−k−1)/a^{α/2} + 1)/((n−k)/a^{α/2} + 1)

(of course, these two expressions coincide as a ↑ 1). □
Notes and references Further relevant papers in the same direction are Dupuis & Wang [337] and Hult & Svensson [484].
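A sketch of the estimator (5.1) with the mixture densities (5.3), using the weights p_k = (n−k)/(n−k+1) from (5.2) and the threshold a(u−x), a < 1. The Pareto form of F, the helper names and the parameter values are assumptions made for this illustration only:

```python
import random

ALPHA = 2.0  # Pareto index of F (assumed example distribution)

def f_tail(t):
    """Pareto tail bar(F)(t) = (1 + t)^(-ALPHA) for t >= 0, and 1 for t < 0."""
    return 1.0 if t < 0 else (1.0 + t) ** (-ALPHA)

def sample_tail(s, rng):
    """Draw from F conditioned on exceeding s (unconditional if s <= 0),
    using the Pareto scaling property."""
    base = max(s, 0.0)
    return (1.0 + base) * rng.paretovariate(ALPHA) - 1.0

def dlw_estimator(u, n, a, n_rep, rng=random.Random(4)):
    """Dynamic importance sampling estimator (5.1) with mixtures (5.3),
    weights p_k = (n-k)/(n-k+1) and threshold a*(u - x)."""
    total = 0.0
    for _ in range(n_rep):
        s, w = 0.0, 1.0
        for k in range(1, n + 1):
            if s > u:                          # event already occurred: no IS
                x = rng.paretovariate(ALPHA) - 1.0
            else:
                p = (n - k) / (n - k + 1.0)
                thr = a * (u - s)
                if rng.random() < p:           # 'typical' branch, density f
                    x = rng.paretovariate(ALPHA) - 1.0
                else:                          # conditioned-to-exceed branch
                    x = sample_tail(thr, rng)
                # likelihood ratio f / f_tilde at the sampled point
                w /= p + (1 - p) * (1.0 if x > thr else 0.0) / f_tail(thr)
            s += x
        total += w * (s > u)
    return total / n_rep
```

The estimator is unbiased for any a ∈ (0, 1) (sample paths that the tilted last step cannot produce all have indicator 0), so it can be checked against crude Monte Carlo.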
5b An algorithm by Blanchet and Glynn
For a more general discussion of the distribution of a stochastic process given a rare event (e.g. ruin), consider a discrete state Markov chain {X_n} with transition probabilities p(x, y). Let the state space be E, let G ⊂ E and τ_G = inf{n : X_n ∈ G}, h(x) = P_x(τ_G < ∞). Then for any initial value x_0 ∉ G, the conditional distribution of {X_n} given τ_G < ∞ is a Markov chain with transition probabilities

p*(x, y) = p(x, y) h(y)/h(x).   (5.5)

See [79, VI.7]; the transition function in (5.5) is referred to as an h-transform.
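A minimal numerical illustration of (5.5) (a gambler's-ruin example chosen here, not from the text): for a random walk on {0, ..., 5} with up-probability p = 0.4 and G = {5}, h is explicit, and one can verify that p*(x, y) = p(x, y)h(y)/h(x) is again a transition function and that the conditioned chain never moves to the state where h = 0.

```python
def h_transform_demo():
    """Gambler's-ruin illustration of the h-transform (5.5): a nearest-
    neighbour walk on {0,...,5} with up-probability p = 0.4, absorbed at 0
    and 5, and G = {5}.  Here h(x) = P_x(hit 5 before 0) is explicit:
    h(x) = (1 - r^x)/(1 - r^N) with r = q/p."""
    p, q, N = 0.4, 0.6, 5
    r = q / p
    h = [(1 - r ** x) / (1 - r ** N) for x in range(N + 1)]
    p_star = {}
    for x in range(1, N):
        p_star[(x, x + 1)] = p * h[x + 1] / h[x]   # p(x,y) h(y)/h(x)
        p_star[(x, x - 1)] = q * h[x - 1] / h[x]
    return h, p_star
```

The rows of p* sum to 1 because h is harmonic off G, and p*(1, 0) = 0: conditioned on reaching G, the chain cannot enter the losing state.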
In a ruin context, one is interested in evaluating h(x_0) or several of the h(x). Simulating using the p*(x, y) is of course not practicable, since one is simulating precisely because the function h is unknown. However, one may try to plug in an approximation and adapt this as an importance sampling scheme.

Example 5.4 Assume that X_n = S_n is a random walk with negative drift, G = (u, ∞) and x_0 = 0 (of course, the compound Poisson case or the renewal model can be handled in this way by looking at the risk process at claim arrival instants only). Then h(0) = ψ(u). In (5.5), write y = x + z and assume for simplicity that the increment distribution has a density f. With light tails, we then have the Cramér-Lundberg approximation h(v) ≈ Ce^{−γ(u−v)} for v < u, so that (5.5) suggests that the transition density from x to x + z < u be taken roughly as

f*(x + z | x) = f(z) · Ce^{−γ(u−x−z)}/Ce^{−γ(u−x)} = f(z)e^{γz}.

That is, we are back at the Siegmund algorithm discussed in Section 3, which was shown there to give bounded relative error. Obviously, this is a promising start for implementing the h-transform ideas. However, light tails are the easy case! We will see below that much more care is needed for heavy-tailed increments. Here the suggestion from the standard subexponential approximations in Chapter X is that the transition density p̃(x, x + z) from x to x + z < u be taken roughly as

f*(x + z | x) = D(x) f(z) F̄_I(u − x − z)/F̄_I(u − x),

where D(x) is a normalizing constant. However, there are at least two difficulties with this choice. First, f*(x + z | x) is not a standard density even in simple cases such as the Pareto, where

f*(x + z | x) ≈ α(1 + u − x)^{α−1} / ((1 + z)^{α+1}(1 + u − x − z)^{α−1}),

so that it is not straightforward to generate r.v.'s from f*(x + z | x). Further, f*(x + z | x) depends on x, which makes bounding the variance far more difficult than in the Siegmund case. □

We will not discuss the r.v. generation issue here, but to resolve the difficulties in bounding the variance, we return to the general Markov chain case. Assume that the importance distribution is a Markov chain with transition probabilities of the form p̃(x, y) = p(x, y)/r(x, y), where for each x one would typically try to choose r(x, y) as c(x)/a(y), where a(y) is roughly asymptotically
proportional to h(y) and c(x) ensures the normalization Σ_{y∈E} p̃(x, y) = 1. The estimator for z = h(x_0) is then

Z = I(τ_G < ∞) ∏_{n=1}^{τ_G} r(X_{n−1}, X_n),   (5.6)

with the X_n simulated as a Markov chain with X_0 = x_0 and transition probabilities p̃(x, y). Had one used instead the p*(x, y), one could be sure that the simulation terminates, i.e. P*(τ_G < ∞) = 1. Given the way the r(x, y) have been chosen, one could hope that also P̃(τ_G < ∞) = 1, but this is a separate problem that we will ignore in the following.
As noted above, a crucial but not easy point is to estimate and bound the variance of Z, or equivalently the second moment vector m_2 with elements m_2(x) = Ẽ_x Z², x ∉ G. In the rest of this section, we follow Blanchet & Glynn [175]. The main idea of [175] is to use a Lyapounov function technique, cf. part (iii) of the following result. Define K as the G^c × G^c matrix with elements k(x, y) = r(x, y)p(x, y), and let η be the column vector with elements η(x) = Σ_{y∈G} k(x, y). Note that m_2 and η have dimension G^c.

Theorem 5.5 (i) The vector m_2 is the minimal solution to m_2 = η + Km_2.
(ii) m_2 = Σ_{m=0}^∞ K^m η.
(iii) Let k be a G^c-vector such that Kk ≤ k − η. Then m_2 ≤ k.

Given the potential of Theorem 5.5 and the following Corollary 5.6 for the (in general very difficult) problem of bounding the variance of the estimator (5.6), we give the proof.

Proof. We have

Ẽ_{x_0}[Z²; τ_G = m] = Σ_{x_1,...,x_{m−1}∉G, x_m∈G} ∏_{n=1}^{m} r(x_{n−1}, x_n)² p̃(x_{n−1}, x_n).

This is the x_0th element of the vector K^{m−1}η. Summing over m, (ii) follows, and (i) is then an easy consequence. For (iii), we have η ≤ k − Kk and hence K^m η ≤ K^m k − K^{m+1} k for all m. Thus

Σ_{m=0}^n K^m η ≤ k − K^{n+1} k ≤ k.

Letting n → ∞ and using (ii) gives (iii). □

Intuitively, one should choose a(x) as a good approximation to h(x) and then take k(x) of the form a(x)² k̂(x) with k̂(x) = O(1). This idea is made precise in the following corollary and its proof:

Corollary 5.6 Assume that r(x, y) has the form c(x)/a(y) and that

Σ_{y∈E} c(x) p(x, y) a(y) k̂(y) ≤ k̂(x) a(x)²   (5.7)

for some E-vector k̂ and all x ∉ G. If k̂(x) ≥ 1 for all x ∈ E and a(y) ≥ κ > 0 for all y ∈ G, then m_2(x) ≤ κ^{−2} a(x)² k̂(x) for all x ∉ G.

Proof. We first note that

η(x) = Σ_{y∈G} p(x, y) c(x)/a(y) ≤ κ^{−2} c(x) Σ_{y∈G} p(x, y) a(y) ≤ κ^{−2} c(x) Σ_{y∈G} p(x, y) a(y) k̂(y).   (5.8)
k(x). Then for x 6∈ G, we have Define k(x) = κ−2 a(x)2 b κ−2 c(x)p(x, y)a(y)b k(y) = κ−2 b k(x, y)a(y)2 b k(y) = b k(x, y)k(y) . Thus combining with (5.8), it follows from (5.7) divided by κ2 that k ≥ Kk + η. Now appeal to Theorem 5.5(iii). 2 In the rest of this section, we assume that Xn = −u + Y1 + · · · + Yn is a random walk with negative drift, subexponential increments and (for simplicity) density f (z), and take G = (0, ∞). The ruin probability is then ψ(u) = h(−u) and τG is the ruin time. The start of implementing the htransform ideas is easy: as for light tails, we have an approximation for h(v) for v < 0, now h(v) ≈ a(v) = CF I (−v) where FI is the integrated tail distribution of the increment distribution F , cf. X.3.1. In the representation r(x, y) = c(x)/a(y), it will be convenient to be able to think of a(y) as the tail of a r.v. Z, that we take as the r.v. with Z ∞ h i 1 F (s) ds . P(Z > z) = min 1 , EY  z Thus, we take a(y) = P(Z > z) and have Z c(x) = p(x, y)a(y) dy = Ea(x + Y ) = P(Y + Z > −x) . R
The most obvious procedure is now to use the estimator

Z = Z(u) = I(τ_G < ∞) ∏_{n=1}^{τ_G} c(X_{n−1})/a(X_n) .   (5.9)
5. DYNAMIC IMPORTANCE SAMPLING
However, as for the Dupuis–Leder–Wang algorithm one encounters the difficulty that the most obvious choice does not have the desired efficiency properties but needs modification, more precisely to

Z = Z(u) = I(τ_G < ∞) ∏_{n=1}^{τ_G} c(X_{n−1} + x*)/a(X_n + x*) .   (5.10)

Here x* = x*(γ) is taken as in the following lemma (recall the definition of the class S* from p. 302):

Lemma 5.7 Assume Y⁺ ∈ S*. Then:
(i) c(x) − a(x) = o(F̄(−x)) as x → −∞;
(ii) given γ ∈ (0, 1], there exists x*(γ) ≤ 0 such that

(a(x)² − c(x)²)/(F̄(−x) c(x)) ≥ −γ   for all x ≤ x*(γ) .   (5.11)
Theorem 5.8 Assume Y⁺ ∈ S*, let 0 < γ < 1, let x* be defined as in (5.11) and d(x*) = P(Z > −x*). Then

lim sup_{u→∞} Ẽ[Z(u)²]/ψ(u)² ≤ 1/((1 − γ) d(x*)²) .
The proofs of Theorem 5.8 and Lemma 5.7 (which is a crucial step in bounding the variance) are long and technical, although in principle elementary, and will not be reproduced here. We note once more that random variate generation in (5.11) is not a standard problem.

Notes and references A further relevant reference in the setting of the Blanchet–Glynn algorithm is Blanchet, Glynn & Liu [176].

To summarize our discussion of dynamic importance sampling, the method is not straightforward to implement in the heavy-tailed case. The most obvious ideas need modification and tuning to produce efficient algorithms, and these steps may require tedious calculations, cf. Lemma 5.7. Further, bounding the variance is not straightforward at all, and random variate generation may present problems. However, the Blanchet–Glynn algorithm is remarkable for being the first to be efficient for an infinite horizon problem with heavy tails when no alternative representation (say the Pollaczeck–Khinchine geometric sum in the compound Poisson model) is available. In fact, it is shown in Bassamboo, Juneja & Zeevi [140] that no static importance sampling algorithm exists for efficient simulation of the tail of the maximum of a random walk (at least in the regularly varying case).
6  Regenerative simulation
Our starting point is the duality representations in III.3: for many risk processes {R_t}, there exists a dual process {V_t} such that

ψ(u, T) = P(inf_{0≤t≤T} R_t < 0) = P(V_T > u),
ψ(u) = P(inf_{t≥0} R_t < 0) = P(V_∞ > u),   (6.1)
where the identity for ψ(u) requires that V_t has a limit in distribution V_∞. In most of the simulation literature (say in queueing applications), the object of interest is {V_t} rather than {R_t}, and (6.1) is used to study V_∞ by simulating {R_t} (for example, the algorithm in Section 3 produces simulation estimates for the tail P(W > u) of the GI/G/1 waiting time W). However, we believe that there are examples also in risk theory where (6.1) may be useful. One main example is {V_t} being regenerative (see A.1): then by Proposition A1.3,

ψ(u) = P(V_∞ > u) = (1/Eω) E ∫_0^ω I(V_t > u) dt   (6.2)

where ω is the generic cycle for {V_t}. The method of regenerative simulation, which we survey below, provides estimates for P(V_∞ > u) (and more general expectations Eg(V_∞)). Thus the method provides one answer on how to avoid simulating {R_t} for an infinitely long time period.

For details, consider first the case of independent cycles. Simulate a zero-delayed version of {V_t} until a large number N of cycles have been completed. For the ith cycle, record Z^{(i)} = (Z₁^{(i)}, Z₂^{(i)}), where Z₁^{(i)} = ω_i is the cycle length, Z₂^{(i)} is the time during the cycle where {V_t} exceeds u, and z_j = E Z_j^{(i)}, j = 1, 2. Then Z^{(1)}, …, Z^{(N)} are i.i.d. and

E Z₁^{(i)} = z₁ = Eω ,   E Z₂^{(i)} = z₂ = E ∫_0^ω I(V_t > u) dt .
Thus, letting

Z̄₁ = (Z₁^{(1)} + ··· + Z₁^{(N)})/N ,   Z̄₂ = (Z₂^{(1)} + ··· + Z₂^{(N)})/N ,

ψ̂(u) = Z̄₂/Z̄₁ = (Z₂^{(1)} + ··· + Z₂^{(N)})/(Z₁^{(1)} + ··· + Z₁^{(N)}) ,
the LLN yields Z̄₁ → z₁ a.s., Z̄₂ → z₂ a.s., and hence

ψ̂(u) → z₂/z₁ = (1/Eω) E ∫_0^ω I(V_t > u) dt = ψ(u)   a.s.
b as N → ∞. Thus, the regenerative estimator ψ(u) is consistent. To derive confidence intervals, let Σ denote the 2 × 2 covariance matrix of Z (i) . Then ¢ D 1 ¡ √ Z1 − z1 , Z2 − z2 → N2 (0, Σ) . N Therefore, a standard transformation technique (sometimes called the delta method, cf. [79, IV.4]) yields ¢ ¢ D 1 ¡ ¡ √ h Z1 , Z2 − h (z1 , z2 ) → N (0, σh2 ) N for h : R2 → R and σh2 = ∇h Σ∇0h , ∇h = (∂h/∂z1 ∂h/∂z2 ). Taking h(z1 , z2 ) = z2 /z1 yields ∇h = (−z2 /z12 1/z1 ), ¢ D 1 ¡b √ ψ(u) − ψ(u) → N (0, σ 2 ) N where σ2 =
1 z2 z22 Σ11 + 2 Σ22 − 2 3 Σ12 . 4 z1 z1 z1
(6.3)
(6.4)
The natural estimator for Σ is the empirical covariance matrix

S = (1/(N − 1)) Σ_{i=1}^{N} (Z^{(i)} − Z̄)(Z^{(i)} − Z̄)^T ,

so σ² can be estimated by

s² = (Z̄₂²/Z̄₁⁴) S₁₁ + (1/Z̄₁²) S₂₂ − 2 (Z̄₂/Z̄₁³) S₁₂ ,   (6.5)

and the 95% confidence interval is ψ̂(u) ± 1.96 s/√N.

The regenerative method is not likely to be efficient for large u but is rather a brute-force one. However, in some situations it may be the only one resolving the infinite horizon problem, say for risk processes with a complicated structure of the point process of claim arrivals and heavy-tailed claims. There is potential also for combining it with some variance reduction method.
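As an illustration, the following sketch (a hypothetical discrete-time example added for this discussion, not from the original text) applies the regenerative estimator (6.2) and the delta-method confidence interval (6.3)–(6.5) to the Lindley-type recursion V_n = max(V_{n−1} + X_n − 1, 0), with claim sizes chosen so that the stationary exceedance probability P(V_∞ ≥ 2) equals 1/7 and can serve as a check; cycles are the excursions of V between successive visits to 0.

```python
import numpy as np

# Regenerative estimator (6.2) with the delta-method CI (6.3)-(6.5) for a
# hypothetical discrete-time example: V_n = max(V_{n-1} + X_n - 1, 0),
# X i.i.d. with P(X=0) = p and P(X=k) = (1-p)(1-alpha)*alpha**(k-1), k >= 1.
rng = np.random.default_rng(42)
p, alpha, u, N = 0.6, 0.3, 2, 20000

def claim():
    if rng.random() < p:
        return 0
    return int(rng.geometric(1 - alpha))   # P(k) = (1-alpha)*alpha**(k-1)

Z = np.empty((N, 2))
for i in range(N):
    v, length, exceed = 0, 0, 0
    while True:
        length += 1                        # Z1: cycle length
        exceed += (v >= u)                 # Z2: time spent at or above u
        v = max(v + claim() - 1, 0)
        if v == 0:
            break
    Z[i] = (length, exceed)

z1, z2 = Z.mean(axis=0)
psi_hat = z2 / z1                          # ratio estimator (6.2)
S = np.cov(Z.T)                            # empirical covariance, 1/(N-1) norm
sigma2 = z2**2 / z1**4 * S[0, 0] + S[1, 1] / z1**2 - 2 * z2 / z1**3 * S[0, 1]
half = 1.96 * np.sqrt(sigma2 / N)
print(f"estimate {psi_hat:.4f} +/- {half:.4f}")
```

For these parameters the discrete-time theory of Chapter XVI.1 gives P(V_∞ ≥ 2) = 1/7 ≈ 0.1429, which the estimate reproduces within its confidence interval.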
Notes and references The literature on regenerative simulation is extensive, and we will not attempt a literature survey here.
7  Sensitivity analysis
We return to the problem of IV.9, to evaluate the sensitivity ψ_ζ(u) = (d/dζ)ψ(u), where ζ is some parameter governing the risk process. In IV.9, asymptotic estimates were derived using the renewal equation for ψ(u). We here consider simulation algorithms, which have the potential of applying to substantially more complex situations.

Before going into the complications of ruin probabilities, consider an extremely simple example, the expectation z = EZ of a single r.v. Z of the form Z = ϕ(X), where X is a r.v. with distribution depending on a parameter ζ. Here are the ideas of the two main approaches in today's simulation literature:

The score function (SF) method. Let X have a density f(x, ζ) depending on ζ. Then z(ζ) = ∫ ϕ(x) f(x, ζ) dx, so that differentiation yields

z_ζ = (d/dζ) ∫ ϕ(x) f(x, ζ) dx = ∫ ϕ(x) (d/dζ) f(x, ζ) dx
    = ∫ ϕ(x) [((d/dζ) f(x, ζ))/f(x, ζ)] f(x, ζ) dx = E[SZ] ,

where

S = ((d/dζ) f(X, ζ))/f(X, ζ) = (d/dζ) log f(X, ζ)

is the score function familiar from statistics. Thus, SZ is an unbiased Monte Carlo estimator of z_ζ.

Infinitesimal perturbation analysis (IPA) uses sample path derivatives. So assume that a r.v. with density f(x, ζ) can be generated as h(U, ζ), where U is uniform(0, 1). Then z(ζ) = E ϕ(h(U, ζ)) and

z_ζ = E[(d/dζ) ϕ(h(U, ζ))] = E[ϕ′(h(U, ζ)) h_ζ(U, ζ)] ,

where h_ζ(u, ζ) = (∂/∂ζ) h(u, ζ). Thus, ϕ′(h(U, ζ)) h_ζ(U, ζ) is an unbiased Monte Carlo estimator of z_ζ. For example, if f(x, ζ) = ζe^{−ζx}, one can take h(U, ζ) = −log U/ζ, giving h_ζ(U, ζ) = log U/ζ².

The derivations of these two estimators are heuristic in that both use an interchange of expectation and differentiation that needs to be justified. For the SF method, this is usually unproblematic and involves some application of dominated convergence. For IPA there are, however, nonpathological examples where sample path derivatives fail to produce estimators with the correct expectation. To see this, just take ϕ as an indicator function, say ϕ(x) = I(x > x₀),
and assume that h(U, ζ) is increasing in ζ. Then, for some ζ₀ = ζ₀(U), ϕ(h(U, ζ)) is 0 for ζ < ζ₀ and 1 for ζ > ζ₀, so that the sample path derivative ϕ′(h(U, ζ)) is 0 w.p. one. Thus, IPA will estimate z_ζ by 0, which is obviously not correct. In the setting of ruin probabilities, this phenomenon is particularly unpleasant since indicators occur widely in the CMC estimators. A related difficulty occurs in situations involving the Poisson number N_t of claims: also here the sample path derivative w.r.t. β is 0. The following example demonstrates how the SF method handles this situation.

Example 7.1 Consider the sensitivity ψ_β(u) w.r.t. the Poisson rate β in the compound Poisson model. Let M(u) be the number of claims up to the time τ(u) of ruin (thus, τ(u) = T₁ + ··· + T_{M(u)}). The likelihood ratio up to τ(u) for two Poisson processes with rates β, β₀ is

∏_{i=1}^{M(u)} (β e^{−βT_i})/(β₀ e^{−β₀T_i}) · I(τ(u) < ∞) .
Taking expectation, differentiating w.r.t. β and letting β₀ = β, we get

ψ_β(u) = E[ ( Σ_{i=1}^{M(u)} (1/β − T_i) ) I(τ(u) < ∞) ]
       = E[ ( M(u)/β − τ(u) ) I(τ(u) < ∞) ] .

To resolve the infinite horizon problem, change the measure to P_L as when simulating ψ(u). We then arrive at the estimator

Z_β(u) = ( M(u)/β − τ(u) ) e^{−γu} e^{−γξ(u)}

for ψ_β(u) (to generate Z_β(u), the risk process should be simulated with parameters β_L, B_L). We recall (Proposition IV.9.4) that ψ_β(u) is of the order of magnitude u e^{−γu}. Thus, the estimation of ψ_β(u) is subject to the same problem concerning relative precision as in rare event simulation. However, since

E_L Z_β(u)² ≤ E_L[ ( M(u)/β − τ(u) )² ] e^{−2γu} = O(u²) e^{−2γu} ,

we have

Var_L(Z_β(u))/ψ_β(u)² ∼ O(u²) e^{−2γu}/(u² e^{−2γu}) = O(1) ,

so that in fact the estimator Z_β(u) has bounded relative error. □
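To make the two basic estimators concrete, here is a minimal numerical sketch (an illustration added for this discussion, not from the original text) for the simple setting z(ζ) = Eϕ(X) with ϕ(x) = x and X exponential with density f(x, ζ) = ζe^{−ζx}, so that z(ζ) = 1/ζ and z_ζ = −1/ζ²:

```python
import numpy as np

# SF vs. IPA for z(zeta) = E[X], X ~ Exp(rate zeta); true z_zeta = -1/zeta^2.
rng = np.random.default_rng(1)
zeta, n = 2.0, 200_000

U = rng.random(n)
X = -np.log(U) / zeta                  # h(U, zeta) = -log(U)/zeta

# Score function: S = d/dzeta log f(X, zeta) = 1/zeta - X; estimator S*Z, Z = X.
sf_est = ((1.0 / zeta - X) * X).mean()

# IPA: phi'(x) = 1 and h_zeta(U, zeta) = log(U)/zeta^2 = -X/zeta.
ipa_est = (np.log(U) / zeta**2).mean()

print(sf_est, ipa_est, -1.0 / zeta**2)
```

Both estimates agree with the true value −0.25; note the much smaller variance of the IPA estimator in this smooth example, consistent with the discussion above.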
Remark 7.2 IPA and score functions are not the only methods around. Here are some further alternatives:

Finite differences are simply a stochastic version of numerical differentiation. So, assume Z can be generated as h(X, ζ) for a suitable random vector X. Then the estimate of z_ζ is

( h(X, ζ + h/2) − h(X, ζ − h/2) ) / h   (7.1)

(there are several possible variants; this one uses common random numbers and central differences). In many situations, this idea is the simplest one to implement. Its problem is that the estimate is biased. In the limit h ↓ 0, (7.1) becomes the IPA estimator.

The idea of weak derivatives is measure-valued differentiation. Suppose that we are interested in the sensitivity of E h(Y, X) w.r.t. ζ, where X has density f(x, ζ) with f′(x, ζ) = ∂f(x, ζ)/∂ζ and Y is a random vector with distribution independent of ζ. Since ∫ f′(x, ζ) dx = 0, we will typically be able to write f′(x, ζ) as k f₊(x, ζ) − k f₋(x, ζ), where f₊(x, ζ), f₋(x, ζ) are probability densities and k is a constant. If W₊, W₋ are r.v.'s with these densities, we therefore have

(d/dζ) E h(Y, X) = ∫ E h(Y, x) (d/dζ) f(x, ζ) dx = ∫ E h(Y, x) f′(x, ζ) dx
                 = E[ k h(Y, W₊) − k h(Y, W₋) ] ,

so that the desired estimator can be taken as k h(Y, W₊) − k h(Y, W₋). For example, in the Poisson case f(x, ζ) = e^{−ζ} ζ^x/x! we get

f′(x, ζ) = e^{−ζ} ζ^{x−1}/(x − 1)! − e^{−ζ} ζ^x/x! = f(x − 1, ζ) − f(x, ζ)

(with the convention 1/(−1)! = 0), so that k = 1 and we can generate W₊, W₋ as V₊ + 1, V₋ with V₊, V₋ Poisson(ζ).

Finally, in finance (where the sensitivities go under the name Greeks) methods based on formulas from Malliavin calculus have become popular. □

Notes and references A general survey of simulation methods for evaluating sensitivities is given in Asmussen & Glynn [79]. For topics not treated there in detail, see e.g. Heidergott [456, 457] for weak derivatives, and Fournié et al. [368] and Kohatsu-Higa & Montero [549] for the Malliavin approach.
A general reference for IPA is Glasserman [417], one for the SF method is Rubinstein & Shapiro [754]. Example 7.1 is from Asmussen & Rubinstein [100], who also work out a number of similar sensitivity estimators, in part for different measures of risk than ruin probabilities, for different models and for the sensitivities w.r.t. different parameters. There has been much work on resolving the difficulties associated with IPA pointed out above. In the setting of ruin probabilities, a relevant reference is Vázquez-Abad [862].
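As a quick numerical check of the weak-derivative construction in Remark 7.2 (an illustrative sketch, not from the original text), take the Poisson case with h(x) = x², for which (d/dζ) E h(X) = (d/dζ)(ζ + ζ²) = 1 + 2ζ; using a common Poisson variate V for both W₊ = V + 1 and W₋ = V serves as a variance reduction:

```python
import numpy as np

# Weak-derivative estimator of (d/dzeta) E[h(X)], X ~ Poisson(zeta), h(x) = x^2.
# True value: d/dzeta (zeta + zeta^2) = 1 + 2*zeta.
rng = np.random.default_rng(7)
zeta, n = 1.5, 100_000

V = rng.poisson(zeta, size=n)       # common variate for W+ = V + 1 and W- = V
est = ((V + 1.0)**2 - V.astype(float)**2).mean()   # k = 1 here
print(est, 1 + 2 * zeta)
```

With common variates the per-sample estimator collapses to 2V + 1, whose mean is exactly 2ζ + 1, so the Monte Carlo error here reflects only the variance of V.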
Chapter XVI

Miscellaneous topics

1  More on discrete-time risk models
There are at least two reasons to consider the discrete-time counterparts of continuous-time risk models: one is that the resulting approximation can be computationally easier to handle, in particular when more complex features like interest, investment, dividends and reinsurance are also included. Secondly, one could claim that all events (claims, premium payments etc.) are in practice only observable and/or payable at discrete points in time, so that discrete modeling may be considered closer to reality. However, much of the mathematical elegance and insight is usually lost when replacing continuous-time dynamics by discrete ones. If the claim size distribution is also discrete (which is the case we consider here¹), then the differential equations from the continuous setup are replaced by difference equations, and therefore the probability of ruin can be calculated recursively for given numerical values of the model parameters. A disadvantage of this approach is that it is usually not possible to track the influence of model parameters on the final result and consequently the qualitative behavior of ruin probabilities. On the other hand, the resulting method for calculating ruin probabilities and related quantities is simple and general and, as we shall see below, some relations and identities of continuous-time models have analogues in the discrete-time setup.

¹ Note that at several places in this book we have already dealt with certain discrete-time models as approximations for continuous-time models, then usually with continuous claim size distributions. Here we focus on the fully discrete model to emphasize the computational alternative for obtaining ruin probabilities that it may offer.
Assume that the discrete-time risk reserve process {R_n^{(d)}} is given by

R_n^{(d)}(u) = u + n − Σ_{i=1}^{n} X_i ,   n ∈ N,   (1.1)

where X₁, X₂, … are i.i.d. integer-valued nonnegative random variables with probability function h_k = P(X₁ = k) (k = 0, 1, 2, …) and c.d.f. H. The interpretation is that X_j is the total claim amount paid in year (or time unit) j. The initial capital u is also assumed to be a nonnegative integer, so that {R_n^{(d)}} is always integer-valued. We impose throughout the net profit condition EX₁ < 1.

Remark 1.1 The model (1.1) is often referred to as the compound binomial model. This is justified because in each time interval the total claim size is 0 with probability h₀ > 0 and hence Σ_{i=1}^{n} X_i = Σ_{i=1}^{N_n} Y_i, where N_n is a binomial(n, 1 − h₀) r.v. and P(Y_i = k) = h_k/(1 − h₀) for k = 1, 2, …. The compound Poisson model then appears as the natural continuous-time limit. In that sense the discrete-time model can also help to sharpen the intuition for the continuous-time setup. □

Define as usual the claim surplus process {S_n^{(d)}} by

S_n^{(d)} = u − R_n^{(d)}(u) = Σ_{i=1}^{n} X_i − n .
The ruin time for (1.1) is defined as

τ^{(d)}(u) = min{ n ≥ 1 : R_n^{(d)}(u) ≤ 0 } = min{ n ≥ 1 : S_n^{(d)} ≥ u }

and the ruin probability as

ψ^{(d)}(u) = P( τ^{(d)}(u) < ∞ ) = P( max_{n≥1} S_n^{(d)} ≥ u )

(we follow here the tradition of the literature to consider the process ruined already if it reaches level 0, but only for some n ≥ 1, so ψ(0) < 1).

Proposition 1.2 The ruin probability for the discrete-time risk process (1.1) satisfies the recursion

ψ^{(d)}(u) = Σ_{y=0}^{u−1} (1 − H(y)) ψ^{(d)}(u − y) + Σ_{y=u}^{∞} (1 − H(y)) ,   u = 1, 2, …,   (1.2)

with starting value ψ^{(d)}(0) = Σ_{y=0}^{∞} (1 − H(y)) = E(X₁).
This is obviously a discrete analogue of the renewal equation IV.(3.2), and becomes clear at once if one conditions on the value y of the first (weak) ascending ladder height S_{τ₊} of the claim surplus process, where

τ₊ = τ^{(d)}(0) = min{ n ≥ 1 : S_n^{(d)} ≥ 0 } ,

provided it has been shown that

g_y^{(d)} = P( S_{τ₊} = y, τ₊ < ∞ ) = 1 − H(y) .

That g_y^{(d)} = 1 − H(y) can be proved by adapting the proof of Theorem III.5.1 from continuous to discrete time. This is straightforward, which is intuitively plausible from the fact that the claim surplus processes have the common feature of being downward skip-free with unit drift when no claims occur. Alternatively, one may simply refer to the form of g_y^{(d)} as a known result in random walk theory (e.g. [APQ, Cor. 5.6, p. 236] combined with the connection on [APQ, p. 222] between strong and weak ladder heights). We shall, however, also present a direct proof that avoids the slightly sophisticated probabilistic ideas of III.5 or [APQ].

Proof. In the first time unit, the premium income is 1 and the risk reserve process will only survive if the total claim amount satisfies X₁ ≤ u, and will then start anew at the level u + 1 − X₁ (note that because of the independence of the total claim amounts X₁, X₂, … the process is Markov). Hence

ψ^{(d)}(u) = Σ_{k=0}^{u} h_k ψ^{(d)}(u + 1 − k) + 1 − H(u)
           = Σ_{j=1}^{u+1} h_{u+1−j} ψ^{(d)}(j) + 1 − H(u) ,   u = 0, 1, 2, …   (1.3)
From this it follows that for w = 0, 1, 2, …

Σ_{u=0}^{w} ψ^{(d)}(u) = Σ_{u=0}^{w} Σ_{j=1}^{u+1} h_{u+1−j} ψ^{(d)}(j) + Σ_{u=0}^{w} (1 − H(u))
  = Σ_{j=1}^{w+1} ψ^{(d)}(j) Σ_{u=j−1}^{w} h_{u+1−j} + Σ_{u=0}^{w} (1 − H(u))
  = Σ_{j=1}^{w} ψ^{(d)}(j) H(w + 1 − j) + ψ^{(d)}(w + 1) h₀ + Σ_{u=0}^{w} (1 − H(u))
or equivalently

ψ^{(d)}(w + 1) h₀ = ψ^{(d)}(0) + Σ_{j=1}^{w} ψ^{(d)}(j) (1 − H(w + 1 − j)) − Σ_{u=0}^{w} (1 − H(u)) .
At the same time we can read off from (1.3) that

ψ^{(d)}(w + 1) h₀ = ψ^{(d)}(w) − Σ_{j=1}^{w} h_{w+1−j} ψ^{(d)}(j) − (1 − H(w)) ,
and equating the last two equations gives

ψ^{(d)}(w) = ψ^{(d)}(0) + Σ_{j=1}^{w} ψ^{(d)}(j) (1 − H(w − j)) − Σ_{j=0}^{w−1} (1 − H(j)) .   (1.4)

On the other hand, with g_y^{(d)} as above, we clearly have

ψ^{(d)}(u) = Σ_{y=0}^{u−1} g_y^{(d)} ψ^{(d)}(u − y) + Σ_{y=u}^{∞} g_y^{(d)} ,   u = 1, 2, …

and ψ^{(d)}(0) = Σ_{y=0}^{∞} g_y^{(d)}. We can hence write for u = 1, 2, …

ψ^{(d)}(u) = Σ_{y=0}^{u−1} g_y^{(d)} ψ^{(d)}(u − y) + ψ^{(d)}(0) − Σ_{y=0}^{u−1} g_y^{(d)}   (1.5)
           = Σ_{y=1}^{u} g_{u−y}^{(d)} ψ^{(d)}(y) + ψ^{(d)}(0) − Σ_{y=0}^{u−1} g_y^{(d)} .   (1.6)

Comparing (1.4) and (1.6) now establishes

g_y^{(d)} = 1 − H(y) ,   y = 0, 1, 2, …,   and   ψ^{(d)}(0) = Σ_{y=0}^{∞} (1 − H(y)) = E(X₁) .

Inserting the latter formula in (1.5) now gives (1.2). □
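The recursion (1.2) is straightforward to implement once the y = 0 term is moved to the left-hand side. The sketch below (added for illustration, not part of the original text) uses the geometric claim sizes of Example 1.6 below, for which 1 − H(y) = (1 − p)α^y and the closed-form answer ψ^{(d)}(u) = ((1 − p)/(1 − α))(α/p)^u is available as a check:

```python
from math import isclose

# Recursion (1.2), rearranged: the y = 0 term (1 - H(0)) psi(u) is moved to the
# left-hand side and one divides by H(0) = h_0. Geometric claims (Example 1.6):
# tail(y) = 1 - H(y) = (1-p)*alpha**y, with a closed-form psi as a check.
p, alpha = 0.6, 0.3

def tail(y):
    return (1 - p) * alpha**y

def tail_sum(u):                       # sum_{y >= u} (1 - H(y))
    return (1 - p) * alpha**u / (1 - alpha)

umax = 10
psi = [tail_sum(0)]                    # psi(0) = E[X_1] = (1-p)/(1-alpha)
for u in range(1, umax + 1):
    s = sum(tail(y) * psi[u - y] for y in range(1, u)) + tail_sum(u)
    psi.append(s / (1 - tail(0)))      # 1 - tail(0) = H(0) = h_0 = p

exact = [(1 - p) / (1 - alpha) * (alpha / p)**u for u in range(umax + 1)]
print(psi[3], exact[3])
```

The recursive values agree with the closed form to floating-point accuracy, confirming both Proposition 1.2 and the algebra of Example 1.6.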
Remark 1.3 Note the complete analogy of the formulas for g_y^{(d)} and ψ^{(d)}(0) with the ones for the compound Poisson risk model in Chapter IV (this comes as no surprise, since as outlined in Remark 1.1 the latter may be interpreted as the continuous-time limit of the compound binomial model). □

Remark 1.4 The proof of Proposition 1.2 via random walk theory has, however, the advantage of naturally giving the form of g_y^{(d)} for more general upward steps than 1, subject to some root-finding. More precisely, one can handle the model

R_n^{(d)}(u) = u + Σ_{i=1}^{n} (Y_i − X_i) ,   n ∈ N,   (1.7)
where the Y_i are i.i.d. with support in {0, …, r} for some r ∈ N. For details, see [APQ, Sect. VIII.5a]. An appealing special case of (1.7) is Y_n ≡ r > 1. This allows individual claims to be orders of magnitude smaller than the premium inflow, which may appear more realistic than (1.1). □

Example 1.5 Consider the two-point distribution h₀ = θ = 1 − h₂ for θ > 1/2. Then we are in the situation of the Gambler's ruin problem, and the recursion (1.2) indeed corresponds to the one of Proof 1 of Proposition II.2.1 (with a = ∞), leading to ψ^{(d)}(u) = ((1 − θ)/θ)^u as already given in II.(2.3). □

Example 1.6 Assume geometrically distributed claim sizes with h₀ = p and h_k = (1 − p)(1 − α)α^{k−1} for k ≥ 1, with 0 < p < 1 and α such that E(X₁) = (1 − p)/(1 − α) < 1. In this case the recursion (1.2) reduces, after a little algebra, to ψ^{(d)}(u + 1) = (α/p) ψ^{(d)}(u), leading to

ψ^{(d)}(u) = ((1 − p)/(1 − α)) (α/p)^u .  □
Consider now the finite-time ruin probability

ψ^{(d)}(u, t) = P( τ^{(d)}(u) ≤ t ) ,   t ∈ N.

Noting that ψ^{(d)}(u, 1) = 1 − H(u), it follows from the Markov property that

ψ^{(d)}(u, t) = ψ^{(d)}(u, 1) + Σ_{k=0}^{u} h_k ψ^{(d)}(u + 1 − k, t − 1)   for all t = 2, 3, …   (1.8)
This bivariate recursion² can now be used to calculate ψ(u, t) recursively for fixed integer values of u and t, whenever the claim size probability function h_k (k = 0, 1, 2, …) is given. Although this simple recursion is one of the reasons why the discrete-time model also has some popularity for approximating ψ(u, t) of continuous-time models (in particular when adding additional features like interest rates, investment and dividends to the model), one should note that in practical applications it can be very computer-intensive to implement, and the result only gives the numerical values with no hold on sensitivities to model assumptions such as claim size parameters.

There is also a natural analogue of the adjustment coefficient for this discrete setup. Define γ^{(d)} as the unique positive root of the equation E e^{r(X₁−1)} = 1, if it exists. The reason for this definition is that in this way {e^{−γ^{(d)} R_n^{(d)}}} is a martingale, and because R_n^{(d)} → ∞ a.s. on {τ^{(d)}(u) = ∞}, Proposition II.3.1 applies and gives

Proposition 1.7 Assume that the adjustment coefficient γ^{(d)} > 0 exists. Then

ψ^{(d)}(u) = e^{−γ^{(d)} u} / E[ exp{−γ^{(d)} R^{(d)}_{τ^{(d)}(u)}} | τ^{(d)}(u) < ∞ ] ,   u ≥ 0.

In particular, the Lundberg inequality ψ^{(d)}(u) ≤ e^{−γ^{(d)} u} holds.
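A minimal implementation of the bivariate recursion (1.8) (an illustration added here, not part of the original text) for the geometric claim sizes of Example 1.6; it also locates γ^{(d)} by bisection on E e^{r(X₁−1)} = 1 and checks the Lundberg inequality of Proposition 1.7 together with the convergence ψ^{(d)}(u, t) → ψ^{(d)}(u):

```python
from math import exp, log

# Bivariate recursion (1.8) with geometric claims (Example 1.6):
# h_0 = p, h_k = (1-p)(1-alpha)*alpha**(k-1) for k >= 1, 1 - H(u) = (1-p)*alpha**u.
p, alpha = 0.6, 0.3
U, T = 10, 60

def h(k):
    return p if k == 0 else (1 - p) * (1 - alpha) * alpha**(k - 1)

def tail(u):                              # 1 - H(u) = psi(u, 1)
    return (1 - p) * alpha**u

M = U + T + 1                             # recursion looks one level up per step
psi = [[tail(u) for u in range(M)]]       # row t = 1
for t in range(2, T + 1):
    prev = psi[-1]
    psi.append([tail(u) + sum(h(k) * prev[u + 1 - k] for k in range(u + 1))
                for u in range(M - t)])

# Adjustment coefficient: positive root of E exp(r(X-1)) = 1 by bisection;
# the mgf of X exists for exp(r) < 1/alpha.
def lundberg_fn(r):
    return exp(-r) * (p + (1 - p) * (1 - alpha) * exp(r) / (1 - alpha * exp(r)))

lo, hi = 0.1, log(1 / alpha) - 1e-6       # bracket excluding the trivial root r = 0
while hi - lo > 1e-12:
    mid = (lo + hi) / 2
    if lundberg_fn(mid) < 1:
        lo = mid
    else:
        hi = mid
gamma = lo
print(gamma, psi[-1][2])                  # here gamma = log 2, psi(2, 60) ~ 1/7
```

For these parameters γ^{(d)} = log 2 exactly, and ψ^{(d)}(u, t) increases in t towards the infinite-horizon value while staying below the Lundberg bound e^{−γ^{(d)} u}.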
Notes and references An early reference for the compound binomial model (1.1) is Gerber [398]. Pollaczeck–Khinchine-type formulas for ψ(u) can be found in Gerber [404] and Shiu [802]. De Vylder & Goovaerts [302] investigate possibilities to speed up the recursive calculation (1.8). In particular, they give error bounds for ψ(u, t) if the claim size distribution is truncated. Another representation of the recursion (1.8) is given in Willmot [885]. Other quantities like the time of ruin, the surplus prior to ruin and the deficit at ruin in the compound binomial model are e.g. studied in Cheng, Gerber & Shiu [235], Li & Garrido [588] and Liu & Guo [602].

In the compound binomial model the number of periods until a claim occurs is geometrically distributed with parameter h₀. An extension is to allow for more general distributions of the number of interclaim time periods (which is the discrete-time analogue of the extension of a Poisson process to a renewal process). The resulting discrete-time Sparre Andersen model has received some interest recently; for a survey see Li, Lu & Garrido [593].

² This is the discrete-time analogue of having an additional partial derivative w.r.t. time t in the integro-differential equation for ψ in the compound Poisson case, cf. Chapter V.
Cossette, Landriault & Marceau [260] extend the compound binomial model by assuming that the indicator random variable of whether X_j is nonzero in period j follows a homogeneous Markov chain; for an extension to a Markov-modulated environment for both this indicator r.v. and the claim size distribution, see [261] and Yang, Zhang & Lan [905]. Another dependence structure between subsequent claim sizes is considered in Yuen & Guo [907]. For the effect of dependent claims in a discrete-time model with continuous claim size distribution, see e.g. Cossette & Marceau [259], Wu & Yuen [898] and Reinhard & Snoussi [732]. De Kok [286] deals with an inhomogeneous risk model. Dickson & Waters [318] use the recursions of the discrete-time model to effectively approximate finite-time ruin probabilities in the Cramér–Lundberg model; under an additional force of interest, see [320] and Brekelmans & De Waegenaere [200]. Egidio dos Reis [339] deals with moments of ruin and recovery times. For stochastic ordering concepts in the discrete framework and the resulting ordering of ruin probabilities, we refer to Denuit & Lefèvre [295].

As mentioned in the beginning of this section, one may take the viewpoint that observations and actions can in practice only happen at discrete points in time, while the underlying risk model has many computational and qualitative advantages when being in continuous time. A possible bridge between these conflicting arguments is to assume an underlying continuous-time model and indeed only observe the risk process (and potential ruin) at discrete times. If these discrete time points are assumed to be random, e.g. exponentially distributed, this still leads to explicit formulas of continuous-time flavor. By moving towards Erlang(n) (and hence more peaked) observation times with growing n, one approaches the discrete-time setup while retaining the computational vehicle of continuous-time models.
This procedure is worked out in [20] and is close in spirit to the Erlangization approach for finite-time-horizon ruin probabilities as discussed in Section IX.8. For statistical inference issues for a continuous-time risk model under discrete observations, see Shimizu [797].
2  The distribution of the aggregate claims
We study the distribution of the aggregate claims A_t = Σ_{i=1}^{N_t} U_i at time t, assuming that the U_i are i.i.d. with common distribution B and independent of N_t. In particular, we are interested in estimating P(A_t > x) for large x. This is a topic of practical importance in the insurance business for assessing the probability of a great loss in a period of length t, say one year. Further, the study is motivated by the formulas in V.2 expressing the finite horizon ruin probabilities in terms of the distribution of A_t.

The main example is N_t being Poisson with rate βt. For notational simplicity,
we then take t = 1, so that

p_n = P(N = n) = e^{−β} β^n/n! .   (2.1)
However, much of the analysis carries over to more general cases, though we do not always spell this out.
2a  The saddlepoint approximation

We impose the Poisson assumption (2.1) and define A = A₁. Then E e^{αA} = e^{κ(α)}, where κ(α) = β(B̂[α] − 1). The exponential family generated by A is given by

P_θ(A ∈ dx) = E[ e^{θA−κ(θ)} ; A ∈ dx ] .

In particular,

κ_θ(α) = log E_θ e^{αA} = κ(α + θ) − κ(θ) = β_θ (B̂_θ[α] − 1) ,

where β_θ = β B̂[θ] and B_θ is the distribution given by

B_θ(dx) = e^{θx} B(dx)/B̂[θ] .

This shows that the P_θ-distribution of A has a similar compound Poisson form as the P-distribution, only with β replaced by β_θ and B by B_θ. The analysis largely follows Example XIII.1.1. For a given x, we define the saddlepoint θ = θ(x) by E_θ A = x, i.e. κ′_θ(0) = κ′(θ) = x.

Proposition 2.1 Assume that lim_{r↑r*} B̂″[r] = ∞ and

lim_{r↑r*} B̂‴[r]/(B̂″[r])^{3/2} = 0 ,   (2.2)

where r* = sup{r : B̂[r] < ∞}. Then as x → ∞,

P(A > x) ∼ e^{−θx+κ(θ)} / ( θ √(2π β B̂″[θ]) ) .   (2.3)
Proof. Since E_θ A = x and Var_θ(A) = κ″(θ) = β B̂″[θ], (2.2) implies that the limiting P_θ-distribution of (A − x)/√(β B̂″[θ]) is standard normal. Hence

P(A > x) = E_θ[ e^{−θA+κ(θ)} ; A > x ] = e^{−θx+κ(θ)} E_θ[ e^{−θ(A−x)} ; A > x ]
  ∼ e^{−θx+κ(θ)} ∫₀^∞ e^{−θ√(βB̂″[θ]) y} (1/√(2π)) e^{−y²/2} dy
  = ( e^{−θx+κ(θ)} / (θ√(2πβB̂″[θ])) ) ∫₀^∞ e^{−z} e^{−z²/(2θ²βB̂″[θ])} dz
  ∼ ( e^{−θx+κ(θ)} / (θ√(2πβB̂″[θ])) ) ∫₀^∞ e^{−z} dz = e^{−θx+κ(θ)} / (θ√(2πβB̂″[θ])) . □

It should be noted that the heavy-tailed asymptotics is much more straightforward. In fact, just the same dominated convergence argument as in the proof of Theorem X.2.1 yields:

Proposition 2.2 If B is subexponential and E z^N < ∞ for some z > 1, then P(A > x) ∼ EN B̄(x).

Notes and references Proposition 2.1 goes all the way back to Esscher [358], and (2.3) is often referred to as the Esscher approximation. The present proof is somewhat heuristical in the CLT steps. For a rigorous proof, some regularity of the density b(x) of B is required. In particular, either of the following is sufficient:

A. b is gamma-like, i.e. bounded with b(x) ∼ c₁ x^{α−1} e^{−δx}.

B. b is log-concave, or, more generally, b(x) = q(x)e^{−h(x)}, where q(x) is bounded away from 0 and ∞ and h(x) is convex on an interval of the form [x₀, x*), where x* = sup{x : b(x) > 0}. Furthermore, ∫₀^∞ b(x)^ζ dx < ∞ for some ζ ∈ (1, 2).

For example, A covers the exponential distribution and phase-type distributions, while B covers distributions with finite support or with a density not too far from e^{−x^α} with α > 1. For details, see Embrechts et al. [347], Jensen [506] and references therein.

For higher-order extensions of the asymptotic behavior in Proposition 2.2, see Albrecher, Hipp & Kortschak [29] and references therein. It is also shown there that the folklore use of the shifted asymptotics P(A > x) ≈ β B̄(x − βµ_B) for Poisson N can be rigorously justified, in the sense that, under mild additional assumptions on B, the shifting P(A > x) ∼ EN B̄( x − µ_B(E(N²)/E(N) − 1) ) improves the asymptotic accuracy of Proposition 2.2 by an order of magnitude. Asymptotic results for situations where the tail behavior of N determines the tail behavior of A are given in Asmussen, Klüppelberg & Sigman [87] and Robert & Segers [742].
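For a concrete illustration (a sketch added here, not from the original text), take exponential claims B with rate δ, so B̂[α] = δ/(δ − α) and the saddlepoint equation κ′(θ) = βδ/(δ − θ)² = x gives θ = δ − √(βδ/x) explicitly. The approximation (2.3) can then be compared with the exact tail computed from the Erlang mixture representation of the compound Poisson sum:

```python
from math import exp, sqrt, pi, factorial

# Saddlepoint (Esscher) approximation (2.3) for A = sum_{i<=N} U_i,
# N ~ Poisson(beta), U_i ~ Exp(delta). Here Bhat[a] = delta/(delta - a),
# kappa(a) = beta*a/(delta - a), and theta solves kappa'(theta) = x.
beta, delta, x = 1.0, 1.0, 20.0

theta = delta - sqrt(beta * delta / x)
kappa = beta * theta / (delta - theta)
Bpp = 2 * delta / (delta - theta)**3          # Bhat''[theta]
approx = exp(-theta * x + kappa) / (theta * sqrt(2 * pi * beta * Bpp))

# Exact tail via P(A > x) = sum_n P(N = n) P(Erlang(n, delta) > x), with
# P(Erlang(n, delta) > x) = exp(-delta*x) * sum_{k<n} (delta*x)**k / k!.
def erlang_tail(n):
    return exp(-delta * x) * sum((delta * x)**k / factorial(k) for k in range(n))

exact = sum(exp(-beta) * beta**n / factorial(n) * erlang_tail(n)
            for n in range(1, 80))
print(approx, exact)
```

Already at this moderately large x, the saddlepoint value is within roughly ten percent of the exact tail, in line with the asymptotic statement of Proposition 2.1.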
2b  The NP approximation

In many cases, the distribution of A is approximately normal. For example, under the Poisson assumption (2.1), it holds that EA = βµ_B, Var(A) = βµ_B^{(2)} and that (A − βµ_B)/(βµ_B^{(2)})^{1/2} has a limiting standard normal distribution as β → ∞, leading to

P(A > x) ≈ 1 − Φ( (x − βµ_B)/√(βµ_B^{(2)}) ) .   (2.4)

The result to be surveyed below improves upon this and related approximations by taking into account second-order terms from the Edgeworth expansion.

Remark 2.3 A word of warning should be said right away: the CLT (and the Edgeworth expansion) can only be expected to provide a good fit in the center of the distribution. Thus, it is quite questionable to use (2.4) and related results for the case of main interest, large x. □

The (first-order) Edgeworth expansion states that if the characteristic function ĝ(u) = E e^{iuY} of a r.v. Y satisfies

ĝ(u) ≈ e^{−u²/2} (1 + iδu³) ,   (2.5)
where δ is a small parameter, then

P(Y ≤ y) ≈ Φ(y) − δ(1 − y²)ϕ(y) .   (2.6)

Note as a further warning that the r.h.s. of (2.6) may be negative and is not necessarily an increasing function of y for y large. Heuristically, (2.6) is obtained by noting that by Fourier inversion, the density of Y is

g(y) = (1/2π) ∫_{−∞}^{∞} e^{−iuy} ĝ(u) du ≈ (1/2π) ∫_{−∞}^{∞} e^{−iuy} e^{−u²/2} (1 + iδu³) du = ϕ(y) − δ(y³ − 3y)ϕ(y) ,

and from this (2.6) follows by integration. In concrete examples, the CLT for Y = Y_δ is usually derived via expanding the ch.f. as

ĝ(u) = E e^{iuY} = exp{ iuκ₁ − (u²/2)κ₂ − i(u³/3!)κ₃ + (u⁴/4!)κ₄ + ··· }
where κ₁, κ₂, … are the cumulants; in particular, κ₁ = EY, κ₂ = Var(Y), κ₃ = E(Y − EY)³. Thus if EY = 0, Var(Y) = 1 as above, one needs to show that κ₃, κ₄, … are small. If this holds, one expects the u³ term to dominate the terms of order u⁴, u⁵, …, so that

ĝ(u) ≈ exp{ −u²/2 − i(u³/3!)κ₃ } ≈ exp{ −u²/2 } (1 − i(u³/6)κ₃) ,

so that we should take δ = −κ₃/6 in (2.6).

Rather than with the tail probabilities P(A > x), the NP (normal power) approximation deals with the quantile a_{1−ε}, defined as the solution of P(A ≤ a_{1−ε}) = 1 − ε. A particular case is a_{.995}, which is often used as the VaR (Value-at-Risk) for risk management purposes.

Let Y = (A − EA)/√Var(A) and let y_{1−ε}, z_{1−ε} be the (1 − ε)-quantile of the distribution of Y, resp. of the standard normal distribution. If the distribution of Y is close to N(0, 1), y_{1−ε} should be close to z_{1−ε} (cf., however, Remark 2.3!), and so as a first approximation we obtain

a_{1−ε} = EA + y_{1−ε} √Var(A) ≈ EA + z_{1−ε} √Var(A) .   (2.7)

A correction term may be computed from (2.6) by noting that the Φ(y) terms dominate the δ(1 − y²)ϕ(y) term. This leads to

1 − ε ≈ Φ(y_{1−ε}) − δ(1 − y_{1−ε}²)ϕ(y_{1−ε})
     ≈ Φ(y_{1−ε}) − δ(1 − z_{1−ε}²)ϕ(z_{1−ε})
     ≈ Φ(z_{1−ε}) + (y_{1−ε} − z_{1−ε})ϕ(z_{1−ε}) − δ(1 − z_{1−ε}²)ϕ(z_{1−ε})
     = 1 − ε + (y_{1−ε} − z_{1−ε})ϕ(z_{1−ε}) − δ(1 − z_{1−ε}²)ϕ(z_{1−ε}) ,

which combined with δ = −EY³/6 leads to

y_{1−ε} = z_{1−ε} + (1/6)(z_{1−ε}² − 1) EY³ .

Using Y = (A − EA)/√Var(A), this yields the NP approximation

a_{1−ε} = EA + z_{1−ε} (Var(A))^{1/2} + (1/6)(z_{1−ε}² − 1) E(A − EA)³/Var(A) .   (2.8)
Under the Poisson assumption (2.1), the kth cumulant of A is βµ_B^{(k)}, and so the kth cumulant of Y is κ_k = βµ_B^{(k)}/(βµ_B^{(2)})^{k/2}. In particular, κ₃ is small for large β but dominates κ₄, κ₅, … as required. We can rewrite (2.8) as

a_{1−ε} = βµ_B + z_{1−ε} (βµ_B^{(2)})^{1/2} + (1/6)(z_{1−ε}² − 1) µ_B^{(3)}/µ_B^{(2)} .   (2.9)
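As a numerical sketch (added for illustration, not from the original text), the NP quantile for β = 100 and standard exponential claims (µ_B^{(k)} = k!) can be checked against a Monte Carlo estimate of the true 99.5% quantile:

```python
import numpy as np

# NP approximation of the 99.5% quantile of A = sum of N ~ Poisson(beta)
# i.i.d. Exp(1) claims, so mu_B^{(k)} = k!; checked against Monte Carlo.
beta, eps = 100.0, 0.005
z = 2.5758293035489004             # 99.5% standard normal quantile
mu1, mu2, mu3 = 1.0, 2.0, 6.0      # first three moments of B

a_np = beta * mu1 + z * np.sqrt(beta * mu2) + (z**2 - 1) / 6 * mu3 / mu2

# Monte Carlo: given N = n, A ~ Gamma(n, 1); the N = 0 case (probability
# e^{-100}, negligible) is handled explicitly.
rng = np.random.default_rng(3)
N = rng.poisson(beta, size=200_000)
A = np.where(N > 0, rng.standard_gamma(np.maximum(N, 1)), 0.0)
a_mc = float(np.quantile(A, 1 - eps))
print(a_np, a_mc)
```

The skewness correction shifts the normal quantile EA + z√Var(A) ≈ 136.4 up by about 2.8, in good agreement with the simulated quantile.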
Notes and references We have followed largely Sundt [820]. Another main reference is Daykin et al. [279]. Note, however, that [279] distinguishes between the NP and Edgeworth approximations.
2c
Panjer’s recursion
Consider A = Σ_{i=1}^{N} U_i, let p_n = P(N = n), and assume that there exist constants a, b such that
\[
p_n \;=\; \Bigl(a + \frac{b}{n}\Bigr)\, p_{n-1}, \qquad n = 1, 2, \ldots. \tag{2.10}
\]
For example, this holds with a = 0, b = β for the Poisson distribution with rate β since
\[
p_n \;=\; e^{-\beta}\frac{\beta^n}{n!} \;=\; \frac{\beta}{n}\, e^{-\beta}\frac{\beta^{n-1}}{(n-1)!} \;=\; \frac{\beta}{n}\, p_{n-1}.
\]

Proposition 2.4 Assume that B is concentrated on {0, 1, 2, ...} and write g_j = P(U_i = j), j = 0, 1, 2, ..., f_j = P(A = j), j = 0, 1, .... Then f_0 = Σ_{n=0}^{∞} g_0^n p_n and
\[
f_j \;=\; \frac{1}{1 - a g_0} \sum_{k=1}^{j} \Bigl(a + b\frac{k}{j}\Bigr) g_k f_{j-k}, \qquad j = 1, 2, \ldots. \tag{2.11}
\]
In particular, if g_0 = 0, then f_0 = p_0 and
\[
f_j \;=\; \sum_{k=1}^{j} \Bigl(a + b\frac{k}{j}\Bigr) g_k f_{j-k}, \qquad j = 1, 2, \ldots. \tag{2.12}
\]
Remark 2.5 The crux of Proposition 2.4 is that the algorithm is much faster than the naive method, which would consist in noting that (in the case g_0 = 0)
\[
f_j \;=\; \sum_{n=1}^{j} p_n\, g_j^{*n} \tag{2.13}
\]
2. THE DISTRIBUTION OF THE AGGREGATE CLAIMS
where g^{*n} is the nth convolution power of g, and calculating the g_j^{*n} recursively by
\[
g_j^{*1} \;=\; g_j, \qquad g_j^{*n} \;=\; \sum_{k=n-1}^{j-1} g_k^{*(n-1)} g_{j-k}. \tag{2.14}
\]
Namely, the complexity (number of arithmetic operations required) is O(j^3) for (2.13), (2.14) but only O(j^2) for Proposition 2.4. □

Proof of Proposition 2.4. The expression for f_0 is obvious. By symmetry,
\[
E\Bigl[a + b\frac{U_i}{j} \Bigm| \sum_{i=1}^{n} U_i = j\Bigr] \tag{2.15}
\]
is independent of i = 1, ..., n. Since the sum over i is na + b, the value of (2.15) is therefore a + b/n. Hence by (2.10), (2.13) we get for j > 0 that
\[
\begin{aligned}
f_j \;&=\; \sum_{n=1}^{\infty} \Bigl(a + \frac{b}{n}\Bigr) p_{n-1}\, g_j^{*n}
\;=\; \sum_{n=1}^{\infty} p_{n-1}\, E\Bigl[a + b\frac{U_1}{j} \Bigm| \sum_{i=1}^{n} U_i = j\Bigr] g_j^{*n}\\
&=\; \sum_{n=1}^{\infty} p_{n-1}\, E\Bigl[a + b\frac{U_1}{j} \, ; \; \sum_{i=1}^{n} U_i = j\Bigr]
\;=\; \sum_{n=1}^{\infty} p_{n-1} \sum_{k=0}^{j} \Bigl(a + b\frac{k}{j}\Bigr) g_k\, g_{j-k}^{*(n-1)}\\
&=\; \sum_{k=0}^{j} \Bigl(a + b\frac{k}{j}\Bigr) g_k \sum_{n=0}^{\infty} p_n\, g_{j-k}^{*n}
\;=\; \sum_{k=0}^{j} \Bigl(a + b\frac{k}{j}\Bigr) g_k f_{j-k}\\
&=\; a g_0 f_j + \sum_{k=1}^{j} \Bigl(a + b\frac{k}{j}\Bigr) g_k f_{j-k},
\end{aligned}
\]
and (2.11) follows. □
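The recursion (2.12) is straightforward to implement. The sketch below (our own, not from the book) cross-checks it against the naive convolution method (2.13)-(2.14) for a Poisson number of claims (a = 0, b = β) and claims uniform on {1, 2, 3}; both the distribution and the parameter choices are assumptions made for the example.

```python
from math import exp, factorial

def panjer(a, b, p0, g, jmax):
    # Recursion (2.12); assumes g[0] = 0, g[k] = P(U = k).
    f = [p0] + [0.0] * jmax
    for j in range(1, jmax + 1):
        f[j] = sum((a + b * k / j) * g[k] * f[j - k]
                   for k in range(1, min(j, len(g) - 1) + 1))
    return f

def naive(beta, g, jmax):
    # (2.13)-(2.14): f_j = sum_n p_n g_j^{*n}, O(j^3) operations.
    f = [0.0] * (jmax + 1)
    conv = [1.0] + [0.0] * jmax              # g^{*0} = point mass at 0
    for n in range(jmax + 1):
        pn = exp(-beta) * beta ** n / factorial(n)
        for j in range(jmax + 1):
            f[j] += pn * conv[j]
        conv = [sum(conv[j - k] * g[k]          # next convolution power
                    for k in range(1, min(j, len(g) - 1) + 1))
                for j in range(jmax + 1)]
    return f

beta = 2.0
g = [0.0, 1 / 3, 1 / 3, 1 / 3]               # claims uniform on {1,2,3}
f1 = panjer(0.0, beta, exp(-beta), g, 30)
f2 = naive(beta, g, 30)
assert max(abs(x - y) for x, y in zip(f1, f2)) < 1e-10
```

Since g^{*n} has support on [n, ∞), truncating the naive sum at n = jmax is exact for the probabilities f_0, ..., f_{jmax}.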
If the distribution B of the Ui is non-lattice, it is natural to use a discrete approximation. To this end, let U_{i,+}^{(h)}, U_{i,−}^{(h)} be Ui rounded upwards, resp. downwards, to the nearest multiple of h and let A_{±}^{(h)} = Σ_{1}^{N} U_{i,±}^{(h)}. An obvious modification of Proposition 2.4 applies to evaluate the distribution F_{±}^{(h)} of A_{±}^{(h)}, letting f_{j,±}^{(h)} = P(A_{±}^{(h)} = jh) and
\[
\begin{aligned}
g_{k,-}^{(h)} \;&=\; P\bigl(U_{i,-}^{(h)} = kh\bigr) \;=\; B\bigl((k+1)h\bigr) - B(kh), & k &= 0, 1, 2, \ldots,\\
g_{k,+}^{(h)} \;&=\; P\bigl(U_{i,+}^{(h)} = kh\bigr) \;=\; B(kh) - B\bigl((k-1)h\bigr) \;=\; g_{k-1,-}^{(h)}, & k &= 1, 2, \ldots
\end{aligned}
\]
Then the error on the tail probabilities (which can be taken arbitrarily small by choosing h small enough) can be evaluated by
\[
\sum_{j=\lfloor x/h\rfloor}^{\infty} f_{j,-}^{(h)} \;\le\; P(A \ge x) \;\le\; \sum_{j=\lfloor x/h\rfloor}^{\infty} f_{j,+}^{(h)}.
\]
Further examples (and in fact the only ones, cf. Sundt & Jewell [821]) where (2.10) holds are the binomial distribution and the negative binomial (in particular, geometric) distribution. The geometric case is of particular importance because of the following result, which immediately follows from combining Proposition 2.4 and the Pollaczeck–Khinchine representation:

Corollary 2.6 Consider a compound Poisson risk process with Poisson rate β and claim size distribution B. Then for any h > 0, the ruin probability ψ(u) satisfies
\[
\sum_{j=\lfloor u/h\rfloor}^{\infty} f_{j,-}^{(h)} \;\le\; \psi(u) \;\le\; \sum_{j=\lfloor u/h\rfloor}^{\infty} f_{j,+}^{(h)}, \tag{2.16}
\]
where f_{j,+}^{(h)}, f_{j,−}^{(h)} are given by the recursions
\[
f_{j,+}^{(h)} \;=\; \rho \sum_{k=1}^{j} g_{k,+}^{(h)} f_{j-k,+}^{(h)}, \qquad
f_{j,-}^{(h)} \;=\; \frac{\rho}{1 - \rho g_{0,-}^{(h)}} \sum_{k=1}^{j} g_{k,-}^{(h)} f_{j-k,-}^{(h)}, \qquad j = 1, 2, \ldots,
\]
starting from f_{0,+}^{(h)} = 1 − ρ, f_{0,−}^{(h)} = (1 − ρ)/(1 − ρ g_{0,−}^{(h)}) and using
\[
\begin{aligned}
g_{k,-}^{(h)} \;&=\; B_0\bigl((k+1)h\bigr) - B_0(kh) \;=\; \frac{1}{\mu_B}\int_{kh}^{(k+1)h} \overline{B}(x)\,dx, & k &= 0, 1, 2, \ldots,\\
g_{k,+}^{(h)} \;&=\; B_0(kh) - B_0\bigl((k-1)h\bigr) \;=\; g_{k-1,-}^{(h)}, & k &= 1, 2, \ldots.
\end{aligned}
\]
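As an illustration (our own numerical sketch, not from the book), Corollary 2.6 can be checked against the classical closed form ψ(u) = ρe^{−(δ−β)u} for Exp(δ) claims and unit premium rate; in that case the stationary excess distribution B_0 is again Exp(δ). The parameters β = 1, δ = 2 are assumptions made for the example.

```python
from math import exp

def ruin_bounds(beta, delta, u, h):
    """Bounds (2.16) on psi(u) via the recursions of Corollary 2.6,
    for B = Exp(delta) (so B0 = Exp(delta) as well) and premium rate 1."""
    rho = beta / delta                       # rho = beta * mu_B
    J = int(round(u / h))                    # assume u is a multiple of h
    g_minus = [exp(-delta * k * h) - exp(-delta * (k + 1) * h)
               for k in range(J + 1)]
    g_plus = [0.0] + g_minus[:-1]            # g_{k,+} = g_{k-1,-}
    c = rho / (1 - rho * g_minus[0])
    f_plus = [1 - rho] + [0.0] * J
    f_minus = [(1 - rho) / (1 - rho * g_minus[0])] + [0.0] * J
    for j in range(1, J + 1):
        f_plus[j] = rho * sum(g_plus[k] * f_plus[j - k] for k in range(1, j + 1))
        f_minus[j] = c * sum(g_minus[k] * f_minus[j - k] for k in range(1, j + 1))
    # the compound geometric masses sum to 1, so the tails are 1 - partial sums
    return 1 - sum(f_minus[:J]), 1 - sum(f_plus[:J])

beta, delta, u = 1.0, 2.0, 2.0
lo, hi = ruin_bounds(beta, delta, u, h=1 / 64)
exact = (beta / delta) * exp(-(delta - beta) * u)   # classical psi(u) for Exp claims
assert lo <= exact <= hi and hi - lo < 0.02
```

Halving h roughly halves the gap between the two bounds, at the cost of quadrupling the O(J^2) work of the recursion.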
Remark 2.7 It is clear that the quotient of the upper and the lower bound in (2.16) tends to 1 for u → ∞ if Ui is long-tailed (i.e. B(x − y)/B(x) → 1 as x → ∞ for all y, cf. Chapter X). Correspondingly, in numerical implementations one typically observes that the difference between the upper and lower bound in (2.16) gets larger for increasing u, but for long-tailed (in particular subexponential) Ui again tends to zero for still larger u. □

Notes and references The literature on recursive algorithms related to Panjer's recursion is extensive, see e.g. Dickson [307] and references therein. Recursion formulas for counting distributions that are much more general than the class defined by (2.10) have been studied in the literature. A natural and very general class seems to be counting distributions that satisfy a finite-order homogeneous recursion with polynomial coefficients, see e.g. Wang & Sobrero [873]. For a survey that also covers multivariate extensions, see Sundt & Vernic [824]. Gerhold, Schmock & Warnung [416] provide an improved recursion algorithm; see also Hipp [467] for a speed-up from order O(j^2) to order O(j) for phase-type claim size distributions. In recent years, due to the increasing available computer power, the emphasis is gradually shifting towards direct numerical inversion of the moment generating function of the aggregate claim size. In the context of discrete claim size distributions, Fast Fourier Transform techniques can be quite powerful (see Grübel & Hermesmeier [439, 440] for details and Embrechts & Frei [344] for a recent comparison).
2d
The distribution of dependent sums
Whereas for the results in the previous subsections the independence assumption for the summands was crucial, in practice one will often face situations where information is needed about the tails of sums of dependent random variables. Clearly there are infinitely many possible dependence structures for a fixed set of marginal distributions and one cannot expect a complete picture of how dependence affects the behavior of the distribution tail. Nevertheless certain patterns occur, and for the tail behavior it seems natural that only the dependence of the summands in the tail is important. In the sequel we will state some results in this direction (mainly) for the sum of two identically distributed subexponential r.v.'s to illustrate the challenges that occur when dependence enters. For more general results see the references in the Notes at the end of the section.

Let us consider the sum X1 + X2 of two identically distributed subexponential random variables, each with distribution function B. By definition, if X1 and X2 are independent, then P(X1 + X2 > x) ∼ 2B(x) as x → ∞, and a natural question is under which assumptions on the dependence structure of X1 and X2 and on B the same asymptotic relation holds true with dependence.

A first rough description of tail dependence between X1 and X2 is the so-called (upper) tail dependence coefficient
\[
\lambda \;=\; \lim_{v\to\infty} P(X_2 > v \mid X_1 > v).
\]
If λ = 0, then X1 and X2 are called tail-independent. The following simple result extends Proposition X.1.1(a).

Lemma 2.8 P(max(X1, X2) > x) ∼ (2 − λ)B(x).

Proof.
\[
\begin{aligned}
P(\max(X_1, X_2) > x) \;&=\; P(X_1 > x) + P(X_2 > x) - P(X_1 > x, X_2 > x)\\
&=\; B(x) + B(x) - B(x)\, P(X_2 > x \mid X_1 > x) \;\sim\; (2 - \lambda)B(x). \qquad \square
\end{aligned}
\]
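The coefficient λ can also be estimated empirically from P(X2 > v | X1 > v) at a high threshold. The sketch below is our own (Pareto marginals with α = 2 and the threshold v = 10 are assumptions for the example): with comonotone risks (X2 = X1) the estimate is exactly 1, while for independent risks it stays near the small marginal exceedance probability.

```python
import random

def tail_dep_estimate(pairs, v):
    # empirical P(X2 > v | X1 > v)
    joint = sum(1 for x1, x2 in pairs if x1 > v and x2 > v)
    cond = sum(1 for x1, _ in pairs if x1 > v)
    return joint / cond

random.seed(1)
# Pareto(alpha = 2) marginals on [1, inf) via inverse transform
sample = [(1 - random.random()) ** -0.5 for _ in range(100_000)]
indep = list(zip(sample, [(1 - random.random()) ** -0.5 for _ in range(100_000)]))
como = [(x, x) for x in sample]
v = 10.0                                    # P(X > v) = 1/v^2 = 0.01
assert tail_dep_estimate(como, v) == 1.0    # lambda = 1 for comonotone risks
assert tail_dep_estimate(indep, v) < 0.1    # tail-independent: estimate near 0.01
```

In practice the finite-threshold estimate only approximates the limit λ, and the choice of v trades bias against variance.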
Let us first collect some results for regularly varying marginals, a case that is quite well understood.

Regularly varying marginal distributions

Proposition 2.9 Let B(x) ∼ L(x)x^{−α} with α > 0. Then
\[
\limsup_{x\to\infty} \frac{P(X_1 + X_2 > x)}{B(x)} \;\le\;
\begin{cases}
\bigl(\lambda^{\frac{1}{\alpha+1}} + (2 - 2\lambda)^{\frac{1}{\alpha+1}}\bigr)^{\alpha+1}, & 0 \le \lambda \le \tfrac{2}{3},\\[1mm]
2^{\alpha}\,(2 - \lambda), & \tfrac{2}{3} < \lambda \le 1.
\end{cases}
\]
Proof. Analogously to the proof of Proposition X.1.4, for any 0 < δ < 1/2 we have
\[
\begin{aligned}
P(X_1 + X_2 > x) \;&\le\; P\Bigl(\{X_1 > (1-\delta)x\} \cup \{X_2 > (1-\delta)x\} \cup \bigl(\{X_1 > \delta x\} \cap \{X_2 > \delta x\}\bigr)\Bigr)\\
&\le\; 2B\bigl((1-\delta)x\bigr) + P(X_1 > \delta x,\, X_2 > \delta x) - 2P\bigl(X_1 > (1-\delta)x,\, X_2 > (1-\delta)x\bigr),
\end{aligned}
\]
so that
\[
\limsup_{x\to\infty} \frac{P(X_1 + X_2 > x)}{B(x)} \;\le\; \limsup_{x\to\infty}\Bigl((2 - 2\lambda)\,\frac{B\bigl((1-\delta)x\bigr)}{B(x)} + \lambda\,\frac{B(\delta x)}{B(x)}\Bigr) \;=\; \frac{2 - 2\lambda}{(1-\delta)^{\alpha}} + \frac{\lambda}{\delta^{\alpha}}.
\]
Within the defined range of δ, this upper bound is minimized for
\[
\delta^* \;=\;
\begin{cases}
\dfrac{1}{1 + \bigl(\frac{2}{\lambda} - 2\bigr)^{\frac{1}{\alpha+1}}}, & 0 \le \lambda \le 2/3,\\[2mm]
1/2, & 2/3 < \lambda \le 1,
\end{cases}
\]
which yields the result. □
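A quick cross-check of this minimization (our own sketch, not from the book): for each λ, minimize f(δ) = (2 − 2λ)/(1 − δ)^α + λ/δ^α over δ ∈ (0, 1/2] by ternary search (f is convex, being a sum of two convex terms) and compare with the closed form of Proposition 2.9.

```python
def closed_form(lam, alpha):
    # right-hand side of Proposition 2.9
    if lam <= 2 / 3:
        return (lam ** (1 / (alpha + 1))
                + (2 - 2 * lam) ** (1 / (alpha + 1))) ** (alpha + 1)
    return 2 ** alpha * (2 - lam)

def numeric_min(lam, alpha):
    f = lambda d: (2 - 2 * lam) / (1 - d) ** alpha + lam / d ** alpha
    lo, hi = 1e-12, 0.5
    for _ in range(200):                 # ternary search on a convex function
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return f((lo + hi) / 2)

for lam in (0.0, 0.3, 2 / 3, 0.9, 1.0):
    for alpha in (0.5, 1.0, 2.5):
        assert abs(numeric_min(lam, alpha) - closed_form(lam, alpha)) < 1e-6
```

The two boundary cases are instructive: λ = 0 gives the independent-sum factor 2, and λ = 1 gives the comonotone factor 2^α.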
This upper bound is sharp for both independence and comonotone dependence (in the latter case,³ P(X1 + X2 > x) ∼ 2^α B(x)). A combination of Lemma 2.8 and Proposition 2.9 immediately shows

Corollary 2.10 If B has a regularly varying tail and λ = 0, then P(X1 + X2 > x) ∼ 2B(x) as x → ∞.

Hence, for regularly varying distributions, tail independence is already a sufficient criterion to guarantee that the tail of the dependent sum behaves asymptotically as if X1 and X2 were independent.

An important and natural subclass of distributions with regularly varying marginals are the ones with multivariate regular variation (for consistency we only state the bivariate case, although the extension to n dimensions is obvious). A vector X = (X1, X2) is regularly varying with index −α < 0 if there exists a probability measure S on S_1^+ (the unit sphere in R^2 with respect to the Euclidean norm |·|, restricted to the first quadrant) and a function b(x) → ∞ such that
\[
b^{-1}(x)\, P\Bigl(\Bigl(\frac{|X|}{x}, \frac{X}{|X|}\Bigr) \in \cdot\Bigr) \;\stackrel{D}{\to}\; a\, \nu_\alpha \times S \tag{2.17}
\]
in the space of positive Radon measures on (ε, ∞] × S_1^+ for all ε > 0, where a > 0 and ν_α(t, ∞] = t^{−α} (t > 0, α > 0); see e.g. Resnick [737]. S is often referred to as the spectral measure of X. The above implies in particular that on every ray from (0,0) into the positive quadrant (the direction of which is governed by S), we have a regularly varying tail with index −α. Moreover, the tail of X1 + X2 is also regularly varying with the same index. For this specific dependence structure, the asymptotic behavior of the sum can be given explicitly in terms of the spectral measure.

Proposition 2.11 Assume that X = (X1, X2) is exchangeable and regularly varying with index −α < 0 and spectral measure S. Then
\[
\frac{P(X_1 + X_2 > x)}{B(x)} \;\sim\; 2\, \frac{\int_0^{\pi/2} (\cos\varphi + \sin\varphi)^{\alpha}\, S(d\varphi)}{\int_0^{\pi/2} (\cos^{\alpha}\varphi + \sin^{\alpha}\varphi)\, S(d\varphi)}.
\]

³For α < 1 (i.e. infinite mean) this also shows that comonotone dependence does not necessarily provide an upper bound for the tail asymptotics of all possible dependence structures with fixed marginals! Intuitively, if the marginal distribution tail is heavy enough, then having two random sources for a large sum caused by one of the summands outweighs the effect of summing two large components from one random source.
Proof. Consider in (2.17) the events |X|/x > t for t = 1/(cos φ + sin φ) and t = 1/cos φ, where φ ∈ [0, π/2] denotes the angle corresponding to X/|X|. We then obtain
\[
b^{-1}(x)\, P(X_1 + X_2 > x) \;\to\; a \int_0^{\pi/2} (\cos\varphi + \sin\varphi)^{\alpha}\, S(d\varphi)
\]
and
\[
b^{-1}(x)\, P(X_1 > x) \;\to\; a \int_0^{\pi/2} \cos^{\alpha}\varphi\; S(d\varphi),
\]
so that the result follows from S(dφ) = S(d(π/2 − φ)). □
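As a sanity check (our own sketch, not from the book), the ratio in Proposition 2.11 can be evaluated for two simple discrete spectral measures: independent components correspond to S = ½(δ_0 + δ_{π/2}) and give the factor 2, while completely dependent components correspond to S = δ_{π/4} and give 2^α, in line with the comonotone case mentioned above.

```python
from math import cos, sin, pi

def sum_tail_factor(S, alpha):
    """Ratio in Proposition 2.11 for a discrete spectral measure
    S = [(angle, weight), ...] with weights summing to 1."""
    num = sum(w * (cos(p) + sin(p)) ** alpha for p, w in S)
    den = sum(w * (cos(p) ** alpha + sin(p) ** alpha) for p, w in S)
    return 2 * num / den

alpha = 1.7
indep = [(0.0, 0.5), (pi / 2, 0.5)]   # mass on the axes
como = [(pi / 4, 1.0)]                # mass on the diagonal
assert abs(sum_tail_factor(indep, alpha) - 2.0) < 1e-12
assert abs(sum_tail_factor(como, alpha) - 2 ** alpha) < 1e-12
```

For a general spectral measure the same function applies with a quadrature rule replacing the finite sum.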
It occurs in a number of situations that the risks Xi are independent, but they need to be added with some weights that are not independent. Here is a result, the proof of which can be found in Goovaerts et al. [424].

Proposition 2.12 Assume that X1, ..., Xn are i.i.d. r.v.'s with regularly varying tail B(x) ∼ L(x)x^{−α} for some α > 0 and let θ1, ..., θn be dependent nonnegative r.v.'s, independent of X1, ..., Xn. If there exists some δ > 0 s.t. E(θ_k^{α+δ}) < ∞ for 1 ≤ k ≤ n, then
\[
P\Bigl(\max_{1\le m\le n} \sum_{k=1}^{m} \theta_k X_k > x\Bigr) \;\sim\; P\Bigl(\sum_{k=1}^{n} \theta_k X_k > x\Bigr) \;\sim\; B(x) \sum_{k=1}^{n} E(\theta_k^{\alpha}).
\]
If either
\[
0 < \alpha < 1, \qquad \sum_{k=1}^{\infty} E(\theta_k^{\alpha+\delta}) < \infty, \quad \sum_{k=1}^{\infty} E(\theta_k^{\alpha-\delta}) < \infty \quad \text{for some } \delta > 0,
\]
or
\[
\alpha \ge 1, \qquad \sum_{k=1}^{\infty} \bigl(E(\theta_k^{\alpha+\delta})\bigr)^{1/(\alpha+\delta)} < \infty, \quad \sum_{k=1}^{\infty} \bigl(E(\theta_k^{\alpha-\delta})\bigr)^{1/(\alpha+\delta)} < \infty \quad \text{for some } \delta > 0,
\]
then
\[
P\Bigl(\max_{1\le n\le \infty} \sum_{k=1}^{n} \theta_k X_k > x\Bigr) \;\sim\; P\Bigl(\sum_{k=1}^{\infty} \theta_k (X_k)_+ > x\Bigr) \;\sim\; B(x) \sum_{k=1}^{\infty} E(\theta_k^{\alpha}).
\]
Example 2.13 Recall the discrete time risk model with stochastic investment of Section VIII.5. If we choose θ_k = A_1^{−1} ··· A_k^{−1} and X_k = B_k, then Proposition 2.12 applies for the case of regularly varying insurance risk B_k (with index −α). The conditions of the Proposition translate into E(A_1^{−α+δ}) < ∞ and E(A_1^{−α±δ}) < 1, respectively, for some δ > 0. Under these assumptions, Proposition 2.12 gives the finite-time ruin probability
\[
\psi(u, n) \;=\; P(\tau(u) \le n) \;\sim\; B(u)\, \frac{E(A_1^{-\alpha})\bigl(1 - (E(A_1^{-\alpha}))^{n}\bigr)}{1 - E(A_1^{-\alpha})}, \qquad u \to \infty,
\]
and the infinite-time ruin probability
\[
\psi(u) \;=\; P(\tau(u) < \infty) \;\sim\; B(u)\, \frac{E(A_1^{-\alpha})}{1 - E(A_1^{-\alpha})}, \qquad u \to \infty,
\]
which refines Theorem VIII.5.8 for this particular case. □
Other subexponential marginal distributions

From the proof of Proposition 2.9, it becomes clear that Corollary 2.10 also holds true for any B ∈ S with heavier tail than regularly varying. On the other hand, in general the marginal tails cannot be much lighter than regularly varying in order to dominate the 'dependence effect' in the tail of the sum given λ = 0, as the following result shows.

Proposition 2.14 If the mean excess function e(x) is self-neglecting, i.e.
\[
\lim_{x\to\infty} \frac{e\bigl(x + a\,e(x)\bigr)}{e(x)} \;=\; 1 \qquad \forall\, a \ge 0, \tag{2.18}
\]
and if
\[
\inf_{a>0}\, \liminf_{x\to\infty}\, P\bigl(X_2 > a\,e(x) \bigm| X_1 > x\bigr) \;>\; 0, \tag{2.19}
\]
then
\[
\liminf_{x\to\infty} \frac{P(X_1 + X_2 > x)}{B(x)} \;=\; \infty.
\]
Proof. From Proposition X.1.18 we know that the self-neglecting property (2.18) implies
\[
\lim_{x\to\infty} \frac{B\bigl(x + a\,e(x)\bigr)}{B(x)} \;=\; e^{-a}
\]
and we have
\[
\frac{B(x)}{B\bigl(x - a\,e(x)\bigr)} \;\sim\; \frac{B\bigl(x + a\,e(x)\bigr)}{B\bigl(x + a\,e(x) - a\,e(x + a\,e(x))\bigr)} \;\sim\; \frac{B\bigl(x + a\,e(x)\bigr)}{B(x)}.
\]
Together with (2.19) this gives
\[
\begin{aligned}
P(X_1 + X_2 > x) \;&\ge\; P\bigl(X_1 > x - a\,e(x),\, X_2 > a\,e(x)\bigr)\\
&=\; P\bigl(X_1 > x - a\,e(x)\bigr)\, P\bigl(X_2 > a\,e(x) \bigm| X_1 > x - a\,e(x)\bigr)\\
&\sim\; P\bigl(X_1 > x - a\,e(x)\bigr)\, P\bigl(X_2 > a\,e(x) \bigm| X_1 > x\bigr)\\
&\ge\; \varepsilon\, P\bigl(X_1 > x - a\,e(x)\bigr) \;\sim\; \varepsilon\, P(X_1 > x)\, e^{a}
\end{aligned}
\]
for some ε > 0 and any a > 0. Hence
\[
\liminf_{x\to\infty} \frac{P(X_1 + X_2 > x)}{B(x)} \;\ge\; \varepsilon\, e^{a},
\]
and the latter is unbounded for a → ∞. □
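A numerical illustration of the self-neglecting property (2.18) (our own sketch): for the Weibull tail exp(−√x), the tail integrates in closed form, giving the mean excess function e(x) = 2(√x + 1) exactly, and the ratio in (2.18) visibly approaches 1.

```python
from math import sqrt

def e(x):
    # mean excess of the Weibull(1/2) tail exp(-sqrt(x)):
    # d/dx [-2(sqrt(x)+1) exp(-sqrt(x))] = -exp(-sqrt(x)), so the
    # integrated tail is 2(sqrt(x)+1) exp(-sqrt(x)) and e(x) = 2(sqrt(x)+1)
    return 2 * (sqrt(x) + 1)

a = 1.0
ratios = [e(x + a * e(x)) / e(x) for x in (1e2, 1e4, 1e6, 1e8)]
assert all(r > 1 for r in ratios)
assert ratios == sorted(ratios, reverse=True)   # monotonically approaching 1
assert abs(ratios[-1] - 1) < 1e-3
```

The convergence is slow (roughly like x^{-1/2}), which matches the intuition that (2.18) only constrains the tail far out.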
Remark 2.15 Recall from Chapter X that condition (2.18) is satisfied for Weibull and lognormal distributions (more generally, for all subexponential distributions which lie in the maximum domain of attraction of the Gumbel distribution, cf. X.6b). A sufficient condition for (2.19) to hold is
\[
\liminf_{x\to\infty}\, P\bigl(X_2 > e^*(x) \bigm| X_1 > x\bigr) \;>\; 0
\]
for any e*(x) with e*(x)/e(x) → ∞. One can show that for all B that satisfy (2.18) there exists a dependence structure such that (2.19) is satisfied (cf. [13]). □

On the other hand, for a particular given dependence structure the tail of the sum may well be asymptotically equivalent to the one of the independent sum. This is illustrated by the following example with lognormal marginals and a Gaussian copula (which is tail-independent).

Proposition 2.16 Let Y1, Y2 be bivariate normal with the same mean µ, the same variance σ² and correlation ρ ∈ [−1, 1). Then, for X1 = e^{Y1} and X2 = e^{Y2}, one has
\[
P(X_1 + X_2 > x) \;\sim\; 2\,P(X_1 > x) \;\sim\; \frac{\sqrt{2/\pi}\;\sigma}{\log x}\, \exp\bigl\{-(\log x - \mu)^2/2\sigma^2\bigr\}.
\]

Proof. Rather than giving a rigorous technical proof (for which we refer to Asmussen & Rojas-Nandayapa [96]), we give here just a short heuristic argument supporting the result. Take µ = 0, σ² = 1, ρ > 0 for simplicity. Then we can write
\[
Y_1 \;=\; U + V_1, \qquad Y_2 \;=\; U + V_2,
\]
where U, V1, V2 are independent univariate Gaussian with mean zero and variances a², b², b², respectively, where a² + b² = 1, a² = ρ. Given U = u, X1 and X2 are independent lognormals with log-variance b², so by subexponential limit theory
\[
P\bigl(X_1 + X_2 > x \bigm| U = u\bigr) \;=\; P\bigl(e^{V_1} + e^{V_2} > x e^{-u}\bigr) \;\sim\; \frac{\sqrt{2/\pi}\; b}{\log x - u}\, \exp\bigl\{-(\log x - u)^2/2b^2\bigr\}.
\]
We make the guess
\[
P(X_1 + X_2 > x) \;\approx\; \max_u\, \frac{1}{a\sqrt{2\pi}}\, e^{-u^2/2a^2}\, P\bigl(X_1 + X_2 > x \bigm| U = u\bigr) \tag{2.20}
\]
and ignore everything not in the exponent and constants. Then we have to find the u minimizing
\[
\frac{u^2}{2a^2} - \frac{u \log x}{b^2} + \frac{u^2}{2b^2},
\]
which (using a² + b² = 1) is easily seen to be u = a² log x. Substituting back in (2.20), we get
\[
P(X_1 + X_2 > x) \;\approx\; \exp\bigl\{-a^4 \log^2 x / 2a^2 - (1 - a^2)^2 \log^2 x / 2b^2\bigr\} \;=\; \exp\bigl\{-\log^2 x / 2\bigr\} \tag{2.21}
\]
in agreement with the claimed assertion (here ≈ is used to indicate asymptotics at a rough level, i.e. rougher than ∼ or even logarithmic asymptotics as used in large deviations theory). Note that the argument contains some information on how X1 + X2 exceeds x: U must be approximately u = a² log x = ρ log x and either V1 or V2, but not both, must be large. Translated back to X1, X2, this means that one is larger than x and the other of order e^u = x^ρ. □

We finish this section with a fairly general result of Foss & Richards [367] about conditions under which the tail of the dependent sum asymptotically behaves as the tail for the independent sum. Note that a consequence of Proposition X.1.5 is that for B ∈ S there always exists a monotone function h(x) ↗ ∞ with
\[
\lim_{x\to\infty} B\bigl(x - h(x)\bigr)/B(x) \;=\; 1. \tag{2.22}
\]
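The quadratic minimization in the heuristic proof of Proposition 2.16 can be verified numerically; the sketch below is our own, with the parameter choices ρ = a² = 0.6 and x = 10^6 made up for the example. Minimizing u²/2a² + (log x − u)²/2b² over u indeed gives u* = a² log x and the minimal value log² x / 2.

```python
from math import log

a2 = 0.6                        # a^2 = rho; then b^2 = 1 - a^2
b2 = 1 - a2
L = log(1e6)
exponent = lambda u: u * u / (2 * a2) + (L - u) ** 2 / (2 * b2)

# grid search around the claimed minimizer (the function is convex in u)
us = [i * L / 10_000 for i in range(10_001)]
u_star = min(us, key=exponent)
assert abs(u_star - a2 * L) < L / 1000          # argmin = a^2 log x
assert abs(exponent(a2 * L) - L * L / 2) < 1e-9 # min value = (log x)^2 / 2
```

This is exactly the statement that the most likely way for the sum to be large has U ≈ ρ log x, i.e. the smaller component of order x^ρ.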
Theorem 2.17 Let B ∈ S. Assume that X1, X2, ... are positive r.v.'s with c.d.f. B_i in a probability space (Ω, F, P) such that for each i, B_i(x) ∼ c_i B(x) (with at least one c_i ≠ 0, and ∃ c > 0 and x_0 > 0 s.t. B_i(x) ≤ cB(x) for all x > x_0). Further assume that

(i) X1, X2, ... are conditionally independent given a σ-algebra G ⊂ F;

(ii) for each i there exists a nondecreasing function r(x) and an increasing collection of sets J_i(x) ∈ G with J_i(x) → Ω as x → ∞ such that
\[
P(X_i > x \mid \mathcal{G})\, I\bigl(J_i(x)\bigr) \;\le\; r(x)B(x)\, I\bigl(J_i(x)\bigr) \quad \text{a.s.}
\]
and such that for a function h(x) that satisfies (2.22), uniformly in i,
\[
1.\;\; P\bigl(\overline{J}_i(h(x))\bigr) = o\bigl(B(x)\bigr), \qquad
2.\;\; r(x)\,B\bigl(h(x)\bigr) = o(1), \qquad
3.\;\; r(x)\int_{h(x)}^{x-h(x)} B(x-y)\,B(dy) = o\bigl(B(x)\bigr),
\]
as x → ∞. Then for all n ∈ N,
\[
P(X_1 + \cdots + X_n > x) \;\sim\; (c_1 + \cdots + c_n)\, B(x).
\]

Proof. Consider first X1 + X2. We have the inequalities
\[
P(X_1 + X_2 > x) \;\le\; P\bigl(X_1 > x - h(x)\bigr) + P\bigl(X_2 > x - h(x)\bigr) + P\bigl(h(x) \le X_1 \le x - h(x),\, X_2 > x - X_1\bigr)
\]
and
\[
P(X_1 + X_2 > x) \;\ge\; P(X_1 > x) + P(X_2 > x) - P(X_1 > x,\, X_2 > x).
\]
Now, if Y is another r.v. with c.d.f. B, independent of X1, X2,
\[
\begin{aligned}
&P\bigl(h(x) \le X_1 \le x - h(x),\, X_2 > x - X_1\bigr)
\;=\; E\Bigl[P\bigl(h(x) \le X_1 \le x - h(x),\, X_2 > x - X_1 \bigm| \mathcal{G}\bigr)\Bigr]\\
&\quad=\; E\Bigl(\int_{h(x)}^{x-h(x)} P(X_1 \in dy \mid \mathcal{G})\, P(X_2 > x - y \mid \mathcal{G})\,\bigl[I\bigl(J_2(x-y)\bigr) + I\bigl(\overline{J}_2(x-y)\bigr)\bigr]\Bigr)\\
&\quad\le\; r(x)\, E\Bigl(\int_{h(x)}^{x-h(x)} P(X_1 \in dy \mid \mathcal{G})\, P(Y > x - y)\Bigr) + E\Bigl(I\bigl(\overline{J}_2(h(x))\bigr)\Bigr)\\
&\quad=\; r(x) \int_{h(x)}^{x-h(x)} P(X_1 \in dy)\, B(x - y) \,+\, o\bigl(B(x)\bigr) \;=\; o\bigl(B(x)\bigr).
\end{aligned}
\]
At the same time,
\[
\begin{aligned}
P(X_1 > x,\, X_2 > x) \;&=\; E\Bigl(P\bigl(X_1 > x,\, X_2 > x \bigm| \mathcal{G}\bigr)\bigl[I(J_2(x)) + I(\overline{J}_2(x))\bigr]\Bigr)\\
&\le\; E\Bigl(P(X_1 > x \mid \mathcal{G})\, P(X_2 > x \mid \mathcal{G})\, I(J_2(x))\Bigr) + E\, I\bigl(\overline{J}_2(x)\bigr)\\
&\le\; r(x)B(x)\, P(X_1 > x) + o\bigl(B(x)\bigr) \;=\; o\bigl(B(x)\bigr).
\end{aligned}
\]
Consequently, P(X1 + X2 > x) ∼ P(X1 > x) + P(X2 > x). Since w.l.o.g. c1 > 0, P(X1 + X2 > x) ∼ (c1 + c2)B(x). The result for general n now follows by induction. □
The following extension of Lemma X.1.8 is proved in [367].

Lemma 2.18 Under the conditions of Theorem 2.17, for any ε > 0 there exist V(ε) > 0 and x_0 = x_0(ε) such that for any x > x_0 and n ≥ 1,
\[
P(X_1 + \cdots + X_n > x) \;\le\; V(\varepsilon)(1 + \varepsilon)^n\, B(x).
\]

This result and Theorem 2.17 together with dominated convergence now give the following extension of Lemma X.2.2.

Proposition 2.19 Let K be an independent integer-valued r.v. with Ez^K < ∞ for some z > 1. Under the assumptions of Theorem 2.17 one then has P(X1 + ··· + X_K > x) ∼ E(Σ_{i=1}^{K} c_i) B(x).

In several applications in risk theory, conditionally independent r.v.'s will be an appropriate description of the dependence structure. However, the challenge in the application of the above result is to identify a σ-algebra G and a corresponding function h(x) that satisfies the assumptions of Theorem 2.17. See [367] for some worked-out examples.

Notes and references Some general non-asymptotic bounds on P(X1 + ··· + Xn > x) are derived in Denuit, Genest & Marceau [293], Cossette, Denuit & Marceau [257], Mesfioui & Quessy [635] and Embrechts & Puccetti [350] (see also [351] for bounds on functions of multivariate risks). Worst-case scenarios are also studied by Rüschendorf [759]. Parts of the material in this section are from Albrecher, Asmussen & Kortschak [13]. It is of course also possible to represent results on the asymptotic behavior of the sum through conditions on the underlying copula. Quite explicit results for Archimedean copulas can be found in Alink, Löwe & Wüthrich [42], see also [43];
for extensions using multivariate extreme value theory see Barbe, Fougères & Genest [130] and, with an emphasis on non-identically distributed marginals, Kortschak & Albrecher [556]. For further results on asymptotically independent subexponential risks in the maximum domain of attraction of the Gumbel distribution, see Mitra & Resnick [647] and also Laeven, Goovaerts & Hoedemakers [571] with a view towards actuarial applications. Tang & Wang [832] extend Proposition 2.12 to random variables with dominated variation. Asymptotic tail probabilities for negatively associated sums of heavy-tailed random variables are investigated in Wang & Tang [871] and Geluk & Ng [392].
3
Principles for premium calculation
The standard setting for discussing premium calculation in the actuarial literature does not involve stochastic processes, but only a single risk X ≥ 0. By this we mean that X is a r.v. representing the random payment to be made (possibly 0). A premium rule is then a [0, ∞)-valued function H of the distribution of X, often written H(X), such that H(X) is the premium to be paid, i.e. the amount for which the company is willing to insure the given risk. Among the standard premium rules discussed in the literature (not necessarily the same which are used in practice!) are the following:

The net premium principle H(X) = EX (also called the equivalence principle). As follows from the fluctuation theory of r.v.'s with finite mean, this principle will lead to ruin if many independent risks are insured. This motivates the next principle,

The expected value principle H(X) = (1 + η)EX where η is a specified safety loading. For η = 0, we are back to the net premium principle. A criticism of the expected value principle is that it does not take into account the variability of X. This leads to

The variance principle H(X) = EX + η Var(X). A modification (motivated by EX and Var(X) not having the same dimension) is

The standard deviation principle H(X) = EX + η √Var(X).

The principle of zero utility. Here v(x) is a given utility function, assumed to be concave and increasing with (w.l.o.g.) v(0) = 0; v(x) represents the utility of a capital of size x. The zero utility principle then means v(0) = Ev(H(X) − X) or, taking into account the initial reserve u in the portfolio,
\[
v(u) \;=\; Ev\bigl(u + H(X) - X\bigr). \tag{3.1}
\]
3. PRINCIPLES FOR PREMIUM CALCULATION
By Jensen's inequality, v(u + H(X) − EX) ≥ Ev(u + H(X) − X) = 0, so that H(X) ≥ EX. For v(x) = x, we have equality and are back to the net premium principle. There is also an approximate argument leading to the variance principle as follows. Assuming that the Taylor approximation
\[
v\bigl(u + H(X) - X\bigr) \;\approx\; v(u) + v'(u)\bigl(H(X) - X\bigr) + \frac{v''(u)}{2}\bigl(H(X) - X\bigr)^2
\]
is reasonable, taking expectations leads to the quadratic equation
\[
v'' H(X)^2 + H(X)\bigl(2v' - 2v'' EX\bigr) + v'' EX^2 - 2v' EX \;=\; 0
\]
(with v′, v″ evaluated at u) with solution
\[
H(X) \;=\; EX - \frac{v'}{v''} \pm \sqrt{\Bigl(\frac{v'}{v''}\Bigr)^2 - \operatorname{Var}(X)}.
\]
Write
\[
\Bigl(\frac{v'}{v''}\Bigr)^2 - \operatorname{Var}(X) \;=\; \Bigl(\frac{v'}{v''} - \frac{v''}{2v'}\operatorname{Var}(X)\Bigr)^2 - \Bigl(\frac{v''}{2v'}\operatorname{Var}(X)\Bigr)^2.
\]
If v″/v′ is small, we can ignore the last term. Taking +√· then yields
\[
H(X) \;\approx\; EX - \frac{v''(u)}{2v'(u)}\operatorname{Var}X;
\]
since v″(u) ≤ 0 by concavity, this is approximately the variance principle.

The most important special case of the principle of zero utility is

The exponential principle, which corresponds to v(x) = (1 − e^{−ax})/a for some a > 0. Here the initial capital u cancels out and (3.1) leads to
\[
H(X) \;=\; \frac{1}{a} \log Ee^{aX}.
\]
Since m.g.f.'s are log-concave, it follows that H_a(X) = H(X) is increasing as a function of a. Further, lim_{a↓0} H_a(X) = EX (the net premium principle) and, provided b = ess sup X < ∞, lim_{a→∞} H_a(X) = b (the premium principle H(X) = b is called the maximal loss principle but is clearly not very realistic). In view of this, a is called the risk aversion. Note that in the compound Poisson model, the premium collected for the aggregate risk A_t is pt. Equating this with H(A_t) = (1/a) log Ee^{aA_t} leads to the Lundberg equation for a. Hence, the premium principle in the Cramér–Lundberg model can be interpreted as an exponential principle with risk aversion γ, given that the adjustment coefficient γ > 0 exists.
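A small numerical check of these properties (our own sketch, for a bounded discrete risk chosen for the example): H_a(X) = a^{−1} log Ee^{aX} increases in a from EX towards ess sup X.

```python
from math import exp, log

X = [(0, 0.5), (1, 0.3), (2, 0.2)]          # (value, probability) pairs

def H(a):
    # exponential premium principle H_a(X) = log(E e^{aX}) / a
    return log(sum(p * exp(a * x) for x, p in X)) / a

mean = sum(p * x for x, p in X)             # net premium EX = 0.7
premiums = [H(a) for a in (0.01, 0.5, 1.0, 5.0, 50.0)]
assert premiums == sorted(premiums)         # risk aversion: H_a increasing in a
assert abs(premiums[0] - mean) < 0.01       # a -> 0 recovers the net premium
assert premiums[-1] < 2 and 2 - premiums[-1] < 0.05   # a -> inf: ess sup = 2
```

The monotonicity in a is exactly the log-concavity statement above, here visible numerically rather than proved.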
The risk-adjusted premium principle
\[
H(X) \;=\; \int_0^{\infty} g\bigl(P(X > x)\bigr)\, dx
\]
for a fixed nondecreasing and left-continuous function g : [0, 1] → [0, 1] (also called the distortion function) such that g(0) = 0 and g(1) = 1.

The percentile principle Here one chooses a (small) number α, say 0.05 or 0.01, and determines H(X) by P(X ≤ H(X)) = 1 − α (assuming a continuous distribution for simplicity).

Some standard criteria for evaluating the merits of premium rules are

1. η ≥ 0, i.e. H(X) ≥ EX.
2. H(X) ≤ b when b (the ess sup above) is finite.
3. H(X + c) = H(X) + c for any constant c.
4. H(X + Y) = H(X) + H(Y) when X, Y are independent.
5. H(X) = H(H(X | Y)). For example, if X = Σ_1^N U_i is a random sum with the U_i independent of N, this yields
\[
H\Bigl(\sum_{1}^{N} U_i\Bigr) \;=\; H\bigl(H(U)N\bigr)
\]
(where, of course, H(U) is a constant).

Note that H(cX) = cH(X) is not on the list! Considering the examples above, the net premium principle and the exponential principle can be seen to be the only ones satisfying all five properties. The expected value principle fails to satisfy, e.g., 3), whereas (at least) 4) is violated for the variance principle, the standard deviation principle, and the zero utility principle (unless it is the exponential or net premium principle). For more detail, see e.g. Gerber [398] or Sundt [820].

Notes and references The discussed premium principles are standard and can be found in many texts on insurance mathematics, e.g. Gerber [398], Heilmann [458] and Sundt [820]. For an extensive treatment, see Goovaerts et al. [423]. In recent years, the discussion about which criteria H should or should not fulfill in various applications has experienced enormous interest and activity in related finance contexts under the terminology of risk measures, see for instance Pflug & Römisch [697] for
an overview. On the insurance side, going from the static pricing framework above towards a dynamic one is considered to be an important step for many situations. Time consistency and market consistency play a crucial role in this context; for some recent developments see e.g. Cheridito, Delbaen & Kupper [238], Jobert & Rogers [507], Malamud, Trubowitz & Wüthrich [624] and Pelsser [690].
4
Reinsurance
Reinsurance means that the company (the cedent) insures a part of the risk at another insurance company (the reinsurer). Again, we start by formulating the basic concepts within the framework of a single risk X ≥ 0. A reinsurance arrangement is then defined in terms of a function h(x) with the property 0 ≤ h(x) ≤ x. Here h(x) is the amount of the claim x to be paid by the reinsurer and x − h(x) the amount to be paid by the cedent. The function x − h(x) is referred to as the retention function. The most common examples are the following two:

Proportional reinsurance h(x) = θx for some θ ∈ (0, 1). Also called quota share reinsurance.

Stop-loss reinsurance h(x) = (x − b)_+ for some b ∈ (0, ∞), referred to as the retention limit. Note that the retention function is x ∧ b.

Concerning terminology, note that in the actuarial literature the stop-loss transform of F(x) = P(X ≤ x) (or, equivalently, of X) is defined as the function
\[
b \;\to\; E(X - b)_+ \;=\; \int_b^{\infty} (x - b)\, F(dx) \;=\; \int_b^{\infty} \overline{F}(x)\, dx.
\]
An arrangement closely related to stop-loss reinsurance is excess-of-loss reinsurance, see below. Stop-loss reinsurance and excess-of-loss reinsurance have a number of nice optimality properties. The first we prove is in terms of maximal utility:

Proposition 4.1 Let X be a given risk, v a given concave nondecreasing utility function and h a given retention function. Let further b be determined by E(X − b)_+ = Eh(X). Then for any x,
\[
Ev\bigl(x - [X - h(X)]\bigr) \;\le\; Ev(x - X \wedge b).
\]

Remark 4.2 Proposition 4.1 can be interpreted as follows. Assume that the cedent charges a premium P ≥ EX for the risk X and is willing to pay P_1 < P for reinsurance. If the reinsurer applies the expected value principle with safety loading η, this implies that the cedent is looking for retention functions with Eh(X) = P_2 = P_1/(1 + η). The expected utility after settling the risk is thus
\[
Ev\bigl(u + P - P_1 - [X - h(X)]\bigr),
\]
where u is the initial reserve. Letting x = u + P − P_1, Proposition 4.1 shows that the stop-loss rule h(X) = (X − b)_+ with b chosen such that E(X − b)_+ = P_2 maximizes the expected utility. □

Recall the notions of stochastic ordering from Section IV.8. For the proof of Proposition 4.1, we shall need the following lemma:

Lemma 4.3 (Ohlin's lemma) Let X1, X2 be two risks with the same mean, such that
\[
F_1(x) \le F_2(x), \; x < b, \qquad F_1(x) \ge F_2(x), \; x \ge b
\]
for some b, where F_i(x) = P(X_i ≤ x). Then X1 ≺_cx X2.

Proof. Define ∆(u) = E(X2 − u)_+ − E(X1 − u)_+. Clearly ∆(0) = 0 and lim_{u→∞} ∆(u) = 0. But from the representation ∆(u) = ∫_u^∞ (F_1(x) − F_2(x)) dx, we have under the given assumptions that ∆(u) increases on (0, b) and decreases on (b, ∞). So ∆(u) ≥ 0 for all u ≥ 0, i.e. X1 ≺_icx X2. Since E[X1] = E[X2], this implies X1 ≺_cx X2. □

Proof of Proposition 4.1. It is easily seen that the assumptions of Ohlin's lemma hold when X1 = X ∧ b, X2 = X − h(X); in particular, the requirement EX1 = EX2 is then equivalent to E(X − b)_+ = Eh(X). Now just note that −v is convex. □

We now turn to the case where the risk can be written as
\[
X \;=\; \sum_{i=1}^{N} U_i \tag{4.1}
\]
with the Ui independent; N may be random but should then be independent of the Ui. Typically, N could be the number of claims in a given period, say a year, and the Ui the corresponding claim sizes. A reinsurance arrangement of the form h(X) as above is called global; if instead h is applied to the individual claims so that the reinsurer pays the amount Σ_{i=1}^{N} h(U_i), the arrangement is called local.⁴

⁴More generally, one could consider Σ_{i=1}^{N} h_i(U_i).
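Proposition 4.1 can be checked numerically; the sketch below is our own (X uniform on {0, ..., 9}, a proportional rule h(x) = x/2 and an exponential utility are assumptions for the example). We solve E(X − b)_+ = Eh(X) for b by bisection and compare expected utilities.

```python
from math import exp

xs = range(10)                                # X uniform on {0,...,9}
p = 0.1
Eh = sum(p * x / 2 for x in xs)               # Eh(X) = 2.25 for h(x) = x/2

def stoploss(b):                              # E(X - b)_+, decreasing in b
    return sum(p * max(x - b, 0.0) for x in xs)

lo, hi = 0.0, 9.0
for _ in range(100):                          # bisection for E(X - b)_+ = Eh(X)
    mid = (lo + hi) / 2
    lo, hi = (lo, mid) if stoploss(mid) < Eh else (mid, hi)
b = (lo + hi) / 2
assert abs(stoploss(b) - Eh) < 1e-9

v = lambda y: 1 - exp(-0.1 * y)               # concave increasing utility
x0 = 10.0
u_prop = sum(p * v(x0 - (x - x / 2)) for x in xs)   # retain X - h(X) = X/2
u_stop = sum(p * v(x0 - min(x, b)) for x in xs)     # retain X ^ b
assert u_prop <= u_stop                       # Proposition 4.1 / Ohlin's lemma
```

Both retained risks have mean 2.25, but X ∧ b is less variable in the convex order, which is exactly why the stop-loss arrangement wins for any concave utility.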
The following discussion will focus on maximizing the adjustment coefficient. For a global rule with retention function h*(x) and a given premium P* charged for X − h*(X), the cedent's adjustment coefficient γ* is determined by
\[
1 \;=\; E\exp\bigl\{\gamma^* \bigl[X - h^*(X) - P^*\bigr]\bigr\}; \tag{4.2}
\]
for a local rule corresponding to h(u) and premium P for X − Σ_{i=1}^{N} h(U_i), we look instead for the γ solving
\[
1 \;=\; E\exp\Bigl\{\gamma \Bigl[\sum_{i=1}^{N} \bigl[U_i - h(U_i)\bigr] - P\Bigr]\Bigr\} \;=\; E\exp\Bigl\{\gamma \Bigl[X - P - \sum_{i=1}^{N} h(U_i)\Bigr]\Bigr\}. \tag{4.3}
\]
This definition of the adjustment coefficients is motivated by considering ruin at a sequence of equally spaced time points, say consecutive years, such that N is the generic number of claims in a year and P, P* the total premiums charged in a year, and referring to the results of VI.3a. The following result shows that if we compare only arrangements with P = P*, a global rule is preferable to a local one.

Proposition 4.4 To any local rule with retention function h(u) and any
\[
P \;\ge\; E\Bigl[X - \sum_{i=1}^{N} h(U_i)\Bigr], \tag{4.4}
\]
there is a global rule with retention function h*(x) such that
\[
Eh^*(X) \;=\; E\sum_{i=1}^{N} h(U_i) \tag{4.5}
\]
and γ* ≥ γ.

Proof. Define
\[
h^*(x) \;=\; E\Bigl[\sum_{i=1}^{N} h(U_i) \Bigm| X = x\Bigr];
\]
then (4.5) holds trivially. Applying the inequality Eϕ(Y) ≥ Eϕ(E(Y | X)) (with ϕ convex) to ϕ(y) = e^{γy}, Y = Σ_{i=1}^{N} [U_i − h(U_i)] − P, we get
\[
1 \;=\; E\exp\Bigl\{\gamma \Bigl[\sum_{i=1}^{N} \bigl[U_i - h(U_i)\bigr] - P\Bigr]\Bigr\} \;\ge\; E\exp\bigl\{\gamma \bigl[X - h^*(X) - P\bigr]\bigr\}.
\]
But since γ ≥ 0, γ* ≥ 0 because of (4.4), this implies γ* ≥ γ. □
Remark 4.5 Because of the independence assumptions, expectations like those in (4.3), (4.4), (4.5) simplify a lot. Assuming for simplicity that the Ui are i.i.d., we get EX = EN · EU,
\[
E\Bigl[X - \sum_{i=1}^{N} h(U_i)\Bigr] \;=\; EN \cdot E\bigl[U - h(U)\bigr], \qquad
E\exp\Bigl\{\gamma \Bigl[\sum_{i=1}^{N} \bigl[U_i - h(U_i)\bigr] - P\Bigr]\Bigr\} \;=\; e^{-\gamma P}\, E\widehat{C}[\gamma]^{N}, \tag{4.6}
\]
where Ĉ[γ] = Ee^{γ(U−h(U))}, and so on. □
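A numerical sketch using (4.6) (our own, not from the book): with N ~ Poisson(λ), the defining equation e^{−γP} EĈ[γ]^N = 1 reduces to λ(Ĉ[γ] − 1) = γP. For Exp(1) claims we compare the cedent's adjustment coefficient under an excess-of-loss rule and a proportional rule with the same expected reinsured amount and the same premium P; the parameter choices λ = 1, P = 0.85, b = 1 are assumptions made for the example.

```python
from math import exp, expm1

lam, P, b = 1.0, 0.85, 1.0        # Poisson rate, premium per year, retention limit

def C_xl(g):
    # E exp(g (U ^ b)) for U ~ Exp(1): retained claim under excess-of-loss
    if abs(1.0 - g) < 1e-9:
        return 1.0 + b
    return -expm1(-(1.0 - g) * b) / (1.0 - g) + exp(-(1.0 - g) * b)

theta = exp(-b)                   # proportional share with theta EU = E(U - b)_+
c = 1.0 - theta                   # retained fraction under the proportional rule

def C_prop(g):
    return 1.0 / (1.0 - c * g)    # E exp(g c U), valid for g < 1/c

def gamma(C, hi):
    # solve lam * (C(g) - 1) = g * P by bisection, cf. (4.6) with Poisson N
    f = lambda g: lam * (C(g) - 1.0) - g * P
    lo = 1e-9
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return (lo + hi) / 2

g_prop = gamma(C_prop, 1 / c - 1e-6)
g_xl = gamma(C_xl, 5.0)
assert abs(g_prop - (1 - c / P) / c) < 1e-6   # closed form for the proportional rule
assert g_xl > g_prop                          # excess-of-loss yields the larger gamma
```

The ordering of the two roots is exactly the content of the excess-of-loss optimality result stated next: Ohlin's lemma gives Ĉ_1[γ] ≤ Ĉ[γ] for all γ ≥ 0, hence a larger adjustment coefficient.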
The arrangement used in practice is, however, as often local as global. Local reinsurance with h(u) = (u − b)_+ is referred to as excess-of-loss reinsurance and plays a particular role:

Proposition 4.6 Assume the Ui are i.i.d. Then for any local retention function u − h(u) and any P satisfying (4.4), the excess-of-loss rule h_1(u) = (u − b)_+ with b determined by
\[
E(U - b)_+ \;=\; Eh(U) \tag{4.7}
\]
(and the same P) satisfies γ_1 ≥ γ.

Proof. As in the proof of Proposition 4.4, it suffices to show that
\[
E\exp\Bigl\{\gamma \Bigl[\sum_{i=1}^{N} U_i \wedge b - P\Bigr]\Bigr\} \;\le\; 1 \;=\; E\exp\Bigl\{\gamma \Bigl[\sum_{i=1}^{N} \bigl[U_i - h(U_i)\bigr] - P\Bigr]\Bigr\},
\]
or, appealing to (4.6), that Ĉ_1[γ] ≤ Ĉ[γ] where Ĉ_1[γ] = Ee^{γ(U∧b)}. This follows by taking X1 = U ∧ b, X2 = U − h(U) (as in the proof of Proposition 4.4) and g(x) = e^{γx} in Ohlin's lemma. □

Notes and references Reinsurance is a classical topic. The material presented here is standard and can be found in many texts on insurance mathematics, e.g. Bowers et al. [195], Heilmann [458] and Sundt [820]. See further Hesselager [461] and Dickson & Waters [319]. The original reference for Ohlin's lemma is Ohlin [671]. An early reference for minimization of the ruin probability through reinsurance in an asymptotic sense by maximizing the adjustment coefficient is Waters [876]; see Hald & Schmidli [446], Centeno [224, 225] and Guerra & Centeno [441] for more recent extensions. The identification of optimal reinsurance strategies under various objective functions and constraints is an active field of research, see e.g. Centeno & Simões [226] and Albrecher & Teugels [38] for a recent overview. For optimal dynamic reinsurance in discrete time, see e.g. Dickson & Waters [321]. Optimal adaptive reinsurance strategies in continuous time are discussed in Chapter XIV.
Appendix

A1
Renewal theory

1a
Renewal processes and the renewal theorem

By a simple point process on the line we understand a random collection of time epochs without accumulation points and without multiple points. The mathematical representation is either the ordered set 0 ≤ T_0 < T_1 < ... of epochs or the set Y_1, Y_2, ... of interarrival times and the time Y_0 = T_0 of the first arrival (that is, Y_n = T_n − T_{n−1}). The point process is called a renewal process if Y_0, Y_1, ... are independent and Y_1, Y_2, ... all have the same distribution, denoted by F in the following and referred to as the interarrival distribution; the distribution of Y_0 is called the delay distribution. If Y_0 = 0, the renewal process is called zero-delayed. The number max{k : T_{k−1} ≤ t} of renewals in [0, t] is denoted by N_t.

The associated renewal measure U is defined by U = Σ_0^∞ F^{*n} where F^{*n} is the nth convolution power of F. That is, U(A) is the expected number of renewals in A ⊆ R in a zero-delayed renewal process; note in particular that U({0}) = 1.

The renewal theorem asserts that U(dt) is close to dt/µ, Lebesgue measure dt normalized by the mean µ of F, when t is large. Technically, some condition is needed: that F is non-lattice, i.e. not concentrated on {h, 2h, ...} for any h > 0. Then Blackwell's renewal theorem holds, stating that
\[
U(t + a) - U(t) \;\to\; \frac{a}{\mu}, \qquad t \to \infty \tag{A.1}
\]
(here U(t) = U([0, t]) so that U(t + a) − U(t) is the expected number of renewals in (t, t + a]). If F satisfies the stronger condition of being spread-out (F^{*n} is nonsingular w.r.t. Lebesgue measure for some n ≥ 1), then Stone's decomposition holds: U = U_1 + U_2, where U_1 is a finite measure and U_2(dt) = u(t) dt where u(t) has limit 1/µ as t → ∞. Note in particular that F is spread-out if F has a density f. A weaker (and much easier to prove) statement than Blackwell's renewal theorem is the elementary renewal theorem, stat