Probability, Statistics and their Applications: Papers in Honor of Rabi Bhattacharya
Krishna Athreya, Mukul Majumdar, Madan Puri & Edward Waymire, Editors
Institute of Mathematical Statistics LECTURE NOTES-MONOGRAPH SERIES Volume 41
Institute of Mathematical Statistics Beachwood, Ohio
Institute of Mathematical Statistics Lecture Notes-Monograph Series
Series Editor: Joel Greenhouse
The production of the Institute of Mathematical Statistics Lecture Notes-Monograph Series is managed by the IMS Societal Office: Julia A. Norton, Treasurer, and Elyse Gustafson, Executive Director.
Library of Congress Control Number: 2003103225 International Standard Book Number 0-940600-55-2 Copyright © 2003 Institute of Mathematical Statistics All rights reserved Printed in the United States of America
Editors
KRISHNA ATHREYA
School of ORIE
Cornell University
Ithaca, NY 14850

MUKUL MAJUMDAR
Department of Economics
Cornell University
Ithaca, NY 14850

EDWARD WAYMIRE
Department of Mathematics
Oregon State University
Corvallis, OR 97331

MADAN PURI
Department of Mathematics
Indiana University
Bloomington, IN
Contents
Preface .... vii
Bhattacharya Bibliography .... ix
1. Iteration of IID Random Maps on R+ .... 1
   Krishna Athreya
2. Adaptive Estimation of Directional Trend .... 15
   Rudolf Beran
3. Simulating Constrained Animal Motion Using Stochastic Differential Equations .... 35
   David Brillinger
4. θ-expansions and the generalized Gauss map .... 49
   Santanu Chakraborty and B.V. Rao
5. On Ito's Complex Measure Condition .... 65
   Larry Chen, Scott Dobson, Ronald Guenther, Chris Orum, Mina Ossiander, Enrique Thomann, Edward Waymire
6. Variational formulas and explicit bounds of Poincare-type inequalities for one-dimensional processes .... 81
   Mu-Fa Chen
7. Brownian Motion and the Classical Groups .... 97
   Anthony D'Aristotile, Persi Diaconis and Charles M. Newman
8. Transition Density of a Reflected Symmetric Stable Levy Process in an Orthant .... 117
   Amites Dasgupta and S. Ramasubramanian
9. On Conditional Central Limit Theorems For Stationary Processes .... 133
   Manfred Denker and Mikhail Gordin
10. Polynomially Harmonizable Processes and finitely polynomially determined Levy processes .... 153
   A. Goswami and A. Sengupta
11. Effects of Smoothing on Distribution Approximations .... 169
   Peter Hall and Xiao-Hua Zhou
12. Survival under uncertainty in an exchange economy .... 187
   Nigar Hashimzade and Mukul Majumdar
13. Singular Stochastic Control in Optimal Investment and Hedging in the Presence of Transaction Costs .... 209
   Tze Leung Lai and Tiong Wee Lim
14. Parametric Empirical Bayes Model Selection - Some Theory, Methods and Simulation .... 229
   Nitai Mukhopadhyay and Jayanta Ghosh
15. A Theorem of Large Deviations for the Equilibrium Prices in Random Exchange Economies .... 247
   Esa Nummelin
16. Asymptotic estimation theory of change-point problems for time series regression models and its applications .... 257
   Takayuki Shiohama, Masanobu Taniguchi and Madan L. Puri
17. Fractional Brownian motion as a differentiable generalized Gaussian process .... 285
   Victoria Zinde-Walsh and Peter C.B. Phillips
Preface This collection of papers honors Professor Rabi Bhattacharya in recognition of a career of dedicated teaching, research and service to his profession. From his earliest publications Bhattacharya has provided unique approaches to a wide array of problems in Probability, Statistics and their connections and applications in sciences and engineering. The diversity of topics covered in this volume is testimony to this tremendous breadth of expertise and scholarship. A bibliography of Bhattacharya's publications appears after this Preface. A brief biographical sketch of Bhattacharya's academic career now follows.
Biographical Sketch Rabindra Nath Bhattacharya was born January 11, 1937, in his ancestral home Porgola, District of Barisal, in what is now Bangladesh. His family moved to Calcutta in 1947 following the partition of India. Undergraduate studies in India are generally done in four-year colleges affiliated with a university. Bhattacharya studied in Presidency College, Calcutta, and received a Bachelor of Science degree in 1956 and Master of Science in 1959. After completion of the M.Sc. he served as a research scholar at the Indian Statistical Institute from 1959-1960. From there Bhattacharya took up a Lectureship in Kalyani University near Calcutta where he taught from 1961 until 1964. Bhattacharya came to the United States in 1964 to attend graduate school in the Statistics Department on a fellowship from the University of Chicago. He completed his PhD as a student of Patrick Billingsley at Chicago in 1967. He returned to India in the spring of 1967 to marry Bithika (Gowri) Banerjee, his wife of the last thirty-six years and mother of his two adult children Urmi and Deepta. In the fall of 1967 he accepted a position in the Statistics Department at the University of California at Berkeley as an Assistant Professor. In 1972 Bhattacharya moved to the University of Arizona as an Associate Professor in the Mathematics Department. He was promoted to Full Professor in 1977. In 1982 he joined the Mathematics Department at Indiana University where he remained until his retirement in 2002. After his retirement from Indiana University, Bhattacharya returned to the University of Arizona where he presently resides as Professor of Mathematics. Bhattacharya's honors and awards are many. They include two special invited papers in the Annals of Probability (1977) and the Annals of Applied Probability (1999). He was elected a Fellow of the Institute of Mathematical Statistics in 1978. He is recipient of a Humboldt Prize 1994-95, and was a Guggenheim Fellow during the year 2000. 
Extreme generosity in sharing ideas is a hallmark of Rabi Bhattacharya's personality. Nine students completed Ph.D. degrees under his direction and one is currently in progress. Besides these he has helped many others through his consistent and thorough dedication to teaching, research and professional service. Bhattacharya served as Associate Editor on a number of journals throughout his career, including the Annals of Probability, the Journal of Statistical Planning and Inference, Econometric Theory, the Journal of Multivariate Analysis, and Statistica Sinica. He currently serves as co-coordinating Editor of the Journal of Statistical Planning and Inference, and as an Associate Editor for Statistica Sinica and for the Annals of Probability. He served as an elected member of the Institute of Mathematical Statistics Council from 1998 to 2001. Upon retirement from Indiana University his colleagues and students honored his years of service to the department and university with the following tribute penned by Richard Bradley: "Both in his dealings with people and in his perspective on the university, academic life, and the world at large, Rabi has consistently shown a deep humanity. This is manifested in the way he treats his students, his colleagues, and everybody else he deals with." This volume of articles is continued tribute to his valued contributions.
Rabi Bhattacharya
Bibliography [1] Berry-Esseen bounds for the multidimensional central limit theorem (1968) Bull. Amer. Math. Soc., 74, 285-287.
[2] Rates of weak convergence for multidimensional central limit theorems (1970) Theor. Probab. Appl., 15, 68-86.
[3] Rates of weak convergence and asymptotic expansions in classical central limit theorems (1971) Ann. Math. Stat., 42, 241-259.
[4] Speed of convergence of the n-fold convolution of a probability measure on a compact group (1972) Z. Wahrscheinlichkeitstheorie Ver. Geb., 25, 1-10.
[5] Recent results on refinements of the central limit theorem (1972) Proc. Sixth Berkeley Symposium on Math. Stat. and Prob., 2, 453-484.
[6] Errors of normal approximation (1973) Proc. International Conf. on Prob. Theory and Math. Statist., Vilnius, U.S.S.R., 117-119.
[7] Random exchange economies (1973) J. Econ. Theory, 6, 37-67 (with M. Majumdar).
[8] On errors of normal approximation (1975) Ann. Probab., 3, 815-828.
[9] Normal Approximation and Asymptotic Expansions (with R. Ranga Rao) (1976) Wiley, New York. Russian Edition (1982). Revised Reprint by Krieger, Florida (1986).
[10] On the stochastic foundations of the theory of water flow through unsaturated soil (1976) Water Res. Research, 12, 503-512 (with V.K. Gupta and G. Sposito).
[11] Refinements of the multidimensional central limit theorem and applications (1977) Ann. Probab., 7, 1-28. (Special invited paper.)
[12] On the validity of the formal Edgeworth expansion (1978) Ann. Statist., 6, 434-451 (joint with J.K. Ghosh).
[13] Criteria for recurrence and existence of invariant measures for multidimensional diffusions (1978) Ann. Probab., 6, 541-553.
[14] On a statistical theory of solute transport in porous media (1979) SIAM J. Appl. Math., 34, 485-498 (joint with V.K. Gupta).
[15] Foundational theories of solute transport in porous media: a critical review (1979) Advances in Water Res., 2, 59-68 (joint with V.K. Gupta and G. Sposito).
[16] On global stability of some stochastic economic processes: A synthesis (1980) Quantitative Economics and Development (Ed. by L.R. Klein, M. Nerlove and R.C. Tsiang), 19-43, Academic Press, New York (with M. Majumdar).
[17] A molecular approach to the foundations of solute transport in porous media, 1. Conservative solutes in homogeneous, saturated media (1981) J. Hydrology, 50, 355-370 (joint with V.K. Gupta and G. Sposito).
[18] Asymptotic behavior of several dimensional diffusions, Nonlinear Stochastic Systems in Physics, Chemistry and Biology (1981) (Ed. by L. Arnold and R. Lefever), Springer-Verlag.
[19] Recurrence and ergodicity of diffusions (1982) J. Mult. Analysis, 12, 95-122 (with S. Ramasubramanian).
[20] On classical limit theorems for diffusions (1982) Sankhya, Ser. A, 44, 47-71.
[21] On the functional central limit theorem and the law of the iterated logarithm for Markov processes (1982) Zeit. Wahr. Ver. Geb., 60, 185-201.
[22] The Hurst effect under trend (1983) J. App. Prob., 20, 649-662 (with V.K. Gupta and E. Waymire).
[23] A new derivation of the Taylor-Aris theory of solute dispersion in a capillary (1983) Water Res. Research, 19(4), 945-951 (with V.K. Gupta).
[24] A theoretical explanation of solute dispersion in saturated porous media at the Darcy scale (1983) Water Res. Research, 19(4), 938-944 (with V.K. Gupta).
[25] On the order of magnitude of cumulants of von Mises functionals and related statistics (1983) Ann. Prob., 11(2), 346-354 (with M.L. Puri).
[26] Fokker-Planck equations, Encyclopedia of Statistical Sciences, Vol. 3 (ed. by S. Kotz and R. Johnson) (1983) Wiley, New York (joint with C.M. Newman).
[27] Stochastic models in mathematical economics: A review, Statistics: Applications and New Directions (1984) Proc. ISI Golden Jubilee Int. Conf. (ed. by J.K. Ghosh and G. Kallianpur), 55-99 (joint with M. Majumdar).
[28] On the Taylor-Aris theory of solute transport in a capillary (1984) SIAM J. Appl. Math., 44(1) (joint with V.K. Gupta).
[29] Some recent results on Cramer-Edgeworth expansions with applications, Multivariate Analysis VI (1985) Proceedings of the Sixth International Symposium on Multivariate Analysis (P.R. Krishnaiah, ed.), 57-75.
[30] Asymptotic expansions and applications (1985) Proc. Fourth Vilnius Conf. on Prob. and Math. Stat., Vilnius, USSR.
[31] A central limit theorem for diffusions with periodic coefficients (1985) Ann. Probab., 13, 385-396.
[32] Solute dispersion in multidimensional periodic porous media (1986) Water Res. Research, 22(2), 156-164 (joint with V.K. Gupta).
[33] Some aspects of Edgeworth expansions in statistics and probability, New Perspectives in Theoretical and Applied Statistics (1987) (ed. by M. Puri, J. Villaplana and W. Wertz), Wiley, New York, 157-170.
[34] Central limit theorems for diffusions with almost periodic coefficients (1988) Sankhya, Ser. A, 50, 9-25 (joint with S. Ramasubramanian).
[35] Asymptotics of a class of Markov processes which are not in general irreducible (1988) Ann. Probab., 16, 1333-1347 (with O. Lee).
[36] On moment conditions for valid formal Edgeworth expansions (1988) J. Mult. Analysis, 27, 68-79 (with J.K. Ghosh).
[37] Ergodicity and the central limit theorem for a class of Markov processes (1988) J. Mult. Analysis, 27, 80-90 (with O. Lee).
[38] Convolution effect in the determination of compositional profiles and diffusion coefficients by microprobe step scans (1988) American Mineralogist, 73, 901-909 (with J. Ganguly and S. Chakraborty).
[39] Asymptotics of solute dispersion in periodic porous media (1989) SIAM J. Appl. Math., 49, 86-98 (with V.K. Gupta and H.F. Walker).
[40] Second order and Lp-comparisons between the bootstrap and empirical Edgeworth expansion methodologies (1989) Ann. Statist., 17, 160-169 (with M. Qumsiyeh).
[41] Controlled semi-Markov models - the discounted case (1989) J. Stat. Plan. Inf., 21, 365-381 (with M. Majumdar).
[42] Controlled semi-Markov models under long-run average rewards (1989) J. Stat. Plan. Inf., 22, 223-242 (with M. Majumdar).
[43] Applications of central limit theorems to solute dispersion in saturated porous media: from kinetic to field scales (1990) in Dynamics of Fluids in Hierarchical Porous Media (Ed. by J. Cushman), Academic Press, 61-96 (with V.K. Gupta).
[44] Asymptotic Statistics (1990) Birkhauser, DMV Lecture Series (with M. Denker).
[45] Stochastic Processes with Applications (1990) Wiley (with E. Waymire).
[46] An extension of the classical method of images for the construction of reflecting diffusions (1991) Proc. R.C. Bose Symp. on Prob., Math. Stat. and Design of Experiments, 155-164, Wiley (Eastern) (with E.C. Waymire).
[47] Stability in distribution for a class of singular diffusions (1992) Ann. Probab., 20, 312-321 (with G. Basak).
[48] Central limit theorems for diffusions: recent results, open problems and some applications (1992) Proc. I.I.M. Conf., Oxford Univ. Press (with S. Sen).
[49] A class of U-statistics and asymptotic normality of the number of k-clusters (1992) J. Multivariate Analysis, 43, 300-330 (with J.K. Ghosh).
[50] The range of the infinitesimal generator of an ergodic diffusion (1993) in Statistics and Probability: A Raghu Raj Bahadur Festschrift (J.K. Ghosh et al., editors), 73-81, Wiley (with G. Basak).
[51] Random iterations of two quadratic maps (1993) in Stochastic Processes: A Festschrift for G. Kallianpur (S. Cambanis et al., editors), 13-22, Springer-Verlag (with B.V. Rao).
[52] Markov processes: asymptotic stability in distribution, central limit theorems (1993) in Probability and Statistics (S.K. Basu, B.K. Sinha, editors), Narosa Publishing House, New Delhi, 33-43.
[53] Proxy and instrumental variable methods in regression with one regressor missing (1994) J. Mult. Analysis, 47, 123-138 (joint with D.K. Bhattacharyya).
[54] Ergodicity of first order nonlinear autoregressive models (1995) J. Theor. Probab., 8, 207-219 (with C. Lee).
[55] On geometric ergodicity of nonlinear autoregressive models, Statistics and Probability Letters, 311-315 (with C. Lee).
[56] Methodology and applications (1995) in Advances in Econometrics and Quantitative Economics (G.S. Maddala and P.C.B. Phillips, eds.), 88-122, Blackwell, Oxford, U.K. (with M.L. Puri).
[57] Time scales for Gaussian approximation and its breakdown under a hierarchy of periodic spatial heterogeneities (1995) Bernoulli, 1, 81-123 (with F. Götze).
[58] Comparisons of Chisquare, Edgeworth expansions and bootstrap approximations to the distributions of the frequency Chisquare (1996) Sankhya, Ser. A, 58, 57-68 (with N.H. Chan).
[59] Asymptotics of iteration of i.i.d. symmetric stable processes (1996) Research Developments in Probability and Statistics - Madan Puri Festschrift (E. Brunner and M. Denker, eds.), 3-10 (with B.V. Rao).
[60] A hierarchy of Gaussian and non-Gaussian asymptotics of a class of Fokker-Planck equations with multiple scales (1997) Nonlinear Analysis, Theory, Methods and Applications, 30, No. 1, 257-263, Proc. 2nd World Congress of Nonlinear Analysis, Athens, Greece, Elsevier Science Ltd.
[61] Central limit theorems for diffusions: recent results, open problems and some applications, Probability and Its Applications (1997) (M.C. Bhattacharjee and S.K. Basu, eds.), 16-31, Oxford Univ. Press (with S. Sen).
[62] Phase changes with time for a class of diffusions with multiple periodic spatial scales, and applications (1997) Proc. 51st Session of the International Statistical Institute, Istanbul, Turkey.
[63] Convergence to equilibrium of random dynamical systems generated by i.i.d. monotone maps with applications to economics (1999) in Asymptotics, Nonparametrics, and Time Series: Festschrift for M.L. Puri (S. Ghosh, Editor), 713-742, Marcel Dekker, New York (with M. Majumdar).
[64] Speed of convergence to equilibrium and normality for diffusions with multiple periodic scales (1999) Stochastic Processes and Applications, 80, 55-86 (with M. Denker and A. Goswami).
[65] Multiscale diffusion processes with periodic coefficients and an application to solute transport in porous media (1999) (Special Invited Paper) Annals of Applied Probability, 9, 951-1020.
[66] On a theorem of Dubins and Freedman (1999) J. Theoretical Probab., 12, 1165-1185 (with M. Majumdar).
[67] Estimating the probability mass of unobserved support in random sampling (2000) J. Statist. Plan. and Inf., 91-106 (with A. Almudevar and C.C. Sastri).
[68] Random iteration of i.i.d. quadratic maps (2000) in Stochastics in Finite and Infinite Dimensions: In Honor of G. Kallianpur (T. Hida, R.L. Karandikar, H. Kunita, B.S. Rajput, S. Watanabe and J. Xiang, eds.), Birkhauser, 49-58 (with K.B. Athreya).
[69] Stochastic equivalence of convex ordered distributions and applications (2000) Probability in Engineering and Informational Science, 14, 33-48 (with M.C. Bhattacharjee).
[70] A class of random continued fractions with singular equilibria (2000) in Perspectives in Statistical Sciences (A.K. Basu, J.K. Ghosh, P.K. Sen and B.K. Sinha, eds.), Oxford University Press, 75-86 (with A. Goswami).
[71] On characterizing the probability of survival in a large competitive economy (2001) Review of Economic Design, 6, 133-153 (with M. Majumdar).
[72] On a class of stable random dynamical systems: Theory and applications (2001) J. Economic Theory, 96, 208-229 (with M. Majumdar).
[73] A note on the distribution of integrals of geometric Brownian motion (2001) Stat. and Probab. Letters, 55, 187-192 (with E. Thomann and E.C. Waymire).
[74] Iterated random maps and some classes of Markov processes (2001) in Handbook of Statistics, Vol. 19 (D.N. Shanbhag and C.R. Rao, eds.), Elsevier Science, 145-170 (with E.C. Waymire).
[75] Markov processes and their applications (2002) in Handbook of Stochastic Analysis and Applications (D. Kannan and V. Lakshmikantham, eds.), Marcel Dekker, 1-46.
[76] Large sample theory of intrinsic and extrinsic sample means on manifolds - I, Annals of Statistics (In Press) (with V. Patrangenaru).
[77] Phase changes with time for a class of autonomous multiscale diffusions (2002) Sankhya, Ser. A, Special Issue in Honor of D. Basu (Guest ed. A. DasGupta), 64(3), 741-762.
[78] An approach to the existence of unique invariant probabilities for Markov processes (2002) in Limit Theorems in Probability and Statistics (I. Berkes, E. Csaki, M. Csorgo, eds.), J. Bolyai Mathematical Society, Budapest (with E.C. Waymire).
[79] Phase changes with time for a class of autonomous multiscale diffusions, in Sankhya: Special issue in memory of D. Basu (To appear).
[80] Markov processes: asymptotic stability in distribution, central limit theorems (2002) in Probability and Statistics (S.K. Basu and B.K. Sinha, eds.), 33-43.
[81] Review of "Limit Theorems of Probability Theory" by V.V. Petrov (2002) Bull. Amer. Math. Soc., 34, no. 1, 85-88.
[82] Random Dynamical Systems: Theory and Applications (with M. Majumdar). To appear in the Cambridge Series in Economics, Cambridge Univ. Press.
[83] Stochastic Processes: Theory and Applications (with E. Waymire). To appear in the Graduate Texts in Mathematics Series, Springer.
Iteration of IID Random Maps on R+

K.B. Athreya¹
Iowa State University and Cornell University

Abstract

Let {X_n} be a Markov chain on R+ generated by the iteration scheme X_{n+1} = C_{n+1} X_n g_{n+1}(X_n), where {(C_n, g_n(·))} are i.i.d. such that the {C_n} are nonnegative r.v.'s with values in [0, L], L ≤ ∞, and the {g_n} are continuous functions from [0, ∞) → [0, 1] with g_n(0) = 1. This paper presents a survey of recent results on the existence of nontrivial stationary measures, Harris irreducibility and uniqueness of stationary measures, convergence and persistence. Four well-known special cases, i.e. the logistic, Ricker, Hassel and Vellekoop-Högnäs models, are discussed.
Keywords: Markov chains, IID random maps, Stationary measures, Harris irreducibility
AMS Classification: 60J05, 60F05
1 Introduction
A topic of some interest to Professor Rabi N. Bhattacharya, whom the present volume honors, and to which he has contributed substantially, is the iteration of i.i.d. random quadratic maps on the unit interval [0, 1]. Beginning with the paper Bhattacharya and Rao [7], where they analyzed the case of i.i.d. iteration of two quadratic maps using the Dubins-Freedman [9] results on random monotone maps on an interval, Professor Bhattacharya has obtained a number of interesting results on the uniqueness and support of the stationary distribution as well as on rates of convergence. For these the reader is referred to Bhattacharya and Majumdar [6] and Bhattacharya and Waymire [8]. In the present paper we study Markov chains generated by iteration of i.i.d. random maps on R+ that are restricted to the class of functions f: R+ → R+ that possess a finite, positive derivative at 0, vanish at 0, and have sublinear growth for large values. This class is of relevance and use in population ecology and growth models in economics. The conditions imposed on f in this class reflect two features common in ecological modelling, namely: for small values of the population size X_n at time n, the population size X_{n+1} at time n + 1 is approximately proportional to X_n with a random proportionality constant, while for large values of X_n, competition sets in and the linear growth is scaled down by a factor. This class includes many of the known models in the ecology literature, such as the logistic maps, Ricker maps, Hassel maps and Vellekoop-Högnäs maps, as explained in the next section. Here is an outline of the rest of the paper. In the next section we describe the basic mathematical set up and establish some results for Feller chains on R+. In section 3 we describe a set of necessary and two sets of sufficient conditions for the existence of stationary measures with support in (0, ∞).
In section 4, a trichotomy into subcritical, critical and supercritical cases is introduced and convergence results for the subcritical and critical cases are provided. Section 5
¹Research supported in part by AFOSR F 49620-01-1-0076.
is devoted to Harris irreducibility and uniqueness of the stationary measures in the supercritical case. Some open problems are indicated at the end. It is a great pleasure for the author to dedicate this paper to Professor Rabi N. Bhattacharya who has been a dear friend and a source of inspiration.
2 The mathematical framework
Let the collection F of functions f: [0, L) → [0, L), L ≤ ∞, be such that

i) f is continuous,

ii) f(0) = 0,

iii) lim_{x↓0} f(x)/x ≡ f′_+(0) exists and is positive and finite,

iv) g(x) ≡ f(x)/(f′_+(0) x) satisfies 0 < g(x) < 1 for 0 < x < L.
Let (Ω, B, P) be a probability space. Let {f_j(ω, x)}_{j≥1} be a collection of random maps from Ω × [0, ∞) → [0, ∞) that are jointly measurable, i.e. that are (B × B[0, ∞), B[0, ∞)) measurable, and for each j, f_j(ω, ·) ∈ F with probability one. Consider the random dynamical system generated by the iteration scheme:

    X_{n+1}(ω, x) = f_{n+1}(ω, X_n(ω, x)),   X_0(ω, x) = x.   (1)
Since f_j(ω, ·) ∈ F w.p.1, the model (1) reflects the two features common in ecological modelling: for small values of X_n, X_{n+1} is proportional to X_n with proportionality constant f′_{n+1}(0) ≡ C_{n+1}, say, and for large values of X_n this is reduced by the factor g(X_n). The class F includes the logistic, Ricker, Hassel and Vellekoop-Högnäs families mentioned in the introduction, as shown below.

For the logistic family, L = 1, f_c(x) = cx(1 − x), f′_c(0) = c, and g(x) = 1 − x for 0 ≤ x ≤ 1.

For the Ricker family [13], L = ∞, f_{c,d}(x) = cxe^{−dx}, f′_{c,d}(0) = c, and g(x) = e^{−dx}, 0 ≤ x < ∞.

For the Hassel family [11], L = ∞, f_{c,d}(x) = cx(1 + x)^{−d}, f′_{c,d}(0) = c, and g(x) = (1 + x)^{−d}.

For the Vellekoop-Högnäs family [14], L = ∞, f(x) = rx(h(x))^{−b}, f′(0) = r, and g(x) = (h(x))^{−b}.
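For concreteness, the four families can be written down directly in code. The sketch below is my own illustration (not from the paper); the parameter values are arbitrary, and the final loop checks numerically that each map vanishes at 0 and that f(x)/x approaches the coefficient as x ↓ 0, as required of the class F.

```python
import math

# Illustrative implementations of the four families; parameter values are arbitrary.
def logistic(c):
    # f_c(x) = c x (1 - x) on [0, 1]; here g(x) = 1 - x
    return lambda x: c * x * (1.0 - x)

def ricker(c, d):
    # f_{c,d}(x) = c x e^{-d x} on [0, infinity); here g(x) = e^{-d x}
    return lambda x: c * x * math.exp(-d * x)

def hassel(c, d):
    # f_{c,d}(x) = c x (1 + x)^{-d}; here g(x) = (1 + x)^{-d}
    return lambda x: c * x * (1.0 + x) ** (-d)

def vellekoop_hognas(r, b, h):
    # f(x) = r x (h(x))^{-b}, with h(0) = 1 and h >= 1; here g(x) = (h(x))^{-b}
    return lambda x: r * x * h(x) ** (-b)

# Each family vanishes at 0, and f(x)/x -> 2.0 (the chosen coefficient) as x -> 0:
for f in (logistic(2.0), ricker(2.0, 1.0), hassel(2.0, 1.5),
          vellekoop_hognas(2.0, 1.5, lambda x: 1.0 + x)):
    assert f(0.0) == 0.0
    assert abs(f(1e-9) / 1e-9 - 2.0) < 1e-6
```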
From now on, suppose that {f_i}_{i≥1} are i.i.d. stochastic processes. Then the sequence {X_n} defined by (1) is a Markov chain with state space S = [0, L), transition function P(x, A) = P(f_1(ω, x) ∈ A), and initial value X_0 = x. The same is true when X_0 is chosen as a random variable (with values in S) but independently of {f_i}. Further, since the f_i are continuous w.p.1, {X_n} has the Feller property:
For each bounded and continuous k: S → R, (Pk)(x) ≡ E(k(X_1) | X_0 = x) is continuous in x.

For Feller Markov chains it is known [8] that if a probability measure Γ is a weak limit point of the sequence {Γ_{n,x}(·)} of occupation measures,

    Γ_{n,x}(A) ≡ (1/n) Σ_{j=0}^{n−1} P(X_j ∈ A | X_0 = x),   (2)

then Γ is necessarily stationary for P, i.e.

    Γ(A) = ∫_S P(x, A) Γ(dx)   for all A ∈ B(S),   (3)

the Borel σ-algebra on S. The following proposition is slightly more general.
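To make (2) concrete, the occupation measure can be approximated by straight simulation. The following sketch is my own illustration, not from the paper: it iterates hypothetical random Ricker maps f_j(x) = C_j x e^{-x} with i.i.d. coefficients C_j (a uniform law, chosen arbitrarily) and tabulates the fraction of time the chain spends in a set A.

```python
import math
import random

def occupation_measure(x0, n, indicator, seed=0):
    """Approximate Gamma_{n,x}(A) = (1/n) * sum_{j=0}^{n-1} 1{X_j in A}
    for the chain X_{j+1} = C_{j+1} X_j exp(-X_j) (random Ricker maps)."""
    rng = random.Random(seed)
    x, hits = x0, 0
    for _ in range(n):
        if indicator(x):
            hits += 1
        c = rng.uniform(1.0, 6.0)   # i.i.d. coefficients C_j (illustrative law)
        x = c * x * math.exp(-x)
    return hits / n

# Fraction of time spent in A = (0.5, infinity), started from x0 = 1:
gamma = occupation_measure(1.0, 5000, lambda x: x > 0.5)
assert 0.0 < gamma <= 1.0
```

Since E ln C_1 > 0 for this choice of law, the chain persists and the occupation measure puts positive mass away from 0.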
Proposition 2.1. Let {X_n} be Feller with state space S = [0, L). Let a subprobability measure Γ(·) on S be a vague limit point of Γ_{n,X_0} for some initial r.v. X_0. Then Γ is stationary for P, i.e. it satisfies (3).

For a proof see Athreya [1]. A sufficient condition for ensuring that every vague limit point Γ of {Γ_{n,x}} is nontrivial on (0, L), i.e. satisfies Γ(0, L) > 0, is provided by the following.
Proposition 2.2. Suppose there exist a V: S ≡ [0, L) → R+, a set K ⊂ (0, L), and constants 0 < α, M < ∞ such that

i) for all x ∉ K, E(V(X_1) | X_0 = x) ≤ V(x) − α,

ii) for all x ∈ S, E(V(X_1) | X_0 = x) ≤ V(x) + M.

Then Γ(K) ≥ lim inf_n Γ_{n,x}(K) ≥ α/(α + M) > 0.
The proof is not difficult and may be found in Athreya [1].
3 Stationary Measures

In this section we present one set of necessary and two sets of sufficient conditions for the existence of a stationary probability measure π such that π(0, L) = 1 for the Markov chain (1). For proofs of these see Athreya [1].
Theorem 3.1. Let

    C_j ≡ lim_{x↓0} f_j(x)/x,   g_j(x) ≡ f_j(x)/(C_j x) for x > 0, g_j(0) ≡ 1.   (4)

Suppose there exists a probability measure π satisfying the stationarity condition (3) and the nontriviality condition π(0, L) = 1. Then the following holds:

i) E(ln C_1) > 0.
Corollary 3.1. If E ln C_1 ≤ 0 then

i) the only stationary probability measure on [0, ∞) is the delta measure at 0;

ii) for any x ≥ 0 and Borel sets A such that A ⊂ (0, L),

    lim_n Γ_{n,x}(A) = 0.
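The dichotomy behind Theorem 3.1 and Corollary 3.1 is easy to see numerically. The sketch below is my own illustration (the lognormal law for C_1 is an arbitrary choice, not from the paper): it runs a random Ricker chain once with E ln C_1 < 0 and once with E ln C_1 > 0. In the first case the trajectory collapses toward 0; in the second it persists.

```python
import math
import random

def run_ricker(log_c_mean, n=2000, x0=1.0, seed=1):
    # X_{n+1} = C_{n+1} X_n e^{-X_n}, with ln C_j ~ Normal(log_c_mean, sd 0.5)
    rng = random.Random(seed)
    x = x0
    for _ in range(n):
        c = math.exp(rng.gauss(log_c_mean, 0.5))
        x = c * x * math.exp(-x)
    return x

x_sub = run_ricker(-0.5)   # E ln C_1 < 0: subcritical, X_n -> 0 w.p.1
x_sup = run_ricker(+1.0)   # E ln C_1 > 0: supercritical, persistence is possible
assert x_sub < 1e-6 < x_sup
```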
Next we present two sets of sufficient conditions for the existence of a stationary measure π with π(0, ∞) > 0 for the Markov chain {X_n} in (1).

Theorem 3.2. Let {f_j}, {C_j}, {g_j} be as in Theorem 3.1. Let D_j(ω) ≡ sup_{x≥0} f_j(ω, x). Assume

i) k(x) ≡ −E ln g_1(x) < ∞ for all 0 < x < L,

ii) lim_{x↓0} k(x) = 0,

iii) k(·) is nondecreasing in (T, L) for some 0 ≤ T < L,

iv) E|ln C_1| < ∞ and E ln C_1 > 0,

v)-vi) two further integrability conditions involving D_1 (for the precise statements see Athreya [1]).

Then there exists a stationary measure π for {X_n} with π(0, L) > 0.

For the random Hassel maps f_1(x) = C_1 x (1 + x)^{−d_1}, D_1 = C_1 if d_1 = 1 and D_1 = ∞ if d_1 < 1. So we need P(d_1 ≥ 1) = 1. This implies D_1 ≤ C_1 w.p.1 and so v) is implied by E(ln C_1)+ < ∞, which in turn is implied by iv). Finally, |ln(1 + D_1)| ≤ ln(1 + C_1). Thus i)-vi) of Theorem 3.2 are implied by E|ln C_1| < ∞, E(ln C_1) > 0, E d_1 < ∞, P(d_1 ≥ 1) = 1.
4. Random Vellekoop-Högnäs Maps [14]. Here f_1(x) = C_1 x (h_1(x))^{−b_1}, 0 ≤ x < ∞, where 0 ≤ C_1, b_1 < ∞ and h_1(·) satisfies h_1(0) = 1, h_1(x) ≥ 1 for x ≥ 0, h_1(·) is continuously differentiable, and η_1(x) ≡ x h′_1(x)/h_1(x) is strictly increasing.

Note that this includes all three previous cases. So k(x) = E b_1 ln h_1(x). Next, to find D_1 note that the function r_1(x) = ln(x (h_1(x))^{−b_1}) satisfies

    r′_1(x) = 1/x − b_1 h′_1(x)/h_1(x) = (1/x)(1 − b_1 η_1(x)).

Since η_1(x) is strictly increasing and is zero at x = 0, r′_1(x) > 0 for 0 ≤ x < α_1, where α_1 = inf{x : η_1(x) > 1/b_1}, so D_1 = f_1(α_1). Thus, i)-vi) of Theorem 3.2 are implied by i) E b_1 ln h_1(x) < ∞ for all 0 < x < ∞; that k(x) → ∞ as x → ∞ is not unrealistic. This leads us to a second set of sufficient conditions.
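As a numerical sanity check on D_1 (my own illustration, not from the paper): take h(x) = 1 + x, so that η_1(x) = x/(1 + x) and, for b_1 > 1, solving η_1(x) = 1/b_1 gives the maximizer α_1 = 1/(b_1 − 1) in closed form; the resulting maximum r α_1 (1 + α_1)^{-b_1} can be compared against a brute-force grid search.

```python
# Assumes h(x) = 1 + x, so eta(x) = x h'(x) / h(x) = x / (1 + x).
# Solving eta(x) = 1/b gives the maximizer alpha = 1/(b - 1) for b > 1.
def sup_f(r, b):
    alpha = 1.0 / (b - 1.0)
    return r * alpha * (1.0 + alpha) ** (-b)

r, b = 3.0, 2.0
grid_max = max(r * x * (1.0 + x) ** (-b)
               for x in (i * 1e-3 for i in range(1, 100_000)))
assert abs(sup_f(r, b) - grid_max) < 1e-4   # closed form matches the grid search
```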
Theorem 3.3. Let {f_j}, {C_j}, {g_j} be as in Theorem 3.1. Suppose

i) lim_{x→0} E ln C_1 g_1(x) ≡ β_1 exists and is > 0,

ii) lim_{x→0} E(ln C_1 x g_1(x))+ = 0,

iii) lim_{x→L} E ln C_1 g_1(x) ≡ β_2 exists and is < 0,

v) k(x) ≡ E|ln C_1 g_1(x)| is bounded on [a, b] for all 0 < a < b < L.

Then there exists a stationary measure π for the Markov chain {X_n} defined by (1) satisfying π(0, L) = 1.

Corollary 3.2. In the set up of Theorem 3.3, suppose:

i) E|ln C_1| < ∞ and E ln C_1 > 0.

ii) With probability one, lim_{x↓0} g_1(x) = 1, lim_{x↑L} g_1(x) = γ_1 > 0, and there exists 0 < a such that a ≤ inf_x g_1(x) ≤ sup_x g_1(x) ≤ 1.

iii) E ln C_1 + E ln γ_1 < 0.

Then there exists a stationary π for {X_n} satisfying π(0, L) = 1.

4 Convergence results
The last section dealt with the existence of stationary measures for the Markov chain {X_n} generated by (1), or equivalently by the iteration scheme

    X_{n+1} = C_{n+1} X_n g_{n+1}(X_n),   (6)

where the pairs {(C_n, g_n(·))}_{n≥1} are i.i.d. with 0 < C_n < ∞, g_n(·) being w.p.1 a continuous function as in Theorem 3.1, and independent of X_0. The convergence questions that we consider here are:

i) the almost sure convergence of the sequence {X_n} as n → ∞, i.e. convergence of the trajectories,

ii) the convergence of {X_n} in probability, and

iii) the convergence of the distribution of {X_n}.

Since the state space of the Markov chain {X_n} is uncountable, one has to look for results from general state space Markov chain theory. There is a body of results available for the case when the chain is Harris irreducible (see Nummelin [12]). Unfortunately, many of the iterated random maps cases turn out to be not irreducible, especially among those where the collection of functions F sampled from is finite or countable. In these cases if the maps are interval maps that are monotone then the Dubins-Freedman theory [9] can be appealed to. The papers by Bhattacharya and
Majumdar [6] and Bhattacharya and Waymire [8] have nice accounts of this in the random logistic maps case. On the other hand, as shown in the next section, if the distribution of C_n is smooth, e.g. absolutely continuous, then {X_n} turns out to be (under some more hypotheses) Harris irreducible. For the random logistic case Bhattacharya and Rao [7] and Bhattacharya and Waymire [8] have some nice results under such assumptions. Motivated by Theorem 3.1, we give the following definition.

Definition: The Markov chain {X_n} of (1) or (6) is subcritical, critical, or supercritical according as E ln C_1 < 0, = 0, or > 0.

In the subcritical case, {X_n} converges to zero w.p.1. In fact, a slightly more general result holds. For the rest of this section {X_n}_{n≥0} will be as in (6).

Theorem 4.1. Suppose

    lim_n (1/n) Σ_{j=1}^n ln C_j(ω) ≡ d(ω) exists and is < 0 w.p.1.   (7)

Then

    lim sup_n (X_n(ω))^{1/n} ≤ e^{d(ω)} < 1 and hence X_n(ω) → 0 w.p.1.   (8)
Proof. Since f_j ∈ F,

X_{n+1} = C_{n+1} X_n g_{n+1}(X_n) ≤ C_{n+1} X_n ≤ ⋯ ≤ C_{n+1} C_n ⋯ C_1 X_0.

Thus

(1/n) ln X_n ≤ (1/n) ln X_0 + (1/n) Σ_{j=1}^n ln C_j.

Now (7) ⇒ (8). □
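As a numerical illustration of Theorem 4.1 (an added sketch, not part of the original text; the lognormal C_k and the logistic choice g(x) = 1 − x are hypothetical), the bound in the proof and the limits in (7)-(8) can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Toy subcritical model (hypothetical choices): ln C_k ~ N(-0.25, 0.1^2),
# so E ln C_1 = -0.25 < 0, and g(x) = 1 - x, giving a random logistic map.
logC = rng.normal(-0.25, 0.1, size=n)
xs = np.empty(n + 1)
xs[0] = 0.5
for k in range(n):
    xs[k + 1] = np.exp(logC[k]) * xs[k] * (1.0 - xs[k])

# Bound from the proof: (1/n) ln X_n <= (1/n) ln X_0 + (1/n) sum_j ln C_j.
lhs = np.log(xs[-1]) / n
rhs = np.log(xs[0]) / n + logC.mean()
assert lhs <= rhs
# Consistent with (8): X_n^(1/n) is near e^(E ln C_1) and X_n is near 0.
assert xs[-1] < 1e-6 and abs(lhs - (-0.25)) < 0.1
```

The bound holds deterministically here, since ln X_n = ln X_0 + Σ ln C_j + Σ ln(1 − X_j) and the last sum is negative.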
Corollary 4.1. If E ln C_1 < 0, then (7) and hence (8) hold, provided {C_n}_{n≥1} are i.i.d.

Remark: In this theorem the hypothesis that {C_n}_{n≥1} are independent is not needed.

The geometric decay of {X_n} can be exploited to establish the lognormality of X_n, a common hypothesis proposed in the ecology literature.

Theorem 4.2. Assume

i) g_j(·) is nonincreasing in [0, δ] w.p.1 for some δ > 0;

ii) E ln C_1 < 0, E(ln C_1)² < ∞;

iii) k(x) ≡ −E ln g_1(x) < ∞ for all x and k(·) is nondecreasing.

Then the distribution of ln X_n, suitably centered and scaled, is approximately normal.

In the critical case it can be shown that a_n ≡ P_x(X_n ≥ ε) → 0 in the Cesàro sense. A natural question is whether this can be improved to full convergence, or equivalently, does X_n → 0 in probability for all 0 < x < ∞? For the logistic case, i.e. when f_1 is a logistic map w.p.1, Athreya and Dai [3] have shown this by a comparison argument. This is extended below to the present context, assuming that w.p.1 f_1 is unimodal with a common nonrandom mode a such that f_1 is nondecreasing in [0, a] and nonincreasing in [a, ∞).

Theorem 4.3. Let E(ln C_1)⁺ < ∞ and E ln C_1 = 0. Assume further that there exists a nonrandom a in (0, ∞) such that w.p.1 f_1 is nondecreasing in [0, a] and nonincreasing in [a, ∞). Then

(12)  X_n → 0 in probability

for any initial value X_0 = x.
The proof makes use of the following.
Theorem 4.4 (Comparison Lemma). Let {f_i}_{i≥1} be i.i.d. and unimodal as in the above theorem. Let X_0 be independent of {f_i}_{i≥1}. Let {X_n}, {Y_n}, {Ỹ_n} and {Z_n}, n ≥ 0, be defined by

X_{n+1} = f_{n+1}(X_n),
Y_{n+1} = min{f_{n+1}(Y_n), a},  Y_0 = min{X_0, a},
Ỹ_{n+1} = min{f_{n+1}(Ỹ_n), a},  Ỹ_0 = a,
Z_n = min{X_n, a}.

Then for all n ≥ 0, Ỹ_n ≥ Y_n ≥ Z_n w.p.1.

Proof. Since Y_0 ≤ Ỹ_0 = a and f_1 is nondecreasing in [0, a], f_1(Y_0) ≤ f_1(Ỹ_0), implying Y_1 = min(f_1(Y_0), a) ≤ min(f_1(Ỹ_0), a) ≡ Ỹ_1. Now induction yields Ỹ_n ≥ Y_n for all n. If X_0 ≤ a, then Y_0 = X_0 and so f_1(Y_0) = f_1(X_0) = X_1, implying Y_1 = min{f_1(Y_0), a} = min{X_1, a} = Z_1. If X_0 > a, then Y_0 = a, so f_1(Y_0) = f_1(a) ≥ f_1(X_0) = X_1, implying Y_1 = min{f_1(Y_0), a} ≥ min{X_1, a} = Z_1. Thus Y_1 ≥ Z_1. Induction yields Y_n ≥ Z_n for all n. □
Remark: This comparison lemma does not require any conditions on E ln C_1.

Corollary 4.2. For any 0 < ε, … (arguing as in the proof of Theorem 3.1); thus μ_n([ε, ∞)) → 0, implying (13).

A natural question prompted by Theorem 4.3 is whether, in the critical case, the convergence of X_n to zero in probability could be strengthened to convergence w.p.1. Athreya and Schuh [5] showed that in the logistic case this is not possible.

Theorem 4.5. Let E ln C_1 = 0, P(C_1 = 1) < 1 and γ ≡ sup{x : P(C_1 < x) < 1}. Then:

…; if γ > 2, i.e. P(C_1 > 2) > 0, then P_x(lim sup_n X_n ≥ 1 − …) …;

iv) for any initial value X_0, the empirical distribution of {X_j : 1 ≤ j ≤ n} converges weakly to δ_0 w.p.1.

Remark: The above result has an interesting interpretation. In the critical case, even though for large n the population size X_n is small with high probability, the population does not die out. Indeed, w.p.1 the trajectory of X_n rises to heights β and beyond again and again. This may be referred to as the persistence of the critical logistic process.
5 Harris irreducibility

A Markov chain {X_n} with a measurable state space (S, 𝒮) and transition function P(·, ·) is Harris irreducible with reference measure φ if for every x ∈ S, φ(A) > 0 ⇒ P(X_n ∈ A for some n ≥ 1 | X_0 = x) > 0. Here φ is assumed to be a σ-finite nonzero measure. In this section we find sufficient conditions for Harris irreducibility of Markov chains on S ⊂ R⁺ generated by the iteration of maps of the form f(x) = θh(x), where h(·) is a continuous function. All the results of this section are from Athreya [2], where the reader will find full details.

Let S = [0, L], L ≤ ∞, Θ = [0, k], k ≤ ∞, and let h : S → [0, ∞) be continuous and strictly positive on (0, L). Let {θ_i}_{i≥1} be i.i.d. random variables with values in [0, k]. Let {X_n}_{n≥0} be the Markov chain defined by

(14)  X_{n+1} = θ_{n+1} h(X_n),  n ≥ 0,

where X_0 is independent of {θ_i}. It is assumed here that for all θ in [0, k], θh(x) ∈ S = [0, L]. The following provides a sufficient condition for Harris irreducibility of {X_n}.

Theorem 5.1. Suppose:

i) ∃ 0 < α < k, δ > 0 and a strictly positive Borel function ψ on J ≡ (α − δ, α + δ) such that Q(B) ≡ P(θ_1 ∈ B) ≥ ∫_B ψ(θ) m(dθ) for all Borel B ⊂ J, where m(·) is Lebesgue measure;

ii) … (the existence of a periodic point for the map f(·, α) = αh(·), with an associated interval I; cf. the Remark below).

Then (a): … P(X_m ∈ A | X_0 = x) > 0.

If, in addition to i) and ii), the following holds:

iii) ∀ 0 < x < L, ∃ a finite set {α_1, α_2, …, α_n} contained in the support of Q(·) = P(θ_1 ∈ ·) such that Y_n ∈ I, where

Y_{j+1} = f(Y_j, α_{j+1}),  Y_0 = x,  j = 0, 1, 2, …, n − 1,

then (b): {X_n} is Harris irreducible on (0, L) with reference measure φ(·) ≡ m(· ∩ I).
Remark: Condition i) is a smoothness hypothesis on the distribution of θ_1. Without it, one can construct examples where the chain is not Harris irreducible. For example, if θ_1 has finite support and {X_n} admits a stationary distribution π that is nonatomic, then the chain cannot be Harris irreducible: for any initial value x the distribution of X_n is discrete and hence cannot converge to π in the Cesàro sense and in variation norm, whereas Harris irreducibility and the existence of a stationary distribution π would imply such convergence. Condition ii) is the existence of a periodic point. The first conclusion (a) is a local irreducibility result, while (b) is a global irreducibility result.

The next result exploits the fact that a sufficient condition for iii) of Theorem 5.1 to hold, in the case when h(·) is S-unimodal on [0, 1] (see the definition below), is that the pair (p, α) be such that p is a stable periodic point for the map f(·, α) = αh(·).

Definition: A map f : [0, 1] → [0, 1] is called S-unimodal if

i) f is three times continuously differentiable;

ii) f is unimodal with a mode at c in (0, 1) such that f″(c) < 0, and f is strictly increasing in (0, c) and strictly decreasing in (c, 1);

iii) f(0) = f(1) = 0; and

iv) the Schwarzian derivative of f,

(Sf)(x) = f‴(x)/f′(x) − (3/2)[f″(x)/f′(x)]²  if f′(x) ≠ 0,  and (Sf)(x) = −∞ if f′(x) = 0,

is < 0 for all 0 < x < 1.
Examples of S-unimodal maps are f(x) = x(1 − x) and f(x) = x² sin πx.
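For the logistic family f(x) = θx(1 − x), the Schwarzian derivative is negative in closed form, since f′(x) = θ(1 − 2x), f″(x) = −2θ and f‴(x) = 0 give (Sf)(x) = −6/(1 − 2x)². A small numerical check (illustrative code added here, not part of the original text):

```python
import numpy as np

def schwarzian_logistic(x, theta=1.0):
    # Closed-form derivatives of f(x) = theta * x * (1 - x):
    # f'(x) = theta*(1 - 2x), f''(x) = -2*theta, f'''(x) = 0, so
    # (Sf)(x) = f'''/f' - (3/2)*(f''/f')^2 = -(3/2)*(f''/f')^2.
    fprime = theta * (1.0 - 2.0 * x)
    fsecond = -2.0 * theta
    return -1.5 * (fsecond / fprime) ** 2

# (Sf)(x) < 0 on (0, 1) away from the critical point x = 1/2.
grid = np.concatenate([np.linspace(0.01, 0.49, 50), np.linspace(0.51, 0.99, 50)])
assert np.all(schwarzian_logistic(grid, theta=1.0) < 0.0)
```

Note that the value −6/(1 − 2x)² does not depend on θ, so the whole logistic family satisfies condition iv) at once.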
A result of Guckenheimer [10] is that if f(·) is S-unimodal with a stable periodic point p, i.e. for some m ≥ 1, f^{(m)}(p) = p and |f^{(m)}′(p)| < 1, then for almost all x in (0, 1) (with respect to Lebesgue measure) the limit point set ω(x) of the orbit O_x ≡ {f^{(n)}(x) : n ≥ 0} of x under f coincides with the orbit γ(p) of p under f, i.e. the set {p, f(p), …, f^{(m−1)}(p)}.

Theorem 5.2. Let S = [0, 1] and Θ = [0, k], with θh(x) ∈ S for each θ ∈ Θ. Suppose:

i) h(·) is S-unimodal;

ii) ∃ (p, α) ∈ S × Θ such that for some m ≥ 1, f^{(m)}(p, α) = p and |f^{(m)}′(p, α)| < 1 (i.e. p is a stable periodic point of f(·, α));

iii) ∃ δ > 0 and a strictly positive function ψ on J ≡ (α − δ, α + δ), a subset of Θ, such that for all B ⊂ J, Q(B) ≡ P(θ_i ∈ B) ≥ ∫_B ψ(θ) m(dθ), where {θ_i}_{i≥1} are i.i.d. random variables with values in Θ and m(·) is Lebesgue measure;

iv) X_{n+1} = θ_{n+1} h(X_n), n ≥ 0, where X_0 is independent of {θ_i}_{i≥1} and takes values in (0, 1).

Then {X_n} is Harris irreducible.
A special case of the above is that of i.i.d. random logistic maps.

Theorem 5.3. Let S = [0, 1], Θ = [0, 4], X_{n+1} = θ_{n+1} X_n (1 − X_n), with {θ_n}_{n≥1} i.i.d. random variables with values in [0, 4] and X_0 an independent random variable with values in [0, 1]. Suppose ∃ an open interval J ⊂ (0, 4) and a strictly positive function ψ on J such that for all B ⊂ J,

Q(B) = P(θ_i ∈ B) ≥ ∫_B ψ(θ) m(dθ),

where m(·) is Lebesgue measure. If J ∩ (1, 4) = ∅, then assume in addition that there exists β > 1 in the support of Q(·) such that the map f(x, β) = βx(1 − x) admits a stable periodic point p in (0, 1). Then {X_n} is Harris irreducible. Suppose further that E ln C_1 > 0 and E|ln(1 − C_1/4)| < ∞. Then there exists a unique ergodic absolutely continuous stationary measure π such that the occupation measure converges to π in total variation norm.

Corollary 5.1. In the setup of Theorem 5.3, suppose θ_1 has the uniform [0, 4] distribution. Then ∃ a unique absolutely continuous stationary probability π such that π(0, 1) = 1 and, for any 0 < x < 1, ‖P_x(X_n ∈ ·) − π(·)‖ → 0, where ‖·‖ is the total variation norm.
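As an illustration of the setting of Corollary 5.1 (a simulation sketch added here; the run length, seed and starting point are arbitrary choices), one can iterate the random logistic map with uniform [0, 4] coefficients and examine the occupation measure of a trajectory:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_logistic(x0, n, rng):
    """Iterate X_{k+1} = theta_{k+1} X_k (1 - X_k), theta_k i.i.d. uniform [0, 4]."""
    x = np.empty(n + 1)
    x[0] = x0
    theta = rng.uniform(0.0, 4.0, size=n)
    for k in range(n):
        x[k + 1] = theta[k] * x[k] * (1.0 - x[k])
    return x

traj = random_logistic(0.3, 50_000, rng)

# The chain stays in [0, 1]; here E ln theta_1 = ln 4 - 1 > 0 (supercritical),
# so a histogram of the trajectory approximates the stationary pi on (0, 1).
assert np.all((traj >= 0.0) & (traj <= 1.0))
occupation, _ = np.histogram(traj, bins=20, range=(0.0, 1.0), density=True)
```

With θ_1 uniform on [0, 4], E ln θ_1 = ln 4 − 1 > 0, so the corollary's stationary π exists and the simulated occupation measure visits all of (0, 1), in keeping with Harris irreducibility.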
6 Some open questions

1) Persistence in the critical case. Extend the Athreya-Schuh [5] results to the present more general setting.

2) Nonuniqueness. Extend the nonuniqueness result of Athreya and Dai [4] for the logistic case to the present setting.

3) The condition E|ln(1 − C_1/4)| < ∞. For the random logistic case, this is a sufficient condition for the existence of a nontrivial stationary measure in the supercritical case. However, von Neumann and Ulam [15] showed that if P(C_1 = 4) = 1 then the arcsine law is the unique ergodic absolutely continuous stationary distribution. It is worth investigating whether this condition could be dropped.

4) The lognormal limit law in the critical case. It has been shown here that in the subcritical case the distribution of ln X_n is approximately normal. Extend this to the critical case.

5) Statistical inference. Suppose the sequence {X_j} is observed for 0 ≤ j ≤ n. Can one estimate the distribution of C_1 and g_1(·)?
K. B. Athreya
School of ORIE, Cornell University, Ithaca, NY 14853
[email protected]
Departments of Mathematics and Statistics, Iowa State University, Ames, IA 50011
[email protected]

Bibliography

[1] Athreya, K. B. (2002a): Stationary measures for some Markov chain models in ecology and economics. Technical Report, School of ORIE, Cornell University (to appear in Economic Theory).
[2] Athreya, K. B. (2002b): Harris irreducibility of iterates of IID maps on R+. Technical Report, School of ORIE, Cornell University (submitted).
[3] Athreya, K. B. and Dai, J. (2000): Random logistic maps I. Journal of Theoretical Probability 13, No. 2, 595-608.
[4] Athreya, K. B. and Dai, J. (2002): On the nonuniqueness of invariant probability measure for random logistic maps. Annals of Probability 30, No. 1, 437-442.
[5] Athreya, K. B. and Schuh, H. J. (2001): Random logistic maps II, the critical case. Technical Report, School of ORIE, Cornell University (to appear in Journal of Theoretical Probability).
[6] Bhattacharya, R. N. and Majumdar, M. (1999): On a theorem of Dubins and Freedman. Journal of Theoretical Probability 12, 1165-1185.
[7] Bhattacharya, R. N. and Rao, B. V. (1993): Random iterations of two quadratic maps. In: Stochastic Processes: A Festschrift in Honour of G. Kallianpur (Cambanis et al., eds.), Springer-Verlag.
[8] Bhattacharya, R. N. and Waymire, E. (2002): An approach to the existence of unique invariant probabilities for Markov processes. In: Limit Theorems in Probability and Statistics (I. Berkes, E. Csaki, M. Csorgo, eds.), J. Bolyai Mathematical Society, Budapest.
[9] Dubins, L. E. and Freedman, D. A. (1966): Invariant probability measures for certain Markov processes. Annals of Mathematical Statistics 37, 837-848.
[10] Guckenheimer, J. (1987): Limit sets of S-unimodal maps with zero entropy. Comm. Math. Phys. 110, 655-659.
[11] Hassell, M. P. (1974): Density-dependence in single-species populations. Journal of Animal Ecology 44, 283-296.
[12] Nummelin, E. (1984): General Irreducible Markov Chains and Non-negative Operators. Cambridge University Press.
[13] Ricker, W. E. (1954): Stock and recruitment. Journal of the Fisheries Research Board of Canada 11, 559-623.
[14] Vellekoop, M. H. and Högnäs, G. (1997): Stability of stochastic population models. Studia Scientiarum Mathematicarum Hungarica 33, 459-476.
[15] von Neumann, J. and Ulam, S. (1947): On combination of stochastic and deterministic processes. Bulletin of the American Mathematical Society 53, 1120.
Adaptive Estimation of Directional Trend

Rudolf Beran¹
University of California, Davis

Abstract

Consider a one-way layout with one directional observation per factor level. Each observed direction is a unit vector in R^p measured with random error. Information accompanying the measurements suggests that the mean directions, normalized to unit length, follow a trend: the factor levels are ordinal and mean directions at nearby factor levels may be close. Measured positions of the paleomagnetic north pole in time illustrate this design. The directional trend estimators studied in this paper stem from penalized least squares (PLS) fits in which the penalty function is the squared norm of first-order or second-order differences of mean vectors at adjacent factor levels. Expressed in spectral form, such PLS estimators suggest a much larger class of monotone shrinkage estimators that use the orthogonal basis implicit in PLS. Penalty weights and, more generally, monotone shrinkage factors are selected to minimize estimated risk. The possibly large risk reduction achieved by such adaptive monotone shrinkage estimators reflects the economy of the PLS orthogonal basis in representing the actual trend and the flexibility of unconstrained monotone shrinkage.

AMS classification: 62H11, 62J99
Keywords: directional data, penalized least squares, monotone shrinkage, economical basis, risk estimation, superefficiency, symmetric linear smoother
1 Introduction

Consider n independent measurements taken successively in time on the varying position of the earth's north magnetic pole. Each measured position may be represented as a unit vector in R³ that gives the direction from the center of the earth to the north magnetic pole. Because of measurement errors, it is plausible to model the data as a realization of independent random unit vectors {y_i : 1 ≤ i ≤ n} whose mean vectors are η_i = E(y_i). The subscript i labels time. The mean direction of y_i is defined to be the unit vector μ_i = η_i/|η_i|. In this example, the mean directions {μ_i} follow a trend, by which we mean that the subscript order matters and that the distance between μ_i and μ_j may be relatively small when i is close to j.

The naive estimator of μ_i is μ̂_{N,i} = y_i. It can be derived as the maximum likelihood estimator of μ_i when the distribution of y_i is Fisher-Langevin with mean direction μ_i and precision parameter κ. Unless measurement error is very small, {μ̂_{N,i}} is not a satisfactory estimator of the directional trend {μ_i}. If we foresee that the trend in the means {η_i} may possess some degree of smoothness, not known to us, it is natural to look for more efficient estimators within classes of smoothers. In an instructive data analysis, Irving (1977) suggested forming local symmetric weighted averages of the {y_i}, normalizing these to unit length so as to obtain a more revealing estimator of directional trend.

¹This research was supported in part by National Science Foundation Grant DMS 99-70266.

A symmetric weighted average is a particular symmetric linear smoother in the sense of Buja, Hastie, and Tibshirani (1989). We may consider any large class of symmetric linear smoothers as candidate estimators for the mean vectors {η_i} and then proceed as follows: (a) estimate the quadratic risk of each such estimator without assuming any smoothness in the sequence of unknown mean vectors; (b) choose the candidate estimator that minimizes estimated risk; (c) normalize the estimated mean vectors to unit length so as to estimate the directional trend {μ_i}.

The candidate symmetric linear smoothers treated in this paper generalize certain penalized least squares (PLS) trend estimators. Details of the methodology are presented in Sections 2 and 3. Computational experiments reported in Sections 2 and 4 bring out how the proposed estimators reduce risk through constructive interplay between basis economy and unconstrained monotone shrinkage. Asymptotic theory developed in Section 5 supports key details of the methodology and quantifies how basis economy reduces risk.

Other directional trend estimators proposed by Watson (1985), Fisher and Lewis (1985), and Jupp and Kent (1987) rely on analogs of cubic-spline or kernel methods for curve smoothing in Euclidean spaces. Tacit in these treatments are assumptions on the smoothness of the unknown trend. The methods of this paper assume no smoothness in the unknown directional trend but take advantage of any smoothness present to reduce estimation risk.
2 Construction of Estimators

As in the north magnetic pole example, choose subscripts so that y_i is the directional observation associated with the i-th smallest factor level. The directional trend estimators in this paper stem from penalized least squares (PLS) estimators for the mean vectors {η_i : 1 ≤ i ≤ n} in which the penalty function is the squared norm of first-order or second-order differences of the {η_i}. It will be convenient in the exposition to suppose that the measured directions are unit vectors in R^p. The practically important spherical and circular cases correspond to p = 3 and p = 2, respectively.

2.1 Candidate Estimator Classes

The n × p data matrix formed from the observed unit vectors {y_i : 1 ≤ i ≤ n} in R^p is

(1)  Y = (y_1, y_2, …, y_n)′.

Here y_{(j)} denotes the j-th column of Y. The analogously organized matrix of mean vectors is then

(2)  H = E(Y) = (η_1, η_2, …, η_n)′.
First-order Penalized Least Squares. Let |·| denote the Euclidean matrix norm, so that |A|² = tr(AA′) = tr(A′A). For any r × n matrix B, where r ≤ n, define

(3)  d(H, B, γ) = |Y − H|² + γ|BH|².

Let D be the (n − 1) × n first-difference matrix

(4)  D =
     [ -1   1   0  ...   0   0 ]
     [  0  -1   1  ...   0   0 ]
     [          ...            ]
     [  0   0   0  ...  -1   1 ]

The first-order PLS candidate estimators for H are the one-parameter family of symmetric linear smoothers {Ĥ_D(γ) : γ ≥ 0}, where

(5)  Ĥ_D(γ) = argmin_H d(H, D, γ) = (I + γD′D)⁻¹ Y.

For positive γ, the estimated means in Ĥ_D(γ) are more nearly constant in i than the measured directions y_i. A spectral decomposition of D′D is U_D Λ_D U_D′, where U_D is an orthogonal matrix, Λ_D = diag{λ_{D,n−i+1}}, and λ_{D,1} ≥ … ≥ λ_{D,n} = 0. The eigenvectors that form the columns of U_D are ordered so that the successive diagonal elements of Λ_D are nondecreasing. It follows from (5) that

(6)  Ĥ_D(γ) = U_D F_D(γ) U_D′ Y,  where F_D(γ) = (I + γΛ_D)⁻¹.

Let f_D(γ) = {f_{D,i}(γ) : 1 ≤ i ≤ n} denote the diagonal vector of the diagonal matrix F_D(γ). Evidently 1 ≥ f_{D,1}(γ) ≥ f_{D,2}(γ) ≥ … ≥ f_{D,n}(γ) ≥ 0. Formula (6) plus modern algorithms for spectral decomposition provide a numerically stable method for computing the first-order PLS candidate estimators {Ĥ_D(γ) : γ > 0}. Other computational methods for Ĥ_D(γ) are discussed in Press, Teukolsky, Vetterling and Flannery (1992), Section 18.5.

Second-order Penalized Least Squares. Let E be the (n − 2) × n second-difference matrix
(7)  E =
     [ -1   2  -1   0  ...   0   0 ]
     [  0  -1   2  -1  ...   0   0 ]
     [             ...              ]
     [  0   0   0  ...  -1   2  -1 ]

The second-order PLS candidate estimators for H are the one-parameter family of symmetric linear smoothers {Ĥ_E(γ) : γ ≥ 0}, where

(8)  Ĥ_E(γ) = argmin_H d(H, E, γ) = (I + γE′E)⁻¹ Y.

For positive γ, the estimated means in Ĥ_E(γ) are more nearly linear in i than the measured directions y_i. Replacing D in (6) with E yields a computationally useful alternative formula for Ĥ_E(γ). Here too, the diagonal elements of the matrix F_E(γ) = diag{f_E(γ)} satisfy 1 ≥ f_{E,1}(γ) ≥ f_{E,2}(γ) ≥ … ≥ f_{E,n}(γ) ≥ 0.
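The spectral form of these PLS fits is straightforward to implement with a standard eigendecomposition. The sketch below (illustrative code added here, not part of the original paper) checks numerically that the spectral formula (6) agrees with the direct solve in (5); the second-order analog simply replaces the first-difference matrix:

```python
import numpy as np

def pls_fit(Y, gamma):
    """First-order PLS fit H_D(gamma) = (I + gamma D'D)^{-1} Y, computed both
    directly and via the spectral form U_D F_D(gamma) U_D' Y."""
    n = Y.shape[0]
    D = np.diff(np.eye(n), axis=0)          # (n-1) x n first-difference matrix
    direct = np.linalg.solve(np.eye(n) + gamma * (D.T @ D), Y)

    lam, U = np.linalg.eigh(D.T @ D)        # eigenvalues in nondecreasing order
    f = 1.0 / (1.0 + gamma * lam)           # shrinkage factors f_D(gamma)
    spectral = U @ (f[:, None] * (U.T @ Y))
    return direct, spectral, f

rng = np.random.default_rng(2)
Y = rng.normal(size=(12, 3))                # stand-in data matrix, 12 x 3
direct, spectral, f = pls_fit(Y, gamma=5.0)

assert np.allclose(direct, spectral)
# f_D(gamma) starts at 1 (zero eigenvalue of D'D) and is nonincreasing in [0, 1].
assert np.isclose(f[0], 1.0) and np.all(np.diff(f) <= 1e-12)
```

The ascending eigenvalue order returned by the eigendecomposition matches the ordering convention adopted after (5), so the computed shrinkage vector is automatically monotone.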
Monotone Shrinkage Smoothers. Abstracting the structure in formula (6) suggests larger families of candidate symmetric linear estimators for H. Let

(9)  F_Mon = {f ∈ Rⁿ : 1 ≥ f_1 ≥ f_2 ≥ … ≥ f_n ≥ 0}

and let F = diag{f}. The class of monotone shrinkage candidate estimators for H associated with a specified orthogonal matrix U is

(10)  C_Mon(U) = {Ĥ(f, U) : f ∈ F_Mon}  with  Ĥ(f, U) = UFU′Y.

Evidently, the first-order PLS candidate estimators are a proper subset of C_Mon(U_D) in which the shrinkage vectors are restricted to {f_D(γ) : γ > 0}. Similarly, the second-order PLS candidate estimators are a proper subset of C_Mon(U_E) in which the shrinkage vectors are restricted to {f_E(γ) : γ > 0}. The development in this paper emphasizes superefficient estimators of directional trend constructed from the candidate classes C_Mon(U_D) and C_Mon(U_E) rather than from their PLS subsets. Enlarging the class of candidates can only decrease the risk of the best candidate, and proves to be advantageous computationally.
2.2 Choice of Candidate Estimator and Normalization

If the risk function were known, it would be reasonable to choose the candidate monotone shrinkage estimator of H that minimizes risk, using a quadratic loss function for algebraic tractability. Because the risk is not known, this oracle estimator is not realizable. Instead, we will select the candidate monotone shrinkage estimator that minimizes estimated risk and verify that the asymptotic performance of this estimator matches that of the oracle estimator.

For the risk calculations, we will assume that the measured directions {y_i : 1 ≤ i ≤ n} are independent column vectors in R^p, each having unit length. The distribution of y_i is Fisher-Langevin with mean direction μ_i, a unit vector, and precision κ > 0. Properties of this probability model are developed in Watson (1983) and Mardia and Jupp (2000) and are summarized in Fisher, Lewis and Embleton (1987).

Let Z = U′Y and write, in analogy to (1),

(11)  Z = (z_1, z_2, …, z_n)′.

For any vector h, let ave(h) denote the average of the components of h. Section 3 develops an estimator of the quadratic risk of Ĥ(f, U) that is uniformly consistent over all f ∈ F_Mon as κ and n tend to infinity:

(12)  ρ̂(f, U) = ave[κ̂⁻¹ q f² + (z² − κ̂⁻¹ q)(1 − f)²]  with  z² = Σ_{j=1}^p z²_{(j)}.

Here q = p − 1, while κ̂⁻¹ is a suitably consistent estimator of the dispersion κ⁻¹. Section 3.2 offers one possible construction of κ̂⁻¹. For a specified orthogonal matrix U, define the adaptive monotone shrinkage estimator of H to be

(13)  Ĥ_Mon(U) = Ĥ(f̂_Mon(U), U)  with  f̂_Mon(U) = argmin_{f ∈ F_Mon} ρ̂(f, U).
The first- and second-order monotone shrinkage estimators Ĥ_Mon(1) and Ĥ_Mon(2) are specific instances of (13) with U = U_D and U = U_E respectively. In the notation that follows (6), the adaptive first-order PLS estimator of H is defined to be

(14)  Ĥ_PLS(1) = Ĥ(f_D(γ̂_D), U_D)  with  γ̂_D = argmin_{γ>0} ρ̂(f_D(γ), U_D).

Replacing the first-difference matrix D in (14) with the second-difference matrix E defines the second-order PLS estimator Ĥ_PLS(2) of H. Normalizing to unit length the rows of these respective estimators of H yields the monotone shrinkage estimators M̂_Mon(k) and the PLS estimators M̂_PLS(k) of the directional trend {μ_i : 1 ≤ i ≤ n}.

[Figure 1a. Competing fits to time-ordered measured positions of the paleomagnetic north pole. Panels: Data and Naive Fit; First-order PLS Fit; First-order Monotone Fit; Second-order PLS Fit; Second-order Monotone Fit. Linear interpolation in the top subplot shows the time-sequence of the observed directions.]

A Paleomagnetic Example. The directional data fitted in Figure 1a consist of n = 26 measured positions of the paleomagnetic north pole taken from rock specimens at various sites in Antarctica of various ages. Kent and Jupp (1987, pp. 42-45) give the data and its provenance. Each subplot uses the Schmidt net, an area-preserving projection of the northern hemisphere onto the plane (cf. Section 4.2). The perimeter of each circle represents the equator while the center corresponds to the geographical north pole. Linear interpolation between successive mean directions or estimated mean directions is used to indicate the time sequence.

The subplot in the first row exhibits the measured directions, which coincide with the naive trend estimator. Even with linear interpolation between successive observations, it is difficult to see a pattern, especially in the most recent observations near the geographic north pole. Cells (2,1) and (2,2) display the first-order estimates M̂_PLS(1) and M̂_Mon(1), while cells (3,1) and (3,2) exhibit the second-order estimates M̂_PLS(2) and M̂_Mon(2). Both monotone shrinkage fits and the second-order PLS fit are similar in appearance. Which should we
use? On the basis of estimated risks and diagnostic plots, we will argue in Section 4.1 that the best of the competing estimates for this data is M̂_Mon(1). Next best, though with substantially larger estimated risks, are M̂_Mon(2) and M̂_PLS(2), in that order.

In their analysis of the data, Kent and Jupp (1987, Figs. 1 and 2) unwrapped the sphere and data onto a plane, used a cubic spline fit to the planar data, then wrapped this fit back onto the sphere. Their spline fit on the sphere is similar to M̂_PLS but has a surprising kink in the tail near the left edge of the Schmidt net plot. They noted that this kink lacks physical significance and is an artifact of the spline-fitting technique. That the three estimates M̂_Mon(1), M̂_PLS(2) and M̂_Mon(2) agree in broad visual features with the Kent-Jupp spline estimate but lack its suspect kink is a point in favor of the shrinkage estimates.

Section 3 treats estimation of the dispersion κ⁻¹, risk estimation, and computational algorithms. Diagnostic plots for competing PLS or monotone shrinkage estimators and computational experiments are the subject of Section 4. Asymptotic theory in Section 5 brings out three important properties. First, adaptation works for the PLS and monotone shrinkage candidate estimators in the sense that minimizing estimated risk also minimizes risk asymptotically. Second, the asymptotic risk of the estimators M̂_Mon(k) and M̂_PLS(k) never exceeds that of the naive estimator and can be much smaller. Third, for greatest superefficiency of these estimators, the projection of the mean vectors {η_i} on the first few columns of U should yield an accurate approximation to the {η_i}. A diagnostic plot is available for identifying this favorable situation.
3 Estimated Risks and Algorithms

This section motivates the risk estimator ρ̂(f, U) defined in (12) and discusses methods for computing the directional trend estimators M̂_Mon(k) and M̂_PLS(k) defined above.
3.1 Estimating Risks

We suppose in our analysis that the directions {y_i : 1 ≤ i ≤ n} are independent unit random vectors in R^p. The distribution of y_i is Fisher-Langevin(μ_i, κ). As κ tends to infinity, it is known that, for q = p − 1,

(15)  η_i = E(y_i) = [1 − (2κ)⁻¹ q] μ_i + o(κ⁻¹),   Cov(y_i) = κ⁻¹(I − μ_i μ_i′) + o(κ⁻¹),

and that

(16)  κ^{1/2}(y_i − η_i) ⇒ N(0, I − μ_i μ_i′)

(see Watson (1983), Chapter 4). The limiting normal distribution on the right side of (16) is singular, supported on the q-dimensional subspace orthogonal to μ_i. From (15) and the independence of the rows in Y,

(17)  E(y_{(j)}) = η_{(j)},   Cov(y_{(j)}) = diag{σ²_{ij} : 1 ≤ i ≤ n},

where σ²_{ij} = κ⁻¹(1 − μ²_{ij}) + o(κ⁻¹) and μ_{ij} denotes the j-th component of the unit vector μ_i. Hence,

(18)  E Σ_{j=1}^p y_{(j)} y_{(j)}′ = Σ_{j=1}^p η_{(j)} η_{(j)}′ + κ⁻¹ q I + o(κ⁻¹)

as κ tends to infinity.

The performance of any directional trend estimator {μ̂_i = η̂_i/|η̂_i|} will be measured indirectly through the normalized quadratic loss

(19)  L_n(Ĥ, H) = n⁻¹|Ĥ − H|² = n⁻¹ Σ_{i=1}^n |η̂_i − η_i|² = n⁻¹ Σ_{j=1}^p |η̂_{(j)} − η_{(j)}|²,

which compares the {η̂_i} with the means {η_i}. This loss function leads to tractable formulae for risk. Tacit in our use of this loss is the supposition that a good estimator of the trend in the means {η_i} will map, by normalization to unit length, into a good estimator of the directional trend {μ_i}. Experiments reported in Section 4 offer empirical support for this supposition.

For a specified orthogonal matrix U, let Z = U′Y as in (11), Ξ = E(Z) = U′H and Ξ̂(f, U) = U′Ĥ(f, U) = FZ. By analogy with equation (11),

(20)  Ξ = (ξ_1, ξ_2, …, ξ_n)′.

From (18),

(21)  E Σ_{j=1}^p z_{(j)} z_{(j)}′ = Σ_{j=1}^p ξ_{(j)} ξ_{(j)}′ + κ⁻¹ q I + o(κ⁻¹)

as κ tends to infinity, because z_{(j)} = U′y_{(j)} and ξ_{(j)} = U′η_{(j)}. Under loss (19) and the Fisher-Langevin model, the risk of the candidate estimator Ĥ(f, U) is

(22)  R_n(Ĥ(f, U), H, κ) = n⁻¹ E|Ĥ(f, U) − H|² = n⁻¹ E|FZ − Ξ|² = R_n(Ξ̂(f, U), Ξ, κ).

Let

(23)  ρ(f, ξ², κ) = ave[κ⁻¹ q f² + ξ²(1 − f)²]   with   ξ² = Σ_{j=1}^p ξ²_{(j)}.

Here all operations on vectors are performed componentwise, as in the S language (cf. Becker and Chambers (1984)). Applying (21) to (22) yields

(24)  R_n(Ĥ(f, U), H, κ) = n⁻¹ Σ_{j=1}^p E|Fz_{(j)} − ξ_{(j)}|² = n⁻¹ Σ_{j=1}^p tr[F² Cov(z_{(j)}) + (I − F)² ξ_{(j)} ξ_{(j)}′] = ρ(f, ξ², κ) + o(κ⁻¹)

as κ tends to infinity. We argue in Section 5 that, for any choice of the orthogonal matrix U, the asymptotic risk of the adaptive estimator Ĥ_Mon(U) equals the minimum of ρ(f, ξ², κ) over f ∈ F_Mon. This entails that the asymptotic risk of Ĥ_Mon(U) matches that of the oracle estimator, which is the monotone shrinkage estimator that minimizes actual risk. In particular, this asymptotic risk cannot exceed κ⁻¹q, the asymptotic risk of the naive trend estimator, and is often much smaller.

To estimate the risk function in (22), it suffices for large κ to estimate the function ρ(f, ξ², κ). It follows from (21) and the definition of z² in (12) that

(27)  E ave[z²(1 − f)²] = n⁻¹ E tr[(I − F)² Σ_{j=1}^p z_{(j)} z_{(j)}′] = n⁻¹ tr[(I − F)²{Σ_{j=1}^p ξ_{(j)} ξ_{(j)}′ + κ⁻¹ q I}] + o(κ⁻¹) = ave[(ξ² + κ⁻¹ q)(1 − f)²] + o(κ⁻¹).

This calculation, (23), and (24) motivate estimating the risk of Ĥ(f, U) by the function ρ̂(f, U) defined in (12). Section 5 gives this risk estimator much stronger theoretical support.

Scrutiny of formulae (23) and (24) throws light on the ideal choice of the orthogonal basis matrix U. We say that the basis provided by the columns of U is economical in representing Ξ if all but the first few components of ξ² are very nearly zero. In that case, setting the first few components of f equal to one and the remaining components to zero yields a monotone shrinkage candidate estimator of H whose risk, for large κ, is much smaller than that of the naive estimator Y.

3.2 Estimating Dispersion

A simple first-difference estimator of the dispersion κ⁻¹ may be constructed from the norms of first differences among the observed directions {y_i}. The asymptotic approximations below indicate that the bias of this estimator is modest if the norm of the first differences among the mean vectors {η_i} is relatively small. When this is not the case, analogous estimators of dispersion can be constructed from norms of higher-order differences. The first-difference dispersion estimator is

(28)  κ̂⁻¹ = (2q)⁻¹ n⁻¹ Σ_{i=2}^n |y_i − y_{i−1}|².

If

(29)  lim_{n→∞} lim_{κ→∞} κ n⁻¹ Σ_{i=2}^n |η_i − η_{i−1}|² = 0,

then κ̂⁻¹ is a consistent estimator in the sense that

(30)  lim_{n→∞} lim_{κ→∞} κ E|κ̂⁻¹ − κ⁻¹| = 0.

To verify this, let T = n⁻¹ Σ_{i=2}^n |y_i − y_{i−1}|², e_i = κ^{1/2}(y_i − η_i) and d_i = κ^{1/2}(η_i − η_{i−1}). Evidently

(31)  κT = n⁻¹ Σ_{i=2}^n |e_i − e_{i−1}|² + n⁻¹ Σ_{i=2}^n |d_i|² + 2n⁻¹ Σ_{i=2}^n d_i′(e_i − e_{i−1}).

Because of (16), Skorokhod's theorem and Vitali's theorem, there exist versions of the {e_i} and independent random column vectors {W_i} such that the distribution of W_i is N(0, I − μ_i μ_i′) and lim_{κ→∞} E|e_i − W_i|² = 0. These facts and (29) imply

(32)  lim_{n→∞} lim_{κ→∞} E|κT − n⁻¹ Σ_{i=2}^n |W_i − W_{i−1}|²| = 0.

On the other hand,

(33)  lim_{n→∞} lim_{κ→∞} E|n⁻¹ Σ_{i=2}^n |W_i − W_{i−1}|² − 2q| = 0.

Limits (32) and (33) imply the consistency property (30) for the original random variables.

3.3 Computational Aspects

The following remarks concern computation of the directional trend estimators M̂_Mon(k) and M̂_PLS(k). Let ĝ = (z² − κ̂⁻¹q)/z². Because the estimated risk function (12) satisfies

(34)  ρ̂(f, U) = ave[(f − ĝ)² z²] + ave[κ̂⁻¹ q ĝ],

definition (13) of f̂_Mon(U) is equivalent to the constrained isotonic weighted least squares evaluation

(35)  f̂_Mon(U) = argmin_{f ∈ F_Mon} ave[(f − ĝ)² z²].

This expression reveals that f̂_Mon(U) is a regularization of the raw shrinkage vector ĝ ∈ (−∞, 1]ⁿ. Let 𝓗 = {h ∈ Rⁿ : h_1 ≥ h_2 ≥ … ≥ h_n}, a superset of F_Mon. An argument in Beran and Dümbgen (1998) shows that

(36)  f̂_Mon(U) = h̃₊,   where h̃ = argmin_{h ∈ 𝓗} ave[(h − ĝ)² z²]

and the subscript + denotes the componentwise positive part. The pool-adjacent-violators algorithm for isotonic regression (see Robertson, Wright and Dykstra (1988)) finds h̃ expeditiously in a finite number of steps. The positive-part clipping in (36) arises because ĝ is restricted to (−∞, 1]ⁿ rather than to [0, 1]ⁿ.

Similarly, definition (14) of γ̂_D for first-order PLS is equivalent to the constrained nonlinear weighted least squares evaluation

(37)  γ̂_D = argmin_{γ > 0} ave[(f_D(γ) − ĝ)² z²],

where f_D(γ) is given in the discussion that follows (6). The S-Plus function nls() may be summoned to solve this problem iteratively, in the manner described on p. 244 of Venables and Ripley (1999). Simple grid search provides the necessary starting approximation to γ̂_D. With minor changes in the code, the R function nls() also iterates to γ̂_D. Computation of γ̂_E for second-order PLS is entirely analogous.

In numerical experiments, computation of the monotone shrinkage estimator M̂_Mon(k) was considerably faster than computation of the motivating PLS estimator M̂_PLS(k). This finding, together with the theoretical superiority in risk of M̂_Mon(k) over M̂_PLS(k), provides strong grounds for considering only the former.
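To make Section 3 concrete, here is an illustrative sketch (added code, not from the paper) of the two computational ingredients: the first-difference dispersion estimate (28) and the monotone shrinkage vector of (35)-(36) computed by a pool-adjacent-violators sweep. The synthetic data generation is a crude Gaussian stand-in for the Fisher-Langevin model, and the toy vector z² is hypothetical.

```python
import numpy as np

def dispersion_estimate(Y):
    """First-difference estimate of kappa^{-1} from the n x p matrix of
    observed unit vectors (rows y_i), with q = p - 1; cf. (28)."""
    n, p = Y.shape
    return np.sum(np.diff(Y, axis=0) ** 2) / (2.0 * (p - 1) * n)

def isotonic_nonincreasing(g, w):
    """Weighted least-squares projection of g onto nonincreasing vectors,
    by pool-adjacent-violators on the reversed (nondecreasing) problem."""
    vals, wts = list(g[::-1]), list(w[::-1])
    means, weights, counts = [], [], []
    for v, ww in zip(vals, wts):
        means.append(v); weights.append(ww); counts.append(1)
        while len(means) > 1 and means[-2] > means[-1]:   # pool violators
            wt = weights[-2] + weights[-1]
            means[-2] = (weights[-2] * means[-2] + weights[-1] * means[-1]) / wt
            weights[-2] = wt
            counts[-2] += counts[-1]
            means.pop(); weights.pop(); counts.pop()
    return np.repeat(means, counts)[::-1]

def f_mon(z2, kinv, q):
    """Monotone shrinkage vector: isotonic fit to the raw vector
    g = (z2 - kinv*q)/z2, weighted by z2, then clipped to [0, 1];
    the lower clip is the positive part in (36)."""
    g = (z2 - kinv * q) / z2
    return np.clip(isotonic_nonincreasing(g, z2), 0.0, 1.0)

# Synthetic directions near a fixed mean direction (Gaussian stand-in).
rng = np.random.default_rng(3)
kappa = 100.0
Y = np.array([0.0, 0.0, 1.0]) + rng.normal(scale=kappa ** -0.5, size=(400, 3))
Y /= np.linalg.norm(Y, axis=1, keepdims=True)     # renormalize to unit length

kinv = dispersion_estimate(Y)
assert 0.5 / kappa < kinv < 2.0 / kappa           # rough agreement with 1/kappa

z2 = np.array([5.0, 3.0, 0.5, 1.2, 0.1])          # toy value of the vector z^2
fhat = f_mon(z2, kinv=0.1, q=2)
assert np.all(np.diff(fhat) <= 1e-12) and np.all((fhat >= 0) & (fhat <= 1))
```

The isotonic step reproduces simple cases by hand: fitting (0.5, 0.9) with equal weights in nonincreasing order pools both values to 0.7.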
4 Experiments and Diagnostics
Section 4.1 discusses estimated risks and diagnostic plots for the competing fits to the paleomagnetic data presented in Section 2.2. Further experiments with artificial data, described in Sections 4.2 and 4.3, suggest that the orthogonal matrices U_D and U_E implicit in the PLS fits provide economical bases for a range of directional trends. Consequently, the PLS estimators described in Section 2 have much smaller estimated risk than the naive trend estimator; and the associated monotone shrinkage fits reduce risk further. The experimental results support theoretical conclusions developed in Section 5 about the benefits of basis economy while revealing aspects of estimator performance not covered by the asymptotics.
4.1 Paleomagnetic Data
For this data, κ̂ = 19.7 and the rescaled risk estimates κ̂ρ̂(f̂, U) for the competing fits displayed in Figure 1A are: Naive 2.000, … The estimate M̂_Mon(1) is the clear winner in having smallest estimated risk, far smaller than that of the naive estimator. It is noteworthy that the relatively small estimated risk of M̂_Mon(1) is coupled with a pleasing visual appearance. The estimate gives a clear picture of the time-trend in the position of the paleomagnetic north pole as measured from Antarctica. The diagnostic plots in Figure 1B provide further insight into the behavior of these directional trend estimates. Let v = κ̂^{1/2}|z|. In cells (1,1) and (1,2), the plots of v_i^{1/2} versus i suggest that U_D provides a more economical basis for the unknown trend than U_E: the {v_i} for the first basis tend to zero faster than for the second basis. The square root transformation enhances visibility of the smaller components. Greater basis economy explains why M̂_Mon(1) has smaller estimated risk than M̂_Mon(2).
Cell (2,1) displays, with linear interpolation, the successive components of the shrinkage vectors f̂_Mon(U_D) (dashed line) and f_D(t̂_D) (solid line) that enter into the constructions of M̂_Mon(1) and M̂_PLS(1) respectively. We see that f_D(t̂_D) provides only a rough approximation to the better f̂_Mon(U_D) and gives more weight to higher "frequencies." This observation explains both the ragged visual appearance of M̂_PLS(1) and the substantially smaller estimated risk of M̂_Mon(1). The free-floating points in cell (2,1) are the components of the raw shrinkage vector ĝ plotted against i without interpolation. It is these highly irregular values that the monotone and PLS shrinkage vectors approximate in constrained fashion through (35) and (37) respectively. In cell (2,2), the analogous plots of the shrinkage vectors f̂_Mon(U_E) (dashed line) and f_E(t̂_E) (solid line) reveal that the latter is a good approximation to the former. This explains why the estimated risk of M̂_Mon(2) is not much smaller than that of M̂_PLS(2). Both the economy of the orthogonal basis and the quality of the shrinkage strategy affect the risk of the directional trend estimate. In this example, first and second-order PLS generate orthogonal bases that are plausibly economical. However, the strongly constrained one-parameter shrinkage strategy implicit in PLS can fail to exploit basis economy. Adaptive monotone shrinkage takes full advantage of basis economy and is computationally faster than adaptive PLS. There seems little reason to use PLS trend estimators except as a source of potentially economical orthogonal bases.
[Figure 1b here. Panels: "v for First-order Basis", "v for Second-order Basis", "First-order Shrinkage Vectors", "Second-order Shrinkage Vectors", each plotted against component index.]

Figure 1b. Diagnostic plots for fits to the paleomagnetic north pole data. Top row displays the components of v = κ̂^{1/2}|z| for each orthogonal basis. Bottom row displays the shrinkage vectors defining M̂_PLS(k) (solid line) and M̂_Mon(k) (dashed line).
4.2 Generating and Plotting Trend Data
This subsection summarizes ideas used to generate and plot pseudo-random directional trend data in three dimensions. All calculations and plots were done in Windows S-Plus 3.2 with set.seed(2). As a software check, the computations were repeated in Unix S-Plus 3.4. Very similar results were obtained, after small changes in the code, in Unix R 1.00.
Cartesian and Polar Representations. A direction in R³ is a unit vector u = (a, b, c)′ that has an equivalent polar coordinate representation (θ, φ), where θ ∈ [0, π] and φ ∈ [0, 2π). Direction (0, 0, 1)′ is the north pole of the coordinate system. On the one hand,

a = sin(θ) cos(φ),    b = sin(θ) sin(φ),    c = cos(θ).    (38)

On the other hand,

θ = cos⁻¹(c) ∈ [0, π],    φ = tan⁻¹(b/a) ∈ [0, 2π).    (39)

These values may be computed by using the S-Plus functions acos() and atan(,).
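A minimal Python transcription of conversions (38) and (39), using the two-argument arctangent to place φ in [0, 2π) just as the two-argument S-Plus call atan(,) does:

```python
import math

def polar_to_cart(theta, phi):
    """Equation (38): unit vector (a, b, c) from colatitude theta, longitude phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def cart_to_polar(a, b, c):
    """Equation (39): (theta, phi) with theta in [0, pi], phi in [0, 2*pi)."""
    theta = math.acos(max(-1.0, min(1.0, c)))  # clamp guards against rounding
    phi = math.atan2(b, a) % (2 * math.pi)     # two-argument arctangent
    return theta, phi
```

The modulo wrap is needed because atan2 returns values in (−π, π] while (39) asks for [0, 2π).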
Generating a Fisher-Langevin (μ, κ) random direction. Let V₁, V₂ be independent random variables, each uniformly distributed on [0, 1]. Define

θ = cos⁻¹(κ⁻¹ …

An approximate solution is now generated via

r̂_λ(t_{k+1}) = r̂_λ(t_k) + μ(r̂_λ(t_k), t_k)(t_{k+1} − t_k) + Σ(r̂_λ(t_k), t_k) √(t_{k+1} − t_k) Z_{k+1} − β_λ(r̂_λ(t_k))(t_{k+1} − t_k),

with β_λ(r) = {r − Π_D(r)}/λ. Some points may lie outside of D, but small λ brings them close.
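A minimal sketch of this penalization scheme on the unit disk (Python rather than S-Plus; the drift −(x, y), standing in for −∇H of a rotationally symmetric potential, and all parameter values are illustrative assumptions, not taken from the paper):

```python
import math
import random

def project_disk(x, y):
    """Closest-point map Pi_D onto the closed unit disk D."""
    r = math.hypot(x, y)
    return (x, y) if r <= 1 else (x / r, y / r)

def simulate_penalized(n=1000, dt=0.01, sigma=0.5, lam=0.01, seed=2):
    """Euler steps for dr = mu dt + sigma dB - beta_lam(r) dt with
    beta_lam(r) = (r - Pi_D(r)) / lam; mu = -(x, y) is an assumed drift."""
    random.seed(seed)
    x, y = 0.5, 0.0
    path = [(x, y)]
    for _ in range(n):
        px, py = project_disk(x, y)       # penalty is zero while inside D
        x += (-x - (x - px) / lam) * dt + sigma * math.sqrt(dt) * random.gauss(0, 1)
        y += (-y - (y - py) / lam) * dt + sigma * math.sqrt(dt) * random.gauss(0, 1)
        path.append((x, y))
    return path
```

With λ equal to the step length, as in the simulations reported below, each step pulls an excursion essentially back to the boundary, so the path leaves D only briefly.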
Method 3. Projection method. In this case the sequence of approximations to the solution is

r̂(t_{k+1}) = Π_D( r̂(t_k) + μ(r̂(t_k), t_k)(t_{k+1} − t_k) + Σ(r̂(t_k), t_k) √(t_{k+1} − t_k) Z_{k+1} ).

These values do lie in D. The function A of (4) may be approximated by the accumulated projection corrections

Σ_{t_k ≤ t} ( r̃(t_k) − Π_D(r̃(t_k)) ),

where r̃(t_k) denotes the Euler step before projection.
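The projection method admits an equally short sketch (again Python, with the same illustrative drift −(x, y) standing in for −∇H; Π_D is the closest-point map of the unit disk). It also accumulates the approximation to the boundary process A:

```python
import math
import random

def project_disk(x, y):
    """Closest-point map Pi_D onto the closed unit disk D."""
    r = math.hypot(x, y)
    return (x, y) if r <= 1 else (x / r, y / r)

def simulate_projected(n=1000, dt=0.01, sigma=0.5, seed=2):
    """Method 3: project each Euler step back onto D.  Returns the path and
    the accumulated approximation to A of (4)."""
    random.seed(seed)
    x, y = 0.5, 0.0
    path, ax, ay = [(x, y)], 0.0, 0.0
    for _ in range(n):
        # unconstrained Euler step (the r-tilde of the text)
        xs = x - x * dt + sigma * math.sqrt(dt) * random.gauss(0, 1)
        ys = y - y * dt + sigma * math.sqrt(dt) * random.gauss(0, 1)
        x, y = project_disk(xs, ys)
        ax += xs - x                      # increment of A: projection displacement
        ay += ys - y
        path.append((x, y))
    return path, (ax, ay)
```

By construction every point of the returned path lies in D, unlike the penalization scheme above.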
By a special construction for the case of hyperplane boundaries Lepingle [16] gets faster rates of convergence. He remarks that the constructed process might go outside D during some interval t_k to t_{k+1} and provides a construction to avoid this.

Some comparisons. From equation (4),

dr = μ dt + Σ dB − dA,

while from (5),

dr = μ dt + Σ dB − ∇H dt,

so one has the connection

dA ≈ ∇H dt.

A crucial difference, however, is that the support of dA is on the boundary ∂D while the added term ∇H may be nonzero inside D. References for specific methods of simulation are: [18], [21], [22], [26]. Asmussen et al. [2] find that the sampling has to be surprisingly fine in the one-dimensional case if the Euler method is used. They suggest improved schemes. One can speculate on how the animals behave when they get to the boundary. They may walk along it for a while. They may run at it and bounce back. They may stand there for a while. This relates to the character of the reflections implicit in the simulation method employed. Dupuis and Ishii [10] allow different types of reflections, including oblique. Ikeda and Watanabe [13] allow "sticky" and "non-sticky" behavior at the boundary.
6 Some simulations
Figure 4. Simulation with a region of attraction at (0,0) and a circular boundary. The top left hand figure is the potential function employed, H. The top right is a simulated trajectory using Method 1. The bottom left used Method 2 and the bottom right Method 3.

To get practical experience, some elementary simulations were carried out. A naive boundary, namely a circle, was employed to make obtaining the result of a projection easy. Figure 4 shows results for the three methods. There were n = 1000 equispaced time points and in each case the same starting point and random numbers were employed. The potential function, H, used is shown in the top left panel of the figure. Its functional form is a standard normal density rotated about the origin. The boundary is taken to be a circle of radius 1. The top right panel shows the result of Method 1. The term added to the potential function to force the particle to remain in D is proportional to
This function rises to ∞ on ∂D. The path certainly stays within D and is attracted towards the center. Since the term added is not zero in the region D, one is obtaining an approximate solution. When the point moves near the boundary it is repulsed. The bottom left panel shows the result of employing Method 2. The penalization parameter λ was taken to be t_{k+1} − t_k. In this
case the trajectory goes outside of the circle, making the method's approximate nature clear. Of course, by choice of parameters one can make the excursions smaller. The bottom right panel shows the result of employing Method 3, i.e. projection back onto the perimeter of points falling outside. The path stays in the circle. We learned that the methods were not that hard to program and Method 1 was perhaps the easiest. The running times of the three methods were comparable. Methods 1 and 3 lead to paths in D. The paths generated by the three methods are surprisingly different despite the random number generator having the same starting point in each case. The presence of the boundary is having an important effect. The path behavior is reminiscent of the sensitivity to initial conditions of certain dynamic systems.
7 Several particles
We begin by mentioning the work of Dyson [11], [27]. For J particles moving on the line Dyson considered the model

dx_j(t) = Σ_{i≠j} (x_j(t) − x_i(t))⁻¹ dt + σ dB_j(t),    j = 1, 2, …, J.    (6)

This corresponds to the potential function

H(x) = −(1/2) Σ_{i≠j} log (x_j − x_i)².
This function differs from the models considered previously in the paper in being random. One notes that there is long range repulsion amongst the particles and they will not pass each other with probability 1. Spohn [27] considers the general process

dx_j(t) = −(1/J) Σ_{i≠j} H′(x_j(t) − x_i(t)) dt + dB_j(t),
where H is a potential function. He develops scaling results and considers correlation functions and Gibbs measures. Figure 5 presents a simulation of Dyson's process for the case of 2 particles and σ = .1. In the figure one sees the particles moving towards 0 repeatedly, but in consequence being repelled from each other. Consider next a more general formulation. Consider particles moving in the plane. Suppose there are J particles with motions described by {r_j(t)}, j = 1, …, J. Collect the locations at time t into a 2 by J matrix, s(t) = [r_j(t)], and set down the system of equations

dr_j(t) = μ_j(s(t), t) dt + Σ_j(s(t), t) dB_j(t),    j = 1, 2, …, J,    (7)

with the B_j independent bivariate Brownians. The Dyson model (6) is a particular case, with special properties. The components may all be required to stay in the same region D. Questions of interest, e.g. the interactions, now become questions concerning the entries
Figure 5. A simulation of Dyson's model (6) for 2 particles, with σ = 0.1.
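An Euler scheme for Dyson's model (6) is short to write down. The Python sketch below is illustrative: the step size, time horizon and starting configuration are assumptions, not values taken from the paper.

```python
import math
import random

def simulate_dyson(J=2, n=1000, dt=0.01, sigma=0.1, seed=1):
    """Euler steps for Dyson's model (6):
    dx_j = sum_{i != j} (x_j - x_i)^{-1} dt + sigma dB_j."""
    random.seed(seed)
    x = [-1.0 + 2.0 * j / (J - 1) for j in range(J)]   # equispaced start in [-1, 1]
    traj = [list(x)]
    for _ in range(n):
        drift = [sum(1.0 / (x[j] - x[i]) for i in range(J) if i != j)
                 for j in range(J)]
        x = [x[j] + drift[j] * dt + sigma * math.sqrt(dt) * random.gauss(0, 1)
             for j in range(J)]
        traj.append(list(x))
    return traj
```

The repulsive drift blows up as two particles approach, which is the mechanism behind the no-crossing property noted above; in the simulated trajectory the ordering of the two particles is preserved throughout.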
of μ and Σ. Attraction and repulsion might be modelled, e.g. attraction of the animals i and j via setting …

One may be able to express the strengths of connection. One might study the properties of the distance |r_i(t) − r_j(t)| to learn about the dependence properties amongst the particles. The μ, Σ might include distance to the nearest other particle. There are phenomena to include: animals lagging, clumps, repulsion, attraction, staying about the same distance, and so on. Lastly there may be animals of several types. The simulation methods already discussed may be employed here. With data, parameters may be estimated and inferences drawn, e.g. one can study differences of animal behavior. It does need to be remembered that behavior may appear similar because both particles are moving under the influence of the same explanatories rather than being inherently connected as in the model (6).
8 Discussion
The paper is principally a review in preparation for statistical work to come. SDEs are the continuing element in the paper. They provide a foundation for the work: in particular they offer processes in continuous time, there is an extensive literature, and they have been studied by both probabilists and statisticians.
To a substantial extent the concern of the paper has been with the effect of boundaries. It turns out that there are several methods for (approximately) simulating processes that are constrained. A small simulation study was carried out to assess relative merits. Certain practical difficulties arise. These include: choice of sampling times, choice of parameter values, goodness of approximation, the possible presence of lags in a natural model, and the finding of functional forms with which to include explanatories. The regularity conditions have not been laid out. They may be found in the references provided. One can argue that the results are still far from best possible, for there is a steady changing of assumptions, e.g. re boundedness, convexity and closure.

Many problems remain. There has been some discussion of the case of interacting animals here and in [7]. This is a situation of current concern. In practice it seems that often the process can be only approximately Markov, for once the animal has finished some activity it seems unlikely to start it again immediately, e.g. drinking. This means one would like equations including time lags. It is easy to set down such equations, but not so easy to get at the properties of the motion. As an example one might consider a drift depending on r(t − τ), for some function μ₁ and lag τ. The deer may be following the elk at a distance. There are analytic questions such as the expected speeds. There is some literature going under the key words "stochastic delay equation"; see [3].

Other interesting questions include:

1. Given the diffusion process (2), how does one tell from the form of μ and Σ if there is a closed boundary that keeps the process inside once it starts inside? One could check to see if Σ(r(t), t) vanishes on ∂D and that μ(r(t), t) does not point outside there.

2. How does one include in the model the possibility that the process may follow along the boundary for a period? What are other important types of boundary behavior? Ikeda and Watanabe's sticky and non-sticky behavior has already been mentioned.

The focus has been on diffusion processes, but Lévy processes, with their jump possibilities, seem a pertinent model for some situations. Work does not appear to have been done on the Skorokhod problem for Lévy processes. We have taken an analytic approach in the work and in particular have left for later questions of statistical inference. The tools of model and simulation are basic in the paper and are needed when one turns to the inference issues. Simulation was used to estimate the invariant distribution of the elk in [6] and the likelihood function of an elephant seal's journey in [9].
Acknowledgements

Many people helped me out with references and comments. I mention the probabilists: P. Dupuis, A. Etheridge, S. Evans, T. Kurtz, J. San Martin, J. Pitman, and R. Williams. I also mention the US Forest Service researchers: A. A. Ager, J. G. Kie and H. K. Preisler, who provided the Starkey elk data, and the marine biologists: B. Kelly and B. S. Stewart, who provided the ringed seal and elephant seal data respectively. Apratim Guha noticed that two of the initial figures appeared inappropriate, leading to revision. Rabi himself helped me out in developing the paper. He directed me to pertinent references and discouraged me from being over concerned about the analytic assumptions made in many of these. Rabi, thank you for the pleasant years we were colleagues and for writing that so helpful book with Ed Waymire. This work was completed with the support of NSF grants DMS-9704739 and 0203921.
Bibliography

[1] Anulova, S. V. and Liptser, R. Sh. (1990). Diffusional approximation for processes with the normal reflection. Theory Probab. Appl. 35, 411-423.
[2] Asmussen, S., Glynn, P. and Pitman, J. (1995). Discretization error in simulation of one-dimensional reflecting Brownian motion. Ann. Appl. Prob. 8, 875-896.
[3] Bell, D. R. and Mohammed, S.-E. A. (1995). Smooth densities for degenerate stochastic delay equations with hereditary drift. Ann. Prob. 23, 1875-1894.
[4] Bhattacharya, R. N. and Waymire, E. C. (1990). Stochastic Processes with Applications. Wiley, New York.
[5] Brillinger, D. R. (1997). A particle migrating randomly on a sphere. J. Theoretical Probability 10, 429-443.
[6] Brillinger, D. R., Preisler, H. K., Ager, A. A. and Kie, J. G. (2001). The use of potential functions in modelling animal movement. Pp. 369-386 in Data Analysis from Statistical Foundations (Ed. A. K. M. E. Saleh). Nova, Huntington.
[7] Brillinger, D. R., Preisler, H. K., Ager, A. A. and Kie, J. G. (2002). An exploratory data analysis (EDA) of the paths of moving animals. J. Statistical Planning and Inference. To appear.
[8] Brillinger, D. R., Preisler, H. K., Ager, A. A., Kie, J. G. and Stewart, B. S. (2002). Employing stochastic differential equations to model wildlife motion. Bull. Brazilian Math. Soc. To appear.
[9] Brillinger, D. R. and Stewart, B. S. (1998). Elephant-seal movements: modelling migration. Canadian J. Statistics 26, 431-443.
[10] Dupuis, P. and Ishii, H. (1993). SDEs with oblique reflections on nonsmooth domains. Ann. Prob. 21, 554-580.
[11] Dyson, F. J. (1963). A Brownian-motion model for the eigenvalues of a random matrix. J. Math. Phys. 3, 1191-1198.
[12] Goldstein, H. (1957). Classical Mechanics. Addison-Wesley, Reading.
[13] Ikeda, N. and Watanabe, S. (1989). Stochastic Differential Equations and Diffusion Processes. North-Holland, Amsterdam.
[14] Kelly, B. P. (1988). Ringed seal. Pp. 57-75 in Selected Marine Mammals of Alaska (Ed. W. Lentfer). Marine Mammal Commission, Washington.
[15] Kloeden, P. E. and Platen, E. (1992). Numerical Solution of Stochastic Differential Equations. Springer, New York.
[16] Lepingle, D. (1995). Euler scheme for reflected stochastic differential equations. Math. Comp. in Simulations 38, 119-126.
[17] Lions, P. L. and Sznitman, A. S. (1984). Stochastic differential equations with reflecting boundary conditions. Comm. Pure Appl. Math. 37, 511-537.
[18] Liu, Y. (1993). Numerical Approaches to Stochastic Differential Equations with Boundary Conditions. Ph.D. Thesis, Purdue University.
[19] Nelson, E. (1967). Dynamical Theories of Brownian Motion. Princeton U. Press, Princeton.
[20] Perrin, M. F. (1928). Mouvement brownien de rotation. Ann. l'Ecole Norm. Sup. 45, 1-51.
[21] Pettersson, R. (1995). Approximations for stochastic differential equations with reflecting convex behaviors. Stoch. Proc. Appl. 59, 295-308.
[22] Pettersson, R. (1997). Penalization schemes for reflecting stochastic differential equations. Bernoulli 3, 403-414.
[23] Preisler, H. K., Ager, A. A., Brillinger, D. R., Johnson, B. K. and Kie, J. G. (2002). Modelling movements of Rocky Mountain elk using stochastic differential equations. Submitted.
[24] Rozkosz, A. and Slominski, L. (1997). On stability and existence of solutions of SDEs with reflection at the boundary. Stoch. Proc. Appl. 68, 285-302.
[25] Saisho, Y. (1987). Stochastic differential equations for multi-dimensional domain with reflecting boundary. Probab. Th. Rel. Fields 74, 455-477.
[26] Slominski, L. (2001). Euler's approximations of solutions of SDEs with reflecting boundary. Stoch. Proc. Appl. 94, 317-337.
[27] Spohn, H. (1987). Interacting Brownian particles: a study of Dyson's model. Pp. 151-179 in Hydrodynamic Behavior and Interacting Particle Systems (Ed. G. Papanicolaou). Springer, New York.
[28] Stewart, B. S., Yochem, P. K., Huber, H. R., DeLong, R. L., Jameson, R. J., Sydeman, W., Allen, S. G. and Le Boeuf, B. J. (1994). History and present status of the northern elephant seal population. Pp. 29-48 in Elephant Seals: Population Ecology, Behavior and Physiology (B. J. Le Boeuf and R. M. Laws, eds). University of California Press, Los Angeles.
[29] Stewart, J. (1991). Calculus, Early Transcendentals. Brooks/Cole, Pacific Grove.
[30] Stroock, D. W. and Varadhan, S. R. S. (1979). Multidimensional Diffusion Processes. Springer, New York.
[31] Tanaka, H. (1978). Stochastic differential equations with reflecting boundary condition in convex regions. Hiroshima Math. J. 9, 163-177.
θ-expansions and the generalized Gauss map

Santanu Chakraborty
Reserve Bank of India

and

B. V. Rao
Indian Statistical Institute

Abstract

Motivated by problems in random continued fraction expansions, we study θ-expansions of numbers in [0, θ) where 0 < θ < 1. For such a number θ, we study the generalized Gauss transformation defined on [0, θ) as follows:

T(x) = 1/x − θ[1/(θx)] if x ≠ 0, and T(0) = 0.

One of the problems that concerns us is the symbolic dynamics of this map and the existence of an absolutely continuous invariant probability.

AMS (MSC) no: 37E05; 60J05
1 Introduction
Suppose that μ is a probability on the real line. Consider the following law of motion: if you are at x, pick a number Z according to the law μ and move to Z + x. Continue the motion with independent choices at each stage. This is nothing but the familiar random walk. Suppose that by an error the law is transcribed as: move to Z + 1/x; then what happens? To make sense of the problem, from now on we consider the state space to be (0, ∞). Let μ be a probability on [0, ∞) which drives the motion. If you are at x, move to Z + 1/x, where Z is chosen independent of the past and has law μ. This leads us to the Markov process

X₀ = x > 0;    X_{n+1} = Z_{n+1} + 1/X_n,    n ≥ 0,

where (Z_n; n ≥ 1) is an i.i.d. sequence of random variables, each having law μ. The purpose of the paper is to discuss this process.
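A minimal sketch of the recursion in Python. As a sanity check, with the point mass driver μ = δ₁ the chain is deterministic and converges to the continued fraction [1; 1, 1, …] = (1 + √5)/2, the golden ratio; the exponential driver used below is just an arbitrary random example.

```python
import random

def iterate_chain(x0, z_draw, n):
    """Run the recursion X_{k+1} = Z_{k+1} + 1/X_k for n steps, from X_0 = x0 > 0."""
    x = x0
    for _ in range(n):
        x = z_draw() + 1.0 / x
    return x

# Deterministic driver mu = delta_1: converges to [1; 1, 1, ...] = (1 + sqrt 5)/2.
phi = iterate_chain(2.0, lambda: 1.0, 100)

# A random driver: Z exponential with mean 1 (illustrative choice).
random.seed(0)
sample = [iterate_chain(1.0, lambda: random.expovariate(1.0), 60) for _ in range(200)]
```

Since Z ≥ 0 and 1/x > 0, every state of the chain is strictly positive, as the state space (0, ∞) requires.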
2 Generalities
If μ is δ₀, the point mass at zero, then X_n = x or 1/x according as n is even or odd. Unless x = 1 the process does not converge in distribution. For each x > 0, (1/2)(δ_x + δ_{1/x}) is an invariant distribution for the process. In fact any invariant probability is a mixture of these. If μ = δ_a where a > 0, then the process starting at x is deterministic and is the sequence, in the usual notation of continued fractions, [x;], [a; x], [a; a, x], … which converges to the number given by the continued fraction [a; a, a, …]. We leave the easy calculation involving convergents to the interested reader. The point mass at this point is the unique
invariant distribution for the process. From now on we assume that μ is not a degenerate probability on [0, ∞). It may however have some mass at zero. Then X_n = [Z_n; Z_{n−1}, …, Z₁, x] has the same law as [Z₁; Z₂, …, Z_n, x] and consequently X_n converges in distribution to
X_∞ = Z₁ + 1/(Z₂ + 1/(Z₃ + ⋯)),

simply denoted by [Z₁; Z₂, Z₃, …]. The almost sure convergence of the expression on the right side is argued as follows. Since the Z_i are i.i.d. with strictly positive mean, we have Σ Z_n = ∞ a.e., and for nonnegative numbers (a_n) the continued fraction [a₁; a₂, a₃, …] is convergent iff Σ a_n = ∞ (Khinchin [9], Th. 10, p. 10). Since X_n converges to X_∞ in distribution irrespective of the initial point x we have the following:

Theorem 1 (Bhattacharya and Goswami [1]):
(1) The Markov process X_n has a unique invariant distribution Π, and X_n converges in distribution to Π.

(2) Π is the unique probability on (0, ∞) characterized by Π = μ * (1/Π), in the sense that whenever X, Z are independent random variables with X strictly positive, Z ∼ μ and X ∼ Z + 1/X, then X ∼ Π.
In view of the last part of the theorem, each explicit evaluation of Π leads to a characterization of Π as the unique distribution satisfying the convolution equation above. It is in this context that the problem was first discussed by Letac and Seshadri [11], [12]. They observed that when μ is exponential then Π is inverse Gaussian, thereby obtaining a characterization of the inverse Gaussian distribution. A systematic study of the Markov process was initiated in Bhattacharya and Goswami [1], motivated by problems in random number generation. They showed, among other things, that Π is always non-atomic. An excellent review is in Goswami [8].
3 Positive integer driver
One problem that concerns us here is the nature of the invariant probability: whether it is absolutely continuous or singular. Since the invariant probability Π is nothing but the distribution of X_∞ = [Z₁; Z₂, Z₃, …], the problem reduces to studying the nature of the distribution of X_∞. Let us assume that the driving probability μ is concentrated on the set of strictly positive integers. In this case note that the representation [Z₁(w); Z₂(w), Z₃(w), …] is already the usual continued fraction expansion of the number X_∞(w). Well known results about usual continued fraction expansions lead to an interesting consequence. The range of 1/X_∞ is contained in (0, 1). Under the distribution of 1/X_∞, the digits in the continued fraction expansion are i.i.d., so that it is an invariant and ergodic measure for the Gauss transformation. So it must be the same as the Gauss measure, or must be singular to the Gauss measure and hence singular. But as one knows the digits are not independent under the Gauss measure. So the distribution of 1/X_∞ is singular. Consequently the distribution of X_∞ is singular too. Thus,
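The identification of [Z₁; Z₂, …] with the usual continued fraction expansion can be checked mechanically. The sketch below (Python, using exact rational arithmetic so the digit extraction is not spoiled by rounding) evaluates a finite continued fraction and then recovers its digits with the Gauss map recipe a = [x], x ← 1/(x − a):

```python
from fractions import Fraction

def cf_value(digits):
    """Evaluate the finite continued fraction [d0; d1, ..., dn] from the back."""
    x = Fraction(digits[-1])
    for d in reversed(digits[:-1]):
        x = d + 1 / x
    return x

def cf_digits(x, n):
    """Extract up to n+1 continued fraction digits of a positive rational x."""
    out = []
    for _ in range(n + 1):
        a = int(x)          # integer part (floor, since x > 0)
        out.append(a)
        frac = x - a
        if frac == 0:
            break
        x = 1 / frac
    return out
```

For positive-integer digits the roundtrip is exact, which is the fact exploited in Theorem 2.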
Theorem 2: Suppose μ is concentrated on strictly positive integers. Then X_∞ has singular distribution.

This is perhaps known, but we have not found it in the literature. Thus we here have a naturally arising family of singular distributions.
4 Bernoulli driver
The arguments used above fail when μ has mass at zero. Due to the presence of zeros, [Z₁(w); Z₂(w), Z₃(w), …] is no longer the usual continued fraction expansion of the number X_∞(w). Let us assume that the driving probability μ puts mass α at 0 and 1 − α at 1. Since each Z_i takes only the two values 0 and 1, it is not difficult to discover the continued fraction expansion of X_∞(w). This is what we obtain now. Let us assume that Z₁(w) = 1 or, equivalently, consider the set Ω₁ = {w : Z₁(w) = 1}. Define the stopping times for the process (Z_i)_{i≥1} as follows:

τ₀(w) = first even integer i such that Z_i(w) ≠ 0,
τ₁(w) = first odd integer i > τ₀ such that Z_i(w) ≠ 0,
τ₂(w) = first even integer i > τ₁ such that Z_i(w) ≠ 0, and so on.

Let us now define,
Then we have, for a.e. w ∈ Ω₁, that [a₀(w); a₁(w), a₂(w), …] is the usual continued fraction expansion of X_∞(w). If S_k = Σ …

The case k = 2 is a little more involved. Note that the quadratic equation in θ determines a value of θ, but it does not ensure that 1/θ has the required expansion. In fact we must necessarily have n₂ > n₁ + 1. To see this, observe that if [n₁θ; n₂θ] is the expansion of 1/θ, then we have 1/θ > n₁θ and 1/(n₂θ) < θ, so that n₂ > n₁. Further, if n₂ = n₁ + 1, then 1/θ reduces to (n₁ + 1)θ, which is not the required expansion. Thus we must have n₂ > n₁ + 1. Moreover, when this condition holds there is such a θ and it is given by θ = √((n₂ − 1)/(n₁n₂)). Indeed, for this θ,

n₁θ + 1/(n₂θ) = (n₁n₂θ² + 1)/(n₂θ) = n₂/(n₂θ) = 1/θ.
Further, such a θ is unique, because the quadratic equation to be satisfied by θ has only one positive root.

The situation k = 3 is more complicated. It is necessary to have n₂ > n₁ and also n₃ > n₁. Further, when this condition holds there is such a θ and it is given by

θ² = [ √((n₁ + n₃ − n₂n₃)² + 4n₁n₂n₃) − (n₁ + n₃ − n₂n₃) ] / (2n₁n₂n₃).

In fact the equation to be satisfied by θ is of fourth degree, having two nonreal complex roots, one positive and one negative root. Thus such a θ is unique as well. With this choice, 1/θ has the required θ-expansion. Thus we have proved Theorem 7:
(i) For the existence of a number 0 < θ < 1 such that 1/θ has the θ-expansion [n₁θ; n₂θ] it is necessary and sufficient that n₂ > n₁ + 1. When this holds, such a θ is unique.

(ii) For the existence of a number 0 < θ < 1 such that 1/θ has the θ-expansion [n₁θ; n₂θ, n₃θ] it is necessary and sufficient that n₂ > n₁ and n₃ > n₁. When this holds, such a θ is unique.

However, the situation for values of k larger than 3 eludes us. For the existence of a number θ, 0 < θ < 1, such that 1/θ has the θ-expansion [n₁θ; n₂θ, …, n_mθ] the following conditions appear to be necessary and sufficient:

(i) n_i > n₁ for i = 2, m, whereas n_i ≥ n₁ for 2 < i < m.

(ii) If for some i and p with i + p < m, (n_{i+1}, …, n_{i+p}) = (n₁, …, n_p), then we should have n_{i+p+1} ≤ n_{p+1} if p + 1 is even, whereas n_{i+p+1} ≥ n_{p+1} if p + 1 is odd.

Before proceeding further, we mention that in the literature there exist several generalizations of the usual continued fraction expansions. See Bissinger [3], Everett [7], Renyi [14], Kraaikamp and Nakada [10] and the references therein.
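The closed forms in Theorem 7 can be verified numerically. The Python sketch below implements both formulas; the digit choices in the checks are arbitrary examples satisfying the stated conditions, and the assertions simply confirm that 1/θ equals the corresponding finite θ-expansion.

```python
import math

def theta_k2(n1, n2):
    """Theorem 7(i): the unique theta with 1/theta = [n1*theta; n2*theta]
    (requires n2 > n1 + 1)."""
    return math.sqrt((n2 - 1) / (n1 * n2))

def theta_k3(n1, n2, n3):
    """Theorem 7(ii): theta^2 is the positive root of the quadratic
    n1*n2*n3*s^2 + (n1 + n3 - n2*n3)*s - 1 = 0 (requires n2 > n1, n3 > n1)."""
    b = n1 + n3 - n2 * n3
    s = (math.sqrt(b * b + 4 * n1 * n2 * n3) - b) / (2 * n1 * n2 * n3)
    return math.sqrt(s)
```

The quadratic in s = θ² is exactly the fourth-degree equation in θ mentioned in the text, which is why only one positive θ emerges.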
7 Generalized Gauss Transformation
Recall that the Gauss transformation on the interval [0, 1) associated with the usual continued fraction expansions is defined by

U(x) = 1/x − [1/x] if x ≠ 0, and U(0) = 0.

The Gauss measure μ̂ defined by dμ̂(x) = (1/log 2) · 1/(1 + x) dx on [0, 1) is ergodic and invariant for U [2]. Further,

q_n → ∞ a.e.,    (*)

lim_{n→∞} (1/n) log q_n = γ a.e. for some finite number γ.    (**)

As in section 5, the a_n are the digits in the continued fraction expansion and p_n/q_n is the n-th convergent; a.e. refers to μ̂, or equivalently to Lebesgue measure. These properties play a crucial role in [1].
The analogue of this transformation for the θ-expansion is the transformation T, referred to as the generalized Gauss transformation, defined on [0, θ) as follows:

T(x) = 1/x − θ[1/(θx)] if x ≠ 0, and T(0) = 0.

For several values of θ < 1 it was shown in [5], by using the concept of Markov maps, that T has an ergodic invariant measure equivalent to Lebesgue measure and moreover (*) and (**) hold. We shall not go into the details for two reasons. First, there may be a simpler argument. Second, even after establishing these properties, which are no doubt interesting, we have not been able to draw conclusions about the distribution of X_∞. Theorems 4, 5, 6 and 7 are nothing but a description of the symbolic dynamics of this transformation. As remarked to us by Professor R. F. Williams, this transformation is piecewise C² and is expanding, with derivative (in modulus) bounded below by 1/θ². By using well-known results (see [6], or perhaps implicit in [14]) we get:

Theorem 8: The generalized Gauss transformation T on [0, θ) defined above admits an absolutely continuous invariant measure.

A tractable special case of this transformation will be discussed in the next section. Before proceeding further, we remark the following. One can define a map on [0, θ) to itself by putting U₁(x) = θ(θ/x − [θ/x]) and one can also define a map on [0, 1/θ) to itself by putting U₂(x) = (1/θ)(1/(θx) − [1/(θx)]). Obviously, these maps are conjugate to the Gauss map U on [0, 1). However, the map T that we defined above is different from U₁ and U₂, and this map T is relevant for our discussion. We could not see if it is conjugate to the Gauss map U. Professor Y. Guivarc'h informed us that for several values of θ, T and U have different entropies and hence cannot be conjugate.
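A direct implementation of T is immediate; the Python sketch below uses θ = 1/√2 as an arbitrary admissible choice and confirms that orbits remain in [0, θ):

```python
import math

def T(x, theta):
    """Generalized Gauss map on [0, theta): T(x) = 1/x - theta*[1/(theta*x)],
    with T(0) = 0."""
    if x == 0:
        return 0.0
    return 1.0 / x - theta * math.floor(1.0 / (theta * x))

def orbit(x, theta, n):
    """The first n iterates of T starting from x."""
    out = [x]
    for _ in range(n):
        x = T(x, theta)
        out.append(x)
    return out
```

For x in (0, θ) the digit [1/(θx)] is at least [1/θ²], which is the combinatorial constraint behind the symbolic dynamics described in Theorems 4 through 7.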
8 Invariant Measure for T when 1/θ² ∈ ℕ

In this section we assume that 1/θ² ∈ ℕ. Thus for some integer, say l, 1/θ = lθ. Thus 1/θ has continued fraction expansion terminating at the first stage itself, 1/θ = [lθ]. Throughout this section θ, and hence the integer l, is fixed. We shall now extend the usual argument (see Billingsley [2]) to get an absolutely continuous invariant measure for the above transformation. In fact, we claim that

dP(x) = (1/log(1 + 1/l)) · 1/(√l + x) dx,

which is the same as saying

dP(x) = (1/log(1 + θ²)) · θ/(1 + θx) dx,
is the required invariant measure for $T$. In the present case we are lucky enough to be able to write down the invariant measure explicitly, which is perhaps not possible in general. Since we could not see any direct way of connecting the transformation $T$ with the usual Gauss transformation $U$, we shall verify the above claim by carrying out the same steps as in Billingsley referred to above. In order to show that $T$ preserves $P$, it is enough to show that $P[0,\theta u) = P(T^{-1}[0,\theta u))$ for all $u \in [0,1)$. Since
$$T^{-1}[0,\theta u) = \bigcup_{k=l}^{\infty}\Bigl(\frac{1}{(k+u)\theta},\,\frac{1}{k\theta}\Bigr]$$
(equality is up to a set of Lebesgue measure zero), it is enough to verify the following:
$$\int_0^{\theta u} \frac{\theta}{1+\theta x}\,dx = \sum_{k=l}^{\infty} \int_{1/((k+u)\theta)}^{1/(k\theta)} \frac{\theta}{1+\theta x}\,dx.$$
The sum on the right side, after evaluation of the integrals, is a telescopic sum which equals $\log\frac{l+u}{l} = \log(1+\theta^2 u)$, the same as the left side.
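The invariance claim can be checked numerically. The sketch below assumes (as a reconstruction, since the definition of $T$ appears in an earlier section) that $T(x) = 1/x - \theta\lfloor 1/(\theta x)\rfloor$ on $(0,\theta)$, which is consistent with the preimage intervals used in the verification above; it compares the orbit-average occupation of $[0,\theta/2)$ against the claimed density $\theta/((1+\theta x)\log(1+\theta^2))$ for $l = 2$.

```python
import math, random

# Monte Carlo check of the claimed invariant density for the generalized Gauss
# map.  The map T(x) = 1/x - theta*floor(1/(theta*x)) is a reconstruction
# (hypothetical; T is defined in an earlier section).  Here 1/theta^2 = l = 2.
l = 2
theta = 1.0 / math.sqrt(l)

def T(x):
    y = 1.0 / x
    return y - theta * math.floor(y / theta)

random.seed(0)
a = theta / 2
hits = total = 0
for _ in range(50):                      # several orbits from random seeds
    x = random.uniform(1e-6, theta)
    for _ in range(10_000):
        if x <= 1e-12:                   # avoid the measure-zero set where T is undefined
            x = random.uniform(1e-6, theta)
        if x < a:
            hits += 1
        total += 1
        x = T(x)

empirical = hits / total
predicted = math.log(1 + theta * a) / math.log(1 + theta ** 2)   # P([0, a))
print(empirical, predicted)              # both ≈ 0.55
```

The predicted value is $P([0,\theta/2)) = \log(1+\theta^2/2)/\log(1+\theta^2) = \log 1.25/\log 1.5 \approx 0.55$, and the ergodicity proved below justifies comparing it with an orbit average.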
We now show that $P$ is ergodic too. As in Billingsley [2], we introduce the sets $\Delta_{a_1,a_2,\dots,a_n}$ and the maps $\psi_{a_1,a_2,\dots,a_n} : [0,\theta) \to [0,\theta)$ as follows. $\Delta_{a_1,a_2,\dots,a_n}$ is the set of all $x$ such that $a_i(x) = a_i$ for $i = 1,2,\dots,n$. In view of the discussion in Section 5, $\Delta_{a_1,a_2,\dots,a_n}$ may be empty for some $n$-tuples $(a_1,a_2,\dots,a_n)$. In what follows we assume that $\Delta_{a_1,a_2,\dots,a_n}$ is non-empty for the $n$-tuple $(a_1,a_2,\dots,a_n)$ under consideration. $\psi_{a_1,a_2,\dots,a_n}$ is given by
$$\psi_{a_1,a_2,\dots,a_n}(t) = \cfrac{1}{a_1\theta + \cfrac{1}{a_2\theta + \cfrac{1}{\,\ddots\, + \cfrac{1}{a_n\theta + t}}}}, \qquad t \in [0,\theta).$$
Then $\Delta_{a_1,a_2,\dots,a_n}$ is the image of $[0,\theta)$ under $\psi_{a_1,a_2,\dots,a_n}$. One can show that
$$\psi_{a_1,a_2,\dots,a_n}(t) = \frac{p_n + t\,p_{n-1}}{q_n + t\,q_{n-1}}$$
for $t \in [0,\theta)$, just as in (5.5). Recall that $p_n/q_n = [a_1\theta, a_2\theta, \dots, a_n\theta]$. Also $\psi_{a_1,a_2,\dots,a_n}(t)$ is decreasing for odd $n$ and increasing for even $n$. So,
$$\Delta_{a_1,a_2,\dots,a_n} = \begin{cases} \Bigl[\dfrac{p_n}{q_n},\ \dfrac{p_n + \theta p_{n-1}}{q_n + \theta q_{n-1}}\Bigr) & \text{if } n \text{ even},\\[2mm] \Bigl(\dfrac{p_n + \theta p_{n-1}}{q_n + \theta q_{n-1}},\ \dfrac{p_n}{q_n}\Bigr] & \text{if } n \text{ odd}. \end{cases}$$
Using (5.2), we see
$$\lambda(\Delta_{a_1,a_2,\dots,a_n}) = \frac{\theta}{q_n(q_n + \theta q_{n-1})}, \eqno(8.1)$$
where $\lambda$, as usual, denotes Lebesgue measure.
θ-expansions and the generalized Gauss map
Let us denote $\Delta_{a_1,a_2,\dots,a_n}$ and $\psi_{a_1,a_2,\dots,a_n}$ by $\Delta_n$ and $\psi_n$ respectively; here we fix $a_1,a_2,\dots,a_n$. Then $\Delta_n$ has length $|\psi_n(\theta) - \psi_n(0)|$. Also, for $0 \le x < y \le \theta$, the interval $\{\omega : x \le T^n(\omega) < y\} \cap \Delta_n$ has length $|\psi_n(y) - \psi_n(x)|$. So, using the notation $\lambda(A\,|\,B) = \lambda(A \cap B)/\lambda(B)$, we have
$$\lambda(T^{-n}[x,y)\,|\,\Delta_n) = \frac{\psi_n(y) - \psi_n(x)}{\psi_n(\theta) - \psi_n(0)}. \eqno(8.2)$$
In absolute value the numerator equals
$$\frac{y-x}{(q_n + x q_{n-1})(q_n + y q_{n-1})}$$
and the denominator equals
$$\frac{\theta}{q_n(q_n + \theta q_{n-1})}.$$
After some algebra, since $\theta q_{n-1} \le q_n$, the right hand side of (8.2) is at most $\frac{2(y-x)}{\theta}$. Again, $\frac{q_n}{q_n + \theta q_{n-1}} \ge \frac{1}{2}$, and hence the right hand side of (8.2) is at least $\frac{y-x}{2\theta}$. Thus,
$$\frac{y-x}{2\theta} \le \lambda(T^{-n}[x,y)\,|\,\Delta_n) \le \frac{2(y-x)}{\theta}.$$
Hence, for any Borel set $A$ also, we have
$$\frac{\lambda(A)}{2\theta} \le \lambda(T^{-n}(A)\,|\,\Delta_n) \le \frac{2\lambda(A)}{\theta}. \eqno(8.3)$$
Now, since $0 \le x < \theta$,
$$\frac{\theta}{\log(1+\theta^2)}\,\frac{1}{1+\theta^2} \le \frac{\theta}{\log(1+\theta^2)}\,\frac{1}{1+\theta x} \le \frac{\theta}{\log(1+\theta^2)}.$$
Hence, for any Borel set $M$, we have
$$\frac{\theta}{\log(1+\theta^2)}\,\frac{\lambda(M)}{1+\theta^2} \le P(M) \le \frac{\theta}{\log(1+\theta^2)}\,\lambda(M). \eqno(8.4)$$
So
$$\lambda(M) \le \frac{(1+\theta^2)\log(1+\theta^2)}{\theta}\,P(M) \quad\text{and}\quad \lambda(M) \ge \frac{\log(1+\theta^2)}{\theta}\,P(M).$$
Therefore, using these inequalities together with (8.3) and (8.4), we get the following:
$$C_1(\theta)\,P(A) \le P(T^{-n}(A)\,|\,\Delta_n) \le C_2(\theta)\,P(A),$$
where $C_1, C_2$ are constants depending on $\theta$ only. Now if $A$ is invariant, the above inequality becomes
$$C_1(\theta)\,P(A) \le P(A\,|\,\Delta_n) \le C_2(\theta)\,P(A).$$
Assuming $P(A) > 0$, we get
$$C_1(\theta)\,P(\Delta_n) \le P(\Delta_n\,|\,A) \le C_2(\theta)\,P(\Delta_n).$$
Hence, for any Borel set $E$,
$$C_1(\theta)\,P(E) \le P(E\,|\,A) \le C_2(\theta)\,P(E).$$
Taking $E = A^c$, one gets $P(A^c) = 0$, so that $P(A) = 1$. Therefore, $T$ is ergodic under $P$, as claimed.

We now prove that (*) and (**) also hold, again following Billingsley closely. By the ergodic theorem, if $f$ is any non-negative function on $[0,\theta]$, integrable or not, we have
$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k(\omega)) = \frac{1}{\log(1+\theta^2)} \int_0^\theta f(x)\,\frac{\theta}{1+\theta x}\,dx \qquad \text{a.e. } [P].$$
Taking $f = a_1$, the right hand side becomes
$$\frac{1}{\log(1+\theta^2)} \int_0^\theta a_1(x)\,\frac{\theta}{1+\theta x}\,dx = \frac{1}{\log(1+\theta^2)} \sum_{k=l}^{\infty} k \int_{1/((k+1)\theta)}^{1/(k\theta)} \frac{\theta}{1+\theta x}\,dx = \frac{1}{\log(1+\theta^2)} \sum_{k=l}^{\infty} k\,\log\Bigl(1 + \frac{1}{k^2+2k}\Bigr) = \infty.$$
Thus,
$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} a_1(T^k(\omega)) = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} a_k(\omega) = \infty \qquad \text{a.e. } [P].$$
This proves (*). Towards (**), first notice that (8.5) holds. Also, from (5.5), one obtains
$$\Bigl|\log\frac{p_n}{q_n x}\Bigr| \le \log\Bigl(1 + \frac{1}{(1+\theta^2)^n}\Bigr).$$
$$\hat u(t,\xi) = e^{-\lambda(\xi)t}\,\hat u_0(\xi) + m(\xi)\int_0^t \lambda(\xi)\,e^{-\lambda(\xi)s}\,\frac{1}{3}\bigl\{\hat u(t-s,\xi+1) + \hat u(t-s,\xi-1) + \hat u(t-s,\xi)\bigr\}\,ds, \eqno(2.10)$$
where the multiplicative factor $m(\xi) = 3/\lambda(\xi)$ is introduced to write the recursion (2.10) in the form of an expected value. Namely,
where

i. $S_0$ is exponentially distributed with parameter $\lambda(\xi_0)$.

ii. Conditionally given $\xi_0 = \xi$, $\xi_1$ is $\xi$ or $\xi \pm 1$ with equal probabilities $\frac{1}{3}$ each, independently of $S_0$.
On Ito's Complex Measure Condition
iii. $\kappa_0$ is a 0-1 valued symmetric Bernoulli (coin tossing) random variable, independent of $S_0$, $\xi_1$.

iv. $m(\xi)\lambda(\xi) = 3$ for all $\xi \in \mathbb{R}$.

In view of this structure one is naturally led to the jump Markov process $\{\xi(t) : t \ge 0\}$ defined on a probability space $(\Omega, \mathcal{F}, P)$, starting at $\xi_0 = \xi$ in Fourier frequency space $\mathbb{R}$, having the simple symmetric random walk $\xi_0, \xi_1, \dots$ as discrete spatial structure and positive infinitesimal rates $\lambda(\cdot)$; see Blumenthal and Getoor (1968) for a detailed construction of the strong Markov process $\{\xi(t) : t \ge 0\}$ so specified. Additionally, $\kappa_0, \kappa_1, \kappa_2, \dots$ is an i.i.d. Bernoulli 0-1 valued sequence independent of the jump process $\{\xi(t) : t \ge 0\}$. Now consider the multiplicative random functional defined recursively: it equals $\hat u_0(\xi_0)$ if $S_0 > t$, and otherwise involves the factor $m(\xi_0)$ at each jump.
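The expected-value form of (2.10) can be evaluated by Monte Carlo over the jump process. The sketch below uses a deliberately simplified toy setting of our own (constant rate $\lambda$, initial data $\hat u_0 \equiv 1$, no branching), in which the multiplicative functional reduces to $m^{N_t}$ and its expectation is exactly $e^{(3-\lambda)t}$, since $E\,m^{N_t} = e^{\lambda t(m-1)}$ with $m = 3/\lambda$.

```python
import math, random

# Monte Carlo evaluation of an expected-value recursion of the type (2.10),
# in a simplified hypothetical setting (not the paper's exact model):
# constant jump rate lam, multiplier m = 3/lam, u0 ≡ 1, and jumps to
# xi, xi+1, xi-1 with probability 1/3 each.  Then E m^{N_t} = exp((3-lam)t).
random.seed(1)
lam, t = 2.0, 1.0
m = 3.0 / lam

def sample(xi=0):
    s, prod = 0.0, 1.0
    while True:
        s += random.expovariate(lam)       # holding time S_i ~ Exp(lam)
        if s > t:
            return prod                     # times u0(xi), with u0 ≡ 1
        prod *= m                           # multiplicative factor at each jump
        xi += random.choice((-1, 0, 1))     # xi-1, xi, or xi+1, prob 1/3 each

N = 200_000
est = sum(sample() for _ in range(N)) / N
print(est, math.exp((3.0 - lam) * t))       # both ≈ e ≈ 2.718
```

Because $\hat u_0$ is constant and the spatial rule is symmetric, the estimator's mean coincides with the mild solution $u(t) = e^{(3-\lambda)t}$ of the simplified recursion; this is only an illustration of the representation, not the paper's actual cascade.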
$b = (b_j(x) : 1 \le j \le n)$ and $c(x)$ have the property that the Fourier transform of each term is a complex measure. We will also permit an additional forcing term $g(t,x)$ for which the Fourier transform $\hat g(t,\xi)$ is assumed to exist as a function. Precise conditions characterizing when a function is the Fourier transform of a complex measure are not known to us, though various examples and sufficient conditions are relatively easy to provide. For example, Bochner's theorem may be used to get a sufficient condition for this in terms of non-negative definiteness. We consider the Cauchy problem
$$\frac{\partial u}{\partial t} = Lu + g, \qquad u(0,x) = u_0(x). \eqno(3.15)$$
In view of the linearity of the equation, the term $-\epsilon u$, $\epsilon > 0$, appearing in (3.15), for $L$ defined by (3.14), causes no loss in generality for applications to equations with $\epsilon = 0$. Let
$$\lambda(\xi) = \langle A\xi, \xi\rangle + \epsilon, \eqno(3.16)$$
where $\langle\cdot,\cdot\rangle$ is the ordinary dot product. Then, taking Fourier transforms in (3.15), one has by the integrating factor method that the Fourier transform $\hat u$ satisfies the integral equation (3.17).
A solution of the integral equation version (3.17) of the Fourier-transformed differential equation is referred to as a mild solution of the Fourier transform. The hypothesis that each of the lower order coefficients contributes a complex measure provides a set of up to four probability measures, obtained by considering the positive and negative parts of each of the real and imaginary parts. To obtain a random walk distribution we proceed as follows. Define positive measures $q, Q$ on $\mathbb{R}^n$ by
$$q(B) = |c|(B) + \sum_{j=1}^{n} |b_j|(B), \qquad B \in \mathcal{B}^n, \eqno(3.18)$$
and, assuming $q(\mathbb{R}^n) > 0$,
$$Q(B) = \frac{q(B)}{q(\mathbb{R}^n)}; \eqno(3.19)$$
we leave the case $q(\mathbb{R}^n) = 0$ as a simple but illuminating exercise for the reader. Then $Q$ is a probability distribution which dominates each of the complex measures $b_j, c$, $j = 1,2,\dots,n$. Let the corresponding Radon-Nikodym derivatives be denoted by
$$r_0(\eta) = \frac{dc}{dQ}(\eta) \eqno(3.20)$$
and
$$r_j(\eta) = \frac{db_j}{dQ}(\eta), \qquad j = 1,\dots,n. \eqno(3.21)$$
Now let $\{\xi_n : n = 0,1,2,\dots\}$ be the random walk on $\mathbb{R}^n$ with i.i.d. displacements $\eta_1, \eta_2, \dots$ distributed according to $Q$. That is, $\xi_0 = \xi$ and $\xi_n = \xi_{n-1} - \eta_n$ for $n \ge 1$. Also let $\{\xi(t) : t \ge 0\}$ denote the corresponding pure jump Markov process on $\mathbb{R}^n$ with holding times $S_0, S_1, \dots$ defined by the positive rates $\lambda(\xi_n) = \langle A\xi_n, \xi_n\rangle + \epsilon$, $n = 0,1,2,\dots$, respectively. Let $\{N_t : t \ge 0\}$ denote the corresponding counting process of the number of jumps by time $t$, and let $\kappa_0, \kappa_1, \dots$ be i.i.d. Bernoulli 0-1 valued random variables on $(\Omega, \mathcal{F}, P)$ independent of the jump process $\{\xi(t) : t \ge 0\}$. The Bernoulli coin tossing sequence will induce "virtual states" upon the occurrence of $\kappa_j = 0$. Let
$$m_0(\xi) = \frac{2(n+1)}{(2\pi)^{n/2}\,\lambda(\xi)} \eqno(3.22)$$
and
$$m_j(\xi) = i\,\xi_j\,m_0(\xi), \qquad j = 1,2,\dots,n. \eqno(3.23)$$
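The construction (3.18)-(3.21) is easy to carry out explicitly when the complex measures are supported on finitely many atoms. The sketch below uses hypothetical toy data of our own (one spatial dimension, three atoms) and checks the two structural facts used later: $Q$ is a probability measure, and each Radon-Nikodym derivative is bounded by $q(\mathbb{R}^n)$.

```python
import numpy as np

# Toy instance of (3.18)-(3.21): complex measures on finitely many atoms
# (hypothetical data, for illustration only; n = 1 here).
atoms = np.array([-1.0, 0.5, 2.0])               # support points
b1 = np.array([0.3 - 0.2j, -0.1j, 0.4])          # complex weights of b_1
c = np.array([0.1, -0.25 + 0.05j, 0.2j])         # complex weights of c

q = np.abs(c) + np.abs(b1)                       # total-variation mix, (3.18)
q_total = q.sum()
Q = q / q_total                                  # probability weights, (3.19)

r0 = c / Q                                       # dc/dQ, (3.20)
r1 = b1 / Q                                      # db_1/dQ, (3.21)

print(Q.sum())                                   # 1.0
print(np.abs(r0).max() <= q_total + 1e-9,
      np.abs(r1).max() <= q_total + 1e-9)        # True True: |r_j| <= q(R^n)
```

The bound $|r_j| \le q(\mathbb{R}^n)$ holds because $|b_j| \le q$ and $|c| \le q$ pointwise, which is exactly what makes $Q$ a dominating probability distribution.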
Substituting (3.22) and (3.23) into (3.17) gives
$$\hat u(t,\xi) = e^{-\lambda(\xi)t}\,\hat u_0(\xi) + \int_0^t \lambda(\xi)\,e^{-\lambda(\xi)s}\Bigl[\frac{1}{2}\,\frac{1}{n+1}\sum_{j=0}^{n}\int_{\mathbb{R}^n} m_j(\xi)\,r_j(\eta)\,\hat u(t-s,\xi-\eta)\,Q(d\eta) + \frac{1}{2}\,\frac{\hat g(t-s,\xi)}{\lambda(\xi)}\Bigr]\,ds$$
$$= E_{\xi_0=\xi}\bigl\{1[S_0 \ge t]\,\hat u_0(\xi_0) + 1[S_0 < t]\,m_J(\xi_0)\,r_J(\eta_1)\,\hat u(t-S_0,\xi_1)\cdots\bigr\}.$$
Lemma 3.1. For any $\xi \in \mathbb{R}^n$ and $s > 0$,
$$m_0(\xi)\,e^{-\lambda(\xi)s} \le m_0(0)\,e^{-\epsilon s}. \eqno(3.29)$$

Proof. The matrix $A$ is positive definite, giving $\lambda(\xi) \ge \lambda(0) = \epsilon > 0$ for any $\xi \in \mathbb{R}^n$. Then
$$m_0(\xi)\,e^{-\lambda(\xi)s} = \frac{2(n+1)}{(2\pi)^{n/2}}\,\frac{e^{-\lambda(\xi)s}}{\lambda(\xi)} \le \frac{2(n+1)}{(2\pi)^{n/2}}\,\frac{e^{-\epsilon s}}{\epsilon} = m_0(0)\,e^{-\epsilon s}. \eqno(3.31)$$
$\square$

For $r > 0$ let $g(r,\cdot)$ denote the density of a gamma random variable having shape parameter $r$ and scale parameter $\epsilon$; that is, $g(r,s) = \epsilon^r s^{r-1} e^{-\epsilon s}/\Gamma(r)$ for $s > 0$. Also define
$$a := \max_{1\le j\le n}\ \sup_{\xi\in\mathbb{R}^n} \frac{|\xi_j|}{\sqrt{\langle A\xi,\xi\rangle}} > 0. \eqno(3.32)$$
Since $A$ is positive definite, $a$ is finite.
Lemma 3.2. For any $\xi \in \mathbb{R}^n$, $j = 1,\dots,n$, and $s > 0$, (3.33) holds.

Proof: First note that $\sup_{y>0} y\,e^{-y^2} = (2e)^{-1/2}$. Then, for any $\xi \in \mathbb{R}^n$, $s > 0$, and $j = 1,\dots,n$,
$$m_0(\xi)\,|\xi_j|\,e^{-\lambda(\xi)s} \le m_0(0)\,a\,\sqrt{\langle A\xi,\xi\rangle}\;e^{-\langle A\xi,\xi\rangle s}\,e^{-\epsilon s} \le m_0(0)\,\frac{a}{\sqrt{2es}}\,e^{-\epsilon s}. \eqno(3.33)$$
where an empty product is assigned value one. Therefore, letting $q = q(\mathbb{R}^n)$, one has
$$|\hat u(t,\xi)| \le B \sum_{k\ge 0} E_{\xi_0=\xi} \prod_{i=1}^{k} \bigl|q\,m_{J_{i-1}}(\xi_{i-1})\bigr|^{1-\sigma_{i-1}}\,1[K \wedge N_t = k].$$
For each $k$ it is helpful to introduce the mutually dependent pair of binomially distributed random variables
$$X_k = \sum_{i=0}^{k-1}(1-\sigma_i)\,1[J_i \ge 1], \qquad Y_k = \sum_{i=0}^{k-1}(1-\sigma_i)\,1[J_i = 0]. \eqno(4.67)$$
Also, in the case that $\sigma_0 = \cdots = \sigma_{k-1} = 0$ set $h_k = 1$; else let $h_k$ denote the density of $\sum_{i=0}^{k-1}\sigma_i S_i$ conditional on the $\sigma_i$'s, $J_i$'s, and $\xi_i$'s. Then, proceeding similarly as in the proof of Theorem 3.1, consider
$$A_k := E_{\xi_0=\xi} \prod_{i=0}^{k-1} \bigl|q\,m_{J_i}(\xi_i)\bigr|^{1-\sigma_i}\,1[K \wedge N_t = k].$$
$P_t(x,\cdot)$ is the transition probability. The study of the existence of the equilibrium $\pi$ and of the speed of convergence to equilibrium, by Bhattacharya and his collaborators, constitutes a fundamental contribution to the field. See for instance [2]-[6] and references therein. The second line in (1.8) is correct for diffusions but incorrect in the discrete situation. In general, one has to replace "$\Longleftrightarrow$" by "$\Longrightarrow$". Here are three examples which distinguish the different inequalities.
Table 1.1  Examples: diffusions on [0, ∞)

                                  Ergodicity   Poincaré   LogS    L¹-exp.   Nash
  b(x) = 0,  a(x) = x^γ            γ > 1        γ ≥ 2      γ > 2   γ > 2     γ > 2
  b(x) = 0,  a(x) = x² log^γ x     ✓            γ ≥ 0      γ ≥ 1   γ > 1     ✗
  a(x) = 1,  b(x) = −b             ✓            ✓          ✗       ✗         ✗
Mu-Fa Chen
Here, in the first line, "LogS" means the logarithmic Sobolev inequality and "L¹-exp." means the L¹-exponential convergence, which will not be discussed in this paper. "✓" means always true and "✗" means never true, with respect to the parameters. Once the criteria presented in this paper are known, it is easy to check Table 1.1, except for the L¹-exponential convergence. The remainder of the paper is organized as follows. In the next section, we review the criteria for (1.1) and (1.2), the dual variational formulas, and explicit estimates of A and A. Then we extend these results, in part, to Banach spaces, first for the Dirichlet case and then for the Neumann one. For a very general setup of Banach spaces, the resulting conclusions are still rather satisfactory. Next, we specialize the results to Orlicz spaces and finally apply them to the Nash inequalities and the logarithmic Sobolev inequality. Since each topic discussed subsequently has a long history and contains a large number of publications, it is impossible to collect in the present paper a complete list of references; we emphasize recent progress and related references only. For applications to the higher-dimensional case and many more results, readers are urged to refer to the original papers listed in the References and, in particular, to the informal book [13].
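The spectral-gap (Poincaré) picture behind "speed of convergence to equilibrium" can be illustrated numerically. The sketch below uses an example of our own choosing, not one from Table 1.1: the Ornstein-Uhlenbeck diffusion with $a(x) = 1$, $b(x) = -x$, whose generator $Lf = f'' - xf'$ has spectral gap exactly 1; a finite-difference discretization with reflecting (Neumann) ends recovers this gap.

```python
import numpy as np

# Finite-difference approximation of the Ornstein-Uhlenbeck generator
# L f = f'' - x f' on [-8, 8] with reflecting ends (our illustrative example).
# The smallest nonzero eigenvalue of -L is the spectral gap, which is 1.
m = 401
x, h = np.linspace(-8.0, 8.0, m, retstep=True)

A = np.zeros((m, m))
for i in range(m):
    up = 1 / h**2 - x[i] / (2 * h)     # coefficient of f_{i+1}
    dn = 1 / h**2 + x[i] / (2 * h)     # coefficient of f_{i-1}
    if i + 1 < m:
        A[i, i + 1] = up
    if i - 1 >= 0:
        A[i, i - 1] = dn
    # fold the reflected ghost node into the diagonal: rows sum to zero
    A[i, i] = -((up if i + 1 < m else 0.0) + (dn if i - 1 >= 0 else 0.0))

ev = np.sort(-np.linalg.eigvals(A).real)
gap = ev[1]                             # ev[0] ≈ 0 (constants)
print(round(gap, 2))                    # ≈ 1.0
```

The off-diagonal entries stay positive on this grid, so the matrix is a genuine birth-death generator and its spectrum is real; the gap governs the exponential rate of convergence to equilibrium discussed above.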
2  Ordinary Poincaré inequalities
In this section, we introduce the criteria for (1.1) and (1.2), the dual variational formulas and explicit estimates of A and A.
To state the main results, we need some notation. Write $x \wedge y = \min\{x,y\}$ and, similarly, $x \vee y = \max\{x,y\}$. Define
$$\begin{aligned}
\mathcal{F} &= \bigl\{f \in C[0,D] \cap C^1(0,D) : f(0) = 0,\ f'|_{(0,D)} > 0\bigr\},\\
\widetilde{\mathcal{F}} &= \bigl\{f \in C[0,D] : f(0) = 0,\ \text{there exists } x_0 \in (0,D] \text{ so that } f = f(\cdot\wedge x_0),\ f \in C^1(0,x_0),\ \text{and } f'|_{(0,x_0)} > 0\bigr\},\\
\mathcal{F}' &= \bigl\{f \in C[0,D] : f(0) = 0,\ f|_{(0,D)} > 0\bigr\},\\
\widetilde{\mathcal{F}}' &= \bigl\{f \in C[0,D] : f(0) = 0,\ \text{there exists } x_0 \in (0,D] \text{ so that } f = f(\cdot\wedge x_0) \text{ and } f|_{(0,x_0)} > 0\bigr\}.
\end{aligned} \eqno(2.1)$$
Here the sets $\mathcal{F}$ and $\mathcal{F}'$ are essential: they are used, respectively, to define below the operators of single and double integrals, and they are used for the upper bounds. The sets $\widetilde{\mathcal{F}}$ and $\widetilde{\mathcal{F}}'$ are less essential; they are simply modifications of $\mathcal{F}$ and $\mathcal{F}'$, respectively, made to avoid integrability problems, and are used for the lower bounds. Define
$$I(f)(x) = \frac{e^{-G(x)}}{f'(x)} \int_x^D \bigl[f e^{G}/a\bigr](u)\,du, \qquad f \in \mathcal{F}, \eqno(2.2)$$
$$II(f)(x) = \frac{1}{f(x)} \int_0^x dy\; e^{-G(y)} \int_y^D \bigl[f e^{G}/a\bigr](u)\,du, \qquad f \in \mathcal{F}'.$$
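To make the single-integral operator concrete, here is a small numerical check in a setting of our own choosing (not one of the paper's examples): the half-line diffusion with $a(x) = 1$, $b(x) = -x$, so $G(x) = -x^2/2$, and the trial function $f(x) = x$. For this pair, $I(f)(x) = e^{x^2/2}\int_x^D u\,e^{-u^2/2}\,du$ is identically 1 (up to truncating $D = \infty$ at $D = 10$).

```python
import numpy as np

# Numerical evaluation of the single-integral operator I(f) of (2.2) for
# a(x) = 1, b(x) = -x, G(x) = -x^2/2, trial function f(x) = x (our example).
D = 10.0
grid, h = np.linspace(0.0, D, 100_001, retstep=True)
integrand = grid * np.exp(-grid**2 / 2)      # [f e^G / a](u) with a = 1

# cumulative trapezoid from the left; tail integral = total - cumulative
cum = np.concatenate(([0.0], np.cumsum((integrand[1:] + integrand[:-1]) / 2 * h)))
tail = cum[-1] - cum

idx = [0, 10_000, 20_000, 30_000]            # x = 0, 1, 2, 3
vals = [np.exp(grid[i]**2 / 2) * tail[i] for i in idx]   # f'(x) = 1
print([round(v, 4) for v in vals])           # each ≈ 1.0
```

By the dual variational formulas recalled below, quantities of the form $\sup_x I(f)(x)$ bound the spectral gap from one side; for this trial function the bound is exactly 1, which agrees with the known Ornstein-Uhlenbeck spectral gap.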
The next result is taken from [12; Theorems 1.1 and 1.2]. The word "dual" below means that the upper and lower bounds are interchangeable if one exchanges the orders of "sup" and "inf" with a slight modification of the set F (resp., F') of test functions.
Poincaré-type Inequalities
define
$$\|f\| = \sup_{g\in\mathcal{G}} \int_E |f|\,g\,d\mu, \eqno(4.3)$$
where $\mathcal{G} = \bigl\{g \ge 0 : \int_E \Phi(g)\,d\mu \le 1\bigr\}$ is the set of non-negative functions in the unit ball of $L$.

such that, for each $\epsilon > 0$, the lim sup of (4.18) is less than or equal to $L\epsilon$. Given $\epsilon > 0$, choose a positive integer $m$ so that $a_j^2/n \le \epsilon^2$ and $b_j^2/n \le \epsilon^2$ for $j > m$ and all $n$. Given any subsequence $n_l$ of the positive integers, choose a subsequence $n_{l_s}$ which satisfies
$$\frac{a_j}{\sqrt{n_{l_s}}} \to \alpha_j, \qquad \frac{b_j}{\sqrt{n_{l_s}}} \to \beta_j \qquad \text{as } l_s \to \infty,\quad j = 1,2,\dots,m. \eqno(4.19)$$
As before, we will suppress the subsequence notation. The quantity (4.18) is less than or equal to the sum of the following three terms
$$\Bigl|E\bigl(e^{ir(x\sum_{j=1}^n a_jX_j + y\sum_{j=1}^n b_jY_j)}\bigr) - e^{-\frac{r^2}{4n}(x^2\sum_{j=m+1}^n a_j^2 + y^2\sum_{j=m+1}^n b_j^2)}\,E\bigl(e^{ir(x\sum_{j=1}^m a_jX_j + y\sum_{j=1}^m b_jY_j)}\bigr)\Bigr|, \eqno(4.20)$$
$$\Bigl|e^{-\frac{r^2}{4n}(x^2\sum_{j=m+1}^n a_j^2 + y^2\sum_{j=m+1}^n b_j^2)}\,E\bigl(e^{ir(x\sum_{j=1}^m a_jX_j + y\sum_{j=1}^m b_jY_j)}\bigr) - e^{-\frac{r^2}{4n}x^2\sum_{j=m+1}^n a_j^2}\,e^{-\frac{r^2}{4}x^2\sum_{j=1}^m \alpha_j^2}\,e^{-\frac{r^2}{4n}y^2\sum_{j=m+1}^n b_j^2}\,e^{-\frac{r^2}{4}y^2\sum_{j=1}^m \beta_j^2}\Bigr|, \eqno(4.21)$$
$$\Bigl|e^{-\frac{r^2}{4n}x^2\sum_{j=m+1}^n a_j^2}\,e^{-\frac{r^2}{4}x^2\sum_{j=1}^m \alpha_j^2}\,e^{-\frac{r^2}{4n}y^2\sum_{j=m+1}^n b_j^2}\,e^{-\frac{r^2}{4}y^2\sum_{j=1}^m \beta_j^2} - e^{-\frac{r^2x^2}{4}}\,e^{-\frac{r^2y^2}{4}}\Bigr|. \eqno(4.22)$$
Brownian Motion and the Classical Groups
Since
$$\frac{1}{n}\sum_{j=m+1}^n a_j^2 \to 1 - \sum_{j=1}^m \alpha_j^2 \qquad\text{and}\qquad \frac{1}{n}\sum_{j=m+1}^n b_j^2 \to 1 - \sum_{j=1}^m \beta_j^2,$$
the term (4.22) converges to zero. By a known result (see, e.g., Lemma 5.3 of [33]),
$$(\sqrt{n}X_1,\dots,\sqrt{n}X_m,\sqrt{n}Y_1,\dots,\sqrt{n}Y_m) \Rightarrow \frac{1}{\sqrt{2}}(Z_1,Z_2,\dots,Z_{2m}),$$
where the $Z_i$ are i.i.d. $N(0,1)$. Thus
$$E\bigl(e^{ir(x\sum_{j=1}^m a_jX_j + y\sum_{j=1}^m b_jY_j)}\bigr) \to e^{-\frac{r^2}{4}x^2\sum_{j=1}^m \alpha_j^2}\,e^{-\frac{r^2}{4}y^2\sum_{j=1}^m \beta_j^2},$$
and hence (4.21) converges to zero. To bound (4.20), we first claim that
$$E\bigl(e^{ir(x\sum_{j=1}^n a_jX_j + y\sum_{j=1}^n b_jY_j)}\bigr) = E\Bigl(G\prod_{j=m+1}^n \cos(rxa_jX_j)\prod_{j=m+1}^n \cos(ryb_jY_j)\Bigr).$$
To see this, let $G = e^{ir(x\sum_{j=1}^m a_jX_j + y\sum_{j=1}^m b_jY_j)}$ and note that
$$e^{ir(x\sum_{j=1}^n a_jX_j + y\sum_{j=1}^n b_jY_j)} = G\Bigl(\prod_{j=m+1}^n \cos(rxa_jX_j)\Bigr)\Bigl(\prod_{j=m+1}^n \cos(ryb_jY_j)\Bigr)$$
plus a sum of products of the form $GJ$, where $J$ is a product of sines and cosines involving at least one sine term. To establish our claim, it is enough to verify that the expectation of any such $GJ$ is zero. First suppose $J$ contains the factor $\sin(rxa_jX_j)$ but not the factor $\sin(ryb_jY_j)$. Then $E(GJ) = 0$ by the sign-symmetry of the diagonal elements of $\Delta$. Next consider a product $GJ$ containing a factor $\sin(rxa_jX_j)\sin(ryb_jY_j)$. The diagonal elements of $\Delta$ are also exchangeable, and so we can assume $j = m+1$. Write
Anthony D'Aristotile, Charles M. Newman and Persi Diaconis
where $U_n$ is the unitary group and $\mu_n$ is Haar measure. For $\theta \in [0,2\pi]$, let $D(\theta)$ be the $n \times n$ diagonal matrix $\mathrm{Diag}(1,1,\dots,1,e^{i\theta},1,\dots,1)$ where $e^{i\theta}$ is in position $m+1$. By the invariance of Haar measure, $D(\theta)\Delta$ has the same distribution as $\Delta$, and so
$$\int_{U_n} H\,\sin\bigl(rxa_{m+1}(s\cos(\gamma+\theta))\bigr)\,\sin\bigl(ryb_{m+1}(s\sin(\gamma+\theta))\bigr)\,d\mu_n = f.$$
Thus
$$\int_0^{2\pi}\!\int_{U_n} H\,\sin\bigl(rxa_{m+1}(s\cos(\gamma+\theta))\bigr)\,\sin\bigl(ryb_{m+1}(s\sin(\gamma+\theta))\bigr)\,d\mu_n\,d\theta = 2\pi f.$$
By Fubini's Theorem [35], we have
$$\int_{U_n} H \int_0^{2\pi} \sin\bigl(rxa_{m+1}(s\cos(\gamma+\theta))\bigr)\,\sin\bigl(ryb_{m+1}(s\sin(\gamma+\theta))\bigr)\,d\theta\,d\mu_n = 2\pi f.$$
Next let $l(\theta) = \sin\bigl(rxa_{m+1}(s\cos\theta)\bigr)\,\sin\bigl(ryb_{m+1}(s\sin\theta)\bigr)$. Now, $l$ is periodic with period $2\pi$, and shifting $l$ by $\gamma$ units yields a function whose integral over $[0,2\pi]$ coincides with the integral of $l$ over that same interval. Thus
$$\int_{U_n} H \int_0^{2\pi} l(\theta)\,d\theta\,d\mu_n = 2\pi f.$$
However, $l$ is an odd function, and so
$$\int_0^{2\pi} l(\theta)\,d\theta = \int_{-\pi}^{\pi} l(\theta)\,d\theta = 0.$$
It follows that $f = 0$ and our claim is established. Using this fact and arguing as we did in the proof of Theorem 2.1, we have that the expression in (4.20) does not exceed the value
$$\int_{U_n} \Bigl|\prod_{j=m+1}^n \cos(rxa_jX_j)\prod_{j=m+1}^n \cos(ryb_jY_j) - e^{-\frac{r^2x^2}{2}\sum_{j=m+1}^n a_j^2 E(X_j^2)}\,e^{-\frac{r^2y^2}{2}\sum_{j=m+1}^n b_j^2 E(Y_j^2)}\Bigr|\,d\mu_n$$
$$\le r^4x^4 \sum_{j=m+1}^n a_j^4\,E(X_j^4) + \frac{r^2x^2}{2}\Bigl(\mathrm{Var}\Bigl(\sum_{j=m+1}^n a_j^2X_j^2\Bigr)\Bigr)^{\frac12} + r^4y^4 \sum_{j=m+1}^n b_j^4\,E(Y_j^4) + \frac{r^2y^2}{2}\Bigl(\mathrm{Var}\Bigl(\sum_{j=m+1}^n b_j^2Y_j^2\Bigr)\Bigr)^{\frac12}.$$
We can bound this last expression as in the proof of Theorem 1, which leads us to a proper choice of $L$ and completes the proof of Theorem 4.2. $\square$

It is natural to ask if Theorem 3.1 has complex and symplectic analogues. We believe this is the case, but thus far, as in the case of Theorem 2.1, we are able to prove a result of this type only for elements of the diagonals of these classes of matrices. In doing so, we obviously lean heavily on the preceding theorem.
Theorem 4.3. Let $U_n$ be the unitary group of $n \times n$ complex matrices, and let $\Delta = \Gamma + i\Lambda$ be an element of $U_n$ distributed according to Haar measure. Let $d_j = \gamma_{jj} + i\lambda_{jj}$ and let $S_k^n = \sum_{j=1}^k d_j$. If
$$Z_n(t,\omega) = S_{[nt]}^n(\omega), \qquad t \in [0,1],$$
then $Z_n \Rightarrow W$, where $W$ is standard complex-valued Brownian motion ($W = W^{(1)} + iW^{(2)}$, where $W^{(1)}$ and $W^{(2)}$ are independent one-dimensional Brownian motions with drift 0 and diffusion coefficient $\frac12$).

Proof. We appeal to Theorem 5. One can easily adapt the argument for tightness given in Theorem 3.1 to show that $\mathrm{Re}\,Z_n$ is tight. Here $E(\gamma_{11}^2) = \frac{1}{2n}$ and $E(\gamma_{rr}\gamma_{ss}\lambda_{uu}\lambda_{vv}) = 0$ for distinct $r, s, u$, and $v$. Similarly, $\mathrm{Im}\,Z_n$ is tight, and hence $P_n$ is tight, where $P_n$ is the law of $(\mathrm{Re}\,Z_n, \mathrm{Im}\,Z_n)$. By Theorem 4.1, it remains to show that (4.23) holds, where $P$ is the law of $(W^{(1)}, W^{(2)})$. We consider time points $s_1, s_2, t_1$, and $t_2$ where $s_1 < s_2$ and $t_1 < t_2$; one may easily verify that the general case can be handled analogously. Letting $X_n = \mathrm{Re}\,Z_n$ and $Y_n = \mathrm{Im}\,Z_n$, we wish to prove that
However, this statement would follow if
converges in distribution to
Appealing as before to the Cramer-Wold device [5], it suffices to show that
converges in distribution to
for any (a, b, c, d) E ]R4. The remainder of the proof follows by applying Theorem 4.2 in essentially the same way as Theorem 2.1 is applied in the proof of Theorem 3.1. D
5  Symplectic matrices
Recall (see [8]) that the group of symplectic matrices $Sp(n)$ may be identified with the subgroup of $U(2n)$ of matrices of the form
$$\begin{bmatrix} A & -\bar B \\ B & \bar A \end{bmatrix} \in U(2n), \eqno(5.24)$$
Anthony D'Aristotile, Charles M. Newman and Persi Diaconis
113
where $A, B$ are complex $n \times n$ matrices. The trace of random matrices from this group is studied in [14, 16]. As shown there, if $S$ is chosen according to Haar measure in $Sp(n)$, then $\mathrm{Tr}(S), \mathrm{Tr}(S^2), \dots, \mathrm{Tr}(S^k)$ are asymptotically independent normal random variables. We now study the extent to which the diagonal entries of a random symplectic matrix generate Brownian motion. Random matrices in $Sp(n)$ can be generated in the following way. Fill the real and imaginary entries of $A$ and $B$ with real, standard normal i.i.d. random variables. Apply the Gram-Schmidt process to the $n$ complex column vectors of dimension $2n$ which result. We now have a new $A$ and $B$, and we complete the right half of our matrix by following the pattern of (5.24). The matrix obtained in this way is distributed according to Haar measure in $Sp(n)$. To see this, one can adapt the argument given for the construction of a random orthogonal matrix; see, for example, Proposition 7.2 of [17]. We now have
Theorem 5.1. Let $Sp(n)$ be the symplectic group of $2n \times 2n$ complex matrices of the form (5.24), and let $S$ be an element of $Sp(n)$ chosen according to Haar measure $\mu_n$. Let $A = (a_{ij})_{i,j=1}^n$ be the upper left $n \times n$ block of $S$, let $d_i = a_{ii}$, $1 \le i \le n$, and let $S_k^n = \sum_{i=1}^k d_i$. If
$$Z_n(t,\omega) = S_{[nt]}^n(\omega), \qquad t \in [0,1],$$
then $Z_n \Rightarrow \frac{1}{\sqrt{2}}\,W$, where $W$ is standard complex-valued Brownian motion.
Proof. We are working with complex matrices, and so we can follow the arguments of Theorems 4.2 and 4.3. We first need the symplectic analogue of Theorem 4.2. To accomplish this, only one change in the proof of Theorem 4.2 is required: in place of the diagonal matrix $D(\theta)$, we use the $2n \times 2n$ diagonal matrix $D_1(\theta) = \mathrm{Diag}(1,\dots,1,e^{i\theta},1,\dots,1,e^{-i\theta},1,\dots,1)$, where $e^{i\theta}$ and $e^{-i\theta}$ occur in positions $m+1$ and $n+m+1$ respectively. The rest of the arguments for the analogues of Theorems 4.2 and 4.3 are clear. $\square$
It should be noted that we cannot link all $2n$ diagonal entries to obtain Brownian motion. If we were to try, note that $Z_n(\frac12)$ and $Z_n(1) - Z_n(\frac12)$ would tend to limits which are complex conjugates of one another and hence dependent.
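The generation recipe described before Theorem 5.1 can be sketched concretely. The implementation below is our own: it makes explicit the step the text leaves implicit, namely that the Gram-Schmidt pass must project out not only the previous columns but also their "J-conjugates", so that the completed matrix of the form (5.24) is unitary. We verify only the algebraic structure here; the distributional (Haar) claim is the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

def j_conjugate(col):
    """For col = (a; b) in C^{2n}, return (-conj(b); conj(a))."""
    a, b = col[:n], col[n:]
    return np.concatenate([-np.conj(b), np.conj(a)])

# Fill real and imaginary entries with i.i.d. standard normals.
cols = rng.standard_normal((2 * n, n)) + 1j * rng.standard_normal((2 * n, n))

# Gram-Schmidt, projecting out previous columns AND their J-conjugates,
# so that [[A, -conj(B)], [B, conj(A)]] comes out unitary.
left = []
for k in range(n):
    v = cols[:, k].copy()
    for u in left:
        for w in (u, j_conjugate(u)):
            v -= (np.conj(w) @ v) * w
    left.append(v / np.linalg.norm(v))

U = np.column_stack(left + [j_conjugate(u) for u in left])

print(np.allclose(U.conj().T @ U, np.eye(2 * n), atol=1e-8))   # unitary: True
print(np.allclose(U[:n, n:], -np.conj(U[n:, :n])),
      np.allclose(U[n:, n:], np.conj(U[:n, :n])))               # pattern (5.24)
```

The extra projections are harmless because, for the indicated inner product, any vector is automatically orthogonal to its own J-conjugate, and J-conjugation preserves norms.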
Acknowledgement. The authors thank Harry Kesten for explaining how sign-symmetry could be used to show that the trace of a random orthogonal matrix converges to a standard normal distribution at the Bowdoin Conference on random matrices in 1985. They also thank Francis Comets for comments on earlier drafts of this paper. The first author thanks the Department of Statistics of Stanford University for warm hospitality extended to him during the summers of 1994-1997. He also thanks Jeff Rosenthal and Patrick Billingsley for some useful conversations. In addition, he acknowledges support from the Research Foundation of the State University of New York in the form of a PDQ Fellowship. The second and third authors acknowledge research support from the Division of Mathematical Sciences of the National Science Foundation.
Anthony D'Aristotile, Dept. of Mathematics, SUNY at Plattsburgh, Plattsburgh, NY 12901
Persi Diaconis Depts. of Mathematics and Statistics Stanford University Stanford, CA 94305
Charles M. Newman Courant Inst. of Math. Sciences New York University 251 Mercer Street New York, NY 10012
Bibliography [1] Arratia, R., Goldstein, L., and Gordon, L., Poisson Approximation and the Chen-Stein Method, Stat. Science 5, 403-434, 1990. [2] Bai, Z.D., Methodologies in Special Analysis of Large Dimensional Random Matrices. A review, Statist. Sinica 9, 611-677, 1994. [3] Bhattacharya, R. and Waymire, E., Stochastic Processes with Applications, John Wiley and Sons, 1990. [4] Billingsley, P. J., Probability and Measure, Second Edition, John Wiley and Sons, 1986. [5] Billingsley, P. J., Convergence of Probability Measures, John Wiley and Sons, 1968. [6] Borel, E., Sur les principes de la theorie cinetique des gaz, Annales de l'ecole normale sup. 23, 9-32, 1906. [7] Bump, D. and Diaconis, P., Toeplitz minors, Jour. Combin. Th. A. 97, 252-271, 2001. [8] Brackner, T. and tom Dieck, J., Representation of Compact Lie Groups, Springer Verlag, 1985. [9] Daffer, P., Patterson, R., and Taylor, R., Limit Theorems for Sums of Exchangeable Random Variables, Rowman and Allanhold, 1985. [10] D'Aristotile, A., An Invariance Principle for Triangular Arrays, Jour. Theoret. Probab. 13, 327-342, 2000. [11] Diaconis, P., Application of the method of moments in probability and statistics. In H.J. Landau, ed., Moments in Mathematics, 125-142, Amer. Math. Soc., Providence, 1987. [12] Diaconis, P., Patterns in eigenvalues, To appear Bull. Amer. Math. Soc., 2002. [13] Diaconis, P., Eaton, M., and Lauritzen, 1., Finite de Finetti theorems in linear models and multivariate analysis, Scand. J. Stat. 19, 289-315, 1992.
[14] Diaconis, P. and Evans, S., Linear functions of eigenvalues of random matrices, Trans. Amer. Math. Soc. 353, 2615-2633, 2001. [15] Diaconis, P. and Freedman, D., A dozen de Finetti-style results in search of a theory, Ann. Inst. Henri Poincare Sup au n. 2 23, 397-423, 1987. [16] Diaconis, P. and Shahshahani, M., On the eigenvalues of random matrices, J. Appl. Prob. 31A, 49-62, 1994. [17] Eaton, M., Multivariate Statistics, John Wiley and Sons, 1983. [18] Edelman, A., Kostlan, E., and Shub, M., How many eigenvalues of a random matrix are real?, Jour. Amer. Math. Soc. 7, 247-267, 1999. [19] Feller, W., An Introduction to Probability Theory and Its Applications, Vol. II, John Wiley and Sons, 1971. [20] Golub, R. and Van Loan, C., Matrix Computations, 2nd Ed., Johns Hopkins Press, 1993. [21] Hida, T., A role of Fourier transform in the theory of infinite dimensional unitary group, J. Math. Kyoto Univ. 13, 203-212, 1973. [22] Jiang, T.F., Maxima of entries of Haar distributed matrices, Technical Report, Dept. of Statistics, Univ. of Minnesota., 2002. [23] Kuo, H.H., White Noise Distribution Theory, CRC Press, Boca Raton, 1996. [24] Levy, P., Le£;ons d'Analyse Fonctionnelle, Gauthiers-Villars, Paris, 1922. [25] Levy, P., Analyse Fonctionnelle, Memorial des Sciences Mathematiques, Vol. 5, Gauthier-Villars, Paris, 1925. [26] Levy, P., Problemes Concrets d'Analyse Fonctionnelle, Gauthier-Villars, Paris, 1931. [27] Mallows, C., A Note on asymptotic joint normality, Ann. Math. Statist. 43, 508-515, 1972. [28] Maxwell, J.C., Theory of Heat, 4th ed., Longmans, London, 1875. [29] Maxwell, J.C., On Boltzmann's theorem on the average distribution of energy in a system of material points, Cambridge. Phil. Soc. Trans. 12, 547-575, 1878. [30] McKean, H.P., Geometry of Differential Space, Ann. Prob. 1, 197-206, 1973. [31] Mehler, F.G., Ueber die Entwicklung einer Function von beliebig vielen. 
Variablen nach Laplaschen Functionen hoherer Ordnung, Grelle's Journal 66, 161-176, 1866. [32] Mehta, M., Random Matrices, Academic Press, 1991.
[33] Olshanski, G., Unitary representations of infinite-dimensional pairs (G, K) and the formalism of R. Howe. In A.M. Vershik and D.P. Zhelobenko, eds. Representation of Lie Grops and Related Topics, Adv. Studies in Contemp. Math. 7, 269-463, Gordon and Breach, New York, 1990. [34] Rains, E., Normal limit theorems for asymmetric random matrices, Probab. Th. Related Fields 112, 411-423, 1998. [35] Royden, H. L., Real Analysis, 2nd Edition, The Macmillan Company, 1968. [36] Stein, C., The accuracy of the normal approximation to the distribution of the traces of powers of random orthogonal matrices, Technical Report No. 470, Stanford University, 1995.
Transition Density of a Reflected Symmetric Stable Levy Process in an Orthant. Amites Dasgupta and S. Ramasubramanian, Indian Statistical Institute. Abstract: Let $\{Z_{(s,x)}(t) : t \ge s\}$ denote the reflected symmetric $\alpha$-stable Levy process in an orthant $D$ (with nonconstant reflection field), starting at $(s,x)$. For $1 < \alpha < 2$, $0 \le s < t$, $x \in D$, it is shown that $Z_{(s,x)}(t)$ has a probability density function which is continuous away from the boundary, and a representation is given.
1  Introduction
Due to their applications in diverse fields, symmetric stable Levy processes have recently been studied by several authors; see [4], [5] and the references therein. In the meantime, reflected Levy processes have been advocated as heavy-traffic models for certain queueing/stochastic networks; see [14]. The natural way of defining a reflected/regulated Levy process is via the Skorokhod problem, as in [9], [3], [11], [1]. In this article we consider the reflected/regulated symmetric $\alpha$-stable Levy process in an orthant and show that the transition probability density function exists when $1 < \alpha < 2$ and is continuous away from the boundary; the reflection field can have fairly general time-space dependencies, as in [11]. It may be emphasized that, unlike the case of reflected diffusions (see [10]), the powerful tools/methods of PDE theory are not available to us. To achieve our purpose we use an analogue of a representation for the transition density (of a reflected diffusion) given in [2]. Section 2 concerns preliminary results on the symmetric $\alpha$-stable Levy process in $\mathbb{R}^d$, its transition probability density function, and the potential operator. In Section 3, the corresponding reflected process with time-space dependent reflection field at the boundary is studied. A major effort goes into proving that the distribution of the reflected process at any given time $t > 0$ gives zero probability to the boundary.
2  Symmetric stable Levy process
Let $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, P)$ be a filtered probability space, $d \ge 2$, $0 < \alpha < 2$. Let $\{B(t) : t \ge 0\}$ be an $\{\mathcal{F}_t\}$-adapted $d$-dimensional symmetric $\alpha$-stable Levy process. That is, $\{B(t)\}$ is an $\mathbb{R}^d$-valued homogeneous Levy process (with independent increments) with r.c.l.l. sample paths; it is rotation invariant and
$$E\bigl[\exp\{i\langle u, B(t) - x\rangle\}\,\big|\,B(0) = x\bigr] = \exp\{-t|u|^\alpha\} \eqno(2.1)$$
for $t \ge 0$, $u \in \mathbb{R}^d$, $x \in \mathbb{R}^d$. It is a pure jump strong Markov process. Using the Levy-Ito theorem and Ito's formula, it can be shown that the (weak) infinitesimal generator of $B(\cdot)$ is given by the fractional Laplacian
$$\Delta^{\alpha/2} f(x) = \lim_{r\downarrow 0}\; C(d,\alpha) \int_{|\xi|>r} \frac{f(x+\xi) - f(x)}{|\xi|^{d+\alpha}}\,d\xi \eqno(2.2)$$
whenever the right side makes sense, where $C(d,\alpha) = \Gamma\bigl(\frac{d+\alpha}{2}\bigr)\big/\bigl[2^{-\alpha}\pi^{d/2}\,\bigl|\Gamma\bigl(-\frac{\alpha}{2}\bigr)\bigr|\bigr]$; the measure $\nu(d\xi) = C(d,\alpha)\,\frac{1}{|\xi|^{d+\alpha}}\,d\xi$ is called the Levy measure of $B(\cdot)$. Also, for any $t > 0$,
$$P\bigl(B(t) \ne B(t-)\bigr) = 0. \eqno(2.3)$$
See [4], [5], [7], [8] for more information.

For a function $g$ on $\mathbb{R}^d$, write $g_i(x) = \partial g(x)/\partial x_i$ and $g_{ij}(x) = \partial^2 g(x)/\partial x_i\partial x_j$, $1 \le i,j \le d$.

Lemma 2.1. If $f \in C_b^2(\mathbb{R}^d)$ then $\Delta^{\alpha/2} f \in C_b(\mathbb{R}^d)$.

Proof: For $0 < r < s$, $\Delta_{r,s}^{\alpha/2}$ is defined by (2.4).

Fix $t > 0$. By (2.1) and Proposition 2.5.5 (on pp. 79-80) of [13] it follows that $B(t) = (B_1(t),\dots,B_d(t))$ is sub-gaussian and that there exist independent one-dimensional random variables $S, U_1, \dots, U_d$ such that $U_i \sim N(0, 2t^{2/\alpha})$, $1 \le i \le d$, $S$ is an $\frac{\alpha}{2}$-stable positive random variable, and $(B_1(t),\dots,B_d(t)) \sim (S^{1/2}U_1, S^{1/2}U_2, \dots, S^{1/2}U_d)$. Denoting by $g(\cdot)$ the density of $S^{1/2}$, the joint density of $(U_1,\dots,U_d,S^{1/2})$ is given by
$$h(\xi_1,\dots,\xi_d,r) = \Bigl(\frac{1}{4\pi}\Bigr)^{d/2}\Bigl(\frac{1}{t}\Bigr)^{d/\alpha} g(r)\,\exp\Bigl\{-\frac{1}{4t^{2/\alpha}}\sum_{i=1}^d \xi_i^2\Bigr\}.$$
Using the invertible transformation $(\xi_1,\dots,\xi_d,r) \mapsto (r\xi_1,\dots,r\xi_d,r)$ on $\mathbb{R}^d \times (0,\infty)$, the joint density of $(B_1(t),\dots,B_d(t),S^{1/2})$ is given by
$$\frac{1}{r^d}\,h\Bigl(\frac{y_1}{r},\dots,\frac{y_d}{r},r\Bigr) = \Bigl(\frac{1}{4\pi}\Bigr)^{d/2}\Bigl(\frac{1}{t}\Bigr)^{d/\alpha}\frac{1}{r^d}\,g(r)\,\exp\Bigl\{-\frac{1}{4t^{2/\alpha}}\,\frac{1}{r^2}\sum_{i=1}^d y_i^2\Bigr\}.$$
Now integrating w.r.t. $r$ we get (2.11). $\square$
Remark 2.3. From the preceding theorem it follows that
$$\int_0^\infty \frac{1}{r^k}\,g(r)\,dr < \infty$$
for $k = 2, 3, \dots$. Indeed, note that $g(\cdot)$ depends only on $\alpha$; so if we consider a $k$-dimensional symmetric $\alpha$-stable Levy process then the transition density will be given by (2.11) with $d$ replaced by $k$; and as the density is well defined at $x = z$, the claim follows.
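The sub-Gaussian representation above can be tested numerically in the one case where everything is elementary: $\alpha = 1$ (the Cauchy process; this special case is our own choice of example). For $\alpha = 1$, the positive $\frac12$-stable variable can be realized as $S = 1/(2Z^2)$ with $Z \sim N(0,1)$, and each coordinate $B_i(t) = S^{1/2}U_i$ with $U_i \sim N(0, 2t^2)$ should then have characteristic function $e^{-t|u|}$, consistent with (2.1).

```python
import math, random

# Monte Carlo check of the sub-Gaussian representation for alpha = 1
# (Cauchy case; our illustrative special case, not the paper's generality).
random.seed(3)
t, u, N = 1.0, 1.0, 400_000

def coordinate():
    z = random.gauss(0.0, 1.0)
    s = 1.0 / (2.0 * z * z)                      # positive 1/2-stable variable
    return math.sqrt(s) * random.gauss(0.0, math.sqrt(2.0) * t)   # S^{1/2} U_i

emp = sum(math.cos(u * coordinate()) for _ in range(N)) / N
print(emp, math.exp(-t * abs(u)))                # both ≈ 0.368
```

Indeed, $S^{1/2}U_i = t\,G/|Z|$ for independent standard normals $G, Z$, a ratio of normals, which is exactly Cauchy with scale $t$.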
Proposition 2.4. Denote $p_0(s,x;t,z) = \partial p(s,x;t,z)/\partial s$, $p_i(s,x;t,z) = \partial p(s,x;t,z)/\partial x_i$, and $p_{ij}(s,x;t,z) = \partial^2 p(s,x;t,z)/\partial x_i\partial x_j$, $1 \le i,j \le d$.
(i) Fix $t > 0$, $z \in \mathbb{R}^d$. Let $t_0 < t$; then $p, p_0, p_i, p_{ij}$, $1 \le i,j \le d$, are bounded continuous functions of $(s,x)$ on $[0,t_0] \times \mathbb{R}^d$.

(ii) For any $t > 0$, $\delta > 0$,
$$\sup\{|\nabla_x p(s,x;t,z)| : 0 \le s < t,\ |z-x| \ge \delta\} \le K(d,\delta), \eqno(2.12)$$
where $K(d,\delta)$ is a constant depending only on $d, \delta$, and $\nabla_x$ denotes the gradient w.r.t. the $x$-variables.

Proof: (i) Since $y\,e^{-y^2}, y^2 e^{-y^2}$ are bounded, using Remark 2.3 and the dominated convergence theorem, the assertion can be proved by differentiating w.r.t. $s, x$ under the integral in (2.11). (ii) Since $y^{d+2}e^{-y^2}$ is bounded, differentiating under the integral in (2.11) we get, for all $0 \le s < t$, $|z-x| \ge \delta$,
$$|\nabla_x p(s,x;t,z)| \le K(d)\int_0^\infty g(r)\,\frac{1}{|z-x|^{d+1}}\Bigl(\frac{|z-x|}{2r(t-s)^{1/\alpha}}\Bigr)^{d+2}\exp\Bigl\{-\frac{|z-x|^2}{4r^2(t-s)^{2/\alpha}}\Bigr\}\,dr \le K(d)\Bigl(\frac{2}{\delta}\Bigr)^{d+1}\int_0^\infty g(r)\,dr = K(d,\delta). \qquad \square$$

The following result indicates a connection between the transition density and the generator; though it is not unexpected, a proof is given for the sake of completeness.
Theorem 2.5. For fixed $t > 0$, $z \in \mathbb{R}^d$, the function $(s,x) \mapsto p(s,x;t,z)$ satisfies the Kolmogorov backward equation
$$p_0(s,x;t,z) + \Delta_x^{\alpha/2} p(s,x;t,z) = 0, \qquad s < t,\ x \in \mathbb{R}^d, \eqno(2.13)$$
where $p_0$ is as in the preceding proposition and the $x$ in $\Delta_x^{\alpha/2}$ signifies that $\Delta^{\alpha/2}$ is applied to $p$ as a function of $x$.

Proof: By the preceding proposition and Lemma 2.1, $\Delta_x^{\alpha/2} p(s,x;t,z)$ is a bounded continuous function. Put $u(s,x) = p(s,x;t,z)$, $s < t$, $x \in \mathbb{R}^d$. Using Ito's formula (see [7]), for $0 \le s < c < t$, $x \in \mathbb{R}^d$,
$$E\Bigl\{u(c,B(c)) - u(s,B(s)) - \int_s^c \bigl[u_0(r,B(r)) + \Delta^{\alpha/2} u(r,B(r))\bigr]\,dr \,\Big|\, B(s) = x\Bigr\} = 0.$$
That is,
$$\int_{\mathbb{R}^d} p(c,y;t,z)\,p(s,x;c,y)\,dy - p(s,x;t,z) = \int_s^c \int_{\mathbb{R}^d} \bigl[p_0(r,y;t,z) + \Delta_y^{\alpha/2} p(r,y;t,z)\bigr]\,p(s,x;r,y)\,dy\,dr.$$
By the Chapman-Kolmogorov equation, the l.h.s. of the above is zero. As the above holds for all $c > s$ and the quantity within the brackets is bounded and continuous in $(r,y)$, by Feller continuity one can obtain (2.13) from the above by letting $c \downarrow s$. $\square$
We next look at the 0-resolvent (or potential operator) associated with the process $B(\cdot)$. For a measurable function $\varphi$ on $\mathbb{R}^d$, $x \in \mathbb{R}^d$, define
$$G\varphi(x) = \int_{\mathbb{R}^d} \varphi(z) \int_0^\infty p(0,x;t,z)\,dt\,dz = \int_0^\infty \int_{\mathbb{R}^d} \varphi(z)\,p(0,x;t,z)\,dz\,dt \eqno(2.14)$$
whenever the r.h.s. makes sense. Since $0 < \alpha < 2 \le d$, using (2.11) it is not difficult to see that
$$\int_0^\infty p(0,x;t,z)\,dt = C\,|z-x|^{\alpha-d}, \eqno(2.15)$$
which is the so called Riesz kernel.
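The identity (2.15) can be checked directly in a closed-form case. The sketch below takes $\alpha = 1$, $d = 2$ (our choice of example), where the transition density is the planar Cauchy kernel $p(0,x;t,z) = t/(2\pi(t^2+r^2)^{3/2})$ with $r = |z-x|$, and integrates it in $t$ numerically; the result should be the Riesz kernel $1/(2\pi r)$.

```python
import math

# Numerical check of (2.15) for alpha = 1, d = 2 (Cauchy kernel; our example):
# int_0^infty t / (2*pi*(t^2 + r^2)^{3/2}) dt = 1/(2*pi*r).
def p(t, r):
    return t / (2 * math.pi * (t * t + r * r) ** 1.5)

r = 0.7
dt, T = 2e-3, 2000.0                     # midpoint rule on [0, T]
total = sum(p((k + 0.5) * dt, r) * dt for k in range(int(T / dt)))
print(total, 1 / (2 * math.pi * r))      # both ≈ 0.2274
```

The truncation error is $\approx 1/(2\pi T)$, which is why a large upper limit $T$ is needed for the slowly decaying tail.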
Theorem 2.6. Let $\varphi \in C_b^2(\mathbb{R}^d)$ and let $\varphi, \varphi_i, \varphi_{ij}$, $1 \le i,j \le d$, be integrable w.r.t. the $d$-dimensional Lebesgue measure. Then (a) $G\varphi \in C_b^2(\mathbb{R}^d)$; (b) $(G\varphi)_i(x) = G\varphi_i(x)$, $(G\varphi)_{ij}(x) = G\varphi_{ij}(x)$, $x \in \mathbb{R}^d$, $1 \le i,j \le d$; (c) $\Delta^{\alpha/2}G\varphi(x) = -\varphi(x)$, $x \in \mathbb{R}^d$. $\Box$

We need a lemma.
Lemma 2.7. If $f \in L^1(\mathbb{R}^d) \cap L^\infty(\mathbb{R}^d)$ then $Gf$ is well defined, bounded and continuous.

Proof: Let $\{T_t\}$ be the contraction semigroup associated with $B(\cdot)$. Observe that
$$Gf(x) = \int_0^1 T_tf(x)\,dt + \int_1^\infty\int_{\mathbb{R}^d} f(z)\,p(0,x;t,z)\,dz\,dt. \tag{2.16}$$
Since $T_tf$ is continuous for each $t > 0$ and $|T_tf| \le \|f\|_\infty$, it is clear that the first term on the r.h.s. is bounded and continuous. By (2.11),
$$|f(z)\,p(0,x;t,z)|\,1_{(1,\infty)}(t) \le K\,t^{-d/\alpha}|f(z)|\,1_{(1,\infty)}(t),$$
which is integrable as $0 < \alpha < 2 \le d$. So continuity of $p$ in $x$ now implies that the second term on the r.h.s. of (2.16) is bounded and continuous. $\Box$
Proof of Theorem 2.6: By Lemma 2.7 we get that $G\varphi, G\varphi_i, G\varphi_{ij}$ are bounded and continuous. A simple change of variables yields
$$\int_0^\infty\int_{\mathbb{R}^d}\frac{\varphi(z+he_i)-\varphi(z)}{h}\,p(0,x;t,z)\,dz\,dt \longrightarrow \int_0^\infty\int_{\mathbb{R}^d}\varphi_i(z)\,p(0,x;t,z)\,dz\,dt$$
Reflected Levy Process
122
by the dominated convergence theorem; thus $(G\varphi)_i(x) = G\varphi_i(x)$.
The required assertion (3.18), and hence (3.16), now follows from (3.14) and the dominated convergence theorem.

Now to prove (3.11) (with $s = 0$), first consider the case $x \notin \partial D$. Since $\partial f_\epsilon(\cdot)/\partial z_i = 0$ on $\partial D$, and $Y(\cdot)$ can increase only when $Z(\cdot) \in \partial D$, by Ito's formula
$$E[f_\epsilon(Z(t))] - f_\epsilon(x) = E\int_0^t \Delta^{\alpha/2}f_\epsilon(Z(r))\,dr.$$
By (3.12) and (3.16), letting $\epsilon \downarrow 0$ in the above we get (3.11).

Next let $x \in \partial D$; for $c > 0$ let $\eta_c \equiv \eta_c^{(x)} = \inf\{r \ge 0 : Z(r) \in D_c\}$. By the strong Markov property and the preceding case,
$$E\big[1_{[0,t]}(\eta_c)\,1_{\partial D}(Z(t))\big] = 0.$$
Note that $\{\eta_c^{(x)} \ge t\} \downarrow \emptyset$ (modulo null sets) as $c \downarrow 0$; otherwise we would get a contradiction to Theorem 3.3. Letting $c \downarrow 0$ in the above, we get the required conclusion. This completes the proof. $\Box$
Note: It may be interesting to compare the proofs of Theorems 3.3 and 3.5 with those of their analogues for reflected Brownian motion given in [6]. In the following, $\nabla_2 p(r,y;t,z)$ and $\Delta_2^{\alpha/2}p(r,y;t,z)$ denote respectively the operators $\nabla$ and $\Delta^{\alpha/2}$ applied to $p$ as a function of the $y$-variables. Our main result is
Theorem 3.6. Assume (A1)-(A3); let $1 < \alpha < 2$. For $0 \le s < t < \infty$, $x \in \bar D$, $z \in D$ define
$$p^R(s,x;t,z) = p(s,x;t,z) + E\int_s^t\big\langle R(u,Y(u-),Z(u-))\,\nabla_2\,p(u,Z(u);t,z),\,dY(u)\big\rangle \tag{3.19}$$
where $Y(\cdot) = Y^{(s,x)}(\cdot)$, $Z(\cdot) = Z^{(s,x)}(\cdot)$. For $0 \le s < t$, $x \in \bar D$, $z \in \partial D$ take $p^R(s,x;t,z) = 0$. Then

(i) $p^R$ is continuous on $\{0 \le s < t < \infty,\ x \in \bar D,\ z \in D\}$; it is also differentiable in $(t,z)$;

(ii) for any Borel set $A \subseteq \bar D$, $s < t$, $x \in \bar D$,
$$P\big(Z^{(s,x)}(t) \in A\big) = \int_A p^R(s,x;t,z)\,dz. \tag{3.20}$$
In case $R$ is independent of the $y$-variables, $p^R$ is the transition probability density function of the Markov process $Z(\cdot)$. $\Box$
We need a lemma.

Lemma 3.7. Hypotheses and notation as in Proposition 3.2. If $(s_n,x_n) \to (s,x)$ then for a.a. $\omega$ and every $T > s$,
$$\mathrm{var}\big(Y^{(s_n,x_n)}(\cdot,\omega) - Y^{(s,x)}(\cdot,\omega);\,[s,T]\big) \to 0, \qquad \sup_{s\le t\le T}\big|Z^{(s_n,x_n)}(t,\omega) - Z^{(s,x)}(t,\omega)\big| \to 0.$$

Proof: Denote $Z^{(n)}(\cdot) = Z^{(s_n,x_n)}(\cdot)$, $Y^{(n)}(\cdot) = Y^{(s_n,x_n)}(\cdot)$, $Z(\cdot) = Z^{(s,x)}(\cdot)$, $Y(\cdot) = Y^{(s,x)}(\cdot)$. We first consider the case $s_n < s$ for all $n$. Clearly $Z^{(n)}(t,\omega), Y^{(n)}(t,\omega)$, $t \ge s$, is the solution to the Skorokhod problem corresponding to $Z^{(n)}(s,\omega) + B(\cdot,\omega) - B(s,\omega)$. For any $T > s$ note that
$$\mathrm{var}\big([B(\cdot,\omega) - B(s,\omega) + Z^{(n)}(s,\omega)] - [B(\cdot,\omega) - B(s,\omega) + x];\,[s,T]\big) = |Z^{(n)}(s,\omega) - x|.$$
For any $\omega$ such that $B(\cdot,\omega)$ is continuous at $s$ we have $x_n + B(s,\omega) - B(s_n,\omega) \to x$. Boundedness of $R$ and (3.4) imply
$$\int_{s_n}^{s} R\big(u, Y^{(n)}(u-), Z^{(n)}(u-)\big)\,dY^{(n)}(u,\omega) \to 0 \quad\text{as } n \to \infty.$$
Thus $|Z^{(n)}(s,\omega) - x| \to 0$, and hence the result follows by Proposition 3.9 of [11].

Next let $s_n > s$ for all $n$. For any $n$, $Z(t,\omega), Y(t,\omega)$, $t \ge s_n$, is the solution to the Skorokhod problem corresponding to $Z(s_n,\omega) + B(\cdot,\omega) - B(s_n,\omega)$. Clearly
$$\mathrm{var}\big([x_n + B(\cdot,\omega) - B(s_n,\omega)] - [Z(s_n,\omega) + B(\cdot,\omega) - B(s_n,\omega)];\,[s_n,T]\big) = |Z(s_n,\omega) - x_n|.$$
So by the arguments as in [11],
$$\mathrm{var}\big(Y^{(n)}(\cdot,\omega) - Y(\cdot,\omega);\,[s_n,T]\big) \le C\,|Z(s_n,\omega) - x_n|, \qquad \sup_{s_n\le t\le T}\big|Z^{(n)}(t,\omega) - Z(t,\omega)\big| \le C\,|Z(s_n,\omega) - x_n|.$$
Note that for $s \le t \le s_n$ we may take $Z^{(n)}(t,\omega) = x_n$, $Y^{(n)}(t,\omega) = 0$. Clearly $\mathrm{var}(Y(\cdot,\omega);[s,s_n])$, $\sup_{s\le t\le s_n}|x_n - Z(t,\omega)|$ and $|Z(s_n,\omega) - x_n|$ all tend to 0 as $s_n \to s$ by right continuity. The required conclusion is now immediate. $\Box$

Proof of Theorem 3.6: Since $dY^{(s,x)}(\cdot)$ can charge only $\{r : Z^{(s,x)}(r) \in \partial D\}$ and $d(z,\partial D) > 0$ for $z \notin \partial D$, well-definedness of (3.19) follows from (2.12) and Proposition 3.2. Assertion (i) now follows from the properties of $p$ (viz. (2.11), (2.12), Proposition 2.4), boundedness and continuity of $R$, and Lemma 3.7. To prove assertion (ii), in view of Theorem 3.5 it is enough to establish (3.20) when $A \subset D$.
Fix $t > s$ and let $\epsilon > 0$. Applying Ito's formula to $p(r, Z^{(s,x)}(r); t, z)$, $s \le r \le t-\epsilon$, for the semimartingale $Z^{(s,x)}(\cdot)$, and using Theorem 2.5, we get
$$p(t-\epsilon, Z(t-\epsilon); t, z) = p(s,x;t,z) + \int_s^{t-\epsilon}\big\langle R(r,Y(r-),Z(r-))\,\nabla_2\,p(r,Z(r);t,z),\,dY(r)\big\rangle + \text{a stochastic integral}. \tag{3.21}$$
Let $J$ be a continuous function with compact support $K \subset D$. By (3.21), for any $\epsilon > 0$,
$$E\int_D J(z)\,p(t-\epsilon, Z(t-\epsilon); t, z)\,dz = \int_D J(z)\,p(s,x;t,z)\,dz + E\int_D J(z)\int_s^{t-\epsilon}\big\langle R(r,Y(r-),Z(r-))\,\nabla_2\,p(r,Z(r);t,z),\,dY(r)\big\rangle\,dz. \tag{3.22}$$
For any $\omega$, note that $p(t-\epsilon, Z(t-\epsilon,\omega); t, z)\,dz \Rightarrow \delta_{Z(t-,\omega)}(dz)$ as $\epsilon \downarrow 0$. Since $P(Z(t) \ne Z(t-)) = 0$ it now follows that
$$\lim_{\epsilon\downarrow0}\,[\text{l.h.s. of }(3.22)] = E\big[J(Z^{(s,x)}(t))\big]. \tag{3.23}$$
As $d(K,\partial D) > 0$, by (2.12), Proposition 3.2 and boundedness of $J(\cdot), R(\cdot)$,
$$\lim_{\epsilon\downarrow0}\,[\text{r.h.s. of }(3.22)] = \int_D J(z)\,p^R(s,x;t,z)\,dz. \tag{3.24}$$
Thus
$$E\big[J(Z^{(s,x)}(t))\big] = \int_D J(z)\,p^R(s,x;t,z)\,dz \tag{3.25}$$
for any continuous function $J$ with compact support in $D$.

Next, for any open set $F \subset D$, let $\{J_n\}$ be a sequence of continuous functions with compact support in $D$ such that $J_n \uparrow 1_F$ pointwise. Clearly
$$\lim_{n\to\infty} E\big[J_n(Z^{(s,x)}(t))\big] = E\big[1_F(Z^{(s,x)}(t))\big]. \tag{3.26}$$
Taking expectation in (3.21) and letting $\epsilon \downarrow 0$ we get
$$p^R(s,x;t,z) = \lim_{\epsilon\downarrow0} E\big[p(t-\epsilon, Z(t-\epsilon); t, z)\big] \ge 0.$$
Therefore, by the monotone convergence theorem,
$$\lim_{n\to\infty}\int_D J_n(z)\,p^R(s,x;t,z)\,dz = \int_D 1_F(z)\,p^R(s,x;t,z)\,dz. \tag{3.27}$$
Now (3.25), (3.26), (3.27) imply that (3.20) holds for any open $F \subset D$, and hence for any Borel set $A \subset D$.
Finally, the last assertion is immediate from (ii); this completes the proof. $\Box$
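The Skorokhod problem underlying $Z(\cdot), Y(\cdot)$ throughout this section reduces, in one dimension with normal reflection at $0$ (a far simpler situation than the orthant with reflection field $R$ treated in the paper), to an explicit running-maximum formula. A discrete-path sketch; the function and variable names are mine:

```python
def skorokhod_map_1d(x, path):
    # One-dimensional Skorokhod problem on a discrete path: given b with
    # b[0] = 0 and x >= 0, produce (z, y) with z = x + b + y >= 0,
    # y nondecreasing, y[0] = 0, and y increasing only when z hits 0.
    z, y, running = [], [], 0.0
    for b in path:
        running = max(running, -(x + b))  # running maximum of the defect
        y.append(running)
        z.append(x + b + running)
    return z, y

z, y = skorokhod_map_1d(1.0, [0.0, -0.5, -2.0, -1.0, -3.0])
print(z)  # [1.0, 0.5, 0.0, 1.0, 0.0]
print(y)  # [0.0, 0.0, 1.0, 1.0, 2.0]
```

Note that $y$ increases exactly at the indices where $z$ touches $0$, the complementarity condition of the Skorokhod problem.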
We conclude with the following questions.

1. Can $(x,z) \mapsto p^R(s,x;t,z)$ given by (3.19) be extended continuously to $\bar D \times \bar D$?

2. Is $p^R(s,x;t,z) > 0$ for $s < t$, $x, z \in D$?

3. When is $p^R$ symmetric in $x, z$?
Acknowledgement: The authors thank B. Rajeev and S. Thangavelu for some useful discussions, and Siva Athreya for bringing [1] to their notice while the work was in progress.

Amites Dasgupta, Stat.-Math. Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata - 700 108

S. Ramasubramanian, Stat.-Math. Unit, Indian Statistical Institute, 8th Mile Mysore Road, Bangalore - 560 059
Bibliography

[1] R. Atar and A. Budhiraja: Stability properties of constrained jump-diffusion processes. Preprint, 2001.

[2] S. Balaji and S. Ramasubramanian: Asymptotics of reflecting diffusions in an orthant. Proc. Internat. Conf. Stochastic Processes, December '96, pp. 57-81. Cochin University of Science and Technology, Kochi, 1998.

[3] I. Bardhan: Further applications of a general rate conservation law. Stochastic Process. Appl. 60 (1995) 113-130.

[4] K. Bogdan: The boundary Harnack principle for the fractional Laplacian. Studia Math. 123 (1997) 43-80.

[5] K. Bogdan and T. Byczkowski: Potential theory for the $\alpha$-stable Schrodinger operator on bounded Lipschitz domains. Studia Math. 133 (1999) 53-92.

[6] J. M. Harrison and R. J. Williams: Brownian models of open queueing networks with homogeneous customer populations. Stochastics 22 (1987) 77-115.

[7] N. Ikeda and S. Watanabe: Stochastic differential equations and diffusion processes. North-Holland, Amsterdam, 1981.

[8] K. Ito: Lectures on Stochastic Processes. Tata Institute of Fundamental Research, Bombay, 1961.

[9] O. Kella: Concavity and reflected Levy process. J. Appl. Probab. 29 (1992) 209-215.

[10] S. Ramasubramanian: Transition densities of reflecting diffusions. Sankhya Ser. A 58 (1996) 347-381.

[11] S. Ramasubramanian: A subsidy-surplus model and the Skorokhod problem in an orthant. Math. Oper. Res. 25 (2000) 509-538.

[12] S. Ramasubramanian: Reflected backward stochastic differential equations in an orthant. Proc. Indian Acad. Sci. (Math. Sci.) 112 (2002) 347-360.

[13] G. Samorodnitsky and M. S. Taqqu: Stable non-Gaussian random processes: stochastic models with infinite variance. Chapman and Hall, New York, 1994.

[14] W. Whitt: An overview of Brownian and non-Brownian FCLTs for the single-server queue. Queueing Systems Theory Appl. 36 (2000) 39-70.
On Conditional Central Limit Theorems For Stationary Processes

Manfred Denker¹
Universität Göttingen

Mikhail Gordin
V.A. Steklov Institute of Mathematics

Abstract: The central limit theorem for stationary processes arising from measure preserving dynamical systems has been reduced in [6] and [7] to the central limit theorem for martingale difference sequences. In the present note we discuss the same problem for conditional central limit theorems, in particular for Markov chains and immersed filtrations.
1 Introduction
Let $(\zeta_k)_{k\in\mathbb{Z}} = ((\xi_k,\eta_k))_{k\in\mathbb{Z}}$ be a two-component strictly stationary random process. Every measurable real-valued function $f$ on the state space of the process defines another stationary sequence $(f(\zeta_k))_{k\in\mathbb{Z}}$. Various questions in stochastic control theory and in the modeling of random environments, among many other applications, lead to the study of the conditional distributions of the sums $\sum_{k=0}^{n-1} f(\zeta_k)$ given $\eta_0,\ldots,\eta_{n-1}$. In particular, the asymptotic behaviour of these conditional distributions is of interest, including the case when the limit distribution is normal.

We shall prove conditional central limit theorems in the slightly more abstract situation of measure preserving dynamical systems $(X,\mathcal{F},P,T)$, where $(X,\mathcal{F},P)$ is a probability space and $T : X \to X$ is $P$-preserving. Let $f$ be a measurable function and $\mathcal{H}$ a sub-$\sigma$-algebra. $f$ is said to satisfy the conditional central limit theorem with respect to $\mathcal{H}$ (CCLT($\mathcal{H}$)) if, $P$-a.s., the conditional distributions given $\mathcal{H}$ of
$$\frac{1}{\sqrt{n}}\sum_{k=0}^{n-1} f\circ T^k$$
converge weakly to a normal distribution with some non-random variance $\sigma^2 \ge 0$. This leads to the identification problem for $L_2(P)$-subspaces consisting of functions satisfying a CCLT. Following [6], an elegant way to describe such subclasses

¹This paper is partially supported by the DFG-RFBR grant 99-01-04027. The second named author was also supported by the RFBR grants 00-15-96019 and 02-01-00265.
uses T-filtrations, i.e. increasing sequences of $\sigma$-fields $\mathcal{F}_n = T^{-1}\mathcal{F}_{n+1}$, $n \in \mathbb{Z}$. Here we need to consider a pair of T-filtrations $(\mathcal{F}_n)_{n\in\mathbb{Z}}$ and $(\mathcal{G}_n)_{n\in\mathbb{Z}}$ satisfying $\mathcal{G}_n \subset \mathcal{F}_n$ for every $n \in \mathbb{Z}$. For example, in the case of a strictly stationary random process $(\zeta_k)_{k\in\mathbb{Z}}$ as above, the $\sigma$-field $\mathcal{F}_n$ (or $\mathcal{G}_n$) is generated by $(\zeta_k)_{k\le n}$ (or $(\eta_k)_{k\le n}$, respectively). First of all, the conditional distributions in CCLT($\mathcal{H}$) are determined by
$$\mathcal{H} = \bigvee_{k\in\mathbb{Z}}\mathcal{G}_k \vee \bigvee_{k\le 0}\mathcal{F}_k.$$
Secondly, a general condition describing the class of functions $f$ for which the CCLT($\mathcal{H}$) holds is given by the coboundary equation $f = h + g - g\circ T$ with an $(\mathcal{F}_n)_{n\in\mathbb{Z}}$-martingale difference sequence $(h\circ T^k)$ (i.e. $h$ is $U_T\mathcal{H}$-measurable and $E^{\mathcal{H}}h := E(h\,|\,\mathcal{H}) = 0$). The coboundary equation is implicitly also used in [10] and [9]. In [10], sufficient conditions for CCLT($\mathcal{H}$) are obtained when $\mathcal{H}$ is replaced by $\hat{\mathcal{H}} = \bigvee_{k\in\mathbb{Z}}\mathcal{G}_k$, and our Proposition 3.1 contains this result as a special case. This proposition also specializes to the case of skew products $T(x,y) = (T(x), T_x(y))$ as in [9], where $\mathcal{G}_n$ is a T-filtration, and where $\mathcal{H}$ is also replaced by $\hat{\mathcal{H}}$.
It is hardly possible to verify this coboundary condition using properties of the $\sigma$-fields $(\mathcal{F}_n)_{n\in\mathbb{Z}}$ and $(\mathcal{G}_n)_{n\in\mathbb{Z}}$ without making assumptions about their interaction. It has been noticed in [5] that conditional independence plays a fundamental role when studying conditional measures and their properties in connection with the thermodynamic formalism. This additional property of conditional independence has been called immersion in [1], and we shall adopt this terminology. It means that for every $n \in \mathbb{Z}$ the $\sigma$-fields $\mathcal{F}_n$ and $\mathcal{G}_{n+1}$ are conditionally independent given $\mathcal{G}_n$. The property of immersion is an essential simplification, although it seems to be rather strong. However, it looks quite natural in several situations (see e.g. [5]), in particular when both $(\zeta_k)_{k\in\mathbb{Z}}$ and $(\eta_k)_{k\in\mathbb{Z}}$ are Markovian. Indeed, if the sequence $(\eta_k)_{k\in\mathbb{Z}}$ models the time evolution of a random environment influencing the process $(\xi_k)_{k\in\mathbb{Z}}$, the condition just means that there is no feed-back from the process $(\xi_k)_{k\in\mathbb{Z}}$ to the environment $(\eta_k)_{k\in\mathbb{Z}}$. The same picture arises when $(\xi_k)_{k\in\mathbb{Z}}$ models the outcome of non-anticipating observations of the process $(\eta_k)_{k\in\mathbb{Z}}$, mixed with noise. If the sequence $(\zeta_k)_{k\in\mathbb{Z}}$ is a Markov chain, there is a natural assumption in terms of transition probabilities which guarantees that the corresponding filtrations are immersed (see Section 4). The notion of immersed filtrations was first recognized as an important concept in connection with the classification problem for filtrations (see [1] and references therein). A closely related notion, regular factors, was introduced in [5]. The latter paper also contains some examples of regular factors originating in two-dimensional complex dynamics. In more general situations (as in control theory) some form of feed-back between the two processes may be present, and we cannot expect the corresponding filtrations to be immersed. In this case more general concepts and results (like Theorem 3.7 of the present paper) have to be developed.
In particular, we study the CCLT-problem for functions of Markov chains. We
follow the ideas in [7] closely, where a rather general and natural condition in terms of the transition operator was introduced for the CLT-problem. This condition means that the Poisson equation is solvable, and it avoids mixing assumptions and similar concepts (e.g. [9] contains results in this direction). There is a natural construction embedding the original Markov chain into another one, for which the Poisson equation has to be solved. We give some comments on how this verification can be done, in particular in the context of fibred dynamical systems [5]; however, we do not go into much detail. As a consequence we obtain the functional form of the CLT for fluctuations of a random sequence around the conditional mean. Finally, we consider the case of immersed Markov chains. This property, together with a solution of the Poisson equations for the original and extended Markov chains, establishes an analogous result for conditional mean values of the original sequence, in addition to the CLT for fluctuations. The present paper arose from an attempt to understand Bezhaeva's paper [2] from the viewpoint of martingales. Bezhaeva's article studies the same problem as the present note in the special case of finite state Markov chains. We do not reproduce these results in detail, and we formulate the conclusions of our theorems in a way different from the viewpoint taken in [2]. However, we would like to sketch the differences between the two approaches. There are two results on the CLT in [2]: Theorem 3 and Theorem 5 (the latter seems to be the most important result of [2]). Our corresponding results are Theorem 3.7 and Theorem 4.4. We do not verify here that the conditions of our Theorem 4.4 are satisfied for the class of Markov chains considered in [2] and arbitrary centered functions: this would be just a reproduction of a part of [2].
Its proof and the content of our Section 4 clearly show that even for finite state Markov chains we really deal with a continuous state space when considering a conditional setup. In fact, much more general chains than in Theorem 5 of [2] (for example, geometrically ergodic ones) can be considered on the basis of our Theorem 4.4. Our method of proving the CLT is quite different from that of [2] and, as remarked above, is based on approximation by martingales. We assume in this paper that all probability spaces and $\sigma$-fields satisfy the requirements of Rokhlin's theory of Lebesgue spaces and measurable partitions. This does not imply any restriction on the joint distributions of the random sequences we are considering; hence we may freely use conditional probability distributions given a $\sigma$-field. An alternative approach would be to reformulate the results avoiding conditional distributions. However, we do not think that the advantages of such an approach justify the complexity of the description.
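The role of the Poisson equation mentioned above can be illustrated in the simplest setting of an ergodic finite-state Markov chain: solving $(I-P)g = f - \pi f$ produces exactly the martingale-coboundary decomposition used in the martingale approximation method. The following sketch uses a made-up 3-state chain; the matrix, the function values and the normalization $g_2 = 0$ are all arbitrary choices, not from the paper:

```python
def solve(A, b):
    # Gaussian elimination with partial pivoting, for small dense systems
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            fac = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= fac * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.2, 0.4]]  # hypothetical ergodic chain

# stationary pi: pi P = pi with sum(pi) = 1 (last equation replaced by the sum)
A = [[P[i][j] - (1.0 if i == j else 0.0) for i in range(3)] for j in range(3)]
A[2] = [1.0, 1.0, 1.0]
pi = solve(A, [0.0, 0.0, 1.0])

f = [1.0, -2.0, 0.5]
fc = [f[i] - sum(pi[j] * f[j] for j in range(3)) for i in range(3)]  # centered f

# Poisson equation (I - P) g = fc, pinned down by the normalization g[2] = 0;
# solvable because fc is centered with respect to pi
B = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(2)] for i in range(2)]
g = solve(B, fc[:2]) + [0.0]

# martingale-difference check: fc(x) - g(x) + E[g(X_{k+1}) | X_k = x] = 0
resid = [fc[i] - g[i] + sum(P[i][j] * g[j] for j in range(3)) for i in range(3)]
print(max(abs(r) for r in resid))  # ~ 0
```

With this $g$, the increments $f_c(X_k) - g(X_k) + g(X_{k+1})$ form a stationary martingale difference sequence, which is the finite-state counterpart of the coboundary equation $f = h + g - g\circ T$.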
2 Immersed Filtrations

Throughout this paper, let $(X,\mathcal{F},P)$ and $T : X \to X$ be, respectively, a probability space and an automorphism of $(X,\mathcal{F},P)$ (that is, an invertible $P$-preserving measurable transformation). An increasing sequence of $\sigma$-subfields
$(\mathcal{F}_n)_{n\in\mathbb{Z}}$ of $\mathcal{F}$ will be called a filtration, and a T-filtration if, in addition, $T^{-1}(\mathcal{F}_n) = \mathcal{F}_{n+1}$ for every $n \in \mathbb{Z}$. Any $\sigma$-field $\mathcal{E} \subseteq \mathcal{F}$ with $T^{-1}\mathcal{E} \supseteq \mathcal{E}$ defines a natural T-filtration $(\mathcal{E}_n)_{n\in\mathbb{Z}} = (T^{-n}\mathcal{E})_{n\in\mathbb{Z}}$. A filtration $(\mathcal{G}_n)_{n\in\mathbb{Z}}$ is said to be subordinated to a filtration $(\mathcal{F}_n)_{n\in\mathbb{Z}}$ if for every $n \in \mathbb{Z}$
$$\mathcal{G}_n \subseteq \mathcal{F}_n, \tag{2.1}$$
and it is called immersed into the filtration $(\mathcal{F}_n)_{n\in\mathbb{Z}}$ if $(\mathcal{G}_n)_{n\in\mathbb{Z}}$ is subordinated to $(\mathcal{F}_n)_{n\in\mathbb{Z}}$ and for every $n \in \mathbb{Z}$ the $\sigma$-fields $\mathcal{F}_n$ and $\mathcal{G}_{n+1}$ are conditionally independent given $\mathcal{G}_n$. We shall always assume that
$$\mathcal{F} = \bigvee_{n\in\mathbb{Z}}\mathcal{F}_n \tag{2.2}$$
($\bigvee_{s\in S}\mathcal{E}_s$ denotes the smallest $\sigma$-field containing all $\sigma$-fields $\mathcal{E}_s$, $s \in S$). Setting $\mathcal{G} = \bigvee_{n\in\mathbb{Z}}\mathcal{G}_n$, it follows from the definition of a T-filtration that $\mathcal{G}$ is completely invariant with respect to $T$ (that is, $T^{-1}(\mathcal{G}) = \mathcal{G}$). Finally, define $\mathcal{F}_- = \bigcap_{k\in\mathbb{Z}}\mathcal{F}_k$, and similarly $\mathcal{G}_- = \bigcap_{k\in\mathbb{Z}}\mathcal{G}_k$. Throughout this paper $(\mathcal{G}_n)_{n\in\mathbb{Z}}$ always denotes a T-filtration which is subordinated to the T-filtration $(\mathcal{F}_n)_{n\in\mathbb{Z}}$. We then set
$$\mathcal{H}_n = \mathcal{G} \vee \mathcal{F}_n.$$
The transformation $T$ defines a unitary operator $U_T$ on $L_2 = L_2(X,\mathcal{F},P)$ by $U_Tf = f\circ T$, $f \in L_2$. Given a sub-$\sigma$-field $\mathcal{H} \subset \mathcal{F}$, we denote its conditional expectation operator (on $L_2$) by $E^{\mathcal{H}}$ and its conditional probability by $P(\cdot|\mathcal{H})$. Let $\|\cdot\|_2$ denote the $L_2$-norm. As mentioned above, the notion of immersed filtrations arises naturally in the context of Gibbs measures in the thermodynamic formalism (see [5]) and of Markov chains (see e.g. [2]). In order to simplify our conditions in the CCLT for these applications, we need the following lemma on immersed filtrations.

Lemma 2.1. The T-filtration $(\mathcal{G}_k)_{k\in\mathbb{Z}}$ is immersed into the T-filtration $(\mathcal{F}_k)_{k\in\mathbb{Z}}$ if for every $n \in \mathbb{Z}$ condition (2.3) holds or, equivalently, condition (2.4). Conversely, if $(\mathcal{G}_k)_{k\in\mathbb{Z}}$ is immersed into $(\mathcal{F}_k)_{k\in\mathbb{Z}}$, then the equalities (2.5) hold for every $n \in \mathbb{Z}$ and $m \ge 1$.

Proof. We first show that $(\mathcal{G}_k)_{k\in\mathbb{Z}}$ is immersed into $(\mathcal{F}_k)_{k\in\mathbb{Z}}$ if (2.3) holds. Let $n \in \mathbb{Z}$ be fixed and let $\xi$ and $\eta$ be bounded functions measurable with respect to $\mathcal{F}_n$ and $\mathcal{G}_{n+1}$, respectively. It follows from (2.3) that $E^{\mathcal{F}_n}\eta = E^{\mathcal{G}_n}\eta$. Therefore we have
$$E^{\mathcal{G}_n}E^{\mathcal{F}_n}(\xi\eta) = E^{\mathcal{G}_n}\big(\xi\,E^{\mathcal{F}_n}\eta\big) = E^{\mathcal{G}_n}\big(\xi\,E^{\mathcal{G}_n}\eta\big) = E^{\mathcal{G}_n}(\xi)\,E^{\mathcal{G}_n}(\eta),$$
which implies the conditional independence of $\mathcal{F}_n$ and $\mathcal{G}_{n+1}$ given $\mathcal{G}_n$. In a similar way (replacing $\mathcal{F}_n$ by $\mathcal{G}_{n+1}$) one shows conditional independence assuming (2.4).

Conversely, we first show that conditional independence of $\mathcal{F}_n$ and $\mathcal{G}_{n+1}$ given $\mathcal{G}_n$ for some $n \in \mathbb{Z}$ implies (2.3). Indeed, it suffices to verify (2.3) for all bounded $\mathcal{F}_n\vee\mathcal{G}_{n+1}$-measurable functions of the form $\xi\eta$, where $\xi$ and $\eta$ are $\mathcal{F}_n$- and $\mathcal{G}_{n+1}$-measurable, respectively. By conditional independence, for a $\mathcal{G}_{n+1}$-measurable bounded function $h$ one obtains $E(\xi h) = E\big(E^{\mathcal{G}_n}(\xi)\,h\big)$, whence $E^{\mathcal{G}_{n+1}}\xi = E^{\mathcal{G}_n}\xi$. Similarly one shows that $E^{\mathcal{F}_n}\eta = E^{\mathcal{G}_n}\eta$. It follows that
$$E^{\mathcal{F}_n}\big(\eta\,E^{\mathcal{G}_{n+1}}\xi\big) = E^{\mathcal{F}_n}\big(\eta\,E^{\mathcal{G}_n}\xi\big) = \big(E^{\mathcal{G}_n}\xi\big)\big(E^{\mathcal{F}_n}\eta\big) = \big(E^{\mathcal{G}_n}\xi\big)\big(E^{\mathcal{G}_n}\eta\big) = E^{\mathcal{G}_n}(\xi\eta).$$
Since the equation (2.4) can be proved similarly, we obtain the equivalence of (2.3) and (2.4). Moreover, by induction one easily verifies (2.5). $\Box$
3 A Conditional Central Limit Theorem
Let $(v_k)_{k\ge1}$ be a sequence of real-valued random variables. For every $n \in \mathbb{Z}_+$ define a random function with values in the Skorokhod space $D([0,1])$ ([3], [8]) in the standard way: it is piecewise constant, right continuous, equals $0$ in the interval $[0,1/n)$ and equals $n^{-1/2}\sum_{1\le m\le[nt]}v_m$ for a point $t \in [1/n,1]$. This random function will be denoted by $R_n(v_1,\ldots,v_n)$ and has a distribution on $D([0,1])$, denoted by $P_n(v_1,\ldots,v_n)$. We write $w_\sigma$ for the Brownian motion on $[0,1]$ with variance $\sigma^2$ of $w_\sigma(1)$ (we need not exclude $\sigma^2 = 0$ since $w_0$ is the process which identically vanishes). The distribution of $w_\sigma$ in $C([0,1])$ will be denoted by $W_\sigma$.
Remark 3.1. In the sequel we deal with convergence in probability of a sequence of random probability distributions in D([O, 1]) to a non-random probability distribution. It is assumed here that the set of all probability distributions in D( [0, 1]) is endowed with the weak topology. It is well known that the piecewise constant random functions (in D([O, 1])) can be replaced by piecewise linear functions (in C([O, 1])) without changing the essence of the results formulated below.
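In code, the random element $R_n(v_1,\ldots,v_n)$ of $D([0,1])$ is simply the rescaled partial-sum step function; a minimal sketch of its evaluation map (function and variable names are mine):

```python
import math

def R_n(vs, t):
    # evaluation of the step function R_n(v_1, ..., v_n) in D([0, 1]):
    # n^{-1/2} * (v_1 + ... + v_[nt]); in particular 0 on [0, 1/n)
    n = len(vs)
    return sum(vs[:math.floor(n * t)]) / math.sqrt(n)

vals = [1.0, -1.0, 2.0, 0.5]
print(R_n(vals, 0.2))  # 0.0   (no summands yet, t < 1/n is covered too)
print(R_n(vals, 1.0))  # 1.25  (full sum 2.5 divided by sqrt(4))
```

Replacing the slice by a linear interpolation between consecutive partial sums gives the piecewise linear version in $C([0,1])$ mentioned in Remark 3.1.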
3.1 A general CCLT

As mentioned in the introduction, the conditional central limit theorems in [9] and [10] are proved using a martingale approximation. There are different
versions of a martingale central limit theorem which may be used in the present context. They are all versions and extensions of Brown's martingale central limit theorem. It has been used in [10] directly, and is used in [9] and here in a modified form. We apply a corollary of Theorem 8.3.33 in [8] to obtain the following CLT for arrays of martingale difference sequences.

Lemma 3.1. For $n \in \mathbb{Z}_+$ let $(\Omega^n,\mathcal{F}^n,(\mathcal{F}_{k,n})_{k\ge0},P^n)$ be a probability space with filtration $\mathcal{F}_{k,n} \subset \mathcal{F}^n$ ($k \ge 0$), and let $(v_{k,n})_{k\ge1}$ be a square integrable martingale difference sequence with respect to $((\mathcal{F}_{k,n})_{k\ge0},P^n)$. If for every $\epsilon > 0$ and $t \ge 0$ we have
$$\sum_{1\le k\le nt} E^{\mathcal{F}_{k-1,n}}\big(v_{k,n}^2\,1_{\{|v_{k,n}|>\epsilon\}}\big) \longrightarrow 0 \tag{3.6}$$
and
$$\sum_{1\le k\le nt} E^{\mathcal{F}_{k-1,n}}\big(v_{k,n}^2\big) \longrightarrow \sigma^2 t \tag{3.7}$$
in probability as $n \to \infty$, then $\{P_n(v_1,\ldots,v_n) : n \ge 1\}$ converges weakly to $W_\sigma$.

The following proposition is the key result in the martingale approximation method for the CCLT. Implicitly it also appears in [10], and its proof is analogous to that for the central limit theorem in [6] or [7].

Proposition 3.1. Let $T$ be an ergodic automorphism and $(\mathcal{H}_n)_{n\in\mathbb{Z}}$ be a T-filtration. Assume that $g, h \in L_2$ and (3.8) holds. If $f$ is defined by
$$f = h + g - U_Tg, \tag{3.9}$$
then, with probability 1, the conditional distributions $P_n(f,U_Tf,\ldots,U_T^{n-1}f\,|\,\mathcal{H}_0)$ given $\mathcal{H}_0$ of the random functions $R_n(f,U_Tf,\ldots,U_T^{n-1}f)$ converge weakly to the (non-random) probability distribution $W_\sigma$, where $\sigma = \|h\|_2 \ge 0$.
Remark 3.2. The equations in (3.8) say that the sequence $(U_T^nh)_{n\in\mathbb{Z}}$ is a stationary martingale difference sequence with respect to the filtration $(\mathcal{H}_n)_{n\in\mathbb{Z}}$.

Remark 3.3. The conclusion of Proposition 3.1 remains true if the $\sigma$-field $\mathcal{H}_0$ in the statement is changed to any coarser one. This follows easily from the definition of weak convergence and the non-randomness of the limit distribution.

Proof of Proposition 3.1. By Remark 3.2 the random variables $v_{k,n} = n^{-1/2}U_T^{k-1}h$ ($1 \le k \le n$) form a martingale difference sequence with respect to the filtration $(\mathcal{H}_k)_{0\le k\le n}$. Assume first that $\sigma > 0$. We show that the array $\{v_{k,n} : 1 \le k \le n,\ n \in \mathbb{Z}\}$ with probability 1 satisfies conditions (3.6) and (3.7) of Lemma 3.1 with respect to the conditional distribution given $\mathcal{H}_0$. Relative to this conditional distribution, with probability 1, the sequence $(U_T^nh)_{n\in\mathbb{Z}}$ is a (non-stationary) sequence of martingale differences with finite second moments. The ergodic theorem implies that with $P$-probability 1
$$\sum_{1\le k\le nt} E^{\mathcal{H}_{k-1}}\big(v_{k,n}^2\big) = n^{-1}\sum_{0\le k\le nt-1} U_T^k\,E^{\mathcal{H}_0}\big(h^2\big) \longrightarrow \sigma^2 t$$
as $n \to \infty$. It follows that with probability 1 the same relation holds almost surely with respect to the conditional probability given $\mathcal{H}_0$, establishing (3.7).

We need to check (3.6). By the ergodic theorem again, for every $\epsilon > 0$ and $A > 0$ we have, with $P$-probability 1,
$$\begin{aligned}
\limsup_{n\to\infty}\ \sum_{1\le k\le nt} E^{\mathcal{H}_{k-1}}\big(v_{k,n}^2\,1_{\{|v_{k,n}|>\epsilon\}}\big)
&= \limsup_{n\to\infty}\ n^{-1}\sum_{0\le k\le (n-1)t} E^{\mathcal{H}_k}\big((U_T^kh)^2\,1_{\{|U_T^kh|>\epsilon n^{1/2}\}}\big)\\
&\le \limsup_{n\to\infty}\ n^{-1}\sum_{0\le k\le (n-1)t} E^{\mathcal{H}_k}\big((U_T^kh)^2\,1_{\{|U_T^kh|>A\}}\big)\\
&= \limsup_{n\to\infty}\ n^{-1}\sum_{0\le k\le (n-1)t} E^{\mathcal{H}_k}\big(U_T^kh^2\;U_T^k1_{\{|h|>A\}}\big)\\
&= \limsup_{n\to\infty}\ n^{-1}\sum_{0\le k\le (n-1)t} U_T^k\big(E^{\mathcal{H}_0}(h^2\,1_{\{|h|>A\}})\big)\\
&\le E\,E^{\mathcal{H}_0}\big(h^2\,1_{\{|h|>A\}}\big) = E\big(h^2\,1_{\{|h|>A\}}\big),
\end{aligned}$$
and, choosing $A$ large enough, the latter expression can be made arbitrarily small. Thus for every $\epsilon > 0$, with $P$-probability 1,
$$\sum_{1\le k\le nt} E^{\mathcal{H}_{k-1}}\big(v_{k,n}^2\,1_{\{|v_{k,n}|>\epsilon\}}\big) \longrightarrow 0$$
as $n \to \infty$. This implies that with probability 1 the same expression tends to zero with respect to the conditional probability given $\mathcal{H}_0$, proving (3.6). It follows from Lemma 3.1 that $P_n(h,\ldots,U_T^{n-1}h\,|\,\mathcal{H}_0)$ converges weakly to $W_\sigma$ $P$-a.s. The same conclusion also holds if $\sigma = 0$ ($h = 0$ in this case).

Finally we need to show that the sequences $(U_T^nh)_{n\in\mathbb{Z}}$ and $(U_T^nf)_{n\in\mathbb{Z}}$ are stochastically equivalent. We have
$$R_n\big(n^{-1/2}f,\ldots,n^{-1/2}U_T^{n-1}f\big) - R_n\big(n^{-1/2}h,\ldots,n^{-1/2}U_T^{n-1}h\big) = R_n\big(n^{-1/2}(U_Tg-g),\,n^{-1/2}(U_T^2g-U_Tg),\ldots,\,n^{-1/2}(U_T^ng-U_T^{n-1}g)\big).$$
It is easy to see that the maximum (over the interval $[0,1]$) of the modulus of the latter random function equals $n^{-1/2}\max_{1\le k\le n}|U_T^kg - g|$ and does not exceed $n^{-1/2}(|g| + \max_{1\le k\le n}|U_T^kg|)$. Since by the ergodic theorem $n^{-1}U_T^ng^2 \to 0$, this expression tends to zero $P$-a.s. Thus we see that $P$-a.s. the distance in $D([0,1])$ between $R_n(h,\ldots,U_T^{n-1}h)$ and $R_n(f,\ldots,U_T^{n-1}f)$ tends to zero as $n \to \infty$. This implies that, with probability 1, the conditional distributions $P_n(f,\ldots,U_T^{n-1}f\,|\,\mathcal{H}_0)$ in $D([0,1])$ have the same weak limit as $P_n(h,\ldots,U_T^{n-1}h\,|\,\mathcal{H}_0)$. $\Box$
3.2 On Rubshtein's CCLT
Proposition 3.1 is in fact a general result, as can be seen when it is compared to other theorems in the literature. We begin by recalling Rubshtein's result in [10].
Theorem 3.4. Let $((\xi_n,\eta_n))_{n\in\mathbb{Z}}$ be an ergodic stationary process with $\xi_0 \in L_2$ and $E^{\mathcal{G}}\xi_0 = 0$. If (3.10) holds, then, with probability 1, the conditional distributions $P_n(\xi_1,\xi_2,\ldots,\xi_n\,|\,\mathcal{G})$ of $R_n(\xi_1,\xi_2,\ldots,\xi_n)$ converge weakly to the non-random probability $W_\sigma$, where
$$\lim_{n\to\infty} n^{-1}\,E\big(\xi_1+\cdots+\xi_n\big)^2 = \sigma^2.$$
The proof of this result can be reduced to Proposition 3.1 by observing that (3.10) implies a representation as in (3.9). The result in [9], Theorem 2.3, is of the same nature, but in the special situation of a skew product. Another special case of Proposition 3.1 is the following theorem, which is also a generalization of Theorem 2 in [6] when $p = 2$.
Theorem 3.5. Let $T$ be an ergodic automorphism and $(\mathcal{H}_n)_{n\in\mathbb{Z}}$ be a T-filtration. If $f \in L_2$ is a real-valued function satisfying
$$\sum_{k=0}^{\infty}\big(\|f - E^{\mathcal{H}_k}f\|_2 + \|E^{\mathcal{H}_{-k}}f\|_2\big) < \infty, \tag{3.11}$$
then Proposition 3.1 applies to $f$. In particular, there exists $\sigma \ge 0$ such that with probability 1 the conditional distributions $P_n(f,U_Tf,\ldots,U_T^{n-1}f\,|\,\mathcal{H}_0)$ converge weakly to the probability distribution $W_\sigma$.
Proof. The following explicit formula defines a function $g$ which permits a representation as in (3.9), with $h = f - g + U_Tg$:
$$g = \sum_{k=0}^{\infty} U_T^k\,E^{\mathcal{H}_{-k}}f \;-\; \sum_{k=1}^{\infty} U_T^{-k}\big(f - E^{\mathcal{H}_k}f\big)$$
(here and below the series converge in $L_2$-norm due to the assumption (3.11)). It follows that
$$h = f - g + U_Tg = \sum_{k\in\mathbb{Z}} U_T^k\big(E^{\mathcal{H}_{-k+1}} - E^{\mathcal{H}_{-k}}\big)f = \lim_{n\to\infty}\sum_{k=-n}^{n} U_T^k\big(E^{\mathcal{H}_{-k+1}} - E^{\mathcal{H}_{-k}}\big)f.$$
This representation clearly shows that $h$ satisfies (3.8), and the theorem follows from Proposition 3.1. $\Box$
3.3 The CCLT for subordinated filtrations

Let $(\mathcal{G}_n)_{n\in\mathbb{Z}}$ and $(\mathcal{F}_n)_{n\in\mathbb{Z}}$ be two subordinated T-filtrations as explained in Section 2. We shall use Proposition 3.1 to obtain sufficient conditions under which the CCLT holds together with the CLT for the conditional mean. We begin with the following reformulation of Proposition 3.1.
Proposition 3.2. Let $T$ be an ergodic automorphism, and let $(\mathcal{G}_n)_{n\in\mathbb{Z}}$ and $(\mathcal{F}_n)_{n\in\mathbb{Z}}$ be a pair of T-filtrations such that $(\mathcal{G}_n)_{n\in\mathbb{Z}}$ is subordinated to $(\mathcal{F}_n)_{n\in\mathbb{Z}}$. For $f \in L_2$ define $\bar f = E^{\mathcal{G}}f$ and $\tilde f = f - \bar f$. Assume that $\bar f$ and $\tilde f$ admit representations
$$\bar f = \bar h + \bar g - U_T\bar g \tag{3.12}$$
and
$$\tilde f = \tilde h + \tilde g - U_T\tilde g, \tag{3.13}$$
where $\bar g, \tilde g \in L_2$. Then

i) the distributions $P_n(\bar f,U_T\bar f,\ldots,U_T^{n-1}\bar f)$ of the random functions $R_n(\bar f,U_T\bar f,\ldots,U_T^{n-1}\bar f)$ converge weakly to the probability distribution $W_{\bar\sigma}$, where $\bar\sigma = \|\bar h\|_2 \ge 0$;

ii) with probability 1, the conditional distributions $P_n(\tilde f,U_T\tilde f,\ldots,U_T^{n-1}\tilde f\,|\,\mathcal{H}_0)$ given $\mathcal{H}_0$ of the random functions $R_n(\tilde f,U_T\tilde f,\ldots,U_T^{n-1}\tilde f)$ converge weakly to the (non-random) probability distribution $W_{\tilde\sigma}$, where $\tilde\sigma = \|\tilde h\|_2 \ge 0$.
ii) with probability 1, the conditional distributions Pn(i, UT i, . .. ,U:;.-l ilHo) given Ho of the random functions Rn(i, UT i, . .. ,U:;.-l 1) converge weakly to the (non-random) probability distribution Wo:, where (j = IIhl12 ~ o. Remark 3.6. The same proof as for Proposition 3.2 shows that the joint distribution of the partial sums of (1, converge to aGaussian law with covariance matrix (O"ij), where O"I = Ilhll§, O"~ = Ilhll§ and 0"1,2 = 0"2,1 = Jhhd/L. One easily deduces from this that also f is asymptotically normal with variance Ilh + hll§.
i)
Proof. The assertion ii) is a direct consequence of Proposition 3.1. The assertion i) also follows from the Proposition 3.1 (applied to the filtration (Qn)nEZ) and Remark 3.3. 0 Corollary 3.1. Under the assumptions of Proposition 3.2, with probability 1, the conditional distributions p;:(i, UT i, ... ,U:;.-l ill, UT u:;.-11) converge weakly to Wo:, where (j = IIhl12 ~ o.
1, ... ,
Proof. This follows from Remark 3.3, because the functions are Q-measurable and Q ~ Ho.
1, UT 1, ... , u:;.-11 0
Theorem 3.7. Let $T$ be an ergodic automorphism, and let $(\mathcal{G}_n)_{n\in\mathbb{Z}}$ and $(\mathcal{F}_n)_{n\in\mathbb{Z}}$ be a pair of T-filtrations such that $(\mathcal{G}_n)_{n\in\mathbb{Z}}$ is subordinated to $(\mathcal{F}_n)_{n\in\mathbb{Z}}$. Let $f \in L_2$ be a real-valued function satisfying
$$\sum_{k=0}^{\infty}\|f - E^{\mathcal{F}_k}f\|_2 < \infty, \tag{3.14}$$
$$\sum_{k=0}^{\infty}\|E^{\mathcal{G}}f - E^{\mathcal{H}_{-k}}f\|_2 < \infty, \tag{3.15}$$
$$\sum_{k=0}^{\infty}\|E^{\mathcal{G}}f - E^{\mathcal{G}_k}f\|_2 < \infty \tag{3.16}$$
and
$$\sum_{k=0}^{\infty}\|E^{\mathcal{G}_{-k}}f\|_2 < \infty. \tag{3.17}$$
Setting $\bar f = E^{\mathcal{G}}f$ and $\tilde f = f - \bar f$, the functions $\bar f$ and $\tilde f$ admit, respectively, the representations (3.12) and (3.13). Moreover,

i) the distributions $P_n(\bar f,U_T\bar f,\ldots,U_T^{n-1}\bar f)$ of the random functions $R_n(\bar f,U_T\bar f,\ldots,U_T^{n-1}\bar f)$ converge weakly to the probability distribution $W_{\bar\sigma}$, where $\bar\sigma = \|\bar h\|_2 \ge 0$;

ii) with probability 1, the conditional distributions $P_n(\tilde f,U_T\tilde f,\ldots,U_T^{n-1}\tilde f\,|\,\mathcal{H}_0)$ given $\mathcal{H}_0$ of the random functions $R_n(\tilde f,U_T\tilde f,\ldots,U_T^{n-1}\tilde f)$ converge weakly to the (non-random) probability distribution $W_{\tilde\sigma}$, where $\tilde\sigma = \|\tilde h\|_2 \ge 0$.
Remark 3.8. (1) Instead of (3.17) it is sometimes more convenient to verify the stronger condition
$$\sum_{k=0}^{\infty}\|E^{\mathcal{F}_{-k}}f\|_2 < \infty.$$
For the last two properties, we take $P_0(t,x) \equiv 1$. The first two properties simply tell us that we can write
$$P_k(t,x) = \sum_{j=0}^{k} p_j^{(k)}(t)\,x^j,$$
where the $p_j^{(k)}(t)$ are polynomials in $t$ and $p_k^{(k)}(t) \equiv 1$.
The sequence $\{P_k\}$ of Hermite polynomials as defined above is known to have some deep connections with standard Brownian motion. One of these is the well-known fact that if $\{M_t, t\ge0\}$ denotes standard Brownian motion, then for each $k$, $\{P_k(t,M_t), t\ge0\}$ is a martingale (for the natural filtration of $\{M_t\}$), and standard Brownian motion is the only process with this property. Moreover, if $P(t,x)$ is any two-variable polynomial such that $\{P(t,M_t)\}$ is a martingale, then $P$ belongs to the linear span of the sequence $\{P_k\}$. A natural question that arises is: which stochastic processes admit such a sequence of two-variable polynomials which, when evaluated along the trajectory of the process, are martingales, and, if so, to what extent do these polynomials determine the process? Also, is it possible to choose the sequences of polynomials so as to satisfy properties similar to those of the Hermite polynomials mentioned above? These questions were investigated in detail in Goswami and Sengupta [2] and Sengupta [6]. The following are some notations and definitions that were introduced in these works. Here we restrict ourselves to continuous-time processes. Let $M = \{M_t, t\ge0\}$ be a stochastic process on some probability space. The time-space harmonic polynomials for the process $M$ are defined to be all those two-variable polynomials $P(\cdot,\cdot)$ such that $\{P(t,M_t)\}$ is a martingale (always for the natural filtration of $M$). The two variables will be referred to as, respectively, the 'time' and the 'space' variables. The collection of all time-space harmonic polynomials for a process $M$ will be denoted $\mathcal{P}(M)$. In other words,
P(M) := {P : P is a 2-variable polynomial and {P(t, M_t)} is a martingale}.

Any two-variable polynomial P can be written as P(t, x) = Σ_{j=0}^k p_j(t) x^j for some k, where each p_j(t) is a polynomial in t. If in the above representation p_k(t) ≠ 0, we say that P is of degree k in the 'space' variable x. For a stochastic process M = {M_t}, we define P_k(M) to be the collection of those time-space harmonic polynomials which are of degree k in the space variable, that is,

P_k(M) := {P ∈ P(M) : P is of degree k in the space variable x}.
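The Hermite-Brownian motion facts quoted above can be checked mechanically: the two-variable Hermite polynomials are generated by exp(θx − θ²t/2), and by Ito's formula the martingale property of P_k(t, M_t) for standard Brownian motion is equivalent to the backward heat equation ∂P_k/∂t + (1/2)∂²P_k/∂x² = 0. The following sketch (assuming Python with sympy is available; it is not part of the original text) verifies this for the first few k:

```python
import sympy as sp

t, x, th = sp.symbols('t x theta')
K = 6

# Two-variable Hermite polynomials P_k(t, x) from the generating function
# exp(theta*x - theta**2*t/2) = sum_k P_k(t, x) * theta**k / k!
gen = sp.exp(th*x - th**2*t/2)
ser = sp.series(gen, th, 0, K + 1).removeO()
P = [sp.expand(sp.factorial(k) * ser.coeff(th, k)) for k in range(K + 1)]

assert sp.expand(P[1] - x) == 0
assert sp.expand(P[2] - (x**2 - t)) == 0
assert sp.expand(P[3] - (x**3 - 3*t*x)) == 0

# By Ito's formula, P_k(t, B_t) is a martingale for Brownian motion B iff
# P_k solves the backward heat equation dP/dt + (1/2) d^2P/dx^2 = 0.
for k in range(K + 1):
    assert sp.expand(sp.diff(P[k], t) + sp.diff(P[k], x, 2)/2) == 0
```

The generating-function definition is one standard normalization; any other normalization differs only by constant multiples and leaves the martingale property unchanged.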
A. Goswami and A. Sengupta
Clearly,

P(M) = ∪_k P_k(M).
Definition: A stochastic process M is said to be polynomially harmonizable (p-harmonizable, in short) if P_k(M) ≠ ∅ for all k ≥ 1.

In this terminology, standard Brownian motion is a p-harmonizable process. Indeed, Brownian motion is p-harmonizable in a somewhat stricter sense, to be understood below. For a process M, let us denote by P̄_k(M) the set of those time-space harmonic polynomials of degree k in x for which the leading term in x is 'free' of t, that is, the coefficient of x^k is a non-zero constant. In other words,

P̄_k(M) := {P ∈ P_k(M) : P(t, x) = Σ_{j=0}^k p_j(t) x^j with p_k(·) a non-zero constant},

and we let

P̄(M) := ∪_k P̄_k(M).
Clearly, P̄_k(M) ⊂ P_k(M) ∀ k and so, P̄(M) ⊂ P(M). Also, if P̄_k(M) ≠ ∅, then there is P(t, x) = Σ_{j=0}^k p_j(t) x^j ∈ P̄_k(M) with p_k(·) ≡ 1.
Definition: A stochastic process M is said to be p-harmonizable in the strict sense if P̄_k(M) ≠ ∅ for all k ≥ 1.

The second property of the two-variable Hermite polynomials listed earlier shows that standard Brownian motion is actually p-harmonizable in the strict sense. The other classical example of a strict sense p-harmonizable process is the Poisson process. For a Poisson process, with intensity 1 for example, a sequence of time-space harmonic polynomials is given by the so-called two-variable Charlier polynomials

P_k(t, x) = Σ_{j=0}^k (k choose j) x^j Σ_{i=0}^{k−j} S(k−j, i) (−t)^i,

where S(n, i) denote the Stirling numbers of the second kind. The Gamma process is another example of a strict sense p-harmonizable process.

In keeping with the special properties of the sequence of Hermite polynomials mentioned earlier, we introduce here a list of properties for a sequence of two-variable polynomials. Let {P_k, k ≥ 1} be a sequence of two-variable polynomials with P_k being of degree k in x. We define P_0 ≡ 1. Let us write P_k(t, x) = Σ_{j=0}^k p_j^{(k)}(t) x^j, where the p_j^{(k)}(t) are polynomials in t. We are going to refer to the following properties in the sequel.

(i) Strict sense property: For each k ≥ 1, p_k^{(k)}(·) ≡ 1.
Polynomially Harmonizable Processes
(ii) The Appell property: For each k ≥ 1, ∂P_k/∂x = kP_{k−1}, that is, j p_j^{(k)}(t) = k p_{j−1}^{(k−1)}(t), 1 ≤ j ≤ k.

(iii) The pseudo-type-zero property: There exist constants h_1, h_2, … such that, for each k ≥ 1, ∂P_k/∂t = Σ_{j=1}^k (k choose j) h_j P_{k−j}.

(iv) For each k ≥ 1, P_k(0, x) = x^k, that is, p_j^{(k)}(0) = 0, 0 ≤ j ≤ k − 1.
The sequence of Hermite polynomials satisfies all the properties (i)-(iv); property (iii) holds here with h_2 = −1 and h_k = 0 for k ≠ 2. It is easy to verify that the two-variable Charlier polynomials satisfy these properties as well. Theorems 2.3 and 2.4 in the next section will establish that these are reflections of the fact that both Brownian motion and the Poisson process are homogeneous Levy processes. Let us make some basic observations about the properties listed above. First of all, with the convention that P_0 ≡ 1, property (ii) will always imply property (i). Secondly, in our applications, the sequence {P_k} will be arising as time-space harmonic polynomials of a process M. Now, if the process itself happens to be a martingale, we can always take P_1 = x, in which case property (ii) will actually imply a slightly stronger property than (i), namely, (i′) for each k ≥ 1, P_k(t, x) − x^k has degree at most k − 2 in x, that is, p_k^{(k)} ≡ 1 and p_{k−1}^{(k)} ≡ 0. Properties (ii) and (iii) for a sequence of polynomials were studied analytically in an entirely different context in Sheffer [7], which is the source of our terminology for these properties in this context. It turns out that for a stochastic process M, the properties (ii), (iii) and some other algebraic/analytic properties of the corresponding sequence of time-space harmonic polynomials are intimately connected to some stochastic properties of M.
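As a sanity check, properties (i)-(iv) can be verified symbolically for the two-variable Hermite polynomials. Property (iii) is taken here in the reading ∂P_k/∂t = Σ_j (k choose j) h_j P_{k−j} with h_2 = −1 and all other h_j = 0, which is one assumed interpretation consistent with the text; a sketch using sympy:

```python
import sympy as sp

t, x, th = sp.symbols('t x theta')
K = 6

# Two-variable Hermite polynomials via their generating function.
ser = sp.series(sp.exp(th*x - th**2*t/2), th, 0, K + 1).removeO()
P = [sp.expand(sp.factorial(k) * ser.coeff(th, k)) for k in range(K + 1)]

for k in range(1, K + 1):
    # (i) strict sense: the coefficient of x^k is identically 1
    assert sp.Poly(P[k], x).LC() == 1
    # (ii) Appell property: dP_k/dx = k * P_{k-1}
    assert sp.expand(sp.diff(P[k], x) - k*P[k - 1]) == 0
    # (iv) P_k(0, x) = x^k
    assert sp.expand(P[k].subs(t, 0) - x**k) == 0
for k in range(2, K + 1):
    # (iii) pseudo-type-zero with h_2 = -1 and h_j = 0 otherwise:
    # dP_k/dt = binomial(k, 2) * h_2 * P_{k-2}
    assert sp.expand(sp.diff(P[k], t) + sp.binomial(k, 2)*P[k - 2]) == 0
```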
2 Levy Processes and p-Harmonizability
In this section, we describe some of the results on p-harmonizability of Levy processes. Details of these can be found in [6]. Discrete-time versions of many of these results were proved earlier in [2]. For us, a Levy process will mean a process M = {M_t, t ≥ 0} with independent increments and having no fixed times of discontinuity. A homogeneous Levy process is one which is homogeneous as a Markov process, that is, whose increments are stationary besides being independent. In the results that follow, we will often need to impose two conditions on the process M, to be referred to as the moment condition and the support condition. They are as follows:

• Condition (Mo): For all t, M_t has finite moments of all orders.
• Condition (Su): There is a sequence t_n ↑ ∞ such that, for all k ≥ 1, |support(M_{t_n})| > k for infinitely many t_n.

The moment condition (Mo) is clearly necessary for the process to be p-harmonizable. The role of the condition (Su) is more technical in nature. However, it may be noted that any homogeneous Levy process always satisfies this condition (unless, of course, it is deterministic). For a general Levy process, a simpler condition that guarantees (Su) is that M_t − M_s be non-degenerate for all 0 ≤ s < t, that is, the increments are all non-degenerate. We now state some of the main results from [6].
Theorem 2.1. Any homogeneous Levy process M = {M_t, t ≥ 0} with M_0 ≡ 0 and satisfying the conditions (Mo) and (Su) is p-harmonizable in the strict sense. Moreover, there exists a unique sequence P_k ∈ P̄_k(M), k ≥ 1, satisfying properties (i)-(iv) and such that P(M) is just the linear span of {P_k, k ≥ 1}. Further, the process M is uniquely determined by the sequence {P_k} up to all the moments of its finite-dimensional distributions.

Remark: (i) The fact that P(M) equals the linear span of {P_k, k ≥ 1} implies, in particular, that P(M) = P̄(M). This is actually a special case of a more general fact proved by Goswami and Sengupta in [2], namely, that for any process M satisfying (Su), if P_k(M) ≠ ∅ ∀ k, then P̄_k(M) = P_k(M) ∀ k. (ii) The property of M being determined by the sequence {P_k} can be strengthened as follows. If we assume, for example, that for some t > 0 and ε > 0, E(exp{aM_t}) < ∞ ∀ |a| < ε, then the sequence {P_k} completely determines the distribution of the process M.

Theorem 2.2. Let M = {M_t, t ≥ 0} be a Levy process with M_0 ≡ 0 and satisfying the conditions (Mo) and (Su). Then M is p-harmonizable if and only if for each k ≥ 1, E(M_t^k) is a polynomial in t. In this case, there exists a unique sequence P_k ∈ P_k(M), k ≥ 1, satisfying properties (i), (ii) and (iv) and such that P(M) is just the linear span of {P_k, k ≥ 1}. Further, the process M is uniquely determined by the sequence {P_k} up to all the moments of its finite-dimensional distributions.

Remark: Note the absence of the pseudo-type-zero property (iii) in this case. In fact, property (iii) would not hold unless the process is homogeneous [see Theorem 2.4].

We now describe a characterization of p-harmonizability of a Levy process M in terms of the underlying Levy measure, or, equivalently, the Kolmogorov measure. Associated to any Levy process, there is a σ-finite measure m on [0, ∞) × (ℝ ∖ {0}), called its Levy measure, such that

E{exp(iαM_t)} = exp[ iαμ(t) − (1/2)α²σ²(t) + ∫ (e^{iαu} − 1 − iαu/(1+u²)) m([0, t] ⊗ du) ],

where μ(·) and σ²(·) are the mean and variance functions of the 'gaussian part' of M. It can be shown that p-harmonizability of M is equivalent to requiring
that all the following functions be polynomials in t: h_1(t) = μ(t), h_2(t) = σ²(t) + ∫ u² m([0, t] ⊗ du), and, for k > 2,

h_k(t) = ∫ u^k m([0, t] ⊗ du).
The above characterization takes on a slightly simpler form when expressed in terms of what is known as the Kolmogorov measure associated with the process. It is the unique Borel measure L on [0, ∞) × ℝ such that

log E{exp(iαM_t)} = iαv(t) + ∫ ((e^{iαu} − 1 − iαu)/u²) L([0, t] ⊗ du),

where v(t) = EM_t is the mean function of the process M. We refer to Ito [3] for the definition and the transformation that connects the Kolmogorov measure and the Levy measure. A necessary and sufficient condition for p-harmonizability of the process M is that v(t) as well as the functions h_k(t) = ∫ u^{k−2} L([0, t] ⊗ du), k ≥ 2, are all polynomials in t. We have seen that for any Levy process M satisfying the conditions (Mo) and (Su), we can get a sequence P_k ∈ P_k(M), k ≥ 1, such that the Appell property (ii) holds. Moreover, if M is homogeneous, then the sequence {P_k} can be chosen so as to satisfy the pseudo-type-zero property (iii). The next two results show that, under some conditions, the converse is also true. In both the following theorems, M = {M_t, t ≥ 0} will denote a continuous-time stochastic process with r.c.l.l. paths starting at M_0 ≡ 0 and satisfying conditions (Mo) and (Su), and {F_t, t ≥ 0} will denote the natural filtration of M.
Theorem 2.3. If there exists a sequence P_k ∈ P_k(M), k ≥ 1, satisfying the Appell property (ii), then for each 0 ≤ s < t, the conditional moments E((M_t − M_s)^k | F_s) are degenerate for all k. If, moreover, for each t, the moment-generating function of M_t is finite on some open interval containing 0, then M is a Levy process.

Theorem 2.4. If there exists a sequence P_k ∈ P_k(M), k ≥ 1, satisfying both the Appell property (ii) and the pseudo-type-zero property (iii), and if for each t, the moment-generating function of M_t is finite on some open interval containing 0, then M is a homogeneous Levy process.

Remark: Under the hypothesis of either of the above theorems, it can further be shown that the sequence {P_k} satisfies the properties (i) and (iv) as well and is the unique sequence to do so. Moreover, the sequence {P_k} spans all of P(M) and also determines the distribution of M.
Next, we briefly mention some connections between the time-space harmonic polynomials of a process and what is known as the semi-stability property, as developed in Lamperti [5]. Recall that a process M with M_0 ≡ 0 is called semi-stable of index β > 0 if for every c > 0, the processes {M_{ct}, t ≥ 0} and {c^β M_t, t ≥ 0} have the same distribution. It can be easily shown that if {P_k ∈ P_k(M)} is
a sequence of time-space harmonic polynomials of a semi-stable process M of index β, then each P_k satisfies the following homogeneity property:

P_k(t, x) = t^{kβ} p_k(x t^{−β}),

where p_k(·) is the one-variable polynomial P_k(1, ·). In other words, each P_k is homogeneous in t^β and x. It can be shown that, under mild technical conditions, the converse is also true, that is, the existence of a sequence {P_k ∈ P_k(M)} such that each P_k is homogeneous in t^β and x, for some β > 0, implies that the process M is semi-stable of index β. It is also worthwhile to point out here that if a process M admits a sequence {P_k} of time-space harmonic polynomials which are homogeneous in t^β and x, then 2β must be an integer, and that in case 2β is odd, the finite-dimensional distributions of M are all symmetric about 0.

Finally, let us mention how an intertwining relationship between two Markov processes, as developed in Carmona et al. [1], relates the time-space harmonic polynomials of the two processes. If M and N are two Markov processes with semigroups (P_t) and (Q_t) respectively, one says that the two processes (or, the two semigroups) are intertwined if there exists an operator A such that AP_t = Q_t A ∀ t. In many cases, the operator A is given by the "multiplicative kernel" for a random variable Z, that is, Af(x) = Ef(xZ). In such a case, it is easy to show that, if P(t, x) = Σ_{j=0}^k p_j(t) x^j is a time-space harmonic polynomial for the process M, then

P̃(t, x) = AP(t, x) = Σ_{j=0}^k p_j(t) E(Z^j) x^j

is time-space harmonic for N. This has proved to be very useful in that if one knows the time-space harmonic polynomials of a process M, then one can get those for other processes which are intertwined with M. This is illustrated with examples in Section 4.
3 Finitely Polynomially Determined Levy Processes

In this section, we address the main question of this article, which involves obtaining a characterization of Levy processes whose laws are determined by finitely many of their time-space harmonic polynomials. In a sense, this is an extension of Levy's characterization of standard Brownian motion, which says that, under the additional assumption of continuity of paths, standard Brownian motion is characterized by two of its time-space harmonic polynomials, namely, the first two 2-variable Hermite polynomials P_1(t, x) = x and P_2(t, x) = x² − t. One knows that the continuity of paths is a crucial assumption here, without which the characterization does not hold. In the results that follow, the only path property we will assume is the standard assumption of r.c.l.l. paths for Levy processes. Let us start with some general definitions. Let C be a given class of processes.
Definition: A process M ∈ C will be called k-polynomially determined in C (in short, k-p.d. in C) if P_j(M) ≠ ∅ ∀ j ≤ k, and, for any N ∈ C, P_j(N) = P_j(M) ∀ j ≤ k implies N =_d M. (Here =_d means equality in distribution.) Processes which are k-p.d. in C for some k ≥ 1 are called finitely polynomially determined in C (in short, f.p.d. in C).
Let us remark here that an f.p.d. process need not be p-harmonizable. A general question that we may address is: for what classes of processes C can one get a complete characterization of the f.p.d. members of C? For two important classes of processes, such a complete characterization has been obtained, and these are presented below. The first result characterizes the f.p.d. processes in the class of all homogeneous Levy processes. As mentioned in the previous section, for any Levy process M, one has the representation

log(E(e^{iαM_t})) = iαv(t) + ∫ ((e^{iαu} − 1 − iαu)/u²) L([0, t] ⊗ du)
               = iαμ(t) − (1/2)α²σ²(t) + ∫ (e^{iαu} − 1 − iαu/(1+u²)) m([0, t] ⊗ du),

where L and m are called respectively the Kolmogorov measure and the Levy measure associated to the process M. In case M is homogeneous, the measures L and m turn out to be the product measures

L(dt ⊗ dx) = dt ⊗ l(dx),   m(dt ⊗ dx) = dt ⊗ η(dx),

where l and η are σ-finite measures on ℝ and ℝ ∖ {0} respectively, and the above representations take on the following special forms:

log(E(e^{iαM_t})) = iαvt + t ∫ ((e^{iαu} − 1 − iαu)/u²) l(du)
               = iαμt − (1/2)α²σ²t + t ∫ (e^{iαu} − 1 − iαu/(1+u²)) η(du).

It may be pointed out in this connection that the relation between the measures l and η is simply given by

l(du) = σ² δ_{0}(du) + u² η(du).
An important property of l that will be used subsequently is that for all k ≥ 2, the k-th cumulant of M_1 equals ∫ u^{k−2} l(du). The following theorem now gives a characterization of f.p.d. processes in the class of all homogeneous Levy processes.

Theorem 3.1. A process M is finitely polynomially determined in the class of all homogeneous Levy processes if and only if the associated measure l, or equivalently the measure η, has finite support.

Proof. It is immediate from the above relation between the measures l and η that whenever one of them has finite support, so does the other. In the proof, we will work with l.
Suppose first that l has finite support, say, l = Σ_{i=1}^n θ_i δ_{r_i}, where θ_i > 0, i = 1, …, n, and the r_i's are distinct real numbers. Here δ_{r} denotes the 'dirac' mass at r. We show that M is k-p.d. among homogeneous Levy processes with k = 2n+2. Let N be any homogeneous Levy process with P_j(N) = P_j(M) ∀ j ≤ 2n+2. We will show that v_N = v_M and l_N = l_M, which will imply that N =_d M. It is easy to see that P_j(N) = P_j(M) ∀ j ≤ 2n+2 implies the equality of the first 2n+2 moments of N_1 and M_1, which in turn implies the equality of their first 2n+2 cumulants. This entails, first of all, that v_N = v_M and also, in view of the above mentioned property of l, that ∫ u^j l_N(du) = ∫ u^j l_M(du) ∀ j = 0, 1, …, 2n. From these, one can easily deduce that for any choice of distinct real numbers a_1, …, a_n,

∫ ∏_{i=1}^n (u − a_i)² l_N(du) = ∫ ∏_{i=1}^n (u − a_i)² l_M(du).

In particular, taking a_i = r_i ∀ i, one obtains that

∫ ∏_{i=1}^n (u − r_i)² l_N(du) = 0,

implying that l_N is supported on {r_1, …, r_n}, that is, l_N = Σ_{i=1}^n θ′_i δ_{r_i}, for non-negative θ′_i, 1 ≤ i ≤ n. Using the facts v_N = v_M and ∫ u^j l_N(du) = ∫ u^j l_M(du) ∀ j = 0, 1, …, 2n, it is now easy to conclude that θ′_i = θ_i ∀ i, that is, l_N = l_M.
To prove the converse, suppose that M is a homogeneous Levy process for which the associated measure l is not finitely supported. We show that M is not f.p.d. by exhibiting, for any k, a homogeneous Levy process N, different from M, such that P_j(N) = P_j(M) ∀ j ≤ k. This is done as follows. Fix any k ≥ 1. Since l is not finitely supported, we can get disjoint Borel sets A_i ⊂ ℝ, i = 1, …, k, such that l(A_i) > 0 ∀ i. Consider the real vector space of signed measures on ℝ defined as

V = {μ : μ(·) = Σ_{i=1}^k c_i l(· ∩ A_i), c_i ∈ ℝ, 1 ≤ i ≤ k}

and consider the linear map Λ : V → ℝ^{k−1} defined by

Λ(μ) = (∫ u^j μ(du), 0 ≤ j ≤ k−2).

Λ being a linear map from a space of dimension k into a space of dimension k−1, the nullity of Λ must be at least 1. Choose a non-zero μ in the null-space of Λ. Further, we can and do choose μ so that |μ(A_i)| < l(A_i) ∀ i. If we now define l̃ = l + μ, then l̃ is a positive measure with l̃ ≠ l but ∫ u^j l̃(du) = ∫ u^j l(du) ∀ j = 0, …, k−2. It is now easy to see that if N is the homogeneous Levy process with v_N = v_M and Kolmogorov measure L(dt ⊗ dx) = dt ⊗ l̃(dx), then P_j(N) = P_j(M) ∀ j ≤ k but N ≠_d M. □
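The null-space construction in the converse half can be made concrete with a small numerical sketch: treat the sets A_i as atoms with hypothetical masses, build the moment map, pick a null vector, and check that the perturbed measure is positive, different from the original, and has the same moments of order 0 through k−2. All numbers below are illustrative, not from the text; numpy is assumed:

```python
import numpy as np

# Null-space construction from the converse half of Theorem 3.1, with the
# sets A_i replaced by atoms u_i carrying masses l_i (all values hypothetical).
k = 4
u = np.array([-2.0, -0.5, 0.3, 1.0])   # k support points of l
l = np.array([0.7, 0.4, 0.9, 0.5])     # masses l(A_i) > 0

# The map sending mu = sum_i c_i * l_i * delta_{u_i} to its moments of order
# 0, ..., k-2 is a (k-1) x k matrix, so it has a nontrivial null space.
A = np.vander(u, k - 1, increasing=True).T * l
c = np.linalg.svd(A)[2][-1]            # a vector in the null space of A

# Scale c so that |mu(A_i)| < l(A_i); then l_tilde = l + mu stays positive.
c *= 0.5 / np.max(np.abs(c))
mu = c * l
l_tilde = l + mu

assert np.all(l_tilde > 0) and not np.allclose(l_tilde, l)
for j in range(k - 1):
    # moments of order 0, ..., k-2 of l_tilde and l coincide
    assert np.isclose((l_tilde * u**j).sum(), (l * u**j).sum())
```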
Remarks: (i) A simple interpretation of the above theorem is that a homogeneous Levy process is f.p.d. if and only if its jumps, if and when they occur, are of sizes in a fixed finite set.

(ii) The proof of the 'if' part of the theorem shows that if the measure l is supported on precisely k many points, then the process is determined by its first 2k+2
many time-space harmonic polynomials. A natural question is whether 2k+2 is the minimum number of polynomials necessary. As we shall see in Section 4, that is indeed the case for the most common examples of homogeneous Levy processes. We conjecture that it is perhaps true in general.

Our next result will give a similar characterization of the f.p.d. property in a more general class of Levy processes than the homogeneous ones. To be specific, we consider the class of those Levy processes for which the Kolmogorov measure admits a 'disintegration' w.r.t. the Lebesgue measure on [0, ∞). Formally, let us say that the Kolmogorov measure L of a Levy process M admits a 'derivative measure' l if

L(dt, dx) = l(t, dx) dt,

where l(t, A), t ∈ [0, ∞), A ∈ B, is a transition measure on [0, ∞) × B. Here B denotes the Borel σ-field on ℝ. We denote by C the class of all those Levy processes whose Kolmogorov measure admits such a derivative measure. Clearly, all homogeneous Levy processes belong to this class, since in that case l(t, ·) ≡ l(·). The class C is fairly large. For example, Gaussian Levy processes as well as non-homogeneous compound Poisson processes belong to this class. Since C is clearly a vector space, any Levy process that arises as the sum of independent Levy processes of class C also belongs to this class.

As expected, our characterization of f.p.d. processes among the class C will be in terms of the derivative measure l(t, ·) defined above, and the general idea of the proof runs along the same lines as in the case of homogeneous Levy processes. However, the actual argument becomes a little more technical. For example, we would show that a process M in the class C cannot be k-p.d. unless for almost all t, the derivative measure l(t, ·) is supported on at most k points. This is the content of the following Lemma 3.1. The idea of the proof is analogous to that of the 'only if' part of Theorem 3.1 for homogeneous Levy processes. That is, assuming the contrary is true, we will have to define a new process N in class C such that P_j(N) = P_j(M) ∀ j ≤ k but N ≠_d M. However, getting hold of this process N, or equivalently its derivative measure l̃(t, ·), involves using an appropriate variant of a result of Descriptive Set Theory, known as Novikov's Selection Theorem, stated below as Lemma 3.2. We refer to Kechris [4] for details.

Lemma 3.1. Suppose the process M is k-polynomially determined in class C. Then for any version of l, the set T ⊂ [0, ∞) defined by T = {t > 0 : |supp(l(t, ·))| > k} is Borel and has Lebesgue measure zero.
We omit the proof of this lemma here. As mentioned above, the proof uses the following selection theorem (see [6] for details).

Lemma 3.2. Suppose U is a standard Borel space and V is a σ-compact subset of a Polish space. Let B ⊂ U × V be a Borel set whose projection to U is the whole of U. Suppose further that, for each x ∈ U, the x-section of B is closed in V. Then there is a Borel measurable function g : U → V whose graph is contained in B.
We now state and prove the characterization result for f.p.d. processes in the class C.

Theorem 3.2. Let M be a Levy process of the class C.
(a) If there exists an integer k ≥ 1 and a measurable function (x_1, …, x_k, p_1, …, p_k) : [0, ∞) → ℝ^k × [0, ∞)^k such that (i) for each j = 0, 1, …, 2k, Σ_{i=1}^k p_i(t)(x_i(t))^j is a polynomial in t almost everywhere, and, (ii) l(t, ·) = Σ_{i=1}^k p_i(t) δ_{x_i(t)}(·) is a version of the derivative measure for M, then M is finitely polynomially determined (indeed, (2k+2)-polynomially determined) in C.
(b) Conversely, if M is finitely polynomially determined in C, then there exists an integer k ≥ 1 and a measurable function (x_1, …, x_k, p_1, …, p_k) : [0, ∞) → ℝ^k × [0, ∞)^k such that a version of the derivative measure associated with M is given by l(t, ·) = Σ_{i=1}^k p_i(t) δ_{x_i(t)}(·).
As mentioned above, the idea of the proof is similar to the homogeneous case, except that it is a little more technical. One of the key observations used in the proof is that for a process M in the class C, P_j(M) ≠ ∅, 1 ≤ j ≤ k, if and only if the first cumulant c_1(t) of the process M is a polynomial in t and, for all 2 ≤ j ≤ k, the functions t ↦ ∫ u^{j−2} l(t, du) are polynomials in t almost everywhere, where l is a version of the derivative measure associated to M. Using this, here is a brief sketch of the proof of the theorem.
Proof. (a) In view of the above observation, the conditions (i) and (ii) clearly imply that P_j(M) ≠ ∅, 1 ≤ j ≤ 2k+2. If now N is another process of class C with P_j(N) = P_j(M) ∀ 1 ≤ j ≤ 2k+2, then it will follow that N has the same mean function as M and also that, for all 0 ≤ j ≤ 2k, ∫ u^j l_N(t, du) = ∫ u^j l_M(t, du) for almost all t ∈ [0, ∞), where l_N and l_M denote (versions of) the derivative measures associated with N and M respectively. Consequently, one will have

∫ ∏_{i=1}^k (u − x_i(t))² l_N(t, du) = ∫ ∏_{i=1}^k (u − x_i(t))² l_M(t, du).

By the same argument as in the proof of the 'if' part of Theorem 3.1, we get l_N(t, ·) = l_M(t, ·) for almost every t, and hence N =_d M.

(b) Suppose that M is k-polynomially determined in C. Using Lemma 3.1, one can get a version l(t, ·) of the derivative measure associated to M such that |supp(l(t, ·))| ≤ k for all t ∈ [0, ∞). For each 1 ≤ j ≤ k, let T_j = {t ∈ [0, ∞) : |supp(l(t, ·))| = j}. It can be shown that each T_j, and hence ∪_j T_j, is a Borel set. For t ∈ T_j, order the elements of supp(l(t, ·)) as x_1(t) < … < x_j(t) and denote the l(t, ·)-measures of these points by p_1(t), …, p_j(t) respectively. Also, for j < i ≤ k, set x_i(t) = x_j(t) + 1 and p_i(t) = 0. Finally, for t ∉ ∪_j T_j, set x_i(t) ≡ y_i and p_i(t) ≡ 0 for all 1 ≤ i ≤ k, where y_1, …, y_k is any arbitrarily chosen set of k points. With these notations, it is clear that l(t, ·) has the form asserted. One can now show that the mapping t ↦ (x_1(t), …, x_k(t), p_1(t), …, p_k(t)) as defined above is measurable, and that completes the proof. □
Remark: In the next section, we will see some examples of possible forms of the functions x_i(t) and p_i(t). Let us remark here that it is possible to formulate
the definition of the class C in terms of the Levy measures and then to give a characterization involving the 'derivative measure' arising out of the Levy measure. However, it is not clear how to go beyond the class C and to even formulate a condition that will, for example, characterize the f.p.d. processes among all Levy processes.
4 Some Examples
The most commonly known examples of polynomially harmonizable processes are the standard Brownian motion and the standard Poisson process. One can easily see that for a Brownian motion with μ and σ² as its drift and diffusion coefficients respectively, a canonical sequence of time-space harmonic polynomials is given by

P_k(t, x) = (σ²t)^{k/2} p_k((x − μt)/√(σ²t)),

where the p_k are the usual one-variable Hermite polynomials as defined in Section 1. Similarly, for the Poisson process with intensity λ, a sequence of time-space harmonic polynomials is given by

P_k(t, x) = Σ_{j=0}^k (k choose j) x^j Σ_{i=0}^{k−j} S(k−j, i) (−λt)^i,

where S(n, i) denote the Stirling numbers of the second kind. If M is a non-homogeneous compound Poisson process with intensity function λ(·) and jump-size distribution F, then it is not difficult to see, using the results described in Section 2, that M is polynomially harmonizable if and only if λ(·) is a polynomial function and F has finite moments of all orders. It is possible, though cumbersome, to get an explicit sequence of time-space harmonic polynomials.
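The Charlier formula above can be checked mechanically: the increment N_t − N_s of a Poisson process is Poisson(λ(t−s)) and independent of the past, and the moments of a Poisson variable are Touchard polynomials, i.e. sums of Stirling numbers of the second kind. The sign (−λt)^i in the inner sum is an assumed reading of the printed formula; a sketch using sympy:

```python
import sympy as sp
from sympy.functions.combinatorial.numbers import stirling

x, y, s, t, lam = sp.symbols('x y s t lam')

def charlier(k, time):
    # Two-variable Charlier polynomial as displayed above (intensity lam);
    # the sign (-lam*time)**i is an assumed reading of the printed formula.
    return sp.expand(sum(sp.binomial(k, j) * x**j *
                         sum(stirling(k - j, i) * (-lam*time)**i
                             for i in range(k - j + 1))
                         for j in range(k + 1)))

def poisson_moment(m, a):
    # E[N^m] for N ~ Poisson(a): the Touchard polynomial sum_i S(m, i) a^i
    return sum(stirling(m, i) * a**i for i in range(m + 1))

def expect_in_x(poly, shift, a):
    # E[poly(shift + N)] for N ~ Poisson(a), expanding in powers of x
    p = sp.expand(poly.subs(x, shift + x))
    return sum(p.coeff(x, m) * poisson_moment(m, a)
               for m in range(int(sp.degree(p, x)) + 1))

# Martingale property: E[P_k(t, y + (N_t - N_s))] = P_k(s, y), since the
# increment N_t - N_s is Poisson(lam*(t - s)) and independent of the past.
for k in range(1, 5):
    lhs = expect_in_x(charlier(k, t), y, lam*(t - s))
    assert sp.expand(lhs - charlier(k, s).subs(x, y)) == 0
```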
A not so well-known example of a p-harmonizable process is the process M = BES²(1), the square of the 1-dimensional Bessel process. It is well-known that this is a semi-stable Markov process whose generator is given by d/dx + 2x d²/dx². Using this, one can show that M is polynomially harmonizable and that a sequence of its time-space harmonic polynomials is given by

P_k(t, x) = Σ_{j=0}^k ((−2)^j k! / ((2j)! (k−j)!)) t^{k−j} x^j.
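Time-space harmonicity with respect to a Markov generator A amounts to the PDE ∂P/∂t + AP = 0; for the generator d/dx + 2x d²/dx² quoted above, this can be verified symbolically for the displayed polynomials (a sketch, assuming sympy):

```python
import sympy as sp

t, x = sp.symbols('t x')

def bes1_poly(k):
    # Time-space harmonic polynomial for BES^2(1), as displayed above.
    return sum((-2)**j * sp.factorial(k) * t**(k - j) * x**j /
               (sp.factorial(2*j) * sp.factorial(k - j)) for j in range(k + 1))

# For a Markov process with generator A = d/dx + 2x d^2/dx^2, the martingale
# property of P(t, M_t) amounts to dP/dt + dP/dx + 2x d^2P/dx^2 = 0.
for k in range(1, 7):
    P = bes1_poly(k)
    assert sp.expand(sp.diff(P, t) + sp.diff(P, x) + 2*x*sp.diff(P, x, 2)) == 0
```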
Using the technique mentioned at the end of Section 2, we can now get other examples of p-harmonizable processes that arise as Markov processes whose semigroups are intertwined with that of the process BES²(1). Some examples of random variables which lead to interesting semigroups intertwined with that of BES²(1), in the sense described in Section 2, are (i) Z = Z_{1/2, b}, having the beta distribution with parameters 1/2 and b, and (ii) Z = 2Z_{b+1/2}, where Z_{b+1/2} has the gamma distribution with parameter b + 1/2. The first one leads to the process BES²(2b), the square of the Bessel process of
dimension 2b, while the latter leads to a certain process detailed in Yor [8] with "increasing saw-teeth" paths. Another interesting example of a process intertwined with BES²(1) in the same way is what is called Azema's martingale (see Yor [9]), defined as M_t = sgn(B_t)·√(t − g_t), t ≥ 0, where B is the standard Brownian motion and g_t denotes the last zero of B before time t. The multiplicative kernel here is given by the random variable m_1, the terminal value of the "Brownian meander". In [9], Yor uses the Chaotic Representation Property to give an alternative proof of p-harmonizability of Azema's martingale as well as of each member of the class of "Emery's martingales". As an illustration of our method, we use the time-space harmonic polynomials of BES²(1) as obtained above and the intertwining to describe time-space harmonic polynomials for two of the cases mentioned above. In the case of BES²(2b), a sequence of time-space harmonic polynomials is given by

P_k(t, x) = Σ_{j=0}^k ((−2)^j (1/2)_j / ((2j)! (b + 1/2)_j (k−j)!)) t^{k−j} x^j,

where (y)_k stands for the product ∏_{i=0}^{k−1} (y + i).

For the Azema's martingale, one uses the fact that m_1 has a Rayleigh distribution to obtain a sequence of time-space harmonic polynomials given by

P_k(t, x) = E H_k(t, m_1 x) = Σ_{j=0}^k 2^{j/2} Γ(j/2 + 1) h_j^{(k)}(t) x^j,

where Γ(·) denotes the gamma function and H_k(t, x) = Σ_{j=0}^k h_j^{(k)}(t) x^j are the 2-variable Hermite polynomials.
We now discuss some examples of f.p.d. processes. First of all, it is not difficult to see that the only 2-p.d. Levy processes are those that are deterministic, that is, M_t is identically equal to a polynomial p(t). Our first example of a non-trivial f.p.d. process is the standard Brownian motion, which is a homogeneous Levy process with l(du) = δ_{0}(du). Thus, by our Theorem 3.1, standard Brownian motion is uniquely determined among homogeneous Levy processes by its first four time-space harmonic polynomials, for example, by the first four 2-variable Hermite polynomials. This result should be contrasted with the well-known characterization due to Levy, which says that the first two Hermite polynomials suffice if one assumes continuity of paths in addition. In contrast, our result says that among all homogeneous Levy processes, standard Brownian motion is the only one for which the first four Hermite polynomials are time-space harmonic. A natural question is whether we can do with less than four. The answer is an emphatic 'no'. An example of another homogeneous Levy process for which the first three Hermite polynomials are time-space harmonic is the mean zero process determined by the Kolmogorov measure L(dt, du) = dt ⊗ l(du), where l(du) = (1/2)[δ_{−1}(du) + δ_{1}(du)]. It is not difficult to see that any gaussian Levy process, with mean and variance functions being polynomials, is also 4-p.d.
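The counterexample with l = (1/2)[δ_{−1} + δ_{1}] can be checked through cumulants: for a mean-zero homogeneous Levy process the k-th cumulant of M_t is t∫u^{k−2} l(du), so this process agrees with standard Brownian motion up to third cumulants but has fourth cumulant t instead of 0. The sketch below (sympy assumed) compares E[H_k(t, M_t)], which must vanish whenever H_k is time-space harmonic:

```python
import sympy as sp

t, x, th = sp.symbols('t x theta')
K = 4

# Two-variable Hermite polynomials (time-space harmonic for standard BM).
ser = sp.series(sp.exp(th*x - th**2*t/2), th, 0, K + 1).removeO()
H = [sp.expand(sp.factorial(k) * ser.coeff(th, k)) for k in range(K + 1)]

def moments(cumulants):
    # moments of M_t recovered from its cumulants via exp of the CGF
    cgf = sum(c * th**j / sp.factorial(j) for j, c in cumulants.items())
    mgf = sp.series(sp.exp(cgf), th, 0, K + 1).removeO()
    return [sp.expand(sp.factorial(k) * mgf.coeff(th, k)) for k in range(K + 1)]

def expect(poly, m):
    # E[poly] for a variable with moment sequence m[0..K]
    p = sp.expand(poly)
    return sp.expand(sum(p.coeff(x, j) * m[j] for j in range(K + 1)))

# cumulants of M_t: c_k(t) = t * int u^(k-2) l(du) for k >= 2, mean zero;
# BM has l = delta_0, the counterexample has l = (delta_{-1} + delta_1)/2.
m_bm  = moments({2: t})
m_alt = moments({2: t, 4: t})

for k in range(1, 4):
    # H_1, H_2, H_3 have vanishing expectation for both processes
    assert expect(H[k], m_bm) == 0 and expect(H[k], m_alt) == 0
# ... but H_4 fails for the counterexample, so four polynomials are needed
assert expect(H[4], m_bm) == 0 and expect(H[4], m_alt) != 0
```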
For the homogeneous Poisson process with intensity λ, one has l(du) = λδ_{1}(du), so that once again it is 4-p.d. among all homogeneous Levy processes. Here
also, four is the minimum number needed, since one can easily construct an example of a different homogeneous Levy process for which the first three Charlier polynomials are time-space harmonic. For the non-homogeneous compound Poisson process, it can easily be seen that it is f.p.d. if and only if the jump-size distribution is finitely supported and the intensity function is a polynomial function, and that in this case it is actually (2k+2)-p.d., where k is the cardinality of the support of the jump-size distribution. We conclude with some examples of f.p.d. processes in the class C. We have a characterization of such processes in Theorem 3.2. Here are some examples of possible forms of the functions x_i(t) and p_i(t) that appear in that theorem. We consider only the case k = 2. The simplest possible case is that x_1(t), x_2(t) and p_1(t) ≥ 0, p_2(t) ≥ 0 are themselves polynomials. Another possibility is that

x_1(t) = a(t) + √(b(t)),  x_2(t) = a(t) − √(b(t)),  p_1(t) = c(t) + d(t)√(b(t)),  p_2(t) = c(t) − d(t)√(b(t)),

where a, b, c, d are polynomials so chosen that c + d√b and c − d√b are both non-negative on [0, ∞). One can similarly construct other examples. From Theorem 3.2, it follows that all these would lead to processes that are f.p.d. (in fact, 6-p.d.) in the class C.

A. Goswami
Stat-Math Unit
Indian Statistical Institute
203 B.T. Road
Kolkata 700 108, India
agoswami@indiana.edu

A. Sengupta
Division of Theoretical Statistics and Mathematics
Indian Statistical Institute
203 B.T. Road
Kolkata 700 108, India
Bibliography

[1] Carmona, P., Petit, F. and Yor, M. (1994). Sur les fonctionnelles exponentielles de certains processus de Levy. Stochastics and Stochastics Reports, 47, p. 71-101.
[2] Goswami, A. and Sengupta, A. (1995). Time-Space Polynomial Martingales Generated by a Discrete-Parameter Martingale. Journal of Theoretical Probability, 8, no. 2, p. 417-431.
[3] Ito, K. (1984). Lectures on Stochastic Processes. TIFR Lecture Notes, Tata Institute of Fundamental Research, Narosa, New Delhi.
[4] Kechris, A. S. (1995). Classical Descriptive Set Theory, v. 156, Graduate Texts in Mathematics, Springer-Verlag.
[5] Lamperti, J. (1972). Semi-Stable Markov Processes. Z. Wahrscheinlichkeitstheorie Verw. Gebiete, 22, p. 205-225.
[6] Sengupta, A. (1998). Time-Space Harmonic Polynomials for Stochastic Processes, Ph. D. Thesis, Indian Statistical Institute, Calcutta, India.
[7] Sheffer, I. M. (1939). Some Properties of Polynomial Sets of Type Zero. Duke Math. Jour., 5, p. 590-622.
[8] Yor, M. (1989). Une extension markovienne de l'algèbre des lois beta-gamma. C. R. Acad. Sci. Paris, Série I, 303, p. 257-260.
[9] Yor, M. (1994). Some Aspects of Brownian Motion; Part II: Some Recent Martingale Problems. Lectures in Mathematics, ETH Zürich. Laboratoire de Probabilités, Université Paris VI.
Effects of Smoothing on Distribution Approximations Peter Hall Australian National University
and Xiao-Hua Zhou Indiana University School of Medicine
Abstract

We show that a number of apparently disparate problems, involving distribution approximations in the presence of discontinuities, are actually closely related. One class of such problems involves developing bootstrap approximations to the distribution of a sample mean when the sample includes both ordinal and continuous data. Another class involves smoothing a lattice distribution so as to overcome rounding errors in the normal approximation. A third includes kernel methods for smoothing distribution estimates when constructing confidence bands. Each problem in these classes may be modelled in terms of sampling from a mixture of a continuous and a lattice distribution. We quantify the proportion of the continuous component that is sufficient to "smooth away" difficulties caused by the lattice part. The proportion is surprisingly small - it is only a little larger than n^{-1} log n, where n denotes sample size. Therefore, very few continuous variables are required in order to render a continuity correction unnecessary. The implications of this result in the problem of sampling both ordinal and continuous data are discussed, and numerical aspects are described through a simulation study. The result is also used to characterise bandwidths that are appropriate for smoothing distribution estimators in the confidence band problem. In this setting an empirical method for bandwidth choice is suggested, and a particularly simple derivation of Edgeworth expansions is given.
Keywords: Bandwidth, bootstrap, confidence band, confidence interval, continuity correction, coverage error, Edgeworth expansion, kernel methods, mixture distribution.
1 Introduction

1.1 Smoothing in distribution approximations
Rabi Bhattacharya has made very substantial contributions to our understanding of normal approximations in statistics and probability. None has been more important and influential than his exploration and application of smoothing as it relates to distribution approximations. For example, his development of ways of smoothing multivariate characteristic functions lies at the heart of his pathbreaking work on Berry-Esseen bounds and other measures of rates of
convergence in the multivariate central limit theorem (e.g. Bhattacharya 1967, 1968, 1970; Bhattacharya and Rao, 1976). His introduction of what has become known as the "smooth function model" (Bhattacharya and Ghosh, 1978), for describing properties of Edgeworth expansions of statistics that can be expressed as smooth functions of means, has allowed wide-ranging asymptotic studies of statistical methods such as those based on the bootstrap. The present paper is a very small contribution, but nevertheless in a related vein - a small token of our appreciation of the considerable contribution that Rabi has made to distribution approximations in mathematical statistics. A key assumption in many distribution approximations in statistics is that the distribution being approximated is continuous. Without this property, not only are approximation errors likely to be large, but special features that the approximations are often assumed to enjoy can be violated. These include the property that the coverage error of a two-sided confidence interval is an order of magnitude less than that for its one-sided counterpart. In a range of practical problems the assumption of smoothness can be invalid, however. In such cases there may sometimes be enough "residual" smoothing present in other aspects of the problem for it to be unnecessary to smooth in an artificial way. Nevertheless, even in these circumstances it is important to know how much residual smoothing is required, so that the adequacy of the residual smoothing can be assessed. In other problems there is simply not enough smoothing to overcome the most serious discretisation errors; there, artificial smoothing, for example using kernel methods, can be efficacious. In the present paper, motivated by particular problems of both these types, we derive a general theoretical benchmark for the level of smoothing that is adequate in each case.
In the first class of problem, encountered in several practical settings, we suggest an empirical method for assessing whether the benchmark has been attained. In the second class, related to smoothed distribution estimation, we introduce an empirical technique for determining how much smoothing should be provided. Both types of problem have a common basis, in that they represent mixture-type sampling schemes where a portion of the data are drawn from a smooth distribution and the remainder from a lattice distribution. It is shown that the sampling fraction of the smooth component can be surprisingly small before difficulties arise through the roughness of the other component. The threshold is approximately n^{-1} log n, where n denotes sample size. In the case of the second problem this result may be interpreted as a prescription for bandwidth choice, which can be implemented in practice using a smoothed bootstrap method. For the first problem the result may be interpreted as defining a safeguard: only when the smooth component is present in a particularly small proportion will the unsmooth component cause difficulties. Next we introduce the two classes of problem.
1.2 First problem: bootstrap inference for distributions with both ordinal and continuous components
In some applications it is common to encounter a data distribution that is a mixture of an atom at the origin and a continuous component on the positive
half-line. Examples include the cost of health care (e.g. Zhou, Melfi and Hui, 1997) and the proportion of an account that an audit determines to be in error (e.g. Cox and Snell, 1979; Azzalini and Hall, 2000). In the second example, both 0 and 1 can be atoms of the sampled distribution. In both examples the mean of the mixture, rather than the mean of just the continuous component, is of interest. If all the data are ordinal and lie within a relatively narrow range, for example if the costs or proportions in the respective examples are distributed among only a half-dozen equally-spaced bins, then the lattice nature of the data needs careful attention if bootstrap methods are to be used to construct confidence intervals for the mean. Indeed, particular difficulties associated with this case were addressed in the first detailed theoretical treatment of bootstrap methods for distribution approximation; see Singh (1981). One way of alleviating these difficulties is to use smoothed bootstrap methods; see for example Hall (1987a). On the other hand, no special treatment is required if just the positive part of the sampled distribution is addressed, provided this portion of the distribution is smooth.
This begs the question of what should be done in the mixture case. Does the implicit smoothing provided by the continuous component overcome potential difficulties caused by the ordinal component? How does the answer to this question depend on the proportion of the ordinal component in the mixture? Our results on the effects of smoothing on distribution approximation allow us to answer both these questions; see sections 3.1 and 4.1. A related problem is that of smoothing a discrete distribution so as to construct a confidence interval for its mean. One approach is to blur each lattice-valued observation over an interval on either side of its actual value; see for example Clark et al. (1997, p. 12). For example, if a random variable Y with this distribution takes only integer values, we might replace an observed value Y = i by i + εZ, where ε > 0 and Z is symmetric on the interval [−1, 1]. How large does ε have to be in order to effectively eliminate rounding errors from an approximation to the distribution of the mean of n values of Y? In particular, can we allow ε to decrease with sample size, and if so, how fast? Answers will be given in sections 3.1 and 4.1. Of course, in this second aspect of the first problem it is the mean of Y, not the mean of X = Y + εZ, about which we wish to make inference. However, the mean of εZ is known, and so it is a trivial matter to progress from a confidence interval for E(X) to one for E(Y).
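The blurring device just described is easy to put into code. The following is a minimal sketch of ours (not the authors' code): Z is taken uniform on [−1, 1], one admissible symmetric choice, and ε is set at the n^{-1} log n rate discussed below; all function and variable names are our own.

```python
import numpy as np

def blur_lattice_sample(y, eps, rng):
    """Replace each lattice-valued observation y_i by y_i + eps * z_i,
    where the z_i are i.i.d. symmetric on [-1, 1] (here: uniform)."""
    z = rng.uniform(-1.0, 1.0, size=len(y))
    return y + eps * z

rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 6, size=n).astype(float)   # ordinal data in six bins

eps = np.log(n) / n                            # the threshold rate n^{-1} log n
x = blur_lattice_sample(y, eps, rng)

# Since E(eps * Z) = 0, a confidence interval for E(X) is also one for E(Y);
# the two sample means differ by at most eps in absolute value.
print(abs(x.mean() - y.mean()))
```

Because each perturbation is bounded by ε, the smoothed sample mean stays within ε of the lattice sample mean, so inference about E(Y) is unchanged up to that small shift.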
1.3 Second problem: confidence bands for distribution functions
Let U = {U_1, …, U_n} denote a random sample drawn from a distribution F, and write F̂ for the empirical distribution function based on U. Then, with z_{α/2} denoting the upper ½α-level point of the standard normal distribution,
F̂ ± {n^{-1} F̂(1 − F̂)}^{1/2} z_{α/2} is a conventional confidence band for F founded on the normal approximation, with nominal pointwise coverage 1 − α. In more standard problems, involving a mean of smoothly distributed random variables, the coverage accuracy of such a band would equal O(n^{-1}). In the present setting, however, owing to asymmetric rounding errors that arise in approximating the discrete binomial distribution by a smooth normal one, the coverage error of even a two-sided symmetric confidence band is in general no better than O(n^{-1/2}). A particularly simple way of smoothing in this setting, and potentially overcoming difficulties caused by rounding errors, is to use kernel methods. Let K, the kernel, be a bounded, symmetric, compactly supported probability density, write L for the corresponding distribution function, and let h be a bandwidth. Then
F̂_h(u) = n^{-1} Σ_{i=1}^{n} L{(u − U_i)/h}    (1.1)
is a smoothed kernel estimator of F. We may interpret E{F̂_h(u)} in at least two different ways: firstly, as the mean of a sample drawn from a mixture of two distributions, one taking only the values 0 and 1 (the latter with probability F(u − hc), where [−c, c] denotes the support of K), and the other having a smooth distribution (equal to that of L{(u − U_i)/h}, conditional on (u − U_i)/h lying within the support of K); and secondly, as the distribution function of X = Y + hZ, where Y and Z have distribution functions F and L, respectively. Hence, this problem and those described in section 1.2 have identical roots. The bias of F̂_h(u), as an estimator of F(u), equals O(h²) provided F is sufficiently smooth. In relative terms its variance differs from that of F̂(u) by only O(h). See Azzalini (1981), Reiss (1981) and Falk (1983) for discussion of these and related properties. Together these results suggest that taking h as small as possible is desirable, since then h would have least effect on moment properties. Indeed, the moment properties suggest that h = O(n^{-1}) might give the O(n^{-1}) coverage error seen in conventional problems. However, it may be shown that this size of bandwidth is not adequate for removing difficulties caused by lack of smoothness of the distribution of F̂. With h = O(n^{-1}), rounding errors still contribute terms of order n^{-1/2} to the coverage error of two-sided confidence bands. Can we choose h large enough to overcome these problems, and yet small enough to give an order of coverage accuracy close to the "ideal" O(n^{-1})? And even if this problem has a theoretical solution, can good coverage accuracy be achieved empirically? These questions will be answered in sections 3.2 and 4.2, where we shall propose and describe an empirical bandwidth-choice method in the confidence band problem.
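Estimator (1.1) can be transcribed directly. In this sketch (our code, not the authors'), K is taken to be the Epanechnikov kernel, one bounded, symmetric, compactly supported density, and L is its integral; all names are ours.

```python
import numpy as np

def L_epanechnikov(t):
    """Distribution function of the Epanechnikov kernel K(t) = 0.75*(1 - t^2)
    on [-1, 1]: L(t) is the integral of K from -1 to t, clipped to [0, 1]."""
    t = np.clip(t, -1.0, 1.0)
    return 0.75 * (t - t**3 / 3.0) + 0.5

def F_hat_h(u, sample, h):
    """Smoothed empirical distribution function (1.1):
    F_h(u) = n^{-1} * sum_i L((u - U_i)/h)."""
    u = np.asarray(u, dtype=float)
    return L_epanechnikov((u[..., None] - sample) / h).mean(axis=-1)

rng = np.random.default_rng(1)
U = rng.normal(size=200)
grid = np.linspace(-3.0, 3.0, 7)
vals = F_hat_h(grid, U, h=0.2)
# F_hat_h is itself a genuine distribution function: nondecreasing, in [0, 1].
```

Because L is a distribution function, F̂_h inherits monotonicity and the range [0, 1] for any bandwidth h > 0, unlike some other ways of smoothing F̂.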
Additionally, we shall show that our approach to the problem of smoothed distribution estimation, via sampling from a mixture distribution, leads to particularly simple derivations of Edgeworth expansions. There is of course an extensive literature on the problem of bandwidth choice for kernel estimation of distribution functions. It includes both plug-in and cross-validation methods; see Mielniczuk, Sarda and Vieu (1989), Sarda (1993), Altman and Léger (1995), and Bowman, Hall and Prvan (1998). However, in all
these cases the bandwidths that are proposed are of asymptotic size n^{-1/3}, much larger than n^{-1}. They are appropriate only for estimation of the distribution function curve, not for confidence interval or band construction, and produce relatively high levels of coverage error if used for the latter purpose. The class of distribution and density estimation problems is characterised by an interesting hierarchy of bandwidth sizes: n^{-1/5} for estimating a density curve, n^{-1/3} for distribution curve estimation, and a still smaller size, approximately n^{-1} log n (as we shall show in section 3.2), for constructing two-sided confidence bands for a distribution function.
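The separation between these three bandwidth rates is easy to see numerically; the following trivial sketch (ours) just tabulates them for a few sample sizes.

```python
import numpy as np

# Bandwidth hierarchy: density curve, distribution curve, confidence bands.
for n in (100, 1000, 10000):
    h_density = n ** (-1 / 5)     # density estimation rate
    h_cdf = n ** (-1 / 3)         # distribution curve estimation rate
    h_band = np.log(n) / n        # approximately n^{-1} log n, for bands
    print(f"n={n:6d}  n^(-1/5)={h_density:.4f}  "
          f"n^(-1/3)={h_cdf:.4f}  n^(-1)log n={h_band:.6f}")
```

Even at moderate n the band-construction bandwidth is an order of magnitude smaller than the curve-estimation bandwidths, which is why off-the-shelf plug-in rules are unsuitable here.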
2 Distribution-Approximation Difficulties Caused by Lack of Smoothness
Let X_1, …, X_n be independent and identically distributed random variables with the distribution of X, and let X̄ = n^{-1} Σ_i X_i denote the sample mean. Many explanations for the small-sample performance of bootstrap approximations to the distribution of X̄ are based on properties of its Edgeworth expansion. A formal expansion exists under moment conditions alone. In particular, provided only that (2.1) holds, the formal Edgeworth expansion up to terms in n^{-k/2} is well defined; it is

P{n^{1/2}(X̄ − EX)/(var X)^{1/2} ≤ x} ≈ Φ(x) + n^{-1/2} p_1(x) φ(x) + … + n^{-k/2} p_k(x) φ(x),

where Φ and φ denote the standard normal distribution and density functions, and each p_j is a polynomial whose coefficients depend on the moments of X.
ε > 0 and Z has a continuous distribution. As long as ε = ε(n) decreases to 0 more slowly than n^{-1} log n, this modification allows us to approximate the distribution of the mean of Y + εZ by its formal Edgeworth expansion to any order; see section 5.3. If the distribution of Z is symmetric then the distributions of both Y and Y + εZ have the same mean and skewness, and their variances differ only to order ε². Moreover, the "converse" results described in the previous paragraphs have direct analogues in the setting of additive smoothing of a discrete distribution.
3.2 Solution to second problem
Recall from section 1.3 that we seek a pointwise, (1 − α)-level confidence band for the distribution function F. We noted there that the standard normal-approximation band, F̂ ± {n^{-1} F̂(1 − F̂)}^{1/2} z_{α/2}, has only O(n^{-1/2}) coverage accuracy, owing to uncorrected discretisation errors. We suggest instead the smoothed band,

F̂_h ± {n^{-1} F̂_h(1 − F̂_h)}^{1/2} z_{α/2},    (3.1)

where F̂_h is as defined at (1.1). We shall show at the end of this section that by taking h = n^{-1}(log n)^{1+ε}, for any ε > 0, the coverage error of this band is reduced to O(h). That is only a little worse than the O(n^{-1}) level encountered in related problems, where the sampled distribution is smooth. These properties are highly asymptotic in character, however. To achieve a good level of performance in practice we suggest the following approach. Using standard kernel methods, compute an estimator of the density f = F′ based on the sample U. For example, if employing the same kernel K as before, the estimator would be
f̂(u) = (nh_1)^{-1} Σ_{i=1}^{n} K{(u − U_i)/h_1},

where h_1 is a bandwidth the size of which is appropriate to density estimation. (In particular, h_1 would generally be computed using either cross-validation or a plug-in rule; it would be of size n^{-1/5}, in asymptotic terms.) Let F̂_{h_1}(u) = ∫_{v ≤ u} f̂(v) dv.
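A hedged sketch of band (3.1) with the bandwidth h = n^{-1}(log n)^{1+ε} recommended above; the kernel CDF is again the Epanechnikov choice, and z = 1.96 corresponds to a nominal 95% pointwise band. The kernel choice and all names are ours, not the paper's.

```python
import numpy as np

def L_epan(t):
    """CDF of the Epanechnikov kernel K(t) = 0.75*(1 - t^2) on [-1, 1]."""
    t = np.clip(t, -1.0, 1.0)
    return 0.75 * (t - t**3 / 3.0) + 0.5

def smoothed_band(u, sample, z=1.96, eps=0.1):
    """Pointwise band (3.1): F_h +/- z * {n^{-1} F_h (1 - F_h)}^{1/2},
    with bandwidth h = n^{-1} (log n)^{1 + eps} as recommended in the text."""
    n = len(sample)
    h = np.log(n) ** (1.0 + eps) / n
    Fh = L_epan((np.asarray(u, dtype=float)[..., None] - sample) / h).mean(axis=-1)
    half = z * np.sqrt(Fh * (1.0 - Fh) / n)
    return Fh - half, Fh + half

rng = np.random.default_rng(5)
sample = rng.normal(size=400)
grid = np.linspace(-2.5, 2.5, 11)
lo, hi = smoothed_band(grid, sample)
```

In practice ε (and hence h) would be chosen empirically, as proposed in sections 3.2 and 4.2; the sketch simply fixes ε = 0.1 for illustration.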
(A.3) As n tends to infinity, (1/n) Σ_i E X_i converges to π_1 > 0 and (1/n) Σ_i E Y_i converges to π_2 > 0.
In the special case when the distributions of e_i are the same for all i (so that (1/n) Σ_i E X_i = π_1, where π_1 is the common expectation of all X_i; similarly for π_2), A.3 is satisfied. Under A.1-A.3, if the number n of agents increases to infinity, as a consequence of the weak law of large numbers (see Lamperti [15], p. 22) we have the following property of the equilibrium prices p_n^*:
Proposition 1. Under A.1-A.3, as n tends to infinity, p_n^*(ω) converges in probability to the constant
p_0 = π_1/(π_1 + π_2).    (2.16)

Roughly, one interprets (2.16) as follows: for large values of n, the equilibrium price will not vary much from one state of the environment to another, and will be insensitive to the exact value of n, the number of agents. For the constant p_0 defined by (2.16), we have the following characterization of the probability of ruin in a large Walrasian economy:
Proposition 2. If p_0 e_{i1}(ω) + (1 − p_0) e_{i2}(ω) has a continuous distribution function,

lim_{n→∞} P(R_i^n) = μ_i{(u_1, u_2) : p_0 u_1 + (1 − p_0) u_2 ≤ m_i(p_0)}.    (2.17)
Nigar Hashimzade and Mukul Majumdar
Remark: The probability on the right side of (2.17) does not depend on n, and is determined by μ_i, a characteristic of agent i, and p_0. Our first task is to characterize P(R_i^n) when n is large (so that the assumption that an individual agent accepts market prices as given is realistic). One is tempted to conjecture that the convergence property of Proposition 1 will continue to hold if correlation among agents becomes 'negligible' as the size of the economy increases. We shall indicate a 'typical' result that captures such intuition.

Proposition 3. Let the assumptions (A.1) and (A.3) hold. Moreover, assume

(A.2′) There exist two non-negative sequences (ℓ_k)_{k≥0}, (ℓ′_k)_{k≥0}, both converging to zero, such that for all i, k,

|Cov(X_i, X_{i+k})| ≤ ℓ_k,  |Cov(Y_i, Y_{i+k})| ≤ ℓ′_k.

Then, as n tends to infinity, p_n^*(ω) converges in probability to p_0 = π_1/(π_1 + π_2).
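Propositions 1 and 3 can be illustrated by simulation. The explicit formula (2.9) for p_n^* is not reproduced in this excerpt, so the sketch below assumes the natural candidate p_n^* = Σ X_i / (Σ X_i + Σ Y_i), which has the limit π_1/(π_1 + π_2) required by (2.16); the endowment distributions are our own illustrative choices.

```python
import numpy as np

def equilibrium_price(X, Y):
    """Stand-in for (2.9), assumed form: p_n* = sum(X) / (sum(X) + sum(Y)).
    By the weak law of large numbers this tends to pi_1/(pi_1 + pi_2)."""
    return X.sum() / (X.sum() + Y.sum())

rng = np.random.default_rng(2)
pi1, pi2 = 2.0, 3.0
p0 = pi1 / (pi1 + pi2)                 # the constant in (2.16): 0.4

for n in (10, 100, 10000):
    X = rng.exponential(pi1, size=n)   # E X_i = pi_1
    Y = rng.exponential(pi2, size=n)   # E Y_i = pi_2
    print(n, equilibrium_price(X, Y))  # approaches p0 as n grows
```

With n = 10 the price still fluctuates noticeably across realizations of ω; by n = 10000 it is essentially the constant p_0, which is the content of Proposition 1.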
p_k > 0, Σ_{k=1}^{l} p_k = 1}. An agent i accepts the price system p ∈ S as given. It is described by a pair (f_i, e_i), where the endowment vector e_i ∈ R^l, e_i ≫ 0. The wealth of agent i at p is w_i = p · e_i ≡ Σ_{k=1}^{l} p_k e_{ik}. The demand function f_i is a continuous function from S × R_{++} to R_+^l such that for every (p, w_i) ∈ S × R_{++}, p · f_i(p, w_i) = w_i (where p · f_i(p, w_i) ≡ Σ_{k=1}^{l} p_k f_{ik}(p, w_i)). Usually the demand functions are derived from a utility maximization problem of type (P) indicated above. For our analysis, the key concept is the excess demand function of agent i, defined as ζ_i(p) ≡ f_i(p, w_i) − e_i (compare to (2.4)). The excess demand function for the economy is ζ(p) = Σ_{i=1}^{n} ζ_i(p), a continuous function on S.

Survival under Uncertainty in an Exchange Economy

Note that Σ_{k=1}^{l} p_k ζ_{ik}(p) = 0; hence, the excess demand function for the economy satisfies the "Walras Law":

p · ζ(p) ≡ Σ_{k=1}^{l} p_k ζ_k(p) = 0.    (2.18)
An equilibrium price system p* ∈ S satisfies ζ(p*) = 0. By the Walras Law (2.18), if for any p̄ ∈ S, ζ_k(p̄) = 0 for k = 1, …, l − 1, then necessarily ζ_k(p̄) = 0 for k = l. The Walras Law (2.18) can be verified directly from (2.3) and (2.4) in our example, and when the equilibrium price (2.9) is derived for the first market, there is also equilibrium in the second market, which can be directly checked. A detailed exposition of this model with l ≥ 2 commodities is in Debreu [9]. In [3] the Debreu model was extended to introduce random preferences and endowments, and the implications of the law of large numbers and the central limit theorem were first systematically explored. Throughout this section we shall consider l = 2 to see the main results in the simplest form.
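For a concrete numerical check of the Walras Law (2.18), the sketch below uses Cobb-Douglas demand functions (agent i spends a fixed share θ_{ik} of wealth on good k); the particular endowments, shares, and names are invented for illustration and are not from the paper.

```python
import numpy as np

def demand(p, w, theta):
    """Cobb-Douglas demand: agent spends share theta_k of wealth w on good k,
    so f_k(p, w) = theta_k * w / p_k, and p . f(p, w) = w by construction."""
    return theta * w / p

def excess_demand(p, endowments, thetas):
    """zeta(p) = sum over agents i of [f_i(p, p . e_i) - e_i]."""
    zeta = np.zeros_like(p)
    for e, th in zip(endowments, thetas):
        w = p @ e                       # wealth w_i = p . e_i
        zeta += demand(p, w, th) - e
    return zeta

p = np.array([0.3, 0.7])                # a point of the price simplex S
endowments = [np.array([1.0, 2.0]), np.array([3.0, 0.5])]
thetas = [np.array([0.4, 0.6]), np.array([0.25, 0.75])]

zeta = excess_demand(p, endowments, thetas)
print(p @ zeta)                         # Walras Law (2.18): p . zeta(p) = 0
```

The identity p · ζ(p) = 0 holds at every price vector, not just in equilibrium, exactly as (2.18) states; an equilibrium is a price at which each coordinate of ζ vanishes.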
2.4 Dependence: Exchangeability
We shall now see that if dependence among agents does not "disappear" even when the economy is large, the risk of ruin due to the "indirect" terms-of-trade effect of uncertainty may remain significant. To capture this in a simple manner, let us say that μ and ν are two possible probability laws of {e_i(·)}_{i≥1}. Think of Nature conducting an experiment with two outcomes "H" and "T" with probabilities (θ, 1 − θ), 0 < θ < 1. Conditionally, given that "H" shows up, the sequence {e_i(·)}_{i≥1} is independent and identically distributed with common distribution μ. On the other hand, conditionally given that "T" shows up, the sequence {e_i(·)}_{i≥1} is independent and identically distributed with common distribution ν. Let π_{1μ} and π_{1ν} be the expected values of X_1 under μ and ν respectively. Similarly, let π_{2μ} and π_{2ν} be the expected values of Y_1 under μ and ν. It follows that p_n(·) converges to p_0(·) almost surely, where p_0(·) = π_{1μ}/[π_{1μ} + π_{2μ}] = p_{0μ} with probability θ, and p_0(·) = π_{1ν}/[π_{1ν} + π_{2ν}] = p_{0ν} with probability 1 − θ. We now have a precise characterization of the probabilities of ruin as n tends to infinity. To state it, write

r_i(μ) = ∫_{{(u_1, u_2) ∈ R_+^2 : p_{0μ} u_1 + (1 − p_{0μ}) u_2 ≤ m_i(p_{0μ})}} μ(du_1, du_2).    (2.19)

Similarly, define r_i(ν), obtained on replacing μ by ν in (2.19).

Proposition 4. Assume that p_0 e_{i1}(ω) + (1 − p_0) e_{i2}(ω) has a continuous distribution function under each distribution μ and ν of e_i = (e_{i1}, e_{i2}).

(a) Then, as the number of agents n goes to infinity, the probability of ruin of the i-th agent converges to r_i(μ), with probability θ, when "H" occurs, and to r_i(ν), with probability 1 − θ, when "T" occurs.
(b) The overall, or unconditional, probability of ruin converges to θ r_i(μ) + (1 − θ) r_i(ν).
Here, the precise limit distribution is slightly more complicated, but the important distinction from the case of independence (or "near independence") is that the limit depends not just on the individual uncertainties captured by the distributions μ and ν of an agent's endowments, but also on θ, which retains an influence on the distribution of prices even with large n.
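The exchangeable case can also be simulated: conditionally on Nature's coin, endowments are i.i.d. μ or ν, and the price limit itself is random. As before, the sketch assumes the form p_n^* = Σ X_i/(Σ X_i + Σ Y_i) for the equilibrium price, and the distributions μ and ν are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 0.3          # P("H")
n = 20000            # agents per economy

# mu: E X = 1, E Y = 1  ->  p_{0 mu} = 0.5
# nu: E X = 1, E Y = 3  ->  p_{0 nu} = 0.25
limits = []
for _ in range(200):
    if rng.random() < theta:                  # Nature draws "H": law mu
        X = rng.exponential(1.0, n); Y = rng.exponential(1.0, n)
    else:                                     # Nature draws "T": law nu
        X = rng.exponential(1.0, n); Y = rng.exponential(3.0, n)
    limits.append(X.sum() / (X.sum() + Y.sum()))

limits = np.array(limits)
# Prices cluster near the two random limits 0.5 and 0.25, not near a constant:
# theta retains an influence on the price distribution however large n is.
```

Unlike the independent (or nearly independent) case, enlarging n does not wash out the randomness: it only sharpens the two clusters at p_{0μ} and p_{0ν}.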
2.5 Dependency neighborhoods
Dependency neighborhoods were introduced by Stein [28] and are defined in the following way. Consider a set of n random agents. A subset S_i of the set of integers {1, 2, …, n} containing an agent i is a dependency neighborhood of i if i is independent of all agents not in S_i. The sets S_i need not constitute a partition. Further, consider a dependency neighborhood of S_i: a set N_i such that S_i ⊂ N_i, and the collection of agents in S_i is independent of the collection of agents not in N_i. The latter can be viewed as the second-order dependency neighborhood of the agent i. In general,

N_i = ∪_{j ∈ S_i} S_j    (2.20)
need not be the case (this is related to the fact that pairwise independence does not imply mutual independence), although one might expect this relation to hold in non-exotic situations (see, for example, [21]). Consider now an economy E_n with dependency neighborhoods S_1^{(n)}, …, S_n^{(n)} for each of n agents. As above, the i-th agent is characterized by (f_i, e_i), where e_i = (e_{i1}, e_{i2}). The Walrasian equilibrium price p_n^* is given by (2.9)-(2.10). The convergence property, similar to Proposition 3, holds under modified assumptions on the distribution of random endowments and an additional assumption on the size of the dependency neighborhoods.

Proposition 5. Let the assumptions (A.1) and (A.3) hold. Moreover, assume

(A.2″) B_{ni} ≡ max_{j≠i, j∈S_i^{(n)}} |Cov(Z_i, Z_j)| < B < ∞, Z ∈ {X, Y}, for every i = 1, …, n, uniformly in n, for some sufficiently large positive B.

(A.4) s_n ≡ max_{i=1,…,n} #S_i^{(n)} ≤ n^{1−ε} uniformly in n, for some ε ∈ (0, 1).

Then, as n tends to infinity, p_n^*(ω) converges in probability to

p lim p_n^*(ω) = π_1/(π_1 + π_2).
Using the results of Majumdar and Rotar [19], we can construct an approximate distribution of the equilibrium price in a large Walrasian economy.
Proposition 6. Let the assumptions (A.1), (A.2″), (A.3) and (A.4) hold. Let us also assume that (2.20) holds for the dependency-neighborhood structure. Then the distribution of p_n^*(ω) can be approximated by a normal distribution with mean p_0 and variance V_n defined in (2.21) and (2.22).
(See [13] for proofs.)
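A rough numerical illustration in the spirit of Proposition 6 (our sketch, not the construction in [13] or [19]): endowments built as moving averages of i.i.d. shocks are m-dependent, so S_i = {i − m, …, i + m} is a dependency neighborhood in Stein's sense, and the equilibrium price - again using our assumed form Σ X_i/(Σ X_i + Σ Y_i) - should be approximately normal.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, reps = 3, 5000, 2000

prices = np.empty(reps)
for r in range(reps):
    # m-dependent endowments: X_i (resp. Y_i) is a moving average of m + 1
    # i.i.d. positive shocks, so agent i is independent of all agents at
    # distance greater than m: S_i = {i - m, ..., i + m}.
    gx = rng.exponential(1.0, n + m)
    gy = rng.exponential(1.0, n + m)
    X = np.convolve(gx, np.ones(m + 1), mode="valid")   # length n
    Y = np.convolve(gy, np.ones(m + 1), mode="valid")
    prices[r] = X.sum() / (X.sum() + Y.sum())

# Approximate normality: the standardized prices have skewness near zero.
z = (prices - prices.mean()) / prices.std()
print(np.mean(z ** 3))
```

A fuller check would compare the empirical distribution of z with the normal law whose variance is V_n from (2.22); here we only verify the symmetry that normality implies.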
3 Extrinsic uncertainty with overlapping generations: an example
In the previous section we assumed that endowments of the agents are different in different states of environment. This type of uncertainty, which affects the so-called fundamentals of the economy (endowments, preferences, and technology), is called intrinsic uncertainty. When the uncertainty affects the beliefs of the agents (for example, the agents believe that market prices depend on some "sunspots") whereas the fundamentals are the same in all states, it is called extrinsic uncertainty. Clearly, with respect to the probability of survival, extrinsic uncertainty has no direct effect, because it does not affect the endowments. However, it may have an indirect effect: self-fulfilling beliefs of the agents regarding market prices affect their wealth, and some agents may be ruined in one state of environment and survive in some other state, even though the fundamentals of the economy are the same in all states. To study the indirect, or adverse terms-of-trade, effect of extrinsic uncertainty on survival we turn to a dynamic economy. Consider a discrete-time, infinite-horizon OLG economy with constant population. We use Gale's terminology [11] wherever appropriate. For expository simplicity, and without loss of generality, we assume that at the beginning of every time period t = 1, 2, … there are two agents: one "young" born in t, and one "old" born in t − 1. In period t = 1 there is one old agent of generation 0. There is one (perishable) consumption good in every period. The agent born in t (generation t) receives an endowment vector e_t = (e_t^y, e_t^o) and consumes a vector c_t = (c_t^y, c_t^o). We consider the Samuelson case⁴ and assume, without loss of generality, e_t = (1, 0). We assume that the preferences of the agent of generation t can be represented by the expected utility function U_t(·) = E[u_t(c_t)], with Bernoulli utility u_t(c_t) continuously differentiable and almost everywhere twice continuously differentiable, strictly concave and strictly monotone on D, a compact, convex subset of R_{++}^2. The old agent of generation 0 is endowed with one
⁴If a population grows geometrically at the rate γ, so that γ^t agents are born in period t, and there is only one good in each period, the Samuelson case corresponds to the marginal rate of intertemporal substitution of consumption under autarky, u_1(e^y, e^o)/u_2(e^y, e^o), being less than γ. In our case γ = 1.
unit of fiat money, the only nominal asset in the economy. In every period the market for the perishable consumption good is open and accessible to all agents. Denote the nominal price of the consumption good at time t by p_t. Define a price system to be a sequence of positive numbers, p = {p_t}, a consumption program to be a sequence of pairs of positive numbers c = {c_t}, and a feasible program to be a consumption program that satisfies c_t^y + c_{t−1}^o ≤ e_t^y + e_{t−1}^o = 1. The agent of generation t maximizes his lifetime expected utility in the beginning of period t. In period 1, the young agent gives his savings (s_1^y) of the consumption good to the old agent in exchange for one unit of money (the exchange rate is determined by p_1). Thus, p_1 s_1 = 1. This unit of money is carried into period 2 (the old age of the agent born in period 1) and is exchanged (at the rate determined by p_2) for the consumption good saved by the young agent born in period 2 (s_2^y). The process is repeated.
3.1 Perfect Foresight Equilibrium
If there is no uncertainty, with perfect foresight the price-taking young agent's optimization problem is the following:
max U(c_t^y, c_t^o)

subject to

c_t^y = 1 − s_t^y,  c_t^o = p_t s_t^y / p_{t+1}  (0 ≤ s_t^y ≤ 1, t = 1, 2, …).
Here, s_t^y ≡ e_t^y − c_t^y is the savings of the young agent (this is the Samuelson case, in Gale's definitions [11]). A perfect foresight competitive equilibrium is defined as a feasible program and a price system such that (i) the consumption program c = {c_t} solves the optimization problem of each agent given p = {p̄_t}: (c_t^y, c_t^o) ∈ D, c_t^y = 1 − s_t and c_t^o = p̄_t s_t / p̄_{t+1}, with

s_t = arg max_{0 ≤ s_t^y ≤ 1} U(1 − s_t^y, s_t^y p̄_t / p̄_{t+1}),
and (ii) the market for the consumption good clears in every period:

c_t^y + c_{t−1}^o = 1  (demand = supply for the consumption good),
p_t s_t = 1  (demand = supply for money),

for t = 1, 2, ….
By strict concavity of the utility function U(c_t^y, c_t^o), the young agent's optimization problem has a unique solution. Hence, we can express s_t as a single-valued function of p_t/p_{t+1}, i.e. we write s_t = s_t(p_t/p_{t+1}). This function (called the savings function) generates an offer curve in the space of net trades, as price ratios vary. In the perfect foresight equilibrium

p_t s_t(p_t/p_{t+1}) = 1.    (3.1)

The stationary perfect foresight monetary equilibrium is a sequence of constant prices p̄ and constant consumption programs (1 − s, s), where s = s(1).⁵
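As a worked toy example (our choice of utility, not the specification of section 3.3), take U(c^y, c^o) = log c^y + log c^o. Then the savings function is constant, s(ρ) = 1/2 for every price ratio ρ, so the stationary monetary equilibrium has s = s(1) = 1/2 and, by money-market clearing p·s = 1, the constant price p = 2. The snippet verifies the maximizer numerically.

```python
import numpy as np

def lifetime_utility(s, rho):
    """U(1 - s, s * rho) with illustrative log utility; rho = p_t / p_{t+1}."""
    return np.log(1.0 - s) + np.log(s * rho)

grid = np.linspace(0.01, 0.99, 9801)
for rho in (0.5, 1.0, 2.0):
    s_star = grid[np.argmax(lifetime_utility(grid, rho))]
    print(rho, s_star)          # maximizer is 1/2 regardless of rho

s = 0.5                         # stationary savings s = s(1)
p = 1.0 / s                     # money market clearing p * s = 1, so p = 2
```

With log utility the income and substitution effects of the price ratio cancel, which is why s(ρ) is flat; richer utilities, like the one used later in section 3.3, give price-sensitive savings and open the door to sunspot equilibria.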
3.2 Sunspot equilibrium
Now consider extrinsic uncertainty in this economy. There is no uncertainty in fundamentals, such as endowments and preferences, but the agents believe that market prices depend on the realization of an extrinsic random variable (sunspot). We assume that there is a one-to-one mapping from the sunspot variable to the price of the consumption good. Because the agents cannot observe future sunspots, they maximize expected utility over all possible future realizations of the states of nature. We examine the situation with two states of nature, σ ∈ {α, β}, that follow a first-order Markov process with stationary transition probabilities,

π_{σσ′} = P[σ_{t+1} = σ′ | σ_t = σ],    (3.2)

where π_{σσ′} > 0 is the probability of being in state σ′ in the next period given that the current state is σ, and π_{σα} + π_{σβ} = 1. A young agent born in t observes the price p_t^σ and solves the following optimization problem:

max [π_{σα} U(c_t^{y,σ}, c_t^{o,α}) + π_{σβ} U(c_t^{y,σ}, c_t^{o,β})]

subject to

c_t^{y,σ} = 1 − s_t^σ,  c_t^{o,σ′} = s_t^σ p_t^σ / p_{t+1}^{σ′}  (0 ≤ s_t^σ ≤ 1, σ, σ′ ∈ {α, β}).
We restrict our attention to stationary equilibria, in which prices depend on the current realization of the state of nature σ, and do not depend on the calendar time nor the history of σ. A stationary sunspot equilibrium, SSE, is a pair of feasible programs and nominal prices such that, for every σ ∈ {α, β}, (i) the consumption programs solve the agents' optimization problem:

s^σ(p^σ/p^{σ′}) = arg max [π_{σα} U((1 − s^σ), s^σ p^σ/p^α) + …]

where a, b, d, q, r, σ, A are positive constants such that the utility function is increasing and jointly concave in its arguments in D. v(·) is the disutility of consuming less than A, the minimal subsistence level.⁶ As above, agents in each generation receive identical positive endowments e = 1 of the consumption good when young and zero endowments when old; the initial old are endowed with one unit of money.

⁶It may seem odd that the disutility from starvation is finite, but this can be justified by the willingness of the agents to take a risk. Consider the following. In continuous time, if the consumption of an old agent is above A, he lives to the end of the second period. If his consumption is below A, perhaps he does not die immediately. Albeit low, the amount consumed allows him to live some time in the second period, and his lifespan in the second period is the longer, the closer his consumption is to A. In discrete time this translates into a probability of survival in the second period as a function of consumption. Thus, the
3.3.1 Benchmark case: perfect foresight
For the above preferences, the savings function s_t(p_t/p_{t+1}) is implicitly defined by (3.5), where ρ_t ≡ p_t/p_{t+1}. The offer curve is described by (3.6).
In the stationary (deterministic) perfect foresight monetary equilibrium the consumption plan of an agent is (x, 1 − x), where x solves

a√((1 − x)/x) + x(b + d) + v′(1 − x) + q − r − b = 0.    (3.7)

3.3.2 Stationary sunspot equilibria
Two states of nature, α and β, evolve according to a stationary first-order Markov process. The states of nature do not affect the endowments. Agents can trade their real and nominal assets. In a stationary sunspot equilibrium with trade, s^α, s^β solve the following system of equations:

π_{αα} (a√((1 − s^α)/s^α) + r − d s^α − v′(s^α)) + (1 − π_{αα}) (a√((1 − s^β)/s^β) + r − d s^β − v′(s^β)) (p^α/p^β) = q − b(1 − s^α)    (3.8)

and the corresponding equation with α and β interchanged.
It is easy to see that one solution is s^α = s^β = 1 − x, where x solves the equation for the perfect foresight case above. This solution does not depend on the transition probabilities; prices and consumption are not affected by the uncertainty: sunspots do not matter in this equilibrium. However, there may be more solutions. For example, for a = 2, b = 0.5, d = 7, q = 0.02, r = 0.6, σ = 0.05, A = 0.3 and π_{αα} = π_{ββ} = 0.15 there are three stationary monetary equilibria in the economy: one coinciding with the perfect foresight equilibrium and two sunspot equilibria. Prices and consumption programs for these equilibria are given in the following table.

⁶(continued) old agent survives with probability 1 if c^o ≥ A and with probability less than 1 if c^o < A. Suppose the objective of the agent is to maximize the probability of survival (or maximize his expected lifespan). Then it can be presented equivalently as the objective to minimize the disutility from consumption at a level below A. Clearly, this disutility can be finite, at least in the vicinity of A, if the agent is willing to take a risk. The authors are indebted to David Easley for this argument.
State   PFE                      1st SSE                  2nd SSE
α       (0.6670; 0.3330; 3.00)   (0.5973; 0.4027; 2.48)   (0.7518; 0.2482; 4.03)
β       (0.6670; 0.3330; 3.00)   (0.7518; 0.2482; 4.03)   (0.5973; 0.4027; 2.48)
(In every entry, the first number is the consumption of the young, the second is the consumption of the old, and the third is the nominal price of the consumption good.) The consumption programs in the sunspot equilibria are Pareto inferior to the program in the perfect foresight equilibrium. Furthermore, in the two sunspot equilibria old agents survive in one state of nature and fail to survive in the other with the same amount of resources, because the equilibrium price is too high. (We intentionally considered a case where agents survive in the certainty equilibrium, to demonstrate that survival is always feasible. Also, in this model young agents always survive; otherwise the overlapping generations structure collapses.)
4 Insurance and survival

The purpose of the following examples is to demonstrate that trade in securities does not guarantee the survival of all agents. Furthermore, trade in securities can even worsen the survival chances of some agents. For expositional simplicity, we consider a static Cobb-Douglas-Sen economy, similar to the one described in Section 2.
4.1 Static economy with two states: definitions

Let us first restate the definitions of a stochastic general equilibrium concept in a Cobb-Douglas-Sen economy with logarithmic preferences for the particular case of two possible states of the environment. Consider a pure exchange economy with two goods, l ∈ {1, 2}, with good 1 being the numeraire. There are two states of nature, s ∈ Ω = {α, β}, with π = P[s = α] = 1 − P[s = β]. Two consumers, i ∈ {1, 2}, receive endowments e_i(s) = (e_{i1}(s), e_{i2}(s)) ∈ ℝ²₊. Each consumer is characterized by the Cobb-Douglas logarithmic utility function

    u(x_{i1}, x_{i2}) = γ log x_{i1} + (1 − γ) log x_{i2}.   (4.1)

In addition, each consumer is characterized by the minimum expenditure function m_i(p*(·)), the level of wealth at and below which consumer i fails to survive in the equilibrium with (random, normalized) equilibrium price vector (1, p*(·)). Consumers maximize utility in every state, taking prices as given. A random equilibrium is defined as a set of vectors of allocations {x_i(s)} and prices p*(s) for each state of nature, such that
1. Given the normalized price vector (1, p*(s)) in state s, the consumption vector x_i(s) = (x_{i1}(s), x_{i2}(s)) maximizes the utility of consumer i in state s subject to his budget constraint, x_{i1}(s) + p*(s)x_{i2}(s) ≤ e_{i1}(s) + p*(s)e_{i2}(s), for every i and s;

2. Markets for consumption goods clear in every state.

If we allow γ (the parameter in the Cobb-Douglas preferences) to vary across the consumers, the equilibrium price in state s will be

    p*(s) = Σ_i (1 − γ_i) e_{i1}(s) / Σ_i γ_i e_{i2}(s).

Hence, the wealth of consumer i in state s is

    W_i*(s) = e_{i1}(s) + p*(s) e_{i2}(s).

Assume, for simplicity, that the minimum expenditure function is the same for all agents and has the linear form

    m(p*(s)) = a₀ + a₁ p*(s)

for some positive constants a₀ and a₁. Then consumer i is ruined in state s if

    W_i*(s) ≤ m(p*(s)).
If this inequality holds for consumer i for s = α only, then consumer i is ruined with probability π. If it holds for s = β only, then i is ruined with probability (1 − π). If it holds for consumer i in both states, then i is ruined with probability 1. Suppose consumers know π. The question is: if consumers could trade securities before s is realized, would this improve their chances of survival?
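For concreteness, the state-by-state ruin classification can be sketched numerically. The price expression below is the standard Cobb-Douglas computation with heterogeneous weights γ_i; the numbers used are those of Example A in Section 4.2.1, and the code is a minimal illustration rather than a general equilibrium solver.

```python
# Sketch of the no-securities equilibrium of Section 4.1. With Cobb-Douglas
# log utility (weight gamma_i on good 1), the state-s price of good 2 is the
# ratio below; ruin means wealth at or below m(p) = a0 + a1*p.

def price(gammas, endow):
    """Equilibrium price of good 2; endow = [(e_i1, e_i2), ...] in one state."""
    num = sum((1 - g) * e1 for g, (e1, e2) in zip(gammas, endow))
    den = sum(g * e2 for g, (e1, e2) in zip(gammas, endow))
    return num / den

def ruined(gammas, endow, a0, a1):
    """Per consumer: does wealth fall at or below the survival threshold?"""
    p = price(gammas, endow)
    return [e1 + p * e2 <= a0 + a1 * p for (e1, e2) in endow]

# Endowments and weights as in Example A (Section 4.2.1)
gammas = [1/2, 1/3]
state_alpha = [(1, 0), (1, 4)]
state_beta = [(0, 2), (2, 2)]
p_a = price(gammas, state_alpha)   # 7/8
p_b = price(gammas, state_beta)    # 4/5
```

With a₀ = 3/4 and a₁ = 1 this reproduces the ruin pattern stated in Example A: consumer 1 is ruined in state α only.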
4.2 Arrow-type securities in a two-period economy

Assume now that in the economy described in Section 4.1 there are two time periods, t = 0, 1. Let the preferences of the consumers be described by a von Neumann-Morgenstern expected utility function, with Bernoulli utility in the logarithmic Cobb-Douglas form (4.1), with γ varying across consumers. At t = 0 consumers can issue and trade contracts in real Arrow-type securities. At t = 1 consumers receive their endowments, execute the contracts and trade consumption goods. Markets for securities are complete: for every state of nature there is a security that promises to deliver at t = 1 one unit of the numeraire good if this particular state occurs, and nothing in other states (see [23] and [12] for a more general exposition). Denote by y_i^s ∈ ℝ the holdings of consumer i of the security that pays in state s. Consumers know the probability distribution
of the states of nature. In time period t = 0 they choose holdings of securities, or portfolios, (y_i^α, y_i^β), to maximize expected utility of consumption in time period t = 1. We normalize the price of the asset that pays in state α to unity and denote the price of the asset that pays in state β by q. A random equilibrium with complete asset markets is a set of vectors of portfolios {(y_i^α, y_i^β)}, allocations {x_i(s)}, security prices (1, q) and consumption good prices (1, p(s)) for each state of nature, such that

1. Given asset prices (1, q) and the normalized consumption good price vector (1, p(s)) in state s, the portfolio (y_i^α, y_i^β) and consumption vector x_i(s) = (x_{i1}(s), x_{i2}(s)) maximize the expected utility of consumer i at t = 0 subject to his budget constraints: at t = 0, y_i^α + q y_i^β ≤ 0; at t = 1, x_{i1}(s) + p(s)x_{i2}(s) ≤ e_{i1}(s) + p(s)e_{i2}(s) + y_i^s, for every i and s;

2. Asset markets clear at t = 0;

3. Markets for consumption goods clear at t = 1 in every state.
Routine calculations give the following expressions for the equilibrium prices:

    q = [(1 − π)/π] · [E₁(α)/E₁(β)],

    p(β) = p(α) · [E₂(α)E₁(β)] / [E₁(α)E₂(β)],

and

    p(α) = [ Σ_i (1 − πγ_i) e_{i1}(α) − (1 − π) Σ_i γ_i e_{i1}(β) · E₁(α)/E₁(β) ]
           / [ π Σ_i γ_i e_{i2}(α) + (1 − π) Σ_i γ_i e_{i2}(β) · E₂(α)/E₂(β) ].

Here E_l(s) ≡ Σ_i e_{il}(s) is the aggregate endowment of good l in state s. The wealth (in terms of the numeraire) of consumer i at t = 1 is then

    W_i(s) = e_{i1}(s) + p(s) e_{i2}(s) + y_i^s.

Note that W_i(β) = [E₁(β)/E₁(α)] W_i(α), which means that if there is no aggregate uncertainty in the endowment of the numeraire, wealth is equalized across states. If there is no aggregate uncertainty in the endowments of both goods, the relative price of the consumption goods is also equalized across states. Then p = p(α) = p(β) will be between p*(α) and p*(β), and W_i = W_i(α) = W_i(β) will be between W_i*(α) and W_i*(β), the corresponding wealths in the equilibrium without securities. For the minimum expenditure function of the above form, we will also have that m_i(p) = m_i(p(α)) = m_i(p(β)) lies between m_i(p*(α)) and m_i(p*(β)). Could it happen that the wealth of a consumer in a particular state falls below the minimum subsistence level in the economy with securities, whereas without securities his wealth in the same state is above the minimum subsistence level?
The following simple numerical examples demonstrate this possibility for the case with no aggregate uncertainty and for the case with aggregate uncertainty in endowments.
4.2.1 Example A: No Aggregate Uncertainty
Consider an economy with two consumers, i ∈ {1, 2}. Let the preferences of these two consumers and their endowments in the two states be the following:

Consumer   γ_i   e_i(α)   e_i(β)
i = 1      1/2   (1, 0)   (0, 2)
i = 2      1/3   (1, 4)   (2, 2)

Let P[s = α] = 1 − P[s = β] = π = 1/4. Then in the equilibrium without securities

    p*(α) = 7/8,   p*(β) = 4/5,

and in the equilibrium with securities

    p(α) = p(β) = 31/38.

Suppose both consumers have a minimal expenditure function of the linear form, with the same parameters a₀ = 3/4 and a₁ = 1. Then the survival threshold in the economy without securities is 1.625 in state α and 1.55 in state β. It is easy to see that agent i = 1 is ruined in state s = α and survives in state s = β; agent i = 2 survives in both states. With securities, the survival threshold in both states is ≈ 1.5658, and agent i = 2 still survives in both states, but agent i = 1 is now ruined in both states.
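The quoted prices and thresholds can be checked directly against the displayed formulas for q and p(α). The following arithmetic sketch (using Example A's data; the threshold uses m(p) = a₀ + a₁p with a₀ = 3/4, a₁ = 1) is purely illustrative.

```python
# Check of the complete-markets prices quoted for Example A, using the
# formulas displayed in Section 4.2 for q and p(alpha).

pi = 1/4
gammas = [1/2, 1/3]
e_a = [(1, 0), (1, 4)]           # (e_i1, e_i2) in state alpha
e_b = [(0, 2), (2, 2)]           # (e_i1, e_i2) in state beta

E1a, E2a = sum(e[0] for e in e_a), sum(e[1] for e in e_a)
E1b, E2b = sum(e[0] for e in e_b), sum(e[1] for e in e_b)

q = (1 - pi) / pi * E1a / E1b    # price of the state-beta security: 3

num = (sum((1 - pi * g) * e1 for g, (e1, _) in zip(gammas, e_a))
       - (1 - pi) * sum(g * e1 for g, (e1, _) in zip(gammas, e_b)) * E1a / E1b)
den = (pi * sum(g * e2 for g, (_, e2) in zip(gammas, e_a))
       + (1 - pi) * sum(g * e2 for g, (_, e2) in zip(gammas, e_b)) * E2a / E2b)
p_alpha = num / den              # 31/38
threshold = 3/4 + p_alpha        # approximately 1.5658
```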
4.2.2 Example B: Aggregate Uncertainty

Consider the same economy, now with aggregate uncertainty in the endowments:

Consumer   γ_i   e_i(α)   e_i(β)
i = 1      1/2   (1, 0)   (0, 2)
i = 2      1/3   (0, 2)   (2, 2)

With π = 1/4 the equilibrium prices without securities are

    p*(α) = 3/4,   p*(β) = 4/5,

and with securities

    p(α) = 15/19,   p(β) = 30/19.

Let the minimal expenditure function for both consumers be linear, with a₀ = 1/5 and a₁ = 1. The survival threshold in the economy without securities is then 0.95 in state α and 1 in state β. Both agents survive in both states. With securities, the survival threshold is ≈ 0.990 in state α and ≈ 1.779 in state β. In that case, agent 2 still survives in both states, but agent 1 survives only in β and is ruined in α. These two examples demonstrate how trade in securities may worsen the survival prospects of agents with random endowments, even when markets for securities are complete.
5 Concluding remarks

In this paper we introduced a formal general equilibrium approach to the problem of survival under uncertainty. The question of obvious practical importance is: how does one improve the chances of survival of an agent? Clearly, when ruin is caused by market forces, intervention by the government is desirable. The choice of the optimal policy is determined by the policy tools available to the government and by the sensitivity of the survival probability to changes in the policy variables. For the case of a static economy with intrinsic uncertainty this problem was touched upon in [4]. In particular, under certain assumptions on the joint distribution of the endowments and linearity of the minimum expenditure function, the probability of survival of an agent increases as the limiting averages of the endowments increase. For the OLG economy with extrinsic uncertainty we showed elsewhere [14] that a lump-sum tax and transfer policy, with the amounts of taxes and transfers depending on the equilibrium market price, can stabilize consumption at the certainty-equilibrium level (without affecting prices), thus eliminating the possibility of ruin of the agents. In any case, the general equilibrium framework has to be used in order to accurately predict the outcomes of various policy measures. Another issue should be mentioned. Throughout this paper we assumed that the objective of an agent is to maximize his expected utility (as traditional economic theory postulates). In a model with a single agent, Majumdar and Radner [18] explored the implications of maximization of the probability of
survival. A systematic extension of this analysis to a framework with many interacting agents remains an important direction for research.

Nigar Hashimzade and Mukul Majumdar
Department of Economics, Cornell University, Ithaca, New York 14853
Bibliography

[1] K. Arrow, The Role of Securities in the Optimal Allocation of Risk-bearing, Rev. Econ. Studies 31 (1964), pp. 91-96.
[2] Y. Balasko and K. Shell, The Overlapping-Generations Model. II. The Case of Pure Exchange with Money, J. Econ. Theory 24 (1981), pp. 112-142.
[3] R. N. Bhattacharya and M. Majumdar, Random Exchange Economies, J. Econ. Theory 6 (1973), pp. 37-67.
[4] R. N. Bhattacharya and M. Majumdar, On Characterizing the Probability of Survival in a Large Competitive Economy, Review of Economic Design 6 (2001), pp. 133-153.
[5] D. Cass and K. Shell, Do Sunspots Matter? J. Polit. Economy 91 (1983), pp. 193-227.
[6] S. Chattopadhyay and T. J. Muench, Sunspots and Cycles Reconsidered, Economics Letters 63 (1999), pp. 67-75.
[7] J. L. Coles and P. J. Hammond, Walrasian Equilibrium Without Survival: Existence, Efficiency and Remedial Policy. In: Basu et al. (eds.), "Choice, Welfare and Development." Oxford: Clarendon Press (1995), pp. 32-64.
[8] G. Debreu, "Theory of Value: An Axiomatic Analysis of Economic Equilibrium." New Haven: Yale University Press (1959).
[9] G. Debreu, Economies with a Finite Set of Equilibria, Econometrica 38 (1970), pp. 387-392.
[10] Jean Dreze (ed.), "The Economics of Famine." Cheltenham, UK; Northampton, MA, USA: An Elgar Reference Collection (1999).
[11] D. Gale, Pure Exchange Equilibrium of Dynamic Economic Models, J. Econ. Theory 6 (1973), pp. 12-36.
[12] J. D. Geanakoplos and H. M. Polemarchakis, Existence, Regularity, and Constrained Suboptimality of Competitive Allocations when the Asset Market is Incomplete. In: W. P. Heller, R. M. Starr, and D. A. Starrett (eds.), "Uncertainty, Information, and Communication: Essays in Honor of Kenneth J. Arrow." Cambridge, New York and Melbourne: Cambridge University Press (1986), Vol. III, pp. 65-95.
[13] N. Hashimzade, Probability of Survival in a Random Exchange Economy with Dependent Agents, forthcoming in Economic Theory (2002).
[14] N. Hashimzade, "Survival with Extrinsic Uncertainty: Some Policy Issues," Working Paper (2002), Cornell University.
[15] J. Lamperti, "Probability." New York: Benjamin (1966).
[16] L. Ljungqvist and T. Sargent, "Recursive Macroeconomic Theory." Cambridge, Massachusetts; London, England: The MIT Press (2000).
[17] M. Majumdar and T. Mitra, Some Results on the Transfer Problem in an Exchange Economy. In: B. Dutta et al. (eds.), "Theoretical Issues in Development Economics." New Delhi: Oxford University Press (1983), pp. 221-244.
[18] M. Majumdar and R. Radner, Linear Models of Economic Survival Under Production Uncertainty, Economic Theory 1 (1991), pp. 13-30.
[19] M. Majumdar and V. Rotar, Equilibrium Prices in a Random Exchange Economy with Dependent Agents, Economic Theory 15 (2000), pp. 531-550.
[20] M. Ravallion, "Markets and Famines." Oxford: Clarendon Press (1987).
[21] Y. Rinott and V. Rotar, A Multivariate CLT for Local Dependence with n^{-1/2} log n Rate, and Applications to Multivariate Graph Related Statistics, J. Multivariate Analysis 56 (1996), pp. 333-350.
[22] P. Samuelson, An Exact Consumption-Loan Model of Interest with or without the Social Contrivance of Money, J. Polit. Economy 66 (1958), pp. 467-482.
[23] W. Shafer, Equilibrium with Incomplete Markets in a Sequence Economy. In: M. Majumdar (ed.), "Organizations with Incomplete Information. Essays in Economic Analysis: A Tribute to Roy Radner." Cambridge, New York and Melbourne: Cambridge University Press (1998), pp. 20-41.
[24] A. Sen, Starvation and Exchange Entitlements: A General Approach and its Application to the Great Bengal Famine, Cambridge J. Econ. 1 (1977), pp. 33-60.
[25] A. Sen, Ingredients of Famine Analysis: Availability and Entitlements, Quarterly Journal of Economics 96 (1981), pp. 433-464.
[26] A. Sen, "Poverty and Famines: An Essay on Entitlement and Deprivation." Oxford: Oxford University Press (1981).
[27] S. Spear, Sufficient Conditions for the Existence of Sunspot Equilibria, J. Econ. Theory 34 (1984), pp. 360-370.
[28] C. Stein, "Approximate Computation of Expectations." Hayward, CA: IMS (1986).
Singular Stochastic Control in Optimal Investment and Hedging in the Presence of Transaction Costs

Tze Leung Lai, Stanford University
Tiong Wee Lim, National University of Singapore

Abstract. In an idealized model without transaction costs, an investor would optimally maintain a proportion of wealth in stock, or hold a number of shares of stock to hedge a contingent claim, by trading continuously. Such continuous strategies are no longer admissible once proportional transaction costs are introduced. The investor must then determine when the stock position is sufficiently "out of line" to make trading worthwhile. Thus, the problems of optimal investment and hedging become, in the presence of transaction costs, singular stochastic control problems, characterized by instantaneous trading at the boundaries of a "no transactions" region whenever the stock position falls on these boundaries. In this paper, we review various formulations of the optimal investment and hedging problems and their solutions, with particular emphasis on the derivation and analysis of Hamilton-Jacobi-Bellman (HJB) equations using the dynamic programming principle. A particular numerical scheme, based on weak convergence of probability measures, is provided for the computation of optimal strategies in the problems we consider.
1 Introduction

The problems of optimal investment and consumption and of option pricing and hedging were initially studied in an idealized setting whereby an investor incurs no transaction costs from trading in a market consisting of a risk-free asset ("bond") with constant rate of return and a risky asset ("stock") whose price is a geometric Brownian motion with constant rate of return and volatility. For example, Merton (1969, 1971) showed that, for an investor acting as a price-taker and seeking to maximize expected utility of consumption, the optimal strategy is to invest a constant proportion (the "Merton proportion") of wealth in the stock and to consume at a rate proportional to wealth. In the related problem of option pricing and hedging, the arbitrage considerations of Black and Scholes (1973) demonstrated that, by setting up a risk-free portfolio of stock and option, the value of an option must equal the amount of initial capital required for this hedging. However, both the Merton strategy and the Black-Scholes hedging portfolio require continuous trading and result in an infinite turnover of stock in any finite
time interval. In the presence of transaction costs proportional to the amount of trading, such continuous strategies are prohibitively expensive. Thus, there must be some "no transactions" region inside which the portfolio is insufficiently "out of line" to make trading worthwhile. In such a case, the problems of optimal investment and consumption and of option pricing and hedging involve singular stochastic control. As we shall see, Bellman's principle of dynamic programming can often be used to derive (at least formally) the nonlinear partial differential equation (PDE) satisfied by the value function of interest. The derived PDE will then suggest methods (analytic or numerical) to solve for the optimal policies. One such numerical scheme, based on weak convergence of probability measures, will be particularly useful for the problems described in this paper. It turns out that some of the resulting free boundary problems can be reduced to optimal stopping problems in ways suggested by Karatzas and Shreve (1984, 1985), thereby simplifying the solutions of the original optimal control problems. We will focus on the two-asset (one bond and one stock) setting which many authors consider. Besides simplifying the exposition, such a setting can be justified by the so-called "mutual fund theorems" whenever lognormality of prices is assumed; see, for example, Merton (1971) in the absence of transaction costs and Magill (1976) in the presence of transaction costs. Specifically, the market consists of two investment instruments: a bond paying a fixed risk-free rate r > 0 and a stock whose price is a geometric Brownian motion with mean rate of return α > 0 and volatility σ > 0. Thus, the prices of the bond and stock at time t ≥ 0 are given respectively by

    dB_t = rB_t dt   and   dS_t = αS_t dt + σS_t dW_t,   (1.1)

where {W_t : t ≥ 0} is a standard Brownian motion on a filtered probability space (Ω, ℱ, {ℱ_t}_{t≥0}, ℙ) with W₀ = 0 a.s. The investor's position will be denoted by (X_t, Y_t) (in Section 2) or (X_t, y_t) (in Section 3), where

    X_t = dollar value of investment in bond,
    Y_t = dollar value of investment in stock,   (1.2)
    y_t = number of shares held in stock.

In particular, we note the relation Y_t = y_t S_t.
The rest of the paper is organized as follows. In Section 2, we consider optimal investment and consumption, beginning with a treatment of the "Merton problem" (no transaction costs) over a finite horizon, and then proceeding to the transaction costs problem considered by Magill and Constantinides (1976) and, more recently, by ourselves. We also consider the infinite-horizon case, drawing on results from Davis and Norman (1990) and Shreve and Soner (1994), and review the work of Taksar, Klass and Assaf (1988) on the related problem of maximizing the long-run growth rate of the investor's asset value. The problem of option pricing and hedging in the presence of transaction costs is considered in Section 3. Some concluding remarks are given in Section 4.
2 Optimal Consumption and Investment with Transaction Costs
The investment and consumption decisions of an investor comprise three nonnegative {ℱ_t}_{t≥0}-adapted processes C, L, and M, such that C is integrable on each finite time interval, and L and M are nondecreasing and right-continuous with left-hand limits. Specifically, the investor consumes at rate C_t from the bond, and L_t (resp. M_t) represents the cumulative dollar value of stock bought (resp. sold) within the time interval [0, t], 0 ≤ t ≤ T. In the presence of proportional transaction costs, the investor pays fractions 0 ≤ λ < 1 and 0 ≤ μ < 1 of the dollar value transacted on purchase and sale of stock, respectively. Thus, the investor's position (X_t, Y_t) satisfies

    dX_t = (rX_t − C_t) dt − (1 + λ) dL_t + (1 − μ) dM_t,   (2.1a)
    dY_t = αY_t dt + σY_t dW_t + dL_t − dM_t.   (2.1b)
The factor 1 + λ (resp. 1 − μ) in (2.1a) reflects the fact that a transaction fee in the amount of λ dL (resp. μ dM) needs to be paid from the bond when purchasing dL (resp. selling dM) dollar value of stock. We define the investor's wealth (or net worth) as

    Z_t = X_t + (1 − μ)Y_t if Y_t ≥ 0;   Z_t = X_t + (1 + λ)Y_t if Y_t < 0.

By requiring that the investor remain solvent (i.e., have nonnegative net worth) at all times, the investor's position is constrained to lie in the solvency region 𝒟, a closed convex set bounded by the line segments

    ∂_λ𝒟 = {(x, y) : x > 0, y < 0 and x + (1 + λ)y = 0},
    ∂_μ𝒟 = {(x, y) : x ≤ 0, y ≥ 0 and x + (1 − μ)y = 0}.
J(t, x, y)
~ IE [iT e-~('-') U (C,) ds + e~(T-') U2(ZT) Ix, ~ x, Y" ~ y1' j
where f3 > 0 is a discount factor and U1 and U2 are concave utility functions of consumption and terminal wealth. We assume that U1 is differentiable and that the inverse function (U{)-l exists. Often U1 and U2 are chosen from the so-called HARA (hyperbolic absolute risk aversion) class:
U (c)
= CI
/,y if r < 1,
r =I 0;
U(c) = loge
if r = 0,
which has constant relative risk aversion -cU" (c) jU' (c) = 1 value function by
V(t, x, y)
=
sup (C,L,M)EA(t,x,y)
J(t, x, y).
r.
(2.2)
We define the
(2.3)
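As a quick sanity check on the constant-relative-risk-aversion claim, the quantity −cU″(c)/U′(c) can be evaluated by finite differences; the choices of γ, the evaluation point c, and the step size below are arbitrary.

```python
# Finite-difference check that U(c) = c**g / g has -c U''(c) / U'(c) = 1 - g.
def rra(u, c, h=1e-4):
    u1 = (u(c + h) - u(c - h)) / (2 * h)          # central first derivative
    u2 = (u(c + h) - 2 * u(c) + u(c - h)) / h**2  # central second derivative
    return -c * u2 / u1

g = 0.5
U = lambda c: c**g / g
print(rra(U, 2.0))   # approximately 1 - g = 0.5
```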
2.1 The Merton Problem (No Transaction Costs)

Before presenting the solution to the general transaction costs problem (2.3), we consider the case λ = μ = 0 (no transaction costs) analyzed by Merton (1969). In this case, by adding (2.1a) and (2.1b), the total wealth Z_t = X_t + Y_t can be represented as

    dZ_t = [rZ_t + (α − r)θ_t Z_t − C_t] dt + σθ_t Z_t dW_t,   (2.4)

where θ_t = Y_t/(X_t + Y_t) is the proportion of the investment held in stock. Using the reparameterization z = x + y, the value function can be expressed as
    V(t, z) = sup_{(C,θ)∈𝒜(t,z)} E[ ∫_t^T e^{−β(s−t)} U₁(C_s) ds + e^{−β(T−t)} U₂(Z_T) | Z_t = z ],

where 𝒜(t, z) denotes all admissible policies (C, θ) for which Z_s > 0 for all t ≤ s ≤ T. The Bellman equation for the value function is

    max_{C,θ} {(∂/∂t + ℒ)V(t, z) + U₁(C) − βV(t, z)} = 0,   (2.5)

subject to the terminal condition V(T, z) = U₂(z), where ℒ is the infinitesimal generator of (2.4):

    ℒ = (σ²θ²z²/2) ∂²/∂z² + [rz + (α − r)θz − C] ∂/∂z.
Formal maximization with respect to C and θ yields C = (U₁′)^{−1}(V_z) and θ = −(V_z/V_zz)(α − r)/(σ²z) (in which a subscript denotes a partial derivative, e.g., V_z = ∂V/∂z). Substituting for C and θ in (2.5) leads to the PDE

    ∂V/∂t − [(α − r)²/(2σ²)] (∂V/∂z)²/(∂²V/∂z²) + (rz − C*) ∂V/∂z + U₁(C*) − βV = 0,   (2.6)

where C* = C*(t, z) = (U₁′)^{−1}(V_z(t, z)). Let

    ρ = (α − r)/[(1 − γ)σ²],   c = [1/(1 − γ)][β − γr − γ(α − r)²/(2(1 − γ)σ²)],   (2.7)

    C_i(t) = c/{1 − φ_i e^{c(t−T)}}   (i = 1, 2),   φ₁ = 1,   φ₂ = 1 − c.
If U₁ takes the form (2.2), then C* = (V_z)^{1/(γ−1)}, and solving the PDE yields the optimal policy: θ*_t ≡ ρ and C*_t = C₁(t)Z_t when U₂ ≡ 0, or C*_t = C₂(t)Z_t when U₂ takes the form (2.2). Note that c = β when γ = 0. Thus, in the Merton problem, the optimal strategy is to devote a constant proportion (the Merton proportion ρ) of the investment to the stock and to consume at a rate proportional to wealth. Furthermore, for i = 1 or 2 (corresponding to U₂ ≡ 0 or to (2.2)), the value function is

    V(t, z) = (z^γ/γ)[C_i(t)]^{γ−1}   if γ < 1, γ ≠ 0;
    V(t, z) = a_i(t) + [1/C_i(t)] log[C_i(t)z]   if γ = 0,
where a_i(t) = β^{−2}[r − β + (α − r)²/(2σ²)]{1 − e^{β(t−T)}[1 + ⋯]}; in particular, V(t, z) < ∞ when γ ≤ 0. A necessary and sufficient condition for a finite value function when 0 < γ < 1 is β > γr + γ(α − r)²/{2(1 − γ)σ²}. Corresponding results for general utility functions U₁ and U₂ have been given by Cox and Huang (1989), who use a martingale technique instead of the usual dynamic programming principle. By working under the equivalent martingale measure, so that differences in mean rates of return among assets are removed, the martingale approach allows candidate optimal policies to be constructed by solving a linear (instead of a nonlinear) PDE; see also Karatzas, Lehoczky and Shreve (1987).
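The closed-form Merton policy is easy to evaluate. The sketch below uses illustrative, assumed parameter values (α = 0.12, r = 0.05, σ = 0.3, β = 0.1, γ = 0.5, T = 1), chosen to satisfy the finiteness condition above, and evaluates the formulas in (2.7).

```python
import math

# Merton policy without transaction costs, from (2.7): invest the constant
# proportion rho in stock and consume at rate C_i(t) * wealth.
# All parameter values here are illustrative assumptions.

alpha, r, sigma = 0.12, 0.05, 0.3
beta, gamma, T = 0.10, 0.5, 1.0

rho = (alpha - r) / ((1 - gamma) * sigma**2)   # Merton proportion
c = (beta - gamma * r
     - gamma * (alpha - r)**2 / (2 * (1 - gamma) * sigma**2)) / (1 - gamma)

def C_ratio(t, phi):
    """Consumption-to-wealth ratio C_i(t); phi = 1 (U2 = 0) or phi = 1 - c."""
    return c / (1 - phi * math.exp(c * (t - T)))
```

With φ = 1 the ratio blows up as t → T (all remaining wealth is consumed when there is no bequest utility), while with φ = 1 − c it equals 1 at the horizon.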
2.2 Transaction Costs and Singular Stochastic Control
In the presence of transaction costs, analytic solutions are generally unavailable, even for HARA utility functions. One approach to the problem is to apply a discrete time dynamic programming algorithm on a suitable approximating Markov chain for the controlled process. This approach is based on weak convergence of probability measures, which will ensure that the discrete-time value function converges to its continuous-time counterpart as the discretization scheme becomes infinitely fine. Note that the optimal investment and consumption problem involves both singular control (portfolio adjustments) and continuous control (consumption decisions).
We begin with an analysis of the Bellman equation, which will subsequently suggest an appropriate Markov chain approximation for our problem. We can obtain key insights into the nature of the optimal policies by temporarily restricting L and M to be absolutely continuous with derivatives bounded by κ, i.e.,

    L_t = ∫₀ᵗ ℓ_s ds   and   M_t = ∫₀ᵗ m_s ds,   0 ≤ ℓ_s, m_s ≤ κ.   (2.8)
Proceeding as before, the Bellman equation for the value function (2.3) is

    max_{C,ℓ,m} {(∂/∂t + ℒ)V(t, x, y) + U₁(C) − βV(t, x, y)} = 0,   (2.9)

subject to V(T, x, y) = U₂(x + (1 − μ)y) if y ≥ 0 and V(T, x, y) = U₂(x + (1 + λ)y) if y < 0, where ℒ is the infinitesimal generator of (2.1a)-(2.1b):

    ℒ = (σ²y²/2) ∂²/∂y² + (rx − C) ∂/∂x + αy ∂/∂y
        + [∂/∂y − (1 + λ) ∂/∂x] ℓ + [(1 − μ) ∂/∂x − ∂/∂y] m.   (2.10)

The maximum in (2.9) is attained by C = (U₁′)^{−1}(V_x), ℓ = κ𝟙{V_y ≥ (1+λ)V_x}, and m = κ𝟙{V_y ≤ (1−μ)V_x}. Thus, it can be conjectured that buying or selling either takes place at the maximum rate or not at all, and the solvency region 𝒟 can be partitioned into three regions corresponding to "buy stock" (ℬ), "sell stock" (𝒮), and "no transactions" (𝒩). Instantaneous transition from ℬ to the buy boundary ∂ℬ or from 𝒮 to the sell boundary ∂𝒮 takes place by letting κ → ∞ and moving the portfolio parallel to ∂_λ𝒟 or ∂_μ𝒟 (i.e., in the direction of
(−1, (1 + λ)^{−1})ᵀ or (1, −(1 − μ)^{−1})ᵀ, where ᵀ denotes transpose). This suggests that V(t, x, y) = V(t, x + (1 − μ)δy, y − δy) for (t, x, y) ∈ 𝒮 and V(t, x, y) = V(t, x − (1 + λ)δy, y + δy) for (t, x, y) ∈ ℬ. In the limit as δy → 0, we have

    V_y(t, x, y) = (1 − μ) V_x(t, x, y),   (t, x, y) ∈ 𝒮,   (2.11a)
    V_y(t, x, y) = (1 + λ) V_x(t, x, y),   (t, x, y) ∈ ℬ.   (2.11b)

In 𝒩 the value function satisfies (2.9) with ℓ = m = 0, leading to the PDE

    ∂V/∂t + (σ²y²/2) ∂²V/∂y² + (rx − C*) ∂V/∂x + αy ∂V/∂y + U₁(C*) − βV = 0,   (t, x, y) ∈ 𝒩,   (2.11c)

where C* = C*(t, x, y) = (U₁′)^{−1}(V_x(t, x, y)) as in (2.6). To solve (2.11a)-(2.11c), the first step is to find an approximating Markov chain which is locally consistent with the controlled diffusion (2.1a)-(2.1b). Following Kushner and Dupuis (1992), we will use the "finite difference" method to obtain the transition probabilities of the approximating Markov chain. Specifically, for a candidate consumption decision (i.e., continuous control) C, we make the following (standard) approximations to the derivatives in equation (2.11c):
where C* = C*(t,x,y) = (Un- 1 (Vx (t,x,y)) as in (2.6). To solve (2.11a)-(2.11c), the first step is to find an approximating Markov chain which is locally consistent with the controlled diffusion (2.1a)-(2.1b). Following Kushner and Dupuis (1992), we will use the "finite difference" method to obtain the transition probabilities of the approximating Markov chain. Specifically, for a candidate consumption decision (i.e., continuous control) C, we make the following (standard) approximations to the derivatives in equation (2.11c):
Vi(t, x, y)
-----f
[Vet + 8, x, y) - Vet, x, y)]/8,
V( x t,x,y )
-----f
{
-----f
{[V(t+t5,X,Y+E) - V(t+t5,X,Y)]/E ify ~ 0, [Vet + t5,x,y) - Vet + 8,x,y - E)]/E if Y < 0,
-----f
[Vet + 8,x,y + E)
T7 (
Vy t,x,y
)
Vyy(t,x,y)
[V(t+8'X+E,Y)-V(t+t5,X,Y)l/E ifrx-C~O, [Vet + 8,x,y) - Vet + 8,x - E,y)l/E if rx - C < 0,
+ Vet + 8,x,y -
E) - 2V(t + 8,X,Y)]/E 2.
(2.12) Collecting terms and noting that C* in (2.11c) is the optimal control, we obtain the following backward induction equation for the "consumption step":
    V⁰(t, x, y) = e^{−βδ} max_C { Σ_{(x̃,ỹ)} p(x̃, ỹ | x, y) V(t + δ, x̃, ỹ) + δ U₁(C) },   (2.13)

where only the following five transition probabilities are nonzero:

    p(x ± ε, y | x, y) = (rx − C)^± δ/ε,
    p(x, y ± ε | x, y) = (αy)^± δ/ε + (σ²y²/2) δ/ε²,
    p(x, y | x, y) = 1 − (|rx − C| + α|y|) δ/ε − (σ²y²) δ/ε².

Equation (2.13) is to be evaluated for t ∈ 𝕋 = {0, δ, 2δ, …, Nδ} with δ = T/N and (x, y) belonging to some grid 𝒳 × 𝒴 made up of multiples of ±ε. Given δ, the choice of ε must ensure that p(x, y | x, y) ≥ 0. Let A₁ = max_{x∈𝒳, C} |rx − C| and A₂ = max_{y∈𝒴} |y|. Then one could set ε so that (A₁ + αA₂) δ/ε + σ²A₂² δ/ε² ≤ 1.
A similar treatment of equations (2.11a)-(2.11b) yields respective relations for the "sell step" and the "buy step" (singular controls):

    V^s(t, x, y) = μ V(t, x, y − ε) + (1 − μ) V(t, x + ε, y − ε),
    V^b(t, x, y) = (1 + λ)^{−1} [λ V(t, x − ε, y) + V(t, x − ε, y + ε)].

Since only one of buy, sell or no transactions can happen at each step, the dynamic programming equation for the (discrete-time) finite horizon value function is therefore

    V(t, x, y) = max{V⁰(t, x, y), V^s(t, x, y), V^b(t, x, y)},

with terminal condition V(T, x, y) = U₂(x + (1 − μ)y) if y ≥ 0 and V(T, x, y) = U₂(x + (1 + λ)y) if y < 0. For a sufficiently fine grid 𝕋 × 𝒳 × 𝒴, this gives good approximations to the value function (2.3) and the transaction regions: (t, x, y) ∈ 𝒮 if V(t, x, y) = V^s(t, x, y) and (t, x, y) ∈ ℬ if V(t, x, y) = V^b(t, x, y). When U₁ and U₂ take the form (2.2), we find that V is concave and homothetic in (x, y): for η > 0,

    V(t, ηx, ηy) = η^γ V(t, x, y)   if γ < 1, γ ≠ 0;
    V(t, ηx, ηy) = {β^{−1}[1 − e^{β(t−T)}] + e^{β(t−T)}} log η + V(t, x, y)   if γ = 0.
Homotheticity of V suggests that if equations (2.11a) and (2.11b) are satisfied for some (t, x, y) ∈ ∂𝒮 and ∂ℬ, respectively, then the same is true for any (t, ηx, ηy) with η > 0. Thus, it can further be conjectured that the boundaries between the transaction and no transactions regions are straight lines (rays) through the origin for each t ∈ [0, T]. Moreover, since C* = (V_x)^{1/(γ−1)}, equation (2.11c) becomes

    ∂V/∂t + (σ²y²/2) ∂²V/∂y² + rx ∂V/∂x + αy ∂V/∂y + [(1 − γ)/γ] (∂V/∂x)^{γ/(γ−1)} − βV = 0,   (t, x, y) ∈ 𝒩,   (2.14)

with the fifth term on the left-hand side of (2.14) replaced by −(1 + log V_x) when γ = 0.
We can further exploit homotheticity of V to reduce the nonlinear PDE (2.14) to an equation in one state variable. Indeed, let ψ(t, x) = V(t, x, 1), so that V(t, x, y) = y^γ ψ(t, x/y). Then, for some functions A_*(t), A*(t), and −(1 − μ) < x_*(t) < x*(t) < ∞, equations (2.11a)-(2.11b) and (2.14) are equivalent to the following when γ < 1 and γ ≠ 0:

    ψ(t, x) = γ^{−1} A_*(t)(x + 1 − μ)^γ,   x ≤ x_*(t),   (2.15a)
    ψ(t, x) = γ^{−1} A*(t)(x + 1 + λ)^γ,   x ≥ x*(t),   (2.15b)

and, for x ∈ [x_*(t), x*(t)], a parabolic equation in the single variable x,   (2.15c)

whose coefficients, with b₃ = σ²/2, are given in (2.16).
A similar set of equations can also be obtained for γ = 0. A simplified version of the numerical scheme described earlier in this section can be implemented to solve for ψ(t, x) as well as the boundaries x_*(t) and x^*(t). For details and numerical examples, see Lai and Lim (2002a). Hence, for HARA utility functions, the optimal policy for the transaction costs problem (2.3) is given by the triple (C*, L*, M*), where

L*_t = ∫_0^t 1{X_s/Y_s = x^*(s)} dL*_s and M*_t = ∫_0^t 1{X_s/Y_s = x_*(s)} dM*_s, t ∈ [0, T].
The introduction of transaction costs into Merton's problem in Section 2.1 has the following consequence. The investor should optimally maintain the proportion of investment in stock between θ_*(t) := [1 + x^*(t)]^{−1} > 0 and θ^*(t) := [1 + x_*(t)]^{−1} < μ^{−1}, i.e., θ_*(t) ≤ θ_t ≤ θ^*(t) in our earlier notation. Thus, the no-transaction region N is a "wedge" in the solvency region D. Such an observation can be traced back to Magill and Constantinides (1976), who found that "the investor trades in securities when the variation in the underlying security prices forces his portfolio proportions outside a certain region about the optimal proportions in the absence of transaction costs." The foregoing analysis and solution of problem (2.3) can be extended to the case of more than one stock. While a straightforward application of the principle of dynamic programming would suffice to derive the Bellman equation, computational aspects of the problem become much more involved. As pointed out by Magill and Constantinides (1976), m stocks imply 3^m possible partitions of the solvency region, so even for moderately large m (e.g., 3^5 ≈ 250, 3^10 ≈ 60000) it is unclear how to systematically solve for the transaction regions. When the stock prices are geometric Brownian motions, Magill (1976) established a mutual fund theorem on the reduction of the optimal investment and consumption problem to the case consisting of a bond and only one stock.
2.3 Stationary Policies for Infinite-Horizon Problems
We can view the infinite-horizon optimal investment and consumption problem as the limiting case of the finite-horizon problem in Section 2.2. By setting t = 0 and letting T → ∞, the finite-horizon value function (2.3) approaches the following infinite-horizon value function (dropping the subscript on U1):

V(x, y) = sup_{(C,L,M)∈A(x,y)} E ∫_0^∞ e^{−βt} U(C_t) dt, (x, y) ∈ D, (2.17)
where A(x, y) denotes the set of all admissible policies (C, L, M) for an initial position (x, y) ∈ D such that (X_t, Y_t) ∈ D for all t ≥ 0 a.s. Because the problem no longer depends on time t, the regions S, B, and N are stationary over time. The Bellman equation is given by (2.9) without ∂/∂t. The analysis of Section 2.2 carries over, leading to analogs of equations (2.11a)-(2.11c) (i.e., without t and ∂V/∂t). For a general utility function U, the numerical procedure described in Section 2.2 can be modified, using the finite difference approximations (2.12) without time dependence, to give a solution of the infinite-horizon investment and consumption problem; a numerical method for the corresponding problem with several stocks has been developed by Akian, Menaldi and Sulem (1996). Because V is concave and homothetic, it is possible to reduce the problem to solving ordinary differential equations (ODEs). Indeed, the control problem can be solved by finding a C² function ψ and constants ∞ > x^* > x_* > −(1 − μ) and A^*, A_* satisfying equations (2.15a)-(2.15c) without time dependence. It can be shown that θ_* ≤ p ≤ θ^*, with θ_* = (1 + x^*)^{−1}, θ^* = (1 + x_*)^{−1}. Two sufficient conditions for finiteness of the value function
V are β > γr + γ(α − r)²/{2(1 − γ)σ²} and (β − αγ)(1 + λ) > (β − rγ)(1 − μ); see Shreve, Soner and Xu (1991). Interestingly, if lump-sum transaction costs proportional to portfolio value (e.g., portfolio management fees) are imposed in addition to proportional transaction costs, then portfolio selection and withdrawal for consumption are made optimally at regular intervals (as opposed to trading at randomly spaced instants of time), with the investor consuming deterministically between transactions, as shown by Duffie and Sun (1990). To find the constants x^*, x_*, A^*, A_*, and the function ψ, the principle of smooth fit can first be applied to ψ'' at x^* and x_* to solve for A^* and A_* (which depend on x^* and x_*, respectively). Next, the second-order ODE (2.15c) (without t and ∂ψ/∂t) can be written as a pair of first-order equations after a change of variables. Specifically, for γ ≠ 0 (so U(c) = c^γ/γ), let Q(f) = −b1/γ − b2 f + (1 − γ)b3 f² and R(f) = −b1/γ + (b3 − b2)f − γb3 f², where b1, b2, and b3 are defined in (2.16). Then there exist functions f(x) and h(x) satisfying the system of differential equations
f' = [R(f) − h]/(b3 x), f(x^*) = f^* := x^*/(x^* + 1 + λ), f(x_*) = f_* := x_*/(x_* + 1 − μ), (2.20a)

h' = (1/(1 − γ)) h[h − Q(f)]/(b3 x f), h(x^*) = Q(f^*), h(x_*) = Q(f_*), (2.20b)
such that

ψ(x) = γ^{−1}[h(x)/(1 − γ)]^{γ−1}[x/f(x)]^γ

satisfies (2.15c) (without t and ∂ψ/∂t). In this case, the optimal consumption policy is C_t = C*(X_t, Y_t), where C*(x, y) = γ(1 − γ)^{−1} x h(x/y)/f(x/y). The case γ = 0 can be treated similarly. Davis and Norman (1990) suggested the following algorithm for the numerical solution of (2.20a)-(2.20b) (in which f, h, x_*, x^* need to be determined). The iterative procedure starts with an arbitrary value x̂^* of x^* > −(1 − μ) and the corresponding values f̂^* = x̂^*/(x̂^* + 1 + λ) and ĥ^* = Q(f̂^*). It uses numerical integration to evaluate
f̂(x) = f̂^* − ∫_x^{x̂^*} [R(f̂(u)) − ĥ(u)]/(b3 u) du,

ĥ(x) = ĥ^* − (1/(1 − γ)) ∫_x^{x̂^*} ĥ(u)[ĥ(u) − Q(f̂(u))]/(b3 u f̂(u)) du

for a sequence of decreasing x values until the first value x̂_* of x for which ĥ(x̂_*) ≤ Q(f̂(x̂_*)). At this point, we have a solution of (2.20a)-(2.20b) with μ replaced by x̂_* + 1 − x̂_*/f̂(x̂_*). The iterative procedure continues by adjusting the initial guess x̂^* and computing the resulting x̂_*, terminating when x̂_* + 1 − x̂_*/f̂(x̂_*) differs from μ by no more than some prescribed error bound.
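The shooting scheme just described can be sketched in a few lines of code. The following toy implementation is our own illustration, not the authors' code: all parameter values are invented, the Euler step size is arbitrary, and the coefficients b1, b2, b3 follow (2.16) above.

```python
# Illustrative (assumed) parameter values; gamma_ is the HARA exponent.
alpha, r, sigma, beta, gamma_ = 0.08, 0.05, 0.3, 0.2, 0.5
lam, mu = 0.01, 0.01                     # proportional transaction costs

b3 = sigma**2 / 2                        # coefficients as in (2.16)
b2 = alpha - r - (1 - gamma_) * sigma**2
b1 = beta - gamma_ * alpha + gamma_ * (1 - gamma_) * b3

def Q(f):
    return -b1 / gamma_ - b2 * f + (1 - gamma_) * b3 * f**2

def R(f):
    return -b1 / gamma_ + (b3 - b2) * f - gamma_ * b3 * f**2

def shoot(x_hi, dx=1e-4):
    """Euler-integrate (2.20a)-(2.20b) downward from a guess x_hi of x^*,
    stopping when h <= Q(f); returns the stopping point and the implied mu."""
    x = x_hi
    f = x_hi / (x_hi + 1 + lam)          # boundary value f^*
    h = Q(f)                             # boundary value h(x^*) = Q(f^*)
    while x > dx and f > 1e-9:
        fp = (R(f) - h) / (b3 * x)
        hp = h * (h - Q(f)) / ((1 - gamma_) * b3 * x * f)
        x, f, h = x - dx, f - dx * fp, h - dx * hp
        if h < Q(f):                     # stopping rule of the iterative scheme
            break
    return x, x + 1 - x / f              # (x_lo, mu implied by f at x_lo)

x_lo, mu_implied = shoot(1.0)
```

An outer loop would then adjust the guess x_hi until mu_implied is within a prescribed tolerance of the actual μ, exactly as in the procedure above.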
2.4 Maximization of Long-Run Growth Rate
An alternative optimality criterion was considered by Taksar, Klass and Assaf (1988). Instead of maximizing expected utility of consumption as in (2.17), suppose the objective is to maximize, in the model (2.1a)-(2.1b) without consumption (i.e., C_t ≡ 0), the expected rate of growth of investor assets (equivalently, the long-run growth rate). This optimality criterion can be reformulated in terms of R_t = Y_t/X_t alone, so that the problem is to minimize the following limiting expected "cost" per unit time:

lim sup_{t→∞} t^{−1} E[ ∫_0^t h(R_s) ds + ∫_0^t f(R_s) dL̄_s + ∫_0^t g(R_s) dM̄_s ], (2.21)

where

f(x) = λ/(x + 1), g(x) = μx/(x + 1), h(x) = σ²x²/{2(x + 1)²} − (α − r + σ²/2) x/(x + 1). (2.22)

In (2.21), L̄_t (resp. M̄_t) can be interpreted as the cumulative percentage of stock bought (resp. sold) within the time interval [0, t], and is related to L_t (resp. M_t) via dL̄_t = X_t^{−1} dL_t (resp. dM̄_t = Y_t^{−1} dM_t). If λ = μ = 0 (no transaction costs), the second and third terms in (2.21) vanish and the optimal policy is to keep R_t equal to the optimal proportion obtained as the minimizer of h(x). This is tantamount to setting θ_t (= Y_t/(X_t + Y_t)) equal to p* := (α − r)/σ² + 1/2, which resembles the Merton proportion p in (2.7).
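As a quick numerical check of the relation between h in (2.22) and the proportion p*, the following sketch (with invented parameter values) minimizes h over a grid and compares the corresponding proportion θ = x/(x + 1) with p* = (α − r)/σ² + 1/2.

```python
# Assumed illustrative parameters, satisfying |alpha - r| < sigma**2 / 2.
alpha, r, sigma = 0.07, 0.05, 0.3

def h(x):  # running cost in (2.22), written in terms of theta = x / (x + 1)
    theta = x / (x + 1)
    return sigma**2 * theta**2 / 2 - (alpha - r + sigma**2 / 2) * theta

p_star = (alpha - r) / sigma**2 + 0.5          # Merton-type proportion
xs = [i / 10000 for i in range(1, 100000)]     # grid on x in (0, 10)
x_min = min(xs, key=h)                         # grid minimizer of h
theta_min = x_min / (x_min + 1)
```

With these values p* ≈ 0.722, and the grid minimizer of h indeed corresponds to θ ≈ p*.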
We study the general problem of minimizing (2.21) under the condition |α − r| < σ²/2. (If this condition is violated, the optimal policy is to transfer all the investment to bond or stock at time 0 and to make no further transfers thereafter.) An analysis of the value function V using the Bellman equation shows (in a manner similar to the previous section) that there exist constants x_*, x^*, Λ (optimal value) such that
(σ²/2) x² V''(x) + (α − r + σ²/2) x V'(x) + h(x) − Λ = 0, x ∈ [x_*, x^*], (2.23a)

V'(x) = F(x), x ≤ x_*; V'(x) = G(x), x ≥ x^*, (2.23b)
where F(x) = −λ(1 + x)^{−1}(1 + (1 + λ)x)^{−1} and G(x) = μ(1 + x)^{−1}(1 + (1 − μ)x)^{−1}. Using the principle of smooth fit at x_* and x^*, we find that Λ = h((1 + λ)x_*) = h((1 − μ)x^*), from which it follows that either (1 − μ)x^* = (1 + λ)x_* or

x^* = (1/(1 − μ)) [(p* − 1/2)(1 + λ)x_* + p*] / [(1 − p*)(1 + λ)x_* + 1/2 − p*]. (2.24)

Hence, even though an alternative criterion (of maximizing long-run growth rate) is used to assess the optimality of investment policies, the above analysis shows that, as in Section 2.3, the investor should again optimally maintain the
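The smooth-fit relation behind (2.24) can be verified numerically. The sketch below (all parameter values invented) computes x^* from x_* via (2.24) and checks that h takes the same value at (1 + λ)x_* and (1 − μ)x^*.

```python
# Assumed illustrative parameters with |alpha - r| < sigma**2 / 2.
alpha, r, sigma = 0.07, 0.05, 0.3
lam, mu = 0.02, 0.02                 # proportional transaction costs
p_star = (alpha - r) / sigma**2 + 0.5

def h(x):  # from (2.22)
    theta = x / (x + 1)
    return sigma**2 * theta**2 / 2 - (alpha - r + sigma**2 / 2) * theta

def x_upper(x_lo):  # x^* as a function of x_* via (2.24)
    u = (1 + lam) * x_lo
    return ((p_star - 0.5) * u + p_star) / ((1 - p_star) * u + 0.5 - p_star) / (1 - mu)

x_lo = 1.0
x_hi = x_upper(x_lo)
```

By construction the proportions corresponding to (1 + λ)x_* and (1 − μ)x^* are symmetric about the minimizer p* of h, so h agrees at the two points.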
proportion of investment in stock between θ_* := x_*/(1 + x_*) and θ^* := x^*/(1 + x^*). The constants x_* and x^* can be computed by solving the second-order nonhomogeneous ODE (2.23a), which gives

V'(x) = (2/(σ² x^{2p*})) ∫_{x_*}^x [h(x_*) − h(y)] y^{2(p*−1)} dy + [(F_* + x_* F'_*)/(1 − 2p*)] (x_*/x)^{2p*} + (x_*² F'_* + 2p* x_* F_*)/{(2p* − 1)x}, (2.25)

with initial conditions V'(x_*) = F_* := F(x_*) and V''(x_*) = F'_* := F'(x_*) at x_*, the latter obtained by differentiating (2.23b). A search procedure can then be employed to find that value of x_* for which x^* given by (2.24) satisfies V'(x^*) = G(x^*) in view of (2.23b).
3 Option Pricing and Hedging
This section considers the problem of constructing hedging strategies which best replicate the outcomes from options (and other contingent claims) in the presence of transaction costs, which can be formulated as the minimization of some loss function defined on the replication error. In our recent work, we directly minimize the (expected) cumulative variance of the replicating portfolio in the presence of additional rebalancing costs due to transaction costs. As shown in Section 3.3, this leads to substantial simplification, as the optimal hedging strategy can be obtained by solving an optimal stopping (instead of control) problem. In Sections 3.1 and 3.2 we review an alternative approach, developed by Hodges and Neuberger (1989), Davis, Panas and Zariphopoulou (1993) and Clewlow and Hodges (1997), which is based on the maximization of the expected utility of terminal wealth and which generally results in a free boundary problem in four-dimensional space. Instead of solving the free boundary problem, Constantinides and Zariphopoulou (1999) derived analytic bounds on option prices.
3.1 Formulation via Utility Maximization
The utility-based approach adopts a paradigm similar to Section 2. Suppose the investor trades only in the underlying stock on which the option is written, and proportional transaction costs are imposed on purchase and sale of stock. Following the notation in (1.2), his holding of bond (dollar value) and stock (number of shares) is given by

dX_t = rX_t dt − (1 + λ)S_t dL_t + (1 − μ)S_t dM_t, (3.1a)

dY_t = dL_t − dM_t, (3.1b)
where L_t (resp. M_t) represents the cumulative number of shares bought (resp. sold) within the time interval [0, t]. Define the cash value of y shares of stock when the stock price is S by

Y(y, S) = (1 + λ)yS if y < 0; Y(y, S) = (1 − μ)yS if y ≥ 0.
For technical reasons, the investor's position is constrained to lie in the region

D = {(x, y, S) ∈ ℝ² × ℝ₊ : x + Y(y, S) > −a} (3.2)

for some prescribed positive constant a. We denote by A(t, x, y, S) the class of admissible trading strategies (L, M) for the position (x, y, S) ∈ D at time t such that (X_s, Y_s, S_s) ∈ D for all s ∈ [t, T]. The objective is to maximize the expected utility of terminal wealth, giving rise to the value functions

V^i(t, x, y, S) = sup_{(L,M)∈A(t,x,y,S)} E[U(Z^i_T)], i = 0, s, b, (3.3)
where U : ℝ → ℝ is a concave increasing function (so it is a risk-averse utility function). The terminal wealth of the investor (with or without an option position) is given by

Z^0_T = X_T + Y(Y_T, S_T) (no call),

Z^s_T = X_T + Y(Y_T, S_T) 1{S_T ≤ K} + [Y(Y_T − 1, S_T) + K] 1{S_T > K} (sell a call),

Z^b_T = X_T + Y(Y_T, S_T) 1{S_T ≤ K} + [Y(Y_T + 1, S_T) − K] 1{S_T > K} (buy a call),

in which we have assumed that the option is asset settled, so that the writer delivers one share of stock in return for a payment of K when the holder chooses to exercise the option at maturity T. In the case of cash settled options, the writer delivers (S_T − K)^+ in cash, so Z^s_T = X_T + Y(Y_T, S_T) − (S_T − K)^+ and Z^b_T = X_T + Y(Y_T, S_T) + (S_T − K)^+. From the definition of the value functions (3.3), it is evident that an application of the principle of dynamic programming will yield the same PDE for each value function (i = 0, s, b), with the terminal condition governed by the utility of the respective terminal wealth. By temporarily restricting L and M as in (2.8) (and then letting κ → ∞), the Bellman equation for V^i is max_{ℓ,m} (∂/∂t + 𝓛)V^i(t, x, y, S) = 0, where 𝓛 is the infinitesimal generator of (3.1a)-(3.1b) and dS_t = S_t(α dt + σ dW_t):
𝓛 = rx ∂/∂x + αS ∂/∂S + (σ²S²/2) ∂²/∂S² + [∂/∂y − (1 + λ)S ∂/∂x] ℓ + [(1 − μ)S ∂/∂x − ∂/∂y] m.
Thus, once again, the state space can be partitioned into regions in which it is optimal to buy stock at the maximum rate, or to sell stock at the maximum rate, or not to do any transaction. Arguments similar to those in Section 2 show that there exist functions y_*(t, x, S) (buy boundary) and y^*(t, x, S) (sell boundary) for each i = 0, s, b such that

V^i_y(t, x, y, S) = (1 + λ)S V^i_x(t, x, y, S), y ≤ y_*(t, x, S), (3.4a)

V^i_y(t, x, y, S) = (1 − μ)S V^i_x(t, x, y, S), y ≥ y^*(t, x, S), (3.4b)

V^i_t + rx V^i_x + αS V^i_S + (σ²S²/2) V^i_SS = 0, y ∈ [y_*(t, x, S), y^*(t, x, S)]. (3.4c)

The optimal hedging strategy associated with (3.3) is given by the pair (L*, M*), where for each i = 0, s, b,

L*_t = ∫_0^t 1{Y_s = y_*(s, X_s, S_s)} dL*_s and M*_t = ∫_0^t 1{Y_s = y^*(s, X_s, S_s)} dM*_s, t ∈ [0, T].
Two different definitions of option prices have been proposed. In Hodges and Neuberger (1989) and subsequently in Clewlow and Hodges (1997), the reservation selling (resp. buying) price is defined as the amount of cash p^s (resp. p^b) required initially to provide the same expected utility as not selling (resp. buying) the option. Thus, p^s and p^b satisfy the following equations:

V^s(0, x + p^s, y, S) = V^0(0, x, y, S), V^b(0, x − p^b, y, S) = V^0(0, x, y, S). (3.5)

An alternative definition is used by Davis, Panas and Zariphopoulou (1993). Assuming that U(0) = 0, define
x^i = inf{x : V^i(0, x, 0, S) ≥ 0}, i = 0, s, b,

so in particular, x^0 ≤ 0 because V^0(0, 0, 0, S) ≥ 0 (investing in neither bond nor stock is admissible). Thus, an investor pays an "entry fee" −x^0 to trade in the market strictly on his own account. The selling price p^s and buying price p^b of the option are then constructed such that the investor is indifferent between going into the market with and without an option position: p^s = x^s − x^0 and p^b = −(x^b − x^0). Although they advocate this definition for the option writer's price, Davis, Panas and Zariphopoulou (1993, pp. 492-493) express reservations about using it to define the buyer's price.
3.2 Solution for Exponential Utility Functions
A reduction in dimensionality (from four to three) can be achieved by specializing to the negative exponential utility function U(z) = 1 − e^{−γz} (with constant index of risk aversion −U''(z)/U'(z) = γ). Using this utility function, the bond position can be managed through time independently of the stock holding and

V^i(t, x, y, S) = 1 − exp{−γx e^{r(T−t)}} H^i(t, y, S), i = 0, s, b,
where H^i(t, y, S) := 1 − V^i(t, 0, y, S). As a consequence, the free boundary problem (3.4a)-(3.4c) for each i = 0, s, b is transformed into the following problem:

H^i_y(t, y, S) = −γe^{r(T−t)}(1 + λ)S H^i(t, y, S), y ≤ y_*(t, S), (3.6a)

H^i_y(t, y, S) = −γe^{r(T−t)}(1 − μ)S H^i(t, y, S), y ≥ y^*(t, S), (3.6b)

H^i_t + αS H^i_S + (σ²S²/2) H^i_SS = 0, y ∈ [y_*(t, S), y^*(t, S)]. (3.6c)
It is also straightforward to observe that the price definitions are equivalent to

p^s = γ^{−1} e^{−rT} log[H^s(0, 0, S)/H^0(0, 0, S)], p^b = −γ^{−1} e^{−rT} log[H^b(0, 0, S)/H^0(0, 0, S)]. (3.7)
The solution of the free boundary problem (3.6a)-(3.6c) can be obtained by approximating dY_t = dL_t − dM_t and dS_t = S_t(α dt + σ dW_t) with Markov chains and applying a discrete-time dynamic programming algorithm as in
Section 2.2. To this end, it is useful to note from (3.6a)-(3.6b) that
Hi(t, Yl, S) = Hi(t, Y2, S) exp {~l'er(T-t)(l
+ ),,)S(YI ~ Y2)},
Yl:::; Y2 :::; y*(t, S),
Hi(t, Yl, S) = Hi(t, Y2, S) exp { ~l'er(T-t)(l ~ P,)S(YI ~ Y2)}, Yl 2': Y2 2': y*(t, S). We discretize time t so that it takes values in 1f = {O, 15, 215, ... , N 15}, where 15 = T / N. The number of shares is also discretized so that Y is a multiple of E. Then we can approximate the stock price process using the following random walk: with probability p, with probability 1 ~ p, where u = Ja 215 + (a ~ a 2/2)215 2 and p = [1 + (a ~ a 2/2)15/u]/2. Let Y = {kE : k is an integer} and § = {e ku So : k is an integer} This discretization scheme leads to the following algorithm for (t, y, S) E 1f X Y X §:
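The parameters u and p are chosen so that one step of the random walk matches the mean and variance of the log-price increment over δ; this can be checked directly (the values of α, σ, δ below are arbitrary):

```python
import math

alpha, sigma, delta = 0.1, 0.3, 0.01   # arbitrary illustrative values
nu = alpha - sigma**2 / 2              # drift of log S per unit time
u = math.sqrt(sigma**2 * delta + nu**2 * delta**2)
p = (1 + nu * delta / u) / 2

mean_step = p * u + (1 - p) * (-u)     # expected log-price increment
var_step = u**2 - mean_step**2         # variance of log-price increment
```

Both quantities reproduce the moments ν δ and σ²δ of log S_{t+δ} − log S_t exactly.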
H^i(t, y, S) = min{ H^i(t, y + ε, S) exp[γe^{r(T−t)}(1 + λ)Sε], H^i(t, y − ε, S) exp[−γe^{r(T−t)}(1 − μ)Sε], p H^i(t + δ, y, e^u S) + (1 − p) H^i(t + δ, y, e^{−u} S) }; (3.8)

see Davis, Panas and Zariphopoulou (1993) and Clewlow and Hodges (1997) for details. Depending on which term on the r.h.s. of (3.8) is the smallest, the point (t, y, S) is classified as belonging to B, S, or N, respectively. We set y_*(t, S) (resp. y^*(t, S)) to be the largest (resp. smallest) value of y for which (t, y, S) ∈ B (resp. S).
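To make the backward induction (3.8) concrete, here is a self-contained toy sketch for the writer of an asset-settled call and for the no-option investor, from which the reservation selling price in (3.7) is computed. All numerical values are invented, and the y-grid range and the per-slice sweeps (which enforce the buy and sell terms of (3.8) by chaining ε-steps) are our own implementation choices, not the authors' code.

```python
import math

# Invented illustrative parameters.
S0 = K = 1.0
r, alpha, sigma, T = 0.05, 0.05, 0.3, 0.25
gam = 1.0             # risk aversion in U(z) = 1 - exp(-gam * z)
lam, mu = 0.01, 0.01  # proportional costs on purchase and sale
N = 20
delta = T / N
u = math.sqrt(sigma**2 * delta + (alpha - sigma**2 / 2)**2 * delta**2)
p = (1 + (alpha - sigma**2 / 2) * delta / u) / 2
eps = 0.05
ys = [round(-0.5 + eps * j, 10) for j in range(41)]   # y-grid on [-0.5, 1.5]

def cash(y, S):  # cash value Y(y, S) of y shares
    return (1 + lam) * y * S if y < 0 else (1 - mu) * y * S

def terminal(y, S, writer):  # H^i(T, y, S) = exp(-gam * terminal wealth in shares)
    w = cash(y, S) if (not writer or S <= K) else cash(y - 1, S) + K
    return math.exp(-gam * w)

def solve(writer):
    H = {(k, j): terminal(ys[j], S0 * math.exp(u * k), writer)
         for k in range(-N, N + 1) for j in range(41)}
    for n in range(N - 1, -1, -1):
        g = gam * math.exp(r * (T - n * delta))
        Hn = {}
        for k in range(-n, n + 1):
            S = S0 * math.exp(u * k)
            # continuation value (third term of (3.8))
            col = [p * H[(k + 1, j)] + (1 - p) * H[(k - 1, j)] for j in range(41)]
            for j in range(39, -1, -1):      # buy term: compare with y + eps
                col[j] = min(col[j], col[j + 1] * math.exp(g * (1 + lam) * S * eps))
            for j in range(1, 41):           # sell term: compare with y - eps
                col[j] = min(col[j], col[j - 1] * math.exp(-g * (1 - mu) * S * eps))
            for j in range(41):
                Hn[(k, j)] = col[j]
        H = Hn
    return H

j0 = 10                                   # ys[10] == 0.0 (no initial stock)
H0 = solve(False)[(0, j0)]
Hs = solve(True)[(0, j0)]
p_sell = math.exp(-r * T) / gam * math.log(Hs / H0)   # reservation price via (3.7)
```

The reservation selling price p_sell exceeds zero because the writer is worse off than the no-option investor, and for these parameters it is close to the Black-Scholes value plus a transaction-cost premium.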
3.3 A New Approach
The previous analysis shows that, in the presence of transaction costs, perfect hedging of an option is not possible and trading in options involves an element of risk. Indeed, if the region D defined in (3.2) is replaced by the solvency region of Section 2, Soner, Shreve and Cvitanic (1995) showed that "the least costly way of hedging the call option in a market with proportional transaction costs is the trivial one: to buy a share of the stock and hold it." By relaxing the requirement of perfect hedging, Leland (1985) and Boyle and Vorst (1992) demonstrated that discrete-time hedging strategies, for which trading takes place at regular intervals, can nearly replicate the option payoff at maturity. The option price is essentially the Black-Scholes value with an adjusted volatility. While the hedging error can be reduced to zero as the time between trades approaches zero, the adjusted volatility approaches infinity and the option value approaches the value of one share of stock. A new approach has been recently proposed in Lai and Lim (2002b). The formulation is motivated by the original analysis of Black and Scholes (1973) in the following way: form a hedging portfolio that minimizes hedging error and price the option by the (expected) initial capital required to set up the hedge.
For the hedging portfolio, the objective is to minimize the expected cumulative instantaneous variance and additional rebalancing costs due to transaction fees, given by

J(t, S, y) = E[ ∫_t^T F(s, S_s, y_s) ds + μ ∫_t^T (S_s/K) dM_s + λ ∫_t^T (S_s/K) dL_s | S_t = S, y_t = y ],

where F(t, S, y) = σ²(S/K)²[y − Δ(t, S)]² for the option writer and F(t, S, y) = σ²(S/K)²[y + Δ(t, S)]² for the option buyer. Here, Δ(t, S) = N(d1(t, S)) is the Black-Scholes delta (i.e., the number of shares in the option's perfectly replicating portfolio) with

d1(t, S) = {log(S/K) + r(T − t)}/(σ√(T − t)) + σ√(T − t)/2.
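The delta Δ(t, S) = N(d1(t, S)) is readily computed from the formula above; for example (sample parameter values only, using the error function for the standard normal cdf):

```python
import math

def bs_delta(t, S, K=1.0, r=0.05, sigma=0.3, T=0.25):
    """Black-Scholes delta N(d1(t, S)) with d1 as defined above."""
    tau = T - t
    d1 = (math.log(S / K) + r * tau) / (sigma * math.sqrt(tau)) \
         + sigma * math.sqrt(tau) / 2
    return 0.5 * (1 + math.erf(d1 / math.sqrt(2)))   # standard normal cdf

delta_atm = bs_delta(0.0, 1.0)   # at-the-money delta, roughly 0.56 here
```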
Taking α = r, analysis of the Bellman equation for the value function V(t, S, y) = min_{L,M} J(t, S, y) leads to the following free boundary problem:

V_y(t, S, y) = −λS/K in N^c ∩ {y < Δ(t, S)},

V_y(t, S, y) = μS/K in N^c ∩ {y > Δ(t, S)},

together with a second-order PDE for V in N. By working with V_y instead of directly with V, we deduce from the previous set of equations that V_y(t, S, y) satisfies another free boundary problem associated with an optimal stopping problem. It is this reduction to optimal stopping that greatly simplifies the hedging problem. Applying the transformations s = σ²(t − T) and z = log(S/K) − (ρ − 1/2)s, where ρ = r/σ², it suffices to work with v(s, z, y) = V_y(t(s), S(s, z), y). For each y, we obtain the following discrete-time dynamic programming equation for the option writer, utilizing a symmetric Bernoulli walk approximation to Brownian motion:

v(s, z, y) = min{μe^{z+βs}, v̄(s, z, y)} 1{y > Δ(s, z)} + max{−λe^{z+βs}, v̄(s, z, y)} 1{y < Δ(s, z)}, (3.9)

where β = ρ − 1/2, v̄(s, z, y) denotes the continuation value obtained by averaging v(s + δ, z ± √δ, y) over the two equally likely moves of the Bernoulli walk, and the terminal condition is v(0, z, y) = μe^z 1{y > Δ(0, z)} − λe^z 1{y < Δ(0, z)}. From (3.9) we obtain the optimal sell and buy boundaries y^s(s, z) and y^b(s, z): if y > y^s(s, z) (resp. y < y^b(s, z)), the option writer must immediately sell y − y^s(s, z) (resp. buy y^b(s, z) − y) shares of stock to form an optimal hedge. The optimal hedging portfolio for the option buyer can also be obtained from (3.9) by symmetry: the optimal sell and buy boundaries for the option buyer with sell rate μ and buy rate λ are −y^b(s, z) and −y^s(s, z),
respectively, where y^s(s, z) and y^b(s, z) are the optimal sell and buy boundaries for the option writer with sell rate λ and buy rate μ. Simulation studies have shown the approach to be efficient in the sense that it results in the smallest standard error of hedging error for any specified mean hedging error, where hedging error is defined as the difference between the Black-Scholes value and the initial capital needed to replicate the option payoff at maturity. For details and refinements, see Lai and Lim (2002b).
4 Conclusion
Optimal investment portfolios and hedging strategies derived in the absence of transaction costs involve continuous trading to maintain the optimal positions. Such continuous policies are at best approximations to what can be achieved in the real world, and a frequent practice is to execute the policies discretely so that transactions take place at regular (or predetermined) intervals. With appropriate adjustments, these policies can also be implemented in the presence of transaction costs since they do not lead to an infinite turnover of assets. However, in the absence of a clearly defined objective, it is difficult to argue that a discrete policy is optimal in any sense. This difficulty can be overcome in investment and consumption problems through utility maximization, and in option pricing and hedging problems through the minimization of hedging error. Many formulations of these problems lead naturally to singular stochastic control problems, in which transactions either occur at maximum rate ("bang-bang") or not at all. In the analysis of these singular control problems, the principle of dynamic programming is used to derive the Bellman equations, which are nonlinear PDEs whose solutions in the classical sense have posed formidable existence and uniqueness problems. The development of viscosity solutions to these PDEs in the 1980s is a major breakthrough that circumvents these difficulties; see Crandall, Ishii and Lions (1992). In contrast to discrete policies, singular control policies require trading to take place at random instants of time, when asset holdings fall too "out of line" from a "target." Besides being naturally intuitive, singular control policies lend further insight into optimal investor behavior when faced with investment decisions (with or without consumption). Efficient numerical procedures can be developed to solve for the singular control policies based on Markov chain approximations of the controlled diffusion process. In some instances, a reduction to optimal stopping reduces the computational effort considerably.
Tze Leung Lai
Department of Statistics
Stanford University
Stanford, CA 94305

Tiong Wee Lim
Dept. of Statistics and Appl. Prob.
National University of Singapore
Singapore 117546
Bibliography

[1] Akian, M., Menaldi, J. L. and Sulem, A. (1996). On an investment-consumption model with transaction costs. SIAM J. Control Optim. 34 329-364.
[2] Black, F. and Scholes, M. (1973). The pricing of options and corporate liabilities. J. Political Economy 81 637-654.
[3] Boyle, P. P. and Vorst, T. (1992). Option replication in discrete time with transaction costs. J. Finance 47 271-293.
[4] Clewlow, L. and Hodges, S. D. (1997). Optimal delta-hedging under transaction costs. J. Econom. Dynamics Control 21 1353-1376.
[5] Constantinides, G. M. (1986). Capital market equilibrium with transaction costs. J. Political Economy 94 842-862.
[6] Constantinides, G. M. and Zariphopoulou, T. (1999). Bounds on prices of contingent claims in an intertemporal economy with proportional transaction costs and general preferences. Finance Stoch. 3 345-369.
[7] Cox, J. C. and Huang, C. F. (1989). Optimal consumption and portfolio policies when asset prices follow a diffusion process. J. Econom. Theory 49 33-83.
[8] Crandall, M. G., Ishii, H. and Lions, P. L. (1992). User's guide to viscosity solutions of second order partial differential equations. Bull. Amer. Math. Soc. 27 1-67.
[9] Davis, M. H. A. and Norman, A. R. (1990). Portfolio selection with transaction costs. Math. Oper. Res. 15 676-713.
[10] Davis, M. H. A., Panas, V. G. and Zariphopoulou, T. (1993). European option pricing with transaction costs. SIAM J. Control Optim. 31 470-493.
[11] Duffie, D. and Sun, T. (1990). Transaction costs and portfolio choice in a discrete-continuous-time setting. J. Econom. Dynamics Control 14 35-51.
[12] Hodges, S. D. and Neuberger, A. (1989). Optimal replication of contingent claims under transactions costs. Rev. Futures Markets 8 222-239.
[13] Karatzas, I., Lehoczky, J. P. and Shreve, S. E. (1987). Optimal portfolio and consumption decisions for a "small investor" on a finite horizon. SIAM J. Control Optim. 25 1557-1586.
[14] Karatzas, I. and Shreve, S. E. (1984). Connections between optimal stopping and singular stochastic control I. Monotone follower problems. SIAM J. Control Optim. 22 856-877.
[15] Karatzas, I. and Shreve, S. E. (1985). Connections between optimal stopping and singular stochastic control II. Reflected follower problems. SIAM J. Control Optim. 23 433-451.
[16] Kushner, H. J. and Dupuis, P. G. (1992). Numerical Methods for Stochastic Control Problems in Continuous Time. Springer-Verlag, New York.
[17] Lai, T. L. and Lim, T. W. (2002a). Optimal investment and consumption on a finite horizon with transaction costs. Technical Report, Department of Statistics and Applied Probability, National University of Singapore.
[18] Lai, T. L. and Lim, T. W. (2002b). A new approach to pricing and hedging options with transaction costs. Technical Report, Department of Statistics, Stanford University.
[19] Leland, H. E. (1985). Option pricing and replication with transactions costs. J. Finance 40 1283-1301.
[20] Magill, M. J. P. (1976). The preferability of investment through a mutual fund. J. Econom. Theory 13 264-271.
[21] Magill, M. J. P. and Constantinides, G. M. (1976). Portfolio selection with transaction costs. J. Econom. Theory 13 245-263.
[22] Merton, R. C. (1969). Lifetime portfolio selection under uncertainty: The continuous-time case. Rev. Econom. Statist. 51 247-257.
[23] Merton, R. C. (1971). Optimum consumption and portfolio rules in a continuous-time model. J. Econom. Theory 3 373-413 [Erratum 6 (1973) 213-214].
[24] Shreve, S. E. and Soner, H. M. (1994). Optimal investment and consumption with transaction costs. Ann. Appl. Probab. 4 609-692.
[25] Shreve, S. E., Soner, H. M. and Xu, G.-L. (1991). Optimal investment and consumption with two bonds and transaction costs. Math. Finance 1 53-84.
[26] Soner, H. M., Shreve, S. E. and Cvitanic, J. (1995). There is no nontrivial hedging portfolio for option pricing with transaction costs. Ann. Appl. Probab. 5 327-355.
[27] Taksar, M., Klass, M. J. and Assaf, D. (1988). A diffusion model for optimal portfolio selection in the presence of brokerage fees. Math. Oper. Res. 13 277-294.
Parametric Empirical Bayes Model Selection: Some Theory, Methods and Simulation

Nitai Mukhopadhyay
Eli Lilly and Company

and

Jayanta Ghosh
Purdue University

Abstract

For nested models within the PEB framework of George and Foster (Biometrika, 2000), we study the performance of AIC, BIC and several relatively new PEB rules under 0-1 and prediction loss, through asymptotics and simulation. By way of optimality we introduce a new notion of consistency for 0-1 loss and an oracle or lower bound for prediction loss. The BIC does badly, while AIC does well for the prediction problem with least squares estimates. The structure and performance of the PEB rules depend on the loss function. Properly chosen, they tend to outperform the other rules.
1 Introduction
Our starting point is a paper by George and Foster (2000), abbreviated henceforth as [6]. [6] propose a number of new methods using PEB (Parametric Empirical Bayes) ideas on model selection as a tool for selecting variables in a linear model. An attractive property of the new methods is that they use penalized likelihood rules with the penalty coefficient depending on data, unlike the classical AIC, due to Akaike (1973), and BIC, due to Schwarz (1978), which use constant penalty coefficients. The penalty for a model of dimension q is usually λq, where λ is a penalty coefficient. [6] compare different methods through simulation. Our major contribution is to supplement this with some theoretical work for both prediction loss and 0-1 loss. The former is supposed to be relevant in soft science, where one only wants to make good predictions, and the latter is relevant in hard science, where one wants to know the truth. It is known in the model selection literature that these different goals lead to different notions of optimality. Our theory is based on the assumption that we have nested, orthogonal models, a situation that would arise if one tries to fit an orthogonal polynomial of unknown degree. This special case receives special attention in [6]. Our paper is based on Chapter 4 of Mukhopadhyay (2000), subsequently referred to as [9]. A related paper is Berger, Ghosh and Mukhopadhyay (2003), which shows the inadequacy of BIC in high dimensional problems.
The BIC was essentially developed as an approximation to the Bayesian integrated likelihood when all parameters in the likelihood have been integrated out. The model that maximizes this is the posterior mode, which minimizes the Bayes risk for 0-1 loss. It is shown in Berger, Ghosh and Mukhopadhyay (2003) that BIC is a poor approximation to this in high dimensional problems. The optimality of AIC in high dimensional prediction problems has been proved in a series of papers, e.g., Shibata (1981), Li (1987) and Shao (1997). Both the BIC and AIC are often used in problems for which they were not developed. We examine the penalties of [6] in Section 2 and make some alternative recommendations. All the model selection rules are studied in Sections 3 and 4 from the point of view of consistency under 0-1 loss. In Section 5 we follow the predictive approach, using the consistency results proved earlier. For the situation where least squares estimates are used for prediction after selection of a model, we define an oracle, a sort of lower bound, in the spirit of Shibata. In the PEB framework it is easy to calculate the limit of the oracle, namely, the function B(·), and to show that the Bayes prediction rule and the AIC attain this lower bound asymptotically. This is not always the case for the PEB rules, which are Bayes rules for 0-1 loss. Section 5 ends with a study of the case where Bayes (shrinkage) estimates are used instead of least squares estimates. Then the PEB rules are asymptotically optimal and can do substantially better than AIC. However, the benefit comes from the better estimates rather than from more parsimonious model selection. Simulations in Section 6, for both 0-1 and squared error prediction loss, bear out the validity of the asymptotic results in finite samples; they also provide useful supplementary information.
Results similar to those outlined above are studied, in the Frequentist setting of Shao (1997), in Mukhopadhyay and Ghosh (2002), and, for Shibata's Frequentist setting of nonparametric regression, in Berger, Ghosh and Mukhopadhyay (2003). The assumptions, priors, results and proofs differ in the three cases. The PEB formulation of [6] provides a PEB background for the simplest as well as cleanest results of this type.
2 PEB Model Selection Rules for 0-1 Loss
The problem of variable selection in nested orthogonal models can be put in the following canonical form in terms of the regression coefficients. The data consist of independent r.v.'s Y_ij, i = 1, 2, ..., p, j = 1, 2, ..., r. There are p models M_q, 1 ≤ q ≤ p. Hardly any change occurs if q = 0 is also allowed. Under M_q,

Y_ij = β_i + ε_ij, 1 ≤ i ≤ q, j = 1, 2, ..., r,
Y_ij = ε_ij, q + 1 ≤ i ≤ p, j = 1, 2, ..., r, with the ε_ij's i.i.d. N(0, σ²). For simplicity we assume σ² is known. If σ² is unknown, the same theory applies if σ² is replaced by a consistent estimate of σ². If r > 1 and p is large, then a consistent estimate of σ² is available from the residuals Y_ij − Ȳ_i. In our asymptotics r is held fixed and p → ∞. The sample size is n = pr. Clearly, the model M_q of dimension q specifies that β_{q+1}, ..., β_p are all zero. In the PEB formulation, see e.g. Morris (1983), the dimension of the parameter space is reduced by assigning the parameters a prior distribution with a few unspecified (hyper)parameters which are estimated from data after integrating out the original parameters. [6] assume, as in Morris (1983), that β_1, ..., β_q are i.i.d. N(0, cσ²/r). In our work we have used cσ²; both choices have validity, see the discussion in Berger and Pericchi (2001). In any case in the simulations r = 1, so that our prior is the same as that of [6]. As indicated in Morris (1983), a PEB formulation is a compromise between a classical Frequentist approach and a full Bayesian approach. In many decision theoretic examples based on real or simulated data, Efron and Morris (1973), Morris (1983) and others have shown that the PEB formulation permits borrowing of strength from estimates of similar parameters, leading to estimates that substantially improve classical estimates even in a Frequentist sense. However, this does not follow from PEB theory. The PEB theory works well, i.e., provides better estimates than classical ones in the sense of cross-validation or being closer to a known true value, when the normality (or other prior) distributional assumption is checked by comparing the expected and empirical distributions of the Ȳ_i's. If M_q is true, then Ȳ_1, Ȳ_2, ..., Ȳ_q are i.i.d. N(0, cσ² + σ²/r). In the PEB formulation here there are two remaining unknown parameters, namely c and the true q, denoted by q_0.
The PEB solution adopted by us is to estimate $c$ from data and put a prior $\pi(q)$ on $q$. We make one final assumption, that $\sigma^2 = 1$, which can be ensured by a suitable scale transformation. Suppose $c$ is known and $\pi(q)$ is a prior on $q$. The Bayes solution is to maximize, with respect to $q$, the likelihood with $\beta_1, \dots, \beta_q$ integrated out, namely

$$L(q, c) = A\,\pi(q)\,(1+rc)^{-q/2} \exp\Big\{\frac{1}{2}\,\frac{rc}{1+rc}\,SS_q\Big\} \qquad (1)$$
where $SS_q = r\sum_{i=1}^{q} \bar Y_i^2$ and $A$ does not depend on $q$ or $c$. Since $c$ is not known, one choice, referred to as a conditional maximum likelihood estimate of $c$, is to maximize the expression in (1) with respect to $c$, giving

$$r\hat c_q = \max\Big\{\frac{SS_q}{q} - 1,\ 0\Big\} \qquad (2)$$
We now take $\pi(q)$ uniform on $1 \le q \le p$. Then the PEB Bayes rule will choose $M_q$ if $q$ maximizes the expression in (1) after replacing $c$ by $\hat c_q$. This amounts to maximizing, with respect to $q$,
Parametric Empirical Bayes Model Selection
$$\Lambda(q) = \Lambda(q, \hat c_q) = 2\log L(q, \hat c_q) = \frac{r\hat c_q}{1+r\hat c_q}\,SS_q - q\log(1+r\hat c_q) = SS_q - q\Big(1 + \log_+\frac{SS_q}{q}\Big) \qquad (3)$$
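The selection rule defined by (2) and (3) is easy to program. The following is a minimal sketch written for this exposition (the helper name is hypothetical), assuming $\sigma^2 = 1$ and $SS_q > 0$:

```python
import math

def peb_select(ybar, r=1):
    """PEB model selection for the nested models M_1, ..., M_p.

    ybar : sample means Ybar_1, ..., Ybar_p (sigma^2 taken as 1).
    Returns the dimension q maximizing the criterion (3),
    Lambda(q) = SS_q - q * (1 + log_+(SS_q / q)),
    with c estimated per model by (2): r * c_q = max(SS_q/q - 1, 0).
    """
    best_q, best_val, ss = 1, -math.inf, 0.0
    for q, y in enumerate(ybar, start=1):
        ss += r * y * y                        # SS_q = r * sum_{i<=q} Ybar_i^2
        log_plus = max(math.log(ss / q), 0.0)  # log_+(SS_q/q); needs SS_q > 0
        val = ss - q * (1.0 + log_plus)
        if val > best_val:
            best_q, best_val = q, val
    return best_q
```

On an idealized data set with $\bar Y_i^2 = 1 + c$ for $i \le q_0$ and $\bar Y_i^2 = 1$ beyond, the criterion peaks exactly at $q_0$.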
If, instead of estimating $c$, we put a prior on $c$ and then use a Laplace approximation, we should maximize

$$\Lambda^*(q) \qquad (4)$$

Details are given in [9]. Later we provide some evidence that a single estimate of $c$ across all models is preferable. A natural PEB estimate is obtained by taking $\pi(q) = 1/p$, summing the expressions of the likelihood in (1) over $1 \le q \le p$, and then maximizing with respect to $c$. This estimate $\hat c_\pi$ is referred to as the marginal maximum likelihood estimate in [6]. One then gets a third penalized (log) likelihood

$$\Lambda_\pi(q) = SS_q - q\,\Big\{\frac{1+r\hat c_\pi}{r\hat c_\pi}\log_+(1+r\hat c_\pi)\Big\}$$

In this paper $\hat c_\pi$ will also stand for any estimate which converges a.s. to $c$ as the true $q_0 \to \infty$. George and Foster [6] discuss the relative advantages and disadvantages of each estimate of $c$ and refer to unpublished work of Johnstone and Silverman (2000). The new model selection rules are to be compared with AIC, which maximizes $SS_q - 2q/r$, and BIC, which maximizes $SS_q - q\{\log(pr)\}/r$. As indicated before, both these classical rules are inappropriate for high dimensional problems with 0-1 loss. The rule based on $\Lambda(q)$ is essentially due to [6] except that, instead of our uniform prior, they choose the "binomial" prior
$$\pi(q) = w^q (1-w)^{p-q} \qquad (4a)$$

where, according to [6], $w$ is also to be estimated by maximizing (1). For a given $q$, it is clear that $w$ appears only in the prior $\pi(q)$ and not in the likelihood of the data given $M_q$. The maximizing $w$, namely,
$$\hat w_q = q/p \qquad (5)$$

can hardly be called a PEB estimate in the same spirit as $\hat c_q$. Also, for $q/p$ bounded away from zero and one, the penalty in the (log) integrated likelihood due to this $\pi(q)$ is $O(q)$, whereas this part of the penalty vanishes at the end-points. In other words, irrespective of the data, the models in the middle range of $q$ are being unduly penalized. The binomial prior seems more appropriate in the all-subsets model selection problem with $2^p$ models, where the models in the middle have cardinality $\binom{p}{q}$, which is much bigger than the cardinality of, say, $q = 1$ or $p$.
Even for all-subsets model selection, there is some confounding between $w$ and $c$ in the following sense. The Bayesian "non-centrality" parameter is

$$E\Big(\sum_{i=1}^{p} \beta_i^2\Big) = pwc \qquad (6)$$

An estimate of this can only help determine the product $wc$. Separate estimation of $w$ and $c$ will require the use of the normal likelihood in a way that is not robust. We will return to this problem elsewhere.
3 Consistency
We first consider the case where $c$ is known, so in the PEB criteria the estimates $\hat c_q, \hat c_\pi$ are to be replaced by $c$. It is clear that if $M_{q_0}$ remains fixed (as $p \to \infty$), then the likelihood ratio of $M_{q_0}$ with respect to any other fixed $M_{q_1}$ remains bounded away from zero and infinity. Hence it would be impossible to discriminate one of them from the other with error probabilities tending to zero as $p \to \infty$. That can happen only when $|q_0 - q_1| \to \infty$ as $p \to \infty$. The following definition is motivated by this fact.
Definition. Let $q_0 \to \infty$ as $p \to \infty$. A penalized likelihood criterion $\Lambda(q, Y, p)$ for model selection is consistent at $q_0$ if, given $\varepsilon > 0$, for sufficiently large $p$ and $q_0$ there exists a $k$ (depending on $\varepsilon, p, q_0$) such that

$$P_{q_0}\{\Lambda(q_0, Y, p) > \Lambda(q, Y, p),\ \forall\, |q - q_0| \ge k\} > 1 - \varepsilon \qquad (7)$$
Of course we could take fixed $q_0$ and examine consistency from the right only. The treatment is exactly similar. Let

$$\Lambda(q, Y, p) = SS_q - q\lambda \qquad (8)$$

for some $\lambda > 0$. Then for $q_1 > q_0$ and $q_1 - q_0 \to \infty$,

$$\Lambda(q_0, Y, p) - \Lambda(q_1, Y, p) = -r\sum_{i=q_0+1}^{q_1} \bar Y_i^2 + (q_1 - q_0)\lambda = (q_1 - q_0)(\lambda - 1 + o_p(1)) \qquad (9)$$

Similarly, for $q_1 < q_0$ and $q_0 - q_1 \to \infty$,

$$\Lambda(q_0, Y, p) - \Lambda(q_1, Y, p) = (q_0 - q_1)(1 + rc - \lambda + o_p(1)) \qquad (10)$$
We thus have

Proposition 3.1. The penalized likelihood criterion $\Lambda(q, Y, p)$ with constant penalty coefficient $\lambda$ is consistent at all $q_0 \to \infty$ iff $1 < \lambda < 1 + rc$.
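Proposition 3.1 can be checked numerically in the idealized setting in which $r\bar Y_i^2$ is replaced by its expectation ($1+rc$ for $i \le q_0$, $1$ beyond), exactly the means used in (9)-(10). This is a hypothetical sketch written for this exposition, not code from the paper:

```python
def argmax_penalized(c, q0, p, lam, r=1):
    """Maximizer of SS_q - q*lam when r*Ybar_i^2 is replaced by its mean:
    1 + r*c for i <= q0 and 1 for i > q0 (the setting of (9)-(10))."""
    best_q, best_val, ss = 1, float("-inf"), 0.0
    for q in range(1, p + 1):
        ss += (1.0 + r * c) if q <= q0 else 1.0
        val = ss - q * lam
        if val > best_val:
            best_q, best_val = q, val
    return best_q
```

With $1 < \lambda < 1 + rc$ the maximizer sits at $q_0$; with $\lambda$ outside that interval it escapes to $1$ or to $p$, as the proposition predicts.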
For AIC, $\lambda = 2$, so one would have consistency if $rc > 1$. If $rc < 1$, one can show that

$$\Lambda(1, Y, p) - \Lambda(q, Y, p) \to \infty \ \text{a.s.} \qquad (11)$$

if $q \to \infty$, i.e., AIC chooses $M_1$ or models not far from $M_1$. It is shown in Section 5 that this is a good thing to do if one wants to make predictions and least squares estimates are used. The usual BIC, with $\lambda = \log n$, is inconsistent; this extremely high penalty also leads to poor performance in prediction. A modified version due to several people, see [9] or Mukhopadhyay, Berger and Ghosh (2002) for references, has $\log p$ instead of $\log n$. That also is not consistent in general. For consistency one requires $r \ge 3$ and $1 + rc - \log r > 0$.

We now turn to the three PEB rules with estimates $\hat c_q$ or $\hat c_\pi$. It is easy to check that the rule based on $\Lambda_\pi(q)$ is consistent if $\hat c_\pi$ is a consistent estimate of $c$. To prove this we need to show
$$1 < \frac{1+rc}{rc}\log(1+rc) < 1 + rc \qquad (12)$$

The right-hand inequality follows from

$$\log(1+rc) < rc \qquad (13)$$

which is proved by the fact that the second derivative of $\log(1+x)$ is negative. The left-hand inequality follows from

$$(1+rc)\log(1+rc) > rc \qquad (14)$$

which is proved by the fact that the second derivative of $(1+x)\log(1+x)$ is positive.
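The double inequality (12) is also easy to verify numerically. The following sketch (a hypothetical helper, written for illustration) evaluates the penalty coefficient appearing in $\Lambda_\pi$:

```python
import math

def penalty_coef(rc):
    """The coefficient ((1+rc)/rc) * log(1+rc) from (12); by (13)-(14) it
    lies strictly between 1 and 1+rc for every rc > 0."""
    return (1.0 + rc) / rc * math.log(1.0 + rc)
```

Combined with Proposition 3.1, this is exactly why $\Lambda_\pi$ with a consistent $\hat c_\pi$ is consistent: its per-parameter penalty always lands inside the interval $(1, 1+rc)$.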
which is proved by the fact that the second derivative of (1 + x)log(1 + x) is positive. The other two PEB criteria differ from each other by a quantity which is op(q), hence they are either both consistent or both inconsistent. Since cq has undesirable properties as an estimate of c (vide Section 4) neither of these rules is consistent in our sense. This does have some effect on their performance in prediction problems. All one can show for these two cases is that A(qo, Y-,p) - A(q, Y-,p) - t 00 if \q - qo\ - t 00 and (qO/ql) is bounded away from zero. To prove this, one has to use the behavior of cq for q > qo which is studied in the next section.
4 Estimation of c
By the law of large numbers, for large $q$,

$$\hat c_q = \frac{1}{r}\Big(\frac{SS_q}{q} - 1\Big) \approx c \ \text{ for } q \le q_0; \qquad \approx \frac{q_0 c}{q} \ \text{ for } q > q_0 \qquad (15)$$
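The two regimes in (15) can be seen in an idealized computation in which $r\bar Y_i^2$ is replaced by its expectation; the decay $q_0 c/q$ beyond $q_0$ is then exact. (A hypothetical sketch for illustration only.)

```python
def c_hat_path(c, q0, p, r=1):
    """Path q -> c_q of (2) when r*Ybar_i^2 equals its mean:
    1 + r*c for i <= q0, and 1 for i > q0."""
    path, ss = [], 0.0
    for q in range(1, p + 1):
        ss += (1.0 + r * c) if q <= q0 else 1.0
        path.append(max(ss / q - 1.0, 0.0) / r)
    return path
```

The path is flat at $c$ up to $q_0$ and decays like $q_0 c/q$ afterwards, matching the shape of the plots in Figure 1.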
Clearly, for large incorrect models, $\hat c_q$ decreases the penalty for each additional parameter, namely $1 + \log(1 + \hat c_q)$. This is counterintuitive. Plots of $\hat c_q$ for simulated data in [9] show that $\hat c_q$ tends to die out for large incorrect values of $q$. This is the main reason why consistency became a problem for $\Lambda(q, \hat c_q)$. If the true $q_0$ is fixed and not large, one cannot have a consistent estimate of $c$. If $q_0 \to \infty$ at a rate faster than some known $\tilde q$, then a consistent estimate is

$$\hat c_{\tilde q} \qquad (16)$$

However, such knowledge of $\tilde q$ is unlikely. A plot of $\hat c_q$ provides good visual information about both $c$ and the true $q_0$.
An estimate of $c$ which is easy to calculate and has a nice Bayesian interpretation is the model average

$$\hat c_\pi = \sum_{q=1}^{p} \hat\pi_q \hat c_q \qquad (17)$$

where $\hat\pi_q$ denotes the estimated posterior weight of the model $M_q$. (18)
The asymptotic behavior of $\hat c_\pi$ is difficult to study. It is unlikely to be consistent in general, for the following reason. For values of $q$ much larger than $q_0$, $\hat c_q$ will be much smaller than $c$, but such $q$'s will inappropriately have large weights $\hat\pi_q$. The net effect of this will be to pull down the average $\hat c_\pi$ away from $c$. Some evidence of this based on simulation is provided in [9]. We now make two rather strong assumptions which ensure consistency of a slightly modified version of $\hat c_\pi$.

(A1) As $p \to \infty$, $q_0/p$ is bounded away from zero.

(A2) There is a known positive number $k$ such that $c \le k$.
The modified version, also denoted by the same symbol, is

$$\hat c_\pi = \sum_{q=1}^{p} \hat\pi_q \min(\hat c_q, k) \qquad (19)$$
Under our assumptions $\hat c_\pi \to c$ a.s. We sketch a proof. For slight simplicity, we take $r = 1$. For $q \le q_0$,

(20)

This can be used to show, for all $q < q_0(1-\varepsilon)$, $0 < \varepsilon < 1$, and $\delta > 0$ sufficiently small,

$$\Lambda(q, \hat c_q) - \Lambda(q_0, \hat c_{q_0}) < (q_0 - q)\{\log(1+\hat c_{q_0}) - c + \delta\} + q\{\log(1+\hat c_{q_0}) - \log(1+\hat c_q)\} \qquad (21)$$

with probability $> 1 - \varepsilon$. We have used the fact that $\log(1+c) < c$. We can now show, as in the proof of Proposition 5.1, that

$$\sum_{q \le q_0(1-\varepsilon)} \exp\{\Lambda(q, \hat c_q) - \Lambda(q_0, \hat c_{q_0})\} \to 0 \qquad (22)$$

with probability tending to one as $p \to \infty$.
For $q \ge q_0$,

(23)

where, by the strong law, $\sup_{q \ge q_0} |r_q| \to 0$ in probability. So, by the concavity of $\log(x)$, there exists $\delta > 0$ such that, for $p \ge q \ge q_0(1+\varepsilon)$,

(24)

where $r_q$ is a generic term such that $\sup_q |r_q|$ is $o_p(1)$. Then for $p \ge q > q_0(1+\varepsilon)$,

(25)

where $\sup_{q > q_0(1+\varepsilon)} q_0 |r_q| = o_p(1)$. The expression in (25) is, by (24), $< -q\delta/2$. Once again an analogue of (22) for $q > q_0(1+\varepsilon)$ is true. So the contribution to $\hat c_\pi$ from $q > q_0(1+\varepsilon)$ and $q < q_0(1-\varepsilon)$ is negligible. But for $|q - q_0| < \varepsilon q_0$, $\hat c_q$ can be made as close to $c$ as desired by choice of $\varepsilon$. This proves the consistency of $\hat c_\pi$.
5 Bayes Rule for Prediction Loss and Asymptotic Performance
It is well known (see, e.g., Shao (1997)) that the loss in predicting unobserved $Y$'s, for an exact replicate of the given design, on the basis of the given data is the sum of a term not depending on the model and the squared error loss $\sum_{i=1}^{q} (\bar Y_i - \beta_i)^2$.
So in evaluating the performance of a model selection rule it is customary to ignore the term not involving the model and focus on the squared error loss. We do so below. For a fixed $c$ the Bayes rule is described in the following theorem. We need first to define a quantile model. A model $M_i$ is a posterior $\alpha$-quantile model if $\pi(i+1 \le q \mid Y) \le 1 - \alpha < \pi(i \le q \mid Y)$, or equivalently $\pi(q \le i-1 \mid Y) < \alpha \le \pi(q \le i \mid Y)$.

Theorem 5.1. The Bayes rule selects the smallest dimensional model if $rc \le 1$, and the posterior $\frac{rc-1}{2rc}$-quantile model if $rc > 1$.
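In code, the rule of Theorem 5.1 only needs the posterior tail probabilities $\pi(i \le q \mid Y)$. The following hypothetical sketch, written for this exposition, selects the largest coordinate whose inclusion is preferred:

```python
def bayes_select(post, rc):
    """Bayes rule of Theorem 5.1 for nested models.

    post : post[k] = pi(q = k+1 | Y), a probability vector over dimensions 1..p.
    Includes coordinate i iff pi(i <= q | Y) > (1+rc)/(2rc); the selected
    dimension is then the posterior (rc-1)/(2rc)-quantile model.
    """
    if rc <= 1.0:
        return 1                      # smallest dimensional model
    thresh = (1.0 + rc) / (2.0 * rc)
    tail, chosen = 1.0, 1             # tail = pi(i <= q | Y), starting at i = 1
    for i in range(1, len(post) + 1):
        if tail > thresh:
            chosen = i
        tail -= post[i - 1]           # now tail = pi(i+1 <= q | Y)
    return chosen
```

For a uniform posterior over $q = 1, \dots, 4$ and $rc = 3$, the threshold is $2/3$, so coordinates $1$ and $2$ are included, i.e. the $1/3$-quantile model is chosen.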
Proof. Let $M_q$ stand for the true (random) model with prior $\pi(q)$. The posterior distribution of $\beta_i$ given $M_q$ is

$$\pi(\beta_i \mid q, Y) = N\Big(\frac{rc}{1+rc}\bar Y_i,\ \frac{c}{1+rc}\Big), \quad i \le q; \qquad = \text{point mass at zero}, \quad i > q$$

Hence

$$E\{(\bar Y_i - \beta_i)^2 \mid q, Y\} = \Big\{\frac{\bar Y_i}{1+rc}\Big\}^2 + \frac{c}{1+rc}, \quad i \le q; \qquad = \bar Y_i^2, \quad i > q$$

Similarly,

$$E\{(\beta_i - 0)^2 \mid q, Y\} = \Big\{\frac{rc\,\bar Y_i}{1+rc}\Big\}^2 + \frac{c}{1+rc}, \quad i \le q; \qquad = 0, \quad i > q$$
Suppose we ignore the fact that we have to select from among nested models (i.e., we have to include all $j < i$ if we include $i$ in our model) and just try to decide whether to set $\beta_i$ non-zero or zero. The posterior risks of these two decisions are

$$w(i \text{ included} \mid Y) = \frac{c}{1+rc}\,\pi(q \ge i \mid Y) + \bar Y_i^2\Big\{\Big(\frac{1}{1+rc}\Big)^2 \pi(q \ge i \mid Y) + \pi(q < i \mid Y)\Big\}$$

$$w(i \text{ excluded} \mid Y) = \frac{c}{1+rc}\,\pi(q \ge i \mid Y) + \bar Y_i^2\,\Big(\frac{rc}{1+rc}\Big)^2 \pi(q \ge i \mid Y).$$

Hence inclusion of $i$ is preferred iff $w(i \text{ included} \mid Y) < w(i \text{ excluded} \mid Y)$,
which implies

$$\frac{1+rc}{2rc} < \pi(i \le q \mid Y)$$

Suppose $rc > 1$. Then we choose all $i$ such that $\pi(i \le q \mid Y) > \frac{1+rc}{2rc}$. Given the obvious monotonicity of $\pi(i \le q \mid Y)$ in $i$, this means we choose the posterior $\frac{rc-1}{2rc}$-quantile model. Clearly this is the Bayes rule. More formally, if $d(q_1)$ is the decision to choose model $M_{q_1}$, the corresponding posterior risk is
$$w(q_1 \mid Y) = \sum_{i=1}^{q_1} w(i \text{ included} \mid Y) + \sum_{i=q_1+1}^{p} w(i \text{ excluded} \mid Y)$$
$$\ge \sum_{i=1}^{p} \min\{w(i \text{ included} \mid Y),\ w(i \text{ excluded} \mid Y)\} = w\Big(\tfrac{rc-1}{2rc}\text{-quantile model} \,\Big|\, Y\Big)$$
Similarly, if $rc \le 1$, it is easy to see that the simplest model minimizes the posterior risk among all models. This completes the proof.

To define asymptotic empirical Bayes optimality, we define an oracle, i.e., a lower bound to the performance of any selection rule. Let $M_{q_0}$ be the true (unknown) model and $d(q_1)$ the decision to select $M_{q_1}$. Given $Y$, the PEB risk of $d(q_1)$ under $M_{q_0}$, after division by $q_0$, is
$$A(q_1) = \frac{1}{q_0}\Big[\sum_{i=1}^{q_1} E\{(\bar Y_i - \beta_i)^2 \mid q_0, Y\} + \sum_{i=q_1+1}^{p} E\{\beta_i^2 \mid q_0, Y\}\Big]$$

For $q_1 \le q_0$,

$$A(q_1) = \frac{c}{1+rc} + \frac{1}{(1+rc)^2}\,\frac{1}{q_0}\sum_{i=1}^{q_1} \bar Y_i^2 + \frac{r^2c^2}{(1+rc)^2}\,\frac{1}{q_0}\sum_{i=q_1+1}^{q_0} \bar Y_i^2,$$

and for $q_1 > q_0$,

$$A(q_1) = \frac{c}{1+rc} + \frac{1}{(1+rc)^2}\,\frac{1}{q_0}\sum_{i=1}^{q_0} \bar Y_i^2 + \frac{1}{q_0}\sum_{i=q_0+1}^{q_1} \bar Y_i^2.$$
Using the strong law of large numbers we obtain a heuristic approximation to $A(q_1)$, namely

$$\beta(q) = \frac{c}{1+rc} + \frac{q}{q_0\,r(1+rc)} + \frac{q_0 - q}{q_0}\,\frac{rc^2}{1+rc}, \quad q \le q_0$$
$$\beta(q) = \frac{c}{1+rc} + \frac{1}{r(1+rc)} + \frac{q - q_0}{q_0\,r}, \quad q > q_0$$

Clearly $q_0\beta(\cdot)$ is a non-random approximation to the posterior risk under $M_{q_0}$. Note that $\beta(\cdot)$ is minimized at $q_0$ if $rc > 1$ and at $q = 1$ if $rc < 1$; if $rc = 1$, $\beta(\cdot)$ does not depend on $q$ for $q \le q_0$.
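The shape of $\beta(\cdot)$ is easy to verify numerically (a hypothetical sketch, with $\sigma^2 = 1$ as above):

```python
def beta(q, q0, c, r=1):
    """Heuristic risk approximation beta(q) for least squares estimates:
    two-piece formula with breakpoint at the true dimension q0."""
    rc = r * c
    if q <= q0:
        return c / (1 + rc) + q / (q0 * r * (1 + rc)) + (q0 - q) / q0 * r * c * c / (1 + rc)
    return c / (1 + rc) + 1.0 / (r * (1 + rc)) + (q - q0) / (q0 * r)
```

For $rc > 1$ the minimum is at $q_0$; for $rc < 1$ it is at $q = 1$, which is why AIC's preference for very small models pays off in prediction when $c < 1$.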
Theorem 5.2. Let $A(\cdot)$ and $\beta(\cdot)$ be defined as above. Then

$$\lim_{q_0 \to \infty} \sup_q |A(q) - \beta(q)| = 0 \quad \text{and} \quad \lim_{q_0 \to \infty} \frac{\inf_q A(q)}{\inf_q \beta(q)} = 1.$$
Proof. We consider the case $q \le q_0$. The other case follows similarly. Write

$$A(q) - \beta(q) = \frac{q}{q_0\,r(1+rc)^2}\Big\{\frac{r}{q}\sum_{i=1}^{q} \bar Y_i^2 - (1+rc)\Big\} + \frac{q_0 - q}{q_0}\,\frac{rc^2}{(1+rc)^2}\Big\{\frac{r}{q_0 - q}\sum_{i=q+1}^{q_0} \bar Y_i^2 - (1+rc)\Big\} = T_1(q) + T_2(q)$$

We show that $\sup_q |T_1(q)| \to 0$ a.s.; the other part can be shown to tend to zero in a similar way. Given $\varepsilon > 0$, we choose an $A$ such that, for $A < q \le q_0$, $\big|\frac{r}{q}\sum_{i=1}^{q} \bar Y_i^2 - (1+rc)\big| < \varepsilon$ eventually with probability one, while $|T_1(q)|$ for $q \le A$ can be made smaller than $\varepsilon$ by choosing $q_0$ sufficiently large.
By repeated application of this kind of elementary argument one proves the first part of the theorem. The first part implies

$$\lim_{q_0 \to \infty} \Big|\inf_q A(q) - \inf_q \beta(q)\Big| = 0.$$

Since $\inf_q \beta(q) = c$ if $c \le 1$ and $= 1$ if $c > 1$, the second part follows.

$$\pi(|q - q_0| > b \mid Y) \to 0$$

This is in the spirit of posterior consistency at $q_0$, except that $b$ is not fixed but goes to infinity at a relatively slow rate.

Proof of Theorem 5.4. Without loss of generality take $r = 1$. If $c < 1$, the rule $q_c$ always chooses the simplest model. Hence its posterior risk (under $q_0$) is $q_0 A(q_c)$. Since $\beta(q)$ is minimized at $q = q_c$ in this case, we are done. For $c > 1$, $\inf \beta(q) = 1$. Also, by Proposition 5.1,

$$\frac{q_c}{q_0} \to 1 \ \text{a.s.}$$
We consider the case where $q_c \le q_0$; the other case is similar. The posterior risk of $M_{q_c}$ for $q_c < q_0$ then $\to 1$ a.s., since $q_c/q_0 \to 1$ a.s.
Proof of Proposition 5.1. We take $r = 1$ as before and let $\lambda(c) = \frac{1+c}{c}\log(1+c)$. It has been proved before that $1 < \lambda(c) < 1 + c$. Using the strong law, given $\varepsilon > 0$, there exists $k > 0$ such that for $q > q_0 + k$, with probability tending to one,

$$\Big|\frac{\Lambda(q) - \Lambda(q_0)}{q - q_0} - (1 - \lambda(c))\Big| < \varepsilon,$$

i.e.

$$\Lambda(q) - \Lambda(q_0) < -(q - q_0)\gamma, \ \text{for some } \gamma > 0.$$

Hence

$$\pi(q > q_0 + k \mid Y) \le \sum_{q > q_0 + k} t^{(q - q_0)} = t^k/(1 - t) \to 0, \quad \text{where } t = e^{-\gamma}.$$

One can similarly show $\pi(q < q_0 - k \mid Y) \to 0$, using $\lambda(c) < 1 + c$.
Remark 5.1. Theorem 5.4 holds for unknown $c$ if $\hat c$ is a consistent estimate and we use $q_{\hat c}$, i.e., the empirical Bayes model selection rules with $\hat c_q$ replaced by $\hat c$. The same result holds for AIC also, which is interesting since AIC does not need to estimate $c$ consistently. We prove this below. One simply notes that in Section 3 we prove that for $rc > 1$, AIC is consistent for $q_0$ if $q_0 \to \infty$. Also, for $rc < 1$, AIC$(q) - $ AIC$(1) \to -\infty$ if $q \to \infty$. Using
these facts one shows, as in the proof of Theorem 5.4, that AIC attains the same risk as the oracle.

So far we have been looking at several Bayesian model selection rules from the point of view of prediction or squared error loss in a situation where, after selection of the model, least squares estimates are used. Results differ in a major way if least squares estimates are replaced by the Bayes estimates $E(\beta_i \mid q, Y) = \frac{rc}{1+rc}\bar Y_i$ if $M_q$ is chosen and $i \le q$. Since the proofs are similar we merely state the main facts. For a known $c$, the Bayes rule becomes the posterior median rule. This is a special case of a general result of Barbieri and Berger (2000) but can also be derived like Theorem 5.1. To define a Bayesian oracle, we redefine
$$A(q) = \frac{1}{q_0}\Big[\sum_{i=1}^{q} E\Big\{\Big(\beta_i - \frac{rc}{1+rc}\bar Y_i\Big)^2 \,\Big|\, q_0, Y\Big\} + \sum_{i=q+1}^{p} E\{\beta_i^2 \mid q_0, Y\}\Big]$$

For $q_1 \le q_0$ this equals

$$\frac{q_1}{q_0}\,\frac{c}{1+rc} + \frac{q_0 - q_1}{q_0}\Big\{\frac{c}{1+rc} + \Big(\frac{rc}{1+rc}\Big)^2 \frac{1}{q_0 - q_1}\sum_{i=q_1+1}^{q_0} \bar Y_i^2\Big\},$$

and for $q_1 > q_0$,

$$\frac{c}{1+rc} + \frac{q_1 - q_0}{q_0}\Big(\frac{rc}{1+rc}\Big)^2 \frac{1}{q_1 - q_0}\sum_{i=q_0+1}^{q_1} \bar Y_i^2.$$
The heuristic nonrandom approximation is

$$\beta(q) = \frac{q}{q_0}\,\frac{c}{1+rc} + \frac{q_0 - q}{q_0}\Big\{\frac{c}{1+rc} + \frac{rc^2}{1+rc}\Big\}, \quad q \le q_0$$

and

$$\beta(q) = \frac{c}{1+rc} + \frac{q - q_0}{q_0}\,\frac{rc^2}{(1+rc)^2}, \quad q > q_0.$$
Now $\inf \beta(q_1) = \frac{c}{1+rc}$, attained at $q_0$, for all $c$. The posterior median Bayes rule, as well as the PEB model selection rules followed by Bayes estimation, attains the risk of the Bayesian oracle, namely the $q$ minimizing $A(q)$, provided $c$ is known or a consistent estimate of $c$ is used. The advantage of using the (shrinkage) Bayes estimates can be seen by comparing $\inf \beta(q)$ for the two cases, namely $\frac{c}{1+rc}$ for Bayes estimates and $\frac{1}{r}$ for least squares estimates. For all $c$, fully Bayes rules reduce the posterior risk per component in the model by $\frac{1}{r(1+rc)}$, which can be very large if both $r$ and $c$ are small.
Figure 1: Behavior of $\hat c_q$ in a nested sequence of models, for the four combinations $c = 0.5, q_0 = 20$; $c = 0.5, q_0 = 50$; $c = 3, q_0 = 20$; and $c = 3, q_0 = 50$.
6 Simulations and Discussion
A plot of $\hat c_q$ against $q$ is a good Bayesian data analytic tool that provides information about both $c$ and the true dimension $q_0$. This is true of all four graphs in Figure 1, but it is specially noticeable when $c$ is not too small. The second set of simulations describes the performance of different model selection rules for 0-1 loss. We have taken $r = 1$. In addition to AIC, BIC and the three PEB rules defined in Section 2, we consider the Conditional Maximum Likelihood rule (CML) of [6], in which both $\hat c_q$ and $\hat w_q$ are used as indicated in Section 2, even though the binomial prior seems unintuitive in the nested case. In the simulations $c = 0.5$ or $3$. Higher values of $c$ are considered in [9]; the results are very similar to those for $c = 3$. It is clear from Tables 1 and 2 that BIC and CML are disastrous, as expected. AIC does well for $c = 3$ but badly for $c = 0.5$, again as expected from Section 3. However, inconsistency is preferable to consistency in the prediction problem, vide the proof of Theorem 5.1 and Proposition 5.2. This is borne out by the third set of simulations.
The third set of simulations (Tables 3 and 4) describes the performance of these criteria under prediction loss. Once again, $\Lambda^*(q)$ seems to do substantially better than $\Lambda(q)$, and $\Lambda_\pi$ is somewhat worse than the other two. AIC is competitive for $c > 1$ and dramatically better for $c < 1$. This is because, with least squares estimates, none of the three PEB rules is asymptotically optimal if $c < 1$. Of course the Bayes rule $q_c$ for prediction loss would have done much better and be comparable to AIC.
Table 1: Quartiles of the dimensions selected by the criteria $\Lambda(q)$, $\Lambda^*(q)$, $\Lambda_\pi(q)$, BIC, AIC and CML, for $c = 0.5$, $r = 1$, at $q_0 = 5, 10, 20, 40, 500, 800, 900$.
Table 2: Quartiles of the dimensions selected by the criteria $\Lambda(q)$, $\Lambda^*(q)$, $\Lambda_\pi(q)$, BIC, AIC and CML, for $c = 3$, $r = 1$, at $q_0 = 10, 20, 40, 500, 800, 900$.
q0     Λ(q)     Λ*(q)    Λπ(q)    BIC      AIC      CML
4      227.94   35.77    293.53   2.63     5.18     425.15
5      211.53   20.66    297.17   3.05     4.91     412.92
10     205.26   37.06    297.28   5.5      7.86     466.09
20     178.71   42.47    235.44   10.54    13.68    499.04
40     138.14   54.76    180.23   20.69    25.82    574.25
500    522.19   518.25   522.89   250.74   258.63   998.05
800    818.44   816.34   818.57   401.06   409.8    1000.51
900    909.33   908.1    909.44   450.62   457.95   998.57

Table 3: Prediction loss of the models selected by different criteria for c = 0.5, r = 1.
q0     Λ(q)     Λ*(q)    Λπ(q)    BIC      AIC      CML
4      94.92    14.36    146.83   6.85     6.56     331.95
5      113.44   19.09    145.46   9.51     7.09     371.34
10     39.42    15.6     53.61    21.74    13.13    446.24
20     31.23    24.45    31.21    50.36    23.23    635.56
40     44.6     44.22    44.6     108.76   43.62    847.6
500    503.71   503.65   503.71   1489.84  503.66   998.85
800    804.03   804.08   804.03   2392.47  804.32   998.6
900    904.4    904.36   904.35   2693.92  904.23   999.55

Table 4: Prediction loss of the models selected by different criteria for c = 3, r = 1.
We have not done any simulations on the posterior median Bayes rule, which uses PEB shrinkage Bayes estimates. It is expected to outperform AIC, as seen from the comparison of the $\beta(\cdot)$'s for model selection followed by least squares and model selection followed by Bayes estimates. The three PEB criteria of Section 2, followed by Bayes estimates, are expected to do much better than is evident in Tables 3 and 4, but not as well as the posterior median rule. It may be worth pointing out that there is a basic difference between the median Bayes rule and AIC. Whether $c > 1$ or $c < 1$, the median Bayes rule is consistent at $q_0$ (a proof can be constructed using Proposition 5.1). But it then shrinks the estimates towards zero appropriately, depending on the value of $c$. AIC does not have this option; it uses least squares estimates. So for critically small values of $c$, namely $c < 1$, it has to choose a much lower dimensional model to have some sort of shrinkage.
Bibliography

[1] Akaike, H. (1973) Information Theory and an Extension of the Maximum Likelihood Principle. In B. N. Petrov and F. Czaki, editors, Proceedings of the Second International Symposium on Information Theory, 267-271. Budapest: Akademiai Kiado.

[2] Barbieri, M. and Berger, J. (2000) Optimal Predictive Model Selection, ISDS Discussion Paper, Duke University.

[3] Berger, J. O., Ghosh, J. K., and Mukhopadhyay, N. (2003) Approximations and consistency of Bayes factors as model dimension grows, Journal of Statistical Planning and Inference, 112, 241-258.

[4] Berger, J. O. and Pericchi, L. R. (2001) Objective Bayesian Methods for Model Selection: Introduction and Comparison, IMS Lecture Notes (P. Lahiri, editor), 38, 135-203.

[5] Efron, B. and Morris, C. (1973) Stein's Estimation Rule and its Competitors: an Empirical Bayes Approach, Journal of the American Statistical Association, 68, 117-130.

[6] George, E. I. and Foster, D. F. (2000) Calibration and Empirical Bayes Variable Selection, Biometrika, 87, 731-747.

[7] Li, K-C. (1987) Asymptotic Optimality of C_P, C_L, Cross-Validation and Generalized Cross-Validation: Discrete Index Set, Annals of Statistics, 15, 958-975.

[8] Morris, C. (1983) Parametric Empirical Bayes Inference, Journal of the American Statistical Association, 78, 47-55.

[9] Mukhopadhyay, N. (2000) Bayesian Model Selection for High Dimensional Models with Prediction Loss and 0-1 Loss, thesis submitted to Purdue University.

[10] Mukhopadhyay, N. and Ghosh, J. K. (2002) Bayes Rules for Prediction Loss and AIC, (submitted).

[11] Rissanen, J. (1983) A Universal Prior for Integers and Estimation by Minimum Description Length, Annals of Statistics, 11, 416-431.

[12] Schwarz, G. (1978) Estimating the Dimension of a Model, The Annals of Statistics, 6, 461-464.

[13] Shao, J. (1997) An Asymptotic Theory for Linear Model Selection, Statistica Sinica, 7, 221-264.

[14] Shibata, R. (1981) An Optimal Selection of Regression Variables, Biometrika, 68, 45-54.

[15] Shibata, R. (1983) Asymptotic Mean Efficiency of a Selection of Regression Variables, Annals of the Institute of Statistical Mathematics, 35, 415-423.
A Theorem of Large Deviations for the Equilibrium Prices in Random Exchange Economies

Esa Nummelin
University of Helsinki
Abstract We formulate and prove a theorem concerning the large deviations of equilibrium prices in large random exchange economies.
1 Introduction
We consider an economic system (shortly, economy) $E$, where certain commodities $j = 1, \dots, l$ are traded. Let $R_+^l =_{\mathrm{def}} \{p = (p^1, \dots, p^l) \in R^l;\ p^j \ge 0 \text{ for all } j = 1, \dots, l\}$. The elements $p$ of $R_+^l$ are interpreted as price vectors (shortly, prices). (We will follow a convention according to which superscripts always refer to the commodities whereas subscripts refer to the economic agents.) The total excess demand function $Z(p) = (Z^1(p), \dots, Z^l(p)) \in R^l$ comprises the total excess demands on the $l$ commodities in the economy at the prices $p \in R_+^l$. Its zeros $p^*$ are called the equilibrium prices: $Z(p^*) = 0$.
(In fact, according to Walras' law, we may regard money as an $l+1$'st commodity [the numeraire] having price $p^{l+1} = 1$ and total excess demand $Z^{l+1}(p) = -p \cdot Z(p)$.) In the classical equilibrium theory the economic variables and quantities are supposed to be deterministic, see [2]. It is, however, realistic to allow uncertainty in an economic model. We assume throughout this paper that the total excess demand $Z(p)$ is a random variable (for each fixed price $p$). In particular, it then follows that the equilibrium prices $p^*$ form a random set.
The seminal works concerning equilibria of random economies are due to Hildenbrand [5], Bhattacharya and Majumdar [1] and Follmer [4]. The equilibrium prices in large random economic systems obey (under appropriate regularity conditions) classical statistical limit laws. The law of large numbers [1] states that, as the number $n$ of economic agents increases, the random equilibrium prices (r.e.p.'s) $p_n^*$ become asymptotically equal to deterministic "expected" equilibrium prices:

$$\lim_{n \to \infty} p_n^* = p_e^*.$$
(The subscript $n$ refers to the number of economic agents.) The central limit theorem (CLT) for the r.e.p.'s [1] characterizes the "small deviations" of the r.e.p.'s from their expected values as asymptotically normal:

$$n^{1/2}(p_n^* - p_e^*) \to N \ \text{in distribution},$$

where $N$ denotes a multinormal random vector having mean zero. We argue in this article for the relevance of the theory of large deviations to random equilibrium theory. To this end, suppose that an a posteriori observation of the equilibrium price is made, and let $p$ denote the value of this observation. If the modeler is concerned with the estimation of the a priori probability of an a posteriori observation $p$ of the equilibrium price in a large economy, the use of the CLT requires the a priori model to be "good" in the sense that the observation $p$ ought to fall within a narrow range (having the asymptotically negligible order $n^{-1/2} = o(1)$) from its expected value $p_e^*$.
However, due to the fact that economics is concerned with the (economic) behaviour of human beings, any (predictive) economic model is always to some extent defective. It follows, in particular, that in a large economy an observed equilibrium price $p$ may well represent a "large deviation" from its a priori predicted value $p_e^*$ (viz. fall outside the region of validity of the CLT). The main result of this paper is a theorem of large deviations (LD's) for the random equilibrium prices. It yields an exponential estimate for the (a priori small) probabilities of observations of r.e.p.'s "far away" from their expected values. Namely, we prove that, under appropriate regularity conditions, for an arbitrary fixed price $p$, there exists a constant $i(p) \ge 0$ such that

(1.1)

In accordance with standard LD terminology (see [3]), we refer to the price-dependent constant $i(p)$ as the entropy. In what follows we shall formulate and prove (1.1) as an exact mathematical theorem. LD theorems for random equilibrium prices were earlier presented in [7], [8]. The version here is of "local type" in that we are concerned with probabilities of observations of r.e.p.'s in small neighborhoods of a given fixed price. Because of this it turns out that the hypotheses of [7], [8] can be somewhat relaxed. Also it becomes possible to give a self-contained proof which does not lean on the general abstract LD theory. Therefore the proof ought to be accessible also to a reader who is not an LD specialist. The basic idea in the proof is to use a centering argument of a type which is commonly used in LD theory.
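The flavor of the exponential estimate (1.1) can be illustrated with a toy Monte Carlo in the simplest i.i.d. setting, where the entropy is known in closed form: for the sample mean of standard normal variables, $i(x) = x^2/2$. (This hypothetical sketch is only an illustration of the exponential decay rate; the paper's setting of equilibrium prices is of course more involved.)

```python
import math
import random

random.seed(1)

def ld_rate(n, x, delta, trials=100000):
    """Monte Carlo estimate of -(1/n) log P(|S_n/n - x| < delta) for
    S_n a sum of n i.i.d. N(0,1) variables; for small delta this should
    approach the entropy i(x) = x**2 / 2."""
    hits = 0
    for _ in range(trials):
        s = sum(random.gauss(0.0, 1.0) for _ in range(n))
        if abs(s / n - x) < delta:
            hits += 1
    return -math.log(max(hits, 1) / trials) / n
```

For $n = 20$, $x = 1$, $\delta = 0.25$, the estimated rate lands between the entropies of the window endpoints, $0.75^2/2 \approx 0.28$ and $1^2/2 = 0.5$, since the boundary of the window nearest the origin dominates the probability.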
2 Formulation of the LD theorem
We describe now the basic set-up and formulate the large deviation theorem in exact terms. We will be concerned with a sequence $E_n$, $n = 1, 2, \dots$, of economies. We assume that in the economy $E_n$ there are $N_n$ economic agents labeled as $i = 1, \dots, N_n$. We assume that $N_n$ is of the order $O(n)$; namely,

$$N_n \le An \ \text{for some constant } A < \infty.$$

(H4) $\exists A_2(p) < \infty$ : $|\zeta_{in}''(q)| \le A_2(p)$ w.p.1, for all $i$ and $n$ and all $q$ with $|q - p| < \varepsilon_2(p)$;

(H5) $\exists A_{-1}(p) < \infty$ : $|(n^{-1}Z_n'(p))^{-1}| \le A_{-1}(p)$ w.p.1, for all $n$.
Remarks. (i) Condition (H4) implies condition (H3).
(ii) Suppose that $\zeta_{in}(p) = \zeta_i(p)$, where $\zeta_i(p)$, $i = 1, 2, \dots$, are i.i.d. as before. Now, due to (2.8), the hypothesis (H2) is trivially true. Also it turns out that in this case hypothesis (H5) can be replaced by the simpler hypothesis (H5') that $\zeta_i'(p)$ is non-singular, see [9].
Theorem 2.1. (i) Suppose that the hypotheses (H1)-(H3) hold true. Then there exists a constant $M_0(p) < \infty$ such that

$$P(\pi_n^* \cap U(p, \varepsilon) \ne \emptyset) < e^{-n(i(p) - M_0(p)\varepsilon)}$$

eventually, for all $0 < \varepsilon < \varepsilon_1(p)$.

(ii) Suppose in addition that the hypotheses (H4)-(H5) hold true. Then there exists a constant $M_1(p) < \infty$ such that

$$P(\pi_n^* \cap U(p, \varepsilon) \ne \emptyset) > e^{-n(i(p) + M_1(p)\varepsilon)}$$

eventually.
Let us call a price $p \in R_+^l$ non-expected if the entropy $i(p) > 0$. Under the conditions (2.4)-(2.5) this is equivalent to $p$ not being a zero of the mean excess demand $\mu(p)$:

$$\mu(p) \ne 0.$$

By using the Borel-Cantelli lemma we obtain the following corollary of part (i) of the LD theorem:

Corollary 2.1. Suppose that the hypotheses (H1)-(H3) hold true. Let $p \in R_+^l$ be a non-expected price. Then $\pi_n^* \cap U(p, \varepsilon) = \emptyset$ eventually, w.p.1, for all $0 < \varepsilon < \varepsilon_1(p)$.

3 Proof of the LD theorem

For the proof of the upper bound (i) we need two lemmas. The first is of standard type in LD theory.
We define the following sequence of probability measures:

$$P_{n;p}(d\omega) = e^{\alpha(p) \cdot Z_n(\omega;\, p) - C_n(\alpha(p);\, p)}\,P(d\omega), \quad n = 1, 2, \dots$$

Lemma 3.1. Suppose that hypotheses (H1)-(H2) hold true. Then for each $\delta > 0$ there exists a constant $\eta = \eta(\delta; p) > 0$ such that

$$P_{n;p}(|Z_n(p)| \ge n\delta) \le e^{-n\eta(\delta;\, p)} \ \text{eventually}.$$
Proof of Lemma 3.1. Let $t > 0$ be arbitrary. By Chebyshev's inequality we have, for the $j$'th component of the total excess demand,

$$P_{n;p}(Z_n^j(p) \ge n\delta) \le e^{-tn\delta}\,E_{n;p}\,e^{tZ_n^j(p)},$$

where $e_j$ denotes the $j$'th unit vector in $R^l$. Due to (H1) and (H2),

$$\limsup_{n \to \infty} n^{-1}\log E_{n;p}\,e^{tZ_n^j(p)} = \bar\delta(t)\,t, \ \text{where } \bar\delta(t) \to 0 \ \text{as } t \to 0.$$

By choosing $t$ small enough we thus see that

$$\limsup_{n \to \infty} n^{-1}\log P_{n;p}(Z_n^j(p) \ge n\delta) < 0.$$

By symmetry, we have also

$$\limsup_{n \to \infty} n^{-1}\log P_{n;p}(Z_n^j(p) \le -n\delta) < 0,$$

which completes the proof of Lemma 1.
Lemma 3.2. Suppose that the hypotheses (H1)-(H2) hold true. Then, for all $\delta > 0$, we have:

$$e^{-n(i(p) + 2|\alpha(p)|\delta)} < P(|Z_n(p)| < n\delta) < e^{-n(i(p) - 2|\alpha(p)|\delta)} \ \text{eventually}.$$

Proof of Lemma 3.2. Recalling (2.3), we see that it suffices to prove that

$$\limsup_{n \to \infty} \big|n^{-1}\log P(|Z_n(p)| < n\delta) - c(\alpha(p);\, p)\big| \le |\alpha(p)|\delta \qquad (3.1)$$

Due to Lemma 1,

$$\tfrac{1}{2} < 1 - e^{-n\eta(\delta;\, p)} < P_{n;p}(|Z_n(p)| < n\delta) \le 1 \ \text{eventually},$$

and hence, in view of the definition of the probability measure $P_{n;p}(\cdot)$,

$$P_{n;p}(|Z_n(p)| < n\delta) = \int_{\{|Z_n(p)| < n\delta\}} e^{\alpha(p) \cdot Z_n(p) - C_n(\alpha(p);\, p)}\,dP.$$

Now clearly $|\alpha(p) \cdot Z_n(p)| \le |\alpha(p)|n\delta$ on the event $\{|Z_n(p)| < n\delta\}$, whence

$$-\log 2 - |\alpha(p)|n\delta < \log P(|Z_n(p)| < n\delta) - C_n(\alpha(p);\, p) \le |\alpha(p)|n\delta \ \text{eventually},$$

from which the claim (3.1) follows by letting $n \to \infty$.
Now we are able to prove the upper bound inequality (i). To this end, note first that, due to the hypotheses (2.1), (H3) and the mean value theorem, we can conclude that the event $\pi_n^* \cap U(p, \varepsilon) \ne \emptyset$ implies the event

$$|Z_n(p)| \le AA_1(p)\,n\varepsilon \quad \text{w.p.1, for all } n \ge 1,\ 0 < \varepsilon < \varepsilon_1(p).$$

Thus, in view of Lemma 2,

$$P(\pi_n^* \cap U(p, \varepsilon) \ne \emptyset) \le P(|Z_n(p)| < AA_1(p)\,n\varepsilon) < e^{-n(i(p) - M_0(p)\varepsilon)} \ \text{eventually},$$

where the constant $M_0(p) = 2AA_1(p)|\alpha(p)|$.
For the lower bound we need the following lemma, which is a straightforward corollary of Theorem XIV in [6].

Lemma 3.3. Suppose that $f : R_+^l \to R^l$ satisfies $|f''(q)| \le M$ on an $\varepsilon$-neighborhood of the price $p$.

Using this lemma one obtains

$$P(\pi_n^* \cap U(p, \varepsilon) \ne \emptyset) > e^{-n(i(p) + M_1(p)\varepsilon)} \ \text{eventually},$$

where the constant $M_1(p) < \infty$. This completes the proof of the theorem.
Acknowledgements I would like to thank Professor Krishna B. Athreya for the invitation to take part in this Festschrift in Honor of Professor Rabi Bhattacharya. I am indebted to Professor Mukul Majumdar for useful comments on the text.
Bibliography
[1] Bhattacharya, R. N. and Majumdar, M.: Random exchange economies. J. Economic Theory 6, 37-67 (1973).
[2] Debreu, G.: Theory of Value. Wiley, 1959.
[3] Dembo, A. and Zeitouni, O.: Large Deviations Techniques and Applications. Jones & Bartlett, Boston, 1993.
[4] Föllmer, H.: Random economies with many interacting agents. J. Math. Economics 1, 52-62 (1974).
[5] Hildenbrand, W.: Random preferences and equilibrium analysis. J. Economic Theory 3, 414-429 (1971).
[6] Lang, S.: Real and Functional Analysis. Springer, New York, 1993.
[7] Nummelin, E.: On the existence and convergence of price equilibria for random economies. The Annals of Applied Probability 10, 268-282 (2000).
[8] Nummelin, E.: Large deviations of random vector fields with applications to economics. Advances in Applied Math. 24, 222-259 (2000).
[9] Nummelin, E.: Manuscript, under preparation, 2003.
[10] Varian, H.: Microeconomic Analysis. Norton, New York, 1992.
Asymptotic estimation theory of change-point problems for time series regression models and its applications
Takayuki Shiohama, Masanobu Taniguchi, Osaka University, Japan
and Madan L. Puri, Indiana University, USA

Abstract. It is important to detect structural change in the trend of a time series model. This paper addresses the problem of estimating the change point in the trend of time series regression models with circular ARMA residuals. First we show the asymptotics of the likelihood ratio between contiguous hypotheses. Next we construct the maximum likelihood estimator (MLE) and the Bayes estimator (BE) for the unknown parameters, including the change point. It is then shown that the proposed BE is asymptotically efficient, and that the MLE is not so in general. Numerical studies and applications are also given.

AMS subject classifications: 62M10, 62M15, 62N99.
Keywords: Change point, time series regression, asymptotic efficiency, Bayes estimator, maximum likelihood estimator.
1 Introduction
The change-point problem for serially correlated data has been extensively studied in the literature. References on various time series models with change points can be found in the book of Csorgo and Horvath (1997) and the review paper of Kokoszka and Leipus (2000). Focusing on a change point in the mean of a linear process, Bai (1994) derived the limiting distribution of a consistent change-point estimator obtained by the least squares method. Later, Kokoszka and Leipus (1998) studied the consistency of CUSUM-type estimators of a mean shift for dependent observations; their results include long-memory processes. For a spectral parameter change in a Gaussian stationary process, Picard (1985) addressed the problem of testing and estimation. Giraitis and Leipus (1990, 1992) generalized Picard's results to the case when the process concerned is possibly non-Gaussian. For a structural change in a regression model, a number of authors have studied the testing and estimation of the change point. It is important to detect structural change in economic time series because parameter instability is common in this field. For testing structural changes in regression models with long-memory errors, Hidalgo and Robinson (1996) explored a testing procedure with nonstochastic and stochastic regressors. Asymptotic properties of change-point estimators in linear regression models were obtained by Bai (1997), where the error process may include dependent and heteroskedastic observations. Despite the large body of literature on estimating an unknown change point in time series models, asymptotic efficiency has rarely been discussed. For the case of independent and identically distributed observations, Ritov (1990) obtained an asymptotically efficient estimator of a change point in distribution by a Bayesian approach. The asymptotic efficiency of the Bayes estimator of a change point was also studied by Kutoyants (1994) for diffusion-type processes. Dabye and Kutoyants (2001) showed consistency for the change point in a Poisson process when the model is misspecified. The present paper develops the asymptotic theory of estimating unknown parameters in time series regression models with circular ARMA residuals. The model and the assumptions imposed are explained in Section 2. Section 2 also discusses the fundamental asymptotics of the likelihood ratio process between contiguous hypotheses. Section 3 provides the asymptotics of the maximum likelihood estimator (MLE) and the Bayes estimator (BE) for the unknown parameters, including the change point. It is then shown that the BE is asymptotically efficient, and that the MLE is not so in general. Some numerical examples based on simulations are given in Section 4. Section 5 is devoted to the investigation of some real time series data. All the proofs are collected in Section 6. Throughout this paper we use the following notation: $A'$ denotes the transpose of a vector or matrix $A$, and $\chi(\cdot)$ is the indicator function.
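The CUSUM-type mean-shift estimators mentioned above admit a very short sketch; the statistic and the synthetic series below are illustrative choices of mine, not the exact construction of Kokoszka and Leipus (1998):

```python
import random

def cusum_changepoint(y):
    """Estimate a mean-shift change point by maximising the weighted CUSUM
    statistic |sqrt(k(n-k)/n) * (mean(y[:k]) - mean(y[k:]))| over k."""
    n, total = len(y), sum(y)
    best_k, best_stat, prefix = 1, -1.0, 0.0
    for k in range(1, n):
        prefix += y[k - 1]
        stat = abs((k * (n - k) / n) ** 0.5
                   * (prefix / k - (total - prefix) / (n - k)))
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k

# synthetic series: mean jumps from 0 to 3 after observation 60
rng = random.Random(1)
y = [rng.gauss(0, 1) for _ in range(60)] + [rng.gauss(3, 1) for _ in range(40)]
k_hat = cusum_changepoint(y)   # close to the true change at k = 60
```

For i.i.d. observations this maximiser coincides with the least squares change-point estimator for a mean shift.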
2 Asymptotics of likelihood ratio and some lemmas
Consider the following linear regression model:
$y_t = \{\alpha'\chi(t/n \le \tau) + \beta'\chi(t/n > \tau)\}\, z_t + u_t = r_t(\alpha, \beta, \tau) + u_t$ (say), $t = 1, \dots, n$,  (2.1)
where $z_t = (z_{t1}, \dots, z_{tq})'$ are observable regressors, $\alpha = (\alpha_1, \dots, \alpha_q)'$ and $\beta = (\beta_1, \dots, \beta_q)'$ are unknown parameter vectors, and $\{u_t\}$ is a Gaussian circular ARMA process with spectral density $f(\lambda)$ and $E(u_t) = 0$. Here $\tau$ is an unknown change point satisfying $0 < \tau < 1$ and $(\alpha', \beta', \tau) \in \Theta \subset \mathbb{R}^q \times \mathbb{R}^q \times \mathbb{R}$. Letting
$a_{jk}^n(h) = \sum_{t=1}^{n-h} z_{t+h,j}\, z_{tk}, \quad h = 0, 1, \dots, \qquad a_{jk}^n(h) = \sum_{t=1-h}^{n} z_{t+h,j}\, z_{tk}, \quad h = 0, -1, \dots,$
we will make the following assumptions on the regressors $\{z_t\}$, which are a sort of Grenander's conditions.
Assumption 2.1.
(G.1) $a_{ii}^n(0) = O(n)$, $i = 1, \dots, q$, and $\sum_{t=l}^{l+p} z_{ti}^2 = O(p)$ for any $l$ $(1 \le l \le n)$.
(G.2) $\lim_{n\to\infty} z_{n+1,i}^2 / a_{ii}^n(0) = 0$, $i = 1, \dots, q$.
(G.3) The limit $\lim_{n\to\infty} a_{ij}^n(h)/n = \rho_{ij}(h)$ exists for every $i, j = 1, \dots, q$ and $h = 0, \pm 1, \dots$. Let $R(h) = \{\rho_{ij}(h);\ i, j = 1, \dots, q\}$.
(G.4) $R(0)$ is nonsingular.
From (G.3) there exists a Hermitian matrix function $M(\lambda) = \{M_{ij}(\lambda);\ i, j = 1, \dots, q\}$ with positive semidefinite increments such that
$R(h) = \int_{-\pi}^{\pi} e^{ih\lambda}\, dM(\lambda).$  (2.2)
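As a concrete illustration of model (2.1), the sketch below simulates the mean-shift special case $q = 1$, $z_t \equiv 1$ that reappears with the Nile data in Section 5; using i.i.d. Gaussian errors in place of the circular ARMA residuals is a simplifying assumption of the sketch:

```python
import random

def simulate_model(n, alpha, beta, tau, sigma=1.0, seed=0):
    """y_t = alpha*chi(t/n <= tau) + beta*chi(t/n > tau) + u_t with z_t = 1;
    the u_t are i.i.d. N(0, sigma^2), standing in for circular ARMA residuals."""
    rng = random.Random(seed)
    return [(alpha if t / n <= tau else beta) + rng.gauss(0, sigma)
            for t in range(1, n + 1)]

y = simulate_model(200, alpha=0.0, beta=2.0, tau=0.5)
pre = sum(y[:100]) / 100    # sample mean before the change, estimates alpha
post = sum(y[100:]) / 100   # sample mean after the change, estimates beta
```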
Suppose that the stretch of series $Y_n = (y_1, \dots, y_n)'$ from model (2.1) is available. Denote the covariance matrix of $u_n = (u_1, \dots, u_n)'$ by $\Sigma_n$, and let $r_n = (r_1, \dots, r_n)'$ with $r_t = r_t(\alpha, \beta, \tau)$. Then the likelihood function based on $Y_n$ is given by
Since we assume that $\{u_t\}$ is a circular ARMA process, $\Sigma_n$ has the representation
$\Sigma_n = U_n^*\, \mathrm{diag}\{2\pi f(\lambda_1), \dots, 2\pi f(\lambda_n)\}\, U_n,$
where $U_n = \{n^{-1/2} \exp(2\pi i t s/n);\ t, s = 1, \dots, n\}$ and $\lambda_k = 2\pi k/n$ (see Anderson (1977)). Then the likelihood function (2.3) is rewritten as
Define the local sequence for the parameters:
$\alpha_n = \alpha + n^{-1/2} a, \quad \beta_n = \beta + n^{-1/2} b, \quad \tau_n = \tau + n^{-1} \rho,$  (2.5)
where $a, b \in \mathbb{R}^q$ and $\rho \in \mathbb{R}$. Under the local sequence (2.5) the likelihood ratio process is represented as
(2.6)
where $d_n(\lambda_k) = (2\pi n)^{-1/2} \sum_{t=1}^{n} u_t e^{it\lambda_k}$ and $A(\lambda_k) = A_1 + A_2 + A_3$ with
$A_1 = (2\pi f(\lambda_k))^{-1/2} \sum_{s=[\tau n]+1}^{[\tau n+\rho]} (\beta - \alpha)' z_s e^{-is\lambda_k},$
$A_2 = -(2\pi n f(\lambda_k))^{-1/2} \sum_{s=1}^{[\tau n+\rho]} a' z_s e^{-is\lambda_k},$
$A_3 = -(2\pi n f(\lambda_k))^{-1/2} \sum_{s=[\tau n+\rho]+1}^{n} b' z_s e^{-is\lambda_k}.$
Here note that $d_n(\lambda_k)$, $k = 1, 2, \dots$ are i.i.d. complex normal random variables with mean 0 and variance $f(\lambda_k)$ (cf. Anderson (1977)). Henceforth we write the spectral representation of $u_t$ as
$u_t = \int_{-\pi}^{\pi} e^{it\lambda}\, dZ_u(\lambda).$  (2.7)
The asymptotic distribution of $Z_n(a, b, \rho)$ is given as follows.

Theorem 2.1. Suppose that Assumption 2.1 holds. Then for all $(\alpha', \beta', \tau) \in \Theta$, the log-likelihood ratio has the asymptotic representation
$\log Z_n(a, b, \rho) = (\beta - \alpha)' W_1 + \sqrt{\tau}\, a' W_2 + \sqrt{1-\tau}\, b' W_3 - \frac{1}{8\pi^2} \sum_{j=-\infty}^{\infty} f(j) \sum_{s=[\tau n]+1}^{[\tau n+\rho]} (\beta - \alpha)' z_{s+j} z_s' (\beta - \alpha) + o_p(1) = \log Z(a, b, \rho)$ (say).
Here $W_1$, $W_2$ and $W_3$ are asymptotically normal with mean 0 and covariance matrices $V_1$, $V_2$ and $V_3$, respectively.
Next we present some fundamental lemmas which are useful in the estimation of the change point.
Lemma 2.1. Suppose that Assumption 2.1 holds. Then for any compact set $C \subset \Theta$, we have
$\sup_{(\alpha,\beta,\tau) \in C} E_{\alpha,\beta,\tau}\, Z_n^{1/2}(a, b, \rho) \le \exp\{-g(a, b, \rho)\},$
where
$g(a, b, \rho) = (a', b')\, K \binom{a}{b} + c\,|\rho|$
with some positive definite matrix $K$ and $c > 0$.
Lemma 2.2. Suppose that Assumption 2.1 holds. Then for any compact set $C \subset \Theta$, there exist $\kappa(C) = \kappa$ and $B(C) = B$ such that
$\sup_{(\alpha,\beta,\tau) \in C} E_{\alpha,\beta,\tau} \big| Z_n^{1/4}(a_1, b_1, \rho_1) - Z_n^{1/4}(a_2, b_2, \rho_2) \big|^4 \le \kappa \big( \|a_1 - a_2\|^2 + \|b_1 - b_2\|^2 \big) + B\,|\rho_1 - \rho_2|.$
Table 4.2. Average and RMSE of MLE and BE for τ when τ = 0.5 for Model (II).

                    Mean                                RMSE
            n = 100         n = 300         n = 100         n = 300
   ν       MLE     BE      MLE     BE      MLE     BE      MLE     BE
  π/2     0.5028  0.5017  0.5005  0.5004  0.0250  0.0201  0.0074  0.0065
  π/4     0.4848  0.4849  0.4944  0.4947  0.0584  0.0496  0.0266  0.0211
  π/8     0.4840  0.4969  0.4857  0.4895  0.1361  0.1217  0.0551  0.0418
  π/16    0.5847  0.5710  0.5183  0.5161  0.2283  0.1629  0.0833  0.0697
  π/32    0.5434  0.5381  0.4613  0.4675  0.2141  0.1715  0.1285  0.1021
Next we compute the Bayes estimator. For simplicity of calculation, we postulate the result that the asymptotic distributions of $\hat\alpha_{ML}$ and $\hat\beta_{ML}$ are the same as those of $\hat\alpha_B$ and $\hat\beta_B$ (cf. Kutoyants (1994)). Therefore the Bayes change-point estimator $\hat\tau_B$ becomes
$\hat\tau_B = \frac{\sum_{i=q}^{n-q} \tau_i\, L_n(\hat\alpha_{ML}, \hat\beta_{ML}, \tau_i)}{\sum_{i=q}^{n-q} L_n(\hat\alpha_{ML}, \hat\beta_{ML}, \tau_i)}, \qquad \tau_i = i/n, \quad i = q, \dots, n-q.$
Nile data
These data have been investigated within an i.i.d. framework; for details see, e.g., Cobb (1978) and Hinkley and Schechtman (1987). The data consist of readings of the annual flows of the Nile River at Aswan from 1871 to 1970. There was a shift in the flow levels in 1899, which was attributed partly to weather changes and partly to the start of construction work for a new dam at Aswan. We apply a mean-shift model to these data with $z_t = 1$. The MLE gives $\hat\alpha_{ML} = 1097.75$, $\hat\beta_{ML} = 849.97$ and $\hat\tau = 0.28$ ($k = 28$). On the other hand, the BE is $\hat\tau_B = 0.2790$ ($k = [\hat\tau_B n] = 27$). The original series together with the ML trend estimator is plotted in Figure 5.1. Figure 5.2 shows the posterior distribution of $\tau$, which gives strong evidence that the shift occurred in 1898. These results agree with those of other authors.

U. S. quarterly unemployment rates
This data set ($n = 184$) is analyzed in Tsay (2002) by use of a threshold AR model for the first-differenced series. Here we explain a seasonal trend by employing regression models with trigonometric functions and a change point. The regression function is chosen to be $z_t = (1, \cos(\nu t))'$. Fisher's test for an added deterministic periodic component rejects Gaussian white noise at level .01. We have taken $\nu = 4\pi/184$, which gives the peak in the periodogram. The MLE detected the possible change point $\hat\tau_{ML} = 0.49$ ($k = 90$) and corresponding regression coefficients $\hat\alpha_{ML} = (4.65, -0.85)'$ and $\hat\beta_{ML} = (6.81, -0.94)'$. The BE is $\hat\tau_B = 0.49$, which corresponds to $k = [\hat\tau_B n] = 90$. The estimated trend
Figure 5.1. Nile data with estimated mean and change point k = 28 (MLE).
Figure 5.2. Posterior distribution of τ.
function together with the original data is shown in Figure 5.3. The posterior distribution of $\tau$ is plotted in Figure 5.4. This analysis reveals that the mean level of the unemployment rate increased by about 2% in the 3rd quarter of 1970, while the amplitude of the long-term cyclical trend stayed at the same level throughout the period.
International airline ticket sales data This data have been investigated by fitting a seasonal ARIMA model (Box et. al. (1994) ). An alternative modeling is deterministic cyclical trend function modeling with a change point for once-differentiated data. The regression function given by z~ = (COS(VIt),COS(V2t),COS(V3t)) is selected by examining the periodogram. There are three frequencies which have comparably large spectrum, namely VI = 267r/143, V2 = 507r/143 and V3 = 747r/143. The ML estimators give the &ML = (-7.54,14.14, 1.43)',.6ML = (-35.76,37.01, -19.66)' and TML = 0.6319(k = 91). While Bayes estimator is TB = 0.6216(k = 89). As shown in the posterior probability of T, the change might have occurred from t = 80 to 100, which implies the possibility of multiple changes.
6 Proofs
Proof of Theorem 1. From (2.7), we have
$\log Z_n(a, b, \rho) = -\frac{1}{2\sqrt{n}} \sum_{k=1}^{n} f(\lambda_k)^{-1/2} \big\{ d_n(\lambda_k)\,\overline{A(\lambda_k)} + \overline{d_n(\lambda_k)}\, A(\lambda_k) \big\} - \frac{1}{4n} \sum_{k=1}^{n} |A(\lambda_k)|^2.$  (6.1)
First we evaluate the first term in (6.1). From (2.7) we have
$-\frac{1}{2\sqrt{n}} \sum_{k=1}^{n} f(\lambda_k)^{-1/2} \big\{ d_n(\lambda_k)\,\overline{A(\lambda_k)} + \overline{d_n(\lambda_k)}\, A(\lambda_k) \big\} = -\frac{1}{2\sqrt{n}} \sum_{k=1}^{n} f(\lambda_k)^{-1/2} \big\{ d_n(\lambda_k)\bar A_1 + d_n(\lambda_k)\bar A_2 + d_n(\lambda_k)\bar A_3 + \overline{d_n(\lambda_k)} A_1 + \overline{d_n(\lambda_k)} A_2 + \overline{d_n(\lambda_k)} A_3 \big\} = E_1 + E_2 + E_3 + E_4 + E_5 + E_6$ (say).
Write the spectral density $f(\lambda)$ in the form
where the $R_f(j)$'s satisfy $\sum_{j=-\infty}^{\infty} |j|^m |R_f(j)| < \infty$ for any given $m \in \mathbb{N}$. Then, from Theorem 3.8.3 of Brillinger (1975) we may write
Figure 5.3. U. S. quarterly unemployment rates (1948-1993) with estimated trend and change point k = 90 (MLE).
Figure 5.4. Posterior distribution of τ.
Figure 5.5. The international airline ticket sales, once-differentiated data (dotted line) with estimated trend and change point k = 91 (black line).
"k ( [Tf] ((3 - 0)' Zt eit >..)
j=-oo 00
j=-oo
In
t=[Tn]+l [Tn+p] [Tn+p]
= ~~_1_ ~ r(j) ~ 41r 21r..;n ~
(-
~
t=h+1
From 1 - p ::; t - S ::; [Tn]
+p-
8=1 n
~ ((3 - 0)' ztz~a.!. ~ ei(t-s-j)>"k ~
8=1
1, t - s - j
ITfl a' z,e- i ' ' ' , )
n~ k=l
= 0, it is seen that
Similarly we observe (6.17).
Now we evaluate
$-\frac{1}{2n}\sum_{k=1}^{n} A_2\bar A_3 = -\frac{1}{4\pi}\,\frac{1}{2\pi n}\sum_{j=-\infty}^{\infty} r(j) \sum_{t=1}^{[\tau n+\rho]} \sum_{s=[\tau n+\rho]+1}^{n} a' z_t z_s' b\; \frac{1}{n}\sum_{k=1}^{n} e^{i(t-s-j)\lambda_k}.$
Since $-n+1 \le t - s \le -1$, we have only to evaluate the terms with $t - s - j = 0, -n$.  (6.18)
$-\frac{1}{2n}\sum_{k=1}^{n} A_2\bar A_3 \approx -\frac{1}{4\pi}\,\frac{1}{2\pi}\,\sqrt{\tau(1-\tau)} \sum_{j=-\infty}^{\infty} r(j)\; \frac{1}{\sqrt{\tau n}}\sum_{t=1}^{[\tau n+\rho]} \frac{1}{\sqrt{(1-\tau)n}}\sum_{s=[\tau n+\rho]+1}^{n} a' z_t z_s' b\; \frac{1}{n}\sum_{k=1}^{n} e^{i(t-s-j)\lambda_k}$
$= -\frac{\sqrt{\tau(1-\tau)}}{4\pi}\,\frac{1}{2\pi} \sum_{j=-\infty}^{\infty} r(j)\, a' \int_{-\pi}^{\pi} e^{ij\lambda}\, dM(\lambda)\, b = -\frac{\sqrt{\tau(1-\tau)}}{4\pi}\, a' \int_{-\pi}^{\pi} f(\lambda)^{-1}\, dM(\lambda)\, b.$
Similarly we have
$-\frac{1}{2n}\sum_{k=1}^{n} A_3\bar A_2 \approx -\frac{\sqrt{\tau(1-\tau)}}{4\pi}\, a' \int_{-\pi}^{\pi} f(\lambda)^{-1}\, dM(\lambda)\, b.$  (6.19)
From the equations (6.14) to (6.19) together with (6.4), (6.7), (6.10) and (6.13), the proof of Theorem 1 is complete.
Proof of Lemma 1. From Hannan (1970) and Anderson (1977) the joint density of $d_n(\lambda_1), \dots, d_n(\lambda_n)$ is given by
$p(d_n(\lambda_1), \dots, d_n(\lambda_n)) = c_n \prod_{k=1}^{n} \exp\big(-\overline{d_n(\lambda_k)}\, f(\lambda_k)^{-1}\, d_n(\lambda_k)\big).$  (6.20)
Hence
$E Z_n^{1/2}(a, b, \rho) = E \exp\Big[ -\frac{1}{4\sqrt{n}}\sum_{k=1}^{n} f(\lambda_k)^{-1/2}\big\{d_n(\lambda_k)\overline{A(\lambda_k)} + \overline{d_n(\lambda_k)}A(\lambda_k)\big\} - \frac{1}{8n}\sum_{k=1}^{n} |A(\lambda_k)|^2 \Big]$
$= c_n \int\cdots\int \exp\Big(-\sum_{k=1}^{n} \overline{d_n(\lambda_k)}\, f(\lambda_k)^{-1}\, d_n(\lambda_k)\Big) \exp\Big(-\frac{1}{4\sqrt{n}}\sum_{k=1}^{n} f(\lambda_k)^{-1/2}\big\{d_n(\lambda_k)\overline{A(\lambda_k)} + \overline{d_n(\lambda_k)}A(\lambda_k)\big\}\Big) \exp\Big(-\frac{1}{8n}\sum_{k=1}^{n} |A(\lambda_k)|^2\Big)\, d(d_n(\lambda_1)\cdots d_n(\lambda_n))$
$= \exp\Big(\frac{1}{16n}\sum_{k=1}^{n} |A(\lambda_k)|^2\Big) \exp\Big(-\frac{1}{8n}\sum_{k=1}^{n} |A(\lambda_k)|^2\Big)\, c_n \int\cdots\int \exp\Big(-\sum_{k=1}^{n} \Big| f(\lambda_k)^{-1/2}\, d_n(\lambda_k) + \frac{A(\lambda_k)}{4\sqrt{n}}\Big|^2\Big)\, d(d_n(\lambda_1)\cdots d_n(\lambda_n))$
$= \exp\Big(-\frac{1}{16n}\sum_{k=1}^{n} |A(\lambda_k)|^2\Big).$
Recalling the definition of the likelihood ratio process in (2.6), we have
From the proof of Theorem 1 and Assumption (G.1), the first term in (6.21) is bounded by
$-\frac{1}{16n}\sum_{k=1}^{n} (A_1\bar A_1) \le -\frac{3}{16}\,\frac{1}{8\pi^2} \sum_{t=[\tau n]+1}^{[\tau n+\rho]} \sum_{s=[\tau n]+1}^{[\tau n+\rho]} (\beta-\alpha)' z_t\, r(t-s)\, z_s\, (\beta-\alpha)$  (6.22)
$\le -\frac{3}{16}\,\frac{1}{8\pi^2} \sum_{t=[\tau n]+1}^{[\tau n+\rho]} \{(\beta-\alpha)' z_t\}^2 \times \min_{\lambda} f(\lambda)^{-1} = -O(\rho) \quad \text{for } \rho > 0.$
We have already shown in (6.17) and (6.18) that
$\frac{1}{16n}\sum_{k=1}^{n} \{A_1(\bar A_2 + \bar A_3)\} = O(n^{-1/2}) \quad \text{and} \quad \frac{1}{16n}\sum_{k=1}^{n} \{\bar A_1(A_2 + A_3)\} = O(n^{-1/2}).$  (6.23)
Furthermore, from the proof of Theorem 1 we can find a positive definite matrix $K$ so that (6.24) holds. Hence (6.23)-(6.24) imply the required result.
Proof of Lemma 2. Let $\theta_1' = (\alpha_1', \beta_1', \tau_1)'$ and $\theta_2' = (\alpha_2', \beta_2', \tau_2)'$ be given values in $\Theta$ of the form $\alpha_1 = \alpha + n^{-1/2}a_1$, $\beta_1 = \beta + n^{-1/2}b_1$, $\tau_1 = \tau + n^{-1}\rho_1$, $\alpha_2 = \alpha + n^{-1/2}a_2$, $\beta_2 = \beta + n^{-1/2}b_2$ and $\tau_2 = \tau + n^{-1}\rho_2$. Denoting $A(\lambda_k)$ under $\theta_i$ by $A(a_i, b_i, \rho_i; \lambda_k)$, we set
$\Delta_{1n} = A(a_1, b_1, \rho_1; \lambda_k) - A(a_2, b_2, \rho_2; \lambda_k)$
and
$\Delta_{2n} = |A(a_1, b_1, \rho_1; \lambda_k)|^2 - |A(a_2, b_2, \rho_2; \lambda_k)|^2.$
The process $Y_n$ is written as (6.25). Then we observe
$E_{\alpha,\beta,\tau}\big|Z_n^{1/4}(a_1, b_1, \rho_1) - Z_n^{1/4}(a_2, b_2, \rho_2)\big|^4 = E_{\alpha_1,\beta_1,\tau_1}(1 - Y_n)^4 = E\big(1 - 4Y_n + 6Y_n^2 - 4Y_n^3 + Y_n^4\big).$
We have $E Y_n = \exp(\eta + \gamma)$.
Similarly, we obtain
$6 E Y_n^2 = 6\exp(4\eta + 2\gamma) \quad \text{and} \quad E Y_n^4 = \exp(16\eta + 4\gamma).$
Hence we obtain (6.26). Using the expansion $e^y = 1 + y + O(y^2)$ for small $y$, we have
$E[1 - Y_n]^4 = 1 - 4(1 + \eta + \gamma) + 6(1 + 4\eta + 2\gamma) - 4(1 + 9\eta + 3\gamma) + (1 + 16\eta + 4\gamma) + O(\eta^2) + O(\gamma^2) + O(\eta\gamma) = 0 + O(\eta^2) + O(\gamma^2) + O(\eta\gamma),$
which implies that the Taylor expansion of (6.26) starts with linear combinations of the second-order terms $\eta^2$, $\gamma^2$ and $\eta\gamma$. Here we need to evaluate the asymptotics of $\eta$ and $\gamma$ in (6.26). Assume without loss of generality that $\rho_1 \ge \rho_2$; then
Using a similar argument as in the proof of Lemma 1, we observe
which is written as
Analogously we have
which completes the proof.
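The cancellation of the constant and linear terms in the expansion of $E(1 - Y_n)^4$ can be checked numerically; the function below encodes the moments $E Y_n^k = \exp(k^2\eta + k\gamma)$ read off from the expansion displayed above:

```python
import math

def e_one_minus_y_4(eta: float, gamma: float) -> float:
    """E(1 - Y_n)^4 = 1 - 4 e^{eta+gamma} + 6 e^{4 eta+2 gamma}
                        - 4 e^{9 eta+3 gamma} + e^{16 eta+4 gamma}."""
    return (1 - 4 * math.exp(eta + gamma) + 6 * math.exp(4 * eta + 2 * gamma)
            - 4 * math.exp(9 * eta + 3 * gamma) + math.exp(16 * eta + 4 * gamma))

# constant and linear terms cancel, so the value is O(eta^2 + gamma^2):
for h in (1e-2, 1e-3, 1e-4):
    print(e_one_minus_y_4(h, h) / h)   # ratio tends to 0 as h -> 0
```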
Proof of Theorem 2. The proof follows from Theorem 1, Lemmas 1 and 2 of this paper, and Theorem 1.10.1 of Ibragimov and Has'minski (1981).
Proof of Theorem 3. The properties of the likelihood ratio $Z_n(a, b, \rho)$ established in Theorem 1 and Lemmas 1 and 2 allow us to refer to Theorem 1.10.2 of Ibragimov and Has'minski (1981).
Bibliography
[1] Anderson, T. W. (1977). Estimation for autoregressive moving average models in the time and frequency domains. Ann. Statist. 5 842-865.
[2] Bai, J. (1994). Least squares estimation of a shift in linear processes. J. Time Ser. Anal. 15 453-472.
[3] Bai, J. (1997). Estimation of a change point in multiple regression models. The Review of Economics and Statistics 79 551-563.
[4] Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. (1994). Time Series Analysis: Forecasting and Control, 3rd ed. Prentice Hall, New Jersey.
[5] Brillinger, D. R. (1981). Time Series: Data Analysis and Theory, expanded ed. Holden-Day, San Francisco.
[6] Cobb, G. W. (1978). The problem of the Nile: conditional solution to a change-point problem. Biometrika 65 243-251.
[7] Csorgo, M. and Horvath, L. (1997). Limit Theorems in Change-Point Analysis. Wiley, New York.
[8] Dabye, Ali S. and Kutoyants, Yu. A. (2001). Misspecified change-point estimation problem for a Poisson process. J. Appl. Prob. 38A 701-709.
[9] Giraitis, L. and Leipus, R. (1990). A functional CLT for nonparametric estimates of spectra and change-point problem for spectral function. Lietuvos Matematikos Rinkinys 30 674-697.
[10] Giraitis, L. and Leipus, R. (1992). Testing and estimating in the change-point problem of the spectral function. Lietuvos Matematikos Rinkinys 32 20-38.
[11] Hidalgo, J. and Robinson, P. M. (1996). Testing for structural change in a long-memory environment. J. Econometrics 70 159-174.
[12] Hinkley, D. V. and Schechtman, E. (1987). Conditional bootstrap methods in the mean-shift model. Biometrika 74 85-93.
[13] Hannan, E. J. (1970). Multiple Time Series. Wiley, New York.
[14] Ibragimov, I. A. and Has'minski, R. Z. (1981). Statistical Estimation. Springer-Verlag, New York.
[15] Kokoszka, P. and Leipus, R. (1998). Change-point in the mean of dependent observations. Statist. Probab. Letters 40 385-393.
[16] Kokoszka, P. and Leipus, R. (2000). Detection and estimation of changes in regime. Preprint.
[17] Kutoyants, Yu. A. (1994). Identification of Dynamical Systems with Small Noise. Kluwer Academic Publishers, Dordrecht.
[18] Picard, D. (1985). Testing and estimating change points in time series. Adv. in Appl. Probab. 17 841-867.
[19] Ritov, Y. (1990). Asymptotic efficient estimation of the change point with unknown distributions. Ann. Statist. 18 1829-1839.
[20] Tsay, R. S. (2002). Analysis of Financial Time Series. Wiley, New York.
Fractional Brownian motion as a differentiable generalized Gaussian process
Victoria Zinde-Walsh¹, McGill University & CIREQ
and Peter C. B. Phillips², Cowles Foundation, Yale University; University of Auckland & University of York

Abstract. Brownian motion can be characterized as a generalized random process and, as such, has a generalized derivative whose covariance functional is the delta function. In a similar fashion, fractional Brownian motion can be interpreted as a generalized random process and shown to possess a generalized derivative. The resulting process is a generalized Gaussian process with mean functional zero and covariance functional that can be interpreted as a fractional integral or fractional derivative of the delta function.

Keywords: Brownian motion, fractional Brownian motion, fractional derivative, covariance functional, delta function, generalized derivative, generalized Gaussian process
JEL Classification Number: C32, Time Series Models
1 Introduction
Fractional Brownian motion, like ordinary Brownian motion, has almost everywhere continuous sample paths of unbounded variation, and ordinary derivatives of the process do not exist. Gel'fand and Vilenkin (1964) provided an alternative characterization of Brownian motion as a generalized Gaussian process defined as a random functional on a space of well-behaved functions. Interpreted as a generalized random process, Brownian motion is differentiable. A generalized Gaussian process is uniquely determined by its mean functional and the bivariate covariance functional. Correspondingly, the generalized derivative of a Gaussian process with zero mean functional is a generalized Gaussian process with zero mean functional and covariance functional that can be computed from the covariance functional of the original process. Gel'fand and Vilenkin provide a description of the generalized Gaussian process which represents the derivative of Brownian motion. This process has a covariance functional that can be interpreted in terms of the delta function.

¹ Zinde-Walsh thanks the Fonds Québécois de la recherche sur la société et la culture (FQRSC) and the Social Sciences and Humanities Research Council of Canada (SSHRC) for support of this research.
² Phillips thanks the NSF for support under Grant No. SES 0092509.
The present paper considers fractional Brownian motion from the same perspective as a generalized process and shows how to characterize its generalized derivative. The resulting process is a generalized Gaussian process with mean functional zero and covariance functional that can be interpreted as a fractional integral or fractional derivative of the delta-function. Higher order derivatives can be similarly described.
2 Fractional Brownian motion as a generalized random process
The form of the fractional Brownian motion process considered here was introduced by Mandelbrot and Van Ness (1968); in Marinucci and Robinson (1999) it is called Type I fractional Brownian motion. This form of (standard) fractional Brownian motion for $0 < H < 1$ is represented in integral form as
$B_H(r) = A(H)^{-1}\Big[\int_{-\infty}^{r} (r-s)^{H-\frac12}\, dB(s) - \int_{-\infty}^{0} (-s)^{H-\frac12}\, dB(s)\Big], \quad r \ge 0,$  (2.1)
with
$A(H) = \Big[\frac{1}{2H} + \int_0^{\infty} \big\{(1+s)^{H-\frac12} - s^{H-\frac12}\big\}^2\, ds\Big]^{\frac12},$
and where $B$ is standard Brownian motion and $H$ is the self-similarity index. For $H = \frac12$ the process coincides with Brownian motion. Samorodnitsky and Taqqu (1994, ch. 7.2) give the 'moving average' representation (2.1) as well as an alternative harmonizable representation of the fractional Brownian motion process. Bhattacharya and Waymire (1990) provide some background discussion of the Hurst phenomenon and subsequent theoretical developments that led to the consideration of stochastic processes of this type.
The mean functional of (2.1) is $E B_H(r) = 0$, and the covariance kernel is (Samorodnitsky and Taqqu, 1994)
$V(r_1, r_2) = E B_H(r_1) B_H(r_2) = \tfrac12\big[|r_1|^{2H} + |r_2|^{2H} - |r_2 - r_1|^{2H}\big].$
Note that $B_H(0) = 0$, and for $r_1, r_2 > 0$ the covariance kernel becomes
$V(r_1, r_2) = \tfrac12\big[r_1^{2H} + r_2^{2H} - |r_2 - r_1|^{2H}\big].$  (2.2)
The usual covariance kernel of Brownian motion follows when $H = \frac12$. Following Gel'fand and Vilenkin (1964), define the space $K$ of 'test functions' as follows: $K$ is the space of infinitely continuously differentiable functions
For $H > \frac12$ this is a fractional integral, while for $H < \frac12$ it is a fractional derivative. We examine the two cases separately. In the case of a fractional integral with $\alpha = 2H - 1 > 0$ and $t > 0$ we have
$(I^{\alpha}\delta)(t) = \frac{1}{\Gamma(\alpha)}\int_0^t (t-x)^{\alpha-1}\,\delta(x)\, dx = \frac{t^{\alpha-1}}{\Gamma(\alpha)}.$
Then
$\int_0^{\infty} \phi(t)(I^{\alpha}\psi)(t)\, dt = \int_0^{\infty} \phi(t)\Big[\frac{1}{\Gamma(\alpha)}\int_0^t (t-x)^{\alpha-1}\psi(x)\, dx\Big] dt$
$= \int_0^{\infty} \phi(t)\Big[\int_0^t (I^{\alpha}\delta)(t-x)\,\psi(x)\, dx\Big] dt$
$= \int_0^{\infty} \phi(t)\Big[\int_0^t (I^{\alpha}\delta)(w)\,\psi(t-w)\, dw\Big] dt$
$= \int_0^{\infty}\int_0^{\infty} (I^{\alpha}\delta)(t-s)\,\phi(t)\,\psi(s)\, ds\, dt,$  (3.4)
and similarly
$\int_0^{\infty} \psi(t)(I^{\alpha}\phi)(t)\, dt = \int_0^{\infty}\int_0^{\infty} (I^{\alpha}\delta)(t-s)\,\psi(t)\,\phi(s)\, ds\, dt,$
so that
$V_k[\phi, \psi] = \frac{\Gamma(2H+1)}{2}\Big\{\int_0^{\infty} \phi(t)(I^{2H-1}\psi)(t)\, dt + \int_0^{\infty} \psi(t)(I^{2H-1}\phi)(t)\, dt\Big\}$
$= \frac{\Gamma(2H+1)}{2}\int_0^{\infty}\int_0^{\infty} (I^{2H-1}\delta)(t-s)\big[\phi(t)\psi(s) + \psi(t)\phi(s)\big]\, dt\, ds,$
giving the result (3.3).
giving the result (3.3). In the case of a fractional derivative with a = 2H - 1 < 0 (0 < H < ~) we write J2H -1 I = J2H I' and then
1= 1= [rta) 1 1= [1 1= [1 1= [1 1= 1= ¢(t) (Ia'ljJ') (t)dt
=
=
with a similar result for
r(2~ + 1)
= r(2~ + 1) r(2H
t
¢(t)
t
¢(t)
=
=
t
¢(t)
=
Vk[¢, 'ljJ] =
t
¢(t)
=
+ 1)
(t-x)a-1'ljJ'(x)dX] dt
(rb)(t - x)'ljJ'(X)dX] dt (rb)(W)'ljJ'(t-W)dW] dt (Ia- 1b) (w)'ljJ(t - W)dW] dt
(JU- 1b) (t - s)¢(t)'ljJ(s)dsdt,
Jo= ¢(t)(Ja¢')(t)dt. It follows that
{1= {1= 1= 1=
1= + 1=
¢(t) (I2H-1'ljJ) (t)dt ¢(t) (I2H'ljJ')(t)dt
+
'ljJ(t)(I2H-1¢) (t)dt }
'ljJ(t)(I2H ¢')(t)dt}
(J 2H - 1b) (t - s)¢(t)'ljJ(s)dsdt,
as required for (3.3). Clearly, one can proceed with further differentiation of the fractional process. Subsequent m-th order derivatives will provide gen,eralized Gaussian processes with mean functional zero and covariance functional expressed in terms of the generalized function (J 2H - m b) (t - s). Victoria Zinde Walsh Department of Economics McGill University & CIREQ
Peter C.B. Phillips Cowles Foundation, Yale University University of Auckland & University of York
Bibliography
[1] Bhattacharya, R. N. and E. C. Waymire (1990). Stochastic Processes with Applications. New York: John Wiley.
[2] Gel'fand, I. M. and G. E. Shilov (1964). Generalized Functions, Vol. 4. New York: Academic Press.
[3] Gel'fand, I. M. and N. Ya. Vilenkin (1964). Generalized Functions, Vol. 1. New York: Academic Press.
[4] Mandelbrot, B. B. and J. W. Van Ness (1968). "Fractional Brownian motions, fractional noises and applications". SIAM Review, 10, 422-437.
[5] Marinucci, D. and P. M. Robinson (1999). "Alternative forms of fractional Brownian motion". Journal of Statistical Planning and Inference, 80, 111-122.
[6] Samorodnitsky, G. and M. S. Taqqu (1994). Stable Non-Gaussian Random Processes. London: Chapman & Hall.