Mass Transportation Problems: Volume I: Theory (Probability and its Applications)


Authors: Svetlozar T. Rachev, Ludger Rüschendorf


To my wife, Zoja, and to my parents Nadezda and Todor Rachevi.

To my wife, Gabi.

Zvetlozar (Zari) Rachev

Ludger Rüschendorf


Preface to Volume I

The subject of this book, mass transportation problems (MTPs), concerns the optimal transfer of masses from one location to another, where the optimality depends upon the context of the problem. Mass transportation problems appear in various forms and in various areas of mathematics and have been formulated at different levels of generality. Whereas the continuous case of the transportation problem may be cast in measure-theoretic terms, the discrete case deals with optimization over generalized transportation polyhedra. Accordingly, work on these problems has developed in several separate and independent directions.

The aim of this monograph is to investigate and to develop, in a systematic fashion, the Monge–Kantorovich mass transportation problem (MKP) and the Kantorovich–Rubinstein transshipment problem (KRP). We consider several modifications of these problems known as the MTP with partial knowledge of the marginals and the MTP with additional constraints (MTPA). We also discuss extensively a variety of stochastic applications. In the first volume of Mass Transportation Problems we concentrate on the general mathematical theory of mass transportation. In Volume II we expand the scope of applications of mass transportation problems.

In 1781 Gaspard Monge proposed in simple prose a seemingly straightforward problem of optimization. It was destined to have wide ramifications. He began his paper on the theory of “clearings and fillings” as follows:

When one must transport soil from one location to another, the custom is to give the name clearing to the volume of the soil that one must transport and the name filling (“remblai”) to the space that it must occupy after transfer. Since the cost of transportation of one molecule is, all other things being equal, proportional to its weight and the interval that it must travel, and consequently the total cost of transportation being proportional to the sum of the products of the molecules each multiplied by the interval traversed; given the shape and position, the clearing and the filling, it is not the same for one molecule of the clearing to be moved to one or another spot of the filling. Rather, there is a certain distribution to be made of the molecules from the clearing to the filling, by which the sum of the products of molecules by intervals travelled will be the least possible, and the cost of the total transportation will be a minimum. (Monge (1781, p. 666)).

In mathematical language Monge proposed the following nonlinear variational problem. Given two sets $A$, $B$ of equal volume, find an optimal volume-preserving map between them; the optimality is evaluated by a cost function $c(x, y)$ representing the cost per unit mass for transporting material from $x \in A$ to $y \in B$. The optimal map is the one that minimizes the total cost of transferring the mass from $A$ to $B$. Monge considered this problem with cost function equal to the Euclidean distance in $\mathbb{R}^d$: $c(x, y) = |x - y|$. Monge’s problem turned out to be the prototype for a class of problems arising in various fields such as mathematical economics, functional analysis, probability and statistics, linear and stochastic programming, differential geometry, information theory, cybernetics, and matrix theory.

The optimization function $\int_A c(x, t(x))\,dx$ is nonlinear in the transportation function $t$, and moreover, the set of admissible transportations is a nonconvex set. This explains why it took a long time until even existence results for optimal solutions could be established. The first general existence result was given in 1979 by Sudakov.

On the second page of his paper Monge himself had remarked that to obtain a minimum, the intervals traversed by two different molecules should not intersect. This simple observation applied to the discrete case, where there are only a finite number of molecules, leads to a “greedy” algorithm, the so-called northwest corner rule. The totality of mass transference plans in the discrete case is a polytope that arises in the transportation problem of mathematical programming, where it is treated in specialized form as an assignment problem and in generalized form as a network-flow problem. The northwest corner rule solves transportation problems having a particular structure on the costs and is, moreover, at the heart of many seemingly different problems having an “easy” solution (cf. Hoffman (1961), Barnes and Hoffman (1985), Derigs, Goecke, and Schrader (1986), Hoffman and Veinott (1990), Olkin and Rachev (1991), and Rachev and Rüschendorf (1994); see also Burkard, Klinz, and Rudolf (1994) and the references therein).

The Academy of Paris offered a prize for the solution of Monge’s problem, which was claimed by the differential geometer P. Appell (1884–1928), who established some geometric properties of optimal maps in the plane and in $\mathbb{R}^3$. But it took a long time until a real breakthrough in the transportation problem came, originating in the seminal 1942 paper of L.V. Kantorovich entitled “On the transfer of masses.” Kantorovich stated the problem in a new, abstract, and more easily accessible setting and without knowledge of Monge’s work. Kantorovich learned of Monge’s work only later (cf. his 1948 paper).

In the Kantorovich formulation of the mass transportation problem (the so-called “continuous” MTP), the initial mass (the clearing) and the final mass (the filling) can be considered as probability measures on a metric space. The essential step in this formulation is the replacement of the class of transportation maps by the wider class of generalized transportation plans, which are identifiable with the convex set of all probability measures on the product space with fixed marginals. The difficult nonlinear Monge problem was thereby replaced by a linear optimization problem over an abstract convex set. This made it possible to put this problem in the framework of linear optimization theory and encouraged the development of general duality theory for the solution of the Kantorovich formulation of the transportation problem as the basic tool. Accordingly, these problems and their generalizations will be referred to as Monge–Kantorovich Mass Transportation Problems (MKPs).

Kantorovich’s measure-theoretic formulation made the problem accessible to various areas of the mathematical sciences and other scientific fields. Kantorovich himself received a Nobel Prize in Economics for related work in mathematical economics.(1) Here is a list of some references in the mathematical sciences:

• Functional analysis: Kantorovich and Akilov (1984)
• Probability theory: Fréchet (1951), Cambanis et al. (1976), Dudley (1976, 1989), Kellerer (1984), Rachev (1991c), Rüschendorf (1991)
• Statistics: Gini (1914, 1965), Hoeffding (1940, 1955), Kemperman (1987), Huber (1981), Bickel and Freedman (1981), Rüschendorf (1991)
• Linear and stochastic programming: Hoffman (1961), Barnes and Hoffman (1985), Anderson and Nash (1987), Burkard, Klinz and Rudolf (1994)
• Information theory and cybernetics: Wasserstein (1969), Gray et al. (1975), Gray and Ornstein (1979), Gray et al. (1980)
• Matrix theory: Lorentz (1953), Marcus (1960), Olkin and Pukelsheim (1982), Givens and Shortt (1984)

(1) L.V. Kantorovich together with T.C. Koopmans received the Nobel Memorial Prize in Economic Science in 1975 for “contributions to the theory of optimum allocation of resources”; see Dudley (1989, p. 342).
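As a compact restatement of the two formulations discussed above, the following display contrasts Monge's problem with Kantorovich's relaxation. The symbols $S$ (the underlying space), $P_1$, $P_2$ (initial and final mass), and $\mathcal{M}(P_1, P_2)$ (the set of transportation plans) are notation assumed here for illustration; the preface itself fixes only $A$, $B$, $c$, and $t$.

```latex
% Monge: minimize over volume-preserving maps t : A -> B
\[
  \inf_{t} \int_A c\bigl(x, t(x)\bigr)\,dx .
\]
% Kantorovich: minimize over plans P on S x S with marginals P_1 and P_2
% (S, P_1, P_2, and \mathcal{M} are assumed notation)
\[
  \inf_{P \in \mathcal{M}(P_1, P_2)} \int_{S \times S} c(x, y)\,P(dx, dy),
  \qquad
  \mathcal{M}(P_1, P_2)
    = \bigl\{ P : P(\,\cdot \times S) = P_1,\ P(S \times \cdot\,) = P_2 \bigr\} .
\]
```

The second objective is linear in $P$ and the constraint set is convex, which is the structural gain the text attributes to Kantorovich's formulation. A transportation map $t$ corresponds to the plan concentrated on the graph of $t$, so the Kantorovich problem contains the Monge problem as a special case.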


Many practical problems arising in various scientific fields have led mathematicians to solve MKPs, e.g., in

• Statistical physics: Tanaka (1978), Dobrushin (1979)
• Reliability theory: Barlow and Proschan (1975), Kalashnikov and Rachev (1990), Beneš (1985)
• Quality control: Jirina and Nedoma (1957)
• Transportation: Dantzig and Ferguson (1956)
• Econometrics: Shapley and Shubik (1972), Pyatt and Round (1985), Gretsky, Ostroy, and Zame (1992)
• Expert systems: Perez and Jirousek (1985)
• Project planning: Haneveld (1985)
• Optimal models for facility location: Ermoljev, Gaivoronski, and Nedeva (1983)
• Allocation policy: Rachev and Taksar (1992)
• Quality usage: Rachev, Dimitrov and Khalil (1992)
• Queueing theory: Rachev (1989), Anastassiou and Rachev (1992a, 1992b)

There are several surveys in the vast literature about MKP, among them Rachev (1984b), Rachev and Rüschendorf (1990), Burkard, Klinz, and Rudolf (1994), Cuesta-Albertos, Matrán, Rachev, and Rüschendorf (1996), and Gangbo and McCann (1996) related to dual solutions and applications of MKP; Shorack and Wellner (1985, Sect. 3.6) on optimal processes; Benes and Stepan (1987, 1991) on extremal mass transportation plans; Rüschendorf (1981, 1991, 1991a), Kellerer (1984), Rachev (1991c) on multivariate transportation problems; Dudley (1989) on distances in the space of measures; Talagrand (1992) and Yukich (1991) on matching problems.

In recent years, characterizations of the solutions of the Monge–Kantorovich problem have been given in terms of $c$-subgradients of generalized convex functions defined in terms of the cost functions $c(x, y)$ (cf. Knott and Smith (1984, 1992), Brenier (1987), Rüschendorf and Rachev (1990), Rüschendorf (1991, 1991a, 1995), Cuesta-Albertos, Matrán, Rachev, and Rüschendorf (1996), and Gangbo and McCann (1996)).
For the case of squared Euclidean costs $c(x, y) = |x - y|^2$, the generalized convexity property is equivalent to convexity, and $c$-subgradients are identical to the usual subgradients of convex analysis. From this characterization a series of explicit solutions of the transportation problem could be established. It also implies that the solutions of the MKP are, under continuity assumptions, given by mappings. Therefore, the solutions of the “easier” MKP imply as well the existence and characterizations of solutions of the original Monge problem, and so the MKP turns out to be the fundamental formulation of the transportation problem. For this reason, we concentrate in this book on the Kantorovich-type mass transportation problems. For a discussion of interesting analytic aspects of the Monge problem, we refer to Gangbo and McCann (1996).

Another type of MTP appears in probability theory, even if it leaves the framework of probability measures as transportation plans. Its solutions are bounded measures on a product of two spaces with the difference of marginals equal to the difference of two given probability measures. It will be called the Kantorovich–Rubinstein Problem (KRP), since the first results were obtained by Kantorovich and Rubinstein (1958). In its relation to the practical task of mass transportation it is sometimes referred to as the transshipment problem; see Kemperman (1983), and Rachev and Shortt (1990). The KRP has been developed to a great extent in the Russian school of probabilists and functional analysts, in particular by V.L. Levin, A.A. Milyutin, and A.M. Vershik and their students. For metric cost functions the KRP coincides with the corresponding MKP; for general cost functions it can be reduced to the MKP for a corresponding reduced cost function. For the duality theory of the KRP a specific detailed theory with many results that are of value in themselves has been developed with wide-ranging applications to mathematical economics. For a different approach to the KRP as introduced in Dudley (1976) and as further extended in Rachev and Shortt (1990) we refer to the book of Rachev (1991c).
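The verbal description of the KRP above may be written roughly as follows. The notation is assumed here for illustration and is not fixed by the preface: $S$ is the space, $P_1$, $P_2$ are the two given probability measures, and $Q$ ranges over bounded nonnegative measures on $S \times S$. This is a sketch of the constraint only, not the precise setting of the later chapters.

```latex
% Kantorovich–Rubinstein (transshipment) problem, in assumed notation:
% only the DIFFERENCE of the two marginals of Q is prescribed.
\[
  \inf_{Q} \int_{S \times S} c(x, y)\,Q(dx, dy)
  \quad \text{subject to} \quad
  Q(E \times S) - Q(S \times E) = P_1(E) - P_2(E)
  \ \text{ for all Borel sets } E \subseteq S .
\]
```

Every plan with marginals $P_1$ and $P_2$ satisfies this constraint, so the KRP value can never exceed the MKP value for the same cost. The statement in the text for metric cost functions says that in that case the two values agree.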
A problem related to both MKP and KRP is the Mass Transportation Problem with Partial Knowledge of the Marginals (MTPP), which is expressed by stating finitely many moment conditions. Problems of this type were formulated and extensively studied by Rogosinski (1958), Kemperman (1983), and Kuznezova-Sholpo and Rachev (1989). Barnes and Hoffman (1985) considered mass transportation problems with capacity constraints on the admissible transportation plans as an example of Mass Transportation Problems with Additional Constraints (MTPA) (see Rachev (1991b) and Rachev and Rüschendorf (1994)).

In this book we give an extensive account of the duality theory of the MKP and the KRP, including the known results on explicit constructions and characterizations of optimal solutions. In Chapters 2 and 3 we present important duality theorems for the Monge–Kantorovich problem based on work of H. Kellerer, L. Rüschendorf, S.T. Rachev, and D. Ramachandran.


In Chapters 4 and 5 we present basically work of V.L. Levin; we analyze measure-theoretic methods for infinite-dimensional linear programs developed in connection with the KRP as well as applications to general utility theorems (the Debreu theorem), extension theorems, choice theory, and set-valued dynamical systems.(2)

In Chapters 6 and 8 we discuss new material on applications of the MKP and the KRP to the representation of ideal metrics and on various probabilistic approximation and limit theorems. This supplements the earlier results in this direction as described in the book of Rachev (1991) on probability metrics and stochastic models. In particular, we show that probability metrics allow us to find unified proofs for central limit theorems for martingales, for (operator) stable limit theorems, and for more specific problems such as compound Poisson approximation or rounding problems.

Chapter 7, the first chapter in the second volume, is concerned with modifications of the MKP by additional or relaxed constraints. We discuss various types of moment problems and applications to the tomography paradox and to the approximation of queueing systems.

A wide range of applications of metrics based on the transportation problem has been established in recent years in connection with recursive stochastic equations. In Chapters 9 and 10 we discuss algorithms of informatics (sorting, searching, branching, search trees) as well as applications to the approximation of stochastic differential equations, to the propagation of the chaos property of particle systems with applications to the approximation of nonlinear PDEs, and to the rate of convergence of empirical measures, which is of interest for matching problems.

From the technical point of view, MKPs can be subdivided into the discrete and continuous cases, according to the nature of their basic spaces and to the supports of the initial and the final masses.
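For the discrete case, the northwest corner rule mentioned earlier in this preface can be sketched in a few lines. The Python code below is an illustration only: the function names `northwest_corner` and `total_cost` and the example data are assumptions made here, not taken from the book. The rule fills the plan from the top-left cell, shipping as much as possible and then moving down or right. It always produces a feasible plan, but it is optimal only for costs with the particular structure referred to in the text.

```python
from typing import List, Sequence


def northwest_corner(supply: Sequence[float], demand: Sequence[float]) -> List[List[float]]:
    """Build a feasible transportation plan by the northwest corner rule.

    plan[i][j] is the mass shipped from source i to destination j.
    Hypothetical helper for illustration; requires balanced totals.
    """
    if abs(sum(supply) - sum(demand)) > 1e-9:
        raise ValueError("total supply must equal total demand")
    s = list(supply)  # remaining supply at each source
    d = list(demand)  # remaining demand at each destination
    plan = [[0.0] * len(d) for _ in s]
    i = j = 0
    while i < len(s) and j < len(d):
        shipped = min(s[i], d[j])  # ship as much as both sides allow
        plan[i][j] = shipped
        s[i] -= shipped
        d[j] -= shipped
        if s[i] <= 1e-12:
            i += 1  # source i is exhausted: move down
        else:
            j += 1  # destination j is satisfied: move right
    return plan


def total_cost(plan: Sequence[Sequence[float]], cost: Sequence[Sequence[float]]) -> float:
    """Sum of cost[i][j] * plan[i][j] over all cells."""
    return sum(p * c for prow, crow in zip(plan, cost) for p, c in zip(prow, crow))


if __name__ == "__main__":
    # Assumed example: sources at points 0, 1 and destinations at 0, 1, 2
    # on the line, with cost |x - y| between the points.
    supply, demand = [3, 2], [1, 2, 2]
    cost = [[abs(x - y) for y in (0, 1, 2)] for x in (0, 1)]
    plan = northwest_corner(supply, demand)
    print(plan, total_cost(plan, cost))
```

In this assumed example both sets of points are sorted along the line and the cost is the distance between them, which is one instance where the greedy plan is also a cheapest one; for an arbitrary cost matrix the rule gives only a feasible starting plan.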
In the discrete case, the totality of the mass transference plans is the polytope that arises in the transportation problem of mathematical programming. There is, of course, a vast literature on the transportation problem, its specialization to the assignment problem, and its generalization to network flow problems. It turns out, as will be elaborated further in the book, that the northwest corner rule in the discrete case corresponds to a closed form for the solution in the continuous case. Indeed, the discrete analogue of a result known in the continuous case provides a new result in the discrete case; and its simple proof in the discrete case provides a new proof for the continuous case, see Rachev and Rüschendorf (1994c) and the references therein. Another approach in the discrete linear case prefers to exploit the special structure of supplies and demands (or clearings and fillings) and permits a particularly simple combinatorial algorithm for finding an optimal solution as developed by Balinski (1983), Balinski and Russakoff (1984), Balinski (1985, 1986), Goldfarb (1985), Kleinschmidt, Lee, and Schannath (1987), and Burkard, Klinz, and Rudolf (1994).

(2) These two chapters were written following closely the notes kindly provided to us by V.L. Levin.

MTPs may be viewed as an analogue and a unifying framework of a problem considered by probabilists at the beginning of the twentieth century: How does one measure the difference between two random quantities? Many specific contributions to the analysis of this problem have been made, including Gini’s (1914) notion of concordance, Kendall’s τ, Spearman’s ρ, the analysis of greatest possible differences by Hoeffding (1940) and others, by Fréchet (1951, 1957), Robbins (1975), and Lai and Robbins (1976), and the generalizations of these results by Cambanis, Simons, and Stout (1976), Rüschendorf (1980), Tchen (1980), and Cambanis and Simons (1982). These (and others) offer piecemeal answers to basic questions that arise from different stochastic models; they give no guidance as to the question of what concept should be used where: There is no general theory underlying the diverse approaches. We refer to Kruskal (1958), Gini (1965), and Rachev (1984b, 1991c).

In this book we investigate, develop, and exploit the connections between the discrete and continuous versions of the mass transportation problems (MTPs) as well as study systematically the relationships between the methods and results from different versions of the MTP. The MTPs are the basis of many problems related to the question of stability of stochastic models, to the question of whether a proposed model yields a satisfactory approximation to the phenomenon under consideration, and to the problem of approximation of stochastic and deterministic algorithms. It is our belief that MTPs hold great promise in stochastic analysis as well as in mathematical analysis.
The MTP is full of connections with geometry, (partial) differential equations, (generalized) convex analysis, moment problems, infinite-dimensional linear programming, measurable choice theory, and extension problems, and it has many open problems. It has a great potential for a series of applications in several scientific fields.

This book grew out of joint work and lectures delivered by the authors at the Steklov Mathematical Institute, Universität Münster, Universität Freiburg, the Ecole Polytechnique, SUNY at Stony Brook, and the University of California, Santa Barbara, over many years. Many colleagues provided helpful suggestions after reading parts of the manuscript. All chapters were rewritten several times, and preliminary versions were circulated among friends, who eliminated many inaccuracies and obscurities. We would like to thank H.G. Kellerer, V.L. Levin, M. Balinski, D. Ramachandran, G.A. Anastassiou, M. Maejima, M. Cramer, I. Olkin, M. Gelbrich, W. Römisch, V. Beneš, L. Uckelmann, and many other friends and colleagues who encouraged us to complete the work. We are indebted to Mrs. M. Hattenbach and Ms. A. Blessing for their superb typing; the appearance of this monograph owes much to them. We are grateful to the publisher and especially to J. Kimmel for support and patience. We are particularly thankful to J. Gani for his invaluable suggestions concerning improvements of this work, his help with the organization of the material, and his encouragement to continue the project. Finally, we thank the Alexander von Humboldt Foundation for its generous financial support of S.T. Rachev in 1995 and 1996, which made this joint work possible.(3)

(3) The work of S.T. Rachev was also partially supported by NSF Grants. The joint work of the authors was supported by NATO-Grant CRG900798.

Preface to Volume II

The second volume of the Mass Transportation Problems is devoted to applications in a variety of fields of applied probability, queueing theory, mathematical economics, risk theory, tomography, and others. In Volume I we encompassed the general mathematical theory of mass transportation, concentrating our attention on:

• the general duality theory of the transportation and transshipment problem;
• explicit optimality results;
• applications to minimal probability metrics, stochastic ordering, approximation and extension problems;
• applications to functional analysis and mathematical economics (the Debreu theorem, utility theory, dynamical systems, choice theory, and convex and nonconvex analysis were discussed in this context).

In Volume II we expand the scope of applications of mass transportation problems. Some of them arise from modifications of the admissible transportation plans. In fact, for applications to mathematical economics it is of interest to consider relaxations of the marginal constraints, such as upper or lower bounds on the supply and demand distributions, or additional constraints like capacity bounds for the transportation plans. In mathematical tomography the basic problem is to reconstruct the multivariate probability distribution based on some information about the marginal distributions in a certain finite number of directions. This information may be represented by additional constraints on the support functions or distributional moments, or it may be contained in only partial information on the marginals. Thus there is a close relationship between a class of problems in mathematical tomography and the classical theory on moment problems, which again can be viewed as a relaxation on the set of constraints in mass transportation problems.

We discuss in detail applications to approximation problems for stochastic processes and to rounding problems based on moment-type characteristics. A particular example will be the approximation of queueing models. The minimal metrics allow us to compare various rounding rules and to determine optimal ones from an asymptotic point of view.

An important field of applications of mass transportation problems we shall consider in this second volume is to probabilistic limit theorems. This approach was introduced in the seventies by the Russian school of probability theory, headed by V.M. Zolotarev. By inherent regularity properties of probability metrics defined via certain mass transportation problems, there are streamlined proofs for central limit theorems on Banach spaces yielding sharp quantitative estimates of Berry–Esseen type for the convergence rate. The probability metric approach will be applied to general stable and operator stable limit theorems, martingale-type limit theorems, limit behavior of summability methods, and compound Poisson approximation. A particular application is to the classical problem in mathematical risk theory dealing with sharp approximation of the individual risk model by the collective risk model. The probability metric approach will also be applied to the quantitative asymptotics in rounding problems.
A new field of application of probability metrics arising as solutions of mass transportation problems is the analysis of deterministic and stochastic algorithms. This research area is of increasing importance in computer science and various fields of stochastic modeling. Based on regularity properties of probability metrics, a general “contraction” method for the asymptotic analysis of algorithms has been developed. The contraction method has been applied successfully to a variety of search, sorting, and other tree algorithms. Furthermore, the recursive structure in iterated function systems (image encoding), fractal measures, bootstrap statistics, and time series (ARCH) models has been analyzed by this method. It becomes clear that there are many interesting probabilistic applications of this method to be rigorously developed in the future.

In the final chapter we consider applications to stochastic differential equations (SDEs) and to convergence of empirical measures. SDEs will be interpreted as continuous recursive structures. From this point of view we provide a detailed discussion on the approximative solution of nonlinear stochastic differential equations of McKean–Vlasov type by interactive particle systems with application to the Kac theory of chaos propagation. The probability metrics approach allows us to establish approximation results for various modifications of the diffusion system, some of them of “nontraditional” type. In a general context we establish approximation results for empirical measures and give applications to the approximation of stochastic processes. As final applications we discuss a weak approximation of SDEs of Itô type by a combination of the time discretization methods of Euler and Milshtein with a chance discretization based on the strong invariance (embedding) principle. This approximation is given in terms of minimal $L_p$-metrics and thereby based on regularity properties of the solutions of the corresponding mass transportation problem.


Contents to Volume I

Preface to Volume I
Preface to Volume II

1 Introduction
  1.1 Mass Transportation Problems in Probability Theory
  1.2 Specially Structured Transportation Problems
  1.3 Two Examples of the Interplay Between Continuous and Discrete MTPs
  1.4 Stochastic Applications

2 The Monge–Kantorovich Problem
  2.1 The Multivariate Monge–Kantorovich Problem: An Introduction
  2.2 Primal and Dual Monge–Kantorovich Functionals
  2.3 Duality Theorems in a Topological Setting
  2.4 General Duality Theorem
  2.5 Duality Theorems with Metric Cost Functions
  2.6 Dual Representation for $L_p$-Minimal Metrics


3 Explicit Results for the Monge–Kantorovich Problem
  3.1 The One-Dimensional Case
  3.2 The Convex Case
  3.3 The General Case
  3.4 An Extension of the Kantorovich $L_2$-Minimal Problem
  3.5 Maximum Probability of Sets, Maximum of Sums, and Stochastic Order
  3.6 Hoeffding–Fréchet Bounds
  3.7 Bounds for the Total Transportation Cost

4 Duality Theory for Mass Transfer Problems
  4.1 Duality in the Compact Case
  4.2 Cost Functions with Triangle Inequality
  4.3 Reduction Theorems
  4.4 Proofs of the Main Duality Theorems and a Discussion
  4.5 Duality Theorems for Noncompact Spaces
  4.6 Infinite Linear Programs
    4.6.1 Duality Theory for an Abstract Scheme of Infinite-Dimensional Linear Programs and Its Application to the Mass Transfer Problem
    4.6.2 Duality Theorems for the Mass Transfer Problem with Given Marginals
    4.6.3 Duality Theorem for a Marginal Problem with Additional Constraints of Moment-Type
    4.6.4 Duality Theorem for a Further Extremal Marginal Problem
    4.6.5 Duality Theorem for a Nontopological Version of the Mass Transfer Problem

5 Applications of the Duality Theory
  5.1 Mass Transfer Problem with a Smooth Cost Function: Explicit Solution
  5.2 Extension and Approximate Extension Theorems
    5.2.1 The Simplest Extension Theorem: the Case $X = E(S)$ and $X_1 = E(S_1)$
    5.2.2 Approximate Extension Theorems
    5.2.3 Extension Theorems
    5.2.4 A Continuous Selection Theorem
  5.3 Approximation Theorems
  5.4 An Application of the Duality Theory to the Strassen Theorem
  5.5 Closed Preorders and Continuous Utility Functions
    5.5.1 Statement of the Problem and the Idea of the Duality Approach
    5.5.2 Functionally Closed Preorders
    5.5.3 Two Generalizations of the Debreu Theorem
    5.5.4 The Case of a Locally Compact Space
    5.5.5 Varying Preorders and a Universal Utility Theorem
    5.5.6 Functionally Closed Preorders and Strong Stochastic Dominance
  5.6 Further Applications to Utility Theory
    5.6.1 Preferences That Admit Lipschitz or Continuous Utility Functions
    5.6.2 Application to Choice Theory in Mathematical Economics
  5.7 Applications to Set-Valued Dynamical Systems
    5.7.1 Compact-Valued Dynamical Systems: Quasiperiodic Points
    5.7.2 Compact-Valued Dynamical Systems: Asymptotic Behavior of Trajectories
    5.7.3 A Dynamic Optimization Problem
  5.8 Compensatory Transfers and Action Profiles

6 Mass Transshipment Problems and Ideal Metrics
  6.1 Kantorovich–Rubinstein Problems with Constraints
  6.2 Constraints on the κth Difference of Marginals
  6.3 The General Case
  6.4 Minimality of Ideal Metrics

References
Abbreviations
Symbols
Index


Contents to Volume II

Preface to Volume II
Preface to Volume I

7 Relaxed or Additional Constraints
  7.1 Mass Transportation Problem with Relaxed Marginal Constraints
  7.2 Fixed Sum of the Marginals
  7.3 Mass Transportation Problems with Capacity Constraints
  7.4 Local Bounds for the Transportation Plans
  7.5 Closeness of Measure on a Finite Number of Directions
  7.6 Moment Problems of Stochastic Processes and Rounding Problems
    7.6.1 Moment Problems and Kantorovich Radius
    7.6.2 Moment Problems Related to Rounding Proportions
    7.6.3 Closeness of Random Processes with Fixed Moment Characteristics
    7.6.4 Approximation of Queueing Systems with Prescribed Moments
    7.6.5 Rounding Random Numbers with Fixed Moments


8 Probabilistic-Type Limit Theorems
  8.1 Rate of Convergence in the CLT with Respect to Kantorovich Metric
  8.2 Application to Stable Limit Theorems
  8.3 Summability Methods, Compound Poisson Approximation
  8.4 Operator-Stable Limit Theorems
  8.5 Proofs of the Rate of Convergence Results
  8.6 Ideal Metrics in the Problem of Rounding

9 Mass Transportation Problems and Recursive Stochastic Equations
9.1 Recursive Algorithms and Contraction of Transformations
9.2 Convergence of Recursive Algorithms
9.2.1 Learning Algorithm
9.2.2 Branching-Type Recursion
9.2.3 Limiting Distribution of the Collision Resolution Interval
9.2.4 Quicksort
9.2.5 Limiting Behavior of Random Maxima
9.2.6 Random Recursion Arising in Probabilistic Modeling: Limit Laws
9.2.7 Random Recursion Arising in Probabilistic Modeling: Rate of Convergence
9.3 Extensions of the Contraction Method
9.3.1 The Number of Inversions of a Random Permutation
9.3.2 The Number of Records
9.3.3 Unsuccessful Searching in Binary Search Trees
9.3.4 Successful Searching in Binary Search Trees
9.3.5 A Random Search Algorithm
9.3.6 Bucket Algorithm

10 Stochastic Differential Equations and Empirical Measures
10.1 Propagation of Chaos and Contraction of Stochastic Mappings
10.1.1 Introduction
10.1.2 Equations with p-Norm Interacting Drifts



10.1.3 A Random Number of Particles
10.1.4 pth Mean Interactions in Time: A Non-Markovian Case
10.1.5 Minimal Mean Interactions in Time
10.1.6 Interactions with a Normalized Variation of the Neighbors: Relaxed Lipschitz Conditions
10.2 Rates of Convergence in the Kantorovich Metric
10.3 Stochastic Differential Equations

References
Abbreviations
Symbols
Index


1 Introduction

1.1 Mass Transportation Problems in Probability Theory

This chapter provides a basic introduction to mass transportation problems (MTPs). We introduce some of the methods used in studying MTPs: dual representations, explicit solutions, topological properties. We shall also discuss some applications of MTPs.

The following measure-theoretic problems are well-known continuous cases of MKPs (see, for example, Dudley (1976), Levin and Milyutin (1979), Rüschendorf (1979, 1981), Kemperman (1983), Kellerer (1984), Rachev (1984b, 1991c) and the references therein).

The Monge–Kantorovich mass transportation problem (MKP): Given fixed probability measures $P_1$ and $P_2$ on a separable metric space $S$ and a measurable cost function $c$ on the Cartesian product $S \times S$, find

$$\widehat{\mu}_c(P_1, P_2) = \inf \int_{S \times S} c(x, y)\, P(dx, dy), \tag{1.1.1}$$

where the infimum is taken over all probability measures $P$ on $S \times S$ having projections

$$\pi_i P = P_i, \qquad i = 1, 2. \tag{1.1.2}$$


The Kantorovich–Rubinstein transshipment problem (KRP): Given $P_1$ and $P_2$ on $S$, find

$$\overset{\circ}{\mu}_c(P_1, P_2) = \inf \int_{S \times S} c(x, y)\, Q(dx, dy), \tag{1.1.3}$$

where the infimum is taken over all finite measures $Q$ on $S \times S$ having the marginal difference

$$\pi_1 Q - \pi_2 Q = P_1 - P_2; \tag{1.1.4}$$

that is, $Q(A \times S) - Q(S \times A) = P_1(A) - P_2(A)$ for all Borel sets $A \subset S$. The measures $P_1$ and $P_2$ in the transportation problems MKP and KRP may be viewed as the initial and final distributions of mass, and $P$ and $Q$ in (1.1.2) and (1.1.4) as different types of transportation plans. If the infimum in (1.1.1) (respectively (1.1.3)) is realized for some $P^*$ (for some $Q^*$), then $P^*$ ($Q^*$) is said to be an optimal transportation plan for MKP (for KRP). The function $c(x, y)$ is interpreted as the cost of transferring a unit mass from $x$ to $y$.

The Monge and the Kantorovich formulations of the mass transportation problem: In the Monge (1781) formulation (see the Preface) the mass transportation problem consists in minimizing the cost of transporting soil from one location to another, previously fixed, location. Starting with the assumption that soil consists of small grains, the problem is to assign a final location to every grain in such a way that the cost of transportation is as low as possible. Note that in this formulation it is not possible to divide the grains, so that grains that share the same initial location must also share the same final location.

The Kantorovich formulation, see (1.1.1), (1.1.2), of the problem is similar, except that here one is allowed to “divide grains.” In this way we no longer need to answer the question of what the final location of each grain should be, but rather the following alternative one: Given the sets $A$ and $B$, what portion of the mass initially located at $A$ should be transferred to $B$? We will denote this quantity by $P_{1,2}(B|A)$ and say that the family $\{P_{1,2}(\cdot|\cdot)\}$ is a transportation plan. It is reasonable to assume that the mass does not vary in the transportation process. So, without loss of generality, we can assume that the total mass is one and identify the functions describing the initial and final distributions of the mass with probability measures $P_1$ and $P_2$.
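To make the idea of "dividing grains" concrete, here is a minimal Python sketch on a finite set of locations. It is an illustration only: the names `joint_from_kernel`, `marginals`, `P1`, and `K` are hypothetical, and the numbers are made-up data. Each row of `K` plays the role of $P_{1,2}(\cdot|x)$ for one initial location $x$, and the code computes the final distribution of mass that the plan produces.

```python
from fractions import Fraction as F

def joint_from_kernel(p1, kernel):
    """Joint law P(x, y) = P1(x) * P_{1,2}(y | x) on a finite set of locations."""
    return [[p1[i] * kernel[i][j] for j in range(len(kernel[i]))]
            for i in range(len(p1))]

def marginals(joint):
    """Return the first and second marginal of a joint law given as a matrix."""
    first = [sum(row) for row in joint]
    second = [sum(col) for col in zip(*joint)]
    return first, second

# Hypothetical data: three initial locations, two final locations.
P1 = [F(1, 2), F(1, 4), F(1, 4)]
# Row i says how the mass sitting at initial location i is split up.
K = [[F(1, 2), F(1, 2)],   # mass at location 0 is divided between both targets
     [F(1), F(0)],         # mass at location 1 goes entirely to target 0
     [F(0), F(1)]]         # mass at location 2 goes entirely to target 1

P = joint_from_kernel(P1, K)
first, second = marginals(P)   # second is the final distribution of mass
```

A row of `K` with a single entry equal to one corresponds to an undivided grain; the first row shows the extra freedom of the Kantorovich formulation. Exact fractions are used so that the mass balance can be checked without rounding.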
It is evident that not every transportation plan is admissible. We define a probability measure $P$ on the product space $S \times S$ by using $P_1$ as the first marginal and $\{P_{1,2}(\cdot|\cdot)\}$ as the transition probability. The transportation plan given by $P_{1,2}$ is admissible if and only if the second marginal distribution of $P$ coincides with $P_2$. To optimize the cost of transportation we need to know the cost of transporting a unit from one location to another. We denote by $c(x, y)$ the cost of transportation of a unit of mass initially located at $x$ to its final destination $y$. We shall assume that $c$ is positive. Finally, we arrive at the Kantorovich formulation of the mass transportation problem that is described by (1.1.1), (1.1.2). Namely, the problem is to compute the functional

$$\widehat{\mu}_c(P_1, P_2) = \inf\left\{ \int_{S \times S} c(x, y)\, P(dx, dy);\ P \in \mathcal{M}(P_1, P_2) \right\},$$

where $c : S \times S \to \mathbb{R}_+$ is measurable and $\mathcal{M}(P_1, P_2)$ is the set of all probability measures defined on the product σ-algebra with marginal distributions $P_1$ and $P_2$ respectively. If $P^* \in \mathcal{M}(P_1, P_2)$ attains the above infimum, i.e., $P^*$ determines an optimal transportation plan (OTP) with respect to $c$ for $P_1$ and $P_2$, then we say that $P^*$ determines an OTP($c$) for $P_1$ and $P_2$. The functional $\widehat{\mu}_c(\cdot, \cdot)$ is called the Kantorovich functional. Evidently the OTP($c$) would be the family of conditional probabilities determined by $P^*$.

Equation (1.1.1) can also be written in terms of random variables (r.v.s). With this terminology, the MKP is equivalent to the problem of finding a pair of r.v.s $(X, Y)$, such that the distributions of $X$ and $Y$ are $P_1$ and $P_2$ respectively, satisfying $\widehat{\mu}_c(P_1, P_2) = E[c(X, Y)]$. With a slight abuse of notation, if the pair $(X, Y)$ satisfies this equality, then we say that it is an OTP($c$) for their marginal distributions.

The first problem is to check whether the infimum in the definition is attained, or equivalently, whether an OTP($c$) for $P_1$ and $P_2$ exists. This can be resolved under rather general conditions, as we shall see in the next four chapters. If $P_1$ and $P_2$ are tight probability measures, then every probability distribution on $S \times S$ with $P_1$ and $P_2$ as marginals is also tight. Then a standard argument permits us to conclude that if $c$ is continuous, or at least lower semicontinuous, an OTP($c$) for $P_1$ and $P_2$ exists.

An important problem is to determine the conditions under which the solution for the MKP coincides with that for the Monge problem with the same marginals. Or, equivalently, under what conditions can every OTP for $P_1$ and $P_2$ be written as $(X, T(X))$? It is not difficult to find examples with degenerate $P_1$ in which such a function does not exist, because this property is related to a kind of continuity of the distribution $P_1$. In fact, it is known that both solutions coincide in the following cases:


(i) If $P_1$ and $P_2$ are defined on a bounded subset of the finite-dimensional space $\mathbb{R}^k$ and are absolutely continuous with respect to the Lebesgue measure, and the cost function is $c(x, y) = \|x - y\|$ (proved in Sudakov (1979)).

(ii) For the $d^2$-metric, the result in (i) was extended to the case where only $P_1$ has a Lebesgue density. It has also been extended to the case of a separable Hilbert space $U$ but under additional stronger assumptions (cf. Abdellaoui and Heinich (1994), Cuesta-Albertos and Matrán (1989), and Cuesta-Albertos, Matrán, and Tuero-Diaz (1993)).

(iii) For a general $U$ and $c$ under the assumption that the infimum in (1.1.1) is attained, the support of $P_2$ is finite, and $P_1\{x;\ c(x, a) - c(x, b) = h\} = 0$ for every $a, b \in U$ and $h \in \mathbb{R}$ (see Cuesta-Albertos and Tuero-Diaz (1993)).

The assumption that $P_2$ is discrete is removed in Gangbo and McCann (1996), who also consider more general cost functions of the form $c(x, y) = h(x - y)$; see also Chapter 3 further on. Note that condition (iii) above is not fulfilled if $U = \mathbb{R}^k$ is endowed with the Euclidean norm, $P_1$ is absolutely continuous with respect to the Lebesgue measure, and $c(x, y) = \|x - y\|$. Under the assumption that $P_1$ is atomless, the support of $P_2$ is discrete, $U$ is a separable Banach space, and $c(x, y) = \|x - y\|^r$, $r \ge 1$, there exists an OTP for $P_1$ and $P_2$ of the form $(X, T(X))$; see Abdellaoui and Heinich (1994).

Mass transportation problem with partial knowledge of the marginals (MTPP); see further Chapter 7: Given fixed sets $a = \{a_{ij};\ i = 1, 2,\ j = 1, \dots, n\}$ of real numbers and $f = \{f_{ij};\ i = 1, 2,\ j = 1, \dots, n\}$ of real-valued continuous functions, find

$$I(c, a, f) = \inf \int_{S \times S} c(x, y)\, P(dx, dy), \tag{1.1.5}$$

where the probability measure $P$ on $S \times S$ satisfies the marginal moment conditions

$$\int f_{ij}\, d\pi_i P = a_{ij}, \qquad i = 1, 2,\ j = 1, \dots, n. \tag{1.1.6}$$

MTPP can be treated as an approximation of the mass transportation problem MKP. Indeed, if the probability measures $P_1$ and $P_2$ are not determined completely and if only some functionals of $P_1$ and $P_2$ are given, we can consider only the MTPP instead of the MKP. The set A− = {ak = {aij , f = (fij )}}, where $f_{ij}$ is given in (1.1.6), describes a set of finitely many moment conditions. Classical results for moment problems obtained first on the real line and then on more general spaces (Krein and Nudelman (1977), Karlin and Studden (1966), Winkler (1988)) state that the extremal solutions of these moment-type problems have finite support. Indeed, one finds that a subset $D$ of the basic space is a set of uniqueness with respect to A− if and only if $D = \{x_1, \dots, x_m\}$ is a finite set and the vectors $A(D) = (a_1(x_i), a_2(x_i), \dots, a_n(x_i))$, $i = 1, \dots, m$, are affinely independent, where the $a_i$ denote the renumbered functions $f_{ij}$ of A−. Since there are at most $n + 1$ affinely independent vectors in $\mathbb{R}^n$, it follows that $P$ is an extremal solution if and only if its support has at most $n + 1$ points satisfying the assumptions on $A(D)$. For $S = \mathbb{R}$ the optimal solutions are supported by about half of the $n + 1$ points; see Krein and Nudelman (1977).

Dual Representations. Duality theorems for MKP and KRP are of the general form

$$\widehat{\mu}_c(P_1, P_2) = \sup \left\{ \int f\, dP_1 + \int g\, dP_2 \right\}, \tag{1.1.7}$$

where the supremum is taken over the class of bounded measurable functions $f$ and $g$ on $S$ satisfying $f(x) + g(y) \le c(x, y)$. Further,

$$\overset{\circ}{\mu}_c(P_1, P_2) = \sup \int f\, d(P_1 - P_2), \tag{1.1.8}$$

where the supremum is taken over the class of bounded continuous functions $f$ on $S$ that satisfy the “Lipschitz” condition $f(x) - f(y) \le c(x, y)$. When $P_1$ and $P_2$ have finite support, this dual representation is equivalent to the well-known linear programming duality. In fact, if the supports of $P_1$ and $P_2$ are contained in $\{x_1, \dots, x_n\}$ with $P_i(x_j) = a_j^{(i)} \ge 0$, then (1.1.7) states that the Monge–Kantorovich functional

$$\widehat{\mu}_c(P_1, P_2) = \min \left\{ \sum_{j,k} c(x_j, x_k)\, p_{jk};\ p_{jk} \ge 0,\ \sum_k p_{jk} = a_j^{(1)},\ \sum_j p_{jk} = a_k^{(2)} \ \text{for all } j, k \right\}$$

admits the dual representation

$$\widehat{\mu}_c(P_1, P_2) = \max \left\{ \sum_j f_j\, a_j^{(1)} + \sum_k g_k\, a_k^{(2)};\ f_j + g_k \le c(x_j, x_k) \ \text{for all } j, k \right\}.$$
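The finite-support duality can be checked by hand on a toy instance. The following Python sketch is an illustration under simplifying assumptions, not a general solver: both measures are taken uniform on supports of equal size, in which case the linear program attains its minimum at a permutation (by the Birkhoff–von Neumann theorem), so brute force over permutations suffices. The function names and the data `xs`, `ys`, `f`, `g` are hypothetical.

```python
from itertools import permutations

def primal_value(xs, ys, cost):
    """Minimal average cost over all one-to-one assignments xs[i] -> ys[s[i]]."""
    n = len(xs)
    return min(sum(cost(xs[i], ys[s[i]]) for i in range(n)) / n
               for s in permutations(range(n)))

def dual_value(xs, ys, f, g):
    """Dual objective: integral of f dP1 plus integral of g dP2 (uniform weights)."""
    return sum(f(x) for x in xs) / len(xs) + sum(g(y) for y in ys) / len(ys)

def cost(x, y):
    return abs(x - y)

xs = [0.0, 1.0]   # support of P1, uniform weights (made-up data)
ys = [2.0, 3.0]   # support of P2, uniform weights (made-up data)

def f(x):
    return -x     # together with g: f(x) + g(y) = y - x <= |x - y|

def g(y):
    return y

feasible = all(f(x) + g(y) <= cost(x, y) for x in xs for y in ys)
primal = primal_value(xs, ys, cost)
dual = dual_value(xs, ys, f, g)
```

Here the dual-feasible pair happens to attain the primal minimum, so the two values coincide; for an arbitrary feasible pair one would only observe that the dual value does not exceed the primal one.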

Analogously, $\overset{\circ}{\mu}_c$ represents the standard network flow problem: find

$$\overset{\circ}{\mu}_c(P_1, P_2) = \min \left\{ \sum_{j,k} c(x_j, x_k)\, p_{jk};\ \sum_k (p_{jk} - p_{kj}) = a_j^{(1)} - a_j^{(2)} \ \text{for all } j \right\}.$$

The MKP was first formulated and studied by Kantorovich (1942) for a compact metric space $(S, d)$ with cost function $c = d$. It has been shown that

$$\widehat{\mu}_d(P_1, P_2) = \sup \left\{ \int f\, d(P_1 - P_2);\ f : S \to \mathbb{R} \ \text{is bounded and}\ |f(x) - f(y)| \le d(x, y) \ \text{for all } x, y \in S \right\}. \tag{1.1.9}$$

In the case of a complete separable metric space $(S, d)$, the duality result (1.1.9) was developed by Dudley (1976), Huber (1981), Szulga (1978, 1982), Fernique (1981), and de Acosta (1982). In a general separable metric space the result was established by Kellerer (1984), Rachev (1984c), and Rüschendorf (1979, 1981). The problem of finding the dual representation for the $\ell_p$-metric ($p > 1$), where

$$\ell_p(P_1, P_2) = \widehat{\mu}_{d^p}(P_1, P_2)^{1/p}, \tag{1.1.10}$$

is known as Dudley’s problem (Dudley (1976, Lecture 20)). Kantorovich and Rubinstein (1957, 1958) studied the KRP and established (1.1.8) in the case of a compact metric space $(S, d)$. Levin and Milyutin (1979) extended the result to an arbitrary compact space $S$ and an arbitrary continuous cost function. The dual relation still holds if $S$ is a separable metric space and

$$c(x, y) = d(x, y)\, T(d(x, y), d(y, a)), \tag{1.1.11}$$

where $a$ is a fixed point of $S$ and $T(s, t) = T(t, s)$ is a continuous nonnegative function on $t \ge 0$, $s \ge 0$ that is nondecreasing in both of its arguments (see Rachev (1984b)). As for MTPP, in the general case of a completely regular topological space $S$, dual expressions were shown in Kemperman (1983) under a “tightness” condition on the pairs $(f_{ij}, a_{ij})$, $i = 1, 2$, $j = 1, \dots, n$.



MTPs can naturally be reformulated in terms of random variables and their distribution functions. Gini’s (1914) and Hoeffding’s (1940) original research was carried forward by Fréchet (1951), who discovered the bounds in the set of constraints (1.1.2). We formulate this result using the notion of copula; see Sklar (1959). A copula is a function $C : I^2 \to I$, $I$ being the closed real unit interval, such that

$$C(s, 0) = C(0, s) = 0 \quad \text{and} \quad C(s, 1) = s = C(1, s) \tag{1.1.12}$$

whenever $0 \le s \le 1$, and

$$C(s_2, t_2) - C(s_2, t_1) + C(s_1, t_1) - C(s_1, t_2) \ge 0 \tag{1.1.13}$$

for any $0 \le s_1 \le s_2 \le 1$ and $0 \le t_1 \le t_2 \le 1$. Let $X$, $Y$ be random variables and let $F$, $G$, and $Q$ denote the distribution functions of $X$, $Y$, and $(X, Y)$, respectively. Schweizer and Sklar (1983) have shown that corresponding to $X$ and $Y$ there is a unique copula $C_{XY}$ such that $Q(x, y) = C_{XY}(F(x), G(y))$ for all $x, y$. If $F$ and $G$ fail to be continuous, the word “unique” must be omitted in the preceding statement. In this setting, the Fréchet bounds describing the minimal and maximal possible value of $Q$ can be expressed as

$$W(F(x), G(y)) \le Q(x, y) \le M(F(x), G(y)), \tag{1.1.14}$$

where $W$, $M$ are the copulas

$$W(s, t) = \max(s + t - 1, 0), \qquad M(s, t) = \min(s, t). \tag{1.1.15}$$
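As a small numerical illustration of (1.1.13) to (1.1.15), the Python sketch below checks on a grid that the independence copula $C(s, t) = st$ lies between $W$ and $M$, and that all three functions satisfy the rectangle inequality (1.1.13). The independence copula and the helper names are choices made here for illustration; they are not taken from the text.

```python
def W(s, t):
    """Lower Frechet bound from (1.1.15)."""
    return max(s + t - 1.0, 0.0)

def M(s, t):
    """Upper Frechet bound from (1.1.15)."""
    return min(s, t)

def independence(s, t):
    """Copula of independent random variables (used only as a test case)."""
    return s * t

def rectangle_mass(C, s1, s2, t1, t2):
    """Left-hand side of (1.1.13): the mass C assigns to [s1, s2] x [t1, t2]."""
    return C(s2, t2) - C(s2, t1) + C(s1, t1) - C(s1, t2)

grid = [i / 10 for i in range(11)]
eps = 1e-12  # tolerance for floating-point rounding

bounds_hold = all(W(s, t) - eps <= independence(s, t) <= M(s, t) + eps
                  for s in grid for t in grid)
two_increasing = all(rectangle_mass(C, s1, s2, t1, t2) >= -eps
                     for C in (W, M, independence)
                     for s1 in grid for s2 in grid if s1 <= s2
                     for t1 in grid for t2 in grid if t1 <= t2)
```

A grid check of this kind is of course not a proof; it only confirms the inequalities at finitely many points.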

The copula $C_{XY}$ contains significant information about the type of dependence between the random variables. On the other hand, for any given function $C$ satisfying (1.1.12) and (1.1.13) one seeks a probabilistic interpretation of the dependence it captures. For recent results in this direction, see Frank (1991), Mikusinski et al. (1991), Genest (1990), Schweizer (1990), and Nelsen (1990).

There are many contributions to and extensions of the duality theorem (cf. Rüschendorf (1981), Kellerer (1984), Dudley (1989), Levin (1990), and Rachev (1991c)). In this book we shall present in detail the fundamental approaches to duality theory due to H. Kellerer and V.L. Levin (see Chapters 2, 4, and 5). Recently, a general duality theorem has been proved in Ramachandran and Rüschendorf (1995). Before stating this result, we recall that a probability space $(\Omega, \mathcal{A}, P)$ is called perfect if for any measurable function $f : \Omega \to \mathbb{R}$, one can find a Borel set $B \subset f(\Omega)$ such that $P(f^{-1}(B)) = 1$. Perfectness is a very weak regularity condition on $\Omega$, $P$.

Theorem 1.1.1 (General duality theorem, see further Chapter 2) Let $(S_i, \mathcal{A}_i, P_i)$, $i = 1, 2$, be probability spaces such that $P_1$ is perfect, and let $c : S_1 \times S_2 \to \mathbb{R}$ be product measurable and upper majorized (i.e., $c(x, y) \le f_1(x) + f_2(y)$ for some $f_i \in L^1(P_i)$); then the following duality theorem holds:

$$\widehat{\mu}_c(P_1, P_2) = \sup \left\{ \int h_1\, dP_1 + \int h_2\, dP_2;\ h_i \in L^1(P_i),\ h_1(x) + h_2(y) \le c(x, y) \right\}.$$

Duality theorems are the basis of many of the properties of the Monge–Kantorovich functional $\widehat{\mu}_c(P_1, P_2)$ and very often lead to explicit results for transportation problems. They also lead to the construction of optimal plans (cf. the introduction to this section), and under some conditions, solutions for the dual problem in Theorem 1.1.1 exist. In this framework, an optimal plan $\mu^* \in \mathcal{M}(P_1, P_2)$ is characterized by the existence of $h_i^* \in L^1(P_i)$, $i = 1, 2$, with $h_1^*(x) + h_2^*(y) \le c(x, y)$, such that with respect to $\mu^*$,

$$c(x, y) = h_1^*(x) + h_2^*(y) \quad \text{a.s.} \tag{1.1.16}$$

For special cost functions, for example for $c = d$ or $d^r$, more specific versions of the duality theorem have been established in the literature; see Rachev (1991c). Both problems are solved in considerable generality only in the one-dimensional case, where it is known that OTPs coincide with increasing arrangements. The first known results in this direction are found in early papers of Dall’Aglio (see Dall’Aglio, Kotz, and Salinetti (1991)). The following result from Cuesta-Albertos, Rüschendorf, and Tuero-Diaz (1993) includes and completes the known characterizations of OTP($d^2$).

Proposition 1.1.2 (see further Chapter 5) Let $X_1$ and $X_2$ be real, square integrable r.v.s defined on the probability space $(\Omega, \mathcal{A}, \mu)$ with d.f.s $F_1$ and $F_2$. Then

(a) The following are equivalent:

(i) $(X_1, X_2)$ is an OTP($d^2$).

(ii) $F_{(X_1, X_2)}(x, y) = \min\{F_1(x), F_2(y)\}$, $\forall x, y$.

(iii) There exists an r.v. $Z$ uniformly distributed on $(0, 1)$ such that for some nondecreasing functions $\varphi_1$, $\varphi_2$, $X_1 = \varphi_1(Z)$ and $X_2 = \varphi_2(Z)$ a.s. with respect to $\mu$.

(iv) $\mu \otimes \mu\{(\omega, \omega');\ (X_1(\omega) - X_1(\omega'))(X_2(\omega) - X_2(\omega')) \ge 0\} = 1$.

(b) The functions $\varphi_i$ in (a) are essentially unique; $\varphi_i = F_i^{-1}$ a.s. with respect to Lebesgue measure.

(c) Given $\alpha \in (0, 1)$ and $x \in \mathbb{R}$, we define $F(x, \alpha) := \mu(X_1 < x) + \alpha \mu(X_1 = x)$. If $Z$ is an r.v. uniformly distributed on $(0, 1)$, independent of $X_1$, then the pair $(X_1, F_2^{-1} \circ F(X_1, Z))$ is an OTP($c_2$) for $P_1$ and $P_2$.

(d) If $P_1$ is nonatomic and $(X_1, Y_1)$ is an OTP($c_2$) for $P_1$ and $P_2$, then $Y_1 = F_2^{-1} \circ F_1(X_1)$ a.s. with respect to $\mu$.

(e) If $Y_1 = \varphi_1(X_1)$ with $\varphi_1$ nondecreasing, then $(X_1, Y_1)$ is an OTP($c_2$).

If $c(x, y) = \varphi(|x - y|)$, then a uniqueness result holds essentially only if $\varphi$ is convex (see further Chapter 3). So in the one-dimensional case the OTP does not depend on the transportation cost function $c$ (in the class of functions considered above). This is not the case if the dimension is greater than one. In the following simple example (see Cuesta-Albertos and Matrán (1991)) the OTP($c_r$) depends on $r$. Consider the points in $\mathbb{R}^2$

$$m_0 = (0, 0), \qquad m_1 = (1, 0), \qquad m_2 = (-1/2, \sqrt{3}/2),$$

and the probabilities $P_i$, $i = 1, 2$, that allocate probability $1/2$ to $m_0$ and $m_i$, $i = 1, 2$, respectively. It is easy to show that a probability measure $P$ giving an OTP($d^2$) for $P_1$, $P_2$ is

$$P\{(m_0, m_2)\} = P\{(m_1, m_0)\} = \frac{1}{2}.$$

However, for $r < (\log 4)/(\log 3)$ an OTP($d^r$) is given by the probability $P^*$ defined by

$$P^*\{(m_0, m_0)\} = P^*\{(m_1, m_2)\} = \frac{1}{2}.$$
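The contrast between one and two dimensions can be reproduced with a few lines of Python. The sketch below is an illustration only: it brute-forces the optimal assignment between uniform measures on equally many points for the cost $\|x - y\|^r$, which is enough for the two-point example above. The helper `best_assignment` and the one-dimensional sample points are hypothetical.

```python
from itertools import permutations
from math import dist, log, sqrt

def best_assignment(xs, ys, r):
    """Brute-force optimal plan between uniform measures on xs and ys for cost d^r.

    Returns (cost, s), where the point xs[i] is sent to ys[s[i]].
    """
    n = len(xs)
    return min((sum(dist(xs[i], ys[s[i]]) ** r for i in range(n)) / n, s)
               for s in permutations(range(n)))

# One dimension: the increasing arrangement is optimal for each exponent tried.
xs_line = [(0.0,), (1.0,), (3.0,)]
ys_line = [(0.5,), (2.0,), (2.5,)]
line_plans = {r: best_assignment(xs_line, ys_line, r)[1] for r in (1.5, 2, 3)}

# The planar example: P1 uniform on {m0, m1}, P2 uniform on {m0, m2}.
m0, m1, m2 = (0.0, 0.0), (1.0, 0.0), (-0.5, sqrt(3) / 2)
threshold = log(4) / log(3)

def planar_plan(r):
    """(0, 1) means m0 -> m0, m1 -> m2; (1, 0) means m0 -> m2, m1 -> m0."""
    return best_assignment([m0, m1], [m0, m2], r)[1]

plan_r2 = planar_plan(2)
plan_r1 = planar_plan(1)
```

The two candidate plans cost $1$ and $3^{r/2}/2$ respectively, which is where the threshold $(\log 4)/(\log 3)$ comes from: the second plan is cheaper exactly when $3^{r/2} < 2$.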

In general spaces most results providing explicit OTPs have been found in cases in which a solution for the Monge problem gives an OTP (see the preceding section); i.e., for a function $T : U \to U$ the pair $(X, T(X))$ is an OTP. The problem has two aspects: the construction and the uniqueness of the solution. A complete characterization is known for a pair $(X, Y)$ to be an OTP. The following theorem is proved in Rüschendorf and Rachev (1990) (see also Knott and Smith (1984) and Smith and Knott (1987)).

Theorem 1.1.3 Assume that $(U, \|\cdot\|)$ is a separable Hilbert space and that $P_1$ and $P_2$ are two probability measures such that $\int \|x\|^2\, P_i(dx) < \infty$, $i = 1, 2$. If $X$, $Y$ are two r.v.s with distributions $P_1$ and $P_2$ respectively, then $(X, Y)$ is an OTP($c_2$) if and only if $Y \in \partial f(X)$ a.s. for some lower semicontinuous convex function $f$, where $\partial f(x)$ denotes the subgradient of $f$ in $x$:

$$\partial f(x) = \{y;\ f(x') - f(x) \ge \langle y, x' - x \rangle \ \text{for all } x' \text{ in the domain of } f\}.$$

An interesting consequence of Theorem 1.1.3 is that the optimality of a certain map depends only on the map and not on the distributions $P_1$ and $P_2$. More precisely,

Corollary 1.1.4 Let $T : U \to U$ be a measurable map such that $(X, T(X))$ is an OTP($c_2$). Let $X^*$ be an r.v. whose support is contained in that of $X$. Then $(X^*, T(X^*))$ is also an OTP($d^2$).

It is known (see Rockafellar (1970)) that a function $T$ satisfies $T(x) \in \partial f(x)$ for some lower semicontinuous convex function if and only if $T$ is cyclically monotone; i.e.,

$$\sum_{0 \le i \le m-1} \langle x_{i+1} - x_i, T x_i \rangle \le 0 \quad \text{for } x_0, \dots, x_m = x_0 \in U. \tag{1.1.17}$$

As a consequence, functions giving OTP($d^2$) coincide a.s. with cyclically monotone functions. Theorem 1.1.3 is particularly useful, since the subgradients of convex functions or equivalently cyclically monotone functions are well studied in convex analysis. They are basic for the solution of convex optimization problems. This allows the construction of many examples of optimal transportation functions and optimal pairs $(X, Y)$ of the transportation problem. Examples of optimal functions are positive semidefinite symmetric linear functions, radial transformations, and projections on convex sets (cf. further Chapter 3). A continuously differentiable function $\varphi$ is an optimal function with respect to $d^2$ if and only if

$$\frac{\partial \varphi_i}{\partial x_j} = \frac{\partial \varphi_j}{\partial x_i} \quad \text{for } i \ne j \text{ and } \varphi \text{ is monotone;} \tag{1.1.18}$$

i.e., $\langle x - y, \varphi(x) - \varphi(y) \rangle \ge 0$ (cf. Rüschendorf (1991)). The symmetry of the derivatives is a consequence of the Poincaré lemma. The necessity of the monotonicity property of optimal functions was first established in Cuesta-Albertos and Matrán (1989) (see further Chapter 3).
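Condition (1.1.17) is easy to test on a finite sample of points. The following Python sketch is a hypothetical illustration: it checks every cycle of distinct sample points, which gives a necessary condition on that sample rather than a proof for the whole space. The maps `gradient_map` and `quarter_turn` and the sample `pts` are chosen here for illustration.

```python
from itertools import permutations

def inner(u, v):
    """Euclidean inner product of two vectors given as tuples."""
    return sum(a * b for a, b in zip(u, v))

def is_cyclically_monotone(points, T, tol=1e-12):
    """Check (1.1.17) on every cycle of distinct points drawn from `points`."""
    for m in range(2, len(points) + 1):
        for cycle in permutations(points, m):
            closed = cycle + (cycle[0],)          # x_0, ..., x_m = x_0
            total = sum(
                inner([b - a for a, b in zip(closed[i], closed[i + 1])],
                      T(closed[i]))
                for i in range(m))
            if total > tol:
                return False
    return True

def gradient_map(x):
    """Gradient of the convex function f(x) = |x|^2, i.e. T(x) = 2x."""
    return (2 * x[0], 2 * x[1])

def quarter_turn(x):
    """Rotation by 90 degrees: <x - y, Tx - Ty> = 0, yet T is not a subgradient."""
    return (-x[1], x[0])

pts = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.5, 0.5)]
gradient_ok = is_cyclically_monotone(pts, gradient_map)
rotation_ok = is_cyclically_monotone(pts, quarter_turn)
```

The quarter turn passes the two-point (monotonicity) test but fails on the three-point cycle $(1,0) \to (0,1) \to (-1,0)$, where the sum in (1.1.17) equals 2. This shows that monotonicity alone is weaker than cyclic monotonicity.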



Monotone functions (also called Zarantonello-monotone) enjoy several good analytic properties. They are continuous a.s. with respect to the Lebesgue measure (and therefore are measurable), they are continuous at each point whose image lies in the interior of the range, etc. (cf. Cuesta-Albertos, Matrán, and Tuero-Diaz (1993) and Tuero-Diaz (1993)).

An immediate application of Theorem 1.1.3 is to the case of Gaussian probability measures (see Dowson and Landau (1982), Givens and Shortt (1984), Knott and Smith (1987), Olkin and Pukelsheim (1982), and Rüschendorf and Rachev (1990)).

Proposition 1.1.5 (see Chapter 3) Let $P_1$ and $P_2$ be two $n$-dimensional, centered Gaussian probability measures with covariance matrices $\Sigma_1$ and $\Sigma_2$ respectively (regular or not); then

$$\ell_2^2(P_1, P_2) = \operatorname{trace}\left( \Sigma_1 + \Sigma_2 - 2 \left( \Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2} \right)^{1/2} \right). \tag{1.1.19}$$

Moreover, if $X$ is an r.v. with distribution $P_1$, $\Sigma_1$ is nonsingular, and

$$A = \Sigma_1^{-1/2} \left( \Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2} \right)^{1/2} \Sigma_1^{-1/2},$$

then $(X, AX)$ is an OTP($d^2$) for $P_1$ and $P_2$.

Equality (1.1.19) has been extended to general separable Hilbert spaces in Gelbrich (1990). The extension in Cuesta-Albertos, Matrán, and Tuero-Diaz (1993) includes an expression for the operator that corresponds to $A$ in Proposition 1.1.5. Moreover, it will be shown in Chapter 3 that (1.1.19) provides a universal lower bound for the cost of transportation between distributions $P_1$ and $P_2$ with covariances $\Sigma_1$, $\Sigma_2$.

Proposition 1.1.6 Let $P_1$ and $P_2$ be two $n$-dimensional probability measures centered in mean and with covariance matrices $\Sigma_1$ and $\Sigma_2$ respectively. Then

$$\ell_2^2(P_1, P_2) \ge \operatorname{trace}\left( \Sigma_1 + \Sigma_2 - 2 \left( \Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2} \right)^{1/2} \right). \tag{1.1.20}$$

This proposition has been generalized to separable Hilbert spaces in Cuesta-Albertos, Matrán, and Tuero-Diaz (1993); see also Chapter 3. Moreover, in that paper two families of lower bounds are provided for $\ell_2^2(P_1, P_2)$ that depend on the orthogonal basis under consideration. In this way, for each orthogonal basis on $U$ two lower bounds are found.
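Returning to formula (1.1.19), the following Python sketch evaluates it for $2 \times 2$ covariance matrices and checks that the matrix $A$ of Proposition 1.1.5 maps the covariance $\Sigma_1$ to $\Sigma_2$. It is a minimal illustration under stated assumptions: the matrices `S1` and `S2` are made-up positive definite examples, and the square root uses the closed form $(M + \sqrt{\det M}\, I)/\sqrt{\operatorname{trace} M + 2\sqrt{\det M}}$, which is valid only in dimension two.

```python
from math import sqrt

def mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sqrtm2(A):
    """Square root of a symmetric positive definite 2x2 matrix (closed form)."""
    s = sqrt(A[0][0] * A[1][1] - A[0][1] * A[1][0])   # sqrt(det A)
    t = sqrt(A[0][0] + A[1][1] + 2 * s)               # trace of the square root
    return [[(A[0][0] + s) / t, A[0][1] / t],
            [A[1][0] / t, (A[1][1] + s) / t]]

def inv2(A):
    """Inverse of a nonsingular 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def gaussian_l2_squared(S1, S2):
    """Right-hand side of (1.1.19) together with the matrix A of Proposition 1.1.5."""
    R = sqrtm2(S1)                           # Sigma_1^{1/2}
    mid = sqrtm2(mul(mul(R, S2), R))         # (Sigma_1^{1/2} Sigma_2 Sigma_1^{1/2})^{1/2}
    value = (S1[0][0] + S1[1][1] + S2[0][0] + S2[1][1]
             - 2 * (mid[0][0] + mid[1][1]))
    R_inv = inv2(R)
    return value, mul(mul(R_inv, mid), R_inv)

S1 = [[2.0, 0.5], [0.5, 1.0]]     # hypothetical covariance matrices
S2 = [[1.0, -0.3], [-0.3, 3.0]]
value, A = gaussian_l2_squared(S1, S2)
pushed = mul(mul(A, S1), A)       # covariance of AX (A is symmetric)
```

For commuting (for instance diagonal) covariances the formula reduces to the sum of squared differences of the standard deviations, which gives a quick sanity check.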
One of them depends on just the first two moments of the one-dimensional marginal distributions of $P_1$ and $P_2$. The other lower bound (which is more precise) is the sum of the $\ell_2^2$-costs of transportation between the one-dimensional marginals of $P_1$ and $P_2$. The lower bound in (1.1.20) is the supremum of the first family. The second family of lower bounds can be improved if one considers the sum of costs of transportation between marginals with dimension greater than one.

In general, however, the problem of determining optimal $d^2$-couplings remains difficult and leads to the problem of finding the solution of a partial differential equation of Monge–Ampère type. For example, it is not known in general how to find the optimal plan for a pair $P_1$ and $P_1 \circ T_\alpha^{-1}$, where $T_\alpha$ is a rotation by the angle $\alpha$ in $\mathbb{R}^2$ (or more generally an orthogonal transformation in $\mathbb{R}^k$). It has been proved (see Tuero-Diaz (1991)) that if the support of the distribution of $X$ contains an open set, then $(X, T_\alpha(X))$, $\alpha > 0$, is not an OTP($c_2$).

Theorem 1.1.3 has been extended to general cost functions $c$ in Rüschendorf (1991, 1991a, 1995). The role played by convex functions, their subgradients, and cyclically monotone functions in convex analysis is replaced by $c$-convex functions, $c$-subgradients, and $c$-cyclic monotone functions. The last three notions are introduced in nonconvex optimization theory but unfortunately have not been well studied to date. Several explicit examples of optimal $c$-plans (functions) have been found. For $c(x, y) = \|x - y\|^r$, $r > 1$, pairs $(X, \varphi(X))$ are OTPs, where

$$\varphi(x) = |h(x)|^{-\frac{r-2}{r-1}}\, h(x) + x, \tag{1.1.21}$$

with $h$ any cyclically monotone function. In particular, $c$-plans

$$\varphi(x) = \left( x^{\top} A^2 x \right)^{-\frac{r-2}{2(r-1)}} A x + x, \tag{1.1.22}$$

with $A$ a positive semidefinite, symmetric linear function and $\varphi(x) = \frac{x}{\|x\|}\, g(\|x\|)$ a radial transformation, are optimal; see further Chapter 3.

Some approximating algorithms have been established in the case when $P_2$ has finite support:

1. In Aurenhammer, Hoffmann, and Arnov (1992) an algorithm is provided to find the OTP($d^2$) between two distributions $P_1$ and $P_2$ defined in $\mathbb{R}^k$ if $P_1$ is continuous with bounded support and the support of $P_2$ is finite.

2. In Abdellaoui (1993, 1994) an algorithm is given to construct OTP($d^r$) approximately if the cardinality of the support of $P_2$ is finite. $P_1$ is required to satisfy

$$P_1\{x;\ \|x - a\|^r - \|x - b\|^r = h\} = 0, \quad \forall a, b \in U \text{ and } h \in \mathbb{R}. \tag{1.1.23}$$

For the algorithm one has to compute an element $\alpha \in \mathbb{R}^{n-1}$, where $n$ is the cardinality of the support of $P_2$, and then solve a certain equation $h(t) = \alpha$, $t \in \mathbb{R}^{n-1}$. The solution of this equation is not easy to find, and a procedure to construct a sequence of approximate solutions is given when $P_1$ satisfies certain Lipschitz conditions.

The range of applications of these algorithms has yet to be investigated. A simpler algorithm is given in Rüschendorf and Uckelmann (1996).

The problem of uniqueness of OTPs has been solved in some particular cases. In Abdellaoui and Heinich (1994) and Cuesta-Albertos, Matrán, and Tuero-Diaz (1993) the uniqueness of the OTP($c_2$) is proved in Hilbert spaces if one of the probabilities involved satisfies a certain continuity condition. In fact, the technique employed in Cuesta-Albertos, Matrán, and Tuero-Diaz (1993) can be used to extend this result to the case in which it is known that the solutions of the Monge and Monge–Kantorovich problems coincide and $c(x, y) = H[d(x, y)]$ with $H$ strictly convex. Furthermore, in Cuesta-Albertos and Tuero-Diaz (1993) the uniqueness of the OTP($c$) for general $U$ and $c$ was proved under the condition that the infimum of (1.1.1) is reached, the support of $P_2$ is finite, and $P_1\{x;\ c(x, a) - c(x, b) = h\} = 0$ for every $a, b \in U$ and $h \in \mathbb{R}$; but the technique employed is of a different nature. This result has been extended in Gangbo and McCann (1996) to the case where $c(x, y)$ is of the form $h(x - y)$. The uniqueness result implies in particular that OTPs are continuous with respect to weak convergence (see Cuesta-Albertos, Matrán, and Tuero-Diaz (1993)):

Theorem 1.1.7 Let $P$, $Q_n$, $n = 0, 1, \dots$, be probability measures defined on $\mathbb{R}^k$ such that $\int \|x\|^2\, dQ_n < \infty$, $n = 0, 1, \dots$, and $\int \|x\|^2\, dP < \infty$. If $P$ is absolutely continuous with respect to the Lebesgue measure and $(X, T_n(X))$, $n = 0, 1, \dots$, are OTP($d^2$) between $P$ and $Q_n$, $n = 0, 1, \dots$, respectively, and $Q_n$ converges weakly to $Q_0$, then $T_n(X) \to T_0(X)$ a.s.

The proof is based on the analytic properties of monotone functions mentioned before.
So, the algorithms in Abdellaoui (1993) and Aurenhammer, Hoffmann, and Arnov (1992), together with the above result, provide an approximate solution for the OTP($d^2$) between two given probability measures if one of them is continuous.

We next briefly list the main duality results and topological properties of the Kantorovich–Rubinstein mass transshipment problem; see (1.1.3), (1.1.4). We shall use Levin’s formulation of the problem. Given a topological space $U$, a Radon measure $\varrho$ with total mass 0 on $U$, and a cost function $c : U \times U \to \mathbb{R}$, it is required to find the minimum

$$\overset{\circ}{\mu}_c(\varrho) := \min \int_{U \times U} c(x, y)\, dQ(x, y) \tag{1.1.24}$$

over the set $D(\varrho)$ of finite nonnegative Borel measures $Q$ on the product $U \times U$, subject to the balancing condition $\pi_1 Q - \pi_2 Q = \varrho$; i.e.,

$$Q(B \times U) - Q(U \times B) = \varrho(B) \quad \text{for all Borel sets } B \subset U.$$

Recall that a finite Borel measure $\varrho$ on $S$ is called a Radon measure if it is inner regular, i.e., $\varrho(B) = \sup \varrho(C)$, where the supremum is taken over all compact sets $C \subset B$. For any probabilities $P_1$ and $P_2$ on $U$ with $\varrho := P_1 - P_2$,

$$\overset{\circ}{\mu}_c(P_1, P_2) := \overset{\circ}{\mu}_c(\varrho)$$

is called the Kantorovich–Rubinstein functional.(1) A relation between the Kantorovich functional

$$\widehat{\mu}_c(P_1, P_2) = \min \left\{ \int_{U \times U} c(x, y)\, dQ(x, y);\ \pi_1 Q = P_1,\ \pi_2 Q = P_2 \right\} \tag{1.1.25}$$

and the Kantorovich–Rubinstein functional $\overset{\circ}{\mu}_c$ can be obtained in the following way. For a symmetric cost function $c(x, y) \ge 0$, define the reduced cost function

$$\bar{c}(x, y) = \inf \left\{ \sum_{i=1}^{n-1} c(x_i, x_{i+1});\ n \in \mathbb{N},\ x_i \in U,\ x_1 = x,\ x_n = y \right\}; \tag{1.1.26}$$

$\bar{c}(x, y)$ is the minimal cost of a transshipment from $x$ to $y$ that is carried out in several steps. Obviously, $\bar{c}(x, y) \le c(x, y)$ and $\bar{c}$ satisfies the triangle inequality: $\bar{c}(x, y) \le \bar{c}(x, z) + \bar{c}(z, y)$ for all $x, y, z \in U$. Furthermore, $\bar{c}$ is a (semi-)metric, and it is obviously the largest (semi-)metric dominated by $c$. By a slightly modified form of Theorem 1.1.1 for the case of a semimetric cost function, $\widehat{\mu}_{\bar{c}}$ admits a dual representation in the form of the Kantorovich metric

$$\widehat{\mu}_{\bar{c}}(P_1, P_2) = \sup \left\{ \int_U f\, d(P_1 - P_2);\ f : U \to \mathbb{R} \ \text{bounded and continuous, and}\ f(x) - f(y) \le \bar{c}(x, y),\ \forall x, y \in U \right\}.$$

(1) A systematic study of the Kantorovich–Rubinstein mass transshipment problem was carried out in numerous papers by V.L. Levin. His approach will be presented in Chapters 4 and 5.
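On a finite set of points the reduced cost function (1.1.26) is an all-pairs shortest-path computation. The Python sketch below uses the Floyd–Warshall recursion for this; it is an illustrative choice made here, not a method taken from the text, and the names `reduced_cost`, `squared`, and `endpoint_cost` are hypothetical. Only chains through the given points are considered, and the cost is assumed to vanish on the diagonal.

```python
def reduced_cost(points, cost):
    """Reduced cost (1.1.26), restricted to chains through the given finite set."""
    n = len(points)
    c_bar = [[cost(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Floyd-Warshall: allow intermediate stops one point at a time.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                c_bar[i][j] = min(c_bar[i][j], c_bar[i][k] + c_bar[k][j])
    return c_bar

def squared(x, y):
    """Symmetric cost c(x, y) = |x - y|^2 on the real line."""
    return (x - y) ** 2

def endpoint_cost(n):
    """Reduced cost of moving from 0 to 1 when stops at the points i/n are allowed."""
    grid = [i / n for i in range(n + 1)]
    return reduced_cost(grid, squared)[0][n]

costs = {n: endpoint_cost(n) for n in (1, 2, 4, 8)}
```

With the squared distance, $n$ equal steps of length $1/n$ cost $n \cdot (1/n)^2 = 1/n$, so the reduced cost between the endpoints shrinks as the grid is refined, while the direct cost stays at 1.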


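Moreover, $\overset{\circ}{\mu}_c$ has the same dual representation (under some regularity conditions on $U$ and $c$), implying

$$
\begin{aligned}
\overset{\circ}{\mu}_c(P_1, P_2)
&= \sup \left\{ \int f\, d(P_1 - P_2);\ f(x) - f(y) \le c(x, y),\ \forall x, y \right\} \\
&= \sup \left\{ \int f\, d(P_1 - P_2);\ f(x) - f(y) \le \bar{c}(x, y),\ \forall x, y \right\} \\
&= \sup \left\{ \int f\, d(P_1 - P_2);\ |f(x) - f(y)| \le \bar{c}(x, y),\ \forall x, y \right\} \\
&= \widehat{\mu}_{\bar{c}}(P_1, P_2).
\end{aligned}
\tag{1.1.27}
$$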

This gives a natural explanation of the relevance of µc for transportation problems. A somewhat different interpretation of µ c can be found in Kemperman (1983) (multistage shipping, cf. Rachev (1991c)). In linear programming the discrete analogue is known as the network flow problem. In terms of r.v.s, the following representation may also be given:

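$$\begin{aligned}
\overset{\circ}{\mu}_c(P_1, P_2) &= \hat{\mu}_{\tilde{c}}(P_1, P_2) \\
&= \inf\{ E\,\tilde{c}(X_1, X_2);\ \text{over all pairs of r.v.s } (X_1, X_2) \text{ with marginals } P_1 = P_{X_1} \text{ and } P_2 = P_{X_2} \} \\
&= \inf\{ E[c(X_1, X_2) + c(X_2, X_3) + \cdots + c(X_{n-1}, X_n)];\ \forall \text{ r.v.s } X_1, \ldots, X_n \text{ with } P_{X_1} = P_1,\ P_{X_n} = P_2 \\
&\qquad\quad \text{and } P_{X_i} \text{ arbitrary for } 2 \le i \le n - 1 \} \\
&= \inf\left\{ \int c(x, y)\, Q(dx, dy);\ Q \in D(P_1 - P_2) \right\},
\end{aligned} \tag{1.1.28}$$

where $Q = \sum_{i=1}^{n-1} P_{X_i, X_{i+1}}$. Obviously $\overset{\circ}{\mu}_c$ is a semimetric on $\mathcal{P}(U)$ if $c$ is symmetric. If $c(x, y) = d^p(x, y)$, $p > 1$, $U = \mathbb{R}^k$, then $\tilde{c}(x, y) = 0$, and therefore, $\overset{\circ}{\mu}_c(P_1, P_2) = 0$. This indicates a striking difference between $\hat{\mu}_c$ and $\overset{\circ}{\mu}_c$. The duality theorem for the Kantorovich–Rubinstein problem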

$$\overset{\circ}{\mu}_c(P_1, P_2) = \sup\left\{ \int f\, d(P_1 - P_2);\ f : U \to \mathbb{R}^1,\ f(x) - f(y) \le c(x, y),\ \forall x, y \in S \right\} \tag{1.1.29}$$

was proved by Kantorovich in 1942 for the case that $S$ is compact and $c$ continuous. In Levin (1991) this result is proved for the case that $U$ is homeomorphic to a Baire subset of a compact space, $c : U \times U \to (-\infty, \infty]$ is bounded from below, the sets $\{(x, y) \in U \times U;\ c(x, y) \le \alpha\}$ are analytic for all $\alpha$ (i.e., are the projection of a Borel set in $(U \times U) \times Y$ for a Polish space $Y$), and $\overset{\circ}{\mu}_c = \lim_{N \to \infty} \overset{\circ}{\mu}_{c \wedge N}$. Finally, a strengthened version of the duality theorem for symmetric, nonnegative cost functions $c(x, y)$ on a separable metric space $(U, d)$ is given in Rachev and Shortt (1990) under the following boundedness and continuity conditions:

(C1) $c(x, y) = 0$ if $x = y$;
(C2) $c(x, y) \le \lambda(x) + \lambda(y)$, $\forall x, y$, for some $\lambda : U \to \mathbb{R}_+$ mapping bounded sets into bounded sets;
(C3) $\sup\{c(x, y);\ x, y \in B_\varepsilon(a),\ d(x, y) \le \delta\} \to 0$ as $\delta \to 0$ for each $a \in U$, $B_\varepsilon(a)$ the $\varepsilon$-ball with center $a$.

Defining

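$$\|f\|_c := \sup\left\{ \frac{|f(x) - f(y)|}{c(x, y)};\ x \ne y \right\}$$

for $f : U \to \mathbb{R}$, the following strengthened representation holds:

$$\begin{aligned}
\overset{\circ}{\mu}_c(P_1, P_2) &= \sup\left\{ \int f\, d(P_1 - P_2);\ \|f\|_c \le 1 \right\} \\
&= \sup\left\{ \int f\, d(P_1 - P_2);\ f(x) - f(y) \le c(x, y),\ \forall x, y \right\},
\end{aligned} \tag{1.1.30}$$

assuming $\int |x|\, dP_i(x) < \infty$, $i = 1, 2$. While obviously, in general,

$$\overset{\circ}{\mu}_c(P_1, P_2) \le \hat{\mu}_c(P_1, P_2),$$

it follows that for $c(x, y) = d(x, y)$, $\hat{\mu}_d(P_1, P_2) = \overset{\circ}{\mu}_d(P_1, P_2)$. The cost function

$$c_p(x, y) = d(x, y) \max\left(1, |x - a|^{p-1}, |y - a|^{p-1}\right), \qquad p \ge 1,\ x, y \in \mathbb{R},$$

satisfies (C1)–(C3). From the above dual representation for $\overset{\circ}{\mu}_c$ one obtains the explicit representation

$$\overset{\circ}{\mu}_{c_p}(P_1, P_2) = \int_{-\infty}^{\infty} \max\left(1, |x - a|^{p-1}\right) |F_1(x) - F_2(x)|\, dx, \tag{1.1.31}$$

where the $F_i$ are the distribution functions (d.f.s) of $P_i$. Except for $p = 1$, an optimal measure $Q^*$ satisfying

$$\int c_p(x, y)\, Q^*(dx, dy) = \overset{\circ}{\mu}_{c_p}(P_1, P_2)$$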


is not known. For $p \ge 1$, $\overset{\circ}{\mu}_{c_p}$ is identical to the Fortet–Mourier metric (cf. Rachev (1991a))

$$FM_p(P_1, P_2) = \sup\left\{ \int_U f\, d(P_1 - P_2);\ f \in C^p \right\}, \tag{1.1.32}$$

where

$$C^p = \left\{ g : U \to \mathbb{R}^1;\ \sup_{r \ge 1} r^{1-p} \sup\left\{ \frac{|g(x) - g(y)|}{d(x, y)};\ x \ne y,\ d(x, a) \le r,\ d(y, a) \le r \right\} \le 1 \right\}.$$

Inequalities between the $L_p$-minimal metric $\ell_p$ (1.1.10), the Fortet–Mourier metric $\overset{\circ}{\mu}_{c_p}$, and other metrics on $\mathcal{P}(U)$ are studied in Rachev (1991c). In particular, for any $P_0 \in \mathcal{P}(U)$, $D(P_0, \ell_p) := \{P \in \mathcal{P}(U);\ \ell_p(P, P_0) < \infty\}$ is $\ell_p$-complete, and for $P_0 = \delta_a$ the following convergence criterion holds in $D(\delta_a, \ell_p)$:

$$\ell_p(P_n, P) \to 0 \iff \overset{\circ}{\mu}_{c_p}(P_n, P) \to 0 \iff P_n \to P \text{ (weakly) and } \int d^p(x, a)(P_n - P)(dx) \to 0. \tag{1.1.33}$$

For $P_0 \ne \delta_a$, no corresponding characterization is known.

In the one-dimensional case the explicit representation of the Kantorovich–Rubinstein functional in (1.1.31) has been extended to the following general result in Rachev and Rüschendorf (1995); see also Chapter 3.

Theorem 1.1.8 Assume that $c(x, y) = |x - y|\,\xi(x, y)$, $x, y \in \mathbb{R}^1$, and that for $x < t < y$, $\xi(t, t) \le \xi(x, y)$ holds. Furthermore, assume that $\xi(x, y)$ is symmetric, continuous on the diagonal, and $t \to \xi(t, t)$ is locally bounded. Then, under the conditions of the duality theorem,

$$\overset{\circ}{\mu}_c(P_1, P_2) = \int \xi(t, t)\, |F_1(t) - F_2(t)|\, dt. \tag{1.1.34}$$

It is interesting to note that the solution (1.1.34) depends only on the behavior of the cost function on the diagonal. There are indications that an exact optimal transshipment plan does not exist for this kind of problem (cf. Beneš (1992, 1995)). In the multivariate case an analogous explicit result has been obtained in Levin (1990) for the case of differentiable cost functions; see Chapter 5.
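The one-dimensional formula (1.1.34) is easy to evaluate numerically. The sketch below is only an illustration under stated assumptions: both measures are finitely supported and passed as dictionaries mapping a point to its mass, the diagonal weight $t \mapsto \xi(t, t)$ is supplied as a callable, and the integral over each interval between consecutive atoms is approximated by a midpoint rule. The function name `kr_functional_1d` and the parameter `steps` are hypothetical.

```python
def kr_functional_1d(atoms1, atoms2, xi_diag=lambda t: 1.0, steps=1000):
    """Approximate the integral of xi(t, t) * |F1(t) - F2(t)| over the real line.

    atoms1, atoms2: dicts mapping a point to its probability mass.
    xi_diag: the diagonal weight t -> xi(t, t); the default 1.0 gives c = |x - y|.
    """
    points = sorted(set(atoms1) | set(atoms2))
    total, f1, f2 = 0.0, 0.0, 0.0
    for left, right in zip(points, points[1:]):
        # F1 and F2 are constant on [left, right); update them at the left end.
        f1 += atoms1.get(left, 0.0)
        f2 += atoms2.get(left, 0.0)
        # Midpoint rule for the integral of xi(t, t) over [left, right].
        h = (right - left) / steps
        weight = sum(xi_diag(left + (k + 0.5) * h) for k in range(steps)) * h
        total += weight * abs(f1 - f2)
    return total


# c(x, y) = |x - y|: mass 1/2 at 0 and at 2 versus a unit mass at 1.
value_d = kr_functional_1d({0: 0.5, 2: 0.5}, {1: 1.0})

# Diagonal weight of c_p from (1.1.31) with p = 2 and a = 0: max(1, |t|).
value_c2 = kr_functional_1d({0: 1.0}, {2: 1.0}, xi_diag=lambda t: max(1.0, abs(t)))
```

Outside the smallest interval containing all atoms the two distribution functions coincide, so only the intervals between consecutive atoms contribute.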

Theorem 1.1.9 Suppose that $U$ is a domain in $\mathbb{R}^n$ and $c : U \times U \to \mathbb{R}^1$ a bounded function with analytic level sets $\{c \le \alpha\}$, $c(x, x) = 0$, $\forall x \in U$, $c$ being continuously differentiable in some open neighborhood of the diagonal. If $\overset{\circ}{\mu}_c(P_1, P_2) > -\infty$, then

$$\overset{\circ}{\mu}_c(P_1, P_2) = \int u_0\, d(P_1 - P_2) \tag{1.1.35}$$

with

$$u_0(x) = \int_{\gamma(x_0, x)} \operatorname{grad}_\xi c(\xi, \eta)\big|_{\eta = \xi}\, d\xi,$$

where $\gamma$ is a piecewise smooth curve from $x_0$ to $x$.

Again the optimal value depends only on the gradient $\operatorname{grad}_\xi c(\xi, \eta)$ of the cost function $c = c(\xi, \eta)$ at the diagonal $\eta = \xi$. The differentiability of $c$ at the diagonal is crucial for the derivation of this result. It excludes the important case $c(x, y) = \|x - y\|$. In Rachev and Rüschendorf (1994) the following upper bound for the transportation cost has been found.

Theorem 1.1.10 Let $c_p(x, y) = \|x - y\|_p = \left( \sum |x_i - y_i|^p \right)^{1/p}$, $x, y \in \mathbb{R}^n$.

(a) Then for probability measures $P_1$, $P_2$ with Lebesgue densities $f$, $g$,

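$$\overset{\circ}{\mu}_{c_p}(P_1, P_2) \le \int_{\mathbb{R}^n} \|y\|_p\, |I_H(y)|\, dy \tag{1.1.36}$$

holds, with $h := f - g$ and $I_H(y) := \int_0^1 t^{-(n+1)}\, h\!\left(\frac{y}{t}\right) dt$;

(b) If there exists a continuous function $g : \mathbb{R}^n \to \mathbb{R}^1$ almost everywhere differentiable and satisfying

$$\nabla g(y) = \begin{cases}
\big( \operatorname{sgn}(y_i I_H(y)) \big) \text{ a.s.} & \text{for } p = 1, \\[4pt]
\left( \operatorname{sgn}(y_i I_H(y)) \left( \dfrac{|y_i|}{\|y\|_q} \right)^{q/p} \right) & \text{for } p > 1,
\end{cases} \tag{1.1.37}$$

then in (1.1.36) the equality holds.

Condition (1.1.37) is fulfilled in dimension one. A simple sufficient condition in the case $p = 1$ for (1.1.37) is $I_H \ge 0$ a.s., which is a stochastic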


ordering condition. From the derivation it seems that the bound in (1.1.36) in the case $p = 2$ should be much sharper than the classical inequality

$$\overset{\circ}{\mu}_{c_2}(P_1, P_2) \le 4 \int \|y\|_2\, |f(y) - g(y)|\, dy \tag{1.1.38}$$

due to Zolotarev (1975).

Remark 1.1.11 As we have mentioned in the preface, explicit solutions of the MKP in particular settings had already been given at the beginning of the twentieth century, seeking the proper way to measure the difference between random quantities. Gini's (1914) "simple index of dissimilarity" coincides with $\ell_1$ when $S = \mathbb{R}$; see Rachev (1991c) and the references therein. Various authors independently considered the question of finding explicit expressions for $\hat{\mu}_c$ on $S = \mathbb{R}$. Some general results were obtained by Cambanis, Simons, and Stout (1976), Tchen (1980), Rüschendorf (1980), and Cambanis and Simons (1982). The case $S = \mathbb{R}^k$ was considered by Knott and Smith (1984, 1992), Rüschendorf and Rachev (1990), Rüschendorf (1991, 1995), and Gangbo and McCann (1996).

Explicit representations for $\overset{\circ}{\mu}_c$ where $S = \mathbb{R}$ were obtained by Dudley (1976) when $c = d$, and by Rachev (1984b) when $c$ is given by (1.1.31). Kuznezova and Rachev (1989) derived explicit solutions for the MTPP (1.1.5), (1.1.6) with $n = 1$, $S$ a Banach space, $c(x, y) = \|x - y\|^p$, $p > 0$, $f_i(x) = \|x\|^q$, $q > 0$, $i = 1, 2$. In the discrete case a greedy algorithm for determining the optimal measure $P^*$ in the MTPA (1.1.18) was proposed in Barnes and Hoffman (1985). Hoffman and Veinott (1990), as well as Olkin and Rachev (1991), have studied the discrete MTPA with more general constraints.

Extremal Points. An important concept in the study of optimization problems is that of extremal points. If $H$ is a bounded convex set in a locally convex topological vector space $D$, we denote by $\operatorname{ex} H$ the subset of its extremal points. For example, the bounds of the left- and right-hand sides of (1.1.1) are extremal points in the set of distribution functions corresponding to probability measures (1.1.2). In many situations the extremal points are rich enough to allow a representation of the points in $H$ in the sense of the Krein–Milman and Choquet (1959) theorem. In this case, the following holds for a convex function $g$ on $E$:

$$\sup_{x \in H} g(x) = \sup_{x \in \operatorname{ex} H} g(x). \tag{1.1.39}$$

This relation may provide a substantial reduction in effort for the solution of optimization problems. In probability theory the following theorem by Weizsäcker and Winkler (1980) ensures the validity of the maximum principle (1.1.39) for the set of solutions of a mass transportation problem. Let $S$ be a Polish space, $\mathcal{P}$ a convex weakly closed set of probability measures on $S$, $t_j : S \to S$, $g_j : S \to \mathbb{R}$ Borel measurable maps, and

$$H = \left\{ P \in \mathcal{P};\ P t_j^{-1} = P,\ \int g_j\, dP = 0,\ j \in \mathbb{N} \right\}. \tag{1.1.40}$$

Then for every $P \in H$ there is a probability measure $m$ supported by $\operatorname{ex} H$ that represents $P$ in the sense $P(B) = \int_{\operatorname{ex} H} Q(B)\, m(dQ)$. The measure $m$ is defined on the σ-algebra of subsets of $\operatorname{ex} H$ that makes $Q \to Q(B)$, $B$ a Borel subset of $S$, measurable. Moreover, (1.1.39) is valid for any $g : H \to \mathbb{R}$ that is bounded, convex, and continuous with respect to the weak topology in $H$.

The assertion (1.1.39) emphasizes the role of extremal points. A general approach to their investigation is to be found in Stepan (1977). Let $(S, \mathcal{B}(S))$ be a topological space and its Borel σ-algebra; let $C(S)$ be the space of continuous bounded functions on $S$; and let $A \subset C(S)$ be a linear set containing all constant functions. The set $A$ defines an equivalence $\sim$ on the set $M(S) = M$ of Borel Radon probability measures on $S$ by

$$P \sim Q \quad \text{if } P(a) = Q(a) \text{ for all } a \in A \text{ and } P, Q \in M, \tag{1.1.41}$$

where $P(a) = \int_S a\, dP$. The equivalence classes $[P] = \{Q \in M;\ P \sim Q\}$,

$P \in M$, are nonempty convex and weakly closed subsets in the space $M$. Therefore, the integral representation property from the above theorem holds, and the validity of (1.1.39) follows. The theorem of Douglas (1964) states that $P \in M$ is an extremal point of its equivalence class if and only if the set $A$ is dense in the space of integrable random variables $L_1(P)$. A more geometric characterization of extremal points may be obtained using the following concept, first used in the discrete situation by Letac (1966): $D \subset \mathcal{B}(S)$ is a set of uniqueness (with respect to $A$) if for $P, Q \in M$,

$$P(D) = Q(D) = 1, \quad P \sim Q \ \Rightarrow\ P = Q. \tag{1.1.42}$$

This implies that the set $A$ separates the measures from $M$ that are equivalent and supported by $D$. Denote by $\operatorname{supp} P$ the support of $P$, i.e., the smallest closed set having $P$-mass equal to 1, and let $S$ be metrizable and $A$ defined as above. If $P \in M$ is an extremal point in $[P]$, then $\operatorname{supp} P = \bigcup_{n=1}^{\infty} K_n$ for some nondecreasing sequence $\{K_n\}$ of compact sets of uniqueness with

$$P\left( \bigcup_{n=1}^{\infty} K_n \right) = 1; \tag{1.1.43}$$

see Stepan (1979). Moreover, if $A$ is a sublattice in $C(S)$, then (1.1.43) is a sufficient condition for $P$ to be an extremal point. A major step in characterizing extremal solutions in the MTP is obviously the description of their support. The Stepan theorem states that the support is, in the sense of (1.1.43), built of sets of uniqueness. Even if $A$ is very rarely a sublattice, the knowledge of sets of uniqueness provides substantial help in the investigation of extremal solutions; see also Beneš (1986), Beneš and Stepan (1991), Beneš (1992).

Topological properties. Kantorovich and Rubinstein (1958) showed that $\ell_1 = \hat{\mu}_d$ induces the topology of weak convergence when $(S, d)$ is a compact metric space. Dudley (1966, 1968, 1976) considered (in essence) $\ell_1$ on a separable metric space $(S, d)$ where $d$ is bounded and proved that $\ell_1$-convergence is equivalent to weak convergence. Dudley (1966) and Dobrushin (1970), as well as de Acosta (1982), found sufficient conditions for $\ell_1$-convergence in a general separable metric space $(S, d)$. Rachev (1982a, 1984b) extended Dudley's and Dobrushin's results by considering general criteria for $\hat{\mu}_c$-convergence where $c(x, y) = H(d(x, y))$ and $H$ is a polynomial with nonnegative coefficients.

1.2 Specially Structured Transportation Problems

The transportation problem, its specialization to the assignment problem, and generalization to network flow problems are basic points of departure for linear programming and combinatorial optimization; see Gale (1960), Berge and Ghouila-Houri (1965), Anderson and Nash (1987), and Chapter 7.

The Transportation Problem (TP): Given finite sets $M = \{1, \ldots, m\}$, $N = \{1, \ldots, n\}$, real numbers $a = (a(i)) > 0$, $i \in M$, $b = (b(j))$, $j \in N$, satisfying $a(M) = b(N)$ (in general, $x(I, J)$ is defined to mean $\sum\{x(i, j);\ i \in I,\ j \in J\}$), and $c = (c(i, j))$, $(i, j) \in M \times N$, find $x = (x(i, j))$ to

$$\text{minimize } \left\{ \sum_{M \times N} c(i, j)\, x(i, j);\ x(i, N) = a(i),\ x(M, j) = b(j),\ x(i, j) \ge 0,\ i \in M,\ j \in N \right\}. \tag{1.2.1}$$

Scaling the problem so that $a(M) = b(N) = 1$ yields the discrete version of the MKP in probability. The costs $c$ are said to satisfy the Monge condition if

$$c(i, j) + c(i + 1, j + 1) \le c(i, j + 1) + c(i + 1, j) \quad \text{for all } (i, j) \in M \times N. \tag{1.2.2}$$

Hoffman (1961) pointed out that if a TP satisfies the Monge condition, then a "greedy" recursion, the so-called "northwest corner rule," produces an optimal solution to (1.2.1). The optimal solution $x^*$ is found by first setting $x^*(1, 1) = \min(a(1), b(1))$ and reducing the problem to one having one row fewer if $a(1) \le b(1)$ and one column fewer if $a(1) \ge b(1)$, and then repeating the procedure by choosing the maximum possible value for the variable $x(i, j)$ in the northwest corner of the reduced problem. A host of problems amenable to a simple greedy type method were shown by Hoffman to be special cases of this Monge condition (which is a translation of his "no intersection" remark).

In probability theory the optimal solution $x^*$ is known as the Hoeffding $H$ distribution (see Cambanis, Simons, and Stout (1976), Tchen (1980), Rüschendorf (1980), Rachev (1984c)). Namely, $x^*(i, j) = P(X^* = i, Y^* = j)$, where $(X^*, Y^*)$ is the pair of random variables that solves the MKP:

$$\inf\left\{ E\,c(X, Y);\ \text{over all joint distribution functions } F_{X,Y} \text{ of } X \text{ and } Y \text{ with fixed marginals } F(x) = \sum_{i \le x} a(i),\ G(x) = \sum_{i \le x} b(i) \right\}.$$

The optimal joint distribution is given by $F_{X^*, Y^*}(x, y) = H(x, y) = \min(F(x), G(y))$ as defined in the right-hand side of (1.1.14).

The range of application of the Monge condition is wider. Barnes and Hoffman (1985) considered a generalized TP in which additional "capacity" constraints

$$\sum_{i=1}^{r} \sum_{j=1}^{s} x(i, j) \le \gamma(r, s), \qquad r = 1, \ldots, m - 1,\ s = 1, \ldots, n - 1, \tag{1.2.3}$$

are imposed for a given set of upper bounds $\gamma$ on the "leading rectangles" and showed that if $c$ satisfies the Monge condition and $\gamma$ some additional conditions, a greedy algorithm again finds a solution. In another direction, Derigs, Goecke, and Schrader (1986) developed an algorithm for solving any assignment problem by successive transformation into an equivalent problem where $c(i, j) = \infty$ for $|i - j| \ge 2$ (see Szwarc and Posner (1984)).

The dual transportation problem (DTP). Given the data of the TP, find $u = (u(i))_{i \in M}$ and $v = (v(j))_{j \in N}$ to

$$\text{maximize } \{ u(M) + v(N);\ u(i) + v(j) \le c(i, j),\ i \in M,\ j \in N \}. \tag{1.2.4}$$

The primal polyhedron PP is the set of feasible solutions to the TP (1.2.1), and the dual polyhedron DP is the set of feasible solutions to the DTP (1.2.4) with the normalization $u(1) = 0$. The extreme points of the dual polyhedron may be given a completely combinatorial description in the following terms. If $(u, v) \in DP$, define its graph $G(u, v)$ to consist of the nodes $M \cup N$ and edges $(i, j) \in M \times N$ if $u(i) + v(j) = c(i, j)$. Then $(u, v) \in DP$ is an extreme point if and only if $G(u, v)$ is connected (in the generic, nondegenerate case, $G(u, v)$ is a spanning tree). If $G$ is a spanning tree on $M \cup N$, its $M$-signature is the function $d$ that assigns to each node $i \in M$ its degree in $G$. The function $d$ must satisfy

$$d(i) \ge 1,\ i \in M, \quad \text{and} \quad d(M) = m + n - 1. \tag{1.2.5}$$

The main result of Balinski (1984) (and Balinski and Russakoff (1984) in the nondegenerate case) is that every function d satisfying (1.2.5) is the signature of a spanning tree contained in the graph of exactly one extreme point of DP. In the nondegenerate case this states that there is a one-to-one correspondence between extreme points and functions d. This permits the use of a special and particularly simple approach for solving several specially structured transportation problems, called “signature algorithms.” The basic algorithm (Balinski (1984)) shows that one may go from any extreme point of DP, that is, from a spanning tree with some signature, to the unique extreme point having any given signature in at most (m − 1)(n − 1) pivots. This immediately gives an algorithm for the assignment problem because every extreme point of the assignment problem polyhedron has signature d = (1, 2, . . . , 2). Thus the assignment problem is an instance where one knows the signature of an optimal solution, so the basic algorithm can be used (as well as variants and improvements, for example, Balinski (1985), Goldfarb (1985), Balinski (1986)).
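The northwest corner rule for the TP (1.2.1) and the Monge condition (1.2.2) described earlier in this section can be sketched in a few lines. This is a minimal illustration, not taken from the text: the function names `is_monge` and `northwest_corner` are hypothetical, indices are 0-based, and the supplies and demands are assumed to be exact numbers (integers or fractions) with equal totals, so that the equality test against zero is reliable.

```python
def is_monge(c):
    """Check c(i,j) + c(i+1,j+1) <= c(i,j+1) + c(i+1,j) on all adjacent index pairs."""
    return all(
        c[i][j] + c[i + 1][j + 1] <= c[i][j + 1] + c[i + 1][j]
        for i in range(len(c) - 1)
        for j in range(len(c[0]) - 1)
    )


def northwest_corner(a, b):
    """Greedy plan: fill the current northwest corner with as much mass as possible."""
    a, b = list(a), list(b)  # remaining supply and demand
    x = [[0] * len(b) for _ in a]
    i = j = 0
    while i < len(a) and j < len(b):
        t = min(a[i], b[j])
        x[i][j] = t
        a[i] -= t
        b[j] -= t
        if a[i] == 0:
            i += 1  # row i is exhausted: drop it
        else:
            j += 1  # otherwise column j is exhausted: drop it
    return x


# Illustrative data: cost (i - j)^2 satisfies the Monge condition.
supply, demand = [3, 2, 4], [2, 5, 2]
cost = [[(i - j) ** 2 for j in range(3)] for i in range(3)]
plan = northwest_corner(supply, demand)
total = sum(cost[i][j] * plan[i][j] for i in range(3) for j in range(3))
```

Note that the rule never looks at the costs; the Monge condition is what guarantees, by Hoffman's result quoted above, that the resulting plan is optimal.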

1.3 Two Examples of the Interplay Between Continuous and Discrete MTPs

Multidimensional MKP. Let $\{P_i,\ i = 1, \ldots, N\}$ be a set of probability measures given on a separable metric space $(U, d)$ and let $M(P_1, \ldots, P_N)$ be the space of all Borel probability measures $P$ on the direct product $U^N$ with fixed projections $P_i$ on the $i$th coordinate, $i = 1, \ldots, N$. Evaluate

$$A_c(P_1, \ldots, P_N) = \inf\left\{ \int_{U^N} c\, dP;\ P \in M(P_1, \ldots, P_N) \right\}, \tag{1.3.1}$$

where $c$ is a given continuous function on $U^N$ (see Lorentz (1953), Schaefer (1976), Schay (1979), Rüschendorf (1979, 1981), Tchen (1980), and Rachev (1984b)). In the case $U = \mathbb{R}$ a special result obtains. A function $W : \mathbb{R}^2 \to \mathbb{R}$ satisfies the Monge condition if

$$W(x, y) + W(x', y') \le W(x, y') + W(x', y) \quad \text{for all } x \le x',\ y \le y', \tag{1.3.2}$$

and a function $W : \mathbb{R}^N \to \mathbb{R}$, $N \ge 2$, satisfies the generalized Monge condition if $W$ satisfies the Monge condition in any two of its arguments. Suppose the probability measure $P_i$ has the distribution function $F_i$, $i = 1, \ldots, N$, and define for $x = (x_1, \ldots, x_N)$, $H(x) = \min\{F_i(x_i);\ i = 1, \ldots, N\}$. Then the theorem of Lorentz (1953) (as generalized by Tchen (1980), Rüschendorf (1980)) states that for any continuous function $W$ satisfying the generalized Monge condition, the duality

$$A_W(P_1, \ldots, P_N) = \int_{\mathbb{R}^N} W\, dH \tag{1.3.3}$$

holds if $\sup\left\{ \int_{\mathbb{R}^N} |W|\, dP;\ P \in M(P_1, \ldots, P_N) \right\} < \infty$.

As a consequence one obtains a generalization of Hoffman's result for the discrete multidimensional MKP. Namely, let $I_j = \{1, \ldots, m_j\}$ for $j = 1, \ldots, k$, and $I = I_1 \times \cdots \times I_k$ with generic element $(i_1, \ldots, i_k) \in I$. For $J_j \subset I_j$ define $a(J_1, \ldots, J_k) = \sum\{a(i_1, \ldots, i_k);\ (i_1, \ldots, i_k) \in J_1 \times \cdots \times J_k\}$. The problem is to

$$\text{minimize } \sum c(i_1, \ldots, i_k)\, a(i_1, \ldots, i_k) \tag{1.3.4}$$

when $a(i_1, \ldots, i_k) \ge 0$ and $a(I_1, \ldots, I_{j-1}, i_j, I_{j+1}, \ldots, I_k) = a_j(i_j)$ for each $i_j \in I_j$ and all $j = 1, \ldots, k$.

This is a generalized transportation problem whose constraint polytope does not have integer-valued extreme points even if the data $a_j(i_j)$ are integers. The solution to (1.3.4) may be obtained by a "greedy" algorithm: the generalized northwest corner rule. The optimal solution $a^*$ is found by setting $a(1, 1, \ldots, 1) = \min_j(a_j(1))$ and reducing the problem to one having


the indices 1 eliminated from each Ij for which aj (1) is a minimum, and then repeating this step. This has a straightforward combinatorial proof. But, once in hand, it may be used to construct another proof of the Lorentz theorem. If instead of minimizing in (1.3.4), one wishes to maximize in the presence of the Monge condition for k = 2, then one chooses the “opposite” of the northwest rule: it is the northeast rule. For k ≥ 3, however, there is no well-defined opposite, so there is no natural candidate. Discrete and Continuous Assignment Games. The (discrete) assignment game (Shapley and Shubik (1972)) may be defined as follows. There are two sets of players, say men and women or firms and workers, M and N , and if i ∈ M and j ∈ N get together, they can jointly earn c(i, j) utilities. The question is, what partnerships should be formed and how should the parties split their joint earnings? A stable matching is a pairing of the players with the property that no two players not matched with each other could increase their joint utility by being matched. The core of the game is the set of all possible Pareto optimal ways in which the matched players of a stable matching can share their joint income. The core is a convex polytope, and it is of interest to know something of its shape and variety. Balinski and Gale (1990) have used the signature idea to characterize in game-theoretic terms when the core has the maximum possible number of extreme points and what that maximum number is, and when it has the minimum possible number and what that number is. Consider now a continuous version of the assignment game with two sets of players A ⊂ S and B ⊂ S, with payoff c(x, y) for each pair (x, y) ∈ S. In the spirit of continuous MKPs, let (S, d) be an arbitrary separable metric space and A and B measurable sets. An imputation is a pair of Lipschitz functions u : A → IR and v : B → IR. 
An imputation is feasible in the region $R \subset S \times S$ if $u(x) + v(y) = c(x, y)$ for all $(x, y) \in R$. An imputation is individually rational if $u \ge 0$, and it is stable if $u(x) + v(y) \ge c(x, y)$ for all $x \in A \subset S$, $y \in B \subset S$. This is simply the continuous analogue of the discrete case discussed above, and the two types of agents may be thought of as employee and employer, $c(x, y)$ the joint income they are to share if $x$ and $y$ work together, and $u$ and $v$ the way they divide their joint income. Let $P_1$ and $P_2$ be the distributions of the costs of living in the respective sets, assumed to be normalized, $P_1(A) = 1$, $P_2(B) = 1$, so consider $P_1$ and $P_2$ to be probability measures on $(S, d)$.

Problem: Find an imputation $(u^*, v^*)$ that is feasible, individually rational, and stable that minimizes the total costs

$$TC(u, v) = \int_A u(x)\, P_1(dx) + \int_B v(y)\, P_2(dy);$$

that is, letting $\nu_c(P_1, P_2) = \inf\{TC(u, v);\ (u, v) \in C(c)\}$, where $C(c)$ is the set of stable imputations, find $(u^*, v^*)$ for which $TC(u^*, v^*) = \nu_c(P_1, P_2)$. The duality theorem that links this assignment problem with MKPs implies that

$$\nu_c(P_1, P_2) = \sup\left\{ \int_{S \times S} c(x, y)\, P(dx, dy);\ \pi_i P = P_i,\ i = 1, 2 \right\}.$$

It was proved by Rachev (1984c) in the case of a separable metric space $(S, d)$, and $c(x, y) = H(d(x, y))$, where $H$ is an increasing convex function on $[0, \infty]$ that vanishes at 0 and satisfies the Orlicz condition $\sup_{t \ge 0} H(2t)/H(t) < \infty$ and $\int c(x, y)\, P_i(dx) < \infty$, $i = 1, 2$. Moreover, if the measures $P_1$ and $P_2$ are tight (e.g., if $(S, d)$ is complete), then there exists an optimal measure $P^*$ such that $\nu_c(P_1, P_2) = \int c(x, y)\, P^*(dx, dy)$, and so there exists a feasible imputation in a region $R$ with $P^*(R) = 1$. In the real case, $S = \mathbb{R}$, $d(x, y) = |x - y|$, the explicit representations of $\nu_c$, $P^*$ and $R$ are essentially given by Cambanis et al. (1976).

The problem may be generalized to a continuous multivariate game dealing with explicit and dual representations of a functional as follows:

$$\nu_c(P_1, \ldots, P_N) = \inf\left\{ \sum_{i=1}^{N} \int_{A_i} u_i\, dP_i;\ u_i(x_i) \ge 0,\ \sum_{i=1}^{N} u_i(x_i) \ge c(x_1, \ldots, x_N) \text{ for all } x_i \in A_i \subset S;\ i = 1, \ldots, N,\ P_i \text{ has support } A_i \right\};$$

$$\theta_c(P_1, \ldots, P_N) = \inf\left\{ \int_A u(x)\, (P_1 + \cdots + P_N)(dx);\ \sum_{i=1}^{N} u(x_i) \ge c(x_1, \ldots, x_N) \text{ for all } x_i \in A,\ i = 1, \ldots, N,\ P_i \text{ has support } A \right\};$$

$$M_c(p_1, \ldots, p_N, f_1, \ldots, f_N) = \inf\left\{ \sum_{i=1}^{N} \lambda_i p_i;\ \lambda_i \in \mathbb{R},\ \sum_{i=1}^{N} \lambda_i f_i(x) \ge c(x) \text{ for all } x = (x_1, \ldots, x_N) \in S^N \right\}.$$

The functional $\nu_c$ represents the minimal cost of $N$ players $x_1, \ldots, x_N$ with payoff function $c(x_1, \ldots, x_N)$. $\theta_c$ is a "refined" version of $\nu_c$ when the players share their income equally (see Rachev (1985) for the case $N = 2$). $\nu_c$ and $\theta_c$ are types of MKPs, whereas $M_c$ is related to the MTPP (see Kemperman (1972, 1983), Rachev (1985)). In $M_c$ the "shares" $u_i(x) = \lambda_i f_i(x)$ are proportional to fixed fractions $\lambda_i$, and $p_i$ may be viewed as the corresponding total costs of the $i$th player.
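The generalized northwest corner rule for the discrete multidimensional problem (1.3.4), described earlier in this section, can be sketched as follows. The code is an illustration under stated assumptions rather than a reference implementation: the function name `generalized_northwest_corner` is hypothetical, indices are 0-based, and the $k$ marginals are assumed to be exact numbers (integers or fractions) with equal totals. The plan is returned as a dictionary from index tuples to masses, listing only the cells that receive positive mass.

```python
def generalized_northwest_corner(marginals):
    """Greedy plan for k marginals: repeatedly load the current corner cell."""
    remaining = [list(m) for m in marginals]
    k = len(remaining)
    idx = [0] * k  # current corner (i_1, ..., i_k)
    plan = {}
    while all(idx[j] < len(remaining[j]) for j in range(k)):
        # Put as much mass as possible on the corner cell.
        t = min(remaining[j][idx[j]] for j in range(k))
        if t > 0:
            plan[tuple(idx)] = t
        for j in range(k):
            remaining[j][idx[j]] -= t
        # Eliminate the current index from every coordinate that is now exhausted.
        for j in range(k):
            if remaining[j][idx[j]] == 0:
                idx[j] += 1
    return plan


# Illustrative data: three marginals with total mass 3 each.
marginals = [[2, 1], [1, 2], [1, 1, 1]]
plan = generalized_northwest_corner(marginals)
```

As in the two-dimensional case, the rule uses only the marginals; it is the discrete counterpart of the distribution function $H(x) = \min\{F_i(x_i)\}$ appearing in the Lorentz theorem.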

1.4 Stochastic Applications

In Dudley (1976) and Rachev (1982a, 1984b, 1984c, 1991c) versions of the Glivenko–Cantelli theorem, functional limit theorems, and the problem of stability of queueing models are studied in terms of the functionals $\hat{\mu}_c$ and $\overset{\circ}{\mu}_c$. This approach is based on investigations of "metric" properties of $\hat{\mu}_c$ and $\overset{\circ}{\mu}_c$ and generalizes some results of Fortet and Mourier (1953), Samuel and Bachi (1964), Dudley (1968, 1972), and Shorack and Wellner (1986) concerning $\hat{\mu}_c$-convergence of empirical measures. Applied to functional limit theorems it also implies the Bernstein–Kantorovich invariance principle. The "minimal" properties of $\hat{\mu}_c$ are particularly useful in problems of stability of stochastic models (see Zolotarev (1977), Rachev (1984c)). From the dual and explicit representations for $\hat{\mu}_c$ and $\overset{\circ}{\mu}_c$ it is possible to construct stability estimates for the queueing model G|G|1|∞ that are uniform over the entire time axis (see Kalashnikov and Rachev (1985, 1986)). More general models are considered in the monograph of Kalashnikov and Rachev (1990). A series of alternative applications will be given in later chapters of this book. In this section we shall outline some of the stochastic applications of the MKP that will be considered in detail later on in the book; see Chapters 5–9.

Mass Transportation Problems and Probability Distances. MTPs can be reformulated in terms of probability distances and probability metrics (Rachev and Shortt (1989), Rachev (1991c)), which will, moreover, establish the mutual connections among the mass transportation problems studied. Let $\mathcal{P}_i$ be the space of all Borel probability measures on a product of $i$ copies of a separable metric space $(S, d)$. A map $\mu : \mathcal{P}_2 \to [0, \infty]$ is said to be a probability semidistance with real parameter $K_\mu \ge 1$ if it possesses the following three properties:

If $P \in \mathcal{P}_2$ and $P(\cup_{x \in S}\{(x, x)\}) = 1$, then $\mu(P) = 0$.

If $P \in \mathcal{P}_2$, then $\mu(P B^{-1}) = \mu(P)$, where $B$ is the symmetry operator $B(x, y) = (y, x)$.

If $P_{13}, P_{12}, P_{23} \in \mathcal{P}_2$ and there exists $Q \in \mathcal{P}_3$ such that $Q_{13} = P_{13}$, $Q_{12} = P_{12}$, $Q_{23} = P_{23}$, where $Q_{ij} \in \mathcal{P}_2$ are the corresponding marginals of $Q$, then $\mu(P_{13}) \le K_\mu[\mu(P_{12}) + \mu(P_{23})]$.

If $K_\mu = 1$, then $\mu$ is said to be a probability semimetric. The first condition may be strengthened to: If $P \in \mathcal{P}_2$, then $P(\cup_{x \in S}\{(x, x)\}) = 1$ if and only if $\mu(P) = 0$. In this case $\mu$ is called a probability distance, and a probability metric in the case $K_\mu = 1$.

Suppose that $(X, Y)$ is a pair of dependent random variables taking values on $(S, d)$ with joint law $L(X, Y)$, and set $\mu(X, Y) = \mu(L(X, Y))$. Denote the marginal distributions of $X$, $Y$ by $\tilde{P}_1$, $\tilde{P}_2$ respectively and consider the probability metric $\mu(X, Y) = E\,d(X, Y)$. Then the best possible improvement of the inequality $\mu(X, Y) = E\,d(X, Y) \ge E\,d(X, O) - E\,d(Y, O)$, under the following conditions:

B1: $E\,d(X, O) - E\,d(Y, O) = b_1 - b_2$, $b_1, b_2 \in \mathbb{R}$
B2: $E\,d(X, O) = b_1$, $E\,d(Y, O) = b_2$
B3: $\tilde{P}_1 - \tilde{P}_2 = P_1 - P_2$, $P_1, P_2 \in \mathcal{P}_1$
B4: $\tilde{P}_1 = P_1$, $\tilde{P}_2 = P_2$
B5: $\tilde{P}_1 = P_1$, $\tilde{P}_2 = P_2$, $\nu(X, Y) < \alpha$ for a probability distance $\nu$

leads us to consider the following functionals:

B1: $\mu(b_1 - b_2) = \inf\{E\,d(X, Y);\ E\,d(X, O) - E\,d(Y, O) = b_1 - b_2\}$, $b_1, b_2 \in \mathbb{R}$
B2: $\mu(b_1, b_2) = \inf\{E\,d(X, Y);\ E\,d(X, O) = b_1,\ E\,d(Y, O) = b_2\}$
B3: $\overset{\circ}{\mu}(P_1, P_2) = \inf\{\alpha E\,d(X, Y);\ \text{for some } \alpha > 0 \text{ such that } \alpha(\tilde{P}_1 - \tilde{P}_2) = P_1 - P_2\}$, $P_1, P_2 \in \mathcal{P}_1$
B4: $\hat{\mu}(P_1, P_2) = \inf\{E\,d(X, Y);\ \tilde{P}_1 = P_1,\ \tilde{P}_2 = P_2\}$
B5: $\mu\nu(P_1, P_2, \alpha) = \inf\{E\,d(X, Y);\ \tilde{P}_1 = P_1,\ \tilde{P}_2 = P_2,\ \nu(X, Y) < \alpha\}$

The functionals B3, B4 are identical to KRP (1.1.3) and MKP (1.1.1) respectively. An obvious inequality holds: $\mu(b_1 - b_2) \le \overset{\circ}{\mu} \le \mu\nu \le \mu$. The functional B2 is a special case of the MTPP (1.1.5) with given first moments. Again a chain of inequalities holds: $\mu(b_1 - b_2) \le \mu(b_1, b_2) \le \hat{\mu} \le \mu$.
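A small numerical check may help to see how these lower bounds compare. The sketch below works on the real line with $d(x, y) = |x - y|$ and $O = 0$, for finitely supported laws given as dictionaries from points to masses. It uses the fact, from (1.1.31) with $p = 1$ together with $\hat{\mu}_d = \overset{\circ}{\mu}_d$, that the minimal value of $E\,d(X, Y)$ over all couplings equals $\int |F_1 - F_2|\, dt$. The function names are hypothetical, and the moment bound is taken in absolute value, which is an addition justified by the symmetry of $d$.

```python
def minimal_cost(p1, p2):
    """Integral of |F1 - F2| for finitely supported laws on the real line."""
    points = sorted(set(p1) | set(p2))
    total = f1 = f2 = 0.0
    for left, right in zip(points, points[1:]):
        f1 += p1.get(left, 0.0)
        f2 += p2.get(left, 0.0)
        total += abs(f1 - f2) * (right - left)
    return total


def independent_cost(p1, p2):
    """E|X - Y| when X and Y are independent with laws p1 and p2."""
    return sum(m1 * m2 * abs(x - y) for x, m1 in p1.items() for y, m2 in p2.items())


def moment_bound(p1, p2, origin=0.0):
    """|E d(X, O) - E d(Y, O)|, which depends on the first absolute moments only."""
    b1 = sum(m * abs(x - origin) for x, m in p1.items())
    b2 = sum(m * abs(y - origin) for y, m in p2.items())
    return abs(b1 - b2)


# Illustrative data: two-point laws shifted by one unit.
p1 = {0: 0.5, 2: 0.5}
p2 = {1: 0.5, 3: 0.5}
bound = moment_bound(p1, p2)
best = minimal_cost(p1, p2)
indep = independent_cost(p1, p2)
```

Here the moment bound and the minimal cost coincide, while the independent coupling, which is one particular joint law with the given marginals, is strictly more expensive.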


$\hat{\mu}_c$-convergence of probability measures. An interesting topological property of the Kantorovich functional $\hat{\mu}_c(P_1, P_2)$ (see (1.1.1)) arises from the fact that under suitable conditions on the cost function $c$, $\hat{\mu}_c(P_1, P_2)$ induces a distance equivalent to the convergence in distribution together with the convergence of integrals (see Theorem 1.4.1 below). For instance, if $H : [0, \infty) \to [0, \infty)$ is a continuous, nondecreasing function such that $H(0) = 0$ and $H$ satisfies Orlicz's condition

$$\sup_{t > 0} \frac{H(2t)}{H(t)} < \infty,$$

with $c(x, y) = H[d(x, y)]$, then $\hat{\mu}_c$ induces a distance between $P_1$ and $P_2$ that is in fact a minimal distance in the terminology of Rachev (1991c). An important particular case is when $H(t) = t^r$, $r > 0$, because then

$$\ell_r(P_1, P_2) := \left[ \hat{\mu}_c(P_1, P_2) \right]^{r^*}, \qquad r^* = \min\{1, 1/r\}, \tag{1.4.1}$$

is a metric on the space of probability measures with finite moment of $r$th order. When $c(x, y) = H[d(x, y)]$, with $H$ as above, the following result relates the convergence to zero of $\{\hat{\mu}_c(P_n, P)\}$ to weak and moment convergence.

Theorem 1.4.1 Let $\{P_n,\ n \ge 0\}$ be probability measures on the Borel σ-algebra on $U$. Assume that

$$\int_U c(x, a)\, P_n(dx) < \infty, \quad n = 0, 1, \ldots, \text{ for some } a \in U.$$

Then $\{\hat{\mu}_c(P_0, P_n)\}$ converges to zero if and only if $\{P_n\}$ converges weakly to $P_0$ and

$$\lim_n \int_U c(x, b)\, (P_n - P_0)(dx) = 0$$

for some (and therefore for any) $b \in U$.

This result was proved in Rachev (1982a, 1989) and in the special cases $H(t) = t^r$, $r \ge 1$, in Bickel and Freedman (1981), Rachev (1982b), Zolotarev (1975); for the case $r \ge 1$ for a bounded metric $d$ in Huber (1981); and for $r = 2$, $U = \mathbb{R}$ with the usual distance in Mallows (1972). These metrics have been employed in several ways in the literature, and many applications appear in Rachev (1991). We later discuss some more recent applications. Among them the following:

• In Gelbrich (1995) and Rachev and Rüschendorf (1995), metrics $\ell_r$, $r > 0$, have been used to measure the order of convergence of a sequence of approximations to the solution of a stochastic differential equation.

• In Römisch (1980, 1981) the metric $d_2$ is employed to measure the stability of a stochastic programming model with respect to the underlying distribution. This situation is relevant when the underlying distribution is not known exactly.

An interesting metric related to $d_r$ is the total variation metric:

$$\sigma(P_1, P_2) := \sup_{A \in \mathcal{B}(U)} |P_1(A) - P_2(A)|,$$

where $\mathcal{B}(U)$ is the Borel σ-algebra in $U$. In fact, $\sigma$ can be viewed as the limiting case for $d_r$,

$$\sigma(P_1, P_2) = \lim_{r \to \infty} d_r(P_1, P_2),$$

which has a representation as the minimal metric given by

$$\sigma(P_1, P_2) = \inf_{X, Y} \mu[X \ne Y],$$

where $X$ and $Y$ are r.v.s with distributions $P_1$ and $P_2$ respectively.

We shall now briefly list some applications of topological properties of various metrics ("minimal," "ideal") to obtain some classical limit results in probability theory; see Chapters 8 and 9. Minimal metrics (like $\ell_r(P, Q)$) have been applied to prove various versions of the central limit theorem (see for example Cuesta-Albertos and Matrán (1989), Rachev (1991c), Rachev and Rüschendorf (1992, 1993), Tanaka (1973)). The proofs are based on regularity properties of the minimal metrics as well as on Theorem 1.4.1.

It is not surprising that these metrics allow us to obtain a simple proof of Mourier's strong law of large numbers (SLLN) in Banach spaces as well. Consider a sequence $\{X_n\}$ of $U$-valued, independent, identically distributed random elements with distribution $Q$ defined on the probability space $(\Omega, \mathcal{A}, \mu)$. Assume that $U$ is a separable Banach space with norm $\|\cdot\|$

and consider the function $c(x, y) = \|x - y\|$ as the cost of transportation. For $\omega \in \Omega$ let $P_n^\omega$ be the empirical probability measure allocating probability $1/n$ to each of the points $X_1(\omega), \ldots, X_n(\omega)$. If we assume that $E\|X_1\| < \infty$, then the SLLN for real random variables implies that

$$\int \|x\|\, P_n^\omega(dx) = \frac{1}{n} \sum_{i=1}^{n} \|X_i(\omega)\| \to \int \|x\|\, Q(dx).$$

Moreover, Varadarajan's extension of the Glivenko–Cantelli theorem (which only requires the SLLN for real, bounded r.v.s for its proof) states that the sequence of probabilities $\{P_n^\omega\}$ converges weakly to $Q$ a.s. with respect to $\mu$. By Theorem 1.4.1 we conclude that

$$\lim_n \ell_1(P_n^\omega, Q) = 0 \quad \text{a.s. with respect to } \mu, \tag{1.4.2}$$

a version of Varadarajan's theorem (cf. Rachev (1982a, 1982b), Cuesta-Albertos and Matrán (1992)). Finally, if $(U_n^\omega, V_n^\omega)$ is an OTP($c_1$) between $P_n^\omega$ and $Q$, this implies that

$$\left\| \frac{1}{n} \sum X_i(\omega) - E X_1 \right\| = \| E U_n^\omega - E V_n^\omega \| \le E \| U_n^\omega - V_n^\omega \| = \ell_1(P_n^\omega, Q), \tag{1.4.3}$$

and we have proved that the sequence $\left\{ \frac{1}{n} \sum X_i(\omega) \right\}$ converges a.s. with respect to $\mu$ to $E X_1$ (cf. Cuesta-Albertos and Matrán (1992)). For far-reaching extensions of this result we refer to Rachev (1991).

Moreover, the same kind of reasoning leads to results on the a.s. stability of sums of r.v.s in Banach spaces. For instance, consider the case of weighted sums. Assume that $U$ is a separable Banach space and that $\{X_k\}$ is a sequence of independent, integrable $U$-valued random elements with finite expectations. We seek conditions implying

$$\sum_{k \ge 1} a_{n,k}(X_k - E X_k) \xrightarrow{\text{a.s.}} 0. \tag{1.4.4}$$

Here $\{a_{n,k}\}$ is a Toeplitz sequence of real numbers; i.e.,

\[ \lim_n a_{n,k} = 0 \ \text{ for each } k \ge 1 \quad \text{and} \quad \sum_{k \ge 1} |a_{n,k}| \le C \ \text{ for each } n \ge 1. \]

Assume that C = 1, and reduce the problem to the case in which all weights are positive. Define $X_0 := 0$ and $a_{n,0} := 1 - \sum_{k \ge 1} a_{n,k}$. Consider the probability measures $Q_n = \sum_{k \ge 0} a_{n,k} P^{X_k}$ and $P_n(\omega, \cdot)$, n = 1, 2, . . . , allocating mass $a_{n,k}$ to the point $X_k(\omega)$, k = 0, 1, . . . . Argue as in the SLLN to obtain a variety of known results on almost sure stability of weighted sums in Banach spaces. This derivation does not require geometric conditions on the space but reduces the problem to the corresponding result for real r.v.s. In the same way one can also cover cases in which the weights are not constants but random, or in which they are given by linear operators. Also, one

can obtain results on further summation methods such as those of Cesàro, Abel, and others (see Cuesta-Albertos and Matrán (1992)). Central limit theorems for summability methods by means of ideal metrics have been considered in Maejima (1988), Rachev and Rüschendorf (1992). A generalization to operator-stable summation schemes is provided in Maejima and Rachev (1996); see also Chapter 8.

Simultaneous representations. Skorohod–Lebesgue spaces. Here we will mention some extensions of properties of the quantile functions to abstract spaces. For a probability measure P on the real line with distribution function F, the quantile function is defined as

\[ T_P(t) := \inf\{u;\ F(u) \ge t\}, \quad t \in (0, 1). \]

Two key properties of the quantile function are that $T_P(X)$ has distribution P if X is uniformly distributed on (0, 1) and that the mappings $T_P$ are typical examples of the Skorohod representation for weak convergence of probability measures. This is described in the following proposition.

Proposition 1.4.2 Let X be an r.v. uniformly distributed on [0, 1]. If $\{P_n\}$ is a sequence of probability measures defined on $\mathbb{R}$ which converges weakly to P, then $\{T_{P_n}(X)\}_n$ converges a.s. to $T_P(X)$.

Another important property is that quantile functions provide a simultaneous representation for the Kantorovich functional in the one-dimensional case: According to Proposition 1.1.2, if X is a fixed r.v. with uniform distribution on (0, 1), then for all probabilities $P_1$ and $P_2$,

\[ \mu_c(P_1, P_2) = E[c(T_{P_1}(X), T_{P_2}(X))]. \tag{1.4.5} \]
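The representation (1.4.5) can be checked numerically on small samples. The following is a minimal sketch, not part of the original text: it assumes two empirical measures with the same number of atoms and the quadratic cost $c(x, y) = (x - y)^2$, for which pairing order statistics is known to be optimal. The helper names (`quantile`, `quantile_coupling_cost`, `brute_force_cost`) are hypothetical and chosen for illustration only.

```python
import itertools
import math
import random


def quantile(sample, t):
    """Empirical quantile function T_P(t) = inf{u : F(u) >= t}, t in (0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    # F(xs[k]) = (k + 1) / n, so the smallest admissible index is ceil(t * n) - 1.
    k = math.ceil(t * n) - 1
    return xs[min(max(k, 0), n - 1)]


def quantile_coupling_cost(a, b, cost):
    """E[c(T_P1(X), T_P2(X))] for equal-size samples: pair the order statistics."""
    return sum(cost(x, y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def brute_force_cost(a, b, cost):
    """Minimum average cost over all one-to-one pairings (small n only)."""
    n = len(a)
    return min(
        sum(cost(a[i], b[perm[i]]) for i in range(n)) / n
        for perm in itertools.permutations(range(n))
    )


random.seed(0)
a = [random.gauss(0.0, 1.0) for _ in range(6)]
b = [random.expovariate(1.0) for _ in range(6)]
squared = lambda x, y: (x - y) ** 2

coupled = quantile_coupling_cost(a, b, squared)
optimal = brute_force_cost(a, b, squared)
print(coupled, optimal)  # the two values agree up to rounding
```

The brute-force search is only feasible for a handful of atoms; it is included to make the optimality of the quantile coupling visible rather than as a practical method.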

These properties have the following interesting application. Let $L_r(\mu)$ be the set of all real r.v.s X such that $\int |X|^r \, d\mu < \infty$ and let $H : L_r(\mu) \to \mathbb{R}$ be a functional that depends only on the distribution of the r.v.s; i.e., if $X, Y \in L_r(\mu)$ satisfy $P^X = P^Y$, then H(X) = H(Y). With a slight abuse of notation, we can write $H(P^X)$ instead of H(X). Assume that H is continuous with respect to the $L_r$-norm on $L_r(\mu)$. By employing the same notation as in the preceding section, we want to show strong pointwise consistency of $H(P_n^\omega)$,

\[ \lim_n H(P_n^\omega) = H(Q). \tag{1.4.6} \]

Taking into account that for every fixed ω, $T_{P_n^\omega}$ is an r.v. defined on the interval (0, 1) whose distribution is $P_n^\omega$ and considering as well the Glivenko–Cantelli theorem, Proposition 1.4.2, and the strong law of large numbers, we have that a.s. with respect to µ

\[ T_{P_n^\omega}(t) \to T_Q(t) \]

for almost all t in [0, 1], and

\[ \int |t|^r \, P_n^\omega(dt) \to \int |t|^r \, Q(dt). \]

Then applying Theorem 1.4.1, we get that

\[ \lim_n \int_0^1 |T_{P_n^\omega}(t) - T_Q(t)|^r \, dt = 0. \]


This gives the consistency result (1.4.6) by the continuity of H with respect to the $L_r$-norm. Von Mises functionals related to $L_r$-norms, as in the case of the r-means, are a typical example of the application of this method (see Cuesta-Albertos and Matrán (1988), Cuesta-Albertos and Matrán (1989)), which can be generalized to cover further functionals related to Orlicz spaces (see Landers and Rogge (1980)). The interesting point in this method is the fact that the r.v.s $T_Q$ and $T_{P_n}$, $n \in \mathbb{N}$, are defined on the same probability space, which allows the usual arguments for a result of this type to be simplified. Therefore, it would be interesting to have a simultaneous representation result as in (1.4.5) in more general spaces in order to obtain strong consistency for this kind of functional in these spaces. Regrettably, it is well known that this representation does not exist even in the two-dimensional case. In Cuesta-Albertos, Rüschendorf, and Tuero-Diaz (1993) it is shown that certain families of probability distributions (distributions with the same dependence structure) admit a simultaneous representation with respect to $d^2$-costs, but this is not enough to apply the arguments given above; see also Chapter 3.

To avoid this problem, the so-called Skorohod–Lebesgue spaces were introduced in Cuesta-Albertos and Matrán (1994). These spaces can be considered as a general (and minimal) framework to develop the previous scheme in abstract spaces. The idea is the following. As stated in (1.4.5), in the one-dimensional case, the quantile function provides simultaneous representations for OTPs and by Proposition 1.4.2 also gives a simultaneous Skorohod representation for weak convergence. In Blackwell and Dubins (1983) it is shown that Proposition 1.4.2 can be generalized to separable Banach spaces. Then for a separable Banach space U there exists a U-valued r.v. X such that for any probability measure Q on U, there exists a fixed function $T_Q$ with $T_Q(X) \stackrel{d}{=} Q$ and such that the weak convergence of $\{P_n\}$ to P implies $T_{P_n}(X) \to T_P(X)$ a.s.

These simultaneous Skorohod representations are not uniquely defined. But once one of them is fixed, a distance between $P_1$ and $P_2$ can be defined by

\[ SL_r(P_1, P_2) := \big( E\big[ \| T_{P_1}(X) - T_{P_2}(X) \|^r \big] \big)^{r^*}, \quad r^* = \min(1, 1/r), \]

for all probability measures $P_1$ and $P_2$ on U with $\int \|x\|^r \, dP_i < \infty$, i = 1, 2. If we denote by $M_r(U)$ the family of all probability measures on U with finite expectation of the rth power, then $(M_r(U), SL_r)$ is a separable metric space, which is called the Skorohod–Lebesgue space of order r. The $\ell_r$ metric is topologically equivalent to $SL_r$ (see Cuesta-Albertos and Matrán (1994)), and with this construction the arguments that were sketched for the one-dimensional case can be carried out in exactly the same way to prove the strong a.s. convergence of functionals of the form $H(P_n^\omega)$. This scheme includes the results related to almost sure stability of sums of r.v.s mentioned in the preceding section. A new application still to be developed is to prove that Skorohod–Lebesgue spaces provide a common framework for the comparison of all $\ell_r$-distances.

While in the one-dimensional case the Skorohod representation problem and the OTP share the same solution, namely the quantile function, for higher dimensions the solution is not the same. This can be seen from the fact that $C_r$-functionals do not admit simultaneous representations, while Skorohod representations can be chosen simultaneously. In Tuero-Diaz (1993) the conditions under which OTPs can be used to obtain Skorohod representations are analyzed. In other words, if $(X, T_n(X))$ is an OTP between P and $Q_n$, n = 0, 1, . . . , and if $\{Q_n\}$ converges weakly to P, when does $\{T_n\}$ converge to the identity? This problem is solved in the case of a separable Hilbert space U with the cost function $d^2$. If the space is finite-dimensional, then $\{T_n\}$ converges almost surely to the identity without additional assumptions. However, if the dimension is infinite, then this convergence is only in probability, and a counterexample for the a.s. convergence is readily constructed. Proposition 1.4.2 generalizes this result in the finite-dimensional case; for further extensions see Heinich and Lootgieter (1993).

Rate of convergence in the central limit theorem. An interesting application of the minimal $\ell_p$-metrics, defined as solutions of the transportation problems with respect to $c(x, y) = d^p(x, y)$ (see (1.1.10)), is to general forms of the central limit theorem and to the rate of convergence related to it. Recently some versions of these results have been found for martingales with values in separable Banach spaces U (cf. Rachev and Rüschendorf (1993), and see also Chapter 8).


We first explain the idea in the independent case. Let $(X_i)$ be i.i.d., U-valued random variables and define

\[ Z_n = n^{-1/\alpha} \sum_{i=1}^{n} X_i, \tag{1.4.7} \]

the normalized sequence, assuming that $X_i$ is centered at zero. Let ϑ be a (symmetric) α-stable random variable with values in U; i.e.,

\[ n^{-1/\alpha} \sum_{i=1}^{n} \vartheta_i \stackrel{d}{=} \vartheta, \]

where $(\vartheta_i)$ are i.i.d. copies of ϑ. We consider the convergence of $Z_n$ to ϑ with respect to the Kantorovich metric (the minimal $\ell_1$-metric identical to $d_1$):

\[ \ell_1(P_1, P_2) = \inf\Big\{ \int \|x - y\| \, dP(x, y);\ P \in M(P_1, P_2) \Big\}. \tag{1.4.8} \]

To formulate a rate of convergence theorem for $Z_n$, we introduce the following smoothed (of order r) version of $\ell_1$:

\[ \ell_r(P_1, P_2) := \sup_{h > 0} h^{r-1} \ell_1(X + h\vartheta, Y + h\vartheta), \tag{1.4.9} \]

and similarly, for the total variation metric σ, we define the smoothed metric

\[ \sigma_r(P_1, P_2) = \sup_{h > 0} h^{r} \sigma(X + h\vartheta, Y + h\vartheta), \tag{1.4.10} \]

where $\sigma(X, Y) := \sigma(P^X, P^Y)$; in (1.4.9) and (1.4.10) ϑ is assumed to be independent of X and Y. For the rate of convergence result we need the finiteness of the following distances: $\ell_1 := \ell_1(X_1, \vartheta)$, $\ell_r := \ell_r(X_1, \vartheta)$,

$\sigma := \sigma(X_1, \vartheta)$, and $\sigma_r := \sigma_r(X_1, \vartheta)$. We have the following theorem describing estimates of the correct order under the above conditions.

Theorem 1.4.3 (Rate of Convergence in the Stable Limit Theorem)(2) Let 1 ≤ α ≤ 2 and assume that
(a) $E\|\vartheta\| < \infty$,
(b) $\ell_1 + \ell_r + \sigma_1 + \sigma_r < \infty$ for some r > α.
Then

\[ \ell_1(Z_n, \vartheta) \le C \big( \ell_r \, n^{1 - r/\alpha} + \tau_r \, n^{-1/\alpha} \big), \tag{1.4.11} \]

where $\tau_r := \max(\ell_1, \sigma_1, \sigma_r^{1/(r-\alpha)})$.

(2) The stable limit theorem has attracted the attention of specialists in many areas of science and engineering; for application in finance, see Mittnik and Rachev (1997), Rachev and Rüschendorf (1994), and the references therein.

The proof of Theorem 1.4.3 is based on a generalization of the Bergström convolution method. It uses essentially the ideality properties of the metrics $\ell_r$, $\sigma_r$, for example $\ell_r(X + Z, Y + Z) \le \ell_r(X, Y)$ for Z independent of (X, Y) and $\ell_r(\alpha X, \alpha Y) = \alpha^r \ell_r(X, Y)$ for α > 0; also, by definition $\ell_r(X, Y) \ge h^{r-1} \ell_1(X + h\vartheta, Y + h\vartheta)$. Furthermore, basic ingredients of the proof are the following smoothing inequalities:

\[ \ell_1(X, Y) \le \ell_1(X + \varepsilon\vartheta, Y + \varepsilon\vartheta) + 2\varepsilon E\|\vartheta\| \tag{1.4.12} \]

and for X, Y, Z, W independent,

\[ \ell_1(X + Z, Y + Z) \le \ell_1(Z, W)\,\sigma(X, Y) + \ell_1(X + W, Y + W). \tag{1.4.13} \]

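These properties, together with $m = [\frac{n}{2}]$, yield the following decomposition:

\[
\begin{aligned}
\ell_1(Z_n, \vartheta) &\le \ell_1(Z_n + \varepsilon\vartheta, \vartheta_1 + \varepsilon\vartheta) + C \cdot \varepsilon \\
&\le \ell_1\Big( Z_n + \varepsilon\vartheta,\ \frac{\vartheta_1 + X_2 + \cdots + X_n}{n^{1/\alpha}} + \varepsilon\vartheta \Big) \\
&\quad + \sum_{j=1}^{m} \ell_1\Big( \frac{\vartheta_1 + \cdots + \vartheta_j + X_{j+1} + \cdots + X_n}{n^{1/\alpha}} + \varepsilon\vartheta,\ \frac{\vartheta_1 + \cdots + \vartheta_{j+1} + X_{j+2} + \cdots + X_n}{n^{1/\alpha}} + \varepsilon\vartheta \Big) \\
&\quad + \ell_1\Big( \frac{\vartheta_1 + \cdots + \vartheta_{m+1} + X_{m+2} + \cdots + X_n}{n^{1/\alpha}} + \varepsilon\vartheta,\ \vartheta_1 + \varepsilon\vartheta \Big) + C \cdot \varepsilon.
\end{aligned}
\]

The terms in this expression can be estimated by the metric properties above and by using induction on the number of terms (for details see Section 8.1). The finiteness condition has been established for several examples including certain stochastic processes. A similar approximation result has been given for martingales where the quantities in the bounds are replaced by distances involving the conditional distributions, as for example,

\[ \tau_r = \sup_j E\,\ell_r\big( P^{X_j \mid \mathcal{F}_{j-1}}, P^{\vartheta_j} \big), \tag{1.4.14} \]

where $(X_j, \mathcal{F}_j)$ is a martingale. For the proof, it is necessary to introduce $\mathcal{G}$-dependence metrics defined by

\[ \mu(X, Y / \mathcal{G}) = \sup_{V \in \mathcal{G}} \mu(X + V, Y + V), \tag{1.4.15} \]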

where µ is a metric and the supremum is over all $\mathcal{G}$-measurable r.v.s V, and to study the smoothing versions and the regularity properties of these metrics. In the one-dimensional case one obtains as a consequence a rate of convergence result for martingales with respect to the Prohorov distance.

Convergence of algorithms. The main approaches to the asymptotic analysis of algorithms in the literature deal with transformation methods (moment generating functions, Mellin transformations, etc.), the martingale method, the method of branching processes, and, for a more restricted class of stochastic algorithms, the method based on stochastic approximations. The analysis of algorithms is an important application of stochastics in computer science that poses difficult questions and problems; it has also led to some new developments in stochastic theory (cf. Aldous and Steele (1993) and the references therein). Based on the properties of minimal metrics introduced at the beginning of this chapter, a promising new method for asymptotic analysis has recently been introduced. Rösler (1991) provided an asymptotic analysis of the quicksort algorithm based on the minimal $\ell_p$-metric. His proof has been generalized by Rachev and Rüschendorf (1995) to a general "contraction method" with a wide range of possible applications. A series of examples and further developments of the method may be found in the recent work of Cramer (1995) and Cramer and Rüschendorf (1996b). The contraction method (in its basic form) uses the following sequence of steps; see Chapter 9 for details:

1. Find the correct normalization of the algorithms (typically by studying the first moments or tails).
2. Determine the recursion for the normalized algorithm.
3. Determine the limiting form of the normalized algorithms. The limiting equation is typically defined via a transformation T on the set of probability measures.
4. Choose an ideal metric µ such that T has good contraction properties with respect to µ. This ideal metric has to reflect the structure of the algorithm. It also has to have good bounds in terms of other interpretable metrics and must allow the estimation of bounds (usually in terms of moments). As a consequence one obtains the following result.
5. The conjectured limiting distribution is the unique fixed point of T. Finally, one should ensure that the recursion is stable enough for the contraction in the limit to be used to establish contraction properties of the recursion itself for n → ∞. This is technically the most involved step in the analysis.
6. Establish convergence of the algorithm to the fixed point.

Applications of this method to several sorting algorithms, to the communication resolution interval (CRI) algorithm, to generalized branching-type algorithms, to bootstrap estimators, to iterated function systems, and to learning algorithms have been given, as well as to other algorithms. We explain the contraction method for the example of the quicksort algorithm (cf. also Rösler (1991)). The defining recursion is given by

\[ L_n \stackrel{d}{=} n - 1 + L_{I_n - 1} + \bar{L}_{n - I_n}, \tag{1.4.16} \]

where $I_n$ is uniformly distributed on {1, . . . , n}, $L_n$ is the number of steps needed by the quicksort algorithm to sort n numbers, and $\bar{L}_n$ is an independent copy of $L_n$. The randomness in this problem arises from the assumption that the order of the numbers is uniform on the set of all permutations. A number is picked at random, and the other n − 1 elements are compared with this number and are divided into two groups, the group of smaller elements and the group of larger elements. It is easy to establish the asymptotics of the mean $\ell_n = EL_n$,

\[ \ell_n = 2n \log n + n(2\gamma - 4) + 2 \log n + 2\gamma + 1 + o(1), \]

where γ is Euler's constant. Also, it can be seen that $\mathrm{Var}(L_n) = cn^2 + o(n^2)$. Define the normalization

\[ Y_n = \frac{L_n - \ell_n}{n} \stackrel{d}{=} \frac{I_n - 1}{n}\, Y_{I_n - 1} + \frac{n - I_n}{n}\, \bar{Y}_{n - I_n} + c_n(I_n) \tag{1.4.17} \]

with $c_n(j) = \frac{n-1}{n} + \frac{1}{n}(\ell_{j-1} + \ell_{n-j} - \ell_n)$. Then taking $c(x) := 2x \log x + 2(1 - x) \log(1 - x) + 1$, the following inequality holds:

\[ \sup_{x \in (0,1)} |c_n([nx]) - c(x)| \le \frac{4}{n} \log n + O\Big(\frac{1}{n}\Big). \]

Since $I_n / n \to \tau$, a random variable uniformly distributed on (0, 1), one obtains the limiting fixed point equation

\[ Y \stackrel{d}{=} \tau Y + (1 - \tau)\bar{Y} + c(\tau). \tag{1.4.18} \]
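The quicksort recursion and its normalization can be explored numerically. The sketch below is illustrative only and rests on stated assumptions: it draws $L_n$ from the distributional recursion with subproblem sizes $I_n - 1$ and $n - I_n$ (rather than by sorting actual data), computes $\ell_n$ exactly from the mean recursion $\ell_n = n - 1 + \frac{2}{n}\sum_{j<n} \ell_j$, and compares it with the asymptotic expansion quoted above. All function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def exact_means(n_max):
    """l_n = E L_n from the mean recursion l_n = n - 1 + (2/n) * sum_{j<n} l_j."""
    means = [0.0] * (n_max + 1)
    running_sum = 0.0  # l_0 + ... + l_{n-1}
    for n in range(1, n_max + 1):
        means[n] = n - 1 + 2.0 * running_sum / n
        running_sum += means[n]
    return means


def asymptotic_mean(n):
    """Expansion 2n log n + n(2*gamma - 4) + 2 log n + 2*gamma + 1 (o(1) dropped)."""
    return (2 * n * math.log(n) + n * (2 * EULER_GAMMA - 4)
            + 2 * math.log(n) + 2 * EULER_GAMMA + 1)


def sample_comparisons(n, rng):
    """One draw of L_n, unrolling the recursion with an explicit stack."""
    total = 0
    stack = [n]
    while stack:
        size = stack.pop()
        if size <= 1:
            continue
        pivot_rank = rng.randint(1, size)  # I_n, uniform on {1, ..., size}
        total += size - 1                  # comparisons against the pivot
        stack.append(pivot_rank - 1)       # smaller elements
        stack.append(size - pivot_rank)    # larger elements
    return total


def normalized_sample(n, draws, rng):
    """Draws of Y_n = (L_n - l_n) / n."""
    mean_n = exact_means(n)[n]
    return [(sample_comparisons(n, rng) - mean_n) / n for _ in range(draws)]


rng = random.Random(0)
means = exact_means(1000)
ys = normalized_sample(200, 2000, rng)
print(means[1000] - asymptotic_mean(1000))  # small remainder term
print(sum(ys) / len(ys))                    # sample mean of Y_n, close to zero
```

A histogram of `ys` gives a rough picture of the limiting law Y; the sketch does not attempt to verify the contraction property itself.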

The right-hand side of (1.4.18) defines the transformation T on the set of all distributions (with finite pth moments and expectation zero). It is easy to establish that T is a contraction with respect to the minimal $\ell_p$-metric, with contraction factor smaller than 1. One can readily prove that $\ell_p(Y_n, Y) \to 0$, where Y is the unique solution of (1.4.18) with finite pth moment. The fixed point equation is not so easy to analyze, and an exact solution of it is still unknown. But it was recently found in Cramer (1995) that an extremely good approximation to the distribution of Y can be found in the class of lognormal distributions. Simulations of quicksort and the lognormal fit show hardly any difference, so that the lognormal can be used in practice; for more details see Sections 9.1–9.3.

Numerical approximation of stochastic differential equations. Here we briefly present numerical solutions of a multidimensional stochastic differential equation (SDE) following the results in Gelbrich (1990, 1995) and Gelbrich and Rachev (1996) and applying them in econometric models for asset returns. The method consists in determining the drift and diffusion coefficients at grid points and then combining the time discretization of the SDE with the discretization of the stochastic input (in our case the Wiener process); for details see Section 10.3.

We start with a description of the grid, which as we shall see will give us an "almost" optimal approximation of the SDE. On the interval $[t_0, T]$ define an equidistant grid H with points $t_0 = \bar{t}_0 < \bar{t}_1 < \cdots < \bar{t}_n = T$ with step size $\bar{h}$. H will be the minimal set of time points at which values are available for the method, and $\bar{h}$ will be the period between two neighboring observations in the past that influence the present drift and diffusion coefficients at any time. For any $t \in [t_0, T]$ we define $i_H(t) := \max\{i;\ \bar{t}_i \le t\}$ as the number of time steps $\bar{h}$ one can go back into the past from t. As we shall see, this is a standard framework in the so-called ARCH (GARCH) modeling of asset returns; see, for example, Mittnik and Rachev (1997) and the references therein.
We consider a stochastic differential equation (SDE) where the drift and diffusion depend on the present and the past states:

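\[
\begin{aligned}
x(t) - x_0 &= \int_{t_0}^{t} b(x, s) \, ds + \int_{t_0}^{t} \sigma(x, s) \, dw(s) \\
&= \int_{t_0}^{t} b(x, s) \, ds + \sum_{j=1}^{q} \int_{t_0}^{t} \sigma_j(x, s) \, dw_j(s)
\end{aligned}
\tag{1.4.19}
\]

$(t \in [t_0, T],\ x_0 \in \mathbb{R}^d)$.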

Here, $w = (w_1, \dots, w_q)$ is a q-dimensional standard Brownian motion, and we use the notations

\[
\begin{aligned}
b(x, s) &:= b^{i_H(s)}\big( x(s), x(s - \bar{h}), x(s - 2\bar{h}), \dots, x(s - i_H(s)\bar{h}) \big), \\
\sigma(x, s) &= (\sigma_1(x, s), \dots, \sigma_q(x, s)) := \sigma^{i_H(s)}\big( x(s), x(s - \bar{h}), x(s - 2\bar{h}), \dots, x(s - i_H(s)\bar{h}) \big)
\end{aligned}
\]

with $b^\nu \in C(\mathbb{R}^{(\nu+1)d}; \mathbb{R}^d)$ and $\sigma^\nu \in C(\mathbb{R}^{(\nu+1)d}; L(\mathbb{R}^q; \mathbb{R}^d))$, $\nu = 0, \dots, i_H(T)$, where $\sigma_j^\nu \in C(\mathbb{R}^{(\nu+1)d}; \mathbb{R}^d)$, j = 1, . . . , q, denote the columns of the matrix function $\sigma^\nu = (\sigma_1^\nu, \dots, \sigma_q^\nu)$. As usual, we denote by C spaces of continuous functions, by L spaces of linear mappings, and by $\|\cdot\|$ the Euclidean norm on $\mathbb{R}^n$ ($n \in \mathbb{N}$) and the corresponding induced norm on L.

For a random variable ζ on a probability space (Ω, A, P) having values in a separable metric space (X, d) with the Borel σ-algebra B(X), the notation D(ζ) means the distribution $P \circ \zeta^{-1}$ of ζ. P(X) is the set of all Borel probability measures (probabilities) on X. For p ∈ [1, ∞) we define on the set

\[ M_p(X) := \Big\{ \mu \in P(X);\ \int_X d(x, \theta)^p \, d\mu(x) < \infty,\ \theta \in X \Big\} \]

a metric $W_p$ by

\[ W_p(\mu, \nu) := \Big[ \inf \int_{X \times X} d(x, y)^p \, d\eta(x, y) \Big]^{1/p} \quad (\mu, \nu \in M_p(X)), \]

where the infimum is taken over all measures η ∈ P(X × X) with marginal distributions µ and ν. $W_p = \ell_p$ is the $L_p$-Wasserstein metric or $L_p$-Kantorovich metric (see (1.1.10)). We shall state the convergence results for a sequence of approximations to the solution x of (1.4.19) in terms of $W_p$, which is the "ideal" metric for this type of approximation problem.

The approximate solution to x in (1.4.19) can be viewed as a framework for studying asset pricing models known in the econometrics literature as autoregressive conditional heteroscedasticity (ARCH) or generalized ARCH (GARCH) models. We give a brief description of these models. Consider an equidistant grid $t_0 = \bar{t}_0 < \bar{t}_1 < \cdots < \bar{t}_n = T$ with step size $\bar{h}$ on the time interval $[t_0, T]$. A univariate ARCH model is defined as a discrete time stochastic process $(\varepsilon_{\bar{t}_i})$, i = 0, . . . , n, of the form

\[ \varepsilon_{\bar{t}_{i+1}} = \sigma_{\bar{t}_i} \, \delta_{\bar{t}_i}, \]


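where $\sigma_{\bar{t}_i}$ is a positive measurable function of the time points $\bar{t}_0, \bar{t}_1, \dots, \bar{t}_i$ and the $\delta_{\bar{t}_i}$ are i.i.d. r.v.s with zero mean and variance one. In the linear ARCH(ψ) the variances $\sigma_{\bar{t}_i}^2$ depend on the squares of the past ψ values of the process,

\[ \sigma_{\bar{t}_i}^2 := \omega + \sum_{r=0}^{\psi - 1} \alpha_r \, \varepsilon_{\bar{t}_{i-r}}^2, \]

whereas in the more general linear GARCH(φ, ψ) they may also depend on the φ past variances,

\[ \sigma_{\bar{t}_i}^2 := \omega + \sum_{r=0}^{\psi - 1} \alpha_r \, \varepsilon_{\bar{t}_{i-r}}^2 + \sum_{r=1}^{\varphi} \beta_r \, \sigma_{\bar{t}_{i-r}}^2. \]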
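The GARCH recursion can be simulated directly. The following sketch is illustrative and makes assumptions that are not in the text: the innovations δ are taken to be standard Gaussian (any i.i.d. law with mean zero and variance one would do), terms with a negative time index are treated as zero, and the function name `simulate_garch` is hypothetical.

```python
import random


def simulate_garch(n, omega, alphas, betas, rng, eps0=0.0):
    """Simulate eps_{i+1} = sigma_i * delta_i, i = 0, ..., n - 1, where

        sigma_i^2 = omega + sum_{r=0}^{psi-1} alphas[r]  * eps_{i-r}^2
                          + sum_{r=1}^{phi}   betas[r-1] * sigma_{i-r}^2.

    Terms with negative time index are dropped (an assumption of this sketch).
    Returns the lists (eps_0, ..., eps_n) and (sigma_0^2, ..., sigma_{n-1}^2).
    """
    eps = [eps0]
    sigma2 = []
    for i in range(n):
        s2 = omega
        for r, a in enumerate(alphas):           # r = 0, ..., psi - 1
            if i - r >= 0:
                s2 += a * eps[i - r] ** 2
        for r, b in enumerate(betas, start=1):   # r = 1, ..., phi
            if i - r >= 0:
                s2 += b * sigma2[i - r]
        sigma2.append(s2)
        delta = rng.gauss(0.0, 1.0)              # zero mean, unit variance
        eps.append(s2 ** 0.5 * delta)
    return eps, sigma2


# GARCH(1, 1) with omega = 0.1, alpha_0 = 0.2, beta_1 = 0.7.
eps, sigma2 = simulate_garch(1000, 0.1, [0.2], [0.7], random.Random(1))
print(min(sigma2), max(sigma2))
```

Passing an empty `betas` list gives the linear ARCH(ψ) case.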

In these models it is assumed that ω > 0, αr ≥ 0, βr ≥ 0 for all r. One can embed these models into the constructed approximation for the SDE (1.4.19). (For details see Section 10.3.) We need the following general assumptions concerning (1.4.19):

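(A1) There exists a constant M > 0 such that for all j = 1, . . . , q, $\nu = 0, \dots, i_H(T)$, and $x_0, \dots, x_\nu \in \mathbb{R}^d$

\[ \| b^\nu(x_0, \dots, x_\nu) \| \le M \big( 1 + \max_{0 \le \ell \le \nu} \|x_\ell\| \big) \quad \text{and} \quad \| \sigma_j^\nu(x_0, \dots, x_\nu) \| \le M. \]

(A2) There exists a constant L > 0 such that for all j = 1, . . . , q, $\nu = 0, \dots, i_H(T)$, and $x_0, \dots, x_\nu, y_0, \dots, y_\nu \in \mathbb{R}^d$

\[ \| b^\nu(x_0, \dots, x_\nu) - b^\nu(y_0, \dots, y_\nu) \| \le L \max_{0 \le \ell \le \nu} \|x_\ell - y_\ell\| \quad \text{and} \]

\[ \| \sigma_j^\nu(x_0, \dots, x_\nu) - \sigma_j^\nu(y_0, \dots, y_\nu) \| \le L \max_{0 \le \ell \le \nu} \|x_\ell - y_\ell\|. \]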

(A1) and (A2) assure the existence and uniqueness of the solution of (1.4.19). As mentioned above, the approximate solutions are based on a “double grid”—a coarse grid for the time discretization and a fine grid for the chance discretization, which has a lower convergence speed than the time discretization. In fact, we consider a grid class G(m, α, β). Here m : (0, T − t0 ] → [1, ∞) is a monotone decreasing function, and α, β > 0 are constants. Then each element G of G(m, α, β) consists of two kinds of grid points: (i) the time discretization points tk , k = 0, . . . , n, with t0 < t1 < · · · < tn = T and

(ii) the chance discretization points $u_i^k$, $i = 0, \dots, m_k$, k = 0, . . . , n − 1, with $t_k = u_0^k < u_1^k < \cdots < u_{m_k}^k = t_{k+1}$, k = 0, . . . , n − 1.

Now, G is required to satisfy the following assumptions:

(G1) $t_k - t_{k-1} = \frac{T - t_0}{n} =: h \le 1$ for all k = 1, . . . , n and $\bar{h}/h \in \mathbb{N}$;

(G2) $1 \le m_k \le m(h)^\alpha$ for all k = 0, . . . , n − 1;

(G3) $u_i^k - u_{i-1}^k = \frac{h}{m_k} \le \beta \frac{h}{m(h)}$ for all k = 0, . . . , n − 1, $i = 1, \dots, m_k$.

Here (G1) means that the coarse grid is equidistant with step size h and contains the master grid H. (G2) and (G3) say that each interval of the coarse subgrid is subdivided in an equidistant way by the points $u_i^k$, both the number of the subdivisions and the step size of the full grid being bounded by functions of h. For a grid G of G(m, α, β) we define

\[ [t]_G := t_k \ \text{ and } \ i_G(t) := k \quad \text{if } t \in [t_k, t_{k+1}),\ k = 0, \dots, n - 1, \]

and

\[ [t]_G^* := u_i^k \quad \text{if } t \in [u_i^k, u_{i+1}^k),\ i = 0, \dots, m_k - 1,\ k = 0, \dots, n - 1. \]

We construct the approximate solution of (1.4.19) in three steps. The first step is a pure time discretization. (Here only the coarse subgrid is involved.)

\[ y^E(t) = x_0 + \int_{t_0}^{t} b(y^E, [s]_G) \, ds + \sum_{j=1}^{q} \int_{t_0}^{t} \sigma_j(y^E, [s]_G) \, dw_j(s), \quad t \in [t_0, T]. \tag{1.4.20} \]

In the second step, a continuous and piecewise linear interpolation of the trajectories in (1.4.20) between the points of the whole fine grid yields the method (1.4.21):

\[ \tilde{y}^E \text{ is continuous, and linear in the intervals } (u_{i-1}^k, u_i^k],\ i = 1, \dots, m_k,\ k = 0, \dots, n - 1, \text{ with } \tilde{y}^E(u_i^k) = y^E(u_i^k),\ i = 0, \dots, m_k,\ k = 0, \dots, n - 1. \tag{1.4.21} \]

In the third step, the Wiener process increments over the fine grid are replaced by other i.i.d. r.v.s: Let $\mu \in P(\mathbb{R})$ be a measure with mean value 0 and variance 1, and let $\{\xi_{js}^k;\ j = 1, \dots, q;\ s = 1, \dots, m_k;\ k = 0, \dots, n - 1\}$ be a family of i.i.d. r.v.s with distribution $D(\xi_{11}^0) = \mu$. Then we can define the following method (1.4.22), yielding continuous trajectories that are linear between neighboring grid points: $z^E(u_0^0) = x_0$, and

\[
\begin{aligned}
z^E(u_i^k) = x_0 &+ \sum_{r=0}^{k-1} h \, b(z^E, t_r) + h \frac{i}{m_k} \, b(z^E, t_k) \\
&+ \sum_{j=1}^{q} \Big[ \sum_{r=0}^{k-1} \sigma_j(z^E, t_r) \sqrt{\frac{h}{m_r}} \sum_{s=1}^{m_r} \xi_{js}^r + \sigma_j(z^E, t_k) \sqrt{\frac{h}{m_k}} \sum_{s=1}^{i} \xi_{js}^k \Big]
\end{aligned}
\tag{1.4.22}
\]

for all $i = 1, \dots, m_k$; k = 0, . . . , n − 1. For this last step, the Wiener process w and the r.v.s $\xi_{ji}^k$ will have to be defined anew on a common probability space.
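The structure of the method (1.4.22) is easiest to see in a stripped-down setting. The sketch below is an illustration under simplifying assumptions that are not in the text: it takes d = q = 1, coefficients without memory (so that b and σ depend only on the current state at the coarse grid point), the same number m of fine steps in every coarse interval, and symmetric ±1 variables for µ (mean 0, variance 1). The function name `simulate_scheme` is hypothetical.

```python
import math
import random


def simulate_scheme(x0, drift, diffusion, t0, T, n, m, rng):
    """Values z(u_i^k) on the fine grid for a scalar, memoryless SDE.

    Coefficients are frozen at the coarse grid points t_k; the Wiener
    increments over the fine grid are replaced by sqrt(h/m) * xi with
    xi = +1 or -1, each with probability 1/2 (an assumption of this sketch).
    """
    h = (T - t0) / n
    times = [t0]
    values = [x0]
    z_coarse = x0                      # z(t_k)
    for k in range(n):
        b_k = drift(z_coarse)          # b(z, t_k)
        s_k = diffusion(z_coarse)      # sigma(z, t_k)
        noise_sum = 0.0                # xi_1^k + ... + xi_i^k
        for i in range(1, m + 1):
            noise_sum += rng.choice((-1.0, 1.0))
            times.append(t0 + k * h + i * h / m)
            values.append(z_coarse
                          + h * (i / m) * b_k
                          + s_k * math.sqrt(h / m) * noise_sum)
        z_coarse = values[-1]          # z(t_{k+1}) = z(u_m^k)
    return times, values


# Example: dx = -x dt + 0.5 dw on [0, 1], n = 8 coarse steps, m = 8 fine steps each.
times, values = simulate_scheme(1.0, lambda x: -x, lambda x: 0.5,
                                0.0, 1.0, 8, 8, random.Random(0))
print(times[-1], values[-1])
```

Linear interpolation between the returned fine-grid values gives the continuous trajectory; with m growing like 1/h the two error terms discussed after Theorem 1.4.4 are balanced.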

According to the evolution of the method (1.4.22) via (1.4.20) and (1.4.21), each step will be represented by one convergence theorem, yielding then immediately the main result given in terms of the $W_p$-metric.

Theorem 1.4.4 Suppose (A1) and (A2) hold. Suppose also that p ∈ [1, ∞) and $\mu \in P(\mathbb{R})$ has the properties

\[ \int_{-\infty}^{\infty} x \, d\mu(x) = 0, \quad \int_{-\infty}^{\infty} x^2 \, d\mu(x) = 1, \quad \text{and} \quad \int_{-\infty}^{\infty} e^{tx} \, d\mu(x) < \infty \ \text{ for all } t \text{ with } |t| \le \tau,\ \tau > 0. \]

Moreover, let $(w(t))_{t \in [t_0, T]}$ be a q-dimensional standard Wiener process and $\{\xi_{ji}^k;\ j = 1, \dots, q;\ i = 1, \dots, m_k;\ k = 0, \dots, n - 1\}$ a set of i.i.d. r.v.s with distribution $D(\xi_{11}^0) = \mu$.

Then for the solution x of the SDE and its numerical analogue (E3), we have the following rate of convergence result:

\[ W_p(D(x), D(z^E)) \le C \Big( h^{1/2} + \frac{1 + \ln m(h)}{\sqrt{m(h)}} \Big), \]

where C is an absolute constant.

The bound in Theorem 1.4.4 gives convergence rates with respect to h for the method (1.4.22) and for any grid sequence in G(m, α, β). These rates consist of two summands, one depending on h and the other depending on m(h), representing the rates of time and chance discretization, respectively. Obviously, it is not desirable that one of the two summands converges faster than the other, for this would only increase the costs in relation to the effect. Thus, if the second summand converged faster than the first, this would mean that m(h) increases too fast, and consequently, as a result of (G3), has step sizes that are too small on the whole fine grid. This means that it has too many points $u_i^k$ in relation to the $t_k$ in each grid and therefore uses a random number generator too often. If the first summand converged faster than the second, then m(h) would increase too slowly; i.e., the intervals $[t_k, t_{k+1}]$ would not have enough intermediate grid points $u_i^k$, so that the chance discretization would not keep up with the time discretization. Therefore, it is desirable to tune the rates of both summands, i.e., to equate the powers of h in both summands. This means choosing m(h) to increase as 1/h.

Theorem 1.4.5 Under the assumptions in Theorem 1.4.4 and with

\[ \max\Big( \sup_{0 < s \le 1} s \cdot m(s),\ \sup_{0 < s \le 1} \frac{1}{s \cdot m(s)} \Big) \le K \]

we have

\[ W_p(D(x), D(z^E)) \le C \cdot h^{1/2} (1 - \ln h). \]

This result is almost optimal; the right order-bound should be $h^{1/2}$ (see the discussion in Gelbrich (1995) and Section 10.3 in this book).

From the series of applications of the Kantorovich–Rubinstein theorem we single out two recent types, one in the field of mathematical economics and the other in the field of representations of metrics as minimal metrics.

Utility functions. Let (S, ≤) be a topological space with closed preorder ≤, i.e., {(x, y); x ≤ y} is closed in S × S. Define the strict order relation x ≺ y if x ≤ y and if not y ≤ x, and call an isotone function $u : S \to \mathbb{R}^1$ a utility function if x ≺ y implies u(x) < u(y). A fundamental result in mathematical economics due to Debreu asserts the existence of a continuous


utility function for a closed, total preorder on a separable metrizable space. It is not difficult to show that the assumptions that S is metrizable and separable and the preorder is closed cannot be abandoned. The following result proved in Levin (1990, 1991) shows that the restricting assumption that the preorder is total can be omitted for locally compact spaces; see also Chapter 5.

Theorem 1.4.6 (Utility Representation) Let (S, ≤) be a separable, metrizable locally compact space with a closed semiorder ≤; then S admits a continuous utility function.

For the proof, the duality theorem is used to establish the following extension theorem (cf. Levin (1991)): Suppose that S is compact, F ⊂ S is closed, and c is lower semicontinuous, such that v(x) − v(y) ≤ c(x, y) for x, y ∈ F and some v ∈ C(F). Then a continuous extension of v to S exists with v(x) − v(y) ≤ c(x, y) for x, y ∈ S if $\tilde{c}(x, y) := \min\{c(x, y), a_v(x) - b_v(y)\}$ is lower semicontinuous on S × S, where $a_v(x) := \inf\{v(z) + c(x, z);\ z \in F\}$, $b_v(x) := \sup\{v(z) - c(x, z);\ z \in F\}$. The following parametrized version of this result has also been established in Levin (1990).

Theorem 1.4.7 If S is metrizable, separable locally compact, Ω is a metrizable topological space, and for ω ∈ Ω, $\le_\omega$ is a preorder on S such that $\{(\omega, x, y);\ x \le_\omega y\}$ is closed in Ω × S × S, then there exists a continuous utility function u : Ω × S → [0, 1].

Minimal representation of metrics. An important property of a probability metric is the possibility of finding a minimal representation of it. Consider for example

\[ \kappa(P_1, P_2) = \int_{\mathbb{R}^1} |F_1(x) - F_2(x)| \, dx \]

for the probability measures $P_i$ on $\mathbb{R}^1$, i = 1, 2. Then κ has the representation

\[ \kappa(P_1, P_2) = \ell_1(P_1, P_2) := \inf\{ E|X - Y|;\ X \stackrel{d}{=} P_1,\ Y \stackrel{d}{=} P_2 \} \]

as a minimal $\ell_1$-metric. This representation allows us to obtain rate of convergence results in limit theorems for the κ-metric based on the inherent regularity structure of minimal metrics.
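The identity between κ and the minimal $\ell_1$-metric can be verified on empirical distributions. The sketch below is illustrative only: it assumes two samples of equal size, computes κ by integrating the difference of the empirical distribution functions piece by piece, and computes $\ell_1$ by pairing order statistics. The function names are hypothetical.

```python
import math
import random


def kappa(a, b):
    """Integral of |F_a(x) - F_b(x)| dx for the empirical distribution functions."""
    points = sorted(set(a) | set(b))
    sa, sb = sorted(a), sorted(b)
    total = 0.0
    ia = ib = 0  # number of sample points <= current left endpoint
    for left, right in zip(points, points[1:]):
        while ia < len(sa) and sa[ia] <= left:
            ia += 1
        while ib < len(sb) and sb[ib] <= left:
            ib += 1
        # Both distribution functions are constant on [left, right).
        total += abs(ia / len(sa) - ib / len(sb)) * (right - left)
    return total


def ell_1_equal_size(a, b):
    """Minimal E|X - Y| for equal-size samples: pair the order statistics."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


random.seed(1)
a = [random.gauss(0.0, 1.0) for _ in range(7)]
b = [random.uniform(-2.0, 3.0) for _ in range(7)]
kappa_value = kappa(a, b)
ell_value = ell_1_equal_size(a, b)
print(kappa_value, ell_value)  # the two values agree up to rounding
```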

The metric

\[ \zeta_n(P_1, P_2) := \int_{-\infty}^{\infty} \Big| \int_{-\infty}^{x} \frac{(x - t)^{n-1}}{(n - 1)!} \, (P_1 - P_2)(dt) \Big| \, dx \]

was introduced by Zolotarev (1975). It is an ideal metric of order n. For n = 1, $\zeta_1 = \ell_1$, but for n > 1, $\zeta_n$ does not allow a representation as a minimal metric with respect to a Monge–Kantorovich transportation problem. A representation for $\zeta_n$ as a minimal metric with respect to a Kantorovich–Rubinstein-type problem has, however, recently been found in Rachev (1991c). For a signed Borel measure m on $\mathbb{R}^k$ with $m(\mathbb{R}^k) = 0$, with finite nth moments and $\int (x_1 \cdots x_k)^j \, m(dx) = 0$, j = 1, . . . , n, define the signed measure $m_n$ as

\[ m_n\Big( \prod_{j=1}^{k} (-\infty, x_j] \Big) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_k} \prod_{i=1}^{k} \frac{(x_i - t_i)^n}{n!} \, m(dt_1, \dots, dt_k) \]

for $x_j \le 0$ and by the corresponding "survival function" for components $x_j \ge 0$:

\[ m_n\Big( \prod_{j \in J} (-\infty, x_j] \times \prod_{j \in J^c} [x_j, \infty) \Big) = \int_{(-\infty, x_J]} \int_{[x_{J^c}, \infty)} \prod_{j \in J} \frac{(x_j - t_j)^n}{n!} \prod_{j \in J^c} \frac{(t_j - x_j)^n}{n!} \, m(dt_1, \dots, dt_k), \]

where $x_J = (x_j)_{j \in J} \le 0$ and $x_{J^c} = (x_j)_{j \in J^c} > 0$. Also, denote

\[ B_n(m) := \{ b;\ b \text{ a nonnegative Borel measure on } \mathbb{R}^k \times \mathbb{R}^k \text{ with } b(A \times \mathbb{R}^k) - b(\mathbb{R}^k \times A) = m_n(A) \text{ for all } A \}. \]

Then any $b \in B_n(m)$ has an absolutely continuous marginal difference measure $\bar{b} = b(\cdot \times \mathbb{R}^k) - b(\mathbb{R}^k \times \cdot)$ with

\[ \frac{\partial^{(n-1)k}}{\partial x_1^{(n-1)} \cdots \partial x_k^{(n-1)}} \, p_{\bar{b}}(x) = F_m, \]

where $p_{\bar{b}}$ is the density of $\bar{b}$, and $F_m$ the distribution function of m. One can now introduce a version of the Kantorovich–Rubinstein norm,

\[ \|m\|_n := \inf\Big\{ \int c \, db;\ b \in B_n(m) \Big\}. \]

The following duality theorem holds (see Rachev (1991c) and Section 6.1).

Theorem 1.4.8 The norm $\|m\|_n$ is given by $\|m\|_n = \sup\{ |\int f \, dm|;\ f \in L_n \}$, where $L_n$ is the class of nth integrals

\[ g_n(x) := \int_0^x \prod_{j} \frac{(x_j - t_j)^{n-1}}{(n - 1)!} \, g(t) \, dt \]

of Lipschitz functions g.

In the case k = 1 and for the cost function $c(x, y) = |x - y| \max(h(|x - a|), h(|y - a|))$, h an increasing function on t ≥ 0, h(t) > 0, this leads to

\[ \|m\|_n = \int_{\mathbb{R}^1} \Big| \int_{-\infty}^{x} \frac{(x - t)^n}{n!} \, dF_m(t) \Big| \, h(|x - a|) \, dx. \]

In particular, we obtain a minimal representation for the $\zeta_n$-metric; namely, if c(x, y) = |x − y|, $x, y \in \mathbb{R}$, then $\zeta_n(P_1, P_2) = \|P_1 - P_2\|_n$. If $c(x, y) = \|x - y\|$, $x, y \in \mathbb{R}^k$ (k ≥ 1), then $Z_{k,n}(X, Y) := \|P^X - P^Y\|_n$ is an ideal metric of order kn + 1. This dependence of the order of the ideality upon the dimensionality may be considered as a drawback of $Z_{k,n}$. In Hanin and Rachev (1994, 1995) a different approach was proposed, leading to the following dual representation for ideal metrics of order r > 0 independent of the dimensionality of $\mathbb{R}^k$.

For a given $\alpha \in \mathbb{N}$, and any s ≥ α, let $M_s^\circ$ be the set of all signed Borel measures µ on $\mathbb{R}^n$ such that

\[ \int_{\mathbb{R}^n} x_1^{\alpha_1} \cdots x_n^{\alpha_n} \, d\mu(x_1, \dots, x_n) = 0 \quad \text{and} \quad \int |x|^s \, d|\mu|(x) < \infty \]

for every multi-index $\alpha = (\alpha_1, \dots, \alpha_n) \in \mathbb{Z}_+^n$ such that $\alpha_1 + \cdots + \alpha_n \le k - 1$. Next, let $\Gamma_\mu$ be the set of signed Borel measures Ψ on $\mathbb{R}^{2n}$, viewed as "transshipment plans," satisfying the balancing condition

\[ \int_{\mathbb{R}^n} f \, d\mu = \int_{\mathbb{R}^{2n}} \Delta_h^k f(x) \, d\Psi(x, h), \]

where

\[ \Delta_h^k f(x) = \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} f(x + ih) \]

48

1. Introduction

is the kth difference of f with step h for x, h ∈ IRn . Define the following minimal functional on Mr◦ :

µ k,r =

h r d|Ψ|(x, h).

inf

ψ∈Γµ

To state the dual representation for $\|\mu\|_{k,r}$, let $\Lambda_r^k$ be the set of all locally bounded functions $f$ on $\mathbb{R}^n$ such that for some $C \ge 0$, $|\Delta_h^k f(x)| \le C \|h\|^r$ over all $x, h \in \mathbb{R}^n$. $\Lambda_r^k$ is endowed with the seminorm $\|f\|_{\Lambda_r^k} = \inf C$. In Chapter 6 the following duality theorem will be proved:

Theorem 1.4.9 Let $n \in \mathbb{N}$, $k \in \mathbb{N}$, $s = n + k - 1$, and $\alpha - 1 < r \le k$. Then for every $\mu \in M_r^\circ$,

\[ \|\mu\|_{k,r} = \sup\Bigl\{ \int_{\mathbb{R}^n} f \, d\mu;\ \|f\|_{\Lambda_r^k} \le 1 \Bigr\}. \]

Moreover, the above supremum is attained; there is an $f \in \Lambda_r^k$ with $\|f\|_{\Lambda_r^k} = 1$ such that $\|\mu\|_{k,r} = \int f \, d\mu$.

The minimal functional $\|\mu\|_{k,r}$ defines an ideal metric of order $r$ regardless of the dimensionality on $\mathbb{R}^n$. In fact, let $K_r(P, Q)$ be the analogue

\[ K_r(P, Q) = \sup\Bigl\{ \Bigl| \int_{\mathbb{R}^n} f \, d(P - Q) \Bigr|;\ \|f\|_{\Lambda_r^k} \le 1 \Bigr\} \]

of the Kantorovich metric on $\mathbb{R}^n$. From Theorem 1.4.9 one can easily check the following:

(i) $K_r$ is an ideal metric of order $r$, and for $k - 1 < r \le k$, $\zeta_r \le c_1 K_r \le c_2 \zeta_r$ for some positive constants $c_1$ and $c_2$;

(ii) if $P - Q \in M_{n+k-1}^\circ$, then $K_r(P, Q)$ admits the dual representation $K_r(P, Q) = \|P - Q\|_{k,r}$, and moreover,

\[ K_r(P, Q) \le A \int_{\mathbb{R}^n} \|x\|^r \, d|P - Q|(x) < \infty. \]
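The $k$th difference $\Delta_h^k f$ entering the balancing condition annihilates every polynomial of degree at most $k - 1$, which is consistent with the vanishing-moment conditions imposed on $\mu$. A minimal one-dimensional Python sketch illustrates this; the function name `kth_difference` and the sample values are illustrative choices, not notation from the text.

```python
from math import comb

def kth_difference(f, x, h, k):
    """Return the k-th difference of f at x with step h:
    sum_{i=0}^{k} (-1)^(k-i) * C(k, i) * f(x + i*h)."""
    return sum((-1) ** (k - i) * comb(k, i) * f(x + i * h) for i in range(k + 1))

# A polynomial of degree k - 1 = 2 is annihilated by the third difference.
quadratic = kth_difference(lambda t: t ** 2, 2, 3, 3)

# For t -> t^k the k-th difference equals k! * h^k (here 3! * 3^3).
cubic = kth_difference(lambda t: t ** 3, 2, 3, 3)
```

Integer arguments keep the arithmetic exact, so the vanishing of the third difference of a quadratic can be checked without rounding error.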



Stability of stochastic programs. Römisch and Schultz (1991, 1993) studied the stability of the following stochastic optimization problem:

\[ P(\mu): \quad \min\Bigl\{ \int_{\mathbb{R}^s} f(x, z) \, \mu(dz);\ x \in C \Bigr\}, \]

where $f : \mathbb{R}^m \times \mathbb{R}^s \to \overline{\mathbb{R}} := \mathbb{R} \cup \{-\infty, +\infty\}$ is a normal integrand (i.e., $f(\cdot, z)$ is lower semicontinuous for all $z \in \mathbb{R}^s$, and $f$ is Borel measurable), $f(\cdot, z)$ is continuous on $C$ $\forall z \in \mu$, $C \subset \mathbb{R}^m$ is nonempty and closed, and $\mu$ is a Borel probability measure on $\mathbb{R}^s$. The optimal value of $P(\mu)$ is defined by

\[ \varphi(\mu) := \inf\Bigl\{ \int_{\mathbb{R}^s} f(x, z) \, \mu(dz);\ x \in C \Bigr\}, \]

and the corresponding solution set is

\[ \psi(\mu) := \operatorname{argmin}\Bigl\{ \int_{\mathbb{R}^s} f(x, z) \, \mu(dz);\ x \in C \Bigr\}. \]

See Römisch and Schultz (1991, 1993) and the references therein. Typically the probability $\mu$ is incompletely determined. We start with an example of an “ideal” metric(3) for studying quantitative stability of $P(\mu)$. Let $(Z, d)$ be a separable metric space, $\mathcal{P}(Z)$ the set of all Borel probability measures on $Z$, and $\Theta \in Z$ a fixed element playing the role of an “origin.” For any $h : Z \to \mathbb{R}$ and any $r > 0$, define the Lipschitz norm

\[ \mathrm{Lip}_h(r) := \sup\Bigl\{ \frac{|h(z) - h(\tilde{z})|}{d(z, \tilde{z})};\ z \ne \tilde{z},\ d(z, \Theta) \le r,\ d(\tilde{z}, \Theta) \le r \Bigr\}. \]

Given a nondecreasing function $H : \mathbb{R}_+ \to \mathbb{R}_+$ with $H(0) = 0$, define the seminorm of $h$ as

\[ \|h\|_H := \sup\{ \mathrm{Lip}_h(r) (\max\{1, H(r)\})^{-1};\ r > 0 \}. \]

(3) The results in this section represent the main part of the lecture “Quantitative Stability of Stochastic Programs via Probability Metrics,” by S.T. Rachev and W. Römisch, given at the 3rd Int. Conf. on “Approximation and Optimization” in the Caribbean, Puebla (Mexico), Oct. 8–13, 1995.


Now we are ready to define the Fortet–Mourier metric

\[ FM_H(P, Q) := \sup\Bigl\{ \Bigl| \int_Z h(z) (P - Q)(dz) \Bigr|;\ \|h\|_H \le 1 \Bigr\} \]

in

\[ \mathcal{P}_H(Z) := \Bigl\{ P \in \mathcal{P}(Z);\ \int_Z c_H(z, \Theta) \, P(dz) < \infty \Bigr\}, \]

where $c_H(z, \Theta) = d(z, \Theta) \max\{1, H(d(z, \Theta))\}$. The Fortet–Mourier metric arises in a natural way from the Kantorovich–Rubinstein mass transshipment problem:

\[ FM_H(P, Q) = \inf\Bigl\{ \int_{Z \times Z} c_H(z, \tilde{z}) \, \eta(dz, d\tilde{z});\ \eta \in D(P, Q) \Bigr\} \]

for any $P, Q \in \mathcal{P}_H(Z)$, where $D(P, Q)$ denotes the set of all bounded Borel measures $\eta$ on $Z \times Z$ satisfying the “balancing” constraint $\eta(\cdot \times Z) - \eta(Z \times \cdot) = (P - Q)(\cdot)$. If the function $H$ satisfies the property

\[ \Delta_H := \sup_{t \ne s} \frac{|t \max\{1, H(t)\} - s \max\{1, H(s)\}|}{|t - s| \max\{1, H(t), H(s)\}} < \infty, \]

then $FM_H(P_n, P) \to 0$ for $P_n, P \in \mathcal{P}_H(Z)$ if and only if $(P_n)$ converges weakly to $P$ and

\[ \int_Z c_H(z, \Theta) (P_n - P)(dz) \to 0 \]

(cf. Corollary 4.3.4 in Rachev (1991)). For example, $\Delta_H < \infty$ is valid for $H(t) = t^a$, $a > 0$. On the real line $Z = \mathbb{R}$, $d(z, \tilde{z}) = |z - \tilde{z}|$, the Fortet–Mourier metric admits an explicit representation (Theorem 5.4.1 in Rachev (1991); see also Rachev and Rüschendorf (1991)):

\[ FM_H(P, Q) = \int_{-\infty}^{\infty} H(|z - \Theta|) \, |F_P(z) - F_Q(z)| \, dz, \]

where $F_P$ denotes the distribution function of $P$. In the next theorem, we use the Fortet–Mourier metric to evaluate the stability of $P(\mu)$ with respect to perturbations of the original distribution $P$.



Theorem 1.4.10 Let $H : \mathbb{R}_+ \to \mathbb{R}_+$ be a nondecreasing function with $H(0) = 0$, $P \in \mathcal{P}_H(\mathbb{R}^s)$, and let $\psi(P)$ be nonempty and bounded. Assume that

(i) the function $f(\cdot, \xi)$ is convex for each $\xi \in \mathbb{R}^s$, and

(ii) there exists an open, bounded subset $V$ of $\mathbb{R}^m$ and a constant $L_0 > 0$ such that $\psi(P) \subset V$ and

\[ \frac{|f(x, \xi) - f(x, \tilde{\xi})|}{\|\xi - \tilde{\xi}\|} \le L_0 \max\{1, H(\max\{\|\xi\|, \|\tilde{\xi}\|\})\} \]

whenever $x \in V$ and $\xi, \tilde{\xi} \in \mathbb{R}^s$.

Then the solution set mapping $\psi$ on $(\mathcal{P}_H(\mathbb{R}^s), FM_H)$ is upper semicontinuous at $P$, and there exist constants $L > 0$ and $\delta > 0$ such that

\[ |\varphi(P) - \varphi(Q)| \le L \cdot FM_H(P, Q) \]

whenever $Q \in \mathcal{P}_H(\mathbb{R}^s)$, $FM_H(P, Q) < \delta$.

The stability results can be applied to the empirical analysis of $P(\mu)$. Consider for example the approximation of $P(\mu)$ by its sample version

\[ P_n(\mu): \quad \min\Bigl\{ F_n(x) = \frac{1}{n} \sum_{i=1}^{n} f(x, \xi_i);\ x \in C \Bigr\}, \]

where $(\xi_i)$ are i.i.d. copies of $\xi$. Let $\varphi$ and $\varphi_n$ denote the optimal values of $P(\mu)$ and $P_n(\mu)$, respectively, and let $\psi$ and $\psi_n$ denote the corresponding solution sets. Applying the rate of convergence results for empirical measures in terms of the Fortet–Mourier metric (see Rachev (1991c)), Theorem 1.4.10 provides bounds for the distance between $\varphi$ and $\varphi_n$. The stability analysis can then be used to estimate the sensitivity of a portfolio of asset returns having minimal risk with preassigned mean returns.

Specialized mass transportation problems. Several modifications of the transportation problem have been studied, allowing bounds for the admissible supply and demand distributions, or capacity constraints for the admissible transportation plans (see also Chapter 7). Let $P_1, P_2$ be probability measures on $\mathbb{R}^1$ with distribution functions (d.f.s) $F_1, F_2$, and let $\mathcal{F}(F_1, F_2)$ denote the class of joint d.f.s $F$ with marginals $F_1, F_2$. The classical Hoeffding–Fréchet characterization of $\mathcal{F}(F_1, F_2)$ states that a d.f. $F$ is in $\mathcal{F}(F_1, F_2)$ if and only if

\[ F_-(x, y) := (F_1(x) + F_2(y) - 1)_+ \le F(x, y) \le \min\{F_1(x), F_2(y)\} =: F_+(x, y). \tag{1.4.23} \]

If $c(x, y)$ satisfies the “Monge” conditions, i.e., $c$ is right continuous and

\[ c(x', y') - c(x, y') - c(x', y) + c(x, y) \le 0 \quad \text{for } x' \ge x,\ y' \ge y, \tag{1.4.24} \]

then for all $F \in \mathcal{F}(F_1, F_2)$,

\[ \int c \, dF_+ \le \int c \, dF \le \int c \, dF_-. \tag{1.4.25} \]

An equivalent form in terms of the random variables $X, Y$ with $F_X = F_1$, $F_Y = F_2$ is

\[ E c\bigl(F_1^{-1}(U), F_2^{-1}(U)\bigr) \le E c(X, Y) \le E c\bigl(F_1^{-1}(U), F_2^{-1}(1 - U)\bigr), \tag{1.4.26} \]

where $U$ is uniformly distributed on $(0, 1)$ and $F_i^{-1}(u) = \inf\{y;\ F_i(y) \ge u\}$ is the generalized inverse of $F_i$ (the quantile function); see also Chapter 3.

Consider for given d.f.s $F_1, F_2$ the set

\[ \mathcal{H}(F_1, F_2) := \{F;\ F \text{ is a d.f. on } \mathbb{R}^2 \text{ with marginals } \tilde{F}_1, \tilde{F}_2, \text{ where } \tilde{F}_1 \le F_1,\ \tilde{F}_2 \ge F_2\} \tag{1.4.27} \]

with bounds on the marginal d.f.s. We study the transportation problem

\[ \text{minimize } \int c(x, y) \, dF(x, y), \quad \text{subject to } F \in \mathcal{H}(F_1, F_2), \tag{1.4.28} \]

or equivalently,

\[ \text{minimize } E c(X, Y), \quad \text{subject to } F_X \le F_1,\ F_Y \ge F_2. \tag{1.4.29} \]
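For two empirical distributions with the same number of equally weighted atoms, the extremal couplings $F_+$ and $F_-$ in (1.4.25) and (1.4.26) reduce to pairing the sorted samples in the same order and in opposite order, respectively. The following Python sketch checks the two bounds against all permutation couplings for the cost $c(x, y) = (x - y)^2$, which satisfies (1.4.24). The helper names and the sample values are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

def expected_cost(xs, ys, cost):
    """Average cost of the coupling that pairs xs[i] with ys[i]."""
    return Fraction(sum(cost(x, y) for x, y in zip(xs, ys)), len(xs))

def frechet_cost_bounds(xs, ys, cost):
    """Costs of the comonotone (F_+) and antimonotone (F_-) couplings."""
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    lower = expected_cost(xs_sorted, ys_sorted, cost)
    upper = expected_cost(xs_sorted, ys_sorted[::-1], cost)
    return lower, upper

def squared(x, y):
    return (x - y) ** 2  # satisfies the Monge condition (1.4.24)

xs, ys = [0, 1, 3], [2, 5, 4]
lower, upper = frechet_cost_bounds(xs, ys, squared)

# Every permutation coupling has a cost between the two bounds.
all_costs = [expected_cost(xs, list(p), squared) for p in permutations(ys)]
```

The brute-force enumeration is only feasible for very small samples; it is included here to make the ordering in (1.4.26) visible, not as a practical method.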

Theorem 1.4.11 (cf. Rachev and Rüschendorf (1994d) and Chapter 3) Suppose that $c(x, y)$ is symmetric, $c$ satisfies the Monge condition (1.4.24), and $c(x, x) = 0$ for all $x$, and define $H^*(x, y) := \min\{F_1(x), \max\{F_1(y), F_2(y)\}\}$; then $H^* \in \mathcal{H}(F_1, F_2)$, and $H^*$ solves the relaxed transportation problem (1.4.28). Furthermore,

\[ \int c \, dH^* = \int_0^1 c\bigl(F_1^{-1}(u), \min\{F_1^{-1}(u), F_2^{-1}(u)\}\bigr) \, du. \]

We remark that Theorem 1.4.11 suggests a greedy algorithm for the solution of the corresponding discrete transportation problem:

\[ \begin{aligned} &\text{minimize} && \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij} \\ &\text{subject to} && x_{ij} \ge 0, \\ &&& \sum_{s=1}^{j} \sum_{r=1}^{n} x_{rs} \ge \sum_{s=1}^{j} b_s =: G_j, \\ &&& \sum_{r=1}^{i} \sum_{s=1}^{n} x_{rs} \le \sum_{r=1}^{i} a_r =: F_i, \end{aligned} \tag{1.4.30} \]

where the sum of the “demands” $\sum_{s=1}^{n} b_s$ equals the sum of the “supplies” $\sum_{r=1}^{n} a_r$. We assume that the $(c_{ij})$ are symmetric, $c_{ii} = 0$, and $c$ satisfies the discrete Monge condition

\[ c_{i,j} + c_{i+1,j+1} - c_{i,j+1} - c_{i+1,j} \le 0. \tag{1.4.31} \]

The restrictions describe production and consumption processes based on priorities with capacities $s_1, \dots, s_n$ such that what remains in stage $i$ of the production (or consumption) process can be transferred to some of the subsequent stages $i + 1, \dots, n$. The proposed greedy algorithm for this problem is as follows. Define $H_i := \max\{F_i, G_i\}$, $1 \le i \le n$, $\delta_1 := H_1$, $\delta_{i+1} := H_{i+1} - H_i$, $i \le n - 1$. Then (1.4.30) is equivalent to the standard transportation problem

\[ \text{minimize } \sum c_{ij} x_{ij} \quad \text{subject to} \quad \sum_{j=1}^{n} x_{ij} = a_i, \quad \sum_{i=1}^{n} x_{ij} = \delta_j, \quad x_{ij} \ge 0, \tag{1.4.32} \]

and the northwest corner rule applied to these new equality restrictions solves (1.4.30). For a detailed example and comparison, see Chapter 3.

For a second example, let $\mu$ be a finite Borel measure on the plane, and for any two probabilities $P_1$ and $P_2$ on $\mathbb{R}^1$ and $A_i \times B_i \in \mathcal{B}^2$, $i \in I$, define

\[ M^\mu(P_1, P_2) := \{P \in M(P_1, P_2);\ P(A_i \times B_i) \le \mu(A_i \times B_i),\ i \in I\}, \tag{1.4.33} \]

the class of transportation plans with upper bounds on the capacity of the sets $A_i \times B_i$. By the sharpness of the classical Fréchet bounds (cf. Rüschendorf (1991)),

\[ \min\{P(A_i \times B_i);\ P \in M(P_1, P_2)\} = \max\{P_1(A_i) + P_2(B_i) - 1, 0\}, \tag{1.4.34} \]

we impose the necessary assumptions

\[ \mu(A_i \times B_i) \ge \max(0, P_1(A_i) + P_2(B_i) - 1) \tag{1.4.35} \]

in order to avoid trivial cases.
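The greedy procedure for the discrete problem (1.4.30), that is, the reduction to (1.4.32) followed by the northwest corner rule, can be sketched in Python as follows. The function name `greedy_relaxed_transport` and the sample supplies and demands are illustrative assumptions; integer data are used so that the arithmetic is exact.

```python
from itertools import accumulate

def greedy_relaxed_transport(a, b):
    """Northwest corner rule for supplies a and the modified demands delta.

    F and G are the cumulative sums of a and b, H_i = max(F_i, G_i), and
    delta holds the increments of H, as in the reduction to (1.4.32).
    """
    if sum(a) != sum(b):
        raise ValueError("total supply must equal total demand")
    n = len(a)
    F, G = list(accumulate(a)), list(accumulate(b))
    H = [max(f, g) for f, g in zip(F, G)]
    delta = [H[0]] + [H[i + 1] - H[i] for i in range(n - 1)]

    supply, demand = list(a), list(delta)
    x = [[0] * n for _ in range(n)]
    i = j = 0
    while i < n and j < n:
        shipped = min(supply[i], demand[j])
        x[i][j] = shipped
        supply[i] -= shipped
        demand[j] -= shipped
        if supply[i] == 0:
            i += 1          # row i is exhausted, move down
        elif demand[j] == 0:
            j += 1          # column j is filled, move right
    return x, delta

plan, delta = greedy_relaxed_transport([4, 1, 5], [2, 5, 3])
```

For $a = (4, 1, 5)$ and $b = (2, 5, 3)$ the modified demands are $\delta = (4, 3, 3)$; the resulting plan has row sums $a_i$, and its cumulative column sums $(4, 7, 10)$ dominate $G = (2, 7, 10)$, as the constraints in (1.4.30) require.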


Theorem 1.4.12 (cf. Olkin and Rachev (1990), Rachev and Rüschendorf (1994d), and Chapter 3) Define

\[ P^*(A \times B) = \min\Bigl\{ \inf_{A_i \subset A,\ B_i \subset B} \{\mu(A_i \times B_i) + P_1(A \setminus A_i) + P_2(B \setminus B_i)\},\ \min\{P_1(A), P_2(B)\} \Bigr\}. \tag{1.4.36} \]

Then the generalized upper Fréchet bound $h_\mu(A \times B) := \sup\{P(A \times B);\ P \in M^\mu(P_1, P_2)\}$ satisfies

(a) $h_\mu(A \times B) \le P^*(A \times B)$.

(b) If $P^*$ defines a measure, then $P^* \in M^\mu(P_1, P_2)$ and $h_\mu(A \times B) = P^*(A \times B)$.

(c) If $\{A_i \times B_i;\ i \in I\} = \{(-\infty, x] \times (-\infty, y];\ x, y \in \mathbb{R}^1\}$, then $P^*$ defines a measure and the bound in (a) is sharp.

In the finite discrete case this problem has been dealt with by Barnes and Hoffman (1985). Again, as in the first example, a greedy algorithm can be constructed for this problem (cf. Chapter 3).

The (generalized) transportation problem can be considered as a (generalized) moment problem with infinitely many moment-type conditions specifying the marginal distributions. From this point of view some explicit moment-type problems have been considered with moment-type conditions on the marginal distribution functions. In Rachev (1991c) the problem of minimizing (maximizing) $E\|X_1 - X_2\|^p$ is considered under the restrictions $E\|X_i\|^{q_j} = a_{ij}$, $i = 1, 2$, $j = 1, \dots, n$. This corresponds to a weakening of the marginal constraints.

An interesting problem related to moment-type marginal constraints is considered in Klebanov and Rachev (1995). The problem arises in the context of computer tomography. Let $Q_1, Q_2$ be probability measures on $\mathbb{R}^m$ with identical marginal distributions in a finite number $n$ of directions $\vartheta_1, \dots, \vartheta_n$. We ask what can be said about the closeness of $Q_1, Q_2$ if they coincide in distribution in an increasing number $n$ of directions. It is known that with respect to the supremum distance, $Q_1$ and $Q_2$ may differ considerably (this is known as the “computer tomography paradox”). In Klebanov and Rachev (1996) it was shown that this paradox disappears when some weaker metrics like the $\lambda$-metric and the Lévy–Prohorov distance are used.



Consider the case $m = 2$, and define

\[ \lambda(P, Q) := \min_{T > 0} \max\Bigl\{ \max_{\|t\| \le T} \Bigl| \int e^{i \langle t, x \rangle} (P - Q)(dx) \Bigr|,\ \frac{1}{T} \Bigr\}. \tag{1.4.37} \]

$\lambda$ metrizes the topology of weak convergence (Klebanov and Rachev (1996)).

Theorem 1.4.13 (cf. Klebanov and Rachev (1995)) Let $P, Q$ be probability measures on $\mathbb{R}^2$ that have the same marginals in directions $\vartheta_1, \dots, \vartheta_n$ no two of which are collinear. Suppose that $P$ has support in the unit disk. Then

\[ \lambda(P, Q) \le \Bigl(\frac{2}{s!}\Bigr)^{1/(s+1)} \quad \text{with } s := 2\Bigl[\frac{n - 1}{2}\Bigr]. \tag{1.4.38} \]

Note that the right-hand side satisfies $(2/s!)^{1/(s+1)} \sim e/s$ as $s \to \infty$. The assumption of coinciding marginals in the directions $\vartheta_1, \dots, \vartheta_n$ can be replaced by the assumption of coinciding moments up to order $n - 1$ in these directions, and compactness of the support can be replaced by a Carleman-type condition. Define

\[ \mu_k := \sup_{\vartheta \in S^1} \int \langle x, \vartheta \rangle^k \, P(dx), \quad k = 0, 1, \dots, \qquad \beta_s := \sum_{j=1}^{(s-2)/2} \mu_{2j}^{-1/(2j)}. \tag{1.4.39} \]

Theorem 1.4.14 If $\vartheta_1, \dots, \vartheta_n$ are not collinear directions in $\mathbb{R}^2$, $P$ has moments of any order, and $P, Q$ have identical moments up to order $n - 1$ in directions $\vartheta_1, \dots, \vartheta_n$, then for some absolute constant $C$,

\[ \lambda(P, Q) \le C \beta_s^{-1/4} \bigl(\mu_0 + \mu_2^{1/2}\bigr)^{1/4}. \tag{1.4.40} \]

For some extensions to higher dimensions and related results we refer to Klebanov and Rachev (1996). Further applications of MKPs will be given in Chapters 5–9.


2 The Monge–Kantorovich Problem

In the first section we formulate the multivariate Monge–Kantorovich problem and focus our attention on the dual representation of the problem. Applying some well-established tools from functional analysis and measure theory, we show a duality representation and existence of solutions for a certain class of functions including continuous bounded functions in a topological setup. In the first step one obtains a general duality result for bounded additive measures with given marginals. In the second step one singles out those functions that allow the replacement of the finitely additive measures by σ-additive measures. This approach was introduced in Rüschendorf (1979, 1981) and Gaffke and Rüschendorf (1981).

The idea of Kellerer (1984) was to consider continuity properties of the inf (resp. sup) functionals. Based on these properties and a functional version of Choquet’s capacitability theorem, H. Kellerer considerably extended the range of applicability of the duality theorem. Kellerer’s approach is the setup in the following two sections. Finally, Section 2.4 follows Ramachandran and Rüschendorf (1995), where an extension to the nontopological setting is given. It turns out that the imposed conditions are basically necessary and sufficient for the duality theorem to hold.


2.1 The Multivariate Monge–Kantorovich Problem: An Introduction

The multivariate Monge–Kantorovich problem (MKP) has been introduced and studied in various versions by Rüschendorf (1979, 1981), Schay (1979), Kellerer (1984), Rachev (1984b, 1991c), and Ramachandran and Rüschendorf (1995). It can be briefly stated as follows. Given probability laws $P_i$ on measurable spaces $(S_i, \mathcal{B}_i)$, $1 \le i \le n$, and a real cost function $c$ on $S = S_1 \times \cdots \times S_n$,

\[ \text{(P)} \qquad \text{minimize } \int_S c \, dP \]

over all probability measures $P$ on $S$ subject to the restriction $\pi_i P = P_i$, $1 \le i \le n$, where $\pi_i P$ stands for the projection of $P$ on $S_i$. As in classical linear programming, this primal problem gives rise to a dual problem,

\[ \text{(D)} \qquad \text{maximize } \sum_{1 \le i \le n} \int_{S_i} f_i \, dP_i \]

over all real functions $f_i$ on $S_i$ satisfying

\[ \sum_{1 \le i \le n} f_i(x_i) \le c(x), \quad x = (x_1, \dots, x_n) \in S. \]

The MKP was introduced in the case $n = 2$, $S_1 = S_2 = U$ a compact separable metric space, and $c(x, y) = d(x, y)$ by Kantorovich (1942, 1948). (For some history on MKP we refer to the Introduction.) We shall be interested in the following:

(i) When does the duality hold; that is, when do the minimum in (P) and the maximum in (D) coincide?

(ii) When does a solution of (P) (resp. (D)) exist?

Let $(S, \mathcal{B}) = \otimes_{i=1}^{n} (S_i, \mathcal{B}_i)$, and let $c \in B(S, \mathcal{B})$, denoting that $c$ is bounded and $\mathcal{B}$-measurable. Let $M(P_1, \dots, P_n)$ (resp. $\mathrm{ba}(P_1, \dots, P_n)$) denote the set of measures (resp. finitely additive, nonnegative set functions) on $(S, \mathcal{B})$ with marginals $P_i$. Then for any $P \in \mathrm{ba}(P_1, \dots, P_n)$ the integral $\int c \, dP$ is well-defined, and we introduce

\[ m(c) := \inf\Bigl\{ \int c \, dP;\ P \in M(P_1, \dots, P_n) \Bigr\}, \qquad S(c) := \sup\Bigl\{ \int c \, dP;\ P \in M(P_1, \dots, P_n) \Bigr\}, \tag{2.1.1} \]

as well as the corresponding finitely additive variants

\[ m_0(c) := \inf\Bigl\{ \int c \, dP;\ P \in \mathrm{ba}(P_1, \dots, P_n) \Bigr\}, \qquad S_0(c) := \sup\Bigl\{ \int c \, dP;\ P \in \mathrm{ba}(P_1, \dots, P_n) \Bigr\}. \tag{2.1.2} \]

Theorem 2.1.1 If $c \in B(S, \mathcal{B})$, then

\[ m_0(c) = \sup\Bigl\{ \sum_{i=1}^{n} \int f_i \, dP_i;\ f_i \in B(S_i, \mathcal{B}_i),\ \sum_{i=1}^{n} f_i \circ \pi_i \le c \Bigr\}. \tag{2.1.3} \]

There exist solutions to both sides of (2.1.3).

Proof: We give two proofs of this result:

(I) Define $Z := B(S, \mathcal{B})$, $X := \prod_{i=1}^{n} B(S_i, \mathcal{B}_i)$. Define also $F : X \to \mathbb{R}^1$ as

\[ F(f_1, \dots, f_n) := \sum_{i=1}^{n} \int f_i \, dP_i, \tag{2.1.4} \]

and $\psi : X \to Z$ as

\[ \psi(f_1, \dots, f_n) := -\sum_{i=1}^{n} f_i \circ \pi_i, \qquad z_0 := c. \tag{2.1.5} \]

Next we introduce the convex cone of nonnegative bounded functions $E := \{f \in B(S, \mathcal{B}),\ f \ge 0\}$ defining a pseudo-order on $Z$. Now we apply the following duality theorem of Isii (1964) on topological vector spaces $Z$, $X$, with $E$ a convex cone with vertex 0 in a real vector space. Assume that $E^\circ \ne \emptyset$ and that $0 \in (\psi(X) - E + z_0)^\circ$, where $A^\circ$ denotes the interior of $A$. Then

\[ \sup\{F(x);\ x \in X,\ \psi(x) + z_0 \ge 0\} = \inf\{z^*(z_0);\ z^* \in Z^*,\ z^* \ge 0,\ z^*(\psi(x)) + F(x) \le 0\ \forall x \in X\}. \tag{2.1.6} \]

By Riesz’s representation theorem and choosing the norm topology on $B(S, \mathcal{B})$, we see that the dual space $(B(S, \mathcal{B}))^*$ equals the set of bounded additive set functions on $\mathcal{B}$. The left-hand side of (2.1.6) is identical to the dual problem

\[ M_1(c) := \sup\Bigl\{ \sum_{i=1}^{n} \int f_i \, dP_i;\ f_i \in B(S_i, \mathcal{B}_i),\ \sum_{i=1}^{n} f_i \circ \pi_i \le c \Bigr\}. \]

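The right-hand side of (2.1.6) is identical to

\[ \begin{aligned} M_2(c) &:= \inf\Bigl\{ \int c \, dP;\ P \in \mathrm{ba}(S, \mathcal{B}),\ P \ge 0,\ -\int \sum_{i=1}^{n} f_i \circ \pi_i \, dP \le -\sum_{i=1}^{n} \int f_i \, dP_i,\ \forall f_i \in B(S_i, \mathcal{B}_i) \Bigr\} \\ &= \inf\Bigl\{ \int c \, dP;\ P \in \mathrm{ba}(S, \mathcal{B}),\ P \ge 0,\ \int f_i \circ \pi_i \, dP = \int f_i \, dP_i,\ \forall f_i \in B(S_i, \mathcal{B}_i),\ 1 \le i \le n \Bigr\} \\ &= \inf\Bigl\{ \int c \, dP;\ P \in \mathrm{ba}(P_1, \dots, P_n) \Bigr\} = m_0(c). \end{aligned} \tag{2.1.7} \]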

As for the regularity condition, we observe that $E^\circ \ne \emptyset$. To see that 0 is an interior point of $\psi(X) - E + z_0$, we have to show the following: For $f \in B(S, \mathcal{B})$ with $\|f\| \le \varepsilon$, there exist $g = (g_1, \dots, g_n) \in X$ and $h \in B(S, \mathcal{B})$, $h \ge 0$, such that $f = \psi(g) - h + c$. In fact, for $|c| \le k$ and $g_i := -\frac{1}{n}(k + \varepsilon)$, $1 \le i \le n$, we have that $h := -\sum g_i \circ \pi_i - f + c \ge 0$, and the regularity condition is fulfilled.

The existence of a solution of the left-hand side in (2.1.3) follows from Theorem 2.1 of Isii (1964). To prove the existence of solutions of the right-hand side, we first note that there exists a $K \in \mathbb{R}^1_+$ such that

\[ M_1(c) = \sup\Bigl\{ \sum_{i=1}^{n} \int f_i \, dP_i;\ f_i \in B(S_i),\ |f_i| \le K,\ 1 \le i \le n,\ \sum_{i=1}^{n} f_i \circ \pi_i \le c \Bigr\}. \tag{2.1.8} \]

Suppose that $f_1, \dots, f_n \in B(S_i)$ with $\sum_{i=1}^{n} f_i \circ \pi_i \le c$ are given. Let $b_i := \sup f_i$, $1 \le i \le n$, and let $a := \inf c$, $A := \sup c$. Defining

\[ g_i := f_i + \frac{1}{n} \sum_{j=1}^{n} b_j - b_i, \quad 1 \le i \le n, \]

we obtain

\[ \sum_{i=1}^{n} g_i \circ \pi_i \le c, \qquad \sum_{i=1}^{n} \int g_i \, dP_i = \sum_{i=1}^{n} \int f_i \, dP_i, \]

and

\[ \sup g_i = \frac{1}{n} \sum_{j=1}^{n} b_j =: b, \quad 1 \le i \le n. \]

It follows that $b \le \frac{A}{n}$. In the next step define $h_i(x) := \max\{g_i(x), d\}$ with

\[ d := \min_{1 \le r \le n} \Bigl( \frac{a}{r} - \frac{n - r}{nr} A \Bigr) = a - \frac{n - 1}{n} A. \]

Thus

\[ \sum_{i=1}^{n} h_i(x_i) = \sum_{i=1}^{n} g_i(x_i) \le c(x_1, \dots, x_n), \]

if $g_i(x_i) \ge d$, $1 \le i \le n$. Now let $g_i(x_i) < d$ for exactly $r$ indices $i$, $1 \le r \le n$. Then

\[ \sum_{i=1}^{n} h_i(x_i) \le rd + (n - r) b \le rd + \frac{n - r}{n} A \le a \le c(x_1, \dots, x_n), \]

and $g_i \le h_i$, $1 \le i \le n$, implies

\[ \sum_{i=1}^{n} \int g_i \, dP_i \le \sum_{i=1}^{n} \int h_i \, dP_i. \]

With $K := \max\{|d|, |A|\}$ we obtain (2.1.8). Next let $(f_1^{(k)}, \dots, f_n^{(k)})$, $k \in \mathbb{N}$, be a sequence such that

\[ f_i^{(k)} \in B_K(S_i) := \{f \in B(S_i);\ |f| \le K\}, \quad 1 \le i \le n, \]

and

\[ \sum_{i=1}^{n} f_i^{(k)} \circ \pi_i \le c \quad \text{and} \quad \lim_{k \to \infty} \sum_{i=1}^{n} \int f_i^{(k)} \, dP_i = M_1(c). \tag{2.1.9} \]

Since $\otimes_{i=1}^{n} B_K(S_i)$ is a sequentially compact subset of $\otimes_{i=1}^{n} L^1(P_i)$ supplied with the weak topology $\sigma\bigl(\otimes_{i=1}^{n} L^1(P_i), \otimes_{i=1}^{n} L^\infty(P_i)\bigr)$, there exist $(\bar{f}_1, \dots, \bar{f}_n) \in \otimes_{i=1}^{n} B_K(S_i)$ and a subsequence of $(f_1^{(k)}, \dots, f_n^{(k)})_{k \in \mathbb{N}}$ converging to $(\bar{f}_1, \dots, \bar{f}_n)$ with respect to the weak topology. The convexity of $\otimes_{i=1}^{n} B_K(S_i)$ yields the existence of an even strongly convergent sequence as in (2.1.9). Therefore, for some subsequence, a.s. convergence with respect to $\otimes_{i=1}^{n} P_i$ holds, and $\sum_{i=1}^{n} \bar{f}_i \circ \pi_i \le c$ with $\sum_{i=1}^{n} \int \bar{f}_i \, dP_i = M_1(c)$.

(II) The application of the duality theorem of Isii (1964) in the proof above can be replaced by the Hahn–Banach theorem in the following way: Define

\[ F = \oplus_{i=1}^{n} B(\mathcal{B}_i, P_i) = \Bigl\{ \sum_{i=1}^{n} f_i \circ \pi_i;\ f_i \in B(\mathcal{B}_i, P_i) \Bigr\} \tag{2.1.10} \]

as the direct sum of the bounded $\mathcal{B}_i$-measurable functions. Then $F$ is a vector subspace of $B(S)$. Define the linear operator $T : F \to \mathbb{R}$, $T\bigl(\sum_{i=1}^{n} f_i \circ \pi_i\bigr) := \sum_{i=1}^{n} \int f_i \, dP_i$, and the sublinear functional $U : B(S) \to \mathbb{R}^1$, $U(\varphi) := \inf\{T(h);\ h \in F,\ \varphi \le h\}$. Then $U(f) = T(f)$ for $f \in F$. If $S$ is a linear functional on $B(S)$, $S \le U$, then for $f \in F$, $f \ge 0$, we have $S(f) \ge 0$; that is, $S$ is a positive operator $S \ge 0$, and $S|_F = T$. By the Hahn–Banach theorem, there exists an extension $S$ of $T$ to $B(S)$, $S \le U$. Riesz’s representation theorem ensures the existence of an element $P \in \mathrm{ba}(S, \mathcal{B})$ representing $S$. Since $S|_F = T$, it follows that $P \in \mathrm{ba}(P_1, \dots, P_n)$. A corollary to the Hahn–Banach theorem is the existence of an extension $S$ with $S(c) = U(c)$ if $U(c) > -\infty$ (which is obviously fulfilled). This yields the existence result of the left-hand side of (2.1.3). The existence proof for the right-hand side is given as in (I). □

Remark 2.1.2

(a) Theorem 2.1.1 gives a valuable interpretation of the dual problem in terms of a finitely additive measure version of the transportation problem. In contrast to the σ-additive version, the existence of solutions in the finitely additive version is assured in great generality. In the σ-additive case one may use the following argument: Assume that all marginal measures $P_i$ are tight. Then all elements $P \in M(P_1, \dots, P_n)$ are tight, and the functional $P \to \int c \, dP$ is lower semicontinuous with respect to the weak topology if $c$ is a lower semicontinuous cost function that is bounded from below. This implies the existence of an optimal measure in (2.1.1).

(b) The second proof is easily generalized to the class $L_m$ of all measurable functions $c$ on $(S, \mathcal{B})$ such that there exists an element $f \in F^1 = \oplus_{i=1}^{n} L^1(P_i)$ with $f \le c$ (that is, $c$ is bounded from below by $f$). Then the corresponding duality result reads: For $c \in L_m$, we have

\[ \inf\Bigl\{ \int c \, dP;\ P \in \mathrm{ba}(P_1, \dots, P_n) \Bigr\} = \sup\Bigl\{ \sum_{i=1}^{n} \int f_i \, dP_i;\ f_i \in L^1(P_i),\ \sum_{i=1}^{n} f_i \circ \pi_i \le c \Bigr\}. \tag{2.1.11} \]

If the right-hand side of (2.1.11) is finite, then a solution of the left-hand side exists.

(c) The method of proof of Theorem 2.1.1 also extends to more general multivariate and “overlapping” marginal conditions where the marginals of certain subsets of the index set are prescribed. For these extensions we refer to Rüschendorf (1991, 1991a).



The following result of Ryll-Nardzewski (1953) (cf. Corollary 3.2.2 of Ramachandran (1979)) is used for the extension problem as indicated in Remark 2.1.2(a). Recall that a probability $P$ on $(S, \mathcal{B})$ is perfect if for any real-valued $\mathcal{B}$-measurable function $f$ on $S$ there exists a Borel set $B_f \subset f(S)$ such that $P(f^{-1}(B_f)) = 1$. In fact, perfectness of $P$ is a very weak regularity assumption, which is satisfied under weak topological conditions (cf. Ramachandran (1979, Corollary 3.2.2)).

Theorem 2.1.3 Let $(S_i, \mathcal{B}_i, P_i)_{1 \le i \le n}$ be probability spaces all but at most one of which are perfect. Then any charge $P$ on the ring $\mathcal{R}$ (resp. semialgebra $\mathcal{U}$) generated by the measurable rectangles is σ-additive on $\mathcal{R}$ (resp. $\mathcal{U}$).

Consider next the following two assumptions:

(A1) $(S_i, \mathcal{B}_i, P_i)$ are compactly approximable; that is, there exist compact set systems $\mathcal{E}_i \subset \mathcal{B}_i$ with $P_i(B) = \sup\{P_i(E);\ E \subset B,\ E \in \mathcal{E}_i\}$ for every $B \in \mathcal{B}_i$.

(A2) $(S_i, \mathcal{B}_i)$ are topological spaces with Borel σ-algebras $\mathcal{B}_i$. Assume that $\mathcal{B} = \otimes \mathcal{B}_i$ and that $\mathcal{B}_i$ contains a countable basis of the topology, $1 \le i \le n$.

For $P \in \mathrm{ba}(P_1, \dots, P_n)$ let $L^1(\mathcal{R}, P)$ denote the set of $P$-integrable functions where $P$ is considered on the ring $\mathcal{R}(\times_{i=1}^{n} \mathcal{B}_i)$ (cf. Dunford and Schwartz (1958, Def. 17)).

Theorem 2.1.4

(a) If (A1) holds, then $m(c) = m_0(c)$ for all $c$ that are $P$-integrable for any $P \in \mathrm{ba}(P_1, \dots, P_n)$.

(b) If (A1), (A2) hold, then $m(c) = m_0(c)$ for all $c \in C_b(S)$.

Proof: (a) Each $P \in \mathrm{ba}(P_1, \dots, P_n)$ considered on the ring $\mathcal{R}(\times \mathcal{B}_i)$ has by (A1) a unique extension to an element $\tilde{P} \in M(P_1, \dots, P_n)$. Therefore, by Lemma 1 of Dunford and Schwartz (1958), $\int c \, dP = \int c \, d\tilde{P}$ for any $c$ that is $P$-integrable.

(b) For $c \in C_b(S)$ we replace in the proof of Theorem 2.1.1 $B(S_i, \mathcal{B}_i)$ by $C_b(S_i)$ and $B(S, \mathcal{B})$ by $C_b(S)$. Next we apply (2.1.6) and use that $\{z \in (C_b(S))^*;\ z \ge 0\} = \mathrm{rba}(S, \mathcal{R}(\mathcal{E}))$. The latter denotes the set of regular bounded additive measures on $(S, \mathcal{R}(\mathcal{E}))$, where $\mathcal{E}$ is the system of open sets in $S$. By an approximation argument it is easy to see that any regular, bounded additive content on $\mathcal{B} = \sigma(\mathcal{R})$ that is σ-additive on $\mathcal{R} = \mathcal{R}(\mathcal{E})$ is also σ-additive on $\mathcal{B}$. So, $\mathrm{rba}(S, \mathcal{B}) \cap \mathrm{ba}(P_1, \dots, P_n) = M(P_1, \dots, P_n)$, and the two integrals $\int c \, dP$ ($P$ considered as a regular bounded additive measure on $\mathcal{R}$) and $\int c \, dP$ ($P$ considered as a σ-additive measure on $\mathcal{B}$) coincide. □

Theorems 2.1.1, 2.1.4 together imply, in particular, the duality theorem for the Monge–Kantorovich problem for bounded continuous functions on separable, perfect spaces. Note that compact measures are perfect, and conversely, if $\mathcal{B}$ is countably generated, perfect measures are compact. A simple characterization of optimal measures $P^* \in M(P_1, \dots, P_n)$ under the conditions of Theorems 2.1.1, 2.1.4 is as follows:

Proposition 2.1.5 Assume that $P^* \in M(P_1, \dots, P_n)$. Then $P^*$ is a solution of $m(c) = \int c \, dP^*$ if and only if there exist $f_i \in B(S_i)$ with $\sum_{i=1}^{n} f_i \circ \pi_i \le c$ and

\[ \sum_{i=1}^{n} f_i \circ \pi_i = c \quad [P^*]. \]

Proof: If $f_i \in B(S_i)$, $\sum_{i=1}^{n} f_i \circ \pi_i \le c$, and $\sum_{i=1}^{n} \int f_i \, dP_i = m(c)$, then by the duality theorem $P^* \in M(P_1, \dots, P_n)$ is a solution of $m(c) = \int c \, dP^*$ if and only if $\int \bigl(c - \sum_{i=1}^{n} f_i \circ \pi_i\bigr) \, dP^* = 0$; or equivalently, since the integrand is nonnegative,

\[ \sum_{i=1}^{n} f_i \circ \pi_i = c \quad [P^*]. \]
□
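In the finite case the criterion of Proposition 2.1.5 is the familiar complementary slackness test from linear programming. The Python sketch below, written for $n = 2$, verifies that a coupling and a pair of dual functions certify each other's optimality; the function name `certifies_optimality` and the numerical data are illustrative assumptions.

```python
from fractions import Fraction

def certifies_optimality(cost, plan, f, g):
    """Check f_i + g_j <= c_ij everywhere, with equality wherever plan > 0."""
    for i, row in enumerate(cost):
        for j, c_ij in enumerate(row):
            slack = c_ij - (f[i] + g[j])
            if slack < 0 or (plan[i][j] > 0 and slack != 0):
                return False
    return True

half = Fraction(1, 2)
# Marginals: P1 uniform on {0, 1}, P2 uniform on {0, 2}; cost (x - y)^2.
cost = [[0, 4], [1, 1]]
plan = [[half, 0], [0, half]]          # pairs 0 with 0 and 1 with 2
f, g = [0, 1], [0, 0]                  # dual functions on S1 and S2

primal_value = sum(c * p for cr, pr in zip(cost, plan) for c, p in zip(cr, pr))
dual_value = half * sum(f) + half * sum(g)
```

Here the primal and dual values coincide, and the dual constraint holds with equality on the support of the plan, so both are optimal; the coupling that pairs 0 with 2 and 1 with 0 fails the test.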

2.2 Primal and Dual Monge–Kantorovich Functionals

This and the following sections are based on Kellerer’s (1984) seminal work. Here, $(S_i, \mathcal{B}_i)$, $1 \le i \le n$, are Hausdorff topological spaces with Borel σ-algebras $\mathcal{B}_i$, and $P_i$ are tight probability measures on $\mathcal{B}_i$; that is, for $\varepsilon > 0$ there exist compact sets $K_i \subset S_i$ such that $P_i(K_i) > 1 - \varepsilon$. Let $M = M(P_1, \dots, P_n)$ denote the set of all probability measures on $S = \times_{i=1}^{n} S_i$ with marginals $P_i$; $S$ is supplied with its Borel σ-algebra. The space $\mathcal{P}(S)$ of tight probabilities on $S$ is endowed with the weak topology, that is, the Hausdorff topology defined by the condition that $P \to P(g) = \int g \, dP$ is lower semicontinuous for all bounded lower semicontinuous functions $g$ on $S$.

Proposition 2.2.1 The set $M \subset \mathcal{P}(S)$ is nonvoid, compact, and convex.

Proof: The product $P_1 \otimes \cdots \otimes P_n$ is defined on $\mathcal{B}_1 \otimes \cdots \otimes \mathcal{B}_n$ and has a unique extension to the Borel σ-algebra $\mathcal{B}$; see Schwartz (1973, p. 63). Obviously, $P_1 \otimes \cdots \otimes P_n \in M$. For $\varepsilon > 0$ let $P_i(K_i) > 1 - \frac{\varepsilon}{n}$, $K_i \subset S_i$ compact. Then $K = K_1 \times \cdots \times K_n$ is compact, and for any $P \in M$, $P(K^c) \le \sum_{i=1}^{n} P_i(K_i^c) < \varepsilon$. Therefore, $M$ is uniformly tight and thus, by the Prohorov theorem, relatively compact. The continuity of the projection maps $\pi_i$ implies that $M$ is closed and therefore compact. □

We next define the Monge–Kantorovich functionals:

Definition 2.2.2 For $h : S \to \overline{\mathbb{R}}$ define

\[ S(h) := \sup\{P^*(h);\ P \in M\} \]

and

\[ I(h) := \inf\Bigl\{ \sum_{i=1}^{n} P_i(h_i);\ h_i : S_i \to (-\infty, \infty],\ h_i \in L^1(P_i),\ h \le \oplus h_i \Bigr\}, \tag{2.2.1} \]

where $\oplus_{i=1}^{n} h_i(x) := \sum_{i=1}^{n} h_i(x_i)$, $P^*(h)$ is the outer integral of $h$ (see Billingsley (1986, Chap. 3)), $\inf \emptyset = \infty$, and $P_i(h_i) = \int_{S_i} h_i \, dP_i$.

Switching from $h \to -h$, the results for $S$ and $I$ have immediate counterparts in terms of the minimization problem $m(c)$ (cf. (2.1.1)). Let $\mathcal{P}(S)$ denote the set of all functions on $S$ with values in $\overline{\mathbb{R}}$. One direction of the duality theorem is obvious, as we see from the next statement:

Proposition 2.2.3

(a) $S(h) \le I(h)$ for all $h \in \mathcal{P}(S)$.

(b) $S$, $I$ are isotone functionals.

Proof: (a) For every $P \in M$ and $h_i \in L^1(P_i)$ with $h \le \oplus h_i$, the following holds: $P^*(h) \le P^*(\oplus h_i) = \sum P_i(h_i)$.

(b) is obvious. □

To prove a general version of the opposite inequality we investigate the functionals $S$, $I$ more closely.



Proposition 2.2.4

(a) Finiteness: If $h \in \mathcal{P}(S)$ and $P^*(h) < \infty$ for all $P \in M$, then $S(h) < \infty$.

(b) Subadditivity: $I$, $S$ are subadditive; that is,

$S(f + g) \le S(f) + S(g)$ if $\min(|S(f)|, |S(g)|) < \infty$ or $S(f) = -S(g)$,

$I(f + g) \le I(f) + I(g)$ if $\min(|I(f)|, |I(g)|) < \infty$ or $I(f) = -I(g)$.

(c) If $h^{(k)} \in \mathcal{P}(S)$, $k \in \mathbb{N}$, are nonnegative, then

\[ S\Bigl(\sum_{k=1}^{\infty} h^{(k)}\Bigr) \le \sum_{k=1}^{\infty} S(h^{(k)}), \qquad I\Bigl(\sum_{k=1}^{\infty} h^{(k)}\Bigr) \le \sum_{k=1}^{\infty} I(h^{(k)}). \tag{2.2.2} \]

Proof: (a) Suppose that on the contrary $S(h) = \infty$ and choose $P^{(k)} \in M$, $k \in \mathbb{N}$, such that $P^{(k)*}(h) > 2^k$. Then the convex combination $P^{(0)} = \sum_{k \in \mathbb{N}} 2^{-k} P^{(k)}$ is again in $M$, and hence $P^{(0)*}(h) < \infty$. Therefore, there is a $P^{(0)}$-integrable function $f$ that serves as a majorant for $h$, and

\[ P^{(0)}(f) = \sum_{k \in \mathbb{N}} 2^{-k} P^{(k)}(f) \ge \sum_{k \in \mathbb{N}} 2^{-k} P^{(k)*}(h) = +\infty, \tag{2.2.3} \]

which leads to a contradiction.

(b) The subadditivity of $S$ follows from the subadditivity of $h \to P^*(h)$. If $I(f) + I(g) = \infty$, then nothing is left to be shown. If $I(f) + I(g) < \infty$, then there exist $f_i, g_i \in L^1(P_i)$ with $f \le \oplus_{i=1}^{n} f_i$, $g \le \oplus_{i=1}^{n} g_i$, and therefore $f + g \le \oplus_{i=1}^{n} (f_i + g_i)$. This implies

\[ I(f + g) \le \sum_{i=1}^{n} \int (f_i + g_i) \, dP_i = \sum_{i=1}^{n} \int f_i \, dP_i + \sum_{i=1}^{n} \int g_i \, dP_i, \]

and therefore $I(f + g) \le I(f) + I(g)$.

(c) Again as in (b) the subadditivity of $S$ is obvious. Suppose that $h_i$ in (2.2.1) can be taken as nonnegative. Then the subadditivity of $I$ follows easily. Now let us show that $h_i \ge 0$ is not an essential restriction. For any $h : S \to \overline{\mathbb{R}}$ let $a$ and $b$ be the lower and upper bounds for $h$ on $S$, and let us show first that in (2.2.1) we can take $h_i$ to satisfy the following additional constraints without increasing the value of $I(h)$:

\[ \frac{1}{n} a \le h_i \le \frac{1}{n} b + (b - a) \quad \text{if } a \in \mathbb{R}, \tag{2.2.4} \]

and

\[ h_i \text{ is bounded} \quad \text{if } b \in \mathbb{R}. \tag{2.2.5} \]

To see (2.2.4) it is enough to look at the case $a = 0$, and hence if the $h_i$’s satisfy the constraints in (2.2.1),

\[ \sum_i \inf_{x_i \in S_i} h_i(x_i) = \inf_{x_i \in S_i} \sum_i h_i(x_i) \ge a = 0. \]

Let us then choose constants $\gamma_i \le \inf_{x_i \in S_i} h_i(x_i)$, $i = 1, \dots, n$, with total sum 0, and therefore, $\sum_i P_i(h_i) = \sum_i P_i(h_i - \gamma_i)$, which means that we can use $h_i - \gamma_i \ge 0$ in (2.2.1) instead of $h_i$, proving the first inequality in (2.2.4). Now taking $h_i$ to be nonnegative, the constraints in (2.2.1) remain valid even if $h_i$ is replaced by $h_i \wedge b$ as desired in the second inequality of (2.2.4).

To see (2.2.5) notice that since $h_i > -\infty$, it follows that $h_i = \inf_{k \in \mathbb{N}} h_i \vee (-k)$, and so the functions $h_i$ in (2.2.1) can be assumed to be bounded from below. As $b < \infty$, we can also assume that the $h_i$’s are bounded from above, which proves (2.2.5).

In particular, (2.2.4) shows that to check the subadditivity of $I$ we may take $h_i^{(k)}$ to be nonnegative in the definition of $I(h^{(k)})$, and therefore, $I\bigl(\sum_{k \in \mathbb{N}} h^{(k)}\bigr) \le \sum_{k \in \mathbb{N}} I(h^{(k)})$. □

Subadditivity of $I$ implies that

\[ d(f, g) := I(|f - g|) \tag{2.2.6} \]

defines a semimetric on $\mathcal{P}(S)$. (In (2.2.6) we use the convention $a - a = 0$ for all $a \in \overline{\mathbb{R}}$.) Note that $d$ may take infinite values.

Definition 2.2.5

(a) $\mathcal{N} := \bigl\{ A \subset S;\ \exists N_i \in \mathcal{B}_i \text{ with } P_i(N_i) = 0 \text{ s.t. } A \subset \bigcup_{i=1}^{n} \pi_i^{-1}(N_i) \bigr\}$.

(b) For $f, g \in \mathcal{P}(S)$ define

\[ f =_{\mathcal{N}} g \iff \{f \ne g\} \in \mathcal{N}, \qquad f \le_{\mathcal{N}} g \iff \{f > g\} \in \mathcal{N}. \]

Then $=_{\mathcal{N}}$ defines an equivalence relation, and $\le_{\mathcal{N}}$ defines a partial order in $\mathcal{P}(S)$ compatible with the lattice operations on $\mathcal{P}(S)$.

Lemma 2.2.6 For $f, g \in \mathcal{P}(S)$ we have

\[ f =_{\mathcal{N}} g \iff d(f, g) = 0. \]

Proof: If $f =_{\mathcal{N}} g$, then $\{f \ne g\}$ is contained in $\bigcup_i \pi_i^{-1}(N_i)$ for some $N_i$ with $P_i(N_i) = 0$. Take $h = |f - g|$ and $h_i = \infty 1_{N_i}$ ($\infty \cdot 0 := 0$) in (2.2.1) to see that $I(h) = d(f, g) = 0$. If conversely, $d(f, g) = 0$, then for any $k \in \mathbb{N}$ there exist $P_i$-integrable functions $h_i^{(k)}$ such that the dual constraints

\[ |f(x) - g(x)| \le \sum_{i=1}^{n} h_i^{(k)}(x_i) \quad \forall x \in S \tag{2.2.7} \]

hold, and

\[ \sum_i P_i(h_i^{(k)}) < 2^{-k}. \]

Moreover, according to (2.2.4) we can assume that the $h_i^{(k)}$ are nonnegative, and so the functions $h_i = \lim_{k \to \infty} h_i^{(k)} \ge 0$ satisfy the side condition in (2.2.1) with $h = |f - g|$, and $\sum_i P_i(h_i) = 0$. Finally, as $\{h > 0\} \subset \bigcup_i \pi_i^{-1}\{h_i \ne 0\}$ and $P_i\{h_i \ne 0\} = 0$, we get $f =_{\mathcal{N}} g$ as required. □

Proposition 2.2.7 The functionals $S$ and $I$ are continuous with respect to $d$.

Proof: This follows by making use of the subadditivity of $S$ and $I$ and the bound $S \le I$. More precisely,

$$S(f) \le S(g) + d(f, g) \quad \text{and} \quad I(f) \le I(g) + d(f, g).$$ □

For $A \subset \mathcal P(S)$ we denote by $\overline A = \overline A^{\,d}$ the closure of $A$ with respect to $d$.

Lemma 2.2.8 If $A \subset \mathcal P(S)$ is closed with respect to finite (resp. countable) infima or suprema, then the same holds for $\overline A$.

Proof: For $f = \inf_k f^k$, $g = \inf_k g^k$ we have

$$d(f, g) \le I\Big(\sum_k |f^k - g^k|\Big) \le \sum_k d(f^k, g^k),$$

by which the assertion is easily established. □

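Definition 2.2.9 For $A \subset \mathcal P(S)$ define

$$A_m = \{ f \in A;\ f \ge \oplus_i h_i \text{ for some } h_i \in L^1(P_i),\ h_i : S_i \to [-\infty, \infty) \},$$
$$A^m = \{ f \in A;\ f \le \oplus_i h_i \text{ for some } h_i \in L^1(P_i),\ h_i : S_i \to (-\infty, \infty] \}; \tag{2.2.8}$$

$$A_b = \{ h \in A;\ \inf h(x) > -\infty \}, \qquad A^b = \{ h \in A;\ \sup h(x) < \infty \}.$$

[...] $> -\infty\}$ such that $d(g^k, g) \to 0$ and $g \le \oplus_i h_i$. Then $f^k := g^k \wedge k \in A^b$ and $d(f^k, g) \le d(f^k, g \wedge k) + d(g \wedge k, g) \to 0$, since

$$d(f^k, g \wedge k) = I(|g^k \wedge k - g \wedge k|) \le I(|g^k - g|) = d(g^k, g)$$

and

$$d(g \wedge k, g) = I((g - k)_+) \le I\Big( \oplus_i \Big(h_i - \frac{k}{n}\Big)_+ \Big) = \sum_{i=1}^n \mu_i\Big( \Big(h_i - \frac{k}{n}\Big)_+ \Big).$$ □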



We next establish some continuity properties of the functionals $S$ and $I$ that are crucial in the derivation of the duality theorems. Define

$$\mathcal F(S) = \text{the class of upper semicontinuous functions on } S \text{ with values in } \overline{\mathbb R}, \text{ and}$$
$$\mathcal G(S) = \text{the class of lower semicontinuous functions on } S. \tag{2.2.10}$$

Theorem 2.2.11 (Continuity of S)

(a) $S$ is σ-continuous upwards on $\mathcal P_b(S)$.
(b) $S$ is τ-continuous upwards on $\mathcal G_b(S)$.
(c) $S$ is τ-continuous downwards on $\mathcal F^b(S)$.

Proof: (a) For a (pointwise) increasing sequence of functions $h^{(n)} \in \mathcal P_b(S)$,

$$S\Big(\sup_{n \in \mathbb N} h^{(n)}\Big) = \sup\Big\{ P^*\Big(\sup_{n \in \mathbb N} h^{(n)}\Big);\ P \in M \Big\} = \sup\Big\{ \sup_n P^*(h^{(n)});\ P \in M \Big\} = \sup_n S(h^{(n)}); \tag{2.2.11}$$

that is, $S$ is σ-continuous upwards on $\mathcal P_b(S)$.

(b) Let $(h^{(j)})_{j \in J}$ be an increasing net in $\mathcal G_b(S)$ with limit $h$. Then using the monotone convergence for increasing nets of lower semicontinuous functions we have

$$P^*(h) = P^*\Big( \sup_{j \in J} \sup_{k \in \mathbb N} h^{(j)} \wedge k \Big) = P^*\Big( \sup_{k \in \mathbb N} \sup_{j \in J} h^{(j)} \wedge k \Big) = \sup_{k \in \mathbb N} \sup_{j \in J} P(h^{(j)} \wedge k) = \sup_{j \in J} P(h^{(j)}). \tag{2.2.12}$$

As in (a), these arguments complete the proof.

(c) For the proof of (c) the following compactness lemma is used.

Lemma 2.2.12 Let $h \in \mathcal F^b(S)$ satisfy $S(h) > -\infty$ and let $S(h) > \delta$. Then $M(h, \delta) := \{P \in M;\ \int h \, dP \ge \delta\}$ is nonvoid and compact.

The proof of Lemma 2.2.12 is a consequence of the fact that the mapping $P \mapsto \int h \, dP = \inf_k \int (h \vee (-k)) \, dP$ is upper semicontinuous, and so the set $M(h, \delta)$ is closed. □



Now take $(h^{(j)})_{j \in J}$ to be a decreasing net in $\mathcal F^b(S)$ with limit $h$, and let $\delta < \inf_{j \in J} S(h^{(j)})$. On $M$ the function $P \mapsto P^*(h^{(j)}) = \inf_{k \in \mathbb N} P(h^{(j)} \vee (-k))$ is an infimum of upper semicontinuous functions, and therefore it is also upper semicontinuous. This implies that the set $M(h^{(j)}, \delta)$ of probabilities $P \in M$ with $P^*(h^{(j)}) \ge \delta$ is closed, and so it is nonempty and compact. Since these compact sets decrease along the net, their intersection is nonempty. Now take $P \in \bigcap_{j \in J} M(h^{(j)}, \delta)$ and note that the inner measure $P_*$ is τ-continuous downwards on $\mathcal F^b(S)$. Since the limit $h \in \mathcal F^b(S)$, then

$$S(h) \ge P^*(h) = P_*(h) = \inf_{j \in J} P_*(h^{(j)}) = \inf_{j \in J} P^*(h^{(j)}) \ge \delta.$$

Letting $\delta \to \inf_{j \in J} S(h^{(j)})$ implies $S(h) \ge \inf_{j \in J} S(h^{(j)})$, and therefore, since $S(h) \le S(h^{(j)})$ for all $j$,

$$S(h) = \inf_{j \in J} S(h^{(j)}) \tag{2.2.13}$$

as desired. □

To establish continuity properties of the functional $I$ the following compactness property of the class of functions $(h_1, \dots, h_n)$ such that $\oplus h_i \ge h$ is quite essential.

Lemma 2.2.13 Given $P_i$ and $h : S \to [0, \infty]$, for $\delta > I(h)$ let $L(h; \delta)$ be the set of all sequences $(h_1^{(k)}, \dots, h_n^{(k)})_{k \in \mathbb N}$ in $\prod_{i=1}^n L^1(P_i)$ such that

(i) $0 \le h_i^{(k)} \le k$ $P_i$-a.s. for all $i$ and $k$,
(ii) $h_i^{(1)} \le h_i^{(2)} \le \cdots$ $P_i$-a.s. for all $i$,
(iii) $\sum_i P_i(h_i^{(k)}) \le \delta$ for all $k$,
(iv) $h \wedge k \le_{\mathcal N} \sum_{i=1}^n h_i^{(k)} \circ \pi_i$.

The set $L(h; \delta) \subset \big( \prod_{i=1}^n L^1(P_i) \big)^{\mathbb N}$ is nonempty, and is compact with respect to the product of the weak topologies in the spaces $L^1(P_i)$.

Proof: To check that $L(h; \delta)$ is nonempty, choose $P_i$-integrable functions $h_i : S_i \to [0, \infty]$ (cf. (2.2.1)) satisfying $h(x) \le \sum_{i=1}^n h_i(x_i)$ and $\sum_{i=1}^n P_i(h_i) \le \delta$. Then clearly, $(h_1 \wedge k, \dots, h_n \wedge k)_{k \in \mathbb N} \in L(h; \delta)$. To check the compactness of $L(h; \delta)$ observe that (i) determines a (relatively) weakly compact subset $C$ of $\big( \prod_{i=1}^n L^1(P_i) \big)^{\mathbb N}$. The restrictions (ii) and (iii) determine a closed subset of $C$. To see that (iv) defines a closed subset of $\big( \prod_{i=1}^n L^1(P_i) \big)^{\mathbb N}$ take $g \in \mathcal P(S)$ and consider the convex set $C(g)$ of vectors $(g_1, \dots, g_n)$ with $g_i \in L^1(P_i)$ and $g \le_{\mathcal N} \sum_{i=1}^n g_i \circ \pi_i$. First notice



that $C(g)$ is closed with respect to the product of the strong topologies of the spaces $L^1(P_i)$. In fact, if $g \le_{\mathcal N} \sum_{i=1}^n g_i^{(\ell)} \circ \pi_i$ and $\| g_i^{(\ell)} - g_i \| \to 0$ for all $i$, then applying the diagonal principle, we may assume that the $(g_i^{(\ell)})_{\ell \in \mathbb N}$ converge $P_i$-almost everywhere. Next, from the definition of the partial order $\le_{\mathcal N}$ and the equivalence relation $=_{\mathcal N}$, we have that

$$g \le_{\mathcal N} \sum_{i=1}^n \lim_{\ell \to \infty} g_i^{(\ell)} \circ \pi_i =_{\mathcal N} \sum_{i=1}^n g_i \circ \pi_i.$$

A strongly closed convex set is weakly closed, and the weak topology of the product coincides with the product of the weak topologies (Schaefer (1966, p. 65 and p. 137)), and consequently the set $C(g)$ is a closed subset of $\prod_{i=1}^n L^1(P_i)$. Therefore, the restriction (iv) defines a closed subset of $\big( \prod_{i=1}^n L^1(P_i) \big)^{\mathbb N}$. This completes the proof of the compactness of $L(h; \delta)$. □

Theorem 2.2.14 (Continuity of I) The functional $I$ has the following continuity properties:

(a) $I$ is σ-continuous upwards on $\mathcal P_b(S)$.
(b) $I$ is τ-continuous upwards on $\mathcal G_b(S)$.
(c) $I$ is τ-continuous downwards on $\mathcal F^b(S)$.

Proof: (a) Take $(h^{(\ell)})_{\ell \in \mathbb N}$ an increasing sequence in $\mathcal P_b(S)$ with limit $h$, and let $\sup_{\ell \in \mathbb N} I(h^{(\ell)}) < \delta$. Applying (2.2.1), we may (without loss of generality) assume that $h^{(\ell)} \ge 0$. From Lemma 2.2.13, we see that the decreasing sequence $\{L(h^{(\ell)}, \delta);\ \ell \in \mathbb N\}$ consists of nonempty compact sets. Then take $(h_1^{(k)}, \dots, h_n^{(k)})_{k \in \mathbb N}$ in the intersection of $\{L(h^{(\ell)}, \delta);\ \ell \in \mathbb N\}$ and set $h_i := \limsup_{k \to \infty} h_i^{(k)}$. In view of Lemma 2.2.13, the $h_i$ are nonnegative, $P_i$-integrable, with $\sum_{i=1}^n P_i(h_i) \le \delta$ and satisfy

$$h^{(\ell)} \le_{\mathcal N} \sum_{i=1}^n h_i \circ \pi_i$$

for all $\ell \in \mathbb N$. From the definition of the partial order $\le_{\mathcal N}$, we have that $h \le_{\mathcal N} \sum_{i=1}^n h_i \circ \pi_i$, which leads to $I(h) \le \delta$. Letting $\delta \to \sup_{\ell \in \mathbb N} I(h^{(\ell)})$ and using the monotonicity of $I$, we obtain

$$I(h) = \sup_{\ell \in \mathbb N} I(h^{(\ell)}) \tag{2.2.14}$$



as required.

(b) Our next step is to show that $I$ is τ-continuous upwards on $\mathcal G_b(S)$, the set of lower semicontinuous functions $g : S \to (-\infty, +\infty]$ with $\inf g > -\infty$. Let $(h^{(j)})_{j \in J}$ be an increasing net in $\mathcal G_b(S)$ with limit $h$. Repeating the same arguments as in the proof of (2.2.14), we need only show that the relation

$$0 \le h^{(j)} \le_{\mathcal N} \sum_{i=1}^n h_i \circ \pi_i \quad \text{for all } j \in J \text{ and some } h_i \in L^1(P_i)$$

implies $h \le_{\mathcal N} \sum_{i=1}^n h_i \circ \pi_i$. In fact, any $0 \le g \in \mathcal G(S)$ can be represented as a supremum of functions $\alpha 1_{G_1 \times \cdots \times G_n}$ for some rationals $\alpha$ and open $G_i \subset S_i$. From the definition of the partial order $\le_{\mathcal N}$, it suffices to show the following: If for any open sets $G_i^j \subset S_i$, $j \in J$,

$$(G_1^j \times \cdots \times G_n^j) \cap \Big\{ \sum_{i=1}^n h_i \circ \pi_i < \alpha \Big\} \subset \bigcup_{i=1}^n \pi_i^{-1}(N_i) \tag{2.2.15}$$

for some $P_i(N_i) = 0$, then setting

$$G = \bigcup_{j \in J} G_1^j \times \cdots \times G_n^j,$$

the corresponding inclusion also holds; that is, for some Borel sets $\tilde N_i \subset S_i$ with $P_i(\tilde N_i) = 0$,

$$G \cap \Big\{ \sum_{i=1}^n h_i \circ \pi_i < \alpha \Big\} \subset \bigcup_{i=1}^n \pi_i^{-1}(\tilde N_i). \tag{2.2.16}$$

On the other hand, any set $\{ \sum_{i=1}^n h_i \circ \pi_i < \alpha \}$ is a countable union of products $B_1 \times \cdots \times B_n$ where $B_i = \{h_i < \alpha_i\}$ for some rationals $\alpha_i$. Thus, it is enough to check the implication "(2.2.15) ⇒ (2.2.16)" replacing $\{ \sum_{i=1}^n h_i \circ \pi_i < \alpha \}$ by $B_1 \times \cdots \times B_n$. To this end, set $J_i := \{ j \in J;\ P_i(G_i^j \cap B_i) = 0 \}$ and

$$\tilde N_i := \bigcup_{j \in J_i} (G_i^j \cap B_i).$$



As $G_i^j \cap B_i$ is open in $B_i$, the sets $\tilde N_i$ are Borel sets in $S_i$, and $P_i(\tilde N_i) = 0$. The inclusion

$$(G_1^j \cap B_1) \times \cdots \times (G_n^j \cap B_n) \subset \bigcup_{i=1}^n \pi_i^{-1}(N_i)$$

with $P_i(N_i) = 0$ implies that for each $j \in J$ there is at least one index $i$ with $P_i(G_i^j \cap B_i) = 0$. Therefore, for all $j \in J$, $(G_1^j \times \cdots \times G_n^j) \cap (B_1 \times \cdots \times B_n) \subset \bigcup_i \pi_i^{-1}(\tilde N_i)$, implying that $G \cap (B_1 \times \cdots \times B_n) \subset \bigcup_i \pi_i^{-1}(\tilde N_i)$. The proof of upwards τ-continuity of $I$ is complete.

(c) Finally, let us show that $I$ is τ-continuous downwards on $\mathcal F^b(S)$, the set of upper semicontinuous functions $h$ on $S$ with $\sup_{x \in S} h(x) < \infty$. To do this take a decreasing net $(h^{(j)})_{j \in J}$ in $\mathcal F^b(S)$ with limit $h$. Next, to show that

$$I(h) = \inf_{j \in J} I(h^{(j)}),$$

it suffices to prove that the inequality $\delta > I(h)$ implies

$$\delta > \inf_{j \in J} I(h^{(j)}),$$

as the inequality $I(h) \le \inf_{j \in J} I(h^{(j)})$ is obvious. Without loss of generality, we may assume that $h^{(j)} \le 0$ for all $j \in J$. Applying (2.2.4), we choose bounded $P_i$-integrable functions $h_i$ such that $h \le \sum_{i=1}^n h_i \circ \pi_i$ and $\sum_i P_i(h_i) < \delta$. By Lusin's theorem, there exist compact sets $K_i \subset S_i$ such that $h_i$ is continuous on $K_i$, and moreover,

$$\sum_{i=1}^n P_i(h_i) - n\gamma \sum_{i=1}^n P_i(S_i \setminus K_i) < \delta,$$

where $\gamma = \min_i \inf_{x \in S_i} h_i(x) \le 0$. For any $\varepsilon > 0$, the inequality $h \le \sum_{i=1}^n h_i \circ \pi_i$ implies (by Dini's theorem) the existence of $j_0 \in J$ such that $h^{(j_0)}(x) < \sum_{i=1}^n h_i \circ \pi_i(x) + \varepsilon$ for any $x$ in the compact set $K := K_1 \times \cdots \times K_n$. Now let $\tilde h_i := h_i + \frac{\varepsilon}{n} - n\gamma 1_{S_i \setminus K_i}$. Then

$$h^{(j_0)} \le \sum_{i=1}^n \tilde h_i \circ \pi_i \quad \text{and} \quad \sum_{i=1}^n \mu_i(\tilde h_i) < \delta + \varepsilon.$$

Letting $\delta \downarrow I(h)$ and $\varepsilon \downarrow 0$ implies $I(h) \ge \inf_{j \in J} I(h^{(j)})$, showing the downwards τ-continuity of $I$. □

Kellerer (1984) provides counterexamples showing that $I$ is not σ-continuous downwards on $\mathcal G_b(S)$ and that $S$ is not σ-continuous downwards on $\mathcal G_b(S)$. Also, in Theorem 2.2.11 the lower boundedness in (a) cannot be replaced by finiteness.



In the final part of this section we look at the extent to which continuity properties of $h$ are reflected by corresponding properties of the functions $h_i$ in the definition of $I(h)$. For $\mathcal H \subset \mathcal P(S)$, let $\mathcal H_f$ (resp. $\mathcal H^f$) be the elements in $\mathcal H$ with values in $(-\infty, \infty]$ (resp. $[-\infty, \infty)$).

Proposition 2.2.15 (a) For any $h \in \mathcal P(S)$,

$$I(h) = \inf\Big\{ \sum_{i=1}^n P_i(h_i);\ h_i \in L^1_f(P_i) \cap \mathcal G_b(S_i),\ h \le \oplus h_i \Big\}. \tag{2.2.17}$$

(b) If $h \in \mathcal F(S)$ is majorized from above by some $\oplus h_i^0$ with $h_i^0 \in L^1_f(P_i) \cap \mathcal F(S_i)$, then

$$I(h) = \inf\Big\{ \sum P_i(h_i);\ h_i \in L^1_f(P_i) \cap \mathcal F(S_i),\ h \le \oplus h_i \Big\}. \tag{2.2.18}$$

(c) If the $S_i$ are completely regular and $h \in C(S)$ is majorized from above by some $\oplus h_i^0$ with $h_i^0 \in L^1_f(P_i) \cap C(S_i)$, then

$$I(h) = \inf\Big\{ \sum P_i(h_i);\ h_i \in L^1_f(P_i) \cap C(S_i),\ h \le \oplus h_i \Big\}. \tag{2.2.19}$$

Proof: (a) By the regularity of $P_i$, for any $h_i \in L^1(P_i)$ and $\varepsilon > 0$ there exists $\tilde h_i \in L^1(P_i) \cap \mathcal G_b(S_i)$ with $h_i \le \tilde h_i$ and $P_i(\tilde h_i) - P_i(h_i) < \varepsilon$. This implies (a).

(b) In the case $h \in \mathcal F^b(S)$ the majorization condition is fulfilled, and without loss of generality $h \le 0$. Then $h$ is an infimum of functions of the type $\alpha 1_{G_1 \times \cdots \times G_n}$ with $\alpha < 0$, $G_i \in \mathcal G(S_i)$. By the τ-continuity of $I$ downwards on $\mathcal F^b(S)$, it is enough to consider finite infima, that is, to assume $h \in \mathcal F(S) \cap (\mathcal E(S_1) \times \cdots \times \mathcal E(S_n))$, where $\mathcal E(S_i)$ are the finite elementary functions on $S_i$, and then refine $I(h) = \inf\{ \sum P_i(h_i);\ h \le \oplus h_i,\ h_i \in L^1(P_i) \cap \mathcal E(S_i) \}$. Define

$$\tilde h_1(x_1) := \sup\Big\{ h(x_1, \dots, x_n) - \sum_{i \ne 1} h_i(x_i);\ x_i \in S_i,\ i \ne 1 \Big\}.$$

Then $\tilde h_1 \le h_1$, and still $h(x) \le \tilde h_1(x_1) + \sum_{i \ne 1} h_i(x_i)$. Continuing this way with the other components as well, the special case is settled. In the general case, let $\varepsilon > 0$ and $k \in \mathbb N$ be large enough so that

$$\sum P_i(h_i^1) < \varepsilon \quad \text{for} \quad h_i^1 := \Big( h_i^0 - \frac{k}{2n} \Big)_+ \in \mathcal F^b(S_i).$$



As we have shown, for $\delta > I(h)$ there exist $h_i^2 \in L^1_f(P_i) \cap \mathcal F(S_i)$ such that

$$h \wedge k \le \oplus h_i^2 \quad \text{and} \quad \sum_{i=1}^n P_i(h_i^2) < \delta.$$

Then $h_i := 2h_i^1 + h_i^2 \in L^1_f(P_i) \cap \mathcal F(S_i)$ satisfy $h \le \oplus h_i$. This majorization is obvious in the case $h(x) \le k$. In the other case it follows from the inequality $h(x) \le 2h(x) - k \le 2 \oplus (h_i^0 - \frac{k}{2n})(x)$. Since in addition $\sum P_i(h_i) < 2\varepsilon + \delta$, the assertion is proved.

(c) The proof follows as in part (b). We observe that the functions $h_i$ can be assumed bounded, and we make use of the τ-continuity of $\mu_i$. Complete regularity of $S_i$ implies that any bounded $h_i \in \mathcal F(S_i)$ is in fact a pointwise infimum of bounded continuous functions. This implies (c). □

2.3 Duality Theorems in a Topological Setting

The aim of this section is to develop duality theorems in the topological setting with tight probability measures extending the results in Section 2.1.(1) We say that "duality" (D) holds for $h$ if $S(h) = I(h)$. Let $\mathcal E(S_i)$ denote the elementary Borel functions on $S_i$, and $\mathcal G(S)$, $\mathcal F(S)$ denote the sets of the lower and upper semicontinuous functions on $S$.

(1) The results of this section are due to Kellerer (1984).

Theorem 2.3.1 (Duality theorems for semicontinuous functions)

(a) (D) holds on $\mathcal E(S_1) \otimes \cdots \otimes \mathcal E(S_n)$.
(b) (D) holds on $\overline{\mathcal G_m}(S)$.
(c) (D) holds on $\mathcal F(S)$.

Proof: (a) The result follows from Theorems 2.1.4 and 2.1.1. Alternatively, one can also apply directly the duality theorem from finite linear programming.

(b) Any $h \in \mathcal G_b(S)$ (which we assume without loss of generality to be $\ge 0$) is the limit of an increasing net of finite suprema of the type $\alpha 1_{G_1 \times \cdots \times G_n}$, $\alpha \ge 0$, $G_i \in \mathcal G(S_i)$. Therefore, by (a) and using that $S$ and $I$ are τ-continuous upwards on $\mathcal G_b(S)$ (cf. Theorems 2.2.11, 2.2.14) we obtain that $S(h) = I(h)$. Since $S$, $I$ are continuous with respect to $d$, the duality can be extended to the closure of $\mathcal G_b(S)$ (with respect to $d$), which, in fact, is identical to the closure of $\mathcal G_m(S)$ (with respect to $d$).

(c) As in (b) duality holds on $\overline{\mathcal F^m}(S)$. If $h \in \mathcal F(S)$ and $S(h) = \infty$, then clearly $S(h) \le I(h) = \infty$. So without loss of generality assume that $S(h) < \infty$. The latter implies $P^*(h) < \infty$ for all $P \in M$, and consequently, $P^*(h_+) < \infty$, $h_+ := \max(h, 0)$. On the other hand, by the duality on $\overline{\mathcal F^m}(S)$, we have $I(h_+) = \lim I(h_+ \wedge k) = \lim S(h_+ \wedge k) = S(h_+)$. Therefore, $I(h) \le I(h_+) = S(h_+) < \infty$, and so $h \in \overline{\mathcal F^m}(S)$; that is, duality holds for $h$. □

The following example shows that the duality theorem fails when $\overline{\mathcal G_m}(S)$ is extended to $\mathcal G(S)$.

Example 2.3.2 Let $n = 2$, $S_i = [0, 1]$, $P_i$ be Lebesgue measure, and $h(x_1, x_2) = (-\infty) 1_{\{x_1 \ge x_2\}}$. Then $h$ is lower semicontinuous on $S = [0, 1] \times [0, 1]$. However, $S(h) = -\infty$ and $I(h) = 0$. To see this, suppose that a probability $P$ on $S$ with uniform marginals has $P^*(h) > -\infty$. Thus $P$ is concentrated above the diagonal $x_1 = x_2$, and

$$\int_S x_1 \, dP(x_1, x_2) < \int_S x_2 \, dP(x_1, x_2),$$

which contradicts the relation $\pi_1 P = \pi_2 P$. So, $S(h) = -\infty$. To prove that

$$I(h) = \inf\{ P_1(h_1) + P_2(h_2);\ h_1(x_1) + h_2(x_2) \ge (-\infty) 1_{\{x_1 \ge x_2\}} \} = 0,$$

it is enough to show that $P_1(h_1) + P_2(h_2) \ge 0$ whenever $h_1(x_1) + h_2(x_2) \ge 0$ for $x_1 < x_2$. In fact,

$$P_1(h_1) + P_2(h_2) = \lim_{\varepsilon \downarrow 0} \Big( \int_0^{1-\varepsilon} h_1(s) \, ds + \int_\varepsilon^1 h_2(s) \, ds \Big) = \lim_{\varepsilon \downarrow 0} \int_\varepsilon^1 (h_1(s - \varepsilon) + h_2(s)) \, ds \ge 0.$$

Therefore, $S$ will be strictly smaller than $I$ on $\mathcal G(S)$.
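The key step in Example 2.3.2, that a coupling of two identical marginals cannot be concentrated strictly above the diagonal, has a simple finite analogue. The sketch below is an illustration under assumptions not made in the text: both marginals are taken uniform on $\{0, \dots, m-1\}$, and only permutation couplings are enumerated, which suffices for a linear objective because, by the Birkhoff-von Neumann theorem, the couplings of two uniform laws on $m$ points are convex combinations of permutation couplings. The function name `best_mass_above_diagonal` is hypothetical.

```python
from itertools import permutations


def best_mass_above_diagonal(m):
    """Largest mass that a permutation coupling of two uniform laws on
    {0, ..., m-1} can put on the strict upper triangle {i < sigma(i)}."""
    best = 0
    for sigma in permutations(range(m)):
        above = sum(1 for i in range(m) if i < sigma[i])
        best = max(best, above)
    return best / m


# The maximal mass is (m - 1) / m: it approaches 1 as m grows,
# but no coupling ever reaches mass 1.
values = {m: best_mass_above_diagonal(m) for m in range(2, 7)}
for m, v in values.items():
    print(m, v)
```

The cyclic shift $\sigma(i) = i + 1 \pmod m$ attains the value $(m-1)/m$, mirroring the shift couplings on $[0, 1]$; the full mass 1 is impossible because $\sum_i \sigma(i) = \sum_i i$ for every permutation.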



The result for semicontinuous functions serves as a starting point for general duality theorems. The extension from semicontinuous to measurable cost functions will require some auxiliary results on Suslin sets and functions (see, for example, the monographs of Hausdorff (1957) and Dellacherie and Meyer (1983, part C)). Let $Y$ denote an arbitrary set and $\mathcal P(Y)$ stand for the set of functions on $Y$ with values on the extended real line $\overline{\mathbb R}$.

Definition 2.3.3 (Suslin sets and functions)

(i) Given a family $\mathcal U$ of subsets of $Y$, any set $B$ of the form

$$B = \bigcup_{(k_\ell) \in \mathbb N^{\mathbb N}} \bigcap_{\ell \in \mathbb N} A_{k_1 \dots k_\ell}, \quad \text{with } A_{k_1 \dots k_\ell} \in \mathcal U, \tag{2.3.1}$$

where the union is taken over all sequences $(k_\ell)_{\ell \in \mathbb N}$ in $\mathbb N$, is called a $\mathcal U$-Suslin set.

(ii) Given a family $\mathcal A \subset \mathcal P(Y)$, any function $g$ of the form

$$g = \sup_{(k_\ell) \in \mathbb N^{\mathbb N}} \inf_{\ell \in \mathbb N} f_{k_1 \dots k_\ell}, \quad \text{with } f_{k_1 \dots k_\ell} \in \mathcal A, \tag{2.3.2}$$

where the supremum is taken over all sequences $(k_\ell)_{\ell \in \mathbb N}$ in $\mathbb N$, is called an $\mathcal A$-Suslin function.

From Hausdorff (1957, p. 106) we infer that if $(B_n)_{n \in \mathbb N}$ is an increasing (resp. decreasing) sequence of $\mathcal U$-Suslin sets, then $\bigcup_{n \in \mathbb N} B_n$ (resp. $\bigcap_{n \in \mathbb N} B_n$) is also a $\mathcal U$-Suslin set. Note that $\mathcal U$ itself is a family of $\mathcal U$-Suslin sets. A similar result is valid for monotone sequences of $\mathcal A$-Suslin functions. In other words, $\mathcal U$-Suslin sets and $\mathcal A$-Suslin functions are σ-lattices containing $\mathcal U$ and $\mathcal A$, respectively.

Definition 2.3.4 For a subset $\mathcal A \subset \mathcal P(Y)$ a functional $C : \mathcal P(Y) \to \overline{\mathbb R}$ is said to be an $\mathcal A$-capacity if

(i) $\mathcal A$ is a lattice;
(ii) $f \le g \Rightarrow C(f) \le C(g)$ (that is, $C$ is isotonic);
(iii) $C$ is σ-continuous upwards on $\mathcal P(Y)$;
(iv) $C$ is σ-continuous downwards on $\mathcal A$.



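Theorem 2.3.5 (Choquet's Theorem) Let $C : \mathcal P(Y) \to \overline{\mathbb R}$ be an $\mathcal A$-capacity, where $\mathcal A \subset \mathcal P(Y)$ is stable with respect to countable infima. Then the approximation

$$C(g) = \sup\{ C(f);\ f \in \mathcal A;\ f \le g \} \tag{2.3.3}$$

holds for all $\mathcal A$-Suslin functions $g$.

Proof: Cf. Choquet (1959, Theorem 1, p. 84) for Suslin sets. The proof extends in a similar way to Suslin functions. □

Given a topological space $Y$ let $\mathcal L(Y)$ denote the class of all $\mathcal F(Y)$-Suslin sets and $\mathcal S(Y)$ denote the class of all $\mathcal F(Y)$-Suslin functions.

Lemma 2.3.6 $g : Y \to \overline{\mathbb R}$ is in $\mathcal S(Y)$ if and only if for all $\alpha \in \mathbb R$, $\{g \ge \alpha\} \in \mathcal L(Y)$.

Proof: Suppose $\{g \ge \alpha\} \in \mathcal L(Y)$ and rewrite $g$ as a supremum over all $g_\alpha : Y \to \{-\infty, \alpha\}$, where $\alpha$ is a rational number, defined by $\{g_\alpha = \alpha\} = \{g \ge \alpha\}$. Recall that $\mathcal S(Y)$ is a σ-lattice, and therefore, from the above representation of $g$, it suffices to show that $g_\alpha \in \mathcal S(Y)$. Indeed, by the assumption

$$\{g_\alpha = \alpha\} = \{g \ge \alpha\} = \bigcup_{(k_\ell) \in \mathbb N^{\mathbb N}} \bigcap_{\ell \in \mathbb N} A_{k_1 \cdots k_\ell}$$

with $A_{k_1 \cdots k_\ell}$ being closed sets, we see that $g_\alpha$ enjoys the representation

$$\sup_{(k_\ell) \in \mathbb N^{\mathbb N}} \inf_{\ell \in \mathbb N} f_{k_1 \cdots k_\ell},$$

where $f_{k_1 \cdots k_\ell} : Y \to \{-\infty, \alpha\}$ and $\{f_{k_1 \cdots k_\ell} = \alpha\} = A_{k_1 \cdots k_\ell}$. This implies that $g_\alpha \in \mathcal S(Y)$.

□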
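The level-set representation used in the proof of Lemma 2.3.6, $g = \sup_\alpha g_\alpha$ with $g_\alpha = \alpha$ on $\{g \ge \alpha\}$ and $-\infty$ elsewhere, can be checked directly when $g$ takes finitely many values. The following is a small sketch under that simplifying assumption (the set `Y`, the function `g`, and the helper `g_alpha` are hypothetical, and the supremum over all rationals is replaced by a maximum over the finitely many values of `g`).

```python
NEG_INF = float("-inf")

# Hypothetical finite set Y and a function g on it.
Y = ["a", "b", "c", "d"]
g = {"a": 0.5, "b": -1.0, "c": 2.0, "d": 0.5}


def g_alpha(alpha):
    """The two-valued function equal to alpha on {g >= alpha}, -inf elsewhere."""
    return {y: (alpha if g[y] >= alpha else NEG_INF) for y in Y}


# For a finitely valued g, the levels actually attained by g suffice.
levels = sorted(set(g.values()))
reconstructed = {y: max(g_alpha(a)[y] for a in levels) for y in Y}
print(reconstructed == g)
```

In the lemma itself the supremum runs over all rational $\alpha$, which is what keeps the representation countable for general $g$.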

Based on Choquet's theorem, the duality results for semicontinuous functions can be extended to the set of Suslin functions.

Theorem 2.3.7 (Duality for Suslin functions) The duality $S(h) = I(h)$ holds on $\overline{\mathcal S_m}(S)$, the closure of the $\mathcal F(S)$-Suslin functions that are majorized from below.

Proof: First let us show that $S(h) = I(h)$ for all bounded $h \in \mathcal S(S)$. For any $h : S \to \overline{\mathbb R}$ and $k \in \mathbb N$, set $h_k := (-k) \vee (h \wedge k)$, $S^k(h) := S(h_k)$, and $I^k(h) := I(h_k)$. In view of the upwards σ-continuity of $S$ for bounded $h$, as well as its downwards σ-continuity for bounded upper semicontinuous functions (cf. Section 2.2), it follows that $S^k$ is an $\mathcal F(S)$-capacity. The same is true for $I^k$, and moreover, applying Theorem 2.3.1, $S^k = I^k$ on $\mathcal F(S)$. Choquet's theorem (cf. (2.3.3)) now gives an extension of the duality $S = I$ for all bounded $\mathcal F(S)$-Suslin functions as claimed. Next, for any $h \in \mathcal S(S)$ with $\inf_{x \in S} h(x) > -\infty$, the equality $S(h) = I(h)$ is a consequence of the upwards σ-continuity of $S$ and $I$. As $S$ and $I$ are continuous with respect to the semimetric $d$, the equality extends, furthermore, to $\overline{\mathcal S_b}(S) = \overline{\mathcal S_m}(S)$. □

Let $\mathcal B_o(S) = \sigma(C(S))$ denote the Baire sets in $S$. As a corollary of Theorem 2.3.7 one obtains the following important result.

Theorem 2.3.8 (Baire functions, product measurable functions)

(a) (D) holds for all Baire functions in $\mathcal P_m(S)$.
(b) (D) holds for all functions in $\mathcal P_m(S)$ measurable with respect to $\mathcal B(S_1) \otimes \cdots \otimes \mathcal B(S_n)$.

Proof: (a) Since $\mathcal B_o(S) = \sigma(C(S)) \subset \sigma(\mathcal F(S)) \subset \mathcal L(S)$ and the set of Baire functions is identical to the σ-lattice generated by $C(S)$, then (a) is a consequence of Theorem 2.3.7 and Lemma 2.3.6.

(b) The set of all functions measurable with respect to $\mathcal B(S_1) \otimes \cdots \otimes \mathcal B(S_n)$ is identical to the σ-lattice generated by $\mathcal E(S_1) \otimes \cdots \otimes \mathcal E(S_n)$. By making use of the regularity of measures, it is easy to see that

$$\mathcal E(S_1) \otimes \cdots \otimes \mathcal E(S_n) \subset \mathcal F(S) \cap \mathcal G(S). \tag{2.3.4}$$

Therefore, $\sigma(\mathcal E(S_1) \otimes \cdots \otimes \mathcal E(S_n)) \subset \sigma(\mathcal F(S)) \subset \sigma(\mathcal S(S))$. Moreover, $\mathcal S(S)$ is a σ-lattice, and therefore (b) follows from Theorem 2.3.7. □

Corollary 2.3.9 Suppose that each $S_i$, $i = 1, \dots, n$, possesses at least one of the following properties:

(i) $S_i$ is second countable;
(ii) $S_i$ is metrizable;
(iii) $S_i$ is a Suslin space;
(iv) $S_i$ is perfect.

Then the duality theorem $S(h) = I(h)$ holds for all $h \in \mathcal P_m(S)$ that are measurable with respect to $\mathcal B(S)$.

Proof: In (i) and (ii) each $S_i$ can be viewed as a union of countably many Suslin spaces and a $P_i$-null set. Therefore, it suffices to show the duality for $S_i$ being a union of Suslin spaces, and thus being a Suslin space itself. Then $S$ is again a Suslin space, implying that the Borel σ-algebra $\mathcal B(S)$ coincides with the product $\mathcal B(S_1) \otimes \cdots \otimes \mathcal B(S_n)$. The result follows from Theorem 2.3.8. If the $S_i$ are perfect, then $S$ again is perfect, and therefore each open set is of type $F_\sigma$. Thus $\mathcal B(S)$ is the σ-lattice generated by $\mathcal F(S)$, and for any $\mathcal B(S)$-measurable $h$ we have $\{h \ge \alpha\} \in \sigma(\mathcal F(S)) \subset \mathcal L(S)$, and thus we can apply Theorem 2.3.7, completing the proof. □

We finally look at the problem of existence of optimal solutions of the primal and dual transportation problems. We start with the optimality in the primal transportation problem, $S(h) = \sup\{P^*(h);\ P \in M\}$; that is, we are interested in the existence of an "optimal" probability distribution $P$ on $S$ with marginals $P_i$ and $S(h) = P^*(h)$.

Theorem 2.3.10 If $h : S \to \overline{\mathbb R}$, $h \in \overline{\mathcal F}(S)$, that is, $h$ is in the $d$-closure of the set of upper semicontinuous functions on $S$, then there exists an optimal $P \in M$.

Proof: Recall that if $S(h) = \infty$, then $P^*(h) = \infty$ for some $P$ with projections $P_i$; so let us assume that $S(h) < \infty$. The duality theorem 2.3.1 then yields the finiteness of $I(h)$, and thus $h \le \sum_{i=1}^n h_i \circ \pi_i$ for some $P_i$-measurable $h_i : S_i \to (-\infty, +\infty]$; that is, $h \in \overline{\mathcal F^m}(S) = \overline{\mathcal F^b}(S)$. Consequently there exist functions $h^k \in \mathcal F^b(S)$ such that $d(h, h^k) < \frac{1}{k}$ for all $k \in \mathbb N$. Then for every $P \in M$,

$$P^*(h^k) - \frac{1}{k} \le P^*(h) \le P^*(h^k) + \frac{1}{k},$$

and hence,

$$M(h; \delta) := \{P \in M;\ P^*(h) \ge \delta\} = \bigcap_{k \in \mathbb N} M\Big(h^k;\ \delta - \frac{1}{k}\Big)$$

for all $\delta < S(h)$. The nonempty sets $M(h; \delta)$ are compact (cf. Proposition 2.2.1) and form a decreasing sequence as $\delta \uparrow S(h)$. Then any $P$ from the intersection $\bigcap\{M(h; \delta);\ \delta < S(h)\} \ne \emptyset$ is optimal. □



Remark 2.3.11 The result on the existence of optimal measures cannot be extended to the class of all lower semicontinuous functions $h$. To see this, take $n = 2$, $S_i = [0, 1]$, and Lebesgue measure $P_i$. Then, for the open set $G = \{(x_1, x_2);\ x_1 < x_2\}$ we have $S(1_G) = 1$. On the other hand, assuming that there exists an optimal feasible solution $P$ leads to a contradiction, for

$$\int_S x_1 \, dP < \int_S x_2 \, dP$$

implies $\pi_1 P \ne \pi_2 P$; that is, $P$ is not feasible.

Next we study the existence of optimal solutions in the dual problem,

$$I(h) = \inf\Big\{ \sum_{i=1}^n P_i(h_i);\ h_i : S_i \to (-\infty, +\infty],\ i = 1, \dots, n,\ h_i \in L^1(P_i),\ h \le \sum_{i=1}^n h_i \circ \pi_i \Big\}.$$

In other words, we are interested in the optimal feasible functions $h_i$ ($i = 1, \dots, n$) that attain the infimum in $I(h)$. The case of bounded $h$ was analyzed in Theorem 2.1.1.

Theorem 2.3.12 Suppose $I(h) < \infty$ for $h \in \mathcal P_m(S)$. Then there exists an optimal solution for the dual problem.

Proof: Recall that if $h =_{\mathcal N} h'$, then $\{h \ne h'\}$ is contained in $\bigcup_i \pi_i^{-1}(N_i)$ for some $N_i$ with $P_i(N_i) = 0$ for $i = 1, \dots, n$. Choosing an appropriate $\mathcal N$-equivalent $h$, we may assume that in fact the infimum in $I(h)$ is taken over finite $h_i$. Without loss of generality we may assume $h$ to be nonnegative. Invoking Lemma 2.2.13, the sets $L(h; \delta)$ are compact and $\bigcap\{L(h; \delta);\ \delta > I(h)\}$ contains at least one sequence, say, $(h_1^{(k)}, \dots, h_n^{(k)})_{k \in \mathbb N}$. Setting $h_i = \limsup_{k \to \infty} h_i^{(k)}$, we have, using Lemma 2.2.13 again, that the $h_i$ are nonnegative, $P_i$-integrable, $\sum_{i=1}^n P_i(h_i) \le \delta$, and $h \le_{\mathcal N} \sum_{i=1}^n h_i \circ \pi_i$. Letting $\delta \downarrow I(h)$ implies $\sum_{i=1}^n P_i(h_i) \le I(h)$. Redefining $h_i$, one obtains $h \le \sum_{i=1}^n h_i \circ \pi_i$ everywhere, and the result follows. □

Remark 2.3.13 The existence theorem for optimal solutions of the dual functional $I$ does not need the topological assumptions on $S_i$ and is in fact true for functions majorized from below as in 2.3.12.
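On finite spaces both the primal and the dual problem have optimal solutions and duality reduces to finite linear programming (cf. Theorem 2.3.1(a)). The following sketch illustrates this for $n = 2$ under assumptions that are not part of the text: the cost matrix `h` is hypothetical, both marginals are uniform on three points, the primal optimum is found by enumerating permutation couplings (sufficient here by the Birkhoff-von Neumann theorem), and the dual pair `h1`, `h2` is hand-picked for this matrix rather than computed by an algorithm.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical function h on {0,1,2} x {0,1,2}, given as a matrix.
h = [[4, 1, 3],
     [2, 0, 5],
     [3, 2, 2]]
m = len(h)
w = Fraction(1, m)  # uniform marginal weights P1 = P2

# Primal problem S(h): maximize P(h) over couplings; for uniform marginals
# it is enough to search over permutation couplings.
best_sigma = max(permutations(range(m)),
                 key=lambda s: sum(h[i][s[i]] for i in range(m)))
primal = w * sum(h[i][best_sigma[i]] for i in range(m))

# Dual candidate (hand-picked): h1(i) + h2(j) >= h[i][j] for all i, j.
h1 = [4, 2, 3]
h2 = [0, -1, 3]
feasible = all(h1[i] + h2[j] >= h[i][j] for i in range(m) for j in range(m))
dual = w * (sum(h1) + sum(h2))

print(best_sigma, primal, feasible, dual)
```

Because the dual pair is feasible and its value coincides with the primal value, both are optimal by weak duality $S(h) \le I(h)$; the dual constraint is tight exactly on the pairs $(i, \sigma(i))$ charged by the optimal coupling.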

2.4 General Duality Theorem

In this section the duality theory is extended to a nontopological setting. (The results here are due to Ramachandran and Rüschendorf (1995).) As we shall see, the notion of perfectness of measures is essential for a general version of duality theory. Recall that a probability measure $P$ on $(X, \mathcal A)$ is called perfect if for every real-valued $\mathcal A$-measurable function $f$ on $X$ there exists a Borel set $B \subset f(X)$ such that $P(f^{-1}(B)) = 1$. We shall make use of two basic properties of perfect measures.

Theorem 2.4.1 (Perfectness and marginals; cf. Pachl (1979), Ramachandran (1979)) Let $(X, \mathcal A, P)$ be a probability space. The following statements are equivalent:

(i) $P$ is perfect.
(ii) If $(Y, \mathcal B, Q)$ is a probability space and $\lambda$ is a charge(2) on the algebra $\mathrm{alg}(\{A \times B;\ A \in \mathcal A, B \in \mathcal B\})$ with marginals $P$ and $Q$, then $\lambda$ is σ-additive.
(iii) If $(Y, \mathcal B, Q)$ is a probability space and $\lambda \in M(P, Q)$, then $\lambda^*(X \times F) = Q^*(F)$ for all $F \subset Y$.

(2) A charge is a positive, finitely additive measure on an algebra.

For extensions of this characterization to products with more than two factors we refer to Ramachandran (1993).

Theorem 2.4.2 (Perfectness and marginal extensions; cf. Ramachandran (1993)) Let $(X_i, \mathcal A_i, P_i)$, $i = 1, 2$, be probability spaces where $P_2$ is perfect. Let $\mathcal D_i \subset \mathcal A_i$ be sub-σ-algebras and $\lambda$ be a measure on $\mathcal D_1 \otimes \mathcal D_2$ with marginals $P_i / \mathcal D_i$. Then $\lambda$ admits an extension $\bar\lambda$ to $\mathcal A_1 \otimes \mathcal A_2$ with marginals $P_i$.

Proof: Consider the algebras $\mathcal C_1 = \mathrm{alg}(\mathcal D_1 \times \mathcal D_2)$, $\mathcal C_2 = \mathcal A_1 \times X_2$, and define $\mu_1 = \lambda / \mathcal C_1$, $\mu_2(A_1 \times X_2) := P_1(A_1)$. If $C_i \in \mathcal C_i$ and $C_1 \subset C_2$, then $C_1 = \bigcup_{k=1}^n D_{1k} \times D_{2k} \subset C_2 = A_1 \times X_2$. Hence $\bigcup_{k=1}^n D_{1k} \times D_{2k} \subset (\bigcup_k D_{1k}) \times X_2 \subset A_1 \times X_2 = C_2$, and so

$$\mu_1(C_1) = \lambda\Big( \bigcup_k D_{1k} \times D_{2k} \Big) \le P_1\Big( \bigcup_k D_{1k} \Big) \le P_1(A_1) = \mu_2(C_2).$$

By a well-known extension result of Guy (1961) this consistency condition implies the existence of a charge $\mu_0$ on $\mathcal C_1 \vee \mathcal C_2 = \mathrm{alg}(\mathcal A_1 \times \mathcal D_2)$ extending $\mu_i$ on $\mathcal C_i$. Since the restriction $P_2 / \mathcal D_2$ is perfect (cf. Ramachandran (1979)), then $\mu_0$ is σ-additive, and it extends to a measure $\bar\mu_0$ on $\mathcal A_1 \otimes \mathcal D_2$. By construction, the marginals of $\bar\mu_0$ are $P_1$ and $P_2 / \mathcal D_2$, and $\bar\mu_0 / \mathcal D_1 \otimes \mathcal D_2 = \lambda$. Repeating the argument, starting with $\bar\mu_0$ on $\mathcal A_1 \otimes \mathcal D_2$ and the measure induced by $P_2$ on $X_1 \times \mathcal A_2$, we get the desired extension $\bar\lambda$ on $\mathcal A_1 \otimes \mathcal A_2$ with marginals $P_i$. □

Let $(\bigotimes_{i=1}^n \mathcal B_i)_m$ denote the class of all $\otimes \mathcal B_i$-measurable functions $h$ with $h \ge \bigoplus_{i=1}^n h_i$ for some $h_i \in L^1(P_i)$.



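Theorem 2.4.3 (General duality theorem) Let $(S_i, \mathcal B_i, P_i)$ be measurable spaces, $1 \le i \le n$, and let all but one of the $P_i$ be perfect. Then the duality theorem $S(h) = I(h)$ holds for all $h \in (\bigotimes_{i=1}^n \mathcal B_i)_m$.

Proof: For the proof we can restrict to the case of two factors; the general case is done similarly. We proceed in several steps.

Step 1: If $S_i = [0, 1]$, $\mathcal B_i = \mathcal B$, then $S(h) = I(h)$ for $h \in (\mathcal B_1 \otimes \mathcal B_2)_m$. This has been proved in Theorem 2.3.8.

Step 2: Let $S_1 \subset [0, 1]$, $\mathcal B_1 = \mathcal B \cap S_1$, $S_2 = [0, 1]$, $\mathcal B_2 = \mathcal B$. Define $\bar P_1$ on $\mathcal B$ by $\bar P_1(B) = P_1(B \cap S_1)$. Similarly, for $P \in M(P_1, P_2)$ define $\bar P$ on $\mathcal B \otimes \mathcal B$ by $\bar P(C) = P(C \cap (S_1 \times S_2))$.

(a) $P \to \bar P$ is a 1 to 1 correspondence between $M_{\mathcal B_1 \otimes \mathcal B_2}(P_1, P_2)$ and $M_{\mathcal B \otimes \mathcal B}(\bar P_1, P_2)$, and
(b) For any $h \in \mathcal B_1 \otimes \mathcal B_2$ there is an $\bar h \in \mathcal B \otimes \mathcal B$ such that $\bar h|_{S_1 \times S_2} = h$ and $\int \bar h \, d\bar P = \int h \, dP$.

For the proof of (a), (b) note:

(a) It is easy to verify that $\bar P$ has marginals $\bar P_1$ and $P_2$. Conversely, if $Q \in M_{\mathcal B \otimes \mathcal B}(\bar P_1, P_2)$, then $Q^*(S_1 \times [0, 1]) = 1$ (see Theorem 2.4.1). Since $(\bar P_1)^*|_{\mathcal B_1} = P_1$, it follows that $P = Q^*|_{\mathcal B_1 \otimes \mathcal B_2} \in M_{\mathcal B_1 \otimes \mathcal B_2}(P_1, P_2)$, and it holds that $\bar P = Q$.
(b) Since $\mathcal B_1 \otimes \mathcal B_2 = (\mathcal B \otimes \mathcal B) \cap (S_1 \times S_2)$, for $h = 1_{C \cap (S_1 \times S_2)}$ we can take $\bar h = 1_C$. Using standard techniques involving simple functions and the monotone convergence theorem, the result holds for all $\mathcal B_1 \otimes \mathcal B_2$-measurable functions.

Step 3: Let $S_1 \subset [0, 1]$, $\mathcal B_1 = \mathcal B \cap S_1$, $S_2 = [0, 1]$, $\mathcal B_2 = \mathcal B$. Then (D) holds for $h \in (\mathcal B_1 \otimes \mathcal B_2)_m$. The proof follows from the following sequence of inequalities:

$$\begin{aligned}
S_{\mathcal B_1 \otimes \mathcal B_2}(h) &= \sup\Big\{ \int h \, dP;\ P \in M_{\mathcal B_1 \otimes \mathcal B_2}(P_1, P_2) \Big\} \\
&= \sup\Big\{ \int \bar h \, d\bar P;\ \bar P \in M_{\mathcal B \otimes \mathcal B}(\bar P_1, P_2) \Big\} && \text{by Step 2} \\
&= S_{\mathcal B \otimes \mathcal B}(\bar h) = I_{\mathcal B \otimes \mathcal B}(\bar h) && \text{by Step 1} \\
&\ge I_{\mathcal B_1 \otimes \mathcal B_2}(h) && \text{since } \bar h \le \oplus_i \bar f_i \text{ and so } h \le \oplus_i \bar f_i|_{S_i} \\
&\ge S_{\mathcal B_1 \otimes \mathcal B_2}(h).
\end{aligned}$$

Step 4: Let $(S_i, \mathcal B_i, P_i)$ be two probability spaces where $\mathcal B_1$, $\mathcal B_2$ are countably generated and $P_2$ is perfect. Then (D) holds for $h \in (\mathcal B_1 \otimes \mathcal B_2)_m$. For the proof we use the Marczewski function of countable generators of $\mathcal B_i$ and thereby identify $S_i \subset [0, 1]$, $\mathcal B_i = \mathcal B \cap S_i$. Further, since $P_2$ is perfect, we can find a Borel set $Y \subset S_2$ such that $P_2(Y) = 1$. After discarding the $P_2$-null set $(S_2 - Y)$ and using the Borel isomorphism theorem, we can take $S_2 = [0, 1]$ and $\mathcal B_2 = \mathcal B$. Now Step 2 applies, and the desired conclusion follows.

Now we can establish our main result.

Step 5: Let $(S_i, \mathcal B_i)$ be arbitrary measurable spaces, and let $P_2$ be perfect. Then (D) holds for $h \in (\mathcal B_1 \otimes \mathcal B_2)_m$. For the proof of Step 5, let $h \in (\mathcal B_1 \otimes \mathcal B_2)_m$, and let $f_i \in \mathcal B_i$, $i = 1, 2$, with $f_i \in L^1(P_i)$ be such that $h \ge \oplus_i f_i$. Furthermore, consider

(a) $\mathcal D = \sigma(h) = \{h^{-1}(B);\ B$ is a Borel subset of the real line$\}$. Then $\mathcal D$ is countably generated, and hence $\mathcal D = \sigma(\{D_k\}_{k=1}^\infty)$. Since $\mathcal B_1 \otimes \mathcal B_2 = \sigma(\{B_1 \times B_2;\ B_i \in \mathcal B_i,\ i = 1, 2\})$, we have, for each fixed $D_k$, a sequence $\{B_{1n}^k \times B_{2n}^k;\ n \ge 1,\ B_{in}^k \in \mathcal B_i,\ i = 1, 2\}$ such that $D_k \in \sigma(\{B_{1n}^k \times B_{2n}^k\}_{n=1}^\infty)$. For $i = 1, 2$ let $\mathcal C_i = \sigma(f_i) = \{f_i^{-1}(B);\ B$ is a Borel subset of the real line$\}$, which is in fact countably generated. If we let $\mathcal D_i = \sigma(\{B_{in}^k;\ k \ge 1, n \ge 1\}, \mathcal C_i)$, then $\mathcal D$, $\mathcal D_1$, $\mathcal D_2$, $\mathcal D_1 \otimes \mathcal D_2$ are all countably generated such that $\mathcal D = \sigma(h) \subset \mathcal D_1 \otimes \mathcal D_2 \subset \mathcal B_1 \otimes \mathcal B_2$. Finally, $h \in (\mathcal D_1 \otimes \mathcal D_2)_m$.

(b) If $P \in M_{\mathcal B_1 \otimes \mathcal B_2}(P_1, P_2)$ then let $\bar P = P|_{\mathcal D_1 \otimes \mathcal D_2}$, $\bar P_i = P_i|_{\mathcal D_i}$, $i = 1, 2$. Note that $\bar P \in M_{\mathcal D_1 \otimes \mathcal D_2}(\bar P_1, \bar P_2)$ with $\int h \, d\bar P = \int h \, dP$. Conversely, since $P_2$ is perfect, for any $Q \in M_{\mathcal D_1 \otimes \mathcal D_2}(\bar P_1, \bar P_2)$ we can find by Theorem 2.4.2 a measure $P \in M_{\mathcal B_1 \otimes \mathcal B_2}(P_1, P_2)$ such that $\bar P = Q$. It follows again that $\int h \, d\bar P = \int h \, dP$.

(c)
$$\begin{aligned}
I_{\mathcal B_1 \otimes \mathcal B_2}(h) &\ge S_{\mathcal B_1 \otimes \mathcal B_2}(h) \\
&= \sup\Big\{ \int h \, dP;\ P \in M_{\mathcal B_1 \otimes \mathcal B_2}(P_1, P_2) \Big\} \\
&= \sup\Big\{ \int h \, d\bar P;\ \bar P \in M_{\mathcal D_1 \otimes \mathcal D_2}(\bar P_1, \bar P_2) \Big\} && \text{(by (b))} \\
&= S_{\mathcal D_1 \otimes \mathcal D_2}(h) = I_{\mathcal D_1 \otimes \mathcal D_2}(h) && \text{by Step 4} \\
&\ge I_{\mathcal B_1 \otimes \mathcal B_2}(h) && \text{(by the definition of } I \text{ since } \mathcal D_1 \otimes \mathcal D_2 \subset \mathcal B_1 \otimes \mathcal B_2\text{)}.
\end{aligned}$$

Hence (D) holds. □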

Corollary 2.4.4 If $\mathcal C_1 \otimes \mathcal C_2 \subset \mathcal B_1 \otimes \mathcal B_2$, $P_2$ on $\mathcal B_2$ is perfect and $h \in (\mathcal C_1 \otimes \mathcal C_2)_m$, then $I_{\mathcal C_1 \otimes \mathcal C_2}(h) = I_{\mathcal B_1 \otimes \mathcal B_2}(h)$.

There is an example in Ramachandran and Rüschendorf (1996a) of a thick nonperfect subspace $(Y, \mathcal A, P) \subset ([0, 1], \mathcal B, \lambda)$ (i.e., $Y \subset [0, 1]$, $\mathcal A = \mathcal B \cap Y$, and $P = \lambda^* / \mathcal A$) such that the duality theorem holds for $(Y, \mathcal A, P) \otimes (Y, \mathcal A, P)$. For a converse to the duality theorem we next introduce the notion of a strong duality space.

Definition 2.4.5 A probability space $(S_1, \mathcal A_1, P_1)$ is called a strong duality space if for all $(S_2, \mathcal A_2, P_2)$ the duality theorem holds on $S_1 \times S_2$ and if for all $\mathcal B_2 \subset \mathcal A_2$ and $h \in (\mathcal A_1 \otimes \mathcal B_2)$, $I_{\mathcal A_1 \otimes \mathcal A_2}(h) = I_{\mathcal A_1 \otimes \mathcal B_2}(h)$.

This notion of a strong duality space characterizes perfectness (cf. Ramachandran and Rüschendorf (1997)).

Theorem 2.4.6 A probability space is a strong duality space if and only if it is a perfect space.

2.5 Duality Theorems with Metric Cost Functions

The duality theorems 2.4.3, 2.3.10 admit a more precise form if one assumes that the cost function is of a special form. Suppose that $S_i$, $i = 1, \dots, n$, are copies of a separable metric space $U$ with metric $d$. Let $n = 2, 3, \dots$, and let $\|b\|$, $b \in \mathbb R^m$, $m = \binom{n}{2}$, be a monotone seminorm in $\mathbb R^m$ with the following property: If $0 < b_i' \le b_i''$, $i = 1, \dots, m$, then $\|b'\| \le \|b''\|$. For any $x = (x_1, \dots, x_n) \in S = U^n$, let

$$D(x) = (d(x_1, x_2), d(x_1, x_3), \dots, d(x_1, x_n), d(x_2, x_3), \dots, d(x_{n-1}, x_n))$$

and $c(x) = H(\|D(x)\|)$, where $H$ is a convex nonnegative function on $\mathbb R_+$, vanishing at the origin and with $\sup\{H(2t)/H(t);\ t > 0\} < \infty$. Let $\mathbf P = (P_1, \dots, P_n)$ be a vector of Borel probability measures (probabilities) on $U$ (not necessarily tight) and define the generalized Monge–Kantorovich functional with cost function $c$:

$$A_c(\mathbf P) := \inf\{ P(c);\ P \in M(P_1, \dots, P_n) \} \tag{2.5.1}$$

(cf. (2.1.1)). The restriction on the cost function $c$ gives rise to a duality theorem more refined than Theorem 2.4.3 concerning the functions in the dual problem. In fact, let $\mathcal P^H$ be the space of probabilities $P$ on $U$ with $\int H(d(x, a)) P(dx) < \infty$ for some $a \in U$. Then $A_c(\mathbf P) < \infty$, provided that $P_i \in \mathcal P^H$, and in this case let us define the dual Monge–Kantorovich functional

$$K(\mathbf P) := \sup \sum_{i=1}^n P_i(f_i). \tag{2.5.2}$$

Here the supremum is taken over all $P_i$-integrable $f_i : U \to \mathbb R$ that are bounded Lipschitz functions such that

$$\sup\{|f_i(x)|;\ x \in U\} < \infty, \qquad |f_i(x) - f_i(y)| \le \alpha_i d(x, y), \quad \forall x, y \in U \tag{2.5.3}$$

for some $\alpha_i > 0$, and that satisfy the duality constraints

$$\sum_{i=1}^n f_i(x_i) \le c(x) \quad \forall x = (x_1, \dots, x_n) \in S.$$

Then the following duality theorem holds; see Rachev (1991c, p. 100).

Theorem 2.5.1 For any separable metric space $(U, d)$ and for any $P_i \in \mathcal P^H$, $i = 1, \dots, n$,

$$A_c(\mathbf P) = K(\mathbf P). \tag{2.5.4}$$

Moreover, if the $P_i$'s are tight probabilities, the infimum in (2.5.1) is attained.

In particular, if $n = 2$ and $H(t) = t^p$, $p \ge 1$, then the functional $A_c(\mathbf P)^{1/p}$ is in fact the minimal $L_p$-metric in the space of probabilities on $U$:

$$\ell_p(P_1, P_2) = \inf\Big\{ \Big( \int_{U \times U} d^p(x, y) P(dx, dy) \Big)^{1/p};\ \pi_i P = P_i,\ i = 1, 2 \Big\}. \tag{2.5.5}$$


Corollary 2.5.2 For $p \ge 1$ and $\int_U d^p(x, a)\,P_i(dx) < \infty$, the minimal $L_p$-metric $\ell_p(P_1, P_2)$ admits the dual representation

\[ \ell_p^p(P_1, P_2) = \sup\Big\{ \int f_1\,dP_1 + \int f_2\,dP_2 \Big\}, \tag{2.5.6} \]

where the supremum is taken over $f_i : U \to \mathbb{R}$ satisfying (2.5.3) and $f_1(x) + f_2(y) \le d^p(x, y)$ for all $x, y \in U$.

It is of interest to obtain more informative dual representations for $\ell_p$ (see Dudley (1976, Lecture 20), Dudley (1989, Chapter 11)). The refined representations can be applied to problems of classification of probability metrics and their applications to rate of convergence theorems (see Rachev (1991c) and the later sections of this book). This and the next section are concerned with refinements of (2.5.6).

We start with the Kantorovich theorem (see Kantorovich (1942, 1948)) for determining a dual representation for the minimal $L_1$-metric (cf. (2.5.5))

\[ \ell_1(P_1, P_2) = \inf\{P(d);\ P \in \mathcal{P}(U \times U) \text{ with } \pi_i P = P_i,\ i = 1, 2\}, \tag{2.5.7} \]

which is more refined than (2.5.6); recall that

\[ P(d) = \int_{U \times U} d(x, y)\,P(dx, dy), \]

and $M(P_1, P_2)$ is the set of probabilities on $U \times U$ with marginals $P_1, P_2$. We shall seek general conditions on $U$ and the $P_i$'s leading to the duality theorem

\[ \ell_1(P_1, P_2) = \kappa(P_1, P_2). \tag{2.5.8} \]

In (2.5.8) $\kappa$ is the Kantorovich metric

\[ \kappa(P_1, P_2) = \sup\{|P_1(f) - P_2(f)|;\ f \in \mathrm{Lip}_b(U)\}, \tag{2.5.9} \]

where $P_i(f) := \int_U f\,dP_i$, and $\mathrm{Lip}_b(U)$ stands for the class of all bounded $f : U \to \mathbb{R}$ satisfying the Lipschitz condition $|f(x) - f(y)| \le d(x, y)$ $\forall x, y \in U$.

First, notice that for any $f \in \mathrm{Lip}_b(U)$ and any probability $P$ with marginals $P_i$,

\[ |P_1(f) - P_2(f)| = \Big| \int_{U \times U} (f(x) - f(y))\,P(dx, dy) \Big| \le \int_{U \times U} |f(x) - f(y)|\,P(dx, dy) \le P(d), \]

and therefore on $\mathcal{P}(U)$,

\[ \kappa \le \ell_1. \tag{2.5.10} \]

In particular, on the set

\[ \mathcal{P}^1(U) := \Big\{ P \in \mathcal{P}(U);\ \int_U d(x, a)\,P(dx) < \infty \Big\}, \]

the Kantorovich metric $\kappa$ is finite. In fact, for $P_1, P_2 \in \mathcal{P}^1(U)$,

\[ \kappa(P_1, P_2) \le \ell_1(P_1, P_2) \le \int_U d(x, a)\,(P_1 + P_2)(dx) < \infty. \]

However, in general, $\kappa$ may be infinite on $\mathcal{P}(U)$ even for a finite metric $d$.

Let us remark here that Kantorovich (1948) proved (2.5.2) for a compact metric space $(U, d)$. The result was further extended to more general spaces by Szulga (1978, 1982), Fernique (1981), Huber (1981), Levin (1974, 1984), Dudley (1976, 1989), de Acosta (1982), Rachev (1984b), Kellerer (1984b), Rachev and Shortt (1990) (see also Rachev (1991c, Chapter 6)). Further, we shall follow the approach suggested in Kellerer (1984b) to establish a duality theorem similar to (2.5.2) in a rather general setting.

Throughout this section $(U, d)$ is a fixed metric space with metric $d$, which may take infinite values. The "initial" and "final" mass distributions $P_1$ and $P_2$ are assumed to be in $\mathcal{P}(U)$, the set of tight probabilities $P$ on $U$. In other words, the $P_i$'s are Borel probability measures on $U$ that are $\tau$-continuous; $\tau$-continuity means that for any increasing net $(G_i)_{i \in I}$ of open sets on $U$ with limit $G$,

\[ P(G) = \sup_{i \in I} P(G_i). \tag{2.5.11} \]

Lemma 2.5.3 Given a Borel probability measure $P$ on $(U, d)$, the following are equivalent:

(i) $P$ is a tight probability on $U$ ($P \in \mathcal{P}(U)$).

(ii) $\sup\{P(B);\ B \in \mathcal{B}$ a totally bounded Borel set$\} = 1$.

(iii) $P(U_s) = 1$ for some separable Borel set $U_s \subset U$.

Proof: The fact that (i) implies (ii) can be found in the proof of the tightness of a probability $P$ on a complete metric space $(U, d)$ (see Billingsley (1968), Dudley (1989, Section 11.5)). To see that (ii) implies (iii) take $U_s = \bigcup_{n \in \mathbb{N}} B_n$, where the $B_n$ are totally bounded sets with $P(B_n) > 1 - \frac{1}{n}$. We see that (iii) implies (i), since any tight Borel probability measure $P$ on a second countable space is $\tau$-continuous. □

The Kantorovich metric $\kappa$ in $\mathcal{P}(U)$ was independently introduced by Kantorovich (1948) and Fortet and Mourier (1953). The topological structure of $(\mathcal{P}(U), \kappa)$ has been studied in Dudley (1966, 1976), Dobrushin (1970), Zolotarev (1976), Huber (1981), Rachev (1984a, 1984b), and Kellerer (1984). We summarize their results in the following theorem.

Theorem 2.5.4

(i) $\kappa$ defines a metric on $\mathcal{P}(U)$.

(ii) If the metric $d$ is bounded, the supremum in (2.5.9) is attained; there exists $f^* \in \mathrm{Lip}_b(U)$ such that $\kappa(P_1, P_2) = |P_1(f^*) - P_2(f^*)|$.

(iii) Denote by $\mathcal{W}$ the coarsest topology on $\mathcal{P}(U)$ in which the maps $P \to P(G)$ are lower semicontinuous for all open sets $G$, and let $T_\kappa$ be the topology in $\mathcal{P}(U)$ generated by $\kappa$. Then

(a) $\mathcal{W}$ is coarser than $T_\kappa$;

(b) $\mathcal{W} \equiv T_\kappa$ if $d$ is bounded.

(iv) If $(U, d)$ is a separable metric space, then for $(P_n)_{n \in \mathbb{N}}$ and $P$ in

\[ \mathcal{P}^1(U) = \Big\{ Q \in \mathcal{P}(U);\ \int_U d(x, a)\,Q(dx) < \infty \Big\}, \]

the following are equivalent:

(a) $\kappa(P_n, P) \to 0$.

(b) $P_n$ weakly tends to $P$, and $\int_U d(x, a)\,(P_n - P)(dx) \to 0$ for some (therefore for all) $a \in U$.

(c) $P_n$ weakly tends to $P$, and as $N \to \infty$,

\[ \sup_{n \in \mathbb{N}} \int_U d(x, a)\, I\{d(x, a) > N\}\,P_n(dx) \to 0 \]

for some (therefore for all) $a \in U$.

Proof: (i) Obviously, $\kappa$ is a semimetric. Suppose that $\kappa(P_1, P_2) = 0$ and let us show that $P_1 = P_2$. To this end, for any closed nonempty set $C \subset U$ and $k \in \mathbb{N}$, set $f_{k,C}(x) := \max\{0, \frac{1}{k} - d(x, C)\}$. Then

\[ P_1(C) \le k\,P_1(f_{k,C}) \le k\,\kappa(P_1, P_2) + k\,P_2(f_{k,C}) \le P_2(C^{1/k}), \]

where $C^\varepsilon = \{x;\ d(x, C) < \varepsilon\}$. Letting $k \to \infty$ we get $P_1(C) \le P_2(C)$, and hence by symmetry, $P_1(C) = P_2(C)$ for any closed set $C \subset U$.

(ii) Applying Lemma 2.5.3, without loss of generality we may assume that $U$ is separable. Let $U_c$ be a countable dense subset of $U$ and choose $f_n \in \mathrm{Lip}_b(U)$ with

\[ \lim_{n \to \infty} |P_1(f_n) - P_2(f_n)| = \kappa(P_1, P_2). \]

Moreover, we may assume that for a fixed $a \in U$, $f_n(a) = 0$, and from the boundedness of $d$,

\[ \|f_n\|_\infty = \sup_{x \in U} |f_n(x)| \le \sup_{x \in U} d(x, a) < \infty. \]

Therefore, $\{f_n\}_{n \ge 1}$ contains a subsequence $\{f_{n'}\}$ converging on $U_c$ that (due to the equicontinuity) converges everywhere to a limit $f^* \in \mathrm{Lip}_b(U)$ with $|P_1(f^*) - P_2(f^*)| = \kappa(P_1, P_2)$.

(iii) (a) For any closed set $C$, choosing $f_{k,C} \in \mathrm{Lip}_b(U)$ as in the proof of (i), we have the representation $P(C) = \inf_{k \in \mathbb{N}} k\,P(f_{k,C})$. This implies that with respect to $T_\kappa$ the maps $P \to P(C)$ are upper semicontinuous for all closed sets; but $\mathcal{W}$ is the coarsest topology on $\mathcal{P}(U)$ having this property.

(b) If $d$ is bounded, fix an $a \in U$, and let $A$ be the upper bound for $d(x, a)$. Fix $P_0 \in \mathcal{P}(U)$ and take a totally bounded subset $K_\varepsilon$ with $P_0(U \setminus K_\varepsilon) < \varepsilon$. Let $\mathrm{Lip}_b^{(a)}(U)$ be the set of functions $f \in \mathrm{Lip}_b(U)$ with $f(a) = 0$, and therefore $\|f\|_\infty \le A$. As we have already shown in part (ii), $\mathrm{Lip}_b^{(a)}(U)$ is sequentially compact with respect to pointwise convergence. The class $\mathrm{Lip}_b^{(a)}(K_\varepsilon)$ is totally bounded with respect to $\|\cdot\|_\infty$, so let us choose a finite $\varepsilon$-net $\{g_1, \dots, g_k\}$ in $\mathrm{Lip}_b^{(a)}(K_\varepsilon)$. We next use the Kirszbraun–McShane extension (see Dudley (1976, Theorem 3)): For each $g \in \mathrm{Lip}_b^{(a)}(K_\varepsilon)$ let

\[ \bar{g}(x) = \inf_{y \in K_\varepsilon} \min\Big\{ g(y) + d(y, x),\ \sup_{y \in K_\varepsilon} g(y) \Big\}, \qquad x \in U. \tag{2.5.12} \]


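Then the restriction $\bar{g}/K_\varepsilon$ of $\bar{g}$ on $K_\varepsilon$ coincides with $g$, and

\[ \inf_U \bar{g} = \inf_{K_\varepsilon} g, \qquad \sup_U \bar{g} = \sup_{K_\varepsilon} g. \]

Take $K_\varepsilon^\varepsilon$, the open $\varepsilon$-neighborhood of $K_\varepsilon$, and define the following (open with respect to $\mathcal{W}$) neighborhood of $P_0$:

\[ O_\varepsilon(P_0) := \{P \in \mathcal{P}(U);\ P(U \setminus K_\varepsilon^\varepsilon) < \varepsilon,\ |P(\bar{g}_j) - P_0(\bar{g}_j)| < \varepsilon,\ \forall j = 1, \dots, k\}. \]

Then for any $P \in O_\varepsilon(P_0)$ and any $f \in \mathrm{Lip}_b^{(a)}(U)$,

\[ |P(f) - P_0(f)| \le |P(f) - P(\bar{g}_j)| + |P(\bar{g}_j) - P_0(\bar{g}_j)| + |P_0(\bar{g}_j) - P_0(f)| \]
\[ \le 2 \sup_{x \in K_\varepsilon^\varepsilon} |f(x) - \bar{g}_j(x)| + (\|f\|_\infty + \|\bar{g}_j\|)\,[P(U \setminus K_\varepsilon^\varepsilon) + P_0(U \setminus K_\varepsilon^\varepsilon)] + \varepsilon \]
\[ \le 6\varepsilon + 4A\varepsilon + \varepsilon. \]

Thus,

\[ \kappa(P, P_0) = \sup\{|P(f) - P_0(f)|;\ f \in \mathrm{Lip}_b^{(a)}(U)\} \le (7 + 4A)\varepsilon, \]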

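and the assertion follows.

(iv) (a) ⇒ (b). The Prohorov metric

\[ \pi(P_1, P_2) = \inf\{\varepsilon > 0;\ P_1(C) \le P_2(C^\varepsilon) + \varepsilon \text{ for all closed } C \subset U\} \tag{2.5.13} \]

metrizes the weak topology (see for example Dudley (1989, Theorem 11.3.3)). For any closed $C \subset U$ and $\varepsilon > \kappa(P_1, P_2)$,

\[ P_1(C) \le \int_U \max\{0,\ 1 - d(x, C)/\sqrt{\varepsilon}\}\,P_1(dx) \le \frac{1}{\sqrt{\varepsilon}}\,\kappa(P_1, P_2) + P_2(C^{\sqrt{\varepsilon}}) \le \sqrt{\varepsilon} + P_2(C^{\sqrt{\varepsilon}}), \]

implying $\pi(P_1, P_2) \le \sqrt{\varepsilon}$. Letting $\varepsilon \downarrow \kappa(P_1, P_2)$ yields

\[ \pi^2 \le \kappa \quad \text{on } \mathcal{P}(U). \tag{2.5.14} \]

The assertion in (b) follows from (2.5.14) and the following bound: for any $P_i \in \mathcal{P}(U)$,

\[ \Big| \int_U d(x, a)\,(P_1 - P_2)(dx) \Big| \le \kappa(P_1, P_2). \tag{2.5.15} \]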

The implications (b) ⇒ (a) and (c) ⇒ (b) are proved in a similar way (see Rachev (1991, Theorem 6.3.1)). □

The theorem shows that the $T_\kappa$-topology depends on the choice of $d$. As one further example in this direction, suppose $d$ is the discrete metric:

\[ d_0(x_1, x_2) = \begin{cases} 1 & \text{for } x_1 \ne x_2, \\ 0 & \text{for } x_1 = x_2. \end{cases} \]

Then $\kappa$ coincides with the total variation metric $\sigma$:

\[ \kappa(P_1, P_2) = \sup\{|P_1(f) - P_2(f)|;\ f : U \to \mathbb{R},\ |f(x) - f(y)| \le 1\ \forall x, y \in U\} = \sup\{|P_1(A) - P_2(A)|;\ A \in \mathcal{B}(U)\} = \sigma(P_1, P_2), \tag{2.5.16} \]

where $\mathcal{B}(U)$ stands for the Borel $\sigma$-algebra in $(U, d)$. To see the equality in (2.5.8) notice that $\sigma \le \kappa \le \ell_1$. The proof that $\sigma = \ell_1$ will be given in the next theorem, due to Dobrushin (1970).

Theorem 2.5.5 (Dobrushin's theorem) If $d = d_0$ is the discrete metric, then the duality (2.5.8) holds:

\[ \ell_1(P_1, P_2) = \inf\{P(\{(x, y) \in U \times U;\ x \ne y\});\ P \in \mathcal{P}(U \times U),\ \pi_i P = P_i,\ i = 1, 2\} = \sigma(P_1, P_2). \tag{2.5.17} \]

Proof: Define the Hahn decomposition (Billingsley (1986, Section 32)) of $\lambda = P_2 - P_1 = \lambda^+ - \lambda^-$. For $A \in \mathcal{B}(U)$, $\lambda^+$ and $\lambda^-$ are determined by

\[ \lambda^+(A) := \sup_{C \subset A} (P_2 - P_1)(C), \qquad \lambda^-(A) := \sup_{C \subset A} (P_1 - P_2)(C). \]

The inequality $\ell_1 \ge \sigma$ follows directly from the definition of $\sigma$, and thus

\[ \ell_1(P_1, P_2) \ge \sigma(P_1, P_2) = \sup\{P_1(A) - P_2(A);\ A \in \mathcal{B}(U)\} = \lambda^-(U) = \lambda^+(U). \tag{2.5.18} \]

In fact, we have equality in (2.5.18). Consider $P^* \in \mathcal{P}(U \times U)$ defined by

\[ P^*(A \times B) = \lambda_0(A \cap B) + \frac{\lambda^-(A)\,\lambda^+(B)}{\lambda^+(U)} \]


for all $A, B \in \mathcal{B}(U)$, with $\lambda_0 = P_2 - \lambda^+ = P_1 - \lambda^-$. Then $P^*(d_0) = \ell_1(P_1, P_2)$; $P^*$ has marginals $P_1$ and $P_2$ and therefore attains the infimum in (2.5.17). □

Next we shall extend the duality $\ell_1 = \kappa$ to any metric $d$. Further, for any pair $P_1, P_2$ in $\mathcal{P}(U)$, $P_1 \times P_2$ stands for the extension of the product measure to a probability in $\mathcal{P}(U \times U)$, which exists and is unique in view of Lemma 2.5.3 (iii). Then obviously, $\ell_1(P_1, P_2) \le (P_1 \times P_2)(d) \le \infty$.

Theorem 2.5.6 For any $P_1, P_2 \in \mathcal{P}(U)$,

\[ \ell_1(P_1, P_2) = \kappa(P_1, P_2). \tag{2.5.19} \]

Moreover, if $P_1$ and $P_2$ are tight, then the infimum in (2.5.7) is attained.

Proof: In view of (2.5.10), it remains to show that $\ell_1 \le \kappa$.

Case 1: Suppose $P_1$ and $P_2$ are tight. Then according to Theorem 2.3.1,

\[ \ell_1(P_1, P_2) = I_d(P_1, P_2) := \sup\{P_1(f_1) + P_2(f_2)\}, \tag{2.5.20} \]

where the supremum is taken over all $P_i$-integrable functions $f_i : U \to [-\infty, \infty)$ with

\[ f_1(x) + f_2(y) \le d(x, y), \qquad x, y \in U. \]

Observe that the definition of $\kappa$ in (2.5.9) remains unchanged if the absolute value signs are removed, and therefore taking $f_1 = f$, $f_2 = -f$ in (2.5.20) leads to $I_d \ge \kappa$. It remains to show that

\[ I_d(P_1, P_2) \le \kappa(P_1, P_2). \tag{2.5.21} \]

In the definition of $I_d$ the functions $f_i$ can be approximated by $f_i \wedge k$, $k \in \mathbb{N}$, and therefore may be assumed to be bounded from above. On the other hand, if $f_1 \vee f_2 \le A$, then $f_1(x) + f_2(y) \le d(x, y)$ can be replaced by $f_1(x) \vee (-A) + f_2(y) \vee (-A) \le d(x, y)$. Hence we can assume the $f_i$ to be bounded. To check (2.5.21) it suffices to show the following: For any bounded functions $f_i$ on $U$ with

\[ f_1(x_1) + f_2(x_2) \le d(x_1, x_2) \qquad \forall x_i \in U, \tag{2.5.22} \]

there exists a function $f \in \mathrm{Lip}_b(U)$ such that $f_1 \le f$ and $f_2 \le -f$. In fact, take $f = \frac{1}{2}(\bar{f}_1 - \bar{f}_2)$, where

\[ \bar{f}_1(x) = \inf_{y \in U} (d(x, y) - f_2(y)), \qquad \bar{f}_2(y) = \inf_{x \in U} (d(x, y) - \bar{f}_1(x)). \]

Then $\bar{f}_1(x_1) + \bar{f}_2(x_2) \le d(x_1, x_2)$, implying that $\bar{f}_1 + \bar{f}_2 = 0$; that is, $f_1 \le f$ and $f_2 \le -f$. Finally, $f$ is Lipschitz and bounded, since it can be easily checked that $\bar{f}_i \in \mathrm{Lip}_b(U)$. Therefore, we can replace $f_1$ and $f_2$ in (2.5.20) with $f$ and $(-f)$ respectively. Theorem 2.3.10 implies the existence of an "optimal" $P^*$ on $U \times U$ with $\pi_i P^* = P_i$ and $P^*(d) = \ell_1(P_1, P_2)$.

Case 2: Suppose we have any probabilities $P_i$ on $U$. In view of Lemma 2.5.3 (iii) assume that $U$ is separable. Choose a countable partition $(B_k)_k$ of nonempty disjoint Borel sets, $\bigcup_k B_k \supset U$, $y_k \in B_k$, with $\operatorname{diam} B_k = \sup\{d(x_1, x_2);\ x_i \in B_k\} < \varepsilon$ and $P_i(B_k) > 0$. Consider discrete approximations of $P_i$ with supports in $Y = \{y_k\}_k$ defined by

\[ P_i^{(d)}(\{y_k\}) := P_i(B_k) \qquad \forall k. \]

For any $f \in \mathrm{Lip}_b(Y)$ apply the extension (2.5.12) to get

\[ \kappa(P_1^{(d)}, P_2^{(d)}) = \sup\{|P_1^{(d)}(f) - P_2^{(d)}(f)|;\ \text{over all } f : Y \to \mathbb{R} \text{ with } |f(x) - f(y)| \le d(x, y) \text{ for all } x, y \in Y\} \le \kappa(P_1, P_2) + 2\varepsilon. \tag{2.5.23} \]

For any probability $P^{(d)}$ on $Y \times Y$ with marginals $P_i^{(d)}$, let $P$ be any extension on $U \times U$ with marginals $P_1$ and $P_2$ such that $P(B_i \times B_j) = P^{(d)}(\{y_i\} \times \{y_j\})$ for all $i, j$ (cf. for example Theorem 2.4.2). Since $P(d) \le P^{(d)}(d) + 2\varepsilon$, taking the minimum over all $P^{(d)}$ with marginals $P_i^{(d)}$, it follows that

\[ \ell_1(P_1, P_2) \le \ell_1(P_1^{(d)}, P_2^{(d)}) + 2\varepsilon. \tag{2.5.24} \]

In view of what we have shown in Case 1, we have the duality $\ell_1(P_1^{(d)}, P_2^{(d)}) = \kappa(P_1^{(d)}, P_2^{(d)})$. This together with (2.5.23) and (2.5.24) amounts to $\ell_1(P_1, P_2) \le \kappa(P_1, P_2) + 4\varepsilon$, which completes the proof of the theorem. □
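For finitely supported measures, the optimal plan $P^*$ from the proof of Theorem 2.5.5 (the discrete-metric case of the duality $\ell_1 = \kappa$) can be written out explicitly. The following is a minimal sketch, assuming $P_1$ and $P_2$ are given as dictionaries mapping points to masses; the function name `dobrushin_coupling` is illustrative and not from the text. It builds $P^*(\{x\} \times \{y\}) = \lambda_0(\{x\})\,1\{x = y\} + \lambda^-(\{x\})\,\lambda^+(\{y\})/\lambda^+(U)$ and returns it together with $\sigma = \lambda^+(U)$.

```python
from fractions import Fraction


def dobrushin_coupling(p1, p2):
    """Coupling from the proof of Theorem 2.5.5 for finitely supported p1, p2.

    p1, p2: dicts mapping points to masses (each summing to 1).
    Returns (plan, sigma), where plan maps (x, y) to mass and
    sigma = lambda^+(U) is the total variation distance.
    """
    points = sorted(set(p1) | set(p2))
    # Positive and negative parts of lambda = p2 - p1.
    lam_plus = {x: max(p2.get(x, 0) - p1.get(x, 0), 0) for x in points}
    lam_minus = {x: max(p1.get(x, 0) - p2.get(x, 0), 0) for x in points}
    # lambda_0 = p2 - lambda^+ = p1 - lambda^-, i.e. the pointwise minimum.
    lam0 = {x: min(p1.get(x, 0), p2.get(x, 0)) for x in points}
    sigma = sum(lam_plus.values())

    plan = {}
    for x in points:
        for y in points:
            mass = lam0[x] if x == y else 0
            if sigma > 0:
                mass += lam_minus[x] * lam_plus[y] / sigma
            if mass:
                plan[(x, y)] = mass
    return plan, sigma


if __name__ == "__main__":
    p1 = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
    p2 = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
    plan, sigma = dobrushin_coupling(p1, p2)
    off_diagonal = sum(m for (x, y), m in plan.items() if x != y)
    print("sigma =", sigma, " P*(x != y) =", off_diagonal)
```

Using exact fractions makes it easy to check that the plan has the prescribed marginals and that the mass off the diagonal equals $\sigma(P_1, P_2)$, as (2.5.17) requires.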

Remark 2.5.7 As we see from the proof of Case 2, taking extensions continuous with respect to $P_1 \times P_2$, the value of $\ell_1$ does not change if we restrict the infimum in (2.5.7) to all admissible plans of transportation $P$ that are absolutely continuous with respect to the product measure $P_1 \times P_2$.

Remark 2.5.8 To show that tightness of the initial and final mass distributions $P_1$ and $P_2$ is an essential assumption for the existence of an optimal transportation plan $P^*$ achieving the infimum in $\ell_1$, consider the following example. Let $\lambda$ be the Lebesgue measure on $[0, 1] = A_1 + A_2$, where the $A_i$'s are nonmeasurable with outer measure 1. Define $U \subset \mathbb{R}^2$ to be the union of $A_i \times \{i\}$, $i = 1, 2$, equipped with the Euclidean metric $d$ in $\mathbb{R}^2$. The Borel sets in $U$ are of the type

\[ B = \bigcup_{i = 1, 2} \{(A_i \cap B_i) \times \{i\}\} \]

with $B_i \in \mathcal{B}([0, 1])$. Since the outer measure $\lambda^*(A_i) = 1$ and $U$ is second countable, it is possible to define probabilities $P_i \in \mathcal{P}(U)$ with $\operatorname{supp}(A_i \times \{i\})$ by $P_i(B) := \lambda(B_i)$ for any

\[ B = \bigcup_{i = 1, 2} \{(A_i \cap B_i) \times \{i\}\}. \]

Let us show first that $\ell_1(P_1, P_2) = 1$. For any $k \in \mathbb{N}$ define the sets

\[ C_j = \Big( \Big[\tfrac{j-1}{k}, \tfrac{j}{k}\Big) \times \{1\} \Big) \times \Big( \Big[\tfrac{j-1}{k}, \tfrac{j}{k}\Big) \times \{2\} \Big), \qquad j = 1, \dots, k. \]

Then

\[ P(B) = k\,(P_1 \times P_2)\Big( B \cap \bigcup_{1 \le j \le k} C_j \Big), \qquad B \in \mathcal{B}(S),\ S = U \times U, \]

determines a probability $P$ on $S$ with $\pi_i P = P_i$, $i = 1, 2$, and $P(d) \le 1 + 1/k$. Therefore, $\ell_1(P_1, P_2) = 1$. On the other hand, since $d(x_1, x_2) > 1$ for each $x_i \in A_i \times \{i\}$, it follows that

\[ P(d) > 1 \qquad \forall P \in \mathcal{P}(S) \]

with marginals $\pi_i P = P_i$. Therefore, the infimum in the definition of $\ell_1(P_1, P_2)$ is not attained.

2.6 Dual Representation for $L_p$-Minimal Metrics

In this section we study refined versions of the dual representations for

\[ \inf\{[E d^p(X, Y)]^{1/p};\ P^X = P_1,\ P^Y = P_2\} \]

and

\[ \sup\{[E d^p(X, Y)]^{1/p};\ P^X = P_1,\ P^Y = P_2\} \]

with probabilities $P_1$ and $P_2$ on a separable metric space $(U, d)$. Let $\mathcal{P}^p(U)$ ($p \ge 1$) be the space of all Borel probability measures (probabilities) $P$ on $(U, d)$ with finite $\int d^p(x, a)\,P(dx)$. For $P_1, P_2 \in \mathcal{P}^p(U)$ let $M(P_1, P_2)$ be the set of all probabilities on $U \times U$ with fixed marginals $P_1$ and $P_2$. For $P \in M(P_1, P_2)$ let

\[ L_p(P) := \Big[ \int d^p(x, y)\,P(dx, dy) \Big]^{1/p}, \tag{2.6.1} \]

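and let

\[ \ell_p(P_1, P_2) := \inf\{L_p(P);\ P \in M(P_1, P_2)\}, \tag{2.6.2} \]
\[ S_p(P_1, P_2) := \sup\{L_p(P);\ P \in M(P_1, P_2)\}. \tag{2.6.3} \]

The dual forms for $\ell_p$ and $S_p$ are given by

\[ \ell_p^p(P_1, P_2) = \sup\Big\{ \int f\,dP_1 + \int g\,dP_2;\ (f, g) \in G_p \Big\} \tag{2.6.4} \]

and

\[ S_p^p(P_1, P_2) = \inf\Big\{ \int f\,dP_1 + \int g\,dP_2;\ (f, g) \in G_p^* \Big\}, \tag{2.6.5} \]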

where $G_p$ (resp. $G_p^*$) is the set of all pairs of bounded continuous functions on $U$ satisfying the dual constraint $f(x) + g(y) \le d^p(x, y)$ (resp. $f(x) + g(y) \ge d^p(x, y)$) for all $x, y \in U$ (see Theorem 2.3.1 and Corollary 2.3.9). While in the case $p = 1$ one can replace $g$ with $(-f)$ in (2.6.4), in general, for $p > 1$, there is no dual representation for (2.6.4) in the form of a $\zeta_F$-metric. Here $\zeta_F$ is the Zolotarev $\zeta$-metric:

\[ \zeta_F(P_1, P_2) := \sup_{f \in F} \Big| \int f\,d(P_1 - P_2) \Big|, \tag{2.6.6} \]

where $F$ is a class of bounded continuous functions; see Example 2.6.3. Our aim is to obtain more informative dual representations than (2.6.4) and (2.6.5) by showing that the supremum in (2.6.4) (resp. the infimum in (2.6.5)) can be taken over a set smaller than $G_p$ (resp. $G_p^*$).

Taking the Kantorovich representation $\ell_1 = \zeta_{\mathrm{Lip}_1(U)}$ with $\mathrm{Lip}_1(U) = \{f : U \to \mathbb{R},\ f(x) - f(y) \le d(x, y),\ \forall x, y \in U\}$ as a starting point, Szulga (1982) made the conjecture that for $P_1, P_2 \in \mathcal{P}^p(U)$,

\[ \ell_p(P_1, P_2) = AS_p(P_1, P_2) := \sup_{f \in \mathrm{Lip}_1(U)} \Big| \Big[ \int |f|^p\,dP_1 \Big]^{1/p} - \Big[ \int |f|^p\,dP_2 \Big]^{1/p} \Big|. \tag{2.6.7} \]

Despite the fact that $\ell_p$ and $AS_p$ induce one and the same convergence in $\mathcal{P}^p(U)$, we shall further construct an example showing that Szulga's


conjecture fails. We shall characterize the optimal solutions $P$ in (2.5.8), i.e., those $P \in \mathcal{P}(P_1, P_2)$ for which $\ell_p(P_1, P_2) = L_p(P)$.

First, we shall show that $\ell_p^p(P_1, P_2)$ admits a dual form similar to that of $\zeta_F(P_1, P_2)$ but with $F$ depending on $P_1$ and $P_2$. Denote by $\nu_1 = (P_1 - P_2)^+$ and $\nu_2 = (P_1 - P_2)^-$ the positive and negative parts of the Hahn decomposition of $P_1 - P_2$. Let $A_1$ be the support of $(P_1 - P_2)^+$ and $A_2 = U \setminus A_1$. Define the set $F_p(P_1, P_2)$ of functions $f = f_1 1_{A_1} + f_2 1_{A_2}$, where the $f_i$ are bounded functions on $A_i$, having finite Lipschitz norms $\mathrm{Lip}(f_i; A_i) := \sup\{|f_i(x) - f_i(y)|/d(x, y);\ x \ne y,\ x, y \in A_i\} < \infty$ and satisfying the dual constraint $f_1(x) - f_2(y) \le d^p(x, y)$ $\forall x \in A_1$, $y \in A_2$.

Theorem 2.6.1 For any $P_1, P_2 \in \mathcal{P}^p(U)$,

\[ \ell_p^p(P_1, P_2) = \sup_{f \in F_p(P_1, P_2)} \int f\,d(P_1 - P_2). \tag{2.6.8} \]

Proof: We start with the following dual representation for $\ell_p^p$ (cf. (2.5.6)):

\[ \ell_p^p(P_1, P_2) = \sup\Big\{ \int f\,dP_1 + \int g\,dP_2;\ (f, g) \in G_p,\ \mathrm{Lip}(f; U) + \mathrm{Lip}(g; U) < \infty \Big\}. \tag{2.6.9} \]

Suppose first that P1 (A2 ) = P2 (A1 ) = 0.

(2.6.10)

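By (2.6.9), and since $f|A_1 - g|A_2 \in F_p(P_1, P_2)$ for $(f, g) \in G_p$, $\mathrm{Lip}(f, A_1) < \infty$, $\mathrm{Lip}(g, A_2) < \infty$, we have

\[ \ell_p^p(P_1, P_2) \le \sup\Big\{ \int f\,d(P_1 - P_2);\ f \in F_p(P_1, P_2) \Big\} \]
\[ \le \inf_{P \in M(P_1, P_2)}\ \sup_{f \in F_p(P_1, P_2)} \int (f \circ \pi_1 - f \circ \pi_2)\,dP \]
\[ \le \inf\Big\{ \int_{A_1 \times A_2} d^p\,dP;\ P \in M(P_1, P_2) \Big\} \]
\[ = \inf\{P(d^p);\ P \in M(P_1, P_2)\} = \ell_p^p(P_1, P_2). \]

To omit the assumption (2.6.10) set $\tilde{P} = (P_1 - P_2)^+$, $\tilde{Q} = (P_2 - P_1)^+$, $\nu = P_1 - \tilde{P} = P_2 - \tilde{Q}$, and recall that $\tilde{P}(U \setminus A_1) = \tilde{Q}(A_1) = 0$. We then get

\[ \sup\Big\{ \int f\,d(P_1 - P_2);\ f \in F_p(P_1, P_2) \Big\} = \sup\Big\{ \int f\,d(\tilde{P} - \tilde{Q});\ f \in F_p(\tilde{P}, \tilde{Q}) \Big\} \]
\[ = \ell_p^p(\tilde{P}, \tilde{Q}) = \inf\Big\{ \int d^p\,d\mu;\ \pi_1 \mu = \tilde{P},\ \pi_2 \mu = \tilde{Q} \Big\} \]
\[ \overset{(*)}{=} \inf\{P(d^p);\ P \in M(P_1, P_2)\} = \ell_p^p(P_1, P_2). \]

The equality $(*)$ can be shown as follows:

"≥" Given $\mu$ choose $P$ as $P(B) = \mu(B) + \nu\,\pi_1^{-1}(B \cap \{(x, x);\ x \in U\})$.

"≤" Given $P$ choose $\mu$ by $\mu(B_1 \times B_2) = P(B_1 \times B_2) - \nu(B_1 \cap B_2)$. □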
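As a small sanity check of Theorem 2.6.1, consider two measures on a two-point space $\{a, b\}$, $P_1 = r\delta_a + (1 - r)\delta_b$ and $P_2 = s\delta_a + (1 - s)\delta_b$ with $r < s$. Every coupling is determined by $t = P(\{(a, a)\})$, so the primal value $\ell_p^p$ can be found by a one-dimensional search, while the dual value in (2.6.8) is $(s - r)\,d^p(a, b)$ because $A_1 = \{b\}$, $A_2 = \{a\}$, and the constraint reads $f_1(b) - f_2(a) \le d^p(a, b)$. The sketch below is illustrative only; the function names and the grid search are assumptions, not part of the text.

```python
def coupling_cost(r, s, dist, p, t):
    """Cost P(d^p) of the coupling with P(a,a) = t on the two-point space."""
    # Off-diagonal masses are P(a,b) = r - t and P(b,a) = s - t.
    return ((r - t) + (s - t)) * dist ** p


def primal_lp_p(r, s, dist, p, steps=1000):
    """Grid search for l_p^p over all couplings of the two-point measures."""
    t_low = max(0.0, r + s - 1.0)   # keeps P(b,b) = 1 - r - s + t nonnegative
    t_high = min(r, s)              # keeps P(a,b) and P(b,a) nonnegative
    return min(
        coupling_cost(r, s, dist, p, t_low + (t_high - t_low) * k / steps)
        for k in range(steps + 1)
    )


def dual_lp_p(r, s, dist, p):
    """Value of the supremum in (2.6.8) for the two-point example."""
    return abs(s - r) * dist ** p


if __name__ == "__main__":
    r, s, dist, p = 0.2, 0.5, 2.0, 3
    print(primal_lp_p(r, s, dist, p), dual_lp_p(r, s, dist, p))
```

Both values agree, and taking the $p$-th root recovers $\ell_p = (s - r)^{1/p}\,d(a, b)$, the expression used in Example 2.6.3.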

Remark 2.6.2 More interesting would be the $\zeta_F$-representation for $\ell_p$ (not $\ell_p^p$) with an $F$ that depends only on the support of $(P_1 - P_2)^+$. The next example shows that this is impossible; cf. Neveu and Dudley (1980).

Example 2.6.3 Suppose $\ell_p = \zeta_F$. Then for $0 < r < s < 1$, we have

\[ \ell_p(r\delta_a + (1 - r)\delta_b,\ s\delta_a + (1 - s)\delta_b) = \sup\Big\{ \int f\,d(r\delta_a + (1 - r)\delta_b - s\delta_a - (1 - s)\delta_b);\ f \in F \Big\}, \]

i.e., $(s - r)^{1/p}\,d(a, b) = \sup\{(s - r)(f(a) - f(b));\ f \in F\} = (s - r) \cdot \mathrm{const}$. If $d(a, b) > 0$, this yields $(s - r)^{1 - 1/p} = \mathrm{const}$, and thus letting $s \to r$ implies $p = 1$.

In the case $p = 1$, the representation (2.6.8) amounts to $\ell_1 = \zeta_{\mathrm{Lip}_b(U)}$; see Theorem 2.5.6. Taking the dual form for $\ell_1$, Szulga's conjecture seems reasonable. First let us show that $AS_p$ and $\ell_p$ metrize one and the same convergence in $\mathcal{P}^p(U)$. Let $\pi$ be the Prohorov metric; see (2.5.13).

Theorem 2.6.4 For any $P, Q \in \mathcal{P}^p(U)$, $p \ge 1$, the following inequalities hold:

\[ AS_p(P, Q) \le \ell_p(P, Q) \tag{2.6.11} \]

and

\[ C_p\,\pi^2(P, Q) \le AS_p(P, Q), \tag{2.6.12} \]

where $C_p \ge 1/(p\,2^{p-1})$. In particular, for $P_n, P \in \mathcal{P}^p(U)$ the following are equivalent as $n \to \infty$:

(a) $\ell_p(P_n, P) \to 0$,

(b) $AS_p(P_n, P) \to 0$,

(c) $\pi(P_n, P) \to 0$ and $\int d^p(x, a)\,(P_n - P)(dx) \to 0$.

Proof: The inequality (2.6.11) is a consequence of the Minkowski inequality. In fact, there exists a rich enough probability space $(\Omega, \mathcal{A}, P)$ such that the space of laws $P^{X,Y}$ coincides with the space of probabilities on $U \times U$, and thus $AS_p(P_1, P_2) = AS_p(P^X, P^Y) \le [E d^p(X, Y)]^{1/p}$, which implies (2.6.11). To show (2.6.12) observe that for any closed $C \subset U$ and the Lipschitz function

\[ f_C(x) = \max\Big\{ 0,\ 1 - \frac{d(x, C)}{\varepsilon} \Big\}, \qquad \varepsilon \in (0, 1), \]

we have

\[ P_1(C)^{1/p} \le \Big[ \int f_C^p\,dP_1 \Big]^{1/p} \le P_2(C^\varepsilon)^{1/p} + \frac{1}{\varepsilon}\,AS_p(P_1, P_2). \]

If $AS_p(P_1, P_2) \le \delta := C_p\,\varepsilon^2$, then

\[ P_1(C) \le \Big[ P_2(C^\varepsilon)^{1/p} + \frac{1}{\varepsilon}\,AS_p(P_1, P_2) \Big]^p \le \big[ P_2(C^\varepsilon)^{1/p} + C_p\,\varepsilon \big]^p \le P_2(C^\varepsilon) + \varepsilon. \]

The last inequality follows from $(a^{1/p} + C_p\,\varepsilon)^p \le a + \varepsilon$ for any $a, \varepsilon \in (0, 1)$. Letting $\delta \to AS_p(P_1, P_2)$ we obtain (2.6.12). Next, for (a) ⇔ (c), see Rachev (1991, Theorem 6.3.1), and further, for the implication (a) ⇒ (b) see (2.6.11). Finally, to check (b) ⇒ (c), apply (2.6.12) and

\[ AS_p(P, Q) \ge \Big| \Big[ \int d^p(x, a)\,P(dx) \Big]^{1/p} - \Big[ \int d^p(x, a)\,Q(dx) \Big]^{1/p} \Big|. \]

□

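Remark 2.6.5 If $p$ is an integer, one can get a better estimate for $C_p$, namely

\[ C_2 = \sqrt{2} - 1, \qquad C_n \ge \frac{1}{2n}, \quad n \in \mathbb{N}. \]

The first indication that Szulga's conjecture is not valid comes from the bound $AS_p \ge C_p\,\pi^2$ and the corresponding bound for $\ell_p$, $\ell_p \ge \pi^{1 + 1/p}$. Note that both estimates have the precise order. The next example shows that $AS_p \ne \ell_p$. For simplicity we consider the case $p = 2$. Let $(U, d) = ([0, 1], |\cdot|)$, $P_1(\{0\}) = 1 - P_1(\{1\}) = \frac{1}{3}$, and $P_2(\{0\}) = 1 - P_2(\{1\}) = \frac{2}{3}$. Then for any $P \in M(P_1, P_2)$, $L_2(P) \ge \big[\frac{1}{3}\,d^2(0, 1)\big]^{1/2} = \frac{1}{\sqrt{3}}$, with equality for a suitable $P$, and thus $\ell_2(P_1, P_2) = \frac{1}{\sqrt{3}}$. To calculate $AS_2(P_1, P_2)$, setting $f(0) = a$, $f(1) = b$, we have to maximize $|\varphi(a, b)|$ with

\[ \varphi(a, b) := \Big( \frac{2}{3}a^2 + \frac{1}{3}b^2 \Big)^{1/2} - \Big( \frac{1}{3}a^2 + \frac{2}{3}b^2 \Big)^{1/2} \]

on $D = \{(a, b) : |a - b| \le 1\}$. Since

\[ \frac{\partial \varphi}{\partial a} = 0,\ \frac{\partial \varphi}{\partial b} = 0 \iff a = b = 0, \]

and the case $a = b = 0$ is trivial, we have to look for the extrema of $\varphi$ on $\partial D$. We consider $b = a - 1$ (the case $b = a + 1$ is similar). Set $g(a) = \varphi(a, a - 1)$. Then $g'(a) = 0$ if and only if

\[ \Big( 2a - \frac{2}{3} \Big) \Big( a^2 - \frac{4}{3}a + \frac{2}{3} \Big)^{1/2} = \Big( 2a - \frac{4}{3} \Big) \Big( a^2 - \frac{2}{3}a + \frac{1}{3} \Big)^{1/2}, \]

which holds if and only if $a = \frac{1}{2}$. Since $g\big(\frac{1}{2}\big) = 0$, what is left is to consider the limiting behavior of $\varphi(a, b)$ as $a \to \pm\infty$, $|b - a| \le 1$:

\[ \varphi(a, b) = \Big[ a^2 + \frac{2}{3}a(b - a) + \frac{1}{3}(b - a)^2 \Big]^{1/2} - \Big[ a^2 + \frac{4}{3}a(b - a) + \frac{2}{3}(b - a)^2 \Big]^{1/2} \]
\[ = \Big[ \Big( a + \frac{1}{3}(b - a) \Big)^2 + \frac{2}{9}(b - a)^2 \Big]^{1/2} - \Big[ \Big( a + \frac{2}{3}(b - a) \Big)^2 + \frac{2}{9}(b - a)^2 \Big]^{1/2}. \]

Thus, as $a \to \pm\infty$,

\[ \varphi(a, b) \approx \Big| a + \frac{b - a}{3} \Big| - \Big| a + \frac{2}{3}(b - a) \Big| = \begin{cases} -\frac{1}{3}(b - a), & a \to +\infty, \\ \ \ \frac{1}{3}(b - a), & a \to -\infty. \end{cases} \]

In both cases $|\varphi| \le \frac{1}{3}$, and consequently

\[ AS_2(P_1, P_2) = \frac{1}{3} \ne \ell_2(P_1, P_2) = \frac{1}{\sqrt{3}}. \]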
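The counterexample can be checked numerically. The sketch below, which is only an illustration and not part of the original argument, evaluates $|\varphi(a, b)|$ along the boundary $|a - b| = 1$ of $D$ on a grid (the grid range `a_max` and resolution `steps` are arbitrary choices) and compares the result with $\ell_2(P_1, P_2) = 1/\sqrt{3}$.

```python
import math


def phi(a, b):
    """phi(a, b) from the example with f(0) = a, f(1) = b and p = 2."""
    return math.sqrt((2 * a * a + b * b) / 3) - math.sqrt((a * a + 2 * b * b) / 3)


def as2_boundary_sup(a_max=200.0, steps=4000):
    """Largest |phi| found on the boundary |a - b| = 1 over a finite grid."""
    best = 0.0
    for k in range(-steps, steps + 1):
        a = a_max * k / steps
        for b in (a - 1.0, a + 1.0):
            best = max(best, abs(phi(a, b)))
    return best


if __name__ == "__main__":
    approx_as2 = as2_boundary_sup()
    l2 = 1.0 / math.sqrt(3.0)
    print("grid value for AS_2:", approx_as2)
    print("l_2               :", l2)
```

The grid value approaches $1/3$ from below as `a_max` grows and stays well under $1/\sqrt{3} \approx 0.577$, in line with $AS_2(P_1, P_2) \ne \ell_2(P_1, P_2)$.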

Our next theorem is a refinement of the dual representation for $S_p$ (cf. (2.6.5)) in the case where $(U, d)$ is a separable Banach space and $d(x, y) = \|x - y\|$. For its proof we need some standard facts from convex analysis. Let $f$ be a function on $U$, and call the function $f^*$ the $p$-conjugate of $f$ if

\[ f^*(y) := \sup_{x \in U} \{\|x - y\|^p - f(x)\}, \qquad y \in U. \tag{2.6.13} \]

The pair $(f, f^*)$ satisfies the admissibility constraint

\[ f(x) + f^*(y) \ge \|x - y\|^p, \qquad \forall x, y \in U. \tag{2.6.14} \]

If $f^{**} = (f^*)^*$ is the second $p$-conjugate, then

\[ f \ge f^{**}. \tag{2.6.15} \]

Moreover, $f^{**}$ is convex and lower semicontinuous.

Theorem 2.6.6 For any $P_1, P_2 \in \mathcal{P}^p(U)$, $p \ge 1$,

\[ S_p^p(P_1, P_2) = \inf\{P_1(f) + P_2(g);\ f, g \text{ are convex and lower semicontinuous, and for all } x, y \in U,\ f(x) + g(y) \ge \|x - y\|^p\}. \tag{2.6.16} \]

Proof: The left-hand side of (2.6.16) is obviously not greater than the right-hand side. To show the inverse inequality, for any $(f, g) \in G_p^*$ (cf. (2.6.5)) consider the pair $(f^{**}, f^{***})$. Then by (2.6.14) and (2.6.15),

\[ g(y) \ge \sup_{x \in U} \{\|x - y\|^p - f(x)\} = f^*(y) \ge f^{***}(y), \]

$f \ge f^{**}$, and $f^{**}(x) + f^{***}(y) \ge \|x - y\|^p$. This yields that the right-hand side of (2.6.16) is not greater than the left-hand side. □
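The properties (2.6.14) and (2.6.15) of the $p$-conjugate are easy to observe on a finite grid. The following sketch makes a simplifying assumption that is not in the text: $U$ is replaced by a finite set of real numbers, so the supremum in (2.6.13) becomes a maximum; the helper name `p_conjugate` is hypothetical.

```python
def p_conjugate(f, grid, p):
    """Discrete analogue of (2.6.13): f*(y) = max_x (|x - y|^p - f(x)).

    f: dict mapping each grid point to a value; grid: list of real numbers.
    """
    return {y: max(abs(x - y) ** p - f[x] for x in grid) for y in grid}


if __name__ == "__main__":
    grid = list(range(-5, 6))
    p = 2
    # An arbitrary (non-convex) integer-valued test function.
    f = {x: (x * x * x - 4 * x) % 7 for x in grid}

    f1 = p_conjugate(f, grid, p)    # f*
    f2 = p_conjugate(f1, grid, p)   # f**
    f3 = p_conjugate(f2, grid, p)   # f***

    print("f >= f** everywhere:", all(f[x] >= f2[x] for x in grid))
    print("f*** == f*:", f3 == f1)
```

On the grid one sees $f \ge f^{**}$ as in (2.6.15), and also $f^{***} = f^*$, so passing from $(f, g)$ to $(f^{**}, f^{***})$ as in the proof of Theorem 2.6.6 never increases either function while keeping the constraint (2.6.14).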



If $(\Omega, \mathcal{A}, P)$ is a nonatomic space and $P_1$ and $P_2$ are tight measures, then

\[ S_p(P_1, P_2) = \sup\{(E\|X - Y\|^p)^{1/p};\ P^X = P_1,\ P^Y = P_2\}, \tag{2.6.17} \]

and the supremum is attained for an "optimal" pair $(X, Y)$; cf. Theorem 2.5.1. We shall characterize the set of optimal pairs for (2.6.17). For any function $f$ on $U$ define

\[ D_p f(x) := \{y \in U;\ f(x) + f^*(y) = \|x - y\|^p\}. \tag{2.6.18} \]

Corollary 2.6.7 The pair $(X_0, Y_0)$ with $P^{X_0} = P_1$, $P^{Y_0} = P_2$ is optimal for (2.6.17) if and only if

\[ Y_0 \in D_p f(X_0) \quad \text{a.s.} \tag{2.6.19} \]

for some lower semicontinuous convex function $f$.

Proof: Suppose that $X_0$ and $Y_0$, with laws $P_1$ and $P_2$ respectively, satisfy (2.6.19). Then $(X_0, Y_0)$ is optimal, since for any other $X$ and $Y$ with laws $P_1$ and $P_2$,

\[ E\|X - Y\|^p \le E f(X) + E f^*(Y) = E f(X_0) + E f^*(Y_0) = E\|X_0 - Y_0\|^p. \]

Suppose now that $(X_0, Y_0)$ is an optimal pair. By Theorem 2.3.12, there exist functions $f_0, g_0$, $P_1$- and $P_2$-integrable respectively, satisfying $f_0(x) + g_0(y) \ge \|x - y\|^p$, and such that

\[ \int f_0\,dP_1 + \int g_0\,dP_2 = \inf\Big\{ \int f\,dP_1 + \int g\,dP_2;\ f \text{ is } P_1\text{-integrable},\ g \text{ is } P_2\text{-integrable, and } f(x) + g(y) \ge \|x - y\|^p\ \forall x, y \in U \Big\}. \]

As in Theorem 2.6.6 we conclude that $(f_0^{**}, f_0^{***})$ is also optimal, and thus $\|X_0 - Y_0\|^p = f_0^{**}(X_0) + f_0^{***}(Y_0)$ a.s.; i.e., $Y_0 \in D_p f_0^{**}(X_0)$ a.s. □

Next we consider the special case $p = 2$ and $U = \mathbb{R}^k$ with Euclidean norm $\|\cdot\|$. Then

\[ S_2^2(P_1, P_2) = \sup\{E\|X - Y\|^2;\ P^X = P_1,\ P^Y = P_2\} = E\|X\|^2 + E\|Y\|^2 - 2 \inf\{E\langle X, Y \rangle;\ P^X = P_1,\ P^Y = P_2\}. \tag{2.6.20} \]

For any $f$ on $\mathbb{R}^k$ define the lower conjugate

\[ f_*(y) = \inf_{x \in \mathbb{R}^k} \{\langle x, y \rangle - f(x)\} \]


and the Young–Fenchel transform (see Ioffe and Tihomirov (1979, p. 172)) $\tilde{f}(y) = \sup_{x \in \mathbb{R}^k} \{\langle x, y \rangle - f(x)\}$. Then $f_* = -\tilde{g}_f$, where $g_f(x) = -f(-x)$.

Corollary 2.6.8 Let $P_1, P_2 \in \mathcal{P}^p(\mathbb{R}^2)$. Then the random vectors $X_0, Y_0$ with laws $P_1$ and $P_2$ respectively attain the supremum in (2.6.20) if and only if

\[ f(X_0) + f_*(Y_0) = \langle X_0, Y_0 \rangle \quad \text{a.s.} \tag{2.6.21} \]

for some upper semicontinuous concave function $f$.

The proof is similar to that of Corollary 2.6.7 and is therefore omitted. Denote the subdifferential of $f$ in $x$ by $\partial f(x) = \{y \in \mathbb{R}^k;\ f(x) + \tilde{f}(y) = \langle x, y \rangle\}$. Then (2.6.21) is equivalent to

\[ Y_0 \in \partial g(-X_0) \tag{2.6.22} \]

for some convex lower semicontinuous function $g$.

Example 2.6.9 Let $P_1$ and $P_2$ be Gaussian measures on $\mathbb{R}^k$ with means $m_1$ and $m_2$ and nonsingular covariance matrices $\Sigma_1$ and $\Sigma_2$ respectively. Then

\[ \ell_2^2(P_1, P_2) = \|m_1 - m_2\|^2 + \operatorname{tr}(\Sigma_1) + \operatorname{tr}(\Sigma_2) - 2 \operatorname{tr}\Big[ \big( \sqrt{\Sigma_1}\,\Sigma_2\,\sqrt{\Sigma_1} \big)^{1/2} \Big]; \tag{2.6.23} \]

see Olkin and Pukelsheim (1982), Givens and Shortt (1984), Gelbrich (1990), Rüschendorf and Rachev (1990). Hence

\[ S_2^2(P_1, P_2) = \|m_1 - m_2\|^2 + \operatorname{tr}(\Sigma_1) + \operatorname{tr}(\Sigma_2) + 2 \operatorname{tr}\Big[ \big( (\Sigma_{-1})^{1/2}\,\Sigma_2\,(\Sigma_{-1})^{1/2} \big)^{1/2} \Big], \]

where $\Sigma_{-1}$ is the covariance matrix of $P_1(-dx)$. We shall discuss further in the next chapter explicit solutions for general Monge–Kantorovich problems resulting from the duality theorems in Sections 2.3 and 2.4.

Remark 2.6.10 The Kantorovich metric $\ell_1$ admits a $\zeta_{\mathrm{Lip}_b(U)}$ representation; see Theorem 2.5.6. Then the following open problem arises. Is it true that

\[ S_1(P_1, P_2) = \inf\Big\{ \int f\,d(P_1 + P_2);\ f : U \to \mathbb{R},\ \mathrm{Lip}(f; U) < \infty \text{ and } f(x) + f(y) \ge d(x, y)\ \forall x, y \in U \Big\}\ ? \tag{2.6.24} \]

On $(U, d) = (\mathbb{R}, |\cdot|)$ equality (2.6.24) holds. In fact, if $F$ and $G$ are the distribution functions of $P_1$ and $P_2$, then

\[ \mu(P_1, P_2) := \sup\{E|X - Y|;\ P^X + P^Y = P_1 + P_2\} \ge S_1(P_1, P_2) = \int_0^1 |F^{-1}(x) - G^{-1}(1 - x)|\,dx \]

(see Cambanis, Simons, and Stout (1976))

\[ = \int_{-\infty}^{+\infty} |x - a|\,(F + G)(dx) \]

(where $a$ is the intersection point of the completed graphs of $F$ and $G$)

\[ = \sup\{E|X - a| + E|Y - a|;\ P^X + P^Y = P_1 + P_2\} \ge \mu(P_1, P_2); \]

see Rachev (1991c, p. 173). As in the proof of Theorem 2.5.6, one can check that the dual representation for $\mu(P_1, P_2)$ equals the right-hand side of (2.6.24) (with $d(x, y) = |x - y|$), which completes the proof of (2.6.24) in this particular case.

Remark 2.6.11 Theorem 2.6.1 provides the dual form for

\[ \ell_p^p(P_1, P_2) = p \inf\Big\{ \int_0^\infty P(d(X, Y) > t)\,t^{p-1}\,dt;\ P^X = P_1,\ P^Y = P_2 \Big\}. \]

It is of interest to determine the dual representation for

\[ \lambda_p^p(P_1, P_2) = \inf\Big\{ \sup_{t > 0} P(d(X, Y) > t)\,t^{p-1};\ P^X = P_1,\ P^Y = P_2 \Big\}. \]

For any $p > 1$, $\lambda_p$ is a metric. By the Strassen–Dudley theorem (see Dudley (1989, p. 322)),

\[ \lambda_p^p(P_1, P_2) \ge \sup_{t > 0} t^{p-1} \inf\{P(d(X, Y) > t);\ P^X = P_1,\ P^Y = P_2\} = \sup_{t > 0} t^{p-1} \sup_{C \subset U \text{ closed}} [P_1(C) - P_2(C^t)] =: \gamma_p^p(P_1, P_2). \]

The metrics $\lambda_p$ and $\gamma_p$ metrize one and the same topology (Rachev (1983), Kakosjan, Klebanov, and Rachev (1988)). Namely, if for $n = 0, 1, \dots$

\[ \sup_{t > 0} t^{p-1} P_n(d(x, a) > t) < \infty, \]

then the following are equivalent:

(i) $\lambda_p(P_n, P_0) \to 0$ as $n \to \infty$.

(ii) $\gamma_p(P_n, P_0) \to 0$ as $n \to \infty$.

(iii) $P_n \xrightarrow{w} P_0$ as $n \to \infty$, and

\[ \lim_{N \to \infty} \sup_n \sup_{t > N} t^{p-1} P_n(d(x, a) > t) = 0. \]

The difference between λp and γp is seen by the following example.(3) Set P (X = 0) = 1 − P (X = 1) = α, P (Y = 1) = 1 − P (Y = 2) = β, and 0 < α ≤ 12 . The joint distribution of X and Y is then determined by P (X = 0, Y = 1) = P (X = 1, Y = 1) =

1 , 2 1 − α, 2

P (X = 0, Y = 2) =

P (X = 1, Y = 2) = α.

Thus for P1 = P X , P2 = P Y , inf 1 max max P (|X − Y | > t)tp−1 , λpp (P1 , P2 ) = 0 1, and let h : IRk → IRk be a measurable, cyclic-monotone function. Define p−2

\[ \Phi_h(x) := |h(x)|^{-\frac{p-2}{p-1}}\,h(x) + x. \tag{3.3.15} \]

Then $(X, \Phi_h(X))$ is an optimal $c$-coupling for $P, Q$, where $Q \stackrel{d}{=} \Phi_h(X)$ (that is, $(X, \Phi_h(X))$ is a minimal $\ell_p$-coupling).

Proof: Since $c(\cdot, y)$ is concave and differentiable and moreover $c_1(x, \Phi_h(x)) = h(x)$ is cyclic-monotone, Corollary 3.3.18 follows from Proposition 3.3.17. □

Example 3.3.19 Let $A$ be a positive semidefinite and symmetric matrix, and $h(x) = Ax$. Then $h$ is cyclic-monotone, and

\[ \Phi_h(x) = \big( x' A^2 x \big)^{-\frac{1}{2} \cdot \frac{p-2}{p-1}}\,Ax + x \tag{3.3.16} \]

is an optimal $\ell_p$-coupling function.

Corollary 3.3.20 (radial transformation) Let $c(x, y) = -|x - y|^p$, $p > 1$, and let $\Phi_\alpha(x) = \frac{\alpha(|x|)}{|x|}\,x$ be a measurable radial transformation such that $t \to \alpha(t) - t$ is monotonically nondecreasing. Let $X \stackrel{d}{=} P$ and $\Phi_\alpha(X) \stackrel{d}{=} Q$. Then $(X, \Phi_\alpha(X))$ is an optimal $c$-coupling for $P, Q$.

Proof: Define $\beta(t) := (\alpha(t) - t)^{p-1}$ and

\[ \Phi_\beta(x) := \begin{cases} \dfrac{\beta(|x|)}{|x|}\,x & \text{for } x \ne 0, \\ 0 & \text{for } x = 0. \end{cases} \]

Then $\beta$ is monotonically nondecreasing, and $\Phi_\beta$ is also a radial transformation. By the proof of Corollary 3.2.17, $\Phi_\beta$ is the subgradient of a convex function and therefore cyclic-monotone. Therefore, by Corollary 3.3.18, an optimal $c$-coupling function is given by

\[ |\Phi_\beta(x)|^{-\frac{p-2}{p-1}}\,\Phi_\beta(x) + x = \beta(|x|)^{-\frac{p-2}{p-1}}\,\frac{\beta(|x|)}{|x|}\,x + x = \beta(|x|)^{\frac{1}{p-1}}\,\frac{x}{|x|} + x = \frac{\alpha(|x|)}{|x|}\,x = \Phi_\alpha(x). \]

□
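The algebra in the proof of Corollary 3.3.20 can be verified numerically. The sketch below is an illustration under stated assumptions: the radial profile `alpha(t) = 2t + 1` is an arbitrary choice with $\alpha(t) - t > 0$ nondecreasing, vectors are plain tuples, and the helper names are not from the text. It applies formula (3.3.15) with $h = \Phi_\beta$ and compares the result with $\Phi_\alpha$.

```python
import math


def norm(x):
    """Euclidean norm of a tuple of floats."""
    return math.sqrt(sum(v * v for v in x))


def radial(profile, x):
    """Radial map x -> profile(|x|) * x / |x|, with value 0 at the origin."""
    r = norm(x)
    if r == 0.0:
        return tuple(0.0 for _ in x)
    return tuple(profile(r) * v / r for v in x)


def phi_h(h, x, p):
    """Formula (3.3.15): |h(x)|^(-(p-2)/(p-1)) * h(x) + x (needs h(x) != 0)."""
    hx = h(x)
    scale = norm(hx) ** (-(p - 2) / (p - 1))
    return tuple(scale * hv + xv for hv, xv in zip(hx, x))


if __name__ == "__main__":
    p = 3

    def alpha(t):
        return 2.0 * t + 1.0          # assumed profile; alpha(t) - t = t + 1

    def beta(t):
        return (alpha(t) - t) ** (p - 1)

    x = (3.0, 4.0)
    via_beta = phi_h(lambda z: radial(beta, z), x, p)
    direct = radial(alpha, x)
    print(via_beta, direct)
```

For $x = (3, 4)$ and $p = 3$ both routes give $(6.6, 8.8)$ up to rounding, matching the chain of equalities at the end of the proof.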

Remark 3.3.21 (one-dimensional case) In the one-dimensional case the characterization Theorem 3.3.11 has been used in Uckelmann (1996) to derive optimal c-couplings for non-Monge coupling functions of the form c(x, y) = Φ(x − y), where Φ is not assumed to be convex. The case of concave functions Φ(|x − y|) has been considered in Gangbo and McCann (1996).

3.4 An Extension of the Kantorovich $L_2$-Minimal Problem

The purpose of this section is to give some alternative proofs and to solve some extensions of the Kantorovich $L_2$-minimal problem on $\mathbb{R}^d$:

\[ \ell_2(P^X, P^Y) := \ell_2(X, Y) := \inf\{L_2(\tilde{X}, \tilde{Y});\ \tilde{X} \stackrel{d}{=} X,\ \tilde{Y} \stackrel{d}{=} Y\}. \]

Here $L_2^2(X, Y) := E\|X - Y\|^2$, and the underlying probability space is assumed to be nonatomic, so that the above definition of $\ell_2$ is consistent with that given in (2.6.2). The $L_2$-minimal problem can be formulated and proved in the normal case as a problem on the maximum of submatrix traces for positive definite matrices. This approach is suitable for an extension of the problem.

Suppose that $X$ and $Y$ are $d$-dimensional random vectors with normally distributed marginals having zero mean vectors and known covariance matrices $\Sigma_{11} > 0$, $\Sigma_{22} > 0$, respectively. ($A \ge B$ and $A > B$ mean that $A - B$ is nonnegative definite and positive definite, respectively.) Denote the dispersion matrix of $(X, Y)$ by

\[ \Sigma = \begin{pmatrix} \Sigma_{11} & \psi \\ \psi' & \Sigma_{22} \end{pmatrix}. \]

The squared $L_2$-distance between $X$ and $Y$ is then

\[ L_2^2(X, Y) = E \operatorname{tr}\,(X - Y)(X - Y)' = \operatorname{tr}(\Sigma_{11} + \Sigma_{22}) - 2 \operatorname{tr} \psi. \tag{3.4.1} \]

Therefore, $\ell_2^2(X, Y) = \operatorname{tr}(\Sigma_{11} + \Sigma_{22}) - 2 \max_{\Sigma \ge 0} \operatorname{tr} \psi$, and one can now resolve the problem of minimizing (3.4.1) subject to $\Sigma \ge 0$, that is, finding

\[ \max_{\Sigma \ge 0} \operatorname{tr} \psi. \tag{3.4.2} \]
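For $d = 2$ the value of (3.4.2) can be computed without any matrix library. The sketch below rests on two assumptions that go beyond this paragraph: first, by the Gaussian formula (2.6.23) with zero means, the maximum of $\operatorname{tr} \psi$ equals $\operatorname{tr}\big[(\Sigma_{11}^{1/2} \Sigma_{22} \Sigma_{11}^{1/2})^{1/2}\big]$; second, for a $2 \times 2$ nonnegative definite matrix $M$ one has $\operatorname{tr} M^{1/2} = \sqrt{\operatorname{tr} M + 2\sqrt{\det M}}$, where $\operatorname{tr} M = \operatorname{tr}(\Sigma_{11} \Sigma_{22})$ and $\det M = \det \Sigma_{11} \det \Sigma_{22}$. The function names are illustrative.

```python
import math


def max_trace_psi(s11, s22):
    """Value of max tr(psi) in (3.4.2) for 2x2 symmetric positive definite inputs.

    s11, s22: nested tuples ((a, b), (b, c)).
    """
    (a, b), (_, c) = s11
    (e, f), (_, g) = s22
    trace_of_product = a * e + 2.0 * b * f + c * g      # tr(S11 S22)
    det_of_product = (a * c - b * b) * (e * g - f * f)  # det(S11) det(S22)
    return math.sqrt(trace_of_product + 2.0 * math.sqrt(det_of_product))


def l2_squared(s11, s22):
    """l_2^2 = tr(S11 + S22) - 2 max tr(psi) for centered Gaussians."""
    trace_sum = s11[0][0] + s11[1][1] + s22[0][0] + s22[1][1]
    return trace_sum - 2.0 * max_trace_psi(s11, s22)


if __name__ == "__main__":
    s11 = ((4.0, 0.0), (0.0, 9.0))
    s22 = ((1.0, 0.0), (0.0, 16.0))
    print(max_trace_psi(s11, s22), l2_squared(s11, s22))
```

In the diagonal example the standard deviations are $(2, 3)$ and $(1, 4)$, and the result $\ell_2^2 = (2 - 1)^2 + (3 - 4)^2 = 2$ agrees with the coordinatewise one-dimensional solution.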

By translating (3.4.2) to a dual problem, the matrix 1 − 12 1 1 1 2 2 2 2 Σ22 Σ11 Σ22 Σ22 ψp = Σ11 Σ22

(3.4.3)

represents a solution of (3.4.2). This result was obtained independently by Olkin and Pukelsheim (1982) and by Dowson and Landau (1982) using a different argument. (See also Apitzsch, Fritzsche, and Kirstein (1990).) A proof for (3.4.2) is given by Corollary 3.2.13. Here, we present an alternative proof given by Gelbrich (1990). (See also Givens and Shortt (1984).)

Theorem 3.4.1 If $P^X = N(0, \Sigma_{11})$ and $P^Y = N(0, \Sigma_{22})$, then $(X, Y) = (X, \psi_0 X)$ is an optimal coupling for $\ell_2(P^X, P^Y)$, where

$$\psi_0 = \Sigma_{11}^{-1/2}\bigl(\Sigma_{11}^{1/2}\Sigma_{22}\Sigma_{11}^{1/2}\bigr)^{1/2}\Sigma_{11}^{-1/2}. \tag{3.4.4}$$
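The Gaussian coupling can be checked numerically in dimension d = 2. The following is a minimal sketch using only the standard library; the helper names (`matmul`, `inv`, `sqrtm`, `gaussian_coupling`) and the two covariance matrices are illustrative assumptions, not part of the text. It uses the closed form for the square root of a positive definite 2×2 matrix, builds the symmetric linear map $T$ with $T\Sigma_{11}T = \Sigma_{22}$, and evaluates $\mathrm{tr}(\Sigma_{11} + \Sigma_{22}) - 2\,\mathrm{tr}\,\psi$ with $\psi = \Sigma_{11}T$.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv(a):
    """Inverse of a nonsingular 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def sqrtm(a):
    """Square root of a symmetric positive definite 2x2 matrix.

    Closed form: sqrt(A) = (A + sqrt(det A) I) / sqrt(tr A + 2 sqrt(det A)).
    """
    s = math.sqrt(a[0][0] * a[1][1] - a[0][1] * a[1][0])
    t = math.sqrt(a[0][0] + a[1][1] + 2.0 * s)
    return [[(a[0][0] + s) / t, a[0][1] / t],
            [a[1][0] / t, (a[1][1] + s) / t]]

def trace(a):
    return a[0][0] + a[1][1]

def gaussian_coupling(s11, s22):
    """Return (T, psi, dist2) for N(0, s11) and N(0, s22).

    T is the symmetric map with T s11 T = s22, psi = s11 T is the
    cross-covariance of (X, T X), and dist2 = tr(s11 + s22) - 2 tr(psi).
    """
    root = sqrtm(s11)
    root_inv = inv(root)
    middle = sqrtm(matmul(matmul(root, s22), root))
    t = matmul(matmul(root_inv, middle), root_inv)
    psi = matmul(s11, t)
    dist2 = trace(s11) + trace(s22) - 2.0 * trace(psi)
    return t, psi, dist2

# Illustrative covariance matrices (assumed values).
S11 = [[2.0, 0.5], [0.5, 1.0]]
S22 = [[1.0, -0.3], [-0.3, 3.0]]
T, PSI, DIST2 = gaussian_coupling(S11, S22)
```

The cross-covariance `PSI` computed this way can be compared with the expression (3.4.3); both describe the same matrix, written once in terms of $\Sigma_{11}^{1/2}$ and once in terms of $\Sigma_{22}^{1/2}$.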


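Proof: The covariance matrix $\Sigma$ (the dispersion matrix of X and Y) admits the factorization

$$\Sigma = \begin{pmatrix} \Sigma_{11}^{1/2} & 0 \\ \psi^\top\Sigma_{11}^{-1/2} & I_d \end{pmatrix} \begin{pmatrix} I_d & 0 \\ 0 & S \end{pmatrix} \begin{pmatrix} \Sigma_{11}^{1/2} & \Sigma_{11}^{-1/2}\psi \\ 0 & I_d \end{pmatrix},$$

where $S = \Sigma_{22} - \psi^\top\Sigma_{11}^{-1}\psi =: \varphi(\psi)$ is the Schur complement. Note that since $\Sigma_{11} > 0$, $\Sigma_{22} > 0$, the square root $\Sigma_{11}^{1/2}$ and its inverse are positive definite. The factorization yields that $\Sigma \ge 0$ if and only if $\varphi(\psi) \ge 0$. Thus the $\ell_2$-minimality problem is equivalent to

$$\text{Find } \max\{2\,\mathrm{tr}\,\psi;\ \varphi(\psi) = S\} =: I(S) \text{ for all } S \ge 0, \tag{3.4.5}$$

and

$$\text{Find } \max\{I(S);\ S \ge 0\}. \tag{3.4.6}$$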

Solution of (3.4.5): Fix $S \ge 0$ with nonempty $\varphi^{-1}(S)$ and take $\psi \in \varphi^{-1}(S)$; that is,

$$\Sigma_{22} - S = \psi^\top\Sigma_{11}^{-1}\psi. \tag{3.4.7}$$

Let $r = \mathrm{rank}(\Sigma_{22} - S)$. Then for $\Sigma_{22} - S \ge 0$ we have the factorization

$$\Sigma_{22} - S = U D^2 U^\top = U_r D_r^2 U_r^\top, \tag{3.4.8}$$

where D is a d × d diagonal matrix with diagonal elements $(\lambda_1, \dots, \lambda_r, 0, \dots, 0)$, $\lambda_i > 0$. Similarly, $D_r = \mathrm{diag}(\lambda_1, \dots, \lambda_r)$, $U_r$ is a d × r matrix, and $U = (U_r, U_{d-r})$ is a d × d orthogonal matrix. From (3.4.7) and (3.4.8),

$$\psi^\top\Sigma_{11}^{-1}\psi = U_r D_r^2 U_r^\top, \tag{3.4.9}$$

and thus $D_r^{-1}U_r^\top\psi^\top\Sigma_{11}^{-1}\psi U_r D_r^{-1} = I_r$. Equivalently, $E_r^\top E_r = I_r$, where $E_r = \Sigma_{11}^{-1/2}\psi U_r D_r^{-1}$ is a d × r matrix. We rewrite the above equality as

$$\psi U_r = \Sigma_{11}^{1/2} E_r D_r. \tag{3.4.10}$$

As $U = (U_r, U_{d-r})$ is orthogonal, $U_{d-r}^\top U_r = 0$. By (3.4.9),

$$U_{d-r}^\top\psi^\top\Sigma_{11}^{-1}\psi U_{d-r} = U_{d-r}^\top U_r D_r^2 U_r^\top U_{d-r} = 0,$$

which implies $\psi U_{d-r} = 0$ due to $\Sigma_{11}^{-1} > 0$. Using the same argument, $\psi = \psi U U^\top = \psi (U_r, U_{d-r})(U_r, U_{d-r})^\top = \psi U_r U_r^\top$, and by (3.4.10),

$$\psi = \Sigma_{11}^{1/2} E_r D_r U_r^\top. \tag{3.4.11}$$

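Hence, (3.4.11) together with $E_r = \Sigma_{11}^{-1/2}\psi U_r D_r^{-1}$ determines a one-to-one mapping $E_r \Leftrightarrow \psi$, and since $I_r = E_r^\top E_r \Leftrightarrow S = \varphi(\psi)$, we can reformulate (3.4.5) as

$$\text{Find } \max\{F(E_r);\ E_r^\top E_r = I_r,\ E_r \text{ is } d \times r\} =: I(S), \tag{3.4.12}$$

where by (3.4.11),

$$F(E_r) := 2\,\mathrm{tr}\bigl(E_r^\top\Sigma_{11}^{1/2}U_r D_r\bigr) = 2\,\mathrm{tr}\bigl((D_r U_r^\top)\Sigma_{11}^{1/2}E_r\bigr) = 2\,\mathrm{tr}\bigl(\Sigma_{11}^{1/2}E_r D_r U_r^\top\bigr) = 2\,\mathrm{tr}(\psi).$$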

With $E_r = (W_1, \dots, W_r)$, where the $W_i$ are d × 1 vectors, $E_r^\top E_r = I_r$ can be rewritten as

$$(W_1^\top W_1, W_1^\top W_2, \dots, W_1^\top W_r, W_2^\top W_2, W_2^\top W_3, \dots, W_2^\top W_r, \dots, W_r^\top W_r) = (1, 0, \dots, 0, 1, 0, \dots, 0, \dots, 1). \tag{3.4.13}$$

To find the maximum in (3.4.12) we use the Lagrange multiplier rule (see, for example, Ioffe and Tihomirov (1979)): Let $f: \mathbb{R}^n \to \mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}^m$, $n \ge m$, have continuous first derivatives and let $g(x_0) = 0$ for some $x_0 \in \mathbb{R}^n$. If $x_0$ is a local maximum of

$$f(x) \to \max, \qquad g(x) = 0,$$

and $g^{(1)}(x_0)$ is surjective, then there exists a point $y_0 \in \mathbb{R}^m$ that satisfies $L_x(x_0, y_0) = f^{(1)}(x_0) + y_0^\top g^{(1)}(x_0) = 0$, where $L(x, y) := f(x) + y^\top g(x)$, $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$. The Jacobian of (3.4.13) with respect to $(W_1, \dots, W_r)$ is given by

$$\begin{pmatrix}
2W_1^\top & 0 & \cdots & 0 \\
W_2^\top & W_1^\top & \cdots & 0 \\
\vdots & & & \vdots \\
W_r^\top & 0 & \cdots & W_1^\top \\
0 & 2W_2^\top & \cdots & 0 \\
\vdots & & & \vdots \\
0 & 0 & \cdots & 2W_r^\top
\end{pmatrix}.$$

The Jacobian has full rank; that is, it determines a regular linear mapping. Further, $F: \mathbb{R}^{dr} \to \mathbb{R}$ is a linear function, and $G(E_r) = E_r^\top E_r$, $G: \mathbb{R}^{dr} \to \mathbb{R}^{r^2}$, satisfies the regularity condition in the rule for Lagrange multipliers. To form the Lagrange function, set $B = \Sigma_{11}^{1/2}U_r D_r$. Then $L(E_r, C) = 2\,\mathrm{tr}(B^\top E_r) - \mathrm{tr}(C(E_r^\top E_r - I_r))$ for any $E_r$ (d × r) and C symmetric r × r. For any maximum $E_r$ in (3.4.12) we have $E_r^\top E_r = I_r$, and for any d × r matrix H,

$$0 = L_x(E_r, C)H = 2\,\mathrm{tr}(B^\top H) - \mathrm{tr}(C E_r^\top H + C H^\top E_r) = \mathrm{tr}\bigl(2(B^\top - C E_r^\top)H\bigr).$$

The last condition is equivalent to

$$B = E_r C. \tag{3.4.14}$$

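Since rank B = rank $E_r$ = r, it holds that rank C = r, that is, C is regular. From (3.4.14) it follows that

$$E_r = B C^{-1}, \tag{3.4.15}$$

and by $E_r^\top E_r = I_r$,

$$B^\top B = C^\top E_r^\top E_r C = C^\top C = C^2. \tag{3.4.16}$$

The equality (3.4.14) also implies

$$F(E_r) = 2\,\mathrm{tr}(E_r^\top B) = 2\,\mathrm{tr}(E_r^\top E_r C) = 2\,\mathrm{tr}(C). \tag{3.4.17}$$

From (3.4.17) and (3.4.16), we see that $F(E_r)$ with $E_r^\top E_r = I_r$ will take a maximum value only if $C = (B^\top B)^{1/2} > 0$. Now we can find the maximum in (3.4.12):

$$I(S) = 2\,\mathrm{tr}(C) = 2\,\mathrm{tr}\,(B^\top B)^{1/2} = 2\,\mathrm{tr}\,(B B^\top)^{1/2} = 2\,\mathrm{tr}\bigl(\Sigma_{11}^{1/2}U_r D_r^2 U_r^\top\Sigma_{11}^{1/2}\bigr)^{1/2} = 2\,\mathrm{tr}\bigl(\Sigma_{11}^{1/2}(\Sigma_{22} - S)\Sigma_{11}^{1/2}\bigr)^{1/2}, \tag{3.4.18}$$

where the last step uses (3.4.8).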

Recalling that $\psi = \Sigma_{11}^{1/2}E_r D_r U_r^\top$ (3.4.11) and $E_r = BC^{-1}$ (3.4.15), we arrive at $\psi = \Sigma_{11}^{1/2}B(B^\top B)^{-1/2}D_r U_r^\top$, with $B := \Sigma_{11}^{1/2}U_r D_r$. This is equivalent to

$$\Sigma_{11}^{-1/2}\psi\,\Sigma_{11}^{1/2} = B(B^\top B)^{-1/2}B^\top \ge 0.$$

Because the right-hand side of the last equality is a symmetric d × d matrix, applying (3.4.7), we obtain

$$\bigl(\Sigma_{11}^{-1/2}\psi\,\Sigma_{11}^{1/2}\bigr)^2 = \Sigma_{11}^{1/2}\psi^\top\Sigma_{11}^{-1/2}\,\Sigma_{11}^{-1/2}\psi\,\Sigma_{11}^{1/2} = \Sigma_{11}^{1/2}(\Sigma_{22} - S)\Sigma_{11}^{1/2},$$

and thus

$$\psi = \Sigma_{11}^{1/2}\bigl(\Sigma_{11}^{1/2}(\Sigma_{22} - S)\Sigma_{11}^{1/2}\bigr)^{1/2}\Sigma_{11}^{-1/2}. \tag{3.4.19}$$

Thus, the solution of the problem (3.4.5) is achieved at $\psi$ determined by (3.4.19). Moreover, I(S) in (3.4.5) can also be defined by (3.4.18).

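Solution of (3.4.6): Let $\kappa_1(S) \ge \cdots \ge \kappa_d(S) \ge 0$ be the eigenvalues of $\Sigma_{11}^{1/2}(\Sigma_{22} - S)\Sigma_{11}^{1/2}$ for $S \ge 0$. Note that

$$x^\top\Sigma_{11}^{1/2}(\Sigma_{22} - S)\Sigma_{11}^{1/2}x \le x^\top\Sigma_{11}^{1/2}\Sigma_{22}\Sigma_{11}^{1/2}x$$

for any $S \ge 0$ (d × d) symmetric and for any $x \in \mathbb{R}^d$.

Lemma 3.4.2 (Courant–Fischer; see Lancaster (1969)) Let M be symmetric d × d with eigenvalues $\kappa_1 \ge \cdots \ge \kappa_d$. Define

$$R(x) := \frac{x^\top M x}{x^\top x} \quad \text{for all } x \in \mathbb{R}^d,\ x \ne 0.$$

Then

$$\kappa_{k+1} = \min_{V_k \in \mathcal{A}_k^d}\ \max_{x \in V_k^\perp,\, x \ne 0} R(x), \qquad k = 0, \dots, d - 1,$$

where $\mathcal{A}_k^d$ stands for the set of all linear subspaces $V_k$ of $\mathbb{R}^d$ with $\dim V_k = k$.

From the above lemma, $\kappa_i(S) \le \kappa_i(0)$ for all $i = 1, \dots, d$ and all $S \ge 0$. Applying (3.4.18) and (3.4.19), the solution of (3.4.6) is now

$$I(0) = 2\,\mathrm{tr}\bigl(\Sigma_{11}^{1/2}\Sigma_{22}\Sigma_{11}^{1/2}\bigr)^{1/2},$$

and it is attained for

$$\psi = \Sigma_{11}^{1/2}\bigl(\Sigma_{11}^{1/2}\Sigma_{22}\Sigma_{11}^{1/2}\bigr)^{1/2}\Sigma_{11}^{-1/2}.$$

This completes the proof of Theorem 3.4.1. □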

Problem (3.4.2) (or equivalently, the L2-Kantorovich problem of minimization (3.4.1)) suggests a variety of extensions, which are exhibited for the case of three random vectors. Suppose that X, Y, and Z are jointly distributed p-dimensional random vectors with normal marginals $N(0, \Sigma_{11})$, $N(0, \Sigma_{22})$, $N(0, \Sigma_{33})$, respectively, where $\Sigma_{ii} > 0$, i = 1, 2, 3, are known. Denote the joint dispersion matrix by

$$\Sigma = \begin{pmatrix} \Sigma_{11} & \psi_{12} & \psi_{13} \\ \psi_{21} & \Sigma_{22} & \psi_{23} \\ \psi_{31} & \psi_{32} & \Sigma_{33} \end{pmatrix}.$$

We use $\Sigma_{ij}$ to denote fixed covariance matrices and $\psi_{ij}$ to denote undetermined covariance matrices. Two L2-distances that we consider are

$$E\,\mathrm{tr}\bigl[(X - Y)(X - Y)^\top + (X - Z)(X - Z)^\top + (Y - Z)(Y - Z)^\top\bigr] = 2\,\mathrm{tr}(\Sigma_{11} + \Sigma_{22} + \Sigma_{33}) - 2\,\mathrm{tr}(\psi_{12} + \psi_{13} + \psi_{23}), \tag{3.4.20}$$

or, with the mean $M = \tfrac{1}{3}(X + Y + Z)$,

$$E\,\mathrm{tr}\bigl[(X - M)(X - M)^\top + (Y - M)(Y - M)^\top + (Z - M)(Z - M)^\top\bigr] = \tfrac{2}{3}\,\mathrm{tr}(\Sigma_{11} + \Sigma_{22} + \Sigma_{33}) - \tfrac{2}{3}\,\mathrm{tr}(\psi_{12} + \psi_{13} + \psi_{23}). \tag{3.4.21}$$


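Thus both (3.4.20) and (3.4.21) yield the same extremal problems depending on which $\psi_{ij}$ are fixed. This suggests three cases:

$$\Sigma_1 = \begin{pmatrix} \Sigma_{11} & \psi_{12} & \Sigma_{13} \\ \psi_{21} & \Sigma_{22} & \Sigma_{23} \\ \Sigma_{31} & \Sigma_{32} & \Sigma_{33} \end{pmatrix}, \qquad \Sigma_{11} \ge 0, \quad \begin{pmatrix} \Sigma_{22} & \Sigma_{23} \\ \Sigma_{32} & \Sigma_{33} \end{pmatrix} > 0; \tag{3.4.22}$$

$$\Sigma_2 = \begin{pmatrix} \Sigma_{11} & \psi_{12} & \psi_{13} \\ \psi_{21} & \Sigma_{22} & \Sigma_{23} \\ \psi_{31} & \Sigma_{32} & \Sigma_{33} \end{pmatrix}, \qquad \Sigma_{11} > 0, \quad \begin{pmatrix} \Sigma_{22} & \Sigma_{23} \\ \Sigma_{32} & \Sigma_{33} \end{pmatrix} > 0; \tag{3.4.23}$$

$$\Sigma_3 = \begin{pmatrix} \Sigma_{11} & \psi_{12} & \psi_{13} \\ \psi_{21} & \Sigma_{22} & \psi_{23} \\ \psi_{31} & \psi_{32} & \Sigma_{33} \end{pmatrix}. \tag{3.4.24}$$

The extremal problems become

$$\max_{\Sigma_1 \ge 0}\ \mathrm{tr}\,\psi_{12}; \tag{3.4.25}$$

$$\max_{\Sigma_2 \ge 0}\ \mathrm{tr}\,(\psi_{12} + \psi_{13}); \tag{3.4.26}$$

$$\max_{\Sigma_3 \ge 0}\ \mathrm{tr}\,(\psi_{12} + \psi_{13} + \psi_{23}). \tag{3.4.27}$$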

The proofs provided by Gelbrich (1990) (see Theorem 3.4.1), Olkin and Pukelsheim (1982), and Dowson and Landau (1982) do not lend themselves to a direct extension for each of the present problems. We first provide a direct proof of (3.4.2), (3.4.3), given in Olkin and Rachev (1993), which can be adapted to solve (3.4.25) and (3.4.26).

Theorem 3.4.3 The matrix $\psi_p$ given in (3.4.3) is a solution of the maximization problem (3.4.2).

Proof: Our concern in (3.4.2) is to maximize $\mathrm{tr}\,\psi$ over the region $\{\Sigma;\ \Sigma \ge 0\}$. Since $\Sigma \ge 0$ if and only if $\Sigma_{22} > 0$ (by hypothesis) and $\Sigma_{11} - \psi\Sigma_{22}^{-1}\psi^\top \ge 0$, we need only consider the latter condition. From the convexity of the set $\{\psi;\ \psi\Sigma_{22}^{-1}\psi^\top \le \Sigma_{11}\}$ and the linearity of $\mathrm{tr}\,\psi$, the extremum will occur at a boundary point

$$\psi\Sigma_{22}^{-1}\psi^\top = \Sigma_{11}, \tag{3.4.28}$$

which holds if and only if

$$\psi\Sigma_{22}^{-1/2} = \Sigma_{11}^{1/2}G, \tag{3.4.29}$$

where G is orthogonal. (For references and a discussion see Marshall and Olkin (1979, p. 501).)

Consequently, $\psi = \Sigma_{11}^{1/2}G\Sigma_{22}^{1/2}$, and maximizing $\mathrm{tr}\,\psi$ is equivalent to maximizing $\mathrm{tr}\,G\bigl(\Sigma_{22}^{1/2}\Sigma_{11}^{1/2}\bigr)$. For simplicity of notation, write $B = \Sigma_{22}^{1/2}\Sigma_{11}^{1/2}$, and let $\lambda(C)$ denote the eigenvalues of C, and $\sigma(C) = \lambda^{1/2}(CC^\top)$ the singular values. Then

$$\mathrm{tr}\,GB = \sum_{i=1}^p \lambda_i(GB) \le \sum_{i=1}^p |\lambda_i(GB)| \le \sum_{i=1}^p \sigma_i(GB) = \sum_{i=1}^p \lambda_i^{1/2}(BB^\top) = \sum_{i=1}^p \lambda_i\bigl((BB^\top)^{1/2}\bigr) = \mathrm{tr}\,(BB^\top)^{1/2} \tag{3.4.30}$$

(see Marshall and Olkin (1979, p. 232)). Equality in (3.4.30) is attained for $G = B^\top(BB^\top)^{-1/2}$. Consequently, the maximum of $\mathrm{tr}\,\psi$ is achieved at

$$\psi_p = \Sigma_{11}^{1/2}G\Sigma_{22}^{1/2} = \Sigma_{11}\Sigma_{22}^{1/2}\bigl(\Sigma_{22}^{1/2}\Sigma_{11}\Sigma_{22}^{1/2}\bigr)^{-1/2}\Sigma_{22}^{1/2},$$

which is (3.4.3). □
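The trace inequality (3.4.30) can be illustrated in the 2×2 case. The sketch below is an assumption-laden toy check, not the authors' argument: it scans only rotations $G$ (orthogonal matrices with determinant +1), uses the identity $(\sigma_1 + \sigma_2)^2 = \|B\|_F^2 + 2|\det B|$ for the sum of singular values, and takes an arbitrary matrix `B` with positive determinant. The function names are hypothetical.

```python
import math

def nuclear_norm_2x2(b):
    """Sum of the singular values of a 2x2 matrix, i.e. tr (B B^T)^{1/2}.

    Uses (s1 + s2)^2 = ||B||_F^2 + 2 |det B|.
    """
    frob2 = sum(x * x for row in b for x in row)
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    return math.sqrt(frob2 + 2.0 * abs(det))

def trace_rotated(b, theta):
    """tr(G B) for the rotation G = [[cos, -sin], [sin, cos]]."""
    c, s = math.cos(theta), math.sin(theta)
    return c * (b[0][0] + b[1][1]) + s * (b[0][1] - b[1][0])

# Illustrative matrix with det B > 0 (assumed values).
B = [[1.2, 0.4], [0.1, 0.9]]
bound = nuclear_norm_2x2(B)

# Scan rotation angles; every value of tr(G B) stays below the bound.
grid = [2.0 * math.pi * k / 3600 for k in range(3600)]
best_on_grid = max(trace_rotated(B, th) for th in grid)

# The angle at which the rotation attains the bound.
theta_star = math.atan2(B[0][1] - B[1][0], B[0][0] + B[1][1])
attained = trace_rotated(B, theta_star)
```

For a matrix with positive determinant, as is the case for $B = \Sigma_{22}^{1/2}\Sigma_{11}^{1/2}$, the maximizing orthogonal matrix is itself a rotation, so restricting the scan to rotations loses nothing in this example.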

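We now show that the solutions of (3.4.25) and (3.4.26) can be obtained with an argument similar to that given above.

Solution of extremal problem (3.4.25). In problem (3.4.25) we need to maximize $\mathrm{tr}\,\psi_{12}$ subject to $\Sigma_1 \ge 0$, which holds provided that

$$\begin{pmatrix} \Sigma_{22} & \Sigma_{23} \\ \Sigma_{32} & \Sigma_{33} \end{pmatrix} > 0,$$

and with $\psi = \psi_{12}$,

$$\Sigma_{11} - (\psi, \Sigma_{13}) \begin{pmatrix} \Lambda_{22} & \Lambda_{23} \\ \Lambda_{32} & \Lambda_{33} \end{pmatrix} \begin{pmatrix} \psi^\top \\ \Sigma_{31} \end{pmatrix} \ge 0, \tag{3.4.31}$$

where

$$\begin{pmatrix} \Lambda_{22} & \Lambda_{23} \\ \Lambda_{32} & \Lambda_{33} \end{pmatrix} = \begin{pmatrix} \Sigma_{22} & \Sigma_{23} \\ \Sigma_{32} & \Sigma_{33} \end{pmatrix}^{-1}.$$

As before, the extremum occurs at the boundary

$$\Sigma_{11} = (\psi, \Sigma_{13}) \begin{pmatrix} \Lambda_{22} & \Lambda_{23} \\ \Lambda_{32} & \Lambda_{33} \end{pmatrix} \begin{pmatrix} \psi^\top \\ \Sigma_{31} \end{pmatrix} = \psi\Lambda_{22}\psi^\top + \Sigma_{13}\Lambda_{32}\psi^\top + \psi\Lambda_{23}\Sigma_{31} + \Sigma_{13}\Lambda_{33}\Sigma_{31}. \tag{3.4.32}$$

Completing the square in (3.4.32) yields

$$Q \equiv \Sigma_{11} - \Sigma_{13}\Lambda_{33}\Sigma_{31} + \Sigma_{13}\Lambda_{32}\Lambda_{22}^{-1}\Lambda_{23}\Sigma_{31} = \bigl(\psi + \Sigma_{13}\Lambda_{32}\Lambda_{22}^{-1}\bigr)\Lambda_{22}\bigl(\psi + \Sigma_{13}\Lambda_{32}\Lambda_{22}^{-1}\bigr)^\top. \tag{3.4.33}$$

Note that $\Sigma_{22} > 0$, $\Sigma_{33} > 0$, and

$$\begin{pmatrix} \Sigma_{22} & \Sigma_{23} \\ \Sigma_{32} & \Sigma_{33} \end{pmatrix} > 0,$$

which implies that $\Lambda_{22} = \bigl(\Sigma_{22} - \Sigma_{23}\Sigma_{33}^{-1}\Sigma_{32}\bigr)^{-1} > 0$, so that $Q \ge 0$. Then (3.4.33) implies that $Q^{1/2}G = \bigl(\psi + \Sigma_{13}\Lambda_{32}\Lambda_{22}^{-1}\bigr)\Lambda_{22}^{1/2}$, where G is orthogonal. Further,

$$\psi = Q^{1/2}G\Lambda_{22}^{-1/2} - \Sigma_{13}\Lambda_{32}\Lambda_{22}^{-1}, \tag{3.4.34}$$

and

$$\mathrm{tr}\,\psi = \mathrm{tr}\,G\bigl(\Lambda_{22}^{-1/2}Q^{1/2}\bigr) - \mathrm{tr}\,\Sigma_{13}\Lambda_{32}\Lambda_{22}^{-1} \le \mathrm{tr}\bigl(\Lambda_{22}^{-1/2}Q\Lambda_{22}^{-1/2}\bigr)^{1/2} - \mathrm{tr}\,\Sigma_{13}\Lambda_{32}\Lambda_{22}^{-1}, \tag{3.4.35}$$

with equality at

$$G = Q^{1/2}\Lambda_{22}^{-1/2}\bigl(\Lambda_{22}^{-1/2}Q\Lambda_{22}^{-1/2}\bigr)^{-1/2}.$$

Consequently, the minimizing $\psi$ is given by

$$\psi_0 = Q\Lambda_{22}^{-1/2}\bigl(\Lambda_{22}^{-1/2}Q\Lambda_{22}^{-1/2}\bigr)^{-1/2}\Lambda_{22}^{-1/2} - \Sigma_{13}\Lambda_{32}\Lambda_{22}^{-1},$$

where Q is defined by (3.4.33).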
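In the scalar case d = 1 the solution of (3.4.25) can be evaluated directly. The following sketch is an illustration under that assumption; the function names and the numerical values are hypothetical. With unit variances it reproduces the classical largest admissible correlation $\rho_{13}\rho_{23} + \sqrt{(1 - \rho_{13}^2)(1 - \rho_{23}^2)}$, and the 3×3 determinant vanishes at the extremum, as expected for a boundary point.

```python
import math

def max_psi12(s11, s22, s33, s13, s23):
    """Scalar (d = 1) version of the extremal psi_12 in problem (3.4.25)."""
    det = s22 * s33 - s23 * s23            # determinant of the fixed 2x2 block
    lam22 = s33 / det                      # entries of the inverse block
    lam32 = -s23 / det
    lam33 = s22 / det
    # Scalar form of Q in (3.4.33).
    q = s11 - s13 * lam33 * s13 + s13 * lam32 * lam32 * s13 / lam22
    # Scalar form of psi_0: sqrt(Q / Lambda_22) - Sigma_13 Lambda_32 / Lambda_22.
    return math.sqrt(q / lam22) - s13 * lam32 / lam22

def det3(s11, s22, s33, s12, s13, s23):
    """Determinant of the symmetric 3x3 dispersion matrix."""
    return (s11 * (s22 * s33 - s23 * s23)
            - s12 * (s12 * s33 - s23 * s13)
            + s13 * (s12 * s23 - s22 * s13))

# Unit variances with assumed correlations rho_13 = 0.5, rho_23 = -0.2.
RHO13, RHO23 = 0.5, -0.2
PSI0 = max_psi12(1.0, 1.0, 1.0, RHO13, RHO23)
CLOSED_FORM = RHO13 * RHO23 + math.sqrt((1 - RHO13 ** 2) * (1 - RHO23 ** 2))
```

Values of $\psi_{12}$ slightly above `PSI0` make the determinant negative, so the dispersion matrix is no longer nonnegative definite.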

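Solution of the extremal problem (3.4.26). To resolve (3.4.26) subject to $\Sigma_2 \ge 0$ and

$$\Sigma_{11} > 0, \qquad \begin{pmatrix} \Sigma_{22} & \Sigma_{23} \\ \Sigma_{32} & \Sigma_{33} \end{pmatrix} > 0,$$

let

$$\psi = (\psi_{12}, \psi_{13}), \qquad \Delta = \begin{pmatrix} \Sigma_{22} & \Sigma_{23} \\ \Sigma_{32} & \Sigma_{33} \end{pmatrix},$$

so that

$$\Sigma_2 = \begin{pmatrix} \Sigma_{11} & \psi \\ \psi^\top & \Delta \end{pmatrix}.$$

Note that $\Sigma_2 \ge 0$ if and only if $\Sigma_{11} \ge 0$ and $\Delta \ge \psi^\top\Sigma_{11}^{-1}\psi$. As before, the extremum occurs at

$$\Sigma_{11}^{1/2}G\Delta^{1/2} = \psi, \tag{3.4.36}$$

where now $\psi$ and G are both d × 2d matrices with $GG^\top = I_d$. For simplicity of notation let

$$C := \Sigma_{11}^{1/2}, \qquad \Delta^{1/2} = \begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{pmatrix}, \tag{3.4.37}$$

and partition $G = (G_1, G_2)$, where each $G_i$ is a d × d matrix. Then (3.4.36) becomes

$$\psi = C(G_1, G_2)\begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{pmatrix} = (CG_1R_{11} + CG_2R_{21},\ CG_1R_{12} + CG_2R_{22}),$$

and

$$\mathrm{tr}(\psi_{12} + \psi_{13}) = \mathrm{tr}\bigl(G_1(R_{11} + R_{12})C + G_2(R_{21} + R_{22})C\bigr) = \mathrm{tr}\,(G_1, G_2)S = \mathrm{tr}\,GS, \tag{3.4.38}$$

where

$$S = \begin{pmatrix} (R_{11} + R_{12})C \\ (R_{21} + R_{22})C \end{pmatrix}. \tag{3.4.39}$$

Note that $S^\top = \Sigma_{11}^{1/2}(I_d, I_d)\Delta^{1/2}$ is of rank d. Following our previous reasoning, the maximum of (3.4.38) is $\mathrm{tr}\,(S^\top S)^{1/2}$ and is achieved at $G = (S^\top S)^{-1/2}S^\top$, so that $\psi_0 = \Sigma_{11}^{1/2}(S^\top S)^{-1/2}S^\top\Delta^{1/2}$, where S is defined by (3.4.38) and (3.4.39).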

Solution of Extremal Problem (3.4.27)

$$\max_{\Sigma_3 \ge 0}\ \mathrm{tr}\,(\psi_{12} + \psi_{13} + \psi_{23}). \tag{3.4.40}$$

The methods applied to solve extremal problems (3.4.25) and (3.4.26) lead to an unpleasant amount of algebra. Instead, we approach this problem afresh.


The triple (X, Y, Z) of jointly distributed d-dimensional random vectors with respective marginals $N(0, \Sigma_{11})$, $N(0, \Sigma_{22})$, $N(0, \Sigma_{33})$ is called optimal if (3.4.40) attains its maximum. Suppose that there exist lower semicontinuous functions $f_1, f_2, f_3$ such that

$$X \in \partial f_3(Z), \qquad Y \in \partial f_1(X), \qquad Z \in \partial f_2(Y) \quad \text{a.s.}, \tag{3.4.41}$$

where $\partial f(x)$ denotes the subdifferential of f in x. Then (X, Y, Z) is optimal. In fact, for any other triple $\tilde X, \tilde Y, \tilde Z$ with the same marginal distributions as (X, Y, Z), set $\Delta(X, Y, Z) = (X, Y) + (X, Z) + (Y, Z)$, where $(\cdot, \cdot)$ is the inner product in $\mathbb{R}^d$. Then

$$\begin{aligned}
E\Delta(\tilde X, \tilde Y, \tilde Z) &\le E\{f_1(\tilde X) + f_1^*(\tilde Y) + f_2(\tilde Y) + f_2^*(\tilde Z) + f_3(\tilde Z) + f_3^*(\tilde X)\} \\
&= E\{f_1(X) + f_1^*(Y) + f_2(Y) + f_2^*(Z) + f_3(Z) + f_3^*(X)\} \\
&= E\Delta(X, Y, Z),
\end{aligned}$$

with $f_j^*(y) = \sup_x\{(x, y) - f_j(x)\}$ denoting the conjugate of $f_j$. (Consequently, $f_i(x) + f_i^*(y) \ge (x, y)$ for all x, y, and $f_i(x) + f_i^*(y) = (x, y)$ if and only if $y \in \partial f_i(x)$.) For the special univariate case, d = 1, and $F_j \sim N(0, \sigma_j^2)$, j = 1, 2, 3,

$$\bigl(X \equiv F_1^{-1}(U),\ Y \equiv F_2^{-1}(U),\ Z \equiv F_3^{-1}(U)\bigr), \quad U \text{ uniform on } [0, 1],$$

is an optimal triple. (See Lorentz (1953); Tchen (1980).) To see this, take $f_j(x) = \int_0^x F_{j+1}^{-1} \circ F_j(u)\,du$, j = 1, 2, 3 (with $F_4 \equiv F_1$); then (3.4.41) holds.
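The univariate statement can be illustrated numerically. The sketch below is a hedged toy computation, not part of the original argument: it replaces each normal marginal by its quantiles on a midpoint grid (the grid size `N` and the standard deviations `SIGMAS` are assumed values), evaluates the empirical mean of $XY + XZ + YZ$ for the common-quantile triple, and compares it with one other arrangement of the same quantiles, which has the same marginals but a different dependence structure.

```python
from statistics import NormalDist

def triple_value(xs, ys, zs):
    """Empirical E[XY + XZ + YZ] for equally weighted sample triples."""
    n = len(xs)
    return sum(x * y + x * z + y * z for x, y, z in zip(xs, ys, zs)) / n

N = 20000
SIGMAS = (1.0, 2.0, 0.5)            # assumed standard deviations
grid = [(k + 0.5) / N for k in range(N)]

# Quantile transforms F_j^{-1}(u) on the grid, one list per marginal.
quantiles = [[NormalDist(0.0, s).inv_cdf(u) for u in grid] for s in SIGMAS]

# Common-quantile triple (F_1^{-1}(U), F_2^{-1}(U), F_3^{-1}(U)).
comonotone = triple_value(*quantiles)

# Same marginals, different dependence: reverse Y and cyclically shift Z.
shifted = quantiles[2][N // 2:] + quantiles[2][:N // 2]
other = triple_value(quantiles[0], quantiles[1][::-1], shifted)

# Exact value for the common-quantile triple: sum of sigma_i * sigma_j, i < j.
exact = (SIGMAS[0] * SIGMAS[1] + SIGMAS[0] * SIGMAS[2]
         + SIGMAS[1] * SIGMAS[2])
```

By the rearrangement inequality every permutation of the quantile lists gives a value no larger than `comonotone`, which is the discrete counterpart of the optimality claim.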

Condition (3.4.41) gives a solution in dimension d = 1 but is not necessary for d ≥ 2.

Remark 3.4.4 Suppose the probabilities $\mu_j$, j = 1, 2, 3, on $\mathbb{R}^d$ are of the form

$$\mu_2 = \mu_1 \circ T_1^{-1}, \qquad \mu_3 = \mu_2 \circ T_2^{-1}, \qquad \mu_1 = \mu_3 \circ T_3^{-1}, \tag{3.4.42}$$

with symmetric and positive semidefinite $T_j$, and suppose

$$T_3T_2T_1 = I. \tag{3.4.43}$$

By (3.4.41), if X is $\mu_1$-distributed, $Y = T_1X$, $Z = T_2Y$, then (X, Y, Z) is optimal. In fact, take

$$f_j(x) = \tfrac{1}{2}(x, T_jx), \qquad g_j(x) = \tfrac{1}{2}(x, T_j^{-1}x). \tag{3.4.44}$$

Then $f_j(x) + g_j(T_jx) = (x, T_jx)$, and $g_j = f_j^*$ on $\{x;\ T_jx = 0\}^\perp$. If $f_j^{**}$ is the second conjugate of $f_j$, then $f_j^{**}$ is lower semicontinuous, $f_j^{**} \le f_j$, and hence $f_j^{**}(x) + g_j^*(y) \ge (x, y)$ for all x, y, and $f_j^{**}(x) + g_j^*(T_jx) = (x, T_jx)$; that is, $T_jx \in \partial f_j^{**}(x)$. The definition of X, Y, Z together with (3.4.43) implies $Y \in \partial f_1^{**}(X)$, $Z \in \partial f_2^{**}(Y)$, $X \in \partial f_3^{**}(Z)$, so that (X, Y, Z) is optimal. In the particular example where the marginals are normal,

$$T_1 = T(\Sigma_{11}, \Sigma_{22}) \equiv \Sigma_{22}^{1/2}\bigl(\Sigma_{22}^{1/2}\Sigma_{11}\Sigma_{22}^{1/2}\bigr)^{-1/2}\Sigma_{22}^{1/2}, \qquad T_2 = T(\Sigma_{22}, \Sigma_{33}), \qquad T_3 = T(\Sigma_{33}, \Sigma_{11}).$$

Assuming that (3.4.43) holds, then the triple $(X, T_1X, T_2T_1X)$ is optimal, and the maximum of (3.4.40) is

$$E\{(X, T_1X) + (Y, T_2Y) + (Z, T_3Z)\} = \mathrm{tr}\bigl(\Sigma_{11}^{1/2}\Sigma_{22}\Sigma_{11}^{1/2}\bigr)^{1/2} + \mathrm{tr}\bigl(\Sigma_{22}^{1/2}\Sigma_{33}\Sigma_{22}^{1/2}\bigr)^{1/2} + \mathrm{tr}\bigl(\Sigma_{33}^{1/2}\Sigma_{11}\Sigma_{33}^{1/2}\bigr)^{1/2}.$$

Condition (3.4.43) is satisfied if $\Sigma_{11}, \Sigma_{22}, \Sigma_{33}$ commute. In this case $\Sigma_{jj}$, j = 1, 2, 3, can be simultaneously diagonalized by the same orthogonal matrix; that is, $\Sigma_{jj} = \Gamma D_j\Gamma^\top$, where $D_j$ is a diagonal matrix and $\Gamma$ is orthogonal. Then

$$\begin{aligned}
T_1 = T(\Sigma_{11}, \Sigma_{22}) &= (\Gamma D_2\Gamma^\top)^{1/2}(\Gamma D_1D_2\Gamma^\top)^{-1/2}(\Gamma D_2\Gamma^\top)^{1/2} \\
&= \Gamma D_2^{1/2}\Gamma^\top\,\Gamma(D_1D_2)^{-1/2}\Gamma^\top\,\Gamma D_2^{1/2}\Gamma^\top \\
&= \Gamma D_2^{1/2}D_1^{-1/2}\Gamma^\top.
\end{aligned}$$

Therefore, we obtain

$$T_1 = \Gamma D_2^{1/2}D_1^{-1/2}\Gamma^\top, \qquad T_2 = \Gamma D_3^{1/2}D_2^{-1/2}\Gamma^\top, \qquad T_3 = \Gamma D_1^{1/2}D_3^{-1/2}\Gamma^\top,$$

so that $T_3T_2T_1 = I$. Of course, the simplest example in which $\Sigma_{11}, \Sigma_{22}, \Sigma_{33}$ commute is the case where $\Sigma_{11} = \Sigma_{22} = \Sigma_{33}$.

Remark 3.4.5 The construction in this section based on (3.4.41) uses simultaneous optimal couplings for the pairs (X, Z), (Z, Y), (Y, X), which may not exist. Alternatively, using the equivalent problem (3.4.21), one can try to find simultaneous optimal couplings of (X, M), (Y, M), (Z, M), where M = (X + Y + Z) is the sum. In the normal case Knott and Smith (1994) obtained in this way the following result: Let $X \sim N(0, \Sigma_1)$, $Y \sim N(0, \Sigma_2)$, $Z \sim N(0, \Sigma_3)$; if $\Sigma_0$ satisfies $\sum_{i=1}^3 \bigl(\Sigma_0^{1/2}\Sigma_i\Sigma_0^{1/2}\bigr)^{1/2} = \Sigma_0$, then with $R_i = K_i(K_iK_0^2K_i)^{-1/2}K_i$, $K_i := \Sigma_i^{1/2}$, it follows that $T_1 = R_2R_1^{-1}$, $T_2 = R_3R_2^{-1}$, $T_3 = R_1R_3^{-1}$ define optimal couplings by $Y = T_1X$, $Z = T_2Y$ (and $X = T_3Z$).

3.5 Maximum Probability of Sets, Maximum of Sums, and Stochastic Order

Consider the case n = 2 and probability spaces $(S_1, \mathcal{B}_1, P_1)$, $(S_2, \mathcal{B}_2, P_2)$; then for any $B \in \mathcal{B}_1 \otimes \mathcal{B}_2$ (resp. $B \in \mathcal{F}(S)$ in the topological situation) the duality theorems (cf. Theorems 2.3.1, 2.3.8, 2.4.1 and Corollary 2.3.9) yield

$$S(B) = \sup\{P(B);\ P \in M(P_1, P_2)\} = I(B) = \inf\Bigl\{\int f_1\,dP_1 + \int f_2\,dP_2;\ f_i \in \mathcal{L}^1(P_i),\ 1_B \le f_1 \oplus f_2\Bigr\}. \tag{3.5.1}$$

An "explicit" evaluation of the dual functional can be given.

Theorem 3.5.1 (a) For any $B \in \mathcal{P}(S)$,

$$I(B) = \inf\{P_1(B_1) + P_2(B_2);\ B_i \in \mathcal{B}_i,\ B \subset B_1 \times S_2 \cup S_1 \times B_2\}, \tag{3.5.2}$$

and solutions $B_1, B_2$ of the right-hand side of (3.5.2) exist.

(b) If $S_i$ are topological spaces and $B \in \mathcal{F}(S)$, then in (3.5.2) one can restrict the infimum to $B_i \in \mathcal{F}(S_i)$.

Proof: (a) For the evaluation of I(B) the functions $f_i$ in the definition of I can be confined to $0 \le f_i \le 1$ (cf. the proof of Proposition 2.2.4). Therefore, by the well-known integration formula,

$$\int f_1\,dP_1 + \int f_2\,dP_2 = \int_0^1 P_1(f_1 \ge t)\,dt + \int_0^1 P_2(f_2 \ge 1 - t)\,dt \ge \inf_{0 \le t \le 1}\bigl(P_1(f_1 \ge t) + P_2(f_2 \ge 1 - t)\bigr). \tag{3.5.3}$$

Choosing B1 := {f1 ≥ t}, B2 := {f2 ≥ 1 − t} we have, from the admissibility 1B ≤ f1 ⊕ f2 , that B ⊂ B1×S2 ∪ S1×B2 (for any t !). Therefore, I(B) = inf{P1 (B1 ) + P2 (B2 ); Bi ∈ Bi , B ⊂ B1 × S2 ∪ S1 × B2 }. Consequently, there exist minimal functions fi , 0 ≤ fi ≤ 1, in the definition of I(B) (cf. Theorem 2.1.1). For these functions the inequality in (3.5.3)


is in fact an equality for almost all t, and so for almost all t the sets $B_1 = \{f_1 \ge t\}$, $B_2 = \{f_2 \ge 1 - t\}$ are minimal.

(b) For $B \in \mathcal{F}(S)$ the functions $f_i$ can be confined to $f_i \in \mathcal{F}(S_i)$ with $0 \le f_i \le 1$. So the level sets $\{f_1 \ge t\}$, $\{f_2 \ge 1 - t\}$ are in $\mathcal{F}(S_i)$. □

As a consequence of the existence theorem we get the following characterization of the support of measures with given marginals $P_1, P_2$.

Corollary 3.5.2 (Support of marginal measures) Let $S_i$ be topological spaces and $B \in \mathcal{F}(S)$. Then there exists a measure $P \in M(P_1, P_2)$ with supp $P \subset B$ if and only if

$$P_1(B_1) + P_2(B_2) \ge 1 \quad \forall B_i \in \mathcal{F}(S_i) \text{ with } B \subset B_1 \times S_2 \cup S_1 \times B_2. \tag{3.5.4}$$

It has been shown in Chapter 2 that under certain conditions the duality theorem (D) holds on $B_m(S)$, the lower majorized Borel functions on S. Under this condition one can characterize the class of all functions that are integrable with respect to M.

Proposition 3.5.3 Assume that (D) holds on $B_m(S)$. Then

(a) $\mathcal{L}^1(M) = \bigcap_{P \in M}\mathcal{L}^1(P) = B_m(S) \cap B^m(S)$.

(b) For $h \in B_m(S) \cap B^m(S)$ the integral $\int h\,dP$ is independent of $P \in M$ if and only if $h =_N h_1 \oplus h_2$ for some finite $h_i \in \mathcal{L}^1(P_i)$.

Proof: (a) If $h \in B(S)$ is integrable with respect to all $P \in M$, then $|h| \in \mathcal{L}^1(M)$. Hence $S(|h|) < \infty$ (cf. Proposition 2.2.4). This implies $I(|h|) < \infty$ as well as (a).

(b) Only one direction of this statement has to be shown. Suppose that $\sup_{\mu \in M}\mu(h) = \inf_{\mu \in M}\mu(h) = -\sup_{\mu \in M}\mu(-h)$. Therefore, $S(-h) = -S(h)$, and by (D), $I(-h) = -I(h)$. By the existence theorem (cf. Theorem 2.3.12), there exist functions $f_i, g_i \in \mathcal{L}^1(P_i)$ such that $f := \oplus f_i \le_N h \le_N \oplus g_i =: g$ and $\sum P_i(f_i) = \sum P_i(g_i)$. These relations lead to $d(f, h) = I(g - f) = \sum P_i(g_i - f_i) = 0$ and thus yield $f =_N h =_N g$. □
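On finite spaces the duality behind Theorem 3.5.1 reduces to the max-flow min-cut theorem, which makes a small numerical check possible. The sketch below assumes integer masses (probabilities multiplied by a common denominator) and uses hypothetical helper names; `primal_value` computes the largest mass a coupling can place on a set `allowed` of index pairs by a maximum flow, and `dual_value` evaluates the right-hand side of (3.5.2) by brute force over all covering pairs $(B_1, B_2)$.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dense integer capacity matrix."""
    size = len(capacity)
    residual = [row[:] for row in capacity]
    total = 0
    while True:
        parent = [-1] * size
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(size):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            return total
        path, v = [], sink
        while v != source:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        total += push

def primal_value(p1, p2, allowed):
    """sup P(B) over couplings of p1 and p2, with B the set of allowed pairs."""
    m, n = len(p1), len(p2)
    source, sink = 0, m + n + 1
    capacity = [[0] * (m + n + 2) for _ in range(m + n + 2)]
    for i, mass in enumerate(p1):
        capacity[source][1 + i] = mass
    for j, mass in enumerate(p2):
        capacity[1 + m + j][sink] = mass
    for i, j in allowed:
        capacity[1 + i][1 + m + j] = sum(p1)   # effectively unbounded
    return max_flow(capacity, source, sink)

def dual_value(p1, p2, allowed):
    """inf of P1(B1) + P2(B2) over pairs (B1, B2) covering the allowed set."""
    best = None
    for mask in range(1 << len(p1)):
        b1 = {i for i in range(len(p1)) if mask >> i & 1}
        # Smallest B2 that covers every allowed pair not covered by B1.
        b2 = {j for (i, j) in allowed if i not in b1}
        cost = sum(p1[i] for i in b1) + sum(p2[j] for j in b2)
        best = cost if best is None else min(best, cost)
    return best

# Assumed example: masses in tenths, B = {(0,0), (1,0), (2,1)}.
P1, P2 = [3, 2, 5], [4, 4, 2]
B_SET = {(0, 0), (1, 0), (2, 1)}
PRIMAL = primal_value(P1, P2, B_SET)
DUAL = dual_value(P1, P2, B_SET)
```

A flow on the allowed pairs can always be completed to a full coupling because the leftover row and column masses have equal totals, which is why the maximum flow equals the supremum on the primal side.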

Definition 3.5.4 Let (Y, ≤) be a topological space with a partial order "≤". (Y, ≤) is called an ordered topological space if

$$R(Y) := \{(x, y);\ x, y \in Y,\ x \le y\} \text{ is closed.} \tag{3.5.5}$$

$A \subset Y$ is called isotone if any $y \in Y$ with $x \le y$ for some $x \in A$ belongs to A.

Theorem 3.5.5 (Strassen representation theorem) (a) Let (Y, ≤) be a partially ordered set and assume that $R(Y) \in \mathcal{B} \otimes \mathcal{B}$ and that $P_1$ is perfect. Then

$$S(R(Y)) = 1 - \sup\{P_1(A) - P_2(A);\ A \in \mathcal{B},\ A \text{ isotone}\}. \tag{3.5.6}$$

(b) If (Y, ≤) is an ordered topological space, then the following are equivalent:

(i) There exists $\mu \in M$ with $\mu(R(Y)) = 1$.

(ii) $P_1(A) \le P_2(A)$ for all isotone Borel sets.

(iii) $P_1(A) \le P_2(A)$ for all $A \in \mathcal{F}(Y)$.

(iv) $P_1(A) \le P_2(A)$ for all $A \in \mathcal{G}(Y)$.

Proof: (a) By the general duality theorem (cf. Theorem 2.4.3),

$$\begin{aligned}
S(R(Y)) = I(R(Y)) &= \inf\{P_1(B_1) + P_2(B_2);\ B_i \in \mathcal{B},\ R(Y) \subset B_1 \times Y \cup Y \times B_2\} \\
&= \inf\{P_1(B_1) + P_2(B_1^c);\ R(Y) \subset B_1 \times Y \cup Y \times B_1^c\}.
\end{aligned}$$

Since for $x \in B_1^c$ and $y \ge x$, $(x, y) \in R(Y)$, this yields that $y \in B_1^c$; i.e., $B_1^c$ is isotone. This implies (with $A = B_1^c$) that

$$S(R(Y)) = \inf\{1 - P_1(A) + P_2(A);\ A \in \mathcal{B},\ A \text{ isotone}\} = 1 - \sup\{P_1(A) - P_2(A);\ A \in \mathcal{B},\ A \text{ isotone}\}.$$

(b) Since $R(Y) \in \mathcal{F}(Y)$, the existence of a maximal measure $\mu$ follows, and $\mu(R(Y)) = 1$ if and only if $P_1(A) \le P_2(A)$ for all isotone $A \in \mathcal{B}$. (ii) ⇒ (i). For any isotone $A \in \mathcal{B}$ and ε > 0 choose $K \subset A$ compact with $P_1(A \setminus K) < \varepsilon$ and define $K_+ = \pi_2(R(Y) \cap (K \times Y))$, the isotone completion of K, which is in fact closed and isotone. Moreover, $K \subset K_+ \subset A$, and therefore, $P_1(A) - \varepsilon < P_1(K) \le P_1(K_+) \le P_2(K_+) \le P_2(A)$. For ε ↓ 0 this leads to the inequality $P_1(A) \le P_2(A)$. Obviously (ii) ⇔ (iii). □

Theorem 3.5.5(b) is a general version of Strassen's a.s. representation theorem. In terms of random variables, part (b) can be expressed by the following equivalence: $P_1 \le_{st} P_2$ ($\le_{st}$ the stochastic ordering) if and only if there exist r.v.s $X \stackrel{d}{=} P_1$, $Y \stackrel{d}{=} P_2$ on a suitable probability space such that

$$X \le Y \quad \text{a.s.} \tag{3.5.7}$$
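On the real line the a.s. representation (3.5.7) is realized by the quantile coupling $X = F_1^{-1}(U)$, $Y = F_2^{-1}(U)$. The following sketch illustrates this for two discrete distributions on the same ordered support; the function name `quantile` and the numerical values are assumptions chosen for illustration.

```python
from bisect import bisect_left
from itertools import accumulate

def quantile(values, probs, u):
    """Generalized inverse F^{-1}(u) = inf{x : F(x) >= u} of a discrete d.f.

    `values` must be sorted increasingly, `probs` are the point masses.
    """
    cdf = list(accumulate(probs))
    k = bisect_left(cdf, u)              # first index with cdf[k] >= u
    return values[min(k, len(values) - 1)]

VALUES = [0, 1, 2]
P1 = [0.5, 0.3, 0.2]                     # assumed example, P1 <=_st P2
P2 = [0.2, 0.3, 0.5]

# Stochastic order on the line: F1(x) >= F2(x) for every x.
cdf1, cdf2 = list(accumulate(P1)), list(accumulate(P2))
ordered = all(a >= b - 1e-12 for a, b in zip(cdf1, cdf2))

# Quantile coupling evaluated on a midpoint grid of u values.
grid = [(k + 0.5) / 1000 for k in range(1000)]
pairs = [(quantile(VALUES, P1, u), quantile(VALUES, P2, u)) for u in grid]
```

Every pair in `pairs` satisfies $x \le y$, so the coupling is concentrated on $R(Y)$ in the sense of Theorem 3.5.5(b)(i).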

This a.s. representation theorem has been very influential for the theory and application of stochastic ordering. The first part is due to Kellerer (1984) and Rüschendorf (1986). The important Theorem 3.5.1 does not extend to products with n > 2 components. But there is the following "extension" to monotone functions in the topological situation (with tight measures).

Proposition 3.5.6 (monotone functions) Let $(S_i, \mathcal{B}_i)$ be ordered topological spaces and let $h \in \mathcal{P}(S)$ be isotone. Then

$$I(h) = \inf\Bigl\{\sum_{i=1}^n \int h_i\,dP_i;\ h_i \in \mathcal{L}^1_f(P_i),\ h_i \text{ isotone},\ h \le \bigoplus_{i=1}^n h_i\Bigr\}. \tag{3.5.8}$$

Proof: The proof of Proposition 3.5.6 needs the following technical lemma (cf. Kellerer (1984, Lemma 3.9)), which is based on the tightness of the occurring measures and uses a standard argument.

Lemma 3.5.7 Let Y be an ordered topological space, $\nu \in M^1(Y)$ and $g \in \mathcal{P}(Y)$ be isotone. Then

$$\nu_*(g) = \sup\Bigl\{\int f\,d\nu;\ f \le g,\ f \in \mathcal{L}^b(\nu),\ f \text{ isotone}\Bigr\}$$

and

$$\nu^*(g) = \inf\Bigl\{\int f\,d\nu;\ g \le f \in \mathcal{L}^1_f(\nu),\ f \text{ isotone}\Bigr\}. \tag{3.5.9}$$

For the proof of Proposition 3.5.6 consider $h_i \in \mathcal{L}^1_f(P_i)$ with $h \le \bigoplus_{i=1}^n h_i$. For ε > 0 and $A_i := \{h_i < \infty\}$ define

$$\bar h_1(x_1) := \sup\Bigl\{h(x_1, \dots, x_n) - \sum_{i \ne 1} h_i(x_i);\ x_i \in A_i,\ i \ne 1\Bigr\}.$$

Then $h_1$ can be replaced by $\bar h_1$ without violating the inequality $h \le \oplus h_i$, and furthermore, $\bar h_1$ is isotone. Applying Lemma 3.5.7 yields an isotone function $\tilde h_1 \in \mathcal{L}^1_f(P_1)$ such that $\bar h_1 \le \tilde h_1$ and $P_1(\tilde h_1) < P_1^*(\bar h_1) + \varepsilon \le P_1(h_1) + \varepsilon$. Continuing this way, all functions $h_i$ may be replaced by functions $\tilde h_i$ that are isotone, and moreover, they are in $\mathcal{L}^1_f(P_i)$ and such that $\sum P_i(\tilde h_i) \le \sum P_i(h_i) + n\varepsilon$. This completes the proof of Proposition 3.5.6. □

A basic problem of PERT networks is the determination of the total duration distribution function. If $X_j$ are random variables with distributions $P_j$, 1 ≤ j ≤ n, and $I_j$, 1 ≤ j ≤ k, are subsets of {1, ..., n} with $\bigcup_{j=1}^k I_j = \{1, \dots, n\}$ ($I_j$ corresponding to some critical paths of the network), then the total duration is given by

$$T_n = \max_{1 \le j \le k}\ \sum_{i \in I_j} X_i. \tag{3.5.10}$$

Sharp upper bounds for the distribution of $T_n$ under $M(P_1, \dots, P_n)$ with respect to convex ordering of distributions have been given by Meilijson and Nadas (1979). Consider the series case (that is, k = 1), so $T_n(x) = \sum_{i=1}^n x_i$; then $(F_1^{-1}(U), \dots, F_n^{-1}(U))$, U uniformly distributed on (0, 1), gives the distribution such that $T_n$ is maximal with respect to convex stochastic ordering (cf. Meilijson and Nadas (1979)). Define for t fixed

$$A_n(t) := \Bigl\{x \in \mathbb{R}^n;\ \sum_{i=1}^n x_i \le t\Bigr\}; \qquad A_n^+(t) := \Bigl\{x \in \mathbb{R}^n;\ \sum_{i=1}^n x_i < t\Bigr\}. \tag{3.5.11}$$

The maximal and minimal probabilities of $A_n(t)$ for n = 2 were determined in Makarov (1981) and Rüschendorf (1982). For two d.f.s $F_1, F_2$ on the real line define the infimal and supremal convolutions:

$$F_1 \wedge F_2(t) := \inf_x\bigl(F_1(x-) + F_2(t - x)\bigr); \qquad F_1 \vee F_2(t) := \sup_x\bigl(F_1(x-) + F_2(t - x)\bigr). \tag{3.5.12}$$

Theorem 3.5.8 (Maximum of sums) For any $t \in \mathbb{R}^1$,

$$\sup\{P(A_2(t));\ P \in M(P_1, P_2)\} = F_1 \wedge F_2(t), \qquad \inf\{P(A_2^+(t));\ P \in M(P_1, P_2)\} = F_1 \vee F_2(t) - 1. \tag{3.5.13}$$

Proof: By (3.5.2) we have

$$M := \sup\{P(A_2(t));\ P \in M(P_1, P_2)\} = 1 - \sup\{P_2(U) - P_1(\pi_1(A_2(t) \cap (\mathbb{R}^1 \times U)));\ U \in \mathcal{G}(\mathbb{R}^1)\}.$$

Since $\pi_1(A_2(t) \cap (\mathbb{R}^1 \times U)) = \{x \in \mathbb{R}^1;\ \exists y \in U,\ x + y \le t\} = (-\infty, t - \inf U)$, we can restrict our considerations to open intervals $U = (x, \infty)$. This leads to

$$M = 1 - \sup_x\{P_2(x, \infty) - P_1(-\infty, t - x)\} = F_1 \wedge F_2(t).$$

Similarly,

$$\sup\{P((A_2^+(t))^c);\ P \in M(P_1, P_2)\} = 2 - \sup_x\bigl(F_1(x-) + F_2(t - x)\bigr),$$

which concludes the proof. □
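The infimal and supremal convolutions in (3.5.12) are easy to approximate on a grid when the d.f.s are continuous, so that $F(x-) = F(x)$. The sketch below is illustrative only: the function names, the grid, and the choice of two U(0, 1) marginals are assumptions. It evaluates the two bounds of Theorem 3.5.8 and compares them with the independent coupling at one point.

```python
def uniform_cdf(x):
    """Distribution function of U(0, 1)."""
    return min(1.0, max(0.0, x))

def inf_convolution(f1, f2, t, xs):
    """Grid approximation of inf_x (F1(x) + F2(t - x)) for continuous d.f.s."""
    return min(f1(x) + f2(t - x) for x in xs)

def sup_convolution(f1, f2, t, xs):
    """Grid approximation of sup_x (F1(x) + F2(t - x)) for continuous d.f.s."""
    return max(f1(x) + f2(t - x) for x in xs)

# Grid covering the support with some margin on both sides.
XS = [-1.0 + 0.001 * k for k in range(4001)]

# Bounds on P(X + Y <= 0.5) over all couplings of two U(0, 1) variables.
UPPER = inf_convolution(uniform_cdf, uniform_cdf, 0.5, XS)
LOWER = sup_convolution(uniform_cdf, uniform_cdf, 0.5, XS) - 1.0

# Under independence P(X + Y <= t) = t^2 / 2 for 0 <= t <= 1.
INDEPENDENT = 0.5 ** 2 / 2.0

# Lower bound at t = 1.5, where it is no longer trivial.
LOWER_AT_1_5 = sup_convolution(uniform_cdf, uniform_cdf, 1.5, XS) - 1.0
```

For uniform marginals the upper bound equals t on [0, 1] and the lower bound equals t − 1 on [1, 2], in agreement with the case n = 2 of Proposition 3.5.10 and Remark 3.5.11.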

Remark 3.5.9 As a consequence of Theorem 3.5.8 it follows that for any ball $B_r := \{(x, y);\ x^2 + y^2 \le r^2\}$ of radius r,

$$\sup\{P(B_r);\ P \in M(P_1, P_2)\} = \min_{0 \le \delta^2 \le r^2}\bigl(P_1(x^2 < \delta^2) + P_2(y^2 \le r^2 - \delta^2)\bigr). \tag{3.5.14}$$

In particular, if $P_1 = P_2 = U(-1, 1)$ are uniformly distributed on [−1, 1], then

$$\sup\{P(B_r);\ P \in M(P_1, P_2)\} = \begin{cases} r & \text{for } r \le 1, \\ 1 & \text{for } r > 1. \end{cases} \tag{3.5.15}$$

On the other hand, for the distribution $P^*$ corresponding to the least convex majorant we have

$$P^*(B_r) = \begin{cases} r/2 & \text{for } r \le 2, \\ 1 & \text{for } r > 2. \end{cases} \tag{3.5.16}$$

For some special cases a general solution for n ≥ 2 has been found (cf. Rüschendorf (1982)).

Proposition 3.5.10 (a) If $P_i = U(0, 1)$, 1 ≤ i ≤ n, then

$$\sup\{P(A_n(t));\ P \in M(P_1, \dots, P_n)\} = \frac{2}{n}\,t, \qquad 0 \le t \le \frac{n}{2}. \tag{3.5.17}$$

(b) If $P_i = B(1, p)$, 1 ≤ i ≤ n, then (for k ≤ np),

$$\sup\{P(A_n(k));\ P \in M(P_1, \dots, P_n)\} = \frac{n}{n - k}(1 - p). \tag{3.5.18}$$

Proof: (a) From the duality theorem (by a symmetry argument)

$$\sup\{P(A_n(t));\ P \in M(P_1, \dots, P_n)\} = \inf\Bigl\{n\int f\,d\lambda^1;\ \sum_{i=1}^n f(x_i) \ge 1_{[0,t]}\Bigl(\sum_{i=1}^n x_i\Bigr),\ x_i \in [0, 1]\Bigr\}. \tag{3.5.19}$$

It is easy to see that the right-hand side H(t) of (3.5.19) fulfills $H(t) = tH(1)$, $0 \le t \le \frac{n}{2}$, and therefore, it is enough to consider the case t = 1. Define then

$$f(x) = \begin{cases} \frac{2}{n} - x & \text{if } 0 \le x \le \frac{2}{n}, \\ 0 & \text{otherwise.} \end{cases} \tag{3.5.20}$$

Then it is easy to show that $\sum_{i=1}^n f(x_i) \ge 1_{[0,1]}\bigl(\sum_{i=1}^n x_i\bigr)$ and $n\int f\,d\lambda = n\int_0^{2/n}\bigl(\frac{2}{n} - x\bigr)dx = \frac{2}{n}$. Also, it is not difficult to construct uniformly distributed random variables $U_i$ with $\lambda\{\sum_{i=1}^n U_i \le t\} = \frac{2}{n}t$, $0 \le t \le \frac{n}{2}$ (cf. Rüschendorf (1982)). This implies (a).

(b) Define $P^*$ on $\{0, 1\}^n$ by

$$P^*(\{x\}) = a \ \text{ if } \sum x_i = k, \qquad P^*(\{\mathbf{1}\}) = b, \qquad \text{and} \qquad P^*(\{x\}) = 0 \ \text{ otherwise,} \tag{3.5.21}$$

where

$$a = \frac{k\,(1 - p)}{(n - k)\binom{n-1}{k-1}}, \qquad b = 1 - \frac{n}{n - k}(1 - p).$$

Then $P^*$ is symmetric,

$$P^*\{x_1 = 1\} = \binom{n-1}{k-1}a + b = p,$$

and so $P^* \in M(P_1, \dots, P_n)$. Furthermore, $P^*(A_n(k)) = \binom{n}{k}a = \frac{n}{n - k}(1 - p)$. The dual problem reduces in this case to

$$\inf\Bigl\{n\int f\,dP_1;\ \sum f(x_i) \ge 1_{[0,k]}\Bigl(\sum x_i\Bigr)\Bigr\} = \inf\{n[(1 - p)f_0 + pf_1];\ rf_1 + (n - r)f_0 \ge 1,\ 0 \le r \le k\},$$

where $f_0 := f(0)$, $f_1 := f(1)$. In the case $f_0 \ge f_1$ the admissibility conditions are equivalent to $kf_1 + (n - k)f_0 \ge 1$ or, equivalently, to $f_0 \ge \frac{1}{n - k} - \frac{k}{n - k}f_1$. So the solution is attained in this case for $f_0 = \frac{1}{n - k}$, $f_1 = 0$, which yields the value $n\frac{1 - p}{n - k}$. The case $f_0 \le f_1$ leads similarly to the solutions $f_0 = f_1 = \frac{1}{n}$, and the value 1. Together, the inf value equals $\min\bigl\{1, \frac{n}{n - k}(1 - p)\bigr\}$. □
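The measure $P^*$ used in part (b) can be checked with exact rational arithmetic. The sketch below is an illustration under assumed parameters (n = 4, k = 2, p = 3/5); the function name `extremal_coupling` is hypothetical. The point mass `a` is computed from the requirement $\binom{n}{k}a = \frac{n}{n-k}(1 - p)$ stated in the proof.

```python
from fractions import Fraction
from itertools import product
from math import comb

def extremal_coupling(n, k, p):
    """Point masses of the symmetric measure P* on {0,1}^n from (3.5.21).

    Requires 0 < k < n and k <= n * p, so that all masses are nonnegative.
    """
    upper = Fraction(n, n - k) * (1 - p)     # the claimed supremum (3.5.18)
    a = upper / comb(n, k)                   # mass of each point with sum k
    b = 1 - upper                            # mass of the point (1, ..., 1)
    masses = {}
    for x in product((0, 1), repeat=n):
        if sum(x) == k:
            masses[x] = a
        elif sum(x) == n:
            masses[x] = b
    return masses

N_DIM, K, P = 4, 2, Fraction(3, 5)           # assumed example, k <= n p
MASSES = extremal_coupling(N_DIM, K, P)

# Total mass, the n one-dimensional marginals, and the probability of A_n(k).
TOTAL = sum(MASSES.values())
MARGINALS = [sum(m for x, m in MASSES.items() if x[i] == 1)
             for i in range(N_DIM)]
PROB_SUM_AT_MOST_K = sum(m for x, m in MASSES.items() if sum(x) <= K)
```

Because the arithmetic is exact, the marginal constraint $P^*\{x_i = 1\} = p$ and the value $\frac{n}{n-k}(1 - p)$ can be compared by equality rather than up to a tolerance.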

Remark 3.5.11 (a) The corresponding inf result in part (a) of Proposition 3.5.10 is for $P_i = U(0, 1)$,

$$\inf\{P(A_n(t));\ P \in M(P_1, \dots, P_n)\} = \min\Bigl\{\Bigl(\frac{2}{n}\,t - 1\Bigr)_+,\ 1\Bigr\}, \qquad t \ge 0. \tag{3.5.22}$$

(b) In the proof of Proposition 3.5.10 we used that the dual problem can be reduced in the case $P_1 = \cdots = P_n = P$ to the problem

$$\inf\Bigl\{n\int f\,dP;\ f \in \mathcal{L}^1(P),\ \sum_{i=1}^n f(x_i) \ge c(x_1, \dots, x_n)\Bigr\} \tag{3.5.23}$$

if $c(x_1, \dots, x_n)$ is symmetric.

3.6 Hoeffding–Fréchet Bounds, Monte Carlo Simulation and Maximal Dependence

Let $(S_i, \mathcal{B}_i, P_i)$, 1 ≤ i ≤ n, be probability spaces. Then the following characterization of $M(P_1, \dots, P_n)$ is known as the Hoeffding–Fréchet bounds: If $P \in M^1(S, \mathcal{B})$, then $P \in M(P_1, \dots, P_n)$ if and only if for all $A_i \in \mathcal{B}_i$, 1 ≤ i ≤ n, the following inequalities hold:

$$\Bigl(\sum_{i=1}^n P_i(A_i) - (n - 1)\Bigr)_+ \le P(A_1 \times \cdots \times A_n) \le \min_{1 \le i \le n} P_i(A_i). \tag{3.6.1}$$

Setting $B_i := S_1 \times \cdots \times A_i \times \cdots \times S_n$, $A_1 \times \cdots \times A_n = \bigcap_{i=1}^n B_i$ and $p_i = P(B_i) = P_i(A_i)$, notice that for any $P \in M(P_1, \dots, P_n)$ the bounds in (3.6.1) are identical to the Bonferroni bounds of first order for probabilities $P\bigl(\bigcap_{i=1}^n B_i\bigr)$ when $p_i = P(B_i)$ are given.
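The bounds in (3.6.1) depend only on the numbers $p_i = P_i(A_i)$, so they are straightforward to evaluate. The following short sketch uses a hypothetical function name and assumed probabilities; it computes both bounds and checks that the product measure, which is one element of $M(P_1, \dots, P_n)$, lies between them.

```python
from math import prod

def frechet_bounds(marginal_probs):
    """Lower and upper bounds in (3.6.1) for P(A_1 x ... x A_n).

    `marginal_probs` holds the numbers P_i(A_i), i = 1, ..., n.
    """
    n = len(marginal_probs)
    lower = max(sum(marginal_probs) - (n - 1), 0.0)
    upper = min(marginal_probs)
    return lower, upper

PROBS = [0.9, 0.8, 0.7]                  # assumed values of P_i(A_i)
LOWER, UPPER = frechet_bounds(PROBS)

# Under the product measure, P(A_1 x A_2 x A_3) is the product of the p_i.
INDEPENDENT = prod(PROBS)
```

The lower bound is the first-order Bonferroni bound: the probability of the intersection is at least one minus the sum of the probabilities of the complements.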

Let $S_i = \mathbb{R}^1$, $A_i = (-\infty, x_i]$. Then from (3.6.1), a characterization of distribution functions $F = F_P$ with marginals $F_i = F_{P_i}$ is given by the bounds

$$F_-(x) := \Bigl(\sum_{i=1}^n F_i(x_i) - (n - 1)\Bigr)_+ \le F(x) \le \min_{1 \le i \le n} F_i(x_i) =: F_+(x). \tag{3.6.2}$$

Here $F_+$ is a distribution function (with marginals $F_i$) known as the d.f. of the Hoeffding–Fréchet upper bound. On the other hand, $F_-$ is a d.f. for n = 2; for n > 2, $F_-$ is a d.f. only in exceptional cases (cf. Dall'Aglio (1972)). A second indication of the sharpness of the Hoeffding–Fréchet bounds is the following important result of Kellerer (1961), which in fact solved a problem posed by Fréchet (1951).

Theorem 3.6.1 (Fréchet problem) Let $\mu$ be a finite measure on $\mathcal{B}_1 \otimes \mathcal{B}_2$ and let $(S_1, \mathcal{B}_1, P_1)$ be perfect. Then there exists an element $P \in M(P_1, P_2)$ with $P \le \mu$ if and only if

$$\mu(A_1 \times A_2) \ge P_1(A_1) + P_2(A_2) - 1 \quad \text{for all } A_i \in \mathcal{B}_i. \tag{3.6.3}$$

The inequalities in (3.6.2) had significant influence on the development of inequalities on integrals of the form $\int c(x)\,dP(x) \le \int c(x)\,dP_+(x)$, where $P_+$ corresponds to $F_+$, and c is a Monge type function (cf. Section 3.1). The following result on the sharpness of the Hoeffding–Fréchet bounds in (3.6.1) is due to Rüschendorf (1981).

Theorem 3.6.2 (Sharpness of Hoeffding–Fréchet bounds) Let $(S_i, \mathcal{B}_i, P_i)$, 1 ≤ i ≤ n, be probability spaces, all of them except at most one perfect. Then for all $A_i \in \mathcal{B}_i$, 1 ≤ i ≤ n,

$$\sup\{P(A_1 \times \cdots \times A_n);\ P \in M(P_1, \dots, P_n)\} = \min\{P_i(A_i),\ 1 \le i \le n\}$$

and

$$\inf\{P(A_1 \times \cdots \times A_n);\ P \in M(P_1, \dots, P_n)\} = \Bigl(\sum_{i=1}^n P_i(A_i) - (n - 1)\Bigr)_+. \tag{3.6.4}$$

Proof: By the duality theorem (with n factors)

$$S(A_1 \times \cdots \times A_n) = I(A_1 \times \cdots \times A_n) = \inf\Bigl\{\sum_{i=1}^n \int f_i\,dP_i;\ f_i \in B(S_i, \mathcal{B}_i),\ \sum f_i \circ \pi_i \ge 1_{A_1 \times \cdots \times A_n}\Bigr\}. \tag{3.6.5}$$

For any admissible $(f_i)$ of the dual problem with $a_i := \inf\{f_i(x);\ x \in S_i\}$, we have $\sum a_i \ge 0$. Define for i with $a_i < 0$, $\tilde f_i := f_i - a_i$, and

$$\tilde f_i := f_i - a_i + \frac{1}{n - |\{i;\ a_i < 0\}|}\sum_{j=1}^n a_j$$

otherwise. Then $\tilde f_i \ge 0$, 1 ≤ i ≤ n, and $\sum \tilde f_i \circ \pi_i = \sum f_i \circ \pi_i$ and $\sum \int \tilde f_i\,dP_i = \sum \int f_i\,dP_i$. So without loss of generality we assume that $a_i \ge 0$. For $b_i := \inf\{f_i(x);\ x \in A_i\}$, we have $b_i \ge 0$ and $\sum_{i=1}^n b_i \ge 1$. Therefore, $f_i^* := b_i 1_{A_i}$ are admissible, and moreover,

$$\sum f_i \circ \pi_i \ge \sum (f_i 1_{A_i}) \circ \pi_i \ge \sum b_i 1_{A_i} \circ \pi_i = \sum f_i^* \circ \pi_i$$

and

$$\sum \int f_i\,dP_i \ge \sum b_i P_i(A_i).$$

This implies that

$$I(A_1 \times \cdots \times A_n) = \inf\Bigl\{\sum b_i P_i(A_i);\ b_i \ge 0,\ \sum b_i = 1\Bigr\} = \min\{P_i(A_i);\ 1 \le i \le n\}.$$

The proof of (3.6.4) is similar. By the duality,

$$B := \inf\{P(A_1 \times \cdots \times A_n);\ P \in M(P_1, \dots, P_n)\} = \sup\Bigl\{\sum \int f_i\,dP_i;\ f_i \in B(S_i, \mathcal{B}_i),\ \sum f_i \circ \pi_i \le 1_{A_1 \times \cdots \times A_n}\Bigr\}. \tag{3.6.6}$$

Let $(f_i)$ be admissible for the dual problem and let $b_i := \sup\{f_i(x);\ x \in A_i^c\}$, $a_i := \sup\{f_i(x);\ x \in A_i\} - b_i$. Then $(\tilde f_i)$ are admissible, where $\tilde f_i = a_i 1_{A_i} + b_i$ and $f_i \le \tilde f_i$, 1 ≤ i ≤ n. So, without loss of generality, we can again assume that $f_i = a_i 1_{A_i} + b_i$. With $b := \sum b_i$, admissibility of $(f_i)$ is equivalent to

$$\sum_{i=1}^n a_i + b \le 1 \quad \text{and} \quad \sum_{i \in J} a_i + b \le 0 \tag{3.6.7}$$

for all strict subsets $J \subset \{1, \dots, n\}$, and $B = \sup\bigl\{\sum a_i P_i(A_i) + b;\ (a_i), b \text{ satisfy (3.6.7)}\bigr\}$. For the attainment of the sup, equality holds in at least one restriction in (3.6.7). A discussion of these cases yields the result. If, for example, $\sum a_i + b = 1$, then $a_i \ge a_i + \sum_{j \ne i} a_j + b = 1$, 1 ≤ i ≤ n, and so

$$\sum a_i P_i(A_i) + b = \sum a_i(P_i(A_i) - 1) + 1 \le \sum P_i(A_i) - (n - 1),$$

and the right-hand side is attained for $a_i = 1$, $b = -(n - 1)$. For the other cases cf. Rüschendorf (1981). □

A problem that stems from Monte Carlo simulation is the problem of variance reduction. For probability measures $P_1, \dots, P_n$ on $(\mathbb{R}^1, \mathcal{B}^1)$ with d.f.s $F_1, \dots, F_n$, construct r.v.s $X_1, \dots, X_n$ with $X_i \stackrel{d}{=} P_i$ such that the variance of $\sum_{i=1}^n X_i$ is as large or as small as possible. An equivalent problem is to minimize or maximize $E\bigl(\sum_{i=1}^n X_i\bigr)^2$ or $E\sum_{i<j} X_iX_j$. From the Hoeffding–Fréchet bounds (3.6.2) and from (3.1.9) one obtains that for any $X_i \stackrel{d}{=} P_i$,

$$\text{(a)} \quad E\,F_1^{-1}(U)F_2^{-1}(1 - U) \le EX_1X_2 \le E\,F_1^{-1}(U)F_2^{-1}(U), \tag{3.6.8}$$

$$\text{(b)} \quad E\Bigl(\sum_{i=1}^n X_i\Bigr)^2 \le E\Bigl(\sum_{i=1}^n F_i^{-1}(U)\Bigr)^2, \tag{3.6.9}$$

where U is uniformly distributed on (0, 1). So

$$\mathrm{Var}(X_1 + X_2) \ge \mathrm{Var}\bigl(F_1^{-1}(U) + F_2^{-1}(1 - U)\bigr). \tag{3.6.10}$$
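Inequality (3.6.10) can be illustrated with two standard exponential marginals, for which $F^{-1}(u) = -\log(1 - u)$. The sketch below is a deterministic toy computation on a midpoint grid rather than a simulation; the grid size and the helper names are assumptions. It compares the variance of the sum under the antithetic pair $(F^{-1}(U), F^{-1}(1 - U))$, under independence, and under the common-quantile pair.

```python
import math

def exp_quantile(u):
    """Quantile function of the standard exponential distribution."""
    return -math.log(1.0 - u)

def variance(values):
    """Population variance of an equally weighted list of values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

N = 20000
us = [(k + 0.5) / N for k in range(N)]
x = [exp_quantile(u) for u in us]

# Antithetic pair: X1 = F^{-1}(U), X2 = F^{-1}(1 - U).
antithetic = variance([exp_quantile(u) + exp_quantile(1.0 - u) for u in us])

# Independent pair: the variance of the sum is the sum of the variances.
independent = 2.0 * variance(x)

# Common-quantile pair: X1 = X2 = F^{-1}(U).
comonotone = variance([2.0 * v for v in x])

# Value of the antithetic variance for exact exponential marginals.
exact_antithetic = 4.0 - math.pi ** 2 / 3.0
```

The exact value uses $\int_0^1 \log u\,\log(1 - u)\,du = 2 - \pi^2/6$; the grid values differ from it slightly because the extreme tails are truncated.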

This inequality is applied in Monte Carlo simualation and is known as the method of antithetic variates. For general n only some partial results and estimates are known. Obviously, a minimum is obtained in the case n n where one can construct r.v.s such that i=1 Xi = c = i=1 EXi . This is characterized by the following result. Theorem 3.6.3 There exist r.v.s X1 , . . . , Xn with d

Xi = Pi

n

and

Xi = c

(3.6.11)

i=1

if and only if n i=1

fi dPi

≤

sup

n i=1

fi (xi );

n

xi = c

(3.6.12)

i=1

for all continuous bounded functions $f_i$ on $\mathbb{R}^1$.

Proof: Obviously, (3.6.11) implies (3.6.12). For the converse we apply the following theorem, due to Strassen (1965): Let Λ be a convex, weakly closed subset of the set of probability measures on a product S × T of Polish spaces. Then there exists a probability measure λ ∈ Λ with marginals µ and ν if and only if
$$\int f \, d\mu + \int g \, d\nu \le \sup\Big\{\int (f \circ \pi_S + g \circ \pi_T) \, d\gamma;\ \gamma \in \Lambda\Big\}$$
with the projections $\pi_S$, $\pi_T$ on S × T.

Using a variant of this result with n factors and choosing $\Lambda := \big\{P \in M^1(\mathbb{R}^n, \mathcal{B}^n);\ P\big(\sum x_i = c\big) = 1\big\}$, we have
$$\sup\Big\{\sum_{i=1}^n \int f_i \circ \pi_i \, dP;\ P \in \Lambda\Big\} = \sup\Big\{\sum_{i=1}^n f_i(x_i);\ \sum_{i=1}^n x_i = c\Big\},$$
where $\pi_i$ is the projection on the ith coordinate. □
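Returning to the antithetic bound (3.6.10), the following Python sketch compares the variance of $X_1 + X_2$ under three couplings of the same marginals. It is only an illustration under stated assumptions: the marginals are taken to be standard exponential (so that $F^{-1}(u) = -\log(1-u)$), U runs over a deterministic midpoint grid instead of random draws, and the function names are hypothetical.

```python
import math
import random
import statistics

def exp_quantile(u):
    """Quantile function F^{-1}(u) of the standard exponential law (assumed marginal)."""
    return -math.log(1.0 - u)

def sum_variance(pairs):
    """Population variance of X1 + X2 over a list of (x1, x2) pairs."""
    return statistics.pvariance([x1 + x2 for x1, x2 in pairs])

def compare_couplings(n_points=20000, seed=0):
    # Midpoint grid on (0, 1): a deterministic stand-in for a uniform U.
    us = [(k + 0.5) / n_points for k in range(n_points)]
    # A shuffled copy of the grid plays the role of an independent second uniform.
    ws = us[:]
    random.Random(seed).shuffle(ws)
    antithetic = [(exp_quantile(u), exp_quantile(1.0 - u)) for u in us]
    independent = [(exp_quantile(u), exp_quantile(w)) for u, w in zip(us, ws)]
    comonotone = [(exp_quantile(u), exp_quantile(u)) for u in us]
    return (sum_variance(antithetic),
            sum_variance(independent),
            sum_variance(comonotone))

var_anti, var_indep, var_como = compare_couplings()
```

Under these assumptions the antithetic coupling should give the smallest of the three variances and the comonotone coupling the largest, in line with (3.6.8) and (3.6.10).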

Example 3.6.4 Consider the case $P_i = U(0, 1)$, $1 \le i \le n$. For n = 2 define $X_1 := U$, $X_2 := 1 - U$. Then $X_1 + X_2 = 1$. For n = 3 define $X_1 := U$, $X_2 := U + \frac12 1_{[0, \frac12]}(U) - \frac12 1_{(\frac12, 1)}(U)$, and $X_3 := -2U + 1_{[0, \frac12]}(U) + 2 \cdot 1_{(\frac12, 1)}(U)$. Then $X_1 + X_2 + X_3 = \frac32$. For general n we use a combination of the cases n = 2, 3.

As a consequence of Example 3.6.4 and Theorem 3.6.3 we have the following bound.

Corollary 3.6.5 Let $f : [0, 1] \to \mathbb{R}^1$, $f \in L^1(\lambda^1)$, where $\lambda^1$ denotes Lebesgue measure. Then
$$\int_0^1 f \, d\lambda^1 \le \sup\Big\{\frac1n \sum_{i=1}^n f(x_i);\ x_i \in [0, 1],\ \sum_{i=1}^n x_i = \frac n2\Big\}.$$
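The n = 3 construction of Example 3.6.4 can be checked numerically. The sketch below is an illustration only: it evaluates the three variables on a midpoint grid of u-values (an assumption standing in for a uniform U), and the helper names are hypothetical.

```python
def coupled_uniforms(u):
    """Return (X1, X2, X3) of Example 3.6.4 for a value u in (0, 1)."""
    low = 1.0 if u <= 0.5 else 0.0   # indicator of [0, 1/2]
    high = 1.0 - low                 # indicator of (1/2, 1)
    x1 = u
    x2 = u + 0.5 * low - 0.5 * high
    x3 = -2.0 * u + low + 2.0 * high
    return x1, x2, x3

def uniform_cdf_gap(values):
    """Kolmogorov distance between the empirical d.f. of values and the U(0,1) d.f."""
    xs = sorted(values)
    n = len(xs)
    return max(max(abs((k + 1) / n - x), abs(k / n - x)) for k, x in enumerate(xs))

grid = [(k + 0.5) / 1000 for k in range(1000)]
triples = [coupled_uniforms(u) for u in grid]
sums = [x1 + x2 + x3 for x1, x2, x3 in triples]
gaps = [uniform_cdf_gap([t[i] for t in triples]) for i in range(3)]
```

Every entry of `sums` should equal 3/2 up to rounding, and the three entries of `gaps` should be small, reflecting that each coordinate is again uniformly distributed.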

Example 3.6.6 (a) Let $P_i$ be uniform on {1, . . . , n}. Then similarly to Example 3.6.4 one can construct $X_i$, $1 \le i \le n$, with $\sum_{i=1}^n X_i \in \{a, a + 1\}$ that minimize the variance of the sum.

(b) If $P_i = B(1, \vartheta)$, $\frac kn \le \vartheta \le \frac{k+1}{n}$, one can construct $X_i \stackrel{d}{=} P_i$ with $\sum_{i=1}^n X_i \in \{k, k + 1\}$ that minimize the variance of the sum. The minimal variance is given by the cyclic function
$$v_k(\vartheta) = a(k, \vartheta)(1 - a(k, \vartheta)), \qquad a(k, \vartheta) = k\vartheta \ (\mathrm{mod}\ 1). \tag{3.6.13}$$

Random variables $X_i \stackrel{d}{=} P_i$, $1 \le i \le n$, are called maximally dependent if for all $Y_i \stackrel{d}{=} P_i$, $1 \le i \le n$,
$$P^{\max_{1 \le i \le n} Y_i} \le_{st} P^{\max_{1 \le i \le n} X_i}, \tag{3.6.14}$$
where $\le_{st}$ is the stochastic ordering on $\mathbb{R}^1$. From the sharpness of the Hoeffding–Fréchet bounds in Theorem 3.6.2,
$$P\Big(\max_{1 \le i \le n} Y_i \le t\Big) \ge P\Big(\max_{1 \le i \le n} X_i \le t\Big) = \Big(\sum_{i=1}^n F_{P_i}(t) - (n - 1)\Big)_+. \tag{3.6.15}$$

Lai and Robbins have constructed a sequence of r.v.s attaining this bound. One can give a construction in a sequential way. For independent R(0, 1)-distributed random variables $U_1, \dots, U_n$ define inductively $V_1, \dots, V_n$ by $V_1 := F_1^{-1}(U_1)$, $V_2 := F_2^{-1}(1 - U_1)$. Let $H_\ell$ denote the d.f. of $\max(V_1, \dots, V_\ell)$. Then define
$$V_{\ell+1} = F_{\ell+1}^{-1}\Big(1 - H_\ell\Big(\max_{i \le \ell} V_i,\ U_{\ell+1}\Big)\Big), \tag{3.6.16}$$
where $H_\ell(x, \alpha) = P\big(\max_{i \le \ell} V_i < x\big) + \alpha P\big(\max_{i \le \ell} V_i = x\big)$. Then it is easy to check by induction that $V_i \stackrel{d}{=} P_i$ and
$$P\Big(\max_{i \le n} V_i \le t\Big) = \Big(\sum_{i=1}^n F_i(t) - (n - 1)\Big)_+; \tag{3.6.17}$$

i.e., $V_1, \dots, V_n$ are maximally dependent. A simple duality argument for (3.6.15) is the following. Obviously, for any real α,
$$\max_{i \le n} X_i \le \alpha + \sum_{i=1}^n (X_i - \alpha)_+. \tag{3.6.18}$$
Therefore, if it is possible to construct r.v.s $X_i \stackrel{d}{=} P_i$ such that $(\{X_i > \alpha\})$ are disjoint for some α, then equality in (3.6.18) holds. Moreover, the $(X_i)$ admit a stochastically largest maximum, and
$$E \max_{i \le n} X_i = \alpha + \sum_{i=1}^n E(X_i - \alpha)_+. \tag{3.6.19}$$
The construction above has the property of disjointness of $(\{X_i > \alpha\})$. In the case $P_1 = \cdots = P_n$ choose $\alpha = F^{-1}(1 - \frac1n)$, where $F = F_{P_i}$ is the distribution function of $P_i$. This type of argument has been extended by Rychlik (1992) to bound the distribution function and expectation of functions of order statistics. Let $X_{m:n}$ resp. $F_{m:n}$ denote the mth smallest order statistic of a sample $(X_1, \dots, X_n)$ resp. its distribution function.
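The duality argument around (3.6.18) can be illustrated for identical U(0, 1) marginals. The sketch below does not implement the sequential construction (3.6.16); as a simpler stand-in (an assumption of this example) it uses the cyclic shifts $X_i = (U + i/n) \bmod 1$, which are uniform and have disjoint events $\{X_i > \alpha\}$ for $\alpha = 1 - 1/n$. All names are hypothetical.

```python
def shifted_uniforms(u, n):
    """Rotate u by multiples of 1/n; each coordinate is U(0,1) whenever u is."""
    return [(u + i / n) % 1.0 for i in range(n)]

def upper_bound(xs, alpha):
    """Right-hand side of (3.6.18): alpha plus the sum of positive parts (x_i - alpha)_+."""
    return alpha + sum(max(x - alpha, 0.0) for x in xs)

def max_cdf_bound(t, n):
    """Right-hand side of (3.6.15) for n uniform marginals and t in [0, 1]."""
    return max(n * t - (n - 1), 0.0)

n = 4
alpha = 1.0 - 1.0 / n
grid = [(k + 0.5) / 10000 for k in range(10000)]   # midpoint grid standing in for U
samples = [shifted_uniforms(u, n) for u in grid]
maxima = [max(xs) for xs in samples]
exceed_counts = [sum(1 for x in xs if x > alpha) for xs in samples]
empirical_cdf_at_09 = sum(1 for m in maxima if m <= 0.9) / len(maxima)
```

For this coupling exactly one coordinate exceeds α at each grid point, so (3.6.18) holds with equality, and the empirical d.f. of the maximum should agree with the bound (3.6.15).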



Theorem 3.6.7 Let $X_1, \dots, X_n$ be random variables with distributions $P_1, \dots, P_n$ and distribution functions $F_1, \dots, F_n$.

(a) $$F_{m:n} \le \min\Big\{\frac1m \sum_{i=1}^n F_i,\ 1\Big\}. \tag{3.6.20}$$

(b) If $F_1 = \cdots = F_n = F$, then
$$\Big(\frac{nF - m + 1}{n - m + 1}\Big)_+ \le F_{m:n} \le \min\Big\{\frac nm F,\ 1\Big\}. \tag{3.6.21}$$

(c) If $F_1 = \cdots = F_n = F$, then
$$E \sum_{i=1}^n c_i X_{i:n} = \int_0^1 F^{-1}(x) \, dC(x), \tag{3.6.22}$$
where C is the greatest convex function such that $C(0) = 0$ and $C\big(\frac jn\big) \le \sum_{i=1}^j c_i$.

Proof: (a) From $\sum_{i=1}^n F_{i:n} = \sum_{i=1}^n F_i$, the following identity holds:
$$m F_{m:n}(x) = \sum_{i=1}^n F_i(x) - \sum_{i=1}^{m-1} P(X_{i:n} \le x < X_{m:n}) - \sum_{i=m+1}^n F_{i:n}(x). \tag{3.6.23}$$
Therefore, $F_{m:n} \le \frac1m \sum_{i=1}^n F_i(x)$.

(b) follows from (3.6.23).

(c) First, we show that $F_{1:n}, \dots, F_{n:n}$ are the d.f.s of order statistics $X_{1:n}, \dots, X_{n:n}$ if and only if the conditions $\sum_{i=1}^n F_{i:n} = nF$ and $F_{i:n} \ge F_{i+1:n}$, $1 \le i \le n$, hold. These conditions are obviously necessary. On the other hand, define $X_j = F_{i:n}^{-1}(U)$ with probability $\frac1n$, $i, j \le n$, where U is uniform on (0, 1). Then obviously $X_i \stackrel{d}{=} F$, and therefore,
$$E \sum_{i=1}^n c_i X_{i:n} = \int x \, d\Big(\sum c_i F_{i:n}(x)\Big) = \int_0^1 F^{-1}(u) \, d\Big(\sum c_i G_i(u)\Big),$$
where
$$\sum G_i(u) = nu \quad \text{and} \quad 1 \ge G_i(x) \ge G_{i+1}(x) \ge 0. \tag{3.6.24}$$
Replace the maximization of the expectation by pointwise minimization of $\sum c_i G_i(u)$, $u \in (0, 1)$, with the constraints (3.6.24). These are linear programming problems, whose solutions $G_i^*$ satisfy $\sum c_i G_i^*(u) = C(u)$. Since $u \mapsto G_i^*(u)$ are continuous d.f.s on [0, 1], we can use $F_{i:n} = G_i^* \circ F$ in the construction above in order to obtain equality in (3.6.22). □



3.7 Bounds for the Total Transportation Cost

In this section we consider the mass transportation problem on $\mathbb{R}^n$ with cost function
$$c_p(x, y) = \|x - y\|_p = \Big(\sum_{i=1}^n |x_i - y_i|^p\Big)^{1/p}, \qquad 1 \le p < \infty. \tag{3.7.1}$$
Let $F_1$, $F_2$ be n-dimensional distribution functions and for $H := F_1 - F_2$, let $\mathcal{F}_H$ denote the class of all 2n-dimensional (joint) distribution functions F with n-dimensional marginals $F_1$, $F_2$ such that $F_1 - F_2 = H$. Set
$$A_p(H) := \inf\Big\{\int_{\mathbb{R}^{2n}} \|x - y\|_p \, dF(x, y);\ F \in \mathcal{F}_H\Big\}, \tag{3.7.2}$$
the value of the optimal multivariate transshipment costs. Let 1/q + 1/p = 1, and assume that $F_1$, $F_2$ have densities $f_1$, $f_2$ with respect to the Lebesgue measure; set $h := f_1 - f_2$.

Theorem 3.7.1 (Bounds to the total cost) (a) For the value of the optimal transportation costs we have the upper bound
$$A_p(H) \le B_p(H) := \int_{\mathbb{R}^n} \|y\|_p \, |J_H(y)| \, dy, \tag{3.7.3}$$
where
$$J_H(y) := \int_0^1 t^{-(n+1)} h(y/t) \, dt.$$
(b) If there exists a continuous function $g : \mathbb{R}^n \to \mathbb{R}^1$, almost everywhere differentiable and satisfying for p = 1
$$\nabla g(y) = \big(\mathrm{sgn}(y_i J_H(y))\big) \quad \text{a.e.}, \tag{3.7.4}$$
respectively for p > 1,
$$\nabla g(y) = \Big(\mathrm{sgn}(y_i J_H(y)) \Big(\frac{|y_i|}{\|y\|_q}\Big)^{q/p}\Big),$$
then equality in (3.7.3) holds.

Proof: (a) From the duality theorem (cf. section 4.2.2),
$$A_p(H) = \sup\Big\{\Big|\int_{\mathbb{R}^n} f \, dH\Big|;\ |f(x) - f(y)| \le \|x - y\|_p\Big\}. \tag{3.7.5}$$


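From the Rademacher theorem we infer that any Lipschitz function f is almost everywhere differentiable, and as $\sup\{\langle \nabla f(y), a \rangle;\ \|a\|_p = 1\} = \|\nabla f(y)\|_q$, we obtain from the Lipschitz condition that $\|\nabla f(y)\|_q \le 1$ almost everywhere. Using a Taylor expansion
$$f(y) = f(0) + \int_0^1 \langle \nabla f(ty), y \rangle \, dt,$$
we conclude that
$$\begin{aligned}
A_p(H) &\le \sup\Big\{\Big|\int_{\mathbb{R}^n} \int_0^1 \langle \nabla f(ty), y \rangle \, dt \, h(y) \, dy\Big|;\ \|\nabla f(y)\|_q \le 1 \text{ a.e.}\Big\} \\
&= \sup\Big\{\Big|\int_{\mathbb{R}^n} \int_0^1 \langle \nabla f(y), y \rangle \frac{1}{t^{n+1}} h\Big(\frac yt\Big) \, dt \, dy\Big|;\ \|\nabla f(y)\|_q \le 1 \text{ a.e.}\Big\} \\
&\le \sup\Big\{\int_{\mathbb{R}^n} \|y\|_p \, |J_H(y)| \, \|\nabla f(y)\|_q \, dy;\ \|\nabla f(y)\|_q \le 1 \text{ a.e.}\Big\} \\
&\le \int_{\mathbb{R}^n} \|y\|_p \, |J_H(y)| \, dy.
\end{aligned} \tag{3.7.6}$$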

(b) In the inequalities
$$|\langle x, y \rangle| \le \sum |x_i y_i| \le \|x\|_p \|y\|_q, \qquad \|x\|_p = 1,$$
equality is attained for p > 1 if and only if
$$x_i = \mathrm{sgn}\, y_i \, \frac{|y_i|^{q/p}}{\|y\|_q^{q/p}} = y_i \, \frac{|y_i|^{q/p - 1}}{\|y\|_q^{q/p}},$$
while for p = 1 equality holds if and only if $\mathrm{sgn}\, x_i = \mathrm{sgn}\, y_i$. This implies part (b) of the theorem. □

Remark 3.7.2 Condition (3.7.4) is fulfilled in dimension 1, so that the bound (3.7.3) is sharp and
$$A_p(H) = \int_{-\infty}^{+\infty} |F_1(x) - F_2(x)| \, dx = \int_0^1 |F_1^{-1}(u) - F_2^{-1}(u)| \, du; \tag{3.7.7}$$
cf. (3.1.6). A simple sufficient condition for p = 1 for (3.7.4) is given by
$$J_H \ge 0 \quad \text{a.e.}, \tag{3.7.8}$$
which is a stochastic ordering condition. More generally, we can allow a “simple” structure of the set $\{J_H \ge 0\}$.
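The two one-dimensional expressions in (3.7.7) can be compared directly on small empirical distributions. The following Python sketch is an illustration only; it assumes two samples with the same number of equally weighted atoms, and the function names are hypothetical.

```python
def quantile_cost(xs, ys):
    """Integral of |F1^{-1}(u) - F2^{-1}(u)| over (0, 1) for two empirical d.f.s
    with equally many equally weighted atoms."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def cdf_cost(xs, ys):
    """Integral of |F1(x) - F2(x)| dx for the empirical d.f.s of xs and ys."""
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical d.f.s are constant on [left, right).
        f1 = sum(1 for x in xs if x <= left) / len(xs)
        f2 = sum(1 for y in ys if y <= left) / len(ys)
        total += abs(f1 - f2) * (right - left)
    return total

xs = [0.0, 1.0, 3.0, 7.0]
ys = [2.0, 2.5, 4.0, 5.0]
cost_via_quantiles = quantile_cost(xs, ys)
cost_via_cdfs = cdf_cost(xs, ys)
```

For these inputs both expressions evaluate to 1.625, as (3.7.7) predicts.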

4 Duality Theory for Mass Transfer Problems

4.1 Introductory Remarks and Formulation of the Duality Theorems in the Case of a Compact Space

4.1.1 Introductory Remarks

In this chapter a general approach to the mass transfer problem is developed. While in Chapters 2 and 3 general versions of the Monge–Kantorovich problems have been established, we concentrate in this part mainly on the Kantorovich–Rubinstein problem. At first, the duality theory is studied for the compact case and for cost functions satisfying the triangle inequality. The case of general cost functions can be reduced to this case by the “reduction theorem.” An abstract version of the duality is then proved based on the conjugate duality theory of convex analysis. This allows extensions to the noncompact case. Finally, applications to the Monge–Kantorovich type problems are established. In the case of cost functions satisfying the triangle inequality, both types of mass transfer problems coincide. This allows us to transfer characterizations for cost functions satisfying the Kantorovich– Rubinstein duality theorem to the Monge–Kantorovich duality situation. The results of this chapter are due to Levin and Milyutin (1979) and Levin (1984, 1987, 1992, 1995a, 1995b), and Levin and Rachev (1989) with some additional remarks and refinements provided to us by V.L. Levin in (1995a).


Given a compact metric space (S, r) and two finite positive Borel measures on it, σ1 and σ2 with σ1 S = σ2 S, we consider the problem of minimizing the functional
$$\int_{S \times S} r(x, y) \, \mu(d(x, y)) \tag{4.1.1}$$
over all finite positive Borel measures µ on S × S having σ1 and σ2 as their marginals; that is,
$$\mu(B \times S) = \sigma_1 B, \qquad \mu(S \times B) = \sigma_2 B \qquad \text{for every Borel } B \subseteq S. \tag{4.1.2}$$

This is the Monge–Kantorovich problem with given marginals. The problem was first posed and studied by Kantorovich (1942, 1948). (Notice that in Kantorovich (1942) the assumption that r is a metric is omitted; however, this assumption is used substantially in the outlined proof of the optimality criterion.) The mass transfer problem with a given marginal difference was posed and studied by Kantorovich and Rubinstein (1957, 1958) (see also Kantorovich and Akilov (1984)). It consists in minimizing the functional (4.1.1) over all finite positive Borel measures µ satisfying
$$\mu(B \times S) - \mu(S \times B) = \rho B \qquad \text{for every Borel } B \subseteq S, \tag{4.1.3}$$

where ρ = σ1 − σ2. We refer to this problem as the Kantorovich–Rubinstein (mass transshipment) problem. Both forms of the transfer problem with metric cost function are equivalent in the sense that their optimal values are equal, and for the problem with a fixed marginal difference there exists an optimal measure µ satisfying (4.1.2). These problems extend the old “déblai et remblais” problem of Monge (1781) and admit the following mass transfer interpretation. Given an initial σ1 and a required σ2 distribution of some commodity, one has to proceed from the first distribution to the second at the minimal expense, provided that the cost of transferring a unit of the commodity from a point x to a point y equals the distance between the points. For problem (4.1.1), (4.1.3), transshipments in several steps may be used, but in problem (4.1.1), (4.1.2), they are not allowed.

We next consider the optimal value of the (mass) transfer problem as a functional of σ1, σ2:
$$\begin{aligned}
d_{KR}(\sigma_1, \sigma_2) &:= \inf\Big\{\int_{S \times S} r(x, y) \, \mu(d(x, y));\ \mu \ge 0,\ \pi_1\mu - \pi_2\mu = \sigma_1 - \sigma_2\Big\} \\
&= \inf\Big\{\int_{S \times S} r(x, y) \, \mu(d(x, y));\ \mu \ge 0,\ \pi_1\mu = \sigma_1,\ \pi_2\mu = \sigma_2\Big\}.
\end{aligned} \tag{4.1.4}$$
Here, π1 and π2 denote the projection of measures on S × S to their marginals on S:
$$\pi_1\mu(B) := \mu(B \times S), \qquad \pi_2\mu(B) := \mu(S \times B) \qquad \text{for all Borel sets } B \subseteq S.$$

As was mentioned above, the infimum in (4.1.4) is actually attained at some measure µ ≥ 0 satisfying (4.1.2). Kantorovich and Rubinstein (1958) showed that $d_{KR}$ is a metric on the set of probability measures on S, known as the Kantorovich–Rubinstein distance. It metrizes the weak* topology (topology of weak convergence of measures on a compact metric space S). A fundamental result for the transfer problem is the optimality criterion of Kantorovich. It asserts that a positive measure µ on S × S satisfying (4.1.3) is optimal if and only if there exists a function $u_0$ on S such that
$$u_0(x) - u_0(y) = r(x, y) \qquad \text{for } (x, y) \in \mathrm{supp}\, \mu$$
and $u_0$ is r-Lipschitz continuous with Lipschitz constant one. This optimality criterion may be reformulated as a duality theorem stating conditions implying equality of the optimal values for two linear extremal problems: the original Kantorovich–Rubinstein problem (4.1.1), (4.1.3), and the dual linear extremal problem. The dual problem is to maximize the functional
$$\int_S u(x) \, \rho(dx)$$
over all functions u ∈ Lip(r, S), where Lip(r, S) stands for the class of r-Lipschitz functions u : S → IR; that is,
$$\mathrm{Lip}(r, S) := \{u;\ |u(x) - u(y)| \le r(x, y) \quad \forall x, y \in S\}.$$
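The optimality criterion and the duality between the primal and dual problems can be seen on a small finite example. The sketch below is illustrative only and rests on several assumptions not made in the text: S is a finite set of points on the real line with r(x, y) = |x − y|, the primal plan is the monotone ("north-west corner") coupling, and the dual function is built from the sign of the cumulative sums of σ1 − σ2. All names are hypothetical.

```python
def northwest_coupling(s1, s2):
    """Monotone coupling of two weight vectors with equal total mass."""
    a, b = list(s1), list(s2)
    plan = {}
    i = j = 0
    while i < len(a) and j < len(b):
        m = min(a[i], b[j])
        if m > 0:
            plan[(i, j)] = plan.get((i, j), 0.0) + m
        a[i] -= m
        b[j] -= m
        if a[i] == 0:
            i += 1      # supply at point i is exhausted
        else:
            j += 1      # demand at point j is filled
    return plan

def transport_cost(points, plan):
    """Primal objective: sum of mass times distance over the support of the plan."""
    return sum(mass * abs(points[i] - points[j]) for (i, j), mass in plan.items())

def dual_potential(points, s1, s2):
    """A 1-Lipschitz u whose slope between neighbours is -sign(H), where H is
    the running sum of s1 - s2."""
    u = [0.0]
    h = 0.0
    for k in range(len(points) - 1):
        h += s1[k] - s2[k]
        sign = (h > 0) - (h < 0)
        u.append(u[-1] - sign * (points[k + 1] - points[k]))
    return u

def dual_value(u, s1, s2):
    """Dual objective: integral of u with respect to s1 - s2."""
    return sum(uk * (a - b) for uk, a, b in zip(u, s1, s2))

points = [0.0, 1.0, 3.0, 4.0]
sigma1 = [0.5, 0.25, 0.25, 0.0]
sigma2 = [0.0, 0.25, 0.25, 0.5]
plan = northwest_coupling(sigma1, sigma2)
u = dual_potential(points, sigma1, sigma2)
primal = transport_cost(points, plan)
dual = dual_value(u, sigma1, sigma2)
```

In this example the primal and dual values coincide, and u(x) − u(y) = |x − y| on every pair carrying mass, which is the form of the criterion stated above.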

In this chapter we study the following generalizations of the (mass) transfer problems. Given a topological space S, a cost function c : S × S → IR ∪ {+∞}, and a Radon measure ρ on S with ρS = 0, the mass transfer problem with a given marginal difference (general Kantorovich–Rubinstein problem) is to find the optimal value
$$A(c, \rho) := \inf\Big\{\int_{S \times S} c(x, y) \, \mu(d(x, y));\ \mu \ge 0,\ \pi_1\mu - \pi_2\mu = \rho\Big\}. \tag{4.1.5}$$

This, in fact, can be viewed as an infinite-dimensional linear program. The dual problem is to determine the optimal value
$$B(c, \rho) := \sup\Big\{\int_S u(x) \, \rho(dx);\ u \in \mathrm{Lip}(c, S)\Big\}. \tag{4.1.6}$$
Here
$$\mathrm{Lip}(c, S) = \mathrm{Lip}(c, S; C^b(S)) := \{u \in C^b(S);\ u(x) - u(y) \le c(x, y) \quad \forall x, y \in S\}, \tag{4.1.7}$$
and $C^b(S)$ denotes the Banach space of bounded continuous real-valued functions on S equipped with the uniform norm
$$\|u\|_{C^b(S)} := \sup_{x \in S} |u(x)|.$$

Similarly, the general Monge–Kantorovich mass transfer problem with given marginals σ1 and σ2 and its dual problem call for determining the optimal values
$$C(c; \sigma_1, \sigma_2) := \inf\Big\{\int_{S \times S} c(x, y) \, \mu(d(x, y));\ \mu \ge 0,\ \pi_1\mu = \sigma_1,\ \pi_2\mu = \sigma_2\Big\} \tag{4.1.8}$$
and
$$D(c; \sigma_1, \sigma_2) := \sup\Big\{\int_S u(x) \, \sigma_1(dx) - \int_S v(x) \, \sigma_2(dx);\ u(x) - v(y) \le c(x, y) \text{ for all } x, y \in S\Big\} \tag{4.1.9}$$
respectively. This problem was considered in Chapters 2 and 3. Later we shall give more precise statements of these problems by specifying the classes from which the space S, the cost function c, and the corresponding measures ρ, σ1, σ2 on S and µ on S × S are chosen. The inequalities
$$A(c, \rho) \ge B(c, \rho) \qquad \text{for all } \rho,\ \rho S = 0$$
and
$$C(c; \sigma_1, \sigma_2) \ge D(c; \sigma_1, \sigma_2) \qquad \text{for all } \sigma_1, \sigma_2 \text{ with } \sigma_1 S = \sigma_2 S$$
follow from the definition as does the inequality
$$A(c, \rho) \le C(c; \sigma_1, \sigma_2) \qquad \text{for } \rho = \sigma_1 - \sigma_2.$$
We first characterize the cost functions c that enjoy the duality relation A(c, ρ) = B(c, ρ) whenever ρS = 0. Our next step will be to characterize cost functions c admitting the equality C(c; σ1, σ2) = D(c; σ1, σ2) whenever σ1 S = σ2 S. Explicit formulations of the general mass transfer problems, (4.1.5) and (4.1.8), and of the dual problems, (4.1.6) and (4.1.9), together with some duality theorems for arbitrary (not necessarily metrizable) compact spaces and nonnegative continuous cost functions, were given in Levin (1974, 1975b, 1977, 1978a). In this section we restrict ourselves to problem (4.1.5). We shall follow the approach of Levin and Milyutin (1979) in determining a dual solution for (4.1.5), restricting ourselves to the case of a compact space S.

4.1.2 Preliminaries

Before formulating the duality theorems, some definitions and notations will be required. Let S be a compact space. Henceforth B(S) denotes the σ-algebra of Borel sets in S, that is, the σ-algebra generated by the closed subsets of S. B0(S) denotes the σ-algebra of Baire sets in S, that is, the σ-algebra generated by closed $G_\delta$ subsets of S or, equivalently, the σ-algebra generated by null sets of continuous functions on S. (If F = {x ∈ S; u(x) = 0}, where u is a continuous function, then
$$F = \bigcap_{n \in \mathbb{N}} G_n, \tag{4.1.10}$$
where $G_n = \{x \in S;\ |u(x)| < \frac1n\}$ for all n ∈ IN. Hence, F is a closed $G_\delta$. On the other hand, if F is a closed $G_\delta$, then a representation (4.1.10) exists, where $G_1 \supset G_2 \supset \cdots \supset G_n \supset \cdots$, and all $G_n$ are open. For every n ∈ IN we can find a continuous function $u_n : S \to [0, 1]$ such that $u_n|_F = 0$ and $u_n|_{S \setminus G_n} = 1$. It follows that F = {x ∈ S; u(x) = 0}, where u is a continuous function given by
$$u(x) = \sum_{n=1}^{\infty} 2^{-n} u_n(x), \qquad x \in S.$$
Thus, F is a closed $G_\delta$ if and only if F is the null set of a continuous function.) Recall that B(S) = B0(S) for a metrizable S. However, in the general case the inclusion B0(S) ⊂ B(S) is proper. The vector space C(S) of real-valued continuous functions on a compact space S equipped with the uniform norm,
$$\|u\| := \max_{x \in S} |u(x)|,$$

is a Banach space. The dual Banach space C(S)∗ is the space of Radon measures on S equipped with the norm ||σ|| := |σ|(S) = σ+(S) + σ−(S), where nonnegative Radon measures σ+ and σ− are the elements of the Jordan decomposition for σ. Here, σ = σ+ − σ−, and |σ| := σ+ + σ− is the variation of σ. The duality between C(S) and C(S)∗ is defined by
$$\langle u, \sigma \rangle = \int_S u(x) \, \sigma(dx), \qquad u \in C(S),\ \sigma \in C(S)^*.$$
The notation $\int u \, d\sigma$ will be used for $\int_S u(x) \, \sigma(dx)$. Recall that a (finite) Borel measure on a compact space S is any countably additive real-valued set function on B(S). A nonnegative Borel measure σ on S is said to be a Radon measure if it satisfies the inner regularity condition:
$$\sigma(B) = \sup\{\sigma(K);\ K \subseteq B,\ K \text{ is compact}\} \qquad \text{for all } B \in B(S).$$
A signed Radon measure on S is defined as a set function σ : B(S) → IR that can be represented as a difference of two nonnegative Radon measures. Both spaces C(S) and C(S)∗ are Banach lattices relative to the natural orderings, generated by the positive cones
$$C(S)_+ := \{u \in C(S);\ u(x) \ge 0 \text{ for all } x \in S\}$$
and
$$C(S)^*_+ := \{\sigma \in C(S)^*;\ \sigma B \ge 0 \text{ for all } B \in B(S)\},$$

respectively. Further, the elements of the Jordan decomposition for a measure σ ∈ C(S)∗ are given by σ+ = σ ∨ 0 and σ− = (−σ) ∨ 0, while the variation of σ, |σ|, is simply the modulus of the element σ in the Banach lattice C(S)∗. (For more details about vector lattices of measures see Kantorovich and Akilov (1984) and Levin (1985a).) Notice that any Borel measure on a metrizable S satisfies the inner regularity condition and so is a Radon measure. However, this fails for nonmetrizable compact spaces.

Given a measure $\sigma \in C(S)^*_+$, a set M in S is called σ-measurable if it belongs to the σ-completion $B(S)_\sigma$ of B(S), which is defined as follows: A set A ⊆ S is an element of $B(S)_\sigma$ if and only if there exist two sets B and B0 in B(S) such that σB0 = 0 and (A \ B) ∪ (B \ A) ⊆ B0. A set A ⊆ S is called universally measurable if it belongs to $B(S)_\sigma$ for all $\sigma \in C(S)^*_+$. The σ-algebra of universally measurable sets is denoted by U(S).

Some important classes of universally measurable sets are obtained by means of the A-operation. Given a nonempty class M of subsets in an abstract set S and a countable family of sets $M(n_1, \dots, n_k) \in M$ numbered by finite sequences of positive integers, the result of the A-operation over these sets is defined as
$$A = \bigcup_{(n_k) \in \mathbb{N}^{\mathbb{N}}} \ \bigcap_{k \in \mathbb{N}} M(n_1, \dots, n_k).$$

Any set A obtained in such a way is called M-analytic, and the class of M-analytic sets is denoted by AM. Clearly, AM ⊇ M for every M. If S is a Polish space (= a separable metrizable space that can be metrized by a complete metric), the B(S)-analytic sets are just the classical analytic (or Suslin in other terminology) sets; they are universally measurable and have other desirable properties; see Lusin (1930) or Kuratowski (1966). In particular, B(S) ⊆ AB(S) ⊆ U(S) = AU(S).

(4.1.11)

Similarly, if S is any (not necessarily metrizable) compact space, then B0 (S) ⊆ AB0 (S) ⊆ U(S) = AU(S).

(4.1.12)

A function ϕ : S → IR ∪ {+∞} is called Borel (resp. Baire, σ-measurable, universally measurable, M-analytic) if its sublevel sets, {x ∈ S; ϕ(x) ≤ α},

α ∈ IR,

belong to B(S) (resp. B0 (S), B(S)σ , U(S), AM ).



Given a compact space S and a universally measurable function ϕ : S → IR ∪ {+∞} bounded below, the integral
$$\int_S \varphi(x) \, \sigma(dx)$$
is well-defined (finite or equal to +∞) whenever $\sigma \in C(S)^*_+$. Observe that for every $\mu \in C(S \times S)^*_+$ its marginals π1µ and π2µ are Radon measures on S acting on every u ∈ C(S) by the formulas
$$\langle u, \pi_1\mu \rangle = \int_{S \times S} u(x) \, \mu(d(x, y)), \qquad \langle u, \pi_2\mu \rangle = \int_{S \times S} u(y) \, \mu(d(x, y)).$$
Set $C(S)^*_0 := \{\rho \in C(S)^*;\ \langle 1, \rho \rangle = 0\}$, where 1(x) := 1 for all x ∈ S.

4.1.3 Formulations of the Duality Theorems in Mass Settings

We are now in a position to give precise statements of the duality theorems for the case of a compact space S.(1) Given a cost function c : S × S → IR ∪ {+∞} and a measure $\rho \in C(S)^*_0$, the problems (4.1.5) and (4.1.6) consist in finding the optimal values
$$A(c, \rho) := \inf\Big\{\int_{S \times S} c(x, y) \, \mu(d(x, y));\ \mu \in C(S \times S)^*_+,\ \pi_1\mu - \pi_2\mu = \rho\Big\}$$
and
$$B(c, \rho) := \sup\{\langle u, \rho \rangle;\ u \in \mathrm{Lip}(c, S; C(S))\}$$
respectively. Here,
$$\mathrm{Lip}(c, S; C(S)) := \{u \in C(S);\ u(x) - u(y) \le c(x, y) \text{ for all } x, y \in S\}.$$

(1) Proofs of the results in this section will be given in Sections 4.4 and 4.5.


Observe that A(c, ρ) is well-defined for every universally measurable cost function c bounded below and for every $\rho \in C(S)^*_0$, while B(c, ρ) is well-defined for every cost function c for which Lip(c, S; C(S)) is nonempty and for every ρ ∈ C(S)∗. We shall use the conventions A(c, ρ) = +∞ if $\rho \notin C(S)^*_0$, and B(c, ρ) = −∞ if Lip(c, S; C(S)) is empty. Note that the functions A(c, ρ) and B(c, ρ) are sublinear in ρ and superlinear in c. Also note the obvious inequality A(c, ρ) ≥ B(c, ρ) for all universally measurable functions c bounded below and all ρ ∈ C(S)∗.

Let S be an arbitrary compact space and $U^b(S)$ the Banach lattice of bounded universally measurable functions on S. Also, set
$$\mathrm{Lip}(c, S; U^b(S)) := \{u \in U^b(S);\ u(x) - u(y) \le c(x, y) \text{ for all } x, y \in S\}.$$

Theorem 4.1.1 (Duality theorem on arbitrary compact space with cost function satisfying the triangle inequality) Suppose that S is an arbitrary compact space, and c : S × S → IR ∪ {+∞} is universally measurable and satisfies the triangle inequality
$$c(x, y) + c(y, z) \ge c(x, z) \qquad \text{for all } x, y, z \in S.$$
Also suppose that $\mathrm{Lip}(c, S; U^b(S))$ is nonempty. Then the duality relation
$$A(c, \rho) = B(c, \rho) \qquad \text{for all } \rho \in C(S)^*_0 \tag{4.1.13}$$
holds if and only if the function
$$\bar c(x, y) = \begin{cases} c(x, y) & \text{for } x \ne y, \\ 0 & \text{for } x = y \end{cases} \tag{4.1.14}$$
is lower semicontinuous on S × S.

Remark 4.1.2 The hypothesis that $\mathrm{Lip}(c, S; U^b(S))$ is nonempty implies the boundedness of the cost function c from below.

Remark 4.1.3 If c is bounded, then for any x0 ∈ S the function
$$v(x) = \bar c(x, x_0) = \begin{cases} c(x, x_0) & \text{if } x \ne x_0, \\ 0 & \text{if } x = x_0, \end{cases}$$
belongs to $\mathrm{Lip}(c, S; U^b(S))$, which follows easily from the triangle inequality for c. Also, $\mathrm{Lip}(c, S; U^b(S))$ is nonempty when c is nonnegative, for in this case constant functions belong to $\mathrm{Lip}(c, S) \subset \mathrm{Lip}(c, S; U^b(S))$. Other sufficient conditions for nonemptiness of $\mathrm{Lip}(c, S; U^b(S))$ (and of a more general set, namely Lip(c, S; X) with X being a Banach lattice of bounded functions) will be given in Section 4.2.



Suppose next that c does not satisfy the triangle inequality. Then we shall need the notion of reduced cost function, which was introduced in Levin (1977).

Definition 4.1.4 Let S be a nonempty set. Given a cost function c : S × S → IR ∪ {+∞}, the reduced cost function associated with it is defined on S × S by the formula
$$c_*(x, y) := \inf_{n \in \mathbb{N}} c^n(x, y), \tag{4.1.15}$$
where
$$c^n(x, y) := \min\{c(x, y), c_1(x, y), \dots, c_n(x, y)\}, \tag{4.1.16}$$
with $c_k$ defined as follows:
$$c_k(x, y) := \inf\Big\{\sum_{j=1}^{k+1} c(z_{j-1}, z_j);\ z_0 = x,\ z_{k+1} = y,\ z_1, \dots, z_k \in S\Big\}, \qquad k \in \mathbb{N}. \tag{4.1.17}$$
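On a finite set S the reduced cost function of Definition 4.1.4 can be computed explicitly. The sketch below is an illustration under assumptions that are not part of the definition: S is finite, the cost is given as a matrix with nonnegative entries (so that no chain has negative total cost), and the infimum over chains is evaluated by the Floyd–Warshall min-plus closure. The names are hypothetical.

```python
INF = float("inf")

def reduced_cost(c):
    """Cheapest chain cost x = z0, z1, ..., z_{k+1} = y over all k >= 0,
    for a finite nonnegative cost matrix c (min-plus closure)."""
    n = len(c)
    d = [row[:] for row in c]          # chains with no intermediate point
    for k in range(n):                 # allow point k as an intermediate stop
        for i in range(n):
            for j in range(n):
                via = d[i][k] + d[k][j]
                if via < d[i][j]:
                    d[i][j] = via
    return d

def satisfies_triangle(c):
    """Check c(x, y) <= c(x, z) + c(z, y) for all x, y, z."""
    n = len(c)
    return all(c[i][j] <= c[i][k] + c[k][j]
               for i in range(n) for j in range(n) for k in range(n))

cost = [
    [0.0, 1.0, 5.0],
    [INF, 0.0, 2.0],
    [1.0, INF, 0.0],
]
reduced = reduced_cost(cost)
```

Here the original cost violates the triangle inequality (going from the first to the third point through the second is cheaper than the direct cost), while the reduced cost lies below the original cost and satisfies the triangle inequality.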

For every N > 0 we set (c ∧ N)(x, y) := min(c(x, y), N) and consider the following limiting condition on c:
$$A(c, \rho) = \lim_{N \to \infty} A(c \wedge N, \rho) \qquad \text{for all } \rho \in C(S)^*_0. \tag{4.1.18}$$
Note that (4.1.18) is obvious when c is bounded from above.

Theorem 4.1.5 (Duality theorem on a metrizable compact space with cost function bounded below) Let S be a metrizable compact space and c : S × S → IR ∪ {+∞} be a function bounded from below with analytic sublevel sets {(x, y) ∈ S × S; c(x, y) ≤ α}, α ∈ IR. Then the following assertions are equivalent:

(a) The duality relation (4.1.13) holds.

(b) Condition (4.1.18) is satisfied, and moreover, one of the following two assertions holds:

(b1) The function
$$\bar c(x, y) = \begin{cases} c_*(x, y) & \text{if } x \ne y, \\ 0 & \text{if } x = y, \end{cases} \tag{4.1.19}$$
is bounded below and lower semicontinuous on S × S (in this case A(c, ρ) = B(c, ρ) > −∞ for all $\rho \in C(S)^*_0$);

(b2) $c_*(x, y)$ is not bounded below (in this case A(c, ρ) = B(c, ρ) = −∞ for all $\rho \in C(S)^*_0$).

The following theorem extends Theorem 4.1.5 to an arbitrary (not necessarily metrizable) compact space.

Theorem 4.1.6 (Duality theorem on arbitrary compact space with general cost function) Let S be a compact space and c : S × S → IR ∪ {+∞} a function bounded below with B0(S × S)-analytic sublevel sets {(x, y) ∈ S × S; c(x, y) ≤ α}, α ∈ IR. Then the statement of Theorem 4.1.5 remains true.

Remark 4.1.7 In the proof of Theorem 4.1.6 given in Levin and Milyutin (1979) the continuum hypothesis was used. Later a similar result for more general topological spaces was proved in Levin (1990) without using the continuum hypothesis. This proof is presented below in Section 4.5.

The following result is an immediate consequence of Theorem 4.1.6.

Corollary 4.1.8 Let S be a compact space and let c : S × S → IR be a bounded function with B0(S × S)-analytic sublevel sets {(x, y); c(x, y) ≤ α}, α ∈ IR (in particular, c may be a bounded Baire measurable function). Then the following assertions are equivalent:

(a) A(c, ρ) = B(c, ρ) > −∞ for all $\rho \in C(S)^*_0$.

(b) The function $\bar c$, given on S × S by (4.1.19), is bounded below and lsc.

4.1.4 A Further Duality Theorem

Here we consider cost functions of a special form. Let S and S′ be arbitrary compact spaces, f a continuous mapping of S′ onto S, and s ∈ C(S′ × S′). We consider the cost function
$$c(x, y) := \min\{s(x', y');\ x', y' \in S',\ f(x') = x,\ f(y') = y\}. \tag{4.1.20}$$
It is easily seen that c is a real-valued lower semicontinuous function on S × S. If S′ = S and f is the identical mapping, then c = s is an arbitrary continuous function on S × S.

Theorem 4.1.9 Let S be a compact space and let c be defined by (4.1.20). Then A(c, ρ) = B(c, ρ) for all $\rho \in C(S)^*_0$.

Corollary 4.1.10 Let S be a compact space and c ∈ C(S × S). Then A(c, ρ) = B(c, ρ) for all $\rho \in C(S)^*_0$.

Remark 4.1.11 Observe that the duality theorem for the classical Kantorovich–Rubinstein problem (4.1.1), (4.1.3) is an immediate consequence of Theorem 4.1.1 and also of Corollary 4.1.10.

The proofs of the duality theorems 4.1.1, 4.1.5, and 4.1.9 together with a detailed discussion are given in Section 4.4. The proofs rely on the duality theory for abstract versions of the mass transfer problem with a cost function satisfying the triangle inequality (Section 4.2), as well as on reduction theorems asserting the validity of the equality $A(c, \rho) = A(c_*, \rho)$ (Section 4.3). As we have already said (see Remark 4.1.7), Theorem 4.1.6 is a particular case of a more general duality theorem, which will be shown in Section 4.5.

4.2 Duality for an Abstract Version of the Mass Transfer Problem with a Cost Function Satisfying the Triangle Inequality

4.2.1 The Statement of the Problem

Let S be a nonempty set and W a closed subspace of the space of all bounded real-valued functions on S × S with the uniform norm
$$\|w\| := \sup_{(x, y) \in S \times S} |w(x, y)|, \qquad w \in W.$$

We shall consider functional spaces W possessing the following properties:

(W1) Constant functions belong to W.

(W2) $w_1 \vee w_2 := \max(w_1, w_2) \in W$ whenever $w_1, w_2 \in W$.

(W3) If w is in W, then the “transposed” function $w^\#$ (here $w^\#(x, y) := w(y, x)$ for all (x, y) ∈ S × S) is also in W.

Suppose X is a closed linear subspace in W that satisfies (W1), (W2) and consists of functions that depend on x only. (We do not require that X consist of all functions from W that depend only on x.) Clearly, W and X are Banach lattices with respect to the natural ordering determined by the cones of nonnegative functions $W_+$ and $X_+$, respectively. Then the dual spaces W∗ and X∗ are also Banach lattices, and since W and X satisfy (W1) and consist of bounded functions, we have
$$\|\sigma\| = \|\,|\sigma|\,\| = \sup\{\langle |w|, |\sigma| \rangle;\ w \in W,\ |w| \le 1_{S \times S}\} = \langle 1_{S \times S}, |\sigma| \rangle,$$
$$\|\rho\| = \|\,|\rho|\,\| = \sup\{\langle |u|, |\rho| \rangle;\ u \in X,\ |u| \le 1_S\} = \langle 1_S, |\rho| \rangle$$

for all σ ∈ W∗ and all ρ ∈ X∗. Here |σ| stands for the modulus of the element σ in the Banach lattice W∗; that is, |σ| = σ+ + σ−, where σ+ = σ ∨ 0, σ− = (−σ) ∨ 0, and finally σ = σ+ − σ− is the Jordan decomposition of σ. Similarly, |ρ| stands for the modulus of ρ in X∗. Recall that in the dual Banach lattice W∗ any set φ that is bounded above (this means that σ ≤ σ0 for all σ ∈ φ and some σ0 ∈ W∗) has a supremum ∨φ ∈ W∗. For every $w \in W_+$,
$$\langle w, \vee\varphi \rangle = \sup\Big\{\sum_{i=1}^n \langle w_i, \sigma_i \rangle;\ w_i \in W_+,\ \sum_{i=1}^n w_i = w,\ \sigma_i \in \varphi,\ i = 1, \dots, n,\ n \in \mathbb{N}\Big\}.$$
Let π1 denote the linear operator W∗ → X∗ that is adjoint to the natural embedding operator X → W, and let π2 denote the linear operator W∗ → X∗ that is adjoint to the transposition operator restricted to X; that is, for every σ ∈ W∗,
$$\langle u, \pi_1\sigma \rangle := \langle u, \sigma \rangle \quad \text{and} \quad \langle u, \pi_2\sigma \rangle := \langle u^\#, \sigma \rangle \qquad \forall u \in X.$$
(Observe that for every u ∈ X, $u^\#$ is the function in W given by $u^\#(x, y) = u(y)$ whenever (x, y) ∈ S × S.) From the definitions of the operators π1 and π2 it is clear that they are continuous with respect to the weak* topologies on W∗ and X∗ and positive; that is,
$$\sigma \ge 0 \ \Rightarrow\ \pi_1\sigma \ge 0, \quad \pi_2\sigma \ge 0.$$
We obtain for every σ ∈ W∗
$$\|\pi_1\sigma\| = \|\,|\pi_1\sigma|\,\| = \langle 1_S, |\pi_1\sigma| \rangle = \langle 1_S, \pi_1|\sigma| \rangle = \langle 1_{S \times S}, |\sigma| \rangle = \|\,|\sigma|\,\| = \|\sigma\|,$$
and similarly, ||π2σ|| = ||σ||. Define then
$$X_0^* := \{\rho \in X^*;\ \langle 1_S, \rho \rangle = 0\}.$$

Lemma 4.2.1 The equality
$$(\pi_1 - \pi_2) W_+^* = X_0^*$$
holds.

Proof: The inclusion ⊆ is obvious. The opposite inclusion will be established if we show that for any $\rho_1, \rho_2 \in X_+^*$ with $\|\rho_1\| = \|\rho_2\| = 1$ there exists a functional $\sigma \in W_+^*$ satisfying $\pi_1\sigma = \rho_1$ and $\pi_2\sigma = \rho_2$. Suppose the contrary. Then there exist $\rho_1, \rho_2 \in X_+^*$ with $\|\rho_1\| = \|\rho_2\| = 1$ such that
$$(\rho_1, \rho_2) \notin \{(\pi_1\sigma, \pi_2\sigma);\ \sigma \in W_+^*,\ \|\sigma\| = 1\}.$$
Since the last set is convex and weakly* compact, there exists an element $(u_1, u_2) \in X \times X$ strictly separating $(\rho_1, \rho_2)$ from it; that is,
$$\langle u_1, \rho_1 \rangle + \langle u_2, \rho_2 \rangle > \max\{\langle u_1 + u_2^\#, \sigma \rangle;\ \sigma \in W_+^*,\ \|\sigma\| = 1\}. \tag{4.2.1}$$
For every point (x, y) ∈ S × S, the functional $w \mapsto w(x, y)$ is clearly an element of $W_+^*$ and its norm equals 1; hence
$$\begin{aligned}
\max\{\langle u_1 + u_2^\#, \sigma \rangle;\ \sigma \in W_+^*,\ \|\sigma\| = 1\}
&\ge \sup_{(x, y) \in S \times S} (u_1(x) + u_2(y)) = \sup_{x \in S} u_1(x) + \sup_{y \in S} u_2(y) \\
&= \sup_{x \in S} u_1(x) \langle 1_S, \rho_1 \rangle + \sup_{y \in S} u_2(y) \langle 1_S, \rho_2 \rangle \ge \langle u_1, \rho_1 \rangle + \langle u_2, \rho_2 \rangle,
\end{aligned}$$
a contradiction with (4.2.1). The statement of the lemma is thus established. □

Let c : S × S → IR ∪ {+∞} be an arbitrary function that is bounded below and satisfies the triangle inequality
$$c(x, y) \le c(x, z) + c(z, y) \qquad \text{for all } x, y, z \in S.$$
For every $\sigma \in W_+^*$ define
$$\sigma(c) := \sup\{\langle w, \sigma \rangle;\ w \in W,\ w \le c\}. \tag{4.2.2}$$
Clearly, the functional $\sigma \mapsto \sigma(c)$ on $W_+^*$ is additive and positively homogeneous; that is,
$$(\sigma_1 + \sigma_2)(c) = \sigma_1(c) + \sigma_2(c) \qquad \text{for all } \sigma_1, \sigma_2 \in W_+^*,$$
and
$$(\lambda\sigma)(c) = \lambda\sigma(c) \qquad \text{for all } \sigma \in W_+^* \text{ and all } \lambda \in \mathbb{R}_+.$$
Also, the equality σ(c + w) = σ(c) + σ(w) holds whenever $\sigma \in W_+^*$ and w ∈ W.

We are now in a position to state the abstract version of the mass transfer problem with a given marginal difference. It consists in finding the optimal value
$$A(c, \rho; W, X) := \inf\{\sigma(c);\ \sigma \in W_+^*,\ (\pi_1 - \pi_2)\sigma = \rho\} \tag{4.2.3}$$
for $\rho \in X_0^*$. If $\rho \notin X_0^*$, we set, by definition, A(c, ρ; W, X) = +∞. Clearly, the functional A is sublinear in ρ. The abstract version of the dual problem consists in finding the optimal value
$$B(c, \rho; X) := \sup\{\langle u, \rho \rangle;\ u \in \mathrm{Lip}(c, S; X)\}, \tag{4.2.4}$$
where
$$\mathrm{Lip}(c, S; X) := \{u \in X;\ u(x) - u(y) \le c(x, y) \text{ for all } x, y \in S\}.$$
If Lip(c, S; X) = Ø, we define
$$B(c, \rho; X) = -\infty \qquad \text{for all } \rho \in X^*.$$
Note that if u ∈ Lip(c, S; X), then u + α ∈ Lip(c, S; X) for any constant α, which implies
$$B(c, \rho; X) = +\infty \qquad \text{for } \rho \notin X_0^*,$$
provided that Lip(c, S; X) is nonempty.

4.2.2 The Formulation of the Duality Theorem

Given $\rho \in X_0^*$ and $\sigma \in W_+^*$ with $(\pi_1 - \pi_2)\sigma = \rho$, for every u ∈ Lip(c, S; X) we have
$$\langle u, \rho \rangle = \langle u - u^\#, \sigma \rangle \le \sigma(c).$$
Hence B(c, ρ; X) ≤ A(c, ρ; W, X). This inequality is obvious if Lip(c, S; X) is empty or if $\rho \notin X_0^*$.

The goal is to find conditions on the cost function c that ensure the validity of the duality relation
$$A(c, \rho; W, X) = B(c, \rho; X) \qquad \text{for all } \rho \in X_0^*.$$
The definition of a regular cost function due to Levin and Milyutin (1979) will be required. Given a nonempty set M in S × S, we denote by π1M and π2M its projections on the first and the second coordinates respectively. Let u ∈ X and θ ∈ IR. Define
$$E_{\theta, u} := \{(x, y);\ c(x, y) - u(x) + u(y) \le \theta\}.$$

Definition 4.2.2 A function c : S × S → IR ∪ {+∞} is called regular (with respect to W and X) if for any u ∈ X and θ ∈ IR with $\pi_1 E_{\theta, u} \cap \pi_2 E_{\theta, u} = Ø$ and all θ′ < θ, there exists a function v ∈ X, 0 ≤ v ≤ 1, such that v = 1 on $\pi_1 E_{\theta', u}$ and v = 0 on $\pi_2 E_{\theta', u}$.

It is easy to show that the function u(x) − u(y) + α is regular whenever u ∈ X and α ∈ IR. We give some examples of regular functions, which will be used later on.

Example 4.2.3 Let S be a compact space, W = C(S × S), and X = C(S). Then every lower semicontinuous function c : S × S → IR ∪ {+∞} is regular. Indeed, if c is lower semicontinuous, then $\pi_1 E_{\theta, u}$ and $\pi_2 E_{\theta, u}$ are closed, hence compact, for any u ∈ X and θ ∈ IR, and it remains to apply the Urysohn lemma. In this example
$$\sigma(c) = \sup\Big\{\int_{S \times S} w(x, y) \, \sigma(d(x, y));\ w \in C(S \times S),\ w \le c\Big\} = \int_{S \times S} c(x, y) \, \sigma(d(x, y)),$$
and so A(c, ρ; C(S × S), C(S)) = A(c, ρ).

Observe that for an arbitrary (not lower semicontinuous) universally measurable function $c : S \times S \to \mathbb{R} \cup \{+\infty\}$ bounded below we have only the inequality $A(c, \rho; C(S \times S), C(S)) \le A(c, \rho)$. In fact, for all $\sigma \in C(S \times S)_+^*$,

\[
\sigma(c) = \int_{S \times S} c^{\wedge}(x, y)\,\sigma(d(x, y)) \le \int_{S \times S} c(x, y)\,\sigma(d(x, y)),
\]

where $c^{\wedge}(x, y) := \sup\{w(x, y);\ w \in C(S \times S),\ w \le c\}$.

Example 4.2.4 Let $S$ be a Borel subset in a Polish space, and let $W$ and $X$ be the Banach lattices of all bounded Borel functions on $S \times S$ and on $S$ respectively. If for any $\theta \in \mathbb{R}$ the sublevel set

\[
E_\theta := \{(x, y);\ c(x, y) \le \theta\} \tag{4.2.5}
\]

is analytic, then $c$ is regular. To show this, note that

\[
E_{\theta,u} = \bigcup_{r} \big\{E_r \cap \{(x, y);\ -u(x) + u(y) \le \theta - r\}\big\},
\]

where $r$ runs through the rational numbers. Since the class of analytic sets is closed under the operations of countable union and intersection (see, for example, Kuratowski (1966)), all the $E_{\theta,u}$ are analytic sets. Projections of analytic sets are analytic as well, so $\pi_1 E_{\theta,u}$ and $\pi_2 E_{\theta,u}$ are analytic subsets of $S$. If their intersection is empty, then by the Lusin separation theorem (see again Kuratowski (1966)) there exist Borel sets $B_1$ and $B_2$ in $S$ such that $B_1 \supseteq \pi_1 E_{\theta,u}$, $B_2 \supseteq \pi_2 E_{\theta,u}$, and $B_1 \cap B_2 = \emptyset$. The indicator function of $B_1$,

\[
\mathbf{1}_{B_1}(x) = \begin{cases} 1 & \text{if } x \in B_1, \\ 0 & \text{if } x \notin B_1, \end{cases}
\]

can be taken as the desired function $v$, and the regularity of $c$ is thus established.

Example 4.2.5 Let $S$ be a nonmetrizable compact space and let $W$ and $X$ be the Banach lattices of all bounded Baire measurable functions on $S \times S$ and on $S$ respectively. If for any $\theta \in \mathbb{R}$ the sublevel set (4.2.5) is $B_0(S \times S)$-analytic, then $c$ is regular. The proof is the same as in the previous example if one takes into account the following three facts (see Levin (1990, 1992)):

1. The class of $B_0(S \times S)$-analytic sets $\mathcal{A}B_0(S \times S)$ is closed under the operations of countable union and intersection.

2. $\pi_1 A, \pi_2 A \in \mathcal{A}B_0(S)$ for every $A \in \mathcal{A}B_0(S \times S)$.

3. An extension of the Lusin separation theorem holds as follows: If $A_1, A_2 \in \mathcal{A}B_0(S)$ and $A_1 \cap A_2 = \emptyset$, then there exist sets $B_1, B_2 \in B_0(S)$ such that $B_1 \supseteq A_1$, $B_2 \supseteq A_2$, and $B_1 \cap B_2 = \emptyset$ (see Theorem 1.2 in Levin (1992), given further in this book as Theorem 4.5.29).

Theorem 4.2.6 (abstract duality theorem) Suppose that the cost function $c : S \times S \to \mathbb{R} \cup \{+\infty\}$ is bounded below, is regular, and satisfies the triangle inequality. Then the set $\operatorname{Lip}(c, S; X)$ is nonempty, and the duality relation $A(c, \rho; W, X) = B(c, \rho; X) > -\infty$ holds for all $\rho \in X^*$.

In what follows in this section, we shall provide the proof of this theorem and of some related results.

4.2.3 The Plan of the Proof of Theorem 4.2.6 and Formulations of Related Theorems

As we have mentioned already, the functional $A(c, \rho; W, X)$ is sublinear in $\rho$. The idea of the proof of the duality Theorem 4.2.6 is to check that (i) $A(c, \cdot\,; W, X)$ is weak* lower semicontinuous on $X^*$, and then (ii) to apply the following version of the Fenchel–Moreau theorem from convex analysis (see, for example, Ioffe and Tihomirov (1979)):

Let $Z$ be a Hausdorff locally convex space, $Z^*$ the conjugate space, and $p : Z \to \mathbb{R} \cup \{+\infty\}$ a sublinear functional; that is,

\[
p(z_1 + z_2) \le p(z_1) + p(z_2) \quad \text{for all } z_1, z_2 \in Z,
\]

and

\[
p(\lambda z) = \lambda p(z) \quad \text{for all } z \in Z \text{ and all } \lambda \ge 0
\]

(we assume, by definition, $0 \cdot (+\infty) = 0$, hence $p(0) = 0$). If $p$ is weak* lower semicontinuous on $Z$, then the subdifferential

\[
\partial p(0) := \{z' \in Z^*;\ \langle z, z'\rangle \le p(z)\ \text{for all } z \in Z\}
\]

is nonempty, and

\[
p(z) = p^{**}(z) \quad \text{for all } z \in Z, \tag{4.2.6}
\]

where $p^{**}$ is the second conjugate convex functional; that is,

\[
p^{**}(z) := \sup\{\langle z, z'\rangle - p^*(z');\ z' \in Z^*\}, \qquad p^*(z') := \sup\{\langle z, z'\rangle - p(z);\ z \in Z\}.
\]

We shall prove Theorem 4.2.6 by applying (4.2.6) to $Z = (X^*, \sigma(X^*, X))$, $Z^* = X$, and $p(\rho) = A(c, \rho; W, X)$. Here, $\sigma(X^*, X)$ denotes the weak* topology on $X^*$. The result will follow if we show that $A(c, \cdot\,; W, X)$ is weak* lower semicontinuous on $X^*$, and that $\partial A(c, \cdot\,; W, X)(0) = \operatorname{Lip}(c, S; X)$.

The following result is crucial in proving Theorem 4.2.6.

Theorem 4.2.7 (The abstract version of the Kantorovich–Rubinstein functional is lower semicontinuous) Suppose that $c$ is bounded below, is regular, and satisfies the triangle inequality. Then the functional $A(c, \cdot\,; W, X)$ is weak* lower semicontinuous on $X^*$.

This theorem will be derived, in turn, from the following result.

Theorem 4.2.8 (The Monge–Kantorovich and Kantorovich–Rubinstein functionals coincide if the cost function satisfies the triangle inequality) Suppose that $c$ is bounded below, is regular, and satisfies the triangle inequality. Then for every $\rho \in X_0^*$,

\[
A(c, \rho; W, X) = \inf\{\sigma(c);\ \sigma \in W_+^*,\ \pi_1\sigma = \rho_+,\ \pi_2\sigma = \rho_-\},
\]

where $\rho = \rho_+ - \rho_-$ is the Jordan decomposition of $\rho$.

Our first goal is to prove Theorem 4.2.8. To this end, an auxiliary theorem on convex sets due to Dubovitskii and Milyutin (1971) will be needed.

4.2.4 An Auxiliary Theorem on Convex Sets

The following auxiliary theorem on convex sets will be used extensively in proving Theorem 4.2.8. This theorem is a particular case of a more general result proved in Dubovitskii and Milyutin (1971).

Let $W$ be a normed space and $K$ a convex subset in $W^*$. A convex cone $\Omega \subseteq W \times \mathbb{R}$ is said to be thick on $K$ if the following conditions are satisfied:

(a) $(w, a) \in \Omega \Rightarrow (w, a + \delta) \in \Omega$ for all $\delta \ge 0$.

(b) $\sigma \in K,\ (w, a) \in \Omega \Rightarrow \langle w, \sigma\rangle + a \ge 0$.

(c) $\langle w, \sigma\rangle + a \ge 0$ whenever $(w, a) \in \Omega$ $\Rightarrow$ $\sigma \in \overline{K}$, where $\overline{K}$ denotes the closure of $K$ in the norm topology of $W^*$.

Clearly such a cone is not unique.

Dubovitskii–Milyutin theorem on convex sets. Let $G_i$, $i = 1, \dots, m$, be open convex subsets in $W^*$, and $K_j$, $j = 1, \dots, n$, closed convex subsets in $W^*$, and let $\Omega_i$, $i = 1, \dots, m$, and $\Xi_j$, $j = 1, \dots, n$, be convex cones in $W \times \mathbb{R}$ that are thick on the corresponding sets. Fix arbitrary elements $\sigma_i^0 \in G_i$, $i = 1, \dots, m$. The intersection

\[
\Big(\bigcap_{i=1}^{m} G_i\Big) \cap \Big(\bigcap_{j=1}^{n} K_j\Big)
\]

is empty if and only if for every $\varepsilon > 0$, there exist elements $(w_i^\varepsilon, a_i^\varepsilon) \in \Omega_i$, $i = 1, \dots, m$, and $(w_j'^{\varepsilon}, a_j'^{\varepsilon}) \in \Xi_j$, $j = 1, \dots, n$, having the following properties:

(i) $\Big\|\sum_{i=1}^{m} w_i^\varepsilon + \sum_{j=1}^{n} w_j'^{\varepsilon}\Big\| \le \varepsilon$ and $\Big|\sum_{i=1}^{m} a_i^\varepsilon + \sum_{j=1}^{n} a_j'^{\varepsilon}\Big| \le \varepsilon$.

(ii) The normalization condition holds as follows:

\[
\sum_{i=1}^{m} \big(\langle w_i^\varepsilon, \sigma_i^0\rangle + a_i^\varepsilon\big) = 1.
\]

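4.2.5 Two Lemmas

We precede the proof of Theorem 4.2.8 with two lemmas.

Let $\sigma \in W_+^*$ and let $M$ be any subset in $S \times S$. We denote by $\varphi(\sigma, M)$ the set in $W^*$ consisting of all $\sigma'$ such that $\sigma' \le \sigma$ and $\langle w, \sigma'\rangle = 0$ whenever $w|_M = 0$. Here $w|_M$ stands for the restriction of $w$ to $M$.

The set $\varphi(\sigma, M)$ is bounded above in $W^*$ by $\sigma$, and since $W^*$ is the conjugate Banach lattice, there exists the supremum of $\varphi(\sigma, M)$ in $W^*$,

\[
\sigma_M := \vee\varphi(\sigma, M) = \sup_{W^*} \varphi(\sigma, M).
\]

It is easy to see that $\sigma_M \in W_+^*$ and $\sigma_M \in \varphi(\sigma, M)$.

Lemma 4.2.9 For every $\varepsilon > 0$ there exists $w_\varepsilon \in W_+$ such that $0 \le w_\varepsilon \le 1$, $w_\varepsilon|_M = 1$, and $\langle w_\varepsilon, \sigma - \sigma_M\rangle \to 0$ as $\varepsilon \to 0$.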


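Proof: First observe that

\[
\langle \mathbf{1}_{S \times S}, \sigma_M\rangle = \sup\{\langle \mathbf{1}_{S \times S}, \sigma'\rangle;\ \sigma' \in \varphi(\sigma, M)\}.
\]

Put $\bar{M} = \langle \mathbf{1}_{S \times S}, \sigma_M\rangle$ and consider in $W^*$ the convex sets

\[
\begin{aligned}
G_1 &= \{\sigma';\ \langle \mathbf{1}_{S \times S}, \sigma'\rangle - \bar{M} > 0\}, \\
F_1 &= \{\sigma';\ \sigma' \le \sigma\}, \\
F_2 &= \{\sigma';\ \langle w, \sigma'\rangle = 0 \text{ for all } w \in W \text{ with } w|_M = 0\}.
\end{aligned}
\]

Clearly, $G_1$ is open, $F_1$ and $F_2$ are closed, and $G_1 \cap F_1 \cap F_2 = \emptyset$. The convex cones in $W \times \mathbb{R}$

\[
\begin{aligned}
\Omega_1 &= \{(\alpha \mathbf{1}_{S \times S},\ -\alpha\bar{M} + \delta_0);\ \alpha, \delta_0 \in \mathbb{R}_+\}, \\
\Xi_1 &= \{(-\zeta,\ \langle \zeta, \sigma\rangle + \delta_1);\ \zeta \in W_+,\ \delta_1 \in \mathbb{R}_+\}, \\
\Xi_2 &= \{(w, \delta_2);\ w \in W,\ w|_M = 0,\ \delta_2 \in \mathbb{R}_+\}
\end{aligned}
\]

are thick on $G_1$, $F_1$, and $F_2$, respectively. As a normalized element of $G_1$ we fix any $\sigma_0 \in G_1$ satisfying $\langle \mathbf{1}_{S \times S}, \sigma_0\rangle - \bar{M} = 1$. According to the theorem on convex sets (see Section 4.2.4), for every $\varepsilon > 0$ there exist $\alpha^\varepsilon, \delta_0^\varepsilon, \delta_1^\varepsilon, \delta_2^\varepsilon \in \mathbb{R}_+$, $\zeta^\varepsilon \in W_+$, and $w_1^\varepsilon \in W$, $w_1^\varepsilon|_M = 0$, such that $(\alpha^\varepsilon \mathbf{1}_{S \times S}, -\alpha^\varepsilon\bar{M} + \delta_0^\varepsilon) \in \Omega_1$, $(-\zeta^\varepsilon, \langle \zeta^\varepsilon, \sigma\rangle + \delta_1^\varepsilon) \in \Xi_1$, $(w_1^\varepsilon, \delta_2^\varepsilon) \in \Xi_2$, $\alpha^\varepsilon + \delta_0^\varepsilon = 1$ (the normalization condition), and

\[
m^\varepsilon := \alpha^\varepsilon \mathbf{1}_{S \times S} - \zeta^\varepsilon + w_1^\varepsilon, \qquad \|m^\varepsilon\| \le \varepsilon, \tag{4.2.7}
\]

\[
\delta^\varepsilon := -\alpha^\varepsilon\bar{M} + \delta_0^\varepsilon + \langle \zeta^\varepsilon, \sigma\rangle + \delta_1^\varepsilon + \delta_2^\varepsilon, \qquad |\delta^\varepsilon| \le \varepsilon. \tag{4.2.8}
\]

Now, applying the functional $\sigma_M$ to both parts of (4.2.7) and adding the resulting equation to (4.2.8), we get

\[
\langle m^\varepsilon, \sigma_M\rangle + \delta^\varepsilon = \langle \zeta^\varepsilon, \sigma - \sigma_M\rangle + \delta_0^\varepsilon + \delta_1^\varepsilon + \delta_2^\varepsilon.
\]

It follows from the inequalities in (4.2.7) and (4.2.8) that

\[
\delta_0^\varepsilon \to 0, \quad \delta_1^\varepsilon \to 0, \quad \delta_2^\varepsilon \to 0, \quad \text{and} \quad \langle \zeta^\varepsilon, \sigma - \sigma_M\rangle \to 0 \quad \text{as } \varepsilon \to 0.
\]

Then $\alpha^\varepsilon \to 1$ as $\varepsilon \to 0$ by the normalization condition. We obtain

\[
\tilde{m}^\varepsilon = \mathbf{1}_{S \times S} - \tilde{\zeta}^\varepsilon + \tilde{w}^\varepsilon, \tag{4.2.9}
\]

\[
\langle \tilde{\zeta}^\varepsilon, \sigma - \sigma_M\rangle \to 0, \quad \|\tilde{m}^\varepsilon\| \to 0 \quad \text{as } \varepsilon \to 0, \tag{4.2.10}
\]

\[
\tilde{w}^\varepsilon|_M = 0, \tag{4.2.11}
\]

where

\[
\tilde{\zeta}^\varepsilon = \frac{1}{\alpha^\varepsilon}\zeta^\varepsilon, \qquad \tilde{w}^\varepsilon = \frac{1}{\alpha^\varepsilon}w_1^\varepsilon, \qquad \tilde{m}^\varepsilon = \frac{1}{\alpha^\varepsilon}m^\varepsilon.
\]

Writing

\[
\hat{\zeta}^\varepsilon := (\tilde{\zeta}^\varepsilon + \tilde{m}^\varepsilon)_+, \qquad \hat{w}^\varepsilon := (\tilde{\zeta}^\varepsilon + \tilde{m}^\varepsilon)_- + \tilde{w}^\varepsilon,
\]

we derive from (4.2.9) and (4.2.11)

\[
\mathbf{1}_{S \times S} - \hat{\zeta}^\varepsilon + \hat{w}^\varepsilon = 0, \qquad \hat{w}^\varepsilon|_M = 0. \tag{4.2.12}
\]

Further, $\|\hat{\zeta}^\varepsilon - \tilde{\zeta}^\varepsilon\| = \|(\tilde{\zeta}^\varepsilon + \tilde{m}^\varepsilon)_+ - \tilde{\zeta}^\varepsilon\| \le \|\tilde{m}^\varepsilon_+\| \le \|\tilde{m}^\varepsilon\|$, which combined with (4.2.10) yields

\[
\langle \hat{\zeta}^\varepsilon, \sigma - \sigma_M\rangle \to 0 \quad \text{as } \varepsilon \to 0. \tag{4.2.13}
\]

Set $w_\varepsilon = \mathbf{1}_{S \times S} \wedge \hat{\zeta}^\varepsilon$. By (4.2.12), $w_\varepsilon|_M = 1$, and since $\sigma - \sigma_M \ge 0$ and $0 \le w_\varepsilon \le \hat{\zeta}^\varepsilon$, by (4.2.13) we have $\langle w_\varepsilon, \sigma - \sigma_M\rangle \to 0$ as $\varepsilon \to 0$.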

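The proof is complete. □

Lemma 4.2.10 Let $\ell : S \times S \to \mathbb{R} \cup \{+\infty\}$ be bounded below, and $M = \{(x, y);\ \ell(x, y) \le \theta\}$. Then the inequality $(\sigma - \sigma_M)(\ell) \ge \theta\|\sigma - \sigma_M\|$ holds.

Proof: By Lemma 4.2.9, for every $\varepsilon > 0$ there exists a function $w_\varepsilon \in W_+$ such that $0 \le w_\varepsilon \le 1$, $w_\varepsilon|_M = 1$, and $\langle w_\varepsilon, \sigma - \sigma_M\rangle \to 0$ as $\varepsilon \to 0$. We find a number $k$ such that $\ell(x, y) + k \ge 0$ for all $(x, y) \in S \times S$. Then the inequality

\[
\ell + k\mathbf{1}_{S \times S} \ge (\theta + k)(\mathbf{1}_{S \times S} - w_\varepsilon)
\]

holds on $S \times S$. Now, by using the fact that $\sigma - \sigma_M \ge 0$ together with the properties of the functional $\sigma \mapsto \sigma(\ell)$ on $W_+^*$, we obtain

\[
(\sigma - \sigma_M)(\ell + k\mathbf{1}_{S \times S}) \ge \langle (\theta + k)(\mathbf{1}_{S \times S} - w_\varepsilon), \sigma - \sigma_M\rangle = (\theta + k)\big(\|\sigma - \sigma_M\| - \langle w_\varepsilon, \sigma - \sigma_M\rangle\big).
\]

Since $\langle w_\varepsilon, \sigma - \sigma_M\rangle \to 0$ as $\varepsilon \to 0$, we have $(\sigma - \sigma_M)(\ell + k\mathbf{1}_{S \times S}) \ge (\theta + k)\|\sigma - \sigma_M\|$. But $(\sigma - \sigma_M)(\ell + k\mathbf{1}_{S \times S}) = (\sigma - \sigma_M)(\ell) + k\|\sigma - \sigma_M\|$; consequently, $(\sigma - \sigma_M)(\ell) \ge \theta\|\sigma - \sigma_M\|$, and the proof is complete. □

4.2.6 Proof of Theorem 4.2.8. (The Monge–Kantorovich and Kantorovich–Rubinstein Functionals Coincide if the Cost Function Satisfies the Triangle Inequality)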

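Let $\sigma$ be an arbitrary element of $W_+^*$ satisfying $(\pi_1 - \pi_2)\sigma = \rho$. It is clear that the theorem will be established if we can find an element $\sigma^0 \in W_+^*$ such that $\pi_1\sigma^0 = \rho_+$, $\pi_2\sigma^0 = \rho_-$, and $\sigma^0(c) \le \sigma(c)$.

If $\sigma(c) = +\infty$, any $\sigma'$ in $W_+^*$ with $\pi_1\sigma' = \rho_+$ and $\pi_2\sigma' = \rho_-$ can be taken. The existence of such a $\sigma'$ follows from the proof of Lemma 4.2.1.

Suppose now that $\sigma(c) = a < +\infty$ and consider the subsidiary extremal problem to minimize the functional $\|\sigma'\|$ over the set

\[
\mathcal{A} := \{\sigma' \in W_+^*;\ \sigma'(c) \le a,\ (\pi_1 - \pi_2)\sigma' = \rho\}.
\]

Let $J$ denote the optimal value of the subsidiary problem, that is,

\[
J := \inf\{\langle \mathbf{1}_{S \times S}, \sigma'\rangle;\ \sigma' \in \mathcal{A}\}.
\]

(Observe that $\|\sigma'\| = \langle \mathbf{1}_{S \times S}, \sigma'\rangle$ for all $\sigma' \in \mathcal{A}$, since $\mathcal{A} \subseteq W_+^*$.)

Claim 1: $\mathcal{A}$ is weak* closed, and the value $J$ is attained at some $\sigma^0 \in \mathcal{A}$.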

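Proof: Let $\sigma_\nu$ be a net in $\mathcal{A}$ that is weakly* convergent to some $\sigma' \in W^*$. Clearly, $\sigma' \in W_+^*$ and $(\pi_1 - \pi_2)\sigma' = \rho$. Finally, if $w \in W$ and $w \le c$, then

\[
\langle w, \sigma'\rangle = \lim_\nu \langle w, \sigma_\nu\rangle \le \liminf_\nu \sigma_\nu(c) \le a.
\]

Therefore, $\sigma'(c) = \sup\{\langle w, \sigma'\rangle;\ w \in W,\ w \le c\} \le a$. The set $\mathcal{A}$ is thus weak* closed. Further, since $\sigma \in \mathcal{A}$, we have

\[
J = \inf\{\langle \mathbf{1}_{S \times S}, \sigma'\rangle;\ \sigma' \in \mathcal{A},\ \|\sigma'\| \le \|\sigma\|\},
\]

and the infimum is attained at some $\sigma^0$, since the set $\{\sigma' \in \mathcal{A};\ \|\sigma'\| \le \|\sigma\|\}$ is weak* compact. The claim is proved.

We continue the proof of the theorem and consider in $W^*$ the following convex sets:

\[
\begin{aligned}
G &= \{\sigma';\ \langle \mathbf{1}_{S \times S}, \sigma'\rangle < J\}, \\
K_1 &= \{\sigma';\ (\pi_1 - \pi_2)\sigma' = \rho\}, \\
K_2 &= \{\sigma';\ \sigma' \ge 0,\ \sigma'(c) \le a\}.
\end{aligned}
\]

It is clear that $G$ is open, while $K_1$ and $K_2$ are closed ($K_2$ is closed because of the lower semicontinuity of the functional $\sigma' \mapsto \sigma'(c) = \sup\{\langle w, \sigma'\rangle;\ w \in W,\ w \le c\}$ on $W_+^*$). Consider next the following convex cones in $W \times \mathbb{R}$:

\[
\begin{aligned}
\Omega &= \{(-\alpha\mathbf{1}_{S \times S},\ \alpha J + \delta_0);\ \alpha, \delta_0 \in \mathbb{R}_+\}, \\
\Xi_1 &= \{(u - u^{\#},\ -\rho(u) + \delta_1);\ u \in X,\ \delta_1 \in \mathbb{R}_+\}, \\
\Xi_2 &= \{(\zeta - \beta w,\ \beta a + \delta_2);\ \zeta \in W_+,\ w \in W,\ w \le c,\ \beta, \delta_2 \in \mathbb{R}_+\}.
\end{aligned}
\]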

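They are thick on the corresponding sets. From the definition of $J$ it is clear that $G \cap K_1 \cap K_2 = \emptyset$, and the theorem on convex sets can be applied. As a normalized element of $G$ we take any fixed $\sigma_* \in G$ for which $J - \langle \mathbf{1}_{S \times S}, \sigma_*\rangle = 1$. Now, by the Dubovitskii–Milyutin theorem on convex sets, for every $\varepsilon > 0$ there exist numbers $\alpha^\varepsilon \ge 0$, $\beta^\varepsilon \ge 0$, $\delta_0^\varepsilon \ge 0$, $\delta_1^\varepsilon \ge 0$, $\delta_2^\varepsilon \ge 0$ and functions $u^\varepsilon \in X$, $\zeta^\varepsilon \in W_+$, and $w^\varepsilon \in W$, $w^\varepsilon \le c$, such that

\[
|m^\varepsilon| \le \varepsilon, \qquad m^\varepsilon(x, y) := -\alpha^\varepsilon + u^\varepsilon(x) - u^\varepsilon(y) + \zeta^\varepsilon(x, y) - \beta^\varepsilon w^\varepsilon(x, y), \tag{4.2.14}
\]

\[
|\delta^\varepsilon| \le \varepsilon, \qquad \delta^\varepsilon := \alpha^\varepsilon J + \delta_0^\varepsilon - \rho(u^\varepsilon) + \delta_1^\varepsilon + \beta^\varepsilon a + \delta_2^\varepsilon, \tag{4.2.15}
\]

\[
\alpha^\varepsilon + \delta_0^\varepsilon = 1 \quad \text{(normalization condition)}. \tag{4.2.16}
\]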

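Without loss of generality we may assume that $\beta^\varepsilon > 0$. Now, taking into account Claim 1, applying $\sigma^0$ to both parts of (4.2.14), and adding the result to (4.2.15) yields

\[
\langle m^\varepsilon, \sigma^0\rangle + \delta^\varepsilon = \langle \zeta^\varepsilon, \sigma^0\rangle - \beta^\varepsilon\langle w^\varepsilon, \sigma^0\rangle + \delta_0^\varepsilon + \delta_1^\varepsilon + \delta_2^\varepsilon + \beta^\varepsilon a.
\]

From this it follows that

\[
\langle \zeta^\varepsilon, \sigma^0\rangle \to 0, \qquad \beta^\varepsilon(a - \langle w^\varepsilon, \sigma^0\rangle) \to 0, \qquad \delta_i^\varepsilon \to 0 \quad (i = 0, 1, 2)
\]

as $\varepsilon \to 0$. Set $\zeta_1^\varepsilon := \zeta^\varepsilon + \beta^\varepsilon(c - w^\varepsilon)$. We have $\zeta_1^\varepsilon \ge 0$ and $\sigma^0(\zeta_1^\varepsilon) = \langle \zeta^\varepsilon, \sigma^0\rangle + \beta^\varepsilon\sigma^0(c - w^\varepsilon) \le \langle \zeta^\varepsilon, \sigma^0\rangle + \beta^\varepsilon(a - \langle w^\varepsilon, \sigma^0\rangle)$, and consequently,

\[
\sigma^0(\zeta_1^\varepsilon) \to 0 \quad \text{as } \varepsilon \to 0. \tag{4.2.17}
\]

Now choose a number $\theta$, $0 < \theta < 1/2$, and consider the set

\[
M^{\varepsilon,\theta} := \{(x, y);\ \alpha^\varepsilon + \beta^\varepsilon c(x, y) - u^\varepsilon(x) + u^\varepsilon(y) \le \theta\}.
\]

For small $\varepsilon > 0$ this set is not empty, since otherwise we would have

\[
\zeta_1^\varepsilon(x, y) - m^\varepsilon(x, y) = \alpha^\varepsilon + \beta^\varepsilon c(x, y) - u^\varepsilon(x) + u^\varepsilon(y) > \theta
\]

for all $(x, y) \in S \times S$. Consequently, $\sigma^0(\zeta_1^\varepsilon) - \sigma^0(m^\varepsilon) > \theta\langle \mathbf{1}_{S \times S}, \sigma^0\rangle = \theta J$, which is impossible because of (4.2.14) and (4.2.17).

Claim 2: $\pi_1 M^{\varepsilon,\theta} \cap \pi_2 M^{\varepsilon,\theta} = \emptyset$ when $\varepsilon > 0$ is small enough.

Proof: Suppose the contrary. Then for any sufficiently small $\varepsilon > 0$ there is a point $z_\varepsilon \in \pi_1 M^{\varepsilon,\theta} \cap \pi_2 M^{\varepsilon,\theta}$. This means that $(z_\varepsilon, y_\varepsilon) \in M^{\varepsilon,\theta}$ and $(x_\varepsilon, z_\varepsilon) \in M^{\varepsilon,\theta}$ for some $x_\varepsilon, y_\varepsilon \in S$; that is,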

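\[
\begin{aligned}
\alpha^\varepsilon + \beta^\varepsilon c(z_\varepsilon, y_\varepsilon) - u^\varepsilon(z_\varepsilon) + u^\varepsilon(y_\varepsilon) &\le \theta, \\
\alpha^\varepsilon + \beta^\varepsilon c(x_\varepsilon, z_\varepsilon) - u^\varepsilon(x_\varepsilon) + u^\varepsilon(z_\varepsilon) &\le \theta.
\end{aligned}
\]

Summing up these inequalities and taking into account the triangle inequality for $c$, we obtain

\[
2\alpha^\varepsilon + \beta^\varepsilon c(x_\varepsilon, y_\varepsilon) - u^\varepsilon(x_\varepsilon) + u^\varepsilon(y_\varepsilon) \le 2\theta.
\]

Next,

\[
2\alpha^\varepsilon + \beta^\varepsilon c(x_\varepsilon, y_\varepsilon) - u^\varepsilon(x_\varepsilon) + u^\varepsilon(y_\varepsilon) = \alpha^\varepsilon - m^\varepsilon(x_\varepsilon, y_\varepsilon) + \zeta_1^\varepsilon(x_\varepsilon, y_\varepsilon) \ge \alpha^\varepsilon - m^\varepsilon(x_\varepsilon, y_\varepsilon);
\]

thus $2\theta \ge \alpha^\varepsilon - m^\varepsilon(x_\varepsilon, y_\varepsilon)$. Letting $\varepsilon > 0$ tend to zero, we get $2\theta \ge 1$ (recall that $\alpha^\varepsilon \to 1$ as $\varepsilon \to 0$ because of (4.2.16)), which contradicts the choice of $\theta$. The claim is proved.

Proceeding to the concluding part of the proof of the theorem, observe that

\[
M^{\varepsilon,\theta} = \{(x, y);\ c(x, y) - v^\varepsilon(x) + v^\varepsilon(y) \le \theta^\varepsilon\} = E_{\theta^\varepsilon, v^\varepsilon},
\]

where

\[
v^\varepsilon = \frac{1}{\beta^\varepsilon}u^\varepsilon \in X, \qquad \theta^\varepsilon = \frac{\theta - \alpha^\varepsilon}{\beta^\varepsilon},
\]

and $E_{\theta,v}$ is defined in Section 4.2.2 (recall that we have assumed $\beta^\varepsilon > 0$). Since $c$ is regular and $\pi_1 M^{\varepsilon,\theta} \cap \pi_2 M^{\varepsilon,\theta} = \emptyset$ when $\varepsilon > 0$ is small enough, there exists a function $v_1^\varepsilon \in X$, $0 \le v_1^\varepsilon \le 1$, such that $v_1^\varepsilon = 1$ on $\pi_1 M^{\varepsilon,\theta-\varepsilon}$ and $v_1^\varepsilon = 0$ on $\pi_2 M^{\varepsilon,\theta-\varepsilon}$. We set $v_2^\varepsilon = \mathbf{1}_S - v_1^\varepsilon$ and set $\sigma_\varepsilon^0 := \sigma_M^0$ for $M = M^{\varepsilon,\theta-\varepsilon}$. Then $\langle v_2^\varepsilon, \pi_1\sigma_\varepsilon^0\rangle = \langle v_2^\varepsilon, \sigma_\varepsilon^0\rangle$, and since $v_2^\varepsilon|_{M^{\varepsilon,\theta-\varepsilon}} = 0$ (here we consider $v_2^\varepsilon(x)$ as an element of $W$, that is, as a function of $(x, y) \in S \times S$),

\[
\langle v_2^\varepsilon, \pi_1\sigma_\varepsilon^0\rangle = 0. \tag{4.2.18}
\]

Set $\ell^\varepsilon(x, y) := \alpha^\varepsilon + \beta^\varepsilon c(x, y) - u^\varepsilon(x) + u^\varepsilon(y)$. According to Lemma 4.2.10,

\[
(\sigma^0 - \sigma_\varepsilon^0)(\ell^\varepsilon) \ge (\theta - \varepsilon)\|\sigma^0 - \sigma_\varepsilon^0\|. \tag{4.2.19}
\]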


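On the other hand, $\ell^\varepsilon = \zeta_1^\varepsilon - m^\varepsilon$, and since $0 \le \langle \zeta_1^\varepsilon, \sigma^0 - \sigma_\varepsilon^0\rangle \le \langle \zeta_1^\varepsilon, \sigma^0\rangle \to 0$ as $\varepsilon \to 0$ and $|m^\varepsilon| \le \varepsilon$, we have $(\sigma^0 - \sigma_\varepsilon^0)(\ell^\varepsilon) \to 0$ as $\varepsilon \to 0$. From this and (4.2.19) it follows that $\lim_{\varepsilon \to 0}\|\sigma^0 - \sigma_\varepsilon^0\| = 0$. Then by (4.2.18), $\langle v_2^\varepsilon, \pi_1\sigma^0\rangle \to 0$ as $\varepsilon \to 0$. Similarly, it can be verified that $\langle v_1^\varepsilon, \pi_2\sigma_\varepsilon^0\rangle = \langle v_1^{\varepsilon\#}, \sigma_\varepsilon^0\rangle = 0$, and $\langle v_1^\varepsilon, \pi_2\sigma^0\rangle \to 0$ as $\varepsilon \to 0$.

Now, by using the known formula for the infimum of two functionals from the conjugate Banach lattice $X^*$ (see, for example, Kantorovich and Akilov (1984)), we get

\[
\begin{aligned}
\|\pi_1\sigma^0 \wedge \pi_2\sigma^0\| &= \langle \mathbf{1}_S, \pi_1\sigma^0 \wedge \pi_2\sigma^0\rangle \\
&= \inf\{\langle v, \pi_1\sigma^0\rangle + \langle \mathbf{1}_S - v, \pi_2\sigma^0\rangle;\ 0 \le v \le 1,\ v \in X\} \\
&\le \inf_{\varepsilon > 0}\{\langle v_2^\varepsilon, \pi_1\sigma^0\rangle + \langle v_1^\varepsilon, \pi_2\sigma^0\rangle\} = 0.
\end{aligned}
\]

Thus, $(\pi_1 - \pi_2)\sigma^0 = \rho$, and $\pi_1\sigma^0 \wedge \pi_2\sigma^0 = 0$. As is well known (see again Kantorovich and Akilov (1984)), the last equality implies $\pi_1\sigma^0 = \rho_+$ and $\pi_2\sigma^0 = \rho_-$. The proof is complete. □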
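For measures on a finite set the lattice formula used at the end of this proof reduces to a pointwise minimum. The following is a minimal sketch under that assumption; the vectors `mu` and `nu` stand in for $\pi_1\sigma^0$ and $\pi_2\sigma^0$, and all names and numbers are illustrative only.

```python
def lattice_min(mu, nu):
    """Infimum mu ^ nu of two nonnegative measures on a finite set."""
    return [min(a, b) for a, b in zip(mu, nu)]


def pairing_bound(v, mu, nu):
    """Evaluate <v, mu> + <1 - v, nu> for a test function 0 <= v <= 1."""
    return sum(vi * a + (1.0 - vi) * b for vi, a, b in zip(v, mu, nu))


# Hypothetical marginals on a three-point set.
mu = [0.5, 0.0, 0.2]
nu = [0.1, 0.3, 0.2]

common = lattice_min(mu, nu)
norm_of_min = sum(common)

# The infimum over v is attained by v = 1 where mu <= nu and v = 0 elsewhere.
best_v = [1.0 if a <= b else 0.0 for a, b in zip(mu, nu)]
attained = pairing_bound(best_v, mu, nu)

# Removing the common part leaves mutually singular measures, which are the
# Jordan parts of rho = mu - nu.
rho = [a - b for a, b in zip(mu, nu)]
rho_plus = [max(r, 0.0) for r in rho]
rho_minus = [max(-r, 0.0) for r in rho]
mu_reduced = [a - m for a, m in zip(mu, common)]
nu_reduced = [b - m for b, m in zip(nu, common)]
```

In particular, when `common` vanishes identically, `mu` and `nu` already coincide with `rho_plus` and `rho_minus`, which is the finite analogue of the last implication in the proof.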

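4.2.7 The Existence of Optimal Solutions

The following result is a consequence of Theorem 4.2.8.

Corollary 4.2.11 Under the assumptions of Theorem 4.2.8, for every $\rho \in X_0^*$ there exists a functional $\sigma \in W_+^*$ such that

\[
\pi_1\sigma = \rho_+, \qquad \pi_2\sigma = \rho_-, \qquad \text{and} \qquad \sigma(c) = A(c, \rho; W, X).
\]

Proof: In view of Theorem 4.2.8, there exists a sequence $(\sigma_n) \subset W_+^*$ such that $\pi_1\sigma_n = \rho_+$, $\pi_2\sigma_n = \rho_-$, and $A(c, \rho; W, X) = \lim_{n \to \infty}\sigma_n(c)$. We have

\[
\|\sigma_n\| = \langle \mathbf{1}_{S \times S}, \sigma_n\rangle = \|\rho_+\| = \|\rho_-\| = \tfrac{1}{2}\|\rho\|.
\]

Then $(\sigma_n)$ is weak* precompact, so there exists a subnet (generalized subsequence) $\sigma_{n_\nu}$ that is weakly* convergent to some $\sigma$. Then $\sigma \in W_+^*$,

\[
\pi_1\sigma = {*}\text{-}\lim_\nu \pi_1\sigma_{n_\nu} = \rho_+, \qquad \pi_2\sigma = {*}\text{-}\lim_\nu \pi_2\sigma_{n_\nu} = \rho_-,
\]

and

\[
\begin{aligned}
\sigma(c) &= \sup\{\langle w, \sigma\rangle;\ w \in W,\ w \le c\} = \sup\{\lim_\nu \langle w, \sigma_{n_\nu}\rangle;\ w \in W,\ w \le c\} \\
&\le \liminf_\nu \sigma_{n_\nu}(c) = \lim_{n \to \infty}\sigma_n(c) = A(c, \rho; W, X).
\end{aligned}
\]

Consequently, $\sigma(c) = A(c, \rho; W, X)$. □

4.2.8 Proof of Theorem 4.2.7 (The Abstract Version of the Kantorovich–Rubinstein Functional Is Weak* Lower Semicontinuous)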

Since $A(c, \rho; W, X) = +\infty$ for $\rho \notin X_0^*$ and since $X_0^*$ is weakly* closed in $X^*$, it suffices to show that $A(c, \cdot\,; W, X)$ is weakly* lower semicontinuous on $X_0^*$. So, we need to show that for any $a_1 \in \mathbb{R}$ the convex set $E(a_1) := \{\rho \in X_0^*;\ A(c, \rho; W, X) \le a_1\}$ is weakly* closed. According to the Krein–Shmulyan theorem (see, for example, Day (1958)), this will follow if we show that the intersections $E(a_0, a_1) := E(a_1) \cap \{\rho \in X_0^*;\ \|\rho\| \le a_0\}$ are weakly* closed for all $a_0$, $0 < a_0 < +\infty$, and all $a_1 \in \mathbb{R}$. Suppose a net $(\rho_\gamma) \subset E(a_0, a_1)$ converges weakly* to some $\rho_0 \in X_0^*$. Since the norm on a conjugate space is a weakly* lower semicontinuous functional, we have

\[
\|\rho_0\| \le \liminf_\gamma \|\rho_\gamma\| \le a_0.
\]

Further, in view of Corollary 4.2.11, there exists $\sigma_\gamma \in W_+^*$ for which $\pi_1\sigma_\gamma = \rho_{\gamma+}$, $\pi_2\sigma_\gamma = \rho_{\gamma-}$, and $\sigma_\gamma(c) = A(c, \rho_\gamma; W, X)$. Since

\[
\|\sigma_\gamma\| = \tfrac{1}{2}\|\rho_\gamma\| \le \tfrac{1}{2}a_0,
\]

the net $(\sigma_\gamma)$ is weakly* precompact. We extract from it a weakly* convergent subnet $\sigma_{\gamma_\nu} \to \sigma^0$. Then $\rho_{\gamma_\nu} = (\pi_1 - \pi_2)(\sigma_{\gamma_\nu})$ converges weakly* to $(\pi_1 - \pi_2)\sigma^0$. Consequently, $(\pi_1 - \pi_2)\sigma^0 = \rho_0$, and we obtain

\[
\begin{aligned}
A(c, \rho_0; W, X) &\le \sigma^0(c) = \sup\{\langle w, \sigma^0\rangle;\ w \in W,\ w \le c\} = \sup\{\lim_\nu \langle w, \sigma_{\gamma_\nu}\rangle;\ w \in W,\ w \le c\} \\
&\le \liminf_\nu \sigma_{\gamma_\nu}(c) = \liminf_\nu A(c, \rho_{\gamma_\nu}; W, X) \le a_1;
\end{aligned}
\]

that is, $\rho_0 \in E(a_0, a_1)$. □


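4.2.9 Proof of Theorem 4.2.6

We set $\partial A := \partial A(c, \cdot\,; W, X)(0)$.

Lemma 4.2.12 The equality $\partial A = \operatorname{Lip}(c, S; X)$ holds.

Proof: Suppose $u \in \operatorname{Lip}(c, S; X)$; then $u - u^{\#} \le c$, and for every $\sigma \in W_+^*$ with $(\pi_1 - \pi_2)\sigma = \rho$ we have

\[
\langle u, \rho\rangle = \langle u - u^{\#}, \sigma\rangle \le \sigma(c).
\]

Hence $\langle u, \rho\rangle \le A(c, \rho; W, X)$ for all $\rho \in X_0^*$; that is, $u \in \partial A$.

Next, suppose that $u \in \partial A$. For every $x, y, z \in S$ define $\sigma_{x,y} \in W_+^*$ and $\delta_z \in X^*$ by

\[
\langle w, \sigma_{x,y}\rangle := w(x, y) \quad \forall w \in W, \qquad \langle u, \delta_z\rangle := u(z) \quad \forall u \in X.
\]

Clearly $(\pi_1 - \pi_2)\sigma_{x,y} = \delta_x - \delta_y$, and therefore,

\[
u(x) - u(y) = \langle u, \delta_x - \delta_y\rangle \le A(c, \delta_x - \delta_y; W, X) \le \sigma_{x,y}(c) \le c(x, y)
\]

whenever $x, y \in S$; that is, $u \in \operatorname{Lip}(c, S; X)$. □

Proof of Theorem 4.2.6: Since $A(c, \cdot\,; W, X)$ is sublinear and weak* lower semicontinuous on $X^*$, its subdifferential $\partial A$ is nonempty, and moreover

\[
A(c, \rho; W, X) = A^{**}(c, \rho; W, X) \quad \text{for all } \rho \in X^*, \tag{4.2.20}
\]

where $A^{**}$ is defined by

\[
A^{**}(c, \rho; W, X) = \sup\{\langle u, \rho\rangle - A^*(c, u; W, X);\ u \in X\},
\]

with

\[
A^*(c, u; W, X) = \sup\{\langle u, \rho\rangle - A(c, \rho; W, X);\ \rho \in X^*\}
\]

(see (4.2.6)). Then by Lemma 4.2.12, $\operatorname{Lip}(c, S; X) \ne \emptyset$, and we have

\[
A^*(c, u; W, X) = \begin{cases} 0 & \text{for } u \in \partial A = \operatorname{Lip}(c, S; X), \\ +\infty & \text{for } u \notin \partial A = \operatorname{Lip}(c, S; X), \end{cases}
\]

and

\[
A^{**}(c, \rho; W, X) = \sup\{\langle u, \rho\rangle;\ u \in \partial A\} = \sup\{\langle u, \rho\rangle;\ u \in \operatorname{Lip}(c, S; X)\} = B(c, \rho; X) > -\infty.
\]

Applying (4.2.20), we complete the proof. □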

4.3 Reduction Theorems

In this section we establish reduction theorems asserting the validity of the equality $A(c, \rho) = A(c_*, \rho)$. This equality enables us to reduce the study of the mass transfer problem with an arbitrary cost function to the case of the reduced cost function satisfying the triangle inequality. Before formulating the theorems, notice that in contrast to the original cost function $c$, the reduced cost function $c_*$ need not be bounded below and even may take the value $-\infty$. Moreover, supposing $c$ to be universally measurable, we cannot be sure in general that $c_*$ is also universally measurable.
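On a finite set the reduced cost function can be computed directly. The following is a minimal sketch, assuming that the reduced cost is the infimum of chain sums $c(x, z_1) + c(z_1, z_2) + \dots + c(z_n, y)$ over all finite chains, the direct step included (the precise definitions (4.1.16) and (4.1.17) are not reproduced in this section), and assuming a nonnegative cost so that no chain sum diverges to $-\infty$. The function name `reduced_cost` and the data are illustrative; the loop is the standard Floyd–Warshall min-plus closure, swapped in here as a convenient finite-state substitute.

```python
def reduced_cost(c):
    """Min-plus closure of a nonnegative cost matrix on a finite set.

    d[i][j] becomes the smallest chain sum from i to j with at least one
    step, so the result satisfies the triangle inequality.
    """
    n = len(c)
    d = [row[:] for row in c]  # start from the direct costs
    for k in range(n):         # allow k as an intermediate point
        for i in range(n):
            for j in range(n):
                via_k = d[i][k] + d[k][j]
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d


# Illustrative cost violating the triangle inequality: going 0 -> 2 -> 1
# costs 2, while the direct step 0 -> 1 costs 5.
c = [[0.0, 5.0, 1.0],
     [5.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]

c_star = reduced_cost(c)
```

With these numbers the entry `c_star[0][1]` drops from 5 to 2, and the resulting matrix is dominated by `c` and satisfies the triangle inequality. A cost with negative cycles would need separate handling, since the reduced cost may then equal $-\infty$.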

4.3.1 The Metrizable Compact Case

For the sake of simplicity, we start with the case of $S$ being a metrizable compact space. Given a cost function $c$, we consider the reduced cost function $c_*$ together with the functions $c^n$ and $c_n$, $n \in \mathbb{N}$, which were defined in (4.1.16) and (4.1.17).

Lemma 4.3.1 Suppose $S$ is a metrizable compact space, and suppose that the cost function $c : S \times S \to \mathbb{R} \cup \{+\infty\}$ is bounded below and has analytic sublevel sets

\[
\{(x, y) \in S \times S;\ c(x, y) \le \alpha\}, \qquad \alpha \in \mathbb{R}.
\]

Then for every $\alpha \in \mathbb{R}$ the sets in $S \times S$,

\[
\begin{aligned}
A^n(\alpha) &:= \{(x, y);\ c^n(x, y) < \alpha\}, \quad n \in \mathbb{N}, \\
A_n(\alpha) &:= \{(x, y);\ c_n(x, y) < \alpha\}, \quad n \in \mathbb{N},
\end{aligned}
\]

and

\[
A_*(\alpha) := \{(x, y);\ c_*(x, y) < \alpha\},
\]

are analytic.

Proof: Since

\[
A_n(\alpha) = \bigcup_{k=1}^{n} A^k(\alpha), \quad n \in \mathbb{N}, \qquad A_*(\alpha) = \bigcup_{n \in \mathbb{N}} A_n(\alpha),
\]

it suffices to verify that all $A^n(\alpha)$ are analytic. Every $A^n(\alpha)$ is the projection onto $S \times S$ of the subset in $(S \times S) \times S^n$,

\[
B^n(\alpha) := \Big\{(x, y, z_1, \dots, z_n);\ \sum_{k=1}^{n+1} c(z_{k-1}, z_k) < \alpha,\ z_0 = x,\ z_{n+1} = y\Big\}.
\]

The set $B^n(\alpha)$, in turn, can be represented in the form

\[
B^n(\alpha) = \bigcup_{r_1, \dots, r_{n+1}} \bigcap_{k=1}^{n+1} B_k^n(r_k).
\]

Here the union is taken over all possible choices $(r_1, \dots, r_{n+1})$ of rational numbers satisfying $r_1 + \dots + r_{n+1} < \alpha$, and

\[
B_k^n(r_k) := \{(x, y, z_1, \dots, z_n);\ c(z_{k-1}, z_k) \le r_k\}, \quad k = 1, \dots, n + 1, \quad z_0 = x,\ z_{n+1} = y.
\]

Clearly, all $B_k^n(r_k)$ are analytic, which implies that $B^n(\alpha)$ is analytic as well. Then $A^n(\alpha)$ is analytic as the projection of the analytic set $B^n(\alpha)$. The proof is complete. □

Observe that for every $\alpha \in \mathbb{R}$ we have

\[
\{(x, y);\ c_*(x, y) \le \alpha\} = \bigcap_{n \in \mathbb{N}} \Big\{(x, y);\ c_*(x, y) < \alpha + \frac{1}{n}\Big\}.
\]

Therefore, the sublevel sets $\{(x, y);\ c_*(x, y) \le \alpha\}$, $\alpha \in \mathbb{R}$, are analytic as well, and the same is true also for the functions $c^n$ and $c_n$, $n \in \mathbb{N}$.

Corollary 4.3.2 Under the assumptions of Lemma 4.3.1 the reduced cost function $c_*$ and all the functions $c^n$ and $c_n$, $n \in \mathbb{N}$, have analytic sublevel sets, and consequently they are universally measurable.

This is a direct consequence of the lemma.

Corollary 4.3.3 Under the assumptions of Lemma 4.3.1, the set $W := \{(x, y) \in S \times S;\ c_*(x, y) = -\infty\}$ is analytic.

Proof: This follows from the fact that all $A_*(\alpha)$, $\alpha \in \mathbb{R}$, are analytic, combined with the representation

\[
W = \bigcap_{n \in \mathbb{N}} A_*(-n).
\]
□

Take any $\mu \in C(S \times S)_+^*$. If $c_*$ is universally measurable and if the integral

\[
\int_{S \times S} c_*\,d\mu = \int_{S \times S} c_*(x, y)\,\mu(d(x, y))
\]

makes no sense, that is, $\int_{S \times S} c_{*+}\,d\mu = \int_{S \times S} c_{*-}\,d\mu = +\infty$, then we assume that by definition,

\[
\int_{S \times S} c_*\,d\mu = \int_{S \times S} c_*(x, y)\,\mu(d(x, y)) = +\infty.
\]

With this convention, the value

\[
A(c_*, \rho) = \inf\Big\{\int_{S \times S} c_*(x, y)\,\mu(d(x, y));\ \mu \in C(S \times S)_+^*,\ (\pi_1 - \pi_2)\mu = \rho\Big\}
\]

is well defined for every $\rho \in C(S)_0^*$ even if $c_*$ is not bounded below. Clearly, $A(c, \rho) \ge A(c_*, \rho)$.

We are now in a position to formulate the reduction theorem.


Theorem 4.3.4 (The reduction theorem) Let $S$ be a metrizable compact space. Suppose that $c : S \times S \to \mathbb{R} \cup \{+\infty\}$ is bounded below and that its sublevel sets $\{(x, y) \in S \times S;\ c(x, y) \le \alpha\}$, $\alpha \in \mathbb{R}$, are analytic. Also suppose that the condition

\[
A(c, \rho) = \lim_{N \to \infty} A(c \wedge N, \rho) \tag{4.3.1}
\]

holds for a given $\rho \in C(S)_0^*$. Then for this $\rho$ the equality

\[
A(c, \rho) = A(c_*, \rho) \tag{4.3.2}
\]

holds.

Corollary 4.3.5 If $S$ is metrizable, compact, and $c : S \times S \to \mathbb{R}$ is bounded with analytic sublevel sets $\{(x, y);\ c(x, y) \le \alpha\}$, $\alpha \in \mathbb{R}$, then (4.3.2) holds for all $\rho \in C(S)_0^*$.

Proceeding to the proof of Theorem 4.3.4, we fix $\varepsilon > 0$ and consider in $S \times S$ the following subsets:

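\[
\begin{aligned}
M &:= (S \times S) \setminus W = \{(x, y);\ c_*(x, y) > -\infty\}, \\
M_0 &:= \{(x, y);\ c(x, y) = c_*(x, y)\}, \\
M_n &:= \{(x, y);\ c^n(x, y) \le c_*(x, y) + \varepsilon\} \setminus \bigcup_{k=0}^{n-1} M_k, \quad n \in \mathbb{N}, \\
W_1 &:= W \cap A^1\Big(-\frac{1}{\varepsilon}\Big), \\
W_n &:= W \cap A^n\Big(-\frac{1}{\varepsilon}\Big) \setminus \bigcup_{k=1}^{n-1} W_k, \quad n > 1,
\end{aligned}
\]

where $W$ is defined in Corollary 4.3.3, and $A^n(\alpha)$, $n \in \mathbb{N}$, are defined in Lemma 4.3.1. By Lemma 4.3.1 and Corollary 4.3.3, all these sets are universally measurable. Also, they are pairwise disjoint (all $M_n$ are contained in $M$, since $c$ is bounded below), and we have

\[
\bigcup_{n=0}^{\infty} M_n = M, \qquad \bigcup_{n=1}^{\infty} W_n = W.
\]

We set $E_n := M_n \cup W_n$, $n \in \mathbb{N}$.

To prove the theorem, we need the following lemma.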

Lemma 4.3.6 Let $\mu \in C(S \times S)_+^*$. For every $m \in \mathbb{N}$ there exists a measure $\mu_m \in C(S \times S)_+^*$ such that $(\pi_1 - \pi_2)\mu_m = (\pi_1 - \pi_2)\mu$, and moreover, the inequality

\[
\int_{S \times S} (c \wedge N)\,d\mu_m \le \sum_{n=0}^{m} \int_{M_n} c_*\,d\mu - \frac{1}{\varepsilon}\sum_{n=1}^{m} \mu W_n + 2\varepsilon\sum_{n=1}^{m} \mu E_n + N\sum_{n=m+1}^{\infty} \mu E_n \tag{4.3.3}
\]

holds.

Proof: For every $n \in \mathbb{N}$ consider the set-valued mapping $\Gamma_n : E_n \to 2^{S^n}$ defined by

\[
\Gamma_n(x, y) := \{(z_1, \dots, z_n);\ c(x, z_1) + c(z_1, z_2) + \dots + c(z_n, y) < c^n(x, y) + \varepsilon\}.
\]

Its graph, $\operatorname{gr}\Gamma_n$, can be expressed in the form

\[
\operatorname{gr}\Gamma_n = \bigcup_{r} \Big(\{(x, y, z_1, \dots, z_n);\ c(x, z_1) + \dots + c(z_n, y) < r\} \cap \{(x, y, z_1, \dots, z_n);\ (x, y) \in E_n,\ r < c^n(x, y) + \varepsilon\}\Big),
\]

where the union is taken over all rational numbers $r$. By the condition on $c$, the first of the sets under the union is analytic, while the second belongs to the σ-algebra $B(E_n)_\mu \otimes B(S^n)$, where $B(E_n)_\mu$ stands for the $\mu$-completion of $B(E_n)$, that is, for the σ-algebra of $\mu$-measurable sets in $E_n$. From this representation it follows that $\operatorname{gr}\Gamma_n$ can be obtained as the result of the A-operation over sets from $B(E_n)_\mu \otimes B(S^n)$, $\operatorname{gr}\Gamma_n \in \mathcal{A}(B(E_n)_\mu \otimes B(S^n))$.

Now we apply to $\Gamma_n$ the measurable selection theorem (Theorem D.4 in Levin (1985a); see also Levin (1978b, Corollary 2; 1980) or Leese (1975)), according to which there exists a $\mu$-measurable mapping $\varphi_n : E_n \to S^n$ (this means that $\varphi_n^{-1}(B) \in B(E_n)_\mu$ whenever $B \in B(S^n)$) such that $\varphi_n(x, y) = (\varphi_{n1}(x, y), \dots, \varphi_{nn}(x, y)) \in \Gamma_n(x, y)$ for all $(x, y) \in E_n$. We put $\varphi_{n0}(x, y) := x$, $\varphi_{n,n+1}(x, y) := y$, and consider the mappings $\Psi_{nk} : E_n \to S \times S$ defined by

\[
\Psi_{nk}(x, y) := (\varphi_{n,k-1}(x, y), \varphi_{nk}(x, y)), \quad k = 1, \dots, n + 1.
\]


Now we can define the desired measure $\mu_m$ by

\[
\mu_m(B) := \mu(B \cap M_0) + \sum_{n=1}^{m}\sum_{k=1}^{n+1} \mu\big(\Psi_{nk}^{-1}(B)\big) + \mu\Big(B \cap \bigcup_{n=m+1}^{\infty} E_n\Big)
\]

for every Borel set $B \subseteq S \times S$. This is well-defined, since the sets $\Psi_{nk}^{-1}(B)$ are $\mu$-measurable. Clearly, $\mu_m \in C(S \times S)_+^*$. Further, it is straightforward that

\[
\int_S u\,d(\pi_1 - \pi_2)\mu_m = \int_S u\,d(\pi_1 - \pi_2)\mu
\]

for every $u \in C(S)$, that is, $(\pi_1 - \pi_2)\mu_m = (\pi_1 - \pi_2)\mu$. It remains to verify (4.3.3). We write, for brevity,

\[
E := \bigcup_{n=m+1}^{\infty} E_n.
\]
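The mechanism behind this construction, moving the mass of a pair $(x, y)$ onto the links of a cheaper chain without changing the difference of the marginals, can be seen on a discrete measure. The following is a minimal sketch, assuming measures are stored as dictionaries from pairs to masses; the helper names `marginal_difference`, `reroute`, and `total_cost` and all data are hypothetical, and the chain is chosen by hand rather than by a measurable selection.

```python
from collections import defaultdict


def marginal_difference(mu):
    """Return (pi_1 - pi_2) mu as a dict point -> signed mass."""
    rho = defaultdict(float)
    for (x, y), mass in mu.items():
        rho[x] += mass
        rho[y] -= mass
    return dict(rho)


def reroute(mu, pair, chain):
    """Move the mass of `pair` onto the consecutive links x -> z1 -> ... -> y."""
    x, y = pair
    new_mu = defaultdict(float)
    for edge, mass in mu.items():
        if edge != pair:
            new_mu[edge] += mass
    points = [x, *chain, y]
    for a, b in zip(points, points[1:]):
        new_mu[(a, b)] += mu[pair]
    return dict(new_mu)


def total_cost(mu, c):
    """Integrate the cost c against the discrete measure mu."""
    return sum(mass * c[edge] for edge, mass in mu.items())


# Hypothetical cost: the direct step a -> b is expensive, the detour via c is cheap.
c = {("a", "b"): 5.0, ("a", "c"): 1.0, ("c", "b"): 1.0, ("b", "c"): 1.0}
mu = {("a", "b"): 0.7, ("b", "c"): 0.3}

mu_m = reroute(mu, ("a", "b"), ["c"])
rho_before = marginal_difference(mu)
rho_after = marginal_difference(mu_m)
```

The intermediate point receives and releases the same mass, so `rho_before` and `rho_after` agree, while `total_cost(mu_m, c)` is smaller than `total_cost(mu, c)` for this choice of data.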

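We have

\[
\begin{aligned}
\int_{S \times S} (c \wedge N)\,d\mu_m &\le \int_{(S \times S) \setminus E} c\,d\mu_m + N\mu_m E \\
&= \int_{M_0} c_*\,d\mu + \sum_{n=1}^{m} \int_{E_n} \sum_{k=1}^{n+1} c(\Psi_{nk}(x, y))\,\mu(d(x, y)) + N\sum_{n=m+1}^{\infty} \mu E_n.
\end{aligned}
\]

Since $\varphi_n$ is a selection of $\Gamma_n$, the inequality

\[
\sum_{k=1}^{n+1} c(\Psi_{nk}(x, y)) < c^n(x, y) + \varepsilon
\]

holds for all $(x, y) \in E_n$. Now the stated inequality follows from the above one and from the definitions of the sets $M_n$ and $W_n$. □

Proof of Theorem 4.3.4: We assume that $A(c_*, \rho) < +\infty$, since otherwise the statement is obvious. There are two possibilities: Either $A(c_*, \rho) > -\infty$, or $A(c_*, \rho) = -\infty$. Let us fix $\delta > 0$. If $A(c_*, \rho) > -\infty$, we find a measure $\mu \in C(S \times S)_+^*$ such that $(\pi_1 - \pi_2)\mu = \rho$ and

\[
\int_{S \times S} c_*\,d\mu < A(c_*, \rho) + \delta. \tag{4.3.4}
\]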

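If $A(c_*, \rho) = -\infty$, we find a measure $\mu \in C(S \times S)_+^*$ such that $(\pi_1 - \pi_2)\mu = \rho$ and

\[
\int_{S \times S} c_*\,d\mu < -\frac{1}{\delta}. \tag{4.3.5}
\]

Since in both cases $\int_{S \times S} c_*\,d\mu < +\infty$ and the mutually disjoint sets $M_n$, $n = 0, 1, 2, \dots$, form a decomposition of $M$, we have

\[
\int_M c_*\,d\mu = \lim_{m \to \infty} \sum_{n=0}^{m} \int_{M_n} c_*\,d\mu. \tag{4.3.6}
\]

We now take the measures $\mu_m$, $m \in \mathbb{N}$, corresponding to $\mu$ in accordance with Lemma 4.3.6. From (4.3.3) and (4.3.6) it follows that

\[
A(c \wedge N, \rho) \le \lim_{m \to \infty} \int_{S \times S} (c \wedge N)\,d\mu_m \le \int_M c_*\,d\mu - \frac{1}{\varepsilon}\mu W + 2\varepsilon\mu(S \times S). \tag{4.3.7}
\]

If $\mu(W) = 0$, then $\int_M c_*\,d\mu = \int_{S \times S} c_*\,d\mu$. Letting $\varepsilon \to 0$ in (4.3.7) and taking into account the conditions for the choice of $\mu$, we obtain

\[
A(c \wedge N, \rho) < \begin{cases} A(c_*, \rho) + \delta & \text{if } A(c_*, \rho) > -\infty, \\ -\dfrac{1}{\delta} & \text{if } A(c_*, \rho) = -\infty. \end{cases}
\]

Since $\delta > 0$ is arbitrary, in both cases we have

\[
A(c \wedge N, \rho) \le A(c_*, \rho). \tag{4.3.8}
\]

If $\mu(W) > 0$, then $\int_{S \times S} c_*\,d\mu = -\infty$, and passing to the limit in (4.3.7) as $\varepsilon \to 0$, we obtain $A(c \wedge N, \rho) = -\infty$. Thus in this case, (4.3.8) holds as well. To complete the proof, it remains to apply (4.3.1). □

4.3.2 The Nonmetrizable Compact Case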

Let $S$ be an arbitrary (not necessarily metrizable) compact space. The following generalization of Lemma 4.3.1 then holds.

Lemma 4.3.7 Suppose $S$ is a compact space, and suppose that the cost function $c : S \times S \to \mathbb{R} \cup \{+\infty\}$ is bounded below and has $B_0(S \times S)$-analytic sublevel sets

\[
\{(x, y) \in S \times S;\ c(x, y) \le \alpha\}, \qquad \alpha \in \mathbb{R}.
\]

Then for every $\alpha \in \mathbb{R}$ the sets in $S \times S$, $A^n(\alpha)$, $A_n(\alpha)$ ($n \in \mathbb{N}$), and $A_*(\alpha)$, which were defined in Lemma 4.3.1, are also $B_0(S \times S)$-analytic.

The proof is actually the same as that of Lemma 4.3.1 if one takes into account the following properties of the Baire σ-algebras and of the classes $\mathcal{A}B_0(S \times S)$ and $\mathcal{A}B_0(S)$:

I. For any compact $S$, $B_0(S \times S) = B_0(S) \otimes B_0(S)$ (see, for example, Neveu (1965)). (Notice that this can fail for noncompact topological spaces.)

II. For any class $\mathcal{I}$ of subsets in $S$, the class $\mathcal{A}\mathcal{I}$ is stable with respect to the operations of finite and countable union and intersection, and $\mathcal{A}(\mathcal{A}\mathcal{I}) = \mathcal{A}\mathcal{I}$ (see, for example, Neveu (1965), Kuratowski (1966)).

III. For any compact $S_1$, $S_2$, $M \in \mathcal{A}B_0(S_1 \times S_2)$ implies $\pi_{S_1}M \in \mathcal{A}B_0(S_1)$, $\pi_{S_2}M \in \mathcal{A}B_0(S_2)$ (see Levin (1990, Lemma 9.9)).

From Lemma 4.3.7 together with the relations (4.1.12) it follows that under the assumptions of the lemma, all the functions $c^n$, $c_n$ ($n \in \mathbb{N}$), and $c_*$ are universally measurable. Consequently, the sets $W_n$ ($n \in \mathbb{N}$) and $M_n$ ($n = 0, 1, 2, \dots$), which were defined in Section 4.3.1, are also universally measurable, as well as the set $W = \{(x, y);\ c_*(x, y) = -\infty\}$.

As in the metrizable case, we use the convention that $\int_{S \times S} c_*\,d\mu = +\infty$ whenever $\int_{S \times S} c_{*+}\,d\mu = \int_{S \times S} c_{*-}\,d\mu = +\infty$. Then the functional $A(c_*, \rho)$ is well-defined, and $A(c_*, \rho) \le A(c, \rho)$ for all $\rho \in C(S)_0^*$. The following reduction theorem extends Theorem 4.3.4 to the nonmetrizable case.

Theorem 4.3.8 Let $S$ be an arbitrary compact space. Suppose that the cost function $c : S \times S \to \mathbb{R} \cup \{+\infty\}$ is bounded from below and that its sublevel sets, $\{(x, y) \in S \times S;\ c(x, y) \le \alpha\}$, $\alpha \in \mathbb{R}$, are $B_0(S \times S)$-analytic. Also suppose that condition (4.3.1) holds for a given $\rho \in C(S)_0^*$. Then for this $\rho$ the equality (4.3.2) holds.

Corollary 4.3.9 If $S$ is compact and $c : S \times S \to \mathbb{R}$ is bounded with $B_0(S \times S)$-analytic sublevel sets $\{(x, y) \in S \times S;\ c(x, y) \le \alpha\}$, $\alpha \in \mathbb{R}$ (in particular, $c$ may be bounded and Baire measurable), then (4.3.2) holds for all $\rho \in C(S)_0^*$.

We postpone the proof of Theorem 4.3.8 to Section 4.5, where a similar reduction theorem will be established for more general (not necessarily compact) topological spaces.

4.3.3 The Case of the Cost Function $c(x, y) := \min\{s(x', y');\ x', y' \in S',\ f(x') = x,\ f(y') = y\}$

Consider the mass transfer problem with given difference of the marginals (4.1.5), where $S$ is an arbitrary compact space and $c$ is defined by (4.1.20):

\[
c(x, y) := \min\{s(x', y');\ x', y' \in S',\ f(x') = x,\ f(y') = y\},
\]

where $s \in C(S' \times S')$, $f : S' \to S$ is continuous, and $S$ and $S'$ are arbitrary compact spaces. The goal is to prove the following reduction theorem.

Theorem 4.3.10 The equality $A(c, \rho) = A(c_*, \rho)$ holds for all $\rho \in C(S)_0^*$.

We precede the proof of the theorem by several lemmas. First note that the mapping $f : S' \to S$ (cf. (4.1.20)) can be extended to a continuous linear operator $f : C(S')^* \to C(S)^*$, defined by the formula

\[
\int_S u(x)\,(f\rho')(dx) := \int_{S'} u(f(x'))\,\rho'(dx')
\]

for all $\rho' \in C(S')^*$ and all $u \in C(S)$. If the measures $\rho'$ and $f\rho'$ are regarded as real-valued functions on the Borel σ-algebras $B(S')$ and $B(S)$, the above formula simply says that

\[
(f\rho')(B) = \rho'(f^{-1}(B)) \quad \text{for all } B \in B(S).
\]

Similarly, a continuous linear operator $f \times f : C(S' \times S')^* \to C(S \times S)^*$ is defined by

\[
\int_{S \times S} \varphi(x, y)\,(f \times f)\mu'(d(x, y)) := \int_{S' \times S'} \varphi(f(x'), f(y'))\,\mu'(d(x', y'))
\]

for all $\mu' \in C(S' \times S')^*$ and all $\varphi \in C(S \times S)$, or equivalently by

\[
(f \times f)\mu'(M) = \mu'\big((f \times f)^{-1}(M)\big) \quad \text{for all } M \in B(S \times S).
\]

Clearly, the operators $f$ and $f \times f$ are positive; that is, $fC(S')_+^* \subseteq C(S)_+^*$, and $(f \times f)C(S' \times S')_+^* \subseteq C(S \times S)_+^*$. We denote by $\pi_1'$ and $\pi_2'$ the projection operators $C(S' \times S')^* \to C(S')^*$, which are analogous to the corresponding operators $\pi_1$ and $\pi_2$ for $S$. For every $\rho \in C(S)_0^*$ we set

\[
A'(s, \rho) := \inf\Big\{\int_{S' \times S'} s(x', y')\,\mu'(d(x', y'));\ \mu' \in C(S' \times S')_+^*,\ f(\pi_1' - \pi_2')\mu' = \rho\Big\}.
\]

Observe that $f(\pi_1' - \pi_2')\mu' = (\pi_1 - \pi_2)(f \times f)\mu'$ for all $\mu' \in C(S' \times S')^*$.

Lemma 4.3.11 The equality $A(c, \rho) = A'(s, \rho)$ holds for all $\rho \in C(S)_0^*$.

Proof: For every $\mu' \in C(S' \times S')_+^*$ with $f(\pi_1' - \pi_2')\mu' = \rho$, the measure $\mu = (f \times f)\mu'$ belongs to $C(S \times S)_+^*$ and $(\pi_1 - \pi_2)\mu = \rho$, so

\[
\int_{S' \times S'} s(x', y')\,\mu'(d(x', y')) \ge \int_{S' \times S'} c(f(x'), f(y'))\,\mu'(d(x', y')) = \int_{S \times S} c(x, y)\,\mu(d(x, y)).
\]

Consequently, $A'(s, \rho) \ge A(c, \rho)$. To prove the opposite inequality we consider in $S' \times S'$ the set

\[
H = \{(x', y');\ s(x', y') = c(f(x'), f(y'))\}.
\]

From the definition (4.1.20) of the cost function $c$ it follows that $(f \times f)(H) = S \times S$. Let $\mu \in C(S \times S)_+^*$ and $(\pi_1 - \pi_2)\mu = \rho$. Clearly, $c$ is a Borel function (it is lower semicontinuous). Then, by using Lusin's theorem, we obtain a sequence of pairwise disjoint closed sets $F_n \subseteq S \times S$, $n \in \mathbb{N}$, such that

\[
\mu\Big((S \times S) \setminus \bigcup_{k=1}^{n} F_k\Big) < \frac{1}{n}.
\]

Moreover, the restriction of $c$ to $F_n$, $c|_{F_n}$, is continuous on $F_n$, $n \in \mathbb{N}$.



Let us show that the sets $H_n := H \cap (f \times f)^{-1}(F_n)$ are closed in $S' \times S'$. Indeed, fix $n \in \mathbb{N}$ and suppose that a net $(x'_\nu, y'_\nu) \in H_n$ converges to a point $(x', y') \in S' \times S'$. Then

$$(f(x'_\nu), f(y'_\nu)) \in F_n, \qquad (f(x'), f(y')) = \lim_\nu (f(x'_\nu), f(y'_\nu)) \in F_n,$$

and

$$s(x', y') = \lim_\nu s(x'_\nu, y'_\nu) = \lim_\nu c(f(x'_\nu), f(y'_\nu)) = c(f(x'), f(y')),$$

that is, $(x', y') \in H_n$. Thus, $H_n$ is closed, and since $(f \times f)(H) = S \times S$, we have $(f \times f)(H_n) = F_n$, $n \in \mathbb{N}$. Therefore, every positive Radon measure $\mu_n$ on $F_n$ may be represented in the form $\mu_n = (f \times f)\mu'_n$, where $\mu'_n$ is some positive Radon measure on $H_n$. Next, since the mapping $\varphi \mapsto \varphi \circ (f \times f)$ is a linear isometry of $C(F_n)$ onto the closed linear subspace $L$ of $C(H_n)$, defined as the set of functions of the form $\varphi(f(x'), f(y'))$ where $\varphi \in C(F_n)$, every measure $\mu_n \in C(F_n)^*_+$ determines some continuous linear functional $\ell_n \in L^*_+ = \{\ell \in L^* : \langle \varphi, \ell \rangle \ge 0 \text{ for all } \varphi \in L_+\}$. Extending $\ell_n$ to the whole space $C(H_n)$ with preservation of its norm, we get a Radon measure $\mu'_n$ on $H_n$ satisfying $\mu_n = (f \times f)\mu'_n$. This $\mu'_n$ is positive, which follows easily from the remark that $\ell_n \in L^*_+$ and $\|\ell_n\| = \|\mu'_n\|$.

Now let $\mu'_n \ge 0$ be a Radon measure on $H_n$ for which $(f \times f)\mu'_n = \mu|F_n$. We define the measure $\mu' \in C(S' \times S')^*_+$ by

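$$\int_{S' \times S'} g\,d\mu' := \sum_{n=1}^{\infty} \int_{H_n} (g|H_n)\,d\mu'_n$$

for every $g \in C(S' \times S')$. This is well-defined because

$$\Big|\sum_{n=1}^{\infty} \int_{H_n} (g|H_n)\,d\mu'_n\Big| \le \sum_{n=1}^{\infty} \|g\| \int_{H_n} d\mu'_n = \|g\| \sum_{n=1}^{\infty} \mu(F_n) = \|g\| \cdot \|\mu\|.$$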

Further, it is easy to see that $(f \times f)\mu' = \mu$, and hence $f(\pi_1' - \pi_2')\mu' = (\pi_1 - \pi_2)\mu = \rho$. Since $H_n \subseteq H$, we get

$$\int_{S' \times S'} s\,d\mu' = \sum_{n=1}^{\infty} \int_{H_n} (s|H_n)\,d\mu'_n = \sum_{n=1}^{\infty} \int_{H_n} c(f(x'), f(y'))\,\mu'_n(d(x', y')) = \sum_{n=1}^{\infty} \int_{F_n} c\,d(\mu|F_n) = \int_{S \times S} c\,d\mu,$$

and as $\mu \in C(S \times S)^*_+$ is an arbitrary measure with $(\pi_1 - \pi_2)\mu = \rho$, it follows that $A'(s, \rho) \le A(c, \rho)$. The proof is now completed. □

Next define the function $\bar s$ on $S' \times S'$,

$$\bar s(x', y') := \inf_n s_n(x', y'), \tag{4.3.9}$$

where

$$s_n(x', y') := \min\{s(x', y'), s^1(x', y'), \dots, s^n(x', y')\}, \tag{4.3.10}$$

$$s^n(x', y') := \min\Big\{\sum_{k=1}^{n+1} s(w'_{k-1}, z'_k);\ w'_0 = x',\ z'_{n+1} = y',\ w'_k, z'_k \in S',\ f(w'_k) = f(z'_k),\ k = 1, \dots, n\Big\}. \tag{4.3.11}$$

It is easily verified that all $s_n$ and $s^n$ are continuous.

Lemma 4.3.12 One of the following two statements is valid: either (a) $\bar s \equiv -\infty$ or (b) $\bar s \in C(S' \times S')$ and the sequence $s_n$ converges to $\bar s$ uniformly. In the first case $c_* \equiv -\infty$, while in the second case $c_*$ is lower semicontinuous and satisfies the equality

$$c_*(x, y) = \min\{\bar s(x', y');\ f(x') = x,\ f(y') = y\}. \tag{4.3.12}$$

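Proof: We show first that the family of functions $\{s^n : n \in \mathbb{N}\}$ is equicontinuous. To this end we fix a point $(\bar x', \bar y') \in S' \times S'$. Since $S'$ is compact and $s \in C(S' \times S')$, for every $\varepsilon > 0$ there exist neighborhoods $U(\bar x', \varepsilon)$ and $U(\bar y', \varepsilon)$ of $\bar x'$ and $\bar y'$ such that

$$|s(x', z') - s(\bar x', z')| \le \varepsilon, \qquad |s(z', y') - s(z', \bar y')| \le \varepsilon \tag{4.3.13}$$

for all $x' \in U(\bar x', \varepsilon)$, $y' \in U(\bar y', \varepsilon)$, and $z' \in S'$. Take any $x' \in U(\bar x', \varepsilon)$ and $y' \in U(\bar y', \varepsilon)$. For every natural number $n$ there exist points $\bar z'_k, \bar w'_k, z'_k, w'_k$ ($k = 1, \dots, n$) such that

$$f(\bar z'_k) = f(\bar w'_k), \qquad f(z'_k) = f(w'_k), \qquad k = 1, \dots, n,$$

and

$$s^n(\bar x', \bar y') = s(\bar x', \bar z'_1) + \sum_{k=1}^{n-1} s(\bar w'_k, \bar z'_{k+1}) + s(\bar w'_n, \bar y'), \tag{4.3.14}$$

$$s^n(x', y') = s(x', z'_1) + \sum_{k=1}^{n-1} s(w'_k, z'_{k+1}) + s(w'_n, y'). \tag{4.3.15}$$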

k lim c(xν , yν ). ν

Observe that the function on S × S defined by (x, y) ↦ B(c, δx − δy ) = sup{u(x) − u(y); u ∈ Lip (c, S; C(S))} is lower semicontinuous, and since B(c, δx − δy ) = B(c, δx − δy ) ≤ c(x, y)

for all (x, y) ∈ S × S,

we obtain c(x0 , y0 ) > lim c(xν , yν ) ≥ lim B(c, δxν − δyν ) ≥ B(c, δx0 − δy0 ). ν

ν

Inequality (4.4.1) is thus established, which implies $B(c, \delta_{x_0} - \delta_{y_0}) < +\infty$. Now we take a function $v \in \mathrm{Lip}(c, S; \mathbb{U}^b(S))$ and put

$$c'(x, y) := \min(c(x, y) - v(x) + v(y),\ N) + v(x) - v(y),$$

with $N$ chosen from the condition $N > \max(0,\ B(c, \delta_{x_0} - \delta_{y_0}) - v(x_0) + v(y_0))$. Then $c'$ satisfies the triangle inequality. Indeed, if a number $N > 0$ and a nonnegative function $e(x, y)$ satisfying the triangle inequality are given, the function $\min(e(x, y), N)$ satisfies this inequality as well; here $e(x, y) = c(x, y) - v(x) + v(y)$, which is nonnegative because $v \in \mathrm{Lip}(c, S; \mathbb{U}^b(S))$. Also, it is clear that $c' \in \mathbb{U}^b(S \times S)$ and $c'(x, y) \le c(x, y)$ for all $(x, y) \in S \times S$. Setting $u(x) := c'(x, y_0)$, $x \in S$, we see that $u \in \mathrm{Lip}(c, S; \mathbb{U}^b(S))$ and

$$B(c, \delta_{x_0} - \delta_{y_0}; \mathrm{Lip}(c, S; \mathbb{U}^b(S))) \ge u(x_0) - u(y_0) = c'(x_0, y_0) = \min(c(x_0, y_0),\ N + v(x_0) - v(y_0)) > B(c, \delta_{x_0} - \delta_{y_0}).$$

From this and the obvious relation

$$A(c, \rho) \ge B(c, \rho; \mathbb{U}^b(S)) \quad \text{for all } \rho \in C(S)^*_0$$

it follows that $A(c, \delta_{x_0} - \delta_{y_0}) > B(c, \delta_{x_0} - \delta_{y_0})$. This contradicts (4.1.13), and therefore, $c$ is lower semicontinuous on $S \times S$.

Remark 4.4.1 Observe that in the above proof of the implication "$c$ is lower semicontinuous ⇒ (4.1.13)", only the boundedness from below of $c$ was used but not the whole hypothesis that $\mathrm{Lip}(c, S; \mathbb{U}^b(S))$ is nonempty.

4.4.2 Proof of Theorem 4.1.5 (Duality Theorem on a Metrizable Compact Space with a Cost Function Bounded Below)

(a) ⇒ (b). First observe that every function $u \in C(S)$ satisfying

$$u(x) - u(y) \le c(x, y) \quad \text{for all } (x, y) \in S \times S$$

satisfies the similar inequality with $c \wedge N$ instead of $c$ with $N > 0$ large enough. It follows that

$$B(c, \rho) = \lim_{N \to \infty} B(c \wedge N, \rho).$$

Using this simple observation together with the fact that $A(c, \rho)$ is nondecreasing in $c$, we derive

$$A(c, \rho) \ge \lim_{N \to \infty} A(c \wedge N, \rho) \ge \lim_{N \to \infty} B(c \wedge N, \rho) = B(c, \rho).$$

This, combined with the hypothesis that $A(c, \rho) = B(c, \rho)$, implies that $A(c, \rho) = \lim_{N \to \infty} A(c \wedge N, \rho)$ for all $\rho \in C(S)^*_0$, and the necessity of condition (4.1.18) is thus established. Suppose that $c$ (and hence $c_*$) is bounded below. We next prove that $\bar c$ is lower semicontinuous on $S \times S$. By the reduction theorem (Theorem 4.3.4),

$$A(c, \rho) = A(c_*, \rho).$$

Further, $\mathrm{Lip}(c, S; C(S)) = \mathrm{Lip}(c_*, S; C(S))$, because every function $u$ satisfying $u(x) - u(y) \le c(x, y)$ for all $x, y \in S$ satisfies also $u(x) - u(y) \le c_*(x, y)$ for all $x, y \in S$, which is obtained by summing up the inequalities $u(z_{k-1}) - u(z_k) \le c(z_{k-1}, z_k)$, $k = 1, \dots, n + 1$, where $z_0 = x$, $z_{n+1} = y$, and $z_1, \dots, z_n$ are arbitrary points of $S$. It follows that

$$B(c, \rho) = B(c_*, \rho).$$
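On a finite set the summation argument can be made concrete. The sketch below assumes a hypothetical three-point space with the cost given as a matrix, and computes the reduced cost as a min-plus (Floyd–Warshall) closure; this closure is only a finite stand-in for $c_*$, valid when no chain has arbitrarily negative cost. All names and data are illustrative assumptions.

```python
import itertools
import math


def reduced_cost(c):
    """Min-plus closure on a finite set: cheapest total cost over chains x = z_0, ..., z_{n+1} = y.

    Assumes there is no cycle of negative total cost (otherwise the infimum is -infinity).
    """
    m = len(c)
    d = [row[:] for row in c]
    for k in range(m):
        for i in range(m):
            for j in range(m):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def is_lipschitz(u, cost):
    """Check u(x) - u(y) <= cost(x, y) for all pairs (x, y)."""
    return all(u[i] - u[j] <= cost[i][j]
               for i, j in itertools.product(range(len(u)), repeat=2))


# Hypothetical cost on S = {0, 1, 2}; math.inf marks a forbidden direct transfer.
c = [[0, 1, 5],
     [2, 0, 1],
     [math.inf, 4, 0]]
c_star = reduced_cost(c)

u = [2, 1, 0]   # feasible for c, hence also for c_star
v = [5, 0, 0]   # violates v(0) - v(1) <= c(0, 1)
print(c_star, is_lipschitz(u, c), is_lipschitz(u, c_star), is_lipschitz(v, c))
```

In this example the constraint $u(0) - u(2) \le c_*(0, 2) = 2$ is tight and is exactly the sum of the two constraints along the chain $0 \to 1 \to 2$.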


Therefore, the duality relation (4.1.13) can be rewritten as

$$A(c_*, \rho) = B(c_*, \rho).$$

From Lemma 4.3.1 and Example 4.2.4 it follows that $c_*$ is regular with respect to the Banach lattices $W = \mathbb{B}^b(S \times S)$ and $X = \mathbb{B}^b(S)$ of bounded Borel functions on $S \times S$ and on $S$ respectively. Then by Theorem 4.2.6, $\mathrm{Lip}(c_*, S; \mathbb{B}^b(S))$ is nonempty (hence $\mathrm{Lip}(c_*, S; \mathbb{U}^b(S))$ is nonempty), and $A(c_*, \rho; \mathbb{B}^b(S \times S), \mathbb{B}^b(S)) > -\infty$ for all $\rho \in C(S)^*_0$. We get $A(c_*, \rho) \ge A(c_*, \rho; \mathbb{B}^b(S \times S), \mathbb{B}^b(S)) > -\infty$, and from Theorem 4.1.1 applied to the cost function $c_*$ it follows that $\bar c$ is lower semicontinuous on $S \times S$.

(b) ⇒ (a). If $\bar c$ is bounded from below and lower semicontinuous, then it is regular with respect to the Banach lattices $W = C(S \times S)$ and $X = C(S)$ (see Example 4.2.3). By Theorem 4.2.6, $\mathrm{Lip}(\bar c, S; C(S))$ is nonempty and $A(\bar c, \rho) = B(\bar c, \rho) > -\infty$ for all $\rho \in C(S)^*_0$. It is clear that

$$B(c, \rho) = B(c_*, \rho) = B(\bar c, \rho).$$

Also, by Theorem 4.3.4,

$$A(c, \rho) = A(c_*, \rho).$$



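Further, every $\mu \in C(S \times S)^*_+$ with $(\pi_1 - \pi_2)\mu = \rho$ can be associated with the measure $\mu' \in C(S \times S)^*_+$, $(\pi_1 - \pi_2)\mu' = \rho$, given by $\mu'(B) := \mu(B) - \mu(B \cap D)$; here $D$ denotes the diagonal in $S \times S$, $D = \{(x, x);\ x \in S\}$. From the triangle inequality for $c_*$ it follows that $c_* \ge 0$ on $D$. Now, since $\bar c$ differs from $c_*$ only on $D$ (see (4.1.19)), we have

$$\int_{S \times S} c_*\,d\mu' = \int_{S \times S} \bar c\,d\mu' = \int_{S \times S} \bar c\,d\mu \le \int_{S \times S} c_*\,d\mu,$$

which implies

$$A(c_*, \rho) = A(\bar c, \rho) \quad \text{for all } \rho \in C(S)^*_0.$$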

Furthermore, $A(c, \rho) = A(\bar c, \rho) = B(\bar c, \rho) = B(c, \rho) > -\infty$ for all $\rho \in C(S)^*_0$.

If now $c$ (and hence $c_*$) is not bounded below, then $\mathrm{Lip}(c, S; C(S)) = \mathrm{Lip}(c_*, S; C(S))$ is empty. Consequently, $B(c, \rho) = B(c_*, \rho) = -\infty$ for all $\rho \in C(S)^*_0$. Observe that the function $(c \wedge N)_*$ is not bounded from below because $(c \wedge N)_* \le c_*$. But then $(c \wedge N)_* \equiv -\infty$ for every $N > 0$, which is derived easily from the triangle inequality for $(c \wedge N)_*$ combined with its boundedness from above by $N$. Indeed, by taking $(x_n, y_n) \in S \times S$, the condition that $(c \wedge N)_*(x_n, y_n) < -n$, $n \in \mathbb{N}$, implies

$$(c \wedge N)_*(x_n, x_n) \le (c \wedge N)_*(x_n, y_n) + (c \wedge N)_*(y_n, x_n) \le -n + N.$$

Consequently,

$$(c \wedge N)_*(x_n, x_n) < 0 \quad \text{for } n > N,$$

and hence, by the triangle inequality for $(c \wedge N)_*$, $(c \wedge N)_*(x_n, x_n) = -\infty$. Then for every $(x, y) \in S \times S$ we get

$$(c \wedge N)_*(x, y) \le (c \wedge N)_*(x, x_n) + (c \wedge N)_*(x_n, x_n) + (c \wedge N)_*(x_n, y) = -\infty.$$

According to Theorem 4.3.4,

$$A(c \wedge N, \rho) = A((c \wedge N)_*, \rho) = -\infty \quad \text{for all } N > 0.$$

Applying the assumption (4.1.18), we obtain

$$A(c, \rho) = \lim_{N \to \infty} A(c \wedge N, \rho) = -\infty$$

for all $\rho \in C(S)^*_0$. Thus we have

$$A(c, \rho) = B(c, \rho) = -\infty \quad \text{for all } \rho \in C(S)^*_0,$$

and the proof is complete. □

Remark 4.4.2 The proof of Theorem 4.1.6 is similar. The only difference is that Theorem 4.3.8 is to be used instead of Theorem 4.3.4. A more general duality theorem (for noncompact spaces) will be proved in Section 4.5.

4.4.3 Proof of Theorem 4.1.9

According to Theorem 4.3.10,

$$A(c, \rho) = A(c_*, \rho).$$

By Lemma 4.3.12, one of the following two conditions holds: either (a) $c_* \equiv -\infty$ or (b) $c_*$ is bounded and lower semicontinuous on $S \times S$. In case (a), the assertion is obvious, and in case (b), it follows from Theorem 4.2.6 and Example 4.2.3, since

$$A(c_*, \rho) = A(c_*, \rho; C(S \times S), C(S)) = B(c_*, \rho; C(S)) = B(c_*, \rho) = B(c, \rho).$$

Remark 4.4.3 If $S$ is a metrizable compact space, Theorem 4.1.9 is a consequence of Theorem 4.1.5. However, in the nonmetrizable case Theorem 4.1.9 does not derive from Theorem 4.1.6, since a lower semicontinuous function of the type (4.1.20) need not have $\mathcal{B}_0(S \times S)$-analytic sublevel sets. A corresponding counterexample may be found in Levin and Milyutin (1979, pp. 51–52).

4.4.4 Discussion on Condition (4.1.18) and the Reduction Theorems

From Theorems 4.1.1 and 4.1.6 it follows that condition (4.1.18) is satisfied for all $\rho \in C(S)^*_0$, provided that the cost function $c$ is bounded below, lower semicontinuous, and satisfies the triangle inequality. Observe that a direct verification of (4.1.18) for such functions can be very difficult. Similarly, the direct proof that $\mathrm{Lip}(c, S; C(S))$ is nonempty seems to be quite difficult as well.

Next we give a further example of when condition (4.1.18) is satisfied. For the sake of simplicity, we assume $S$ to be metrizable. Suppose that $c : S \times S \to \mathbb{R} \cup \{+\infty\}$ is bounded below and has analytic sublevel sets $\{(x, y);\ c(x, y) \le \alpha\}$, $\alpha \in \mathbb{R}$. For every $N > 0$ put $E_N = \{(x, y);\ c(x, y) > N\}$. Then the following result holds.

Lemma 4.4.4 Let $2\lambda \ge 1$ and $N_0 > 0$. Suppose that for every $\rho \in C(S)^*_0$ there exists a measure $\mu_\rho \in C(S \times S)^*_+$ such that $(\pi_1 - \pi_2)\mu_\rho = \rho$, $\mu_\rho(E_{N_0}) = 0$, and $\|\mu_\rho\| \le \lambda\|\rho\|$. Then $c$ satisfies (4.1.18).

Corollary 4.4.5 Let $\lambda$ and $N_0$ be as in Lemma 4.4.4. Suppose that $c$ is lower semicontinuous and that for every pair of points $x, y \in S$ there exists a chain $(z_j, z_{j+1}) \in (S \times S) \setminus E_{N_0}$ ($j = 0, \dots, k$) such that $z_0 = x$, $z_{k+1} = y$, and $k \le 2\lambda$. Then the assumptions of Lemma 4.4.4 are satisfied, and consequently, (4.1.18) holds.

An extension of these results to nonmetrizable compact spaces is also possible.

Remark 4.4.6 Let $S$ be an arbitrary compact space. Suppose $c$ is universally measurable and the function

$$\bar c(x, y) = \begin{cases} c_*(x, y), & \text{if } x \ne y, \\ 0, & \text{if } x = y \end{cases}$$

is bounded from below and lower semicontinuous on $S \times S$. Then the condition

$$A(c, \rho) = A(c_*, \rho) \tag{4.4.2}$$

is sufficient (and necessary) for the validity of the duality relation

$$A(c, \rho) = B(c, \rho) > -\infty. \tag{4.4.3}$$

This follows immediately from Remark 4.4.2 and the proof of Theorem 4.1.5.

Remark 4.4.7 The validity of (4.4.2) is the assertion of reduction theorems, and condition (4.1.18) is sufficient for (4.4.2) to hold. Also, by Theorems 4.1.5 and 4.1.6, condition (4.1.18) is necessary for the duality relation

$$A(c, \rho) = B(c, \rho)$$

to hold. The following example shows that (4.1.18) is not necessary for (4.4.2) to be true.


Example 4.4.8 Put

$$c(x, y) = \begin{cases} +\infty, & \text{if } x \ne y, \\ -1, & \text{if } x = y. \end{cases}$$

Then

$$c_*(x, y) = \begin{cases} +\infty, & \text{if } x \ne y, \\ -\infty, & \text{if } x = y. \end{cases}$$

Hence $A(c, \rho) = A(c_*, \rho) = +\infty$ for every $\rho \ne 0$ in $C(S)^*_0$ (for $\rho = 0$, (4.4.2) is obvious), whereas $A(c \wedge N, \rho) = -\infty$ for all $N > 0$ and all $\rho \ne 0$ in $C(S)^*_0$; that is, condition (4.1.18) is not satisfied.

In the following example $c$ is a nonnegative lower semicontinuous cost function, for which $\bar c = c_*$ is also lower semicontinuous but (4.4.2) is not satisfied; i.e., the duality relation fails.

Example 4.4.9 Let us take $S = \{1, \tfrac12, \dots, \tfrac1n, \dots, 0\}$ and consider a function $c(x, y)$ taking only two values, 0 and $+\infty$. The cost function $c$ is defined as follows: $c(\tfrac1n, \tfrac1{n+1}) = 0$ for every $n \in \mathbb{N}$, $c(0, 1) = 0$, $c(x, 0) = c(x, x) = 0$ for all $x \in S$, and at all other points $c(x, y) = +\infty$. Then $c_*$ is given by

$$c_*(x, y) = \begin{cases} 0, & \text{if } x \ge y \text{ or } x = 0, \\ +\infty, & \text{otherwise.} \end{cases}$$

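It is obvious that $c$ and $c_*$ are lower semicontinuous. Take $\rho = \delta_0 - \sum_{n \in \mathbb{N}} \lambda_n \delta_{1/n}$, where $\lambda_n > 0$ ($n \in \mathbb{N}$), $\sum_{n \in \mathbb{N}} \lambda_n = 1$, and $\sum_{n \in \mathbb{N}} n\lambda_n = +\infty$. It is easy to see that there is no measure $\mu \in C(S \times S)^*_+$ with $(\pi_1 - \pi_2)\mu = \rho$ and

$$\int_{S \times S} c(x, y)\,\mu(d(x, y)) < +\infty.$$

Consequently, $A(c, \rho) = +\infty$. However,

$$A(c_*, \rho) = \int_{S \times S} c_*(x, y)\,\mu_0(d(x, y)) = 0,$$

where

$$\mu_0 = \sum_{n \in \mathbb{N}} \lambda_n \delta_{(0, 1/n)} \in C(S \times S)^*_+, \qquad (\pi_1 - \pi_2)\mu_0 = \rho.$$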



In this example, (4.1.18) does not hold, as follows from Theorem 4.1.6. This is easily seen directly if one observes that $A(c \wedge N, \rho) = 0$ for all $N > 0$. For details, see Levin and Milyutin (1979, p. 44). In connection with this example the following general remark is of interest.

Remark 4.4.10 Let $S$ be a metrizable compact space and $c : S \times S \to \mathbb{R} \cup \{+\infty\}$ a Borel function. Suppose that $\rho \in C(S)^*_0$ and $\mu \in C(S \times S)^*_+$ are given such that $(\pi_1 - \pi_2)\mu = \rho$ and $\int_{S \times S} c_*\,d\mu = A(c_*, \rho) < +\infty$. Also suppose that $c_*$ is lower semicontinuous. Fix $\varepsilon > 0$ and set

$$M_0 = M_0^\varepsilon = \{(x, y);\ c_*(x, y) = c(x, y)\},$$

$$M_n^\varepsilon = \{(x, y);\ c^n(x, y) \le c_*(x, y) + \varepsilon\} \setminus \bigcup_{k=0}^{n-1} M_k^\varepsilon \quad (n \in \mathbb{N}).$$

It can be shown that the condition

$$\sum_{n \in \mathbb{N}} n\mu(M_n^\varepsilon) < +\infty \quad \text{for every } \varepsilon > 0$$

is sufficient for $A(c, \rho) = A(c_*, \rho)$ to hold.

4.4.5 A Further Duality Theorem

In connection with Lemma 4.3.12 and Theorem 4.3.10 observe that in general, the lower semicontinuity of $c$ does not imply the lower semicontinuity of $c_*$; see Levin and Milyutin (1979, Example 3 in Section 5). Nevertheless, the following result holds.

Theorem 4.4.11 Suppose that $S$ is a metrizable compact space. If $c : S \times S \to \mathbb{R} \cup \{+\infty\}$ is lower semicontinuous and if there exists a function $u \in C(S)$ such that the strict inequality $u(x) - u(y) < c(x, y)$ holds for all $(x, y) \in S \times S$, then $c_*$ is lower semicontinuous as well, and (4.4.2) and (4.4.3) hold.

Proof: We consider on $S \times S$ the function $c'(x, y) := c(x, y) - u(x) + u(y)$. Clearly, $c'$ is lower semicontinuous on $S \times S$, $c'(x, y) > 0$, and $c'_*(x, y) = c_*(x, y) - u(x) + u(y)$. It follows that

$$A(c, \rho) = A(c', \rho) + \int_S u(x)\,\rho(dx)$$

and

$$A(c_*, \rho) = A(c'_*, \rho) + \int_S u(x)\,\rho(dx).$$

Therefore, we may replace $c$ by $c'$ and assume without loss of generality that $c(x, y) > 0$ for all $(x, y) \in S \times S$. Since $c$ is lower semicontinuous and positive on the compact space $S \times S$, there exists $\varepsilon_0 > 0$ such that $c(x, y) \ge \varepsilon_0$ for all $(x, y) \in S \times S$. From this and the definition of the functions $c_n$ (see (4.1.16) and (4.1.17)) it is clear that for every $\alpha > 0$ and every $n > \alpha/\varepsilon_0$ we have

$$\{(x, y);\ c_*(x, y) \le \alpha\} = \{(x, y);\ c_n(x, y) \le \alpha\}. \tag{4.4.4}$$

Observe that all $c_n$ are lower semicontinuous on $S \times S$ since $c$ is. Therefore, by (4.4.4), $c_*$ is lower semicontinuous as well. Obviously, $c_*(x, y) \ge 0$ for all $(x, y) \in S \times S$, and $c_*$ satisfies the triangle inequality. Given a measure $\rho \in C(S)^*_0$, let us show that $A(c, \rho) = A(c_*, \rho)$. According to Theorem 4.2.8 and the equality $A(c_*, \rho) = A(c_*, \rho; C(S \times S), C(S))$ (see Example 4.2.3), there exists a measure $\mu \in C(S \times S)^*_+$ such that $\pi_1\mu = \rho_+$, $\pi_2\mu = \rho_-$, and

$$\int_{S \times S} c_*(x, y)\,\mu(d(x, y)) = A(c_*, \rho).$$

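Assume that $A(c_*, \rho) < +\infty$, since otherwise the relation $A(c, \rho) = A(c_*, \rho)$ is obvious. Then $\mu\{(x, y);\ c_*(x, y) = +\infty\} = 0$, and from (4.4.4) it follows that $S \times S = \bigcup_{n=0}^{\infty} M_n$, where

$$M_0 = \{(x, y);\ c_*(x, y) = c(x, y)\},$$

$$M_n = \{(x, y);\ c_*(x, y) = c^n(x, y) < +\infty\} \setminus \bigcup_{k=0}^{n-1} M_k, \quad n \in \mathbb{N}.$$

Taking into account (4.4.4), we obtain

$$\varepsilon_0 \sum_{n \in \mathbb{N}} n\mu(M_n) \le \sum_{n=0}^{\infty} \int_{M_n} c_*\,d\mu = \int_{S \times S} c_*\,d\mu < +\infty. \tag{4.4.5}$$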

We consider the multifunctions $\Gamma_n : M_n \to 2^{S^n}$, $n \in \mathbb{N}$,

$$\Gamma_n(x, y) := \{(z_1, \dots, z_n);\ c(x, z_1) + \cdots + c(z_n, y) = c^n(x, y)\}.$$

It is easily seen that every $M_n$ is universally measurable, and the graph of $\Gamma_n$ belongs to the σ-algebra $\mathcal{B}(M_n)_\mu \otimes \mathcal{B}(S^n)$ generated by the products $A \times B$, where $A \in \mathcal{B}(M_n)_\mu$ and $B \in \mathcal{B}(S^n)$. According to the measurable selection theorem (see, for example, Aumann (1967) or Levin (1985a)), $\Gamma_n$ has a µ-measurable selection $\varphi_n : M_n \to S^n$; that is, $\varphi_n(x, y) = (\varphi_{n,1}(x, y), \dots, \varphi_{n,n}(x, y)) \in \Gamma_n(x, y)$ for all $(x, y) \in M_n$, and all $\varphi_{n,k} : M_n \to S$, $k = 1, \dots, n$, are µ-measurable. Put $\varphi_{n,0}(x, y) = x$ and $\varphi_{n,n+1}(x, y) = y$ and define the mappings $\Psi_{n,k} : M_n \to S \times S$ by

$$\Psi_{n,k}(x, y) := (\varphi_{n,k-1}(x, y), \varphi_{n,k}(x, y)), \quad k = 1, \dots, n + 1.$$

For every $n \in \mathbb{N}$ let us take the measure $\mu_n \in C(S \times S)^*_+$ defined by $\mu_n(E) := \sum_{k=1}^{n+1} \mu(\Psi_{n,k}^{-1}(E))$ for any Borel set $E \subset S \times S$. We have then $\|\mu_n\| \le (n + 1)\mu(M_n)$. By (4.4.5), $\sum_{n=1}^{\infty} \|\mu_n\| < +\infty$. Consequently, the measure $\mu' := \mu_0 + \sum_{n \in \mathbb{N}} \mu_n$, where $\mu_0(E) := \mu(E \cap M_0)$ for every Borel set $E \subset S \times S$, is well-defined. It is easy to check that $(\pi_1 - \pi_2)\mu' = (\pi_1 - \pi_2)\mu = \rho$ (cf. the proof of Lemma 4.3.6). We obtain

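$$\int_{S \times S} c\,d\mu' = \int_{M_0} c(x, y)\,\mu(d(x, y)) + \sum_{n=1}^{\infty} \int_{M_n} \sum_{k=1}^{n+1} c(\Psi_{n,k}(x, y))\,\mu(d(x, y))$$

$$= \int_{M_0} c\,d\mu + \sum_{n=1}^{\infty} \int_{M_n} c^n\,d\mu = \int_{M_0} c_*\,d\mu + \sum_{n=1}^{\infty} \int_{M_n} c_*\,d\mu = \int_{S \times S} c_*\,d\mu.$$

Thus, $A(c, \rho) = A(c_*, \rho)$, yielding (4.4.2). It remains to note that the duality relation (4.4.3) follows from (4.4.2) (see Remark 4.4.6). □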
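The construction of $\mu'$ from $\mu$ has a transparent finite analogue, in which the measurable selection reduces to picking one optimal chain per pair. The following is a minimal sketch under that assumption: a hypothetical three-point space, a cost matrix, and a Floyd–Warshall closure with chain reconstruction standing in for $c_*$ and the selections $\varphi_n$. None of the names below come from the text.

```python
import math
from collections import defaultdict
from fractions import Fraction


def closure_with_chains(c):
    """Min-plus closure of c plus, for each pair, the first step of an optimal chain."""
    m = len(c)
    d = [row[:] for row in c]
    nxt = [list(range(m)) for _ in range(m)]   # initially the direct step x -> y
    for k in range(m):
        for i in range(m):
            for j in range(m):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
                    nxt[i][j] = nxt[i][k]
    return d, nxt


def chain_edges(nxt, x, y):
    """Consecutive pairs (z_{k-1}, z_k) of the selected optimal chain from x to y."""
    edges = []
    while True:
        z = nxt[x][y]
        edges.append((x, z))
        if z == y:
            return edges
        x = z


def split_along_chains(mu, nxt):
    """Finite analogue of mu': each mass mu(x, y) is copied onto every edge of its chain."""
    mu_prime = defaultdict(Fraction)
    for (x, y), mass in mu.items():
        for edge in chain_edges(nxt, x, y):
            mu_prime[edge] += mass
    return dict(mu_prime)


def marginal_difference(mu):
    """(pi_1 - pi_2) mu as a dict, dropping zero entries."""
    rho = defaultdict(Fraction)
    for (x, y), mass in mu.items():
        rho[x] += mass
        rho[y] -= mass
    return {x: r for x, r in rho.items() if r != 0}


def total_cost(mu, cost):
    return sum(cost[x][y] * mass for (x, y), mass in mu.items())


# Hypothetical data: the direct transfer 2 -> 0 is forbidden (cost +infinity).
c = [[0, 1, 5], [2, 0, 1], [math.inf, 4, 0]]
mu = {(0, 2): Fraction(1, 2), (2, 0): Fraction(1, 4)}

c_star, nxt = closure_with_chains(c)
mu_prime = split_along_chains(mu, nxt)
print(mu_prime, total_cost(mu_prime, c), total_cost(mu, c_star))
```

The split plan has the same marginal difference as the original one, and its cost under $c$ equals the cost of the original plan under the reduced cost, which mirrors the chain of equalities above.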



Remark 4.4.12 Theorem 4.4.11 can be extended to arbitrary (nonmetrizable) compact spaces, provided that sublevel sets of c belong to the Baire σ-algebra B0 (S × S). The proof is practically the same, with the only difference that a stronger measurable selection theorem (see, for example, Evstigneev (1985) or Levin (1987)) should be applied.

4.4.6 Existence of Optimal Solutions

The following existence theorem is a direct consequence of Corollary 4.2.11 and Example 4.2.3.

Theorem 4.4.13 Suppose that $S$ is compact and that $c : S \times S \to \mathbb{R} \cup \{+\infty\}$ is bounded below, is lower semicontinuous, and satisfies the triangle inequality. Then for every $\rho \in C(S)^*_0$ there exists a measure $\mu \in C(S \times S)^*_+$ such that $\pi_1\mu = \rho_+$, $\pi_2\mu = \rho_-$, and

$$\int_{S \times S} c(x, y)\,\mu(d(x, y)) = A(c, \rho).$$

However, an optimal measure need not exist when the cost function does not satisfy the triangle inequality. A simple counterexample follows:

Example 4.4.14 Let $S = [0, 1]$, $c(x, y) = (x - y)^2$, $\rho = \delta_0 - \delta_1$. With

$$\mu_n = \sum_{i=1}^{n} \delta_{(\frac{i-1}{n}, \frac{i}{n})}$$

we have $\mu_n \in C(S \times S)^*_+$, $(\pi_1 - \pi_2)\mu_n = \rho$, and

$$\int_{S \times S} c(x, y)\,\mu_n(d(x, y)) = \frac{1}{n}.$$
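The two claims about $\mu_n$ can be verified with exact arithmetic. The sketch below represents $\mu_n$ as a dictionary of unit point masses; the function names are illustrative and not taken from the text.

```python
from fractions import Fraction


def plan(n):
    """mu_n = sum_{i=1}^n delta_{((i-1)/n, i/n)}, stored as {(x, y): mass}."""
    return {(Fraction(i - 1, n), Fraction(i, n)): 1 for i in range(1, n + 1)}


def cost(mu):
    """Integral of c(x, y) = (x - y)^2 against mu."""
    return sum((x - y) ** 2 * mass for (x, y), mass in mu.items())


def marginal_difference(mu):
    """(pi_1 - pi_2) mu as a dict, dropping zero entries."""
    rho = {}
    for (x, y), mass in mu.items():
        rho[x] = rho.get(x, 0) + mass
        rho[y] = rho.get(y, 0) - mass
    return {x: r for x, r in rho.items() if r != 0}


for n in (1, 4, 10):
    mu_n = plan(n)
    print(n, cost(mu_n), marginal_difference(mu_n))
```

The interior points $i/n$ cancel in the marginal difference, leaving $\delta_0 - \delta_1$, while the cost $n \cdot n^{-2} = 1/n$ tends to zero without ever reaching it.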

Consequently, $A(c, \rho) = 0$. On the other hand, if $\mu \in C(S \times S)^*_+$ and

$$\int_{S \times S} c(x, y)\,\mu(d(x, y)) = 0,$$

then $\operatorname{supp} \mu \subseteq D = \{(x, x);\ x \in S\}$, as $c(x, y) > 0$ outside $D$. Consequently, $(\pi_1 - \pi_2)\mu = 0$; that is, no optimal solution can exist for the mass transfer problem with the given $c$ and $\rho$.

The next result concerns the existence of optimal solutions for the dual problem (cf. Levin (1978a, Theorem 3)).



Theorem 4.4.15 Suppose that $S$ is compact and $c$ is continuous on $S \times S$ and vanishes on the diagonal. Also suppose that $\mathrm{Lip}(c, S; C(S))$ is nonempty (according to Lemma 4.3.12, this is equivalent to the condition that $c_* \ne -\infty$). Then for every $\rho \in C(S)^*_0$, the optimal value $B(c, \rho)$ is attained; that is, there exists a function $u_0 \in \mathrm{Lip}(c, S; C(S))$ such that

$$B(c, \rho) = \int_S u_0(x)\,\rho(dx).$$

Proof: Fix an arbitrary point $x_0 \in S$ and consider the set $M(x_0) := \{u \in \mathrm{Lip}(c, S; C(S));\ u(x_0) = 0\}$. Clearly, $\mathrm{Lip}(c, S; C(S)) = M(x_0) + \mathbb{R} = \{u(\cdot) + \alpha;\ u \in M(x_0),\ \alpha \in \mathbb{R}\}$. Consequently,

$$B(c, \rho) = \sup\Big\{\int_S u(x)\,\rho(dx);\ u \in M(x_0)\Big\}. \tag{4.4.6}$$

For every $u \in M(x_0)$ we have

$$-\|c\|_{C(S \times S)} \le -c(x_0, x) \le u(x) \le c(x, x_0) \le \|c\|_{C(S \times S)}$$

whenever $x \in S$, and

$$-c(y, x) \le u(x) - u(y) \le c(x, y)$$

whenever $(x, y) \in S \times S$. Hence the family of functions $M(x_0)$ is uniformly bounded and equicontinuous (the last holds since $c$ is continuous and vanishes on the diagonal). Also, $M(x_0)$ is closed in $C(S)$. Then from the Arzelà theorem it follows that $M(x_0)$ is compact in $C(S)$. Hence the supremum in (4.4.6) is attained, and the result follows. □

Remark 4.4.16 The assumption that $c$ vanishes on the diagonal cannot be omitted. Indeed, it is easy to see that the optimal value $B(c, \rho)$ is not attained for $S = [0, 1]$, $c \equiv 1$, and $\rho = m_1 - m_2$. Here $m_1$ is the Lebesgue measure on $[0, 1/2]$ and $m_2$ is the Lebesgue measure on $[1/2, 1]$. For details, see Levin (1978a, p. 37).
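A numerical illustration of this remark, offered as a sketch rather than as the argument of Levin (1978a): the hypothetical continuous functions $u_k(x) = \max(-\tfrac12, \min(\tfrac12, k(\tfrac12 - x)))$ have oscillation at most 1, so they satisfy $u_k(x) - u_k(y) \le 1 = c(x, y)$. Their dual values can be computed exactly, because $u_k$ is piecewise linear and the trapezoid rule on its breakpoints is exact.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def u(k, x):
    """Hypothetical test function u_k: a steep ramp from +1/2 to -1/2 around x = 1/2."""
    return max(-HALF, min(HALF, k * (HALF - x)))


def integral(k, a, b):
    """Exact integral of the piecewise linear u_k over [a, b] (trapezoids between breakpoints)."""
    w = Fraction(1, 2 * k)
    pts = sorted({a, b} | {p for p in (HALF - w, HALF, HALF + w) if a < p < b})
    return sum((q - p) * (u(k, p) + u(k, q)) / 2 for p, q in zip(pts, pts[1:]))


def dual_value(k):
    """rho(u_k) for rho = m_1 - m_2 (Lebesgue measure on [0, 1/2] minus that on [1/2, 1])."""
    return integral(k, Fraction(0), HALF) - integral(k, HALF, Fraction(1))


for k in (1, 2, 10, 100):
    print(k, dual_value(k))
```

The values equal $\tfrac12 - \tfrac{1}{4k}$ and increase toward $\tfrac12$; a function reaching $\tfrac12$ would have to equal its maximum on $[0, 1/2]$ and its minimum on $[1/2, 1]$ with a jump of size 1 at $x = 1/2$, which is incompatible with continuity.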


4.5 Duality Theorems for Noncompact Spaces

In this section, based on Levin (1984, 1986, 1987, 1990, 1992), the duality theory for mass transfer problems is extended to noncompact topological spaces.

4.5.1 Statement of the Problems and Formulation of the Duality Theorems

In what follows, $S$ is a completely regular topological space, $C(S)$ is the vector space of continuous real-valued functions on $S$, and $C^b(S)$ is the vector subspace of $C(S)$ consisting of bounded functions, so that $C^b(S)$ is a Banach space with respect to the uniform norm

$$\|u\| := \sup_{x \in S} |u(x)|, \quad u \in C^b(S).$$

As in the compact case, $\mathcal{B}(S)$ denotes the σ-algebra of Borel sets in $S$, and $\mathcal{B}_0(S)$ denotes the σ-algebra of Baire sets in $S$, that is, the σ-algebra generated by the class of sets $\mathcal{F}_0(S) := \{u^{-1}(0);\ u \in C^b(S)\}$. Now we define two classes of topological spaces introduced in Levin (1984).

Definition 4.5.1 We say that a space $S$ belongs to the class $\mathcal{L}_0$ if it is homeomorphic to a Baire subset of some compact space.

Definition 4.5.2 We say that a space $S$ belongs to the class $\mathcal{L}$ if it is homeomorphic to a universally measurable subset of some compact space.

Obviously, $\mathcal{L}_0 \subset \mathcal{L}$. Polish spaces belong to $\mathcal{L}_0$, since every Polish space is homeomorphic to a $G_\delta$ set in a metrizable compact space (see, for example, Kuratowski (1966)). It is also clear that locally compact spaces that are σ-compact at the same time belong to $\mathcal{L}_0$. So both classes, $\mathcal{L}_0$ and $\mathcal{L}$, are wide enough. Some properties of these classes are considered below in Section 4.5.2. In particular, we shall see that the topological product $S_1 \times S_2$ belongs to $\mathcal{L}$ (resp. $S_1 \times S_2 \in \mathcal{L}_0$) for every $S_1, S_2 \in \mathcal{L}$ (resp. $S_1, S_2 \in \mathcal{L}_0$).

Before stating the mass transfer problem and the dual problem, we have to describe classes of measures and cost functions that will be considered. Let $V_+(S)$ denote the set of finite nonnegative inner regular Borel measures on $S$, and let $V(S) := V_+(S) - V_+(S)$



and $V_0(S) := \{\rho \in V(S);\ \rho(S) = 0\}$. Thus, $V(S)$ is a linear space, $V_0(S)$ is a linear subspace, and $V_+(S)$ is a convex cone in $V(S)$. For every $\nu \in V_+(S)$ the ν-completion of the σ-algebra $\mathcal{B}(S)$, $\mathcal{B}(S)_\nu$, is defined as follows: $A \in \mathcal{B}(S)_\nu$ if and only if there exist $B, B_0 \in \mathcal{B}(S)$ with $\nu(B_0) = 0$ and such that $(A \setminus B) \cup (B \setminus A) \subseteq B_0$. A set $A \in \mathcal{B}(S)_\nu$ is called ν-measurable. A set in $S$ is called universally measurable if it is ν-measurable for all $\nu \in V_+(S)$. A function $\varphi : S \to \mathbb{R} \cup \{+\infty\}$ is called Borel (resp. Baire, ν-measurable, universally measurable) if its sublevel sets $\{x \in S;\ \varphi(x) \le \alpha\}$, $\alpha \in \mathbb{R}$, are elements of the corresponding σ-algebras. If $S$ is compact, the above measurability definitions for sets and functions coincide with those given in Section 4.1.2, since in this case $V_+(S) = C(S)^*_+$. Given a measure $\nu \in V_+(S)$ and a universally measurable function $\varphi : S \to \mathbb{R} \cup \{+\infty\}$ bounded below, the integral

$$\nu(\varphi) := \int_S \varphi(x)\,\nu(dx)$$

is well-defined, either finite or $+\infty$. If $\nu \in V(S)$, the finite integral $\nu(\varphi)$ is well-defined for any bounded universally measurable function $\varphi : S \to \mathbb{R}$. Let $S_1, S_2$ be a pair of completely regular spaces. For every measure $\mu \in V_+(S_1 \times S_2)$, the marginals $\pi_1\mu$ and $\pi_2\mu$ are easily seen to be in $V_+(S_1)$ and $V_+(S_2)$ respectively.

We are now in a position to state the noncompact versions of the mass transfer problem (4.1.5) and of the dual problem (4.1.6). They are as follows: Given a completely regular topological space $S$, a measure $\rho \in V_0(S)$, and a universally measurable cost function $c : S \times S \to \mathbb{R} \cup \{+\infty\}$ bounded below, the problem consists in finding the optimal values

$$A(c, \rho) := \inf\{\mu(c);\ \mu \in V_+(S \times S),\ (\pi_1 - \pi_2)\mu = \rho\}$$

and

$$B(c, \rho) := \sup\{\rho(u);\ u \in \mathrm{Lip}(c, S; C^b(S))\}.$$

It is clear that $B(c, \rho) \le A(c, \rho)$ for all $\rho \in V_0(S)$.
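The inequality $B(c, \rho) \le A(c, \rho)$ rests on the identity $\rho(u) = \int (u(x) - u(y))\,\mu(d(x, y))$ for every admissible $\mu$. A minimal finite sketch, assuming a hypothetical three-point space and hand-picked $\mu$ and $u$ (neither is claimed to be optimal):

```python
from fractions import Fraction


def marginal_difference(mu, size):
    """rho = (pi_1 - pi_2) mu as a list indexed by the points 0, ..., size - 1."""
    rho = [Fraction(0)] * size
    for (x, y), mass in mu.items():
        rho[x] += mass
        rho[y] -= mass
    return rho


def is_lipschitz(u, c):
    """Check u(x) - u(y) <= c(x, y) for all pairs."""
    n = len(u)
    return all(u[x] - u[y] <= c[x][y] for x in range(n) for y in range(n))


def primal_value(mu, c):
    """mu(c): total transport cost of the plan mu."""
    return sum(c[x][y] * mass for (x, y), mass in mu.items())


def dual_value(u, rho):
    """rho(u): integral of u against the signed measure rho."""
    return sum(ux * rx for ux, rx in zip(u, rho))


# Hypothetical data on S = {0, 1, 2}.
c = [[0, 1, 5], [2, 0, 1], [7, 4, 0]]
mu = {(0, 1): Fraction(1, 2), (1, 2): Fraction(1, 2), (0, 2): Fraction(1, 4)}
u = [2, 1, 0]

rho = marginal_difference(mu, 3)
print(is_lipschitz(u, c), dual_value(u, rho), primal_value(mu, c))
```

Here $\rho(u) = 3/2 \le 9/4 = \mu(c)$; the gap comes entirely from the mass placed on the pair $(0, 2)$, where $u(0) - u(2) = 2 < 5 = c(0, 2)$.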



Theorem 4.5.3 (Mass transfer problem on completely regular topological spaces) Let $S \in \mathcal{L}$, $\rho \in V_0(S)$, and

$$c(x, y) = \sup_{u \in Q}\,(u(x) - u(y)), \tag{4.5.1}$$

where $Q$ is a nonempty subset in $C^b(S)$. Then the duality relation

$$A(c, \rho) = B(c, \rho) \tag{4.5.2}$$

holds. Furthermore, for any $\sigma_1, \sigma_2 \in V_+(S)$ with $\sigma_1 - \sigma_2 = \rho$, a measure $\mu \in V_+(S \times S)$ exists such that $\pi_1\mu = \sigma_1$, $\pi_2\mu = \sigma_2$, and $\mu(c) = A(c, \rho)$.

Remark 4.5.4 One can see from Theorem 4.5.3 that under its assumption on $c$, the mass transfer problem with a given marginal difference is equivalent to the corresponding problem with given marginals.

Remark 4.5.5 If the value $B(c, \rho)$ is attained for some function $u_0$, then the duality relation (4.5.2) implies the following optimality criterion.

Optimality criterion. Suppose that the optimal value $B(c, \rho)$ is attained. A measure $\mu \in V_+(S \times S)$ with $(\pi_1 - \pi_2)\mu = \rho$ is optimal if and only if there exists a function $u_0 \in C^b(S)$ such that $u_0(x) - u_0(y) \le c(x, y)$ for all $(x, y) \in S \times S$ and $u_0(x) - u_0(y) = c(x, y)$ for $(x, y) \in \operatorname{supp} \mu$. (Any function at which $B(c, \rho)$ is attained may be taken as such a $u_0$.)

A remark to this criterion: Note that for every $\mu \in V_+(S \times S)$ its support is defined as $\operatorname{supp} \mu := (S \times S) \setminus G(\mu)$. Here

$$G(\mu) = \bigcup\{G;\ G \text{ is open and } \mu G = 0\}$$

is the maximal µ-negligible open set in $S \times S$ (the equality $\mu(G(\mu)) = 0$ follows from the inner regularity of $\mu$). Observe also that according to Theorem 4.4.15, the value $B(c, \rho)$ is attained when $S$ is compact, $c \in C(S \times S)_+$, and $c(x, x) = 0$ for all $x \in S$ (in such a case, $c_*$ is continuous and $c_* \ge 0$; hence $\mathrm{Lip}(c, S; C(S))$ is nonempty). For a metric compact space $(S, r)$ and $c = r$ we derive from here the optimality criterion of L.V. Kantorovich for the classical mass transfer problem as it is formulated in Section 4.1.1.

Remark 4.5.6 It follows from Theorems 4.2.6 and 4.2.8 together with Example 4.2.3 that in the case of a compact set $S$, any lower semicontinuous cost function $c : S \times S \to \mathbb{R} \cup \{+\infty\}$ bounded below that satisfies the triangle inequality and vanishes on the diagonal has a representation (4.5.1). Indeed, applying Theorem 4.2.8 and Example 4.2.3 yields

$$A(c, \delta_x - \delta_y) = c(x, y) \quad \text{for all } x, y \in S.$$

We should take into account that for $x \ne y$, $\delta_{(x,y)}$ is the only measure in $C(S \times S)^*_+$ having $\delta_x$ and $\delta_y$ as its marginals (for $x = y$, the equality $A(c, \delta_x - \delta_y) = c(x, y)$ is satisfied as well because both sides equal 0). On the other hand, by Theorem 4.2.6 and Example 4.2.3, $\mathrm{Lip}(c, S; C(S))$ is nonempty, and the duality relation $A(c, \delta_x - \delta_y) = B(c, \delta_x - \delta_y) > -\infty$ holds. Consequently, (4.5.1) holds with $Q = \mathrm{Lip}(c, S; C^b(S))$.

How wide is the class of functions $c$ admitting such a representation in the noncompact case? If $c(x, y)$ is bounded, continuous in $x$ for every $y$ (or continuous in $y$ for every $x$), satisfies the triangle inequality and the equality $c(x, x) = 0$ for all $x \in S$, then this function can be represented in the form (4.5.1) with $Q = \{u_{z,n};\ z \in S,\ n \in \mathbb{N}\}$, where $u_{z,n}(\cdot) = \min(c(\cdot, z), n)$ (resp. $u_{z,n} = \min(-c(z, \cdot), n)$). Any continuous metric on $S$ satisfies the last assumptions, and so we obtain the following noncompact generalization of the classical duality theorem.

Corollary 4.5.7 (The Kantorovich–Rubinstein duality theorem on noncompact spaces) If $S \in \mathcal{L}$ and $c = r$ is a continuous metric on $S$, then the statement of Theorem 4.5.3 holds.

The following result extends Theorem 4.1.1 to noncompact spaces taking into account Remark 4.5.6 (see also Remark 4.5.9 below).

Theorem 4.5.8 (Duality theorem on noncompact spaces with cost function satisfying the triangle inequality) Suppose that $S \in \mathcal{L}$ and that the cost function $c : S \times S \to \mathbb{R} \cup \{+\infty\}$ is universally measurable and satisfies the triangle inequality. Then:

(A) The existence of a nonempty set $Q \subset C^b(S)$ such that the representation (4.5.1) holds for all $x, y \in S$ with $x \ne y$ is sufficient for (4.5.2) to be true for all $\rho \in V_0(S)$.

(B) If there exists a bounded universally measurable function $v : S \to \mathbb{R}$ satisfying $v(x) - v(y) \le c(x, y)$ for all $x, y \in S$, then the preceding condition is also necessary for (4.5.2) to be true for all $\rho \in V_0(S)$.



(C) If $S \in \mathcal{L}_0$ and the sublevel sets of $c$,

$$\{(x, y);\ c(x, y) \le \alpha\}, \quad \alpha \in \mathbb{R}, \tag{4.5.3}$$

are $(\mathcal{B}_0(S) \otimes \mathcal{B}_0(S))$-analytic (in such a case $c$ is universally measurable), then there exists a bounded Baire (hence universally measurable) function $v : S \to \mathbb{R}$ satisfying the assumption of (B).

Remark 4.5.9 If $c$ has a representation (4.5.1) for all $x, y \in S$ with $x \ne y$, then the function

$$\bar c(x, y) = \begin{cases} c(x, y), & \text{if } x \ne y, \\ 0, & \text{if } x = y \end{cases}$$

admits the same representation for all $x, y \in S$. This, combined with Remark 4.5.6, implies that in the case of compact $S$, the assumption of (A) is equivalent to the lower semicontinuity of $\bar c$ on $S \times S$. Thus, Theorem 4.1.1 actually is a particular case of Theorem 4.5.8.

Remark 4.5.10 Assumption (B) is satisfied when $c$ is nonnegative or bounded. In the first case one can take $v = 0$, in the second $v(\cdot) = c(\cdot, z)$ for any $z \in S$.

Theorem 4.5.11 (Duality theorem on noncompact spaces with continuous cost function bounded below) Suppose that $S \in \mathcal{L}$, $\rho \in V_0(S)$, and let $c$ be continuous and bounded below (the triangle inequality for $c$ is not required). Also suppose that there exists a sequence of real numbers $N_k$, $0 < N_k \uparrow +\infty$, such that for each $k$, the function $(c \wedge N_k)(x, y) := \min(c(x, y), N_k)$ can be approximated uniformly on $S \times S$ by finite sums of the form $\sum_j \varphi_{jk}(x)\psi_{jk}(y)$, where $\varphi_{jk}, \psi_{jk} \in C^b(S)$. Then the following statements are equivalent:

(a) The duality relation $A(c, \rho) = B(c, \rho)$ holds.

(b) The equality $A(c, \rho) = \lim_{k \to \infty} A(c \wedge N_k, \rho)$ holds.



Remark 4.5.12 The above hypothesis on $c$ means that all the functions $c \wedge N_k$ can be extended to $\beta S \times \beta S$ preserving continuity. Here $\beta S$ stands for the Stone–Čech compactification of $S$.

Corollary 4.5.13 Suppose that $S \in \mathcal{L}$ and $c \in C(\beta S \times \beta S)$. Then the duality relation $A(c, \rho) = B(c, \rho)$ holds for all $\rho \in V_0(S)$.

This is a generalization of Corollary 4.1.10 to noncompact spaces. We now give an example of a nonnegative continuous function $c(x, y)$ for which the duality relation (4.5.2) fails. The existence of such an example illustrates the substantial difference between compact and noncompact cases.

Example 4.5.14 Let $S = \{0, 1, 2, \dots\}$. Then the σ-algebras $\mathcal{B}(S)$ and $\mathcal{B}(S \times S)$ consist of all subsets in $S$ and in $S \times S$ respectively, and measures $\rho \in V_0(S)$ and $\mu \in V_+(S \times S)$ are determined by their values $\rho_i$, $\mu_{ij}$ on singleton sets. For every $\mu \in V_+(S \times S)$ we have $\mu_{ij} \ge 0$ and $\sum_{i,j} \mu_{ij} < +\infty$. For every $\rho \in V_0(S)$ we have $\sum_i |\rho_i| < +\infty$ and $\sum_i \rho_i = 0$, and the condition that $(\pi_1 - \pi_2)\mu = \rho$ is rewritten as

$$\sum_{i=0}^{\infty} (\mu_{ji} - \mu_{ij}) = \rho_j, \quad j = 0, 1, 2, \dots.$$

Observe that each real-valued function on $S \times S$ is continuous and define $c(i, j) = 0$ for $j = i + 1$, $c(i, j) = j$ for $j \ne i + 1$. Take $\rho = (\rho_i)$, $\rho_0 = 1$, $\rho_i = -\lambda_i$, $i = 1, 2, \dots$, where $\lambda_i > 0$,

$$\sum_{i=1}^{\infty} \lambda_i = 1, \quad \text{and} \quad \sum_{i=1}^{\infty} i\lambda_i = +\infty.$$

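For any $\mu = (\mu_{ij}) \in V_+(S \times S)$ with $(\pi_1 - \pi_2)\mu = \rho$, we have

$$\mu(c) = \sum_{i,j} c(i, j)\mu_{ij} = \sum_{j=1}^{\infty} \sum_{i \ne j-1} j\mu_{ij} = \sum_{j=1}^{\infty} j\Big(\sum_{i=0}^{\infty} \mu_{ij} - \mu_{j-1,j}\Big)$$

$$= \sum_{j=1}^{\infty} j\Big(\lambda_j + \sum_{i=0}^{\infty} \mu_{ji} - \mu_{j-1,j}\Big) \ge \sum_{j=1}^{\infty} j\,(\lambda_j + \mu_{j,j+1} - \mu_{j-1,j}) = \sum_{j=1}^{\infty} (j\lambda_j - \mu_{j-1,j}) = +\infty.$$

Then $A(c, \rho) = +\infty$. At the same time, $B(c, \rho) = 0$, for the constants are the only functions in $\mathrm{Lip}(c, S; C^b(S))$.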


We denote by $\mathcal{C}$ the class of functions $c : S \times S \to \mathbb{R} \cup \{+\infty\}$ that are bounded below and have $\mathcal{B}_0(S) \otimes \mathcal{B}_0(S)$-analytic sublevel sets (4.5.3), and we denote by $\mathcal{C}_*$ the class of functions that admit the representation (4.5.1) for $x \ne y$, with $Q$ being a nonempty subset in $C^b(S)$. Recall that in the case of a compact set $S$, the equality $\mathcal{B}_0(S) \otimes \mathcal{B}_0(S) = \mathcal{B}_0(S \times S)$ holds (see, for example, Neveu (1965)). (In general, the σ-algebra $\mathcal{B}_0(S \times S)$ is broader than $\mathcal{B}_0(S) \otimes \mathcal{B}_0(S)$.) The following duality theorem is a generalization of Theorem 4.1.6 to noncompact spaces.

Theorem 4.5.15 (Duality theorem on noncompact spaces and general cost function) Suppose that $S \in \mathcal{L}_0$ and $c \in \mathcal{C}$. For the duality relation (4.5.2) to be true for all $\rho \in V_0(S)$, it is necessary and sufficient that

(i) the equality

$$A(c, \rho) = \lim_{N \to \infty} A(c \wedge N, \rho) \quad \text{for all } \rho \in V_0(S) \tag{4.5.4}$$

hold;

(ii) either $c_* \in \mathcal{C}_*$ (and this is the case where $A(c, \rho) = B(c, \rho) > -\infty$ for all $\rho \in V_0(S)$), or $c_*$ is unbounded from below (and this is the case where $A(c, \rho) = B(c, \rho) = -\infty$ for all $\rho \in V_0(S)$).

Remark 4.5.16 If $c$ is bounded not only below but also above, then (4.5.4) is trivially satisfied, and the formulation of the theorem admits an obvious simplification.

4.5.2 Properties of the Classes L and L0 and an Extension of the Lusin Separation Theorem

In this section, some auxiliary results concerning the classes L and L0 are presented. These results will be used substantially in proving the duality theorems 4.5.3, 4.5.8, 4.5.11, and 4.5.15.

In what follows, βS denotes the Stone–Čech compactification of a completely regular space S. It is well known that the space βS admits various realizations. It will be convenient for us to use one of them, identifying βS with the Gel'fand compactum of the real commutative Banach algebra C^b(S) or, equivalently, with the set of nonzero multiplicative linear functionals on C^b(S) equipped with the weak* topology. Obviously, each u ∈ C^b(S) is extended uniquely to βS as a continuous function, given by

\[ u(q) = \langle u, q \rangle, \qquad q \in \beta S \]

(every x ∈ S corresponds to q = δx). The space βS is known to be the maximal compactification of S; that is, for every compactification S1 of S, there exists a continuous surjection ϕ : βS → S1 such that ϕ(βS \ S) = S1 \ S (see, for example, Kuratowski (1966) or Gel'fand, Raikov, and Shilov (1964, Chapter 7)).

Lemma 4.5.17 A space S belongs to L if and only if it is universally measurable in βS.

Proof: Clearly, only the "only if" part requires a proof. Thus, suppose S ∈ L; that is, S is homeomorphic to a universally measurable subset of some compact space S1. We assume without loss of generality that S is dense in S1; that is, S1 is a compactification of S. Since βS is the maximal compactification of S, there exists a continuous surjection θ : βS → S1 such that θx = x for all x ∈ S. Let σ ∈ C(βS)*+. Then θ(σ) ∈ C(S1)*+ (where θ(σ)B := σ(θ^{-1}(B)) for all B ∈ B(S1)), and S is θ(σ)-measurable in S1, since it is universally measurable in S1. Consequently, there exists a representation S = A ∪ A0, where A ∈ B(S1), A0 ⊆ B0, B0 ∈ B(S1), and θ(σ)B0 = 0. We obtain A = θ^{-1}(A) ∈ B(βS) (because A ⊆ S and θ is continuous), θ^{-1}(B0) ∈ B(βS), σ(θ^{-1}(B0)) = θ(σ)B0 = 0, and A0 ⊆ θ^{-1}(B0); that is, S is σ-measurable in βS. Since σ was arbitrary, S is universally measurable in βS. The proof is now complete. □

Lemma 4.5.18 If S1 and S2 belong to L, then S1 × S2 is universally measurable in βS1 × βS2, and hence belongs to L.

This follows from the relation

\[ (\beta S_1 \times \beta S_2) \setminus (S_1 \times S_2) = ((\beta S_1 \setminus S_1) \times \beta S_2) \cup (\beta S_1 \times (\beta S_2 \setminus S_2)) \]

and Lemma 4.5.17.

Lemma 4.5.19 If S is universally measurable in a compact space F, then V+(S) = {σ ∈ V+(F) = C(F)*+ ; σ(F \ S) = 0}.

This is an immediate consequence of the definitions.

Lemma 4.5.20 If S1 and S2 belong to L, then V+(S1 × S2) may be identified with the set

\[ \{ \mu \in V_+(\beta S_1 \times \beta S_2) = C(\beta S_1 \times \beta S_2)^*_+ ;\ \pi_i \mu(\beta S_i \setminus S_i) = 0,\ i = 1, 2 \}. \]

Proof: This follows from Lemmas 4.5.18 and 4.5.19. □



Remark 4.5.21 Since every function u ∈ C^b(S) can be extended uniquely to βS as a continuous function, there is a natural linear isometry between C^b(S) and C(βS). Now, by the Riesz representation theorem, we get C^b(S)* = C(βS)* = V(βS). This holds for every completely regular S. If in addition S belongs to L, then by Lemmas 4.5.17 and 4.5.19, V(S) may be identified with a vector subspace of V(βS). In this case, the weak topology σ(V(S), C^b(S)) coincides with the usual weak* topology of the conjugate Banach space C^b(S)* = V(βS) restricted to V(S).

Let S1, S2 be a pair of completely regular spaces. We consider the algebraic tensor product C^b(S1) ⊗ C^b(S2) and denote by C^b(S1) ⊗̂ C^b(S2) its closure in C^b(S1 × S2). Elements of C^b(S1) ⊗ C^b(S2) are functions on S1 × S2 that can be represented as finite sums

\[ \sum_j \varphi_j(x)\,\psi_j(y) \]

with ϕj ∈ C^b(S1), ψj ∈ C^b(S2), while elements of C^b(S1) ⊗̂ C^b(S2) are functions in C^b(S1 × S2) that can be approximated uniformly by such sums. It follows from the Stone–Weierstrass theorem that C^b(S1) ⊗̂ C^b(S2) consists of those functions in C^b(S1 × S2) that can be extended to βS1 × βS2 preserving continuity. We obtain the following:

Lemma 4.5.22 A function c : S1 × S2 → ℝ ∪ {+∞} admits the representation

\[ c(x, y) = \sup\{ h(x, y);\ h \in H \} \tag{4.5.5} \]

with H being a nonempty subset of C^b(S1) ⊗ C^b(S2) if and only if it can be extended to βS1 × βS2 as a lower semicontinuous function.

Remark 4.5.23 It is not difficult to verify that the class of all such functions is exactly the class of lower semicontinuous functions S1 × S2 → ℝ ∪ {+∞} bounded from below. In what follows we make no use of this fact, and so we omit its proof.

Now we pass to the class L0.

Lemma 4.5.24 Let K be a compact set in a completely regular space S. Then B0(K) = {B ∩ K; B ∈ B0(S)}.



Proof: Passing from S to βS and applying Urysohn's lemma, we get F0(K) = {F ∩ K; F ∈ F0(S)}, with F0 denoting the class of null sets of bounded continuous functions. Then B0(K) ⊆ {B ∩ K; B ∈ B0(S)}, and it remains to observe that the opposite inclusion is obvious. □

Lemma 4.5.25 The following statements are equivalent:

(a) S ∈ L0.
(b) S ∈ B0(βS).
(c) B0(S) = {B ∈ B0(βS); B ⊆ S}.

Proof: The implications (c) ⇒ (b) ⇒ (a) are easy to check. Let us show that (a) ⇒ (c). Since every u ∈ C^b(S) is extended uniquely to βS as a continuous function, we have F0(S) = {F ∩ S; F ∈ F0(βS)}. Then

\[ B_0(S) = \{ B \cap S;\ B \in B_0(\beta S) \}. \tag{4.5.6} \]

Since, by assumption, S ∈ L0, there exists a compact space Z such that S ∈ B0(Z). Denote by S1 the closure of S in Z. Then S1 is a compactification of S, and applying Lemma 4.5.24 to K = S1 and the completely regular (compact) space Z yields S ∈ B0(S1). Since βS is the maximal compactification of S, there exists a continuous surjection ϕ : βS → S1 such that ϕ(βS \ S) = S1 \ S. Then ϕ^{-1}(S) = S, and S ∈ B0(βS) by the (B0(βS) − B0(S1))-measurability of the continuous mapping ϕ. Now (c) follows from (4.5.6). □

Lemma 4.5.26 If S1 and S2 belong to L0, then S1 × S2 ∈ B0(βS1 × βS2), and hence it belongs to L0 as well.

This is a direct consequence of Lemma 4.5.25 when taking into account the equality B0(βS1 × βS2) = B0(βS1) ⊗ B0(βS2), which follows from the compactness of βS1 and βS2.



Lemma 4.5.27 Let S1 and S2 belong to L0 and A ∈ A(B0(S1) ⊗ B0(S2)). Then

\[ \pi_1 A \in \mathcal{A}B_0(S_1) \quad\text{and}\quad \pi_2 A \in \mathcal{A}B_0(S_2), \]

where π1 and π2 denote the natural projection mappings onto S1 and S2 respectively.

Proof: Both assertions are symmetric, so we shall prove only the first of them. Applying Lemma 4.5.25 and the relation B0(βS1) ⊗ B0(βS2) = B0(βS1 × βS2), we obtain A ∈ A(B0(βS1) ⊗ B0(βS2)) = AB0(βS1 × βS2). Then it is known (see, for example, Levin (1992, Lemma 2.11)) that A ∈ AF0(βS1 × βS2). In other words, A can be represented in the form

\[ A = \bigcup_{(n_k) \in \mathbb{N}^{\mathbb{N}}} \ \bigcap_{k \in \mathbb{N}} F(n_1, \ldots, n_k), \]

where all F(n1, . . . , nk) ∈ F0(βS1 × βS2). Passing, if required, from F(n1, . . . , nk) to F′(n1, . . . , nk) := F(n1) ∩ F(n1, n2) ∩ · · · ∩ F(n1, . . . , nk), we assume without loss of generality that the sets F(n1, . . . , nk) satisfy the following condition: For all (n1, . . . , nk+1) ∈ ℕ^{k+1} and all k ∈ ℕ,

\[ F(n_1, \ldots, n_k, n_{k+1}) \subseteq F(n_1, \ldots, n_k). \tag{4.5.7} \]

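We obtain then

\[ \pi_1 A = \pi_{\beta S_1} A = \bigcup_{(n_k) \in \mathbb{N}^{\mathbb{N}}} \pi_{\beta S_1} \Big( \bigcap_{k \in \mathbb{N}} F(n_1, \ldots, n_k) \Big) = \bigcup_{(n_k) \in \mathbb{N}^{\mathbb{N}}} \ \bigcap_{k \in \mathbb{N}} \pi_{\beta S_1} F(n_1, \ldots, n_k). \]

The last equality follows from the compactness of all F(n1, . . . , nk) and (4.5.7). If we show now that

\[ \pi_{\beta S_1} F(n_1, \ldots, n_k) \in F_0(\beta S_1), \]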


this will imply that πβS1 A is the result of the application of the A-operation to the sets πβS1 F(n1, . . . , nk) ∩ S1 ∈ F0(S1), and thereby the required assertion will be proved. Since F := F(n1, . . . , nk) ∈ F0(βS1 × βS2), there exists a function u ∈ C(βS1 × βS2) such that F = {(x, y); u(x, y) = 0}. Then πβS1 F = {x; u1(x) = 0}, where u1(x) := min{|u(x, y)|; y ∈ βS2} ∈ C(βS1), and hence πβS1 F ∈ F0(βS1). □

The proof of the next lemma is analogous and so is omitted.

Lemma 4.5.28 Suppose that S1, S2, S3 ∈ L0 and A ∈ A(B0(S1) ⊗ B0(S2) ⊗ B0(S3)). Then πS1×S2 A ∈ A(B0(S1) ⊗ B0(S2)).

The following result extends the classical Lusin separation theorem to nonmetrizable spaces.

Theorem 4.5.29 Let S ∈ L0, A1, A2 ∈ AB0(S), and A1 ∩ A2 = Ø. Then there exist sets B1, B2 ∈ B0(S) such that B1 ⊇ A1, B2 ⊇ A2, and B1 ∩ B2 = Ø.

Proof: First we prove the theorem for a compact S and then reduce the general statement to the compact case. Thus, let S be a compact space. We have the representation

\[ A_i = \bigcup_{(n_k) \in \mathbb{N}^{\mathbb{N}}} \ \bigcap_{k \in \mathbb{N}} F_i(n_1, \ldots, n_k), \qquad i = 1, 2, \]

with Fi(n1, . . . , nk) ∈ F0(S) for all (n1, . . . , nk) ∈ ℕ^k, k ∈ ℕ, and

\[ F_i(n_1, \ldots, n_k, n_{k+1}) \subseteq F_i(n_1, \ldots, n_k), \qquad i = 1, 2 \tag{4.5.8} \]

(cf. the proof of Lemma 4.5.27). Since A1 ∩ A2 = Ø, we have

\[ \bigcap_{k \in \mathbb{N}} \big( F_1(n_1, \ldots, n_k) \cap F_2(m_1, \ldots, m_k) \big) = \emptyset \]

for all sequences (nk), (mk) ∈ ℕ^ℕ. From the compactness of all the Fi and the inclusions (4.5.8), it follows that for any (nk), (mk) ∈ ℕ^ℕ, there is a k ∈ ℕ, possibly depending on the sequences, such that

\[ F_1(n_1, \ldots, n_k) \cap F_2(m_1, \ldots, m_k) = \emptyset. \tag{4.5.9} \]



Consider now the topological product

\[ D = \prod_{\beta} D_\beta. \]

Here Dβ := {0, 1} for all β = (α, i); α = (n1, . . . , nk) ranges over the set of finite sequences of natural numbers, i ∈ {1, 2}. Also, we consider the mapping h : S → D, h(x) := (hβ(x)), where hβ is the indicator (the characteristic function) of Fβ := Fi(n1, . . . , nk); that is, hβ(x) = 1 for x ∈ Fβ, and hβ(x) = 0 for x ∉ Fβ.

It is easily seen that for every Borel set H ⊆ D the set h^{-1}(H) belongs to the σ-algebra generated by the Fβ. Moreover, for every β0 = (α, i) = (n1, . . . , nk, i) the set h(Fβ0) is the trace on h(S) of the Borel (open–closed) set in D,

\[ B_{\beta_0} = B_i(n_1, \ldots, n_k) := \{ z = (z_\beta) \in D;\ z_{\beta_0} = 1 \}, \]

with h^{-1}(Bβ0) = Fβ0, and hence h^{-1}h(Fβ0) = Fβ0. We fix k and put, for each finite sequence (n1, . . . , nk) ∈ ℕ^k,

\[ M_1(n_1, \ldots, n_k) := B_1(n_1, \ldots, n_k) \setminus D(n_1, \ldots, n_k), \]

with D(n1, . . . , nk) designating the union of all sets B2(m1, . . . , mk) for which B1(n1, . . . , nk) ∩ B2(m1, . . . , mk) ∩ h(S) = Ø. We have

\[ M_1(n_1, \ldots, n_k) \in B(D), \qquad h^{-1}(M_1(n_1, \ldots, n_k)) = F_1(n_1, \ldots, n_k), \]

and if for some (m1, . . . , mk) ∈ ℕ^k the set M1(n1, . . . , nk) ∩ B2(m1, . . . , mk) has no common points with h(S), then it is empty. Denote by M1 and M2 the results of the A-operation over the systems of sets M1(n1, . . . , nk) and B2(n1, . . . , nk) respectively. We have then

\[ M_1, M_2 \in \mathcal{A}B(D), \qquad h^{-1}(M_1) = A_1, \qquad h^{-1}(M_2) = A_2. \]

Fix an arbitrary pair of sequences (mk), (nk) ∈ ℕ^ℕ and choose k such that (4.5.9) holds. Then

\[ M_1(n_1, \ldots, n_k) \cap B_2(m_1, \ldots, m_k) \cap h(S) = \emptyset, \]

and so M1(n1, . . . , nk) ∩ B2(m1, . . . , mk) = Ø. Since the chosen sequences are arbitrary, this implies that M1 and M2 do not intersect. Then, by the Lusin separation theorem (which is applicable because D is a metrizable compact space), there exists a Borel set H ⊆ D such that M1 ⊆ H ⊆ D \ M2. We get A1 ⊆ B ⊆ S \ A2, where B := h^{-1}(H) ∈ B0(S), and the result follows with B1 = B, B2 = S \ B. Thus for the case of compact S, the theorem is proved.

We now consider the general case. It follows from the definition of the A-operation and Lemma 4.5.25 that A1, A2 ∈ AB0(βS). According to the compact version of the theorem just proved, there exists a set B ∈ B0(βS) such that A1 ⊆ B ⊆ βS \ A2. Then B1 = B ∩ S and B2 = S \ B1 are the desired sets. □

4.5.3 Proofs of Theorem 4.5.3 (Mass Transfer Problem on Completely Regular Topological Space) and Theorem 4.5.8 (Duality Theorem on Noncompact Spaces)

Proof of Theorem 4.5.3: Every u ∈ Q can be uniquely extended to βS while preserving continuity. Then according to formula (4.5.1), c is extended to βS × βS as a lower semicontinuous function satisfying the triangle inequality. Since this extended function c is regular with respect to the Banach lattices C(βS × βS) and C(βS) (see Example 4.2.3), from Theorem 4.2.6 it follows that A(c, ρ; C(βS × βS), C(βS)) = B(c, ρ; C(βS)). By Theorem 4.2.8, there exists a measure µ0 ∈ C(βS × βS)*+ = V+(βS × βS) such that π1µ0 = ρ+, π2µ0 = ρ−, and A(c, ρ; C(βS × βS), C(βS)) = µ0(c). Now, by using Lemma 4.5.19 and the fact that ρ+ and ρ− belong to V+(S), we get µ0((βS × βS) \ (S × S)) = 0. Then by Lemma 4.5.20, A(c, ρ; C(βS × βS), C(βS)) = A(c, ρ), which combined with the obvious equality B(c, ρ; C(βS)) = B(c, ρ) implies the duality relation (4.5.2). Finally, if σ1, σ2 ∈ V+(S) and σ1 − σ2 = ρ, then σ := σ1 − ρ+ = σ2 − ρ− ∈ V+(S).



Defining for every A ∈ B(S × S)

\[ \mu(A) := \mu_0(A) + \sigma\{ x;\ (x, x) \in A \} \]

and applying the equality c(x, x) = 0 for all x ∈ S, we obtain

\[ \mu(c) = \mu_0(c) = A(c, \rho), \qquad \pi_1 \mu = \pi_1 \mu_0 + \sigma = \sigma_1, \qquad \pi_2 \mu = \pi_2 \mu_0 + \sigma = \sigma_2. \]

This completes the proof. □

Proof of Theorem 4.5.8: (A) Put

\[ \bar{c}(x, y) = \sup_{u \in Q} (u(x) - u(y)) \]

for all x, y ∈ S. Clearly, c̄(x, y) = c(x, y) for x ≠ y, and c(x, x) ≥ 0 = c̄(x, x) for all x ∈ S. By Theorem 4.5.3,

\[ A(\bar{c}, \rho) = B(\bar{c}, \rho) \quad \text{for all } \rho \in V_0(S), \]

and it remains to note that B(c̄, ρ) = B(c, ρ) and A(c̄, ρ) = A(c, ρ).

(B) Assuming (4.5.2), we have to show that a representation (4.5.1) exists for c. Suppose the contrary. Then

\[ c(x_0, y_0) > \sup_{u \in \mathrm{Lip}(c, S; C^b(S))} (u(x_0) - u(y_0)) = B(c, \delta_{x_0} - \delta_{y_0}) \]

for some x0, y0 ∈ S with x0 ≠ y0, where as usual, δx stands for the Dirac measure at x. We put

\[ c'(x, y) := \min(c(x, y) - v(x) + v(y),\ N) + v(x) - v(y), \]

where N is chosen to be large enough: N > max(0, B(c, δx0 − δy0) − v(x0) + v(y0)). Clearly, c′ is bounded and universally measurable. Further, c′(x, y) ≤ c(x, y) for all (x, y) ∈ S × S, and c′ satisfies the triangle inequality; this follows from the nonnegativity of c(x, y) − v(x) + v(y) (cf. the proof of Theorem 4.1.1).



Setting u(x) = c′(x, y0) and denoting by X the lattice of bounded universally measurable functions on S, we see that u ∈ Lip(c, S; X), and furthermore,

\begin{align*}
B(c, \delta_{x_0} - \delta_{y_0}; X) &\ge u(x_0) - u(y_0) = c'(x_0, y_0) \\
&= \min(c(x_0, y_0),\ N + v(x_0) - v(y_0)) > B(c, \delta_{x_0} - \delta_{y_0}).
\end{align*}

From this and the obvious inequality

\[ A(c, \rho) \ge B(c, \rho; X) \quad \text{for all } \rho \in V_0(S) \]

it follows that A(c, δx0 − δy0) > B(c, δx0 − δy0), a contradiction with (4.5.2).

(C) From Lemma 4.5.27 and Theorem 4.5.29 it follows that c is regular with respect to IB0(S × S) and IB0(S), where IB0 stands for the Banach lattice of bounded Baire functions on the corresponding space. Then, by Theorem 4.2.6, the set Lip(c, S; IB0(S)) is nonempty, and the proof is complete. □

4.5.4 Reduction Theorem (a Noncompact Version)

In order to prove Theorem 4.5.11 (the duality theorem on noncompact spaces with continuous cost function bounded below) and Theorem 4.5.15 (the duality theorem on noncompact spaces and general cost function), a noncompact version of the reduction theorem extending Theorem 4.3.8 is required. The theorem is as follows:

Theorem 4.5.30 (Noncompact reduction theorem) Let S ∈ L0 and c ∈ C (cf. Section 4.5.1), and suppose that the condition

\[ A(c, \rho) = \lim_{N \to \infty} A(c \wedge N, \rho) \tag{4.5.10} \]

holds for a given ρ ∈ V0(S). Then

\[ A(c, \rho) = A(c_*, \rho) \tag{4.5.11} \]

holds.
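To make the reduction (4.5.11) concrete, here is a minimal finite sketch in Python. It rests on an assumption not spelled out in this section: on a finite set with nonnegative costs and c(x, x) = 0, the reduced cost c∗ is taken to be the infimum of chain sums c(x, z1) + · · · + c(zn, y), computed here with the Floyd–Warshall recursion. The function names and the three-point cost matrix are hypothetical and serve only as an illustration.

```python
def reduced_cost(c):
    """Infimum of chain sums for a finite cost matrix (Floyd-Warshall)."""
    n = len(c)
    d = [row[:] for row in c]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d


def plan_cost(c, mu):
    """Total cost sum_{i,j} c(i, j) * mu_{ij} of a transshipment plan mu."""
    n = len(c)
    return sum(c[i][j] * mu[i][j] for i in range(n) for j in range(n))


def marginal_difference(mu):
    """The measure (pi_1 - pi_2) mu: row sums minus column sums."""
    n = len(mu)
    return [sum(mu[i]) - sum(mu[k][i] for k in range(n)) for i in range(n)]


# Illustrative cost: going from 0 to 1 directly costs 5, via point 2 it costs 2.
c = [[0, 5, 1],
     [5, 0, 1],
     [1, 1, 0]]
c_star = reduced_cost(c)

# Plan shipping one unit 0 -> 2 -> 1; its marginal difference is delta_0 - delta_1.
mu = [[0, 0, 1],
      [0, 0, 0],
      [0, 1, 0]]

# Dual potential u(x) = c_star(x, y0) with y0 = 1.
u = [c_star[x][1] for x in range(3)]

if __name__ == "__main__":
    print(c_star[0][1], plan_cost(c, mu), marginal_difference(mu), u)
```

In this toy case the plan through the intermediate point attains the cost c∗(0, 1), and the potential u satisfies u(x) − u(y) ≤ c(x, y) with u(0) − u(1) equal to the same value, so the primal and dual values for δ0 − δ1 coincide.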



The following result is an immediate consequence of the theorem.

Corollary 4.5.31 If c ∈ C is bounded, then (4.5.11) holds for all ρ ∈ V0(S).

In order to prove Theorem 4.5.30, we shall need the following nontraditional measurable selection theorem.

Measurable selection theorem. Suppose that S is a compact space, (Ω, T, µ) is a complete finite measure space, and Γ : Ω → 2^S is a set-valued mapping with a (T ⊗ B0(S))-analytic graph. Then dom Γ := {ω; Γ(ω) ≠ Ø} ∈ T. Moreover, there exists a (T − B(S))-measurable selection for Γ, that is, a measurable mapping ϕ : (dom Γ, T) → (S, B(S)) satisfying ϕ(ω) ∈ Γ(ω) for all ω ∈ dom Γ.

A statement of this theorem with a brief sketch of its proof may be found in a paper by Evstigneev (1985). In a paper by Levin (1987) a complete proof of a more general result is presented, from which this theorem is derived. We precede the proof of Theorem 4.5.30 with two lemmas. Set

\begin{align*}
W &:= \{ (x, y);\ c_*(x, y) = -\infty \}, \\
A_*(\alpha) &:= \{ (x, y);\ c_*(x, y) < \alpha \}, \\
A_n(\alpha) &:= \{ (x, y);\ c_n(x, y) < \alpha \}, \\
A^n(\alpha) &:= \{ (x, y);\ c^n(x, y) < \alpha \},
\end{align*}

where α ∈ ℝ and n ∈ ℕ, and the functions c_n and c^n are defined by equations (4.1.16) and (4.1.17).

Lemma 4.5.32 All these sets are (B0(S) ⊗ B0(S))-analytic.

The proof is the same as that of Lemma 4.3.1 and of Corollary 4.3.3 if one takes into account Lemma 4.5.28.

Corollary 4.5.33 The functions c_n, c^n (n ∈ ℕ), and c∗ are universally measurable.



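We put M = (S × S) \ W, fix ε > 0, and consider in S × S the subsets

\begin{align*}
M_0 &= \{ (x, y);\ c(x, y) = c_*(x, y) \}; \\
M_n &= \{ (x, y);\ c_n(x, y) \le c_*(x, y) + \varepsilon \} \setminus \bigcup_{k=0}^{n-1} M_k, \qquad n \in \mathbb{N}; \\
W_1 &= W \cap A_1\Big( -\frac{1}{\varepsilon} \Big); \\
W_n &= W \cap A_n\Big( -\frac{1}{\varepsilon} \Big) \setminus \bigcup_{k=1}^{n-1} W_k, \qquad n = 2, 3, \ldots.
\end{align*}

By Lemma 4.5.32, all these sets are universally measurable. Also, they are pairwise disjoint, and

\[ \bigcup_{n=0}^{\infty} M_n = M, \qquad \bigcup_{n=1}^{\infty} W_n = W. \]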

We put En := Mn ∪ Wn, n ∈ ℕ.

Lemma 4.5.34 Let µ ∈ V+(S × S). For every m ∈ ℕ there exists a measure µm ∈ V+(S × S) such that (π1 − π2)µm = (π1 − π2)µ, and for every N > 0 the inequality (4.3.3) holds.

Proof: (cf. the proof of Lemma 4.3.6) For every n ∈ ℕ we consider a set-valued mapping Γn : En → 2^{S^n}, defined by

\[ \Gamma_n(x, y) := \{ (z_1, \ldots, z_n);\ c(x, z_1) + \cdots + c(z_n, y) < c_n(x, y) + \varepsilon \}. \]

It is easily seen that its graph gr Γn is (B(En)µ ⊗ B0(S) ⊗ · · · ⊗ B0(S))-analytic, where the factor B0(S) is repeated n times. Regarding Γn as a set-valued mapping En → 2^{(βS)^n}, using the equality

\[ B_0(\beta S) \otimes \cdots \otimes B_0(\beta S) = B_0((\beta S)^n) \]

(see, for example, Neveu (1965)) and applying Lemmas 4.5.25 and 4.5.26, we obtain gr Γn ∈ A(B(En)µ ⊗ B0((βS)^n)).



Further, it is clear that dom Γn = En. By the measurable selection theorem, there exists a (B(En)µ − B((βS)^n))-measurable mapping ϕn : En → (βS)^n such that ϕn(x, y) = (ϕn1(x, y), . . . , ϕnn(x, y)) ∈ Γn(x, y) for all (x, y) ∈ En. We set

\[ \varphi_{n0}(x, y) := x, \qquad \varphi_{n,n+1}(x, y) := y, \]

and consider the mappings ψnk : En → βS × βS defined by

\[ \psi_{nk}(x, y) := (\varphi_{n,k-1}(x, y),\ \varphi_{nk}(x, y)), \qquad k = 1, \ldots, n + 1. \]

We regard ψnk as mappings into βS × βS, although their values actually lie in S × S, since ϕn is a selection of Γn. Clearly, all the mappings ψnk are (B(En)µ − B0(βS) ⊗ B0(βS))-measurable; that is to say, they are measurable with respect to the σ-algebras B(En)µ and B0(βS) ⊗ B0(βS) = B0(βS × βS) (moreover, they are even (B(En)µ − B(βS × βS))-measurable).

We now construct the required measure µm. We shall first determine it as a Radon measure on βS × βS and then show that in fact it belongs to V+(S × S). We set

\[ \mu_m(B) := \mu(B \cap M_0) + \sum_{n=1}^{m} \sum_{k=1}^{n+1} \mu(\psi_{nk}^{-1}(B)) + \mu\Big( B \cap \bigcup_{n=m+1}^{\infty} E_n \Big) \tag{4.5.12} \]

for every B ∈ B0(βS × βS). It is well-defined, because the sets ψnk^{-1}(B) are µ-measurable. Clearly, µm is a finite positive measure on B0(βS × βS). It is a well-known measure-theoretical fact (see, for example, Neveu (1965) or Dinculeanu (1967)) that µm can be extended uniquely to a positive Radon measure on βS × βS, which we shall denote by µm as well.

Let us verify that the extended measure µm has all needed properties. By Lemma 4.5.26, (βS × βS) \ (S × S) ∈ B0(βS × βS). Then, by (4.5.12), µm((βS × βS) \ (S × S)) = 0. Applying Lemma 4.5.19, we see that µm ∈ V+(S × S). Further, it is a straightforward argument to verify that

\[ \int_S u \, d(\pi_1 - \pi_2)\mu_m = \int_S u \, d(\pi_1 - \pi_2)\mu \]

for every u ∈ C^b(S) = C(βS); that is, (π1 − π2)µm = (π1 − π2)µ. We claim that (4.3.3) is satisfied. First observe that equation (4.5.12) is true for every B belonging to the µm-completion of the σ-algebra B0(βS × βS), in particular, for B ∈ A(B0(S) ⊗ B0(S)). Taking this into account together with the condition c ∈ C, we obtain

\begin{align*}
\int_{S \times S} (c \wedge N) \, d\mu_m &\le \int_{(S \times S) \setminus E} c \, d\mu_m + N \mu_m(E) \\
&= \int_{M_0} c_* \, d\mu + \sum_{n=1}^{m} \int_{E_n} \sum_{k=1}^{n+1} c(\psi_{nk}(x, y)) \, d\mu + N \sum_{n=m+1}^{\infty} \mu(E_n),
\end{align*}

with

\[ E := \bigcup_{n=m+1}^{\infty} E_n. \]

Since ϕn is a selection of Γn, we have

\[ \sum_{k=1}^{n+1} c(\psi_{nk}(x, y)) < c_n(x, y) + \varepsilon \quad \text{for all } (x, y) \in E_n. \]

Now (4.3.3) follows from the above inequality and the definition of the sets Mn and Wn. The proof of Lemma 4.5.34 is now complete. □

Remark 4.5.35 At first glance, extending µm from the Baire to the Borel σ-algebra is superfluous, since formula (4.5.12) is well-defined for every B ∈ B(βS × βS). However, if µm is defined by (4.5.12) for Borel sets, then an additional question arises whether the defined Borel measure is Radon, that is, whether it is inner regular. The above proof avoids this difficulty.

Proof of Theorem 4.5.30: The proof is actually the same as that of Theorem 4.3.4, provided that V+(S × S) and Lemma 4.5.34 are used instead of C(S × S)*+ and Lemma 4.3.6. □

4.5.5 Proofs of Theorem 4.5.11 (Duality Theorem on Noncompact Spaces with Continuous Cost Function Bounded Below) and Theorem 4.5.15 (Duality Theorem on Noncompact Spaces and General Cost Function)

Lemma 4.5.36 Suppose that S is compact and c ∈ C(S × S). Then c∗ is either continuous or identically equal to −∞.



This is a particular case of Lemma 4.3.12, where S = S and f is the identical mapping on S.

Proof of Theorem 4.5.11: (a) ⇒ (b). Every u ∈ C^b(S) satisfying the inequalities u(x) − u(y) ≤ c(x, y) for all x, y ∈ S satisfies the analogous inequalities with c ∧ N in place of c, when N > 0 is large enough. Consequently,

\[ B(c, \rho) = \lim_{N \to \infty} B(c \wedge N, \rho). \]

Next, by using the monotonicity of A in its first argument, we get

\begin{align*}
A(c, \rho) &\ge \limsup_{k \to \infty} A(c \wedge N_k, \rho) \ge \liminf_{k \to \infty} A(c \wedge N_k, \rho) \\
&\ge \lim_{k \to \infty} B(c \wedge N_k, \rho) = B(c, \rho),
\end{align*}

which implies (b).

(b) ⇒ (a). The functions c ∧ Nk can be extended uniquely to βS × βS preserving continuity. Hence c ∧ Nk ∈ C, and by Theorem 4.5.30, A(c ∧ Nk, ρ) = A((c ∧ Nk)∗, ρ). There are two ways to obtain the function (c ∧ Nk)∗. The first way is to calculate it directly in S × S. The second is to calculate it first in βS × βS (this means that the corresponding points z1, . . . , zn are taken in βS rather than in S) and then to consider the restriction to S × S of the function thus calculated. Since S is dense in βS and c ∧ Nk is continuous, both approaches yield the same result. Now, by Lemma 4.5.36, (c ∧ Nk)∗ is either continuous on βS × βS or identically equal to −∞. If (c ∧ Nk)∗ is continuous, then the function that coincides with (c ∧ Nk)∗ for x ≠ y and equals 0 for x = y is lower semicontinuous and can be represented in the form (4.5.1) (see Remark 4.5.6). Hence, by Theorem 4.5.8 (A), A((c ∧ Nk)∗, ρ) = B((c ∧ Nk)∗, ρ). Consequently, A(c ∧ Nk, ρ) = B((c ∧ Nk)∗, ρ) = B(c ∧ Nk, ρ); here the right-hand equality follows from the obvious relation Lip(c ∧ Nk, S; C^b(S)) = Lip((c ∧ Nk)∗, S; C^b(S)). If (c ∧ Nk)∗ ≡ −∞, then

\[ A(c \wedge N_k, \rho) = A((c \wedge N_k)_*, \rho) = -\infty, \]



and at the same time, Lip(c ∧ Nk, S; C^b(S)) = Lip((c ∧ Nk)∗, S; C^b(S)) = Ø. Hence B(c ∧ Nk, ρ) = B((c ∧ Nk)∗, ρ) = −∞. Thus, in this case, the equality A(c ∧ Nk, ρ) = B(c ∧ Nk, ρ) holds as well. We thus arrive at

\[ A(c, \rho) = \lim_{k \to \infty} A(c \wedge N_k, \rho) = \lim_{k \to \infty} B(c \wedge N_k, \rho) = B(c, \rho). \]

□

Proof of Theorem 4.5.15: Necessity. The validity of (4.5.4) is verified by the same argument as the analogous equality in the proof of the implication (a) ⇒ (b) of Theorem 4.5.11. Next, by Theorem 4.5.30, (4.5.11) holds for all ρ ∈ V0(S). Therefore, the duality relation (4.5.2) can be rewritten as

\[ A(c_*, \rho) = B(c_*, \rho) \quad \text{for all } \rho \in V_0(S). \tag{4.5.13} \]

If c∗ is bounded below (hence > −∞), then it satisfies the triangle inequality. Besides, by Lemma 4.5.32, we have

\[ \{ (x, y);\ c_*(x, y) \le \alpha \} = \bigcap_{n \in \mathbb{N}} \Big\{ (x, y);\ c_*(x, y) < \alpha + \frac{1}{n} \Big\} \in \mathcal{A}(B_0(S) \otimes B_0(S)) \quad \text{for } \alpha \in \mathbb{R}. \]

Then, by assertions (C) and (B) of Theorem 4.5.8, c∗ ∈ C∗.

Sufficiency. It follows from (4.5.4) and Theorem 4.5.30 that (4.5.11) holds for all ρ ∈ V0(S). If c∗ ∈ C∗, then from Theorem 4.5.8 (A) we derive (4.5.13), which, combined with (4.5.4), is equivalent to (4.5.2). If c∗ is unbounded from below, then Lip(c∗, S; C^b(S)) = Ø, whence B(c∗, ρ) = −∞ for all ρ ∈ V0(S). Besides, in this case (c ∧ N)∗ is unbounded from below as well, because (c ∧ N)∗ ≤ c∗. But then (c ∧ N)∗ ≡ −∞ for all N > 0, which follows easily from the fact that (c ∧ N)∗ is bounded above and satisfies the triangle inequality (for a more detailed argument see the proof of the implication (b) ⇒ (a) of Theorem 4.1.5). Then, by Theorem 4.5.30,

\[ A(c \wedge N, \rho) = A((c \wedge N)_*, \rho) = -\infty \quad \text{for all } \rho \in V_0(S). \]

Applying (4.5.4), we get A(c, ρ) = −∞ for all ρ ∈ V0(S). Thus, in this case, (4.5.2) holds too. □


4.6 Duality Theorems for Infinite Linear Programs Related to the Mass Transfer Problem

In this section, based on papers of Levin and Milyutin (1979), Levin (1984, 1990), and Levin and Rachev (1989), duality theorems in mass settings are proved for linear extremal problems related to the mass transfer problem.

4.6.1 Duality Theory for an Abstract Scheme of Infinite-Dimensional Linear Programs and Its Application to the Mass Transfer Problem

Let (X, X′) and (Z, Z′) be two pairs of real vector spaces in duality. Let a : Z → 2^X be a superlinear mapping; that is,

\[ a(Z) := \bigcup_{z \in Z} a(z) \]

is a convex cone in X, or what is the same,

\[ a(z_1 + z_2) \supseteq a(z_1) + a(z_2) \quad \text{for all } z_1, z_2 \in Z, \]

and

\[ a(\alpha z) = \alpha a(z) \quad \text{for all } \alpha > 0 \text{ and all } z \in Z. \]

Suppose that some semilinear space E ⊂ X is given consisting of functionals c : a(Z) → ℝ ∪ {+∞} that are additive (c(x1 + x2) = c(x1) + c(x2) for all x1, x2 ∈ a(Z)) and positively homogeneous (c(αx) = αc(x) for all x ∈ a(Z) and all α ∈ ℝ+). By definition, the semilinearity of E amounts to

\[ c_1 + c_2 \in E \quad \text{whenever } c_1, c_2 \in E \]

and

\[ \alpha c \in E \quad \text{whenever } c \in E \text{ and } \alpha \in \mathbb{R}_+. \]

We assume that dom a := {z ∈ Z; a(z) ≠ Ø} is nonempty and define 0 · (+∞) = 0, whence 0 · c = 0 for any c ∈ E. The superlinearity of a implies that dom a is a convex cone in Z and that for every z ∈ dom a, a(z) is a convex set in X.



Given z ∈ Z and c ∈ E, two linear extremal problems arise, which consist in finding the optimal values

\[ V(c, z) := \inf\{ c(x);\ x \in a(z) \} \]

and

\[ v(c, z) := \sup\{ \langle z, z' \rangle;\ z' \in a'(c) \}, \]

where

\[ a'(c) := \{ z' \in Z';\ \langle z, z' \rangle \le c(x) \ \text{for all } z \in \operatorname{dom} a,\ x \in a(z) \}. \]

We assume, by definition, that V(c, z) = +∞ if z ∉ dom a, and that v(c, z) = −∞ if a′(c) = Ø. From these definitions it follows that v(c, z) ≤ V(c, z) whenever c ∈ E and z ∈ Z. The duality problem in a mass setting is to describe the class of functionals c in E for which the duality relation V(c, z) = v(c, z) holds whenever z ∈ dom a. Of course, for various concrete examples of the problem the class will depend on the choice of the enveloping semilinear space E. The following abstract duality theorem gives a general answer to this question.

Theorem 4.6.1 (Abstract duality theorem)

(I) If V(c, z0) = −∞ for some z0 ∈ dom a, then

\[ V(c, z) = v(c, z) = -\infty \quad \text{for all } z \in \operatorname{dom} a. \]

(II) If V(c, z) > −∞ for all z ∈ dom a and the functional V(c, ·) is lower semicontinuous on Z in the weak topology σ(Z, Z′), then

\[ V(c, z) = v(c, z) > -\infty \quad \text{for all } z \in Z. \]

(III) If V(c, z) > −∞ for all z ∈ dom a, and V(c, ·) is not σ(Z, Z′)-lower semicontinuous at some point z0 ∈ Z, then V(c, z0) > v(c, z0).


Corollary 4.6.2 If V(c, z) > −∞ for all z ∈ Z, then the duality relation

\[ V(c, z) = v(c, z) \quad \text{for all } z \in Z \]

holds if and only if V(c, ·) is σ(Z, Z′)-lower semicontinuous on Z.

Remark 4.6.3 The weak topology σ(Z, Z′) in Theorem 4.6.1 and Corollary 4.6.2 can be replaced by any other topology consistent with the duality between Z and Z′.

Proof of Theorem 4.6.1: (I) Clearly, V(c, ·) is a sublinear functional dom a → ℝ ∪ {+∞}, that is,

\[ V(c, \alpha z) = \alpha V(c, z) \quad \text{for all } z \in \operatorname{dom} a \text{ and all } \alpha > 0, \]

and

\[ V(c, z_1 + z_2) \le V(c, z_1) + V(c, z_2) \quad \text{for all } z_1, z_2 \in \operatorname{dom} a. \]

Then V(c, z0) = −∞ implies V(c, z) = −∞ for all z ∈ dom a, and moreover, V(c, z) ≥ v(c, z) whenever z ∈ Z, so the result follows.

(II) Since V(c, z) > −∞ for all z ∈ dom a, V(c, ·) is a sublinear functional Z → ℝ ∪ {+∞},

\[ V(c, \alpha z) = \alpha V(c, z) \quad \text{for all } z \in Z \text{ and all } \alpha > 0, \]

and

\[ V(c, z_1 + z_2) \le V(c, z_1) + V(c, z_2) \quad \text{for all } z_1, z_2 \in Z. \]

Then dom V(c, ·) = dom a, and since V(c, ·) is σ(Z, Z′)-lower semicontinuous on Z, dom V(c, ·) is σ(Z, Z′)-closed in Z. Consequently, 0 ∈ dom V(c, ·), and hence V(c, 0) = 0 and V(c, ·) is sublinear in the usual sense. It is a standard fact in convex analysis (see, for example, Ioffe and Tihomirov (1979) or Levin (1985a)) that the σ(Z, Z′)-lower semicontinuity of the sublinear functional V(c, ·) on Z implies that the subdifferential

\[ \partial V(c, \cdot)(0) := \{ z' \in Z';\ \langle z, z' \rangle \le V(c, z) \ \text{for all } z \in Z \} \]

is nonempty. Furthermore, for every z ∈ Z, the equality

\[ V(c, z) = \sup\{ \langle z, z' \rangle;\ z' \in \partial V(c, \cdot)(0) \} \]

holds. It remains to observe that ∂V(c, ·)(0) = a′(c). Consequently,

\[ \sup\{ \langle z, z' \rangle;\ z' \in \partial V(c, \cdot)(0) \} = v(c, z) \quad \text{for all } z \in Z. \]

(III) Recall now another fact from convex analysis (see, for example, Levin (1985a)): A proper sublinear functional p : Z → ℝ ∪ {+∞} is σ(Z, Z′)-lower semicontinuous at a point z0 ∈ Z if and only if the equality

\[ p(z_0) = \sup\{ \langle z_0, z' \rangle;\ z' \in \partial p(0) \} \]

holds. Applying this to the functional V(c, ·) completes the proof. □

As a special case of the abstract duality theorem we shall now consider the noncompact version of the mass transfer problem with given difference of the marginals as stated in Section 4.5. Let S ∈ L0 and take Z = V(S), Z′ = C^b(S), X = V(S × S), X′ = C^b(S × S),

\[ a(\rho) := \{ \mu \in V_+(S \times S);\ \pi_1 \mu - \pi_2 \mu = \rho \} \quad \text{for all } \rho \in V(S). \]

It is clear that a is a superlinear mapping V(S) → 2^{V(S×S)}, and dom a = V0(S). Further, we take E = C, where C denotes the class of functions c : S × S → ℝ ∪ {+∞} that are bounded below and have B0(S) ⊗ B0(S)-analytic sublevel sets (4.5.3), and

\[ c(\mu) := \int_{S \times S} c(x, y) \, \mu(d(x, y)) \quad \text{for all } c \in C \text{ and } \mu \in V_+(S \times S). \]

Lemma 4.6.4 (Noncompact version of the mass transfer problem with given difference of the marginals, that is, the general Kantorovich–Rubinstein mass transshipment problem) For every c ∈ C, the equality a′(c) = Lip(c, S; C^b(S)) holds.


Proof: We start with

\[ a'(c) = \Big\{ u \in C^b(S);\ \int_S u(x) \, \rho(dx) \le c(\mu) \ \text{for all } \rho \in V_0(S) \text{ and all } \mu \in a(\rho) \Big\}. \]

Substituting ρ = δx − δy and µ = δ(x,y) for all x, y ∈ S and taking into account that µ ∈ a(ρ) yields

\begin{align*}
a'(c) &\subseteq \{ u \in C^b(S);\ \langle u, \delta_x - \delta_y \rangle \le c(\delta_{(x,y)}) \ \text{for all } x, y \in S \} \\
&= \{ u \in C^b(S);\ u(x) - u(y) \le c(x, y) \ \text{for all } x, y \in S \} \\
&= \mathrm{Lip}(c, S; C^b(S)).
\end{align*}

On the other hand, u ∈ Lip(c, S; C^b(S)) implies

\[ \int_S u(x) \, \rho(dx) = \int_{S \times S} (u(x) - u(y)) \, \mu(d(x, y)) \le \int_{S \times S} c(x, y) \, \mu(d(x, y)) \]

whenever ρ ∈ V0(S) and µ ∈ a(ρ); that is, Lip(c, S; C^b(S)) ⊆ a′(c). □
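The second half of the proof of Lemma 4.6.4 can be checked on a finite set. The Python sketch below is an illustration only, under assumptions chosen here: S = {0, . . . , n − 1}, the cost c(i, j) = |i − j|, the potential u(i) = i (which satisfies u(i) − u(j) ≤ c(i, j)), and randomly generated nonnegative plans µ. The names `pairing` and `check_weak_duality` are hypothetical. For each plan it verifies that the pairing of u with the marginal difference equals Σ (u(i) − u(j))µij and does not exceed c(µ).

```python
import random


def pairing(u, rho):
    """The pairing <u, rho> = sum_i u(i) * rho_i on a finite set."""
    return sum(ui * ri for ui, ri in zip(u, rho))


def check_weak_duality(n=6, trials=200, seed=0):
    """Verify <u, (pi_1 - pi_2) mu> <= c(mu) for random nonnegative plans mu."""
    rng = random.Random(seed)
    c = [[abs(i - j) for j in range(n)] for i in range(n)]
    u = [float(i) for i in range(n)]  # u(i) - u(j) <= |i - j| for all i, j
    for _ in range(trials):
        mu = [[rng.random() for _ in range(n)] for _ in range(n)]
        # Marginal difference: row sums minus column sums.
        rho = [sum(mu[i]) - sum(mu[k][i] for k in range(n)) for i in range(n)]
        lhs = pairing(u, rho)
        mid = sum((u[i] - u[j]) * mu[i][j] for i in range(n) for j in range(n))
        rhs = sum(c[i][j] * mu[i][j] for i in range(n) for j in range(n))
        if abs(lhs - mid) > 1e-9 or mid > rhs + 1e-9:
            return False
    return True


if __name__ == "__main__":
    print(check_weak_duality())
```

The check only confirms the inclusion Lip(c, S; C^b(S)) ⊆ a′(c) in a discrete toy setting; it says nothing about whether the supremum in the dual problem is attained.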

Remark 4.6.5 It follows from Lemma 4.6.4 that v(c, ρ) = B(c, ρ) whenever c ∈ C, ρ ∈ V(S), and since the equality V(c, ρ) = A(c, ρ) is obvious, Theorem 4.5.15 may be regarded as a detailed description of those c ∈ C for which A(c, ·) is lower semicontinuous on V(S) in the weak topology σ(V(S), C^b(S)).

4.6.2 Duality Theorems for the Mass Transfer Problem with Given Marginals

The next application of the abstract duality theorem leads to a general duality result for the mass transfer problem with given marginals. Given a space S ∈ L, a universally measurable cost function c : S × S → ℝ ∪ {+∞} bounded below, and measures σ1, σ2 ∈ V+(S) with σ1 S = σ2 S, the problem is to find the optimal value

$$C(c;\sigma_1,\sigma_2) := \inf\Big\{ \int_{S\times S} c(x,y)\,\mu(d(x,y));\ \mu \in V_+(S\times S),\ \pi_1\mu = \sigma_1,\ \pi_2\mu = \sigma_2 \Big\}.$$

The dual problem is to find the optimal value

$$D(c;\sigma_1,\sigma_2) := \sup\Big\{ \int_S u_1(x)\,\sigma_1(dx) - \int_S u_2(x)\,\sigma_2(dx);\ u_1, u_2 \in C^b(S),\ u_1(x) - u_2(y) \le c(x,y) \text{ for all } (x,y) \in S\times S \Big\}$$

(cf. (4.1.8) and (4.1.9)). As is mentioned in Section 4.1, the original Monge–Kantorovich problem with metric cost function and with given marginal difference is equivalent to the corresponding problem with given marginals. From this equivalence the Kantorovich equality

$$C(c;\sigma_1,\sigma_2) = B(c,\varrho) \quad \text{for all } \varrho = \sigma_1 - \sigma_2 \in C(S)^*_0 \tag{4.6.1}$$

follows for a metric compact space (S, r) and c = r (see Kantorovich (1942) and Kantorovich and Akilov (1984)). The problem with given marginals and the equality (4.6.1) have many stochastic applications. In view of this, it is of interest to describe the cost functions c satisfying the Kantorovich equality (4.6.1). The following theorem answers this question in a rather general situation; see Levin (1990).

Theorem 4.6.6 (Necessary and sufficient conditions for the equality of Monge–Kantorovich and Kantorovich–Rubinstein minimal functionals) Let S ∈ L and let a function c : S × S → ℝ ∪ {+∞} be bounded below and universally measurable. Then (4.6.1) holds if and only if c has a representation (4.5.1) with Q being a nonempty subset of C^b(S).

Proof: Sufficiency follows from Theorem 4.5.3. To prove necessity, we take ϱ = δx − δy for any x, y ∈ S. Since the only measure µ ∈ V+(S × S) with π1µ = δx, π2µ = δy is the Dirac measure µ = δ(x,y), we have

$$C(c;\delta_x,\delta_y) = c(x,y) > -\infty, \tag{4.6.2}$$

which combined with (4.6.1) gives us a representation (4.5.1), where Q = Lip(c, S; C^b(S)). □

The next result supplements Theorem 4.6.6 in the case where S is a compact space.

Theorem 4.6.7 Let S be a compact space and let c be lower semicontinuous and satisfy the equality c(x, x) = 0 for all x ∈ S. Then the following assertions are equivalent:


(a) The equality A(c, ϱ) = C(c; σ1, σ2) holds for all ϱ ∈ V0(S) and all σ1, σ2 ∈ V+(S) with σ1 − σ2 = ϱ.

(b) c satisfies the triangle inequality.

Proof: (b) ⇒ (a) follows from Theorem 4.6.6 and Remark 4.5.6, while (a) ⇒ (b) follows from (4.6.2) and the fact that A(c, ·) is sublinear; hence A(c, δx − δy) ≤ A(c, δx − δz) + A(c, δz − δy) for all x, y, z ∈ S. □
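The equivalence in Theorem 4.6.7 can be illustrated on a finite set. The sketch below assumes a finite space S = {0, ..., n−1}, a nonnegative cost matrix with zero diagonal, and the standard fact that in this finite setting the transshipment value A(c, δx − δy) is the cost of the cheapest chain from x to y. The function names `satisfies_triangle` and `cheapest_chain` and both cost matrices are hypothetical, chosen only for illustration.

```python
from itertools import permutations

def satisfies_triangle(c):
    """True if c(x, y) <= c(x, z) + c(z, y) for all points x, y, z."""
    n = len(c)
    return all(c[x][y] <= c[x][z] + c[z][y]
               for x in range(n) for y in range(n) for z in range(n))

def cheapest_chain(c, x, y):
    """Minimum of c(x, z1) + ... + c(zk, y) over all chains from x to y
    through distinct intermediate points (brute force, small n only)."""
    others = [z for z in range(len(c)) if z not in (x, y)]
    best = c[x][y]  # the direct step, i.e. the value C(c; delta_x, delta_y)
    for k in range(1, len(others) + 1):
        for mid in permutations(others, k):
            path = (x,) + mid + (y,)
            cost = sum(c[a][b] for a, b in zip(path, path[1:]))
            best = min(best, cost)
    return best

# Illustrative costs on three points (assumed data).
c_metric = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]   # triangle inequality holds
c_broken = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]   # c(0, 2) > c(0, 1) + c(1, 2)

for name, c in (("metric", c_metric), ("broken", c_broken)):
    same = all(cheapest_chain(c, x, y) == c[x][y]
               for x in range(3) for y in range(3))
    print(name, satisfies_triangle(c), same)
```

For the first matrix the cheapest chain never beats the direct step, so the two values agree; for the second, routing through the middle point is strictly cheaper than c(0, 2), which is how a violation of the triangle inequality separates the two functionals.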

The problem with given marginals is a particular case of the following marginal problem.

The general Monge–Kantorovich mass transportation problem: Given a pair of spaces S1, S2 ∈ L, a cost function c : S1 × S2 → ℝ ∪ {+∞}, and measures σ1 ∈ V+(S1), σ2 ∈ V+(S2) with σ1 S1 = σ2 S2, one has to minimize the functional

$$c(\mu) = \int_{S_1\times S_2} c(x,y)\,\mu(d(x,y)) \tag{4.6.3}$$

subject to the constraints

$$\mu \in V_+(S_1\times S_2),\quad \pi_1\mu = \sigma_1,\quad \pi_2\mu = \sigma_2. \tag{4.6.4}$$

The dual Monge–Kantorovich problem: The dual problem is to maximize the functional

$$\int_{S_1} u_1(x)\,\sigma_1(dx) - \int_{S_2} u_2(y)\,\sigma_2(dy) \tag{4.6.5}$$

over the set of pairs (u1, u2) ∈ C^b(S1) × C^b(S2) satisfying

$$u_1(x) - u_2(y) \le c(x,y) \quad \text{for all } (x,y) \in S_1\times S_2. \tag{4.6.6}$$
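A small numerical sketch may help fix the roles of (4.6.3)–(4.6.4) and (4.6.5)–(4.6.6). It assumes two-point spaces S1 = S2 = {0, 1}, so that every transport plan is determined by the single number t = µ({(0, 0)}) and a linear cost is minimized at an endpoint of the feasible interval for t. The helper names and the data (c, s1, s2, u1, u2) are hypothetical.

```python
from fractions import Fraction as F

def primal_value(c, s1, s2):
    """Optimal value of (4.6.3)-(4.6.4) on two-point spaces.
    Assumes s1 and s2 have the same total mass."""
    lo = max(F(0), s1[0] - s2[1])   # smallest admissible mass on (0, 0)
    hi = min(s1[0], s2[0])          # largest admissible mass on (0, 0)
    best = None
    for t in (lo, hi):              # linear objective: check endpoints only
        mu = [[t, s1[0] - t], [s2[0] - t, s1[1] - s2[0] + t]]
        cost = sum(c[x][y] * mu[x][y] for x in range(2) for y in range(2))
        best = cost if best is None else min(best, cost)
    return best

def is_dual_feasible(u1, u2, c):
    """Constraint (4.6.6): u1(x) - u2(y) <= c(x, y) for all (x, y)."""
    return all(u1[x] - u2[y] <= c[x][y] for x in range(2) for y in range(2))

def dual_value(u1, u2, s1, s2):
    """Dual functional (4.6.5)."""
    return (sum(u1[x] * s1[x] for x in range(2))
            - sum(u2[y] * s2[y] for y in range(2)))

# Illustrative data (assumed).
c = [[0, 3], [2, 1]]
s1 = (F(1, 2), F(1, 2))
s2 = (F(1, 4), F(3, 4))
u1, u2 = (0, -2), (0, -3)

print(primal_value(c, s1, s2), is_dual_feasible(u1, u2, c),
      dual_value(u1, u2, s1, s2))
```

Any feasible pair gives a lower bound for the primal value; the pair chosen here is tight on the support of the optimal plan, so the two values coincide, as the duality relation predicts for a continuous cost on a finite space.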

The optimal values of problems (4.6.3)–(4.6.4) and (4.6.5)–(4.6.6) will be denoted by C(c; σ1, σ2) and D(c; σ1, σ2) respectively. Note that for S1 = S2 = S we get the mass transfer problem with given marginals and its dual problem respectively. Problem (4.6.3)–(4.6.4) is well-defined for every universally measurable cost function that is bounded below. It will be convenient to consider this problem for all σ1 ∈ V+(S1) and σ2 ∈ V+(S2), assuming by definition that C(c; σ1, σ2) = +∞ if σ1 S1 ≠ σ2 S2. This is rather natural, since the constraint set (4.6.4) is empty when σ1 S1 ≠ σ2 S2. Problem (4.6.5)–(4.6.6) is well-defined for every function c bounded below and for every σ1 ∈ V+(S1), σ2 ∈ V+(S2). It is obvious that always −∞ < D(c; σ1, σ2) ≤ C(c; σ1, σ2). The next theorem characterizes cost functions c for which the duality relation C(c; σ1, σ2) = D(c; σ1, σ2) holds for all (σ1, σ2) ∈ V+(S1) × V+(S2) (see Levin (1984; 1990, Th. 9.9)).

Theorem 4.6.8 (Necessary and sufficient conditions for duality in the Monge–Kantorovich problem) Let S1, S2 ∈ L and let c : S1 × S2 → ℝ ∪ {+∞} be universally measurable and bounded below. The following statements are equivalent:

(a) c has a representation (4.5.5), or equivalently (see Remark 4.5.23), c is lower semicontinuous on S1 × S2.

(b) The duality relation

$$C(c;\sigma_1,\sigma_2) = D(c;\sigma_1,\sigma_2) \tag{4.6.7}$$

holds for all σ1 ∈ V+(S1), σ2 ∈ V+(S2).

Moreover, if σ1 S1 = σ2 S2, then the optimal value C(c; σ1, σ2) is attained; that is, there exists an optimal measure for problem (4.6.3)–(4.6.4).

Remark 4.6.9 The marginal problem of the form (4.6.3)–(4.6.4) has been studied under different assumptions by many authors. In Chapters 2 and 3 we have provided some historical remarks about the general development of this problem including general nontopological versions of the duality. The restriction to lower semicontinuous cost functions is a consequence of the restriction to continuous functions in the dual functional that is used in this chapter. For the case of compact spaces S1 and S2, the duality relation (4.6.7) was given in Levin (1974, 1978a).

Proof: We derive the equivalence (a) ⇔ (b) from the abstract duality Theorem 4.6.1. To this end, take X = V(S1 × S2), X′ = C^b(S1) ⊗ C^b(S2)


(for the definition of this space, see the comments before Lemma 4.5.22), Z = V(S1) × V(S2), Z′ = C^b(S1) × C^b(S2) with the pairings

$$\langle x, x'\rangle := h(\mu) = \int_{S_1\times S_2} h(x,y)\,\mu(d(x,y)), \qquad x = \mu \in X,\ x' = h \in X',$$

$$\langle z, z'\rangle := \sigma_1(u_1) - \sigma_2(u_2) = \int_{S_1} u_1(x)\,\sigma_1(dx) - \int_{S_2} u_2(y)\,\sigma_2(dy), \qquad z = (\sigma_1,\sigma_2) \in Z,\ z' = (u_1,u_2) \in Z'.$$

Next, define a : Z → 2^X by

$$a(\sigma_1,\sigma_2) := \{\mu \in V_+(S_1\times S_2);\ \pi_1\mu = \sigma_1,\ \pi_2\mu = \sigma_2\}. \tag{4.6.8}$$

Note that a(Z) = V+(S1 × S2) and dom a = {(σ1, σ2) ∈ V+(S1) × V+(S2); σ1 S1 = σ2 S2}. We denote by E the semilinear space consisting of all universally measurable functions c : S1 × S2 → ℝ ∪ {+∞} bounded below with c(µ) defined by (4.6.3). It is clear that

$$C(c;\sigma_1,\sigma_2) = V(c,z) \quad \text{for all } c \in E,\ z = (\sigma_1,\sigma_2) \in Z,$$

where V(c, z) was defined in Section 4.6.1. Now we describe a′(c) and the functional v(c, ·) for c ∈ E. By definition (see Section 4.6.1), we have

$$a'(c) = \{(u_1,u_2);\ \sigma_1(u_1) - \sigma_2(u_2) \le c(\mu) \text{ for all } (\sigma_1,\sigma_2) \in \mathrm{dom}\, a \text{ and all } \mu \in a(\sigma_1,\sigma_2)\}. \tag{4.6.9}$$

Substituting the Dirac measures σ1 = δx, σ2 = δy, µ = δ(x,y), for all x ∈ S1, y ∈ S2, and taking into account that µ ∈ a(σ1, σ2), we see from (4.6.9) that

$$a'(c) \subseteq \{(u_1,u_2);\ u_1(x) - u_2(y) \le c(x,y) \text{ for all } (x,y) \in S_1\times S_2\}.$$

Thus, a′(c) is contained in the set of pairs (u1, u2) ∈ C^b(S1) × C^b(S2) satisfying (4.6.6). On the other hand, if a pair (u1, u2) satisfies (4.6.6) and µ ∈ a(σ1, σ2), then integrating (4.6.6) with respect to µ and taking into account (4.6.8), we obtain (u1, u2) ∈ a′(c). Thus, a′(c) is just the constraint set (4.6.6). Consequently,

$$v(c,z) = D(c;\sigma_1,\sigma_2) \quad \text{for all } c \in E \text{ and all } z = (\sigma_1,\sigma_2) \in Z.$$

According to Theorem 4.6.1, the duality relation (4.6.7) holds if and only if C(c, ·) is lower semicontinuous on Z in the weak topology σ(Z, Z′). Therefore, in order to prove that (a) and (b) in Theorem 4.6.8 are equivalent, it suffices to verify that (a) is necessary and sufficient for the lower semicontinuity of C(c, ·).

Necessity. The functional C(c, ·) on Z is obviously sublinear. Assume that C(c, ·) is lower semicontinuous in the weak topology σ(Z, Z′). Then it is a well-known fact in convex analysis that this functional can be represented in the form

$$C(c,z) = \sup\{\langle z, z'\rangle;\ z' \in \partial C(c,\cdot)(0)\} \quad \text{for all } z \in Z, \tag{4.6.10}$$

where ∂C(c, ·)(0) = {z′ ∈ Z′; ⟨z, z′⟩ ≤ C(c, z) for all z ∈ Z}. Further, since C^b(S1) = C(βS1) and C^b(S2) = C(βS2), the right-hand side of (4.6.10) makes sense for all z = (σ1, σ2) ∈ V(βS1) × V(βS2) = C(βS1)∗ × C(βS2)∗, not only for z = (σ1, σ2) ∈ Z = V(S1) × V(S2). Therefore, using (4.6.10), we extend C(c, ·) to V(βS1) × V(βS2) as a sublinear and weakly* lower semicontinuous functional. We again denote the extension by C(c, ·) and define a function g : βS1 × βS2 → ℝ ∪ {+∞} by the formula

$$g(x,y) := C(c;\delta_x,\delta_y) \quad \text{for all } (x,y) \in \beta S_1\times\beta S_2.$$

Applying (4.6.10) to z = (δx, δy), we get g(x, y) = sup{u1(x) − u2(y); (u1, u2) ∈ ∂C(c, ·)(0)}, which implies that g is lower semicontinuous on βS1 × βS2. Further, we have

$$g(x,y) = c(x,y) \quad \text{for } (x,y) \in S_1\times S_2,$$

since for σ1 = δx, σ2 = δy, the constraint set (4.6.4) consists of a single element, namely, the Dirac measure δ(x,y). Then c admits a representation (4.5.5) with h(x, y) = u1(x) − u2(y), (u1, u2) ∈ ∂C(c, ·)(0) and so is lower semicontinuous.


Sufficiency. We have to verify that for each a ∈ ℝ the set M(a) := {(σ1, σ2); C(c; σ1, σ2) ≤ a} is σ(Z, Z′)-closed. According to Remark 4.5.21, it suffices to find a weakly* closed set M̃(a) ⊆ V(βS1) × V(βS2) = (C(βS1) × C(βS2))∗ such that M(a) = M̃(a) ∩ (V(S1) × V(S2)). We define the set M̃(a) by the property that a point (σ1, σ2) belongs to M̃(a) if and only if (i) σ1 ∈ V+(βS1), σ2 ∈ V+(βS2), and (ii) there exists a measure µ ∈ V+(βS1 × βS2) such that π1µ = σ1, π2µ = σ2, and c̃(µ) ≤ a, where c̃(x, y) := sup{h(x, y); h ∈ C(βS1 × βS2), h|S1×S2 ≤ c}. It follows from Lemmas 4.5.20 and 4.5.22 that the intersection of M̃(a) and V(S1) × V(S2) is actually M(a). It remains to show that M̃(a) is weakly* closed. This follows easily from the lower semicontinuity of c̃ on βS1 × βS2 and the Krein–Smulyan theorem (see, for example, Dunford and Schwartz (1958)), which asserts that a convex set in a dual Banach space is weakly* closed if its intersections with closed balls are so. We omit the details. The first assertion of the theorem is thus proved.

The assertion that C(c; σ1, σ2) is attained if σ1 S1 = σ2 S2 follows from the weak* compactness of the constraint set (4.6.4) considered as a subset in V(βS1 × βS2) = C(βS1 × βS2)∗ and from the weak* lower semicontinuity of c̃(µ). □

4.6.3 Duality Theorem for a Marginal Problem with Additional Constraints of Moment-Type

In this section, an extremal marginal problem is considered with additional constraints of the moment-type. The problem is to minimize the functional (4.6.3),

$$c(\mu) = \int_{S_1\times S_2} c(x,y)\,\mu(d(x,y)),$$

subject to constraints (4.6.4), µ ∈ V+(S1 × S2); π1µ = σ1, π2µ = σ2, and the following additional constraints of moment-type:

$$\int_{S_1\times S_2} f_k(x,y)\,\mu(d(x,y)) \le b_k, \quad k = 1, \dots, m. \tag{4.6.11}$$

Here S1, S2 ∈ L, and we assume that fk, k = 1, . . . , m, belong to the class LSC(βS1 × βS2) of all functions S1 × S2 → ℝ ∪ {+∞} that admit a representation (4.5.5), c(x, y) = sup{h(x, y); h ∈ H}, H ⊂ C^b(S1) ⊗ C^b(S2); see Lemma 4.5.22. A motivation for this notation is given by Lemma 4.5.22, asserting that every f ∈ LSC(βS1 × βS2) can be extended to βS1 × βS2 preserving lower semicontinuity.

Remark 4.6.10 It can be seen that among all lower semicontinuous functions βS1 × βS2 → ℝ ∪ {+∞} extending a function f ∈ LSC(βS1 × βS2) to βS1 × βS2 there exists a maximal one. It is given by f̃(x, y) = sup{h(x, y); h ∈ C(βS1 × βS2), h|S1×S2 ≤ f}.

Remark 4.6.11 Duality theorems for the marginal problem (4.6.3), (4.6.4), (4.6.11) were studied by Levin (1984b) and Levin and Rachev (1989). The following results are based on these papers. For general moment-type families, duality results can be found in Kempermann (1983, 1987).

The optimal value of problem (4.6.3), (4.6.4), (4.6.11) will be denoted by Val(c; σ1, σ2, b), where b = (b1, . . . , bm). We use the usual convention that Val(c; σ1, σ2, b) = +∞ if the constraint set (4.6.4), (4.6.11) is empty. In particular, this case occurs when (σ1, σ2) ∉ V+(S1) × V+(S2) or σ1 S1 ≠ σ2 S2. Before formulating the dual problem, define

$$W := \bigcap_{k=1}^{m} \mathrm{dom}\, f_k,$$

where dom fk := {(x, y) ∈ S1 × S2; fk(x, y) < +∞}, k = 1, . . . , m.

Let c and c′ be a pair of universally measurable functions S1 × S2 → ℝ ∪ {+∞} bounded below. Observe that

$$\mathrm{Val}(c;\sigma_1,\sigma_2,b) = \mathrm{Val}(c';\sigma_1,\sigma_2,b) \tag{4.6.12}$$

for all (σ1, σ2, b) ∈ V(S1) × V(S2) × ℝ^m provided that c|W = c′|W. Indeed, if the constraint set (4.6.4), (4.6.11) is nonempty, then any µ ∈ V+(S1 × S2) satisfying (4.6.11) is concentrated on W, while Val(c; σ1, σ2, b) = Val(c′; σ1, σ2, b) = +∞ if the constraint set is empty. The dual problem consists in maximizing the functional

$$\int_{S_1} u_1(x)\,\sigma_1(dx) - \int_{S_2} u_2(y)\,\sigma_2(dy) - \sum_{k=1}^{m} \lambda_k b_k \tag{4.6.13}$$

on the space C^b(S1) × C^b(S2) × ℝ^m, subject to the constraints

$$u_1(x) - u_2(y) - \sum_{k=1}^{m} \lambda_k f_k(x,y) \le c(x,y) \quad \forall (x,y) \in W, \tag{4.6.14}$$

$$\lambda_k \ge 0, \quad k = 1, \dots, m. \tag{4.6.15}$$
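The primal problem (4.6.3), (4.6.4), (4.6.11) and the dual (4.6.13)–(4.6.15) can be compared on a toy instance. The sketch below assumes two-point spaces, a single finite moment function f (so m = 1 and W is the whole product), and the parametrization of plans by t = µ({(0, 0)}); all names and data are hypothetical. A return value of `None` stands for the convention Val = +∞ on an empty constraint set.

```python
from fractions import Fraction as F

def affine_in_t(g, s1, s2):
    """Coefficients (alpha, beta) with g(mu_t) = alpha + beta * t for the plan
    mu_t = [[t, s1[0] - t], [s2[0] - t, s1[1] - s2[0] + t]]."""
    alpha = g[0][1] * s1[0] + g[1][0] * s2[0] + g[1][1] * (s1[1] - s2[0])
    beta = g[0][0] - g[0][1] - g[1][0] + g[1][1]
    return alpha, beta

def primal_value(c, f, b, s1, s2):
    """Val(c; s1, s2, b) with one moment constraint f(mu) <= b, or None."""
    lo = max(F(0), s1[0] - s2[1])
    hi = min(s1[0], s2[0])
    alpha, beta = affine_in_t(f, s1, s2)
    if beta > 0:                       # constraint caps t from above
        hi = min(hi, (b - alpha) / beta)
    elif beta < 0:                     # constraint bounds t from below
        lo = max(lo, (b - alpha) / beta)
    elif alpha > b:                    # constraint independent of t and violated
        return None
    if lo > hi:
        return None
    a_c, b_c = affine_in_t(c, s1, s2)
    return min(a_c + b_c * lo, a_c + b_c * hi)

def is_dual_feasible(u1, u2, lam, c, f):
    """Constraints (4.6.14)-(4.6.15) for m = 1."""
    return lam >= 0 and all(u1[x] - u2[y] - lam * f[x][y] <= c[x][y]
                            for x in range(2) for y in range(2))

def dual_value(u1, u2, lam, b, s1, s2):
    """Dual functional (4.6.13) for m = 1."""
    return (sum(u1[x] * s1[x] for x in range(2))
            - sum(u2[y] * s2[y] for y in range(2)) - lam * b)

# Illustrative data (assumed): f penalizes mass left on the diagonal.
s1 = s2 = (F(1, 2), F(1, 2))
c = [[0, 1], [1, 0]]
f = [[1, 0], [0, 1]]
b = F(1, 2)
u1, u2, lam = (1, 1), (0, 0), 1

print(primal_value(c, f, b, s1, s2), dual_value(u1, u2, lam, b, s1, s2))
```

Without the moment constraint the optimal cost would be 0 (leave all mass in place); the constraint caps the diagonal mass, and the multiplier λ = 1 prices that cap so that the dual value matches the constrained optimum.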

The optimal value of the dual problem is denoted by val(c; σ1, σ2, b). Observe that the constraint set (4.6.14), (4.6.15) is nonempty for all (σ1, σ2, b), because c and fk, k = 1, . . . , m, are bounded below. Therefore,

$$\mathrm{val}(c;\sigma_1,\sigma_2,b) > -\infty \quad \text{whenever } (\sigma_1,\sigma_2,b) \in V(S_1)\times V(S_2)\times\mathbb{R}^m.$$

We are now in position to formulate the duality theorem.

Theorem 4.6.12 (Duality theorem for a marginal problem with moment-type constraints)

(I) Suppose that c : S1 × S2 → ℝ ∪ {+∞} is bounded below and universally measurable. Then the existence of a function c′ ∈ LSC(βS1 × βS2) coinciding with c on W is sufficient for the duality relation

$$\mathrm{Val}(c;\sigma_1,\sigma_2,b) = \mathrm{val}(c;\sigma_1,\sigma_2,b) \tag{4.6.16}$$

to hold for all (σ1, σ2, b) ∈ V(S1) × V(S2) × ℝ^m. If in addition the constraint set (4.6.4), (4.6.11) is nonempty for a given (σ1, σ2, b), then the optimal value Val(c; σ1, σ2, b) is attained, that is, some measure µ ∈ V+(S1 × S2) exists that is an optimal solution for (4.6.3), (4.6.4), (4.6.11).

(II) In the case where fk ∈ C^b(S1) ⊗ C^b(S2), k = 1, . . . , m, the condition that c ∈ LSC(βS1 × βS2) is necessary and sufficient for the duality relation (4.6.16) to hold.

(III) Let σ1 ∈ V+(S1), σ2 ∈ V+(S2), σ1 S1 = σ2 S2, and b = (b1, . . . , bm) ∈ ℝ^m be fixed. A necessary and sufficient condition for the finiteness val(c; σ1, σ2, b) < +∞ for all c ∈ C^b(S1) ⊗ C^b(S2) is the validity of the inequality

$$\int_{S_1} u_1(x)\,\sigma_1(dx) - \int_{S_2} u_2(y)\,\sigma_2(dy) \le \sum_{k=1}^{m} \lambda_k b_k \tag{4.6.17}$$

whenever

$$u_1(x) - u_2(y) \le \sum_{k=1}^{m} \lambda_k f_k(x,y) \quad \text{for all } (x,y) \in W. \tag{4.6.18}$$

Here u1 ∈ C^b(S1), u2 ∈ C^b(S2), and λk, k = 1, . . . , m, are nonnegative constants with λ1 + · · · + λm = 1.

Proof: (I) Set

$$X := V(S_1\times S_2), \quad X' := C^b(S_1)\otimes C^b(S_2), \quad Z := V(S_1)\times V(S_2)\times\mathbb{R}^m, \quad Z' := C^b(S_1)\times C^b(S_2)\times\mathbb{R}^m.$$

Define the pairings

$$\langle x, x'\rangle := \int_{S_1\times S_2} h(x,y)\,\mu(d(x,y))$$

for all x = µ ∈ X and x′ = h ∈ X′, and let

$$\langle z, z'\rangle := \int_{S_1} u_1(x)\,\sigma_1(dx) - \int_{S_2} u_2(y)\,\sigma_2(dy) - \sum_{k=1}^{m} \lambda_k b_k$$

for all z = (σ1, σ2, b) ∈ Z, z′ = (u1, u2, λ) ∈ Z′. Define a : Z → 2^X by

$$a(\sigma_1,\sigma_2,b) := \Big\{\mu \in V_+(S_1\times S_2);\ \pi_1\mu = \sigma_1,\ \pi_2\mu = \sigma_2,\ \int_{S_1\times S_2} f_k(x,y)\,\mu(d(x,y)) \le b_k,\ k = 1,\dots,m\Big\}. \tag{4.6.19}$$


Next consider the semilinear space E consisting of all universally measurable functions c : S1 × S2 → ℝ ∪ {+∞} bounded below with c(µ) being defined by (4.6.3). Obviously, V(c, z) = Val(c; σ1, σ2, b) for all c ∈ E and all z = (σ1, σ2, b) ∈ Z, where V(c, z) = inf{c(x); x ∈ a(z)} (see Section 4.6.1). In order to apply the abstract duality theorem (Theorem 4.6.1), we must describe a′(c) and determine the functional v(c, z). By definition we have

$$a'(c) = \Big\{(u_1,u_2,\lambda);\ \sigma_1(u_1) - \sigma_2(u_2) - \sum_{k=1}^{m}\lambda_k b_k \le c(\mu) \text{ for all } (\sigma_1,\sigma_2,b) \in \mathrm{dom}\, a \text{ and all } \mu \in a(\sigma_1,\sigma_2,b)\Big\}. \tag{4.6.20}$$

Fix (x, y) ∈ W and put bk ≥ fk(x, y), k = 1, . . . , m, σ1 = δx, σ2 = δy, and µ = δ(x,y). Taking into account that µ ∈ a(σ1, σ2, b), we derive from (4.6.20) that

$$a'(c) \subseteq \Big\{(u_1,u_2,\lambda);\ \lambda_k \ge 0,\ k = 1,\dots,m,\ u_1(x) - u_2(y) - \sum_{k=1}^{m}\lambda_k f_k(x,y) \le c(x,y) \text{ for all } (x,y) \in W\Big\}.$$

Here the condition that λk ≥ 0 is obtained by letting bk tend to +∞. Thus, each (u1, u2, λ) ∈ a′(c) satisfies (4.6.14), (4.6.15). On the other hand, if (u1, u2, λ) satisfies (4.6.14), (4.6.15), and µ ∈ a(σ1, σ2, b), then fk(µ) ≤ bk < +∞, k = 1, . . . , m. Consequently, µ vanishes outside W. Integrating (4.6.14) with respect to µ and taking into account (4.6.15) and (4.6.19), we obtain

$$\sigma_1(u_1) - \sigma_2(u_2) - \sum_{k=1}^{m}\lambda_k b_k \le c(\mu).$$

We have shown that a′(c) coincides with the constraint set (4.6.14), (4.6.15), so v(c, z) = val(c; σ1, σ2, b) for all c ∈ E and all z = (σ1, σ2, b) ∈ Z. Observe next that

$$\mathrm{Val}(c;\sigma_1,\sigma_2,b) \ge \mathrm{val}(c;\sigma_1,\sigma_2,b) > -\infty$$

for all (σ1, σ2, b) ∈ Z. Then, according to Theorem 4.6.1, the duality relation (4.6.16) holds if and only if the functional Val(c; ·) on Z = V(S1) × V(S2) × ℝ^m is lower semicontinuous in the weak topology σ(Z, Z′). Therefore, if we show that the existence of a function c′ ∈ LSC(βS1 × βS2) coinciding with c on W implies the just-mentioned lower semicontinuity of Val(c; ·), the first assertion of (I) will be established. In view of (4.6.12), it suffices to verify that for each a ∈ ℝ the set Q(a) := {(σ1, σ2, b); Val(c′; σ1, σ2, b) ≤ a} is σ(Z, Z′)-closed. With this aim we construct in the dual Banach space V(βS1) × V(βS2) × ℝ^m = (C^b(S1) × C^b(S2) × ℝ^m)∗ a weakly* closed set Q̃(a) such that Q(a) = Q̃(a) ∩ (V(S1) × V(S2) × ℝ^m). We define the set Q̃(a) by the prescription that a point (σ1, σ2, b) belongs to Q̃(a) if and only if σ1 ∈ V+(βS1), σ2 ∈ V+(βS2) and a measure µ ∈ V+(βS1 × βS2) exists satisfying π1µ = σ1, π2µ = σ2, f̃k(µ) ≤ bk, k = 1, . . . , m, and c̃′(µ) ≤ a, where f̃k and c̃′ are the extensions of fk and c′ defined as in Remark 4.6.10. It follows from Lemma 4.5.20 and the definition of Val that actually the intersection of Q̃(a) and V(S1) × V(S2) × ℝ^m is Q(a).

It remains to check that Q̃(a) is weakly* closed. In view of the Krein–Smulyan theorem (see, for example, Dunford and Schwartz (1958)), this will be established if we show that for all N > 0 the sets Q̃(a, N) := Q̃(a) ∩ {(σ1, σ2, b); ‖σ1‖ ≤ N, ‖σ2‖ ≤ N, ‖b‖ ≤ N} are weakly* closed. We shall use the well-known equality (see, for example, Levin (1985a, Theorem 0.9))

$$\int_S f(x)\,\sigma(dx) = \sup\Big\{\int_S h(x)\,\sigma(dx);\ h \in C(S),\ h \le f\Big\}; \tag{4.6.21}$$

it holds for every compact space S, every measure σ ∈ V+(S), and every lower semicontinuous function f : S → ℝ ∪ {+∞}. This equality implies that the functionals on V+(βS1 × βS2) = C(βS1 × βS2)∗+, µ ↦ c̃′(µ) and µ ↦ f̃k(µ), k = 1, . . . , m, are weakly* lower semicontinuous. Consequently, it yields the weak* compactness of the set

$$R(a,N) := \{(\sigma_1,\sigma_2,b,\mu);\ \|\sigma_1\| \le N,\ \|\sigma_2\| \le N,\ \|b\| \le N,\ \mu \in V_+(\beta S_1\times\beta S_2),\ \pi_1\mu = \sigma_1,\ \pi_2\mu = \sigma_2,\ \tilde c'(\mu) \le a,\ \tilde f_k(\mu) \le b_k,\ k = 1,\dots,m\}$$


in the dual Banach space V(βS1) × V(βS2) × ℝ^m × V(βS1 × βS2) = (C(βS1) × C(βS2) × ℝ^m × C(βS1 × βS2))∗. The set Q̃(a, N) coincides with the projection of R(a, N) onto V(βS1) × V(βS2) × ℝ^m = (C(βS1) × C(βS2) × ℝ^m)∗. Consequently, it is weakly* compact as well. The first assertion of (I) is now completely proved.

To prove the second assertion of (I), observe that in view of Lemmas 4.5.19 and 4.5.20 combined with the weak* lower semicontinuity of f̃k(µ), k = 1, . . . , m, the constraint set (4.6.4), (4.6.11) is weakly* compact regarded as a subset of V(βS1 × βS2) = C(βS1 × βS2)∗. This together with the weak* lower semicontinuity of c̃′(µ) implies the existence of an optimal measure provided that the constraint set is nonempty.

(II) Sufficiency follows from (I), so only necessity is to be proved. Being sublinear and σ(Z, Z′)-lower semicontinuous, the functional Val(c, ·) is represented in the form

$$\mathrm{Val}(c,z) = \sup\{\langle z, z'\rangle;\ z' \in \partial\,\mathrm{Val}(c,\cdot)(0)\} \quad \text{for all } z \in Z, \tag{4.6.22}$$

with ∂Val(c, ·)(0) = {z′ ∈ Z′; ⟨z, z′⟩ ≤ Val(c, z) for all z ∈ Z} (see the proof of Theorem 4.6.1 (I)). Observe now that in view of the relations C^b(S1) = C(βS1), C^b(S2) = C(βS2), the right-hand side of (4.6.22) makes sense for z = (σ1, σ2, b) in the set V(βS1) × V(βS2) × ℝ^m, not just for z ∈ Z = V(S1) × V(S2) × ℝ^m. Therefore, Val(c, ·) can be extended to V(βS1) × V(βS2) × ℝ^m as a sublinear and weakly* lower semicontinuous functional given by formula (4.6.22). We denote it again by Val(c, ·) and define a function g : βS1 × βS2 → ℝ ∪ {+∞} by

$$g(x,y) := \mathrm{Val}(c;\delta_x,\delta_y,b(x,y)), \tag{4.6.23}$$

where b(x, y) := (f1(x, y), . . . , fm(x, y)). (This is well-defined, since fk ∈ C^b(S1) ⊗ C^b(S2) = C(βS1 × βS2), k = 1, . . . , m.) Applying (4.6.22) to z = (δx, δy, b(x, y)) yields

$$g(x,y) = \sup\Big\{u_1(x) - u_2(y) - \sum_{k=1}^{m}\lambda_k f_k(x,y);\ (u_1,u_2,\lambda) \in \partial\,\mathrm{Val}(c,\cdot)(0)\Big\}.$$

This implies that g is lower semicontinuous on βS1 × βS2. It remains to note that g(x, y) = c(x, y) for (x, y) ∈ S1 × S2, since for σ1 = δx, σ2 = δy, b = b(x, y), the constraint set (4.6.4), (4.6.11) consists of a single element, namely the Dirac measure δ(x,y).

(III) According to (I), the duality relation (4.6.16) holds. Taking this into account, it follows from (4.6.12) that either

$$\mathrm{val}(c;\sigma_1,\sigma_2,b) = +\infty \quad \text{for all } c \in C^b(S_1)\otimes C^b(S_2)$$

(and this is the case of an empty constraint set (4.6.4), (4.6.11)) or

$$\mathrm{val}(c;\sigma_1,\sigma_2,b) < +\infty \quad \text{for all } c \in C^b(S_1)\otimes C^b(S_2).$$

Consequently, if some c ∈ C^b(S1) ⊗ C^b(S2) exists with val(c; σ1, σ2, b) < +∞, then this inequality holds for all c ∈ C^b(S1) ⊗ C^b(S2) and, in particular, for c = 0. Observe now that val(0; σ1, σ2, b) can be equal to 0 or +∞ only. Then val(c; σ1, σ2, b) < +∞ for all c ∈ C^b(S1) ⊗ C^b(S2) if and only if val(0; σ1, σ2, b) = 0. To complete the proof it remains to note that the equality val(0; σ1, σ2, b) = 0 can be rewritten as the collection of inequalities (4.6.17) for all (u1, u2, λ) satisfying (4.6.18). □

Remark 4.6.13 A similar duality theorem for an extremal marginal problem with constraints (4.6.11) and the additional constraints µ ≥ 0, π1µ + π2µ = σ, where S1 = S2 = S and σ is a given measure in V+(S), may be found in Levin and Rachev (1989).

4.6.4 Duality Theorem for a Further Extremal Marginal Problem

Let S1, S2 ∈ L and let c : S1 × S2 → ℝ ∪ {+∞} be universally measurable and bounded from below. Given measures σ1 ∈ V+(S1), σ2 ∈ V+(S2), and m ∈ V+(S1 × S2), we consider the extremal marginal problem to find the optimal value

$$\mathrm{Val}(c;\sigma_1,\sigma_2,m) := \inf\Big\{\int_{S_1\times S_2} c(x,y)\,\mu(d(x,y));\ \mu \in V_+(S_1\times S_2),\ \pi_1\mu = \sigma_1,\ \pi_2\mu = \sigma_2,\ \mu \le m\Big\}. \tag{4.6.24}$$

The dual extremal problem is to determine the optimal value

$$\mathrm{val}(c;\sigma_1,\sigma_2,m) := \sup\Big\{\int_{S_1} u_1(x)\,\sigma_1(dx) - \int_{S_2} u_2(y)\,\sigma_2(dy) - \int_{S_1\times S_2} w(x,y)\,m(d(x,y));\ u_1 \in C^b(S_1),\ u_2 \in C^b(S_2),\ w \in C^b(S_1)\otimes C^b(S_2),\ w(x,y) \ge 0 \text{ and } u_1(x) - u_2(y) - w(x,y) \le c(x,y) \text{ for all } (x,y) \in S_1\times S_2\Big\}. \tag{4.6.25}$$

Observe that Val(c; σ1, σ2, m) = +∞ if the constraint set of (4.6.24) is empty. In particular, Val(c; σ1, σ2, m) = +∞ when σ1(S1) ≠ σ2(S2) or σ1(S1) = σ2(S2) > m(S1 × S2). At the same time, since c is bounded below, the constraint set in (4.6.25) is nonempty, and so the functional val(c, ·) is determined on the whole space V(S1) × V(S2) × V(S1 × S2) and

$$\mathrm{val}(c;\sigma_1,\sigma_2,m) > -\infty \quad \text{for all } (\sigma_1,\sigma_2,m) \in V(S_1)\times V(S_2)\times V(S_1\times S_2).$$

It will be convenient to extend the functional Val(c, ·) to the whole space V(S1) × V(S2) × V(S1 × S2) by assuming

$$\mathrm{Val}(c;\sigma_1,\sigma_2,m) = +\infty \quad \text{for } (\sigma_1,\sigma_2,m) \notin V_+(S_1)\times V_+(S_2)\times V_+(S_1\times S_2).$$

It is clear that Val(c; σ1, σ2, m) ≥ val(c; σ1, σ2, m) > −∞ for all (σ1, σ2, m) ∈ V(S1) × V(S2) × V(S1 × S2). We are now in position to state the duality theorem, cf. Levin (1984b, Theorem 13).

Theorem 4.6.14 Let c : S1 × S2 → ℝ ∪ {+∞} be bounded below and universally measurable. The following statements are equivalent:

(a) c is lower semicontinuous on S1 × S2.

(b) The duality relation

$$\mathrm{Val}(c;\sigma_1,\sigma_2,m) = \mathrm{val}(c;\sigma_1,\sigma_2,m) \tag{4.6.26}$$

holds for all (σ1, σ2, m) ∈ V(S1) × V(S2) × V(S1 × S2).
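The capacity-constrained problem (4.6.24) and its dual (4.6.25) can also be checked by hand on a toy instance. The sketch below again assumes two-point spaces and the one-parameter description of plans by t = µ({(0, 0)}); the bound µ ≤ m simply shrinks the admissible interval for t. Function names and data are hypothetical, and `None` encodes Val = +∞.

```python
from fractions import Fraction as F

def primal_value(c, m, s1, s2):
    """Val(c; s1, s2, m) of (4.6.24) on two-point spaces, or None if no
    plan with the given marginals stays below the capacity m."""
    # Plan mu_t = [[t, s1[0]-t], [s2[0]-t, s1[1]-s2[0]+t]], 0 <= mu_t <= m.
    lo = max(F(0), s1[0] - s2[1], s1[0] - m[0][1], s2[0] - m[1][0])
    hi = min(s1[0], s2[0], m[0][0], m[1][1] - s1[1] + s2[0])
    if lo > hi:
        return None

    def cost(t):
        return (c[0][0] * t + c[0][1] * (s1[0] - t)
                + c[1][0] * (s2[0] - t) + c[1][1] * (s1[1] - s2[0] + t))

    return min(cost(lo), cost(hi))   # linear in t: endpoints suffice

def is_dual_feasible(u1, u2, w, c):
    """Constraints of (4.6.25): w >= 0 and u1(x) - u2(y) - w(x, y) <= c(x, y)."""
    pairs = [(x, y) for x in range(2) for y in range(2)]
    return (all(w[x][y] >= 0 for x, y in pairs)
            and all(u1[x] - u2[y] - w[x][y] <= c[x][y] for x, y in pairs))

def dual_value(u1, u2, w, m, s1, s2):
    """Dual functional of (4.6.25): s1(u1) - s2(u2) - m(w)."""
    return (sum(u1[x] * s1[x] for x in range(2))
            - sum(u2[y] * s2[y] for y in range(2))
            - sum(w[x][y] * m[x][y] for x in range(2) for y in range(2)))

# Illustrative data (assumed): capacity 1/4 on each diagonal cell.
s1 = s2 = (F(1, 2), F(1, 2))
c = [[0, 1], [1, 0]]
m = [[F(1, 4), F(1)], [F(1), F(1, 4)]]
u1, u2, w = (1, 1), (0, 0), [[1, 0], [0, 1]]

print(primal_value(c, m, s1, s2), dual_value(u1, u2, w, m, s1, s2))
```

The function w plays the role of a price on capacity: it is positive exactly on the cells where the bound µ ≤ m is active, and the term m(w) subtracts the value of that scarce capacity from the dual objective.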

Proof: Take

$$X := V(S_1\times S_2), \quad X' := C^b(S_1)\otimes C^b(S_2), \quad Z := V(S_1)\times V(S_2)\times V(S_1\times S_2), \quad Z' := C^b(S_1)\times C^b(S_2)\times \big(C^b(S_1)\otimes C^b(S_2)\big).$$

Define for all x = µ ∈ X, x′ = h ∈ X′,

$$\langle x, x'\rangle := \int_{S_1\times S_2} h(x,y)\,\mu(d(x,y)).$$

Further, set

$$\langle z, z'\rangle := \int_{S_1} u_1(x)\,\sigma_1(dx) - \int_{S_2} u_2(y)\,\sigma_2(dy) - \int_{S_1\times S_2} w(x,y)\,m(d(x,y))$$

for all z = (σ1, σ2, m) ∈ Z, z′ = (u1, u2, w) ∈ Z′. Define a : Z → 2^X by a(σ1, σ2, m) := {µ ∈ V+(S1 × S2); π1µ = σ1, π2µ = σ2, µ ≤ m}, and consider the semilinear space E consisting of all universally measurable functions c : S1 × S2 → ℝ ∪ {+∞} bounded below with c(x) = c(µ) defined by (4.6.3). It is clear that V(c, z) = Val(c; σ1, σ2, m) for all c ∈ E, z = (σ1, σ2, m) ∈ Z, where V(c, z) = inf{c(x); x ∈ a(z)} (see Section 4.6.1). In order to apply the abstract duality theorem (Theorem 4.6.1), we must describe a′(c) and determine the functional v(c, z). By the general definition of a′(c), we have

$$a'(c) = \{(u_1,u_2,w);\ \sigma_1(u_1) - \sigma_2(u_2) - m(w) \le c(\mu) \text{ for all } (\sigma_1,\sigma_2,m) \in \mathrm{dom}\, a \text{ and all } \mu \in a(\sigma_1,\sigma_2,m)\}.$$

It is easily seen, by taking σ1 = δx, σ2 = δy, and m = µ = δ(x,y) for arbitrary x ∈ S1, y ∈ S2, that every (u1, u2, w) ∈ a′(c) satisfies the inequality

$$u_1(x) - u_2(y) - w(x,y) \le c(x,y) \quad \text{for all } (x,y) \in S_1\times S_2.$$

Also, substituting σ1 = 0, σ2 = 0, m = δ(x,y), and µ = 0 (clearly 0 ∈ a(0, 0, δ(x,y))) gives us that w ≥ 0 whenever (u1, u2, w) ∈ a′(c). Hence the constraints of (4.6.25) are satisfied by every (u1, u2, w) ∈ a′(c). On the other


hand, suppose that (u1, u2, w) satisfies the constraints of (4.6.25) and µ ∈ a(σ1, σ2, m) for some (σ1, σ2, m) ∈ Z. Then, integrating the inequality u1(x) − u2(y) − w(x, y) ≤ c(x, y) with respect to µ, and taking into account the inequality w(x, y) ≥ 0 and the relations π1µ = σ1, π2µ = σ2, 0 ≤ µ ≤ m, we obtain

$$\sigma_1(u_1) - \sigma_2(u_2) - m(w) \le \int_{S_1\times S_2} (u_1(x) - u_2(y) - w(x,y))\,\mu(d(x,y)) \le c(\mu).$$

Consequently, (u1, u2, w) ∈ a′(c). Thus, a′(c) is just the constraint set of (4.6.25), and therefore,

$$v(c,z) = \mathrm{val}(c;\sigma_1,\sigma_2,m) \tag{4.6.27}$$

for all c ∈ E and all z = (σ1, σ2, m) ∈ dom a. Notice that for z = (σ1, σ2, m) ∉ dom a, the equality (4.6.27) holds as well, since in this case both sides are equal to +∞. In view of Corollary 4.6.2, the theorem will be established if we show that condition (a) is necessary and sufficient for lower semicontinuity of Val(c, ·) on Z = V(S1) × V(S2) × V(S1 × S2) in the weak topology σ(Z, Z′).

Necessity. If a net (xγ, yγ) converges to a point (x, y), then

$$c(x,y) = \mathrm{Val}(c;\delta_x,\delta_y,\delta_{(x,y)}) \le \liminf_{\gamma}\, \mathrm{Val}(c;\delta_{x_\gamma},\delta_{y_\gamma},\delta_{(x_\gamma,y_\gamma)}) = \liminf_{\gamma}\, c(x_\gamma,y_\gamma);$$

that is, c is lower semicontinuous on S1 × S2.

Sufficiency. Now, the function c is supposed to be lower semicontinuous on S1 × S2, and we must verify that the functional Val(c, ·) is lower semicontinuous on Z in the weak topology σ(Z, Z′). This will follow if we show that for every a ∈ ℝ, the set Q(a) := {(σ1, σ2, m) ∈ Z; Val(c; σ1, σ2, m) ≤ a} is σ(Z, Z′)-closed. According to Remark 4.5.21, it suffices to find a certain weakly* closed set Q̃(a) in the dual Banach space V(βS1) × V(βS2) × V(βS1 × βS2) = (C^b(S1) × C^b(S2) × (C^b(S1) ⊗ C^b(S2)))∗ with Q(a) = Q̃(a) ∩ (V(S1) × V(S2) × V(S1 × S2)).

We define the set Q̃(a) as follows: A point (σ1, σ2, m) ∈ V(βS1) × V(βS2) × V(βS1 × βS2) belongs to Q̃(a) if and only if there exists a measure µ ∈ V+(βS1 × βS2) such that π1µ = σ1, π2µ = σ2, µ ≤ m, and c̃(µ) ≤ a, where c̃ is defined as in Remark 4.6.10. It follows from Lemma 4.5.20 and the definition of Val that actually the intersection of Q̃(a) and Z is Q(a). Now it remains to show that Q̃(a) is weakly* closed. In view of the Krein–Smulyan theorem, it suffices to verify that all the sets

$$\tilde Q(a,N) := \{(\sigma_1,\sigma_2,m) \in \tilde Q(a);\ \|\sigma_1\| \le N,\ \|\sigma_2\| \le N,\ \|m\| \le N\}$$

(N > 0) are weakly* closed. But these sets are easily seen to be weakly* compact, and the result follows. □

Corollary 4.6.15 (A topological version of the Fréchet problem) Suppose that S1, S2 ∈ L, m ∈ V+(S1 × S2), σ1 ∈ V+(S1), σ2 ∈ V+(S2), and σ1 S1 = σ2 S2 = 1. The following statements are equivalent:

(a) There exists a probability measure µ ∈ V+(S1 × S2) such that π1µ = σ1, π2µ = σ2, and µ ≤ m.

(b) The inequality σ1 A + σ2 B ≤ m(A × B) + 1 holds for all A ∈ B(S1), B ∈ B(S2).

Remark 4.6.16 In the case of finite spaces S1 and S2, the question of when there exists a probability measure µ on S1 × S2 satisfying (a) is known as the Fréchet problem (see Fréchet (1957)). The question was answered by Dall’Aglio (1961), who established the equivalence (a) ⇔ (b) in the discrete case. The Fréchet problem for abstract measure spaces was solved by Kellerer (1964c) and Sudakov (1979). A topological version of the Fréchet problem was studied in Levin (1984b, Theorem 11 and its corollary), where the equivalence (a) ⇔ (b) was proved for compact spaces and also for abstract measure spaces.

Proof: Take the cost function c = 0 and apply Theorem 4.6.14. It follows from the duality relation (4.6.26) that either val(0; σ1, σ2, m) = +∞ (and this is the case where the constraint set of (4.6.24) is empty), or val(0; σ1, σ2, m) = 0. Now it remains to show that condition (b) is necessary and sufficient for the equality val(0; σ1, σ2, m) = 0 to hold. The necessity is obvious in view of (4.6.26). Let us prove the sufficiency. The equality val(0; σ1, σ2, m) = 0 can be rewritten as the implication

$$u_1(x) - u_2(y) \le w(x,y) \ \text{ for all } (x,y) \in S_1\times S_2 \ \Longrightarrow\ \sigma_1(u_1) - \sigma_2(u_2) \le m(w) \tag{4.6.28}$$

for all u1 ∈ C^b(S1), u2 ∈ C^b(S2), and all w ∈ C^b(S1) ⊗ C^b(S2) with w ≥ 0. Thus, we shall prove (4.6.28) supposing (b) to be satisfied. We assume without loss of generality that 0 ≤ u1 < 1, 0 ≤ u2 < 1, 0 ≤ w < 1. This always can be achieved by adding the same positive constant to u1 and u2 and by subsequently multiplying all three functions by an appropriate positive number. Define A(α) := {x ∈ S1; α ≤ u1(x)}, B(α) := {y ∈ S2; α ≤ u2(y)}, and define

$$u_{1n}(x) := \frac{1}{n}\sum_{k=0}^{n-1} 1_{A(k/n)}(x), \qquad u_{2n}(y) := \frac{1}{n}\sum_{k=0}^{n-1} 1_{B(k/n)}(y),$$

where 1 stands for the characteristic function (indicator) of the corresponding set. We have

$$u_1(x) < u_{1n}(x) \quad \text{for all } x \in S_1,$$

and

$$u_2(y) < u_{2n}(y) \le u_2(y) + \frac{1}{n} \quad \text{for all } y \in S_2.$$

Hence

$$\sigma_1(u_1) - \sigma_2(u_2) \le \sigma_1(u_{1n}) - \sigma_2(u_{2n}) + \frac{1}{n} = \frac{1}{n}\sum_{k=0}^{n-1}\Big[\sigma_1 A\Big(\frac{k}{n}\Big) - \sigma_2 B\Big(\frac{k}{n}\Big)\Big] + \frac{1}{n} = \frac{1}{n}\sum_{k=0}^{n-1}\Big[\sigma_1 A\Big(\frac{k}{n}\Big) + \sigma_2\Big(S_2\setminus B\Big(\frac{k}{n}\Big)\Big) - 1\Big] + \frac{1}{n}.$$

Now, using (b) and noting that C(0) is empty because B(0) = S2, we obtain

$$\sigma_1(u_1) - \sigma_2(u_2) \le \frac{1}{n}\sum_{k=1}^{n-1} m\Big(C\Big(\frac{k}{n}\Big)\Big) + \frac{1}{n}, \tag{4.6.29}$$

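with C(k/n) defined by

$$C\Big(\frac{k}{n}\Big) := A\Big(\frac{k}{n}\Big) \times \Big(S_2\setminus B\Big(\frac{k}{n}\Big)\Big).$$

Writing

$$A_i = A\Big(\frac{i}{n}\Big)\setminus A\Big(\frac{i+1}{n}\Big), \qquad B_j = B\Big(\frac{j}{n}\Big)\setminus B\Big(\frac{j+1}{n}\Big),$$

we have

$$C\Big(\frac{k}{n}\Big) = \bigcup_{i=k}^{n-1}\ \bigcup_{j=0}^{k-1} (A_i\times B_j), \quad k = 1, \dots, n-1.$$

Summing up the corresponding indicators, we get

$$\frac{1}{n}\sum_{k=1}^{n-1} 1_{C(k/n)} = \frac{1}{n}\sum_{k=1}^{n-1}\sum_{i=k}^{n-1}\sum_{j=0}^{k-1} 1_{A_i\times B_j} = \frac{1}{n}\sum_{j=0}^{n-2}\sum_{i=j+1}^{n-1} (i-j)\,1_{A_i\times B_j}$$

and

$$\frac{1}{n}\sum_{k=1}^{n-1} m\Big(C\Big(\frac{k}{n}\Big)\Big) = \sum_{j=0}^{n-2}\sum_{i=j+1}^{n-1} \frac{i-j}{n}\, m(A_i\times B_j). \tag{4.6.30}$$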
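The two elementary facts used so far, the staircase bounds u < u_n ≤ u + 1/n and the count (i − j) of levels k with A_i × B_j ⊆ C(k/n), can be verified mechanically. The sketch below works with a single function value 0 ≤ u < 1 and with index pairs (i, j) rather than with sets; the helper names are hypothetical.

```python
from fractions import Fraction as F

def staircase(u, n):
    """Value of u_n at a point where the function equals u (0 <= u < 1):
    (1/n) * #{k in 0..n-1 : k/n <= u}."""
    return F(sum(1 for k in range(n) if F(k, n) <= u), n)

def level_index(u, n):
    """Index i with i/n <= u < (i+1)/n, i.e. the layer A_i (or B_j)."""
    return int(u * n)

def count_levels(i, j, n):
    """Number of k in 1..n-1 with A_i x B_j contained in C(k/n).
    A point of A_i lies in A(k/n) iff k <= i; a point of B_j lies
    outside B(k/n) iff j <= k - 1."""
    return sum(1 for k in range(1, n) if k <= i and j <= k - 1)

n = 7
samples = [F(0), F(1, 3), F(6, 7), F(99, 100)]
for u in samples:
    un = staircase(u, n)
    print(u, un, u < un <= u + F(1, n))
```

The count `count_levels(i, j, n)` equals i − j when i > j and 0 otherwise, which is exactly the coefficient appearing in the double sum of (4.6.30).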

Now we use the inequality w(x, y) ≥ u1(x) − u2(y) and the definition of the sets Ai and Bj: on Ai × Bj we have u1 ≥ i/n and u2 < (j + 1)/n. This results in the following bound:

$$\int_{A_i\times B_j} w(x,y)\,m(d(x,y)) \ge \frac{i-j-1}{n}\, m(A_i\times B_j) \quad \forall i, j. \tag{4.6.31}$$

Now from (4.6.29)–(4.6.31) and w(x, y) ≥ 0 we derive

$$\sigma_1(u_1) - \sigma_2(u_2) \le \sum_{j=0}^{n-2}\sum_{i=j+1}^{n-1}\Big[\int_{A_i\times B_j} w(x,y)\,m(d(x,y)) + \frac{1}{n}\, m(A_i\times B_j)\Big] + \frac{1}{n} \le m(w) + \frac{1}{n}\big(m(S_1\times S_2) + 1\big),$$

and since n may be taken arbitrarily large, the desired implication (4.6.28) is established. □
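For finite spaces the equivalence (a) ⇔ (b) of Corollary 4.6.15 can be tested directly. The sketch below is an illustration under the assumption of finite S1 and S2 with rational weights: condition (b) is checked by brute force over all pairs of subsets, and condition (a) is decided by a max-flow computation (source to S1 with capacities σ1, S1 to S2 with capacities m, S2 to sink with capacities σ2), which is a swapped-in technique and not the argument of the text. All function names and the three test measures are hypothetical.

```python
from collections import deque
from fractions import Fraction as F
from itertools import combinations

def subsets(n):
    """All subsets of {0, ..., n-1} as tuples."""
    for r in range(n + 1):
        yield from combinations(range(n), r)

def frechet_condition(s1, s2, m):
    """Condition (b): s1(A) + s2(B) <= m(A x B) + 1 for all A, B."""
    for A in subsets(len(s1)):
        for B in subsets(len(s2)):
            lhs = sum(s1[i] for i in A) + sum(s2[j] for j in B)
            rhs = sum(m[i][j] for i in A for j in B) + 1
            if lhs > rhs:
                return False
    return True

def max_flow(cap, source, sink):
    """Edmonds-Karp on a dense capacity matrix with exact arithmetic."""
    n = len(cap)
    flow = [[F(0)] * n for _ in range(n)]
    total = F(0)
    while True:
        parent = [None] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] is None:     # BFS in the residual graph
            v = queue.popleft()
            for w in range(n):
                if parent[w] is None and cap[v][w] - flow[v][w] > 0:
                    parent[w] = v
                    queue.append(w)
        if parent[sink] is None:
            return total
        push, w = None, sink                      # bottleneck on the path
        while w != source:
            v = parent[w]
            r = cap[v][w] - flow[v][w]
            push = r if push is None else min(push, r)
            w = v
        w = sink
        while w != source:                        # augment along the path
            v = parent[w]
            flow[v][w] += push
            flow[w][v] -= push
            w = v
        total += push

def coupling_exists(s1, s2, m):
    """Condition (a): some mu <= m has marginals s1 and s2 (total mass 1)."""
    n1, n2 = len(s1), len(s2)
    size = n1 + n2 + 2
    src, snk = size - 2, size - 1
    cap = [[F(0)] * size for _ in range(size)]
    for i in range(n1):
        cap[src][i] = F(s1[i])
        for j in range(n2):
            cap[i][n1 + j] = F(m[i][j])
    for j in range(n2):
        cap[n1 + j][snk] = F(s2[j])
    return max_flow(cap, src, snk) == 1

s = (F(1, 2), F(1, 2))
m_ok = [[F(1, 2), 0], [0, F(1, 2)]]
m_bad = [[F(1, 4), F(1, 8)], [F(1, 8), F(1, 4)]]
m_col = [[F(1, 2), 0], [F(1, 2), 0]]
for m in (m_ok, m_bad, m_col):
    print(frechet_condition(s, s, m), coupling_exists(s, s, m))
```

In the finite case the agreement of the two tests is the max-flow min-cut theorem in disguise: a cut of the network corresponds to a pair (A, B), and its capacity is at least 1 exactly when the inequality in (b) holds for that pair.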


4.6.5 Duality Theorem for a Nontopological Version of the Mass Transfer Problem

In this section we consider a nontopological version of the mass transfer problem. (The results here are due to Levin (1995a, 1995b).) Let S be an arbitrary nonempty set and let E(S) denote the linear space ℝ^S of all real-valued functions on S. Equipped with the product topology, E(S) is a Hausdorff locally convex linear topological space. The dual space E(S)∗ is the space of all functions σ : S → ℝ with finite support supp σ := {x; σ(x) ≠ 0}. Any such σ will be treated as a (signed) finite measure on the σ-algebra 2^S. This measure is represented as a linear combination with coefficients σ(x) of Dirac measures at points x ∈ supp σ,

$$\sigma = \sum_{x\in\mathrm{supp}\,\sigma} \sigma(x)\,\delta_x.$$

Furthermore, the pairing between E(S) and E(S)∗ will be determined by

$$\langle\varphi,\sigma\rangle := \int_S \varphi(x)\,\sigma(dx) = \sum_{x\in\mathrm{supp}\,\sigma} \varphi(x)\,\sigma(x) \qquad (\varphi \in E(S),\ \sigma \in E(S)^*).$$

Let E(S × S)∗+ denote the convex cone of nonnegative measures in E(S × S)∗ and let E(S)∗0 stand for the linear subspace in E(S)∗ of measures ϱ with ϱS = 0. Given a cost function c : S × S → ℝ ∪ {+∞} and a measure ϱ ∈ E(S)∗0, we consider a nontopological version of the mass transfer problem to find the optimal value

$$A_0(c,\varrho) := \inf\Big\{\int_{S\times S} c(x,y)\,\mu(d(x,y));\ \mu \in E(S\times S)^*_+,\ \pi_1\mu - \pi_2\mu = \varrho\Big\}.$$

The dual problem of the nontopological version of the mass transfer problem is to find the optimal value

$$B_0(c,\varrho) := \sup\Big\{\int_S u(x)\,\varrho(dx);\ u \in \mathrm{Lip}(c,S;E(S))\Big\}.$$

Clearly, B0(c, ϱ) ≤ A0(c, ϱ) for all ϱ ∈ E(S)∗0.

Theorem 4.6.17 (Duality theorem for a nontopological version of the mass transfer problem)



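(I) If $c_*(x, y) < +\infty$ for all $(x, y) \in S \times S$, then either $c_*(x, y) > -\infty$ for all $(x, y) \in S \times S$, or $c_*(x, y) = -\infty$ for all $(x, y) \in S \times S$. In the first case, $\operatorname{Lip}(c, S; E(S))$ is nonempty,
$$c_*(x, y) = \sup_{u \in \operatorname{Lip}(c, S; E(S))} \bigl(u(x) - u(y)\bigr) \tag{4.6.32}$$
for all $x, y \in S$ with $x \ne y$, and
$$A_0(c, \rho) = B_0(c, \rho) > -\infty \quad \text{whenever } \rho \in E(S)^*_0. \tag{4.6.33}$$
In the second case, $\operatorname{Lip}(c, S; E(S))$ is empty, and
$$A_0(c, \rho) = B_0(c, \rho) = -\infty \quad \text{whenever } \rho \in E(S)^*_0. \tag{4.6.34}$$

(II) If $\operatorname{Lip}(c, S; E(S))$ is empty but $c_* \not\equiv -\infty$, then there exist $x$ and $y$ in $S$, $x \ne y$, such that
$$A_0(c, \delta_x - \delta_y) = +\infty \quad \text{and} \quad B_0(c, \delta_x - \delta_y) = -\infty.$$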

Remark 4.6.18 As follows from Example 4.4.14, an optimal measure need not exist even if $\operatorname{Lip}(c, S; E(S))$ is nonempty.

Remark 4.6.19 Case (II) actually occurs. This is illustrated by Example 4.4.8.

In order to prove the theorem, several lemmas will be required. Let $c$ be any function $S \times S \to \mathbb{R} \cup \{+\infty\}$. Note that the reduced cost function $c_*$ can take the value $-\infty$, so if for a given $\mu \in E(S \times S)^*_+$ the integral
$$c_*(\mu) := \int_{S \times S} c_*(x, y)\,\mu(d(x, y))$$
makes no sense (that is, $c_{*+}(\mu) = c_{*-}(\mu) = +\infty$), then we set, by definition, $c_*(\mu) = +\infty$. With this convention, the value $A_0(c_*, \rho)$ is determined for every $\rho \in E(S)^*_0$, and $A_0(c, \rho) \ge A_0(c_*, \rho)$.

Lemma 4.6.20 For every $\mu \in E(S \times S)^*_+$ and every $n \in \mathbb{N}$ there exists a measure $\mu_n \in E(S \times S)^*_+$ such that
$$\pi_1\mu_n - \pi_2\mu_n = \pi_1\mu - \pi_2\mu$$



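and
$$c(\mu_n) \le \begin{cases} c_*(\mu) + \dfrac{1}{n}\,\mu(S \times S) & \text{if } c_*(\mu) > -\infty, \\[4pt] -d_n(\mu) & \text{if } c_*(\mu) = -\infty, \end{cases}$$
where $d_n(\mu) \to \infty$ as $n \to \infty$.

Proof: We suppose that $c_*(\mu) < +\infty$; otherwise the statement is obvious. Let
$$\mu = \sum_{k=1}^{m} \alpha_k\,\delta_{(x_k, y_k)},$$
where $\alpha_k > 0$, $k = 1, \dots, m$. Hence,
$$c_*(\mu) = \sum_{k=1}^{m} \alpha_k\,c_*(x_k, y_k).$$
Now, using the definition of $c_*$, we find points $z_{kn1}, \dots, z_{kn\,m(k,n)}$ in $S$ such that
$$\sum_{i=0}^{m(k,n)} c(z_{kn\,i}, z_{kn\,i+1}) \le b_{kn} := \begin{cases} c_*(x_k, y_k) + \dfrac{1}{n} & \text{if } c_*(x_k, y_k) > -\infty, \\[4pt] -n & \text{if } c_*(x_k, y_k) = -\infty, \end{cases}$$
with $z_{kn0} = x_k$, $z_{kn\,m(k,n)+1} = y_k$. Set $M = \{(x, y) \in \operatorname{supp}\mu;\ c_*(x, y) > -\infty\}$ and observe that $c_*(\mu) = -\infty$ implies $\mu((S \times S) \setminus M) > 0$. It is easily seen that the measure
$$\mu_n := \sum_{k=1}^{m} \sum_{i=0}^{m(k,n)} \alpha_k\,\delta_{(z_{kn\,i},\, z_{kn\,i+1})}$$
has the desired properties with
$$d_n(\mu) := n\,\mu((S \times S) \setminus M) - \int_M \left(c_*(x, y) + \frac{1}{n}\right)\mu(d(x, y)).$$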

□

The following simple version of a reduction theorem is a direct consequence of Lemma 4.6.20.



Corollary 4.6.21 The equality $A_0(c, \rho) = A_0(c_*, \rho)$ holds for all $\rho \in E(S)^*_0$.

As a direct consequence of the obvious relation $\operatorname{Lip}(c, S; E(S)) = \operatorname{Lip}(c_*, S; E(S))$, we have the following representation of the optimal value in the dual problem.

Lemma 4.6.22 Given any $c : S \times S \to \mathbb{R} \cup \{+\infty\}$, the equality $B_0(c, \rho) = B_0(c_*, \rho)$ holds for all $\rho \in E(S)^*_0$.

Extend $A_0(c, \cdot)$ to the whole space $E(S)^*$ by setting
$$A_0(c, \rho) = +\infty \quad \text{if } \rho \notin E(S)^*_0.$$
The extended functional is sublinear (we assume by definition that $+\infty + (-\infty) = +\infty$). The following lemma describes its subdifferential
$$\partial A_0(c, \cdot)(0) := \{u \in E(S);\ \langle u, \rho \rangle \le A_0(c, \rho) \text{ for all } \rho \in E(S)^*\} = \{u \in E(S);\ \langle u, \rho \rangle \le A_0(c, \rho) \text{ for all } \rho \in E(S)^*_0\}.$$

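Lemma 4.6.23 For any $c : S \times S \to \mathbb{R} \cup \{+\infty\}$, $\partial A_0(c, \cdot)(0) = \operatorname{Lip}(c, S; E(S))$.

Proof: The inclusion $\operatorname{Lip}(c, S; E(S)) \subseteq \partial A_0(c, \cdot)(0)$ is obvious. On the other hand, if $u \in \partial A_0(c, \cdot)(0)$, then for any $x, y \in S$,
$$u(x) - u(y) = \langle u, \delta_x - \delta_y \rangle \le A_0(c, \delta_x - \delta_y) \le c(\delta_{(x,y)}) = c(x, y);$$
that is, $u \in \operatorname{Lip}(c, S; E(S))$. □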

Lemma 4.6.24 For every $(x_0, y_0) \in S \times S$ with $x_0 \ne y_0$ and $c_*(x_0, y_0) = +\infty$, the equality $A_0(c_*, \delta_{x_0} - \delta_{y_0}) = +\infty$ holds.

Proof: The result will follow if we show that
$$\int_{S \times S} c_*(x, y)\,\mu(d(x, y)) = +\infty \tag{4.6.35}$$
for every $\mu \in E(S \times S)^*_+$ with $\pi_1\mu - \pi_2\mu = \delta_{x_0} - \delta_{y_0}$.



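Take any $\mu \in E(S \times S)^*_+$ with $\pi_1\mu - \pi_2\mu = \delta_{x_0} - \delta_{y_0}$, $\mu = \sum_{j=1}^{n} \alpha_j\,\delta_{(x_{j-1}, x_j)}$ ($\alpha_j > 0$). In accordance with our convention (namely, $+\infty + (-\infty) = +\infty$), the equality (4.6.35) is equivalent to the existence of a point $(x, y) \in \operatorname{supp}\mu$ with $c_*(x, y) = +\infty$. It follows easily from the equality $\pi_1\mu - \pi_2\mu = \delta_{x_0} - \delta_{y_0}$ that a chain $(z_{k-1}, z_k) \in \operatorname{supp}\mu$, $k = 1, \dots, \ell$, exists such that $z_0 = x_0$ and $z_\ell = y_0$. By the triangle inequality for $c_*$ we therefore obtain
$$c_*(x_0, y_0) \le \sum_{k=1}^{\ell} c_*(z_{k-1}, z_k).$$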

This implies $c_*(z_{k-1}, z_k) = +\infty$ for at least one $k$, and consequently, (4.6.35) holds. □

Proof of Theorem 4.6.17: (I) From the triangle inequality for $c_*$ we easily get that either $c_*(x, y) > -\infty$ for all $(x, y) \in S \times S$, or $c_*(x, y) = -\infty$ for all $(x, y) \in S \times S$. If $c_*(x, y) > -\infty$ for all $(x, y) \in S \times S$, then $c_*$ takes only finite values. In this case the set $\operatorname{Lip}(c, S; E(S))$ is nonempty; in fact, it contains the functions $u_*$ given by $u_*(x) = c_*(x, x_*)$ for $x \ne x_*$ and $u_*(x_*) = 0$, where $x_*$ is any fixed point of $S$.

We next prove that (4.6.32) holds. Suppose the contrary. Then
$$c_*(x_0, y_0) > \sup_{u \in \operatorname{Lip}(c, S; E(S))} \bigl(u(x_0) - u(y_0)\bigr) = B_0(c, \delta_{x_0} - \delta_{y_0})$$
for some $(x_0, y_0) \in S \times S$ with $x_0 \ne y_0$. The function
$$u_0(x) := \begin{cases} c_*(x, y_0) & \text{if } x \ne y_0, \\ 0 & \text{if } x = y_0, \end{cases}$$
belongs to $\operatorname{Lip}(c, S; E(S))$, so $B_0(c, \delta_{x_0} - \delta_{y_0}) \ge u_0(x_0) - u_0(y_0) = c_*(x_0, y_0)$, which is a contradiction.

To verify (4.6.33), note that the space $E(S)^*$ is topologized in a natural way as the topological direct sum of real lines $\mathbb{R}_{(x)}$, $x \in S$. It is well known that this topology $t$ is the strongest locally convex topology on $E(S)^*$, and $(E(S)^*, t)^* = E(S)$. Further, since $E(S)^*_0 = \{\rho \in E(S)^*;\ \langle 1_S, \rho \rangle = 0\}$ (where $1_S(x) = 1$ for all $x \in S$), $E(S)^*_0$ is a closed hyperplane in $E(S)^*$. Furthermore, the induced topology $t|_{E(S)^*_0}$ is the strongest locally convex



topology on $E(S)^*_0$. Since, by assumption, $c_*(x, y) < +\infty$ for all $(x, y) \in S \times S$, we have
$$-\infty < B_0(c_*, \rho) \le A_0(c_*, \rho) < +\infty \quad \text{whenever } \rho \in E(S)^*_0.$$
The restriction of $A_0(c, \cdot) = A_0(c_*, \cdot)$ (see Corollary 4.6.21) to $E(S)^*_0$ is thus a proper sublinear functional $E(S)^*_0 \to \mathbb{R}$. Hence it is continuous with respect to the topology $t|_{E(S)^*_0}$. As $E(S)^*_0$ is closed in $E(S)^*$, $A_0(c, \cdot)$ is lower semicontinuous as a functional on the whole space $(E(S)^*, t)$. By an appeal to convex analysis (see, for example, Levin (1985a, Theorem 0.3 and its Corollary 3)), the equality
$$A_0(c, \rho) = \sup\{\langle u, \rho \rangle;\ u \in \partial A_0(c, \cdot)(0)\}$$
holds whenever $\rho \in E(S)^*_0$. Thus, in view of Lemma 4.6.23, the above equality can be rewritten as follows:
$$A_0(c, \rho) = B_0(c, \rho) \quad \text{for all } \rho \in E(S)^*_0.$$

Now consider the case that $c_*(x, y) = -\infty$ for all $(x, y) \in S \times S$. By Corollary 4.6.21, we have
$$A_0(c, \rho) = -\infty \quad \text{for all } \rho \in E(S)^*_0.$$
Also, $B_0(c, \rho) = -\infty$ for all $\rho \in E(S)^*_0$, for the set $\operatorname{Lip}(c, S; E(S))$ is empty. Thus, the equality (4.6.34) is established, and statement (I) is completely proved.

(II) If $\operatorname{Lip}(c, S; E(S))$ is empty but $c_* \not\equiv -\infty$, then $c_*(x_0, y_0) = +\infty$ for some $(x_0, y_0) \in S \times S$ with $x_0 \ne y_0$. Since $\operatorname{Lip}(c, S; E(S)) = \emptyset$, we have $B_0(c, \delta_{x_0} - \delta_{y_0}) = -\infty$. On the other hand, applying Corollary 4.6.21 and Lemma 4.6.24 yields $A_0(c, \delta_{x_0} - \delta_{y_0}) = A_0(c_*, \delta_{x_0} - \delta_{y_0}) = +\infty$. The proof is complete. □

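Theorem 4.6.25 (Existence of optimal measures) Suppose that $c_*(x, y) > -\infty$ for all $(x, y) \in S \times S$. Then for any $\rho \in E(S)^*_0$ there exists a measure $\mu_0 \in E(S \times S)^*_+$ such that $\pi_1\mu_0 = \rho_+$, $\pi_2\mu_0 = \rho_-$, and
$$A_0(c_*, \rho) = c_*(\mu_0) = \min\{c_*(\mu);\ \mu \in E(S \times S)^*_+,\ \pi_1\mu = \rho_+,\ \pi_2\mu = \rho_-\},$$
where $\rho = \rho_+ - \rho_-$ is the Jordan decomposition of $\rho$. In particular, for all $x, y \in S$, we have
$$A_0(c_*, \delta_x - \delta_y) = \begin{cases} c_*(x, y) & \text{if } x \ne y \quad (\mu_0 = \delta_{(x,y)}), \\ 0 & \text{if } x = y \quad (\mu_0 = 0). \end{cases}$$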



Proof: The result will follow if for every $\mu \in E(S \times S)^*_+$ with $\pi_1\mu - \pi_2\mu = \rho$ we find a measure $\mu' \in E(S \times S)^*_+$ such that $\pi_1\mu' = \rho_+$, $\pi_2\mu' = \rho_-$, and $c_*(\mu') \le c_*(\mu)$. Indeed, in this case the problem consists in finding
$$\inf\{c_*(\mu');\ \mu' \in E(S \times S)^*_+,\ \pi_1\mu' = \rho_+,\ \pi_2\mu' = \rho_-\}.$$
Since this is a finite-dimensional linear program with a compact constraint set and a lower semicontinuous function $\mu' \mapsto c_*(\mu')$, the infimum is attained.

We say that a triple of points in $S$, $\{x_0, y_0, z_0\}$, is a transshipment with respect to $\mu$ if $z_0 \ne x_0$, $z_0 \ne y_0$ and $\mu(x_0, z_0) > 0$, $\mu(z_0, y_0) > 0$. If $\mu$ has such a transshipment, we form a new measure $\mu_1$ by setting
$$\mu_1(x, y) = \begin{cases} \mu(x_0, z_0) - a & \text{if } x = x_0,\ y = z_0, \\ \mu(z_0, y_0) - a & \text{if } x = z_0,\ y = y_0, \\ \mu(x_0, y_0) + a & \text{if } x = x_0,\ y = y_0, \\ \mu(x, y) & \text{in all other cases,} \end{cases}$$
with $a := \min\{\mu(x_0, z_0), \mu(z_0, y_0)\}$. It is easily seen that $\mu_1 \in E(S \times S)^*_+$, $\pi_1\mu_1 - \pi_2\mu_1 = \pi_1\mu - \pi_2\mu = \rho$, and $\mu_1$ has at least one transshipment fewer than $\mu$. Further, since $c_*$ satisfies the triangle inequality, we get
$$c_*(\mu_1) = c_*(\mu) - a\,c_*(x_0, z_0) - a\,c_*(z_0, y_0) + a\,c_*(x_0, y_0) \le c_*(\mu).$$
After repeating this procedure several times, we obtain a measure with no transshipments, $\mu_n \in E(S \times S)^*_+$, such that $\pi_1\mu_n - \pi_2\mu_n = \rho$ and
$$c_*(\mu_n) \le c_*(\mu_{n-1}) \le \dots \le c_*(\mu_1) \le c_*(\mu).$$
Define the measure $\mu' \in E(S \times S)^*_+$ via
$$\mu'(x, y) := \begin{cases} \mu_n(x, y) & \text{if } x \ne y, \\ 0 & \text{if } x = y. \end{cases}$$
Observe that
$$\operatorname{supp}\pi_1\mu' \cap \operatorname{supp}\pi_2\mu' = \emptyset; \tag{4.6.36}$$
in fact, $\mu'$ has no transshipments, and $\operatorname{supp}\mu'$ has no common points with the diagonal $D = \{(x, x);\ x \in S\}$. Further, since $c_*(x, y) > -\infty$ whenever $(x, y) \in S \times S$, one has $c_*(x, x) \ge 0$ for all $x \in S$. This follows easily from the triangle inequality. Hence,
$$c_*(\mu') = c_*(\mu_n) - \sum_{(x,x) \in \operatorname{supp}\mu_n} c_*(x, x)\,\mu_n(x, x) \le c_*(\mu_n) \le c_*(\mu).$$



Finally, it is clear that $\pi_1\mu' - \pi_2\mu' = \pi_1\mu_n - \pi_2\mu_n = \rho$, and in view of (4.6.36), this leads to $\pi_1\mu' = \rho_+$, $\pi_2\mu' = \rho_-$. □

The next result supplements Theorem 4.6.17 (duality theorem for a nontopological version of the mass transfer problem).

Theorem 4.6.26 Suppose that $c_*(x, y) < +\infty$ for all $(x, y) \in S \times S$. Then the following statements are equivalent:

(a) $\operatorname{Lip}(c, S; E(S))$ is nonempty.

(b) $c_*(x, y) > -\infty$ for all $(x, y) \in S \times S$.

(c) $c_*(x, x) \ge 0$ for all $x \in S$.

(d) For every cycle $x_0, x_1, \dots, x_n = x_0$, the inequality
$$\sum_{i=1}^{n} c(x_{i-1}, x_i) \ge 0$$
holds.

Proof: (a) ⇒ (b). If $u \in \operatorname{Lip}(c, S; E(S))$, then $u \in \operatorname{Lip}(c_*, S; E(S))$, and we get
$$c_*(x, y) \ge u(x) - u(y) > -\infty \quad \text{for all } (x, y) \in S \times S.$$

(b) ⇒ (c). This follows immediately from the triangle inequality for $c_*$.

(c) ⇔ (d). Obvious from the definition of $c_*$.

(c) ⇒ (b). The triangle inequality for $c_*$ gives $0 \le c_*(x, x) \le c_*(x, y) + c_*(y, x)$. Hence the inequality $c_*(x, y) \ge -c_*(y, x) > -\infty$ holds for all $(x, y) \in S \times S$.

(b) ⇒ (a). Fix any $z_0 \in S$ and set
$$u(x) = \begin{cases} c_*(x, z_0) & \text{if } x \ne z_0, \\ 0 & \text{if } x = z_0. \end{cases}$$
Clearly, $u$ takes finite values, and from the triangle inequality for $c_*$ it follows that $u \in \operatorname{Lip}(c, S; E(S))$. □



Remark 4.6.27 Statement (d) of Theorem 4.6.26 is equivalent to the following:

(d′) For every cycle $x_0, x_1, \dots, x_n = x_0$, the inequality
$$\sum_{i=1}^{n} c(x_i, x_{i-1}) \ge 0$$
holds.
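On a finite set $S$ the equivalences of Theorem 4.6.26 can be checked mechanically. The following is a minimal sketch, not part of the original text: it assumes that $c_*(x, y)$ is the infimum of the chain costs $c(z_0, z_1) + \dots + c(z_{k-1}, z_k)$ over chains from $x$ to $y$, and computes that infimum with the Floyd–Warshall recursion, which returns $c_*$ only when no cycle has negative cost. The function names and the two example matrices are hypothetical.

```python
import math

INF = math.inf  # models the value +infinity of the cost function


def reduced_cost(c):
    """Floyd-Warshall closure of c on S = {0, ..., n-1}.

    Entry [x][y] is the cheapest chain cost from x to y (chains of
    length >= 1). It equals c_*(x, y) when no cycle has negative cost.
    """
    n = len(c)
    d = [row[:] for row in c]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = d[i][k] + d[k][j]
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d


def has_no_negative_cycle(c):
    """Condition (d): every cycle has nonnegative total cost."""
    d = reduced_cost(c)
    return all(d[x][x] >= 0 for x in range(len(c)))


def potential(c, z0=0):
    """u(x) = c_*(x, z0) for x != z0 and u(z0) = 0, as in (b) => (a).

    Returns None if a negative cycle exists (then Lip is empty) or if
    some c_*(x, z0) is +infinity (then this u is not finite).
    """
    d = reduced_cost(c)
    n = len(c)
    if any(d[x][x] < 0 for x in range(n)):
        return None
    if any(d[x][z0] == INF for x in range(n) if x != z0):
        return None
    return [0 if x == z0 else d[x][z0] for x in range(n)]


def in_lip(u, c):
    """Check u(x) - u(y) <= c(x, y) for all x, y."""
    n = len(c)
    return all(u[x] - u[y] <= c[x][y] for x in range(n) for y in range(n))


# Hypothetical costs: all cycles of c_good are nonnegative,
# while c_bad contains the cycle 0 -> 1 -> 0 of cost 1 + (-2) < 0.
c_good = [[0, 2, 5], [-1, 0, 2], [1, -1, 0]]
c_bad = [[0, 1], [-2, 0]]
u = potential(c_good)
```

For `c_good` the potential `u` satisfies `in_lip(u, c_good)`, in line with (d) ⇒ (a); for `c_bad` the negative cycle makes the set empty, so `potential(c_bad)` returns `None`.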


5 Applications of the Duality Theory

5.1 Mass Transfer Problem with a Smooth Cost Function—Explicit Solution

In this section, based on papers by Levin (1990, 1995a), an explicit formula is given for the optimal value of the mass transfer problem with a smooth cost function vanishing on the diagonal. Also, applications are given to cyclic-monotone operators.

5.1.1 Introductory Remarks and Statement of the Basic Theorem

In what follows, $S$ is a domain, that is, a connected open set, in $\mathbb{R}^n$. Given a cost function $c : S \times S \to \mathbb{R} \cup \{+\infty\}$ that is bounded below and has analytic sublevel sets
$$\{(x, y) \in S \times S;\ c(x, y) \le \alpha\}, \quad \alpha \in \mathbb{R},$$
we state the mass transfer problem (or, more precisely, the Kantorovich–Rubinstein mass transshipment problem): Find the optimal value
$$A(c, \rho) = \inf\left\{\int_{S \times S} c(x, y)\,\mu(d(x, y));\ \mu \in V_+(S \times S),\ (\pi_1 - \pi_2)\mu = \rho\right\} \tag{5.1.1}$$
for all $\rho \in V_0(S)$ (cf. Section 4.5).

The following theorem gives an explicit formula for $A(c, \rho)$ in the case that $c$ vanishes on the diagonal $D = \{(x, x);\ x \in S\}$ and is smooth (continuously differentiable) on some open neighborhood $G$ of $D$. Suppose that the cost function $c : S \times S \to \mathbb{R}$ is bounded, has analytic sublevel sets, and satisfies the condition
$$c(x, x) = 0 \quad \text{for all } x \in S. \tag{5.1.2}$$

Also suppose that $c$ is continuously differentiable on some open set $G \supset D = \{(x, x);\ x \in S\}$. Let $E(S)$ denote the space of all real-valued functions on $S$ (cf. Section 4.6.5). Then the following theorem holds.

Theorem 5.1.1 (Explicit solution of the mass transfer problem with a smooth cost function) Under the conditions on $c$ stated above, $\operatorname{Lip}(c, S; C^b(S)) = \operatorname{Lip}(c, S; E(S))$ holds. Furthermore, the following assertions hold:

I. If $\operatorname{Lip}(c, S; E(S))$ is nonempty, then there exists a function $u_0(x)$, unique up to a constant term, such that
$$\operatorname{Lip}(c, S; E(S)) = \{u_0 + \alpha;\ \alpha \in \mathbb{R}\}. \tag{5.1.3}$$
Moreover,
$$A(c, \rho) = \int_S u_0(x)\,\rho(dx) \tag{5.1.4}$$
for all $\rho \in V_0(S)$. The function $u_0$ satisfies the equation
$$\operatorname{grad} u_0(x) = \operatorname{grad}_x c(x, y)\big|_{y=x} \tag{5.1.5}$$
and is given by the curvilinear integral
$$u_0(x) = \int_{\gamma(x_0, x)} \operatorname{grad}_\zeta c(\zeta, \eta)\big|_{\eta=\zeta} \cdot d\zeta = \int_{\gamma(x_0, x)} \sum_{k=1}^{n} \frac{\partial c}{\partial \zeta_k}(\zeta, \zeta)\, d\zeta_k. \tag{5.1.6}$$
Here $x_0 \in S$, $\gamma(x_0, x)$ stands for an arbitrary piecewise smooth oriented curve in $S$ leading from $x_0$ to $x$, and the integral is independent of the path of integration $\gamma(x_0, x)$.


II. If $\operatorname{Lip}(c, S; E(S))$ is empty, then
$$A(c, \rho) = -\infty \quad \text{for all } \rho \in V_0(S).$$

Corollary 5.1.2 If, in addition to the assumptions of the theorem, the cost function $c$ is nonnegative, then $A(c, \rho) = 0$ for all $\rho \in V_0(S)$.

Indeed, in this case the set $\operatorname{Lip}(c, S; E(S))$ is nonempty, since it contains the constant functions. Then $u_0$ is a constant, which follows from (5.1.2) and (5.1.5), or from (5.1.3), and the statement of the corollary is derived from (5.1.4).
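Formula (5.1.6) can be tried out numerically in one dimension. The sketch below is an illustration under stated assumptions, not an example from the text: it takes $S = (0, 1)$ and the hypothetical cost $c(x, y) = \sin x - \sin y + (x - y)^2$, which is bounded on $S \times S$, smooth, and vanishes on the diagonal. For this cost $\partial c/\partial x\,(x, y)|_{y=x} = \cos x$, so (5.1.6) should give $u_0(x) = \sin x - \sin x_0$. The helper names, the step size `h`, and the number of trapezoid steps are arbitrary choices.

```python
import math


def c(x, y):
    """Hypothetical smooth cost on S = (0, 1) with c(x, x) = 0."""
    return math.sin(x) - math.sin(y) + (x - y) ** 2


def dc_dx_on_diagonal(z, h=1e-6):
    """Central-difference estimate of dc/dx (x, y) at x = y = z."""
    return (c(z + h, z) - c(z - h, z)) / (2 * h)


def u0(x, x0=0.5, steps=2000):
    """Trapezoid-rule approximation of the integral (5.1.6) from x0 to x."""
    dz = (x - x0) / steps
    total = 0.5 * (dc_dx_on_diagonal(x0) + dc_dx_on_diagonal(x))
    for k in range(1, steps):
        total += dc_dx_on_diagonal(x0 + k * dz)
    return total * dz


# Grid of sample points inside S, and the values of u0 on that grid.
grid = [0.1 * k for k in range(1, 10)]
values = {x: u0(x) for x in grid}
```

On the grid, `values[x]` agrees with `math.sin(x) - math.sin(0.5)` up to discretization error, and `values[x] - values[y]` does not exceed `c(x, y)` (up to the same error), which is the membership condition for $\operatorname{Lip}(c, S; E(S))$ restricted to the sample points.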

5.1.2 Proof of the Basic Theorem

The idea of the proof is to derive the theorem from the duality Theorem 4.5.15 or from the reduction Theorem 4.5.30. To do this it suffices to show that $\operatorname{Lip}(c, S; C^b(S)) = \operatorname{Lip}(c, S; E(S))$ and to verify that (5.1.3) holds, where $u_0$ is given by (5.1.5). A systematic realization of this idea is given in the following proof.

Proof of Theorem 5.1.1: I. Let $p$ be a fixed vector in $\mathbb{R}^n$. If $\operatorname{Lip}(c, S; E(S))$ is nonempty and $u \in \operatorname{Lip}(c, S; E(S))$, then for any $x \in S$ and small $t > 0$ we have
$$x \pm tp \in S, \quad (x, x \pm tp) \in G, \quad (x \pm tp, x) \in G.$$
Now, taking into account (5.1.2) and using that $u \in \operatorname{Lip}(c, S; E(S))$, we obtain
$$-\frac{c(x, x + tp) - c(x, x)}{t} \le \frac{u(x + tp) - u(x)}{t} \le \frac{c(x + tp, x) - c(x, x)}{t}$$
for small $t > 0$. Consequently,
$$-\bigl(\operatorname{grad}_y c(x, y)\big|_{y=x},\, p\bigr) \le \liminf_{t \downarrow 0} \frac{u(x + tp) - u(x)}{t} \le \limsup_{t \downarrow 0} \frac{u(x + tp) - u(x)}{t} \le \bigl(\operatorname{grad}_x c(x, y)\big|_{y=x},\, p\bigr). \tag{5.1.7}$$



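Further, differentiating (5.1.2) yields
$$\frac{\partial c}{\partial x_i}(x, y)\Big|_{y=x} + \frac{\partial c}{\partial y_i}(x, y)\Big|_{y=x} = 0, \quad i = 1, \dots, n,$$
that is,
$$\operatorname{grad}_x c(x, y)\big|_{y=x} = -\operatorname{grad}_y c(x, y)\big|_{y=x}. \tag{5.1.8}$$
From (5.1.7) and (5.1.8) the existence of the limit $\lim_{t \downarrow 0} \frac{u(x + tp) - u(x)}{t}$ follows; moreover,
$$\lim_{t \downarrow 0} \frac{u(x + tp) - u(x)}{t} = \bigl(\operatorname{grad}_x c(x, y)\big|_{y=x},\, p\bigr).$$
Now,
$$\lim_{t \downarrow 0} \frac{u(x - tp) - u(x)}{-t} = -\lim_{t \downarrow 0} \frac{u(x + t(-p)) - u(x)}{t} = -\bigl(\operatorname{grad}_x c(x, y)\big|_{y=x},\, -p\bigr) = \bigl(\operatorname{grad}_x c(x, y)\big|_{y=x},\, p\bigr)$$
implies the existence of the derivative
$$\frac{d\,u(x + tp)}{dt}\Big|_{t=0} = \bigl(\operatorname{grad}_x c(x, y)\big|_{y=x},\, p\bigr).$$
Thus $u$ is continuously differentiable, and its gradient is given by (5.1.5). A continuously differentiable function having a given gradient is uniquely determined up to a constant term; therefore, $u(x) = u_0(x) + u(x_0)$. Here $u_0$ is given by (5.1.6), and since $\operatorname{Lip}(c, S; E(S))$ is nonempty by assumption, (5.1.3) holds. Boundedness of $c$ implies that the functions in $\operatorname{Lip}(c, S; E(S))$ are bounded as well. Therefore,
$$\operatorname{Lip}(c, S; C^b(S)) = \operatorname{Lip}(c, S; E(S)). \tag{5.1.9}$$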

(If $\operatorname{Lip}(c, S; E(S))$ is empty, then the equality (5.1.9) is obvious.) Further, taking into account Theorems 4.6.26 and 4.6.17, it follows from (5.1.3) that
$$c_*(x, y) = u_0(x) - u_0(y) \quad \text{for all } x, y \in S.$$
Then for any $\mu \in V_+(S \times S)$ with $(\pi_1 - \pi_2)\mu = \rho$ we obtain
$$\int_{S \times S} c_*(x, y)\,\mu(d(x, y)) = \int_S u_0(x)\,\rho(dx). \tag{5.1.10}$$



Consequently,
$$A(c_*, \rho) = \int_S u_0(x)\,\rho(dx).$$
To obtain (5.1.4) it remains to apply the reduction Theorem 4.5.30 (its assumption (4.5.10) is satisfied because of the boundedness of $c$). The proof of I is now complete.

II. This follows from the equality (5.1.9) and the reduction Theorem 4.5.30. In fact, we also take into account that in view of Theorem 4.6.17 and the equivalence (a) ⇔ (b) of Theorem 4.6.26, we have $c_*(x, y) = -\infty$ for all $x, y \in S$. □

Remark 5.1.3 (a) The assumption (5.1.2) is essential for the validity of the theorem. This may be illustrated by the following example. Take $c(x, y) = 1$ for all $x, y \in S$ and observe that
$$\operatorname{Lip}(c, S; E(S)) \supseteq \left\{u \in E(S);\ |u(x)| \le \tfrac{1}{2} \text{ for all } x \in S\right\} + \mathbb{R},$$
$$\operatorname{Lip}(c, S; C^b(S)) \supseteq \left\{u \in C^b(S);\ |u(x)| \le \tfrac{1}{2} \text{ for all } x \in S\right\} + \mathbb{R},$$
so (5.1.3) and (5.1.9) fail to hold.

(b) Also, the smoothness of the cost function $c$ at the diagonal is crucial for this explicit result. It excludes, for example, the interesting case $c(x, y) = \|x - y\|$, $x, y \in \mathbb{R}^k$. (The case $c_a(x, y) = \|x - y\|^a$, $a > 1$, leads to trivial results; see Corollary 5.1.2.) Some explicit bounds in this case are given in Section 3.7.

5.1.3 Conditions for the Nonemptiness of Lip(c, S; E(S))

To identify the conditions yielding $\operatorname{Lip}(c, S; E(S)) \ne \emptyset$ we shall need some assumptions about the smoothness of $c$ in a neighborhood of the diagonal. Let $G \supset \{(x, x);\ x \in S\}$ be an open neighborhood of the diagonal, and set $G(x) := \{y \in S;\ (x, y) \in G\}$. Let $c$ be a function $S \times S \to \mathbb{R} \cup \{+\infty\}$ that is finite on $G$ and satisfies (5.1.2). We say that the cost function $c$ is regular on $G$ if the following conditions hold:



(α) $c$ is continuously differentiable on $G$.

(β) On the diagonal there exist continuous partial derivatives
$$\frac{\partial^2 c}{\partial x_i \partial x_j}(x, x), \quad \frac{\partial^2 c}{\partial x_i \partial y_j}(x, x),$$
and for any $i, j \in \{1, \dots, n\}$, the equality
$$\frac{\partial^2 c}{\partial x_i \partial x_j}(x, x) = \frac{\partial^2 c}{\partial x_j \partial x_i}(x, x)$$
holds.

(γ) For every $x \in S$, the function $c(x, \cdot)$ is twice continuously differentiable ($C^2$) on $G(x)$.

All these regularity assumptions are clearly satisfied when $c$ is $C^2$ on $G$. If $c$ is regular on $G$, then for $(x, y) \in G$ let us define the quadratic form
$$B(p; x, y) := \sum_{i=1}^{n} \sum_{j=1}^{n} \beta_{ij}(x, y)\, p_i p_j, \quad p = (p_1, \dots, p_n), \tag{5.1.11}$$
with coefficients
$$\beta_{ij}(x, y) := \frac{\partial^2 c}{\partial x_i \partial x_j}(y, y) + \frac{\partial^2 c}{\partial x_i \partial y_j}(y, y) + \frac{\partial^2 c}{\partial y_i \partial y_j}(x, y). \tag{5.1.12}$$

The following theorem provides necessary conditions to have $\operatorname{Lip}(c, S; E(S)) \ne \emptyset$.

Theorem 5.1.4 (Necessary conditions for a nontrivial solution of the mass transshipment problem) Let $c : S \times S \to \mathbb{R} \cup \{+\infty\}$ satisfy (5.1.2) and be regular on an open neighborhood $G$ of the diagonal. If $\operatorname{Lip}(c, S; E(S))$ is nonempty, then for every $x \in S$,
$$\frac{\partial^2 c}{\partial x_i \partial y_j}(x, x) = \frac{\partial^2 c}{\partial x_j \partial y_i}(x, x) \quad \forall i, j \in \{1, \dots, n\}; \tag{5.1.13}$$
moreover, the quadratic form $B(p; x, x)$ is positive semidefinite; that is,
$$B(p; x, x) \ge 0 \quad \text{for all } p \in \mathbb{R}^n. \tag{5.1.14}$$

Remark 5.1.5 If $c$ is $C^2$ on $G$, then by repeated differentiation of (5.1.2) we get
$$\beta_{ij}(x, x) = -\frac{\partial^2 c}{\partial x_i \partial y_j}(x, x).$$



The next three theorems supply sufficient conditions that guarantee $\operatorname{Lip}(c, S; E(S))$ to be nonempty.

Theorem 5.1.6 (Sufficient conditions for a nontrivial solution) Suppose that the domain $S$ is convex, and that the cost function $c$ satisfies (5.1.2) and is regular on $S \times S$. Also suppose that (5.1.13) holds and that for all $x, y \in S$ the quadratic form (5.1.11) is positive semidefinite,
$$B(p; x, y) \ge 0 \quad \text{for all } p \in \mathbb{R}^n. \tag{5.1.15}$$
Then $\operatorname{Lip}(c, S; E(S))$ is nonempty.

Theorem 5.1.7 Suppose that $S$ is convex and that $c$ is $C^1$ on $S \times S$, and moreover, that it satisfies (5.1.2) and the regularity condition (β). Also suppose that (5.1.13) holds and that for all $x, y \in S$,
$$\bigl(\operatorname{grad}_y c(x, y) - \operatorname{grad}_y c(y, y),\ y - x\bigr) \ge 0. \tag{5.1.16}$$
Then $\operatorname{Lip}(c, S; E(S))$ is nonempty.

Theorem 5.1.8 Let $S$ be convex and let $c : S \times S \to \mathbb{R}$ be regular on $S \times S$ and satisfy (5.1.2). Suppose that for every $x \in S$ the symmetry condition in (5.1.13) holds and the matrix $(\beta_{ij}(x, x))$ is positive definite; that is,
$$\sum_{i=1}^{n} \sum_{j=1}^{n} \beta_{ij}(x, x)\, p_i p_j > 0 \quad \text{whenever } p = (p_1, \dots, p_n) \ne 0. \tag{5.1.17}$$
Suppose also that
$$\max\bigl\{\bigl(\operatorname{grad}_y c(x, y) - \operatorname{grad}_y c(y, y),\ y - z\bigr),\ \bigl(\operatorname{grad}_y c(x, z) - \operatorname{grad}_y c(z, z),\ z - y\bigr)\bigr\} \ge 0 \tag{5.1.18}$$
for all $x, y, z \in S$. Then $\operatorname{Lip}(c, S; E(S))$ is nonempty.

5.1.4 Auxiliary Results and Proofs of Theorems 5.1.4, 5.1.6, 5.1.7, and 5.1.8 Providing Necessary and Sufficient Conditions for Existence of a Nontrivial Explicit Solution of the Kantorovich–Rubinstein Mass Transshipment Problem

The proofs of these theorems are based on two lemmas. Lemma 5.1.9 Let c satisfy (5.1.2) and be regular on an open neighborhood of the diagonal. If there exists a function u0 satisfying (5.1.5), then the equalities in (5.1.13) hold.



Proof: It follows from (5.1.5) and the regularity of $c$ that $u_0$ is twice continuously differentiable. Then
$$\frac{\partial^2 u_0}{\partial x_i \partial x_j} = \frac{\partial^2 u_0}{\partial x_j \partial x_i}, \quad i, j \in \{1, \dots, n\},$$
and since in view of (5.1.5)
$$\frac{\partial^2 u_0}{\partial x_i \partial x_j}(x) = \frac{\partial^2 c}{\partial x_i \partial x_j}(x, x) + \frac{\partial^2 c}{\partial x_i \partial y_j}(x, x),$$
this, together with the symmetry in condition (β), implies (5.1.13). □

Now fix an arbitrary real-valued function $u_0$ on $S$ and define for each $x \in S$ the function $\varphi^x$ on $S$ as follows:
$$\varphi^x(y) := u_0(y) + c(x, y), \quad y \in S. \tag{5.1.19}$$
Taking into account (5.1.2), the following equivalence is straightforward:
$$u_0 \in \operatorname{Lip}(c, S; E(S)) \iff \varphi^x(x) = \min_{y \in S} \varphi^x(y) \quad \forall x \in S. \tag{5.1.20}$$
Next, if $u_0$ satisfies (5.1.5), then
$$\operatorname{grad} u_0(x) = -\operatorname{grad}_y c(x, y)\big|_{y=x}. \tag{5.1.21}$$
Hence
$$\operatorname{grad} \varphi^x(y) = \operatorname{grad}_y c(x, y) - \operatorname{grad}_y c(y, y). \tag{5.1.22}$$

The following lemma can be established by a direct computation; cf. Levin (1992, Lemma 8).

Lemma 5.1.10 Let $c$ satisfy (5.1.2) and be regular on an open neighborhood $G$ of the diagonal. If $u_0$ satisfies (5.1.5), then $\varphi^x$ is $C^2$ on $G(x)$ and the equality
$$\frac{\partial^2 \varphi^x(y)}{\partial y_i \partial y_j} = \beta_{ij}(x, y), \quad i, j \in \{1, \dots, n\}, \tag{5.1.23}$$
holds, where the $\beta_{ij}(x, y)$ are given by (5.1.12).

Proof of Theorem 5.1.4: Define $\varphi^x$ as in (5.1.19), where $u_0 \in \operatorname{Lip}(c, S; E(S))$. The function $u_0$ satisfies (5.1.5) by Theorem 5.1.1, I. Therefore, it is twice continuously differentiable in view of the assumption about the regularity of $c$. Now,



(5.1.13) holds by Lemma 5.1.9. Further, the function $\varphi^x$ is $C^2$ on $G(x)$, and the equivalence (5.1.20) implies that
$$\operatorname{grad} \varphi^x(y)\big|_{y=x} = 0. \tag{5.1.24}$$
Furthermore,
$$\sum_{i=1}^{n} \sum_{j=1}^{n} \frac{\partial^2 \varphi^x(x)}{\partial y_i \partial y_j}\, p_i p_j \ge 0 \quad \text{whenever } p = (p_1, \dots, p_n) \in \mathbb{R}^n.$$
Here condition (5.1.24) is equivalent to (5.1.5) (and to (5.1.21)) and means that the point $y = x$ is stationary, while the subsequent inequality is the second-order condition for $y = x$ to be a local minimum point for $\varphi^x$. According to Lemma 5.1.10, the above inequality can be rewritten as (5.1.14), which completes the proof. □

Proof of Theorem 5.1.6: Condition (5.1.13) combined with the regularity assumption (β) implies the symmetry condition
$$\frac{\partial}{\partial x_j}\left[\frac{\partial c}{\partial x_i}(x, y)\Big|_{y=x}\right] = \frac{\partial}{\partial x_i}\left[\frac{\partial c}{\partial x_j}(x, y)\Big|_{y=x}\right]$$
for any $i, j \in \{1, \dots, n\}$. It follows then that the function $u_0$ given by (5.1.6) satisfies (5.1.5) and (5.1.21). Take this $u_0$ and consider the functions $\varphi^x$, $x \in S$, given by (5.1.19). Applying (5.1.15) and Lemma 5.1.10 (see (5.1.23)) gives
$$\frac{d^2 \varphi^x((1 - t)y + tz)}{dt^2} = B(z - y;\ x,\ (1 - t)y + tz) \ge 0$$
whenever $0 \le t \le 1$, $x, y, z \in S$. It follows from this inequality that the functions $\varphi^x$, $x \in S$, are convex. Further, (5.1.5) implies (5.1.24). Thus, $y = x$ is a stationary point of the function $\varphi^x$, and since $\varphi^x$ is convex, $x$ is a global minimum point of $\varphi^x$. To conclude the proof it remains to use the equivalence (5.1.20). □

Proof of Theorem 5.1.7: In view of (5.1.13) and the regularity of $c$, there exists a function $u_0$ satisfying (5.1.21) (cf. the proof of Theorem 5.1.6). For any $y \in S$ let
$$y_t := (1 - t)x + ty, \quad 0 \le t \le 1.$$



Using (5.1.21) and (5.1.12), we get for every $t$, $0 < t < 1$,
$$\frac{d\varphi^x(y_t)}{dt} = \left(\operatorname{grad} \varphi^x(y_t),\ \frac{dy_t}{dt}\right) = \bigl(\operatorname{grad}_y c(x, y_t) - \operatorname{grad}_y c(y_t, y_t),\ y - x\bigr) = \frac{1}{t}\bigl(\operatorname{grad}_y c(x, y_t) - \operatorname{grad}_y c(y_t, y_t),\ y_t - x\bigr).$$
Therefore, by (5.1.16), $\frac{d\varphi^x(y_t)}{dt} \ge 0$. Consequently,
$$\varphi^x(y) = \varphi^x(x) + \int_0^1 \frac{d\varphi^x(y_t)}{dt}\, dt \ge \varphi^x(x),$$
and $u_0 \in \operatorname{Lip}(c, S; E(S))$ by virtue of (5.1.20). □

Proof of Theorem 5.1.8: As in the proof of Theorem 5.1.6, conditions (5.1.13) and (β) imply
$$\frac{\partial}{\partial x_j}\left[\frac{\partial c}{\partial x_i}(x, x)\right] = \frac{\partial}{\partial x_i}\left[\frac{\partial c}{\partial x_j}(x, x)\right]$$
for all $i, j \in \{1, \dots, n\}$. This implies the existence of a function $u_0$ satisfying (5.1.5) (and (5.1.21)). Consider the functions $\varphi^x$, $x \in S$, given by (5.1.19). Then
$$\operatorname{grad} \varphi^x(y) = \operatorname{grad}_y c(x, y) - \operatorname{grad}_y c(y, y).$$
The above equality, combined with (5.1.18), yields
$$\max\bigl\{\bigl(\operatorname{grad} \varphi^x(y),\ y - z\bigr),\ \bigl(\operatorname{grad} \varphi^x(z),\ z - y\bigr)\bigr\} \ge 0$$
for all $y, z \in S$. This implies quasiconvexity of the function $\varphi^x$ (see Levin (1995a, Proposition 3.1)). Further, from the quasiconvexity of $\varphi^x$ together with the assumption (5.1.17) and the equalities (5.1.23) it follows that $t \mapsto \varphi^x((1 - t)x + ty)$ is a nondecreasing function on $[0, 1]$. Consequently,
$$\frac{d\varphi^x((1 - t)x + ty)}{dt}\Big|_{t=1} \ge 0,$$
yielding condition (5.1.16), and the result follows from Theorem 5.1.7. □



Remark 5.1.11 As we have seen, condition (5.1.15) represents the convexity of $\varphi^x$, while condition (5.1.16) implies that the function $t \mapsto \varphi^x((1 - t)x + ty)$ is nondecreasing on $[0, 1]$. It follows that the hypotheses of Theorem 5.1.6 in fact imply those of Theorem 5.1.7, so Theorem 5.1.7 provides a more general statement than Theorem 5.1.6. Also, the proof of Theorem 5.1.8 shows that this theorem is a consequence of Theorem 5.1.7.

5.1.5 A Necessary and Sufficient Condition for Lip(c, S; E(S)) to be Nonempty, Leading to a Nontrivial Explicit Solution of the Mass Transshipment Problem

Here we give a condition that is both necessary and sufficient for nonemptiness of $\operatorname{Lip}(c, S; E(S))$ in the case when $S$ is a simply connected domain. However, this condition, in contrast to the sufficient conditions given in Theorems 5.1.6, 5.1.7, and 5.1.8, is not easily verifiable for concrete cost functions.

Theorem 5.1.12 Let $S$ be a simply connected domain in $\mathbb{R}^n$, and suppose that the cost function $c : S \times S \to \mathbb{R}$ satisfies (5.1.2) and is regular in some open neighborhood of the diagonal. Also suppose that for any pair of points $x$ and $y$ in $S$ a piecewise smooth oriented curve $\gamma(y, x)$ lying in $S$ and leading from $y$ to $x$ is fixed. Then the following statements are equivalent:

(a) $\operatorname{Lip}(c, S; E(S))$ is nonempty.

(b) The equalities (5.1.13) hold, and for any $x, y \in S$,
$$\int_{\gamma(y, x)} \operatorname{grad}_\zeta c(\zeta, \eta)\big|_{\eta=\zeta} \cdot d\zeta \le c(x, y). \tag{5.1.25}$$

Remark 5.1.13 It is possible to simplify condition (5.1.25) using the specific form of the domain $S$ and choosing concrete curves $\gamma(y, x)$. For example, if $S$ is convex, then $\gamma(y, x)$ may be taken to be the rectilinear segment $[y, x]$, and if $S$ is starlike with respect to some point $x_0 \in S$, then $\gamma(y, x)$ may be taken to be the two-link curve composed of the segments $[y, x_0]$ and $[x_0, x]$.

Proof: (a) ⇒ (b). If $u_0 \in \operatorname{Lip}(c, S; E(S))$, then, by Theorem 5.1.1, I, $u_0$ satisfies (5.1.5). Hence the equalities (5.1.13) hold by Lemma 5.1.9. It remains to note that inequality (5.1.25) follows from the inclusion $u_0 \in \operatorname{Lip}(c, S; E(S))$ and (5.1.5).



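(b) ⇒ (a). Consider on $S$ the vector field
$$\operatorname{grad}_x c(x, y)\big|_{y=x} = \left(\frac{\partial c}{\partial x_1}(x, x), \dots, \frac{\partial c}{\partial x_n}(x, x)\right).$$
Using the regularity of $c$ and (5.1.13), we obtain
$$\frac{\partial}{\partial x_j}\left[\frac{\partial c}{\partial x_i}(x, x)\right] = \frac{\partial}{\partial x_i}\left[\frac{\partial c}{\partial x_j}(x, x)\right]$$
for all $x \in S$ and all $i, j \in \{1, \dots, n\}$. This implies that the function
$$u_0(x) := \int_{\gamma(x_0, x)} \operatorname{grad}_\zeta c(\zeta, \eta)\big|_{\eta=\zeta} \cdot d\zeta$$
satisfies (5.1.5). Moreover, the integral
$$\oint_\gamma \operatorname{grad}_x c(x, y)\big|_{y=x} \cdot dx \tag{5.1.26}$$
along any closed contour $\gamma$ is equal to zero. Then
$$u_0(x) - u_0(y) = \int_{\gamma(y, x)} \operatorname{grad}_\zeta c(\zeta, \eta)\big|_{\eta=\zeta} \cdot d\zeta,$$
and (5.1.25) implies $u_0 \in \operatorname{Lip}(c, S; E(S))$. □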

Remark 5.1.14 As is noted in Levin (1992), Theorem 5.1.12 can be extended (with the same proof ) to multiply connected domains and even to arbitrary smooth manifolds if in (b) we replace (5.1.13) by the requirement that the integrals (5.1.26) are equal to zero.

5.1.6 Applications to Cyclical-Monotone Operators

Let $E$ be a locally convex Hausdorff space, $E^*$ the conjugate space, and $f : E \to 2^{E^*}$ a set-valued mapping. It is assumed below that the mapping $f$ is proper; that is, the set $\operatorname{dom} f := \{x \in E;\ f(x) \ne \emptyset\}$ is nonempty.

The mapping $f$ is said to be a cyclic-monotone operator if the inequality
$$\langle x_0 - x_1, x_0' \rangle + \langle x_1 - x_2, x_1' \rangle + \dots + \langle x_k - x_0, x_k' \rangle \ge 0 \tag{5.1.27}$$
holds for any $x_i \in \operatorname{dom} f$ and $x_i' \in f(x_i)$, $i = 0, \dots, k$, and for all $k \in \mathbb{N} = \{1, 2, \dots\}$. A mapping $f : E \to 2^{E^*}$ is said to be a monotone operator if (5.1.27) holds for $k = 1$ only. In other words, $f$ is monotone if
$$\langle x - y, x' - y' \rangle \ge 0 \quad \text{whenever } x, y \in \operatorname{dom} f \text{ and } x' \in f(x),\ y' \in f(y).$$

The following criterion for cyclic-monotonicity is well known (see Rockafellar (1966, 1970)): $f$ is cyclic-monotone if and only if there exists a convex function $u : E \to \mathbb{R} \cup \{+\infty\}$ such that
$$f(x) \subseteq \partial u(x) \quad \text{for every } x \in \operatorname{dom} f. \tag{5.1.28}$$
Here
$$\partial u(x) := \{x' \in E^*;\ \langle y - x, x' \rangle \le u(y) - u(x)\ \ \forall y \in E\}$$
is the subdifferential of $u$ at the point $x$. The next theorem supplements this criterion and establishes a connection between cyclic-monotone operators and nonemptiness of the set $\operatorname{Lip}(c, S; E(S))$ for a specific set $S$ and cost function $c$. Take $S = \operatorname{dom} f$ and define on $S \times S$ the cost function
$$c(x, y) := \inf_{x' \in f(x)} \langle x - y, x' \rangle. \tag{5.1.29}$$

Theorem 5.1.15 (I) The following assertions are equivalent:

(a) $f$ is cyclic-monotone.

(b) $c(x, y) > -\infty$ for all $x, y \in S$, and the set $\operatorname{Lip}(c, S; E(S))$ is nonempty.

(II) Assume that $S = \operatorname{dom} f$ is convex. In this case a function $u : S \to \mathbb{R}$ belongs to $\operatorname{Lip}(c, S; E(S))$ if and only if it is convex and lower semicontinuous and satisfies (5.1.28).

Proof: (I) Condition (5.1.27) can be rewritten as
$$\sum_{i=1}^{k+1} c(x_{i-1}, x_i) \ge 0$$



with $x_{k+1} := x_0$. Moreover, the condition implies that $c(x, y) > -\infty$ for all $x, y \in S$, and it remains to apply the equivalence (a) ⇔ (d) of Theorem 4.6.26.

(II) Given a function $u \in \operatorname{Lip}(c, S; E(S))$, we have
$$u(x) - u(y) \le \langle x - y, x' \rangle \quad \text{for all } x, y \in S,\ x' \in f(x). \tag{5.1.30}$$
Extend $u$ to $E$ by setting $u(x) = +\infty$ for $x \notin S$. Then (5.1.30) holds for all $x \in S$, $y \in E$, and $x' \in f(x)$. In other words,
$$u^*(x') \le \langle x, x' \rangle - u(x), \tag{5.1.31}$$
where $u^*$ stands for the conjugate convex function on $E^*$, $u^*(x') := \sup\{\langle y, x' \rangle - u(y);\ y \in E\}$. It follows from (5.1.31) that
$$u^*(x') = \langle x, x' \rangle - u(x) \quad \text{for all } x \in S,\ x' \in f(x). \tag{5.1.32}$$
Therefore,
$$u(x) = u^{**}(x) \quad \text{for all } x \in S; \tag{5.1.33}$$
here, $u^{**}(x) := \sup\{\langle x, x' \rangle - u^*(x');\ x' \in E^*\}$ is the second conjugate function. It is clear from (5.1.33) that $u$ is convex and lower semicontinuous on $S$, while (5.1.32) implies that $x' \in \partial u(x)$, and (5.1.28) is thus established. On the other hand, if $f(x) \subseteq \partial u(x)$ for all $x \in S$, where $u : S \to \mathbb{R}$ is convex, then (5.1.30) holds, and consequently $u \in \operatorname{Lip}(c, S; E(S))$. □

The next theorem illustrates some new applications by making use of various conditions, necessary and sufficient, for nonemptiness of $\operatorname{Lip}(c, S; E(S))$ as given in Theorems 5.1.4 and 5.1.6 (or 5.1.7). Both theorems, 5.1.15 and 5.1.16, enable us to look at cyclic-monotone operators in view of their relationships to the mass transfer problem.

Theorem 5.1.16 Let $S$ be a convex domain in $\mathbb{R}^n$ and let $f = (f_1, \dots, f_n) : S \to \mathbb{R}^n$ be a (single-valued) $C^1$-operator; that is, all the $f_i$ are continuously differentiable. The following assertions are equivalent:



(a) $f$ is cyclic-monotone on $S$; that is, (5.1.27) holds for any $x_i \in S$, $x_i' = f(x_i)$, $i = 0, \dots, k$, $k \in \mathbb{N}$.

(b) The matrix with elements $\frac{\partial f_i}{\partial x_j}(x)$ is symmetric for all $x \in S$, and $f$ is monotone; that is,
$$(f(x) - f(y),\ x - y) \ge 0 \quad \forall x, y \in S.$$

(c) There exists a smooth ($C^2$) convex function $u : S \to \mathbb{R}$ such that $f(x) = \operatorname{grad} u(x)$.

Proof: According to Theorem 5.1.15, cyclic-monotonicity of $f$ is equivalent to nonemptiness of the set $\operatorname{Lip}(c, S; E(S))$; here, the cost function is given by
$$c(x, y) = (f(x),\ x - y) = \sum_{i=1}^{n} f_i(x)(x_i - y_i).$$
Obviously, $c$ is regular on $S \times S$ and satisfies (5.1.2). Further, by Theorem 5.1.4, for the nonemptiness of $\operatorname{Lip}(c, S; E(S))$ it is necessary that (i) the equalities (5.1.13) hold, and (ii) all matrices $(\beta_{ij}(x, x))$, $x \in S$, be positive semidefinite (see (5.1.14)). At the same time, by Theorem 5.1.6, for nonemptiness of $\operatorname{Lip}(c, S; E(S))$ it is sufficient that the equalities (5.1.13) hold, and furthermore, that all matrices $(\beta_{ij}(x, y))$, $x, y \in S$, be positive semidefinite. A direct calculation gives
$$\frac{\partial^2 c}{\partial x_i \partial y_j}(x, y) = -\frac{\partial f_j}{\partial x_i}(x), \qquad \beta_{ij}(x, y) = \frac{\partial f_i}{\partial x_j}(y).$$
Consequently, both conditions, the necessary and the sufficient ones, are equivalent to symmetry and positive semidefiniteness of all matrices $(\partial f_i/\partial x_j\,(x))$, $x \in S$. The last property can be reformulated as (c), and therefore, the equivalence (a) ⇔ (c) is proved. Next, by Theorem 5.1.7, $\operatorname{Lip}(c, S; E(S))$ is nonempty if (5.1.13) and (5.1.16) hold. As noted above, (5.1.13) is equivalent to the symmetry of $(\partial f_i/\partial x_j\,(x))$ for all $x \in S$. It is easily seen that (5.1.16) is equivalent to monotonicity of $f$. Hence (b) ⇒ (a), and since cyclic-monotonicity implies both monotonicity and (5.1.13), (a) ⇒ (b). The proof is now complete. □
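The role of the symmetry requirement in (b) can be seen on linear operators in the plane. The sketch below is only an illustration, and all names and matrices in it are hypothetical. It evaluates the cycle sum $\sum_i c(x_i, x_{i+1})$ with $c(x, y) = (f(x), x - y)$ for $f(x) = Ax$. For a symmetric positive definite $A$ the sum is nonnegative on every cycle, since $f = \operatorname{grad} u$ with $u(x) = \frac{1}{2}(Ax, x)$ convex. For the nonsymmetric matrix `B_rot` the operator is still monotone, because $(Bx - By, x - y) = |x - y|^2$, but a clockwise octagon on the unit circle yields a negative cycle sum, so $f$ is not cyclic-monotone.

```python
import math


def matvec(a, x):
    """Matrix-vector product for lists of floats."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]


def cost(a, x, y):
    """c(x, y) = (f(x), x - y) for the linear operator f(x) = a x."""
    fx = matvec(a, x)
    return sum(fx[i] * (x[i] - y[i]) for i in range(len(x)))


def cycle_sum(a, points):
    """Sum of c(x_i, x_{i+1}) around the closed cycle through `points`."""
    n = len(points)
    return sum(cost(a, points[i], points[(i + 1) % n]) for i in range(n))


def clockwise_polygon(n):
    """Vertices of a regular n-gon on the unit circle, clockwise order."""
    return [[math.cos(-2 * math.pi * k / n), math.sin(-2 * math.pi * k / n)]
            for k in range(n)]


A_sym = [[2.0, 1.0], [1.0, 2.0]]    # symmetric, positive definite
B_rot = [[1.0, 1.0], [-1.0, 1.0]]   # monotone, but Jacobian not symmetric
octagon = clockwise_polygon(8)

sym_sum = cycle_sum(A_sym, octagon)   # nonnegative
rot_sum = cycle_sum(B_rot, octagon)   # negative: cyclic monotonicity fails
```

A two-point cycle never detects the failure for `B_rot`, because the case $k = 1$ of (5.1.27) is exactly monotonicity; a longer cycle is needed.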

Remark 5.1.17 It is well known that the matrices with elements (∂fi /∂xj )(x), x ∈ S, are positive semidefinite if and only if f is monotone;



see, for example, Ortega and Rheinboldt (1970, Theorem 4.5.3). Therefore, the implication (b) ⇒ (a) is a consequence of the implication (c) ⇒ (a). We have preferred to give a direct proof of (b) ⇒ (a) in order to demonstrate once more the usefulness of the method based on conditions for Lip (c, S; E(S)) to be nonempty. Other applications of the above conditions for nonemptiness of Lip (c, S; E(S)) will be given further on.
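The role of the symmetry condition in (b) can be illustrated numerically. The sketch below is not from the text: the two operators `f_grad` and `f_rot` and the helper names are chosen here purely for illustration. It evaluates the cost c(x, y) = (f(x), x − y) along closed chains. For a gradient of a convex function the cycle sums are nonnegative, while a rotation by 90 degrees is monotone but has a nonsymmetric Jacobian and admits a cycle with negative sum.

```python
import random

def f_grad(x):
    """Gradient of the convex function u(x) = x1^2 + x1*x2 + x2^2 (illustrative choice)."""
    return (2 * x[0] + x[1], x[0] + 2 * x[1])

def f_rot(x):
    """Rotation by 90 degrees; its Jacobian [[0, -1], [1, 0]] is not symmetric."""
    return (-x[1], x[0])

def cost(f, x, y):
    """c(x, y) = (f(x), x - y)."""
    fx = f(x)
    return fx[0] * (x[0] - y[0]) + fx[1] * (x[1] - y[1])

def is_monotone_on(f, pairs):
    """Check (f(x) - f(y), x - y) >= 0 on the given pairs of points."""
    return all(cost(f, x, y) + cost(f, y, x) >= 0 for x, y in pairs)

def cycle_sum(f, cycle):
    """Sum of c(x_i, x_{i+1}) along the closed chain x_0, ..., x_k, x_0."""
    n = len(cycle)
    return sum(cost(f, cycle[i], cycle[(i + 1) % n]) for i in range(n))

random.seed(0)
points = [(random.randint(-5, 5), random.randint(-5, 5)) for _ in range(30)]
pairs = [(p, q) for p in points for q in points]
cycles = [random.sample(points, 4) for _ in range(200)]
triangle = [(1, 0), (0, 1), (-1, 0)]

grad_cycle_sums = [cycle_sum(f_grad, cyc) for cyc in cycles]
rot_triangle_sum = cycle_sum(f_rot, triangle)
```

Integer coordinates keep all sums exact, so the sign checks are not affected by rounding.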

5.2 Extension and Approximate Extension Theorems

In this section the following extension problem is investigated. Given a space S, a subspace S1 , a cost function c : S × S → IR ∪ {+∞}, and a function u1 ∈ Lip (c|S1 ×S1 , S1 ; X1 ), the question arises as to when there exists an extension u of u1 to S such that u ∈ Lip (c, S; X). Here X denotes some linear space of functions on S, and X1 is the space of functions on S1 of the same type (for example, one can take X = C b (S) and X1 = C b (S1 )). The extension problem is to find conditions on c and S1 that will guarantee the existence of such an extension for every u1 ∈ Lip (c|S1 ×S1 , S1 ; X1 ), that is, the validity of the relation Lip (c|S1 ×S1 , S1 ; X1 ) = Lip (c, S; X)|S1 , where |S1 in the right-hand side stands for the operation of restricting functions u : S → IR to S1 .

Given a function u1 ∈ Lip (c|S1 ×S1 , S1 ; X1 ), by approximate extension theorems we mean assertions about the existence for each ε > 0 of a function u(ε) ∈ Lip (c, S; X) satisfying |u(ε) (x) − u1 (x)| ≤ ε for all x ∈ S1 . The results in this section are mostly due to Levin and Milyutin (1979) and Levin (1978a, 1994).

5.2.1 The Simplest Extension Theorem (the Case X = E(S) and X1 = E(S1 ))

First observe that (c|S1 ×S1 )∗ need not coincide with c∗ |S1 ×S1 , and since Lip (c, S; X) = Lip(c∗ , S; X), it follows that the extension problem cannot have a solution for functions u1 ∈ Lip (c|S1 ×S1 , S1 ; X1 ) \ Lip(c∗ |S1 ×S1 , S1 ; X1 ). Thus the extension problem makes sense only in the case when c = c∗ , that is, when the cost function c : S × S → IR ∪ {+∞} satisfies the triangle inequality.
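A three-point example makes the obstruction concrete. The sketch below is illustrative only (the cost matrix, the subset, and the helper names `reduced_cost` and `restrict` are assumptions of this example). It computes the reduced cost as the minimum of chain sums by a Floyd–Warshall pass, and shows that reducing after restricting to S1 differs from restricting after reducing, so that a function admissible on S1 has no extension.

```python
def reduced_cost(c):
    """Minimum of chain sums: the largest function below c with the triangle inequality."""
    n = len(c)
    d = [row[:] for row in c]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

def restrict(c, idx):
    """Restriction of a cost matrix to the index set idx."""
    return [[c[i][j] for j in idx] for i in idx]

# S = {0, 1, 2}; the direct cost between 0 and 2 is large, the detour via 1 is cheap.
c = [[0, 1, 10],
     [1, 0, 1],
     [10, 1, 0]]
S1 = [0, 2]

restricted_then_reduced = reduced_cost(restrict(c, S1))   # (c|S1xS1)*
reduced_then_restricted = restrict(reduced_cost(c), S1)   # c*|S1xS1

# u1 satisfies u1(x) - u1(y) <= c(x, y) on S1, but any extension value t = u(1)
# would need t >= u1(0) - c(0, 1) and t <= u1(2) + c(1, 2) at the same time.
u1 = {0: 10, 2: 0}
lower_bound = u1[0] - c[0][1]
upper_bound = u1[2] + c[1][2]
```

Here the admissible interval for the value at the middle point is empty, which is exactly the failure described above when c does not satisfy the triangle inequality.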


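Recall (see Section 4.6.5) that E(S) denotes the space of all real-valued functions on S.

Theorem 5.2.1 Suppose that the cost function c : S × S → IR ∪ {+∞} satisfies the triangle inequality and that for every z ∈ S there exist x(z) and y(z) in S1 such that c(x(z), z) < +∞ and c(z, y(z)) < +∞. Then for every function u1 ∈ Lip (c|S1 ×S1 , S1 ; E(S1 )) there exists a function u ∈ Lip (c, S; E(S)) extending u1 to S.

Proof: Denote by Z the set of pairs (S′, u′); here S1 ⊆ S′ ⊆ S, and u′ is a function on S′ extending u1 and belonging to Lip(c|S′×S′ , S′; E(S′)). The set Z is partially ordered (in a natural way) and satisfies the hypothesis of Zorn's lemma (Z is nonempty because (S1 , u1 ) ∈ Z). Then Z has a maximal element (Smax , umax ), and it remains to check that Smax = S. Suppose the contrary. Then there exists an element z∗ ∈ S \ Smax . Suppose we have shown that

$$-\infty < \sup_{x \in S_{\max}} [u_{\max}(x) - c(x, z_*)] \le \inf_{y \in S_{\max}} [u_{\max}(y) + c(z_*, y)] < +\infty. \tag{5.2.1}$$

Then umax can be extended to Smax ∪ {z∗} by assigning to z∗ any value between the supremum and the infimum in (5.2.1), and the extended function satisfies the required inequalities on the enlarged set, which contradicts the maximality of (Smax , umax ). To verify (5.2.1), note that for all x, y ∈ Smax the triangle inequality gives umax (x) − umax (y) ≤ c(x, y) ≤ c(x, z∗ ) + c(z∗ , y), which yields the middle inequality. Furthermore,

$$-\infty < u_{\max}(x(z_*)) - c(x(z_*), z_*) \le \sup_{x \in S_{\max}} [u_{\max}(x) - c(x, z_*)]$$

and

$$\inf_{y \in S_{\max}} [u_{\max}(y) + c(z_*, y)] \le u_{\max}(y(z_*)) + c(z_*, y(z_*)) < +\infty,$$

which completes the proof. □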
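On a finite set the Zorn's lemma argument reduces to adding one point at a time. The following sketch is a hypothetical finite illustration, not the authors' construction: the asymmetric cost `c`, the data `u1`, and the function names are assumptions of this example. At each step the new value is taken from the interval between the largest lower bound and the smallest upper bound imposed by the points already handled.

```python
def c(x, y):
    """Asymmetric cost on the line: moving up costs 2 per unit, moving down 1 per unit.
    It is a subadditive function of y - x, so the triangle inequality holds."""
    return 2 * (y - x) if y >= x else (x - y)

def extend(points, cost, u1):
    """Extend u1 (a dict on a subset) to all points, one point at a time."""
    u = dict(u1)
    for z in points:
        if z in u:
            continue
        lower = max(u[x] - cost(x, z) for x in u)   # needed for u(x) - u(z) <= c(x, z)
        upper = min(u[y] + cost(z, y) for y in u)   # needed for u(z) - u(y) <= c(z, y)
        if lower > upper:
            raise ValueError("no admissible value at %r" % (z,))
        u[z] = lower   # any value in [lower, upper] would do
    return u

def is_c_lipschitz(u, cost):
    """Check u(x) - u(y) <= c(x, y) for all pairs in the domain of u."""
    return all(u[x] - u[y] <= cost(x, y) for x in u for y in u)

points = list(range(6))
u1 = {0: 0, 5: -3}          # satisfies u1(x) - u1(y) <= c(x, y) on S1 = {0, 5}
u = extend(points, c, u1)
```

Choosing `lower` gives the smallest admissible extension at each step; choosing `upper` would give the largest one.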



Corollary 5.2.2 Let S be a topological space and let the cost function c : S × S → IR be continuous and satisfy the triangle inequality and c(x, x) = 0 for all x ∈ S. Then for any u1 ∈ Lip (c|S1 ×S1 , S1 ; C(S1 )) there exists a function u ∈ Lip (c, S; C(S)) extending u1 to S. (Here S1 is any subspace of S, and C(S) stands for the vector space of all continuous real-valued functions on S.) Proof: It follows from the continuity of c and the equality c(x, x) = 0, x ∈ S, that Lip (c, S; C(S)) = Lip (c, S; E(S)); and furthermore, Lip (c|S1 ×S1 , S1 ; C(S1 )) = Lip (c|S1 ×S1 , S1 ; E(S1 )). It remains to apply the theorem.

2

Remark 5.2.3 If, in addition to the assumptions of Corollary 5.2.2, the cost function c is bounded, we can replace the spaces C(S) and C(S1 ) in the above corollary with C b (S) and C b (S1 ) respectively. In fact, in this case Lip (c, S; C(S)) = Lip(c, S; C b (S)), and similarly, Lip (c|S1 ×S1 , S1 ; C(S1 )) = Lip(c|S1 ×S1 , S1 ; C b (S1 )).

5.2.2 Approximate Extension Theorems

The approximate extension property was introduced in Levin (1977), and an abstract scheme for obtaining general approximate extension theorems was proposed in Levin and Milyutin (1979, pp. 34–35). This scheme is as follows: Let S be a nonempty set and W a Banach lattice of bounded real-valued functions on S × S satisfying axioms (W1 ), (W2 ), (W3 ) (see Section 4.2.1). Further, let X be a Banach lattice of bounded real-valued functions on S such as in Section 4.2.1, and let c : S × S → IR ∪ {+∞} be a function bounded below that satisfies the triangle inequality and is regular with respect to W and X (see Definition 4.2.2). Also, let S′ be a nonempty set and suppose that Banach lattices W′, X′ and a cost function c′ : S′ × S′ → IR ∪ {+∞} possess the same properties as W, X, and c, respectively.



∗ For 1 , 2 ∈ X+ with ||1 || = ||2 || we define

(1 , 2 ) := {σ ∈ W+∗ ; π1 σ = 1 , π2 σ = 2 }. The set ||2 ||.

∗ with ||1 || = (1 , 2 ) in W+∗ is defined similarly for 1 , 2 ∈ X+

The crucial point of the abstract scheme is the following theorem. Theorem 5.2.4 Suppose that the cost functions c and c are bounded below, satisfy the triangle inequalities, and are regular with respect to the corresponding lattices. Also suppose that a linear operator % : W → W is given possessing the following properties: (i) %W+ ⊆ W+ , %1S×S = 1S ×S , %X ⊆ X . ∗ with ||1 || = (ii) %∗ (1 , 2 ) = (%∗ 1 , %∗ 2 ) for any 1 , 2 ∈ X+ ||2 ||. (iii) (%∗ )+ = %∗ + ,

(%∗ )− = %∗ −

for every ∈ X ∗ .

(iv) % Lip (c, S; X) ⊆ Lip(c , S ; X ). (v) σ (c ) = (%∗ σ )(c) for every σ ∈ W+∗ σ(c) and σ (c ) see (4.2.2)).

(for the definitions of

Then % Lip (c, S; X) is dense in Lip(c , S ; X ). Proof: Assume the contrary. Then there exists a functional ∈ X ∗ such that sup{ u , ; u ∈ % Lip (c, S; X)} < sup{ u , ; u ∈ Lip(c , S ; X )}.

(5.2.2)

Notice that ∈ X0∗ , since by (i) the closure of % Lip (c, S; X) in X is invariant under translation by any constant: u ∈ % Lip (c, S; X) ⇒ u + α ∈ % Lip (c, S; X) ∀α ∈ IR. Hence, in view of (5.2.2), 1S ×S , = 0. Further, taking into account (iii), we see that %∗ ∈ X0∗ , and the left-hand side of (5.2.2) is equal to B(c, %∗ ; X) and the right-hand side to B(c , ; X ). Now, according to Theorem 4.2.6, we have B(c, %∗ ; X) B(c , ; X )

= A(c, %∗ ; W, X), = A(c , ; W , X ).



Consequently, A(c, %∗ ; W, X) < A(c , ; W , X ). On the other hand, by Theorem 4.2.8, A(c, %∗ ; W, X)

=

A(c , ; W , X )

=

((%∗ )+ , (%∗ )− )}, inf{σ (c ); σ ∈ + , − }. inf{σ(c) ; σ ∈

Applying (ii), (iii), and (v) yields A(c, %∗ ; W, X) = A(c , ; W , X ), which is a contradiction.

2

Remark 5.2.5 All the conditions (i)–(v) are easily seen to be satisfied when (i) S′ = S1 ⊂ S; (ii) W′ consists of the functions in W restricted to S1 × S1 ; (iii) X′ consists of the functions in X restricted to S1 ; (iv) %w = w|S1 ×S1 for all w ∈ W ; and (v) c′ is the restriction of c to S1 × S1 . In this case the theorem asserts that for every u1 ∈ Lip(c|S1 ×S1 , S1 ; X|S1 ) and every ε > 0 there exists a function vε ∈ Lip (c, S; X) such that |u1 (x) − vε (x)| ≤ ε for all x ∈ S1 , if c is bounded from below, satisfies the triangle inequality, and both functions c and c′ are regular with respect to the corresponding lattices. We remark that the requirement on c′ = c|S1 ×S1 to be regular is a condition on c and S1 , which is to be verified in concrete examples.

Theorem 5.2.6 Suppose that S is a compact space and S1 is a closed subspace, and let c : S × S → IR ∪ {+∞} be lower semicontinuous and satisfy the triangle inequality. Then for every u1 ∈ Lip (c|S1 ×S1 , S1 ; C(S1 )) and every ε > 0 there exists a function u(ε) ∈ Lip (c, S; C(S)) such that |u1 (x) − u(ε) (x)| ≤ ε for all x ∈ S1 .

Proof: By Example 4.2.3, the cost function c is regular with respect to W = C(S × S) and X = C(S), and the cost function c′ = c|S1 ×S1 is regular with respect to W′ = C(S1 × S1 ) and X′ = C(S1 ). In view of Remark 5.2.5, the result follows. □

The following approximate extension theorem is proved by a similar argument using Example 4.2.4 and Banach lattices W, X, W′, X′ of bounded Borel functions on the corresponding spaces.



Theorem 5.2.7 Let S be a Polish space and S1 a Borel subset. Suppose further that c : S × S → IR ∪ {+∞} satisfies the triangle inequality and that its sublevel sets

{(x, y) ∈ S × S; c(x, y) ≤ α}, α ∈ IR,

are analytic. Let X and X′ denote the Banach lattices of bounded Borel functions on S and S1 respectively. Then for every u1 ∈ Lip(c|S1 ×S1 , S1 ; X′) and every ε > 0 there exists a function u(ε) ∈ Lip (c, S; X) such that |u1 (x) − u(ε) (x)| ≤ ε for all x ∈ S1 .

Applying Example 4.2.5 together with Lemma 4.5.24 gives us the following approximate extension theorem.

Theorem 5.2.8 Let S be a compact space and S1 a closed Gδ subset. Suppose also that c : S × S → IR ∪ {+∞} satisfies the triangle inequality and that its sublevel sets

{(x, y) ∈ S × S; c(x, y) ≤ α}, α ∈ IR,

are B0 (S × S)-analytic. Let X and X′ denote the Banach lattices of bounded Baire measurable functions on S and S1 respectively. Then for every u1 ∈ Lip(c|S1 ×S1 , S1 ; X′) and every ε > 0 there exists a function u(ε) ∈ Lip (c, S; X) such that |u1 (x) − u(ε) (x)| ≤ ε for all x ∈ S1 .

5.2.3 Extension Theorems

Here we consider the following extension problem. Given are a compact space S and a cost function c : S × S → IR ∪ {+∞}. Let S1 be a closed subset of S, and u1 a continuous function on S1 , such that

$$u_1(x) - u_1(y) \le c(x, y) \quad \text{for all } x, y \in S_1; \tag{5.2.3}$$

that is, u1 ∈ Lip (c|S1 ×S1 , S1 ; C(S1 )). Can u1 be extended to the whole of S in such a way that the extended function is in Lip (c, S; C(S))? In other words, the problem is to find conditions on c and S1 under which the equality Lip (c, S; C(S))|S1 = Lip (c|S1 ×S1 , S1 ; C(S1 )) holds. An analogous question can be asked for the extension of a bounded Borel function u1 from a Borel subset S1 of a Polish space S to the whole of S with preservation of (5.2.3).



In this section we give sufficient conditions for the existence of such extensions. In the following we formulate an abstract version of the extension problem. Let S be a nonempty set and suppose that W and X are some Banach lattices of bounded functions on S × S and S such as those considered above (see Section 5.2.2). Further, let c : S × S → IR ∪ {+∞}, S1 ⊂ S, and let u1 be a bounded function on S1 satisfying (5.2.3). The question is whether u1 can be extended to S in such a way that the extension is in Lip (c, S; X). The following is obviously a necessary condition:

u1 (x) − u1 (y) ≤ c∗ (x, y) for all x, y ∈ S1 .

Therefore, it is sufficient to consider the extension problem for a cost function c satisfying the triangle inequality. We note also that the extension problem cannot have a solution when c is not bounded below. Thus, we assume that c is bounded below and satisfies the triangle inequality. Given a bounded function u1 on S1 satisfying (5.2.3), we associate with it two functions on S, au1 and bu1 , as follows:

$$a_{u_1}(x) := \inf\{u_1(z) + c(x, z);\ z \in S_1\}; \tag{5.2.4}$$

$$b_{u_1}(x) := \sup\{u_1(z) - c(z, x);\ z \in S_1\}. \tag{5.2.5}$$

Theorem 5.2.9 The extension problem for u1 has a solution if and only if the set Lip(c′, S; X) is nonempty, where

$$c'(x, y) := \min\{c(x, y),\ a_{u_1}(x) - b_{u_1}(y)\}. \tag{5.2.6}$$

Proof: First observe that the extension problem has a solution if and only if the set Lip(c̃, S; X) is nonempty, where

$$\tilde c(x, y) := \begin{cases} u_1(x) - u_1(y) & \text{if } x, y \in S_1, \\ c(x, y) & \text{otherwise.} \end{cases}$$

Since Lip(c̃, S; X) = Lip(c̃∗ , S; X), the theorem will be proved if we verify that c̃∗ = c′, where c′ is the function (5.2.6). For this part we fix a pair of points x, y ∈ S and consider all possible chains

x = z0 , z1 , . . . , zn , zn+1 = y (all zj ∈ S, n = 0, 1, 2, . . .),

leading from x to y. We say that the chain {zj ; j = 0, . . . , n + 1}, z0 = x, zn+1 = y, is better than {z′k ; k = 0, . . . , m + 1}, z′0 = x, z′m+1 = y, if the inequality

$$\sum_{j=0}^{n} \tilde c(z_j, z_{j+1}) \le \sum_{k=0}^{m} \tilde c(z'_k, z'_{k+1})$$

holds. We take an arbitrary chain x = z0 , . . . , zm+1 = y and show that there exists a new chain x = z′0 , . . . , z′n+1 = y with n ≤ 2 that is better than the old one and contains not more than two points from S1 .

If at most one point zk is in S1 , then

c̃(x, y) = c(x, y), c̃(zk , zk+1 ) = c(zk , zk+1 ), k = 0, 1, . . . , m.

In that case the direct chain {x, y} is better than the original chain, by the triangle inequality for c.

If exactly two points, zk1 and zk2 , lie in S1 (k1 < k2 ) and k2 ≠ k1 + 1, then c̃(zk , zk+1 ) = c(zk , zk+1 ), k = 0, 1, . . . , m, and c̃(x, y) ≤ c(x, y). Here strict inequality is possible only for k1 = 0 and k2 = m + 1, in which case x, y ∈ S1 and c̃(x, y) = u1 (x) − u1 (y). Then the direct chain {x, y} is better than the original chain, due to the triangle inequality for c.

If exactly two points, zk1 and zk1 +1 , lie in S1 , then c̃(x, y) = c(x, y) for m > 0, and c̃(zk , zk+1 ) = c(zk , zk+1 ) for all k ≠ k1 . In this case, by the triangle inequality for c, one of the following three chains is better than the original one:

{x, zk1 , zk1 +1 , y} if k1 ≠ 0 and k1 ≠ m,
{x, zk1 +1 , y} if k1 = 0,
{x, zk1 , y} if k1 = m.

We now show that if more than two points of the original chain are in S1 , then there exists a chain that is better than the original chain and in fact shorter. Let zk1 , zk2 , zk3 ∈ S1 with k1 < k2 < k3 . If k3 ≠ k1 + 2, then by the triangle inequality for c and (5.2.3), the original chain can be improved by removing from it the segment from zk1 +1 to zk2 −1 when k2 ≠ k1 + 1 and the segment from zk2 +1 to zk3 −1 when k3 ≠ k2 + 1. (Here (5.2.3) is actually



applied only in the cases k2 = k1 + 2 and k3 = k2 + 2.) But if k2 = k1 + 1 and k3 = k2 + 1, then

c̃(zk1 , zk2 ) = u1 (zk1 ) − u1 (zk2 ), c̃(zk2 , zk3 ) = u1 (zk2 ) − u1 (zk3 ), c̃(zk1 , zk3 ) = u1 (zk1 ) − u1 (zk3 ).

Now it is clear that the chain can be improved by removing zk2 . Summarizing, we see that for every chain leading from x to y there is a better chain having one of the following three types:

{x, y} if x ∈ S1 and y ∈ S1 ;
{x, z, y} with z ∈ S1 if x ∈ S1 and y ∉ S1 , or if x ∉ S1 and y ∈ S1 ;
{x, z1 , z2 , y} with z1 , z2 ∈ S1 if x ∉ S1 and y ∉ S1 .

Consequently, c̃∗ is obtained from c̃ in two stages; that is, c̃∗ = c̃2 , where

$$\tilde c_2(x, y) = \min\{\tilde c(x, y),\ \tilde c^{1}(x, y),\ \tilde c^{2}(x, y)\}$$

and, for k = 1, 2,

$$\tilde c^{k}(x, y) = \inf\Big\{\sum_{j=1}^{k+1} \tilde c(z_{j-1}, z_j);\ z_0 = x,\ z_{k+1} = y,\ z_1, \dots, z_k \in S\Big\}$$

(compare (4.1.16)–(4.1.17)). We now obtain the following characterization of c̃∗ :

$$\tilde c_*(x, y) = \begin{cases}
u_1(x) - u_1(y) & \text{if } x \in S_1 \text{ and } y \in S_1, \\
\min\big\{c(x, y),\ \inf_{z \in S_1} [u_1(x) - u_1(z) + c(z, y)]\big\} & \text{if } x \in S_1 \text{ and } y \notin S_1, \\
\min\big\{c(x, y),\ \inf_{z \in S_1} [c(x, z) + u_1(z) - u_1(y)]\big\} & \text{if } x \notin S_1 \text{ and } y \in S_1, \\
\min\big\{c(x, y),\ \inf_{z_1, z_2 \in S_1} [c(x, z_1) + u_1(z_1) - u_1(z_2) + c(z_2, y)]\big\} & \text{if } x \notin S_1 \text{ and } y \notin S_1.
\end{cases}$$

Observe that by (5.2.3),

au1 (x) = bu1 (x) = u1 (x) for all x ∈ S1 .

From this and the above formulas for c̃∗ (x, y) we arrive at

c̃∗ (x, y) = c′(x, y) for all x, y ∈ S.

The proof is now complete. □

Corollary 5.2.10 If the function (5.2.6) is regular with respect to W and X, then the extension problem for u1 has a solution; that is, there exists a function u ∈ Lip (c, S; X) extending u1 to S.

Proof: The function c′ given by (5.2.6) satisfies the triangle inequality, since c′ = c̃∗ , as we have already shown. Also, c′ is bounded from below, which is indeed clear from its definition (see (5.2.6)). Now, by Theorem 4.2.6, Lip(c′, S; X) is nonempty, and the result follows. □
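The identity behind Theorem 5.2.9 can be checked directly on a finite set. The sketch below is an illustration under assumed data (six points on a line with c(x, y) = |x − y|, a three-point subset, and helper names chosen here). It builds the functions (5.2.4) and (5.2.5), the function (5.2.6), and the modified cost that equals u1(x) − u1(y) on S1 × S1 and c elsewhere, and compares (5.2.6) with the minimum of chain sums of the modified cost.

```python
def chain_closure(cost):
    """Minimum over all chains of the sum of costs (Floyd-Warshall pass)."""
    n = len(cost)
    d = [row[:] for row in cost]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

points = list(range(6))
c = [[abs(x - y) for y in points] for x in points]   # satisfies the triangle inequality
u1 = {0: 0, 3: 1, 5: 0}                               # satisfies (5.2.3) on S1 = {0, 3, 5}
S1 = sorted(u1)

# The functions (5.2.4) and (5.2.5).
a = [min(u1[z] + c[x][z] for z in S1) for x in points]
b = [max(u1[z] - c[z][x] for z in S1) for x in points]

# The function (5.2.6).
c_prime = [[min(c[x][y], a[x] - b[y]) for y in points] for x in points]

# Modified cost: u1(x) - u1(y) on S1 x S1, c(x, y) otherwise.
c_tilde = [[u1[x] - u1[y] if (x in u1 and y in u1) else c[x][y] for y in points]
           for x in points]
c_tilde_star = chain_closure(c_tilde)
```

In this finite setting the function `a` is itself an extension of u1 satisfying the required inequalities on all of S; integer data keep every comparison exact.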

Remark 5.2.11 The functions (5.2.4) and (5.2.5) were introduced in Levin (1977).

Theorem 5.2.9 and Corollary 5.2.10 lead to a number of specific results on continuous and Borel extensions. Some of these results are given below. Let us consider the continuous extension problem. Let S be a compact space and

X := C(S), W := C(S × S).

According to Example 4.2.3, every lower semicontinuous function S × S → IR ∪ {+∞} is regular with respect to this W and X.

Theorem 5.2.12 Suppose that S is compact and that the cost function c : S × S → IR ∪ {+∞} satisfies the triangle inequality and is given by the formula (4.1.20),

c(x, y) := min{s(x′, y′); x′, y′ ∈ S′, f (x′) = x, f (y′) = y},

where S′ is compact, s ∈ C(S′ × S′), the mapping f : S′ → S is continuous, and f (S′) = S. Then every continuous function u1 defined on a closed set S1 ⊂ S and satisfying (5.2.3),

u1 (x) − u1 (y) ≤ c(x, y) for all x, y ∈ S1 ,

can be extended to a function in Lip (c, S; C(S)).



Proof: Consider the closed set S′1 = f −1 (S1 ) in S′. Given a function u1 ∈ Lip (c|S1 ×S1 , S1 ; C(S1 )) we have

$$\begin{aligned}
a_{u_1}(x) &= \min_{z \in S_1} \{u_1(z) + c(x, z)\} \\
&= \min_{z \in S_1} \{u_1(z) + \min\{s(x', z');\ f(x') = x,\ f(z') = z\}\} \\
&= \min\{u_1(f(z')) + s(x', z');\ f(x') = x,\ z' \in S'_1\}.
\end{aligned}$$

Suppose that a net xγ converges to x. We take x′γ and z′γ for which

$$f(x'_\gamma) = x_\gamma, \qquad z'_\gamma \in S'_1, \qquad a_{u_1}(x_\gamma) = u_1(f(z'_\gamma)) + s(x'_\gamma, z'_\gamma).$$

Then for every convergent subnet (x′γν , z′γν ) → (x′, z′) we obtain z′ ∈ S′1 , f (x′) = limν f (x′γν ) = limν xγν = x, and

$$a_{u_1}(x) \le u_1(f(z')) + s(x', z') = \lim_{\nu} [u_1(f(z'_{\gamma_\nu})) + s(x'_{\gamma_\nu}, z'_{\gamma_\nu})] = \lim_{\nu} a_{u_1}(x_{\gamma_\nu}).$$

Assuming without loss of generality that

$$\lim_{\nu} a_{u_1}(x_{\gamma_\nu}) = \lim_{\gamma} a_{u_1}(x_\gamma),$$

we see that the function au1 is lower semicontinuous. It can be shown in a similar way that the function bu1 is upper semicontinuous. This implies that the function on S × S, au1 (x) − bu1 (y), is lower semicontinuous. Hence the function c′ given by (5.2.6) is lower semicontinuous as a minimum of two lower semicontinuous functions. Then c′ is regular with respect to C(S × S) and C(S), and to complete the proof it remains to apply Corollary 5.2.10. □

Remark 5.2.13 If c satisfies the triangle inequality and is continuous, then au1 (x) − bu1 (y) is also continuous. Then the function c′(x, y) given by (5.2.6) is continuous as well and satisfies the triangle inequality. In this case Lip(c′, S; C(S)) is nonempty, since it contains all functions of the form u(x) = c′(x, z), where z is any fixed point of S. Thus, when c is continuous and satisfies the triangle inequality, the extension theorem can be derived from Theorem 5.2.9 without using the abstract duality theorem (Theorem 4.2.6). Using this fact, a direct proof of the duality theorem for a continuous cost function can be given; that is, Corollary 4.1.10 can be proved without recourse to Theorem 4.2.6 and to abstract duality theory. This proof is outlined in Levin and Milyutin (1979, p. 72); see also Levin (1978a, §4).
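The membership claim in the remark amounts to a one-line check. In the sketch below, d is a placeholder symbol (not notation from the text) for any real-valued continuous cost on S × S satisfying the triangle inequality, such as the function (5.2.6) in the situation of the remark, and z ∈ S is fixed:

```latex
u(x) := d(x, z), \qquad
u(x) - u(y) = d(x, z) - d(y, z) \le d(x, y)
\quad \text{for all } x, y \in S,
\quad \text{since} \quad d(x, z) \le d(x, y) + d(y, z).
```

Continuity of u follows from continuity of d, so u belongs to the corresponding set Lip(d, S; C(S)).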



Theorem 5.2.14 Suppose that S is compact and the cost function c : S × S → IR ∪ {+∞} is lower semicontinuous and satisfies the triangle inequality. Then every function u1 defined on a finite set S1 ⊂ S and satisfying (5.2.3) can be extended to a function in Lip (c, S; C(S)).

Proof: In this case au1 (x) − bu1 (y) is lower semicontinuous as the minimum of a finite number of lower semicontinuous functions. Therefore, the function c′(x, y) given by (5.2.6) is lower semicontinuous as the minimum of two lower semicontinuous functions, c(x, y) and au1 (x) − bu1 (y). Then c′ is regular with respect to C(S × S) and C(S), and the result follows from Corollary 5.2.10. □

A further continuous extension theorem will be given below, in Section 5.5 (see Theorem 5.5.3). We next consider the Borel extension problem. Let S be a Borel set in a Polish space. Take

W = B(S × S), X = B(S),

where B denotes the Banach lattice of bounded Borel functions on the corresponding space. According to Example 4.2.4, every function c : S × S → IR ∪ {+∞} with analytic sublevel sets

{(x, y) ∈ S × S; c(x, y) ≤ α}, α ∈ IR,

is regular with respect to B(S × S) and B(S). Given a Borel set B in a Polish space, we denote by R(B) the class of all functions ϕ : B → IR ∪ {+∞} having analytic sublevel sets

{z ∈ B; ϕ(z) ≤ α}, α ∈ IR.

It is easily seen that ϕ ∈ R(B) if and only if all the sets

{z ∈ B; ϕ(z) < α}, α ∈ IR,

are analytic. We note without proof that

(i) min(ϕ1 , ϕ2 ) ∈ R(B) and (ϕ1 + ϕ2 ) ∈ R(B) for any ϕ1 , ϕ2 ∈ R(B);

(ii) R(B1 ) ⊂ R(B1 × B2 ), with B1 and B2 being Borel sets in a Polish space.



Theorem 5.2.15 Let S be a Borel set in a Polish space. Suppose that the cost function c : S × S → IR ∪ {+∞} belongs to the class R(S × S), is bounded below, and satisfies the triangle inequality. Let S1 be a Borel set in S, and u1 a bounded Borel function on S1 , satisfying (5.2.3). Then there exists a function u ∈ Lip(c, S; B(S)) extending u1 to all of S.

Proof: In view of Corollary 5.2.10, it suffices to show that the function c′ given by (5.2.6) is regular with respect to B(S × S) and B(S). Since c ∈ R(S × S) and c′ is defined by (5.2.6), it is sufficient to check that the function au1 (x) − bu1 (y) belongs to R(S × S). For this purpose consider the function ψ on S × S × S1 × S1 defined by

ψ(x, y, z1 , z2 ) := u1 (z1 ) + c(x, z1 ) − u1 (z2 ) + c(z2 , y).

Then ψ ∈ R(S × S × S1 × S1 ), since ψ is the sum of three functions from this class: u1 (z1 ) − u1 (z2 ), c(x, z1 ), and c(z2 , y). Further, as

au1 (x) − bu1 (y) = inf{ψ(x, y, z1 , z2 ); z1 , z2 ∈ S1 },

the sublevel set {(x, y); au1 (x) − bu1 (y) < α} is analytic for every α ∈ IR. In fact, this set is the projection onto S × S of the analytic set {(x, y, z1 , z2 ); ψ(x, y, z1 , z2 ) < α} (the last set is analytic since ψ ∈ R(S × S × S1 × S1 )). Consequently, au1 (x) − bu1 (y) ∈ R(S × S), and the proof is complete. □

Remark 5.2.16 It is easy to show that the proof of Theorem 5.2.15 extends to the case of an arbitrary (nonmetrizable) compact space if Borel sets and functions are replaced by Baire sets and Baire measurable functions, and analytic sets by sets in the class AB0 (S × S).

5.2.4 A Continuous Selection Theorem

The nonemptiness of Lip (c, S; X), which is a part of the statement of the abstract duality theorem (Theorem 4.2.6), can be associated with some special selections of the set-valued mapping Γ : S × S → 2^IR , Γ(x, y) = {w ∈ IR; w ≤ c(x, y)}. In the various extension theorems, we have been concerned with selections of the form u(x) − u(y) where u ∈ X. We can also formally consider more general existence problems, with

Γ(x, y) = {w ∈ IR; e1 (x, y) ≤ w ≤ e2 (x, y)},

where

e1 : S × S → IR ∪ {+∞}, e2 : S × S → IR ∪ {+∞}.

However, the new problems clearly reduce to the previous one with c(x, y) := min(e2 (x, y), −e1 (y, x)).

Following Levin and Milyutin (1979), we now formulate a general problem on continuous selections of the form u(x) − u(y) for set-valued mappings with values in a topological vector space. Suppose that S is a compact space and E is a topological vector space. For every point (x, y) ∈ S × S let there be given some closed convex sets Γ(x, y) ⊆ E. Now we address the following question: When does a continuous mapping u : S → E exist such that

u(x) − u(y) ∈ Γ(x, y) for all x, y ∈ S ?

This problem is not simple, and we consider a special case of it, which can be analyzed by means of the duality theory for mass transfer problems. Let T be a paracompact space (see Kuratowski (1966)), E = C(T ) the space of continuous functions on T equipped with the topology of uniform convergence on compact subsets of T , and let Γ(x, y) = {v ∈ C(T ); v(t) ≤ c(x, y, t)}. Here the function c : S × S × T → IR ∪ {+∞} is assumed to be bounded below. Every u ∈ C(S × T ) can be regarded as a continuous mapping S → C(T ) defined by s ↦ u(s, ·) (and as a continuous mapping T → C(S), t ↦ u(·, t)).

Theorem 5.2.17 If c is lower semicontinuous on S × S × T and satisfies for every t ∈ T the triangle inequality

c(x, y, t) + c(y, z, t) ≥ c(x, z, t) ∀x, y, z ∈ S,

then there exists a function u ∈ C(S × T ) such that

u(x, t) − u(y, t) ≤ c(x, y, t) for all x, y ∈ S, t ∈ T.

We precede the proof by two lemmas.



Lemma 5.2.18 Let Q be a compact space, fγ a net of functions Q → IR ∪ {+∞}, f : Q → IR ∪ {+∞}, and let f (z) ≤ lim fγ (z) γ

for all z ∈ Q.

Further, let v ∈ C(Q) and v(z) ≤ f (z) for all z ∈ Q. Then for every ε > 0 there is a γ0 = γ0 (ε) such that v(z) ≤ fγ (z) + ε

for all γ & γ0

and all z ∈ Q.

Proof: Suppose the contrary. Then for every γ there exist γ & γ and zγ ∈ Q such that v(zγ ) > fγ (zγ ) + ε0 , where e0 > 0 does not depend on γ. The indices γ form a directed set with respect to the induced order relation, and this set is obviously cofinal in the set of all γ. Consequently, zγ is a net in Q and fγ is a subnet of fγ . Using compactness of Q, we choose a convergent subnet zγν → z0 . We obtain v(z0 ) = lim v(zγν ) ≥ lim fγν (zγν ) + ε0 ≥ f (z0 ) + ε0 , ν

ν

2

which contradicts the assumption.

Lemma 5.2.19 Let c : S × S → IR ∪ {+∞} be lower semicontinuous and satisfy the triangle inequality. Then for every α > 0, : α ; Lip(c + α1S×S , S; C(S)) ⊆ c Lip (c, S; C(S)) + B , 2 where B is the unit ball in C(S) and c stands for the closure in C(S). Proof: As a lower semicontinuous function on a compact space, the function c is bounded below. According to Theorem 4.4.13, for every ∈ C(S)∗0 there exists a measure µ ∈ C(S × S)∗+ such that π1 µ = + , π2 µ = − , and A(c, ) =

c dµ. S×S

Then A(c + α1S×S , ) ≤

(c(x, y) + α)µ( d(x, y)) = A(c, ) + S×S

α |||| (5.2.7) 2



for all ∈ C(S)∗0 , hence also for all ∈ C(S)∗ , since A(c, ) = +∞ for ∈ C(S)∗0 . The assertion of the lemma follows from (5.2.7) if we recall that Lip (c, S; C(S)) = ∂A(c, ·)(0) (see Lemma 4.2.12). 2 The following result is a direct consequence of Lemma 5.2.19. Corollary 5.2.20 Under the assumptions of Lemma 5.2.19, Lip(c + α1S×S , S; C(S)) ⊆ Lip (c, S; C(S)) +

α+ε B 2

whenever ε > 0. Remark 5.2.21 In a similar way the following abstract version of Lemma 5.2.19 can be proved: : α ; Lip(c + α1S×S , S; X) ⊆ c Lip (c, S; X) + B . 2 Here c : S × S → IR ∪ {+∞} is bounded below, is regular with respect to the Banach lattices W and X, and satisfies the triangle inequality, while B is the unit ball in X, and c denotes closure in X. Proof of Theorem 5.2.17: Consider the set-valued mapping T → 2C(S) given by t "→ Lip(ct , S; C(S)),

(5.2.8)

and ct (x, y) := c(x, y, t)

for all t ∈ T, x, y ∈ S.

The sets Lip(ct , S; C(S)), t ∈ T , are nonempty in view of Theorem 4.2.6. Also, they are convex and closed in C(S). Let us show that the set-valued mapping (5.2.8) is lower semicontinuous; that is, for every t ∈ T and ϕ ∈ Lip(ct , S; C(S)) and for every convergent net tγ → t there exists a net ϕγ ∈ Lip(ctγ , S; C(S)) converging to ϕ in C(S). Thus, let tγ → t and ϕ ∈ Lip (c, S; C(S)). Applying Lemma 5.2.18 to Q = S × S, f = ct , fγ = ctγ , and v(x, y) = ϕ(x) − ϕ(y), we find that ϕ ∈ Lip(ctγ + ε1S×S , S; C(S)) for all ε > 0 and γ & γ0 (ε). Then by Lemma 5.2.19, ϕ ∈ Lip(ctγ , S; C(S)) + εB for all ε > 0 and γ & γ0 (ε). This leads to the existence of the required ϕγ .



Thus, the set-valued mapping (5.2.8) is lower semicontinuous, and according to Michael's selection theorem (see Michael (1956)), it has a continuous selection ũ : T → C(S). We have

[ũ(t)](x) − [ũ(t)](y) ≤ c(x, y, t) for all t ∈ T, x, y ∈ S.

In order to complete the proof it remains to observe that the equality u(x, t) := [ũ(t)](x) establishes an isomorphism between the vector spaces of continuous functions u : S × T → IR and of continuous mappings ũ : T → C(S). □

5.3 Approximation Theorems for Mass Transfer Problems with Continuous Cost Functions

In this section we study the approximation of mass transfer problems by finite-dimensional linear programs. The results in this section are due to Levin (1974, 1975b, 1978a).

5.3.1 Introductory Remarks

In Section 5.1 we provided an explicit formula for the optimal value of the mass transfer problem, A(c, ), with a cost function that is smooth and satisfies c(x, x) = 0 for all x ∈ S (see Theorem 5.1.1). Except for this particular case, only few results are known that give an explicit expression for A(c, ) (or C(c, σ1 , σ2 )), and most of them relate to the classical formulation, that is, to the Kantorovich–Rubinstein metric. For example, if S ⊂ IR and c(x, y) = |x − y|, then

$$d_{KR}(\sigma_1, \sigma_2) = \int_{-\infty}^{+\infty} |F_1(x) - F_2(x)|\, dx,$$

where Fi (x) = σi (−∞, x], i = 1, 2. However, even for S ⊂ IRn with n ≥ 2 an effective computation of the Kantorovich–Rubinstein metric becomes a nontrivial problem, and explicit formulas have been obtained only in certain special cases and under additional simplifying assumptions; see, for example, Rachev (1984b), Rüschendorf (1985b), Levin and Rachev (1989), Rachev and Shortt (1990), and Rachev and Rüschendorf (1991). The absence of precise formulas explains the development of methods for solving mass transfer problems approximately, based on approximating the original


problem by suitable finite-dimensional or semi-infinite linear programs in combination with a dual approach.

We consider two approximation schemes for compact spaces and continuous (or lower semicontinuous) cost functions. The first scheme deals with the transportation problem with fixed marginals (4.6.3)–(4.6.4) (see Section 4.6.2 above). We consider finite collections of functions,

γ = {ϕj ∈ C(S1 ) (j = 1, . . . , m), ψk ∈ C(S2 ) (k = 1, . . . , n)},

and replace the constraints (4.6.4) by the constraints

$$\begin{aligned}
&\mu \ge 0; \\
&\mu(S_1 \times S_2) = \alpha_0, \quad \text{where } \alpha_0 = \sigma_1 S_1 = \sigma_2 S_2; \\
&\langle \phi_j, \pi_1 \mu \rangle = \langle \phi_j, \sigma_1 \rangle, \quad j = 1, \dots, m; \\
&\langle \psi_k, \pi_2 \mu \rangle = \langle \psi_k, \sigma_2 \rangle, \quad k = 1, \dots, n.
\end{aligned} \tag{5.3.1}$$

We study the approximation of the optimal value C(c; σ1 , σ2 ) by the values C γ (c; σ1 , σ2 ), where C γ (c; σ1 , σ2 ) denotes the infimum of $c(\mu) = \int_{S_1 \times S_2} c \, d\mu$

under constraints (5.3.1). The second approximation scheme is applied to both types of the mass transfer problem (with given marginals and with a given marginal difference) and consists in approximating the corresponding measures (σ1 , σ2 , or ) by finite linear combinations of Dirac measures. In this case, the original mass transfer problem is approximated by suitable finite-dimensional linear programming transportation problems.
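For measures that are finite combinations of Dirac measures on the line, the one-dimensional formula with c(x, y) = |x − y| can be evaluated exactly and compared with the cost of an explicit transport plan. The sketch below is illustrative only: the function names and the sample measures are assumptions of this example, and the monotone (north-west corner) plan is used as a known optimal plan for this particular cost on the line, not as a method taken from the text.

```python
from fractions import Fraction

def kr_distance_line(sigma1, sigma2):
    """Integral of |F1 - F2| for finitely supported measures given as {point: mass}."""
    pts = sorted(set(sigma1) | set(sigma2))
    F1 = F2 = Fraction(0)
    total = Fraction(0)
    for left, right in zip(pts, pts[1:]):
        F1 += sigma1.get(left, 0)
        F2 += sigma2.get(left, 0)
        total += abs(F1 - F2) * (right - left)   # F1, F2 are constant on [left, right)
    return total

def monotone_plan_cost(sigma1, sigma2):
    """Cost of the monotone transport plan for c(x, y) = |x - y| (equal total masses)."""
    src = sorted((x, Fraction(m)) for x, m in sigma1.items())
    dst = sorted((y, Fraction(m)) for y, m in sigma2.items())
    i = j = 0
    a, b = src[0][1], dst[0][1]      # mass still to ship from src[i] / to receive at dst[j]
    cost = Fraction(0)
    while True:
        m = min(a, b)
        cost += m * abs(src[i][0] - dst[j][0])
        a -= m
        b -= m
        if a == 0:
            i += 1
            if i == len(src):
                break
            a = src[i][1]
        if b == 0:
            j += 1
            if j == len(dst):
                break
            b = dst[j][1]
    return cost

sigma1 = {0: Fraction(1, 2), 2: Fraction(1, 2)}
sigma2 = {1: Fraction(1)}
d_formula = kr_distance_line(sigma1, sigma2)
d_plan = monotone_plan_cost(sigma1, sigma2)
```

Exact rational arithmetic is used so that the two values can be compared for equality.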

5.3.2 Approximation Theorems of the Marginal Restrictions

Suppose that S1 and S2 are compact spaces and that the cost function c : S1 × S2 → IR is continuous. Given measures σ1 ∈ C(S1 )∗+ and σ2 ∈ C(S2 )∗+ with σ1 S1 = σ2 S2 , we consider the extremal marginal problem (4.6.3)–(4.6.4) as formulated in Section 4.6. Recall that the problem is to find the optimal value of the functional

$$C(c; \sigma_1, \sigma_2) = \inf\Big\{\int_{S_1 \times S_2} c(x, y)\, \mu(d(x, y));\ \mu \in C(S_1 \times S_2)^*_+,\ \pi_1 \mu = \sigma_1,\ \pi_2 \mu = \sigma_2\Big\}.$$



Let Γ denote the set of all possible finite collections

γ = {ϕj ∈ C(S1 ) (j = 1, . . . , m), ψk ∈ C(S2 ) (k = 1, . . . , n)},

and assume that γ1 & γ2 if and only if γ1 ⊃ γ2 . Fix a γ ∈ Γ and consider the extremal problem that consists in finding the optimal value

$$C^{\gamma}(c; \sigma_1, \sigma_2) = \inf\Big\{\int_{S_1 \times S_2} c(x, y)\, \mu(d(x, y));\ \mu \text{ satisfies (5.3.1)}\Big\}. \tag{5.3.2}$$

Observe that the set of measures satisfying (5.3.1) is weak* compact. Consequently, the value C γ (c; σ1 , σ2 ) is attained; that is, there exists an optimal solution for problem (5.3.2). Also observe that the assumption µ(S1 × S2 ) = α0 in (5.3.1) is satisfied automatically when γ contains one of the functions 1S1 or 1S2 . Now we are in a position to formulate the approximation theorem.

Theorem 5.3.1

I. The optimal value C(c; σ1 , σ2 ) is approximated in the sense of monotone convergence C γ (c; σ1 , σ2 ) ↑ C(c; σ1 , σ2 ); that is, (i) C γ1 (c; σ1 , σ2 ) ≥ C γ2 (c; σ1 , σ2 ) for γ1 & γ2 ; and (ii) for every ε > 0 there is γ0 = γ0 (ε) ∈ Γ such that C(c; σ1 , σ2 ) ≥ C γ (c; σ1 , σ2 ) > C(c; σ1 , σ2 ) − ε whenever γ & γ0 .

II. Let µγ for every γ ∈ Γ denote an optimal solution for the problem (5.3.2). (As is mentioned above, such solutions exist for all γ ∈ Γ.) Then a weakly* convergent subnet µγν can be extracted from the net µγ , and the limit of any such subnet µγν is an optimal solution of the original problem.

III. Suppose that the compact spaces S1 and S2 are metrizable, and let {ϕj } and {ψk } be countable sets that are dense in C(S1 ) and C(S2 ) respectively. Set γn = {ϕ1 , . . . , ϕn ; ψ1 , . . . , ψn }. Then C γn (c; σ1 , σ2 ) ↑ C(c; σ1 , σ2 ) as n → ∞. Furthermore, in such a case a weakly* convergent subsequence (ordinary subsequence, not merely a subnet) µnk can be extracted from the sequence µn = µγn , and the limit of any such subsequence is an optimal solution for the original problem.
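The monotone behaviour of the relaxed values can be seen on the smallest possible example. The sketch below is an illustration under assumed data: two-point spaces S1 = S2 = {0, 1}, an arbitrary cost matrix, and helper names chosen here. It compares three levels of constraints: total mass only, the first marginal fixed (mass constraint plus the indicator of the point 0 in S1), and both marginals fixed (the original problem, solved by checking the two vertices of its one-parameter feasible set).

```python
from fractions import Fraction as Fr

# Assumed data: cost matrix and marginals on the two-point spaces {0, 1}.
c = [[Fr(1), Fr(3)],
     [Fr(4), Fr(2)]]
sigma1 = [Fr(1, 2), Fr(1, 2)]
sigma2 = [Fr(1, 4), Fr(3, 4)]
alpha0 = sum(sigma1)

def value_mass_only():
    """Only mu >= 0 and total mass alpha0: put all mass on a cheapest pair."""
    return alpha0 * min(min(row) for row in c)

def value_first_marginal():
    """First marginal fixed: each x ships its mass to its own cheapest y."""
    return sum(sigma1[x] * min(c[x]) for x in range(2))

def value_both_marginals():
    """Both marginals fixed: mu is determined by t = mu(0, 0); the cost is linear in t."""
    t_lo = max(Fr(0), sigma1[0] - sigma2[1])
    t_hi = min(sigma1[0], sigma2[0])
    def plan_cost(t):
        mu = [[t, sigma1[0] - t],
              [sigma2[0] - t, sigma1[1] - sigma2[0] + t]]
        return sum(c[x][y] * mu[x][y] for x in range(2) for y in range(2))
    return min(plan_cost(t_lo), plan_cost(t_hi))

values = [value_mass_only(), value_first_marginal(), value_both_marginals()]
```

Each added test function shrinks the feasible set, so the values can only increase; on a two-point space the indicator together with the mass constraint already pins down a marginal, which is why the third level coincides with the original problem.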



Remark 5.3.2 In Levin (1974, Theorem 2), the above theorem was formulated for nonnegative c. However, this assumption is not essential for the proof.

Remark 5.3.3 The statement of the theorem remains true with the same proof if the cost function c : S1 × S2 → IR ∪ {+∞} is assumed to be lower semicontinuous.

Proof: I. For every γ = {ϕ1, …, ϕm ∈ C(S1); ψ1, …, ψn ∈ C(S2)} ∈ Γ we denote by P_γ the set of measures µ satisfying (5.3.1). The sets

\[
Q_\gamma := P_\gamma \cap \Big\{ \mu\in C(S_1\times S_2)^* \;:\; \int_{S_1\times S_2} c(x,y)\,\mu(d(x,y)) \le C(c;\sigma_1,\sigma_2)-\varepsilon \Big\}
\]

are weak* compact, since they are weakly* closed and contained in the weak* compact set P := {µ ∈ C(S1 × S2)∗; µ ≥ 0, µ(S1 × S2) = α0}. Further, it is evident that

\[
\bigcap_{\gamma\in\Gamma} P_\gamma = \{\mu\in C(S_1\times S_2)^*_+ \;:\; \pi_1\mu=\sigma_1,\ \pi_2\mu=\sigma_2\};
\]

that is, the intersection of all P_γ is exactly the set of measures satisfying the constraints of the original problem. This combined with the definition of C(c; σ1, σ2) implies that the intersection of all Q_γ is empty. Then, in view of the weak* compactness of Q_γ, there is a finite set {γ1, …, γr} ⊂ Γ such that

\[
\bigcap_{k=1}^{r} Q_{\gamma_k} = \emptyset.
\]

For every γ ≽ γ0 = γ1 ∪ · · · ∪ γr the set Q_γ is empty; that is, C^γ(c; σ1, σ2) > C(c; σ1, σ2) − ε. The monotonicity of C^γ(c; σ1, σ2) in γ and the inequality C(c; σ1, σ2) ≥ C^γ(c; σ1, σ2) are obvious.

II. Note that P_{γ1} ⊂ P_{γ2} for γ1 ≽ γ2. Then for a subnet µ_{γ_ν} converging weakly* to µ0 it follows that

\[
\mu_0 \in \bigcap_{\nu} P_{\gamma_\nu} = \bigcap_{\gamma} P_\gamma;
\]


that is, µ0 satisfies the constraints of the original problem. Furthermore,

\[
\int_{S_1\times S_2} c\,d\mu_0 = \lim_{\nu}\int_{S_1\times S_2} c\,d\mu_{\gamma_\nu} = \lim_{\nu} C^{\gamma_\nu}(c;\sigma_1,\sigma_2) = C(c;\sigma_1,\sigma_2),
\]

and so µ0 is an optimal solution for the original problem. (Should c be lower semicontinuous, the above remains true, because

\[
\int_{S_1\times S_2} c\,d\mu_0 \le \lim_{\nu}\int_{S_1\times S_2} c\,d\mu_{\gamma_\nu} = \lim_{\nu} C^{\gamma_\nu}(c;\sigma_1,\sigma_2) = C(c;\sigma_1,\sigma_2),
\]

while the reverse inequality holds because µ0 is feasible for the original problem; therefore the equality

\[
\int_{S_1\times S_2} c\,d\mu_0 = C(c;\sigma_1,\sigma_2)
\]

holds as well.) It remains to note that the existence of a weak* convergent subnet µ_{γ_ν} is an immediate consequence of the relation µ_γ ∈ P_γ ⊂ P and of the weak* compactness of P.

III. The proof of this assertion follows the proofs of I and II. In fact, it suffices to consider sequences instead of nets. To see this, note that

\[
\bigcap_{n=1}^{\infty} P_{\gamma_n} = \{\mu\in C(S_1\times S_2)^*_+ \;:\; \pi_1\mu=\sigma_1,\ \pi_2\mu=\sigma_2\}.
\]

Furthermore, by the metrizability of S1 × S2, the space C(S1 × S2) is separable. Therefore, the restriction of the weak* topology in C(S1 × S2)∗ to P is metrizable. □

Remark 5.3.4 The assertion III remains true if finite linear combinations of the functions ϕj(x) and ψk(y) are weakly dense in C(S1) and C(S2) respectively.

We next look at problem (5.3.2) more closely. Set

\[
\alpha_j = \int_{S_1}\varphi_j(x)\,\sigma_1(dx),\quad j=1,\dots,m,
\qquad\text{and}\qquad
\beta_k = \int_{S_2}\psi_k(y)\,\sigma_2(dy),\quad k=1,\dots,n.
\]

Denote by D^γ = D^γ(c; σ1, σ2) the optimal value of the extremal problem that consists in maximizing the functional

\[
\alpha_0u_0 + \sum_{j=1}^{m}\alpha_ju_j - \sum_{k=1}^{n}\beta_kv_k \tag{5.3.3}
\]

over the set of all w = (u0, u1, …, um, v1, …, vn) ∈ IR^{1+m+n} satisfying

\[
u_0 + \sum_{j=1}^{m}u_j\varphi_j(x) - \sum_{k=1}^{n}v_k\psi_k(y) \le c(x,y) \tag{5.3.4}
\]

for all (x, y) ∈ S1 × S2. The following theorem supplements Theorem 5.3.1.

Theorem 5.3.5 Suppose that the cost function c is continuous (or lower semicontinuous). Then there exists an optimal solution µ_γ for problem (5.3.2) of the form µ_γ = λ1 δ_{z1} + · · · + λr δ_{zr}. Here zi = (xi, yi) ∈ S1 × S2 (i = 1, …, r), r ≤ m + n + 1; δ_z denotes the Dirac measure at z, and the coefficients λi, i = 1, …, r, satisfy λi ≥ 0 and

\[
\begin{aligned}
\sum_{i=1}^{r}\lambda_i &= \alpha_0,\\
\sum_{i=1}^{r}\lambda_i\varphi_j(x_i) &= \alpha_j,\quad j=1,\dots,m,\\
\sum_{i=1}^{r}\lambda_i\psi_k(y_i) &= \beta_k,\quad k=1,\dots,n.
\end{aligned}
\tag{5.3.5}
\]

Proof: Problem (5.3.2) is dual to problem (5.3.3)–(5.3.4) in the sense of Section 4.6. This can be rewritten as follows:

\[
C^{\gamma}(c;\sigma_1,\sigma_2) = \inf_{\mu\ge0}\,\sup_{w} L(w,\mu),
\qquad
D^{\gamma}(c;\sigma_1,\sigma_2) = \sup_{w}\,\inf_{\mu\ge0} L(w,\mu),
\]

where

\[
L(w,\mu) = \alpha_0u_0 + \sum_{j=1}^{m}\alpha_ju_j - \sum_{k=1}^{n}\beta_kv_k
- \int_{S_1\times S_2}\Big[u_0 + \sum_{j=1}^{m}u_j\varphi_j(x) - \sum_{k=1}^{n}v_k\psi_k(y) - c(x,y)\Big]\mu(d(x,y))
\]


is the Lagrange function for problem (5.3.3)–(5.3.4). Moreover, by taking into account the continuity (lower semicontinuity) of c and applying the same argument as used in the proofs of Theorems 4.6.8 and 4.6.12, we obtain the duality relation

\[
C^{\gamma}(c;\sigma_1,\sigma_2) = D^{\gamma}(c;\sigma_1,\sigma_2) \tag{5.3.6}
\]

for all σ1 ∈ C(S1)∗+, σ2 ∈ C(S2)∗+ with σ1 S1 = σ2 S2. Further, according to Levin (1969, Theorem 1; 1985a, Theorem 2.3), there are r ≤ m + n + 1 points z1, …, zr in S1 × S2 (zi = (xi, yi), i = 1, …, r) such that

\[
D^{\gamma} = D^{\gamma}\{z_1,\dots,z_r\}. \tag{5.3.7}
\]

Here, D^γ{z1, …, zr} denotes the optimal value of the extremal problem that consists in maximizing the functional (5.3.3) over the set of all w = (u0, u1, …, um, v1, …, vn) ∈ IR^{1+m+n} satisfying

\[
u_0 + \sum_{j=1}^{m}u_j\varphi_j(x_i) - \sum_{k=1}^{n}v_k\psi_k(y_i) \le c(x_i,y_i),\qquad i=1,\dots,r.
\]

This extremal problem is an ordinary finite-dimensional linear program, so the duality theorem holds for it; that is,

\[
D^{\gamma}\{z_1,\dots,z_r\} = C^{\gamma}\{z_1,\dots,z_r\}, \tag{5.3.8}
\]

where C^γ{z1, …, zr} is the optimal value of the dual linear programming problem. The dual problem is to minimize the functional

\[
\sum_{i=1}^{r}\lambda_i\,c(x_i,y_i)
\]

over all (λ1, …, λr) satisfying (5.3.5). It follows from (5.3.6)–(5.3.8) that C^γ(c; σ1, σ2) = C^γ{z1, …, zr}, which completes the proof. □

Remark 5.3.6 In Theorem 5.3.1 it is asserted that a subnet (or a subsequence) converging to an optimal measure can be extracted from the net µγ . Note, however, that because of possible nonuniqueness of optimal solutions for the original marginal problem, the whole net µγ need not be convergent.
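A small worked example may help fix ideas; it is an illustration added under stated assumptions, not part of the cited results. Assume c is continuous and take the coarsest collection γ = {ϕ1 = 1_{S1}} (so m = 1, n = 0), for which the only constraint in (5.3.2) beyond µ ≥ 0 is the total mass µ(S1 × S2) = α0 = σ1 S1. The point z* below is a hypothetical name for any minimizer of c.

```latex
\[
C^{\gamma}(c;\sigma_1,\sigma_2)
 = \inf\Big\{\int_{S_1\times S_2} c\,d\mu \;:\; \mu\ge 0,\ \mu(S_1\times S_2)=\alpha_0\Big\}
 = \alpha_0 \min_{(x,y)\in S_1\times S_2} c(x,y),
\qquad
\mu_\gamma = \alpha_0\,\delta_{z^*},\quad c(z^*)=\min c .
\]
\[
D^{\gamma}(c;\sigma_1,\sigma_2)
 = \sup\{\alpha_0u_0+\alpha_1u_1 \;:\; u_0+u_1\le c(x,y)\ \text{for all }(x,y)\}
 = \alpha_0\min c ,
\qquad \text{since } \alpha_1=\alpha_0 .
\]
```

In this case the optimal measure has r = 1 ≤ m + n + 1 = 2 atoms, in line with Theorem 5.3.5, and the bound C^γ ≤ C(c; σ1, σ2) is simply the estimate ∫ c dµ ≥ α0 min c for every feasible µ of the original problem. If c attains its minimum at several points, every choice of z* gives an optimal µ_γ, which is the kind of nonuniqueness referred to in Remark 5.3.6.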


5.3.3 Approximation by Discrete Marginal Measures

Consider now the mass transfer problem with a given marginal difference on a compact space. Suppose that S is a compact space and c : S × S → IR is a continuous function that satisfies the triangle inequality and the equality c(x, x) = 0 for all x ∈ S.

Let σ1^γ and σ2^γ be two nets of measures in C(S)∗+ with σ1^γ S = σ2^γ S, and suppose that their supports supp σ1^γ and supp σ2^γ are finite sets. We set

\[
c^{\gamma}_{ij} = c(x^{\gamma}_i,x^{\gamma}_j),\qquad
\sigma^{\gamma}_{1i} = \sigma^{\gamma}_1\{x^{\gamma}_i\},\qquad
\sigma^{\gamma}_{2i} = \sigma^{\gamma}_2\{x^{\gamma}_i\},\qquad i,j=1,\dots,m,
\]

where S^γ := {x1^γ, …, xm^γ} is the support of the measure σ1^γ − σ2^γ. Consider the mass transfer problem on the space S^γ, which is a linear programming problem of the following type: minimize

\[
\sum_{i=1}^{m}\sum_{j=1}^{m} c^{\gamma}_{ij}\,\mu_{ij}, \tag{5.3.9}
\]

subject to

\[
\mu_{ij}\ge 0,\qquad i,j=1,\dots,m; \tag{5.3.10}
\]

\[
\sum_{j=1}^{m}(\mu_{ij}-\mu_{ji}) = \sigma^{\gamma}_{1i}-\sigma^{\gamma}_{2i},\qquad i=1,\dots,m. \tag{5.3.11}
\]

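Let A^γ be the optimal value of the problem (5.3.9)–(5.3.11); that is, A^γ = A(c|_{S^γ×S^γ}, σ1^γ − σ2^γ).

Theorem 5.3.7 If the nets σ1^γ and σ2^γ are bounded in norm in C(S)∗ and converge weakly* to measures σ1 and σ2 respectively, then A(c, σ1 − σ2) = lim_γ A^γ.

Proof: Set

\[
S^{\gamma}_{+} := \{x\in S \;:\; \sigma^{\gamma}_1\{x\}-\sigma^{\gamma}_2\{x\} > 0\},\qquad
S^{\gamma}_{-} := \{x\in S \;:\; \sigma^{\gamma}_1\{x\}-\sigma^{\gamma}_2\{x\} < 0\}.
\]

Next consider the measures σ̄1^γ, σ̄2^γ concentrated on S^γ and determined by the equalities

\[
\bar\sigma^{\gamma}_1\{x\} =
\begin{cases}
\sigma^{\gamma}_1\{x\}-\sigma^{\gamma}_2\{x\}, & \text{if } x\in S^{\gamma}_{+},\\
0, & \text{if } x\notin S^{\gamma}_{+},
\end{cases}
\qquad
\bar\sigma^{\gamma}_2\{x\} =
\begin{cases}
\sigma^{\gamma}_2\{x\}-\sigma^{\gamma}_1\{x\}, & \text{if } x\in S^{\gamma}_{-},\\
0, & \text{if } x\notin S^{\gamma}_{-}.
\end{cases}
\]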


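We have

\[
\bar\sigma^{\gamma}_1\ge 0,\qquad \bar\sigma^{\gamma}_2\ge 0,\qquad \bar\sigma^{\gamma}_1-\bar\sigma^{\gamma}_2 = \sigma^{\gamma}_1-\sigma^{\gamma}_2.
\]

Consequently, A(c, σ1^γ − σ2^γ) = A(c, σ̄1^γ − σ̄2^γ). Further, in view of Theorem 5.3.1, there exists a measure µ^γ ∈ C(S × S)∗+ such that π1 µ^γ = σ̄1^γ, π2 µ^γ = σ̄2^γ, and

\[
\int_{S\times S} c(x,y)\,\mu^{\gamma}(d(x,y)) = A(c;\bar\sigma^{\gamma}_1-\bar\sigma^{\gamma}_2).
\]

We obtain supp µ^γ ⊆ S+^γ × S−^γ ⊂ S^γ × S^γ, and consequently, A(c, σ̄1^γ − σ̄2^γ) = A^γ. Thus, we have

\[
A(c,\sigma^{\gamma}_1-\sigma^{\gamma}_2) = A^{\gamma}. \tag{5.3.12}
\]

Next, by the duality theorem (Theorem 4.1.1 or Theorem 4.1.9), we have

\[
A(c,\sigma^{\gamma}_1-\sigma^{\gamma}_2) = B(c,\sigma^{\gamma}_1-\sigma^{\gamma}_2)\qquad\text{for all }\gamma, \tag{5.3.13}
\]

and

\[
A(c,\sigma_1-\sigma_2) = B(c,\sigma_1-\sigma_2). \tag{5.3.14}
\]

To complete the proof it suffices, in view of (5.3.12)–(5.3.14), to show that

\[
B(c,\sigma_1-\sigma_2) = \lim_{\gamma} B(c,\sigma^{\gamma}_1-\sigma^{\gamma}_2). \tag{5.3.15}
\]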

To this end, fix an arbitrary point x0 ∈ S and, applying Theorem 5.3.1, find a function u_γ ∈ Lip(c, S; C(S)) such that u_γ(x0) = 0 and ⟨u_γ, σ1^γ − σ2^γ⟩ = B(c, σ1^γ − σ2^γ). Since the set {u ∈ Lip(c, S; C(S)); u(x0) = 0} is compact in C(S) (see the proof of Theorem 5.3.1), a subnet u_{γ_ν} can be chosen that converges to some function u0 ∈ C(S). Also, assume without loss of generality that B(c, σ1^{γ_ν} − σ2^{γ_ν}) converges to lim sup_γ B(c, σ1^γ − σ2^γ). It is clear that u0 ∈ Lip(c, S; C(S)) and u0(x0) = 0. Further, since the nets σ1^γ and σ2^γ are bounded in norm, we obtain

\[
B(c,\sigma^{\gamma_\nu}_1-\sigma^{\gamma_\nu}_2) = \int_S u_{\gamma_\nu}(x)\,(\sigma^{\gamma_\nu}_1-\sigma^{\gamma_\nu}_2)(dx) \;\longrightarrow\; \int_S u_0(x)\,(\sigma_1-\sigma_2)(dx);
\]

consequently,

\[
\limsup_{\gamma} B(c,\sigma^{\gamma}_1-\sigma^{\gamma}_2) \le B(c,\sigma_1-\sigma_2). \tag{5.3.16}
\]

On the other hand, suppose now that u∗ is an optimal solution for the dual problem; that is, u∗ ∈ Lip(c, S; C(S)) and

\[
\int_S u^*(x)\,(\sigma_1-\sigma_2)(dx) = B(c,\sigma_1-\sigma_2).
\]

Then

\[
B(c,\sigma_1-\sigma_2) = \lim_{\gamma}\int_S u^*(x)\,(\sigma^{\gamma}_1-\sigma^{\gamma}_2)(dx) \le \liminf_{\gamma} B(c,\sigma^{\gamma}_1-\sigma^{\gamma}_2). \tag{5.3.17}
\]

So (5.3.15) follows from (5.3.16) and (5.3.17). □

Next consider the problem with fixed marginals, which is to find the optimal value

\[
C(c;\sigma_1,\sigma_2) = \inf\Big\{ \int_{S_1\times S_2} c(x,y)\,\mu(d(x,y)) \;:\; \mu\in C(S_1\times S_2)^*_+,\ \pi_1\mu=\sigma_1,\ \pi_2\mu=\sigma_2 \Big\}, \tag{5.3.18}
\]

where S1 and S2 are compact spaces, σ1 ∈ C(S1)∗+, σ2 ∈ C(S2)∗+, σ1 S1 = σ2 S2, and the cost function c : S1 × S2 → IR is supposed to be continuous. As in the first part of this section we approximate problem (5.3.18) by finite-dimensional linear programming transportation problems as follows: Let σ1^γ and σ2^γ be nets in C(S1)∗+ and C(S2)∗+ respectively such that σ1^γ S1 = σ2^γ S2. Moreover, the supports of the measures σ1^γ and σ2^γ,

\[
S^{\gamma}_1 = \operatorname{supp}\sigma^{\gamma}_1 \quad\text{and}\quad S^{\gamma}_2 = \operatorname{supp}\sigma^{\gamma}_2,
\]

are finite sets, S1^γ = {x1^γ, …, xm^γ}, S2^γ = {y1^γ, …, yn^γ}, where m = m(γ), n = n(γ). Set for any i ∈ {1, …, m}, j ∈ {1, …, n},

\[
c^{\gamma}_{ij} = c(x^{\gamma}_i,y^{\gamma}_j),\qquad
\sigma^{\gamma}_{1i} = \sigma^{\gamma}_1\{x^{\gamma}_i\},\qquad
\sigma^{\gamma}_{2j} = \sigma^{\gamma}_2\{y^{\gamma}_j\}.
\]

Fix γ and consider the linear programming transportation problem (the problem of mass transfer from S1^γ to S2^γ)

\[
\sum_{i=1}^{m}\sum_{j=1}^{n} c^{\gamma}_{ij}\,\mu_{ij} \to \min \tag{5.3.19}
\]


subject to the following constraints on µij:

\[
\begin{aligned}
\mu_{ij} &\ge 0, && i=1,\dots,m;\ j=1,\dots,n;\\
\sum_{i=1}^{m}\mu_{ij} &= \sigma^{\gamma}_{2j}, && j=1,\dots,n;\\
\sum_{j=1}^{n}\mu_{ij} &= \sigma^{\gamma}_{1i}, && i=1,\dots,m.
\end{aligned}
\tag{5.3.20}
\]

Denote the optimal value of this problem by C^γ, and with any optimal solution (µij^γ) for problem (5.3.19)–(5.3.20) associate the Radon measure on S1 × S2,

\[
\mu^{\gamma} := \sum_{i=1}^{m}\sum_{j=1}^{n}\mu^{\gamma}_{ij}\,\delta_{(x^{\gamma}_i,\,y^{\gamma}_j)}.
\]

Theorem 5.3.8 If the nets σ1^γ and σ2^γ are bounded in norm and weakly* converge to σ1 and σ2 respectively, then C(c; σ1, σ2) = lim_γ C^γ, and a weak* convergent subnet µ^{γ_ν} can be chosen from µ^γ. Moreover, the limit of any such subnet is an optimal solution for the problem (5.3.18).

Remark 5.3.9 (cf. Remark 5.3.6) The whole net µ^γ can fail to be convergent.

Remark 5.3.10 If the compact spaces S1 and S2 are metrizable, then the restriction of the weak* topology to bounded sets of C(S1 × S2)∗+ is metrizable, and so all nets in Theorem 5.3.8 may be replaced by sequences; compare with Theorem 5.3.1, III.

Proof of Theorem 5.3.8: Though a direct proof of the theorem can be given, we prefer to derive it from Theorem 5.3.7. To this end, consider the topological sums

\[
S := S_1\oplus S_2 \quad\text{and}\quad S^{\gamma} := S^{\gamma}_1\oplus S^{\gamma}_2.
\]

Here, the notation S1 ⊕ S2 stands for the space that is the union of a copy of S1 and a copy of S2 , and a set G in S1 ⊕ S2 is regarded as open if and only if G ∩ S1 is open in S1 and G ∩ S2 is open in S2 . Clearly, S1 ⊕ S2 is a compact space, and both spaces S1 and S2 are open–closed in S1 ⊕ S2 . Note that any measure σ1 ∈ C(S1 )∗+ can be considered as a measure in C(S)∗+ that is concentrated on S1 , and similarly, any σ2 ∈ C(S2 )∗+ may be identified with a measure in C(S)∗+ that is concentrated on S2 . Taking into account these identifications, we have S γ = supp (σ1γ − σ2γ ).



Now extend the cost function c : S1 × S2 → IR to a continuous function f : S × S → IR satisfying the triangle inequality and the equality f(z, z) = 0 for all z ∈ S. The extension f is given by

\[
f(z_1,z_2) :=
\begin{cases}
c(x,y), & \text{if } z_1=x,\ z_2=y,\\[2pt]
\max\limits_{y\in S_2}\,[c(x_1,y)-c(x_2,y)], & \text{if } z_1=x_1,\ z_2=x_2,\\[2pt]
\max\limits_{x\in S_1}\,[c(x,y_2)-c(x,y_1)], & \text{if } z_1=y_1,\ z_2=y_2,\\[2pt]
\max\limits_{x_1\in S_1,\ y_1\in S_2}\,[c(x_1,y_1)-c(x_1,y)-c(x,y_1)], & \text{if } z_1=y,\ z_2=x,
\end{cases}
\]

where x and y (possibly with subscripts) denote elements of S1 and S2 respectively. It is obvious that f is continuous and satisfies f(z, z) = 0 whenever z ∈ S. To verify the triangle inequality f(z1, z2) + f(z2, z3) ≥ f(z1, z3), one has to examine eight possible cases, depending on the domains of the arguments z1, z2, and z3. If, for example, z1 = y ∈ S2, z2 = x2 ∈ S1, z3 = x3 ∈ S1, then

\[
f(y,x_2)+f(x_2,x_3) \ge c(x_1,y_1)-c(x_1,y)-c(x_2,y_1)+c(x_2,y_2)-c(x_3,y_2)
\]

for all x1 ∈ S1, y1, y2 ∈ S2. Setting y2 = y1 yields

\[
f(y,x_2)+f(x_2,x_3) \ge c(x_1,y_1)-c(x_1,y)-c(x_3,y_1)
\]

for all x1 ∈ S1, y1 ∈ S2. Consequently,

\[
f(y,x_2)+f(x_2,x_3) \ge \max_{x_1\in S_1,\ y_1\in S_2}[c(x_1,y_1)-c(x_1,y)-c(x_3,y_1)] = f(y,x_3).
\]
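The eight cases can also be checked mechanically when S1 and S2 are finite. The following Python sketch is illustrative only: the two small index sets, the random integer cost table, and the names `extend_cost` and `check_triangle` are assumptions made for the example, and the case of two points of S2 is taken in the form f(y1, y2) = max over x of [c(x, y2) − c(x, y1)]. It builds f on the disjoint union and tests f(z, z) = 0 and the triangle inequality for every triple.

```python
import itertools
import random


def extend_cost(c, S1, S2):
    """Return f on the disjoint union of S1 and S2; points are tagged (1, x) or (2, y)."""
    def f(z1, z2):
        (t1, a), (t2, b) = z1, z2
        if t1 == 1 and t2 == 2:      # z1 = x in S1, z2 = y in S2
            return c[a, b]
        if t1 == 1 and t2 == 1:      # both points in S1
            return max(c[a, y] - c[b, y] for y in S2)
        if t1 == 2 and t2 == 2:      # both points in S2
            return max(c[x, b] - c[x, a] for x in S1)
        # z1 = y in S2 (stored in a), z2 = x in S1 (stored in b)
        return max(c[x1, y1] - c[x1, a] - c[b, y1] for x1 in S1 for y1 in S2)
    return f


def check_triangle(f, points):
    """True if f(z, z) = 0 and f(z1, z3) <= f(z1, z2) + f(z2, z3) for all triples."""
    if any(f(z, z) != 0 for z in points):
        return False
    return all(f(z1, z3) <= f(z1, z2) + f(z2, z3)
               for z1, z2, z3 in itertools.product(points, repeat=3))


rng = random.Random(0)
S1, S2 = range(4), range(3)
# Arbitrary integer cost table (hypothetical data; integers avoid rounding issues).
c = {(x, y): rng.randint(-5, 5) for x in S1 for y in S2}
points = [(1, x) for x in S1] + [(2, y) for y in S2]
f = extend_cost(c, S1, S2)
print(check_triangle(f, points))
```

Such a check does not replace the case analysis, but it is a quick way to catch a sign or index slip in the definition of f.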

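The verification of the triangle inequality in the remaining cases is similar. Now we consider the mass transfer problem on S with the cost function f. According to Theorem 4.6.7 we have

\[
A(f,\sigma_1-\sigma_2) = C(f;\sigma_1,\sigma_2), \tag{5.3.21}
\]

\[
A(f,\sigma^{\gamma}_1-\sigma^{\gamma}_2) = C(f;\sigma^{\gamma}_1,\sigma^{\gamma}_2). \tag{5.3.22}
\]

Similarly,

\[
A(f|_{S^{\gamma}\times S^{\gamma}},\sigma^{\gamma}_1-\sigma^{\gamma}_2) = C(f|_{S^{\gamma}\times S^{\gamma}};\sigma^{\gamma}_1,\sigma^{\gamma}_2). \tag{5.3.23}
\]

Since supp µ ⊆ S1^γ × S2^γ for every µ ∈ C(S × S)∗+ with π1 µ = σ1^γ, π2 µ = σ2^γ, we have

\[
C(f;\sigma^{\gamma}_1,\sigma^{\gamma}_2) = C(f|_{S^{\gamma}\times S^{\gamma}};\sigma^{\gamma}_1,\sigma^{\gamma}_2) = C^{\gamma}. \tag{5.3.24}
\]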


Also,

\[
C(f;\sigma_1,\sigma_2) = C(c;\sigma_1,\sigma_2), \tag{5.3.25}
\]

since supp µ ⊆ S1 × S2 for every µ ∈ C(S × S)∗+ with π1 µ = σ1, π2 µ = σ2. Applying Theorem 5.3.7 yields

\[
A(f,\sigma_1-\sigma_2) = \lim_{\gamma} A(f|_{S^{\gamma}\times S^{\gamma}},\sigma^{\gamma}_1-\sigma^{\gamma}_2).
\]

This, in view of (5.3.21)–(5.3.25), can be rewritten as

\[
C(c;\sigma_1,\sigma_2) = \lim_{\gamma} C^{\gamma}. \tag{5.3.26}
\]

Further, since ||µ^γ|| = µ^γ(S1 × S2) = σ1^γ(S1) = ||σ1^γ||, the net µ^γ is norm bounded in C(S1 × S2)∗, hence weak* precompact. Then a weakly* convergent subnet µ^{γ_ν} can be chosen from it. Let µ0 be a weak* limit of such a subnet. Then µ0 ∈ C(S1 × S2)∗+, and for any u ∈ C(S1) we have

\[
\begin{aligned}
\int_{S_1}u(x)\,(\pi_1\mu_0)(dx) &= \int_{S_1\times S_2}u(x)\,\mu_0(d(x,y)) = \lim_{\nu}\int_{S_1\times S_2}u(x)\,\mu^{\gamma_\nu}(d(x,y))\\
&= \lim_{\nu}\int_{S_1}u(x)\,\sigma^{\gamma_\nu}_1(dx) = \int_{S_1}u(x)\,\sigma_1(dx);
\end{aligned}
\]

that is, π1 µ0 = σ1. The equality π2 µ0 = σ2 can be established in a similar way. Thus, the measure µ0 satisfies all the constraints of (5.3.18). Finally, we have

\[
\int_{S_1\times S_2}c(x,y)\,\mu_0(d(x,y)) = \lim_{\nu}\int_{S_1\times S_2}c(x,y)\,\mu^{\gamma_\nu}(d(x,y)) = \lim_{\nu}C^{\gamma_\nu}.
\]

These relations, combined with (5.3.26), mean that µ0 is an optimal solution for (5.3.18). The proof is now complete. □
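As a numerical illustration of Theorem 5.3.8, consider the following sketch. Everything specific in it is an assumption made for the example: S1 = S2 = [0, 1], c(x, y) = |x − y|, σ1 uniform, σ2 with density 2y, a midpoint discretization into n cells, and the function names. For sorted atoms on the line and this cost, the monotone (north-west corner) plan is known to be optimal for the finite problem (5.3.19)–(5.3.20), so no general linear programming solver is needed; this shortcut is special to the one-dimensional convex-cost case and is not the method of the text. For these marginals the optimal value of (5.3.18) is ∫₀¹ (√t − t) dt = 1/6.

```python
def discretize(cdf, n):
    """Atoms at the cell midpoints of [0, 1], each carrying the mass of its cell."""
    xs = [(i + 0.5) / n for i in range(n)]
    ws = [cdf((i + 1) / n) - cdf(i / n) for i in range(n)]
    return xs, ws


def monotone_plan(xs, a, ys, b):
    """North-west corner plan between sum a_i*delta(x_i) and sum b_j*delta(y_j).

    Assumes xs and ys are sorted increasingly and the weights are nonnegative.
    """
    a, b = list(a), list(b)
    i = j = 0
    plan = []
    while i < len(a) and j < len(b):
        m = min(a[i], b[j])          # ship as much as both atoms allow
        if m > 0:
            plan.append((xs[i], ys[j], m))
        a[i] -= m
        b[j] -= m
        exhausted_row = a[i] <= 0
        exhausted_col = b[j] <= 0
        if exhausted_row:
            i += 1
        if exhausted_col:
            j += 1
    return plan


def transport_value(n):
    """Value of the discretized problem with cost |x - y|."""
    xs, a = discretize(lambda t: t, n)        # sigma_1: uniform on [0, 1]
    ys, b = discretize(lambda t: t * t, n)    # sigma_2: density 2y on [0, 1]
    return sum(m * abs(x - y) for x, y, m in monotone_plan(xs, a, ys, b))


for n in (10, 100, 1000):
    print(n, transport_value(n))
```

Moving the mass of each cell to its midpoint changes each marginal by at most 1/(2n) in transport cost, so the printed values should differ from 1/6 by at most about 1/n.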

Remark 5.3.11 Some further approaches to finite-dimensional approximations of infinite linear programs (and of the classical Monge–Kantorovich problem among them) may be found in Vershik and Temelt (1968) and Vershik (1970). See also Anderson and Philpott (1983, 1984) and Anderson and Nash (1987), where algorithms for the numerical solution of a specific form of the mass transfer problem and of some other infinite linear programs are given.


5.4 An Application of the Duality Theory to the Strassen Theorem

In this section we apply the duality theory for marginal problems (see Section 4.6) to obtain a generalization of the famous Strassen theorem on probability measures with given marginals. Our goal here is to illustrate the possibilities given by the duality method rather than to cover the most general class of spaces.

Let S be a completely regular topological space. Denote by M(S) the set of probability measures in V+(S), that is, the set of inner regular countably additive set functions σ : B(S) → IR+ with σ(S) = 1.

Theorem 5.4.1 Suppose that S1, S2 ∈ L (see Definition 4.5.2) and that F is a closed subset in S1 × S2. Given a number ε > 0 and measures σ1 ∈ M(S1) and σ2 ∈ M(S2), the following statements are equivalent:

(a) There exists a measure µ ∈ M(S1 × S2) such that π1 µ = σ1, π2 µ = σ2, and µ(F) ≥ 1 − ε.

(b) For all A ∈ B(S1) and B ∈ B(S2) satisfying

\[
(A\times B)\cap F = \emptyset, \tag{5.4.1}
\]

the inequality

\[
\sigma_1A+\sigma_2B \le 1+\varepsilon \tag{5.4.2}
\]

holds.

(c) For all A ∈ B(S1) and B ∈ B(S2),

\[
\sigma_1(A)+\sigma_2(B) \le \sup\{\mu(A\times S_2)+\mu(S_1\times B) \;:\; \mu\in M(S_1\times S_2),\ \mu(F)\ge 1-\varepsilon\}. \tag{5.4.3}
\]

Proof: The implications (a) ⇒ (c) ⇒ (b) are obvious. Further, F coincides with the trace on S1 × S2 of some compact set in βS1 × βS2. Using this fact, together with Lemmas 4.5.19 and 4.5.20, and passing from S1, S2 to βS1, βS2, we see that it suffices to prove the implication (b) ⇒ (a) only for the case of compact S1 and S2. Suppose then that S1 and S2 are compact, and let statement (b) hold. We take

\[
c(x,y) = 1-\chi_F(x,y)
\]


and consider the marginal problem (4.6.3)–(4.6.4), as well as the dual problem (4.6.5)–(4.6.6). According to Theorem 4.6.8, C(c; σ1, σ2) = D(c; σ1, σ2), and there exists an optimal measure µ for the marginal problem (4.6.3)–(4.6.4). Moreover, µ has marginals σ1, σ2, that is, π1 µ = σ1, π2 µ = σ2, and µ(c) = 1 − µ(F) = D(c; σ1, σ2). Therefore, to prove (a) it suffices to show that D(c; σ1, σ2) ≤ ε. In other words, we have to show that

\[
\sigma_1(u_1)-\sigma_2(u_2) \le \varepsilon \tag{5.4.4}
\]

for all u1 ∈ C(S1), u2 ∈ C(S2) satisfying the inequality

\[
u_1(x)-u_2(y) \le 1-\chi_F(x,y)\qquad\text{for all }(x,y)\in S_1\times S_2. \tag{5.4.5}
\]

Suppose that u1 and u2 are nonnegative. This assumption is not restrictive, because one can add the same positive constant to both functions without changing (5.4.4) and (5.4.5). In addition, we suppose that the projections of F onto S1 and S2 are the entire spaces:

\[
\pi_1F = S_1,\qquad \pi_2F = S_2. \tag{5.4.6}
\]

This also can be assumed without loss of generality, since we can add ideal isolated points x∗ and y∗ to S1 and S2, and pass to new compact spaces S1′ = S1 ∪ {x∗}, S2′ = S2 ∪ {y∗}, replacing F by F′ = F ∪ ({x∗} × S2) ∪ (S1 × {y∗}) ∪ ({x∗} × {y∗}) and putting σ1{x∗} = σ2{y∗} = 0, u2(y∗) = max u1(S1), and u1(x∗) = min u2(S2′) = min(min u2(S2), max u1(S1)). As a result, the inequality (5.4.5) will remain true, while (5.4.4) will have the same meaning as before. Thus, we shall assume that (5.4.6) is satisfied.

Now, from (5.4.5) and (5.4.6) it follows that

\[
\max u_1(S_1)-\min u_2(S_2) \le 1 \tag{5.4.7}
\]

and

\[
\max u_1(S_1) \le \max u_2(S_2). \tag{5.4.8}
\]

Define ϕ(x) := min{u2(y); (x, y) ∈ F}, x ∈ S1. Clearly, ϕ is lower semicontinuous, and in view of (5.4.6), we have

\[
u_1(x) \le \varphi(x)\qquad\text{for all }x\in S_1 \tag{5.4.9}
\]

and

\[
\min u_2(S_2) = \min\varphi(S_1). \tag{5.4.10}
\]

Define for t ∈ IR, A(t) := {x ∈ S1; t ≤ u1(x)}, Φ(t) := {x ∈ S1; t ≤ ϕ(x)}, and B(t) := {y ∈ S2; t ≤ u2(y)}. All these sets are compact, and it follows from the definition of ϕ that [Φ(t) × (S2 \ B(t))] ∩ F = Ø. Then, according to (b), we have

\[
\sigma_1(\Phi(t)) \le \sigma_2(B(t))+\varepsilon\qquad\text{for all }t\in\mathbb{R}. \tag{5.4.11}
\]

Now, using (5.4.7)–(5.4.11), we obtain

\[
\begin{aligned}
\sigma_1(u_1) &= \int_0^{\max u_1(S_1)}\sigma_1(A(t))\,dt \le \int_0^{\max u_1(S_1)}\sigma_1(\Phi(t))\,dt\\
&= \min\varphi(S_1) + \int_{\min\varphi(S_1)}^{\max u_1(S_1)}\sigma_1(\Phi(t))\,dt\\
&\le \min\varphi(S_1) + \int_{\min\varphi(S_1)}^{\max u_1(S_1)}\min(\sigma_2(B(t))+\varepsilon,\,1)\,dt\\
&= \min u_2(S_2) + \int_{\min u_2(S_2)}^{\max u_1(S_1)}\min(\sigma_2(B(t)),\,1-\varepsilon)\,dt + \varepsilon\,[\max u_1(S_1)-\min u_2(S_2)]\\
&\le \min u_2(S_2) + \int_{\min u_2(S_2)}^{\max u_2(S_2)}\sigma_2(B(t))\,dt + \varepsilon\\
&= \sigma_2(u_2)+\varepsilon;
\end{aligned}
\]

that is, (5.4.4) holds. The proof is complete. □


Remark 5.4.2 The Borel sets in (b) and (c) can be replaced by compact or by open sets. This follows from the inner regularity of all measures under consideration. See Levin (1984b) for details, where compact spaces S1 and S2 are considered.

Remark 5.4.3 The equivalence of (a) and (c) was first proved by Strassen (1965, Theorem 11 and below) in the case when S1 and S2 are Polish spaces. Theorem 5.4.1 is a natural nonmetrizable extension of the Strassen theorem. In the case when S1 and S2 are compact, Theorem 5.4.1 was proved by Levin (1984b). The strongest topological version of the Strassen theorem, with a different proof, can be found in Hansel and Troallic (1986), where arbitrary Hausdorff spaces are taken as S1 and S2. For completely regular spaces and ε = 0, see also Kellerer (1984a). A different (nontopological) extension of the Strassen theorem for ε = 0 was obtained by Sudakov (1979). Also, see Hansel and Troallic (1978), Shortt (1983b), Ramachandran and Rüschendorf (1995), and Chapter 2 of this book.
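For finite S1 and S2 the equivalence of (a) and (b) in Theorem 5.4.1 can be checked by direct computation. The Python sketch below is an illustration under stated assumptions: the three-point spaces, the weights, the set F, and the function names are hypothetical choices. It computes the largest value of µ(F) over all couplings by an augmenting-path maximum-flow routine (a standard reformulation for finite spaces, not the duality argument of the proof), and the smallest ε admissible in (b) by enumerating subsets A. For a given A the largest B with (A × B) ∩ F = Ø is the complement of N(A) = {y : (x, y) ∈ F for some x ∈ A}, so that smallest ε equals the maximum over A of σ1(A) − σ2(N(A)).

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def max_mass_on_F(s1, s2, F):
    """Largest mu(F) over couplings mu of s1 and s2 (augmenting-path max-flow)."""
    m, n = len(s1), len(s2)
    src, snk = m + n, m + n + 1
    cap = {}

    def add_edge(u, v, c):
        cap[u, v] = cap.get((u, v), Fraction(0)) + c
        cap.setdefault((v, u), Fraction(0))

    for i in range(m):
        add_edge(src, i, s1[i])
    for j in range(n):
        add_edge(m + j, snk, s2[j])
    for i, j in F:
        add_edge(i, m + j, Fraction(1))   # total mass is 1, so this never binds

    flow = Fraction(0)
    while True:
        parent = {src: None}              # breadth-first search for a path
        queue = deque([src])
        while queue and snk not in parent:
            u = queue.popleft()
            for (a, b), c in cap.items():
                if a == u and c > 0 and b not in parent:
                    parent[b] = u
                    queue.append(b)
        if snk not in parent:
            return flow
        path, v = [], snk
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(cap[e] for e in path)
        for u, v in path:
            cap[u, v] -= delta
            cap[v, u] += delta
        flow += delta


def strassen_epsilon(s1, s2, F):
    """Smallest eps satisfying (b): max over A of s1(A) - s2(N(A))."""
    best = Fraction(0)
    for r in range(1, len(s1) + 1):
        for A in combinations(range(len(s1)), r):
            nbrs = {j for i, j in F if i in A}
            best = max(best, sum(s1[i] for i in A) - sum(s2[j] for j in nbrs))
    return best


s1 = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
s2 = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]
F = {(i, j) for i in range(3) for j in range(3) if j <= i}
print(1 - max_mass_on_F(s1, s2, F), strassen_epsilon(s1, s2, F))
```

Exact rational arithmetic is used so that the two quantities can be compared with equality rather than up to rounding.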

5.5 Some Applications to Mathematical Economics and Decision Theory: Closed Preorders and Continuous Utility Functions

5.5.1 Statement of the Problem and the Idea of the Duality Approach

In various fields of mathematical economics and decision theory one has to compare different alternatives or states (vectors of consumer goods, different kinds of economic development, technological projects, etc.). The fact that certain pairs of states can be compared implies that the state space is endowed with a binary preference relation satisfying some intuitively acceptable conditions (axioms). This type of situation is generally formalized using the notion of a closed preorder.

In what follows, S is a topological space. A binary relation ≼ on S that is reflexive (x ≼ x for all x ∈ S) and transitive (x, y, z ∈ S and x ≼ y, y ≼ z implies x ≼ z) is called a preorder on S. A preorder ≼ is said to be linear (sometimes the terms total or complete or connected are also used) if for any pair of elements x, y ∈ S, at least one of the relations x ≼ y, y ≼ x holds. A preorder ≼ is called closed if its graph gr(≼) := {(x, y) ∈ S × S; x ≼ y} is a closed set in S × S.

Given a preorder ≼, one can associate with it two binary relations ≺ and ∼ on S defined by

\[
\begin{aligned}
x\prec y &\iff x\preceq y \text{ but not } y\preceq x,\\
x\sim y &\iff x\preceq y \text{ and } y\preceq x.
\end{aligned}
\]

An order on S is defined as a preorder satisfying

\[
x\preceq y \ \Rightarrow\ x\prec y \ \text{ or } \ x=y,
\]

or equivalently, x ≼ y, y ≼ x ⇒ x = y. Any real-valued function u on S satisfying the conditions

\[
x\preceq y \ \Rightarrow\ u(x)\le u(y),\qquad x,y\in S, \tag{5.5.1}
\]

\[
x\prec y \ \Rightarrow\ u(x)<u(y),\qquad x,y\in S, \tag{5.5.2}
\]

is called a utility function of a preorder ≼. It follows from (5.5.1) that every utility function satisfies the condition

\[
x\sim y \ \Rightarrow\ u(x)=u(y),\qquad x,y\in S. \tag{5.5.3}
\]

If ≼ is a linear preorder, then the pair of conditions (5.5.1), (5.5.2) is obviously equivalent to the single condition

\[
x\preceq y \iff u(x)\le u(y),\qquad x,y\in S. \tag{5.5.4}
\]

Note that every function u : S → IR determines a preorder ≼ by defining x ≼ y if and only if u(x) ≤ u(y). This preorder is obviously linear; hence (5.5.4) fails for any preorder ≼ that is not linear. Also note that a preorder satisfying (5.5.4) is clearly closed when the utility function u is continuous.

One of the fundamental results of utility theory is the Debreu theorem asserting the existence of a continuous utility function for any closed linear preorder on a separable metrizable space (see Debreu (1954, 1964)). It can be shown that the assumptions that the space is metrizable and separable and that the preorder is closed cannot be abandoned. A question arises whether the Debreu theorem remains true when the preorder is not assumed to be linear. Also, it is of interest to find conditions on a given preorder that ensure the existence of a utility function belonging to some class of continuous functions (say, the class of Lipschitz utility functions on a metric space or the class of continuous utility functions with a given modulus of continuity).

To answer such questions we apply duality theory for the mass transfer problem. The idea of this approach is to use a specific cost function c that equals zero on the graph of the preorder and has appropriate continuity properties. With the help of the duality theorem we obtain a representation

\[
\operatorname{gr}(\preceq) = \{(x,y) \;:\; u(x)\le u(y)\ \text{ for all } u\in H\},
\]

where H = {u_n; n ∈ IN} is some countable subset in Lip(c, S; C^b(S)). Then one can establish that

\[
u_0(x) := \sum_{n=1}^{\infty}\frac{1}{2^n}\,u_n(x),\qquad x\in S,
\]

is a utility function of ≼ with the required properties. We also apply this duality approach to obtain universal utility theorems for varying preorders and to study the relationship between closed and stochastic preorders. The results of this section are based on Levin (1981, 1983a, 1983b, 1984a, 1985b, 1986, 1990).
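The construction of a utility function from a representing family can be tried out on a small finite example. The sketch below is illustrative only: the state space of integer pairs, the two coordinate functions forming the family `H`, and the names `leq` and `u0` are assumptions made for the example. The preorder is the componentwise order, which is not linear, so conditions (5.5.1) and (5.5.2) hold for the weighted sum while the equivalence (5.5.4) does not.

```python
from itertools import product

# States: pairs (a, b) with entries 0, 1, 2 (hypothetical finite state space).
S = list(product(range(3), repeat=2))
# Family representing the graph of the preorder: the two coordinate functions.
H = [lambda x: x[0], lambda x: x[1]]


def leq(x, y):
    """x precedes y iff u(x) <= u(y) for every u in H (componentwise order here)."""
    return all(u(x) <= u(y) for u in H)


def u0(x):
    """Weighted sum of the family: sum over n of u_n(x) / 2**n, n = 1, 2, ..."""
    return sum(u(x) / 2 ** (n + 1) for n, u in enumerate(H))


# Condition (5.5.1): u0 is isotone.
isotone = all(u0(x) <= u0(y) for x in S for y in S if leq(x, y))
# Condition (5.5.2): strict preference gives a strict inequality.
strict = all(u0(x) < u0(y) for x in S for y in S if leq(x, y) and not leq(y, x))
# Condition (5.5.4) would require the preorder to be linear.
equivalence = all(leq(x, y) == (u0(x) <= u0(y)) for x in S for y in S)
print(isotone, strict, equivalence)
```

For instance, the states (0, 2) and (1, 0) are incomparable in the componentwise order, yet u0 assigns them the same value 0.5, which is why the last check returns False.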

5.5.2 Functionally Closed Preorders

In the present section a wide class of closed preorders on a completely regular topological space is studied. These preorders are called functionally closed and were introduced and investigated by Levin (1985b, 1986, 1990).

Definition 5.5.1 A preorder ≼ on a completely regular topological space S is called functionally closed if a representation

\[
\operatorname{gr}(\preceq) = \{(x,y) \;:\; u(x)\le u(y)\ \text{ for all } u\in H\} \tag{5.5.5}
\]

holds, where H is a nonempty subset in C^b(S).

Note that if gr(≼) has a representation (5.5.5) with H ⊆ C(S), then the set

\[
H_1 := \Big\{\frac{u}{1+|u|} \;:\; u\in H\Big\}
\]

lies in C^b(S). Moreover, the representation (5.5.5) remains true with H replaced by H1. Hence the preorder ≼ is functionally closed. A real-valued function u on S is said to be isotone (with respect to the preorder ≼) if u(x) ≤ u(y) whenever x ≼ y.

Theorem 5.5.2 Let ≼ be a preorder on a completely regular space S. Then the following statements are equivalent:

(a) ≼ is functionally closed.


(b) ≼ is a restriction of some closed preorder on βS.

(c) (The extension theorem) For any compact set F ⊂ S and any function v ∈ C^b(F) that is isotone with respect to the restriction of ≼ to F, there exists an isotone function u ∈ C^b(S) that is an extension of v and satisfies the equalities min u(S) = min v(F) and max u(S) = max v(F).

(d) (The separation theorem) For any compact sets F1 and F0 in S with (F1 × F0) ∩ gr(≼) = Ø, there exists a continuous isotone function u : S → [0, 1] equal to 1 on F1 and 0 on F0.

To prove the theorem, an extension theorem for a compact space is required. This auxiliary extension theorem, which supplements Theorems 5.2.12 and 5.2.14, will also be applied in Section 5.5.4. We now formulate the extension theorem.

Theorem 5.5.3 Let ≼ be a closed preorder on a compact space S, let F be a closed subset in S, and let v ∈ C(F) be isotone with respect to the restriction of ≼ to F. Then there exists an isotone function u ∈ C(S) such that u|F = v and min u(S) = min v(F), max u(S) = max v(F).

Proof: We shall prove the theorem by applying Corollary 5.2.10. Consider the cost function c : S × S → IR ∪ {+∞},

\[
c(x,y) =
\begin{cases}
0, & \text{if } (x,y)\in\operatorname{gr}(\preceq),\\
+\infty, & \text{otherwise.}
\end{cases}
\]

The function c satisfies the triangle inequality, since ≼ is transitive. Also, c is lower semicontinuous, since ≼ is assumed to be closed. Furthermore, as v is isotone with respect to the restriction of ≼ to F, we have that

\[
v(x)-v(y) \le c(x,y)\qquad\text{for all }x,y\in F.
\]

Now we consider the function c′ defined in (5.2.6), which corresponds to S1 = F and u1 = v. Recall that c′ is given by c′(x, y) := min[c(x, y), a_v(x) − b_v(y)], with

\[
a_v(x) := \inf\{v(z)+c(x,z) \;:\; z\in F\},\qquad
b_v(x) := \sup\{v(z)-c(z,x) \;:\; z\in F\}
\]

(see (5.2.4), (5.2.5)). We have

\[
a_v(x) = \min v(F\cap A_x),\qquad b_v(x) = \max v(F\cap A^x), \tag{5.5.6}
\]


where A_x := {y ∈ S; x ≼ y}, A^x := {y ∈ S; y ≼ x}, min Ø := +∞, max Ø := −∞. We claim that a_v is lower semicontinuous on S. First observe that in view of the closedness of ≼ and compactness of S, the set {x ∈ S; a_v(x) = +∞} = {x ∈ S; F ∩ A_x = Ø} is open, and hence a_v is lower semicontinuous on it. For x with a_v(x) < +∞ take any convergent net x_γ → x. We need to verify that

\[
a_v(x) \le \liminf_{\gamma} a_v(x_\gamma). \tag{5.5.7}
\]

Passing if necessary to a subnet, we assume without loss of generality that the limit lim_γ a_v(x_γ) exists. There are two cases: Either a_v(x_γ) = +∞ for all γ ≽ γ0, and then (5.5.7) is trivial, or a subnet can be extracted from x_γ on which a_v takes finite values. In the latter case we pass to this subnet and assume that a_v(x_γ) < +∞ for all γ. Since F is compact and the function z ↦ v(z) + c(x, z) is lower semicontinuous on F whenever x ∈ S, there exists a point z_γ ∈ F ∩ A_{x_γ} such that v(z_γ) = a_v(x_γ). Using the compactness of F, we choose a convergent subnet z_{γ_ν} → z ∈ F. We obtain (x, z) = lim_ν (x_{γ_ν}, z_{γ_ν}) ∈ gr(≼). Therefore, z ∈ F ∩ A_x and

\[
a_v(x) = \min v(F\cap A_x) \le v(z) = \lim_{\nu} v(z_{\gamma_\nu}) = \lim_{\nu} a_v(x_{\gamma_\nu}) = \lim_{\gamma} a_v(x_\gamma).
\]

The lower semicontinuity of a_v on S is thus established. Similarly, it is shown that b_v is upper semicontinuous on S. Then a_v(x) − b_v(y) is lower semicontinuous on S × S. Furthermore, the function c′ defined in (5.5.6) is lower semicontinuous on S × S as the minimum of two lower semicontinuous functions. Being lower semicontinuous on S × S, c′ is regular with respect to the Banach lattices W = C(S × S) and X = C(S). According to Corollary 5.2.10, there exists a function u′ ∈ Lip(c, S; C(S)) such that u′|F = v. Note that u′ ∈ Lip(c, S; C(S)) means exactly that u′ is a continuous function on S that is isotone with respect to ≼. Now the function

\[
u(x) = \max\{\min v(F),\ \min[\max v(F),\ u'(x)]\},\qquad x\in S,
\]

is easily seen to have all properties that we have required. The proof is complete. □

Remark 5.5.4 An earlier version of Theorem 5.5.3 for ≼ being a closed order was proved in Nachbin (1965) in a different way.



Proof of Theorem 5.5.2: (a) ⇒ (b). Every function u ∈ C^b(S) can be uniquely extended to βS preserving continuity. Therefore, ≼ is a restriction to S of a closed preorder ≼1 defined on βS by

\[
x\preceq_1 y \ \overset{\text{def}}{\iff}\ u(x)\le u(y)\quad\text{for all } u\in H.
\]

Here H is a subset in C^b(S) that determines the functionally closed preorder ≼ (see (5.5.5)), and all the functions u ∈ H are identified with their extensions to βS.

(b) ⇒ (c). Passing from S to βS, we reduce the extension theorem to a similar assertion for the compact space βS. It remains to apply Theorem 5.5.3.

(c) ⇒ (d). Set F = F1 ∪ F0 and take a function v ∈ C^b(F) that equals 1 on F1 and 0 on F0. Clearly, this function is isotone with respect to the restriction of ≼ to F. Then, by assumption, it can be extended to S as an isotone function u ∈ C^b(S), u(S) ⊆ [0, 1], and since u|F = v, we have u|F1 = 1 and u|F0 = 0.

(d) ⇒ (a). Let H denote the set of all isotone functions in C^b(S). Obviously,

\[
\operatorname{gr}(\preceq) \subseteq \{(x,y) \;:\; u(x)\le u(y)\ \text{ for all } u\in H\}.
\]

Let us check the opposite inclusion. Suppose that (x, y) ∉ gr(≼) and take the one-point sets F1 = {x} and F0 = {y}. Since (F1 × F0) ∩ gr(≼) = Ø, there exists an isotone function u ∈ C^b(S) such that 0 ≤ u ≤ 1, u(x) = 1, and u(y) = 0. Thus we have found a function u ∈ H satisfying u(x) > u(y). Since the pair (x, y) ∉ gr(≼) is arbitrary, this implies the inclusion

\[
\{(x,y) \;:\; u(x)\le u(y)\ \text{ for all } u\in H\} \subseteq \operatorname{gr}(\preceq),
\]

and the representation (5.5.5) is thus established. □

Corollary 5.5.5 Every closed preorder on a compact space is functionally closed.

Proof: This is a direct consequence of Theorem 5.5.3 and the implication (c) ⇒ (a) of Theorem 5.5.2. □

Corollary 5.5.6 Let ≼ be a functionally closed preorder on a completely regular space S. Suppose that F1 and F0 are compact subsets in S and (F1 × F0) ∩ gr(≼) = Ø. Then there exist open sets G1 and G0 such that G1 ⊃ F1, G0 ⊃ F0 and (G1 × G0) ∩ gr(≼) = Ø.

Proof: This follows from the separation theorem (see (d)) if one takes G1 = {x; u(x) > α}, G0 = {x; u(x) < β}, where 0 < β ≤ α < 1. □

Remark 5.5.7 (see Levin (1984b, Remark on p. 16)) Consider on S the preorder ≼ given by x ≼ y ⇔ x = y. This preorder is functionally closed, and gr(≼) admits the representation (5.5.5) with H = C^b(S). Furthermore, for this preorder the condition that (F1 × F0) ∩ gr(≼) = Ø means simply that F1 ∩ F0 = Ø. Thus, by the separation theorem (see (d)), for any compact sets F1 and F0 with F1 ∩ F0 = Ø there exists a function u ∈ C^b(S), 0 ≤ u ≤ 1, such that u|F1 = 1 and u|F0 = 0. We see that for a compact S, the separation theorem becomes the well-known Urysohn Lemma.

Theorem 5.5.8 Let ≼ be a preorder on a separable metrizable space S. The following statements are equivalent:

(i) Representation (5.5.5) holds with a countable set H ⊂ C^b(S).

(ii) ≼ is a restriction to S of a closed preorder ≼1 on S1, where S1 is some metrizable compactification of S.

If these equivalent assertions are true, the preorder ≼ admits a continuous utility function.

Proof: (i) ⇒ (ii). Let H = {u1, u2, …} ⊂ C^b(S). Passing if necessary from u_k to u_k′,

\[
u_k'(x) = \frac{u_k(x)-\min u_k(S)}{\max u_k(S)-\min u_k(S)},
\]

we suppose without loss of generality that uK (S) ⊆ [0, 1], k = 1, 2, . . . . Further, since S is metrizable and separable, there exists a countable family of continuous functions ϕk : S → [0, 1], k ∈ IN, separating points of S. Denote by T the topological product of the countable family of segments [0, 1] and consider f : S → T × T given by f (x) = ((uk (x))k∈IN , (ϕk (x))k∈IN ),

x ∈ S.

The mapping f is a continuous embedding of S into T × T . Denote by S1 the closure of S in the metrizable compact space T × T . Now consider on S1 the preorder '1 defined by ((ak )k∈IN , (bk )k∈IN ) '1 ((ak )k∈IN , (bk )k∈IN ) ⇔ ak ≤ ak ,

k ∈ IN.



Clearly, ≼_1 has the desired property.

(ii) ⇒ (i). Since ≼_1 is a closed preorder on a compact space S_1, it is functionally closed (see Corollary 5.5.5); that is,

$$x \preceq_1 y \iff u(x) \le u(y) \quad \text{for all } u \in H_1,$$

with H_1 ⊆ C(S_1). Further, since S_1 is metrizable, the Banach space C(S_1) is separable. Therefore,

$$x \preceq_1 y \iff u_k(x) \le u_k(y), \quad k \in \mathbb{N},$$

where the sequence {u_1, u_2, . . .} is dense in H_1. Then the representation (5.5.5) holds for gr(≼) with H = {u_k|S ; k ∈ ℕ}.

Finally, if the equivalent conditions (i), (ii) are satisfied and if H = {u_k ; k ∈ ℕ}, then

$$u_0(x) = \sum_{k=1}^{\infty} \frac{u_k(x)}{2^k\,(1 + |u_k(x)|)}$$

is a continuous utility function for ≼. Indeed, u_0 is clearly isotone. Further, if x ≺ y, then (x, y) ∈ gr(≼), but (y, x) ∉ gr(≼), and applying (5.5.5) yields

$$u_k(x) \le u_k(y) \ \text{for all } k \in \mathbb{N}, \qquad u_k(x) < u_k(y) \ \text{for at least one } k \in \mathbb{N};$$

hence u_0(x) < u_0(y).

□

5.5.3 Two Generalizations of the Debreu Theorem

Here we prove two theorems on the existence of a continuous utility function for (not necessarily linear) preorders on a separable metric space.

Theorem 5.5.9 Let ≼ be a preorder on a metric space (S, d), and let ω : ℝ_+ → ℝ_+ be an increasing continuous function, ω(0) = 0. Consider on S × S the function

$$c(x, y) = \begin{cases} 0, & \text{if } x \preceq y, \\ \omega(d(x, y)), & \text{otherwise.} \end{cases} \tag{5.5.8}$$

The following statements are equivalent:

(i) gr(≼) admits a representation (5.5.5) with H ⊆ C(S) and |u(x) − u(y)| ≤ ω(d(x, y)) for all u ∈ H, x, y ∈ S.

(ii) For every (x, y) ∉ gr(≼) the inequality c∗(x, y) > 0 holds, where c∗ is the reduced cost function associated with c.

If S is separable and one of conditions (i), (ii) holds (hence both conditions hold), then there exists a uniformly continuous utility function u_0 for ≼ satisfying

$$|u_0(x) - u_0(y)| \le \omega(d(x, y)) \quad \text{for all } x, y \in S. \tag{5.5.9}$$

Proof: First note that the reduced cost function c∗ satisfies the triangle inequality and the inequalities 0 ≤ c∗(x, y) ≤ ω(d(x, y)). For any x, x′, y, y′ ∈ S we obtain

$$c_*(x, y) - c_*(x', y') \le c_*(x, x') + c_*(y', y) \le \omega(d(x, x')) + \omega(d(y', y)),$$

$$c_*(x, y) - c_*(x', y') \ge -c_*(x', x) - c_*(y, y') \ge -\omega(d(x', x)) - \omega(d(y, y')).$$

Hence |c∗(x, y) − c∗(x′, y′)| ≤ ω(d(x, x′)) + ω(d(y, y′)). In particular, c∗ is a continuous function on S × S, and c∗(x, x) = 0 for every x ∈ S. (Observe that the original cost function (5.5.8) is not continuous on S × S.)

(i) ⇒ (ii). From representation (5.5.5) and the definition of c we derive

$$u(x) - u(y) \le c(x, y) \quad \text{for all } u \in H,\ x, y \in S,$$

and hence

$$u(x) - u(y) \le c_*(x, y) \quad \text{for all } u \in H,\ x, y \in S.$$

If now (x, y) ∉ gr(≼), then in view of (5.5.5), there exists a function u ∈ H such that u(x) − u(y) > 0. We thus obtain c∗(x, y) ≥ u(x) − u(y) > 0.

(ii) ⇒ (i). Take H = {u_z ; z ∈ S}, where u_z(·) = c∗(·, z). We have u_z(x) − u_z(y) = c∗(x, z) − c∗(y, z) ≤ c∗(x, y) ≤ ω(d(x, y)) for all x, y ∈ S and all u_z ∈ H. Further, if (x, y) ∈ gr(≼), then u_z(x) − u_z(y) ≤ c∗(x, y) = 0. Therefore, u_z(x) ≤ u_z(y); that is, every u_z ∈ H is isotone. If (x, y) ∉ gr(≼), then u_y(x) − u_y(y) = c∗(x, y) > 0,



and hence the representation (5.5.5) holds. The equivalence of (i) and (ii) is now established.

Next, let a sequence (x_k) be dense in S and the statements (i) and (ii) be satisfied. Using the density of (x_k) in S and the continuity of c∗, we obtain

$$\operatorname{gr}(\preceq) = \{(x, y);\ c_*(x, z) \le c_*(y, z) \ \text{for all } z \in S\} = \{(x, y);\ c_*(x, x_k) \le c_*(y, x_k),\ k \in \mathbb{N}\}$$

$$= \left\{(x, y);\ \frac{c_*(x, x_k)}{1 + c_*(x, x_k)} \le \frac{c_*(y, x_k)}{1 + c_*(y, x_k)},\ k \in \mathbb{N}\right\};$$

that is, the representation (5.5.5) is valid with a countable set

$$H = \left\{\frac{c_*(\cdot, x_k)}{1 + c_*(\cdot, x_k)};\ k \in \mathbb{N}\right\} \subset C^b(S).$$

The preorder ≼ is functionally closed, and

$$u_0(x) := \sum_{k=1}^{\infty} \frac{c_*(x, x_k)}{2^k\,(1 + c_*(x, x_k))}, \qquad x \in S,$$

is its utility function. This function satisfies (5.5.9), since for all x, y ∈ S and k ∈ ℕ,

$$\frac{c_*(x, x_k)}{1 + c_*(x, x_k)} - \frac{c_*(y, x_k)}{1 + c_*(y, x_k)} = \frac{c_*(x, x_k) - c_*(y, x_k)}{(1 + c_*(x, x_k))(1 + c_*(y, x_k))} \le \frac{c_*(x, y)}{(1 + c_*(x, x_k))(1 + c_*(y, x_k))} \le c_*(x, y) \le \omega(d(x, y)).$$

□
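A finite sanity check of Theorem 5.5.9 can be sketched in Python. The sketch assumes that the reduced cost function c∗ is the infimum of the total cost c over finite chains from x to y, so that on a finite set it is a shortest-path distance; this reading of c∗, the point set, the divisibility preorder, and the choice ω(t) = t are all illustrative assumptions rather than material from the text.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite example (not from the text): integer points on the line,
# d(x, y) = |x - y|, omega(t) = t, and the preorder "x divides y".
S = [1, 2, 3, 4, 6]


def leq(x, y):
    """Illustrative preorder: x precedes y when x divides y."""
    return y % x == 0


def cost(x, y):
    """Cost function (5.5.8) with omega(t) = t."""
    return 0 if leq(x, y) else abs(x - y)


def reduced_cost(points, c):
    """Assumed definition of c*: the cheapest chain from x to y (Floyd-Warshall)."""
    r = {(x, y): c(x, y) for x, y in product(points, repeat=2)}
    for z, x, y in product(points, repeat=3):  # z is the outermost loop variable
        r[x, y] = min(r[x, y], r[x, z] + r[z, y])
    return r


c_star = reduced_cost(S, cost)


def u0(x):
    """Finite analogue of the utility function from the proof, with x_k running over S."""
    return sum(
        Fraction(c_star[x, xk], 2 ** k * (1 + c_star[x, xk]))
        for k, xk in enumerate(S, start=1)
    )


pairs = list(product(S, repeat=2))
# Condition (ii): c*(x, y) > 0 exactly when x does not precede y.
condition_ii = all((c_star[x, y] > 0) == (not leq(x, y)) for x, y in pairs)
# u0 is isotone, strictly increasing on strict pairs, and satisfies (5.5.9).
isotone = all(u0(x) <= u0(y) for x, y in pairs if leq(x, y))
strict = all(u0(x) < u0(y) for x, y in pairs if leq(x, y) and not leq(y, x))
lipschitz = all(abs(u0(x) - u0(y)) <= abs(x - y) for x, y in pairs)
print(condition_ii, isotone, strict, lipschitz)
```

Exact rational arithmetic is used so that the strict inequalities are not blurred by rounding. On a finite set the Lipschitz bound is a weak test, since u0 takes values in [0, 1) while distinct points are at distance at least 1.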

Remark 5.5.10 Let H_ω^b(≼) (H_ω(≼)) denote the set of all functions in C^b(S) (resp. in C(S)) that are isotone with respect to ≼ and satisfy (5.5.9). Obviously,

$$H^b_\omega(\preceq) = \operatorname{Lip}(c, S; C^b(S)), \qquad H_\omega(\preceq) = \operatorname{Lip}(c, S; C(S)) = \operatorname{Lip}(c, S; \mathbb{R}^S).$$

Furthermore, assertion (ii) of Theorem 5.5.9 can be reformulated as follows:

(ii′) for every (x, y) ∉ gr(≼) there exists a function u ∈ H_ω^b(≼) such that u(x) > u(y).



Corollary 5.5.11 Suppose that (S, d) is a separable metric space and ≼ is a preorder on S. If for any pair x_0, y_0 ∈ S with x_0 ≺ y_0 there is a continuous isotone function u : S → ℝ such that u(x_0) < u(y_0) and |u(x) − u(y)| ≤ ω(d(x, y)) for all x, y ∈ S, then there exists a utility function for ≼ that satisfies (5.5.9).

Proof: Define on S a new preorder ≼_1 by x ≼_1 y ⇔ c∗(x, y) = 0. Here c∗ is the reduced cost function associated with the original cost function c, given by (5.5.8). It is clear that for all x, y ∈ S, x ≼ y ⇒ x ≼_1 y. Also, from the assumption and from Remark 5.5.10, it follows that x ≺ y ⇒ x ≺_1 y. Consequently, any utility function for ≼_1 is also a utility function for ≼. It remains to note that gr(≼_1) admits representation (5.5.5) with H = H_ω(≼) = Lip(c, S; C(S)) = Lip(c∗, S; C(S)), and so by Theorem 5.5.9 there exists a continuous utility function for ≼_1 that satisfies (5.5.9). □

Remark 5.5.12 A preorder ≼ satisfying the assumptions of the corollary can fail to be closed.

We say that a preorder ≼ on a metric space (S, d) is Lipschitz if for any x_0, y_0 ∈ S with (x_0, y_0) ∉ gr(≼) there is an isotone function u on S satisfying the Lipschitz condition

$$|u(x) - u(y)| \le d(x, y) \quad \text{for all } x, y \in S$$

and such that u(x_0) > u(y_0). The following result is a direct consequence of Theorem 5.5.9 if one sets ω(δ) = δ and takes into account Remark 5.5.10.

Corollary 5.5.13 Let ≼ be a preorder on a metric space (S, d). Set

$$c(x, y) = \begin{cases} 0, & \text{if } x \preceq y, \\ d(x, y), & \text{otherwise.} \end{cases}$$

The following assertions are equivalent:



(a) ≼ is a Lipschitz preorder.

(b) gr(≼) admits a representation (5.5.5) where every function u ∈ H satisfies the Lipschitz condition

$$|u(x) - u(y)| \le d(x, y) \quad \text{for all } x, y \in S. \tag{5.5.10}$$

(c) c∗(x, y) > 0 whenever (x, y) ∉ gr(≼), where c∗ stands for the reduced cost function associated with c.

If (S, d) is separable, then any of these equivalent conditions implies the existence of a utility function for ≼ that satisfies the Lipschitz condition (5.5.10).

Let (S, d) be a separable metric space with a bounded metric and F(S) the space of closed sets in S with the Hausdorff metric (see Hausdorff (1957))

$$d_H(A, B) := \max\bigl\{\inf\{\alpha > 0;\ A^\alpha \supset B\},\ \inf\{\alpha > 0;\ B^\alpha \supset A\}\bigr\}.$$

Here A^α is the open α-neighborhood of A ∈ F(S), α > 0, A^α := {x ∈ S; dist(x, A) < α}, where dist(x, A) := inf{d(x, y); y ∈ A} is the distance from the point x to the set A.

Theorem 5.5.14 There is a function ϕ on F(S) such that:

(a) |ϕ(A) − ϕ(B)| ≤ d_H(A, B);

(b) if A ⊆ B and A ≠ B, then ϕ(A) < ϕ(B);

(c) ϕ(A ∪ B) ≤ ϕ(A) + ϕ(B).

Proof: Since S is a separable metric space, there exists a countable family of sets A_n ∈ F(S), n ∈ ℕ, such that every A ∈ F(S) can be represented in the form

$$A = \bigcap_{n \in N(A)} A_n,$$

with N(A) := {n ∈ ℕ; A_n ⊇ A}. Take then

$$\varphi(A) := \sum_{n=1}^{\infty} \frac{1}{2^n} \inf\{\alpha > 0;\ A_n^\alpha \supset A\}, \qquad A \in F(S),$$

and let us show that the function ϕ has all required properties.


Let A, B ∈ F(S). If A_n^α ⊃ B, then A_n^{α + d_H(A,B)} ⊃ A, and hence

$$\inf\{\alpha > 0;\ A_n^\alpha \supset A\} \le \inf\{\alpha > 0;\ A_n^\alpha \supset B\} + d_H(A, B).$$

Interchanging A and B yields

$$\inf\{\alpha > 0;\ A_n^\alpha \supset B\} \le \inf\{\alpha > 0;\ A_n^\alpha \supset A\} + d_H(A, B).$$

Thus,

$$\bigl|\inf\{\alpha > 0;\ A_n^\alpha \supset A\} - \inf\{\alpha > 0;\ A_n^\alpha \supset B\}\bigr| \le d_H(A, B),$$

which implies (a). If A ⊆ B and A ≠ B, then

$$\inf\{\alpha > 0;\ A_n^\alpha \supset A\} \le \inf\{\alpha > 0;\ A_n^\alpha \supset B\}, \qquad n \in \mathbb{N},$$

and there exists n_0 ∈ N(A) \ N(B). We obtain

$$\inf\{\alpha > 0;\ A_{n_0}^\alpha \supset A\} = 0, \qquad \inf\{\alpha > 0;\ A_{n_0}^\alpha \supset B\} \ge \operatorname{dist}(x_0, A_{n_0}) > 0,$$

where x_0 is any element of B \ A_{n_0}. This implies (b).

Finally, if A_n^α ⊃ A and A_n^β ⊃ B, we have A_n^{α+β} ⊃ A_n^{max(α,β)} ⊃ A ∪ B. Hence,

$$\inf\{\alpha > 0;\ A_n^\alpha \supset A\} + \inf\{\beta > 0;\ A_n^\beta \supset B\} \ge \inf\{\gamma > 0;\ A_n^\gamma \supset A \cup B\},$$

which implies (c), thus completing the proof. □
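The construction of ϕ can be tried out on a finite metric space, where every nonempty subset is closed and the family (A_n) may simply list all of them. The following Python sketch makes exactly these assumptions (the point set and the enumeration order are hypothetical, and the empty set is left out) and checks properties (a), (b), (c) by brute force.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite metric space on the line; every nonempty subset is closed.
S = [0, 1, 3, 6]
CLOSED = [frozenset(c) for r in range(1, len(S) + 1) for c in combinations(S, r)]


def dist(x, A):
    """Distance from the point x to the finite set A."""
    return min(abs(x - a) for a in A)


def excess(A, B):
    """inf{alpha > 0 : the open alpha-neighbourhood of A contains B}, for finite sets."""
    return max(dist(b, A) for b in B)


def hausdorff(A, B):
    """Hausdorff distance d_H(A, B)."""
    return max(excess(A, B), excess(B, A))


def phi(A):
    """phi(A) = sum over n of 2^(-n) * excess(A_n, A), with A_n running over CLOSED."""
    return sum(Fraction(excess(An, A), 2 ** n) for n, An in enumerate(CLOSED, start=1))


# (a) phi is 1-Lipschitz for d_H; (b) strictly monotone; (c) subadditive on unions.
prop_a = all(abs(phi(A) - phi(B)) <= hausdorff(A, B) for A in CLOSED for B in CLOSED)
prop_b = all(phi(A) < phi(B) for A in CLOSED for B in CLOSED if A < B)
prop_c = all(phi(A | B) <= phi(A) + phi(B) for A in CLOSED for B in CLOSED)
print(prop_a, prop_b, prop_c)
```

For finite sets the infimum in the definition equals the largest distance from a point of the second set to the first set, which is what `excess` computes.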

Corollary 5.5.15 Let (S, d) be a separable metric space with a bounded metric. Suppose ≼ is a closed preorder on S such that the mapping a : S → F(S), a(x) := {y ∈ S; y ≼ x}, is continuous with respect to the Hausdorff metric d_H on F(S). Then the representation (5.5.5) holds with a countable set H = {u_n ; n ∈ ℕ} ⊂ C^b(S), where u_n(x) := inf{α > 0; A_n^α ⊃ a(x)}, n ∈ ℕ, and u_0(x) := ϕ(a(x)) is a continuous utility function for ≼. If, moreover,

$$d_H(a(x), a(y)) \le L\,d(x, y) \quad \text{for all } x, y \in S,$$

then all the u_n, n ∈ ℕ, and the utility function u_0 satisfy the Lipschitz condition with Lipschitz constant L.

This is an obvious consequence of Theorem 5.5.14.


5.5.4 The Case of a Locally Compact Space

Theorem 5.5.9 and Corollary 5.5.15 deal with two classes of (not necessarily linear) closed preorders on arbitrary separable metric spaces. In mathematical economics preorders are generally treated as consumer preference relations, and S is usually taken as a closed or open subset of a Euclidean space; see for example Duffie (1992). The following result shows that for such spaces the Debreu theorem is true for any closed preorder (that is, without the assumption that the preorder is linear).

Theorem 5.5.16 A closed preorder on a separable metrizable locally compact space has a continuous utility function.

Proof: Let S be a separable metrizable locally compact space, and let ≼ be a closed preorder on it. Since S is metrizable and separable, it has a countable base {G_n}_{n∈ℕ} of open sets. Furthermore, by assumption, S is locally compact, so we can assume without loss of generality that the closure of every G_n is compact. Thus, there exist compact sets F_n, n ∈ ℕ, such that

$$G_n \subset F_n \subset \operatorname{int} F_{n+1}, \qquad n \in \mathbb{N}. \tag{5.5.11}$$

We set F_1 := cl G_1, where cl stands for the closure. Suppose that compact sets F_1, . . . , F_n satisfying (5.5.11) have already been constructed. Consider the sets G_k that have common points with F_n. They form an open covering of F_n, and as F_n is compact, a finite subcovering G_{k_1}, . . . , G_{k_m} can be chosen. We define

$$F_{n+1} := \operatorname{cl}\Bigl(G_{n+1} \cup \bigcup_{j=1}^{m} G_{k_j}\Bigr).$$

Clearly, F_{n+1} is compact and (5.5.11) holds; therefore, the inductive construction of the sequence F_k can be continued. It follows from (5.5.11) that

$$\bigcup_{n \in \mathbb{N}} F_n = S. \tag{5.5.12}$$

Recalling (5.5.11) and (5.5.12), we conclude that every convergent sequence in S is contained in one of the sets F_n. This, combined with the metrizability of S, implies that a real-valued function on S is continuous if its restriction to each F_n is continuous.

Let H(≼) denote the cone of isotone functions in C(S). Given a point (x_0, y_0) ∉ gr(≼), let F_0 := {x_0, y_0} and set v(x_0) = 1 and v(y_0) = 0. Further, choose n_0 such that F_{n_0} ⊃ F_0, and apply Theorem 5.5.3. Then we obtain a function v_{n_0} ∈ C(F_{n_0}), isotone with respect to the restriction of ≼ to F_{n_0} and such that 0 ≤ v_{n_0} ≤ 1, v_{n_0}(x_0) = 1, and v_{n_0}(y_0) = 0. Furthermore, repeatedly applying Theorem 5.5.3, we find for each n ≥ n_0 a function v_{n+1} ∈ C(F_{n+1}) that coincides with v_n on F_n and is isotone with respect to the restriction of ≼ to F_{n+1}. Then a function u,

$$u(x) = v_n(x) \quad \text{for } x \in F_n,\ n = n_0, n_0 + 1, \ldots,$$

is defined such that u ∈ H(≼), u(x_0) = 1, and u(y_0) = 0. It follows that

$$\operatorname{gr}(\preceq) = \{(x, y);\ u(x) \le u(y) \ \text{for all } u \in H(\preceq)\}. \tag{5.5.13}$$

Next, consider on C(S) the topology τ of uniform convergence on the compact subsets F_n, n ∈ ℕ. Obviously, it is locally convex and metrizable, and from (5.5.11), (5.5.12), and the Urysohn lemma it follows that the space (C(S), τ) is separable. It has then a countable base of open sets, which implies that subsets in (C(S), τ) (and in particular, H(≼)) are separable too. Consequently, in H(≼) there exists a dense (with respect to τ) countable set {u_n}_{n∈ℕ}, and since for every x ∈ S the Dirac measure δ_x, ⟨u, δ_x⟩ = u(x) for all u ∈ C(S), is a continuous linear functional on (C(S), τ), it follows from (5.5.13) that

$$\operatorname{gr}(\preceq) = \{(x, y);\ u_n(x) \le u_n(y),\ n \in \mathbb{N}\}. \tag{5.5.14}$$

From (5.5.14) we derive that

$$u_0(x) := \sum_{n=1}^{\infty} \frac{u_n(x)}{2^n\,(1 + |u_n(x)|)}, \qquad x \in S,$$

is a utility function for ≼. (Indeed, for any a, b ∈ ℝ,

$$a \le b \iff \frac{a}{1 + |a|} \le \frac{b}{1 + |b|},$$

implying that

$$\operatorname{gr}(\preceq) = \left\{(x, y);\ \frac{u_n(x)}{1 + |u_n(x)|} \le \frac{u_n(y)}{1 + |u_n(y)|},\ n \in \mathbb{N}\right\},$$

and since u_n/(1 + |u_n|) ∈ C^b(S), n ∈ ℕ, it remains to apply Theorem 5.5.8.) The proof is complete. □
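The last step of the proof, passing from a countable representing family to a single utility function, can be illustrated on a finite example. The Python sketch below assumes a hypothetical preorder (the coordinatewise order on a small integer grid) represented by two isotone functions; it is meant only to show how the weighted sum of the squashed functions behaves, not to reproduce the general argument.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: the coordinatewise order on a small grid in Z^2,
# represented as in (5.5.14) by the two coordinate functions.
H = [lambda p: p[0], lambda p: p[1]]
GRID = list(product(range(-2, 3), repeat=2))


def leq(x, y):
    """x precedes y when u(x) <= u(y) for every u in the representing family H."""
    return all(u(x) <= u(y) for u in H)


def squash(t):
    """t -> t / (1 + |t|): strictly increasing, with values in (-1, 1)."""
    return Fraction(t) / (1 + abs(t))


def u0(x):
    """Weighted sum of the squashed functions, with weights 2^(-n)."""
    return sum(squash(u(x)) / 2 ** n for n, u in enumerate(H, start=1))


pairs = list(product(GRID, repeat=2))
isotone = all(u0(x) <= u0(y) for x, y in pairs if leq(x, y))
strict = all(u0(x) < u0(y) for x, y in pairs if leq(x, y) and not leq(y, x))
print(isotone, strict)
```

A utility function need not recover the preorder: the points (-2, 2) and (2, -2) are incomparable in this example, yet u0 assigns them different values and therefore ranks them.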

Remark 5.5.17 We have proved that the closed preorder ≼ is functionally closed (see representation (5.5.13)). This is the crucial point of the entire proof. The fact that ≼ is functionally closed is equivalent to the condition that for every (x_0, y_0) ∉ gr(≼) an isotone continuous function u on S exists with u(x_0) > u(y_0). The last property of a closed preorder on a separable metrizable locally compact space was first established in a different way by Auslander (1964, Theorem 4). In the case of compact S, a further proof of Theorem 5.5.16 may be found in Kiruta, Rubinov, and Yanovskaya (1980, Theorem 2.2.3). The question whether Theorem 5.5.16 remains true for any (not necessarily locally compact) separable metrizable space is still open.

5.5.5 Varying Preorders and a Universal Utility Theorem

We shall be interested in the following question. Given a family of closed preorders ≼_ω depending on a parameter ω in a regular way, when is there a continuous utility, that is, a jointly continuous real-valued function u(ω, x) such that u(ω, ·) is a utility function for ≼_ω for each ω? This question arises in various contexts in mathematical economics; see, for example, Hildenbrand (1970) and Kannai (1970). For a linear preorder ≼_ω some sufficient conditions for the existence of a continuous utility were obtained by Neufeind (1972), Mount and Reiter (1976), and Mas-Colell (1977). All these results are special consequences of the following theorem, due to Levin (1983b).

Theorem 5.5.18 Let Ω and S be metrizable topological spaces, and let S, in addition, be separable and locally compact. Suppose that a preorder ≼_ω is given on S for each ω ∈ Ω and that the set {(ω, x, y); x ≼_ω y} is closed in Ω × S × S. Then there exists a continuous utility u : Ω × S → [0, 1].

Proof: Take compact sets F_n, n ∈ ℕ, satisfying (5.5.11), (5.5.12) (see the proof of Theorem 5.5.16), and consider the space C(S) with the topology τ of uniform convergence on the sets F_n, n ∈ ℕ. The locally convex space (C(S), τ) is metrizable, complete, and separable; that is, it is a separable Fréchet space. For each ω ∈ Ω we denote by U(ω) the set of continuous functions ϕ : S → [0, 1] that are isotone with respect to ≼_ω. It is obvious that every U(ω) is a nonempty convex set.

Let us show that the set-valued mapping U, U : ω ↦ U(ω) ⊂ C(S), is lower semicontinuous; that is, given a convergent sequence ω_k → ω and a function ϕ ∈ U(ω), there exist ϕ_k ∈ U(ω_k) such that ϕ_k → ϕ in (C(S), τ). Suppose that ω_k → ω and ϕ ∈ U(ω). Consider the set-valued mapping U_n : Ω → 2^{C(F_n)}, where U_n(ω) is the set of continuous functions F_n → [0, 1] that are isotone with respect to the restriction of ≼_ω to F_n. Define the function c_n : Ω × F_n × F_n → ℝ ∪ {+∞} by

$$c_n(\omega, x, y) := \begin{cases} 0, & \text{if } x \preceq_\omega y, \\ +\infty, & \text{otherwise.} \end{cases}$$



Clearly, this function is lower semicontinuous on Ω × F_n × F_n. From the proof of Theorem 5.2.17 it follows that the set-valued mapping U_n is lower semicontinuous. Therefore, there exist functions v_{k,n} ∈ U_n(ω_k) such that

$$\lim_{k \to \infty} \max_{x \in F_n} |\varphi(x) - v_{k,n}(x)| = 0, \qquad n \in \mathbb{N}.$$

Further, applying Theorem 5.5.3 combined with (5.5.11), (5.5.12), we see that there exist functions ϕ_{k,n} ∈ U(ω_k) such that ϕ_{k,n}|F_n = v_{k,n} for all (k, n). Now choose a subsequence k_1 < k_2 < · · · < k_n < · · · such that

$$\max_{x \in F_n} |\varphi(x) - v_{k,n}(x)| \le \frac{1}{n} \quad \text{whenever } k \ge k_n.$$

Let ϕ_k(x) = ϕ_{k,n}(x), where n = n(k) is taken from the condition that k_n ≤ k < k_{n+1}. For n ≥ m and k_n ≤ k < k_{n+1} we have

$$\max_{x \in F_m} |\varphi(x) - \varphi_k(x)| = \max_{x \in F_m} |\varphi(x) - \varphi_{k,n}(x)| \le \max_{x \in F_n} |\varphi(x) - \varphi_{k,n}(x)| \le \frac{1}{n}.$$

Then ϕ_k ∈ U(ω_k), and ϕ_k converges to ϕ in (C(S), τ). The lower semicontinuity of U is thus established.

Now consider the set W := {ϕ ∈ C(S); ϕ(S) ⊆ (0, 1)}. Clearly, W is convex and open in (C(S), τ). Let {ϕ_i} be a countable τ-dense subset in W, and let G_{ij} = {ϕ ∈ W; r(ϕ, ϕ_i) < 1/j}, where

$$r(\varphi, \psi) := \sum_{n=1}^{\infty} \frac{1}{2^n} \max_{x \in F_n} |\varphi(x) - \psi(x)|.$$

Obviously, the sets G_{ij} are convex. The τ-continuity of the functions r(·, ϕ_i) on W implies that the G_{ij} are open in (C(S), τ). Then also, the sets U^{-1}(G_{ij}) = {ω ∈ Ω; U(ω) ∩ G_{ij} ≠ ∅} are open in Ω, as U is lower semicontinuous. Further, since in a metrizable space open sets are F_σ, for any i and j there are closed sets A_1^{ij} ⊂ A_2^{ij} ⊂ · · · such that

$$\bigcup_{k \in \mathbb{N}} A_k^{ij} = U^{-1}(G_{ij}).$$

Consider the set-valued mappings U_{ijk} : Ω → 2^{C(S)},

$$U_{ijk}(\omega) := \begin{cases} U(\omega) \cap \overline{G}_{ij}, & \text{if } \omega \in A_k^{ij}, \\ U(\omega), & \text{if } \omega \in \Omega \setminus A_k^{ij}, \end{cases}$$


where \overline{G}_{ij} stands for the closure of G_{ij} in (C(S), τ). The mappings U_{ijk} are convex-valued and lower semicontinuous, so by the well-known selection theorem of Michael, for any positive integers i, j, and k, there exists a continuous selection of U_{ijk}, f_{ijk} : Ω → (C(S), τ). (In Michael (1956) this selection theorem is stated and proved for set-valued mappings with values in a Banach space, but it is noted there that the proof carries over to the mappings with values in a Fréchet space. So in our case the application of the theorem is correct.) For each ω ∈ Ω the countable set {f_{ijk}(ω)}, where i, j, and k run independently through ℕ, is contained in U(ω) and is dense there. Let the f_{ijk} be renumbered arbitrarily by a single index ℓ ∈ ℕ, and let

$$u(\omega, x) = \sum_{\ell=1}^{\infty} \frac{1}{2^\ell}\,[f_\ell(\omega)](x).$$

We show that u is the desired utility function. Obviously, u(Ω × S) ⊆ [0, 1], u is continuous on Ω × S, and the function u(ω, ·) is isotone with respect to ≼_ω for each ω ∈ Ω. Suppose that ω ∈ Ω, x, y ∈ S, and x ≺_ω y. By Theorem 5.5.3 (together with (5.5.11), (5.5.12)), there is a function ϕ ∈ U(ω) such that ϕ(x) = 0 and ϕ(y) = 1. Then [f_ℓ(ω)](x) < [f_ℓ(ω)](y) for some ℓ (as the set {f_ℓ(ω)}_{ℓ∈ℕ} is dense in U(ω)). Hence, u(ω, x) < u(ω, y); that is, u(ω, ·) is a utility function for ≼_ω. The proof is now complete. □

Note that in the special case of Ω sharing the same topological properties as S, Theorem 5.5.18 can be proved without use of Michael's theorem. Let Ω be a separable locally compact metrizable space. Consider the topological product S′ = Ω × S and define a preorder ≼′ on S′ by

$$(\omega_1, x_1) \preceq' (\omega_2, x_2) \iff \omega_1 = \omega_2,\ x_1 \preceq_{\omega_1} x_2.$$

Obviously, S′ is separable, metrizable, and locally compact. Furthermore, ≼′ is a closed preorder on S′. By Theorem 5.5.16, the preorder ≼′ has a continuous utility function u : Ω × S → [0, 1]. Since (ω_1, x_1) ≺′ (ω_2, x_2) ⇔ ω_1 = ω_2, x_1 ≺_{ω_1} x_2, u is the desired function.

Corollary 5.5.19 Assume that Ω and S are as in Theorem 5.5.18, a preorder ≼_ω is given on a nonempty set Φ(ω) ⊆ S for each ω ∈ Ω, and the set

$$\Phi := \{(\omega, x);\ x \in \Phi(\omega)\}$$

is closed in Ω × S. Consider the following assertions:



(a) The set M := {(ω, x, y); x, y ∈ Φ(ω), x ≼_ω y} is closed in Ω × S × S.

(b) There exists a continuous function u : Φ → [0, 1] such that u(ω, ·) is a utility function for ≼_ω for each ω ∈ Ω.

Then (a) ⇒ (b), and if all ≼_ω are linear preorders, then the two assertions are equivalent.

Proof: The implication (a) ⇒ (b) follows from Theorem 5.5.18, applied to the preorders ≼′_ω on S given by

$$x \preceq'_\omega y \iff [x, y \in \Phi(\omega),\ x \preceq_\omega y \ \text{ or } \ x = y].$$

(b) ⇒ (a): If u ∈ C(Φ), and u(ω, ·) is a utility function for the linear preorder ≼_ω for all ω ∈ Ω, then

$$M = \{(\omega, x, y);\ (\omega, x) \in \Phi,\ (\omega, y) \in \Phi,\ u(\omega, x) \le u(\omega, y)\},$$

which implies that M is closed. □

Remark 5.5.20 Corollary 5.5.19 proves to be useful in studying the stability of competitive equilibria models; see Lucchetti and Patrone (1986, Theorem 6.1).

Given a separable metrizable locally compact space S, let P be the set of all closed preorders on S. Identifying a preorder ≼ ∈ P with its graph in S × S, we consider on P the topology t induced by the exponential topology on the space of closed subsets of the one-point compactification of S × S (see Kuratowski (1966, vol. 1) for the definition and properties of the exponential topology). Obviously, (P, t) is a metrizable space. Setting Ω = (P, t), we obtain the following universal utility theorem:

Corollary 5.5.21 (Universal utility theorem) There exists a continuous function u : (P, t) × S → [0, 1] such that u(≼, ·) is a utility function for ≼ for each ≼ ∈ P.

Remark 5.5.22 Corollary 5.5.21 generalizes a similar result of Mas-Colell (1977), where a subset P′ ⊂ P consisting of linear preorders is considered. The proof in Mas-Colell (1977) does not carry over to the whole set P.

Remark 5.5.23 Here we do not consider the measurable utility theorems, in which (Ω, T) is assumed to be a measurable space and the existence of a function u : Ω × S → [0, 1] is asserted such that u is measurable in some sense and u(ω, ·) is a continuous utility function for ≼_ω for each ω ∈ Ω. In this connection, see Aumann (1967), Wesley (1976), Wieczorek (1980), and Levin (1981, 1983a).


5.5.6 Functionally Closed Preorders and Strong Stochastic Dominance

In this section, based on papers by Levin (1985b, 1986, 1990), we examine a relationship between functionally closed and stochastic preorders. Recall that a preorder ≼ on a completely regular topological space S is called functionally closed if a representation

$$\operatorname{gr}(\preceq) = \{(x, y);\ u(x) \le u(y) \ \text{for all } u \in H\} \tag{5.5.15}$$

holds, where H is some nonempty subset in C^b(S) (see Definition 5.5.1). It is clear that a functionally closed preorder admits the representation (5.5.15) with H = H^b(≼). Here H^b(≼) denotes the set of all functions in C^b(S) that are isotone with respect to ≼.

Let M(S) be the set of probability measures in V_+(S). To a given closed preorder ≼ on S, we associate a preorder ≼∗ on M(S) defined by

$$\sigma_1 \preceq^* \sigma_2 \iff \sigma_1(u) \le \sigma_2(u) \quad \forall u \in H^b(\preceq),$$

with σ(u) := ∫_S u(x) σ(dx) for all σ ∈ M(S) and u ∈ C^b(S).

The preorder ≼∗ is called the strong stochastic dominance. If (S, ≼) is ℝ (or a segment in ℝ) with the natural order, then ≼∗ is identical with the usual stochastic dominance ≼_SD,

$$\sigma_1 \preceq_{SD} \sigma_2 \iff \sigma_1\{y;\ y \preceq x\} \ge \sigma_2\{y;\ y \preceq x\} \quad \forall x \in S.$$

The stochastic dominance ≼_SD arises in a natural way in problems connected with measuring difference in income (see Marshall and Olkin (1979)) and in studies on rational behavior under risk (see, for example, Kiruta, Rubinov, and Yanovskaya (1980), Duffie (1992)). However, for S = ℝ² (with the natural order), ≼∗ does not coincide with ≼_SD but is a strictly stronger preorder. In the case when ≼ is a preorder on a finite set, the strong stochastic dominance order also arises in physics applications related to magnetic fields and to probability distributions on graphs; see, for example, Preston (1974). The following treatment of strong stochastic dominance is alternative to that in Section 3.5 and is due to Levin (1985b, 1986, 1990).

Theorem 5.5.24 Suppose that S ∈ L (see Definition 4.5.2) and ≼ is a closed preorder on S.

(I) If ≼ is functionally closed and σ_1, σ_2 ∈ M(S), then the following assertions are equivalent:

(a) σ_1 ≼∗ σ_2.



(b) There exists a measure µ ∈ M(S × S) such that π_1 µ = σ_1, π_2 µ = σ_2, and supp µ ⊆ gr(≼).

(II) If assertions (a) and (b) are equivalent for all σ_1, σ_2 ∈ M(S), then ≼ is functionally closed.

Proof: (I) The implication (b) ⇒ (a) is obvious. To prove (a) ⇒ (b), consider the mass transfer problem with a cost function

$$c(x, y) = \begin{cases} 0, & \text{if } x \preceq y, \\ +\infty, & \text{otherwise.} \end{cases}$$

Since ≼ is functionally closed, c can be represented in the form

$$c(x, y) = \sup\{u(x) - u(y);\ u \in H\},$$

where H ⊆ C^b(S) is as in (5.5.15). Then, by Theorem 4.5.3, A(c, σ_1 − σ_2) = B(c, σ_1 − σ_2), and there exists a measure µ ∈ M(S × S) such that π_1 µ = σ_1, π_2 µ = σ_2, and c(µ) = A(c, σ_1 − σ_2). To complete the proof, it remains to note that the relation σ_1 ≼∗ σ_2 can be rewritten as B(c, σ_1 − σ_2) = 0, while the equality c(µ) = 0 means that supp µ ⊆ gr(≼).

(II) If (a) and (b) are equivalent, then by taking the Dirac measures σ_1 = δ_x and σ_2 = δ_y for all x, y ∈ S, we obtain x ≼ y ⇔ δ_x ≼∗ δ_y; that is,

$$x \preceq y \iff u(x) \le u(y) \quad \text{for all } u \in H^b(\preceq).$$

This completes the proof. □
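For the natural order on a finite chain, where the text identifies ≼∗ with the usual stochastic dominance, the coupling in assertion (b) can be written down explicitly. The Python sketch below uses the greedy "north-west corner" (monotone) coupling, which is a standard device and not the duality argument of the proof; the two distributions are hypothetical.

```python
from fractions import Fraction as Fr

# Hypothetical distributions on the chain 0 < 1 < 2 < 3.
SIGMA1 = [Fr(1, 2), Fr(1, 4), Fr(1, 4), Fr(0)]
SIGMA2 = [Fr(1, 4), Fr(1, 4), Fr(1, 4), Fr(1, 4)]


def dominated(s1, s2):
    """sigma1 <=_SD sigma2: each lower set {y <= x} carries at least as much sigma1-mass."""
    c1 = c2 = Fr(0)
    for a, b in zip(s1, s2):
        c1 += a
        c2 += b
        if c1 < c2:
            return False
    return True


def monotone_coupling(s1, s2):
    """Greedy coupling: pair the lowest remaining mass of s1 with that of s2."""
    a, b = list(s1), list(s2)
    mu, i, j = {}, 0, 0
    while i < len(a) and j < len(b):
        m = min(a[i], b[j])
        if m > 0:
            mu[i, j] = mu.get((i, j), Fr(0)) + m
        a[i] -= m
        b[j] -= m
        if a[i] == 0:
            i += 1
        if b[j] == 0:
            j += 1
    return mu


mu = monotone_coupling(SIGMA1, SIGMA2)
n = len(SIGMA1)
marginal1 = [sum(m for (i, _), m in mu.items() if i == k) for k in range(n)]
marginal2 = [sum(m for (_, j), m in mu.items() if j == k) for k in range(n)]
support_in_graph = all(i <= j for (i, j) in mu)
print(dominated(SIGMA1, SIGMA2), support_in_graph)
```

When the dominance condition fails, the same greedy coupling still has the right marginals but puts mass on pairs with i > j, so the support check is what distinguishes the two cases.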

In connection with the definition of a functionally closed preorder (see Definition 5.5.1), it is of interest to note that the equivalence

$$x \preceq y \iff u(x) \le u(y) \quad \forall u \in H^b(\preceq, \mathbb{B}(S))$$

holds for all closed preorders ≼. Here H^b(≼, 𝔹(S)) denotes the cone of bounded isotone Borel functions on S. Indeed, the implication ⇒ is obvious, and in order to obtain ⇐, it suffices to consider the functions u_z ∈ H^b(≼, 𝔹(S)), z ∈ S, where

$$u_z(x) = \begin{cases} 0, & \text{if } x \preceq z, \\ 1, & \text{otherwise.} \end{cases}$$

Moreover, the following theorem is true.

Theorem 5.5.25 Suppose that S ∈ L, ≼ is a closed preorder on S, and σ_1, σ_2 ∈ M(S). Then the following assertions are equivalent:

(a) σ_1(u) ≤ σ_2(u) for all u ∈ H^b(≼, 𝔹(S)).

(b) There exists a measure µ ∈ M(S × S) such that π_1 µ = σ_1, π_2 µ = σ_2, and supp µ ⊆ gr(≼).

Proof: Here, only (a) ⇒ (b) is nontrivial. To prove it, we apply Theorem 5.4.1 with F = gr(≼) and ε = 0. In view of the inner regularity of σ_1 and σ_2, it suffices to verify that (5.4.2) holds for all compact sets A and B in S with (A × B) ∩ gr(≼) = ∅. It follows from the compactness of A that the set A_1 := {x; there exists y ∈ A with y ≼ x} is closed; hence 1_{A_1} ∈ H^b(≼, 𝔹(S)), and in view of (a), σ_1(1_{A_1}) − σ_2(1_{A_1}) ≤ 0. Further, it is clear that (A_1 × B) ∩ gr(≼) = ∅, whence B ⊆ S \ A_1. We obtain

$$\sigma_1(A) + \sigma_2(B) \le \sigma_1(A_1) + \sigma_2(S \setminus A_1) = \sigma_1(1_{A_1}) - \sigma_2(1_{A_1}) + 1 \le 1.$$

The proof is complete. □

It follows from Theorems 5.5.24 and 5.5.25 that their assertions (a) are equivalent, provided that ' is functionally closed. This can directly be seen as well if one uses Lusin’s C-property, the inner regularity of all measures under consideration, and the extension theorem for a functionally closed preorder (see Theorem 5.5.2, (c)). Remark 5.5.26 Theorem 5.5.25 extends a similar result in Strassen (1965), see also one in Kamae, Krengel, and O’Brien (1977) for the case that S is a Polish space. See also Marshall and Olkin (1979) for the case of S = IRn and Preston (1974) for the case of a finite S. An alternative extension is given in Section 3.5 of this book. We conclude this section by an example of a closed preorder that is not functionally closed.



Example 5.5.27 Let S be a completely regular topological space that is not normal. Then there exist two disjoint closed subsets F_1 and F_0 in S such that there is no continuous function separating the subsets. It follows that the formula gr(≼) := (F_0 × S) ∪ ((S \ F_0) × F_1) determines a closed preorder ≼ on S that is not functionally closed. Clearly, in this case the function on S × S,

$$c(x, y) = \begin{cases} 0, & \text{if } x \preceq y, \\ +\infty, & \text{otherwise,} \end{cases}$$

is lower semicontinuous and satisfies the triangle inequality but cannot be represented in the form c(x, y) = sup{u(x) − u(y); u ∈ H} with H ⊆ C^b(S).

5.6 Further Applications to Utility Theory

The aim of this section is to study preferences that admit continuous or even Lipschitz continuous utility functions and similarly to investigate the existence of regular choice functions. The results in this section are due to Levin (1991).

5.6.1 Preferences That Admit Lipschitz or Continuous Utility Functions

Let S be a separable metrizable space. Any set-valued mapping R : S → 2^S can be treated as a preference on S; y is said to be preferred to x if and only if y ∈ R(x). A preorder ≼ induces the preference R(x) = {y; x ≼ y}, but for general R the corresponding binary relation need not be transitive. Rather, general preferences (nontotal and nontransitive) are widely used in various parts of mathematical economics; see, for example, Gale and Mas-Colell (1975), Shafer and Sonnenschein (1975), Kim and Richter (1986). Nontransitive preferences arise in a natural way in decision theory, which may be illustrated by the following example.


Example 5.6.1 Given are m experts and a space S of alternatives. Let ≼_i be a linear preorder on S describing the preference of the ith expert, i = 1, . . . , m. Define on S a preference R by

$$y \in R(x) \iff \#\{i;\ x \preceq_i y\} > \frac{m}{2},$$

where # denotes the number of elements in the set. In other words, y is preferred to x if and only if a majority of experts considers y to be not worse than x. Clearly, in general, this preference is not transitive.

Given a preference R, one can associate with it a strict preference P as follows:

$$y \in P(x) \iff y \in R(x),\ x \notin R(y).$$

A real-valued function u on S is called a utility function for R if

$$u(y) \ge u(x) \quad \text{whenever } y \in R(x) \tag{5.6.1}$$

and

$$u(y) > u(x) \quad \text{whenever } y \in P(x). \tag{5.6.2}$$

A real-valued function u on S satisfying (5.6.1) is said to be R-isotone. Let d be a metric on S consistent with the given topology. Recall that a real-valued function u on S is said to be d-Lipschitz if |u(x) − u(y)| ≤ d(x, y) for all x, y ∈ S.

Our goal is to characterize preferences on S that admit d-Lipschitz utility functions. Also, we will characterize preferences with continuous utility functions. First of all, note that a real-valued function u on S is both R-isotone and d-Lipschitz if and only if u ∈ Lip(c, S; C(S)), where the cost function c is given by

$$c(x, y) := \begin{cases} 0, & \text{if } y \in R(x), \\ d(x, y), & \text{otherwise.} \end{cases} \tag{5.6.3}$$

Next, if Lip(c, S; C(S)) is nonempty, then we consider on S a functionally closed preorder ≼ defined by

$$x \preceq y \iff u(x) \le u(y) \quad \text{for all } u \in \operatorname{Lip}(c, S; C(S)). \tag{5.6.4}$$

Since all u ∈ Lip(c, S; C(S)) are R-isotone, we have

$$y \in R(x) \Rightarrow x \preceq y. \tag{5.6.5}$$



Our plan is to establish the equivalence x ' y ⇔ c∗ (x, y) = 0,

(5.6.6)

and to apply Theorem 5.5.8 in order to derive the existence of a d-Lipschitz utility function for '. This utility function will be a utility function for R as well, provided that R satisfies the condition y ∈ P (x) ⇒ x ≺ y.

(5.6.7)

Condition (5.6.7) can be formulated in terms of R and d, and we shall see that its validity is not only sufficient but also necessary for the existence of a d-Lipschitz utility function for R. A criterion for the existence of a continuous utility function for R will be derived from this by choosing an appropriate metric d. Further on we shall follow this plan step by step. Suppose R is a preference on S. We denote by T (x, y) the set of all chains τ = (z0 → z1 → · · · → zn ), where zi are elements of S, z0 = x, zn = y, n ∈ IN. A chain τ is said to be R-nondecreasing if for every i ∈ {1, . . . , n}, either zi ∈ R(zi−1 ) or zi = zi−1 . (We don’t assume that x ∈ R(x).) Given a metric d consistent with the topology of S and a chain τ ∈ T (x, y), we define the d-valuation of τ with respect to R by Vd,R (τ ) :=

n

c(zi−1 , zi ),

i=0

where the cost function c is given by (5.6.3). In other words, Vd,R(τ) is the total sum to be paid for moving along τ, provided that (i) the transition from z to z′ is free when z′ is preferred to z; and (ii) the payment for the transition equals the distance between z and z′ otherwise. Clearly, τ is R-nondecreasing if and only if Vd,R(τ) = 0. For an arbitrary τ, Vd,R(τ) can thus be regarded as the penalty for violating the R-nondecreasing requirement.

An obvious necessary condition for the existence of a utility function for R is as follows: If x ∈ S and y ∈ P(x), then it is impossible to find an R-nondecreasing chain τ ∈ T(y, x). By using the notion of d-valuation, this condition can be reformulated as an implication:
\[ [x \in S,\ y \in P(x)] \;\Rightarrow\; [V_{d,R}(\tau) > 0 \ \text{for every } \tau \in T(y, x)]. \]
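On a finite set the d-valuation can be evaluated directly. The following Python sketch is illustrative only. The cost function is written out from the verbal description above (a step z → z′ is free when z′ ∈ R(z) or z′ = z, and costs d(z, z′) otherwise), which is assumed here to agree with (5.6.3); the names `cost` and `valuation` and the toy preference `R` are hypothetical.

```python
def cost(z, z_next, R, d):
    # Assumed form of (5.6.3): a step is free if it stays put or moves to a
    # preferred point; otherwise it costs the distance travelled.
    if z_next == z or z_next in R.get(z, set()):
        return 0.0
    return float(d(z, z_next))


def valuation(chain, R, d):
    # V_{d,R}(tau): sum of the step costs along the chain z0 -> z1 -> ... -> zn.
    return sum(cost(a, b, R, d) for a, b in zip(chain, chain[1:]))


# Hypothetical toy data: S = {0, 1, 2} on the real line with d(x, y) = |x - y|,
# and the preference R given by 1 in R(0), 2 in R(1).
R = {0: {1}, 1: {2}, 2: set()}
d = lambda x, y: abs(x - y)

v_up = valuation([0, 1, 2], R, d)    # R-nondecreasing chain
v_down = valuation([2, 1, 0], R, d)  # each step violates R and is charged
```

Here `v_up` vanishes because the chain is R-nondecreasing, while `v_down` equals the total distance travelled against the preference.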



The example of the standard lexicographical order on IR^2 shows that the above condition is not sufficient for the existence of a utility function. We shall see that the fulfillment of the implication
\[ [x \in S,\ y \in P(x)] \;\Rightarrow\; \inf_{\tau \in T(y,x)} V_{d,R}(\tau) > 0 \]
is both necessary and sufficient for the existence of a d-Lipschitz utility function for R. Before formulating the theorem, we require some lemmas.

Let TR(x, y) denote the subset of T(x, y) consisting of chains having the form
\[ \tau = (y_0 \to x_1 \to y_1 \to x_2 \to \cdots \to x_n \to y_n \to x_{n+1}), \tag{5.6.8} \]
where R(xk) ∋ yk, k = 1, . . . , n, y0 = x, xn+1 = y, n ∈ IN. Such chains will be called regular with respect to R or, briefly, R-regular. Define for any R-regular τ of the form (5.6.8)
\[ V'_{d,R}(\tau) := \sum_{i=0}^{n} d(y_i, x_{i+1}). \]

Lemma 5.6.2 The equality
\[ \inf_{\tau \in T(x,y)} V_{d,R}(\tau) = \inf_{\tau \in T_R(x,y)} V'_{d,R}(\tau) \]
holds for all x, y ∈ S.

Proof: It is clear that TR(x, y) ⊂ T(x, y) and V′d,R(τ) ≥ Vd,R(τ) for any τ ∈ TR(x, y). Consequently,
\[ \inf_{\tau \in T(x,y)} V_{d,R}(\tau) \le \inf_{\tau \in T_R(x,y)} V'_{d,R}(\tau). \]

In order to prove the opposite inequality, it suffices, for an arbitrary chain
\[ \tau = (z_0 \to z_1 \to \cdots \to z_n) \in T(x, y) \qquad (z_0 = x,\ z_n = y), \]
to find a chain τ′ ∈ TR(x, y) such that V′d,R(τ′) ≤ Vd,R(τ). If τ contains a fragment zk−1 → zk → zk+1 with zk ∈ R(zk−1), zk+1 ∈ R(zk), we delete zk from τ and obtain a new chain
\[ \tau_1 = (z_0 \to \cdots \to z_{k-1} \to z_{k+1} \to \cdots \to z_n) \]
with Vd,R(τ1) ≤ Vd,R(τ). If τ1 contains a similar fragment, we repeat the above procedure and get a shorter chain τ2 with Vd,R(τ2) ≤ Vd,R(τ1). After a sequence of such iterations we reach a chain
\[ \tau^* = (z_0^* \to z_1^* \to \cdots \to z_m^*) \in T(x, y) \qquad (z_0^* = x,\ z_m^* = y,\ m \le n) \]
with
\[ V_{d,R}(\tau^*) \le V_{d,R}(\tau). \tag{5.6.9} \]
Moreover, for
\[ J := \{ j;\ z_{j+1}^* \in R(z_j^*) \}, \tag{5.6.10} \]
the following condition is satisfied:
\[ j \in J \;\Rightarrow\; j + 1 \notin J. \tag{5.6.11} \]
We can now define the desired chain
\[ \tau' = (y_0 \to x_1 \to y_1 \to \cdots \to x_{\ell-1} \to y_{\ell-1} \to x_\ell) \in T_R(x, y). \]
This chain is constructed step by step from elements of τ∗ according to the following rules:

(i) Each yk−1, k = 1, . . . , ℓ, coincides with one of the z∗_j.
(ii) In the case yk−1 = z∗_j, one takes xk = z∗_j if j ∈ J and xk = z∗_{j+1} if j ∉ J.
(iii) If xk = z∗_j, then yk = z∗_{j+1}.

The chain τ′ is completely determined by these rules and the initial information that y0 = z∗_0 = x. In view of (5.6.10) and (5.6.11), we have that τ′ ∈ TR(x, y) and
\[ V'_{d,R}(\tau') = \sum_{j \notin J} d(z_j^*, z_{j+1}^*) = V_{d,R}(\tau^*). \tag{5.6.12} \]
Now the result follows from (5.6.9) and (5.6.12). □

Lemma 5.6.3 For all x, y ∈ S and the cost function (5.6.3), the equality
\[ c_*(x, y) = \inf_{\tau \in T(x,y)} V_{d,R}(\tau) \tag{5.6.13} \]
holds.

Proof: This follows immediately from the definitions of the reduced cost function and the d-valuation. □

Lemma 5.6.4 The reduced cost function c∗ is nonnegative and continuous on S × S.



Proof: It is obvious that c∗ is nonnegative. It also satisfies the triangle inequality, and since c∗(x, y) ≤ d(x, y) for all x, y ∈ S, the reduced cost c∗ is continuous on S × S (compare the proof of Theorem 5.5.9). □

Lemma 5.6.5 Given a pair of points x0, y0 ∈ S with y0 ∈ P(x0), the following assertions are equivalent:

(a) There exists an R-isotone d-Lipschitz function u : S → IR such that u(y0) > u(x0).
(b) \( \inf_{\tau \in T(y_0, x_0)} V_{d,R}(\tau) > 0. \)

Proof: (a) ⇒ (b): We have u ∈ Lip(c, S; C(S)) = Lip(c, S; IR^S). Hence u ∈ Lip(c∗, S; C(S)), which, combined with (5.6.13), yields
\[ \inf_{\tau \in T(y_0,x_0)} V_{d,R}(\tau) = c_*(y_0, x_0) \ge u(y_0) - u(x_0) > 0. \]
(b) ⇒ (a): Take u(x) := c∗(x, x0), x ∈ S. Clearly, u ∈ Lip(c∗, S; C(S)) = Lip(c, S; C(S)), hence u is R-isotone and d-Lipschitz. Since u(x0) = c∗(x0, x0) = 0, taking into account (5.6.13), we obtain
\[ u(y_0) - u(x_0) = c_*(y_0, x_0) = \inf_{\tau \in T(y_0,x_0)} V_{d,R}(\tau) > 0. \]
□
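When S is finite, the infimum in (5.6.13) is a shortest-path value, so the reduced cost of Lemmas 5.6.3 to 5.6.5 can be tabulated. The sketch below is only an illustration under that finiteness assumption: it uses the Floyd-Warshall relaxation, the one-step cost is the assumed form of (5.6.3) (free when y ∈ R(x) or y = x, equal to d(x, y) otherwise), and the function name `reduced_cost` and the toy data are hypothetical.

```python
from itertools import product


def reduced_cost(S, R, d):
    # Start from the one-step cost (assumed form of (5.6.3)) ...
    c = {(x, y): 0.0 if (x == y or y in R.get(x, set())) else float(d(x, y))
         for x, y in product(S, S)}
    # ... and relax over intermediate points (Floyd-Warshall). itertools.product
    # varies its first factor slowest, so k is the outermost loop as required.
    for k, i, j in product(S, S, S):
        if c[i, k] + c[k, j] < c[i, j]:
            c[i, j] = c[i, k] + c[k, j]
    return c


S = [0, 1, 2, 3]
R = {0: {1}, 1: {2}}            # hypothetical preference: 1 in R(0), 2 in R(1)
d = lambda x, y: abs(x - y)
c_star = reduced_cost(S, R, d)
```

In this toy example `c_star[0, 2]` is zero because the chain 0 → 1 → 2 is R-nondecreasing, whereas `c_star[1, 0]` is positive, in line with condition (b) of Lemma 5.6.5 for x0 = 0, y0 = 1.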

Theorem 5.6.6 The following assertions are equivalent:

1°. R admits a d-Lipschitz utility function.
2°. \( \inf_{\tau \in T(y,x)} V_{d,R}(\tau) > 0 \) whenever y ∈ P(x).
3°. \( \inf_{\tau \in T_R(y,x)} V'_{d,R}(\tau) > 0 \) whenever y ∈ P(x).

Proof: The equivalence of 2° and 3° follows from Lemma 5.6.2, and 1° ⇒ 2° follows from the implication (a) ⇒ (b) of Lemma 5.6.5, so only 2° ⇒ 1° requires a proof. Taking into account that Lip(c, S; C(S)) = Lip(c, S; IR^S) is the set of all R-isotone d-Lipschitz functions on S, and applying Lemma 5.6.4 and Theorem 4.6.17, I, we get that for all x, y ∈ S,
\[ c_*(x, y) = \sup\{ u(x) - u(y);\ u \in \mathrm{Lip}(c, S; C(S)) \}. \tag{5.6.14} \]


Since any constant belongs to Lip(c, S; C(S)), it follows from (5.6.14) that for the preorder ≼ defined by (5.6.4), the equivalence (5.6.6) holds. Also satisfied are the implication (5.6.5), since each function in Lip(c, S; C(S)) is R-isotone, and the implication (5.6.7), since our assumption 2° (in view of (5.6.13)) can be rewritten in the form of (5.6.7). Now assume that the sequence (xj) is dense in S. Consider the functions
\[ u_j(x) := \frac{c_*(x, x_j)}{1 + c_*(x, x_j)}, \qquad j = 1, 2, \ldots, \]
on S, and let us show that
\[ \mathrm{gr}(\preceq) = \{ (x, y);\ u_j(x) \le u_j(y),\ j \in \mathbb{N} \}. \tag{5.6.15} \]

If (x, y) ∈ gr(≼), then c∗(x, y) = 0. Next, using the triangle inequality for c∗, we obtain that for every j ∈ IN,
\[ u_j(x) - u_j(y) = \frac{c_*(x, x_j) - c_*(y, x_j)}{(1 + c_*(x, x_j))(1 + c_*(y, x_j))} \le \frac{c_*(x, y)}{(1 + c_*(x, x_j))(1 + c_*(y, x_j))} = 0. \]
If (x, y) ∉ gr(≼), then c∗(x, y) > 0. Taking into account that c∗ is continuous on S × S and that (xj) is dense in S, we choose a convergent subsequence x_{jn} → y and, since c∗(y, y) = 0, get in the limit
\[ \lim_{n \to \infty} [u_{j_n}(x) - u_{j_n}(y)] = \frac{c_*(x, y)}{1 + c_*(x, y)} > 0. \]

Therefore, u_{jn}(x) > u_{jn}(y) for large n, and representation (5.6.15) is thus established. Note that all uj belong to C^b(S), so Theorem 5.5.8 is applicable, and there exists a continuous utility function for ≼. A possible choice is
\[ u_0(x) := \sum_{j=1}^{\infty} \frac{1}{2^j} u_j(x), \qquad x \in S. \]
In view of (5.6.5) and (5.6.7), u0 is a utility function for R as well. Since u0 is obviously d-Lipschitz, the proof is complete. □

Remark 5.6.7 A somewhat different proof of 2° ⇒ 1° may be found in Levin (1991), where the function f(x, y) = min(c∗(x, y), 1) was used in constructing uj instead of c∗(x, y)/(1 + c∗(x, y)).
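The construction of u0 in the proof of Theorem 5.6.6 can be traced on a finite set, where S itself serves as the dense sequence (xj) and the series becomes a finite sum. The Python sketch below is a hedged illustration, not the authors' procedure: the one-step cost is the assumed form of (5.6.3), P(x) is taken to mean the points y ∈ R(x) with x ∉ R(y), and the names `reduced_cost`, `utility` and the toy data are hypothetical.

```python
from itertools import product


def reduced_cost(S, R, d):
    # One-step cost (assumed form of (5.6.3)) followed by Floyd-Warshall.
    c = {(x, y): 0.0 if (x == y or y in R.get(x, set())) else float(d(x, y))
         for x, y in product(S, S)}
    for k, i, j in product(S, S, S):   # k is the outermost index
        if c[i, k] + c[k, j] < c[i, j]:
            c[i, j] = c[i, k] + c[k, j]
    return c


def utility(S, R, d):
    c_star = reduced_cost(S, R, d)
    u = {}
    for x in S:
        # u_j(x) = c*(x, x_j) / (1 + c*(x, x_j)), weighted by 2^{-j}, j = 1, 2, ...
        u[x] = sum(c_star[x, xj] / (1.0 + c_star[x, xj]) / 2 ** j
                   for j, xj in enumerate(S, start=1))
    return u


S = [0, 1, 2, 3]
R = {0: {1}, 1: {2}}            # hypothetical preference: 1 in R(0), 2 in R(1)
d = lambda x, y: abs(x - y)
u0 = utility(S, R, d)
```

For this toy preference, condition 2° holds (c∗(1, 0) and c∗(2, 1) are positive), and the resulting u0 increases strictly along 0, 1, 2 while changing by at most d(x, y) between any two points.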



Remark 5.6.8 Theorem 5.6.6 can be regarded as an extension of Corollary 5.5.13 to preferences that are more general than preorders. A similar extension of Theorem 5.5.9 is also valid.

We next characterize preferences that admit continuous utility functions. Observe that for any a, b ∈ IR,
\[ a \le b \iff \frac{a}{1 + |a|} \le \frac{b}{1 + |b|}. \]
Therefore, for each metric d on S, the function
\[ d'(x, y) := \frac{d(x, y)}{1 + d(x, y)} \]
is a bounded metric, and both metrics, d and d′, determine the same topology on S. Also, for any R-isotone function u, the function
\[ u'(x) := \frac{u(x)}{1 + |u(x)|} \]
is R-isotone and bounded, and if u is d-Lipschitz, then u′ is d′-Lipschitz. Next, if d is a bounded metric and u is a continuous bounded function, then d1(x, y) := d(x, y) + |u(x) − u(y)| is a bounded metric determining on S the same topology as d, and moreover, u is d1-Lipschitz. The following result is an immediate consequence of Theorem 5.6.6 and the above observations:

Corollary 5.6.9 Let D denote the set of all bounded metrics consistent with the topology of S. The following assertions are equivalent:

1°. R admits a continuous utility function.
2°. There exists a metric d ∈ D for which statement 2° of Theorem 5.6.6 is true.
3°. There exists a metric d ∈ D for which statement 3° of Theorem 5.6.6 is true.
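The two elementary observations used before Corollary 5.6.9 (that a ↦ a/(1 + |a|) preserves order and that d/(1 + d) is again a metric, now bounded) are easy to spot-check numerically. The following Python sketch does so on a handful of sample points; it is a sanity check rather than a proof, and the names `bounded_metric`, `squash` and the sample data are hypothetical.

```python
from itertools import product


def bounded_metric(d):
    # d'(x, y) = d(x, y) / (1 + d(x, y)) takes values in [0, 1).
    return lambda x, y: d(x, y) / (1.0 + d(x, y))


def squash(a):
    # a -> a / (1 + |a|) is strictly increasing, so it preserves order.
    return a / (1.0 + abs(a))


d = lambda x, y: abs(x - y)
d_prime = bounded_metric(d)
points = [-3.0, -0.5, 0.0, 1.0, 7.5]   # hypothetical sample points

# Triangle inequality for d' (with a small tolerance for rounding).
triangle_ok = all(d_prime(x, z) <= d_prime(x, y) + d_prime(y, z) + 1e-12
                  for x, y, z in product(points, repeat=3))
# Order is preserved in both directions.
order_ok = all((a <= b) == (squash(a) <= squash(b))
               for a, b in product(points, repeat=2))
```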



5.6.2 Applications to Choice Theory in Mathematical Economics

Let ℳ be a family of nonempty sets in S whose union is S, and suppose that in each M ∈ ℳ a certain subset ϕ(M) of M is chosen. The choice function ϕ is called utility-rational if there exists a real-valued function u on S such that
\[ \varphi(M) = \{ x \in M;\ u(x) = \max_{y \in M} u(y) \} \quad \text{for all } M \in \mathcal{M}. \tag{5.6.16} \]

If, in addition, u is continuous (or d-Lipschitz), ϕ is said to be continuous-utility-rational (or d-Lipschitz-utility-rational). Here a metric d is assumed consistent with the topology of S.

Originally, the rational choice problem arose in the form of consumer's choice. In this setting, the commodity space S is IR^n_+, and ℳ = {M_{p,I}; (p, I) ∈ IR^{n+1}_+}, where p denotes the price vector, I the consumer's income, and M_{p,I} = {x ∈ S; px ≤ I} the corresponding budget set. Later, in connection with problems of social choice, various aspects of general rational choice theory were developed by many authors; see, for example, Richter (1979) and Kim and Richter (1986). Applying Theorem 5.6.6, we obtain two results yielding necessary and sufficient conditions for a choice function ϕ to be d-Lipschitz-utility-rational and continuous-utility-rational.

Theorem 5.6.10 Let S be a separable metrizable space with a metric d. Given a family ℳ of nonempty sets in S with ∪_{M∈ℳ} M = S, and a choice function ϕ on ℳ, the following assertions are equivalent:

(i) ϕ is d-Lipschitz-utility-rational.
(ii) For every M ∈ ℳ, x ∈ M \ ϕ(M), and y ∈ ϕ(M), there exists ε = ε(M, x, y) > 0 such that for any n ∈ IN, any sets M1, . . . , Mn from ℳ, and any elements xk ∈ Mk, yk ∈ ϕ(Mk), k = 1, . . . , n, the inequality
\[ \sum_{k=1}^{n+1} d(y_{k-1}, x_k) \ge \varepsilon \tag{5.6.17} \]
holds with y0 = y, xn+1 = x.

Remark 5.6.11 Clearly, (ii) implies the following form of the strong axiom of revealed preference: No chain z0 → z1 → · · · → zn can exist in S with z0 ∈ ϕ(M), zn ∈ M \ ϕ(M), and zk ∈ Mk+1 ∩ ϕ(Mk), k = 0, 1, . . . , n, where M0 = Mn+1 = M, n ∈ IN, and M, M1, . . . , Mn ∈ ℳ. Indeed, the existence of such a chain contradicts (5.6.17) for yk = xk+1 = zk, k = 0, 1, . . . , n.



Proof: Consider on S the preference R defined for each x ∈ S by R(x) := {y; there exists M ∈ ℳ such that x ∈ M, y ∈ ϕ(M)}.

(i) ⇒ (ii). If u is a d-Lipschitz function satisfying (5.6.16), then for any x, y ∈ S the equivalence
\[ y \in P(x) \iff [\exists M \in \mathcal{M};\ x \in M \setminus \varphi(M),\ y \in \varphi(M)] \tag{5.6.18} \]
holds. Indeed, “⇒” is derived directly from the definition of R, whereas “⇐” is implied by the fact that, in view of (5.6.16), u(x) < u(y). In particular, there is no set M1 ∈ ℳ with the property x ∈ ϕ(M1), y ∈ M1. It follows from (5.6.16) and (5.6.18) that u is a utility function for R. Now, it remains to apply the implication 1° ⇒ 3° of Theorem 5.6.6.

(ii) ⇒ (i). Suppose that M ∈ ℳ, x ∈ M \ ϕ(M), and y ∈ ϕ(M). Note that the assumption that a set M1 ∈ ℳ exists with x ∈ ϕ(M1), y ∈ M1 contradicts the hypothesis for n = 1, y0 = x1 = y, and y1 = x2 = x. Thus the equivalence (5.6.18) holds. Now, (5.6.17) implies statement 3° of Theorem 5.6.6, and the implication 3° ⇒ 1° of this theorem completes the proof. □
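For a finite S the criterion of Theorem 5.6.10 becomes purely combinatorial: every term of (5.6.17) is either zero or at least the smallest positive distance in S, so a suitable ε exists exactly when no chain makes the sum vanish, which is the strong axiom of Remark 5.6.11. The Python sketch below checks this on two small families of menus. It is an illustration under the finiteness assumption only; the names `revealed_preference`, `reachable`, `satisfies_strong_axiom` and the sample menus are hypothetical.

```python
def revealed_preference(menus):
    # menus: list of (M, phi_M) pairs with phi_M a nonempty subset of M.
    # R(x) = {y : x in M and y in phi(M) for some menu M}.
    R = {}
    for M, chosen in menus:
        for x in M:
            R.setdefault(x, set()).update(chosen)
    return R


def reachable(R, start):
    # Points that can be reached from `start` in one or more R-steps.
    seen, stack = set(), list(R.get(start, ()))
    while stack:
        y = stack.pop()
        if y not in seen:
            seen.add(y)
            stack.extend(R.get(y, ()))
    return seen


def satisfies_strong_axiom(menus):
    # Violated if some rejected x in M can be reached from a chosen y in phi(M).
    R = revealed_preference(menus)
    return not any(x in reachable(R, y)
                   for M, chosen in menus
                   for y in chosen
                   for x in set(M) - set(chosen))


rational = [({"a", "b"}, {"a"}), ({"b", "c"}, {"b"}), ({"a", "c"}, {"a"})]
cyclic = [({"a", "b"}, {"a"}), ({"b", "c"}, {"b"}), ({"a", "c"}, {"c"})]
```

The family `rational` is generated by any utility with u(a) > u(b) > u(c), whereas `cyclic` contains the revealed-preference cycle a → c → b → a and therefore admits no utility at all.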

Corollary 5.6.12 Statement (ii) of Theorem 5.6.10 implies the closedness of ϕ(M) in M.

Proof: Indeed, if ϕ(M) is not closed in M, then there exist x ∈ M \ ϕ(M) and zk ∈ ϕ(M), k ∈ IN, with lim_{k→∞} d(x, zk) = 0. But this contradicts (5.6.17) for n = 1, M1 = M, y = x1 = z1, and y1 = zk, where k is large enough. □

Theorem 5.6.13 Let S be a separable metrizable space, ℳ a family of nonempty sets in S with ∪_{M∈ℳ} M = S, and ϕ a choice function on ℳ. Let D denote the set of all bounded metrics consistent with the topology of S. The following assertions are equivalent:

(i) ϕ is continuous-utility-rational.
(ii) Statement (ii) of Theorem 5.6.10 holds for some d ∈ D.

We omit the proof, which is quite similar to that of Theorem 5.6.10, the only difference being that Corollary 5.6.9 is applied instead of Theorem 5.6.6.



5.7 Applications to Set-Valued Dynamical Systems(1)

5.7.1 Compact-Valued Dynamical Systems: Quasiperiodic Points

Let S be a metrizable separable space. Every set-valued mapping R : S → 2^S can be considered as a (discrete) set-valued dynamical system on S. Trajectories of such a system are sequences χ = (χ(t))_{t=0}^∞ with χ(t) ∈ R(χ(t − 1)), t = 1, 2, . . . . The word “discrete” means that we consider discrete time. In this and the next sections, R is assumed to be a mapping with nonempty compact values R(x), x ∈ S.

Let H(R) denote the set of continuous R-isotone real-valued functions on S. Recall (see Section 5.6.1) that a real-valued function u on S is called R-isotone if u(y) ≥ u(x) for all x, y ∈ S with y ∈ R(x). The set H(R) is a convex cone in C(S) containing constants, hence nonempty. We shall consider the sets
\[ W_u := \{ x \in S;\ u(x) = \min_{y \in R(x)} u(y) \}, \qquad u \in H(R), \tag{5.7.1} \]
and
\[ W := \bigcap_{u \in H(R)} W_u. \tag{5.7.2} \]
In the case of a compact space S, W is nonempty because W = W_{u0} ⊇ arg max u0 for
\[ u_0(x) := \sum_{n=1}^{\infty} \frac{1}{2^n} \frac{u_n(x)}{\|u_n\|_{C(S)}}, \]
where (un) is any sequence dense in H(R) in the C(S)-norm ‖u‖_{C(S)} = max_{x∈S} |u(x)|, u ∈ C(S). The existence of such a sequence follows from the fact that for a metrizable compact S, the Banach space C(S) is separable. However, W can be empty when S is not compact, which may be illustrated by the following example.

Example 5.7.1 Take S = IN and R(n) = {n + 1} for every n ∈ IN. Then for u(n) = n, u ∈ H(R) and Wu = Ø; hence W = Ø.

Let d ∈ D, where D denotes the set of all bounded metrics on S consistent with the given topology.

(1) The main results are due to Levin (1991, 1995a).


Definition 5.7.2 A point x ∈ S is called d-quasiperiodic if for each ε > 0, there exist n ∈ IN and points xt ∈ S, t = 0, 1, . . . , n, x0 = xn = x, such that
\[ \sum_{t=1}^{n} \min\{ d(x_t, y);\ y \in R(x_{t-1}) \} \le \varepsilon. \tag{5.7.3} \]
The set of d-quasiperiodic points is denoted by Qd.

Theorem 5.7.3

(I) The equality W = ⋂_{d∈D} Qd holds.

(II) If S is locally compact, then there exists d ∈ D such that W = Qd.

Remark 5.7.4 The set W was introduced by Rubinov (1980, §16) in connection with studying models of economic growth. Rubinov considered the case where S is a metric compact space with a fixed metric d and R is nonempty-compact-valued and continuous with respect to the Hausdorff distance determined by d. Under these assumptions, he proved the inclusion W ⊆ Qd in the case where S is a retract of the standard simplex in IR^n and d is the ℓ1-distance on S. In the theory of ordinary (single-valued) dynamical systems with continuous time, an analogue of the set W is known as the generalized recurrence set; see Auslander (1964) and Bhatia and Szegoe (1970).

Remark 5.7.5 The set W can be empty when S is not compact; see Example 5.7.1. Nevertheless, Theorem 5.7.3 is true in this case too.

To prove the theorem, the following lemma is required.

Lemma 5.7.6 For each x ∈ W there exists y ∈ R(x) such that u(x) = u(y) for all u ∈ H(R).

Proof: Define
\[ F_u := \{ y \in R(x);\ u(x) = u(y) \}, \qquad u \in H(R). \]
Since x ∈ Wu, R(x) is compact, and u is continuous, it follows that each Fu is nonempty and compact. Further, if u1, . . . , un ∈ H(R), then u1 + · · · + un ∈ H(R) and F_{u1} ∩ · · · ∩ F_{un} = F_{u1+···+un} ≠ Ø. Then, by the compactness of the Fu,
\[ \bigcap_{u \in H(R)} F_u \ne \emptyset, \]
and so any element of this set can be taken as y. □
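Definition 5.7.2 and the sets Wu of (5.7.1) are easy to explore on a finite system. There every minimum in (5.7.3) is either zero or at least the smallest positive distance, so for small ε the sum can be at most ε only if each xt lies in R(xt−1); a point is then d-quasiperiodic exactly when it lies on a cycle of R. The Python sketch below, which relies on this finiteness assumption, compares the cycle points with Wu for one R-isotone u; the names `W_u`, `cycle_points` and the toy system are hypothetical.

```python
def W_u(S, R, u):
    # (5.7.1): points where u(x) equals the minimum of u over R(x).
    return {x for x in S if u(x) == min(u(y) for y in R[x])}


def cycle_points(S, R):
    # Points x that return to themselves after one or more steps of R.
    result = set()
    for x in S:
        seen, stack = set(), list(R[x])
        while stack:
            y = stack.pop()
            if y not in seen:
                seen.add(y)
                stack.extend(R[y])
        if x in seen:
            result.add(x)
    return result


# Hypothetical finite system: 0 may stay or move to 1; then 1 -> 2 -> 3 -> 3.
S = [0, 1, 2, 3]
R = {0: {0, 1}, 1: {2}, 2: {3}, 3: {3}}
u = lambda n: n   # R-isotone, since no step of R decreases n

W_for_u = W_u(S, R, u)
Q = cycle_points(S, R)
```

Since W ⊆ Wu for every u ∈ H(R) and every cycle point belongs to W (an R-isotone function is constant along a cycle), the agreement of `W_for_u` and `Q` in this example is consistent with Theorem 5.7.3.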

Proof of Theorem 5.7.3: (I) Suppose that x ∈ W and, applying Lemma 5.7.6, find a point y ∈ R(x) such that u(x) = u(y) for all u ∈ H(R). Fix d ∈ D and consider the cost function c defined by (5.6.3) and the corresponding reduced cost function c∗ given by (5.6.13). We have u(·) = c∗(·, x) ∈ H(R); then u(x) = u(y), and consequently, c∗(y, x) = c∗(x, x) = 0; i.e.,
\[ \inf_{\tau \in T(y,x)} V_{d,R}(\tau) = 0. \]
Then, by Lemma 5.6.2,
\[ \inf_{\tau \in T_R(y,x)} V'_{d,R}(\tau) = 0. \]
Fix ε > 0 and find a chain
\[ \tau_\varepsilon = (y \to x_1 \to y_1 \to \cdots \to x_{n-1} \to y_{n-1} \to x) \in T_R(y, x) \]
with V′d,R(τε) ≤ ε. Set x0 = xn = x, y0 = y, and taking into account that y ∈ R(x), we obtain
\[ \sum_{t=1}^{n} \min\{ d(x_t, z);\ z \in R(x_{t-1}) \} \le \sum_{t=1}^{n} d(x_t, y_{t-1}) = V'_{d,R}(\tau_\varepsilon) \le \varepsilon. \]

In view of the arbitrary choice of ε > 0, this implies x ∈ Qd. Therefore, W ⊆ ⋂_{d∈D} Qd.

On the other hand, if x ∉ W, then there exists a function u ∈ H(R) such that x ∉ Wu. Hence ε0 := min_{y∈R(x)} u(y) − u(x) > 0. Suppose u is bounded. This is not a restriction, because one can always pass from u to u′ with u′(x) = u(x)/(1 + |u(x)|). Fix any d ∈ D and take
\[ d_1(z, z') := d(z, z') + |u(z) - u(z')|, \qquad z, z' \in S. \]
Then d1 ∈ D, and u is d1-Lipschitz. Take any n and any xt, t = 1, . . . , n − 1, and, setting x0 = xn = x and using the compactness of R(xt−1), find yt−1 ∈ R(xt−1) such that
\[ d_1(x_t, y_{t-1}) = \min\{ d_1(x_t, y);\ y \in R(x_{t-1}) \}, \qquad t = 1, \ldots, n. \]
We have
\[ \begin{aligned} \sum_{t=1}^{n} \min\{ d_1(x_t, y);\ y \in R(x_{t-1}) \} &= \sum_{t=1}^{n} d_1(x_t, y_{t-1}) \\ &\ge \sum_{t=1}^{n} (u(y_{t-1}) - u(x_t)) \\ &= u(y_0) - u(x) + \sum_{t=1}^{n-1} (u(y_t) - u(x_t)) \\ &\ge u(y_0) - u(x) \ge \varepsilon_0. \end{aligned} \]
This inequality holds for all n ∈ IN and all xt, t = 1, . . . , n − 1, so x ∉ Qd1, and the equality
\[ W = \bigcap_{d \in D} Q_d \]

is thus proved.

(II) Consider on C(S) the topology tc of uniform convergence on compact sets. As S is metrizable, separable, and locally compact, (C(S), tc) is a separable metrizable locally convex space (compare the proof of Theorem 5.5.16). Then there exists a sequence (un) ⊂ H(R) that is tc-dense in H(R). Consider the set
\[ W' := \bigcap_{n=1}^{\infty} W_{u_n}. \]
If x ∈ W′, then un(x) = un(yn) for a certain yn ∈ R(x), n = 1, 2, . . . . Let u be any function in H(R). Using the compactness of R(x), we find a subsequence y_{nk} that converges to a point y ∈ R(x). Also, we assume, passing if necessary to a subsequence, that u_{nk} converges to u in (C(S), tc). From the definition of the topology tc it follows that u_{nk} converges to u uniformly on the compact set K consisting of the points x, y_{nk} (k = 1, 2, . . .), and y. We obtain
\[ u(x) = \lim_{k \to \infty} u_{n_k}(x) = \lim_{k \to \infty} u_{n_k}(y_{n_k}) = u(y), \]
which implies x ∈ Wu. Thus we have proved the equality W′ = W; that is,
\[ \bigcap_{n=1}^{\infty} W_{u_n} = W. \tag{5.7.4} \]

Take
\[ u_0(x) := \sum_{n=1}^{\infty} \frac{1}{2^n} u'_n(x), \qquad x \in S, \]
where for each n,
\[ u'_n(x) = \frac{u_n(x)}{1 + |u_n(x)|}, \qquad x \in S. \]
Clearly, u0 ∈ H(R) and W_{u′n} = W_{un}, n ∈ IN. If x ∈ W_{u0}, then u0(x) = u0(y) for some y ∈ R(x), and as all u′n are R-isotone, we obtain u′n(x) = u′n(y), n ∈ IN. Therefore, x ∈ W_{u′n} = W_{un}, n ∈ IN, and hence
\[ W_{u_0} \subseteq \bigcap_{n=1}^{\infty} W_{u_n}. \tag{5.7.5} \]

It follows from (5.7.4) and (5.7.5) that W = W_{u0}. Being continuous and bounded, u0 is d-Lipschitz for a certain d ∈ D. So the proof will be completed if we show that Qd ⊆ W_{u0}. Thus, let x ∈ Qd and fix ε > 0; then there exist n ∈ IN and xt, t = 0, 1, . . . , n, x0 = xn = x, such that (5.7.3) holds. Take yt−1 ∈ R(xt−1) from the condition
\[ d(x_t, y_{t-1}) = \min\{ d(x_t, y);\ y \in R(x_{t-1}) \}, \qquad t = 1, \ldots, n. \]
We have
\[ \begin{aligned} u_0(y_0) - u_0(x_0) &\le u_0(y_0) - u_0(x_0) + \sum_{t=1}^{n-1} (u_0(y_t) - u_0(x_t)) \\ &= \sum_{t=1}^{n} (u_0(y_{t-1}) - u_0(x_t)) \le \sum_{t=1}^{n} d(x_t, y_{t-1}) \le \varepsilon. \end{aligned} \]
Then u0(x) + ε ≥ u0(y0) ≥ min{u0(y); y ∈ R(x)}, and as ε > 0 can be taken arbitrarily small, x ∈ W_{u0}. The proof is now complete. □

5.7.2 Compact-Valued Dynamical Systems: Asymptotic Behavior of Trajectories

Let χ = (χ(t))_{t=0}^∞ be a trajectory of the dynamical system R, that is, χ(t) ∈ R(χ(t − 1)), t = 1, 2, . . . . We say that χ starts from x if χ(0) = x. Let us consider the set E ⊆ W defined by
\[ E := \{ x \in S;\ \text{there exists a trajectory } \chi \text{ that starts from } x \text{ and is contained in } W \}. \]
A trajectory χ is said to be attracted by E if for each open G ⊃ E, there exists a moment t0 = t0(G) such that χ(t) ∈ G whenever t ≥ t0. Given any set M in S, define
\[ R(M) := \bigcup_{x \in M} R(x). \]

Recall some definitions. A set-valued mapping R is called closed if its graph is closed in S × S; R is called upper semicontinuous if for each x ∈ S and each open set G ⊇ R(x), there exists a neighborhood V of x such that G ⊇ R(V); R is called lower semicontinuous if for any xn → x and any y ∈ R(x), there exist yn ∈ R(xn), yn → y; R is called continuous if it is both upper and lower semicontinuous. Following Levin (1991), we say that R satisfies condition (C) if each R-isotone lower semicontinuous real-valued function on S is an upper envelope of some family of functions from H(R). Recall that R is always assumed to be nonempty-compact-valued and that the space S is metrizable and separable.



Theorem 5.7.7

(I) Suppose that R is upper semicontinuous. Then W and E are closed, cluster points of each trajectory belong to E, and any precompact trajectory is attracted by E.

(II) Suppose that either R is continuous or R is upper semicontinuous and satisfies (C). Then
\[ E = W = \{ x \in S;\ \text{there exists a trajectory } \chi \text{ starting from } x \text{ and satisfying } u(\chi(t)) = u(x) \text{ for all } t \text{ and all } u \in H(R) \}. \]

The next example shows that a nonprecompact trajectory χ can fail to be attracted by E.

Example 5.7.8 Let S = {0, 1, 2, . . .}, R(0) = {0, 1}, and R(n) = {n + 1} for n = 1, 2, . . . . It is easily seen that W = E = {0}, and the trajectory χ = (χ(t))_{t=0}^∞, χ(t) := t + 1, is not attracted by E.

Before proving the theorem, we require some lemmas.

Lemma 5.7.9 The following assertions are equivalent:

(i) R is upper semicontinuous.
(ii) R is closed, and for each compact K ⊆ S, R(K) is compact.

Proof: (i) ⇒ (ii) First let us verify that R is closed. Let yn ∈ R(xn), n ∈ IN, and (xn, yn) → (x, y). We have to show that y ∈ R(x). Suppose this is not the case. Then, in view of the compactness of R(x), there exist open sets G ⊃ R(x) and G′ ∋ y with G ∩ G′ = Ø. Further, by assumption, G ⊇ R(V) for some neighborhood V of x. Since xn → x and yn ∈ R(xn), there is some n0(V) such that xn ∈ V, and hence G ∋ yn, for n ≥ n0(V). On the other hand, since yn → y and y ∈ G′, there is some n1(G′) such that yn ∈ G′ for n ≥ n1(G′), and as G ∩ G′ = Ø, we obtain a contradiction. Thus R is closed.

Next let us show that R(K) is compact for every compact K. Take any sequence (yn) in R(K). Let yn ∈ R(xn), xn ∈ K, n ∈ IN. Using the compactness of K and passing, if necessary, to a subsequence, we assume without loss of generality that xn converges to a point x ∈ K. Fix a metric d ∈ D and take
\[ G_k := \{ z' \in S;\ \mathrm{dist}(z', R(x)) < \tfrac{1}{k} \} = \bigcup_{z \in R(x)} \{ z';\ d(z', z) < \tfrac{1}{k} \}, \qquad k \in \mathbb{N}. \]
Since R is upper semicontinuous, for each k ∈ IN there exists a neighborhood Vk of x such that Gk ⊇ R(Vk). Taking a subsequence x_{nk} ∈ Vk and using the relations
\[ y_{n_k} \in R(x_{n_k}) \subseteq R(V_k) \subseteq G_k, \qquad k \in \mathbb{N}, \]
we find zk ∈ R(x) with d(y_{nk}, zk) < 1/k, k ∈ IN. The compactness of R(x) implies the existence of a convergent subsequence z_{km} → y ∈ R(x). Note that y_{n_{km}} → y because lim_{m→∞} d(y_{n_{km}}, z_{km}) = 0. Thus we have chosen a convergent subsequence from (yn), and in view of the arbitrary choice of (yn) ⊆ R(K), this implies compactness of R(K).

(ii) ⇒ (i) Given an element x ∈ S and an open set G ⊇ R(x), it suffices to show that the set F := {z; R(z) ∩ (S \ G) ≠ Ø} is closed. Let xn ∈ F and xn → x. By the definition of F, there exist yn ∈ R(xn) ∩ (S \ G), n ∈ IN. The set K := {x, x1, x2, . . .} is compact, and so is R(K). Therefore, a convergent subsequence y_{nk} can be extracted from yn. Then its limit y := lim_{k→∞} y_{nk} ∈ S \ G, and as R is closed, y ∈ R(x), and so x ∈ F. □

Remark 5.7.10 The implication (i) ⇒ (ii) follows also from Berge (1957, §9, Theorems 1 and 9).

Lemma 5.7.11 Suppose that R is upper semicontinuous. Let xk → x, and for each k, let χk be a trajectory starting from xk. Then there exists a trajectory χ starting from x and a subsequence χ_{kj} such that for every t, χ_{kj}(t) converges to χ(t).

Proof: Let
\[ K_0 := K, \qquad K_t := R(K_{t-1}), \qquad t = 1, 2, \ldots, \]
with K := {x, x1, x2, . . .}. By Lemma 5.7.9, all the Kt are compact, and for each t, {χk(t); k = 1, 2, . . .} ⊆ Kt. Let us extract from χk(1) a convergent subsequence χ_{k(m)}(1), and consider the sequence χ_{k(m)}(2). Next we extract from it a convergent subsequence χ_{k(m(n))}(2), and pass on to the sequence χ_{k(m(n))}(3), and so on. Repeating this procedure infinitely many times and applying Cantor's diagonal method, we get a subsequence χ_{kj} of χk such that for each t, χ_{kj}(t) converges to some xt. It remains to note that
\[ R(x_t) \ni x_{t+1} \qquad \text{for } t = 0, 1, 2, \ldots, \]

which follows from the closedness of R (Lemma 5.7.9, (i) ⇒ (ii)) and from the fact that all χ_{kj} are trajectories. So χ := (xt)_{t=0}^∞ is a trajectory, and the proof is complete. □

Lemma 5.7.12 Suppose that R is upper semicontinuous. Let χ be a trajectory and let M(χ) be the set of its cluster points. Then, for each x ∈ M(χ), there is a trajectory χ′ that starts from x and is contained in M(χ).

Proof: Let x = lim_{k→∞} χ(tk). For each k ∈ IN, we define a trajectory χk as χk(t) := χ(tk + t). By Lemma 5.7.11, there exists a subsequence χ_{kj} and a trajectory χ′ starting from x such that for each t, χ′(t) = lim_{j→∞} χ_{kj}(t). Then we have χ′(t) = lim_{j→∞} χ(t_{kj} + t) ∈ M(χ), and the result follows. □

Lemma 5.7.13 Suppose that R is upper semicontinuous. Then W and E are closed.

Proof: Clearly, the closedness of W will follow if one proves that Wu is closed for every u ∈ H(R). Fix u ∈ H(R) and let xn ∈ Wu, xn → x. Then, for each n ∈ IN, there is yn ∈ R(xn) with u(xn) = u(yn). We have yn ∈ R(K), where K := {x, x1, x2, . . .}. Clearly, K is compact, and then, by Lemma 5.7.9, (i) ⇒ (ii), it follows that R(K) is compact as well. Therefore, there is a subsequence y_{nk} converging to a point y ∈ R(x). We obtain
\[ u(y) = \lim_{k \to \infty} u(y_{n_k}) = \lim_{k \to \infty} u(x_{n_k}) = u(x). \]
Hence x ∈ Wu, and so Wu is closed.

Now let xn ∈ E and xn → x. Then for each n, there is a trajectory χn that starts from xn and is contained in W. Applying Lemma 5.7.11, we find a trajectory χ starting from x and a subsequence χ_{nj} such that, for each t, χ(t) = lim_{j→∞} χ_{nj}(t). Because of the closedness of W, we have χ(t) ∈ W for all t, so x = χ(0) ∈ E. □

Lemma 5.7.14 Suppose that R is upper semicontinuous and u ∈ H(R). Then the function
\[ \varphi_u(x) := \min\{ u(y);\ y \in R(x) \}, \qquad x \in S, \tag{5.7.6} \]
is R-isotone and lower semicontinuous.



Proof: Only lower semicontinuity of ϕu requires a proof. Let xn → x as n → ∞ and take, for each n, yn ∈ R(xn) such that ϕu(xn) = u(yn). Applying Lemma 5.7.9 with K = {x, x1, x2, . . .}, we find a subsequence y_{nk} that converges to a point y ∈ R(x). Passing if necessary to a subsequence, we assume without loss of generality that
\[ \liminf_{n \to \infty} \varphi_u(x_n) = \lim_{k \to \infty} \varphi_u(x_{n_k}). \]
Then
\[ \varphi_u(x) \le u(y) = \lim_{k \to \infty} u(y_{n_k}) = \liminf_{n \to \infty} \varphi_u(x_n), \]
and the proof is complete. □

Lemma 5.7.15 Suppose that either R is continuous or R is upper semicontinuous and satisfies (C). Let x0 ∈ W and y0 ∈ R(x0). If u(x0) = u(y0) for all u ∈ H(R), then y0 ∈ W.

Proof: Suppose that u(x0) = u(y0) for all u ∈ H(R). Fix some u ∈ H(R) and take the function ϕu defined by (5.7.6). First consider the case when R is continuous. If xn → x, y ∈ R(x), and ϕu(x) = u(y), then by the lower semicontinuity of R, there is a sequence yn ∈ R(xn) that converges to y. We obtain
\[ \varphi_u(x) = u(y) = \lim_{n \to \infty} u(y_n) \ge \limsup_{n \to \infty} \varphi_u(x_n); \]

that is, ϕu is upper semicontinuous. This, together with Lemma 5.7.14, proves that ϕu ∈ H(R). Then by our assumption, ϕu(x0) = ϕu(y0) and u(x0) = u(y0). Also, ϕu(x0) = u(x0), since x0 ∈ W. It follows that ϕu(y0) = u(y0); that is, y0 ∈ Wu, and as u is taken arbitrarily from H(R), y0 ∈ W.

Consider now the case where R is upper semicontinuous and satisfies (C). By Lemma 5.7.14, ϕu is R-isotone and lower semicontinuous. Then the equality ϕu(x0) = ϕu(y0) holds, which follows from (C) and the assumption that v(x0) = v(y0) for all v ∈ H(R). The rest of the proof follows the same arguments as above. □

Proof of Theorem 5.7.7: (I) The sets E and W are closed by Lemma 5.7.13. Let M(χ) denote the set of cluster points of a trajectory χ and let x = lim_{j→∞} χ(tj) ∈ M(χ). For each u ∈ H(R), we have
\[ u(\chi(t_j)) \le \varphi_u(\chi(t_j)) \le u(\chi(t_{j+1})), \tag{5.7.7} \]
where ϕu is defined by (5.7.6). It follows from (5.7.7) that
\[ \lim_{j \to \infty} \varphi_u(\chi(t_j)) = u(x) \tag{5.7.8} \]
whenever u ∈ H(R). Now, from (5.7.8) and the lower semicontinuity of ϕu (see Lemma 5.7.14) we derive ϕu(x) ≤ u(x). Then ϕu(x) = u(x) for all u ∈ H(R). Hence x ∈ W, and by Lemma 5.7.12, M(χ) ⊆ E.

Now let χ be precompact. Suppose that χ is not attracted by E. Then for some open set G ⊇ E, there exists a subsequence χ(tj) ∈ S \ G. This is a contradiction, because cluster points of χ(tj) (existing in view of the assumption that χ is precompact) must belong to E. The contradiction proves that χ is attracted by E.

(II) This is an immediate consequence of Lemmas 5.7.6 and 5.7.15. □

5.7.3 A Dynamic Optimization Problem

Let S be an arbitrary nonempty set and a : S → 2^S a multifunction (that is, a set-valued mapping) with nonempty values. Its graph, gr a := {(x, y) ∈ S × S; y ∈ a(x)}, is a continuous analogue of a network: Points x ∈ S and pairs (x, y) ∈ gr a may be regarded as its vertices and arcs, respectively. A sequence (finite or infinite) of elements of S, χ = (χ(t))_{t=0}^T, T ≤ +∞, is called a trajectory if
\[ \chi(t) \in a(\chi(t-1)) \quad \text{for all } t. \tag{5.7.9} \]
Given a pair of points x and y in S, κ(x, y) denotes the set of all finite trajectories χ = (χ(t))_{t=0}^T (T = T(χ) < +∞ not fixed) that start at x and finish at y (that is, χ(0) = x, χ(T) = y). In what follows, the connectivity hypothesis is assumed to be satisfied: For any x, y ∈ S the set κ(x, y) is nonempty.

Suppose that a cost function c : S × S → IR ∪ {+∞} is given, with dom c = gr a. Consider a dynamic optimization problem that consists in minimizing the functional
\[ g(\chi) := \sum_{t=1}^{T(\chi)} c(\chi(t-1), \chi(t)) \tag{5.7.10} \]
over the set κ(x, y).

This problem resembles, in some respects, models of economic system development and multistage decision processes with a continuous set of



states. Similar problems have been studied in mathematical economics since Ramsey (1928). We describe optimal finite trajectories (when they exist) and characterize efficient infinite trajectories starting from a given point.

Theorem 5.7.16 A trajectory χ0 = (χ0(t))_{t=0}^T (χ0(0) = x0, χ0(T) = y0, T = T(χ0) < +∞) is optimal in κ(x0, y0) if and only if there exists a function u ∈ Lip(c, S; IR^S) satisfying
\[ c(\chi_0(t-1), \chi_0(t)) = u(\chi_0(t-1)) - u(\chi_0(t)), \qquad t = 1, \ldots, T(\chi_0). \tag{5.7.11} \]

Proof: First observe that the connectivity hypothesis can be rewritten as
\[ c_*(x, y) < +\infty \quad \text{for all } x, y \in S. \tag{5.7.12} \]
Furthermore, the property of a trajectory χ ∈ κ(x0, y0) to be optimal in κ(x0, y0) can be rewritten as
\[ g(\chi) = c_*(x_0, y_0). \tag{5.7.13} \]
Now, if χ0 is optimal in κ(x0, y0), then in view of (5.7.13), c∗(x0, y0) > −∞. Consequently, from Theorem 4.6.17, it follows that c∗(x, y) > −∞ for all x, y ∈ S. Moreover, Lip(c, S; IR^S) is nonempty. Next, the function u,
\[ u(x) = \begin{cases} c_*(x, y_0), & \text{if } x \ne y_0, \\ 0, & \text{if } x = y_0, \end{cases} \]
belongs to Lip(c, S; IR^S) (see the proof of Theorem 4.6.17). Taking into account (5.7.13), the optimality of χ0 yields
\[ c_*(x_0, y_0) = \sum_{s=1}^{t} c(\chi_0(s-1), \chi_0(s)) + u(\chi_0(t)), \qquad t = 1, \ldots, T(\chi_0), \]
which implies (5.7.11). On the other hand, if (5.7.11) is satisfied for some u ∈ Lip(c, S; IR^S), then
\[ g(\chi_0) = \sum_{t=1}^{T} c(\chi_0(t-1), \chi_0(t)) = u(x_0) - u(y_0). \]

Further, for any other trajectory χ ∈ κ(x0, y0),
\[ g(\chi) = \sum_{t=1}^{T(\chi)} c(\chi(t-1), \chi(t)) \ge \sum_{t=1}^{T(\chi)} (u(\chi(t-1)) - u(\chi(t))) = u(x_0) - u(y_0). \]
This implies that χ0 is optimal. □
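On a finite network with strictly positive arc costs, the optimality criterion of Theorem 5.7.16 can be made concrete: the potential u(x) = c∗(x, y0) (with u(y0) = 0) used in the proof is a shortest-path distance to y0, and an optimal trajectory is obtained by following arcs on which (5.7.11) holds with equality. The Python sketch below illustrates this under those assumptions (finite S, positive integer costs so that equality can be tested exactly, x0 ≠ y0); the names `reduced_cost`, `optimal_trajectory` and the sample network are hypothetical.

```python
import math
from itertools import product


def reduced_cost(S, c):
    # c: dict mapping arcs (x, y) of gr a to costs; missing pairs cost +inf.
    cs = {(x, y): c.get((x, y), math.inf) for x, y in product(S, S)}
    for k, i, j in product(S, S, S):   # Floyd-Warshall, k outermost
        if cs[i, k] + cs[k, j] < cs[i, j]:
            cs[i, j] = cs[i, k] + cs[k, j]
    return cs


def optimal_trajectory(S, c, x0, y0):
    cs = reduced_cost(S, c)
    # Potential from the proof: u(x) = c*(x, y0) for x != y0, and u(y0) = 0.
    u = {x: 0 if x == y0 else cs[x, y0] for x in S}
    path = [x0]
    while path[-1] != y0:
        x = path[-1]
        # Follow an arc on which (5.7.11) holds with equality.
        path.append(next(y for y in S
                         if (x, y) in c and c[x, y] == u[x] - u[y]))
    return path, u


S = ["a", "b", "c", "d"]
c = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 3,
     ("c", "d"): 1, ("b", "d"): 5, ("d", "a"): 2}
path, u = optimal_trajectory(S, c, "a", "d")
```

Because the costs are strictly positive, u decreases strictly along tight arcs, so the loop terminates; with floating-point costs the equality test would need a tolerance.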

Remark 5.7.17 The above approach, based on the duality theory for a nontopological version of the mass transfer problem, is also applicable to a more general optimization problem that consists in minimizing the functional
\[ g_1(\chi) = \ell(\chi(0), \chi(T)) + \sum_{t=1}^{T} c(\chi(t-1), \chi(t)) \qquad (T = T(\chi) < +\infty) \]
over the set of trajectories. In this case, condition (5.7.11) is to be supplemented by the condition
\[ \ell(\chi_0(0), \chi_0(T(\chi_0))) + c_*(\chi_0(0), \chi_0(T(\chi_0))) = \min_{x, y \in S} [\ell(x, y) + c_*(x, y)] \]
(communication by V.L. Levin).

In connection with Theorem 5.7.16 note that, in general, κ(x0, y0) need not have optimal trajectories, but in some cases such trajectories do exist. The following existence theorem has been proved by Levin (private communication).

Theorem 5.7.18 Suppose that S is a compact topological space and that the cost function c is strictly positive and lower semicontinuous. Then for any x0, y0 ∈ S, there exists an optimal trajectory in κ(x0, y0).

Proof: From the assumptions on S and c it follows that c(x, y) ≥ ε0 > 0 for all x, y ∈ S. This, together with the connectivity hypothesis (5.7.12), implies that for any x, y ∈ S, there exist an integer T = T(x, y) and points xt, t = 1, . . . , T − 1, in S such that
\[ c_*(x, y) = \sum_{t=1}^{T} c(x_{t-1}, x_t), \]
with x0 = x and xT = y. Taking such a representation for x = x0, y = y0, we obtain a trajectory χ0 = (χ0(t))_{t=0}^{T(χ0)} in κ(x0, y0) satisfying
\[ c_*(x_0, y_0) = \sum_{t=1}^{T(\chi_0)} c(\chi_0(t-1), \chi_0(t)), \tag{5.7.14} \]
and since (5.7.14) implies (5.7.13) for χ = χ0, χ0 is optimal. □

We return now to a general (nontopological) situation. An infinite trajectory (χ0(t))_{t=0}^∞ is said to be efficient if for each T, the finite trajectory χ0^T = (χ0(t))_{t=0}^T is optimal in κ(χ0(0), χ0(T)).


Theorem 5.7.19 An infinite trajectory χ0 = (χ0(t))_{t=0}^∞ is efficient if and only if there exists a function u ∈ Lip(c, S; IR^S) satisfying (5.7.11) for all t.

Proof: Clearly, only the “only if” part requires a proof. Thus, suppose that χ0 is efficient. By Theorem 5.7.16, for each T ∈ IN, there exists a function u_T ∈ Lip(c, S; IR^S) satisfying (5.7.11); i.e.,

\[
u_T(\chi_0(t-1)) - u_T(\chi_0(t)) = c(\chi_0(t-1), \chi_0(t)), \qquad t = 1, \dots, T. \tag{5.7.15}
\]

Without loss of generality, assume that u_T(χ0(0)) = 0, T ∈ IN. Hence

\[
u_{T+1}(\chi_0(t)) = u_T(\chi_0(t)) = -\sum_{s=1}^{t} c(\chi_0(s-1), \chi_0(s)) \tag{5.7.16}
\]

whenever T ≥ t. Let ℓ^∞ denote the Banach lattice of bounded real sequences (ξ_T)_{T=1}^∞ with the norm ‖(ξ_T)_{T=1}^∞‖ := sup_T |ξ_T|, and let c denote its sublattice consisting of convergent sequences. It follows from the Hahn–Banach theorem that there exists a positive linear functional (a Banach limit) B on ℓ^∞ such that

\[
\langle (\xi_T)_{T=1}^{\infty}, B \rangle = \lim_{T \to \infty} \xi_T \qquad \text{for } (\xi_T)_{T=1}^{\infty} \in c. \tag{5.7.17}
\]

Indeed, the linear functional on the space c, given by

\[
(\xi_T)_{T=1}^{\infty} \mapsto \lim_{T \to \infty} \xi_T,
\]

can be extended to ℓ^∞ preserving the norm, and since the extended functional is easily seen to be positive (that is, it takes nonnegative values on ℓ^∞_+), it can be taken as B. Now, for each x ∈ S, we set u(x) := ⟨(u_T(x))_{T=1}^∞, B⟩. Since u_T ∈ Lip(c, S; IR^S) and u_T(χ0(0)) = 0, we have for each x ∈ S,

\[
-c_*(\chi_0(0), x) \le u_T(x) \le c_*(x, \chi_0(0)).
\]

Moreover, as c∗(x, y) < +∞ for all x, y ∈ S (see the connectivity hypothesis (5.7.12)), (u_T(x))_{T=1}^∞ ∈ ℓ^∞, and so the function u is well-defined. Furthermore, since

\[
u_T(x) - u_T(y) \le c_*(x, y) \qquad \text{for all } x, y \in S,
\]

we get

\[
u(x) - u(y) \le c_*(x, y) \qquad \text{for all } x, y \in S;
\]

that is, u ∈ Lip(c∗, S; IR^S) = Lip(c, S; IR^S).


Finally, (u_T(χ0(t)))_{T=1}^∞ ∈ c in view of (5.7.16), and taking into account (5.7.15), we obtain

\[
\begin{aligned}
u(\chi_0(t-1)) - u(\chi_0(t)) &= \bigl\langle (u_T(\chi_0(t-1)) - u_T(\chi_0(t)))_{T=1}^{\infty}, B \bigr\rangle \\
&= \lim_{T \to \infty} \bigl(u_T(\chi_0(t-1)) - u_T(\chi_0(t))\bigr) \\
&= c(\chi_0(t-1), \chi_0(t)), \qquad t = 1, 2, \dots.
\end{aligned}
\]
□


5.8 Compensatory Transfers and Action Profiles

In this section(2) an application of conditions for nonemptiness of Lip(c, S; IR^S) is given to the problem of rationalizability of action profiles via compensatory transfers. This problem, considered earlier by Rochet (1985, 1987), is an example of implementation problems arising in the theory of monopolies with incomplete information (see Laffont and Maskin (1980), Baron and Myerson (1982), Maskin and Riley (1984)) and in the theory of optimal taxation (see Mirrlees (1976) and Hammond (1979)).

The corresponding model is as follows: There is a monopolist producing m commodities and selling them to a population of agents, S. Each agent, or potential buyer, x ∈ S, has a utility function v(x, ·) : Z → IR, where Z ⊂ IR^m denotes the commodity space. The monopolist is assumed to be uninformed about his customers’ characteristics, or more precisely, he knows the whole set S but is unable to identify an individual agent facing him. Selling strategies of the monopolist consist in proposing “personalized” transactions to each buyer. Such a transaction is an exchange of a bundle z = (z_1, . . . , z_m) of commodities for a money equivalent u. The gain of the buyer x is thereby v(x, z) − u.

Let us consider this model in detail. A selling strategy is defined as a pair of functions: an action profile z(·) : S → Z and a transfer function u : S → IR. If an agent x poses as y (recall that the monopolist is unable to verify “who is who”), his gain equals v(x, z(y)) − u(y), provided that the selling strategy used is (z(·), u(·)). A question arises whether there is a strategy (z(·), u(·)) such that it is in the interest of each agent to be honest, that is,

\[
v(x, z(x)) - u(x) \ge v(x, z(y)) - u(y) \qquad \text{for all } x, y \in S. \tag{5.8.1}
\]

(2) The main result here is due to Levin (1995a).


If (5.8.1) holds, then the action profile z(·) is called rationalizable via compensatory transfers, and the selling strategy (z(·), u(·)) is called implementable. It is easy to see that the fulfillment of (5.8.1) is equivalent to the existence of a function f : Z → IR ∪ {+∞} such that for all x ∈ S,

\[
\max_{z \in Z} \bigl(v(x, z) - f(z)\bigr) = v(x, z(x)) - f(z(x)) > -\infty. \tag{5.8.2}
\]

Indeed, it follows from (5.8.1) that z(x_1) = z(x_2) implies u(x_1) = u(x_2). Hence the function

\[
f(z) := \begin{cases} u(x), & \text{if } z = z(x), \\ \sup_{x \in S} \bigl[v(x, z) - v(x, z(x)) + u(x)\bigr], & \text{if } z \notin z(S), \end{cases}
\]

is well-defined and satisfies (5.8.2). On the other hand, it is clear that (5.8.2) implies (5.8.1) with u(x) = f(z(x)). Thus, an action profile z(·) is rationalizable if and only if for each x ∈ S, z(x) is the optimal choice of the agent x with respect to a certain (in general, nonlinear) price schedule (or taxation function) f. Note now that condition (5.8.2) means exactly that u ∈ Lip(c, S; IR^S), where the cost function c is defined by

\[
c(x, y) := v(x, z(x)) - v(x, z(y)). \tag{5.8.3}
\]

Therefore, the problem of rationalizability of a given action profile can be reformulated as the question of when the corresponding set Lip(c, S; IR^S) is nonempty. The next result is a particular case of Theorem 4.6.26 corresponding to the cost function (5.8.3).

Theorem 5.8.1 (Rochet (1987), Levin (1995a)) Let z(·) be an action profile. The following assertions are equivalent:

(i) z is rationalizable via compensatory transfers.

(ii) For each cycle in S, x_0, x_1, . . . , x_{N−1}, x_N = x_0,

\[
\sum_{k=1}^{N} \bigl[v(x_{k-1}, z(x_{k-1})) - v(x_{k-1}, z(x_k))\bigr] \ge 0.
\]

(iii) For each cycle in S, x_0, x_1, . . . , x_{N−1}, x_N = x_0,

\[
\sum_{k=1}^{N} \bigl[v(x_k, z(x_k)) - v(x_k, z(x_{k-1}))\bigr] \ge 0.
\]

If S is a topological space and v(·, ·), z_k(·), k = 1, . . . , m, are continuous, then each transfer function rationalizing z(·) is also continuous.

Theorem 5.8.2 Suppose that S is a convex domain in IR^n, and for each x = (x_1, . . . , x_n) ∈ S and each z ∈ Z,

\[
v(x, z) = \sum_{j=1}^{n} x_j v_j(z), \tag{5.8.4}
\]

where all v_j are C². Let z(·) be a C² action profile. The existence of a smooth (C²) convex function u : S → IR satisfying

\[
v_j(z(x)) = \frac{\partial u(x)}{\partial x_j}, \qquad j = 1, \dots, n, \tag{5.8.5}
\]

is sufficient and necessary for z(·) to be rationalizable via compensatory transfers.

Proof: Sufficiency. Suppose u is a smooth convex function satisfying (5.8.5). Then for each y ∈ S, the matrix

\[
b_{ij}(y) := \sum_{k=1}^{m} \frac{\partial v_i}{\partial z_k}(z(y))\, \frac{\partial z_k}{\partial y_j}(y)
\]

is symmetric and positive semidefinite. Since in view of (5.8.3) and (5.8.4)

\[
c(x, y) = \sum_{j=1}^{n} x_j \bigl[v_j(z(x)) - v_j(z(y))\bigr] \qquad \text{for all } x, y \in S,
\]

we have c(x, x) = 0 for all x ∈ S,

\[
\frac{\partial c(x, y)}{\partial y_j} = -\sum_{i=1}^{n} x_i b_{ij}(y), \qquad j = 1, \dots, n, \tag{5.8.6}
\]

and

\[
\frac{\partial^2 c(x, y)}{\partial x_i \partial y_j} = -b_{ij}(y), \qquad i, j = 1, \dots, n. \tag{5.8.7}
\]

It follows from (5.8.6) and (5.8.7) that the assumptions of Theorem 5.1.7 hold. Applying the theorem, we get Lip(c, S; IR^S) ≠ Ø; that is, z(·) is rationalizable via compensatory transfers.

Necessity. If the action profile z(·) is rationalizable, then Lip(c, S; IR^S) is nonempty. Further, by Theorem 5.1.4, conditions (5.1.13) and (5.1.14) hold, and in view of (5.8.7) and Remark 5.1.5, it follows that the matrices b_{ij}(y), y ∈ S, are symmetric and positive semidefinite. But the last means exactly the existence on S of a smooth convex function u satisfying (5.8.5). □

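Remark 5.8.3 Some related results with different proofs may be found in Rochet (1987).

Remark 5.8.4 When v(x, z) is not of the form (5.8.4), various sufficient conditions for rationalizability of action profiles can be derived from Theorems 5.1.6 and 5.1.8 if one takes into account that the functions (5.1.12) are given by

\[
\begin{aligned}
\beta_{ij}(x, y) ={}& \sum_{k=1}^{m} \frac{\partial^2 v}{\partial z_k \partial x_j}(y, z(y))\, \frac{\partial z_k}{\partial x_i}(y) \\
&+ \sum_{k=1}^{m} \sum_{l=1}^{m} \left[ \frac{\partial^2 v}{\partial z_k \partial z_l}(y, z(y)) - \frac{\partial^2 v}{\partial z_k \partial z_l}(x, z(y)) \right] \frac{\partial z_k}{\partial x_i}(y)\, \frac{\partial z_l}{\partial x_j}(y) \\
&+ \sum_{k=1}^{m} \left[ \frac{\partial v}{\partial z_k}(y, z(y)) - \frac{\partial v}{\partial z_k}(x, z(y)) \right] \frac{\partial^2 z_k}{\partial x_i \partial x_j}(y);
\end{aligned}
\]

here

\[
\frac{\partial c}{\partial y_j}(x, y) = -\sum_{k=1}^{m} \frac{\partial v}{\partial z_k}(x, z(y))\, \frac{\partial z_k}{\partial x_j}(y),
\qquad
\frac{\partial^2 c}{\partial x_i \partial y_j}(x, y) = -\sum_{k=1}^{m} \frac{\partial^2 v}{\partial z_k \partial x_i}(x, z(y))\, \frac{\partial z_k}{\partial x_j}(y).
\]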

Remark 5.8.5 There are some further applications to mathematical economics of the duality theory for mass transfer and related problems. Applications to demand theory are given in Levin (1995a) and applications to nonatomic market games are discussed in detail in Gretsky, Ostroy and Zame (1992).
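As a numerical illustration of Theorem 5.8.1, the cycle condition (ii) can be tested for a finite set of agents by searching for a negative cycle of the cost (5.8.3). The sketch below is an assumption-laden example, not a method taken from the text: the function names, the bilinear utility v(x, z) = xz, the three agent types, and the use of the Floyd–Warshall recursion are all chosen for illustration. When no negative cycle exists, the shortest-path values to a fixed reference agent give one transfer function u with u(x) − u(y) ≤ c(x, y).

```python
def rationalizing_transfer(v, types, z, tol=1e-12):
    """Return a transfer u satisfying (5.8.1), or None if some cycle sum of c is negative.

    c(x, y) = v(x, z(x)) - v(x, z(y)) as in (5.8.3); S is the finite list `types`.
    """
    n = len(types)
    c = [[v(types[i], z[i]) - v(types[i], z[j]) for j in range(n)] for i in range(n)]
    d = [row[:] for row in c]          # note c[i][i] == 0
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    if any(d[i][i] < -tol for i in range(n)):
        return None                    # a negative cycle: condition (ii) fails
    # u(x) = cheapest cost from x to the reference agent 0, so u(x) - u(y) <= c(x, y).
    return [d[i][0] for i in range(n)]


def is_incentive_compatible(v, types, z, u, tol=1e-12):
    """Check (5.8.1): honesty is at least as good as posing as any other agent."""
    n = len(types)
    return all(
        v(types[i], z[i]) - u[i] >= v(types[i], z[j]) - u[j] - tol
        for i in range(n) for j in range(n)
    )


def v(x, z):
    # Hypothetical utility, bilinear in type and allocation.
    return x * z


types = [1.0, 2.0, 3.0]
z_increasing = [0.0, 1.0, 2.0]   # higher types receive more
z_decreasing = [2.0, 1.0, 0.0]   # higher types receive less

u_good = rationalizing_transfer(v, types, z_increasing)
u_bad = rationalizing_transfer(v, types, z_decreasing)
print(u_good, u_bad)
```

With these hypothetical data the increasing profile admits a transfer function, while the decreasing one does not, because the cycle through the first and third agents has a negative sum.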

6 Mass Transshipment Problems and Ideal Metrics

The Kantorovich–Rubinstein mass transshipment problem is to minimize the total cost ∫_{IR^{2n}} c(x, y) db(x, y) over all transshipments b that satisfy the balancing condition b(· × IR^n) − b(IR^n × ·) = (P − Q)(·); P and Q are viewed as initial and final mass distributions, respectively, and c(x, y) is a cost function. The duality theory associated with this was presented in Chapters 4 and 5. In this chapter we shall study transshipment problems based on higher-order marginal differences. On the real line we specify the difference between the rate of transshipment of the initial mass, (d/dx) π1 b((0, x]), and the rate of completing the final mass, (d/dx) π2 b((0, x]),

\[
\frac{d}{dx}\, \pi_1 b((0, x]) - \frac{d}{dx}\, \pi_2 b((0, x]) = (x),
\]

and we consider its generalization to IR^k. It turns out that one can construct a new class of ideal metrics from this idea, and several well-known metrics allow a representation as the solution of a Kantorovich–Rubinstein problem (KRP) with smooth transshipment plans. A representation of this type allows the transfer (easy to establish) of inequalities between metrics to the corresponding inequalities for minimal metrics.(1)

(1) The results in this chapter are due to Rachev (1991b), Hanin and Rachev (1994, 1995), Ignatov and Rachev (1986), Rachev and Rüschendorf (1991a).

6.1 Kantorovich–Rubinstein Problems with Constraints on the Rate of Transshipments: Applications to the Theory of Probability Metrics

The classical Kantorovich–Rubinstein problem (KRP) of transshipment of an initial mass P to its final destination Q deals with the minimal cost involved in the optimal transshipment:

\[
\text{minimize } \int c(x, y)\, b(dx, dy) \tag{6.1.1}
\]

subject to

\[
\pi_1 b - \pi_2 b = P - Q; \tag{6.1.2}
\]

cf. Section 4.1. In the above formulation P and Q may be viewed as probabilities on IR_+, c(x, y) as a continuous nonnegative (cost) function, and the b’s representing the admissible transshipment plans as nonnegative Borel measures on IR²_+ having marginals π1 b and π2 b. Thus the KRP on the real line is to minimize the total cost ∫ c db over all transshipments b that satisfy the balancing condition b(A × IR_+) − b(IR_+ × A) = (P − Q)(A) for any Borel set A on IR_+. If P and Q are discrete measures, the KRP becomes the well-known transshipment network flow problem (see Berge and Ghouila-Houri (1965, Section 9.8)).

Suppose now we want to minimize the total cost ∫ c db under balancing conditions on the rate of transshipment. The motivation for this problem comes from the interpretation of the KRP as a multistage transportation problem (cf. Kemperman (1983), Rachev and Rüschendorf (1991a), Rachev and Taksar (1992)). Similarly to the constraints (6.1.2), that is,

\[
\pi_1 b((0, x]) - \pi_2 b((0, x]) = (P - Q)((0, x]), \qquad \forall x \in \mathbb{R}_+,
\]

we assume that the difference between the rate of transshipment of the inid tial mass dx π1 b((0, x]) and the rate of completing the final mass d π b((0, x]) is given by (x), x ∈ IR+ . (We shall study this problem and dx 2 its generalization to IRk .) On the real line the dual and explicit solutions of this problem are straightforward consequences of Theorem 6.1.3 (to be shown further on in this section): Namely,       d + (π1 b((0, x]) − π2 b((0, x])) = (x), ∀x ∈ IR c db; inf (6.1.3)   dx  2  IR+


=

=


      sup f d ; f exists a.e. and |f | ≤ 1 a.e.     IR+ +∞ x (t) dt dx.

−∞ −∞

Here the conditions on the set of constraints in the primal problem (6.1.3) are determined by the following assumptions on the function : (a) (0) = 0. (b) The total rate difference is zero,

(x) dx = 0.

IR+

(c) There are no “rate explosions” on intervals of considerable length; in other words, |(x)| dx < ∞. (6.1.4) IR+

The other motivation for the study of the KRP with smooth transportation plans b comes from the theory of probability metrics; see Zolotarev (1986) and Rachev (1991c). There are two basic directions in this theory: (a) to establish relationships between compound and simple metrics; (b) to determine a best (“ideal”) metric for a given stochastic approximation problem. In the previous example (6.1.3), if the cost c(x, y) equals the distance |x − y| and b is a probability on IR²_+, then the total cost ∫ c db may be viewed as a compound metric

\[
\tau(X, Y) = E|X - Y| \tag{6.1.5}
\]

between the random vectors X and Y having joint distribution b. On the other hand,

\[
\kappa(X, Y) = \kappa(P^X, P^Y) = \int_{\mathbb{R}_+} |P(X \le x) - P(Y \le x)|\, dx \tag{6.1.6}
\]

is a simple metric. We use the notions of compound and simple metrics to distinguish metrics determined by the joint law of the pair of random variables (cf. (6.1.5)) and metrics determined by the marginal distributions of random variables to be compared (cf. (6.1.6)).
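The distinction between (6.1.5) and (6.1.6) can be made concrete for empirical distributions. The sketch below is illustrative only: the sample points and the helper names `kappa` and `tau` are assumptions of this example. It evaluates κ from the two distribution functions alone and τ for two different couplings of the same marginals; pairing the sorted samples is a standard optimal coupling on the real line for the cost |x − y|.

```python
import math


def kappa(xs, ys):
    """Integral of |F_P(x) - F_Q(x)| dx for the empirical laws of xs and ys."""
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for a, b in zip(points, points[1:]):
        f_p = sum(x <= a for x in xs) / len(xs)
        f_q = sum(y <= a for y in ys) / len(ys)
        total += abs(f_p - f_q) * (b - a)   # both d.f.s are constant on [a, b)
    return total


def tau(pairs):
    """E|X - Y| when (X, Y) is uniform on the given list of pairs."""
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


# Hypothetical samples on IR_+ with equal weights.
xs = [0.0, 1.0, 4.0]
ys = [2.0, 3.0, 5.0]

sorted_coupling = list(zip(sorted(xs), sorted(ys)))
other_coupling = [(0.0, 5.0), (1.0, 2.0), (4.0, 3.0)]

print(kappa(xs, ys), tau(sorted_coupling), tau(other_coupling))
```

Both couplings have the same marginals, so κ is the same for both, whereas τ depends on the joint law and is never smaller than κ.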



The KRP provides an important relationship between τ and κ:

\[
\begin{aligned}
&\inf\left\{ \int_{\mathbb{R}^2_+} |x - y|\, b(dx, dy);\ \pi_1 b - \pi_2 b = P - Q \right\} \\
&\qquad = \inf\left\{ \alpha\, \tau(X, Y);\ X, Y \ge 0,\ \alpha > 0,\ \alpha (P^X - P^Y) = P - Q \right\} = \kappa(P, Q).
\end{aligned} \tag{6.1.7}
\]

The τ-metric arises also as a solution of the classical Monge–Kantorovich transportation problem (MKTP)

\[
\inf\left\{ \int_{\mathbb{R}^2_+} |x - y|\, b(dx, dy);\ \pi_1 b = P,\ \pi_2 b = Q \right\}
= \inf\{ \tau(X, Y);\ P^X = P,\ P^Y = Q \} = \kappa(P, Q). \tag{6.1.8}
\]

The above dual relationship is well known in the theory of probability metrics; it reads, “κ is a minimal metric with respect to τ.” In view of the problem of determining the exact rate of convergence in the central limit theorem, the notion of ideal metrics has been introduced. A compound metric µ is ideal of order r > 0 if for any X_i, Y_i and constants c_i,

\[
\mu\Bigl( \sum_i c_i X_i, \sum_i c_i Y_i \Bigr) \le \sum_i |c_i|^r \mu(X_i, Y_i). \tag{6.1.9}
\]

A simple metric µ(X, Y) = µ(P^X, P^Y) is ideal of order r if (6.1.9) holds for independent X_i’s and Y_i’s. Note that τ and κ are ideal metrics of order 1. There is no compound ideal metric of order greater than 1 (see Section 6.4), while the Zolotarev ζ_n-metric,

\[
\zeta_n(X, Y) := \int_{-\infty}^{+\infty} \left| \int_{-\infty}^{x} \frac{(x - t)^{n-1}}{(n - 1)!}\, (P^X - P^Y)(dt) \right| dx, \tag{6.1.10}
\]

is a simple ideal metric of order n ≥ 1. Clearly, ζ_1 = κ, and moreover, there is no compound metric Z_n such that ζ_n is minimal with respect to Z_n. In other words, ζ_n does not arise as a solution of a certain MKTP(2) (cf. (6.1.8)). We shall show, however, that ζ_n is a solution of KRP with smooth transshipment plans. We shall consider metrics between k-dimensional random vectors and study a new class of ideal metrics of order r = kn + 1, n = 0, 1, . . . . These metrics will appear as solutions of KRP with smooth transportation plans.

(2) We shall discuss this issue in Section 6.4 in full detail.

We next formulate and prove the duality theorem outlined in (6.1.3); see Rachev (1991b). Let M = M(IR^k) be the space of finite signed Borel measures on IR^k such that

(a) m(IR^k) = 0, |m| = m^+ + m^- < ∞; that is, the total mass of m is zero, and the total variation norm |m| is finite;

(b) \( \int_{\mathbb{R}^k} \prod_{i=1}^{k} x_i^{j}\, m(dx) = 0, \quad j = 1, \dots, n; \)

(c) \( \int_{\mathbb{R}^k} \prod_{i=1}^{k} |x_i|^{n}\, |m|(dx) < \infty. \)

For any integer n ≥ 0 and m ∈ M define a signed measure m_n, called the “nth integral” of m, as follows. On IR^k_−, m_n is determined by its distribution function (d.f.); for x_i ≤ 0, i = 1, . . . , k, define

\[
F_{m_n}(x_1, \dots, x_k) := \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_k} \prod_{i=1}^{k} \frac{(x_i - t_i)^n}{n!}\, m(dt_1, \dots, dt_k). \tag{6.1.11}
\]

For x ∈ IR^k_+ (k > 1) it is convenient to define the “nth integral” m_n via its “survival” function

\[
\overline{F}_{m_n}(x_1, \dots, x_k) := m_n\Bigl( \prod_{i=1}^{k} (x_i, \infty) \Bigr)
:= (-1)^k \int_{x_1}^{\infty} \cdots \int_{x_k}^{\infty} \prod_{j=1}^{k} \frac{(x_j - t_j)^n}{n!}\, m(dt_1, \dots, dt_k). \tag{6.1.12}
\]

Now, (6.1.11) and (6.1.12) give rise to the following general definition of m_n on IR^k: For x_j > 0 with j ∈ J ⊂ {1, . . . , k} and x_j ≤ 0 with j ∉ J we define

\[
F_n^{(J)}(x_1, \dots, x_k) := m_n(A_1 \times \cdots \times A_k)
:= (-1)^{|J|} \int_{A_1} \cdots \int_{A_k} \prod_{j=1}^{k} \frac{(x_j - t_j)^n}{n!}\, m(dt_1, \dots, dt_k). \tag{6.1.13}
\]


Here A_j := (x_j, ∞) for j ∈ J, and A_j := (−∞, x_j] otherwise. Note that for any n ≥ 1, m_n is absolutely continuous, and its density p_{m_n} is determined by (6.1.13). In fact, for n ≥ 1, the density admits the form

\[
p_{m_n}(x) = F_{n-1}^{(J)}(x), \tag{6.1.14}
\]

provided that x has positive components x_j > 0 for j ∈ J and nonpositive components x_j ≤ 0 for j ∉ J. The characterization provided in the next lemma will serve to describe the set of constraints in our new version of the MKTP.

Lemma 6.1.1 The measure m_n has total mass zero and finite total variation.

Proof: For x = (x_1, . . . , x_k) let us define for the sake of brevity

\[
\begin{gathered}
dx := dx_1 \cdots dx_k; \qquad (-\infty, x] := \prod_{i=1}^{k} (-\infty, x_i]; \qquad (x, \infty) := \prod_{i=1}^{k} (x_i, \infty); \\
\int_{-\infty}^{x} := \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_k}; \qquad \int_{x}^{\infty} := \int_{x_1}^{\infty} \cdots \int_{x_k}^{\infty}; \qquad 0 = (0, \dots, 0) \in \mathbb{R}^k; \quad \text{etc.}
\end{gathered}
\]

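We split the total mass of m_n over all 2^k quadrants in IR^k as follows:

\[
\begin{aligned}
m_n(\mathbb{R}^k) &= m_n((-\infty, 0]) + m_n\Bigl( (0, \infty) \times \prod_{i=2}^{k} (-\infty, 0] \Bigr) + \cdots + m_n((0, \infty)) \\
&=: I_1 + I_2 + \cdots + I_{2^k}.
\end{aligned} \tag{6.1.15}
\]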

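Since m_0 = m, let n > 0, and therefore for the first integral in (6.1.15) we have the representation

\[
\begin{aligned}
I_1 &= \int_{\mathbb{R}^k_-} \left( \int_{-\infty}^{x} \prod_{j=1}^{k} \frac{(x_j - t_j)^{n-1}}{(n-1)!}\, m(dt) \right) dx \\
&= \int_{\mathbb{R}^k_-} \left( \int_{t}^{0} \prod_{j=1}^{k} \frac{(x_j - t_j)^{n-1}}{(n-1)!}\, dx \right) m(dt)
= (-1)^{nk} \int_{\mathbb{R}^k_-} \prod_{j=1}^{k} \frac{t_j^n}{n!}\, m(dt).
\end{aligned} \tag{6.1.16}
\]

Similarly, we get for the second integral in (6.1.15)

\[
I_2 = (-1)^{nk} \int_{\mathbb{R}_+ \times \mathbb{R}_-^{k-1}} \prod_{j=1}^{k} \frac{t_j^n}{n!}\, m(dt).
\]

Continuing in the same fashion, we finally obtain the representation for the last term in (6.1.15):

\[
\begin{aligned}
I_{2^k} &= (-1)^{|J|} \int_{\mathbb{R}^k_+} d\left( \int_{x}^{\infty} \prod_{j=1}^{k} \frac{(x_j - t_j)^n}{n!}\, m(dt) \right) \\
&= (-1)^{nk} \int_{\mathbb{R}^k_+} \left( \int_{x}^{\infty} \prod_{j=1}^{k} \frac{(t_j - x_j)^{n-1}}{(n-1)!}\, m(dt) \right) dx \\
&= (-1)^{nk} \int_{\mathbb{R}^k_+} \left( \int_{0}^{t} \prod_{j=1}^{k} \frac{(t_j - x_j)^{n-1}}{(n-1)!}\, dx \right) m(dt)
= (-1)^{nk} \int_{\mathbb{R}^k_+} \prod_{j=1}^{k} \frac{t_j^n}{n!}\, m(dt).
\end{aligned} \tag{6.1.17}
\]

Next we combine (6.1.15)–(6.1.16) and use condition (a) to get

\[
m_n(\mathbb{R}^k) = (-1)^{nk} \int_{\mathbb{R}^k} \prod_{j=1}^{k} \frac{t_j^n}{n!}\, m(dt) = 0.
\]

Similar arguments yield

\[
|m_n|(\mathbb{R}^k) = \int_{\mathbb{R}^k} |p_{m_n}(x)|\, dx = \int_{\mathbb{R}^k} \prod_{j=1}^{k} \frac{|t_j|^n}{n!}\, |m|(dt) < \infty. \tag{6.1.18}
\]

This completes the proof of the lemma. □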

Let B_n(m) be the set of all nonnegative Borel measures on the product IR^k × IR^k such that

\[
b(A \times \mathbb{R}^k) - b(\mathbb{R}^k \times A) = m_n(A) \qquad \text{for all Borel sets } A \subset \mathbb{R}^k. \tag{6.1.19}
\]

From Lemma 6.1.2 we obtain the following characterization of the set B_n(m) in terms of the derivatives of the marginal differences.

Lemma 6.1.2 For any m ∈ M and n ∈ IN, B_n(m) consists of all nonnegative Borel measures on IR^k × IR^k with absolutely continuous marginal difference Δb(·) = b(· × IR^k) − b(IR^k × ·). Its density p_{Δb} satisfies the following balancing equation: The partial derivative of p_{Δb},

\[
\frac{\partial^{(n-1)k}}{\partial x_1^{n-1} \cdots \partial x_k^{n-1}}\, p_{\Delta b}(x),
\]

coincides with the d.f. of m.

Proof: In fact, from the definition of m_n (see (6.1.13)) it follows that the marginal difference Δb in (6.1.19) has density p_{Δb}. Moreover,

\[
\frac{\partial^{(n-1)k}}{\partial x_1^{n-1} \cdots \partial x_k^{n-1}}\, p_{\Delta b}(x) = F_m(x), \qquad x \in \mathbb{R}^k.
\]

To check the inverse we use (6.1.13) again in combination with conditions (a), (b), and (c). □

For any n ≥ 0 define the following version of the Kantorovich–Rubinstein seminorm

\[
\|m\|_n := \|m\|_{n,c} := \inf\left\{ \int_{\mathbb{R}^{2k}} c\, db;\ b \in B_n(m) \right\}. \tag{6.1.20}
\]

We shall call the problem of determining the minimal value (6.1.20) “the mass transshipment problem with constraints on the derivatives of the marginals.” Here the cost function c : IR^{2k} → IR_+ is assumed to satisfy the following regularity conditions:

(c1) c(x, y) = 0 if and only if x = y.

(c2) c(x, y) = c(y, x).

(c3) c(x, y) ≤ λ(x) + λ(y), where λ is a measurable function.

(c4) λ maps bounded sets to bounded sets.

(c5) sup{c(x, y); ‖x‖ < a, ‖x − y‖ < δ} tends to zero as δ → 0 for any a > 0.

Recall that the KRP deals with a particular case of the functional (6.1.20), namely with ‖m‖_0 (cf. (6.1.1), (6.1.2) and (6.1.20)). The seminorm ‖·‖_n may be viewed as an infinite-dimensional network flow problem (or mass transshipment problem) with constraints on the derivatives of π1 b − π2 b. For example, for n = 1, (c) reads that “π1 b − π2 b has a density equal to the d.f. F_m of m ∈ M.”

Our objective is to obtain dual representations for ‖m‖_n. Let L be the space of functions g : IR^k → IR having finite Lipschitz norm

\[
\|g\|_{L,c} = \sup_{x \ne y} \frac{|g(x) - g(y)|}{c(x, y)}. \tag{6.1.21}
\]

Further, let L_n be the space of nth integrals of functions g from L; that is,

\[
g_n(x) := \int_{0}^{x} \prod_{j=1}^{k} \frac{(x_j - t_j)^{n-1}}{(n-1)!}\, g(t)\, dt, \quad x \in \mathbb{R}^k,\ n \ge 1, \qquad g_0 := g. \tag{6.1.22}
\]

Theorem 6.1.3 (Duality representation for ‖m‖_n) For any m ∈ M with ∫ λ d|m| < ∞,

\[
\|m\|_n = \sup_{f \in L_n} \left| \int f\, dm \right|. \tag{6.1.23}
\]

Proof: Lemma 6.1.1 and the Kantorovich–Rubinstein theorem (see Rachev and Shortt (1990, Theorem 2.6); see also Rachev (1991a, Section 5.3)) provide a dual representation for ‖m‖_n:

\[
\|m\|_n = \|m_n\|_0 = \sup_{g \in L} \left| \int g\, dm_n \right|. \tag{6.1.24}
\]

We next use integration by parts to get an alternative expression for the integral in (6.1.24). For any g ∈ L,

\[
\begin{aligned}
\int_{\mathbb{R}^k} g\, dm_n &= \int_{\mathbb{R}^k_-} g(x)\, d\left( \int_{-\infty}^{x} \prod_{j=1}^{k} \frac{(x_j - t_j)^n}{n!}\, m(dt) \right) + \cdots \\
&\quad + (-1)^k \int_{\mathbb{R}^k_+} g(x)\, d\left( \int_{x}^{\infty} \prod_{j=1}^{k} \frac{(x_j - t_j)^n}{n!}\, m(dt) \right) \\
&=: J_1 + \cdots + J_{2^k}.
\end{aligned}
\]

Without loss of generality we may assume that g has compact support (see Chapter 4). Otherwise, one can use a truncated version of g and then use conditions (b) and (c) to pass to the limit making use of the Lebesgue dominated convergence theorem. Instead of integrating by parts, we view the integral J_1 as a convolution of a measure µ_g (generated by the “density” g) and m_n. The commutative property of convolutions then gives

\[
J_1 = \int_{\mathbb{R}^k_-} \left( \int_{x}^{0} g(t)\, dt \right) dF_{m_{n-1}}(x) = (-1)^k \int_{\mathbb{R}^k_-} g_1(x)\, p_{m_{n-1}}(x)\, dx.
\]

Here g_1(x) = ∫_0^x g(t) dt, x ∈ IR^k, and p_{m_{n−1}} = F_{m_{n−2}} is the density of m_{n−1} on IR^k_−. Arguing in the same way, for any of the integrals J_j, we obtain

\[
\int_{\mathbb{R}^k} g\, dm_n = (-1)^k \int_{\mathbb{R}^k} g_1(x)\, p_{m_{n-1}}(x)\, dx = (-1)^k \int_{\mathbb{R}^k} g_1(x)\, F_{m_{n-2}}(x)\, dx.
\]

The total mass of m_{n−1} is zero, and |m_{n−1}| < ∞; this is a consequence of Lemma 6.1.1 and assumptions (a)–(c). With g_2(x) = ∫_0^x g_1(t) dt we use the above “convolution” argument to get

\[
\left| \int_{\mathbb{R}^k} g\, dm_n \right| = \left| \int_{\mathbb{R}^k} F_{m_{n-2}}(x)\, dg_2(x) \right| = \left| \int_{\mathbb{R}^k} g_2(x)\, dF_{m_{n-2}}(x) \right|.
\]

Continuing in the same way this gives finally

\[
\left| \int_{\mathbb{R}^k} g\, dm_n \right| = \left| \int_{\mathbb{R}^k} g_n\, dF_m \right|.
\]

This equality completes the proof of (6.1.23). □

Remark 6.1.4 (Closed formula for ‖m‖_n in the univariate case) The duality expression for the seminorm ‖m‖_n in (6.1.23) is remarkably simple in the univariate case. Let k = 1 and let the cost function c(x, y) have the form

\[
c(x, y) = |x - y| \max\bigl(h(|x - a|), h(|y - a|)\bigr), \tag{6.1.25}
\]

where h(t) is an increasing function on t ≥ 0 with h(t) > 0 for t > 0. If ∫ |x| h(|x − a|) |m|(dx) < ∞, then the theorem implies

\[
\begin{aligned}
\|m\|_n &= \sup_{g_n \in L_n} \left| \int_{\mathbb{R}} g_n\, dm \right| = \sup_{g \in L} \left| \int_{\mathbb{R}} g\, dm_n \right| \\
&= \sup\Biggl\{ \Biggl| \int_{-\infty}^{0} g(x)\, d\left( \int_{-\infty}^{x} \frac{(x - t)^n}{n!}\, dF_m(t) \right) - \int_{0}^{\infty} g(x)\, d\left( \int_{x}^{\infty} \frac{(x - t)^n}{n!}\, dF_m(t) \right) \Biggr|; \\
&\qquad\qquad g : \mathbb{R} \to \mathbb{R},\ g' \text{ exists a.e. and } |g'| \le h(|x - a|) \text{ a.e.} \Biggr\}.
\end{aligned} \tag{6.1.26}
\]

The last equality follows from Rademacher’s theorem (cf. Rachev and Shortt (1990, Theorem 3.1)). Condition (b) gives that

\[
\int_{\mathbb{R}} (x - t)^n\, dF_m(t) = 0 \qquad \text{for any } x \in \mathbb{R}, \tag{6.1.27}
\]

and therefore,

\[
\|m\|_n \le \int_{\mathbb{R}} \left| \int_{-\infty}^{x} \frac{(x - t)^n}{n!}\, dF_m(t) \right| h(|x - a|)\, dx. \tag{6.1.28}
\]

Choosing an “optimal” g in (6.1.26) by setting the first derivative

\[
g'(x) = h(|x - a|)\, \operatorname{sign}\left( \int_{-\infty}^{x} \frac{(x - t)^{n-1}}{(n-1)!}\, dF_m(t) \right),
\]

we conclude that (6.1.28) is valid with the equality sign:

\[
\|m\|_n = \|m\|_{n,c} = \int_{\mathbb{R}} \left| \int_{-\infty}^{x} \frac{(x - t)^n}{n!}\, dF_m(t) \right| h(|x - a|)\, dx. \tag{6.1.29}
\]

This is a closed-form expression for the optimal value of the mass transshipment problem with constraints on the marginal derivatives. If h(t) ≡ 1, i.e., c(x, y) = |x − y|, then this optimal value is a well-known metric. Namely, ‖m‖_{n,c} is the Zolotarev ideal metric of order n + 1:

\[
\|m\|_{n,c} = \zeta_{n+1}(m) := \int_{\mathbb{R}} \left| \int_{-\infty}^{x} \frac{(x - t)^n}{n!}\, dF_m(t) \right| dx, \qquad n \ge 0; \tag{6.1.30}
\]

cf. (6.1.10). The last equality shows that ζ_n does admit representation as a solution of a KRP, despite the fact that it has no representation as a minimal metric; that is, it is not a solution of an MKTP; see Section 6.4.
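The closed formula (6.1.30) is easy to evaluate for a discrete signed measure m on the real line. The sketch below is a hedged illustration: the function name `zeta`, the midpoint quadrature, the grid size, and the example measures are assumptions made here and are not part of the text. The measure is given by atoms t_i with weights w_i, so the inner integral becomes a finite sum over the atoms t_i ≤ x, and the moment conditions (a) and (b) make the integrand vanish outside the range of the atoms.

```python
from math import factorial


def zeta(atoms, weights, n, cells=20000):
    """Approximate zeta_{n+1}(m) from (6.1.30) for m = sum_i weights[i] * delta_{atoms[i]}.

    Midpoint rule on [min(atoms), max(atoms)]; assumes m satisfies the moment
    conditions, so the inner integral is zero outside this interval.
    """
    lo, hi = min(atoms), max(atoms)
    width = (hi - lo) / cells
    total = 0.0
    for i in range(cells):
        x = lo + (i + 0.5) * width
        inner = sum(w * (x - t) ** n for t, w in zip(atoms, weights) if t <= x)
        total += abs(inner) / factorial(n) * width
    return total


# n = 0: m = P - Q for two three-point empirical laws (total mass zero).
atoms0 = [0.0, 1.0, 4.0, 2.0, 3.0, 5.0]
weights0 = [1 / 3, 1 / 3, 1 / 3, -1 / 3, -1 / 3, -1 / 3]
zeta1 = zeta(atoms0, weights0, n=0)

# n = 1: m = delta_0 - 2 delta_1 + delta_2 has zero mass and zero first moment.
zeta2 = zeta([0.0, 1.0, 2.0], [1.0, -2.0, 1.0], n=1)

# The same measure with all atoms multiplied by 3.
zeta2_scaled = zeta([0.0, 3.0, 6.0], [1.0, -2.0, 1.0], n=1)

print(zeta1, zeta2, zeta2_scaled)
```

For n = 0 the value is the integral of |F_P − F_Q|, that is, κ(P, Q). In the second example the inner integral is the tent function on [0, 2], and rescaling the atoms by a factor 3 multiplies the result by 3², in line with the order n + 1 = 2.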

Remark 6.1.5 (The Fortet–Mourier metric) If c(x, y) = ‖x − y‖ max(1, ‖x‖^{p−1}, ‖y‖^{p−1}), then

\[
\|m\|_{p,c} = \sup\left\{ \left| \int_{\mathbb{R}^k} f\, dm \right|;\ |f(x) - f(y)| \le c(x, y) \right\}
\]

is known as the Fortet–Mourier metric (cf. Fortet and Mourier (1953), Rachev (1991)), and thus ‖m‖_{n,c} (n ≥ 0) may be viewed as a version of this metric.

Remark 6.1.6 (Bounds for ‖m‖_n in the multivariate case k > 1) Even in the bivariate case k = 2 and for n = 0, to find an explicit representation for the Kantorovich–Rubinstein norm

\[
\|m\|_0 = \sup_{f \in L} \left| \int f\, dm \right|
\]

is a well-known open problem (cf. Section 3.7, Dobrushin (1970), Rachev (1984d), Levin and Rachev (1989)). Nevertheless, the dual form of ‖m‖_n provides bounds from above for ‖m‖_n. Since g(0) = 0, g ∈ L ⇒ |g(x)| ≤ c(x, 0), we have for n ≥ 1 the following bound for ‖m‖_n:

\[
\begin{aligned}
\|m\|_n &= \sup_{g \in L,\ g(0) = 0} \left| \int g\, dm_n \right| \\
&\le \int_{-\infty}^{0} \cdots \int_{-\infty}^{0} c(x, 0) \left| \int_{-\infty}^{x} \prod_{j=1}^{k} \frac{(x_j - t_j)^{n-1}}{(n-1)!}\, m(dt) \right| dx \\
&\quad + \int_{0}^{\infty} \int_{-\infty}^{0} \cdots \int_{-\infty}^{0} c(x, 0) \left| \int_{x_1}^{\infty} \int_{-\infty}^{x_2} \cdots \int_{-\infty}^{x_k} \prod_{j=1}^{k} \frac{(x_j - t_j)^{n-1}}{(n-1)!}\, m(dt) \right| dx \\
&\quad + \cdots + \int_{0}^{\infty} \cdots \int_{0}^{\infty} c(x, 0) \left| \int_{x}^{\infty} \prod_{j=1}^{k} \frac{(x_j - t_j)^{n-1}}{(n-1)!}\, m(dt) \right| dx \\
&=: \kappa_n(m).
\end{aligned}
\]

Clearly, κ_n admits a bound from above by the following absolute pseudomoment:

\[
\kappa_n(m) \le \chi_{n,c}(m) := \int_{\mathbb{R}^k} c(x, 0)\, |x_1 \cdots x_k|^n\, |m|(dx);
\]

χ_{n,c}(m) are called absolute pseudomoments. They generate the metrics χ_{n,c}(P_1 − P_2) in the space of probability measures; see Zolotarev (1986).

In particular, if c(x, y) = ‖x − y‖, then χ_{n,c}(m) < C_{‖·‖} χ_{n+1}(m), where C_{‖·‖} is a constant depending on the norm in IR^k and χ_n(m) is the nth absolute pseudomoment

\[
\chi_n(m) = \int_{\mathbb{R}^k} \|x\|^n\, |m|(dx).
\]

For n = 0 we have in a similar way ‖m‖_0 ≤ χ_{0,c}(m).

Remark 6.1.7 (Ideal metrics) If c(x, y) = ‖x − y‖, then ‖m‖_n defines an ideal metric of order r = kn + 1 (for the definition cf. (6.1.9)),

\[
Z_n(X, Y) = \|P^X - P^Y\|_n. \tag{6.1.31}
\]

For the proof, it is enough to show that

\[
Z_n(X + Z, Y + Z) \le Z_n(X, Y) \tag{6.1.32}
\]

for any Z independent of X and Y, and

\[
Z_n(cX, cY) = |c|^{kn+1} Z_n(X, Y). \tag{6.1.33}
\]

Both properties (6.1.32) and (6.1.33) follow readily from the dual representation for ‖m‖_n given by our theorem.

6.2 Problems with Constraints on the κth Difference of Marginals: The Compact Space Settings

A disadvantage of the metric Z_n (see (6.1.33)) is that the ideality order depends on the dimension of the r.v.s X and Y.(3) To avoid this dependence we consider in this and in the next section the following variant of ‖·‖_k (cf. (6.1.20)):

\[
\|\mu\|_r = \inf_{\psi \in \Gamma_\mu} \int \|h\|^r\, d|\Psi|(x, h), \qquad k - 1 \le r \le k,\ k \in \mathbb{N}. \tag{6.2.1}
\]

Here µ is a Borel measure on IR^n such that

\[
\int_{\mathbb{R}^n} x_1^{\alpha_1} \cdots x_n^{\alpha_n}\, d\mu(x_1, \dots, x_n) = 0, \qquad \alpha_1 + \cdots + \alpha_n \le k - 1. \tag{6.2.2}
\]

(3) This drawback of Z_n is well pronounced in the theory of probability metrics, where ideal metrics are used in the rate of convergence problems for multivariate limit theorems.

In (6.2.1), Γ_µ stands for the set of transshipment plans satisfying the balancing condition

\[
\int_{\mathbb{R}^k} f\, d\mu = \int_{\mathbb{R}^{2k}} \Delta_h^k f(x)\, d\Psi(x, h),
\]

where Δ_h^k f is the kth difference of f with step h. The dual representation for ‖µ‖_r will be given by

\[
\|\mu\|_r = \sup\left\{ \left| \int f\, d\mu \right|;\ \omega_k(f; t) \le t^r \right\}. \tag{6.2.3}
\]

Here ω_k(f; t) is the kth modulus of continuity of f. The results of this and the next section are due to Hanin and Rachev (1994, 1995). We start with some notation and preliminary results.

For a multi-index α = (α_1, . . . , α_n) ∈ ZZ^n_+ = ({0} ∪ IN)^n and x = (x_1, . . . , x_n) ∈ IR^n set x^α = x_1^{α_1} · · · x_n^{α_n}. For fixed k ∈ IN let M^0 = M^0_k be the linear space of all finite signed measures µ on IR^n having compact support and such that ∫ x^α µ(dx) = 0 for all α with |α| ≤ k − 1. In the next section we shall drop the restriction on the support of µ. For each µ ∈ M^0, let Γ(µ) = Γ_k(µ) be the set of all finite signed measures ψ on IR^{2n} such that

\[
\int_{\mathbb{R}^n} f(x)\, d\mu(x) = \int_{\mathbb{R}^{2n}} \Delta_h^k f(x)\, d\psi(x, h)
\]

for all f ∈ C_b(IR^n); here

\[
\Delta_h^k f(x) = \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} f(x + ih)
\]

is the kth difference of f at the point x with step h.

Lemma 6.2.1 For every µ ∈ M^0, Γ(µ) is nonempty.

Proof: Suppose the support of µ ∈ M^0 lies in a compact interval Q = ⊗_{i=1}^n [a_i, a_i + d] ⊂ IR^n. Let $\overset{\circ}{C}(Q)$ be the quotient of the space C(Q) of continuous functions on Q modulo P_{k−1}, the polynomials on IR^n of degree less than k. The norm in $\overset{\circ}{C}(Q)$ is given by the usual factor-norm

\[
E_{k-1}(f; Q) := \inf\{ \|f - P\|_Q;\ P \in \mathcal{P}_{k-1} \}.
\]

Define the linear mapping $\tau : \overset{\circ}{C}(Q) \to C(\widetilde{Q})$, $\tau \overset{\circ}{f} := \Delta^k f$. Here $\overset{\circ}{f}$ stands for the element of the factor space $\overset{\circ}{C}(Q)$ generated by f,

\[
\widetilde{Q} := \{ (x, h);\ x, h \in \mathbb{R}^n,\ x + ih \in Q \text{ for } i = 0, 1, \dots, k \},
\]

and finally, Δ^k f is the difference operator

\[
\Delta^k f := \{ \Delta_h^k f(x);\ (x, h) \in \widetilde{Q} \}.
\]

The proof of Lemma 6.2.1 will rely on the proof of two claims given below.

Claim 6.2.2 (The mapping τ is injective) For any bounded function f on Q,

\[
\Delta^k f \equiv 0 \iff f \in \mathcal{P}_{k-1}. \tag{6.2.4}
\]

Proof of Claim 6.2.2: The implication “⇐” is obvious. The proof of “⇒” is given by the following proposition. In what follows, A = A(n, k) denotes an absolute constant that can be different in different places. Here we need a proposition that plays a crucial role in our further analysis.

Proposition 6.2.3 For any Lebesgue measurable and almost everywhere (a.e.) bounded function f on a compact interval Q = ⊗_{i=1}^n [a_i, a_i + d], the following inequality holds:

\[
E_{k-1}(f) \le A\, \omega_k(f; d/k),
\]

where E_{k−1} designates the factor-norm

\[
E_{k-1}(f) := E_{k-1}(f; Q) = \inf\{ \|f - P\|_Q;\ P \in \mathcal{P}_{k-1} \},
\]

and ω_k is the kth modulus of continuity,

\[
\omega_k(f; t) = \omega_k(f; Q; t) := \sup\{ |\Delta_h^k f(x)|;\ (x, h) \in \widetilde{Q},\ |h| \le t \}, \qquad t \ge 0.
\]

In particular, Δ^k f ≡ 0 implies that f ∈ P_{k−1}.

Proof of Proposition 6.2.3: The proof follows Brudnii (1970). Without loss of generality we may assume that a_i = 0 and d = 1. Suppose also that k ≥ 2.

Case n = 1. Set δ_i to be the unit mass at i ∈ IN, and define the finite signed measures

\[
\mu = \Delta^k_{0;1} = \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} \delta_i \qquad \text{and} \qquad \nu = \delta_0 - (-1)^k \mu.
\]

Then

(i) supp µ = {0, 1, . . . , k}, supp ν = {1, . . . , k};

(ii) \( \int_{\mathbb{R}} f(x + th)\, \mu(dt) = \Delta_h^k f(x) \) for any f on IR.

In particular, if f(t) = t^i, x = 0, and h = 1 in (ii), we have

(iii) \( \int_{\mathbb{R}} t^i\, d\mu(t) = \begin{cases} 0, & i = 0, \dots, k - 1, \\ k!, & i = k. \end{cases} \)
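Define the (k − 1)th integral of the d.f. F_µ(τ) := µ((−∞, τ]), τ ∈ IR, by

\[
\mu_k(\tau) = (-1)^k \int_{-\infty}^{\tau} \frac{(\tau - t)^{k-1}}{(k-1)!}\, d\mu(t), \qquad \tau \in \mathbb{R};
\]

that is,

\[
\frac{\partial^{k-1}}{\partial \tau^{k-1}}\, \mu_k(\tau) = (-1)^k F_\mu(\tau).
\]

Let us recall now (i)–(iii) to get

(iv) supp µ_k ⊂ [0, k],

(v) \( \int_{\mathbb{R}} \mu_k(\tau)\, d\tau = 1. \)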

Next, to any Lebesgue measurable and a.e. bounded f on [0, 1] we associate f(x) =

f (x + tτ k −3 )µk (τ ) dν(t) dτ,

x ∈ [0, 1 − 1/k];

IR IR

Note that f is well-defined in view of (i) and (iv). Our next claim is that f possesses a kth derivative f (k) and

ess sup f (k) < ∞.

(6.2.5)

To see this, we extend f on the entire real line, setting f (x) = 0 for x ∈ [0, 1]. Now we have f(x) = k 3

k f (y)µk (k 3 (y − x)/t) dy 1 IR

1 dν(t). t



Therefore, by the definition of µk , k

f(k−1) (x)

(k−1)

k−1 3k

=

(−1)

k

f (y)µk

(k 3 (y − x)/t) dy t−k dν(t)

1 IR

k

f (y)Fµ (k 3 (y − x)/t) dy t−k dν(t)

−k 3k

=

1 IR

k

∞



−k 3k

=



 f (y) dy  dµ(τ ) t−k dν(t).

x+tτ k−3

1 IR

The derivative f(k) exists a.e. on [0, 1 − 1/k], and in view of (ii), (k)

f

k (x) =

k

3k

f (x + tτ k −3 ) dµ(τ ) t−k dν(t)

1 IR

k 3k k ktk−3 f (x)t−k dν(t).

=

1

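Finally, observe that the boundedness of f implies $\operatorname{ess\,sup}|\tilde f^{(k)}| < \infty$, and thus the proof of (6.2.5) is complete. Recall that $\nu = \delta_0 - (-1)^k\mu$ and $\int_{\mathbb{R}} \mu_k(\tau)\,d\tau = 1$; then
$$\begin{aligned}
\tilde f(x) &= f(x) - (-1)^k \int_{\mathbb{R}}\int_{\mathbb{R}} f(x + t\tau k^{-3})\,d\mu(t)\,\mu_k(\tau)\,d\tau \\
&= f(x) - (-1)^k \int_{\mathbb{R}} \Delta^k_{\tau k^{-3}} f(x)\,\mu_k(\tau)\,d\tau.
\end{aligned}$$
Therefore, $|\tilde f(x) - f(x)| \le A\omega_k(f; k^{-2}) \le A\omega_k(f; k^{-1})$ for x ∈ [0, 1 − 1/k] with an absolute constant A equal to $\int_0^k |\mu_k(\tau)|\,d\tau$. Using a Taylor expansion,
$$|\tilde f(x) - P(x)| \le \frac{x^k}{k!}\operatorname*{ess\,sup}_{\vartheta \in [0, x]} |\tilde f^{(k)}(\vartheta)| \le A\omega_k(f; k^{-2}) \le A\omega_k(f; k^{-1}), \quad x \in [0, 1 - 1/k],$$
where $P(x) = \sum_{i=0}^{k-1} \frac{\tilde f^{(i)}(0)}{i!}\,x^i$. The bounds for $|f - \tilde f|$ and $|\tilde f - P|$ imply
$$|f(x) - P(x)| \le A\omega_k(f; 1/k) \quad \text{for } x \in [0, 1 - 1/k].$$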



Next, let x ∈ (1 − 1/k, 1]. Put h = x/k and observe that h ∈ (0, 1/k] and ih ∈ [0, 1 − 1/k] for all i = 0, 1, . . . , k − 1. Further,
$$\Delta^k_h f(0) = \Delta^k_h (f - P)(0) = (f - P)(x) + \sum_{i=0}^{k-1} (-1)^{k-i}\binom{k}{i}(f - P)(ih),$$
and therefore, due to the choice of h,
$$|f(x) - P(x)| \le |\Delta^k_h f(0)| + A \sup_{y \in [0, 1 - 1/k]} |f(y) - P(y)| \le A\omega_k(f; 1/k) \quad \text{for } x \in (1 - 1/k, 1].$$

Combining the estimates for |f − P| on [0, 1], we obtain the desired result, $E_{k-1}(f) \le \|f - P\|_\infty \le A\omega_k(f; 1/k)$, for the univariate case n = 1.

Case n ≥ 2. We need the following bound: For any $f \in L_\infty([0, 1]^n)$ there exists $P \in P_{(k-1)n}$ such that
$$\|f - P\|_\infty \le A\omega_k(f; 1/k). \tag{6.2.6}$$
We shall use the result for n = 1 that we have shown already. For any $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ fix $x' = (x_2, \dots, x_n)$. Then, as we have shown, there exists $P_1(x) = P_1(x_1, x')$ that for any $x' \in [0, 1]^{n-1}$ is a polynomial of degree less than k with respect to $x_1$; let us denote it by
$$P_1(x_1, x') = \sum_{i=0}^{k-1} c_i(x')\,x_1^i.$$
Moreover, $P_1(x_1, x')$ has the property that for all $x' \in [0, 1]^{n-1}$, $x_1 \in [0, 1]$, and with $e_1 := (1, 0, \dots, 0) \in \mathbb{R}^n$,
$$|f(x_1, x') - P_1(x_1, x')| \le A \sup\{|\Delta^k_{h e_1} f(t, x')|;\ t,\ t + kh \in [0, 1]\}.$$
Therefore, $\|f - P_1\|_\infty \le A\omega_k(f; 1/k)$. The mapping $f \to P_1$ defines an operator $T_1 : L_\infty \to L_\infty$, where $L_\infty$ is indeed the $L_\infty$-space of functions on $[0, 1]^n$ with ess sup norm $\|\cdot\|_\infty$. The above inequality shows that $T_1$ is bounded. The construction of $P_1$ as a polynomial with coefficients
$$c_i(x') = \frac{1}{i!}\,\frac{\partial^i \tilde f}{\partial x_1^i}(0, x')$$
implies that $T_1$ is also a linear operator. In a similar way we determine polynomials $P_i$ and operators $T_i$, i = 2, . . . , n. Define $P := Tf := T_n T_{n-1} \cdots T_1 f$, a polynomial on $\mathbb{R}^n$ of degree ≤ (k − 1)n. As is readily seen from our construction, the operators $T_i$ (i = 1, . . . , n) commute, and thus,
$$\begin{aligned}
f - P &= (f - T_1 f) + (T_1 f - T_2 T_1 f) + \cdots + (T_{n-1} \cdots T_1 f - T_n \cdots T_1 f) \\
&= (f - T_1 f) + T_1(f - T_2 f) + \cdots + T_{n-1} \cdots T_1 (f - T_n f).
\end{aligned}$$
This representation yields the desired estimate: For any $f \in L_\infty$, there exists a $P \in P_{(k-1)n}$ such that
$$\|f - P\|_\infty \le \max_{1 \le i \le n} \|f - T_i f\|_\infty \big(1 + \|T_1\| + \|T_1\|\,\|T_2\| + \cdots + \|T_1\| \cdots \|T_{n-1}\|\big) \le A\omega_k(f; 1/k),$$
which proves (6.2.6).

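On the factor space $\mathring{L}_\infty := L_\infty / P_{k-1}$ define the seminorm $\|\mathring f\|_k := \omega_k(f; 1/k)$, $\mathring f \in \mathring{L}_\infty$, where $\mathring f$ is f modulo polynomials of degree less than k. Now let us show that
$$(\mathring{L}_\infty, \|\cdot\|_k) \text{ is a Banach space.} \tag{6.2.7}$$
First let us show that $\|\cdot\|_k$ is a norm in $\mathring{L}_\infty$. For $f \in L_\infty$, invoking (6.2.6),
$$\|\mathring f\|_k = 0 \iff \Delta^k_h f(x) = 0 \ \text{ for all } x \in [0, 1]^n,\ h \in \mathbb{R}^n \implies f \in P_{(k-1)n}.$$
We need the following multivariate analogue of the difference operator $\Delta^k_h$. For any $h_1, \dots, h_k \in \mathbb{R}^n$, set $\Delta^k_{h_1, \dots, h_k} := \Delta^1_{h_1}\Delta^1_{h_2} \cdots \Delta^1_{h_k}$, and for $h_1 = \cdots = h_k = h$ we write $\Delta^k_h$. Let $(e_1, \dots, e_n)$ be the standard basis in $\mathbb{R}^n$. For a multi-index $\alpha \in \mathbb{Z}^n_+$ and $u \in \mathbb{R}^n$ set $\Delta^\alpha_u := \Delta^{\alpha_1}_{u_1 e_1}\Delta^{\alpha_2}_{u_2 e_2} \cdots \Delta^{\alpha_n}_{u_n e_n}$.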



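Let S(h) be the shift operator S(h)(f) = f(· + h) and set $\Delta(h) = \Delta^1_h := S(h) - S(0)$. Then (cf. Johnen and Scherer (1977))

(a) $\displaystyle \prod_{j=1}^{k} \Delta(h_j) = \sum_{\mathbf{k} \in K} (-1)^{k - |\mathbf{k}|}\, S\Big(\sum_{j \notin \mathbf{k}} h_j\Big)\, \Delta^k\Big(\sum_{j \in \mathbf{k}} h_j / j\Big);$

here K is the set of all subsets $\mathbf{k}$ of {1, 2, . . . , k}, $|\mathbf{k}| = \operatorname{card}\mathbf{k}$, and
$$x_{\mathbf{k}} = \sum_{j \notin \mathbf{k}} h_j, \qquad u_{\mathbf{k}} = \sum_{j \in \mathbf{k}} \frac{h_j}{j}, \qquad \mathbf{k} \in K.$$

Remark 6.2.4 Recall that $\Delta^k_h f(x) = \sum_{i=0}^{k} (-1)^{k-i}\binom{k}{i} f(x + ih)$ defines a measure $\Delta^k_{x;h} = \sum_{i=0}^{k} (-1)^{k-i}\binom{k}{i}\delta_{x + ih}$. In the same way, $\Delta^k_{h_1, \dots, h_k}$ and $\Delta^\alpha_u$ determine measures $\Delta^k_{x; h_1, \dots, h_k}$ and $\Delta^\alpha_{x;u}$ for any $x \in \mathbb{R}^n$. As a consequence of (a) we have that for any $x, h_1, \dots, h_k \in \mathbb{R}^n$,

(b) $\displaystyle \Delta^k_{x; h_1, \dots, h_k} = \sum_{\mathbf{k} \in K} (-1)^{k - |\mathbf{k}|}\, \Delta^k_{x + x_{\mathbf{k}};\, u_{\mathbf{k}}}.$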
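Identity (b) can be tested exactly on the real line with rational arithmetic. The sketch below is only an illustration under stated assumptions: the reading $x_{\mathbf{k}} = \sum_{j \notin \mathbf{k}} h_j$, $u_{\mathbf{k}} = \sum_{j \in \mathbf{k}} h_j/j$ is the one used here, the dimension is n = 1, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def iterated_difference(f, x, k, h):
    """k-th difference with one step h: sum_i (-1)^(k-i) C(k,i) f(x + i h)."""
    return sum((-1) ** (k - i) * comb(k, i) * f(x + i * h) for i in range(k + 1))

def mixed_difference(f, x, steps):
    """Delta_{h_1} ... Delta_{h_k} f(x), expanded as a signed sum over subsets of steps."""
    k = len(steps)
    total = Fraction(0)
    for size in range(k + 1):
        for subset in combinations(range(k), size):
            shift = sum((steps[j] for j in subset), Fraction(0))
            total += (-1) ** (k - size) * f(x + shift)
    return total

def subset_expansion(f, x, steps):
    """Right-hand side of (b): signed sum of k-th differences with a single step."""
    k = len(steps)
    total = Fraction(0)
    for size in range(k + 1):
        for subset in combinations(range(k), size):
            inside = set(subset)
            # x_k sums the steps outside the subset; u_k sums h_j / j inside it
            # (steps are 0-indexed here, hence the divisor j + 1).
            x_k = sum((steps[j] for j in range(k) if j not in inside), Fraction(0))
            u_k = sum((steps[j] / (j + 1) for j in inside), Fraction(0))
            total += (-1) ** (k - size) * iterated_difference(f, x + x_k, k, u_k)
    return total

f = lambda t: 1 / (1 + t * t)          # any function works; rationals keep it exact
x0 = Fraction(1, 3)
steps = [Fraction(1, 2), Fraction(-2, 3), Fraction(5, 7)]
lhs = mixed_difference(f, x0, steps)
rhs = subset_expansion(f, x0, steps)
```

With exact fractions the two sides agree identically, which is a useful sanity check on the index conventions before the identity is used to bound $\|\Delta^\beta_{x;d}\|_{k,\varphi}$.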

To obtain (a) for i = 0, . . . , k, we show that k +

((i − j)hj ) =

j=1

k +

(S((i − j)hj ) − S(0))

j=1

=

k

k−m

(−1)

m=1

=

j1 <j2 0, α α t f (x) := h1 ,...,hk f (x) = 0.

The above implies that for any α with |α| ≥ k, we have $\Delta^\alpha_t f(x) = 0$, and therefore,
$$D^\alpha f(0) = \lim_{t \to 0} \frac{\Delta^\alpha_t f(0)}{t^{|\alpha|}} = 0, \quad |\alpha| \ge k.$$

◦

In other words, f ∈ Pk−1 , and hence f = 0. This proves that · k is a norm in L∞ /Pk−1 . ◦ ◦ Next let us show that L∞ , · k is a complete space. Let fm be a Cauchy sequence with respect to · k . Using (6.2.6),

E(k−1)n (fm − f ) ≤ ≤

m∈IN

Aωk (fm − f ; 1/k) ◦

◦

A fm − fn k → 0 as m, → ∞.

◦ The space L∞ , E(k−1)n is complete, and thus there exist g ∈ L∞ and a sequence Pm ∈ P(k−1)n , m ∈ IN, such that fm −Pm −g ∞ → 0 as m → ∞.



For any (x, h) ∈ In , I := [0, 1], i.e., (x, h) ∈ IRn with x + ih ∈ [0, 1]n , i = 1, . . . , k,

|kh Pm (x)|

≤

|kh (fm − Pm − g)(x)| + |kn g(x)| + |kh fm (x)|

≤

2k ||fm − Pm − g||∞ + 2k ||g||∞ + ωk (fm ; 1/k).

The Cauchy sequence {fm } is bounded, and therefore M := sup

sup

m (x,h)∈In

|kh Pm (x)| < ∞.

(m) α k Let Pm (x) = |α|≤(k−1)n Cα x . Then h Pm , being a polynomial of 2n IR , has the form Sm (x, h) := kh Pm (x) =

Cα(m)

|α|≤(k−1)n

=

k

(−1)k−i

i=0

k (x + ih)α i

(m) aβ,γ xβ hγ ,

|β|+|γ|≤(k−1)n

where k β+γ k−i k i|γ| (−1) γ i i=0 (m) β + γ k1 t|γ| (0). cβ+γ γ

(m) aβ,γ

= =

(m) cβ+γ

For x and h close to zero, |Sm (x, h)| is bounded by M , and thus the Markov inequality yields (m)

|aβ,γ | =

Dxβ Dhγ Sm (0)| ≤ AM, β!γ!

|β| + |γ| ≤ (k − 1)n.

Let α ∈ ZZn+ , |α| ≥ k. Setting γ ≤ α with |γ| = k, we obtain from the last two relations that (m)

|Cα(m) | =

|aα−γ,γ | α ≤ AM, γ k!

(m) α Set Pm (x) = k≤|α|≤(k−1)n cα x .

k ≤ |α| ≤ (k − 1)n.



(m)

The boundedness of the coefficients cα for k ≤ |α| < (k − 1)n implies the existence of a subsequence {mj } ⊂ {m} that converges uniformly on bounded sets to a polynomial R. Next, take f = g + R. The equality kh Pm = kh Pm implies that |kh fmj −f (x)| ≤ |kh fmj − g− Pmj (x)| + kh Pmj − R (x) ≤ 2k fmj − g− Pmj ∞ +2k Pmj − R ∞ , ◦

◦

(x, h) ∈ In .

◦

We see that fmj − f k → 0, and since {fm } is · k -fundamental, ◦ ◦ ◦ fm converges to f. Thus, L∞ , · k is complete. The proof of (6.2.7) is complete. Combining (6.2.6) and (6.2.7), we use the equivalence of the two Banach 2 norms Ek−1 and · k to complete the proof of Proposition 6.2.3. Using Proposition 6.2.3, we have the bound ◦

Ek−1 (f ) ≤ A f k = Aωk (f ; 1/k),

f ∈ L∞ .

Therefore the relation of equivalence (6.2.4) is verified, completing the proof of Claim 6.2.2. 2 ◦ Claim 6.2.5 On L = τ C(Q) the inverse mapping τ −1 is continuous. Proof of Claim 6.2.5: In fact, for any f ∈ C(Q), Ek−1 (f ; Q) ≤ Aωk (f ; Q; d/k) (cf. Proposition 6.2.3), where ωk stands for the uniform k-modulus of continuity |h| ≤ t , ωk (f ; Q; t) := sup |kh f (x)|; (x, h) ∈ Q,

t ≥ 0.

0 The right-hand side of the above inequality is bounded by A sup |kh f (x)|; ◦ =: A τ f . Therefore, τ −1 ≤ A, which proves Claim 6.2.5. (x, h) ∈ Q Q 2 We are now ready to complete the proof of Lemma 6.2.1. The measure µ ∈ M 0 viewed as an element on the conjugate space C(Q)∗ induces a linear functional µ ◦ τ −1 on L with norm bounded by A Var µ. The Hahn–Banach theorem provides the extension of the linear ◦



and so defines a measure ψµ on Q with functional to the whole C(Q) Var ψµ = µ ◦ τ −1 ≤ A Var µ. Then for all f ∈ C(Q),

f dµ =

IRn

f dµ =

◦

Q

Q

kh f (x) dψµ (x, h).

(τ f) dψµ = IR2n

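In other words, $\psi_\mu \in \Gamma_\mu$ as desired in Lemma 6.2.1, completing its proof. □

Define the generalized Kantorovich–Rubinstein norm $\|\mu\|_{k,\varphi}$, $\mu \in M^0$, for fixed $k \in \mathbb{N}$ and given nondecreasing function $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$, $\varphi(0+) = \varphi(0) = 0$, $\varphi(t) > 0$ for t > 0, by
$$\|\mu\|_{k,\varphi} := \inf_{\psi \in \Gamma_\mu} \int_{\mathbb{R}^{2n}} \varphi(|h|)\,d|\psi|(x, h), \quad \mu \in M^0.$$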

For k = 1 and $\varphi(x) = x$, $x \in \mathbb{R}_+$, the above functional is a straightforward analogue of the classical Kantorovich–Rubinstein functional; see Section 4.1. Lemma 6.2.1 guarantees that $\|\mu\|_{k,\varphi}$ is finite on $M^0$. Our main results, Theorems 6.2.14 and 6.2.15 below, deal with the dual representation for $\|\cdot\|_{k,\varphi}$.

For $k \in \mathbb{N}$ we define the generalized Lipschitz space $\Lambda^k_\varphi$ as the set of all locally bounded functions f on $\mathbb{R}^n$ such that for some M ≥ 0,
$$\omega_k(f; \mathbb{R}^n; t) \le M\varphi(t) \quad \text{for all } t \ge 0.$$
Define a seminorm in $\Lambda^k_\varphi$ by $\|f\|_{\Lambda^k_\varphi} = \inf M$. Then the factor space $\mathring{\Lambda}^k_\varphi = \Lambda^k_\varphi / P_{k-1}$ is a Banach space with norm $\|\mathring f\|_{\mathring{\Lambda}^k_\varphi} = \|f\|_{\Lambda^k_\varphi}$, $f \in \Lambda^k_\varphi$. The properties of the modulus $\omega_k$ provide that without loss of generality, we may assume that $\varphi(t)t^{-k}$ is a nonincreasing function on $\mathbb{R}_+$. Therefore, $\varphi(t) \ge \varphi(1)\min(1, t^k)$ for all t ≥ 0. Our objective now is the dual representation:
$$\|\mu\|_{k,\varphi} = \sup_{\|f\|_{\Lambda^k_\varphi} \le 1} \int_{\mathbb{R}^n} f\,d\mu \quad \text{for } \mu \in M^0.$$

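Lemma 6.2.6 $\|\cdot\|_{k,\varphi}$ is a norm on $M^0$.

Proof: Clearly, $\|\cdot\|_{k,\varphi}$ is a seminorm. Suppose $\|\mu\|_{k,\varphi} = 0$ for some $\mu \in M^0$. For any k-times differentiable f ($f \in C^k$) with compact support we have
$$|\Delta^k_h f(x)| \le A \max_{|\alpha| = k} \|D^\alpha f\|_\infty\,|h|^k,$$
and also, $|\Delta^k_h f(x)| \le 2^k \|f\|_\infty$. Combining these two bounds and recalling that $\varphi(t) \ge A(1 \wedge t^k)$, we get, for any $\psi \in \Gamma_\mu$,
$$\begin{aligned}
\Big|\int_{\mathbb{R}^n} f\,d\mu\Big| &= \Big|\int_{\mathbb{R}^{2n}} \Delta^k_h f(x)\,d\psi(x, h)\Big| \le \int_{\mathbb{R}^{2n}} |\Delta^k_h f(x)|\,d|\psi|(x, h) \\
&\le A \max_{0 \le |\alpha| \le k} \|D^\alpha f\|_\infty \int_{\mathbb{R}^{2n}} (1 \wedge |h|^k)\,d|\psi|(x, h) \\
&\le A \max_{0 \le |\alpha| \le k} \|D^\alpha f\|_\infty \int_{\mathbb{R}^{2n}} \varphi(|h|)\,d|\psi|(x, h).
\end{aligned}$$
This completes the proof of the lemma, since $\|\mu\|_{k,\varphi} = 0$ implies $\int f\,d\mu = 0$ for all $f \in C^k$, and so µ = 0. □

If $\mu \in M^0$ and $\psi_\mu \in \Gamma_\mu$ is defined as in the proof of Lemma 6.2.1, and $\operatorname{supp}\mu \subset Q := \otimes_{i=1}^n [a_i, a_i + d]$, then
$$\|\mu\|_{k,\varphi} \le \int_{\widetilde{Q}} \varphi(|h|)\,d|\psi_\mu|(x, h) \le A\varphi(d)\operatorname{Var}\mu.$$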

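Next, each $f \in \Lambda^k_\varphi$ induces a linear form $L_f : M^0 \to \mathbb{R}$ defined by $L_f(\mu) = \int_{\mathbb{R}^n} f\,d\mu$. Recall that if f and g differ by a polynomial $P \in P_{k-1}$, then $L_f = L_g$. Given $\psi \in \Gamma_\mu$, we have
$$|L_f(\mu)| = \Big|\int_{\mathbb{R}^n} f\,d\mu\Big| = \Big|\int_{\mathbb{R}^{2n}} \Delta^k_h f(x)\,d\psi(x, h)\Big| \le \|\mathring f\|_{\mathring{\Lambda}^k_\varphi} \int_{\mathbb{R}^{2n}} \varphi(|h|)\,d|\psi|(x, h).$$
Taking the infimum over all $\psi \in \Gamma_\mu$ yields $|L_f(\mu)| \le \|\mathring f\|_{\mathring{\Lambda}^k_\varphi}\,\|\mu\|_{k,\varphi}$, so that $L_f$ is a continuous linear functional with
$$\|L_f\|^*_{k,\varphi} \le \|\mathring f\|_{\mathring{\Lambda}^k_\varphi}.$$
Thus, we may define a continuous linear transformation
$$D : \big(\mathring{\Lambda}^k_\varphi, \|\cdot\|_{\mathring{\Lambda}^k_\varphi}\big) \longrightarrow \big(M^0, \|\cdot\|_{k,\varphi}\big)^*$$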


6. Mass Transshipment Problems and Ideal Metrics ◦

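by $D(\mathring f) = L_f$.

Lemma 6.2.7 The map D is an isometry.

Proof: Note first that if for some $x, h \in \mathbb{R}^n$,
$$\Delta^k_{x;h} = \sum_{i=0}^{k} (-1)^{k-i}\binom{k}{i}\delta_{x + ih},$$
then indeed $\delta_{(x,h)} \in \Gamma(\Delta^k_{x;h})$. Therefore,
$$\|\Delta^k_{x;h}\|_{k,\varphi} \le \int_{\mathbb{R}^{2n}} \varphi(|\tilde h|)\,d\delta_{(x,h)}(\tilde x, \tilde h) = \varphi(|h|).$$
Next, for each $f \in \Lambda^k_\varphi$,
$$\begin{aligned}
\|\mathring f\|_{\mathring{\Lambda}^k_\varphi} &= \sup\{|\Delta^k_h f(x)| / \varphi(|h|);\ x \in \mathbb{R}^n,\ h \ne 0\} \\
&= \sup\{|L_f(\Delta^k_{x;h})| / \varphi(|h|);\ x \in \mathbb{R}^n,\ h \ne 0\} \\
&\le \|L_f\|^*_{k,\varphi}\,\sup\{\|\Delta^k_{x;h}\|_{k,\varphi} / \varphi(|h|);\ x \in \mathbb{R}^n,\ h \ne 0\} \\
&\le \|L_f\|^*_{k,\varphi},
\end{aligned}$$
so that $\|\mathring f\|_{\mathring{\Lambda}^k_\varphi} = \|L_f\|^*_{k,\varphi}$, as desired. □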

The next step is to show that the map D is surjective and hence an isometric isomorphism of Banach spaces. We start with some preliminary results. Call a signed measure on IRn simple if it is a finite linear combination of measures kx;h , x ∈ IRn , h ∈ IRn . Lemma 6.2.8 The simple measures are dense in M 0 , · k,ϕ . Proof of Lemma 6.2.8: The proof is based on the following three claims. Call µ ∈ M 0 a lattice measure if its support lies on the lattice dZZn for some d > 0. Claim 6.2.9 The lattice measures are dense in M 0 , · k,ϕ . Proof of Claim 6.2.9: Take µ ∈ M 0 and let Q = [−r, r]n be the cube containing the support of µ. Partition Q into mn disjoint cubes of equal volume Qi = xi , yi ; · stands for an interval in IRn without specifying which parts of its boundary are closed or open. Set µi to be the restriction



of µ on Qi and let xi,α := xi + 2rα mk , |α| ≤ k − 1. For each µi there exists a set of constants (ci,α )|α|≤k−1 such that

ci,α xβi,α |α|≤k−1

sβ dµi (s),

=

|β| ≤ k − 1,

(6.2.8)

Qi

and

|ci,α | ≤ A Var µi .

(6.2.9)

|α|≤k−1

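To see this we need the following proposition.

Proposition 6.2.10 (Hanin (1991)) Let $Q = \otimes_{i=1}^n \langle a_i, a_i + d\rangle$ be a cube in $\mathbb{R}^n$, and µ a finite signed Borel measure on Q. For any x in the closure $\overline{Q}$ and $t \in \mathbb{R} \setminus \{0\}$ put $x_\alpha = x + t\alpha$, $\alpha \in \mathbb{Z}^n_+$, |α| ≤ m. Then the system of equations
$$\sum_{|\alpha| \le m} c_\alpha x_\alpha^\beta = \int_Q s^\beta\,d\mu(s), \quad |\beta| \le m,$$
has a unique solution, and moreover,
$$\sum_{|\alpha| \le m} |c_\alpha| \le A\big(1 + (d/|t|)^m\big)\operatorname{Var}\mu.$$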

Proof of Proposition 6.2.10: For any m ∈ ZZ+ , x ∈ IRn , and t ∈ IR denote by dn,m (x, t) the determinant of the matrix An,m (x, t) = (x + tα)β |α|≤m, |β|≤m assuming some ordering of the rows and columns in An,m . Let us first show that dn,m (x, t) = tr dn,m (0, 1), where r = |α| and dn,m (0, 1) = 0.

(6.2.10)

|α|≤m

We can assume that the rows and columns in An,m (x, t) are in lexicon = {α ∈ graphic order ()). (Clearly, in ZZn+ , a ≤ b → a ) b.) Let Em n denote by Aγn,m (x, t) the matrix with ZZn+ ; |α| ≤ m}, and for any γ ∈ Em β-rows defined by the β-rows of the matrices An,m (0, t) and An,m (x, t):   β-rows of A (0, t) if β ) γ, n,m β-rows of An,m (x, t) =  β-rows of A (x, t) otherwise. n,m



We shall use an inductive argument on γ to show that det Aγn,m (x, t) = det An,m (x, t). For γ = 0 the above equality is obvious. Suppose that it holds for some γ = θ = γ, where γ = (0, . . . , 0, m) stands for the largest element in the n n sense of “)” of Em . Let σ be the first element of Em “)”-following θ. Then the binomial formula σ (x + tα)σ = (tα)σ + xσ−ν (tα)ν , |α| < m, ν ν≤σ ν=σ

leads to det Aθn,m (x, t) = det Aσn,m (x, t) +

σ ν≤σ ν=σ

ν

xσ−ν det Aσ,ν n,m (x, t).

θ Here Aσ,ν n,m (x, t) is the matrix An,m (x, t) where the σ-row is interchanged σ with row ((tα) )|α|≤m . Each matrix Aσ,ν n,m (x, t) has two equal rows, and so det Aσ,ν n,m (x, t) = 0. This completes the inductive argument.

In particular, dn,m (x, t) = det An,m (x, t) = det Aγn,m (x, t) = det An,m (0, t) = tr dn,m (0, 1), as required in the first part of the claim. To show that dn,m (0, 1) = 0 for all m ∈ ZZ+ and n ∈ IN, observe first that An,0 (0, 1) = 1 for all n ∈ IN and    A1,m (0, 1) =  

1 1 1 ··· 1 0 1 2 ··· m 0 1 22 · · · m ..................... 0 1 2m · · · mm

    

for all m ∈ IN.

Therefore, dn,0 (0, 1) = 0 and d1,m (0, 1) = 0, m, n ∈ IN. Suppose now that m ≥ 1 and n ≥ 2. Write γ ∈ ZZn+ in the form γ = (γ , γn ), where γ ∈ ZZn−1 + . Then ( & An−1,m (0, 1) Cn (0, 1) An,m (0, 1) = , Bn,m (0, 1) 0 where

; : Bn,m (0, 1) = (α , αn )(β ,βn )

|α |+αn ≤m, αn ≥1, |β |+βn ≤m, βn ≥1

.

n Set (α , αn ) = ξ + en , (β , βn ) = η + en , where ξ and . η are in Em−1 . It is readily seen that det Bn,m (0, 1) = det An,m−1 (0, 1) |ξ|≤m−1 (ξn + 1), that



is, dn,m (0, 1) = 0 for all m ∈ ZZ+ and n ∈ IN, which completes the proof of (6.2.10). Now, (6.2.10) provides a unique solution of the system

cα xβα =

|α|≤m

sβ dµ(s),

|β| ≤ m,

Q

namely, cα =

α,µ (x, t) det Sn,m , det An,m (x, t)

|α| ≤ m,

α,µ (x, t) is defined as An,m (x, t) except for the α-row, which equals where Sn,m    sβ dµ(s) , |β| ≤ m. Q

Set Qx,t = {y ∈ IRn ; x + ty ∈ Q} and define a measure ν ∈ M (Qx,t ) by ν(E) = µ(x + tE) for any Borel E on Qx,t , and so

β

(x + ty)β dν(y),

s dµ(s) = Q

|β| ≤ m.

Qx,t

Arguing as in the proof of (6.2.10), we get α,µ (x, t) = det Sn,m

=

α,ν det Sn,m (0, t) α,ν (0, 1), tr det Sn,m

|α| ≤ m.

Now we use the explicit form for cα and (6.2.10) to get the necessary bound for the cα ’s:

|cα | ≤

|α|≤m

≤

β −β β A max y dν(y) = A max |t| (s − x) dµ(s) |β|≤m |β|≤m Qx,t Q A max (d/|t|) Var µ ≤ A(1 + (d/|t|)m ) Var µ. 0≤ ≤m

The proof of Proposition 6.2.10 is now complete. □
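In dimension n = 1 the system of Proposition 6.2.10 is a Vandermonde system on the nodes x + tα, α = 0, . . . , m. The sketch below solves it exactly for a small discrete measure; it is an illustration only, and the helper names (`solve_linear_system`, `match_moments`) and the sample data are assumptions made here, not taken from the text.

```python
from fractions import Fraction

def solve_linear_system(matrix, rhs):
    """Gauss-Jordan elimination over the rationals; assumes a nonsingular matrix."""
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[n] for row in rows]

def match_moments(atoms, weights, x, t, m):
    """Coefficients c_0..c_m with sum_a c_a (x + t a)^b = sum_j w_j s_j^b, b = 0..m."""
    nodes = [x + t * a for a in range(m + 1)]
    matrix = [[node ** b for node in nodes] for b in range(m + 1)]
    rhs = [sum(w * s ** b for s, w in zip(atoms, weights)) for b in range(m + 1)]
    return nodes, solve_linear_system(matrix, rhs)

# A signed measure on [0, 1] with three atoms (illustrative data).
atoms = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]
weights = [Fraction(2), Fraction(-3), Fraction(1, 2)]
m = 2
nodes, coeffs = match_moments(atoms, weights, Fraction(0), Fraction(1, 4), m)
```

The discrete measure $\sum_\alpha c_\alpha \delta_{x_\alpha}$ built from `coeffs` has the same moments up to order m as the original one, which is how the lattice approximations $\nu_i$ are produced in the proof of Claim 6.2.9.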

Consider now (6.2.8), (6.2.9) and put νi = |α|≤k−1 ci,α δxi,α . Since ci,α satisfies the moment conditions (6.2.8) and (6.2.9), we have that νi − µi ∈



mn M 0 for all i = 1, . . . , mn . Therefore, ν = i−1 νi is a lattice measure on M 0 , and invoking the bound for the sum of |ci,α |, we get n

m

n

m

Var νi =

n

|ci,α | ≤ A

i=1 |α|≤k−1

i=1

m

Var µi = A Var µ.

i=1

Recall that for any µ whose support is in the cube ⊗ni=1 [ai , ai + d] we obtained the bound µ k,ϕ ≤ Aϕ(d) Var µ, and therefore,

n

ν − µ k,ϕ

≤

m i=1

≤

Aϕ

νi − µi k,ϕ ≤ Aϕ

mn

2r mk

2r mk

mn

Var(µi − νi )

i=1

(Var νi + Var µi ) ≤ Aϕ

i=1

2r mk

Var µ.

Letting m → ∞ we obtained the desired approximation ν − µ k,ϕ ≤ ε. This completes the proof of Claim 6.2.9. 2 Claim 6.2.11 Every lattice measure on dZZn is a linear combination of measures α x;d ;

α = (α1 , . . . , αn ),

|α| ≤ k, x ∈ dZZn .

Here α x;d is the discrete measure associated with the difference operα1 αn k ator α d = de1 · · · den in the same manner as we associate x,h = k k−i k i=0 (−1) i δx+ih with the difference operator kh f =

k

(−1)k−i

i=0

k f (◦ + ih). i

Proof of Claim 6.2.11: We write µ in the form γ∈I n cγ δa+dγ , assumN n ing that the support of µ is in the set a + dIN for some a ∈ dZZn , and IN = {0, 1, . . . , N }. For the characteristic function (ch.f.) of µ we have the representation µ (t) = e−i(t,x) dµ(x) = e−(t,a) cγ e−id(t,γ) , t ∈ IRn . IRn

Observe that F (t) = fore,

n γ∈IN

n γ∈IN

Dα F (0) = (−i)|α|

cγ e−id(t,γ) is the ch.f. of µ(· + a), and there-

(x − a)α dµ(x) = 0,

IRn

|α| ≤ k − 1.



γ Consider the polynomial P (ξ) = n cγ ξ . Differentiate both sides γ∈IN −idt 1 , . . . , e−idtn and then set t = 0 to get of the equality F (t) = P e α D P (1, . . . , 1) = 0, |α| ≤ k − 1. Therefore, the polynomial P can be rewritten in the form (ξ1 − 1)α1 · · · (ξn − 1)αn bαβ ξ β . P (ξ) = |α|=k

β∈Bα

All the Bα ’s are finite, and hence α1 αn F (t) = · · · e−idtn − 1 bαβ e−id(t,β) . e−idt1 − 1 |α|=k

β∈Bα

−idt α1 αn −i(t,x) 1 · · · e−idtn − 1 , Observe that the ch.f. of α x;d is e e −1 coincide, and therefore, the ch.f.s of µ and of |α|=k β∈βα bαβ α a+dβ;d which proves Claim 6.2.11. 2 Claim 6.2.12 Any measure α Zn+ , x ∈ IRn , and d > 0 is a x;d with α ∈ Z k linear combination of the measures y;h , y, h ∈ IRn . This follows from identity (b) (see (6.2.6)) with hi = dei . Combining Claims 6.2.9, 6.2.11, and 6.2.12, we complete the proof of Lemma 6.2.8. 2
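The factorization behind Claim 6.2.11 is easy to follow on the line. The sketch below is a one-dimensional illustration under simplifying assumptions (a = 0, d = 1, integer weights); the function names are hypothetical. It divides $P(\xi) = \sum_\gamma c_\gamma \xi^\gamma$ by $(\xi - 1)$ k times and reads the quotient coefficients as the weights $b_\beta$ of the measures $\Delta^k_{\beta;1}$.

```python
from math import comb

def divide_by_xi_minus_one(coeffs):
    """Synthetic division of P(xi) = sum_g coeffs[g] xi^g by (xi - 1).
    Returns (quotient coefficients, remainder); the remainder equals P(1)."""
    quotient = [0] * (len(coeffs) - 1)
    carry = 0
    for g in range(len(coeffs) - 1, 0, -1):
        carry += coeffs[g]
        quotient[g - 1] = carry
    return quotient, carry + coeffs[0]

def factor_lattice_measure(coeffs, k):
    """Quotient of P by (xi - 1)^k; fails if a moment of order below k is nonzero."""
    quotient = list(coeffs)
    for _ in range(k):
        quotient, remainder = divide_by_xi_minus_one(quotient)
        if remainder != 0:
            raise ValueError("a moment of order below k does not vanish")
    return quotient

def expand(quotient, k):
    """Weights of sum_b quotient[b] * Delta^k_{b;1}, i.e. coefficients of (xi - 1)^k Q(xi)."""
    weights = [0] * (len(quotient) + k)
    for b, q in enumerate(quotient):
        for i in range(k + 1):
            weights[b + i] += q * (-1) ** (k - i) * comb(k, i)
    return weights

k = 3
lattice_weights = expand([2, -1, 3], k)      # a lattice measure with vanishing moments < k
recovered = factor_lattice_measure(lattice_weights, k)
```

In several variables the same argument is applied coordinate by coordinate, which is why the representation in the claim involves all multi-indices α with |α| = k.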

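Lemma 6.2.13 The linear transformation D is an isometric isomorphism of $\big(\mathring{\Lambda}^k_\varphi, \|\cdot\|_{\mathring{\Lambda}^k_\varphi}\big)$ onto $\big(M^0, \|\cdot\|_{k,\varphi}\big)^*$.

Proof: Due to Lemma 6.2.7 it is enough to show that any $L \in (M^0, \|\cdot\|_{k,\varphi})^*$ has the form $L_f$ for some $f \in \Lambda^k_\varphi$. Fix $\alpha \in \mathbb{Z}^n_+$, |α| ≤ k − 1, and set $x_\alpha = \alpha/k$. For any $x \in \mathbb{R}^n$ there exists $(c_\alpha(x))_{|\alpha| \le k-1}$ such that $\mu_x = \delta_x - \sum_{|\alpha| \le k-1} c_\alpha(x)\delta_{x_\alpha}$ is in $M^0$. In fact, applying Proposition 6.2.10, we have that the system
$$\sum_{|\alpha| \le k-1} c_\alpha x_\alpha^\beta = x^\beta, \quad |\beta| \le k - 1,$$
has a unique solution $(c_\alpha(x))_{|\alpha| \le k-1}$ and $c_\alpha(\cdot) \in P_{k-1}$, |α| ≤ k − 1. Put $f(x) = L(\mu_x)$. Then for any $x, h \in \mathbb{R}^n$, $\Delta^k_h f(x) = L(\Delta^k_{x;h})$ and
$$|\Delta^k_h f(x)| \le \|L\|^*_{k,\varphi}\,\|\Delta^k_{x;h}\|_{k,\varphi} \le \|L\|^*_{k,\varphi}\,\varphi(\|h\|).$$
Therefore, $f \in \Lambda^k_\varphi$. Moreover, $L_f(\Delta^k_{x;h}) = \Delta^k_h f(x) = L(\Delta^k_{x;h})$, $x, h \in \mathbb{R}^n$. This implies that $L(\mu) = L_f(\mu)$ for $\mu = \Delta^k_{x;h}$ and hence for all simple $\mu \in M^0$. Lemma 6.2.8 gives that $L(\mu) = L_f(\mu)$ for all $\mu \in M^0$. Thus L = D(f). D has been shown to be surjective, as desired. □

Next we consider the adjoint of the transformation D. The Hahn–Banach theorem applies to show that
$$\big(\mathring{\Lambda}^k_\varphi, \|\cdot\|_{\mathring{\Lambda}^k_\varphi}\big)^* \xleftarrow{\;D^*\;} \big(M^0, \|\cdot\|_{k,\varphi}\big)^{**}$$
is an isometric isomorphism. Let
$$\big(M^0, \|\cdot\|_{k,\varphi}\big)^{**} \xleftarrow{\;T\;} \big(M^0, \|\cdot\|_{k,\varphi}\big)$$
be the canonical embedding. Then
$$\big(\mathring{\Lambda}^k_\varphi, \|\cdot\|_{\mathring{\Lambda}^k_\varphi}\big)^* \xleftarrow{\;D^* \circ T\;} \big(M^0, \|\cdot\|_{k,\varphi}\big)$$
is an isometry. The routine diagram shows that
$$\|\mu\|_{k,\varphi} = \sup\Big\{\int f\,d\mu;\ \|\mathring f\|_{\mathring{\Lambda}^k_\varphi} \le 1\Big\}.$$
We summarize the discussion and state the following main result of this section:

Theorem 6.2.14 (Duality theorem for mass transshipments on a compact space with constraints on the marginal kth differences) For any $\mu \in M^0$ the following generalized Kantorovich–Rubinstein duality theorem holds:
$$\inf_{\psi \in \Gamma_\mu} \int_{\mathbb{R}^{2n}} \varphi(|h|)\,d|\psi|(x, h) = \sup\Big\{\int_{\mathbb{R}^n} f\,d\mu;\ \|f\|_{\Lambda^k_\varphi} \le 1\Big\}.$$
We now show that the supremum in Theorem 6.2.14 is attained for some optimal f.

Theorem 6.2.15 For any $\mu \in M^0$ there is some $f \in \Lambda^k_\varphi$ with $\|f\|_{\Lambda^k_\varphi} = 1$ such that
$$\|\mu\|_{k,\varphi} = \int f\,d\mu.$$
Proof: Applying the Hahn–Banach Theorem, we choose $L \in (M^0)^*$ with $\|L\|^*_{k,\varphi} = 1$ and such that $L(\mu) = \|\mu\|_{k,\varphi}$. Using Lemma 6.2.13, we have $L = L_f$ for some $f \in \Lambda^k_\varphi$ with $\|f\|_{\Lambda^k_\varphi} = \|L\|^*_{k,\varphi} = 1$. □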

6.3 The General Case

In this section we extend the duality Theorem 6.2.14, relaxing the condition on the support of transshipment plans. Instead of imposing compactness

on the support of µ, we assume that the marginals of µ have finite absolute moments of some order. This condition, together with the moment condition (6.2.2), is the natural assumption for finiteness of ideal metrics arising in our new duality theorem. We start by recalling some of the notation we used in the previous section and introducing some new ones. As usual, we denote by || · || the Euclidean norm in IRn . For α = (α1 , . . . , αn ) ∈ ZZn+ and x = (x1 , . . . , xn ) ∈ IRn , |α|! , α! := α1 ! · · · αn !, and xα := |α| := α1 + · · · + αn , [α1 , · · · , αn ] := α1 !···α n! α αn 1 := xα 1 · · · xn ; α ≤ β means αi ≤ βi , i = 1, . . . , n, and in this case β M will be the set of finite signed Borel measures on IRn . Mc , resp. Mr (r > 0), will denote the set of all µ ∈ M with compact support, resp. with finite rth moment ||x||r d|µ|(x). Fix k ∈ IN and let Mc0 and Mr0 , r ≥ k − 1, denote the subsets of Mc and Mr of measures with α! β!(α−β)! .

xα dµ(x) = 0,

|α| ≤ k − 1.

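D is the space of functions with bounded support possessing all derivatives. The kth difference of f on $\mathbb{R}^n$ is
$$\Delta^k_h f(x) = \sum_{i=0}^{k} (-1)^{k-i}\binom{k}{i} f(x + ih) \tag{6.3.1}$$
for $x, h \in \mathbb{R}^n$; $\Delta^k_{x;h} = \sum_{i=0}^{k} (-1)^{k-i}\binom{k}{i}\delta_{x + ih}$ is the corresponding measure, with $\delta_a$ a point mass at a. For $\alpha \in \mathbb{Z}^n_+$, d > 0, and $x \in \mathbb{R}^n$, set $\Delta^\alpha_d = \Delta^{\alpha_1}_{d e_1} \cdots \Delta^{\alpha_n}_{d e_n}$, where $e_1, \dots, e_n$ stands for the standard basis of $\mathbb{R}^n$, and let $\Delta^\alpha_{x;d}$ be the corresponding discrete measure. With a cube Q in $\mathbb{R}^n$ we associate the set $\widetilde{Q} = \{(x, h) \in \mathbb{R}^n \times \mathbb{R}^n;\ x + ih \in Q,\ i = 0, 1, \dots, k\}$. For $\mu \in M^0_{k-1}$, let $\Gamma_\mu$ be the set of signed Borel measures ψ on $\mathbb{R}^{2n}$ such that
$$\int f\,d\mu = \int \Delta^k_h f(x)\,d\psi(x, h) \quad \text{for all } f \in D. \tag{6.3.2}$$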

Proposition 6.3.1 Suppose the support of a measure $\mu \in M^0_c$ is contained in a closed cube Q. Then there exists a measure $\psi \in \Gamma_\mu$ such that $\operatorname{supp}\psi \subset \widetilde{Q}$ and $\operatorname{Var}\psi \le A\operatorname{Var}\mu$.

This result is stated in Hanin (1991) and is based on a nontrivial function-theoretical fact. The proof can be extracted from arguments similar to theorems in Section 6.2; see also Hanin and Rachev (1994, 1995) and Brudnii (1970).



Proposition 6.3.1 shows that $\Gamma_\mu \ne \emptyset$ for every $\mu \in M^0_c$. Define the Kantorovich–Rubinstein-type norm: For fixed $k \in \mathbb{N}$ and given a nondecreasing function $\varphi : \mathbb{R}_+ \to \mathbb{R}_+$, $\varphi(0+) = \varphi(0) = 0$, $\varphi(t) > 0$ for t > 0, let
$$\|\mu\|_{k,\varphi} := \inf_{\psi \in \Gamma_\mu} \int \varphi(\|h\|)\,d|\psi|(x, h), \quad \mu \in M^0_{k-1}.$$

Remark 6.3.2 For k = 1 and $\varphi(x) = x$, $x \in \mathbb{R}_+$, this is the particular case of the Kantorovich–Rubinstein norm (or the so-called Wasserstein norm, see for example Dudley (1976) and de Acosta (1982)),
$$\|\mu\|_{1,\varphi} := \inf\Big\{\int_{\mathbb{R}^{2n}} \|x - y\|\,d|\psi|(x, y);\ d\psi(x, \mathbb{R}^n) - d\psi(\mathbb{R}^n, x) = d\mu(x)\Big\}.$$

To describe the dual form for $\|\mu\|_{k,\varphi}$, define $\Lambda^k_\varphi$ to be the set of all locally bounded functions f on $\mathbb{R}^n$ such that for some C ≥ 0, $|\Delta^k_h f(x)| \le C\varphi(\|h\|)$ over all $x, h \in \mathbb{R}^n$. $\Lambda^k_\varphi$ is endowed with the seminorm $|f|_{\Lambda^k_\varphi} = \inf C$. The factor space $\mathring{\Lambda}^k_\varphi = \Lambda^k_\varphi / P_{k-1}$ ($P_{k-1}$ stands for the set of polynomials on $\mathbb{R}^n$ with degree less than k) is a Banach space with the norm $\|\mathring f\|_{\mathring{\Lambda}^k_\varphi} = |f|_{\Lambda^k_\varphi}$. We assume hereafter without loss of generality that the function $t \to \varphi(t)/t^k$ is nonincreasing.

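Theorem 6.3.3 If $\mu \in M^0_c$, then:

(I) $\|\cdot\|_{k,\varphi}$ is a norm on $M^0_c$.

(II) There is an isometric isomorphism D of $\big(\mathring{\Lambda}^k_\varphi, \|\cdot\|_{\mathring{\Lambda}^k_\varphi}\big)$ onto $\big(M^0_c, \|\cdot\|_{k,\varphi}\big)^*$.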

This theorem was in fact shown in the previous section; see Theorems 6.2.14, 6.2.15. We shall outline the main steps in the proof in order to recall the structure of D and then extend it to a more general case:

(i) $\|\cdot\|_{k,\varphi}$ is a finite norm on $M^0_c$.

(ii) Each $f \in \Lambda^k_\varphi$ induces a linear form $L_f : M^0_c \to \mathbb{R}$ defined by $L_f(\mu) = \int f\,d\mu$. $L_f$ is a continuous linear functional with $\|L_f\|^*_{k,\varphi} \le \|\mathring f\|_{\mathring{\Lambda}^k_\varphi}$, and thus we may define a continuous linear transformation $D : \big(\mathring{\Lambda}^k_\varphi, \|\cdot\|_{\mathring{\Lambda}^k_\varphi}\big) \to \big(M^0_c, \|\cdot\|_{k,\varphi}\big)^*$ by $D(\mathring f) = L_f$.

(iii) The map D is an isometry.

(iv) Call a signed measure on $\mathbb{R}^n$ simple if it is a finite linear combination of measures $\Delta^k_{x;h}$, $x \in \mathbb{R}^n$, $h \in \mathbb{R}^n$. Simple measures are dense in $(M^0_c, \|\cdot\|_{k,\varphi})$.

(v) The linear transformation D is surjective and hence an isometric isomorphism of Banach spaces.

To extend Theorem 6.3.3 to $M^0_r$ we need some technical conditions on r given below. For $\gamma \in \mathbb{Z}^n_+$, |γ| ≥ k, set
$$\omega(\gamma) := \max_{\substack{|\beta| = k \\ \beta \le \gamma}} \sum_{\beta \le \sigma \le \gamma} \frac{\gamma!}{\sigma!\,(\gamma - \sigma)!}\,\frac{|\sigma|!\,(|\gamma| - |\sigma|)!}{|\gamma|!}\,\frac{\sigma^\beta}{|\sigma|}. \tag{6.3.3}$$
Define R = R(k, n) as the infimum of ρ > 0 subject to
$$\omega(\gamma) \le \text{const}\cdot|\gamma|^\rho \quad \text{for all } \gamma \in \mathbb{Z}^n_+,\ |\gamma| \ge k. \tag{6.3.4}$$

Proposition 6.3.4 k ≤ R ≤ n + k − 1.

Proof: For n = 1, we have $\omega(\gamma) = \sum_{\sigma = k}^{\gamma} \sigma^{k-1}$, γ ≥ k, and therefore, R(k, 1) = k. Besides, R is nondecreasing with respect to n, and thus R ≥ k. First let us show that for $\alpha_j = (\alpha_{j1}, \dots, \alpha_{jn}) \in \mathbb{Z}^n_+$, j = 1, . . . , m,
$$\prod_{i=1}^{n} [\alpha_{1i}, \dots, \alpha_{mi}] \le [|\alpha_1|, \dots, |\alpha_m|]. \tag{6.3.5}$$
In fact, multiplying the expansions (for i = 1, . . . , n)
$$(x_1 + \cdots + x_m)^{\sum_{j=1}^{m} \alpha_{ji}} = \cdots + [\alpha_{1i}, \dots, \alpha_{mi}]\,x_1^{\alpha_{1i}} \cdots x_m^{\alpha_{mi}} + \cdots,$$
we get
$$(x_1 + \cdots + x_m)^{\sum_{j=1}^{m} |\alpha_j|} = \cdots + \prod_{i=1}^{n} [\alpha_{1i}, \dots, \alpha_{mi}]\,x_1^{|\alpha_1|} \cdots x_m^{|\alpha_m|} + \cdots \tag{6.3.6}$$
On the other hand, for the left-hand side in (6.3.6), we have
$$\text{LHS} = \cdots + [|\alpha_1|, \dots, |\alpha_m|]\,x_1^{|\alpha_1|} \cdots x_m^{|\alpha_m|} + \cdots,$$
where the displayed term equals the term displayed in (6.3.6) plus some others. Setting m = 2 in (6.3.5), we have for σ ≤ γ,
$$\frac{\gamma!}{\sigma!\,(\gamma - \sigma)!} \le \frac{|\gamma|!}{|\sigma|!\,(|\gamma| - |\sigma|)!},$$
and therefore, $\omega(\gamma) \le \sum_{\sigma \le \gamma} |\sigma|^{k-1} \le |\gamma|^{n+k-1}$, which implies R ≤ n + k − 1, as desired. □
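The two combinatorial facts used in this proof can be checked by brute force on small multi-indices. The sketch below is illustrative only; it evaluates ω(γ) as written in (6.3.3) with exact rationals, and the helper names (`multi_binomial`, `box`, `omega`, `crude_bound`) are assumptions introduced here.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial, prod

def multi_factorial(alpha):
    return prod(factorial(a) for a in alpha)

def multi_binomial(gamma, sigma):
    """gamma! / (sigma! (gamma - sigma)!) for multi-indices sigma <= gamma."""
    rest = tuple(g - s for g, s in zip(gamma, sigma))
    return multi_factorial(gamma) // (multi_factorial(sigma) * multi_factorial(rest))

def box(lower, upper):
    """All multi-indices sigma with lower <= sigma <= upper componentwise."""
    return product(*(range(lo, up + 1) for lo, up in zip(lower, upper)))

def omega(gamma, k):
    """omega(gamma) from (6.3.3); requires |gamma| >= k >= 1."""
    n, size = len(gamma), sum(gamma)
    best = Fraction(0)
    for beta in box((0,) * n, gamma):
        if sum(beta) != k:
            continue
        total = Fraction(0)
        for sigma in box(beta, gamma):
            s = sum(sigma)  # s >= k >= 1, so the division below is safe
            ratio = Fraction(multi_binomial(gamma, sigma) * factorial(s) * factorial(size - s),
                             factorial(size))
            total += ratio * prod(c ** b for c, b in zip(sigma, beta)) / s
        best = max(best, total)
    return best

def crude_bound(gamma, k):
    """Sum of |sigma|^(k-1) over all sigma <= gamma, the bound used in the proof."""
    return sum(sum(sigma) ** (k - 1) for sigma in box((0,) * len(gamma), gamma))

k = 2
univariate = omega((6,), k)      # n = 1: reduces to the sum of sigma^(k-1), sigma = k..6
bivariate = omega((3, 4), k)     # n = 2: compared against crude_bound((3, 4), k)
```

For n = 1 the binomial ratio in (6.3.3) is identically 1, which is why ω(γ) collapses to a power sum and R(k, 1) = k.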

We now start with the extension of the Kantorovich–Rubinstein theorem. In what follows, A denotes absolute constants that may be different in different places. The duality theorem for the norm $\|\cdot\|_{k,\varphi}$ is based on Theorem 6.3.3 (the compact case) and the fact that $M^0_c$ is dense in $\big(M^0_{n+k-1}, \|\cdot\|_{k,\varphi}\big)$; see Theorem 6.3.5 below. The main idea of the proof goes back to the work of Kantorovich and Rubinstein (1958) and Dudley (1976, Lecture 20).

Theorem 6.3.5 Let r = n + k − 1. For any $\mu \in M^0_r$ and for any ε > 0,

(A) $\exists\,\nu \in M^0_c$ with $\|\mu - \nu\|_{k,\varphi} \le \varepsilon$.

The proof is contained in the following three lemmas. Lemma 6.3.6 It suffices to show (A) for any µ ∈ Mr0 with support in IRn+ . Proof: For multi-indices α ∈ ZZn+ with |α| ≤ k − 1, there exists (cα )|α|≤k−1 such that β cα α = xβ dµ(x), |β| ≤ k − 1. (6.3.7) |α|≤k−1

IRn +

To show the above equality, one can check that for any m ∈ ZZ+ , the determinant of the matrix An,m = (αβ )|α|≤m,|β|≤m is nonzero; cf. Proposition 6.2.10. Applying the Cramer formula, we have for the solution (cα )|α|≤k−1 of (6.3.7)   |cα | ≤ A max xβ dµ(x) ≤ A Var µ + ||x||k−1 d|µ|(x) |β|≤k−1 |α|≤k−1 IRn+ IRn    ≤ A Var µ + ||x||r d|µ|(x) . IRn

Set E1 = IRn+ , σ1 = |α|≤k−1 cα δα , and µ1 = µ1E1 − σ1 . Using the previous estimate, we have ||x||r d|µ1 |(x) ≤ ||x||r d|µ|(x) + A |cα | ≤

A Var µ +

|α|≤k−1

||x|| d|µ|(x) , r



and hence µ1 is in Mr0 . The coordinate hyperplanes split Rn into disjoint Ei , i = 1, . . . , 2n . To any Ei we assign a discrete measure σi (in the way we did above) to obtain µi = µ1Ei − σi ∈ Mr0 , i = 1, . . . , 2n . Suppose that for µ1 (with support in E1 ) the assertion of the lemma is valid, and thus it also holds for all 2n 2n µi . Take σ = i=1 σi and ν = i=1 νi + σ, where νi ∈ Mc0 are chosen to 2n satisfy ||µi − νi ||k,ϕ ≤ ε. The equality µ = i=1 µi + σ yields σ ∈ Mc0 and thus also ν ∈ Mc0 . Finally, ! n ! 2 ! ! ! ! ≤ 2n ε, ||µ − ν||k,ϕ = ! (µi − νi )! ! ! i=1

k,ϕ

2

which completes the proof of Lemma 6.3.6.

Lemma 6.3.7 It suffices to show (A) for any µ ∈ Mr0 with support on a lattice dZZn+ , d > 0. Proof: Take µ ∈ Mr0 with support in IRn+ . The lattice dZZn+ forms a partition of IRn+ into disjoint cubes Qi = ai , bi of side length d. Set µi = µ1Qi d n and let xi,α := ai + αd Z+ ∩ Qi (Qi stands k , |α| ≤ k − 1. Obviously, xi,α ∈ k Z for the closure of Qi ). Then there exists (ci,α )|α|≤k−1 such that ci,α xβi,α = sβ dµi (s), |β| ≤ k − 1, |α|≤k−1

Qi

|ci,α | ≤ A Var µi ; see Proposition 6.2.10. Letting νi = c δ have that µ1 − νi ∈ Mc0 and Var νi ≤ A Var µi . Ob|α|≤k−1 i,α xi,α , we serve first that ν = i νi has finite variance. To show that |ν| has a finite rth moment set δi = inf{||x||; x ∈ Qi }, i = sup{||x||; x ∈ Qi }. Assuming that d ≤ 12 , we get for cubes Qi with i > 1 the estimate i ≤ Aδi . Then ||x||r d|ν|(x) and

IRn

≤

|α|≤k−1

||x||r d|νi |(x) =

i Q i

≤ A

 ri Var µi = A 

i

≤ A

Var µi +

i ≤1

A δir 1 >1

i

||xi,α ||r |ci,α | ≤

|α|≤k−1

i ≤1

+

 ri Var µi

i >1

Var µi ≤ A

ri Var νi

i



i

Var µi + A

i Q i

||x||r d|µi |(x)



  = A Var µ +



 ||x||r d|µ|(x) < ∞.

IRn +

Combined with νi − µi ∈ Mc0 and ν =

i (νi

− µi ) + µ this yields ν ∈ Mr0 .

Now we will show that ||µ − ν||k,ϕ ≤ Aϕ(d) Var µ.

(6.3.8)

, By Proposition 6.3.1, for every i there is ψi ∈ Γµi −νiwith supp ψi ⊂ Q i and Var ψi ≤ A Var(µi − νi ) ≤ A Var µi . Set ψ = i ψi . The covering {Qi } of IRn+ has finite multiplicity, and hence Var ψ ≤ A

Var ψi ≤ A Var

i

µi = A Var µ < ∞.

i

For any f ∈ D f d(µ − ν) = f d(µi − νi ) i

IRn

=

n

IR i

kh f (x) dψi (x, h) =

n

IR2

kh f (x) dψ(x, y), IR2

n

and therefore ψ ∈ Γµ−ν . By definition of the norm || · ||k,ϕ , ϕ(||h||) d|ψ|(x, h) ≤ ϕ(|h|) d|ψi |(x, h) ||µ − ν||k,ϕ ≤ i

n

IR2

≤

Q i

d ϕ Var ψi ≤ Aϕ(d) Var µ. k i

Applying Lemma 6.3.6 and (6.3.8), we complete the proof of Lemma 6.3.7. 2 Lemma 6.3.8 (A) is fulfilled for any µ ∈ Mr0 of the form aα δαd , |aα | < ∞. µ = α∈Z Zn +

(6.3.9)

α∈Z Zn +

Proof: First let us show that each µ of the form (6.3.9) also has the representation cα,β βαd;d . (6.3.10) µ = Zn |β|=k α∈Z +



Recall that βαd;d stands for the measure generated by the difference func tional f → βde11 · · · βdenn (αd). Take

f (x) :=

aγ xγ .

(6.3.11)

γ∈Z Zn +

As µ ∈ Mr0 , it follows from (6.3.9) that aγ γ σ = 0, |σ| ≤ k − 1, γ∈Z Zn +

and

|aγ | |γ|r < ∞.

(6.3.12)

γ∈Z Zn +

Therefore, f ∈ C k ([−1, 1]n ) (we recall that r = n + k − 1). The Taylor expansion of f at 1 = (1, . . . , 1) ∈ IRn implies (note that Dσ f (1) = 0 for |σ| ≤ k − 1) 1 (x − 1)β Dβ f (s1 + (1 − s)x)sk−1 ds f (x) = k β! |β|=k 0 (x − 1)β γ − β γ! = k aγ α β! (γ − β)! |β|=k

γ≥β

α≤γ−β

1 α × x (1 − s)|α| s|γ|−|β|−|α|+k−1 ds 0

=

cα,β xα (x − 1)β .

Zn |β|=k α∈Z +

Here cα,β =

γ≥α+β

bγ,α,β = k

aγ bγ,α,β , and

γ! |α|! (|γ| − |α| − 1)! . α! β! (γ − α − β)! |γ|!

Set σ = γ − α and observe that σ β (σ − β)! ≥ σ! for all σ ≥ β. Then taking into account (6.2.2), (6.3.4), Proposition 6.3.4, and (6.3.12), we get for |β| = k, α∈Z Zn +

|cα,β | ≤

α∈Z Zn + γ≥α+β

|αγ |bγ,α,β



γ! |σ| (|γ| − |σ|)! σ β σ! (γ − σ)! |γ|! |σ| γ≥β β≤σ≤γ |aγ | ω(γ) ≤ A |aγ | |γ|r < ∞. A

≤

A

≤

|aγ |

|γ|≥k

|γ|≥k

Now, in (6.3.11) and in the last term of (6.3.13), put x = −i e dt1 , . . . , e−i dtn to see that the characteristic functions of the measures in the right-hand side of (6.3.9) and (6.3.10) coincide as desired. β 0 Next set µN = |β|=k |α|≤N cα,β αd;d . Clearly, µN ∈ Mc , and we next show that one can approximate µ of the form (6.3.9) by µN ’s as N → ∞. From the definition of Γµ (see (6.3.2)), δx,h ∈ Γ kx;h for any x, h ∈ IRn , and therefore, ||kx;h ||k,ϕ ≤ ϕ(|h|).

(6.3.13)

Let kx;h1 ,...,hk be a measure determined by the functional f → 1 h1 · · · 1hk f (x). The following equality, due to Kemperman, is readily checked: For any x, h1 , . . . , hk ∈ IRn ,

kx;h1 ,...,hk = k

⊂K

(−1)k−|k | kx+xk

;uk

.

(6.3.14)

h Here K = {1, . . . , k}, xk = j ∈k hj , uk = j∈k jj , k ⊂ K, j∈Ø := 0. (For a proof see Proposition 6.2.3, and the equality (a) in that proposition, see also Johnen and Scherer (1977).) Therefore, (6.3.13) and (6.3.14) imply ||βαd;d ||k,ϕ ≤ Aϕ(Ad) ≤ Aϕ(d), |β| = k. Note that for any countable collection {νn } of measures of the form βx;d and for every collection of real numbers {an } with n |an | < ∞, we have, in view of (6.3.14) and the definition of the norm || · ||k,ϕ , ! ! ! ! ! ! an νn ! ≤ Aϕ(d) |an |. ! ! ! n n k,ϕ

Hence, by (6.3.3) and (6.3.13), ||µ − µN ||k,ϕ

≤

Aϕ(d)

|Cα,β |

|β|=k |α|>N

≤

Aϕ(d)

|β|=k |α|>N γ≥α+β

=

Aϕ(d)

|β|=k

|γ|>N +k γ≥β

|aγ |

|aγ |bγ,α,β β≤σ≤γ |σ|N +k γ≥β


|aγ |ω(γ)

|aγ | |γ|r → 0,

as N → ∞.

|γ|>N +k

2

This proves Lemma 6.3.8, as well as Theorem 6.3.5.

Theorem 6.3.5 allows us to extend the isomorphism D, replacing Mc0 in Theorem 6.3.3 with Mr0 . The Hahn–Banach theorem applies to show ∗∗ ◦ ∗ D∗ that Λkϕ , · ∗◦k ← Mr0 , · ∗∗ is an isometric isomorphism. Let k,ϕ ∗∗ Λϕ T Mr0 , · ∗∗ ← (Mr0 , · k,ϕ be the canonical embedding. Then k,ϕ ◦ ∗ D ∗ ◦T Λkϕ , · ∗◦k ←− Mr0 , · k,ϕ is an isometry. The routine diagram Λϕ

◦ shows that µ k,ϕ = sup f dµ; f◦k ≤ 1 . Λϕ

The following is the main result of this section.

Theorem 6.3.9 Let $r = n + k - 1$. For any $\mu \in M_r^0$,

$$\inf_{\psi \in \Gamma_\mu} \int_{\mathbb{R}^{2n}} \varphi(\|h\|)\, d|\psi|(x, h) = \sup\Big\{ \int_{\mathbb{R}^n} f\, d\mu;\ \|f\|_{\Lambda^k_\varphi} \le 1 \Big\}.$$

We now show that the supremum in Theorem 6.3.9 is attained for some optimal $f$.

Theorem 6.3.10 For any $\mu \in M_r^0$, $r = n + k - 1$, there is some $f \in \Lambda^k_\varphi$ with $\|f\|_{\Lambda^k_\varphi} = 1$ such that $\|\mu\|_{k,\varphi} = \int f\, d\mu$.

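Proof: Applying the Hahn–Banach theorem, we choose $L \in (M_r^0)^*$ with $\|L\|^*_{k,\varphi} = 1$ and such that $L(\mu) = \|\mu\|_{k,\varphi}$. Since $D$ is an isometric isomorphism of $(\mathring{\Lambda}^k_\varphi, \|\cdot\|_{\mathring{\Lambda}^k_\varphi})$ onto $((M_r^0)^*, \|\cdot\|^*_{k,\varphi})$, we have $L = L_f$ for some $f \in \Lambda^k_\varphi$ with $\|f\|_{\Lambda^k_\varphi} = \|L\|^*_{k,\varphi} = 1$. □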

We next apply Theorems 6.3.9 and 6.3.10 to the theory of probability metrics; see Zolotarev (1986) and Rachev (1991). A metric $\theta$ in the space $P(\mathbb{R}^n)$ of probabilities on $\mathbb{R}^n$ is ideal of order $r > 0$ if for any set of constants $c_i \ne 0$ and $P^{(i)}, Q^{(i)} \in P(\mathbb{R}^n)$, $i = 1, \dots, m$,

$$\theta\big(P^{(1)}_{c_1} * \cdots * P^{(m)}_{c_m},\ Q^{(1)}_{c_1} * \cdots * Q^{(m)}_{c_m}\big) \le \sum_{i=1}^m |c_i|^r\, \theta\big(P^{(i)}, Q^{(i)}\big),$$

where $P_c(\cdot) = P(c^{-1}\, \cdot)$, and $*$ stands for the convolution of measures. As we have shown, the Kantorovich metric

$$K_1(P, Q) = \sup\Big\{ \int_{\mathbb{R}^n} f\, d(P - Q);\ |f(x + h) - f(x)| \le |h|,\ x, h \in \mathbb{R}^n \Big\}$$

is an ideal metric of order 1, and the classical Kantorovich–Rubinstein theorem (cf. Chapter 4) provides a dual representation

$$K_1(P, Q) = \inf\Big\{ \int_{\mathbb{R}^{2n}} |h|\, d|\psi|(x, h);\ \psi(A \times \mathbb{R}^n) - \psi(\mathbb{R}^n \times A) = P(A) - Q(A) \ \text{for all Borel } A \subset \mathbb{R}^n \Big\}.$$
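On the real line ($n = 1$) the metric $K_1$ can be evaluated directly, since it is a standard fact that $K_1(P, Q) = \int |F_P(x) - F_Q(x)|\, dx$, where $F_P$ and $F_Q$ are the distribution functions. The following Python sketch is an illustration only and is not part of the original text; the function names `kantorovich_1d` and `scale` and the (location, mass) input format are assumptions made here. It computes the integral for finitely supported laws and lets one check numerically the homogeneity $K_1(P_c, Q_c) = |c|\, K_1(P, Q)$ that underlies ideality of order 1.

```python
def kantorovich_1d(atoms_p, atoms_q):
    """K_1 between two finitely supported laws on the real line.

    Each argument is a list of (location, mass) pairs with masses summing to 1.
    Uses the one-dimensional identity K_1 = integral of |F_P - F_Q|.
    """
    points = sorted({x for x, _ in atoms_p} | {x for x, _ in atoms_q})
    cdf_p = cdf_q = 0.0
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both distribution functions are constant on [left, right).
        cdf_p += sum(m for x, m in atoms_p if x == left)
        cdf_q += sum(m for x, m in atoms_q if x == left)
        total += abs(cdf_p - cdf_q) * (right - left)
    return total


def scale(atoms, c):
    """Law of cX when X has the given atoms (P_c in the notation of the text)."""
    return [(c * x, m) for x, m in atoms]


# Hypothetical example data: two-point laws shifted by one unit.
P = [(0.0, 0.5), (1.0, 0.5)]
Q = [(1.0, 0.5), (2.0, 0.5)]
k1 = kantorovich_1d(P, Q)
k1_scaled = kantorovich_1d(scale(P, -2.0), scale(Q, -2.0))
```

In this example `k1_scaled` should equal `2 * k1`, in agreement with homogeneity of order 1 for $c = -2$.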

Only a few examples of ideal metrics of order $r > 1$ are known (cf. Zolotarev (1986), Maejima and Rachev (1987), Rachev and Yukich (1989), Rachev and Rüschendorf (1992), and Rachev (1991c)). They have been used to determine sharp estimates of the rate of convergence in the CLT for independent random variables or martingale differences; see Senatov (1981), Zolotarev (1986), Rachev (1991c), Rachev and Rüschendorf (1992, 1992a, 1994, 1994a), and Rachev (1991a). One of them is the Zolotarev metric

$$\zeta_r(P, Q) = \sup\Big\{ \int_{\mathbb{R}^n} f\, d(P - Q);\ f \in F_r \Big\}.$$

Here $F_r$ is the class of functions $f : \mathbb{R}^n \to \mathbb{R}$ having Fréchet derivatives of order $k - 1$ such that $\|f^{(k-1)}(x) - f^{(k-1)}(y)\| \le \|x - y\|^\beta$ for all $x, y \in \mathbb{R}^n$, where $k$ is determined by $r = k - 1 + \beta$, $0 < \beta \le 1$. Clearly, $K_1 = \zeta_1$. We use the norm $\|\cdot\|_{k,\varphi}$ with $\varphi(t) = t^r$ and $k - 1 \le r \le k$ to define the following analogue of the Kantorovich metric:

$$K_r(P, Q) = \|P - Q\|_{k,\varphi} = \sup\Big\{ \Big| \int_{\mathbb{R}^n} f\, d(P - Q) \Big|;\ \|f\|_{\Lambda^k_\varphi} \le 1 \Big\}.$$

The desired (“ideal”) properties of the metric Kr are given in the following lemma. Lemma 6.3.11 Kr is ideal of order r.


Proof: To show the ideality of order $r > 0$ we need to check

(a) Regularity: $K_r(P_1 * Q, P_2 * Q) \le K_r(P_1, P_2)$;

(b) Homogeneity of order $r$: $K_r(P_c, Q_c) \le |c|^r K_r(P, Q)$.

The regularity property follows from the invariance of the norm $\|f\|_{\Lambda^k_\varphi}$ with respect to shifts $f(\cdot + a)$. In fact,

$$\begin{aligned}
K_r(P_1 * Q, P_2 * Q) &= \sup_{\|f\|_{\Lambda^k_\varphi} \le 1} \Big| \iint f(x + y)\,(P_1 - P_2)(dx)\, Q(dy) \Big| \\
&\le \sup_{\|f\|_{\Lambda^k_\varphi} \le 1} \int \Big| \int f(x + y)\,(P_1 - P_2)(dx) \Big|\, Q(dy) \\
&\le K_r(P_1, P_2).
\end{aligned}$$

To show the homogeneity, observe that for $f_c(x) = f(cx)$,

$$\|f_c\|_{\Lambda^k_\varphi} = |c|^r \sup_{t > 0} t^{-r} \sup\big\{ |\Delta^k_h f(cx)|;\ x \in \mathbb{R}^n,\ |h| \le t \big\},$$

and therefore,

$$K_r(P_c, Q_c) = \sup\Big\{ \Big| \int f\, d(P - Q) \Big|;\ |c|^{-r} \|f\|_{\Lambda^k_\varphi} \le 1 \Big\} = |c|^r K_r(P, Q),$$

as desired. □

The next theorem follows readily from Lemma 6.3.11 and Theorem 6.3.9. It provides the relationship to the Zolotarev $\zeta_r$-metric, states the corresponding dual representation, and describes the set of measures on which $K_r$ is finite.

Theorem 6.3.12 (i) $K_r$ is an ideal metric of order $r$, and for $k - 1 < r \le k$, $\zeta_r \le c_1 K_r \le c_2 \zeta_r$ for some positive constants $c_1$ and $c_2$; (ii) if $P - Q \in M^0_{n+k-1}$, then $K_r(P, Q)$ admits the dual representation

$$K_r(P, Q) = \inf_{\psi \in \Gamma_{P-Q}} \int_{\mathbb{R}^{2n}} |h|^r\, d|\psi|(x, h),$$

and moreover,

$$K_r(P, Q) \le A \int_{\mathbb{R}^n} \|x\|^r\, d|P - Q|(x) < \infty. \tag{6.3.15}$$

Remark 6.3.13 This theorem is in fact Theorem 6.3.9 with ϕ(t) = tr , t > 0. The bound (6.3.15) follows from the dual representation. The above inequality may be viewed as an analogue of the bound obtained by Zolotarev (1986, Theorem 1.5.7) for the ζr -metric in terms of the absolute pseudomoment of order r.

6.4 Minimality of Ideal Metrics

In the previous sections duality theorems for specific mass transshipment problems leading to ideal metrics were considered. The notion of minimal metrics is closely related to the mass transportation problem, and in this section we study the relationships between ideal and minimal metrics.(4) Let $\mathcal{X} = \{X\}$ be the space of real random variables on a probability space $(\Omega, \mathcal{A}, P)$. In defining the metrics on a set of random variables and in describing their properties, we shall follow Zolotarev (1986) and Rachev (1991). By $(\mathcal{X})^2 = \{L(X, Y)\}$ we shall denote the space of all joint distributions of pairs of random variables (r.v.) $X, Y \in \mathcal{X}$. A mapping $\mu : (\mathcal{X})^2 \to [0, \infty]$ is called a probability metric if for any $X, Y, Z \in \mathcal{X}$ we have:

(1) $P(X = Y) = 1$ implies $\mu(X, Y) = 0$,

(2) $\mu(X, Y) = \mu(Y, X)$,

(3) $\mu(X, Y) \le \mu(X, Z) + \mu(Z, Y)$.

We assume that the original probability space is rich enough to support all Borel probability measures on $\mathbb{R}^2$. Following Zolotarev (1976), we shall say that the metric $\mu$ is a $(C, r)$-perfect metric if

(4) $\mu$ is a regular functional, i.e., $\mu(X + Z, Y + Z) \le \mu(X, Y)$ for any $X, Y, Z \in \mathcal{X}$, $Z$ independent of $X, Y$,

(5) $\mu$ is a $(C, r)$-homogeneous functional, i.e., $\mu(cX, cY) \le |c|^r \mu(X, Y)$, for $0 < |c| < C$, $r > 0$.

Footnote (4): The results in this section are due to Ignatov and Rachev (1986).

If condition (4) holds only in the case that $Z$ is independent of $X$ and $Y$, then $\mu$ is called a weakly $(C, r)$-perfect metric. In the case $C = +\infty$, we shall say that $\mu$ is an ideal metric of order $r$. In the case that $\mu(X, Y)$ is a simple metric, that is, its values depend only on the marginal distributions $P_X$ and $P_Y$, it is natural to consider only weakly perfect metrics. Note that if $\mu(X, Y)$ is a $(C, r)$-perfect compound (that is, not simple) metric for $r > 1$, then $\mu(X, Y)$ will take either the value 0 or $+\infty$. Indeed, if $\mu(X, Y) > 0$ for some $X$ and $Y$, then for $n > 1/C$ we obtain from conditions (4) and (5) the inequalities

$$\mu(X, Y) = \mu\Big( \frac{nX}{n}, \frac{nY}{n} \Big) \le \frac{1}{n^r}\, \mu\Big( \sum_{i=1}^n X, \sum_{i=1}^n Y \Big) \le \frac{1}{n^r} \sum_{i=1}^n \mu(X, Y) = \frac{1}{n^{r-1}}\, \mu(X, Y). \tag{6.4.1}$$

Hence $\mu(X, Y) = +\infty$. The latter shows that nontrivial compound perfect metrics of order $r > 1$ do not exist. It is nevertheless possible to construct a wide range of weakly perfect metrics of order $r > 1$, as we shall see. In Zolotarev (1983, 1986) one can find the following construction of ideal metrics of order $s \in \mathbb{N} = \{1, 2, \dots\}$. Let $U_0$ be the class of all functions $a(x)$, $x \in \mathbb{R}^1$, of bounded variation such that $\lim_{x \to \pm\infty} a(x) = 0$. In the class of all real functions $f$ let us define the operator

$$(K^c f)(x) = f(cx); \quad x, c \in \mathbb{R}^1.$$

Let $I_s$ be an operator in $U_0$ that satisfies the following conditions:

(α) $I_s K^{1/c} = |c|^{s-1} K^{1/c} I_s$; $c \ne 0$, $s \in \mathbb{N}$.

(β) For any $X \in \mathcal{X}$ and $a \in U_0$ we have $I_s(a * F_X) = (I_s a) * F_X$, where $F_X(x) = P(X \le x)$ and

$$(a * F_X)(x) = \int_{-\infty}^{+\infty} a(x - y)\, dF_X(y).$$

Let $\Lambda$ be a homogeneously convex functional in $U_0$, i.e.,

(A) $\Lambda(a + b) \le \Lambda(a) + \Lambda(b)$; $a, b \in U_0$,

(B) $\Lambda(ra) = |r| \Lambda(a)$; $r \in \mathbb{R}^1$, $a \in U_0$,

that has the following properties:

(C) for any $Z \in \mathcal{X}$, we have $\Lambda(a * F_Z) \le \Lambda(a)$, $a \in U_0$,

(D) there exists a $p \ge 0$ such that $\Lambda(K^{1/c} I_s a) = |c|^p \Lambda(I_s(a))$, $a \in U_0$, $c \ne 0$.

Theorem 6.4.1 (Zolotarev (1979)) Let the functional $\Lambda$, $p \ge 0$, and the operator $I_s$, $s \in \mathbb{N}$, have the properties (α), (β), (A), (B), (C), (D). Then the functional $\mu(X, Y) = \Lambda(I_s(F_X - F_Y))$ will be an ideal metric of order $s + p - 1$. If conditions (β) and (C) do not hold, then $\mu(X, Y)$ will be a homogeneous metric of order $s + p - 1$.

For each $s \in \mathbb{N}$, $p \in [0, \infty]$, $\alpha \ge 0$, define the following simple metrics. For $p \in (0, \infty)$:

$$\zeta(X, Y; s, p, \alpha) = \Big( \int_{-\infty}^{+\infty} |F_{s,X}(x) - F_{s,Y}(x)|^p\, |x|^{\alpha p}\, dx \Big)^{1/p'}, \quad p' = \max(1, p);$$

$$\zeta(X, Y; s, 0, \alpha) = \int_{-\infty}^{+\infty} I\{x;\ F_{s,X}(x) \ne F_{s,Y}(x)\}\, |x|^\alpha\, dx, \quad I \text{ is an indicator};$$

$$\zeta(X, Y; s, \infty, \alpha) = \sup\{ |F_{s,X}(x) - F_{s,Y}(x)|\, |x|^\alpha;\ x \in \mathbb{R}^1 \},$$

where

$$F_{s,X}(x) = \int_{-\infty}^{x} \frac{(x - t)^{s-1}}{(s - 1)!}\, dF_X(t); \quad s \in \mathbb{N}.$$

The metrics $\zeta(X, Y; s, p, \alpha)$ are homogeneous metrics of order $r$, where $r = (s - 1)p/p' + \alpha + 1/p'$ for $p \in (0, \infty)$; $r = \alpha + 1$ for $p = 0$, and $r = s - 1 + \alpha$ for $p = +\infty$. For $\alpha = 0$ we have $\zeta(X, Y; s, p, 0)$, which is an ideal metric of order $r = (s - 1)p/p' + 1/p'$ for $p \in (0, \infty)$; $r = 1$ for $p = 0$, and $r = s - 1$ for $p = +\infty$. If $\mathcal{F}$ is a class of measurable functions on $\mathbb{R}^1$, then $\zeta$ will be Zolotarev's metric for the class $\mathcal{F}$ defined as follows:

$$\zeta(X, Y; \mathcal{F}) = \sup\{ E[f(X) - f(Y)];\ f \in \mathcal{F} \}.$$

Let $\mathcal{F}(s, q, \alpha)$, $q \in [1, \infty]$, be the class of all continuous functions $f$ on $\mathbb{R}^1$ that have measurable $s$th-order derivatives such that

$$\int_{-\infty}^{+\infty} |f^{(s)}(x)/x^\alpha|^q\, dx \le 1 \quad \text{for } q \in [1, \infty),$$

and

$$\inf\Big\{ M;\ \int_{-\infty}^{+\infty} I\{x;\ |f^{(s)}(x)| > M |x|^\alpha\}\, dx = 0 \Big\} \le 1 \quad \text{for } q = +\infty.$$

For $p \in [1, \infty]$ we then have the equation $\zeta(X, Y; s, p, \alpha) = \zeta(X, Y; \mathcal{F}(s, q, \alpha))$, where $q = p/(p - 1)$. (For $p = 1$ and $\alpha = 0$, see Zolotarev (1979).) By $\zeta_{s,p}(X, Y)$ we denote the ideal metric $\zeta(X, Y; s, p, 0)$. The metric $\zeta_{s,1}(X, Y)$ can be estimated from below by the Prohorov metric:

$$\pi(X, Y) = \inf\{ \varepsilon > 0;\ P(X \in A) \le P(Y \in A^\varepsilon) + \varepsilon \ \text{for all Borel sets } A \subset \mathbb{R}^1 \}, \tag{6.4.2}$$

where $A^\varepsilon = \{x;\ |x - A| < \varepsilon\}$. Let us define the class $\mathcal{X}_s^* \subset \mathcal{X}$ as follows: If $X, Y \in \mathcal{X}_s^*$, then $E[X^j - Y^j] = 0$, $j = 1, \dots, s - 1$, and $E|X|^s < +\infty$. For any $s \in \mathbb{N}$ we have the inequality $\zeta_{s+1,\infty}(X, Y) \le \zeta_{s,1}(X, Y)$, and from Zolotarev (1978, 1986) it follows that for $X$ and $Y \in \mathcal{X}_s^*$, we have

$$\zeta_{s,1}(X, Y) \le \frac{1}{\Gamma(s)}\, \eta_s(X, Y),$$

where

$$\eta_s(X, Y) = s \int_{-\infty}^{+\infty} |x|^{s-1}\, |F_X(x) - F_Y(x)|\, dx.$$

If $E|X|^s + E|Y|^s < +\infty$, then the finiteness of the metric $\zeta_{s+1,\infty}(X, Y)$ yields the equation $EX^k = EY^k$, $k = 1, 2, \dots, s - 1$. Within the class $\mathcal{X}_s^*$ the metrics $\zeta_{s+1,\infty}$, $\zeta_{s,1}$, and $\eta_s$ are topologically equivalent, and $\zeta_{s,1}(X_n, X) \to 0$ if and only if $X_n$ converges in distribution to $X$ ($X_n \Rightarrow X$) and $E|X_n|^s \to E|X|^s$.

It is well known that to any compound probability metric $\mu(X, Y)$ in the space $\mathcal{X}$ it is possible to assign a minimal metric

$$\hat{\mu}(X, Y) = \inf \mu(\xi, \eta), \tag{6.4.3}$$

where the infimum is taken over all the $L(\xi, \eta) \in (\mathcal{X})^2$ that have fixed marginal distributions $P_\xi = P_X$ and $P_\eta = P_Y$. We consider some examples of minimal metrics.

(i) Let $\mathcal{B}$ denote the Borel σ-algebra on the straight line and let $K$ denote the distance in probability (the Ky Fan metric)

$$K(X, Y) = \inf\{ \varepsilon > 0;\ P(|X - Y| \ge \varepsilon) < \varepsilon \} \tag{6.4.4}$$

and $\pi$ the Prohorov metric; see (6.4.2). Then the Strassen theorem (see Section 4.2) gives

$$\hat{K}(X, Y) = \pi(X, Y). \tag{6.4.5}$$
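To make definition (6.4.4) concrete, the following Python sketch (an illustration only, not part of the original text) evaluates the Ky Fan distance for a finite sample of pairs, treating the sample as the joint law of $(X, Y)$ with mass $1/n$ on each pair. The name `ky_fan` and the bisection tolerance are assumptions made here. The sketch relies only on the observation that the set of $\varepsilon$ with $P(|X - Y| \ge \varepsilon) < \varepsilon$ is an upper interval contained in $(0, \infty)$ and containing every $\varepsilon > 1$, so its infimum can be located by bisection on $[0, 2]$.

```python
def ky_fan(pairs, tol=1e-12):
    """Ky Fan distance inf{eps > 0 : P(|X - Y| >= eps) < eps} for the
    empirical joint law putting mass 1/n on each (x, y) in `pairs`."""
    diffs = [abs(x - y) for x, y in pairs]
    n = len(diffs)

    def feasible(eps):
        # True when P(|X - Y| >= eps) < eps under the empirical law.
        return sum(d >= eps for d in diffs) / n < eps

    lo, hi = 0.0, 2.0  # eps = 0 is never feasible; eps = 2 always is
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical sample: three coincident pairs and one pair at distance 1.
sample = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 4.0)]
k_value = ky_fan(sample)
```

For this sample $P(|X - Y| \ge \varepsilon) = 1/4$ for $0 < \varepsilon \le 1$, so the returned value should be close to $1/4$.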


(ii) For any $p \in [0, \infty]$ denote by $\tau_p(X, Y)$ the following compound metrics:

$$\tau_p(X, Y) = \{E|X - Y|^p\}^{1/p'}, \quad p \in (0, \infty), \quad p' = \max(1, p); \tag{6.4.6}$$

$$\tau_0(X, Y) = \kappa(X, Y) = E\, I\{X \ne Y\}; \tag{6.4.7}$$

$$\tau_\infty(X, Y) = \operatorname{ess\,sup} |X - Y| = \inf\{ \varepsilon > 0;\ P(|X - Y| > \varepsilon) = 0 \}.$$

For any $p \in [0, \infty]$ define the simple metrics $\kappa_p(X, Y)$:

$$\kappa_0(X, Y) = \sigma(X, Y) = \sup\{ |P(X \in A) - P(Y \in A)|;\ A \in \mathcal{B} \}; \tag{6.4.8}$$

and for $p \in (0, 1]$,

$$\kappa_p(X, Y) = \sup\{ |E[f(X) - f(Y)]|;\ f \in \mathrm{Lip}(p) \}, \tag{6.4.9}$$

where $\mathrm{Lip}(p)$ is the class of all functions that satisfy a Hölder condition of order $p$, i.e., $|f(x) - f(y)| \le |x - y|^p$. For $p \in [1, \infty)$ we have

$$\kappa_p(X, Y) = \Big( \int_0^1 |F_X^{-1}(t) - F_Y^{-1}(t)|^p\, dt \Big)^{1/p}, \tag{6.4.10}$$

where $F_X^{-1}(t) = \inf\{x;\ F_X(x) \ge t\}$; for $p = \infty$ we have

$$\kappa_\infty(X, Y) = \sup\{ |F_X^{-1}(t) - F_Y^{-1}(t)|;\ t \in [0, 1] \}. \tag{6.4.11}$$

Then for any $p \in [0, \infty]$ we have, by the duality theorems and the explicit solution for the Monge–Kantorovich problem, the equality

$$\hat{\tau}_p(X, Y) = \kappa_p(X, Y). \tag{6.4.12}$$
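Relation (6.4.12) can be checked by brute force in a small discrete case with $p \ge 1$. The Python sketch below is an illustration only and is not part of the original text; the function names and the sample values are assumptions made here. It takes two samples of equal size $n$, each regarded as a uniform law on $n$ atoms. For such laws the quantile formula (6.4.10) reduces to pairing the sorted samples, and, by the Birkhoff–von Neumann theorem, the infimum of $E|X - Y|^p$ over couplings is attained at a permutation, so enumerating the pairings suffices.

```python
from itertools import permutations


def tau_p(xs, ys, p):
    """Compound metric (E|X - Y|^p)^(1/p), p >= 1, for the coupling that
    pairs xs[i] with ys[i], each pair carrying mass 1/n."""
    n = len(xs)
    return (sum(abs(x - y) ** p for x, y in zip(xs, ys)) / n) ** (1.0 / p)


def kappa_p(xs, ys, p):
    """Quantile formula (6.4.10) for two uniform empirical laws of equal
    size: the quantile functions are step functions through the sorted samples."""
    return tau_p(sorted(xs), sorted(ys), p)


def minimal_tau_p(xs, ys, p):
    """Brute-force minimum of tau_p over all pairings of the atoms."""
    return min(tau_p(xs, perm, p) for perm in permutations(ys))


# Hypothetical samples of size 3.
xs = [0.0, 2.0, 5.0]
ys = [7.0, 1.0, 1.5]
gaps = [abs(kappa_p(xs, ys, p) - minimal_tau_p(xs, ys, p)) for p in (1, 2, 3)]
```

All entries of `gaps` should vanish up to rounding, which is the discrete form of $\hat{\tau}_p = \kappa_p$. The enumeration is factorial in $n$ and is meant only as a check for very small samples.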

(For the case $p = 0$ see Dobrushin (1970), Rachev (1978); for $p \in [0, 1]$ see Kantorovich and Rubinstein (1957); for $p \in [1, \infty)$ see Chapters 2, 3, and 4 of this book.) Making use of the minimality relationships (6.4.5) and (6.4.12) and the obvious inequality

$$K^{(p+1)/p'} \le \tau_p, \tag{6.4.13}$$

one obtains a lower bound for $\kappa_p$ in terms of $\pi$:

$$\pi^{(p+1)/p'} \le \kappa_p \quad \text{for } p \in (0, \infty); \qquad \pi \le \min\{\kappa_0, \kappa_\infty\}. \tag{6.4.14}$$

If $p \in (0, \infty)$, then in the class $\mathcal{X}_p = \{X \in \mathcal{X};\ E|X|^p < \infty\}$ the convergence $\kappa_p(X_n, X) \to 0$ will be equivalent to $X_n \Rightarrow X$ and $E|X_n|^p \to E|X|^p$.

For $p = 0$ we find that in the class of random variables $X$ with a density $p_X(t)$, the convergence $\kappa_0(X_n, X) \to 0$ will be equivalent to $X_n \Rightarrow X$ and

$$\lim_{a \to 0} \sup_n \int_{-\infty}^{+\infty} |p_{X_n}(x + a) - p_X(x)|\, dx = 0.$$

In Rachev (1991) one can find compound metrics of a natural structure with respect to which the metrics of Lévy and Kolmogorov and some other metrics of integral type are minimal. The concept of a minimal metric is a very convenient tool for studying problems of robustness of probabilistic models, especially because the inequality $\nu(X, Y) \le \varphi(\mu(X, Y))$, under very general conditions on $\varphi$, implies the inequality between minimal metrics $\hat{\nu}(X, Y) \le \varphi(\hat{\mu}(X, Y))$.

Our next goal is to construct natural probabilistic metrics such that a number of ideal metrics are minimal with respect to them; see Ignatov and Rachev (1986). The results of Kantorovich–Rubinstein and of Strassen, Dobrushin, and others showed that there exists a relationship of type (6.4.3) between well-known simple probability metrics and the corresponding compound metrics. The practical value of such relationships has been noted in a number of papers dealing with the structure and the properties of probability metrics and with an analysis of the stability (robustness) of certain stochastic models.

We start with the study of some minimal functionals in a space of real random variables. Let $\mu(X, Y)$ be a probability metric in $\mathcal{X}$. Let $F$ be an operator in the space $\mathcal{X} \times \mathcal{X}$, $F(X, Y) = (f(X, Y), g(X, Y))$. Let $\mu_F(X, Y)$ denote the following functional in $\mathcal{X} \times \mathcal{X}$: $\mu_F(X, Y) = \mu(F(X, Y))$.

Example 6.4.2 Let $F(X, Y) = (f(X), g(Y))$. If $f = g$ and if $P(X = Y) = 1$ implies $P(f(X) = f(Y)) = 1$, then $\mu_F$ will be a probability metric in $\mathcal{X}$.

Example 6.4.3 Let $\varphi(x)$ and $\psi(x)$, $x \in \mathbb{R}^1$, be measurable functions. For any $\omega \in \Omega$ write $F(X, Y)(\omega) = (\varphi(X(\omega)), \psi(Y(\omega)))$. If $\varphi = \psi$, then $\mu_F$ will be a probability metric in $\mathcal{X}$.

We shall say that the operator $F$ is an MD-operator in $\tilde{\mathcal{X}} \times \tilde{\mathcal{X}}$, where $\tilde{\mathcal{X}} \subset \mathcal{X}$, if for any $\xi, \eta$, and $X, Y \in \tilde{\mathcal{X}}$ we have $P_\xi = P_X$ and $P_\eta = P_Y$ if and only if

$$P_{f(\xi,\eta)} = P_{f(X,Y)}, \qquad P_{g(\xi,\eta)} = P_{g(X,Y)}. \tag{6.4.15}$$


It is evident that if we can assign in a one-to-one manner to the pair $(P_X, P_Y)$ a pair $(P_{f(X,Y)}, P_{g(X,Y)})$, then $F$ will be an MD-operator; for some detailed discussion, we refer to Rachev and Rüschendorf (1990b).

Theorem 6.4.4 Let $F$ be an MD-operator in $\tilde{\mathcal{X}} \times \tilde{\mathcal{X}}$. Then, if the metrics $\nu$ and $\mu$ satisfy the relation $\nu = \hat{\mu}$, in the class $\tilde{\mathcal{X}}$ we have

$$\nu_F = \widehat{\mu_F}. \tag{6.4.16}$$

The infimum in (6.4.16) has the same meaning as in (6.4.3).

Proof: Let us denote by $Q(X, Y)$ the class of all pairs $(\xi, \eta) \in \tilde{\mathcal{X}} \times \tilde{\mathcal{X}}$ with fixed marginal distributions $P_\xi = P_X$ and $P_\eta = P_Y$. Since for any pair $(\xi, \eta) \in \tilde{\mathcal{X}} \times \tilde{\mathcal{X}}$ we have the equality $\nu(\xi, \eta) = \hat{\mu}(\xi, \eta)$, it follows that

$$\nu_F(X, Y) = \nu(F(X, Y)) = \inf\{ \mu(\xi', \eta');\ (\xi', \eta') \in Q(F(X, Y)) \}.$$

Hence it suffices to show that for any $X$ and $Y$ belonging to $\tilde{\mathcal{X}}$ we have

$$\inf\{ \mu(\xi', \eta');\ (\xi', \eta') \in Q(F(X, Y)) \} = \inf\{ \mu(F(\xi, \eta));\ (\xi, \eta) \in Q(X, Y) \}.$$

(I) To any pair $(\xi, \eta) \in Q(X, Y)$, $X, Y \in \tilde{\mathcal{X}}$, assign a pair $(\xi', \eta') = F(\xi, \eta)$. Since $F$ is an MD-operator, it follows that $(\xi', \eta') \in Q(f(X, Y), g(X, Y))$, and hence

$$\inf\{ \mu(\xi', \eta');\ (\xi', \eta') \in Q(F(X, Y)) \} \le \inf\{ \mu(F(\xi, \eta));\ (\xi, \eta) \in Q(X, Y) \}.$$

(II) To obtain the reverse inequality, we assign to any pair $(\xi', \eta') \in Q(F(X, Y))$, $X, Y \in \tilde{\mathcal{X}}$, a fixed pair $(\xi, \eta) \in F^{-1}(\xi', \eta')$. Then $F(\xi, \eta) = (\xi', \eta')$, and from the definition of an MD-operator it follows that $(\xi, \eta) \in Q(X, Y)$. □

For the particular case in Example 6.4.3 one can sharpen Theorem 6.4.4. By $\mathcal{A}_\varphi$ denote the σ-algebra generated by a measurable function $\varphi$.

Theorem 6.4.5 Let the operator $F$ be generated by the measurable functions $\varphi$ and $\psi$ as in Example 6.4.3. Then, if $\mathcal{A}_\varphi = \mathcal{A}_\psi = \mathcal{B}$, it follows from $\nu = \hat{\mu}$ that $\nu_F = \widehat{\mu_F}$.

If $\varphi = \psi$, then one can get rid of the measurability condition on $\varphi$ and $\psi$; see Rachev and Rüschendorf (1990b), Rachev (1991c, p. 144).

Theorem 6.4.6 Let $g$ be a measurable function on $\mathbb{R}^1$ and let $F$ be generated by measurable functions $\varphi = \psi$. Then $\nu = \hat{\mu}$ implies that $\nu_F = \widehat{\mu_F}$.

Theorem 6.4.1 gives a construction of ideal metrics. Next we shall extend this construction to obtain a larger class of homogeneous and perfect metrics of order higher than 1. Let $F^1 = F^1(\mathbb{R})$ be the space of distribution functions (d.f.) $F(x) = F_X(x) = P(X \le x)$. Set $F^2 = \{\bar{F} = \bar{F}_X = 1 - F_X;\ X \in \mathcal{X}(\mathbb{R}^1)\}$ and $F^* = F^1 \cup F^2$. Let $S^1$ be the class of all nondecreasing functions $g(x)$, $x \in \mathbb{R}^1$, such that $\lim_{x \to -\infty} g(x) = 0$, and $S^2$ the class of all nonincreasing functions $g(x)$, $x \in \mathbb{R}^1$, such that $\lim_{x \to +\infty} g(x) = 0$. Set $A = S^1 \cup S^2$ and define two types of mappings

$$I_s^i : F^* \to A, \quad i = 1, 2,$$

that satisfy the following conditions:

(i) $I_s^k(F^j) \subset S^j$; $k, j = 1, 2$.

(ii) $I_s^k K^{1/c} = c^{s-1} K^{1/c} I_s^k$; $k = 1, 2$, $c > 0$, $s \in \mathbb{N}$, where $(K^c f)(x) = f(cx)$, $x, c \in \mathbb{R}^1$.

(iii) $I_s^k K^{1/c} = |c|^{s-1} K^{1/c} I_s^j$, $k \ne j$, $c < 0$, $s \in \mathbb{N}$.

(iv) For any $C_1, C_2 \in F^*$, $I_s^k(C_1 * C_2) = (I_s^k C_1) * C_2$; $k = 1, 2$, $s \in \mathbb{N}$, where

$$(u * C)(x) = \int_{-\infty}^{+\infty} u(x - y)\, dC(y), \quad C \in F^*,\ u \in A.$$

For $i = 1, 2$ consider the pseudometrics $M^i(u_1, u_2)$:

(a) $M^i(u_1, u_2) \in [0, \infty]$, $u_1, u_2 \in A$, $M^i(u_1, u_1) = 0$.

(b) $M^i(u_1, u_2) = M^i(u_2, u_1)$.

(c) $M^i(u_1, u_2) \le M^i(u_1, u_3) + M^i(u_3, u_2)$.

We assume that the $M^i$ satisfy the following conditions:

(d) $M^i(u_1 * u_2, u_1 * u_3) \le M^i(u_2, u_3)$; $u_1, u_2, u_3 \in A$.

(e) There exists a $p \ge 0$ such that for any $c \in (0, D]$ we have

$$M^i(K^{1/c} u_1, K^{1/c} u_2) \le c^p M^i(u_1, u_2); \quad i = 1, 2,$$

$$M^i(c u_1, c u_2) \le c\, M^i(u_1, u_2); \quad i = 1, 2.$$

(f) For $c \in [-D, 0)$ we have

$$M^i(K^{1/c} u_1, K^{1/c} u_2) \le M^j(K^{-1/c} u_1, K^{-1/c} u_2); \quad i \ne j.$$

Theorem 6.4.7 Let the pseudometrics $M^1$ and $M^2$ and the operators $I_s^1$ and $I_s^2$, $s \in \mathbb{N}$, have the properties (i)–(iv) and (a)–(f). It then follows that in the space $\mathcal{X}$ the functional

$$\mu(X, Y) = \max\big\{ M^1(I_s^1 F_X, I_s^1 F_Y),\ M^2(I_s^2 \bar{F}_X, I_s^2 \bar{F}_Y) \big\}$$

will be a $(D, s + p - 1)$-perfect metric. If we get rid of conditions (iv) and (d), then $\mu(X, Y)$ will be a $(D, s + p - 1)$-homogeneous metric.

Proof: It follows from (a), (b), and (c) that $\mu$ is a simple probabilistic metric. With the aid of (i), (iv), and (d) we find that $\mu(X, Y)$ is a weakly regular metric. It follows from (ii) and (e) that for $X, Y \in \mathcal{X}$ and $c > 0$ we have $\mu(cX, cY) \le c^{s+p-1} \mu(X, Y)$. Also, for $c \in [-D, 0)$ one obtains from (ii), (iii), (e), and (f) that

$$\begin{aligned}
\mu(cX, cY) &= \max\big\{ M^1(I_s^1 K^{1/c} \bar{F}_X, I_s^1 K^{1/c} \bar{F}_Y),\ M^2(I_s^2 K^{1/c} F_X, I_s^2 K^{1/c} F_Y) \big\} \\
&= \max\big\{ M^1(|c|^{s-1} K^{1/c} I_s^2 \bar{F}_X, |c|^{s-1} K^{1/c} I_s^2 \bar{F}_Y),\ M^2(|c|^{s-1} K^{1/c} I_s^1 F_X, |c|^{s-1} K^{1/c} I_s^1 F_Y) \big\} \\
&\le |c|^{s-1} \max\big\{ M^1(K^{1/c} I_s^2 \bar{F}_X, K^{1/c} I_s^2 \bar{F}_Y),\ M^2(K^{1/c} I_s^1 F_X, K^{1/c} I_s^1 F_Y) \big\} \\
&\le |c|^{s-1} \max\big\{ M^2(K^{-1/c} I_s^2 \bar{F}_X, K^{-1/c} I_s^2 \bar{F}_Y),\ M^1(K^{-1/c} I_s^1 F_X, K^{-1/c} I_s^1 F_Y) \big\} \\
&\le |c|^{s+p-1} \mu(X, Y). \qquad \square
\end{aligned}$$

The following examples of perfect and homogeneous metrics of order $s \in \mathbb{N}$ are not generated by norms. For $s \in \mathbb{N}$ write

$$I_s^1 F_X(x) = F_{s,X}(x) = \int_{-\infty}^{x} \frac{(x - t)^{s-1}}{(s - 1)!}\, dF_X(t), \qquad I_s^2 \bar{F}_X(x) = \bar{F}_{s,X}(x) = \int_{x}^{\infty} \frac{(t - x)^{s-1}}{(s - 1)!}\, dF_X(t).$$

Example 6.4.8 The λ-metric. (The metric λ in X introduced in Zolotarev (1975) is topologically equivalent to the well-known Lévy metric.) It is defined in terms of the ch.f. of the underlying random variables:  

 +∞ 1 . (6.4.17) λ(X, Y ) = min max sup eitx d(FX (x) − FY (x)), T >0 |t|≤T T −∞

Let fs,X be the ch.f. of the integrated d.f. Fs,X : +∞ eitx dFs,X (x), fs,X (t) =

s ∈ IN,

−∞

and define the generalized version of λ:

1 λ(X, Y ; s) = min max sup |fs,X (t) − fs,Y (t)|, s T >0 T |t|≤T 1 1 = min max sup s−1 |fX (t) − fY (t)|, s , T >0 T |t|≤T |t| where fX (t) is the ch.f. of X. Let M i (u1 , u2 )

= λ(u1 , u2 ) =

 

+∞  1  min max sup eitx d(u1 − u2 )(x) , s , T >0 |t|≤T T 

i = 1, 2.

−∞

In the class Xs∗ we have F s,X − F s,Y = Fs,Y − Fs,X , and hence λ(X, Y ; s) = M 1 (Fs,X , Fs,Y ), s ∈ IN, is a (1, s − 1)-perfect metric and λ(X, Y ; 1) = λ(X, Y ). Example 6.4.9 (The generalized Lévy metric) Let ds (x, y) = x |x|s−1 − y |y|s−1 , s ∈ IN. For u1 , u2 ∈ A and λ > 0 set Hλ∗ (u1 , u2 ) =

sup inf max

x1 ∈IR x2 ∈IR

1 ds (x1 , x2 ), u1 (x1 ) − u2 (x2 ) . λ


For any λ > 0, x ∈ IN, X, Y ∈ Xs define the Lévy metric Lλ (X, Y ; s) by Lλ (X, Y ; s) = max {Hλ∗ (Fs,X , Fs,Y ), Hλ∗ (Fs,Y , Fs,X )} . It follows from Rachev (1978, 1981) that Lλ (X, Y ; 1) = Lλ (X, Y ) = inf{ε > 0; FX (x − λε) − ε ≤ FY (x) ≤ FX (x + λε) + ε for any x ∈ IR1 } is the Lévy metric with a parameter λ > 0. For λ → 0 the limit for Lλ (X, Y ; s) is an ideal metric ζ(X, Y, s, ∞, 0). It follows from Theorem 6.4.6 that Lλ (X, Y ; s) is a (1, s − 1)-homogeneous metric in X ∗ . Example 6.4.10 (The generalized Kantorovich metric) Let Gs,X (t) = (Fs,X )−1 (t) and Gs,X (t) = (F s,X )−1 (t) be inverse functions of Fs,X and F s,X for t ∈ [0, bs ] and let b1 = 1 and bs = +∞, s = 2, 3, . . . . Set β(t; X, Y, s) = max |Gs,X (t) − Gs,Y (t)| , Gs,X (t) − Gs,Y (t) for t ∈ [0, bs ], s ∈ IN, X, Y ∈ X . For s ∈ IN and p ∈ [0, ∞] define b q  s  1 p ; β (t; X, Y, s) dt , p ∈ (0, ∞), q = max 1, κp (X, Y ; s) =   p 0

bs κ0 (X, Y ; s) = I{t; β(t; X, Y, s) > 0} dt; 0

κ∞ (X, Y ; s) =

sup{β(t; X, Y, s);

t ∈ [0, bs ]}.

It is evident that κp (X, Y ; 1) = κp (X, Y ) for p ∈ [0, ∞] and κ1 (X, Y ; s) = ζ(X, Y ; s, 1, 0) for s ∈ IN. The metric κp (X, Y ; s) is a (1, q(p + s − 1))homogeneous metric for p ∈ (0, ∞); κ0 (X, Y ; s) is a (1, s − a)-homogeneous metric; and κ∞ (X, Y ; s) is a (1, 1)-homogeneous metric. We next construct some minimal ideal and homogeneous metrics. As in Theorem 6.4.1, let the metric µ(X, Y ) be specified by a functional Λ and an operator Is that have the properties (α), (β), (A), (B) and (D). We shall say that the functional Λ is absolutely monotonic if Λ(|a|) = Λ(a), a ∈ U0 ; and if 0 ≤ a1 (x) ≤ a2 (x), x ∈ IR1 , then Λ(a1 ) ≤ Λ(a2 ), a1 , a2 ∈ U0 . Theorem 6.4.11 Consider an absolutely monotonic functional Λ. For any s ∈ IN there exists an operator Fs = (fs , gs ) on X × X and a constant cs such that for the functional Θ(X, Y ) =

$$|c_s|\, \Lambda\big[ P(f_s(X, Y) < x \le g_s(X, Y)) + P(g_s(X, Y) < x \le f_s(X, Y)) \big] \tag{6.4.18}$$

we have $\hat{\Theta} = \mu$.

Proof: Let $m(X, Y)$ be the following simple metric in $\mathcal{X}$: $m(X, Y) = \Lambda(F_X - F_Y) = \Lambda(|F_X - F_Y|)$; and let $\tau(X, Y)$ be the following compound metric in $\mathcal{X}$: $\tau(X, Y) = \Lambda(P(X < x \le Y) + P(Y < x \le X))$. Since for any $X, Y \in \mathcal{X}$ we have

$$|F_X(x) - F_Y(x)| \le P(X < x \le Y) + P(Y < x \le X), \tag{6.4.19}$$

and equality in (6.4.19) is attained (see Section 3.1) for the joint distribution $P(X < x, Y < x) = \min(F_X(x), F_Y(x))$, it follows from the absolute monotonicity of $\Lambda$ that $\hat{\tau} = m$. Since $I_s$ is an operator in $U_0$, it is easy to see that to the pair $(F_X, F_Y)$ we can assign in a one-to-one manner a pair of distribution functions $\Phi_i = \Phi_i(I_s, F_X, F_Y)$, $i = 1, 2$, and a constant $c_s = c_s(I_s, F_X, F_Y)$ such that $I_s(F_X - F_Y) = c_s(\Phi_1 - \Phi_2)$. For any pair $X, Y \in \mathcal{X}$ let us write $F_s(X, Y) = (f_s(X, Y), g_s(X, Y))$, where $F_{f_s(X,Y)} = \Phi_1$, $F_{g_s(X,Y)} = \Phi_2$. Since to any pair $(P_X, P_Y)$ we assign in a one-to-one manner a pair $(P_{f(X,Y)}, P_{g(X,Y)})$, it follows that $F_s$ is an MD-operator (6.4.15), and by virtue of Theorem 6.4.4 we find that $\mu(X, Y)$ and the functional $\Theta(X, Y) = |c_s| \tau(F_s(X, Y))$ are connected by the relation $\hat{\Theta} = \mu$. □

As a corollary of this result we next construct functionals $Z_{s,p}$ that have the same structure as $\zeta_{s,p}$ and satisfy $\hat{Z}_{s,p} = \zeta_{s,p}$ in a class $\mathcal{X}^*$. For $s \in \mathbb{N}$ and $X, Y \in \mathcal{X}$ define a set of points $E_s(X, Y) := \{x \in \mathbb{R}^1;\ F_{s,X}(x) = F_{s,Y}(x)\}$. Let $X_0 \in \mathcal{X}_{s-1}$ and $t_0 \in \mathbb{R}^1$. Define a set $\mathcal{X}^* = \mathcal{X}^*(X_0, t_0)$

=

{X ∈ X ; Fs,X (t0 ) = Fs,X0 (t0 ), EX j = EX0j ,

j = 1, . . . , s − 1

and a function λ = λ(FX , FY , s) that assigns to a pair (FX , FY ), X, Y ∈ X ∗ , a point in Es (X, Y ). It follows from the choice of λ that E

|X0 − λ|s−1 |X − λ|s−1 = E = as , (s − 1)! (s − 1)!

X ∈ X ∗.

To any random variable X ∈ X ∗ assign the following distribution function:    a−1 · Fs,X (x), for x ≤ λ, s Φs,X (x) =   1 − a−1 for x ≥ λ. s · F s,X (x),


Now we shall define an MD-operator Fs : X ∗ × X ∗ → X × X , Fs (X, Y ) = (fs (X, Y ), gs (X, Y )), that satisfies the conditions Ffs (X,Y ) = Φs,X , Fgs (X,Y ) = Ψs,Y . In the class XC of random variables belonging to X ∗ that have a continuous distribution function, such a mapping is generated (in the sense of Example 6.4.3) by the function ϕ = Φ−1 s,X ◦ Fx . Let A(x; X, Y, s) = P (fs (X, Y ) < x ≤ gs (X, Y )) + P (gs (X, Y ) < x ≤ fs (X, Y )). By Z(X, Y ; s, p, α), s ∈ IN, p ∈ [0, ∞], α ≥ 0, we denote the following compound probabilistic metric: 1/p  −∞   Z(X, Y ; s, p, α) = aps p Ap (x; X, Y, s)|x|2p dx ,   −∞

p ∈ (0, ∞), p = max(1, p); Z(X, Y ; s, 0, α) =

+∞ I{x; A(x; X, Y, s) > 0}|x|α dx; −∞

Z(X, Y ; s, ∞, α) =

as sup{A(x; X, Y, s)|x|α ; x ∈ IR1 }.

Note that $Z(X, Y; s, 1, 0) = a_s E|f_s(X, Y) - g_s(X, Y)|$. Following Theorem 6.4.11, we obtain

Theorem 6.4.12 For any $s \in \mathbb{N}$ and $p \in [0, \infty]$, $\alpha \ge 0$, we have $\hat{Z}(X, Y; s, p, \alpha) = \zeta(X, Y; s, p, \alpha)$ in the class $\mathcal{X}^*$.

Now we shall use the fact that the function $\lambda$ can be selected arbitrarily. Let $\lambda(F_X, F_Y, s) = \lambda_0$ for any $X$ and $Y$ belonging to a subset $\mathcal{X}_0^* \subset \mathcal{X}^*$. The latter is true if $\lambda_0 = t_0$ and $\mathcal{X}_0^* = \mathcal{X}^*$. Let

$$\lambda(F_{cX}, F_{cY}, s) = c\, \lambda(F_X, F_Y, s), \quad c \in \mathbb{R}^1,\ c \ne 0, \tag{6.4.20}$$

for any X and Y belonging to some X1∗ ⊂ X ∗ . Condition (6.4.20) is satisfied, for example, if the function λ is defined by the condition |λ| = min{|t|; t ∈ Es (X, Y )}. The set X0∗ ∩ X1∗ coincides with X ∗ (X0 , t0 ) if t0 = 0 and λ0 = 0. Let λ(FX+c , FY +c , s) = λ(FX , FY , s) + c,

c ∈ IR1 ,

(6.4.21)


for any X, Y ∈ X2∗ . Condition (6.4.21) is satisfied if we write λ(FX , FY , s) = sup{t; t ∈ Es (X, Y )}, and we consider the class of all X and Y for which λ(FX , FY , s) < ∞ or λ(FX , Fy , s) = inf{t; t ∈ Es (X, Y )} and the class of all X and Y for which λ(FX , FY , s) > −∞. Let X3 be the class of all the random variables for which Es (X, Y ) consists of an odd number of points {1 , . . . , 2k−1 }, k ∈ IN. If we write λ(FX , FY , s) = k , then X3 = X1∗ ∩ X2∗ . For any subset X ⊂ X define Xc to be the set of all random variables in X that have continuous distribution functions. The proof of the next two theorems is straightforward. Theorem 6.4.13 In the class (X0∗ )c , we find that Z(X, Y ; s, p, α) is a probability metric, whereas in the class (X1∗ )c , Z(X, Y ; s, p, α) is a homogeneous functional of the same order as ζ(X, Y ; s, p, α). In the class (X2∗ )c , U (X, Y ; s, p, 0) is a weakly regular functional. Next we consider the weighted total variation metric defined as follows: s−1 d(FX − FY )(x) , s ∈ IN. σs (X, Y ) = sup |x| A∈B A

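It is a homogeneous metric of order $s - 1$. Let us specify a class $\tilde{\mathcal{X}} \subset \mathcal{X}$ with the property $E|X|^{s-1} = E|Y|^{s-1}$ for $X, Y \in \tilde{\mathcal{X}}$. By $\tilde{\mathcal{X}}_1$ we denote the set of all $X \in \tilde{\mathcal{X}}$ with a density $p_X$. Then

$$\sigma_s(X, Y) = \frac{1}{2} \int_{-\infty}^{+\infty} |x|^{s-1}\, |p_X(x) - p_Y(x)|\, dx.$$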
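For $s = 1$ the moment condition is vacuous and the formula gives the usual total variation distance, which allows a quick numerical sanity check. The Python sketch below is an illustration only and is not part of the original text; the helper names, the integration window, and the grid size are assumptions made here. It approximates $\frac{1}{2} \int |x|^{s-1} |p_X - p_Y|\, dx$ by the trapezoidal rule for two normal densities. For $N(0, 1)$ and $N(1, 1)$ the value at $s = 1$ should agree with the known closed form $\operatorname{erf}(1/(2\sqrt{2}))$. For $s > 1$ the formula applies only to pairs satisfying $E|X|^{s-1} = E|Y|^{s-1}$.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of the normal law N(mean, sd^2)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def sigma_s(pdf_x, pdf_y, s, lo=-12.0, hi=12.0, steps=100_000):
    """Trapezoidal approximation of (1/2) * int |x|^(s-1) |p_X - p_Y| dx
    over the window [lo, hi] (assumed wide enough to capture the tails)."""
    h = (hi - lo) / steps

    def integrand(x):
        return abs(x) ** (s - 1) * abs(pdf_x(x) - pdf_y(x))

    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + i * h) for i in range(1, steps))
    return 0.5 * h * total


# Total variation between N(0, 1) and N(1, 1), i.e. the case s = 1.
tv = sigma_s(lambda x: normal_pdf(x, 0.0), lambda x: normal_pdf(x, 1.0), s=1)
closed_form = math.erf(1.0 / (2.0 * math.sqrt(2.0)))
```

The two numbers `tv` and `closed_form` should agree to several decimal places; the residual error comes from the grid and from truncating the tails.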

In the class of random variables X ∈ X1 with finite E|X|s−1 , we have σs (Xn , X) → 0 if and only if Xn ⇒ X, E|Xn |s−1 → E|X|s−1 , and +∞ |pX (x + a) − pXn (x)| dx = 0. lim sup

a→0 n

−∞

Let Fs , s ∈ IN, denote the MD-operator Fs (X, Y ) = (fs (X, Y ), gs (X, Y )), where Ffs (X,Y ) = Fs,X , Fgs (X,Y ) = Fs,Y , s−1 −1

Fs,X (x) = (E|X|

x

)

|t|s−1 pX (t) dt,

X ∈ X1 .

−∞

For (X, Y ) ∈ X × X define the following functional: is (X, Y ) = E|X|s−1 i(Fs (X, Y )).


The following theorem shows that $\sigma_s$ is minimal with respect to $i_s$. It generalizes the Dobrushin (1970) theorem, which is the special case $s = 1$.

Theorem 6.4.14 In the class $\tilde{\mathcal{X}}_1$, $i_s$ will be a homogeneous metric of order $s - 1$, and $\hat{i}_s = \sigma_s$.

References

[1] T. Abdellaoui. Distances de deux lois dans les espaces de Banach. PhD thesis, Université de Rouen, 1993. [2] T. Abdellaoui. Détermination d’un couple optimal du problème de Monge–Kantorovich. C.R. Acad. Sci. Paris I, 319:981–984, 1994. [3] T. Abdellaoui and H. Heinich. Sur la distance de deux lois dans le cas vectoriel. C.R. Acad. Sci. Paris I, 319:397–400, 1994. [4] M. Abramowitz and I.A. Stegun. Handbook of Mathematical Functions. Dover Publications, New York, 9th edition, 1970. [5] A. Acosta and G. Gine. Convergence of moments and related functionals in the general central limit theorem in Banach spaces. Z. Wahrscheinlichkeitstheorie Verw. Geb., 48(2):213–241, 1979. [6] N.I. Ahiezer. Classical Moment Problem and Related Questions of Analysis. GIFML, Moscow, 1961. [7] N.I. Ahiezer and M. Krein. Some Questions in the Theory of Moments. American Mathematical Society, Providence, 1962. [8] H. Akaike. Modern development of statistical methods. In P. Eykhoff, editor, Trends and Progress in System Identification, pages 169–184. Pergamon Press, 1981. [9] D.J. Aldous. Exchangeability and related topics. Lecture Notes in Mathematics, 1117, 1985.

[10] D.J. Aldous. Ultimate instability of exponential backoff protocol for acknowledgement-based transmission control of random access communication channels. IEEE Transactions on Information Theory, IT 33:219–223, 1987. [11] D.J. Aldous. Asymptotic fringe distribution for general families of random trees. Annals of Applied Probability, 1:228–266, 1991. [12] D.J. Aldous. The continuum random tree II: An overview. In M.T. Barlow and N.H. Bingham, editors, Stochastic Analysis, volume 167 of London Math. Soc. Lecture Notes Series, pages 23–70. Cambridge University Press, 1991. [13] D.J. Aldous and J.M. Steele. Introduction to the interface of probability and algorithms. Statistical Science, 8:3–9, 1993. [14] G.A. Anastassiou. Moments in Probability and Approximation Theory. Pitman, England, 1993. [15] G.A. Anastassiou and S.T. Rachev. Approximation of a random queue by means of deterministic queueing models. In C.K. Chui, L.L. Shumaker, and J.D. Ward, editors, Approximation Theory VI, volume 1, pages 9–11. Academic Press, 1989. [16] G.A. Anastassiou and S.T. Rachev. Moment problems and their applications to characterization of stochastic processes, queueing theory, and rounding problems. In Approximation Theory, volume 138, pages 1–77, New York, 1992. Proceedings of 6th S.E.A. Meeting, Marcel Dekker Inc. [17] G.A. Anastassiou and S.T. Rachev. Moment problems and their applications to the stability of queueing models. Computers and Mathematics with Applications, 24(8/9):229–246, 1992. [18] E.J. Anderson and P. Nash. Linear Programming in Infinite Dimensional Spaces. Theory and Applications. Wiley, New York, 1987. [19] E.J. Anderson and A.B. Philpott. An algorithm for a continuous version of the assignment problem. Lecture Notes in Economics and Mathematical Systems, 215:108–117, 1983. Semi-Infinite Programming and Applications (Austin, Texas, 1981). [20] E.J. Anderson and A.B. Philpott. Duality and an algorithm for a class of continuous transportation problems. 
Mathematics of Operations Research, 9:222–231, 1984. [21] T.W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley, New York, 1984.

[22] W. Apitzsch, B. Fritzsche, and B. Kirstein. A Schur analysis approach to minimum distance problems. Linear Algebra and its Applications, 1990.
[23] A. Araujo and E. Giné. The Central Limit Theorem for Real and Banach Valued Random Variables. Wiley, New York, 1980.
[24] M.A. Arbeiter. Random recursive constructions of self-similar fractal measures. The non-compact case. Probability Theory and Related Fields, 88:497–520, 1991.
[25] R.J. Aumann. Measurable utility and the measurable choice theorem. In Centre Nat. Recherche Sci., Paris, editor, La Decision, volume 171, pages 15–26. Actes Coll. Internat., Aix-en-Provence, 1967.
[26] F. Aurenhammer, F. Hoffmann, and B. Aronov. Minkowski-type theorems and least-square partitioning. Reports of the Institute for Computer Science, 1992. Dept. of Mathematics, Freie Universität Berlin.
[27] J. Auslander. Generalized recurrences in dynamical systems. Contributions to differential equations, 3(1):65–74, 1964.
[28] M.L. Balinski. Signature des points extrêmes du polyèdre dual du problème de transport. Comptes Rendus de l’Académie des Sciences, Paris, 1983.
[29] M.L. Balinski. The Hirsch conjecture for dual transportation polyhedra. Mathematics of Operations Research, 9:629–633, 1984.
[30] M.L. Balinski. Signature methods for the assignment problem. Operations Research, 34:125–141, 1985.
[31] M.L. Balinski. A competitive (dual) simplex method for the assignment problem. Mathematical Programming Study, 34:125–141, 1986.
[32] M.L. Balinski, B. Athanasopoulos, and S.T. Rachev. Some developments on the theory of rounding proportions. In Bulletin of the ISI, 49th Session, volume 1, pages 71–72, Firenze, 1993.
[33] M.L. Balinski and D. Gale. On the core of the assignment game. In Functional Analysis, Optimization, and Mathematical Economics: A Collection of Papers dedicated to the Memory of L.V. Kantorovich, pages 274–289, Oxford, 1990. Oxford University Press.
[34] M.L. Balinski and S.T. Rachev. On Monge–Kantorovich problems. Preprint, 1989. SUNY at Stony Brook, Dept. of Applied Mathematics and Statistics.
[35] M.L. Balinski and S.T. Rachev. Rounding proportions: rules of rounding. Numer. Funct. Anal. Optimization, 14:475–501, 1993.


[36] M.L. Balinski and S.T. Rachev. Rounding proportions: methods of rounding. Mathematical Scientist, 1997.
[37] M.L. Balinski and A. Russakoff. Faces of dual transportation polyhedra. Mathematical Programming Study, 22:1–8, 1984.
[38] M.L. Balinski and H.P. Young. Stability, coalitions and schisms in proportional representation systems. American Political Science Review, 72:848–858, 1978.
[39] M.L. Balinski and H.P. Young. Fair Representation: Meeting the Ideal of One Man, One Vote. Yale University Press, New Haven, 1982.
[40] A.A. Balkema, L. de Haan, and R. Karandikar. The maximum of n independent stochastic processes. Preprint, 1990. Erasmus University, Rotterdam.
[41] D.P. Barbu and Th. Precupanu. Convexity and optimization in Banach spaces. Sijthoff/Nordhoff, 1978.
[42] R.E. Barlow and F. Proschan. Statistical Theory of Reliability and Life Testing: Probability Models. Holt, Rinehart, and Winston, New York, 1975.
[43] E.R. Barnes and A.J. Hoffman. Partitioning spectra and linear programming. In Proc. Silver Jubilee Conference on Combinations, Ontario, Canada, June 1982. Univ. Waterloo.
[44] E.R. Barnes and A.J. Hoffman. On transportation problems with upper bounds on leading rectangles. SIAM Journal of Algebraic and Discrete Methods, 6:487–496, 1985.
[45] M.F. Barnsley and J.H. Elton. A new class of Markov processes for image encoding. Advances in Applied Probability, 20:14–32, 1988.
[46] D.P. Baron and R.B. Myerson. Regulating a monopolist with unknown cost. Econometrica, 50:911–930, 1982.
[47] S.K. Basu. On the rate of convergence to normality of sums of dependent random variables. Acta Math. Acad. Sci. Hungarica, 28:261–265, 1976.
[48] S.K. Basu and G. Simons. Moment spaces of IFR distributions, applications and related material. In P.K. Sen, editor, Contributions to Statistics: Essay in Honor of Norman L. Johnson, pages 27–46. North-Holland Publishing Company, 1983.


[49] J. Beirlant and S.T. Rachev. The problems in stability in insurance mathematics. Insurance: Mathematics and Economics, 6:179–188, 1987.
[50] V. Beneš. The moment problem and its technical application. In Proc. 30th Int. Wissen. Kolloq., pages 11–14. TH Ilmenau, 1985.
[51] V. Beneš. Moment Problem and Its Application. PhD thesis, Charles University, 1986.
[52] V. Beneš. Extremal and optimal solutions in the transshipment problem. Comment. Math. Univ. Carolinae, 33:97–112, 1992.
[53] V. Beneš. Extremal and Optimal Solutions of the Marginal and Transshipment Problem. PhD thesis, Dept. of Mathematics, FSI, Czech Technical University, Praha, Czech Republic, 1995.
[54] V. Beneš and J. Štěpán. The support of extremal probability measure with given marginals. In M.L. Puri, P. Revesz, and W. Wertz, editors, Mathematical Statistics and Probability Theory, volume A of Proc. 6th Pannon Symp., pages 33–41. D. Reidel Publ. Comp., 1987.
[55] V. Beneš and J. Štěpán. Extremal solutions in the marginal problem. In G. Dall’Aglio et al., editor, Advances in Probability Measures with Given Marginals, pages 189–206. Kluwer, Dordrecht, 1991.
[56] V.Y. Bentkus, F. Götze, V. Paulauskas, and A. Rackauskas. The accuracy of Gaussian approximation in Banach spaces. University of Bielefeld, Preprint 90-100, 1990.
[57] C. Berge. Théorie générale des jeux à n personnes, volume 138. Gauthier-Villars, Paris, 1957. Mémorial des sciences mathématiques.
[58] C. Berge and A. Ghouila-Houri. Programming, Games and Transportation Networks. Methuen, John Wiley and Sons, Inc., New York, 1965.
[59] S. Bertino. Su di una sottoclasse della classe di Fréchet. Statistica, 28:511–542, 1968.
[60] S. Bertino. Sulla distanza tra distribuzioni. Pubbl. Ist. Calc. Prob. Univ. Roma, 1968.
[61] D. Bertsekas and R. Gallager. Data Networks. Prentice-Hall, New Jersey, 1987.
[62] N.P. Bhatia and G.P. Szegö. Stability theory of dynamical systems. Number 161 in Die Grundlehren der mathematischen Wissenschaften. Springer, 1970.


[63] R.M. Bhattacharya and R. Ranga Rao. Normal Approximation and Asymptotic Expansions. Wiley, 1976.
[64] P.J. Bickel and D.A. Freedman. Some asymptotic theory for the bootstrap. Annals of Statistics, 9:1196–1217, 1981.
[65] P. Billingsley. Convergence of Probability Measures. Wiley, New York, 1968.
[66] P. Billingsley. Probability and Measure. Wiley, New York, 2nd edition, 1986.
[67] D. Blackwell and L.E. Dubins. An extension of Skorohod’s almost sure representation theorem. Proc. Amer. Math. Soc., 89:691–692, 1983.
[68] R.C. Blattberg and N.J. Gonedes. A comparison of the stable and student distributions as statistical models for stock prices. J. Business, 47:244–280, 1974.
[69] T. Bollerslev. A conditionally heteroscedastic time series model for speculative prices and rates of return. Review of Economic Studies, 69:542–547, 1987.
[70] E. Bolthausen. Exact convergence rate in some martingale central limit theorems. Annals of Probability, 10:672–688, 1982.
[71] A. Boness, A. Chen, and S. Jatusipitak. Investigations of nonstationary prices. J. Business, 48:518–537, 1979.
[72] A.A. Borovkov. Asymptotic Methods in Queueing Theory. Wiley, New York, 1984.
[73] A.A. Borovkov. On the ergodicity and stability of the sequence $w_{n+1} = f(w_n, z_n)$: applications to communication networks. Theory of Probability and its Applications, 33:595–611, 1988.
[74] A. Brandt, P. Franken, and B. Lisek. Stationary Stochastic Models. Wiley, New York, 1990.
[75] Y. Brenier. Polar factorization and monotone rearrangement of vector-valued functions. Comm. Pure. Appl. Maths., XLIV:375–417, 1987.
[76] G. Brown and B. Shubert. On random binary trees. Mathematics of Operations Research, 9:43–65, 1984.
[77] R.A. Brualdi and J. Csima. Extremal plane stochastic matrices of dimension three. Journal of Linear Algebra and its Applications, 11:105–133, 1975.


[78] R.A. Brualdi and J. Csima. Stochastic patterns. J. Comb. Theory, 19:1–12, 1975.
[79] Y.A. Brudnii. A multidimensional analog of a theorem of Whitney. USSR Math. Sbornik, 11:157–170, 1970.
[80] R.E. Burkard, B. Klinz, and R. Rudolf. Perspectives of Monge properties in optimization. Bericht 2, 1994. Spezialforschungsbereich F 003, Karl-Franzens-Universität Graz & Technische Universität Graz.
[81] R.M. Burton and U. Rösler. An $L^2$-convergence theorem for random affine mappings. Journal of Applied Probability, 32:183–192, 1995.
[82] P.L. Butzer, L. Hahn, and M.Th. Roeckerath. Central limit theorem and weak law of large numbers with rates for martingales in Banach spaces. Journal of Multivariate Analysis, 13:287–301, 1983.
[83] S. Cambanis and G. Simons. Probability and expectation inequalities. Z. Wahrscheinlichkeitstheorie Verw. Geb., 59:285–294, 1982.
[84] S. Cambanis, G. Simons, and W. Stout. Inequalities for $Ek(X, Y)$ when the marginals are fixed. Z. Wahrscheinlichkeitstheorie Verw. Geb., 36:285–294, 1976.
[85] L. Cavalli-Sforza. Cultural and biological evolution: a theoretical inquiry. In S.G. Ghurye, editor, Proceedings of the Conference on Directions for Mathematical Statistics, volume 7 of Suppl. Adv. Appl. Prob., pages 90–99, 1975.
[86] L. Cavalli-Sforza and M.W. Feldman. Models for cultural inheritance I. Group mean and within group variation. Theoret. Popn. Biol., 4:42–55, 1973.
[87] S. Chandrasekhar and G. Munch. The theory of the fluctuations in brightness of the milky way. I and II. Astrophys. J., 112:380–398, 1950.
[88] M.R. Chernick, D.J. Daley, and R.P. Littlejohn. A time-reversibility relationship between two Markov chains with exponential stationary distributions. Journal of Applied Probability, 25:418–422, 1988.
[89] G. Choquet. Forme abstraite du théorème de capacitabilité. Ann. Inst. Fourier, 9:83–89, 1959.
[90] Y.S. Chow and H. Teicher. Probability Theory: Independence, interchangeability, martingales. Springer, New York, 1978.
[91] F.H. Clarke. Optimization and nonsmooth analysis. Classics in Appl. Math. SIAM, 1990.


[92] J.M.C. Clark and R.J. Cameron. The maximum rate of convergence of discrete approximations for stochastic differential equations. Lecture Notes in Control and Information Science, 25:162–171, 1980.
[93] P.K. Clark. A subordinated stochastic process model with finite variance for speculative prices. Econometrica, 41:135–155, 1973.
[94] M. Cramer. Stochastische Analyse rekursiver Algorithmen mit idealen Metriken. PhD thesis, Universität Freiburg, 1995a.
[95] M. Cramer. Convergence of a branching type recursion with nonstationary immigration. Metrica, 1995b. To appear.
[96] M. Cramer. A note concerning the limit distribution of the Quicksort algorithm. Informatique Théorique et Appl., 30:195–207, 1996.
[97] M. Cramer and L. Rüschendorf. Analysis of recursive algorithms by the contraction method. Lecture Notes in Statistics, 114:18–33, 1996a.
[98] M. Cramer and L. Rüschendorf. Convergence of a branching type recursion. Annales de l’Institut Henri Poincaré, 32:725–741, 1996b.
[99] J. Csima. Multidimensional stochastic matrices and patterns. J. Algebra, 14:194–202, 1970.
[100] J.A. Cuesta-Albertos and C. Matrán. Strong convergence of weighted sums of random elements through the equivalence of sequences of distributions. Journal of Multivariate Analysis, 25:311–322, 1988.
[101] J.A. Cuesta-Albertos and C. Matrán. Notes on the Wasserstein metric in Hilbert spaces. Annals of Probability, 17:1264–1276, 1989.
[102] J.A. Cuesta-Albertos and C. Matrán. Skorohod representation theorem and Wasserstein metrics. Preprint, 1991.
[103] J.A. Cuesta-Albertos and C. Matrán. A review on strong convergence of weighted sums of random elements based on Wasserstein metrics. Journal of Stat. Planning Infer., 30:359–370, 1992.
[104] J.A. Cuesta-Albertos and C. Matrán. Stochastic convergence through Skorohod representation theorems and Wasserstein metrics. Suppl. Rendic. Circolo Matem. Palermo II, 35:89–113, 1994.
[105] J.A. Cuesta-Albertos, C. Matrán, S.T. Rachev, and L. Rüschendorf. Mass transportation problems in probability theory. Mathematical Scientist, 21:37–72, 1996.


[106] J.A. Cuesta-Albertos, L. Rüschendorf, and A. Tuero-Diaz. Optimal coupling of multivariate distributions and stochastic processes. Journal of Multivariate Analysis, 46:335–361, 1993.
[107] J.A. Cuesta-Albertos and A. Tuero-Diaz. A characterization for the solution of the Monge–Kantorovich mass transference problem. Statist. Probab. Letters, 16:147–152, 1993.
[108] G. Dall’Aglio. Sugli estremi dei momenti delle funzioni di ripartizione doppie. Ann. Scuola Normale Superiore Di Pisa, Cl. Sci., 3(1):33–74, 1956.
[109] G. Dall’Aglio. Sulla compatibilita delle funzioni di ripartizione doppia. Rendiconti di Math., 18:385–413, 1959.
[110] G. Dall’Aglio. Les fonctions extrèmes de la classe de Fréchet à 3 dimensions. Publ. Inst. Stat. Univ. Paris, IX:175–188, 1960.
[111] G. Dall’Aglio. Sulle distribuzioni doppie con margini assegnati soggette a delle limitazioni. It. Giorn. Ist. Ital. Attuari, 94, 1961.
[112] G. Dall’Aglio. Fréchet classes and compatibility of distribution functions. Symposia Mathematica, 9:131–150, 1972.
[113] G. Dall’Aglio, S. Kotz, and G. Salinetti. Advances in Probability Distributions with Given Marginals. Kluwer, Dordrecht, 1991.
[114] G.B. Dantzig and A.R. Ferguson. The allocation of aircraft to routes – an example of linear programming under uncertain demands. Mang. Science, 3:45–73, 1956.
[115] A. D’Aristotile, P. Diaconis, and D. Freedman. On a merging of probabilities. No. 301, 1988. Dept. of Statistics, Stanford University.
[116] M.M. Day. Normed Linear Spaces. Springer, Berlin–Göttingen–Heidelberg, 1958.
[117] A. de Acosta. Invariance principles in probability for triangle arrays of B-valued random vectors and some applications. Annals of Probability, 10:346–373, 1982.
[118] L. de Haan, E. Omey, and S.I. Resnick. Domains of attraction and regular variation in $\mathbb{R}^d$. Journal of Multivariate Analysis, 14:17–33, 1984.
[119] L. de Haan and S.T. Rachev. Estimates of the rate of convergence for max-stable processes. Annals of Probability, 17:651–677, 1989.


[120] L. de Haan and S.I. Resnick. Limit theory for multivariate sample extremes. Z. Wahrscheinlichkeitstheorie Verw. Geb., 40:317–337, 1977.
[121] L. de Haan, S.I. Resnick, H. Rootzen, and C.G. Vries. Extremal behavior of solutions to a stochastic difference equation with applications to ARCH process. Stoch. Processes and Applications, 32:213–224, 1989.
[122] P. de Jong. Central limit theorems for generalized multilinear forms. CWI Tract, Amsterdam, 61, 1989.
[123] G. Debreu. Representation of a preference ordering by a numerical function. In Decision Processes, pages 159–165. Wiley, New York, 1954.
[124] G. Debreu. Continuity properties of paretian utility. Intern. Econ. Revue, 5:285–293, 1964.
[125] P. Deheuvels and D. Pfeifer. On a relationship between Uspensky’s theorem and Poisson approximation. Ann. Inst. Statist. Math., 40:671–681, 1988.
[126] C. Dellacherie and P.A. Meyer. Probabilités et potentiel, volume 29 of North-Holland Mathematics Studies. Hermann, Paris, 1983. Chapitres IX à XI.
[127] U. Derigs, O. Goecke, and R. Schrader. Monge sequences and a simple assignment algorithm. Discrete Applied Mathematics, 15:241–248, 1986.
[128] L. Devroye. Lecture Notes on Bucket Algorithms, volume 6. Birkhäuser, Boston, 1986. Progress in computer science.
[129] L. Devroye. A Course in Density Estimation, volume 14 of Progress in probability and statistics. Birkhäuser, Boston, 1987.
[130] P. Diaconis and D. Freedman. On rounding percentages. Journal of the American Statistical Association, 74:359–364, 1979.
[131] P. Diaconis and D. Freedman. A dozen de Finetti-style results in search of a theory. Annales de l’Institut Henri Poincaré, 23:397–423, 1987.
[132] H. Dietrich. Zur c-Konvexität und c-Subdifferenzierbarkeit von Funktionalen. Optimization, 19:355–371, 1988.
[133] N. Dinculeanu. Vector Measures, volume 95 of International series of monographs on pure and applied mathematics. Pergamon Press, Oxford, 1967.


[134] R.L. Dobrushin. Prescribing a system of random variables by conditional distributions. Theory of Probability and its Applications, 15:458–486, 1970.
[135] R.L. Dobrushin. Vlasov equations. Func. Anal. Appl., 13:115–123, 1979.
[136] I. Domowitz and C.S. Hakkio. Conditional variance and the risk premium in the foreign exchange market. Journal of Internat. Economics, 19:47–66, 1985.
[137] H. Doss. Liens entre équations différentielles stochastiques et ordinaires. Annales de l’Institut Henri Poincaré, XIII:99–125, 1977.
[138] R.G. Douglas. On extremal measures and subspace density. Michigan Math. J., 11:243–246, 1964.
[139] D.C. Dowson and B.V. Landau. The Fréchet distance between multivariate normal distributions. Journal of Multivariate Analysis, 12:450–455, 1982.
[140] A.Y. Dubovitskii and A.A. Milyutin. Necessary Conditions for a Weak Extremum in the General Problems of Optimal Management. Nauka, Moscow, 1971. In Russian.
[141] R.M. Dudley. Convergence of Baire measures. Studia Mathematica, 27:251–268, 1966.
[142] R.M. Dudley. Distances of probability measures and random variables. Annals of Mathematical Statistics, 39:1563–1572, 1968.
[143] R.M. Dudley. The speed of mean Glivenko–Cantelli convergence. Annals of Mathematical Statistics, 40:40–50, 1969.
[144] R.M. Dudley. Speeds of metric probability convergence. Z. Wahrscheinlichkeitstheorie Verw. Geb., 22:323–332, 1972.
[145] R.M. Dudley. Probability and metrics. Convergence of laws on metric spaces, with a view to statistical testing. Aarhus Univ. Lect. Notes, 45, 1976.
[146] R.M. Dudley. Real Analysis and Probability. Wadsworth & Brooks/Cole, Pacific Grove, California, 1989.
[147] D. Duffie. Dynamic Asset Pricing Theory. Princeton University Press, Princeton, 1992.
[148] N. Dunford and J. Schwartz. Linear Operators. General Theory, volume Part I. Wiley-Interscience Publication, New York, 1958.


[149] R. Durrett and M. Liggett. Fixed points of the smoothing transformation. Z. Wahrscheinlichkeitstheorie Verw. Geb., 64:275–301, 1983.
[150] A. Dvoretzky. Asymptotic normality for sums of dependent random variables. Proc. Berkeley Symp. II, pages 513–535, 1970.
[151] D.A. Edwards. On the existence of probability measures with given marginals. Ann. Inst. Fourier, 28:53–78, 1978.
[152] I. Ekeland and R. Temam. Convex analysis and variational problems. North Holland, 1976.
[153] K.H. Elster and R. Nehse. Zur Theorie der Polarfunktionale. Optimization, 5:3–21, 1974.
[154] R.F. Engle, D.M. Lilien, and R.P. Robins. Estimating time varying risk premia in the term structure: the ARCH model. Econometrica, 55:391–407, 1987.
[155] Y. Ermoljev, A. Gaivoronski, and C. Nedeva. Stochastic optimization problem with incomplete information on distribution functions. Report WP-83-113, 1983.
[156] I.V. Evstigneev. Measurable choice theorems and probabilistic control models. Dokl. Akad. Nauk USSR, 283(5):1065–1068, 1985.
[157] G. Fayolle, P. Flajolet, and M. Hofri. On a functional equation arising in the analysis of a protocol for a multi-access broadcast channel. Advances in Applied Probability, 18:441–472, 1986.
[158] G. Fayolle, P. Flajolet, M. Hofri, and P. Jacquet. Analysis of a stack algorithm for random multiple-access communication. IEEE Transactions on Information Theory, 31:244–254, 1985.
[159] M.W. Feldman, S.T. Rachev, and L. Rüschendorf. Limit theorems for recursive algorithms. Journal of Computational and Applied Mathematics, 56:69–182, 1994.
[160] W. Feller. An Introduction to Probability Theory and Its Applications, volume II. Wiley, New York, 2nd edition, 1971.
[161] R. Ferland and G. Giroux. Cutoff-type Boltzmann equations: Convergence of the solution. Adv. Appl. Math., 8:98–107, 1987.
[162] R. Ferland and G. Giroux. Le modèle Bose–Einstein de l’équation non linéaire de Boltzmann: Convergence vers l’équilibre. Ann. Sc. Math. Québec, 15:23–33, 1991.
[163] X. Fernique. Sur le théorème de Kantorovich–Rubinstein dans les espaces polonais. Lecture Notes in Mathematics, 850:6–10, 1981.


[164] P.C. Fishburn, J.C. Lagarias, J.A. Reeds, and L.A. Shepp. Sets uniquely determined by projections on axes. I. Continuous case. SIAM Journal on Applied Mathematics, 50:288–306, 1990.
[165] A.T. Fomenko and S.T. Rachev. Volume functions on historical (narrative) texts and the amplitude correlation principle. Computers and Humanities, 24(3):187–206, 1990.
[166] P.R. Fortet and B. Mourier. Convergence de la repartition empirique vers la repartition theoretique. Ann. Sci. Ecole Norm. Sup., 70(3):267–285, 1953.
[167] M.J. Frank. Operations arising from copulas. In Symp. Probab. Measures with Given Marginals, volume 67 of Math. Appl., pages 75–93, Rome, 1991.
[168] M.J. Frank, R.B. Nelsen, and B. Schweizer. Best possible bounds for the distribution of a sum – a problem of Kolmogorov. Probability Theory and Related Fields, 74:199–211, 1987.
[169] M. Fréchet. Sur les tableaux de corrélation dont les marges sont données. Ann. Univ. de Lyon, Sciences, 14:53–77, 1951.
[170] M. Fréchet. Les tableaux de correlation dont les marges sont données. Ann. Univ. de Lyon, Sciences, 20:13–31, 1957.
[171] M. Fréchet. Sur la distance de deux lois de probabilité. C.R. Acad. Sci. Paris, 244:689–692, 1957.
[172] M. Fréchet. Sur les tableaux de corrélation dont les marges et des bornes sont données. Revue Inst. Int. de Statistique, 28:10–32, 1960.
[173] N. Gaffke and L. Rüschendorf. On a class of extremal problems in statistics. Math. Operationsforschung Statist., 12:123–135, 1981.
[174] N. Gaffke and L. Rüschendorf. On the existence of probability measures with given marginals. Statistics & Decisions, 2:163–174, 1984.
[175] D. Gale. Theory of Linear Economic Models. McGraw-Hill, New York, 1960.
[176] D. Gale and A. Mas-Colell. An equilibrium existence theorem for a general model without ordered preferences. Journal of Mathematical Economics, 2:9–15, 1975.
[177] W. Gangbo and R.J. McCann. Optimal maps in Monge’s mass transport problem. CRAS, Ser. I, 321:1653–1658, 1995.
[178] W. Gangbo and R.J. McCann. The geometry of optimal transformations. Preprint, 1996.


[179] M. Gelbrich. On a formula for the $L^p$ Wasserstein metric between measures on Euclidean and Hilbert spaces. Preprint 179, 1988. Sektion Mathematik der Humboldt-Universität zu Berlin.
[180] M. Gelbrich. $L^p$-Wasserstein-Metriken und Approximationen stochastischer Differentialgleichungen. Dissertation A, Humboldt-Universität zu Berlin, Sektion Mathematik, 1989.
[181] M. Gelbrich. On a formula for the $L^2$-Wasserstein metric between measures on Euclidean and Hilbert spaces. Math. Nachr., 147:185–203, 1990.
[182] M. Gelbrich. Simultaneous time and chance discretization for stochastic differential equations. Journal of Computational and Applied Mathematics, 58:255–289, 1995.
[183] M. Gelbrich and S.T. Rachev. Discretization for stochastic differential equations, $L^2$-Wasserstein metrics, and econometric models. In Distributions with Given Marginals. IMS Proc., 1996. To appear.
[184] I. Gelfand, D. Raikov, and G. Shilov. Kommutative normierte Algebren. VEB Deutscher Verlag der Wissenschaften, 1964.
[185] C. Genest. A survey of the statistical properties and applications of Archimedean copulas, 1990. Technical Report.
[186] H. Gerber. An Introduction to Mathematical Risk Theory. Huebner Foundation Monograph, 1981.
[187] I.I. Gikhman and A.W. Skorokhod. Introduction to the theory of stochastic processes. Nauka, Moscow, 1977. In Russian.
[188] C. Gini. Di una misura delle relazioni tra le graduatorie di due caratteri. Appendix to: A. Hancini. L’Elezioni Generali Politiche del 1913 nel comune di Roma, Ludovic, Cecehini, 1914.
[189] C. Gini. La dissomiglianza. Metron, 24:309–331, 1965.
[190] C.R. Givens and R.M. Shortt. A class of Wasserstein metrics for probability distributions. Michigan Math. J., 31:231–240, 1984.
[191] D. Goldfarb. Efficient dual simplex algorithms for the assignment problem. Preprint, 1985.
[192] C.M. Goldie. Implicit renewal theory and tails of solutions of random equations. Annals of Applied Probability, 1:126–166, 1991.
[193] C. Graham. McKean–Vlasov Itô–Skorohod equations and nonlinear diffusions with discrete jump sets. Stoch. Proc. Appl., 40:69–82, 1992.


[194] C. Graham. Nonlinear diffusions with jumps. Preprint, 1992.
[195] R.M. Gray, D.L. Neuhoff, and R.L. Dobrushin. Block synchronization, sliding-block coding, invulnerable sources and zero error codes for discrete noisy channels. Annals of Probability, 8:315–328, 1980.
[196] R.M. Gray, D.L. Neuhoff, and P.C. Shields. A generalization to Ornstein’s d-distance with applications to information theory. Annals of Probability, 3:315–328, 1975.
[197] R.M. Gray and D.S. Ornstein. Block coding for discrete stationary d-continuous channels. IEEE Transactions on Information Theory, 25:292–306, 1979.
[198] N.E. Gretsky, J.M. Ostroy, and W.R. Zame. The nonatomic assignment model. Journal of Economic Theory, 2:103–128, 1992.
[199] N.V. Grigorevski and I.S. Shiganov. On some modifications of Dudley’s metric. Zap. Nauchnich Sem. LOMI, 61:17–24, 1976.
[200] F.A. Grünbaum. Propagation of chaos for the Boltzmann equation. Arch. Rational Mech. Anal., 42:323–345, 1971.
[201] P. Gudynas. Approximation by distributions of sums of conditionally independent random variables. Litovski Mat. Sbornik, 24:68–80, 1985.
[202] Y. Guivarch. Sur une extension de la notion de loi semi-stable. Annales de l’Institut Henri Poincaré, 26:261–286, 1990.
[203] W. Gutjahr and G.Ch. Pflug. The asymptotic contour process of a binary tree is a Brownian excursion. Stoch. Processes and Applications, 41:69–89, 1992.
[204] S. Gutmann, J.H.B. Kemperman, and J.A. Reeds. Existence of probability measures with given marginals. Annals of Probability, 19:1781–1791, 1991.
[205] S. Gutmann, J.H.B. Kemperman, J.A. Reeds, and L.A. Shepp. Existence of probability measures with given marginals. Annals of Probability, 19:1781–1791, 1991.
[206] D.L. Guy. Common extension of finitely additive probability measures. Portugalia Math., 20:1–5, 1961.
[207] M.G. Hahn, W.N. Hudson, and J.A. Veeh. Operator stable laws: series representations and domains of normal attraction. Journal of Multivariate Analysis, 10:26–37, 1989.
[208] P. Hall. Personal communication, 1985.


[209] J.P. Hammond. Straightforward individual incentive compatibility in large economies. Review of Economic Studies, 46:263–282, 1979.
[210] W.K.K. Haneveld. Duality in Stochastic Linear and Dynamic Programming. Centrum voor Wiskunde en Informatica, Amsterdam, 1985.
[211] L.G. Hanin. Kantorovich–Rubinstein duality for Lipschitz spaces defined by differences of arbitrary order. Soviet Math. Doklady, 42(1):220–224, 1991.
[212] L.G. Hanin and S.T. Rachev. An extension of the Kantorovich–Rubinstein mass transportation problem, 1991. Dept. of Statistics and Applied Probability, University of California, Santa Barbara.
[213] L.G. Hanin and S.T. Rachev. Mass transshipment problems and ideal metrics. Journal of Computational and Applied Mathematics, 56:183–196, 1994.
[214] L.G. Hanin and S.T. Rachev. An extension of the Kantorovich–Rubinstein mass transshipment problem. Numer. Funct. Anal. Optimization, 16:701–735, 1995.
[215] G. Hansel and J.P. Troallic. Measures marginales et théorème de Ford–Fulkerson. Z. Wahrscheinlichkeitstheorie Verw. Geb., 43:245–251, 1978.
[216] G. Hansel and J.P. Troallic. Sur le problème des marges. Probability Theory and Related Fields, 71:357–366, 1986.
[217] F. Hausdorff. Set Theory. Chelsea Publishing Company, New York, 1957.
[218] E. Häussler. On the rate of convergence in the central limit theorem for martingales with discrete and continuous time. Annals of Probability, 16:275–299, 1988.
[219] H. Heinich and J.C. Lootgieter. Convergence des fonctions monotones. Preprint, 1993.
[220] I.S. Helland and T.S. Nilsen. On a general random exchange mode. Journal of Applied Probability, 13:781–790, 1976.
[221] P.L. Hennequin and A. Tortrat. Probability Theory and Some of Its Applications. Nauka, Moscow, 1974. Russian translation.
[222] W. Hildenbrand. On economies with many agents. Journal of Economic Theory, 2:161–168, 1970.


[223] C. Hipp and R. Michel. Risikotheorie: Stochastische Modelle und Statistische Methoden. DGVM, 24, 1990.
[224] W. Hoeffding. Maßstabinvariante Korrelationstheorie. Schriften des Mathematischen Instituts und des Instituts für Angewandte Mathematik der Universität Berlin, 5:181–233, 1940.
[225] W. Hoeffding. The extrema of the expected value of a function of independent random variables. Annals of Mathematical Statistics, 26:268–275, 1955.
[226] W. Hoeffding and S.S. Shrikhande. Bounds for the distribution function of a sum of independent, identically distributed random variables. Annals of Mathematical Statistics, 27:439–449, 1956.
[227] A.J. Hoffman. On simple linear programming problems. Convexity. In Proceedings of Symposia in Pure Mathematics, volume 7, pages 317–327, Providence, R.I, 1961.
[228] A.J. Hoffman. On simple linear programming problems. In V. Klee, editor, Convexity, volume 7, pages 317–327, Providence, R.I, 1963. Proc. Symp. Pure Math.
[229] A.J. Hoffman and A.F. Veinott jr. Staircase transportation problems with hyperadditive rewards and cumulative capacities. Preprint, 1990. IBM T.J. Watson Research Center, Yorktown Heights, New York, 10598.
[230] J. Hoffmann-Jörgensen. Probability in Banach space. Lecture Notes in Mathematics, 598:2–186, 1977.
[231] M. Hofri. Probabilistic Analysis of Algorithms. Springer, New York, 1987.
[232] R. Holley and M. Liggett. Generalized potlatch and smoothing processes. Z. Wahrscheinlichkeitstheorie Verw. Geb., 55:165–195, 1981.
[233] G. Hooghiemstra and M. Keane. Calculation of the equilibrium distribution for a solar energy storage model. Journal of Applied Probability, 22:852–864, 1985.
[234] G. Hooghiemstra and C.L. Scheffer. Some limit theorems for an energy storage model. Stoch. Processes and Applications, 22:121–127, 1986.
[235] J. Horowitz and R.L. Karandikar. Martingale problems associated with the Boltzmann equation. In E. Çinlar et al., editor, Seminar on Stochastic Processes 1989, Boston, 1990. Birkhäuser.


[236] J. Horowitz and R.L. Karandikar. Mean rates of convergence of empirical measures in the Wasserstein metric. Journal of Computational and Applied Mathematics, 55:261–273, 1994.
[237] D.A. Hsieh. The statistical properties of daily foreign exchange rates: 1974–1983. Journal of Internat. Economics, 24:129–145, 1988.
[238] P.J. Huber. Robust Statistics. Wiley, New York, 1981.
[239] W.N. Hudson. Operator-stable distributions and stable marginals. Journal of Multivariate Analysis, 10:26–37, 1980.
[240] W.N. Hudson, Z.J. Jurek, and J.A. Veeh. The symmetry group and exponents of operator stable probability measures. Annals of Probability, 14:1014–1023, 1986.
[241] W.N. Hudson and J.D. Mason. Operator-stable laws. Journal of Multivariate Analysis, 11:434–447, 1981.
[242] W.N. Hudson, J.A. Veeh, and D.C. Weiner. Moments of distributions attracted to operator-stable laws. Journal of Multivariate Analysis, 24:1–10, 1988.
[243] J.E. Hutchinson. Fractals and selfsimilarity. Indiana Univ. Math. Journal, 30:713–747, 1981.
[244] Z. Ignatov and S.T. Rachev. Minimality of ideal probabilistic metrics. J. Soviet Math., 32(6):595–608, 1986.
[245] N. Ikeda and S. Watanabe. Stochastic Differential Equations and Diffusion Processes. North-Holland, Amsterdam, 1981.
[246] A.D. Ioffe and V.M. Tihomirov. Theorie der Extremalaufgaben. VEB Deutscher Verlag der Wissenschaften, Berlin, 1979.
[247] K. Isii. Inequalities of the type of Chebychev and Cramér-Rao and mathematical programming. Ann. Inst. Statist. Math., 16:247–270, 1964.
[248] E.H. Ivanov and R. Nehse. Relations between generalized concepts of convexity and conjugacy. Math. Operationsforschung Statist., 13:9–18, 1982.
[249] K. Jacobs. Measure and Integral. Academic Press, New York, 1987.
[250] J. Jacod. Calcul stochastique et problème de martingales. Lecture Notes in Mathematics, 714, 1979.


[251] P. Jacquet and M. Regnier. Normal limiting distribution of the size of tries. In P.J. Courtois and G. Latouche, editors, Proc. Performance 87, pages 209–223, Amsterdam, 1988. Elsevier Science Publications B.V. (North Holland).
[252] R. Janssen. Discretization of the Wiener-Process in Difference-Methods for stochastic differential equations. Stoch. Processes and Applications, 18:361–369, 1984.
[253] M. Jirina and J. Nedoma. Minimax solution of a sampling inventory process. Aplikace matematiky, 1:296–314, 1957. In Czech.
[254] R. Jirousek. A survey of methods used in probabilistic expert systems for knowledge integration. Knowledge Based Systems, 3:7–12, 1990.
[255] R. Jirousek. Solution of the marginal problem and decomposable distributions. Kybernetika, 27(5):403–412, 1991.
[256] H. Johnen and K. Scherer. On the equivalence of K-functional and moduli of continuity and some applications. Lecture Notes in Mathematics, 571:119–130, 1977.
[257] J.P. Kahane and J. Peyrière. Sur certaines martingales de Benoit Mandelbrot. Adv. Math., 22:131–145, 1976.
[258] A.V. Kakosjan, K. Klebanov, and S.T. Rachev. Quantitative Criteria for Convergence of Probability Measures. Ayastan Press, Erevan, 1988. (In Russian, Engl. transl.: Springer-Verlag, To appear).
[259] A.V. Kakosjan and L.B. Klebanov. On estimates of the closeness of distributions in terms of characteristic functions. Theory of Probability and its Applications, 29:852–853, 1984.
[260] V.V. Kalashnikov and S.T. Rachev. Characterization problems in queueing theory and their stability. Advances in Applied Probability, 17:320–348, 1985.
[261] V.V. Kalashnikov and S.T. Rachev. Characterization of inverse problems in queueing and their stability, 1986.
[262] V.V. Kalashnikov and S.T. Rachev. Mathematical Methods for Construction of Stochastic Queueing Models. Wadsworth & Brooks/Cole, California, 1990.
[263] T. Kamae, U. Krengel, and G.I. O’Brien. Stochastic inequalities on partially ordered spaces. Annals of Probability, 5:899–912, 1977.
[264] S. Kanagawa. The rate of convergence for approximate solutions of stochastic differential equations. Tokyo J. Math., 12:33–48, 1986.

[265] Y. Kannai. Continuity properties of the core of a market. Econometrica, 38(6):791–815, 1970.
[266] L.V. Kantorovich. On the transfer of masses. Dokl. Akad. Nauk USSR, 37:7–8, 1942.
[267] L.V. Kantorovich. On a problem of Monge. Uspekhi Mat. Nauk, 3:225–226, 1948. In Russian.
[268] L.V. Kantorovich and G.P. Akilov. Functional Analysis. Nauka, Moscow, 3rd edition, 1984. In Russian.
[269] L.V. Kantorovich and G.Sh. Rubinstein. On a function space in certain extremal problems. Dokl. Akad. Nauk USSR, 115(6):1058–1061, 1957.
[270] L.V. Kantorovich and G.Sh. Rubinstein. On the space of completely additive functions. Vestnik Leningrad Univ., Ser. Mat. Mekh. i Astron., 13(7):52–59, 1958. In Russian.
[271] S. Karlin and W.J. Studden. Tchebycheff Systems. Interscience, New York, 1966.
[272] T. Kawata. Fourier Analysis in Probability Theory. Academic Press, New York, 1972.
[273] H.G. Kellerer. Funktionen auf Produkträumen mit vorgegebenen Marginal-Funktionen. Math. Ann., 144:323–344, 1961.
[274] H.G. Kellerer. Maßtheoretische Marginal Probleme. Math. Annalen, 153:168–198, 1964.
[275] H.G. Kellerer. Duality theorems and probability metrics. In Proc. 7th Brasov Conf., pages 211–220, Bucuresti, 1984.
[276] H.G. Kellerer. Duality theorems for marginal problems. In M. Iosifescu, editor, Proceedings of the 7th Conference on Probability Theory, Braşov, Romania, 1984.
[277] H.G. Kellerer. Duality theorems for marginal problems. Z. Wahrscheinlichkeitstheorie Verw. Geb., 67:399–432, 1984.
[278] H.G. Kellerer. Ambiguity in bounded moment problems. In AMS-IMS-SIAM Joint Research Conference: Distributions with fixed marginals, double-stochastic measures and Markov operators, 1993. To appear.
[279] R. Kemp. Fundamentals of the Average Case Analysis of Particular Algorithms. Wiley, New York, 1984.

[280] J.H.B. Kemperman. The general moment problem, a geometric approach. Annals of Mathematical Statistics, 19:93–122, 1968.
[281] J.H.B. Kemperman. On a class of moment problems. In Proceedings 6th Berkeley Symposium on Mathematical Statistics and Probability, volume 2, pages 101–126, 1972.
[282] J.H.B. Kemperman. On the FKG-inequality for measures on a partially ordered space. Proc. Nederl. Akad. Wet., 80:313–331, 1977.
[283] J.H.B. Kemperman. On the role of duality in the theory of moments. In Semi-Infinite Programming and Applications 1981, volume 215, pages 63–92. Springer, 1983.
[284] J.H.B. Kemperman. Geometry of the moment problem. In Proceedings of Symposia in Applied Mathematics, volume 27, pages 16–53. American Mathematical Society, 1987.
[285] J.H.B. Kemperman. Moment problems for measures on ℝ^n with given k-dimensional marginals. In AMS-IMS-SIAM Joint Research Conference: Distributions with fixed marginals, double-stochastic measures and Markov operators, 1993. To appear.
[286] H. Kesten. Random difference equations and renewal theory for products of random matrices. Acta Math., 131:207–248, 1973.
[287] L.A. Khalfin and L.B. Klebanov. A solution of the computer tomography paradox and estimation of the distances between the densities of measures with the same marginals. Annals of Probability, 22:2235–2241, 1994.
[288] V. Kifer. Ergodic Theory of Random Transformations. Birkhäuser, Boston, 1986.
[289] T. Kim and M.K. Richter. Nontransitive-nontotal consumer theory. Journal of Economic Theory, 38, 1986.
[290] A.Y. Kiruta, A.M. Rubinov, and E.B. Yanovskaya. Optimal choice of distributions in complex socio-economic problems. Nauka, Leningrad, 1980. In Russian.
[291] L.B. Klebanov, G.M. Maniya, and I.A. Melamed. A problem of Zolotarev and analogs of infinitely divisible and stable distributions in a scheme for summing a random number of random variables. Theory of Probability and its Applications, 29:791–794, 1984.
[292] L.B. Klebanov and S.T. Mkrtchian. Estimator of the closeness of distributions in terms of coinciding moments. In Problems of Stability of Stochastic Models, Proceedings, pages 64–72, Moscow, 1980.

[293] L.B. Klebanov and S.T. Rachev. The method of moments in computer tomography. Math. Scientist, 20:1–14, 1995.
[294] L.B. Klebanov and S.T. Rachev. On a special case of the basic problem in diffraction tomography. In Stochastic Models, 1995.
[295] L.B. Klebanov and S.T. Rachev. Closeness of probability measures with common marginals on finite number of direction. In Proceedings of Distributions with fixed Marginals and Related Topics, volume 28, pages 162–174. IMS Lecture Notes Monograph Series, 1996.
[296] L.B. Klebanov and S.T. Rachev. Proximity of probability with common marginals in a finite number of directions. In Distributions with Given Marginals, 1996.
[297] P. Kleinschmidt, C.W. Lee, and H. Schannath. Transportation problems which can be solved by the use of Hirsch paths for the dual problems. Mathematical Programming Study, 37:153–168, 1987.
[298] P.E. Kloeden and E. Platen. Numerical Solution of Stochastic Differential Equations. Springer-Verlag, Berlin, 1992.
[299] M. Knott and C.S. Smith. On the optimal mapping of distributions. Journal of Optimization Theory and Applications, 43:39–49, 1984.
[300] M. Knott and C.S. Smith. Note on the optimal transportation of distributions. Journal of Optimization Theory and Applications, 52:323–329, 1987.
[301] M. Knott and C.S. Smith. On Hoeffding–Fréchet bounds and cyclic monotone relations. Journal of Multivariate Analysis, 40:328–334, 1992.
[302] M. Knott and C.S. Smith. On a generalization of cyclic monotonicity and distances among random vectors. Linear Algebra and its Applications, 199:363–371, 1994.
[303] D.E. Knuth. The Art of Computer Programming, volume II. Addison-Wesley, 1969.
[304] J. Komlós, P. Major, and G. Tusnády. An approximation of partial sums of independent r.v.s and the sample d.f., I. Z. Wahrscheinlichkeitstheorie Verw. Geb., 32:111–131, 1975.
[305] J. Komlós, P. Major, and G. Tusnády. An approximation of partial sums of independent r.v.s and the sample d.f., II. Z. Wahrscheinlichkeitstheorie Verw. Geb., 34:33–58, 1976.
[306] M.G. Krein and A.A. Nudelman. The Markov moment problem and extremal problems, 1977.

[307] W.M. Kruskal. Ordinal measures of association. Journal of the American Statistical Association, 53:814–861, 1958.
[308] J. Kuelbs. Kolmogorov’s law of the iterated logarithm for Banach space valued random variables. Illinois J. Math., 21:784–800, 1977.
[309] K. Kuratowski. Topology, volume I. Academic Press, New York, 1966.
[310] K. Kuratowski. Topology, volume II. Academic Press, New York, 1969.
[311] I. Kuznezova-Sholpo and S.T. Rachev. Explicit solutions of moment problems. Probability and Mathematical Statistics, 10:297–312, 1989.
[312] J.J. Laffont and E. Maskin. A differential approach to dominant strategy mechanisms. Econometrica, 48:1507–1520, 1980.
[313] T.L. Lai and M. Robbins. Maximally dependent random variables. Proc. Nat. Acad. Sci. USA, 73:286–288, 1976.
[314] P. Lancaster. Theory of Matrices. Wiley, New York, London, 1969.
[315] D. Landers and L. Rogge. Best approximations in L_φ-spaces. Z. Wahrscheinlichkeitstheorie Verw. Geb., 51:215–237, 1980.
[316] F. Lassner. Sommes de produit de variables aléatoires indépendantes. Thesis, Université de Paris VI, 1974.
[317] M. Ledoux and M. Talagrand. Probability in Banach Spaces. Springer, Berlin, 1991.

[318] S.J. Leese. Multifunctions of Suslin type. Bull. Austral. Math. Soc., 11:395–411, 1975, and 13:159–160.
[319] G. Letac. Représentation des mesures de probabilité sur le produit de deux espaces dénombrables, de marges données. Ann. Inst. Fourier, 16:497–507, 1966.
[320] G. Letac. A contraction principle for certain Markov chains and its applications. Random matrices and their applications. In H. Kesten, J.E. Cohen, and C.M. Newman, editors, Proc. AMS-IMS-SIAM Joint Summer Research Conf. 1984, volume 50 of Contemp. Math., pages 263–273, Providence, R.I., 1986. Amer. Math. Soc.
[321] V.L. Levin. Application of E. Helly’s theorem to convex programming, problems of best approximation and related questions. USSR Math. Sbornik, 8:235–248, 1969.

[322] V.L. Levin. Duality and approximation in the problem of mass transfer. In B.S. Mityagin, editor, Mathematical Economics and Functional Analysis, pages 94–108. Nauka, Moscow, 1974. In Russian.
[323] V.L. Levin. On the problem of mass transfer. Soviet Math. Doklady, 16:1349–1353, 1975.
[324] V.L. Levin. On the theorems in the Monge–Kantorovich problem. Uspekhi Mat. Nauk, 32:171–172, 1977. In Russian.
[325] V.L. Levin. The mass transfer problem, strong stochastic domination and probability measures on the product of two compact spaces with given projections. Preprint, TsEMI, Moscow, 1978a. In Russian.
[326] V.L. Levin. The Monge–Kantorovich problem on mass transfer. In Methods of Functional Analysis in Mathematical Economics, pages 23–55. Nauka, Moscow, 1978b. In Russian.
[327] V.L. Levin. Measurable selections of multivalued mappings into topological spaces and upper envelopes of Carathéodory integrands. Soviet Math. Doklady, 21:771–775, 1980.
[328] V.L. Levin. Some applications of duality for the problem of translocation of masses with a lower semicontinuous cost function. Closed preferences and Choquet theory. Soviet Math. Doklady, 2:262–267, 1981.
[329] V.L. Levin. A continuous utility theorem for closed preorders on a compact metrizable space. Soviet Math. Doklady, 28:715–718, 1983a.
[330] V.L. Levin. Measurable utility theorems for closed and lexicographic preference relations. Soviet Math. Doklady, 27:639–643, 1983b.
[331] V.L. Levin. Lipschitz preorders and Lipschitz utility functions. Russian Mathematical Surveys, 39:199–200, 1984a.
[332] V.L. Levin. The mass transfer problem in topological space and probability measures on the product of two spaces with given marginal measures. Soviet Math. Doklady, 29:638–643, 1984b.
[333] V.L. Levin. Convex Analysis in Spaces of Measurable Functions and Its Applications in Mathematics and Economics. Nauka, Moscow, 1985a. In Russian.
[334] V.L. Levin. Functionally closed preorders and strong stochastic dominance. Soviet Math. Doklady, 32:22–26, 1985b.

[335] V.L. Levin. Extremal problems with probability measures, functionally closed preorders and strong stochastic dominance. In Stochastic Optimization, volume 81 of Lecture Notes in Control and Information Science, pages 435–447, Berlin, New York, 1986. Proc. Int. Conf. Kiev 1984, Springer-Verlag.
[336] V.L. Levin. Measurable selectors of multivalued mappings and the mass transfer problem. Dokl. Akad. Nauk USSR, 292:1048–1053, 1987.
[337] V.L. Levin. General Monge–Kantorovich problem and its applications in measure theory and mathematical economics. In L.J. Leifman, editor, Functional Analysis, Optimization and Mathematical Economics. Oxford University Press, 1990. A collection of papers dedicated to the Memory of L.V. Kantorovich.
[338] V.L. Levin. Some applications of set-valued mappings in mathematical economics. Journal of Mathematical Economics, 20:69–87, 1991.
[339] V.L. Levin. A formula for the optimal value in the Monge–Kantorovich problem with a smooth cost function and a characterization of cyclically monotone mappings. USSR Math. Sbornik, 71:533–548, 1992.
[340] V.L. Levin. Private communication, 1994.
[341] V.L. Levin. Quasi-convex functions and quasi-monotone operators. Journal of Convex Analysis, 2, 1995a.
[342] V.L. Levin. Reduced cost functions and their applications. Journal of Mathematical Economics, 1995b. To appear.
[343] V.L. Levin and A.A. Milyutin. The mass transfer problem with discontinuous cost function and a mass setting for the problem of duality of convex extremum problems. Trans Russian Math. Surveys, 34:1–78, 1979.
[344] V.L. Levin and S.T. Rachev. New duality theorems for marginal problems with some applications in stochastics. Lecture Notes in Mathematics, 1412:137–170, 1989.
[345] M. Loeve. Probability Theory. Van Nostrand, 1977.
[346] G.G. Lorentz. A problem of plane measure. Amer. J. Math., 71:417–426, 1949.
[347] G.G. Lorentz. An inequality for rearrangements. American Mathematics Monthly, 60:176–179, 1953.

[348] G. Louchard. Exact and asymptotic distributions in digital and binary search trees. Theor. Inf. Appl., 21:479–495, 1987.
[349] R. Lucchetti and F. Patrone. Closure and upper semicontinuity results in mathematical programming, Nash and economic equilibria, Optimization. Mathematische Operationsforschung und Statistik, Series Optimization, 17:619–628, 1986.
[350] N. Lusin. Leçons sur les Ensembles Analytiques. Gauthier-Villars, 1930.
[351] M. Maejima. Some limit theorems for summability methods of iid random variables. In V.V. Kalashnikov et al., editors, Stability problems of stochastic models, volume 1233 of Lecture Notes in Mathematics, pages 57–68, 1985. Varna 1985.
[352] M. Maejima. Some limit theorems for stability methods of i.i.d. random variables. Lecture Notes in Mathematics, 1233:57–68, 1988.
[353] M. Maejima and S.T. Rachev. An ideal metric and the rate of convergence to a self-similar process. Annals of Probability, 15:708–727, 1987.
[354] M. Maejima and S.T. Rachev. Rates of convergence in the operator-stable limit theorems. J. Theor. Probability, 9:37–86, 1996.
[355] H.M. Mahmoud. Evolution of Random Search Trees. Wiley, New York, London, 1992.
[356] G.D. Makarov. Estimates for the distributions function of a sum of two random variables when the marginal distributions are fixed. Theory of Probability and its Applications, 26:803–806, 1981.
[357] C.L. Mallows. A note on asymptotic joint normality. Annals of Mathematical Statistics, 43:508–515, 1972.
[358] B.B. Mandelbrot. Multiplications aléatoires itérées et distributions invariantes par moyenne pondérée aléatoire. C.R. Acad. Sci. Paris, 278, 1974.
[359] B.B. Mandelbrot and M. Taylor. On the distribution of stock price differences. Oper. Res., 15:1057–1062, 1967.
[360] M. Marcus. Some properties and applications of doubly stochastic matrices. American Mathematics Monthly, 67:215–222, 1960.
[361] A.W. Marshall and I. Olkin. Theory of majorization and its applications. Academic Press, New York, 1979.

[362] G. Maruyama. Continuous Markov processes and stochastic equations. Rend. Circolo Math. Palermo, 4:48–90, 1955.
[363] A. Mas-Colell. On the continuous representation of preorders. Intern. Econ. Revue, 18:509–513, 1977.
[364] E. Maskin and J. Riley. Monopoly with incomplete information. Rand Journal of Economics, 15:171–196, 1984.
[365] J.L. Massey. Collision-resolution algorithms and random-access communications, multi-user communication systems. CISM Courses and Lectures, 1981.
[366] R. Mathar and D. Pfeifer. Stochastik für Informatiker. Teubner, Stuttgart, 1990.
[367] G. Matheron. Random Sets and Integral Geometry. Wiley, 1975.
[368] M. Meerschaert. Moments of random vectors which belong to some domain of normal attraction. Annals of Probability, 18:870–876, 1989.
[369] M. Meerschaert. Spectral decomposition for generalized domains of attraction. Annals of Probability, 19:875–892, 1991.
[370] K. Mehlhorn. Datenstrukturen und effiziente Algorithmen, volume I. Teubner, Stuttgart, 1986.
[371] I. Meilijson and A. Nadas. Convex majorization with application to the length of critical paths. Journal of Applied Probability, 16:671–677, 1979.
[372] D. Mejzler. On the problem of the limit distributions for the maximal term of a variational series. Lvov Politechn. Inst. Naucn. Zap. Ser. Fiz.-Mat., 38:90–109, 1956. In Russian.
[373] E. Michael. Continuous selections. Ann. of Math., 63:361–382, 1956.
[374] P. Mikusinski, H. Sherwood, and M.D. Taylor. Probabilistic interpretations of copulas and their convex sums. In Symp. Probab. Measures with Given Marginals, volume 67 of Math. Appl., pages 95–112, Rome, 1991.
[375] G.N. Milshtein. A method of second-order accuracy integration of stochastic differential equations. Theory of Probability and its Applications, 23, 1978.
[376] G.N. Milshtein. Numerical integration of stochastic differential equations. Izd. Ural. Univ. Sverdlovsk, 1988. In Russian.

[377] J.A. Mirrlees. Optimal tax theory: a synthesis. Journal of Public Economics, 6:327–358, 1976.
[378] S. Mittnik and S.T. Rachev. Alternative multivariate stable distributions and their applications to financial modeling. In S. Cambanis, G. Samorodnitsky, and M.S. Taqqu, editors, Stable Processes and Related Topics, pages 107–120, Boston, 1991. Birkhäuser.
[379] S. Mittnik and S.T. Rachev. Modeling assets returns with alternative stable laws. Econometric Reviews, 12(3):261–330, 1993.
[380] S. Mittnik and S.T. Rachev. Reply on comments on “modeling assets returns with alternative stable laws” and some extensions. Econometric Reviews, 12(3):347–389, 1993.
[381] S. Mittnik and S.T. Rachev. Modelling Financial Assets with Alternative Stable Models. Series in Financial Economics and Quantitative Analysis. Wiley, New York, 1997.
[382] G. Monge. Mémoire sur la théorie des déblais et des remblais, 1781.
[383] F. Mosteller, C. Youtz, and D. Zahn. The distribution of sums of rounded percentages. Demography, 4:850–858, 1967.
[384] K.R. Mount and S. Reiter. Construction of a continuous utility function for a class of preferences. Journal of Mathematical Economics, 3:227–245, 1976.
[385] L. Nachbin. Topology and Order. Van Nostrand, New York, 1965.
[386] R.B. Nelsen. Copulas and association. In Symp. Probab. Measures with Given Marginals, pages 51–74, Rome, 1991. Kluwer.
[387] W. Neuefeind. On continuous utility. Journal of Economic Theory, 5:174–176, 1972.
[388] J. Neveu. Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco, 1965.
[389] J. Neveu and R.M. Dudley. On Kantorovich–Rubinstein theorems. (Transcript), 1980.
[390] V.B. Nevzorov. Records. Theory of Probability and its Applications, 32:201–228, 1988.
[391] N.J. Newton. An asymptotically efficient difference formula for solving stochastic differential equations. Stochastics, 19:175–206, 1986.
[392] I. Olkin and F. Pukelsheim. The distance between two random vectors with given dispersion matrices. Journal of Linear Algebra and its Applications, 48:257–263, 1982.

[393] I. Olkin and F. Pukelsheim. Marginal problems with additional constraints. Tech. report, 270, 1990. Department of Statistics, Stanford University, Stanford, CA.
[394] I. Olkin and S.T. Rachev. Distances among random vectors with given dispersion matrices. Preprint, 1991. Department of Statistics, Stanford University, Stanford, CA.
[395] I. Olkin and S.T. Rachev. Maximum submatrix traces for positive definite matrices. SIAM Journal of Matrix Analysis Applications, 14:390–397, 1993.
[396] J.M. Ortega and W.C. Rheinboldt. Iterative solution of nonlinear equations in several variables. Academic Press, New York, 1970.
[397] J. Pachl. Two classes of measures. Colloq. Math., 42:331–340, 1979.
[398] E. Pardoux and D. Talay. Discretization and simulation of stochastic differential equations. Acta Appl. Math., 3:23–47, 1985.
[399] V. Paulauskas and A. Rackauskas. Approximation Theory in the Central Limit Theorem. Kluwer Academic Publisher, 1989.
[400] A.S. Paulson and V.R.R. Uppuluri. Limit laws of a sequence determined by a random difference equation governing a one-compartment system. Math. Biosci., 13:325–333, 1972.
[401] A. Perez and R. Jirousek. Constructing an intentional expert system INES. In J.H. van Remmel, F. Gremy, and J. Zvarova, editors, Medical decision making: Diagnostic Strategies and Expert Systems, pages 307–315. North-Holland, 1985.
[402] S. Perrakis and C. Henin. Evaluation of risky investments with random timing of cash returns. Management Sci., 21:79–86, 1974.
[403] D. Pfeifer. Some remarks on Nevzorov’s record model. Advances in Applied Probability, 23:823–834, 1991.
[404] G. Pflug. Stochastische Modelle in der Informatik. Teubner, Stuttgart, 1986.

[405] G. Pisier and J. Zinn. On limit theorems for random variables with values in the spaces L_p. Z. Wahrscheinlichkeitstheorie Verw. Geb., 41:286–305, 1977.
[406] B. Pittel. Paths in a random digital tree: Limiting distributions. Advances in Applied Probability, 18:139–155, 1986.
[407] E. Platen. An approximation method for a class of Itô processes. Lietuvos Math. Rink. XXI, 1:121–133, 1981.

[408] D. Pollard. Convergence of Stochastic Processes. Springer, 1984.
[409] C.J. Preston. A generalization of the FKG inequalities. Comm. Math. Phys., 36:233–241, 1974.
[410] P.S. Puri. On almost sure convergence of an erosion process due to Todorovic and Gani. Journal of Applied Probability, 24:1001–1005, 1987.
[411] G. Pyatt and J.J. Round, editors. Social Accounting Matrices: A Basis for Planning. World Bank, Washington, D.C., 1985.
[412] R. Pyke and D. Root. On convergence in r-mean of normalized partial sums. Annals of Mathematical Statistics, 39:379–381, 1968.
[413] S.T. Rachev. On a metric construction of Hausdorff in a space of probability measures. Zapiski Nauchn. Sem. LOMI, 87:87–104, 1978.
[414] S.T. Rachev. Minimal metrics in a space of real random variables. Dokl. Akad. Nauk SSSR, 257(5):1067–1070, 1981.
[415] S.T. Rachev. On minimal metrics in the space of real-valued random variables. Soviet Dokl. Math., 23(2):425–438, 1981a.
[416] S.T. Rachev. Minimal metrics in the random variables spaces. Pub. Inst. Stat. Univ. Paris, 27(1):27–47, 1982a.
[417] S.T. Rachev. Minimal metrics in the random variables spaces. In W. Grossmann et al., editor, Probability and Statistical Inference Proceedings of the 2nd Pannonian Symp., pages 319–327, Dordrecht, 1982b. D. Reidel Company.
[418] S.T. Rachev. Compactness in the probability measures space. In M. Galyare et al., editor, Proceedings of the 3rd European Young Statisticians Meeting, pages 136–150, Katholieke Univ., Leuven, 1983a.
[419] S.T. Rachev. Minimal metrics in the real valued random variable spaces. Lecture Notes in Mathematics, 982:172–190, 1983b.
[420] S.T. Rachev. Hausdorff metric construction in the probability measures space. Studia Mathematica, 7:152–162, 1984a. Pliska.
[421] S.T. Rachev. The Monge–Kantorovich mass transference problem and its stochastic applications. Theory of Probability and its Applications, 29:647–676, 1984b.
[422] S.T. Rachev. On a class of minimal functionals on a space of probability measure. Theory of Probability and its Applications, 29(1):41–49, 1984c.

[423] S.T. Rachev. On a problem of Dudley. Soviet Math. Doklady, 29(2):162–164, 1984d.

[424] S.T. Rachev. Extreme functionals in the space of probability measures. Lecture Notes in Mathematics, 1155:320–348, 1985a. Proc. “Stability Problems for Stochastic Models”.
[425] S.T. Rachev. Probability metrics and their applications to the stability problems for stochastic models, 1985b. Author’s review of doctor of sciences theses, Steklov Mathematical Institute, USSR Academy of Sciences, Moscow. In Russian.
[426] S.T. Rachev. Extreme functional in the space of probability theory and mathematical statistics. VNU Science Press, 2:474–476, 1986.
[427] S.T. Rachev. Minimal metrics in a space of random vectors with fixed one-dimensional marginal distributions. J. Soviet Math., 34(2):1542–1555, 1986. Stability Problems for Stochastic Models. Proceedings, Moscow, VNIISI.
[428] S.T. Rachev. The stability of stochastic models. Applied Probability Newsletter, 12(2):3–4, 1988.
[429] S.T. Rachev. The problem of stability in queueing theory. Queueing Systems Theory and Applications, 4:287–318, 1989.
[430] S.T. Rachev. Mass transshipment problems and ideal metrics. Numer. Func. Anal. & Optimiz., 12(5&6):563–573, 1991a.
[431] S.T. Rachev. Optimal mass transportation problems. In Proceedings of XI Congres de Metodologias en Ingenieria de Sistemas, pages 115–120, Azocar, Santiago de Chile, 1991b.
[432] S.T. Rachev. Probability Metrics and the Stability of Stochastic Models. Wiley, Chichester-New York, 1991c.
[433] S.T. Rachev. Theory of probability metrics and recursive algorithms. In S. Joly and G. le Calve, editors, Distancia 1992, Proceedings of Congres International sur Analyse en Distance, pages 339–403, Université de haute Bretagne, Rennes, 1992.
[434] S.T. Rachev and G.S. Chobanov. Minimality of ideal probabilistic metrics. Pliska, 2:1154–1158, 1986. In Russian.
[435] S.T. Rachev, B. Dimitrov, and Z. Khalil. A probabilistic approach to optimal quality usage. Computers and Mathematics with Applications, 24(8/9):219–227, 1992.
[436] S.T. Rachev and Z. Ignatov. Minimality of ideal probabilistic metrics. J. Soviet Math., 32(6):595–608, 1986.

[437] S.T. Rachev and S.I. Resnick. Max-geometric infinite divisibility and stability. Stoch. Models, 2:191–218, 1991.
[438] S.T. Rachev and L. Rüschendorf. Approximation of sums by compound Poisson distributions with respect to stop-loss distances. Advances in Applied Probability, 22:350–374, 1990.
[439] S.T. Rachev and L. Rüschendorf. A counterexample to a.s. constructions. Stat. Prob. Letters, 9:307–309, 1990a.
[440] S.T. Rachev and L. Rüschendorf. A transformation property of minimal metrics. Theory of Probability and its Applications, 35:131–137, 1990b.
[441] S.T. Rachev and L. Rüschendorf. Approximate independence of distributions on spheres and their stability properties. Annals of Probability, 19:1311–1337, 1991.
[442] S.T. Rachev and L. Rüschendorf. Recent results in the theory of probability metrics. Statistics & Decisions, 9:327–373, 1991a.
[443] S.T. Rachev and L. Rüschendorf. A new ideal metric with applications to multivariate stable limit theorems, summability methods and compound Poisson approximation. Probability Theory and Related Fields, 94:163–187, 1992.
[444] S.T. Rachev and L. Rüschendorf. Rate of convergence for sums and maxima and doubly ideal metrics. Theory of Probability and its Applications, 37:276–289, 1992a.
[445] S.T. Rachev and L. Rüschendorf. On constrained transportation problems. In Proceedings of the 32nd Conference on Decision and Control, volume 3, pages 2896–2900. IEEE Control System Society, 1993.
[446] S.T. Rachev and L. Rüschendorf. On the Cox, Ross and Rubinstein model for option pricing. Theory of Probability and its Applications, 39:150–190, 1994.
[447] S.T. Rachev and L. Rüschendorf. On the rate of convergence in the CLT with respect to the Kantorovich metric. In J. Kuelbs, M. Marcus, and J. Hoffman-Jorgensen, editors, 9th Conf. on Probability on Banach Spaces, pages 193–207, Boston–Basel–Berlin, 1994a. Birkhäuser.
[448] S.T. Rachev and L. Rüschendorf. Propagation of chaos and contraction of stochastic mappings. Siberian Advances in Mathematics, 4:114–150, 1994b.

[449] S.T. Rachev and L. Rüschendorf. Solution of some transportation problems with relaxed or additional constraints. SIAM Journal of Control and Optimization, 32(3):673–689, 1994c.
[450] S.T. Rachev and L. Rüschendorf. Probability metrics and recursive algorithms. Journal of Applied Probability, 27:770–799, 1995. Technical Report (1991).
[451] S.T. Rachev and L. Rüschendorf. Propagation of chaos and contraction of stochastic mappings. Siberian Adv. Math., 4:114–150, 1995a.
[452] S.T. Rachev, L. Rüschendorf, and A. Schief. Uniformities for the convergence in law and probability. Journal of Theoretical Probability, 5:33–44, 1992.
[453] S.T. Rachev and G. Samorodnitsky. Geometric stable distributions in Banach spaces. Journal of Theoretical Probability, 7(29):351–373, 1994.
[454] S.T. Rachev and G. Samorodnitsky. Limit laws for a stochastic process and random recursion arising in probabilistic modelling. Advances in Applied Probability, 27:185–203, 1995.
[455] S.T. Rachev and A. Schief. On L_p-minimal metric. Probability and Mathematical Statistics, 13(2):311–320, 1992.
[456] S.T. Rachev and A. SenGupta. Geometric stable distributions and Laplace–Weibull mixtures. Statistics & Decisions, 10:251–271, 1992.
[457] S.T. Rachev and A. SenGupta. Laplace-Weibull mixtures for modeling price changes. Management Science, pages 1029–1038, 1993.
[458] S.T. Rachev and R.M. Shortt. Classification problem for probability metrics, volume 94 of Contemporary Mathematics, pages 221–262. AMS, 1989.
[459] S.T. Rachev and R.M. Shortt. Duality theorems for Kantorovich–Rubinstein and Wasserstein functionals. Dissertationes Mathematicae, 299:647–676, 1990.
[460] S.T. Rachev and M. Taksar. Kantorovich’s functionals in space of measures. In I. Karatzas and D. Ocone, editors, Applied Stochastic Analysis, volume 77 of Lecture Notes in Control and Information Science, pages 248–261, Berlin–New York, 1992. Proceedings of the US–French Workshop, Springer-Verlag.
[461] S.T. Rachev and P. Todorovic. On the rate of convergence of some functionals of a stochastic process. Journal of Applied Probability, 28:805–814, 1990.

[462] S.T. Rachev and J.E. Yukich. Rates for the CLT via new ideal metrics. Annals of Probability, 17:775–788, 1989.
[463] S.T. Rachev and J.E. Yukich. Smoothing metrics for measures on groups with applications to random motions. Annales de l’Institut Henri Poincaré, 25:429–941, 1990.
[464] S.T. Rachev and J.E. Yukich. Rates of convergence of α-stable random motions. J. Theor. Prob., 4:333–352, 1991.
[465] A. Rackauskas. On the convergence rate in martingale CLT in Hilbert spaces. Preprint 90-031, 1990. University of Bielefeld.
[466] D. Ramachandran. Perfect measures. Part I: Basic theory, volume 5. Macmillan, New Delhi, 1979.
[467] D. Ramachandran. Perfect measures. Part II: Special topics, volume 7. Macmillan, New Delhi, 1979.
[468] D. Ramachandran. Marginal problem in arbitrary product spaces. In Proceedings of the conference on “Distribution with Fixed Marginals, Double Stochastic Measures and Markov Operators”, volume 28, pages 260–272, Seattle, August 1993. IMS Lecture Notes Monograph Series 1997.
[469] D. Ramachandran and L. Rüschendorf. A general duality theorem for marginal problems. Probability Theory and Related Fields, 101:311–319, 1995.
[470] D. Ramachandran and L. Rüschendorf. Duality and perfect probability spaces. Proc. Amer. Math. Soc., 124:2223–2228, 1996a.
[471] D. Ramachandran and L. Rüschendorf. Duality theorems for assignments with upper bounds. In ‘Distributions with Fixed Marginals and Moment Problems’, pages 283–290. Kluwer, 1997.
[472] D. Ramachandran and L. Rüschendorf. On the validity of the Monge–Kantorovich duality theorem. Preprint, 1997.
[473] F. Ramsey. A mathematical theory of savings. Economic Journal, 38:543–559, 1928.
[474] M. Regnier and P. Jacquet. New results on the size of tries. IEEE Transactions on Information Theory, 35:203–205, 1989.
[475] S.I. Resnick and P. Greenwood. A bivariate stable characterization and domains of attraction. Journal of Multivariate Analysis, 9:206–221, 1979.

[476] M.K. Richter. Duality and rationality. Journal of Economic Theory, 20:131–181, 1979.
[477] H. Robbins. The maximum of identically distributed random variables. I.M.S. Bull., March 1975. Abstract.
[478] H. Robbins and D. Siegmund. A convergence theorem for nonnegative almost supermartingales. In Rustagi, editor, Optimiz. Meth. in Statistics, pages 233–258. Academic Press, 1971.
[479] J.C. Rochet. The taxation principle and multi-time Hamilton–Jacobi equation. Journal of Mathematical Economics, 14:113–128, 1985.
[480] J.C. Rochet. A necessary and sufficient condition for rationalizability in a quasi-linear context. Journal of Mathematical Economics, 16:191–200, 1987.
[481] R.T. Rockafellar. Characterization of the subdifferentials of convex functions. Pacific J. Math., 17:497–510, 1966.
[482] R.T. Rockafellar. Convex Analysis. Princeton Univ. Press, Princeton, NJ, 1970.
[483] C. Rogers. Coupling of random walks, 1992. Private communication.
[484] W.W. Rogosinski. Moments of non-negative mass. In Proceedings of Royal Society London, Ser. A, volume 245, pages 1–27, 1958.
[485] W. Römisch. An approximation method in stochastic optimization and control. In Optimization techniques, volume 22, pages 169–178. Proc. 9th IFIP Conf., Warsaw 1979, Part 1, Lecture Notes in Control and Information Science, 1980.
[486] W. Römisch. On discrete approximations in stochastic programming, 1981. Seminarbericht.
[487] W. Römisch and R. Schultz. Stability analysis of stochastic programs. Ann. Operat. Res., 30:241–266, 1991.
[488] W. Römisch and R. Schultz. Stability of solutions for stochastic programs with complete recourse. Mathematics of Operations Research, 18:590–609, 1993.
[489] W. Römisch and A. Wakolbinger. On Lipschitz dependence in systems with differentiated inputs. Math. Ann., 272:237–248, 1985.
[490] U. Rösler. A limit theorem for quicksort. Informatique Théorique et Applications, 25:85–100, 1991.

[491] U. Rösler. A fixed point theorem for distributions. Stoch. Processes and Applications, 37:195–214, 1992.
[492] S.M. Ross. A simple heuristic approach to simplex efficiency. European J. Oper. Res., 9:344–346, 1982.
[493] S.M. Ross. Stochastic Processes. Wiley, New York, 1983.
[494] B. Rüger. Scharfe untere und obere Schranken für die Wahrscheinlichkeit der Realisation von k unter n Ereignissen. Metrika, 26:71–77, 1979.
[495] L. Rüschendorf. Vergleich von Zufallsvariablen bzgl. integralinduzierter Halbordnungen, 1979. Habilitationsschrift.
[496] L. Rüschendorf. Inequalities for the expectation of ∆-monotone functions. Z. Wahrscheinlichkeitstheorie Verw. Geb., 54:341–349, 1980.
[497] L. Rüschendorf. Ordering of distributions and rearrangement of functions. Annals of Probability, 9:276–283, 1980.
[498] L. Rüschendorf. Sharpness of Fréchet-Bounds. Z. Wahrscheinlichkeitstheorie Verw. Geb., 57:293–302, 1981.
[499] L. Rüschendorf. Random variables with maximum sums. Advances in Applied Probability, 14:623–632, 1982.
[500] L. Rüschendorf. On the multidimensional assignment problem. Methods of OR, 47:107–113, 1983.
[501] L. Rüschendorf. Solution of a statistical optimization problem by rearrangement methods. Metrika, 30:55–62, 1983.
[502] L. Rüschendorf. On the minimum discrimination information theorem. Statistics & Decisions, 1:263–283, 1984. Suppl. Issue.
[503] L. Rüschendorf. Construction of multivariate distributions with given marginals. Ann. Inst. Stat. Math., 37:225–233, 1985.
[504] L. Rüschendorf. The Wasserstein distance and approximation theorems. Z. Wahrscheinlichkeitstheorie Verw. Geb., 70:117–129, 1985.
[505] L. Rüschendorf. Monotonicity and unbiasedness of tests via a.s. constructions. Statistics, 17:221–230, 1986.
[506] L. Rüschendorf. Fréchet-bounds and their applications. In G. Dall’Aglio, S. Kotz, and G. Salinetti, editors, Advances in Probability Measure with Given Marginals, pages 151–188. Kluwer, Amsterdam, 1991.

[507] L. Rüschendorf. Bounds for distributions with multivariate marginals. In K. Mosler and M. Scarsini, editors, Stochastic Order and Decision under Risk, volume 19, pages 285–310. IMS Lecture Notes, 1991a.
[508] L. Rüschendorf. Conditional stochastic ordering of distributions. Advances in Applied Probability, 23:46–63, 1991b.
[509] L. Rüschendorf. Stochastic ordering of likelihood ratios and partial sufficiency. Statistics, 22:551–558, 1991c.
[510] L. Rüschendorf. Optimal solutions of multivariate coupling problems. Appl. Mathematicae, 22:325–338, 1995.
[511] L. Rüschendorf. Developments on Fréchet bounds. In Proceedings of Distributions with Fixed Marginals and Related Topics, volume 28, pages 273–296. IMS Lecture Notes Monograph Series, 1996.
[512] L. Rüschendorf. On c-optimal random variables. Statistics Prob. Letters, 27:267–270, 1996.
[513] L. Rüschendorf and S.T. Rachev. A characterization of random variables with minimum L2-distance. Journal of Multivariate Analysis, 32:48–54, 1990.
[514] L. Rüschendorf, B. Schweizer, and M.D. Taylor. Distributions with Fixed Marginals and Related Topics. In Proceedings of Distributions with Fixed Marginals and Related Topics, volume 28. IMS Lecture Notes Monograph Series, 1996.
[515] L. Rüschendorf and L. Uckelmann. On optimal multivariate couplings. In Distribution with given marginals and moment problems, pages 261–274. Kluwer, 1997.
[516] T. Rychlik. Stochastically extremal distributions of order statistics for dependent samples. Statistics & Probability Letters, 13:337–341, 1992.
[517] C. Ryll-Nardzewski. On quasi-compact measures. Fund. Math., 40:125–130, 1953.
[518] G. Samorodnitsky and M. Taqqu. Stable Non-Gaussian Random Processes. Stochastic Models with Infinite Variance. Chapman & Hall, New York, 1994.
[519] E. Samuel and R. Bachi. Measures of distance of distribution functions and some applications. Metron, 23:83–122, 1964.
[520] V.V. Sazonov. Normal approximation - some recent advances. Lecture Notes in Mathematics, 879, 1981.

[521] H.H. Schaefer. Topological Vector Spaces. Springer, New York, 1966.
[522] M. Schaefer. Note on the k-dimensional Jensen inequality. Annals of Probability, 2:502–504, 1976.
[523] G. Schay. Optimal joint distributions of several random variables with given marginals. Stud. Appl. Math., LXI:179–183, 1979.
[524] L. Schwartz. Radon Measures On Arbitrary Topological Spaces and Cylindrical Measures. Oxford University Press, London, 1973.
[525] B. Schweizer. Thirty years of copulas. In G. Dall’Aglio, S. Kotz, and G. Salinetti, editors, Symp. Probab. Measures with Given Marginals, pages 13–50, Rome, 1990. Kluwer.
[526] B. Schweizer and A. Sklar. Probabilistic Metric Spaces. Elsevier, North-Holland, 1983.
[527] L. Seidel. On limit distributions of random symmetric polynomials. Theory of Probability and its Applications, 23:266–278, 1988.
[528] V.V. Senatov. Uniform estimates of the rate of convergence in the multi-dimensional central limit theorem. Theory of Probability and its Applications, 25:745–759, 1980.
[529] V.V. Senatov. Some lower estimates for the rate of convergence in the multi-dimensional central limit theorem. Soviet Math. Doklady, 23:188–192, 1981.
[530] W.J. Shafer and H.F. Sonnenschein. Equilibrium in abstract economics without ordered preferences. Journal of Mathematical Economics, 2:345–348, 1975.
[531] L.S. Shapley and M. Shubik. The assignment game, 1: the core. Int. J. Game Theory, 1:110–130, 1972.
[532] M. Sharpe. Operator-stable probability distributions on vector groups. Trans. Amer. Math. Soc., 136:51–65, 1969.
[533] A.N. Shiryaev. Probability Theory. Springer, 1984.
[534] J.A. Shohat and J.D. Tamarkin. The Problem of Moments. American Mathematical Society, Providence, 1943.
[535] I.A. Sholpo. ε-minimal metrics. Theory of Probability and its Applications, 28:854–855, 1983.
[536] G.R. Shorack and J.A. Wellner. Empirical Processes With Applications to Statistics. Wiley, New York, 1986.

[537] R.M. Shortt. Private communication.
[538] R.M. Shortt. Combinatorial methods in the study of marginal problems over separable spaces. Journal of Mathematical Analysis and its Applications, 97:462–479, 1983.
[539] R.M. Shortt. Strassen’s marginal problems in two or more dimensions. Z. Wahrscheinlichkeitstheorie Verw. Geb., 64:313–325, 1983.
[540] R.M. Shortt. Universally measurable spaces: An invariance theorem and diverse characterizations. Fund. Math. Th., 121:35–42, 1983.
[541] H.J. Skala. The existence of probability measures with given marginals. Annals of Probability, 21:136–142, 1993.
[542] M. Sklar. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris, 8:229–231, 1959.
[543] C.S. Smith and M. Knott. Note on the optimal transportation of distributions. Journal of Optimization Theory and Applications, 52:323–329, 1987.
[544] C.S. Smith and M. Knott. On Hoeffding–Fréchet bounds and cyclic monotone relations. Journal of Multivariate Analysis, 40:328–334, 1992.
[545] T.A.B. Snijders. Antithetic variates for Monte Carlo estimation of probabilities. Statistica Neerlandica, 38:1–19, 1984.
[546] D. Stoyan. Comparison Methods for Queues and Other Stochastic Models. Wiley, 1983.
[547] V. Strassen. The existence of probability measures with given marginals. Annals of Mathematical Statistics, 36(2):423–439, 1965.
[548] J. Štěpán. Simplicial measures. In Memor. Vol. of J. Hájek, pages 239–251. Academia Prague, 1977.
[549] J. Štěpán. Probability measures with given expectations. In Proc. of the 2nd Prague Symp. on Asympt. Statistics, pages 315–320. North Holland, 1979.
[550] V.N. Sudakov. Geometric problems in the theory of infinite dimensional probability distributions. Proc. Steklov Inst. Math., 141(2), 1979.
[551] H. Sussmann. On the gap between deterministic and stochastic differential equations. Annals of Probability, 6:19–41, 1978.

[552] A.S. Sznitman. Equations de type de Boltzmann, Spatialement homogènes. Z. Wahrscheinlichkeitstheorie Verw. Geb., 66:559–592, 1984.
[553] A.S. Sznitman. Propagation of chaos. In Ecole d’Eté Saint-Flour, volume 1464 of Lecture Notes in Mathematics, pages 165–251, 1989.
[554] A. Szulga. On the Wasserstein metric. In Transactions of the 8th Prague Conference on Information Theory, Statistical Decision Functions and Random Processes, volume B, pages 267–273, Prague, 1978. Akademia Praha.
[555] A. Szulga. On minimal metrics in the space of random variables. Theory of Probability and its Applications, 27:424–430, 1982.
[556] W. Szwarc and M. Posner. The tridiagonal transportation problem. Operations Research Letters, 3:25–30, 1984.
[557] M. Talagrand. Matching random samples in many dimensions. Annals of Applied Probability, 2:846–856, 1992.
[558] D. Talay. Résolution trajectorielle et analyse numérique des équations differentielles stochastiques. Stochastics, 9:275–306, 1988.
[559] H. Tanaka. An inequality for a functional of probability distributions and its applications to Kac’s one-dimensional model of a Maxwellian gas. Z. Wahrscheinlichkeitstheorie Verw. Geb., 27:47–52, 1973.
[560] H. Tanaka. Probabilistic treatment of the Boltzmann equation for Maxwellian molecules. Z. Wahrscheinlichkeitstheorie Verw. Geb., 46:67–105, 1978.
[561] A.H. Tchen. Inequalities for distributions with given marginals. Annals of Probability, 8:814–827, 1980.
[562] P. Todorovic. An extremal problem arising in soil erosion modeling, pages 65–73. Reidel, Dordrecht, 1987. edt.: I.B. MacNeil and G.J. Umphrey.
[563] P. Todorovic and J. Gani. Modeling of the effect of erosion on crop production. Journal of Applied Probability, 24:787–797, 1987.
[564] Y.L. Tong. Probability Inequalities in Multivariate Distributions. Academic Press, 1980.
[565] D.M. Topkis and A.F. Veinott jr. Monotone solution of extremal problems on lattices (abstract). In Abstract of 8th International Symposium on Mathematical Programming, volume 131, Stanford, CA, 1973. Stanford University.

[566] A. Tuero-Diaz. Aplicaciones crecientes: Relaciones con las métricas de Wasserstein. PhD thesis, Universidad de Cantabria, 1991.
[567] A. Tuero-Diaz. On the stochastic convergence of representations based on Wasserstein metrics. Annals of Probability, 21:72–85, 1993.
[568] L. Uckelmann. Konstruktion von optimalen Couplings. Universität Münster, 1993. Diplom-Arbeit.
[569] L. Uckelmann. Optimal couplings between one dimensional distributions. In Distribution with given marginals and moment problems, pages 275–282. Kluwer, 1997.
[570] V.R.R. Uppuluri, P.I. Feder, and L.R. Shenton. Random difference equations occurring in one-compartment models. Math. Biosci., 2:143–171, 1967.
[571] S.S. Vallander. Calculation of the Wasserstein distance between probability distributions on the line. Theory of Probability and its Applications, 18:784–786, 1973.
[572] A.F. Veinott Jr. Representation of general and polyhedral sublattices and sublattices of product spaces. Journal of Linear Algebra and its Applications, 114/115:681–704, 1989.
[573] A.M. Vershik. Some remarks on infinite-dimensional linear programming problems. Russian Math. Surveys, 25:117–124, 1970.
[574] A.M. Vershik and V. Temelt. Some questions of approximation of the optimal value of infinite-dimensional linear programming problems. Siberian Math. J, 9:591–601, 1968.
[575] W. Vervaat. On a stochastic difference equation and a representation of non-negative infinitely divisible random variables. Advances in Applied Probability, 11:750–783, 1979.
[576] N.N. Vorobev. Consistent families of measures and their extensions. Theory of Probability and its Applications, 7:147–163, 1962.
[577] W. Wagner. Monte Carlo evaluation of functionals of solutions of stochastic differential equations. Variance reduction and numerical examples. Stoch. Analysis Appl., 6:447–468, 1988.
[578] W. Warmuth. Marginal Fréchet-bounds for multidimensional distribution functions. Statistics, 19:283–294, 1976.
[579] L.N. Wasserstein. Markov processes over denumerable products of spaces describing large systems of automata. Problems of Information Transmission, 1969.

[580] H. von Weizsäcker and G. Winkler. Integral representation in the set of solutions of a generalized moment problem, 1980.
[581] E. Wesley. Borel preference orders in markets with a continuum of traders. Journal of Mathematical Economics, 3:155–165, 1976.
[582] A. Wieczorek. On the measurable utility theorem. Journal of Mathematical Economics, 7:165–173, 1980.
[583] E. Wild. On Boltzmann’s equation in the kinetic theory of gases. Proc. Camb. Phil. Soc., 4:602–609, 1951.
[584] G. Winkler. Choquet order and simplices with applications in probabilistic models. Lecture Notes in Mathematics, 1145, 1988.
[585] J. Yukich. Exact order rates of convergence of empirical measures. Preprint, 1991.
[586] J. Yukich. The exponential integrability of transportation cost. Preprint, 1991.
[587] J. Yukich. Some generalizations of the Euclidean two-sample matching problem. Prob. Banach Spaces, 8:55–66, 1992.
[588] V.M. Zolotarev. On the continuity of stochastic sequences generated by recursive procedures. Theory of Probability and its Applications, 20:819–832, 1975.
[589] V.M. Zolotarev. Approximation of distributions of sums of independent random variables with values in infinite dimensional spaces. Theory of Probability and its Applications, 21:721–737, 1976.
[590] V.M. Zolotarev. Metric distances in spaces of random variables and their distributions. Math. Sb., 30(3):393–401, 1976.
[591] V.M. Zolotarev. General problems of the stability of mathematical models. Bull. Int. Stat. Inst., 47(2):382–401, 1977.
[592] V.M. Zolotarev. On pseudomoments. Theory of Probability and its Applications, 23:269–278, 1978.
[593] V.M. Zolotarev. On the properties and relationships of certain types of metrics. Zapiski Nauchn. Sem. LOMI, 87:18–35, 1978.
[594] V.M. Zolotarev. Ideal metrics in the problems of probability theory and mathematical statistics. Austral. J. Statist., 21(3):193–208, 1979.
[595] V.M. Zolotarev. Probability metrics. Theory of Probability and its Applications, 28:278–302, 1983.

[596] V.M. Zolotarev. Contemporary Theory of Summation of Independent Random Variables. Nauka, Moscow, 1986. In Russian.
[597] V.M. Zolotarev. Modern theory of summation of independent random variables. Nauka, Moscow, 1987. In Russian.
[598] V.M. Zolotarev and S.T. Rachev. Rate of convergence in limit theorems for the max scheme. In Stability Problems for stochastic models, volume 1155, pages 415–442. Springer, 1984.

Abbreviations

Bold pagenumbers refer to this volume, non-bold pagenumbers to the other volume.

a.e.      almost everywhere                                     158, 385
ARCH      autoregressive conditional heteroscedasticity         39
a.s.      almost sure                                           8
BLIL      bounded law of the iterated logarithm                 306
CLT       central limit theorem                                 85
ch.f.     characteristic function                               400
CRI       communication resolution interval                     38
CTM       Capetanakis–Tsybakov–Mikhailov                        220
d.f.(s)   distribution function(s)                              8, 107
dna       domain of normal attraction                           306
DP        dual polyhedron                                       23
DTP       dual transportation problem                           23
GARCH     general ARCH                                          39
htl       explained on page                                     433
IFS       iterated function systems                             202
i.i.d.    independent identically distributed                   35
KKR       Kakosjan, Klebanov, and Rachev                        43
KRP       Kantorovich–Rubinstein transshipment problem          vii, 2
LCFS      last come first served                                220
LHS       left-hand side                                        405
LLN       law of large numbers                                  81
lsc       lower semicontinuous                                  113
MKP       Monge–Kantorovich mass transportation problem         vii, 1, 19, 58
MKTP      classical Monge–Kantorovich transportation problem    374
MTPA      MTP with additional constraints                       vii
MTP       mass transportation                                   vii, 1
MTPP      MTP with partial knowledge of the marginals           4
OTP       optimal transportation plan                           3
PDE       partial differential equation                         xii, xvi
PERT      network model                                         148
PP        primal polyhedron                                     23
r.f.(s)   random field(s)                                       248
r.v.(s)   random variable(s)                                    3
SDE       stochastic differential equation                      39
SLLN      strong law of large numbers                           30
supp P    support of P                                          20
TP        transportation problem                                21
usc       upper semicontinuous                                  127


Symbols

Bold pagenumbers refer to this volume, non-bold pagenumbers to the other volume. ◦

A d A=A

Ab Ab Adk

Am Am Aε A(h, g)

An (t) A+ n (t) An (α) An (α) Ap (H)

A(α)

◦

interior of A 59 closure of A with respect to d 68 69 69 set of all linear subspaces Vk of IRd 137 69 69 139 assumption for a moment problem 62 148 148 190, 235 190, 235 optimal multivariate transshipment costs 158 263

A r f Q ASp (P1 , P2 ) A∗ (α) Aut(IRd )

a(s, k) au1 (x) a(Z)

B B∗ Bx B(1, n1 ) B(g)

393 97 191, 235 all invertible linear operators (automorphisms) 151 109 296 superlinear mapping 241 Banach limit 366 adjoint operator of B 132 109 Bernoulli distribution 257 assumption for the solution of a moment problem 62

BK (Si ) B n (α) Bkn (Zk ) Bn (m)

61 191 191 set of nonnegative Borel measures 377 Bp (H) upper bound for Ap (H) 158 B(p; x, y) quadratic form 280 Br ball of radius r 149 B(Si ) 72 B(X, B) 58 B1 (x, y) 28 B2 (x, y) 28 B(α) 263 Bx (ε) 109 (B, · ) separable Banach space C(T ) 248 b transshipment 371 br absolute moments 102 bu1 (x) 296 ba (P1 , . . . , Pn ) measures with fixed marginals 62 ba (S, B) finitely additive measures 58 C, C

i

Cp,d Cs,Z

Cs,εZ

spaces of continuous and i times differentiable functions 255, 333 321 integrable (s − 1)-fold derivative 115 117

C(g)

Cb (S), C b (S)

C(T ) C γ (c; σ1 , σ2 ) C(Q) ◦

C(Q) ◦

C(Q)∗

assumption for the solution of a moment problem 62 Banach space of bounded continuous real-valued functions on S 63, 164 Banach space 248 307 set of continuous functions 384 quotient of the space C(Q) 384

con(supp (P ))

conjugate space 393 set of all bounded continuous functions on IRd 152 166 166 170 covariance 108 conditional covariance 96 closure 304 discrete cost 27, 29 cost function viii, 10 ∂ = ∂x c(x, y) 128 reduced cost function 170 129

(D)

duality 76

Cb (IRd )

C(S)+ C(S)∗+ Cm (θ) Cov Cov (Xi |Fi−1 ) c c(i, j) c(x, y) c1 (x, y) c∗ (x, y)

D(h, g)

DP D(P, Q) Dp f (x) DΦ Dk (ϑ) Dm (θ) dr dr,k d(h) dn,m (x, t) d(x, y) d(X, Y ) dr (X, Y ) dKR (σ1 , σ2 )

dn (µ) dom fk dom Γ E Ek−1 (f ; Q) E(Si )

Es (X, Y ) ess sup ex H

assumption for a moment problem 62 dual polyhedron 23 50 optimal pairs 103 i = ( ∂Φ ∂xj ) 118 112 170 smoothed version of d 137 61 divisor criterion 180 determinant of An,m (x, t) 397 76 uniform metric 137 probability metric 170 Kantorovich– Rubinstein distance 162 267 253 235 separable metric space 278 factor-norm 384 finite elementary functions on Si 76 set of points 425 essential supremum 386 extremal points of H 19

F∗ FP Fi F mn F mn (n)

Fn (Fs,X ) Fu F Mp (P1 , P2 ) Fi (s) F1 ∧ F2 (t) F1 ∨ F2 (t) F+ (x) F− (x)

Fréchet bound 19, 31, 33 distribution function of P 18 real distribution function 107 nth integral of m 375 survival function 375 385 423, 107 355 Fortét–Mourier metric 17, 51 293 infimal convolution 148 supremal convolution 148 := min(Fi (xi )) 107 := k Fi (xi )−(n−1) i=1

F1∗ (x) F2∗ (x) F(x, y)

F P (x, y) F σ (x, y) (−1) FNs (y) f fc f cc

f (m)

107 12 12 extended Fréchet bound 19 26 19 310 Young–Fenchel transform 104 c-conjugate of f 124 doubly c-conjugate of f 124 mth Fréchet derivative of f 102

+

f (n)c f∗ f ∗∗

f (n)∗

f∗ f2 (u) fa (x) f (Z1 , Z2 ) fV (·)

Gk GQ

Gs,p Gα G|G|1|∞ G(m, α, β) Gs,X (t) G(u, v) Gσ (x, y) Gn (Z) G(µ) g= gr Γn g(χ) H H

nth c-conjugate of f 124 p-conjugate 114, 124 second p-conjugate 102, 112 n-conjugate function of f 112 lower conjugate 103 38 145 extension f 317 translation by V 95 359 determination of an optimal measure Q 29 class of functions 103 geometric α-stable r.v. 242 71 grid class 41 424 graph of (u, v) ∈ DP 23 19 255 µ-neglegible open set 221 (g1 , ..., gN ) ∈ M 63 graph of Γn 194 363 Haar probability 133 distribution function of

(k)

Hn hβ

hµ (A × B)

h(t1 , t2 )

I Iq Is I[A] I(|f − g|) I(h) I{0, g, a, b} IND i(x1 , x2 )

max(V1 , . . . , V ) 156 258 indicator or characteristic function 251 generalized upper Fréchet bound 54, 35 Hausdorff metric 248 175 unit matrix 334 operator 415 indicator function of a set A 139 semimetric on P(S) 67 65 69 = p (X, X) 76 indicator metric 111

JA

151

K(d, B) Kr (P, Q)

137 Kantorovichtype metric 48, 412 Kantorovich metric 412 Markov kernel 200 rth difference pseudomoment 122

K1 (P, Q) K(x, ·) kr

L

Lévy metric 81, 109

L∞

L∞ -space of functions 388 ◦∞ L 389 Lc 17 Lf continuous linear functional 401 Li 30 Ln class of nth integrals 47 p 139 L (L1), (L2) 309 L[·]c (a, r, d) 58 L[·] (a, r, d1 , dr ) 60 L[·]c (r, d) 59 L[·] (r, d1 , dr ) 61 L1f (Pi ) 69 LSC(βS1 × βS2 ) 252 Lp (X, Y ) 132, 72 Y ) Lp -metric 76 Lp (X, p,r (X, Y ) 140 L L (X, Y ) probability p,r

∗p,t (X, Y ) L L∗p,t (X, Y ) Lr (µ) Lp (µ) L(ω, µ) Lipb

Liph (r) Lip(r, S) 1 ∞ ∗1

metric 170 302 280 r-fold integrable functions 32 196 Lagrange function 311 bounded Lipschitz functions 88 Lipschitz norm 49 r-Lipschitz functions 163 Kantorovich metric 35, 86 bounded real sequences (ξT )∞ T =1 366 92

∗p,t (m1 , m2 ) 2 (P X , P Y ) p (P1 , P2 ) p (X, Y ) p (µ, ν) r (P1 , P2 )

Mc

280 = 2 (X, Y ) 132 p -metric 6, 87 76 334 smoothed version of 1 (of order r) 35, 87

set of measures 403 pseudometrics Mi 423 M ◦ , Mk◦ linear space 384 set of measures Mr (r > 0) 403 Mr0 subset of Mr 403 Lévy measure Ms 246 Ms◦ set of all signed Borel measures µ on IRn 47 Mµ 40 41 Mµ (B) 280 M 1 (CT ) Mp (CT , m0 ) 280 59 M1 (c) 60 M2 (c) 15 MC (F, G) M (h, δ) 81 142 MX (n) 142 Mθ (n) M 1 (P1 , P2 ) 35 M (P1 , . . . , Pn ) measures with marginals Pi 58 k finite signed M (IR ), M Borel measures 375 M (S) 319 Mf (S), Mf (S×S) finite measures 36

M1 (U ) Mµ m(c) m0 (c) mX (n) mθ (n) mn N(m,σ) Ns n−1 Sn,c

OTP(c)

(t)

P

P (Xu )u≤s P ∗ (A × B) P1,2 (B|A) P ∗ (h) PP Pε X P (µ) P ∧µ

pN p mn (p, h)

probability measures 191 40 58 59 142 142 375 normal distribution 188 309 normalized rounding error 81

pX (t)

density of the r.v. X 419

Qd

set of d-quasi periodic points 355 309 140 256

Qγ Qp,r Q(a) Rp,r R = R(k, n) R(Y ) R(x) rba (S, R(E))

140 405 145 x = xxM 137 x regular bounded additive measures 63

S1 S1 S2 S coll S ind γ S+ γ S− Sn Sn Sn∗

unit circle 47 421 421 129 129 313 313 80 simplex 181 sum of conventional roundings 80 total rounding error 81 255 topological space with closed preorder ≤ 44 316 measurable spaces 58 58 59

OTP with respect to c 3 marginal of P in the direction t 46 285 35 transportation plan 2 outer integral of h 65 primal polyhedron 23 approximation 93 stochastic optimization problem 49 infimum in the lattice of measures 41 180 density of mn 376 vector problem 180

Sn,c Sn,m (S, ≤)

7 S1 S2 (Si , Bi ) S(c) S0 (c)

(S, d) (SE) S(h) SLr (P1 , P2 )

Sp (P ) Spp (P1 , P2 ) (S, U) Sm (x, h) S(Y ) S(µ)

supp σ T Tr T C(u, v) Tp (t) T (λ) T← tA U UC U0 U[·]c (a, r, d) U[·]c (r, d) U[·] (r, d1 , dr )

U U Uµ (ϕ)

(separable) metric space 92 333 shift operator 65, 390 Skorohod– Lebesgue metric 34 97 dual form of Sp (P1 , P2 ) 97 measure space 36 392 F(Y )-Suslin functions 79 a symmetry group associated with µ 133 265 transformation 192 138 total costs 25 quantile function 32 a weighted sum 126 253 132 dual operator 40 17 415 57 58 60 norm 74 transportation problem with local upper bound µ 40

(U, · ) uX , uY (k) us u1n (x), u2n (x)

separable Banach space 86 densities 107 285 263

rounding error 81 finite covering Vε ε-net 93 V (S) 219 219 V+ (S) 220 V0 (S) Val(c; σ1 , σ2 , b) optimal value 252 Var total variation distance in X (IRd ) 134 val(c; σ1 , σ2 , b) optimal value of the dual problem 253 absolute vr (X, Y ) pseudomoment 194 v r (X, Y ) 105

Vi

Wi            Brownian motions  278
Wp = ℓp       Lp-Wasserstein metric / Lp-Kantorovich metric  40
Wu            354
w#            transposed function  172
wn+1          “output” flow  71
w|M           restriction of w to M  180
wp,N(X)p+1    249
X∗            topological dual space of X  112

X

76

Xs

normalized variation 308 conventional rounding 59 order statistic resp. its distribution function 156 Monge solution 3 384 bilinear form 112

◦i

[x] := [x1 ] Xm:n , Fm:n

(X, T (X)) xα x, x∗

(Y, ≤)

ordered topological space 145

Aγ A(c, )

AM Ac (P)

A (S, ) B, B(U ) B(c, )

Z(·)

action profile 367 ideal metric 47 Zk,n (X, Y ) ideal metric 383 Zn (X, Y ) Z(X, Y ; s, p, α) 426

B(En )µ Bm (S)

IB (S × S), IBb (S) b

IR

d

UIb (S)

ZZn+ λ\1 Aϕ

bounded Borel functions on S × S, resp. on S 221 the d-dimensional Euclidean space bounded universally measurable functions 169 384 150, 155, 111 σ-algebra generated by a

B(S)ν B(S)σ B0 (S) = σ(C(S)) C(c) C(c; σ1 , σ2 )

measurable function ϕ 420 optimal value 314 optimal value of the general Kantorovich– Rubinstein problem 163 class of M-analytic sets 167 generalized Monge– Kantorovich functional 87 199 Borel σ-algebra 30, 418 optimal value of the dual Kantorovich– Rubinstein problem 164 µ-completion of B(En ) 194 set of lower majorized Borel functions on S 145 ν-completion of B(S) 220 σ-completion of B(S) 167 Baire-sets in S80 set of stable imputations optimal value of the general Monge–Kantorovich mass transfer problem 164

D Dn = Dn (µ) Dγ D(c; σ1 , σ2 )

D(x) D()

Eθ Eθ,u F2 FA+B FA−B Fr Fr FZ F(A, B) F(A, B, F σ ) F(F1 , F2 ) F 1 (R) F b (S) F(S) Fo (S) Gp

the diagonal in S × S 210 81 311 optimal value of the dual Monge–Kantorovich mass transfer problem 164 86 Borel measures with given marginal difference 14 177 177 421 set of d.f.s 11 set of d.f.s 13 class of functions 412, 102 104 distribution function of Z 184 18 19 joint d.f.s F with marginals F1 ,F2 51, 1 distribution functions 421 bounded upper semicontinuous functions 70, 74 upper semicontinuous functions 70 219 pairs of bounded continuous functions 97

G(A, B, Gσ ) G(m, Λ, α, β) G(S) Gb (S) H(F1 , F2 ) Id (P1 , P2 ) K K(P) L L, Lo Lm

L(h; δ) L1 (R, P ) Lp (P ) L1 (Pi ) L(X, Y ) L(Y ) M1 M(P1 , P2 )

Mp (X) N Oε (P0 )

19 grid class 332 lower semicontinuous functions 70 70, 74 relaxed marginal class 52, 3 94 Kantorovich metric 417 dual Monge– Kantorovich functional 87 Lévy stable motion 240 class of topological spaces 219 measurable functions bounded below 62 71 P -integrable functions 63 97 62 joint distributions 414 F(Y )-Suslin sets 79 class of laws 245 probability measures with given marginals 3 334 67 neighborhood of P0 92

PH

space of probabilities 87 Borel Pi probability measures on a product of i copies of (S, d) 27 322 P2 class of all P ’s PL on L 31 set of measures Pγ 309 µ2 37 Pµ1 P(S) space of tight probabilities on S 64 70 Pb (S) m P P (S), P (U ) 69, 96 P(µ, Q) multivariate compound Poisson distribution 129 33 PL (µ, σ) R ring 63 R class of rules 184 R(×i = 1n Bi ) 63 class of laws 245 S1 U set of input flows U 74 U(S) universally measurable sets 167 V set of output flows V 74 X space of real random variables 414 class of r.v. Xc belonging to X ∗ 427 X0∗ 426

(X )2 , L(X, Y ) Xs∗ X (B) X (C[0, 1])

Xp (CT , m0 ) X (R) X (IRk )

X (T, g, a) X (U ) Z Z1

α α1G1 ×···×Gn [α1 , . . . , αn ] αs,p (X, Y ) βS

Γj Γµ

Γµ

space of joint distributions 414 417 space of random fields 248 space of r.v.s on a nonatomic probability space 54 class of processes on CT 280 set of all real-valued r.v.s 62 class of k-dimensional random vectors 103 space of X ∈ X (C[0, 1]) 63 space of U -valued r.v.s 86 291 class of Z-laws 246 384 73 403 107 ˇ Stone–Cech compactification of S 225 176 set of transshipment plans 384 set of signed Borel measures Ψ on IR2n 47

Γn

γ γpp (P1 , P2 )

∆-antitone ∆j ∆kn ∆kr ∆s ∆∗s ∆r,a ∆r,θ ∆θ ∆kx;h1 ,...,hk ∆α x;d ∆b (·)

∆α t f (x) ∆kh Pm (x) ∆(x)

δ δx δp (T ) ζ ζF

set-valued mapping 236, 302 finite collection of functions 307 dual representation of λpp (P1 , P2 ) 105 quasi-antitone 109 78 kth difference of f with step h 384 function class 47 180 180 59 61 60 389 discrete measure 400 absolutely continuous marginal difference 378 391 392 rate of completing the final mass 372 55 Dirac measure at x 207 measure of deviation 72 Zolotarev metric 416 Zolotarev ζ-metric 110

ζr

ζr ζn (P1 , P2 ) ζs,p (X, Y ) θs ϑs,p κ κ2 κn κr

κm (X1 , θ)

Λ λ = λ + − λ− Λkϕ λpp (P1 , P2 ) λ(X, Y ) µ

µn µr µγ µ(ε) ◦ µc (·|·)

extension of the Kantorovich metric 102 modification of ζr 104 Zolotarev metric of order n 46 ideal metric 417 ideal metric 81 ideal metric 102 Kantorovich metric 88, 417 315 382 rth difference pseudomoment 143 difference pseudomoment 177 homogeneously convex functional 415 Hahn decomposition 93, 36 generalized Lipschitz space 394 105 λ-metric 423 characteristic function of µ 132 a measure 267, 322 convolution type metric 134 optimal solution for C γ 309 probability 24 Kantorovich– Rubinstein functional 14

µ c (·|·) µ∗ (A × B) µ (P1 , P2 ) µ(P1 , P2 ) µ(· G) µ(·, S), µ1 (·) µ(S, ·), µ2 (·) µF (X, Y ) µr (X, Y ) top

µ ≺ ν

νr∗ ν∗ (g) ν ∗ (g) νr ν(ϕ) ξr

πi

π1 µ(B) π2 µ(B) ΠK (x) π∗

π(X, Y )

Kantorovich functional 3 36 µ-minimal metric 110, 417 105 G-dependence metric 37, 94 fixed marginal distribution 53 fixed marginal distribution 53 functional in X × X 419 probability metric 170 ν-convergence implies µ-convergence 134 137 147 147 136 220 rth absolute pseudomoment 143 projection on the ith coordinate 155 := µ(B × S) 163 := µ(S × B) 163 projection of x on K 122 optimal admissible permutation 16 Prohorov metric 417, 86

(X, Y ) p t K t w t σ σi σM σr∗ σ ∗ (X, Y ) σ(P1 , P2 ) σr σ r (P1 , P2 ) τK

τr τr∗ τr τ (X, Y )

ϕ(ε) ϕ(µ) ϕ (τ ; t) Φ ΦS (θ)

Kolmogorov (uniform) distance 24, 133 109, 184 Kolmogorov metric 111 mapping 180 K-stationary divisor 182 Webster’s rule 182 permutation 254 discrete measures 407 supremum of the set Φ(σ, M ) 180 92 134 total variation metric 30 87 smoothed version of σ 35 topology generated by K 90 moment-type condition 88, 135 92 13 compound metric, τ -metric 373 97 optimal value of P (µ) 49 characteristic function 46 standard normal d.f. 266 Laplace transform 246

Φσ χ

χ∗ χr χn,c (m) χn,c (P1 − P2 ) χp (X, Y ) χ p (X, Y ) ψ(µ) (Ω, A, P ) ωk (f, t)

ωk (f ; Q; t) ω(γ)

f c

·

· ∞

m n

µ||k,r ||h||H i

X i − X T,p

X T

d.f. of N (0, σ 2 I) 325 uniform distance between characteristic functions 137 “tB -uniform” version of χ 137 “smoothed” version of χ 137 absolute pseudomoment 382 382 metric 249 minimal metric 249 solution set corresponding to P (µ),ϕ(µ) 49 probability space 8, 414 kth modulus of continuity of f 384 393 405 Lipschitz norm 16 norm on 40 supremum norm 91 Kantorovich– Rubinstein norm 46, 378 minimal function on Mr◦ 48 seminorm of h 49 286 300

· bL

X ∗T,p

X ∗T,∞

b ∞

Dis1 ,...,is ·f q,j (x)

x − y p

u C b (S)

(ξT )∞ T =1

m b,c

µ r ◦

f k

µ k,ϕ

V V n 7 B(Bi , Pi ) i=1 n 6

(Si , Bi )

bounded Lipschitz norm 306 312 312 318 103 p-norm 158 uniform norm on C b (S) 164 norm of ∞ 366 Fortet–Mourier metric 382 383 seminorm on ◦∞

L 389 generalized Kantorovich– Rubinstein norm 394, 404 norm 74 direct sum of Bi -measurables 61 product 58

i=1

∨-stable           69
∧-stable           69
x∨y                = max{x, y}  4
x∧y                = min{x, y}  4
∧                  min  19
(−∞)1x1≥x2         77
∂A0(c, ·)(0)       subdifferential  268
∂f(x)              subdifferential of f  104, 287
∂c f               c-subdifferential of f  125
∂p(0)              p-subdifferential  178
∂V(c, ·)(0)        subdifferential  243

∇f (x) (·, ·) (·)+ ≤st | S1 ) α!

= grad f (x) = ∂f (x) 115 inner product in IRd 142 = max(0, ·) 71 the stochastic ordering 147 restriction to S1 290 lexicographic order 397 403 convolution of measures 411

α β

[r] [x]c +x, |W i |T,∞ [t]G , [t]∗G

403 integer part of the number r 44 c-rounding of x 53 smallest integer larger than or equal to x 231 318 336

Index

Bold pagenumbers refer to this volume, non-bold pagenumbers to the other volume.

α-stable, 86 Abel method, 126, 128 absolute pseudomoment(s), 142, 382 absolutely monotonic, 424 abstract duality theorem, 178, 242, 244, 255, 260 version of the dual problem, 175 of the Kantorovich– Rubinstein functional, 179, 188 of the mass transfer problem, 172, 175 action profile, 367, 368 admissibility of (fi ), 153 admissible, 2 permutation, 16 affine maps on IRk , 202 analytic sets, 177, 190 antithetic variates, 154 apportionment theory, 53 approximable

compactly, 63 approximate extension property, 292 theorem(s), 292 approximating algorithms, 12 model, 79 approximation finite-dimensional, 92 model, 77 of mass transfer problems, 306 of queueing systems, 71, 72 of the distribution of sums, 128 optimal rate, 93 queues, 76 theorems, 306, 307 arbitrary directions, 43 mutually dependent, 72 arbitrary compact space, 198 Arzela theorem, 202, 218 asset returns, 39


assignment games discrete and continuous, 25 asymptotic distribution, 273 normality, 224 attracted trajectory, 358 automorphism, 150 autoregressive conditional heteroscedasticity (ARCH), 40 modeling of asset returns, 39 auxiliary theorem on convex sets, 179 Baire function, 80, 167, 220 measurable functions, 177, 302 σ-algebra, 197, 217 sets, 80, 219, 302 subset, 219 balancing condition, 372 Banach lattice, 166, 173, 177, 209, 301, 366 conjugate, 180, 187 of bounded real-valued functions, 292 limit, 366 space(s), viii, xvi, 166, 389 conjugate, 227 dual, 166, 251, 261 isometric isomorphism, 405 real, 112 separable, 4, 33, 329, 354 Barnes–Hoffmann greedy algorithm, 20 Berry–Esséen bound, 258 type result, 117 theorems, 113

Berry-Esséen theorem, 255 Berry-Esséen type result, 255 Beta-distributed, 214 biconjugate function, 112 bilinear form, 112 binary random trees, 254 relation, 322 search trees, 260, 263 BLIL, see bounded law of the iterated logarithm Boltzmann -type equation, 277, 307, 318 Bonferroni bounds, 151 bootstrap approximation, 199 estimator, 198, 199 sample, 198, 199 Borel extension problem, 301 function, 167, 177, 199, 220 measurable function, 339 measure, 167 on a compact space, 166 method, 126, 128 probability measure, 89 σ-algebra, 63 set(s), 219, 301, 302 subset, 295 bounded Kantorovich metric, 79 law of the iterated logarithm (BLIL), 306 boundedness from below, 208 of the cost function, 169 bounds for m n in the multivariate case, 382


for the deviation of two dependent queueing systems, 72 for the total transportation cost, 158 of deviation between probability measures, 51 to the total cost, 158 branching processes, 216 type recursion, 206 with multiplicative weights, 207 Brownian motion, 309, 314 motions, 278 bucket algorithm, 272 Burger’s type equation, 289 c-conjugation, 124 c-convex minorant, 125 c-convexity, 124 c-coupling optimal, 123, 130, 131 c-cyclic monotone, 131 monotonicity, 131 c-cyclical monotonicity, 126 c-optimal couplings, 127 c-optimality, 130 c-rounding, 53, 57 lower bounds, 58 c-subdifferential, 125 c-subgradient, 125 C 1 -operator, single-valued, 288 cadlag functions, 328 Cantor’s diagonal method, 360 capacity, 79 Capetanakis–Tsybakov– Mikhailov (CTM) protocol, 220 Carlson’s inequality, 174 lemma, 323


case of equiprobable atoms, 16 Cauchy-Schwarz inequality, 345 central limit theorem (CLT), 103, 179, 204, 226, 263 for the total wealth, 240 functional, 241 local, 137 quantitative version, 264 rate of convergence, 34, 374 Cesàro method, 126, 128 chance discretization points, 42 chaotic, 278, 288, 301, 307, 319 characteristic function, 231 characterization classical Hoeffding–Fréchet, 52 of c-optimal couplings, 127 of optimal 2 -couplings, 116 characterize the duality theorem, 259 charge, 83 choice function, 352 problem, 352 theory, 352 Choquet’s Theorem, 79 classes AB0 (S × S), 197 classical Hoeffding–Fréchet characterization, 52 Kantorovich–Rubinstein functional, 394 classical multiple-access problem, 220 closed, 361 formula for m n in the univariable case, 380 preorder, 322–324, 327, 336, 340, 341, 344 set-valued mapping, 358 subspace, 172 closeness, 273 between Sn and Sn∗ , 81


in terms of a weak metric, 43 CLT, see central limit theorem coarse grid, 41 common probability space, 339 communication resolution interval (CRI) algorithm, 192 compact, 64, 70, 359 case metrizable, 190 nonmetrizable, 196 measures, 64 space, 161, 219 arbitrary, 198 metrizable, 170, 190, 208 compactly approximable, 63 compensatory transfers, 367 competitive equilibria models, 340 complex queueing models, 53 compound metric, 373 computer tomography paradox, 51 conditional covariance, 96 measure, 327 conditionally independent, 95 conditions for a nontrivial explicit solution, 281, 285 for duality in the Monge–Kantorovich problem, 248 on the cost function, 176 conjugate Banach lattice, 187 function, 112 functional, 178 connectivity hypothesis, 363 continuity, 70, 72 continuous, 358

and discrete mass transportation problems, 23 function, 300, 330 increasing, 329 isotone, 325 functionals, 68 linear functional, 404 transformation, 404 linear interpolation, 338 partial derivatives, 280 selection theorem, 302 utility, 337 function, 329, 330, 334 -utility-rational, 352 function, 352 continuously differentiable, 280 contraction method, 37, 192, 254, 264 of Φ, 314 of Φ with respect to ∗p,t , 297 of Φ with respect to the ∗p,t -minimal metric, 283 of Φ with respect to the minimal metric ∗p,t , 304 of stochastic mappings, 277 of transformation, 191 contractive mapping, 200 conventional rounding, 59 convergence of a net to a point, 261 of algorithms, 37 of recursive algorithms, 204 converse to the duality space, 86 convex, 64, 103, 112 biconjugate function(s), 112 cone, 179, 184 thick, 179, 184 conjugate function(s), 112 functional, 178


sets, 184 auxiliary theorem, 179 subset, 179 convex function, 289 convexity, 285 convolution argument, 380 of a measure, 380 of measures, 412 property, 380 copula, 7 corner rule generalized northwest, 24 northwest, 2, 7, 17, 22 Hoffman’s, 26 multivariate version, 34 southwest, 25 cost of shipping a unit commodity from origin i to destination j, 27 cost function(s), viii, xii, 2, 170, 172, 198 ∆-antitone, 109 bounded below, 170, 208 boundedness, 169 condition, 176 duality theorem for symmetric, nonnegative, 16 lower semicontinuous, 365 nonsymmetric, 12 quasi-antitone, 109 reduced, 170, 190 regular, 176, 279 semimetric, 14 strictly positive, 365 symmetric, 4, 11 coupling(s) optimal, 112 couplings, 323 Courant–Fischer lemma, 137 CRI, see communication resolution interval


CTM, see Capetanakis–Tsybakov–Mikhailov cyclic -monotone, 115 maximal, 116 operator(s), 287 operator(s) and mass transfer problem, 288 -monotonicity, 115, 289 condition, 131 cyclical monotone function, 10 -monotone function, 38 ∆-antitone cost functions, 109 d-closure of the set of upper semicontinuous functions, 81 d-Lipschitz, 349–351, 357 -utility-rational, 352 choice function, 352 utility function, 349 d -Lipschitz, 351 d1 -Lipschitz, 351 d-quasiperiodic, 355 d-valuation, 346, 348 Debreu theorem, 323, 329, 335 demand distribution, 2 demand theory, 370 density of % Lip (c, S; X), 293 density, 376 density coupling lemma, 324 deviation between probability measures, 51 Diaconis and Freedman results, 179 diagonal method of Cantor, 361 difference between λp and γp , 106 pseudomoment, 142, 176 differentiability of functions, 115 differential equations stochastic, 277 diffusion with jumps, 331


Dini’s theorem, 74 Dirac measure, 207, 258, 311 disastrous event, 237 discrete and continuous assignment games, 25 mass transportation problems, 23 case, 35 marginal measure, 313 metric, 93 Monge condition, 53 transportation problem, 2 discretization of the SDE, 332 point(s), 41, 42 Wiener process, 336 distance between X and Y , 62 in probability, 417 distance from point x to set A, 333 distribution asymptotic, 273 demand, 2 function, 4 multinomial, 272 of the exact solution, 348 of the past, 285 supply, 2 uniform, 272 divisor criterion, 180 rule(s), 180 of (1/t)-rounding, 180 stationary, 181 Dobrushin’s result on optimal couplings, 36 Dobrushin’s theorem, 93 domain of normal attraction, 132 of normal attraction (dna), 306

DTP, see dual transportation problem dual Banach lattice, 173 Banach space, 261 extremal problem, 259 linear extremal problem, 163 Monge–Kantorovich functional(s), 64, 87 problem, 247 polyhedron (DP), 23 problem, 58, 217, 219 of the nontopological version of the mass transfer problem, 265 optimal value, 253, 268 representation, 5, 14 for Lp -minimal metrics, 96 transportation problem (DTP), 23 dual representation of p , 201 duality for Suslin functions, 79 problem in a mass setting, 242 relation, 212, 213 representation for m n , 379 results of KRP, 13 theory for mass transfer problems, 161 duality theorem(s), 171, 207, 214, 219, 225, 277, 375 abstract, 178, 242, 244, 255, 260 characterization, 259 p , 151 for L for a marginal problem with moment-type constraints, 251, 253


for a nontopological version of the mass transfer problem, 265, 272 for compact space, 161 for infinite linear programs, 241 for mass transshipments on a compact space with constraints on the marginal kth difference, 402 for semicontinuous functions, 76 for symmetric, nonnegative cost functions, 16 for the KRP, 15 formulation, 175 general, 7, 82, 84 in mass settings, 168, 241 in topological setting, 76 more general, 211 of Isii, 59 of Kantorovich–Rubinstein, on noncompact spaces, 222 on a metrizable compact space, 208 on arbitrary compact space, 169, 171, 207 on metrizable compact space, 170 on noncompact spaces, 211, 222, 232 and general cost function, 225, 234, 238 with, 238 with continuous and cost function, 234 with continuous cost function bounded below, 223 with cost function satisfying the triangle inequality, 222


with metric cost function, 86 Dubovitskii–Milyutin theorem on convex sets, 180, 184 Dudley’s problem, 6 dynamic optimization problem, 363 dynamical system, 354, 358 dynamics of a queueing system, 74 ε-coincidence of marginals, 50 of moments, 50 efficient infinite trajectory, 364, 365 empirical measure, 322, 326 environmental processes, 203 equicontinuous, 201, 202 Euclidean case, 96 norm, 323 Euler constant, 230, 271 method, 126, 128, 337 summation formula, 268 existence of optimal measures, 270 of optimal solutions, 217 explicit p in representations for L X (IR), 152 explicit solution of the mass transfer problem with a smooth cost function, 276 exponent, 132 exponential convergence rate, 196 rate of convergence, 219, 253 exponential topology, 340 extension of a function, 299


of the Kantorovich metric, 102, 183 of the Kantorovich–Rubinstein theorem, 406 problem, 290, 295 solution, 296 theorem(s), 295, 325 extremal marginal problem, 251, 258, 307 points, 19 problem(s) linear, 241 solution of, 139–141 value, 28 Fenchel–Moreau theorem, 178 final mass, 372 fine grid, 41 finite -dimensional linear programs, 307 Borel measure on a compact space, 166 dimensional approximation, 92 bounds, 93 case, 92 measure, 35, 265 trajectories, 364 finiteness, 66, 254 of ζm (X1 , θ), 176 of Cm (θ) and Dm (θ), 173 of I, 174 of the metrics µr , χr , dr , p,r , 169 and L of the upper bounds, 122 fixed marginal distributions, 53 moments, 52 moments, 53 fluctuation inequalities, 93 formal equiprobable case, 16

Fortet–Mourier metric, 17, 50, 382 Fréchet bound(s) lower, 2, 17 majorized, 42 upper, 2, 17 usual, 42 bounds, 2 bounds generalized upper, 54 sharpness of, 152 condition, 19, 21 differentiable density, 90 problem, 262 topological version, 262 -problem, 152 space, 339 type bound, 24 Fubini theorem, 5 full distribution, 132 probability distribution, 131 strictly operator-stable distribution, 146 random vector(s), 143, 151 function (convex) biconjugate, 112 (convex) conjugate, 112 bounded below, 171 differentiability, 115 isotone, 324 monotone, 11 optimal, 10 functional sublinear, 244 functional central limit theorem (CLT), 241 functionally closed preorder, 324, 327, 336, 341 preorder, 324, 327 G-dependence metric, 94


G-dependence metrics, 37 G-measurable random variable, 94 Γ(p, λ)-distributions, 247 Galton–Watson process, 206, 216 normalized, 216 Gamma -distributed, 214 gamma distribution, 215 GARCH, see generalized autoregressive conditional heteroscedasticity Gaussian processes, 120 Gel’fand compactum, 225 general case, 123, 402 cost functions, 123 duality result, 245 theorem, 7, 82, 84 Kantorovich–Rubinstein mass transshipment problem, 244 problem (KRP), 163 Monge condition, 24 Monge–Kantorovich mass transfer problem with given marginals, 164 mass transportation problem (MKP), 247 problem on continuous selections, 303 generalization of the Monge–Kantorovich mass transportation problem, 29 generalizations of Debreu theorem, 329 generalized autoregressive conditional heteroscedasticity (GARCH), 40


modeling of asset returns, 39 Kantorovich–Rubinstein norm, 394 Monge–Kantorovich functional, 87 subsequence, 187 upper Fréchet bound, 54 geometric α-stable r.v., 242 Lévy stable motion, 243 strictly stable distributions, 243 geometrically distributed, 237 global minimum, 129 Gnedenko’s extreme-value theorem, 232 greedy algorithm(s), 6, 17, 20, 22 solution(s), 7 greedy recursion, 22 grid class, 41, 339 coarse, 41 fine, 41 points, 41, 338 Gronwall inequality, 317 lemma, 282, 301, 321, 341 Hölder condition, 418 Haar probability, 132 Hahn decomposition, 93 Hahn–Banach theorem, 61, 393, 402 Hausdorff locally convex linear topological space, 265 space, 178 metric, 333 Hausdorff metric, 248 Hoeffding–Fréchet bounds, 107, 151 lower, 31–33


upper, 20, 21, 31 characterization, 52 inequality, 17 upper bound, 20, 21 Hölder’s inequality, 108, 339 multidimensional, 339 homogeneity, 143, 413 homogeneous, 94, 174 functional, 414, 427 metric, 416, 422, 424, 428 metric(s), 422 ideal Kantorovich metric, 87 metric, 81, 82, 87, 102, 107, 183, 193, 233 of Zolotarev, 275 metric(s), 30, 223, 371, 374, 383, 411, 414, 415, 421, 424 of Zolotarev, 381 properties of the metric Kr , 412 ideality for a probability metric, 143 identical mapping, 171 IFS, see iterated function system image encoding, 199 implementable, 368 imputation, 25 feasible, 25 individually rational, 25 stable, 25, 26 increasing continuous function, 329 convex function, 26 function, 47, 380 sequence, 72 indicator function, 177, 231 metric, 111 indicator cost function, 36 inequality of Marcinkiewicz–Zygmund, 288 infinite

-dimensional linear program, 241 dimensional network flow problem, 378 exchangeable sequence, 327 trajectory, 364, 365 initial mass, 372 input of laws, 71 interacting diffusion, 278 diffusions, 279 drifts, 279 intrinsic properties of probability metrics, 113 inversion, 254 in a random permutation, 254 isometric isomorphism, 404 isometry, 396, 405 isotone, 146 completion, 146 function, 324, 325 functionals, 65 real-valued function, 324 with respect to 'ω , 337 iterated function system (IFS), 201 Itô type SDEs, 332 Jordan decomposition, 166, 167, 179, 270, 314 k-minimal metric, 110 Kantorovich equality, 246 formulation, 2 of the MTP, 2 functional, 3, 14, 29 L2 -minimal problem, 132 on IRd , 132


metric, 35, 85, 86, 88, 90, 102, 138, 183, 200, 322, 412 p , 76 bounded, 79 extension, 183 generalized, 424 optimality criterion, 163 radius, 53, 54, 56 rth pseudomoment, 184 theorem, 88 Kantorovich–Rubinstein distance, 163 duality theorem on noncompact spaces, 222 functional, 14, 17, 179, 183 abstract version, 179, 188 classical, 394 mass transshipment problem, 50, 162, 244, 275, 281, 371 duality results of, 13 topological properties, 13 metric, 306 minimal functionals, 246 norm, 46, 382, 404 generalized, 394 problem (KRP), 2, 161, 372 duality theorem, 15 general, 163 optimal transportation plan (OTP), 2 original, 163 seminorm, 378 theorem, 412 extension, 406 transshipment problem (KRP), vii, xi Kemperman equality, 410 Kingman’s subadditive ergodic theorem, 214 Kirchhoff equation, 13


Kirszbraun–McShane extension, 91 Kolmogorov distance, 24, 183, 271 metric, 188 weighted, 232 uniform distance, 24 Kolmogorov metric, 111 Krein–Milman and Choquet theorem, 19 Krein–Smulyan theorem, 251, 256 KRP, see Kantorovich–Rubinstein problem KRP, see Kantorovich–Rubinstein transshipment problem kth modulus of continuity of f , 384 Ky–Fan metric, 417 λ-metric, 423 L1 -variation, 314 Lp -Kantorovich metric, 40 Lp -Wasserstein metric, 332, 348 1 -convergence, 86 L2 -minimal problem, 132 L2 -Kantorovich metric, 322 Lp -Wasserstein metric, 40 (p , ε)-independence, 77 (p , ε)-independent, 76 p -convergence, 152 L Lp -distance, 332 Lp -Kantorovich problem on mass transportation, 53 p -Kantorovich metric, 253 Lp -metric, 138 minimal, 194


Lagrange function, 312 λ-metric, 43 Laplace transform, 246, 247 largest c-convex minorant, 125 elements of the marginal, 12 lattice superadditive, 17 lattice measure, 396 learning algorithm, 204 Lebesgue integrable, 339 Lebesgue–Fatou lemma, 206 lemma of Carlson, 323 of Courant–Fischer, 137 of Gronwall, 282, 301, 321 of Lebesgue–Fatou, 206 of Pollard, 93 of Robbins–Siegmund, 206 of Urysohn, 176, 328 of Zorn, 291 LePage decomposition, 103 representation, 91, 124, 125 less concordant, 33 Lévy distance, 108 measure, 241, 246, 247 metric, 102, 183 process, 241 Lévy metric, 423, 424 generalized, 423 lexicographic order, 397 limit laws, 236 linear combination of measures, 400 extremal problem, 163, 241, 242 function(s), 119 preorder, 322, 323 programming duality, 5

transportation problem, 315 programs, 307 transformation, 401 linear interpolation of the trajectories, 338 Lipschitz assumption, 309 condition, 5, 332–334 relaxed, 308 stronger, 301 constant, 334 function, 200 norm, 98, 379 preorder, 333 on a metric space, 332 space, 394 utility function, 344 local bounds for the transportation plans, 36 in the transportation problem, 35 upper bounds on the transportation plans, 40 locally convex Hausdorff space, 286 space, 178 logarithmic normalization, 266 lognormal distribution, 231 lower bounded semicontinuous cost function, 62 bounds c-rounding, 58 Fréchet bound, 2 Hoeffding–Fréchet bound, 31 semicontinuity of c, 214 semicontinuity of c∗ , 214 semicontinuous, 62, 77, 113, 169, 178, 188,


201, 243, 244, 259, 358, 361 convex function, 103 cost function, 365 function, 70, 171, 176, 227 Lusin C-property, 343 separation theorem, 178, 230 extension, 225 theorem, 74 Lyapunov theorem, 261 µr -closeness, 223 M-analytic, 167 function, 167 m-buckets, 272 m-chaotic, 301, 307, 319 µ-completion, 194 µ-measurable selection, 216 sets, 195 µ-minimal metric, 111 µ-negligible open set, 221 µ c -convergence, 29 Maejima–Rachev construction, 104 majorized Fréchet bounds, 42 Marcinkiewicz–Zygmund inequality, 166, 288 Marcinkiewicz-Zygmund inequality, 301 marginal distributions, 53 elements, 12 moments, 52 marginal(s), 83 and perfectness, 83 constraints, 54 extensions and perfectness, 83 measures, 145


Markov chain, 236 kernel, 199, 200 models of interacting particles, 277 Markov inequality, 392 martingale, 211 case, 94 inequalities, 340 mass transfer problem, 161, 162, 175, 198, 219, 275, 365 abstract version, 175 and cyclic-monotone operators, 288 approximated, 307 approximation, 306 noncompact version, 220 nontopological version, 265 dual problem, 265 duality theorem, 265, 272 on completely regular topological spaces, 221, 232 optimal value, 275 with continuous cost function, 306 with given marginal difference, 162, 163, 221, 244 on compact space, 313 with given marginals, 245 mass transportation problem, 414 with fixed sum, 10 with stochastically ordered marginals, 10 mass transportation problem (MTP), vii, xi, xiii, xvii, 1, 27 and probability distances, 27 approximation of, 4 continuous and discrete, 23


general, Monge–Kantorovich (MKP), 247 of Monge–Kantorovich (MKP), vii, xi on IRn , 158 specialized, 51 with additional constraints (MTPA), vii, xi with partial knowledge of the marginals (MTPP), 4 mass transshipment problem, 13, 371, 378 condition for nontrivial solution, 285 Kantorovich–Rubinstein, 50, 162, 275, 281 necessary condition, 280 for a nontrivial solution, 281 optimal value, 381 with constraints on derivatives of marginals, 378 mathematical economics applications, 322 matrix problem, 182 MAX-algorithm, 254, 257 max-geometric infinitely divisible, 239 max-operator-stable limit theorem, 132 maximal compactification, 226 concentration on the diagonal, 16 cyclic-monotone, 116 dependence, 151 element of a set, 291 measure, 146 maximally dependent random variables, 155 maximum of sums, 144, 148

probability of sets, 144 McKean example, 279 interacting diffusion, 278 McKean–Vlasov equation, 299, 305, 309 McKean-Vlasov equation, 278 MD-operator, 419, 425, 426 measurable function, 420 mapping, 235 selection theorem, 194, 217, 235, 237 measures with a large number of common marginals, 43 method generating function, 261 of antithetic variates, 154 of probability metrics, 204, 273 metric(s) compound, 373 ideal, 30, 81, 82, 102, 183, 193, 223, 374, 411 indicator, 111 k-minimal, 110 Kolmogorov, 111 Lp -Kantorovich, 40 2 -minimal, , 112 µ-minimal, 111 minimal, 30, 374 nonpathological, 185 protominimal, 111 simple, 373 space preorder, 332 separable, 332 metrizable, 190 compact case, 190 space, 170, 190, 208 topological spaces, 337 Michael’s selection theorem, 306, 339


middle inequality, 291 Milshtein’s method, 337 minimal p -metric, 191 distance between X and Y , 62 functionals, 246 L0 -metric, 138 1 -metric, 45 2 -metric, 112 Lp -metric, 138 p -coupling, 124, 131 Lp -metric in the space of probabilities, 87 p -metric, 124 Lp -metrics, 194 mean interaction, 307 metric, 417 metric(s), 30, 45, 110, 111, 374 network flow problem, 13 representation of metrics, 45 variance of the sum, 155 minimality of ideal metrics, 414 p , 140 property of L Minkowski inequality, 343 minorant, 125 MKP, see Monge–Kantorovich problem MKP, see Monge–Kantorovich mass transportation problem MKTP, see Monge–Kantorovich transportation problem moment formulas, 264 generating function, 255 problems, 52 moment-type marginal constraints, 54 Monge condition, 2, 11, 22, 39, 53 generalized, 24


formulation of the MTP, 2 function, 25 problem, 162 solutions, 118, 129 Monge–Ampère PDE, 123 Monge–Kantorovich functional(s), 5, 65, 179, 183 dual, 64, 87 generalized, 87 primal, 64 mass transfer problem with given marginals, general, 164 mass transportation problem (MKP), vii, ix, xi, xiii, 1, 34 abstract version, 17 generalization, 29 multidimensional, 23 optimal transportation plan (OTP), 2 with capacity constraints, 35 minimal functionals, 246 problem (MKP), 246, 418 conditions for duality, 248 dual, 247 multivariate, 58 with given marginals, 162 transportation problem (MKTP), 374 classical, 374 monotone, 203 convergence, 308 function, 11, 147 cyclical, 10 operator, 287 seminorm, 86 Zarantonello-, 11 Monte Carlo simulation, 151 Moreau’s theorem, 122 MTP, see mass transportation problem


MTPA, see mass transportation problem with additional constraints MTPP, see mass transportation problem with partial knowledge of the marginals multi-dimensional martingale inequalities, 340 multichannel models, 74 –multiphased model, 74 multidimensional MKP, 23 multifunction, 363 multinomial distributed, 272 multivariate compound Poisson distribution, 128 normal distribution, 325 setting, 241 summability methods, 126 version of Hoffman’s northwest corner rule, 34 MYZ-rounding, 59 MYZ-rule of rounding, 180 ν-completion, 220 ν-measurable, 220 necessary condition for a nontrivial solution, 281 a nontrivial solution of the mass transshipment problem, 280 the duality relation, 212 negative cost of shipping a unit commodity from origin i to destination j, 27 network, 363 flow problem, 6, 15

minimal, 13 node j, 10 non-Markovian case, 293 nonatomic market games, 370 noncompact reduction theorem, 234 version of mass transfer problem, 220, 244 nondecreasing, 203 nonincreasing function, 394, 421 nonincreasing function, 248 nonmetrizable case, 197 compact case, 196 nonnegative, 66, 203 lower semicontinuous cost function, 213 Radon measure, 166 nonpathological metrics, 185 nonsymmetric cost functions, 12 nonsymmetric case, 8, 12 nontopological version of the mass transfer problem, 265 dual problem, 265 duality theorem, 265, 272 nontotal and nontransitive preference, 344 nontraditional measurable selection theorem, 235 nonuniqueness of optimal solution, 312 nonvoid, 64, 70 normalization condition, 180, 181, 185 normalized rounding error, 81 normed space, 179 northwest corner rule, 2, 7, 17, 22 generalized, 24 Hoffman’s, 26 multivariate version, 34


variant, 20 nth integral, 375 numerical approximation of stochastic differential equations, 39 one dimensional case, 120 one-dimensional standard Wiener process, 341, 342 one-dimensional case, 132 operator -ideal metric, 143 -stable, 132 full distribution, 132 limit theorem, 131 random vector(s), 131, 138 strictly, 132 optimal admissible permutation, 16 c-coupling, 123, 130, 131 coupling(s), 6, 16, 37, 112, 117, 123 Dobrushin’s result, 36 of Gaussian processes, 120 with local restrictions, 36 couplings, 318 distribution, 39 feasible, 18 finite trajectories, 364 function, 10 joint distribution, 22 2 -couplings, 116 measure, 26, 28, 30, 217, 221, 248, 270 multivariate transshipment costs, 158 over the class of K-stationary rules, 187 pair, 103 rate of approximation, 93 rounding rule, 185


roundings in terms of ideal metrics, 179 rule of rounding, 183 solution for C γ , 308 solution(s), 187, 217 taxation, theory of, 367 trajectory, 364, 365 transportation plan (OTP), 2, 3, 10 for KRP, 2 for MKP, 2 uniqueness of, 13 transshipment, 372 value, 218, 246, 265, 308 in the dual problem, 268 of the dual problem, 253 of the mass transfer problem, 275 of the mass transshipment problem, 381 optimality criterion, 221 of Kantorovich, 163 of a map, 10 of projections, 122 of radial transformations, 121 optimization function, viii, xii problem, 363, 365 order, 323 of convergence, 140 rounding rule, 185 type relation, 15 ordered topological space, 145 ordering criterion, 17 original Kantorovich–Rubinstein problem, 163 Orlicz condition, 26, 29 OTP, see optimal transportation plan output flow, 71 p-conjugate, 102


second, 102 p-th mean interaction, 293 paracompact space, 303 partial derivative, 309 partial derivatives, 280 perfect compound, 415 measures, 64 metric, 414, 422 metric(s), 422 probability, 63 space, 7 perfectness and marginal extensions, 83 and marginals, 83 piecewise linear interpolation, 338 piecewise smooth oriented curve, 285 Poisson distributed random variable, 271 process, 220 Polish space, 167, 177, 219, 295, 301, 302 Polish spaces, 347 Pollard’s lemma, 93 polyhedron dual, 23 primal, 23 positive cost function, 365 definite matrix, 281 semidefinite, 280, 289 precompact, 318 trajectory, 359 preference, 344 nontotal and nontransitive, 344 strict, 345 preferred, 344 rounding rules, 185 preorder, 322, 332

closed, 322, 327, 344 functionally closed, 324, 327, 336, 341 linear, 322, 323 on a completely regular topological space, 341 on a metric space, 332 varying, 337 primal Monge–Kantorovich functional(s), 64 polyhedron (PP), 23 probability density function on IR2 , 42 distance, 28 distribution full, 131 measure with µ-density f2∗ , 37 measure, µ c -convergence of, 29 metric, 28, 414, 419, 426 theory of, 373 perfect, 63 semidistance, 27 space, perfect, 7 problem of mass transfer, 315 of Monge, 162 of variance reduction, 154 on mass transportation, 53 with fixed marginals, 315 product measurable functions, 80 Prohorov metric, 86, 138, 152, 249 Prohorov metric, 92, 417 projection(s), 122, 191 operators, 199 optimality of, 122 propagation of chaos, 277, 289, 319 property, 289 proper mapping, 286


sublinear functional, 244 properties of the metric Kr , 412 protominimal metric, 111 pseudo -difference moment, 96 drift, 289 pseudometric(s), 421, 422 pth mean interaction, 301 norm interaction, 301 pure time discretization, 337 Pyke and Root inequality, 301, 321 quantitative approximation, 254 version of the central limit theorem, 264 quasi-antitone cost functions, 109 quasiconvexity, 284 queueing models, 75 system(s) approximation, 71, 72 dynamics, 74 real, 71 simpler, 71 quicksort, 229 algorithm, 191, 229, 230 R-isotone, 349, 351, 354, 357, 361 function, 345 R-nondecreasing chain, 346 R-regular, 347 -relative compact, 24 -relatively compact, 24 r-th pseudomoment, 92 Rademacher’s theorem, 381 radial transformation, 123 radial transformation(s), 121, 132


radius of the set of probabilistic laws, 81 Radon measure, 13, 163, 166, 167, 200, 204, 237 nonnegative, 166 signed, 166 Radon-Nikodym derivative, 247 random broken line(s), 66, 67 field(s), 248 immigration term, 217 measure, 327 polygon line(s), 64 recursion, 236 search algorithm, 269 search tree(s), 260 variables, maximally dependent, 155 vector, 131 walk method, 126, 128 random recursion, 248 range of values of Eh( X − Y ), 63 rate explosions, 373 of convergence in the central limit theorem, 34 in the stable limit theorem, 35 of transshipment, 372 rate of convergence, 138, 143, 181, 199, 248, 322, 327 bound in the local central limit theorem, 137 exponential, 219 faster, 186 in the CLT, 85, 275 for random elements with LePage representation, 91 in the i.i.d. case, 86 problem, 131 result(s), 126, 138 square uniform, 323


to zero, 323 under alternative distributional assumptions, 263 rational choice theory, 352 rationalizable, 368 real queueing system, 71 real-valued function, isotone, 324 recursion of branching type with multiplicative weights, 207 of branching-type, 206 recursive algorithm, 191 reduced cost function, 170, 190, 348 associated with the original cost function, 332 reduction theorem(s), 190, 192, 198, 209, 211, 277, 279 noncompact, 234 reflections, 120 reflexive relation, 322 regular cost function, 176, 279 function, 176, 299 functional, 414 with respect to R, 347 regularity, 143, 413 related theorem, 178 relation binary, 322 order-type, 15 reflexive, 322 transitive, 322 relaxed Lipschitz condition, 308 side conditions, 7

transportation problem, 4, 8 relaxed transportation problem, 52 representation of metrics, minimal, 45 utility, 45 Robbins–Monro-type recursion, 206 Robbins–Siegmund lemma, 206 Rosenthal inequality, 168 rounding error, 81 normalized, 81 total, 81 of random proportions, 80 problem, 52 rule(s), 180, 185 optimal, 185 order, 185 rth absolute pseudomoment, 143 rth difference pseudomoment, 122, 142 rule of rounding, 183 optimal, 183 Ryll-Nardzewski, result of, 63 σ-additive, 63 σ-completion, 167 σ-continuity upwards, 72 σ-continuous upwards, 70 σ-measurable, 167 Schur complement, 134 SDE, see stochastic differential equations SDEs with a drift, 294 with mean interaction in time, 293 search tree binary, 260, 263 random, 260 second p-conjugate, 102


selection theorem, 194, 217, 237 of Michael, 306, 339 self-decomposable, 246 selling strategy, 367, 368 semi-infinite linear programs, 307 semicontinuous function lower, 70, 171 upper, 70 semidistance, 27 semilinear space, 241, 255, 260 semimetric, 67 cost function, 14 separable Fréchet space, 337 metric space, 332 separation theorem, 325, 328 of Lusin, 178 set of m-dimensional vectors, 51 set-valued mapping, 358, 363 sharpness of Hoeffding–Fréchet bounds, 152 signature algorithms, 23 of a graph, 23 signed finite measure, 265 Radon measure, 166 simple measure, 396 metric(s), 373 signed measure, 405 simple metric, 415 simplex, 80 simplex method, 270 simultaneous representations, 32 single-channel models, 74 single-valued C 1 -operator, 288 Skorohod–Lebesgue spaces, 32, 33 smallest elements of the marginal, 12 smooth


transportation plans, 373 smooth convex function, 289 smoothing Kantorovich metric, 87 smoothness of the cost function, 279 solution of mass transportation, 85 of mass transshipment problems, 85 of the maximization problem, 2 of the SDE, 281, 331 solution of extremal problem(s), 139–141 the extension problem, 296 the maximization problem, 138 southwest corner rule, 25 square uniform rate of convergence, 323 stability of stochastic optimization problem, 49 programs, 49 stable central limit theorem, 117 limit theorem(s), 102, 124, 126 symmetric law, 125 stable limit theorem rate of convergence, 35 starlike, 285 stationary divisor rules, 181 rule(s), 185 of (1/t)-rounding, 186 stochastic applications, 27 of the MKP, 27 differential equations, 277 numerical approximation, 39

dominance, 341 Euler method, 337 inequality, 110 mappings, 277 optimization problem, 49 order, 15, 144 ordering, 147, 148 Strassen representation theorem, 146 theorem, 154, 417 application of the duality theory, 319 Strassen–Dudley theorem, 105 strict preference, 345 strictly α-stable random vector, 243 operator-stable distribution, 132 operator-stable random vector, 131 strong axiom of revealed preference, 352 law of large numbers, 198 metric, 43 solution of the SDE, 281 stochastic dominance, 341 subadditive, 30 subadditivity, 66 subdifferential, 113, 178 subgradient, 113 sublinear functional, 243, 244 submartingale, 211 subnet, 187 subspace, closed, 172 sufficient condition for a nontrivial solution, 281 summability method, 126 superadditive, 34 superadditive function, 25 superlinear mapping, 241 superlinearity, 241 supply distribution, 2 support of a measure, 403

of marginal measures, 145 supporting hyperplane, 114 survival function, 375 Suslin function(s), 78 set(s), 78 symmetric α-stable, 126 U -valued random variable ϑ, 91 cost function, 4, 11 symmetric matrix, 289 system of interacting particles, 298 τ -continuity downwards, 72 upwards, 72 τ -continuous downwards, 70 tail condition, 250 theorem by Weizsäcker and Winkler, 19 ergodic, 214 of Arzela, 202, 218 of Berry–Esséen, 255 of Choquet, 79 of Debreu, 323, 329, 335 of Dini, 74 of Dobrushin, 93 of Douglas, 20 of Dubovitskii–Milyutin on convex sets, 180, 184 of Fenchel–Moreau, 178 of Fubini, 5 of Gutman, 43 of Hahn–Banach, 61, 393, 402 of Isii, duality, 59 of Kantorovich, 88 of Kantorovich–Rubinstein, 412 extension, 406

of Krein–Milman and Choquet, 19 of Krein–Smulyan, 251, 256 of Lusin, 74, 178 of Lyapunov, 261 of Michael, 306, 339 of Moreau, 122 of Rademacher, 381 of Strassen, 146, 154, 417 application, 319 of Strassen–Dudley, 105 theory of moments, 52 of monopolies with incomplete information, 367 of optimal taxation, 367 of probability metrics, 373 of rounding, 179 thick convex cone, 179, 184 threshold for rounding, 180 time discretization methods, 332 discretization of the SDE, 332 time discretization points, 41 topological properties, 21 of Kantorovich–Rubinstein MTP, 13 spaces, 63, 219, 337 completely regular, 221 ordered, 145 version of Fréchet problem, 262 topology of weak convergence, 322 total cost, bounds to, 158 mass, 375 rounding error, 81 variation distance, 111 metric, 30, 93, 253

variation distance, 133, 136 variation norm, 375 TP, see transportation problem trajectory, 358, 363 efficient infinite, 364 infinite, 365 of dynamical system, 354, 358 optimal, 364 finite, 364 transfer function, 367 problem, 162 transformation by Markov kernel, 199 transitive relation, 322 transportation cost of a unit from node i to node j, 10 cost, upper bound for, 18 plan, 2, 40 problem (TP), 15, 21 discrete, 2 relaxed, 4, 8, 52 with local upper bounds, 40 with nonnegative cost function, 2 transshipment, 271 cost, optimal multivariate, 158 network flow problem, 372 plans, 47 problem of Kantorovich–Rubinstein (KRP), vii, xi rate, 372 tree splitting protocols, 220 triangle inequality, 174, 179, 183, 217, 271, 290 trinary feedback, 220 triple of points, 271 two-dimensional case, 43

u-chaotic, 278, 288 uniform bound, 317 distance, 183 between characteristic functions, 136 distribution, 272 k-modulus of continuity, 393 metric, 133, 136, 137 depending on the exponent B, 133 norm, 219 uniformly convergent, 201 tapered matrix, 27 unimodality condition, 8, 39 uniqueness of OTP, 13 univariable case, 380 universal utility theorem, 340 universally measurable, 167, 192, 197, 220, 226, 235, 245 set, 167, 220 upper bound for the transportation cost, 18 bounds finiteness, 122 bounds for L^p, 152 envelope, 358 Fréchet bounds, 2 Hoeffding–Fréchet bound, 21, 31 semicontinuous, 358–362 function, 70, 81 Urysohn lemma, 176, 328 usual Fréchet bounds, 42 usual stochastic dominance, 341 utility continuous, 337 function(s), 44, 329–332, 345 d-Lipschitz, 349

of a preorder, 323 -rational choice function, 352 representation, 45 theorem, 337, 340, 344 variance of the sum, 155 reduction, 154 variation distance, total, 111 metric, total, 30 norm, 375 vector problem, 179 Wasserstein metric, Lp , 40 norm, 404 Wasserstein metric, 322, 332 weak approximation of SDEs, 332 convergence, 102, 152, 182, 232, 278, 322 metric, 43 weak* compact, 308 compactness, 256 lower semicontinuity, 257 semicontinuous, 178, 188 precompact, 318 weakly perfect metric, 415 regular functional, 427 weakly* closed, 256, 262 compact, 257, 262 convergent subnet, 308 subsequence, 308 lower semicontinuous, 256, 257 wealth changes, 248 Webster

rounding, 59, 188 rule, 185, 188 Weibull distribution, 123 weighted total variation metric, 427 Wiener process, 43, 241, 333, 338, 339, 341 q-dimensional, 334 discretization, 336 increments, 346 one-dimensional, 341, 342 standard, 347, 348 Woyczynski inequality, 196 χp -metric, 249 χp -minimal metric, 249

Young inequality, 113, 114, 124 ζF -representation for p , 99 ζn -metric, 47 Zarantonello-monotone, 11 Zolotarev ideal metric, 193, 275, 381 metric, 97, 107, 412, 416 metric ζr , 218 type metric, 102 ζn -metric, 374 ζr -metric, 413 Zorn’s lemma, 291
